Using tracklang in Packages with Localisation Features

The article Localisation with tracklang.tex describes the reason why I created the tracklang package. The article Integrating tracklang into Language Packages gives an example of how to integrate tracklang into a language package.

This article is for those who are writing a package that needs to detect the document’s localisation settings. The next article Writing a datetime2 Language Module provides a practical example.

The example package described here can also be found with the tracklang samples provided with the tracklang package.

I recommend that you require at least version 1.4 (2019/11/30) to ensure you’re using the newer version of \IfTrackedLanguageFileExists, which is used internally by \TrackLangRequireDialect.

\RequirePackage{tracklang}[2019/11/30]

This new version also provides \TrackLangRedefHook, which makes it easier to redefine a language hook (such as \date〈lang〉). To append to a hook (such as \captions〈lang〉), use \TrackLangAddToHook (which was added in version 1.3).

For this example, I want to create a package called animals that provides language-sensitive commands: \catname, \dogname and \ladybirdname. The default definitions are:

\newcommand\catname{cat}
\newcommand\dogname{dog}
\newcommand\ladybirdname{bishy-barney-bee}

Tracking Dialects (Optional)

In the event that the document author doesn’t want to load a language package, such as babel, but they do want a regional variation for the animals package, I’m going to supply a way for the document author to request particular dialects in the package options. Rather than define an option for every possible combination, it’s simpler to assume that any unknown option is a language identifier.

I could just do:

\DeclareOption*{\TrackLanguageTag{\CurrentOption}}

But if my animals package provides some other options as well and the document author accidentally misspells one, this won’t alert them to the problem. As from tracklang version 1.3.9, there’s another command that can be used:

\DeclareOption*{%
 \TrackIfKnownLanguage{\CurrentOption}%
 {\PackageInfo{animals}{Tracking language `\CurrentOption'}}% successful
 {% failed
   \PackageError{animals}%
   {Unknown language specification `\CurrentOption'}%
   {You need to supply either a known dialect label or a valid language tag}%
 }%
}

This provides a more informative error if the document author makes a mistake.

Note that if you do allow the document author to track dialects in your package, those dialects will also be picked up by any subsequent package that also loads tracklang. You may prefer to omit this and instruct the document authors to identify their preferred dialects through the document class options. The disadvantage with this is that only predefined dialects can be passed globally through the document class. If a very specific locale is required, it may not have a corresponding predefined dialect label. (There are too many possible combinations to cater for everyone. Each new dialect label adds to the overall document load time.)
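
To make the two approaches concrete, here’s a sketch of each as alternative preambles (not one document). The first passes a predefined dialect label globally through the document class. The second passes a full language tag as a package option, assuming the animals package uses \TrackIfKnownLanguage in \DeclareOption* as shown above. The tag “en-GB-oxendict” is only an illustration of a locale too specific to have its own predefined dialect label.

% Approach 1: predefined dialect label as a class option
\documentclass[british]{article}
\usepackage{animals}

% Approach 2: a specific language tag as a package option
% (it will also be seen by any later package that loads tracklang)
\documentclass{article}
\usepackage[en-GB-oxendict]{animals}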

Detecting Tracked Dialects

You can use \AnyTrackedLanguages to find out if any languages have been tracked (that is, languages that have been identified as required by the document). This command has two arguments: the first is done if there are tracked languages; otherwise the second is done. If there are tracked languages, you can use \ForEachTrackedDialect to iterate over every tracked dialect (if associated information, such as the region or variant, is required) or you can use \ForEachTrackedLanguage to iterate over every tracked (root) language.
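
As a minimal sketch of how these commands fit together, the following package code writes a line to the log file for each tracked root language, or notes that none were found. The \animals@lang loop macro name is just an illustration:

% log each tracked root language (or the absence of any)
\AnyTrackedLanguages
{%
  \ForEachTrackedLanguage{\animals@lang}{%
    \PackageInfo{animals}{Tracked language `\animals@lang'}%
  }%
}
{%
  \PackageInfo{animals}{No tracked languages}%
}

Replacing \ForEachTrackedLanguage with \ForEachTrackedDialect would iterate over the tracked dialect labels instead.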

The tracklang package is designed to allow all the localisation information to be stored in files with the naming scheme name-localeid.ldf where localeid is described in more detail below. The name part is typically the same as the package name (animals in this example), but it doesn’t have to be (although using the package name helps to associate the file with the package). The file is loaded with:

\TrackLangRequireDialect[load code]{name}{label}

which attempts to find the appropriate localeid that best matches the dialect or language identified by label. (This command internally uses \IfTrackedLanguageFileExists to find the file.)

This means that my animals package simply needs to do:

\AnyTrackedLanguages
{%
  \ForEachTrackedDialect{\this@dialect}{%
    \TrackLangRequireDialect{animals}{\this@dialect}%
  }%
}
{% no tracked languages, default already set up
}
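
Putting the pieces so far together, a minimal animals.sty might look like the sketch below. This is an assumed layout, not necessarily the sample file distributed with tracklang. The important point is that \ProcessOptions comes before the \AnyTrackedLanguages block. Otherwise any dialects supplied as package options won’t have been tracked yet when the file search runs.

\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{animals}
\RequirePackage{tracklang}[2019/11/30]

% default (English) names
\newcommand\catname{cat}
\newcommand\dogname{dog}
\newcommand\ladybirdname{bishy-barney-bee}

% treat any unknown option as a dialect label or language tag
\DeclareOption*{%
  \TrackIfKnownLanguage{\CurrentOption}%
  {\PackageInfo{animals}{Tracking language `\CurrentOption'}}%
  {\PackageError{animals}%
    {Unknown language specification `\CurrentOption'}%
    {You need to supply either a known dialect label or a valid language tag}}%
}

% track any dialects given as package options *before* searching
\ProcessOptions\relax

% load animals-<localeid>.ldf for each tracked dialect
\AnyTrackedLanguages
{%
  \ForEachTrackedDialect{\this@dialect}{%
    \TrackLangRequireDialect{animals}{\this@dialect}%
  }%
}
{}% no tracked languages: the defaults above are used

\endinput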

This will attempt to load a .ldf file for each tracked dialect. If a file isn’t found for the given dialect, a warning is issued. \TrackLangRequireDialect has an optional argument, which is the code to implement if the file is found. The default load code is:

\TrackLangRequireResource{\CurrentTrackedTag}

which loads the file (\CurrentTrackedTag is set by \IfTrackedLanguageFileExists to the localeid).
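
The optional argument is useful if you want to do something extra when a file is found. For example, this sketch (a variation on the loop above, not a required pattern) writes the chosen localeid to the log file before loading the file. Since custom load code replaces the default, it has to include \TrackLangRequireResource itself, or the file won’t be loaded.

\ForEachTrackedDialect{\this@dialect}{%
  \TrackLangRequireDialect
    [% custom load code: log the match, then load the file
     \PackageInfo{animals}{Loading animals-\CurrentTrackedTag.ldf}%
     \TrackLangRequireResource{\CurrentTrackedTag}]%
    {animals}{\this@dialect}%
}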

Locale Sensitive Files

The way tracklang attempts to determine the localeid part of the file name has changed slightly in version 1.4, but for a given dialect it will try to determine the file to load according to the list below in order of priority. If the dialect has an element missing (such as the region or sub-language) that item in the list will be skipped. The first two and final items aren’t skipped unless a match has already been found.

The first localeid in the list such that the file name-localeid.ldf is found (on TeX’s path) is the one chosen. In the list below, 639-1 indicates the ISO 639-1 code (e.g. “en”), 639-2 indicates the ISO 639-2 code, script indicates the alpha-4 language script (e.g. “Latn”), sublang indicates a sub-language tag (e.g. “yue”), region indicates the ISO 3166-1 region code (e.g. “GB”), and variant indicates the variant part of the BCP 47 language tag (e.g. “1996”). Other dialect information, such as the modifier, isn’t included.

  1. The first localeid to be tried is the actual BCP 47 language tag for the given dialect formed of all identified sub-parts (for example, “de-AT-1996”). Note the default script isn’t included unless it was explicitly identified when the dialect was tracked.
  2. The next localeid to be tried is the dialect label (for example, “british”). Note that this is tracklang’s dialect label, which may or may not be the same as babel’s.
  3. localeid is 639-1-sublang-script-region.
  4. localeid is 639-1-script-region.
  5. localeid is 639-1-sublang-region (if there’s no script or if the script is the default for the given language).
  6. localeid is 639-1-region (if there’s no script or if the script is the default for the given language).
  7. localeid is 639-1-sublang-script.
  8. localeid is 639-1-script.
  9. localeid is 639-1.
  10. localeid is 639-2-sublang-script-region.
  11. localeid is 639-2-script-region.
  12. localeid is 639-2-sublang-region (if there’s no script or if the script is the default for the given language).
  13. localeid is 639-2-region (if there’s no script or if the script is the default for the given language).
  14. localeid is 639-2-sublang-script.
  15. localeid is 639-2-script.
  16. localeid is 639-2.
  17. localeid is 639-3-sublang-script-region.
  18. localeid is 639-3-script-region.
  19. localeid is 639-3-sublang-region (if there’s no script or if the script is the default for the given language).
  20. localeid is 639-3-region (if there’s no script or if the script is the default for the given language).
  21. localeid is 639-3-sublang-script.
  22. localeid is 639-3-script.
  23. localeid is 639-3.
  24. localeid is just region.
  25. localeid is 639-1-sublang-variant or localeid is 639-1-variant if sublang is missing.
  26. localeid is 639-2-sublang-variant or localeid is 639-2-variant if sublang is missing.
  27. localeid is 639-3-sublang-variant or localeid is 639-3-variant if sublang is missing.
  28. Finally, localeid is set to the root language label.

This looks like quite a long list, but most languages don’t have the ISO 639-3 code set (it’s omitted when it’s identical to the 639-2 code), so those cases will be skipped. If there’s no region, variant or sub-language then the list becomes much shorter. Any further refinements will require a more general .ldf file that tests the necessary locale elements.

For example, if the dialect is “british” then the file search will be in the order:

  1. animals-en-GB.ldf (BCP 47 language tag).
  2. animals-british.ldf (dialect label).
  3. animals-en-Latn-GB.ldf (639-1 language code, script, region).
  4. animals-en-GB.ldf (639-1 language code, region).
  5. animals-en.ldf (639-1 language code).
  6. animals-eng-Latn-GB.ldf (639-2 language code, script, region).
  7. animals-eng-GB.ldf (639-2 language code, region).
  8. animals-eng.ldf (639-2 language code).
  9. animals-GB.ldf (region).
  10. animals-english.ldf (language label).

Note that the fourth try is identical to the first try. There will occasionally be repetitions like this. If there wasn’t a match in the first instance then there obviously won’t be a match in the repetition so that check is either redundant or won’t be reached (because the file has already been found).

Another such case occurs when the dialect label is identical to the root language label. For example, if the dialect is “french” (which is the root language label and has no associated region), then the file search will be in the order:

  1. animals-fr.ldf (BCP 47 language tag).
  2. animals-french.ldf (dialect label).
  3. animals-fr-Latn.ldf (639-1 language code, script).
  4. animals-fr.ldf (639-1 language code).
  5. animals-fra-Latn.ldf (639-2 language code, script).
  6. animals-fra.ldf (639-2 language code).
  7. animals-french.ldf (language label).

With pre-v1.4 of tracklang, the “british” ordering is:

  1. animals-british.ldf (dialect label).
  2. animals-en-GB.ldf (639-1 language code, region).
  3. animals-eng-GB.ldf (639-2 language code, region).
  4. animals-en.ldf (639-1 language code).
  5. animals-eng.ldf (639-2 language code).
  6. animals-GB.ldf (region).
  7. animals-english.ldf (language label).

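Note that in this older ordering the dialect label is the first item (there’s no language tag test), the script isn’t included, and the items with 639-2 codes are interleaved with the 639-1 items instead of following them.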

As a general rule of thumb, I recommend against using the dialect label in the file name unless there’s something very specific to that label that’s different from a synonymous label (for example, “british” vs “UKenglish”).

On the other hand, for the root language the choice between language label and language code can depend on various factors. (For example, should the fallback file be animals-english.ldf or animals-en.ldf or animals-eng.ldf?) The advantage with using the language label is that it’s a useful final fallback, but if it happens to exactly match the dialect label then the file will be found on the second test.

It’s possible to have a .ldf file load another file, so you don’t necessarily need to have a lot of very specific files for every region and script combination. It depends on how major the differences are between each combination.

Let’s suppose for my animals package there are no regional differences, so I’m just going to use the root language label. I have a file called animals-english.ldf which contains:

\TrackLangProvidesResource{english}

\providecommand*{\englishanimals}{%
  \renewcommand*{\catname}{cat}%
  \renewcommand*{\dogname}{dog}%
  \renewcommand*{\ladybirdname}{bishy-barney-bee}%
}

\TrackLangAddToCaptions\englishanimals

and a file called animals-german.ldf which contains:

\TrackLangProvidesResource{german}

\providecommand*{\germananimals}{%
  \renewcommand*{\catname}{Katze}%
  \renewcommand*{\dogname}{Hund}%
  \renewcommand*{\ladybirdname}{Marienk\"afer}%
}

\TrackLangAddToCaptions\germananimals

The first line of each file identifies the file:

\TrackLangProvidesResource{localeid}[version]

The first argument should match the localeid part of the file name. The optional argument is the version information. For example:

\TrackLangProvidesResource{english}[2016/10/06 v1.2]

The final line uses:

\TrackLangAddToCaptions{code}

which tries to append the redefinitions (code) to the appropriate \captions〈label〉 hook. Regardless of whether or not it finds the hook, this command will always do code at that point. This ensures the code is performed even if the document hasn’t loaded babel (or polyglossia).

This command uses the more general purpose:

\TrackLangAddToHook{code}{hook}

where hook is “captions”. There’s an analogous command for redefining a hook rather than appending to it (new to v1.4):

\TrackLangRedefHook{code}{hook}

(For example, to redefine \date〈label〉.)
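
As a rough illustration of how these commands relate, the last line of animals-english.ldf could equally be written with the general-purpose command. A resource file that really did need to replace the date hook might use \TrackLangRedefHook instead. The \englishanimalsdate command is hypothetical and exists only for this sketch.

% equivalent to \TrackLangAddToCaptions\englishanimals
\TrackLangAddToHook{\englishanimals}{captions}

% hypothetical: replace \date<label> rather than append to it
\providecommand*{\englishanimalsdate}{%
  \renewcommand*{\today}{\number\day/\number\month/\number\year}%
}
\TrackLangRedefHook{\englishanimalsdate}{date}

Redefining a hook discards whatever the language package put there. It’s only appropriate when your code is meant to replace that definition entirely, as a date-formatting module might.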

In each case, the 〈label〉 part of the hook control sequence name is determined as follows:

  1. Check if the hook is defined when 〈label〉 is the tracklang dialect label. (This may or may not be the same as babel’s label.)
  2. If there’s a known mapping provided from tracklang’s label to babel’s, then babel’s label can be found from that. In which case, check if the hook is defined with babel’s label as 〈label〉.
  3. Check if the hook is defined when 〈label〉 is the root language label.

Note that there’s no warning if the hook isn’t found as there’s no guarantee that one exists. (For example, the document may not load babel or polyglossia.)

Consider for example the following:

\usepackage[jerseyenglish]{animals}

The “jerseyenglish” dialect label is recognised by tracklang as identifying “en-JE” (English language in Jersey). So \TrackLangAddToCaptions will first test if the command \captionsjerseyenglish exists. If it does, then that’s the hook chosen, but babel doesn’t define that label. The closest match is “british”, so tracklang provides a mapping from “jerseyenglish” to “british”.

Therefore, the next test checks if \captionsbritish exists. If it does, then that’s the hook chosen. However, it’s possible that the document loads polyglossia instead or it may load babel with just “english”. For example:

\documentclass[jerseyenglish]{article}
\usepackage[english]{babel}
\usepackage{animals}

Note that in this case the locale setting has been passed through the document class, so tracklang picks it up and avoids the tedious check to determine which languages babel has loaded.

So the final check performed is to test if \captionsenglish has been defined. If it has then that’s the hook used.

If none of those hooks are defined then tracklang assumes that there isn’t one available, as in the following example:

\documentclass[jerseyenglish]{article}
\usepackage{animals}
\begin{document}
\catname.
\end{document}
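
For comparison, here’s a sketch of a document where babel does provide the hooks. It assumes only the animals-english.ldf and animals-german.ldf files described above are installed. The class options are picked up by both tracklang and babel, with the last option (british) as the main language. The animals redefinitions should therefore be appended to \captionsbritish and \captionsgerman, and the names should switch when the language changes.

\documentclass[german,british]{article}
\usepackage{babel}
\usepackage{animals}
\begin{document}
% main language (british): uses \englishanimals
\catname, \dogname, \ladybirdname.

% \captionsgerman now includes \germananimals
\selectlanguage{german}
\catname, \dogname, \ladybirdname.
\end{document}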

Let’s suppose now that in the US \ladybirdname should be “ladybug” and in the United Kingdom it should be “ladybird”. To allow for these cases, I can add two more files: animals-en-US.ldf and animals-en-GB.ldf.

There are various possibilities. The first is simply to copy animals-english.ldf and make the appropriate changes. So animals-en-GB.ldf contains:

\TrackLangProvidesResource{en-GB}

\providecommand*{\enGBanimals}{%
  \renewcommand*{\catname}{cat}%
  \renewcommand*{\dogname}{dog}%
  \renewcommand*{\ladybirdname}{ladybird}%
}

\TrackLangAddToCaptions\enGBanimals

and animals-en-US.ldf contains:

\TrackLangProvidesResource{en-US}

\providecommand*{\enUSanimals}{%
  \renewcommand*{\catname}{cat}%
  \renewcommand*{\dogname}{dog}%
  \renewcommand*{\ladybirdname}{ladybug}%
}

\TrackLangAddToCaptions\enUSanimals

Now, if the dialect is “british” (or “UKenglish”) the animals-en-GB.ldf file will be loaded, if the dialect is “american” (or “USenglish”) the animals-en-US.ldf file will be loaded, and for any other English dialect (such as “jerseyenglish”) the animals-english.ldf file will be loaded.
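
A quick way to check which file has been picked up is a small test document. This sketch doesn’t load babel, so the redefinitions are simply done when the resource file is loaded. With the three files above, the “american” dialect (tag “en-US”) matches animals-en-US.ldf on the first attempt.

\documentclass[american]{article}
\usepackage{animals}
\begin{document}
% with animals-en-US.ldf installed, this should be "ladybug"
\ladybirdname
\end{document}

Changing american to jerseyenglish should instead fall back to animals-english.ldf.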

For this trivial example, this is a simple solution, but now let’s suppose we have a hundred animal names and more names may be added in future. In this case, it’s more efficient to just provide the differences in the animals-en-GB.ldf and animals-en-US.ldf files and use the root language file as a base.

For example, animals-en-GB.ldf would now contain:

\TrackLangProvidesResource{en-GB}[2016/10/06 v1.2]

\TrackLangRequireResource{english}

\providecommand*{\enGBanimals}{%
  \englishanimals
  \renewcommand*{\ladybirdname}{ladybird}%
}

\TrackLangAddToCaptions\enGBanimals

and animals-en-US.ldf would now contain:

\TrackLangProvidesResource{en-US}

\TrackLangRequireResource{english}%

\providecommand*{\enUSanimals}{%
  \englishanimals
  \renewcommand*{\ladybirdname}{ladybug}%
}

\TrackLangAddToCaptions\enUSanimals

This still has some redundancy (\ladybirdname is set twice whenever the hook is used), and there may be code in the root language file that I don’t want included in the other files.

Another possibility is to create a base file for the language that only contains the elements common to all dialects in that language. This file could be called, for example, animals-english-base.ldf and might look like:

\TrackLangProvidesResource{english-base}

\providecommand*{\englishcommonanimals}{%
  \renewcommand*{\catname}{cat}%
  \renewcommand*{\dogname}{dog}%
}

Note that this file doesn’t alter the language hook and it won’t be explicitly loaded by \TrackLangRequireDialect but may be loaded by another resource file.

For example, animals-english.ldf might now contain:

\TrackLangProvidesResource{english}

\TrackLangRequireResource{english-base}%

\providecommand*{\englishanimals}{%
  \englishcommonanimals
  \renewcommand*{\ladybirdname}{bishy-barney-bee}%
}

\TrackLangAddToCaptions\englishanimals

animals-en-GB.ldf might now contain:

\TrackLangProvidesResource{en-GB}

\TrackLangRequireResource{english-base}%

\providecommand*{\enGBanimals}{%
  \englishcommonanimals
  \renewcommand*{\ladybirdname}{ladybird}%
}

\TrackLangAddToCaptions\enGBanimals

and animals-en-US.ldf might now contain:

\TrackLangProvidesResource{en-US}

\TrackLangRequireResource{english-base}%

\providecommand*{\enUSanimals}{%
  \englishcommonanimals
  \renewcommand*{\ladybirdname}{ladybug}%
}

\TrackLangAddToCaptions\enUSanimals

How you choose to arrange the language files depends on how much variation there is between locales, but bear in mind that each file can only be loaded once with \TrackLangRequireResource (in the same way that packages can only be loaded once with \usepackage or \RequirePackage).
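
This load-once behaviour matters when one file reuses another. As a sketch, suppose a hypothetical animals-en-AU.ldf wants the same names as animals-en-GB.ldf. If “british” is tracked before “australian”, the en-GB file has already been loaded, so requiring it again does nothing and its own \TrackLangAddToCaptions line won’t run for the Australian hook. Adding the hook explicitly avoids this:

\TrackLangProvidesResource{en-AU}

% may already have been loaded for another dialect
\TrackLangRequireResource{en-GB}

% so add the captions code for the current dialect explicitly
\TrackLangAddToCaptions\enGBanimals

If en-GB hasn’t been loaded before, \enGBanimals ends up appended to the hook twice. That should be harmless here, since it only redefines the same commands to the same values.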

Some languages can be written in multiple scripts. For example, “sr-Latn” represents Serbian written using the Latin script. With older versions of tracklang (pre-v1.4) the script wasn’t included in the file search but it can be obtained from the command \CurrentTrackedDialectScript, which can be referenced within the language resource file.

Be careful not to include the \CurrentTracked… commands in the argument of \TrackLangAddToCaptions, \TrackLangAddToHook or \TrackLangRedefHook, as there’s no guarantee they’ll have the same values when the hook is actually used by babel or polyglossia (or whatever other language package might be in use).
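
If you do need one of these values inside the hook, one option is to expand it while the resource file is being loaded and pass the expanded result to the hook. This is a general TeX technique, not something tracklang provides. In this sketch \animalsscript is a hypothetical command assumed to have been defined by the package with \newcommand, and \animalstemp is a hypothetical scratch macro.

% Don't do this: \CurrentTrackedDialectScript would only be expanded
% when the hook is used, by which time it may have a different value
% \TrackLangAddToCaptions{\renewcommand*{\animalsscript}{\CurrentTrackedDialectScript}}

% Instead, expand the current value now ...
\edef\animalstemp{%
  \noexpand\renewcommand*{\noexpand\animalsscript}{\CurrentTrackedDialectScript}%
}
% ... and pass the expanded code to the hook
\expandafter\TrackLangAddToCaptions\expandafter{\animalstemp}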

Let’s suppose we add two more language resource files for the animals package. The first is animals-sr-Cyrl.ldf which contains:

\TrackLangProvidesResource{sr-Cyrl}

\providecommand*{\srCyrlanimals}{%
  \renewcommand*{\catname}{мачка}%
  \renewcommand*{\dogname}{пас}%
  \renewcommand*{\ladybirdname}{ладибирд}%
}

\TrackLangAddToCaptions\srCyrlanimals

The second is animals-sr-Latn.ldf which contains:

\TrackLangProvidesResource{sr-Latn}

\providecommand*{\srLatnanimals}{%
  \renewcommand*{\catname}{mačka}%
  \renewcommand*{\dogname}{pas}%
  \renewcommand*{\ladybirdname}{ladibird}%
}

\TrackLangAddToCaptions\srLatnanimals

From tracklang version 1.4, these files will be found. With older versions, tracklang will issue a warning that there’s no support for the dialect. These older versions require a scriptless language file that loads the appropriate file. For example, animals-serbian.ldf would contain:

\TrackLangProvidesResource{serbian}
\TrackLangRequireResource{sr-\CurrentTrackedDialectScript}

Alternatively the file could be called animals-sr.ldf with contents:

\TrackLangProvidesResource{sr}
\TrackLangRequireResource{sr-\CurrentTrackedDialectScript}

The disadvantage with this method is that this scriptless file can only be loaded once. This means that if both sr-Cyrl and sr-Latn have been tracked, only one will be supported. One way to avoid this is to omit the initial \TrackLangProvidesResource line so that the file simply contains the \TrackLangRequireResource line.
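
Following that suggestion, the whole of the scriptless fallback file would be a single line. Here’s how animals-sr.ldf might look (the same applies if you name it animals-serbian.ldf):

% animals-sr.ldf: no \TrackLangProvidesResource line, so (as described
% above) this file isn't restricted to being loaded once
\TrackLangRequireResource{sr-\CurrentTrackedDialectScript}

The script-specific files (animals-sr-Cyrl.ldf and animals-sr-Latn.ldf) keep their \TrackLangProvidesResource lines, so each of those is still only loaded once.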

As from version 1.4, tracklang provides a verbose switch that can be used for debugging. The following code fragment can be used to find the complete search order for all tracked dialects. It uses “dummy” as a prefix to ensure that no file match is found. (This assumes that there are no files called dummy-localeid.ldf on TeX’s path.)

\TrackLangShowVerbosetrue % add extra info messages to .log file
\ForEachTrackedDialect
 {\thisdialect}%
 {%
   \thisdialect: 
   \IfTrackedLanguageFileExists{\thisdialect}{dummy}{.ldf}{found}{not found}.
 }%

This should produce the text “not found” in the PDF file for each dialect, which is a useful confirmation. Since the file doesn’t exist for any possible value of localeid, every test fails and the complete search list is written to the log file. Note that in verbose mode the information messages are only written to the log file (not to the console).

Consider the following document:

\documentclass{article}

\usepackage{tracklang}

\TrackLanguageTag{zh-cmn-Hans-CN}
\TrackLanguageTag{hy-Latn-IT-arevela}
\TrackLanguageTag{sr-Latn-ME}
\TrackLanguageTag{sr-Cyrl-ME}
\TrackLanguageTag{serbian}

\begin{document}
\TrackLangShowVerbosetrue
\ForEachTrackedDialect
 {\thisdialect}%
 {%
   \thisdialect:
   \IfTrackedLanguageFileExists{\thisdialect}{dummy}{.ldf}{found}{not found}.
 }%
\end{document}

This provides examples of a locale with a sub-language code (“cmn”), a locale with a variant (“arevela”) and three Serbian-language locales, the first two with an explicit script and region and the third with just the root language.

The first dialect label is “zhcmnHansCN” so the first message block starts with:

Package tracklang Info: Finding file for dialect `zhcmnHansCN' on input line 19

This is followed by the dialect information written by \SetCurrentTrackedDialect, which occurs at the start of the file search.

Package tracklang Info: Setting current tracked dialect `zhcmnHansCN' on input
line 19.
(tracklang)             Language: `chinese'.
(tracklang)             ISO code: `zh'.
(tracklang)             Sub-lang: `cmn'.
(tracklang)             Modifier: `'.
(tracklang)             Variant: `'.
(tracklang)             Script: `Hans'.
(tracklang)             Region: `CN'.
(tracklang)             Additional: `'.
(tracklang)             Tag: `zh-cmn-Hans-CN'.  on input line 19.

This lists all the information that can be accessed with the \CurrentTracked… commands. (Note that it only shows the ISO 639-1 language code, not the ISO 639-2 code, since tracklang treats 639-1 as the dominant language code if it exists.) The next set of messages comes from the file search as each test tries a different value of localeid. In this case the list is quite long because the locale has a sub-language and region (in addition to the language and script).

The block for “sr-Latn-ME” shows that the order of priority is:

  1. sr-Latn-ME (language tag)
  2. srLatnME (dialect label)
  3. sr-Latn-ME (639-1, script, region)
  4. sr-Latn (639-1, script)
  5. sr (639-1)
  6. srp-Latn-ME (639-2, script, region)
  7. srp-Latn (639-2, script)
  8. srp (639-2)
  9. ME (region)
  10. serbian (root language)

The “sr-Cyrl-ME” dialect has the same ordering but with “Latn” replaced with “Cyrl”. The “serbian” dialect, which is actually just the root language, has a shorter list:

  1. “sr” (language tag)
  2. “serbian” (dialect label)
  3. “sr-Cyrl” (639-1, default script)
  4. “sr” (639-1)
  5. “srp-Cyrl” (639-2, default script)
  6. “srp” (639-2)
  7. “serbian” (root language label)

Note that in the absence of a script, tracklang will fall back on the default script (if one was registered when the language was defined). In this case the default script is “Cyrl”. Not all languages have a default script (for example, where there is disagreement over which script should be the default). If I change the line:

\TrackLanguageTag{serbian}

to

\TrackLanguageTag{serbianc}

Then the list becomes:

  1. “sr-Cyrl” (language tag)
  2. “serbianc” (dialect label)
  3. “sr-Cyrl” (639-1, default script)
  4. “sr” (639-1)
  5. “srp-Cyrl” (639-2, default script)
  6. “srp” (639-2)
  7. “serbian” (root language label)

So, returning to my example files animals-sr-Cyrl.ldf and animals-sr-Latn.ldf, the dialects identified with “sr-Cyrl-ME” and “sr-Latn-ME” will find the respective .ldf file on the fourth attempt. They would both find the animals-sr.ldf file on the fifth attempt and the animals-serbian.ldf file on the final attempt, but these tests won’t be reached as a file match is found on the fourth attempt.

In the case of the “serbianc” dialect, the corresponding animals-sr-Cyrl.ldf file is found on the first attempt. The animals-sr.ldf file would be found on the fourth attempt and the animals-serbian.ldf file would be found on the final attempt (but these won’t occur as a file match is found on the first attempt).

In the case of the “serbian” dialect, which is simply the root language without an explicit script, the animals-sr.ldf file would be found on the first attempt and the animals-serbian.ldf file would be found on the second attempt. The file animals-sr-Cyrl.ldf would be found on the third attempt, but this won’t occur as a file match has already been found.

This means that the root language without an explicit script will load the scriptless file (either animals-sr.ldf or animals-serbian.ldf, depending on what naming scheme is chosen). Since \CurrentTrackedDialectScript is “Cyrl” (the default script), the line:

\TrackLangRequireResource{sr-\CurrentTrackedDialectScript}

will load animals-sr-Cyrl.ldf. So it’s worth including the scriptless file as a fallback, which will work for both tracklang v1.4 and v1.3.

As with the English example, you may want to provide a base file for each script that contains common elements. Again, how you choose the set of provided files (for example, one per region and script) depends on how much common code they have.