Integrating tracklang into Language Packages

The article Localisation with tracklang.tex explains why I created the tracklang package. This article gives an example of how to integrate tracklang into a language package so that other packages can pick up the document’s localisation settings.

If you are writing a package that needs to detect the document’s localisation settings, then read the next article Using tracklang in Packages with Localisation Features.

Let’s suppose I want to write a package called martian that provides language settings for a fictional Martian language. The root language label will be “martian”. The 639-1 language code is “mx”, and the 639-2 (B) and 639-2 (T) language codes are both “mxx”. The 639-3 code is “mas”. The Martian language is written in a left-to-right script called “Marshy” that has the ISO 15924 code “Qabx” (numeric code 949).

The Martian language comes in two dialects: Upper Martian (spoken in the Upper Marzy region) and Lower Martian (spoken in the Lower Marzy region). The ISO 3166 region codes are: “XX” (alpha-2), “XXB” (alpha-3) and 900 (numeric) for Upper Marzy, and “ZZ” (alpha-2), “XXG” (alpha-3) and 901 (numeric) for Lower Marzy.

The Martians have had difficulties sending messages to Earth because most of us don’t have the Marshy script installed, so they have started to adopt Earth scripts for some of their dialects. There are now both Latin and Cyrillic versions of the Upper Martian dialect, but they haven’t caught on in Lower Marzy yet, except amongst academics and a small group of Earth-watching enthusiasts.

The martian package needs to provide a command for the document author to specify the language or dialects that the document is written in. I don’t want to overly complicate this example by including code that’s not relevant to tracklang. If you’re writing a package then you most likely already know how to provide key=value options. This is very much a bare-bones example with a simplistic command:

\loadmartian{script}{region code}{variant}

This loads the Martian language with the given script, region code and variant. The variant part refers to the variant subtag of the BCP 47 language tag. A more practical real-world command would most likely have these tags set in an optional argument that may also include other elements, such as a sub-language or non-ISO modifier.

For example, a document written in a combination of Upper Martian Latin script using the “odyssey” orthography and Lower Martian using the defaults for that dialect would have:

\usepackage{martian}
\loadmartian{Latn}{XX}{odyssey}
\loadmartian{}{ZZ}{}

Most of the \loadmartian definition is omitted in this example, as the actual hyphenation patterns, captions etc aren’t relevant to tracklang. In order to let other packages know that this particular dialect has been loaded, \loadmartian needs to register the particular combination of root language, region, script, variant and other associated elements with tracklang so it can be added to the list of tracked dialects.

For convenience, the martian package provides some predefined dialect labels for commonly used combinations so that the document author can set up the document language through the package options. For example, the label “uppermartianln” could represent the Upper Martian dialect written in the Latin script.

\usepackage[uppermartianln]{martian}

Finally, the package has a command that can be used in the document to switch to a Martian dialect (that has earlier been identified with \loadmartian or through a package option).

\selectmartianlanguage{label}

Loading tracklang

The main bulk of tracklang’s code is in the generic TeX file tracklang.tex. If you are using plain TeX, you can simply input it:

\input tracklang.tex

(There’s a check at the start of the file to determine if it’s already been input.) There’s also a LaTeX style file tracklang.sty which inputs tracklang.tex, but it does more than that. It also provides package options for all predefined dialects and, if none are used, tries to detect if known language packages have been loaded.

Obviously, if your package is a language package then the language package detection is a needless waste of time. If you don’t want tracklang to pick up any document class options then it’s simplest to just input tracklang.tex. However, if you’re happy for tracklang to pick up document class options but you want to skip the known language package checks, then define \@tracklang@prelangpkgcheck@hook to do \endinput before you load tracklang.sty. For example:

\providecommand*{\@tracklang@prelangpkgcheck@hook}{\endinput}
\RequirePackage{tracklang}[2019/11/30]

Note that here I’ve stipulated at least v1.4 of tracklang (2019/11/30) as some of the commands mentioned below were introduced in that version.

In my martian example, I'm going to allow tracklang.sty to try detecting babel or polyglossia because the document author may want to have a mixture of Martian and Earth languages. So I just have:

\RequirePackage{tracklang}[2019/11/30]

Detecting Already Tracked Dialects

It’s possible the document author already tracked some document dialects before loading the language package. This may have been done by passing document class options that were picked up when tracklang.sty was loaded with \RequirePackage or they may have used texosquery to pick up their default locale settings before loading the language package. For example:

\input{texosquery}
\input{tracklang}
\TeXOSQueryLangTag{\langtag}
\TrackLanguageTag{\langtag}

\usepackage{martian}

You can find out if any languages have already been tracked using:

\AnyTrackedLanguages{true}{false}

This obviously isn’t practical for my martian package since the language has to be defined before it can be tracked, but if your package doesn’t define any languages that are unknown to tracklang, then you may want to check for dialects that have already been tracked. If you do, then perform the check before any of your package code starts tracking dialects (for example, when processing package options).

If there are any tracked languages, you can iterate over all the tracked dialects with:

\ForEachTrackedDialect{cs}{body}

where cs is the control sequence that’s set to the dialect label at the start of each iteration in the loop. For example:

\AnyTrackedLanguages
{%
  \ForEachTrackedDialect{\thisdialectlabel}{%
% load hyphenation patterns, define hooks etc for this dialect
% ...
  }
}
{}% no tracked languages

Dialect Labels

The tracklang package has labels to identify root languages and labels to identify dialects. These make it possible to find the root language from a given tracked dialect label or to find all the tracked dialects for a particular root language. If a root language has a synonym that doesn’t provide any additional information then that synonym is a dialect of the root language.

So if you have a label identifying a root language that’s different from the root language label used by tracklang then you’ll need to add your label as a dialect of the tracklang label. For example, tracklang defines the “undetermined” language using:

\TrackLangNewLanguage{undetermined}{}{und}{}{}{}{Latn}

The babel package uses the label “nil” for an unspecified language so tracklang provides a dialect that’s essentially a synonym:

\TrackLangProvidePredefinedDialect{nil}{undetermined}{}{}{}{}{}
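
The lookups between dialect labels and root language labels mentioned at the start of this section can be sketched as follows. This is only an illustration: it assumes that \TrackedLanguageFromDialect and \TrackedDialectsFromLanguage behave as described in the tracklang manual, and it tracks the “nil” dialect purely to have something to query.

```latex
\documentclass{article}
\usepackage{tracklang}[2019/11/30]

% Track the predefined "nil" dialect (a synonym of "undetermined").
\TrackPredefinedDialect{nil}

\begin{document}
% Root language label for the tracked dialect "nil":
Root language: \TrackedLanguageFromDialect{nil}.

% Comma-separated list of tracked dialects for the root language:
Tracked dialects: \TrackedDialectsFromLanguage{undetermined}.
\end{document}
```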

It may be that your dialect labels are less specific than tracklang’s. For example, in my martian package, I may simply want the label “lowermartian” (in caption hooks etc) regardless of the variant, modifier or sub-language. In which case, I need to provide a mapping from tracklang’s dialect label to the label used in the hooks. This is done with:

\SetTrackedDialectLabelMap{tracklang-label}{hook-label}

This will allow \TrackLangAddToHook and \TrackLangRedefHook to find the correct hook control sequence name. Note that you don’t need to provide a mapping if the hook label is the same as the root language label for that dialect.

You can obtain the hook label from the tracklang dialect label with:

\GetTrackedDialectToMapping{dialect}

This is fully expandable (so you can use it in \edef) and will expand to dialect if no mapping exists.
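
As a minimal sketch of the mapping, suppose the tracklang dialect label is “martianLatnXXodyssey” and the corresponding hook label is “martianln”. Both labels and the two storage macros are just illustrative choices for this example:

```latex
% Map the tracklang dialect label to the label used in the hooks
% (for example, \captionsmartianln).
\SetTrackedDialectLabelMap{martianLatnXXodyssey}{martianln}

% A mapping exists, so this stores the hook label (martianln):
\edef\thismartianhooklabel{%
  \GetTrackedDialectToMapping{martianLatnXXodyssey}}

% No mapping has been set for this label, so the expansion
% is the dialect label itself (lowermartian):
\edef\anothermartianhooklabel{%
  \GetTrackedDialectToMapping{lowermartian}}
```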

Defining New Scripts

The chances are that your language package will be using scripts that are already known to tracklang (defined in tracklang-scripts.tex). In which case, skip this section.

The script “Qabx” is actually defined in tracklang-scripts.tex (reserved for private use) but I want to redefine it as identifying the Marshy script. Any unknown scripts or scripts that need to be redefined should be added to a supplementary file and that file should be identified with \TrackLangAddExtraScriptFile. I’ve decided to call this file martian-scripts.tex so I need to identify it with:

\TrackLangAddExtraScriptFile{martian-scripts.tex}

The contents of this file are quite trivial in this example, as there’s only one new script:

\TrackLangScriptMap{Qabx}{949}{Marshy script}{LR}{}%
\endinput

The tracklang-scripts.tex file isn’t loaded automatically. It can be input if required either explicitly with \input or through the wrapper LaTeX package. In the event that it’s input part way through the document, any end of line characters should be commented out to avoid spurious spaces.

The \TrackLangAddExtraScriptFile command ensures that the named file is input at the end of tracklang-scripts.tex. If that file has already been loaded then \TrackLangAddExtraScriptFile will input the named file directly.
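
The following sketch shows one way to check that the supplementary script file has been picked up. It assumes that martian-scripts.tex is on TeX’s path, that the wrapper package is called tracklang-scripts, and that the query commands \TrackLangScriptIfKnownAlpha, \TrackLangScriptAlphaToName, \TrackLangScriptAlphaToNumeric and \TrackLangScriptAlphaToDir are available as described in the tracklang manual. Check the manual for your installed version before relying on them.

```latex
\documentclass{article}
\usepackage{tracklang}[2019/11/30]

% Register the supplementary file before the script data is loaded.
\TrackLangAddExtraScriptFile{martian-scripts.tex}

% Load tracklang-scripts.tex through the LaTeX wrapper package
% (this should input martian-scripts.tex at the end).
\usepackage{tracklang-scripts}

\begin{document}
\TrackLangScriptIfKnownAlpha{Qabx}%
{%
  Name: \TrackLangScriptAlphaToName{Qabx}.
  Numeric code: \TrackLangScriptAlphaToNumeric{Qabx}.
  Direction: \TrackLangScriptAlphaToDir{Qabx}.
}%
{Qabx is not a known script.}
\end{document}
```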

Defining New Regions

The tracklang-region-codes.tex file defines the ISO 3166 region codes that are recognised by tracklang. If all the regions that you support are listed in that file then skip this section.

As with new scripts, new regions are defined in a supplementary file. In this case, the file is identified with \TrackLangAddExtraRegionFile. This works in much the same way as \TrackLangAddExtraScriptFile.

I’ve decided to call the region file martian-region-codes.tex so I need to identify it with:

\TrackLangAddExtraRegionFile{martian-region-codes.tex}

In this file I need to define the Upper Marzy (XX) and Lower Marzy (ZZ) regions:

\TrackLangRegionMap{900}{XX}{XXB}%
\TrackLangRegionMap{901}{ZZ}{XXG}%
\endinput
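
A hedged sketch of how the new regions might be checked from within martian.sty (where @ already has the letter category code). It assumes that the region data is loaded by inputting tracklang-region-codes.tex directly, and that \TrackLangIfKnownAlphaIIRegion and \TrackLangAlphaIIToNumericRegion are available as described in the tracklang manual:

```latex
% Fragment for martian.sty, after \RequirePackage{tracklang}.

% Register the supplementary file, then load the region data
% (the supplementary file should be input at the end of it).
\TrackLangAddExtraRegionFile{martian-region-codes.tex}
\input{tracklang-region-codes}

% Write a message to the transcript for the Upper Marzy region.
\TrackLangIfKnownAlphaIIRegion{XX}%
{%
  \typeout{Region XX has numeric code
    \TrackLangAlphaIIToNumericRegion{XX}}%
}%
{%
  \typeout{Region XX is not known}%
}
```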

Defining a New Root Language

There are approximately 200 root languages defined in tracklang.tex along with their associated ISO 639 codes, so the chances are that the languages your package supports are already defined, in which case skip this part.

A new root language is defined using:

\TrackLangNewLanguage{label}{639-1}{639-2 (T)}{639-2 (B)}{639-3}{3166-1}{default script}

If you want to be able to conveniently track the new language with \TrackPredefinedDialect then use instead:

\TrackLangDeclareLanguageOption{label}{639-1}{639-2 (T)}{639-2 (B)}{639-3}{3166-1}{default script}

This first does \TrackLangNewLanguage to define the language, and then does \TrackLangProvidePredefinedLanguage{label}, which defines \@tracklang@add@〈label〉 to perform the actual tracking (used by \TrackPredefinedDialect).

In this example, the Martian language isn’t recognised by tracklang so it needs to be defined:

\TrackLangDeclareLanguageOption{martian}{mx}{mxx}{}{mas}{}{Qabx}

This internally performs the two steps:

\TrackLangNewLanguage{martian}{mx}{mxx}{}{mas}{}{Qabx}
\TrackLangProvidePredefinedLanguage{martian}

Note that the 639-2 (B) argument is empty as it’s identical to the 639-2 (T) code. The 3166-1 code is also empty as the language is spoken in multiple regions.
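
Once the language has been declared in this way, it can be tracked and queried like any language that tracklang already knows about. The fragment below is a sketch of this, assuming it is placed after the declaration and that \IfTrackedLanguage and \TrackedIsoCodeFromLanguage behave as described in the tracklang manual:

```latex
% Declare the new root language (as above).
\TrackLangDeclareLanguageOption{martian}{mx}{mxx}{}{mas}{}{Qabx}

% Track the root language using its own label.
\TrackPredefinedDialect{martian}

% Write the tracked ISO 639 codes to the transcript.
\IfTrackedLanguage{martian}%
{%
  \typeout{639-1 code: \TrackedIsoCodeFromLanguage{639-1}{martian}}%
  \typeout{639-3 code: \TrackedIsoCodeFromLanguage{639-3}{martian}}%
}%
{%
  \typeout{martian has not been tracked}%
}
```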

Tracking a Dialect

Tracking a dialect means registering a particular combination of language, region, script etc as being required by the document. For this, you need tracklang’s root language label and dialect label. If the root language hasn’t already been defined, you need to define it first (see above). The complete set of commands required to track a dialect is listed below. If an element (such as the region or variant) is missing, then skip the associated command. In the definitions below, dialect indicates the tracklang dialect label, root indicates the tracklang root language label and hook-label indicates the label that you use in your associated \captions〈hook-label〉 command.

\AddTrackedDialect{dialect}{root}
Identifies dialect as a dialect of the given root language and adds dialect to the list of tracked dialects (which can be iterated over with \ForEachTrackedDialect). If the root language hasn’t already been identified by another tracked dialect, then root is also added to the list of tracked languages (which can be iterated over with \ForEachTrackedLanguage).
\AddTrackedLanguageIsoCodes{root}
Tracks the associated ISO language codes (identified when the root language was defined).
\AddTrackedCountryIsoCode{dialect}
If the ISO 3166-1 region code was supplied when the root language was defined, this will track that region (does nothing if no region was supplied). If no region was supplied when the root language was defined or if you want to associate the dialect with a different region, then use: \AddTrackedIsoLanguage{3166-1}{region code}{dialect} instead.
\AddTrackedIsoLanguage{code identifier}{value}{label}
Associates the given ISO code with the given dialect or root language. This is used internally by \AddTrackedLanguageIsoCodes and \AddTrackedCountryIsoCode but may be used explicitly for codes that haven’t already been supplied for the dialect or its root language. This will mostly only be required to assign a region code to a dialect.
\SetTrackedDialectVariant{dialect}{variant subtag}
Associates the given BCP 47 variant subtag with the given dialect.
\SetTrackedDialectModifier{dialect}{modifier}
Associates the given modifier with the given dialect. (This is just extra information that doesn’t fit the BCP 47 specifications, such as “new” or “ancient”.)
\SetTrackedDialectScript{dialect}{script}
Associates the given script tag (alpha-4) with the given dialect.
\SetTrackedDialectSubLang{dialect}{tag}
Sets the sub-language subtag for the given dialect.
\SetTrackedDialectAdditional{dialect}{value}
Sets the extension subtag for the given dialect.
\SetTrackedDialectLabelMap{dialect}{hook-label}
Assigns a mapping between the tracklang dialect label and your language package’s hook label.
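
Put together, the commands above might be used as follows to track Lower Martian written in the Cyrillic script. This is only a sketch: the dialect label “lowermartiancy” and the hook label “martiancy” are illustrative choices, and it assumes the martian root language and the ZZ region have already been defined. The variant, modifier and sub-language commands are skipped because this dialect doesn’t have those elements.

```latex
% Register the dialect under the root language "martian".
\AddTrackedDialect{lowermartiancy}{martian}

% Track the ISO 639 codes supplied when "martian" was defined.
\AddTrackedLanguageIsoCodes{martian}

% No region was supplied for the root language, so assign the
% Lower Marzy region to the dialect explicitly.
\AddTrackedIsoLanguage{3166-1}{ZZ}{lowermartiancy}

% Cyrillic script instead of the default Marshy script.
\SetTrackedDialectScript{lowermartiancy}{Cyrl}

% Hooks for this dialect use the label "martiancy".
\SetTrackedDialectLabelMap{lowermartiancy}{martiancy}
```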

For convenience, tracklang provides:

\TrackLangProvidePredefinedDialect{dialect}{root}{region}{modifier}{variant}{hook-label}{script}

This defines a command called \@tracklang@add@〈dialect〉 that performs the above commands (where the associated information has been provided). The dialect can then be later tracked with \TrackPredefinedDialect{dialect}.

For example, my martian package might provide the following predefined dialects:

\TrackLangProvidePredefinedDialect{uppermartian}{martian}{XX}{}{}{}{}
\TrackLangProvidePredefinedDialect{lowermartian}{martian}{ZZ}{}{}{}{}
\TrackLangProvidePredefinedDialect{uppermartianln}{martian}{XX}{}{}{}{Latn}
\TrackLangProvidePredefinedDialect{uppermartiancy}{martian}{XX}{}{}{}{Cyrl}

I could then have package options that automatically track these predefined dialects:

\DeclareOption{uppermartian}{%
  \TrackPredefinedDialect{uppermartian}% track this dialect
  % Set up hyphenation patterns, captions etc (or defer to later)
  %...
}
\DeclareOption{uppermartianln}{%
  \TrackPredefinedDialect{uppermartianln}% track this dialect
  % Set up hyphenation patterns, captions etc (or defer to later)
  %...
}
\DeclareOption{uppermartiancy}{%
  \TrackPredefinedDialect{uppermartiancy}% track this dialect
  % Set up hyphenation patterns, captions etc (or defer to later)
  %...
}

My \loadmartian command will need to track the dialect on-the-fly (rather than relying on a predefined dialect label). The first thing this command needs to do is create a unique tracklang dialect label for the exact combination of localisation elements. Since none of the elements contain any awkward special characters, the easiest method is to simply concatenate all the elements. In my example martian package, I’m only considering the script (first argument of \loadmartian), region (second argument), and variant (third argument). So I define the label like this:

  \def\thismartiandialectlabel{martian#1#2#3}

Let’s suppose now that my language hooks don’t take the variant into account, so I only have \captionsmartian (regionless Marshy script), \captionsuppermartian (Upper Martian with Marshy script), \captionslowermartian (Lower Martian with Marshy script), \captionsmartianln (Martian with Latin script) and \captionsmartiancy (Martian with Cyrillic script).

Assuming that the hook label is stored in \thismartianhooklabel, then the mapping needs to be assigned with \SetTrackedDialectLabelMap:

\SetTrackedDialectLabelMap{\thismartiandialectlabel}{\thismartianhooklabel}

My definition of \loadmartian needs some way of working out this hook label. For example:
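
The following is only a sketch of one possible approach, assuming the five hooks listed above. The helper \martian@sethooklabel and the other \martian@... macros are hypothetical names introduced for this illustration. The script takes precedence over the region because the Latin and Cyrillic hooks are regionless.

```latex
\makeatletter % not needed inside martian.sty

% Constants for the \ifx comparisons below.
\def\martian@Latn{Latn}
\def\martian@Cyrl{Cyrl}
\def\martian@XX{XX}
\def\martian@ZZ{ZZ}

% #1 = script, #2 = region (the first two arguments of \loadmartian)
\newcommand*{\martian@sethooklabel}[2]{%
  \def\martian@script{#1}%
  \def\martian@region{#2}%
  \ifx\martian@script\martian@Latn
    \def\thismartianhooklabel{martianln}% Latin script
  \else\ifx\martian@script\martian@Cyrl
    \def\thismartianhooklabel{martiancy}% Cyrillic script
  \else\ifx\martian@region\martian@XX
    \def\thismartianhooklabel{uppermartian}% Upper Marzy, Marshy script
  \else\ifx\martian@region\martian@ZZ
    \def\thismartianhooklabel{lowermartian}% Lower Marzy, Marshy script
  \else
    \def\thismartianhooklabel{martian}% regionless Marshy script
  \fi\fi\fi\fi
}

\makeatother
```

Within \loadmartian, this helper could be called as \martian@sethooklabel{#1}{#2} before the mapping is assigned.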

Selecting the Current Language

My martian package needs a command to switch language (set hyphenation patterns, redefine \languagename, implement hooks etc). It’s useful if this command also does:

\SetCurrentTrackedDialect{label}

This means that the document author can now query the current localisation elements with commands like \CurrentTrackedRegion. The label argument may be the tracklang dialect label or the hook label that has previously been assigned with \SetTrackedDialectLabelMap or the root language label. In general, it’s better to use the tracklang dialect label as there may be multiple tracklang dialects that map to the same hook label or that use the same root language.

Example Code

First an example document that uses the martian package:

\documentclass{article}

\usepackage[uppermartianln]{martian}

\begin{document}
\selectmartianlanguage{uppermartianln}

Current language: \CurrentTrackedLanguage.
Current dialect: \CurrentTrackedDialect.
Current region: \CurrentTrackedRegion.
Current script: \CurrentTrackedDialectScript.
\end{document}

Next the martian-scripts.tex file:

\TrackLangScriptMap{Qabx}{949}{Marshy script}{LR}{}%
\endinput

And the martian-region-codes.tex file:

\TrackLangRegionMap{900}{XX}{XXB}%
\TrackLangRegionMap{901}{ZZ}{XXG}%
\endinput

Skeletal martian.sty file:

\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{martian}
% Uncomment the following line to prevent tracklang.sty from 
% testing if language packages like babel have been loaded:
%\providecommand*{\@tracklang@prelangpkgcheck@hook}{\endinput}
\RequirePackage{tracklang}[2019/11/30]

% SECTION: DEFINING NEW SCRIPTS

% This section should be omitted if the scripts
% are already recognised by tracklang.

\TrackLangAddExtraScriptFile{martian-scripts.tex}

% SECTION: DEFINING NEW LANGUAGE

% This section should be omitted for root languages 
% recognised by tracklang.

\TrackLangDeclareLanguageOption{martian}{mx}{mxx}{}{mas}{}{Qabx}

% SECTION: DEFINING NEW REGIONS (ISO 3166 COUNTRY CODES)

% This section should be omitted for regions
% recognised by tracklang.

\TrackLangAddExtraRegionFile{martian-region-codes.tex}

% SECTION: PREDEFINED DIALECTS

% Omit this section if you don't want to predefine any 
% tracklang dialect labels. The hook-label argument is
% left empty here. It will need to be filled in if it should 
% be different from the first argument.

\TrackLangProvidePredefinedDialect{uppermartian}{martian}{XX}{}{}{}{}
\TrackLangProvidePredefinedDialect{lowermartian}{martian}{ZZ}{}{}{}{}

\TrackLangProvidePredefinedDialect{uppermartianln}{martian}{XX}{}{}{}{Latn}
\TrackLangProvidePredefinedDialect{uppermartiancy}{martian}{XX}{}{}{}{Cyrl}

% SECTION: PACKAGE OPTIONS

% In this example, the package options simply track the predefined
% dialects.

\DeclareOption{martian}{\TrackPredefinedDialect{martian}}
\DeclareOption{uppermartian}{\TrackPredefinedDialect{uppermartian}}
\DeclareOption{uppermartianln}{\TrackPredefinedDialect{uppermartianln}}
\DeclareOption{uppermartiancy}{\TrackPredefinedDialect{uppermartiancy}}
\DeclareOption{lowermartian}{\TrackPredefinedDialect{lowermartian}}

\ProcessOptions

% SECTION: LANGUAGE INITIALISATION

% This section is unrelated to tracklang but provides a stub
% to allow the example document to compile:

\newcommand{\@loadmartian}[1]{%
%
% This command definition is unrelated to tracklang and would deal
% with whatever needs to be done to set up the document language
% settings (hyphenation patterns, language hooks etc).
% The argument is the hook label.
% ...
}

% SECTION: DETECT TRACKED LANGUAGES

% Pick up any tracked dialects. These will include those identified
% in the package options but also any that may have been tracked
% before this package was loaded.

\AnyTrackedLanguages
{%
  \ForEachTrackedDialect{\thisdialectlabel}{%
% load hyphenation patterns, define hooks etc for this dialect
% ...
% Get the hook label from the tracklang dialect label
   \edef\thismartianhooklabel{\GetTrackedDialectToMapping{\thisdialectlabel}}%
   \@loadmartian{\thismartianhooklabel}%
  }
}
{}% no tracked languages

% SECTION: TRACK AND LOAD

% To avoid overly complicating this example, I'm just going 
% to use \ifx to detect if an argument is empty (none of the arguments
% here should contain \relax as the tags are all alphanumeric). A real package 
% would need a more appropriate test.

\newcommand{\loadmartian}[3]{%
  \def\thismartiandialectlabel{martian#1#2#3}% tracklang dialect label
% (You may want to test if this dialect label has already been tracked
% using \IfTrackedDialect.)
  \AddTrackedDialect{\thismartiandialectlabel}{martian}%
  \AddTrackedLanguageIsoCodes{martian}%
  \ifx\relax#1\relax
  \else
    \SetTrackedDialectScript{\thismartiandialectlabel}{#1}% script
  \fi
  \ifx\relax#2\relax
  \else
    \AddTrackedIsoLanguage{3166-1}{#2}{\thismartiandialectlabel}% region
  \fi
  \ifx\relax#3\relax
  \else
   \SetTrackedDialectVariant{\thismartiandialectlabel}{#3}% variant
  \fi
% Similarly for other elements, such as sub-language or modifier.
% Adapt as appropriate.
% ...
% For simplicity, I'm just going to set the hook label to 'martian'.
% Adapt as appropriate.
 \def\thismartianhooklabel{martian}%
% Omit the mapping if the hook label is the same as the dialect label.
 \ifx\thismartiandialectlabel\thismartianhooklabel
 \else
  \SetTrackedDialectLabelMap{\thismartiandialectlabel}{\thismartianhooklabel}%
 \fi
% That's all for the tracklang stuff. The rest of this command involves loading
% hyphenation patterns, defining \languagename, setting up language hooks
% etc.
  \@loadmartian{\thismartianhooklabel}%
}

% SECTION: LANGUAGE SELECTIONS

\newcommand{\selectmartianlanguage}[1]{%
  \SetCurrentTrackedDialect{#1}%
% The rest of this command definition is unrelated to tracklang and
% would deal with setting \languagename, implementing language hooks
% and switching hyphenation rules etc.
}

\endinput