Localisation with tracklang.tex
TeX provides the means to set up multiple hyphenation rules, so
you can select the hyphenation rule for a particular language
with the primitive \language〈number〉, where 〈number〉 is a number
that identifies the hyphenation rule. That’s pretty much the limit
of localisation support: TeX itself doesn’t really provide any
commands that may vary according to language or region. The TeXbook
contains a definition of \today that typesets the current date in
US format as an example, but TeX doesn’t come with \today predefined.
The LaTeX kernel also doesn’t define \today.¹
This command is actually defined by LaTeX classes, such as
article.cls or report.cls.
However, the LaTeX user manual states that \today produces the
current date in the format July 29, 1985 so many classes define it
in this way (unless they are designed for a specific locale).
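For illustration, a US-format definition along the lines of the TeXbook example can be sketched as follows. This is a minimal sketch rather than a verbatim copy of any particular class’s definition; it relies only on the TeX primitives \month, \day and \year.

```latex
% Sketch: a US-format \today built from the \month, \day and \year primitives.
% \ifcase selects the month name; case 0 is left empty because months start at 1.
\def\today{\ifcase\month\or
  January\or February\or March\or April\or May\or June\or
  July\or August\or September\or October\or November\or December\fi
  \space\number\day, \number\year}
```

Both the month names and the order of the date elements are hard-coded here, which is why a definition like this only suits one locale.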
Similarly, commands that produce fixed text, such as
\chaptername and \seename, are not provided by the kernel but are
defined by classes or packages that need them. For example,
\chaptername is defined in report.cls and book.cls (which define
\chapter) but not in article.cls (which doesn’t define \chapter).
None of those classes define \seename but the makeidx package does.
With modern distributions, \languagename is defined before the
class file is loaded. A simple test document:
\show\languagename
\documentclass{article}
\begin{document}
\end{document}
This will show the definition of \languagename in the transcript.
However, there’s no mention of that command in the documented code
of the LaTeX kernel (texdoc source2e).
So the only localisation feature of the TeX core is the ability to switch hyphenation rules. It’s possible that this may be addressed with LaTeX3. (There is some reference to localisation in the “Case-Changing” section of the “LaTeX3 Interfaces” document.)
It might be useful here to view development in a historical context:
- November 1967: ISO/R 639:1967 Symbols for languages, countries and authorities was published (now withdrawn).
- 1978: TeX was first released.
- 1985: LaTeX was first released.
- October 1986: ISO 8879:1986 Information processing — Text and office systems — Standard Generalized Markup Language (SGML) was published.
- March 1988: ISO 639:1988 Code for the representation of names of languages was published (now withdrawn).
- 1989: The World Wide Web (WWW) was invented. (Berners-Lee released his WWW software in 1991.)
- 1993: HTML was first released.
- 1994: LaTeX2e was released.
- March 1995: RFC 1766: Tags for the Identification of Languages was published. (Became obsolete with the publication of RFC 3066 in January 2002.)
- January 1997: RFC 2070: Internationalization of the Hypertext Markup Language was published. (Became obsolete with the publication of RFC 2854.)
- July 2002: ISO 639-1:2002 Codes for the representation of names of languages — Part 1: Alpha-2 code was published.
As can be seen from this timeline, TeX and LaTeX were first developed while the standards for identifying languages and other localisation information were still being formed.
The original LaTeX2.09 user manual (1985) doesn’t give much help:
See the Local Guide to find out if any foreign language versions of LaTeX are available for your computer.
The updated LaTeX2e user manual (1994) suggests the use of
the babel package and directs the reader to
The LaTeX Companion. Following that reference, The LaTeX Companion
(1994) indicates a very simple user interface with two basic
commands: \selectlanguage{language} and
\iflanguage{language}{true}{false} (these days, use the iflang
package to test the active language). The LaTeX Companion states:
Any language that you use in your document should be declared
on the \usepackage command as a language option.¹
The footnote reads:
¹In principle, since the language(s) in which a document
is written is a global characteristic of the document in question,
it makes good sense to declare it on the \documentclass command.
This is a really important point and one that’s not just pertinent to TeX or LaTeX. If you inspect the HTML source code for this webpage, you’ll find:
<html lang="en-GB">
which globally declares the primary language for the page.
Unfortunately the evolution of the LaTeX language packages
has moved away from this key point. There’s no core framework
to globally register the document languages. Why is this a
problem? Surely babel, polyglossia, etc. deal with all the
localisation support? Well, actually, they don’t. They only provide
translations for common elements, such as \chaptername. There are
now thousands of packages on the Comprehensive TeX Archive Network
(CTAN) and many of them provide commands that produce fixed text or
data in a format that varies according to language or region.
Suppose I want to write a package that typesets invoices. This may have fixed text, such as “Description” or “Price”. It may need to display the currency sign and format the decimal part. The package therefore needs to know the document language in order to provide the relevant translations. However, there’s no standard mechanism for querying this information.
The simplest method from the package writer’s point of view is to get the document author to specify the particular language, and assume the document only has a single language. For example:
\usepackage[british]{myinvoice}
This can get rather frustrating for the document author if they require multiple packages that provide localisation support.
\usepackage[british]{foo}
\usepackage[UKenglish]{bar}
\usepackage[enGB]{baz}
\usepackage[en-GB]{wibble}
\usepackage[englishUK]{whatever}
Note that in the above, not only does each package require the localisation information in the option list but also each package has a different labelling system used to identify a particular locale.
Why can’t the package just test if \captions〈language〉 has been
defined? Some do, but the code ends up quite complicated and it
doesn’t warn the user about unsupported languages. For example,
suppose I have a package that supports just English and French,
then I would need the following tests:
- Test if \captionsamerican is defined.
- Test if \captionsaustralian is defined.
- Test if \captionsbritish is defined.
- Test if \captionscanadian is defined.
- Test if \captionsenglish is defined.
- Test if \captionsnewzealand is defined.
- Test if \captionsUKenglish is defined.
- Test if \captionsUSenglish is defined.
- Test if \captionsacadian is defined.
- Test if \captionscanadien is defined.
- Test if \captionsfrancais is defined.
- Test if \captionsfrenchb is defined.
- Test if \captionsfrench is defined.
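Those tests can be sketched as follows. This is only a rough illustration: the \myinvoice@ names are hypothetical, and the two label lists simply mirror the ones above.

```latex
\makeatletter
% Hypothetical flags recording which supported languages were found.
\newif\ifmyinvoice@english
\newif\ifmyinvoice@french
% Set the English flag if any known English \captions<label> command exists.
\@for\myinvoice@label:=american,australian,british,canadian,english,%
newzealand,UKenglish,USenglish\do{%
  \@ifundefined{captions\myinvoice@label}{}{\myinvoice@englishtrue}%
}
% The same test for the known French labels.
\@for\myinvoice@label:=acadian,canadien,francais,frenchb,french\do{%
  \@ifundefined{captions\myinvoice@label}{}{\myinvoice@frenchtrue}%
}
\makeatother
```

Any label that isn’t in one of the two hard-coded lists is never tested, so it can’t be detected or reported.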
That’s just for two languages and it’s already complicated.
What if babel introduces new dialect labels (e.g. southafrican
or belgique)?
What if the document is using an unsupported language? For
example, if the document has loaded babel with french and ngerman
then my package will provide the French support but will silently
ignore the German selection. The lack of warning may confuse the
document author.
It would be really useful to have a list of all the document languages. While some language packages do provide such a list, it’s an undocumented internal command and, as such, can’t be relied upon.
For example, if the translator package has been loaded, then
\trans@languages expands to a comma-separated list of languages
(using translator’s labelling scheme). Newer versions of
polyglossia store the language list in \xpg@loaded.
Recent versions of babel define \bbl@loaded, but this only
contains a list of languages that are identified in the package
options. For example, with:
\usepackage[british,naustrian]{babel}
then \bbl@loaded is defined as naustrian,british but with:
\usepackage[nil]{babel}
\babelprovide[import]{british}
\babelprovide[import,main]{austrian}
then \bbl@loaded is simply defined as nil.
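If a package does decide to use this list, the check might look something like the following sketch. It assumes that \bbl@loaded, when defined, is a comma-separated list as described above; since it’s an internal babel command, that could change. The loop variable \my@lang is a hypothetical name.

```latex
\makeatletter
\@ifundefined{bbl@loaded}{%
  % Older babel (or no babel): the list isn't available.
  \typeout{\string\bbl@loaded\space is not available}%
}{%
  % Iterate over the labels that were given in babel's package options.
  \@for\my@lang:=\bbl@loaded\do{%
    \typeout{babel option: \my@lang}%
  }%
}
\makeatother
```

With the \babelprovide example, the only label this loop would see is nil.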
Without a list, the only way of determining which languages
have been loaded is to iterate over all known language labels and
test if \captions〈label〉 has been defined. This is now a very long
list that’s expanding over time.
What if the document isn’t using babel? Perhaps it’s using polyglossia instead. For example:
\usepackage{polyglossia}
\setmainlanguage[variant=uk]{english}
This defines \captionsenglish and \xpg@loaded expands to english,
but there’s no clue about the region. For my example invoice
package, I need to know the region in order to set the currency
symbol.
[Update: as from v1.47, polyglossia provides \xpg@bcp@loaded which
expands to a comma-separated list of BCP-47 tags.]
The lack of a standardised way of conveniently identifying which languages have been loaded is a source of frustration for package writers who are trying to provide localisation support. This is the reason why I wrote the tracklang package.
The main bulk of the tracklang code is in tracklang.tex, which is generic TeX so it can be used with other TeX formats. The tracklang.sty file is a LaTeX package that internally inputs tracklang.tex, but it also provides package options (which can also be passed through the document class options) to conveniently track predefined dialect labels. This means that if the document author does:
\documentclass[british,naustrian]{article}
\usepackage{babel}
\usepackage{mypackage}% internally loads tracklang.sty
then tracklang.sty can just pick up the document class options without having to perform cumbersome tests. If the author does:
\documentclass{article}
\usepackage[british,naustrian]{babel}
\usepackage{mypackage}% internally loads tracklang.sty
then things are a little harder for tracklang.sty but if the
version of babel is new enough to provide \bbl@loaded then it’s not
too hard as tracklang.sty just needs to iterate over the provided
list. If however the author does:
\documentclass{article}
\usepackage[nil]{babel}
\babelprovide[import]{british}
\babelprovide[import,main]{austrian}
\usepackage{mypackage}% internally loads tracklang.sty
then tracklang.sty can’t detect the actual document languages. It
only picks up nil from \bbl@loaded, which tracklang considers a
dialect of the undetermined language with ISO 639-2 code “und”. As
far as I can tell (at the time of writing this), babel doesn’t add
the language labels to any internal list when they are specified
with \babelprovide.
One possibility is for tracklang to test if \bbl@loaded is nil
and, if so, iterate through all known labels and test if
\captions〈label〉 is defined. The tracklang.tex file currently
defines around 200 root language labels and around 100 dialect
labels. That’s a lot of labels to iterate over and this doesn’t
take into account any new babel dialects that might be added in
future. It also doesn’t take into account the possibility that the
document author might do:
\documentclass{article}
\usepackage[british]{babel}
\babelprovide[import,main]{austrian}
\usepackage{mypackage}% internally loads tracklang.sty
If tracklang.sty always has to iterate over 300 labels on the
off-chance that this situation has occurred then it will result in
a slower document compilation time even if the document author
hasn’t used \babelprovide. The document author will complain to the
package author that their package is slow to load, and the package
author will complain to me that tracklang is slow to load, and
we’ll all end up grumpy and frustrated with the situation.
Currently tracklang.sty will only resort to this method if it
detects that babel has been loaded but \bbl@loaded isn’t defined or
if polyglossia has been loaded but \xpg@loaded hasn’t been defined.
If the document author really wants to use \babelprovide then
they’ll need to pass the relevant options to tracklang.sty (or
use the tracking commands provided in tracklang.tex). This can be
done in the document class options, as in the earlier example, but
this will also pass those options to babel, which is presumably not
what the document author wants (otherwise they would just pass the
options to babel). Another possibility is to load tracklang before
babel:
\documentclass{article}
\usepackage[british,austrian]{tracklang}
\usepackage[british]{babel}
\babelprovide[import,main]{austrian}
\usepackage{mypackage}% internally loads tracklang.sty
The \RequirePackage{tracklang} line in mypackage.sty will now do
nothing since tracklang.sty has already been loaded. We’re now back
to the situation where the document author has to specify the
required localisation multiple times.
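The alternative mentioned above, using the tracking commands rather than package options, might look like the sketch below. It assumes that british and austrian are both predefined dialect labels recognised by \TrackPredefinedDialect, which is one of the commands provided by tracklang.tex.

```latex
\documentclass{article}
\usepackage{tracklang}
% Explicitly track the document dialects (assumed to be predefined labels).
\TrackPredefinedDialect{british}
\TrackPredefinedDialect{austrian}
\usepackage[british]{babel}
\babelprovide[import,main]{austrian}
\usepackage{mypackage}% internally loads tracklang.sty
\begin{document}
\end{document}
```

This doesn’t remove the duplication: the languages are still named once for tracklang and once for babel.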
Ideally it would be best if all language packages used a common framework to globally register the document localisation settings. If you are a language package author and you want to use tracklang for this then the article Integrating tracklang into Language Packages gives an example of how to do this.
If you’re a package author and you need your package to detect the document localisation settings then the article Using tracklang in Packages with Localisation Features gives an example of how to do this.
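As a rough idea of what this involves, a package might start with something like the following sketch. It uses \AnyTrackedLanguages and \ForEachTrackedDialect from tracklang; mypackage and \mypackage@dialect are hypothetical names, and a real package would load its localisation files at this point instead of just writing to the transcript.

```latex
% mypackage.sty (hypothetical package)
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mypackage}
\RequirePackage{tracklang}
\AnyTrackedLanguages
{%
  % At least one language was detected: loop over the tracked dialect labels.
  \ForEachTrackedDialect{\mypackage@dialect}{%
    \PackageInfo{mypackage}{Tracked dialect: \mypackage@dialect}%
  }%
}
{%
  % Nothing was detected, so warn rather than fail silently.
  \PackageWarning{mypackage}{No document languages detected}%
}
```

Unlike the \captions tests, this can also report a language that the package doesn’t support, since every tracked dialect is visited.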
The final article in this set is Writing a datetime2 Language Module, which provides a practical example.
¹The definition of the command \today is shown in the LaTeX kernel
documentation (texdoc source2e) but texdef -t latex -c minimal today
shows that it’s not part of the minimal core code.