Localisation with tracklang.tex
TeX provides the means to set up multiple hyphenation rules, so you can select the hyphenation rule for a particular language with the primitive \language〈number〉, where 〈number〉 is a number that identifies the hyphenation rule. That’s pretty much the limit of localisation support: TeX itself doesn’t really provide any commands that may vary according to language or region. The TeXbook contains a definition of \today that typesets the current date in US format as an example, but TeX doesn’t come with \today predefined.
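As a rough illustration, a US-format date command along the lines of the TeXbook example might look like the following sketch. The command name \usdate is hypothetical (chosen so that it doesn’t clash with any \today defined by a class), and the code isn’t copied from any particular class file.

```latex
\documentclass{article}
% \usdate is a hypothetical name used only for this sketch.
% \month, \day and \year are TeX primitives (integer parameters)
% that hold the current date.
\newcommand*{\usdate}{%
  \ifcase\month\or January\or February\or March\or April\or May\or June\or
    July\or August\or September\or October\or November\or December\fi
  \space\number\day, \number\year
}
\begin{document}
Today is \usdate.
\end{document}
```

The month names and the month-day-year order are hard-coded, which is why a definition like this has to be replaced for any other language or region.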
The LaTeX kernel also doesn’t define \today.¹ This command is actually defined by LaTeX classes, such as article.cls or report.cls. However, the LaTeX user manual states that \today produces the current date in the format July 29, 1985 so many classes define it in this way (unless they are designed for a specific locale).
Similarly, commands that produce fixed text, such as \chaptername and \seename, are not provided by the kernel but are defined by classes or packages that need them. For example, \chaptername is defined in report.cls and book.cls (which define \chapter) but not in article.cls (which doesn’t define \chapter). None of those classes define \seename but the makeidx package does.
With modern distributions, \languagename is defined before the class file is loaded. A simple test document:
\show\languagename
\documentclass{article}
\begin{document}
\end{document}
This will show the definition of \languagename in the transcript. However, there’s no mention of that command in the documented code of the LaTeX kernel (texdoc source2e).
So the only localisation feature of the TeX core is the ability to switch hyphenation rules. It’s possible that this may be addressed with LaTeX3. (There is some reference to localisation in the “Case-Changing” section of the “LaTeX3 Interfaces” document.)
It might be useful here to view development in a historical context:
- November 1967: ISO/R 639:1967 Symbols for languages, countries and authorities was published (now withdrawn).
- 1978: TeX was first released.
- 1985: LaTeX was first released.
- October 1986: ISO 8879:1986 Information processing — Text and office systems — Standard Generalized Markup Language (SGML) was published.
- March 1988: ISO 639:1988 Code for the representation of names of languages was published (now withdrawn).
- 1989: The World Wide Web (WWW) was invented. (Berners-Lee released his WWW software in 1991.)
- 1993: HTML was first released.
- 1994: LaTeX2e was released.
- March 1995: RFC 1766: Tags for the Identification of Languages was published. (Became obsolete with the publication of RFC 3066 in January 2001.)
- January 1997: RFC 2070: Internationalization of the Hypertext Markup Language was published. (Became obsolete with the publication of RFC 2854.)
- July 2002: ISO 639-1:2002 Codes for the representation of names of languages — Part 1: Alpha-2 code was published.
As can be seen from this timeline, TeX and LaTeX were first developed while the standards for identifying languages and other localisation information were still being formed.
The original LaTeX2.09 user manual (1985) doesn’t give much help:
See the Local Guide to find out if any foreign language versions of LaTeX are available for your computer.
The updated LaTeX2e user manual (1994) suggests the use of the babel package and directs the reader to The LaTeX Companion. Following that reference, The LaTeX Companion (1994) indicates a very simple user interface with two basic commands: \selectlanguage{language} and \iflanguage{language}{true}{false} (these days use the iflang package to test the active language). The LaTeX Companion states:
Any language that you use in your document should be declared on the \usepackage command as a language option.¹
The footnote reads:
¹In principle, since the language(s) in which a document is written is a global characteristic of the document in question, it makes good sense to declare it on the \documentclass command.
This is a really important point and one that’s not just pertinent to TeX or LaTeX. If you inspect the HTML source code for this webpage, you’ll find:
<html lang="en-GB">
which globally declares the primary language for the page.
Unfortunately the evolution of the LaTeX language packages has moved away from this key point. There’s no core framework to globally register the document languages. Why is this a problem? Surely babel, polyglossia, etc. deal with all the localisation support? Well, actually, they don’t. They mostly just provide translations for common elements, such as \chaptername, and the date format for \today. There are now thousands of packages on the Comprehensive TeX Archive Network (CTAN) and many of them provide commands that produce fixed text or data in a format that varies according to language or region.
Suppose I want to write a package that typesets invoices. This may have fixed text, such as “Description” or “Price”. It may need to display the currency sign and format the decimal part. The package therefore needs to know the document language in order to provide the relevant translations. However, there’s no standard mechanism for querying this information.
The simplest method from the package writer’s point of view is to get the document author to specify the particular language, and assume the document only has a single language. For example:
\usepackage[british]{myinvoice}
This can get rather frustrating for the document author if they require multiple packages that provide localisation support.
\usepackage[british]{foo}
\usepackage[UKenglish]{bar}
\usepackage[enGB]{baz}
\usepackage[en-GB]{wibble}
\usepackage[englishUK]{whatever}
Note that in the above, not only does each package require the localisation information in the option list but also each package has a different labelling system used to identify a particular locale.
Why can’t the package just test if \captions〈language〉 has been defined? Some do, but the code ends up quite complicated and it doesn’t warn the user about unsupported languages. For example, suppose I have a package that supports just English and French, then I would need the following tests:
- Test if \captionsamerican is defined.
- Test if \captionsaustralian is defined.
- Test if \captionsbritish is defined.
- Test if \captionscanadian is defined.
- Test if \captionsenglish is defined.
- Test if \captionsnewzealand is defined.
- Test if \captionsUKenglish is defined.
- Test if \captionsUSenglish is defined.
- Test if \captionsacadian is defined.
- Test if \captionscanadien is defined.
- Test if \captionsfrancais is defined.
- Test if \captionsfrenchb is defined.
- Test if \captionsfrench is defined.
That’s just for two languages and it’s already complicated. What if babel introduces new dialect labels (e.g. southafrican or belgique)? What if the document is using an unsupported language? For example, if the document has loaded babel with french and ngerman then my package will provide the French support but will silently ignore the German selection. The lack of warning may confuse the document author.
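To make the problem concrete, here is a minimal sketch of what the \captions〈language〉 test might look like inside a package. The package name mypackage, the command \invoicepricename and the helper \my@label are all hypothetical. The sketch assumes that the language package has already been loaded and that it defines \captions〈label〉 for each loaded dialect.

```latex
% mypackage.sty: a hypothetical package used only for this sketch.
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mypackage}
% Default (English) text. \invoicepricename is a hypothetical command.
\newcommand*{\invoicepricename}{Price}
% Loop over the French dialect labels listed above. If
% \captions<label> is defined for any of them, switch to the
% French text. This only detects that a French dialect has been
% loaded. It doesn't know which language is current, and it
% can't warn about languages that aren't in the list.
\@for\my@label:=acadian,canadien,francais,frenchb,french\do{%
  \@ifundefined{captions\my@label}{}%
    {\renewcommand*{\invoicepricename}{Prix}}%
}
\endinput
```

Even with a loop, the list of labels has to be maintained by hand, and a document that loads both french and ngerman still gets no indication that German isn’t supported.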
It would be really useful to have a list of all the document languages. [Update 2025-01-27] Fortunately, since I first wrote tracklang (2014) and this article (2019), both polyglossia and babel now provide convenient commands.
New versions of polyglossia store the language list in \xpg@loaded but, better still, polyglossia now also has \xpg@bcp@loaded which is a list of BCP 47 language tags.
In the case of babel, the list of all languages can be iterated over with \LocaleForEach. This can be combined with babel’s \getlocaleproperty to obtain the BCP 47 language tag.
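The following is a minimal sketch of that combination, assuming a recent babel release in which \LocaleForEach and the starred form of \getlocaleproperty are available. The property key identification/tag.bcp47 is taken from the layout of babel’s locale files, and \thistag is a hypothetical macro name. Check the babel manual for your installed version before relying on either.

```latex
\documentclass{article}
\usepackage[french,british]{babel}
\begin{document}
% Inside the loop body, #1 is the name of the current locale.
\LocaleForEach{%
  % The starred form doesn't raise an error if the property is missing.
  % \thistag is a hypothetical macro that receives the value.
  \getlocaleproperty*\thistag{#1}{identification/tag.bcp47}%
  #1: \thistag\par
}
\end{document}
```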
If the translator package has been loaded, then \trans@languages expands to a comma-separated list of languages (using translator’s labelling scheme).
The lack of a standardised way of conveniently identifying which languages have been loaded is a source of frustration for package writers who are trying to provide localisation support. This is the reason why I wrote the tracklang package. When I write packages that require localisation support, I now don’t need to worry about which language package has been used by the document author.
The main bulk of the tracklang code is in tracklang.tex, which is generic TeX so it can be used with other TeX formats. The tracklang.sty file is a LaTeX package that internally inputs tracklang.tex, but it also provides package options (which can also be passed through the document class options) to conveniently track predefined dialect labels. This means that if the document author does:
\documentclass[british,naustrian]{article}
\usepackage{babel}
\usepackage{mypackage}% internally loads tracklang.sty
then tracklang.sty can just pick up the document class options without having to perform cumbersome tests. If the author does:
\documentclass{article}
\usepackage[british,naustrian]{babel}
\usepackage{mypackage}% internally loads tracklang.sty
or
\documentclass{article}
\usepackage[nil]{babel}
\babelprovide[import]{british}
\babelprovide[import,main]{austrian}
\usepackage{mypackage}% internally loads tracklang.sty
then, as from tracklang v1.6.4, the languages can be detected with \LocaleForEach. This has simplified things a great deal, but if \LocaleForEach isn’t defined, then tracklang will fall back on its old behaviour, which is to test if \bbl@loaded or \xpg@bcp@loaded or \xpg@loaded is defined.
Note, however, that this requires all languages to be identified before tracklang.sty is loaded. This means that it doesn’t support “just in time” or “lazy loading” in the document. However, lazy loading is typically used for short fragments of foreign text and that context is less likely to require the full feature set for that language.
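Once the languages have been tracked, they can be queried through tracklang’s own commands. The following is a minimal sketch that lists each tracked dialect with its language tag. It assumes the commands \AnyTrackedLanguages, \ForEachTrackedDialect and \GetTrackedLanguageTag behave as described in the tracklang manual, and \thisdialect is a hypothetical loop variable. The package is loaded directly in the document here only to keep the example short.

```latex
\documentclass[british,naustrian]{article}
\usepackage{babel}
\usepackage{tracklang}% normally loaded internally by a package
\begin{document}
\AnyTrackedLanguages
 {%
   % \thisdialect is set to each tracked dialect label in turn.
   \ForEachTrackedDialect{\thisdialect}{%
     \thisdialect: \GetTrackedLanguageTag{\thisdialect}\par
   }%
 }
 {No languages have been tracked.}
\end{document}
```

A package can use the same loop to load whatever localisation files it has for each dialect, and to warn when it has no support for one of them.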
If you’re a package author and you need your package to detect the document localisation settings then the article Using tracklang in Packages with Localisation Features gives an example of how to do this.
The articles Writing a datetime2 Language Module and Localisation with datatool v3.0+ provide practical examples.
¹The definition of the command \today is shown in the LaTeX kernel documentation (texdoc source2e) but texdef -t latex -c minimal today shows that it’s not part of the minimal core code. You can, however, access the date and time information with primitives, such as \day, or with LaTeX3 commands, such as \c_sys_timestamp_str.