Localisation with datatool v3.0+
The article Localisation with tracklang.tex describes the reason why I created the tracklang package. The article Integrating tracklang into Language Packages gives an example of how to integrate tracklang into a language package. The article Using tracklang in Packages with Localisation Features is for those who are writing a package that needs to detect the document’s localisation settings.
This article describes how to write a datatool v3.0+ language resource file. I recommend you first read the previous articles (particularly Using tracklang in Packages with Localisation Features) to understand the purpose of tracklang and how it’s designed to allow packages to input the appropriate language resource files.
Note that if you are using a pre-3.0 version of
datatool, there’s no localisation
support and the ldf files described below will be ignored.
Overview
As from version 3.0, the datatool package provides localisation support, although it’s actually the underlying base package, datatool-base, that deals with loading tracklang and using its interface to find and input the relevant ldf files.
Unlike datetime2, which has files such as datetime2-en-GB.ldf that are tied to a particular language and region combination, the datatool localisation support is split into language-independent region files, such as datatool-GB.ldf (provided in the separate datatool-regions bundle), and regionless language files, such as datatool-english.ldf (provided in a separate datatool-english module).
This separation of language and region means that, if a language isn’t supported, then at least the region file (if provided) can be loaded. Likewise, if the region isn’t supported but the language is, then the language file can be loaded. This allows for partial support and it also means that you can mix and match language and region.
The datatool-⟨lang⟩-⟨region⟩.ldf file is only required in cases where a setting is specifically tied to both the language and region. For example, Canadian English has a decimal dot whereas Canadian French has a decimal comma. So the region file datatool-CA.ldf defines the currency symbol (CAD) and the language file datatool-english.ldf provides the English language rules for sorting alphabetically, but an additional file datatool-en-CA.ldf is needed that sets up the default number symbols.
This means a slightly different approach is needed for finding and loading the required files. The file search performed by \TrackLangRequireDialect will find the region file, if installed, and will then stop the search, which means that the language file won’t be found. For example, if both datatool-GB.ldf and datatool-english.ldf are installed, then for the tracked locale “en-GB” only datatool-GB.ldf will be found and input.
Therefore datatool-base.sty will instead use \TrackLangRequireDialectOmitDialectLabelOmitOnlyRegion to find the separate region and language files. This command is new to tracklang v1.6.3 so, if the command isn’t defined, datatool-base.sty will fall back on \TrackLangRequireDialect with a warning.
The command \RequireDatatoolDialect is essentially:
\newcommand\RequireDatatoolDialect[1]{%
  \TrackLangRequireDialectOmitDialectLabelOmitOnlyRegion
  [%
    \ifdefempty{\CurrentTrackedRegion}{}%
    {\TrackLangRequireResource{\CurrentTrackedRegion}}%
    \TrackLangRequireResource{\CurrentTrackedTag}%
  ]%
  {datatool}{#1}%
}
(The actual definition is slightly different to allow for downgrading to use \TrackLangRequireDialect.)
For example, if the dialect is “british” then \CurrentTrackedRegion will be “GB” so this will first do:
\TrackLangRequireResource{GB}
This means that, if datatool-GB.ldf is installed (and hasn’t already been loaded), then it will be input. After that, the more usual
\TrackLangRequireResource{\CurrentTrackedTag}
is implemented. However, the use of \TrackLangRequireDialectOmitDialectLabelOmitOnlyRegion will cause datatool-british.ldf and datatool-GB.ldf to be omitted from the search. This means that datatool-english.ldf can be found and also input (if it hasn’t already been loaded).
In the case of en-CA, this will first do:
\TrackLangRequireResource{CA}
which will load datatool-CA.ldf (if installed and not already loaded). The current tracked tag is “en-CA” so the next step is:
\TrackLangRequireResource{en-CA}
This will find and input datatool-en-CA.ldf (if installed and not already loaded). This file is provided with datatool-english and ensures that the root language file is also loaded:
\TrackLangRequireResource{english}
Each tracked dialect is iterated over with:
\AnyTrackedLanguages
{%
  \ForEachTrackedDialect{\@dtl@thisdialect}%
  {%
    \RequireDatatoolDialect{\@dtl@thisdialect}%
  }%
}
{}%
(Again, the actual definition is slightly different to allow for downgrading for old versions of tracklang.)
The language code must always be included when tracking with tracklang. An error will occur if the language part of the tag is missing. However, datatool-base.sty will allow just the region code in its locales package option. In this case, it will automatically insert und- at the start of the tag when using \TrackLanguageTag. The “und” code corresponds to “undetermined” and datatool includes datatool-undetermined.ldf which will be loaded in this case. This allows for region-specific localisation without any language support.
The region files deal with setting the default currency and number characters (group separator and decimal symbol). They also provide a way of parsing numeric dates and times with the default DMY, MDY or YMD order applicable to the region. This doesn’t require too much knowledge of the region as the information is easily available from sources such as Wikipedia. The files are much the same (although some may require the currency to be defined) so it’s relatively easy for me to add them to the datatool-regions bundle.
The language files not only deal with defining token list type commands to expand to the appropriate word (or words) but also need to deal with the rather more complex lexicography associated with the language. This is beyond my skill set for anything other than English. Additionally, support can be added for parsing dates that include textual elements, such as month names. These files are therefore distributed as separate modules, which will need to be installed if any of the language elements are required. I have provided datatool-english, which can be used as an example.
This separation of language and region not only makes it easier to divide maintenance across those with the best skills but also makes it possible to mix and match language and region as well as allowing minimal regional support even if the language support is missing.
Supplementary Packages
The supplementary packages databib.sty and person.sty also have localisation support, but in this case there’s no need for the separation of language and region.
The person package only needs \TrackLangRequireDialect to find the files:
\newcommand*{\RequirePersonDialect}[1]{%
  \TrackLangRequireDialect{person}{#1}%
}
And the code to iterate over all dialects:
\AnyTrackedLanguages
{%
  \ForEachTrackedDialect{\@dtl@thisdialect}%
  {%
    \RequirePersonDialect{\@dtl@thisdialect}%
  }%
}%
{}%
Similarly for databib.sty:
\newcommand*{\RequireDataBibDialect}[1]{%
  \TrackLangRequireDialect{databib}{#1}%
}
and:
\AnyTrackedLanguages
{%
  \ForEachTrackedDialect{\@dtl@thisdialect}%
  {%
    \RequireDataBibDialect{\@dtl@thisdialect}%
  }%
}%
{}%
Region Files
The region files are all bundled in datatool-regions, which needs to be installed separately. Originally, I was going to include them with datatool but even a small update to the datatool bundle requires a time-consuming testing and build sequence to make it ready for upload, and if the package is already undergoing modifications pending a new version, those changes need to be completed. By distributing datatool-regions separately, support for a new region can quickly be added, without having to wait for the completion of a new version of datatool.
As described above, the region filename should be in the form datatool-⟨region⟩.ldf where ⟨region⟩ is the two-letter uppercase region code. The first line of the file should use \TrackLangProvidesResource to identify the file and version.
If applicable, the number group and decimal characters should be set up. This part should be omitted if the number format for the region also depends on the language. To allow for extra flexibility an intermediate command is defined, which will be added to the captions hook. For example, datatool-GB.ldf defines \datatoolGBSetNumberChars.
The currency symbol should then be defined, except for “EUR” which is already defined in datatool-base.sty. Once the currency symbol is defined, the language hook should be adjusted to switch to that currency. Again, intermediate commands allow for extra flexibility. For example, datatool-GB.ldf defines:
\newcommand\datatoolGBsetcurrency{%
  \DTLsetdefaultcurrency{GBP}%
  \renewcommand\DTLCurrentLocaleCurrencyDP{2}%
}
The numeric date and time parsing commands can also be defined, but this feature is still experimental. For example, datatool-GB.ldf defines \datatoolGBsettemporalparsers and \datatoolGBsettemporalformatters.
The token list variable \l_datatool_current_region_tl should be set (in the captions hook) to the region code.
Again, intermediate commands allow for extra flexibility, so a single command is defined to add all region settings to the captions hook. For example, datatool-GB.ldf defines \DataToolBaseGB:
\newcommand \DataToolBaseGB
 {
   \datatoolGBSetNumberChars
   \datatoolGBsetcurrency
   \datatoolGBsettemporalparsers
   \datatoolGBsettemporalformatters
   \tl_set:Nn \l_datatool_current_region_tl { GB }
 }
and then adds it to the captions hook:
\TrackLangAddToCaptions{\DataToolBaseGB}
Language Files
The root language filename should be in the form datatool-⟨language⟩.ldf where ⟨language⟩ is tracklang’s root language label. The first line of the file should use \TrackLangProvidesResource to identify the file and version.
The word handler hook \DTLCurrentLocaleWordHandler needs to be defined to ensure that word sorting will use the ordering that matches the language’s alphabet. This depends on the file encoding. Although LaTeX now defaults to UTF-8, it’s helpful to also provide support for other encodings that might be used with the language. For example, datatool-english.ldf supports UTF-8 and Latin-1. In the event that a different encoding is used, US-ASCII is provided as a fallback.
This is implemented by providing the files datatool-english-utf8.ldf, datatool-english-latin1.ldf and datatool-english-ascii.ldf. The code to load the applicable file is:
% Try loading datatool-english-<encoding>.ldf
\TrackLangRequestResource{english-\TrackLangEncodingName}
{
  % Not found, fall back on datatool-english-ascii.ldf
  \TrackLangRequireResource{english-ascii}
}
Again, intermediary commands are provided. Each encoding file defines \DTLenLocaleHandler. The language hook then needs to set \DTLCurrentLocaleWordHandler to this command.
In the case of datatool-english.ldf only the locale handler macro is sensitive to the file encoding. The rest of the definitions can all be placed in datatool-english.ldf.
The commands need \ExplSyntaxOn. Don’t forget to switch it off again with \ExplSyntaxOff at the end.
To obtain the first letter of a word using English orthography:
\newcommand \DTLenLocaleGetInitialLetter [ 2 ]
 {
   \datatool_get_first_letter:nN { #1 } #2
 }
The letter group commands:
\newcommand \DTLenSetLetterGroups
 {
   \renewcommand \dtllettergroup [ 1 ]
    { \text_titlecase_first:n { ##1 } }
   \renewcommand \dtlnonlettergroup [ 1 ] { Symbols }
   \renewcommand \dtlnumbergroup [ 1 ] { Numbers }
   \renewcommand \dtlcurrencygroup [ 2 ] { Currency }
 }
The date and time parsing commands can also be defined, but this feature is still experimental.
The token list variable \l_datatool_current_language_tl should be set (in the captions hook) to the language code.
As with the region file, there is a single command to set up everything that will be added to the captions hook:
\newcommand \DataToolBaseEnglish
 {
   \let \DTLCurrentLocaleWordHandler \DTLenLocaleHandler
   \let \DTLCurrentLocaleGetInitialLetter \DTLenLocaleGetInitialLetter
   \DTLenSetLetterGroups
   \let \DTLCurrentLocaleGetMonthNameMap \datatool_en_get_monthname_map:n
   \let \DTLCurrentLocaleIfpmTF \datatool_en_if_pm:nTF
   \tl_set:Nn \l_datatool_current_language_tl { en }
   \renewcommand \DTLandname { and }
 }
\ExplSyntaxOff
\TrackLangAddToCaptions{\DataToolBaseEnglish}
In the case of specific language and region combinations, the file should be provided with the other ldf files for that language. For example, datatool-english includes datatool-en-CA.ldf and datatool-en-ZA.ldf.
Again, the file should first identify itself with \TrackLangProvidesResource. Then it should ensure the root language is loaded. The region file should automatically be loaded before this file is loaded, but you can make sure that it has been. For example, datatool-en-CA.ldf has:
\TrackLangRequireResource{CA}
\TrackLangRequireResource{english}
Then just add the applicable commands to the captions hook. Again, intermediary commands make it easier to customize. For example, datatool-en-CA.ldf defines \datatoolEnglishCASetNumberChars to set the number group and decimal characters and \DataToolBaseEnglishCA, which is added to the captions hook:
\TrackLangAddToCaptions{\DataToolBaseEnglishCA}
The language bundle should also provide support for databib.sty and person.sty (supplementary packages supplied with datatool). These are more straightforward as they don’t typically require the region, so only databib-⟨language⟩.ldf and person-⟨language⟩.ldf need be provided. For example, datatool-english provides databib-english.ldf and person-english.ldf.
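To pull together the steps listed under Region Files, here is a minimal sketch of what a region file might look like. It is an illustration only and not one of the files shipped in datatool-regions. The region code XX, the filename datatool-XX.ldf, the version string and all the \datatoolXX… and \DataToolBaseXX command names are hypothetical. The sketch assumes the region uses a comma as the number group character and a dot as the decimal character, and that it uses the euro (so no new currency needs defining). The experimental temporal parsing commands are left out.

```latex
% Hypothetical file: datatool-XX.ldf (XX is a made-up region code)
\TrackLangProvidesResource{XX}[2025/01/01 v0.1 (illustrative sketch)]

\ExplSyntaxOn

% Intermediate command: number group and decimal characters.
% Assumption: this region uses 1,234.56 regardless of language.
\newcommand \datatoolXXSetNumberChars
 {
   \DTLsetnumberchars { , } { . }
 }

% Intermediate command: switch to the region's currency.
% EUR is already defined by datatool-base, so it only needs selecting.
\newcommand \datatoolXXsetcurrency
 {
   \DTLsetdefaultcurrency { EUR }
   \renewcommand \DTLCurrentLocaleCurrencyDP { 2 }
 }

% Single command that applies all of the region settings.
\newcommand \DataToolBaseXX
 {
   \datatoolXXSetNumberChars
   \datatoolXXsetcurrency
   \tl_set:Nn \l_datatool_current_region_tl { XX }
 }

\ExplSyntaxOff

% Add the region settings to the captions hook.
\TrackLangAddToCaptions{\DataToolBaseXX}
```

Keeping each setting in its own intermediate command mirrors the approach described for datatool-GB.ldf: a document author can redefine just one of them without having to replace the whole hook. The real region files may contain additional conditionals and currency definitions that are not shown here.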