
bib2gls application FAQ

What file encoding does bib2gls use?

Input Encoding:
The encoding of the .bib file should be set using the encoding comment within the .bib file. For example:
% Encoding: UTF-8

This is best placed at the start of the file, where it can easily be found. You can also use the charset resource option, but it makes more sense to identify the encoding within the .bib file itself. (If both are present, charset takes precedence.)

If the .bib file doesn’t contain an encoding comment and charset isn’t used then the default file encoding for the Java Runtime Environment is assumed (see below).
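To see which case applies to a given file, you can look for the encoding comment from the command line. The following is only a minimal sketch: bib_encoding is a hypothetical helper name, and the pattern is a simple approximation rather than the parser bib2gls itself uses.

```shell
# Hypothetical helper: print the encoding named in a .bib file's
# "% Encoding: ..." comment. Prints nothing if no such comment is found.
bib_encoding() {
  sed -n 's/^%[[:space:]]*[Ee]ncoding:[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$1" | head -n 1
}

# Usage (entries.bib is a placeholder file name):
#   bib_encoding entries.bib
```

If this prints nothing and the charset option isn't set, the Java default file encoding is the one that will be assumed.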

Output Encoding:
The encoding of the .glstex file (which is input by \GlsXtrLoadResources) may be identified in the .aux file by \glsxtr@texencoding{encoding}. The glossaries-extra package writes this information when it encounters the first instance of \GlsXtrLoadResources. It obtains encoding from \inputencodingname if it has been defined. If that command hasn’t been set but fontspec has been loaded then it will assume that the encoding is utf8. If neither \inputencodingname has been defined nor fontspec has been loaded then the information is omitted from the .aux file.

bib2gls has a set of mappings from the encoding labels used in \glsxtr@texencoding (such as utf8) to Java’s recognised encoding names (such as UTF-8). However, some inputenc labels aren’t recognised.

If the encoding information is missing from the .aux file or if the encoding label is unrecognised then bib2gls will use the default file encoding for the Java Runtime Environment (see below). If this is incorrect, then you can use the --tex-encoding command line switch to specify the encoding. Alternatively, you may want to consider changing Java’s default file encoding.
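Before reaching for --tex-encoding, it can help to check what, if anything, was written to the .aux file. This is a rough sketch that assumes the line appears exactly in the \glsxtr@texencoding{encoding} form described above; aux_tex_encoding is a hypothetical helper name.

```shell
# Hypothetical helper: print the label recorded by \glsxtr@texencoding{...}
# in a .aux file. Prints nothing if the line is absent.
aux_tex_encoding() {
  sed -n 's/^\\glsxtr@texencoding[[:space:]]*{\([^}]*\)}.*$/\1/p' "$1" | head -n 1
}

# Usage (mydoc.aux is a placeholder file name):
#   aux_tex_encoding mydoc.aux
```

If nothing is printed, or the label is one that bib2gls doesn't recognise, the Java default applies unless the encoding is supplied on the command line, for example with something like bib2gls --tex-encoding UTF-8 mydoc (where mydoc is a placeholder document name).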

The Java default file encoding is determined by the file.encoding property. This usually matches the operating system’s default file encoding, but it can be changed with the Java option -Dfile.encoding=Encoding. (The syntax is -Dproperty-name=property-value.) This may be set in the JAVA_TOOL_OPTIONS environment variable if you want this default for all your installed Java applications. The value of this environment variable is a space-separated list of Java options (switches that can be used when invoking the java command line application).
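To find out what the default currently is on your system, you can ask the java launcher to list its properties. The sketch below assumes java is on your PATH; show_java_file_encoding is a hypothetical helper name, and the exact output layout may vary between Java versions.

```shell
# Hypothetical helper: list the file.encoding setting reported by the JRE.
# -XshowSettings:properties writes to stderr, hence the 2>&1 redirection.
show_java_file_encoding() {
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH" >&2
    return 127
  fi
  java -XshowSettings:properties -version 2>&1 | grep 'file\.encoding'
}

# Usage:
#   show_java_file_encoding
```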

How you set or modify environment variables depends on your operating system. (See How do I set or change the PATH system variable? and substitute JAVA_TOOL_OPTIONS for PATH.) For example, with Bash:

JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Dfile.encoding=UTF-8"
export JAVA_TOOL_OPTIONS

Alternatively:

declare -x JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Dfile.encoding=UTF-8"

Note that the bib2gls Bash script invokes the application as follows:

if [ -z "$JAVA_TOOL_OPTIONS" ]; then
  exec java -Djava.locale.providers=CLDR,JRE,SPI -jar "$jarpath" "$@"
else
  exec java -jar "$jarpath" "$@"
fi

This means that with Unix-like systems, if you define JAVA_TOOL_OPTIONS then the script no longer sets java.locale.providers for you, so you may also need to add that property for Java 8 to ensure the CLDR is listed first (otherwise with Java 8 the JRE provider will take precedence). Windows users with Java 8 may want to set this property in JAVA_TOOL_OPTIONS regardless of whether or not they need to change the default file encoding.
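One way to set both properties in the JAVA_TOOL_OPTIONS variable tested by the script above is to append each option only if it isn't already listed. This is a minimal sketch rather than anything supplied with bib2gls; add_java_option is a hypothetical helper name.

```shell
# Hypothetical helper: append a Java option to JAVA_TOOL_OPTIONS unless it
# is already present, keeping the value a clean space-separated list.
add_java_option() {
  case " ${JAVA_TOOL_OPTIONS:-} " in
    *" $1 "*) ;;  # already listed, nothing to do
    *) JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS }$1" ;;
  esac
  export JAVA_TOOL_OPTIONS
}

add_java_option "-Dfile.encoding=UTF-8"
add_java_option "-Djava.locale.providers=CLDR,JRE,SPI"
```

Placing these lines in a shell start-up file makes the setting apply to later sessions, and the duplicate check means that sourcing the file again doesn't grow the list.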

If you have a problem with any non-ASCII characters not appearing correctly in your document:

2020-07-02 11:15:56


Permalink: https://www.dickimaw-books.com/faq.php?id=248
Alternative link: https://www.dickimaw-books.com/faq.php?itemlabel=bib2gls-encoding

Category: bib2gls application
Topic: General Queries