6.2 Using LaTeX to Sort and Collate Indexes or Glossaries (`datagidx` package)

§6.1. Using an External Indexing Application described how to create an index or glossaries using an external indexing application. Some users stumble when it comes to invoking the indexing application. There is an alternative where TeX does the sorting and collating. This bypasses the need to use makeindex, xindy or makeglossaries, but it's less efficient, so your document takes longer to build. This section describes how to do this using the datagidx package, which comes with my datatool bundle (version 2.13 or later). The documentation for datagidx is included in the datatool user manual [17].

The datatool package allows you to define databases that you can access in your document. The datagidx package has a special interface to this facility that allows you to define databases for the purposes of indexing. These databases and the terms they contain must be defined in the preamble. In this section, the term “indexing” will be used to refer to either indexes or glossaries, as the same mechanism is used for both tasks.

A new indexing database is defined using:

\newgidx{<label>}{<title>}

where <label> is a label that uniquely identifies this database and <title> is the title to be used when the index (or glossary) is displayed. For example:

\newgidx{index}{Index}

creates a new database labelled index. When the index is displayed, it will have the section heading “Index”.

As in §6.1. Using an External Indexing Application, each term in the index (or glossary) database has an associated location list. This list is initially null. The locations are added to terms used in the document on the second LaTeX run. When you display the index, only those entries with a non-null location list or a cross-reference will be shown. The default location is the page number on which the entry was referenced. The datagidx package knows about the following page numbering styles: arabic, roman, Roman, alph and Alph. If your document has another type of numbering style, or if you want to use a different counter for the location, consult the datagidx section of the datatool manual [17].

Once you have defined the indexing database, you can define terms associated with that database using:

\newterm[<options>]{<name>}

where <name> is the term and <options> is a list of <key>=<value> options. The following keys are available:

database
Identifies the database in which to store this term. For example:

\newterm[database=index]{eigenvalue}

It can be somewhat cumbersome having to type the database for each new term. Instead you can define the default database using:

\DTLgidxSetDefaultDB{<label>}

For example:

\newgidx{index}{Index}
\DTLgidxSetDefaultDB{index}
\newterm{eigenvalue}
\newterm{eigenvector}

label
A label uniquely identifying this term. If omitted the label is extracted from <name>.
sort
The sort key. If omitted this is extracted from <name>.
parent
The parent entry, if this is a sub-term. (The value should be the label identifying the parent, which must already be defined.)
text
How the entry should appear in the document text. If omitted, <name> is used. If present, <name> indicates how the term should appear in the index/glossary.
description
An optional associated description.
plural
The plural form of this term. If omitted this value is obtained by appending “s” to <name> (or the value of text if supplied).
symbol
An optional associated symbol.
short
An associated short form, if required. (Defaults to <name> if omitted.)
long
An associated long form, if required. (Defaults to <name> if omitted.)
shortplural
The plural of the associated short form. If omitted, the value is obtained by appending “s” to the short form.
longplural
The plural of the associated long form. If omitted, the value is obtained by appending “s” to the long form.
see
A cross-reference to a synonym. The value should be the label of another entry. This entry will not have a location list, just the reference to the other term.
seealso
A cross-reference to a closely related term. Both this term and the cross-referenced term should have a location list.
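
As a rough sketch of how the label, parent, text, plural, see and seealso keys might be combined, the preamble fragment below defines a small hierarchy with cross-references. The terms and labels (matrix.diagonal, array, vector) are made up for illustration and are not taken from the listings in this chapter.

```latex
% Preamble sketch: illustrative terms only.
\newgidx{index}{Index}
\DTLgidxSetDefaultDB{index}

% Top-level entry with an irregular plural.
\newterm[label={matrix},plural={matrices}]{matrix}

% Sub-entry: parent must be the label of an entry that is already defined.
\newterm
 [%
   label={matrix.diagonal},%
   parent={matrix},%
   text={diagonal matrix}% how the term appears in the document text
 ]%
 {diagonal}% how the term appears in the index

% Synonym: no location list of its own, only a cross-reference to matrix.
\newterm[label={array},see={matrix}]{array}

% Related term: keeps its own location list and also refers to matrix.
\newterm[label={vector},seealso={matrix}]{vector}
```

Explicit labels are given throughout so that the values of parent, see and seealso don't depend on how the label would be extracted from the name.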

It's also possible to add your own custom keys. See the datagidx section of the datatool user guide [17] for further details.

As with \newglossaryentry, discussed in §6.1.2. Defining Glossary Entries, if the term starts with an accented letter (or a ligature) the letter must be grouped.

Example:

\newterm[label=elite,sort=elite]{{é}lite}
\newterm
 [%
   plural={{œ}sophagi},
   label={oesophagus},
   sort={oesophagus},
   description={tube connecting throat and stomach}
 ]
 {{œ}sophagus}

There is a shortcut command for defining acronyms:

\newacro[<options>]{<short>}{<long>}

where <short> is the abbreviation and <long> is the long form. The optional argument <options> is the same as for \newterm. This is equivalent to:

\newterm
 [%
   description={\capitalisewords{<long>}},%
   short={\acronymfont{<short>}},%
   long={<long>},%
   text={\DTLgidxAcrStyle{<long>}{\acronymfont{<short>}}},%
   plural={\DTLgidxAcrStyle{<long>s}{\acronymfont{<short>s}}},%
   sort={<short>},%
   <options>%
 ]%
 {\MakeTextUppercase{<short>}}

where

\DTLgidxAcrStyle{<long>}{<short>}

formats the full version of the acronym. This defaults to: <long> (<short>), and

\acronymfont{<text>}

is the font used to format acronyms. By default this just displays its argument, but can be redefined if you want the acronyms formatted in a particular style or font (such as small-caps). The other commands used above are:

\MakeTextUppercase{<text>}

This is defined by the textcase package and converts <text> to uppercase.

\capitalisewords{<text>}

This is defined by the mfirstuc package and capitalises the first letter of each word in <text>.

Example:

\newacro{svm}{support vector machine}
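
If you want abbreviations set in small caps, as suggested above, a minimal sketch is to redefine \acronymfont in the preamble. The second redefinition is only an assumption about how you might vary the full form: it swaps the order to “<short> (<long>)”. Both are placed before \newacro here as a precaution.

```latex
% Preamble sketch, after \usepackage{datagidx}.

% Typeset abbreviations in small caps instead of the default (no change).
\renewcommand*{\acronymfont}[1]{\textsc{#1}}

% Assumed variation on the default "<long> (<short>)" full form:
% #1 is the long form, #2 is the (already formatted) short form.
\renewcommand*{\DTLgidxAcrStyle}[2]{#2 (#1)}

\newacro{svm}{support vector machine}
```

Small caps only affect lower case letters, so this works best when the abbreviation is supplied in lower case, as with svm.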

Once you have defined the terms in the preamble, you can later use them in the document:

\gls{[<format>]<label>}

\glspl{[<format>]<label>}

\Gls{[<format>]<label>}

\Glspl{[<format>]<label>}

These are similar to those described in §6.1.2. Displaying Terms in the Document, but they have a different syntax. Here <format> is the name of a text-block command (such as \textbf), without the initial backslash, that should be used to format the location for this reference. This is analogous to the | special character described in §6.1.1. Setting the Location Format.

There are also commands associated with acronyms:

\acr{[<format>]<label>}

\acrpl{[<format>]<label>}

\Acr{[<format>]<label>}

\Acrpl{[<format>]<label>}

⚠

Unlike the glossaries package, described in §6.1.2. Creating Glossaries, Lists of Symbols or Acronyms (glossaries package), there is a difference between datagidx's \gls and \acr. Here \gls will always display the value of the text field, whereas \acr will display the full form on first use (the text field) and the abbreviation on subsequent use (the short field).

You can also add terms to the index without creating any link text:

\glsadd{<label>}

This adds the term uniquely identified by <label>.

\glsaddall{<database name>}

This adds all the terms defined in the database uniquely identified by <database name>.
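
For instance, assuming the eigenvector term and a notation database like those defined in the examples in this section, the two commands might be used as follows:

```latex
% In the document body:

% Index "eigenvector" at this point without printing any text.
This paragraph discusses the topic without naming it.\glsadd{eigenvector}

% Include every term defined in the "notation" database,
% whether or not it has been referenced elsewhere.
\glsaddall{notation}
```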

Note:

⚠

Unlike most commands, the optional part (<format>) of the \gls-like and \acr-like commands above occurs inside the mandatory argument.

Examples:

Given the elite and oesophagus examples defined earlier, I can reference those entries in the text as follows:

\Gls{elite} and \glspl{oesophagus}.

This produces: Élite and œsophagi.

Elsewhere, where œsophagi are the main topic, I might have:

The \gls{[textbf]oesophagus} connects the throat and the stomach.

This produces “The œsophagus connects the throat and the stomach.” and the associated location will be typeset in bold.

Here's an example using the svm example defined earlier:

First use: \acr{svm}\@. Subsequent use: \acr{svm}\@. Full form: \gls{svm}.

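This produces: First use: support vector machine (svm). Subsequent use: svm. Full form: support vector machine (svm).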

You can unset and reset acronyms using

\glsunset{<label>}

and

\glsreset{<label>}
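
As a sketch of how these interact with \acr, assuming the svm acronym defined earlier, the comments below describe the behaviour expected from the first-use rule given above:

```latex
\acr{svm}      % first use: full form
\acr{svm}      % subsequent use: short form

\glsreset{svm} % mark svm as not yet used
\acr{svm}      % full form again

\glsunset{svm} % mark svm as already used
\acr{svm}      % short form
```

Unsetting an acronym before its first \acr can be handy if the full form has already been written out by hand, for example in an abstract.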

To display the index or glossary or list of acronyms use:

\printterms[<options>]

where <options> is a comma-separated <key>=<value> list. Common options are:

database
The label uniquely identifying the database containing the relevant terms.
postdesc
This may have the value dot (put a full stop after the description, if there is a description) or none (don't put a full stop after the description).
columns
This value must be an integer greater than or equal to 1, indicating the number of columns for the page layout.
style
The style to use. There are a number of predefined styles, such as index or gloss. See the user guide [17] for further details.
namecase
Indicates whether any case change should be applied to the entry's name. Available values are: nochange (no change), uc (convert to uppercase), lc (convert to lower case), firstuc (convert the first letter to uppercase) and capitalise (capitalise each initial letter using \capitalisewords).

For a full list of options see the datagidx section of the datatool user guide [17].
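
Putting the pieces together, a minimal complete document might look like the sketch below. The terms and the choice of option values are illustrative. Remember that the locations are only added on the second LaTeX run, so the glossary won't show the entries until then.

```latex
\documentclass{article}

\usepackage{datagidx}

% Databases and terms must be defined in the preamble.
\newgidx{glossary}{Glossary}
\DTLgidxSetDefaultDB{glossary}

\newterm
 [%
   description={a rectangular table of elements},%
   plural={matrices}%
 ]%
 {matrix}

\newterm
 [%
   description={a one-dimensional array of elements}%
 ]%
 {vector}

\begin{document}

A \gls{matrix} with a single column is a \gls{vector}.

% Single-column glossary, names with an initial capital,
% full stop after each description.
\printterms
 [%
   database=glossary,%
   style=gloss,%
   postdesc=dot,%
   columns=1,%
   namecase=firstuc%
 ]

\end{document}
```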

Listing 20 can now be rewritten as follows:

Listing 21:

% arara: pdflatex: { synctex: on }
% arara: biber
% arara: pdflatex: { synctex: on }
% arara: pdflatex: { synctex: on }
\documentclass[oneside,12pt]{scrbook}

\usepackage{datagidx}

\newgidx{index}{Index}
\newgidx{glossary}{Glossary}
\newgidx{acronym}{Acronyms}
\newgidx{notation}{Notation}

\DTLgidxSetDefaultDB{glossary}

\newterm
 [%
   description={a rectangular table of elements},% brief description
   plural={matrices}% the plural
 ]%
 {matrix}% the name

\DTLgidxSetDefaultDB{acronym}

\newacro{svm}{support vector machine}

\DTLgidxSetDefaultDB{notation}

\newterm
 [%
   label={not:set},% label
   description={A set},%
   sort={S}%
 ]%
 {\ensuremath{\mathcal{S}}}

\DTLgidxSetDefaultDB{index}

\newterm
 [%
   label={function},%
   text={function}%
 ]%
 {functions}

\newterm
 [%
   see={sqrt},%
 ]%
 {square root}

\newterm
 [%
   label={fn.sqrt},
   parent={function}
 ]%
 {\texttt{sqrt()}}

\newterm
 [%
   label={sqrt},
 ]%
 {sqrt()}

\newterm{tautology}
\newterm{contradiction}

% later in the document:

\Glspl{matrix} are usually denoted by a bold capital letter, such as $\mathbf{A}$. The \gls{matrix}'s $(i,j)$th element is usually denoted $a_{ij}$. \Gls{matrix} $\mathbf{I}$ is the identity \gls{matrix}.

First use: \acr{svm}\@. Next use: \acr{svm}\@. Full: \gls{svm}\@.

A \gls{not:set} is a collection of objects.

...

Some sample code is shown in Listing~\ref{lst:sample}. This uses the function \gls{fn.sqrt}.\glsadd{sqrt}

...

\begin{Definition}[Tautology]
A \emph{\gls{[textbf]tautology}} is a proposition that is always true for any value of its variables.
\end{Definition}

\begin{Definition}[Contradiction]
A \emph{\gls{[textbf]contradiction}} is a proposition that is always false for any value of its variables.
\end{Definition}

% At the end of the document:
\backmatter
\printterms[database=glossary]
\printterms[database=acronym]
\printterms[database=notation]
\printbibliography
\printterms[database=index]

Note that there is now no need to call either makeindex or makeglossaries. The only external application being called is biber for the bibliography.


This book is also available as A4 PDF or 12.8cm x 9.6cm PDF or paperback (ISBN 978-1-909440-02-9).
