2.2.1 ⁑Loading Data From a CSV File
Most spreadsheet applications can export data to a CSV file. By default
datatool assumes that the data in this file is separated by commas
where the values are optionally delimited with the double-quote
character "
but if this isn't the
case you need to specify the separator using:
and the delimiter using:
A common alternative is to use the tab character ↹ as a separator but this is awkward to specify in LaTeX, so datatool provides
which is the same as \DTLsetseparator
{
⟨character⟩}
where
⟨character⟩ is a tab. Note that this
command changes the category code of the tab character. If you later
want to treat a tab as a regular space, you can reset the
category code after you have loaded the data using:
For example, suppose your data is saved in a CSV file in the form:
Surname;First Name;Title;Registration Number |Smith, Jr|;John;Mr;12345 Brown;Jane;Miss;12346 Brown;Andy;12347 Adams;|Z\"oe|then in your document, before you load this file, you need to write:
For convenience, all the examples in this book assume the default comma separator and double-quote delimiter.
isn't the same as:
If the CSV file contains extended characters,
make sure the file was saved with the same encoding as your LaTeX document and use the inputenc [40] and
fontenc [59] packages.4 (See Volume 1.)
The sample files
that accompany this
book all use UTF-8 encoding.
Make sure you load the inputenc package with the
utf8
option before you load
any of these comma-separated variable (CSV) files.
Once your data is in a CSV file you can load it into a datatool database using:
where the CSV file is called ⟨filename⟩. The argument ⟨db-name⟩ is the database label, as described above.
The \DTLloaddb
command assumes the CSV file either doesn't
contain any of TeX's special characters (see Volume 1)
or, if it does, they form correct LaTeX code. If this isn't the case,
instead of using \DTLloaddb
, you can use:
This is like \DTLloaddb
except that it performs a substitution
on nine of the ten special characters. (The backslash always retains
its special state.) The mappings are listed in
Table 2.1.
Character | Mapping |
---|---|
% | \% |
$ | \$ |
& | \& |
# | \# |
_ | \_ |
{ | \{ |
} | \} |
~ | \textasciitilde |
^ | \textasciicircum |
You can add extra mappings using:
For example, to replace the character £ with
\pounds
:
(Alternatively use the inputenc package.)
Examples
- Suppose your CSV file looks like:
Experiment,Result 1,$42.08\pm 0.1$ 2,$48.03\pm 0.2$
In this file, the $ character has been used to indicate in-line maths mode (see Volume 1). This should be left as it is when the data is loaded, so use\DTLloaddb
. - Suppose your CSV file looks like:
Title,Price "Duck & Goose's Adventures",$10.00 "The Return of Duck & Goose",$11.00
Now the characters&
and$
are intended literally and should not be interpreted by TeX, so you need to use\DTLloadrawdb
to ensure they are converted to the correct commands. - Now suppose your CSV file looks like:
Title,Price "Duck & Goose's Adventures",£10.00 "The Return of Duck & Goose",£11.00
Again you need to use\DTLloadrawdb
but before you do that you may need to define a new mapping for the £ character using\DTLrawmap
, as described above. (Or use the inputenc package.)
\DTLloaddb
and \DTLloadrawdb
can't
read data where there are EOL characters within a cell. For example,
neither command can read a CSV file that looks like:
"Duck and Goose's
Adventures",10.00
"The Return of
Duck and Goose",11.00
If you have a CSV file in this form, you can use datatooltk to convert the CSV file to a datatool (.dbtex) file, described in the next section.
You may have noticed that both \DTLloaddb
and
\DTLloadrawdb
have an optional argument. This is a key=value list.
Available keys are as follows:
noheader
- This is
a boolean key. The value can be either
false
(the CSV file has a header row, as in the examples above) ortrue
(the CSV file doesn't have a header row). The default isfalse
. If you want to set this value you may omit =true, so noheader=true is the same as just noheader. keys
- Each column must have
a unique label assigned to it. This
makes it easier to reference but, like the database label, the
column label mustn't contain special characters. The default action of
\DTLloaddb
and\DTLloadrawdb
is to use the value given in the header row as a label. This may be inappropriate, so you can set a different set of labels using this option. The value must be a comma-separated list of labels in the same order as the columns in the CSV file. For example,(note the braces).
You should use this option if the header row contains special characters. If the CSV file has no header and no label has been specified, a default label is generated in the form
\dtldefaultkey ⟨n⟩where ⟨n⟩ is the column number. By default,
\dtldefaultkey
is just “Column”, but you can change this by redefining the command (see Volume 1). Note that an empty item in thekeys
list indicates an empty label for that column. autokeys
- This is
a boolean key that was introduced in version 2.22. If true, all the
column labels will automatically be assigned the default label
\dtldefaultkey
⟨n⟩ described above. This is useful if you have a lot of columns where the header may contain special characters, and you don't want to have to list every column in thekeys
list. This means that you need to know the column index if you want to reference the data in it. headers
- Each column not
only has a unique label assigned to
it, but also has a header or
title. The column headers are used in commands such as
\DTLdisplaydb
described in §2.6 Displaying Tabulated Data. The default column headers are taken from the header row in the CSV file but if they aren't appropriate or your file doesn't have a header row, you can use this option to assign headers. As with thekeys
option, described above, the value must be a comma-separated list of header text in the same order as the columns in the CSV file. For example:(note the braces). An empty item in the
headers
list indicates an empty header for that column. For example:indicates that the second column has a blank header.
omitlines
- The value must be a non-negative number
indicating how many lines to skip at the start of the CSV file.
For example, if the CSV file contains two lines of unwanted
material at the start, then you need to use omitlines=2.
Examples
- Suppose your CSV file called products.csv looks like:
This is a list of products in my shop. Last edited 2014-01-22 Title,Price (£) "Duck & Goose's Adventures",10.00 "The Return of Duck & Goose",11.00
When you load this data into your document, you need to skip the first three lines as they don't form part of the data. You also need to map the pound symbol (£) and the ampersand (&). Additionally, it's a good idea to provide short unique labels to identify the columns:\DTLrawmap{£}{\pounds}% add mapping for £ symbol \DTLloadrawdb [% omitlines=3,% header row is on line 4 keys={title,price}% column labels ]% {products}% database label {products.csv}% filename
- Suppose your CSV file called products.csv looks like:
"Duck and Goose's Adventures",10.00 "The Return of Duck and Goose",11.00
Here there isn't a header row. You could assign labels as in the previous example:\DTLloaddb [% noheader,% no header row in file keys={title,price},% column labels headers={Title,Price (\pounds)}% column titles ]% {products}% database label {products.csv}% filename
However, if you're not interested in referencing any columns (for example, you just want to display the data in a table, as described in §2.6 Displaying Tabulated Data) you can let datatool assign default labels:
\DTLloaddb [% noheader,% no header row in file headers={Title,Price (\pounds)}% column titles ]% {products}% database label {products.csv}% filename
Footnotes
- ... packages.4
- Just use fontspec [76] for XeLaTeX.
This book is also available as A4 PDF or 12.8cm x 9.6cm PDF or paperback (ISBN 978-1-909440-07-4).