Bug Tracker 
ID: 297
Date: 2025-03-13 09:15:40
Last update: 2025-03-14 16:13:56
Status: Open
Category: datatool
Version: 3.1
Summary: Compilation time increase by factor 7 with new datatool version 3.1
Description
I'm using datatool to read a CSV file and use its values in my LaTeX documents. I've noticed that compilation takes significantly longer with version 3.1: an increase by a factor of 7, in fact. I cannot be certain of the reason for this increase; however, based on the console output, datatool 3.1 seems to get "stuck" on creating the .aux file for a good while. This is noticeable even with relatively short and simple CSV files.
For example, an MWE with a CSV of 15 keys and 30 lines of values leads to the following compilation times:
Run 1: 0:04.65
Run 2: 0:04.80
Run 3: 0:05.08
Now this doesn't seem like much for an MWE, but it's nevertheless a factor-of-7 increase in compilation time. This would probably not be too much of a problem if I were compiling just one document (though it would still be inconvenient), but I'm compiling over 60 documents in total, so it makes quite a difference. For me, the increase means the difference between 450 seconds (7 minutes, 30 seconds) and 3200 seconds (53 minutes, 20 seconds).
I would very much appreciate it if you could look into the reason for the increase and, if possible, provide a fix.
Note: I also have an MWE CSV file, but I cannot provide it through your upload form.
MWE
\documentclass{article}
\usepackage{datatool}
\usepackage{lipsum} % For generating lorem ipsum text

% Load the CSV file
\DTLsetseparator{,}
\DTLloaddb{exampledb}{mwe_datatools.csv}

\begin{document}
\section*{Data from CSV}

\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|}
\hline
\textbf{Key1} & \textbf{Key2} & \textbf{Key3} & \textbf{Key4} & \textbf{Key5} &
\textbf{Key6} & \textbf{Key7} & \textbf{Key8} & \textbf{Key9} & \textbf{Key10} &
\textbf{Key11} & \textbf{Key12} & \textbf{Key13} & \textbf{Key14} & \textbf{Key15} \\
\hline
\DTLforeach*{exampledb}{%
  \keyone=Key1,\keytwo=Key2,\keythree=Key3,\keyfour=Key4,\keyfive=Key5,%
  \keysix=Key6,\keyseven=Key7,\keyeight=Key8,\keynine=Key9,\keyten=Key10,%
  \keyeleven=Key11,\keytwelve=Key12,\keythirteen=Key13,\keyfourteen=Key14,%
  \keyfifteen=Key15}{%
  \keyone & \keytwo & \keythree & \keyfour & \keyfive & \keysix & \keyseven &
  \keyeight & \keynine & \keyten & \keyeleven & \keytwelve & \keythirteen &
  \keyfourteen & \keyfifteen \\
\hline
}
\end{tabular}

\section*{Lorem Ipsum}
\lipsum[1-2]
\end{document}
Evaluation
Update 2025-03-14: I've just uploaded datatool v3.2 to CTAN. It should reach the TeX distributions in a few days. When it's available, try:
\DTLread[
  format=csv,
  csv-content=no-parse,
  name=exampledb
]{mwe_datatools.csv}

This won't do any parsing of the cells and will assume the content is just text (with valid LaTeX syntax or no special characters), which should make loading faster.
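As a minimal sketch of how that option fits into a complete document (this assumes datatool v3.2 or later is installed; \DTLdisplaydb is only used here to confirm the data loaded):

```latex
\documentclass{article}
\usepackage{datatool}% csv-content=no-parse requires datatool v3.2+

% Load the CSV without parsing the cell contents
\DTLread[
  format=csv,
  csv-content=no-parse,
  name=exampledb
]{mwe_datatools.csv}

\begin{document}
% Display the database to confirm it loaded correctly
\DTLdisplaydb{exampledb}
\end{document}
```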
Original evaluation:
If you want to include a CSV file in your upload, you can embed it in the filecontents* environment (as long as it's not too big). Just put it before the \documentclass line. For example:
\begin{filecontents*}{sample-data.csv}
Key1,Key2,Key3,Key4,Key5,Key6,Key7,Key8,Key9,Key10,Key11,Key12,Key13,Key14,Key15
...
\end{filecontents*}
\documentclass{article}
I've created a test CSV file with 30 rows and 15 columns of randomly generated numbers and called the file sample-data.csv:
Key1,Key2,Key3,Key4,Key5,Key6,Key7,Key8,Key9,Key10,Key11,Key12,Key13,Key14,Key15 4493,884.600983440294,2330.70809665275,-4326.9755356328,-2190.60347054363,-4.32889457456986,-764.607303805498,307.94183066142,2152.79270420442,-266.8824320358,-917.976321490706,-4469.24178870653,-4076.2715341317,-2732.97808051424,2566.26267059342 4797,802.673035948977,-1779.84435461411,1190.266491089,-1566.17420186187,2768.79504390212,-2142.15643888977,-1788.4502952106,-4065.61059895331,3455.38903420877,3429.59174710188,-2289.90528405024,3642.55351383271,3822.49288879212,4309.05833753357 3656,-4041.45119259574,3453.24952858054,4583.74436390471,-4052.36018568065,986.183187661211,1438.22769600508,-4463.65265469367,3037.00571338563,-1601.24639850245,4318.33294102066,-547.931965558064,-4599.53436407506,-3092.38206746521,4210.93926185325 -2048,1968.79895079402,-3838.44181453021,-4374.46768689878,-3173.8588498942,3797.56605626493,-2819.07782401383,2945.80481305747,-519.721876974728,-1480.66235671397,1938.44262530796,-4242.63811583071,4726.77999588484,3033.89577122953,-4254.75947033497 -4443,3820.64017344831,4929.37293120516,1398.71456019414,2604.06281806585,1362.62717170865,-672.867997023907,-3782.05932688438,4216.67840674065,4854.08948874678,3269.71131278324,-1842.75330159529,-2459.87619893949,-3974.42183618296,-4979.93200403432 -4919,-1899.42821579972,1371.23520962646,-1661.39731484012,-1655.49648696135,-3661.58964134168,86.9254606848808,-889.757692933024,3282.20482087325,4233.11488125496,466.572840075941,2796.64860329742,1756.7013758562,3075.67486393484,4049.11523303383 -191,-3962.55793168358,-2947.64005569004,3875.18069330852,2787.82694819764,-3973.1630074909,-461.702664944106,-4788.47277753772,-4783.7268715374,-986.636978165549,4596.44762426212,-4707.75552203705,-2890.31767917354,32.7096274437363,3154.685244638 
1924,-2460.03750446409,2721.51551167418,-4510.3576169107,-1712.46839340583,-627.315396847088,2845.95323175829,1060.83868860598,4635.97224669144,2203.22284545841,-4026.95936768048,-3726.27801935987,3811.70243348535,528.211252708636,4926.48521229533 -3452,970.411303021024,-4354.11453550859,3537.74114811962,3053.53207678014,3488.61125381248,-1353.4379779259,-1020.330794749,-3152.29188418467,1944.64591957484,4665.67233609236,2773.87781944931,4911.84510905061,-102.48447880727,3890.86791969785 -1082,-2027.47471174998,-923.051809393201,-4262.54940113456,-1073.73554981699,-902.509269003637,-2128.62263435213,-740.290646544999,-3285.95187550839,3170.04354015186,-2364.34466224154,4907.86732982951,-965.626917736628,-397.946778681267,1472.20711901397 1860,-1442.84336372358,-3571.22071481424,-441.077394108902,-2416.70321809028,-51.1986118541327,1414.25939901001,-248.335820171377,-4770.66174522125,3738.5444728141,1438.90434165886,682.347023785895,2812.23532955902,-3176.40506309935,2477.50870462557 664,521.558588729611,1903.27235897449,-576.291147767413,827.013299243617,-1491.01749858421,4633.55763475285,4174.82523284992,-3422.13700162219,3286.01357037424,-3055.31092076542,-3660.90715561821,2029.02874311491,2473.64960390382,-3240.17573415738 -974,-3306.84206100912,2960.76380240972,-1307.26904522113,1080.79590425721,-267.370258207507,-965.759435319065,1493.60968398618,1212.23686721283,1416.60372142681,4241.65145458513,-3232.62326863365,1544.1525616075,4122.45792269658,2669.75274984258 -573,1506.43412270103,1196.66692494079,3033.76354580987,4493.07086624536,4648.73910841707,3936.61978228465,-1931.05136621359,2132.81038390257,3283.08093618705,-2309.00014240873,4824.43644025121,4607.81882739745,-229.644301083987,3080.65917111691 737,-4725.64054782417,334.302167717801,-1750.815405655,5.69462382738766,2651.04864339172,2415.36570361834,1153.58765550866,3488.12808118833,-2646.7186164674,439.43222278859,-4351.98048909044,-1074.03004749759,-1622.77643286934,-3572.50758495521 
2600,1927.30105411567,-1340.65065369377,848.317228341848,3775.45427032473,-734.599311211106,317.5813417689,-1465.24867221355,-4376.61515771993,-3593.71633437906,3678.5549938427,3144.08994130975,-3468.45515565619,-294.037232134805,3700.26618208605 -1376,1206.52104877905,-1198.12590384004,4204.5141351435,-4688.34970014296,1599.56577663635,-3192.86114028703,-2660.48979040512,2775.36035294904,4661.3362809893,-28.6034716875247,-293.572037240537,-1738.08734751969,2937.68251984332,-3500.21608131843 817,3786.16485437309,-3060.21809348323,-2244.54410535021,-3874.21998628319,2548.20882511996,-4148.64769648389,-2324.70096311463,-692.751836049084,-4302.98077186844,638.764430711944,-4001.11699048654,4205.58966425666,-1439.93568855887,-3671.11238502943 2974,-274.847223616987,4643.44861798796,-3907.04595869437,2218.34908861137,3700.18358416935,90.8929964397839,1957.38200008734,1067.4484117467,-1452.62644395963,-1735.46466993482,-3754.62631411835,-4734.16821466248,370.087201504603,-3150.01605146971 -4817,-4523.5328211783,-1406.7750048509,-152.647132330195,-4911.44567328234,4620.31799772159,-1464.22114656097,-3774.51805972935,943.106793330202,-2708.98632790736,-650.767274490036,1405.77382170736,3385.05764252545,-214.168175085269,3145.68992408205 -1597,-242.999711882561,3022.53900605248,2997.86230162901,1972.03296434456,-2895.31692837812,1682.07917711765,1807.97489500158,1812.96228471655,4272.73093361393,4268.9134025866,-3785.38615616641,3522.00501857485,-1443.30828393834,-2115.29633214258 -958,-2221.35783116368,1932.31402083668,2868.95153057625,-3989.84509534507,-2839.76529865637,-2451.10474873655,105.351542519081,2126.36667467628,4273.11160984875,-1046.63823320848,3789.45318058495,-3580.44545324141,-3541.66523711083,71.4736106601777 -4603,345.88255729183,-1320.25782483435,1134.44965177237,-1885.55078396579,1678.57224570405,3170.44586995568,4981.90083990522,2231.77856566235,-2803.87845950592,-1387.85071916441,-4869.99476714303,195.795867274704,677.34585581455,2942.03030553465 
4958,-1193.28631831781,-1854.25514909376,-2001.66220289201,-212.690234370001,-3723.78644004076,-3055.28649756134,-4916.65041720918,-3407.34325496811,3741.0350400479,4937.98387662828,2877.26716388494,897.968591165039,-3289.68040587629,-1808.20511111541 996,-4090.73909941867,-1356.87123423242,1088.28202867564,-2346.009198783,4288.4718702966,286.157115382331,-468.651868691339,425.391854957446,-173.587156883457,-2041.56908300025,2230.96042138771,-2079.14727500093,1559.23287221341,-2910.82405535665 4889,3339.1186791059,1115.25729411994,3967.75527674418,-725.803077412692,376.052403022875,-21.2433505957679,-4147.42604224845,1841.8089492166,-2032.72405694857,4267.27461694981,3842.32471203756,1941.74874927235,3857.217011542,3049.35755598748 2802,-3877.63713276776,2269.57310211137,2345.89940481101,1257.03197956039,-4788.510590379,-1943.55162249405,1083.10969145592,1632.45909716348,3409.80025424521,-3044.8040580421,376.844894531416,-2778.34101843883,1404.97141709154,-1905.3599833806 1761,-1565.3894312786,-2377.08602793965,3058.43682621092,-877.346774324792,3510.48660100428,-3761.1331726017,3806.68780238285,-900.640804485846,1159.80933590716,-3061.31651455253,-3967.37786053446,4190.50718952434,-2646.06963015229,-1981.72172585537 2325,-4506.02804490146,502.435639759504,1012.32406458358,1341.89014841045,-661.64526254056,-2098.45965064307,-4666.44619610541,1351.90793420509,-4388.58784654812,-2024.78192867947,-4530.69707830089,3610.45062869326,-427.086856728636,3373.90672636115 116,-1642.37925052458,2748.31615482228,-2617.39533985949,2639.35402020955,2531.53573018924,-825.864661065339,2791.71575435822,-282.435998786567,-2105.23228272972,-1996.54849123405,-2097.92345604061,715.413919613681,3943.32967832956,2134.23656837421
Starting with your example and datatool v3.1 on TeX Live 2025 with no aux file, the processing times for a single pdflatex call are:
real 0m7.720s
user 0m7.618s
sys 0m0.061s

The aux file simply contains:

\relax
\gdef \@abspage@last{2}

So I can't see any reason why there should be any delay in either reading or writing the aux file. I next tried with rollback to v2.32:
\usepackage{datatool}[=v2.32]
The processing times were:
real 0m0.819s
user 0m0.697s
sys 0m0.057s

so I agree there is certainly a difference. I think the place where it seems to be stuck is actually on the \DTLloaddb line just before the aux file is read.
To test the package load time, I just loaded the package without doing anything other than printing the lipsum text:
\documentclass{article}
\usepackage{datatool}
\usepackage{lipsum}
\begin{document}
\lipsum[1-2]
\end{document}
The processing time was:
real 0m0.563s
user 0m0.521s
sys 0m0.039s

With rollback the processing time was:

real 0m0.445s
user 0m0.392s
sys 0m0.050s

So the package loading time has gone up, but this is because new features have been added.
Now testing the time to just load the data:
\documentclass{article}
\usepackage{datatool}
\usepackage{lipsum}
\DTLsetseparator{,}
\DTLloaddb{exampledb}{sample-data.csv}
\begin{document}
\lipsum[1-2]
\end{document}
The processing time for v3.1 was:
real 0m7.483s
user 0m7.410s
sys 0m0.042s
With rollback:
real 0m0.712s
user 0m0.661s
sys 0m0.047s
The CSV file loading has been rewritten to allow for additional features and improved parsing, which now uses regular expressions for matching scientific notation, and regular expressions do unfortunately slow things down a bit. The old \DTLloaddb command has been rewritten to use the new \DTLread command. Using \DTLread explicitly (with mapping):

\DTLread[format=csv,csv-content=literal,name=exampledb]{sample-data.csv}

(This matches the behaviour of the old \DTLloadrawdb command.)
The processing time was:
real 0m9.391s
user 0m9.317s
sys 0m0.038s

If you don't have any special characters in the CSV file that need converting, it's quicker to use:

\DTLread[format=csv,csv-content=tex,name=exampledb]{sample-data.csv}

(This matches the behaviour of \DTLloaddb, so this is more directly comparable to your MWE.)
The processing time:
real 0m7.460s
user 0m7.381s
sys 0m0.050s

Using \DTLsetup{store-datum} before \DTLread also slows things down a little:

real 0m7.592s
user 0m7.520s
sys 0m0.042s
The changes in v3.0 were primarily focused on improving the support for parsing data types (including allowing scientific notation and better localisation support) and rewriting the sorting functions.
You can speed things up a bit if the data doesn't need parsing. For example, if the first column contains only integers and the other 14 columns all contain decimals with a decimal point and no number group separator:
\DTLsetup{store-datum}
\DTLread[
  format=csv,
  csv-content=tex,
  convert-numbers,
  data-types={integer,decimal,decimal,decimal,decimal,decimal,decimal,decimal,decimal,decimal,decimal,decimal,decimal,decimal,decimal},
  name=exampledb
]{sample-data.csv}
The processing times:
real 0m5.951s
user 0m5.894s
sys 0m0.034s
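Once the columns are typed and stored as datum items, the parsed values can be used directly by the numeric commands. A minimal sketch of a document-body fragment, assuming the database above has been loaded (\DTLmeanforcolumn is the standard datatool column aggregate; \colmean is just an arbitrary command name):

```latex
% In the document body: the mean of the second column,
% using the values already parsed at load time (stored in \colmean)
\DTLmeanforcolumn{exampledb}{Key2}{\colmean}
Mean of Key2: \colmean.
```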
Now consider the numeric-data.csv test data, which has 1000 rows and four columns: integer, decimal, currency and scientific notation. The test document is now:
\documentclass{article}
\usepackage{datatool}
\usepackage{lipsum}
\DTLsetup{store-datum}
\DTLread[
  format=csv,
  csv-content=tex,
  name=exampledb
]{numeric-data.csv}
\begin{document}
\lipsum[1-2]
\end{document}
Processing time:
real 0m45.237s
user 0m45.014s
sys 0m0.053s

Again, this is much longer than with the equivalent using rollback:

\documentclass{article}
\usepackage{datatool}[=v2.32]
\usepackage{lipsum}
\DTLloaddb{exampledb}{numeric-data.csv}
\begin{document}
\lipsum[1-2]
\end{document}
Processing time:
real 0m19.381s
user 0m19.259s
sys 0m0.055s
However, with the rollback version the final column, which contains scientific notation, is treated as containing only strings.
The improvements come when the data needs to be processed in some way after it has been loaded. Staying with the above example using rollback, but adding \DTLsort to sort the data according to the second field (which is a decimal value):

\documentclass{article}
\usepackage{datatool}[=v2.32]
\usepackage{lipsum}
\DTLloaddb{exampledb}{numeric-data.csv}
\DTLsort{Field2}{exampledb}
\begin{document}
\lipsum[1-2]
\end{document}
Processing time:
real 46m5.909s
user 45m54.246s
sys 0m0.660s

Now with v3.1:

\documentclass{article}
\usepackage{datatool}
\usepackage{lipsum}
\DTLsetup{store-datum}
\DTLread[
  format=csv,
  csv-content=tex,
  name=exampledb
]{numeric-data.csv}
\DTLsortdata{exampledb}{Field2}
\begin{document}
\lipsum[1-2]
\end{document}
Processing time:
real 0m57.222s
user 0m56.912s
sys 0m0.078s

This has reduced a single pdflatex run by 45 minutes.
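For reference, \DTLsortdata also accepts multiple sort criteria and sort directions. This is a sketch based on the v3.0 syntax; treat the direction keywords as an assumption and check them against the manual for your installed version:

```latex
% Sort by the second field in descending order, breaking ties
% on the first field in ascending order (direction keywords
% assumed per the v3.0 manual)
\DTLsortdata{exampledb}{Field2=descending,Field1=ascending}
```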
Testing column aggregate (calculating standard deviation). First with rollback:
\documentclass{article}
\usepackage{datatool}[=v2.32]
\usepackage{lipsum}
\DTLloaddb{exampledb}{numeric-data.csv}
\begin{document}
\DTLsdforcolumn{exampledb}{Field2}{\result}
Standard deviation: \result.
\lipsum[1-2]
\end{document}
Processing time:
real 0m31.807s
user 0m31.627s
sys 0m0.067s

Now with v3.1:

\documentclass{article}
\usepackage{datatool}
\usepackage{lipsum}
\DTLsetup{store-datum}
\DTLread[
  format=csv,
  csv-content=tex,
  name=exampledb
]{numeric-data.csv}
\begin{document}
\DTLsdforcolumn{exampledb}{Field2}{\result}
Standard deviation: \result.
\lipsum[1-2]
\end{document}
Processing time:
real 0m46.172s
user 0m45.950s
sys 0m0.049s

This is slightly slower. Alternatively:

\documentclass{article}
\usepackage{datatool}
\usepackage{lipsum}
\DTLsetup{store-datum}
\DTLread[
  format=csv,
  csv-content=tex,
  name=exampledb
]{numeric-data.csv}
\begin{document}
\DTLaction[name=exampledb,options={sd},key=Field2]{aggregate}
Standard deviation: \DTLuse{sd}.
Mean: \DTLuse{mean}.
Sum: \DTLuse{sum}.
Number of items: \DTLuse{count}.
\lipsum[1-2]
\end{document}
Processing time:
real 0m45.994s
user 0m45.746s
sys 0m0.063s
There isn't much difference in this case, except that this not only obtains the standard deviation, but also the intermediate calculations.
So the changes made for v3.0 have the most significant improvements for documents with large databases that require sorting. The reduction in build time comes from the rewritten sorting functions combined with the improved parsing, which detects scientific notation and localised formatting once, when the data is loaded, rather than repeatedly during sorting.
Since parsing CSV files has an impact on build time, an alternative approach is to use datatooltk to convert the CSV file to a file that can be quickly loaded. The version currently available on CTAN only supports the old dbtex v2.0 format. There's a new version of datatooltk that's currently under development that supports the newer dbtex v3.0 format. The document needs to be changed slightly:
\documentclass{article}
\usepackage{datatool}
\usepackage{lipsum}
\DTLread[
  format=dbtex,
  name=exampledb
]{numeric-data.dbtex}
\begin{document}
\lipsum[1-2]
\end{document}
The document build is now (using the current development version of datatooltk):
datatooltk -o numeric-data.dbtex --csv-sep ',' --noliteral --csv numeric-data.csv --output-format dbtex-3
pdflatex mwe.tex
The processing time is now:
real 0m1.736s
user 0m3.934s
sys 0m0.167s

Bear in mind that the datatooltk call is only required when the CSV file is changed, so if your document requires multiple LaTeX calls then this will help to reduce the overall build time.
I'll look into providing an option to switch off all the extra parsing for documents that don't need it.
Comments
0 comments.
Page permalink: https://www.dickimaw-books.com/bugtracker.php?key=297