Bug Tracker
ID | 206 |
---|---|
Date | 2022-06-16 13:45:31 |
Last update | 2022-10-19 19:31:45 |
Status | Closed (Fixed) |
Category | glossaries-extra |
Version | 1.48 |
Summary | ActualText for entries with no explicit access value |
Description
Hi Nicola,
I've stumbled upon another issue in my glossaries journey (but I hope to stop hammering your tracker soon, as things are already really shining :-).
When using glossaries-accsupp with hyperref, the value stored for the shortaccess key of entries which did not receive an explicit value gets some expanded value which is probably not intended (it looks something like character codes; I don't know what it's called).
The MWE illustrates this by creating two entries, ABC and Dr. The first has an explicit shortaccess key, while the second doesn't. The result is that the ActualText works as intended for the first, but not for the second. The debug=showaccsupp option shows it on the PDF, but copy-pasting the content of the generated document gets us:

Aaaa Bbbb Cccc (ABC), ABC, ABC Doctor, \376\377\000D\000o\000c\000t\000o\000r (\376\377\000D\000r\000.), \376\377\000D\000o\000c\000t\000o\000r (\376\377\000D\000r\000.)

Best regards,
Gustavo.
PS: I was not sure how to classify the report. I didn't find glossaries-accsupp on the list, so I went with glossaries-extra, since that is what I had used for testing.
MWE
\documentclass{article}
\usepackage{hyperref}
\usepackage[
  accsupp,
  abbreviations,
  shortcuts=abbr,
  % debug=showaccsupp,
]{glossaries-extra}
\makeglossaries
% I'm redefining for the intended use case (copy-paste of uppercased small
% caps abbreviations), but the problem also happens without this redefinition,
% as is visible with 'debug=showaccsupp'.
\renewcommand*{\glsshortaccsupp}[2]{\glsaccessibility{ActualText}{#1}{#2}}
\setabbreviationstyle[initialism]{long-short-sc}
\setabbreviationstyle{long-only-short-only}
\pdfstringdefDisableCommands{\renewcommand*{\glstextup}{}}
\newabbreviation[
  category=initialism,
  shortaccess={ABC},
]{ABC}{abc}{Aaaa Bbbb Cccc}
\newabbreviation{Dr}{Dr.}{Doctor}
\begin{document}
\ab{ABC}, \ab{ABC}, \ab{ABC}
\ab{Dr}, \ab{Dr}, \ab{Dr}
\end{document}
Evaluation
Fixed in glossaries-extra v1.49. Make sure that you also update to mfirstuc v2.08 and glossaries v4.50 at the same time.
Comments
11 comments.
Date: 2022-06-16 23:17:08
Replying to: Nicola Talbot 🦜 2022-06-16 21:36:11
Hi Nicola,
Sure! I'll be happy to do some testing.
True, my range cannot be but somewhat limited, since I'm new to the package and am far from knowing my way around it.
But I'll certainly take a look at the cases I have reported, and also at the features I'm using in the project I'm working on.
I'll report back when I have done so (hopefully tomorrow; I had intended to work on an AUCTeX style file for glossaries, but I can rearrange that ;-).
Best regards,
Gustavo.
Date: 2022-06-17 15:16:58
Hi Nicola,
reporting on v1.48, as requested.
Regarding this particular issue, there is some change of behavior I could not track down. The original MWE of the report now copy-pastes as the following content:

Aaaa Bbbb Cccc (abc), abc, abc Doctor, \376\377\000D\000o\000c\000t\000o\000r (\376\377\000D\000r\000.), \376\377\000D\000o\000c\000t\000o\000r (\376\377\000D\000r\000.)

The difference is that the ActualText for the ABC entry no longer works. But I'm probably just missing something here: either some adjustment to the document that the new version requires and that I couldn't figure out, or a corresponding new version of glossaries-accsupp which you haven't sent me. Either way, I thought it right to report the difference.
Besides the issue-specific comments, which I've made above and elsewhere, I put the new version to the test in my current working document. Flawless. I diffed the PDFs generated with both versions, and they are identical. To give you an idea of what to infer from this, it is a book-length document (~500 pages), and I'm using basically the abbreviations functionality from glossaries-extra. It is a subset of it, of course, but a decent one: different types of abbreviations, categories for formatting, including some foreign entries, period discarding, etc. There are not many entries (~10), but a lot of references (>1000).
Furthermore, I took a general look, of course, and can see a lot of interesting new things coming. If I understood the purpose of \GlsXtrIfInGlossary well, I already have some ideas for it. Also, the manual has received a lot of care, which is great. All in all, I'm definitely looking forward to it. Thank you very, very much!! :-)
Date: 2022-06-17 16:20:29
Replying to: anonymous 2022-06-17 15:16:58
Thank you very much for your feedback. It's been very helpful.
The sequence \376\377\000D\000o\000c\000t\000o\000r translates to the bytes FE FF 00 D 00 o 00 c 00 t 00 o 00 r (octal 376 and 377 are hexadecimal FE and FF). The first two (FE FF) are the byte order mark. This looks like the UTF-16 string for "Doctor".
The long form is being used because the default value for the short access is obtained from \glsdefaultshortaccess{long}{short}. This command is defined (in glossaries-accsupp.sty) as:

\newcommand*{\glsdefaultshortaccess}[2]{#1}

This is the appropriate value for the short form of an abbreviation with the default "E" accessibility tag.
If you want the short form instead, you need to redefine this:

\renewcommand*{\glsdefaultshortaccess}[2]{#2}

Or if you don't want a default value:

\renewcommand*{\glsdefaultshortaccess}[2]{}

I'm not sure what's causing the encoding issue at the moment, but the UTF-16 is definitely stemming from glossaries-extra rather than glossaries-accsupp.
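To put the redefinition of \glsdefaultshortaccess in context, here is a minimal sketch of a complete document. It assumes the redefinition has to appear in the preamble before the entry is defined, on the understanding that the default access value is assigned when the entry is created. The entry label dr is only an illustration.

```latex
\documentclass{article}
\usepackage{hyperref}
\usepackage[accsupp]{glossaries-extra}
% Use the short form (#2) rather than the long form (#1) as the
% default shortaccess value for entries without an explicit one.
% Assumption: this must come before the entry definitions below.
\renewcommand*{\glsdefaultshortaccess}[2]{#2}
\newabbreviation{dr}{Dr}{Doctor}
\begin{document}
\gls{dr}, \gls{dr}.
\end{document}
```

Entries that set shortaccess explicitly should be unaffected, since this command only supplies the fallback value.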
The following uses \show to show the accessibility value:

\documentclass{article}
\usepackage{hyperref}
\usepackage[accsupp]{glossaries-extra}
\renewcommand*{\glsshortaccsupp}[2]{\def\tmp{#1}\show\tmp \glsaccessibility{ActualText}{#1}{#2}}
\newacronym{dr}{Dr}{Doctor}
\begin{document}
\gls{dr}, \gls{dr}.
\end{document}

This interrupts the document build and shows the following in the transcript:

> \tmp=macro:
->\376\377\000D\000o\000c\000t\000o\000r (\376\377\000D\000r).

Whereas:

\documentclass{article}
\usepackage{hyperref}
\usepackage{glossaries-accsupp}
\renewcommand*{\glsshortaccsupp}[2]{\def\tmp{#1}\show\tmp \glsaccessibility{ActualText}{#1}{#2}}
\newacronym{dr}{Dr}{Doctor}
\begin{document}
\gls{dr}, \gls{dr}.
\end{document}

Shows:

> \tmp=macro:
->Doctor.
Date: 2022-06-17 18:17:55
Replying to: Nicola Talbot 🦜 2022-06-17 16:20:29
I've just noticed that glossaries-extra redefines \glsdefaultshortaccess to "long (short)". I think it would be better to revert this back to the original default in glossaries-accsupp. This may cause some backward compatibility issues, but it's more appropriate.
The default values for the accessibility fields are assigned from the original short and long values provided to \newabbreviation or \newacronym (not the values obtained after processing attributes such as insertdots). The assignment is performed using \pdfstringdef, if it has been defined, or with just \protected@edef otherwise.
It's \pdfstringdef that's converting the string to UTF-16BE. I've checked with the hyperref manual and that's the default setting, so the copy+paste problem may stem from your PDF viewer not supporting UTF-16BE. You can change the encoding, for example:

\hypersetup{pdfencoding=pdfdoc}

but this has limited support.
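The conversion can be observed without glossaries at all. The following is a minimal sketch, assuming hyperref is asked for Unicode PDF strings explicitly through its unicode package option, so that the result does not depend on which default the installed hyperref version uses. The macro name \tmp is arbitrary.

```latex
\documentclass{article}
% Assumption: request Unicode PDF strings explicitly, rather than
% relying on the default of the installed hyperref version.
\usepackage[unicode]{hyperref}
\begin{document}
% \pdfstringdef globally defines \tmp as the PDF-string form of
% its second argument.
\pdfstringdef\tmp{Doctor}
% Write the result to the transcript without interrupting the build.
\typeout{\meaning\tmp}
\end{document}
```

The transcript line should start with the byte order mark \376\377, like the strings quoted in this report. Loading hyperref with pdfencoding=pdfdoc instead should leave plain ASCII input such as "Doctor" unconverted.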
If you want to strip formatting commands, you can append a local change to \glsxtrassignactualsetup. For example:

\appto\glsxtrassignactualsetup{\letcs{\glstextup}{@firstofone}}

(I'll add that one in.)
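As a sketch of where that line could sit in a document, the following assumes the hook has to be extended before the entries are defined, because the default access values are assigned at that point. The entry label abc and the choice of the long-short-sc style (whose plural suffix is understood to use \glstextup) are illustrative assumptions.

```latex
\documentclass{article}
\usepackage{hyperref}
\usepackage[accsupp]{glossaries-extra}
% While the default access values are being assigned, make \glstextup
% simply return its argument so it does not end up in the access text.
% Assumption: this must come before the entry definitions below.
\appto\glsxtrassignactualsetup{\letcs{\glstextup}{@firstofone}}
\setabbreviationstyle{long-short-sc}
\newabbreviation{abc}{abc}{Aaaa Bbbb Cccc}
\begin{document}
\gls{abc}, \glspl{abc}.
\end{document}
```

Other formatting commands could presumably be neutralised the same way by appending further \letcs assignments to the hook.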
Date: 2022-06-17 19:06:18
Replying to: Nicola Talbot 🦜 2022-06-17 18:17:55
"It's \pdfstringdef that's converting the string to UTF-16BE. I've checked with the hyperref manual and that's the default setting, so the copy+paste problem may stem from your PDF viewer not supporting UTF-16BE. You can change the encoding, for example: \hypersetup{pdfencoding=pdfdoc} but this has limited support."
Mmh, if it was \pdfstringdef per se, I'd expect that all entries would have their ActualText converted to UTF-16, and that's not the case, since ABC does not get converted.
And, indeed, you are right that without hardcoding \pdfstringdef into \@gls@assign@actual (as is done in glossaries-extra) we get correct results:

\documentclass{article}
\usepackage{hyperref}
\usepackage[
  accsupp,
  abbreviations,
  shortcuts=abbr,
  debug=showaccsupp,
]{glossaries-extra}
\makeglossaries
\renewcommand*{\glsshortaccsupp}[2]{\glsaccessibility[method=pdfstringdef]{ActualText}{#1}{#2}}
\makeatletter
\renewcommand{\@gls@assign@actual}{%
  \begingroup
  \glsxtrassignactualsetup
  \protected@edef\@gls@tmp{\endgroup
    \def\noexpand\@gls@actualshort{\glsxtrorgshort}%
    \def\noexpand\@gls@actuallong{\glsxtrorglong}%
    \def\noexpand\@gls@actualshortpl{\@gls@shortpl}%
    \def\noexpand\@gls@actuallongpl{\@gls@longpl}%
  }%
  \@gls@tmp
}
\makeatother
\setabbreviationstyle[initialism]{long-short-sc}
\setabbreviationstyle{long-only-short-only}
\renewcommand*{\glsdefaultshortaccess}[2]{#2}
\pdfstringdefDisableCommands{\renewcommand*{\glstextup}{}}
\newabbreviation[
  category=initialism,
  shortaccess={ABC},
]{ABC}{abc}{Aaaa Bbbb Cccc}
\newabbreviation{Dr}{Dr.}{Doctor}
\begin{document}
\ab{ABC}, \ab{ABC}, \ab{ABC}
\ab{Dr}, \ab{Dr}, \ab{Dr}
\end{document}

From which we can copy-paste (tested with current released version):

Aaaa Bbbb Cccc (ABC), ABC, ABC Doctor, Dr., Dr.

Since method=pdfstringdef is already accsupp's default when hyperref is loaded, why hard-code it like this?
Still there is some mystery there, since I cannot tell why the difference between ABC and Dr.
Date: 2022-06-17 19:24:59
Nicola,
might I be granted a question?
You've called my attention to \glsdefaultshortaccess above, which I had missed in my attempts to use glossaries-accsupp. I eventually gave up on using it for my current project because (for my use case) I felt it inappropriate that entries which did not have a shortaccess value were wrapped with ActualText, since the intended actual text was the same as the text itself, and this adds unnecessary complexity. So I feel that I might have missed some handle which is already there to control this, just like I missed \glsdefaultshortaccess.
The question is: is there a way to use the accessibility support only for some entries, particularly only those which have an explicit shortaccess field set? In terms of the MWE, how to wrap only ABC, but not Dr?
(This is a question, not a request or report. So, feel free to choose from "No", "It's complicated", "Possibly, but it's homework for you" or whatever else in the same vein as an answer.)
Date: 2022-06-17 19:39:47
Ah, a small correction to a previous comment. method=pdfstringdef is not the default of accsupp. What the documentation says is "If package hyperref is loaded, then its \pdfstringdef is used". (As far as I can tell from the code, the default is escape.)
Still, one alternative would be to make the definition of \gls@accessibility conditional on the existence of \pdfstringdef, and use method=pdfstringdef in that case, instead of making the definition of \@gls@assign@actual conditional.
Date: 2022-10-19 19:07:27
Hi Nicola,
thank you very much for the fix!
Best regards,
Gustavo.
Date: 2022-10-19 19:31:45
Replying to: anonymous 2022-10-19 19:07:27
Hi Gustavo,
thank you for your feedback. I'm sorry it took longer than anticipated as I had to get glossaries v4.50 and mfirstuc v2.08 out of the way first. Allow a few days for all the updates to make it into the TeX distributions.
Best regards
Nicola
Date: 2022-10-19 19:46:08
Replying to: Nicola Talbot 🦜 2022-10-19 19:31:45
Hi Nicola,
oh, please, don't be. Whatever your timing is, it is appreciated.
Just as v4.50 which came in today is very much so. :-)
Thank you!
Best regards,
Gustavo.
Page permalink: https://www.dickimaw-books.com/bugtracker.php?key=206
Date: 2022-06-16 21:36:11
The report category will depend on whether the issue is stemming from glossaries-accsupp (in which case, I'll change the report category to glossaries) or if it's coming from a modification made by glossaries-extra.
Would you be interested in testing the pending new version of glossaries-extra? If so, you can download the experimental 1.48b: glossaries-extra-1_48b.zip It's a working copy so the documented code is incorporated into the sty files. (Release versions have the comments stripped.) Some of the issues you have logged may now be fixed.