Bug Tracker

ID 206
Date: 2022-06-16 13:45:31
Last update: 2022-06-17 18:17:55
Status Open
Category glossaries-extra
Version 1.48
Summary ActualText for entries with no explicit access value

Description

Hi Nicola,

I've stumbled upon another issue in my glossaries journey (but I hope to stop hammering your tracker soon, as things are already really shining :-).

When using glossaries-accsupp with hyperref, the value stored for the shortaccess key of entries which did not receive an explicit value is expanded in a way that is probably not intended (it looks something like character codes; I don't know what it's called).

The MWE illustrates this by creating two entries ABC and Dr, the first has an explicit shortaccess key, while the second doesn't. The result is that the ActualText works as intended for the first, but not for the second. The debug=showaccsupp shows it on the PDF, but copy-pasting the content of the generated document gets us:

Aaaa Bbbb Cccc (ABC), ABC, ABC
Doctor, \376\377\000D\000o\000c\000t\000o\000r (\376\377\000D\000r\000.), \376\377\000D\000o\000c\000t\000o\000r (\376\377\000D\000r\000.)
Best regards,
Gustavo.


PS: I was not sure how to classify the report. I didn't find glossaries-accsupp on the list, so I went with glossaries-extra, since that is what I had used for testing.

MWE

\documentclass{article}

\usepackage{hyperref}

\usepackage[
  accsupp,
  abbreviations,
  shortcuts=abbr,
  % debug=showaccsupp,
]{glossaries-extra}

\makeglossaries

% I'm redefining for the intended use case (copy-paste of uppercased small
% caps abbreviations), but the problem also happens without this redefinition,
% as is visible with 'debug=showaccsupp'.
\renewcommand*{\glsshortaccsupp}[2]{\glsaccessibility{ActualText}{#1}{#2}}

\setabbreviationstyle[initialism]{long-short-sc}
\setabbreviationstyle{long-only-short-only}

\pdfstringdefDisableCommands{\renewcommand*{\glstextup}{}}

\newabbreviation[
  category=initialism,
  shortaccess={ABC},
]{ABC}{abc}{Aaaa Bbbb Cccc}

\newabbreviation{Dr}{Dr.}{Doctor}

\begin{document}

\ab{ABC}, \ab{ABC}, \ab{ABC}

\ab{Dr}, \ab{Dr}, \ab{Dr}

\end{document}

Evaluation

Comments

8 comments.

Comment from Nicola Talbot 🦜
Date: 2022-06-16 21:36:11

The report category will depend on whether the issue is stemming from glossaries-accsupp (in which case, I'll change the report category to glossaries) or if it's coming from a modification made by glossaries-extra.

Would you be interested in testing the pending new version of glossaries-extra? If so, you can download the experimental 1.48b: glossaries-extra-1_48b.zip. It's a working copy, so the documented code is incorporated into the sty files. (Release versions have the comments stripped.) Some of the issues you have logged may now be fixed.

Comment from anonymous
Date: 2022-06-16 23:17:08
Replying to: Nicola Talbot 🦜 2022-06-16 21:36:11

Hi Nicola,

Sure! I'll be happy to do some testing.
Admittedly, my range will be somewhat limited, since I'm new to the package and am far from knowing my way around it.
But I'll certainly take a look at the cases I have reported, and also at the features I'm using in the project I'm working on.
I'll report back when I have done so (hopefully tomorrow, I had intended to work on an AUCTeX style file for glossaries, but I can rearrange that ;-).

Best regards,
Gustavo.

Comment from anonymous
Date: 2022-06-17 15:16:58

Hi Nicola,

reporting on v1.48b, as requested.

Regarding this particular issue, there is some change of behavior I could not track. The original MWE of the report now copy pastes the following content:

Aaaa Bbbb Cccc (abc), abc, abc
Doctor, \376\377\000D\000o\000c\000t\000o\000r (\376\377\000D\000r\000.), \376\377\000D\000o\000c\000t\000o\000r (\376\377\000D\000r\000.)

The difference is that the ActualText for the ABC entry no longer works. I'm probably just missing something here: either some adjustment to the document that the new version requires and that I couldn't figure out, or a corresponding new version of glossaries-accsupp which you haven't sent me. Either way, I thought I should report the difference.

Besides the issue-specific comments, which I've made above and elsewhere, I put the new version to the test in my current working document. Flawless. I diffed the PDFs generated with both versions, and they are equal. To give you an idea of what to infer from this, it is a book-length document (~500 pages), and I'm using basically the abbreviations functionality from glossaries-extra. It is a subset of it, of course, but a decent one: different types of abbreviations, categories for formatting, including some foreign entries, period discarding, etc. There are not many entries (~10), but a lot of references (>1000).

Furthermore, I took a general look, of course, and can see a lot of new interesting things coming. If I understood the purpose of \GlsXtrIfInGlossary well, I already have some ideas for it. Also the manual has received a lot of care, which is great. All in all, I'm definitely looking forward to it. Thank you very, very much!! :-)

Comment from Nicola Talbot 🦜
Date: 2022-06-17 16:20:29
Replying to: anonymous 2022-06-17 15:16:58

Thank you very much for your feedback. It's been very helpful.

The sequence \376\377\000D\000o\000c\000t\000o\000r translates to the bytes FE FF 00 D 00 o 00 c 00 t 00 o 00 r (the octal escapes \376, \377 and \000 are the bytes FE, FF and 00). The first two (FE FF) are the byte order mark. This looks like the UTF-16 string for "Doctor".

The long form is being used because the default value for the short access is obtained from \glsdefaultshortaccess{long}{short}. This command is defined (in glossaries-accsupp.sty) as:

\newcommand*{\glsdefaultshortaccess}[2]{#1}
This is the appropriate value for the short form of an abbreviation with the default "E" accessibility tag.

If you want the short form instead, you need to redefine this:

\renewcommand*{\glsdefaultshortaccess}[2]{#2}
Or if you don't want a default value:
\renewcommand*{\glsdefaultshortaccess}[2]{}
I'm not sure what's causing the encoding issue at the moment, but the UTF-16 is definitely stemming from glossaries-extra rather than glossaries-accsupp.

The following uses \show to show the accessibility value:

\documentclass{article}

\usepackage{hyperref}
\usepackage[accsupp]{glossaries-extra}

\renewcommand*{\glsshortaccsupp}[2]{\def\tmp{#1}\show\tmp
 \glsaccessibility{ActualText}{#1}{#2}}

\newacronym{dr}{Dr}{Doctor}

\begin{document}
\gls{dr}, \gls{dr}.
\end{document}
This interrupts the document build and shows the following in the transcript:
> \tmp=macro:
->\376\377\000D\000o\000c\000t\000o\000r (\376\377\000D\000r).
Whereas:
\documentclass{article}

\usepackage{hyperref}
\usepackage{glossaries-accsupp}

\renewcommand*{\glsshortaccsupp}[2]{\def\tmp{#1}\show\tmp
 \glsaccessibility{ActualText}{#1}{#2}}

\newacronym{dr}{Dr}{Doctor}

\begin{document}
\gls{dr}, \gls{dr}.
\end{document}
Shows:
> \tmp=macro:
->Doctor.
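
As a further check, the same \show probe can be combined with the \glsdefaultshortaccess redefinition mentioned earlier in this comment. This is only a sketch built from the two snippets above, not a tested fix: it assumes that the redefinition is placed before the entry is defined (as in the MWEs in this report) and that glossaries-accsupp alone picks up the new default. If it does, the transcript should show the short form rather than the long form.

```latex
\documentclass{article}

\usepackage{hyperref}
\usepackage{glossaries-accsupp}

% Default shortaccess value: use the short form (#2) instead of the
% long form (#1). Placement before \newacronym is an assumption.
\renewcommand*{\glsdefaultshortaccess}[2]{#2}

% Same probe as above: store the access value in \tmp and \show it,
% which interrupts the build and writes the value to the transcript.
\renewcommand*{\glsshortaccsupp}[2]{\def\tmp{#1}\show\tmp
 \glsaccessibility{ActualText}{#1}{#2}}

\newacronym{dr}{Dr}{Doctor}

\begin{document}
\gls{dr}, \gls{dr}.
\end{document}
```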

Comment from Nicola Talbot 🦜
Date: 2022-06-17 18:17:55
Replying to: Nicola Talbot 🦜 2022-06-17 16:20:29

I've just noticed that glossaries-extra redefines \glsdefaultshortaccess to "long (short)". I think it would be better to revert this back to the original default in glossaries-accsupp. This may cause some backward compatibility issues, but it's more appropriate.

The default values for the accessibility fields are assigned from the original short and long values provided to \newabbreviation or \newacronym (not the values obtained after processing attributes such as insertdots). The assignment is performed using \pdfstringdef, if it has been defined, or with just \protected@edef otherwise.

It's \pdfstringdef that's converting the string to UTF-16BE. I've checked with the hyperref manual and that's the default setting, so the copy+paste problem may stem from your PDF viewer not supporting UTF-16BE. You can change the encoding, for example:

\hypersetup{pdfencoding=pdfdoc}
but this has limited support.
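
The conversion can be inspected in isolation, without any glossaries package, by calling \pdfstringdef directly and showing the result. This is a minimal sketch, assuming a hyperref version in which Unicode PDF strings are the default; the macro name \tmp is only illustrative. Uncommenting the \hypersetup line lets the two encodings be compared in the transcript.

```latex
\documentclass{article}

\usepackage{hyperref}

% Uncomment to request PDFDocEncoding instead of the default encoding:
% \hypersetup{pdfencoding=pdfdoc}

\begin{document}

% \pdfstringdef globally defines \tmp as the PDF-string form of "Doctor".
\pdfstringdef\tmp{Doctor}

% \show interrupts the build and writes the definition of \tmp
% to the transcript, so the encoded string can be read there.
\show\tmp

Doctor

\end{document}
```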

If you want to strip formatting commands, you can append a local change to \glsxtrassignactualsetup. For example:

\appto\glsxtrassignactualsetup{\letcs{\glstextup}{@firstofone}}
(I'll add that one in.)

Comment from anonymous
Date: 2022-06-17 19:06:18
Replying to: Nicola Talbot 🦜 2022-06-17 18:17:55

It's \pdfstringdef that's converting the string to UTF-16BE. I've checked with the hyperref manual and that's the default setting, so the copy+paste problem may stem from your PDF viewer not supporting UTF-16BE. You can change the encoding, for example:

\hypersetup{pdfencoding=pdfdoc}

but this has limited support.

Mmh, if it was \pdfstringdef per se, I'd expect that all entries would have their ActualText converted to UTF-16, and that's not the case, since ABC does not get converted.

And, indeed, you are right that without hardcoding \pdfstringdef into \@gls@assign@actual (as is done in glossaries-extra) we get correct results:

\documentclass{article}

\usepackage{hyperref}

\usepackage[
  accsupp,
  abbreviations,
  shortcuts=abbr,
  debug=showaccsupp,
]{glossaries-extra}

\makeglossaries

\renewcommand*{\glsshortaccsupp}[2]{\glsaccessibility[method=pdfstringdef]{ActualText}{#1}{#2}}

\makeatletter
\renewcommand{\@gls@assign@actual}{%
  \begingroup
  \glsxtrassignactualsetup
  \protected@edef\@gls@tmp{\endgroup
    \def\noexpand\@gls@actualshort{\glsxtrorgshort}%
    \def\noexpand\@gls@actuallong{\glsxtrorglong}%
    \def\noexpand\@gls@actualshortpl{\@gls@shortpl}%
    \def\noexpand\@gls@actuallongpl{\@gls@longpl}%
  }%
  \@gls@tmp
}
\makeatother

\setabbreviationstyle[initialism]{long-short-sc}
\setabbreviationstyle{long-only-short-only}

\renewcommand*{\glsdefaultshortaccess}[2]{#2}

\pdfstringdefDisableCommands{\renewcommand*{\glstextup}{}}

\newabbreviation[
  category=initialism,
  shortaccess={ABC},
]{ABC}{abc}{Aaaa Bbbb Cccc}

\newabbreviation{Dr}{Dr.}{Doctor}

\begin{document}

\ab{ABC}, \ab{ABC}, \ab{ABC}

\ab{Dr}, \ab{Dr}, \ab{Dr}

\end{document}
From which we can copy-paste (tested with current released version):
Aaaa Bbbb Cccc (ABC), ABC, ABC
Doctor, Dr., Dr.
Since method=pdfstringdef is already accsupp's default when hyperref is loaded, why hard-code it like this?

Still, there is some mystery here, since I cannot tell why ABC and Dr behave differently.

Comment from anonymous
Date: 2022-06-17 19:24:59

Nicola,

might I be granted a question?

You've called my attention to \glsdefaultshortaccess above which I had missed in my attempts to use glossaries-accsupp. I eventually gave up on using it for my current project because (for my use case) I felt it inappropriate that entries which did not have a shortaccess value were wrapped with ActualText, since the intended actual text was the same as the text itself, and this adds an unnecessary complexity. So I feel that I might have missed some handle which is already there to control this, just like I missed \glsdefaultshortaccess.

The question is, is there a way to use the accessibility support only for some entries, particularly only those which have an explicit shortaccess field set? In terms of the MWE, how to wrap only ABC, but not Dr?

(This is a question, not a request or report. So, feel free to choose from "No", "It's complicated", "Possibly, but it's homework for you" or whatever else in the same vein as an answer.)

Comment from anonymous
Date: 2022-06-17 19:39:47

Ah, a small correction to a previous comment. method=pdfstringdef is not the default of accsupp. What the documentation says is "If package hyperref is loaded, then its \pdfstringdef is used". (As far as I can tell from the code, the default is escape.)

Still, one alternative would be to make the definition of \gls@accessibility conditional on the existence of \pdfstringdef, and use method=pdfstringdef in that case, instead of making the definition of \@gls@assign@actual conditional.


Page permalink: https://www.dickimaw-books.com/bugtracker.php?key=206