The Dickimaw Books store has unfortunately closed until further notice. The reason for this is because PayPal has removed support for encryption with its PayPal Payments Standard option. This is where an online store redirects the customer to PayPal’s site in order to make the payment. PayPal is still providing this payment option, but the store will now only work if I switch off encryption, which I’m not prepared to do.
For those who want more detail, the way that this works is as follows. The customer adds products to the basket and proceeds through the checkout process until they arrive at the final checkout page that confirms the price of each item, any discount applied, postage and packaging, final total, invoice address and shipping address. All this information needs to be sent to PayPal so that the correct amount can be charged. Once the transaction is successfully completed, PayPal then sends a notification back to the store to confirm that the payment has been made.
Without encryption, the transaction data at the checkout page is contained in plain text within the form parameters and is sent as plain text to PayPal when the customer clicks on the continue button.
There are two problems with using plain text. The first is that these private details about the customer and their transaction can be intercepted by a third party eavesdropper.¹ The second is that a dishonest customer can open the developer tools in their web browser and alter the payment details, awarding themselves a hefty discount and defrauding the merchant. Under those circumstances, it’s hard for the merchant to prove that they didn’t have the products temporarily listed at a lower price when the transaction was made.
Encryption helps to protect both the customer’s private details and the merchant. The way that this is done is through public/private key encryption. At the checkout page, all the transaction details are stored within a single form parameter with an encrypted value. This prevents any tampering and also protects the data when it’s transmitted.
There is a two-way communication between the merchant’s site and PayPal. In order for the encryption to work, the merchant’s store needs a copy of PayPal’s public certificate (which the merchant used to be able to download from their PayPal business account). PayPal, in turn, needs the merchant’s public certificate. The encryption and decryption can’t be performed without a valid public/private key pair.
Certificates have an expiry date. This is a precaution in case the private key is stolen. Whilst stolen keys can be revoked, there’s a chance that this may not be noticed. An expiry date at least limits the length of time a stolen key can be used for.
The certificate for the Dickimaw Books store expired last Sunday. I had set myself a reminder to create a new pair and did so the day before, but when I tried to upload the new public certificate to PayPal, I encountered a 404 page not found error. I raised an issue with their merchant technical support and was informed that the encrypted option was no longer available. The checkout will now only work if I disable the encryption from the store’s admin page.
I have no idea why PayPal would intentionally remove a security feature, particularly without giving any prior warning. This will obviously impact all small merchants who use this method, although they may not discover this until their certificate expires and they try to upload a new one. I’m hoping that this issue will turn out to be a miscommunication within PayPal’s technical support department and an inadvertent broken link. Until they restore the ability to use encryption or until I find an alternative payment provider, the store will remain closed.
Meanwhile, if you want to purchase any of my paperback books, you can purchase them from a third party book seller, such as Amazon.
¹Using https instead of http does, of course, add a layer of protection, which help protect against eavesdropping, but it doesn’t protect against fraudulently altering the information before it’s sent.
You’ve probably come across websites that want you to prove that you’re human and not a robot. This may come in the form of a picture challenge (for example, select all the squares with bicycles) or it may simply require you to check a box to assert that you’re not a robot. Perhaps you’re wondering why you need to do this. Why is the website so concerned about being visited by robots? Alternatively, perhaps you’re a website developer and are determined to find a way to keep out all bots.
What is a bot? Are all bots bad?
As with cookies, bots are important tools in the digital world. However, as with cookies, bots can also be used for unwholesome purposes.
“Bot” is short for robot and is simply a piece of software (an application) that visits websites. A bot may follow one link after another, crawling through pages across the World Wide Web. For this reason, they are often called “crawlers” or “spiders”.
If you go to your favourite search engine and type in a keyword or phrase (or use a voice activated request on your mobile device) then the results usually come up fairly quickly. This is only possible because the search engine has an index that has been compiled by bots that have followed link after link, gathering information. Without this index, it would take a very long time to scour all the millions of pages that make up the web to find something relevant.
Not all bots are crawlers. For example, Facebook has a bot that’s used when a post contains a link. The bot is used to check that the link exists and it reads any Open Graph markup. This allows Facebook to include an image and short excerpt to arouse the interest of anyone who views the post. Unlike the search engine bots, this bot doesn’t roam free about the Internet but instead restricts itself to links posted on Facebook pages.
For example, “facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)” identifies the Facebook bot (facebookexternalhit), its version number (1.1) and a way of finding out information about the bot.
So these are useful bots that help users to discover interesting sites.
Although the crawlers used by search engines are useful, some crawlers that index sites to provide certain types of information for their users (who may require free or paid accounts to access it) can be a nuisance because they’re not well-behaved. For example, they may not follow the robot instructions stipulated by the website (robots.txt), they may try to access pages that are only intended for human visitors or they may hit the site so hard (that is, they look up pages so fast) that they slow down the site and it becomes unusable for everyone else.
This could be because the bot’s developer made a mistake (a bug in the bot’s code or an inexperienced programmer) or it could be because the developer simply doesn’t care and wants the information quickly regardless of the inconvenience to others (perhaps to satisfy the demands of paying customers). In the long run, this is counter-productive as it will lead to the bot (which is identified in the user agent string) being banned.
Web scraping (or harvesting) is when a bot extracts data from a webpage. In the earlier case of search engines and social media, this data can just be keywords or phrases or the URL for the page image, but some bots are designed to gather all information from a page in order to reproduce it verbatim on another site. This is often done to lure visitors to their own copycat site, which will most likely be stuffed full of adverts and tracking (which makes a profit for their owner). This is usually a violation of intellectual property. Even where the original page is available under a permissive licence, attribution is usually required but is often omitted. This happens a lot for question and answer sites, such as Stack Exchange, or forums.
These bots may well have the user agent string empty or set to the default value for the given API that they are built with.
Trolls and Spambots
These are the types of bots that the pages that require you to identify yourself as a human are mostly trying to block. The user agent string is typically set to a common browser and platform to make the bot appear as though it is a human visitor. These bots search for forms to fill in, such as contact forms to send spam messages or comment forms to advertise dubious products and sites.
While spambots are the digital equivalent of fly-posters, trollbots are the equivalent of poison-pen letter writers. They are created by individuals who take a puckish delight in causing hurt and discord. These bots are designed to search for certain keywords on a page and craft an offensive or divisive comment that relates to the topic. The creators of these bots may have a particular hatred towards a certain group of people, but they can also be chaotic nihilists with a set of offensive comments for every group.
The expression “don’t feed the trolls” has been around for a long time. I remember first encountering it on Usenet back in the early 1990s (accompanied by some ASCII art). It’s very good advice. Don’t give trolls the attention that they are looking for, but, in some cases, the troll posting the offensive comments isn’t human. It’s a bot that has no ability to reason, no feelings, no embarrassment. Its function is solely to post content that its creator programmed into it.
Chatbots can come under both this category and the next. Chatbots in general are just a tool that simulates conversation, and are often used for legitimate services, such as online help, but they are also used by criminals to deceive people. For example, a fraudster might create a fake account on an Internet dating site and use a chatbot to hook victims who believe they are chatting with a human. Once the chatbot has gained the victim’s trust, the fraudster takes over.
The worst of the bad bots are the ones created by cyber-criminals and they are designed to wreak havoc, stealing data and installing malware. These bots look for dynamic web pages that use parameters and will try to inject malicious code into the parameter values.
For example, the page https://www.dickimaw-books.com/booklist.php?book_id=11 has a parameter (book_id) that identifies a particular edition of a book. (In this case, the second paperback edition of The Private Enemy.) The parameter value (11) uniquely identifies this edition in the database that contains all the title information.
A malicious bot will try altering the parameter value to break into the database. For example, it may start out by simply appending an apostrophe (book_id=11'). If this triggers a syntax error then the site is vulnerable to SQL injection and the bot can then try something far nastier to access the contents of the database.
Or the parameter value may be the name of a template file, which is used for the main body of the web page, so the bot will try replacing the parameter value with /etc/passwd (or ../etc/passwd etc) in order to trick that web page into revealing the contents of the password file instead.
Bad bots can also disrupt a website by repeatedly accessing pages in rapid succession (a denial of service attack or, where an army of bots are working together, a distributed denial of service attack). This can make the site completely inaccessible to anyone else.
These types of bots rarely identify themselves honestly. The user agent string is typically empty or contains a common browser and platform combination (as with the trolls and spambots). I’ve also encountered attempts at SQL injection where the user agent string was the same as the aforementioned Facebook bot. At first glance, it gives the impression that a Facebook bot has gone rogue (or followed a bad link) but the IP was registered to somewhere in Russia, which seems an unlikely origin for a Facebook bot, so bad bots not only pretend to be human but also try to pass themselves off as legitimate bots.
Sometimes the user agent string will contain “sqlmap”. This is a legitimate pen testing application. However, in many jurisdictions, penetration testing can only be performed by mutual consent between the pen tester and the website owner. If you are a website developer and a pen tester has been hired by your organisation, then don’t block bots with this user agent as the site needs to be tested by an unblocked bot since most bad bots don’t conveniently identify themselves. If a pen tester hasn’t been engaged then the tool is being used illegally (which is par for the course with criminals).
So, if you’re a website developer and you want to stop bad bots, remember that you can’t rely on the user agent string. Bots pretend to be human and some humans blank their user agent string for privacy reasons. The first line of defence is to filter (e.g. ensure that a numeric value is actually a number), escape special characters (e.g. htmlentities) and use prepared statements.
If you’re just a regular website user, don’t assume that every comment you read was actually posted by a human and, while captchas may be frustrating, your web browsing experience may be far worse without them.
Update 2021-08-08: added paragraph on Signal in Good Bots section and paragraph on chatbots in Trolls and Spambots section.
If you are a regular visitor to the site, you may have noticed that there’s a new “Account” link in the navigation bar (situated below the title banner). That page provides access to the main site account where you can manage your notifications and keep track of any bug reports, feature requests, comments or typo reports that you have submitted.
Note that this main site account doesn’t include the shop or this blog as these two areas use third-party software (osCommerce and WordPress, respectively) with different databases. So if you already have a shop account (or plan to create one) and you also want a site account, you will need two separate sets of credentials as there’s no single sign-on (SSO) system. (The shop account can be accessed via the “My Account” link at the top of the shop pages.)
It’s still possible to submit bug reports, feature requests , typo reports and make comments on a bug report or feature request as a guest, but if you want to receive any email messages about updates to your post or if you want to receive any notifications about other posts then you will need to create an account. The reason for this change is that, in order to email you notifications, the site must necessarily store your email address. By having a password-protected account, you can more easily adjust your preferences or change your email address.
If you are signed in, you can also bump open bug reports. Sometimes a bug report comes in when I’m particularly busy, and I’ll open it with the intention of looking at it more thoroughly when my workload eases up, but sometimes I forget. If a bug that looks easy to fix (such as commenting out an end of line character or correcting a misspelt command name) has remained open for some months then that’s the most likely reason why. In which case, you can remind me by signing in, view the report and click on the “Bump” button. This will automatically send me an email to remind me.
The image above shows the header information for an open bug report. This starts with the bug ID followed by a permalink if you want to share or bookmark the report. This is followed by the submitter (me, in this case). This information will be omitted if the report was posted as a guest. If the report was posted by an authenticated user (someone who was logged into the site account) then the report is linked to that user’s account (so it will show up in their account page) but the “Submitted by” information is determined by their account settings, which may be one of: anonymous (default), username or display name. In this example, the report was posted by me and I have the “display name” setting on, so it shows my display name (Nicola Talbot 🦜). It shows up in green to indicate that the user has administrator privileges (just in case, by some coincidence, another user happens to have the same name).
The status line shows that the report is still open. In this case, it’s a problem that’s very tricky to fix (which is why it’s been open for so long) but, if you want to remind me about it, you can click on the “Bump” button next to the status, which will automatically send me an email. Since I don’t want to be mail-bombed by a particularly enthusiastic user repeatedly clicking on it, the bump button becomes unavailable for a couple of weeks after it’s used.
Below the report summary (and after the button that will return you to the search results), is information about whether or not you have subscribed to receive notifications about this report with a button that allows you to subscribe or unsubscribe. You can view a list of all the reports that you have signed up for in the “Notifications” area of your account page. Notifications are sent whenever a significant change is made to the report, such as a change of status or a new comment. Notifications aren’t sent for minor edits, such as fixing spelling mistakes. By way of comparison, the above report is shown below where the user isn’t logged in.
The “Bump” button is no longer available. Instead there’s a link to sign in if you want to bump it. Similarly, there’s no subscribe/unsubscribe button.
Feature requests have something similar, but instead of bumping the post you can “like” it. The number of likes a post receives will give me some idea of how popular the request is, which I can use to determine whether or not it’s worth implementing.
Another advantage with being signed in is that the site will trust you more than it does for a guest. Unfortunately a high number of bots hit the site, and some of the forms are intentionally complicated to make them harder for bots to navigate. (In the past I used CAPTCHAs, but bots can break them, they can cause accessibility issues and they use third-party code, which may implement tracking.) This means that if you are logged in, some of the forms are simpler, such as the comment forms and the report a typo form.
There are four types of notifications you can sign up for: news, bug reports, feature requests, and books. I’ve already mentioned bug reports and feature requests above. In an earlier blog post, I described the RSS feeds available on this site, but it may be that you don’t have an aggregator and don’t want the hassle of installing one. If you prefer to receive an email notification whenever a new item is added to the News page then you can either subscribe to all news or you can select the tags that you’re interested in via the News Notifications area. For example, if you want to be notified whenever a new example is added to the Gallery, then you can subscribe to the “gallery” news tag.
The Book Notifications area allows you to sign up for notifications about any of the books published by Dickimaw Books. You can either sign up for notifications about a specific title or you can sign up for notifications about particular genres. For example, if you sign up for news about the title LaTeX for Complete Novices then you will receive a notification when the first edition goes out of print and when the second edition comes into print. If you sign up for a pending title, it helps me to gauge whether there’s enough interest in the book to make it worth publishing.
In the “Notifications” section of the Account page you can choose whether to receive an email for each notification or to receive a daily or weekly digest. There aren’t usually a lot of notifications in one day or week, but occasionally a post may receive multiple comments or there may be several news items in one day.
To create an account, follow the link to the Account page. This will automatically redirect you to the login page and from there you can follow the Create Account link. The site credentials (the information you need to supply in order to login) comprise a username (not email) and password. The username must start with a letter and consist only of letters, numbers, period/full stop (.), hyphen (-) or underscore (_) and must be a minimum of three characters. The password must be at least 8 characters long and mustn’t be easy to guess. Common passwords that have been exposed in data breaches won’t be accepted. Similarly, passwords formed from easy to guess patterns (such as 12121212) aren’t allowed. If you have difficulty remembering all your passwords (and you should have a different one for every account) then I recommend that you use a password manager.
You need to supply an email address when you create an account. If you have previously signed up for bug report or feature request notifications on this site then, if you use the same email address, you can retain your existing notification settings. (You can later change your email address in your account page once your account has been created and verified.)
You can optionally specify a display name. This may consist of most printable characters (letters, numbers and punctuation) and spaces. My display name includes an emoji (🦜) at the end mainly to test UTF-8 support. As long as the display name doesn’t breach the site terms and conditions (that is, as long as it isn’t offensive etc) you should be able to choose a display name to suit you. Note that browsers may use fonts that don’t support some characters so there’s no guarantee that a display name will be rendered correctly.
Once you have created an account, you will receive an email with the verification code, which needs to be used to activate your account. All emails from the site will address you by your display name (if set) or by your username and are sent as plain text (no HTML part) so there’s no unnecessary bloat from images and there are no hidden elements (such as web beacons).
Once your account has been verified, you can login and go to the Account page to view your settings. You can change your display name, email and password but not your username.
You can also set up two-factor authentication (2FA), which I recommend. This requires a time-based one time password (TOTP) authenticator app (which provides a six-digit code that changes at regular intervals, typically 30 or 60 seconds). TOTP is a public algorithm (RFC 6238) and is used by most authenticator apps. Some companies have a tendency to promote their own TOTP app as though it’s the only one that can be used with their site and it’s only in the small print that they acknowledge that you can actually use other authenticator apps. This has unfortunately led some users into believing that they need to install multiple authenticator apps, despite the fact that most of them are compatible.
(SMS authentication isn’t supported for this site. It’s not secure and requires an extra piece of personally identifiable information to be stored in the site database, which wouldn’t otherwise be needed.)
To setup 2FA, first make sure you have an authenticator app installed then go to the “Security” section of your account page and click on the “Enable 2FA” link. This will display a QR code for you to scan. Alternatively, you can manually enter the key below the image. This key (which is embedded in the QR code) is the secret part of the TOTP algorithm. A copy is saved on the site database (encrypted) and in your authenticator app. It’s this key that’s used by the TOTP algorithm to generate the 6-digit code based on the current time. In order to ensure that the key has been correctly entered into your authenticator app, you need to enter the 6-digit code generated by the app in the text box below the QR code and click on the “Verify” button to complete the process.
Once you have enabled 2FA, you can also setup recovery codes. These are single-use codes that can be used instead of the TOTP 6-digit code and should be stored in a private place (for example, write them down and put them in a safe). If you can’t use your authenticator app (for example, your phone’s battery is flat) then you can use a recovery code instead. Once you have used up all your recovery codes (or if they have been discovered by someone else), you can generate a new set.
When 2FA is enabled, the next time you login you will need to provide the 6-digit code from your authenticator app or a recovery code (in addition to your username and password). You have the option to trust the device and browser that you are using for 30 days. If you want to enable this, you need to make sure the “trust this device” checkbox is selected before entering your 6-digit code. This means that next time you log in using that particular browser on that device you will only have to supply your username and password. Note that this requires a persistent cookie (with a lifespan of 30 days). Once the cookie expires (or is deleted) you will have to supply the 2FA code again.
When you use the “trust this device” setting, the webscript will try to determine your operating system and browser from the user agent string. This information (if available) and your IP address is stored (encrypted) in the site database so that you can review your list of trusted devices to help determine whether or not you recognize them. The information isn’t used for any other purpose.
All this extra security may seem like overkill just to receive notifications from a small site, but it’s good practice.
UK mobile networks are sending a “stay at home” message to everyone in response to the current nationwide lockdown. While the link in that specific message is safe, don’t click on links in text messages. It’s very easy for scammers to fake that message and replace the safe link with their own nasty version. It doesn’t take long to type “gov.uk” into the address bar of your browser and you can follow the appropriate link from that site’s home page.
Don’t click on links in text messages. Get into the habit of not clicking links, even if when it’s safe. There’s been a rise in scams and phishing attempts that prey on people’s fears. Please do take care.
If you’re unsure about whether or not a web address is genuine, type it into the search box of your favourite search engine. If the search box is also an address bar (as is the case for some browsers), you need to make sure it doesn’t get interpreted as a URL, which would take you to the site rather than allow you to investigate it first. For example, if you get a link to “example.com/important-info” then type something like “what is example.com” or “who is example.com” or “who owns example.com” as your search term. That should hopefully ensure that it’s interpreted as a search rather than an address. (You can also use the ICANN lookup to look up the registration data for the domain, but an Internet search may show up warnings and alerts.)
The same advice applies for emails, and with email messages you need to be even more careful as links are more dangerous in HTML content than in plain text messages because they are hidden behind the link text. On a desktop device you may be able to see the URL when you hover the mouse pointer over the link text, but you can’t do this on a mouseless mobile device. You may be able to copy the link (using a context popup menu or a long tap) but you need to take care that you don’t accidentally follow the link by mistake.
Always be very careful about emails that encourage you to click on a link or open an attachment even if they seem to be sent from a legitimate source. Sender addresses are usually sent in the form “Display Name” email@example.com. The “display name” part can be set to anything. For example, “Some Public Health Body” firstname.lastname@example.org. So be careful not to trust the display name. Copy the domain part (after @) and paste it into a search engine to investigate it (bearing in mind the earlier advice about a search bar that doubles as an address bar).
The Dickimaw Books site has some functions that will send an automated email that may include a link. For example, if you report a bug and provide your email address for confirmation then you will receive a message informing you when your report is logged with a link to the topic page on the bug tracker. I’ve amended the template used for that message to additionally provide information on how to navigate your way to the topic page without clicking on the link. It’s less convenient but it’s safer.
Stay safe and practice both physical and digital hygiene.