Book Shop Closed Indefinitely Due to PayPal Removing Support for Encrypted Website Payments

The Dickimaw Books store has unfortunately closed until further notice. The reason is that PayPal has removed support for encryption with its PayPal Payments Standard option, where an online store redirects the customer to PayPal’s site to make the payment. PayPal still provides this payment option, but the store will now only work if I switch off encryption, which I’m not prepared to do.

For those who want more detail, the way that this works is as follows. The customer adds products to the basket and proceeds through the checkout process until they arrive at the final checkout page that confirms the price of each item, any discount applied, postage and packaging, final total, invoice address and shipping address. All this information needs to be sent to PayPal so that the correct amount can be charged. Once the transaction is successfully completed, PayPal then sends a notification back to the store to confirm that the payment has been made.

Without encryption, the transaction data at the checkout page is contained in plain text within the form parameters and is sent as plain text to PayPal when the customer clicks on the continue button.

There are two problems with using plain text. The first is that these private details about the customer and their transaction can be intercepted by a third-party eavesdropper.¹ The second is that a dishonest customer can open the developer tools in their web browser and alter the payment details, awarding themselves a hefty discount and defrauding the merchant. Under those circumstances, it’s hard for the merchant to prove that they didn’t have the products temporarily listed at a lower price when the transaction was made.

Encryption helps to protect both the customer’s private details and the merchant. The way that this is done is through public/private key encryption. At the checkout page, all the transaction details are stored within a single form parameter with an encrypted value. This prevents any tampering and also protects the data when it’s transmitted.

There is a two-way communication between the merchant’s site and PayPal. In order for the encryption to work, the merchant’s store needs a copy of PayPal’s public certificate (which the merchant used to be able to download from their PayPal business account). PayPal, in turn, needs the merchant’s public certificate. The encryption and decryption can’t be performed without a valid public/private key pair.
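For illustration, the general technique looks something like the following PHP sketch, which signs the cart data with the merchant’s private key and then encrypts it with PayPal’s public certificate. The file names and form values are hypothetical, and the extra massaging of the S/MIME output that PayPal’s actual recipe required is omitted, so treat this as a sketch of the public/private key idea rather than a drop-in implementation.

<?php
// Sketch only: sign the cart data with the merchant's private key, then
// encrypt it with PayPal's public certificate. Only PayPal can decrypt it,
// and only the merchant could have signed it.
$cartData = implode("\n", [
    'cmd=_cart',
    'business=merchant@example.com',              // hypothetical merchant ID
    'item_name_1=The Private Enemy (paperback)',
    'amount_1=9.99',
    'currency_code=GBP',
]);

$dataFile      = tempnam(sys_get_temp_dir(), 'ewp');
$signedFile    = tempnam(sys_get_temp_dir(), 'ewp');
$encryptedFile = tempnam(sys_get_temp_dir(), 'ewp');
file_put_contents($dataFile, $cartData);

// Sign with the merchant's certificate and private key (PEM files).
openssl_pkcs7_sign($dataFile, $signedFile,
    'file:///path/to/merchant-pubcert.pem',
    'file:///path/to/merchant-prvkey.pem',
    [], PKCS7_BINARY);

// Encrypt the signed data with PayPal's public certificate (the one the
// merchant used to be able to download from their business account).
openssl_pkcs7_encrypt($signedFile, $encryptedFile,
    file_get_contents('/path/to/paypal-pubcert.pem'),
    [], PKCS7_BINARY, OPENSSL_CIPHER_3DES);

// The resulting blob goes into a single hidden form field instead of the
// individual plain-text cart parameters, so it can be neither read nor
// tampered with on its way to PayPal.
$encrypted = file_get_contents($encryptedFile);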

Certificates have an expiry date. This is a precaution in case the private key is stolen. Whilst stolen keys can be revoked, there’s a chance that the theft may go unnoticed. An expiry date at least limits the length of time a stolen key can be used for.

The certificate for the Dickimaw Books store expired last Sunday. I had set myself a reminder to create a new pair and did so the day before, but when I tried to upload the new public certificate to PayPal, I encountered a 404 page not found error. I raised an issue with their merchant technical support and was informed that the encrypted option was no longer available. The checkout will now only work if I disable the encryption from the store’s admin page.

I have no idea why PayPal would intentionally remove a security feature, particularly without giving any prior warning. This will obviously impact all small merchants who use this method, although they may not discover this until their certificate expires and they try to upload a new one. I’m hoping that this issue will turn out to be a miscommunication within PayPal’s technical support department and an inadvertent broken link. Until they restore the ability to use encryption or until I find an alternative payment provider, the store will remain closed.

Meanwhile, if you want to purchase any of my paperback books, you can buy them from a third-party bookseller, such as Amazon.

¹Using https instead of http does, of course, add a layer of protection, which helps to guard against eavesdropping, but it doesn’t protect against fraudulently altering the information before it’s sent.

Good Bots and Bad Bots

You’ve probably come across websites that want you to prove that you’re human and not a robot. This may come in the form of a picture challenge (for example, select all the squares with bicycles) or it may simply require you to check a box to assert that you’re not a robot. Perhaps you’re wondering why you need to do this. Why is the website so concerned about being visited by robots? Alternatively, perhaps you’re a website developer and are determined to find a way to keep out all bots.

What is a bot? Are all bots bad?

As with cookies, bots are important tools in the digital world. However, as with cookies, bots can also be used for unwholesome purposes.

“Bot” is short for robot and is simply a piece of software (an application) that visits websites. A bot may follow one link after another, crawling through pages across the World Wide Web. For this reason, bots are often called “crawlers” or “spiders”.

Good Bots

If you go to your favourite search engine and type in a keyword or phrase (or use a voice activated request on your mobile device) then the results usually come up fairly quickly. This is only possible because the search engine has an index that has been compiled by bots that have followed link after link, gathering information. Without this index, it would take a very long time to scour all the millions of pages that make up the web to find something relevant.

Not all bots are crawlers. For example, Facebook has a bot that’s used when a post contains a link. The bot is used to check that the link exists and it reads any Open Graph markup. This allows Facebook to include an image and short excerpt to arouse the interest of anyone who views the post. Unlike the search engine bots, this bot doesn’t roam free about the Internet but instead restricts itself to links posted on Facebook pages.

Well behaved bots commonly identify themselves in the user agent string in the form:

bot-name/version URL

For example, “facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)” identifies the Facebook bot (facebookexternalhit), its version number (1.1) and a way of finding out information about the bot.
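If you need to recognise these bots in your own server-side code, a minimal PHP sketch along the following lines will do. The token list is an illustrative assumption; each bot publishes its own identifier.

<?php
// Sketch: spot well-behaved bots from the tokens they put in their
// user agent strings. The list here is illustrative, not exhaustive.
function is_known_bot(string $userAgent): bool
{
    $tokens = ['googlebot', 'bingbot', 'facebookexternalhit', 'twitterbot'];
    foreach ($tokens as $token) {
        if (stripos($userAgent, $token) !== false) {
            return true;
        }
    }
    return false;
}

if (is_known_bot($_SERVER['HTTP_USER_AGENT'] ?? '')) {
    // For example, skip visitor statistics or serve a lighter page.
}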

So these are useful bots that help users to discover interesting sites.

However, even the good bots don’t always honestly identify themselves. For example, if you post a link in the Signal Messaging app, the bot used to fetch the preview information identifies itself as WhatsApp, and this basically seems to be a rehash of the “all browsers identify themselves as Mozilla” problem.

Not So Good Bots

Although the crawlers used by search engines are useful, some crawlers that index sites to provide certain types of information for their users (who may require free or paid accounts to access it) can be a nuisance because they’re not well-behaved. For example, they may not follow the robot instructions stipulated by the website (robots.txt), they may try to access pages that are only intended for human visitors or they may hit the site so hard (that is, they look up pages so fast) that they slow down the site and it becomes unusable for everyone else.

This could be because the bot’s developer made a mistake (a bug in the bot’s code or an inexperienced programmer) or it could be because the developer simply doesn’t care and wants the information quickly regardless of the inconvenience to others (perhaps to satisfy the demands of paying customers). In the long run, this is counter-productive as it will lead to the bot (which is identified in the user agent string) being banned.
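For reference, the robot instructions live in a plain text file called robots.txt at the site’s root. A minimal example might look like the following (the paths are hypothetical, and not every crawler honours non-standard directives such as Crawl-delay):

User-agent: *
Disallow: /admin/
Disallow: /basket.php
Crawl-delay: 10

User-agent: BadBot
Disallow: /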

Scrapers

Web scraping (or harvesting) is when a bot extracts data from a webpage. In the earlier case of search engines and social media, this data may just be keywords or phrases or the URL of the page’s image, but some bots are designed to gather all the information from a page in order to reproduce it verbatim on another site. This is often done to lure visitors to a copycat site, which will most likely be stuffed full of adverts and tracking (which makes a profit for its owner). This is usually a violation of intellectual property rights. Even where the original page is available under a permissive licence, attribution is usually required but is often omitted. This happens a lot with question and answer sites, such as Stack Exchange, and with forums.

These bots may well have the user agent string empty or set to the default value for the given API that they are built with.

Trolls and Spambots

These are the types of bot that pages requiring you to identify yourself as a human are mostly trying to block. The user agent string is typically set to a common browser and platform to make the bot appear to be a human visitor. These bots search for forms to fill in, such as contact forms to send spam messages or comment forms to advertise dubious products and sites.

While spambots are the digital equivalent of fly-posters, trollbots are the equivalent of poison-pen letter writers. They are created by individuals who take a puckish delight in causing hurt and discord. These bots are designed to search for certain keywords on a page and craft an offensive or divisive comment that relates to the topic. The creators of these bots may have a particular hatred towards a certain group of people, but they can also be chaotic nihilists with a set of offensive comments for every group.

The expression “don’t feed the trolls” has been around for a long time. I remember first encountering it on Usenet back in the early 1990s (accompanied by some ASCII art). It’s very good advice. Don’t give trolls the attention that they are looking for, but, in some cases, the troll posting the offensive comments isn’t human. It’s a bot that has no ability to reason, no feelings, no embarrassment. Its function is solely to post content that its creator programmed into it.

Chatbots can come under both this category and the next. Chatbots in general are just a tool that simulates conversation, and are often used for legitimate services, such as online help, but they are also used by criminals to deceive people. For example, a fraudster might create a fake account on an Internet dating site and use a chatbot to hook victims who believe they are chatting with a human. Once the chatbot has gained the victim’s trust, the fraudster takes over.

Malware Bots

The worst of the bad bots are the ones created by cyber-criminals and they are designed to wreak havoc, stealing data and installing malware. These bots look for dynamic web pages that use parameters and will try to inject malicious code into the parameter values.

For example, the page https://www.dickimaw-books.com/booklist.php?book_id=11 has a parameter (book_id) that identifies a particular edition of a book. (In this case, the second paperback edition of The Private Enemy.) The parameter value (11) uniquely identifies this edition in the database that contains all the title information.

A malicious bot will try altering the parameter value to break into the database. For example, it may start out by simply appending an apostrophe (book_id=11'). If this triggers a syntax error then the site is vulnerable to SQL injection and the bot can then try something far nastier to access the contents of the database.
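As a rough illustration (the query, table name and database handle are hypothetical, using PHP’s PDO), the vulnerable pattern and its prepared-statement fix look something like this:

<?php
// VULNERABLE (illustration only): the raw parameter is concatenated into
// the SQL, so book_id=11' becomes ... WHERE book_id=11' and the stray
// apostrophe triggers the syntax error the bot is probing for.
$result = $db->query('SELECT * FROM books WHERE book_id=' . $_GET['book_id']);

// Safer: a prepared statement keeps the value separate from the SQL,
// so an apostrophe is just data, not syntax.
$stmt = $db->prepare('SELECT * FROM books WHERE book_id = ?');
$stmt->execute([$_GET['book_id']]);
$result = $stmt->fetchAll();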

Another possibility is that the parameter value may be printed on the web page, so the bot will try replacing the value with JavaScript. For example, the bot may start out with a simple alert. If the bot detects an alert box then the site is vulnerable to cross-site scripting (XSS) and the bot can try something more damaging.
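A minimal sketch of this, assuming the page echoes the parameter back in a “not found” message:

<?php
// VULNERABLE (illustration only): echoing the raw value lets any injected
// <script> markup run in the visitor's browser.
echo 'No book found with id ' . $_GET['book_id'];

// Safer: escape special characters before output.
echo 'No book found with id ' . htmlentities($_GET['book_id'], ENT_QUOTES, 'UTF-8');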

Or the parameter value may be the name of a template file, which is used for the main body of the web page, so the bot will try replacing the parameter value with /etc/passwd (or ../etc/passwd etc) in order to trick that web page into revealing the contents of the password file instead.
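One way to defend against this is a whitelist of permitted template names, so anything else (including ../../etc/passwd) is rejected outright. A minimal sketch, with hypothetical parameter and template names:

<?php
// Only templates on this list may ever be included.
$allowed = ['booklist', 'news', 'contact'];

$template = $_GET['page'] ?? 'booklist';
if (!in_array($template, $allowed, true)) {
    http_response_code(404);
    exit;
}
// Only files we put in the templates directory ourselves can be reached.
include __DIR__ . "/templates/{$template}.php";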

Bad bots can also disrupt a website by repeatedly accessing pages in rapid succession (a denial of service attack or, where an army of bots are working together, a distributed denial of service attack). This can make the site completely inaccessible to anyone else.

These types of bots rarely identify themselves honestly. The user agent string is typically empty or contains a common browser and platform combination (as with the trolls and spambots). I’ve also encountered attempts at SQL injection where the user agent string was the same as the aforementioned Facebook bot. At first glance, that gives the impression that a Facebook bot has gone rogue (or followed a bad link), but the IP address was registered somewhere in Russia, which seems an unlikely origin for a Facebook bot. So bad bots not only pretend to be human, they also try to pass themselves off as legitimate bots.

Sometimes the user agent string will contain “sqlmap”. This is a legitimate penetration testing tool. However, in many jurisdictions, penetration testing may only be performed by mutual consent between the pen tester and the website owner. If you are a website developer and your organisation has hired a pen tester, don’t simply block bots with this user agent string: the test needs to reach the site to be useful, and most genuinely bad bots don’t conveniently identify themselves anyway. If a pen tester hasn’t been engaged, then the tool is being used illegally (which is par for the course with criminals).

So, if you’re a website developer and you want to stop bad bots, remember that you can’t rely on the user agent string: bots pretend to be human, and some humans blank their user agent string for privacy reasons. The first line of defence is to filter input (for example, ensure that a value that’s supposed to be numeric really is a number), escape special characters on output (for example, with htmlentities) and use prepared statements for database queries.
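For example, a numeric parameter such as book_id can be validated before it goes anywhere near the database; a minimal sketch:

<?php
// Reject anything that isn't a positive integer.
$bookId = filter_input(INPUT_GET, 'book_id', FILTER_VALIDATE_INT,
    ['options' => ['min_range' => 1]]);

if ($bookId === false || $bookId === null) {
    http_response_code(400);
    exit('Invalid book id');
}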

If you’re just a regular website user, don’t assume that every comment you read was actually posted by a human and, while captchas may be frustrating, your web browsing experience may be far worse without them.


Update 2021-08-08: added paragraph on Signal in Good Bots section and paragraph on chatbots in Trolls and Spambots section.

Be Careful of Message Links

UK mobile networks are sending a “stay at home” message to everyone in response to the current nationwide lockdown. While the link in that specific message is safe, don’t click on links in text messages. It’s very easy for scammers to fake that message and replace the safe link with their own nasty version. It doesn’t take long to type “gov.uk” into the address bar of your browser and you can follow the appropriate link from that site’s home page.

Don’t click on links in text messages. Get into the habit of not clicking links, even when it’s safe. There’s been a rise in scams and phishing attempts that prey on people’s fears. Please do take care.

If you’re unsure about whether or not a web address is genuine, type it into the search box of your favourite search engine. If the search box is also an address bar (as is the case for some browsers), you need to make sure it doesn’t get interpreted as a URL, which would take you to the site rather than allow you to investigate it first. For example, if you get a link to “example.com/important-info” then type something like “what is example.com” or “who is example.com” or “who owns example.com” as your search term. That should hopefully ensure that it’s interpreted as a search rather than an address. (You can also use the ICANN lookup to look up the registration data for the domain, but an Internet search may show up warnings and alerts.)

The same advice applies to emails, and with email messages you need to be even more careful: links are more dangerous in HTML content than in plain text messages because the URL is hidden behind the link text. On a desktop device you may be able to see the URL when you hover the mouse pointer over the link text, but you can’t do this on a mouseless mobile device. You may be able to copy the link (using a context popup menu or a long tap), but you need to take care that you don’t accidentally follow the link.

Always be very careful about emails that encourage you to click on a link or open an attachment, even if they seem to have been sent from a legitimate source. Sender addresses are usually sent in the form “Display Name” <user@example.com>. The “display name” part can be set to anything. For example, “Some Public Health Body” <scammer@baddomain.com>. So be careful not to trust the display name. Copy the domain part (after the @) and paste it into a search engine to investigate it (bearing in mind the earlier advice about a search bar that doubles as an address bar).

The Dickimaw Books site has some functions that will send an automated email that may include a link. For example, if you report a bug and provide your email address for confirmation then you will receive a message informing you when your report is logged with a link to the topic page on the bug tracker. I’ve amended the template used for that message to additionally provide information on how to navigate your way to the topic page without clicking on the link. It’s less convenient but it’s safer.

Stay safe and practise both physical and digital hygiene.