image of dickimaw parrot with cookies in clouds

Clouds, Cookies and Migration Part 1: Migration

Once upon a time, a little parrot decided to migrate across the vast ocean to the cloud lands, with nothing more than a handful of cookies.

I switched web hosting provider in May, and have since made quite a few changes to the site. It all took much longer than I had originally intended. I’m sorry for any disruption, but everything should hopefully now be all sorted. One of the new additions is this blog, and it seems appropriate that this first post should explain the reason for all the changes, but first a little background about the origins of Dickimaw Books.

I had some tutorials on theoval.cmp.uea.ac.uk/~nlct that I had developed from some training courses for staff and postgraduates that I occasionally taught at the University of East Anglia (UEA) where I’m an honorary lecturer (that is, I’m not a full-time paid member of staff; I just do the odd little project or occasional short staff and PG training courses in LaTeX). I provided the online tutorials as a supplement to the courses, which I made available under the GNU Free Documentation License (GNU FDL). After a while I discovered that people from outside of the courses were reading and sharing them. Some people were also printing them because they preferred to read hard copies. The tutorials were gaining in popularity but there’s no guarantee that “theoval.cmp.uea.ac.uk” will continue to be available. It was originally “theoval.sys.uea.ac.uk”. It may have another name change, or may be retired. Most of all, it’s quite complicated to access the files on that server when off-site, which makes them hard for me to update. My honorary status is reviewed every few years and if it isn’t renewed I’ll lose my UEA account. I needed a domain name and server that I had more control over and easier access to.

I started to wonder about providing printed versions. Would people want to buy a book that’s freely available online? Free resources are often subsidised by adverts, but I find them intrusive. There are some sites so loaded with ads that it’s almost impossible to actually read the real page content. I’d rather not have ads on my site if possible, but without them I have to sell enough copies to cover the costs. It was a bit of a gamble.

If you’ve bought a copy or copies of my books, a big thank you. It’s because of you that this site is still going and free of ads.

I’ve already written about being an independent writer/publisher but I didn’t mention about web hosting. With only a small budget available, I had to opt for a low-cost shared hosting account. I started out with Hostgator’s “hatchling” account. I’ve been a computer programmer for over 25 years (my introduction to programming was a Modular 2 first year undergraduate course in 1988), and I had written some web pages and scripts before setting up my own site, but this was the first time I was responsible for an entire domain.

The shared hosting account came with an easy-to-use web application that a contact form could use to email me a message according to a template. Unfortunately it proved far too easy to use for spam-bots, and it also seemed to have a problem with messages that contained a backslash, so pretty much every LaTeX-related message was garbled. I replaced the forms with Perl CGI scripts since I was already familiar with programming in Perl. This fixed the problem of the garbled messages but the spam was still getting through.

I knew enough to be wary of bots trying to inject spammy links or malicious code and I’ve been on Usenet long enough to know about the existence of trolls, but I didn’t realise until then about the troll-bots whose sole purpose seems to be to seek out forms and post inflammatory comments based on certain text found on the page (such as “books”). At least, I’m assuming they were posted by bots. Any real person who can mistake a bug report form for a thought-provoking literary article has to be singularly devoid of intelligence.

(If I’m mistaken and it turns out they were posted by a real person rather than a bot, I do apologise if I caused any offence. I’ll rephrase it: people who deliberately write content with the express intent to insult or inflame while cravenly hiding behind the cloak of anonymity are the same kind of witless cowards who would, a hundred years ago, have been the type to write illiterate poison pen letters to their local community. The community is much wider these days, but it’s the same mentality.)

My first attempt to reduce the spam was to use Google’s reCAPTCHA. The premise seemed quite good initially. It’s a CAPTCHA-like system designed to check if the user is human (rather than a bot) whilst at the same time help with the digitization of books. It showed a wiggly, distorted word that you needed to confirm in the box (common in CAPTCHAs), and also a word scanned from a book, which you also needed to confirm. However, when I looked at the scripts with the reCAPTCHA at a later date I noticed that the scanned words had been replaced with photos of house number plates, which struck me as a bit creepy. I ended up removing them, and I tried other approaches, such as getting the user to confirm an ID (for bug reports and feature requests) or making the form multi-paged (which allows the form to be customized, depending on the initial settings, but also makes it a little harder for a bot to follow).

My original plan was just to provide a website to host the online versions of my books and related information, but I started to add more stuff and eventually decided to try e-commerce. I opted for the open-source osCommerce which is written in PHP. The advantage with an open source project is that I can modify the code to better suit my requirements. This was my first introduction to PHP, and at first I only made minor modifications to make it specifically a book store rather than a more general store (such as changing “manufacturer” to “author”). Later I tried to make it more mobile-friendly. The great thing about osCommerce is that the upgrades are provided in terms of differences between the old version and the new one. It’s less convenient than simply clicking on an upgrade button that will do everything for me, but it means I’m less likely to lose my changes.

If you happen to be planning on setting up your own online store, I recommend you get an expert to write up the legal documents (such as the terms and conditions). A few years ago my electricity supplier decided to launch a brand new site with user accounts to manage bills etc online. I dutifully started filling in the new account form and, being the pedantic person I am, I followed the “terms and conditions” link so that I could read it. (I don’t like ticking boxes to say I’ve read something when I haven’t even glanced at it.) I found myself on a page that was blank except for the title “Cookies”, so I contacted the company. I received an apology and the “correct” URL, which turned out to be their cookie policy page. I wrote back and asked for their actual terms and conditions page. Their reply effectively said that was their only terms and conditions and that I should stop fussing and just tick the check box. (They didn’t literally write that last bit, but that was the subtext.)

Terms and conditions are there to protect the site owner by imposing restrictions (don’t post nasty stuff on our site, don’t try breaking it, don’t sue us if it goes off-line occasionally). A site shouldn’t have to provide such obvious conditions. Society expects visitors to behave well: wipe your feet on the doormat, don’t smash the crockery or insult the other guests. The terms and conditions may go beyond that (which is why it’s a good idea to check them). A cookie policy (as well as a privacy policy), on the other hand, is legally required in countries with legislation such as the General Data Protection Regulation (GDPR) as it relates to your personal data.

My concern with that particular form wasn’t that the company wasn’t trying to impose any conditions on my use of the site. The problem was that it gave the impression that the site was designed by an amateur who didn’t know the difference between the two types of document. If they didn’t know that, how could I be confident that they knew how to securely store my data?

Trust is fundamental to business. It doesn’t matter how good your product is, a certain level of trust is required for people to buy from your store. This is particularly true for on-line small business that have a tendency to pop up and disappear. One of the things I realised I had to do was add an SSL certificate to the site. This wasn’t an option with Hostgator’s hatchling account. There was a button on the cPanel dashboard, but it just popped up a message saying I needed to upgrade my account in order to include SSL. I decided to upgrade to the business account, which included a free SSL certificate.

When I first started with Hostgator, the support channels included email and a ticketing system. It was only when my site encountered a problem in late 2017 that I discovered that some of these channels had gone. My Perl CGI scripts suddenly stopped working. The CTAN team alerted me to the problem. They have a tool that checks all the links on their site and the ones leading to my FAQ were now triggering an error code.

I soon discovered that some of the Perl modules that those scripts depended on were no longer installed. The most obvious solution was to reinstall them, but when I tried I received a rather unhelpful “an error has occurred” message. I modified all of the scripts so that they no longer triggered an error code and just displayed a message saying the particular function was currently unavailable while I investigated further. I couldn’t find a link to the ticketing system so I tried to send an email to the support address but received an automated reply telling me to phone or use the on-line chat. I didn’t fancy being stuck in a trans-Atlantic telephone queue so I tried the on-line chat. The operator didn’t know anything about Perl and eventually gave up and apologised that the technical support staff were all currently unavailable. Please try again later. I found on the forum that I wasn’t the only one experiencing this problem, but there were no solutions offered.

That was when I first considered moving to a new web hosting provider. I started looking around and noticed that many of the shared hosting accounts seemed to use cPanel as a convenient web application to allow site administrators to access the site files, email accounts, databases etc. I investigated cPanel and found that a new version had recently come out. There was a note advising that following the upgrade it may be necessary to reinstall some Perl modules. It therefore seemed likely that it was a cPanel update that had caused the modules to disappear. In which case, if the problem was with this new version of cPanel then moving web hosting provider may not help. In the end, I decided to stick with Hostgator and replace the Perl CGI scripts with PHP. This turned out to take a lot longer than I had originally anticipated due to other more urgent commitments.

A few months ago I started wondering about adding a blog to this site. I was previously using the author blog on GoodReads, but I don’t have much control over the interface, and I’ve started to become wary about putting content on third-party sites. I investigated WordPress (another open-source web application written in PHP) and considered adding it to my site. The business shared hosting account provided the option for sub-domains. By this time, I was starting to get quite adventurous. Perhaps I could have a sub-domain for the shop and another sub-domain for the blog.

I started with the shop. I made a temporary index page for dickimaw-books.com/shop to say that it was currently off-line and set to work moving all the files over to a sub-domain. I ran through some test transactions and everything seemed to work fine. Then I decided that I ought to review all the security settings just to make sure everything was as tight as it could be.

The shop doesn’t store any financial details. It works by transferring the customer to PayPal to perform the actual payment. The shop, therefore, has to communicate with PayPal to provide the transaction details (customer name, invoice address, shipping address, cost etc). PayPal, in turn, has to send a message back to the shop to say the transaction has been completed. The customer is returned to the store, which triggers the confirmation email and the store empties the customer’s basket. The communications between the store and PayPal have to be made securely (in case anyone attempts to intercept them and also to prevent any naughty customer from trying to alter the total cost).

While reviewing all the settings I noticed that “verify SSL” was set to false and that looked like the kind of thing that ought to be on. So I switched it to true and ran a test transaction. An error message appear when I was returned from PayPal’s sandbox account to the store. This was rather annoying. So I copied the message into a search engine to find a solution. I encountered a lot of questions about the problem but the only “solution” offered was to switch off the “verify SSL” setting. This didn’t strike me as a particularly good answer.

Near the “verify SSL” setting there’s a “test connection” button, so I tried that out. It displayed a red “failure” when the setting was on and a green “success” when it was off. The most obvious next step was to find out exactly what that test did. It took a bit of rummaging around the PHP code, but I finally discovered that it was calling curl (client URL, a command-line tool for transferring data). I modified the code so that on failure it would also provide curl’s error message. This gave me a new bit of information in my search, and this time I finally found an answer. My version of curl was likely too old. I found the version number (7.19.7) and looked it up in curl’s release page. It was over 9 years old with 40 known vulnerabilities. Definitely time for an upgrade.

So I was back on Hostgator’s chat. “An operator will be with you in 5 minutes.” Fifteen minutes later I’m finally connected with someone. I was staggered by the response to my request to upgrade curl: sorry, that option isn’t available for the shared hosting accounts. “I hope you understand” the operator concluded. I reiterated it’s over 9 years old with 40 known vulnerabilities but I received the same response again concluded with “I hope you understand”.

Quite frankly, I don’t understand. I think it’s reprehensible of a web hosting provider to not have regular updates for common tools that form an essential part of web security, particularly for a business account that’s marketed as being suitable for e-commerce.

I replied that, in that case, I would find a new web hosting provider and did the internet equivalent of slamming the phone on the hook: I clicked on the close button. So now I really had to migrate and I once again began investigating alternatives.

This was back in May and my news feed had recently included articles about fake reviews, which made me quite wary. Was the 5 star “X is brilliant” written by a genuinely happy customer of X or was it written by someone paid by X? Was the 1 star “X is awful so I switched to Y who are brilliant” really written by a dissatisfied customer of X or was it written by someone paid by Y to boost Y and slate the opposition? How old is the review? Companies can improve or deteriorate over time. An old low rating could’ve prompted them to change for the better.

I decided not to be strongly influenced by ratings and reviews but instead draw up a list according to my requirements and find out from each company in turn what version of curl they had installed. Given the way software updates usually work, the chances are that if curl was up-to-date then so would the other tools I also require.

So, what are my primary requirements? Shared hosting (small budget), Linux (I don’t want to waste time learning how to use an unfamiliar operating system), apache (web server), MySQL (for my databases), PHP (web scripts), Perl (web scripts and non-web command line tools for maintenance etc). That doesn’t really narrow the list as this is quite a common set of requirements.

In order to filter the list further so that I had an easier starting point, I decided to add a secondary requirement that I was willing to drop if I couldn’t find a satisfactory fit: a company with a UK base so I didn’t have to worry about international helplines, currency conversion or bank fees for foreign transactions.

In the end I opted for TSO Host. The chat operator was quick to respond, helpful, and provided me with a curl version number that I checked and found to be less than 2 months old with no known vulnerabilities. I had the option to select either a cPanel account or a cloud account. There are three equivalent packages for each type of account. The price varies according to the package but doesn’t depend on whether you opt for cPanel or cloud. Since I was already familiar with cPanel the operator suggested I might prefer that, which I originally agreed about, but if I changed my mind I could switch to the cloud.

The next post deals with migrating from Hostgator’s cPanel single server to TSO Host’s cloud server cluster.