RSS Feeds and Other Notifications

Did you know that the Dickimaw Books site has a number of RSS feeds? These are links denoted with RSS Feed. This means you can pick up the latest news in your news aggregator. Some mail applications, such as Thunderbird, have an aggregator function.

The main RSS feed can be found on the banner at the top of the pages on the main part of this site (outside of this blog and the shop). If you have hidden that area on the banner using the settings page, then you can find the RSS feed link on the news page. That main feed is for the brief news items, such as updates to the site or software or new articles. Each news item is usually no more than one or two sentences.

The feed for this blog can be found at the top of the blog side bar. This picks up any new blog post and includes the entire post in the feed. There are also feeds for individual blog categories. For example, if you go to the “site” category, there’s a link to the category feed after the category title. If you subscribe to that feed, then you will just get the new posts for that particular category.

There isn’t a specific RSS feed for the shop, but if you have a store account then you can sign up for the shop newsletter. This doesn’t go out very often, just when there’s some particular news that’s specific to the store, such as new products, store closures or promotional offers.

If you don’t have a store account but you want to be notified when a new book is released then you can sign up for new book alerts.

So there are a number of different ways of picking up the latest news on this site. Alternatively, you can just drop in on the site from time to time and look at the recent news page.

Sticky Hamburgers

The previous posts covered the migration of this website to a cloud cluster server and the site cookies (strictly necessary and optional). The optional cookie is only set if you choose to change the default site settings. This post describes those settings in more detail.

The settings page allows you to adjust various stylistic options on this site. They’re not used on every page, but if you visit the site regularly you may prefer to make some adjustments to better suit your device and taste.

Image of a page on the Dickimaw Books site using the default settings

The main part of this site has a yellow and green theme to match the logo’s yellow parrot holding a green book. The main banner shows the site name “Dickimaw Books” with the logo on the right. At the bottom of the banner is the cookie notice and some icon links to the RSS feed and social sites. Both the cookie notice and those icon links can be hidden by changing the default settings. (This won’t usually make a noticeable difference to the overall banner height.) If you hide the cookie notice for the main site it will also hide the cookie notice on this blog.

Below the banner is the main navigation bar (with a green background and yellow text) and below that is the area where the latest news item is displayed.

If the browser window is wide enough (over 600px) there’s a side panel on the right with additional links related to the current page. This right-hand panel will stay in view as you scroll down the page. However, if the browser window is narrow (≤600px), the panel will be moved to the area below the latest news item and will scroll off the screen (along with the banner, main navigation bar and latest news) as you move down the page. This secondary context-sensitive navigation area now starts with a “Skip to main content” link, which will take you to the start of the main page content (below the navigation area) to make it easier to skip past a potentially long navigation list.

Image of site page displayed in a narrow window.

If you have a wide screen but require a large font then you may prefer to have the context navigation panel below the recent news (as with a narrow screen) instead of on the side of the page. In which case, if you go to the settings page and scroll down to the “page style” area then you can switch from the default style to the “without side panel” style (shown below).

Image of site page using the “without side panel” style.

Alternatively, you may prefer to hide the context sensitive panel until you need to use it. This requires a button that can show or hide the side panel. This type of button is sometimes called a “hamburger” button because it often has an icon with three horizontal bars (representing the items in the navigation list or menu) that look a little like a stylised hamburger. Buttons that require action but don’t submit information to the webserver require client-side code to perform the action. This means that if you want to use the hamburger page style you need to allow JavaScript. Most browsers support JavaScript but some people prefer to disable it out of security concerns. So if you have disabled JavaScript, don’t use the hamburger page style.

The hamburger style also makes the main navigation bar sticky (it sticks to the top of the window). The hamburger button is placed at the right hand side of this bar so you can still access it as you scroll down the page (see image below).

Image of site page with green navigation bar stuck to the top of the window.

Perhaps you don’t like the yellow and green, in which case there are plain versions for all of the above, which just have a grey border. For example, the plain style with side panel is shown below. (This setting will also change the colour scheme used by this blog.)

Image of site page in plain style with side panel.

Some of the images are quite large, especially in the shop, gallery and book list. If you have a narrow screen or limited bandwidth you may prefer smaller images. In which case you can select the small images option in the settings page. Below shows the standard size image in a narrow screen compared with the same page using the small image setting.

Image of page in narrow window with image wider than the window. Image of same page as before but with a smaller image.

If you like to have little icons, there are a selection you can choose from. Some of the icons are images, designed for either the yellow/​green or the plain styles. For example, below shows the yellow/​green sticky hamburger style with image icons in the main navigation bar (following each associated text link) as well as for the bullet points in the context sensitive navigation list (not currently supported on all pages). Note also that in the main body of the page the “available in store” link is now followed by the chosen shop icon and the permalink icon has also been changed to an image icon. The side panel also has an icon to hide it (or you can click on the hamburger button again).

image of site page with image icons

Most of the available icons that you can select are actually Unicode characters. Their appearance (and whether or not they are supported) depends on the font used by your browser. For example, below shows the page using coloured Unicode pictographs as the icons (in the main navigation bar and the permalink).

Image of site page with coloured Unicode pictographs.

The same page with the same settings in a different browser with a different font is shown below. In this case the font doesn’t have coloured pictographs but instead has symbols that just use the current text font.

Image of site page in different browser.

Not all fonts support all the possible symbols. Unavailable symbols are usually depicted with an empty rectangle or a boxed question mark.

For a more compact layout you may prefer to hide the text in the navigation bar. The image below shows the plain sticky hamburger style with no navigation text so the navigation only consists of pictographs. (It also has the small image setting on.)

Image of site page with no navigation text.

The pictographs mostly relate to the topic (such as the email symbol U+1F4E7 📧 for the link to the contact page) but for those of a more fun-loving nature there are also more frivolous symbols available.

For example, the image below has the “About” icon set to the U+1F99C🦜 parrot character, the “Shop” icon set to the U+1F986 🦆 duck character, the “LaTeX” icon set to the U+1F981 🦁 lion face character, the “Software” icon set to the U+1F427 🐧 penguin character, the “Books” icon set to the U+1F989 🦉 owl character, the “Gallery” icon set to the U+1F99A 🦚 peacock character, the “News” icon set to the U+1F993 🦓 zebra face character, the “Contact” icon set to the U+1F40C 🐌 snail character, the “Blog” icon set to the U+1F99B 🦛 hippopotamus character, the “Settings” icon set to the U+1F984 🦄 unicorn face character, and the menu icon set to the U+1F995 🦕 sauropod character. The close icon for the side panel is the U+1F996 🦖 t-rex character. The page also has the U+1F418 🐘 elephant character for the permalink and the context-sensitive navigation list has the U+1F95A 🥚 egg and U+1F423 🐣 hatching chick characters for the top-level, and the U+1F41B 🐛 bug and U+1F98B 🦋 butterfly characters for the second level.

Image of site page with frivolous icons.

The settings are stored in a cookie. You can opt for a session cookie, which your browser should delete at the end of the session (but remember from the previous post that some browsers use session restoring) or you can opt for a persistent cookie. Once the cookie expires or if you use your browser privacy settings to delete it, the site will revert back to its default style.

Clouds, Cookies and Migration Part 3: Cookies

Once upon a time, a little parrot decided to migrate across the vast ocean to the cloud lands, with nothing more than a handful of cookies.

This is a continuation of the story about this site’s migration to a cloud cluster web hosting account. The first part described what prompted the move, and the second part described the move from a single server to a cloud cluster.

This site now has two strictly necessary cookies. It also has one optional cookie that’s only set if you explicitly request it (in the settings page). It’s hard to visit a site these days without having a box popping up on the screen about cookies, and cookies have had a lot of bad press because they’re often used for tracking. However, despite the number of sites screaming at you about their cookies, they’re still a bit of a mystery to some Internet users. It’s some secret thingummy that follows you around, spying on your every move.

When you click on a link, select a bookmark or type an address in your browser’s location bar, your browser sends a request to the domain’s server for the contents of the particular page. The returned content is usually (but not always) in a format called hypertext markup language (HTML). For example, suppose you visit www.example.com/test.html then this corresponds to a file called test.html that’s on example.com’s server. In this case, all the server has to do is send the contents of this file to the browser. This is a “static page” because it’s always the same (until someone edits the test.html file).

Now suppose you visit www.example.com/test.php then this corresponds to a file called test.php. This is a script (application) that when run by the server creates the HTML content that needs to be sent to the browser. This is a “dynamic page” because the content may change depending on certain conditions.

The hypertext transfer protocol (HTTP) used by web pages (both static and dynamic) is stateless. That means a web page can’t remember anything, so it has to be repeatedly retold everything it needs to know. (Your browser may remember the information you’ve typed into forms to help autofill if you need to fill in the form again, but the server doesn’t remember.)

Image of text greeting guest and prompting for guest’s name.

Suppose Alice visits www.example.com/​​test.php. Her browser asks the example.com server for test.php and that script sends back the text “Hello, guest. What’s your name?” with a box for Alice to supply her name. She does this and clicks on the submit button.

Image greeting Alice and asking for her favourite colour.

Her browser now asks the server for test.php along with the information that the “name” parameter has been set to “Alice”. This time the script sends back the text “Hello, Alice. What’s your favourite colour?” with a box for Alice to specify her favourite colour. So she enters “purple” and clicks the submit button. This time her browser asks the server for test.php along with the information that the “colour” parameter has been set to “purple”. Now the script knows the colour but it’s forgotten her name.

The usual method is for the script to include the information it’s already been supplied with as hidden parameters in the new form so that the user doesn’t have to keep retyping the same information. So let’s start again.

The browser asks the server for test.php with no further information, so the script sends its default message “Hello, guest. What’s your name?” with a box for the name. Alice fills it in and clicks on the submit button. The browser then asks the server for test.php along with the information that the “name” parameter has been set to “Alice”. The script sends back the text “Hello, Alice. What’s your favourite colour?” with the colour box, but it also includes markup that defines a hidden parameter called “name” that’s set to “Alice”. Hidden parameters aren’t visible on the page but now when Alice clicks on the submit button her browser asks the server for test.php and tells it the name is “Alice” and the colour is “purple”. So this time the script knows both her name and her favourite colour and sends back the text “Hello, Alice” with an instruction that the browser should display her name in purple. The next time Alice visits test.php she’ll be greeted with the default message, because the script has forgotten the information she previously provided.

There are two types of parameters: “get” and “post”. The get parameters are included in the web address. For example, www.​​example.​​com/​​test.php?​​​​name=​​Alice&​​colour=​​purple (the question mark separates the address of the script and the parameters that need to be supplied to the script). If Alice sends this address to Bob and he clicks on the link, he’ll be greeted with the message “Hello, Alice”.

Post parameters aren’t included in the web address. They can only be sent when you submit a form.¹ If, after completing the form, Alice shares the page with Bob and he clicks on the link then he’ll be at the start.

Now let’s suppose that both Alice and Bob visit dickimaw-books.com/shop. Alice adds a copy of The Foolish Hedgehog to the basket, and Bob adds a copy of LaTeX for Complete Novices to the basket. Both of them click on the “Cart Contents” link which leads them to dickimaw-books.com/shop/shopping_cart.php but how does the shopping_cart.php script know that it needs to show Alice a link to The Foolish Hedgehog and Bob a link to LaTeX for Complete Novices?

Each visitor is assigned a session ID, which is a long sequence of letters and numbers that uniquely identifies the visitor. The server stores this ID in a database along with the shopping cart contents, but it needs to be told the ID along with the page request so that it can match it up with the ID in its database.

If the session ID is sent as a parameter then it can be sent as a hidden post parameter in a form (if Alice or Bob click on a button) but it would have to be sent as a get parameter (part of the address) if they click on a link.

Suppose Alice clicks on a link to Quack, Quack Quack, Give My Hat Back! and decides that Eve might be interested in it, so she shares the link with Eve. Eve clicks on the link but it includes Alice’s session ID as a get parameter, so the shop thinks that Eve is Alice. If Alice happens to be logged in then Eve will find herself logged in as Alice.

Ooh, thinks Eve. I wonder what kind of things Alice has been ordering? Eve clicks on the order history link, and the script sends her a list of Alice’s orders because Eve is using Alice’s session ID. Ooh, thinks Eve. Alice ordered a copy of The Private Enemy and sent it to Bob. Oh so that’s his address. I think I’ll pop round this evening to find out why he’s been avoiding me. Won’t he be surprised to find out I know he’s received a present from Alice!

A more secure solution is for the server to tell the browser to save the session ID and send it along with any requests for pages on that website. That way the ID doesn’t get included in any links so it won’t be accidentally shared. (It could be intentionally intercepted by an eavesdropper but that’s another story.) So when Alice now logs into her account the server creates a new session ID and asks the browser to save it. The browser writes the ID in a small file (on Alice’s computer or mobile device) that it associates with the website’s domain (dickimaw-books.com in this case). That file is called a cookie.

This particular type of cookie (used to store a session ID) is normally a session cookie, which means that it expires at the end of the session (that is, when the browser is closed). It’s the browser’s responsibility to delete expired cookies (although you can explicitly delete cookies through the browser’s privacy settings). If the site has a “remember me” option then a persistent cookie is needed instead. Also, some browsers use “session restoring” which makes the session cookies permanent, so it’s a good idea to check your browser configuration.

This site has two session cookies: one that contains the session ID for the shop and one for load balancing. The shop cookie is only valid on the dickimaw-books.com/shop path, not across the entire site (so the browser will only send it while you’re visiting the shop). The load balancing cookie helps to distribute the visitors to this site across the servers in the cluster.

A persistent cookie is one that stays around after the session has ended. This may have an expiry date and again the browser should delete it once that date has passed. This site has a persistent cookie (called “dickimaw_settings”) that will only be set if you change the default site settings. When you return to the site, your browser will be able to tell the server your preferences (which are stored in the cookie) so you don’t have to keep changing it whenever you visit.

If I enable comments for this blog, there will be additional cookies so that you can create an account and log in to post comments or modify your preferences. These are covered in the blog’s privacy policy in case I decide to enable blog commenting at some later date. Private areas of the site have other cookies, but they’re private so you shouldn’t be going there anyway.

The load balancing and session ID cookies are known as “strictly necessary” cookies because they’re needed to ensure that the website functions efficiently and to help protect you against accidentally sharing your session ID. The cookie that stores your preferences is an optional cookie because the site is able to function without it.

Some sites only have strictly necessary cookies and so have a box informing you of this and all you can do is dismiss the box (by clicking on “okay”, “got it!”, “close” or whatever). Some sites have optional cookies so the notification box will ask you if you want those optional cookies or not.

I once visited a site where I clicked on “no thanks” because I was getting annoyed with all these stupid cookie boxes popping up at me while I was searching for something. The site changed its content so that it simply displayed a churlish message saying that if I wasn’t going to accept its cookies it wouldn’t show me anything. (In hindsight, I should’ve just switched to reader view.) Some weeks later I happened to land on that site again and it still displayed the same curt message. But wait a minute, how did the site remember my response? It had, of course, used a cookie to store the fact that I didn’t want that site to create any cookies.

So, if a site just has a couple of strictly necessary session cookies but also has a box telling you about those cookies then it will also need a persistent cookie to record that you’ve acknowledged the cookie notice. This is also a strictly necessary cookie because the site can’t function properly if you have to dismiss a box every time you move from one page to the next (or reload a page). So you now have two session cookies that will expire after your session ends and a persistent cookie that will linger long after those session cookies have gone just in case you happen to revisit the site at a later date.

The Information Commissioner’s Office (ICO) stipulates that “you cannot set non-essential cookies on your website’s homepage before the user has consented to them”. So this site doesn’t set the optional settings cookie unless you explicitly request it on the settings page.

Consent usually isn’t required for strictly necessary cookies that are needed for online shopping or load balancing but, as the ICO points out, “it is still good practice to provide users with information about these cookies, even if you do not need consent”. So this site has a cookie notice that’s designed to be visible but not block the page content. If you don’t want to keep looking at it, you can use the settings page to hide it, but this will, of course, create a cookie to store your preference.

The normal advice is to use your browser privacy settings to allow first-party cookies but block third-party cookies. If a page on one website embeds content from another website then the browser will send the cookies that belong to the website you’re visiting (first-party cookies) but will also send the cookies that belong to the website that provides the embedded content (third-party cookies). It’s these third-party cookies that are following you around.

The next blog post will describe the settings page in more detail.


¹Post parameters can also be sent by altering the HTTP header request, but I don’t want to get too technical here.

Clouds, Cookies and Migration Part 2: Clouds

Once upon a time, a little parrot decided to migrate across the vast ocean to the cloud lands, with nothing more than a handful of cookies.

The previous post described why I decided to migrate to a new web hosting provider, TSO Host. I had the choice between their cPanel account and their cloud account. I opted for the cloud account.

The term “cloud computing” or “in the cloud” can conjure up a fluffy image of data floating around in the air. It’s actually far more down-to-earth and basically entails a bunch of computers in a data centre that are on 24/7 so that the data on them can be accessed across the world at any time in any time zone.

Imagine a computer without a monitor, keyboard or mouse (because it’s only accessed remotely). It’s a server or, rather, has a server process running on it. This is a bit of a simplification, but in this case the server essentially serves up files on request (provided access is permitted). Now imagine that computer sitting on a rack full of other computers. Imagine row upon row upon row of these racks, filling up the entire room. All whirring away because obviously they have to be on all the time. They’re using up electricity and generating heat ― far too much heat, so they have to be cooled. Those devices could be storing private data, such as company documents or your personal photos, or they could be storing the files that make up a website. These data centres not only need to be protected from hacking they also need to be protected from physical intrusion (i.e. burglary).

In that sense, the cPanel shared hosting account is also cloud computing. The files that make up the web site are all contained on a single server in a data centre. The cloud web hosting provided by TSO Host is a particular type of cloud computing that uses a cluster server where the files that make up the web site are synchronized across multiple servers. This makes the web site more reliable. If one device goes down, the others in the cluster can keep the site on-line.

In the end, I decided on the cloud account and opted for the free migration help when I signed up. The migration team copied over all mail accounts, databases and the contents of the public_html directory from my account on Hostgator’s server. I had some files outside of that directory but they were small by comparison, and it was easy enough for me to archive them and copy them over.

There are pros and cons with the cloud cluster verses the cPanel single server. The cloud has a far simpler dashboard with a very primitive text file editor. This isn’t normally a problem for me as I mostly use the secure shell (ssh) to access my files so I can edit them with vi or vim. It’s necessary to first activate ssh, and you have to wait about 15 minutes after activating it before logging in. I then deactivate ssh after I’ve finished.

I had the choice of migrating my SSL certificate (for a fee) but it was less than a month from its expiry date so I didn’t bother. Instead I switched to Let’s Encrypt which is available for both the cPanel and cloud accounts. Unfortunately, the cloud account doesn’t support Let’s Encrypt for sub-domains. Apparently it is supported for sub-domains on cPanel. The TSO Host support staff said I could switch to cPanel if I preferred but, after considering it, I decided not to bother. My website is small enough not to really need sub-domains and I hadn’t advertised the intended change, so I moved the shop back to its original location.

For about a week I had my website files on both the Hostgator server and the TSO Host cluster while I made all the necessary changes. If you visited the site during that time you would’ve been viewing the old files on the Hostgator server.

The first thing I had to do was change any absolute paths. With the cPanel server, the home directory is usually in the form /home/username but with the cluster the home directory varies depending on which device in the cluster you’re on. For most of the scripts I could obtain the path from the DOCUMENT_ROOT server setting. With osCommerce (used by the on-line store), the configuration file conveniently defines a constant that stores the shop’s path so that just required a minor edit to reference DOCUMENT_ROOT, but there were also a few absolute paths stored in the shop’s database, such as the location of the public and private key (used to encrypt the order information sent to PayPal). I modified the software to allow me to store relative paths in the database instead. The only place that I now have a hard-coded absolute path (to a specific device in the cluster) is in the .htaccess files. I haven’t found any way around this.

The other changes I needed to make was to update my PHP files to work with PHP 7.3 as they were previously running on an older version and contained deprecated commands. With TSO Host I have the option to switch to an older version, such as PHP 5.4, which I did initially to ensure the scripts worked, but obviously it’s better to use the latest version, which comes with extra security measures. Once I’d finished making the necessary modifications I switched to PHP 7.3.

You might be wondering what happened to my Perl CGI scripts that I mentioned in my previous post. It turns out they don’t work here either. The missing modules are still missing, but at least I’m now getting an understandable error message from cpan: I don’t have permission to install them. Perhaps they’re not pre-installed because they’re now obsolete or have vulnerabilities or haven’t been vetted. Anyway, I’ve now decided that I’d rather use PHP which provides the necessary functions that those scripts require without the need to depend on extra libraries or modules. All those Perl scripts should now automatically redirect to the new PHP replacements.

Once I’d made all the modifications necessary to make the site work on the new cloud server cluster it was time to change the nameservers. (Basically, when you type an address into your web browser it has to ask the nameserver for directions.) After the switch was made, I then went back to my Hostgator account and deleted all files, databases etc because I’m paranoid tidy before closing my account.

Since then I’ve been working on the remaining new PHP scripts in between a lot of travelling and other commitments. I also installed WordPress. This was easy to do from the cloud dashboard, and the installation tool sensibly chose not to use stupidly obvious admin or database names (they seem to just be randomly generated strings). My plan is to republish the articles from my old blog here, although I may omit any time-sensitive information (such as giveaways that have now closed).

WordPress isn’t as easy to tinker with as osCommerce. For example, osCommerce has a constant defined in the configuration file that has the relative path to the admin directory. This makes it really easy to rename. WordPress, on the other hand, hard-codes the relative admin path. While it’s technically possible to alter this by editing all the files that reference this path, the changes will be lost when upgrading to a new version. Whilst one shouldn’t rely on obscurity as the only form of defence, there’s no point in making things too obvious. (Consider the Lonely Mountain in “The Hobbit”. The hidden door could only be unlocked with a key at a certain time, but that didn’t mean the dwarves went around putting up signs saying “secret entrance this way”.) There are, however, security plugins that restrict the number of login attempts etc.

Both osCommerce and WordPress have the database credentials in a configuration file within their installation paths. This is normally protected from public viewing by the server settings, but an accidental mis-edit or deletion of the .htaccess files could cause the contents of those configuration files to be shown as plain text, exposing the credentials (user name, password, database name etc). So I’ve moved them out of those configuration files to a location that can’t be accessed by a browser.

I’ve added some new PHP scripts that have replaced static pages, such as the gallery that’s now searchable, the book page and the site map. The “new book alert” Perl CGI script has been replaced with the more general book list. There have been a few glitches, but hopefully they’re all now fixed, and there are some more updates still to do but the main scripts are done.

With my previous web hosting company, this site had one strictly necessary cookie for the online store. The new cloud account has a second strictly necessary cookie. These cookies will be discussed in the next post.

Clouds, Cookies and Migration Part 1: Migration

Once upon a time, a little parrot decided to migrate across the vast ocean to the cloud lands, with nothing more than a handful of cookies.

I switched web hosting provider in May, and have since made quite a few changes to the site. It all took much longer than I had originally intended. I’m sorry for any disruption, but everything should hopefully now be all sorted. One of the new additions is this blog, and it seems appropriate that this first post should explain the reason for all the changes, but first a little background about the origins of Dickimaw Books.

I had some tutorials on theoval.cmp.uea.ac.uk/~nlct that I had developed from some training courses for staff and postgraduates that I occasionally taught at the University of East Anglia (UEA) where I’m an honorary lecturer (that is, I’m not a full-time paid member of staff; I just do the odd little project or occasional short staff and PG training courses in LaTeX). I provided the online tutorials as a supplement to the courses, which I made available under the GNU Free Documentation License (GNU FDL). After a while I discovered that people from outside of the courses were reading and sharing them. Some people were also printing them because they preferred to read hard copies. The tutorials were gaining in popularity but there’s no guarantee that “theoval.cmp.uea.ac.uk” will continue to be available. It was originally “theoval.sys.uea.ac.uk”. It may have another name change, or may be retired. Most of all, it’s quite complicated to access the files on that server when off-site, which makes them hard for me to update. My honorary status is reviewed every few years and if it isn’t renewed I’ll lose my UEA account. I needed a domain name and server that I had more control over and easier access to.

I started to wonder about providing printed versions. Would people want to buy a book that’s freely available online? Free resources are often subsidised by adverts, but I find them intrusive. There are some sites so loaded with ads that it’s almost impossible to actually read the real page content. I’d rather not have ads on my site if possible, but without them I have to sell enough copies to cover the costs. It was a bit of a gamble.

If you’ve bought a copy or copies of my books, a big thank you. It’s because of you that this site is still going and free of ads.

I’ve already written about being an independent writer/publisher but I didn’t mention about web hosting. With only a small budget available, I had to opt for a low-cost shared hosting account. I started out with Hostgator’s “hatchling” account. I’ve been a computer programmer for over 25 years (my introduction to programming was a Modular 2 first year undergraduate course in 1988), and I had written some web pages and scripts before setting up my own site, but this was the first time I was responsible for an entire domain.

The shared hosting account came with an easy-to-use web application that a contact form could use to email me a message according to a template. Unfortunately it proved far too easy to use for spam-bots, and it also seemed to have a problem with messages that contained a backslash, so pretty much every LaTeX-related message was garbled. I replaced the forms with Perl CGI scripts since I was already familiar with programming in Perl. This fixed the problem of the garbled messages but the spam was still getting through.

I knew enough to be wary of bots trying to inject spammy links or malicious code and I’ve been on Usenet long enough to know about the existence of trolls, but I didn’t realise until then about the troll-bots whose sole purpose seems to be to seek out forms and post inflammatory comments based on certain text found on the page (such as “books”). At least, I’m assuming they were posted by bots. Any real person who can mistake a bug report form for a thought-provoking literary article has to be singularly devoid of intelligence.

(If I’m mistaken and it turns out they were posted by a real person rather than a bot, I do apologise if I caused any offence. I’ll rephrase it: people who deliberately write content with the express intent to insult or inflame while cravenly hiding behind the cloak of anonymity are the same kind of witless cowards who would, a hundred years ago, have been the type to write illiterate poison pen letters to their local community. The community is much wider these days, but it’s the same mentality.)

My first attempt to reduce the spam was to use Google’s reCAPTCHA. The premise seemed quite good initially. It’s a CAPTCHA-like system designed to check if the user is human (rather than a bot) whilst at the same time help with the digitization of books. It showed a wiggly, distorted word that you needed to confirm in the box (common in CAPTCHAs), and also a word scanned from a book, which you also needed to confirm. However, when I looked at the scripts with the reCAPTCHA at a later date I noticed that the scanned words had been replaced with photos of house number plates, which struck me as a bit creepy. I ended up removing them, and I tried other approaches, such as getting the user to confirm an ID (for bug reports and feature requests) or making the form multi-paged (which allows the form to be customized, depending on the initial settings, but also makes it a little harder for a bot to follow).

My original plan was just to provide a website to host the online versions of my books and related information, but I started to add more stuff and eventually decided to try e-commerce. I opted for the open-source osCommerce which is written in PHP. The advantage with an open source project is that I can modify the code to better suit my requirements. This was my first introduction to PHP, and at first I only made minor modifications to make it specifically a book store rather than a more general store (such as changing “manufacturer” to “author”). Later I tried to make it more mobile-friendly. The great thing about osCommerce is that the upgrades are provided in terms of differences between the old version and the new one. It’s less convenient than simply clicking on an upgrade button that will do everything for me, but it means I’m less likely to lose my changes.

If you happen to be planning on setting up your own online store, I recommend you get an expert to write up the legal documents (such as the terms and conditions). A few years ago my electricity supplier decided to launch a brand new site with user accounts to manage bills etc online. I dutifully started filling in the new account form and, being the pedantic person I am, I followed the “terms and conditions” link so that I could read it. (I don’t like ticking boxes to say I’ve read something when I haven’t even glanced at it.) I found myself on a page that was blank except for the title “Cookies”, so I contacted the company. I received an apology and the “correct” URL, which turned out to be their cookie policy page. I wrote back and asked for their actual terms and conditions page. Their reply effectively said that was their only terms and conditions and that I should stop fussing and just tick the check box. (They didn’t literally write that last bit, but that was the subtext.)

Terms and conditions are there to protect the site owner by imposing restrictions (don’t post nasty stuff on our site, don’t try breaking it, don’t sue us if it goes off-line occasionally). A site shouldn’t have to provide such obvious conditions. Society expects visitors to behave well: wipe your feet on the doormat, don’t smash the crockery or insult the other guests. The terms and conditions may go beyond that (which is why it’s a good idea to check them). A cookie policy (as well as a privacy policy), on the other hand, is legally required in countries with legislation such as the General Data Protection Regulation (GDPR) as it relates to your personal data.

My concern with that particular form wasn’t that the company wasn’t trying to impose any conditions on my use of the site. The problem was that it gave the impression that the site was designed by an amateur who didn’t know the difference between the two types of document. If they didn’t know that, how could I be confident that they knew how to securely store my data?

Trust is fundamental to business. It doesn’t matter how good your product is, a certain level of trust is required for people to buy from your store. This is particularly true for on-line small business that have a tendency to pop up and disappear. One of the things I realised I had to do was add an SSL certificate to the site. This wasn’t an option with Hostgator’s hatchling account. There was a button on the cPanel dashboard, but it just popped up a message saying I needed to upgrade my account in order to include SSL. I decided to upgrade to the business account, which included a free SSL certificate.

When I first started with Hostgator, the support channels included email and a ticketing system. It was only when my site encountered a problem in late 2017 that I discovered that some of these channels had gone. My Perl CGI scripts suddenly stopped working. The CTAN team alerted me to the problem. They have a tool that checks all the links on their site and the ones leading to my FAQ were now triggering an error code.

I soon discovered that some of the Perl modules that those scripts depended on were no longer installed. The most obvious solution was to reinstall them, but when I tried I received a rather unhelpful “an error has occurred” message. I modified all of the scripts so that they no longer triggered an error code and just displayed a message saying the particular function was currently unavailable while I investigated further. I couldn’t find a link to the ticketing system so I tried to send an email to the support address but received an automated reply telling me to phone or use the on-line chat. I didn’t fancy being stuck in a trans-Atlantic telephone queue so I tried the on-line chat. The operator didn’t know anything about Perl and eventually gave up and apologised that the technical support staff were all currently unavailable. Please try again later. I found on the forum that I wasn’t the only one experiencing this problem, but there were no solutions offered.

That was when I first considered moving to a new web hosting provider. I started looking around and noticed that many of the shared hosting accounts seemed to use cPanel as a convenient web application to allow site administrators to access the site files, email accounts, databases etc. I investigated cPanel and found that a new version had recently come out. There was a note advising that following the upgrade it may be necessary to reinstall some Perl modules. It therefore seemed likely that it was a cPanel update that had caused the modules to disappear. In which case, if the problem was with this new version of cPanel then moving web hosting provider may not help. In the end, I decided to stick with Hostgator and replace the Perl CGI scripts with PHP. This turned out to take a lot longer than I had originally anticipated due to other more urgent commitments.

A few months ago I started wondering about adding a blog to this site. I was previously using the author blog on GoodReads, but I don’t have much control over the interface, and I’ve started to become wary about putting content on third-party sites. I investigated WordPress (another open-source web application written in PHP) and considered adding it to my site. The business shared hosting account provided the option for sub-domains. By this time, I was starting to get quite adventurous. Perhaps I could have a sub-domain for the shop and another sub-domain for the blog.

I started with the shop. I made a temporary index page for dickimaw-books.com/shop to say that it was currently off-line and set to work moving all the files over to a sub-domain. I ran through some test transactions and everything seemed to work fine. Then I decided that I ought to review all the security settings just to make sure everything was as tight as it could be.

The shop doesn’t store any financial details. It works by transferring the customer to PayPal to perform the actual payment. The shop, therefore, has to communicate with PayPal to provide the transaction details (customer name, invoice address, shipping address, cost etc). PayPal, in turn, has to send a message back to the shop to say the transaction has been completed. The customer is returned to the store, which triggers the confirmation email and the store empties the customer’s basket. The communications between the store and PayPal have to be made securely (in case anyone attempts to intercept them and also to prevent any naughty customer from trying to alter the total cost).

While reviewing all the settings I noticed that “verify SSL” was set to false and that looked like the kind of thing that ought to be on. So I switched it to true and ran a test transaction. An error message appear when I was returned from PayPal’s sandbox account to the store. This was rather annoying. So I copied the message into a search engine to find a solution. I encountered a lot of questions about the problem but the only “solution” offered was to switch off the “verify SSL” setting. This didn’t strike me as a particularly good answer.

Near the “verify SSL” setting there’s a “test connection” button, so I tried that out. It displayed a red “failure” when the setting was on and a green “success” when it was off. The most obvious next step was to find out exactly what that test did. It took a bit of rummaging around the PHP code, but I finally discovered that it was calling curl (client URL, a command-line tool for transferring data). I modified the code so that on failure it would also provide curl’s error message. This gave me a new bit of information in my search, and this time I finally found an answer. My version of curl was likely too old. I found the version number (7.19.7) and looked it up in curl’s release page. It was over 9 years old with 40 known vulnerabilities. Definitely time for an upgrade.

So I was back on Hostgator’s chat. “An operator will be with you in 5 minutes.” Fifteen minutes later I’m finally connected with someone. I was staggered by the response to my request to upgrade curl: sorry, that option isn’t available for the shared hosting accounts. “I hope you understand” the operator concluded. I reiterated it’s over 9 years old with 40 known vulnerabilities but I received the same response again concluded with “I hope you understand”.

Quite frankly, I don’t understand. I think it’s reprehensible of a web hosting provider to not have regular updates for common tools that form an essential part of web security, particularly for a business account that’s marketed as being suitable for e-commerce.

I replied that, in that case, I would find a new web hosting provider and did the internet equivalent of slamming the phone on the hook: I clicked on the close button. So now I really had to migrate and I once again began investigating alternatives.

This was back in May and my news feed had recently included articles about fake reviews, which made me quite wary. Was the 5 star “X is brilliant” written by a genuinely happy customer of X or was it written by someone paid by X? Was the 1 star “X is awful so I switched to Y who are brilliant” really written by a dissatisfied customer of X or was it written by someone paid by Y to boost Y and slate the opposition? How old is the review? Companies can improve or deteriorate over time. An old low rating could’ve prompted them to change for the better.

I decided not to be strongly influenced by ratings and reviews but instead draw up a list according to my requirements and find out from each company in turn what version of curl they had installed. Given the way software updates usually work, the chances are that if curl was up-to-date then so would the other tools I also require.

So, what are my primary requirements? Shared hosting (small budget), Linux (I don’t want to waste time learning how to use an unfamiliar operating system), apache (web server), MySQL (for my databases), PHP (web scripts), Perl (web scripts and non-web command line tools for maintenance etc). That doesn’t really narrow the list as this is quite a common set of requirements.

In order to filter the list further so that I had an easier starting point, I decided to add a secondary requirement that I was willing to drop if I couldn’t find a satisfactory fit: a company with a UK base so I didn’t have to worry about international helplines, currency conversion or bank fees for foreign transactions.

In the end I opted for TSO Host. The chat operator was quick to respond, helpful, and provided me with a curl version number that I checked and found to be less than 2 months old with no known vulnerabilities. I had the option to select either a cPanel account or a cloud account. There are three equivalent packages for each type of account. The price varies according to the package but doesn’t depend on whether you opt for cPanel or cloud. Since I was already familiar with cPanel the operator suggested I might prefer that, which I originally agreed about, but if I changed my mind I could switch to the cloud.

The next post deals with migrating from Hostgator’s cPanel single server to TSO Host’s cloud server cluster.