image of dickimaw parrot with cookies in clouds

Clouds, Cookies and Migration Part 3: Cookies

Once upon a time, a little parrot decided to migrate across the vast ocean to the cloud lands, with nothing more than a handful of cookies.

This is a continuation of the story about this site’s migration to a cloud cluster web hosting account. The first part described what prompted the move, and the second part described the move from a single server to a cloud cluster.

This site now has two strictly necessary cookies. It also has one optional cookie that’s only set if you explicitly request it (in the settings page). It’s hard to visit a site these days without having a box popping up on the screen about cookies, and cookies have had a lot of bad press because they’re often used for tracking. However, despite the number of sites screaming at you about their cookies, they’re still a bit of a mystery to some Internet users. It’s some secret thingummy that follows you around, spying on your every move.

When you click on a link, select a bookmark or type an address in your browser’s location bar, your browser sends a request to the domain’s server for the contents of the particular page. The returned content is usually (but not always) in a format called hypertext markup language (HTML). For example, suppose you visit www.example.com/test.html then this corresponds to a file called test.html that’s on example.com’s server. In this case, all the server has to do is send the contents of this file to the browser. This is a “static page” because it’s always the same (until someone edits the test.html file).

Now suppose you visit www.example.com/test.php then this corresponds to a file called test.php. This is a script (application) that when run by the server creates the HTML content that needs to be sent to the browser. This is a “dynamic page” because the content may change depending on certain conditions.

The hypertext transfer protocol (HTTP) used by web pages (both static and dynamic) is stateless. That means a web page can’t remember anything, so it has to be repeatedly retold everything it needs to know. (Your browser may remember the information you’ve typed into forms to help autofill if you need to fill in the form again, but the server doesn’t remember.)

Image of text greeting guest and prompting for guest’s name.

Suppose Alice visits www.example.com/​​test.php. Her browser asks the example.com server for test.php and that script sends back the text “Hello, guest. What’s your name?” with a box for Alice to supply her name. She does this and clicks on the submit button.

Image greeting Alice and asking for her favourite colour.

Her browser now asks the server for test.php along with the information that the “name” parameter has been set to “Alice”. This time the script sends back the text “Hello, Alice. What’s your favourite colour?” with a box for Alice to specify her favourite colour. So she enters “purple” and clicks the submit button. This time her browser asks the server for test.php along with the information that the “colour” parameter has been set to “purple”. Now the script knows the colour but it’s forgotten her name.

The usual method is for the script to include the information it’s already been supplied with as hidden parameters in the new form so that the user doesn’t have to keep retyping the same information. So let’s start again.

The browser asks the server for test.php with no further information, so the script sends its default message “Hello, guest. What’s your name?” with a box for the name. Alice fills it in and clicks on the submit button. The browser then asks the server for test.php along with the information that the “name” parameter has been set to “Alice”. The script sends back the text “Hello, Alice. What’s your favourite colour?” with the colour box, but it also includes markup that defines a hidden parameter called “name” that’s set to “Alice”. Hidden parameters aren’t visible on the page but now when Alice clicks on the submit button her browser asks the server for test.php and tells it the name is “Alice” and the colour is “purple”. So this time the script knows both her name and her favourite colour and sends back the text “Hello, Alice” with an instruction that the browser should display her name in purple. The next time Alice visits test.php she’ll be greeted with the default message, because the script has forgotten the information she previously provided.

There are two types of parameters: “get” and “post”. The get parameters are included in the web address. For example, www.​​example.​​com/​​test.php?​​​​name=​​Alice&​​colour=​​purple (the question mark separates the address of the script and the parameters that need to be supplied to the script). If Alice sends this address to Bob and he clicks on the link, he’ll be greeted with the message “Hello, Alice”.

Post parameters aren’t included in the web address. They can only be sent when you submit a form.¹ If, after completing the form, Alice shares the page with Bob and he clicks on the link then he’ll be at the start.

Now let’s suppose that both Alice and Bob visit dickimaw-books.com/shop. Alice adds a copy of The Foolish Hedgehog to the basket, and Bob adds a copy of LaTeX for Complete Novices to the basket. Both of them click on the “Cart Contents” link which leads them to dickimaw-books.com/shop/shopping_cart.php but how does the shopping_cart.php script know that it needs to show Alice a link to The Foolish Hedgehog and Bob a link to LaTeX for Complete Novices?

Each visitor is assigned a session ID, which is a long sequence of letters and numbers that uniquely identifies the visitor. The server stores this ID in a database along with the shopping cart contents, but it needs to be told the ID along with the page request so that it can match it up with the ID in its database.

If the session ID is sent as a parameter then it can be sent as a hidden post parameter in a form (if Alice or Bob click on a button) but it would have to be sent as a get parameter (part of the address) if they click on a link.

Suppose Alice clicks on a link to Quack, Quack Quack, Give My Hat Back! and decides that Eve might be interested in it, so she shares the link with Eve. Eve clicks on the link but it includes Alice’s session ID as a get parameter, so the shop thinks that Eve is Alice. If Alice happens to be logged in then Eve will find herself logged in as Alice.

Ooh, thinks Eve. I wonder what kind of things Alice has been ordering? Eve clicks on the order history link, and the script sends her a list of Alice’s orders because Eve is using Alice’s session ID. Ooh, thinks Eve. Alice ordered a copy of The Private Enemy and sent it to Bob. Oh so that’s his address. I think I’ll pop round this evening to find out why he’s been avoiding me. Won’t he be surprised to find out I know he’s received a present from Alice!

A more secure solution is for the server to tell the browser to save the session ID and send it along with any requests for pages on that website. That way the ID doesn’t get included in any links so it won’t be accidentally shared. (It could be intentionally intercepted by an eavesdropper but that’s another story.) So when Alice now logs into her account the server creates a new session ID and asks the browser to save it. The browser writes the ID in a small file (on Alice’s computer or mobile device) that it associates with the website’s domain (dickimaw-books.com in this case). That file is called a cookie.

This particular type of cookie (used to store a session ID) is normally a session cookie, which means that it expires at the end of the session (that is, when the browser is closed). It’s the browser’s responsibility to delete expired cookies (although you can explicitly delete cookies through the browser’s privacy settings). If the site has a “remember me” option then a persistent cookie is needed instead. Also, some browsers use “session restoring” which makes the session cookies permanent, so it’s a good idea to check your browser configuration.

This site has two session cookies: one that contains the session ID for the shop and one for load balancing. The shop cookie is only valid on the dickimaw-books.com/shop path, not across the entire site (so the browser will only send it while you’re visiting the shop). The load balancing cookie helps to distribute the visitors to this site across the servers in the cluster.

A persistent cookie is one that stays around after the session has ended. This may have an expiry date and again the browser should delete it once that date has passed. This site has a persistent cookie (called “dickimaw_settings”) that will only be set if you change the default site settings. When you return to the site, your browser will be able to tell the server your preferences (which are stored in the cookie) so you don’t have to keep changing it whenever you visit.

If I enable comments for this blog, there will be additional cookies so that you can create an account and log in to post comments or modify your preferences. These are covered in the blog’s privacy policy in case I decide to enable blog commenting at some later date. Private areas of the site have other cookies, but they’re private so you shouldn’t be going there anyway.

The load balancing and session ID cookies are known as “strictly necessary” cookies because they’re needed to ensure that the website functions efficiently and to help protect you against accidentally sharing your session ID. The cookie that stores your preferences is an optional cookie because the site is able to function without it.

Some sites only have strictly necessary cookies and so have a box informing you of this and all you can do is dismiss the box (by clicking on “okay”, “got it!”, “close” or whatever). Some sites have optional cookies so the notification box will ask you if you want those optional cookies or not.

I once visited a site where I clicked on “no thanks” because I was getting annoyed with all these stupid cookie boxes popping up at me while I was searching for something. The site changed its content so that it simply displayed a churlish message saying that if I wasn’t going to accept its cookies it wouldn’t show me anything. (In hindsight, I should’ve just switched to reader view.) Some weeks later I happened to land on that site again and it still displayed the same curt message. But wait a minute, how did the site remember my response? It had, of course, used a cookie to store the fact that I didn’t want that site to create any cookies.

So, if a site just has a couple of strictly necessary session cookies but also has a box telling you about those cookies then it will also need a persistent cookie to record that you’ve acknowledged the cookie notice. This is also a strictly necessary cookie because the site can’t function properly if you have to dismiss a box every time you move from one page to the next (or reload a page). So you now have two session cookies that will expire after your session ends and a persistent cookie that will linger long after those session cookies have gone just in case you happen to revisit the site at a later date.

The Information Commissioner’s Office (ICO) stipulates that “you cannot set non-essential cookies on your website’s homepage before the user has consented to them”. So this site doesn’t set the optional settings cookie unless you explicitly request it on the settings page.

Consent usually isn’t required for strictly necessary cookies that are needed for online shopping or load balancing but, as the ICO points out, “it is still good practice to provide users with information about these cookies, even if you do not need consent”. So this site has a cookie notice that’s designed to be visible but not block the page content. If you don’t want to keep looking at it, you can use the settings page to hide it, but this will, of course, create a cookie to store your preference.

The normal advice is to use your browser privacy settings to allow first-party cookies but block third-party cookies. If a page on one website embeds content from another website then the browser will send the cookies that belong to the website you’re visiting (first-party cookies) but will also send the cookies that belong to the website that provides the embedded content (third-party cookies). It’s these third-party cookies that are following you around.

The next blog post will describe the settings page in more detail.


¹Post parameters can also be sent by altering the HTTP header request, but I don’t want to get too technical here.