How does a computer store data: dates

In the previous post of this series on binary, we looked at how a computer is able to convert non-numeric data into binary so that it is able to store text. In this post, we will examine the concept of dates and time to see how it is possible to store them in computer memory. Once again, it all boils down to ones and zeros – but there’s a sting in the tail!

Working with dates

On the face of it, you might think that a date is no different to a piece of text. If we ask the computer to tell us what the date is, it usually returns it in a format like “22 May 2019” or “22-05-19”. Isn’t this just text? Could we not use the same approach that we used in the previous post, i.e. encode the information using a character set and then store it as integer-based binary data? The short answer is yes, we could do that. The problem comes when we then expect the computer to do something with the date. If it stores the date as text, how does it work out the difference in time between two dates, or compare a date with the current system date? What about time zones, daylight saving time, leap years and the other temporal anomalies that must be factored into any calculation involving two or more dates? It quickly becomes apparent that, while text is fine for displaying a date to an end user, it isn’t adequate for the computer to use internally. A text representation of a date doesn’t mean anything to a computer; asking it to find the number of days between ‘01/02/2019’ and ‘01/03/2019’ would be like asking it to find the number of days between ‘Frank’ and ‘Bob’. To a computer, they are just strings of characters with no inherent meaning.
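To make the point concrete, here is a minimal Python sketch. It assumes the two example strings are in day/month/year order (the post does not say which), and the variable names are purely illustrative. Comparing the raw text gives a misleading answer, whereas converting the text into a proper date value first lets the computer do the arithmetic.

```python
from datetime import datetime

text_a = "01/02/2019"
text_b = "01/03/2019"

# Comparing dates as text goes character by character, so a later date
# can appear "smaller" than an earlier one.
misleading = "22/05/2019" < "23/01/2018"  # True, although May 2019 is later

# Assumption: the strings are day/month/year. Parsing turns them into
# date values that the computer can actually do arithmetic on.
date_a = datetime.strptime(text_a, "%d/%m/%Y").date()
date_b = datetime.strptime(text_b, "%d/%m/%Y").date()

days_between = (date_b - date_a).days
print(misleading)    # True
print(days_between)  # 28 (February 2019 had 28 days)
```

If the strings were read as month/day/year instead, the answer would be 31 days, which is exactly the kind of ambiguity that makes text a poor internal format.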

In the early days of computing, a solution to this problem was devised, which has continued, in various shapes and forms, to the present day. The idea was to pick a fixed starting point (often called the ‘epoch’) and use it as the baseline from which to measure any other date. On Unix systems the chosen epoch was midnight UTC on 1st January 1970, and any other date is measured as the number of seconds elapsed since that moment. A date before the epoch is a negative number; a date after it is a positive number. Traditionally, four bytes were used to store the count as a “32-bit signed integer”. In the two’s complement encoding used by virtually all modern hardware, the left-most bit tells us the sign (0 for zero or positive, 1 for negative), but the remaining 31 bits are not simply the size of the number: the left-most bit carries a weight of -2^31 and the other bits add to it, giving a range of -2,147,483,648 to 2,147,483,647 seconds. This system of storing dates is often referred to as “Unix time” and can be found in most Unix-based systems. A variety of other models have since been devised, but they generally work along the same lines, i.e. measuring a date based on time elapsed since a particular epoch. Some later models count milliseconds rather than seconds in order to support more precise timestamps.
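The idea of counting seconds from an epoch can be sketched in a few lines of Python. The helper names `from_unix_seconds` and `to_unix_seconds` are hypothetical, chosen for this illustration, and the sketch works in UTC throughout so that time zones do not muddy the picture.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch: midnight UTC on 1 January 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_unix_seconds(seconds: int) -> datetime:
    """Turn a count of seconds since the epoch into a date and time."""
    return EPOCH + timedelta(seconds=seconds)

def to_unix_seconds(moment: datetime) -> int:
    """Turn a timezone-aware date and time into whole seconds since the epoch."""
    return (moment - EPOCH) // timedelta(seconds=1)

print(from_unix_seconds(0))        # 1970-01-01 00:00:00+00:00
print(from_unix_seconds(-86400))   # 1969-12-31 00:00:00+00:00 (one day before)
print(to_unix_seconds(datetime(2019, 5, 22, tzinfo=timezone.utc)))  # 1558483200
```

Once dates are plain numbers like these, finding the gap between two of them is a simple subtraction.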

The Year 2038 Problem

Using a 32-bit signed integer to capture the total number of seconds elapsed since 1st January 1970 comes with a catch: at 03:14:07 UTC on 19 January 2038, the binary number representing the number of seconds elapsed will reach 01111111111111111111111111111111, which is 2,147,483,647 and the largest value the integer can hold. What’s the problem, you might think, there’s still another bit to use. The snag is that the left-most bit is the ‘sign indicator’, and 0 means the number is positive. If the computer increments the number to store 03:14:08 on 19 January 2038, the binary number will be 10000000000000000000000000000000. That does not represent the intended date, because a 1 in the left-most bit of a 32-bit signed integer means the number is negative. In fact this pattern is the most negative value possible, -2,147,483,648, and so the computer would interpret it as 20:45:52 UTC on 13th December 1901!
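Python’s own integers never overflow, so to see the rollover we have to imitate a 32-bit signed integer by hand. The sketch below does that with a hypothetical helper, `to_int32`, which wraps a value into the 32-bit two’s complement range; it is an illustration of the arithmetic, not a model of any particular operating system.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # binary 0111...1, the largest 32-bit signed value

def to_int32(value: int) -> int:
    """Wrap an integer the way a 32-bit two's complement register would."""
    return (value + 2**31) % 2**32 - 2**31

last_good = EPOCH + timedelta(seconds=INT32_MAX)

# One more second does not fit, so the counter wraps to the most negative value.
wrapped = to_int32(INT32_MAX + 1)
after_rollover = EPOCH + timedelta(seconds=wrapped)

print(last_good)       # 2038-01-19 03:14:07+00:00
print(wrapped)         # -2147483648
print(after_rollover)  # 1901-12-13 20:45:52+00:00
```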

This is referred to in computing circles as “integer overflow”: the result of an operation needs more bits than the fixed-width integer has available. The computer does not borrow a bit from a neighbouring memory location; the carry simply runs into the left-most bit, which is reserved for the sign, and the value ‘wraps around’ from the largest positive number to the most negative one. The same thing happens when we add two bytes together and the result is more than 8 bits long: the microprocessor records the extra bit in a carry flag in its internal status register, and if the program ignores that flag, only the low 8 bits are kept. Integer overflow shares its name with another computing problem known as ‘stack overflow’, which is where a program has filled up the region of memory set aside for its stack and cannot add anything more. This isn’t the same thing as saying all memory in the computer has been used up, because a program typically designates a memory address range for the stack with limits (or ‘bounds’), so the stack overflow happens only when those limits are exceeded. The underlying principle is similar: an operation produces more data than the space reserved for it can hold. The consequences differ, though. A stack overflow tries to use memory beyond the stack’s bounds and usually crashes the program, whereas an integer overflow often goes unnoticed and silently yields a wrong value.
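The byte-sized version of the problem is easy to imitate. The sketch below uses a hypothetical function, `add_bytes`, to mimic what a typical 8-bit adder does: it keeps the low 8 bits of the sum and reports the bit that did not fit as a separate carry. Real processors differ in the details, so treat this as an illustration of the principle only.

```python
from typing import Tuple

def add_bytes(a: int, b: int) -> Tuple[int, int]:
    """Add two unsigned 8-bit values; return (stored result, carry bit)."""
    total = a + b
    stored = total & 0xFF  # only the low 8 bits fit in a byte
    carry = total >> 8     # the ninth bit, which a CPU would put in its carry flag
    return stored, carry

print(add_bytes(100, 27))   # (127, 0)  fits comfortably
print(add_bytes(200, 100))  # (44, 1)   300 does not fit, so the byte holds 300 - 256
print(add_bytes(255, 1))    # (0, 1)    wraps all the way round to zero
```

If the program ignores the carry, it is left with 44 where it expected 300, with no error to warn it that anything went wrong.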

Historical déjà vu?

For those old enough to remember, this is a very similar problem to the infamous Y2K problem (or “Millennium Bug”) which was forecast to occur on New Year’s Day 2000. This was rooted in a much older practice dating back to the days when computer programs were input using fixed-width punch cards with a hard limit on the number of characters per data row. In the interests of conserving space on the punch card, it became common for software developers to omit the ‘19’ prefix from the year component of a date. Fast forward to the 1990s, and there were genuine concerns that the coming millennium would ‘reset’ such date values to the year 1900 instead of the year 2000. The debate continues about how much of an impact this would really have had, since the input and display of dates in many old computer programs may have used a two-digit year while the internal storage (e.g. using “Unix time”) may well have been different. In any case, expensive remediation projects sprang up all over the world to try to mitigate the feared effects; tales abounded of how planes would fall out of the sky at midnight on New Year’s Day, nuclear power stations would go into meltdown and bank ATMs would freeze. We will never know the full effects of what might have happened, and the true scale of Y2K is very hard to measure after the event. Nevertheless, sufficient evidence was available at the time to indicate that some systems may well have suffered had remediation activity not taken place.

Unlike Y2K, we won’t have to wait until 2038 for the Year 2038 Problem to start taking effect, since any system that needs to store a date beyond 19 January 2038 could be affected right now. Newer 64-bit operating systems, with their 64-bit signed integers for date storage, easily overcome the problem: a 64-bit count of seconds stretches roughly 292 billion years into the past and 292 billion years into the future, around twenty times the known age of the universe! However, it isn’t straightforward to convert 32-bit signed integers into 64-bit ones, especially if the underlying hardware itself cannot be replaced. Despite the advances in modern computing power, many critical systems (including those used by banks, insurance firms, aviation companies and the military) rely on much older system architecture, with software and operating systems that cannot support 64-bit. It remains to be seen how well the Year 2038 challenge will be met, but given that it will surface well before the year itself comes around, we can but hope that suitable solutions will be found in time…
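The figure of 292 billion years is easy to check with some rough arithmetic. The sketch below assumes an average Gregorian year of 365.2425 days and an age of the universe of about 13.8 billion years; both constants are approximations supplied for this illustration rather than figures taken from any specification.

```python
# Assumption: an average Gregorian year of 365.2425 days.
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60

# Assumption: the universe is roughly 13.8 billion years old.
AGE_OF_UNIVERSE_YEARS = 13.8e9

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

years_32 = INT32_MAX / SECONDS_PER_YEAR
years_64 = INT64_MAX / SECONDS_PER_YEAR

print(f"32-bit: about {years_32:.0f} years either side of 1970")
print(f"64-bit: about {years_64 / 1e9:.0f} billion years either side of 1970")
print(f"That is about {years_64 / AGE_OF_UNIVERSE_YEARS:.0f} times the age of the universe")
```

The 32-bit result of about 68 years is what puts the limit in 2038, while the 64-bit range comes out at about 292 billion years.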

Conclusion

Having looked at the three major data types (numbers, text and dates), this concludes our series on how a computer stores data in binary. Of course, a computer is capable of storing other data like images, videos, music and complex computer executables. Ultimately, though, even they can be understood as very large collections of binary numbers, representing pixel colours and locations on a screen, the sampled loudness of a sound wave for an audio output, or many thousands of machine code instructions respectively. In the end, binary is all that a computer can work with, and the history of computer science and technology has largely been about finding innovative ways of translating the information and sensory data that we can understand into the ones and zeros that a computer can.
