# Why Can’t My Computer Do Simple Arithmetic?

My current computer (which cost about the same as the Apple computer) is vastly more powerful, with a CPU that runs each of its four cores at 3.40 Gigahertz (GHz), 10 Gigabytes (GB) of RAM, and two hard drives with a total capacity of 1.5 Terabytes (TB). The increases in CPU clock speed, RAM size, and disk storage represent many orders of magnitude in comparison to my old Apple.

Even though the computer I have now is light years ahead of my first computer, it is not able to do some pretty simple arithmetic, such as adding a few decimal fractions. Or maybe I should say, the usual programming tools that are available aren’t up to this simple task. The following example is a program written in C that adds 1/10 to itself for a total of ten terms. In other words, it carries out this addition: 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 (ten terms). You might think that the program should arrive at a final value of 1, but that’s not what happens. In this Insights article I’ll offer an explanation of why the sum shown above doesn’t result in 1.

    #include <stdio.h>

    int main()
    {
        float a = .1;
        float b = .25;  // Not used, but will be discussed later
        float c = 0.0;

        for (int i = 0; i < 10; i++)
        {
            c += a;
        }
        printf("c = %.14f\n", c);
        return 0;
    }

Here’s a brief explanation of this code, if you’re not familiar with C. Inside the **main**() function, three variables (**a**, **b**, and **c**) are declared as float (four-byte floating point) numbers, and are initialized to the values shown. The for loop runs 10 times, each time adding the value of **a** (nominally 0.1) to an accumulator variable **c**, which starts off initialized to zero. In the first iteration **c** is set to about 0.1, in the second iteration it is set to about 0.2, and so on for ten iterations. The printf function displays some formatted output, including the final value of **c**.

The C code shown above produces the following output.

c = 1.00000011920929

People who are new to computer programming are usually surprised by this output, which they believe should turn out to be exactly 1. Instead, what we see is a number that is slightly larger than 1, by .00000011920929. The reason for this discrepancy can be explained by understanding how many programming languages store real numbers in a computer’s memory.

## How floating point numbers are stored in memory

Compiler vendors for most modern programming languages adhere to the IEEE-754 standard for floating-point arithmetic (IEEE floating point). For float numbers in C or C++, or REAL or REAL(4) numbers in Fortran (four-byte floating point numbers), the standard specifies one bit for the sign, eight bits for the exponent, and 23 bits for the mantissa. This representation is similar to scientific notation (for example, as in Avogadro’s Number, ##6.022 × 10^{23}##), except that the base is 2 instead of 10. This means that the decimal (base-10) number 1.5 would be stored in a form something like ##1.1_2 × 2^0##.

Here the exponent (on 2) is 0, and the mantissa is ##1.1_2##, with the subscript 2 indicating this is a binary fraction, not a decimal fraction. The places to the right of the “binary” point represent the number of halves, fourths, eighths, sixteenths, and so on, so ##1.1_2## means 1 + 1(1/2), or 1.5.

One wrinkle I haven’t mentioned yet is that the exponent is not stored as-is; it is stored “biased by 127.” That is, 127 is added to the exponent that is stored. When you retrieve the value that represents the exponent, you have to “unbias” it by subtracting 127 from it to get the actual exponent.

The last point we need to take into account is that, with few exceptions, “binary” fractions are stored in a normalized form, as 1.xxx…, where each x following the “binary” point represents either a 0 bit or a 1 bit. Since these numbers all start with a 1 digit to the left of the “binary” point, the IEEE standard doesn’t bother to store this leading digit.

The IEEE 754 standard for 32-bit floating point numbers groups the 32 bits as follows, with bits numbered from 31 (most significant bit) down to 0 (least significant bit), left to right. The standard also specifies how 64-bit numbers are stored.

SEEE EEEE EMMM MMMM MMMM MMMM MMMM MMMM

The bits have the following meanings:

- S – bit 31 – sign bit (0 for positive, 1 for negative)
- E – bits 30 through 23 – exponent bits (biased by 127)
- M – bits 22 through 0 – mantissa

Since there are only 23 bits for the mantissa (or 24 if you count the implied bit for normalization), we’re going to run into trouble in either of these cases:

- The mantissa contains a bit pattern that repeats endlessly.
- The mantissa doesn’t terminate within the available 23 bits.

## Some floating point numbers have exact representations in memory…

If you think back to when you first learned about decimal fractions, you might remember that some fractions have nice decimal forms, such as 1/5 being equal to 0.2 and 3/4 being equal to 0.75. Other fractions have decimal forms that repeat endlessly, such as 1/3 = .333…, and 5/6 = 0.8333… The same is true for numbers stored as binary fractions, with some numbers having terminating representations, others having groups of digits that repeat, and still others with no terminating or repeating pattern.

The program at the beginning of this article had the following variable definition that wasn’t used. My reason for including this variable was to be able to find the internal representation of 0.25.

float b = .25;

It turns out that .25 is one of the floating point numbers whose representation in memory is exact. If you notice that .25 is ##0 × \frac 1 2 + 1 × \frac 1 4##, then as a binary fraction this would be ##.01_2##. The subscript 2 is to emphasize that this is base-2 fraction, not a decimal fraction.

In quasi-scientific notation, this would be ##.01 × 2^0##, but this is not in the normalized form with a leading digit of 1. If we move the “binary” point one place to the right, making the mantissa larger by a factor of 2, we have to counter this by decreasing the exponent by 1, to get ##.1 × 2^{-1}##. Moving the binary point one more place to the right gets us this normalized representation: ##1.0 × 2^{-2}##.

This is almost how the number is actually stored in memory.

When I wrote the program that is presented above, I used the debugger to examine the memory where the value .25 was stored. Here is what I found: 3E 80 00 00 (**Note**: The four bytes were actually in the reverse order, as 00 00 80 3E, which is how little-endian processors such as x86 lay out multi-byte values in memory.)

The four bytes shown above are in hexadecimal, or base-16. Hexadecimal, or “hex,” is very easy to convert to or from base-2. Each hex digit represents four binary bits. In the number above, ##3 = 0011_2##, ##E = 1110_2##, and ##8 = 1000_2##.

If I rewrite 3E 80 00 00 as a pattern of 32 bits, I get this:

0011 1110 1000 0000 0000 0000 0000 0000

- The leftmost bit (bit 31) is the sign bit, 0 for a positive number, 1 for a negative number.
- The next 8 bits (bits 30 through 23) are the biased exponent, ##7D_{16}##, or ##125_{10}##. If we unbias this (by subtracting 127), we get -2.
- The 23 bits on the right are the mantissa, all zero bits in this case. Remember that there is an implied 1 digit that isn’t shown, so the mantissa in this case is really ##1.000 … 0##.

This work shows us that 3E 80 00 00 is the representation in memory of ##+1.0 × 2^{-2}##, which is the same as .25. For both the decimal representation (0.25) and the binary representation (##+1.0 × 2^{-2}##), the mantissa terminates, which means that 0.25 is stored in exact form in memory.

## … But other floating point numbers have inexact representations

One such value whose representation isn’t exact is 0.1. When I looked for the representation in memory of 0.1, I found this four-byte hexadecimal number:

3D CC CC CD

Here’s the same number, written as a 32-bit bit pattern:

0011 1101 1100 1100 1100 1100 1100 1101

- The leftmost bit (bit 31) is the sign bit, 0 for a positive number, 1 for a negative number.
- The next 8 bits (bits 30 through 23) are the biased exponent, ##7B_{16}##, or ##123_{10}##. If we unbias this (by subtracting 127), we get -4.
- The 23 bits on the right are the mantissa. Dividing the mantissa into groups of four bits, and tacking an extra 0 bit all the way to the right to make a complete hex digit, the mantissa is 1001 1001 1001 1001 1001 1010, which can be interpreted as the hexadecimal fraction ##1.99999A_{16}##. (The leading 1 digit is implied.) It’s worth noting that the true bit pattern for 0.1 *should* have a mantissa of 1001 1001 1001 1001 1001 …, repeating endlessly. But because the infinitely long and repeating bit pattern won’t fit in the available 23 bits, there is some rounding that takes place in the last group of digits.

##1.99999A_{16}## means ##1 + \frac 9 {16^1} + \frac 9 {16^2} + \frac 9 {16^3} + \frac 9 {16^4} + \frac 9 {16^5} + \frac {10} {16^6}## ≈ 1.60000002384186, a result I got using a spreadsheet.

Putting together the sign, exponent, and mantissa, we see that the representation of 0.1 in memory is approximately ##+1.60000002384186 × 2^{-4}##, or 0.100000001490, truncated at the 12th decimal place.

## Conclusion

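To summarize what went wrong with my program: what I thought I was adding (0.1) and what was actually being added (0.100000001490) weren’t the same number. On top of that, each intermediate sum has to be rounded to fit in the 24 available mantissa bits, so small rounding errors accumulate as the loop runs. Instead of getting a nice round 1 for my answer, what I really got was a little larger than 1.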

We can minimize this problem to some extent by using 64-bit floating point types (by declaring variables of type double in C and C++ and of type DOUBLE PRECISION or REAL(8) in Fortran), but doing so doesn’t eliminate the problem pointed out in this article.

Some programming languages, including Python and those of .NET Framework, provide a Decimal type that yields correct arithmetic for a broad range of values. Any programmer whose application performs financial calculations that must be precise down to the cents place should be aware of the shortcomings of standard floating point types, and consider using external libraries or built-in types, if available, that handle decimal numbers correctly.

I learned a lot with this Insight!

The same problem happens in decimal, for example if you used 1/3 instead. You could use rational numbers instead, but this is why you should not write if (a == 1.0): the value may be pretty close to 1.0 but not actually 1. So you need to write something like if (fabs(a - 1.0) < 0.00001). I wish there was an approximately-equals function built in. The other problem with floats is that once you get above 16 million you can only store even values, then multiples of 4 past 32 million, etc.

Great!! Just learnt something new today. Thanks for sharing!!

Good Insight. It brings back memories of the good old days (bad old days?) when we had no printers, screens or keyboards. I had to learn to read and enter 24 bit floating point numbers in binary, using 24 little lights and 24 buttons. It was made easier by the fact that almost all the numbers were close to 1.0.

Great post, Mark. I was thinking of doing this very thing. Saved me from making a noodle of myself…. Thanks for that. FWIW decimal libraries from Intel correctly do floating point decimal arithmetic. The z9 power PC also has a chip that supports IEEE-754-2008 (decimal floating point) as does the new Fujitsu SPARC64 M10 with "software on a chip". Cool idea. Cool post, too.

You may find this very helpful: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

What C will do with that is do the computations as doubles but store the result in a float. Floats save no time. They are an old-fashioned thing in there to save space. Numerical analysis programmers learn to be experts in dealing with floating point roundoff error. Hey, you can't expect a computer to store an infinite series, which is the definition of a real number. There are packages that do calculations with rational numbers, but these are too slow for number crunching.

Maybe it's time to review/modify/improve the IEEE-754 standard for floating-point arithmetic (IEEE floating point).

Mathematica does rational arithmetic, so it gets it exact: Sum[1/10,{i,10} ] = 1 ;-)

Good insight !

Quickly fix [Avogadro](https://en.wikipedia.org/wiki/Avogadro_constant)!

##6.022 × 10^{23}## right? I fixed it.

The first programming class I took was in 1972, using a language named PL-C. I think the ‘C’ meant it was a compact subset of PL-1. Anyway, you wrote your program and then used a keypunch machine to punch holes in a Hollerith (AKA ‘IBM’) card for each line of code, and added a few extra cards for the job control language (JCL). Then you would drop your card deck in a box, and one of the computer techs would eventually put all the card decks into a card reader to be transcribed onto a tape that would then be mounted on the IBM mainframe computer. Turnaround was usually a day, and most of my programs came back (on 17″ wide fanfold paper) with several pages of what to me was gibberish, a core dump, as my program didn’t work right. With all this rigamarole, programming didn’t hold much interest for me.

Even as late as 1980, when I took a class in Fortran at the Univ. of Washington, we were still using keypunch machines. It was when personal computers started to really get big, and compilers could compile and run your program in about a minute or so, that I could see there was something to this programming thing.

Thanks, Jim, glad you enjoyed it. Do you have any links to the Intel libraries, or a name? I’d like to look into that more.

I certainly remember the days when you’d expect an integer and get “12.00000001”. I think Algodoo still has that “glitch”! I can’t even remember if my first computer had floating point operations in the CPU; it was a Tandy (Radio Shack) Color Computer 2, 64kb, 8086 1 MHz processor. When I graduated to the 80386 the floating point wasn’t precise enough, so I devised a method to use 2 bytes for the integer (+/- 32,767) and 2 bytes for the fractions (1/65,536ths). It was limited in flexibility but exact and quite fast!

My first computer was so slow I would crunch the code for routines in machine code in my head and just type a data string of values to poke into memory through BASIC, because even the compiler was slow and buggy… It was with the object-oriented programming platforms that I really saw the “next level something” for programming.

I don’t think it had an Intel 8086 cpu. According to this wiki article, https://en.wikipedia.org/wiki/TRS-80_Color_Computer#Color_Computer_2_.281983.E2.80.931986.29, the Coco 2 had a Motorola MC6809 processor. I’m 99% sure it didn’t have hardware support for floating point operations.

You must have had the optional Intel 80387 math processing unit or one of its competitors (Cyrix and another I can’t remember). The 80386 didn’t have any hardware floating point instructions.

That’s what I thought at first. The 68B09E was in the CoCo 3, which was the last one, now that I remember. I second-guessed myself.

It was a pentium whatever number I forget but I was only programming up to the 386 instruction set at that time… I printed out the instruction set on that same old fanfold paper stack and just started coding.

The Pentium would have been 80586, but I guess the marketing people got involved and changed to Pentium. If it was the first model, that was the one that had the division problem where some divisions gave an incorrect answer out in the 6th or so decimal place. It cost Intel about $1 billion to recall and replace those bad chips. I think that was in ’94, not sure.

I think that was about the same time. I think it was a Pentium 1, 25 MHz Intel chip with no math co-pro, but it had a spot on the board for it. Perhaps it was just an 80486… can’t even remember the computer model!

Found this on Wikipedia: “The i486 does not have the usual 80-prefix because of a court ruling that prohibits trademarking numbers (such as 80486). Later, with the introduction of the [Pentium brand](https://en.wikipedia.org/wiki/Pentium_(brand)), Intel began branding its chips with words rather than numbers.”

@Mark44 https://software.intel.com/en-us/articles/intel-decimal-floating-point-math-library claims IEEE-754-2008 support

Edit 1/30/16