Teaching basic lab skills for research computing

Testing: Floating Point

Testing/Floating Point at YouTube
media 01

Hello, and welcome to the fifth episode of the Software Carpentry lecture on testing. In this episode, we’ll take a look at how computers represent real numbers, and some of the traps waiting for anyone who uses them.

media 02

Let’s go back to the fields of Saskatchewan. Instead of working directly with pixels in photographs…

media 03

…we would probably actually work with some kind of geographic information system that would store each field’s coordinates and ground cover.

media 04

Those coordinates would probably be recorded as latitude and longitude.

media 05

Which the computer would store as floating point numbers rather than integers.

media 06

And that’s where our problems start. To understand, we have to step back and look at how computers store numbers.

media 07

Finding a good representation for floating point numbers is hard.

media 08

The root of the problem is that we cannot represent an infinite number of real values with a finite set of bit patterns. And unlike integers, no matter what values we do represent, there will be an infinite number of values between each of them that we can’t.

media 09

The explanation that follows is simplified—possibly over-simplified—to keep it manageable.

media 10

Please, if you’re doing any calculation on a computer at all, take half an hour to read Goldberg’s excellent paper, “What Every Computer Scientist Should Know About Floating-Point Arithmetic”.

media 11

So: floating point numbers are usually represented using sign, magnitude, and an exponent.

media 12

In a 32-bit word, the IEEE 754 standard calls for 1 bit of sign, 23 bits for the magnitude (or mantissa), and 8 bits for the exponent.
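If you'd like to see those three fields for yourself, here is a small Python sketch (the helper name float_bits is ours, not part of any library) that packs a value as a 32-bit float and slices the result into its sign, exponent, and mantissa bits:

    import struct

    def float_bits(x):
        """Show the sign, exponent, and mantissa fields of a 32-bit float."""
        [raw] = struct.unpack('>I', struct.pack('>f', x))  # reinterpret the 4 bytes as an unsigned integer
        bits = format(raw, '032b')
        return bits[0], bits[1:9], bits[9:]                # 1 sign bit, 8 exponent bits, 23 mantissa bits

    print(float_bits(0.9))  # ('0', '01111110', '11001100110011001100110')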

media 13

To illustrate the problems with floating point, we’ll use a much dumber representation.

media 14

We’ll only worry about positive values without fractional parts.

media 15

And to make it all fit on our slides, we’ll only use 5 bits: 3 for the magnitude, and 2 for the exponent.

media 16

Here are the possible values (in decimal) that we can represent this way. Each one is the mantissa times two to the exponent.
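If you'd rather generate that table than read it off a slide, a one-line set comprehension does the job (this snippet is ours, not from the lecture):

    # Every value the toy scheme can store: a mantissa from 0-7 times 2 to an exponent from 0-3.
    values = sorted({m * 2 ** e for m in range(8) for e in range(4)})
    print(values)
    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 32, 40, 48, 56]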

media 17

For example, the decimal value 48 is binary 110 times 2 to the binary 11 power.

media 18

Which is 6 times 2 to the third…

media 19

…or 6 times 8.

media 20

Of course, real floating point representations like the IEEE 754 standard don’t have all the redundancy that you see in this table. Again, if you’d like to know how they get rid of it, and what implications this has, please read Goldberg’s paper.

media 21

Here’s a clearer view of the values our scheme can represent.

media 22

The first thing you should notice is that there are a lot of values we can’t store. We can do 8 and 10, for example, but not 9.

media 23

This is exactly like the problems hand calculators have with fractions like 1/3: in decimal, we have to round that to 0.3333 or 0.3334.

media 24

But if this scheme has no representation for 9…

media 25

…then 8+1 must be stored as either 8 or 10.

media 26

Which raises an interesting question: if 8+1 is 8, what is 8+1+1?

media 27

If we add from the left, 8+1 is 8, plus another 1 is 8 again.

media 28

If we add from the right, though, 1+1 is 2, and 2+8 is 10. Changing the order of operations can make the difference between right and wrong.
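Here is that calculation as a small Python sketch. The helper round_toy is ours: it simply rounds to the nearest value the 5-bit scheme can store, breaking the 8-or-10 tie downward, as the narration does:

    REPRESENTABLE = sorted({m * 2 ** e for m in range(8) for e in range(4)})

    def round_toy(x):
        """Round x to the nearest value the toy 5-bit scheme can store."""
        return min(REPRESENTABLE, key=lambda v: abs(v - x))  # ties resolve to the smaller value

    left_to_right = round_toy(round_toy(8 + 1) + 1)  # 8+1 -> 8, then 8+1 -> 8 again
    right_to_left = round_toy(8 + round_toy(1 + 1))  # 1+1 -> 2, then 2+8 -> 10
    print(left_to_right, right_to_left)              # 8 10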

media 29

This is the sort of thing that numerical analysts spend their time on. In this case, if we sort the values we’re adding, then add from smallest to largest, it gives us the best chance of getting the best possible answer. In other situations, like inverting a matrix, the rules are much, much more complicated—so complicated that we’ll skip past them completely.
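The same effect shows up with real 64-bit floats. In this sketch (ours, not the lecture's), ten 1.0's vanish entirely when they are added to 1e16 one at a time, but survive when the small values are added first; the standard library's math.fsum gets the right answer either way by keeping track of the rounding errors as it goes:

    import math

    values = [1e16] + [1.0] * 10

    total = 0.0
    for v in values:          # largest value first: each 1.0 is lost to rounding
        total += v
    print(total)              # 1e+16

    total = 0.0
    for v in sorted(values):  # smallest values first: the 1.0's accumulate to 10 before meeting 1e16
        total += v
    print(total)              # 1.000000000000001e+16

    print(math.fsum(values))  # 1.000000000000001e+16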

media 30

Instead, here’s another observation about our uneven number line:

media 31

The spacing between the values we can represent is uneven.

media 32

However, the relative spacing stays roughly the same: the first group of values is separated by 1, the next by 2, then 4, then 8, so the ratio of the spacing to the values themselves stays roughly constant.
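You can see the same pattern in real doubles. Python 3.9's math.ulp reports the gap between a value and its nearest representable neighbor; in this snippet of ours, the gap grows with the value, but the ratio of gap to value stays near machine epsilon:

    import math  # math.ulp requires Python 3.9 or later

    for x in [1.0, 8.0, 64.0, 1e6, 1e12]:
        print(x, math.ulp(x), math.ulp(x) / x)  # the gap grows, but the relative gap stays near 2.2e-16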

media 33

This happens because we’re multiplying the same fixed set of mantissas by ever-larger exponents, and it points us at a couple of useful definitions.

media 34

The absolute error in some approximation to a value is simply the absolute value of the difference between the two.

media 35

The relative error, on the other hand, is the ratio of the absolute error to the value we’re approximating.

media 37

This example might help. If we’re off by 1 in approximating 8+1 and 56+1, that’s the same absolute error, but the relative error is much larger in the first case than in the second.
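In code, those two definitions are one line each (the function names here are ours):

    def absolute_error(approx, exact):
        return abs(approx - exact)

    def relative_error(approx, exact):
        return abs(approx - exact) / abs(exact)

    # Off by 1 either way, but the relative error is roughly six times larger for the smaller value.
    print(absolute_error(8, 9), relative_error(8, 9))      # 1 0.111...
    print(absolute_error(56, 57), relative_error(56, 57))  # 1 0.0175...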

media 38

When we’re thinking about floating point numbers, relative error is almost always more useful than absolute—after all, it makes little sense to say that we’re off by a hundredth when the value in question is a billionth.

media 39

What does this have to do with testing?

media 40

Well, let’s have a look at a little program.

media 41

The loop runs over the integers from 1 to 9 inclusive.

media 42

Using those values, we create the numbers 0.9, 0.09, 0.009, and so on, and put them in the list vals. (Remember, the double star means “raised to the power” in Python.)

media 43

We then calculate the sum of those numbers. Clearly, this should be 0.9, 0.99, 0.999, and so on.

media 44

But is it?

media 45

Let’s calculate the same value a different way, by subtracting 0.1 from 1, then subtracting 0.01 from 1, and so on.

media 46

This should create exactly the same sequence of numbers.

media 47

Let’s check by finding the difference and printing it out.
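The slide's code isn't reproduced in this transcript, but here is a sketch of the program as described, reconstructed from the narration (so the variable names are ours):

    vals = []
    for i in range(1, 10):                 # the integers 1 to 9 inclusive
        vals.append(9.0 * 10.0 ** -i)      # 0.9, 0.09, 0.009, ...
        total = sum(vals)                  # should be 0.9, 0.99, 0.999, ...
        expected = 1.0 - 10.0 ** -i        # the same value, computed by subtraction
        print(i, total, total - expected)  # loop index, running total, difference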

media 48

Here’s our answer. The first column is the loop index; the second is what we got by totaling up 0.9, 0.09, 0.009, and so on; and the third is the difference between that total and what we got by subtracting from 1.0.

media 49

The first thing you should notice is that the very first value contributing to our sum is already slightly off. Even with 23 bits for a mantissa, we cannot exactly represent 0.9 in base 2, any more than we can exactly represent 1/3 in base 10. Doubling the size of the mantissa would reduce the error, but we can’t ever eliminate it.
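You can see that very first error directly. Handing the float 0.9 to the decimal module's Decimal constructor prints the exact value the machine actually stores (this demonstration is ours; Python's floats are 64-bit doubles, so the mantissa here is 52 bits rather than 23, but the error is still there):

    from decimal import Decimal

    print(Decimal(0.9))  # 0.9000000000000000222...  (the nearest double is slightly larger than 0.9)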

media 50

The good news is, 9 times 10^-1 and 1 minus 0.1 are exactly the same—it might not be precisely right, but at least it’s consistent.

media 51

The same cannot be said for some of the later values, though: at some points, there actually is a small difference between what we get from our two different formulas.

media 52

And if you weren’t confused enough, sometimes the accumulation of addition errors actually makes the result more accurate once again.

media 53

So what does this have to do with testing?

media 54

Well, if the function you’re testing uses floating point numbers, what do you compare its result to?

media 55

If we compared the sum of the first few numbers in vals to what it’s supposed to be, the answer could be False.

media 56

Even if we’re initializing the list with the right values…

media 57

…and calculating the sum correctly.

media 58

This is a hard problem—a very hard problem.

media 59

No one has a good generic answer, because the root of our problem is that we’re using approximations, and each approximation has to be judged on its own merits.

media 60

So what can you do to test your programs?

media 61

The first rule is, compare what you get to analytic solutions whenever you can. For example, if you’re looking at the behavior of drops of liquid helium, start by checking your program’s output on a stationary spherical drop in zero gravity. You should be able to calculate the right answer in that case, and if your program doesn’t work for it, it probably won’t work for anything else.

media 62

The second rule is to compare more complex versions of your code to simpler ones. If you’re about to replace a simple algorithm for calculating heat transfer with one that’s more complex, but hopefully faster, don’t throw the old code away: use its output as a check on the correctness of the new code. And if you bump into someone at a conference who has a program that can calculate some of the same results as yours, swap data sets: it’ll help you both.

media 63

The third rule is, never use == (or !=) on floating point numbers, because two numbers calculated in different ways will probably not have exactly the same bits.
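The classic one-liner makes the point (try it at any Python prompt):

    print(0.1 + 0.2 == 0.3)  # False
    print(0.1 + 0.2)         # 0.30000000000000004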

media 64

(It’s OK to use less than, greater than or equal to, and other orderings, though—we’ll explore why in the exercises.)

media 65

If you want to compare floating point numbers, use a function like nose.tools.assert_almost_equal. As its documentation says:

media 66

This checks whether two numbers are within some tolerance of each other. If they are, it declares them equal.
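Here is a quick sketch of both styles of comparison. nose's assert_almost_equal, like the unittest method it wraps, uses an absolute test by default (the difference must round to zero at a given number of decimal places), while the standard library's math.isclose lets you set a relative tolerance, an absolute one, or both. The example values below are ours:

    import math
    from nose.tools import assert_almost_equal  # requires the nose package to be installed

    total = sum([9.0 * 10.0 ** -i for i in range(1, 4)])  # 0.9 + 0.09 + 0.009

    assert_almost_equal(total, 0.999, places=7)      # absolute: round(total - 0.999, 7) must be 0
    print(math.isclose(total, 0.999, rel_tol=1e-9))  # True: within a relative tolerance
    print(math.isclose(total, 0.999, abs_tol=1e-9))  # True: within an absolute tolerance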

media 67

And if you thought to wonder whether that tolerance is absolute or relative, you’re ready to start testing floating-point code.

media 68

Thank you.