Find us on GitHub

Teaching basic lab skills
for research computing

Program Design: Bugs

Program Design/Bugs at YouTube

Hello, and welcome to episode 8 of the Software Carpentry lecture on program design using invasion percolation as an example. In this episode, we'll fix a bug that we created earlier, and look at another part of the program that isn't actually a bug, but is often written incorrectly.

Let's try running the program that we created in the previous episode.

We say python of invperc.py, we ask for a 3×3 grid with random values in the range 1 to 10, and we give it 17983 as a random number seed because that's what my hands typed when they fell on the keyboard.

The program comes back to say "2 cells have been filled," which is what we'd expect: in a 3×3 grid, we fill the center cell and one other, and then we hit the boundary.

Let's try a larger grid.

We run Python on invperc.py on a 5×5 grid with random values in the range 1 to 10 and a different random number seed.

A minute goes by…

…and finally we hit Control-C. It's time to fire up the debugger…

If we take a look at the grid that was created, it looks OK: all of the values are in the range 1 to 10, and it's a 5×5 grid implemented as a list of lists.

Fill the middle cell with -1, and everything still looks good.

Remember, we're using -1 to mark filled cells.

The next cell gets chosen and filled correctly…

…and then the program goes into an infinite loop…

…in the find_candidates function.

Inside this loop, min_set contains the points (2,2) and (1,2)—the two cells shown in green above—and min_val is -1.

Uh oh—that's the problem.

Our marker value, -1, is less than any of the actual values in the grid.

Once we have marked two cells…

…each of those marked cells is adjacent to another marked cell.

Well, at least this one's going to be easy to fix.

This is the code that caused the problem. In find_candidates, we say, for x in range(N), for y in range(N), if the cell at (x,y) is a candidate, then go off and handle the cases where its value is equal to min_val or less than min_val. We'll modify this to say…

…if the cell is already filled, then continue around the loop, i.e., don't do anything with this cell, because it has already been marked as filled.

Great: we found one bug.

How many others haven't we found? How many are still lurking in the code?

How do we validate and verify a program like this? We'll discuss this topic in the next episode. Before we go on, let's look at one line of the program that people often get wrong.

The line of code is, "If x in (0, N-1) or y in (0, N-1), then break." This tests to see if x is either the value 0 or the value N-1, or y is the value 0 or N-1.

I.e., is either coordinate on the grid's edge?

This expression sounds similar, but fails. If x is (0, N-1), or y is (0, N-1). Why does this fail?

Well, this checks to see whether the value of x is the two-value tuple (0, N-1), or y is the two-value tuple (0, N-1).

But since x and y are always integers, neither will ever be a two-value tuple, so both halves of the or will always fail…

…and we will never exit the loop.

This version sounds like what we'd say to another person: if x or y is 0 or N-1.

But how will the computer interpret this? Let's take a look at the expression tree that the computer uses.

One way the computer might interpret it is, "(x or y) is (0, N-1)". Well, x is an integer, and and y is an integer. Let's assume that they're both non-zero.

The result of the expression x or y will be True.

True is never going to be a two-value tuple.

So this will always fail, and we'll never exit the loop.

This definitely isn't right.

Interpretation number two is, "x or (y is (0, N-1))". The tree on the right shows this version.

Well, y is never a two-integer tuple…

…so this is just x or False

…which is just "x is not 0"…

…and this isn't what we want either.

The expression on the right is what Python actually does.

The is operator binds more tightly than or, just as * (times) binds more tightly than +.

Thank you.