Hello, and welcome to the seventh episode of the Software Carpentry lecture on program design using invasion percolation as an example. In this episode, we'll see how to assemble the bits and pieces we've already developed to create a working program.
As you recall, we now know how to:
create a grid…
fill it with random numbers…
mark cells that have been filled…
find cells that might be filled next…
…and choose one of them at random.
It's time to put everything together.
We're going to show you this code in exactly the order we would write it.
We start at the top and work down…
…introducing functions and variables as we need them…
…and tidy up a little bit along the way. In the next-but-one episode of this lecture, we'll do a bit more tidying up as well.
Here's the first code that we write. We start with a documentation string to remind ourselves of what this program does, we import the libraries we need, and then we create the main driver.
Note that we're importing the entire random number generation library instead of just the functions that we need. This will make things easier to understand in future when we come back to revise the program.
The first thing we need to do in our main driver is get parameters from the command line. We get all but the first element of sys.argv
and store it in the variable arguments
. If you recall, sys.argv[0]
is the name of the program, and we don't need that. We then convert the first three values in arguments
to integers, and assign them to grid_size
, value_range
, and random_seed
. If we get an IndexError
, it means that one of the indices 0, 1, or 2 wasn't valid, so we don't have enough arguments. If we get a ValueError
, it means that one of our attempts to convert a string to an integer failed, so again we print an error message.
Notice that we've used a function called fail
to report errors. We haven't written this yet, so now's a good time to go back and do that.
Here it is. We give it a docstring, because every function should have one; we print the error message to standard error, so that it will appear on the user's console, and then we exit from the program.
The box in the lower right shows the structure of the program so far. We've got a documentation string, our fail
function, and then the main driver of our program.
The next step is to run the simulation. We do that by seeding the random number generator, creating a random grid, marking the center cell as filled, and then filling the rest of the grid.
In this code, we've used three functions that don't exist yet, so we're going to have to go back and write them. Before we do that, though, let's finish off the main body of the program.
The last task we have is to report results.
But we haven't actually decided what to do about this. Nothing in our specification told us whether we were supposed to draw the fractal, calculate some statistics, or do something else entirely.
For now, we'll just count the number of cells we've filled in, and print that.
Here's the code. We will change fill_grid
so that it returns the number of cells it filled in, and then we'll print that number out…
…except we'll add one to the value returned by fill_grid
because we marked the center cell as being filled manually. This is a little bit clumsy: someone who hasn't read our code carefully might think—reasonably—that fill_grid
returns the total number of cells that are filled, not one less than that. We should go back and tidy that up later.
Here's the code to create a random grid. It checks that the parameters it's been passed make sense, and then it builds a list of lists of random values.
A little documentation would help here—let's go back and add a docstring.
create_random_grid
returns an N×N grid of random values in 1 to Z. It assumes that the random number generator has already been seeded, i.e., it is not going to seed the random number generator itself.
The box in the lower right shows where we put this function in our program file.
The next step is the function mark_filled
, which marks a grid cell as being filed. The docstring says that, and then we use assertions to test that the X and Y coordintaes we've been given are actually in bounds. You might think we don't need this code, because if the X or Y coordinate is out of bounds, Python will fail and print its own error message, but there are three reasons to put these assertions in. The first is documentation: this says (better than a comment) what we expect of X and Y. The second is, these error messages are more meaningful that Python's generic "IndexError: index out of range" message. The third reason to put these assertions in is that some values of X and Y that we think of as illegal will actually work as indices in the grid. If X or Y is negative, then they will count down from the end of the grid, rather than up from the beginning. Python is quite happy to do this, but it almost certainly indicates an error somewhere in our program. The last line in this function assigns -1 to grid[x][y]
.
We're using -1 to indicate filled cells, but we don't know if people are going to remember that when they're reading our code. If you say "grid at X, Y assigned -1", it's not immediately clear what you're doing. So let's make a small change right now.
Near the top of our program we'll create a variable called FILLED
, and give it the value -1, so that in our function, we can say "grid at X, Y is assigned FILLED". FILLED
is written in capital letters because we think of it as a constant, and by convention, constants are normally written in all caps.
Once again, the box in the lower right shows us the structure of our file. The constant goes near the top of the program, because that's where people will expect to find it, and our mark_filled
function goes in the middle.
The next function in our to-do list is fill_grid
. The docstring says that it fills an N×N grid until the filled region hits the boundary. It assumes that the center cell has been filled before the function is called.
We begin by setting up the variables N
and num_filled
. N
is the grid size, and num_filled
is the number of cells that this function has filled so far. We've squeezed these two variables onto one line so that our function will fit on a media. In a real program, we'd put each one on a separate line.
We then go into what looks like an infinite loop. The loop guard is "while TRUE", and since True
is always true, this will never exit. However, in Python and most other languages, anything that looks like a deliberate infinite loop almost always signals a loop that exits in the middle.
And in fact that's the case. Down near the bottom of this loop we test a condition and use break
to break out of that loop if this condition is true. This is a very common idiom.
Inside the loop, we call another function, find_candidates
, to find the set of candidates for filling. This function hasn't been written yet, so we add it to our to-do list.
We then check that the set of candidates is non-empty, because if we haven't found any candidates for filling, something has probably gone wrong with our program.
And then, as discussed in a previous episode, we convert the set of candidates to a list, make a random choice to find the (x,y) coordinates of the cell we're going to fill, and then we mark that cell as filled, and increment our count of the number of cells we've filled so far.
We then check to see whether either the X or the Y coordinate is on the boundaries of the grid. If it is, we break out of our loop…
…and return the number of cells that we've filled.
Again, the box in the lower right shows where this function fits in the file.
One more function. find_candidates
finds low-valued neighbor cells and returns them as a set. N
is assigned the size of the grid, min_val
and min_set
are initialized as discussed in an earlier episode, and then we loop over the X and Y coordinates, checking to see if the neighbors of the cell we're looking at are already filled.
We're going to stop right there.
This code is already hard to read.
In fact, it contains a bug.
That y+1
should be y-1
, but you probably didn't notice that because there was too much code to read at once.
A good rule of thumb is, "Listen to your code as you write it." If the code is difficult to understand when read aloud, then it's probably going to be difficult to understand when you're debugging, so you should try to simplify it.
This version of find_candidates
introduces a helper function called is_candidate
.
This is much clearer when read aloud.
If the cell at (x,y) is a candidate, then we check to see whether it ties with the current lowest value or is a new lowest value. If it ties with the lowest value, we add its coordinates to the set. If it's a new lowest value, then we re-initialize min_val
and min_set
as discussed in an earlier episode.
As you can see, the find_candidates
function fits right above fill_grid
in our file.
We can then insert the is_candidate
function, which has the tests we discussed in the previous episode, right above find_candidates
.
It's finally time to run our program. But actually, it isn't.
It's finally time to test our program.
Because there's a bug lurking in the code that we just put together.
Take a moment now, read over the code, and try to find the bug before moving on to the next episode.
Thank you.