
Teaching basic lab skills
for research computing

Program Design: Tuning

Program Design/Tuning at YouTube

Hello, and welcome to the next episode of the Software Carpentry lecture on program design using invasion percolation as an example. In this episode, which will be a rather long one, we'll take a look at how to speed up the program we've developed.

Our program now works correctly.

At least, it passes all of our tests.

The next step is to figure out if we need to make it faster. If it's already fast enough to generate the results we need…

…we should spend our time doing other things.

How do we measure a program's speed anyway?

One possibility is to take the average of running it over and over again on grids of various sizes.

But that won't necessarily help us predict the running time on larger grids.

Instead, we're going to use a technique called asymptotic analysis

…which is one of the most powerful theoretical tools in all of computer science.

Our N×N grid has N^2 cells.

We have to fill at least N of those cells to reach the boundary.

In the worst case, we will have to fill as many as (N-2)^2+1, or N^2-4N+5, cells to hit the boundary.

For large N, this is approximately N^2: the ratio between N^2 and the polynomial shown above goes to 1 as N gets very large.

Our program has to look at N^2 cells each time it wants to find the next cell to fill, because right now, we sweep over x in range(N) and y in range(N), and the product of those two loops is N^2.
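That full-grid sweep can be sketched as follows. This is a minimal illustration, assuming the grid is a list of lists indexed as grid[x][y] and the filled region is a set of coordinates; the names are illustrative, not the episode's actual code:

```python
import random

def find_candidate(grid, N, filled):
    """Naive sweep: examine every cell in the N x N grid and return one
    of the lowest-valued cells adjacent to the filled region."""
    best_value, best_cells = None, []
    for x in range(N):
        for y in range(N):
            if (x, y) in filled:
                continue
            # Only cells touching the filled region are candidates.
            if any((x + dx, y + dy) in filled
                   for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1))):
                value = grid[x][y]
                if best_value is None or value < best_value:
                    best_value, best_cells = value, [(x, y)]
                elif value == best_value:
                    best_cells.append((x, y))
    # Break ties at random, as the algorithm requires.
    return random.choice(best_cells) if best_cells else None
```

Every call scans all N^2 cells, which is where the N^2-per-fill cost comes from.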

So the best-case filling time is N×N^2, or N^3, steps in total.

The worst-case filling time is N^2×N^2, or N^4, steps in total.

That's bad news.

Averaging the exponents of N^3 and N^4 to get N^3.5 doesn't make a lot of sense…

…but it will illustrate our problem.

Here are some grid sizes, and the time it would take to fill a grid of that size if filling takes N^3.5 steps on average.

To make these numbers a little bit more realistic, let's add another column and a few more rows. If filling a grid of size N takes one second, then filling a grid that's 100 times as large on each axis will take 115 days. That's a pretty steep cost, because we may need to run our simulation hundreds or even thousands of times to generate good statistics about the fractals it creates.
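As a quick sanity check on that figure, under the N^3.5 scaling assumption above:

```python
# If filling a grid of size N takes 1 second and time grows as N**3.5,
# scaling N up by 100 multiplies the time by 100**3.5 = 10**7.
seconds = 1 * 100 ** 3.5       # ten million seconds
days = seconds / (60 * 60 * 24)
print(round(days, 1))          # about 115.7 days
```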

So here's an idea: why are we always looking at all of the cells in the grid?

Why don't we just look at the cells that might be adjacent to the filled region?

We can keep track of the minimum and maximum X and Y coordinates of the area filled so far, and just loop over the cells that are one less or one greater on each axis. But this actually isn't going to help us very much.
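Tracking that bounding box is just a matter of updating four numbers each time a cell is filled; here is a sketch (clamping at the grid edges is omitted for brevity, and the names are illustrative):

```python
def update_bounds(bounds, cell):
    """Grow the bounding box (xmin, xmax, ymin, ymax) of the filled
    region to include a newly filled cell. The search loop then only
    covers range(xmin - 1, xmax + 2) by range(ymin - 1, ymax + 2)."""
    xmin, xmax, ymin, ymax = bounds
    x, y = cell
    return (min(xmin, x), max(xmax, x), min(ymin, y), max(ymax, y))
```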

On average, the filled region is about half the size of the grid.

(N/2)×(N/2) is N^2/4.

This means that our 115-day run has been brought down to 29 days.

The bad news is, if we go to a grid that's 1.48N×1.48N, we're back to our 115 days again because of that exponent: 1.48^3.5 is almost exactly 4.

We need to find a way to attack that exponent.

Well, here's another thought. If we've just added the cell shown in dark green…

…why are we bothering to check all of the cells shown in yellow?

We already know that the cells shown in red are candidates because they're adjacent to cells that we've already checked.

We've only got one new candidate cell to add to that set. This leads us to one of the big ideas in programming.

You can trade memory for time (and vice versa).

I.e., you can record redundant information in order to save recalculation as a way of making algorithms faster.

In this case, we're going to keep a list of the cells that are on the boundary of the region that we've filled so far. That's redundant information because everything we're recording is also already in the grid.

Each time a cell is filled, we will check its neighbors.

If those neighbors are already in the list, we won't do anything; we won't re-add them to our list of neighboring cells.

But if any of the cells aren't already in the list, we will put them in.

In fact, we will insert them, in order to ensure that the cells with lowest value are always at the front of the list.

This makes it easy to choose the next cell to fill. We just look at all of the cells at the front of the list with equal lowest value and pick one of them at random.
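Choosing among the tied cells at the front of the list might look like this sketch, assuming each record is a (value, x, y) tuple and the list is kept sorted by value (an assumed layout, not necessarily the episode's actual code):

```python
import random

def choose_next(boundary):
    """Pick the next cell to fill from a boundary list of (value, x, y)
    records sorted by value: gather the run of equal-lowest records at
    the front, choose one at random, and remove it from the list."""
    lowest = boundary[0][0]
    i = 0
    while i < len(boundary) and boundary[i][0] == lowest:
        i += 1
    choice = random.choice(boundary[:i])
    boundary.remove(choice)
    return choice
```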

An example will make this clearer. The list of cells on the edge is initially empty because we haven't filled anything in the grid.

When we fill the center cell, we add its neighbors to the list.

And when we add a neighbor, we add the value as well as the coordinates.

We then take the first cell from the front of the list and fill it…

…and add its neighbors again. Notice that we're always keeping the list sorted by increasing value.

That way, the candidates for filling are always at the front of the list.

And of course, if a cell is already in the list, we never add it a second time.

In case of ties, we find all of the cells at the front of the list with the lowest value…

… and pick one of those at random to fill.

How quickly can we insert a value into a sorted list? Here's a simple algorithm.

We set i equal to 0, and then look upwards through the list. As soon as we see a value in the list that is greater than or equal to the value that we've got, we break out of the loop and insert our cell there.

If you check the documentation for the list.insert method, you'll see that this function does the right thing even when the list is empty…

…or when the new value is greater than all of the existing values in the list, i.e., when we're appending the new record to the end of the list.
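The linear insertion just described can be written as a short helper; the (value, x, y) record layout is an assumption:

```python
def insert_sorted(boundary, record):
    """Walk up the list until we meet a record whose value is greater
    than or equal to the new record's, then insert there. list.insert
    handles both edge cases: an empty list (i == 0) and a new value
    greater than everything present (i == len(boundary) appends)."""
    i = 0
    while i < len(boundary) and boundary[i][0] < record[0]:
        i += 1
    boundary.insert(i, record)
```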

If a list has K values, then on average, if we're inserting things in the middle…

…it'll take about K/2 steps.

Our fractals fill somewhere between N and N^2 cells, so again we're going to cheat and average those exponents to say that we fill about N^1.5 cells.

There are about as many cells on the boundary as there are in the fractal itself, so using this list, we're going to fill N^1.5 cells, and each time we do that, it's going to take about N^1.5 steps to insert neighbors into the list.

The product is N^3 steps.

That's not much of an improvement over N^3.5…

…but it turns out there's a much faster way to do this.

Imagine you want to look up a name in a phone book.

You open the book in the middle…

…and if the name there comes after the one you want, you go to the middle of the first half.

If the name in the middle comes before the one you want, you go to the second half.

You then repeat this procedure in that half of the phone book, subdividing and subdividing until you find the name that you're after.

This diagram shows the steps that you go through.

How fast is this procedure?

Well, one probe can find one value.

Two probes can find one value among two.

Three probes can find one value among four.

Four probes can find one among eight.

And in general, K probes can find one value among 2^K.

Turning that around, log2(N) probes is enough to find a place in a list of N values.

That means that the running time for our algorithm will be about N^1.5 log2(N^1.5). The first N^1.5 is the number of times we fill a cell, and log2(N^1.5) is the number of steps it takes each time we fill a cell to find the right place to insert.

If we take out the constant, we wind up with N^1.5 log2(N), because of course log2(N^1.5) is just 1.5 log2(N).

This changes things quite a bit.

Let's highlight the greatest values in each column. What was a running time of 115 days has become a running time of 2 hours.

What's more, as N gets bigger, the ratio gets bigger as well.

The "divide and conquer" technique we use to insert values into a list is called binary search.

It only works if the list values are sorted…

…but it keeps the list values sorted.

There's a Python implementation of this in the bisect library.
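For example, bisect.insort finds the insertion point by binary search, so keeping the boundary list sorted costs log2(K) probes per insertion (tuples compare field by field, so (value, x, y) records stay ordered by value first):

```python
import bisect

# Insert boundary records one at a time; bisect.insort keeps the list
# sorted, locating each insertion point with binary search.
boundary = []
for record in [(5, 0, 1), (2, 1, 0), (4, 1, 2), (2, 2, 1)]:
    bisect.insort(boundary, record)
print(boundary)  # the two 2-valued cells end up at the front
```

One caveat: the probe count is logarithmic, but list.insert itself still shifts elements, so in practice this is faster by a large factor rather than asymptotically optimal.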

Are we done? No: we can do even better.

Generating a grid of N^2 values takes N^2 steps. After all, we have to create N^2 random numbers.

But if we fill the cells shown in green…

…we only ever look at the cells shown in green plus the cells shown in yellow…

…so why are we bothering to generate values for the cells shown in red? Why do we ever create those cells at all?

Instead of storing the grid as a list of lists, we can store it as a dictionary.

The keys in the dictionary are the (x,y) coordinates of cells.

The values are the current values of those cells.

Instead of looking things up directly as grid[x][y], we use a function get_value that takes the grid, X, Y, and the random range as arguments.

To understand why we need the random range, take a look at the implementation of that function. If the coordinate (x,y) is not in the grid—i.e., if it's not already a key in the dictionary called grid—then we create a random value in 1..Z and store that with the key (x,y). We then return the value that we just created.

Of course, if the coordinates (x,y) are already a key in grid, then we'll skip the if and just return the value that's there.
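A minimal version of that function might look like this; it follows the description above, though the exact code in the episode may differ:

```python
import random

def get_value(grid, x, y, z):
    """Return the value of cell (x, y), creating it lazily: if the
    coordinates aren't yet a key in the dictionary, generate a random
    value in 1..z, store it under (x, y), and return it."""
    if (x, y) not in grid:
        grid[(x, y)] = random.randint(1, z)
    return grid[(x, y)]
```

Only cells that are actually examined ever exist in the dictionary, so the red cells in the picture are never created at all.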

This is an example of another commonly-used way to optimize programs.

It's called lazy evaluation.

We're not computing values until they're actually needed: we only ever generate the random numbers that we're going to use.

This does make the program more complicated…

…but it also makes it a lot faster.

What we're doing is trading human time—i.e., programming effort—for higher machine performance. This again is a very common theme in computing.

The moral of the story?

The biggest performance gains always come from changing algorithms and data structures, not from tweaking loops or rearranging conditionals.

The other moral to this story?

If you want to write a program that's fast, you start by writing a program that's simple. You then test it and improve it piece by piece, re-using the tests to check your work at each stage.

Thank you.