Hello, and welcome to the fourth episode of the Software Carpentry lecture on testing. In this episode, we'll look at why it's important to test interfaces, rather than implementations, and why and how to design software so that it's easy to test.
One of the most important ideas in computing is the difference between interface and implementation.
Something's interface specifies how it interacts with the world: what it will accept as input, and what output it produces. It's like a contract in business: if Party A does X, then Party B guarantees Y.
Something's implementation is how it accomplishes whatever it does. This might involve calculation, database lookups, or anything else. The key is, it's hidden inside the thing: how it does what it does is nobody else's business.
For example, here's a function in Python that integrates a function of one variable over a certain interval.
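The function itself appeared on a slide; here is a minimal sketch of what it might look like. The name integrate and the midpoint-rule implementation are assumptions for illustration; only the interface matters here.

    def integrate(func, low, high, steps=1000):
        '''Return an approximation of the integral of func(x) from low to high.'''
        width = (high - low) / steps
        total = 0.0
        for i in range(steps):
            midpoint = low + (i + 0.5) * width   # midpoint of each sub-interval
            total += func(midpoint) * width
        return total

    integrate(lambda x: x ** 2, 0.0, 1.0)        # approximately 1/3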
Its interface is simple: given a function and the low and high bounds on the interval, it returns the appropriate integral. A fuller definition of its interface would also specify how it behaves when it's given bad parameters, error bounds on the result, and so on.
Its implementation could use any of a dozen algorithms. In fact, its implementation could change over time as new algorithms are developed. As long as its contract with the outside world stays the same, none of the programs that use it should need to change. This allows users to concentrate on their tasks, while giving whoever wrote this function the freedom to tweak it without making work for other people.
We often use this idea—the separation between interface and implementation—to simplify unit testing.
The goal of unit testing is to test the components of a program one by one—that's why it's called "unit" testing.
But the components in real programs almost always depend on each other: this function calls that one, this data structure refers to the one over there, and so on.
How can we isolate the component under test from the rest of the program so that we can test it on its own?
One technique is to replace the components we're not currently testing with simplified versions that have the same interfaces, but much simpler implementations, just as a director would use a stand-in rather than a star when fiddling with the lighting for a show.
Doing this for programs that have already been written sometimes requires some reorganization, or refactoring.
But once you understand the technique, you can build programs with it in mind to make testing easier.
Let's go back to our photographs of fields in Saskatchewan.
We want to test a function that reads a photo from a file. (Remember that a photo is just a set of rectangles.)
Here's a plausible outline of the function. It creates a set to hold the rectangles making up the photo, opens a file, and then reads rectangles from the file and puts them in the set. When the input is exhausted, the function closes the file and returns the set.
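As a sketch (the name photo_read and the one-rectangle-per-line file format are assumptions):

    def photo_read(filename):
        '''Read a photo, i.e., a set of rectangles, from a file.'''
        result = set()
        reader = open(filename, 'r')
        for line in reader:
            # Assume each line holds the corner coordinates 'x0 y0 x1 y1'.
            x0, y0, x1, y1 = [int(v) for v in line.split()]
            result.add(((x0, y0), (x1, y1)))
        reader.close()
        return result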
And here's a unit test for that function. It reads data from a file called unit.pht, then checks that the result is a set containing exactly one rectangle.
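In outline, assuming the photo_read sketch above and a rectangle represented as a pair of corner coordinates:

    def test_photo_read_unit():
        photo = photo_read('unit.pht')        # depends on this file existing on disk
        assert photo == { ((0, 0), (1, 1)) }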
This is pretty straightforward, but experience teaches us that it's a bad way to organize things.
First, this test depends on an external file, and on that file being in exactly the right place. Over time, files can be lost, or moved around, which makes tests that depend on them break.
Second, it's hard to understand a test if the fixture it depends on isn't right there with it. Yes, it's easy to open the file and read it, but every bit of extra effort is a bit less testing people will actually do.
Third, file I/O is slower than doing things in memory—tens or hundreds of thousands of times slower.
If your program has hundreds of tests, and each one takes a second to run, developers will have to wait several minutes to find out whether their latest change has broken anything that used to work. The most likely result is that they'll run the tests much less frequently…
…which means they'll waste more time backtracking to find and fix bugs that could have been caught when they were fresh if the tests only took seconds to run.
Here's how to fix this. Imagine that instead of reading rectangles, we're just counting them.
This simple function assumes the file contains one rectangle per line, with no blank lines or comments.
Of course, a real rectangle counting function would probably be more sophisticated, but this is enough to illustrate our point.
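A sketch of such a counting function (the name count_rect is an assumption):

    def count_rect(filename):
        '''Count rectangles in a file, one rectangle per line.'''
        reader = open(filename, 'r')
        count = 0
        for line in reader:
            count += 1
        reader.close()
        return count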
Here's the function after refactoring.
We've taken the inner core of the original function and made it a function in its own right. This new function does the actual work—i.e., it counts rectangles—but it does not open the file that the rectangles are read from.
That is still done by the original function. It opens the input file, calls the new function that we extracted, then closes the file and returns the result.
Notice that this function keeps the name of the original function, so that any program that used to call count_rect can still do so.
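A sketch of the refactored pair:

    def count_rect_in(reader):
        '''Count rectangles in anything that behaves like an open file.'''
        count = 0
        for line in reader:
            count += 1
        return count

    def count_rect(filename):
        '''Open a file, count the rectangles in it, and close it.'''
        reader = open(filename, 'r')
        result = count_rect_in(reader)
        reader.close()
        return result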
Now let's write some tests.
This piece of code checks that count_rect_in—the function that actually does the hard work—handles the three-rectangle case properly.
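A sketch of that test; the fixture's contents are invented, and in modern Python the StringIO class lives in the io module:

    from io import StringIO   # in Python 2, this class lived in the StringIO module

    def test_count_rect_in_three():
        fixture = StringIO('0 0 1 1\n1 1 2 2\n2 2 3 3')
        assert count_rect_in(fixture) == 3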
Instead of an external file, we're using a string in the test program as a fixture.
To make this string look like a file, we're relying on a Python class called StringIO. As the name suggests, this acts like a file, but uses a string instead of the disk for storing data. StringIO has all the same methods as a file, like readline, so count_rect_in doesn't know that it isn't reading from a real file on disk.
We can use this same trick to test functions that are supposed to write to files as well.
Instead of opening a file, filling it, and closing it, we create a StringIO object and "write" to that.
We then use StringIO's getvalue method—one of the few things it has that real files don't—to get back the text we've written and check that it's correct.
For example, here's a unit test to check that another function, photo_write_to, can correctly write out a photo containing only the unit square. Once again, we create a StringIO and pass that to the function instead of an actual open file.
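A sketch, assuming the same rectangle representation and a one-rectangle-per-line output format:

    from io import StringIO

    def test_photo_write_unit_square():
        fixture = { ((0, 0), (1, 1)) }
        output = StringIO()
        photo_write_to(fixture, output)
        assert output.getvalue() == '0 0 1 1\n'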
If photo_write_to only writes to the file using the methods that real files provide, it won't know that it's been passed something else.
Once we're finished writing, we can call getvalue to get the text that we wrote, and check it to make sure it's what it's supposed to be.
In order to make output testable, though, there's one more thing we have to do.
Here's a possible implementation of photo_write_to. It puts the rectangles in the photo into a list, sorts that list, then writes the rectangles one by one.
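A sketch of that approach, using the assumed rectangle format from earlier:

    def photo_write_to(photo, writer):
        '''Write a photo's rectangles in a deterministic (sorted) order.'''
        contents = list(photo)
        contents.sort()
        for ((x0, y0), (x1, y1)) in contents:
            writer.write(f'{x0} {y0} {x1} {y1}\n')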
This is simple enough, but why do the extra work of sorting? Why not just loop over the set and write the rectangles out directly?
Please take a moment and see if you can think of the reason.
Let's work backwards to the answer. This version of photo_write_to is shorter and faster than the previous one.
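Here's a sketch of it, keeping the same assumed format:

    def photo_write_to(photo, writer):
        # No sorting: loop over the set directly.
        for ((x0, y0), (x1, y1)) in photo:
            writer.write(f'{x0} {y0} {x1} {y1}\n')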
But there is no way to predict its output for any photo that contains two or more rectangles.
For example, here's a simple photo showing two fields of corn ready for harvest.
And here are two lines of Python that we might put in a unit test to represent the photo, and write it to a file or a StringIO.
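Something like this, say, where output is a StringIO created earlier in the test, and the field coordinates are invented for illustration:

    fixture = { ((0, 0), (1, 1)), ((5, 5), (8, 8)) }   # the two fields
    photo_write_to(fixture, output)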
You probably think the function's output will look like the first of the two listings below, but it could equally well look like the second, with the rectangles in reverse order.
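Using the assumed format, here are the two possibilities:

    0 0 1 1
    5 5 8 8

or, just as likely:

    5 5 8 8
    0 0 1 1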
These two representations are conceptually the same, but they're very different as text.
The problem, of course, is that sets are unordered.
Or rather, the elements in a set are stored in an arbitrary order that's under the computer's control.
Since we don't know what that order is, we can't predict the output if we loop over the set directly, which means we don't know what to compare the output to. If we sort the rectangles, on the other hand, they'll always be in the same order, and to sort them, we have to put them in a list first.
One final lesson for this lecture: you probably haven't noticed, but the tests we've written in this episode are inconsistent.
Here's the fake "file" we created for testing the photo-reading function.
And here's the string we used to check the output of our photo-writing function.
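Side by side, as Python string literals from the sketches above (the variable names are just for display):

    reading_fixture = '0 0 1 1\n1 1 2 2\n2 2 3 3'
    writing_expected = '0 0 1 1\n'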
Please take a moment and see if you can see the inconsistency.
That's right: one string has a newline at the end, and the other doesn't.
It doesn't matter which convention we pick: either is better than saying "maybe", because if we allow both, our code becomes more complicated, and more testing will be required.
Stepping back, the most important lesson in this episode isn't how to test functions that do I/O. The most important idea is that you should design your programs so that their components can be tested.
To do this, you should depend on interfaces, not implementations: on the contracts that functions provide, not on the details of how they accomplish whatever they do.
Following this rule will make it easy for you to replace components that you're not currently testing with simplified versions to make it easier to test the ones you are interested in.
It will also save you from writing your tests over and over as the internals of the functions you are testing are changed. Empirical studies have shown that interfaces are longer-lived than implementations: if you rely on the former rather than the latter, you'll spend less time rewriting tests, and more time figuring out what effect climate change is having on fields in Saskatchewan.
The other rule when you're designing programs to be testable is to isolate interactions with the outside world.
For example, code that opens files should be separated from code that reads data, so that you can test the latter without needing to do the former.
Finally, you should make the things you are going to examine to check the result of a test deterministic, i.e., the result of a particular function call should always be exactly the same value, so that you can compare it directly to the expected result.
Unfortunately, this last rule can sometimes be hard to follow in scientific programs. Our next episode will explain why.