
### Teaching basic lab skills for research computing

A student who's doing this course online emailed us a question:

> I want to save the state of my program every few hours so I can re-start it later. What's the best format for doing this?

There's no simple or single answer, but exploring the options is a natural way to introduce a few more programming ideas.

## Vectors

First, let's suppose the program's state is a list of real numbers, which could represent something like the temperature at each point along a line. Our data file looks like this:

```
1.0
1.1
1.015
1.23
```

and so on, with one value per line—or in other words, with a newline character as a separator between numbers. It's trivial to write functions that will store and load such a list:

```
def save(filename, vector):
    writer = open(filename, 'w')
    for value in vector:
        print >> writer, value
    writer.close()

def load(filename):
    reader = open(filename, 'r')
    vector = []
    for line in reader:
        value = float(line.strip())
        vector.append(value)
    reader.close()
    return vector
```
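For comparison, here is a minimal Python 3 sketch of the same pair of functions (the `print >>` statement is gone in Python 3), round-tripped through an in-memory stream instead of a real file:

```python
import io

def save(writer, vector):
    # one value per line, i.e. newline as the separator between numbers
    for value in vector:
        writer.write('%s\n' % value)

def load(reader):
    # float() ignores the trailing newline for us
    return [float(line) for line in reader]

buffer = io.StringIO()
save(buffer, [1.0, 1.1, 1.015, 1.23])
assert load(io.StringIO(buffer.getvalue())) == [1.0, 1.1, 1.015, 1.23]
```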

## Matrices

What if the program's state is a two-dimensional matrix instead of a vector? If the matrix is a list of lists in our program, our functions are almost as simple as the vector versions:

```
def save(filename, matrix):
    writer = open(filename, 'w')
    for row in matrix:
        for value in row:
            print >> writer, value,    # suppress the newline
        print >> writer                # print a newline
    writer.close()

def load(filename):
    reader = open(filename, 'r')
    matrix = []
    for line in reader:
        temp = []
        line = line.strip().split()
        for entry in line:
            value = float(entry)
            temp.append(value)
        matrix.append(temp)
    reader.close()
    return matrix
```

and so are our data files:

```
1.0 1.015 1.212 1.224
1.30 1.325 1.39 1.447
1.676 1.9 1.81 2.001
```

`save` prints each row of the matrix on a single line. (Remember: a trailing comma in a `print` statement suppresses the newline Python would normally print, while `print` with no arguments just prints a newline.) `load` reads those lines, splits them on whitespace, and converts the fragments from strings to numbers, building up a list-of-lists as it goes. Another way of looking at it is that we use newline as a row separator, but a single space as a column separator.

## Error Checking

These functions work, but they wouldn't meet most programmers' standards. To see why, imagine that our file is improperly formatted:

```
1.0 2.0 3.0
4.0 5.0
7.0 8.0 9.0
```

A value is missing from the second row, but our `load` function won't care: it will happily return an irregular list-of-lists. The program that uses this matrix might blow up later on, but (a) then we'd have to spend time working backwards from the point of failure, which might be thousands of lines later, and (b) the program might not actually blow up, but instead just produce the wrong answer.

Since "silent failures" are the worst possible outcome, we'd like to fail early and fail often, so let's modify `load` to catch this case:

```
def load(filename):
    reader = open(filename, 'r')
    matrix = []
    for line in reader:
        temp = []
        line = line.strip().split()
        for entry in line:
            value = float(entry)
            temp.append(value)

        if len(matrix) > 0:
            assert len(temp) == len(matrix[0]), \
                   'Rows are not the same length'

        matrix.append(temp)
    reader.close()
    return matrix
```

Whenever we construct a new sub-list, we check if there are already any rows in `matrix`. If there are, we compare the length of the new row to the length of the zero'th row of `matrix`, and halt with an error if they're different. Doing this ensures that all rows have the same length as the first row we read.
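As a sanity check, here is the rule on its own as a runnable Python 3 snippet (`check_rect` is our name, not part of the functions above):

```python
def check_rect(matrix):
    # every row must be as long as the zero'th row
    for row in matrix:
        assert len(row) == len(matrix[0]), 'Rows are not the same length'

check_rect([[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]])   # regular: passes silently
try:
    check_rect([[1.0, 2.0, 3.0], [4.0, 5.0]])    # ragged: fails early
    failed = False
except AssertionError:
    failed = True
assert failed
```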

If we're going to check input, let's check output as well. This doesn't guarantee data files will always be correct, since someone could modify them in a text editor outside our program, but it does reduce the risk.

```
def save(filename, matrix):
    writer = open(filename, 'w')
    for row in matrix:
        assert len(row) == len(matrix[0]), \
               'Rows are not the same length'
        for value in row:
            print >> writer, value,
        print >> writer
    writer.close()
```

Again, we check the length of each row against the length of the zero'th row. And yes, the first time through the loop we will check the zero'th row's length against itself, but it's simpler (and probably more efficient) to do that unnecessary calculation than to avoid it.

Our functions now check the lengths of rows, but what if entire rows are missing? Or what if we saved the state of the program when it was running for a 780×780 case, but mistakenly try to load that file when we're doing an 870×870 calculation? The solution that almost every file format uses is to add some metadata that describes the rest of the data. In our case, the metadata is simply the dimensions of the matrix as a pair of integers on the first line of the file:

```
def save(filename, matrix):
    writer = open(filename, 'w')
    print >> writer, len(matrix), len(matrix[0])
    for row in matrix:
        assert len(row) == len(matrix[0]), \
               'Rows are not the same length'
        for value in row:
            print >> writer, value,
        print >> writer
    writer.close()

def load(filename):
    reader = open(filename, 'r')
    dims = reader.readline().strip().split()
    num_rows = int(dims[0])
    num_cols = int(dims[1])
    matrix = []
    num_lines = 1
    for line in reader:
        num_lines += 1
        temp = []
        line = line.strip().split()
        assert len(line) == num_cols, \
               'Wrong number of columns in line %d' % num_lines
        for entry in line:
            value = float(entry)
            temp.append(value)
        matrix.append(temp)
    reader.close()

    assert len(matrix) == num_rows, \
           'Wrong number of rows'
    return matrix
```

Before we move on, let's tidy up `load` a bit. Recent versions of Python support list comprehension: instead of using a loop to operate on a list element by element, we can write an expression that does the work for us. For example, if `stuff` is a list of integers, then:

`[x**2 for x in stuff if x > 3]`

creates a list of the squares of the values in `stuff` that are greater than 3. Let's use list comprehension to shorten `load` a bit:

```
def load(filename):
    reader = open(filename, 'r')
    dims = reader.readline().strip().split()
    num_rows = int(dims[0])
    num_cols = int(dims[1])
    matrix = []
    num_lines = 1
    for line in reader:
        num_lines += 1
        temp = [float(x) for x in line.strip().split()]
        assert len(temp) == num_cols, \
               'Wrong number of columns in line %d' % num_lines
        matrix.append(temp)
    reader.close()

    assert len(matrix) == num_rows, \
           'Wrong number of rows'
    return matrix
```

List comprehensions are a bit harder to read when you first encounter them, but so are most abbreviations, and since they're becoming more widely used, it's worth getting to know them.
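If it helps, here is the loop-versus-comprehension equivalence spelled out as a runnable Python 3 snippet:

```python
stuff = [1, 4, 2, 7, 5]

# loop version
squares = []
for x in stuff:
    if x > 3:
        squares.append(x ** 2)

# comprehension version: same result in a single expression
assert [x ** 2 for x in stuff if x > 3] == squares == [16, 49, 25]
```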

## Multiple Objects

Saving metadata in the file has another benefit. Suppose our program stores its state in two matrices, not one—or in five, or in three matrices and a pair of vectors. Without metadata about their sizes, our function would have no way to know when to stop reading the current object and start reading the next, so we'd have to put each data structure in a file of its own. With metadata, it's easy—there is no ambiguity at all in a file like:

```
3 2
1.0 2.0
3.0 4.0
5.0 6.0
1 4
100.0 200.0 300.0 400.0
2 1
1000.0
2000.0
```

How should we change our functions to do this? One option is to pass a list of matrices to `save`:

```
def save(filename, stuff):
    writer = open(filename, 'w')
    for matrix in stuff:
        print >> writer, len(matrix), len(matrix[0])
        for row in matrix:
            assert len(row) == len(matrix[0]), \
                   'Rows are not the same length'
            for value in row:
                print >> writer, value,
            print >> writer
    writer.close()
```

Reading is a bit trickier, since we have to look ahead one line to find out if we're at the end of the file or not. The function below does this, and returns a list of matrices to its caller:

```
def load(filename):
    reader = open(filename, 'r')
    results = []
    num_lines = 0
    line = reader.readline()         # prime the lookahead
    while line:                      # exit loop at end of file
        num_lines += 1
        num_rows, num_cols = [int(x) for x in line.strip().split()]
        matrix = []
        for i in range(num_rows):
            line = reader.readline()
            num_lines += 1
            temp = [float(x) for x in line.strip().split()]
            assert len(temp) == num_cols, \
                   'Wrong number of columns in line %d' % num_lines
            matrix.append(temp)
        assert len(matrix) == num_rows, \
               'Wrong number of rows'
        results.append(matrix)
        line = reader.readline()     # look ahead: another matrix, or EOF?
    reader.close()
    return results
```

This "lookahead" pattern comes up all the time when reading data. We could avoid the need for it by starting the file with some more metadata, namely the number of matrices the file is supposed to contain. In fact, that seems like a good idea anyway, since it allows more error checking, so let's do that instead:

```
def save(filename, stuff):
    writer = open(filename, 'w')
    print >> writer, len(stuff)       # number of matrices in file
    for matrix in stuff:
        print >> writer, len(matrix), len(matrix[0])
        for row in matrix:
            assert len(row) == len(matrix[0]), \
                   'Rows are not the same length'
            for value in row:
                print >> writer, value,
            print >> writer
    writer.close()

def load(filename):
    reader = open(filename, 'r')
    num_matrices = int(reader.readline())
    results = []
    num_lines = 1
    for m in range(num_matrices):
        line = reader.readline()
        num_lines += 1
        num_rows, num_cols = [int(x) for x in line.strip().split()]
        matrix = []
        for i in range(num_rows):
            line = reader.readline()
            num_lines += 1
            temp = [float(x) for x in line.strip().split()]
            assert len(temp) == num_cols, \
                   'Wrong number of columns in line %d' % num_lines
            matrix.append(temp)
        assert len(matrix) == num_rows, \
               'Wrong number of rows'
        results.append(matrix)
    reader.close()
    return results
```

We may also want to modify these functions to handle vectors along with matrices, since right now they only work for two-dimensional list-of-lists structures. We'll revisit that point below.
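To make the whole count-prefixed format concrete, here is a hedged Python 3 sketch of it, round-tripped through in-memory streams rather than real files (the names and details are ours, not a fixed API):

```python
import io

def save(writer, stuff):
    # metadata first: how many matrices, then each matrix's dimensions
    writer.write('%d\n' % len(stuff))
    for matrix in stuff:
        writer.write('%d %d\n' % (len(matrix), len(matrix[0])))
        for row in matrix:
            assert len(row) == len(matrix[0]), 'Rows are not the same length'
            writer.write(' '.join(str(v) for v in row) + '\n')

def load(reader):
    results = []
    for _ in range(int(reader.readline())):
        num_rows, num_cols = [int(x) for x in reader.readline().split()]
        matrix = []
        for _ in range(num_rows):
            temp = [float(x) for x in reader.readline().split()]
            assert len(temp) == num_cols, 'Wrong number of columns'
            matrix.append(temp)
        results.append(matrix)
    return results

data = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [[100.0, 200.0, 300.0, 400.0]]]
buffer = io.StringIO()
save(buffer, data)
assert load(io.StringIO(buffer.getvalue())) == data
```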

## Design for Test

We can still improve these functions, though. If you have gone through the lecture on testing, you will have seen that it's important to design for test, i.e., to write code in a way that makes testing easier and more reliable. We can apply that idea here by having the `save` and `load` functions take open file objects as parameters instead of filenames; that way, we can use `StringIO` objects (that read from or write to strings instead of files) for testing. Let's do that:

```
def save(writer, stuff):
    print >> writer, len(stuff)
    for matrix in stuff:
        print >> writer, len(matrix), len(matrix[0])
        for row in matrix:
            assert len(row) == len(matrix[0]), \
                   'Rows are not the same length'
            for value in row:
                print >> writer, value,
            print >> writer

def load(reader):
    num_matrices = int(reader.readline())
    results = []
    num_lines = 1
    for m in range(num_matrices):
        line = reader.readline()
        num_lines += 1
        num_rows, num_cols = [int(x) for x in line.strip().split()]
        matrix = []
        for i in range(num_rows):
            line = reader.readline()
            num_lines += 1
            temp = [float(x) for x in line.strip().split()]
            assert len(temp) == num_cols, \
                   'Wrong number of columns in line %d' % num_lines
            matrix.append(temp)
        assert len(matrix) == num_rows, \
               'Wrong number of rows'
        results.append(matrix)
    return results
```

We can now test our code like this:

```
from StringIO import StringIO

Data = '''2
1 1
1.0
2 2
2.0 3.0
4.0 5.0'''

def test_one_two():
    reader = StringIO(Data)
    result = load(reader)
    assert len(result) == 2
    first = result[0]
    assert len(first) == 1
    assert len(first[0]) == 1
    assert first[0][0] == 1.0
    second = result[1]
    ...etc...
```

## Data Types

Now, what if our program's state is integers rather than floating point numbers? Or complex values? We could pick one and stick to it, but we'd really like to be able to re-use our save-and-load functions elsewhere. One option is to require the user to tell us how to do the conversion using some sort of flag. To keep things simple, we'll switch back to one matrix per file for the moment:

```
def load(reader, data_type):
    dims = reader.readline().strip().split()
    num_rows = int(dims[0])
    num_cols = int(dims[1])
    matrix = []
    num_lines = 1
    for line in reader:
        num_lines += 1

        if data_type == 'i':
            temp = [int(x) for x in line.strip().split()]
        elif data_type == 'f':
            temp = [float(x) for x in line.strip().split()]
        elif data_type == 'c':
            temp = [complex(x) for x in line.strip().split()]
        else:
            assert False, 'Unknown data type "%s"' % data_type

        assert len(temp) == num_cols, \
               'Wrong number of columns in line %d' % num_lines
        matrix.append(temp)

    assert len(matrix) == num_rows, \
           'Wrong number of rows'
    return matrix
```

Again, this works, but wouldn't meet most programmers' standards. The warning sign is the `if/elif/else`: we're using a flag to select a function, and if we decide to add another data type, we'll have to modify this function. Good functions obey the Open/Closed Principle: they are open for extension, but closed for modification. Putting it another way, it should be possible to extend a function's behavior in plausible ways without rewriting it.

How can we do that here? Simple: have the user pass in the function that's to be used to convert the strings read from the file. Remember, a function is just another kind of data, one that happens to be made up of instructions instead of characters or integers or colors:

```
def load(reader, converter):
    dims = reader.readline().strip().split()
    num_rows = int(dims[0])
    num_cols = int(dims[1])
    matrix = []
    num_lines = 1
    for line in reader:
        num_lines += 1
        temp = [converter(x) for x in line.strip().split()]
        assert len(temp) == num_cols, \
               'Wrong number of columns in line %d' % num_lines
        matrix.append(temp)

    assert len(matrix) == num_rows, \
           'Wrong number of rows'
    return matrix
```

We can then call this function using:

```
reader = open('myfile.dat', 'r')
matrix = load(reader, int)
```

or:

```
reader = open('myotherfile.dat', 'r')
matrix = load(reader, float)
```

or however else we want.
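Before moving on, the claim that a function is just another kind of data is easy to check in isolation (a Python 3 snippet):

```python
# functions can be stored in lists, passed as arguments, and called later
converters = [int, float, complex]
results = [f('2') for f in converters]
assert results == [2, 2.0, (2+0j)]
```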

## Registration

But hang on: how do we know what conversion function to pass in? Turning the question around, if we're storing the dimensions of the matrix in the file, shouldn't we also store some indicator of the data's type? If we only have to support a fixed set of types, we can store the name of the conversion function along with the dimensions, then use that to look up the function we want. For example, if our file is:

```
float 3 2
1.0 2.0
3.0 4.0
5.0 6.0
```

then our `load` function would be:

```
def load(reader):

    converters = {
        'int'     : int,
        'float'   : float,
        'complex' : complex
    }

    header = reader.readline().strip().split()
    type_name = header[0]
    num_rows = int(header[1])
    num_cols = int(header[2])
    assert type_name in converters, \
           'Unknown type name "%s"' % type_name
    converter = converters[type_name]

    matrix = []
    num_lines = 1
    for line in reader:
        num_lines += 1
        temp = [converter(x) for x in line.strip().split()]
        assert len(temp) == num_cols, \
               'Wrong number of columns in line %d' % num_lines
        matrix.append(temp)

    assert len(matrix) == num_rows, \
           'Wrong number of rows'
    return matrix
```

Looking at the change more closely:

1. The dictionary `converters` maps function names to functions.
2. When we read the metadata from the file, we pull out the conversion function's name and look up the right function.
3. Everything else works pretty much as before.

Unfortunately, we're violating the Open/Closed Principle again. It's easier to add or change a dictionary entry than to hunt through `if`'s and `elif`'s, but we would still have to modify this function's internals to handle another data type...

...unless, of course, the dictionary of conversion functions was passed into the function from outside. Let's create a Python file called `saveload.py` that defines three things:

```
Conversions = {
    'int'     : int,
    'float'   : float,
    'complex' : complex
}

def save(writer, matrix, converters):
    ...as above...

def load(reader, converters):
    ...as above...
```

If we want to add another conversion function—say, for fractions—all we do is add an entry to `Conversions` in our own program:

```
import saveload

def fraction(text):
    ...convert a string like '12/17' to a number...

saveload.Conversions['fraction'] = fraction

writer = open('myfractionfile.dat', 'w')
saveload.save(writer, matrix, saveload.Conversions)
writer.close()
```

In fact, we should add two functions: one to convert strings to values, and one to convert values to strings. We'll leave that as an exercise too.

This is a very powerful technique, which you will see used in many different contexts. The library (in this case, `saveload.py`) creates a registry of the things it knows how to do. If we want to add new behavior, all we do is add entries to that registry: we don't have to edit, or even read, the internals of the functions in the library. We can simplify this even further if we want to:

```
Conversions = {
    'int'     : int,
    'float'   : float,
    'complex' : complex
}

def save(writer, matrix, converters=Conversions):
    ...as above...

def load(reader, converters=Conversions):
    ...as above...
```

Here, we're making use of the fact that Python allows us to define default parameter values. If we call `save` and `load` without passing anything in for `converters`, Python will pass in `Conversions` on our behalf.
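Here is a compact Python 3 sketch of the registry-plus-default-parameter pattern (the `convert` helper is hypothetical, just for illustration):

```python
from fractions import Fraction

Conversions = {
    'int'   : int,
    'float' : float,
}

def convert(text, type_name, converters=Conversions):
    # look the conversion function up in the registry
    return converters[type_name](text)

assert convert('3', 'int') == 3
assert convert('2.5', 'float') == 2.5

# extend the registry without touching convert's internals
Conversions['fraction'] = Fraction
assert convert('12/17', 'fraction') == Fraction(12, 17)
```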

I would still take marks off if a student submitted the code shown here for an assignment, or bounce it if it was submitted for code review. Good file formats support comments and blank lines, and really good ones allow people to embed provenance information in the data itself, rather than hiding it in comments. As an exercise, modify `saveload.py` so that it will write and read files that look like this:

```
# Created by Monsterator 4.7
$Revision: 31$

# alpha
float 3 2
1.0 2.0
3.0 4.0
5.0 6.0

# beta
float 1 4
100.0 200.0 300.0 400.0

# gamma
int 2 1
1000
2000
```

## Mixed Types

Let's tackle a slightly different problem next. Suppose our data has a mix of types, like a table of biographical data:

```
'Newton'  'Isaac'    1642  1727
'Darwin'  'Charles'  1809  1882
'Turing'  'Alan'     1912  1954
```

We could use our matrix format:

```
3 4
Newton Isaac 1642 1727
Darwin Charles 1809 1882
Turing Alan 1912 1954
```

but look what happens when we try to store information about John Maynard Smith:

`Maynard Smith John 1920 2004`

Oops: the space in Maynard Smith's double-barrelled surname is going to break our code. We could try using a separator that's never supposed to appear in the representation of any value, like a comma:

```
4 4
Newton,Isaac,1642,1727
Darwin,Charles,1809,1882
Turing,Alan,1912,1954
Maynard Smith,John,1920,2004
```

but this isn't a general solution: as soon as we switch from people's names to (for example) quotations or paper abstracts, we need to be able to store any character that can ever appear in text, which means there isn't a "safe" separator.

We can solve this problem the same way programming languages do: by using escape sequences for characters that have special meanings. For example, we could use the two character sequence "\ " (backslash space) to indicate that a space in the data file is just a space, not a separator:

```
4 4
Newton Isaac 1642 1727
Darwin Charles 1809 1882
Turing Alan 1912 1954
Maynard\ Smith John 1920 2004
```

The code that reads lines is pretty simple:

```
for line in reader:
    fields = line.strip().split()
    temp = [fields[0]]
    for f in fields[1:]:
        if temp[-1][-1] == '\\':
            temp[-1] = temp[-1][:-1] + ' ' + f   # replace the escape with a real space
        else:
            temp.append(f)
```

This turns the line:

`'Maynard\ Smith John 1920 2004\n'`

into the list `['Maynard\\', 'Smith', 'John', '1920', '2004']`, then moves list entries from `fields` to `temp`, concatenating as it goes when it finds that a preceding entry has a trailing backslash.
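Wrapped up as a self-contained Python 3 function (the name `unescape` is ours), the logic can be tested directly:

```python
def unescape(line):
    # re-join fields whose predecessor ends with the backslash escape
    fields = line.strip().split()
    temp = [fields[0]]
    for f in fields[1:]:
        if temp[-1].endswith('\\'):
            temp[-1] = temp[-1][:-1] + ' ' + f   # the escape means "this space is data"
        else:
            temp.append(f)
    return temp

assert unescape('Maynard\\ Smith John 1920 2004\n') == \
       ['Maynard Smith', 'John', '1920', '2004']
```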

You can probably guess how we're going to handle conversions from strings to other types. Instead of having a single format name in the metadata at the top of the table, we have as many converters as there are columns in our data:

```
4 4 string string int int
Newton,Isaac,1642,1727
Darwin,Charles,1809,1882
Turing,Alan,1912,1954
Maynard Smith,John,1920,2004
```

There's a (solvable) problem with this, though: how do we know how many fields in the metadata are dimensions, and how many are types? We can find out by trying to convert fields to integers one by one until something fails, and assume that's the start of the type list, but it's simpler and more robust to either:

1. put the number of dimensions at the start of the list, followed by the sizes, followed by the fields; or
2. use some character other than space to separate the dimensions from the fields.

The first option gives us headers like this:

`2 4 4 string string int int`

while the second gives us:

`4 4 | string string int int`
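A quick Python 3 sketch of parsing the second style of header (`parse_header` is a hypothetical helper):

```python
def parse_header(line):
    # a '|' separates the dimension fields from the per-column type names
    dims, types = line.split('|')
    return [int(x) for x in dims.split()], types.split()

dims, types = parse_header('4 4 | string string int int')
assert dims == [4, 4]
assert types == ['string', 'string', 'int', 'int']
```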

## Irregular Data

The data we have looked at so far has all consisted of multiple values of the same shape, each independent of the rest. If we really want to save a program's state, though, we have to handle much more general data structures. Suppose, for example, that our program's state consists of a list of lists:

`state_0 = [[1, 2], [3, 4]]`

If it always consists of exactly one list with exactly two sublists, each of which has exactly two elements, persistence is pretty simple. But what if our program's state sometimes looks like this:

`state_1 = [[1, 2, 3], ]`

or like this:

`state_2 = [[1, 2], [3, ]]`

What we really want is a general-purpose persistence tool that can save arbitrary Python data structures to files and then read them back.

To start with, let's figure out how to handle atomic values like numbers and strings, and built-in containers like lists and dictionaries. Our first function looks like this:

```
def persist(writer, obj):
    if type(obj) in Atomic_Types:
        save_atomic(writer, obj)
    elif type(obj) in Sequence_Types:
        save_sequence(writer, obj)
    elif type(obj) in Dict_Types:
        save_dict(writer, obj)
    else:
        raise ValueError('Bad type: ' + str(type(obj)))
```

The logic is pretty simple: we use the built-in `type` function to get the type of the object we're persisting, then hand the object off to a specialized function that knows how to deal with that type. The type tests depend on the following tables:

```
from types import *

Atomic_Types   = {BooleanType, FloatType, IntType, NoneType, StringType, UnicodeType}
Sequence_Types = {ListType, TupleType}
Dict_Types     = {DictType}
```

How do we save atomic values? Simple: we write out the Python representation of the value.

```
def save_atomic(writer, obj):
    print >> writer, repr(obj)
```

`repr` is a built-in Python function that generates a string representation of its argument. The difference between it and `str` is that `repr`'s output is exactly what Python needs to parse to re-create the object in question. The main difference is in string handling: `str('a')` is just the string `'a'`, while `repr('a')` is the three-character string `"'a'"`. And yes, the double quotes are confusing: they're needed to show that the string (the stuff between the double quotes) is three characters long, and that the first and last characters are single quotes.
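In runnable Python 3 form, the `str`/`repr` difference looks like this:

```python
# str gives a human-readable form; repr gives a re-parseable one
assert str('a') == 'a'                # one character
assert repr('a') == "'a'"             # three characters, including the quotes
assert len(repr('a')) == 3
assert eval(repr('a')) == 'a'         # repr's output round-trips through eval
```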

All right, how do we save sequences (lists and tuples)? Here's the code:

```
Braces = {
    ListType  : ( '[', ']' ),
    TupleType : ( '(', ')' )
}

def save_sequence(writer, seq):
    open, close = Braces[type(seq)]
    print >> writer, open
    for element in seq:
        persist(writer, element)
    print >> writer, close
```

Again, the logic is pretty simple: we pick opening and closing braces based on the sequence's type (list or tuple), then print them out with the sequence's values in between. How do we get the values out? Simple: we just call the top-level function `persist`.

This is the point where some people balk. "How can we call `persist`? We're already calling it!" Well, yes and no: by the time we get into `save_sequence`, there will be a call to `persist` somewhere below us on the stack, but so what? Each call to a function automatically creates its own local variables; there's no danger of confusing those belonging to the next call with those belonging to the last.

Our last function, `save_dict`, looks a lot like `save_sequence`. There's only one style of bracing, and we have to save two objects for each entry (the key and the value), but those are pretty simple modifications:

```
def save_dict(writer, d):
    print >> writer, '{'
    for key in d:
        persist(writer, key)
        persist(writer, d[key])
    print >> writer, '}'
```

How are we doing so far?

```
test = {
    1           : 'two',
    (3, 'four') : [5, [6, 7]],
    'eight'     : {9 : 10}
}
persist(sys.stdout, test)
{
(
3
'four'
)
[
5
[
6
7
]
]
1
'two'
'eight'
{
9
10
}
}
```

So much for output: how do we read it back in? The answer to that question is going to reveal a problem with the format we've designed so far, so let's sketch it rather than doing a full implementation. The starting point is a `load` function:

```
def load(reader):
    line = reader.readline().strip()
    if line == '(':
        result = load_sequence(reader, (), ')')
    elif line == '[':
        result = load_sequence(reader, [], ']')
    elif line == '{':
        result = load_dict(reader)
    else:
        result = eval(line)
    return result
```

Yes, if we were doing this for real, we'd put those string constants in a lookup table, but we're just sketching... What does `load_sequence` look like?

```
def load_sequence(reader, result, close):
    line = reader.readline().strip()
    while line != close:
        val = load(reader)
        result.append(val)
        line = reader.readline().strip()
    return result
```

There are at least two things wrong with this. The first is that it doesn't work for tuples: we can't append values to an existing tuple the way we can for lists. Fine, we can fix that by always constructing a list, then converting to a tuple at the end. But there's a much bigger problem, one that isn't as easy to solve. Suppose our input is:

```
[
1
]
```

The initial call to `load` reads the opening '[' and calls `load_sequence`. That function reads the '1', decides that it isn't a closing ']', and calls `load` to read the list's contents. That second call to `load` then reads the closing ']', because the '1' has already been read by `load_sequence`. Kaboom: `load` blows up because it's expecting `load_sequence` to handle closing brackets, and the value in the list is lost.

The root of the problem is that once again we need to look ahead one element to tell if we're at the end of a list (or tuple, or dictionary). We can do this by passing the line most recently read as a parameter to every function we call:

```
def load(reader, next_line):
    if next_line == '(':
        result = load_sequence(reader, (), ')')
    elif next_line == '[':
        result = load_sequence(reader, [], ']')
    elif next_line == '{':
        result = load_dict(reader)
    else:
        result = eval(next_line)
    return result

def load_sequence(reader, result, close):
    next_line = reader.readline().strip()
    while next_line != close:
        val = load(reader, next_line)
        result.append(val)
        next_line = reader.readline().strip()
    return result
```

This code is significantly more complicated than what we started with—complicated enough that there's still at least one bug lurking in what we've written so far. (As an exercise, see if you can trace through the program's execution for nested lists and find it.)

So let's step back and re-evaluate. Our format uses human-friendly brackets and parentheses, even though these files are intended for machines to read and write. What if we switch to this:

```
dict 3
tuple 2
. 3
. "four"
list 2
. 5
list 2
. 6
. 7
. 1
. "two"
. "eight"
dict 2
. 9
. 10
```

That's a lot harder for human beings to read (quick, what's the second item in the dictionary?). But it turns out to be a lot easier for computers to read:

```
def load(reader):
    line = reader.readline().strip()
    typename, value = line.split(' ', 1)

    if typename == '.':
        result = eval(value)

    elif typename == 'list':
        count = int(value)
        result = load_list(reader, count)

    elif typename == 'tuple':
        count = int(value)
        result = tuple(load_list(reader, count))

    elif typename == 'dict':
        count = int(value)
        result = load_dict(reader, count)   # reads count key/value pairs, much like load_list

    return result

def load_list(reader, count):
    result = []
    for i in range(count):
        value = load(reader)
        result.append(value)
    return result
```

That's a lot simpler, isn't it? To load atomic values, we just evaluate their string representation. For everything else, the file itself tells us how many entries to read—there's no need to look ahead for an end-of-collection marker.
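In Python 3, the count-based scheme fits in a few lines; this sketch (under our own naming) handles only lists and atomic values, but round-trips correctly:

```python
import io

def persist(writer, obj):
    # each collection states up front how many entries follow it
    if isinstance(obj, list):
        writer.write('list %d\n' % len(obj))
        for item in obj:
            persist(writer, item)
    else:
        writer.write('. %s\n' % repr(obj))   # '.' marks an atomic value

def load(reader):
    typename, value = reader.readline().strip().split(' ', 1)
    if typename == 'list':
        return [load(reader) for _ in range(int(value))]
    return eval(value)

buffer = io.StringIO()
persist(buffer, [1, [2, 'three'], 4])
assert load(io.StringIO(buffer.getvalue())) == [1, [2, 'three'], 4]
```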

There are two lessons here. The first is that what's easy for human beings to work with is often hard for machines to handle, and vice versa. We're bad at counting but very good at interpreting context; machines find context very hard, but counting trivial. If something really has to be written and read by both, pick whatever is easy for people, and figure out how to get the computer to understand it. If people aren't ever going to view or edit data, though, then choose formatting rules to simplify programming.

The second lesson is about the design process. Experienced designers tend to work in a sawtooth pattern: they make a few high-level decisions, see what the implications are for implementation, then use what they've learned to fix design problems right away. Novices, on the other hand, tend to make some high-level decisions early on, then stick to them no matter what, which means they often spend time working around things that could be avoided entirely. Musicians talk about learning from their instruments; good programmers do the same thing.

## Programs Are Data

At this point I need to apologize, because I cheated a bit in the previous example. Let's look more closely at the first few lines of `load`:

```
def load(reader):
    line = reader.readline().strip()
    typename, value = line.split(' ', 1)

    if typename == '.':
        result = eval(value)

    ...
```

That function `eval` seems to be turning a string into—well, into pretty much anything, since this branch of the code handles all of the basic types. Let's explore a little:

```
>>> eval('1')
1
>>> eval('1 + 2.3')
3.3
>>> eval('[1, 2, 3]')
[1, 2, 3]
```

`eval` takes a string containing a legal Python expression and executes it. Equivalently, it does exactly what Python does when it reads a string from the prompt or from a file: it parses that string to produce some Python instructions in memory and then immediately runs those instructions. There's no magic here: this is all built into the Python interpreter, just like string concatenation or floating point multiplication. It only seems strange because compiled languages like C, Fortran, and Java separate the two phases of "compile the program" and "run the program", where interpreted languages like Python, MATLAB, and Ruby combine the two.

So let's try this:

```
>>> repr(1)
'1'
>>> text = repr(1)
>>> eval(text)
1
>>> repr([1, 2, 3])
'[1, 2, 3]'
>>> text = repr([1, 2, 3])
>>> eval(text)
[1, 2, 3]
```

Right: `repr` produces a string holding exactly what you have to put in a Python program to produce its argument, and `eval` evaluates such a string. So how about this as a persistence mechanism:

```
def save(writer, data):
    writer.write(repr(data))
```

It can't be that simple, can it? Let's see:

```
>>> from StringIO import StringIO
>>> data = [1, 'two', {'three' : 4}, [5, 6]]
>>> writer = StringIO()
>>> save(writer, data)
>>> text = writer.getvalue()
>>> text
"[1, 'two', {'three': 4}, [5, 6]]"
```

It seems to be saving properly; what about restoring?

```
>>> reader = StringIO(text)
>>> eval(reader.read())
[1, 'two', {'three': 4}, [5, 6]]
```

Again, there is no magic here. Python's `repr` is doing exactly what we did when we wrote our earlier `save` function: it is walking through the data structure recursively, writing out items as it finds them. `eval` is more complicated than our earlier `load`, but only because it knows how to do arithmetic and other calculations. If all `eval` had to do was load data structures, it would look a lot like our `load` function.

But there is one more wrinkle. Let's try this:

```
>>> x = []
>>> x.append(x)
```

These two innocuous lines of code create a list whose only element is the list itself. What happens if we give it to `repr`?

```
>>> repr(x)
'[[...]]'
```

That "..." is Python's way of saying, "I give up." In order to save the list, it has to save the item in the list. In order to save that item, it has to save the list, because the only item in the list is the list.

This example seems pretty artificial, so let's look at another. Let's use dictionaries to represent early kings of France:

```
>>> chlodio = {'name' : 'Chlodio the Longhair', 'from' : 428, 'to' : 448}
>>> merovech = {'name' : 'Merovech', 'from' : 448, 'to' : 457}
>>> childeric = {'name' : 'Childeric I', 'from' : 457, 'to' : 481}
```

```
>>> chlodio['heir'] = merovech
>>> merovech['heir'] = childeric
>>> childeric['heir'] = None  # was actually Clovis I, but never mind for now
```

No problem yet, so let's keep going. (This is an example of Peter's Programming Principle: software tends to expand until it breaks.)

```
>>> childeric['father'] = merovech
>>> merovech['father'] = chlodio
>>> chlodio['father'] = None  # was actually Theudemeres, but never mind that either
```

This is a perfectly reasonable structure, but once again, it's circular: in order to save `chlodio`, we have to save `merovech`, but to save `merovech`, we have to save `chlodio`. Python's `repr` function detects this and prints "..." to say, "Can't help you." Our earlier `load` function, in contrast, would just keep calling itself until it had used up all the memory set aside for the function call stack, at which point Python would fall over.

The solution? Keep track of things we have already saved, and if we encounter them again, print out a marker that means, "Already done that." Again, to keep things readable, we'll only look at the case of arbitrarily-nested lists of integers: we'll ignore floats, strings, tuples, dictionaries, and everything else for the moment.

That solution has three parts. The first simply gets the ball rolling:

```
def save(writer, thing):
    save_recursive(writer, thing, [])
```

The second part, `save_recursive`, does most of the hard work. It checks to see whether the thing we're trying to save has already been seen, i.e., if this is the second (or third, or fourth...) time we've encountered an aliased object. If it is, we write "recycle", followed by a sequence number that uniquely identifies that thing. We'll see in a minute how these sequence numbers are created, what ensures they're unique, and so on.

If we haven't already seen `thing`, then it must be either an integer or a list. If it's an integer, we write "int" and its value as we've done before. If it's a list, we write "list" and its length, followed by its elements, which are written by calling `save_recursive` for each one. Before we do that, though, we append `thing` to the end of `seen` to record the fact that we've seen it. That way, if we encounter it again, our first test—the one that looks for things we've already printed out—will say, "Yes, this one has been seen before," and we won't try printing it again.

```
def save_recursive(writer, thing, seen):
    location = look_for(thing, seen)
    if location >= 0:
        print >> writer, 'recycle', location
    elif type(thing) is IntType:
        print >> writer, 'int', thing
    elif type(thing) is ListType:
        seen.append(thing)
        print >> writer, 'list', len(thing)
        for item in thing:
            save_recursive(writer, item, seen)
    else:
        assert False, 'Whoops'
```

All that's left is to write `look_for`, the function that checks in `seen` to see if something has already been printed. If it has, we return its index in `seen` for `save_recursive` to print out (along with the word "recycle"). If we haven't seen this thing before, we return -1.

```
def look_for(thing, seen):
    result = -1
    try:
        location = seen.index(thing)
        if seen[location] is thing:  # rule out false matches
            result = location
    except ValueError:
        pass                         # nothing matches
    return result
```

We have to do two checks when looking for `thing` in `seen`. The first, `seen.index(thing)`, tells us if something with the same value as `thing` is in the list; the second, `seen[location] is thing`, checks whether that value is actually the same object in memory as what we're looking for. This is needed because of this case:

```
>>> first = [1, 2]
>>> second = [1, 2]      # same value, different object in memory
>>> outer = [first]
>>> outer.index(second)
0
>>> outer[0] is second
False
```

And that's pretty much it. We now have all the pieces we need to build functions to save and restore arbitrary chunks of data, of arbitrary types, connected in arbitrary ways. There are lots of ways to generalize it, and to make it faster: for example, by using a dictionary instead of a list to store things we've seen in order to keep search times under control. The standard Python library's `pickle` module provides a fast, general implementation of all these ideas, and there are other libraries that work across multiple languages.
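As the text says, `pickle` already implements all of this bookkeeping for us, including shared and circular structures; a quick Python 3 check:

```python
import pickle

x = []
x.append(x)                      # the circular list that defeated repr
y = pickle.loads(pickle.dumps(x))
assert y[0] is y                 # the cycle survives the round trip

# shared structure is preserved too, not duplicated
a = [1, 2]
b = [a, a]
c = pickle.loads(pickle.dumps(b))
assert c[0] is c[1]
```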