Python: Lists

Hello, and welcome to the fourth episode of the Software Carpentry lecture on Python. In this episode, we'll have a look at lists.

While loops let us do things many times…

…collections let us store many values together, so that we don't have to define new variables for each piece of data we want to work with.

The most popular kind of collection in Python is the list, which takes the place of arrays in languages like C and Fortran.

To create a list, just put some values in square brackets with commas in between.

To fetch the element at a location, put the index of that location in square brackets.

For example, we can create a list of the atomic symbols of the first four noble gases…

…and then print out the list element at location 1.
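The slide itself isn't reproduced in this transcript, so here is a minimal sketch of what it describes. It assumes the list is called gases (the name used later in the episode) and uses Python 3's print() function.

```python
# The atomic symbols of the first four noble gases
gases = ['He', 'Ne', 'Ar', 'Kr']

# Fetch the element at location 1
second = gases[1]
print(second)  # Ne (not He)
```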

And yes, Python indexes lists starting at 0, not at 1.

There actually was a reason for this back in the early 1970s, when the language C was invented; today, we just have to put up with it.

And just as it's an error to try to get the value of a variable that hasn't been defined, it's an error to try to access a list element that doesn't exist.

For example, if our list of noble gases has four elements, legal indices for the list are 0, 1, 2, and 3, so trying to access element 4 produces an error.
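As a rough sketch of that failure: the try/except wrapper is not part of the lecture at this point. It is an assumption added only so the example runs to the end instead of stopping at the error.

```python
gases = ['He', 'Ne', 'Ar', 'Kr']  # legal indices: 0, 1, 2, 3

error_message = None
try:
    gases[4]  # there is no element 4
except IndexError as error:
    error_message = str(error)

print(error_message)  # list index out of range
```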

If we don't know how long a list is, we can use the built-in function len to find out.

As you'd expect, it returns 4 for our list of gases.

And it returns 0 for the empty list, which is written as a pair of square brackets with nothing in between.
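A short sketch of both calls, assuming the same gases list as before (the variable name empty is made up for illustration):

```python
gases = ['He', 'Ne', 'Ar', 'Kr']
print(len(gases))  # 4

empty = []         # a pair of square brackets with nothing in between
print(len(empty))  # 0
```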

We said earlier that list indices start at 0, but in fact, some negative indices work as well.

In Python, values[-1] is the last element of the list, values[-2] is the next-to-last, and so on, counting backward from the end of the list.

For example, here's our list of gases again.

As you can see, element -1 is krypton (the last in the list), and element -4 is helium.

This notation is easier to read than the long-winded alternative…

…which means programmers are less likely to make mistakes with it.
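The slide isn't shown here, so the following is only a sketch. It assumes the "long-winded alternative" means computing the position from len, which is the usual way to reach the end of a list without negative indices.

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

# Counting backward from the end of the list
print(gases[-1], gases[-4])  # Kr He

# The same two elements, the long-winded way
print(gases[len(gases) - 1], gases[len(gases) - 4])  # Kr He
```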

Lists have two important characteristics. First, they are mutable, i.e., they can be changed after they are created.

For example, suppose we misspell the last entry in our list of gases.

We can correct our mistake by assigning to that element of the list as if it were any other variable.

Sure enough, our list has been updated in place.
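A minimal sketch of that correction. The particular misspelling ('K' instead of 'Kr') is an assumption chosen for illustration.

```python
gases = ['He', 'Ne', 'Ar', 'K']  # last entry is misspelled

# Assign to the element as if it were any other variable
gases[3] = 'Kr'

print(gases)  # ['He', 'Ne', 'Ar', 'Kr']
```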

As you probably expect by now, the location must exist before a value can be assigned to it.

If our list has four elements…

…then assigning to index 4 produces an error, because the legal indices are 0 to 3 (or -1 to -4 if we're counting from the end).
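Here is one way to see that error without halting the program. The try/except wrapper and the value 'Xe' are assumptions added for this sketch, not part of the original slide.

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

error_message = None
try:
    gases[4] = 'Xe'  # location 4 does not exist yet
except IndexError as error:
    error_message = str(error)

print(error_message)  # list assignment index out of range
print(gases)          # ['He', 'Ne', 'Ar', 'Kr'] (unchanged)
```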

The second important characteristic of lists is that they are heterogeneous, i.e., they can store values of many different types. This makes them different from arrays in C and Fortran, whose entries all have to be the same type.

Here for example, we have created two lists…

…each of which contains both a string and an integer.
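A sketch of two such lists, using the names helium and neon that appear later in the episode. Pairing each symbol with its atomic number is an assumption made for illustration; the original slide may have stored a different integer.

```python
helium = ['He', 2]   # a string and an integer in the same list
neon = ['Ne', 10]

print(helium)  # ['He', 2]
print(neon)    # ['Ne', 10]
```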

This picture shows what's in memory after the second list is created: each list stores a reference to a string, and a reference to an integer.

Lists can even store references to other lists. We can, for example, create a list gases whose two entries are references to the lists helium and neon.

There's nothing magical about this: if we update our picture of what's in memory, we simply have another two-element list that stores references to other things we've already created.
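In code, a sketch of that nesting might look like the following. The integers stored in helium and neon are assumed values, and the is check is included only to show that gases holds references to the existing lists rather than copies.

```python
helium = ['He', 2]
neon = ['Ne', 10]

# A two-element list whose entries refer to the lists above
gases = [helium, neon]

print(gases)               # [['He', 2], ['Ne', 10]]
print(gases[0] is helium)  # True: same list, not a copy
```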

Nesting data structures like this allows us to do some very powerful things. It can also be a rich source of bugs, so we will delay discussion of the details to a later episode.

Lists and loops naturally go together: we almost always use a loop of some kind to operate on all the list's elements.

For example, we can use a while loop to step through the indices of a list to get each of its elements in turn.

Here's a short program that prints the noble gases one by one.

We start the loop variable i at 0, which is the first legal list index.

Each time through the loop, we add 1 to it, so that we move through the set of legal list indices in order.

We keep going as long as i is less than the length of the list, i.e., as long as it's a legal index.

And sure enough, this loop prints out each list element in order.
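The program described above can be sketched as follows, assuming the list is named gases and using Python 3's print():

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

i = 0                   # the first legal list index
while i < len(gases):   # keep going while i is a legal index
    print(gases[i])
    i = i + 1           # move on to the next index
```

This prints He, Ne, Ar, and Kr on separate lines.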

This works, but it's tedious to type it all in time after time.

And it's all too easy to forget to increment the loop index, or to get the loop control condition wrong.

To make things simpler, Python provides a second kind of loop called a for loop that gives the program each list element in turn.

Here, for example, we do in one line (for gas in gases) what took three lines in the previous program.

As you can see, the for loop variable is assigned each element of the list in turn…

…not each index.
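A sketch of the for-loop version, using the loop variable gas and the list gases named in the narration:

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

# gas is assigned each element in turn: 'He', then 'Ne', and so on
for gas in gases:
    print(gas)
```

The output is the same four lines as before, but there is no index to initialize, test, or increment.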

Python does this because it's the most common case: most of the time that a program wants to do something with each list element, it doesn't care what that element's location is.

As we said earlier, lists are mutable: their elements can be changed in place. We can also delete elements entirely, which shortens the list.

Let's set up our noble gas list again…

…and then tell Python to delete element 0 using the del statement.

If we print gases out afterward, it only has three elements.

If we delete element 2 of this list (which is now the last element, since the list's length is 3)…

…we're left with a two-element list.

And yes, deleting an index that doesn't exist is an error.
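The whole sequence can be sketched like this. The out-of-range index 5 and the try/except wrapper are assumptions added so the example runs to completion.

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

del gases[0]
print(gases)  # ['Ne', 'Ar', 'Kr']

del gases[2]  # element 2 is now the last one
print(gases)  # ['Ne', 'Ar']

# Deleting an index that doesn't exist is an error
raised = False
try:
    del gases[5]
except IndexError:
    raised = True
print(raised)  # True
```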

We can lengthen lists, too, by appending new elements.

Let's assign an empty list to gases…

…then append the string 'He'

…and the string 'Ne'

…and finally the string 'Ar'.

Our list now has three elements.
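Those steps, as a minimal sketch in Python 3:

```python
gases = []          # start with an empty list

gases.append('He')
gases.append('Ne')
gases.append('Ar')

print(gases)        # ['He', 'Ne', 'Ar']
print(len(gases))   # 3
```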

append, which we call as gases.append(...), is an example of a method, and most operations on lists (and other things) are expressed this way.

A method is a function that "belongs to" (and usually operates on) a specific chunk of data.

If the data is stored in thing, then we call the method using the notation "thing dot methodname", passing in any arguments it takes inside parentheses.

To show you how this works, here are a few useful list methods.

Let's create the gases list again, but with 'He' duplicated at the front.

gases.count('He') tells us that 'He' occurs twice in the list.

gases.index('Ar') tells us that the index of the first occurrence of 'Ar' is 2. (Remember indexing starts at zero, so element 2 is the third element of the list.)

gases.insert takes two arguments: the index where we want to insert something, and the something we want to insert. It doesn't return any value…

…but if we print out the list after calling it, we can see that 'Ne' has been put at location 1, and everything above that has been bumped up to make room, leaving us with a list of five elements.
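The three calls can be sketched as below. The starting list is inferred from the narration ('He' twice at the front, 'Ar' at index 2, four elements in total); the final 'Kr' is an assumption, as is the variable name result.

```python
gases = ['He', 'He', 'Ar', 'Kr']

print(gases.count('He'))  # 2
print(gases.index('Ar'))  # 2

result = gases.insert(1, 'Ne')  # insert 'Ne' at location 1
print(result)  # None: insert doesn't return a value
print(gases)   # ['He', 'Ne', 'He', 'Ar', 'Kr']
```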

Here are two methods that are often used incorrectly.

Let's reset the gases list…

…and then print the result of gases.sort(). As you can see, the sort method returns None, which is the special value Python uses for "nothing here".

However, if we now print gases, it has been sorted alphabetically.

Similarly, gases.reverse() returns nothing…

…but reverses the list in place.

People often expect sort and reverse to return the sorted or reversed list, which leads to a common bug:

gases = gases.sort() does sort the list that gases refers to, but then assigns None to the variable gases, effectively throwing away the data that has just been sorted.
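Here is a sketch of both behaviors and of the bug, assuming the list is reset to the usual four symbols (the names sort_result and reverse_result are made up for illustration):

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

sort_result = gases.sort()
print(sort_result)  # None
print(gases)        # ['Ar', 'He', 'Kr', 'Ne']

reverse_result = gases.reverse()
print(reverse_result)  # None
print(gases)           # ['Ne', 'Kr', 'He', 'Ar']

# The common bug: the list is sorted, then the name is rebound to None
gases = gases.sort()
print(gases)  # None
```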

A list's index method tells us where something is in a list, but if we just want to know whether something is there or not, we can use the in operator.

Here's our list of gases again.

As expected, the expression 'He' in gases is true.

in is most often used in if statements, as in this example.

Since 'Pu' is not in the list gases, this tells us that the universe is well ordered.
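A sketch of both uses of in. The messages printed by the if statement are assumptions, worded to match the narration.

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

print('He' in gases)  # True

if 'Pu' in gases:
    print('But plutonium is not a gas!')
else:
    print('The universe is well ordered.')
```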

The last thing we will introduce in this episode is the range function, which constructs sequences of integers.

The expression range(5) produces the list of numbers from 0 to 4…

…while range(2, 6) produces 2, 3, 4, 5…

…and range(0, 10, 3) produces 0, 3, 6, 9, i.e., starts at the first argument, and goes up to but not including the second argument, using the third argument as the step size.

range(10, 0) does not produce a list in reverse order: instead, it starts at 10, and tries to go "up to" 0. Since nothing fits that description, it produces the empty list.
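The four calls above can be checked with a short sketch. One caveat: the narration describes Python 2, where range returns a list. In Python 3, range returns a lazy range object, so this sketch wraps each call in list() to show the values.

```python
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 6)))      # [2, 3, 4, 5]
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]
print(list(range(10, 0)))     # [] (nothing lies "up from" 10 toward 0)
```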

Well, if len(list) is the length of a list, and range(N) is the integers from 0 to N-1, then range(len(list)) is the integers from 0 to 1 less than the length of the list, i.e., all the legal indices of the list.

An example will make this clearer. Here's our list of gases.

Its length is 4.

So range(len(gases)), or range(4), is 0, 1, 2, and 3.

If we use range(len(gases)) in a for loop, it assigns each index of the list to the loop variable in turn…

…so we can print out (index, element) pairs one by one.
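Putting those pieces together, a minimal Python 3 sketch looks like this (list() is used only to display the indices, since range is lazy in Python 3):

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

print(len(gases))               # 4
print(list(range(len(gases))))  # [0, 1, 2, 3]

# i takes each legal index in turn
for i in range(len(gases)):
    print(i, gases[i])
```

The loop prints 0 He, 1 Ne, 2 Ar, and 3 Kr on separate lines.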

This is a very common idiom in Python for those cases where we really do want to know each element's location as well as its value.

We'll see an even better way to do it later.