Hello, and welcome to the third episode of the Software Carpentry lecture on sets and dictionaries. In this episode we'll have a look at dictionaries, which are almost as widely used as lists, and often more useful.
Let's go back to the summer we spent counting birds in northern Ontario.
Our supervisor would like to know how many birds we saw of each type.
Our input is a list of several thousand observations.
The output we want is a list of names and counts.
Using what we've learned so far, we could do this using a list of pairs, where each pair is a sublist storing a bird name and the number of times we've seen it so far.
Here's a function to add another observation to such a list.
Its first argument is the list of pairs.
Its second is the bird name being added.
Inside the function, we loop over the pairs that are already in the list.
If the first element of the pair is the same as the bird name we've been given…
…we add one to the associated count and return from the function immediately—there's no need to look at the other pairs in the list.
Otherwise, if we reach the end of the list without finding a matching name, we add a new pair to the list consisting of the bird's name and '1' (since we have now seen the bird one time).
This is an instance of a common programming pattern: either we find pre-existing data in a loop and return right away, or take some default action if we fall off the end of the loop without finding a match.
Let's take a look at this function in action. We start with an empty list.
If our first bird name is 'loon', we find no match (since the list is empty), so we add a pair to the list.
The next name is 'goose'. This doesn't match 'loon', so we add another pair.
Our third record is another loon. This matches the first entry in the list, so we add one to its count and return immediately, leaving the list as shown here.
This "list of pairs" approach works, but there's a much solution to this problem, one that is simpler and many times efficient.
That solution uses a dictionary instead of a list.
A dictionary is a unordered collection of key/value pairs.
Like the elements in a set, keys are:
and not stored in any particular order.
There are no restrictions on the values stored with those keys.
In particular, they don't have to be immutable or unique.
You can create a new dictionary by putting key, colon, value pairs inside curly braces.
For example, here we're creating a new dictionary with two entries and assigning it to the variable
birthdays. The dictionary has two keys, which are the strings
'Darwin'. The value associated with
'Newton' is 1642, while the value associated with
'Darwin' is 1809.
We can get the value associated with a key by putting the key in square brackets.
This looks just like subscripting a string or list, except dictionary keys don't have to be integers—they can be strings, tuples, and so on.
For example, the expression
birthdays['Newton'] returns the value 1642, since that's what we associated with the key
It's just like using a phonebook or a real dictionary: instead of looking things up by integer index, we can look things up by name.
If we want to add another key/value pair to a dictionary, all we have to do is assign something to it.
Here, for example, the expression
birthdays['Turing'] = 1612 adds a third pair to our dictionary with the key
'Turing' and the value 1612.
Assignment also overwrites values.
Since Alan Turing was actually born in 1912, we can fix the mistake in the previous line simply by assigning that value to the key
Keep in mind that just like the elements in a set, the keys in a dictionary are not stored in any particular order.
That's why we draw dictionaries like this: a cloud of pairs, each of which has a key and a value.
If a key isn't actually in a dictionary, trying to read its value is an error, just like trying to read off the end of a list.
For example, if we try to find Florence Nightingale's birthday with
birthdays['Nightingale'], Python reports a
If we're not sure whether a key is in a dictionary or not, we can test for it using the
Given the current state of our
'Nightingale' in birthdays is
'Darwin' in birthdays is
Finally, we can loop over the keys in a dictionary using
This is slightly different from looping over a list: when we loop over a dictionary, we get the keys, which we can use to look up the values. When we loop over a list, we get its values right away, since its "keys" are just the integers 0, 1, and so on, which aren't usually very interesting.
Here, for example, the
for loop assigns each key of the
birthdays dictionary to the variable
name in turn. Inside the loop, we use
name to look up the value associated with that key, and print both out.
Now, let's go back and count those birds.
Here's the main body of our program.
The first three lines read the input file into a list of strings.
We then call a function
count_names to turn that list into a dictionary of bird names and counts…
…then use a loop like the one we just saw to print out the dictionary's contents.
Here's the function that does the counting.
As always, we start with a docstring to explain the function's purpose to whoever has to read it next.
We get set up by creating an empty dictionary to fill with data…
…then loop over the strings in the input list one by one.
After stripping off any leading or trailing whitespace as usual…
…we check to see if we've seen this bird name before.
If we have, we add one to the associated count.
If it's the first time we've seen this name, though, we add a new entry to the dictionary, using the bird name as the key and '1' as the value.
Finally, we return the dictionary we've created.
Let's watch our counting function in action.
Before we read any data, our dictionary is empty.
After we see 'loon' for the first time, our dictionary has one entry. The key is
'loon', and the value is 1.
When we see 'goose', we add another entry to the dictionary.
And when we see 'loon' for the second time, we add one to its count.
So why use a dictionary rather than a list of pairs? Because it's much more efficient. Just like sets, dictionaries are stored using hash tables, which guarantee that finding or modifying values takes roughly constant time. This is a lot better than the list-based method, where the time grows in proportion to the number of pairs in the list.
In our next episode, we'll look at some other examples of dictionaries in action.