Find us on GitHub

Teaching basic lab skills
for research computing

Sets and Dictionaries: Examples

Sets and Dictionaries/Examples at YouTube

Hello, and welcome to the fourth episode of the Software Carpentry lecture on sets and dictionaries. In this episode, we'll show you a few examples of how you can use dictionaries in your programs.

Let's start with the question, "How early in the day did we see each kind of bird?"

Our data looks like this: the date and time of the observation, the bird's name, and an optional comment.

We want the minimum of all the times associated with each bird name.

We'll use a dictionary with the bird name as the key…

…and the earliest observation time as the value.

First, though, let's read our data file and create a list of tuples, where each tuple has the date and time as strings, and the bird name.

Our function follows the pattern we've seen many times before. We set up by opening the input file and creating an empty list that we'll append records to.

We then process each line of the file in turn. Splitting the line on the '#' character and taking the first part of the result gets rid of any comment that might be present; stripping off whitespace and then splitting breaks the remainder into fields.

To prevent trouble later on, we check that there actually are three fields before going on. An industrial-strength version of this function would also check that the date and time were properly formatted, but we'll skip that for now.

Once we've done our check, we append the triple containing the date, time, and bird name to the list we're going to return.

Here's the function that processes that list of triples.

This time, we set up by creating an empty dictionary.

Our loop handles one triple at a time. Notice here that we're splitting the triple into its component fields in the loop header: the variable date is automatically assigned the first field of the current tuple, the variable time the second, and the variable name the third.

If this bird's name is not already a key in our dictionary, this must be the first time we've seen it. The value of time in this record is therefore the earliest we've seen the bird, so we record it in a new entry in the dictionary.

Otherwise, if there's already an entry for this bird in the dictionary, we record the minimum of the stored time and the new time.

All right, that wasn't too bad. What if we want to find out what birds were seen on each day that we were observing?

This is similar to the problem we just solved, so the function we'll write has a similar structure.

However, since we probably saw more than one kind of bird each day, the values in our dictionary need to be some sort of collection. Since we're only interested in which birds we saw, we can use a set.

Here's our function.

Again, we set up by creating an empty dictionary…

…and process each record in turn, unpacking it automatically in the loop header.

Since we're recording birds by date, the keys in our dictionary are dates rather than bird names. If the current date isn't already a key in the dictionary, we add it with an empty set as the associated value.

It is then safe to add the current bird to the set associated with the current date, since the previous two lines guaranteed that there would be such a set, even if this is the first observation for the date in question.

Let's watch this function in action. We start with an empty set.

After reading the first observation, we add an entry to our dictionary with the date '2010-07-03' as its key, and an empty set as its value. We then immediately add the name 'loon' to that set, leaving the structure shown here.

Our next observation is a goose on the same day, so we put 'goose' in the set we just created.

Our third observation is another loon. Since adding a value to a set that's already present has no effect, our data structure doesn't change.

Next, though, we have the first observation for July 4th. Since it's the first time we've seen this date, our function adds a new entry to the main dictionary with '2010-07-04' as the key and a set as the value, then adds 'ostrich' to that set.

Finally, we have another damn loon, which goes into the second of our sets.

For our last example, we'll find which bird we saw least frequently.

Actually, we'll find which birds, since two or more may be tied for the low score.

One way to solve this problem is to do two passes over our data.

In the first pass, we find the minimum value in the dictionary.

In the second, we find all the keys that have that value.

It's relatively easy to combine these operations, though, so we'll do the whole thing in one pass.

We'll assume that we already have a dictionary of bird names and total observation counts.

This is the skeleton for our function: as usual, it contains some setup code and a loop. The setup code creates a set to hold the names of the birds we saw least frequently. It also creates a helper variable called number, which holds the current low score. These values are just placeholders; we'll see in a moment that they are overwritten the very first time we process any real data.

Inside the loop, we have to handle three cases.

If our result set is empty, then this must be the very first bird we've processed. In that case, we record its name, and set number to the number of times we saw this bird. This case should only execute once, since after it runs, the set called result should never be empty, and number should always be greater than zero.

In our second case, the number associated with the bird we're processing is a new minimum. When this happens, we throw away the set of birds we've seen so far and replace it with a set containing only the name of this bird, and assign the new lowest score to number.

Finally, if the count for this bird is the same as the current lowest score, we add this bird's name to the set of birds with that score. There isn't a branch of the if to handle the case where the count for the current bird is greater than the current minimum, since there's nothing for us to do in that case.

Let's watch this function run. Our input dictionary has three entries; we initialize number to 0 and result to an empty set.

Processing the first entry in the dictionary takes us into the first of our three cases: number is assigned 3, and 'loon' is put in our set of birds.

The second entry has a new minimum count, so we replace both the value of number and our set. This is case #2 from above.

Finally, the third bird's count is the same as the current minimum, so we just add it to the set.

We hope these examples give you an idea of what you can do with dictionaries, and how useful they can be. In the next episode, we'll take a look at a slightly larger example. Before watching it, please give a few of the exercises a try.