Python: First Class Functions

Hello, and welcome to the ninth episode of the Software Carpentry lecture on Python. In this episode, we'll take a closer look at some things you can do with functions that will allow you to do more with less code.

As we've seen in previous episodes, an integer is just a few bytes of data…

…that a variable can refer to.

And a string is just a sequence of bytes…

…that variables can also refer to.

Well, it turns out that a function is just another sequence of bytes—ones that happen to represent instructions…

…and yes, variables can refer to them too.

This insight—the fact that code is just another kind of data, and can be manipulated like integers or strings—is one of the most useful and powerful in all computing.

To understand why, let's have a closer look at what actually happens when we define a function.

These two lines of code tell Python that threshold is a function that returns one over the sum of the values in signal.

When we define it, Python translates the statements in the function into a blob of bytes, then creates a variable called threshold and makes it point at that blob.

This is not really any different from assigning the string 'alan turing' to the variable name: the only difference is what's in the memory the variable points to.
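The slide itself isn't reproduced in this transcript, so here is a minimal sketch of what that two-line definition might look like. It assumes signal is a list of numbers whose sum is not zero; the sample values are made up for illustration.

```python
def threshold(signal):
    # One over the sum of the values in signal (assumes the sum is not zero).
    return 1.0 / sum(signal)

# The name threshold is an ordinary variable that refers to a function object.
print(type(threshold).__name__)
print(threshold([1, 3]))   # 1.0 / 4 -> 0.25
```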

Well, if threshold is just a reference to a value in memory, we should be able to assign that reference to another variable.

Here's our starting point once again.

And here's the assignment: t = threshold. As you can see, t now points to the same data as threshold: it is an alias for the function.

To prove this is so, let's try calling t. The result is exactly what we would get if we called threshold with the same parameters, because t and threshold are the same function.
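As a sketch of the aliasing just described (the body of threshold is an assumption based on the earlier description, and the sample values are made up):

```python
def threshold(signal):
    # Assumed body: one over the sum of the values in signal.
    return 1.0 / sum(signal)

# No parentheses after threshold: we copy the reference, we don't call it.
t = threshold

print(t is threshold)   # True: both names refer to the same function object
print(t([1, 3]))        # 0.25, the same as threshold([1, 3])
```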

Here's another thought: if the function is "just" data, can we put a reference to it in a list?

Let's define two functions, area and circumference, each of which takes a circle's radius as a parameter and returns the appropriate value.

Once those functions are defined, we can put them into a list like this. Of course, what we really mean is we're copying the references to the functions stored in the variables area and circumference into the list, so that its first and second elements refer to the same blobs of instructions.

We can now loop through the functions in the list, calling each in turn.

Sure enough, the output is what we would get if we called area and then circumference with the parameter 1.0.
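Here's roughly what that might look like. The function bodies are assumptions based on the usual formulas for a circle, and the list name funcs is made up for this sketch.

```python
import math

def area(r):
    # Area of a circle of radius r.
    return math.pi * r * r

def circumference(r):
    # Circumference of a circle of radius r.
    return 2 * math.pi * r

# The list holds references to the two functions, not their results.
funcs = [area, circumference]

# Call each function in turn with the same argument.
for f in funcs:
    print(f(1.0))
```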

Let's go a little further. Instead of storing a reference to a function in a list, let's pass that reference into another function, just as we would pass a reference to an integer, a string, or a list.

Here's a function called call_it that takes two parameters: a reference to some other function, and some other value. All call_it does is call that other function with the given value as a parameter.

Let's test it with area and 1.0: that's right.

And with circumference and 1.0: right again.
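A minimal sketch of call_it, with the parameter names func and value chosen for illustration and area and circumference assumed to use the usual circle formulas:

```python
import math

def area(r):
    return math.pi * r * r

def circumference(r):
    return 2 * math.pi * r

def call_it(func, value):
    # Call whatever function we were handed, passing along the given value.
    return func(value)

print(call_it(area, 1.0))
print(call_it(circumference, 1.0))
```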

So far, so pointless, but now it's time for the payoff: functions of functions.

Here's a function called do_all that, as its name suggests, applies some function—anything at all that takes one argument—to each value in a list, and returns a list of the results.

If we call do_all with area and a list of numbers…

…we get what we would get if we called area directly on each number in turn.

And if we define a function to "slim down" strings of text by throwing away their first and last characters…

…we can apply it to every string in a list, without having to copy the code that loops through the list, calls the function, and collects the results.
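One way do_all and the string-slimming function might be written is sketched below. The name slim and the sample inputs are assumptions made for this example.

```python
import math

def area(r):
    return math.pi * r * r

def slim(text):
    # Throw away the first and last characters of a string.
    return text[1:-1]

def do_all(func, values):
    # Apply func to each value in turn and gather the results in a new list.
    result = []
    for v in values:
        result.append(func(v))
    return result

print(do_all(area, [1.0, 2.0, 3.0]))
print(do_all(slim, ['abc', 'defgh']))   # ['b', 'efg']
```

The loop lives in do_all and nowhere else: area and slim know nothing about lists.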

Functions that operate on other functions are called higher-order functions. They're common in mathematics: integration, for example, is a function that takes some other function and two endpoints as parameters. In programming, higher-order functions allow us to re-use control flow rather than rewriting it.

Here's another example. combine_values takes a function and a list of values as parameters, and "adds up" the values in the list using the function provided, returning a single value as its result.

To show how this works, let's define add and mul to add and multiply values.

If we combine 1, 3, and 5 with add, we get their sum, 9.

If we combine the same values with mul, we get their product, 15. This same higher-order function combine_values could concatenate lists of strings, too, or multiply several matrices together, or whatever else we wanted, without us writing the loop logic ever again.
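A sketch of how combine_values might be written follows. It assumes the list has at least one value and starts from the first one; the slide's actual implementation isn't shown in this transcript and may differ.

```python
def combine_values(func, values):
    # Start with the first value, then fold in the rest one at a time.
    # Assumes values is not empty.
    current = values[0]
    for v in values[1:]:
        current = func(current, v)
    return current

def add(x, y):
    return x + y

def mul(x, y):
    return x * y

print(combine_values(add, [1, 3, 5]))         # 9
print(combine_values(mul, [1, 3, 5]))         # 15
print(combine_values(add, ['a', 'b', 'c']))   # abc
```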

Higher-order functions are a good thing because they let us do more with less code. If we don't use higher-order functions…

…then we have to write one function for each combination of data structure and operation, i.e., one function to add numbers, another to concatenate strings, a third to sum matrices…

…and so on.

With higher-order functions, on the other hand…

…we only write one function for each basic operation, and one function for each kind of data structure…

…and since A plus B is usually a lot smaller than A times B, where A is the number of operations and B is the number of kinds of data structure, this saves us coding, testing, and debugging.

Of course, we have to know something about the function our higher-order function is operating on…

…like how many arguments it takes.

But in Python and many other languages, we can even get around that.

Here's a function called add_all that sums up values using plus.

Notice the '*' in front of the parameter args. This tells Python to take all of the parameters passed into the function when it's called and put them together in a special type of list called a tuple, which we'll explore in more detail in a couple of lectures.

If we call add_all with no arguments, the tuple that's assigned to args has no elements, so add_all returns 0.

If we call add_all with the integers 1, 2, and 3, though, args has three elements, which add_all sums up to get 6.
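A minimal sketch of add_all, assuming it starts its running total at 0:

```python
def add_all(*args):
    # args is a tuple holding every argument passed in the call.
    total = 0
    for a in args:
        total += a
    return total

print(add_all())          # 0: args is the empty tuple
print(add_all(1, 2, 3))   # 6: args is (1, 2, 3)
```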

We can use this "catch-all" parameter with regular parameters, as long as the catch-all comes last.

Here, for example, is another version of combine_values.

The only difference between it and the previous version is the '*' in front of the parameter values.

This small change means that we don't have to put the values we want to combine into a list before calling the function: the first actual parameter is assigned to func as before, and everything else goes into the tuple values.
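Here's a sketch of that second version. The body is an assumption carried over from the description of the first version, and the calls shown pass at least one value.

```python
def combine_values(func, *values):
    # func gets the first argument; everything else lands in the tuple values.
    current = values[0]
    for v in values[1:]:
        current = func(current, v)
    return current

def add(x, y):
    return x + y

def mul(x, y):
    return x * y

# No list needed at the call site any more.
print(combine_values(add, 1, 3, 5))   # 9
print(combine_values(mul, 1, 3, 5))   # 15
```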

As an aside before we finish this episode, what do you think combine_values will do if we only provide a function, and no values for that function to operate on?

More importantly, what do you think it should do?

Several higher-order functions are actually built in to Python. One is filter, which constructs a new list containing all the values in an original list for which some function is true.

Another is map, which applies a function to every element of a list, returning a list of results…

…and then there's reduce, which combines values using a binary function, returning a single value as a result.

For example, if positive is True when its argument is greater than or equal to 0, then filter of positive and a list of numbers returns a list of non-negative numbers.

If negate changes the sign of its argument, map of negate returns a list of negated values…

…and of course, using reduce with add returns the sum of the values in the list.
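The three can be tried out with the sketch below, written for Python 3. There, filter and map return lazy iterators rather than lists, so the example wraps them in list(), and reduce has to be imported from the functools module. The sample list is made up for illustration.

```python
from functools import reduce

def positive(x):
    # True when x is greater than or equal to 0.
    return x >= 0

def negate(x):
    return -x

def add(x, y):
    return x + y

values = [-3, -2, 0, 1, 2]

print(list(filter(positive, values)))   # [0, 1, 2]
print(list(map(negate, values)))        # [3, 2, 0, -1, -2]
print(reduce(add, values))              # -2
```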

It usually takes a while to get used to working with higher-order functions, but while the ideas are digesting, let's step back and ask, "What is programming anyway?"

Novices usually think that it means writing instructions for a computer.

But more experienced programmers think of it as creating and combining abstractions.

When we're programming, our goal is to spot a pattern, like "combine all elements of a list using a binary function".

And then write it down once, as clearly as possible…

…so that we can build more patterns on top of it.

Be cautious, though: the limits of human short-term memory still apply.

If you pile too many abstractions on top of one another, it can be difficult to figure out what the end result actually does.

Thank you.