Hello, and welcome to the ninth episode of the Software Carpentry lecture on Python. In this episode, we'll take a closer look at some things you can do with functions that will allow you to do more with less code.
As we've seen in previous episodes, an integer is just a few bytes of data…
…that a variable can refer to.
And a string is just a sequence of bytes…
…that variables can also refer to.
Well, it turns out that a function is just another sequence of bytes—ones that happen to represent instructions…
…and yes, variables can refer to them too.
This insight, that code is just another kind of data and can be manipulated like integers or strings, is one of the most useful and powerful ideas in all of computing.
To understand why, let's have a closer look at what actually happens when we define a function.
These two lines of code tell Python that threshold is a function that returns one over the sum of the values in signal.
When we define it, Python translates the statements in the function into a blob of bytes, then creates a variable called threshold and makes it point at that blob.
This is not really any different from assigning the string 'alan turing' to the variable name: the only difference is what's in the memory the variable points to.
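The slide itself is not reproduced in this transcript, so here is a minimal sketch of what those two lines might look like, assuming signal is a list of numbers. The sample values are made up for illustration.

```python
# Assumed reconstruction of the two-line definition described above.
def threshold(signal):
    return 1.0 / sum(signal)

# An ordinary assignment, for comparison: name refers to a string.
name = 'alan turing'

# Both names are just variables; they differ only in what they refer to.
print(type(threshold).__name__)    # function
print(type(name).__name__)         # str
print(threshold([1.0, 1.0, 2.0]))  # 0.25
```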
Well, if threshold is just a reference to a value in memory, we should be able to assign that reference to another variable.
Here's our starting point once again.
And here's the assignment: t = threshold. As you can see, t now points to the same data as threshold: it is an alias for the function.
To prove this is so, let's try calling t. The result is exactly what we would get if we called threshold with the same parameters, because t and threshold are the same function.
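As a rough sketch of the aliasing just described, using the same assumed definition of threshold and some hypothetical sample data:

```python
def threshold(signal):
    return 1.0 / sum(signal)

# No parentheses after threshold: we copy the reference, we don't call it.
t = threshold

data = [0.1, 0.4, 0.2]  # hypothetical sample values

print(t is threshold)              # True: one function, two names
print(t(data) == threshold(data))  # True: same call, same result
```

Python's is operator checks whether two names refer to the very same object, which is a direct way to confirm that t is an alias rather than a copy.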
Here's another thought: if the function is "just" data, can we put a reference to it in a list?
Let's define two functions, area and circumference, each of which takes a circle's radius as a parameter and returns the appropriate value.
Once those functions are defined, we can put them into a list like this. Of course, what we really mean is we're copying the references to the functions stored in the variables area and circumference into the list, so that its first and second elements refer to the same blobs of instructions.
We can now loop through the functions in the list, calling each in turn.
Sure enough, the output is what we would get if we called area and then circumference with the parameter 1.0.
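Here is one way the list of functions might be written. The function bodies and the list name funcs are assumptions, since the slides are not part of this transcript.

```python
import math

def area(r):
    return math.pi * r * r

def circumference(r):
    return 2 * math.pi * r

# The list holds references to the two functions, not their results.
funcs = [area, circumference]

# Call each function in turn with the same radius.
for f in funcs:
    print(f(1.0))
```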
Let's go a little further. Instead of storing a reference to a function in a list, let's pass that reference into another function, just as we would pass a reference to an integer, a string, or a list.
Here's a function called call_it that takes two parameters: a reference to some other function, and some other value. All call_it does is call that other function with the given value as a parameter.
Let's test it with area and 1.0: that's right.
And with circumference and 1.0: right again.
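A minimal sketch of call_it, assuming the parameter names func and value and the same guessed definitions of area and circumference:

```python
import math

def area(r):
    return math.pi * r * r

def circumference(r):
    return 2 * math.pi * r

def call_it(func, value):
    # func is a reference to some other function; just call it.
    return func(value)

print(call_it(area, 1.0))           # same as area(1.0)
print(call_it(circumference, 1.0))  # same as circumference(1.0)
```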
So far, so pointless, but now it's time for the payoff: functions of functions.
Here's a function called do_all that, as its name suggests, applies some function (anything at all that takes one argument) to each value in a list, and returns a list of the results.
If we call do_all with area and a list of numbers…
…we get what we would get if we called area directly on each number in turn.
And if we define a function to "slim down" strings of text by throwing away their first and last characters…
…we can apply it to every string in a list, without having to copy the code that loops through the list, calls the function, and collects the results.
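The following sketch shows how do_all might be written and used with both functions. The name slim and the sample inputs are assumptions made for illustration.

```python
import math

def area(r):
    return math.pi * r * r

def slim(text):
    # Hypothetical name: drop the first and last characters.
    return text[1:-1]

def do_all(func, values):
    # Apply func to each value and collect the results in a new list.
    result = []
    for v in values:
        result.append(func(v))
    return result

print(do_all(area, [1.0, 2.0, 3.0]))
print(do_all(slim, ['abc', 'defgh']))  # ['b', 'efg']
```

The loop is written once, inside do_all; area and slim only have to know how to handle a single value.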
Functions that operate on other functions are called higher-order functions. They're common in mathematics: integration, for example, is a function that takes some other function and two endpoints as parameters. In programming, higher-order functions allow us to re-use control flow rather than rewriting it.
Here's another example. combine_values takes a function and a list of values as parameters, and "adds up" the values in the list using the function provided, returning a single value as its result.
To show how this works, let's define add and mul to add and multiply values.
If we combine 1, 3, and 5 with add, we get their sum, 9.
If we combine the same values with mul, we get their product, 15. This same higher-order function combine_values could concatenate lists of strings, too, or multiply several matrices together, or whatever else we wanted, without us writing the loop logic ever again.
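One possible implementation of combine_values is sketched below. It assumes the list has at least one element and uses the first element as the starting value; the version on the slides may differ.

```python
def combine_values(func, values):
    # Start with the first value, then fold in the rest one at a time.
    current = values[0]
    for v in values[1:]:
        current = func(current, v)
    return current

def add(x, y):
    return x + y

def mul(x, y):
    return x * y

print(combine_values(add, [1, 3, 5]))        # 9
print(combine_values(mul, [1, 3, 5]))        # 15
print(combine_values(add, ['a', 'b', 'c']))  # abc
```

The last call works because add uses +, which concatenates strings as well as adding numbers.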
Higher-order functions are a good thing because they let us do more with less code. If we don't use higher-order functions…
…then we have to write one function for each combination of data structure and operation, i.e., one function to add numbers, another to concatenate strings, a third to sum matrices…
…and so on.
With higher-order functions, on the other hand…
…we only write one function for each basic operation, and one function for each kind of data structure…
…and since A plus B is usually a lot smaller than A times B, this saves us coding, testing, and debugging.
Of course, we have to know something about the function our higher-order function is operating on…
…like how many arguments it takes.
But in Python and many other languages, we can even get around that.
Here's a function called add_all that sums up values using plus.
Notice the '*' in front of the parameter args. This tells Python to take all of the positional arguments passed into the function when it's called and put them together in a list-like value called a tuple, which we'll explore in more detail in a couple of lectures.
If we call add_all with no arguments, the tuple that's assigned to args has no elements, so add_all returns 0.
If we call add_all with the integers 1, 2, and 3, though, args has three elements, which add_all sums up to get 6.
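A short sketch of add_all consistent with that description; the loop body and the name total are assumptions, since the slide is not included here.

```python
def add_all(*args):
    # args is a tuple holding every positional argument passed in.
    total = 0
    for a in args:
        total += a
    return total

print(add_all())         # 0: args is the empty tuple
print(add_all(1, 2, 3))  # 6
```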
We can use this "catch-all" parameter with regular parameters, as long as the catch-all comes last.
Here, for example, is another version of combine_values.
The only difference between it and the previous version is the '*' in front of the parameter values.
This small change means that we don't have to put the values we want to combine into a list before calling the function: the first actual parameter is assigned to func as before, and everything else goes into the tuple values.
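Under the same assumptions as before about how combine_values is written, the catch-all version might look like this:

```python
def combine_values(func, *values):
    # func gets the first argument; everything else lands in the tuple values.
    current = values[0]
    for v in values[1:]:
        current = func(current, v)
    return current

def add(x, y):
    return x + y

def mul(x, y):
    return x * y

# No list needed at the call site any more.
print(combine_values(add, 1, 3, 5))  # 9
print(combine_values(mul, 1, 3, 5))  # 15
```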
As an aside before we finish this episode, what do you think combine_values will do if we only provide a function, and no values for that function to operate on?
More importantly, what do you think it should do?
Several higher-order functions are actually built in to Python. One is filter, which constructs a new list containing all the values in an original list for which some function is true.
Another is map, which applies a function to every element of a list, returning a list of results…
…and then there's reduce, which combines values using a binary function, returning a single value as a result.
For example, if positive is True when its argument is greater than or equal to 0, then filter of positive and a list of numbers returns a list of non-negative numbers.
If negate changes the sign of its argument, map of negate returns a list of negated values…
…and of course, using reduce with add returns the sum of the values in the list.
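Here is a sketch of those three calls, assuming Python 3. In Python 3, filter and map return lazy iterators rather than lists, so the example wraps them in list() to see the results, and reduce has to be imported from the functools module. The sample values are made up.

```python
from functools import reduce  # reduce is not a built-in name in Python 3

def positive(x):
    return x >= 0

def negate(x):
    return -x

def add(x, y):
    return x + y

values = [-3, -2, 0, 1, 2]  # hypothetical sample data

print(list(filter(positive, values)))  # [0, 1, 2]
print(list(map(negate, values)))       # [3, 2, 0, -1, -2]
print(reduce(add, values))             # -2
```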
It usually takes a while to get used to working with higher-order functions, but while the ideas are digesting, let's step back and ask, "What is programming anyway?"
Novices usually think that it means writing instructions for a computer.
But more experienced programmers think of it as creating and combining abstractions.
When we're programming, our goal is to spot a pattern, like "combine all elements of a list using a binary function".
And then write it down once, as clearly as possible…
…so that we can build more patterns on top of it.
Be cautious, though: the limits of human short-term memory still apply.
If you pile too many abstractions on top of one another, it can be difficult to figure out what the end result actually does.
Thank you.