Hello, and welcome to the tenth episode of the Software Carpentry lecture on Python. This episode will explain how Python libraries work, and introduce you to a couple that you may find useful.
As we saw in the previous episode, a function is a way to turn a bunch of related statements into a single "chunk" that can be re-used.
Modularizing code this way eliminates duplication…
…and makes code easier to read.
A library does for functions what functions do for statements: group them together to create more usable chunks.
This hierarchical organization is similar in spirit to that used in biology:
instead of family, genus, and species, we have library, function, and statement.
Every Python file can be used as a library by other programs.
To load it into memory, use the import statement.
For example, suppose we have created a Python file called
halman.py that defines a single function.
If we want to call this function in another file, we write
import halman to load the contents of
halman.py, and then call the function by prefixing its name with halman and a dot.
When we run
program.py, it does the right thing.
A file that has been imported into another program is called a library or module. When a module is imported, Python:
executes the statements it contains (which are usually, but not always, function definitions), and
creates an object to store references to all the items defined in that module, and assigns it to a variable with the same name as the module.
For example, let's create a file called
noisy.py that prints out a message and defines
NOISE_LEVEL to be 1/3.
When we import noisy, the first statement (the print) is executed…
…and the variable
NOISE_LEVEL is assigned a value, which we can access as noisy.NOISE_LEVEL.
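To make this concrete, here is a minimal sketch of that experiment, assuming Python 3. So that it runs as a single script, it first writes a stand-in noisy.py to a temporary directory; the message text is made up, since all that matters is that some message gets printed. It also imports the module twice to show that the statements only run the first time.

```python
import io
import os
import sys
import tempfile
from contextlib import redirect_stdout

# Write a stand-in noisy.py (the message text is hypothetical).
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "noisy.py"), "w") as handle:
    handle.write("print('is this module being loaded?')\n")
    handle.write("NOISE_LEVEL = 1 / 3\n")
sys.path.insert(0, workdir)  # let Python find the new file

# Capture whatever the module prints while it is being imported.
captured = io.StringIO()
with redirect_stdout(captured):
    import noisy  # runs the print and the assignment
    import noisy  # already loaded, so nothing runs again

print(captured.getvalue(), end="")  # the message appears exactly once
print(noisy.NOISE_LEVEL)            # 0.3333333333333333
```

In everyday use you would simply keep noisy.py next to your program; the temporary directory is only there to keep the sketch self-contained.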
One important feature of modules is that each one is a separate namespace, i.e., variable names defined inside a module belong to that module, and if the same name is used in two different modules, each module gets its own.
When Python sees a reference to a variable inside a function, it first looks in the stack frame of the current function call to find its definition.
If it can't find it there, it looks in the module the function was defined in.
If it still can't find it, it looks among Python's built-in names; it never looks in the namespace of the program that imported the module.
For example, let's create a file called
module.py that defines a variable called
NAME and a function called
func that prints it out.
In our main program, we also define a variable called NAME…
…then import our module.
When we call
module.func, it sees the
NAME variable that was defined inside the module, not the one that was defined globally. This "module first" rule makes it safe to load libraries that were written independently, without worrying about whether their authors might have used the same names for things.
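Here is a rough sketch of that experiment, assuming Python 3. The two string values are invented for illustration, and the stand-in module.py is written to a temporary directory only so that the whole thing runs as one script.

```python
import os
import sys
import tempfile

# Write a stand-in module.py that defines NAME and func.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "module.py"), "w") as handle:
    handle.write("NAME = 'defined inside module.py'\n")
    handle.write("def func():\n")
    handle.write("    print('NAME is', NAME)\n")
sys.path.insert(0, workdir)

# The main program defines its own NAME, then imports the module.
NAME = "defined in the main program"
import module

module.func()                    # prints: NAME is defined inside module.py
print("main program sees", NAME) # the main program's NAME is untouched
```

The two variables never collide: one is module.NAME, the other is the main program's NAME.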
Python comes with many standard libraries.
One of the most useful is the math library, which provides
sqrt for square roots…
hypot for calculating √(x² + y²)…
…and values for e and π that are as accurate as the machine can make them.
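A few calls are enough to get a feel for it. This short sketch only uses names that the math library really does define; the input values are arbitrary.

```python
import math

print(math.sqrt(16.0))       # 4.0
print(math.hypot(3.0, 4.0))  # 5.0, the length of the hypotenuse
print(math.e)                # Euler's number
print(math.pi)               # the ratio of circumference to diameter
```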
To help you find your way around libraries, Python provides a help function. Once
math has been imported, the call
help(math) prints out the documentation embedded in the math library.
Python also provides a few convenient alternatives for doing imports.
For example, we can import specific functions from a library and then call them directly, rather than using the library.function notation.
We can also import a function under a different name, so that if two modules define functions with the same name, we can give one or the other a different name when we want to use them together.
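Both forms look like this in practice. The alias distance is just a name picked for this sketch; any valid identifier would do.

```python
from math import sqrt                # call it as sqrt(...), no prefix needed
from math import hypot as distance   # same function, under a name we chose

root = sqrt(9.0)
length = distance(5.0, 12.0)
print(root, length)   # 3.0 13.0
```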
We can also use
import * to bring everything in the module into the current namespace at once, which has the same effect as using
from module import a,
from module import b, and so on for every name in the module.
This is almost always a bad idea, though.
If someone adds a new function or variable to the next version of the module, that
import * could silently overwrite something that you're importing from somewhere else, leading to a hard-to-find bug.
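Here is one way that kind of bug can show up, sketched with two standard libraries that both define sqrt. The scenario is contrived, but the behavior is real: the second import replaces the first sqrt without any warning.

```python
from cmath import sqrt   # complex-aware square root
first = sqrt(-1)         # works, and gives the imaginary unit
print(first)             # 1j

from math import *       # math also defines sqrt, so ours is replaced

try:
    sqrt(-1)             # now math.sqrt, which rejects negative input
    outcome = "no error"
except ValueError:
    outcome = "ValueError"
print(outcome)           # ValueError
```

Writing import math and import cmath, and then calling math.sqrt or cmath.sqrt explicitly, avoids the problem entirely.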
While the math library is useful, the
sys library is even more so.
Once it's imported…
…we can find out exactly what version of Python we're using…
…what operating system we're running on…
…and a few other things, like how large integers can be in this version.
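As a small sketch, assuming Python 3: the first two attributes report the interpreter version and the platform. For the size limit this uses sys.maxsize, which in Python 3 is the largest index a container can have, because Python 3 integers themselves have no fixed upper bound. The actual values printed depend on your machine.

```python
import sys

print(sys.version)    # full version string of the running interpreter
print(sys.platform)   # short platform name, such as 'linux' or 'win32'
print(sys.maxsize)    # largest container index on this build
```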
What may be more interesting is
sys.path, which defines the list of directories Python searches in to find modules. When a program executes
import X, Python looks at each of these directories in turn to see if it contains a file called
X.py, and loads the first one it finds. If your program isn't finding the definitions you think it should, try printing out
sys.path to see if the problem is a missing directory.
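The search described here can be approximated in a few lines. This is a simplified sketch, not Python's real import machinery, which also handles packages, compiled files, and built-in modules; the helper name first_match is made up for this example.

```python
import os
import sys

def first_match(module_name, search_path):
    """Return the first module_name.py found along search_path, or None."""
    for directory in search_path:
        # An empty entry in sys.path stands for the current directory.
        candidate = os.path.join(directory or ".", module_name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None

# Print the search path one entry per line, which is handy when debugging.
for directory in sys.path:
    print(repr(directory))

print(first_match("no_such_module_here", sys.path))
```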
The most commonly-used element of
sys is probably
sys.argv, which holds the command-line arguments of the currently-executing program.
In keeping with Unix conventions, the name of the script itself is put in
sys.argv[0]; the arguments given to the script when it was run are put in
sys.argv[1], sys.argv[2], and so on.
For example, here's a program that does nothing except print out its command-line arguments.
If it is run without any arguments, it just reports the name of the script.
When it is run with arguments, though, it displays those as well.
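A program along those lines might look like the sketch below. The helper describe and the output format are choices made for this example; putting the formatting in a function just makes it easy to try with a hand-written list.

```python
import sys

def describe(argv):
    """Return one line of text per entry in an argv-style list."""
    return ["sys.argv[{}] is {!r}".format(i, value)
            for i, value in enumerate(argv)]

# Show whatever this script was actually run with.
for line in describe(sys.argv):
    print(line)
```

Saved under a hypothetical name such as echo.py and run as python echo.py first second, it would list the script name at index 0 followed by the two arguments.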
sys also creates variables to connect programs to standard I/O channels.
sys.stdin is standard input (which is usually connected to the keyboard).
sys.stdout is standard output, which by default is connected to the screen.
sys.stderr is standard error, which is also usually connected to the screen.
For more information on what these are for, and how to use them, please see the lecture on the Unix shell.
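As a brief sketch of the usual division of labor, results go to standard output and complaints go to standard error. The function report and its messages are invented for this example; the two streams are passed in as parameters so that the function can also be pointed at other file-like objects.

```python
import sys

def report(total, out=sys.stdout, err=sys.stderr):
    """Write the result to out, and a warning to err if it looks wrong."""
    out.write("total: {}\n".format(total))
    if total < 0:
        err.write("warning: negative total\n")

report(3)    # only standard output is used
report(-1)   # the warning goes to standard error
```

If the program's output is redirected to a file, the warning still appears on the screen, because only standard output was redirected.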
Here's a typical example of how these variables are used together. This little program looks at
sys.argv to see if it was called with a filename as an argument or not.
If there were no arguments, then
sys.argv will only hold the name of the program, and its length will be 1. In that case, the program reads data from standard input.
Otherwise, the program assumes its first command-line argument is the name of an input file, opens it, and reads from it instead.
Sure enough, if we run the program with no command-line arguments, and send it the contents of the file
a.txt using redirection, it tells us that its standard input has 48 lines.
If we run it with a filename as an argument, on the other hand, it reads from that file and tells us it has 227 lines. Again, please see the lecture on the Unix shell for more information on standard input, standard output, and redirection.
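The logic of such a program can be sketched as follows, assuming Python 3. The names count_lines and run are made up here, and the demonstration at the bottom feeds in a small in-memory stand-in for standard input so that the sketch does not sit waiting for the keyboard.

```python
import io
import sys

def count_lines(reader):
    """Count the lines available from an open file-like object."""
    return len(reader.readlines())

def run(argv, stdin):
    """Count lines from stdin if no filename was given, else from the file."""
    if len(argv) == 1:
        return count_lines(stdin)
    with open(argv[1], "r") as reader:
        return count_lines(reader)

# Stand-in for redirected standard input.
fake_stdin = io.StringIO("first\nsecond\nthird\n")
print(run(["count.py"], fake_stdin))   # 3
```

In a real script the last two lines would be replaced by print(run(sys.argv, sys.stdin)).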
Here's a more polite way to write the program we just created. The two significant changes are:
the strings at the start of the module, and the start of the function
the funny-looking conditional
if __name__ == '__main__'. Let's look at them in that order.
If the first thing in a module or function other than blank lines or comments is a string that isn't assigned to anything, Python saves it as the documentation string, or docstring, for that module or function.
These docstrings are what online (and offline) help displays.
For example, let's create a file
adder.py with a single function
add, and write docstrings for both the module and the function.
If we then import adder,
help(adder) will print out all of its docstrings, i.e., the documentation for the module itself and for all of its functions.
We can also be more selective, and only display the help for a particular function instead.
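A file along the lines of adder.py might look like this sketch; the wording of both docstrings is invented. The string on the first line is the module's docstring, and the string just inside add is the function's.

```python
"""Addition utilities (a hypothetical stand-in for adder.py)."""

def add(a, b):
    """Return the sum of a and b."""
    return a + b

# Python stores the docstring on the function object itself.
print(add.__doc__)   # Return the sum of a and b.
```

Calling help(add) displays the same text with some extra formatting around it.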
The second part of our "more polite" program was that funny
if statement. The trick here is that when Python reads in a file, it assigns a value to a special top-level variable called
__name__ (with two underscores before and after).
If the file is being run as the main program,
__name__ is assigned the string
'__main__' (again with two underscores before and after).
If the file is being loaded as a module by some other program, though, Python assigns the module's name to the variable __name__ instead.
So imagine the file contains some definitions, and then the conditional statement
if __name__ == '__main__'.
The definitions will always be executed…
…but the code inside the conditional will only run if the file is the main program. Put another way, the statements inside the conditional will not be run if the file is being loaded as a library by some other program.
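In outline, such a file looks like the sketch below. The function double and the message are placeholders chosen for illustration.

```python
def double(x):
    """Defined no matter how the file is loaded."""
    return 2 * x

# True only when this file is the program being run.
ran_as_main = (__name__ == "__main__")

if __name__ == "__main__":
    # Skipped when the file is imported by another program.
    print("running directly, double(4) is", double(4))
```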
Let's see how this works. Here's a file
stats.py that defines a function
average, and then runs three simple tests—but only if
__name__ has the value '__main__'.
And here's another file,
test-stats.py, that imports
stats and runs two more tests.
If we run
stats.py directly, the three tests inside it are executed.
If we run
test-stats.py, though, those three tests aren't executed—only the two in
test-stats.py itself are run. This happens (or doesn't happen) because the variable
__name__ inside stats is assigned the string
'stats' instead of the string
'__main__' when stats is loaded as a module.
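Both situations can be reproduced from a single script, as in this sketch, which assumes Python 3. It writes a stand-in file called stats_demo.py (a made-up name, with a single made-up test) to a temporary directory, imports it, and then uses the standard runpy library to execute the same file as if it were the main program.

```python
import os
import runpy
import sys
import tempfile

source = '''
def average(values):
    return sum(values) / len(values)

if __name__ == '__main__':
    assert average([1, 2, 3]) == 2
    print('self-test ran, __name__ is', __name__)
'''

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "stats_demo.py")
with open(path, "w") as handle:
    handle.write(source)
sys.path.insert(0, workdir)

# Loaded as a module: __name__ is the module's name, so the test is skipped.
import stats_demo
print(stats_demo.__name__)   # stats_demo

# Run as the main program: __name__ is '__main__', so the test executes.
result = runpy.run_path(path, run_name="__main__")
print(result["__name__"])    # __main__
```

The function average is available either way; only the code under the conditional depends on how the file was loaded.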