Python: Libraries

Hello, and welcome to the tenth episode of the Software Carpentry lecture on Python. This episode will explain how Python libraries work, and introduce you to a couple that you may find useful.

As we saw in the previous episode, a function is a way to turn a bunch of related statements into a single "chunk" that can be re-used.

Modularizing code this way eliminates duplication…

…and makes code easier to read.

A library does for functions what functions do for statements: group them together to create more usable chunks.

This hierarchical organization is similar in spirit to that used in biology:

instead of family, genus, and species, we have library, function, and statement.

Every Python file can be used as a library by other programs.

To load it into memory, use the import statement.

For example, suppose we have created a Python file called halman.py that defines a single function called threshold.

If we want to call this function in another file, we write import halman to load the contents of halman.py, and then call the function as halman.threshold.

When we run program.py, it does the right thing.
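
The slides are not part of this transcript, so here is a minimal sketch of what the two files might look like. The body of threshold and the sample readings are assumptions made for illustration. The sketch writes halman.py to a temporary directory and adds that directory to the module search path only so that it can run as a single script; normally halman.py would simply sit next to program.py.

```python
import sys
import tempfile
from pathlib import Path

# Assumed contents of halman.py; the lecture only says it defines threshold.
HALMAN_SOURCE = '''
def threshold(signal):
    return 1.0 / sum(signal)
'''

# Create halman.py in a scratch directory and make that directory importable.
workdir = Path(tempfile.mkdtemp())
(workdir / "halman.py").write_text(HALMAN_SOURCE)
sys.path.insert(0, str(workdir))

# From here on, this is what program.py would contain.
import halman

readings = [0.1, 0.4, 0.2]  # hypothetical data
result = halman.threshold(readings)
print(result)
```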

A file that has been imported into another program is called a library or module. When a module is imported, Python:

executes the statements it contains (which are usually, but not always, function definitions), and

creates an object to store references to all the items defined in that module, and assigns it to a variable with the same name as the module.

For example, let's create a file called noisy.py that prints out a message and defines NOISE_LEVEL to be 1/3.

When we import noisy, the first statement—the print—is executed, displaying a message on the screen…

…and the variable NOISE_LEVEL is assigned a value, which we can access as noisy.NOISE_LEVEL.
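
A sketch of that experiment, assuming Python 3 (where 1 / 3 is a floating-point value) and an invented message text. As before, the file is written to a temporary directory only to keep the example self-contained.

```python
import sys
import tempfile
from pathlib import Path

# Assumed contents of noisy.py: one print statement and one assignment.
NOISY_SOURCE = '''
print("noisy.py is being loaded")
NOISE_LEVEL = 1 / 3
'''

workdir = Path(tempfile.mkdtemp())
(workdir / "noisy.py").write_text(NOISY_SOURCE)
sys.path.insert(0, str(workdir))

# Importing runs every statement in noisy.py, so the message appears here.
import noisy

# The name noisy now refers to a module object holding what the file defined.
print(type(noisy))
print(noisy.NOISE_LEVEL)
```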

One important feature of modules is that each one is a separate namespace, i.e., variable names defined inside a module belong to that module, and if the same name is used in two different modules, each module gets its own.

When Python sees a reference to a variable, it looks in the current function call stack frame to find its definition.

If it can't find it there, it looks in the module the function was defined in (assuming it was defined in a library).

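If it still can't find it, it looks in Python's built-in namespace, which holds names such as len and print that every program can use. It does not look in the namespace of the top-level program that imported the module.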

For example, let's create a file called module.py that defines a variable called NAME and a function called func that prints it out.

In our main program, we also define a variable called NAME…

…then import our module.

When we call module.func, it sees the NAME variable that was defined inside the module, not the one that was defined globally. This "module first" rule makes it safe to load libraries that were written independently, without worrying about whether their authors might have used the same names for things.
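
The following sketch reproduces that experiment. The two string values are placeholders, and func returns the text it prints so that the result is easy to check; both are assumptions, since the transcript does not show the slide.

```python
import sys
import tempfile
from pathlib import Path

# Assumed contents of module.py.
MODULE_SOURCE = '''
NAME = "Transylvania"

def func(arg):
    # NAME is looked up in module.py's own namespace.
    message = NAME + " " + arg
    print(message)
    return message
'''

workdir = Path(tempfile.mkdtemp())
(workdir / "module.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(workdir))

# The main program defines its own NAME, then imports the module.
NAME = "Hamunaptra"
import module

seen = module.func("!!!")   # uses module.NAME, not the NAME defined above
print(NAME)                 # the main program's NAME is untouched
```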

Python comes with many standard libraries.

One of the most useful is the math library…

…which defines sqrt for square roots…

…hypot for calculating √(x² + y²)…

…and values for e and π that are as accurate as the machine can make them.
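
A short illustration of those names; the input values are arbitrary.

```python
import math

print(math.sqrt(2.0))        # square root of 2
print(math.hypot(3.0, 4.0))  # 5.0, the hypotenuse of a 3-4-5 triangle
print(math.e, math.pi)       # the constants, to machine precision
```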

To help you find your way around libraries, Python provides a help function.

If math has been imported, the call help(math) prints out the documentation embedded in the math library.

Python also provides a few convenient alternatives for doing imports.

For example, we can import specific functions from a library and then call them directly, rather than using the modulename.functionname syntax.

We can also import a function under a different name, so that if two modules define functions with the same name, we can give one or the other a different name when we want to use them together.
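
Both forms are sketched here using the math library. The alias euclid is an arbitrary name chosen for this example.

```python
# Import one function and call it without the module prefix.
from math import sqrt

# Import another function under a different, hypothetical name.
from math import hypot as euclid

print(sqrt(9.0))          # 3.0
print(euclid(3.0, 4.0))   # 5.0
```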

We can also use import * to bring everything in the module into the current namespace at once, which has the same effect as using from module import a, from module import b, and so on for every name in the module.

This is almost always a bad idea, though.

If someone adds a new function or variable to the next version of the module, that import * could silently overwrite something that you're importing from somewhere else, leading to a hard-to-find bug.
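
A small sketch of this kind of collision. Here the name being clobbered is one we define ourselves rather than one imported from another library, but the mechanism is the same; the variable e and its value are assumptions for illustration.

```python
# Our own constant: the elementary charge in coulombs (approximate).
e = 1.602e-19
print(e)

# The wildcard import brings in every public name from math, including
# math.e, which silently replaces the value assigned above.
from math import *
print(e)   # now Euler's number, with no warning
```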

While the math library is useful, the sys library is even more so.

Once it's imported…

…we can find out exactly what version of Python we're using…

…what operating system we're running on…

…and a few other things, like how large integers can be in this version.
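
The sketch below assumes Python 3. In that version ordinary integers have no fixed upper limit, so sys.maxsize is used here as the closest available counterpart: it is the largest value a container index can take on this machine.

```python
import sys

print(sys.version)    # full version string of the running interpreter
print(sys.platform)   # short identifier for the operating system
print(sys.maxsize)    # largest container index on this build
```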

What may be more interesting is sys.path, which defines the list of directories Python searches in to find modules. When a program executes import X, Python looks at each of these directories in turn to see if it contains a file called X.py, and loads the first one it finds. If your program isn't finding the definitions you think it should, try printing out sys.path to see if the problem is a missing directory.
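
Printing the search path takes only a few lines. The directories listed will differ from one machine to another.

```python
import sys

# Directories are searched in the order shown.
for directory in sys.path:
    print(directory)
```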

The most commonly used element of sys is probably sys.argv, which holds the command-line arguments of the currently executing program.

In keeping with Unix conventions, the name of the script itself is put in sys.argv[0]; all the arguments given to the script when it was run are put in sys.argv[1], sys.argv[2], and so on.

For example, here's a program that does nothing except print out its command-line arguments.

If it is run without any arguments, it just reports that sys.argv[0] is echo.py.

When it is run with arguments, though, it displays those as well.
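
One way echo.py might be written is sketched below. The helper describe_arguments is a hypothetical name introduced so the formatting can be checked separately from sys.argv; the original slide may well have used a plain loop.

```python
import sys

def describe_arguments(argv):
    """Return one line per argument: its index, then the argument in quotes."""
    return [f'{index} "{argument}"' for index, argument in enumerate(argv)]

# Show whatever this script was actually run with.
for line in describe_arguments(sys.argv):
    print(line)
```

Run as python echo.py first second, this would list echo.py at index 0, followed by first and second at indices 1 and 2.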

sys also creates variables to connect programs to standard I/O channels. sys.stdin is standard input (which is usually connected to the keyboard).

sys.stdout is standard output, which by default is connected to the screen.

And sys.stderr is standard error, which is also usually connected to the screen.
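
As a brief sketch, assuming Python 3, a program can write to the two output channels like this. The message texts are made up.

```python
import sys

# Normal results go to standard output.
sys.stdout.write("the answer is 42\n")

# Diagnostics go to standard error, so they can be redirected separately.
print("warning: input file was empty", file=sys.stderr)
```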

For more information on what these are for, and how to use them, please see the lecture on the Unix shell.

Here's a typical example of how these variables are used together. This little program looks at sys.argv to see if it was called with a filename as an argument or not.

If there were no arguments, then sys.argv will only hold the name of the program, and its length will be 1. In that case, the program reads data from standard input.

Otherwise, the program assumes its first command-line argument is the name of an input file, opens it, and reads from it instead.

Sure enough, if we run the program with no command-line arguments, and send it the contents of the file a.txt using redirection, it tells us that its standard input has 48 lines.

If we run it with a filename as an argument, on the other hand, it reads from that file and tells us it has 227 lines. Again, please see the lecture on the Unix shell for more information on standard input, standard output, and redirection.
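
The logic can be sketched as follows. To keep the example runnable without a terminal or an input file, the argument list and the input stream are passed in as parameters and simulated with io.StringIO; the function name, the script name count.py, and the sample text are assumptions. A real script would call count_lines(sys.argv, sys.stdin) after importing sys.

```python
import io

def count_lines(argv, stdin):
    """Count lines in the file named by argv[1], or in stdin if no file is given."""
    if len(argv) == 1:
        # Only the script name is present, so read from standard input.
        return len(stdin.readlines())
    # Otherwise treat the first argument as the name of the input file.
    with open(argv[1], "r") as reader:
        return len(reader.readlines())

# Simulated run with no arguments and three lines on standard input.
fake_stdin = io.StringIO("one\ntwo\nthree\n")
print(count_lines(["count.py"], fake_stdin))   # 3
```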

Here's a more polite way to write the program we just created. The two significant changes are:

the strings at the start of the module, and the start of the function count_lines, and

the funny-looking conditional if __name__ == '__main__'. Let's look at them in that order.

If the first thing in a module or function other than blank lines or comments is a string that isn't assigned to anything, Python saves it as the documentation string, or docstring, for that module or function.

These docstrings are what online (and offline) help displays.

For example, let's create a file adder.py with a single function add, and write docstrings for both the module and the function.

If we then import adder, help(adder) will print out all of its docstrings, i.e., the documentation for the module itself and for all of its functions.

We can also be more selective, and only display the help for a particular function instead.
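
Here is a sketch of adder.py and the two help calls. The wording of the docstrings is invented, and the file is written to a temporary directory only so the example runs as one script.

```python
import sys
import tempfile
from pathlib import Path

# Assumed contents of adder.py: a module docstring and a function docstring.
ADDER_SOURCE = '''"""Addition utilities."""

def add(a, b):
    """Add two values and return the result."""
    return a + b
'''

workdir = Path(tempfile.mkdtemp())
(workdir / "adder.py").write_text(ADDER_SOURCE)
sys.path.insert(0, str(workdir))

import adder

help(adder)       # documentation for the module and everything in it
help(adder.add)   # documentation for the add function only
```

The raw strings are also available directly as adder.__doc__ and adder.add.__doc__.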

The second part of our "more polite" program was that funny if statement. The trick here is that when Python reads in a file, it assigns a value to a special top-level variable called __name__ (with two underscores before and after).

If the file is being run as the main program, __name__ is assigned the string '__main__' (again with two underscores before and after).

If the file is being loaded as a module by some other program, though, Python assigns the module's name to the variable __name__ instead.

So imagine the file contains some definitions, and then the conditional statement if __name__ == '__main__'.

The definitions will always be executed…

…but the code inside the conditional will only run if the file is the main program. Put another way, the statements inside the conditional will not be run if the file is being loaded as a library by some other program.

Let's see how this works. Here's a file stats.py that defines a function average, and then runs three simple tests—but only if __name__ has the value '__main__'.

And here's another file, test-stats.py, that imports stats and runs two more tests.

If we run stats.py directly, the three tests inside it are executed.

If we run test-stats.py, though, those three tests aren't executed—only the two in test-stats.py itself are run. This happens (or doesn't happen) because the variable __name__ inside stats is assigned the string 'stats' instead of the string '__main__' when stats is loaded as a module.
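
The difference can be reproduced in a single script, sketched below. The body of average and its three tests are assumptions. The script first imports stats, playing the role of test-stats.py, and then uses the standard library function runpy.run_path to execute the same file as if it had been run directly.

```python
import runpy
import sys
import tempfile
from pathlib import Path

# Assumed contents of stats.py: one function and three guarded tests.
STATS_SOURCE = '''
def average(values):
    return sum(values) / len(values)

if __name__ == "__main__":
    print("running the tests in stats.py")
    assert average([1.0]) == 1.0
    assert average([1.0, 2.0]) == 1.5
    assert average([2.0, 2.0, 2.0]) == 2.0
'''

workdir = Path(tempfile.mkdtemp())
stats_path = workdir / "stats.py"
stats_path.write_text(STATS_SOURCE)
sys.path.insert(0, str(workdir))

# Loaded as a module: __name__ is "stats", so the guarded tests are skipped.
import stats
print(stats.__name__)
assert stats.average([3.0, 5.0]) == 4.0   # a test of our own

# Executed as the main program: __name__ is "__main__", so the tests run.
main_globals = runpy.run_path(str(stats_path), run_name="__main__")
print(main_globals["__name__"])
```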

Thank you.