Systems Programming: Directory and File Paths

Systems Programming/Directory and File Paths at YouTube
media 001

Hello and welcome to the fourth episode of the Software Carpentry lectures on handling directories and files in Python. In the previous episodes, we've seen how to explore directories and enquire about their contents. In this one we'll look more at handling directory and file paths, again using the os.path module.

media 002

We may want to build up paths from variables containing directory or file names. These variables might come from other functions, from configuration files, or from the user via a GUI or the command line. For example, here we have three variables we might use to build a file path.

media 003

We could create a new variable, path, by concatenating base, a file separator string, user, another file separator string, and datadir. This would work just fine.
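As a minimal sketch of this hand-built approach, the values below are hypothetical stand-ins for whatever the three variables hold on the slide; only the names base, user and datadir come from the lecture.

```python
# Hypothetical values; in practice these might come from a
# configuration file, another function, or the user.
base = "/users"
user = "vlad"
datadir = "data"

# Build the path by hand, hard-coding "/" as the file separator.
path = base + "/" + user + "/" + datadir
print(path)  # /users/vlad/data
```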

media 004

But hard-coding the file separator string isn't very clean. More seriously, it assumes we're running on Linux or UNIX, which means our code isn't very portable. What if we want to run on Windows too, which uses a backslash as its file separator?

media 005

Python provides a join function in its os.path module that means we don't have to worry about file separators.

media 006

Join is one of those useful functions that takes a variable number of arguments: we can pass it one path component or many.

media 008

Join picks a file separator based upon what it knows to be the current operating system.
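Here is a small sketch of join in use. The three values are hypothetical; os.sep is the standard attribute holding the current system's separator, and posixpath is the Linux/UNIX flavour of os.path, called directly here so the output is the same on any machine.

```python
import os
import os.path
import posixpath  # the Linux/UNIX implementation behind os.path

base, user, datadir = "/users", "vlad", "data"  # hypothetical values

# os.path.join inserts the separator of the system running the code.
path = os.path.join(base, user, datadir)
print(path)

# The same path assembled by hand with os.sep, for comparison.
by_hand = base + os.sep + user + os.sep + datadir

# Calling the Linux/UNIX flavour directly always uses "/".
unix_path = posixpath.join(base, user, datadir)
print(unix_path)  # /users/vlad/data
```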

media 009

And if we ran this on Windows, this is what we would get.

media 010

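Note the backslashes in the path. They appear as double backslashes, but this is only because the interpreter is displaying the string with each backslash escaped. The string itself contains single backslashes, and print shows them that way.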
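One way to check what the string really contains is to compare its escaped display form with its printed form. This sketch uses ntpath, the Windows flavour of os.path, so it behaves the same on any system; the path components are hypothetical.

```python
import ntpath  # the Windows implementation behind os.path

windows_path = ntpath.join("/users", "vlad", "data")  # hypothetical components

# repr() gives the escaped form that the interactive prompt displays.
shown = repr(windows_path)
print(shown)         # '/users\\vlad\\data'

# print() writes the characters the string actually holds.
print(windows_path)  # /users\vlad\data

# Count the real backslash characters in the string.
separators = windows_path.count("\\")
print(separators)    # 2
```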

media 011

But, you might say, what about that initial forward slash? How do we handle that?

media 012

Python again comes to our rescue with its normpath function. Normpath converts paths to be consistent with the current operating system.

media 013

So for Windows it will convert forward slashes to backslashes.
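A sketch of that conversion, again calling the Windows flavour (ntpath) directly so it can be tried anywhere. The input path is a hypothetical mixed-separator string of the kind join produced above. Note that the Linux/UNIX flavour leaves backslashes alone, because there a backslash is an ordinary file name character.

```python
import ntpath     # Windows flavour of os.path
import posixpath  # Linux/UNIX flavour of os.path

mixed = "/users\\vlad\\data"  # hypothetical path with both kinds of slash

# On Windows, normpath turns forward slashes into backslashes.
normalized = ntpath.normpath(mixed)
print(normalized)  # \users\vlad\data

# On Linux/UNIX, backslashes are not separators, so nothing changes.
unchanged = posixpath.normpath(mixed)
print(unchanged)   # /users\vlad\data
```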

media 014

And here's another example.

media 015

Normpath does more than just convert file separators. Take, for example, this messy-looking path. Putting this into normpath gives us...

media 016

...something far cleaner.

media 017

Normpath also removes duplicated file separators.

media 018

...and removes the dot shorthand for the current directory.

media 019

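It also tries to resolve the double-dot shorthand that represents parent directories. It does this purely by editing the string, without looking at the file system, so the result can point somewhere different if the path passes through a symbolic link.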
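The three clean-ups can be sketched one at a time and then together. All of the paths here are hypothetical examples rather than the ones on the slides, and posixpath is used so the results do not depend on the machine.

```python
import posixpath  # Linux/UNIX flavour of os.path

# Duplicated separators are collapsed.
no_duplicates = posixpath.normpath("/users//vlad")
print(no_duplicates)  # /users/vlad

# The "." shorthand for the current directory is dropped.
no_dot = posixpath.normpath("/users/./vlad")
print(no_dot)         # /users/vlad

# ".." removes the component before it.
no_dotdot = posixpath.normpath("/users/vlad/../vlad/data")
print(no_dotdot)      # /users/vlad/data

# All three at once on a messy path.
messy = "/users//vlad/./data/../data/./planets.txt"
clean = posixpath.normpath(messy)
print(clean)          # /users/vlad/data/planets.txt
```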

media 020

Sometimes we might have a path and want to separate its last part, for example the file name or the last directory, from the directories that lead to it. Python provides the dirname and basename functions to do this.

media 021

Here is a path...

media 022

Dirname extracts the directories up to but not including the last component, in this example a file, in the path.

media 023

Basename returns the last component in the path, in this case it's a file name.

media 024

Split combines the behaviour of both dirname and basename and returns a pair.

media 025

The first element in the pair is the same as what dirname returns.

media 026

And the second, the same as what basename returns.
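The relationship between the three functions can be sketched as follows, using a hypothetical file path and the Linux/UNIX flavour of os.path.

```python
import posixpath  # Linux/UNIX flavour of os.path

path = "/users/vlad/data/planets.txt"  # hypothetical file path

directory = posixpath.dirname(path)
print(directory)  # /users/vlad/data

name = posixpath.basename(path)
print(name)       # planets.txt

# split returns both pieces as a pair (a two-element tuple).
head, tail = posixpath.split(path)
print(head, tail)  # /users/vlad/data planets.txt
```

Joining the two halves again with join gives back the original path in this example.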

media 027

Another similar function is splitext.

media 028

Splitext returns a pair consisting of...

media 029

All of the path up to but not including the file extension.

media 030

And, the file extension itself. If there is no file extension then this is just an empty string.
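A short sketch of splitext on a few hypothetical paths. Note that the extension includes its leading dot, and that only the last extension is split off when a name has more than one.

```python
import posixpath  # Linux/UNIX flavour of os.path

# A file with an extension.
root, ext = posixpath.splitext("/users/vlad/data/planets.txt")
print(root)  # /users/vlad/data/planets
print(ext)   # .txt

# No extension: the second element is an empty string.
no_ext = posixpath.splitext("/users/vlad/data")
print(no_ext)  # ('/users/vlad/data', '')

# Only the final extension is separated.
double = posixpath.splitext("archive.tar.gz")
print(double)  # ('archive.tar', '.gz')
```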

media 031

Splitdrive also returns a pair.

media 032

This consists of a drive name. This will be an empty string if running on Linux or Unix.

media 033

And it also returns the rest of the path.
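Both cases can be sketched by calling the Windows and Linux/UNIX flavours of os.path directly. The paths are hypothetical.

```python
import ntpath     # Windows flavour of os.path
import posixpath  # Linux/UNIX flavour of os.path

# On Windows the drive is separated from the rest of the path.
drive, rest = ntpath.splitdrive("C:\\users\\vlad")
print(drive)  # C:
print(rest)   # \users\vlad

# On Linux/UNIX there are no drives, so the first element is empty.
unix_pair = posixpath.splitdrive("/users/vlad")
print(unix_pair)  # ('', '/users/vlad')
```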

media 034

We may not know if a path is relative or absolute.

media 035

isabs is a function that checks this.

media 036

It just checks whether the path begins with a forward slash, for Linux and Unix, or a backslash, after the drive has been removed, for Windows.
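A sketch of isabs on a few hypothetical paths, for both flavours of os.path. The Windows examples are limited to the clear-cut cases of a path with a drive and a plain relative path.

```python
import ntpath     # Windows flavour of os.path
import posixpath  # Linux/UNIX flavour of os.path

# Linux/UNIX: absolute paths start with a forward slash.
unix_absolute = posixpath.isabs("/users/vlad")
unix_relative = posixpath.isabs("data/planets.txt")
print(unix_absolute, unix_relative)  # True False

# Windows: a drive followed by a backslash is absolute.
windows_absolute = ntpath.isabs("C:\\users\\vlad")
windows_relative = ntpath.isabs("users\\vlad")
print(windows_absolute, windows_relative)  # True False
```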

media 037

Abspath converts a relative path to an absolute path.

media 039

It uses the current working directory, returned by getcwd, which we saw in an earlier episode. It just adds this directory to the front of the path. Then it normalizes the path in a similar way to normpath. And let's check that it is indeed now an absolute path.

media 040

It is.
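The steps just described can be sketched by comparing abspath with a hand-built equivalent. The relative path "data" is a hypothetical example, and the printed absolute path will depend on the directory the code is run from.

```python
import os
import os.path

relative = "data"  # hypothetical relative path

absolute = os.path.abspath(relative)
print(absolute)

# Roughly what abspath does: put the current working directory
# in front of the path, then normalize the result.
by_hand = os.path.normpath(os.path.join(os.getcwd(), relative))

# Check that the result really is absolute.
is_absolute = os.path.isabs(absolute)
print(is_absolute)  # True
```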

media 041

And here's another example, with more normalisation needed.

media 042

This sets the absolute path to be /users/vlad/data/../..

media 043

But it then normalizes the dot-dot parent directory shorthand to get to /users.
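The two stages can be sketched separately. This assumes, for illustration only, a working directory of /users/vlad and a relative path of data/../..; it uses join and normpath from the Linux/UNIX flavour of os.path rather than abspath itself, so the result does not depend on where it is run.

```python
import posixpath  # Linux/UNIX flavour of os.path

cwd = "/users/vlad"      # hypothetical current working directory
relative = "data/../.."  # hypothetical relative path

# Stage 1: put the working directory in front of the relative path.
joined = posixpath.join(cwd, relative)
print(joined)   # /users/vlad/data/../..

# Stage 2: normalize, resolving each ".." against the component before it.
cleaned = posixpath.normpath(joined)
print(cleaned)  # /users
```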

media 044

It is important to remember that none of these operations check whether the directories or files in the paths actually exist. They are useful, though, as they allow you to build paths for directories or files you will create later.

media 045

But it also means you need to do these checks yourself. So, remember os.path's exists function.
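A minimal sketch of building a path first and checking it afterwards. It works inside a temporary directory created with the standard tempfile module so nothing is left behind; the directory names are hypothetical.

```python
import os
import os.path
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Building the path does not create anything on disk.
    path = os.path.join(base, "vlad", "data")
    before = os.path.exists(path)

    # Create the directories only if they are not already there.
    if not before:
        os.makedirs(path)
    after = os.path.exists(path)

print(before, after)  # False True
```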

media 046

In this episode we saw a number of useful os.path functions. Join can join path components together using the file separator of the current operating system. Normpath allows us to convert a path to be consistent with the current operating system as well as cleaning it up and removing redundancy. Dirname gets everything in a path up to, but not including, its final directory or file. Basename gets the name of the final directory or file in a path. Split combines dirname and basename, returning both the path leading to the final directory or file and that directory or file itself. Splitext allows us to get a file extension. And splitdrive allows us to get a drive name. Finally, isabs allows us to see whether a path is relative or absolute, and abspath converts a relative path to an absolute one.

media 047

Thank you for listening.