Hello and welcome to the fourth episode of the Software Carpentry lectures on handling directories and files in Python. In the previous episodes, we've seen how to explore directories and enquire about their contents. In this one we'll look more at handling directory and file paths, again using the os.path module.
We may want to build up paths from variables containing directory or file names. These variables might come from other functions, from configuration files, or from the user via a GUI or the command line. For example, here we have three variables we might use to build a file path.
We could create a new variable, path, by concatenating base, a string containing a file separator, user, another file separator string, and datadir. This would work just fine.
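As a minimal sketch of this approach: the variable names base, user and datadir come from the lecture, but the values given to them here are assumptions chosen for illustration.

```python
# Assumed example values; the lecture does not state them.
base = '/users'
user = 'vlad'
datadir = 'data'

# Build the path by hand, hard-coding a forward slash as the separator.
path = base + '/' + user + '/' + datadir
print(path)  # /users/vlad/data
```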
But the use of the file separator string isn't very clean. More seriously, it assumes we're running on Linux or Unix, which means our code isn't very portable. What if we want to run on Windows too, which uses a backslash as its file separator?
Python provides a join function in its os.path module that means we don't have to worry about file separators.
Join is one of those useful functions that takes a variable number of arguments, so we can pass it as many path components as we need.
Join picks a file separator based upon what it knows to be the current operating system.
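A short sketch of join in use, with assumed values for base, user and datadir. The separator it inserts is the one Python exposes as os.sep, so the printed result depends on where the code runs.

```python
import os
import os.path

# Assumed example values; the lecture does not state them.
base = '/users'
user = 'vlad'
datadir = 'data'

# join inserts the current operating system's separator between components.
path = os.path.join(base, user, datadir)
print(path)    # on Linux or Unix: /users/vlad/data
print(os.sep)  # the separator that join used
```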
And if we ran this on Windows, this is what we would get.
Note the backslashes in the path. They appear as double backslashes, but only because the interpreter is displaying the string with its backslashes escaped. The path itself contains single backslashes.
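To see the Windows behaviour without a Windows machine, one option is the standard library's ntpath module, which is the implementation that os.path uses on Windows and can be imported on any platform. The component values below are assumptions for illustration.

```python
import ntpath  # the Windows flavour of os.path, importable anywhere

# Assumed example values; the lecture does not state them.
path = ntpath.join('/users', 'vlad', 'data')

# repr shows the escaped form, with each backslash doubled.
print(repr(path))  # '/users\\vlad\\data'

# print shows the actual characters: single backslashes.
print(path)        # /users\vlad\data
```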
But, you might say, what about that initial forward slash? How do we handle that?
Python again comes to our rescue with its normpath function. Normpath converts paths to be consistent with the current operating system.
So for Windows it will convert forward slashes to backslashes.
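A small sketch of that conversion, again using ntpath so that the Windows behaviour can be reproduced on any platform. The mixed-separator path is an assumed example.

```python
import ntpath     # the Windows flavour of os.path
import posixpath  # the Linux and Unix flavour of os.path

# An assumed path that mixes a forward slash with backslashes.
mixed = '/users\\vlad\\data'

# The Windows normpath turns the forward slash into a backslash.
print(ntpath.normpath(mixed))     # \users\vlad\data

# The Linux and Unix normpath leaves the backslashes alone, because
# a backslash is not a separator there.
print(posixpath.normpath(mixed))  # /users\vlad\data
```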
And here's another example.
Normpath does more than just convert file separators. Take, for example, this messy-looking path. Putting this into normpath gives us...
...something far cleaner.
Normpath also removes duplicated file separators.
...and removes the dot shorthand for the current directory.
It also tries to resolve the double dot shorthand that represents parent directories.
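The three clean-ups can be sketched one at a time and then together. The paths, including the file name planets.txt, are hypothetical. The posixpath module (the Linux and Unix flavour of os.path) is used so that the output is the same on every platform.

```python
import posixpath

# Duplicated separators are collapsed.
print(posixpath.normpath('/users//vlad'))             # /users/vlad

# The dot shorthand for the current directory is dropped.
print(posixpath.normpath('/users/./vlad'))            # /users/vlad

# A double dot cancels the directory before it.
print(posixpath.normpath('/users/vlad/../vlad/data')) # /users/vlad/data

# All three in one hypothetical messy path.
messy = '/users//vlad/./data/../data/planets.txt'
clean = posixpath.normpath(messy)
print(clean)  # /users/vlad/data/planets.txt
```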
Sometimes we might have a path and want to separate its last part, for example the file name or the last directory, from everything that comes before it. Python provides the dirname and basename functions to do this.
Here is a path...
Dirname extracts the directories up to but not including the last component, in this example a file, in the path.
Basename returns the last component in the path, in this case it's a file name.
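A minimal sketch of the two functions, using a hypothetical path. posixpath is used here so the result does not depend on the platform; os.path offers the same functions.

```python
import posixpath

# Hypothetical path; the lecture does not state the one it used.
path = '/users/vlad/data/planets.txt'

# Everything up to, but not including, the last component.
print(posixpath.dirname(path))   # /users/vlad/data

# Only the last component.
print(posixpath.basename(path))  # planets.txt
```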
Split combines the behaviour of both dirname and basename and returns a pair.
The first element in the pair is the same as what dirname returns.
And the second, the same as what basename returns.
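As a sketch, the pair returned by split can be unpacked into two variables and compared with what dirname and basename give. The path is hypothetical.

```python
import posixpath

# Hypothetical path; the lecture does not state the one it used.
path = '/users/vlad/data/planets.txt'

# split returns a pair, which unpacks into two names.
head, tail = posixpath.split(path)
print(head)  # /users/vlad/data
print(tail)  # planets.txt

# The pair matches dirname and basename.
print(head == posixpath.dirname(path))   # True
print(tail == posixpath.basename(path))  # True
```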
Another similar function is splitext.
Splitext returns a pair consisting of...
All of the path up to but not including the file extension.
And, the file extension itself. If there is no file extension then this is just an empty string.
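A short sketch of splitext on two hypothetical paths, one with an extension and one without. Note that the extension it returns includes the leading dot.

```python
import posixpath

# A hypothetical path with an extension.
root, ext = posixpath.splitext('/users/vlad/data/planets.txt')
print(root)  # /users/vlad/data/planets
print(ext)   # .txt

# A hypothetical path without an extension: the second element is empty.
root2, ext2 = posixpath.splitext('/users/vlad/data')
print(root2)       # /users/vlad/data
print(repr(ext2))  # ''
```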
Splitdrive also returns a pair.
This consists of a drive name. This will be an empty string if running on Linux or Unix.
And it also returns the rest of the path.
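To compare the two cases side by side, this sketch calls the Windows and the Linux and Unix flavours of splitdrive directly, through ntpath and posixpath. The paths and the drive letter C: are assumed examples.

```python
import ntpath     # the Windows flavour of os.path
import posixpath  # the Linux and Unix flavour of os.path

# On Windows the drive is split from the rest of the path.
drive, rest = ntpath.splitdrive('C:\\users\\vlad\\data')
print(drive)  # C:
print(rest)   # \users\vlad\data

# On Linux or Unix there is no drive, so the first element is empty.
drive2, rest2 = posixpath.splitdrive('/users/vlad/data')
print(repr(drive2))  # ''
print(rest2)         # /users/vlad/data
```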
We may not know if a path is relative or absolute.
isabs is a function that checks this.
It just checks whether the path begins with a forward slash, for Linux and Unix, or a backslash, after the drive has been removed, for Windows.
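A small sketch of isabs on assumed example paths. The Windows examples include a drive letter, since the treatment of a Windows path that has a leading separator but no drive has differed between Python versions.

```python
import ntpath     # the Windows flavour of os.path
import posixpath  # the Linux and Unix flavour of os.path

# Linux and Unix: a leading forward slash makes the path absolute.
print(posixpath.isabs('/users/vlad'))  # True
print(posixpath.isabs('vlad/data'))    # False

# Windows: a drive followed by a backslash makes the path absolute.
print(ntpath.isabs('C:\\users\\vlad'))  # True
print(ntpath.isabs('vlad\\data'))       # False
```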
Abspath converts a relative path to an absolute path.
It uses the current working directory, returned by getcwd, which we saw in an earlier episode. It just adds this directory to the front of the path. Then it normalizes the path in a similar way to normpath. And let's check that it is indeed now an absolute path.
It is.
And here's another example, with more normalization needed.
This first sets the absolute path to be users vlad data dot-dot dot-dot.
But it then normalizes the dot-dot parent directory shorthand to get to users.
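A sketch of both abspath examples. The first part uses the real current working directory, so its printed path will differ from machine to machine. The second part reproduces the two steps by hand with an assumed working directory of /users/vlad, so that the result is predictable.

```python
import os
import os.path
import posixpath

# abspath puts the current working directory in front of a relative path.
relative = 'data'
absolute = os.path.abspath(relative)
print(absolute)                 # depends on where this is run
print(os.path.isabs(absolute))  # True

# The two steps by hand, with an assumed working directory.
assumed_cwd = '/users/vlad'
joined = posixpath.join(assumed_cwd, 'data/../..')
print(joined)    # /users/vlad/data/../..
stepped = posixpath.normpath(joined)
print(stepped)   # /users
```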
It is important to remember that none of these operations check whether the directories or files in the paths actually exist. They are useful, though, as they allow you to build paths for directories or files you will create later.
But it also means you need to do these checks yourself. So, remember os.path's exists function.
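As a sketch of such a check, the code below builds a path to a file that has not been created and asks exists about it. The temporary directory and the names data and planets.txt are illustrative assumptions.

```python
import os.path
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Building the path works even though nothing exists at it yet.
    planned = os.path.join(tmpdir, 'data', 'planets.txt')
    name = os.path.basename(planned)

    # exists is what actually looks at the file system.
    planned_exists = os.path.exists(planned)
    tmpdir_exists = os.path.exists(tmpdir)

print(name)            # planets.txt
print(planned_exists)  # False
print(tmpdir_exists)   # True
```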
In this episode we saw a number of useful os.path functions. Join can join path components together using the file separator of the current operating system. Normpath allows us to convert a path to be consistent with the current operating system, as well as cleaning it up and removing redundancy. Dirname gets everything in a path up to, but not including, its final directory or file. Basename gets the name of the final directory or file in a path. Split combines dirname and basename, returning both the path up to the final directory or file and that directory or file itself. Splitext allows us to get a file extension. And splitdrive allows us to get a drive name. Finally, isabs allows us to see whether a path is relative or absolute, and abspath converts a relative path to an absolute one.
Thank you for listening.