Teaching basic lab skills
for research computing

Systems Programming: Querying Directory Contents

media 001

Hello and welcome to the third episode of the Software Carpentry lectures on handling directories and files in Python. Here, we'll continue to look at how we can explore directories by looking at the ways in which Python allows us to find out more about the contents of directories.

media 002

We now know how to move around directories…

media 003

…and see what's in them.

media 004

But there are many other things we might want to find out.

media 005

We might want to check whether a file or directory already exists. This can be useful before saving a file to allow the user to decide if the file is to be overwritten.

media 006

We may have a variable and want to see whether that refers to a file or a directory.

media 007

We may want to see if two variables refer to the same file or directory.

media 008

We may want to find out what we can do with a file or directory. Are we allowed to read it, to update it, or delete it?

media 009

And, we may want to get information such as the file size, who owns it and when it was last modified.

media 010

A simple check we often want to do is to see whether a file or directory exists. Python's os.path module provides the exists function for this. It takes one argument, an absolute or relative path, and returns True if that path exists as a file or directory and False otherwise.

media 011

Let's start with a relative path to a file.

media 012

media 013

And now an absolute path.

media 014

media 015

And a relative path to a directory.

media 016

media 017

And the absolute path.

media 018

media 019

And something that does not exist.

media 020

media 021

And the absolute path to something that does not exist.

media 022

media 023
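The checks just described can be sketched as follows. The slides are not reproduced here, so this is an illustration rather than the lecture's own code: the directory name data and the file name notes.txt are hypothetical, and everything is created in a temporary directory so that the example runs anywhere.

```python
import os
import tempfile
from os.path import exists, join

# Build a throwaway directory holding one sub-directory and one file.
# The names "data" and "notes.txt" are made up for this sketch.
workdir = tempfile.mkdtemp()
os.mkdir(join(workdir, "data"))
with open(join(workdir, "data", "notes.txt"), "w") as handle:
    handle.write("hello\n")

# Relative paths are resolved against the current working directory.
os.chdir(workdir)

file_relative = exists(join("data", "notes.txt"))
file_absolute = exists(join(workdir, "data", "notes.txt"))
dir_relative = exists("data")
dir_absolute = exists(join(workdir, "data"))
missing_relative = exists("no_such_thing")
missing_absolute = exists(join(workdir, "no_such_thing"))

print(file_relative, file_absolute)        # True True
print(dir_relative, dir_absolute)          # True True
print(missing_relative, missing_absolute)  # False False
```

Note that exists does not say whether the path is a file or a directory, only that something is there.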

Now, let's look at telling apart files and directories. Python provides two functions, isfile and isdir, which check whether their argument is a path to a file or a directory. They can take relative or absolute paths. Let's import these.

media 024

Now, let's call isfile on a relative path to a file.

media 025

As expected, it returns true.

media 026

And, for a path to a directory.

media 027

This time, it returns false.

media 028

And for a path to something that does not exist.

media 029

It also returns false.

media 030

Isdir is much the same. When given the path to a directory…

media 031

…it returns true.

media 032

And for a file…

media 033

…it returns false.

media 034

And when given a nonexistent directory…

media 035

…again, it returns false.

media 036
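A minimal sketch of these six calls, again using hypothetical names in a temporary directory. Absolute paths are used here for brevity; relative paths behave the same way.

```python
import os
import tempfile
from os.path import isdir, isfile, join

# Hypothetical file and directory, created so the example is self-contained.
workdir = tempfile.mkdtemp()
file_path = join(workdir, "notes.txt")
dir_path = join(workdir, "data")
missing_path = join(workdir, "no_such_thing")
open(file_path, "w").close()
os.mkdir(dir_path)

# isfile is True only for an existing regular file.
file_checks = (isfile(file_path), isfile(dir_path), isfile(missing_path))

# isdir is True only for an existing directory.
dir_checks = (isdir(dir_path), isdir(file_path), isdir(missing_path))

print(file_checks)  # (True, False, False)
print(dir_checks)   # (True, False, False)
```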

Because isfile and isdir simply return true or false, we can use them in conditionals. Here is one example, in which we define a simple function to print whether the path it is given is…

media 037

A file.

media 038

A directory.

media 039

Or does not exist.

media 040

And here we see it running on a path to a file. This time we use an absolute path, just for a change.

media 041

media 042

And here it is with a path to a directory.

media 043

media 044

And with a path to something that does not exist.

media 045

media 046
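One way such a function might look is sketched below. The function name describe and the paths are assumptions, since the slide code is not reproduced here. This version returns the message and leaves the printing to the caller, which makes it a little easier to check.

```python
import os
import tempfile
from os.path import isdir, isfile, join

def describe(path):
    """Return a short message saying what the given path refers to."""
    if isfile(path):
        return path + " is a file"
    elif isdir(path):
        return path + " is a directory"
    else:
        # As in the lecture, anything that is neither is reported as missing.
        return path + " does not exist"

# Hypothetical paths in a temporary directory.
workdir = tempfile.mkdtemp()
file_path = join(workdir, "notes.txt")
dir_path = join(workdir, "data")
missing_path = join(workdir, "no_such_thing")
open(file_path, "w").close()
os.mkdir(dir_path)

for path in (file_path, dir_path, missing_path):
    print(describe(path))
```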

Samefile allows us to check whether two paths point to the same file or directory. This is useful when paths are held in variables. So, let's import it.

media 047

And create some variables with file paths.

media 048

If we compare file1 and file2, which contain relative and absolute paths to the same file, then…

media 049

…we get the expected result of true.

media 050

And if we compare file1 to a different path, file3, then…

media 051

…we get false.

media 052
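A sketch of this comparison, keeping the variable names file1, file2 and file3 from the narration. The file names themselves are hypothetical. One thing worth knowing is that samefile inspects the files on disk, so it raises an error if either path does not exist.

```python
import os
import tempfile
from os.path import join, samefile

# Two hypothetical files in a temporary directory.
workdir = tempfile.mkdtemp()
open(join(workdir, "notes.txt"), "w").close()
open(join(workdir, "other.txt"), "w").close()
os.chdir(workdir)

file1 = "notes.txt"                 # relative path
file2 = join(workdir, "notes.txt")  # absolute path to the same file
file3 = join(workdir, "other.txt")  # a different file

same_1_2 = samefile(file1, file2)
same_1_3 = samefile(file1, file3)

print(same_1_2)  # True
print(same_1_3)  # False
```

Comparing the strings file1 and file2 with == would give False here, which is why samefile is useful when paths are held in variables.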

Before trying to perform operations on a file, for example to open it for reading or writing, to delete it, or, for files that are executable binaries, to execute it, it can be useful to check if we are allowed to do these operations. Python's access function allows us to do these checks. So let's import access.

media 053

Access takes two arguments, the path to a file or directory and a flag that specifies what access permissions we want to check. So let's import the flags. There are four.

media 054

Let's look at an example of each in turn. F_OK allows us to check if the file or directory exists.

media 055

media 056

R_OK is for checking if we have permission to read the file or directory.

media 057

media 058

W_OK is for checking if we can edit, update, or delete it.

media 059

media 060

And, X_OK is for checking if we can execute a file.

media 061

media 062
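The four flags can be tried out as in the sketch below. The file name is hypothetical, and the permissions are set explicitly with chmod so that the results are predictable. The execute check is platform dependent: for an ordinary user on a POSIX system it should be False for this file, but Windows does not track execute permission in the same way.

```python
import os
import tempfile
from os import access, F_OK, R_OK, W_OK, X_OK

# Hypothetical file that its owner may read and write but not execute.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")
open(path, "w").close()
os.chmod(path, 0o600)

file_exists = access(path, F_OK)   # does the path exist?
can_read = access(path, R_OK)      # may we read it?
can_write = access(path, W_OK)     # may we write to it?
can_execute = access(path, X_OK)   # may we execute it? (platform dependent)
missing_exists = access(os.path.join(workdir, "no_such_thing"), F_OK)

print(file_exists, can_read, can_write)  # True True True
print(can_execute)
print(missing_exists)                    # False
```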

We can combine flags using the bitwise OR operator, the vertical bar. Access then returns true only if every permission we asked about is granted. So we can check if we can both read and write a file.

media 063

media 064

Or check if we can both read and execute a file.

media 065

media 066

Or if a file exists and we can read it and write it.

media 067

media 068
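The combined checks might look like the sketch below, using the same hypothetical file as a stand-in for the one on the slides. Because the flags are joined with the bitwise OR operator, access answers a single question: are all of the requested permissions available?

```python
import os
import tempfile
from os import access, F_OK, R_OK, W_OK, X_OK

# Hypothetical file that its owner may read and write but not execute.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")
open(path, "w").close()
os.chmod(path, 0o600)

# Each result is True only if every flag in the combination is satisfied.
read_and_write = access(path, R_OK | W_OK)
read_and_execute = access(path, R_OK | X_OK)  # platform dependent, see above
exists_read_write = access(path, F_OK | R_OK | W_OK)

print(read_and_write)     # True
print(read_and_execute)
print(exists_read_write)  # True
```

A result from access can go stale: permissions may change between the check and the operation, so code that opens a file should still be prepared for the open itself to fail.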

It can also be useful to get operating system information about files and directories.

media 069

The stat function returns a record holding various information about a file.

media 070

The information in this record can then be accessed. This includes its protection bits.

media 071

Inode number.

media 072

Device.

media 073

Number of hard links.

media 074

Owner's user ID.

media 075

Owner's group ID.

media 076

File size in bytes.

media 077

Most recent access time. The meaning is operating system dependent.

media 078

Most recent modification time. Again, operating system dependent.

media 079

The time of the most recent change to metadata, under Linux, or creation time, under Windows.
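The fields listed so far can be read from the stat record as attributes. The sketch below uses a hypothetical six-byte file. The printed values will differ from machine to machine, so no output is shown. The times are seconds since the epoch, and time.ctime is used here only to make them readable.

```python
import os
import tempfile
import time

# Hypothetical file containing exactly six bytes.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")
with open(path, "wb") as handle:
    handle.write(b"hello\n")

info = os.stat(path)

print("protection bits:", oct(info.st_mode))
print("inode number:", info.st_ino)
print("device:", info.st_dev)
print("number of hard links:", info.st_nlink)
print("owner's user ID:", info.st_uid)
print("owner's group ID:", info.st_gid)
print("size in bytes:", info.st_size)
print("last access:", time.ctime(info.st_atime))
print("last modification:", time.ctime(info.st_mtime))
print("metadata change or creation:", time.ctime(info.st_ctime))
```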

media 080

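These times may be floats or integers. You can check this by calling the stat_float_times function. Here, it says the values are integers. Note that stat_float_times exists only in older versions of Python: it was removed in Python 3.7, where these times are always floats.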

media 081

The stat record may also contain operating system-specific information.

media 082

For example, for Linux this can include the number of blocks used by the file.

media 083

And the file system block size.
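Because these extra fields are not available on every operating system, a cautious way to read them is with getattr and a default, as in this sketch. The file is hypothetical, and the variable names blocks and block_size are chosen for illustration. Python's documentation describes st_blocks as a count of 512-byte blocks and st_blksize as the preferred block size for efficient file system I/O.

```python
import os
import tempfile

# Hypothetical file to inspect.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")
with open(path, "wb") as handle:
    handle.write(b"hello\n")

info = os.stat(path)

# These attributes exist on Linux and some other Unix systems only,
# so fall back to None where they are missing.
blocks = getattr(info, "st_blocks", None)
block_size = getattr(info, "st_blksize", None)

print("blocks used:", blocks)
print("block size:", block_size)
```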

media 084

We've looked at a number of Python functions to find out more information about files and directories. From the os.path module we used exists to see if a file or directory exists, isfile and isdir to determine whether a path specifies a file or a directory, and samefile to see whether two paths point to the same file or directory. From the os module we used access to see what access permissions we have to a file or directory and determine if we can read it, write to it, delete it, or, for files, execute it. And we used stat to get low-level operating system-specific information such as file sizes, permission bits, user and group IDs, and creation and modification times.

media 085

Thank you for listening.