
Systems Programming: Browsing Directories Using walk

This episode is also available as a video on YouTube.

Hello. In this second episode of the Software Carpentry lectures on handling directories and files in Python, we'll take a look at the walk function from Python's os module, which explores a directory and reports all the sub-directories, files, sub-sub-directories, indeed everything, within that directory.

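walk takes in a directory and produces one tuple for each directory it finds. (Strictly speaking, it returns a generator that hands these tuples over one at a time, rather than a ready-made list.) Conceptually, walk works recursively, and as recursion can be quite a complex concept if you've not encountered it before, we'll step through how walk works, which should make its output easier to understand.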

So, given this directory structure, walk would create a tuple with…

The path to the directory being walked, in this case dot, the current directory.

There would be a list of the sub-directories in that directory, in this case A, B and C. As with listdir, these are listed in no particular order.

And there would be a list of the files in the current directory. In this case there are none so the list is empty.
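To see that first tuple for real, here is a small sketch that recreates the example tree in a temporary directory and asks walk for only its first tuple using next(). The directory and file names come from the walkthrough, except for the files inside P and Q, whose names aren't given: p.txt, q1.txt and q2.txt are hypothetical stand-ins.

```python
import os
import tempfile

# Files in the example tree. The names of the files in P and Q are
# hypothetical; the walkthrough only says how many there are.
EXAMPLE_FILES = ["A/a1.txt", "A/a2.txt", "B/b.txt", "B/P/p.txt",
                 "B/Q/q1.txt", "B/Q/q2.txt", "C/c.txt"]

with tempfile.TemporaryDirectory() as root:
    for rel in EXAMPLE_FILES:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()  # create an empty file

    # next() takes just the first tuple walk produces: the top directory.
    dirpath, dirnames, filenames = next(os.walk(root))
    top_dirs = sorted(dirnames)  # sorted, since walk's order is arbitrary
    top_files = filenames

    print(dirpath == root)  # True
    print(top_dirs)         # ['A', 'B', 'C']
    print(top_files)        # []
```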

walk then recurses. That is to say, it calls itself on each sub-directory of the current directory in turn.
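The recursion can be sketched in a few lines. simple_walk below is a hypothetical, cut-down stand-in for walk, not its real implementation: it lists a directory with os.listdir, yields that directory's tuple, then calls itself on each sub-directory. Unlike os.walk, it has no error handling and it follows symbolic links to directories. The sketch then checks that it reports the same directories and files as os.walk on a small tree (file names inside P and Q are hypothetical).

```python
import os
import tempfile


def simple_walk(top):
    """Hypothetical, simplified recursive version of os.walk (top-down order)."""
    dirnames, filenames = [], []
    for name in os.listdir(top):
        # Split the entries of top into sub-directories and everything else.
        if os.path.isdir(os.path.join(top, name)):
            dirnames.append(name)
        else:
            filenames.append(name)
    # First report this directory...
    yield top, dirnames, filenames
    # ...then recurse: simple_walk calls itself on each sub-directory in turn.
    for name in dirnames:
        yield from simple_walk(os.path.join(top, name))


def normalized(tuples):
    # Sort everything so two traversals can be compared despite arbitrary order.
    return sorted((path, sorted(dirs), sorted(files)) for path, dirs, files in tuples)


with tempfile.TemporaryDirectory() as root:
    # A small tree; the file names inside B/P and B/Q are hypothetical.
    for rel in ["A/a1.txt", "B/b.txt", "B/P/p.txt", "B/Q/q1.txt", "C/c.txt"]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    mine = normalized(simple_walk(root))
    theirs = normalized(os.walk(root))

print(mine == theirs)  # True
print(len(mine))       # 6: the top directory plus A, B, P, Q and C
```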

So it calls itself on the first directory which is C.

In this case, the path to the directory is dot C.

C has no sub-directories so the directory list is empty.

And C has one file, c.txt.

As C has no sub-directories, the call to walk on C exits.

And we're back in the original call to walk. This now moves on to the next directory in the list…

…which is A.

A has no sub-directories and two files, a1.txt and a2.txt.

A has no sub-directories so the call to walk on A exits.

And again we're back in the original call to walk. This now moves on to the next directory in the list…

…which is B.

B has one file, b.txt, and two directories, P and Q.

The sub-directories of B are then "walked" in turn. So, starting with P…

P has one file and no directories.

As P has no sub-directories, we return up a level and move on to the next of B's sub-directories…

…which is Q, which has no sub-directories and two files.

As Q has no directories, we return up to B.

As we've done both P and Q, we're finished with B…

… and so we return to our original directory.

And as we've now done A, B and C, we're finished.
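Because the listing order is arbitrary, a real run may visit A, B and C in any order. A sketch, assuming the same example tree (with hypothetical names for the files in P and Q): sorting dirnames in place, which walk permits when topdown is True, fixes the order in which walk descends, so the visits come out as the top directory, A, B, P, Q, then C.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # The example tree; the names of the files in P and Q are hypothetical.
    for rel in ["A/a1.txt", "A/a2.txt", "B/b.txt", "B/P/p.txt",
                "B/Q/q1.txt", "B/Q/q2.txt", "C/c.txt"]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    visited = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place changes the very list walk uses to decide
        # where to go next (this only works with topdown=True).
        dirnames.sort()
        # Record each path relative to root, with "/" as the separator.
        visited.append(os.path.relpath(dirpath, root).replace(os.sep, "/"))

print(visited)  # ['.', 'A', 'B', 'B/P', 'B/Q', 'C']
```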

So, here's how we'd call walk in our code.

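walk doesn't actually return a ready-made list but a generator, which produces the tuples one at a time as we ask for them. We can still save it in a variable and loop over it, or pass it to list() if we want a real list.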

We know that each tuple consists of a directory path, a list of the sub-directories in that directory, and a list of its files. So we can use a for loop to print each tuple in turn.
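The lecture's code isn't reproduced in this transcript, so the following is a Python 3 sketch rather than the original listing. It walks the current directory, keeps the tuples in a variable (using list(), since walk itself hands back a generator) and prints them one at a time.

```python
from os import walk

# walk returns a generator; list() collects all of its tuples at once.
tuples = list(walk("."))

# Each item is a (directory path, sub-directory names, file names) tuple.
for t in tuples:
    print(t)
```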

And here is the result.

Remember, each tuple contains a directory…

The list of sub-directories in each directory. If there are none, this is an empty list.

And, each tuple also contains the list of files in each directory, again an empty list if there are none.

For each directory, the path begins with the directory name we gave to walk, in this case the dot.
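One practical consequence of the prefix: joining a tuple's directory path with one of its file names gives a path to that file that can be used directly. A minimal sketch, using a throwaway tree in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "A"))
    open(os.path.join(root, "A", "a1.txt"), "w").close()

    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # dirpath already starts with root, the argument given to walk,
            # so joining it with the file name yields a complete path.
            paths.append(os.path.join(dirpath, name))

    # Every such path names a real file (checked while the tree still exists).
    print(all(os.path.isfile(p) for p in paths))  # True

print(paths == [os.path.join(root, "A", "a1.txt")])  # True
```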

So, if we use walk with getcwd to get the current working directory…

And print the results.

We can see that the current working directory is the prefix.
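A short sketch of the same idea: because getcwd returns an absolute path, the directory path in every tuple is absolute as well. next() is used here to fetch just the first tuple rather than walking the whole tree.

```python
import os

top = os.getcwd()
# Passing an absolute path makes every dirpath absolute, prefixed by top.
first_dirpath, first_dirnames, first_filenames = next(os.walk(top))

print(first_dirpath == top)          # True
print(os.path.isabs(first_dirpath))  # True
```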

walk also takes an optional topdown argument, which is True by default. If we set it to False then…

…the tuples for child directories come before those of their parents…

P and Q's tuples appear before that of their parent, B.

And so on.
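To check the bottom-up order, here is a sketch using part of the example tree (the file names inside P and Q are hypothetical). With topdown=False each directory's tuple comes only after those of all its sub-directories, so B follows P and Q, and the top directory comes last. Sorting dirnames in place has no effect on the traversal in this mode, so P and Q themselves may appear in either order.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Part of the example tree; the names of the files in P and Q are hypothetical.
    for rel in ["A/a1.txt", "B/b.txt", "B/P/p.txt", "B/Q/q1.txt"]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    # Directory paths, relative to root, in the order walk reports them.
    order = [os.path.relpath(dirpath, root).replace(os.sep, "/")
             for dirpath, _, _ in os.walk(root, topdown=False)]

# Children always come before their parents; sibling order may vary.
print(order)
```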

To summarize, in this episode we saw how the walk function allows us to recursively explore a directory's contents and gather a complete list of all the directories and files beneath it.

Thank you for listening.