

The Unix Shell: Files and Directories

The Unix Shell/Files and Directories at YouTube

Hello, and welcome to the second episode of the Software Carpentry lecture on the Unix shell. In this episode, we'll have a look at how files and directories are organized, and how to navigate around them.

As we said in the last episode, a computer has four main jobs: run programs, store data, communicate with other computers, and interact with us.

One way the computer can interact with us is through a command shell: we type commands, the shell tells the computer to run programs on our behalf, and then the shell shows us the output from those programs.

Some of the commands we will use most often are ones related to storing data on disk.

The subsystem responsible for this is called the file system.

It organizes our data into files, which hold information…

…and directories, which hold files or other directories.

In the next few minutes, we'll see how we can use the shell to view what's in the file system.

Or to be more precise, how we can use the shell to run other programs that will show us what's in the file system.

Let's start by logging in to the computer.

Here, we're showing the shell's prompt in bold, and explanatory text (like this message) in blue.

We type our user ID (we'll show user input in green).

And then our password. Most systems will print stars to obscure it, or nothing at all, in case some evildoer is shoulder surfing behind us.

Once we have logged in, we'll see a shell prompt, which is usually just a dollar sign (but which may show extra information, like our user ID).

The shell prompt is exactly like Python's >>> prompt: it signals that the shell is waiting for us to type something in.

Type "whoami", followed by "enter". This command prints out the ID of the current user, i.e., shows us who the shell thinks we are.

When we enter it, the shell finds a program called whoami

…runs it…

…displays its output…

…and then displays a new prompt, telling us that it's ready for more commands.
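If you want to try this yourself, the command is just its name. What it prints depends on the account you are logged in as, so on your machine it will almost certainly not be vlad:

```shell
# Ask the shell to run whoami, which prints the user ID of the current account
whoami
```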

Now that we know who we are, we can find out where we are using pwd, which stands for "print working directory".

This is our current default directory, i.e., the directory the computer assumes we want to use unless we specify something else explicitly.
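The same goes for pwd. In this minimal sketch the path it prints will be your own working directory rather than Vlad's, but since it is an absolute path it always starts with a slash:

```shell
# Print the working directory; it is always an absolute path, so it starts with "/"
pwd
```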

The computer's response is /users/vlad. To understand what this means, let's have a look at how the file system as a whole is organized.

At the very top of the file system is a directory called the root directory that holds everything else the computer is storing.

When we want to refer to it, we just use a slash character /.

This is the leading slash in /users/vlad.

Inside that directory (or underneath it, if you're drawing a tree) are several other directories, such as bin (which is where some built-in programs are stored)…

data

users (where users' personal directories are located)…

tmp (for temporary files that don't need to be stored long-term), and so on.
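To see what the top level looks like on your own machine, you can list the root directory directly. The names vary from system to system (this sketch only assumes a typical setup where common directories such as tmp exist), so don't expect exactly the directories described here:

```shell
# List the contents of the root directory "/"; directories get a trailing "/"
ls -F /
```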

We know that our current working directory, /users/vlad, is stored inside /users because /users is the first part of its name. Similarly, we know that /users is stored inside the root directory / because its name begins with /.

Underneath /users, we find one directory for each user with an account on this machine. The mummy's files are stored in /users/imhotep, the Wolfman's in /users/larry

…and ours in /users/vlad

…which is why vlad is the last part of the directory's name.

Notice, by the way, that there are two meanings for the / character. When it appears at the front of a file or directory name, it refers to the root directory. When it appears inside a name, it's just a separator.
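The dirname and basename utilities make this "parent part, last part" reading of a path concrete: dirname drops the last name in a path, and basename keeps only that name. They work purely on the text of the path, so this sketch runs fine even though /users/vlad almost certainly doesn't exist on your machine:

```shell
# dirname strips the last component of a path; basename keeps only the last one
dirname /users/vlad    # prints /users
dirname /users         # prints /  (the parent of /users is the root directory)
basename /users/vlad   # prints vlad
```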

Let's see what's inside Vlad's home directory by running ls, which stands for "listing".

It's not a particularly memorable name, but as we'll see, many others are unfortunately even more cryptic.

ls prints the names of all the files and directories in the current directory in alphabetical order, arranged neatly into columns.

To make its output more comprehensible, we can run ls -F, giving ls the argument (or flag) -F.

This tells ls to add a trailing / to the names of directories. As you can see, there are seven of these. The names without slashes—notes.txt, pizza.cfg, and solar.pdf—are plain old files.

Here's that output again, with a picture of what it's showing us.
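To reproduce something like Vlad's listing without touching your own files, this sketch builds a small imitation in a temporary directory. Only data, notes.txt, pizza.cfg, and solar.pdf come from the example; thesis is a hypothetical stand-in for the other directories. When ls's output goes to a pipe or a file rather than the screen, it prints one name per line instead of columns.

```shell
# Scratch directory standing in for /users/vlad ("thesis" is a hypothetical name)
demo=$(mktemp -d)
cd "$demo"
mkdir data thesis
touch notes.txt pizza.cfg solar.pdf

ls      # names only
ls -F   # directories get a trailing "/": data/ and thesis/
```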

You may have noticed that the files' names are all something dot something. By convention, the second part, called the filename extension, indicates what type of data the file holds.

.txt signals a plain text file, .pdf indicates a PDF document, .cfg is a configuration file full of parameters for some program or other, and so on.

However, this is only a convention, and not a guarantee. Files contain bytes, nothing more; it's up to us and our programs to interpret those bytes according to the rules for PDF documents, images, and so on.
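A quick way to convince yourself that an extension is only part of a name is to put plain text into a file whose name claims it is a PDF. This is a sketch in a scratch directory, and fake.pdf is a hypothetical name:

```shell
# The ".pdf" in the name does not change what bytes the file holds
demo=$(mktemp -d)
cd "$demo"
printf 'This is not really a PDF\n' > fake.pdf
cat fake.pdf   # prints the plain text we wrote
```

Nothing stops us from creating the file, and cat happily prints it. A PDF viewer, on the other hand, would most likely report an error, because the bytes don't follow the rules for PDF documents.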

Now let's run the command ls -F data, which tells ls to give us a listing of what's in our data directory.

The output shows us that there are four text files and two directories. This hierarchical organization helps us keep our work organized.

Notice while we're here how we spelled the directory name data. Since it doesn't begin with a slash, it's a relative path

…i.e., it's interpreted relative to the current working directory.

If we run ls -F /data, we get a different answer…

…because /data is an absolute path.

The leading / tells the computer to follow the path from the root of the filesystem…

…so it always refers to exactly one directory, no matter where we are when we run the command.
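The sketch below recreates /users/vlad/data under a temporary directory (so its absolute paths begin with that directory rather than with /users) and shows that a relative path depends on where we are, while an absolute one doesn't. morse.txt is a hypothetical file name:

```shell
# Build a scratch copy of /users/vlad/data so we don't touch real paths
demo=$(mktemp -d)
mkdir -p "$demo/users/vlad/data"
touch "$demo/users/vlad/data/morse.txt"
cd "$demo/users/vlad"

ls -F data                      # relative: resolved against the working directory
ls -F "$demo/users/vlad/data"   # absolute: same answer from anywhere

cd /
ls -F "$demo/users/vlad/data"   # still works after moving to the root
ls -F data                      # now means /data: a different directory, or an error if none exists
```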

What if we want to change our current working directory? pwd shows us that we're still "in" /users/vlad

…and ls without any arguments shows us its contents.

We can use cd followed by a directory name to change our working directory.

cd stands for "change directory"…

…which is a bit misleading: the command doesn't change the directory…

…it changes the shell's idea of what directory we are in.

cd doesn't print anything, but if we run pwd after it, we can see that we are now "in" /users/vlad/data.

If we run ls without arguments now, it lists the contents of /users/vlad/data

…because that's where we now are.
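Here is the same sequence as a script you can run, again using a temporary stand-in for /users/vlad:

```shell
# Scratch tree standing in for /users/vlad/data
demo=$(mktemp -d)
mkdir -p "$demo/users/vlad/data"
cd "$demo/users/vlad"
pwd          # .../users/vlad
cd data      # prints nothing...
pwd          # ...but the working directory is now .../users/vlad/data
```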

OK, we can go down the directory tree: how do we go up? If we're still in /users/vlad/data

…we can use cd .. to go up one level.

.. is a special directory name meaning "the directory containing this one".

Or more succinctly, the parent of the current directory.

Sure enough, if we run pwd after running cd .., we're back in /users/vlad.
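And going back up, in the same kind of scratch tree. The last two commands go slightly beyond what we've shown so far: .. can be combined with other names in a path, so ../.. means the parent of the parent:

```shell
# Scratch tree standing in for /users/vlad/data
demo=$(mktemp -d)
mkdir -p "$demo/users/vlad/data"
cd "$demo/users/vlad/data"
cd ..        # ".." is the parent of the current directory
pwd          # back in .../users/vlad
cd ../..     # two levels up: from .../users/vlad to the scratch directory itself
pwd
```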

The special directory .. doesn't usually show up when we run ls.

If we add the -a flag, though, it will be displayed.

-a stands for "show all".

It forces ls to show us file and directory names that begin with ., such as ..

(which, if we're in /users/vlad, points to the /users directory)

and also another special directory that's just called ., which is the directory we're currently in. It may seem redundant to have a name for where we are, but we'll see some uses for it in later episodes.
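A sketch of the difference -a makes, in a scratch directory containing a hypothetical file called .hidden. Because its name starts with ., plain ls normally skips it, along with . and ..:

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir data
touch notes.txt .hidden   # ".hidden" is a hypothetical dot-file
ls        # shows data and notes.txt only
ls -a     # also shows ".", "..", and .hidden
cd .      # "." is the current directory, so this changes nothing
pwd
```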

Everything we have seen so far works on Unix and its descendants, such as Linux and Mac OS X. Things are a bit different on Windows.

Here's a typical directory path on a Windows 7 machine.

The first part, C:, is a drive letter. This notation dates back to the days of floppy drives…

…and even today, each drive is a completely separate filesystem.

Instead of a forward slash, Windows uses a backslash to separate the names in a path.

This causes headaches because the Unix shell uses a backslash to escape special characters. For example, to put a space in a filename, you would type a backslash followed by a space. Please don't ever do this, though: if you put spaces, question marks, and other special characters in filenames on Unix, you're likely to confuse the shell and a lot of other tools.
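If you do have to work with such a name (for example, one created on another system), the shell needs to be told that the space is part of the name. This sketch, in a throwaway directory with hypothetical file names, shows the backslash escape and, for comparison, quoting the whole name, which has the same effect:

```shell
demo=$(mktemp -d)
cd "$demo"
touch my\ notes.txt       # backslash-escaped space: one argument, "my notes.txt"
touch "other notes.txt"   # quoting the whole name does the same
ls                        # two files, each with a space in its name
```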

Finally, Windows filenames and directory names are case insensitive: upper and lower case letters mean the same thing.

This means that the path name C:\Users\Vlad could be spelled in 1024 different ways, because each of its ten letters can be either upper or lower case. Some people argue that this is more natural (after all, "VLAD" in all upper case and "Vlad" spelled normally refer to the same person), but it does cause some headaches for programmers, and it can be hard to understand for people whose first language doesn't use a cased alphabet.
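The count comes from treating each letter as an independent upper-or-lower choice, so the number of spellings is 2 raised to the number of letters. A quick check in the shell:

```shell
# Count the letters in the path, then compute 2^letters with a bit shift
winpath='C:\Users\Vlad'
letters=$(( $(printf '%s' "$winpath" | tr -cd '[:alpha:]' | wc -c) ))
echo "$letters letters -> $(( 1 << letters )) spellings"   # 10 letters -> 1024 spellings
```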

The Cygwin package tries to make Windows paths look more like Unix paths by allowing us to refer to the C drive as /cygdrive/c/ instead of as C: (although the latter does usually work too).

It also allows us to use forward slash instead of backslash as a separator.
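Cygwin ships a cygpath utility that does this translation properly (cygpath -u converts a Windows path to the Unix-style form). The sed sketch below only imitates it, and only for paths on drive C, to show what the rewrite amounts to: swap the drive prefix, then turn every backslash into a forward slash.

```shell
# Rough imitation of Cygwin's path translation (drive C only; not cygpath itself)
win='C:\Users\Vlad'
unix=$(printf '%s\n' "$win" | sed -e 's|^C:|/cygdrive/c|' -e 's|\\|/|g')
echo "$unix"   # /cygdrive/c/Users/Vlad
```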

But paths are still case insensitive…

…which means that if you try to copy files called backup.txt (in all lower case) and Backup.txt (with a capital 'B') into the same directory, the second will overwrite the first.
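You can check which behavior your own file system has with the sketch below. It writes two files whose names differ only in case: on a case-sensitive file system (the usual setup on Linux) we end up with two files, while on a case-insensitive one (Windows, and macOS in its default configuration) the second write lands in the first file.

```shell
demo=$(mktemp -d)
cd "$demo"
echo first  > backup.txt
echo second > Backup.txt   # same file as backup.txt if names ignore case
ls                         # two names, or just backup.txt
cat backup.txt             # "first" if case-sensitive, "second" if not
```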

To summarize, here are the three commands, and two special directory names, that we saw in this episode.

In the next episode, we'll see how to create, rename, and delete files and directories.