Find us on GitHub

Teaching basic lab skills
for research computing

Version Control: Setup

Hello, and welcome to the fifth episode of the Software Carpentry lecture on version control. In this episode, we will show you how to set up a repository of your own. A word of warning: you will need to know a little bit about using a Unix shell to do this. If you've never used the shell, you should probably ask someone else to create your repository.

Here's the simplified picture from our first episode of what we're trying to achieve. We want to keep the master copy of our work in a repository on a server that we can access from other machines on the internet.

That master copy is a bunch of files and directories. Nobody ever edits them directly.

Instead, a copy of Subversion runs on that machine, managing updates for us and watching for conflicts.

Our working copy is a mirror image of the master sitting on our computer. When our Subversion client needs to communicate with the master, it connects to the copy of Subversion running on the server to move data back and forth.

In order for all of this to work, we need four things. First, we need the repository itself. It's not enough to create an empty directory and start filling it with files: Subversion needs to create a lot of other structure in order to keep track of old revisions, who made what changes, and so on.

Second, we need to know the web address---the URL---of the server. In fact, we need even more than that: we need the full URL of the repository on that server, since a single server could host any number of Subversion repositories.

The third thing we need is permission to read or write the master copy. Many open source projects give the whole world permission to read from their repository, but very few allow strangers to write to it: there are just too many possibilities for abuse. Somehow, we have to set up a password or something like it so that users can prove who they are.

The fourth and final thing we need is a working copy of the repository on our computer. We saw how to create that in the second episode of this lecture by checking out a copy of the repository; please review that episode if you need a refresher.

To keep things simple, we'll start by creating the repository on the machine that we're working on. This won't let us share the repository with other people, but it will allow us to save the history of our work as we go along.

The command to create a repository is svnadmin create, followed by the path to the repository. If we want to create a repository called lair_repo directly under our home directory, we can just cd to that directory and run svnadmin create lair_repo.

This command creates a directory called lair_repo to hold our repository. We should never edit anything in this repository directly: doing so probably won't tear our sanity to shreds and leave us gibbering mindlessly in horror, but it will almost certainly make the repository unusable.

To get a working copy of this repository, we use Subversion's checkout command. If the path to our home directory is /Users/mummy, then the full path to the repository we just created is /Users/mummy/lair_repo, so we use svn checkout file:///Users/mummy/lair_repo lair_working.

The first argument is the URL of our repository. file:// says that it's part of the local machine's filesystem, and /Users/mummy/lair_repo is the path to repository directory.

Notice that the protocol ends in two slashes, while the absolute path to the repository starts with a slash, making three in total. A very common mistake is to type only two, since that's what web URLs normally have.

When we're doing a checkout, it is very important that we provide the second argument, which specifies the name of the directory we want the working copy to be put in. Without it, Subversion will try to use the name of the repository, lair_repo, as the name of the working copy. This means that Subversion will try to overwrite the repository with the working copy, since they have the same name. Again, there isn't much risk of our sanity being torn to shreds, but this would ruin our repository.

To avoid these problems, most people create a sub-directory in their account called something like repositories or repos, and then create their repositories in that. For example, we could create our repository in /Users/mummy/repos/lair, then check out a working copy as /Users/mummy/lair.

Now let's see how to create a repository on a server. Let's assume we have a Unix shell account on a server called monstrous.monsters.org, and that our home directory is /u/mummy.

To create a repository called lair, we log into that computer, then run the command svnadmin create lair.

(Once again, we would probably actually create a sub-directory called something like repos and put our repository in there, but we'll skip that step here to keep our URLs short.)

The URL for the repository we just created is monstrous.monsters.org/u/mummy/lair---except that's not a complete URL, because it doesn't specify the protocol we are going to use to connect to the repository.

A protocol like HTTP or FTP defines how communication takes place between two computers: who talks first, how each party identifies itself, what data should be sent when, and so on.

It's very common to use the HTTP protocol to communicate with Subversion, but setting that up requires some knowledge of how web servers work.

We're going to use a combination of two protocols to access our repository. The first is called SSH, which stands for "Secure Shell". You probably used it to log in to the server to create the repository, though it might have been hidden inside a GUI client like Putty. SSH specifies rules for connecting to remote computers, providing passwords to prove your identity, and so on.

The second protocol is SVN, which is a specialized protocol defined by Subversion for moving data back and forth, comparing different versions of files, and so on.

Putting these together, the full URL for our repository is svn+ssh://mummy@monstrous.monsters.org/u/mummy/lair. Breaking this back into pieces:

svn+ssh is the protocol. (It has to be spelled exactly this way: "ssh+svn" should work, but doesn't.)

mummy@monstrous.monsters.org specifies who we are and what machine we're connecting to.

/u/mummy/lair is where our repository is located.

Every Subversion repository URL has these parts: a protocol, something to identify the server (which may optionally include a user ID if the repository isn't publicly readable), and a path.

Let's switch back to our local machine and check out a copy of the repository.

When we run Subversion's checkout command, our client makes a connection to the server monstrous.monsters.org and then prompts us for the password associated with the mummy account. By entering this password, we are proving to the server that we are the user mummy, or at least that we have the right to read and write the files that belong to mummy.

Notice that our client gives us the option of saving our password locally, so that we don't have to re-enter it each time we update from or commit to this repository. A lot of people think this is a bad idea, since anyone who stole that password from our local machine would then be able to log into the server as us and do horrible things. We'll look more closely at the pros and cons of this in a future lecture.

A bigger question is how to give other people access to the repository we have just created so that they can start working with us. Unfortunately, this really does require things that we are not going to cover in this course. If you need to do this, you can:

ask your system administrator to set it up for you,

use an open source hosting service like SourceForge or Google Code, or

spend a few dollars a month on a commercial hosting service like DreamHost that provides web-based GUIs for creating and managing repositories.