Hello, and welcome to the first episode of the Software Carpentry lecture on version control. In this episode, we will explain what version control is, how it works, and why you should use it.
Suppose Wolfman…
…and Dracula…
…are writing a paper together.
They both want to edit the file at the same time.
What should they do?
They could take turns…
…but then they'd each spend a lot of time waiting.
Or, they could each work on their own copy at the same time…
…and patch things up afterward.
But stuff always winds up getting lost, overwritten, or duplicated.
And nobody wants that.
The right solution is to use a version control system.
This keeps the master copy of the file…
…in a repository located on a server—a computer that is never used directly by people, but only by the applications serving them.
No one ever edits the master copy directly.
Instead, Wolfman and Dracula each have a working copy on their own computer. This lets them work independently, making whatever changes they want.
As soon as Wolfman is ready to share his changes, he commits them to the repository.
Dracula can then update his working copy to get those changes.
And of course, when Dracula finishes working on something, he can commit…
…and then Wolfman can update.
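To make the commit-and-update cycle concrete, here is a minimal sketch in Python. It is only a toy model: the `ToyRepository` class, its `commit` and `update` methods, and the sample text are all invented for illustration and do not correspond to any real version control system.

```python
class ToyRepository:
    """Hypothetical stand-in for the repository that holds the master copy."""

    def __init__(self, text=""):
        self.master = text

    def commit(self, working_text):
        # Sharing changes: the committed working copy becomes the master copy.
        self.master = working_text

    def update(self):
        # Getting changes: hand back the current master copy.
        return self.master


repo = ToyRepository("Title: Monsters in Science\n")

# Each author starts from his own working copy of the master.
wolfman_copy = repo.update()
dracula_copy = repo.update()

# Wolfman edits his working copy, then commits it to the repository.
wolfman_copy += "Introduction by Wolfman\n"
repo.commit(wolfman_copy)

# Dracula updates his working copy to pick up Wolfman's changes.
dracula_copy = repo.update()
```

Notice that neither author touches `repo.master` directly: all sharing goes through `commit` and `update`.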
But what if Dracula and Wolfman make changes to the same part of their working copies?
Old-fashioned version control systems prevented this from happening by locking the master copy.
Only one person (or monster) could open the lock at a time. This guaranteed that two or more people could never accidentally make changes to the same file at the same time…
…but once again, it meant that people had to take turns.
Most of today's version control systems use a different strategy. In these systems, nothing is ever locked—everyone is always allowed to edit their working copy.
Sometimes, of course, people will make changes to the same part of the paper.
If Wolfman commits first, his changes are simply copied to the repository.
If Dracula now tries to commit something that would overwrite Wolfman's changes…
…the version control system stops him…
…and marks the conflict.
It's up to Dracula to edit the file to resolve the conflict. He can accept what Wolfman did, replace it with his own work, or write something new that combines the two—it's up to him.
Once he has fixed things, he can go ahead and commit.
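One way to picture how a system might notice the conflict is with revision numbers. The sketch below is a simplified assumption, not a description of any particular tool: the `CheckedRepository` class, the `Conflict` exception, and the sample sentences are hypothetical. It also rejects any commit made from an out-of-date working copy, whereas real systems usually complain only when the changes actually overlap.

```python
class Conflict(Exception):
    """Raised when a commit is based on an out-of-date working copy."""


class CheckedRepository:
    """Hypothetical repository that tracks a revision number for the master copy."""

    def __init__(self, text):
        self.master = text
        self.revision = 1

    def update(self):
        # Return the master copy together with the revision it belongs to.
        return self.master, self.revision

    def commit(self, text, base_revision):
        # Refuse the commit if someone else has committed since the last update.
        if base_revision != self.revision:
            raise Conflict("master copy has changed since your last update")
        self.master = text
        self.revision += 1
        return self.revision


repo = CheckedRepository("The moon is full.\n")
wolfman_text, wolfman_base = repo.update()
dracula_text, dracula_base = repo.update()

# Wolfman commits first, so his change is simply copied to the repository.
repo.commit("The moon is full tonight.\n", wolfman_base)

# Dracula's commit is based on the old revision, so the system stops him.
try:
    repo.commit("The moon is bright.\n", dracula_base)
    dracula_stopped = False
except Conflict:
    dracula_stopped = True

# Dracula updates, combines both changes by hand, and commits again.
latest_text, latest_base = repo.update()
resolved = "The moon is full and bright tonight.\n"
repo.commit(resolved, latest_base)
```

The important point is that the decision about how to combine the two versions is Dracula's; the system only refuses to let him overwrite Wolfman's work silently.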
Version control is better than mailing files back and forth for at least three reasons.
First, it's hard (but not impossible) to accidentally overlook or overwrite someone's changes—the version control system highlights them for you automatically.
Second, there are no arguments about whose copy is the most up to date—the master copy is.
Third, nothing that is committed to version control is ever lost, which means it's like having an "infinite undo" in your editor.
This works because the version control system never overwrites the master copy in the repository.
Instead, every time someone commits a new version…
…it is saved on top of the previous one.
Since all old versions are saved…
…it's always possible to go back in time to see exactly who wrote what on a particular day, or what version of a program was used to generate a particular set of results.
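The "never overwrite, always add" idea can be sketched in a few lines of Python. Everything here is an illustrative assumption: the `HistoryRepository` class, the `Version` record, the dates, and the draft text are made up, and real systems store history far more compactly than a list of full copies.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Version:
    """One saved version: who committed it, when, and what it contained."""
    author: str
    day: date
    text: str


class HistoryRepository:
    """Hypothetical repository that keeps every committed version."""

    def __init__(self):
        self.versions = []

    def commit(self, author, day, text):
        # Append the new version; earlier versions are never modified.
        self.versions.append(Version(author, day, text))
        return len(self.versions)  # version numbers start at 1

    def checkout(self, number):
        # Go back in time to any earlier version.
        return self.versions[number - 1]


repo = HistoryRepository()
repo.commit("Wolfman", date(2011, 10, 1), "Draft one\n")
repo.commit("Dracula", date(2011, 10, 2), "Draft two\n")

# The first version is still available after the second commit.
first = repo.checkout(1)
```

Because each version records its author and date, asking who wrote what on a particular day is just a lookup in the saved history.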
Version control systems do have one important shortcoming.
If you are working with plain text files…
…it's easy for the version control system to find and display differences, and to help you merge them.
Unfortunately, images, MP3s, PDFs, or Microsoft Word or Excel files aren't stored as text—they use specialized binary data formats.
Most version control systems don't know how to deal with these formats, so all they can say is, "These files differ." The rest is up to you.
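The difference between text and binary files is easy to see with Python's standard `difflib` module. This is only a sketch of the idea: the sample sentences and the short byte strings standing in for binary files are invented, and real version control systems use their own comparison code.

```python
import difflib

# Two versions of a plain text file, as lists of lines.
old = ["Wolfman howls at the moon.\n", "Dracula sleeps all day.\n"]
new = ["Wolfman howls at the moon.\n", "Dracula sleeps until dusk.\n"]

# For text, a line-by-line comparison shows exactly what changed.
diff = list(difflib.unified_diff(old, new, fromfile="old", tofile="new"))
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

# Made-up byte strings standing in for the contents of two binary files.
old_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0x01])
new_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0x02])

# Without knowledge of the format, the most we can report is whether they differ.
if old_bytes != new_bytes:
    binary_report = "These files differ."
else:
    binary_report = "These files are identical."
```

For the text files, `changed` holds the removed line and the line that replaced it; for the byte strings, all we get is a yes-or-no answer.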
Even with this limitation, version control is one of the most important concepts in this course. In the next episode, we'll introduce you to one of the most popular free version control systems around.