Date: 2008-07-03 23:01:00
git revelation

At work we use Subversion for version control. We chose Subversion for a couple of reasons:

  1. Central repository. We had been using Mercurial, and found that the way we work would be better served by having a known, single place which is the repository. Although Mercurial was very flexible, we had some instances of patches being lost because they didn't get pushed all the way up to the Mercurial repository which was designated as the central one. Using Subversion, changes are either in the repository or they aren't, which makes management easier. Besides, Subversion integrates very well with Jira.
  2. TortoiseSVN has great support for Word documents on Windows. All our software development activity must be documented, and the format for those documents is Word (arguments for or against this are another topic entirely). TortoiseSVN can seamlessly diff Word and Excel documents, which makes this at least tolerable (you can probably guess on which side my opinion lies).

Anyway, this post isn't really about Subversion. Instead, it's about Git and git-svn. If you work with Subversion repositories and you're using the usual svn client, stop right now and install git and git-svn[1]. I did, and my source code management experience improved dramatically.
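
For anyone who hasn't seen git-svn before, the round trip looks roughly like the sketch below. The repository URL and the directory name `project` are made up for illustration, and `--stdlayout` assumes the usual trunk/branches/tags layout, which may not match your repository. The commands are wrapped in shell functions (the function names are my own, not part of git) so that nothing touches a server until you call one.

```shell
#!/bin/sh
# Hypothetical Subversion URL; substitute your own repository.
SVN_URL="https://svn.example.com/repos/project"

# One-time setup: import the Subversion history into a local git repository.
# --stdlayout assumes a trunk/branches/tags layout (an assumption, adjust as needed).
svn_clone() {
    git svn clone --stdlayout "$SVN_URL" project
}

# Day to day: fetch new Subversion revisions and replay local commits on top of them.
svn_sync() {
    git svn rebase
}

# When ready: send each local git commit to Subversion as its own revision.
svn_publish() {
    git svn dcommit
}
```

In between `svn_sync` and `svn_publish` you work with plain git: commit locally as often as you like, and nothing reaches the Subversion server until the dcommit.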

By using git-svn, you get all the benefits of a modern source control management system as well as a nearly seamless interface to Subversion. Some of the most useful features of git for me are:

  1. git-stash allows me to easily save what I'm working on and temporarily switch to something else. I've found that I often work fairly nonlinearly, with more than one thing in progress at once. Being able to switch between them without having a whole new working clone is a great time (and space) saver, especially since a full build of the software we're working on takes just over 5 hours.
  2. Staged commits might seem like an unnecessary extra step if you're used to the way Subversion works, but they're actually very convenient. With Subversion, to make a commit that consists of some (but not all) of your modified files, you must explicitly name those files on the svn command line, and you've only got one shot at it. If you miss a file, you must make an additional commit, which means your history no longer consists of logically atomic commits. With git, you add the changed files to the index, either one by one or all at once (or even parts of files, see the next point), then when you're satisfied that the index is complete, you commit the whole thing. And not only that, if you really screw up the commit and haven't pushed it anywhere else upstream, you can use git-reset to change your local history and fix it before anybody else needs to see it.
  3. git-add -p allows me to pick specific changes from a modified file that should be included in the next commit. In Subversion, if you make changes A and B and they affect files a and b, you can commit each change individually. However, if there is a third file c that contains both some changes from A and some changes from B, you have to do some fancy editing to make two separate commits with Subversion. With git, you can git-add -p c and interactively pick just the lines that are appropriate for change A. The staged commit system really helps here because you can then git-add the rest of the files that don't need individual attention.

Mercurial does have some of these features, and I like Mercurial. But the thing that sold me on git was the Subversion integration.

[1] git-svn requires the Subversion Perl bindings. I don't care if you think this is too many dependencies; it's still worth it.

I think it's amusing that you chose Subversion because people couldn't remember which repository was authoritative. :p

My source control system of choice (Bazaar) also supports Subversion integration via a plugin called bzr-svn.

For git-stash/git-add functionality, there is a shelve plugin. You "shelve" code hunks off to a temporary location, commit the remainder, then unshelve them.

I don't think we've got staged commits, but if you do commit too few, there's 'bzr uncommit' which takes the last commit out of revision history so you can do a new 'bzr commit' with everything you need.

I will say that, due to some nasty memory leaks in the Python svn bindings (not bzr's fault), the initial checkout from Subversion can be a pain. (I ended up having to write a script to only check out 100 revisions per execution until it had finally gotten up to the newest revision.) But once you have a bzr mirror of Subversion it works quite nicely, and I find bzr commands and workflows a *lot* more user-friendly than git.

see also: Bzr vs. git :)

Well, "couldn't remember" is a bit unfair. :) I have separate observations on the merits of distributed vs central SCM, which I may collect and write up later.

One of the reasons git is interesting is the huge amount of momentum behind it. I imagine this is largely due to it being a Linus creation, so the huge amount of effort behind Linux naturally applies itself to git too. And it's built in a very hackable style, which makes it easy for people to teach it to perform new tricks.

I'd be interested to hear your thoughts re: distributed/centralized.

Yeah, I guess I jumped to conclusions re: "couldn't remember". How were patches getting lost then?

Bazaar can work in a centralized workflow by using "bzr checkout" instead of "bzr branch". Any commits that you do get pushed back up to the remote branch as well. (But you can still "bzr commit --local" if you can't access the network right then.)

I still wouldn't attempt teaching bzr to *my* coworkers... most of them still don't even use our CVS repository.

Thanks Greg! We are actually switching to Subversion on our team and this may end up being very helpful for us! :)

You may also find "Git from the bottom up" useful. It talks more about things from a pure-git point of view, but there's still a lot of stuff in there that is helpful for the git-svn world.

I still use CVS for the most part :)
Greg Hewgill