DVCS - Musings of Unayok

2009 Jun 07

16:36 DVCS

I've just pushed the first commits to my newest (public) coding project: genosha (no apologies for the name at all ;> ). A pretty small one, but it's been kind of fun to do. While the other project I've opened up to all comers (terbium) has applications for general computer users (despite its lack of point-and-droolness, it's actually reasonably easy to use), genosha is a lower-level tool. So I won't be upset if this post causes much glazing-of-the-eyes for those on my friends-list.

This increase in publicly visible coding is intentional. I have started many projects that have never gone very far. Sometimes this is for obvious and good reasons. Other times it's just plain inertia. This way, there's at least some sense of completion and progress.

In putting together genosha, I was faced with a bit of a dilemma. A very geeky one. Which source control system would I use? There are a fair number out there. From the beginning, it was obvious it would be a DVCS (Distributed Version Control System). That limited it to a handful of contenders. In fact, though, it came down to a choice between two: Mercurial and git. A quick look at some of the others (bazaar, monotone, arch, darcs, etc.) and how they were being used dissuaded me from them quickly.

Both Mercurial and git stem from the same period: the Internet/Linux drama around BitKeeper that ended in the owners of that proprietary software withdrawing the gratis version that allowed the Linux kernel developers to use it. Mercurial and git were primarily responses to this.

In many respects, they are quite similar. If I were to get involved in a project that used either system, I'd be quite content to work along. But this was about choosing one for my own use.

External hosting options exist for both. Git has github, gitorious, and a myriad of others. Mercurial has Bitbucket, code.google.com (Real Soon Now), and others. Likewise, both have significant projects using them. Git has the Linux kernel, WINE, x.org, and many others. Mercurial has Mozilla, Xen, and (when they get the migration from SVN figured out) Python. Mercurial is written in Python (with some use of C where important), git in C and Perl. This might give Mercurial a slight advantage; its internal structure is cleaner, and (to my Pythonic eyes) more readable/understandable. However, I don't want to futz with the thing; just use it to manage my own projects.

As I said, they both are fairly similar. Well. At the higher levels I'm looking at. They are, however, not identical. In the end, I (obviously from the genosha link at the top) settled on git.

There are two main differences that tipped the scales. The first is that Mercurial does not natively support "local branching". I've found this immensely handy in git; not needing to manage separate working directories for very lightweight, temporary branches. There is a Mercurial extension for local branches. To me it seems... a little hack-y.

The other difference that weighs in git's favour is its concept of staging. Files/changes to be committed in a changeset must be added to the staging area (it's a simple command, or an option when you do your commit) prior to being committed. I find this very useful as I like (to try at least) to keep my commits geared to one task. However, when actually doing that, sometimes other little changes creep in. Staging allows me to keep them separate without a lot of extra bookkeeping. Mercurial doesn't have this concept.
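To make the staging idea concrete, here is a minimal sketch in Python. It is a toy model only, not how git is actually implemented: the `ToyRepo` class and its method names are made up for illustration, and the real index holds a full snapshot rather than just the staged paths. It shows two unrelated edits in the working tree ending up in two separate, single-purpose commits.

```python
class ToyRepo:
    """Toy model of a working tree, a staging area (index), and commits."""

    def __init__(self):
        self.working = {}   # path -> content currently "on disk"
        self.index = {}     # path -> content staged for the next commit
        self.commits = []   # list of (message, snapshot) pairs

    def edit(self, path, content):
        self.working[path] = content

    def add(self, path):
        # Stage only this path; other edited files stay unstaged.
        self.index[path] = self.working[path]

    def commit(self, message):
        # New snapshot = previous snapshot plus whatever was staged.
        snapshot = dict(self.commits[-1][1]) if self.commits else {}
        snapshot.update(self.index)
        self.commits.append((message, snapshot))
        self.index = {}
        return snapshot

    def unstaged(self):
        # Paths whose working content differs from the last commit.
        head = self.commits[-1][1] if self.commits else {}
        return sorted(p for p, c in self.working.items() if head.get(p) != c)


repo = ToyRepo()
repo.edit("parser.py", "the fix I meant to make")
repo.edit("README", "a typo fix that crept in")

repo.add("parser.py")            # stage just the task at hand
repo.commit("Fix the parser")
leftover = repo.unstaged()       # the README change is still waiting

repo.add("README")
repo.commit("Fix README typo")
```

The point of the sketch is the middle step: the commit only picks up what was explicitly staged, so the stray change stays in the working tree until it gets its own commit.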

Now I know that there are workarounds for both of those. However, I don't like the workarounds, and feel that the more non-mainline the installation is, the more likely I'm not using the tool in the Proper Way (as it was intended). To be sure, there are a few other minor differences but I considered those mostly noise.

I expect I'll probably end up working with Mercurial eventually. This would not be a bad thing. It's a good DVCS, and I would recommend it, certainly over something as painful as SVN. It's just not quite right for me.

Comments:

From:spiziks
Date:2009 Jun 07 - 22:09 (UTC)
You are =such= a geek! :)
From:redwolfexr
Date:2009 Jun 11 - 22:28 (UTC)
Glad to see it wasn't just me thinking that...

Software dudes sure make things complicated.
From:unayok
Date:2009 Jun 12 - 00:04 (UTC)
No, we expose the underlying complexity that is already there.

I should get a t-shirt "CENTRAL SERVICES".

10 points if you get the reference.
From:theobviousname
Date:2009 Jun 16 - 02:07 (UTC)
What makes SVN painful? We're still using CVS at work, but since that's almost exclusively mediated by the IDE (which supports multiple concurrent changesets, shelving changes, etc.) I only ever really bump into it when I'm delving into the history of a file and it breaks on file move or rename.

What's the difference between a VCS and a DVCS?
From:unayok
Date:2009 Jun 16 - 02:49 (UTC)
Super ultra-short response: SVN is slow, and it's unfun to deal with branches or anything other than the most trivial of merges (without external assistance). It requires fiddly setup of a master repository. It pollutes your project's directory structure with .svn (or, for CVS, CVS) dirs for tracking. It manages changes on a file by file basis rather than a project changeset basis (even though revision numbers are "global"). And it's slow. It doesn't differentiate between local editing and merges from other branches. Getting diffs between branches is ugly at best. You can't commit to a local repository and then deal with merges from an upstream repository; you *have* to slave your local repository to the ultimate master first.

Distributed VCS differ from more traditional VCS in a few ways (and I'm not getting all of them, but this is enough to get started). First: the design doesn't revolve around a central master repository. Every repository could be considered authoritative. While this sounds chaotic, it really isn't, for two reasons: (1) these systems provide excellent tools for merging and cherry-picking updates from other "remotes", and (2) the model is flexible enough for many workflows: many projects maintain an effective central authoritative repository, but this can be quickly merged with any branch on any local repository.

Because it's distributed, disconnected operation is supported and de rigueur. Your commit cycle becomes much shorter. You're working with a local repository and can commit half-way done stuff to your local tree. Only when you're happy with it do you publish your mergeset or "git push remote" to provide your changes for others. This and the ability (at least with git) to have relatively lightweight branches allows you to have separate branches (for example) for each ticket or issue. Since it's managed in your local repository, no one outside it need see the low level branching or commits you've performed; just the finished ones you merge into "master" (or whatever your workflow calls it).

(When you push to or pull from a repository, you end up with that repository's history for the branch you're pushing/pulling. This allows for the disconnected operation; you really do have a complete repository on your system, and the shared history allows the commits you do offline to be merged safely into the remote when the next connection is made.)
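A rough sketch of why shared history makes this safe, in Python. This is a toy model under stated assumptions, not git's or Mercurial's actual data structures: commits are just string ids in a hypothetical `parents` dict, and the function names are invented for illustration. Given the two tips, the tool can work out both the common history and exactly which offline commits the other side lacks.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including itself."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def missing_commits(parents, local_tip, remote_tip):
    """Commits reachable from the local tip but unknown to the remote tip."""
    return ancestors(parents, local_tip) - ancestors(parents, remote_tip)


# Hypothetical history: a <- b is shared; c and d were committed offline,
# while e landed on the remote in the meantime.
parents = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["c"],
    "e": ["b"],
}

shared = ancestors(parents, "d") & ancestors(parents, "e")
to_send = missing_commits(parents, "d", "e")
```

Here `shared` is the common history both sides agree on, and `to_send` is the offline work that still has to be merged in once a connection is available.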

DVCS works with change sets. It doesn't track an individual file's history separately. Thus it is trivial to follow the code as it gets moved from file/class to file/class during refactoring. There are tools (which may also be implemented in SVN/CVS, but I can only imagine how glacially slow they'd be) to bisect the changeset history to find precisely when X changed (where X is a file, or a line in some file that used to be some place else). Git and Mercurial (and I assume others though I have less experience with them) are both fast. Git is just mindbogglingly fast, even under Windows. (Git also allows cryptographic signatures of changesets which is a great idea I haven't needed to use yet.)
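The bisecting idea is essentially a binary search over history. A minimal Python sketch, assuming a simple linear list of commits where the first is known good, the last is known bad, and the problem never goes away once introduced (real tools work on a branching history; the `first_bad` name and the integer commit ids are made up for illustration):

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1     # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid                 # the change happened at mid or earlier
        else:
            lo = mid                 # the change happened after mid
    return hi


# Hypothetical history of 100 commits where commit 37 introduced the change.
history = list(range(100))
checked = []

def is_bad(commit):
    checked.append(commit)           # record how many commits get tested
    return commit >= 37

culprit = history[first_bad(history, is_bad)]
```

Each test halves the range still under suspicion, so even a long history needs only a handful of checkouts to pin down the commit where X changed.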

This isn't a well-structured note; I'm rambling. But I hope you get at least some of the idea.

The Design section of Git's Wikipedia article may prove enlightening. Yes, it is still Wikipedia, but read as one source among several, it's a decent overview.