SVN and BerkeleyDB.

In my post on setting up Subversion, I recommended FSFS as the repository type because "I've been told the Berkeley DB version is a little buggy".  I have no experience using BerkeleyDB with SVN (or BerkeleyDB by itself), so I was going on what I had heard.

Today, I got an email from Greg Burd.  Greg is the Product Manager for Berkeley DB/JE/XML at Oracle.  He is trying to fight this prevailing perception of BerkeleyDB ("it's buggy").  Instead of paraphrasing what he said, Greg gave me permission to reproduce his email in this post:

In the latest versions of SVN the issues using Berkeley DB have been fixed.  This was a joint effort between both engineering teams (SVN and DB) over a year ago.  Clearly there are pros and cons to DB and FSFS storage schemes, but to say that DB is "buggy" is inaccurate.

I'm trying to fight this mistaken impression that DB is buggy.  Due to the size and composition of the SVN community of users this is particularly important to me, developers are our most precious resource.

In this case there has been a lot of attention to the issues, few facts, and as a result the general feeling is that "DB is not reliable, or the best choice".  Clearly I disagree with both those statements.

DB isn't an easy product to integrate, developers must consider a number of complex issues.  A few of those slipped through the SVN team's fingers.  That's to be expected, I don't fault them.  They have a large complex code base, bugs happen.  We, Sleepycat at the time, had resisted some features that the SVN team suggested to us.  After some debate we added a few new things to ease SVN's development.  It turned out that our customers appreciated these new features too.  A perfect example of open source communities benefiting commercial users.  We helped them, they helped us and in the end both teams had better products.  I think you'll get this same story from anyone in the core SVN team, just go to irc:// and ask the developers about Berkeley DB and Sleepycat.  Tell them Greg from Sleepycat sent you.

Just trying to get the story right.  :)


Berkeley DB Product Manager - Oracle

I want to thank Greg for contacting me and allowing me to post this.  And as an additional note, if you look at the release notes for SVN 1.4 you'll see the following on BerkeleyDB support:

A common problem with previous versions of Subversion is that crashed server processes could leave BerkeleyDB-based repositories in an unusable "wedged" state, requiring administrators to manually intervene and bring back online. (Note: this is not due to bugs in BerkeleyDB, but due to the unorthodox way in which Subversion uses it!)

Subversion 1.4 can now be compiled against BerkeleyDB 4.4, which has a new "auto-recovery" feature. If a Subversion server process crashes and leaves the repository in an inconsistent state, the next process which attempts to access the repository will notice the problem, grab exclusive control of the repository, and automatically recover it. In theory (and in our testing), this new feature makes BerkeleyDB-based repositories just as wedge-proof as FSFS repositories.

If you decide to try Subversion, make sure you research your choices for repository type.  Each one has pros and cons, and with 1.4 it looks like some of the negative aspects of using BerkeleyDB have been eliminated.
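
If you already have a repository and aren't sure which backend it uses, here is a minimal Python sketch for checking.  It assumes the repository has a db/fs-type file holding the backend name ("fsfs" or "bdb"), which is my understanding of the on-disk layout rather than something from Greg's email or the release notes.  The function name repository_backend is my own.

```python
from pathlib import Path


def repository_backend(repo_path):
    """Return the backend name recorded for a Subversion repository.

    Assumption: the repository stores its backend name ("fsfs" or "bdb")
    in a small text file at <repo>/db/fs-type. Returns None when that
    file is missing, so the caller can decide how to handle it.
    """
    marker = Path(repo_path) / "db" / "fs-type"
    if not marker.is_file():
        return None
    # The file is expected to hold a single word followed by a newline.
    return marker.read_text().strip()
```

Calling repository_backend("/path/to/repo") should give you the backend name as a string, or None if the marker file isn't there.  Treat it as a quick sanity check, not a substitute for the Subversion documentation.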

There's also this press release from CollabNet in which Karl Fogel, a founding developer of Subversion, has positive things to say about BerkeleyDB.


UPDATE: Fixed the mailto: link for contacting Greg Burd.

UPDATE2: Added a link to CollabNet press release.
