An Introduction to Document Databases
When most people say database, they mean relational database. Edgar Codd defined the relational model and coined the term in 1970 at IBM's San Jose Research Laboratory (the predecessor of today's Almaden Research Center). Since that time, relational databases have become the foundation of nearly every enterprise system. However, Internet-scale systems have begun to push the limits of this venerable technology. What has sprung up to fill the need? Various next-generation databases that address some of the following points: being non-relational, distributed, and horizontally scalable. These attributes are characteristics of the "NO SQL" movement. In this case, NO stands for "Not Only". So how many NO SQL databases are there? More than I care to count. But most of them fall into the following categories: Document, Graph, Key/Value, and Tabular/Wide Column.
Document Databases are especially interesting. So what makes them different from the relational model?
A document-oriented database is, unsurprisingly, made up of a series of
self-contained documents. This means that all of the data for the document
in question is stored in the document itself — not in a related table as
it would be in a relational database. In fact, there are no tables, rows,
columns or relationships in a document-oriented database at all. This
means that they are schema-free; no strict schema needs to be defined in
advance of actually using the database. If a document needs to add a new
field, it can simply include that field, without adversely affecting other
documents in the database. This also means that documents do not have to
store empty data values for fields they do not have a value for. [ from Exploring CouchDB ]
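To make the schema-free idea concrete, here is a minimal sketch in Python. It is not the API of CouchDB or any other product: the in-memory `collection` dictionary and the `put` and `get` helpers are hypothetical stand-ins for a real document store, and the document ids and fields are made up for illustration.

```python
import json

# Hypothetical in-memory "collection": document id -> serialized JSON document.
# A real document database would persist this; a dict is enough for a sketch.
collection = {}

def put(doc_id, doc):
    """Store a self-contained document under its id."""
    collection[doc_id] = json.dumps(doc)

def get(doc_id):
    """Load a document back by its id."""
    return json.loads(collection[doc_id])

# The first document has only two fields.
put("users/1", {"name": "Ada", "email": "ada@example.com"})

# The second document adds a "twitter" field. No schema change is needed,
# and the first document does not have to store an empty value for it.
put("users/2", {"name": "Grace", "email": "grace@example.com", "twitter": "@grace"})

for doc_id in sorted(collection):
    print(doc_id, sorted(get(doc_id)))
```

Each document carries its own field names, so the two users can differ in shape while living side by side in the same collection.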
Document databases have some special characteristics that make them kick some serious SQL.
- Objects can be stored as documents: The object-relational impedance mismatch is gone. Just serialize the object model to a document and go.
- Documents can be complex: Entire object models can be read & written at once. No need to perform a series of insert statements or create complex stored procedures.
- Documents are independent: This improves performance and reduces concurrency side effects.
- Open Formats: Documents are described using JSON or XML or derivatives. Clean & self-describing.
- Schema free: Strict schemas are great, until they change. Schema free gives flexibility for an evolving system without forcing the existing data to be restructured.
- Built-in Versioning: Many document databases support versioning of documents, often with the flip of a switch.
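As a rough illustration of "serialize the object model to a document and go", the sketch below turns a small object graph into a single JSON document and back again using only the Python standard library. The `Order` and `OrderLine` classes and the `to_document` and `from_document` helpers are hypothetical names invented for this example; real document database clients usually handle this mapping for you.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class OrderLine:
    sku: str
    quantity: int
    unit_price: float

@dataclass
class Order:
    order_id: str
    customer: str
    lines: List[OrderLine] = field(default_factory=list)

def to_document(order: Order) -> str:
    """Serialize the whole object graph, nested lines included, to one JSON document."""
    return json.dumps(asdict(order))

def from_document(text: str) -> Order:
    """Rebuild the object graph from a single JSON document."""
    data = json.loads(text)
    lines = [OrderLine(**line) for line in data.pop("lines")]
    return Order(lines=lines, **data)

order = Order(
    order_id="orders/1",
    customer="Ada",
    lines=[OrderLine("book-42", 2, 9.5), OrderLine("pen-7", 1, 1.25)],
)

document = to_document(order)       # one write instead of several inserts
restored = from_document(document)  # one read instead of a join
```

The order and its lines travel together as one unit, which is the point of the "documents can be complex" bullet: there is no separate order-lines table to insert into or join against.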
A few of the top document databases are CouchDB, RavenDB, and MongoDB.
- CouchDB is an Apache project created by Damien Katz. It is built using Erlang and just reached 1.0 status. Damien has a background working on Lotus Notes & MySQL.
- RavenDB is built using C# and has some interesting extension capabilities using .NET classes. RavenDB was created by Ayende Rahien (the creator of Rhino Mocks & much more).
- MongoDB is written in C++ and provides some unique querying capabilities. MongoDB was originally developed by 10gen.
So, where is the best place to use a document database?
- The schema-less nature makes it ideal for storing dynamic data, such as CMS and CRM entities, which the end user can usually customize as necessary, or semi-structured data (provided by humans).
- Related Data, such as user sessions, shopping carts, etc. - The document-based nature means that you can retrieve and store all the data required to process a request in a single remote call.
- Entities, such as user-customizable entities, entities with a large number of optional fields, etc. - The schema-free nature means that you don't have to fight a relational model to implement them.
- View Models - Instead of recreating the view model from scratch on every request, you can store it in its final form in a document database. That leads to reduced computation, a reduced number of remote calls, and improved performance.
- Large Data Sets - The underlying storage mechanism for Raven is known to scale in excess of 1 terabyte (on a single machine), and the non-relational nature of the database makes it trivial to shard the database across multiple machines, something that Raven can do natively.
[ from About RavenDB ]
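Why is sharding easier when documents are self-contained? Because each document can live on any machine without needing its neighbors. The sketch below shows one common approach, hashing the document id to pick a shard. This is an assumption made for illustration and not a description of how RavenDB shards natively; the shard names and the `shard_for` function are hypothetical.

```python
import hashlib

# Hypothetical shard names; in practice these would be server addresses.
SHARDS = ["shard-a", "shard-b", "shard-c"]

def shard_for(doc_id, shards=SHARDS):
    """Pick a shard for a document by hashing its id.

    The same id always maps to the same shard, so a document can be
    found again without consulting any other document or table.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

# A shopping cart is one self-contained document, so storing or loading it
# touches exactly one shard.
cart_id = "carts/1234"
print(cart_id, "->", shard_for(cart_id))
```

Plain modulo hashing like this reshuffles most documents whenever the number of shards changes, so real systems tend to use something more careful. The sketch only shows why independent documents make the routing decision a purely local one.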
I'd be interested to hear your experiences with document databases. I'll go into more detail on RavenDB in a future post.
UPDATE : Included MongoDB
UPDATE : You may also want to take a look at my book - RavenDB High Performance