January 2008 - Posts

As explained in my previous postings, I implemented a local/embeddable version of the Amazon SimpleDB data model and API in C#. You can download the sources from my NSimpleDB Google Code Project and build the tuple space engine yourself, or you can download the demo application, which includes the engine as a single assembly: NSimpleDB.dll.

Using the SimpleDB API then can be as easy as referencing the engine assembly and opening a local tuple space file like this:

using NSimpleDB.Service.Contract;

 

ISimpleDBService ts;

ts = new NSimpleDB.Service.VistaDb.VistaDbSimpleDBService("hello.ts");

...

ts.Close();

See my previous posting for detailed examples.

Access to Amazon SimpleDB

The API I devised for SimpleDB was purposely quite "service oriented", although the implementation was only local. I did this so that my implementation and Amazon's could eventually be used interchangeably. Back then, though, I did not have access to SimpleDB due to the limited beta.

But that has changed in the meantime. I was able to use SimpleDB online and have now implemented access to it through the same ISimpleDBService interface. Just instantiate a different service implementation:

ts = new NSimpleDB.Service.Amazon.AmazonSimpleDBService("<accessKeyId>", "<secretAccessKey>");

Instead of the placeholders, pass in your Amazon access key id and your secret access key and you're done. From then on all operations through the interface will run on your online SimpleDB space.

Here´s a small example of what you could do: create a domain, store some items into that domain, query the items, retrieve the found items, delete the domain.

ts.CreateDomain("mydomain");

 

ts.PutAttributes("mydomain", "1",

                        new SimpleDBAttribute("name", "peter"),

                        new SimpleDBAttribute("city", "london"));

ts.PutAttributes("mydomain", "2",

                        new SimpleDBAttribute("name", "paul"),

                        new SimpleDBAttribute("city", "berlin"));

ts.PutAttributes("mydomain", "3",

                        new SimpleDBAttribute("name", "mary"),

                        new SimpleDBAttribute("city", "london"));

 

string nextToken = null;

string[] itemNames;

itemNames = ts.Query("mydomain", "['city'='london']", ref nextToken);

 

foreach (string itemName in itemNames)

{

    Console.WriteLine("item: {0}", itemName);

 

    ISimpleDBAttribute[] attributes;

    attributes = ts.GetAttributes("mydomain", itemName);

 

    foreach (ISimpleDBAttribute attr in attributes)

        Console.WriteLine("  {0}={1}", attr.Name, attr.Value);

}

 

ts.DeleteDomain("mydomain");

(Compared to my previous API descriptions only one minor change has occurred: you now always need to pass in the nextToken as a ref parameter instead of an out parameter. But that's a detail, I guess.)

This code runs locally as well as against SimpleDB online without change.

Under the hood I'm using Amazon's own C# API for accessing SimpleDB. It was easy to program against - but I would not want to expose it as an API. For that I find its usability too low: too much meta data gets in the way of getting your problem solved.

Access to SimpleDB operation meta data

In order to provide access to the meta data the Amazon SimpleDB API provides without compromising the simplicity of my ISimpleDBService interface, I decided to define a second interface: ISimpleDBDashboard.

Like a dashboard in your car, this interface optionally "displays" what's going on inside an ISimpleDBService implementation. AmazonSimpleDBService implements this interface and through it provides access to the response and exception meta data returned from the internally used Amazon API. To use it, just cast the service object like this:

ISimpleDBDashboard db = (ts as ISimpleDBDashboard);

There are only two properties on the dashboard interface: LastResponseMetaData and LastExceptionData. They give access to the meta data/exception data of the last operation issued on the current thread. So even if you use the same service implementation on different threads, you will always be able to see exactly what the status of each individual operation is. This is how I tried to avoid always returning some meta data object like Amazon does. It makes for a simpler API, I'd say. Call the dashboard properties right after any operation, if you like, e.g.

ts.DeleteDomain("mydomain");

Console.WriteLine("request id: {0}", db.LastResponseMetaData.RequestId);

Console.WriteLine("box usage: {0}", db.LastResponseMetaData.BoxUsage);
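How the "last operation on the current thread" bookkeeping could work is easiest to see in a small sketch. The following is not the NSimpleDB implementation; the class and member names are hypothetical and only illustrate the idea of keeping one meta data slot per thread, here with .NET's ThreadLocal<T>:

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a service that records meta data per thread.
// None of these names come from NSimpleDB; they only illustrate the idea.
public class PerThreadDashboardSketch
{
    // ThreadLocal<T> gives every thread its own independent slot.
    private readonly ThreadLocal<string> lastRequestId = new ThreadLocal<string>();

    // Simulates an operation: it stores a request id for the calling thread only.
    public void DoOperation(string requestId)
    {
        lastRequestId.Value = requestId;
    }

    // Returns the id recorded by the last operation on the calling thread,
    // or null if this thread has not issued an operation yet.
    public string LastRequestId
    {
        get { return lastRequestId.Value; }
    }
}
```

With a design like this, two threads sharing one service object never overwrite each other's status, which is why the operations themselves can stay free of meta data return values.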

 

What´s next?

Well, that's it for now. I have reached my initial goals: 1) provide a local version of the SimpleDB data model and API for all who want to play with it more easily than online, and 2) make using the local tuple space and the real online SimpleDB equally possible and simple.

Of course I still have some ideas as to what can be improved (see the issue list online), but I now need to give the project a little rest. Please feel free, though, to use it and give me feedback on it via info [at] ralfw [dot] de, if you like.

Enjoy!

In my previous postings about Amazon's SimpleDB data model and API I explained what Amazon's online database service - or to be more precise: tuple space - has to offer in general. If this sounds interesting to you, then welcome to the desktop, because it's on the desktop that you can actually experience what it's like to use such a tuple space. SimpleDB currently (as of Jan 08) is just in limited beta and you have to line up to get one of the limited test accounts.

But you don't need to wait for Amazon to open up more. I implemented the SimpleDB data model and API in C# for you to integrate into your desktop or web applications. It's an Open Source project at Google Code called NSimpleDB, short for .NET SimpleDB. I'd say it offers pretty much all features of SimpleDB - but as an embeddable database engine instead of an online service.

Installing NSimpleDB

There are two ways for you to use NSimpleDB. Either you download the source code from the Subversion repository of the NSimpleDB site; then you can browse the sources and compile them yourself. But please note that you also need to install the VistaDb database engine. NSimpleDB is internally based on VistaDb and needs its libraries. As you can imagine, I did not want to develop a whole persistence engine just to implement SimpleDB's tuple space. But you can download an eval copy of VistaDb, so you need not fear incurring any costs right away. Also, VistaDb is working on a free community edition, which I will of course use for NSimpleDB once it's available.

Or, if you don´t want to mess around with the NSimpleDB source code, you can download the small demo application from the download area, which comes with a complete NSimpleDB engine as just one assembly: NSimpleDB.dll.

Using NSimpleDB

If you're using the precompiled version of NSimpleDB, just reference NSimpleDB.dll from your .NET project. If you compiled NSimpleDB yourself, though, reference NSimpleDB.Service.Contract.dll as well as NSimpleDB.Service.VistaDb.dll from the global bin folder of the source code tree. Also be sure to either have VistaDb installed on the same machine or copy VistaDB.NET20.dll into your project's output folder.

In any case you then should "open" the following namespaces in your source code:

using NSimpleDB.Service.Contract;

using NSimpleDB.Service.VistaDb;

Opening a NSimpleDB database

To work with NSimpleDB you need to manage a connection to a database file like with any regular RDBMS product. You can do it with the method pair Open()/Close() like this:

VistaDbSimpleDBService ts = new VistaDbSimpleDBService();

ts.Open("hello.ts");

...

ts.Close();

Just be sure to call Close() at the end. Pass in any name you like to give to your database file. NSimpleDB does not require a certain filename extension.

Or you can rely on the compiler to generate the code that automatically closes the connection. VistaDbSimpleDBService implements IDisposable:

using (VistaDbSimpleDBService ts = new VistaDbSimpleDBService("hello.ts"))

{

    ...

}

Working with domains

In order to store any data in your NSimpleDB tuple space you need to create a domain first. It´s as simple as this:

ts.CreateDomain("contacts");

No return value, nothing. This operation is idempotent: you can't create a domain twice, so calling it again for an existing domain changes nothing. From then on you can use the domain name in other operations.

To delete a domain, call the opposite operation:

ts.DeleteDomain("contacts");

This operation also is idempotent, don´t worry. And it´s asynchronous. Although the domain will become inaccessible right away for future operations, current operations are not interrupted and the actual deletion will take place at a later time.

Reflecting on domains then can be as easy as this:

string[] domainNames;

domainNames = ts.ListDomains();

foreach (string d in domainNames)

    Console.WriteLine(d);

But this is just the NSimpleDB way of doing it. Amazon´s SimpleDB does not provide such a simple way for retrieving all domain names. Instead the resultset is returned in pages. You determine the page size on the initial call to ListDomains():

domainNames = ts.ListDomains(10, out nextToken);

Also you pass a token variable into which the method puts an identifier for the next page. To retrieve it, call ListDomains() again just with this token:

domainNames = ts.ListDomains(ref nextToken);

If there are no more pages, the token is set to null.

Here´s how you can retrieve all domain names in a pagewise manner using these operations:

string[] domainNames;

string nextToken;

domainNames = ts.ListDomains(3, out nextToken);

while (domainNames.Length > 0)

{

    foreach (string d in domainNames)

        Console.WriteLine(d);

    domainNames = ts.ListDomains(ref nextToken);

}
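The token-driven paging described above can also be wrapped in a small helper that stops as soon as the token comes back as null. This is only a sketch against an in-memory fake, not against NSimpleDB itself; PagingSketch, PageFetcher and FakeSource are hypothetical names, and the "token is the next start index" scheme is an assumption made for the fake only:

```csharp
using System;
using System.Collections.Generic;

public static class PagingSketch
{
    // Fetches one page and updates the token; a null token afterwards means "no more pages".
    public delegate string[] PageFetcher(ref string nextToken);

    // Collects all pages by calling fetch until the token comes back as null.
    public static List<string> ReadAll(PageFetcher fetch)
    {
        var all = new List<string>();
        string token = null;   // null on the first call means "start at the beginning"
        do
        {
            all.AddRange(fetch(ref token));
        } while (token != null);
        return all;
    }

    // In-memory fake that pages over an array; here the token is simply the next start index.
    public static PageFetcher FakeSource(string[] names, int pageSize)
    {
        return (ref string token) =>
        {
            int start = token == null ? 0 : int.Parse(token);
            int count = Math.Min(pageSize, names.Length - start);
            var page = new string[count];
            Array.Copy(names, start, page, 0, count);
            int next = start + count;
            token = next < names.Length ? next.ToString() : null;
            return page;
        };
    }
}
```

A real page fetcher would simply forward to the paged service call and hand the token through unchanged; the helper itself never needs to know what the token means.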

Working with items and attributes

Storing items

Once you have created a domain you can start storing items in it. Each item is uniquely identified by the domain name and its item name. The attributes to put into the item are passed as ISimpleDBAttribute objects. For this purpose NSimpleDB provides the SimpleDBAttribute class:

ts.PutAttributes("contacts",

                "123",

                new SimpleDBAttribute("Firstname", "John"),

                new SimpleDBAttribute("Lastname", "Doe"));

You can freely choose an item name. You even need to choose one. It´s the primary key of the item, so to speak.

To later add an attribute to an item, just call PutAttributes() again and pass it the additional attribute:

ts.PutAttributes("contacts",

                "123",

                new SimpleDBAttribute("DOB", "1972-10-23"));

Adding attributes is so easy, because they can contain multiple values. In the relational data model you would need to define more than one column or even set up a second table for this kind of 1:n relationship. Not so with (N)SimpleDB:

ts.PutAttributes("contacts",

                "123",

                new SimpleDBAttribute("Phone", "555-1234"),

                new SimpleDBAttribute("Phone", "0170-332 3483"));

But then, how do you replace an attribute´s value? Pass it by explicitly stating you want it to be replaced:

ts.PutAttributes("contacts",

                "123",

                new SimpleDBAttribute("Firstname", "Peter", true));

Create the attribute object with true for the replace parameter.

Retrieving attributes

Retrieving an item is even easier than storing it. What you get is an array of ISimpleDBAttribute objects:

ISimpleDBAttribute[] attributes;

attributes = ts.GetAttributes("contacts", "123");

 

foreach (ISimpleDBAttribute a in attributes)

    Console.WriteLine("{0}={1}", a.Name, a.Value);

The default is to return all attributes with all their values. But you can limit the number of attributes by passing in explicit attribute names:

attributes = ts.GetAttributes("contacts", "123", "Firstname", "Phone");

 

Deleting attributes and items

You can delete attributes from an item at any time. Just specify their names. They´ll be purged from the item with all their values:

ts.DeleteAttributes("contacts", "123", "Phone");

Once all attributes are gone, the item is gone as well. To make this easier, there is a shortcut. Just don´t specify any attribute name at all:

ts.DeleteAttributes("contacts", "123");

 

Querying for items

Use GetAttributes() to retrieve a single item. But for that you have to know the item´s name. Where do you get this item name from, though? Just use a query:

string[] itemNames;

itemNames = ts.Query("contacts", "['Firstname'='John']");

SimpleDB's query syntax only describes the conditions that the items you're looking for have to fulfill. So it's not a full blown data retrieval/manipulation language like SQL, but rather a syntax for logical expressions. Here's a second example to illustrate this:

itemNames = ts.Query("contacts", "['Firstname'='John' OR 'Firstname'='Peter'] UNION ['Lastname'='Davis']");

To load the attributes for the found items, just call GetAttributes() in a second step:

foreach (string itemName in itemNames)

{

    Console.WriteLine(itemName);

    ISimpleDBAttribute[] attributes;

    attributes = ts.GetAttributes("contacts", itemName);

    foreach (ISimpleDBAttribute a in attributes)

        Console.WriteLine("{0}={1}", a.Name, a.Value);

}

That´s it. Easy - but different from working with SQL.

Read more about the query syntax in my previous posting.

Since a query potentially matches a large number of items, SimpleDB´s querying is paged. NSimpleDB sports an unpaged version (see above), but of course also paged querying. It works like paged domain retrieval:

string[] itemNames;

string nextToken;

itemNames = ts.Query("contacts",

                    "...",

                    10,

                    out nextToken);

while (itemNames.Length > 0)

{

    foreach (string itemName in itemNames)

    {

        ...

    }

    itemNames = ts.Query(ref nextToken);

}

Conclusion

This is NSimpleDB. A simple data persistence API - but unlike your regular RDBMS. Nevertheless I find it very interesting and am looking forward to Amazon opening up its beta program for SimpleDB. But until then you can get acquainted with SimpleDB's data model by using NSimpleDB locally. Enjoy!

PS: Let me know any questions and suggestions you might have. Just email me via my homepage.

Amazon´s SimpleDB is an exciting new player in the database world. It´s free, it´s online, it´s not relational. SimpleDB is a dynamic database implementing a tuple space. Currently SimpleDB (as of Jan 08) is in beta - but not everyone can get his hands on it. You have to apply and line up for one of the limited test accounts.

Nevertheless it´s worthwhile to take a closer look at SimpleDB. It´s a brave step forward by Amazon to offer an online database (accessible via a web service) that´s deviating from the mainstream data model of RDBMS.

In part 1 of this series of postings I described this data model: You store tuples (aka items) consisting of name-value pairs (aka attributes) in a SimpleDB "data space" without the need for any configuration. No schema design necessary. No tuple needs to look like another. Only so called domains exist as a structuring concept to group tuples. But nowhere is it written that you have to use more than one domain. Even different kinds of items don't force you to distribute them across domains. Domains are thus more of a concern regarding scalability and the quantitative constraints Amazon put on them.

A simple SimpleDB API

The data model of SimpleDB is simple, and so is its API. It's not based on a query language (although it provides set selection, see below), but rather follows the tuple space concept in that it defines just a small number of methods to read items from and write items to the "data space".

In the following I'll use pseudo code to describe the API. I think it will be pretty self-explanatory. In reality Amazon offers a web service to work with SimpleDB, so you'll use some kind of proxy class in your code. Amazon even published a .NET binding - but it hasn't gotten rave reviews so far. There is much room for improvement.

Attributes as smallest data units

The smallest piece of data with SimpleDB is an attribute. An attribute is a name value pair like "Name"("Peter") or "Amount due"("000000300.00") or "DOB"("2000-05-12") or "Marked for deletion"("1").

As you can see, values are just strings. It´s like with XML. Attribute names are also strings - and they can contain white space. This makes them easier to read and use as labels in frontends.

In addition - and in stark deviation from the relational data model - attributes can have multiple values, e.g. "Phone numbers"("05195-7234", "040-413 823 090", "0170-233 4439").

Amazon suggests, you don´t try to store large pieces of data in attributes, e.g. a multi-MB image. Rather you should put such byte-blobs into some other store - e.g. a file on an FTP-server or Amazon´s S3 - and use the attribute value as a reference.

Items as containers for attributes

Attributes belong to items. In principle items can contain any number of attributes, but Amazon put some limitations on them. Currently only 256 attributes are allowed in each item.

Items can be written as tuples and are identified by an explicit id you have to provide, e.g. "123"["Name"("Peter"), "City"("Berlin")]. The id is called "item name" and again is a string.

As you can see, attributes are tuples with unnamed elements, but items are tuples whose elements are named.

Domains as containers for items

Items are stored in domains. Like them, domains have an id, the domain name. No schema needs to be defined for them. Just pour items of any structure into them as you like, e.g. "contacts"{"123"["Name"("Peter"), "Addresses"("a", "b")], "a"["City"("London"), "Country"("GB")], "b"["City"("Hamburg"), "Country"("Germany")]}.

As you can see, domains are tuples, too. Their elements are named tuples, the items.

Writing data

Roughly you can say, domains are like tables, items are like records in a table, attributes are table columns. So storing data with SimpleDB means: write items with their attributes to a domain. That´s like writing records with their column data to a table.

SimpleDB provides a single operation for writing data: PutAttributes(). Identify where you want to put the attributes - into which item in which domain -, hand in the attributes - and you´re done.

This command would write a single attribute to the item with name "123" in domain "contacts":

PutAttributes("contacts", "123", ["Name"("Peter")])

But now watch! If you then issue this command

PutAttributes("contacts", "123", ["Addresses"("a")])

you don´t overwrite what´s been stored in the item, but add to it! The same is true for this command:

PutAttributes("contacts", "123", ["Addresses"("b")])

Remember that attributes can have several values. Item "123" now looks like this: "123"["Name"("Peter"), "Addresses"("a", "b")]. So you better also write the referenced addresses to the domain:

PutAttributes("contacts", "a", ["City"("London"), "Country"("GB")])
PutAttributes("contacts", "b", ["City"("Hamburg"), "Country"("Germany")])

But how then can you overwrite data, e.g. change the name of this contact? If you just issue a PutAttributes() with the new name, the name will be added as a second value to the existing attribute. To overwrite you need to add a replace-flag to an attribute (I'll denote it with a "!" after the attribute name):

PutAttributes("contacts", "123", ["Name"!("Paul")])

Replacing an attribute like this deletes all (!) existing attribute values and replaces them with the new value.
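The difference between adding and replacing can be modelled in a few lines. The following sketch is a toy in-memory model of a single item, not Amazon's or NSimpleDB's code; ItemSketch and its members are hypothetical names, and treating the values of an attribute as a set (the same value is stored only once) is an assumption of this sketch:

```csharp
using System;
using System.Collections.Generic;

// In-memory model of a single item: attribute name -> list of values.
public class ItemSketch
{
    private readonly Dictionary<string, List<string>> attributes =
        new Dictionary<string, List<string>>();

    // Without replace the value is added to the attribute's existing values;
    // with replace all existing values of that attribute are dropped first.
    public void Put(string name, string value, bool replace = false)
    {
        List<string> values;
        if (!attributes.TryGetValue(name, out values))
        {
            values = new List<string>();
            attributes[name] = values;
        }
        if (replace) values.Clear();
        // Assumption of this sketch: an identical value is not stored twice.
        if (!values.Contains(value)) values.Add(value);
    }

    // Returns a copy of all values of the attribute (empty if it does not exist).
    public List<string> Get(string name)
    {
        List<string> values;
        return attributes.TryGetValue(name, out values)
            ? new List<string>(values)
            : new List<string>();
    }
}
```

Calling Put("Addresses", "a") and then Put("Addresses", "b") leaves two values in the attribute, whereas Put("Name", "Paul", true) after Put("Name", "Peter") leaves only "Paul".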

A word of caution: Amazon´s SimpleDB is supposed to scale. That´s why they distribute it across many servers and need to replicate data all the time. That in turn means, it will take some time until changes you made by PutAttributes() and the other operations ripple through to all relevant servers. So don´t expect to see changes right after you applied them! Otherwise, if you issue a PutAttributes() followed right away by a GetAttributes() for the same data - this could run on a different thread - you might be in for a surprise.
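One common way to cope with this in client code is to re-read a value a few times until the expected change has become visible. The helper below is only a sketch of that idea under my own assumptions; EventualReadSketch and ReadUntil are hypothetical names, and neither Amazon nor NSimpleDB is claimed to offer such a function:

```csharp
using System;
using System.Threading;

public static class EventualReadSketch
{
    // Calls read() until check(result) is true or maxAttempts reads have been made.
    // The last result is returned either way, so it may still be stale.
    public static T ReadUntil<T>(Func<T> read, Func<T, bool> check,
                                 int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");
        T result = read();
        for (int attempt = 1; attempt < maxAttempts && !check(result); attempt++)
        {
            Thread.Sleep(delay);   // give the replicas some time to catch up
            result = read();
        }
        return result;
    }
}
```

The read delegate would wrap a GetAttributes() call and the check would compare the returned value with the one just written. Whether retrying is appropriate at all depends on the application; often it is simpler to not read back your own writes.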

Reading data

Reading items back from the SimpleDB "data space" is even easier than writing them. Just send the GetAttributes() command addressing an item in a domain and pass the names of the attributes to retrieve:

GetAttributes("contacts", "123", "Name")

will return ["Name"("Paul")]. Of course you can specify more attributes to be retrieved. And since you only state their name, they´ll be returned with all their values.

Item data can only be retrieved like this! Queries (see below) just return item names, but no attributes. Think of a GetAttributes() call as a SQL statement like this:

select attributeName1, attributeName2, ... from domainName where itemName="..."

Looking up data thus always is a two step process: 1. Issue query and receive a list of matching items, 2. retrieve item´s attributes with an item name from the query result.

Deleting data

You can´t delete items explicitly. You can only delete attributes from them - and if none are left in the item, the item is deleted automatically.

DeleteAttributes("contacts", "123", "Addresses")

only deletes the references to the other items, but the contact item remains in the domain. You also need to delete its name attribute, plus, of course, the parentless addresses:

DeleteAttributes("contacts", "123", "Name")
DeleteAttributes("contacts", "a", "City", "Country")
DeleteAttributes("contacts", "b", "City", "Country")
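The rule that an item vanishes together with its last attribute can be sketched with a toy in-memory domain. DomainSketch and its members are hypothetical names for illustration only; the "no attribute names means delete everything" behaviour follows the NSimpleDB shortcut, not the pseudo code above:

```csharp
using System;
using System.Collections.Generic;

public class DomainSketch
{
    // item name -> (attribute name -> values)
    private readonly Dictionary<string, Dictionary<string, List<string>>> items =
        new Dictionary<string, Dictionary<string, List<string>>>();

    // Adds a value to an attribute, creating item and attribute on demand.
    public void Put(string itemName, string attrName, string value)
    {
        Dictionary<string, List<string>> attrs;
        if (!items.TryGetValue(itemName, out attrs))
        {
            attrs = new Dictionary<string, List<string>>();
            items[itemName] = attrs;
        }
        List<string> values;
        if (!attrs.TryGetValue(attrName, out values))
        {
            values = new List<string>();
            attrs[attrName] = values;
        }
        values.Add(value);
    }

    // Removes the named attributes; with no names given, removes all of them.
    public void DeleteAttributes(string itemName, params string[] attrNames)
    {
        Dictionary<string, List<string>> attrs;
        if (!items.TryGetValue(itemName, out attrs)) return;   // unknown item: nothing to do
        if (attrNames.Length == 0) attrs.Clear();
        foreach (string name in attrNames) attrs.Remove(name);
        if (attrs.Count == 0) items.Remove(itemName);   // the item goes with its last attribute
    }

    public bool Exists(string itemName)
    {
        return items.ContainsKey(itemName);
    }
}
```

Deleting only "Addresses" from item "123" leaves the item in place because "Name" is still there; deleting "Name" as well removes the item itself.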

Creating a domain

Working with domains as the containers for items is easy. You can create a domain at any time. Just call

CreateDomain("contacts")

and that´s it. Just pass in a unique domain name. From then on, you can use this domain name in item-operations.

Deleting a domain

Deleting a domain is as easy as creating it:

DeleteDomain("contacts")

The items and attributes in that domain will be gone then. But this might take up to 10 seconds, Amazon says, due to the distributed nature of SimpleDB.

Querying domains

If you want to get an overview of the domains in your SimpleDB "data space", just call ListDomains():

ListDomains(10, &nextToken)

It returns a list of domain names. This resultset is paged, though. The first parameter to ListDomains() specifies the size of these pages, e.g. 10 domain names per page, the second parameter is a token you can use to retrieve the next page.

Passing in a token to ListDomains() returns that page´s domain names and sets the token to the next page, if there is any.

nextToken = ""
domainNames = ListDomains(10, &nextToken)
// process first page of domain names
domainNames = ListDomains(10, &nextToken)
// process second page of domain names
...

Querying data

Finally, there is also a way to query items. SimpleDB sports a simple query language. You can think of the queries as the where-clause of a SQL select statement, e.g.

select itemName from domainName where simpleDB-query

Queries are limited to a single domain and return just item names as paged resultsets like ListDomains().

The building blocks of queries are predicates. A predicate is a logical expression made up of attribute comparisons, e.g.

['City' = 'Hamburg' OR 'City' = 'London']

Both attribute name and attribute value need to be put in single quotes. SimpleDB sports the usual comparison operators like =, != etc. and a STARTS-WITH which resembles the SQL like, e.g. like 'A%'.

['Name' STARTS-WITH 'A']

Remember, all comparisons are alphanumeric, since SimpleDB only stores texts.

The logical operators within predicates are AND, OR and NOT.

You may only query for a single attribute name with one predicate! ['City'='Hamburg' OR 'City'='London'] is ok, but not ['Name'='Peter' AND 'City'='London']!

To state queries on attributes with different names, you need separate predicates for each:

['Name'='Peter'] INTERSECTION ['City'='London']

The set operations to combine the resultsets of each predicate into one are INTERSECTION, UNION and NOT. INTERSECTION calculates the common set of item names of two predicates, UNION merges the item name sets of two predicates. INTERSECTION thus works like the logical AND operator, UNION like the OR.
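Since every predicate yields a set of item names, the set operations can be pictured as plain set arithmetic. The following sketch evaluates predicates over an in-memory domain and combines the results with HashSet<T>; QuerySetSketch and its methods are hypothetical names, and the rule that an item matches if at least one value of the attribute passes the test is an assumption of this sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuerySetSketch
{
    // One predicate: the names of all items having at least one value
    // of the given attribute that passes the test.
    public static HashSet<string> Select(
        Dictionary<string, Dictionary<string, string[]>> domain,
        string attribute, Func<string, bool> test)
    {
        var result = new HashSet<string>();
        foreach (var item in domain)
        {
            string[] values;
            if (item.Value.TryGetValue(attribute, out values) && values.Any(test))
                result.Add(item.Key);
        }
        return result;
    }

    // Item names contained in both result sets (works like AND across predicates).
    public static HashSet<string> Intersection(HashSet<string> a, HashSet<string> b)
    {
        var r = new HashSet<string>(a);
        r.IntersectWith(b);
        return r;
    }

    // Item names contained in either result set (works like OR across predicates).
    public static HashSet<string> Union(HashSet<string> a, HashSet<string> b)
    {
        var r = new HashSet<string>(a);
        r.UnionWith(b);
        return r;
    }
}
```

Selecting on "Name" and on "City" separately and then intersecting the two sets corresponds to the two-predicate query shown above.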

Why does Amazon deviate like this from the well established SQL way of defining queries? The reason probably lies with the internal structure of the SimpleDB "data space". Grouping the constraints on attributes with the same name probably makes query execution faster. Maybe SimpleDB is based on a column store?

EBNF SimpleDB query syntax

Query ::= ItemSetTerm { "UNION" ItemSetTerm }.

ItemSetTerm ::= ItemSetFactor { "INTERSECTION" ItemSetFactor }.
ItemSetFactor ::= [ "NOT" ] "[" PredicateExpression "]".

PredicateExpression ::= PredicateTerm { "OR" PredicateTerm }.
PredicateTerm ::= PredicateFactor { "AND" PredicateFactor }.
PredicateFactor ::= [ "NOT" ] PredicateComparison.
PredicateComparison ::= AttributeName ComparisonOperator AttributeValue.

AttributeName ::= Chars enclosed in single quotes, e.g. 'Name'.
                  All AttributeNames in a PredicateExpression need to be the same.
                  All quotes in AttributeName need to be properly escaped.
AttributeValue ::= Chars enclosed in single quotes, e.g. '003.14'.
                   All quotes in AttributeValue need to be properly escaped.

ComparisonOperator ::= "=" | "!=" | ">" | ">=" | "<" | "<=" | "STARTS-WITH".

What´s missing?

SimpleDB´s API is simple. That´s the beauty of it. A simple, dynamic data model plus a simple API sounds like a powerful combination for today´s fast moving software business.

But this simplicity comes at a price. Common operations like looking up data are more cumbersome than with SQL. It's a two step process due to SimpleDB's queries returning just item names. Also, transactions are currently missing completely.

Another aspect to get used to is the "eventual consistency" model, that means, changes take time to ripple through to all replicas of your data. Thus after a change there might be a short time where different clients might see the "data space" in a different state.

But overall, Amazon´s effort is very exciting nevertheless.

What´s next?

I find SimpleDB so exciting that I wanted to be able to use it now and on my desktop. But there is no desktop/local version of SimpleDB and I don't know when Amazon will grant me a test account for SimpleDB.

That's why I sat down and developed my own Open Source version of SimpleDB: the .NET SimpleDB or NSimpleDB for short. I believe in the growing importance of tuple spaces in general and thus also am working with the University Vienna on bringing this paradigm into the hands of .NET developers. We call the basic technology "XVSM" for "eXtensible Virtual Shared Memory"; and it's somewhat like SimpleDB. But on top we place more elaborate data structures so our space is not just partitioned into domains but into collections and other high level data structures. We envision them to allow for true "Space Based Collaboration" (SBC), which is in our view the foundation for "serverless real-time online collaboration". But I digress.

Back to SimpleDB: In my next posting I´ll show you, how you can use SimpleDB or the C# implementation of the SimpleDB API in your applications today without reliance on Amazon.

Have you heard about Amazon's online "database service" SimpleDB? They describe it like this: "Amazon SimpleDB is a web service for running queries on structured data in real time." So it's not an RDBMS, because Amazon does not call the data "relational", but just "structured". And you use a web service based API to access the data, not good old ADO.NET. Currently SimpleDB is in beta. You can get a test account to play around with it - if you're patient. As of this writing (Jan 08) evaluation is limited; you need to apply and queue up to be assigned a test account. I applied about 2 weeks ago, but haven't heard from Amazon since then.

But why should you care? Well, SimpleDB would allow you to store data in a database without any setup costs. You don't have to care about backup or moving to another ISP. Your data, lots of data, can just stay with Amazon. Just add a web service proxy to your web (or desktop) application and off you go. This certainly makes some (or a lot, or at least a growing number of) applications easier to implement.

Another reason to care about Amazon´s SimpleDB is its simplicity. In an age where dynamic programming becomes ever more popular and static whatever (e.g. typing, binding) loses value, making persistence more dynamic sure should look attractive. But exactly this is what Amazon´s SimpleDB is about: highly dynamic persistence of structured data.

SimpleDB data model

With SimpleDB you don't define a database schema anymore. Your "data space" with Amazon is structured in a very simple way: it's divided into sub-spaces called "domains" which each contain so called "items" which each contain so called "attributes". That's it. And you can change the structure of this "data space" at any time. There is no distinction between meta data and data. Creating a domain (which resembles a table in a relational database) is a web service operation like storing an item in a domain.

image 

To make it very clear: You divide your "data space" into domains at your leisure. (Amazon currently just artificially limits the number of domains to 100.) And you stuff items of any structure into these domains. You never define a schema for a domain. The items stored in a domain don´t have to look the same. They can contain any number of attributes; all can differ in their number of attributes.

Attributes are name-value-pairs. So items are tuples of arbitrary arity. That means SimpleDB is not a relational database, but a tuple space. Just throw items/tuples into your SimpleDB instance at your leisure. That's all there is to SimpleDB persistence. If you like, separate tuples into different domains - but whether you do it or not does not make a big difference. For distinguishing between, say, customers and invoices that's not necessary. It might even be counterproductive, since querying items is limited to one domain at a time. There is no such thing as a SQL Join.

The use of domains

So why are there domains at all? Probably they help Amazon to make replication of items between servers easier. And it might speed up queries if you distribute your data across domains. So think of domains as easy to set up data partitions in case you have to deal with huge amounts of data.

Multi-valued attributes

But not only don´t you have to define a schema for a domain and all items/tuples can have a different structure, there´s another deviation from relational thinking: Attributes can have multiple values! So items don´t even comply with the relational first normal form. See the "Phone" attribute in the following item:

 image

It's not just several phone numbers separated by commas. No! The "Phone" attribute is really structured. You can retrieve (and query for) each phone number separately. SimpleDB would return the item like this:

image

Think of what this means: Finally you can set up "natural references" between persistent data like in memory. A parent object points to its children. But when you persist these objects in a relational database, you usually invert the references. The child records will contain a foreign key to denote their parent record.

But with SimpleDB you can let parent items point to their child tuples:

image

See how all children are referenced with their id values from their parents? See how the number of attributes differ across the items? That´s all just fine with SimpleDB.

Just text

A drawback of SimpleDB might be its limitation to text. All attribute values are stored as just text. So comparison is alphanumeric and leads to effects like this: 20 > 100, because in fact the comparison is "20" > "100". So be sure to take this into account when storing your non-text values like numbers or dates. Pad numbers with leading zeros (e.g. store "00012" instead of "12"), use a sortable date format (e.g. "2008-01-18"). If you expect to store negative numbers, move them into the positive range of numbers, e.g. instead of "-12" and "12" store "0" and "24" if you expect the value range to start from -12.
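The padding and offset tricks are easy to get wrong by hand, so here is a small sketch of helper functions that apply them. SortableTextSketch and its methods are hypothetical names chosen for this illustration, not part of SimpleDB or NSimpleDB; the offset and the number of digits are values you have to choose for your own value range:

```csharp
using System;
using System.Globalization;

public static class SortableTextSketch
{
    // Shifts the value by offset so it becomes non-negative,
    // then pads it with leading zeros to a fixed number of digits.
    public static string EncodeNumber(long value, long offset, int digits)
    {
        long shifted = checked(value + offset);
        if (shifted < 0)
            throw new ArgumentOutOfRangeException("value", "offset is too small for this value");
        string text = shifted.ToString(CultureInfo.InvariantCulture).PadLeft(digits, '0');
        if (text.Length > digits)
            throw new ArgumentOutOfRangeException("digits", "value needs more digits");
        return text;
    }

    // Reverses EncodeNumber for the same offset.
    public static long DecodeNumber(string text, long offset)
    {
        return long.Parse(text, CultureInfo.InvariantCulture) - offset;
    }

    // Formats a date so that text order equals chronological order.
    public static string EncodeDate(DateTime date)
    {
        return date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }
}
```

With an offset of 12 and two digits, -12 and 12 are stored as "00" and "24", so the textual order of the stored values matches the numeric order of the originals.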

On the other hand SimpleDB in this regard does not differ from XML. Text simply is the least common denominator for storing data. Also, this makes SimpleDB more efficient, since it can be optimized for handling text (e.g. in terms of indices).

What´s next?

That´s pretty much all there is to say about SimpleDB´s data model. It´s simple. It´s dynamic.

In my next posting I´ll introduce you to the SimpleDB API. It´s simple, too. Just a couple of easy operations.

But if you want to move forward more quickly, have a look at the SimpleDB documentation on the Amazon website. You can also try out my Open Source implementation of the SimpleDB data model and API. It´s called NSimpleDB and is hosted with Google. More about this too in a future posting.
