In his recent blog posting Seth Godin once again questions the value of competence. Sure, he does not want people to be dumber. He just argues that sole reliance on competence as a compass to navigate the future can - well - be a hindrance. He already wrote about it in 1999 and made clear that competence is about accomplishing something on the basis of existing knowledge - and thus is different from finding new ways of doing stuff. Whoever is competent is not necessarily innovative or imaginative. But that's what we need in the face of constant change.

If the environment keeps changing you need to constantly adapt. Adaptation is trying out new ways of coping with the environment - hopefully finding better ways to deal with it than in the past. So adaptation needs innovation, not competence.

To understand what Seth Godin means and what I see as necessary for the software industry, let me put the argument about competence into perspective:

In the beginning of any issue there is incompetence. People have a hard time getting things right. They need to build up competence. They need to gather a body of knowledge and rules. Conventions need to be established on how to most effectively reach desired results. This is the pre-phase of any issue. It's pre-conventional.

Then there is a long phase of competence. It's about rules, regulations, canonicity. Conventions rule, so to speak. There is a way to do things right. To become competent you learn to adhere to the rules. Whoever knows and executes the rules best is most competent. You don't know the rules, you're incompetent - say the competent ones. It's a phase of duality. The good are the competent people, all others are the bad who need to be converted (or just fought). To the competent ones this phase is the pinnacle of development.

But, alas, the competence phase is just a phase. Although many can live in it pretty well, in the end it's a dead end. Innovation is hard under a regime of competence-driven people.

Enter the next phase: After competence comes... conscious incompetence. Transcending competence is about knowing when it's right to apply it - and when not. Whoever "is trans" (and not just competent) knows the rules, but feels free to abide by them or not. He knows about the reasons behind the rules, their history, the conditions that once formed them. So if conditions change he can step over any rules that no longer fit and start anew as a "pre".

The cycle of pre->conventionalism->trans starts again. And with it begins innovation.

Becoming trans might not be for everyone in the competence phase. But at least the competence people should recognize the importance of stepping up. So they should allow for people to become trans and move on. They should even foster trans-formation.

Not seeing beyond the pre- or conventionalism-phase is falling prey to the pre-trans-fallacy. It´s either asserting there is nothing beyond competence. Or it´s asserting to already be trans. The latter might be more dangerous, because it´s mostly mixing up being pre with being trans. "No rules" is true in the pre and trans phase - but for different reasons. Whoever is pre denies the rules or the necessity of any - just per se. But who´s trans has gone through learning rules but sees their limitations - and thus does not feel compelled to abide by them. However, being trans means to empathically admit the (passing) phase of conventionalism.

So if we want to move on in the software industry we need to be conscious of not falling into the pre-trans-fallacy trap! Otherwise we might get stuck with our software projects in the ever-changing morass of technologies and requirements.

PS: If you want to read more about the pre-trans-fallacy try to google it. But never mind the context of spirituality and esoteric thinking. Although the fallacy got pointed out first in those circles it does not mean it can´t be applied to technical issues.

As I read Kevin Kelly's "Fate of the Book" I come to wonder what the debate he's referring to is all about. Is it about form or content? Is it about texts as opposed to video or audio? Is it about texts of a certain minimum length and/or structure as opposed to text snippets? Or is it about a certain physical container for texts as opposed to digital texts? Or is it about certain types of physical containers?

Until digital word processing it was pretty clear what a book was: a text longer than a couple of pages bound and put between covers. Text of a minimum length in a certain physical form made a book.

Since then, though, because we all write texts using word processing software and don't need to print them out anymore to have others read them, what a book is has changed. Or at least, if you talk about books, you need to be more specific about what you mean.

Today, I'd say, a book can be at least two different things: it can be the traditional book as described above. Go to a bookstore of your choice and you find thousands of them there. Or a book can be just a digital text you call a "book". It could be just 10 pages with a single sentence on each page or it could be 500 pages full of small print text. If you assume this point of view, it's pretty much up to you what you call a book.

Well, before you call a digital text a book, I think, something more needs to be added. Just text is not sufficient. Otherwise any blog posting like this would be a book. If you want a text to be a book, you need to prep it up a little bit. You need to make it print-ready. It should be typeset on electronic sheets of paper; also it should sport at least a title page. But other than that... pretty much any text can be called a book. Because, if you can print it and bind it, well, it becomes a "traditional" book.

So my bottom line is: essentially the book is in the eye of the beholder. Take any text you like, print it out, bind it, voilà, there's your book.

But that's certainly not what the debate is about. What is in question is: What's the fate of texts longer than a couple of pages? And what's the fate of the physical form of the book - regardless of whether it contains 5, 50 or 500 pages?

Physical Books - Quo Vadis?

It´s difficult but I´ll try to abstract from my personal taste. I like physical books. But just because I like them they don´t need to exist indefinitely. So if I try to subtract my emotional attachment from the picture, what´s left?

I think the benefit of having a physical book in one's hands is underestimated. Reading a book is more than "taking in" a text by scanning pages full of letters. It's like following a conversation right in front of you. There are not just words, but real people who send signals on different "channels". It's how they look, how they move, how their whole body language is. Following a conversation in a conference call is much more difficult.

Likewise reading a text online is more difficult than reading it printed out. And reading it on just a couple of loose pages that came out of a desktop laser printer is more difficult than reading it as a book. A book provides a context, it provides input through more than the visual sense. A physical book makes a text tangible - even more than some printouts. It literally manifests the thoughts behind a text.

If we realize this, we realize the age of the physical book is not over. Because there will always be texts which highly benefit from being "taken in" with more than the visual sense. And we should not think ePaper or Amazon´s Kindle are a danger for physical books. They simply don´t provide the total sensual input of a physical book.

Whoever wants to read most effectively and with the most pleasure will always want a book in his/her hands.

Digital Books - Quo Vadis?

Digital books, i.e. digital texts of a certain length and form, as opposed to blog postings or podcasts, are on the rise, too. Linear text is not dead. As useful as linked digital text snippets and other media are, digital books will stay useful too. Why's that?

It´s because of the benefits of longer sequential texts for readers and authors alike.

Readers benefit from sequential texts because they help guide and focus their thoughts. The ability to choose among any text at any moment is good; but for most people it's hard to not just be free to decide what to do next, but to be actually forced to decide. Especially when reading for recreational purposes and learning something new it thus is a virtue to provide guidance and focus for the reader's mind. A digital book lowers the cognitive effort a reader needs to invest to immerse herself in a topic.

That's not to say we all should only read books. Of course not! I just want to put books and other media and textual forms in perspective. Different forms are optimal for different purposes. Until quite recently there were only physical books to transport knowledge and stories in a one-to-many way. That's completely different today. So we can and have to choose which form and media to use for which purpose. Today almost everyone can choose between digital text, video, audio, pictures, and drawings - all of different forms. That's good and won't go away anymore. But it does not mean longer sequential texts are of no use anymore. Quite the contrary! We're just in a transitional period where we need to assess all the new toys at our fingertips. It's like with desktop publishing some decades ago: when it was new, everybody tried out all those new fonts. And many DTP products looked ugly. This period is over, we've learned how to use all those options in a beneficial manner. Times Roman and the Golden Section have survived the turmoil. So will the digital book.

Not only readers benefit from sequential texts, authors do, too. Finding an easy to read sequential textual layout for a topic is a process which exercises the author's complete knowledge about his topic. And not only that, it also requires him to think very hard about the reader, the receiver of his text message. Shredding a topic into hypertext pieces is easy. But it easily puts the burden on the reader: he's the one who needs to piece together the picture of the whole topic.

Hypertexts and text snippets seem to have "explorative learning" and the ubiquitous "time crunch" in their favor. But in the end, quite often they leave the reader alone, don´t provide guidance, and cost more time than a (good) sequential text.

An author should not let himself get lured into producing modern forms of text just because they are, well, modern and make writing easier for him. An author always needs to have his audience in mind. And that's easier if he needs to rack his brain to come up with a single sequential text for whatever he has to say.

The Future of the Book

There is a future for books (and magazines, for that matter). But it will look different from the past. Books cannot stay the same.

Firstly printed books need to become digital. Only eBooks provide all the flexibility readers want to have - including the option to print and bind them.

Secondly the eBooks of the future need to become modular, at least whatever is not a story. Again because readers want the flexibility to just buy or print parts of a larger textual body. Why buy a whole eBook if I´m just interested in chapters 6 and 27 out of 48? Such flexibility was not possible in a world of physical books only - but it is in the digital world.

Thirdly eBooks or just any text needs to be easily printed and bound as a book. Printing texts is not on the decline but rather on the rise. As argued above reading text on paper bound as a book makes understanding it easier. And it still makes it much more portable. That means, whoever wants to read effectively with his/her whole body will want to print longer sequential texts and bind them. Print-on-Demand (PoD) and desktop laser printers are not up to that task. A world full of eBooks needs new, yet-to-be-invented printing services.

Since I´m mostly concerned with software architecture and my clients are asking again and again when I´m going to write a book about the topic, I finally decided to set out and compile the material to go into the book. And I decided to do it publicly, in a new blog.

Not that I haven´t done that before here and in my German blog. But now I´ll try to be more comprehensive, put everything in a single place, and add some new stuff I have not written about before. Plus, through a blog all´s open for discussion.

So if you like, have a look at The Architect's Napkin. It's the title of my blog, because I think software architecture is not an arcane art to be practiced by just a chosen few in ivory towers, but can and needs to be practiced by almost all developers at some point in time. So it better be easy - and what can be easier than something that can be done on the back of a napkin?

So the architectural images you´ll see in the blog are like this

image

or like this

image

None will be more complicated than whatever fits on a napkin in a readable and understandable way. I strongly believe in the power of visualization; and I believe that any minute invested by an architect into a simpler depiction will save his developers hours of head scratching.

Hope to see you over there at the bar at www.geekswithblogs.net. I´ll be there sketching some architectures while sipping a cocktail...


Please find the sample code for my presentations at Software Architect 2008 on Aspect Oriented Programming with PostSharp and Software Transactional Memory with NSTM here for download:

http://www.ralfw.de/download/Software_Architect_08_Samples.zip

If you´ve any questions, feel free to contact me by email.

Enjoy!

You're fluent in object oriented programming. But now and again you're wondering what the fuss about component orientation is? There is supposed to be more to it than just using 3rd party controls in your user interfaces. But, what and how?

Component orientation is about higher productivity, easier maintenance, better testability, more flexibility, and - if you´re fond of it - reusability.

But how´s that? How does component orientation reach all those lofty goals? The trick is pretty simple: component orientation takes the basic design principle of loose coupling very seriously. But instead of now explaining contract-first design (CfD), IoC containers, binary code units, and component workbenches, let me demonstrate component orientation in a more tangible way with a musical example.

Requirements

The requirements for my musical project are: Produce a recording of the simple piece "Bell-ringers" (Source: "Bell-ringers" by Katherine and Hugh Colledge, (c) 1988 by Boosey and Hawkes Music Publishers Ltd.) as depicted below:

image

The requirements are clear, since a complete requirements document has been provided by the imaginary customer. Additionally the customer has stated, he´d prefer the piece played on the violin.

However... although I can read and understand the requirements I can't play the violin. But that's not really much different from software development, is it? Often a solution needs to be developed with a technology you're not familiar with; or you are supposed to adopt a process you've no experience with. So I guess the requirements are pretty realistic, even though it's just an analogy.

 
Product

For didactic reasons let me present the product I developed according to the above requirements right away. So first just listen to my "Bell-ringers" recording:

Maybe it's not exactly what you or the customer expected, but it's on its way to what a true violin expert like Nigel Kennedy would have produced ;-) In that, though, it's also close to reality, isn't it? Who has ever given a customer what she had expected for a first release?

Component oriented development

Now that you know the requirements and what I delivered to the customer, let me take you backstage. How did my component oriented development process work?

1. Decomposition into components

First I determined the components the final product should be built from. For you to understand this let me define component in a somewhat unorthodox way as:

A component is a part of the product that can be produced independently of other parts.

For a musical composition like "Bell-ringers" these parts or basic building blocks are all the different musical notes. I identified a, h, c'#, e, d', a', g'#, f'# and e' to be needed for a "Bell-ringers" production. To the right you see my original "analysis document".

In order to compose something from such components, though, more is needed: the relationships between the components have to be clear. Components are not just dumb parts but serve a purpose. They provide a service to other components. Here's an addition to the above definition of component:

Components have a clear specification as to what services they provide and which other components´ services they depend on. This specification needs to be separate from any component´s realization.

Unfortunately here the analogy breaks down somewhat. The relationships between musical notes are obvious from the musical score and are very, very simple. Their services are self-contained, so to speak.

Nevertheless there are relationships between the musical components. Each musical component (note) has a predecessor and a successor. That´s at least two relationships. And there can be more, e.g. in a chord with several musical notes played at the same time.

2. Component implementation

After I had identified all the components necessary to build the requested product I produced them independently. This was possible due to their loose coupling. I was able to arrange production in any order I saw fit. And if I had wanted to employ other musicians/developers I could have done so. They could have worked completely independently of me and in parallel. Thus components not only make the order of production flexible, but they also allow for very high productivity.

Here you see the "implementation" of two of the components. Yes, I implemented them not just as mp3-files, but videos. Above you just listened to the sound. But at the end you'll get the full monty :-)

Sample component a:

Sample component d':

Now you might think this kind of isolated production of components can only be done for musical components. But that´s not the case. My customers and I do it all the time in software projects. Admittedly then it´s a tad more difficult to determine which components to produce and what their dependencies are - but nevertheless it´s possible and it´s feasible. With a little component planning the stated benefits can easily be reaped.
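In software terms, the two definitions above might look roughly like the following sketch, which stays within the musical analogy. It is only an illustration under stated assumptions: `INoteComponent`, `RecordedNote` and `Piece` are made-up names, not taken from any real project or library.

```csharp
using System;
using System.Collections.Generic;

// Contract: what a note component provides, with no hint of how it is produced.
// All names in this sketch are hypothetical and exist only for illustration.
public interface INoteComponent
{
    string Pitch { get; }
    string Play();
}

// One realization of the contract; it can be built and checked on its own.
public sealed class RecordedNote : INoteComponent
{
    public RecordedNote(string pitch) { Pitch = pitch; }

    public string Pitch { get; private set; }

    public string Play() { return "[" + Pitch + "]"; }
}

// The integrating part depends on the contract only, never on RecordedNote.
public sealed class Piece
{
    private readonly Dictionary<string, INoteComponent> notes =
        new Dictionary<string, INoteComponent>();

    public void Register(INoteComponent note) { notes[note.Pitch] = note; }

    // Integration: connect the components in the order the score demands.
    public string Perform(params string[] score)
    {
        var parts = new List<string>();
        foreach (string pitch in score)
        {
            INoteComponent note;
            if (!notes.TryGetValue(pitch, out note))
                throw new InvalidOperationException("Missing component: " + pitch);
            parts.Add(note.Play());
        }
        return string.Join(" ", parts.ToArray());
    }
}
```

Because `Piece` only knows the contract, each note can be implemented by someone else, in any order, and a note that turns out to be missing can be added later without touching the others.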

(A word on the surroundings you can see in the video clips: Since I can't play the violin I had to ask my little daughter, Verena (7 years old), to show me how to play the notes. So I'm sitting in her room while I'm implementing the components. And by the way: this all happened quite ad hoc on some Sunday morning ;-) )

3. Component integration

In monolithic projects, i.e. projects whose source code is just composed of classes but no components, the solution is finished as soon as all classes have been developed. With component orientation that´s different. The price to pay for higher productivity and increased flexibility is a separate integration phase. The components developed independently need to be integrated into a whole.

For the musical production this step consisted of connecting the components according to the relationships defined by the musical score. Here you see the components in the story board view of my video editing software:

image

From left to right it's the components "a", "b"/"h", "c'#", "a" etc. But for now they are just roughly put together. To make them really cooperate their services need to be fine-tuned. This means I also need to adjust the lengths of the musical components. That's what's happening in the main editing view of the video software:

image

The duration of each "component instance" (remember: all components and even sequences of components (composite components) are reused several times) is adjusted to the requirements. Also you can see I added some "infrastructure" like a trailer and an intro.

While I tried to implement each component by itself with highest quality - call this unit testing if you like ;-) - it was only during integration I was able to see (or hear), if the individual qualities really added up to an overall acceptable quality.

As it turned out during integration, I had forgotten to implement one of the components. So I had to set up the video equipment again and do a retake. If you look closely you´ll be able to spot this component ;-)

But - thanks to component oriented production - this was the only thing I needed to do to save the whole production. Had I chosen the usual monolithic approach I would have had to redo everything. But component orientation made it possible to insert the additional implementation within the "network of building blocks" just where it was needed. That´s what I call flexibility!

4. Deployment

Finally, once all components were arranged, I let the video editing software encode the whole sequence into a single mp3-file (listen above) and a video file (see below). These files I then could deliver to the imaginary customer.

Here´s the whole product not only to hear, but also to see:

It´s the same sound as above - but since you´re looking under the hood you can actually see (!) it´s not monolithic, but component oriented. The customer experience is like it should be: smooth - although not yet perfect ;-) But the architecture of the whole is flexible. There is no tightly knit fabric of sounds, but sound bits (components) produced independently and then integrated in a way so that they form a seamless product.

The blessings of component orientation

Let me point out again the blessings of component orientation for this production:

  • Components let me produce the whole in separate bits according to my liking. I could have arranged component implementation with regard to difficulty or availability of my coach Verena. Or I could have outsourced production of individual components. Components made the implementation flexible and highly productive.
  • Component quality could be checked individually. In order to attain high overall quality I did not need to produce a whole right from the start like in a live concert. I would have had to rehearse a lot for that. Rather I was able to hone the quality of small parts of the whole in isolation. Components made the implementation much easier to test.
  • As the integration phase showed, components made it easy to modify the whole by inserting (or replacing or deleting or changing) just isolated parts. Again I did not need to play the whole piece again just to compensate for a mistake. I just implemented the component I had forgotten and inserted it where needed. The component oriented architecture was easy to maintain.

Well, what can I say? Component orientation rocks!

 

PS: As you can see from the trailer of the video, the video was planned as a homage to Lasse Gjertsen. You can find him on YouTube with several videos which inspired me. He's done some awesome stuff. Check out his list of productions. But if you don't have much time and are mainly interested in component orientation, view at least his video "Amateur". It shows a "distributed application" developed by "two teams" ;-)

As explained in my previous postings, I implemented a local/embeddable version of the Amazon SimpleDB data model and API in C#. You can download the sources from my NSimpleDB Google Code Project and build the tuple space engine yourself, or you download the demo application which includes the engine as a single assembly: NSimpleDB.dll.

Using the SimpleDB API then can be as easy as referencing the engine assembly and opening a local tuple space file like this:

using NSimpleDB.Service.Contract;

ISimpleDBService ts;
ts = new NSimpleDB.Service.VistaDb.VistaDbSimpleDBService("hello.ts");
...
ts.Close();

See my previous posting for detailed examples.

Access to Amazon SimpleDB

The API I devised for SimpleDB was purposely quite "service oriented", although the implementation was just local. I did this so my implementation and Amazon's eventually could be used interchangeably. Back then, though, I did not have access to SimpleDB due to the limited beta.

But that has changed in the meantime. I was able to use SimpleDB online and thus have now implemented access to it through the same ISimpleDBService interface. Just instantiate a different service implementation:

ts = new NSimpleDB.Service.Amazon.AmazonSimpleDBService("<accessKeyId>", "<secretAccessKey>");

Instead of the placeholders pass in your Amazon access key id and your secret access key and you´re done. From then on all operations through the interface will run on your online SimpleDB space.

Here´s a small example of what you could do: create a domain, store some items into that domain, query the items, retrieve the found items, delete the domain.

// Create a domain to hold the items.
ts.CreateDomain("mydomain");

// Store three items, each with a name and a city attribute.
ts.PutAttributes("mydomain", "1",
                 new SimpleDBAttribute("name", "peter"),
                 new SimpleDBAttribute("city", "london"));
ts.PutAttributes("mydomain", "2",
                 new SimpleDBAttribute("name", "paul"),
                 new SimpleDBAttribute("city", "berlin"));
ts.PutAttributes("mydomain", "3",
                 new SimpleDBAttribute("name", "mary"),
                 new SimpleDBAttribute("city", "london"));

// Query the names of all items whose city is london.
string nextToken = null;
string[] itemNames;
itemNames = ts.Query("mydomain", "['city'='london']", ref nextToken);

// Retrieve and print the attributes of each item found.
foreach (string itemName in itemNames)
{
    Console.WriteLine("item: {0}", itemName);

    ISimpleDBAttribute[] attributes;
    attributes = ts.GetAttributes("mydomain", itemName);

    foreach (ISimpleDBAttribute attr in attributes)
        Console.WriteLine("  {0}={1}", attr.Name, attr.Value);
}

// Clean up by deleting the domain.
ts.DeleteDomain("mydomain");

(With regard to my previous API descriptions only one minor change has occurred: you always need to pass in the nextToken as a ref parameter instead of an out parameter. But that's a detail, I guess.)

This code runs locally as well as against SimpleDB online without change.
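Why this works can be illustrated with a small self-contained sketch: client code that only sees a contract does not care which implementation sits behind it. The sketch below does not use NSimpleDB. `ITupleStore`, `InMemoryTupleStore` and `CityReport` are hypothetical stand-ins with a much-reduced, invented signature set, and attributes are simplified to one value per name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, much-reduced contract. The real ISimpleDBService has more
// operations and different signatures.
public interface ITupleStore
{
    void Put(string domain, string itemName, string attribute, string value);
    string[] FindItems(string domain, string attribute, string value);
}

// In-memory stand-in; an online implementation could satisfy the same contract.
public sealed class InMemoryTupleStore : ITupleStore
{
    // domain -> item name -> attribute name -> value
    private readonly Dictionary<string, SortedDictionary<string, Dictionary<string, string>>> domains =
        new Dictionary<string, SortedDictionary<string, Dictionary<string, string>>>();

    public void Put(string domain, string itemName, string attribute, string value)
    {
        SortedDictionary<string, Dictionary<string, string>> items;
        if (!domains.TryGetValue(domain, out items))
        {
            items = new SortedDictionary<string, Dictionary<string, string>>(StringComparer.Ordinal);
            domains[domain] = items;
        }
        Dictionary<string, string> attributes;
        if (!items.TryGetValue(itemName, out attributes))
        {
            attributes = new Dictionary<string, string>();
            items[itemName] = attributes;
        }
        attributes[attribute] = value;
    }

    // Returns the names of all items whose attribute equals the given value.
    public string[] FindItems(string domain, string attribute, string value)
    {
        var found = new List<string>();
        SortedDictionary<string, Dictionary<string, string>> items;
        if (domains.TryGetValue(domain, out items))
        {
            foreach (var item in items)
            {
                string actual;
                if (item.Value.TryGetValue(attribute, out actual) && actual == value)
                    found.Add(item.Key);
            }
        }
        return found.ToArray();
    }
}

public static class CityReport
{
    // Client code: depends on the contract only, so any implementation fits.
    public static string[] ItemsIn(ITupleStore store, string city)
    {
        return store.FindItems("mydomain", "city", city);
    }
}
```

Swapping the store passed to `CityReport.ItemsIn` is then the only change needed to move between implementations, which is the same idea as exchanging the service object behind `ISimpleDBService`.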

Under the hood I'm using Amazon's own C# API for accessing SimpleDB. It was easy to program against - but I would not want to expose it as an API. For that I find its usability too low. Too much meta data gets in the way of getting your problem solved.

Access to SimpleDB operation meta data

In order to provide access to the meta data the Amazon SimpleDB API provides without compromising the simplicity of my ISimpleDBService interface, I decided to define a second interface: ISimpleDBDashboard.

Like a dashboard in your car this interface optionally "displays" what's going on inside an ISimpleDBService implementation. AmazonSimpleDBService implements this interface and through it provides access to the response and exception meta data returned from the internally used Amazon API. To use it, just cast the service object like this:

ISimpleDBDashboard db = (ts as ISimpleDBDashboard);

There are only two properties on the dashboard interface: LastResponseMetaData and LastExceptionData. They give access to the meta data/exception data of the last operation issued on the current thread. So even if you use the same service implementation on different threads, you will always be able to see exactly what the status of the individual operations is. This is how I tried to avoid always returning some meta data object like Amazon does. It makes for a simpler API, I'd say. Call the dashboard properties right after any operation, if you like, e.g.

ts.DeleteDomain("mydomain");
Console.WriteLine("request id: {0}", db.LastResponseMetaData.RequestId);
Console.WriteLine("box usage: {0}", db.LastResponseMetaData.BoxUsage);
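How "last operation on the current thread" can be tracked is sketched below. This is not the NSimpleDB implementation, just one possible way to get the described behavior, assuming .NET's `ThreadLocal<T>`. `DashboardSketch`, `Run` and the invented request ids are hypothetical.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in, not the NSimpleDB implementation: it only shows how
// "last operation" data can be kept separately for each calling thread.
public sealed class DashboardSketch
{
    // ThreadLocal<T> gives every thread its own slot for the value.
    private readonly ThreadLocal<string> lastRequestId = new ThreadLocal<string>();
    private int requestCounter;

    // Pretend to run an operation and record an invented request id for it.
    public void Run(string operation)
    {
        int id = Interlocked.Increment(ref requestCounter);
        lastRequestId.Value = operation + "#" + id;
    }

    // The id recorded by the calling thread, or null if it has not run anything yet.
    public string LastRequestId
    {
        get { return lastRequestId.Value; }
    }
}
```

An operation issued on a second thread does not overwrite what the first thread sees, so a single service object can be shared without the status reports getting mixed up.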

 

What´s next?

Well, that's it for now. I have reached my initial goals: 1) provide a local version of the SimpleDB data model and API for all who want to play with it more easily than online, and 2) make the use of the local tuple space and the real online SimpleDB equally possible and simple.

Of course I still have some ideas as to what can be improved (see the issue list online), but I now need to give the project a little rest. Please feel free, though, to use it and give me feedback on it via info [at] ralfw [dot] de, if you like.

Enjoy!

In my previous postings about Amazon´s SimpleDB data model and API I explained, what Amazon´s online database service - or to be more precise: tuple space - has to offer in general. If this sounds interesting to you, then now welcome to the desktop. Because it´s the desktop on which you can actually experience what it´s like to use such a tuple space. SimpleDB currently (as of Jan 08) is just in limited beta and you have to line up to get one of the limited test accounts.

But you don't need to wait for Amazon to open up more. I implemented the SimpleDB data model and API in C# for you to integrate in your desktop or web applications. It's an Open Source project at Google Code called NSimpleDB, short for .NET SimpleDB. I'd say it offers pretty much all the features of SimpleDB - but as an embeddable database engine instead of an online service.

Installing NSimpleDB

There are two ways for you to use NSimpleDB: Either you download the source code from the subversion repository of the NSimpleDB site. Then you can browse the sources and compile it yourself. But please note, you also need to install a VistaDb database engine. NSimpleDB internally is based on VistaDb and needs its libraries. As you can imagine I did not want to develop a whole persistence engine just to implement SimpleDB's tuple space. But you can download an eval copy of VistaDb and need not fear incurring any costs right away. Also VistaDb is working on a free community edition, which I will of course use for NSimpleDB once it's available.

Or, if you don´t want to mess around with the NSimpleDB source code, you can download the small demo application from the download area, which comes with a complete NSimpleDB engine as just one assembly: NSimpleDB.dll.

Using NSimpleDB

If you're using the precompiled version of NSimpleDB just reference NSimpleDB.dll from your .NET project. If you compiled NSimpleDB yourself, though, reference NSimpleDB.Service.Contract.dll as well as NSimpleDB.Service.VistaDb.dll from the global bin folder of the source code tree. Also be sure to either have VistaDb installed on the same machine or copy VistaDB.NET20.dll into your project's output folder.

In any case you then should "open" the following namespaces in your source code:

using NSimpleDB.Service.Contract;
using NSimpleDB.Service.VistaDb;

Opening a NSimpleDB database

To work with NSimpleDB you need to manage a connection to a database file like with any regular RDBMS product. You can do it with the method pair Open()/Close() like this:

VistaDbSimpleDBService ts = new VistaDbSimpleDBService();
ts.Open("hello.ts");
...
ts.Close();

Just be sure to call Close() at the end. Pass in any name you like to give to your database file. NSimpleDB does not require a certain filename extension.

Or you can rely on the compiler to generate the code to automatically close the connection. VistaDbSimpleDBService implements IDisposable:

using (VistaDbSimpleDBService ts = new VistaDbSimpleDBService("hello.ts"))
{
    ...
}

Working with domains

In order to store any data in your NSimpleDB tuple space you need to create a domain first. It´s as simple as this:

ts.CreateDomain("contacts");

No return value, nothing. This operation is idempotent: you can't create a domain twice, so calling it again for an existing domain changes nothing. From then on you can use the domain name in other operations.

To delete a domain, call the opposite operation:

ts.DeleteDomain("contacts");

This operation also is idempotent, don´t worry. And it´s asynchronous. Although the domain will become inaccessible right away for future operations, current operations are not interrupted and the actual deletion will take place at a later time.

Reflecting on domains then can be as easy as this:

string[] domainNames;
domainNames = ts.ListDomains();
foreach (string d in domainNames)
    Console.WriteLine(d);

But this is just the NSimpleDB way of doing it. Amazon´s SimpleDB does not provide such a simple way for retrieving all domain names. Instead the resultset is returned in pages. You determine the page size on the initial call to ListDomains():

domainNames = ts.ListDomains(10, out nextToken);

Also you pass a token variable into which the method puts an identifier for the next page. To retrieve it, call ListDomains() again just with this token:

domainNames = ts.ListDomains(ref nextToken);

If there are no more pages, the token is set to null.

Here´s how you can retrieve all domain names in a pagewise manner using these operations:

string[] domainNames;
string nextToken;
domainNames = ts.ListDomains(3, out nextToken);
while (domainNames.Length > 0)
{
    foreach (string d in domainNames)
        Console.WriteLine(d);
    domainNames = ts.ListDomains(ref nextToken);
}

Working with items and attributes

Storing items

Once you have created a domain you can start storing items in it. Each item is uniquely identified by the domain name and its item name. The attributes to put into the item are passed as ISimpleDBAttribute objects. For this purpose NSimpleDB provides the SimpleDBAttribute class:

ts.PutAttributes("contacts",
                "123",
                new SimpleDBAttribute("Firstname", "John"),
                new SimpleDBAttribute("Lastname", "Doe"));

You can freely choose an item name. You even need to choose one. It´s the primary key of the item, so to speak.

To later add an attribute to an item, just call PutAttributes() again and pass it the additional attribute:

ts.PutAttributes("contacts",
                "123",
                new SimpleDBAttribute("DOB", "1972-10-23"));

Adding attributes is so easy, because they can contain multiple values. In the relational data model you would need to define more than one column or even set up a second table for this kind of 1:n relationship. Not so with (N)SimpleDB:

ts.PutAttributes("contacts",
                "123",
                new SimpleDBAttribute("Phone", "555-1234"),
                new SimpleDBAttribute("Phone", "0170-332 3483"));

But then, how do you replace an attribute´s value? Pass it by explicitly stating you want it to be replaced:

ts.PutAttributes("contacts",
                "123",
                new SimpleDBAttribute("Firstname", "Peter", true));

Create the attribute object with true for the replace parameter.

Retrieving attributes

Retrieving an item is even easier than storing it. What you get is an array of ISimpleDBAttribute objects:

ISimpleDBAttribute[] attributes;
attributes = ts.GetAttributes("contacts", "123");

foreach (ISimpleDBAttribute a in attributes)
    Console.WriteLine("{0}={1}", a.Name, a.Value);

The default is to return all attributes with all their values. But you can limit the number of attributes by passing in explicit attribute names:

attributes = ts.GetAttributes("contacts", "123", "Firstname", "Phone");

 

Deleting attributes and items

You can delete attributes from an item at any time. Just specify their names. They´ll be purged from the item with all their values:

ts.DeleteAttributes("contacts", "123", "Phone");

Once all attributes are gone, the item is gone as well. To make this easier, there is a shortcut. Just don´t specify any attribute name at all:

ts.DeleteAttributes("contacts", "123");

 

Querying for items

Use GetAttributes() to retrieve a single item. But for that you have to know the item´s name. Where do you get this item name from, though? Just use a query:

string[] itemNames;

itemNames = ts.Query("contacts", "['Firstname'='John']");

SimpleDB´s query syntax only states the conditions that the items you´re looking for have to fulfill. So it´s not a full blown data retrieval/manipulation language like SQL, but rather a syntax for logical expressions. Here´s a second example to illustrate this:

itemNames = ts.Query("contacts", "['Firstname'='John' OR 'Firstname'='Peter'] UNION ['Lastname'='Davis']");

To load the attributes for the found items, just call GetAttributes() in a second step:

foreach (string itemName in itemNames)
{
    Console.WriteLine(itemName);
    ISimpleDBAttribute[] attributes;
    attributes = ts.GetAttributes("contacts", itemName);
    foreach (ISimpleDBAttribute a in attributes)
        Console.WriteLine("{0}={1}", a.Name, a.Value);
}

That´s it. Easy - but different from working with SQL.

Read more about the query syntax in my previous posting.

Since a query potentially matches a large number of items, SimpleDB´s querying is paged. NSimpleDB sports an unpaged version (see above), but of course also paged querying. It works like paged domain retrieval:

string[] itemNames;
string nextToken;
itemNames = ts.Query("contacts",
                    "...",
                    10,
                    out nextToken);
while (itemNames.Length > 0)
{
    foreach (string itemName in itemNames)
    {
        ...
    }
    itemNames = ts.Query(ref nextToken);
}

Conclusion

This is NSimpleDB: a simple data persistence API, but unlike your regular RDBMS. Nevertheless I find it very interesting and am looking forward to Amazon opening up its beta program for SimpleDB. Until then you can get acquainted with SimpleDB´s data model by using NSimpleDB locally. Enjoy!

PS: Let me know any questions and suggestions you might have. Just email me via my homepage.

Amazon´s SimpleDB is an exciting new player in the database world. It´s free, it´s online, it´s not relational. SimpleDB is a dynamic database implementing a tuple space. Currently SimpleDB (as of Jan 08) is in beta - but not everyone can get his hands on it. You have to apply and line up for one of the limited test accounts.

Nevertheless it´s worthwhile to take a closer look at SimpleDB. It´s a brave step forward by Amazon to offer an online database (accessible via a web service) that´s deviating from the mainstream data model of RDBMS.

In part 1 of this series of postings I described this data model: You store tuples (aka items) consisting of name-value pairs (aka attributes) in a SimpleDB "data space" without the need for any configuration. No schema design necessary. No tuple needs to look like another. Only the so called domains serve as a structuring concept to group tuples. But nowhere is it written that you have to use more than one domain. Even different kinds of items don´t force you to distribute them across domains. Domains are thus more of a concern regarding scalability and the quantitative constraints Amazon put on them.

A simple SimpleDB API

The data model of SimpleDB is simple, and so is its API. It´s not based on a query language (although it provides set selection, see below), but rather follows the tuple space concept in that it defines just a small number of methods to read items from and write items to the "data space".

In the following I´ll use pseudo code to describe the API. I think it will be pretty self-explanatory. In reality Amazon offers a web service to work with SimpleDB, so you´ll use some kind of proxy class in your code. Amazon even published a .NET binding - but it hasn´t gotten rave reviews so far. There is much room for improvement.

Attributes as smallest data units

The smallest piece of data with SimpleDB is an attribute. An attribute is a name value pair like "Name"("Peter") or "Amount due"("000000300.00") or "DOB"("2000-05-12") or "Marked for deletion"("1").

As you can see, values are just strings. It´s like with XML. Attribute names are also strings - and they can contain white space. This makes them easier to read and use as labels in frontends.

In addition - and in stark deviation from the relational data model - attributes can have multiple values, e.g. "Phone numbers"("05195-7234", "040-413 823 090", "0170-233 4439").

Amazon suggests, you don´t try to store large pieces of data in attributes, e.g. a multi-MB image. Rather you should put such byte-blobs into some other store - e.g. a file on an FTP-server or Amazon´s S3 - and use the attribute value as a reference.

Items as containers for attributes

Attributes belong to items. In principle items can contain any number of attributes, but Amazon put some limitations on them. Currently only 256 attributes are allowed in each item.

Items can be written as tuples and are identified by an explicit id you have to provide, e.g. "123"["Name"("Peter"), "City"("Berlin")]. The id is called "item name" and again is a string.

As you can see, attributes are tuples with unnamed elements, but items are tuples whose elements are named.

Domains as containers for items

Items are stored in domains. Like them, domains have an id, the domain name. No schema needs to be defined for them. Just pour items of any structure into them as you like, e.g. "contacts"{"123"["Name"("Peter"), "Addresses"("a", "b")], "a"["City"("London"), "Country"("GB")], "b"["City"("Hamburg"), "Country"("Germany")]}.

As you can see, domains are tuples, too. Their elements are named tuples, the items.

Writing data

Roughly you can say, domains are like tables, items are like records in a table, attributes are table columns. So storing data with SimpleDB means: write items with their attributes to a domain. That´s like writing records with their column data to a table.

SimpleDB provides a single operation for writing data: PutAttributes(). Identify where you want to put the attributes - into which item in which domain -, hand in the attributes - and you´re done.

This command would write a single attribute to the item with name "123" in domain "contacts":

PutAttributes("contacts", "123", ["Name"("Peter")])

But now watch! If you then issue this command

PutAttributes("contacts", "123", ["Addresses"("a")])

you don´t overwrite what´s been stored in the item, but add to it! The same is true for this command:

PutAttributes("contacts", "123", ["Addresses"("b")])

Remember that attributes can have several values. Item "123" now looks like this: "123"["Name"("Peter"), "Addresses"("a", "b")]. So you had better also write the referenced addresses to the domain:

PutAttributes("contacts", "a", ["City"("London"), "Country"("GB")])
PutAttributes("contacts", "b", ["City"("Hamburg"), "Country"("Germany")])

But how then can you overwrite data, e.g. change the name of this contact? If you just issue a PutAttributes() with the new name, the name will be added as a second value to the existing attribute. To overwrite you need to add a replace-flag to an attribute (I´ll denote it with a "!" after the attribute name):

PutAttributes("contacts", "123", ["Name"!("Paul")])

Replacing an attribute like this deletes all (!) existing attribute values and replaces them with the new value.
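To make the difference between adding and replacing tangible, here is a minimal C# sketch of these semantics for a single item. It is just an in-memory stand-in I made up for illustration (the class ItemSketch and its methods are hypothetical); it is neither Amazon´s nor NSimpleDB´s implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for one item; not the SimpleDB or NSimpleDB implementation.
public class ItemSketch
{
    // attribute name -> all values stored under that name
    private readonly Dictionary<string, List<string>> attributes =
        new Dictionary<string, List<string>>();

    // Without the replace flag the value is added to the existing values.
    // With the replace flag all existing values of the attribute are dropped first.
    public void Put(string name, string value, bool replace)
    {
        List<string> values;
        if (!attributes.TryGetValue(name, out values))
        {
            values = new List<string>();
            attributes[name] = values;
        }
        if (replace)
            values.Clear();
        values.Add(value);
    }

    // Returns all values of an attribute, or an empty array if the attribute is unknown.
    public string[] Get(string name)
    {
        List<string> values;
        return attributes.TryGetValue(name, out values) ? values.ToArray() : new string[0];
    }
}
```

Putting "Addresses" twice without the flag leaves two values in the attribute, whereas putting "Name" with the flag leaves just the new name.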

A word of caution: Amazon´s SimpleDB is supposed to scale. That´s why they distribute it across many servers and need to replicate data all the time. That in turn means, it will take some time until changes you made by PutAttributes() and the other operations ripple through to all relevant servers. So don´t expect to see changes right after you applied them! Otherwise, if you issue a PutAttributes() followed right away by a GetAttributes() for the same data - this could run on a different thread - you might be in for a surprise.

Reading data

Reading items back from the SimpleDB "data space" is even easier than writing them. Just send the GetAttributes() command addressing an item in a domain and pass the names of the attributes to retrieve:

GetAttributes("contacts", "123", "Name")

will return ["Name"("Paul")]. Of course you can specify more attributes to be retrieved. And since you only state their name, they´ll be returned with all their values.

Item data can only be retrieved like this! Queries (see below) just return item names, but no attributes. Think of them as SQL statements like this:

select attributeName1, attributeName2, ... from domainName where itemName="..."

Looking up data thus always is a two step process: 1. Issue query and receive a list of matching items, 2. retrieve item´s attributes with an item name from the query result.

Deleting data

You can´t delete items explicitly. You can only delete attributes from them - and if none are left in the item, the item is deleted automatically.

DeleteAttributes("contacts", "123", "Addresses")

only deletes the references to the other items, but the contact item remains in the domain. You also need to delete its name attribute, plus, of course, the parentless addresses:

DeleteAttributes("contacts", "123", "Name")
DeleteAttributes("contacts", "a", "City", "Country")
DeleteAttributes("contacts", "b", "City", "Country")

Creating a domain

Working with domains as the containers for items is easy. You can create a domain at any time. Just call

CreateDomain("contacts")

and that´s it. Just pass in a unique domain name. From then on, you can use this domain name in item-operations.

Deleting a domain

Deleting a domain is as easy as creating it:

DeleteDomain("contacts")

The items and attributes in that domain will be gone then. But this might take up to 10 seconds, Amazon says, due to the distributed nature of SimpleDB.

Querying domains

If you want to get an overview of the domains in your SimpleDB "data space", just call ListDomains():

ListDomains(10, &nextToken)

It returns a list of domain names. This resultset is paged, though. The first parameter to ListDomains() specifies the size of these pages, e.g. 10 domain names per page, the second parameter is a token you can use to retrieve the next page.

Passing in a token to ListDomains() returns that page´s domain names and sets the token to the next page, if there is any.

nextToken = ""
domainNames = ListDomains(10, &nextToken)
// process first page of domain names
domainNames = ListDomains(10, &nextToken)
// process second page of domain names
...
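The token mechanics can be sketched in C# like this. Please take it as an illustration only: the class PagingSketch is made up, the list of names lives in memory, and I simply assume that the token encodes the start offset of the next page and becomes null after the last page. Amazon´s real tokens are opaque to the client.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory sketch of token based paging; not Amazon's implementation.
public static class PagingSketch
{
    // Returns up to pageSize names starting at the position encoded in nextToken.
    // Afterwards nextToken points to the following page, or is null if there is none.
    // Assumption: the token is just the start offset written as a string.
    public static string[] ListPage(IList<string> all, int pageSize, ref string nextToken)
    {
        int start = string.IsNullOrEmpty(nextToken) ? 0 : int.Parse(nextToken);
        int count = Math.Max(0, Math.Min(pageSize, all.Count - start));
        string[] page = new string[count];
        for (int i = 0; i < count; i++)
            page[i] = all[start + i];
        nextToken = start + count < all.Count ? (start + count).ToString() : null;
        return page;
    }

    // Collects all pages by calling ListPage() until the token runs out.
    public static List<string> ListAll(IList<string> all, int pageSize)
    {
        List<string> result = new List<string>();
        string token = null;
        do
        {
            result.AddRange(ListPage(all, pageSize, ref token));
        } while (token != null);
        return result;
    }
}
```

The caller never interprets the token. It just hands it back on the next call until no further page is announced.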

Querying data

Finally, there is also a way to query items. SimpleDB sports a simple query language. You can think of the queries as the where-clause of a SQL select statement, e.g.

select itemName from domainName where simpleDB-query

Queries are limited to a single domain and return just item names as paged resultsets like ListDomains().

The building blocks of queries are predicates. A predicate is a logical expression made up of attribute comparisons, e.g.

['City' = 'Hamburg' OR 'City' = 'London']

Both attribute name and attribute value need to be put in single quotes. SimpleDB sports the usual comparison operators like =, != etc. and a STARTS-WITH which resembles the SQL like, e.g. like 'A%'.

['Name' STARTS-WITH 'A']

Remember, all comparisons are alphanumeric, since SimpleDB only stores texts.

The logical operators within predicates are AND, OR and NOT.

You may only query for a single attribute name with one predicate! ['City'='Hamburg' OR 'City'='London'] is ok, but not ['Name'='Peter' AND 'City'='London']!

To state queries on attributes with different names, you need separate predicates for each:

['Name'='Peter'] INTERSECTION ['City'='London']

The set-operations to combine the resultsets of each predicate into one are INTERSECTION, UNION and NOT. INTERSECTION calculates the common set of item names of two predicates, UNION merges the item name sets of two predicates. INTERSECTION thus works like the logical AND operator, UNION like the OR.

Why does Amazon deviate like this from the well established SQL way of defining queries? The reason probably lies with the internal structure of the SimpleDB "data space". Grouping the constraints on attributes with the same name probably makes query execution faster. Maybe SimpleDB is based on a column store?
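Whatever the internal reason, the effect of the set operators is easy to picture: each predicate yields a set of item names, and the operators combine those sets. Here is a small C# sketch of that idea. It is purely illustrative: the class SetOperatorSketch and the nested dictionary used as a domain are made up by me, only equality predicates are covered, and nothing here says how SimpleDB evaluates queries internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory sketch; not how SimpleDB or NSimpleDB evaluate queries.
public static class SetOperatorSketch
{
    // Evaluates one equality predicate such as ['City' = 'London'] against a domain,
    // modelled as item name -> attribute name -> values. Returns the matching item names.
    public static HashSet<string> Match(
        Dictionary<string, Dictionary<string, string[]>> domain,
        string attributeName, string attributeValue)
    {
        HashSet<string> itemNames = new HashSet<string>();
        foreach (KeyValuePair<string, Dictionary<string, string[]>> item in domain)
        {
            string[] values;
            if (item.Value.TryGetValue(attributeName, out values)
                && Array.IndexOf(values, attributeValue) >= 0)
            {
                itemNames.Add(item.Key);
            }
        }
        return itemNames;
    }

    // Item names contained in both resultsets (works like a logical AND).
    public static HashSet<string> Intersection(HashSet<string> left, HashSet<string> right)
    {
        HashSet<string> result = new HashSet<string>(left);
        result.IntersectWith(right);
        return result;
    }

    // Item names contained in at least one resultset (works like a logical OR).
    public static HashSet<string> Union(HashSet<string> left, HashSet<string> right)
    {
        HashSet<string> result = new HashSet<string>(left);
        result.UnionWith(right);
        return result;
    }
}
```

Combining Match(domain, "Name", "Peter") and Match(domain, "City", "London") with Intersection() then returns only the names of items that fulfill both conditions.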

EBNF SimpleDB query syntax

Query ::= ItemSetTerm { "UNION" ItemSetTerm }.

ItemSetTerm ::= ItemSetFactor { "INTERSECTION" ItemSetFactor }.
ItemSetFactor ::= [ "NOT" ] "[" PredicateExpression "]".

PredicateExpression ::= PredicateTerm { "OR" PredicateTerm }.
PredicateTerm ::= PredicateFactor { "AND" PredicateFactor }.
PredicateFactor ::= [ "NOT" ] PredicateComparison.
PredicateComparison ::= AttributeName ComparisonOperator AttributeValue.

AttributeName ::= Chars enclosed in single quotes, e.g. 'Name'.
                  All AttributeNames in a PredicateExpression need to be the same.
                  All quotes in AttributeName need to be properly escaped.
AttributeValue ::= Chars enclosed in single quotes, e.g. '003.14'.
                   All quotes in AttributeValue need to be properly escaped.

ComparisonOperator ::= "=" | "!=" | ">" | ">=" | "<" | "<=" | "STARTS-WITH".

What´s missing?

SimpleDB´s API is simple. That´s the beauty of it. A simple, dynamic data model plus a simple API sounds like a powerful combination for today´s fast moving software business.

But this simplicity comes at a price. Common operations like looking up data are more cumbersome than with SQL. It´s a two step process, because SimpleDB´s queries return just item names. Also, transactions are currently missing completely.

Another aspect to get used to is the "eventual consistency" model, that means, changes take time to ripple through to all replicas of your data. Thus after a change there might be a short time where different clients might see the "data space" in a different state.

But overall, Amazon´s effort is very exciting nevertheless.

What´s next?

I find SimpleDB so exciting that I wanted to be able to use it now and on my desktop. But there is no desktop/local version of SimpleDB and I don´t know when Amazon will grant me a test account for SimpleDB.

That´s why I sat down and developed my own Open Source version of SimpleDB: the .NET SimpleDB or NSimpleDB for short. I believe in the growing importance of tuple spaces in general and thus also am working with the University Vienna on bringing this paradigm to the hands of .NET developers. We call the basic technology "XVSM" for "eXtensible Virtual Shared Memory"; and it´s somewhat like SimpleDB. But on top we place more elaborate data structures, so our space is not just partitioned into domains but into collections and other high level data structures. We envision them to allow for true "Space Based Collaboration" (SBC), which is in our view the foundation for "serverless real-time online collaboration". But I digress.

Back to SimpleDB: In my next posting I´ll show you how you can use SimpleDB or the C# implementation of the SimpleDB API in your applications today without reliance on Amazon.

Have you heard about Amazon´s online "database service" SimpleDB? They describe it like this: "Amazon SimpleDB is a web service for running queries on structured data in real time." So it´s not an RDBMS, because Amazon does not call the data "relational", but just "structured". And you use a web service based API to access the data, not good old ADO.NET. Currently SimpleDB is in beta. You can get a test account to play around with it - if you´re patient. As of this writing (Jan 08) evaluation is limited; you need to apply and queue up to be assigned a test account. I applied about 2 weeks ago, but haven´t heard from Amazon since then.

But why should you care? Well, SimpleDB would allow you to store data in a database without any setup costs. You don´t have to care about backup or moving to another ISP. Your data, lots of data, can just stay with Amazon. Just add a web service proxy to your web (or desktop) application and off you go. This certainly makes some (or a lot, or at least a growing number of) applications easier to implement.

Another reason to care about Amazon´s SimpleDB is its simplicity. In an age where dynamic programming becomes ever more popular and static whatever (e.g. typing, binding) loses value, making persistence more dynamic sure should look attractive. But exactly this is what Amazon´s SimpleDB is about: highly dynamic persistence of structured data.

SimpleDB data model

With SimpleDB you don´t define a database schema anymore. Your "data space" with Amazon is structured in a very simple way: it´s divided into sub-spaces called "domains" which each contain so called "items" which each contain so called "attributes". That´s it. And you can change the structure of this "data space" at any time. There is no distinction between meta data and data. Creating a domain (which resembles a table in a relational database) is a web service operation like storing an item in a domain.

image 

To make it very clear: You divide your "data space" into domains at your leisure. (Amazon currently just artificially limits the number of domains to 100.) And you stuff items of any structure into these domains. You never define a schema for a domain. The items stored in a domain don´t have to look the same. They can contain any number of attributes; all can differ in their number of attributes.

Attributes are name-value-pairs. So items are tuples of arbitrary arity. That means SimpleDB is not a relational database, but a tuple space. Just throw items/tuples into your SimpleDB instance at your leisure. That´s all there is to SimpleDB persistence. If you like, separate tuples into different domains - but whether you do it or not does not make a big difference. For distinguishing between, say, customers and invoices that´s not necessary. It might even be counterproductive, since querying items is limited to one domain at a time. There is no such thing as a SQL Join.

The use of domains

So why are there domains at all? Probably they help Amazon to make replication of items between servers easier. And it might speed up queries if you distribute your data across domains. So think of domains as easy to set up data partitions in case you have to deal with huge amounts of data.

Multi-valued attributes

But not only don´t you have to define a schema for a domain and all items/tuples can have a different structure, there´s another deviation from relational thinking: Attributes can have multiple values! So items don´t even comply with the relational first normal form. See the "Phone" attribute in the following item:

 image

It´s not just several phone numbers separated by commas. No! The "Phone" attribute is really structured. You can retrieve (and query for) each phone number separately. SimpleDB would return the item like this:

image

Think of what this means: Finally you can set up "natural references" between persistent data like in memory. A parent object points to its children. But when you persist these objects in a relational database, you usually invert the references. The child records will contain a foreign key to denote their parent record.

But with SimpleDB you can let parent items point to their child tuples:

image

See how all children are referenced with their id values from their parents? See how the number of attributes differ across the items? That´s all just fine with SimpleDB.

Just text

A drawback of SimpleDB might be its limitation to text. All attribute values are stored as just text. So comparison is alphanumeric and leads to effects like this: 20 > 100, because in fact the comparison is "20" > "100". So be sure to take this into account when storing non-text values like numbers or dates. Pad numbers with leading zeros (e.g. store "00012" instead of "12") and use a sortable date format (e.g. "2008-01-18"). If you expect to store negative numbers, move them into the positive range of numbers, e.g. instead of "-12" and "12" store "00" and "24" if you expect the value range to start from -12.
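Such an encoding is quickly wrapped into a small helper. The following C# sketch shows one way to do it. The class SortableNumber is hypothetical and not part of SimpleDB or NSimpleDB, and the lower bound of the value range as well as the padding width are parameters you would have to choose for your own data.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of SimpleDB or NSimpleDB.
public static class SortableNumber
{
    // Shifts the value so that minValue becomes 0, then pads it with leading zeros
    // to a fixed width. Texts of equal width sort like the numbers they represent.
    public static string Encode(int value, int minValue, int width)
    {
        if (value < minValue)
            throw new ArgumentOutOfRangeException("value");
        long shifted = (long)value - minValue;
        string text = shifted.ToString("D" + width, CultureInfo.InvariantCulture);
        if (text.Length > width)
            throw new ArgumentOutOfRangeException("width");
        return text;
    }

    // Reverses Encode(): parses the text and shifts the value back.
    public static int Decode(string text, int minValue)
    {
        return (int)(long.Parse(text, CultureInfo.InvariantCulture) + minValue);
    }
}
```

With a width of 5, the values 20 and 100 are stored as "00020" and "00100", so the alphanumeric comparison puts them in the right order again. With a lower bound of -12, the values -12 and 12 become "00000" and "00024".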

On the other hand SimpleDB in this regard does not differ from XML. Text simply is the least common denominator for storing data. Also, this makes SimpleDB more efficient, since it can be optimized for handling text (e.g. in terms of indices).

What´s next?

That´s pretty much all there is to say about SimpleDB´s data model. It´s simple. It´s dynamic.

In my next posting I´ll introduce you to the SimpleDB API. It´s simple, too. Just a couple of easy operations.

But if you want to move forward more quickly, have a look at the SimpleDB documentation on the Amazon website. You can also try out my Open Source implementation of the SimpleDB data model and API. It´s called NSimpleDB and is hosted with Google. More about this too in a future posting.

When you write more complex code that you cannot really step through during debugging, it´s helpful to stud it with statements tracing the execution flow. The .NET Framework provides the System.Diagnostics namespace for this purpose. But whenever I just quickly wanted to use it, it turned out to be a hassle to get the tracing running properly. That´s why I wrote down the following, to make it easier next time.

How to instrument the code?

In the code set up different System.Diagnostics.TraceSource objects. For each area to watch define a trace source name, e.g. "BusinessLogic" or "Validation" or "TextFileAdapter". Each such trace source can later be switched on/off separately.

using System.Diagnostics;
...
TraceSource ts;
ts = new TraceSource("HelloWorld", SourceLevels.All);

A TraceSource object can be instantiated for just one method or can be kept around for a long time as a global (or even static) reference.

To trace the execution use one of the TraceXYZ() methods of TraceSource. Most of the times one of the following will do. TraceInformation() writes an informational message, i.e. with TraceEventType Information:

ts.TraceInformation("# of records processed: {0}", n);

That´s the same as this, except that TraceInformation() always uses the event id 0:

ts.TraceEvent(TraceEventType.Information, 10, "# of records processed: {0}", n);

But TraceEvent() can do more. With it you can issue messages on different levels, e.g. just informational messages, warnings, error messages. The id (10 above) will show up in the event log in its own column ("Event").

Since tracing can sometimes produce an overwhelming amount of messages, you can filter them. For example you can restrict the output of messages to errors only. But you should not do so in your code. That´s why you should pass SourceLevels.All to the TraceSource ctor. (Without setting a source level explicitly, the default is Off, so you´d see no messages at all!)

However, if you want to limit the tracing imperatively to certain levels of messages, you can do so by passing a combination of levels to the ctor, e.g.

ts = new TraceSource("HelloWorld", SourceLevels.Critical | SourceLevels.ActivityTracing);

How to attach tracing to different message sinks?

Tracing messages are written to any sink attached to the tracing infrastructure. A sink can be a text file or the console or an event log. Each sink is represented by a TraceListener object. A trace source can have any number of listeners listening for messages and writing them to their sinks.

Listeners can be attached to trace sources imperatively in your code - or using the App.Config. I prefer the App.Config, because it allows you to add and remove listeners without touching your code.

But as long as you´re satisfied with tracing messages being sent to the debug output window of VS.NET, you don´t need an App.Config at all. There is a default listener attached to each trace source.
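For completeness, here is a short sketch of the imperative way, i.e. attaching a listener in code instead of in the App.Config. The class TraceSketch and the method TraceToString() are made up for this example; the sink is an in-memory StringWriter so that you can look at what the listener has written. The sketch assumes the TRACE compilation symbol is defined, as it usually is in Visual Studio projects. Without it the compiler drops the tracing calls.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper for illustration only.
public static class TraceSketch
{
    // Traces one informational message to an in-memory sink and returns
    // the text the listener has written to it.
    // Assumption: the TRACE symbol is defined, otherwise the call is compiled away.
    public static string TraceToString(string message)
    {
        StringWriter sink = new StringWriter();
        TraceSource ts = new TraceSource("HelloWorld", SourceLevels.All);

        ts.Listeners.Remove("Default");                       // detach the debug output listener
        ts.Listeners.Add(new TextWriterTraceListener(sink));  // attach our own sink

        ts.TraceInformation(message);
        ts.Flush();                                           // make sure the listener has written everything
        return sink.ToString();
    }
}
```

The returned text contains the message, prefixed by the listener with the source name, the event type and the event id.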

If you want to direct the message to other sinks, though, or filter them, then you need to tweak the App.Config. Here is the most simple trace source defined in the App.Config - but its sink still is the debug output.

<configuration>
  <system.diagnostics>
    <sources>
      <source name="HelloWorld">
      </source>
    </sources>
  </system.diagnostics>
</configuration>

The name needs to match the name passed to a TraceSource object in your code.

Now, you can add listeners to this source element to have the source´s tracing messages sent to different sinks:

<source name="HelloWorld">
  <listeners>
    <add name="consoleListener"
        type="System.Diagnostics.ConsoleTraceListener">
    </add>
  </listeners>
</source>

This listener´s sink is the console window. By adding it to the source all tracing messages are sent to the console window - plus the debug output window. The default listener is still attached to the source. If you want messages to go to just the sinks you define, remove the default listener:

<source name="HelloWorld">
  <listeners>
    <remove name="Default"/>

    <add name="consoleListener"
        type="System.Diagnostics.ConsoleTraceListener">
    </add>
  </listeners>
</source>

Now you can add any number and kind of TraceListener objects within your reach. Here are the ones provided by the .NET Framework:

<system.diagnostics>
  <sources>
    <source name="HelloWorld">
      <listeners>
        <remove name="Default"/>

        <add name="consoleListener"
            type="System.Diagnostics.ConsoleTraceListener"/>

        <add name="eventlogListener"
            type="System.Diagnostics.EventLogTraceListener"
            initializeData="Application"/>

        <add name="textfileListener"
            type="System.Diagnostics.TextWriterTraceListener"
            initializeData="helloworld.log" />

        <add name="xmlfileListener"
            type="System.Diagnostics.XmlWriterTraceListener"
            initializeData="helloworld.xml" />

        <add name="defaultListener"
            type="System.Diagnostics.DefaultTraceListener"/>
      </listeners>
    </source>
  </sources>
</system.diagnostics>

What the sinks of the listeners are, should be pretty obvious from their names. The DefaultTraceListener is the one writing to debug output, though.

Three of the listeners need to be primed with additional data specifying the message sink. Pass a filename to the XmlWriterTraceListener and the TextWriterTraceListener. Pass an event log name to the EventLogTraceListener. Use the attribute initializeData for these parameters.

Please note: The TextWriterTraceListener works best if you set the autoflush attribute of the <trace/> element to true:

<system.diagnostics>