InfoQ posted a very interesting video of the ‘Democratizing the Cloud’ presentation that Erik Meijer gave at QCon.
Democratizing the cloud means “making it easier to program distributed applications”. He wants us to build a single-tier app and then deploy it across multiple tiers.
He describes an IL to JavaScript compiler (like Script#) and a MapReduce implementation for LINQ. You can then write C# code and decide later to run it in the browser, or specify a query that today executes over a SQL database and, once your application is successful and used by millions of users, scale it over a cluster of distributed application servers using MapReduce.
Of course, we’d love to have that.
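To make the "same query, different execution strategy" idea concrete, here is a minimal sketch. It is not Meijer's implementation and has nothing to do with the real LINQ API: `run_local`, `run_map_reduce`, and their parameters are hypothetical names, and the "cluster" is simulated in-process as a list of partitions. It also assumes the `combine` function is associative and commutative, which is what lets partial results be merged in any order.

```python
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

Row = TypeVar("Row")
Val = TypeVar("Val")


def run_local(
    rows: Iterable[Row],
    key: Callable[[Row], Hashable],
    value: Callable[[Row], Val],
    combine: Callable[[Val, Val], Val],
) -> Dict[Hashable, Val]:
    """Run a grouped aggregation over a single in-memory source."""
    out: Dict[Hashable, Val] = {}
    for row in rows:
        k, v = key(row), value(row)
        out[k] = combine(out[k], v) if k in out else v
    return out


def run_map_reduce(
    partitions: List[Iterable[Row]],
    key: Callable[[Row], Hashable],
    value: Callable[[Row], Val],
    combine: Callable[[Val, Val], Val],
) -> Dict[Hashable, Val]:
    """Run the same query spec over several partitions (simulated nodes)."""
    # Map phase: every partition is aggregated independently.
    partials = [run_local(p, key, value, combine) for p in partitions]
    # Reduce phase: merge the partial results key by key.
    merged: Dict[Hashable, Val] = {}
    for partial in partials:
        for k, v in partial.items():
            merged[k] = combine(merged[k], v) if k in merged else v
    return merged
```

The point is that the caller passes the same `key`, `value`, and `combine` functions to either runner, so the decision about where the query runs is made outside the query itself.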
The only element that I don’t find consistent with his vision is the ‘refactoring’ he needs to split the application into tiers. Up to that point, all the magic is performed by the compilers and runtimes, but this step requires different code, so once the decision is made, it cannot be changed. He talks about ‘making irreversible decisions at the last responsible moment’. The problem is agreeing on when that moment is ;). Things like ‘code webservices as if they are stateful’ are probably heresy for some people.
This kind of functionality has only been provided by code generation tools, where you generate multiple tiers or single tier applications from the same specification, but never at the language level.
He also has some interesting opinions on SQL and DSLs.
I've started reading a lot about UI in the last couple of months. During the Office 2007 beta process I ran into some blog posts from Jensen Harris that I found interesting, but I never followed his blog.
Now I found a post that aggregates all the content about the Office 2007 UI, and it's a fascinating read.
I had an ambivalent relationship with the Ribbon but after reading about it I love it ;).
Pat Helland has been talking about immutable data for a while. His latest post, 'Normalization is for sissies', is quite fun. A not-very-accurate post from Dare reminded me of it and pushed me to post this.
Pat is playing with two ideas.
One is that immutable data should not be normalized, as normalization is designed to help you deal with updates.
Another is that you don't actually need to delete or update rows in the database. 'Deleting' a row means setting InvalidationTimestamp = now(), and updating a row means setting InvalidationTimestamp = now() and inserting a new row with SinceTimestamp = now() and InvalidationTimestamp = null (you actually need two sets of dates, but that's for another post).
Now, if you put the two ideas together, all the data is immutable, so you don't need to normalize anything. This means each record holds the whole 'extended table': the fields of the 'base table' plus all the fields from related tables in your normalized model. If you have Orders, Customers, and Countries, your tables will look like this:
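The delete/update semantics described above can be sketched with an in-memory SQLite database. This is only an illustration under assumptions: the `Customer` table layout and the helper names (`create_schema`, `insert_customer`, `update_customer`, `delete_customer`, `current_customers`) are hypothetical, and it uses a single pair of timestamps rather than the two sets mentioned above. Note that the only field ever written after a row is inserted is InvalidationTimestamp.

```python
import sqlite3
from datetime import datetime, timezone


def now() -> str:
    """Current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE Customer (
               CustomerId            INTEGER NOT NULL,
               CustomerName          TEXT    NOT NULL,
               SinceTimestamp        TEXT    NOT NULL,
               InvalidationTimestamp TEXT
           )"""
    )


def _invalidate(conn: sqlite3.Connection, customer_id: int, ts: str) -> None:
    # Close the currently valid version instead of removing it.
    conn.execute(
        "UPDATE Customer SET InvalidationTimestamp = ? "
        "WHERE CustomerId = ? AND InvalidationTimestamp IS NULL",
        (ts, customer_id),
    )


def insert_customer(conn: sqlite3.Connection, customer_id: int, name: str) -> None:
    conn.execute(
        "INSERT INTO Customer VALUES (?, ?, ?, NULL)", (customer_id, name, now())
    )


def delete_customer(conn: sqlite3.Connection, customer_id: int) -> None:
    # 'Deleting' only stamps the row; no data is physically removed.
    _invalidate(conn, customer_id, now())


def update_customer(conn: sqlite3.Connection, customer_id: int, name: str) -> None:
    # 'Updating' closes the old version and inserts a new one with the same timestamp.
    ts = now()
    _invalidate(conn, customer_id, ts)
    conn.execute(
        "INSERT INTO Customer VALUES (?, ?, ?, NULL)", (customer_id, name, ts)
    )


def current_customers(conn: sqlite3.Connection) -> list:
    # The 'live' view is simply the rows that have not been invalidated.
    return conn.execute(
        "SELECT CustomerId, CustomerName FROM Customer "
        "WHERE InvalidationTimestamp IS NULL ORDER BY CustomerId"
    ).fetchall()
```

After an insert followed by an update, `current_customers` returns only the newest version, while the table itself still holds both rows, so the full history stays queryable.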
Order: OrderId, OrderDate, CustomerId, CustomerName, CountryId, CountryName, SinceTimestamp, InvalidationTimestamp
Customer: CustomerId, CustomerName, CountryId, CountryName, SinceTimestamp, InvalidationTimestamp
Country: CountryId, CountryName, SinceTimestamp, InvalidationTimestamp
You will be wasting a lot of disk space, but that's not something to worry about. The advantages of this approach are significant: you don't need joins, and you can cache or replicate most of your data.
The main physical issue I see with this approach today is that database engines limit the number of columns a table can have, and an approach like this one requires a large number of columns per table.
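As a rough sketch of what writing and reading such 'extended' rows could look like, the following uses frozen dataclasses as stand-ins for the three tables. The field names follow the column lists above; the helper functions `make_customer` and `make_order` and the sample values are hypothetical. The related fields are copied into the row at write time, which is why no join is needed at read time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Country:
    CountryId: int
    CountryName: str
    SinceTimestamp: datetime
    InvalidationTimestamp: Optional[datetime] = None


@dataclass(frozen=True)
class Customer:
    CustomerId: int
    CustomerName: str
    CountryId: int
    CountryName: str
    SinceTimestamp: datetime
    InvalidationTimestamp: Optional[datetime] = None


@dataclass(frozen=True)
class Order:
    OrderId: int
    OrderDate: datetime
    CustomerId: int
    CustomerName: str
    CountryId: int
    CountryName: str
    SinceTimestamp: datetime
    InvalidationTimestamp: Optional[datetime] = None


def make_customer(customer_id: int, name: str, country: Country, at: datetime) -> Customer:
    # Copy the country fields into the customer row at write time.
    return Customer(customer_id, name, country.CountryId, country.CountryName, at)


def make_order(order_id: int, order_date: datetime, customer: Customer, at: datetime) -> Order:
    # Copy the customer and country fields into the order row at write time.
    return Order(
        order_id,
        order_date,
        customer.CustomerId,
        customer.CustomerName,
        customer.CountryId,
        customer.CountryName,
        at,
    )
```

Reading `order.CountryName` is then a plain field access. The trade-off is visible too: if the customer is later renamed, the existing order row keeps the name that was valid when the order was written.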
I wonder how a model like this will impact O/R mapping tools.
They can probably hide this kind of schema automatically by changing the semantics of delete/update. Writing Order.Customer.Name should return the name stored in the Order row.
How would they handle object identity? Today, if I have 'Customer #1' in memory, every reference to Customer #1 points to the same instance, because the object model is normalized. With this schema, those references should point to different read-only instances.
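One way a mapper might expose Order.Customer.Name over a denormalized row is sketched below. This is an assumption about how such a tool could work, not a description of any existing O/R mapper; `OrderRow` and `CustomerView` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerView:
    """Read-only view of the customer fields stored inside an order row."""
    id: int
    name: str


@dataclass(frozen=True)
class OrderRow:
    """One denormalized Order row (only the columns needed for the example)."""
    order_id: int
    customer_id: int
    customer_name: str

    @property
    def customer(self) -> CustomerView:
        # No join and no lookup: the view is built from this row's own columns.
        return CustomerView(self.customer_id, self.customer_name)
```

With this shape, `order.customer.name` reads like a navigation over a normalized object model, but it only ever returns the name that was copied into that particular order row.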
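The identity question can be illustrated with a small sketch. The `IdentityMap` class below is a simplified stand-in for what O/R mappers do in a normalized model, and `CustomerSnapshot` is a hypothetical read-only instance taken from a denormalized row; neither is taken from a real library.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Customer:
    """Mutable entity in a normalized object model."""
    customer_id: int
    name: str


class IdentityMap:
    """Hands out exactly one in-memory instance per customer id."""

    def __init__(self) -> None:
        self._by_id: Dict[int, Customer] = {}

    def get(self, customer_id: int, name: str) -> Customer:
        # Reuse the existing instance if this id was already loaded.
        if customer_id not in self._by_id:
            self._by_id[customer_id] = Customer(customer_id, name)
        return self._by_id[customer_id]


@dataclass(frozen=True)
class CustomerSnapshot:
    """Read-only copy of the customer fields as stored in one row."""
    customer_id: int
    name: str
```

In the normalized case, two calls to `get` for the same id return the very same object, so a change made through one reference is seen through the other. In the denormalized case, two orders written at different times may carry `CustomerSnapshot(1, "Acme")` and `CustomerSnapshot(1, "Acme Corp")`: same customer id, but distinct, unequal, immutable instances.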