Spans in ObjectSpaces are not enough - Proposal for sparse population of persistent objects
Matt Warren has provided a look behind the scenes of how features of ObjectSpaces (OS) come into existence in his blog entry "ObjectSpaces: Spanning the Matrix". The entry and its comments are an interesting read because they show how technology features depend on the individual people who advocate for them, and how Microsoft watches the market of competing products and the needs of developers. Good to know that in the end it's all just humans at Microsoft :-)
Concerning the feature in question - spans in OS - I of course like them. Passing span information to a query is a very good idea. But spans only describe how deep you dig into the graph of objects for a given use case.
What is lacking, and what I also find necessary, is a way to describe how much data should be loaded for each object in the retrieved graph!
Example:
Customers -> Orders -> Items
Scenario 1: List Customers and Orders.
Scenario 2: Edit Order.
The object graph is static. The relationships between the tables/classes do not change.
But the usage of the classes is different.
In scenario 1 I'd like to load maybe only Customers(id, name, city, zip, contactname) and Orders(id, customerid, orderno, amount).
In scenario 2, though, I'd need to load Orders(id, customerid, orderno, amount, order date, shipping date, shipping info, etc.) and Items(id, orderid, qty, price, description, etc.), plus Customers(id, name, city).
OS' spans solve the problem of defining whether to load Orders or Items at all when running a query.
But they don't solve the problem that each object needs a different amount of data populated depending on the use case. OS only offers all or nothing with delayed loading. That's not enough!
In scenario 1 I neither want a "hollow object" for each Order, nor do I want the complete object. I just want enough properties populated to be able to show a list of orders (and customers) without additional database roundtrips.
I haven't found this feature in any O/R mapping tool yet, and I don't know why. (But I'm open to enlightenment from any O/R mapping tool manufacturer.)
It seems so obvious to me. I implemented it once in my own O/R mapping tool back in 2000 with ADO - and it was very (!) convenient to use. Unfortunately, I haven't had the time since then to redo it in the .NET world :-(
So my suggestion for OS would be: Allow for groups of persistent object properties. Here's some pseudo code:
Class Customer
    id (*)
    companyname (*)
    city (list, edit, phonelist)
    zip (list)
    phone (edit, phonelist)
    orders
End Class
Class Order
    id (*)
    customerid (*)
    orderno (list, edit)
    orderdate (edit)
    items
End Class
Fields marked (*), such as "id" and "companyname", are always retrieved. "city" is retrieved when the group "list", "phonelist" or "edit" is requested, "zip" only for group "list", and "phone" only for "edit" and "phonelist".
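To make the group rules concrete, here is a minimal sketch in Python. It is not ObjectSpaces code: the FIELD_GROUPS table and the fields_for function are names I made up for illustration, and the table simply mirrors the pseudo code above.

```python
# Hypothetical field-group table mirroring the pseudo code above.
# "*" marks fields that are always retrieved.
FIELD_GROUPS = {
    "Customer": {
        "id": {"*"},
        "companyname": {"*"},
        "city": {"list", "edit", "phonelist"},
        "zip": {"list"},
        "phone": {"edit", "phonelist"},
    },
    "Order": {
        "id": {"*"},
        "customerid": {"*"},
        "orderno": {"list", "edit"},
        "orderdate": {"edit"},
    },
}


def fields_for(class_name, group):
    """Return the fields to load for a class when a group is requested."""
    fields = FIELD_GROUPS[class_name]
    return [name for name, groups in fields.items()
            if "*" in groups or group in groups]


print(fields_for("Customer", "list"))
print(fields_for("Customer", "phonelist"))
print(fields_for("Order", "edit"))
```

A mapping tool could turn each of these lists directly into the column list of a SELECT statement.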
When retrieving objects you could annotate the spans with group info, e.g.
GetObjectSet(gettype(Customer), "companyname like 'a%'", "list", "orders(list)")
Each level in the retrieved object graph would then contain enough information for the current use case, or at least for maybe 95% of it. Each object would only be sparsely populated. But if code accessing the objects in the graph wants to read a property that was not retrieved, OS could transparently go back to the database and load the missing data.
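How the group-annotated spans in such a call might be taken apart can be sketched like this. Again this is only an illustration in Python under my own assumptions: parse_span and load_plan are hypothetical helpers, and the "relation(group)" syntax is just the one used in the GetObjectSet example above.

```python
import re


def parse_span(span):
    """Split a span such as 'orders(list)' into (relation, group)."""
    match = re.fullmatch(r"\s*(\w+)\s*(?:\(\s*(\w+)\s*\))?\s*", span)
    if match is None:
        raise ValueError(f"malformed span: {span!r}")
    relation, group = match.groups()
    return relation, group  # group is None when the span names no group


def load_plan(root_group, *spans):
    """Map each level of the object graph to the field group requested for it."""
    plan = {"": root_group}  # "" stands for the root class of the query
    for span in spans:
        relation, group = parse_span(span)
        plan[relation] = group
    return plan


# Corresponds to: GetObjectSet(gettype(Customer), "...", "list", "orders(list)")
print(load_plan("list", "orders(list)"))
```

A span without a group, e.g. "items", yields None here. Whether that should mean "load everything" or "load only the always-retrieved fields" is a design decision this sketch leaves open.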
Advantages of field groups:
-no roundtrips: all data needed for the usual processing within the context of a use case is present. No additional roundtrips are needed - most of the time.
-dynamic: the data needed is specified at runtime, where it's needed.
-transparent: if data is missing, it is transparently retrieved by OS. Performance would depend on how well field groups are designed and used.
Disadvantage: Even though I think this feature is necessary, I doubt that OS can implement it easily. It would require that an object can check on every property/field access whether the data has been loaded - and if not, go back to the database and get it. This would require field access interception. And that would violate a premise of OS: Any class can be made persistent. Field access interception would mean IL code enhancement or at least property methods.
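To show what such interception amounts to, here is a small Python sketch. Python lets a class hook into failed attribute lookups, so no code enhancement is needed there; in .NET the same effect would need property methods or IL rewriting, as said above. SparseObject, its loader callback and the sample row are all hypothetical, and the "database" is just a dictionary.

```python
class SparseObject:
    """Hypothetical base class: loads a missing field on first access."""

    def __init__(self, loader, **loaded):
        # loader is an assumed callback: field name -> value from the database
        self.__dict__["_loader"] = loader
        self.__dict__["_roundtrips"] = 0
        self.__dict__.update(loaded)  # fields populated by the initial query

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails,
        # i.e. when the field has not been populated yet.
        if name.startswith("_"):
            raise AttributeError(name)
        value = self._loader(name)        # simulated roundtrip
        self.__dict__["_roundtrips"] += 1
        self.__dict__[name] = value       # cache so later reads are free
        return value


# Simulated database row; only the "list" fields are loaded up front.
row = {"id": 1, "companyname": "Alfreds", "city": "Berlin", "phone": "030-0074321"}
customer = SparseObject(row.__getitem__, id=1, companyname="Alfreds", city="Berlin")

print(customer.city, customer._roundtrips)   # field was loaded up front
print(customer.phone, customer._roundtrips)  # field was fetched on demand
```

This sketch fetches one field per roundtrip. A real implementation would probably fetch the whole missing group at once, which is exactly where well designed field groups would pay off.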
But then: You can't have your cake and eat it too. If convenience and performance are important, maybe this OS premise should be dropped? Is it important to be able to make any class persistent? I don't think so - as I have stated earlier. I deem it more important to have an easy-to-use programming model.
Microsoft might have thought: "Hey, there are so many relational databases out there. And, hey, people have defined so many classes to represent database entities in their software. To serve them well, we need to provide them with a mapping tool between the existing data models: persistent data model, OO data model."
Sounds plausible to me - from Microsoft's point of view. But the real world is different, I guess. Developers using Microsoft technologies have come up with sophisticated object models much less often than their colleagues in the Java world, for several reasons. Two of them: Microsoft's long-standing advocacy of data binding to generic container data structures (e.g. ADODB.Recordset and ADO.NET DataSet), and VB's lack of OO concepts for years.
So I'd say Microsoft's vision of OS is an answer to a non-existent problem. Of course there are huge numbers of existing relational DBs. But there are not that many existing object models for them. Hence, there is no need for a mapping like the one OS offers. And hence persistent classes could be defined in whatever way makes using them easier in the end.
Which again brings me to graphical modelling tools or domain specific languages for defining persistent classes. With them, implementing field groups would be no problem, because no existing classes would need to be kept and served.