Spans in ObjectSpaces are not enough - Proposal for sparse population of persistent objects

Friday, March 19, 2004

Matt Warren has provided a look behind the scenes of how features of ObjectSpaces (OS) come into existence in his blog entry "ObjectSpaces: Spanning the Matrix". The entry and its comments are an interesting read: they show how technology features depend on individual people who advocate them, and how Microsoft watches the market of competitive products and the needs of developers. Good to know that, in the end, it's all just humans at Microsoft :-)

Concerning the feature in question - spans in OS - I of course like them. Passing span information to a query is a very good idea. But spans only describe how deep you dig into the graph of objects for a given use case.

What is lacking, and what I also find necessary, is a way to describe how much data should be loaded for each object in the retrieved graph!

Example:

Customers -> Orders -> Items

Scenario 1: List Customers and Orders.
Scenario 2: Edit Order.

The object graph is static. The relationships between the tables/classes do not change.

But the usage of the classes is different.

In scenario 1 I'd like to load maybe only Customers(id, name, city, zip, contactname) and Orders(id, customerid, orderno, amount).

In scenario 2, though, I'd need to load Orders(id, customerid, orderno, amount, order date, shipping date, shipping info, etc.) and Items(id, orderid, qty, price, description, etc.), Customers(id, name, city).

OS' spans solve the problem of defining whether to load Orders or Items at all when running a query.

But they don't solve the problem that each object needs a different amount of data populated, depending on the use case. OS only offers all or nothing with delayed loading. That's not enough!

In scenario 1 I neither want a "hollow object" for each Order, nor do I want the complete object. I just want enough properties populated to be able to show a list of orders (and customers) without additional database roundtrips.

I haven't found this feature in any O/R mapping tool yet. And I don't know why. (But I'm open to enlightenment from any O/R mapping tool manufacturer.)

It seems so obvious to me. I implemented it once in my own O/R mapping tool back in 2000 with ADO - and it was very (!) convenient to use. Unfortunately, I have not had the time since then to redo it in the .NET world :-(

So my suggestion for OS would be: Allow for groups of persistent object properties. Here's some pseudo code:

Class Customer
    id (*)
    companyname (*)
    city (list, edit, phonelist)
    zip (list)
    phone (edit, phonelist)
    orders
End Class

Class Order
    id (*)
    customerid (*)
    orderno (list, edit)
    orderdate (edit)
    items
End Class

Fields "id" and "companyname" are always retrieved. "city" is retrieved when group "list", "phonelist" or "edit" is requested, "zip" only for group "list", and "phone" only for "edit" and "phonelist".

When retrieving objects you could annotate the spans with group info, e.g.

GetObjectSet(gettype(Customer), "companyname like 'a%'", "list", "orders(list)")
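
To make the idea a bit more concrete, here is a minimal Python sketch of how a group-annotated query could be turned into a list of fields per class. It is not how OS works. FIELD_GROUPS, RELATIONS, fields_for and plan are hypothetical names made up for this illustration, and the sketch only handles one level of spans written as relation(group).

```python
import re

# Field-group declarations mirroring the pseudo code above; "*" means "always load".
FIELD_GROUPS = {
    "Customer": {
        "id": {"*"},
        "companyname": {"*"},
        "city": {"list", "edit", "phonelist"},
        "zip": {"list"},
        "phone": {"edit", "phonelist"},
    },
    "Order": {
        "id": {"*"},
        "customerid": {"*"},
        "orderno": {"list", "edit"},
        "orderdate": {"edit"},
    },
}

# Relation name -> related class (hypothetical mapping metadata).
RELATIONS = {"Customer": {"orders": "Order"}}


def fields_for(class_name, group):
    """Return the fields to load for one class when `group` is requested."""
    return [
        name
        for name, groups in FIELD_GROUPS[class_name].items()
        if "*" in groups or group in groups
    ]


def plan(root_class, root_group, *spans):
    """Map each class touched by the query to the fields it should load."""
    result = {root_class: fields_for(root_class, root_group)}
    for span in spans:
        match = re.fullmatch(r"(\w+)\((\w+)\)", span)
        if match is None:
            raise ValueError(f"span needs the form relation(group): {span!r}")
        relation, group = match.groups()
        related = RELATIONS[root_class][relation]
        result[related] = fields_for(related, group)
    return result
```

With these declarations, plan("Customer", "list", "orders(list)") selects id, companyname, city and zip for Customer, and id, customerid and orderno for Order. A mapper could build its SELECT column lists from that result.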

Each level in the object graph retrieved would then contain enough information for the current use case, or at least maybe 95% of a use case. Each object would only be sparsely populated. But if, while accessing the objects in the graph, code wants to read a property that was not retrieved, OS could transparently go back and load the missing data.

Advantages of field groups:

- no roundtrips: all data needed for the usual processing within the context of a use case is present. No additional roundtrips are needed - most of the time.
- dynamic: the data needed is specified at runtime, where it's needed.
- transparent: if data is missing, it is transparently retrieved by OS. Performance would depend on how well field groups are designed and used.

Disadvantage: Even though I think this feature is necessary, I doubt that OS can implement it easily. It would require that an object can check, on property/field access, whether the data has been loaded - and if not, go back to the database and get it. This would require field access interception. And that would violate a premise of OS: Any class can be made persistent. Field access interception would mean IL code enhancement or at least property methods.
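
The "property methods" variant can be sketched in a few lines of Python, using a descriptor as the interception point. This is only an illustration under assumptions: LazyField, the fetch callback and the ROWS dictionary standing in for the database are all hypothetical and have nothing to do with the OS API. Note that the Customer class has to be written with persistence in mind, which is exactly the trade-off described above.

```python
class LazyField:
    """Property-style descriptor that loads a missing value on first read."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.name not in obj._loaded:
            # Field was not part of the requested group: go back to the store.
            obj._loaded[self.name] = obj._fetch(obj._loaded["id"], self.name)
        return obj._loaded[self.name]

    def __set__(self, obj, value):
        obj._loaded[self.name] = value


class Customer:
    id = LazyField()
    companyname = LazyField()
    city = LazyField()
    phone = LazyField()

    def __init__(self, fetch, **loaded):
        self._fetch = fetch          # callback standing in for a database roundtrip
        self._loaded = dict(loaded)  # values populated by the initial query


# Stand-in for the database, plus a list that makes roundtrips visible.
ROWS = {1: {"id": 1, "companyname": "Alfa", "city": "Hamburg", "phone": "555-0100"}}
roundtrips = []


def fetch(customer_id, field):
    roundtrips.append(field)
    return ROWS[customer_id][field]


# Sparse population: only the fields of a "list"-like group are loaded up front.
customer = Customer(fetch, id=1, companyname="Alfa", city="Hamburg")
```

Reading customer.city causes no roundtrip because the value came with the initial query. The first read of customer.phone calls fetch once, and later reads are served from the object itself.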

But then: You can't have your cake and eat it too. If convenience and performance are important, maybe this OS premise should be dropped? Is it important to be able to make any class persistent? I don't think so - as I have stated earlier. I deem it more important to have an easy-to-use programming model.

Microsoft might have thought: "Hey, there are so many relational databases out there. And, hey, people have defined so many classes to represent database entities in their software. To serve them well, we need to provide them with a mapping tool between the existing data models: persistent data model, OO data model."

Sounds plausible to me - from Microsoft's point of view. But then, the real world is different, I guess. Developers using Microsoft technologies have come up with sophisticated object models much less often than their colleagues in the Java world, for several reasons. Two of them: Microsoft's long-standing advocacy of data binding to generic container data structures (e.g. ADODB.Recordset and ADO.NET DataSet), and VB's lack of OO concepts for years.

So I'd say Microsoft's vision of OS is an answer to a nonexistent problem. Of course there are huge numbers of existing relational DBs. But there are not that many existing object models for them. Hence, there is no need for a mapping like the one OS offers. And hence persistent classes could be defined in whatever way makes them easiest to use in the end.

Which again brings me to graphical modelling tools or domain specific languages for defining persistent classes. With them, implementing field groups would be no problem, because no existing classes would need to be kept and served.

What you've seen so far in objectspaces, with regards to span has been a far cry from the actual stuff that I designed earlier on. Spans were part of the whole query experience. With them you could specify which related objects were brought back during the query (orders w/ customers), the exact set of those objects (which orders per customer), and which properties on either object that were actually populated with data. Which is nearly what you are describing.

Of course, what you'd probably rather have is full-on projection into unnamed tuples (or rows) of data, each field potentially containing collections of other objects/rows, etc. Or at least projection into secondary types that you could custom design for a particular app scenario. I did all of this for the X#/Xen research language nearly two years ago, but I guess I'm ahead of myself there.

Matt Warren - Friday, March 19, 2004 1:45:00 PM

"What you've seen so far in objectspaces, with regards to span has been a far cry from the actual stuff that I designed earlier on. Spans were part of the whole query experience. With them you could specify which related objects were brought back during the query (orders w/ customers), the exact set of those objects (which orders per customer), and which properties on either object that were actually populated with data. Which is nearly what you are describing."

But why isn't it part of objectspaces now? It requires severe changes of the query engine if you want to add this later on. I also don't think it will perform very well.

Frans Bouma - Friday, March 19, 2004 3:20:00 PM

Unless you are working with large image fields or something, I would really question the benefit of not returning one or two columns. Theoretically, it might be slightly more performant, but you would have to be pushing an extremely high volume of requests for that to be much of an issue. This just isn't an issue for 99% of the apps being targeted for objectspaces usage.

Jesse Ezell - Friday, March 19, 2004 8:28:00 PM

Like Matt said, it's called a projection. Many O/R mapping vendors do it (look at the Java world). Well, what Ralf describes is not really a projection, but they are designed to solve the same problem. Ralf's dynamic proxy idea would be too chatty when following the alternate course of lazy loading properties. Instead, you must define a class for the specific application scenario, so you are not using the Customer or Order objects like in Ralf's example. AFAIK, the list returned from the query using a projection would be read only, not persistable like Ralf's idea.

Dave Foderick - Saturday, March 20, 2004 3:45:00 PM

If you were smart enough to carry identity into your projections (wouldn't work for all scenarios) then you could describe a persistence operation that would update changes to your projection back into the database.

What I would do is go ahead and define an O/R mapper as a virtualized O-R-database, so you'd have explicit set-based update/insert/delete commands that you'd submit directly as opposed to any implicit behavior built into the objects themselves for persistence. If you had this as a base, you could build automatic update behaviors into your objects on top of this facility.

Matt Warren - Tuesday, March 23, 2004 4:26:00 AM

I've worked with property groups in my own o/r implementations. They can work well, but they can also be abused when your business rules get a few layers deep and there's no longer any obvious way to declare from the top layer that property X must be around when the objects are loaded by the bottom layer.

That can be addressed, it's just that life is "easier" in a world of complex business rules if all properties are there.

I guess that's okay - it's just better IMHO if you normally load all properties, and can throw the "frankenstein switch" in your query to exclude certain groups (or conversely only read requested groups) when you know it's okay to omit them.

Of course it might always be wise to NOT load big binary blobs unless you explicitly ask for them...

*sigh*

David Goldstein - Wednesday, March 31, 2004 5:34:00 AM

This sounds too complicated, and you can have it easier. I call the solution "virtual objects". You might also call it "transparent activation". It works like virtual memory and makes (programming) life really easy.

If you want to know the details, send me an email.

Kind Regards

Martin Rösch (Roesch)

Martin Rösch - Monday, July 12, 2004 12:57:00 PM

7 Comments