ObjectSpaces: Projection or sparse Objects?
My previous blog entry drew some comments I'd like to respond to.
Am I proposing just an ordinary projection feature for ObjectSpaces (OS)? I don't think so. With a projection in SQL (or in relational databases generally), the shape of the result depends on the query: select colA, colB from myTable returns two columns, while select colA from myTable returns one. Hence you need a generic data structure like a cursor (e.g. an ADODB.Recordset) or an ADO.NET DataTable/DataRow to accommodate the varying number of columns.
With OS (or any O/R mapping (ORM) tool), though, generic data structures are less important. That's the purpose of ORM. With ORM you define a persistent class like
class Customer
{
    public string id;
    public string name;
    public string city;
    public string zip;
}
and hopefully there is also a corresponding typed collection for each persistent class, e.g.
class CustomerCollection : IList
{
    ...
    // Typed indexer: returns a Customer instead of a plain object.
    public Customer this[int index] {
        get {
            ...
        }
    }
    ...
}
You don't want more than that. For a given entity (e.g. customers in some database table) you don't want to define more than one persistent class.
Then, when you query for customers, you define a selection of entities by formulating a condition on their fields/columns, e.g.
select ... from tbCustomers where name like 'a%'
The collection class represents the result of this operation.
But you also define how much data to load for each entity. O/R mapping tools usually support at least two modes. In the default ("full mode"), you load all columns of a table and create a full-blown persistent object for each entity returned:
select * from tbCustomers ...
Or you specify that only "hollow objects" are loaded ("hollow mode"), i.e. only the IDs of the entities matching the query criteria:
select id from tbCustomers ...
Object creation is then delayed until you actually access an object (through the collection class). Of course, a specific query then has to be issued to load the data:
select * from tbCustomers where id='...'
"Hollow objects" reduce the memory footprint and the amount of data transferred initially by the query, but cause additional roundtrips later on.
From the outside, though, what is returned from a query are always fully populated persistent objects. The delayed loading of "hollow objects" is transparent to client code of the ORM API.
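To make the delayed loading concrete, here is a minimal sketch of a "hollow mode" collection. It is not how ObjectSpaces or any particular ORM tool implements it. FakeCustomerTable is a hypothetical in-memory stand-in for the database, and HollowCustomerCollection and the Roundtrips counter are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the persistent class shown earlier.
public class Customer
{
    public string id;
    public string name;
    public string city;
    public string zip;
}

// Hypothetical stand-in for the database (assumption, not a real API).
public class FakeCustomerTable
{
    private readonly Dictionary<string, Customer> rows = new Dictionary<string, Customer>();
    public int Roundtrips; // counts simulated queries

    public void Add(Customer c) { rows[c.id] = c; }

    // Simulates: select id from tbCustomers where name like '<prefix>%'
    public List<string> SelectIdsByNamePrefix(string prefix)
    {
        Roundtrips++;
        var ids = new List<string>();
        foreach (Customer c in rows.Values)
            if (c.name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                ids.Add(c.id);
        return ids;
    }

    // Simulates: select * from tbCustomers where id='<id>'
    public Customer SelectById(string id)
    {
        Roundtrips++;
        return rows[id];
    }
}

// "Hollow mode": the query result holds only IDs until an element is accessed.
public class HollowCustomerCollection
{
    private readonly FakeCustomerTable table;
    private readonly List<string> ids;
    private readonly Dictionary<int, Customer> loaded = new Dictionary<int, Customer>();

    public HollowCustomerCollection(FakeCustomerTable table, string namePrefix)
    {
        this.table = table;
        this.ids = table.SelectIdsByNamePrefix(namePrefix); // initial query: IDs only
    }

    public int Count { get { return ids.Count; } }

    public Customer this[int index]
    {
        get
        {
            Customer c;
            if (!loaded.TryGetValue(index, out c))
            {
                c = table.SelectById(ids[index]); // delayed load: one extra roundtrip
                loaded[index] = c;
            }
            return c; // the client always sees a fully populated object
        }
    }
}
```

Client code only ever indexes into the collection and gets a Customer back. The first access to an element costs one roundtrip, repeated access to the same element costs none.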
Now, what I'm proposing is a third mode ("sparse mode") of retrieving persistent objects. I propose being able to load sparsely populated objects by issuing a query that specifies a subset of columns/fields to be returned for each entity, e.g.
select id, name from tbCustomers ...
From this data persistent objects are created, but of course not all their fields can be populated. From the outside, though, a persistent object taken from a collection is still an object of the persistent class with all its fields/properties, just as with "hollow objects". There is no perceivable difference between an object loaded in the first, second, or third mode.
And that's the reason why I would rather not call what I'm proposing a projection, although a SQL projection query underlies this third mode.
I'd call it "sparsely populating persistent objects", because there is still just one persistent class for a persistent entity, which is sometimes populated with more column data from the database and sometimes with less.
This kind of "sparse object" has a smaller memory footprint than an object loaded in "full mode" or "hollow mode", because in both of those modes all column data is loaded before you can access any field/property of a persistent object. The two modes differ only in when the data is loaded. In "sparse mode", though, the columns beyond those initially loaded may never be needed at all. "Sparse mode" thus combines the small number of roundtrips of "full mode" with a much smaller memory consumption.
Like "hollow objects", though, "sparse objects" always look fully populated: when you access a property whose data has not been loaded by the initial query, the missing data is fetched (preferably all missing columns at once). When the city field/property is accessed for the first time, the query could look like this
select city from tbCustomers where id='...'
and load only the requested column. Or it could look like this:
select city, zip from tbCustomers where id='...'
and load all columns missing so far. Or it could look like this:
select * from tbCustomers where id='...'
to refresh the object's data.
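The three follow-up queries above can be sketched as one small helper that builds the SQL text from the set of columns already loaded. This is only an illustration under assumptions: ReloadStrategy, SparseReload and BuildQuery are hypothetical names, the column list is taken from the Customer class, and the ID is passed as an @id parameter instead of being pasted into the query text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names for the three options described above.
public enum ReloadStrategy { RequestedOnly, AllMissing, FullRefresh }

public static class SparseReload
{
    // All persistent columns of tbCustomers, matching the Customer class.
    private static readonly string[] AllColumns = { "id", "name", "city", "zip" };

    public static string BuildQuery(ICollection<string> loadedColumns,
                                    string requestedColumn,
                                    ReloadStrategy strategy)
    {
        var columns = new List<string>();
        switch (strategy)
        {
            case ReloadStrategy.RequestedOnly:
                // Load just the column that was accessed.
                columns.Add(requestedColumn);
                break;
            case ReloadStrategy.AllMissing:
                // Load every column the initial query left out.
                foreach (string column in AllColumns)
                    if (!loadedColumns.Contains(column))
                        columns.Add(column);
                break;
            case ReloadStrategy.FullRefresh:
                // Reload the whole row.
                columns.Add("*");
                break;
        }
        return "select " + string.Join(", ", columns.ToArray())
             + " from tbCustomers where id = @id";
    }
}
```

Which strategy is best depends on the access pattern. Fetching all missing columns trades a slightly larger second query for the guarantee that no third roundtrip is needed for that object.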
Of course this is an additional roundtrip to the database, but hopefully it's only rarely necessary, because the fields/properties most often needed in a particular context are specified with the query.
"Sparse mode" thus combines a small memory footprint (only the needed fields/properties are populated) with high performance (roundtrips are rare) and full read/write access to all fields/properties when needed (through transparent delayed loading of missing columns).
On the outside it's still just one persistent class. But on the inside there is more flexibility than with just "full mode" or "hollow mode". Seen from "sparse mode", "full mode" and "hollow mode" are only special cases: "full mode" loads all columns and never needs an additional roundtrip to the database, while "hollow mode" loads only the ID and always causes a roundtrip for each object.
When to use which mode? Use "full mode" in editing scenarios, where a persistent object is likely to be displayed for modification. Use "sparse mode" in read-only or read-mostly scenarios (but also when just a couple of fields/properties need to be edited).
Never use "hollow mode"! There is no use for it. Or maybe there is? It's not important, since "hollow mode" comes for free once "sparse mode" is implemented.
The implementation of "sparse mode", however, requires that access to missing fields/properties can be detected. Thus it requires all persistent fields to be encapsulated by property methods so that access can be intercepted. This is obviously not necessary for "full mode", and it is not necessary for "hollow mode" either: in "hollow mode", only proxies are loaded initially, and these trigger the (transparent) delayed load of the full object.
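A minimal sketch of such access interception might look like the following. It assumes the ID column is always part of the initial query and that all columns are strings. SparseCustomer and the ColumnLoader delegate are hypothetical names for this illustration, and in a real tool the loader would issue the follow-up query against the database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical callback: given an ID and column names, returns the column values.
public delegate IDictionary<string, string> ColumnLoader(string id, ICollection<string> columns);

public class SparseCustomer
{
    private static readonly string[] AllColumns = { "id", "name", "city", "zip" };

    // Holds only the columns loaded (or assigned) so far.
    private readonly Dictionary<string, string> values = new Dictionary<string, string>();
    private readonly ColumnLoader loader;

    // initialValues must contain at least the "id" column (assumption).
    public SparseCustomer(IDictionary<string, string> initialValues, ColumnLoader loader)
    {
        foreach (KeyValuePair<string, string> pair in initialValues)
            values[pair.Key] = pair.Value;
        this.loader = loader;
    }

    // Every persistent field is exposed through a property, so reads can be intercepted.
    public string Id   { get { return Get("id"); } }
    public string Name { get { return Get("name"); } set { values["name"] = value; } }
    public string City { get { return Get("city"); } set { values["city"] = value; } }
    public string Zip  { get { return Get("zip"); }  set { values["zip"] = value; } }

    private string Get(string column)
    {
        if (!values.ContainsKey(column))
        {
            // Access to a missing column detected: fetch all columns missing so far.
            var missing = new List<string>();
            foreach (string c in AllColumns)
                if (!values.ContainsKey(c))
                    missing.Add(c);
            foreach (KeyValuePair<string, string> pair in loader(values["id"], missing))
                values[pair.Key] = pair.Value;
        }
        return values[column];
    }
}
```

Because only columns that are still missing are fetched, a value assigned through a setter before the delayed load is not overwritten by it. With public fields instead of properties there would be no place to hook in the check, which is why this mode needs property methods.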
Now, since property methods are needed for "sparse mode", ObjectSpaces cannot currently support it. To require properties is against its vision of making an arbitrary class persistent.
But then, as you might know by now or have guessed: being able to provide object persistence for any class is a lofty goal, and in my view not one that needs to be reached for many, many scenarios. Many developers could live without it and would be happy to define their persistent classes in some special way (e.g. by deriving from a persistent base class, annotating them with attributes, or modeling them with a tool or language). Developers are mostly not concerned with the purity or generality of a solution. That is not to deny that sometimes the very general approach of OS is just what a project needs. But at least the companies I'm talking to don't need such generality.