Fabrice's weblog

Tools and Source

News

My .NET Toolbox
An error occured. See the script errors signaled by your web browser.
No tools selected yet
.NET tools by SharpToolbox.com

Read sample chapters or buy LINQ in Action now!
Our LINQ book is also available on AMAZON

.NET jobs

Emplois .NET

transatlantys hot news

Contact

Me

Others

Selected content

Choosing an object-relational mapping tool

My latest article is now available in English: "Choosing an object-relational mapping tool".

When developing an application, it is almost unavoidable to write data access and data manipulation code. Many object-relational mapping tools are available to help. Choosing the right one is important.

This article presents the criteria for selecting an object-relational mapping tool.


Version originale en français

Comments

Frans Bouma said:

Some comments:
"Be able to use inheritance, create hierarchies between entities, and use polymorphism (we are using objects!)."
Often you don't need this. Especially if you have a solid relational model. The advantages of inheritance in code are not always applicable to inheritance of entities. Manager <- Executive. Executive has 1 extra field: CompanyCar. You want to promote a loaded manager object. How? You can't cast. Copy data? perhaps. It gets messy in that scenario.

"Cache dynamically generated queries, so that they don't get rebuilt at each call."
This is not true. Almost no O/R mapper caches queries. The reason for that is that you have to find back a cached query based on criteria: this requires an extensive storing mechanism, and the management alone kills off any advantage you might get. The only queries you probably can cache are fetchentitybyPK, deleteentitybypk. All others require runtime examination: not every insert query for entity type E is the same: it only has to insert the fields which are changed for example.

"# Cache some data to avoid too many calls to the data source."
Also false. The only data retrievable from the cache without going to the db is fetchbypk for a single entity. A random select for multiple entities always has to consult the db as you never know if all entities are in the cache. So you have to read the resultset, and with that update the cached entities. This is actually SLOWER. The main point of a cache is thus not performance, but uniqueing.

if you want to save performance, cache data once fetched yourself, like a collection of country entities for example.

"Bulk updates or deletions. When we want to update or delete thousands of records at a time, it's not possible to load all the objects in memory, while this can be easily and quickly done with a SQL query (DELETE FROM Customer WHERE Balance < 0)."
We're the only one supporting this. The main reason that no other o/r mapper does this is that it destroys any inner table an O/R mapper will have for uniquing, identity map and cache: you can't know which entities delete from table where x=1 will delete, it just runs on a set. The set definition itself, required for the cache maintenance, is not available.

I must say that this contradicts with earlier statements you made in the list.

"Optimized queries (update only the modified columns; detect situations where the number of executed queries can be reduced; ...)"
This conflicts with your cached query requirement.

"Possibility to move to a new mapping tool (what would it imply? At what cost?)"
You can forget compatibility between O/R mappers on .NET. Most incompatibilities are found in the query language.

"Freedom in the design of the classes (no base class for the entities; no mandatory interface; no specific class for collections). Think POJO (Plain Old Java Object)."
This conflicts with your databinding and other requirements. If you go the Pojo/poco route, forget databinding and other nice features, unless you write them yourself. Have fun implementing ITypedList and friends :)

"Support for stored procedures. The advantages of stored procedures compared to dynamic SQL queries make for a hot debate, but it is better to have the choice."
Only for calls, not for entity persistence. Using procs for entity persistence is a true nightmare: you have to maintain the proc interface when the entity changes/table changes etc. This is unmaintainable in an enterprise system (250+ tables or so)

"Filtering objects in memory (without having new queries executed on the database)"
Often requested by people who don't think about the consequences. Filtering on a set is costly. You easily run into an expensive operation, especially if filtering occurs more than once on the same data. Indexes help in this. However with dupes in the data, a hashtable is not that easy to use (you then need overflow buckets in the value, which is not that fast also).

The main reason this is costly, is that if you have a 100 table system, with 10 fields average, you're looking at 1000 indexes, minimum. When do you set these up? Not when you load a set of entities, because setting up 10 indexes on the set loaded (on average) slows down performance and perhaps you don't use them. So creating the index when you first query is ok. But this is then hurting the query.

Also, because you're re-doing the query engine of a good RDBMS if you're doing the filtering right, you're not only wasting time, but also run the risk of doing a lesser job and with bugs. It's not automatically true that filtering in memory is fast and saves time. On the contrary.
# February 21, 2005 3:45 PM

Frans Bouma said:

"Mapping tools use reflection, which is slower than the compiled code produced by code generators"
This is not always true, some use code generation ( :)) others use IL code generation. Also reflection can be slow, but you can cache the reflection results, keeping the dip low.

Earlier on I said:
"We're the only one supporting this. The main reason that no other o/r mapper does this is that it destroys any inner table an O/R mapper will have for uniquing, identity map and cache: you can't know which entities delete from table where x=1 will delete, it just runs on a set. The set definition itself, required for the cache maintenance, is not available."
We support this because we have defined the habitat of the entity differently and work completely disconnected, without a context/session which holds references to objects, which frees us from the out-of-sync problem when you delete some rows from a table and you still keep entities in cache like there was no delete at all.. :) (just to clear something up. )

I must say, long list you put up! :) What I missed a bit is the relation with how people look at data-access in the windows/.net world. It's written IMHO more from a Java pov, which doesn't really match with how the average .net/windows developer looks at data-access.
# February 21, 2005 3:56 PM

Fabrice said:

Frans, thanks for the long and detailed comments!
You certainly have a deeper knownledge on the subject than me... for some reason ;-)
I'll try tro digest everything you wrote and eventually update the article.
# February 22, 2005 10:35 AM

Alex Yakunin (X-tensive.com) said:

I want to comment some of Frans's comments :) - generally I agree with almost all of them, nevertheless there are some that seem to be wrong (or Frans has misunderstood the author):

"Be able to use inheritance, create hierarchies between entities, and use polymorphism (we are using objects!)."
> Often you don't need this. Especially if you have a solid relational model.
It’s actually nearly the same as to say: “You can always use procedures rather then classes and methods”. I.e. you can always avoid use of OOP, if necessary, but nevertheless you prefer to use it (you’re finally a .NET developer :) ). Inheritance plays a key role in OOP, thus its support in ORM tool is actually very important.

"Cache dynamically generated queries, so that they don't get rebuilt at each call."
May be author means translation of e.g. OQL queries to SQL equivalents? In this case caching means that query doesn’t get recompiled on subsequent creation of the object that really represents it (e.g. Query object).

"Cache some data to avoid too many calls to the data source."
> “The only data retrievable from the cache without going to the db is fetchbypk for a single entity.”
E.g. DataObjects.NET caches almost anything, and properly synchronizes cached data with the database contents (VersionID column allows to achieve this). There is even a mode allowing to run any query in “fetch on demand” mode – only object IDs and VersionIDs are actually fetched by a query, but the result is transparently processed, and objects with corresponding IDs contained in caches are validated (for version compatibility). Any object having “wrong” VersionID will be transparently refetched from the storage on attempt to access its properties; any object having correct VersionID won’t require additional queries on access. So such query mode allows to significantly reduce the amount of data transferred from RDBMS (if cache works efficiently – that generally is true for the most part of cases).

"Optimized queries (update only the modified columns; detect situations where the number of executed queries can be reduced; ...)"
> This conflicts with your cached query requirement.
In case with my “query caching” explanation – no, so actually it’s possible to have update sequence optimizer as well as query cache.
# February 22, 2005 5:35 PM

Frans Bouma said:

"Inheritance plays a key role in OOP, thus its support in ORM tool is actually very important. "
True, but there is a difference between supertype/subtype hierarchies in entities and superclass/subclass hierarchies in code. The main point is that data can be easily migrated between types, objects can't. This can cause problems in code, for example in my manager-executive hierarchy.

Inheritance support for extending entity objects to add behavior: ok, but that can be done in code without bothering the O/R mapper (IMHO).

- I agree on the query caching argument you have, I didn't think of that.

- about your caching explanation: if I fire a random query, it always has to access the db to know if it doesn't miss any object and you say it fetches version + ID only, am I correct? (you have to do this, for example if the app is used on 2 desktop boxes targeting the same machine). Ok, it then determines it needs to fetch 1001 objects of the 1200 to fetch, as these aren't in the cache. You then have to requery the db with an IN (...) query, with 1001 parameters, which will break on Oracle and will be pretty slow.

The other way is to refetch simply the 1200 objects and drop the 199 already in the cache. Nevertheless, you need to fetch 2 times the data (in the RDBMS, once it sends the ID's and versions along), and update your cache as well.

Next time you again need the 1200 objects, you fetch again teh id's and versions, match them, you see nothing is changed, so you grab the objects from the cache, correct? Ok, in that case you save a bit of data to be read, agreed.

On the expense of having to have a version column and 1 field PK's. But you already have to have that anyway with your solution.
# February 23, 2005 11:00 AM

Robert Benson said:

I like the O/R paper below. You might want to add it to your list.

Robert

http://www.ksc.com/articles/patternlanguage.htm
# February 23, 2005 8:07 PM

Ashith Raj said:

Its a Detailed list of criteria to select a ORM Tool for .NET.

But Most of the Tools do not meet all of it.

I personally feel that Nhibernate, Base4.NET, are evolving in a better directions, as open source.

But Genome ORM is pretty good and they give a lot of support.

I feel the Open Source ORM should catch up with .NET 2.0 and should have a GUI Mapper, should be able to generate the Entity Classes from the Schema Definitions.
# February 26, 2005 11:37 PM

Fabrice said:

> Its a Detailed list of criteria to select a ORM Tool for .NET.
> But Most of the Tools do not meet all of it.

Who needs a tool that does everything plus coffee?!
That's not the point.
# February 27, 2005 3:46 PM

Robert van Poelgeest said:

"Be able to use inheritance, create hierarchies between entities, and use polymorphism (we are using objects!)."
'Often you don't need this. Especially if you have a solid relational model. You want to promote a loaded manager object. How? You can't cast. Copy data? perhaps. It gets messy in that scenario.'
I agree with the last bit and having to promote to an Executive might be a pain. But I disagree with the first bit because it depends on the rules you have implemented in the Business model. Reading a base class (Manager) and getting both Managers and Executives can be a live saver if the business demands a different calculation for the bonus these guys get.

"Bulk updates or deletions. When we want to update or delete thousands of records at a time, it's not possible to load all the objects in memory, while this can be easily and quickly done with a SQL query (DELETE FROM Customer WHERE Balance < 0)."
Is this really want you want? What if deleting a customer also invokes other businessrules? Obviously it is good to have a choice in the case where there are no rules.
# March 2, 2005 3:29 PM

Robert van Poelgeest said:

On Lazy Loading: I would like to add that being able to specify that for a particular relation is even better.

On cascade updates in combination with the next point (Bulk deletions)-> In order to implement this properly the tool needs to know the cases that are handled by the backend. Usually this does not involve setting the parent relation to some default parent, but let's assume this is required. If you have a model where A 1 contains (aggregate) n Bs and also has a unidirectional relation to 1 B then what should happen when an A is deleted? In Sql server this would result in a circular update path so enforcing constraints on both relations is not possible. Also from the model it's unclear if A.B.A == A unless you specify additional contraints.
# March 2, 2005 3:39 PM

Robert van Poelgeest said:

Also I would really like to know how the tool providers rate there conformance to the stated 'features', is there any body who has done a comparion chart?
# March 2, 2005 3:45 PM

Jamie said:

> ORM should catch up with .NET 2.0 and should have a GUI Mapper, should be able to generate the Entity Classes from the Schema Definitions.

Diamond Binding does this. AFAIK its based on Hibernate for .net too - and that means like 14 years of optimisations. A couple of friends of mine use it, and they swear by how simple it is. My tech lead won't touch anything without stored procedures though - which is crap :(

I played with it at home, and you dont think you will use inheritance until you find out how easy it is ;)

# September 1, 2007 11:37 PM
Leave a Comment

(required) 

(required) 

(optional)

(required)