O/R Mappers: Simple Database Features ?

Friday, January 9, 2004

I'm going to get grilled for saying this, but I don't see why most O/R mappers leave out some very simple database things. Note that I'm not saying I have everything down, nor do I really have any of the more complex things that some will need. What I'm saying is that there are some things that everybody wants and are very simple that are often left out of O/R mappers.

First and foremost is that every O/R mapper should work in some minimalistic sense with every ANSI SQL compliant database! Everybody that wants to use an O/R mapper expects to at least be able to do the very basic CRUD operations that are mostly just simple SQL. Sure, there are a few gotchas, like how to handle database generated primary keys with inserts, and the fact that some data types like DateTime vary in their handling. But its easy to handle these basic issues with the big players (MS Sql, Access, and Oracle), and then just expose a generic OleDb and Odbc provider and warn the user they are not guaranteed. .NET even makes such generic coding to all database providers very easy if you always code to the System.Data interfaces instead of using specific providers like SqlClient. Yes, I know that this won't work for creating the most performant database code, nor for more advanced situations, but that shouldn't prevent the basics from being handled genericly. In particular, I think its very very bad that Microsoft is going to release ObjectSpaces only working with MS Sql, at least initially. Maybe they will add others later (but maybe not), but you can't really design a flexible system if you don't test it in more than one case from the beginning.

Another relatively simple case that many people want which fear performance (warranted or not), and some people also really need, are stored procedures. Yes, the bread and butter of an O/R mapper is and should be dynamic SQL created at runtime, but does that mean you can't give the customer another often requested option when its easy to implement. Like it or not, there are people that won't ever believe dynamic SQL is often just as performant, and there are some cases at least where they are right. There are also other issues related, especially when data is denormalized for performance reasons, that stored procedures simplify, although certainly there are other alternatives, like triggers. Enabling stored procedures will also allow you to provide a work-around for those silly users that want to use one of those other databases and still have database generated primary keys. There's bound to be other reasons for allowing stored procedures too -- my point is that there really is no reason to not make them an option since they are simple to include.

Finally, I have one other "simple" scenario that I included in my WilsonORMapper that no one has asked for -- but I bet everyone will want now that I've included it! Basically, there's no reason that every O/R mapper can't provide a way to ask for a specific page of data, where you get to specify the size of each page. That's right -- my ORMapper will query the database for only a specific page of data, using standard ANSI SQL, and not force you to have to jump through hoops or return more records than you really want. You can actually see me using this feature on my site where I have added a "grid" (I use repeaters) with paging of 20 records, also with sorting ascending and descending enabled. Once again there's no reason why the basic SQL can't be optimized better for the major players while still providing the generic functionality for all databases. I'm sure I'm going to be ridiculed as naive, and I am when it comes to all the advanced features that others are providing that I will never even attempt, but that's no reason to not deliver the basics.

With that I'm done with my "lessons learned" topics. Those are the basic features I expect in all O/R mappers. What do you think are the simple basics that should be required?

11 Comments

I played around with things like this and took the approach I saw many n-tier samples and applications take.

You have a basic IProvider interface and then for each database you just implement the IProvider for that specific database and boom you have access to O/R mapping for any database that implements IProvider ... you could even then optimize for each operation for the database ...

I've seen other O/R mappers take on this route but only ever implement one for SQL Server ...

Take note this affects objects composed of data from several different data sources.

Adam Weigert - Friday, January 9, 2004 9:10:00 PM

The in house O/R mapper I wrote for our applications supports paging also. Basically specific implementations (Sql, XML, Access, etc.) choose how best to implement this under the covers. The Sql one, for example, uses TOP statement to limit the max results returned from the DB (i.e. PageSize * PageIndex) and then chops the front off the result set locally.

Drew Marsh - Friday, January 9, 2004 9:11:00 PM

ANSI SQL?

Try to use ANSI joins on Oracle 8i. Very simple stuff, yet it doesn't work.

You have easy talk summed up, but when it really gets exciting, you have to dig deep into the custom code per database to support what you want to do.

Want to support updates with multi-table filters? How are you going to do that on Oracle, where no FROM clause is supported with UPDATE? Or deletes with filters spanning multi tables? No second FROM clause.

TOP clause? No can do on oracle. (rowcount). Sequences support? You don't have 'em, but you have to on Oracle. How about synonyms, one of the cornerstones in every oracle database. Does your schema reader read these?

Ansi joins? Not on oracle 8i, still a database very widely used. To support joins on Oracle 8i, you have to use stuff like:

SELECT * FROM A, B WHERE A.Foo = B.Foo(+)

For left joins.

Stored procedure support? Kicking an open door even more open. How do you filter on entity Foo by using a random field of Foo? Or combination of fields? Or filtering on a related entity of Foo to grab a set of Foo entities?

You think you can do all that with stored procs or ANSI SQL? You can't, trust me on that. My Oracle SQL generator is completely different from the Sqlserver Sql generator, because of all the Sql differences. Simple inserts or single table updates, big deal. Try to support left joins (weak relationships) on Oracle (and I don't mean just 1 single version) or other basic stuff.

As I said in a reply to Ingo's ramblings (which were completely missing the point btw): A good tool offers options to directly manipulate entities where the entities really live, like update a set of entities directly based on some criteria. I do that. It's common sense, because entities live in the database, so why can't you manipulate them directly?

Paul, a tip, forget about ANSI-SQL. It's a nice dream but it's simply not there. It won't cut it, you need custom Sql constructs per database. That's reality.

Frans Bouma - Friday, January 9, 2004 9:47:00 PM

Frans:

I don't have any significant problem with your LLBLGen Pro or Thomas' EntityBroker when it comes to multiple database support. The only one I singled out was Microsoft's ObjectSpaces, but there are others on the market that are also MS Sql only. I also did not mean that you should use only ANSI SQL at all -- I simply meant that there's no reason you can't use it to provide a minimal generic database provider. I don't want to say that its "necessary" if you support the major players (as you do), but its easily doable so I don't understand why you would not. But in no way do I think you have to use ANSI SQL for the major players, although I will readily admit that my rather simplistic ORMapper mostly does. Yes, that's probably a major flaw if I were to ever try to compete as a highly flexible and extremely performant O/R mapper, but that's your market and not mine.

As for stored procs, all I'm asking for again is minimal support, which I believe your O/R mapper has anyhow, so what's the beef? I'm not intending to say they will solve every problem, just that they are easy to implement and that they should therefore be optionally allowed. I also don't mean to gripe too much about the paging, since I just invented that request (although it sounds like others thought of it before me from the comments), but its something I do think you and others should consider going forward now. I still believe that your LLBLGen Pro and Thomas' EntityBroker are the best O/R mappers on the market, and that it also appears that this will continue to be true even after ObjectSpaces is released. I do however believe that some things can be better, and many common things can be simpler, even in your tool.

My early feedback has been very much that my simplicity has made it understandable and easy to use -- surely you guys can keep your tools more powerful while still adding some simplicity for the many many people that just want to get rid of very basic CRUD! That would also increase your sales phenomenally, since there are many more people that need to do the basics and can't handle the complexity of the more advanced tools that you are offering. Again, I have and will continue to point people to you and Thomas when they need more, and certainly many do, but consider the little guys out there too (you remember the guys that liked your original stored proc generator :).

Thanks, Paul Wilson

Paul Wilson - Friday, January 9, 2004 11:14:00 PM

Paul, stop it, you make me blush in public ;)

Seriously: normal, simple CRUD stuff isn't doable with ANSI SQL so you can support multiple databases. Here are some examples:

1) a deadsimple autonumber field as PK. Do the insert. ANSI SQL doesn't contain constructs for autonumber fields. This is different per database system. SqlServer (and sybase) uses Identity columns, and you should NOT specify a value for the column. Oracle and other databases use sequences which supply the value for the column and you SHOULD specify a value for the column (the sequence). How the get the new value back? Sqlserver2000: use SCOPE_IDENTITY(), Sqlserver7: use @@IDENTITY, Oracle: use SELECT sequencename.CURRVAL INTO :parameter FROM DUAL. Other databases use even more different schemes. ANSI SQL? You will not get far for even this very simple insert statement! -> you need special code per database

2) multiple schemas vs. multiple catalogs with multiple schemas. SqlServer uses catalogs and inside the catalogs it uses schemas. Other databases use just schemas and don't understand the concept of a catalog. So code targeting SqlServer will have to deal with selects which can span multiple schemas inside a catalog (which means schema info has to be supplied WITH the fields and tables!) and/or multiple schemas in multiple catalogs (for example look up table select in the lookup table catalog using a FK field in another catalog. Simple stuff, yet it can't be done with the same code which is used for Oracle too, because Oracle doesn't understand catalogs, so you can't formulate the query for Sqlserver using oracle constructs, you can't specify the catalog info, however, you can't use sqlserver's constructs on oracle because oracle doesn't understand the concept of a catalog. (so doesn't understand what database.dbo.tablename means.) -> you need special code per database

3) table names/field names with spaces. Stupid idea to include spaces in tablenames or fieldnames, yet people do it. (hell, even northwind contains a table with a space in the name). How to execute the most SIMPLE CRUD routines on these tables/fiels? How to specify the names? Oracle doesn't understand '[foo bar]'. SqlServer doesn't understand '"Foo bar"'. -> you need special code per database

4) You mention stored procedures for very simple CRUD. Ok, let's do that. Let's use the most SIMPLE procedure for retrieving a simple entity: SELECT * FROM Table WHERE PkField = @PkFieldValue.

You think you can call that proc with portable code? Not at all! Ever looked at how Oracle (and other databases!) return resultsets? That's right, through cursors, the Oracle's REF CURSOR parameters. Now, SqlServer does this differently, a proc simply executes a SELECT statement and the results are the resultset returned. With Oracle, you have to bind the REF CURSOR parameters to a dataset and then populate the dataset using the cursors. (or run through them manually). Furthermore, if you use the Oracle provider shipped with .NET, the Microsoft one, you are out of luck with REF CURSORs and smooth code: if a proc returns 2 resultsets, you have to walk through the ref cursors by hand, binding them one by one by hand. ODP.NET does this automatically.

Your idea about ANSI SQL is good, I had that idea once too (I spend a lot of time looking for the ISO/ANSI SQL docs to build my query language upon) but I let that idea go the minute I started working on the Oracle code: a lot of simple SqlServer code didn't work on Oracle simply because a lot of it is different.

Remember, I talked about very very very basic stuff in this reply :) I didn't even mention ANSI joins which are not supported on Oracle 8i. I also didn't mention the paging you implemented. On SqlServer you can't implement reliable paging, you have to requery every time and the new results can differ from the previous call. On Oracle you can use a rownum value to implement easy paging. Difference again.

IF you want to support even just simple CRUD but you want to do that IN FULL! for multiple databases, you have to bite the bullet and implement various paths in your code for each database vendor. That's REALITY. Thinking you don't have to is naive. I'd like to see your insert code do its work on Oracle with sequences, or inserting data in my 'Test' schema in my SqlServer catalog. Not to criticize your code, Paul, but your, sorry, naivete about this subject. :)

Frans Bouma - Saturday, January 10, 2004 10:38:00 AM

There is not much to add to Fran's statement.

My experience when working with Oracle & SqlServer & Access is identical.

SQL is great as programming langauge, but sucks as hell as abstraction. No standardisation that is really worth mentioning. You need special cases everywhere.

One of the glories of our EntityBroker is that I really don't have to care about this (as in every other O/R mapper). This IS a MAJOR pain to do manually.

Thomas Tomiczek - Saturday, January 10, 2004 12:40:00 PM

Thanks for all the links and info Frans. :) I'm sure I'll be adding more vendor specific stuff as I start dealing with more bugs and features.

Right now I just do the auto-ids differently:

"SELECT KeyField = SCOPE_IDENTITY()" for MS SQL,

"SELECT @@IDENTITY AS KeyField" for Access,

"RETURNING " + keyField + " INTO KeyField" for Oracle (this should work for Oracle 9.x if a trigger is setup to add the key from a sequence -- big assumption but it works)

and

"SELECT MAX(" + keyField + ") FROM " + table + " AS KeyField" for the rest (very bad yes, but its my generic attempt at workable)

Good proof of my naivety (spelling?)! I'm committed to getting the basics right, and I'm sure there are lots of assumptions I've made that will result in bugs that I need to fix. I'm not ever going to touch the advanced cases that Frans and Thomas have though.

Thanks, Paul Wilson

Paul Wilson - Saturday, January 10, 2004 2:25:00 PM

Well, what about the BASIC stupid errors.

Now, I have a lesson for you: .NET framework is full of bugs.

Parameter. Let's set a value :-)

* OleDb vs. Access: you can NOT assign DateTime.Now safely - blows. You HAVE to assign DateTime.Now.ToString ();

* Oracle: you can not assign an empty string "" without also setting he length explicitly. Blows. NULL and strings WITH a length are perfect, and neither in SQL Server nor in OleDb do you ahve to set the length.

Stuff like THIS makes we want to take a big club and visit the programmers.

In the EntityBroker we work by having a "SqlContext" specific for every database - itcontains the sql generation templates, but also helper methods (formatting dates etc., in such a way that the database can deal with it). With every iteration we add new methods. Pretty often - by now - just for handling errors in the access libraries.

Thomas Tomiczek - Saturday, January 10, 2004 2:46:00 PM

Oh, last saying :-)

I jsut decided to add support for "paged queries" to the EntityBroker :-)

You are right - sometimes you just want part of the result set. Now, while I can not actually stop the data to be transmitted from the SQL Server (YET - this will change once YUKON comes around and our server version moves the logic of our DAL into the database), I can make sure it never leaves the DAL.

This means fast discard, and this means:

* No queries for potential additional data (as in: tables that subobjects may have, inheritance wise)

* No transmittion to the object broker, which may be on another computer (as WE do support a remoted DAL).

Yes, makes sense in some scenarios, and is acutally half an hour of work at maximum.

Thomas Tomiczek - Sunday, January 11, 2004 11:33:00 AM

Excellent Thomas! Although you could use filter the page in the database (see the SQL I posted once before that I linked to). Its actually well-performing for millions of records, and far beats bringing more data back and then dropping it.

Paul Wilson - Sunday, January 11, 2004 3:29:00 PM

I am not so sure your SQL is "right on track and optimal", but it is a pretty nifty idea to run a query within a query - pretty nice. I will see that I get this put in as optimization for SQL Server :-)

Thomas Tomiczek - Sunday, January 11, 2004 10:33:00 PM

Comments have been disabled for this content.