O/R Mappers

Frans, Mats, and Paul are blogging about O/R mappers.

The problem with O/R mappers is that you are still coding your Data Access Layer, but instead of writing SQL, you are writing the O/R syntax. If you use an O/R mapper in your ASP.NET page, you are violating the 'do not add data access code to your ASP.NET page' rule: you are not building your application in layers.

If you use DataSets and a class that knows how to load them, persist them, and so on, you have a much better architecture. If you take a layered approach in that persistence layer, you can change the way your application talks to the data source without changing your application code.

For example, you can change your app from a two-tier architecture that talks directly to the data source to one that talks to a webservices layer. You can add a layer that provides online/offline support. You can host your data access components in Enterprise Services. You can provide caching. All without a single change in your application code.
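
For instance, the application can code against an abstract persistence class, and the concrete layer behind it becomes a configuration decision. A rough sketch (all of these names are hypothetical):

    public abstract class CustomerPersistence
    {
        public abstract void Fill(CustomerDataSet ds);
    }

    public class DirectCustomerPersistence : CustomerPersistence
    {
        public override void Fill(CustomerDataSet ds) { /* talk to the database */ }
    }

    public class WebServiceCustomerPersistence : CustomerPersistence
    {
        public override void Fill(CustomerDataSet ds) { /* call a webservice that returns the data */ }
    }

The application only ever sees CustomerPersistence; which subclass it gets can come from a config file.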

You can accomplish that with O/R mappers, but you need the O/R mapper provider to support it. If it does not, you are usually stuck.

With an O/R mapper you are coupling your data with your architecture. With a message-based approach using DataSets, you are not. So, it's OK to use an O/R mapper to load a DataSet ;).

Update:

OK, I think I should refine it ;).

If your entity objects are [Serializable], then you are probably OK if you use an O/R mapper, as you can find a way to be architecture-independent, even if it may require thinking about it in advance. If your objects must extend MarshalByRefObject (or a subclass, e.g. ContextBoundObject), then chances are you are in trouble.
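
Roughly, the difference looks like this (a minimal sketch):

    using System;

    [Serializable]
    public class Customer                 // marshalled by value: every layer can
    {                                     // hold and serialize its own copy
        public string Name;
    }

    public class BoundCustomer : MarshalByRefObject
    {                                     // marshalled by reference: callers get a
        public string Name;               // proxy, and every access goes back to the
    }                                     // app domain that created the object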


34 Comments

  • Since when is creating an instance of an entity class and filling it using a DAO object any different from filling a typed DataSet using a DAL class which calls an adapter? So since when is using an O/R mapper which supplies code like Customer c = new Customer("123"); 'coding your DAL', and thus hacking database-centric code into the GUI? I don't see it.





    Also, don't add layers if those layers don't do anything. What's the use of a call to a BL method which simply passes the call on to another tier below? You can perfectly well do that with an O/R mapper too. You work with data in a typed way; what's wrong with that?





    "With a OR mapper you are coupling your data with your architecture. With a message-based approach using DataSets, you are not. So, it's OK to use a OR mapper to load a DataSet ;)."


    Erm... no. You are using entities in code which relate to their physical counterparts in the persistent storage. No offence, but your remarks tell me that you should take a deeper look into what an O/R mapper really is and how a good O/R mapper is set up, which means proper database abstraction, patterns, and a wide range of functionality, mostly transparent to the developer using the O/R mapper.

  • Jesse: exactly (the BL remark). Where BL code is required in the application, you use entity classes as proxies to the data worked on by the BL code. Normally no one would store BL code in the GUI, so you also won't when using an O/R mapper. BL code is added on top of an O/R mapper, not inside it (besides perhaps basic validation code).





    However, I don't see a problem where the GUI wants to consume lists and those lists are provided by O/R mapper code. A lot of people write code that always calls into a BL layer even when that layer isn't doing anything. If you want to be as flexible as possible, that's the preferred way, but it doesn't have to be. An O/R mapper is just a typed DAL, or better: a DAL + a BL facade tier. If you understand that as an O/R mapper user, nothing is lost; in fact, it's pretty easy to work with from then on :)

  • OK, suppose you want to bind a grid to a list of customers using an O/R mapper. I retrieve a Customer collection and bind it to the grid. OK.





    Now I want to change my app to read the Customer from a webservices layer. What do I have to do?

  • When you change the data source, you can simply change the DAO object :). The 'customer' entity object is just a data holder, like the typed DataSet row. You fill it using an outside object, like an adapter; O/R mappers fill entity classes with a DAO object, which is called from inside the entity object (not all of them, but most of them).
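
    In code, that shape is roughly the following (a sketch with hypothetical names; the DAOFactory is the pluggable part):

    public interface ICustomerDAO
    {
        void Fill(Customer customer, string customerID);
    }

    public class Customer
    {
        private string _name;

        public Customer(string customerID)
        {
            // The factory decides which DAO implementation is used,
            // so swapping the factory swaps the data source.
            ICustomerDAO dao = DAOFactory.CreateCustomerDAO();
            dao.Fill(this, customerID);
        }

        public string Name
        {
            get { return _name; }
            set { _name = value; }
        }
    }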





    The actual logic in your application doesn't change. With O/R mapper generators you simply regenerate the code and your application is again ready to roll.





    How do you see 'reading the customer from a webservices layer'? I see that as: you call the webservice method and it returns a 'customer' object. By value, just as DataSets are also marshalled by value.

  • OK, my point is that unless there is an easy way for the O/R mapper user to change the way it talks to the DAO, you have a problem.





    For example, can I switch implementations at runtime? (Like: if I'm on the LAN, use a real connection; if I'm at home and connected, use webservices; and if I'm at home and disconnected, use an offline mode.) Of course the webservices layer must talk to my 'connected' layer, and the 'offline' layer must talk to my webservices layer. Can I do it myself, or do I need that capability built into the O/R mapper?
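
    Something like this is what I'd want to be able to write (a sketch; the data access interface and its implementations are hypothetical):

    using System.Configuration;

    public class DataAccessSelector
    {
        // Pick the data access strategy for the current environment.
        public static ICustomerDataAccess Create()
        {
            switch (ConfigurationSettings.AppSettings["ConnectionMode"])
            {
                case "lan":        return new DirectDataAccess();      // real connection
                case "webservice": return new WebServiceDataAccess();  // remote proxy
                default:           return new OfflineDataAccess();     // local cache
            }
        }
    }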


  • Sorry Andres, but it seems to me you should spend a few days exploring O/R mapping patterns and get back to the subject afterwards.

  • "OK, my point is that unless there is an easy way for the O-R mapper user to change the way it talks to the DAO, then you have a problem."


    It doesn't have to be that way. As I described: if your webservice returns entity objects by value, or accepts entity objects by value and processes them (which is how I would set up that webservice), then the code which consumes the entities calls the webservice; that consuming code is at least BL code (and probably GUI code).





    I think you have the feeling that an O/R mapper is a layer which embeds BL code, but that's not the case. O/R mappers simply offer a typed way of looking at entities in a persistent storage, plus logic built around those entities like transactional support, collection-based logic, etc.





    Your example of multiple connection modes is a good one, but very solvable with an O/R mapper, simply because the entity-consuming code selects how to access the O/R mapper code: directly (thus without a layer) on the LAN, through a webservice proxy, or against locally cached storage when offline.





    In your model, you change the adapter logic. In the O/R model, you change the caller code which accesses the O/R code.

  • ;)





    OK, I think I should refine my rant ;).





    If your entity objects are serializable, then you are probably OK if you use an O/R mapper. If your objects must extend MarshalByRefObject (or a MarshalByRefObject subclass, e.g. ContextBoundObject), then chances are you are in trouble.





  • "In your model, you can change the adapter logic. In the O/R model you change caller code which accesses the O/R code"





    You really don't need to change the adapter logic; you just need to plug in a layer that forwards the calls to another one (e.g., the webservices layer proxy calls a webservice that invokes a 'real' adapter running in Enterprise Services).





    Like:





    DataAdapterFactory f = new DynamicDataAdapterFactory(
        ConfigurationSettings.AppSettings["DataAdapterFactory"]);

    CustomerDataSet ds = new CustomerDataSet();
    f.GetCustomerDataAdapter().Fill(ds);





    The programmer does not know what this will end up calling.
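
    For instance, the dynamic factory could be implemented like this (just a sketch to show the idea, not necessarily how I'd ship it):

    // Resolves the concrete factory type named in the .config file via
    // reflection and delegates to it.
    public class DynamicDataAdapterFactory : DataAdapterFactory
    {
        private DataAdapterFactory _inner;

        public DynamicDataAdapterFactory(string typeName)
        {
            // typeName is e.g. "MyApp.Data.WebServiceAdapterFactory, MyApp.Data"
            _inner = (DataAdapterFactory)Activator.CreateInstance(Type.GetType(typeName));
        }

        public override CustomerDataAdapter GetCustomerDataAdapter()
        {
            return _inner.GetCustomerDataAdapter();
        }
    }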





    Anyway, if your Customer object is [Serializable], I can probably layer things the same way using an O/R mapper. My doubt is how easy it is to do and whether it requires changes in my code.


  • In my code, 'f' is just a concrete factory; it returns an adapter. The adapter will fill the DataSet somehow.





    It could call a webservice, it could use .NET Remoting, it could make a local access, etc., depending on what the 'real' DataAdapter does.





    The returned DataAdapter can just forward the call to another DataAdapter layer.


  • Andres: OK, understood. I create my DAO objects using a factory too, so I can, if I want, do the same; we're on the same line here. :)





    As to what Paul refers to: DataSets or typed DataSets have the problem that they don't focus on the entity, which can be a problem if you want to focus on the entity. But it depends, of course, on what you want in your application: either way has limits and advantages, and it's the developer who has to pick the right tool while looking at the requirements he has set. :)

  • And it would promote duplicate code due to the lack of inheritance. Code reusability? Why use an OOP language instead of VB6 when you aren't using its potential? Apart from prototyping, custom entities are IMO the way to go, encapsulating data and behavior, and providing service interfaces for both OOP and regular .NET objects.

  • With typed datasets, I can write:





    myDataSet = dal.GetEmployees();

    foreach (EmployeeRow employee in myDataSet.Employees)
    {
        Console.WriteLine(employee.Name);
    }





    Anyway, I prefer the full OO syntax, of course, but it depends on what I'd have to give up.





    Your comment is not really about how to get architecture independence, but about whether you prefer a DataSet or a custom class, which is different. If you use custom classes, you get inheritance support (which is difficult with DataSets) and a cleaner syntax. If you use DataSets, you get:





    - Easier databinding. Even if your custom class implements all the data binding interfaces, there are some things that are more difficult to achieve. For example, can I visually bind a TextField to the Customer.Category.Name property? (I'm asking; I've never seen it, but perhaps it can be done.)





    - Easy XML serialization. Do your private fields serialize? Can you handle cycles in your model (i.e., a Customer making a direct or indirect reference to itself, which breaks the XSD-based XmlSerializer)? Can you apply an XSLT to your custom classes?





    - DataViews for filtering and sorting





    - Some useful methods .Merge(), .Copy(), .Clear(), .GetChanges()





    - Untyped access. Doing dataSet["Customer"]["Name"] is very useful sometimes; doing that with reflection is much harder (see the sketch after this list).





    - Built-in support for DBNull values.





    - Optimistic concurrency support.





    - Probably more features ;)
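
    On the untyped access point, a quick side-by-side (a sketch; GetSomeCustomer is a hypothetical helper, and note that the real DataSet indexers go through the Tables and Rows collections):

    // Untyped DataSet access:
    string name = (string)ds.Tables["Customer"].Rows[0]["Name"];

    // The closest reflection equivalent against a custom class:
    object entity = GetSomeCustomer();
    string name2 = (string)entity.GetType().GetProperty("Name").GetValue(entity, null);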





    About the Table Module pattern, I'll answer that later ;).


  • Interesting discussion!


    :-)





    Just a quick note. As I see it, to call the Service Layer [Fowler PoEAA] class that returns the customer collection via a web service, only a thin Remote Facade [Fowler PoEAA] needs to be added. When DataSets are used, I think that is often needed anyway. I don't think it's a good idea to send DataSets via Web Services when the consumer isn't known to be a .NET consumer. Or has that changed lately? I mean, only .NET consumers will benefit from DataSets; other consumers will find the DataSet schema not so nice... Not long ago I heard the tip to expose two Web Services in that case: one for .NET consumers and one "general".


    :-)

  • Yes, DataSets only work well with .NET clients; it was just an example of being architecture-independent.





    Anyway, there is an article on MSDN showing how you can return a DataSet from a webservice as if it were a standard class.








  • As much as I prefer objects, I have to agree with Andres about the benefits of filter / view / sort, etc. on datasets.





    However, ObjectSpaces should solve this problem with the addition of OPath queries (you can also use XSLT on objects; there is an MSDN sample somewhere, but OPath is a lot better). DataSets are definitely cool, and since ObjectSpaces are built on top of DataSets, they get all the cool DataSet functionality plus all the cool object functionality too. A best-of-both-worlds approach.

  • Andres, but if I write that special code for returning my DataSet as a standard class, isn't that just a Remote Facade too? Just as if I wrote special code for returning my collection?





    As I see it (and it has already been said here), DataSets are great for some situations: typically for not-so-complex applications, and in the short run. You get a lot of functionality for free right out of the box, but there is a point where complexity in the domain makes a Domain Model a better choice.


    :-)

  • Yes, one of those 'layers' could be a Remote Facade, but it does not matter to the programmer.





    About your assertion about domain models: what you say is what most people say, and what Fowler says, but the problem is finding where that complexity point is ;).


  • I know, I should come up with something of my own instead.


    ;-)


    (Anyway, note that when I wrote my book two years ago I was completely confident that DataSets were a great solution for "all" situations, so I can at least change my mind.)





    As a matter of fact, this bugs me at the moment, since customers of mine are seeing this choice as a very tough one... It would be great to find a couple of good studies that have somewhat located that complexity point. But I guess I am naive to hope for that.


    :-)





    Until then, I continue to do what everybody else does. Guess.





    Best Regards,


    Jimmy



  • Interesting thread.





    About the complexity point: try 2 million lines of code!!





    I just think there are two fundamentally different approaches, both of which have their advantages and disadvantages.





    I will however say that a huge disadvantage of the DataSet is its performance, and it doesn't seem like that will ever be addressed.





    Would it make a difference if I manually created a single instance that represented all the rows returned by a query, similar to a DataSet or Recordset, and implemented all the cool features you mentioned? How in essence is that architecture much better than having a single instance represent a row?





    How do I mask schema changes using a DataSet?





    There is the right tool for the right job. If you primarily need binding and display functionality, then maybe DataSets are the way to go. If you need to model the "world" and perform row-by-row business operations, then there are other solutions.


  • Just one small comment. I think complexity can be related to the domain too; it's not just code size. For example, think about very complex relationships.

  • Ebenezer,





    If you add all the features the DataSet has to your custom collection, then you'll probably have the same performance overhead ;). Anyway, it's true that you take a performance hit, but it's probably not very important compared with the cost of going to the database to fill it.





    Mapping schema changes is quite possible with DataSets. If you map a relational database table directly to a DataSet DataTable, then you are on the wrong path, and handling schema changes is hard.





    If you map the data you need for a 'use case' to a DataSet (e.g., if you need an Order data entry form, you get one DataTable for the order header, including the customer name, and one DataTable for the lines, including the product price), and you make the proper joins to load that data, then schema changes can be hidden by changing the one SQL statement that loads the header, or the one SQL statement that loads the lines.
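
    For example (a sketch with a hypothetical schema; connection, orderID, and orderDataSet are assumed to be in scope, using System.Data and System.Data.SqlClient):

    // The join hides the physical tables from the DataSet; a schema
    // change only touches this one statement.
    SqlDataAdapter headerAdapter = new SqlDataAdapter(
        "SELECT o.OrderID, o.OrderDate, c.CompanyName AS CustomerName " +
        "FROM Orders o INNER JOIN Customers c ON c.CustomerID = o.CustomerID " +
        "WHERE o.OrderID = @OrderID",
        connection);
    headerAdapter.SelectCommand.Parameters.Add("@OrderID", SqlDbType.Int).Value = orderID;
    headerAdapter.Fill(orderDataSet, "OrderHeader");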

  • More DataSet features:





    - Change notifications for fields and rows


    - You can easily add metadata to dataset/tables/columns with ExtendedProperties





  • OK, some notes on the pro-DataSet arguments :D





    "- Easier databinding. Even if your custom class implements are the data binding interfaces, there are somethings that are more difficult to achieve. "


    Agreed. However, with all the pain that comes with ITypedList and the other interfaces, it is possible. The DataSet has it all built in, so indeed you can just use that.





    "- Easy XML serialization. Do your private fields serialize? Can you handle cycles in your model (ie a Customer making a direct or indirect reference to itself, which breaks the XSD XmlSerializer)? Can you apply an XSLT to your custom classes? "


    It took me 20 lines of code to completely serialize my custom classes to XML. Also, with an implementation of ISerializable it is not hard to do. The XmlSerializer is a tough object to deal with (it can't deal with interfaces, etc.), so I'd skip it in favor of the SOAP formatter.
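
    The ISerializable route looks roughly like this (a sketch of a hypothetical class; it needs System.Runtime.Serialization):

    [Serializable]
    public class Customer : ISerializable
    {
        private string _name;

        public Customer(string name)
        {
            _name = name;
        }

        // Called by the formatter when deserializing.
        protected Customer(SerializationInfo info, StreamingContext context)
        {
            _name = info.GetString("name");
        }

        // Called by the formatter when serializing.
        public void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("name", _name);
        }
    }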





    "- DataViews for filtering and sorting "


    You can create that easily by implementing the IListSource interface.





    "- Some useful methods .Merge(), .Copy(), .Clear(), .GetChanges()"


    Erm... these are collection methods; why are they typical of DataSets? :)





    "- Built-in support for DBNull values. "


    'null' is your friend.





    "- Optimistic concurrency support. "


    This doesn't work very well; it will cause lost work no matter what.





    "- Probably more features ;) "


    Nah. :)

  • "Erm... these are collection methods, why are they typical for datasets? :)"





    The last time I checked, there was no GetChanges()/Merge() in ICollection ;).





    "I'd skip it for the soap formatter."


    The SOAP formatter uses SOAP Section 5 encoding and is not schema-based, so you should be careful. If you support an object with a reference to itself, check what it serializes.





    "'null' is your friend."


    That's cheating ;), and it forces you to use objects to store everything (which is what DataSets do, but not what I'd expect if I use a custom class).





    "This doesn't work very well. It will cause the loss of work no matter what."





    It depends on your app. You can handle the exception in your own code and retry, or you can help the user deal with it by showing both versions of the data and letting him choose. Locking only works OK in very simple scenarios: if you enter a new Order, I need to give an error if the product price changes between the time you enter it and the time you save it. So, should I 'lock' the product table while adding a new Order?
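
    For example (a sketch; 'adapter' is a DataAdapter already in scope), the app can catch the concurrency failure instead of silently losing work:

    try
    {
        adapter.Update(customerDataSet, "Customer");
    }
    catch (DBConcurrencyException)
    {
        // Reload the current row from the database, show the user both
        // versions, and let him choose which one wins (or just retry).
    }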





    BTW, of course everything can be done with custom classes; DataSets are not magical, but you need to implement it yourself. If you are using a tool that does it for you, you are in good shape. If not, you are in trouble: you probably have better things to do.





  • "That's cheating ;), and it forces you to use objects to store everything (which is what DataSets do, but not what I'd expect if I use a custom class)"


    Internally I use object references, so I can keep everything generic. But the properties, e.g. 'Customer.CustomerID', are typed, e.g. as string. However, you can specify 'null' as a value for a string property, which then results in a DBNull value in the database. At the class level you don't want to work with NULLs anyway, so these get converted to defaults when placed inside the class (so Foo.PropertyBar still returns a value, not DBNull), but a check is available of course.
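
    That convention looks roughly like this (a sketch with a hypothetical property):

    private string _companyName;
    private bool _companyNameIsNull;

    public string CompanyName
    {
        get { return _companyName; }    // never DBNull at this level
        set
        {
            _companyNameIsNull = (value == null);                     // remember the NULL...
            _companyName = (value == null) ? string.Empty : value;    // ...but hand out a default
        }
    }

    public bool CompanyNameIsNull
    {
        get { return _companyNameIsNull; }
    }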





    Thanks for the pointer on the SOAP formatter.

  • Andres,


    "If you add all the features the DataSet has to your custom Collection, then you'll probably have the same performance overhead ;). Anyway, it's true that you have a performance hit, that is probably not very important if you compare it with the cost of going to the database to fill it."





    I'll have to disagree here, as various benchmarks have shown that marshaling across process boundaries is quite expensive even when working with a relatively small number of records.





    "Mapping schema changes is quite possible with the DataSets. If you map a relational database table directly to a DataSet DataTable, then you are in the wrong path, and handling schema changes is hard."





    I'm glad you stated this, because the average developer will map directly from table to DataSet, and the only way around that is to use a mapper of sorts. But the output of the mapped relationship using a DataSet/DataTables will never be as clean as when pure object associations are used.





    DataSets are great for passing data around, but they really don't do a great job of modeling the domain. Databinding, as highly touted as it is, still has some holes. If that works OK for a solution, then it should be used. I'll be interested to see some statistics in the future on enterprise apps that use DataSets.

  • If you are using a relational database, then your data model is your domain model. You can add an OO layer on top of it, but your 'real' model is the relational model.





    With DataSets that do not map directly to tables, you have updatable views of your model. You can build DataSets for every use case in your app, perhaps with lots of them mapping to the same tables, but each with just the data you need for the use case. So, if you need to display an Order, you don't load all the Customer information, just the customer data you need. O/R mappers will usually load the full Customer object (with eager or lazy loading).





    If you want to expose your domain model to external layers (e.g. webservices), then you have to map your domain model to a 'message' class (as you don't want to expose your domain model's internals). This message class has data from multiple classes, organized in the same way I'm saying you can do with DataSets: just the data you need, organized the way you need it.





    Why do that only for your 'external' users? An ASP.NET programmer is also 'external' to the DAL/BL layer, so why should he get special treatment and full access to my domain model? I'd rather give him just the data he needs, structured the way he needs it.





  • "If you are using a relational database, then your data model is your domain model. You can add a OO layer on top of it, but your 'real' model is the relational model."





    And this is where the philosophies diverge, and we can only agree to disagree.





    "With DataSets that do not map to tables, you have updatable views of your model. Then you can build datasets for every use case on your app, perhaps with lots of them mapping to the same tables, but just with the data you need for the use case. So, if you need to display an Order, you don't load all the Customer information, just the customer data you need. O-R mappers will usually load the full Customer object (with eager or lazy loading). "





    However, this is all based on "canned" SQL statements, all relevant to the use case. What is to stop you from loading just the Order when that's all you need from the O/R mapper? Giving data the way it's required is achievable in both scenarios.

  • It's just not the way you build object models. You can build an unnormalized domain model, where you have the customer name in an Order, but you usually have the Order class containing the Customer class.





    Also, if you build an unnormalized domain model, you are acknowledging that the real model is the relational model, as it's the one that keeps the data consistent.

  • Just thought I would point out that using entity objects doesn't dictate that you have to use object-typed properties when they are nullable. There are three solutions for this (see the sketch after the list):





    a) Have methods such as the typed DataSets provide: IsPropertyNull(), SetPropertyNull()


    b) Create custom classes (NullableInt32, NullableDouble)


    c) Use the SqlTypes (System.Data.SqlTypes); they provide all the functionality from (b). This is a good solution, since the root of the problem is that your type is a SQL data type, not a standard .NET value type.
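
    For example, option (c) could look like this (a sketch using System.Data.SqlTypes):

    using System.Data.SqlTypes;

    public class Employee
    {
        private SqlInt32 _bonus = SqlInt32.Null;   // a real database NULL

        public SqlInt32 Bonus
        {
            get { return _bonus; }
            set { _bonus = value; }
        }
    }

    // Caller: if (employee.Bonus.IsNull) { ... } else { int b = employee.Bonus.Value; }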


  • "However, ObjectSpaces should solve this problem with the addition of OPath queries"





    Is this something MSFT is planning on adding sometime soon?

  • Hey, thought I would step in...





    Could someone fundamentally explain to me why a DataSet would be better or worse than an object model? As I said before, they are both just APIs on top of your data!





    The best loosely coupled architectures definitely spit XML. Then you can build your DataSet on top of your XML message, or deserialize it from your O/R layer. Or, and that's the solution I adopted and feel most comfortable with, have a DataSet and use an O/R layer to build an object model on top of it. That's what I'm using, and it gives very good flexibility, even if the initialization cost is a bit high. But you only initialize when you need it. Get data as XML, then initialize whatever API you want on top of it: XPath, DataSet, or O/R.
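
    For example (a sketch; xmlMessage is a string already in scope, and you need System.Data, System.IO, and System.Xml):

    // One XML message, two APIs on top of it.
    DataSet ds = new DataSet();
    ds.ReadXml(new StringReader(xmlMessage));          // relational view

    XmlDocument doc = new XmlDocument();
    doc.LoadXml(xmlMessage);                           // XPath view
    XmlNode name = doc.SelectSingleNode("//Customer/Name");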





    The good thing is that your XML can then still flow from layer to layer without giving a damn about the CONTEXT of the API.





    Added bonus: change your underlying schema and stay compatible via XSLT. Add someone's web service as another datastore, or aggregate it, and an XSLT will give you what your app was made for. Change your API, and you can plug in an XSLT to get the new format from the old one. Expose several interfaces for your partners from the same internal messages; that's what XSLT is for.





    But as I said, I'm an XML geek, so my reasoning could be completely messed up.

  • This is a great thread! All it's missing is Thona!! ;-)





    I started out in the .NET world thinking that I would much rather have a custom object model, and that DataSets are too bulky and make for a messy API. We set out and built our model with that in mind. We ended up with a decent model, but along the way I realized that when you start thinking of all the scenarios and all the useful features that you would like your object model to have, you end up pretty darn close to typed DataSets with custom DataAdapters. Now, when I say typed DataSets, I don't mean the ones generated by VS.NET from an XSD schema; those typed DataSets are completely useless to me. I mean typed DataSets that I have complete control over and that allow for inheritance, so that my business rules can be implemented in a separate class.





    Sure, you could have your custom objects/collections implement 90% of what the DataSet does for you, and you would have a pretty nice object model without all of the generalization of the DataSet, but I'm not sure it's worth it. I keep going back and forth in my mind, but lately I have been leaning a lot more towards the typed DataSet/custom DataAdapter approach.





    Alright, I have to ask a question, and I'm going to feel like a moron for doing so, but I guess my only excuse is that I am a self-taught developer with no real software engineering education. You guys talk about Fowler like he is a god and quote his book like it's the Bible. Is it really that good? And should I go out right this minute, buy the book, and begin reading?
