More on Lazy Loading vs. Pre-loading in O/R mapping scenarios

Recently I replied to a post on Ayende's blog. I'll quote my reply below:

In general lazy loading is more of a burden than a blessing.

The reason for this is that it leaks persistent storage access to different tiers via the lazy loadable associations. If you want to prevent your UI developers from using lazy loading, or are sending entities across the wire to a service, how do you prevent lazy loading from being called under the hood? We support 2 models: one has lazy loading, the other one doesn't (and is more geared towards disconnected environments).

You don't really miss lazy loading in the second model, as long as you have prefetch paths to specify what you want to prefetch (also into an existing graph). The thing is that the model then forces you to write more service oriented software: make the call to the data producer and tell the data producer (or repository, whatever you want to call it) what to get, and you get the data and work with it. There's no leaky lazy loading under the hood bypassing the repository: you need to call the data producer to get the data, period.

Ayende responded with this post: Lazy loading: The Good, The Bad, And The Evil Witch, in which he disagrees with me and explains why he likes Lazy Loading so much. He's not alone, and there are good reasons why Lazy Loading is sometimes a nice thing to have, which is also why LLBLGen Pro, for example, supports Lazy Loading in one of the two O/R mapping paradigms it offers (SelfServicing; the other paradigm is Adapter). What I disagree with is his analogy between paging and lazy loading: paging provides a limited, temporary window on a resultset, while lazy loading reads additional entity objects on demand into a graph in memory. The difference is that the first (paging) goes through the same channel for every fetch of new data, while the second (lazy loading) first uses the normal channel to fetch data and after that uses functionality 'under the hood' to get the additional data.

This means that the lazy loading functionality bypasses the logic you would otherwise use to obtain data. In short: if you ask a class CustomerRepository to get you the customer entity object myCustomer which is associated with a given order entity object myOrder, you'd expect to use this route every time, because CustomerRepository is then the one place where the code that obtains myCustomer lives. However, if you have myOrder and you can reach the customer through myOrder.Customer, which then loads the entity from the persistent storage via Lazy Loading, CustomerRepository is bypassed. You effectively have two code paths in your application to obtain a customer entity, so you have to duplicate any code which has to be executed on that code path, let alone the limitation that you're no longer able to decide where the customer entity object is obtained from.
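To make the two code paths concrete, here is a minimal sketch in plain Ruby. Everything in it is an assumption made for illustration: FakeStore stands in for the persistent storage and only records who asked for a customer, and the class and method names are hypothetical, not the API of LLBLGen Pro or any other O/R mapper.

```ruby
# Hypothetical sketch: two routes to the same customer entity.
Customer = Struct.new(:id, :name)

# Stand-in for the persistent storage; it logs which code path read from it.
class FakeStore
  attr_reader :log

  def initialize(rows)
    @rows = rows
    @log = []
  end

  def read_customer(id, source)
    @log << source
    Customer.new(id, @rows.fetch(id)[:name])
  end
end

# The route you intended: all customer loading goes through this class.
class CustomerRepository
  def initialize(store)
    @store = store
  end

  def for_order(order)
    @store.read_customer(order.customer_id, :repository)
  end
end

class Order
  attr_reader :customer_id

  def initialize(customer_id, store)
    @customer_id = customer_id
    @store = store
  end

  # Lazy loaded association: goes straight to the store on first access,
  # so nothing in CustomerRepository runs on this path.
  def customer
    @customer ||= @store.read_customer(@customer_id, :lazy_load)
  end
end

store = FakeStore.new({ 1 => { name: "Acme" } })
my_order = Order.new(1, store)

via_repository = CustomerRepository.new(store).for_order(my_order)
via_lazy_load = my_order.customer

puts store.log.inspect # prints [:repository, :lazy_load]
```

Both calls return an equal customer, but only the first one passed through CustomerRepository. Any rule placed there (filtering, caching, choosing a different data source) would have to be repeated on the second path.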

The solution for this is to 'eager load' or 'prefetch' the data up front: you can then always decide how to load the data and with which class, as nothing relies on 'under the hood' code which is triggered automatically and which decides for itself what to use to obtain the data and from where. This isn't without disadvantages though: if you don't know up front which data the user will use, you probably load too much data up front. An example is a form which shows a list of customers, where the user wants to view a couple of orders of a few of the customers on the screen. Which orders, and of which customers? You don't know up front. So you could decide to leave that to lazy loading: when the user opens the 'Orders' collection on a given customer, load it on the fly. You could also decide to fetch all orders up front for the customers on the form, which can be done in 2 queries: one for the customers and one for the orders. This could be more efficient if the user decides to view a lot of orders from different customers, but up front this is unknown.
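The trade-off between the two approaches on the customer form can be sketched by counting queries and rows. This is a hypothetical illustration in plain Ruby: FakeDb is an in-memory stand-in for a database, the sample data is made up, and the numbers only show the shape of the trade-off, not real performance.

```ruby
# Hypothetical in-memory "database" which counts queries and rows read.
class FakeDb
  attr_reader :query_count, :rows_read

  def initialize(customers, orders)
    @customers = customers
    @orders = orders
    @query_count = 0
    @rows_read = 0
  end

  def all_customers
    record(@customers)
  end

  def orders_for(customer_ids)
    record(@orders.select { |o| customer_ids.include?(o[:customer_id]) })
  end

  private

  # Counts one query and the rows it returned, then hands the rows back.
  def record(rows)
    @query_count += 1
    @rows_read += rows.size
    rows
  end
end

CUSTOMERS = [{ id: 1, name: "Acme" }, { id: 2, name: "Initech" }, { id: 3, name: "Globex" }]
ORDERS = [{ id: 10, customer_id: 1 }, { id: 11, customer_id: 1 }, { id: 12, customer_id: 3 }]

# Lazy loading: one query for the list, plus one per customer the user opens.
def stats_with_lazy_loading(opened_count)
  db = FakeDb.new(CUSTOMERS, ORDERS)
  db.all_customers.first(opened_count).each { |c| db.orders_for([c[:id]]) }
  { queries: db.query_count, rows: db.rows_read }
end

# Prefetching: always two queries, and always all orders of all listed customers.
def stats_with_prefetch
  db = FakeDb.new(CUSTOMERS, ORDERS)
  customer_ids = db.all_customers.map { |c| c[:id] }
  db.orders_for(customer_ids).group_by { |o| o[:customer_id] }
  { queries: db.query_count, rows: db.rows_read }
end

lazy_one_opened = stats_with_lazy_loading(1)
lazy_all_opened = stats_with_lazy_loading(3)
prefetched = stats_with_prefetch

puts lazy_one_opened.inspect
puts lazy_all_opened.inspect
puts prefetched.inspect
```

With this sample data, opening a single customer lazily costs as many queries as prefetching but reads fewer rows, while opening all three customers lazily costs twice as many queries for the same rows. Which of the two wins depends on what the user ends up clicking, which is exactly what is unknown up front.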

This doesn't mean that lazy loading is therefore a great thing per se: you could also decide to load the orders for the selected customers manually, for example by using the repository classes you have defined, or the webservices you use in your application. The lack of that choice alone makes the scope in which lazy loading is really useful pretty limited. It's therefore of the utmost importance that you determine whether your application is really suitable for lazy loading, and if it isn't, don't use it. A good O/R mapping solution always lets you do pre-fetching/eager loading of entity graphs without lazy loading.

4 Comments

  • Define aggregate roots to avoid multiple paths to the data (Domain Driven Design). And lazy loading isn't very suitable for service orientation, but it is what you use underneath your service layer (which is in effect a client technology, and therefore you should have a domain that is independent of the service layer) to save coding and keep things encapsulated.

    ORM and lazy loading can be a complex beast, but it can be controlled, and I am convinced it is worth it.

  • There may be a number of paths to load a model, but that doesn't necessarily mean duplicated code.

    I really like the lazy loading in Ruby on Rails models. They also support prefetch paths. It is the cleanest implementation I have seen so far, and it is so elegant to use.

    Prefetch:

    articles = Articles.find(:all, :conditions => ..., :include => :author)

    Lazyload:

    books = articles[0].author.books

    Of course it isn't good to expose these models outside the service boundary, so lazy loading is not a problem in this regard.

  • And if you're not going to do an aggregate root and properly limit access to one path, then at the very least the behind-the-scenes call that the order makes to load up the customer should still use the customer repository.

    Lazy loading should not use a different access path than normal loading. That's a flaw in the framework, not in the lazy loading approach itself.

  • There's been a healthy discussion...
