Linq to LLBLGen Pro: feature highlights, part 2

Thursday, July 3, 2008

In the first part of this series I talked about the fact that Linq to LLBLGen Pro is a full implementation of Linq and why it's so important to use a full Linq provider instead of a half-baked one. Today, I'll discuss a couple of native LLBLGen Pro features we've added to our Linq provider via extension methods: hierarchical fetches and exclusion of entity fields in a query. A few other features will be in the spotlight as well. I also want to highlight that using an O/R mapper is more than just filling dumb classes with dumb data: it's entity management, and the O/R mapper framework should offer you the tools to manage and do whatever you want with the entity graph in memory, with as few problems and as little friction as possible. After all, your task isn't writing infrastructure code, entity classes or the code that makes them interact with each other; your task is to write code which consumes and works with these classes. You should be able to work on that code from the get-go, as that's what your client expects from you.

Exclusion / inclusion of entity fields in a query
The first feature I want to highlight today is the exclusion of entity fields in a query. Say you want to fetch a set of entities and the entities contain one or more large fields, e.g. a blob/image field or a text/clob field. If you don't need these large fields, it's useless to fetch them in your query: transporting the large data (which can be many megabytes) can make the query a slow performer, as all that data has to be read by the database and sent over the wire. LLBLGen Pro has a feature called Exclusion / Inclusion of fields, which allows you to exclude a set of fields of an entity when fetching one or more instances of that entity (exclusion). You can also specify just the fields you want (inclusion), for example if you want to fetch only a few fields of an entity which has a lot of fields. If you later want to fetch the excluded fields into the entities after all, that's possible too: LLBLGen Pro offers a special mechanism which efficiently fetches the excluded field data into the existing entities. We'll see an example of that later on in this post.
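
To make the difference between the two concrete, here's a minimal sketch of which columns end up in the SELECT list with exclusion versus inclusion. It's plain C# without any LLBLGen Pro types: the Exclude/IncludeOnly/SelectFrom helpers and the SELECT text are hypothetical illustrations, the field list is only a subset of the Northwind Employee fields, and always keeping the primary key in the inclusion case is an assumption of this sketch, not documented behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A subset of the Northwind Employee fields, used for illustration only.
string[] employeeFields =
{
    "EmployeeId", "LastName", "FirstName", "Title", "BirthDate",
    "HireDate", "Photo", "Notes", "PhotoPath"
};

// Exclusion: every field except the named ones.
List<string> Exclude(IEnumerable<string> allFields, params string[] excluded) =>
    allFields.Where(f => !excluded.Contains(f)).ToList();

// Inclusion: only the named fields. Keeping the primary key is an assumption
// of this sketch, so the partially fetched entities can still be identified.
List<string> IncludeOnly(IEnumerable<string> allFields, string primaryKey, params string[] included) =>
    allFields.Where(f => f == primaryKey || included.Contains(f)).ToList();

// Builds a (hypothetical) SELECT statement for a column list.
string SelectFrom(IEnumerable<string> columns) =>
    "SELECT " + string.Join(", ", columns) + " FROM Employees";

List<string> excludedColumns = Exclude(employeeFields, "Photo", "Notes");
List<string> includedColumns = IncludeOnly(employeeFields, "EmployeeId", "FirstName", "LastName");

Console.WriteLine(SelectFrom(excludedColumns));
// SELECT EmployeeId, LastName, FirstName, Title, BirthDate, HireDate, PhotoPath FROM Employees
Console.WriteLine(SelectFrom(includedColumns));
// SELECT EmployeeId, LastName, FirstName FROM Employees
```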

For this example, we'll fetch a set of Northwind Employee instances. The Northwind Employee entity has two large fields: an image (Photo) and an ntext field (Notes). Initially we'll fetch all the Employee entities using Linq and exclude the two fields, Photo and Notes:

// Listing 1
EntityCollection<EmployeeEntity> employees = null;
using(DataAccessAdapter adapter = new DataAccessAdapter())
{
    LinqMetaData metaData = new LinqMetaData(adapter);
    var q = (from e in metaData.Employee
                select e).ExcludeFields(e => e.Photo, e => e.Notes);

    // consume 'q' here. Use the Execute method to return an entity collection.
    employees = ((ILLBLGenProQuery)q).Execute<EntityCollection<EmployeeEntity>>();
}

The code uses lambda expressions to specify the fields to exclude, so the field references are checked at compile time. Later on, we'll see how fields can also be excluded in hierarchical fetches. One could argue that the same can be achieved with a projection using a select new { } clause, either onto an anonymous type or onto the EmployeeEntity type itself. That's true in theory, but it will likely be more work (you have to specify all the fields you do want) and it also uses a different pipeline internally (namely the one for custom types fetched through a projection), not the entity fetch pipeline.

This might sound strange, but fetching entities is more than just putting data into a class instance. The biggest hurdle is inheritance. With a projection using new, the instances to create are known up front: they're instances of the type specified in the projection, be it an anonymous type or a specific type. With entity fetches this is different: the type to instantiate is determined from the data received from the database. And what if the type specified in the projection isn't a known entity type? How can the system then create an instance of a subtype of that type when the data received from the database belongs to a subtype? The entity fetch pipeline could only be used if the developer specified a known entity type, and that's not always the case: the developer is allowed to specify any type, including anonymous types, as shown in the following example:

// Listing 2
using(DataAccessAdapter adapter = new DataAccessAdapter())
{
    LinqMetaData metaData = new LinqMetaData(adapter);
    var q = from e in metaData.Employee
               select new { 
                     e.EmployeeId,
                     e.FirstName,
                     e.LastName,
                     e.Title,
                     e.TitleOfCourtesy,
                     e.BirthDate,
                     e.HireDate,
                     e.Address,
                     e.City,
                     e.Region,
                     e.PostalCode,
                     e.Country,
                     e.HomePhone,
                     e.Extension,
                     e.ReportsTo,
                     e.PhotoPath,
                     e.RegionId
               };

    // consume 'q' here.
}

Here, we fetch the same data, but into an anonymous type using a new projection. We omit the two big fields, so effectively we're excluding Photo and Notes. However, what if Employee were an entity type in an inheritance hierarchy and the row returned from the database were for a subtype of Employee, e.g. SalesManager? Then I would exclude more than just Photo and Notes, as I would also leave out the fields defined in SalesManager. With the ExcludeFields() extension method used in the first example, that's not the case: if Employee is in an inheritance hierarchy, all subtypes are fetched as their own types with all their fields, and only their Photo and Notes fields are left empty, as specified.
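
The difference between the two pipelines can be sketched in plain C#. The sketch assumes a hypothetical SalesManager subtype with an extra SalesRegion field and models database rows as dictionaries with a Type discriminator column; FetchAsEntity and Project are illustrative stand-ins for the two pipelines, not LLBLGen Pro's materialization code. The entity path picks the type per row and drops only the excluded fields, while the projection's shape is fixed up front, so the subtype's extra field is lost.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Fields per entity type in a hypothetical Employee/SalesManager hierarchy.
var fieldsPerType = new Dictionary<string, string[]>
{
    ["Employee"] = new[] { "EmployeeId", "LastName", "Photo", "Notes" },
    ["SalesManager"] = new[] { "EmployeeId", "LastName", "Photo", "Notes", "SalesRegion" },
};

// Rows as returned by the database; "Type" acts as the discriminator column.
// (A real query with exclusion wouldn't select Photo/Notes at all.)
var rows = new List<Dictionary<string, string>>
{
    new() { ["Type"] = "Employee", ["EmployeeId"] = "1", ["LastName"] = "Davolio",
            ["Photo"] = "<image>", ["Notes"] = "<text>" },
    new() { ["Type"] = "SalesManager", ["EmployeeId"] = "2", ["LastName"] = "Fuller",
            ["Photo"] = "<image>", ["Notes"] = "<text>", ["SalesRegion"] = "North" },
};

// Entity pipeline: the type comes from each row, and only the excluded
// fields are skipped, whatever subtype the row belongs to.
(string Type, Dictionary<string, string> Fields) FetchAsEntity(
    Dictionary<string, string> row, ISet<string> excluded)
{
    string type = row["Type"];
    var fields = fieldsPerType[type]
        .Where(f => !excluded.Contains(f))
        .ToDictionary(f => f, f => row[f]);
    return (type, fields);
}

// Projection pipeline: the shape is fixed by the projection, so fields
// that only exist in a subtype are dropped as well.
string[] projectedFields = { "EmployeeId", "LastName" };
Dictionary<string, string> Project(Dictionary<string, string> row) =>
    projectedFields.ToDictionary(f => f, f => row[f]);

var excludedFields = new HashSet<string> { "Photo", "Notes" };
var entities = rows.Select(r => FetchAsEntity(r, excludedFields)).ToList();
var projections = rows.Select(r => Project(r)).ToList();

foreach (var (entityType, entityFields) in entities)
    Console.WriteLine($"{entityType}: {string.Join(", ", entityFields.Keys)}");
foreach (var projection in projections)
    Console.WriteLine($"projection: {string.Join(", ", projection.Keys)}");
```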

As all Employee instances have two fields left empty, we need a way to fetch the data for these fields into the entities later on, if it turns out to be required. Let's say I consumed the query in Listing 1 and fetched it into an entity collection of Employee instances, and I now want to fetch the Photo and Notes data for the employees from the UK in that collection. Using a lambda filter which runs in memory, I'll create an entity view on the employees collection which contains only the UK employees. Creating a view is like creating a DataView on a DataTable: it's a view on a normal collection, and you can filter it, sort it and project it onto another object. Creating a view doesn't affect the original collection. I can also create multiple views on the same collection, with different filters and different sort orders. The nice thing about this is that I can bind each view to a different control, and it will look like I have multiple collections while I have only one. As a view is a view on a live collection, modifications to the collection are shown in the view as well.
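
The "live view" aspect can be illustrated with a minimal sketch: a view stores only a filter and re-evaluates it against the underlying list each time it's read, so nothing is copied and later changes to the list show up in every view. This is just an analogy built on LINQ's deferred execution, with hypothetical data and a hypothetical CreateView helper; EntityView2 also handles sorting, projections, data binding and change notification, none of which this sketch reproduces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory collection.
var employees = new List<(string Name, string Country)>
{
    ("Davolio", "USA"), ("Buchanan", "UK"), ("Suyama", "UK"), ("Fuller", "USA"),
};

// A "view" stores only the filter; it's evaluated against the live list on every read.
Func<IEnumerable<(string Name, string Country)>> CreateView(
    Func<(string Name, string Country), bool> filter) =>
    () => employees.Where(filter);

var ukView = CreateView(e => e.Country == "UK");
var usaView = CreateView(e => e.Country == "USA");

Console.WriteLine(string.Join(", ", ukView().Select(e => e.Name)));  // Buchanan, Suyama

// Changing the underlying collection is reflected in every view on it.
employees.Add(("Dodsworth", "UK"));
employees.RemoveAll(e => e.Name == "Fuller");

Console.WriteLine(string.Join(", ", ukView().Select(e => e.Name)));  // Buchanan, Suyama, Dodsworth
Console.WriteLine(string.Join(", ", usaView().Select(e => e.Name))); // Davolio
```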

Listing 3 shows how to create the view on the employees collection with a filter on the Country field, which is specified as a lambda and runs in memory. The view is then exported as a new collection, and that collection is used to fetch the Photo and Notes field data into the entities it contains. We have to specify the set of excluded fields again, as this information isn't stored inside the entities; this gives us the flexibility to choose which of the excluded fields to fetch. The code which fetches the excluded fields doesn't use Linq, as formulating that query in Linq would have been a bit awkward.

// Listing 3
// create a view. Adapter uses EntityView2, SelfServicing uses EntityView
EntityView2<EmployeeEntity> employeesFromUkView = 
            new EntityView2<EmployeeEntity>(employees, e => e.Country == "UK");
// create new collection with the data of the view (same entity instances)
EntityCollection<EmployeeEntity> employeesFromUk = 
      (EntityCollection<EmployeeEntity>)employeesFromUkView.ToEntityCollection();
// create the set of excluded fields to fetch, using a collection initializer
ExcludedFieldsList fieldsToFetch = new ExcludedFieldsList() { EmployeeFields.Photo, EmployeeFields.Notes};
// fetch the fields into the entities, using efficient batch queries and merging techniques
using(DataAccessAdapter adapter = new DataAccessAdapter())
{
    adapter.FetchExcludedFields(employeesFromUk, fieldsToFetch);
}

After Listing 3 has run, the Employee entity instances in employeesFromUk have their Photo and Notes fields filled with data. As the view is just a view on an existing collection, the Employee instances in employeesFromUk are the same instances as the ones in the original collection, so we effectively fetched the Photo and Notes fields for a subset of the entities in the original collection. Exclusion of fields will re-appear in the next section, about Prefetch Paths.
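
The article doesn't spell out how FetchExcludedFields batches and merges its queries, so the following is only a sketch of one plausible approach, assuming a primary-key IN list per batch: the in-memory entities are indexed by primary key, the keys are sent in batches, and every returned row is merged into the entity with the same key. The table contents, the batch size of 3 and FetchBatch are hypothetical, and the SQL is merely printed (a real implementation would use parameters).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Employees table: EmployeeId -> (Photo, Notes).
var employeeTable = new Dictionary<int, (string Photo, string Notes)>
{
    [5] = ("<photo 5>", "Notes for 5"),
    [6] = ("<photo 6>", "Notes for 6"),
    [7] = ("<photo 7>", "Notes for 7"),
    [9] = ("<photo 9>", "Notes for 9"),
};

// Entities already in memory, fetched earlier with Photo and Notes excluded.
var employeesFromUk = new (int Id, string Name, string? Photo, string? Notes)[]
{
    (5, "Buchanan", null, null), (6, "Suyama", null, null),
    (7, "King", null, null), (9, "Dodsworth", null, null),
};

int queryCount = 0;

// One simulated round trip for a batch of primary keys.
List<(int Id, string Photo, string Notes)> FetchBatch(int[] ids)
{
    queryCount++;
    Console.WriteLine(
        $"SELECT EmployeeId, Photo, Notes FROM Employees WHERE EmployeeId IN ({string.Join(", ", ids)})");
    return ids.Where(id => employeeTable.ContainsKey(id))
              .Select(id => (Id: id, Photo: employeeTable[id].Photo, Notes: employeeTable[id].Notes))
              .ToList();
}

// Index the in-memory entities by PK so fetched rows can be merged into them.
var positionById = new Dictionary<int, int>();
for (int i = 0; i < employeesFromUk.Length; i++)
    positionById[employeesFromUk[i].Id] = i;

const int batchSize = 3;
int[] allIds = employeesFromUk.Select(e => e.Id).ToArray();
for (int start = 0; start < allIds.Length; start += batchSize)
{
    int[] batch = allIds.Skip(start).Take(batchSize).ToArray();
    foreach (var row in FetchBatch(batch))
    {
        // Merge: fill the excluded fields of the entity with the same PK.
        int position = positionById[row.Id];
        employeesFromUk[position].Photo = row.Photo;
        employeesFromUk[position].Notes = row.Notes;
    }
}

Console.WriteLine($"{queryCount} queries for {employeesFromUk.Length} entities");
```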

Hierarchical fetching of entity graphs using Prefetch Paths
One core part of working with entities is the ability to fetch graphs of entities efficiently. A graph of entities contains entities of multiple types which are related to each other. A typical example is a set of Customer entities which have their Orders collection filled, where each Order entity has its OrderDetails collection filled and also refers to its related Employee entity. LLBLGen Pro has offered the ability to fetch these kinds of graphs efficiently for a long time now, and we've extended this to the Linq provider as well. In LLBLGen Pro this feature is called Prefetch Paths, and it's similar to spans (ObjectSpaces), Include (Entity Framework) and, to some extent, LoadOptions (Linq to Sql), although all of these are fairly limited compared to Prefetch Paths. LLBLGen Pro's Linq provider offers two ways to specify Prefetch Paths, and I'll use the more Linq-esque one: extension methods written by Jeremy Skinner, which are included in the Linq to LLBLGen Pro provider.

I'll specify a fetch for the graph: Customer - Order - OrderDetails, Order - Employee. This is a multi-branch path, with 4 different nodes: Customer, Order, OrderDetails and Employee. LLBLGen Pro will therefore fetch this whole graph in just 4 queries, one for Customer, one for Order, one for Employee and one for OrderDetails. It will fetch only the data required for the graph and will merge the entities in-memory.
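
The idea of one query per path node plus an in-memory merge can be sketched with made-up, in-memory tables. Query is a hypothetical stand-in for a database round trip; each child node is filtered on the key values found in the parent node's results, and the merge uses lookups keyed on the foreign key. This isn't LLBLGen Pro's code, but it shows why the number of queries equals the number of nodes in the path, regardless of how many entities are fetched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up, in-memory stand-ins for the Customers, Orders, Order Details and Employees tables.
var customerTable = new List<(string Id, string Country)>
{
    ("ALFKI", "Germany"), ("BLAUS", "Germany"), ("AROUT", "UK"),
};
var orderTable = new List<(int Id, string CustomerId, int EmployeeId)>
{
    (1, "ALFKI", 6), (2, "ALFKI", 4), (3, "BLAUS", 9), (4, "AROUT", 6),
};
var orderDetailTable = new List<(int OrderId, int ProductId)>
{
    (1, 28), (1, 39), (2, 63), (3, 54), (4, 24),
};
var employeeTable = new List<(int Id, string LastName)>
{
    (4, "Peacock"), (6, "Suyama"), (9, "Dodsworth"),
};

int queryCount = 0;

// Stands in for one database round trip: SELECT ... WHERE <filter>.
List<T> Query<T>(IEnumerable<T> table, Func<T, bool> filter)
{
    queryCount++;
    return table.Where(filter).ToList();
}

// Node 1, the root: Customer.
var customers = Query(customerTable, c => c.Country == "Germany");
// Node 2: Order, filtered on the PKs of the fetched customers.
var customerIds = customers.Select(c => c.Id).ToHashSet();
var orders = Query(orderTable, o => customerIds.Contains(o.CustomerId));
// Node 3: OrderDetails, filtered on the PKs of the fetched orders.
var orderIds = orders.Select(o => o.Id).ToHashSet();
var orderDetails = Query(orderDetailTable, d => orderIds.Contains(d.OrderId));
// Node 4: Employee, filtered on the FK values found in the fetched orders.
var employeeIds = orders.Select(o => o.EmployeeId).ToHashSet();
var employees = Query(employeeTable, e => employeeIds.Contains(e.Id));

// Merge in memory: index each fetched set once, then walk the graph.
var ordersByCustomer = orders.ToLookup(o => o.CustomerId);
var detailsByOrder = orderDetails.ToLookup(d => d.OrderId);
var employeeById = employees.ToDictionary(e => e.Id);

foreach (var customer in customers)
{
    Console.WriteLine(customer.Id);
    foreach (var order in ordersByCustomer[customer.Id])
        Console.WriteLine($"  order {order.Id}: employee {employeeById[order.EmployeeId].LastName}, " +
                          $"{detailsByOrder[order.Id].Count()} detail row(s)");
}
Console.WriteLine($"{queryCount} queries");
```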

The Prefetch Path execution code uses some optimization techniques under the hood; for example, it uses a parameterized IN list instead of a subquery if the number of parent entities is below a given, settable threshold. Say you're fetching all Customer entities from Germany and their Order instances. You can fetch the Order instances with an IN filter on Order.CustomerId and a subquery on Customer (with the filter on Country), but you can also create an IN filter with just the PK values of the Customers already fetched. The latter is much more efficient when the number of parent entities (here Customer) is small (say below 100). The framework decides this for itself, so you don't have to specify anything. The framework doesn't use joins for fetching path nodes, because joins are less efficient due to the duplication of data (every parent row is repeated for each of its child rows) and also cause problems in multi-branched paths.
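
The threshold decision can be sketched as a function that builds the WHERE clause for the Order node from the PKs of the Customer entities already fetched. The SQL shape, the BuildOrderFilter name, the strict "below" comparison and the threshold value of 50 (the default mentioned later in this post) are illustrative assumptions; the SQL LLBLGen Pro actually emits may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Builds the WHERE clause for the Order node, given the PKs of the Customer
// entities already fetched. Names, SQL shape and threshold are illustrative.
string BuildOrderFilter(IReadOnlyList<string> parentPks, int threshold)
{
    if (parentPks.Count < threshold)
    {
        // Few parents: one parameter per PK value that is already in memory.
        var parameters = Enumerable.Range(1, parentPks.Count).Select(i => "@p" + i);
        return $"WHERE [Orders].[CustomerID] IN ({string.Join(", ", parameters)})";
    }

    // Many parents: re-apply the parent query's filter as a subquery instead.
    return "WHERE [Orders].[CustomerID] IN " +
           "(SELECT [CustomerID] FROM [Customers] WHERE [Country] = @p1)";
}

string[] fewCustomers = { "ALFKI", "BLAUS", "DRACD" };
string[] manyCustomers = Enumerable.Range(1, 120).Select(i => $"C{i:000}").ToArray();

Console.WriteLine(BuildOrderFilter(fewCustomers, threshold: 50));
// WHERE [Orders].[CustomerID] IN (@p1, @p2, @p3)
Console.WriteLine(BuildOrderFilter(manyCustomers, threshold: 50));
// WHERE [Orders].[CustomerID] IN (SELECT [CustomerID] FROM [Customers] WHERE [Country] = @p1)
```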

In Listing 4, we're fetching all Customer instances from Germany and their Orders, the Order's OrderDetails and the Employees who filed the Orders. Also, we're excluding Photo and Notes from the Employee instances fetched. Everything is merged for us by the framework so the end result is a collection of Customer instances and their related entities available through navigational properties (e.g. customer.Orders, order.Employee, order.OrderDetails), using just 4 queries!

// Listing 4
using(DataAccessAdapter adapter = new DataAccessAdapter())
{
    LinqMetaData metaData = new LinqMetaData(adapter);
    var q = (from c in metaData.Customer
             where c.Country == "Germany"
             select c).WithPath<CustomerEntity>(cpath => cpath
                    .Prefetch<OrderEntity>(c => c.Orders)
                        .SubPath(opath=>opath
                            .Prefetch(o=>o.OrderDetails)
                            .Prefetch<EmployeeEntity>(o => o.Employee).Exclude(e => e.Photo, e => e.Notes)));

    // consume 'q' here.
}

Ok, let's break it down to discuss what happens here. As a Linq query is a chain of calls to extension methods, and the Prefetch Path to use is a multi-branched path, we need a way to specify these multiple branches in a single statement. This is done by chaining multiple path definitions together with SubPath and Prefetch. The first few lines of the query are pretty straightforward: a query on Customer, with a filter on Country and a projection which selects the Customer instance. Added to that is a call to WithPath, an extension method of Linq to LLBLGen Pro.

WithPath is a method which allows you to specify a Prefetch Path to be used with the query you call it on, in this case the query on Customer filtered on country. Using a lambda expression, we define the path edge Customer - Order with the Prefetch method on the path variable. We specify what to fetch, namely Customer.Orders, and then continue on the same path branch by specifying the path below Order, using the SubPath method. This method specifies new path edges below the path edge it is called on. We define a new path edge for Order - OrderDetails, using Prefetch again (a shortcut version without generics), and we also define a second branch in the path, for Order - Employee. On that path edge we call the Exclude extension method, to specify that the Employee instances fetched with this path should have their Photo and Notes fields excluded, as they're big and aren't needed for now.

Besides Exclude, there are more methods which can be called on a path edge. You can specify a filter for the path edge: if you only wanted the orders placed before a given OrderDate in the above query, you could specify a lambda filter on the .Prefetch<OrderEntity>(c => c.Orders) line using FilterOn, and that filter would of course be run inside the database. Furthermore, you can specify a limiter (only fetch n instances) and a sort specification to order the fetched set. And paging with Prefetch Paths? Sure, paging is supported together with Prefetch Paths as well, as long as the page size is smaller than the configured threshold. By default the threshold is set to 50, but you can adjust it to whatever you like with a property on the DataAccessAdapter instance. So if I add the following line below the query declaration in Listing 4:

q = q.TakePage(2, 3);

the framework will fetch page 2, with a page size of 3, of the complete set of Customer instances from Germany. The 3 Customer instances will be fetched together with their related entities as defined in the Prefetch Path.
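
Here's a sketch of what TakePage(2, 3) selects and how the threshold rule applies, assuming 1-based page numbers, a stable ordering of the Customer rows and made-up customer IDs. Because the page size stays below the threshold, the Order node can presumably be fetched with a short IN list of just the page's PKs. The TakePage below is a hypothetical array helper, not the Linq to LLBLGen Pro extension method.

```csharp
using System;
using System.Linq;

// Made-up customer IDs, assumed to be in a stable sort order.
string[] germanCustomerIds =
    { "ALFKI", "BLAUS", "DRACD", "FRANK", "KOENE", "LEHMS", "MORGK", "OTTIK" };

// Hypothetical helper with 1-based page numbers: page n of size s skips (n - 1) * s rows.
string[] TakePage(string[] source, int pageNumber, int pageSize) =>
    source.Skip((pageNumber - 1) * pageSize).Take(pageSize).ToArray();

const int prefetchPathThreshold = 50; // the default mentioned in the text
int requestedPageSize = 3;
if (requestedPageSize >= prefetchPathThreshold)
    throw new InvalidOperationException("Page size must be smaller than the threshold.");

string[] page = TakePage(germanCustomerIds, pageNumber: 2, pageSize: requestedPageSize);
Console.WriteLine(string.Join(", ", page)); // FRANK, KOENE, LEHMS

// The Order node is then fetched for just the PKs on this page.
var parameters = page.Select((_, index) => "@p" + (index + 1));
Console.WriteLine($"WHERE [Orders].[CustomerID] IN ({string.Join(", ", parameters)})");
```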

As everything is inside a graph, I can navigate that graph using normal property navigation. Also, because all collections of entities inside entities (e.g. customer.Orders) are entity collections, I can create entity views on them, similar to what I've shown above, and filter, sort and project them in memory without touching the original collection. Don't make the mistake of thinking this is the same as running a Linq to Objects query on the collection: if I bind an entity view to a grid and add a row (which is a new entity), it's added to the collection. If I remove an entity from the collection and it happens to be in, say, 3 entity views, it's removed from those 3 views as well. An entity view is a live view on a subset of the entity collection, which behaves as if you're working with the collection itself.

Prefetch Paths of course support inheritance and are fully polymorphic. This means that you can specify path branches which are solely for some subtypes of a given entity fetched. This way, you're able to specify very powerful paths to fetch complex graphs with very little code.

There is another form of hierarchical fetch, using nested queries inside the projection, which I described briefly last time and in more detail in part 14 of the Developing Linq to LLBLGen Pro articles, so I won't rehash it here. I'd recommend reading part 14 if you're interested in how this works behind the scenes and why our mechanism is more efficient than, say, the one inside Linq to Sql.

Next time I'll discuss in more depth the advanced method mapping capabilities in Linq to LLBLGen Pro, which map .NET constructs onto database constructs, and I'll also give an example of how LLBLGen Pro's authorization feature works nicely with Linq queries thanks to our Dependency Injection framework, so you can exclude entities, hide data etc. based on the user using the data, through authorizers you write yourself. Stay tuned!
