Linq to LLBLGen Pro: feature highlights, part 1

Tuesday, June 17, 2008

Some people asked me what the highlights are of Linq to LLBLGen Pro, which was released this week, as it seems that Linq support is apparently growing on trees these days. In this and some future posts I'll try to sum up some of the characteristic features of Linq to LLBLGen Pro, so you don't have to wade through the 15 articles I wrote about writing Linq to LLBLGen Pro . I'll write several of these articles, this is the first one. I hope to write more of them in the coming weeks.

Linq to LLBLGen Pro is a full implementation of Linq
The first feature I'd like to highlight is the simple fact that it's a full implementation of a Linq provider. You now might think "Isn't that obvious? It is a Linq provider", but I've to spoil that dream for you: Most Linq 'providers' out there are just implementations of a small subset of what a Linq provider is expected to do. You see, implementing a Linq provider isn't just about writing a handler for MethodCall expressions to Queryable's extension methods. That's just a small part of it. The main part is about writing code which supports everything you can run into when traversing and handling an Expression tree. Everything. There's no room for compromises, sadly enough: if you don't implement a feature offered to the developer through Linq, it might be the developer is unable to write the query with Linq, as you can't mix and match things to form a SQL query in the end: the query is written in Linq or it's not.

If you run the risk of getting exceptions at runtime because the expression tree contains unsupported elements or constructs or worse: whole subtrees which aren't understood, would you use such a Linq provider? You might think this isn't a big deal, but the thing is: you only know exactly which expression tree is generated and passed to the Linq provider when you run the code: it is perfectly possible that the provider only expects Queryable extension method calls at the root of the tree, and it for example doesn't expect a 'Where' call at one side of a Join for example.

Writing a Linq provider is a lot of work which requires a lot of code. If you're dealing with a Linq provider which is just, say, 32KB in size, you can be sure it will not support the majority of situations you will run into. However, the O/R mapper developer likely simply said 'We have Linq support', and it's even likely the provider can handle the more basic examples of a single entity type fetch with a Where, an Order By or even a Group By. But in real life, once you as a developer have tasted the joy of writing compact, powerful queries using Linq, you will write queries with much more complexity than these Linq 101 examples. Will the Linq 'provider' you chose be able to handle these as well? In other words: is it a full Linq provider or, as some would say, a 'toy' ?

For developers who will use a Linq provider, it's often a tough call how to decide what will be a solid Linq provider to work with and how much 'Linq' is actually implemented and supported by the used Linq provider. Below I've added a list of questions you can ask yourself when you're testing out a Linq provider. Perhaps some aren't important to you now, but consider that a software project often lasts for several years: most time on a software project is spend during maintenance, so the provider chosen has to be able to deal with many, many cases, and you shouldn't be forced to swap out the provider (and thus the O/R mapper) later on.

It can be the O/R mapper doesn't support a given feature in general, e.g. UNION queries. It's then unlikely that the Linq provider will support the feature. That's not the set of problems I'm talking about here: what I'm talking about are queries which are expected to work as Linq queries, considering the feature set of the O/R mapper (e.g. it supports paging, so paging through Linq queries should also work) but are failing at runtime due to the lack of support in the Linq provider.

The list below is far from complete, I'll update it if more topics are brought forward. I've compiled this list mainly from memory from my experiences when writing Linq to LLBLGen Pro where countless hours have been spend on answering 'what if?' questions related to what is possible with Linq.

Can it do joins?. Linq has two types of joins (it also has GroupJoins which are discussed further below)
```
// Type A: typical cross join
var q = from c in md.Customer
           from o in md.Order
           where c.CustomerId==o.CustomerId
           select c;
```
```
// Type B: real join clauses
var q = from c in md.Customer
          join o in md.Order on c.CustomerId equals o.CustomerId
          select c;
```
Type A results in a MethodCall expression to SelectMany(), Type B results in a MethodCall expression to Join(). SelectMany() is typically handled through a CROSS JOIN SQL statement, Join() is typically handled through INNER JOIN SQL statements (or non-ansi joins if the database requires it, like Oracle 8i). If the Linq provider doesn't handle type B, you're stuck with cross joins which can seriously slow down your queries at runtime. Also keep in mind that with type B you can only create INNER JOINs. To produce LEFT / RIGHT joins, you need support for GroupJoin and DefaultIfEmpty.
Can it handle GroupJoin and DefaultIfEmpty ?. A GroupJoin is a Linq specific type of join: it's a combination of group by and a join. As it doesn't really have a SQL equivalent, it's a bit difficult to translate it to SQL. GroupJoin is one of the extension methods of Queryable which isn't implemented in many Linq providers out there. However, you need it if you want to do LEFT/RIGHT joins, as it works together with DefaultIfEmpty:
```
var q = from c in md.Customer
          join o in md.Order on c.CustomerId equals o.CustomerId into co
          from x in co.DefaultIfEmpty()
          where x.OrderId == null
          select c;
```
This query produces a LEFT JOIN between Customer and Order and filters out any customer with one or more orders. LEFT / RIGHT JOINs are a vital part of any querying system, and therefore if you can't express these kind of queries, you're likely going to have a struggle writing efficient code, or you're forced to use another querying API than Linq which makes the whole point of using Linq in the first place rather moot. The biggest pain with GroupJoin is that the actual GroupJoin resides in one part of the expression tree, but the reference where it is actually being used, the DefaultIfEmpty, is in another part of the expression tree. This might not sound like a big deal, but it can be the GroupJoin is behind an alias border, which means the GroupJoin isn't directly reachable as it seems, but it should be reachable as the DefaultIfEmpty requires it: as if you pull one subtree into another part of the expression tree however with the scope of the new location. This often results in aliasing nightmares and other complex problems.
Does it support Linq on all supported databases? A typical O/R mapper supports more than one database, for example SqlServer, Oracle, DB2, MySql etc. Is Linq supported on all these databases? Or is Linq only supported on a subset of these databases? One great point of Linq is that you can write queries without having to worry about database specific issues, at least, if the Linq provider offers enough features.
Can it handle boolean values anywhere in the Linq query, also on other databases? Boolean values aren't supported in SQL, at least not in the SQL standard which is implemented in most databases. Sure, WHERE clause predicates are boolean expressions, but ever tried to use a boolean expression in the SELECT clause ? Another point is that databases like Oracle don't have support for types which can be mapped as booleans: it lacks a bit type. Does the Linq provider offer the ability to map any field in an entity to any type possible, so you can use that field with that type in a Linq query, no matter what database is used? In Linq to LLBLGen Pro I can write the following query and it works on all databases supported:
```
var q = from p in md.Product
          where !p.Discontinued
          select p;
```
Discontinued is a boolean field in the Product entity. Through type converter technology I can create my own type converter and map any field with any .NET type (so also your custom classes) to any database field type. Booleans in projections are also something which isn't always supported by the Linq providers out there:
```
var q = from c in md.Customer
           select new {
                  c.CustomerId,
                  HasOrders = (c.Orders.Count() > 0)
           };
```
The query above seems simple, but it requires a CASE statement under the hood to convert the boolean expression into 1 or 0 which are then converted back to a boolean in the projection. It might be that the Linq provider decided to do this completely on the client, inside the Linq provider, so they return the c.Orders.Count() results. But this gives the problem that if the query is folded into a subquey, it goes wrong:
```
var q = (from c in md.Customer
           select new {
                  c.CustomerId,
                  HasOrders = (c.Orders.Count() > 0)
           }).Where(c=>c.CustomerId.StartsWith("C");
```
Here, the complete query is folded into a derived table and surrounded with a query which filters on the inner result. But, it still has to bring out the boolean value. The SQL query looks like this:
```
SELECT [LPA_L1].[CustomerId], [LPA_L1].[HasOrders] 
FROM 
(
    SELECT [LPLA_1].[CustomerID] AS [CustomerId], 
           CASE 
               WHEN (
                   SELECT COUNT(*) AS [LPAV_] 
                   FROM   [Northwind].[dbo].[Orders] [LPLA_2]  
                    WHERE  [LPLA_1].[CustomerID] = [LPLA_2].[CustomerID]
                   ) > @LPFA_11
               THEN 1 
               ELSE 0 
           END AS [HasOrders] 
    FROM [Northwind].[dbo].[Customers] [LPLA_1] 
) [LPA_L1] 
WHERE [LPA_L1].[CustomerId] LIKE @CustomerId2
```
and the output looks like this:
```
{ CustomerId = CACTU, HasOrders = True }
{ CustomerId = CENTC, HasOrders = True }
{ CustomerId = CHOPS, HasOrders = True }
{ CustomerId = COMMI, HasOrders = True }
{ CustomerId = CONSH, HasOrders = True }
```
Can it combine queries into one single query? Most textbook examples of Linq are one query, stored in a var typed variable and often consumed right away. But, you'll soon find out that you'll write a lot of queries which actually are the same queries but for example have a different Where clause or a different projection. This can be solved by using routines which produce different queries which are then combined into one query. The expression tree however won't automatically fold these query's subtrees into the main tree: they'll appear as ConstantExpression instances. The Linq provider therefore has to be able to pull these expression trees into the main expression tree, and in such a way as if the subtrees always were part of the main tree. If the Linq provider at hand doesn't grasp the concept of external trees being folded into another query, you'll not be able to share query fragments through methods which are used inside the actual Linq query.
Can it do Group By in C# and VB.NET? With multiple aggregates? GroupBy is an extension method which is handled differently by C# and VB.NET: the VB.NET compiler generates the projection into the GroupBy method call expression, the C# compiler always emits a separate Select method call expression which simply refers to the Group By expression sub tree. Needless to say: if you like one of these two languages, you've to make sure your Linq provider of choice handles a GroupBy in your language of choice well. With C# it's more complicated, as the separate Select is expected but you can't simply 'look ahead' for it: you've to keep track of the GroupBy and when you run into the Select, you've to pull it out of your big hat and re-use it. Though not just simply 'use it': the Select's projection has to be folded into the GroupBy's query as the projection, in any situation. One problematic issue is multiple aggregates on a group by which groups on multiple fields. It has to do 'folding' of query fragments into the GroupBy query to make everything work (as in SQL, all aggregates have to be present in the projection of the group by query). This leads to complex code if you want to support multiple aggregates in a Group By. Be sure to check for this if you expect to group on multiple fields. (Yes, that's possible in Linq, didn't you know? )
Can it handle let? Let is a keyword in Linq queries which allows you store a query result into a variable which is from then on used instead of the query assigned to the variable. See the example below:
```
var q = from c in metaData.Category
           let x = c.Products.Select(p => (int)p.UnitsInStock).Sum()
           where x > 500
           orderby x ascending, c.CategoryName descending
           select c;
```
Here, we store the number of products of a category c which are in stock into the variable x and select only the categories which have more than 500 products overall in stock. We re-use x in the order by clause. In SQL, which is a set-based language, there is no such concept as let. Using let results in a wrapping query around the main query with the Sum() and in the Where and Order By the value of the Sum() is referenced. Let is one of these statements which aren't always straight forward for a Linq provider, so pay attention to what you want to do in Linq and if you want to use this statement.
Can it produce server-side paging queries? On all supported databases? Paging is a feature which is a crucial benefit of using an O/R mapper framework: without any effort the framework produces a query which allows you through page through a big resultset (e.g. millions of rows). Though, it's only efficient if the paging takes place in the SQL query, or during fetching of the rows, not after the whole set is fetched into memory. Linq itself has two statements, Skip and Take, which combined offer the abililty to specify which page of data to obtain. Some O/R mappers, like LLBLGen Pro also offer their own extension method to specify which page to retrieve, as Skip and Take could cause confusion, if Skip is mentioned first or if Take is mentioned first, it does make a difference in the syntaxis of Linq.
Does it handle in-memory object construction inside a query? And in-memory method calls? Linq allows the developer to mix code which runs in-memory with code which is translated into a query to be run on a database server. For example it should be possible to call a method on an object inside the projection which processes the value from the database before it is stored inside the object to return. It also should be possible to use array and collection constructors inside a linq query in combination with Contains() calls, so you don't have to first setup the collection, then use it in the query: you can define it right inside the query. See the example below:
```
var q = from c in metaData.Customer
           where !new List<string>(){"FISSA", "PARIS"}.Contains(c.CustomerId)
           select c.Orders;
```
This query retrieves all order collections for customers which actually have orders in Northwind by specifying the CustomerIds of the customers which don't have an order defined. This example is fairly simple, but far more complex examples are thinkable, also with in-projection method calls which perform last-minute in-memory processing of values. Using in-memory objects and methods inside the projection is a key element of Linq: you can process the values retrieved from the database by writing simple code inside a query. It does require that the Linq provider understands what's in-memory code and what's not in-memory code.
Does it offer a flexible way to map .NET methods/properties onto DB functions / constructs? We all know the LIKE statement in SQL, and a typical way to produce LIKE queries is to use the string methods StartsWith, EndsWith or Contains (if they're handled by the provider, of course ). However, what if you have a database function and you want to call that function inside a query? Linq only understands .NET methods and types. The way to do this is to map a .NET method onto a database construct, for example a database function, and simply specify the .NET method in the Linq query and the Linq provider then translates that method call into the usage of the database construct it is mapped on. This way, you're able to utilize the large library of database functions in for example Oracle or DB2, straight from .NET. Also, with these mechanisms it's possible to add Full Text Search support to Linq, and not only for SqlServer but also for MySql for example. More on this in a later episode of this series.
Can it do hierarchical fetches? Of entities? Efficiently? Linq allows you to specify another query in the projection of a query, which can contain one or more queries in its projection etc. etc. It's key for performance that the Linq provider fetches these nested queries as efficiently as possible, i.e.: one SQL query per nested Linq query. Take this simple example:
```
var q = from c in metaData.Customer
          select new {
                c.CustomerId,
                Orders = (from o in c.Orders
                               select new {
                                       o.OrderId,
                                       o.OrderDate })
          };
```
How many SQL queries should this Linq query result in, if we consider that there are say 50,000 customers in the database? Only two: one for the customers and one for the orders. Another typical example is fetching graphs of entities: is it possible to specify which entities to fetch in a graph, like fetch all customers from 'Germany' and all their orders? Linq doesn't offer a facility to specify such a query, which leads to the question if the Linq provider offers this facility, together with the question if the graph is fetched efficiently. I'll go deeper into this feature in a later episode of this series.
Can it deal with Nullable types and implicit casts? .NET supports nullable types, and databases support NULL values. A match made in hell, err... heaven. Nullable types have two properties: HasValue (a boolean, see the point above about booleans) and Value, the actual non-null value. These two properties can be used inside a Linq query as well, in various places. But... it's also possible to compare a Nullable type with a variable, or a value, which aren't of a Nullable type. The developer can also compare the HasValue with a boolean value, or a variable. Which leads to a different expression subtree. Add to that that the location where HasValue is compared to a value could lead to a CASE statement or not, and you're in for a lot of fun .
Does it handle type casts for inheritance scenario's? Can it handle inheritance types in Linq queries? O/R mappers typically support inheritance in one way or the other: they let you derive a subtype entity from a supertype entity and for example allow you to add a relation between the subtype and another entity. Linq offers a couple of ways to specify which types should be used in a query, and not all of them are through a Queryable extension method. If you want to use inheritance in your project, be sure your O/R mapper's Linq provider allows you to filter and specify which types to fetch in a Linq query.

As I said in the beginning, I'm sure I forgot to mention some topics to look into. Like the gazillion ways a Contains query can be written, or if the Linq provider supports joins between queries which again contain joins between entities and queries etc. but this article has already grown too long. The general point is this: if some O/R mapper vendor, and we are one of them so you can apply the same logic to our own code as well, claims 'Linq support', be sure it supports the Linq constructs you're looking for now and in the future.

Don't make the naive mistake that you won't be needing all those fancy joins, hierarchical fetches and what have you: you will need them, simply because they make life very easy for you as a developer. You want to know how simple? Let's close this post with an example of how easy Linq can make it, as it offers the combination of very powerful features right there at your fingertips. The example fetches a hierarchy, using a group by. Though instead of running an aggregate, it returns the whole grouping result. This results in a hierarchy where per key (the field(s) the set was grouped on) all matching elements are stored. In the example below it fetches per Country all Customer entities. It takes 4 lines of code and two SQL queries: one for the group by keys, and one for the data per group. I left the unit-test code around it so you have an idea what the data looks like (Linq to LLBLGen Pro, Selfservicing paradigm)

[Test]
public void GroupingOfEntitiesByGroupByKeyUsingGroupByVariable()
{
    LinqMetaData metaData = new LinqMetaData();
    var q = from c in metaData.Customer
            where c.Country != null
            group c by c.Country into g
            select g;
    int count = 0;
    foreach(var v in q)
    {
        int customerCount = 0;
        foreach(var c in v)
        {
            Assert.AreEqual(v.Key, c.Country);
            customerCount++;
        }
        Assert.IsTrue(customerCount > 0);
        count++;
    }
    Assert.AreEqual(22, count);
}

In future episode's I'll discuss excluding fields, dealing with hierarchical fetches of entities and custom projections, function mappings and much more.

Frans,

I have a question for you regarding LINQ and a layer development approach. I return the results of a LINQ to LLBLGen query through the layers of the application. When I am sending results that match an entity collection it’s easy to pass them along, when it’s a query that does not match a entity collection, say a subset, or combination of entity’s, the process is more cumbersome.

Right now I have to create a class that represents the values being transferred. Do you have another approach to return collections based on none entity? Creating 100’s of classes that are used sometime just once to transfer data seems to smell a bit to me and I don’t know why. Not really an LLBLGen question, more a LINQ and layered development.

BringerOD - Tuesday, June 17, 2008 4:02:55 PM

Thanks Jeff! :D

@Bryan: you could do:
public IQueryable GetCustomersQuery(LinqMetaData metaData)
{
return from c in metaData select c;
}

and then consume it elsewhere:
foreach (var v in GetCustomersQuery(linqMetaData))
{
// do something.
}

so you can pass it along as an IQueryable typed element. You can also return it as an ILLBLGenProQuery typed element, as all our queries have that type as well, so you can inject an adapter onto which db you want to execute the query at a later stage.

It's indeed a bit of a problem, and I think the only approach that's doable is the one described above, however it has the disadvantage that ... there's no type known at compile time. So consuming the query can for example be done through databinding at runtime, but if you want to read properties from the elements returned by the query, it is indeed a bit of a problem, which is caused by how Linq queries and anonymous types work I think. My guess is that Microsoft assumed most linq queries are consumed at the spot where they're created, which is logical concerning the fact that the design of Linq forces you to have all information and objects to execute the query INSIDE the query as well, so passing that across tiers isn't always a good thing.

FransBouma - Tuesday, June 17, 2008 6:37:08 PM

@Frans
Thanks! I am always thinking about how to make my life easier.

Oh, and I agree with Jeff!

BringerOD - Tuesday, June 17, 2008 11:07:35 PM

3 Comments