Frans Bouma's blog

Generator.CreateCoolTool();

Syndication

News




    Add to Technorati Favorites

About me

Fun stuff I created

My work

June 2008 - Posts

Email form disabled again...

I've again disabled the email form on this blog, because I now get about 15 spam emails every 5 minutes through this form and enough is enough. Apparently it's not fixable by the blog-engine overloads, so this is the only option. If you want to contact me, my email address is on the about page.

Posted Thursday, June 26, 2008 4:41 PM by FransBouma | with no comments

Filed under:

Linq to LLBLGen Pro: feature highlights, part 1

Some people asked me what the highlights are of Linq to LLBLGen Pro, which was released this week, as it seems that Linq support is apparently growing on trees these days. In this and some future posts I'll try to sum up some of the characteristic features of Linq to LLBLGen Pro, so you don't have to wade through the 15 articles I wrote about writing Linq to LLBLGen Pro . I'll write several of these articles, this is the first one. I hope to write more of them in the coming weeks.

Linq to LLBLGen Pro is a full implementation of Linq
The first feature I'd like to highlight is the simple fact that it's a full implementation of a Linq provider. You now might think "Isn't that obvious? It is a Linq provider", but I've to spoil that dream for you: Most Linq 'providers' out there are just implementations of a small subset of what a Linq provider is expected to do. You see, implementing a Linq provider isn't just about writing a handler for MethodCall expressions to Queryable's extension methods. That's just a small part of it. The main part is about writing code which supports everything you can run into when traversing and handling an Expression tree. Everything. There's no room for compromises, sadly enough: if you don't implement a feature offered to the developer through Linq, it might be the developer is unable to write the query with Linq, as you can't mix and match things to form a SQL query in the end: the query is written in Linq or it's not.

If you run the risk of getting exceptions at runtime because the expression tree contains unsupported elements or constructs or worse: whole subtrees which aren't understood, would you use such a Linq provider? You might think this isn't a big deal, but the thing is: you only know exactly which expression tree is generated and passed to the Linq provider when you run the code: it is perfectly possible that the provider only expects Queryable extension method calls at the root of the tree, and it for example doesn't expect a 'Where' call at one side of a Join for example.

Writing a Linq provider is a lot of work which requires a lot of code. If you're dealing with a Linq provider which is just, say, 32KB in size, you can be sure it will not support the majority of situations you will run into. However, the O/R mapper developer likely simply said 'We have Linq support', and it's even likely the provider can handle the more basic examples of a single entity type fetch with a Where, an Order By or even a Group By. But in real life, once you as a developer have tasted the joy of writing compact, powerful queries using Linq, you will write queries with much more complexity than these Linq 101 examples. Will the Linq 'provider' you chose be able to handle these as well? In other words: is it a full Linq provider or, as some would say, a 'toy' ?

For developers who will use a Linq provider, it's often a tough call how to decide what will be a solid Linq provider to work with and how much 'Linq' is actually implemented and supported by the used Linq provider. Below I've added a list of questions you can ask yourself when you're testing out a Linq provider. Perhaps some aren't important to you now, but consider that a software project often lasts for several years: most time on a software project is spend during maintenance, so the provider chosen has to be able to deal with many, many cases, and you shouldn't be forced to swap out the provider (and thus the O/R mapper) later on.

It can be the O/R mapper doesn't support a given feature in general, e.g. UNION queries. It's then unlikely that the Linq provider will support the feature. That's not the set of problems I'm talking about here: what I'm talking about are queries which are expected to work as Linq queries, considering the feature set of the O/R mapper (e.g. it supports paging, so paging through Linq queries should also work) but are failing at runtime due to the lack of support in the Linq provider.

The list below is far from complete, I'll update it if more topics are brought forward. I've compiled this list mainly from memory from my experiences when writing Linq to LLBLGen Pro where countless hours have been spend on answering 'what if?' questions related to what is possible with Linq.

  • Can it do joins?. Linq has two types of joins (it also has GroupJoins which are discussed further below)
    // Type A: typical cross join
    var q = from c in md.Customer
               from o in md.Order
               where c.CustomerId==o.CustomerId
               select c;
    // Type B: real join clauses
    var q = from c in md.Customer
              join o in md.Order on c.CustomerId equals o.CustomerId
              select c;
    Type A results in a MethodCall expression to SelectMany(), Type B results in a MethodCall expression to Join(). SelectMany() is typically handled through a CROSS JOIN SQL statement, Join() is typically handled through INNER JOIN SQL statements (or non-ansi joins if the database requires it, like Oracle 8i). If the Linq provider doesn't handle type B, you're stuck with cross joins which can seriously slow down your queries at runtime. Also keep in mind that with type B you can only create INNER JOINs. To produce LEFT / RIGHT joins, you need support for GroupJoin and DefaultIfEmpty.
  • Can it handle GroupJoin and DefaultIfEmpty ?. A GroupJoin is a Linq specific type of join: it's a combination of group by and a join. As it doesn't really have a SQL equivalent, it's a bit difficult to translate it to SQL. GroupJoin is one of the extension methods of Queryable which isn't implemented in many Linq providers out there. However, you need it if you want to do LEFT/RIGHT joins, as it works together with DefaultIfEmpty:
    var q = from c in md.Customer
              join o in md.Order on c.CustomerId equals o.CustomerId into co
              from x in co.DefaultIfEmpty()
              where x.OrderId == null
              select c;
    This query produces a LEFT JOIN between Customer and Order and filters out any customer with one or more orders. LEFT / RIGHT JOINs are a vital part of any querying system, and therefore if you can't express these kind of queries, you're likely going to have a struggle writing efficient code, or you're forced to use another querying API than Linq which makes the whole point of using Linq in the first place rather moot. The biggest pain with GroupJoin is that the actual GroupJoin resides in one part of the expression tree, but the reference where it is actually being used, the DefaultIfEmpty, is in another part of the expression tree. This might not sound like a big deal, but it can be the GroupJoin is behind an alias border, which means the GroupJoin isn't directly reachable as it seems, but it should be reachable as the DefaultIfEmpty requires it: as if you pull one subtree into another part of the expression tree however with the scope of the new location. This often results in aliasing nightmares and other complex problems.
  • Does it support Linq on all supported databases? A typical O/R mapper supports more than one database, for example SqlServer, Oracle, DB2, MySql etc. Is Linq supported on all these databases? Or is Linq only supported on a subset of these databases? One great point of Linq is that you can write queries without having to worry about database specific issues, at least, if the Linq provider offers enough features.
  • Can it handle boolean values anywhere in the Linq query, also on other databases? Boolean values aren't supported in SQL, at least not in the SQL standard which is implemented in most databases. Sure, WHERE clause predicates are boolean expressions, but ever tried to use a boolean expression in the SELECT clause ? Another point is that databases like Oracle don't have support for types which can be mapped as booleans: it lacks a bit type. Does the Linq provider offer the ability to map any field in an entity to any type possible, so you can use that field with that type in a Linq query, no matter what database is used? In Linq to LLBLGen Pro I can write the following query and it works on all databases supported:
    var q = from p in md.Product
              where !p.Discontinued
              select p;
    Discontinued is a boolean field in the Product entity. Through type converter technology I can create my own type converter and map any field with any .NET type (so also your custom classes) to any database field type. Booleans in projections are also something which isn't always supported by the Linq providers out there:
    var q = from c in md.Customer
               select new {
                      c.CustomerId,
                      HasOrders = (c.Orders.Count() > 0)
               };
    The query above seems simple, but it requires a CASE statement under the hood to convert the boolean expression into 1 or 0 which are then converted back to a boolean in the projection. It might be that the Linq provider decided to do this completely on the client, inside the Linq provider, so they return the c.Orders.Count() results. But this gives the problem that if the query is folded into a subquey, it goes wrong:
    var q = (from c in md.Customer
               select new {
                      c.CustomerId,
                      HasOrders = (c.Orders.Count() > 0)
               }).Where(c=>c.CustomerId.StartsWith("C");
    Here, the complete query is folded into a derived table and surrounded with a query which filters on the inner result. But, it still has to bring out the boolean value. The SQL query looks like this:
    SELECT [LPA_L1].[CustomerId], [LPA_L1].[HasOrders] 
    FROM 
    (
        SELECT [LPLA_1].[CustomerID] AS [CustomerId], 
               CASE 
                   WHEN (
                       SELECT COUNT(*) AS [LPAV_] 
                       FROM   [Northwind].[dbo].[Orders] [LPLA_2]  
                        WHERE  [LPLA_1].[CustomerID] = [LPLA_2].[CustomerID]
                       ) > @LPFA_11
                   THEN 1 
                   ELSE 0 
               END AS [HasOrders] 
        FROM [Northwind].[dbo].[Customers] [LPLA_1] 
    ) [LPA_L1] 
    WHERE [LPA_L1].[CustomerId] LIKE @CustomerId2
    and the output looks like this:
    { CustomerId = CACTU, HasOrders = True }
    { CustomerId = CENTC, HasOrders = True }
    { CustomerId = CHOPS, HasOrders = True }
    { CustomerId = COMMI, HasOrders = True }
    { CustomerId = CONSH, HasOrders = True }
  • Can it combine queries into one single query? Most textbook examples of Linq are one query, stored in a var typed variable and often consumed right away. But, you'll soon find out that you'll write a lot of queries which actually are the same queries but for example have a different Where clause or a different projection. This can be solved by using routines which produce different queries which are then combined into one query. The expression tree however won't automatically fold these query's subtrees into the main tree: they'll appear as ConstantExpression instances. The Linq provider therefore has to be able to pull these expression trees into the main expression tree, and in such a way as if the subtrees always were part of the main tree. If the Linq provider at hand doesn't grasp the concept of external trees being folded into another query, you'll not be able to share query fragments through methods which are used inside the actual Linq query.
  • Can it do Group By in C# and VB.NET? With multiple aggregates? GroupBy is an extension method which is handled differently by C# and VB.NET: the VB.NET compiler generates the projection into the GroupBy method call expression, the C# compiler always emits a separate Select method call expression which simply refers to the Group By expression sub tree. Needless to say: if you like one of these two languages, you've to make sure your Linq provider of choice handles a GroupBy in your language of choice well. With C# it's more complicated, as the separate Select is expected but you can't simply 'look ahead' for it: you've to keep track of the GroupBy and when you run into the Select, you've to pull it out of your big hat and re-use it. Though not just simply 'use it': the Select's projection has to be folded into the GroupBy's query as the projection, in any situation. One problematic issue is multiple aggregates on a group by which groups on multiple fields. It has to do 'folding' of query fragments into the GroupBy query to make everything work (as in SQL, all aggregates have to be present in the projection of the group by query). This leads to complex code if you want to support multiple aggregates in a Group By. Be sure to check for this if you expect to group on multiple fields. (Yes, that's possible in Linq, didn't you know? )
  • Can it handle let? Let is a keyword in Linq queries which allows you store a query result into a variable which is from then on used instead of the query assigned to the variable. See the example below:
    var q = from c in metaData.Category
               let x = c.Products.Select(p => (int)p.UnitsInStock).Sum()
               where x > 500
               orderby x ascending, c.CategoryName descending
               select c;
    Here, we store the number of products of a category c which are in stock into the variable x and select only the categories which have more than 500 products overall in stock. We re-use x in the order by clause. In SQL, which is a set-based language, there is no such concept as let. Using let results in a wrapping query around the main query with the Sum() and in the Where and Order By the value of the Sum() is referenced. Let is one of these statements which aren't always straight forward for a Linq provider, so pay attention to what you want to do in Linq and if you want to use this statement.
  • Can it produce server-side paging queries? On all supported databases? Paging is a feature which is a crucial benefit of using an O/R mapper framework: without any effort the framework produces a query which allows you through page through a big resultset (e.g. millions of rows). Though, it's only efficient if the paging takes place in the SQL query, or during fetching of the rows, not after the whole set is fetched into memory. Linq itself has two statements, Skip and Take, which combined offer the abililty to specify which page of data to obtain. Some O/R mappers, like LLBLGen Pro also offer their own extension method to specify which page to retrieve, as Skip and Take could cause confusion, if Skip is mentioned first or if Take is mentioned first, it does make a difference in the syntaxis of Linq.
  • Does it handle in-memory object construction inside a query? And in-memory method calls? Linq allows the developer to mix code which runs in-memory with code which is translated into a query to be run on a database server. For example it should be possible to call a method on an object inside the projection which processes the value from the database before it is stored inside the object to return. It also should be possible to use array and collection constructors inside a linq query in combination with Contains() calls, so you don't have to first setup the collection, then use it in the query: you can define it right inside the query. See the example below:
    var q = from c in metaData.Customer
               where !new List<string>(){"FISSA", "PARIS"}.Contains(c.CustomerId)
               select c.Orders;
    This query retrieves all order collections for customers which actually have orders in Northwind by specifying the CustomerIds of the customers which don't have an order defined. This example is fairly simple, but far more complex examples are thinkable, also with in-projection method calls which perform last-minute in-memory processing of values. Using in-memory objects and methods inside the projection is a key element of Linq: you can process the values retrieved from the database by writing simple code inside a query. It does require that the Linq provider understands what's in-memory code and what's not in-memory code.
  • Does it offer a flexible way to map .NET methods/properties onto DB functions / constructs? We all know the LIKE statement in SQL, and a typical way to produce LIKE queries is to use the string methods StartsWith, EndsWith or Contains (if they're handled by the provider, of course ). However, what if you have a database function and you want to call that function inside a query? Linq only understands .NET methods and types. The way to do this is to map a .NET method onto a database construct, for example a database function, and simply specify the .NET method in the Linq query and the Linq provider then translates that method call into the usage of the database construct it is mapped on. This way, you're able to utilize the large library of database functions in for example Oracle or DB2, straight from .NET. Also, with these mechanisms it's possible to add Full Text Search support to Linq, and not only for SqlServer but also for MySql for example. More on this in a later episode of this series.
  • Can it do hierarchical fetches? Of entities? Efficiently? Linq allows you to specify another query in the projection of a query, which can contain one or more queries in its projection etc. etc. It's key for performance that the Linq provider fetches these nested queries as efficiently as possible, i.e.: one SQL query per nested Linq query. Take this simple example:
    var q = from c in metaData.Customer
              select new {
                    c.CustomerId,
                    Orders = (from o in c.Orders
                                   select new {
                                           o.OrderId,
                                           o.OrderDate })
              };
    How many SQL queries should this Linq query result in, if we consider that there are say 50,000 customers in the database? Only two: one for the customers and one for the orders. Another typical example is fetching graphs of entities: is it possible to specify which entities to fetch in a graph, like fetch all customers from 'Germany' and all their orders? Linq doesn't offer a facility to specify such a query, which leads to the question if the Linq provider offers this facility, together with the question if the graph is fetched efficiently. I'll go deeper into this feature in a later episode of this series.
  • Can it deal with Nullable types and implicit casts? .NET supports nullable types, and databases support NULL values. A match made in hell, err... heaven. Nullable types have two properties: HasValue (a boolean, see the point above about booleans) and Value, the actual non-null value. These two properties can be used inside a Linq query as well, in various places. But... it's also possible to compare a Nullable type with a variable, or a value, which aren't of a Nullable type. The developer can also compare the HasValue with a boolean value, or a variable. Which leads to a different expression subtree. Add to that that the location where HasValue is compared to a value could lead to a CASE statement or not, and you're in for a lot of fun .
  • Does it handle type casts for inheritance scenario's? Can it handle inheritance types in Linq queries? O/R mappers typically support inheritance in one way or the other: they let you derive a subtype entity from a supertype entity and for example allow you to add a relation between the subtype and another entity. Linq offers a couple of ways to specify which types should be used in a query, and not all of them are through a Queryable extension method. If you want to use inheritance in your project, be sure your O/R mapper's Linq provider allows you to filter and specify which types to fetch in a Linq query.

As I said in the beginning, I'm sure I forgot to mention some topics to look into. Like the gazillion ways a Contains query can be written, or if the Linq provider supports joins between queries which again contain joins between entities and queries etc. but this article has already grown too long. The general point is this: if some O/R mapper vendor, and we are one of them so you can apply the same logic to our own code as well, claims 'Linq support', be sure it supports the Linq constructs you're looking for now and in the future.

Don't make the naive mistake that you won't be needing all those fancy joins, hierarchical fetches and what have you: you will need them, simply because they make life very easy for you as a developer. You want to know how simple? Let's close this post with an example of how easy Linq can make it, as it offers the combination of very powerful features right there at your fingertips. The example fetches a hierarchy, using a group by. Though instead of running an aggregate, it returns the whole grouping result. This results in a hierarchy where per key (the field(s) the set was grouped on) all matching elements are stored. In the example below it fetches per Country all Customer entities. It takes 4 lines of code and two SQL queries: one for the group by keys, and one for the data per group. I left the unit-test code around it so you have an idea what the data looks like (Linq to LLBLGen Pro, Selfservicing paradigm)

[Test]
public void GroupingOfEntitiesByGroupByKeyUsingGroupByVariable()
{
    LinqMetaData metaData = new LinqMetaData();
    var q = from c in metaData.Customer
            where c.Country != null
            group c by c.Country into g
            select g;
    int count = 0;
    foreach(var v in q)
    {
        int customerCount = 0;
        foreach(var c in v)
        {
            Assert.AreEqual(v.Key, c.Country);
            customerCount++;
        }
        Assert.IsTrue(customerCount > 0);
        count++;
    }
    Assert.AreEqual(22, count);
}

In future episode's I'll discuss excluding fields, dealing with hierarchical fetches of entities and custom projections, function mappings and much more.

Posted Tuesday, June 17, 2008 1:11 PM by FransBouma | 4 comment(s)

LLBLGen Pro v2.6 has been released!

After almost 11 months of design, development, beta testing and adding final polish, it's here: LLBLGen Pro v2.6! This version, which is a free upgrade for all our v2.x customers, has a couple of major new features, the biggest of course being the full implementation of Linq support in our O/R mapper framework. The work on our Linq provider, which we've dubbed 'Linq to LLBLGen Pro', lasted almost 9 months and was discussed on this blog in a series of articles, which I'll linq () to below.

In the beginning of writing the Linq provider, I was pretty optimistic that it would be easy and quick, but after a while I got very pessimistic and wanted to skip it entirely as it would simply cost too much effort, and therefore time and money. The main reason was the lack of serious documentation and background on various essential details like which expression trees were formed from which linq queries, and how to understand them in full so a meaningful query could be produced from them to run on the database. Anyone who has written some form of Linq provider or is currently busy doing so will run into this problem. Doing trial-error development/research for a couple of months in a row isn't a picknick, but it's also part of being a Software Engineer so it always left me with a mixed bag of what to think of it: it's exciting and interesting, but also frustrating.

A good example of the lack of serious documentation on expression trees is the way the VB.NET compiler compiles a group-by query into an expression tree vs. how the C# compiler does that: the C# compiler always adds a separate .Select() method call, the VB.NET compiler doesn't. In theory, the VB.NET compiler is right: the group-by clause has to be in the same query scope as the projection, but as C# has to be supported as well, you have to build some form of 'look-ahead' inside the tree to see when / if / how the projection is present after the group-by expression is seen. This isn't documented anywhere. Another one is the way how anonymous types are detected. There's no boolean on the Type object which tells you 'This is an anonymous type'. So you check the name. The C# compiler generates names which start with <>, the VB.NET compiler generates names which start with $VB$. And probably yet another language which runs on the CLR and which adds Linq support might choose another prefix. Nowhere is this documented but it is sometimes required to know that the type you're dealing with is an anonymous type.

But that's all water under the bridge now. Looking back, I'm so incredibly happy that management did succeed in motivating me to go on and continue working on the Linq provider, as the end result is one of the most feature-rich Linq providers available on .NET today. With the future technology coming from Microsoft like Dynamic Data and ADO.NET Data services, but also with IQueryable supporting controls from third-party developers like DevExpress, it's becoming more and more clear that a modern O/R mapper system has to have deep and solid Linq support.

What's new?
I won't enlist everything here, just a few items of what's new in LLBLGen Pro v2.6.

  • Full Linq support with our own Linq to LLBLGen Pro provider.
  • .NET 3.5 support. With code changes in the runtime so it works better with Linq to Objects and with VS.NET 2008 project templates
  • Derived table support. Use any query as source for a join side or as From clause
  • Much lower memory consumption during transactions: 90% less memory overhead for temp values during transactions. Temp values are used to be able to roll-back to the start state of the entity graph when a transaction rolls back (PK's roll back, FK's synced with the new PK values roll back etc.)
  • Up to 20% less memory usage for entity graphs
  • String uniquing. When fetching a lot of redundant string data, the same string instance is now re-used to avoid unnecessary memory consumption. This is done without string interning.
  • SqlServer 2008 support, SqlServer CE 3.5 support, CF.NET 3.5 support
  • Using SqlServer CE Desktop is now much easier
  • Plus... a lot of small, but important changes and enhancements.

Linq to LLBLGen Pro development articles
For the people who want to re-read all the articles on the development of Linq to LLBLGen Pro, they're linked below.

Hereby, I'd like to thank all the beta-testers and all the others who have supported us and have given feedback in one way or the other! .

In the next weeks we'll be releasing updated code for our Dynamic Data support and also new code for support for ADO.NET Data Services (Astoria). We'll also publish our Linq to Sql templates on our main site so people who aren't yet a customer can try them out as well.

The coming year
In the coming year, we'll be working on LLBLGen Pro v3, something I'm very excited about. It won't just be a designer upgrade, it will be a designer revolution, combined with a new way of generating code. Of course, we'll be bringing further enhancements and fine-grained tweaks to our own O/R mapper framework and runtime. The project goals are huge, but that's what makes things interesting, right? One of the first steps is a transactional graph manager which can manipulate object graphs in-memory on a transactional basis. I hope to blog about that sometime soon.

But for now, a day of rest and sunshine .

Posted Monday, June 09, 2008 11:37 AM by FransBouma | 13 comment(s)

More Posts