Introducing LLBLGen Pro QuerySpec: a new fluent query API

Friday, April 8, 2011

In the past two months I've been busy creating a new query specification API for LLBLGen Pro. Our native query API is modeled after SQL statement fragments (like a 'predicate', a 'relationship', a 'field'), but specifying a query with it can be a little verbose, and above all: the code doesn't look like a query. Especially with complex queries and projections it can sometimes be tedious to grasp what the SQL will look like and what the query is doing. With LLBLGen Pro QuerySpec this changes: a fluent, compact, highly expressive API which allows you to write queries in the structure of the SQL it will produce and with the expressiveness of Linq.

LLBLGen Pro QuerySpec went into beta yesterday, and is a free add-on for LLBLGen Pro v3.0/v3.1 users and available in the customer area. In theory it should work for v2.6 users as well, but we didn't test that. In the next weeks I hope to write more blog posts about QuerySpec and its features.

But first, let's look into why QuerySpec saw the light of day. In 2008 we introduced LLBLGen Pro v2.6 with a full Linq provider: you could formulate queries in Linq which made things much easier than writing queries in our own native query API. In v3.0, released in 2010, we fine-tuned the Linq provider more to make it handle more edge cases. So why a new query API?

Linq, but simpler

I've spend roughly 10 months full time on our first version of the Linq provider. In that period I learned that there are two core issues with Linq when you use it for specifying a database query: one for the user writing the Linq query and one for the Linq provider developer. I'll briefly try to explain these problems.

The user's problem: no 1:1 mapping of Linq onto SQL.

To make a Linq query runnable on a relational database, it has to be transformed into SQL. This is problematic because Linq's construct in many cases don't map 1:1 onto SQL constructs. Examples are group join, group by, multiple aggregates in projections, where clauses inside join clauses etc. To be able to create SQL that works on the RDBMS, these constructs have to be transformed into other constructs which are then interpreted and transformed into SQL. The transformation of these elements is a complex affair as the intend of the Linq expression sub-tree has to be interpreted and that result has to be used to create the transformation result.

This can get complex rather quickly. An example is a group by with a nested aggregate in the projection. Linq allows that, SQL doesn't. See this example:

LinqMetaData metaData = new LinqMetaData(adapter);
var q = from o in metaData.Order
        group o by o.Customer.Country into g
        orderby g.Key
        where g.Sum(n => n.OrderDetails.Count()) > 10
        select new 
        { 
            Country = g.Key, 
            Num = g.Average(n => n.OrderDetails.Count(od => od.ProductId == 3)) 
        };

Sounds simple? Let's look at the SQL it results in. (param values are inlined by the profiler for easy displaying the SQL, the query did use parameters when it was run)

SELECT   [LPA_L1].[Country],
         [LPA_L1].[LPAV_1]  AS [Num]
FROM     (SELECT   [LPA_L3].[Country],
                   SUM([LPA_L3].[LPAV_])  AS [LPAV_],
                   AVG([LPA_L3].[LPAV_1]) AS [LPAV_1]
          FROM     (SELECT [LPA_L4].[Country],
                           [LPA_L4].[OrderId],
                           [LPA_L4].[LPAV_],
                           (SELECT COUNT(* ) AS [LPAV_]
                            FROM   [Northwind].[dbo].[Order Details] [LPLA_7]
                            WHERE  (([LPA_L4].[OrderId] = [LPLA_7].[OrderID])
                                    AND ([LPLA_7].[ProductID] = 3 /* @p1 */))) AS [LPAV_1]
                    FROM   (SELECT [LPA_L5].[Country],
                                   [LPA_L6].[OrderID] AS [OrderId],
                                   (SELECT COUNT(* ) AS [LPAV_]
                                    FROM   [Northwind].[dbo].[Order Details] [LPLA_4]
                                    WHERE  (([LPA_L6].[OrderID] = [LPLA_4].[OrderID]))) AS [LPAV_]
                            FROM   ([Northwind].[dbo].[Customers] [LPA_L5]
                                    RIGHT JOIN [Northwind].[dbo].[Orders] [LPA_L6]
                                      ON [LPA_L5].[CustomerID] = [LPA_L6].[CustomerID])) [LPA_L4]) [LPA_L3]
          GROUP BY [LPA_L3].[Country]) [LPA_L1]
WHERE    ((((([LPA_L1].[LPAV_] > 10 /* @p3 */)))))
ORDER BY [LPA_L1].[Country] ASC

As you can see, this shows what I call folding: it folds a query into another one as the source to make sure SQL can deal with it. With every transformation comes change. This means that the user can't easily predict what the SQL might look like. The user might take a ballpark guess, but unless s/he has worked with the Linq provider for quite some time, it's hard to do. This is problematic because the application doesn't run in a void, it uses the database, likely quite heavily. This means that the database interaction has to be efficient. With all the transformations going on, this is very hard to optimize, from a developer's point of view and from a DBA's point of view.

QuerySpec overcomes this by offering a transformation free API. There are no transformations done, what you write is what you get.

The Linq provider developer problem: Scoping

A problem I had with the Linq provider was scoping. Due to the sequence oriented nature of Linq, scopes are hard to define, however they're essential for SQL: an alias used inside a sub-query isn't usable outside that sub-query. For the provider developer it becomes problematic to determine what will end up in which SQL scope to make sure aliases are correct. A Linq provider aliases all elements by itself, as the user doesn't specify any aliases. Due to transformations, these aliases change, and scopes change with them. This in itself is solvable with mapping tables, alias objects and scope objects. The problem becomes complex when the original source is hard to track down.

For example, when you specify 2 join statements in a linq query, the second join will return an element which joins the result of the first with the right operand of the second. What was the entity joined by the left operand's first property? You have to track that back. If a transformation took place, you have to make sure the mapping tables are kept up to date. This is all caused by the fact that Linq doesn't know any scoping as SQL does. It does have some scoping but it's incompatible with SQL, and making the two match was for me the biggest challenge when writing a Linq provider. Don't get fooled by the syntactic sugar C# and VB.NET offer you, that's not what the Linq provider sees, the linq provider sees calls to extension methods in a sequence.

In the end my code works on, I think, 99% of the scoping problems (and trust me, there are some very complex problems thinkable), but to get there, I had to write a lot of complex code with a lot of transformation logic in it. To understand all that takes time, and thus to make changes to it takes even longer.

QuerySpec doesn't have this problem: as it follows SQL's structure, scoping isn't a problem, aliases are specified by the user, and scopes are defined by the user as well, through the API's flow.

To have a solid alternative to Linq, we needed an API that could match it in expressiveness and readability and could offer what our Linq provider does too: nested queries in projections, complex query definitions in a single statement, function mappings and full LLBLGen Pro feature compatibility, like prefetch path support, exclusion/inclusion of fields etc.

It also had to have the flexibility of our native API: easily append query fragments in logic (e.g. append where clauses), write your own predicate classes or other methods to create queries for your domain really easily, and be compile time safe, so when a field or entity is renamed or removed, your queries should fail to compile so you can adjust the problem right away.

QuerySpec is built on top of our native API (as is our Linq provider), and embeds all that.

Quick tour of QuerySpec's features.

I'll now very briefly go into what QuerySpec offers. One of the main goals was to re-use what was already generated, and make the user use it in an elegant, simple fashion. Another goal was to keep things close to SQL: no transformations due to mismatches. We did add some helper constructs to make things more expressive, but if you don't want to use them, you don't have to. For example there's an Any() operator, but if you feel more confident with using WHEN EXISTS, you're perfectly fine by using it, it doesn't take any extra code.

QuerySpec starts with its own factory, the QueryFactory. The QueryFactory produces either DynamicQuery instances for ad-hoc projections, or EntityQuery<T> instances for queries which return one or more entity instances of type T. Let's look at a couple of examples.

Entity query: select orders from customers from the UK:

var qf = new QueryFactory();
var q = qf.Order
             .From(QueryTarget.InnerJoin(OrderEntity.Relations.CustomerEntityUsingCustomerId))
             .Where(CustomerFields.Country == "UK")
             .OrderBy(CustomerFields.City.Descending());

This uses a join specified by a relationship which is already generated into the generated code. Advantage is that you don't need to know the FK - PK relationship. Disadvantage is that it doesn't look like a real join.

Alternative:

var qf = new QueryFactory();
var q = qf.Order
             .From(QueryTarget.InnerJoin(qf.Customer)
             .On(OrderFields.CustomerId==CustomerFields.CustomerId)
             .Where(CustomerFields.Country == "UK")
             .OrderBy(CustomerFields.City.Descending());

This uses a join using an ON clause, which accepts a predicate. I specified an INNER join, but I also could have specified a left or right join instead. No join .. into with a DefaultIfEmpty() construct which gets transformed under the hood into a left join.

Dynamic query: a group by query with a typed projection (so a List<T> will be returned) and a nested query in the projection, using an aliased set.

var qf = new QueryFactory();
var q = qf.Create()
            .Select(() => new
            {
                 Key = CustomerFields.Country.ToValue<string>(),
                 CustomersInCountry = qf.Customer.TargetAs("C")
                                          .CorrelatedOver(CustomerFields.Country.Source("C") == CustomerFields.Country)
                                          .ToResultset()
            })
            .GroupBy(CustomerFields.Country);

LLBLGen Pro doesn't require you to specify a From clause as it can figure that one out by itself. Every query has methods like Select, Where, From, Having, GroupBy, which represent a clause in the final SQL query. This is predictable and has a 1:1 projection onto SQL. As we're working with an O/R mapper and the entities it deals with, writing plain SQL isn't going to help: the entity might look different than the actual table it's mapped on, it might be a subtype in an inheritance hierarchy etc. etc.

Because there are no transformations and because scoping is clear, two of the more major problems with a Linq provider are solved. I'm sure some people will not like it and stick with Linq and our Linq provider. That's fine, and it was never our goal to replace it. We just wanted to provide a solid alternative for the people who want one, and I think our new API fits that requirement. Because it all uses the same core elements under the hood, you can write one query in one API and the other query in another API and run it on the same framework: more choice, more ways to solve a problem, so less problems for you, the developer.

LLBLGen Pro QuerySpec beta is now available for LLBLGen Pro v3.x users and is a free add-on for v3.x users and can be downloaded in the customer area of our website. Stay tuned for more QuerySpec examples and blogposts in the coming weeks!

Nice - In my Linq-to-sql work I'm often unsure exactly how my query is going to get executed. It makes me a bit nervous from performance perspectives, so something like this is much better (plus for me QuerySpec reads much better).

Matt Roberts - Friday, April 8, 2011 2:13:29 PM

This looks great. I have been using the similar QueryOver in nHibernate and it is much more natural conversion from SQL than Linq is.

Craig - Sunday, April 10, 2011 9:58:12 AM

I did arrive to the same simplification of Linq, independently for my own custom ORM ;)

When you look what becomes as Expressions from a simple query...i always wanted a simpler way, more extensible and yet more like sql.

liviu - Wednesday, April 13, 2011 5:45:22 PM

Hi Frans, grats on the new feature.

Does it offer the possibility to reuse query parts like the predicate system does?
I like to build my complex predicates in one place in a subfunction, and then reuse them in all the queries that need them.
Would that be doable in this new system?

Also you mention that those queries are compile-time safe. Does it have the same level of compile-time checking as the current predicate system?

Renaud Martinon - Thursday, April 14, 2011 1:54:26 PM

@Renaud:
you can use your custom predicates created in your own methods with this. E.g.:
qf.Customer
.Where(MyPredicateProducerMethod(10));

the .Where() method accepts an IPredicate, so if your method returns an IPredicate, it's usable :) In your own methods, you can also use parts of the system, e.g. we added a lot of extension methods to make life easier for creating predicates, scalar queries, etc. you can use these in your own predicate producer methods.

Compile time safe is like it was before. If you do:
qf.Customer.Where(CustomerFields.CustomerId=="CHOPS");

and 'CustomerId' is changed to 'Id', the compiler will throw an error. It doesn't do type checking.

FransBouma - Thursday, April 14, 2011 2:08:17 PM

Linq, but simpler

The user's problem: no 1:1 mapping of Linq onto SQL.

The Linq provider developer problem: Scoping

Quick tour of QuerySpec's features.

5 Comments