Last time I introduced LLBLGen Pro QuerySpec, a new fluent API for specifying queries for LLBLGen Pro. As promised I'll write a couple of blogposts about certain aspects of the new API and how it works. Today I'll kick off with the basics.
Two types of queries: EntityQuery<T> and DynamicQuery
There are two types of queries in QuerySpec: entity queries (specified with objects of type EntityQuery<T>, where T is the type of the entity to return) and dynamic queries, which are queries with a custom projection (specified with objects of type DynamicQuery or its typed variant DynamicQuery<T>). The difference between them is mainly visible in the number of actions you can specify on the query. For example, an entity query doesn't have a way to specify a group by, simply because fetching entities is about fetching rows from tables/views, not rows from a grouped set of data. Similarly, DynamicQuery doesn't have a way to specify an entity type filter, simply because it's about fetching a custom projection, not about fetching entities. This difference guides the user of the API with writing the final query: the set of actions to specify, e.g. Where, From, OrderBy etc., is within the scope of what the query will result in.
A custom projection is any projection (the list of elements in the 'SELECT' statement returned by a query) which isn't representing a known, mapped entity. This distinction between entity queries and dynamic queries might sound odd at first, but it will be straightforward once you've worked with a couple of queries.
QuerySpec queries are specifications, they're not executed when enumerated, in fact you can't enumerate them. To obtain the result-set, you have to execute the queries. I'll dig deeper in how to execute QuerySpec queries in a follow up post.
The starting point: the QueryFactory
To get started with a query, the user has to decide what kind of objects the query has to produce: entity objects or objects which contain the result of a custom projection. The question is rather easy if you formulate it like this: "Do I want to obtain one (or more) instances of a known entity type, or something else?". If the answer to that is: "One (or more) instances of a known entity type" the query you'll need is an EntityQuery<T>, in all other cases you'll need a DynamicQuery.
If you change your mind half-way writing your query, no worries: you can create a DynamicQuery from an EntityQuery<T> with the .Select(projection) method and can define a DynamicQuery to return entity class instances, so there's always a way to formulate what you want.
To create a query, we'll use a factory. This factory is a small generated class called QueryFactory. It's the starting point for all your queries in QuerySpec: to create a query, you need an instance of the QueryFactory:
var qf = new QueryFactory();
The QueryFactory instance offers a couple of helper methods and a property for each known entity type, which returns an EntityQuery<entitytype> instance, as well as a method to create a DynamicQuery instance.
We'll first focus on entity queries.
Formulating your query, getting started.
QuerySpec is defined to be close to SQL, so it contains the same building blocks for queries as SQL does. Not all elements are mandatory. For example, not all queries need ordering, a where filter etc. and e.g. entity queries have a fixed projection so you don't have to specify it.
- A projection which defines the result-set shape (the 'SELECT' list). Automatic for Entity Queries, specified with .Select(projection) for Dynamic Queries.
- A source set from which the result-set has to be obtained (the 'FROM' clause). Automatic for a query if a single source (entity) is used, otherwise required for both query types, specified with .From(join list).
- A filter on the source set to obtain a subset of the complete set (the 'WHERE' clause). Optional, specified with .Where(predicate).
- An ordering definition to return the result-set rows in a given order (the 'ORDER BY' clause). Optional, but recommended when a limit or paging directive is specified. Specified with .OrderBy(sort clauses).
- A grouping specification to group source set rows on a subset of fields (the 'GROUP BY' clause). Optional, only available in Dynamic Queries. Specified with .GroupBy(field list).
- A filter on the grouped set of rows after a grouping action (the 'HAVING' clause). Optional, only available in Dynamic Queries, and only used with queries with a grouping specification specified with .GroupBy(). Specified with .Having(expression list).
- A limit specification to limit the number of rows in the result-set (the 'TOP', 'LIMIT', 'FIRST' or equivalent clause). Optional, specified with .Limit(number).
- A specification which filters out duplicate rows in the result-set (the 'DISTINCT' clause). Optional. Automatic for entity queries, if the engine decides it's more efficient to use distinct. Specified with .Distinct().
- A specification to obtain a specific range of rows in the result-set (a Paging directive). Optional, specified with .Page(pageNumber, pageSize).
There are other specific elements per query type, e.g. prefetch paths for entity queries, a typed projection lambda mechanism for dynamic queries, but we'll get to that in a later post.
Entity Queries, getting started.
To create an entity query in its most simplest form, we can simply use one of the properties exposed by a QueryFactory instance. Say we want to formulate a query for Customer entities. The QueryFactory class, which is generated for our domain, exposes a property for each entity in the domain and each property returns an EntityQuery<entitytype>, so the Customer property returns an EntityQuery<CustomerEntity>:
var qf = new QueryFactory();
var q = qf.Customer;
the variable 'q' contains an EntityQuery<CustomerEntity> instance and can be used to fetch customer entity instances. As it doesn't contain any filter, executing it will fetch all customer instances. This is the base line: a query which specifies the complete set of all instances of a given entity type, in this case Customer entities.
In general, you'd want to specify a subset of this complete set. To do so, you specify a Where clause. For example, let's specify a query for all customer entities from the USA:
var qf = new QueryFactory();
var q = qf.Customer
.Where(CustomerFields.Country=="USA");
This query specification is similar to our first specification, but it now defines a filter on the complete set of 'Customer' entities. The .Where method accepts a predicate object, which can be a single predicate (e.g. field == value) or a predicate expression, which can be a combination of predicates concatenated with AND, OR, AND NOT, or OR NOT.
The mechanism to formulate the predicates is the native LLBLGen Pro query api mechanism, and which is documented extensively in the runtime framework documentation. The generated code base contains helper classes to specify a field of a given entity, like the CustomerFields.Country specification above, and the runtime framework uses operator overloading to formulate predicates under the hood. QuerySpec offers additional methods to help you formulate predicates, as shown in the following example which is equal to the one above:
var qf = new QueryFactory();
var q = qf.Customer
.Where(CustomerFields.Country.Equal("USA"));
For now we'll leave the predicate specification semantics and focus on the query definitions themselves.
Now that we can filter our set of entities, we can look at how we can do more advanced things. For example, let's specify a query for all Customer entities which have an order on file which was filed by employee with employee id '2'. This is a query which has a filter on a related entity. To do that we can use several mechanisms, one of them is a join. This comes down to: defining a large set with Customer and Order data (the source set), filter that data with a filter and obtain the result-set from that filtered set. In short: join customer with order, use a where clause on the order fields, select the customer fields:
var qf = new QueryFactory();
var q = qf.Customer
.From(QueryTarget.InnerJoin(qf.Order)
.On(CustomerFields.CustomerId==OrderFields.CustomerId))
.Where(OrderFields.EmployeeId==2);
We need to specify the source set to be a joined set between Customer and Order, so when filtering on the fields from Order in that set, we automatically filter the rows of Customer. To do that in SQL you'd use Customer INNER JOIN Order ON .... Here we do the same thing, we specify an INNER JOIN between Customer and Order in the From() call. As the query we're specifying it on is already a Customer query, we use QueryTarget instead of re-specifying qf.Customer. In case you're wondering why the .InnerJoin isn't defined on qf.Customer instead of the From() clause, it's because it introduced ambiguity which is explained more in detail in the document in the QuerySpec beta archive available to LLBLGen Pro v3 customers.
In a follow up post we'll go deeper into what's available in QuerySpec with respect to joins and filters.
Dynamic Queries, getting started.
After we've seen how to create entity queries, it's now time to show how to create Dynamic Queries. This is done roughly in the same way as entity queries:
var qf = new QueryFactory();
var q = qf.Create();
The variable 'q' now contains a DynamicQuery instance, but we can't really do much with it: it's completely empty. To specify a query, we have to specify a projection using the .Select() method. Let's specify a query for all country names available in the Customer set, with the additional requirement that duplicates have to be filtered out. LLBLGen Pro is clever enough that when you specify CustomerFields.Country, you're fetching from 'Customer' so you don't have to specify a source set.
var qf = new QueryFactory();
var q = qf.Create()
.Select(CustomerFields.Country)
.Distinct();
This query is an untyped query, the resultset will be a list of object arrays (with 1 value in each array). To make it a typed query, we have to specify a typed projection. In the untyped variant, 'q' is a DynamicQuery. In the typed variant, 'q' will be a DynamicQuery<T>, where 'T' is the type of the objects returned by the query when it's executed. To create a typed variant, we use a different overload of .Select(), namely the one which accepts a lambda:
var qf = new QueryFactory();
var q = qf.Create()
.Select(()=>CustomerFields.Country.ToValue<string>())
.Distinct();
The variable 'q' is now a DynamicQuery<string>, due to the lambda specification in the Select(). One other new element in the Select call is the .ToValue<T>() call. QuerySpec will parse the lambda specification in the Select() method and will transform the objects which have calls to known methods to elements for the projection and leave all other code alone. This allows you to formulate complex projection statements with code which isn't convertible to SQL statements. For each row to project into the resultset, the lambda is ran at runtime, executing all code you've specified there. In our simple example above, it simply returns the value for CustomerFields.Country.
Executing the query above will result in a List<string> with all country values from the Customer set.
Dynamic Queries can, similar to Entity Queries, contain definitions for larger source sets from which you project your final result-set. Let's say you want all CompanyName values for all customers who have an order on file filed by employee with employee id '2':
var qf = new QueryFactory();
var q = qf.Create()
.Select(()=>CustomerFields.CompanyName.ToValue<string>())
.From(qf.Customer.InnerJoin(qf.Order)
.On(CustomerFields.CustomerId == OrderFields.CustomerId))
.Where(OrderFields.EmployeeId==2)
.Distinct();
Here we first specify the projection, using a .Select() method call. We then specify the source-set by using a .From() method call and after that we specify the filter on that source-set by using the .Where() method call. To limit duplicates, we specify .Distinct().
There's much more possible than this tiny scratch of the surface. I'll go deeper into the possibilities of QuerySpec in a follow up post.
QuerySpec is a free add-on for LLBLGen Pro v3 customers and is currently in beta. The complete archive with sourcecode is downloadable from the customer area on our website.
In the past two months I've been busy creating a new query specification API for LLBLGen Pro. Our native query API is modeled after SQL statement fragments (like a 'predicate', a 'relationship', a 'field'), but specifying a query with it can be a little verbose, and above all: the code doesn't look like a query. Especially with complex queries and projections it can sometimes be tedious to grasp what the SQL will look like and what the query is doing. With LLBLGen Pro QuerySpec this changes: a fluent, compact, highly expressive API which allows you to write queries in the structure of the SQL it will produce and with the expressiveness of Linq.
LLBLGen Pro QuerySpec went into beta yesterday, and is a free add-on for LLBLGen Pro v3.0/v3.1 users and available in the customer area. In theory it should work for v2.6 users as well, but we didn't test that. In the next weeks I hope to write more blog posts about QuerySpec and its features.
But first, let's look into why QuerySpec saw the light of day. In 2008 we introduced LLBLGen Pro v2.6 with a full Linq provider: you could formulate queries in Linq which made things much easier than writing queries in our own native query API. In v3.0, released in 2010, we fine-tuned the Linq provider more to make it handle more edge cases. So why a new query API?
Linq, but simpler
I've spend roughly 10 months full time on our first version of the Linq provider. In that period I learned that there are two core issues with Linq when you use it for specifying a database query: one for the user writing the Linq query and one for the Linq provider developer. I'll briefly try to explain these problems.
The user's problem: no 1:1 mapping of Linq onto SQL.
To make a Linq query runnable on a relational database, it has to be transformed into SQL. This is problematic because Linq's construct in many cases don't map 1:1 onto SQL constructs. Examples are group join, group by, multiple aggregates in projections, where clauses inside join clauses etc. To be able to create SQL that works on the RDBMS, these constructs have to be transformed into other constructs which are then interpreted and transformed into SQL. The transformation of these elements is a complex affair as the intend of the Linq expression sub-tree has to be interpreted and that result has to be used to create the transformation result.
This can get complex rather quickly. An example is a group by with a nested aggregate in the projection. Linq allows that, SQL doesn't. See this example:
LinqMetaData metaData = new LinqMetaData(adapter);
var q = from o in metaData.Order
group o by o.Customer.Country into g
orderby g.Key
where g.Sum(n => n.OrderDetails.Count()) > 10
select new
{
Country = g.Key,
Num = g.Average(n => n.OrderDetails.Count(od => od.ProductId == 3))
};
Sounds simple? Let's look at the SQL it results in. (param values are inlined by the profiler for easy displaying the SQL, the query did use parameters when it was run)
SELECT [LPA_L1].[Country],
[LPA_L1].[LPAV_1] AS [Num]
FROM (SELECT [LPA_L3].[Country],
SUM([LPA_L3].[LPAV_]) AS [LPAV_],
AVG([LPA_L3].[LPAV_1]) AS [LPAV_1]
FROM (SELECT [LPA_L4].[Country],
[LPA_L4].[OrderId],
[LPA_L4].[LPAV_],
(SELECT COUNT(* ) AS [LPAV_]
FROM [Northwind].[dbo].[Order Details] [LPLA_7]
WHERE (([LPA_L4].[OrderId] = [LPLA_7].[OrderID])
AND ([LPLA_7].[ProductID] = 3 /* @p1 */))) AS [LPAV_1]
FROM (SELECT [LPA_L5].[Country],
[LPA_L6].[OrderID] AS [OrderId],
(SELECT COUNT(* ) AS [LPAV_]
FROM [Northwind].[dbo].[Order Details] [LPLA_4]
WHERE (([LPA_L6].[OrderID] = [LPLA_4].[OrderID]))) AS [LPAV_]
FROM ([Northwind].[dbo].[Customers] [LPA_L5]
RIGHT JOIN [Northwind].[dbo].[Orders] [LPA_L6]
ON [LPA_L5].[CustomerID] = [LPA_L6].[CustomerID])) [LPA_L4]) [LPA_L3]
GROUP BY [LPA_L3].[Country]) [LPA_L1]
WHERE ((((([LPA_L1].[LPAV_] > 10 /* @p3 */)))))
ORDER BY [LPA_L1].[Country] ASC
As you can see, this shows what I call folding: it folds a query into another one as the source to make sure SQL can deal with it. With every transformation comes change. This means that the user can't easily predict what the SQL might look like. The user might take a ballpark guess, but unless s/he has worked with the Linq provider for quite some time, it's hard to do. This is problematic because the application doesn't run in a void, it uses the database, likely quite heavily. This means that the database interaction has to be efficient. With all the transformations going on, this is very hard to optimize, from a developer's point of view and from a DBA's point of view.
QuerySpec overcomes this by offering a transformation free API. There are no transformations done, what you write is what you get.
The Linq provider developer problem: Scoping
A problem I had with the Linq provider was scoping. Due to the sequence oriented nature of Linq, scopes are hard to define, however they're essential for SQL: an alias used inside a sub-query isn't usable outside that sub-query. For the provider developer it becomes problematic to determine what will end up in which SQL scope to make sure aliases are correct. A Linq provider aliases all elements by itself, as the user doesn't specify any aliases. Due to transformations, these aliases change, and scopes change with them. This in itself is solvable with mapping tables, alias objects and scope objects. The problem becomes complex when the original source is hard to track down.
For example, when you specify 2 join statements in a linq query, the second join will return an element which joins the result of the first with the right operand of the second. What was the entity joined by the left operand's first property? You have to track that back. If a transformation took place, you have to make sure the mapping tables are kept up to date. This is all caused by the fact that Linq doesn't know any scoping as SQL does. It does have some scoping but it's incompatible with SQL, and making the two match was for me the biggest challenge when writing a Linq provider. Don't get fooled by the syntactic sugar C# and VB.NET offer you, that's not what the Linq provider sees, the linq provider sees calls to extension methods in a sequence.
In the end my code works on, I think, 99% of the scoping problems (and trust me, there are some very complex problems thinkable), but to get there, I had to write a lot of complex code with a lot of transformation logic in it. To understand all that takes time, and thus to make changes to it takes even longer.
QuerySpec doesn't have this problem: as it follows SQL's structure, scoping isn't a problem, aliases are specified by the user, and scopes are defined by the user as well, through the API's flow.
To have a solid alternative to Linq, we needed an API that could match it in expressiveness and readability and could offer what our Linq provider does too: nested queries in projections, complex query definitions in a single statement, function mappings and full LLBLGen Pro feature compatibility, like prefetch path support, exclusion/inclusion of fields etc.
It also had to have the flexibility of our native API: easily append query fragments in logic (e.g. append where clauses), write your own predicate classes or other methods to create queries for your domain really easily, and be compile time safe, so when a field or entity is renamed or removed, your queries should fail to compile so you can adjust the problem right away.
QuerySpec is built on top of our native API (as is our Linq provider), and embeds all that.
Quick tour of QuerySpec's features.
I'll now very briefly go into what QuerySpec offers. One of the main goals was to re-use what was already generated, and make the user use it in an elegant, simple fashion. Another goal was to keep things close to SQL: no transformations due to mismatches. We did add some helper constructs to make things more expressive, but if you don't want to use them, you don't have to. For example there's an Any() operator, but if you feel more confident with using WHEN EXISTS, you're perfectly fine by using it, it doesn't take any extra code.
QuerySpec starts with its own factory, the QueryFactory. The QueryFactory produces either DynamicQuery instances for ad-hoc projections, or EntityQuery<T> instances for queries which return one or more entity instances of type T. Let's look at a couple of examples.
Entity query: select orders from customers from the UK:
var qf = new QueryFactory();
var q = qf.Order
.From(QueryTarget.InnerJoin(OrderEntity.Relations.CustomerEntityUsingCustomerId))
.Where(CustomerFields.Country == "UK")
.OrderBy(CustomerFields.City.Descending());
This uses a join specified by a relationship which is already generated into the generated code. Advantage is that you don't need to know the FK - PK relationship. Disadvantage is that it doesn't look like a real join.
Alternative:
var qf = new QueryFactory();
var q = qf.Order
.From(QueryTarget.InnerJoin(qf.Customer)
.On(OrderFields.CustomerId==CustomerFields.CustomerId)
.Where(CustomerFields.Country == "UK")
.OrderBy(CustomerFields.City.Descending());
This uses a join using an ON clause, which accepts a predicate. I specified an INNER join, but I also could have specified a left or right join instead. No join .. into with a DefaultIfEmpty() construct which gets transformed under the hood into a left join.
Dynamic query: a group by query with a typed projection (so a List<T> will be returned) and a nested query in the projection, using an aliased set.
var qf = new QueryFactory();
var q = qf.Create()
.Select(() => new
{
Key = CustomerFields.Country.ToValue<string>(),
CustomersInCountry = qf.Customer.TargetAs("C")
.CorrelatedOver(CustomerFields.Country.Source("C") == CustomerFields.Country)
.ToResultset()
})
.GroupBy(CustomerFields.Country);
LLBLGen Pro doesn't require you to specify a From clause as it can figure that one out by itself. Every query has methods like Select, Where, From, Having, GroupBy, which represent a clause in the final SQL query. This is predictable and has a 1:1 projection onto SQL. As we're working with an O/R mapper and the entities it deals with, writing plain SQL isn't going to help: the entity might look different than the actual table it's mapped on, it might be a subtype in an inheritance hierarchy etc. etc.
Because there are no transformations and because scoping is clear, two of the more major problems with a Linq provider are solved. I'm sure some people will not like it and stick with Linq and our Linq provider. That's fine, and it was never our goal to replace it. We just wanted to provide a solid alternative for the people who want one, and I think our new API fits that requirement. Because it all uses the same core elements under the hood, you can write one query in one API and the other query in another API and run it on the same framework: more choice, more ways to solve a problem, so less problems for you, the developer.
LLBLGen Pro QuerySpec beta is now available for LLBLGen Pro v3.x users and is a free add-on for v3.x users and can be downloaded in the customer area of our website. Stay tuned for more QuerySpec examples and blogposts in the coming weeks!