Developing Linq to LLBLGen Pro, Day 1

Wednesday, September 12, 2007

(This is part of an on-going series of articles, started here)

I didn't have much time today to work on our Linq to LLBLGen Pro layer, but there are nevertheless a couple of interesting things to mention.

It's all about the Source, Luke
Let's look at a skeleton of a very simple Linq query:

// C#
var q = from c in source
        select c;

There are a couple of fuzzy things to see in the query above: a) what the **** is 'q' and b) why is 'source' printed in italics? Let's first address b). Linq is actually a Domain Specific Language (DSL), embedded inside C# or VB.NET. This means that it's not really part of the language, it's a different language. Compiling the complete code in C# and VB.NET will result in MSIL, so what happens to the Linq code inside the C# or VB.NET code? There are two options:

If the source implements IQueryable<T>, convert the query to an expression tree and pass it to the implementing source for processing and execution.
If the source implements solely IEnumerable<T>, convert the query to a set of calls to extension methods on source, which will be run in-memory as normally generated MSIL.
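The two options can be seen side by side with nothing but the framework's own types. This is a minimal sketch, not LLBLGen Pro code: the class name TwoPathsDemo is made up for illustration, and AsQueryable() is used only to obtain an IQueryable<int> over an in-memory array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; only framework types are used.
public static class TwoPathsDemo
{
    public static string Describe()
    {
        int[] numbers = { 1, 2, 3 };

        // Source is only IEnumerable<int>: the query binds to Enumerable.Select
        // and the lambda is compiled to a delegate that runs in memory.
        IEnumerable<int> inMemory = from n in numbers select n;

        // Source is IQueryable<int>: the same query text binds to Queryable.Select
        // and the lambda is captured as an expression tree instead.
        IQueryable<int> queryable = from n in numbers.AsQueryable() select n;

        // The in-memory result carries no expression tree; the queryable one
        // exposes a method-call node describing the Select.
        return $"{inMemory is IQueryable}/{queryable.Expression.NodeType}";
    }
}
```

Describe() returns "False/Call": the first query is a plain iterator, while the second carries a tree whose root is the call to Select.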

The expression tree created when the source implements IQueryable<T> is similar to a parse tree, and contains all information from the outer-scope (the C# / VB.NET code) including the query / Linq code itself. This gives an interesting problem: what to use for source? If you look at Linq to Sql, it uses Table<T> classes. However for collections inside entities (e.g. Customer.Orders) it uses EntitySet<T>. The reason is simple: if you have a set of entities loaded into memory and you want to perform a Linq to Objects query on them in-memory, you don't want the C# / VB.NET compilers to create an expression tree which is sent to the IQueryable<T> implementation in the source of the query. To avoid that, the source has to implement IQueryable<T> but also shouldn't be used as a collection / container which could also be the source of an in-memory query.

I initially thought that our EntityCollection<T> class would be a good source class for the IQueryable<T> implementation, but I would then run into the problem described above: an in-memory collection would never be usable as the source in a Linq to Objects query. So I will need a separate class solely for this purpose. I dubbed it DataSource<T>. This class will implement IEnumerable<T> and IQueryable<T>. It's unclear at this point if it has to implement IQueryProvider as well. Linq to Sql's Table<T> class does, but it seems unnecessary and also a combination of concerns you don't want (the combination of IEnumerable<T> and IQueryable<T> is already too much IMHO, but I'm not designing the Linq API).
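One possible shape for such a query-only source is sketched below. It is a guess at the idea, not the actual LLBLGen Pro class: the name DataSourceSketch<T> is hypothetical, and it assumes the provider is handed in from outside rather than implemented by the source itself.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical sketch of a query-only source; not the real DataSource<T>.
public sealed class DataSourceSketch<T> : IQueryable<T>
{
    public DataSourceSketch(IQueryProvider provider)
    {
        Provider = provider ?? throw new ArgumentNullException(nameof(provider));
        // The source is just a placeholder: it shows up as a constant leaf
        // in every expression tree that is built on top of it.
        Expression = Expression.Constant(this);
    }

    public Type ElementType => typeof(T);
    public Expression Expression { get; }
    public IQueryProvider Provider { get; }

    // Enumerating the bare source means "fetch every T": the tree
    // (only the constant leaf here) is handed to the provider.
    public IEnumerator<T> GetEnumerator() =>
        Provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because the class holds no elements of its own, there is nothing to run a Linq to Objects query against, which is the separation of roles described above.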

So, the next question then is: how is the expression tree, created at runtime by code emitted by the C# / VB.NET compilers, converted to an object which is able to produce a result? To answer that, let's go back to our example query above. The statement has to result in an object which can be placed in the 'q' variable. We'll get to q in a second. The source (it's him again) has a property called Provider. This provider implements IQueryProvider, and that interface has a method called CreateQuery() which accepts ... an Expression object which is our expression tree.

The fuzziness isn't over though. CreateQuery() results in an IQueryable<T> implementing object. If this sounds rather recursive and a bit odd, it is: an IQueryable<T> (the source) is asked for a provider to create ... an IQueryable<T>, but not the same kind of IQueryable<T> as the source, as the source is just a placeholder, a stand-in to be able to take part in the expression tree.
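The round trip through the provider can be written out by hand. The sketch below builds, for an IQueryable<int>, the same kind of tree the compiler-generated call to Queryable.Select produces for 'from c in source select c', and then asks the source's provider for a new query. The class and method names (CreateQueryDemo, SelectAll) are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical demo class; only framework types and methods are used.
public static class CreateQueryDemo
{
    public static IQueryable<int> SelectAll(IQueryable<int> source)
    {
        // The lambda is kept as data, not compiled to a delegate.
        Expression<Func<int, int>> selector = c => c;

        // MethodInfo for Queryable.Select<int, int>(IQueryable<int>, Expression<Func<int, int>>).
        MethodInfo select =
            new Func<IQueryable<int>, Expression<Func<int, int>>, IQueryable<int>>(
                Queryable.Select<int, int>).Method;

        // Tree: Select(<source placeholder>, c => c)
        Expression tree = Expression.Call(
            select, source.Expression, Expression.Quote(selector));

        // The source's provider turns the tree into a new IQueryable<int>.
        return source.Provider.CreateQuery<int>(tree);
    }
}
```

Calling SelectAll(new[] { 1, 2, 3 }.AsQueryable()) returns a different object than the source passed in, and the first argument of its method-call expression is the source's own Expression, which is the placeholder role described above.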

Still with me? Good.

So, now we're arriving at that mysterious 'q'. In the James Bond movies, 'Q' was already a somewhat mysterious fellow, and now in Linq queries it seems his nephew, little 'q', stepped in to spoil the party! In the query above, we used 'var' as the type specification, because we don't know at the time of writing what kind of type q's value has. This isn't really true for the above query, but for queries which create new anonymous types, the 'T' in the IQueryable<T> implementing return value of IQueryProvider.CreateQuery() is anonymous, and therefore unknown.
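A small illustration of the anonymous-type case, again using only framework types. The class name AnonymousProjectionDemo and the property names Value and Square are made up for this sketch.

```csharp
using System;
using System.Linq;

// Hypothetical demo class; AsQueryable() stands in for a real query source.
public static class AnonymousProjectionDemo
{
    public static string Squares()
    {
        IQueryable<int> source = new[] { 1, 2, 3 }.AsQueryable();

        // T is an anonymous type here, so IQueryable<T> cannot be written out
        // in source code and 'var' lets the compiler fill it in.
        var q = from n in source
                select new { Value = n, Square = n * n };

        return string.Join(",", q.AsEnumerable().Select(x => x.Square));
    }
}
```

Squares() returns "1,4,9". The compiler knows the exact element type of q, but there is no name for it that could be typed in the declaration.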

The class which implements IQueryable<T>, and whose instance is returned from IQueryProvider.CreateQuery(), is typically native to the O/R mapper which is handling the execution of the query. The main reason is that the instance placed in 'q' is the port to the O/R mapper and to the actual execution of the query in expression tree form. In Linq to Sql this is DataQuery<T>, and in our code it will be LLBLGenProQuery<T>. The class automatically implements IEnumerable<T> through IQueryable<T>, and when the enumerator is requested, the expression tree has to be 'executed' and the result has to be returned. This way, deferred execution of a query is accomplished.
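The deferred execution described above can be sketched end to end. Everything here is hypothetical: SketchQuery<T> stands in for a class like LLBLGenProQuery<T>, SketchProvider for the O/R mapper's provider, and Execute ignores the expression tree and returns canned rows, where a real provider would translate the tree into a database query.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical query object: the thing that ends up in 'q'.
public class SketchQuery<T> : IQueryable<T>
{
    // Used for the root source: the query is its own constant leaf.
    public SketchQuery(IQueryProvider provider)
    {
        Provider = provider;
        Expression = Expression.Constant(this);
    }

    // Used by the provider: wraps a tree built by Queryable.* methods.
    public SketchQuery(IQueryProvider provider, Expression expression)
    {
        Provider = provider;
        Expression = expression;
    }

    public Type ElementType => typeof(T);
    public Expression Expression { get; }
    public IQueryProvider Provider { get; }

    // Nothing runs until the enumerator is requested.
    public IEnumerator<T> GetEnumerator() =>
        Provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Hypothetical provider: counts executions and returns canned rows.
public class SketchProvider : IQueryProvider
{
    private readonly IEnumerable _rows;
    public int ExecuteCount { get; private set; }

    public SketchProvider(IEnumerable rows) { _rows = rows; }

    public IQueryable<TElement> CreateQuery<TElement>(Expression expression) =>
        new SketchQuery<TElement>(this, expression);

    public IQueryable CreateQuery(Expression expression) =>
        throw new NotSupportedException("Only the generic overload is sketched.");

    public TResult Execute<TResult>(Expression expression)
    {
        ExecuteCount++;
        // Assumption: the tree is NOT translated here; canned rows are returned.
        return (TResult)(object)_rows;
    }

    public object Execute(Expression expression) => Execute<object>(expression);
}

public static class DeferredDemo
{
    public static int[] Run()
    {
        var provider = new SketchProvider(new List<string> { "ALFKI", "ANATR" });
        IQueryable<string> source = new SketchQuery<string>(provider);

        var q = from c in source select c;   // builds the tree, executes nothing
        int before = provider.ExecuteCount;
        int rows = q.ToList().Count;         // enumeration triggers Execute
        return new[] { before, provider.ExecuteCount, rows };
    }
}
```

DeferredDemo.Run() returns { 0, 1, 2 }: no execution when the query is defined, one execution when it is enumerated, and the two canned rows as the result.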

It is also key that the object placed in 'q' holds all the information needed to execute the query. This can be done under the hood, for example, by the source, which has access to the provider and can feed the provider the objects it received when the source itself was created.

The source has to be an IEnumerable<T> implementing type (as it seems), where the T is the type you're actually interested in (i.e. the Entity type). Trying to feed the query a normal class which implemented IQueryable<T> didn't result in a proper expression tree in my tests.

So, enough information for some serious programming!
