Developing Linq to LLBLGen Pro, Day 1
(This is part of an ongoing series of articles, started here.)
I didn't have that much time today to work on our Linq to LLBLGen Pro layer, but nevertheless there are a couple of interesting things to mention.
It's all about the Source, Luke
Let's look at a skeleton of a very simple Linq query:
```csharp
// C#
var q = from c in source select c;
```
There are a couple of fuzzy things to see in the query above: a) what the **** is 'q', and b) why is 'source' printed in italics? Let's address b) first. Linq is actually a Domain Specific Language (DSL) embedded inside C# or VB.NET. This means that it's not really part of the language; it's a different language. Compiling the complete C# or VB.NET code results in MSIL, so what happens to the Linq code inside that C# or VB.NET code? There are two options:
- If the source implements IQueryable<T>, convert the query to an expression tree and pass it to the implementing source for processing and execution
- If the source implements only IEnumerable<T>, convert the query to a set of calls to extension methods on the source, which will be run in-memory using normally compiled MSIL.
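To make the two routes concrete, here is a small sketch (the class and method names are made up for illustration) that compiles the same query shape once against a plain array and once against an IQueryable<int> obtained through the framework's AsQueryable(). Per the C# spec, even the degenerate `from n in x select n` is translated into a call to Select, so both versions really do go through a query operator.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: the same query shape bound to two different kinds of source.
public static class TranslationDemo
{
    // 'numbers' is only IEnumerable<int>: the compiler binds to Enumerable.Select,
    // and the lambda n => n is compiled to an ordinary delegate (Func<int, int>).
    public static IEnumerable<int> InMemory(int[] numbers) =>
        from n in numbers select n;

    // AsQueryable() yields an IQueryable<int>: the compiler binds to Queryable.Select,
    // and the lambda is captured as an Expression<Func<int, int>> inside an expression tree.
    public static IQueryable<int> AsExpressionTree(int[] numbers) =>
        from n in numbers.AsQueryable() select n;
}
```

Inspecting `AsExpressionTree(...).Expression` gives a MethodCallExpression for Queryable.Select whose first argument is a constant node representing the source; the in-memory version carries no expression tree at all.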
The expression tree created when the source implements IQueryable<T> is similar to a parse tree and contains all information from the outer scope (the C# / VB.NET code), including the query / Linq code itself. This raises an interesting problem: what to use as the source? If you look at Linq to Sql, it uses Table<T> classes. However, for collections inside entities (e.g. Customer.Orders) it uses EntitySet<T>. The reason is simple: if you have a set of entities loaded into memory and you want to run a Linq to Objects query on them in-memory, you don't want the C# / VB.NET compilers to create an expression tree which is sent to the IQueryable<T> implementation of the query's source. To avoid that, the source has to implement IQueryable<T>, but it shouldn't also be used as a collection / container which could be the source of an in-memory query.
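Here's a small sketch of that problem, using a hypothetical QueryableList<T>: an in-memory collection that also implements IQueryable<T> (it borrows the framework's in-memory provider via AsQueryable() so its queries still run). Because overload resolution prefers Queryable.Where over Enumerable.Where for such a type, the query turns into an expression tree handed to the provider, even though the data is already in memory; only an explicit AsEnumerable() gets you back to plain Linq to Objects.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical in-memory collection that (unwisely) also implements IQueryable<T>.
public class QueryableList<T> : IQueryable<T>
{
    private readonly List<T> _items;
    private readonly IQueryable<T> _inner;

    public QueryableList(IEnumerable<T> items)
    {
        _items = new List<T>(items);
        // Borrow the framework's in-memory provider so queries can still execute.
        _inner = _items.AsQueryable();
    }

    public Type ElementType => typeof(T);
    public Expression Expression => _inner.Expression;
    public IQueryProvider Provider => _inner.Provider;
    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Hypothetical demo methods.
public static class SourceDemo
{
    // Binds to Queryable.Where: the lambda becomes an expression tree sent to Provider.
    public static IQueryable<int> Filter(QueryableList<int> orders) =>
        from o in orders where o > 10 select o;

    // AsEnumerable() changes the static type, so this binds to Enumerable.Where instead.
    public static IEnumerable<int> FilterInMemory(QueryableList<int> orders) =>
        from o in orders.AsEnumerable() where o > 10 select o;
}
```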
I initially thought that our EntityCollection<T> class would be a good source class for the IQueryable<T> implementation, but then I would run into the problem described above: an in-memory collection would never be usable as the source in a Linq to Objects query. So I'll need a separate class solely for this purpose, which I dubbed DataSource<T>. This class will implement IEnumerable<T> and IQueryable<T>. It's unclear at this point whether it has to implement IQueryProvider as well. Linq to Sql's Table<T> class does, but that seems unnecessary and is also a combination of concerns you don't want (the combination of IEnumerable<T> and IQueryable<T> is already too much IMHO, but I'm not designing the Linq API.)
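As a rough illustration of that design decision, here is a hypothetical skeleton of a query-root class in the spirit of DataSource<T>; it is a sketch under stated assumptions, not the actual LLBLGen Pro code. It implements IQueryable<T> (and therefore IEnumerable<T>) but leaves IQueryProvider to a separate object. The NotYetImplementedProvider class is likewise made up, just enough to construct the source.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical query-root class: implements IQueryable<T>, but NOT IQueryProvider.
public sealed class DataSource<T> : IQueryable<T>
{
    private readonly Expression _expression;
    private readonly IQueryProvider _provider;

    public DataSource(IQueryProvider provider)
    {
        _provider = provider ?? throw new ArgumentNullException(nameof(provider));
        // In a query's expression tree the source appears as a constant node pointing at itself.
        _expression = Expression.Constant(this);
    }

    public Type ElementType => typeof(T);
    public Expression Expression => _expression;
    public IQueryProvider Provider => _provider; // a separate object, not the source itself

    // Enumerating the bare source means "fetch every T", which is also the provider's job.
    public IEnumerator<T> GetEnumerator() =>
        _provider.Execute<IEnumerable<T>>(_expression).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Hypothetical placeholder provider: every member throws until the real work is written.
public sealed class NotYetImplementedProvider : IQueryProvider
{
    public IQueryable CreateQuery(Expression expression) => throw NotDone();
    public IQueryable<TElement> CreateQuery<TElement>(Expression expression) => throw NotDone();
    public object Execute(Expression expression) => throw NotDone();
    public TResult Execute<TResult>(Expression expression) => throw NotDone();

    private static NotSupportedException NotDone() =>
        new NotSupportedException("Provider not implemented yet in this sketch.");
}
```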
So, the next question is: how is the expression tree, created at runtime by code emitted by the C# / VB.NET compilers, converted into an object which is able to produce a result? To answer that, let's go back to our example query above. The statement has to result in an object which can be placed in the 'q' variable; we'll get to q in a second. The source (it's him again) has a property called Provider. This provider implements IQueryProvider, and that interface has a method called CreateQuery() which accepts ... an Expression object, which is our expression tree.
The fuzziness isn't over, though. CreateQuery() returns an object implementing IQueryable<T>. If this sounds rather recursive and a bit odd, it is: an IQueryable<T> (the source) is asked for a provider to create ... an IQueryable<T>, but not the same kind of IQueryable<T> as the source, as the source is just a placeholder, a stand-in needed to take part in the expression tree.
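The round trip through Provider and CreateQuery() can be observed with a small sketch. All three classes below are hypothetical: RecordingProvider only remembers the expression it receives, PlaceholderSource<T> is the stand-in source, and ProviderQuery<T> is the different IQueryable<T> that CreateQuery() hands back. Queryable.Select builds a MethodCallExpression whose first argument is the source's constant node, then calls source.Provider.CreateQuery<TResult>() with it.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical provider that records what the compiler-emitted code hands it.
public sealed class RecordingProvider : IQueryProvider
{
    public Expression LastExpression { get; private set; }

    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
    {
        LastExpression = expression;
        // Not the source type: a different IQueryable<T> that carries the whole tree.
        return new ProviderQuery<TElement>(this, expression);
    }

    public IQueryable CreateQuery(Expression expression) =>
        throw new NotSupportedException("Sketch: only the generic overload is used here.");

    public TResult Execute<TResult>(Expression expression) =>
        throw new NotSupportedException("Sketch: execution is not implemented.");

    public object Execute(Expression expression) =>
        throw new NotSupportedException("Sketch: execution is not implemented.");
}

// Hypothetical placeholder source: only exists to become the root constant of the tree.
public sealed class PlaceholderSource<T> : IQueryable<T>
{
    public PlaceholderSource(IQueryProvider provider)
    {
        Provider = provider;
        Expression = Expression.Constant(this);
    }

    public Type ElementType => typeof(T);
    public Expression Expression { get; }
    public IQueryProvider Provider { get; }
    public IEnumerator<T> GetEnumerator() =>
        Provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Hypothetical query class: what CreateQuery returns, and what ends up in 'q'.
public sealed class ProviderQuery<T> : IQueryable<T>
{
    public ProviderQuery(IQueryProvider provider, Expression expression)
    {
        Provider = provider;
        Expression = expression;
    }

    public Type ElementType => typeof(T);
    public Expression Expression { get; }
    public IQueryProvider Provider { get; }
    public IEnumerator<T> GetEnumerator() =>
        Provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Writing `var q = from s in new PlaceholderSource<string>(provider) select s.Length;` leaves a ProviderQuery<int> in q, while provider.LastExpression holds the Select call rooted at the placeholder source.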
Still with me? Good.
So now we're arriving at that mysterious 'q'. In the James Bond movies, 'Q' was already a somewhat mysterious fellow, and now in Linq queries it seems his nephew, little 'q', has stepped in to spoil the party! In the query above, we used 'var' as the type specification, because when writing the query we don't know what type q's value has. That isn't really true for the query above, but for queries which create new anonymous types, the 'T' in the IQueryable<T> returned by IQueryProvider.CreateQuery() is anonymous, and therefore has no name we could write in code.
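A short sketch of why 'var' is unavoidable once a query projects into an anonymous type. The Customer class and the demo methods are hypothetical, and the framework's in-memory AsQueryable() stands in for a real O/R mapper source. Inside the declaring method the anonymous members are statically typed; outside it, the best we can expose is the non-generic IQueryable.

```csharp
using System.Linq;

// Hypothetical entity class, just for the example.
public class Customer
{
    public string Name { get; set; } = "";
    public string City { get; set; } = "";
}

public static class AnonymousProjectionDemo
{
    // Outside this method there is no type name for the element type,
    // so the query is returned as the non-generic IQueryable.
    public static IQueryable Project(IQueryable<Customer> customers)
    {
        // 'var' is required: q is IQueryable<T> where T is a compiler-generated anonymous type.
        var q = from c in customers
                select new { c.Name, c.City };
        return q;
    }

    // Inside the method that declares q, the anonymous type's members are statically typed.
    public static string[] Describe(IQueryable<Customer> customers)
    {
        var q = from c in customers
                select new { c.Name, c.City };
        return q.Select(x => x.Name + " (" + x.City + ")").ToArray();
    }
}
```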
The class which implements IQueryable<T>, and whose instance is returned from IQueryProvider.CreateQuery(), is typically native to the O/R mapper which handles the execution of the query. The main reason is that the instance placed in 'q' is the gateway to the O/R mapper and to the actual execution of the query in expression tree form. In Linq to Sql this is DataQuery<T>, and in our code it will be LLBLGenProQuery<T>. The class automatically implements IEnumerable<T> through IQueryable<T>, and when the enumerator is requested, the expression tree has to be 'executed' and the result returned. This is how deferred execution of a query is accomplished.
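Deferred execution can be made visible with a counting provider. In this sketch DeferredQuery<T> plays the role the text gives LLBLGenProQuery<T>, and CountingProvider stands in for the O/R mapper; both are hypothetical. Instead of translating the tree to SQL, the provider delegates execution to the framework's in-memory provider (the one AsQueryable() returns), so the example runs without a database.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical provider: counts executions and delegates the real work to an inner
// provider (here the framework's in-memory one), where an O/R mapper would emit SQL.
public sealed class CountingProvider : IQueryProvider
{
    private readonly IQueryProvider _inner;
    public CountingProvider(IQueryProvider inner) { _inner = inner; }

    public int Executions { get; private set; }

    public IQueryable<TElement> CreateQuery<TElement>(Expression expression) =>
        new DeferredQuery<TElement>(this, expression);

    public IQueryable CreateQuery(Expression expression) =>
        throw new NotSupportedException("Sketch: only the generic overload is used.");

    public TResult Execute<TResult>(Expression expression)
    {
        Executions++; // this is the moment the query actually runs
        return _inner.Execute<TResult>(expression);
    }

    public object Execute(Expression expression)
    {
        Executions++;
        return _inner.Execute(expression);
    }
}

// Hypothetical query class standing in for LLBLGenProQuery<T>.
public sealed class DeferredQuery<T> : IQueryable<T>
{
    public DeferredQuery(IQueryProvider provider, Expression expression)
    {
        Provider = provider;
        Expression = expression;
    }

    public Type ElementType => typeof(T);
    public Expression Expression { get; }
    public IQueryProvider Provider { get; }

    // Building the query did nothing; asking for the enumerator executes the tree.
    public IEnumerator<T> GetEnumerator() =>
        Provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

With `var data = new List<int> { 1, 5, 12, 40 }.AsQueryable();` and a source of `new DeferredQuery<int>(new CountingProvider(data.Provider), data.Expression)`, writing a `where n > 10` query leaves Executions at 0; each enumeration of the result then bumps it by one.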
It is also key to store all information needed to execute the query inside the object placed in 'q'. This can be done under the hood, for example, by the source, which has access to the provider and can feed it the objects the source itself received when it was created.
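One way this hand-off could look, as a hypothetical sketch: QueryContext stands in for whatever the O/R mapper needs later (just a made-up connection string here). The source receives it on construction and gives it to the provider it creates, and because every query produced by CreateQuery() keeps a reference to that provider, the object in 'q' can reach the context when it is finally executed.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical execution context; a real O/R mapper would carry more than a string.
public sealed class QueryContext
{
    public QueryContext(string connectionString) { ConnectionString = connectionString; }
    public string ConnectionString { get; }
}

// Hypothetical provider that keeps the context it was created with.
public sealed class ContextProvider : IQueryProvider
{
    public ContextProvider(QueryContext context) { Context = context; }
    public QueryContext Context { get; }

    // Every query created here shares this provider, and therefore the context.
    public IQueryable<TElement> CreateQuery<TElement>(Expression expression) =>
        new ContextQuery<TElement>(this, expression);

    public IQueryable CreateQuery(Expression expression) =>
        throw new NotSupportedException("Sketch: only the generic overload is used.");

    // A real provider would translate the tree and run it using Context.
    public TResult Execute<TResult>(Expression expression) =>
        throw new NotSupportedException("Sketch: execution is not implemented.");

    public object Execute(Expression expression) =>
        throw new NotSupportedException("Sketch: execution is not implemented.");
}

// Hypothetical query class: the object that ends up in 'q'.
public sealed class ContextQuery<T> : IQueryable<T>
{
    public ContextQuery(ContextProvider provider, Expression expression)
    {
        Provider = provider;
        Expression = expression;
    }

    public Type ElementType => typeof(T);
    public Expression Expression { get; }
    public IQueryProvider Provider { get; }
    public IEnumerator<T> GetEnumerator() =>
        Provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Hypothetical source: receives the context when created and feeds it to its provider.
public sealed class ContextSource<T> : IQueryable<T>
{
    public ContextSource(QueryContext context)
    {
        Provider = new ContextProvider(context);
        Expression = Expression.Constant(this);
    }

    public Type ElementType => typeof(T);
    public Expression Expression { get; }
    public IQueryProvider Provider { get; }
    public IEnumerator<T> GetEnumerator() =>
        Provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```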
It seems the source has to be a type implementing IEnumerable<T>, where T is the type you're actually interested in (i.e. the entity type). In my tests, feeding the query a normal class which implemented IQueryable<T> didn't result in a proper expression tree.
So, that's enough information for some serious programming!