Developing Linq to LLBLGen Pro, part 12
(Updated Wednesday 30-jan-2008). It was mentioned that we would implement 'Skip' as well, although we already had a paging method added, TakePage(). After carefull analysis, we decided not to implement Skip for now. The reason is that it can lead to confusing queries, while paging is what the developer wants. We believe TakePage() serves the developer better than a combination of Skip / Take (Take is still supported separately) which won't work in a lot of cases if Skip is specified alone.
(This is part of an on-going series of articles, started here)
Cast again
The last episode in this series
contained a remark about Queryable.Cast, and that it
can be ignored. I've to correct myself there, this isn't
entirely correct. Let's look at the example query at hand:
// LLBLGen Pro Linq example var q = from e in metaData.Employee.Where(e => e is BoardMemberEntity).Cast<BoardMemberEntity>() select e.CompanyCarId;
Here, the Cast is actually irrelevant because the Where already filters on the BoardMemberEntity type. However in the following small query, it's not:
var q = from e in metaData.Employee.Cast<BoardMemberEntity>() select e.CompanyCarId;
Here, the Cast is actually a type filter. Well... sort of. The thing is: there's no real 'good' way to handle this, similar to 'as', which is discussed below. Imagine the situation where the employee instance is of type ManagerEntity (supertype of BoardMemberEntity), and not of type BoardMemberEntity. What should the query above do in that case? The instance 'e' in that case doesn't have a CompanyCarId field, as that's a field only available in the BoardMemberEntity type. The specification of Queryable.Cast doesn't reveal any help here: it doesn't say what should be done when the Cast can't be performed. So, I decided it emits a type filter instead. This means that the query above always succeeds if there are BoardMemberEntity instances in the database, it simply filters out any types which aren't of the specified type, in this case BoardMemberEntity.
OfType
Queryable.OfType is a bit of a weird method, in that it
actually does more or less what Cast does in that it filters
out any elements which aren't of the type specified. Of
course, 'Cast' by definition isn't a filter, but in database
queries, you have little choice here: SQL isn't imperative,
at least not in the way SELECT queries work. OfType
therefore is implemented similar to Cast: it emits a type
filter into the query.
The 'as' keyword
I already had implemented support for 'is', which resulted
also in a typefilter (if you now have the feeling there are
a couple of redundant ways to specify the same thing in
Linq, you're correct), but 'as' is a little different. It
again is more or less a keyword which is actually not really
usable in database queries. Let's look at a query which
illustrates this:
var q = from e in metaData.Employee
where (e as BoardMemberEntity).CompanyCarId==1
select e;
This in itself looks pretty straight forward, but look
closely to what it represents: what if 'e' is a ClerkEntity
(not a supertype of BoardMemberEntity) ? The 'as' keyword
should then result in null/Nothing, so accessing the
property CompanyCarId on it should result in an exception,
at least in .NET code it does. But how are you going to
convert this into SQL? One could argue, and I agree with
that, that the query above is pretty poor code and
constructions like that should be discouraged. Though the
thing is that just because it's possible, it will cause some
people to write code just like that.
Fortunately, the code above is translatable to SQL in
general form, if the 'as' keyword is seen as a type
conversion, so 'e' is simply seen as a BoardMemberEntity and
the fact that a 'null' can occur is ignored as the filter on
CompanyCarId will weed out any types without that field in
the query anyway (no matter what inheritance mapping
strategy is used). The only difference is that if you
expect an error, you won't get one. Let's look at a more
nastier version of the one above which actually is more
correct:
var q = from e in metaData.Employee
let x = e as BoardMemberEntity
where (x!=null) && (x.CompanyCarId == 1)
select x;
Here, across two statements, a type filter is written: first
the 'as' conversion in the let statement, and then
the test for null on the result. You can generally write
this as a handling of type specification equal/not
equal null and by adding a special handler for that in the
general binary expression handler, you can convert that
situation to a type filter!
There's something not really great about the query above
though. Linq to Sql has no problems, because it only
supports single-table inheritance. If the O/R mapper
supports multi-table/view inheritance, like
LLBLGen Pro
does, you have a bit of a problem with field names. Consider
a hierarchy where the root has three branches of subtypes,
and in two branches a field Foo is present, though not in
the third branch. If you now fetch all entities of the root
type, you will end up with multiple times the same field in
the projection. The fix for that is field aliasing, namely
you give every field an artificial alias, e.g. Fx_n, where x
is some index to specify the entity and n is the index of
the field in the entity.
The problem begins when 'let' entered the room: a 'let' is
effectively a wrapping SELECT statement. (You can rewrite
some queries by moving the query to other parts of the query
but that's not always possible). The query wraps the
original query in its FROM clause as a Derived Table and
simply selects from that source, making the projection
required for let. Now, imagine you have your fields
aliased with artificial aliases inside that source. Your
Derived Table fields now have names like F0_1, F3_4 etc.,
but not 'CompanyCarId'. This means that the filter on
CompanyCarId, which is outside the Derived Table, as
let wrapped what's before 'let' in the query, will
fail if it targets 'CompanyCarId'. Normally it would work,
as the JOIN statements which joined every subtype's
Table/View would be accessable for the WHERE clause, but in
this case that's not possible because the joins are inside
the Derived Table.
As elements are processed in different areas of the Linq
provider (you're not going to write a big routine which
handles every situation), the handler for Where has no clue
if the field should be re-aliased, at least not at that
point. So I wrote a traverser, a class which crawls over all
LLBLGen Pro query api objects and finds the elements sought.
Basicly it's a base class which simply visits all object
elements and if you in a derived class override a given
method you can tap into this process and do what you need to
do, e.g. collect Derived Table definitions, fields targeting
derived tables etc. I wrote a couple of derived classes
which collected information for me this way and with that I
could easily fix every field alias and target in the entire
query, so CompanyCarId references in filters outside the
Derived Table would then be fixed to target the Fx_n field
which represented CompanyCarId. The crawlers are also handy
to find places where inheritance relations have to be
injected into the query, for example because the query is
folded into a subquery inside a Derived Table. The classes
will be public in the upcoming runtime library of
LLBLGen Pro
v2.6 so our customers can use them as well for their own
fine-grained query voodoo magic
.
Except and Intersect
Except and Intersect are two methods which are actually each
other complement: you can implement both with a single
construct and just a flag to add 'NOT' to the query
fragment. Except and Intersect can be implemented as an
EXISTS query construct, similar to the work done for
Contains: a lot of code can be re-used for these two
methods.
Except and Intersect can have an IEnumerable as argument.
Although Linq to Sql doesn't support it (I don't know why)
it's perfectly doable, and in Linq to LLBLGen Pro you can
pass an in-memory set of elements to Except or Intersect.
What's particularly weird with Linq to Sql not supporting it
is that the code to support in-memory sets as argument for
these two methods is also usable for Contains. Oh well...
.
There's some pitfall to be noticed with these two methods
however. Consider this query:
// won't work
var q = from c in metaData.Customers
where someCollection.Except(c.Orders).First().EmployeeId==3
select c;
This query might look like a bit far-fetched, but the
general idea is this: it's not possible to use an in-memory
collection in a database-targeting query where Except is
called on that in-memory collection with a database set as
argument. Do you see why? It took me 3 days to realize this,
so don't worry if you don't see it right away. The thing is:
Except filters the set it is called on. But that set
isn't in the database, you've to pass it to the database
in the query. With PK fields, that might be possible,
but not with complete entities, that's undoable. With
Intersect it could be done though, however I haven't
implemented that for now, as it's easy to work around it
(swap the operands of Intersect
).
Single
Now here's some method I have no idea why it is in the API:
Queryable.Single. Let me first quote the specification of
the method:
Returns the only element of a sequence, and throws an exception if there is not exactly one element in the sequence.
There's an overload which accepts a predicate and which
simply means that Single(predicate) will return the only
element in the sequence which matches this predicate. What I
find odd is the remark about throwing
an exception: why would anyone call such a
method? There is a rule about exceptions: "Don't ever use exceptions for control flow in your
application". Exceptions aren't expressions for if/else constructs,
they're serious business: they mean something was definitely
wrong and needs handling to avoid a total crash. In database
scenario's it's even weirder: which exception should you
expect? And more importantly: what does it mean? If you
don't know what the true meaning of an exception is, you'll
never be able to handle it.
The example in the MSDN to illustrate Queryable.Single() is
pretty bad actually, because it uses exceptions as control
flow. Not only that, the example fails to illustrate a valid
case where the exception would be something you would want.
This is important, because... I can't think of a use case
for Queryable.Single() where you would want the exception.
After all, exceptions aren't meant for control flow, so I
don't expect an exception to drive my code as if an
if-statement resolved to false: it's not meant to be a test
if a set contains more than one element.
The thing with Single is that it's redundant, at least for
database queries. You can also use First(). First() also
returns a single element, but it doesn't throw an exception
when there are more than one element in the sequence. With
database queries, using Single() has the same behavior as
First(). Sure, I could add code which flags the resulting
QueryExpression object that the result should be checked if
it has more than one element and if so, an exception should
be thrown, as Linq to Sql does with this query:
// Linq to Sql code var q = (from c in nw.Customers select c).Single();
However, do I get the same exception that there are more than one element in the sequence with this one:
// Linq to Sql code
var q = from c in nw.Customers
where c.Orders.Single().EmployeeId==3
select c;
No. With this query I get a hardcore severity level 16 error
from SqlServer that the query is wrong because the subquery
returns more than one element which can't be used with the
operator (=) specified. This isn't the fault of Linq to Sql,
what else can it do? Call RAISERROR (someone at Sybase still
feels ashamed about that typo I bet
) with a Count check? Why? Would that help the caller of
the query? I'm not convinced it necessary will help. The
'Single' method is simply not useful in database queries:
for the Single() overload, use First(), for the
Single(predicate) overload use Where(predicate).First().
Though, as the requirement of the method is that it should
throw an (not specified which one) exception, these usable
synonym statements aren't completely covering what Single()
represents. Though in my opinion, the exception requirement
is a big mistake: if you need behavior to be called when a
set has more than one element, you should test on that
specification and call the behavior if the test succeeds,
not by issuing a query which in the end fails with whatever
error takes place, and count on that to handle things
further.
I'll add support for it, though under protest. The reason
I'll add support for it is to be compatible with queries
which target other O/R mapper frameworks.
Linq to Sql issue(s)
During development of the Linq to LLBLGen Pro provider I
often check what Linq to Sql does with a given query I use
for testing and to see if for example my SQL is more
efficient or falls flat on its face. Sometimes you run into
unexpected things. When I tried an Except or Intersect
query, I saw that 'DISTINCT' was emitted into the query.
Everyone who knows SQL knows that DISTINCT is a keyword you
have to be careful with, not all databases support every
type of field with DISTINCT. In SqlServer for example,
(n)text and image fields aren't supported in a DISTINCT
projection. Not sure if this was a small glitch or that
there was logic which would prevent DISTINCT in queries
where it's not allowed I tried:
// Linq to Sql, gives crash with NotSupportedException. var q = (from e in nw.Employees select e).Distinct();
It too emitted DISTINCT into the query, which was caught by its validation checker. However what's worse: Except and Intersect therefore also aren't usable with Linq to Sql and any image/(n)text containing entity type: DISTINCT is always emitted into the query. This query for example gives the same exception, though no DISTINCT was specified in the Linq query:
var q = from e in nw.Employees.Except(
from e in nw.Employees where e.Country == "USA" select e)
select e;
One could argue: "But if DISTINCT isn't possible, how to weed out the duplicates?". Well, you do that on the client in the routine which consumes the datareader and constructs the objects to return. You keep hashtables with hashes calculated from identifying attributes like PK fields and with that you filter out duplicates. At least with entity fetches like this one. Not supporting situations where DISTINCT can't be emitted into a SQL query is a typical error one could make in a v1 O/R mapper, it's only a bit sad for Linq to Sql users that Microsoft is so generous with releasing fixes for their framework on a regular basis.
Writing a Linq provider is a lot about true software
engineering, I've written about that several times before:
some things are known but the biggest part is unknown
territory: what is constructed in which order, when to
expect what, is it safe to ignore this or that? Unclear
things one can only find out when it is used, i.e. by
trial/error approaches. This is actually a bit sad, because
it's now easy to overlook mistakes or miss corner cases, as
the bigger picture isn't always clear. For example: what
does VB.NET emit into the expression tree for string
concatenations if C# emits 'Add' operations ?
With the DISTINCT keyword popping up in the Linq to Sql
query for Except I was immediately alarmed: why is it there?
Linq to Sql never emits DISTINCT when it's not told to do
so. You then start thinking about it: is it something which
is a left-over from their tree manipulations? Or is it
hardcoded set to be emitted? It turns out it is. In
QueryConverter.VisitExcept and QueryConverter.VisitIntersect
it sets 'IsDistinct' to true. I couldn't think of a reason
to have it default to DISTINCT, so I didn't add a
requirement for that to our tree for Except and Intersect,
also because it uses an EXISTS query, which doesn't care if
DISTINCT is there or not.
Both routines are also a clone of eachother. Clones are
easily created and often overlooked, however this one is
particularly obvious, especially because the behavior of the
two routines is closely related, so the implementation of
the handlers is then too closely related.
What's next?
Implementing 'Single', probably tomorrow, then on to
implementing the database function framework I have in mind:
it has to be a framework where developers can add their own
function mappings to the provider so they can map their own
extension methods to database functions easily. When that's
done, Queryable.Convert gets another look as some scenario's
require it not to be stripped off but handled instead,
though that relies on the function-mapping framework, so it
has to wait till then. After that, hierarchical projections
and prefetch path support are on the menu (prefetch paths
are more a small addition as the core functionality is
already in the runtime for quite some time, hierarchical
projections require the prefetch path merge code already in
the framework to be opened up to the Linq provider) and then
I'm done with the Linq part. Finally. Stay tuned.