Development With A Dot

Blog on development in general, and specifically on .NET

Sponsors

News

My Friends

My Links

Permanent Posts

Portuguese Communities

Comparing LINQ Expressions

Introduction

I recently came upon this problem: how to calculate a hash from a LINQ expression so that I can safely compare two expressions for equality? I had naively assumed that the Expression class – and their descendants – would have implemented GetHashCode in an appropriate way, so as to make developer’s lifes easier, but unfortunately Microsoft thought otherwise.

After looking it up on the Internet, I could see two “solutions”:

  • Convert the Expression to its string representation and get the hash code of it;
  • Use an ExpressionVisitor to visit all contained expressions and calculate their individual hash – if this seems recursive to you, that’s because it is!

Comparing the String Representation

The first “solution” doesn’t actually work, because two expressions might represent exactly the same and yet have different string representations. For example, consider:

   1: Expression<Func<Int32, Boolean>> isEven = x => (x % 2) == 0;

and

   1: Expression<Func<Int32, Boolean>> isEven = n => (n % 2) == 0;

The only thing that differentiates these two expressions is the name of the lambda parameter, unfortunately it causes their string representations to be different.

One possible solution might be to use a regular expression to get all occurrences of lambda expressions, capture the name of the lambda variables, and then do a search and replace for some well known name:

   1: var original = expression.ToString();
   2: var replaced = original;
   3: var reParameterDeclaration = new Regex(@"(?<it>\w+)\s=>\s");
   4: var matches = reParameterDeclaration.Matches(original);
   5:  
   6: for (var i = 0; i < matches.Count; ++i)
   7: {
   8:     var match = matches[i];
   9:     var it = match.Groups[1].Value;
  10:  
  11:     replaced = Regex.Replace(replaced, String.Format(@"\b{0}\b", it), "it");
  12: }
  13:  
  14: Int32 hashCode = replaced.GetHashCode();

At first sight – at least, for me! – this seemed to work, however, the replacement pattern – “get me all words composed of only the lambda variable” -  might match something that it wasn’t supposed to, for instance:

   1: //here it works
   2: Expression<Func<Int32, Boolean>> isEven = x => ((x % 2) == 0);
   3:  
   4: //here it doesn't, because the x inside the string is also matched
   5: Expression<Func<String, String>> x => "x y z";

I might use a different replacement regular expression, for example, I could check for “all lambda variables followed by a dot (.)”:

   1: replaced = Regex.Replace(replaced, String.Format(@"\b{0}\.", it), "it");

But this wouldn’t get code like this:

   1: //won't work because there's no dot after the lambda variable x
   2: var sortedNames = names.OrderBy(x => x);

To call it off, I think it might be possible, but it is more complicated than it seems.

Using an ExpressionVisitor

The next approach involves using an ExpressionVisitor to visit all expressions contained in an Expression, something that I have talked before. The thing here is, an Expression is typically composed of several other expressions, of various kinds, so appropriate code must be used for each in order to get an univocal hash out of it. For example, for a ConstantExpression, we might consider its NodeType, Type and Value properties. The problem is, there are more than 20 Expression subclasses, and we need to do this for each.

To cut things short, suppose we have implemented an hash code extractor for each Expression kind. We can’t simply have a big hash code by adding all hash codes calculated for each Expression fragment, because the order by which they appear is also important:

   1: //first take 10 elements and then sort them
   2: var pagedSortedNames = names.Take(10).OrderBy(x => x);
   3:  
   4: //first sort all elements then take 10 of them
   5: var sortedPagedNames = names.OrderBy(x => x).Take(10);

So, each individual hash code must be take in order, and then we must get the hash code of the entire list of hash codes. Uff!

Existing Implementations

This problem exists since there are LINQ expressions, and, of course, other folks have come up with solutions. The NHibernate project has one, probably LINQ to SQL and Entity Framework have too, just to name those more close to me. However, I don’t think any of these solutions might be ready for external, general purpose usage outside their source projects. It would be great to have one such library that we could pick up and use, but I haven’t so far found any. Might be something to work on, though.

What are your thoughts?

Posted: Jun 12 2013, 07:29 PM by Ricardo Peres | with 2 comment(s)
Filed under: ,

Comments

Andrew Hanson said:

I had a need for this too like a month ago and was surprised it didn't "just work". I didn't go as far as you did with outlining potential solutions; I gave up far too easy. Now that you bring it back up, maybe I'll take a crack at it however. It would be really nice to have.

# June 13, 2013 8:24 AM

Ricardo Peres said:

Andrew:

Let me know your findings, please!

I think I'm having a go at it too...

# June 13, 2013 9:03 AM