[LINQ via C# series]
Lambda expression is another powerful syntactic sugar making C# functional. In this post, “Lambda expression” simply means “C# Lambda expression”. The native concept of lambda expression will be introduced in the later lambda calculus posts.
According to MSDN:
A lambda expression is an anonymous function that can contain expressions and statements, and can be used to create delegates or expression tree types.
All lambda expressions use the lambda operator =>, which is read as "goes to". The left side of the lambda operator specifies the input parameters (if any) and the right side holds the expression or statement block.
It is just simple functional expression, which looks like number => number > 0. The left side of => is the function parameters, and the right side is the function body.
Compilation as anonymous method
Consider such a “delegate type” definition:
public delegate TResult Func<T, TResult>(T arg);
Delegate
Then a method of that type can be defined:
private static bool IsPositive(int number)
{
return number > 0;
}
and a “delegate instance” can be created for binding, just like an assignment:
Func<int, bool> predicate = new Func<int, bool>(IsPositive);
where in C# the new DelegateType( ) construction can be omitted:
Func<int, bool> predicate = IsPositive;
Anonymous delegate
C# 2.0 introduced another syntactic sugar “anonymous delegate”, which enables writing code without explicitly defining a named method:
Func<int, bool> predicate = delegate(int number)
{
return number > 0;
};
The above inline constructing and assigning will be compiled to code that:
- defines a method the same as IsPositive(), but with a auto generated name;
- binds the generated to the predicate “delegate instance”;
- caching is also considered to optimize the performance.
The compilation result is:
[CompilerGenerated]
private static Func<int, bool> _cachedAnonymousMethodDelegate;
[CompilerGenerated]
private static bool GeneratedMethod(int number)
{
return number > 0;
}
and the binding is:
Func<int, bool> predicate = _cachedAnonymousMethodDelegate ??
(_cachedAnonymousMethodDelegate = new Func<int, bool>(GeneratedMethod));
, with caching enabled.
So this anonymous delegate works the same as the former delegate scenario.
Anonymous method
In C# 3.0, the former anonymous delegate syntax can be simplified to lambda expression:
Func<int, bool> predicate = (int number) =>
{
return number > 0;
};
To turn a anonymous delegate into a lambda expression, just drop the delegate keyword, and add a lambda operator => between (parameter) and {body}.
Lambda expression can be further shorten:
- When the type of parameter can be inferred (from Func<int, …>), the type declaration of parameter (int) can be omitted;
- When lambda expression has one parameter, the parentheses ( ) can be omitted;
- When the body of the lambda expression has only one return statement, the brackets { } and “return” keyword can be omitted.
So the above lambda expression can be written:
Func<int, bool> predicate = number => number > 0;
In C# 3.0, this is called an expression lambda.
In the scenario of having more than one statements in the body, the the brackets { } and “return” cannot be omitted:
Func<int, bool> predicate = number =>
{
Console.WriteLine(number);
return number > 0;
};
In C# 3.0, this is called a statement lambda.
Once again, the above 3 scenarios work the same.
Truly anonymous method
Since lambda expressions like number => number > 0 can be an anonymous method, it should not require a name. Yes, expression lambda and statement lambda can be used in a truly anonymous way:
new Func<int, bool>(number => number > 0)(1);
new Func<int, bool>(number =>
{
Console.WriteLine(number);
return number > 0;
})(1);
The new Func<int, bool>( ) constructions surrounding lambda expressions are used to inform compiler that the parameter type is int, and the return type is bool. After defining a method, parentheses are used to invoke the method with a parameter: …(1).
The following code cannot be compiled:
(number => number > 0)(1);
because in C# compiler’s perspective, there is no type information for the parameter. In loosely typed languages like JavaScript, this kind of code is totally ok:
(function (number) { return number > 1; })(1)
For other language’s compiler with different type inference design, like Haskell, this kind of code works well:
(\number -> number > 0) 1
or similarly in F#:
(fun number -> number > 0) 1
This does not mean C# compiler has weaker type inference functionality, or has worse quality, this is just a different design. C# is designed for imperative OOP, while F# is designed to be functional. Similar to Haskell, F# compiler knows number is an int because it sees that number is compared (>) with an int value.
The Func and Action generic delegate types
Actually, the above Func<T, TResult> delegate type definition is a .NET 3.5+ FCL built-in type.
In .NET 3.5 FCL, this generic delegate type defined in mscorlib.dll:
namespace System
{
public delegate void Action<T>(T obj);
}
And these are defined in System.Core.dll:
namespace System
{
public delegate void Action();
public delegate void Action<T1, T2>(T1 arg1, T2 arg2);
public delegate void Action<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3);
public delegate void Action<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4);
public delegate TResult Func<TResult>();
public delegate TResult Func<T, TResult>(T arg);
public delegate TResult Func<T1, T2, TResult>(T1 arg1, T2 arg2);
public delegate TResult Func<T1, T2, T3, TResult>(T1 arg1, T2 arg2, T3 arg3);
public delegate TResult Func<T1, T2, T3, T4, TResult>(T1 arg1, T2 arg2, T3 arg3, T4 arg4);
}
They will be used again and again in LINQ programming.
In .NET 4.0 FCL, more Action and Func generic delegate types are provided:
- mscorlib.dll
- Action with 0 ~ 8 type parameters (Action, Action`1 ~ Action`8)
- Func with 1 ~ 9 type parameters (Func`1 ~ Func`9)
- System.Core.dll
- Action`9 ~ Action`16
- Func`10 ~ Func`17
Compilation as expression tree
An expression tree object can be created with lambda expression:
Expression<Func<int, bool>> predicate = number => number > 0;
In the above assignment statement, the right side is a lambda expression, which looks totally the same as the above anonymous method. But this time the predicate has a type Expression<Func<int, bool>> instead of Func<int, bool>. An Expression<T> object is called an expression tree instead of an anonymous method.
Unlike anonymous delegate containing a bunch of executable code, expression tree is a data structure. It is an abstract syntax tree and consists of a number of nodes, representing the structure of executable source code.
The above code is compiled into:
ParameterExpression number = Expression.Parameter(typeof(int), "number");
ConstantExpression zero = Expression.Constant(0, typeof(int));
ParameterExpression[] parameters = new ParameterExpression[] { number };
BinaryExpression body = Expression.GreaterThan(number, zero);
Expression<Func<int, bool>> predicate = Expression.Lambda<Func<int, bool>>(body, parameters);
The above lambda expression is representing a abstract syntax tree data structure:
The differences between
Func<int, bool> method = number => number > 0;
and
Expression<Func<int, bool>> tree = number => number > 0;
are:
- As anonymous method, the lambda expression “number => number > 0” is compiled into a method of Func<int, bool> type, which consists of a bunch of executive IL instructions.
- As expression tree, the lambda expression “number => number > 0” is compiled into an abstract syntax tree object of Expression<Func<int, bool>> type, representing the structure of the Func<int, bool> code “number => number > 0”. This tree consists of:
- A ParameterExpression collection (parameters), representing all the paramters of code “number => number > 0”, which is:
- A ParameterExpression object (number), representing the int parameter named “number”.
- A BinaryExpression object, representing the body of code “number => number > 0”, which is a “greater than” operator with a left operand and a right operand:
- A reference of ParameterExpression object (number), representing the left operand.
- A ConstantExpression object (zero), representing the right operand.
Because each node of expression tree is strong typed with very rich information. it is very easy to write code to traverse an expression tree, get the represented algorithm (like predicting whether a parameter is greater than a constant), and translate to some other stuff (maybe a “greater than” predicate in T-SQL WHERE clause, etc.). Expression tree will be revisited in the posts of LINQ to SQL.
Please notice that, the above syntactic sugar can be only applied to expression lambda, and not available to statement lambda.
Type inference of lambda expression
The parameter type, return type, and lambda expression type should be able to be predicted from the context:
// Expects a int parameter, and returns a bool value.
Func<int, bool> predicate = number => number > 0;
// Expects a int parameter, and returns void.
Action<int> output = number => Console.WriteLine(number);
So “var” cannot be used with lambda expression. The following code cannot compile:
// predicate is a method or an expression tree?
// What is the type of number?
var predicate = number => number > 0;
The compiler still cannot see the type of “number”, and it cannot know should this lambda expression be compiled to an anonymous method (SomeDelegate type), or an expression tree (Expression<SomeDelegate> type).
“dynamic” cannot be used either. “dynamic” is just System.Object.
Traverse expression tree
To demonstrate traversing a expression tree, take the simplest kind of expression tree as example, which only contains the 4 basic binary arithmetical calculation:
- add
- subtract
- multiply
- divide
In .NET these binary calculations are represented by BinaryExpression objects.
For example:
Expression<Func<double, double, double, double, double, double>> infixExpression =
(a, b, c, d, e) => a + b - c * d / 2 + e * 3;
This is a abstract syntax tree representing the structure of a Func<double, double, double, double, double, double> code “(a, b, c, d, e) => a + b - c * d / 2 + e * 2”. It is a very simple binary tree, where:
- each internal node is a binary node (BinaryExpression object) representing add, subtract, multiply, or divide calculation;
- each leaf node is either a parameter (ParameterExpression object), or a constant (ConstantExpression object).
So totally there are 6 possible kinds of nodes in this kind of expression tree:
- add: BinaryExpression(NodeType = ExpressionType.Add)
- subtract: BinaryExpression(NodeType = ExpressionType.Subtract)
- multiply: BinaryExpression(NodeType = ExpressionType.Multiply)
- divide: BinaryExpression(NodeType = ExpressionType.Divide)
- constant: ParameterExpression(NodeType = ExpressionType.Constant)
- parameter: ConstantExpression(NodeType = ExpressionType.Parameter)
Each node has a NodeType property representing the node type.
Recursively traverse this tree is very easy. The following base class encapsulates the basic logic of traversing:
public abstract class SimpleExpressionVisitor<TResult>
{
protected LambdaExpression _expression;
protected SimpleExpressionVisitor(LambdaExpression expression)
{
this._expression = expression;
}
public IEnumerable<TResult> VisitBody()
{
// Takes Body as root node.
return this.VisitNode(this._expression.Body);
}
protected IEnumerable<TResult> VisitNode(Expression node)
{
// Processes the 6 kind of node.
switch (node.NodeType)
{
case ExpressionType.Add:
BinaryExpression add = node as BinaryExpression;
return this.VisitAdd(add);
case ExpressionType.Constant:
ConstantExpression constant = node as ConstantExpression;
return this.VisitConstant(constant);
case ExpressionType.Divide:
BinaryExpression divide = node as BinaryExpression;
return this.VisitDivide(divide);
case ExpressionType.Multiply:
BinaryExpression multiply = node as BinaryExpression;
return this.VisitMultiply(multiply);
case ExpressionType.Parameter:
ParameterExpression parameter = node as ParameterExpression;
return this.VisitParameter(parameter);
case ExpressionType.Subtract:
BinaryExpression subtract = node as BinaryExpression;
return this.VisitSubtract(subtract);
default:
throw new ArgumentOutOfRangeException("node");
}
}
protected abstract IEnumerable<TResult> VisitAdd(BinaryExpression add);
protected abstract IEnumerable<TResult> VisitConstant(ConstantExpression constant);
protected abstract IEnumerable<TResult> VisitDivide(BinaryExpression divide);
protected abstract IEnumerable<TResult> VisitMultiply(BinaryExpression multiply);
protected abstract IEnumerable<TResult> VisitParameter(ParameterExpression parameter);
protected abstract IEnumerable<TResult> VisitSubtract(BinaryExpression subtract);
}
The following class implements the concrete operations while visiting each node, which is to log some string into a IEnumeable<char> object:
public class PreorderVisitor : SimpleExpressionVisitor<char>
{
public PreorderVisitor(LambdaExpression expression)
: base(expression)
{
}
protected override IEnumerable<char> VisitAdd(BinaryExpression add)
{
return this.VisitBinary(add, "add"); // add(left, right)
}
protected override IEnumerable<char> VisitConstant(ConstantExpression constant)
{
return constant.Value.ToString();
}
protected override IEnumerable<char> VisitDivide(BinaryExpression divide)
{
return this.VisitBinary(divide, "div"); // div(left, right)
}
protected override IEnumerable<char> VisitMultiply(BinaryExpression multiply)
{
return this.VisitBinary(multiply, "mul"); // mul(left, right)
}
protected override IEnumerable<char> VisitParameter(ParameterExpression parameter)
{
return parameter.Name;
}
protected override IEnumerable<char> VisitSubtract(BinaryExpression subtract)
{
return this.VisitBinary(subtract, "sub"); // sub(left, right)
}
private IEnumerable<char> VisitBinary(BinaryExpression binary, string prefix)
{
return string.Format(
CultureInfo.InvariantCulture,
"{0}({1}, {2})", // prefix(left, right)
prefix,
this.VisitNode(binary.Left),
this.VisitNode(binary.Right));
}
}
While traversing, each infix expression code structure (a + b) is logged as a prefix expression string add(a, b). The following code:
Expression<Func<double, double, double, double, double, double>> infixExpression =
(a, b, c, d, e) => a + b - c * d / 2 + e * 3;
PreorderVisitor preorderVisitor = new PreorderVisitor(infixExpression);
IEnumerable<char> prefixExpression = preorderVisitor.VisitBody();
Console.WriteLine(prefixExpression);
prints:
add(sub(add(a, b), div(mul(c, d), 2)), mul(e, 3))
Of couse, a + b - c * d / 2 + e * 3 is identical to ((a add b) sub ((c mul d) div 2)) add (e mul 3). This is called infix style, where operator stays between left operand and right operand. (a add b) can be simply read as: a add b.
If the operator is put before left and right operands, it is called prefix style: add(a, b), which has the same meaning with (a add b), but looks more like function invocaion. add(a, b) can be read as: invoke add with arguments a, b.
Yes,
((a add b) sub ((c mul d) div 2)) add (e mul 3)
has the same meaning with
add(sub(add(a, b), div(mul(c, d), 2)), mul(e, 3))
What happened is just changed infix to prefix.
In functional programming language like Lisp, Haskell, F#, it is Ok to write arithmetical expression in either style.
Built-in expression tree visitor
Please notice .NET 4.0 includes a built-in System.Linq.Expressions.ExpressionVisitor class in System.Core.dll.
Translate expression tree
How about postfix? In postfix, changing (a add b) to (a, b)add looks a little weird. (a, b)add can be read as: push a to stack, then push b to stack, execute add.
Yes, this is precisely how the computer works. Just rewrite:
(((a, b)add, ((c, d)mul, 2)div)sub, (e, 3)mul)add
vertically:
a // Loads a to stack.
b // Loads b to stack.
add // Executes add.
c // …
d
mul
2
div
sub
e
3
mul
add
It is very easy to produce this postfix style by changing a little code from PreorderVisitor. Since postfix style is the same as how the computer works, it is Ok to go a little further, replacing string logs (a, b, add, …), computer instructions (loads a to stack, loads b to stack, executes add, …):
public class PostorderVisitor : SimpleExpressionVisitor<Tuple<OpCode, double?>>
{
public PostorderVisitor(LambdaExpression expression)
: base(expression)
{
}
protected override IEnumerable<Tuple<OpCode, double?>> VisitAdd(
BinaryExpression add)
{
return this.VisitBinary(add, OpCodes.Add); // (left, right)add
}
protected override IEnumerable<Tuple<OpCode, double?>> VisitConstant(
ConstantExpression constant)
{
yield return Tuple.Create(OpCodes.Ldc_R8, (double?)constant.Value);
}
protected override IEnumerable<Tuple<OpCode, double?>> VisitDivide(
BinaryExpression divide)
{
return this.VisitBinary(divide, OpCodes.Div); // (left, right)div
}
protected override IEnumerable<Tuple<OpCode, double?>> VisitMultiply(
BinaryExpression multiply)
{
return this.VisitBinary(multiply, OpCodes.Mul); // (left, right)mul
}
protected override IEnumerable<Tuple<OpCode, double?>> VisitParameter(
ParameterExpression parameter)
{
int index = this._expression.Parameters.IndexOf(parameter);
yield return Tuple.Create(OpCodes.Ldarg_S, (double?)index);
}
protected override IEnumerable<Tuple<OpCode, double?>> VisitSubtract(
BinaryExpression subtract)
{
return this.VisitBinary(subtract, OpCodes.Sub); // (left, right)sub
}
private IEnumerable<Tuple<OpCode, double?>> VisitBinary(
BinaryExpression binary, OpCode postfix)
{
// (left, right)postfix
return this.VisitNode(binary.Left).Concat(this.VisitNode(binary.Right))
.Concat(new Tuple<OpCode, double?>[] {
Tuple.Create(postfix, (double?)null) });
}
}
This looks powerful. The following code:
Expression<Func<double, double, double, double, double, double>> infixExpression =
(a, b, c, d, e) => a + b - c * d / 2 + e * 3;
PostorderVisitor postorderVisitor = new PostorderVisitor(infixExpression);
IEnumerable<Tuple<OpCode, double?>> postfixExpression = postorderVisitor.VisitBody();
foreach (Tuple<OpCode, double?> code in postfixExpression)
{
Console.WriteLine("{0} {1}", code.Item1, code.Item2);
}
prints:
ldarg.s 0 // Loads argument a to stack.
ldarg.s 1 // Loads argument b to stack.
add // Executes add.
ldarg.s 2 // …
ldarg.s 3
mul
ldc.r8 2
div
sub
ldarg.s 4
ldc.r8 3
mul
add
Yes, the expression tree data structure has been successfully translated into IL code!
Convert anonymous method to expression tree
Since the same syntax can be used to create anonymous method and expression tree, can these two stuff be converted into each other?
Actually, parse code (a piece of string representing C# code) into a strongly typed tree data structure is tough. Currently this feature is not implemented in any smooth way.
Convert expression tree to anonymous method
But since expression tree is a strongly typed abstract syntax tree representing code structure, it is much easier to convert an expression tree to an anonymous method. What need to do is, traverse the tree, translate the algorithm into IL code.
The following base class encapsulates the basic logic of dynamically generating a method, which executes the translated code:
public abstract class SimpleExpressionTranslator<TDelegate, TResult> where TDelegate : class
{
protected Expression<TDelegate> _expression;
protected SimpleExpressionVisitor<TResult> _visitor;
protected SimpleExpressionTranslator(
Expression<TDelegate> expression,
Func<SimpleExpressionVisitor<TResult>> getVisitor)
{
this._expression = expression;
this._visitor = getVisitor();
}
public TDelegate GetExecutor()
{
DynamicMethod dynamicMethod = new DynamicMethod(
string.Empty,
this._expression.ReturnType,
this._expression.Parameters.Select(parameter => parameter.Type).ToArray(),
this.GetType().Module);
ILGenerator ilGenerator = dynamicMethod.GetILGenerator();
this.Emit(ilGenerator);
return dynamicMethod.CreateDelegate(typeof(TDelegate)) as TDelegate;
}
protected abstract void Emit(ILGenerator ilGenerator);
}
The following class implements translating expression tree to IL instructions:
public class ILTranslator<TDelegate> :
SimpleExpressionTranslator<TDelegate, Tuple<OpCode, double?>>
where TDelegate : class
{
public ILTranslator(Expression<TDelegate> expression)
: base(expression, () => new PostorderVisitor(expression))
{
}
protected override void Emit(ILGenerator ilGenerator)
{
IEnumerable<Tuple<OpCode, double?>> codes = this._visitor.VisitBody();
foreach (Tuple<OpCode, double?> code in codes)
{
if (code.Item2.HasValue)
{
if (code.Item1 == OpCodes.Ldarg_S) // ldarg.s (int)index
{
ilGenerator.Emit(code.Item1, (int)code.Item2.Value);
}
else // ldc.r8 (double)constant
{
ilGenerator.Emit(code.Item1, code.Item2.Value);
}
}
else // add, sub, mul, div
{
ilGenerator.Emit(code.Item1);
}
}
ilGenerator.Emit(OpCodes.Ret); // Returns the result.
}
}
The following code shows how to convert the expression tree (a, b, c, d, e) => a + b - c * d / 2 + e * 3 into a .NET method:
Expression<Func<double, double, double, double, double, double>> infixExpression =
(a, b, c, d, e) => a + b - c * d / 2 + e * 3;
ILTranslator<Func<double, double, double, double, double, double>> ilTranslator =
new ILTranslator<Func<double, double, double, double, double, double>>(
infixExpression);
Func<double, double, double, double, double, double> dotNetILMethod =
ilTranslator.GetExecutor();
double dotNetResult = dotNetILMethod(1, 2, 3, 4, 5);
Console.WriteLine(dotNetResult); // 12
This is very powerful. By traversing a abstract syntax tree, a .NET method is created at runtime.
Execute expression tree via built-in IL translator
Actually, .NET has already provided an API to traverse an expression tree, and convert it to an executable method:
Expression<Func<double, double, double, double, double, double>> infixExpression =
(a, b, c, d, e) => a + b - c * d / 2 + e * 3;
Func<double, double, double, double, double, double> method = infixExpression.Compile();
double result = method(1, 2, 3, 4, 5);
Console.WriteLine(result); // 12
Internally, Compile() does the same work with above ILTranslator.GetExecutor():
- Traverse the expression tree;
- Translate the algorithm into IL instructions;
- Generate a method which consists of those IL instructions.
Convert expression tree into other stuff
In this post, expression tree is translated to prefix style description string, and postfix style IL instructions. Later, the LINQ to SQL posts will revisit expression tree, and explain how expression tree is translated into T-SQL queries.