How LINQ to Objects statements work
This post goes into detail as to how LINQ statements work when querying a collection of objects.
This post assumes you understand how generics, delegates, implicitly typed variables, lambda expressions, object and collection initializers, extension methods and the yield statement work. I would also recommend reading my previous two posts.
We will start by writing some methods to filter a collection of data.
Assume we have an Employee class like so:
public class Employee {
    public int ID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Country { get; set; }
}
and a collection of employees like so:
var employees = new List<Employee> {
    new Employee { ID = 1, FirstName = "John", LastName = "Wright", Country = "USA" },
    new Employee { ID = 2, FirstName = "Jim", LastName = "Ashlock", Country = "UK" },
    new Employee { ID = 3, FirstName = "Jane", LastName = "Jackson", Country = "CHE" },
    new Employee { ID = 4, FirstName = "Jill", LastName = "Anderson", Country = "AUS" },
};
Filtering
We wish to find all employees that have an even ID. We could start off by writing a method that takes in a list of employees and returns a filtered list of employees with an even ID.
static List<Employee> GetEmployeesWithEvenID(List<Employee> employees) {
    var filteredEmployees = new List<Employee>();
    foreach (Employee emp in employees) {
        if (emp.ID % 2 == 0) {
            filteredEmployees.Add(emp);
        }
    }
    return filteredEmployees;
}
The method can be rewritten as an iterator that accepts and returns an IEnumerable<Employee> by using the yield return statement.
static IEnumerable<Employee> GetEmployeesWithEvenID(IEnumerable<Employee> employees) {
    foreach (Employee emp in employees) {
        if (emp.ID % 2 == 0) {
            yield return emp;
        }
    }
}
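One consequence of the yield return version is worth seeing in action: the body of an iterator method does not run when the method is called, only when the result is enumerated. The following is a minimal sketch of that behaviour. The DeferredDemo class and its Examined counter are my own additions for illustration and are not part of the app we are building; the Employee class is trimmed to the two properties the sketch needs.

```csharp
using System;
using System.Collections.Generic;

public class Employee {
    public int ID { get; set; }
    public string FirstName { get; set; }
}

public static class DeferredDemo {
    // Illustration only: counts how many employees the iterator has examined.
    public static int Examined = 0;

    public static IEnumerable<Employee> GetEmployeesWithEvenID(IEnumerable<Employee> employees) {
        foreach (Employee emp in employees) {
            Examined++;
            if (emp.ID % 2 == 0) {
                yield return emp;
            }
        }
    }

    public static void Main() {
        var employees = new List<Employee> {
            new Employee { ID = 1, FirstName = "John" },
            new Employee { ID = 2, FirstName = "Jim" }
        };

        // Calling the method only creates the iterator; nothing is examined yet.
        var filtered = GetEmployeesWithEvenID(employees);
        Console.WriteLine("Examined before enumerating: {0}", Examined); // 0

        // The loop body of the iterator runs as we pull items from it.
        foreach (Employee emp in filtered) {
            Console.WriteLine(emp.FirstName); // Jim
        }
        Console.WriteLine("Examined after enumerating: {0}", Examined); // 2
    }
}
```

The List<Employee> version, by contrast, does all of its work up front and hands back a fully populated list.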
We put these together in a console application.
using System;
using System.Collections.Generic;
//No System.Linq
public class Program
{
    [STAThread]
    static void Main(string[] args)
    {
        var employees = new List<Employee> {
            new Employee { ID = 1, FirstName = "John", LastName = "Wright", Country = "USA" },
            new Employee { ID = 2, FirstName = "Jim", LastName = "Ashlock", Country = "UK" },
            new Employee { ID = 3, FirstName = "Jane", LastName = "Jackson", Country = "CHE" },
            new Employee { ID = 4, FirstName = "Jill", LastName = "Anderson", Country = "AUS" },
        };
        var filteredEmployees = GetEmployeesWithEvenID(employees);
        foreach (Employee emp in filteredEmployees) {
            Console.WriteLine("ID {0} First_Name {1} Last_Name {2} Country {3}",
                emp.ID, emp.FirstName, emp.LastName, emp.Country);
        }
        Console.ReadLine();
    }
    static IEnumerable<Employee> GetEmployeesWithEvenID(IEnumerable<Employee> employees) {
        foreach (Employee emp in employees) {
            if (emp.ID % 2 == 0) {
                yield return emp;
            }
        }
    }
}
public class Employee {
    public int ID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Country { get; set; }
}
Output:
ID 2 First_Name Jim Last_Name Ashlock Country UK
ID 4 First_Name Jill Last_Name Anderson Country AUS
Our filtering method is too specific. Let us change it so that it can perform different kinds of filtering, and let’s give our method the name Where ;-)
We will add another parameter to our Where method. This additional parameter will be a delegate with the following declaration.
public delegate bool Filter(Employee emp);
The idea is that the delegate parameter in our Where method will point to a method that contains the filtering logic, so that Where itself no longer depends on any particular criterion. The method is shown below:
static IEnumerable<Employee> Where(IEnumerable<Employee> employees, Filter filter) {
    foreach (Employee emp in employees) {
        if (filter(emp)) {
            yield return emp;
        }
    }
}
public delegate bool Filter(Employee emp);
public class Program
{
    [STAThread]
    static void Main(string[] args)
    {
        var employees = new List<Employee> {
            new Employee { ID = 1, FirstName = "John", LastName = "Wright", Country = "USA" },
            new Employee { ID = 2, FirstName = "Jim", LastName = "Ashlock", Country = "UK" },
            new Employee { ID = 3, FirstName = "Jane", LastName = "Jackson", Country = "CHE" },
            new Employee { ID = 4, FirstName = "Jill", LastName = "Anderson", Country = "AUS" }
        };
        var filterDelegate = new Filter(EmployeeHasEvenId);
        var filteredEmployees = Where(employees, filterDelegate);
        foreach (Employee emp in filteredEmployees) {
            Console.WriteLine("ID {0} First_Name {1} Last_Name {2} Country {3}",
                emp.ID, emp.FirstName, emp.LastName, emp.Country);
        }
        Console.ReadLine();
    }
    static bool EmployeeHasEvenId(Employee emp) {
        return emp.ID % 2 == 0;
    }
    static IEnumerable<Employee> Where(IEnumerable<Employee> employees, Filter filter) {
        foreach (Employee emp in employees) {
            if (filter(emp)) {
                yield return emp;
            }
        }
    }
}
public class Employee {
    public int ID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Country { get; set; }
}
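Because the filtering logic now lives behind the Filter delegate, the same Where method can serve criteria it knows nothing about. Here is a small sketch of that idea. The FilterDemo class and the EmployeeIsInUK method are hypothetical names I made up for this illustration, and the Employee class is trimmed to two properties.

```csharp
using System;
using System.Collections.Generic;

public delegate bool Filter(Employee emp);

public class Employee {
    public int ID { get; set; }
    public string Country { get; set; }
}

public static class FilterDemo {
    public static IEnumerable<Employee> Where(IEnumerable<Employee> employees, Filter filter) {
        foreach (Employee emp in employees) {
            if (filter(emp)) {
                yield return emp;
            }
        }
    }

    public static bool EmployeeHasEvenId(Employee emp) {
        return emp.ID % 2 == 0;
    }

    // Hypothetical second criterion, added only for this example.
    public static bool EmployeeIsInUK(Employee emp) {
        return emp.Country == "UK";
    }

    public static void Main() {
        var employees = new List<Employee> {
            new Employee { ID = 1, Country = "USA" },
            new Employee { ID = 2, Country = "UK" },
            new Employee { ID = 3, Country = "CHE" },
            new Employee { ID = 4, Country = "AUS" }
        };

        // Explicit delegate creation, as in the listing above.
        foreach (Employee emp in Where(employees, new Filter(EmployeeIsInUK))) {
            Console.WriteLine("UK: {0}", emp.ID); // UK: 2
        }

        // A method group converts to the delegate type implicitly.
        foreach (Employee emp in Where(employees, EmployeeHasEvenId)) {
            Console.WriteLine("Even: {0}", emp.ID); // Even: 2, then Even: 4
        }
    }
}
```

Where did not change between the two loops; only the method behind the delegate did.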
Let’s use a lambda expression to inline the contents of the EmployeeHasEvenId method at the call site, so that the separate method is no longer needed. The next code snippet shows this change (see the call to Where in Main). For brevity, the Employee class declaration has been skipped.
public delegate bool Filter(Employee emp);
public class Program
{
    [STAThread]
    static void Main(string[] args)
    {
        var employees = new List<Employee> {
            new Employee { ID = 1, FirstName = "John", LastName = "Wright", Country = "USA" },
            new Employee { ID = 2, FirstName = "Jim", LastName = "Ashlock", Country = "UK" },
            new Employee { ID = 3, FirstName = "Jane", LastName = "Jackson", Country = "CHE" },
            new Employee { ID = 4, FirstName = "Jill", LastName = "Anderson", Country = "AUS" }
        };
        // The lambda is converted to a Filter delegate; EmployeeHasEvenId is gone.
        var filteredEmployees = Where(employees, emp => emp.ID % 2 == 0);
        foreach (Employee emp in filteredEmployees) {
            Console.WriteLine("ID {0} First_Name {1} Last_Name {2} Country {3}",
                emp.ID, emp.FirstName, emp.LastName, emp.Country);
        }
        Console.ReadLine();
    }
    static IEnumerable<Employee> Where(IEnumerable<Employee> employees, Filter filter) {
        foreach (Employee emp in employees) {
            if (filter(emp)) {
                yield return emp;
            }
        }
    }
}
The output displays the same two employees.
Our Where method is too restricted since it works with a collection of Employees only. Let’s change it so that it works with any IEnumerable<T>. In addition, you may recall from my previous post that .NET 3.5 comes with many predefined delegates, including:
public delegate TResult Func<T, TResult>(T arg);
We will get rid of our Filter delegate and use the one above instead, in the form Func<T, bool>. We apply these two changes to our code.
public class Program
{
    [STAThread]
    static void Main(string[] args)
    {
        var employees = new List<Employee> {
            new Employee { ID = 1, FirstName = "John", LastName = "Wright", Country = "USA" },
            new Employee { ID = 2, FirstName = "Jim", LastName = "Ashlock", Country = "UK" },
            new Employee { ID = 3, FirstName = "Jane", LastName = "Jackson", Country = "CHE" },
            new Employee { ID = 4, FirstName = "Jill", LastName = "Anderson", Country = "AUS" }
        };
        var filteredEmployees = Where(employees, emp => emp.ID % 2 == 0);
        foreach (Employee emp in filteredEmployees) {
            Console.WriteLine("ID {0} First_Name {1} Last_Name {2} Country {3}",
                emp.ID, emp.FirstName, emp.LastName, emp.Country);
        }
        Console.ReadLine();
    }
    static IEnumerable<T> Where<T>(IEnumerable<T> source, Func<T, bool> filter) {
        foreach (var x in source) {
            if (filter(x)) {
                yield return x;
            }
        }
    }
}
We have successfully implemented a way to filter any IEnumerable<T> based on a filter criterion.
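To convince ourselves that the generic Where really is independent of Employee, here is a short sketch that applies it to a list of integers and an array of strings. The GenericWhereDemo class and the sample data are assumptions of mine for the purpose of illustration.

```csharp
using System;
using System.Collections.Generic;

public static class GenericWhereDemo {
    public static IEnumerable<T> Where<T>(IEnumerable<T> source, Func<T, bool> filter) {
        foreach (var x in source) {
            if (filter(x)) {
                yield return x;
            }
        }
    }

    public static void Main() {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        // T is inferred as int.
        foreach (int number in Where(numbers, x => x > 3)) {
            Console.WriteLine(number); // 4, 5, 6
        }

        string[] words = { "from", "where", "select" };
        // T is inferred as string; arrays implement IEnumerable<T> too.
        foreach (string word in Where(words, w => w.Length > 4)) {
            Console.WriteLine(word); // where, select
        }
    }
}
```

In both calls the compiler infers T from the first argument, so we never have to spell out Where<int> or Where<string>.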
Projection
Now let’s enumerate over the items in the IEnumerable<Employee> we got from the Where method and copy them into a new IEnumerable<EmployeeFormatted>. The EmployeeFormatted class will have only an ID and a FullName property.
public class EmployeeFormatted {
    public int ID { get; set; }
    public string FullName { get; set; }
}
We could “project” our existing IEnumerable<Employee> into a new IEnumerable<EmployeeFormatted> with the help of a new method. We will call this method Select ;-)
static IEnumerable<EmployeeFormatted> Select(IEnumerable<Employee> employees) {
    foreach (var emp in employees) {
        yield return new EmployeeFormatted {
            ID = emp.ID,
            FullName = emp.LastName + ", " + emp.FirstName
        };
    }
}
The changes are applied to our app.
public class Program
{
    [STAThread]
    static void Main(string[] args)
    {
        var employees = new List<Employee> {
            new Employee { ID = 1, FirstName = "John", LastName = "Wright", Country = "USA" },
            new Employee { ID = 2, FirstName = "Jim", LastName = "Ashlock", Country = "UK" },
            new Employee { ID = 3, FirstName = "Jane", LastName = "Jackson", Country = "CHE" },
            new Employee { ID = 4, FirstName = "Jill", LastName = "Anderson", Country = "AUS" }
        };
        var filteredEmployees = Where(employees, emp => emp.ID % 2 == 0);
        var formattedEmployees = Select(filteredEmployees);
        foreach (EmployeeFormatted emp in formattedEmployees) {
            Console.WriteLine("ID {0} Full_Name {1}",
                emp.ID, emp.FullName);
        }
        Console.ReadLine();
    }
    static IEnumerable<T> Where<T>(IEnumerable<T> source, Func<T, bool> filter) {
        foreach (var x in source) {
            if (filter(x)) {
                yield return x;
            }
        }
    }
    static IEnumerable<EmployeeFormatted> Select(IEnumerable<Employee> employees) {
        foreach (var emp in employees) {
            yield return new EmployeeFormatted {
                ID = emp.ID,
                FullName = emp.LastName + ", " + emp.FirstName
            };
        }
    }
}
public class Employee {
    public int ID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Country { get; set; }
}
public class EmployeeFormatted {
    public int ID { get; set; }
    public string FullName { get; set; }
}
Output:
ID 2 Full_Name Ashlock, Jim
ID 4 Full_Name Anderson, Jill
We have successfully selected employees who have an even ID and then shaped our data with the help of the Select method so that the final result is an IEnumerable<EmployeeFormatted>.
Let’s make our Select method more generic so that the caller is free to decide what the output looks like. We can do this, like before, with lambda expressions. Our Select method is changed to accept a delegate as shown below. TSource is the type of data that comes in and TResult is the type the caller chooses (the shape of the data), as returned from the selector delegate.
static IEnumerable<TResult> Select<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, TResult> selector) {
    foreach (var x in source) {
        yield return selector(x);
    }
}
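Before wiring this into our app, a quick sketch may help show how TSource and TResult vary independently. Here the same Select turns strings into integers in one call and into other strings in the next. The SelectDemo class and the sample names are my own, chosen just for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SelectDemo {
    public static IEnumerable<TResult> Select<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, TResult> selector) {
        foreach (var x in source) {
            yield return selector(x);
        }
    }

    public static void Main() {
        var names = new List<string> { "Jim", "Jill" };

        // TSource = string, TResult = int (inferred from the lambda's return type).
        foreach (int length in Select(names, s => s.Length)) {
            Console.WriteLine(length); // 3, 4
        }

        // TSource = string, TResult = string.
        foreach (string upper in Select(names, s => s.ToUpperInvariant())) {
            Console.WriteLine(upper); // JIM, JILL
        }
    }
}
```

The element type of the result is whatever the selector returns, which is what will later let us project into EmployeeFormatted or even an anonymous type.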
Here is the generic Select applied to our app. In the call to Select in Main, we use a lambda expression to specify the shape of the data. In this case the shape will be of type EmployeeFormatted.
public class Program
{
    [STAThread]
    static void Main(string[] args)
    {
        var employees = new List<Employee> {
            new Employee { ID = 1, FirstName = "John", LastName = "Wright", Country = "USA" },
            new Employee { ID = 2, FirstName = "Jim", LastName = "Ashlock", Country = "UK" },
            new Employee { ID = 3, FirstName = "Jane", LastName = "Jackson", Country = "CHE" },
            new Employee { ID = 4, FirstName = "Jill", LastName = "Anderson", Country = "AUS" }
        };
        var filteredEmployees = Where(employees, emp => emp.ID % 2 == 0);
        var formattedEmployees = Select(filteredEmployees, (emp) =>
            new EmployeeFormatted {
                ID = emp.ID,
                FullName = emp.LastName + ", " + emp.FirstName
            });
        foreach (EmployeeFormatted emp in formattedEmployees) {
            Console.WriteLine("ID {0} Full_Name {1}",
                emp.ID, emp.FullName);
        }
        Console.ReadLine();
    }
    static IEnumerable<T> Where<T>(IEnumerable<T> source, Func<T, bool> filter) {
        foreach (var x in source) {
            if (filter(x)) {
                yield return x;
            }
        }
    }
    static IEnumerable<TResult> Select<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, TResult> selector) {
        foreach (var x in source) {
            yield return selector(x);
        }
    }
}
The code outputs the same result as before. The Where call in Main filters our data and the Select call projects it.
What if we wanted to be more expressive and concise? We could combine the Where and Select calls into a single statement as shown below. If you had to perform several operations like this on a collection, you would end up with some very unreadable code!
var formattedEmployees = Select(Where(employees, emp => emp.ID % 2 == 0), (emp) =>
    new EmployeeFormatted {
        ID = emp.ID,
        FullName = emp.LastName + ", " + emp.FirstName
    });
A cleaner way to write this would be to give the appearance that the Select and Where methods were part of IEnumerable<T>. This is exactly what extension methods give us. Extension methods have to be defined in a non-generic static class. Let us make Select and Where extension methods on IEnumerable<T>.
public static class MyExtensionMethods {
    // The methods must be public (or internal) so that Program can call them.
    public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> filter) {
        foreach (var x in source) {
            if (filter(x)) {
                yield return x;
            }
        }
    }
    public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector) {
        foreach (var x in source) {
            yield return selector(x);
        }
    }
}
Turning these into extension methods makes the syntax much cleaner, as shown below. We can write as many extension methods as we want and keep on chaining them using this technique.
var formattedEmployees = employees
    .Where(emp => emp.ID % 2 == 0)
    .Select(emp => new EmployeeFormatted { ID = emp.ID, FullName = emp.LastName + ", " + emp.FirstName });
Making these changes and running our code produces the same result.
using System;
using System.Collections.Generic;
public class Program
{
    [STAThread]
    static void Main(string[] args)
    {
        var employees = new List<Employee> {
            new Employee { ID = 1, FirstName = "John", LastName = "Wright", Country = "USA" },
            new Employee { ID = 2, FirstName = "Jim", LastName = "Ashlock", Country = "UK" },
            new Employee { ID = 3, FirstName = "Jane", LastName = "Jackson", Country = "CHE" },
            new Employee { ID = 4, FirstName = "Jill", LastName = "Anderson", Country = "AUS" }
        };
        var formattedEmployees = employees
            .Where(emp => emp.ID % 2 == 0)
            .Select(emp =>
                new EmployeeFormatted {
                    ID = emp.ID,
                    FullName = emp.LastName + ", " + emp.FirstName
                }
            );
        foreach (EmployeeFormatted emp in formattedEmployees) {
            Console.WriteLine("ID {0} Full_Name {1}",
                emp.ID, emp.FullName);
        }
        Console.ReadLine();
    }
}
public static class MyExtensionMethods {
    // public, so that the calls in Program.Main can reach these methods
    public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> filter) {
        foreach (var x in source) {
            if (filter(x)) {
                yield return x;
            }
        }
    }
    public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector) {
        foreach (var x in source) {
            yield return selector(x);
        }
    }
}
public class Employee {
    public int ID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Country { get; set; }
}
public class EmployeeFormatted {
    public int ID { get; set; }
    public string FullName { get; set; }
}
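The chaining is not limited to Where and Select. As a rough sketch of adding one more link to the chain, the code below introduces a third extension method, TakeFirst, which stops after a given number of items. TakeFirst is a hypothetical name of my own (it is not something defined earlier in this post), and the ChainingDemo class and integer data are only there to keep the example short.

```csharp
using System;
using System.Collections.Generic;

public static class MyExtensionMethods {
    public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> filter) {
        foreach (var x in source) {
            if (filter(x)) {
                yield return x;
            }
        }
    }

    public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector) {
        foreach (var x in source) {
            yield return selector(x);
        }
    }

    // Hypothetical extra operator: yields at most 'count' items, then stops.
    public static IEnumerable<T> TakeFirst<T>(this IEnumerable<T> source, int count) {
        if (count <= 0) {
            yield break;
        }
        int taken = 0;
        foreach (var x in source) {
            yield return x;
            taken++;
            if (taken >= count) {
                yield break;
            }
        }
    }
}

public class ChainingDemo {
    public static void Main() {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8 };
        var result = numbers
            .Where(n => n % 2 == 0)  // 2, 4, 6, 8
            .Select(n => n * 10)     // 20, 40, 60, 80
            .TakeFirst(2);           // 20, 40
        foreach (int value in result) {
            Console.WriteLine(value);
        }
    }
}
```

Each method takes an IEnumerable<T> and returns one, which is what allows the calls to be strung together in any order that type-checks.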
Let’s change our code to return a collection of anonymous types and get rid of the EmployeeFormatted type. We see that the code produces the same output.
using System;
using System.Collections.Generic;
public class Program
{
    [STAThread]
    static void Main(string[] args)
    {
        var employees = new List<Employee> {
            new Employee { ID = 1, FirstName = "John", LastName = "Wright", Country = "USA" },
            new Employee { ID = 2, FirstName = "Jim", LastName = "Ashlock", Country = "UK" },
            new Employee { ID = 3, FirstName = "Jane", LastName = "Jackson", Country = "CHE" },
            new Employee { ID = 4, FirstName = "Jill", LastName = "Anderson", Country = "AUS" }
        };
        var formattedEmployees = employees
            .Where(emp => emp.ID % 2 == 0)
            .Select(emp =>
                new {
                    ID = emp.ID,
                    FullName = emp.LastName + ", " + emp.FirstName
                }
            );
        foreach (var emp in formattedEmployees) {
            Console.WriteLine("ID {0} Full_Name {1}",
                emp.ID, emp.FullName);
        }
        Console.ReadLine();
    }
}
public static class MyExtensionMethods {
    public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> filter) {
        foreach (var x in source) {
            if (filter(x)) {
                yield return x;
            }
        }
    }
    public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector) {
        foreach (var x in source) {
            yield return selector(x);
        }
    }
}
public class Employee {
    public int ID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Country { get; set; }
}
To be more expressive, C# allows us to write our extension method calls as a query expression. The chained Where and Select calls in Main can be rewritten as a query expression like so:
var formattedEmployees = from emp in employees
                         where emp.ID % 2 == 0
                         select new {
                             ID = emp.ID,
                             FullName = emp.LastName + ", " + emp.FirstName
                         };
When the compiler encounters a query expression like the one above, it simply rewrites it as calls to Where and Select, which in our program resolve to our own extension methods.
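One way to check this for ourselves is to make our extension methods count how often they are called and then run a query expression without importing System.Linq. The sketch below does that. The TracingExtensions class, its counters and the private iterator helpers are assumptions I introduced for the experiment; the wrappers are split from the iterators so that the counters tick when the method is called rather than when the result is enumerated.

```csharp
using System;
using System.Collections.Generic;
// Deliberately no System.Linq: the query can only bind to the methods below.

public static class TracingExtensions {
    // Illustration only: how many times each operator was invoked.
    public static int WhereCalls = 0;
    public static int SelectCalls = 0;

    public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> filter) {
        WhereCalls++;
        return WhereIterator(source, filter);
    }

    public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector) {
        SelectCalls++;
        return SelectIterator(source, selector);
    }

    private static IEnumerable<T> WhereIterator<T>(IEnumerable<T> source, Func<T, bool> filter) {
        foreach (var x in source) {
            if (filter(x)) {
                yield return x;
            }
        }
    }

    private static IEnumerable<TResult> SelectIterator<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, TResult> selector) {
        foreach (var x in source) {
            yield return selector(x);
        }
    }
}

public class QueryTranslationDemo {
    public static void Main() {
        var numbers = new List<int> { 1, 2, 3, 4 };

        // Query syntax...
        var query = from n in numbers
                    where n % 2 == 0
                    select n * 10;
        Console.WriteLine("Where: {0}, Select: {1}",
            TracingExtensions.WhereCalls, TracingExtensions.SelectCalls); // Where: 1, Select: 1

        // ...and the method calls it is translated into.
        var chained = numbers.Where(n => n % 2 == 0).Select(n => n * 10);

        foreach (int value in query) {
            Console.WriteLine(value); // 20, 40
        }
        foreach (int value in chained) {
            Console.WriteLine(value); // 20, 40
        }
    }
}
```

The query compiles even though nothing in the program mentions LINQ, because the translation is purely syntactic: the compiler only needs suitable Where and Select methods to be in scope.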
So far we have been using our own extension methods. The System.Linq namespace contains many extension methods for types that implement IEnumerable<T>. You can see a listing of these methods in the Enumerable class in the System.Linq namespace.
Let’s get rid of our extension methods (which I purposefully wrote to have the same signatures as the ones in the Enumerable class) and use the ones provided in the Enumerable class. Our final code is shown below:
using System;
using System.Collections.Generic;
using System.Linq; //Added
public class Program
{
    [STAThread]
    static void Main(string[] args)
    {
        var employees = new List<Employee> {
            new Employee { ID = 1, FirstName = "John", LastName = "Wright", Country = "USA" },
            new Employee { ID = 2, FirstName = "Jim", LastName = "Ashlock", Country = "UK" },
            new Employee { ID = 3, FirstName = "Jane", LastName = "Jackson", Country = "CHE" },
            new Employee { ID = 4, FirstName = "Jill", LastName = "Anderson", Country = "AUS" }
        };
        var formattedEmployees = from emp in employees
                                 where emp.ID % 2 == 0
                                 select new {
                                     ID = emp.ID,
                                     FullName = emp.LastName + ", " + emp.FirstName
                                 };
        foreach (var emp in formattedEmployees) {
            Console.WriteLine("ID {0} Full_Name {1}",
                emp.ID, emp.FullName);
        }
        Console.ReadLine();
    }
}
// EmployeeFormatted is no longer needed now that we project into an anonymous type.
public class Employee {
    public int ID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Country { get; set; }
}
This post has given you a basic overview of how LINQ to Objects works by showing you how a query expression is converted to a sequence of calls to extension methods when working directly with objects. It gets more interesting when working with LINQ to SQL, where an expression tree (an in-memory data representation of the expression) is constructed. The C# compiler compiles these expressions into code that builds an expression tree at runtime. The provider can then traverse the expression tree and generate the appropriate SQL query. You can read more about expression trees in this MSDN article.
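To give a small taste of the difference, here is a sketch that assigns the same lambda once to a Func<int, bool> and once to an Expression<Func<int, bool>> from System.Linq.Expressions. The first is compiled code we can only call; the second is data we can inspect and, if we want, compile later. The ExpressionTreeDemo class and its Describe helper are names I made up for this example, and this is only meant to hint at what a provider does, not to reproduce how LINQ to SQL is implemented.

```csharp
using System;
using System.Linq.Expressions;

public class ExpressionTreeDemo {
    // Illustrative helper: summarises the top of the tree.
    // Assumes the lambda body is a binary operation; other bodies would fail the cast.
    public static string Describe(Expression<Func<int, bool>> tree) {
        var body = (BinaryExpression)tree.Body;
        return body.NodeType + "(" + body.Left.NodeType + ", " + body.Right.NodeType + ")";
    }

    public static void Main() {
        // As a delegate: executable code.
        Func<int, bool> asDelegate = id => id % 2 == 0;
        Console.WriteLine(asDelegate(4)); // True

        // As an expression tree: a data structure describing the same lambda.
        Expression<Func<int, bool>> asTree = id => id % 2 == 0;
        Console.WriteLine(Describe(asTree)); // Equal(Modulo, Constant)

        // The tree can still be turned into a delegate at runtime.
        Func<int, bool> compiled = asTree.Compile();
        Console.WriteLine(compiled(3)); // False
    }
}
```

A provider such as LINQ to SQL walks a tree like this node by node instead of executing it, which is how a C# predicate can end up as part of a SQL WHERE clause.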