[LINQ via C# series]
One post at the beginning of this series mentioned that deferred execution is a important advancement of LINQ. The following code show how is the execution deferred:
IEnumerable<int> source = Enumerable.Range(-2, 5); // -2, -1, 0, 1, 2
// Invokes query methods.
IEnumerable<int> integers = source.Where(item =>
{
Console.WriteLine("Filtering {0}.", item);
return item > 0;
})
.OrderByDescending(item =>
{
Console.WriteLine("Ordering {0}.", item);
return item;
});
// Continues invoking query method.
IEnumerable<string> strings = integers.Select(item =>
{
Console.WriteLine("Projecting {0}.", item);
return item.ToString(CultureInfo.InvariantCulture);
});
Console.WriteLine("Starting to iterate the results.");
foreach (string item in strings)
{
Console.WriteLine("Printing {0}.", item);
}
The console displays:
Starting to iterate the results.
Filtering -2.
Filtering -1.
Filtering 0.
Filtering 1.
Filtering 2.
Ordering 1.
Ordering 2.
Projecting 2.
Printing 2.
Projecting 1.
Printing 1.
The deferred execution is very clear:
- After the Where(), OrderByDescding(), Select() methods were invoked, the expected filter, ordering, projection did not behave.
- Later when iteration starts, because the query results are desired, the expected filter, ordering, projection executed, and generated the result.
This post explains how could deferred execution happen on a IEnumerable<T> collection.
Hot IEnumerable<T> vs. cold IEnumerable<T>
The previous post introduces 2 ways of implementing GetMessages(), which return a string collection {“1”, “2”, “3”}. They can be changed a little bit.
This first implementation is to directly return a array collection:
public static IEnumerable<string> GetMessages()
{
return new string[] // string[] collection is IEnumerable<string>.
{
1.ToString(CultureInfo.InvariantCulture), // ToString() executes immediately.
2.ToString(CultureInfo.InvariantCulture), // ToString() executes immediately.
3.ToString(CultureInfo.InvariantCulture) // ToString() executes immediately.
};
}
All the 3 ToString() methods have executed when GetMessage returns:
IEnumerable<string> collection = GetMessages();
// Now 3 ToString() methods have executed and returned.
// collection refers to an array containing 3 strings.
This returned collection is called a hot IEnumerable<T>. because the items {“1”, “2”, “3”} are already there before iterating the collection.
The second implementation is to yield return the items:
// Returns a collection which can be iterated.
public static IEnumerable<string> GetMessages()
{
// The multiple yield returns execute when iterated in a foreach:
// MoveNext() is invoked and returns true.
yield return 1.ToString(CultureInfo.InvariantCulture); // ToString() is deferred.
// Iteration 1 gets item "1" from collection by calling Current.
// MoveNext() is invoked and returns true.
yield return 2.ToString(CultureInfo.InvariantCulture); // ToString() is deferred.
// Iteration 2 gets item "2" from collection by calling Current.
// MoveNext() is invoked and returns true.
yield return 3.ToString(CultureInfo.InvariantCulture); // ToString() is deferred.
// Iteration 3 gets item "3" from collection by calling Current.
// MoveNext() is invoked and returns false, becuase there is no more yield return to reach.
}
All the 3 ToString() methods do not execute when GetMessage returns:
IEnumerable<string> collection = GetMessages();
// Now 3 ToString() methods have not executed yet.
// They will execute one by one only when collection is iterated in foreach.
This returned collection is called a cold IEnumerable<T>. because the items are not generated before iterating the collection. If collection is never iterated, the ToString() methods will never execute, then the string items {“1”, “2”, “3”} will never exist.
Again, a method returns a cold IEnumerable<T> as long as it contains a number of yield return statements, no mater how do the yield return statements look like. The statements can be plainly sequential:
public static IEnumerable<string> GetMessages()
{
// Contains a number of yield return statements.
yield return 1.ToString(CultureInfo.InvariantCulture);
yield return 2.ToString(CultureInfo.InvariantCulture);
yield return 3.ToString(CultureInfo.InvariantCulture);
}
If the statements are in a loop, they are still a sequence of yield return, which works the same:
public static IEnumerable<string> GetMessages()
{
// Contains a number of yield return statements.
for (int i = 0; i <= 3; i++)
{
yield return i.ToString(CultureInfo.InvariantCulture);
}
}
and this also leads to the same results:
public static IEnumerable<string> GetMessages()
{
// Contains a number of yield return statements.
int[] values = new int[] { 1, 2, 3 };
foreach (int value in values)
{
yield return value.ToString(CultureInfo.InvariantCulture);
}
}
Eager execution vs. deferred execution
Consider the code sample at the beginning of the post. The printed result already shows the projection is a kind of deferred execution. Now try a Select() implementation returning a array. and a Select() with multiple yield return statements:
// Eager execution, returning a hot IEnumerable<T>.
public static IEnumerable<TResult> EagerSelect<TSource, TResult>(
this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
TResult[] results = new TResult[0];
foreach (TSource item in source)
{
// Projects the item to result.
TResult result = selector(item);
// Increases the array size. and insert the result.
int index = results.Length;
Array.Resize(ref results, index + 1);
results[index] = result;
}
return results;
}
// Deferred execution, returning a cold IEnumerable<T>.
public static IEnumerable<TResult> DeferredSelect<TSource, TResult>(
this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
// The multiple yield returns execute when iterated:
foreach (TSource item in source)
{
// Projects the item to result.
yield return selector(item);
}
}
Now the test code:
IEnumerable<int> source = Enumerable.Range(-2, 5); // -2, -1, 0, 1, 2
IEnumerable<string> results = source.EagerSelect(item =>
{
Console.WriteLine("Projecting {0}.", item);
return item.ToString(CultureInfo.InvariantCulture);
});
Console.WriteLine("Starting to iterate the results.");
foreach (string item in results)
{
Console.WriteLine("Printing {0}.", item);
}
results:
Projecting -2.
Projecting -1.
Projecting 0.
Projecting 1.
Projecting 2.
Starting to iterate the results.
Printing -2.
Printing -1.
Printing 0.
Printing 1.
Printing 2.
and the deferred version Select():
IEnumerable<string> results = source.DeferredSelect(item =>
{
Console.WriteLine("Projecting {0}.", item);
return item.ToString(CultureInfo.InvariantCulture);
});
results:
Starting to iterate the results.
Projecting -2.
Printing -2.
Projecting -1.
Printing -1.
Projecting 0.
Printing 0.
Projecting 1.
Printing 1.
Projecting 2.
Printing 2.
Bingle! Finally, the truth of LINQ to Objects is almost reached.