Understanding LINQ to Objects (6) Deferred Execution

[LINQ via C# series]

One post at the beginning of this series mentioned that deferred execution is a important advancement of LINQ. The following code show how is the execution deferred:

IEnumerable<int> source = Enumerable.Range(-2, 5); // -2, -1, 0, 1, 2
// Invokes query methods.
IEnumerable<int> integers = source.Where(item =>
                                        {
                                            Console.WriteLine("Filtering {0}.", item);
                                            return item > 0;
                                        })
                                  .OrderByDescending(item =>
                                        {
                                            Console.WriteLine("Ordering {0}.", item);
                                            return item;
                                        });
// Continues invoking query method.
IEnumerable<string> strings = integers.Select(item =>
                                        {
                                            Console.WriteLine("Projecting {0}.", item);
                                            return item.ToString(CultureInfo.InvariantCulture);
                                        });
Console.WriteLine("Starting to iterate the results.");
foreach (string item in strings)
{
    Console.WriteLine("Printing {0}.", item);
}

The console displays:

Starting to iterate the results.
Filtering -2.
Filtering -1.
Filtering 0.
Filtering 1.
Filtering 2.
Ordering 1.
Ordering 2.
Projecting 2.
Printing 2.
Projecting 1.
Printing 1.

The deferred execution is very clear:

  • After the Where(), OrderByDescding(), Select() methods were invoked, the expected filter, ordering, projection did not behave.
  • Later when iteration starts, because the query results are desired, the expected filter, ordering, projection executed, and generated the result.

This post explains how could deferred execution happen on a IEnumerable<T> collection.

Hot IEnumerable<T> vs. cold IEnumerable<T>

The previous post introduces 2 ways of implementing GetMessages(), which return a string collection {“1”, “2”, “3”}. They can be changed a little bit.

This first implementation is to directly return a array collection:

public static IEnumerable<string> GetMessages()
{
    return new string[] // string[] collection is IEnumerable<string>.
        { 
            1.ToString(CultureInfo.InvariantCulture), // ToString() executes immediately.
            2.ToString(CultureInfo.InvariantCulture), // ToString() executes immediately.
            3.ToString(CultureInfo.InvariantCulture)  // ToString() executes immediately.
        };
}

All the 3 ToString() methods have executed when GetMessage returns:

IEnumerable<string> collection = GetMessages();
// Now 3 ToString() methods have executed and returned.
// collection refers to an array containing 3 strings.

This returned collection is called a hot IEnumerable<T>. because the items {“1”, “2”, “3”} are already there before iterating the collection.

The second implementation is to yield return the items:

// Returns a collection which can be iterated.
public static IEnumerable<string> GetMessages()
{
    // The multiple yield returns execute when iterated in a foreach:

    // MoveNext() is invoked and returns true.
    yield return 1.ToString(CultureInfo.InvariantCulture); // ToString() is deferred.
    // Iteration 1 gets item "1" from collection by calling Current.

    // MoveNext() is invoked and returns true.
    yield return 2.ToString(CultureInfo.InvariantCulture); // ToString() is deferred.
    // Iteration 2 gets item "2" from collection by calling Current.

    // MoveNext() is invoked and returns true.
    yield return 3.ToString(CultureInfo.InvariantCulture); // ToString() is deferred.
    // Iteration 3 gets item "3" from collection by calling Current.

    // MoveNext() is invoked and returns false, becuase there is no more yield return to reach.
}

All the 3 ToString() methods do not execute when GetMessage returns:

IEnumerable<string> collection = GetMessages();
// Now 3 ToString() methods have not executed yet.
// They will execute one by one only when collection is iterated in foreach.

This returned collection is called a cold IEnumerable<T>. because the items are not generated before iterating the collection. If collection is never iterated, the ToString() methods will never execute, then the string items {“1”, “2”, “3”} will never exist.

Again, a method returns a cold IEnumerable<T> as long as it contains a number of yield return statements, no mater how do the yield return statements look like. The statements can be plainly sequential:

public static IEnumerable<string> GetMessages()
{
    // Contains a number of yield return statements.
    yield return 1.ToString(CultureInfo.InvariantCulture);
    yield return 2.ToString(CultureInfo.InvariantCulture);
    yield return 3.ToString(CultureInfo.InvariantCulture);
}

If the statements are in a loop, they are still a sequence of yield return, which works the same:

public static IEnumerable<string> GetMessages()
{
    // Contains a number of yield return statements.
    for (int i = 0; i <= 3; i++)
    {
        yield return i.ToString(CultureInfo.InvariantCulture);
    }
}

and this also leads to the same results:

public static IEnumerable<string> GetMessages()
{
    // Contains a number of yield return statements.
    int[] values = new int[] { 1, 2, 3 };
    foreach (int value  in values)
    {
        yield return value.ToString(CultureInfo.InvariantCulture);
    }
}

Eager execution vs. deferred execution

Consider the code sample at the beginning of the post. The printed result already shows the projection is a kind of deferred execution. Now try a Select() implementation returning a array. and a Select() with multiple yield return statements:

// Eager execution, returning a hot IEnumerable<T>.
public static IEnumerable<TResult> EagerSelect<TSource, TResult>(
    this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
    TResult[] results = new TResult[0];

    foreach (TSource item in source)
    {
        // Projects the item to result.
        TResult result = selector(item);

        // Increases the array size. and insert the result.
        int index = results.Length;
        Array.Resize(ref results, index + 1);
        results[index] = result;
    }

    return results;
}

// Deferred execution, returning a cold IEnumerable<T>.
public static IEnumerable<TResult> DeferredSelect<TSource, TResult>(
    this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
    // The multiple yield returns execute when iterated:

    foreach (TSource item in source)
    {
        // Projects the item to result.
        yield return selector(item);
    }
}

Now the test code:

IEnumerable<int> source = Enumerable.Range(-2, 5); // -2, -1, 0, 1, 2
IEnumerable<string> results = source.EagerSelect(item =>
                                        {
                                            Console.WriteLine("Projecting {0}.", item);
                                            return item.ToString(CultureInfo.InvariantCulture);
                                        });
Console.WriteLine("Starting to iterate the results.");
foreach (string item in results)
{
    Console.WriteLine("Printing {0}.", item);
}

results:

Projecting -2.
Projecting -1.
Projecting 0.
Projecting 1.
Projecting 2.
Starting to iterate the results.
Printing -2.
Printing -1.
Printing 0.
Printing 1.
Printing 2.

and the deferred version Select():

IEnumerable<string> results = source.DeferredSelect(item =>
                                        {
                                            Console.WriteLine("Projecting {0}.", item);
                                            return item.ToString(CultureInfo.InvariantCulture);
                                        });

results:

Starting to iterate the results.
Projecting -2.
Printing -2.
Projecting -1.
Printing -1.
Projecting 0.
Printing 0.
Projecting 1.
Printing 1.
Projecting 2.
Printing 2.

Bingle! Finally, the truth of LINQ to Objects is almost reached.

Published Tuesday, March 16, 2010 12:06 AM by Dixin

Comments

No Comments

Leave a Comment

(required) 
(required) 
(optional)
(required)