[LINQ via C# series]
The iterator pattern is the core pattern of the LINQ to Objects implementation. To filter, order, or project the items of a data collection, the code needs to go through the collection and work out the results. The previous post explained that:
- a collection should be represented by an IEnumerable<T>;
- the collection’s iterator should be represented by an IEnumerator<T>;
- the iterator should be returned by invoking the collection’s GetEnumerator() method;
- the items should be traversed by repeatedly invoking the iterator’s MoveNext() method and reading its Current property.
Consider a common scenario: the program needs to retrieve a collection of several string items, then iterate the collection and print out each item:
// Gets a string collection {"1", "2", "3"}.
IEnumerable<string> collection = GetMessages();
// Iterates the collection the way a foreach statement does, and prints each item.
using (IEnumerator<string> iterator = collection.GetEnumerator())
{
while (iterator.MoveNext())
{
// By traversing each item, filter / ordering / projection / ... can be done.
string item = iterator.Current;
Console.WriteLine(item);
}
}
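For comparison, here is a minimal sketch of the same traversal written once with the explicit iterator and once with a foreach statement, which the C# compiler expands into roughly the using / while form above. The ForeachDemo class and its two Collect helpers are hypothetical names used only for this illustration, and a string array stands in for GetMessages().

```csharp
using System;
using System.Collections.Generic;

public static class ForeachDemo
{
    // Traverses the sequence with an explicit iterator, as in the code above.
    public static List<string> CollectWithIterator(IEnumerable<string> collection)
    {
        List<string> items = new List<string>();
        using (IEnumerator<string> iterator = collection.GetEnumerator())
        {
            while (iterator.MoveNext())
            {
                items.Add(iterator.Current);
            }
        }
        return items;
    }

    // Traverses the same sequence with a foreach statement.
    public static List<string> CollectWithForeach(IEnumerable<string> collection)
    {
        List<string> items = new List<string>();
        foreach (string item in collection)
        {
            items.Add(item);
        }
        return items;
    }

    public static void Main()
    {
        // Assumption: an array stands in for the result of GetMessages().
        IEnumerable<string> collection = new string[] { "1", "2", "3" };
        Console.WriteLine(string.Join(", ", CollectWithIterator(collection))); // 1, 2, 3
        Console.WriteLine(string.Join(", ", CollectWithForeach(collection)));  // 1, 2, 3
    }
}
```

Both helpers visit the items in the same order, which is why the rest of this post can talk about foreach and the explicit MoveNext() / Current calls interchangeably.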
So how could the GetMessages() method be implemented?
Implement IEnumerable<T> and IEnumerator<T>
Of course, the easiest way is to directly return an IEnumerable<string> collection of data items, like a string[] or a List<string>, etc:
public static IEnumerable<string> GetMessages()
{
return new string[] // string[] implicitly implements IEnumerable<string>.
{
"1",
"2",
"3"
};
}
But following the standard implementation of the iterator pattern makes it more complicated:
public static IEnumerable<string> GetMessages()
{
return new MessageCollection( // MessageCollection should implement IEnumerable<string>.
"1",
"2",
"3"
);
}
First, a concrete IEnumerable<string> type (here named MessageCollection) has to be implemented:
internal class MessageCollection : IEnumerable<string>
{
// Persists the data.
private string[] _messages;
public MessageCollection(params string[] messages)
{
this._messages = messages;
}
#region IEnumerable<string> Members
public IEnumerator<string> GetEnumerator()
{
// Returns the iterator.
return new MessageIterator(this._messages);
}
#endregion
#region IEnumerable Members
IEnumerator IEnumerable.GetEnumerator() // IEnumerable<string> implements IEnumerable.
{
// Delegates to the generic version, so non-generic callers can iterate too.
return this.GetEnumerator();
}
#endregion
}
Then, a concrete IEnumerator<string> iterator (here named MessageIterator) has to be implemented:
internal class MessageIterator : IEnumerator<string>
{
// Persists the data.
private string[] _messages;
private int _currentIndex = -1;
public MessageIterator(string[] messages)
{
this._messages = messages;
}
#region IEnumerator<string> Members
public string Current
{
get { return this._messages[this._currentIndex]; }
}
#endregion
#region IDisposable Members
public void Dispose() // IEnumerator<string> implements IDisposable.
{
}
#endregion
#region IEnumerator Members
object IEnumerator.Current // IEnumerator<string> implements IEnumerator.
{
get { return this.Current; } // Reuses the generic Current property.
}
public bool MoveNext()
{
this._currentIndex++;
return this._currentIndex < this._messages.Length;
}
public void Reset()
{
throw new NotImplementedException();
}
#endregion
}
That is a lot of work for three strings!
The yield syntactic sugar
C# 2.0 introduced the yield syntactic sugar, which makes a standard iterator pattern easy to implement. The above GetMessages() can be written like this:
public static IEnumerable<string> GetMessages()
{
yield return "1";
yield return "2";
yield return "3";
}
It is incredibly easy. The implementation of an IEnumerable<string> collection and of that collection’s IEnumerator<string> iterator is generated automatically. The above code is compiled into something roughly like this:
public static IEnumerable<string> GetMessages()
{
return new MessageCollectionAndIterator(-2); // -2: before start;
}
The MessageCollectionAndIterator is the generated type implementing IEnumerable<string> and IEnumerator<string>:
[CompilerGenerated]
internal sealed class MessageCollectionAndIterator : IEnumerable<string>, IEnumerator<string>
{
// -2: before start;
// -1: after ended;
// 0: after start, running;
// 1: running;
// 2: running;
// 3: before end.
private int _currentState;
private string _currentItem;
private int _initialThreadId;
[DebuggerHidden]
public MessageCollectionAndIterator(int _state)
{
this._currentState = _state;
this._initialThreadId = Thread.CurrentThread.ManagedThreadId;
}
#region IEnumerable<string> Members
[DebuggerHidden]
IEnumerator<string> IEnumerable<string>.GetEnumerator()
{
// -2: before start;
if ((Thread.CurrentThread.ManagedThreadId == this._initialThreadId)
&& (this._currentState == -2))
{
this._currentState = 0; // 0: after start, running;
return this;
// Now, if MoveNext() is invoked, Current property returns "1".
}
// Another iteration is asking for iterator. Returns a new instance.
return new MessageCollectionAndIterator(0); // 0: after start, running;
}
#endregion
#region IEnumerable Members
[DebuggerHidden]
IEnumerator IEnumerable.GetEnumerator()
{
return ((IEnumerable<string>)this).GetEnumerator();
}
#endregion
#region IEnumerator<string> Members
string IEnumerator<string>.Current
{
[DebuggerHidden]
get
{
return this._currentItem;
}
}
#endregion
#region IEnumerator Members
public bool MoveNext()
{
switch (this._currentState)
{
// After invoking MoveNext() the first time, Current property returns "1".
case 0:
this._currentItem = "1";
this._currentState = 1;
return true;
// After invoking MoveNext() the second time, Current property returns "2".
case 1:
this._currentItem = "2";
this._currentState = 2;
return true;
// After invoking MoveNext() the third time, Current property returns "3".
case 2:
this._currentItem = "3";
this._currentState = 3;
return true;
// Invoking MoveNext() the fourth time, it returns false. Iteration ends.
case 3:
this._currentState = -1;
break;
}
return false;
}
object IEnumerator.Current
{
[DebuggerHidden]
get
{
return this._currentItem;
}
}
[DebuggerHidden]
void IEnumerator.Reset()
{
throw new NotSupportedException();
}
#endregion
#region IDisposable Members
void IDisposable.Dispose()
{
}
#endregion
}
This MessageCollectionAndIterator class can be considered a merge of the above MessageCollection and MessageIterator classes. It represents both the collection (IEnumerable<string>) and the collection’s iterator (IEnumerator<string>).
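The merged shape of the generated type can be observed from the outside. The following is a minimal sketch, assuming a compiler that generates code like the class above (the Microsoft C# compilers do; this is an implementation detail rather than something the language guarantees). GeneratedTypeDemo is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class GeneratedTypeDemo
{
    public static IEnumerable<string> GetMessages()
    {
        yield return "1";
        yield return "2";
        yield return "3";
    }

    public static void Main()
    {
        IEnumerable<string> collection = GetMessages();

        // Implementation detail: the generated object is also an iterator.
        Console.WriteLine(collection is IEnumerator<string>);

        // First request on the creating thread: the object hands out itself.
        IEnumerator<string> first = collection.GetEnumerator();
        Console.WriteLine(ReferenceEquals(collection, first));

        // A second request gets a separate instance, so that two
        // iterations of the same collection do not interfere.
        IEnumerator<string> second = collection.GetEnumerator();
        Console.WriteLine(ReferenceEquals(first, second));

        // Each iterator keeps its own position.
        first.MoveNext();
        first.MoveNext();
        second.MoveNext();
        Console.WriteLine(first.Current + " " + second.Current);
    }
}
```

With generated code of the shape shown above, the three checks are expected to print True, True, and False. The last line shows the part that callers can rely on in any case: the two iterators advance independently, so it prints 2 and 1.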
Yield items of collection
Now take a deeper look at the yield return by adding some debug information:
public static IEnumerable<string> GetMessages()
{
Console.WriteLine("Executing code before all yield returns.");
yield return "1";
Console.WriteLine("Executing code after yield return 1 and before yield return 2.");
yield return "2";
Console.WriteLine("Executing code after yield return 2 and before yield return 3.");
yield return "3";
Console.WriteLine("Executing code after all yield returns.");
}
Then iterate the returned collection the way a foreach statement would, with some more debug output around each call:
// 3 items are expected from collection.
IEnumerable<string> collection = GetMessages();
// Iterates the collection in a foreach.
using (IEnumerator<string> iterator = collection.GetEnumerator())
{
Console.WriteLine("Before invoking MoveNext().");
while (iterator.MoveNext())
{
Console.WriteLine("After invoking MoveNext().");
Console.WriteLine();
Console.WriteLine("Before invoking Current.");
string item = iterator.Current;
Console.WriteLine(@"Printing item ""{0}"".", item);
Console.WriteLine("After invoking Current.");
Console.WriteLine();
Console.WriteLine("Before invoking MoveNext().");
}
}
Here is the result:
Before invoking MoveNext().
Executing code before all yield returns.
After invoking MoveNext().
Before invoking Current.
Printing item "1".
After invoking Current.
Before invoking MoveNext().
Executing code after yield return 1 and before yield return 2.
After invoking MoveNext().
Before invoking Current.
Printing item "2".
After invoking Current.
Before invoking MoveNext().
Executing code after yield return 2 and before yield return 3.
After invoking MoveNext().
Before invoking Current.
Printing item "3".
After invoking Current.
Before invoking MoveNext().
Executing code after all yield returns.
It is clear that:
- Before the iteration starts, none of the code inside GetMessages() is executed;
- During the iteration, the code inside GetMessages() executes in exactly the order in which it is written;
- Each call to MoveNext() executes the code inside GetMessages() from where it last stopped until the next yield return is hit, or until the end of the code is reached.
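The step-by-step behavior listed above can be checked by advancing the iterator only once. This is a minimal sketch; the PartialIterationDemo class and the log parameter are hypothetical additions for the illustration and are not part of the GetMessages() shown in this post.

```csharp
using System;
using System.Collections.Generic;

public static class PartialIterationDemo
{
    // Same idea as GetMessages() above, but it records its progress
    // in a caller-supplied list instead of writing to the console.
    public static IEnumerable<string> GetMessages(List<string> log)
    {
        log.Add("before 1");
        yield return "1";
        log.Add("between 1 and 2");
        yield return "2";
        log.Add("after 2");
    }

    public static void Main()
    {
        List<string> log = new List<string>();
        using (IEnumerator<string> iterator = GetMessages(log).GetEnumerator())
        {
            // One step: runs the body up to the first yield return only.
            iterator.MoveNext();
            Console.WriteLine(iterator.Current);       // 1
            Console.WriteLine(string.Join("; ", log)); // before 1
        }

        // The iterator was disposed after one step, so the rest never ran.
        Console.WriteLine(log.Count); // 1
    }
}
```

Stopping early, for example by breaking out of a foreach, therefore also skips all the code that follows the last yield return that was reached.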
From the perspective of yield, the collection works like this:
// Returns a collection which can be iterated.
public static IEnumerable<string> GetMessages()
{
// MoveNext() is invoked and returns true.
Console.WriteLine("Executing code before all yield returns.");
yield return "1";
// Iteration 1 gets item "1" from collection by calling Current.
// MoveNext() is invoked and returns true.
Console.WriteLine("Executing code after yield return 1 and before yield return 2.");
yield return "2";
// Iteration 2 gets item "2" from collection by calling Current.
// MoveNext() is invoked and returns true.
Console.WriteLine("Executing code after yield return 2 and before yield return 3.");
yield return "3";
// Iteration 3 gets item "3" from collection by calling Current.
// MoveNext() is invoked and returns false, because there is no more yield return to reach.
Console.WriteLine("Executing code after all yield returns.");
// foreach ends. There is no more item from collection.
}
Again, the most important point is that if the returned collection is never iterated, by a foreach or otherwise, the code inside GetMessages() is not executed at all. This sounds a little lazy and deferred, right?
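A minimal sketch of that laziness: calling the method runs none of its body, and the body runs only once the result is iterated. The DeferredExecutionDemo class and the log parameter are hypothetical additions for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredExecutionDemo
{
    // Records when the body starts and finishes in a caller-supplied list.
    public static IEnumerable<string> GetMessages(List<string> log)
    {
        log.Add("body started");
        yield return "1";
        yield return "2";
        yield return "3";
        log.Add("body finished");
    }

    public static void Main()
    {
        List<string> log = new List<string>();

        // Calling the method only creates the collection object.
        IEnumerable<string> collection = GetMessages(log);
        Console.WriteLine(log.Count); // 0

        // The body runs only while the collection is being iterated.
        foreach (string item in collection)
        {
            Console.WriteLine(item);
        }
        Console.WriteLine(string.Join("; ", log)); // body started; body finished
    }
}
```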