[LINQ via C# series]
After understanding the programming paradigm and why LINQ query methods can be chained, this post shows the details of the LINQ query methods.
Methods like Where(), OrderBy(), OrderByDescending(), and Select() have appeared again and again in the previous posts. These built-in .NET methods are called LINQ standard query methods, and together they form the Language-Integrated Query pattern:
- Restriction: Where, OfType
- Projection: Select, SelectMany
- Ordering: OrderBy, ThenBy, OrderByDescending, ThenByDescending, Reverse
- Join: Join, GroupJoin
- Grouping: GroupBy
- Set: Zip, Distinct, Union, Intersect, Except
- Aggregation: Aggregate, Count, LongCount, Sum, Min, Max, Average
- Partitioning: Take, Skip, TakeWhile, SkipWhile
- Concatenation: Concat
- Conversion: AsEnumerable, ToArray, ToList, ToDictionary, ToLookup, Cast
- Equality: SequenceEqual
- Elements: First, FirstOrDefault, Last, LastOrDefault, Single, SingleOrDefault, ElementAt, ElementAtOrDefault, DefaultIfEmpty
- Generation: Range, Repeat, Empty
- Qualifiers: Any, All, Contains
According to the previous post, a query method is just an extension method defined on System.Linq.Enumerable that works with IEnumerable<T>. This post focuses on querying data items in IEnumerable<T> collections.
In the above list, Where, Select, SelectMany, OrderBy, ThenBy, OrderByDescending, ThenByDescending, Join, GroupJoin, GroupBy, and Cast can also be expressed with query expressions. Understanding C# 3.0 Features (6) Lambda Expression has explained that all query expressions are compiled into query method invocations. So whatever the query looks like, it is the query methods that do the work.
Standard query methods are also called standard query operators. This post will consistently call them query methods to avoid introducing new names.
Restriction (Filter)
To filter the items in an IEnumerable<T>, the Where() extension method is defined:
namespace System.Linq
{
public static class Enumerable
{
public static IEnumerable<TSource> Where<TSource>(
this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
// ...
}
public static IEnumerable<TSource> Where<TSource>(
this IEnumerable<TSource> source, Func<TSource, int, bool> predicate)
{
// ...
}
}
}
Again, the Where() method extends IEnumerable<T> and returns an IEnumerable<T>, so it is fluent.
The usage of Where() is very natural:
IEnumerable<int> source = new int[] { 0, -1, 2 };
IEnumerable<int> positive = source.Where(item => item > 0);
Its parameter is a Func<T, bool> predicate, passed here as a lambda expression (an anonymous function) whose parameter is a T object and whose return value is a bool. In this sample, the predicate is called for each item and returns
- true so that item is kept in the result;
- false so that item is dropped.
The Func generic delegate types are explained in Understanding C# 3.0 Features (6) Lambda Expression.
The following query works similarly to the T-SQL “LIKE” operator:
IEnumerable<string> source = GetData();
IEnumerable<string> results = source.Where(item => item.Contains("Dixin"));
There is one more overload of Where(), in which the predicate takes a second int parameter, the index of the item in the source:
IEnumerable<string> results = source.Where(
(item, index) => index % 2 == 0 && item.Contains("Dixin"));
The other filter method is OfType():
IEnumerable<TResult> OfType<TResult>(this IEnumerable source)
It keeps only the items of the specified type and drops all the others:
// MemoryStream and FileStream inherit Stream.
IEnumerable<Stream> source = new Stream[] {
new MemoryStream(), new FileStream(path, FileMode.Create) };
IEnumerable<MemoryStream> results = source.OfType<MemoryStream>();
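The three filters above can be tried together in one small sketch. The class FilterSketch and its method names are hypothetical, introduced here only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class FilterSketch
{
    // Keeps the items for which the predicate returns true (positive numbers).
    public static IEnumerable<int> Positive(IEnumerable<int> source)
    {
        return source.Where(item => item > 0);
    }

    // Uses the indexed overload: keeps the items at positions 0, 2, 4, ...
    public static IEnumerable<string> EvenPositions(IEnumerable<string> source)
    {
        return source.Where((item, index) => index % 2 == 0);
    }

    // Keeps only the items which are strings; items of other types are dropped.
    public static IEnumerable<string> Strings(IEnumerable<object> source)
    {
        return source.OfType<string>();
    }
}
```

For example, FilterSketch.Positive(new int[] { 0, -1, 2 }) yields only 2, and FilterSketch.Strings(new object[] { "a", 1, "b" }) yields "a" and "b".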
Projection (Mapping / binding)
Projection means to transform, or to map. The first kind transforms each TSource item in the source IEnumerable<TSource> collection to a TResult item, and results in an IEnumerable<TResult> collection:
IEnumerable<TResult> Select<TSource, TResult>(
this IEnumerable<TSource> source, Func<TSource, TResult> selector)
IEnumerable<TResult> Select<TSource, TResult>(
this IEnumerable<TSource> source, Func<TSource, int, TResult> selector)
The latter overload has a second int parameter for the selector, which is the index of the item.
The following sample queries the types in mscorlib.dll which implement IComparable:
IEnumerable<string> comparables = Assembly.Load("mscorlib").GetExportedTypes()
.Where(type => typeof(IComparable) != type && typeof(IComparable).IsAssignableFrom(type))
// Maps each one Type item to one string item.
.Select(type => type.FullName);
foreach (string item in comparables)
{
Console.WriteLine(item);
}
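The indexed overload of Select() is not used in the sample above. As a minimal sketch (SelectSketch and Numbered are hypothetical names), it can number each item by its position:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class SelectSketch
{
    // Maps each string item to one string prefixed with its 1-based position.
    public static IEnumerable<string> Numbered(IEnumerable<string> source)
    {
        return source.Select((item, index) => string.Format("{0}. {1}", index + 1, item));
    }
}
```

Calling SelectSketch.Numbered(new string[] { "C#", "F#" }) yields "1. C#" and "2. F#".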
The other kind maps each TSource item to many TResult items, and results in one flattened IEnumerable<TResult> collection:
IEnumerable<TResult> SelectMany<TSource, TResult>(
this IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector)
IEnumerable<TResult> SelectMany<TSource, TResult>(
this IEnumerable<TSource> source, Func<TSource, int, IEnumerable<TResult>> selector)
For example, one Type has one FullName, but has many methods. The following code queries obsolete methods in mscorlib.dll:
IEnumerable<string> methods = Assembly.Load("mscorlib").GetExportedTypes()
// Maps each one Type item to many MethodInfo items.
.SelectMany(type => type.GetMethods())
// Filters the resulted MethodInfo items.
.Where(method => Attribute.IsDefined(method, typeof(ObsoleteAttribute)))
// Maps each one MethodInfo item to one string item.
.Select(method => string.Format("{0}.{1}", method.DeclaringType.FullName, method.Name));
foreach (string item in methods)
{
Console.WriteLine(item);
}
Here SelectMany() flattens the collection of types into one collection of methods.
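As a smaller sketch of the same one-to-many mapping, each sentence below is mapped to many words, and SelectMany() flattens them into a single sequence. The names SelectManySketch and Words are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class SelectManySketch
{
    // Maps each sentence to many words, then flattens all words into one sequence.
    public static IEnumerable<string> Words(IEnumerable<string> sentences)
    {
        return sentences.SelectMany(sentence => sentence.Split(' '));
    }
}
```

With Select() instead of SelectMany(), the result would be a sequence of string arrays, one array per sentence, rather than one flat sequence of words.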
Ordering
The ordering methods are also very natural:
IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IComparer<TKey> comparer)
IOrderedEnumerable<TSource> OrderByDescending<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
IOrderedEnumerable<TSource> OrderByDescending<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IComparer<TKey> comparer)
IOrderedEnumerable<TSource> ThenBy<TSource, TKey>(
this IOrderedEnumerable<TSource> source, Func<TSource, TKey> keySelector)
IOrderedEnumerable<TSource> ThenBy<TSource, TKey>(
this IOrderedEnumerable<TSource> source, Func<TSource, TKey> keySelector, IComparer<TKey> comparer)
IOrderedEnumerable<TSource> ThenByDescending<TSource, TKey>(
this IOrderedEnumerable<TSource> source, Func<TSource, TKey> keySelector)
IOrderedEnumerable<TSource> ThenByDescending<TSource, TKey>(
this IOrderedEnumerable<TSource> source, Func<TSource, TKey> keySelector, IComparer<TKey> comparer)
The only question is about ThenBy(): what is the difference between OrderBy().OrderBy() chaining and OrderBy().ThenBy() chaining?
// Each item stands for a person. Key is name, while Value is age.
Dictionary<string, int> source = new Dictionary<string, int>()
{
{ "Anna", 2 },
{ "Bill", 1 },
{ "Carl", 1 }
};
IEnumerable<KeyValuePair<string, int>> result = source
// Orders by age.
.OrderBy(person => person.Value)
// But finally reorders everything by name.
.OrderBy(person => person.Key);
foreach (KeyValuePair<string, int> person in result)
{
Console.WriteLine("{0}: {1}", person.Key, person.Value);
}
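Because the names are unique here, the second OrderBy() completely determines the final order, so the above ordering is equivalent to: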
// Orders by name.
IEnumerable<KeyValuePair<string, int>> result = source.OrderBy(person => person.Key);
So the result is:
Anna: 2
Bill: 1
Carl: 1
And here is ThenBy():
IEnumerable<KeyValuePair<string, int>> result = source
// Orders by age.
.OrderBy(person => person.Value)
// If multiple people have the same age, orders them by name.
.ThenBy(person => person.Key);
So the result is:
Bill: 1
Carl: 1
Anna: 2
It is the same for OrderByDescending() and ThenByDescending().
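A minimal sketch of the descending pair, using hypothetical names (OrderingSketch, OldestFirst) and tuples of <name, age> instead of the dictionary above. It also passes a comparer to ThenByDescending(), one of the overloads listed earlier:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class OrderingSketch
{
    public static IEnumerable<string> OldestFirst(IEnumerable<Tuple<string, int>> people)
    {
        return people
            // Orders by age, from the largest to the smallest.
            .OrderByDescending(person => person.Item2)
            // People with the same age are ordered by name, from Z to A, compared ordinally.
            .ThenByDescending(person => person.Item1, StringComparer.Ordinal)
            .Select(person => string.Format("{0}: {1}", person.Item1, person.Item2));
    }
}
```

For the items (Anna, 2), (Bill, 1), (Carl, 1), this yields "Anna: 2", "Carl: 1", "Bill: 1".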
Join
Inner join
The Join() query method is used for inner join, pairing one item from a collection with one item from another collection:
IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer, IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector, Func<TInner, TKey> innerKeySelector,
Func<TOuter, TInner, TResult> resultSelector)
IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer, IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector, Func<TInner, TKey> innerKeySelector,
Func<TOuter, TInner, TResult> resultSelector,
IEqualityComparer<TKey> comparer)
Here is an example:
// Collection of <name, age>.
IEnumerable<Tuple<string, int>> outerSource = new Tuple<string, int>[]
{
Tuple.Create("Mark", 18),
Tuple.Create("Steven", 18)
};
// Collection of <name, language>.
IEnumerable<Tuple<string, string>> innerSource = new Tuple<string, string>[]
{
Tuple.Create("Mark", "C#"),
Tuple.Create("Mark", "C++"),
Tuple.Create("Mark", "JavaScript"),
Tuple.Create("Steven", "C#"),
Tuple.Create("Dixin", "F#")
};
// Inner joins <name, age> and <name, language> on name.
IEnumerable<Tuple<Tuple<string, int>, Tuple<string, string>>> results = outerSource.Join(
innerSource,
// Uses <name, age>'s name to pair.
outerItem => outerItem.Item1,
// Uses <name, language>'s name to pair.
innerItem => innerItem.Item1,
// Pairs each <name, age> item with one <name, language> item,
// where the <name, age>'s name is equal to <name, language> item's name.
(outerItem, innerItem) => Tuple.Create(outerItem, innerItem));
// Prints each pair.
foreach (Tuple<Tuple<string, int>, Tuple<string, string>> pair in results)
{
// Each pair is combined with one outerItem and one innerItem.
Tuple<string, int> outerItem = pair.Item1;
Tuple<string, string> innerItem = pair.Item2;
Console.WriteLine("({0}, {1}) is paired with an item: ({2}, {3})",
outerItem.Item1, outerItem.Item2,
innerItem.Item1, innerItem.Item2);
}
Conceptually, the query compares each item of outerSource with each item of innerSource to find all pairs of items which satisfy the join predicate: outerItem's name is equal to innerItem's name. Each matched pair is passed to the result selector, and the returned item is added to the results. So the above program prints out:
(Mark, 18) is paired with an item: (Mark, C#)
(Mark, 18) is paired with an item: (Mark, C++)
(Mark, 18) is paired with an item: (Mark, JavaScript)
(Steven, 18) is paired with an item: (Steven, C#)
The (“Dixin”, “F#”) item in the innerSource collection is not included in the results, because it cannot be joined with any item in outerSource.
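Two variations of the inner join above, as a sketch with hypothetical names (JoinSketch, JoinIgnoreCase, JoinByExpression). The overload with an IEqualityComparer<TKey> decides how the keys are compared, and the join clause of a query expression is compiled into a Join() call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class JoinSketch
{
    // Inner joins on name, ignoring case, via the overload with a comparer.
    public static IEnumerable<string> JoinIgnoreCase(
        IEnumerable<Tuple<string, int>> outer, IEnumerable<Tuple<string, string>> inner)
    {
        return outer.Join(
            inner,
            outerItem => outerItem.Item1,
            innerItem => innerItem.Item1,
            (outerItem, innerItem) => string.Format(
                "{0} ({1}): {2}", outerItem.Item1, outerItem.Item2, innerItem.Item2),
            StringComparer.OrdinalIgnoreCase);
    }

    // Inner joins on name with the default (case-sensitive) equality, as a query expression.
    public static IEnumerable<string> JoinByExpression(
        IEnumerable<Tuple<string, int>> outer, IEnumerable<Tuple<string, string>> inner)
    {
        return from outerItem in outer
               join innerItem in inner on outerItem.Item1 equals innerItem.Item1
               select string.Format(
                   "{0} ({1}): {2}", outerItem.Item1, outerItem.Item2, innerItem.Item2);
    }
}
```

If the outer collection contains ("mark", 18) and the inner collection contains ("Mark", "C#"), JoinIgnoreCase() pairs them, while JoinByExpression() returns nothing, because "mark" and "Mark" are different keys under the default comparer.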
Outer join
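The GroupJoin() method pairs one item from a collection with a group of items from another collection. It can be used to implement outer join, because an outer item without any match is still returned, paired with an empty group: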
IEnumerable<TResult> GroupJoin<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer, IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector, Func<TInner, TKey> innerKeySelector,
Func<TOuter, IEnumerable<TInner>, TResult> resultSelector)
IEnumerable<TResult> GroupJoin<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer, IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector, Func<TInner, TKey> innerKeySelector,
Func<TOuter, IEnumerable<TInner>, TResult> resultSelector,
IEqualityComparer<TKey> comparer)
Use the above outerSource and innerSource again:
// Group joins <name, age> and <name, language> on name.
IEnumerable<Tuple<Tuple<string, int>, IEnumerable<Tuple<string, string>>>> results = outerSource.GroupJoin(
innerSource,
// Uses <name, age>'s name to pair.
outerItem => outerItem.Item1,
// Uses <name, language>'s name to pair.
innerItem => innerItem.Item1,
// Pairs each <name, age> item with all <name, language> items,
// where the <name, age>'s name is equal to those <name, language> items' names.
(outerItem, innerItemsGroup) => Tuple.Create(outerItem, innerItemsGroup));
// Prints each pair.
foreach (Tuple<Tuple<string, int>, IEnumerable<Tuple<string, string>>> pair in results)
{
// Each pair is combined with one outerItem and one group of innerItems.
Tuple<string, int> outerItem = pair.Item1;
IEnumerable<Tuple<string, string>> innerItemsGroup = pair.Item2;
Console.Write("({0}, {1}) is paired with a group: ", outerItem.Item1, outerItem.Item2);
foreach (Tuple<string, string> innerItem in innerItemsGroup)
{
Console.Write("({0}, {1}) ", innerItem.Item1, innerItem.Item2);
}
Console.WriteLine();
}
The difference is obvious:
(Mark, 18) is paired with a group: (Mark, C#) (Mark, C++) (Mark, JavaScript)
(Steven, 18) is paired with a group: (Steven, C#)
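In the data above every outer item has at least one match, so the outer join behavior is not visible. The following sketch (LeftJoinSketch and LeftOuterJoin are hypothetical names) flattens the groups with SelectMany() and DefaultIfEmpty(), a common way to get a left outer join; an outer item without a match is paired with null and printed here as "(none)":

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class LeftJoinSketch
{
    public static IEnumerable<string> LeftOuterJoin(
        IEnumerable<Tuple<string, int>> outer, IEnumerable<Tuple<string, string>> inner)
    {
        return outer
            // Pairs each outer item with the (possibly empty) group of matching inner items.
            .GroupJoin(
                inner,
                outerItem => outerItem.Item1,
                innerItem => innerItem.Item1,
                (outerItem, matches) => new { Outer = outerItem, Matches = matches })
            // Flattens the groups. An empty group is replaced by a single null item.
            .SelectMany(
                pair => pair.Matches.DefaultIfEmpty(),
                (pair, innerItem) => string.Format(
                    "{0}: {1}", pair.Outer.Item1, innerItem == null ? "(none)" : innerItem.Item2));
    }
}
```

With outer items (Mark, 18) and (Anders, 50), and inner items (Mark, C#) and (Mark, C++), the result is "Mark: C#", "Mark: C++", "Anders: (none)".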
Cross join
SelectMany() can also be used for cross join. The following example:
IEnumerable<string> names = new string[] { "Dixin", "Steven", "Mark" };
IEnumerable<string> languages = new string[] { "C#", "F#", "Haskell" };
// Cross joins collection names and collection languages.
IEnumerable<Tuple<string, string>> results = names.SelectMany(
name => languages,
(name, language) => Tuple.Create(name, language));
foreach (Tuple<string, string> item in results)
{
Console.WriteLine("({0}, {1})", item.Item1, item.Item2);
}
prints:
(Dixin, C#)
(Dixin, F#)
(Dixin, Haskell)
(Steven, C#)
(Steven, F#)
(Steven, Haskell)
(Mark, C#)
(Mark, F#)
(Mark, Haskell)
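The same cross join can be written as a query expression with two from clauses, which is compiled into a SelectMany() call. A minimal sketch, with the hypothetical names CrossJoinSketch and CrossJoin:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class CrossJoinSketch
{
    // Pairs every name with every language.
    public static IEnumerable<Tuple<string, string>> CrossJoin(
        IEnumerable<string> names, IEnumerable<string> languages)
    {
        return from name in names
               from language in languages
               select Tuple.Create(name, language);
    }
}
```

For 3 names and 3 languages this yields the same 9 pairs as above, in the same order.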
Grouping
The GroupBy() query method divides items in a collection into groups via a key:
IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey> comparer)
IEnumerable<IGrouping<TKey, TElement>> GroupBy<TSource, TKey, TElement>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector)
IEnumerable<IGrouping<TKey, TElement>> GroupBy<TSource, TKey, TElement>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector,
IEqualityComparer<TKey> comparer)
where a group is represented by IGrouping<TKey, TElement>:
namespace System.Linq
{
public interface IGrouping<out TKey, out TElement> : IEnumerable<TElement>, IEnumerable
{
TKey Key { get; }
}
}
An IGrouping<TKey, TElement> collection is nothing but an IEnumerable<TElement> collection with a Key.
The concept of grouping is very easy. Take the above innerSource as an example:
// Collection of <name, language>.
IEnumerable<Tuple<string, string>> source = new Tuple<string, string>[]
{
Tuple.Create("Mark", "C#"),
Tuple.Create("Mark", "C++"),
Tuple.Create("Mark", "JavaScript"),
Tuple.Create("Steven", "C#"),
Tuple.Create("Dixin", "F#")
};
// <name, language> items in source are divided into groups.
IEnumerable<IGrouping<string, Tuple<string, string>>> groups = source.GroupBy(
// In each group, <name, language> items have the same key (language).
item => item.Item2);
// Prints each group.
foreach (IGrouping<string, Tuple<string, string>> group in groups)
{
// Each group has one key and a collection of the items sharing that key.
string key = group.Key;
IEnumerable<Tuple<string, string>> items = group;
Console.Write("Group {0}: ", key);
foreach (Tuple<string, string> item in items)
{
Console.Write("({0}, {1}) ", item.Item1, item.Item2);
}
Console.WriteLine();
}
The result is:
Group C#: (Mark, C#) (Steven, C#)
Group C++: (Mark, C++)
Group JavaScript: (Mark, JavaScript)
Group F#: (Dixin, F#)
The other kind of GroupBy() takes one more step, mapping each group to a result item:
IEnumerable<TResult> GroupBy<TSource, TKey, TResult>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TKey, IEnumerable<TSource>, TResult> resultSelector)
IEnumerable<TResult> GroupBy<TSource, TKey, TElement, TResult>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector,
Func<TKey, IEnumerable<TElement>, TResult> resultSelector)
IEnumerable<TResult> GroupBy<TSource, TKey, TResult>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TKey, IEnumerable<TSource>, TResult> resultSelector,
IEqualityComparer<TKey> comparer)
IEnumerable<TResult> GroupBy<TSource, TKey, TElement, TResult>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector,
Func<TKey, IEnumerable<TElement>, TResult> resultSelector,
IEqualityComparer<TKey> comparer)
For example:
// <name, language> items in source are divided into groups.
IEnumerable<string> groups = source.GroupBy(
// In each group, <name, language> items have the same key (language).
item => item.Item2,
// Each group is projected to one string message.
(key, groupItems) => string.Format("{0}: {1}", key, groupItems.Count()));
foreach (string group in groups)
{
Console.WriteLine(group);
}
The result is:
C#: 2
C++: 1
JavaScript: 1
F#: 1
The above grouping with projection works the same as:
IEnumerable<string> groups = source.GroupBy(item => item.Item2)
// Each group is mapped to one string message.
.Select(group =>
{
string key = group.Key;
IEnumerable<Tuple<string, string>> groupItems = group;
return string.Format("{0}: {1}", key, groupItems.Count());
});
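The overloads with an elementSelector are listed above but not demonstrated. As a sketch with hypothetical names (GroupBySketch, NamesByLanguage), the element selector keeps only the name of each <name, language> item inside its group:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class GroupBySketch
{
    public static IEnumerable<string> NamesByLanguage(IEnumerable<Tuple<string, string>> source)
    {
        return source
            // The key is the language; each group contains names instead of whole tuples.
            .GroupBy(item => item.Item2, item => item.Item1)
            // Each group is mapped to one string message.
            .Select(names => string.Format(
                "{0}: {1}", names.Key, string.Join(", ", names.ToArray())));
    }
}
```

For the source collection used above, this yields "C#: Mark, Steven", "C++: Mark", "JavaScript: Mark", "F#: Dixin".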
Aggregation (Folding)
Aggregation is accumulation, applying a function over a collection’s items, from the first to the last.
TSource Aggregate<TSource>(
this IEnumerable<TSource> source, Func<TSource, TSource, TSource> func)
For example:
IEnumerable<string> source = new string[] { "C#", "F#", "C", "JavaScript" };
string result = source.Aggregate((itemA, itemB) => string.Format("{0}, {1}", itemA, itemB));
Console.WriteLine(result); // "C#, F#, C, JavaScript"
This is how Aggregate() accumulates from the first item to the last:
- Round 1: Since the accumulator needs 2 parameters (itemA and itemB), it takes the first item (“C#”) and the second item (“F#”), and results in “C#, F#”;
- Round 2: The accumulator takes the result of the last round (“C#, F#”) and the third item (“C”), and results in “C#, F#, C”;
- Round 3: The accumulator takes the result of the last round (“C#, F#, C”) and the fourth item (“JavaScript”), and results in “C#, F#, C, JavaScript”;
- There are no more items in the source collection. The accumulation ends, and Aggregate() returns “C#, F#, C, JavaScript”.
Isn’t it easy?
Aggregate() can also return a result of a type other than TSource:
TAccumulate Aggregate<TSource, TAccumulate>(
this IEnumerable<TSource> source, TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> func)
For example, it can return an int:
IEnumerable<string> source = new string[] { "C#", "F#", "C", "JavaScript" };
int totalCharCount = source.Aggregate(0, (charCount, item) => charCount + item.Length);
Console.WriteLine(totalCharCount); // 15
Here is how Aggregate() accumulates:
- Round 1: Since the accumulator needs 2 parameters (charCount and item), it takes 0 (the seed, the first argument for Aggregate()) and the first item “C#”, calculates 0 + “C#”’s length, and results in 2;
- Round 2: The accumulator takes the result of the last round (2) and the second item (“F#”), calculates 2 + “F#”’s length, and results in 4;
- Round 3: The accumulator takes the result of the last round (4) and the third item (“C”), calculates 4 + “C”’s length, and results in 5;
- Round 4: The accumulator takes the result of the last round (5) and the fourth item (“JavaScript”), calculates 5 + “JavaScript”’s length, and results in 15;
- There are no more items in the source collection. The accumulation ends, and Aggregate() returns 15.
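The rounds above can be written as a plain loop. The following is only a sketch of the behavior with hypothetical names (FoldSketch, Fold), without argument checks, and is not the actual implementation of Enumerable.Aggregate():

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class FoldSketch
{
    public static TAccumulate Fold<TSource, TAccumulate>(
        IEnumerable<TSource> source, TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> func)
    {
        // Starts from the seed.
        TAccumulate result = seed;
        foreach (TSource item in source)
        {
            // Each round combines the result of the last round with the next item.
            result = func(result, item);
        }

        // No more items: the accumulation ends.
        return result;
    }
}
```

Calling FoldSketch.Fold(new string[] { "C#", "F#", "C", "JavaScript" }, 0, (charCount, item) => charCount + item.Length) goes through the same 4 rounds and returns 15.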
The last overload of Aggregate() takes one more parameter:
TResult Aggregate<TSource, TAccumulate, TResult>(
this IEnumerable<TSource> source, TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> func,
Func<TAccumulate, TResult> resultSelector)
The last parameter is nothing but a function that projects the final accumulated value:
IEnumerable<string> source = new string[] { "C#", "F#", "C", "JavaScript" };
string totalCharCount = source.Aggregate(0, (charCount, item) => charCount + item.Length,
// Projects an int to a string.
finalResult => finalResult.ToString(CultureInfo.InvariantCulture));
This query works the same as:
string totalCharCount = source.Aggregate(0, (charCount, item) => charCount + item.Length)
// Converts an int to a string.
.ToString(CultureInfo.InvariantCulture);
Set
The Zip() method is introduced in .NET 4.0:
IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first, IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
It is used to merge two collections into one collection, pairing items one by one:
IEnumerable<string> names = new string[] { "Mark", "Steven", "Dixin" };
IEnumerable<int> ages = new int[] { 18, 19 };
// Merges names and ages by pairing <name, age>.
IEnumerable<Tuple<string, int>> results = names.Zip(ages, (name, age) => Tuple.Create(name, age));
foreach (Tuple<string, int> item in results)
{
Console.WriteLine("{0}: {1}", item.Item1, item.Item2);
}
The result is:
Mark: 18
Steven: 19
In the names collection, the third item “Dixin” is not included in the results, because the ages collection has only 2 items, so there is no item to pair with “Dixin”.
Other query methods
The other query methods are so simple that their names largely explain how to use them:
- Ordering: Reverse
- Aggregation: Count, LongCount, Sum, Min, Max, Average
- Partitioning: Take, Skip, TakeWhile, SkipWhile
- Concatenation: Concat
- Set: Distinct, Union, Intersect, Except
- Conversion: AsEnumerable, ToArray, ToList, ToDictionary, ToLookup, Cast
- Equality: SequenceEqual
- Elements: First, FirstOrDefault, Last, LastOrDefault, Single, SingleOrDefault, ElementAt, ElementAtOrDefault, DefaultIfEmpty
- Generation: Range, Repeat, Empty
- Qualifiers: Any, All, Contains
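A few of the methods in this list, applied to two small arrays. This is only a sketch; the class OtherMethodsSketch and its member names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class OtherMethodsSketch
{
    private static readonly int[] left = { 1, 2, 2, 3 };
    private static readonly int[] right = { 3, 4 };

    // Concatenation keeps duplicates: 1, 2, 2, 3, 3, 4.
    public static int[] Concatenated() { return left.Concat(right).ToArray(); }

    // Set methods drop duplicates.
    public static int[] Distinct() { return left.Distinct().ToArray(); }          // 1, 2, 3
    public static int[] Union() { return left.Union(right).ToArray(); }           // 1, 2, 3, 4
    public static int[] Intersection() { return left.Intersect(right).ToArray(); } // 3
    public static int[] Difference() { return left.Except(right).ToArray(); }     // 1, 2

    // Quantifiers return a bool.
    public static bool AnyGreaterThanTwo() { return left.Any(item => item > 2); } // true
    public static bool AllPositive() { return left.All(item => item > 0); }       // true
    public static bool ContainsFour() { return left.Contains(4); }                // false

    // Aggregation of the whole collection.
    public static int Sum() { return left.Sum(); }                                // 8
}
```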
Among all the above query methods, Take() and Skip() can be composed to implement pagination very easily:
IEnumerable<string> source = GetData();
IEnumerable<string> results = source.Skip(20).Take(10);
So a Page() query method can be defined easily:
public static IEnumerable<TSource> Page<TSource>(
this IEnumerable<TSource> source, int pageIndex, int pageSize)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (pageIndex < 0)
{
throw new ArgumentOutOfRangeException("pageIndex");
}
if (pageSize < 0)
{
throw new ArgumentOutOfRangeException("pageSize");
}
return source.Skip(pageIndex * pageSize).Take(pageSize);
}
One last thing that needs attention is that the generation methods:
IEnumerable<TResult> Empty<TResult>()
IEnumerable<int> Range(int start, int count)
IEnumerable<TResult> Repeat<TResult>(TResult element, int count)
are not extension methods, but ordinary static methods of Enumerable:
IEnumerable<string> source = Enumerable.Repeat<string>("I want WinFS.", int.MaxValue);
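A short sketch of the three generation methods called as ordinary static methods, combined with the Skip() and Take() pagination shown earlier. The names GenerationSketch, ThirdPageOfTen, ThreeCopies, and EmptyHasNoItems are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class GenerationSketch
{
    // Generates 0, 1, ..., 99, then takes the page with index 2 and size 10: 20, 21, ..., 29.
    public static int[] ThirdPageOfTen()
    {
        return Enumerable.Range(0, 100).Skip(2 * 10).Take(10).ToArray();
    }

    // Generates a sequence which repeats the same value 3 times.
    public static string[] ThreeCopies(string value)
    {
        return Enumerable.Repeat(value, 3).ToArray();
    }

    // An empty sequence has no items, so Any() returns false.
    public static bool EmptyHasNoItems()
    {
        return !Enumerable.Empty<string>().Any();
    }
}
```

Since these methods have no source collection to extend, each query starts with the Enumerable class name, and the other query methods are chained after it as usual.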