
Using the Select LINQ query operator with indexes

Yesterday, Fred asked me if I could help him convert some C# code to LINQ. The solution may not be obvious unless you know LINQ well. I will reproduce here the solution I gave Fred. Whether the LINQ version of the code is easier to read than the original one is arguable. The main purpose here is to show LINQ's Select query operator in action.

Here is the original code:

int CountCorrectChars(string proposedValue, string correctValue)
{
  int correctCount = 0;
  for (int i=0; i < proposedValue.Length && i < correctValue.Length; i++)
    if (proposedValue[i] == correctValue[i])
      correctCount++;
  return correctCount;
}

Here is the LINQ version that I suggested:

int CountCorrectChars(string proposedValue, string correctValue)
{
  return correctValue
    .Select((testChar, index) => new {Character=testChar, Index=index})
    .Count(testChar => (testChar.Index < proposedValue.Length)
                    && (testChar.Character == proposedValue[testChar.Index]));
}
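To check that the LINQ version really is a drop-in replacement, the two can be run side by side. This is a sketch and not part of the original exchange. It assumes C# 9 top-level statements, and the sample strings are made up for illustration.

```csharp
using System;
using System.Linq;

// Original loop: compares characters position by position,
// stopping at the end of the shorter string.
int CountCorrectCharsLoop(string proposedValue, string correctValue)
{
    int correctCount = 0;
    for (int i = 0; i < proposedValue.Length && i < correctValue.Length; i++)
        if (proposedValue[i] == correctValue[i])
            correctCount++;
    return correctCount;
}

// LINQ version: the indexed Select overload pairs each character of
// correctValue with its position, then Count filters those pairs.
int CountCorrectCharsLinq(string proposedValue, string correctValue)
{
    return correctValue
        .Select((testChar, index) => new { Character = testChar, Index = index })
        .Count(pair => pair.Index < proposedValue.Length
                    && pair.Character == proposedValue[pair.Index]);
}

// Hypothetical sample inputs, including strings of different lengths.
var samples = new[]
{
    ("hello", "hello"),
    ("hallo", "hello"),
    ("help", "hello"),
    ("hello world", "hello"),
    ("", "hello"),
};

foreach (var (proposed, correct) in samples)
{
    Console.WriteLine(
        $"{proposed,-12} {correct,-6} loop={CountCorrectCharsLoop(proposed, correct)} " +
        $"linq={CountCorrectCharsLinq(proposed, correct)}");
}
```

Both columns should agree on every row, including the rows where the two strings differ in length.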

As you can see, the LINQ version is harder to understand and more verbose. Of course, we could use shorter names, but that wouldn't change the complexity of the query. The LINQ version doesn't perform as well either: it allocates an anonymous object for every character and goes through all of correctValue, even when proposedValue is shorter. So, should we use LINQ or not? My point here is that LINQ is not a "one size fits all" solution. You should use it wisely and avoid making code more complex by always choosing LINQ.

What's also interesting in this example is the use of Select with a two-parameter lambda expression. You may know the version of Select that takes a single-parameter lambda well, but its counterpart is less known (and used).
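Here are the two overloads side by side. The one-parameter lambda receives only the element, and the two-parameter lambda also receives the element's zero-based position in the source sequence. This is a minimal sketch assuming C# 9 top-level statements. The `letters` array is just sample data.

```csharp
using System;
using System.Linq;

var letters = new[] { "a", "b", "c" };

// Select<TSource, TResult>(Func<TSource, TResult>): element only.
var upper = letters.Select(letter => letter.ToUpper());

// Select<TSource, TResult>(Func<TSource, int, TResult>): element plus its index.
var numbered = letters.Select((letter, index) => $"{index}:{letter}");

Console.WriteLine(string.Join(", ", upper));    // A, B, C
Console.WriteLine(string.Join(", ", numbered)); // 0:a, 1:b, 2:c
```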

This is something that we cover in section 4.4.2 of LINQ in Action. Here is what we wrote there. It gives another example of Select in action:

The Select and SelectMany operators can be used to retrieve the index of each element in a sequence. Let’s say we want to display the index of each book in our collection before we sort them in alphabetical order:

index=3         Title=All your base are belong to us
index=4         Title=Bonjour mon Amour
index=2         Title=C# on Rails
index=0         Title=Funny Stories
index=1         Title=LINQ rules

Here is how to use Select to achieve that:

Listing 4.15    Code-behind for the first ASP.NET page    (SelectIndex.csproj)
var books =
  SampleData.Books
    .Select((book, index) => new { index, book.Title })
    .OrderBy(book => book.Title);
ObjectDumper.Write(books);
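Listing 4.15 depends on the book's SampleData and ObjectDumper helpers. Below is a self-contained approximation. It assumes only the five titles, in the source order implied by the index values in the output above, and uses Console.WriteLine in place of ObjectDumper. It also assumes C# 9 top-level statements.

```csharp
using System;
using System.Linq;

// Stand-in for SampleData.Books: only the titles, in their original order
// (the order implied by the index values shown in the output above).
var bookTitles = new[]
{
    "Funny Stories",
    "LINQ rules",
    "C# on Rails",
    "All your base are belong to us",
    "Bonjour mon Amour",
};

// Capture each book's original position first, then sort by title.
var books =
    bookTitles
        .Select((title, index) => new { index, Title = title })
        .OrderBy(book => book.Title);

foreach (var book in books)
    Console.WriteLine($"index={book.index}\tTitle={book.Title}");
```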

This time we can’t use the query expression syntax, because the variant of the Select operator that provides the index has no equivalent in this syntax. Notice that this version of the Select method provides an index variable that we can use in our lambda expression. (A clarification that is not in the book: the name "index" is not what matters, and you can use another name if you want. What makes the difference is that the lambda expression takes two parameters.) The compiler automatically determines which version of the Select operator we want to use just by looking at the presence or absence of the index parameter. Notice also that we call Select before OrderBy. This is important because we want the indices from before the books are sorted, not after.
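The effect of calling Select before or after OrderBy can be seen with a shorter list. In this sketch, the lambda parameter is deliberately named `position` rather than `index` to show that only the number of parameters matters. It assumes C# 9 top-level statements, and the titles are taken from the sample above.

```csharp
using System;
using System.Linq;

var titles = new[] { "Funny Stories", "LINQ rules", "C# on Rails" };

// Select before OrderBy: the index is the position in the unsorted source.
// The parameter is named "position": only the lambda's arity matters.
var originalPositions = titles
    .Select((title, position) => new { position, title })
    .OrderBy(item => item.title)
    .ToList();

// Select after OrderBy: the index is the position in the sorted sequence,
// so it works like a row number.
var sortedPositions = titles
    .OrderBy(title => title)
    .Select((title, position) => new { position, title })
    .ToList();

foreach (var item in originalPositions) Console.WriteLine($"{item.position} {item.title}");
// 2 C# on Rails
// 0 Funny Stories
// 1 LINQ rules

foreach (var item in sortedPositions) Console.WriteLine($"{item.position} {item.title}");
// 0 C# on Rails
// 1 Funny Stories
// 2 LINQ rules
```

The second form is a way to number rows after sorting. If the source is a LINQ to SQL or Entity Framework query, the provider may not translate the indexed overload. In that case, a common workaround is to call AsEnumerable() before the indexed Select so that the numbering happens in memory.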

...One more tool in your toolbox. Now, use it wisely.

Update: Mark Sowul suggests a simpler solution:

return correctValue.Where((testChar, index) =>
  index < proposedValue.Length &&
  testChar == proposedValue[index]).Count();
Somehow I missed that Where overload. 
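Mark's version works because Where, like Select, has an overload whose predicate receives the index. Count, by contrast, only accepts a Func<TSource, bool>, which is why the earlier version needed Select to carry the index along. Here is a minimal sketch wrapping Mark's query in the original method signature, assuming C# 9 top-level statements:

```csharp
using System;
using System.Linq;

// Where<TSource>(Func<TSource, int, bool>): the predicate sees each
// character of correctValue together with its position.
int CountCorrectChars(string proposedValue, string correctValue)
{
    return correctValue
        .Where((testChar, index) => index < proposedValue.Length
                                 && testChar == proposedValue[index])
        .Count();
}

Console.WriteLine(CountCorrectChars("hallo", "hello"));       // 4
Console.WriteLine(CountCorrectChars("hello world", "hello")); // 5
Console.WriteLine(CountCorrectChars("help", "hello"));        // 3
```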

Cross-posted from http://LinqInAction.net

12 Comments

  • This one is much better:

    return correctValue.Where((testChar, index) =>
    index < proposedValue.Length &&
    testChar == proposedValue[index]).Count();

  • Thanks Mark. I missed that Where overload!

  • "You may know the version of Select that takes a single-parameter lambda well, but its counterpart is less known (and used)."

    Guess the same is true about Where : )

  • I must say I have tried and cannot see the beauty or advantage in using LINQ to search for matching array elements.

    This is a simple task, and (at least for me) the LINQ version goes much too far in complexity and unreadability for something so trivial. What will 16-table joins with conditional-filtering look like if we want someone to be able to modify them one day after we finish writing this code?

    For me personally, generics, anonymous functions and good ORM design practices are just about enough to get objects from and to the DB.
    Once data classes are defined and marked with the appropriate attributes that map them to the DB, generating prepared statements, stored procedures and functions is just a bit of coding away, and the result is so trivial to read and understand that there is almost no reason to do anything else, performance aside.

  • "What will 16-table joins with conditional-filtering look like if we want someone to be able to modify them one day after we finish writing this code?"

    Probably just as ugly as it would in raw SQL. But it wouldn't necessarily have to be in a monolithic query anyway.

    What makes LINQ a huge win are the lazy evaluation and composability. (I like the syntax in most cases, and dispute that it automatically makes things look more complicated. Anything can be misused; half the point of this post was that you shouldn't make things too unreadable with complicated LINQ queries.)

    For example, we could break the problem above into two steps: one function that returns an IEnumerable of matching characters:

    return correctValue.Where((testChar, index) => index < proposedValue.Length && testChar == proposedValue[index]);

    and the other that gets the count of this. Now if you needed to know "are there /any/ such characters?" you can use .Any(), and if the first character matches, well now you've avoided iterating the whole list.

    The fact that queries are composable is a huge win for SQL (mainly n-tier apps) because you can break your business logic into functions that take and return IQueryable and chain them together, and execution happens only once. This is impossible to do with string queries or just stored procedures without either unmaintainable string concatenations or multiple successive trips to the DB (or long, ugly queries with repeated sections).

  • How timely! I just discovered the Where() overload myself a day or two ago. And, likewise, I decided it was a little too arcane for my current need. Nice to know it is there!

  • Another alternative:

    return proposedValue
    .Take(correctValue.Length)
    .TakeWhile((c, i) => c == correctValue[i])
    .Count();

  • Thanks Richard. This is a good alternative.

  • I've just noticed that your original method counts all characters that match, whereas my Take/TakeWhile alternative stops after the first character that doesn't match. If you want the original behaviour, you can replace TakeWhile with Where.

  • How do you convert this:

    string[] cities = { "Houston", "Richmond", "Sugar Land" };
    var trips = from c1 in cities
                from c2 in cities
                where c1 != c2
                select new string[] { c1, c2 };

    to:

    cities.Select(...) syntax?

  • Paganen, see the tips provided here: http://weblogs.asp.net/fmarguerie/archive/2009/02/06/converting-linq-queries-from-query-syntax-to-method-operator-syntax.aspx

    It will give you this:

    cities
    .SelectMany(
    c1 => cities,
    (c1, c2) => new { c1 = c1, c2 = c2 }
    )
    .Where(pair => (pair.c1 != pair.c2))
    .Select(pair => new[] { pair.c1, pair.c2 });

    The trick here is to project all elements into pairs with SelectMany.

  • Wise sages:

    How do I generate the index in this case?

    System.Data.DataTable dtProjects = LINQToDataTable(from p in new TasksDataContext().Projects orderby p.ProjectOrder select new { p.IDProject, p.Project, p.Notes, p.HasChildren, index});

    The call to the function using the LINQ syntax works fine without the ", index" but gives me a CS0103: The name 'index' does not exist in the current context error.

    Getting the row number should be ubiquitously easy!

    LINQ not natively supporting SQL Server's ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...), and not simply providing an index in this context, is especially painful, since most of the controls (rows and DataPager) are addressable by their index rather than their SelectedValue/DataKey.

    Woulda. Coulda. Shoulda. If "ifs" and "buts" were candy and nuts, we'd all have a merry Christmas.
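Richard's note about TakeWhile versus Where in the comments above is easy to miss. This minimal sketch runs both variants on the same input. The function names are hypothetical, and the sketch assumes C# 9 top-level statements.

```csharp
using System;
using System.Linq;

// Richard's version: TakeWhile stops at the first mismatch,
// so it counts the length of the matching prefix.
int CountMatchingPrefix(string proposedValue, string correctValue) =>
    proposedValue
        .Take(correctValue.Length)              // never index past correctValue
        .TakeWhile((c, i) => c == correctValue[i])
        .Count();

// Replacing TakeWhile with Where keeps scanning past mismatches,
// which restores the behavior of the original loop.
int CountCorrectChars(string proposedValue, string correctValue) =>
    proposedValue
        .Take(correctValue.Length)
        .Where((c, i) => c == correctValue[i])
        .Count();

Console.WriteLine(CountMatchingPrefix("hallo", "hello")); // 1
Console.WriteLine(CountCorrectChars("hallo", "hello"));   // 4
```

The index passed to TakeWhile and Where is the position within the output of Take. Because Take yields a prefix, that position is the same as the position in proposedValue.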
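The two-step decomposition suggested in the comments above can be sketched as follows. `MatchingChars` is a hypothetical name, and the snippet assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Step 1: a reusable query that yields the matching characters.
// Nothing runs until the sequence is enumerated (deferred execution).
IEnumerable<char> MatchingChars(string proposedValue, string correctValue) =>
    correctValue.Where((testChar, index) =>
        index < proposedValue.Length && testChar == proposedValue[index]);

var matches = MatchingChars("hallo", "hello");

// Step 2a: count every match (enumerates the whole sequence).
Console.WriteLine(matches.Count()); // 4

// Step 2b: only ask whether there is any match; Any() stops
// at the first element the query yields.
Console.WriteLine(matches.Any());   // True
```

Because MatchingChars returns a deferred query, each call to Count() or Any() runs it again from the start. Calling ToList() once avoids re-evaluating it when the results are reused.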
