The Linq between C# and C++

C# is the hot new language, unless you’re Don Box in which case it’s probably Ruby, but let’s pretend it’s C# for the purpose of this discussion.

:)

A future version of C# will allow you to write the following:

int[] numbers = { 10, 0, 9, 1, 8, 2, 7, 3, 6, 4, 5 };
 
var query = from n in numbers
            where n < 5
            orderby n
            select n;
 
foreach (var n in query)
{
    Console.WriteLine(n);
}

Running this code results in the following output:

0
1
2
3
4

If you dream in SQL then this code might make you drool. If you dream in C++ or C# then this might make you cringe. Whatever your persuasion, this code is intriguing and has the potential to dramatically simplify certain types of coding patterns. Think about creating and consuming data in XML documents or relational databases.

Can this work? Is it type-safe? And where does it leave C++? Let’s take a look.

What on earth is ‘var’?

var is the C# rendition of the compromise certain strongly-typed languages are making to appease the onslaught of loosely-typed languages. In a future version of C# you will be able to declare a variable and leave the compiler to infer its type based on its initializer expression. Consider the following example:

Dictionary<int, string> dictionary = new Dictionary<int, string>();
Dictionary<int, string>.KeyCollection.Enumerator enumerator = dictionary.Keys.GetEnumerator();

There is a whole lot of type information that seems redundant to humans yet compilers appear to require it. Well no longer. Making use of the C# var keyword, the code can be simplified considerably (while the resulting IL remains the same):

var dictionary = new Dictionary<int, string>();
var enumerator = dictionary.Keys.GetEnumerator();

The ISO C++ committee has been moving in the same direction and approved a type deduction system for C++ that works in much the same way, with the obligatory syntactic sugar that we C++ developers love to hate. The new (old) auto keyword indicates that the type of the variable should be deduced from the initializer expression. Consider the following C++ equivalent to the previous C# example:

Dictionary<int, String^> dictionary;
Dictionary<int, String^>::KeyCollection::Enumerator^ enumerator = dictionary.Keys->GetEnumerator();

Using the proposed auto keyword it can be simplified as follows:

Dictionary<int, String^> dictionary;
auto^ enumerator = dictionary.Keys->GetEnumerator();
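
The same idea carries over to ISO C++ without any CLI extensions. Here is a minimal sketch, assuming a compiler that implements auto type deduction (it was eventually standardized in C++11). The first_key helper is hypothetical and exists only to show that the deduced type is exactly the one you would otherwise have spelled out:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

// Hypothetical helper: returns the smallest key in the map, or -1 if it is empty.
int first_key(const std::map<int, std::string>& dictionary)
{
    // The verbose declaration, with the iterator type written out in full.
    std::map<int, std::string>::const_iterator explicit_it = dictionary.begin();

    // The same declaration with the type deduced from the initializer.
    auto deduced_it = dictionary.begin();

    // Both variables have the same static type; the check happens at compile time.
    static_assert(std::is_same<decltype(explicit_it), decltype(deduced_it)>::value,
                  "auto deduces the same type as the explicit declaration");
    assert(explicit_it == deduced_it);

    return deduced_it == dictionary.end() ? -1 : deduced_it->first;
}
```

As with var, nothing is loosely typed here: deduction is purely a compile-time convenience.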

So what is LINQ?

LINQ stands for Language-Integrated Query. Which language? Well any language that purports to target the future of the .NET Framework. Much of the attention around LINQ focuses on C#, being the poster child for the .NET Framework, but there is nothing stopping other languages from providing the language bindings necessary to integrate query facilities into the language.
 
To understand what LINQ really is in relation to C# we need to look under the covers. Here is the query declaration again:

var query = from n in numbers
            where n < 5
            orderby n
            select n;

We have already discussed what var is for, but for the sake of this discussion let’s keep things explicit:

IEnumerable<int> query = from n in numbers
                         where n < 5
                         orderby n
                         select n;

C# uses patterns, not unlike the way C++ templates work, to translate query expressions into method calls. Because of this, the query expression is suitably type-safe and is not simply an expression evaluated at runtime as is the case with many loosely-typed languages. The query expression above can be rewritten using method calls and this is essentially what the compiler does on your behalf:

IEnumerable<int> _subset = Sequence.Where<int>(numbers,
                                               n => n < 5);
 
IEnumerable<int> query = Sequence.OrderBy<int, int>(_subset,
                                                    n => n);

This now looks a lot more like C# but there is still the matter of the parameter expressions. These are known as C# lambda expressions, which provide a more concise syntax for writing anonymous methods. The calls can in turn be rewritten using anonymous methods as follows:

IEnumerable<int> _subset = Sequence.Where<int>(numbers,
                                               delegate(int n) { return n < 5; });
 
IEnumerable<int> query = Sequence.OrderBy<int, int>(_subset,
                                                    delegate(int n) { return n; });

Of course anonymous methods are just shorthand for named methods:

var _subset = Sequence.Where<int>(numbers,
                                  ConstraintFunction);
 
var query = Sequence.OrderBy<int, int>(_subset,
                                       SelectFunction);
 
.
.
.
 
static bool ConstraintFunction(int n)
{
    return n < 5;
}
 
static int SelectFunction(int n)
{
    return n;
}

So as you can see, query expressions are much like “foreach” statements where the compiler takes a simpler expression and produces the more verbose imperative code on your behalf. Writing the query expression is just so much simpler and to the point:

var query = from n in numbers
            where n < 5
            orderby n
            select n;
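
For the C++ readers, the comparison with templates can be made concrete. What follows is only a sketch in plain ISO C++, not how System.Query is implemented: where, order_by, KeyLess and run_query are hypothetical names standing in for Sequence.Where and Sequence.OrderBy, and unlike the real thing they evaluate eagerly into a std::vector instead of deferring the work until the sequence is enumerated.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for Sequence.Where: keeps the elements that satisfy the predicate.
template <typename T, typename Predicate>
std::vector<T> where(const std::vector<T>& source, Predicate predicate)
{
    std::vector<T> result;
    for (typename std::vector<T>::const_iterator it = source.begin(); it != source.end(); ++it)
    {
        if (predicate(*it))
            result.push_back(*it);
    }
    return result;
}

// Comparison functor that orders two elements by the key extracted from each.
template <typename T, typename KeySelector>
struct KeyLess
{
    KeySelector key;
    explicit KeyLess(KeySelector k) : key(k) {}
    bool operator()(const T& a, const T& b) const { return key(a) < key(b); }
};

// Hypothetical stand-in for Sequence.OrderBy: returns a copy sorted by key (stable).
template <typename T, typename KeySelector>
std::vector<T> order_by(std::vector<T> source, KeySelector key)
{
    std::stable_sort(source.begin(), source.end(), KeyLess<T, KeySelector>(key));
    return source;
}

bool constraint_function(int n) { return n < 5; }
int select_function(int n) { return n; }

// The query from the article, spelled out as two calls.
std::vector<int> run_query(const std::vector<int>& numbers)
{
    std::vector<int> subset = where(numbers, constraint_function);
    assert(subset.size() <= numbers.size()); // filtering never adds elements
    return order_by(subset, select_function);
}
```

The template arguments are deduced from the call, so a predicate or key selector of the wrong type is rejected at compile time, which is the same kind of type safety the C# translation provides.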

Where does this leave C++?

Let’s start with what you can do today. Today you can already use the System.Query assembly, on which LINQ is based, and write the equivalent code as follows:

IEnumerable<int>^ _subset = Sequence::Where<int>(safe_cast<IEnumerable<int>^>(numbers),
                                                 gcnew Func<int, bool>(ConstraintFunction));
 
IEnumerable<int>^ query = Sequence::OrderBy<int, int>(_subset,
                                                      gcnew Func<int, int>(SelectFunction));
 
.
.
.
 
bool ConstraintFunction(int n)
{
    return n < 5;
}

int SelectFunction(int n)
{
    return n;
}

But I know what you’re saying, that syntactic sugar is just sprinkled on way too thick, and I agree. In the future we will hopefully be able to use automatic type inference and lambda expressions (as a language feature, not a library feature) to simplify constructs such as these. I hope the Visual C++ team continues the efforts they started with the Visual C++ 2005 release and pioneers modern language features in the Visual C++ compiler.
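
To give a feel for what that could look like, here is a hedged sketch in standard C++ rather than C++/CLI. It assumes a compiler with auto and language-level lambdas (both arrived in C++11) and uses the standard algorithms in place of System.Query. The function name query_below_five is made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical equivalent of: from n in numbers where n < 5 orderby n select n
std::vector<int> query_below_five(const std::vector<int>& numbers)
{
    std::vector<int> result;

    // where n < 5
    std::copy_if(numbers.begin(), numbers.end(), std::back_inserter(result),
                 [](int n) { return n < 5; });

    // orderby n: the key selector is itself a lambda whose type only the compiler can name
    auto key = [](int n) { return n; };
    std::stable_sort(result.begin(), result.end(),
                     [key](int a, int b) { return key(a) < key(b); });

    assert(std::is_sorted(result.begin(), result.end()));
    return result;
}
```

The named ConstraintFunction and SelectFunction disappear, and so do the explicit delegate constructions. The query still reads as imperative code, though, so this is a step toward the C# syntax and not a replacement for it.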


© 2006 Kenny Kerr

 

8 Comments

  • Hmmm ... I'm not too sure about these two developments in C#, especially the use of the "var" keyword.

    To me, LINQ seems to be a logical (?) extension of some of the query features of ADO.NET datasets to .NET collections and I can appreciate the power of this. However, seeing "var" definitely jars - is this JavaScript? ;-)

    On the other hand, it may just need some getting used to :-)

    Ashley Visagie

  • Kenny,



    I loved the way you trivialized the fancy C# syntactic sugar to the actual classes and methods! :-)

  • But Nish, that kind of removes the whole "Language Integrated" part out of LINQ, doesn't it? ;)

  • David, yeah it does. And eventually, it's syntax that makes a language popular. VB.NET can do most of the stuff C# can, and perhaps a little more (named indexers), yet people from the two camps fight each other. C++/CLI folks (a minority among CLI developers) will have to be happy with the fact that they can at least use the underlying library used by LINQ. It's better than not being able to do anything at all.

  • var doesn't sound that bad. the type has to be declared when the variable is declared is my understanding.



    so it's not like you can assign a string to an int and then back again.



    it basically allows you to type less and given that it's still a one line declaration you can see the type of variable being created.



  • > var doesn't sound that bad.

    I didn't mean the meaning of the keyword, but the chosen word for the keyword. Of course I was mostly teasing anyway, though the historical narration wasn't teasing and is painful.

    > the type has to be declared when the variable is declared is my understanding.

    Yes, that was true from the beginning in both Pascal and C. The way it's being proposed now is that type inference will be usable in the declaring statement instead of making the programmer write repetitive syntax, and this has nothing to do with the original meaning of the keyword "var" or the original kinds of datatype specifications that Pascal and C had (and that had been added to Fortran and Basic some time after those languages had been initialized). So I was really teasing about the name of the keyword and not about its meaning.

  • I think there are two different issues here. The first is the concepts embodied by these new language features. The second is the choice of keywords. I think few people will dispute the value of the concepts being introduced.



    Some of us with history going back to various other languages and doctrines will feel some pain while typing “var” simply because it reminds us of the painful type schizophrenia immortalized by the VARIANT data type. Us “experienced” developers have a far easier time getting to grips with C++’s proposed “auto” keyword simply because it more accurately captures the meaning of AUTOmatic type deduction. For the probies out there that have largely only used C#, there is little history and therefore little resistance to the “var” keyword.



    :)

  • > Dictionary<int, String^> dictionary;

    *ahem*
    that is not C++

    it's C++/CLI and that is not C++, whatever MS want you to think
