New "Orcas" Language Feature: Query Syntax
Last month I started a series of posts covering some of the new VB and C# language features that are coming as part of the Visual Studio and .NET Framework "Orcas" release. Here are pointers to the first three posts in my series:
- Automatic Properties, Object Initializer and Collection Initializers
- Extension Methods
- Lambda Expressions
Today's blog post covers another fundamental new language feature: Query Syntax.
What is Query Syntax?
Query syntax is a convenient declarative shorthand for expressing queries using the standard LINQ query operators. It offers a syntax that increases the readability and clarity of expressing queries in code, and can be easy to read and write correctly. Visual Studio provides complete intellisense and compile-time checking support for query syntax.
Under the covers the C# and VB compilers take query syntax expressions and translate them into explicit method invocation code that utilizes the new Extension Method and Lambda Expression language features in "Orcas".
Query Syntax Example:
In my previous language series posts, I demonstrated how you could declare a "Person" class like below:
We could then use the below code to instantiate a List<Person> collection with people values, and then use query syntax to perform a LINQ query over the collection and fetch only those people whose last name starts with the letter "G", sorted by the people's firstname (in ascending order):
The query syntax expression above is semantically equivalent to the below code that uses LINQ extension methods and lambda expressions explicitly:
The benefit with using the query syntax approach is that it ends up being a little easier to read and write. This is especially true as the expression gets richer and more descriptive.
Query Syntax - Understanding the From and Select Clauses:
Every syntactic query expression in C# begins with a "from" clause and ends with either a "select" or "group" clause. The "from" clause indicates what data you want to query. The "select" clause indicates what data you want returned, and what shape it should be in.
For example, let's look again at our query against the List<Person> collection:
In the above code snippet the "from p in people" clause is indicating that I want to perform a LINQ query against the "people" collection, and that I will use the parameter "p" to represent each item in the input sequence I am querying. The fact that we named the parameter "p" is irrelevant - I could just have easily named it "o", "x", "person" or any other name I wanted.
In the above code snippet the "select p" clause at the end of the statement is indicating that I want to return an IEnumerable sequence of Person objects as the result of the query. This is because the "people" collection contains objects of type "Person", and the p parameter represents Person objects within the input sequence. The datatype result of this query syntax expression is consequently of type IEnumerable<Person>.
If instead of returning Person objects, I wanted to return just the firstnames of the people in the collection, I could re-write my query like so:
Note above how I am no longer saying "select p", but am instead saying "select p.FirstName". This indicates that I don't want to return back a sequence of Person objects - but rather I want to return a sequence of strings - populated from each Person object's FirstName property (which is a string). The datatype result of this query syntax expression is consequently of type IEnumerable<string>.
Sample Query Syntax Against a Database
The beauty of LINQ is that I can use the exact same query syntax against any type of data. For example, I could use the new LINQ to SQL object relational mapper (ORM) support provided in "Orcas" to model the SQL "Northwind" database with classes like below (please watch my video here to learn how to-do this):
Once I've defined the class model above (and its mapping to/from the database), I can then write a query syntax expression to fetch all products whose unitprice is greater than $99:
In the above code snippet I am indicating that I want to perform a LINQ query against the "Products" table on the NorthwindDataContext class created by the ORM designer in Visual Studio "Orcas". The "select p" indicates that I want to return a sequence of Product objects that match my query. The datatype result of this query syntax expression is consequently of type IEnumerable<Product>.
Just like with the previous List<Person> collection query syntax example, the C# compiler will translate our declarative query syntax into explicit extension method invocations (using Lambda expressions as the arguments). In the case of the above LINQ to SQL example, these Lambda expressions will then be converted into SQL commands and evaluated within SQL server (so that only those Product rows that match the query are returned to our application). Details on the mechanism that enables this Lambda->SQL conversion can be found in my Lambda Expressions blog post under the "Lambda Expression Trees" section.
Query Syntax - Understanding the Where and OrderBy Clauses:
Between the opening "from" clause and closing "select" clause of a query syntax expression you can use the most common LINQ query operators to filter and transform the data you are querying. Two of the most common clauses you'll end up using are "where" and "orderby". These handle the filtering and ordering of results.
For example, to return a list of alphabetically descending category names from the Northwind database - filtered to only include those categories where there are more than 5 products associated with the category - we could write the below query syntax that uses LINQ to SQL to query our database:
In the above expression we are adding a "where c.Products.Count > 5" clause to indicate that we only want to return category names where there are more than 5 products in the category. This takes advantage of the LINQ to SQL ORM mapping association between products and categories in our database. In the above expression I also added a "orderby c.CategoryName descending" clause to indicate that I want to sort the results in descending order.
LINQ to SQL will then generate the below SQL when querying the database using this expression:
SELECT [t0].[CategoryName] FROM [dbo].[Categories] AS [t0]
WHERE ((
SELECT COUNT(*)
FROM [dbo].[Products] AS [t1]
WHERE [t1].[CategoryID] = [t0].[CategoryID]
)) > 5
ORDER BY [t0].[CategoryName] DESC
Notice how LINQ to SQL is smart and only returns back the single column we need (the categoryname). It also does all of the filtering and ordering in the database layer - which makes it very efficient.
Query Syntax - Transforming Data with Projections
One of the points I made earlier was that the "select" clause indicates what data you want returned, and what shape it should be in.
For example, if you have a "select p" clause like below - where p is of type Person - then it will return a sequence of Person objects:
One of the really powerful capabilities provided by LINQ and query syntax is the ability for you to define new classes that are separate from the data being queried, and to then use them to control the shape and structure of the data being returned by the query.
For example, assume that we define a new "AlternatePerson" class that has a single "FullName" property instead of the separate "FirstName" and "LastName" properties that our origional "Person" class had:
I could then use the below LINQ query syntax to query my origional List<Person> collection, and transform the results to be a sequence of AlternatePerson objects using the query syntax below:
Notice how we can use the new Object Initializer syntax I talked about in the first post in my language series to create a new AlternatePerson instance and set its properties within the "select" clause of our expression above. Note also how I am assigning the "FullName" property by concatenating the FirstName and LastName properties of our origional Person class.
Using Query Syntax Projections with a Database
This projection feature ends up being incredibly useful when working with data pulled from a remote data provider like a database, since it provides us with an elegant way to indicate which columns of data our ORM should actually fetch from a database.
For example, assume I use the LINQ to SQL ORM provider to model the "Northwind" database with classes like below:
By writing the LINQ query below, I am telling LINQ to SQL that I want a sequence of "Product" objects returned:
All of the columns necessary to populate the Product class would be returned from the database as part of the above query, and the raw SQL executed by the LINQ to SQL ORM would look like below:
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID],
[t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock],
[t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[UnitPrice] > 99
If I didn't need/want all of these columns for some scenarios, I could alternatively define a new "MyProduct" class like below that has a subset of the properties that the Product class has, as well as one additional property - "TotalRevenue" -- that the Product class doesn't have (note: for people not familiar with C#, the Decimal? syntax indicates that UnitPrice property is a nullable value):
I can then use the projection capability of query syntax to shape the data I want returned from the database using a query like below:
This is indicating that instead of returning a sequence of "Product" objects, I instead want "MyProduct" objects, and that I only need 3 properties of them filled. LINQ to SQL is then smart enough to adjust the raw SQL to execute to only return those three needed product columns from the database:
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[UnitPrice]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[UnitPrice] > 99
Just to show-off, I could also populate the 4th property of the MyProduct class - which is the "TotalRevenue" property. I want this value to be the aggregate amount of revenue that our products have sold for. This value isn't stored anywhere as a pre-computed column within the Northwind database. Instead you need to perform a join between the "Products" table and the "Order Details" table and sum up all of the Order Detail rows associated with a given product.
What is cool is that I can use the LINQ "Sum" extension method on the Product class's OrderDetails association and write a multiplication Lambda expression as part of my query syntax projection to compute this value:
LINQ to SQL is then smart enough to use the below SQL to perform the calculation in the SQL database:
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[UnitPrice], (
SELECT SUM([t2].[value])
FROM (
SELECT [t1].[UnitPrice] * (CONVERT(Decimal(29,4),[t1].[Quantity])) AS [value], [t1].[ProductID]
FROM [dbo].[Order Details] AS [t1]
) AS [t2]
WHERE [t2].[ProductID] = [t0].[ProductID]
) AS [value]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[UnitPrice] > 99
Query Syntax - Understanding Deferred Execution, and using ToList() and ToArray()
By default the result of a query syntax expression is a variable of type IEnumerable<T>. In my samples above you'll notice that all of the query syntax assignments are to IEnumerable<Product>, IEnumerable<string>, IEnumerable<Person>, IEnumerable<AlternatePerson>, and IEnumerable<MyProduct> variables.
One of the nice characteristics of IEnumerable<T> interfaces is that objects that implement them can defer the actual execution of the queries until a developer first attempts to iterate over the values (this is accomplished using the "yield" construct that was first introduced with C# 2.0 in VS 2005). LINQ and query syntax expressions take advantage of this feature, and defer the actual execution of queries until the first time you loop over the results. If you never iterate over the IEnumerable<T> result, then the query is never executed.
For example, consider the below LINQ to SQL example:
The database will be hit and the values to populate our Category objects will be retrieved not when the query syntax expression is declared - but rather when we first try and loop over the results (indicated above with the red arrow).
This deferred execution behavior ends up being really useful because it enables some powerful composition scenarios where we can "chain" multiple LINQ queries and expressions together. For example, we could feed the result of one expression into another - and by deferring the execution allow an ORM like LINQ to SQL to optimize the raw SQL based on the entire expression tree. I'll show examples of how to use this in a later blog post.
How to evaluate the query syntax expression immediately
If you don't want to defer the execution of queries, and instead want to execute them immediately, you can use the built-in ToList() and ToArray() operators to return either a List<T> or an array containing the results.
For example, to return a generic-based List<T> collection:
and to return an array:
In both cases above the database will be hit and the Category objects populated immediately.
Summary
Query syntax provides a very convenient declarative shorthand for expressing queries using the standard LINQ query operators. It offers a syntax that is very readable, and which works against any type of data (any in-memory collection, array, XML content, or against remote data providers like databases, web-services, etc). Once you become familiar with the syntax, you can immediately apply the knowledge everywhere.
In the not too distant future I'll finish the last segment of this language series - which will cover the new "anonymous types" feature. I'll then move on to cover some super practical examples of using all of these language features in the real world (especially using LINQ against databases and XML files).
Hope this helps,
Scott