VB.NET: Beware of the 'Aggregate' keyword (updated)

UPDATE I tested this initially with EmployeeID and noticed the strange behavior. Writing this blogpost I thought the max of the employeeID was a little artificial, so I changed that in OrderDate. But... what happened (see my reply to this blogpost below in the comments) ? When o.OrderDate is used, the VB.NET compiler produces a proper expression tree! But when EmployeeID is used, it doesn't. Both are nullable types. When using CustomerID it also works properly. It's beyond me why it fails to produce a proper expression tree when EmployeeID is used. Nevertheless, the advice I gave below is still valid: do yourself a favor and call the aggregate extension methods of All, Any, Max, Min, Sum etc. when you want to obtain these values from a set in a scalar query.

This afternoon I was writing some VB.NET specific Linq queries for the documentation for our upcoming LLBLGen Pro v2.6 (currently in beta) version, and typically I was simply porting over C# queries to VB.NET code and see if they work and if they do, use them, if not, start some cursing, quests for help in the MSDN docs and repeat the process. During my travels through the MSDN Linq documentation and what Microsoft had added to VB9 (.NET 3.5), I ended up in the VB.NET Linq specific language clauses documentation. One of them is about the Aggregate clause.

The Aggregate clause was a strange fellow to me. It didn't ring any bell. What does it do? Well, let's look at a typical example: I want to fetch the maximum EmployeeID for the Northwind Orders. In SQL one would do:

SELECT MAX(EmployeeID) FROM Orders

Easy enough. In Linq for VB.NET, using this Aggregate keyword, you can write:

Dim max = Aggregate o in metaData.Order Into m=Max(o.EmployeeID)

Looks ok, right? Running it will give the maximum EmployeeID, all is well. Or not? When you examine the trace output of the query generated, or check the expression tree passed into the provider, you'll see that all Order entities are fetched, and the aggregate is simply ran in memory!

This is because the Expression tree has no notion of 'Aggregate' nor 'Max'. The only thing that's in the expression tree is metaData.Order.

So I checked it with Linq to Sql. Same result. This is a very dangerous keyword if applied without proper caution: you'll fetch all data into memory for the aggregate to be applied. This can lead to serious performance issues at runtime, for example if the Order table contains 10 million rows, the retrieval of the Max date using the Aggregate keyword isn't going to perform...

What to use instead?
Use the C# way: call the extension method:

Dim max = metaData.Order.Max(Function(o) o.EmployeeID)

This will give you a query which executes the Max aggregate function on the single field EmployeeID in the database server, so no slow fetching of all the data.

Published Wednesday, May 21, 2008 6:36 PM by FransBouma

Comments

# re: VB.NET: Beware of the 'Aggregate' keyword

Wednesday, May 21, 2008 2:05 PM by Ayende Rahien

I would call this Broken

# re: VB.NET: Beware of the 'Aggregate' keyword

Wednesday, May 21, 2008 2:41 PM by rajbk

Frans,

I tried the following code in VB.net

Sub Main()

   Dim context As NorthwindDataContext = New NorthwindDataContext()

   Dim maxData = Aggregate o in context.Orders Into m=Max(o.OrderDate)

   Console.WriteLine(maxData)

   Console.ReadLine()

End Sub

The query sent to SQL is :

SELECT MAX([t0].[OrderDate]) AS [value]

FROM [dbo].[Orders] AS [t0]

-Raj

# re: VB.NET: Beware of the 'Aggregate' keyword

Wednesday, May 21, 2008 2:53 PM by FransBouma

Raj: very strange! I tried it in linq to sql and I got a simple fetch of all the orders and no max code. Also with our own provider, I got a simple expression tree with a single expression, no aggregate, no max call, nothing.

Do you use SP1 beta on vs.net 2008 perhaps?

I'll do some more testing. Stay tuned!

# re: VB.NET: Beware of the 'Aggregate' keyword

Wednesday, May 21, 2008 3:00 PM by FransBouma

It's even weirder:

Dim max = Aggregate o In nw.Orders Into m = Max(o.EmployeeID)

-> FULL fetch of order table

Dim max = Aggregate o In nw.Orders Into m = Max(o.OrderDate)

-> SELECT MAX(..) FROM Order

Very strange... both are nullable types. Using CustomerID instead of EmployeeID -> normal MAX query.

Expression tree with o.EmployeeId:

value(SD.LLBLGen.Pro.LinqSupportClasses.DataSource2`1[ NW26.Adapter.EntityClasses.OrderEntity])

Expression tree with o.OrderDate:

value(SD.LLBLGen.Pro.LinqSupportClasses.DataSource2`1[ NW26.Adapter.EntityClasses.OrderEntity]).Max(o => o.OrderDate)

i.o.w.: the VB.NET compiler apparently doesn't work in all cases. Ok, it's now a matter of determining WHEN this occurs...

# re: VB.NET: Beware of the 'Aggregate' keyword (updated)

Sunday, May 25, 2008 6:31 PM by Peter

So is version 2.6 the one that will have LINQ support? Are you shooting for a specific month for the release? (roughly)

Thanks.

# re: VB.NET: Beware of the 'Aggregate' keyword (updated)

Monday, May 26, 2008 2:40 AM by FransBouma

@Peter: v2.6 is the version with linq support yes. We hope to release the RTM version within 2 weeks.  (probably sooner)

# re: VB.NET: Beware of the 'Aggregate' keyword (updated)

Friday, May 30, 2008 2:56 AM by worldspawn[]

Two weeks! Awesome.