September 2009 - Posts - Raj Kaimal

September 2009 - Posts

LINQ to SQL Paging Gotcha

Framework Version: 3.5

I ran into an issue with how LINQ to SQL implements the Skip and Take operators which leads to records being repeated or missing when performing custom paging.

I am using the Northwind database. My UI will display the City and ContactName of all Customers five records at a time. The LINQ expression projects to an anonymous type with properties of City and ContactName. We see below the LINQ expression, the SQL generated and the data returned for the first and next 5 records.

First 5 Records (Page 1)   Next 5 Records (Page 2)
LINQ Expression
(from c in Customers
        select new {
            c.City,
            c.ContactName
        }).Skip(0).Take(5)
        LINQ Expression
(from c in Customers
        select new {
            c.City,
            c.ContactName
        }).Skip(5).Take(5)
Generated SQL
SELECT TOP (5) [t0].[City], [t0].[ContactName]
FROM [Customers] AS [t0]
  Generated SQL
DECLARE @p0 Int = 5
DECLARE @p1 Int = 5

SELECT [t1].[City], [t1].[ContactName]
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY [t0].[City], [t0].[ContactName]) AS [ROW_NUMBER], 
    [t0].[City], [t0].[ContactName]
    FROM [Customers] AS [t0]
    ) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t1].[ROW_NUMBER]
Result set
City            ContactName
--------------- ------------------------------
Berlin          Maria Anders
México D.F.     Ana Trujillo
México D.F.     Antonio Moreno
London          Thomas Hardy
Luleå           Christina Berglund
  Result set
City            ContactName
--------------- ------------------------------
Barquisimeto    Carlos González
Bergamo         Giovanni Rovelli
Berlin          Maria Anders
Bern            Yang Wang
Boise           Jose Pavarotti

From the result set, we see that Maria Anders is repeated in Page 1 and Page 2!  The same behavior can also be observed with a LinqDataSource as defined below:

<asp:GridView ID="GridView1" runat="server" AllowPaging="True" AutoGenerateColumns="False"
    DataSourceID="LinqDataSource1" PageSize="5">
    <Columns>
        <asp:BoundField DataField="City" HeaderText="City" ReadOnly="True" SortExpression="City" />
        <asp:BoundField DataField="ContactName" HeaderText="ContactName" ReadOnly="True"
            SortExpression="ContactName" />
    </Columns>
</asp:GridView>
<asp:LinqDataSource ID="LinqDataSource1" runat="server" ContextTypeName="DataClassesDataContext"
     Select="new (City, ContactName)" TableName="Customers">
</asp:LinqDataSource>


The reason for this behavior is because the SQL generated for Page 1 does not have an order specified which results in the records being returned in an unknown order

SELECT TOP (5) [t0].[City], [t0].[ContactName]
FROM [Customers] AS [t0]

while for subsequent pages, the SQL generated has an order by clause (required for the ROW_NUMBER() function).

(ORDER BY [t0].[City], [t0].[ContactName]) 

While paging through the result set, this will either result in repeating records or missing records*. 

The best way to work around this issue in framework version 3.5 is to always specify an orderby in the expression before the Take and Skip operators. By doing this, you guarantee that the order used for the first page (with the TOP operator) will be the same as the order used for subsequent pages (using the ROW_NUMBER() OVER ORDER function)

(from c in Customers
 orderby c.ContactName
        select new {
            c.City,
            c.ContactName
        }).Skip(0).Take(5)
SELECT TOP (5) [t0].[City], [t0].[ContactName]
FROM [Customers] AS [t0]
ORDER BY [t0].[ContactName]

LINQ To Entities on the other hand forces you to specify an order by in a scenario like this. If you don’t you get the following error:

The method 'Skip' is only supported for sorted input in LINQ to Entities.
The method 'OrderBy' must be called before the method 'Skip'.

This issue seems to have been fixed in the .NET 4.0 framework where the SELECT TOP statement is no longer generated by LINQ TO SQL for the first page.

(from c in Customers
 orderby c.ContactName
        select new {
            c.City,
            c.ContactName
        }).Skip(0).Take(5)
DECLARE @p0 Int = 0
DECLARE @p1 Int = 5
 
SELECT [t1].[City], [t1].[ContactName]
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ContactName]) AS [ROW_NUMBER], [t0].[City], [t0].[ContactName]
    FROM [Customers] AS [t0]
    ) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t1].[ROW_NUMBER]

In addition, if you don’t specify an orderby clause or project your result, LINQ To SQL will use all the properties specified in the entity class in the order by clause and select statement leading to ugly and inefficient code:

(from c in Customers
        select c).Skip(10).Take(5)
SELECT [t1].[CustomerID], [t1].[CompanyName], [t1].[ContactName], [t1].[ContactTitle], [t1].[Address], 
[t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[Phone], [t1].[Fax]
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], 
[t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], 
[t0].[Phone], [t0].[Fax]) AS [ROW_NUMBER], [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], 
[t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
    FROM [Customers] AS [t0]
    ) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t1].[ROW_NUMBER]



Special thanks to Joseph Albahari for creating LINQPad. 

*  In most cases we get lucky and get the records from SQL based on the PK for the first page (even though as mentioned before, order is not guaranteed) and since our LINQ Expression will most likely have the PK as the first item in the projection it results in the same order for both the first page and subsequent pages. Hence this obscure issue.

Posted by rajbk | 5 comment(s)
Filed under: , ,

Windows 7 Backup

The backup tool that comes with Vista does not allow users to select individual folders for backup. Instead the user is presented with a default list of locations for backup. Because of this limitation, I had to resort to other third party tools to do the job.

vista_backup

With Windows 7, you now have the option of selecting individual folders for backup. Nice!

win7_backup

Posted by rajbk | 2 comment(s)
More Posts