LINQ to SQL Paging Gotcha
Framework Version: 3.5
I ran
into an issue with how LINQ to SQL implements the Skip and
Take operators which leads to records being repeated or
missing when performing
custom paging.
I am using the Northwind database. My UI will display the
City and ContactName of all Customers five records at a
time. The LINQ expression projects to an anonymous type with
properties of City and ContactName. We see below the LINQ
expression, the SQL generated and the data returned for the
first and next 5 records.
| First 5 Records (Page 1) | Next 5 Records (Page 2) | |
LINQ Expression
(from c in Customers select new { c.City, c.ContactName }).Skip(0).Take(5) |
LINQ Expression (from c in Customers select new { c.City, c.ContactName }).Skip(5).Take(5) |
|
Generated SQL SELECT TOP (5) [t0].[City], [t0].[ContactName] FROM [Customers] AS [t0] |
Generated SQL
DECLARE @p0 Int = 5 DECLARE @p1 Int = 5 SELECT [t1].[City], [t1].[ContactName] FROM ( SELECT ROW_NUMBER() OVER (ORDER BY [t0].[City], [t0].[ContactName]) AS [ROW_NUMBER], [t0].[City], [t0].[ContactName] FROM [Customers] AS [t0] ) AS [t1] WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1 ORDER BY [t1].[ROW_NUMBER] |
|
Result set City ContactName
--------------- ------------------------------
Berlin Maria Anders
México D.F. Ana Trujillo
México D.F. Antonio Moreno
London Thomas Hardy
Luleå Christina Berglund
|
Result set City ContactName
--------------- ------------------------------
Barquisimeto Carlos González
Bergamo Giovanni Rovelli
Berlin Maria Anders
Bern Yang Wang
Boise Jose Pavarotti
|
From the result set, we see that Maria Anders is repeated in Page 1 and Page 2! The same behavior can also be observed with a LinqDataSource as defined below:
<asp:GridView ID="GridView1" runat="server" AllowPaging="True" AutoGenerateColumns="False"
DataSourceID="LinqDataSource1" PageSize="5">
<Columns>
<asp:BoundField DataField="City" HeaderText="City" ReadOnly="True" SortExpression="City" />
<asp:BoundField DataField="ContactName" HeaderText="ContactName" ReadOnly="True"
SortExpression="ContactName" />
</Columns>
</asp:GridView>
<asp:LinqDataSource ID="LinqDataSource1" runat="server" ContextTypeName="DataClassesDataContext"
Select="new (City, ContactName)" TableName="Customers">
</asp:LinqDataSource>
The reason for this behavior is because the SQL
generated for Page 1 does not have an order specified which
results in the
records being returned in an unknown order
SELECT TOP (5) [t0].[City], [t0].[ContactName] FROM [Customers] AS [t0]
while for subsequent pages, the SQL generated has an order by clause (required for the ROW_NUMBER() function).
(ORDER BY [t0].[City], [t0].[ContactName])
While paging through the result set, this will either result in repeating records or missing records*.
The best way to work around this issue in framework version 3.5 is to always specify an orderby in the expression before the Take and Skip operators. By doing this, you guarantee that the order used for the first page (with the TOP operator) will be the same as the order used for subsequent pages (using the ROW_NUMBER() OVER ORDER function)
(from c in Customers
orderby c.ContactName
select new {
c.City,
c.ContactName
}).Skip(0).Take(5)
SELECT TOP (5) [t0].[City], [t0].[ContactName]
FROM [Customers] AS [t0]
ORDER BY [t0].[ContactName]
LINQ To Entities on the other hand forces you to specify an order by in a scenario like this. If you don’t you get the following error:
The method 'Skip' is only supported for sorted input in LINQ to Entities.
The method 'OrderBy' must be called before the method 'Skip'.
This issue seems to have been fixed in the .NET 4.0 framework where the SELECT TOP statement is no longer generated by LINQ TO SQL for the first page.
(from c in Customers
orderby c.ContactName
select new {
c.City,
c.ContactName
}).Skip(0).Take(5)
DECLARE @p0 Int = 0
DECLARE @p1 Int = 5
SELECT [t1].[City], [t1].[ContactName]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ContactName]) AS [ROW_NUMBER], [t0].[City], [t0].[ContactName]
FROM [Customers] AS [t0]
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t1].[ROW_NUMBER]
In addition, if you don’t specify an orderby clause or project your result, LINQ To SQL will use all the properties specified in the entity class in the order by clause and select statement leading to ugly and inefficient code:
(from c in Customers
select c).Skip(10).Take(5)
SELECT [t1].[CustomerID], [t1].[CompanyName], [t1].[ContactName], [t1].[ContactTitle], [t1].[Address],
[t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[Phone], [t1].[Fax]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName],
[t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country],
[t0].[Phone], [t0].[Fax]) AS [ROW_NUMBER], [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName],
[t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [Customers] AS [t0]
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t1].[ROW_NUMBER]
Special thanks to
Joseph Albahari
for creating
LINQPad.
* In most cases we get lucky and get the
records from SQL based on the PK for the first page (even
though as mentioned before, order is not guaranteed) and
since our LINQ Expression will most likely have the PK as
the first item in the projection it results in the same
order for both the first page and subsequent pages. Hence
this obscure issue.