Paging in a sql result set

This post is based on a post by fellow CB blogger Dave Hayden.  It deals with paging through a set of sql server data. Dave has a very clear and elegant solution but it did take some puzzling to find out why things have to be done that way. Dave’s solution steered towards a sproc. Trying to stay away from that I want to present a dynamic sql version of the same idea.

Since the 2005 version of MS SQL server has the ROW_NUMBER function. This function returns the sequence number of the result set rows. Like this select statement.

SELECT Id, Name, ROW_NUMBER() OVER (ORDER BY Name) AS Row FROM Fleet

The OVER part determines the sort order of the result set.

This row number is an ideal base for retrieving data in pages. With a page size of 20 the first page will contain row numbers 1 to 21, the second 21 to 40, etc. Alas you cannot use the function result directly in the where clause. To retrieve the first page this sql will not work.

SELECT Id, Name, ROW_NUMBER() OVER (ORDER BY Name) AS Row FROM Fleet WHERE ROW < 21

To get around this you need an outer select to wrap up. Like this:

SELECT Id, Name FROM

(SELECT Id, Name, ROW_NUMBER() OVER (ORDER BY Name) AS Row FROM Fleet) AS tmp

WHERE ROW < 21

There is a pattern in this

SELECT  MyCols FROM

(SELECT MyCols, ROW_NUMBER() OVER (ORDER BY MyOrder) AS Row FROM MyView) AS tmp

WHERE ROW < MaxRow


The values for the example query I started with

MyCols=”Id, Name”

MyView=”Fleet”

MyOrder=”Name”

 

In a more complex scenario the data will be a view on several tables resulting in one result set. Here the list of columns for the result set will not match the list of columns in the view.

Take this query.

SELECT  Name,Id, City FROM

(SELECT Person.Name,Person.Id, City.City , ROW_NUMBER() OVER (ORDER BY Name) AS Row FROM Persons INNER JOIN City On Person.CityId=City.ID) AS tmp 

WHERE ROW < 21

The pattern now evolves to

SELECT  MyCols FROM

(SELECT MyViewCols, ROW_NUMBER() OVER (ORDER BY MyOrder) AS Row FROM MyView) AS tmp 

WHERE ROW < MaxRow

The values for the last query

MyCols=”Name, Id, City”

MyViewCols=”Person.Name,Person.Id, City.City

MyView=”Persons INNER JOIN City On Person.CityId=City.ID

MyOrder=”Name”

 

To complete the pattern I have to fully parameterize the row numbers. Here I also limit the inner query with a SELECT TOP, no need to retrieve more rows than maxrow.

SELECT MyCols FROM

(SELECT TOP MaxRow ROW_NUMBER OVER (ORDER BY  MyOrder ) AS Row, MyViewCols FROM MyView)  AS Tmp

WHERE Row > @MinRow AND Row <= @MaxRow

It is cumbersome to assemble this query over and over again.  Having to do all sql myself (more on that in a later post) I have built a small C# function which will assemble the paging query from the fragments passed in. It uses the pattern described,  builds a sqlCommand and is used like this.

const string columns = "Code, ExpireDate, Description";

const string view = "Permit INNER JOIN ProductCell ON Permit.Code = ProductCell.Code WHERE Permit.LicenceId=@LicenceId";

const string orderBy = "Permit.Code";

const string viewColumns = "Permit.Code, ExpireDate, Description";

var cmd = BuildPagedSelectCommand(pageNo, columns, view, orderBy, viewColumns, pageSize);

cmd.Parameters.AddWithValue("@LicenceId", licenceId);

 

The function itself is pretty straightforward

protected SqlCommand BuildPagedSelectCommand(int pageNumber, string columnList, string selectFrom, string orderBy, string columnsAs, int pageSize)

{

    var minRow = (pageNumber – 1) * pageSize;

    var maxRow = minRow + pageSize;

    var cmd = new SqlCommand(string.Format("SELECT {0} FROM (SELECT TOP {1} ROW_NUMBER() OVER (ORDER BY {2}) AS Row,{3} FROM {4}) AS tmp WHERE Row > @MinRow AND Row <= @MaxRow",

                                           columnList, maxRow, orderBy, columnsAs, selectFrom), connection);

    cmd.Parameters.AddWithValue("@MinRow", minRow);

    cmd.Parameters.AddWithValue("@MaxRow", maxRow);

    return cmd;

}

All it does is calculate the rownumbers, assemble the sql statement and create the sql command to return.

For the simpler cases where the result columns match the underlying view columns I have created an overload.

protected SqlCommand BuildPagedSelectCommand(int pageNumber, string columnList, string selectFrom, string orderBy, int pageSize)

{

    return BuildPagedSelectCommand(pageNumber, columnList, selectFrom, orderBy, columnList, pageSize);

}

It uses the original overload using the columnlist twice.

Paging data is not that complicated but the syntax of the query required is somewhat confusing. I hope this example helps in clarifying it.

This entry was posted in Data. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • http://damieng.com Damien Guard

    You also want to ensure that you order by something unique not just something like name otherwise it’s quite possible that items on page 1 will appear again on page 2 as the query plan takes a different route that results in items expected on page 2 being considered page 1.

    Remember item sequence is not guaranteed beyond what is specified in order by and this is especially important in paging.

    [)amien

  • http://petersgekko.codebetter.com pvanooijen

    Good point. But as the query is parameterized I don’t expect a different, or even new, query plan.

    It will get rough when the underlying data changes between pages. What if person 1 inserts/deletes data which results in a changed number of rows on page 1 and person 2 pages from page 2 to page 3… Etc..

  • http://weblogs.asp.net/bdill Ben

    Nice post, thank you.

    @pvanooijen
    The execution plan can still get you. If the code is running inside a clustered environment, you could see different results between servers…

    On top of that/instead of, you’re still facing recompiling execution plans. Any of which could introduce inconsistent results.

    What I haven’t seen is a performance analysis of the row_number function. The example is very basic, its a 1:1 paging of a table. Typically you’ll see filtering in a paged result set. Is SQL able to use hints when executing a run_number function, or must the entire result set be examined to return the firs page of data? The latter could come at a heafty price in terms of performance.

  • fschwiet

    If you’re using LINQ to SQL with the Skip(n) and Take(n) operators, I’ve found it will generate queries using row number as you suggested.

    Pretty sweet! I always wondered how this was done.

  • Andrew

    Can I get a limit and an offset :) Heck no, but at least this works… 10x as complicated and hard to read as it should be. There has to be a better way to do this.

  • Eric T.

    Perhaps I’m being overly paranoid, but is there not a danger of SQL injection in the method you created (BuildPagedSelectCommand)?

    I understand that you don’t have to expose this method publicly, but now you have to remember that any time you call it you have to be aware of the values that you are passing in.

    Maybe it’s not an issue today because you’re just using the method in a very specific instance. But in a few months you might not remember. It gets even worse when you’re sharing this with a team – not everyone is going to take the time to check what’s going on inside the method and who knows what might be passed in.

    I realize the point of the article was about paging and the ROW_NUMBER function so maybe my remarks are off topic, but then why bother abstracting the SQL statement into a method?

    You did mention you have more to say about building the SQL in a later post, so if you’re planning something that will better handle SQL injection problems please ignore this.

  • http://petersgekko.codebetter.com pvanooijen

    @Ben. Right, that makes perfectly sense. I havn’t timed performance but we are using thise code for paging through pretty big result sets. Actually the code was needed because the pages were faaaar to big.. Anyway, paging is smooth and sweet

    @Andrew. I’m not quite sure what you are pointing at. There is a limit in the TOP x. But you still have to read in all pages which precede the desired one. So there is no offset.

    @Eric. This is clearly not a method to expose publicly, just a helper to be use inside a repository. Notice the protected visibillity.
    Having to do all sql by hand is a pita. When the queries get really hairy, like these paging things, the method makes the code easier to read. And it saves me from calculating and setting page and rownumbers over and over again.
    I will get back on having to do the sql by hand (instead of using an O/R mapper) in the promissed later post.

  • karl

    The SQL Server’s team inability to add a simple syntax for this is disheartening.

  • itzik shmulewitz

    the problem using this method is that if you use order by a field which is not indexed or even calculated in query time the query will still sort the whole query and will not tryjust to find the top x rows. this problem is still solvable by using CLR aggregate function to create a binary field containing only the relevant ids but in sql 2005 there is a memory limit of 8K to a CLR aggregate function ( changed in sql 2008) so in string and in the 2000 row this method will not work (in case you are sorting by an int type) but it is still usefull.

  • Test

    test