Calculating SQL Server ROW_NUMBER() OVER() for a derived table

Lukas Eder picture Lukas Eder · Sep 23, 2013 · Viewed 38.7k times · Source

In some other databases (e.g. DB2, or Oracle with ROWNUM), I can omit the ORDER BY clause in a ranking function's OVER() clause. For instance:

ROW_NUMBER() OVER()

This is particularly useful when used with ordered derived tables, such as:

SELECT t.*, ROW_NUMBER() OVER()
FROM (
    SELECT ...
    ORDER BY
) t

How can this be emulated in SQL Server? I've found people using this trick, but that's wrong, as it will behave non-deterministically with respect to the order from the derived table:

-- This order here ---------------------vvvvvvvv
SELECT t.*, ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM (
    SELECT TOP 100 PERCENT ...
    -- vvvvv ----redefines this order here
    ORDER BY
) t

A concrete example (as can be seen on SQLFiddle):

SELECT v, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM (
  SELECT TOP 100 PERCENT 1 UNION ALL
  SELECT TOP 100 PERCENT 2 UNION ALL
  SELECT TOP 100 PERCENT 3 UNION ALL
  SELECT TOP 100 PERCENT 4
  -- This descending order is not maintained in the outer query
  ORDER BY 1 DESC
) t(v)

Also, I cannot reuse any expression from the derived table to reproduce the ORDER BY clause in my case, as the derived table might not be available as it may be provided by some external logic.

So how can I do it? Can I do it at all?

Answer

ErikE picture ErikE · Sep 23, 2013

The Row_Number() OVER (ORDER BY (SELECT 1)) trick should NOT be seen as a way to avoid changing the order of underlying data. It is only a means to avoid causing the server to perform an additional and unneeded sort (it may still perform the sort but it's going to cost the minimum amount possible when compared to sorting by a column).

All queries in SQL server ABSOLUTELY MUST have an ORDER BY clause in the outermost query for the results to be reliably ordered in a guaranteed way.

The concept of "retaining original order" does not exist in relational databases. Tables and queries must always be considered unordered until and unless an ORDER BY clause is specified in the outermost query.

You could try the same unordered query 100,000 times and always receive it with the same ordering, and thus come to believe you can rely on said ordering. But that would be a mistake, because one day, something will change and it will not have the order you expect. One example is when a database is upgraded to a new version of SQL Server--this has caused many a query to change its ordering. But it doesn't have to be that big a change. Something as little as adding or removing an index can cause differences. And more: Installing a service pack. Partitioning a table. Creating an indexed view that includes the table in question. Reaching some tipping point where a scan is chosen instead of a seek. And so on.

Do not rely on results to be ordered unless you have said "Server, ORDER BY".