SQL: Last_Value() returns wrong result (but First_Value() works fine)

Echo picture Echo · Mar 13, 2013 · Viewed 9.1k times · Source

I have a table in SQL Server 2012 as the snapshot shows:

enter image description here

Then I'm using Last_Value() and First Value to get AverageAmount of each EmpID for different YearMonth. The script is as follows:

SELECT A.EmpID,  
       First_Value(A.AverageAmount) OVER (PARTITION BY A.EmpID Order by A.DimYearMonthKey asc) AS  '200901AvgAmount', 
       Last_Value(A.AverageAmount) OVER (PARTITION BY A.EmpID Order by A.DimYearMonthKey asc) AS '201112AvgAmount'

FROM  Emp_Amt  AS A

However, the result for this query is:

result

In column of "201112AvgAmount", it shows different values for each EmpID while "200901AvgAmount" has correct values.

Is there anything wrong with my SQL script? I did a lot research online but still cannot find the answer....

Answer

Lukas Eder picture Lukas Eder · Nov 7, 2014

Here's a quick query to illustrate the behaviour:

select 
  v,

  -- FIRST_VALUE() and LAST_VALUE()
  first_value(v) over(order by v) f1,
  first_value(v) over(order by v rows between unbounded preceding and current row) f2,
  first_value(v) over(order by v rows between unbounded preceding and unbounded following) f3,
  last_value (v) over(order by v) l1,
  last_value (v) over(order by v rows between unbounded preceding and current row) l2,
  last_value (v) over(order by v rows between unbounded preceding and unbounded following) l3,

  -- For completeness' sake, let's also compare the above with MAX()
  max        (v) over() m1,
  max        (v) over(order by v) m2,
  max        (v) over(order by v rows between unbounded preceding and current row) m3,
  max        (v) over(order by v rows between unbounded preceding and unbounded following) m4
from (values(1),(2),(3),(4)) t(v)

The output of the above query can be seen here (SQLFiddle here):

| V | F1 | F2 | F3 | L1 | L2 | L3 | M1 | M2 | M3 | M4 |
|---|----|----|----|----|----|----|----|----|----|----|
| 1 |  1 |  1 |  1 |  1 |  1 |  4 |  4 |  1 |  1 |  4 |
| 2 |  1 |  1 |  1 |  2 |  2 |  4 |  4 |  2 |  2 |  4 |
| 3 |  1 |  1 |  1 |  3 |  3 |  4 |  4 |  3 |  3 |  4 |
| 4 |  1 |  1 |  1 |  4 |  4 |  4 |  4 |  4 |  4 |  4 |

Few people think of the implicit frames that are applied to window functions that take an ORDER BY clause. In this case, windows are defaulting to the frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. (RANGE is not exactly the same as ROWS, but that's another story). Think about it this way:

  • On the row with v = 1 the ordered window's frame spans v IN (1)
  • On the row with v = 2 the ordered window's frame spans v IN (1, 2)
  • On the row with v = 3 the ordered window's frame spans v IN (1, 2, 3)
  • On the row with v = 4 the ordered window's frame spans v IN (1, 2, 3, 4)

If you want to prevent that behaviour, you have two options:

  • Use an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause for ordered window functions
  • Use no ORDER BY clause in those window functions that allow for omitting them (as MAX(v) OVER())

More details are explained in this article about LEAD(), LAG(), FIRST_VALUE() and LAST_VALUE()