Oracle MIN as analytic function - odd behavior with ORDER BY?

Cade Roux picture Cade Roux · Sep 12, 2011 · Viewed 14k times · Source

This particular case was distilled from an example where the programmer assumed that for two shipments into a tank car, line #1 would be loaded first. I corrected this to allow for the loading to be performed in any order - however, I discovered that MIN() OVER (PARTITION BY) allows an ORDER BY in Oracle (this is not allowed in SQL Server), and additionally, it alters the behavior of the function, causing the ORDER BY to apparently be added to the PARTITION BY.

WITH data AS (
SELECT 1 AS SHIPMENT_ID, 1 AS LINE_NUMBER, 2 AS TARE, 3 AS GROSS FROM DUAL
UNION ALL
SELECT 1 AS SHIPMENT_ID, 2 AS LINE_NUMBER, 1 AS TARE, 2 AS GROSS FROM DUAL
)
SELECT MIN(tare) OVER (PARTITION BY shipment_id) first_tare
,MAX(gross) OVER (PARTITION BY shipment_id) last_gross
,FIRST_VALUE(tare) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER) first_tare_incorrect
,FIRST_VALUE(gross) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER DESC) last_gross_incorrect
,MIN(tare) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER) first_tare_incorrect_still
,MAX(gross) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER DESC) last_gross_incorrect_still
,MIN(tare) OVER (PARTITION BY shipment_id, LINE_NUMBER) first_tare_incorrect_still2
,MAX(gross) OVER (PARTITION BY shipment_id, LINE_NUMBER) last_gross_incorrect_still2
 FROM data

A SQL Server example (with non-applicable code commented out):

WITH data AS (
SELECT 1 AS SHIPMENT_ID, 1 AS LINE_NUMBER, 2 AS TARE, 3 AS GROSS -- FROM DUAL
UNION ALL
SELECT 1 AS SHIPMENT_ID, 2 AS LINE_NUMBER, 1 AS TARE, 2 AS GROSS -- FROM DUAL
)
SELECT MIN(tare) OVER (PARTITION BY shipment_id) first_tare
,MAX(gross) OVER (PARTITION BY shipment_id) last_gross
-- ,FIRST_VALUE(tare) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER) first_tare_incorrect
-- ,FIRST_VALUE(gross) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER DESC) last_gross_incorrect
-- ,MIN(tare) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER) first_tare_incorrect_still
-- ,MAX(gross) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER DESC) last_gross_incorrect_still
,MIN(tare) OVER (PARTITION BY shipment_id, LINE_NUMBER) first_tare_incorrect_still2
,MAX(gross) OVER (PARTITION BY shipment_id, LINE_NUMBER) last_gross_incorrect_still2
 FROM data

So question: What is Oracle doing and why and is it right?

Answer

Justin Cave picture Justin Cave · Sep 12, 2011

If you add an ORDER BY to the MIN analytic function, you turn it into a "min so far" function rather than an overall minimum. For the final row for whatever you're partitioning by, the results will be the same. But the prior rows may have a different "min so far" than the overall minimum.

Using the EMP table as an example, you can see that the minimum salary so far for the department eventually converges on the overall minimum for the department. And you can see that the "min so far" value for any given department decreases as lower values are encountered.

SQL> ed
Wrote file afiedt.buf

  1  select ename,
  2         deptno,
  3         sal,
  4         min(sal) over (partition by deptno order by ename) min_so_far,
  5         min(sal) over (partition by deptno) min_overall
  6    from emp
  7*  order by deptno, ename
SQL> /

ENAME          DEPTNO        SAL MIN_SO_FAR MIN_OVERALL
---------- ---------- ---------- ---------- -----------
CLARK              10       2450       2450        1300
KING               10       5000       2450        1300
MILLER             10       1300       1300        1300
ADAMS              20       1110       1110         800
FORD               20       3000       1110         800
JONES              20       2975       1110         800
SCOTT              20       3000       1110         800
smith              20        800        800         800
ALLEN              30       1600       1600         950
BLAKE              30       2850       1600         950
MARTIN             30       1250       1250         950
SM0                30        950        950         950
TURNER             30       1500        950         950
WARD               30       1250        950         950
BAR
PAV

16 rows selected.

Of course, it would make more sense to use this form of the analytic function when you're trying to do something like compute a personal best that you can use as a comparison in future periods. If you're tracking an individual's decreasing golf scores, mile times, or weight, displaying personal bests can be a form of motivation.

SQL> ed
Wrote file afiedt.buf

  1  with golf_scores as
  2  (  select 1 golfer_id, 80 score, sysdate dt from dual union all
  3     select 1, 82, sysdate+1 dt from dual union all
  4     select 1, 72, sysdate+2 dt from dual union all
  5     select 1, 75, sysdate+3 dt from dual union all
  6     select 1, 71, sysdate+4 dt from dual union all
  7     select 2, 74, sysdate from dual )
  8  select golfer_id,
  9         score,
 10         dt,
 11         (case when score=personal_best
 12               then 'New personal best'
 13               else null
 14           end) msg
 15    from (
 16  select golfer_id,
 17         score,
 18         dt,
 19         min(score) over (partition by golfer_id
 20                              order by dt) personal_best
 21    from golf_scores
 22*        )
SQL> /

 GOLFER_ID      SCORE DT        MSG
---------- ---------- --------- -----------------
         1         80 12-SEP-11 New personal best
         1         82 13-SEP-11
         1         72 14-SEP-11 New personal best
         1         75 15-SEP-11
         1         71 16-SEP-11 New personal best
         2         74 12-SEP-11 New personal best

6 rows selected.