PostgreSQL - how should I use first_value()?

Brent.Longborough picture Brent.Longborough · Mar 30, 2013 · Viewed 23.4k times · Source

This answer to shows how to produce High/Low/Open/Close values from a ticker:
Retrieve aggregates for arbitrary time intervals

I am trying to implement a solution based on this (PG 9.2), but am having difficulty in getting the correct value for first_value().

So far, I have tried two queries:

SELECT  
    cstamp,
    price,
    date_trunc('hour',cstamp) AS h,
    floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
    min(price) OVER w,
    max(price) OVER w,
    first_value(price) OVER w,
    last_value(price) OVER w
FROM trades
Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour',cstamp), floor(extract(minute FROM cstamp) / 5)
    ORDER BY date_trunc('hour',cstamp) ASC, floor(extract(minute FROM cstamp) / 5) ASC
    )
ORDER BY cstamp;

Here's a piece of the result:

        cstamp         price      h                 m5  min      max      first    last
"2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000

"2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000

"2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;73.99004;77.80000;73.99004;73.99004

As you can see, 77.8 is not what I believe is the correct value for first_value(), which should be 77.0.

I though this might be due to the ambiguous ORDER BY in the WINDOW, so I changed this to

ORDER BY cstamp ASC 

but this appears to upset the PARTITION as well:

        cstamp         price      h                 m5  min      max      first    last
"2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000

"2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.00000;77.00000;77.00000
"2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.80000
"2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.00000

"2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;77.00000;77.00000;77.00000;77.00000

since the values for max and last now vary within the partition.

What am I doing wrong? Could someone help me better to understand the relation between PARTITION and ORDER within a WINDOW?


Although I have an answer, here's a trimmed-down pg_dump which will allow anyone to recreate the table. The only thing that's different is the table name.

CREATE TABLE wtest (
    cstamp timestamp without time zone,
    price numeric(10,5)
);

COPY wtest (cstamp, price) FROM stdin;
2013-03-29 09:04:54 77.80000
2013-03-29 09:04:50 76.98000
2013-03-29 09:29:51 77.00000
2013-03-29 09:29:41 77.80000
2013-03-29 09:26:18 77.00000
2013-03-29 09:19:14 77.00000
2013-03-29 09:19:10 77.00000
2013-03-29 09:33:50 76.00000
2013-03-29 09:33:46 76.10000
2013-03-29 09:33:15 77.79000
2013-03-29 09:30:08 77.80000
2013-03-29 09:30:04 77.00000
\.

Answer

Clodoaldo Neto picture Clodoaldo Neto · Mar 30, 2013

SQL Fiddle

All the functions you used act on the window frame, not on the partition. If omitted the frame end is the current row. To make the window frame to be the whole partition declare it in the frame clause (range...):

SELECT  
    cstamp,
    price,
    date_trunc('hour',cstamp) AS h,
    floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
    min(price) OVER w,
    max(price) OVER w,
    first_value(price) OVER w,
    last_value(price) OVER w
FROM trades
Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour',cstamp) , floor(extract(minute FROM cstamp) / 5)
    ORDER BY cstamp
    range between unbounded preceding and unbounded following
    )
ORDER BY cstamp;