I have a table with id, year and count.
I want to get the MAX(count) for each id and keep the year when it happens, so I make this query:
SELECT id, year, MAX(count)
FROM table
GROUP BY id;
Unfortunately, it gives me an error:
ERROR: column "table.year" must appear in the GROUP BY clause or be used in an aggregate function
So I try:
SELECT id, year, MAX(count)
FROM table
GROUP BY id, year;
But then it doesn't do MAX(count); it just shows the table as it is. I suppose that's because, when grouping by year and id, it gets the max for the id of that specific year.
So, how can I write that query? I want to get each id's MAX(count) and the year when that happens.
The shortest (and possibly fastest) query would be with DISTINCT ON, a PostgreSQL extension of the SQL standard DISTINCT clause:
SELECT DISTINCT ON (1)
id, count, year
FROM tbl
ORDER BY 1, 2 DESC, 3;
The numbers refer to ordinal positions in the SELECT list. You can spell out column names for clarity:
SELECT DISTINCT ON (id)
id, count, year
FROM tbl
ORDER BY id, count DESC, year;
The result is ordered by id etc., which may or may not be welcome. It's better than "undefined" in any case.
It also breaks ties (when multiple years share the same maximum count) in a well-defined way: pick the earliest year. If you don't care, drop year from the ORDER BY. Or pick the latest year with year DESC.
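To see the tie-breaking concretely, here is a minimal sketch against made-up sample data. The temporary table and its rows are assumptions for illustration, not taken from the question. In this data, id 1 reaches its maximum count of 10 in both 2011 and 2012:

```sql
-- Hypothetical sample data, for illustration only.
CREATE TEMP TABLE tbl (id int, year int, count int);

INSERT INTO tbl (id, year, count) VALUES
  (1, 2010,  5),
  (1, 2011, 10),
  (1, 2012, 10),   -- ties with 2011 for the maximum of id 1
  (2, 2010,  7),
  (2, 2011,  3);

-- Earliest year wins the tie:
-- returns (1, 10, 2011) and (2, 7, 2010).
SELECT DISTINCT ON (id)
       id, count, year
FROM   tbl
ORDER  BY id, count DESC, year;

-- Latest year wins the tie:
-- returns (1, 10, 2012) and (2, 7, 2010).
SELECT DISTINCT ON (id)
       id, count, year
FROM   tbl
ORDER  BY id, count DESC, year DESC;
```

One thing to watch if count can be NULL: in PostgreSQL, DESC sorts NULL values first by default, so a NULL count would be picked as the "maximum". In that case, count DESC NULLS LAST is probably what you want.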
More explanation, links, a benchmark and possibly faster solutions in this closely related answer:
Aside: In a real-life query, you wouldn't use some of these column names. id is a non-descriptive anti-pattern for a column name; count is a reserved word in standard SQL and an aggregate function in Postgres.
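As a rough sketch of what that could look like, here is the same query with more descriptive names. The table and column names (item_count, item_id, yr, ct) are purely hypothetical and only meant to show the idea:

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TEMP TABLE item_count (item_id int, yr int, ct int);

-- Same technique as above: one row per item_id, highest ct first,
-- earliest yr as the tie-breaker.
SELECT DISTINCT ON (item_id)
       item_id, ct, yr
FROM   item_count
ORDER  BY item_id, ct DESC, yr;
```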