How do you get the rows that contain the max value for each grouped set?
I've seen some overly-complicated variations on this question, and none with a good answer. I've tried to put together the simplest possible example:
Given a table like that below, with person, group, and age columns, how would you get the oldest person in each group? (A tie within a group should give the first alphabetical result)
Person | Group | Age
---
Bob | 1 | 32
Jill | 1 | 34
Shawn| 1 | 42
Jake | 2 | 29
Paul | 2 | 36
Laura| 2 | 39
Desired result set:
Shawn | 1 | 42
Laura | 2 | 39
The correct solution is:
SELECT o.*
FROM `Persons` o # 'o' from 'oldest person in group'
LEFT JOIN `Persons` b # 'b' from 'bigger age'
ON o.Group = b.Group AND o.Age < b.Age
WHERE b.Age is NULL # bigger age not found
It matches each row from o
with all the rows from b
having the same value in column Group
and a bigger value in column Age
. Any row from o
not having the maximum value of its group in column Age
will match one or more rows from b
.
The LEFT JOIN
makes it match the oldest person in group (including the persons that are alone in their group) with a row full of NULL
s from b
('no biggest age in the group').
Using INNER JOIN
makes these rows not matching and they are ignored.
The WHERE
clause keeps only the rows having NULL
s in the fields extracted from b
. They are the oldest persons from each group.
This solution and many others are explained in the book SQL Antipatterns: Avoiding the Pitfalls of Database Programming