Get records with max value for each group of grouped SQL results

Yarin picture Yarin · Aug 24, 2012 · Viewed 257.3k times · Source

How do you get the rows that contain the max value for each grouped set?

I've seen some overly-complicated variations on this question, and none with a good answer. I've tried to put together the simplest possible example:

Given a table like that below, with person, group, and age columns, how would you get the oldest person in each group? (A tie within a group should give the first alphabetical result)

Person | Group | Age
---
Bob  | 1     | 32  
Jill | 1     | 34  
Shawn| 1     | 42  
Jake | 2     | 29  
Paul | 2     | 36  
Laura| 2     | 39  

Desired result set:

Shawn | 1     | 42    
Laura | 2     | 39  

Answer

axiac picture axiac · Jan 22, 2015

The correct solution is:

SELECT o.*
FROM `Persons` o                    # 'o' from 'oldest person in group'
  LEFT JOIN `Persons` b             # 'b' from 'bigger age'
      ON o.Group = b.Group AND o.Age < b.Age
WHERE b.Age is NULL                 # bigger age not found

How it works:

It matches each row from o with all the rows from b having the same value in column Group and a bigger value in column Age. Any row from o not having the maximum value of its group in column Age will match one or more rows from b.

The LEFT JOIN makes it match the oldest person in group (including the persons that are alone in their group) with a row full of NULLs from b ('no biggest age in the group').
Using INNER JOIN makes these rows not matching and they are ignored.

The WHERE clause keeps only the rows having NULLs in the fields extracted from b. They are the oldest persons from each group.

Further readings

This solution and many others are explained in the book SQL Antipatterns: Avoiding the Pitfalls of Database Programming