MySQL GROUP BY and HAVING

Elie picture Elie · Feb 25, 2011 · Viewed 48.1k times · Source

I'm grouping my results based on a column X and I want to return the rows that has highest Column Y's value in the group.

SELECT * 
FROM   mytable 
GROUP  BY col1 
HAVING col2 >= (SELECT MAX(col2) 
                FROM   mytable AS mytable2 
                WHERE  mytable2.col1 = mytable.col1 GROUP BY mytable2.col1) 

I want to optimize the query above. Is it doable without sub-queries?

I found the solution and it's simpler than you think:

SELECT * FROM (SELECT * FROM mytable ORDER BY col2 DESC) temp GROUP BY col1

Runs in 5 milliseconds on 20,000 rows.

Answer

OMG Ponies picture OMG Ponies · Feb 25, 2011

Using a derived table/inline view for a JOIN:

SELECT x.* 
  FROM mytable x
  JOIN (SELECT t.col1,
               MAX(t.col2) AS max_col2
          FROM MYTABLE t
      GROUP BY t.col1) y ON y.col1 = x.col1
                        AND y.max_col2 >= x.col2

Be aware that this will duplicate x records if there's more than one related y record. To remove duplicates, use DISTINCT:

SELECT DISTINCT x.* 
  FROM mytable x
  JOIN (SELECT t.col1,
               MAX(t.col2) AS max_col2
          FROM MYTABLE t
      GROUP BY t.col1) y ON y.col1 = x.col1
                        AND y.max_col2 >= x.col2

The following is untested, but will not return duplicates (assuming valid):

SELECT x.* 
  FROM mytable x
 WHERE EXISTS (SELECT NULL
                 FROM MYTABLE y
                WHERE y.col1 = x.col1
             GROUP BY y.col1
               HAVING MAX(y.col2) >= x.col2)