I'm grouping my results based on a column X and I want to return the rows that has highest Column Y's value in the group.
SELECT *
FROM mytable
GROUP BY col1
HAVING col2 >= (SELECT MAX(col2)
FROM mytable AS mytable2
WHERE mytable2.col1 = mytable.col1 GROUP BY mytable2.col1)
I want to optimize the query above. Is it doable without sub-queries?
I found the solution and it's simpler than you think:
SELECT * FROM (SELECT * FROM mytable ORDER BY col2 DESC) temp GROUP BY col1
Runs in 5 milliseconds on 20,000 rows.
Using a derived table/inline view for a JOIN:
SELECT x.*
FROM mytable x
JOIN (SELECT t.col1,
MAX(t.col2) AS max_col2
FROM MYTABLE t
GROUP BY t.col1) y ON y.col1 = x.col1
AND y.max_col2 >= x.col2
Be aware that this will duplicate x
records if there's more than one related y
record. To remove duplicates, use DISTINCT
:
SELECT DISTINCT x.*
FROM mytable x
JOIN (SELECT t.col1,
MAX(t.col2) AS max_col2
FROM MYTABLE t
GROUP BY t.col1) y ON y.col1 = x.col1
AND y.max_col2 >= x.col2
The following is untested, but will not return duplicates (assuming valid):
SELECT x.*
FROM mytable x
WHERE EXISTS (SELECT NULL
FROM MYTABLE y
WHERE y.col1 = x.col1
GROUP BY y.col1
HAVING MAX(y.col2) >= x.col2)