MySQL - SELECT WHERE field IN (subquery) - Extremely slow why?

quano picture quano · May 26, 2011 · Viewed 292.5k times · Source

I've got a couple of duplicates in a database that I want to inspect, so what I did to see which are duplicates, I did this:

SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1

This way, I will get all rows with relevant_field occuring more than once. This query takes milliseconds to execute.

Now, I wanted to inspect each of the duplicates, so I thought I could SELECT each row in some_table with a relevant_field in the above query, so I did like this:

SELECT *
FROM some_table 
WHERE relevant_field IN
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
)

This turns out to be extreeeemely slow for some reason (it takes minutes). What exactly is going on here to make it that slow? relevant_field is indexed.

Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1), and then making my second query like this instead:

SELECT *
FROM some_table
WHERE relevant_field IN
(
    SELECT relevant_field
    FROM temp_view
)

And that works just fine. MySQL does this in some milliseconds.

Any SQL experts here who can explain what's going on?

Answer

quano picture quano · May 27, 2011

The subquery is being run for each row because it is a correlated query. One can make a correlated query into a non-correlated query by selecting everything from the subquery, like so:

SELECT * FROM
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
) AS subquery

The final query would look like this:

SELECT *
FROM some_table
WHERE relevant_field IN
(
    SELECT * FROM
    (
        SELECT relevant_field
        FROM some_table
        GROUP BY relevant_field
        HAVING COUNT(*) > 1
    ) AS subquery
)