Subqueries vs joins

palmsey picture palmsey · Sep 26, 2008 · Viewed 57.7k times · Source

I refactored a slow section of an application we inherited from another company to use an inner join instead of a subquery like:

WHERE id IN (SELECT id FROM ...)

The refactored query runs about 100x faster. (~50 seconds to ~0.3) I expected an improvement, but can anyone explain why it was so drastic? The columns used in the where clause were all indexed. Does SQL execute the query in the where clause once per row or something?

Update - Explain results:

The difference is in the second part of the "where id in ()" query -

2   DEPENDENT SUBQUERY  submission_tags ref st_tag_id   st_tag_id   4   const   2966    Using where

vs 1 indexed row with the join:

    SIMPLE  s   eq_ref  PRIMARY PRIMARY 4   newsladder_production.st.submission_id  1   Using index

Answer

Jeffrey L Whitledge picture Jeffrey L Whitledge · Sep 26, 2008

A "correlated subquery" (i.e., one in which the where condition depends on values obtained from the rows of the containing query) will execute once for each row. A non-correlated subquery (one in which the where condition is independent of the containing query) will execute once at the beginning. The SQL engine makes this distinction automatically.

But, yeah, explain-plan will give you the dirty details.