Procedurally transform subquery into join

Justin R. picture Justin R. · Nov 20, 2009 · Viewed 9.3k times · Source

Is there a generalized procedure or algorithm for transforming a SQL subquery into a join, or vice versa? That is, is there a set of typographic operations that can be applied to a syntactically correct SQL query statement containing a subquery that results in a functionally equivalent statement without a subquery? If so, what are they (i.e., what's the algorithm), and in what cases do they not apply?

Answer

OMG Ponies picture OMG Ponies · Nov 30, 2009

Converting a subquery into a JOIN can be pretty straightforward:

IN clause

 FROM TABLE_X x
WHERE x.col IN (SELECT y.col FROM TABLE_Y y)

...can be converted to:

FROM TABLE_X x
JOIN TABLE_Y y ON y.col = x.col

Your JOIN criteria is where you have direct comparison.

EXISTS clause

But there are complications when you look at the EXISTS clause. EXISTS are typically correllated, where the subquery is filtered by criteria from the table(s) outside the subquery. But the EXISTS is only for returning a boolean based on the criteria.

 FROM TABLE_X x
WHERE EXISTS (SELECT NULL
                FROM TABLE_Y y
               WHERE y.col = x.col)

...converted:

FROM TABLE_X x
JOIN TABLE_Y y ON y.col = x.col

Because of the boolean, there's a risk of more rows turning up in the resultset.

SELECTs in the SELECT clause

These should always be changed, with prejudice:

SELECT x.*,
       (SELECT MAX(y.example_col)
          FROM TABLE_Y y
         WHERE y.col = x.col)
  FROM TABLE_X x

You're probably noticing a patter now, but I made this a little different for an inline view example:

SELECT x.*,
       z.mc
  FROM TABLE_X x
  JOIN (SELECT y.col, --inline view within the brackets
               MAX(y.example_col) 'mc'
          FROM TABLE_Y y
      GROUP BY y.col) z ON z.col = x.col

The key is making sure the inline view resultset includes the column(s) needed to join to, along with the columns.

LEFT JOINs

You might've noticed I didn't have any LEFT JOIN examples - this would only be necessary if columns from the subquery use NULL testing (COALESCE on almost any db these days, Oracle's NVL or NVL2, MySQLs IFNULL, SQL Server's ISNULL, etc...):

SELECT x.*,
       COALESCE((SELECT MAX(y.example_col)
          FROM TABLE_Y y
         WHERE y.col = x.col), 0)
  FROM TABLE_X x

Converted:

   SELECT x.*,
          COALESCE(z.mc, 0)
     FROM TABLE_X x
LEFT JOIN (SELECT y.col,
                  MAX(y.example_col) 'mc'
             FROM TABLE_Y y
         GROUP BY y.col) z ON z.col = x.col

Conclusion

I'm not sure if that will satisfy your typographic needs, but hope I've demonstrated that the key is determining what the JOIN criteria is. Once you know the column(s) involved, you know the table(s) involved.