Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail

Dwayne Towell · Jul 22, 2014 · Viewed 22.6k times

In this candidate answer it is asserted that JOIN is better than LEFT JOIN under some circumstances involving some WHERE clauses because it does not confuse the query planner and is not "pointless". The assertion/assumption is that it should be obvious to anyone.

Please explain further or provide link(s) for further reading.

Answer

Erwin Brandstetter · Jul 22, 2014

Effectively, WHERE conditions and JOIN conditions for [INNER] JOIN are 100% equivalent in PostgreSQL. (It's good practice to use explicit JOIN conditions to make queries easier to read and maintain, though.)
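A minimal sketch of that equivalence, at the level of result sets only. It assumes two hypothetical tables, customer and orders, and uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so it says nothing about how the Postgres planner treats the two forms:

```python
import sqlite3

# Hypothetical demo tables; SQLite stands in for PostgreSQL here.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customer VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders   VALUES (10, 1, 50), (11, 1, 200), (12, 2, 75);
""")

# Filter written as part of the join condition.
in_on = con.execute("""
    SELECT c.name, o.total
    FROM   customer c
    JOIN   orders o ON o.customer_id = c.id AND o.total > 60
    ORDER  BY o.id
""").fetchall()

# Same filter written in the WHERE clause.
in_where = con.execute("""
    SELECT c.name, o.total
    FROM   customer c
    JOIN   orders o ON o.customer_id = c.id
    WHERE  o.total > 60
    ORDER  BY o.id
""").fetchall()

print(in_on)
print(in_on == in_where)  # True: for an inner join both forms return the same rows
```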

The same is not true for a LEFT JOIN combined with a WHERE condition on a table to the right of the join. The purpose of a LEFT JOIN is to preserve all rows on the left side of the join, regardless of a match on the right side. If no match is found, the row is extended with NULL values for columns on the right side. The manual:

LEFT OUTER JOIN

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Thus, the joined table always has at least one row for each row in T1.
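To illustrate the quoted behavior, here is a small sketch with two hypothetical tables (customer and orders), again run through Python's sqlite3 module rather than PostgreSQL. The customer without any order still appears once, with NULL (None in Python) in the column from the right-hand table:

```python
import sqlite3

# Hypothetical demo tables; SQLite stands in for PostgreSQL here.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customer VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders   VALUES (10, 1, 50), (11, 1, 200), (12, 2, 75);
""")

# Every customer is kept; 'cy' has no orders and gets NULL for o.total.
rows = con.execute("""
    SELECT c.name, o.total
    FROM   customer c
    LEFT   JOIN orders o ON o.customer_id = c.id
    ORDER  BY c.id, o.id
""").fetchall()

for row in rows:
    print(row)
```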

If you then apply a WHERE condition on columns of tables on the right side, you void the effect and forcibly convert the LEFT JOIN to work like a plain JOIN, just more expensively due to a more complicated query plan.
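The effect can be checked on a toy data set. The sketch below assumes hypothetical customer and orders tables and uses Python's sqlite3 module in place of PostgreSQL, so it shows only the result sets, not the cost of the plan. The third query is an addition for contrast: if the intent was to keep unmatched rows, the filter belongs in the join condition instead.

```python
import sqlite3

# Hypothetical demo tables; SQLite stands in for PostgreSQL here.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customer VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders   VALUES (10, 1, 50), (11, 1, 200), (12, 2, 75);
""")

def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return con.execute(sql).fetchall()

# LEFT JOIN, then a WHERE condition on the right-hand table:
# the NULL-extended row for 'cy' fails o.total > 60 and is dropped.
left_where = run("""
    SELECT c.name, o.total
    FROM   customer c
    LEFT   JOIN orders o ON o.customer_id = c.id
    WHERE  o.total > 60
    ORDER  BY c.id, o.id
""")

# Plain JOIN with the same WHERE condition.
inner_where = run("""
    SELECT c.name, o.total
    FROM   customer c
    JOIN   orders o ON o.customer_id = c.id
    WHERE  o.total > 60
    ORDER  BY c.id, o.id
""")

# Filter moved into the join condition: unmatched customers are preserved.
left_on = run("""
    SELECT c.name, o.total
    FROM   customer c
    LEFT   JOIN orders o ON o.customer_id = c.id AND o.total > 60
    ORDER  BY c.id, o.id
""")

print(left_where == inner_where)  # True: the LEFT JOIN behaves like a plain JOIN
print(left_on)                    # still contains ('cy', None)
```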

In a query with many joined tables, Postgres (or any RDBMS) is hard-pressed to find the best (or even a good) query plan. The number of theoretically possible sequences to join tables grows factorially (!). Once the number of tables gets large, Postgres falls back on the "Genetic Query Optimizer" (GEQO) for the task, and there are some settings to influence it (such as geqo_threshold and join_collapse_limit).
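To get a feel for the factorial growth, the following sketch simply counts the n! possible orderings of n tables. This is a simplification assumed for illustration; the search space a real planner considers is shaped differently (join constraints prune it, alternative plan shapes enlarge it).

```python
from math import factorial

# Count the possible orderings of n tables (n!), as a rough proxy
# for how quickly the join-order search space grows.
join_orders = {n: factorial(n) for n in (2, 4, 8, 12)}

for n, count in join_orders.items():
    print(f"{n:>2} tables: {count:>11,} orderings")
```

Going from 8 to 12 tables raises the count from 40,320 to 479,001,600, which is why exhaustive search stops being practical.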

Obfuscating the query with a misleading LEFT JOIN as outlined makes the work of the query planner harder, misleads human readers, and typically hints at errors in the query logic.

Many related answers deal with problems stemming from this.