Nested Join vs Merge Join vs Hash Join in PostgreSQL

vinieth · Feb 28, 2018

I know how the following join strategies work:

  1. Nested Loop Join
  2. Merge Join
  3. Hash Join

I want to know in which situations PostgreSQL uses each of these joins.

Answer

Laurenz Albe · Feb 28, 2018

The following are a few rules of thumb:

  • Nested loop joins are preferred if one side of the join has few rows. They are also the only option if the join condition does not use an equality operator.

  • Hash joins are preferred if the join condition uses an equality operator, both sides of the join are large, and the hash table fits into work_mem.

  • Merge joins are preferred if the join condition uses an equality operator and both sides of the join are large but can be sorted on the join condition efficiently (for example, if there is an index on the expressions used in the join condition).
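To illustrate why the operator matters in the rules above, here is a minimal Python sketch of the three strategies. It is not PostgreSQL's implementation: the function names are hypothetical, and rows are reduced to plain integers that serve as the join key. The point is that the nested loop accepts an arbitrary predicate, while the hash and merge variants only work for equality.

```python
from collections import defaultdict
import operator


def nested_loop_join(outer, inner, predicate):
    """Compare every outer row with every inner row; works for any predicate."""
    return [(o, i) for o in outer for i in inner if predicate(o, i)]


def hash_join(outer, inner):
    """Equality only: build a hash table on the inner side, then probe it."""
    table = defaultdict(list)
    for i in inner:
        table[i].append(i)
    return [(o, i) for o in outer for i in table.get(o, [])]


def merge_join(outer, inner):
    """Equality only: walk two sorted inputs in step."""
    # sorted() stands in for an explicit sort or an index scan.
    outer, inner = sorted(outer), sorted(inner)
    result, j = [], 0
    for o in outer:
        # Skip inner values that are smaller than the current outer value.
        while j < len(inner) and inner[j] < o:
            j += 1
        # Emit every inner value equal to the current outer value.
        k = j
        while k < len(inner) and inner[k] == o:
            result.append((o, inner[k]))
            k += 1
    return result


outer_rows = [1, 2, 2, 5]
inner_rows = [2, 3, 5, 5]

# All three strategies agree on an equality join.
print(hash_join(outer_rows, inner_rows))
print(merge_join(outer_rows, inner_rows))
print(nested_loop_join(outer_rows, inner_rows, operator.eq))

# Only the nested loop can evaluate a non-equality condition such as "<".
print(nested_loop_join(outer_rows, inner_rows, operator.lt))
```

The nested loop does work proportional to the product of both row counts, which is harmless when one side has few rows and prohibitive when both are large. The hash table in `hash_join` is the structure that would have to fit into work_mem.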

A typical OLTP query that selects only one row from one table and the associated rows from another table will typically use a nested loop join, since that is the only efficient method.

Queries that join tables with many rows (which cannot be filtered out before the join) would be very inefficient with a nested loop join and will normally use a hash or merge join if the join condition allows it.

The optimizer considers each of these join strategies and uses the one that promises the lowest estimated cost. The most important factor in this decision is the estimated row count from both sides of the join. Consequently, wrong optimizer choices are usually caused by misestimates in the row counts.
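The effect of a row count misestimate can be sketched with a toy cost comparison. The formulas below are made up for illustration and are not PostgreSQL's actual cost model; the function names and the row counts are assumptions too. The sketch only shows the mechanism: the strategy is picked from the estimate, but the price is paid on the actual row count.

```python
import math


def toy_costs(outer_rows, inner_rows):
    """Made-up cost formulas, NOT PostgreSQL's actual cost model."""
    return {
        # One index lookup on the inner table per outer row.
        "nested loop (indexed inner)": outer_rows * math.log2(inner_rows + 1),
        # Read both inputs once: build the hash on one, probe with the other.
        "hash join": outer_rows + inner_rows,
        # Sort both inputs, then read each once.
        "merge join (with sorts)": sum(
            n * math.log2(n + 1) + n for n in (outer_rows, inner_rows)
        ),
    }


def cheapest(outer_rows, inner_rows):
    """Return the name of the strategy with the lowest toy cost."""
    costs = toy_costs(outer_rows, inner_rows)
    return min(costs, key=costs.get)


inner = 1_000_000
estimated_outer = 10        # what the optimizer believes (hypothetical)
actual_outer = 500_000      # what the query really produces (hypothetical)

chosen = cheapest(estimated_outer, inner)  # picked from the estimate
best = cheapest(actual_outer, inner)       # what would have been best
actual = toy_costs(actual_outer, inner)

print("chosen from estimate:", chosen)
print("best for actual rows:", best)
print("slowdown factor:", actual[chosen] / actual[best])
```

In this toy setup the underestimate makes the nested loop look cheapest, while a hash join would have been several times cheaper for the real row count. In practice, comparing estimated and actual row counts in the output of EXPLAIN (ANALYZE) is the usual way to spot such a misestimate.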