In trying to improve the speed of an immensely slow query (several minutes on two tables with only ~50,000 rows each, on SQL Server 2008 if it matters), I narrowed down the problem to an OR
in my inner join, as in:
SELECT mt.ID, mt.ParentID, ot.MasterID
FROM dbo.MainTable AS mt
INNER JOIN dbo.OtherTable AS ot ON ot.ParentID = mt.ID
OR ot.ID = mt.ParentID
I changed this to (what I hope is) an equivalent pair of left joins, shown here:
SELECT mt.ID, mt.ParentID,
CASE WHEN ot1.MasterID IS NOT NULL THEN
ot1.MasterID ELSE
ot2.MasterID END AS MasterID
FROM dbo.MainTable AS mt
LEFT JOIN dbo.OtherTable AS ot1 ON ot1.ParentID = mt.ID
LEFT JOIN dbo.OtherTable AS ot2 ON ot2.ID = mt.ParentID
WHERE ot1.MasterID IS NOT NULL OR ot2.MasterID IS NOT NULL
.. and the query now runs in about a second!
Is it generally a bad idea to put an OR
in a join condition? Or am I just unlucky somehow in the layout of my tables?
This kind of JOIN
is not optimizable to a HASH JOIN
or a MERGE JOIN
.
It can be expressed as a concatenation of two resultsets:
SELECT *
FROM maintable m
JOIN othertable o
ON o.parentId = m.id
UNION
SELECT *
FROM maintable m
JOIN othertable o
ON o.id = m.parentId
, each of them being an equijoin, however, SQL Server
's optimizer is not smart enough to see it in the query you wrote (though they are logically equivalent).