I need to return all values from colA
that are not in colB
from mytable
. I am using:
SELECT DISTINCT(colA) FROM mytable WHERE colA NOT IN (SELECT colB FROM mytable)
It is working however the query is taking an excessively long time to complete.
Is there a more efficient way to do this?
In standard SQL there are no parentheses in DISTINCT colA
. DISTINCT
is not a function.
SELECT DISTINCT colA
FROM mytable
WHERE colA NOT IN (SELECT DISTINCT colB FROM mytable);
Added DISTINCT
to the sub-select as well. If you have many duplicates it could speed up the query.
A CTE might be faster, depending on your DBMS. I additionally demonstrate LEFT JOIN
as alternative to exclude the values in valB
, and an alternative way to get distinct values with GROUP BY
:
WITH x AS (SELECT colB FROM mytable GROUP BY colB)
SELECT m.colA
FROM mytable m
LEFT JOIN x ON x.colB = m.colA
WHERE x.colB IS NULL
GROUP BY m.colA;
Or, simplified further, and with a plain subquery (probably fastest):
SELECT DISTINCT m.colA
FROM mytable m
LEFT JOIN mytable x ON x.colB = m.colA
WHERE x.colB IS NULL;
There are basically 4 techniques to exclude rows with keys present in another (or the same) table:
The deciding factor for speed will be indexes. You need to have indexes on colA
and colB
for this query to be fast.