Efficient way to select all values from one column not in another column

Question 1

Efficient way to select all values from one column not in another column

sql performance distinct notin

Flash · Jan 5, 2012 · Viewed 45.6k times · Source

Answer

Answer

In standard SQL there are no parentheses in DISTINCT colA. DISTINCT is not a function.

SELECT DISTINCT colA
FROM   mytable
WHERE  colA NOT IN (SELECT DISTINCT colB FROM mytable);

Added DISTINCT to the sub-select as well. If you have many duplicates it could speed up the query.

A CTE might be faster, depending on your DBMS. I additionally demonstrate LEFT JOIN as alternative to exclude the values in valB, and an alternative way to get distinct values with GROUP BY:

WITH x AS (SELECT colB FROM mytable GROUP BY colB)
SELECT m.colA
FROM   mytable m
LEFT   JOIN x ON x.colB = m.colA
WHERE  x.colB IS NULL
GROUP  BY m.colA;

Or, simplified further, and with a plain subquery (probably fastest):

SELECT DISTINCT m.colA
FROM   mytable m
LEFT   JOIN mytable x ON x.colB = m.colA
WHERE  x.colB IS NULL;

There are basically 4 techniques to exclude rows with keys present in another (or the same) table:

Select rows which are not present in other table

The deciding factor for speed will be indexes. You need to have indexes on colA and colB for this query to be fast.

Question 2

I need to return all values from colA that are not in colB from mytable. I am using:

SELECT DISTINCT(colA) FROM mytable WHERE colA NOT IN (SELECT colB FROM mytable)

It is working however the query is taking an excessively long time to complete.

Is there a more efficient way to do this?

Efficient way to select all values from one column not in another column

Answer

Related questions