I've got two postgresql tables:
table name column names
----------- ------------------------
login_log ip | etc.
ip_location ip | location | hostname | etc.
I want to get every IP address from login_log
which doesn't have a row in ip_location
.
I tried this query but it throws a syntax error.
SELECT login_log.ip
FROM login_log
WHERE NOT EXIST (SELECT ip_location.ip
FROM ip_location
WHERE login_log.ip = ip_location.ip)
ERROR: syntax error at or near "SELECT" LINE 3: WHERE NOT EXIST (SELECT ip_location.ip`
I'm also wondering if this query (with adjustments to make it work) is the best performing query for this purpose.
There are basically 4 techniques for this task, all of them standard SQL.
NOT EXISTS
Often fastest in Postgres.
SELECT ip
FROM login_log l
WHERE NOT EXISTS (
SELECT -- SELECT list mostly irrelevant; can just be empty in Postgres
FROM ip_location
WHERE ip = l.ip
);
Also consider:
LEFT JOIN / IS NULL
Sometimes this is fastest. Often shortest. Often results in the same query plan as NOT EXISTS
.
SELECT l.ip
FROM login_log l
LEFT JOIN ip_location i USING (ip) -- short for: ON i.ip = l.ip
WHERE i.ip IS NULL;
EXCEPT
Short. Not as easily integrated in more complex queries.
SELECT ip
FROM login_log
EXCEPT ALL -- "ALL" keeps duplicates and makes it faster
SELECT ip
FROM ip_location;
Note that (per documentation):
duplicates are eliminated unless
EXCEPT ALL
is used.
Typically, you'll want the ALL
keyword. If you don't care, still use it because it makes the query faster.
NOT IN
Only good without NULL
values or if you know to handle NULL
properly. I would not use it for this purpose. Also, performance can deteriorate with bigger tables.
SELECT ip
FROM login_log
WHERE ip NOT IN (
SELECT DISTINCT ip -- DISTINCT is optional
FROM ip_location
);
NOT IN
carries a "trap" for NULL
values on either side:
Similar question on dba.SE targeted at MySQL: