Delete from table A joining on table A in Redshift

moinudin picture moinudin · Apr 30, 2014 · Viewed 14.1k times · Source

I am trying to write the following MySQL query in PostgreSQL 8.0 (specifically, using Redshift):

DELETE t1 FROM table t1
LEFT JOIN table t2 ON (
    t1.field   = t2.field AND
    t1.field2  = t2.field2
)
WHERE t1.field > 0

PostgreSQL 8.0 does not support DELETE FROM table USING. The examples in the docs say that you can reference columns in other tables in the where clause, but that doesn't work here as I'm joining on the same table I'm deleting from. The other example is a subselect query, but the primary key of the table I'm working with has four columns so I can't see a way to make that work either.

Answer

Erwin Brandstetter picture Erwin Brandstetter · Apr 30, 2014

Amazon Redshift may be based on Postgres 8.0, but is a very much different thing. I don't use it, but the manual informs, that the USING clause is supported in DELETE statements:

Just use the modern form:

DELETE FROM tbl
USING  tbl t2
WHERE  t2.field  = tbl.field
AND    t2.field2 = tbl.field2
AND    t2.pkey  <> tbl.pkey    -- exclude self-join
AND    tbl.field > 0;

This is assuming JOIN instead of LEFT JOIN in your MySQL statement, which would not make any sense. I also added the condition AND t2.pkey <> t1.pkey, to make it a useful query. This excludes rows joining itself. pkey being the primary key column.

What this query does:
Delete all rows where at least one other row exists in the same table with the same not-null values in field and field2. All such duplicates are deleted without leaving a single row per set.

To keep (for example) the row with the smallest pkey per set of duplicates, use t2.pkey < t2.pkey.

An EXISTS semi-join (as @wilplasser already hinted) might be a better choice, especially if multiple rows could be joined (a row can only be deleted once anyway):

DELETE FROM tbl
WHERE  field > 0
AND    EXISTS (
   SELECT 1
   FROM   tbl t2
   WHERE  t2.field  = tbl.field
   AND    t2.field2 = tbl.field2
   AND    t2.pkey  <> tbl.pkey 
   );