I'm just testing out AWS Redshift. After discovering some duplicate data from an insert that I'd hoped would simply fail on a duplicate value in the key column, reading the docs reveals that primary key constraints aren't "enforced".
Has anyone figured out how to prevent duplication on a primary key (per the "traditional" expectation)?
Thanks to any Redshift pioneers!
I assign UUIDs when the records are created. If the records are inherently unique, I use version 4 UUIDs (random); when they aren't, I use version 5 (SHA-1 hash), with the natural keys as input.
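For what it's worth, here's a minimal Python sketch of that assignment step, done in the loading application before the rows ever reach Redshift. The namespace UUID and the natural-key values are hypothetical; substitute your own.

import uuid

# Hypothetical fixed namespace for this application's natural keys;
# any UUID works as long as it never changes once ids are in use.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "myapp.example.com")

def record_id(natural_key_parts=None):
    """Version 4 (random) UUID for inherently unique records, or a
    deterministic version 5 (SHA-1) UUID derived from the natural key."""
    if natural_key_parts is None:
        return uuid.uuid4()
    # Join the key columns with a separator so the same business key
    # always hashes to the same UUID.
    return uuid.uuid5(APP_NAMESPACE, "|".join(str(p) for p in natural_key_parts))

print(record_id(["cust-42", "2013-07-01"]))  # same inputs -> same id
print(record_id())                           # fresh random id each call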
Then you can follow AWS's merge (upsert) instructions quite easily. If your input contains duplicates, you should be able to clean them up in your staging table with a SQL statement along these lines:
CREATE TABLE cleaned AS
SELECT
    pk_field,
    field_1,
    field_2,
    ...
FROM (
    SELECT
        -- Number the rows within each pk_field group; keeping only r = 1
        -- leaves one row per key. ORDER BY pk_field inside the partition is
        -- effectively arbitrary, so order by a timestamp or version column
        -- instead if a particular duplicate should win.
        ROW_NUMBER() OVER (PARTITION BY pk_field ORDER BY pk_field) AS r,
        t.*
    FROM table1 t
) x
WHERE x.r = 1;
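Once cleaned exists, the merge itself is the delete-then-insert pattern from the AWS docs, run in one transaction. Here's a rough sketch driving it from Python with psycopg2; the connection details and the target_table name are placeholders, and it assumes cleaned and target_table have identical column layouts.

import psycopg2

# Placeholder connection details; point these at your own cluster.
conn = psycopg2.connect(
    host="my-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="mydb",
    user="myuser",
    password="mypassword",
)

# The connection context manager wraps the statements in a single
# transaction: commit on success, rollback on any exception.
with conn, conn.cursor() as cur:
    # Delete target rows that are about to be replaced...
    cur.execute("""
        DELETE FROM target_table
        USING cleaned
        WHERE target_table.pk_field = cleaned.pk_field;
    """)
    # ...then insert the deduplicated staging rows and drop the scratch table.
    cur.execute("INSERT INTO target_table SELECT * FROM cleaned;")
    cur.execute("DROP TABLE cleaned;")
conn.close()

Since Redshift itself enforces none of this, the uniqueness guarantee is only as strong as the load process, so every path that writes to the target needs to go through the same staging-and-merge step.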