Ignore error in batch insert Postgresql

Avraam Mavridis picture Avraam Mavridis · Nov 13, 2016 · Viewed 8k times · Source

I have a process that runs every 5 minutes and tries to insert a batch of articles into a table. The articles come from web-scrapping, so there are cases in which I am trying to insert a batch that contains articles which have already been saved into the DB.

My primary key is uuid - an MD5 hash of the article title.

Checking if an article exists in the db to filter the batch is kinda inefficient.

Is it a DB level way in Postgresql to ignore the attempts of inserting a duplicate uuid without returning an Error?

Answer

Adam Matan picture Adam Matan · Nov 13, 2016

Solution

You could insert using the WHERE NOT EXISTS clause.

For example, consider a test table with a numeric id as primary key and a textual name.

Code

db=> CREATE TABLE test(id BIGSERIAL PRIMARY KEY, name TEXT);
CREATE TABLE

-- Insertion will work - empty table
db=> INSERT INTO test(id, name) 
     SELECT 1, 'Partner number 1' 
     WHERE NOT EXISTS (SELECT 1,2 FROM test WHERE id=1);
INSERT 0 1

-- Insertion will NOT work - duplicate id
db=> INSERT INTO test(id, name) 
     SELECT 1, 'Partner number 1' 
     WHERE NOT EXISTS (SELECT 1,2 FROM test WHERE id=1);    
INSERT 0 0

-- After two insertions, the table contains only one row
db=> SELECT * FROM test;
 id |       name
----+------------------
  1 | Partner number 1
(1 row)

Difference from ON CONFILCT

Quoting the documentation:

ON CONFLICT can be used to specify an alternative action to raising a unique constraint or exclusion constraint violation error.

The action can be DO NOTHING, or a DO UPDATE. The second approach is often referred to as Upsert - a portmanteau of Insert and Update.

Technically WHERE NOT EXISTS is equivalent to ON CONFILCT DO NOTHING. See the query plans for a deeper dive.