Most efficient way to do a bulk UPDATE with pairs of input

Flimzy · Feb 25, 2015 · Viewed 7.1k times

Suppose I want to do a bulk update, setting a different new value for each id in a collection of (id, value) pairs. This can easily be done with a sequence of UPDATE queries:

UPDATE foo SET value='foo' WHERE id=1
UPDATE foo SET value='bar' WHERE id=2
UPDATE foo SET value='baz' WHERE id=3

But now suppose I want to do this in bulk. I have a two-dimensional array containing the ids and new values:

[ [ 1, 'foo' ]
  [ 2, 'bar' ]
  [ 3, 'baz' ] ]

Is there an efficient way to do these three UPDATEs in a single SQL query?

Some solutions I have considered:

  1. A temporary table

    CREATE TABLE temp ...;
    INSERT INTO temp (id,value) VALUES (....);
    UPDATE foo USING temp ...
    

But this really just moves the problem. Although it may be easier (or at least less ugly) to do a bulk INSERT, there is still a minimum of three queries.

  2. Denormalize the input by passing the data pairs as SQL arrays. This makes the query incredibly ugly, though:

    UPDATE foo
    SET value = pairs.value
    FROM (
        SELECT
            split_part(x,',',1)::INT AS id,
            split_part(x,',',2)::VARCHAR AS value
        FROM (
            SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) AS x
        ) AS raw
    ) AS pairs
    WHERE foo.id = pairs.id;
    

    This makes it possible to use a single query, but makes that query ugly and inefficient (especially for mixed and/or complex data types). A somewhat cleaner variant of the same idea is sketched below.
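
    For comparison, here is a sketch of the same idea with the ids and values kept in two parallel arrays instead of packed strings; it assumes PostgreSQL 9.4 or later, where UNNEST accepts several arrays at once:

    -- hypothetical variant: pair up two parallel arrays with multi-argument UNNEST
    UPDATE foo
    SET value = pairs.value
    FROM UNNEST(ARRAY[1, 2, 3],
                ARRAY['foo', 'bar', 'baz']) AS pairs(id, value)
    WHERE foo.id = pairs.id;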

Is there a better solution? Or should I resort to multiple UPDATE queries?

Answer

wildplasser · Feb 25, 2015

Normally you want to batch-update from a table with a sufficient index to make the merge easy:

-- stage the new values in a keyed temp table
CREATE TEMP TABLE updates_table
        ( id integer not null primary key
        , val varchar
        );
INSERT INTO updates_table(id, val) VALUES
 ( 1, 'foo' ) ,( 2, 'bar' ) ,( 3, 'baz' )
        ;

-- merge the staged values into the target in a single statement
UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
        ;

So you should probably populate your updates_table with something like:


INSERT INTO updates_table(id, val)
SELECT
        split_part(x,',',1)::INT AS id,
        split_part(x,',',2)::VARCHAR AS value
    FROM (
        SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) AS x
         ) AS x
     ;

Remember: an index (or the primary key) on the id field in the updates_table is important. (But for small sets like this one, a hash join will probably be chosen by the optimiser.)
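
To see which plan the optimiser actually picks, the merge can be run under EXPLAIN ANALYZE inside a transaction that is rolled back afterwards; a minimal sketch (EXPLAIN ANALYZE really executes the update, so the ROLLBACK is what discards its effect):

BEGIN;
EXPLAIN ANALYZE
UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id;
-- the update above was actually executed; discard it and keep only the plan output
ROLLBACK;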


In addition: for updates, it is important to avoid updating rows to the value they already have. Such no-op updates create extra row versions, plus the resulting VACUUM activity after the update is committed:

UPDATE target_table t
    SET value = u.val
    FROM updates_table u
    WHERE t.id = u.id
    AND (t.value IS NULL OR t.value <> u.val)
            ;
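
The same guard can also be written with IS DISTINCT FROM, which treats NULL as an ordinary value on both sides (and so additionally skips rows where both the old and the new value are NULL); a sketch:

UPDATE target_table t
    SET value = u.val
    FROM updates_table u
    WHERE t.id = u.id
    AND t.value IS DISTINCT FROM u.val
            ;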