Performance issue in update query

vara picture vara · Jul 9, 2014 · Viewed 18.9k times · Source

I have one small doubt in query performance. Basically, I have a table with more than 1C records. sl_id is the primary key in that table. Currently, I am updating the table column status to true (default false) by using the sl_id.

In my program, I will have 200 unique sl_id in an array. I am updating the status to true(always) by using each sl_id.

My doubt:

Shall I use individual update queries by specifing each sl_id in a where condition to update the status?

(OR)

Shall I use IN operator and put all 200 unique sl_id in one single query?

Which one will be faster?

Answer

Craig Ringer picture Craig Ringer · Jul 9, 2014

In rough order of slower to faster:

  • 200 Individual queries, each in their own transaction
  • 200 Individual queries, all in one transaction
  • 1 big query with WHERE ... IN (...) or WHERE EXISTS (SELECT ...)
  • 1 big query with an INNER JOIN over a VALUES clause
  • (only faster for very big lists of values): COPY value list to a temp table, index it, and JOIN on the temp table.

If you're using hundreds of values I really suggest joining over a VALUES clause. For many thousands of values, COPY to a temp table and index it then join on it.

An example of joining on a values clause. Given this IN query:

SELECT *
FROM mytable
WHERE somevalue IN (1, 2, 3, 4, 5);

the equivalent with VALUES is:

SELECT *
FROM mytable
INNER JOIN (
  VALUES (1), (2), (3), (4), (5)
) vals(v)
ON (somevalue = v);

Note, however, that using VALUES this way is a PostgreSQL extension, wheras IN, or using a temporary table, is SQL standard.

See this related question: