I need to do a Postgres update on a collection of records & I'm trying to prevent a deadlock which appeared in the stress tests.
The typical resolution to this is to update records in a certain order, by ID for example - but it seems that Postgres doesn't allow ORDER BY for UPDATE.
Assuming I need to do an update, for example:
UPDATE BALANCES WHERE ID IN (SELECT ID FROM some_function() ORDER BY ID);
results in deadlocks when you run 200 queries concurrently. What to do?
I'm looking for a general solution, not case-specific workarounds like in UPDATE with ORDER BY
It feels that there must be a better solution than writing a cursor function. Also, if there's no better way, how would that cursor function optimally look like? Update record-by-record
As far as I know, there's no way to accomplish this directly through the UPDATE
statement; the only way to guarantee lock order is to explicitly acquire locks with a SELECT ... ORDER BY ID FOR UPDATE
, e.g.:
UPDATE Balances
SET Balance = 0
WHERE ID IN (
SELECT ID FROM Balances
WHERE ID IN (SELECT ID FROM some_function())
ORDER BY ID
FOR UPDATE
)
This has the downside of repeating the ID
index lookup on the Balances
table. In your simple example, you can avoid this overhead by fetching the physical row address (represented by the ctid
system column) during the locking query, and using that to drive the UPDATE
:
UPDATE Balances
SET Balance = 0
WHERE ctid = ANY(ARRAY(
SELECT ctid FROM Balances
WHERE ID IN (SELECT ID FROM some_function())
ORDER BY ID
FOR UPDATE
))
(Be careful when using ctid
s, as the values are transient. We're safe here, as the locks will block any changes.)
Unfortunately, the planner will only utilise the ctid
in a narrow set of cases (you can tell if it's working by looking for a "Tid Scan" node in the EXPLAIN
output). To handle more complicated queries within a single UPDATE
statement, e.g. if your new balance was being returned by some_function()
alongside the ID, you'll need to fall back to the ID-based lookup:
UPDATE Balances
SET Balance = Locks.NewBalance
FROM (
SELECT Balances.ID, some_function.NewBalance
FROM Balances
JOIN some_function() ON some_function.ID = Balances.ID
ORDER BY Balances.ID
FOR UPDATE
) Locks
WHERE Balances.ID = Locks.ID
If the performance overhead is an issue, you'd need to resort to using a cursor, which would look something like this:
DO $$
DECLARE
c CURSOR FOR
SELECT Balances.ID, some_function.NewBalance
FROM Balances
JOIN some_function() ON some_function.ID = Balances.ID
ORDER BY Balances.ID
FOR UPDATE;
BEGIN
FOR row IN c LOOP
UPDATE Balances
SET Balance = row.NewBalance
WHERE CURRENT OF c;
END LOOP;
END
$$