Does Django Atomic Transaction lock the database?

kong picture kong · Mar 1, 2017 · Viewed 8.9k times · Source

When you do:

@transaction.atomic
def update_db():
    do_bulk_update()

while the function is running, does it lock the database?

I'm asking regarding django's atomic transaction: https://docs.djangoproject.com/en/1.10/topics/db/transactions/#autocommit-details

Answer

daphtdazz picture daphtdazz · Dec 14, 2017

(I'm assuming modern SQL databases in this answer.)

tl;dr

Transactions are not locks, but hold locks that are acquired automatically during operations. And django does not add any locking by default, so the answer is No, it does not lock the database.

E.g. if you were do:

@transaction.atomic
def update_db():
    cursor.execute('UPDATE app_model SET model_name TO 'bob' WHERE model_id = 1;')
    # some other stuff...

You will have locked the app_model row with id 1 for the duration of "other stuff". But it is not locked until that query. So if you want to ensure consistency you should probably use locks explicitly.

Transactions

As said, transactions are not locks because that would be awful for perfomance. In general they are lighter-weight mechanisms in the first instance for ensuring that if you make a load of changes that wouldn't make sense one at a time to other users of the database, those changes appear to happen all at once. I.e. are atomic. Transactions do not block other users from mutating the database, and indeed in general do not block other users from mutation the same rows you may be reading.

See this guide and your databases docs (e.g. postgres) for more details on how transactions are protected.

Django implementation of atomic.

Django itself does the following when you use the atomic decorator (referring to the code).

Not already in an atomic block

  1. Disables autocommit. Autocommit is an application level feature which will always commit transactions immediately, so it looks to the application like there is never a transaction outstanding.

    This tells the database to start a new transaction.

    • At this point psycopg2 for postgres sets the isolation level of the transaction to READ COMMITTED, which means that any reads in the transaction will only return committed data, which means if another transaction writes, you won't see that change until it commits it. It does mean though that if that transaction commits during your transaction, you may read again and see that the value has changed during your transaction.

      Obviously this means that the database is not locked.

  2. Runs your code. Any queries / mutations you make are not committed.

  3. Commits the transaction.

  4. Re-enables autocommit.

In an earlier atomic block

Basically in this case we try to use savepoints so we can revert back to the if we "rollback" the "transaction", but as far as the database connection is concerned we are in the same transaction.

Automatic locking

As said, the database may give your transaction some automatic locks, as outlined in this doc. To demonstrate this, consider the following code that operates on a postgres database with one table and one row in it:

my_table
id | age
---+----
1  | 50

And then you run this code:

import psycopg2 as Database
from multiprocessing import Process
from time import sleep
from contextlib import contextmanager


@contextmanager
def connection():
    conn = Database.connect(
        user='daphtdazz', host='localhost', port=5432, database='db_test'
    )
    try:
        yield conn
    finally:
        conn.close()

def connect_and_mutate_after_seconds(seconds, age):

    with connection() as conn:
        curs = conn.cursor()
        print('execute update age to %d...' % (age,))
        curs.execute('update my_table set age = %d where id = 1;' % (age,))
        print('sleep after update age to %d...' % (age,))
        sleep(seconds)
        print('commit update age to %d...' % (age,))
        conn.commit()


def dump_table():
    with connection() as conn:
        curs = conn.cursor()
        curs.execute('select * from my_table;')
        print('table: %s' % (curs.fetchall(),))

if __name__ == '__main__':

    p1 = Process(target=connect_and_mutate_after_seconds, args=(2, 99))
    p1.start()

    sleep(0.6)
    p2 = Process(target=connect_and_mutate_after_seconds, args=(1, 100))
    p2.start()
    p2.join()

    dump_table()

    p1.join()

    dump_table()

You get:

execute update age to 99...
sleep after update age to 99...
execute update age to 100...
commit update age to 99...
sleep after update age to 100...
commit update age to 100...
table: [(1, 100)]
table: [(1, 100)]

and the point is that the second process is started before the first command completes, but after it has called the update command, so the second process has to wait for the lock which is why we don't see sleep after update age to 100 until after the commit for age 99.

If you put the sleep before the exec, you get:

sleep before update age to 99...
sleep before update age to 100...
execute update age to 100...
commit update age to 100...
table: [(24, 3), (100, 2)]
execute update age to 99...
commit update age to 99...
table: [(24, 3), (99, 2)]

Indicating the lock was not acquired by the time the second process gets to its update, which happens first but during the first process's transaction.