Dealing with duplicate primary keys on insert in SQLAlchemy (declarative style)

Edwardr picture Edwardr · Apr 25, 2012 · Viewed 34.4k times · Source

My application is using a scoped session and the declarative style of SQLALchemy. It's a web-app and a lot of the DB insertions are executed by Celery, a task scheduler.

Typically, when deciding to insert an object, my code might do something along the following lines:

from schema import Session
from schema.models import Bike

pk = 123 # primary key
bike = Session.query(Bike).filter_by(bike_id=pk).first()
if not bike: # no bike in DB
    new_bike = Bike(pk, "shiny", "bike")
    Session.add(new_bike)
    Session.commit()

The issue here is that because a lot of this is done by asynchronous workers, it's possible for one working to be halfway though inserting a Bike with id=123, while another one is checking for its existence. In this case the second worker will try and insert a row with the same primary key, and SQLAlchemy will raise an IntegrityError.

I can't for the life of me find a nice way to deal with this issue apart from swapping out Session.commit() for:

'''schema/__init__.py'''
from sqlalchemy.orm import scoped_session, sessionmaker
Session = scoped_session(sessionmaker())

def commit(ignore=False):
    try:
        Session.commit()
    except IntegrityError as e:
        reason = e.message
        logger.warning(reason)

        if not ignore:
            raise e

        if "Duplicate entry" in reason:
            logger.info("%s already in table." % e.params[0])
            Session.rollback()

And then everywhere I have Session.commit I now have schema.commit(ignore=True) where I don't mind that the row is not inserted again.

To me this seems very brittle because of the string checking. Just as an FYI, when an IntegrityError is raised it looks like this:

(IntegrityError) (1062, "Duplicate entry '123' for key 'PRIMARY'")

So of course is the primary key I was inserting was something like Duplicate entry is a cool thing then I suppose I could miss IntegrityError's which weren't actually because of duplicate primary keys.

Are there any better approaches, which maintain the clean SQLAlchemy approach I'm using (as opposed to starting to write out statements in strings etc. . .)

Db is MySQL (though for unit testing I like to use SQLite, and wouldn't want to hinder that ability with any new approaches).

Cheers!

Answer

sirdodger picture sirdodger · Jul 23, 2012

If you use session.merge(bike) instead of session.add(bike), then you will not generate primary key errors. The bike will be retrieved and updated or created as needed.