I have 2 simple scripts:
```python
from sqlalchemy import create_engine, ForeignKey, Table
from sqlalchemy import Column, Date, Integer, String, DateTime, BigInteger, event
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.engine import Engine
from sqlalchemy.orm import relationship, backref, sessionmaker, scoped_session, Session


class Test(declarative_base()):
    __tablename__ = "Test"

    def __init__(self, *args, **kwargs):
        args = args[0]
        for key in args:
            setattr(self, key, args[key])

    key = Column(String, primary_key=True)


data = []
for a in range(0, 10000):
    data.append({"key": "key%s" % a})

engine = create_engine("sqlite:///testn", echo=False)

with engine.connect() as connection:
    Test.metadata.create_all(engine)
    session = Session(engine)
    list(map(lambda x: session.merge(Test(x)), data))
    session.commit()
```
Result:

```
real    0m15.300s
user    0m14.920s
sys     0m0.351s
```
The second script:

```python
from peewee import *


class Test(Model):
    key = TextField(primary_key=True, null=False)


dbname = "test"
db = SqliteDatabase(dbname)
Test._meta.database = db

data = []
for a in range(0, 10000):
    data.append({"key": "key%s" % a})

if not Test.table_exists():
    db.create_tables([Test])

with db.atomic() as tr:
    Test.insert_many(data).upsert().execute()
```

Result:

```
real    0m3.253s
user    0m2.620s
sys     0m0.571s
```
Why?
This comparison is not entirely valid, since issuing an upsert-style query is very different from what SQLAlchemy's `Session.merge()` does:

> `Session.merge()` examines the primary key attributes of the source instance, and attempts to reconcile it with an instance of the same primary key in the session. If not found locally, it attempts to load the object from the database based on primary key, and if none can be located, creates a new instance.

In this test case that means 10,000 load attempts against the database, which is expensive.
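You can verify this yourself: since the first script already imports `event`, a cursor-execute listener can count the `SELECT` statements that `merge()` emits. This is only a rough sketch that assumes the `Test`, `engine`, `session`, and `data` objects defined above:

```python
select_count = 0

@event.listens_for(engine, "before_cursor_execute")
def count_selects(conn, cursor, statement, parameters, context, executemany):
    # Count every SELECT sent to the DBAPI cursor.
    global select_count
    if statement.lstrip().upper().startswith("SELECT"):
        select_count += 1

list(map(lambda x: session.merge(Test(x)), data))
session.commit()
print(select_count)  # roughly one lookup per merged instance, so ~10,000
```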
On the other hand, when using peewee with SQLite the combination of `insert_many(data)` and `upsert()` can result in a single query:

```sql
INSERT OR REPLACE INTO Test (key) VALUES ('key0'), ('key1'), ...
```
There's no session state to reconcile, since peewee is a very different kind of ORM from SQLAlchemy and at a quick glance looks closer to SQLAlchemy's Core and `Table` objects.
In SQLAlchemy, instead of `list(map(lambda x: session.merge(Test(x)), data))`, you could revert to using Core:

```python
session.execute(Test.__table__.insert(prefixes=['OR REPLACE']).values(data))
```
A major downside is that you have to hand-write a database vendor specific prefix for `INSERT`. It also subverts the Session, which will have no information or knowledge about the newly added rows.
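For context, here is a minimal sketch of issuing that statement directly on a Core connection instead of through the Session, since the Session gains nothing from it anyway. It assumes the `Test` model, `engine`, and `data` list from the first script, and `OR REPLACE` remains SQLite-specific (`prefix_with()` is just the method-chained way of attaching the same prefix):

```python
# Sketch only: bypasses the ORM entirely, assuming Test, engine, and data
# from the first script. 'OR REPLACE' is an SQLite-specific prefix.
with engine.begin() as connection:
    connection.execute(
        Test.__table__.insert().prefix_with("OR REPLACE"),
        data,  # list of dicts -> DBAPI executemany
    )
```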
Bulk insertions using model objects are a little more involved with SQLAlchemy. Very simply put, using an ORM is a trade-off between ease of use and speed:

> ORMs are basically not intended for high-performance bulk inserts - this is the whole reason SQLAlchemy offers the Core in addition to the ORM as a first-class component.
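If you want to stay at the ORM layer, the Session also exposes a bulk API that skips most of the unit-of-work bookkeeping. This is only a sketch (assuming the `Test` model, `engine`, and `data` list from the first script), and note that `bulk_insert_mappings()` issues plain `INSERT`s, so unlike `INSERT OR REPLACE` it will fail on duplicate keys:

```python
from sqlalchemy.orm import Session

session = Session(engine)
# Plain bulk INSERT of dictionaries; no merge/upsert semantics.
session.bulk_insert_mappings(Test, data)
session.commit()
```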