With this table:
CREATE TABLE test_insert (
col1 INT,
col2 VARCHAR(10),
col3 DATE
)
the following code takes 40 seconds to run:
import pyodbc
from datetime import date
conn = pyodbc.connect('DRIVER={SQL Server Native Client 10.0};'
                      'SERVER=localhost;DATABASE=test;UID=xxx;PWD=yyy')
rows = []
row = [1, 'abc', date.today()]
for i in range(10000):
    rows.append(row)
cursor = conn.cursor()
cursor.executemany('INSERT INTO test_insert VALUES (?, ?, ?)', rows)
conn.commit()
The equivalent code with psycopg2 only takes 3 seconds, and I don't think MSSQL is that much slower than PostgreSQL. Any ideas on how to improve the bulk insert speed when using pyodbc?
EDIT: Add some notes following ghoerz's discovery
In pyodbc, the flow of executemany is:
- for each row: prepare the statement, execute it for that single row, then unprepare it
In ceODBC, the flow of executemany is:
- prepare the statement once
- execute it for every row of parameters
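For illustration, the difference can be sketched in plain Python. The prepare/execute/unprepare functions below are stand-ins that only mimic what the profiler trace shows; they are not real driver APIs:

# Illustrative sketch only: stand-in functions, not pyodbc/ceODBC internals.
def prepare(sql):
    print("prepare:", sql)
    return sql

def execute(stmt, params):
    print("execute:", params)

def unprepare(stmt):
    print("unprepare")

def pyodbc_style_executemany(sql, rows):
    # one full prepare/execute/unprepare cycle per row -> many round trips
    for params in rows:
        stmt = prepare(sql)
        execute(stmt, params)
        unprepare(stmt)

def ceodbc_style_executemany(sql, rows):
    # prepare once, then re-execute the same prepared statement for each row
    stmt = prepare(sql)
    for params in rows:
        execute(stmt, params)

rows = [(1, 'abc', '2011-04-18'), (2, 'def', '2011-04-18')]
pyodbc_style_executemany('INSERT INTO test_insert VALUES (?, ?, ?)', rows)
ceodbc_style_executemany('INSERT INTO test_insert VALUES (?, ?, ?)', rows)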
I was having a similar issue with pyODBC inserting into a SQL Server 2008 DB using executemany(). When I ran a profiler trace on the SQL side, pyODBC was creating a connection, preparing the parametrized insert statement, and executing it for one row. Then it would unprepare the statement, and close the connection. It then repeated this process for each row.
I wasn't able to find any solution in pyODBC that didn't do this. I ended up switching to ceODBC for connecting to SQL Server, and it used the parametrized statements correctly.
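For reference, a rough sketch of what the switch looks like, assuming ceODBC accepts the same ODBC connection string and the standard DB-API executemany() call (it is a DB-API 2.0 driver, so the cursor interface mirrors the pyodbc code above), this has not been verified against a live server:

import ceODBC
from datetime import date

# Sketch: same connection string and insert as the pyodbc example above
conn = ceODBC.connect('DRIVER={SQL Server Native Client 10.0};'
                      'SERVER=localhost;DATABASE=test;UID=xxx;PWD=yyy')
rows = [[1, 'abc', date.today()] for _ in range(10000)]

cursor = conn.cursor()
# ceODBC prepares this INSERT once and re-executes it for each row
cursor.executemany('INSERT INTO test_insert VALUES (?, ?, ?)', rows)
conn.commit()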