I have a dataframe with about 155,000 rows and 12 columns. If I export it to csv with dataframe.to_csv, the output is an 11 MB file (which is produced instantly).
If, however, I export to a Microsoft SQL Server with the to_sql method, it takes between 5 and 6 minutes! No columns are text: only int, float, bool and dates. I have seen cases where ODBC drivers set nvarchar(max) and this slows down the data transfer, but that cannot be the case here.
Any suggestions on how to speed up the export process? Taking 6 minutes to export 11 MB of data makes the ODBC connection practically unusable.
Thanks!
My code is:
import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, select
ServerName = "myserver"
Database = "mydatabase"
TableName = "mytable"
engine = create_engine('mssql+pyodbc://' + ServerName + '/' + Database)
conn = engine.connect()
metadata = MetaData(conn)
my_data_frame.to_sql(TableName, engine)
I recently had the same problem and feel like adding an answer to this for others.
to_sql seems to send an INSERT query for every row, which makes it really slow. But since pandas 0.24.0 there is a method parameter in DataFrame.to_sql() where you can define your own insertion function, or just use method='multi' to tell pandas to pass multiple rows in a single INSERT query, which makes it a lot faster.
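For reference, here is what the "own insertion function" route can look like. This is only a minimal sketch, assuming an mssql+pyodbc engine as in the question: pandas calls the function once per chunk with the signature (pd_table, conn, keys, data_iter), and the name insert_with_executemany is just a placeholder. The body below simply runs one executemany() per chunk via the raw DBAPI cursor, so treat it as a starting point for plugging in whatever bulk-load mechanism your driver offers:

def insert_with_executemany(pd_table, conn, keys, data_iter):
    # pd_table: pandas table wrapper, conn: SQLAlchemy connection,
    # keys: list of column names, data_iter: iterator over row tuples
    rows = list(data_iter)
    columns = ", ".join(keys)
    placeholders = ", ".join("?" for _ in keys)  # pyodbc-style placeholders
    sql = "INSERT INTO {} ({}) VALUES ({})".format(pd_table.name, columns, placeholders)
    # use the raw DBAPI cursor so the whole chunk is sent in one executemany() call
    cursor = conn.connection.cursor()
    cursor.executemany(sql, rows)

my_data_frame.to_sql(TableName, engine, method=insert_with_executemany)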
Note that your database may have a parameter limit. In that case you also have to define a chunksize.
So the solution should simply look like this:
my_data_frame.to_sql(TableName, engine, chunksize=<yourParameterLimit>, method='multi')
If you do not know your database's parameter limit, just try it without the chunksize parameter. It will either run or give you an error telling you your limit.
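For SQL Server specifically, the commonly cited cap is 2100 parameters per statement, and with method='multi' every row consumes one parameter per written column, so the chunksize needs to be at most that limit divided by the number of columns. A small sketch of the calculation (the 2100 figure is an assumption to verify against your driver and server):

PARAM_LIMIT = 2100                       # assumed SQL Server limit; verify for your setup
ncols = len(my_data_frame.columns) + 1   # +1 because to_sql also writes the index by default
safe_chunksize = PARAM_LIMIT // ncols    # e.g. 2100 // 13 = 161 rows per INSERT

my_data_frame.to_sql(TableName, engine, chunksize=safe_chunksize, method='multi')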