I want to migrate data from a large csv file to sqlite3 database.
My code on Python 3.5 using pandas:
con = sqlite3.connect(DB_FILENAME)
df = pd.read_csv(MLS_FULLPATH)
df.to_sql(con=con, name="MLS", if_exists="replace", index=False)
Is it possible to print current status (progress bar) of execution of to_sql method?
I looked the article about tqdm, but didn't find how to do this.
Unfortuantely DataFrame.to_sql
does not provide a chunk-by-chunk callback, which is needed by tqdm to update its status. However, you can process the dataframe chunk by chunk:
import sqlite3
import pandas as pd
from tqdm import tqdm
DB_FILENAME='/tmp/test.sqlite'
def chunker(seq, size):
# from http://stackoverflow.com/a/434328
return (seq[pos:pos + size] for pos in range(0, len(seq), size))
def insert_with_progress(df, dbfile):
con = sqlite3.connect(dbfile)
chunksize = int(len(df) / 10) # 10%
with tqdm(total=len(df)) as pbar:
for i, cdf in enumerate(chunker(df, chunksize)):
replace = "replace" if i == 0 else "append"
cdf.to_sql(con=con, name="MLS", if_exists=replace, index=False)
pbar.update(chunksize)
df = pd.DataFrame({'a': range(0,100000)})
insert_with_progress(df, DB_FILENAME)
Note I'm generating the DataFrame inline here for the sake of having a complete workable example without dependency.
The result is quite stunning: