For Loop or executemany - Python and SQLite3

Simon Roadknight picture Simon Roadknight · May 4, 2017 · Viewed 8.5k times · Source

I have started to learn Python and SQL recently and have a question.

Using Python with SQLite3 I have written the following code:

# Use sqlite3 in the file
import sqlite3

# Create people.db if it doesn't exist or connect to it if it does exist
with sqlite3.connect("people.db") as connection:
    c = connection.cursor()

    # Create new table called people
    c.execute("""CREATE TABLE IF NOT EXISTS people(firstname TEXT, lastname TEXT, age INT, occupation TEXT)""")


    people_list = [
        ('Simon', 'Doe', 20, 'Python Master'),
        ('John', 'Doe', 50, 'Java Master'),
        ('Jane', 'Doe', 30, 'C++ Master'),
        ('Smelly', 'Doe', 2, 'Shower Master')
    ]

    # Insert dummy data into the table
    c.executemany("""INSERT INTO people(firstname, lastname, age, occupation) VALUES(?, ?, ?, ?)""", people_list)

I noticed that I can do the same thing using a for loop instead of executemany like so:

# Use sqlite3 in the file
import sqlite3

# Create people.db if it doesn't exist or connect to it if it does exist
with sqlite3.connect("people.db") as connection:
    c = connection.cursor()

# Create new table called people
c.execute("""CREATE TABLE IF NOT EXISTS people(firstname TEXT, lastname TEXT, age INT, occupation TEXT)""")


people_list = [
    ('Simon', 'Doe', 20, 'Python Master'),
    ('John', 'Doe', 50, 'Java Master'),
    ('Jane', 'Doe', 30, 'C++ Master'),
    ('Smelly', 'Doe', 2, 'Shower Master')
]

# Insert dummy data into the table
for person in people_list:
    c.execute("""INSERT INTO people(firstname, lastname, age, occupation) VALUES(?, ?, ?, ?)""", person)

I'm just wondering which one is more efficient and used more often?

Answer

user1919238 picture user1919238 · May 4, 2017

A batch insert with executemany will be more efficient, and the performance difference will typically be quite large as your number of records gets large. There is a significant overhead to executing an insert statement, which you incur over and over again if you insert one row at a time.

You should prefer a batch insert whenever it is straightforward to do so.

In some cases, it may be much simpler to code an algorithm that inserts one row at a time. For example, if you are receiving data from somewhere one item at a time, it may be simpler to just insert it as you get it, rather than building up a list and doing a single insert when you have saved up a lot of data. In such a case, you need to consider the tradeoffs between performance and simplicity of code. It's often better to start off with the simple approach (insert one row at a time) and then only optimize to something else if you find it's not performant.