Creating seed data in a flask-migrate or alembic migration

Mark Richman picture Mark Richman · Oct 12, 2013 · Viewed 28.9k times · Source

How can I insert some seed data in my first migration? If the migration is not the best place for this, then what is the best practice?

"""empty message

Revision ID: 384cfaaaa0be
Revises: None
Create Date: 2013-10-11 16:36:34.696069

"""

# revision identifiers, used by Alembic.
revision = '384cfaaaa0be'
down_revision = None

from alembic import op
import sqlalchemy as sa


def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.create_table('list_type',
    sa.Column('id', sa.Integer(), nullable=False),
    sa.Column('name', sa.String(length=80), nullable=False),
    sa.PrimaryKeyConstraint('id'),
    sa.UniqueConstraint('name')
    )
    op.create_table('job',
    sa.Column('id', sa.Integer(), nullable=False),
    sa.Column('list_type_id', sa.Integer(), nullable=False),
    sa.Column('record_count', sa.Integer(), nullable=False),
    sa.Column('status', sa.Integer(), nullable=False),
    sa.Column('sf_job_id', sa.Integer(), nullable=False),
    sa.Column('created_at', sa.DateTime(), nullable=False),
    sa.Column('compressed_csv', sa.LargeBinary(), nullable=True),
    sa.ForeignKeyConstraint(['list_type_id'], ['list_type.id'], ),
    sa.PrimaryKeyConstraint('id')
    )
    ### end Alembic commands ###

    # ==> INSERT SEED DATA HERE <==


def downgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('job')
    op.drop_table('list_type')
    ### end Alembic commands ###

Answer

Mark Hildreth picture Mark Hildreth · Oct 12, 2013

Alembic has, as one of its operation, bulk_insert(). The documentation gives the following example (with some fixes I've included):

from datetime import date
from sqlalchemy.sql import table, column
from sqlalchemy import String, Integer, Date
from alembic import op

# Create an ad-hoc table to use for the insert statement.
accounts_table = table('account',
    column('id', Integer),
    column('name', String),
    column('create_date', Date)
)

op.bulk_insert(accounts_table,
    [
        {'id':1, 'name':'John Smith',
                'create_date':date(2010, 10, 5)},
        {'id':2, 'name':'Ed Williams',
                'create_date':date(2007, 5, 27)},
        {'id':3, 'name':'Wendy Jones',
                'create_date':date(2008, 8, 15)},
    ]
)

Note too that the alembic has an execute() operation, which is just like the normal execute() function in SQLAlchemy: you can run any SQL you wish, as the documentation example shows:

from sqlalchemy.sql import table, column
from sqlalchemy import String
from alembic import op

account = table('account',
    column('name', String)
)
op.execute(
    account.update().\
        where(account.c.name==op.inline_literal('account 1')).\
        values({'name':op.inline_literal('account 2')})
        )

Notice that the table that is being used to create the metadata that is used in the update statement is defined directly in the schema. This might seem like it breaks DRY (isn't the table already defined in your application), but is actually quite necessary. If you were to try to use the table or model definition that is part of your application, you would break this migration when you make changes to your table/model in your application. Your migration scripts should be set in stone: a change to a future version of your models should not change migrations scripts. Using the application models will mean that the definitions will change depending on what version of the models you have checked out (most likely the latest). Therefore, you need the table definition to be self-contained in the migration script.

Another thing to talk about is whether you should put your seed data into a script that runs as its own command (such as using a Flask-Script command, as shown in the other answer). This can be used, but you should be careful about it. If the data you're loading is test data, then that's one thing. But I've understood "seed data" to mean data that is required for the application to work correctly. For example, if you need to set up records for "admin" and "user" in the "roles" table. This data SHOULD be inserted as part of the migrations. Remember that a script will only work with the latest version of your database, whereas a migration will work with the specific version that you are migrating to or from. If you wanted a script to load the roles info, you could need a script for every version of the database with a different schema for the "roles" table.

Also, by relying on a script, you would make it more difficult for you to run the script between migrations (say migration 3->4 requires that the seed data in the initial migration to be in the database). You now need to modify Alembic's default way of running to run these scripts. And that's still not ignoring the problems with the fact that these scripts would have to change over time, and who knows what version of your application you have checked out from source control.