Using the SQLAlchemy ORM inside an Alembic migration: how do I?

Hank Gay picture Hank Gay · Dec 3, 2012 · Viewed 8.5k times · Source

I currently have a column that contains HTML markup. Inside that markup, there is a timestamp that I want to store in a new column (so I can query against it). My idea was to do the following in a single migration:

  1. Create a new, nullable column for the data
  2. Use the ORM to pull back the HTML I need to parse
  3. For each row
    1. parse the HTML to pull out the timestamp
    2. update the ORM object

But when I try to run my migration, it appears to be stuck in an infinite loop. Here's what I've got so far:

def _extract_publication_date(html):
    root = html5lib.parse(html, treebuilder='lxml', namespaceHTMLElements=False)
    publication_date_string = root.xpath("//a/@data-datetime")[0]
    return parse_date(publication_date)


def _update_tip(tip):
    tip.publication_date = _extract_publication_date(tip.rendered_html)
    tip.save()


def upgrade():
    op.add_column('tip', sa.Column('publication_date', sa.DateTime(timezone=True)))
    tips = Tip.query.all()
    map(tips, _update_tip)


def downgrade():
    op.drop_column('tip', 'publication_date')

Answer

killthrush picture killthrush · Mar 18, 2016

After a bit of experimentation using @velochy's answer, I settled on something like the following pattern for using SqlAlchemy inside Alembic. This worked great for me and could probably serve as a general solution for the OP's question:

from sqlalchemy.orm.session import Session
from alembic import op

# Copy the model definitions into the migration script if
# you want the migration script to be robust against later
# changes to the models. Also, if your migration includes
# deleting an existing column that you want to access as 
# part of the migration, then you'll want to leave that 
# column defined in the model copies here.
class Model1(Base): ...
class Model2(Base): ...

def upgrade():
    # Attach a sqlalchemy Session to the env connection
    session = Session(bind=op.get_bind())

    # Perform arbitrarily-complex ORM logic
    instance1 = Model1(foo='bar')
    instance2 = Model2(monkey='banana')

    # Add models to Session so they're tracked
    session.add(instance1)
    session.add(instance2)

    # Apply a transform to existing data
    m1s = session.query(Model1).all()
    for m1 in m1s:
        m1.foo = transform(m1.foo)
    session.commit()

def downgrade():
    # Attach a sqlalchemy Session to the env connection
    session = Session(bind=op.get_bind())

    # Perform ORM logic in downgrade (e.g. clear tables)
    session.query(Model2).delete()
    session.query(Model1).delete()

    # Revert transform of existing data
    m1s = session.query(Model1).all()
    for m1 in m1s:
        m1.foo = un_transform(m1.foo)
    session.commit()

This approach appears to handle transactions properly. Frequently while working on this, I would generate DB exceptions and they would roll things back as expected.