How can pandas.read_sql_query() query a TEMP table?

DavidJ picture DavidJ · Oct 9, 2014 · Viewed 9.3k times · Source

I'm in the process of converting Python code over to the new SQLAlchemy-based Pandas 0.14.1.

A common pattern we use is (generically):

connection = db.connect()  # open connection/session

sql = 'CREATE TEMP TABLE table1 AS SELECT ...'
connection.execute(sql)

... other sql that creates TEMP tables from various joins of previous TEMP tables ...

sql = 'CREATE TEMP TABLE tableN AS SELECT ...'
connection.execute(sql)

result = connection.query('SELECT * FROM tableN WHERE ...')

connection.close()

Now, once the connection is closed the TEMP tables are purged by the DB server. However, as the final select query is using the same connection/session, it can access the tables.

How can I achieve similar using SQLAlchemy and pd.read_sql_query() ?

For example:

engine = sqlalchemy.create_engine('netezza://@mydsn')
connection = engine.connect()

sql = 'CREATE TEMP TABLE tmptable AS SELECT ...'
connection.execute(sql)

result = pd.read_sql_query('SELECT * FROM tmptable WHERE ...', engine)

yields a DB error that the TEMP table tmptable doesn't exist. Presumably this is because passing the engine to the read_sql_query() requires it to open a new connection which has an independent session scope and hence can't see the TEMP table. Is that a reasonable assumption?

Is there a way to work around that? (passing the connection to read_sql_query() isn't supported)

(I know that I can concatenate the SQL into a single string with ; separating the statements, but this is a simplification of the actual situation where the TEMP tables are created by a multitude of functions which call others nesting 3-4 deep. So, to achieve that would require implementing a layer than can coalesce the SQL across multiple calls before issuing it, which I'd rather avoid implementing if there is a nicer way)

Using -
Pandas: 0.14.1
sqlalchemy: 0.9.7
pyodbc: 3.0.6
Win7 x86_64 Canopy Python distribution (Python 2.7.6)
Josh Kuhn's Netezza SQLAlchemy dialect from https://github.com/deontologician/netezza_sqlalchemy

Answer

ssharma picture ssharma · Dec 28, 2018

You can now pass SQLAlchemy connectable to pandas.read_sql. From the docs:

pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)

...

con : SQLAlchemy connectable (engine/connection) or database string URI

or DBAPI2 connection (fallback mode)

Using SQLAlchemy makes it possible to use any DB supported by that > library. If a DBAPI2 object, only sqlite3 is supported.

So, this should work:

engine = sqlalchemy.create_engine('netezza://@mydsn')
connection = engine.connect()

sql = 'CREATE TEMP TABLE tmptable AS SELECT ...'
connection.execute(sql)

result = pd.read_sql('SELECT * FROM tmptable WHERE ...', con=connection)