Return Pandas dataframe from PostgreSQL query with sqlalchemy

lmart999 picture lmart999 · Jan 11, 2015 · Viewed 67.7k times · Source

I want to query a PostgreSQL database and return the output as a Pandas dataframe.

I created a connection to the database with 'SqlAlchemy':

from sqlalchemy import create_engine
engine = create_engine('postgresql://user@localhost:5432/mydb')

I write a Pandas dataframe to a database table:

i=pd.read_csv(path)
i.to_sql('Stat_Table',engine,if_exists='replace')

Based on the docs, looks like pd.read_sql_query() should accept a SQLAlchemy engine:

a=pd.read_sql_query('select * from Stat_Table',con=engine)

But it throws an error:

ProgrammingError: (ProgrammingError) relation "stat_table" does not exist

I'm using Pandas version 0.14.1.

What's the right way to do this?

Answer

joris picture joris · Jan 11, 2015

You are bitten by the case (in)sensitivity issues with PostgreSQL. If you quote the table name in the query, it will work:

df = pd.read_sql_query('select * from "Stat_Table"',con=engine)

But personally, I would advise to just always use lower case table names (and column names), also when writing the table to the database to prevent such issues.


From the PostgreSQL docs (http://www.postgresql.org/docs/8.0/static/sql-syntax.html#SQL-SYNTAX-IDENTIFIERS):

Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case

To explain a bit more: you have written a table with the name Stat_Table to the database (and sqlalchemy will quote this name, so it will be written as "Stat_Table" in the postgres database). When doing the query 'select * from Stat_Table' the unquoted table name will be converted to lower case stat_table, and so you get the message that this table is not found.

See eg also Are PostgreSQL column names case-sensitive?