How can I populate a pandas DataFrame with the result of a Snowflake sql query?

RubenLaguna picture RubenLaguna · Nov 2, 2018 · Viewed 7.7k times · Source

Using the Python Connector I can query Snowflake:

import snowflake.connector

# Gets the version
ctx = snowflake.connector.connect(
    user=USER,
    password=PASSWORD,
    account=ACCOUNT,
    authenticator='https://XXXX.okta.com',
    )
ctx.cursor().execute('USE warehouse MY_WH')
ctx.cursor().execute('USE MYDB.MYSCHEMA')


query = '''
select * from MYDB.MYSCHEMA.MYTABLE
LIMIT 10;
'''

cur = ctx.cursor().execute(query)

The result is a snowflake.connector.cursor.SnowflakeCursor. How can I convert that to a pandas DataFrame?

Answer

RubenLaguna picture RubenLaguna · Nov 2, 2018

You can use DataFrame.from_records() or pandas.read_sql() with snowflake-sqlalchemy. The snowflake-alchemy option has a simpler API

pd.DataFrame.from_records(iter(cur), columns=[x[0] for x in cur.description])

will return a DataFrame with proper column names taken from the SQL result. The iter(cur) will convert the cursor into an iterator and cur.description gives the names and types of the columns.

So the complete code will be

import snowflake.connector
import pandas as pd

# Gets the version
ctx = snowflake.connector.connect(
    user=USER,
    password=PASSWORD,
    account=ACCOUNT,
    authenticator='https://XXXX.okta.com',
    )
ctx.cursor().execute('USE warehouse MY_WH')
ctx.cursor().execute('USE MYDB.MYSCHEMA')


query = '''
select * from MYDB.MYSCHEMA.MYTABLE
LIMIT 10;
'''

cur = ctx.cursor().execute(query)
df = pd.DataFrame.from_records(iter(cur), columns=[x[0] for x in cur.description])

If you prefer using pandas.read_sql then you can

import pandas as pd

from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL


url = URL(
    account = 'xxxx',
    user = 'xxxx',
    password = 'xxxx',
    database = 'xxx',
    schema = 'xxxx',
    warehouse = 'xxx',
    role='xxxxx',
    authenticator='https://xxxxx.okta.com',
)
engine = create_engine(url)


connection = engine.connect()

query = '''
select * from MYDB.MYSCHEMA.MYTABLE
LIMIT 10;
'''

df = pd.read_sql(query, connection)