How to Load Data into Amazon Redshift via Python Boto3?

Chris · Jan 25, 2016 · Viewed 26.9k times

In Amazon Redshift's Getting Started Guide, data is pulled from Amazon S3 and loaded into an Amazon Redshift cluster using SQLWorkbench/J. I'd like to mimic that process, connecting to the cluster and loading sample data into it using Boto3.

However, in Boto3's Redshift documentation, I'm unable to find a method that would allow me to upload data into an Amazon Redshift cluster.

I've been able to connect to Redshift using Boto3 with the following code:

import boto3
client = boto3.client('redshift')

But I'm not sure what method would allow me to create tables or upload data to Amazon Redshift the way it's done in the tutorial with SQLWorkbench/J.

Answer

Alex B · Mar 29, 2016

Right, you need the psycopg2 Python module to execute the COPY command.

My code looks like this:

import psycopg2

# Amazon Redshift connect string
conn_string = "dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"
# connect to Redshift (the cluster's security group must allow access from this host)
con = psycopg2.connect(conn_string)

# Here:
#   to_table - target Redshift table
#   fn       - s3://path_to__input_file.gz
#   delim    - field delimiter, e.g. ','
#   quote    - optional quoting clause, or ''
#   gzip     - 'GZIP' for gzip-compressed input, else ''
sql = """COPY %s FROM '%s' credentials
    'aws_access_key_id=%s;aws_secret_access_key=%s'
    delimiter '%s' FORMAT CSV %s %s;""" % (
    to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, delim, quote, gzip)

cur = con.cursor()
cur.execute(sql)
con.commit()
con.close()
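A side note on why Boto3 alone isn't enough: its Redshift client only exposes cluster-management operations (describe_clusters, create_cluster, and so on), not SQL execution, which is why COPY has to go through a SQL driver like psycopg2. You can still use Boto3 to look up the endpoint instead of hard-coding the host. A minimal sketch, assuming a cluster identifier of mycluster (a placeholder):

import boto3
import psycopg2

# describe_clusters returns the endpoint needed for the psycopg2 connection
redshift = boto3.client('redshift')
endpoint = redshift.describe_clusters(
    ClusterIdentifier='mycluster')['Clusters'][0]['Endpoint']

con = psycopg2.connect(
    dbname='***', user='***', password='***',
    host=endpoint['Address'], port=endpoint['Port'])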

I used boto3/psycopg2 to write CSV_Loader_For_Redshift.
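If your input file is local rather than already in S3, Boto3 does handle that upload step; COPY only reads from AWS sources such as S3, not from the local filesystem directly. A rough sketch of the upload half, with the bucket and key names as placeholder assumptions:

import boto3

# Upload the local gzipped CSV to S3 so that COPY can read it
s3 = boto3.client('s3')
s3.upload_file('data.csv.gz', 'my-bucket', 'uploads/data.csv.gz')

# The COPY command above would then reference:
#   fn = 's3://my-bucket/uploads/data.csv.gz'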