How to Load Data into Amazon Redshift via Python Boto3?

Chris · Jan 25, 2016 · Viewed 26.9k times

In Amazon Redshift's Getting Started Guide, data is pulled from Amazon S3 and loaded into an Amazon Redshift cluster using SQLWorkbench/J. I'd like to mimic that process, connecting to the cluster and loading sample data into it using Boto3.

However, in Boto3's Redshift documentation, I'm unable to find a method that would allow me to upload data into an Amazon Redshift cluster.

I've been able to connect to Redshift using Boto3 with the following code:

import boto3
client = boto3.client('redshift')

But I'm not sure what method would allow me to create tables or upload data to Amazon Redshift the way it's done in the tutorial with SQLWorkbench/J.

Answer

Alex B · Mar 29, 2016

Right, you need the psycopg2 Python module to execute the COPY command.

My code looks like this:

import psycopg2

# Amazon Redshift connect string
conn_string = "dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"
# connect to Redshift (the cluster's security group must allow access from this host)
con = psycopg2.connect(conn_string)

# Here:
#   to_table - target Redshift table
#   fn       - s3://path_to__input_file.gz
#   delim    - field delimiter, e.g. ','
#   quote    - optional quoting clause, or ''
#   gzip     - 'GZIP' for gzip-compressed input, else ''
sql = """COPY %s FROM '%s' credentials
    'aws_access_key_id=%s;aws_secret_access_key=%s'
    delimiter '%s' FORMAT CSV %s %s;""" % (
    to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, delim, quote, gzip)

cur = con.cursor()
cur.execute(sql)
con.commit()
con.close()
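A side note on why Boto3 alone isn't enough: its Redshift client only exposes cluster-management operations (describe_clusters, create_cluster, and so on), not SQL execution, which is why COPY has to go through a SQL driver like psycopg2. You can still use Boto3 to look up the endpoint instead of hard-coding the host. A minimal sketch, assuming a cluster identifier of mycluster (a placeholder):

import boto3
import psycopg2

# describe_clusters returns the endpoint needed for the psycopg2 connection
redshift = boto3.client('redshift')
endpoint = redshift.describe_clusters(
    ClusterIdentifier='mycluster')['Clusters'][0]['Endpoint']

con = psycopg2.connect(
    dbname='***', user='***', password='***',
    host=endpoint['Address'], port=endpoint['Port'])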

I used boto3/psycopg2 to write CSV_Loader_For_Redshift.
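If your input file is local rather than already in S3, Boto3 does handle that upload step; COPY only reads from AWS sources such as S3, not from the local filesystem directly. A rough sketch of the upload half, with the bucket and key names as placeholder assumptions:

import boto3

# Upload the local gzipped CSV to S3 so that COPY can read it
s3 = boto3.client('s3')
s3.upload_file('data.csv.gz', 'my-bucket', 'uploads/data.csv.gz')

# The COPY command above would then reference:
#   fn = 's3://my-bucket/uploads/data.csv.gz'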