Using psycopg2 with Lambda to Update Redshift (Python)

awsQuestion picture awsQuestion · Apr 13, 2016 · Viewed 24.2k times · Source

I am attempting to update Redshift from a Lambda function using python. To do this, I am attempting to combine 2 code fragments. Both fragments are functional when I run them separately.

  1. Updating Redshift from PyDev for Eclipse

    import psycopg2
    
    conn_string = "dbname='name' port='0000' user='name' password='pwd' host='url'"
    conn = psycopg2.connect(conn_string)
    
    cursor = conn.cursor()
    
    cursor.execute("UPDATE table SET attribute='new'")
    conn.commit()
    cursor.close()
    
  2. Receiving Content Uploaded to S3 Bucket (Pre-Built Template Available on Lambda)

    from __future__ import print_function
    
    import json
    import urllib
    import boto3
    
    print('Loading function')
    
    s3 = boto3.client('s3')
    
    
    def lambda_handler(event, context):
        #print("Received event: " + json.dumps(event, indent=2))
    
        # Get the object from the event and show its content type
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf8')
    
        try:
            response = s3.get_object(Bucket=bucket, Key=key)
            print("CONTENT TYPE: " + response['ContentType'])
            return response['ContentType']
    
        except Exception as e:
            print(e)
            print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
            raise e
    

Since both of these segments worked, I tried to combine them so that I could update Redshift upon the upload of a file to s3:

from __future__ import print_function

import json
import urllib
import boto3
import psycopg2

print('Loading function')

s3 = boto3.client('s3')


def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf8')

    conn_string = "dbname='name' port='0000' user='name' password='pwd' host='url'"

    conn = psycopg2.connect(conn_string)

    cursor = conn.cursor()

    cursor.execute("UPDATE table SET attribute='new'")
    conn.commit()
    cursor.close()

    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        print("CONTENT TYPE: " + response['Body'].read())
        return response['Body'].read()
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e

Since I am using an outside library, I need to create a deployment package. I created a new folder (lambda_function1) and moved my .py file (lambda_function1.py) to that folder. I ran the following command to install psycopg2 in that folder:

pip install psycopg2 -t \lambda_function1

I receive the following feedback:

Collecting psycopg2
  Using cached psycopg2-2.6.1-cp34-none-win_amd64.whl
Installing collected packages: psycopg2
Successfully installed psycopg2-2.6.1 

I then zipped the contents of the directory. And uploaded that zip to my lambda function. When I upload a document to the bucket the function monitors, I receive the following error in my cloudwatch log:

Unable to import module 'lambda_function1': No module named _psycopg 

When I look in the library, the only thing named "_psycopg" is "_psycopg.pyd".

What is causing this problem? Does it matter that Lambda uses Python 2.7 when I use 3.4? Does it matter that I zipped the contents of my file on a Windows machine? Has anyone been able to successfully connect to Redshift from lambda?

Answer

Vor picture Vor · Apr 13, 2016

In order for this to work you need to build psycopg2 with statically linked libpq.so library. Check out this repo https://github.com/jkehler/awslambda-psycopg2. It has already build psycopg2 package and instructions how to build it yourself.

Back to your questions:

What is causing this problem?

psycopg2 needs to be build an compiled with statically linked libraries for Linux.

Does it matter that Lambda uses Python 2.7 when I use 3.4?

Yes it does, lambda only supports 2.7 version. Just create virtual environment and install all necessary packages in there.

Does it matter that I zipped the contents of my file on a Windows machine?

As long as all the libraries you zipped could ran on Linux it doesn't

Has anyone been able to successfully connect to Redshift from lambda?

yes.