Amazon Aurora 1.8 Load Data From S3 - Cannot Instantiate S3 Client

aarbor picture aarbor · Oct 25, 2016 · Viewed 15.2k times · Source

With the latest Aurora update (1.8), the command LOAD DATA FROM S3 was introduced. Has anyone gotten this to work? After upgrading to 1.8, I followed the setup guide Here to create the Role to allow access from RDS to S3.

After rebooting the server and trying to run the command

LOAD DATA FROM S3 PREFIX 's3://<bucket_name>/prefix' INTO TABLE table_name

in SQL Workbench/J, I get the errors:

Warnings:
S3 API returned error: Missing Credentials: Cannot instantiate S3 Client
S3 API returned error: Failed to instantiate S3 Client
Internal error: Unable to initialize S3Stream

Are there any additional steps required? Can I only run this from the SDK? I don't see that mentioned anywhere in the documents

Answer

Dan Carrington picture Dan Carrington · Feb 17, 2017

I had the same issue. I tried adding AmazonS3FullAccess to the IAM role that my RDS instances were using...no joy.

After poking around, I went into the RDS console, to Clusters. Selected my Aurora cluster and clicked Manage IAM Roles. It gave me a drop-down, I selected the IAM role (same one that the individual instances were using).

Once I did that, all was well and data load was nice and fast.

So, there are (for us) 5 steps/components:

1) The S3 bucket and bucket policy to allow a user to upload the object

{
    "Version": "2012-10-17",
    "Id": "Policy1453918146601",
    "Statement": [
        {
            "Sid": "Stmt1453917898368",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<account id>:<user/group/role>/<IAM User/Group/Role>"
            },
            "Action": [
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::<bucket name>/*"
        }
    ]
}

The "Principal" would be whatever IAM user, group or role will be uploading the data files to the bucket so that the RDS instance can import the data.

2) The IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1486490368000",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket name>/*"
            ]
        }
    ]
}

This is pretty simple with the Policy Generator.

3) Create the IAM Role:

This role should be assigned the IAM policy above. You can probably do an inline policy, too, if you're not going to use this policy for other roles down the line, but I like the idea of having a defined policy that I can reference later if I have a need.

4) Configure a Parameter Group that your cluster/instances will use to set the aws_default_s3_role value to the ARN of the role from #3 above.

5) Configure the Aurora Cluster by going to Clusters, selecting your cluster, selecting Manage IAM Roles and setting the IAM Role for your DB Cluster

At least for me, these steps worked like a charm.

Hope that helps!