How can I directly access AWS S3 buckets as a dataset in RStudio Server?

Josh Beauregard · Mar 19, 2016

I have multiple S3 buckets in an AWS account, and I also have an EC2 instance running RStudio Pro. I would like to access my S3 buckets, each of which holds several terabytes of data.

I would like to set up RStudio to mount the buckets as datasets without having to copy the whole thing onto an EBS volume every time before reading it.

Any help would be great.

Answer

spsaaibi · Mar 19, 2016

You could try the aws.s3 package from the cloudyr project: https://github.com/cloudyr/aws.s3.

Assuming your data is in a private bucket, you can list its contents as follows:

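# Newer aws.s3 releases renamed getbucket() to get_bucket().
# Replace the placeholder strings with your own credentials.
aws.s3::get_bucket(
  bucket = 'hpk',
  key = "YOUR_AWS_ACCESS_KEY",
  secret = "YOUR_AWS_SECRET_ACCESS_KEY"
)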
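The call above only lists object metadata; it does not read any data. One way to avoid copying a whole bucket to EBS is to fetch individual objects on demand. This is a minimal sketch, not a true mount: each object read is still downloaded to a temporary file before it is parsed. It assumes a recent aws.s3 release that provides get_bucket_df() and s3read_using(), credentials supplied through the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION environment variables (for example in ~/.Renviron), and CSV objects. The helper names list_keys() and read_s3_object(), and the prefix and keys in the usage lines, are hypothetical.

```r
# Sketch only: helper names, prefix and keys are hypothetical.
# Assumes aws.s3 is installed and credentials are available via the
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_DEFAULT_REGION env vars.

# Return the object keys stored under `prefix` in `bucket`.
# max = Inf asks aws.s3 to page through the listing past 1000 keys.
list_keys <- function(bucket, prefix = NULL) {
  df <- aws.s3::get_bucket_df(bucket = bucket, prefix = prefix, max = Inf)
  df$Key
}

# Fetch one object to a temporary file and parse it with `reader`.
# Only this object is transferred, not the rest of the bucket.
# Extra arguments in `...` are passed on to `reader`.
read_s3_object <- function(key, bucket, reader = utils::read.csv, ...) {
  aws.s3::s3read_using(FUN = reader, ..., object = key, bucket = bucket)
}

# Usage (hypothetical prefix):
# keys <- list_keys("hpk", prefix = "2016/")
# df   <- read_s3_object(keys[1], bucket = "hpk")
```

Narrowing the listing with a prefix keeps it manageable on multi-terabyte buckets, and reading one file at a time means memory use is bounded by the largest object you load rather than by the bucket size.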

Hopefully this helps you access the data in your buckets. You can also try the aws.ec2 package to work with your EC2 instance from R.