How to write a Parquet file from a pandas DataFrame to S3 in Python

Alexsander · Nov 21, 2018

I have a pandas DataFrame that I want to write to a Parquet file in S3. I need sample code for this. I tried searching for it, but I could not find a working example.

Answer

Wai Kiat · Nov 27, 2018

For reference, the following code works for me:

# df is an existing pandas DataFrame
s3_url = 's3://bucket/folder/bucket.parquet.gzip'
df.to_parquet(s3_url, compression='gzip')

To use to_parquet, you need either pyarrow or fastparquet installed. Also make sure the config and credentials files in your .aws folder contain the correct information.

Edit: s3fs is also required. See https://stackoverflow.com/a/54006942/1862909
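To make the requirements above easier to check, here is a minimal sketch that looks for the needed packages before attempting the write. The helper names missing_packages and write_parquet_to_s3 are hypothetical and not part of pandas; the only pandas call used is df.to_parquet, exactly as in the answer. pandas itself is assumed to be installed already, since df is a DataFrame.

```python
from importlib.util import find_spec
from typing import List


def missing_packages() -> List[str]:
    """Return descriptions of required packages that are not installed."""
    missing = []
    # to_parquet needs at least one Parquet engine.
    if find_spec("pyarrow") is None and find_spec("fastparquet") is None:
        missing.append("pyarrow or fastparquet")
    # pandas relies on s3fs to handle s3:// URLs.
    if find_spec("s3fs") is None:
        missing.append("s3fs")
    return missing


def write_parquet_to_s3(df, s3_url: str) -> None:
    """Write df to s3_url as gzip-compressed Parquet (hypothetical helper)."""
    problems = missing_packages()
    if problems:
        raise ImportError("Install first: " + ", ".join(problems))
    # Credentials are expected in the .aws folder, as described in the answer.
    df.to_parquet(s3_url, compression="gzip")
```

With an existing DataFrame df, the call would be write_parquet_to_s3(df, 's3://bucket/folder/bucket.parquet.gzip'). If a package is missing, the ImportError names it, which is usually clearer than the error raised from inside to_parquet.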