I have a pandas DataFrame and I want to write it to a Parquet file in S3. I need sample code for this. I tried to google it, but I could not find a working sample.
For your reference, the following code works for me:
import pandas as pd
s3_url = 's3://bucket/folder/bucket.parquet.gzip'
df.to_parquet(s3_url, compression='gzip')  # df is your existing DataFrame
To use to_parquet, you need either pyarrow or fastparquet installed. Also, make sure the config and credentials files in your ~/.aws folder contain the correct information.
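A quick way to check these prerequisites before writing is sketched below. The function name missing_parquet_prereqs is hypothetical, and it only looks for the packages and the two files by name; it does not validate their contents. Note that AWS credentials can also come from environment variables or an attached IAM role, so a missing file is a hint rather than a guaranteed failure.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path


def missing_parquet_prereqs(home: Path | None = None) -> list[str]:
    """Return a list of human-readable problems; an empty list means all found."""
    problems = []
    # to_parquet needs at least one Parquet engine installed.
    if not any(importlib.util.find_spec(m) for m in ("pyarrow", "fastparquet")):
        problems.append("install pyarrow or fastparquet")
    # By default the AWS config and credentials files live in ~/.aws.
    aws_dir = (home or Path.home()) / ".aws"
    for name in ("config", "credentials"):
        path = aws_dir / name
        if not path.is_file():
            problems.append(f"missing {name} file: {path}")
    return problems
```

Calling print(missing_parquet_prereqs()) before df.to_parquet(...) makes a misconfigured machine obvious up front.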
Edit: Additionally, s3fs is needed, since pandas relies on it to handle s3:// URLs. See https://stackoverflow.com/a/54006942/1862909
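If you would rather not rely on the ~/.aws files, pandas 1.2 and later accept a storage_options dict that is forwarded to s3fs. The sketch below assumes the s3fs S3FileSystem keyword arguments key, secret and profile; s3_storage_options and write_parquet_s3 are hypothetical helper names, and it fails early with a clear message if s3fs is not installed.

```python
from __future__ import annotations

import importlib.util

import pandas as pd


def s3_storage_options(key: str | None = None, secret: str | None = None,
                       profile: str | None = None) -> dict:
    """Build storage_options for s3fs, keeping only the values actually given."""
    opts = {"key": key, "secret": secret, "profile": profile}
    return {k: v for k, v in opts.items() if v is not None}


def write_parquet_s3(df: pd.DataFrame, url: str, **creds) -> None:
    """Write df to an s3:// URL, passing credentials explicitly to s3fs."""
    # pandas delegates s3:// URLs to s3fs, so check for it before writing.
    if importlib.util.find_spec("s3fs") is None:
        raise ImportError("s3fs is required for s3:// URLs: pip install s3fs")
    # storage_options is forwarded to s3fs.S3FileSystem (pandas >= 1.2).
    df.to_parquet(url, compression="gzip",
                  storage_options=s3_storage_options(**creds))
```

For example, write_parquet_s3(df, 's3://bucket/folder/bucket.parquet.gzip', profile='dev') would use a named profile instead of the default one; 'dev' is only a placeholder.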