I would like to unload data files from Amazon Redshift to Amazon S3 in Apache Parquet format inorder to query the files on S3 using Redshift Spectrum. I have explored every where but I couldn't find anything about how to offload the files from Amazon Redshift to S3 using Parquet format. Is this feature not supported yet or was I not able to find any documentation about it. Could somebody who has worked on it share some light on this? Thank you.
Redshift Unload to Parquet file format is supported as of Dec 2019:
UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
FORMAT PARQUET
It is mentioned in Redshift Features
and also updated in Unload Document
with an example provided in the Unload Examples Document
Excerpt of the official documentation:
The following example unloads the LINEITEM table in Parquet format, partitioned by the l_shipdate column.
unload ('select * from lineitem')
to 's3://mybucket/lineitem/'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
PARQUET
PARTITION BY (l_shipdate);
Assuming four slices, the resulting Parquet files are dynamically partitioned into various folders.
s3://mybucket/lineitem/l_shipdate=1992-01-02/0000_part_00.parquet
0001_part_00.parquet
0002_part_00.parquet
0003_part_00.parquet
s3://mybucket/lineitem/l_shipdate=1992-01-03/0000_part_00.parquet
0001_part_00.parquet
0002_part_00.parquet
0003_part_00.parquet
s3://mybucket/lineitem/l_shipdate=1992-01-04/0000_part_00.parquet
0001_part_00.parquet
0002_part_00.parquet
0003_part_00.parquet