Athena query results at specific path on S3

siberiancrane picture siberiancrane · Nov 23, 2017 · Viewed 8.2k times · Source

I am aware that running a saved Athena query stores results in an Amazon S3 location based on the name of the query and the date the query ran, as follows:

QueryLocation}/{QueryName|Saved}/{yyyy}/{mm}/{dd}/{QueryID}/

Is it possible to override this and store it on a path similar to

QueryLocation}/QueryName

overwriting the results file whenever the query is run?

Surely it can be done by renaming (moving + deleting) file on S3 but wonder if there is a straighter way.

Answer

H6. picture H6. · May 14, 2019

Create Table as Select

Another way of storing Athena query results at a specific location in S3 is to use a CTAS-Query (CREATE TABLE AS SELECT).

Using this has tons of advantages, because you can even specify the result format. Gzipped JSON, Parquet etc...

CREATE TABLE default.my_result_table
WITH 
(
  format='JSON',
  external_location='s3://MY_BUCKET/MY_KEY/...'
) AS

SELECT * FROM MY_SOURCE_TABLE WHERE ....

You can even specify directly the partitions you want to use (for example

CREATE TABLE default.my_result_table
WITH 
(
  format='Parquet',
  external_location='s3://MY_BUCKET/MY_KEY/...',
  partitioned_by = ARRAY['parition_key_1', ...]
) AS

SELECT * FROM MY_SOURCE_TABLE WHERE ....