How to query Parquet data from Amazon Athena?

Question 1

amazon-web-services parquet amazon-athena

rajeswari · Mar 14, 2017 · Viewed 10.5k times

Answer

If your data has been successfully stored in Parquet format, create a table definition that references those files: declare it STORED AS PARQUET and point LOCATION at the S3 prefix that holds them.

Here is an example statement that uses Parquet files:

CREATE EXTERNAL TABLE IF NOT EXISTS elb_logs_pq (
  request_timestamp string,
  elb_name string,
  request_ip string,
  request_port int,
  ...
  ssl_protocol string )
PARTITIONED BY(year int, month int, day int) 
STORED AS PARQUET
LOCATION 's3://athena-examples/elb/parquet/'
tblproperties ("parquet.compress"="SNAPPY");

This example was taken from the AWS blog post "Analyzing Data in S3 using Amazon Athena", which does an excellent job of explaining the benefits of using compressed and partitioned data in Amazon Athena.
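
Because the table above is declared with PARTITIONED BY, queries against it return no rows until its partitions are registered in the catalog. The following is a minimal sketch of that step, assuming the files sit under Hive-style prefixes such as year=2015/month=1/day=1/ beneath the LOCATION; if they do not, each partition would have to be added with ALTER TABLE ... ADD PARTITION instead. The partition values in the query are illustrative only. Athena runs one statement per query, so these would be submitted separately.

```sql
-- Scan the table's LOCATION and register any Hive-style partitions found there.
MSCK REPAIR TABLE elb_logs_pq;

-- Confirm which partitions are now known to the table.
SHOW PARTITIONS elb_logs_pq;

-- Filter on the partition columns so only the matching S3 prefixes are read.
-- The values below are placeholders, not taken from the original post.
SELECT elb_name, count(*) AS request_count
FROM elb_logs_pq
WHERE year = 2015 AND month = 1 AND day = 1
GROUP BY elb_name
LIMIT 10;
```

If a Parquet-backed table still returns an empty result, two things are usually worth checking: whether the table was actually created with STORED AS PARQUET (rather than reusing a JSON table definition), and whether the column names in the DDL match those in the Parquet schema.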

Question 2

Athena creates a temporary table from the fields of the data stored in S3. I have done this with JSON data. Could you help me create a table from Parquet data?

I have tried the following:

Converted the sample JSON data to Parquet.
Uploaded the Parquet data to S3.
Created a temporary table using the columns of the JSON data.

By doing this I am able to execute a query, but the result is empty.

Is this approach right, or is there another approach that should be followed for Parquet data?

Sample JSON data:

{"_id":"0899f824e118d390f57bc2f279bd38fe","_rev":"1-81cc25723e02f50cb6fef7ce0b0f4f38","deviceId":"BELT001","timestamp":"2016-12-21T13:04:10:066Z","orgid":"fedex","locationId":"LID001","UserId":"UID001","SuperviceId":"SID001"},
{"_id":"0899f824e118d390f57bc2f279bd38fe","_rev":"1-81cc25723e02f50cb6fef7ce0b0f4f38","deviceId":"BELT001","timestamp":"2016-12-21T13:04:10:066Z","orgid":"fedex","locationId":"LID001","UserId":"UID001","SuperviceId":"SID001"}
