Athena date_parse for date with optional millisecond field

Scorpion picture Scorpion · Sep 12, 2018 · Viewed 10k times · Source

I have date in S3 using which I created an Athena table. I have some date entries in S3 in json format which Athena is not accepting as either Date or timestamp when am running the queries.

Using AWS Athena which uses Prestodb as query engine

Example json :


    {"creationdate":"2018-09-12T15:49:07.269Z", "otherfield":"value1"}
    {"creationdate":"2018-09-12T15:49:07Z", "otherfield":"value2"}

AWS Glue is taking both the fields as string and when am changing them to timestamp and date respectively the queries around timestamp are not working giving ValidationError on the timestamp field.

Anyway, I found a way to use prestodb date_parse function but its not working either since some fields have milliseconds while other not.


    parse_datetime(creationdate, '%Y-%m-%dT%H:%i:%s.%fZ')
    parse_datetime(creationdate, '%Y-%m-%dT%H:%i:%sZ')

Both are failing because of different entries present i.e. one with millisecond %f and one without Is there a way to provide a parser, regex so that am able to convert these strings into Date during sql query execution?

Answer

jens walter picture jens walter · Sep 13, 2018

Instead of providing the timestamp format, you can use the from_iso8601_timestamp function.

This way, all timestamps get parsed.

select from_iso8601_timestamp(creationdate) from table1;