How to extract keys in a nested json array object in Presto?

k-kawa · Sep 9, 2015

I'm using the latest Presto (0.117) and trying to execute CROSS JOIN UNNEST on a complex JSON array like this:

[{"id": 1, "value":"xxx"}, {"id":2, "value":"yy"}, ...]

To do that, I first tried to build an ARRAY of the id values with

SELECT CAST(JSON_EXTRACT('[{"id": 1, "value":"xxx"}, {"id":2, "value":"yy"}]', '$..id') AS ARRAY<BIGINT>)

but it doesn't work.

What is the best JSON Path to extract the values of id?

Answer

Attila Dobi · May 29, 2018

This will solve your problem. It uses a more generic cast to an ARRAY of JSON, which is less prone to errors when the objects have an arbitrary map structure:

select 
        TRANSFORM(CAST(JSON_PARSE(arr1) AS ARRAY<JSON>), 
                   x -> JSON_EXTRACT_SCALAR(x, '$.id'))
from 
       (values ('[{"id": 1, "value":"xxx"}, {"id":2, "value":"yy"}]')) t(arr1)

Output in Presto (the elements are VARCHAR, because JSON_EXTRACT_SCALAR returns VARCHAR):

 [1,2]
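Since the question asks about CROSS JOIN UNNEST, here is a sketch of the same cast used with UNNEST instead of TRANSFORM, producing one row per array element. It also casts id to BIGINT, because JSON_EXTRACT_SCALAR returns VARCHAR. Presto's JSON path support is limited (a recursive path such as `$..id` does not appear to be supported), so this avoids it by extracting `$.id` from each element separately. It assumes a Presto version where CAST(JSON AS ARRAY<JSON>) works, as in the answer above; newer releases document the type as ARRAY(JSON). The aliases `elem`, `id` and `item_value` are illustrative names only.

```sql
-- Sketch: one output row per element of the JSON array.
-- The inline VALUES row stands in for a real table column (arr1).
SELECT
    -- JSON_EXTRACT_SCALAR returns VARCHAR, so cast id to BIGINT explicitly
    CAST(JSON_EXTRACT_SCALAR(elem, '$.id') AS BIGINT) AS id,
    JSON_EXTRACT_SCALAR(elem, '$.value') AS item_value
FROM (VALUES ('[{"id": 1, "value":"xxx"}, {"id":2, "value":"yy"}]')) AS t(arr1)
-- Parse the string, cast it to an array of JSON values, and unnest it
CROSS JOIN UNNEST(CAST(JSON_PARSE(arr1) AS ARRAY<JSON>)) AS u(elem)
```

If you only need the array of ids, the TRANSFORM form above is shorter. UNNEST is the better fit when you want to join or filter on individual elements.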

... I ran into a situation where a list of JSON objects was nested inside another JSON document, and the objects in that list had an ambiguous nested map structure. The following code returns an array of the values for a specific key across such a list:

  1. Extract the list using JSON_EXTRACT.
  2. Cast the list to an array of JSON.
  3. Loop through the JSON elements of the array with the TRANSFORM function and extract the value of the key you are interested in.


TRANSFORM(CAST(JSON_EXTRACT(json, '$.path.toListOfJSONs') AS ARRAY<JSON>),
          x -> JSON_EXTRACT_SCALAR(x, '$.id')) as id
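Here is a self-contained sketch of the three steps against a made-up nested document. The column name `doc`, the path `$.payload.items`, and the `meta` field are hypothetical placeholders for your real column and path. The `meta` values differ in shape between elements, which the ARRAY<JSON> cast tolerates. An element that lacks the key yields NULL, because JSON_EXTRACT_SCALAR returns NULL when the path does not exist.

```sql
-- Hypothetical nested document: the list lives under $.payload.items,
-- and each element carries extra nested structure besides "id".
SELECT
    TRANSFORM(
        -- Steps 1 and 2: extract the nested list and cast it to an array of JSON
        CAST(JSON_EXTRACT(doc, '$.payload.items') AS ARRAY<JSON>),
        -- Step 3: pull the "id" value out of each element (as VARCHAR)
        x -> JSON_EXTRACT_SCALAR(x, '$.id')
    ) AS ids
FROM (VALUES ('{"payload": {"items": [{"id": 1, "meta": {"a": 1}}, {"id": 2, "meta": {"b": [1, 2]}}]}}')) AS t(doc)
```

Replace the VALUES clause with your own table and swap `$.payload.items` for the path to your list.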