How do you make a HIVE table out of JSON data?

nickponline picture nickponline · Jul 14, 2012 · Viewed 93.1k times · Source

I want to create a Hive table out of some JSON data (nested) and run queries on it? Is this even possible?

I've gotten as far as uploading the JSON file to S3 and launching an EMR instance but I don't know what to type in the hive console to get the JSON file to be a Hive table?

Does anyone have some example command to get me started, I can't find anything useful with Google ...

Answer

Mike Repass picture Mike Repass · Nov 13, 2012

It's actually not necessary to use the JSON SerDe. There is a great blog post here (I'm not affiliated with the author in any way):

http://pkghosh.wordpress.com/2012/05/06/hive-plays-well-with-json/

Which outlines a strategy using the builtin-function json_tuple to parse the json at time of query (NOT at the time of table definition):

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-json_tuple

So basically, your table schema is simply to load each line as a single 'string' column and then extract the relevant json fields as needed on a per query basis. e.g. this query from that blog post:

SELECT b.blogID, c.email FROM comments a LATERAL VIEW json_tuple(a.value, 'blogID', 'contact') b 
AS blogID, contact  LATERAL VIEW json_tuple(b.contact, 'email', 'website') c 
AS email, website WHERE b.blogID='64FY4D0B28';

In my humble experience, this has proven more reliable (I encountered various cryptic issues dealing with the JSON serdes, especially with nested objects).