I have a lot of gzip'd log files in S3 that contain 3 types of log lines: b, c, and i. Types i and c are both single-level JSON:
{"this":"that","test":"4"}
Type b is deeply nested JSON. I came across this gist, which talks about compiling a jar to make this work. Since my Java skills are less than stellar, I didn't really know where to go from there.
{"this":{"foo":"bar","baz":{"test":"me"},"total":"5"}}
Since the fields in types i and c are not always in the same order, specifying everything in the generate regex is difficult. Is handling JSON (in a gzip'd file) possible with Pig? I am using whichever version of Pig comes preinstalled on an Amazon Elastic MapReduce instance.
This boils down to two questions: 1) Can I parse JSON with Pig (and if so, how)? 2) If I can parse JSON (from a gzip'd logfile), can I parse nested JSON objects?
Pig 0.10 and later come with built-in JsonLoader and JsonStorage functions, so no custom jar is needed for basic JSON.
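Here is a rough sketch of how that might look for the two record shapes in the question. The bucket and paths are placeholders I made up, and it assumes each log type sits under its own path (if the types are mixed in one file you would need to split them first). It also assumes the loader reads `.gz` input transparently the way Pig's text-based loaders usually do, so treat it as a starting point rather than a tested script.

```pig
-- Hypothetical paths: replace with your own bucket and prefixes.

-- Flat records (types i and c): declare each JSON key as a field.
flat = LOAD 's3://example-bucket/logs/flat/*.gz'
       USING JsonLoader('this:chararray, test:chararray');

-- Nested records (type b): declare nested JSON objects as tuples.
nested = LOAD 's3://example-bucket/logs/nested/*.gz'
         USING JsonLoader('this:(foo:chararray, baz:(test:chararray), total:chararray)');

-- Reach into the nested tuples with dot notation.
totals = FOREACH nested GENERATE
             this.foo      AS foo,
             this.baz.test AS test,
             this.total    AS total;

DUMP totals;
```

One caveat: as far as I know, the built-in loader in older releases matches fields against the schema in order, so lines whose keys show up in a different order may not load cleanly. It is worth running this against a small sample of your i and c lines before relying on it.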