Inspect Parquet from command line

sds · Mar 21, 2016

How do I inspect the content of a Parquet file from the command line?

The only option I see now is

$ hadoop fs -get my-path local-file
$ parquet-tools head local-file | less

I would like to

  1. avoid creating the local-file, and
  2. view the file content as JSON rather than the typeless text that parquet-tools prints.

Is there an easy way?

Answer

gil.fernandes · Nov 14, 2017

You can use the parquet-tools cat command with the --json option. It reads the hdfs:// path directly, so no local copy is needed, and it prints the records as JSON.

Here is an example:

parquet-tools cat --json hdfs://localhost/tmp/save/part-r-00000-6a3ccfae-5eb9-4a88-8ce8-b11b2644d5de.gz.parquet

This prints each record as a JSON object, one per line:

{"name":"gil","age":48,"city":"london"}
{"name":"jane","age":30,"city":"new york"}
{"name":"jordan","age":18,"city":"toronto"}

Disclaimer: this was tested on Cloudera CDH 5.12.0.
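The output above has one JSON object per line. That format is newline-delimited JSON, so other tools can post-process it at the end of the pipe without writing anything to local disk. Below is a minimal Python sketch that uses only the standard library. It is not part of parquet-tools. The module name `jsonl_filter.py`, the `print_matching` helper, and its `age` filter are all hypothetical and were chosen only to match the sample records.

```python
import json
import sys
from typing import Iterable, Iterator, Optional, TextIO


def read_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)


def print_matching(lines: Iterable[str], min_age: int = 21,
                   out: Optional[TextIO] = None) -> None:
    """Pretty-print records whose "age" field is at least min_age.

    Hypothetical filter: the "age" field and the default threshold only
    mirror the sample output above, not any particular Parquet schema.
    """
    stream = out if out is not None else sys.stdout
    for record in read_records(lines):
        if record.get("age", 0) >= min_age:
            # indent/sort_keys give a stable, readable layout per record
            stream.write(json.dumps(record, indent=2, sort_keys=True) + "\n")
```

Save the file as `jsonl_filter.py` (the hypothetical name used above) and you can pipe into it: `parquet-tools cat --json <path> | python3 -c 'import sys, jsonl_filter; jsonl_filter.print_matching(sys.stdin)'`. The helpers take any iterable of lines, so they also work on `sys.stdin` or on a list of strings in a test.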