How do I inspect the content of a Parquet file from the command line?
The only option I see now is
$ hadoop fs -get my-path local-file
$ parquet-tools head local-file | less
I would like to avoid creating the local-file and view the file content as JSON rather than the typeless text that parquet-tools prints. Is there an easy way?
You can use parquet-tools with the cat command and the --json option to view the files in JSON format without making a local copy.
Here is an example:
parquet-tools cat --json hdfs://localhost/tmp/save/part-r-00000-6a3ccfae-5eb9-4a88-8ce8-b11b2644d5de.gz.parquet
This prints out the data in JSON format:
{"name":"gil","age":48,"city":"london"}
{"name":"jane","age":30,"city":"new york"}
{"name":"jordan","age":18,"city":"toronto"}
Disclaimer: this was tested on Cloudera CDH 5.12.0.
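In the sample output, each record is a standalone JSON object on its own line (JSON Lines), so any JSON parser can consume it and the value types survive. The minimal sketch below uses only the Python standard library. It assumes the output has the one-object-per-line shape shown, and it pastes that sample in as a string instead of running parquet-tools:

```python
import json

# Sample output from the `parquet-tools cat --json` run shown above:
# one JSON object per line (JSON Lines).
output = """{"name":"gil","age":48,"city":"london"}
{"name":"jane","age":30,"city":"new york"}
{"name":"jordan","age":18,"city":"toronto"}"""

# Parse each non-empty line as an independent JSON document.
records = [json.loads(line) for line in output.splitlines() if line.strip()]

# JSON keeps the value types, so "age" is an int and arithmetic works directly.
ages = [record["age"] for record in records]
print(sum(ages) / len(ages))  # 32.0
```

In practice you could read lines from `sys.stdin` instead of the inline string and pipe the parquet-tools command into the script.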