Get a few lines of HDFS data

Unmesha SreeVeni · Feb 28, 2014 · Viewed 50.9k times

I have 2 GB of data in HDFS.

Is it possible to get just a few lines of it, like we do on the Unix command line?

cat iris2.csv | head -n 50

Answer

Viacheslav Rodionov · Feb 28, 2014

Using the native head

hadoop fs -cat /your/file | head

is efficient here, because cat closes the stream as soon as head has read the lines it needs.
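
To mirror the head -n 50 from the question, the same pipe can be wrapped in a small shell function. This is only a sketch: hdfs_head is a made-up helper name, not a Hadoop command, and the default of 10 lines is an assumption copied from head's own default.

```shell
# Hypothetical helper (not part of Hadoop): print the first N lines of an HDFS file.
#   $1 = HDFS path, $2 = number of lines (assumed default: 10)
hdfs_head() {
    # head exits after N lines; the closed pipe then stops hadoop fs -cat,
    # so the rest of the file is not streamed.
    hadoop fs -cat "$1" | head -n "${2:-10}"
}

# Usage: hdfs_head /your/file 50
```

Depending on the Hadoop version, cat may print a complaint on stderr when head closes the pipe early; the lines on stdout are not affected by it.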

To get the tail, Hadoop has a dedicated, efficient command:

hadoop fs -tail /your/file

Unfortunately, it returns the last kilobyte of the data, not a given number of lines.
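
If you do need a given number of lines from the end, one workaround is to pipe into the native tail. Below is a rough sketch of two variants. Both function names are hypothetical, and the default of 10 lines is an assumption.

```shell
# Hypothetical helpers (not part of Hadoop).
#   $1 = HDFS path, $2 = number of lines (assumed default: 10)

# Cheap: only looks at the last kilobyte that hadoop fs -tail returns,
# so it cannot give more lines than fit in that kilobyte.
hdfs_tail_lines() {
    hadoop fs -tail "$1" | tail -n "${2:-10}"
}

# Complete but slow: streams the whole file through the pipe,
# which for a 2 GB file means reading all 2 GB.
hdfs_tail_full() {
    hadoop fs -cat "$1" | tail -n "${2:-10}"
}

# Usage: hdfs_tail_lines /your/file 5
```

With the first variant, keep in mind that the kilobyte can start in the middle of a line, so if you ask for more lines than it holds, the first line returned may be cut off.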