I have 2 GB of data in my HDFS.
Is it possible to access that data randomly, like we do on the Unix command line:
cat iris2.csv | head -n 50
The native head
hadoop fs -cat /your/file | head
is efficient here, since cat will close the stream as soon as head finishes reading all the lines.
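For example, to get the first 50 lines (mirroring the head -n 50 from the question):
hadoop fs -cat /your/file | head -n 50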
To get the tail, Hadoop has a dedicated, efficient command:
hadoop fs -tail /your/file
Unfortunately, it returns the last kilobyte of the data, not a given number of lines.
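If you really need the last N lines, a simple workaround is to pipe through the regular Unix tail instead. Note this is just a sketch: unlike hadoop fs -tail, it streams the entire file through the pipe, so it can be slow on large data:
# Streams the whole file; tail keeps only the final 50 lines.
hadoop fs -cat /your/file | tail -n 50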