Advantages of SequenceFile over HDFS text file

hrkrshn · Aug 2, 2012

What are the advantages of a Hadoop SequenceFile over an HDFS flat (text) file? In what way is a SequenceFile more efficient?

Small files can be combined and written into a SequenceFile, but the same can be done with an HDFS text file. I need to understand the difference between the two approaches. I have been googling this for a while; any clarity would be helpful.

Answer

Razvan · Aug 2, 2012
  1. Sequence files are appropriate when you want to store keys and their corresponding values. You can do that with text files too, but you have to parse each line.
  2. They can be compressed and still be splittable, which means the work can be spread across more map tasks. You can't split a compressed text file unless you use a splittable compression format (e.g. bzip2).
  3. They are binary files, which makes them more storage-efficient. In a text file a double is stored as a string of characters, which can add a large storage overhead.
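To make point 1 concrete, here is a small stdlib-only Java sketch. It is an analogy, not the SequenceFile format: the text side stores `key<TAB>value` lines that must be split and parsed, while the binary side writes a typed key and value per record with `DataOutputStream` and reads them back with `DataInputStream`. The names (`KeyValueDemo`, `parseTextLines`, `writeBinary`, `readBinary`) and the leading record count are hypothetical. In real Hadoop code you would use `Writable` types such as `Text` and `LongWritable` with `SequenceFile.Reader.next(key, value)` instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValueDemo {

    // Text approach: one "key<TAB>value" line per record; every line must be split and parsed.
    static Map<String, Long> parseTextLines(String text) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) continue;
            int tab = line.indexOf('\t');
            if (tab < 0) throw new IllegalArgumentException("malformed line: " + line);
            result.put(line.substring(0, tab), Long.parseLong(line.substring(tab + 1)));
        }
        return result;
    }

    // Binary approach: each record is a typed key followed by a typed value.
    static byte[] writeBinary(Map<String, Long> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(records.size()); // hypothetical framing: record count up front
            for (Map.Entry<String, Long> entry : records.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeLong(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for an in-memory stream
        }
        return bytes.toByteArray();
    }

    // Reading back needs no line splitting or decimal parsing: fields come out already typed.
    static Map<String, Long> readBinary(byte[] data) {
        Map<String, Long> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                String key = in.readUTF();
                long value = in.readLong();
                result.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> fromText = parseTextLines("apple\t3\nbanana\t7\n");
        Map<String, Long> fromBinary = readBinary(writeBinary(fromText));
        System.out.println(fromText.equals(fromBinary));
    }
}
```

A side benefit of the binary layout: a key containing a tab or newline round-trips unchanged, whereas the text version would misparse it.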
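Point 2 rests on sync markers. A SequenceFile writes a 16-byte sync marker periodically (before every block in block-compressed files), so a reader handed an arbitrary byte range can scan forward to the next marker and start decoding there; a single gzip-compressed text file has no such entry points and must be decoded from the start. The toy sketch below mimics that idea with `java.util.zip.Deflater`: each batch of records is compressed independently and preceded by a marker. The layout (marker, 4-byte length, deflated block), the class and method names, and the newline-joined records are assumptions for illustration only; the real format also has a header, per-record lengths and a pluggable codec.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class SplittableBlocksDemo {
    // Hypothetical 16-byte marker; fixed seed so runs are reproducible.
    static final byte[] SYNC = new byte[16];
    static { new Random(42).nextBytes(SYNC); }

    static byte[] compress(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] data, int off, int len) {
        Inflater inf = new Inflater();
        inf.setInput(data, off, len);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    throw new IllegalArgumentException("truncated block");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt block", e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    // Layout per block: SYNC | 4-byte block length | deflated "\n"-joined records.
    static byte[] write(List<String> records, int recordsPerBlock) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < records.size(); i += recordsPerBlock) {
            List<String> batch = records.subList(i, Math.min(i + recordsPerBlock, records.size()));
            byte[] block = compress(String.join("\n", batch).getBytes(StandardCharsets.UTF_8));
            out.write(SYNC, 0, SYNC.length);
            out.write(ByteBuffer.allocate(4).putInt(block.length).array(), 0, 4);
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    // Position of the first marker at or after 'from', or -1 if none.
    static int findSync(byte[] data, int from) {
        outer:
        for (int i = Math.max(from, 0); i + SYNC.length <= data.length; i++) {
            for (int j = 0; j < SYNC.length; j++) {
                if (data[i + j] != SYNC[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // A "split" owns every block whose marker starts inside [start, end).
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = findSync(data, start);
        while (pos >= 0 && pos < end) {
            int lenPos = pos + SYNC.length;
            int len = ByteBuffer.wrap(data, lenPos, 4).getInt();
            String text = new String(decompress(data, lenPos + 4, len), StandardCharsets.UTF_8);
            records.addAll(Arrays.asList(text.split("\n")));
            pos = findSync(data, lenPos + 4 + len);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) records.add("record-" + i);
        byte[] data = write(records, 3);
        int mid = data.length / 2; // an arbitrary split point, likely inside a block
        List<String> first = readSplit(data, 0, mid);
        List<String> second = readSplit(data, mid, data.length);
        System.out.println(first.size() + second.size());
    }
}
```

Because each block belongs to exactly one split (the one containing its marker), two independent readers together return every record once, in order. Records must be non-empty and free of newlines in this toy version.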
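Point 3 can be checked with plain Java: `DataOutputStream.writeDouble` always stores a double in 8 bytes (Hadoop's `DoubleWritable` serializes it the same way), while a full-precision value such as `Math.PI` takes 18 bytes as text (17 characters plus a newline). The class and method names below are hypothetical, and the sketch ignores SequenceFile framing and compression.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class DoubleStorageDemo {

    // Binary encoding: each double is its 8-byte IEEE 754 representation.
    static int binarySize(double[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (double v : values) {
                out.writeDouble(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for an in-memory stream
        }
        return bytes.size();
    }

    // Text encoding: each double becomes its decimal string plus a newline.
    static int textSize(double[] values) {
        StringBuilder sb = new StringBuilder();
        for (double v : values) {
            sb.append(Double.toString(v)).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        double[] values = {Math.PI, Math.E, 1.0 / 3.0};
        System.out.println("binary bytes: " + binarySize(values));
        System.out.println("text bytes:   " + textSize(values));
    }
}
```

The gap depends on the data: a short value such as `1.0` is only 4 bytes as text, so the binary advantage shows up mainly for full-precision numbers.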