How can I add a header row to files created from Pig (Hadoop)?

Ryan Guest · Jan 7, 2013

I'm writing a Pig Latin script similar to the following:

A = load 'data' using PigStorage('\t');
store A into 'my_data' using PigStorage();

This outputs:

(Bob, 10, 4.0)
(Jim, 11, 3.25)
(Paul, 9, 2.75)

I'd like to add a header row at the top of each file stored in HDFS:

(Name, Age, GPA)
(Bob, 10, 4.0)
(Jim, 11, 3.25)
(Paul, 9, 2.75)

Any ideas?

Answer

Alastor Moody · Jul 1, 2015

You can use Piggybank's CSVExcelStorage as the storage function, which lets you do exactly what you want:

STORE output INTO '/outputfolder/' USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t', 'NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');

The "WRITE_OUTPUT_HEADER" option writes the header to every output file, which satisfies your use case.
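A minimal end-to-end sketch of the question's script rewritten to use this option. Two things here are assumptions, not from the original: the path to piggybank.jar (it varies by installation) and the field names and types in the AS clause, which are inferred from the sample output. The header row is built from the stored relation's schema field names, so the LOAD declares a schema with AS.

```pig
-- Make Piggybank's storage functions available.
-- ASSUMPTION: this jar path is illustrative; adjust it to your installation.
REGISTER /usr/lib/pig/piggybank.jar;

-- Declare a schema so CSVExcelStorage has field names for the header row.
-- ASSUMPTION: names and types inferred from the sample output in the question.
A = LOAD 'data' USING PigStorage('\t') AS (Name:chararray, Age:int, GPA:double);

-- Arguments: delimiter, multiline handling, line-ending style, header handling.
STORE A INTO 'my_data'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t', 'NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
```

Each part file under my_data should then start with a line containing Name, Age and GPA separated by tabs, followed by the data rows.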