I'm writing a Pig Latin script similar to the following:
A = load 'data' using PigStorage('\t');
store A into 'my_data' using PigStorage();
This outputs
(Bob, 10, 4.0)
(Jim, 11, 3.25)
(Paul, 9, 2.75)
I'd like to add a header row as the first line of each file stored in HDFS:
(Name, Age, GPA)
(Bob, 10, 4.0)
(Jim, 11, 3.25)
(Paul, 9, 2.75)
Any ideas?
You can use CSVExcelStorage (from Piggybank) as the storage function, which allows you to do precisely what you want:
STORE A INTO '/outputfolder/' USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t', 'NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
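The "WRITE_OUTPUT_HEADER" option writes the header to every output file, which satisfies your use case. The header names are taken from the relation's schema, so the relation being stored needs named fields.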
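Putting it together with the script from the question, a minimal sketch might look like the following. The jar path is a placeholder, and the field names and types are assumptions based on the desired header and the sample rows, so adjust them to your data:

```pig
-- Placeholder path (assumption): point this at the piggybank.jar on your installation.
REGISTER /path/to/piggybank.jar;

-- Declare a schema on load; the header row is derived from these field names.
-- Names and types are assumed from the example data in the question.
A = LOAD 'data' USING PigStorage('\t') AS (Name:chararray, Age:int, GPA:float);

-- Store tab-delimited output with a header line in each output file.
STORE A INTO 'my_data' USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t', 'NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
```

Since each part file gets its own header, merging the files afterwards (for example with hadoop fs -getmerge) will leave repeated header lines in the combined output.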