Using Pig to load a file

YuliaPro · Nov 11, 2011

I am very new to Pig and I am having what feels like a very basic problem. I have a line of code that reads:

A = load 'Sites/trial_clustering/shortdocs/*'
      AS (word1:chararray, word2:chararray, word3:chararray, word4:chararray);

where each file is basically a line of four comma-separated words. However, Pig is not splitting it into the four words. When I run dump A, I get:

      (Money, coins, loans, debt,,,)

I have tried Googling and I cannot seem to find what format my file needs to be in so that Pig will interpret it properly. Please help!

Answer

Donald Miner · Nov 12, 2011

Your problem is that Pig's default loader, PigStorage, splits fields on tabs, not commas. So the whole line "Money, coins, loans, debt" ends up in your first column, word1. When you dump it, the commas inside that string give the illusion of multiple columns, but really the first field holds the entire line and the other three are null. The trailing ",,," is just dump printing those three null fields as empty values.

To fix this, tell PigStorage to split on commas instead:

A = LOAD '...' USING PigStorage(',') AS (...);
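A fuller version of the fix, written as a sketch: it reuses the path and schema from the question and assumes the files have a space after each comma, as the dump output suggests. Splitting on ',' alone would leave a leading space on word2 through word4, so this sketch strips it with Pig's built-in TRIM function (assuming your Pig version ships TRIM as a built-in). The alias name raw is just illustrative.

```pig
-- PigStorage's argument is the field delimiter; ',' splits each line on commas.
raw = LOAD 'Sites/trial_clustering/shortdocs/*'
      USING PigStorage(',')
      AS (word1:chararray, word2:chararray, word3:chararray, word4:chararray);

-- Assumption: the input looks like "Money, coins, loans, debt", so fields
-- 2-4 start with a space after splitting. TRIM removes leading/trailing whitespace.
A = FOREACH raw GENERATE
        TRIM(word1) AS word1,
        TRIM(word2) AS word2,
        TRIM(word3) AS word3,
        TRIM(word4) AS word4;

DUMP A;
```

For the sample line this should print four separate fields, (Money,coins,loans,debt), instead of one field followed by three nulls. If your files have no spaces after the commas, the plain PigStorage(',') load is enough and the FOREACH step can be dropped.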