Top "Apache-pig" questions

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

Conditional Filter in GROUP BY in Pig

I have the following dataset in which I need to merge multiple rows into one if they have the same …

hadoop apache-pig hadoop-streaming
Storing results of UNION in PIG in a single file

I have a Pig script which produces four results, and I want to store all of them in a single file. …

hadoop apache-pig hdfs
Pig to Hadoop issue: Server IPC version 7 cannot communicate with client version 4

I am trying to get pig started and failing: $ pig 2013-05-10 18:03:22,972 [main] INFO org.apache.pig.Main - Apache …

hadoop apache-pig
How can I add a header row to files created from Pig (Hadoop)?

I'm writing a pig latin script similar to the following: A = load 'data' using PigStorage('\t'); store A into …

hadoop apache-pig
Storing data to SequenceFile from Apache Pig

Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader: REGISTER /home/hadoop/pig/contrib/piggybank/java/…

hadoop apache-pig
In Apache Pig, select DISTINCT rows based on a single column

Let's say I have a table such as the one below, that may or may not contain duplicates for a …

group-by apache-pig distinct