Top "Apache-pig" questions

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

Computing median in map reduce

Can someone explain the computation of median/quantiles in map reduce? My understanding of DataFu's median is that the 'n' …

hadoop statistics mapreduce apache-pig median
How to include an external jar file using Pig

When I run a MapReduce job using the hadoop command, I use -libjars to set up my jar in the cache and …

hadoop apache-pig
JOIN vs COGROUP in Pig

Are there any advantages (with respect to performance or number of MapReduce jobs) when I use COGROUP instead of JOIN in Pig? http://…

hadoop apache-pig
Can I generate nested bags using nested FOREACH statements in Pig Latin?

Let's say I have a data set of restaurant reviews: User,City,Restaurant,Rating Jim,New York,Mecurials,3 Jim,New …

apache-pig
strsplit issue - Pig

I have the following tuple H1 and I want to strsplit its $0 into a tuple. However, I always get an error message: …

apache-pig
Removing duplicates using PigLatin

I'm using PigLatin to filter some records. User1 8 NYC User1 9 NYC User1 7 LA User2 4 NYC User2 3 DC The script should …

apache-pig
Calculate count of distinct values of a field using pig script

For a file of the form A B user1 C D user2 A D user3 A D user1 I want …

hadoop apache-pig
Load only particular fields in Pig?

This is my file: Col1, Col2, Col3, Col4, Col5 I need only Col2 and Col3. Currently I'm doing this: a = …

hadoop mapreduce apache-pig
What is the best Pig plugin for Eclipse?

I'm about to start playing around with Pig Latin, and I was hoping to get some text highlighting and such for …

eclipse eclipse-plugin editor apache-pig
STORE output to a single CSV?

Currently, when I STORE into HDFS, it creates many part files. Is there any way to store out to a …

apache-pig