Top "Apache-pig" questions

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

Difference between Pig and Hive? Why have both?

My background - 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop …

hadoop hive apache-pig
When to use Hadoop, HBase, Hive and Pig?

What are the benefits of using either Hadoop or HBase or Hive? From my understanding, HBase avoids using map-reduce and …

hadoop hbase hive apache-pig
PIG how to count a number of rows in alias

I did something like this to count the number of rows in an alias in PIG: logs = LOAD 'log' logs_…

hadoop apache-pig
How do I get schema / column names from parquet file?

I have a file stored in HDFS as part-m-00000.gz.parquet. I've tried to run hdfs dfs -text dir/part-m-00000.…

hadoop apache-pig hdfs parquet
Filter a string on the basis of a word

I have a Pig job in which I need to filter the data by finding a word in it. Here …

hadoop apache-pig
Merging multiple files into one within Hadoop

I get multiple small files in my input directory, which I want to merge into a single file without using …

hadoop apache-pig
Pig Latin: Load multiple files from a date range (part of the directory structure)

I have the following scenario. Pig version used: 0.70. Sample HDFS directory structure: /user/training/test/20100810/<data files> /user/…

hadoop apache-pig
How to flatten a group into a single tuple in Pig?

From this:
(1, {(1,2), (1,3), (1,4)} )
(2, {(2,5), (2,6), (2,7)} )
...
How could we generate this?
((1,2),(1,3),(1,4))
((2,5),(2,6),(2,7))
...
And how could we generate this?
(1, 2, 3, 4)
(2, 5, 6, 7)
For a single row I know how …

hadoop apache-pig
How to perform a DISTINCT in Pig Latin on a subset of columns?

I would like to perform a DISTINCT operation on a subset of the columns. The documentation says this is possible …

apache-pig
Hadoop Pig: Passing Command Line Arguments

Is there a way to do this? E.g., pass the name of the file to be processed, etc.?

hadoop apache-pig