Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
My background - 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop …
hadoop hive apache-pig
What are the benefits of using either Hadoop or HBase or Hive? From my understanding, HBase avoids using map-reduce and …
hadoop hbase hive apache-pig
I did something like this to count the number of rows in an alias in Pig: logs = LOAD 'log' logs_…
hadoop apache-pig
I have a file stored in HDFS as part-m-00000.gz.parquet. I've tried to run hdfs dfs -text dir/part-m-00000.…
hadoop apache-pig hdfs parquet
I have a Pig job wherein I need to filter the data by finding a word in it. Here …
hadoop apache-pig
I get multiple small files into my input directory which I want to merge into a single file without using …
hadoop apache-pig
I have the following scenario: Pig version used 0.70. Sample HDFS directory structure: /user/training/test/20100810/<data files> /user/…
hadoop apache-pig
From this: (1, {(1,2), (1,3), (1,4)} ) (2, {(2,5), (2,6), (2,7)} ) ... How could we generate this? ((1,2),(1,3),(1,4)) ((2,5),(2,6),(2,7)) ... And how could we generate this? (1, 2, 3, 4) (2, 5, 6, 7) For a single row I know how …
hadoop apache-pig
I would like to perform a DISTINCT operation on a subset of the columns. The documentation says this is possible …
apache-pig
Is there a way to do this? E.g., pass the name of the file to be processed, etc.?
hadoop apache-pig