Top "Mapreduce" questions

MapReduce is a programming model for processing huge datasets on certain kinds of distributable problems using a large number of nodes.

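For readers new to the model, a minimal word-count job written against the Hadoop Java MapReduce API illustrates the map and reduce phases. This is a generic sketch (class names and paths are illustrative), not code taken from any of the questions below.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist yet)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```
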
Load only particular fields in Pig?

This is my file: Col1, Col2, Col3, Col4, Col5. I need only Col2 and Col3. Currently I'm doing this: a = …

hadoop mapreduce apache-pig
Number of reducers in Hadoop

While learning Hadoop, I found the number of reducers very confusing: 1) The number of reducers is the same as the number of partitions. 2) …

hadoop mapreduce hadoop2 reducers bigdata
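
For context on the first point: the number of reduce tasks is a job-level setting (the partitioner sends each intermediate key to one of them, so partitions and reducers match one-to-one), not something Hadoop derives from the data. A minimal sketch in the Java API, with the value 4 chosen arbitrarily:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "reducer count example");

    // The number of reduce tasks is set explicitly on the job; the partitioner
    // then assigns every intermediate key to one of these reducers, so the
    // number of partitions equals the number of reducers.
    job.setNumReduceTasks(4);

    // Equivalent configuration property: mapreduce.job.reduces
    // (e.g. -D mapreduce.job.reduces=4 on the command line).
  }
}
```
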
Cassandra NOT EQUAL Operator

Question to all Cassandra experts out there. I have a column family with about a million records. I would like …

mapreduce cassandra cql3
Hadoop NameNode: single point of failure

The NameNode in the Hadoop architecture is a single point of failure. How do people who have large Hadoop clusters …

hadoop mapreduce hdfs yarn hadoop2
How to find optimal number of mappers when running Sqoop import and export?

I'm using Sqoop version 1.4.2 with an Oracle database. When running a Sqoop command, for example: ./sqoop import \ --fs <name …

oracle hadoop mapreduce hdfs sqoop
Python MapReduce Hadoop Streaming Job that requires multiple input files?

I have two files in my cluster, File A and File B, with the following data - File A #Format: #…

python hadoop mapreduce hadoop-streaming
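
With Hadoop Streaming, the usual starting point is to pass each file with its own -input option and tag records in the mapper. The sketch below shows the same idea in the Java API using MultipleInputs; the record format, paths, and mapper classes are placeholders rather than the asker's actual data.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiInputJob {

  // Tags each record from File A so the reducer can tell the two sources apart.
  public static class FileAMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split(",", 2);  // assumed format: key,rest
      context.write(new Text(parts[0]), new Text("A\t" + (parts.length > 1 ? parts[1] : "")));
    }
  }

  // Same idea for File B, with a different tag.
  public static class FileBMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split(",", 2);
      context.write(new Text(parts[0]), new Text("B\t" + (parts.length > 1 ? parts[1] : "")));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "join two files");
    job.setJarByClass(MultiInputJob.class);

    // One mapper class per input path.
    MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, FileAMapper.class);
    MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, FileBMapper.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path(args[2]));

    // A reducer that merges the "A"/"B" tagged values per key would complete the join;
    // without one, the default (identity) reducer simply groups the tagged records by key.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```
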
Got InterruptedException while executing a word count MapReduce job

I have installed Cloudera VM version 5.8 on my machine. When I execute a word count MapReduce job, it throws the exception below: `16/09/06 06:55:49 …

hadoop mapreduce cloudera hortonworks-data-platform hortonworks-sandbox
How is the data split in Hadoop?

Does Hadoop split the data based on the number of mappers set in the program? That is, having a …

hadoop mapreduce hadoop-partitioning
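
As background: the number of map tasks follows from the number of input splits (by default roughly one split per HDFS block), so the driver can only influence it indirectly rather than fix it outright. A minimal Java sketch that bounds the split size through FileInputFormat; the path and the 64 MB/128 MB values are arbitrary:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "split size example");
    FileInputFormat.addInputPath(job, new Path("/data/input"));  // placeholder path

    // The number of map tasks equals the number of input splits computed here.
    // Tightening these bounds yields more (smaller) or fewer (larger) splits.
    FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64 MB lower bound
    FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);  // 128 MB upper bound
  }
}
```
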
How to optimize the shuffle/sort phase in a Hadoop job

I'm doing some data preparation using a single-node Hadoop job. The mapper/combiner in my job outputs many keys (…

hadoop mapreduce hadoop2
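
The usual levers for the shuffle/sort phase are a combiner plus the map-side sort buffer and map-output compression settings. The sketch below shows them in a Java driver, using the Hadoop 2.x property names; the concrete values are arbitrary and would need tuning for a real job.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

public class ShuffleTuningExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Larger in-memory sort buffer: fewer map-side spills to disk.
    conf.setInt("mapreduce.task.io.sort.mb", 512);
    // Merge more spill files per pass when sorting map output.
    conf.setInt("mapreduce.task.io.sort.factor", 50);
    // Compress map output to shrink the data shuffled to reducers.
    conf.setBoolean("mapreduce.map.output.compress", true);
    conf.setClass("mapreduce.map.output.compress.codec", SnappyCodec.class, CompressionCodec.class);

    Job job = Job.getInstance(conf, "shuffle tuning example");
    // A combiner pre-aggregates map output locally and cuts shuffle volume;
    // it must be an operation that is safe to apply repeatedly (e.g. a sum).
    // job.setCombinerClass(MyReducer.class);  // placeholder, job-specific
  }
}
```
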
Large Block Size in HDFS! How is the unused space accounted for?

We all know that the block size in HDFS is quite large (64 MB or 128 MB) compared to the block …

hadoop mapreduce hdfs