Top "MapReduce" questions

MapReduce is a programming model for processing huge datasets across a large number of nodes, applicable to certain kinds of distributable problems.
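As a rough illustration of the idea, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to word counting. It is not Hadoop or MongoDB code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example, and a real framework would spread each phase over many nodes.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # would do between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: collapse all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Run the three phases in sequence, in memory (hypothetical driver).
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

Because each line is mapped independently and each key is reduced independently, both phases can in principle run in parallel, which is what makes the model suit "distributable" problems.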

Simple word count MapReduce example yielding strange results

I am having a strange problem with a Hadoop Map/Reduce job. The job submits correctly, runs, but produces incorrect/…

java hadoop mapreduce hortonworks-data-platform
Chaining multiple mapreduce tasks in Hadoop streaming

I am in a scenario where I have two MapReduce jobs. I am more comfortable with Python and am planning to use …

python hadoop mapreduce hadoop-plugins
MongoDB MapReduce - Emit one key/one value doesn't call reduce

So I'm new to MongoDB and MapReduce in general and came across this "quirk" (or at least in my mind a …

mongodb mapreduce pymongo
How to get Filename/File Contents as key/value input for MAP when running a Hadoop MapReduce Job?

I am creating a program to analyze PDF, DOC and DOCX files. These files are stored in HDFS. When I …

java hadoop mapreduce distributed-system
How do I access DistributedCache in Hadoop Map/Reduce jobs?

I'm trying to pass a small file to a job I'm running using the GenericOptionsParser's -files flag: $ hadoop jar MyJob.…

hadoop mapreduce distributed-cache
Unable to copy files from local disk to HDFS

I have successfully installed Ubuntu 12.04 and Hadoop 2.4.0. After entering the jps command, I find the output as below: 4135 jps 2582 SecondaryNameNode 3143 …

ubuntu hadoop mapreduce hdfs word-count
Hadoop MapReduce vs MPI (vs Spark vs Mahout vs Mesos) - When to use one over the other?

I am new to parallel computing and just starting to try out MPI and Hadoop+MapReduce on Amazon AWS. But …

hadoop parallel-processing mapreduce mpi
Iterate through ArrayWritable - NoSuchMethodException

I just started working with MapReduce, and I'm running into a weird bug that I haven't been able to answer …

hadoop mapreduce iteration nosuchmethoderror
How to import a custom module in a MapReduce job?

I have a MapReduce job defined in main.py, which imports the lib module from lib.py. I use Hadoop …

python mapreduce hadoop-streaming
Hadoop MapReduce job I/O Exception due to premature EOF from inputStream

I ran a MapReduce program using the command hadoop jar <jar> [mainClass] path/to/input path/to/output. …

hadoop mapreduce runtime-error eof ioexception