Top "MapReduce" questions

MapReduce is an algorithm for processing huge datasets on certain kinds of distributable problems using a large number of nodes.

How does the MapReduce sort algorithm work?

One of the main examples used to demonstrate the power of MapReduce is the TeraSort benchmark. I'm having …

algorithm sorting parallel-processing hadoop mapreduce
Change File Split size in Hadoop

I have a bunch of small files in an HDFS directory. Although the volume of the files is relatively small, …

java hadoop mapreduce distributed-computing
What is a container in YARN?

What is a container in YARN? Is it the same as the child JVM in which the tasks on the NodeManager …

hadoop mapreduce yarn
Can Apache Spark run without Hadoop?

Are there any dependencies between Spark and Hadoop? If not, are there any features I'll miss when I run Spark …

hadoop amazon-s3 apache-spark mapreduce mesos
How does Hadoop perform input splits?

This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. And for the …

hadoop mapreduce hdfs
Is there a .NET equivalent to Apache Hadoop?

So, I've been looking at Hadoop with keen interest, and to be honest I'm fascinated; things don't get much cooler. …

c# .net hadoop mapreduce
Merging two collections in MongoDB

I've been trying to use MapReduce in MongoDB to do what I think is a simple procedure. I don't know …

mongodb mapreduce
Find all duplicate documents in a MongoDB collection by a key field

Suppose I have a collection with some set of documents, something like this: { "_id" : ObjectId("4f127fa55e7242718200002d"), "id":1, "…

mongodb mapreduce duplicates aggregation-framework
Is gzip format supported in Spark?

For a Big Data project, I'm planning to use Spark, which has some nice features like in-memory computations for repeated workloads. …

java scala mapreduce gzip apache-spark
Hadoop: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

My program looks like this: public class TopKRecord extends Configured implements Tool { public static class MapClass extends Mapper<Text, Text, …

java hadoop mapreduce