Top "MapReduce" questions

MapReduce is a programming model for processing huge datasets on certain kinds of distributable problems using a large number of nodes.
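
As a rough illustration of the idea, here is a minimal single-process sketch of the classic word-count example, assuming plain Python lists stand in for the distributed input. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and do not belong to Hadoop or any other framework; a real cluster would run the map and reduce steps on many nodes in parallel.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input document.
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, which a framework
    # would do between the map and reduce steps.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: collapse the list of values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    # Run the three steps in sequence on one machine (illustration only).
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

Because each document is mapped independently and each key is reduced independently, both steps can in principle be spread across nodes; only the shuffle needs to move data between them.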

What is Hive: Return Code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

I am getting: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask While trying to make …

hadoop mapreduce hive
Java8: HashMap<X, Y> to HashMap<X, Z> using Stream / Map-Reduce / Collector

I know how to "transform" a simple Java List from Y -> Z, i.e.: List<String> …

java mapreduce java-8 java-stream collectors
Good MapReduce examples

I couldn't think of any good examples other than the "how to count words in a long text with MapReduce" …

mapreduce
Setting the number of map tasks and reduce tasks

I am currently running a job. I fixed the number of map tasks to 20 but am getting a higher number. …

hadoop mapreduce
Container is running beyond memory limits

In Hadoop v1, I assigned each of the 7 mapper and reducer slots a size of 1 GB; my mappers and reducers run …

hadoop mapreduce yarn mrv2
Map and Reduce in .NET

What scenarios would warrant the use of the "Map and Reduce" algorithm? Is there a .NET implementation of this algorithm?

c# mapreduce
Count lines in large files

I commonly work with text files of ~20 GB in size and I find myself counting the number of lines in a …

linux mapreduce
What is the purpose of the shuffling and sorting phase in the reducer in MapReduce programming?

In MapReduce programming, the reduce phase has shuffling, sorting, and reduce as its sub-parts. Sorting is a costly affair. …

sorting hadoop mapreduce hdfs shuffle
Hive ParseException - cannot recognize input near 'end' 'string'

I am getting the following error when trying to create a Hive table from an existing DynamoDB table: NoViableAltException(88@[]) at …

hadoop mapreduce hive bigdata amazon-dynamodb
Reduce a key-value pair into a key-list pair with Apache Spark

I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, …

python apache-spark mapreduce pyspark rdd