Top "Mapreduce" questions

MapReduce is a programming model for processing huge datasets on certain kinds of distributable problems using a large number of nodes.
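
As a rough illustration of the idea described above, here is a minimal single-process word-count sketch in Python. It does not use Hadoop or any real cluster; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample input are hypothetical, chosen only to show how a map step, a grouping step, and a reduce step fit together.

```python
from collections import defaultdict


def map_phase(record):
    # Emit one (word, 1) pair for every word in the input record.
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    # Group values by key, standing in for the grouping a framework
    # performs between the map and reduce steps.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine all values collected for one key into a single result.
    return key, sum(values)


def run_job(records):
    # Run map over every record, group the pairs, then reduce each group.
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, for illustration only.
counts = run_job(["the quick brown fox", "the lazy dog", "The end"])
print(counts)
```

In a real deployment the map and reduce calls would run on many nodes and the grouping would happen over the network; this sketch only mirrors the data flow.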

Explode the Array of Struct in Hive

Below is the Hive table: CREATE EXTERNAL TABLE IF NOT EXISTS SampleTable ( USER_ID BIGINT, NEW_ITEM ARRAY<…

hadoop mapreduce hive hiveql
Chaining multiple MapReduce jobs in Hadoop

In many real-life situations where you apply MapReduce, the final algorithm ends up being several MapReduce steps, i.e., Map1, …

hadoop mapreduce
MongoDB Stored Procedure Equivalent

I have a large CSV file containing a list of stores, in which one of the fields is ZipCode. I …

stored-procedures mongodb geolocation mapreduce
Merge output files after reduce phase

In MapReduce, each reduce task writes its output to a file named part-r-nnnnn, where nnnnn is a partition ID associated …

hadoop mapreduce
Simple explanation of MapReduce?

Related to my CouchDB question. Can anyone explain MapReduce in terms a numbnuts could understand?

frameworks mapreduce glossary
Data block size in HDFS, why 64MB?

The default data block size of HDFS/Hadoop is 64MB. The block size on disk is generally 4KB. What does 64…

database hadoop mapreduce block hdfs
Reading HDFS and local files in Java

I want to read file paths irrespective of whether they are HDFS or local. Currently, I pass the local paths …

java hadoop mapreduce hdfs
List the namenode and datanodes of a cluster from any node?

From any node in a Hadoop cluster, what is the command to identify the running namenode? To identify all running datanodes? …

hadoop mapreduce
Where does the Hadoop MapReduce framework send my System.out.print() statements? (stdout)

I want to debug a MapReduce script and, without going to much trouble, tried to put some print statements in …

hadoop mapreduce
Writing to HDFS could only be replicated to 0 nodes instead of minReplication (=1)

I have 3 data nodes running. While running a job, I am getting the error given below: java.io.IOException: …

java hadoop mapreduce hive hdfs