Top "Hadoop-streaming" questions

Hadoop Streaming is a utility that runs MapReduce jobs using any executable that reads from standard input and writes to standard output as the mapper or reducer.
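As a rough illustration of the stdin/stdout contract described above, here is a minimal word-count sketch in Python. The function names `map_lines` and `reduce_lines`, the `map`/`reduce` command-line argument, and the tab-separated `key<TAB>count` record format are assumptions made for this example; the reducer also assumes its input arrives sorted by key.

```python
#!/usr/bin/env python3
"""Word count as a pair of stdin/stdout filters (illustrative sketch)."""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts for each key; assumes input is already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


# Hypothetical CLI: the first argument selects which step this process runs.
if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Saved under a hypothetical name such as `wc.py`, the two steps can be tried locally with an ordinary shell pipeline, `cat input.txt | python3 wc.py map | sort | python3 wc.py reduce`, where `sort` stands in for the shuffle between the map and reduce phases.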

R install packages from Shell

I am trying to implement a reducer for Hadoop Streaming using R. However, I need to figure out a way …

r ansible hadoop-streaming
Hadoop: job runs okay on smaller set of data but fails with large dataset

I have the following situation: a 3-machine cluster with the following configuration. Master: Usage of /: 91.4% of 74.41GB, MemTotal: 16557308 kB, MemFree: 723736 …

java hadoop mapreduce hadoop-streaming
Python MapReduce Hadoop Streaming Job that requires multiple input files?

I have two files in my cluster, File A and File B, with the following data. File A #Format: #…

python hadoop mapreduce hadoop-streaming
Sorting by value in Hadoop from a file

I have a file containing a String, then a space and then a number on every line. Example: Line1: Word 2 …

java hadoop hadoop-streaming
python - PipeMapRed.waitOutputThreads(): subprocess failed with code 1

I want to parse websites, use BeautifulSoup to filter out what I need, and write the result to a CSV file …

mapreduce beautifulsoup hadoop-streaming
How to resolve java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2?

I am trying to run NLTK in a Hadoop environment. This is the command I used: bin/hadoop …

hadoop nltk hadoop-streaming
Unzip files using hadoop streaming

I have many files in HDFS, each a zip archive containing a single CSV file. I'm trying …

hadoop zip hadoop-streaming
Hadoop Streaming Command Failure with Python Error

I'm a newcomer to Ubuntu, Hadoop and DFS but I've managed to install a single-node Hadoop instance on my local …

python hadoop hadoop-streaming
Conditional Filter in GROUP BY in Pig

I have the following dataset in which I need to merge multiple rows into one if they have the same …

hadoop apache-pig hadoop-streaming
How do I pass a parameter to a Python Hadoop streaming job?

For a Python Hadoop streaming job, how do I pass a parameter to, for example, the reducer script so that …

python hadoop hadoop-streaming