Top "Hadoop-streaming" questions

Hadoop Streaming is a utility for running MapReduce jobs with any executable or script as the mapper or reducer, provided it reads from standard input and writes to standard output.
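As a rough illustration of that stdin/stdout contract, here is a minimal word-count sketch in Python. Word count is only an assumed example task, and the "map"/"reduce" command-line argument is a hypothetical convention for keeping both roles in one file. It relies on two documented defaults: keys and values are separated by a tab, and the framework sorts mapper output by key before the reducer sees it.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word on every line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key. Input must be sorted by key."""
    # Skip blank lines, then split each record into (key, value) at the first tab.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention: the first argument selects the role ("map" or "reduce").
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Saved under a hypothetical name such as wc.py, the pipeline can be tried locally without a cluster by piping a text file through the script in map mode, then through sort, then through the script in reduce mode. On a cluster, the same two commands would be passed to the streaming jar through its -mapper and -reducer options.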

Getting the count of records in a data frame quickly

I have a dataframe with as many as 10 million records. How can I get a count quickly? df.count is …

scala apache-spark hadoop-streaming
Importing text file : No Columns to parse from file

I am trying to take input from sys.stdin. This is a MapReduce program for Hadoop. The input file is …

python pandas hadoop-streaming
Hive FAILED: ParseException line 2:0 cannot recognize input near ''macaddress'' 'CHAR' '(' in column specification

I've tried running hive -v -f sqlfile.sql. Here is the content of the file: CREATE TABLE UpStreamParam ( 'macaddress' CHAR(50), …

hadoop hive hadoop-streaming
The import org.apache.hadoop.mapreduce cannot be resolved

I am trying to execute the code below: package test; import java.io.IOException; import java.util.*; import org.apache.…

hadoop mapreduce hive hadoop-streaming hadoop-plugins
POC for Hadoop in real time scenario

I have a bit of a problem. I want to learn about Hadoop and how I might use it to …

hadoop real-time bigdata hadoop-streaming
Running a job using hadoop streaming and mrjob: PipeMapRed.waitOutputThreads(): subprocess failed with code 1

Hey I'm fairly new to the world of Big Data. I came across this tutorial on http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20…

python hadoop mapreduce hadoop-streaming mrjob
Hadoop Java Error : Exception in thread "main" java.lang.NoClassDefFoundError: WordCount (wrong name: org/myorg/WordCount)

I am new to Hadoop. I followed the Michael Noll tutorial to set up Hadoop on a single node. I tried running …

java hadoop jar hadoop-streaming
How to decide when to use a Map-Side Join or Reduce-Side while writing an MR code in java?

How to decide when to use a Map-Side Join or Reduce-Side while writing an MR code in java?

hadoop mapreduce hadoop-streaming
Hadoop is not showing my job in the job tracker even though it is running

Problem: When I submit a job to my hadoop 2.2.0 cluster it doesn't show up in the job tracker but the …

java hadoop hadoop-streaming yarn
Using python efficiently to calculate hamming distances

I need to compare a large number of strings similar to 50358c591cef4d76. I have a Hamming distance function (…

python performance hadoop-streaming