I'm very much new to MapReduce and I completed a Hadoop word-count example.
In that example it produces unsorted file (with key-value pairs) of word counts. So is it possible to sort it by number of word occurrences by combining another MapReduce task with the earlier one?
In simple word count map reduce program the output we get is sorted by words. Sample output can be :
Apple 1
Boy 30
Cat 2
Frog 20
Zebra 1
If you want output to be sorted on the basis of number of occrance of words, i.e in below format
1 Apple
1 Zebra
2 Cat
20 Frog
30 Boy
You can create another MR program using below mapper and reducer where the input will be the output got from simple word count program.
class Map1 extends MapReduceBase implements Mapper<Object, Text, IntWritable, Text>
{
public void map(Object key, Text value, OutputCollector<IntWritable, Text> collector, Reporter arg3) throws IOException
{
String line = value.toString();
StringTokenizer stringTokenizer = new StringTokenizer(line);
{
int number = 999;
String word = "empty";
if(stringTokenizer.hasMoreTokens())
{
String str0= stringTokenizer.nextToken();
word = str0.trim();
}
if(stringTokenizer.hasMoreElements())
{
String str1 = stringTokenizer.nextToken();
number = Integer.parseInt(str1.trim());
}
collector.collect(new IntWritable(number), new Text(word));
}
}
}
class Reduce1 extends MapReduceBase implements Reducer<IntWritable, Text, IntWritable, Text>
{
public void reduce(IntWritable key, Iterator<Text> values, OutputCollector<IntWritable, Text> arg2, Reporter arg3) throws IOException
{
while((values.hasNext()))
{
arg2.collect(key, values.next());
}
}
}