"Combiner" Class in a MapReduce job

wayen wan · Apr 19, 2012

A Combiner runs after the Mapper and before the Reducer. It receives as input all data emitted by the Mapper instances on a given node, then emits its output to the Reducers.

Also, if a reduce function is both commutative and associative, then it can be used as a Combiner.

My question is: what does the phrase "commutative and associative" mean in this situation?

Answer

Donald Miner · Apr 19, 2012

Assume you have a list of numbers, 1 2 3 4 5 6.

Associative here means you can apply your operation to any grouping of the values, then apply it again to the results of those groups, and get the same answer:

(1) + (2 + 3) + (4 + 5 + 6)
  ==
(1 + 2) + (3 + 4) + (5) + (6)
  ==
...

Think of each pair of parentheses here as one execution of a combiner.
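As a rough illustration of the groupings above, here is a minimal Python sketch (not Hadoop code). The helper name `combine_then_reduce` is made up for this example; it applies the operation inside each group, as a combiner would, and then across the partial results, as the reducer would.

```python
from functools import reduce
from operator import add

def combine_then_reduce(groups, op):
    """Hypothetical helper: fold each group with op (the "combiner" step),
    then fold the partial results with op again (the "reducer" step)."""
    partials = [reduce(op, group) for group in groups]
    return reduce(op, partials)

values = [1, 2, 3, 4, 5, 6]
grouping_a = [[1], [2, 3], [4, 5, 6]]      # (1) + (2 + 3) + (4 + 5 + 6)
grouping_b = [[1, 2], [3, 4], [5], [6]]    # (1 + 2) + (3 + 4) + (5) + (6)

total = reduce(add, values)                        # no combiner at all
result_a = combine_then_reduce(grouping_a, add)
result_b = combine_then_reduce(grouping_b, add)

print(total, result_a, result_b)  # all three values are equal
```

Because addition is associative, how the values happen to be split into groups does not change the final answer.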

Commutative means that the order doesn't matter, so:

1 + 2 + 3 + 4 + 5 + 6
  ==
2 + 4 + 6 + 1 + 5 + 3
  ==
...
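To make the order-independence concrete, here is a small Python sketch. The function `order_independent` is a hypothetical helper for this example only: it folds the values in every possible order and checks whether a single result comes out. String concatenation is included as a contrast of my own choosing; it is associative but not commutative.

```python
from functools import reduce
from itertools import permutations
from operator import add

def order_independent(op, values):
    """Hypothetical helper: True if folding values with op gives
    the same result for every ordering of the values."""
    results = {reduce(op, ordering) for ordering in permutations(values)}
    return len(results) == 1

numbers = [1, 2, 3, 4]
words = ["a", "b", "c"]

sum_is_order_independent = order_independent(add, numbers)     # integer addition
concat_is_order_independent = order_independent(add, words)    # string concatenation

print(sum_is_order_independent, concat_is_order_independent)
```

Since the values for a key can reach the combiner and reducer in any order, an operation like concatenation would give results that depend on arrival order.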

For example, addition fits both properties, as seen above. "Maximum" fits them as well, because the max of maxes is the max, and max(a, b) == max(b, a).
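Here is a minimal sketch of "max of maxes" in a MapReduce-like shape, written in plain Python rather than against the Hadoop API. The names `group_by_key`, `max_per_key`, and the sample `node_outputs` data are all invented for illustration; the same function is used as both combiner and reducer.

```python
from collections import defaultdict

def group_by_key(pairs):
    """Collect values per key, similar to what the shuffle phase does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def max_per_key(pairs):
    """Hypothetical function used as both combiner and reducer:
    emits one (key, max of values) pair per key."""
    grouped = group_by_key(pairs)
    return [(key, max(values)) for key, values in sorted(grouped.items())]

# Made-up mapper output on two nodes.
node_outputs = [
    [("a", 3), ("b", 7), ("a", 9)],
    [("a", 4), ("b", 2)],
]

# Without a combiner: every pair goes straight to the reducer.
all_pairs = [pair for node in node_outputs for pair in node]
without_combiner = max_per_key(all_pairs)

# With a combiner: each node pre-aggregates, the reducer sees fewer pairs.
combined = [pair for node in node_outputs for pair in max_per_key(node)]
with_combiner = max_per_key(combined)

print(without_combiner, with_combiner)  # the two lists are equal
```

The combiner only reduces how many pairs are sent to the reducer (five pairs become four here); the final result is unchanged.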

Median is an example that doesn't work: the median of medians is not the true median.
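A quick check in Python, reusing the numbers 1 to 6 and the grouping (1), (2, 3), (4, 5, 6) from earlier, shows the mismatch. The variable names are just for this sketch.

```python
from statistics import median

values = [1, 2, 3, 4, 5, 6]
groups = [[1], [2, 3], [4, 5, 6]]   # each group stands for one combiner run

true_median = median(values)

# Taking the median inside each group first, then the median of those results.
partial_medians = [median(group) for group in groups]
median_of_medians = median(partial_medians)

print(true_median, median_of_medians)  # the two values differ
```

The partial medians throw away how many values each group held and where they sat, so the second step cannot recover the true median.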


Don't forget another important property of a combiner: its input key/value types and its output key/value types need to be the same, because the combiner's output is fed to the reducer in place of the mapper's output. For example, you can't take in a string:int and return a string:float.

Oftentimes, the reducer outputs some sort of string instead of a numerical value, which may prevent you from simply plugging in your reducer as the combiner.
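The type constraint can be sketched in plain Python as follows. Everything here is hypothetical and only loosely mirrors Hadoop: `sum_combiner` and `report_reducer` are made-up names, and the "count" key and sample values are invented. The reducer formats its result as a string, so it cannot double as the combiner; a separate combiner that keeps integer values is used instead.

```python
def sum_combiner(key, values):
    """Hypothetical combiner: int values in, int value out,
    so its output can be fed to the reducer."""
    return key, sum(values)

def report_reducer(key, values):
    """Hypothetical reducer: int values in, formatted string out."""
    return key, f"total={sum(values)}"

# Made-up integer values emitted for one key on two nodes.
node_values = [[1, 2, 3], [4, 5, 6]]

# Correct setup: a dedicated combiner keeps the value type as int.
partials = [sum_combiner("count", values)[1] for values in node_values]
final = report_reducer("count", partials)

# Reusing the reducer as the combiner hands strings to the reducer,
# which then fails when it tries to add them up.
bad_partials = [report_reducer("count", values)[1] for values in node_values]
try:
    report_reducer("count", bad_partials)
    reuse_failed = False
except TypeError:
    reuse_failed = True

print(final, reuse_failed)
```

In Hadoop the mismatch would show up in the declared key/value classes rather than as a Python `TypeError`, but the underlying problem is the same: the reducer must be able to consume whatever the combiner emits.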