A Combiner runs after the Mapper and before the Reducer. It receives as input all data emitted by the Mapper instances on a given node, then emits output to the Reducers.
Also, if a reduce function is both commutative and associative, then it can be used as a Combiner.
My question is: what does the phrase "commutative and associative" mean in this situation?
Assume you have a list of numbers, 1 2 3 4 5 6.
Associative here means you can apply your operation to any subgroups, then apply it again to the results of those, and get the same answer:
(1) + (2 + 3) + (4 + 5 + 6)
==
(1 + 2) + (3 + 4) + (5) + (6)
==
...
Think of each pair of parentheses here as one execution of a combiner.
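To make the grouping idea concrete, here is a minimal Python sketch (not Hadoop code). The `combine` function and the two groupings are made up for illustration; each inner list stands for the values one combiner run happens to see.

```python
from functools import reduce
from operator import add

numbers = [1, 2, 3, 4, 5, 6]

def combine(values):
    """Hypothetical combiner: sums whatever subgroup it is handed."""
    return reduce(add, values)

# Two different ways the same values might be grouped before reducing.
grouping_a = [[1], [2, 3], [4, 5, 6]]
grouping_b = [[1, 2], [3, 4], [5], [6]]

# Combine each subgroup first, then reduce the partial results.
total_a = reduce(add, [combine(g) for g in grouping_a])
total_b = reduce(add, [combine(g) for g in grouping_b])

print(total_a, total_b, sum(numbers))
```

Both groupings give the same total as summing the flat list, which is what lets the framework run the combiner on whatever subgroups it likes.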
Commutative means that the order doesn't matter, so:
1 + 2 + 3 + 4 + 5 + 6
==
2 + 4 + 6 + 1 + 3 + 5
==
...
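A quick way to check the ordering property is to run the same operation over a reordering of the same six numbers. This Python sketch uses subtraction as a counterexample; the particular reordering is just one I picked for illustration.

```python
from functools import reduce
from operator import add, sub

in_order = [1, 2, 3, 4, 5, 6]
reordered = [2, 4, 6, 1, 3, 5]  # same values, different arrival order

# Addition: the order the values arrive in does not change the result.
add_in_order = reduce(add, in_order)
add_reordered = reduce(add, reordered)

# Subtraction: the result depends on the order, so it is not safe
# to use as a combiner.
sub_in_order = reduce(sub, in_order)
sub_reordered = reduce(sub, reordered)

print(add_in_order == add_reordered)  # True
print(sub_in_order == sub_reordered)  # False
```

Since you generally can't rely on the order in which values reach a combiner or reducer, an operation like subtraction would give different answers from run to run.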
For example, addition fits both properties, as seen above. "Maximum" fits them as well, because the max of maxes is the max, and max(a,b) == max(b,a).
Median is an example that doesn't work: the median of medians is not the true median.
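Here is a small Python sketch of the difference, assuming a made-up split of the numbers 1 to 9 across three partitions (think of each partition as what one combiner sees):

```python
from statistics import median

# Hypothetical split of the values 1..9 across three combiner runs.
partitions = [[1, 2, 9], [3, 4], [5, 6, 7, 8]]
all_values = [v for part in partitions for v in part]

# Max: combining per partition, then reducing, matches the global max.
max_of_maxes = max([max(part) for part in partitions])
true_max = max(all_values)

# Median: combining per partition, then reducing, gives a different number.
median_of_medians = median([median(part) for part in partitions])
true_median = median(all_values)

print(max_of_maxes, true_max)            # 9 9
print(median_of_medians, true_median)    # 3.5 5
```

The per-partition medians are 2, 3.5 and 6.5, so the median of those is 3.5, while the true median of all nine values is 5.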
Don't forget another important property of a combiner: the input types for the key/value and the output types of the key/value need to be the same. For example, you can't take in a string:int and return a string:float.
Often, the reducer outputs some sort of string instead of a numerical value, which may prevent you from just plugging in your reducer as the combiner.
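As a rough illustration of the type problem, here is a Python sketch rather than real Hadoop code. The function names `sum_combiner` and `report_reducer` and the `total=` output format are invented for this example.

```python
def sum_combiner(key, values):
    # Takes (str, ints) and returns (str, int): same types in and out.
    return key, sum(values)

def report_reducer(key, values):
    # Takes (str, ints) but returns (str, str): output type differs.
    return key, f"total={sum(values)}"

# Correct pipeline: combiner output is still int, so the reducer can use it.
partials = [sum_combiner("a", [1, 2, 3])[1], sum_combiner("a", [4, 5, 6])[1]]
result = report_reducer("a", partials)
print(result)  # ('a', 'total=21')

# Reusing the reducer as the combiner: its string output is fed back
# into a function that expects ints, so the second stage fails.
bad_partials = [report_reducer("a", [1, 2, 3])[1],
                report_reducer("a", [4, 5, 6])[1]]
try:
    report_reducer("a", bad_partials)
    failed = False
except TypeError:
    failed = True
print(failed)  # True
```

In a statically typed job the mismatch would show up as a type error instead, but the cause is the same: the combiner's output has to be something the reducer can accept as input.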