Difference between combiner and partitioner

hadoop mapreduce partitioner

harshit · Jul 25, 2016 · Viewed 10.1k times · Source

I am a newbie to MapReduce and I just can't figure out the difference in the partitioner and combiner. I know both run in the intermediate step between the map and reduce tasks and both reduce the amount of data to be processed by the reduce task. Please explain the difference using an example.

Answer

First thing, agree with @Binary nerd s comment

Combiner can be viewed as mini-reducers in the map phase. They perform a local-reduce on the mapper results before they are distributed further. Once the Combiner functionality is executed, it is then passed on to the Reducer for further work.

where as Partitioner come into the picture when we are working on more than one Reducer. So, the partitioner decide which reducer is responsible for a particular key. They basically take the Mapper Result(if Combiner is used then Combiner Result) and send it to the responsible Reducer based on the key

With Combiner and Partitioner scenario :

With Partitioner only scenario :

Examples :

Combiner Example
Partitioner Example :

The partitioning phase takes place after the map phase and before the reduce phase. The number of partitions is equal to the number of reducers. The data gets partitioned across the reducers according to the partitioning function . The difference between a partitioner and a combiner is that the partitioner divides the data according to the number of reducers so that all the data in a single partition gets executed by a single reducer. However, the combiner functions similar to the reducer and processes the data in each partition. The combiner is an optimization to the reducer. The default partitioning function is the hash partitioning function where the hashing is done on the key. However it might be useful to partition the data according to some other function of the key or the value. -- Source

Difference between combiner and partitioner

Answer

Related questions