How to decide when to use a Map-Side Join or Reduce-Side while writing an MR code in java?
Map side join performs join before data reached to Map. Map function expects a strong prerequisites before joining data at map side. Both method have some pros and cons. Map side join is efficient compare to reduce side but it require strict format.
Prerequisites:
Reduce side join also called as Repartitioned join or Repartitioned sort merge join and also it is mostly used join type. It will have to go through sort and shuffle phase which would incur network overhead.Reduce side join uses few terms like data source, tag and group key lets be familiar with it.
As we know we are going to join this data on reduce side we must prepare in a way that it can be used for joining in reduce phase. let’s have a look what are the steps needs to be perform.
For more information check this link: http://hadoopinterviews.com/map-side-join-reduce-side-join/