Understanding "median of medians" algorithm

simon picture simon · Feb 28, 2012 · Viewed 49.4k times · Source

I want to understand "median of medians" algorithm on the following example:

We have 45 distinct numbers divided into 9 group with 5 elements each.

48 43 38 33 28 23 18 13 8

49 44 39 34 29 24 19 14 9 

50 45 40 35 30 25 20 15 10

51 46 41 36 31 26 21 16 53

52 47 42 37 32 27 22 17 54
  1. The first step is sorting every group (in this case they are already sorted)
  2. Second step recursively, find the "true" median of the medians (50 45 40 35 30 25 20 15 10) i.e. the set will be divided into 2 groups:

    50 25
    
    45 20 
    
    40 15
    
    35 10
    
    30
    

    sorting these 2 groups

    30 10
    
    35 15 
    
    40 20
    
    45 25
    
    50
    

the medians is 40 and 15 (in case the numbers are even we took left median) so the returned value is 15 however "true" median of medians (50 45 40 35 30 25 20 15 10) is 30, moreover there are 5 elements less then 15 which are much less than 30% of 45 which are mentioned in wikipedia

and so T(n) <= T(n/5) + T(7n/10) + O(n) fails.

By the way in the Wikipedia example, I get result of recursion as 36. However, the true median is 47.

So, I think in some cases this recursion may not return true median of medians. I want to understand where is my mistake.

Answer

templatetypedef picture templatetypedef · Feb 28, 2012

The problem is in the step where you say to find the true median of the medians. In your example, you had these medians:

50 45 40 35 30 25 20 15 10

The true median of this data set is 30, not 15. You don't find this median by splitting the groups into blocks of five and taking the median of those medians, but instead by recursively calling the selection algorithm on this smaller group. The error in your logic is assuming that median of this group is found by splitting the above sequence into two blocks

50 45 40 35 30

and

25 20 15 10

then finding the median of each block. Instead, the median-of-medians algorithm will recursively call itself on the complete data set 50 45 40 35 30 25 20 15 10. Internally, this will split the group into blocks of five and sort them, etc., but it does so to determine the partition point for the partitioning step, and it's in this partitioning step that the recursive call will find the true median of the medians, which in this case will be 30. If you use 30 as the median as the partitioning step in the original algorithm, you do indeed get a very good split as required.

Hope this helps!