The Median of medians
approach is very popular in quicksort
type partitioning algorithms to yield a fairly good pivot, such that it partitions the array uniformly. Its logic is given in Wikipedia as:
The chosen pivot is both less than and greater than half of the elements in the list of medians, which is around n/10 elements (1/2 * (n/5)) for each half. Each of these elements is a median of 5, making it less than 2 other elements and greater than 2 other elements outside the block. Hence, the pivot is less than 3(n/10) elements outside the block, and greater than another 3(n/10) elements outside the block. Thus the chosen median splits the elements somewhere between 30%/70% and 70%/30%, which assures worst-case linear behavior of the algorithm.
Can somebody explain it a bit lucidly for me. I am finding it difficult to understand the logic.
Think of the following set of numbers:
5 2 6 3 1
The median of these numbers is 3
. Now if you have a number n
, if n > 3
, then it is bigger than at least half of the numbers above. If n < 3
, then it is smaller than at least half of the numbers above.
So that is the idea. That is, for each set of 5 numbers, you get their median. Now you have n / 5
numbers. This is obvious.
Now if you get the median of those numbers (call it m
), it is bigger than half of them and smaller than the other half (by definition of median!). In other words, m
is bigger than n / 10
numbers (which themselves were medians of small 5 element groups) and bigger than another n / 10
numbers (which again were medians of small 5 element groups).
In the example above, we saw that if the median is k
and you have m > k
, then m
is also bigger than 2 other numbers (that were themselves smaller than k
). This means that for each of those smaller 5 element groups where m
was bigger than its median, m
is also bigger than two other numbers. This makes it at least 3 numbers (2 numbers + the median itself) in each of those n / 10
small 5 element groups, that are smaller than m
. Hence, m
is at least bigger than 3n/10
numbers.
Similar logic for the number of elements m
is bigger than.