How is nth_element Implemented?

Jonathan Mee picture Jonathan Mee · Mar 19, 2015 · Viewed 16.7k times · Source

There are a lot of claims on StackOverflow and elsewhere that nth_element is O(n) and that it is typically implemented with Introselect: http://en.cppreference.com/w/cpp/algorithm/nth_element

I want to know how this can be achieved. I looked at Wikipedia's explanation of Introselect and that just left me more confused. How can an algorithm switch between QSort and Median-of-Medians?

I found the Introsort paper here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.5196&rep=rep1&type=pdf But that says:

In this paper, we concentrate on the sorting problem and return to the selection problem only briefly in a later section.

I've tried to read through the STL itself to understand how nth_element is implemented, but that gets hairy real fast.

Could someone show me pseudo-code for how Introselect is implemented? Or even better, actual C++ code other than the STL of course :)

Answer

5gon12eder picture 5gon12eder · Mar 19, 2015

Disclaimer: I don't know how std::nth_element is implemented in any standard library.

If you know how Quicksort works, you can easily modify it to do what is needed for this algorithm. The basic idea of Quicksort is that in each step, you partition the array into two parts such that all elements less than the pivot are in the left sub-array and all elements equal to or greater than the pivot are in the right sub-array. (A modification of Quicksort known as ternary Quicksort creates a third sub-array with all elements equal to the pivot. Then the right sub-array contains only entries strictly greater than the pivot.) Quicksort then proceeds by recursively sorting the left and right sub-arrays.

If you only want to move the n-th element into place, instead of recursing into both sub-arrays, you can tell in every step whether you will need to descend into the left or right sub-array. (You know this because the n-th element in a sorted array has index n so it becomes a matter of comparing indices.) So – unless your Quicksort suffers worst-case degeneration – you roughly halve the size of the remaining array in each step. (You never look at the other sub-array again.) Therefore, on average, you are dealing with arrays of the following lengths in each step:

  1. Θ(N)
  2. Θ(N / 2)
  3. Θ(N / 4)

Each step is linear in the length of the array it is dealing with. (You loop over it once and decide into what sub-array each element should go depending on how it compares to the pivot.)

You can see that after Θ(log(N)) steps, we will eventually reach a singleton array and are done. If you sum up N (1 + 1/2 + 1/4 + …), you'll get 2 N. Or, in the average case, since we cannot hope that the pivot will always exactly be the median, something on the order of Θ(N).