Using red black trees for sorting

Vaibhav Bajpai picture Vaibhav Bajpai · Jul 17, 2010 · Viewed 9.4k times · Source

The worst-case running time of insertion on a red-black tree is O(lg n) and if I perform a in-order walk on the tree, I essentially visit each node, so the total worst-case runtime to print the sorted collection would be O(n lg n)

I am curious, why are red-black trees not preferred for sorting over quick sort (whose average-case running time is O(n lg n).

I see that maybe because red-black trees do not sort in-place, but I am not sure, so maybe someone could help.

Answer

Aryabhatta picture Aryabhatta · Jul 17, 2010

Knowing which sort algorithm performs better really depend on your data and situation.

If you are talking in general/practical terms,

Quicksort (the one where you select the pivot randomly/just pick one fixed, making worst case Omega(n^2)) might be better than Red-Black Trees because (not necessarily in order of importance)

  • Quicksort is in-place. The keeps your memory footprint low. Say this quicksort routine was part of a program which deals with a lot of data. If you kept using large amounts of memory, your OS could start swapping your process memory and trash your perf.

  • Quicksort memory accesses are localized. This plays well with the caching/swapping.

  • Quicksort can be easily parallelized (probably more relevant these days).

  • If you were to try and optimize binary tree sorting (using binary tree without balancing) by using an array instead, you will end up doing something like Quicksort!

  • Red-Black trees have memory overheads. You have to allocate nodes possibly multiple times, your memory requirements with trees is doubles/triple that using arrays.

  • After sorting, say you wanted the 1045th (say) element, you will need to maintain order statistics in your tree (extra memory cost because of this) and you will have O(logn) access time!

  • Red-black trees have overheads just to access the next element (pointer lookups)

  • Red-black trees do not play well with the cache and the pointer accesses could induce more swapping.

  • Rotation in red-black trees will increase the constant factor in the O(nlogn).

  • Perhaps the most important reason (but not valid if you have lib etc available), Quicksort is very simple to understand and implement. Even a school kid can understand it!

I would say you try to measure both implementations and see what happens!

Also, Bob Sedgewick did a thesis on quicksort! Might be worth reading.