Grokking Timsort

Yang picture Yang · Nov 14, 2009 · Viewed 9.1k times · Source

There's a (relatively) new sort on the block called Timsort. It's been used as Python's list.sort, and is now going to be the new Array.sort in Java 7.

There's some documentation and a tiny Wikipedia article describing the high-level properties of the sort and some low-level performance evaluations, but I was curious if anybody can provide some pseudocode to illustrate what Timsort is doing, exactly, and what are the key things that make it zippy. (Esp. with regard to the cited paper, "Optimistic Sorting and Information Theoretic Complexity.")

(See also related StackOverflow post.)

Answer

u0b34a0f6ae picture u0b34a0f6ae · Nov 14, 2009

Quoting the relevant portion from a now deleted blog post: Visualising Sorting Algorithms: Python's timsort

The business-end of timsort is a mergesort that operates on runs of pre-sorted elements. A minimum run length minrun is chosen to make sure the final merges are as balanced as possible - for 64 elements, minrun happens to be 32. Before the merges begin, a single pass is made through the data to detect pre-existing runs of sorted elements. Descending runs are handled by simply reversing them in place. If the resultant run length is less than minrun, it is boosted to minrun using insertion sort. On a shuffled array with no significant pre-existing runs, this process looks exactly like our guess above: pre-sorting blocks of minrun elements using insertion sort, before merging with merge sort.

[...]

  • timsort finds a descending run, and reverses the run in-place. This is done directly on the array of pointers, so seems "instant" from our vantage point.
  • The run is now boosted to length minrun using insertion sort.
  • No run is detected at the beginning of the next block, and insertion sort is used to sort the entire block. Note that the sorted elements at the bottom of this block are not treated specially - timsort doesn't detect runs that start in the middle of blocks being boosted to minrun.
  • Finally, mergesort is used to merge the runs.