It's well-known that the worst-case runtime of heapsort is Ω(n lg n), but I'm having trouble seeing why this is. In particular, the first step of heapsort (building a max-heap) takes time Θ(n). This is then followed by n heap deletions. I understand why each heap deletion takes time O(lg n): rebalancing the heap involves a bubble-down operation that takes time O(h), where h is the height of the heap, and h = O(lg n). However, what I don't see is why this second step should take Ω(n lg n). It seems like an individual heap dequeue wouldn't necessarily cause the node moved to the top to bubble all the way down the tree.
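For concreteness, here's a minimal sketch of the two phases in question (the names are my own, not from any particular implementation): a Θ(n) bottom-up heapify, then n dequeues, each doing one bubble-down.

```python
def heapsort(a):
    """Sort a in place: build a max-heap, then repeatedly dequeue the max."""
    n = len(a)

    def sift_down(i, end):
        # Bubble the value at index i down until the heap property holds
        # within a[0:end].
        while True:
            child = 2 * i + 1
            if child >= end:
                return
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1
            if a[i] >= a[child]:
                return
            a[i], a[child] = a[child], a[i]
            i = child

    # Phase 1: build a max-heap bottom-up in Theta(n).
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)

    # Phase 2: n dequeues; each swaps the max to the back of the array
    # and bubbles the replacement down, costing O(lg n).
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a
```

The question is about phase 2: each `sift_down(0, end)` call is O(lg n), but it's not obvious why, on every input, enough of them must go nearly all the way down.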
My question is - does anyone know of a good lower-bound proof for the best-case behavior of heapsort?
So I did a bit of digging myself and it looks like this result actually is fairly recent! The first lower-bound proof I can find is from 1992, though heapsort itself was invented in 1964.
The formal lower-bound proof is due to Schaffer and Sedgewick's paper "The Analysis of Heapsort." Here's a slightly paraphrased version of the proof that omits some of the technical details.
To begin, let's suppose that n = 2^k - 1 for some k, which guarantees that we have a complete binary heap. I'll show how to handle the other cases separately later on. Because we have 2^k - 1 elements, the first pass of heapsort will, in Θ(n) time, build up a heap with k levels. Now, consider the first half of the dequeues from this heap, which remove the largest 2^(k-1) nodes. The first key observation is that if you take the starting heap and mark every node that ends up getting dequeued during this phase, the marked nodes form a subtree of the heap (i.e. every node that gets dequeued, other than the root, has a parent that also gets dequeued). You can see this because if this weren't the case, there would be some dequeued node whose (larger) parent didn't get dequeued; but the parent's value exceeds the node's, so if the node's value is among the largest 2^(k-1), the parent's value must be too.
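This subtree observation is easy to sanity-check empirically. The sketch below (my own illustration, not from the paper) builds a max-heap over 2^k - 1 distinct values and verifies that the array positions of the largest 2^(k-1) values form a subtree containing the root:

```python
import random

def build_max_heap(a):
    # Standard bottom-up heapify over the array representation.
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        j = i
        while True:
            child = 2 * j + 1
            if child >= n:
                break
            if child + 1 < n and a[child + 1] > a[child]:
                child += 1
            if a[j] >= a[child]:
                break
            a[j], a[child] = a[child], a[j]
            j = child
    return a

k = 10
n = 2**k - 1
heap = build_max_heap(random.sample(range(10 * n), n))  # distinct values

# Mark the positions of the 2^(k-1) largest values.
cutoff = sorted(heap)[n - 2**(k - 1)]
marked = {i for i, v in enumerate(heap) if v >= cutoff}

# Every marked node's parent is also marked, so the marks form a subtree.
assert all(i == 0 or (i - 1) // 2 in marked for i in marked)
```

Since a parent's value always exceeds its children's (with distinct keys), any node in the top half forces its parent into the top half as well, which is exactly what the assertion checks.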
Now, consider how the nodes of this subtree are distributed across the heap. If you label the levels of the heap 0, 1, 2, ..., k - 1, then some number of these nodes lie in levels 0, 1, 2, ..., k - 2 (that is, everywhere except the bottom level of the tree). In order for these nodes to be dequeued from the heap, they have to get swapped up to the root, and they only move up one level per swap. This means that one way to lower-bound the runtime of heapsort is to count the number of swaps necessary to bring all of these values up to the root. In fact, that's exactly what we're going to do.
The first question we need to answer is: how many of the largest 2^(k-1) nodes are in the bottom level of the heap? We can show by contradiction that this is no greater than 2^(k-2). Suppose that there are at least 2^(k-2) + 1 of the largest nodes in the bottom level of the heap. Then each of the parents of those nodes must also be a large node, sitting in level k - 2, and since each parent covers at most two of the bottom-level nodes, there must be at least 2^(k-3) + 1 large nodes in level k - 2, which in turn means that there would be at least 2^(k-4) + 1 large nodes in level k - 3, etc., up to the root. Summing up over all of these levels, we get at least 2^(k-2) + 2^(k-3) + ... + 2^0 + k = 2^(k-1) - 1 + k large nodes. But this value is strictly greater than 2^(k-1), contradicting the fact that we're working with only 2^(k-1) large nodes here.
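As a quick sanity check on that arithmetic (my own, not part of the paper's proof), the sum 2^(k-2) + 2^(k-3) + ... + 2^0 + k works out to 2^(k-1) - 1 + k, which exceeds 2^(k-1) for every k >= 2:

```python
for k in range(2, 40):
    # 2^(k-2) + 2^(k-3) + ... + 2^0, plus the k extra "+1" nodes.
    total = sum(2**j for j in range(k - 1)) + k
    assert total == 2**(k - 1) - 1 + k
    assert total > 2**(k - 1)   # strictly more nodes than we actually have
```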
Okay... we now know that there are at most 2^(k-2) large nodes in the bottom layer. This means that there must be at least 2^(k-2) of the large nodes in levels 0 through k - 2. We now ask: what is the sum, over all of these nodes, of the distance from each node to the root? Well, if we have 2^(k-2) nodes positioned somewhere in a complete heap, then at most 2^(k-3) of them can fit in levels 0 through k - 4 (which hold only 2^(k-3) - 1 nodes in total), and so at least 2^(k-2) - 2^(k-3) = 2^(k-3) of the heavy nodes sit at depth k - 3 or greater. Consequently, the total number of swaps that needs to be performed is at least (k - 3) · 2^(k-3). Since n = 2^k - 1, we have k = Θ(lg n), and so this value is Θ(n lg n), as required.
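If you want to convince yourself numerically that a bound of this shape really is Θ(n lg n), here's a quick check (my own; the constant 1/32 is just one that happens to work, using the conservative count (k - 3) · 2^(k-3) since the marked nodes sit at depth at least k - 3):

```python
import math

for k in range(4, 40):
    n = 2**k - 1
    swaps = (k - 3) * 2**(k - 3)      # lower bound on the number of swaps
    # The bound is always at least a constant fraction of n lg n.
    assert swaps >= n * math.log2(n) / 32
```

Since the swap count alone is already Ω(n lg n), the second phase of heapsort, and hence heapsort as a whole, is Ω(n lg n) on every input of this form.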