How is counting sort a stable sort?

Lazer picture Lazer · Apr 3, 2010 · Viewed 25.1k times · Source

Suppose my input is (a,b and c to distinguish between equal keys)

1 6a 8 3 6b 0 6c 4

My counting sort will save as (discarding the a,b and c info!!)

0(1) 1(1) 3(1) 4(1) 6(3) 8(1)

which will give me the result

0 1 3 4 6 6 6 8

So, how is this stable sort? I am not sure how it is "maintaining the relative order of records with equal keys."

Please explain.

Answer

nybon picture nybon · Jun 14, 2013

To understand why counting sort is stable, you need to understand that counting sort can not only be used for sorting a list of integers, it can also be used for sorting a list of elements whose key is an integer, and these elements will be sorted by their keys while having additional information associated with each of them.

A counting sort example that sorts elements with additional information will help you to understand this. For instance, we want to sort three stocks by their prices:

[(GOOG 3), (CSCO 1), (MSFT 1)]

Here stock prices are integer keys, and stock names are their associated information.

Expected output for the sorting should be:

[(CSCO 1), (MSFT 1), (GOOG 3)] 
(containing both stock price and its name,
and the CSCO stock should appear before MSFT so that it is a stable sort)

A counts array will be calculated for sorting this (let's say stock prices can only be 0 to 3):

counts array: [0, 2, 0, 1] (price "1" appear twice, and price "3" appear once)

If you are just sorting an integer array, you can go through the counts array and output "1" twice and "3" once and it is done, and the entire counts array will become an all-zero array after this.

But here we want to have stock names in sorting output as well. How can we obtain this additional information (it seems the counts array already discards this piece of information)? Well, the associated information is stored in the original unsorted array. In the unsorted array [(GOOG 3), (CSCO 1), (MSFT 1)], we have both the stock name and its price available. If we get to know which position (GOOG 3) should be in the final sorted array, we can copy this element to the sorted position in the sorted array.

To obtain the final position for each element in the sorted array, unlike sorting an integer array, you don't use the counts array directly to output the sorted elements. Instead, counting sort has an additional step which calculates the cumulative sum array from the counts array:

counts array: [0, 2, 2, 3] (i from 0 to 3: counts[i] = counts[i] + counts[i - 1]) 

This cumulative sum array tells us each value's position in the final sorted array currently. For example, counts[1]==2 means currently item with value 1 should be placed in the 2nd slot in the sorted array. Intuitively, because counts[i] is the cumulative sum from left, it shows how many smaller items are before the ith value, which tells you where the position should be for the ith value.

If a $1 price stock appears at the first time, it should be outputted to the second position of the sorted array and if a $3 price stock appears at the first time, it should be outputted to the third position of the sorted array. If a $1 stock appears and its element gets copied to the sorted array, we will decreased its count in the counts array.

counts array: [0, 1, 2, 3] 
(so that the second appearance of $1 price stock's position will be 1)

So we can iterate the unsorted array from backwards (this is important to ensure the stableness), check its position in the sorted array according to the counts array, and copied it to the sorted array.

sorted array: [null, null, null]
counts array: [0, 2, 2, 3]    

iterate stocks in unsorted stocks from backwards
1. the last stock (MSFT 1)
sorted array: [null, (MSFT 1), null] (copy to the second position because counts[1] == 2)
counts array: [0, 1, 2, 3] (decrease counts[1] by 1)

2. the middle stock (CSCO 1)
sorted array: [(CSCO 1), (MSFT 1), null] (copy to the first position because counts[1] == 1 now)
counts array: [0, 0, 2, 3] (decrease counts[1] by 1)

3. the first stock (GOOG 3)
sorted array: [(CSCO 1), (MSFT 1), (GOOG 3)] (copy to the third position because counts[3] == 3)
counts array: [0, 0, 2, 2] (decrease counts[3] by 1)

As you can see, after the array gets sorted, the counts array (which is [0, 0, 2, 2]) doesn't become an all-zero array like sorting an array of integers. The counts array is not used to tell how many times an integer appears in the unsorted array, instead, it is used to tell which position the element should be in the final sorted array. And since we decrease the count every time we output an element, we are essentially making the elements with same key's next appearance final position smaller. That's why we need to iterate the unsorted array from backwards to ensure its stableness.

Conclusion:

Since each element contains not only an integer as key, but also some additional information, even if their key is the same, you could tell each element is different by using the additional information, so you will be able to tell if it is a stable sorting algorithm (yes, it is a stable sorting algorithm if implemented appropriately).

References:

Some good materials explaining counting sort and its stableness: