I have a hard time translating QuickSort with Hoare partitioning into C code, and can't find out why. The code I'm using is shown below:
void QuickSort(int a[],int start,int end) {
int q=HoarePartition(a,start,end);
if (end<=start) return;
QuickSort(a,q+1,end);
QuickSort(a,start,q);
}
int HoarePartition (int a[],int p, int r) {
int x=a[p],i=p-1,j=r;
while (1) {
do j--; while (a[j] > x);
do i++; while (a[i] < x);
if (i < j)
swap(&a[i],&a[j]);
else
return j;
}
}
Also, I don't really get why HoarePartition
works. Can someone explain why it works, or at least link me to an article that does?
I have seen a step-by-step work-through of the partitioning algorithm, but I don't have an intuitive feel for it. In my code, it doesn't even seem to work. For example, given the array
13 19 9 5 12 8 7 4 11 2 6 21
It will use pivot 13, but end up with the array
6 2 9 5 12 8 7 4 11 19 13 21
And will return j
which is a[j] = 11
. I thought it was supposed to be true that the array starting at that point and going forward should have values that are all larger than the pivot, but that isn't true here because 11 < 13.
Here's pseudocode for Hoare partitioning (from CLRS, second edition), in case this is useful:
Hoare-Partition (A, p, r)
x ← A[p]
i ← p − 1
j ← r + 1
while TRUE
repeat j ← j − 1
until A[j] ≤ x
repeat i ← i + 1
until A[i] ≥ x
if i < j
exchange A[i] ↔ A[j]
else return j
Thanks!
EDIT:
The right C code for this problem will end up being:
void QuickSort(int a[],int start,int end) {
int q;
if (end-start<2) return;
q=HoarePartition(a,start,end);
QuickSort(a,start,q);
QuickSort(a,q,end);
}
int HoarePartition (int a[],int p, int r) {
int x=a[p],i=p-1,j=r;
while (1) {
do j--; while (a[j] > x);
do i++; while (a[i] < x);
if (i < j)
swap(&a[i],&a[j]);
else
return j+1;
}
}
To answer the question of "Why does Hoare partitioning work?":
Let's simplify the values in the array to just three kinds: L values (those less than the pivot value), E values (those equal to the pivot value), and G value (those larger than the pivot value).
We'll also give a special name to one location in the array; we'll call this location s, and it's the location where the j pointer is when the procedure finishes. Do we know ahead of time which location s is? No, but we know that some location will meet that description.
With these terms, we can express the goal of the partitioning procedure in slightly different terms: it is to split a single array into two smaller sub-arrays which are not mis-sorted with respect to each other. That "not mis-sorted" requirement is satisfied if the following conditions are true:
That's really all we need to do. We don't even need to worry where the E values wind up on any given pass. As long as each pass gets the sub-arrays right with respect to each other, later passes will take care of any disorder that exists inside any sub-array.
So now let's address the question from the other side: how does the partitioning procedure ensure that there are no G values in s or to the left of it, and no L values to the right of s?
Well, "the set of values to the right of s" is the same as "the set of cells the j pointer moves over before it reaches s". And "the set of values to the left of and including s" is the same as "the set of values that the i pointer moves over before j reaches s".
That means that any values which are misplaced will, on some iteration of the loop, be under one of our two pointers. (For convenience, let's say it's the j pointer pointing at a L value, though it works exactly the same for the i pointer pointing at a G value.) Where will the i pointer be, when the j pointer is on a misplaced value? We know it will be:
Note that sometimes the i and j pointer will actually both stop on E values. When this happens, the values will be switched, even though there's no need for it. This doesn't do any harm, though; we said before that the placement of the E values can't cause mis-sorting between the sub-arrays.
So, to sum up, Hoare partitioning works because: