I was working on implementing a quicksort yesterday, and then I ran it, expecting a faster runtime than the mergesort (which I had also implemented). I ran the two, and while the quicksort was faster for smaller data sets (<100 elements, and I did verify that it works), the mergesort became the quicker algorithm fairly quickly. I had been taught that quicksort is almost always "quicker" than mergesort, and I understand that there is some debate on this topic, but I at least expected it to be closer than this. For data sets >10000 elements, the mergesort was over four times faster. Is this to be expected, or is there an error in my quicksort code?
mergesort:
public static void mergeSort(int[] e) {
    if (e.length <= 1) return;

    // split the array into two halves
    int[] first = new int[e.length / 2];
    int[] second = new int[e.length - first.length];
    System.arraycopy(e, 0, first, 0, first.length);
    System.arraycopy(e, first.length, second, 0, second.length);

    // sort each half, then merge the results back into e
    mergeSort(first);
    mergeSort(second);
    System.arraycopy(merge(first, second), 0, e, 0, e.length);
}
private static int[] merge(int[] first, int[] second) {
    int iFirst = 0;
    int iSecond = 0;
    int iCombined = 0;
    int[] combined = new int[first.length + second.length];

    // take the smaller head element from either half until one half runs out
    while (iFirst < first.length && iSecond < second.length) {
        if (first[iFirst] > second[iSecond]) {
            combined[iCombined++] = second[iSecond++];
        } else {
            combined[iCombined++] = first[iFirst++];
        }
    }

    // copy whatever is left of the half that was not exhausted
    for (; iFirst < first.length; iFirst++) {
        combined[iCombined++] = first[iFirst];
    }
    for (; iSecond < second.length; iSecond++) {
        combined[iCombined++] = second[iSecond];
    }
    return combined;
}
quicksort:
public static void quicksort(int[] a, int first, int last) {
    if (first >= last) return;

    // partition around a pivot, then sort the two sides recursively
    int partitionIndex = partition(a, first, last);
    quicksort(a, first, partitionIndex - 1);
    quicksort(a, partitionIndex + 1, last);
}
public static int partition(int[] x, int first, int last) {
    int left = first;
    int right = last;
    int pivot = x[first];   // use the first element as the pivot
    int pivotIdx = first;

    while (left <= right) {
        // advance left over elements <= pivot; move right back over elements > pivot
        while (left < x.length && x[left] <= pivot) left++;
        while (right >= 0 && x[right] > pivot) right--;
        if (left <= right) {
            int temp = x[left];
            x[left] = x[right];
            x[right] = temp;
        }
    }

    // move the pivot into its final position and report where it landed
    pivotIdx = right;
    x[first] = x[right];
    x[pivotIdx] = pivot;
    return pivotIdx;
}
I actually just wrote a "linked-list comparative sort demo program" in C and arrived at a similar conclusion (that mergesort will beat quicksort for most uses), although I have been told that quicksort is generally not used for linked lists anyway. I would note that the choice of pivot values is a monster factor -- my initial version used a random node as the pivot, and when I refined it a bit to take a mean of two (random) nodes, the execution time for 1000000 records went from over 4 minutes to less than 10 seconds, putting it right on par with mergesort.
Mergesort and quicksort have the same big-O best case, O(n log n), and despite what people may try to claim, big O is really about iteration count rather than comparison count. The biggest difference that can be produced between the two of them will always be to quicksort's detriment, and it involves lists that are already largely sorted or that contain a large number of ties (when quicksort does better than mergesort, the difference will not be nearly so great). That is because ties and already-sorted segments streamline straight through mergesort: when two split lists come back to be merged, if one list already contains all of the smaller values, every value on the left is compared once against the first element of the right, and then (since the returned lists are internally ordered) no further comparisons are needed and the right is simply appended onto the end. That is to say, the number of iterations stays constant, but the number of comparisons is cut roughly in half. If you are talking about actual time and are sorting strings, it's the comparisons that are expensive.
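To put a number on that, here's a small sketch in Java (to match the question's code rather than my C list version) of a merge that only counts comparisons. Merging two halves where everything in the first is smaller than everything in the second costs about n/2 comparisons, while merging interleaved halves costs about n - 1:

class MergeComparisonDemo {
    // counts how many element comparisons a single merge performs
    static long mergeComparisons(int[] a, int[] b) {
        long comparisons = 0;
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            comparisons++;
            if (a[i] <= b[j]) i++; else j++;
        }
        // whatever remains in a or b is copied without any further comparisons
        return comparisons;
    }

    public static void main(String[] args) {
        int[] low  = {1, 2, 3, 4};   // every value smaller than every value in high
        int[] high = {5, 6, 7, 8};
        int[] odd  = {1, 3, 5, 7};   // interleaved with even
        int[] even = {2, 4, 6, 8};

        System.out.println(mergeComparisons(low, high));   // 4 comparisons (n/2)
        System.out.println(mergeComparisons(odd, even));   // 7 comparisons (n - 1)
    }
}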
Ties and already-sorted segments in quicksort, on the other hand, can easily lead to unbalanced partitions if the pivot value is not carefully chosen, and those imbalanced partitions (e.g., one element on the right, ten on the left) are what cause the slowdown. So, if you can get your quicksort to perform as well on an already-sorted list as it does on a randomized list, you've got a good method for finding the pivot.
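For arrays, one common way to get that behaviour is a median-of-three pivot (first, middle, and last element). That's not the same as the mean-of-two-random-nodes trick from my list code, but it serves the same purpose: it keeps the partitions balanced on already-sorted input. Here is a rough sketch of how it could sit in front of a partition routine that pivots on x[first], like the one in the question (the method names are mine, purely for illustration):

// pick the median of x[first], x[mid], x[last] and move it into x[first],
// so a partition routine that always pivots on x[first] can be reused as-is
static void medianOfThreeToFront(int[] x, int first, int last) {
    int mid = first + (last - first) / 2;

    // order the three sampled positions so the median ends up at mid
    if (x[mid]  < x[first]) swap(x, mid,  first);
    if (x[last] < x[first]) swap(x, last, first);
    if (x[last] < x[mid])   swap(x, last, mid);

    // move the median (now at mid) to the front, where partition() expects its pivot
    swap(x, first, mid);
}

static void swap(int[] x, int i, int j) {
    int temp = x[i];
    x[i] = x[j];
    x[j] = temp;
}

Calling medianOfThreeToFront(a, first, last) just before partition(a, first, last) inside quicksort() should be the only change needed.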
If you're interested, the demo program produces output like this:
[root~/C] ./a.out -1 3
Using "", 0 records
Primary Criteria offset=128
Command (h for help, Q to quit): N
How many records? 4000000
New list is 562500.00 kb
Command (h for help, Q to quit): m
Mergesorting..............3999999 function calls
123539969 Iterations Comparison calls: 82696100
Elapsed time: 0 min 9 sec
Command (h for help, Q to quit): S
Shuffled.
Command (h for help, Q to quit): q
Quicksorting..............4000000 function calls
190179315 Iterations Comparison calls: 100817020
Elapsed time: 0 min 23 sec
Although without the crazy colors. There's some more about it from me about halfway down this page.
P.S. Neither sort requires extra memory with the linked list.