The searching problem shows us the importance of sorting. It used to be estimated that more computer time was spent sorting than doing anything else -- even that more than half of all computer usage was sorting. This is certainly no longer true -- idling is the biggest `activity', and communications, graphics, and `dedicated' processes [such as controlling washing machines, doorknobs, cars] are among other leading contenders -- but sorting is still a major activity, and moreover one which is of mathematical and computational interest.
We look first at comparison-based sorting. The background assumption is that we have an array of items, numbered from 1 to N, and that our primary weapon is that we can compare two items to find out which is the smaller.
The main alternative is bucket-based sorting. In this, we [conceptually] have a collection of `buckets', for example labelled `A', `B', `C', ..., `Z'. Then people with names such as `Brown', `Bacon', `Bloggs' all go into the `B' bucket. Any bucket with more than one item in it can then be sub-divided into `Ba', `Bb', ..., `Bz' buckets, and the items re-distributed. Details left to the imagination. This is how mail is typically sorted, for example, except that the buckets are then labelled with town names, or individual staff names, or whatever. There are many variations, some extremely efficient.

We assume that the comparison is mathematically well-behaved. This involves two features: (a) that the comparison doesn't change with time as the sorting progresses [I may be richer than you today, but poorer tomorrow -- we assume that the sorting is done fast enough for this not to matter]; (b) that the comparison is transitive [if I am richer than you and you are richer than him, then I am known to be richer than him without testing -- this wouldn't follow for things like ability at games, where it is quite possible for A to beat B, B to beat C and C to beat A]. Without these features, it's hard to say exactly what you would mean by sorting the array.
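The bucket scheme described above can be sketched in Python as follows; the function name and the recursive formulation are illustrative, not part of the notes.

```python
from collections import defaultdict

def bucket_sort(names, pos=0):
    """Distribute names into buckets by the letter at position `pos`,
    then recursively sub-divide any bucket holding several names."""
    if len(names) <= 1:
        return list(names)
    # Within one call every name shares its first `pos` letters, so a
    # name of length exactly `pos` has nothing left to compare: it sorts first.
    done = [n for n in names if len(n) <= pos]
    buckets = defaultdict(list)
    for n in names:
        if len(n) > pos:
            buckets[n[pos]].append(n)
    out = list(done)
    for letter in sorted(buckets):   # visit the buckets in `A', `B', ... order
        out.extend(bucket_sort(buckets[letter], pos + 1))
    return out
```

At each level there are only as many buckets as there are letters, and visiting them in label order glues the sub-sorted pieces back together; a sorting office does much the same with physical pigeon-holes.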
So, we are given our array of items; and we are given a comparison function which will tell us whether or not any pair are in the right order; we are asked to permute the items into an order such that A[1] <= A[2] <= A[3] <= ... <= A[N]. [Note that if some of the items are `equal', there may be many such orders; we usually don't care which of the valid orders happens, but sometimes we do, in which case a so-called stable sorting method, which doesn't alter the order of `equal' items, is needed.]
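To illustrate what stability buys you, here is a small Python example; the records and names are made up.

```python
# Hypothetical records (name, mark); we sort by mark only, so students on
# the same mark count as `equal' as far as the comparison is concerned.
records = [("Brown", 62), ("Adams", 71), ("Clark", 62), ("Davis", 71)]

# Python's built-in sort is stable: Brown stays ahead of Clark (both on 62)
# and Adams ahead of Davis (both on 71), preserving the original order
# of `equal' items.
by_mark = sorted(records, key=lambda r: r[1])
# → [('Brown', 62), ('Clark', 62), ('Adams', 71), ('Davis', 71)]
```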
We start with the array unsorted, and finish with it sorted; so in the middle, it must be partly-sorted. What could that mean? One obvious possibility is to ask that the first so-many items be sorted. So, let us consider: for some j, where 0 <= j <= N, A[1] <= A[2] <= A[3] <= ... <= A[j]. We can make this true by setting j to be 1, and our task is to increase j until it reaches the value N. Our algorithm will take the form:
for j = 2 to N
  `include some element of A[j], ..., A[N] into the sorted part of the array'
endfor

and our main task is to implement the `include ...' bit while making sure that the `invariant' remains true.
One implementation is insertion sort: take the next item A[j], binary-search the sorted part for the place where it belongs, and budge the intervening items up to make room.

1  for j = 2 to N
2    temp = A[j]; lo = 1; hi = j-1
3    while lo <= hi
4      mid = (lo + hi)/2
5      if A[mid] > temp then hi = mid-1 else lo = mid+1 endif
6    endwhile
7    for k = j-1 to lo step -1 do A[k+1] = A[k] endfor
8    A[lo] = temp
9  endfor

Here, lines 3-6 are the `search' algorithm. Counting `equal' items as less than temp will preserve stability. We finish the search with lo set to the index where we want to insert A[j], and hi set to the index just below it. Line 7 does the budging-up, leaving line 8 to plug the space left at index lo with temp [and note that when you budge up, you have to start at the top and work down -- if you work the wrong way, you copy the same item into all the slots].
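A direct Python translation of this pseudocode, using 0-based indices in place of the text's 1-based ones, might look like this:

```python
def insertion_sort(a):
    """Binary-chop insertion sort; sorts the list `a` in place."""
    for j in range(1, len(a)):
        temp = a[j]
        lo, hi = 0, j - 1
        # Binary chop: the else branch counts `equal' items as less than
        # temp, which keeps the sort stable.
        while lo <= hi:
            mid = (lo + hi) // 2
            if a[mid] > temp:
                hi = mid - 1
            else:
                lo = mid + 1
        # Budge up, starting at the top and working down.
        for k in range(j - 1, lo - 1, -1):
            a[k + 1] = a[k]
        a[lo] = temp
    return a
```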
The other obvious implementation is selection sort: find the smallest remaining item, and swap it into place.

1  for j = 1 to N-1
2    best = j
3    for k = j+1 to N do if A[k] < A[best] then best = k endif endfor
4    temp = A[j]; A[j] = A[best]; A[best] = temp
5  endfor

Pretty simple, huh? Lines 2 and 3 set best to the index of the smallest remaining item, and line 4 swaps it to the right place. In line 1, we don't have to bother with the case j = N because the last item left over must be the largest; but we do have to do the case j = 1 because the very first item isn't necessarily the smallest. Note that selection sort always makes (N-1) + (N-2) + ... + 1 = N(N-1)/2 comparisons, whatever the initial order.
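Again, a Python translation of this pseudocode (0-based indices) as a sketch:

```python
def selection_sort(a):
    """Selection sort; sorts the list `a` in place."""
    n = len(a)
    for j in range(n - 1):              # no need to handle the last slot
        best = j
        for k in range(j + 1, n):       # find the smallest remaining item
            if a[k] < a[best]:
                best = k
        a[j], a[best] = a[best], a[j]   # swap it into position j
    return a
```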
Insertion sort is a little harder. We first do a search. To use binary chop on an array of length j takes roughly lg j comparisons [where lg is logarithm to base 2], most easily seen from the fact that it takes an extra step to search every time the array length doubles. So altogether, we do lg 1 + lg 2 + ... + lg (N-1) = lg (N-1)! = roughly N lg (N/e) comparisons, using Stirling's approximation for N!, and ignoring some correction terms and the fact that all the numbers really ought to be integers. Filling in the details and getting better results is left to the masochistic. Now what about the budging up? In the best case, which is when the array is already sorted, there is no budging to do; the next item is already in the right place. In the worst case, which is when the array is initially anti-sorted, the next item always has to go in first and every element has to be budged up every time, for a total of 1 + 2 + ... + (N-1) = N(N-1)/2 budges. In the average case, where the next item lands in a random place, about half the elements have to be budged each time, so the expected total is N(N-1)/4 budges. So, if we assume that a budge takes about as much time as a comparison, then in the worst possible case insertion sorting is a little slower than selection sorting (because of the searching), on average it's about twice as fast, and in the best cases it's a whole lot faster (because lg N is a whole lot less than N for large N). The best case is actually quite common, or nearly so; you quite often have a sorted list of things, and you want to change a few items and re-sort, so that only a few items are out of place. So insertion sort is also an O(N^2) method in general, but with luck it's only O(N log N) [note the subtle change from `lg' to `log'!].
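The best- and worst-case budge counts above can be checked directly by instrumenting the insertion sort; this sketch counts only the budges (the moves in line 7 of the pseudocode), not the comparisons.

```python
def budge_count(a):
    """Binary-chop insertion sort on a copy of `a`, returning the number
    of budges performed."""
    a = list(a)
    budges = 0
    for j in range(1, len(a)):
        temp = a[j]
        lo, hi = 0, j - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if a[mid] > temp:
                hi = mid - 1
            else:
                lo = mid + 1
        for k in range(j - 1, lo - 1, -1):
            a[k + 1] = a[k]
            budges += 1
        a[lo] = temp
    return budges

n = 100
best = budge_count(range(n))          # already sorted: no budging at all
worst = budge_count(range(n, 0, -1))  # anti-sorted: N(N-1)/2 budges
```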
Take your pick! Insertion is probably a little more efficient, but selection is about as simple a piece of code as you can write for this purpose. And yet .... Either of these methods is one you might reasonably employ for sorting half a dozen books, or a hand of cards, or a tutorial group of students, into order. But imagine them with a large number of items.
Suppose, for the sake of definiteness, you have been assigned the task of re-shelving all the books in the George Green Library. [Either the students have taken all the books out and it is re-call day, or else some mad librarian has decided to change the system!] Selection sort involves first of all scouring the entire library in order to find the earliest book, say `Aardvarks for all'. Then you scour the library again for `Aardvarks in danger'. Then again for .... A very long time later, you reach `Zoology'. Insertion sort means that when you're near the end of the process, you find a book, typically from around the middle of the library, so you have to shift up half the books in the entire library just to make room for this one book. Bubblesorts fare no better. You wouldn't do it this way! In all these cases, there are too many `small' steps when what you want is a grand design that will put all the books somewhere not too far from their final resting places with only a modest amount of work per book. How much work? In this particular case -- comparison-based sorting -- we can put bounds on it.
How many comparisons must any method make? Think of the N! possible orderings of the items: each comparison we make is compatible with some of these orderings and incompatible with the rest, so each answer shortens the list of orderings that could still be the sorted order, and we are done when only one remains. If we are very lucky, then most of the ways are incompatible, and we are quickly left with a very short list. It is easy to see that we cannot possibly be sure that the items are sorted without every item being at some stage compared with its neighbours, so the luckiest possible is N-1 comparisons [e.g., if the items are originally in the right order, and we just look along the row]. But being very lucky implies that we could equally have been unlucky; if the comparison had gone the other way, then most of the ways would have been compatible, and we would have had to make many more comparisons to make the list shorter. If a comparison divides the list in this unequal way, and all orderings are equally likely [this assumption is important!], then the bigger part is more likely to be the compatible outcome than the smaller part. So the best we can hope to do, both on average and in the worst case, is to divide the list into two equal parts. In this case, the number of steps needed to reduce the list to one is clearly lg N!, or roughly N lg (N/e), again by Stirling's approximation. You cannot do better than this on average; and if you devise a method which does better than this in some, or even most, cases, then it will sometimes do worse, perhaps even much worse.
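The claim that lg N! is roughly N lg (N/e) is easy to check numerically; this sketch computes lg N! exactly via the log-gamma function [ln N! = lgamma(N+1)] and compares it with the approximation.

```python
import math

def lg_factorial(n):
    # lg N! computed via the log-gamma function: ln N! = lgamma(N+1)
    return math.lgamma(n + 1) / math.log(2)

def stirling(n):
    # the approximation N lg (N/e) from the text
    return n * math.log2(n / math.e)

for n in (10, 1000, 100000):
    print(n, round(lg_factorial(n)), round(stirling(n)))
```

The approximation always under-counts slightly [the dropped correction term is about (1/2) lg (2 pi N)], but the relative error shrinks rapidly as N grows.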
Using O notation, we can draw up a little table:
|              | insertion  | selection | theoretical limit |
|--------------|------------|-----------|-------------------|
| best case    | O(N log N) | O(N^2)    | O(N)              |
| average case | O(N^2)     | O(N^2)    | O(N log N)        |
| worst case   | O(N^2)     | O(N^2)    | O(N log N)        |
To see what this means in practice, suppose that each comparison takes a microsecond. Then selection sort's roughly N^2/2 comparisons, against the theoretical limit of lg N! comparisons, take about:

| N         | selection       | theoretical limit |
|-----------|-----------------|-------------------|
| 10        | 50 microseconds | 22 microseconds   |
| 100       | 5 milliseconds  | 520 microseconds  |
| 1000      | 0.5 seconds     | 8.5 milliseconds  |
| 10000     | 50 seconds      | 120 milliseconds  |
| 100000    | 1.5 hours       | 1.5 seconds       |
| 1000000   | 6 days          | 19 seconds        |
| 10000000  | 2 years         | 3.5 minutes       |
| 100000000 | 2 centuries     | 40 minutes        |
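The entries in this table can be reproduced by assuming one microsecond per comparison, N^2/2 comparisons for selection sort, and lg N! for the theoretical limit; the function names here are illustrative.

```python
import math

US = 1e-6  # assumed cost of one comparison, in seconds

def selection_time(n):
    # selection sort makes roughly N^2/2 comparisons
    return n * n / 2 * US

def limit_time(n):
    # the theoretical limit of lg N! comparisons, via ln N! = lgamma(N+1)
    return math.lgamma(n + 1) / math.log(2) * US

for n in (10, 1000, 10**6, 10**8):
    print(f"N={n}: selection {selection_time(n):.3g} s, "
          f"limit {limit_time(n):.3g} s")
```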
Of course, we haven't yet shown that there are any methods that even begin to approach the theoretical limit. That is our next topic!