The searching problem shows us the importance of sorting. It used to be estimated that more computer time was spent sorting than doing anything else -- even that more than half of all computer usage was sorting. This is certainly no longer true -- idling is the biggest `activity', and communications, graphics, and `dedicated' processes [such as controlling washing machines, doorknobs, cars] are among other leading contenders -- but sorting is still a major activity, and moreover one which is of mathematical and computational interest.
We look first at comparison-based sorting. The background assumption is that we have an array of items, numbered from 1 to N, and that our primary weapon is that we can compare two items to find out which is the smaller.
The main alternative is bucket-based sorting. In this, we [conceptually] have a collection of `buckets', for example labelled `A', `B', `C', ..., `Z'. Then people with names such as `Brown', `Bacon', `Bloggs' all go into the `B' bucket. Any bucket with more than one item in it can then be sub-divided into `Ba', `Bb', ..., `Bz' buckets, and the items re-distributed. Details left to the imagination. This is how mail is typically sorted, for example, except that the buckets are then labelled with town names, or individual staff names, or whatever. There are many variations, some extremely efficient.

We assume that the comparison is mathematically well-behaved. This involves two features: (a) that the comparison doesn't change with time as the sorting progresses [I may be richer than you today, but poorer tomorrow -- we assume that the sorting is done fast enough for this not to matter]; (b) that the comparison is transitive [if I am richer than you and you are richer than him, then I am known to be richer than him without testing -- this wouldn't follow for things like ability at games, where it is quite possible for A to beat B, B to beat C and C to beat A]. Without these features, it's hard to say exactly what you would mean by sorting the array.
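The bucket scheme described above can be sketched in Python as follows; the function name and the recursive formulation are illustrative, not part of the notes.

```python
from collections import defaultdict

def bucket_sort(names, pos=0):
    """Distribute names into buckets by the letter at position `pos`,
    then recursively sub-divide any bucket holding several names."""
    if len(names) <= 1:
        return list(names)
    # Within one call every name shares its first `pos` letters, so a
    # name of length exactly `pos` has nothing left to compare: it sorts first.
    done = [n for n in names if len(n) <= pos]
    buckets = defaultdict(list)
    for n in names:
        if len(n) > pos:
            buckets[n[pos]].append(n)
    out = list(done)
    for letter in sorted(buckets):   # visit the buckets in `A', `B', ... order
        out.extend(bucket_sort(buckets[letter], pos + 1))
    return out
```

At each level there are only as many buckets as there are letters, and visiting them in label order glues the sub-sorted pieces back together; a sorting office does much the same with physical pigeon-holes.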
So, we are given our array of items; and we are given a comparison function which will tell us whether or not any pair are in the right order; we are asked to permute the items into an order such that A[1] <= A[2] <= A[3] <= ... <= A[N]. [Note that if some of the items are `equal', there may be many such orders; we usually don't care which of the valid orders happens, but sometimes we do, in which case a so-called stable sorting method, which doesn't alter the order of `equal' items, is needed.]
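To illustrate what stability buys you, here is a small Python example; the records and names are made up.

```python
# Hypothetical records (name, mark); we sort by mark only, so students on
# the same mark count as `equal' as far as the comparison is concerned.
records = [("Brown", 62), ("Adams", 71), ("Clark", 62), ("Davis", 71)]

# Python's built-in sort is stable: Brown stays ahead of Clark (both on 62)
# and Adams ahead of Davis (both on 71), preserving the original order
# of `equal' items.
by_mark = sorted(records, key=lambda r: r[1])
# → [('Brown', 62), ('Clark', 62), ('Adams', 71), ('Davis', 71)]
```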
We start with the array unsorted, and finish with it sorted; so in the middle, it must be partly-sorted. What could that mean? One obvious possibility is to ask that the first so-many items be sorted. So, let us consider: for some j, where 0 <= j <= N, A[1] <= A[2] <= A[3] <= ... <= A[j]. We can make this true by setting j to be 1, and our task is to increase j until it reaches the value N. Our algorithm will take the form:
for j = 2 to N
  `include some element of A[j], ..., A[N] into the sorted part of the array'
endfor

and our main task is to implement the `include ...' bit while making sure that the `invariant' remains true.
One implementation is insertion sort: take the next item A[j], binary-search the sorted part for the place where it belongs, and budge the intervening items up to make room.

1  for j = 2 to N
2    temp = A[j]; lo = 1; hi = j-1
3    while lo <= hi
4      mid = (lo + hi)/2
5      if A[mid] > temp then hi = mid-1 else lo = mid+1 endif
6    endwhile
7    for k = j-1 to lo step -1 do A[k+1] = A[k] endfor
8    A[lo] = temp
9  endfor

Here, lines 3-6 are the `search' algorithm. Counting `equal' items as less than temp will preserve stability. We finish the search with lo set to the index where we want to insert A[j], and hi set to the index just below it. Line 7 does the budging-up, leaving line 8 to plug the space left at index lo with temp [and note that when you budge up, you have to start at the top and work down -- if you work the wrong way, you copy the same item into all the slots].
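A direct Python translation of this pseudocode, using 0-based indices in place of the text's 1-based ones, might look like this:

```python
def insertion_sort(a):
    """Binary-chop insertion sort; sorts the list `a` in place."""
    for j in range(1, len(a)):
        temp = a[j]
        lo, hi = 0, j - 1
        # Binary chop: the else branch counts `equal' items as less than
        # temp, which keeps the sort stable.
        while lo <= hi:
            mid = (lo + hi) // 2
            if a[mid] > temp:
                hi = mid - 1
            else:
                lo = mid + 1
        # Budge up, starting at the top and working down.
        for k in range(j - 1, lo - 1, -1):
            a[k + 1] = a[k]
        a[lo] = temp
    return a
```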
The other obvious implementation is selection sort: find the smallest remaining item, and swap it into place.

1  for j = 1 to N-1
2    best = j
3    for k = j+1 to N do if A[k] < A[best] then best = k endif endfor
4    temp = A[j]; A[j] = A[best]; A[best] = temp
5  endfor

Pretty simple, huh? Lines 2 and 3 set best to the index of the smallest remaining item, and line 4 swaps it to the right place. In line 1, we don't have to bother with the case j = N because the last item left over must be the largest; but we do have to do the case j = 1 because the very first item isn't necessarily the smallest. Note that selection sort always makes (N-1) + (N-2) + ... + 1 = N(N-1)/2 comparisons, whatever the initial order.
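Again, a Python translation of this pseudocode (0-based indices) as a sketch:

```python
def selection_sort(a):
    """Selection sort; sorts the list `a` in place."""
    n = len(a)
    for j in range(n - 1):              # no need to handle the last slot
        best = j
        for k in range(j + 1, n):       # find the smallest remaining item
            if a[k] < a[best]:
                best = k
        a[j], a[best] = a[best], a[j]   # swap it into position j
    return a
```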
Insertion sort is a little harder. We first do a search. To use binary chop on an array of length j takes roughly lg j comparisons [where lg is logarithm to base 2], most easily seen from the fact that it takes an extra step to search every time the array length doubles. So altogether, we do lg 1 + lg 2 + ... + lg (N-1) = lg (N-1)! = roughly N lg (N/e) comparisons, using Stirling's approximation for N!, and ignoring some correction terms and the fact that all the numbers really ought to be integers. Filling in the details and getting better results is left to the masochistic. Now what about the budging up? In the best case, which is when the array is already sorted, there is no budging to do; the next item is already in the right place. In the worst case, which is when the array is initially anti-sorted, the next item always has to go in first and every element has to be budged up every time, for a total of 1 + 2 + ... + (N-1) = N(N-1)/2 budges. In the average case, where the next item lands in a random place, about half the elements have to be budged each time, so the expected total is N(N-1)/4 budges. So, if we assume that a budge takes about as much time as a comparison, then in the worst possible case insertion sorting is a little slower than selection sorting (because of the searching), on average it's about twice as fast, and in the best cases it's a whole lot faster (because lg N is a whole lot less than N for large N). The best case is actually quite common, or nearly so; you quite often have a sorted list of things, and you want to change a few items and re-sort, so that only a few items are out of place. So insertion sort is also an O(N^2) method in general, but with luck it's only O(N log N) [note the subtle change from `lg' to `log'!].
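The best- and worst-case budge counts above can be checked directly by instrumenting the insertion sort; this sketch counts only the budges (the moves in line 7 of the pseudocode), not the comparisons.

```python
def budge_count(a):
    """Binary-chop insertion sort on a copy of `a`, returning the number
    of budges performed."""
    a = list(a)
    budges = 0
    for j in range(1, len(a)):
        temp = a[j]
        lo, hi = 0, j - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if a[mid] > temp:
                hi = mid - 1
            else:
                lo = mid + 1
        for k in range(j - 1, lo - 1, -1):
            a[k + 1] = a[k]
            budges += 1
        a[lo] = temp
    return budges

n = 100
best = budge_count(range(n))          # already sorted: no budging at all
worst = budge_count(range(n, 0, -1))  # anti-sorted: N(N-1)/2 budges
```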
Take your pick! Insertion is probably a little more efficient, but selection is about as simple a piece of code as you can write for this purpose. And yet .... Either of these methods is one you might reasonably employ for sorting half a dozen books, or a hand of cards, or a tutorial group of students, into order. But imagine them with a large number of items.
Suppose, for the sake of definiteness, you have been assigned the task of re-shelving all the books in the George Green Library. [Either the students have taken all the books out and it is re-call day, or else some mad librarian has decided to change the system!] Selection sort involves first of all scouring the entire library in order to find the earliest book, say `Aardvarks for all'. Then you scour the library again for `Aardvarks in danger'. Then again for .... A very long time later, you reach `Zoology'. Insertion sort means that when you're near the end of the process, you find a book, typically from around the middle of the library, so you have to shift up half the books in the entire library just to make room for this one book. Bubblesorts fare no better. You wouldn't do it this way! In all these cases, there are too many `small' steps when what you want is a grand design that will put all the books somewhere not too far from their final resting places with only a modest amount of work per book. How much work? In this particular case -- comparison-based sorting -- we can put bounds on it.
How many comparisons must any method make? Think of the N! possible orderings of the items: each comparison we make is compatible with some of these orderings and incompatible with the rest, so each answer shortens the list of orderings that could still be the sorted order, and we are done when only one remains. If we are very lucky, then most of the ways are incompatible, and we are quickly left with a very short list. It is easy to see that we cannot possibly be sure that the items are sorted without every item being at some stage compared with its neighbours, so the luckiest possible is N-1 comparisons [e.g., if the items are originally in the right order, and we just look along the row]. But being very lucky implies that we could equally have been unlucky; if the comparison had gone the other way, then most of the ways would have been compatible, and we would have had to make many more comparisons to make the list shorter. If a comparison divides the list in this unequal way, and all orderings are equally likely [this assumption is important!], then the bigger part is more likely to be the compatible outcome than the smaller part. So the best we can hope to do, both on average and in the worst case, is to divide the list into two equal parts. In this case, the number of steps needed to reduce the list to one is clearly lg N!, or roughly N lg (N/e), again by Stirling's approximation. You cannot do better than this on average; and if you devise a method which does better than this in some, or even most, cases, then it will sometimes do worse, perhaps even much worse.
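The claim that lg N! is roughly N lg (N/e) is easy to check numerically; this sketch computes lg N! exactly via the log-gamma function [ln N! = lgamma(N+1)] and compares it with the approximation.

```python
import math

def lg_factorial(n):
    # lg N! computed via the log-gamma function: ln N! = lgamma(N+1)
    return math.lgamma(n + 1) / math.log(2)

def stirling(n):
    # the approximation N lg (N/e) from the text
    return n * math.log2(n / math.e)

for n in (10, 1000, 100000):
    print(n, round(lg_factorial(n)), round(stirling(n)))
```

The approximation always under-counts slightly [the dropped correction term is about (1/2) lg (2 pi N)], but the relative error shrinks rapidly as N grows.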
Using O notation, we can draw up a little table:
|              | insertion  | selection | theoretical limit |
|--------------|------------|-----------|-------------------|
| best case    | O(N log N) | O(N^2)    | O(N)              |
| average case | O(N^2)     | O(N^2)    | O(N log N)        |
| worst case   | O(N^2)     | O(N^2)    | O(N log N)        |
To see what this means in practice, suppose that each comparison takes a microsecond. Then selection sort's roughly N^2/2 comparisons, against the theoretical limit of lg N! comparisons, take about:

| N         | selection       | theoretical limit |
|-----------|-----------------|-------------------|
| 10        | 50 microseconds | 22 microseconds   |
| 100       | 5 milliseconds  | 520 microseconds  |
| 1000      | 0.5 seconds     | 8.5 milliseconds  |
| 10000     | 50 seconds      | 120 milliseconds  |
| 100000    | 1.5 hours       | 1.5 seconds       |
| 1000000   | 6 days          | 19 seconds        |
| 10000000  | 2 years         | 3.5 minutes       |
| 100000000 | 2 centuries     | 40 minutes        |
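The entries in this table can be reproduced by assuming one microsecond per comparison, N^2/2 comparisons for selection sort, and lg N! for the theoretical limit; the function names here are illustrative.

```python
import math

US = 1e-6  # assumed cost of one comparison, in seconds

def selection_time(n):
    # selection sort makes roughly N^2/2 comparisons
    return n * n / 2 * US

def limit_time(n):
    # the theoretical limit of lg N! comparisons, via ln N! = lgamma(N+1)
    return math.lgamma(n + 1) / math.log(2) * US

for n in (10, 1000, 10**6, 10**8):
    print(f"N={n}: selection {selection_time(n):.3g} s, "
          f"limit {limit_time(n):.3g} s")
```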
Of course, we haven't yet shown that there are any methods that even begin to approach the theoretical limit. That is our next topic!