G12FCO -- Formal Computation

Lecture 5 -- Sorting II


The previous discussion implies that we should be looking for much more efficient ways of sorting. If we are to approach the theoretical limit, then we need to look at each item no more, on average, than O(log N) times, rather than O(N). How? There are several ways, of which the most interesting is that known as Quicksort. To introduce this, we first look at a problem known as ...

The Dutch National Flag

The Dutch national flag consists of red, white and blue stripes. You are asked to imagine that we have an array of items of these colours, and we have to re-arrange them into order, all the reds, then all the whites, then all the blues.

Naturally, there are some snags. Firstly, we don't know what colour each item is until we look at it, and we have to imagine that looking is expensive. Secondly, we are forgetful, so we can't look at all the items once and then remember what they all are. Thirdly, all we can do to re-arrange items is to swap two of them. Fourthly, our algorithm must be robust in case there actually are no items of some particular colour(s). Fifthly, all of this is to be done as efficiently as possible.

So, our items are of four kinds: not yet looked at [let us call these `grey'], known to be red, known to be white, and known to be blue. Initially, all the items are grey, and finally none of them will be. Our `forgetfulness' implies that items of the same colour must be kept in zones, so that we have to remember only the boundaries of a small number of zones, not the colours of each item. Obviously, four zones, Grey, Red, White and Blue, are both necessary and sufficient. In what order? It doesn't really matter, but it will obviously be best if we put Red at the left and Blue at the right; then we will never have to move any red or blue item again. If we, arbitrarily, use the order RWGB, then we get, in the middle of the process, the following picture:

        +-----------+-----------+-----------------------+-----------+
        |    red    |   white   |   grey (== unknown)   |   blue    |
        +-----------+-----------+-----------------------+-----------+
         1           ^red        ^white            blue^           N
Left as exercises: The order RGWB differs only trivially from the above. If there are expected to be very few white items, then the order WRGB (or RGBW) is more efficient; you then have to tidy up at the end. You can also experiment with the more symmetric RGWGB (with two grey zones); this is more efficient until one of the grey zones disappears, but the program is harder to write.
The boundaries of the zones are marked by the three integer variables red, white and blue, as shown. In the diagram, we have red set to be the index of the first non-red item, and similarly for the others; you can experiment with setting red to be the last red item, which looks slightly more intuitive, but is actually just a little harder to program.

We now have the basis for our algorithm:

        Initialise the array to all grey.
        While there are any grey items,
                look at one of the grey items, and
                        if it is blue, swap with the last grey item, or
                        if it is white, swap with the first grey item, or
                        if it is red, swap with the first white, and
                                swap that white with the first grey,
                in each case, updating the zone markers to match.
Which grey item should we look at? In principle, any would do; looking at the algorithm, it is `obviously' best to look at the first grey item, that indicated by the white variable, which makes the red and white cases much easier. So, refining our algorithm to be much closer to computerese, we have:
        [Initialise:] red = 1; white = 1; blue = N
        [Loop:] while white <= blue
                if item[white] is white, increment white, else
                if it is blue, swap (white, blue) and decrement blue, else
                [it is red:] swap (white, red) and increment red and white
Is this a proper algorithm? Yes. Looking at the above figure, we have a rather complicated `invariant': items 1 to red-1 [inclusive] are known to be red, items red to white-1 [inclusive] are known to be white, items white to blue [inclusive] are known to be grey, and items blue+1 to N [inclusive] are known to be blue. This condition is established to be true by the initialisation (with the red, white and blue zones all empty), it is re-established each time round the loop, and the number of grey items, blue-white+1, is reduced by one each time. Further details again left as an exercise!
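
For concreteness, here is the refined algorithm as a C function; a minimal sketch, assuming 0-based arrays [so the pseudocode's 1..N becomes 0..n-1, and blue marks one past the last grey item rather than the last grey item itself]. The names dnf, swap and the colour constants are illustrative, not prescribed:

        #include <stddef.h>

        enum colour { RED, WHITE, BLUE };

        static void swap(enum colour a[], size_t i, size_t j)
        {
            enum colour t = a[i]; a[i] = a[j]; a[j] = t;
        }

        /* Invariant: a[0..red-1] red, a[red..white-1] white,
           a[white..blue-1] grey [unknown], a[blue..n-1] blue. */
        void dnf(enum colour a[], size_t n)
        {
            size_t red = 0, white = 0, blue = n;
            while (white < blue) {               /* any grey items left? */
                if (a[white] == WHITE)
                    white++;                     /* extend the white zone */
                else if (a[white] == BLUE)
                    swap(a, white, --blue);      /* send it to the far end */
                else
                    swap(a, white++, red++);     /* red: swaps with the first white */
            }
        }

Note that the loop body does a bounded amount of work per grey item, as the O(N) claim below requires.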
Remark: How does anyone think of such a messy invariant, and shouldn't an invariant look more like an equation? To take the last part first -- sometimes it does, but more often it takes the form of a description of the current state of the calculation. How do we think of it? Basically by imagining the initial state of the calculation and the final state, and trying to generalise to an intermediate state that we can describe. There is no substitute for experience.
This algorithm looks at each item, and does a bounded amount of work on each, so it is O(N). You can make it somewhat more efficient by looking at two items [those indexed by white and blue] at a time, and dealing with more cases, at the expense of complexity and of making many special cases [e.g. when there is only one unknown item]; exploring this would take us too far afield, and there is clearly no way of avoiding looking at every item.

What does all this have to do with ...

Quicksort

you are asking? The answer is that the DNF algorithm can also be used for pivoting. Suppose that instead of sorting our items [simply] into red, white and blue, we want to sort them into order. Then we can select one item [the pivot], and split the array into those less than the pivot [`red'], equal to it [`white'] and greater than it [`blue']. We then need to sort the `red' and `blue' sections of the array. How? By using the same method. Thus:
        To quicksort items i to j [inclusive] of an array:
                if i < j [else there is nothing to do]
                        choose a pivot,
                        use DNF to partition this segment of the array,
                        quicksort items i to red-1,
                        quicksort items blue+1 to j.
Remark: As with all recursive processes, do not try to understand this algorithm by staring at how it works in detail. You first need to have faith that it works, then it is easy to see that your faith is justified.
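
In C, the whole process might look like this; again only a sketch, with a deliberately naive pivot [the middle item; see below] and helper names of our own choosing. The call quicksort(a, 0, n) sorts a[0..n-1]:

        #include <stddef.h>

        static void swap_int(int a[], size_t i, size_t j)
        {
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }

        /* Sort a[lo..hi-1] by three-way [DNF] partitioning. */
        void quicksort(int a[], size_t lo, size_t hi)
        {
            if (hi - lo < 2)
                return;                          /* 0 or 1 items: nothing to do */

            int pivot = a[lo + (hi - lo) / 2];   /* naive choice: middle item */
            size_t red = lo, white = lo, blue = hi;

            while (white < blue) {               /* the DNF pass */
                if (a[white] == pivot)
                    white++;                     /* `white': equal to pivot */
                else if (a[white] > pivot)
                    swap_int(a, white, --blue);  /* `blue': greater */
                else
                    swap_int(a, white++, red++); /* `red': less */
            }

            quicksort(a, lo, red);               /* the `red' segment */
            quicksort(a, blue, hi);              /* the `blue' segment */
        }

The equal-to-pivot zone always contains at least one item [the pivot itself], so both recursive calls are on strictly shorter segments and the recursion terminates.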
How efficient is this process? In the best cases, the pivot is roughly the median element, so that the red and blue segments are roughly half the array each. So the DNF algorithm is used once on the whole array, then again on each half, again on each quarter, again on each eighth, and so on, until the segments are of size one. So the work done is O(N) + 2×O(N/2) + 4×O(N/4) + ... + N×O(1), or O(N) + O(N) + O(N) + ... + O(N), where the number of terms is clearly of order the log to base 2 of N, so the total work is [at last!] O(N log N).
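
For those who like recurrences, the same best-case argument can be written [with c standing for the constant hidden in the O(N) cost of one DNF pass]:

        T(N) = cN + 2 T(N/2)
             = 2cN + 4 T(N/4)
             = 3cN + 8 T(N/8)
             = ...
             = cN log2 N + N T(1)     [after log2 N halvings]
             = O(N log N)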

Have we found the Holy Grail of sorting? Yes and no. Quicksort is, with a few modifications, the method of choice for comparison-and-swap-based sorting. It is easy and efficient -- in the best case. The choice of pivot is crucial. In the worst case, the pivot turns out to be at one end of the array; in this case, the `blue' segment is all-but-one of the items, and we have found a rather poor implementation of selection sort. Note that this happens systematically if we choose the pivot to be either the first or the last item and the array happens to be nearly sorted already -- a common special case. So, it is worth spending some time finding a good pivot. One commonly used way is to select the median of the first, last and middle elements of the array; this works well if the array is `random' or is nearly sorted, but it can still do badly if the pivot is especially unlucky.
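
That median-of-three selection is only a few lines; a sketch in C, with the function name ours [it would replace the naive middle-item choice in the earlier quicksort sketch]:

        #include <stddef.h>

        /* Median of the first, middle and last items of a[lo..hi-1]. */
        int median_of_three(const int a[], size_t lo, size_t hi)
        {
            int x = a[lo], y = a[lo + (hi - 1 - lo) / 2], z = a[hi - 1];
            if (x > y) { int t = x; x = y; y = t; }  /* now x <= y           */
            if (y > z) y = z;                        /* y = min(max(x,y), z) */
            if (x > y) y = x;                        /* y is now the median  */
            return y;
        }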

With careful choice of pivot, quicksort is a good way of bringing the array into nearly-sorted order; but it becomes rather fussy when the segments get very short. So it is usual to replace quicksort by one of the less-efficient sorting methods in the very last stages. It is also possible to replace some of the recursive calls of quicksort by a loop. Full details get very messy, but if you're trying to squeeze the last few percent of efficiency out of a program, ....
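
One common arrangement, sketched here with our own names [partition standing for the three-way DNF pass above, CUTOFF a tunable constant]: recurse on one side, loop on the other, abandon segments once they are short, and let a single insertion sort finish the nearly-sorted array.

        #include <stddef.h>

        #define CUTOFF 10   /* segments this short are left for the tidy-up */

        /* Hypothetical helper: the DNF pass from the earlier sketch,
           reporting its final `red' and `blue' markers. */
        void partition(int a[], size_t lo, size_t hi,
                       size_t *red, size_t *blue);

        void rough_sort(int a[], size_t lo, size_t hi)
        {
            while (hi - lo > CUTOFF) {
                size_t red, blue;
                partition(a, lo, hi, &red, &blue);
                rough_sort(a, lo, red);     /* recurse on one segment ... */
                lo = blue;                  /* ... and loop on the other  */
            }
        }

        void insertion_sort(int a[], size_t n)
        {
            for (size_t i = 1; i < n; i++) {
                int x = a[i];
                size_t j = i;
                while (j > 0 && a[j - 1] > x) {
                    a[j] = a[j - 1];        /* shift larger items rightwards */
                    j--;
                }
                a[j] = x;
            }
        }

        void sort(int a[], size_t n)
        {
            rough_sort(a, 0, n);            /* nearly sorted, cheaply */
            insertion_sort(a, n);           /* no item is now far from its
                                               resting place, so this is cheap */
        }

[To keep the recursion depth down in bad cases, one would recurse on the shorter segment and loop on the longer; as noted above, full details get messy.]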

If a cast-iron guarantee of O (N log N) behaviour is more important than minimising the expected time taken, then you need to use a different method. Heapsort and mergesort are two methods to be considered; they are discussed elsewhere, along with some other refinements and methods. If your browser understands Java, there is a Web page which demonstrates lots of methods in real-time, so that you can see how they work in practice.

Copyright © Dr A. N. Walker, 1996, 2016.