G12FCO -- Formal Computation


Problem Class 2(a) -- Finding the median

Problem 1: Given an array of N items, find the median item -- i.e. the one that would be in the middle if the array were sorted. Assume that N is odd; otherwise the problem becomes harder to solve, but for uninteresting reasons. Write pseudo-code to do this; i.e., as usual, don't bother about details of syntax and declarations, just get the algorithm sorted. Estimate the number of operations needed by your algorithm.

One way to find the median of an array of items would be first to sort them; then you can easily pick out the middle one. But this is doing too much work. Think of the Dutch national flag problem; once the array is partitioned, we know whether the median item is red, white or blue, so only one of the colours needs further attention. Which one? Well, clearly the reds if they constitute more than half the array, the blues likewise, and otherwise the median is `the' white item [as these are all equal]. You will find it a good idea to generalise the problem from finding the median to finding the kth out of N items. Which element will you choose as pivot?
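The idea above can be sketched in Python (this is one illustrative solution, not the official answer to Problem 1; the function names and the choice of the middle element as pivot are my own assumptions):

```python
def select(a, k):
    """Return the k-th smallest element of list a (k counts from 0)."""
    pivot = a[len(a) // 2]                   # a simple pivot choice
    reds   = [x for x in a if x < pivot]     # the `red' part
    whites = [x for x in a if x == pivot]    # the `white' part
    blues  = [x for x in a if x > pivot]     # the `blue' part
    if k < len(reds):
        return select(reds, k)               # the answer is among the reds
    if k < len(reds) + len(whites):
        return pivot                         # it's `the' white item
    return select(blues, k - len(reds) - len(whites))

def median(a):
    return select(a, len(a) // 2)            # assumes len(a) is odd
```

Note how generalising to the kth item makes the recursion go through: after partitioning, only one of the three parts needs to be searched, with k adjusted accordingly.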

Problem Class 2(b) -- Yet More Sorting

Although Quicksort is (with a number of refinements) the sorting method of choice, it has a poor worst-case behaviour. Here are two methods with good worst-case behaviour; but, inevitably, with other problems.

Mergesort

If you have two sorted arrays -- say, of people -- you can merge them into one sorted array of N items in no more than N-1 comparisons; essentially, you compare the two people at the front, and the earlier one can be taken off [and replaced by the person behind], until one of the arrays becomes empty. Mergesort is the recursive extension of this idea:
	To mergesort elements i to j of an array:
		if j > i,
			mid = (i+j)/2,
			mergesort elements i to mid,
			mergesort elements mid+1 to j,
			merge the two parts.
The problem with mergesort is the storage needed. You can't easily merge two arrays into one using the same space, so you have to merge into a new array. This is, in most cases, no longer a serious problem, as large memories have become cheap.
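The pseudo-code above can be fleshed out in Python as follows (a sketch, not the answer to Problem 2 -- and note it merges into new lists rather than in place, exactly the storage cost just discussed):

```python
def merge(left, right):
    """Merge two sorted lists; at most len(left)+len(right)-1 comparisons."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:              # compare the two at the front
            out.append(left[i]); i += 1      # take off the earlier one
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])                     # one list is now empty;
    out.extend(right[j:])                    # copy the rest without comparing
    return out

def mergesort(a):
    if len(a) <= 1:                          # i.e. j > i fails: already sorted
        return a
    mid = len(a) // 2
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))
```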

If you have the storage, then this is a pretty efficient method. For example, to sort 1024 items, you first sort the two halves of 512 items each, then do no more than 1023 comparisons to merge them. To sort two halves of 512 items each, you first sort the four quarters of 256 items each, then do no more than 2x511 = 1022 comparisons to merge them into the two halves. ... To sort 512 pairs of items, you first sort the 1024 single items [which requires no work], then in 512x1 = 512 comparisons merge them into pairs. So the total number of comparisons is no more than

1023+1022+1020+1016+...+512 = 9217 = 10x1024 - 1023.
Here, the 10 is lg 1024. Generalising is left as an exercise, but it's easy to see that the method is always O(N log N).
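The arithmetic above is easily checked: at level k there are 2^k merges, each costing at most 2^(10-k) - 1 comparisons, for k = 0, ..., 9.

```python
# Each level k contributes 2**k * (2**(10-k) - 1) = 1024 - 2**k comparisons.
total = sum(2**k * (2**(10 - k) - 1) for k in range(10))
print(total)                # 9217
print(10 * 1024 - 1023)     # 9217 again: 10x1024 minus (1+2+...+512)
```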

Problem 2: Write pseudo-code to implement mergesort [in rather more detail than is given above!].

Heapsort

A heap is a partly-sorted array. There are several ways to think of it, but the simplest is perhaps to think of A[2n] and A[2n+1] as being the `children' of A[n]. We don't know which is the older child, but we do know who the oldest person in the entire array is -- it's A[1]. So the proposed algorithm is in two parts:
	[phase 1:] Partially-sort the array until it is a heap.
	[phase 2:] Repeatedly pull off the first element and then
			restore the heap property.
Phase 1 can be accomplished by essentially our technique for `insertion'; assume A[1] to A[j-1] form a heap, where j can be 2 initially, then add A[j] to it. To add A[j], repeatedly compare it with its parent (in A[j/2]), stopping when j reaches 1 or when the parent is older, and otherwise swapping child with parent and halving j.

Phase 2 is a little harder. Removing the first element creates a gap, which must be filled. Well, the trick is to swap the first element with the last member of the heap [which is then reduced by one element]; this puts the wrong element at the top, so this must be swapped down in a sort-of inverse of phase 1 until the heap is restored. Specifically, you swap the `wrong' element with its elder `child' [requiring two comparisons per level: one to find the elder child, one against that child] until either you run off the end of the heap or you find the element to be older than both its children.
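Both phases can be sketched in Python (again illustrative, not the answer to Problem 3; a dummy slot is used so the indexing matches the 1-based A[1], A[2n], A[2n+1] scheme above, and `older' means `larger'):

```python
def heapsort(a):
    """Sort a list ascending, via the two phases described above."""
    A = [None] + list(a)          # A[1..N]; A[0] is a dummy for 1-based indexing
    N = len(a)

    # Phase 1: grow the heap one element at a time, sifting A[j] up
    # past any younger parents (children of A[n] live at A[2n], A[2n+1]).
    for j in range(2, N + 1):
        while j > 1 and A[j] > A[j // 2]:
            A[j], A[j // 2] = A[j // 2], A[j]
            j //= 2

    # Phase 2: swap the oldest (A[1]) to the back, shrink the heap by one,
    # then sift the displaced element down past its elder children.
    for end in range(N, 1, -1):
        A[1], A[end] = A[end], A[1]
        j, heap = 1, end - 1
        while 2 * j <= heap:
            c = 2 * j                              # left child
            if c + 1 <= heap and A[c + 1] > A[c]:  # pick the elder child
                c += 1
            if A[j] >= A[c]:                       # older than both children?
                break
            A[j], A[c] = A[c], A[j]
            j = c
    return A[1:]
```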

How much work is done? Well, in phase 1, j is halved every time round the inner loop, and in phase 2 it is doubled every time. So the number of comparisons and swaps is O(log N) for each element in each phase, and so the total work is guaranteed to be no worse than O(N log N).

Snags? Well, it's a little complicated. It's also rather counter-intuitive; to finish with the youngest person at the front, the heap has to have the oldest person at the front, ready to be swapped to the back. This means in turn that sorting an array that is already sorted is no easier than sorting a random array.

Problem 3: Write pseudo-code to implement heapsort. [You won't have time to do this in the problem class, but I commend it as a programming/algorithms exercise. If you don't fully understand how it works, try it by hand. One way is to take about ten cards, shuffle them, lay them out in order and get heaping. Another way is with people. Set them out in a sort-of rugger scrum, one person at the front, two in the next row, four in the next, and so on. Each person must hang on to their `parent'; set up the heap property by swapping children and parents as suggested above, etc.]



Copyright © Dr A. N. Walker, 1996, 2016.