The median is the item that would be in position k = (N+1)/2 if the array were sorted. The suggestion was to use the DNF algorithm to partition the array into `small', `medium' and `large' items, then we know whether the median is `red', `white' or `blue' from the positions of the markers. So, typically there is a range from lo to hi of items within which the median must be found [initially the whole array], and we use DNF to refine this range, using A[k] as the pivot. Thus:
[You were asked for a sketch of this code, so anything that gives the essence would have sufficed.] Note that we must partition using a copy of A[k], even though k doesn't change, because the swapping of DNF means that the items of A will move under our feet. As soon as the median turns `white', we can exit -- indeed, this is really the only way out, and the get-out when hi-lo <= 1 is merely a [rather modest and unnecessary] efficiency hack.lo = 1, hi = N while hi - lo > 1 do pivot = A[k], red = lo, white = lo, blue = hi [now do DNF, where "A[i] is red" means "A[i] < pivot", etc.] [after this, A[lo], ..., A[red-1] are "red", A[red], ..., A[white-1] are "white", and A[white], ..., A[hi] are "blue"] if red > k # median is red -- partition lo to red then hi = red elseif white <= k # median is blue -- partition white to hi then lo = white else lo = k, hi = k # median is white -- finished! endif endwhile return A[k]
How many operations does this algorithm need? In the worst case, it has the same problem as Quicksort -- that an unlucky choice of pivot makes the partitioning very unequal. Specifically, if at each stage the kth item of the array happens to be the largest [or smallest], then only one item is stripped away each time, and we have to do O(N) passes of DNF on parts of the array that are themselves O(N), and the total work is O(N2). But the expected performance is much better. If the pivot is a `typical' element, then roughly half the items will be `red', and half `blue', and only one of these parts will be further processed. So we expect to do a DNF pass on all N items, then a second pass on O(N/2), a third on O(N/4), and so on, total work O(N + N/2 + N/4 + ...), or O(2N).
This is not the most efficient way to find medians; but efficient ways are surprisingly complicated. Ways are known whose worst [and expected!] behaviour is O(N); many books on algorithms do this as an example.
Problem 2: -- Mergesort.
As described on the problem page, the real problem here is to manage the storage. The `merge' phase is much easier if you can merge into a temporarily unused space. We `create' this by first copying the items to be sorted into a new array, then merge the two halves of the new array after sorting them. Thus:
As discussed on the problem page, this method is guaranteed to be O(N log N).mergesort (A, i, j) # to sort A[i], ..., A[j]: if j > i+1 then n = j+1-i dim B[n] # create space for p = 1 to n do B[p] = A[i-1+p] endfor mid = n/2 # sort the two bits: mergesort (B, 1, mid) mergesort (B, mid+1, n) # now merge back into the space in A: p = i, q = 1, r = mid+1 while q <= mid and r <= n do if B[q] < B[r] then A[p] = B[q], p = p+1, q=q+1 else A[p] = B[r], p = p+1, r=r+1 endif # copy back "tails" [only one will now be non-empty] while q <= mid do A[p] = B[q], p = p+1, q=q+1 endwhile while r <= n do A[p] = B[r], p = p+1, r=r+1 endwhile endif # otherwise, nothing to do!
Problem 3: -- Heapsort.
[For the method, its rationale, and its efficiency, see (again) the problem page.]
for k = 2 to N do # items 1 ... k-1 have the heap property; we add item k j = k while j > 1 and A[j] < A[j/2] do swap (j, j/2), j = j/2 endwhile # "swap(p,q)" means interchange A[p] with A[q]. endfor # heap set up. From now, A[1] is always the largest (remaining) # item -- we pick it off, and swap it to the end. Then we have # to swap the new A[1] to its rightful place in the heap. for k = N to 2 by -1 do swap (1, k) j = 1 loop most = j if 2×j < k and A[2×j] > A[most] then most = 2×j endif if 2×j+1 < k and A[2×j+1] > A[most] then most = 2×j+1 endif if most = j then exitloop endif swap (most, j), j = most endloop endfor