G12FCO -- Formal Computation

Problem Class 2 -- Median, and More Sorting -- solutions

Problem 1: -- Finding the median.

The median is the item that would be in position k = (N+1)/2 if the array were sorted. The suggestion was to use the DNF algorithm to partition the array into `small', `medium' and `large' items, then we know whether the median is `red', `white' or `blue' from the positions of the markers. So, typically there is a range from lo to hi of items within which the median must be found [initially the whole array], and we use DNF to refine this range, using A[k] as the pivot. Thus:

lo = 1, hi = N
while hi - lo > 1 do
   pivot = A[k], red = lo, white = lo, blue = hi
   [now do DNF, where "A[i] is red" means "A[i] < pivot", etc.]
   [after this, A[lo], ..., A[red-1] are "red", A[red], ..., A[white-1]
      are "white", and A[white], ..., A[hi] are "blue"]
   if red > k			# median is red -- partition lo to red
     then hi = red
     elseif white <= k		# median is blue -- partition white to hi
     then lo = white
     else lo = k, hi = k	# median is white -- finished!
   endif
endwhile
return A[k]

[You were asked for a sketch of this code, so anything that gives the essence would have sufficed.] Note that we must partition using a copy of A[k], even though k doesn't change, because the swapping of DNF means that the items of A will move under our feet. As soon as the median turns `white', we can exit -- indeed, this is really the only way out, and the get-out when hi-lo <= 1 is merely a [rather modest and unnecessary] efficiency hack.

How many operations does this algorithm need? In the worst case, it has the same problem as Quicksort -- that an unlucky choice of pivot makes the partitioning very unequal. Specifically, if at each stage the kth item of the array happens to be the largest [or smallest], then only one item is stripped away each time, and we have to do O(N) passes of DNF on parts of the array that are themselves O(N), and the total work is O(N²). But the expected performance is much better. If the pivot is a `typical' element, then roughly half the items will be `red', and half `blue', and only one of these parts will be further processed. So we expect to do a DNF pass on all N items, then a second pass on O(N/2), a third on O(N/4), and so on, total work O(N + N/2 + N/4 + ...), or O(2N).

This is not the most efficient way to find medians; but efficient ways are surprisingly complicated. Ways are known whose worst [and expected!] behaviour is O(N); many books on algorithms do this as an example.

Problem 2: -- Mergesort.

As described on the problem page, the real problem here is to manage the storage. The `merge' phase is much easier if you can merge into a temporarily unused space. We `create' this by first copying the items to be sorted into a new array, then merge the two halves of the new array after sorting them. Thus:

mergesort (A, i, j)
	# to sort A[i], ..., A[j]:
   if j > i+1 then
      n = j+1-i
      dim B[n]	# create space
      for p = 1 to n do B[p] = A[i-1+p] endfor
      mid = n/2
      	# sort the two bits:
      mergesort (B, 1, mid)
      mergesort (B, mid+1, n)
		# now merge back into the space in A:
      p = i, q = 1, r = mid+1
      while q <= mid and r <= n do
         if B[q] < B[r]
           then A[p] = B[q], p = p+1, q=q+1
           else A[p] = B[r], p = p+1, r=r+1
         endif
		# copy back "tails" [only one will now be non-empty]
      while q <= mid do A[p] = B[q], p = p+1, q=q+1 endwhile
      while r <= n do A[p] = B[r], p = p+1, r=r+1 endwhile
   endif	# otherwise, nothing to do!

As discussed on the problem page, this method is guaranteed to be O(N log N).

Problem 3: -- Heapsort.

[For the method, its rationale, and its efficiency, see (again) the problem page.]

for k = 2 to N do
	# items 1 ... k-1 have the heap property;  we add item k
   j = k
   while j > 1 and A[j] < A[j/2] do swap (j, j/2), j = j/2 endwhile
		# "swap(p,q)" means interchange A[p] with A[q].
endfor
	# heap set up.  From now, A[1] is always the largest (remaining)
	# item -- we pick it off, and swap it to the end.  Then we have
	# to swap the new A[1] to its rightful place in the heap.
for k = N to 2 by -1 do
   swap (1, k)
   j = 1
   loop
      most = j
      if 2×j < k and A[2×j] > A[most] then most = 2×j endif
      if 2×j+1 < k and A[2×j+1] > A[most] then most = 2×j+1 endif
	if most = j then exitloop endif
      swap (most, j), j = most
   endloop
endfor

top of page

G12FCO home page. border=0

Previous discussion. border=0

Next discussion. border=0

Problem class.