One way to find the median of an array of items would be first to sort them, then you can easily pick out the middle one. But this is doing too much work. Think of Dutch national flag; once the array is partitioned, we know whether the median item is red, white or blue, so only one of the colours needs further sorting. Which one? Well, clearly the reds if they constitute more than half the array, the blues likewise, and otherwise the median is `the' white item [as these are all equal]. You will find it a good idea to generalise the problem from finding the median to finding the kth out of N items. Which element will you choose as pivot?
To mergesort elements i to j of an array: if j > i, mid = (i+j)/2, mergesort elements i to mid, mergesort elements mid+1 to j, merge the two parts.The problem with mergesort is the storage needed. You can't easily merge two arrays into one using the same space, so you have to merge into a new array. This is, in most cases, no longer a serious problem, as large memories become cheap.
If you have the storage, then this is a pretty efficient method. For example, to sort 1024 items, you first sort the two halves of 512 items each, then do no more than 1023 comparisons to merge them. To sort two halves of 512 items each, you first sort the four quarters of 256 items each, then do no more than 2x511 = 1022 comparisons to merge them into the two halves. ... To sort 512 pairs of items, you first sort the 1024 single items [which requires no work], then in 512x1 = 512 comparisons merge them into pairs. So the total number of comparisons is no more than
1023+1022+1020+1016+...+512 = 9217 = 10x1024 - 1023.Here, the 10 is lg 1024. Generalising is left as an exercise, but it's easy to see that the method is always O(N log N).
Problem 2: Write pseudo-code to implement mergesort [in rather more detail than is given above!].
[phase 1:] Partially-sort the array until it is a heap. [phase 2:] Repeatedly pull off the first element and then restore the heap property.Phase 1 can be accomplished by essentially our technique for `insertion'; assume A[1] to A[j-1] form a heap, where j can be 2 initially, then add A[j] to it. To add A[j], repeatedly compare it with its parent (in A[j/2]), stopping when j reaches 1 or if the parent is older, and otherwise swapping child with parent and halving j.
Phase 2 is a little harder. Removing the first element creates a gap, which must be filled. Well, the trick is to swap the first element with the last member of the heap [which is then reduced by one element]; this puts the wrong element at the top, so this must be swapped down in a sort-of inverse of phase 1 until the heap is restored. Specifically, you swap the `wrong' element with its elder `child' [requires two comparisons] until either you run off the end of the heap or you find the element to be older than both its children.
How much work is done? Well, in phase 1, j is halved every time round the inner loop, and in phase 2 it is doubled every time. So the number of comparisons and swaps is O(log N) for each element in each phase, and so the total work is guaranteed to be no worse than O(N log N).
Snags? Well, it's a little complicated. It's also rather counter-intuitive; to finish with the youngest person at the front, the heap has to have the oldest person at the front, ready to be swapped to the back. This means in turn that sorting an array that is already sorted is no easier than sorting a random array.
Problem 3: Write pseudo-code to implement heapsort. [You won't have time to do this in the problem class, but I commend it as a programming/algorithms exercise. If you don't fully understand how it works, try it by hand. One way is to take about ten cards, shuffle them, lay them out in order and get heaping. Another way is with people. Set them out in a sort-of rugger scrum, one person at the front, two in the next row, four in the next, and so on. Each person must hang on to their `parent'; set up the heap property by swapping children and parents as suggested above, etc.]