G12FCO -- Formal Computation, solutions 1996

G12FCO -- Formal Computation

Solutions to exam, 1996

Warning -- These are the model solutions exactly as supplied to the external examiner, and they do not take into account any changes to questions or mark schemes made as a consequence of his response. So they are not guaranteed to correspond exactly to the paper as set, and they were not written with student consumption in mind. Typos and other bungles also possible!

Let P_k be the proposition that first and second are the indices of the largest and second-largest respectively of A₁, A₂, ..., A_k. We initially establish P₂; then maintain P_k as a loop-invariant, so that on termination [guaranteed, as there as no uncounted loops or subroutine calls], the truth of P_n guarantees that second is the correct answer.
[10 marks for sensible discussion of correctness/termination along these lines, including clear correlation with the code; partial credit for partial discussions.]
```
		int first, second, k;
		if (n < 2)
			error ("n too small");	     /* bonus! */
		if (A[1] >= A[2])
			first = 1, second = 2;
		else
			first = 2, second = 1;	     /* P2 established */
		for (k = 3; k <= n; k++)	     /* P(k-1) established */
			if (A[k] > A[second])
				if (A[k] > A[first])
					second = first, first = k;
				else
					second = k;  /* Pk established */
		return second;			     /* Pn established */
```
[10 marks for clear and correct algorithm; 6 for implementation, with trivial `typos' ignored and 2 marks deducted for each other error.]
The loop is entered n-2 times, doing one comparison each time and a second comparison whenever second is updated; there is one comparison outside the loop, so the number of comparisons is (a) in the best case, n-1, for example when the objects are initially sorted largest-first, and (b) in the worst case, 2n-3, for example when the objects are initially sorted smallest-first.
[2 marks for each. Total mark: 30]
Comments: Unseen example, of similar difficulty to other coding exercises discussed during the module. Students' preferred language may [well] be more like QBasic than C! Note: it is certainly possible to find second largest in n-2 + ceiling(lg n) comparisons -- run a Wimbledon-style KO tourney, then run a small KO tournament among those who lost to the winner.
Suppose that after making k comparisons, there are p_k of the initial orderings still compatible with the outcomes of the comparisons; by assumption, all of these orderings are equally likely. Now suppose we compare two of the objects. Since the objects all differ, there will be some number of orderings, say q, in which the `first' object is greater than the `second' and p_k-q in which it is less. In the `best' case, q = p_k-q, it follows that p_k+1 = q = p_k/2; for any other value of q, since the actual ordering is more likely to be in the larger partition [by assumption], the expected value of p_k+1 is greater than this. Since p₀ = n!, we deduce that p_k is, in the best case 2^-kn!, and in any other case its expected value is greater than this. But for as long as p_k>1, the objects are not [known to be] sorted; we deduce that the expected number of comparisons [and a fortiori the total number of operation] is at least the smallest value of k which makes 2^-kn! <= 1, that is, the ceiling of log₂n!. By Stirling's approximation, n! is approximately (n/e)ⁿ, so log₂n! is approximately n log n - n, or O(n log n), as required.
[Bookwork; 10 marks]
Quicksort: The objects are `partitioned' [for example, using the DNF algorithm] into those less than, equal to, or greater than some `pivot' element; each partition is then sorted using quicksort recursively. Best case: the partitions are of equal size every time, so that the maximum partition `depth' is O(log n), each object is compared with the pivot once at each depth, and so the work done is O(n log n). Worst case: one partition is empty each time, so only the pivot is removed from the set of objects [it turned out to be the largest or smallest object], and the total work done is clearly O(n²). Average case: each partition is a `reasonable' fraction of the whole, so the maximum partition `depth' is again O(log n) [though with a larger constant than in the best case], and the total work O(n log n).
[All bookwork; 2 marks for description and each case, total 8]
Insertion sort: Assume, for a loop invariant, that the first k objects have been sorted, where initially k = 0. To sort the first k+1, find by the `binary chop' algorithm where in the first k the (k+1)th should go, and create space for it by moving up a place all the objects [perhaps none] between that position and the (k+1)st. Best case: no moves are needed, when the objects are already in the right order, so only the binary chops are needed, total work O(log 1 + log 2 + ... + log n), or O(n log n). Worst case: the next object always sorts to the front, when the objects are initially anti-sorted, and the work needed is dominated by having to move every already-sorted object at every stage, total work O(1 + 2 + ... + n), or O(n²). Average case: the next abject sorts to the middle, so on average half the objects move at each stage, and the total work done is half that for the worst case, so still O(n²).
[All bookwork; 2 marks for description and each case, total 8]
To mitigate the worst-case behaviour of `quicksort', we need to take more care over choice of pivot -- a typical technique would be to choose the median of three pseudo-randomly chosen objects. Also, the recursion and other book-keeping of `quicksort' become inefficient when there are few objects, so another idea is to stop `quicksorting' when the partitions become small and instead to do a `clean-up' pass over the objects with a different sorting technique designed to be efficient with a nearly-sorted array, such as insertion sort.
[Bookwork; 4 marks. Total of 30 marks available for the question.]
No detailed solution supplied! [20 marks available for factual information, 10 for literary merit, total 30.]
- Clearly IS is in NP [2 marks], for an oracle could suggest an appropriate set of nodes [1 mark], and we could verify in polynomial time [2 marks] that no two of these nodes were connected [1 mark].
- Consider an arbitrary instance of 3SAT [3 marks], and suppose it has n clauses. Construct from it the graph suggested in the hint [3 marks], which is clearly possible in polynomial time [for each triangle can be drawn in constant time, and each literal can be joined to its negation in time of order n, so the total work needed is of order n² [3 marks]. Now ask whether this graph has an independent set of size n [2 marks]. If so, then it must clearly include one vertex of each triangle, and the original instance of 3SAT was satisfiable by setting to true the literals corresponding to those vertices -- no contradiction can arise, for no literal can be in the independent set together with its negation, by construction [3 marks]. Conversely, if the instance of 3SAT was satisfiable, then we can choose a vertex of each triangle corresponding to a true literal in a satisfying assugnment, and the set of such vertices constitutes an independent set of size n [3 marks]. So the instances of 3SAT and IS have the same decisions [1 mark].
- So, an arbitrary instance of 3SAT can be transformed in polynomial time [2 marks] into an instance of IS [2 marks]. But 3SAT is NP-complete [1 mark]; hence so must be IS [1 mark], QED.
[Total mark 30. Unseen problem of a familiar type -- slightly easier than those done for coursework and slightly harder than those tackled in problem classes.]

Assume that in a typical intermediate state the tape looks somewhat like [with `b' representing a blank square]

...

where the binary number to the left of the central blank is the partially constructed result, while to the right the 1's are the remaining unary number and the 0's are to be ignored. Initially, there is nothing to the left, there are no 0's to the right, and the read/write head is over the leftmost 1. We use the following states:

If your browser numbers the first state as 1, it is ignoring the "value" attribute -- renumber the states as 0, ..., 8 instead of 1, ..., 9, else it doesn't make sense. Grrr!

[start] Scan right, looking for a 1 [state 1]. On reaching a blank in this state, enter state 2 [tidy up and halt].
Scan right, looking for a blank [state 3].
Scan left, looking for a blank [state 8]. Overwrite 0's with blanks.
Scan left; an even number of 1's have been seen. Overwrite 1 with 0 [state 4]. On reaching the blank, enter state 5.
Scan left; an odd number of 1's have been seen. On reading 1, enter state 3; on reaching the blank, enter state 6.
Unary number was even; scan left, overwrite the blank with a 0 and enter state 7.
Unary number was odd; scan left, overwrite the blank with a 1, and enter state 7.
Digit written; scan right looking for the blank [state 0].
Halt.

So we get the following FSM [blanks `can't happen']:

8 [halt]
	next state			write symbol			direction
current state \ read symbol	b	0	1	b	0	1	b	0	1
0 [start]	2	0	1	b	0	1	L	R	R
1	3	1	1	b	0	1	L	R	R
2	8	2		b	b		L	L
3	5	3	4	b	0	0	L	L	L
4	6	4	3	b	0	1	L	L	L
5	7	5	5	0	0	1	R	L	L
6	7	6	6	1	0	1	R	L	L
7	0	7	7	b	0	1	R	R	R

[Marks: 10 for general design and structure, pro rata for faulty attempts, trivia ignored in this section; 3 for description and 2 for sensible design of tape layout; 10 for detailed description of TM, either as matrix or as state transition diagram, 2 marks deducted for each missing or seriously incorrect state, but trivia ignored; 5 for accuracy, 1 deducted for each error; total 30. Unseen, but TMs of similar difficulty seen during module.]

1. Computable. root pi can be composed as exp ([log {4 arctan 1}]/2), so is computable, so given n we can find its nth digit and hence determine d_n [4 marks].
2. Computable. The truth of the proposition `n is prime' for a given n can be computed by testing n for divisibility by each integer less than n [or more efficiently!], hence we can determine d_n [4 marks].
3. Not computable. A machine that could determine d_n for arbitrary given n could thereby solve the `blank-tape halting problem', and hence `the' halting problem [5 marks].
4. Computable. The number is 0.111...1000... for some number, possibly zero, of 1's, and is therefore rational [6 marks]. [We know that some TMs halt; if we didn't know this, then the number could instead be 0.111..., but that too is rational.]
5. Not computable. Given an instance of the halting problem for a TM with two symbols, say X and Y, we could construct a new TM in which each halt state was replaced by a state in which the symbol 1 was written. A machine that could determine d_n for the n that corresponds to the newly-constructed TM would determine the halting problem for the original TM; since that TM was an arbitrary binary TM, such a machine could solve the `blank-tape halting problem' [6 marks].
6. Computable. After a number of ticks equal to the number of states of the given TM [easily determined from its position in the given enumeration], the TM must either halt, or write a non-blank symbol, or repeat a state with the tape still blank; in the last case, it will clearly continue to read blanks, and will therefore repeat this cycle of states indefinitely. So a simple emulation for a computable number of cycles suffices to determine d_n [5 marks].
[Each of these is a minor twist on an example discussed in lectures. Total available marks: 30]