G12FCO -- Formal Computation

Lecture 3 -- Pseudo-Algorithms


We talked about the three properties [determinism, termination and correctness] of proper algorithms. What happens if one, or more, of these properties fails? The result is a pseudo-algorithm. Pseudo-algorithms can be extremely useful, but they are usually much harder to analyse.

Non-determinism

There are three areas of relevance here: randomness, choice and ambiguity.

Ambiguity is an accidental choice; if a computer program talks about a+b+c, then in many (not all) computer languages it is undefined whether a and b are added first, with the result added to c, or b and c are added first, with the result added to a. Often this choice won't matter; but the order affects, for example, rounding errors, and problems where these are critical will have to be described in a way, or in a language, such that the order is determined. Most computer languages have many such corners. A common one is that the parameters of a procedure, or the operands of an operator, may be evaluated in any order; so even just a+b is ambiguous if a and b are, for example, calls to functions which may have side-effects such as printing some information. If you become a professional programmer, then it will be your responsibility to understand your chosen language well enough to know when such ambiguities occur, and when they matter.
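The rounding-error point is easy to demonstrate. Here is a small Python illustration (Python itself fixes left-to-right evaluation, so the two groupings are forced with parentheses; in a language where the grouping is undefined, either result could come out):

```python
# Floating-point addition is not associative, so if a language leaves
# the grouping of a+b+c undefined, the answer depends on the choice made.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # one possible grouping
right = a + (b + c)   # the other

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

The discrepancy here is tiny; the worry is problems where such discrepancies are amplified, as discussed later under incorrectness.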

Choice is common in informally-expressed algorithms: "Choose one of the vertices and ...". For a proper algorithm, or for a computer program, you will have to say how the choice is made: "Start with the first vertex and ...". This is easy for finite sets, rather hard for infinite sets ["Choose a positive real number and ...."], which doesn't stop it happening all over the place in pure mathematics ["Choose delta such that ...."], but that's another story. There are two problems with making this choice. Firstly, it's over-particularising the algorithm -- people who use it may not know whether choosing the first vertex is important, and they may have good reasons for making a different choice in a particular application. Secondly, particular choices may work badly in particular cases; one example is that if you are trying to sort some numbers, and they happen to be already sorted, then the first element will be the smallest, which is often a bad choice in sorting algorithms. So while your rival has an algorithm which says almost instantaneously "This array is already sorted!", your algorithm is painfully slow in exactly the most trivial case. On the other hand, if you do not make a definite choice, then firstly you may have a harder job showing that your algorithm always works (for it will have to work for all possible choices), and secondly, if I get different results from you, we may not know whether it's because one of us has made a mistake or because there is some subtle reason why my choice being different from yours can affect the answer.
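The sorting example can be made concrete. Below is a hypothetical sketch (not from the lecture) of a quicksort that always "chooses the first element" as its pivot, with a counter added so we can see the damage; on an already-sorted array of n elements it makes the full n(n-1)/2 comparisons, the worst possible:

```python
def quicksort(a, count):
    """Quicksort that always chooses the first element as pivot.
    count is a one-element list used to tally comparisons."""
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    count[0] += len(rest)  # every remaining element is compared with the pivot
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left, count) + [pivot] + quicksort(right, count)

count = [0]
result = quicksort(list(range(100)), count)  # already-sorted input: the worst case
print(count[0])                              # 4950, i.e. 100*99/2 comparisons
```

A different choice of pivot (the middle element, or a random one) avoids this particular trap -- which is exactly the connection between choice and randomness discussed next.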

The use of random numbers is common in simulation programs, and also as a way of implementing choice [as just discussed]. For example, a chess program may select at random from a number of standard opening lines while being otherwise completely deterministic. Computer programs `never' use truly random numbers; even if this were possible, it would make testing very difficult. Instead `pseudo-random' numbers are used; typically, these generate a 32-bit integer as a more-or-less complicated function of the previously generated number, the details being beyond the scope of this module. This process has to be `seeded'; often by using the current time or some other unpredictable number as the starting value. While testing, a known seed can be used, so that the same sequence of random numbers is always generated, making bugs easier to track down. Programs that use random numbers are often very hard to analyse, as the best that can be said is that some parameter has such-and-such a distribution.
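The value of a known seed is easily seen with Python's standard pseudo-random generator (the seed value 12345 here is arbitrary):

```python
import random

# Seeding with a known value makes the `random' sequence reproducible,
# which is exactly what you want while testing.
random.seed(12345)
first_run = [random.randint(0, 99) for _ in range(5)]

random.seed(12345)   # re-seed with the same value...
second_run = [random.randint(0, 99) for _ in range(5)]

print(first_run == second_run)  # True: same seed, same sequence
```

In production, the seed would instead come from the clock or some other unpredictable source, so that each run differs.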

Non-termination

Many useful programs are designed never to stop, at least in normal use. Many computers stop only when the power is switched off; and you might expect the computers that control a nuclear power station to keep going, at least while the station closes down, even if the power is switched off. Similarly, the computer programs that control space probes are designed not to stop even if serious errors occur; they put themselves into `safe' modes in which they can be re-programmed by remote commands from base.

Nevertheless, if you expect your program to produce some results, then you would like a guarantee that `eventually' those results will be forthcoming. Here there is a snag. There is a very general result, which we will look at later in the module, that tells us that you can't expect to look at an arbitrary program and be able to decide whether that program will terminate. This is the famous Halting Problem. So, if you want to be able to decide this (or any of many other interesting properties of your programs), your algorithm has to be designed in such a way that the decision is possible.

You may have been told that using the goto command in programming is a Bad Idea -- I prefer simply never to tell students that the command exists! The reason behind this is that goto commands break up the loop structure of your program, making it very hard to analyse it and make decisions about its correctness or termination. If you understand your program very well, then you may be able to decide despite the gotos; but if you are good enough and experienced enough to do that, then you will find that you don't need the gotos anyway.

If you can't show that a loop terminates, then you can't rely on its post-condition either. If your program says while x <> 0 ..., then all you can say about the code after the loop is that either x is zero when we reach it, or we never get to it at all.
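A minimal sketch of this point, in Python (the function and its starting values are invented for illustration):

```python
def countdown(x):
    # For even, non-negative x the loop terminates: x strictly decreases
    # towards zero. For odd x it skips past zero and runs forever.
    while x != 0:
        x -= 2
    # Post-condition: *if* we reach this line at all, then x == 0.
    assert x == 0
    return x

print(countdown(10))   # 0
# countdown(7) would never return: x goes 7, 5, 3, 1, -1, -3, ...
```

The assertion after the loop is true whenever it is executed; the whole difficulty is showing that it is executed.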

Programs that use random numbers may have problems in this area. For example, one way of doing something with probability 1/3 is to toss a fair coin. If it comes down heads, then we don't do whatever-it-was. If it comes down tails, then we toss again; if the second toss is heads then we do do it, and otherwise we start the whole process again. Clearly, if we always get tails, then this process never terminates; but the probability is only 2^-n that we will need more than n tosses, so as a matter of practice it works very efficiently, and we expect to use only two tosses.
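The coin-tossing scheme above can be written out directly; here is a sketch using Python's (pseudo-random, seeded) coin, so the run is reproducible:

```python
import random

def one_third(coin):
    """Return True with probability 1/3, using only fair coin tosses.
    coin() returns 'H' or 'T' with equal probability."""
    while True:
        if coin() == 'H':
            return False   # first toss heads: don't do whatever-it-was
        if coin() == 'H':
            return True    # tails then heads: do do it
        # tails, tails: start the whole process again

random.seed(1)
coin = lambda: random.choice('HT')
trials = 100_000
hits = sum(one_third(coin) for _ in range(trials))
print(hits / trials)   # close to 1/3
```

In principle a run of tails keeps this looping forever; in practice the chance of needing more than n tosses is 2^-n, and on average two tosses suffice.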

Incorrectness

How can an incorrect program ever be useful? It depends what the error is, and how well-controlled it is. We can start with real numbers; symbolic packages such as Maple have made some progress towards working exactly with irrational numbers, but most computer programs and calculators simply work to some number of significant figures. Mostly this works perfectly well, but it is easy to construct examples where the tiny errors made at every stage are amplified until the results are complete nonsense. [Mathematicians, in particular, should understand that constant amplification leads to exponential growth which, even from tiny beginnings, soon swamps `reasonable' numbers.]
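A classic illustration of that amplification (not from the lecture, but standard in numerical analysis): the integrals I_n of x^n/(x+5) over [0,1] satisfy the exact recurrence I_n = 1/n - 5*I_{n-1}, with I_0 = ln(6/5). Each step multiplies the inherited error by -5, so an initial rounding error of about 10^-16 grows exponentially:

```python
import math

# True values of I_n are positive, decreasing, and squeezed between
# 1/(6(n+1)) and 1/(5(n+1)) -- so I_30 is less than 0.0065.
I = math.log(6/5)      # I_0, correct to about 1e-16
for n in range(1, 31):
    I = 1/n - 5*I      # exact recurrence; each step multiplies the error by -5

print(I)               # wildly wrong: |I| is in the thousands
```

Thirty steps of a mathematically exact formula, started from a value correct to sixteen figures, produce complete nonsense -- exactly the constant-amplification-gives-exponential-growth effect described above.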

Similarly, calculators do not even try to evaluate, for example, sines. Instead, they evaluate a polynomial [usually, incidentally, a Chebyshev-style approximation, not the Taylor series, which has poor numerical convergence properties] which agrees with the sine function to within the accuracy required by the calculator.
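To see the idea, here is a sketch using the truncated Taylor series for sine (a real calculator would use a minimax, Chebyshev-style fit, which spreads the error more evenly, but the principle is the same):

```python
import math

def sin_poly(x):
    # Degree-7 Taylor polynomial for sin(x); the neglected terms bound
    # the error by |x|^9/9! on the interval used below.
    return x - x**3/6 + x**5/120 - x**7/5040

# Worst disagreement with the library sine over [-pi/2, pi/2]:
xs = [-math.pi/2 + math.pi*k/1000 for k in range(1001)]
worst = max(abs(sin_poly(x) - math.sin(x)) for x in xs)
print(worst)   # a few parts in 10^4
```

Add a couple more terms and the polynomial agrees with sine to more figures than a calculator displays -- which is why the `incorrectness' is harmless in practice.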

More interestingly, many pseudo-algorithms are attempting to reproduce limiting processes of mathematics. For example, to solve an equation, you might want to use the well-known Newton-Raphson process. This says, roughly, that if you plug an approximate root into the Newton-Raphson formula, then you will get out a better approximation, if you are lucky, and that if you repeat this process, starting with the better approximation, then the results will converge to the root. But in principle this convergence takes forever; so in real life we write our program so that it stops after 50 cycles, or when two approximations differ by less than 0.000001, or when the absolute function value is less than 0.000001, or whatever. This usually works well; but sometimes it fails, and we have to remember that we have not found the root, but the output from 50 cycles of the formula. Analysing all this properly can be hard!
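A sketch of such a stopped process, for the particular (invented-for-illustration) problem of solving x^2 = 2 by Newton-Raphson:

```python
def newton_sqrt2(x0, tol=1e-6, max_iter=50):
    """Approximate a root of f(x) = x**2 - 2 by Newton-Raphson.
    Stops after max_iter cycles, or when successive approximations
    differ by less than tol -- so what it returns is the output of
    the stopped process, not the exact root."""
    x = x0
    for _ in range(max_iter):
        x_next = x - (x*x - 2) / (2*x)   # the Newton-Raphson step
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x           # gave up after max_iter cycles

print(newton_sqrt2(1.0))   # about 1.4142135..., i.e. sqrt(2)
```

From a good starting point this converges in a handful of cycles; from a bad one, or for a nastier equation, the 50-cycle cut-off may fire and the returned value may be nowhere near a root -- and nothing in the output tells you which case you are in.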

Summary: Pseudo-algorithms arise very commonly, and they are not to be sneered at. But analysing their behaviour is typically much harder than analysing true algorithms. This is a very `wordy' topic; sorry! Next time we'll look at some more practical algorithms.



Copyright © Dr A. N. Walker, 1996, 2016.