A number of fallacious `proofs' that P = NP have depended on violating this. Typically, they use a horrendously complicated transformation to turn a problem such as HC into an instance of a much simpler problem -- the solution of simultaneous linear equations is a favourite -- and `hence HC is in P', forgetting that the transformation was intractable. Beware!

Given two problems A and B, if there is a polynomial-time transformation that converts an arbitrary instance of A into an instance of B which has the same yes/no outcome, we write A transf B.
[Several notations for this are used in the literature, but most of
them don't travel well onto Web pages!]
In some sense, the existence of this transformation shows that B
is at least as hard as A, though as argued above this is not to
be taken literally as meaning that it necessarily takes longer
for a given size of input.
This relationship is reflexive [A transf A] and transitive [A transf B and B transf C implies that A transf C], but not symmetric [A transf B does not imply that B transf A]. So, it defines a partial ordering [strictly, a pre-ordering, since two different problems can each transform to the other] on problems, by which some problems are harder than others. Are there any hardest problems?
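Transitivity is just composition of transformations: if one polynomial-time function turns A-instances into B-instances with the same outcome, and another turns B-instances into C-instances likewise, then applying one after the other witnesses A transf C. A toy Python sketch [the three problems and all names here are invented purely for illustration]:

```python
# Toy decision problems, purely for illustration:
#   A: is n even?   B: is n divisible by 4?   C: is n divisible by 8?
def a_outcome(n: int) -> bool: return n % 2 == 0
def b_outcome(n: int) -> bool: return n % 4 == 0
def c_outcome(n: int) -> bool: return n % 8 == 0

# A transf B: doubling preserves the yes/no outcome
# [n is even iff 2n is divisible by 4], and runs in polynomial time.
def a_to_b(n: int) -> int: return 2 * n

# B transf C: likewise, m is divisible by 4 iff 2m is divisible by 8.
def b_to_c(m: int) -> int: return 2 * m

# Transitivity: the composition of the two polynomial-time
# transformations is itself a polynomial-time transformation A to C.
def a_to_c(n: int) -> int: return b_to_c(a_to_b(n))

for n in range(100):
    assert a_outcome(n) == b_outcome(a_to_b(n)) == c_outcome(a_to_c(n))
```

The composition of two polynomials is a polynomial, so the composite transformation is still polynomial-time; that is all transitivity needs.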
A problem C is said to be NP-hard if A transf C for every problem A in NP. If the problem C is itself also in NP, then it is said to be NP-complete [and again similarly for other classes of problems]. An NP-complete problem [if any such animal exists] is a hardest-possible problem in NP.
At first sight, this is an extraordinary notion. What could be meant by a generalisation of all the possible problems in NP? There is too much variety -- networking problems, timetabling problems, packing problems, combinatorial mathematics, arithmetic, matrices, geometry, you name it, every branch of mathematics has its NP problems.
But there is a common thread, and we can at least easily conceive of
an NP-hard problem.
Every problem in NP has a pseudo-algorithm, presumably intended to run on a computer, and thus expressible as a computer program in some language.
We'll formalise some of this [much] later on, but informally we can
ask the decision problem COMP:
`If we run this program with that data on such-and-such a computer, will the output be `yes'?'.
Every problem in NP then generates an instance of COMP.
Since for a given problem D the program and computer can be fixed,
and the data only has to be copied, the transformation from D to
COMP can certainly be achieved in polynomial time, and so
D transf COMP.
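The transformation D transf COMP is almost embarrassingly simple, as this Python sketch suggests [the strings and names here are placeholders, not a real program or machine description]:

```python
# A COMP instance asks: will this program, run with this data on this
# computer, output `yes'?  We represent an instance as a triple.
PROGRAM_FOR_D = "source code of the pseudo-algorithm for D"  # fixed once D is fixed
COMPUTER = "description of the fixed computer"               # also fixed

def d_to_comp(instance: str) -> tuple:
    # The program and the computer are constants; only the data has to
    # be copied, so the transformation runs in linear [hence certainly
    # polynomial] time in the size of the instance.
    return (PROGRAM_FOR_D, COMPUTER, instance)
```

The work is all in the constant parts; the instance itself is passed through untouched.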
Sadly, COMP isn't in NP [how hard it is, we'll see in
due course].
So, we have an NP-hard problem, but not an NP-complete problem.
For that, we need to look at:
the problem of deciding whether a logical formula F is satisfiable: that is, whether some assignment of truth values to its variables makes F true. For example,

F = ((a and not b) or (b and c)) and not c [... true if a is true, b is false and c is false], or

F = (a or b) and (a or not b) and not a [... which is always false, so not satisfiable].

The `satisfiability' decision problem, SAT, is this problem for a slightly restricted form of F. Specifically, F must have the form A and B and C and ..., where each of A, B, ... contains only ors and nots of the individual variables. Any F can be put into this form, though not necessarily in polynomial time.
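A brute-force check of the two example formulas can be sketched in Python [trying all 2^n truth assignments, which is of course exponential -- whether this can be done in polynomial time is exactly the P = NP question]:

```python
from itertools import product

def satisfying_assignment(formula, variables):
    """Brute force: try every assignment of truth values to the
    variables; return the first one that makes the formula true,
    or None if the formula is unsatisfiable."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(**assignment):
            return assignment
    return None

# F = ((a and not b) or (b and c)) and not c  -- satisfiable
f1 = lambda a, b, c: ((a and not b) or (b and c)) and not c
# F = (a or b) and (a or not b) and not a     -- always false
f2 = lambda a, b: (a or b) and (a or not b) and not a

print(satisfying_assignment(f1, ["a", "b", "c"]))
# {'a': True, 'b': False, 'c': False}
print(satisfying_assignment(f2, ["a", "b"]))
# None
```

Note that the search confirms the bracketed claims above: the first formula is true exactly when a is true and b and c are false, and the second has no satisfying assignment at all.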
Why is SAT important? Because of:
Theorem [Stephen Cook, 1971]: SAT is NP-complete.

The proof of Cook's theorem is decidedly messy -- not difficult, just messy -- and it will only be sketched here. For full details, consult, for example, the book by Garey and Johnson mentioned in the introduction.
Suppose, then, that D is any problem in NP, whose pseudo-algorithm runs in at most P(N) steps on an input of size N, for some polynomial P. For each step, we set up a dirty great big array of logical variables: one for each possible state of the computer, one for each bit of the input (converted to binary), and one for each bit of the oracle (whose size need not be more than P(N)).
The computer is fixed, so the size of this array is no worse
than linear in P(N).
Now, for each possible state of the computer (which is fixed,
so there are only a fixed, though rather large, number of possible
configurations for all its possible bits), the electronics of the
computer determine, as a function of this possible state, what the
next state, for the next step, will be.
So if, e.g., at time 1234 the machine is in state
5678, then we may know that at time 1235 it will be in state 9876;
we can enforce that by a clause in F of the form
(not state12345678 or state12359876) [the clausal form of the implication `state12345678 implies state12359876'].
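Such implications can be encoded mechanically. A minimal Python sketch, using the common convention that a variable is named by a positive integer and its negation by the corresponding negative integer:

```python
def implication_clause(x: int, y: int) -> list:
    """Encode `x implies y' as the single clause (not x or y)."""
    return [-x, y]

def clause_true(clause, assignment):
    # assignment maps variable number -> bool; a clause is an `or'
    # of literals, so it is true if any literal is true.
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

# The clause is violated in exactly one case: x true but y false.
c = implication_clause(1, 2)
for x in (False, True):
    for y in (False, True):
        assert clause_true(c, {1: x, 2: y}) == ((not x) or y)
```

One such clause per known transition of the machine suffices, since each clause rules out exactly the one forbidden combination.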
Input can be dealt with in a similar way.
Then you want to say things like: at time 1234, the computer must be in exactly one state.
This is done by clauses of form
(state12340000 or state12340001 or ...)
and (not state12340000 or not
state12340001) and ..., repeated over all possible
pairs of states.
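Generating these `exactly one' clauses is routine, if tedious; a Python sketch [with small integers standing in for variables such as state12340000, state12340001, ...]:

```python
from itertools import combinations, product

def exactly_one(variables):
    """CNF clauses forcing exactly one of `variables' to be true:
    one big or-clause [at least one], plus a (not x or not y)
    clause for every pair [at most one]."""
    clauses = [list(variables)]                                   # at least one
    clauses += [[-x, -y] for x, y in combinations(variables, 2)]  # at most one
    return clauses

def satisfies(clauses, assignment):
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

states = [1, 2, 3, 4]   # stand-ins for the possible states at one time step
cnf = exactly_one(states)

# Brute force over all 16 assignments: exactly the four `one state
# true' assignments satisfy the clauses.
count = sum(satisfies(cnf, dict(zip(states, vals)))
            for vals in product([False, True], repeat=len(states)))
print(count)
# 4
```

For n states this produces 1 + n(n-1)/2 clauses -- quadratic in n, but since the computer is fixed, n is a [large] constant.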
I said it was messy!
All of this has to be repeated over each possible time step.
Also, you want clauses to say what the initial states of the
computer and of the input are.
Finally, you want to pick out the `yes' computations, so you pick the state in which the computer prints out `yes', and or together this state over all possible times.
Phew!
But note that the real mess is largely fixed, depending only on
the computer;
it just gets repeated P(N) times.
There is a further mess of order P(N) at each step depending
on the state of the oracle, and of order N depending on
the input.
But it is easy to see that the complete transformation consists of
a formula F whose construction is elementary and whose size
is polynomial in N.
The resulting formula is essentially an emulation of COMP
for P(N) steps, and F is satisfiable if and only if
the emulated program would have printed `yes'.
We could have used a time-limited version of COMP instead of SAT in Cook's theorem. But what we next want to do is to use the NP-completeness of SAT to discover lots more NP-complete problems, and SAT, with its nice simple logical structure, is much more suitable for this purpose than COMP; so we would simply have deferred the mess, as we would next have had to transform COMP to SAT.
Although Cook's theorem shows how to transform any problem in NP to SAT, many specific problems can be transformed much more succinctly. We'll see an example later.