Diagonalization. Slides based on S. Arora, B. Barak. Computational Complexity: A Modern Approach. Ahto Buldas Ahto.Buldas@ut.ee
Background
One basic goal in complexity theory is to separate interesting complexity classes. To separate C1 and C2 (i.e., to show C1 ≠ C2), we need to exhibit a machine (the language of which is) in C1 that gives a different answer on some input from every machine in C2. Diagonalization is the only general technique known for constructing such a machine. We have already seen diagonalization in Section 1.4, where it was used to show the existence of uncomputable functions.
In this lecture, we use diagonalization to prove hierarchy theorems, according to which giving Turing machines more computational resources (such as time, space, and non-determinism) allows them to solve a strictly larger class of problems. We will also show that if P ≠ NP then there exist problems in NP that are neither in P nor NP-complete. It was realized in the 1970s, though, that diagonalization alone may not resolve P versus NP. Interestingly, the limits of diagonalization are themselves proved using diagonalization.
Basic facts about universal simulation
We know that Turing machines can be efficiently represented by strings, such that:
- Every string α ∈ {0,1}* represents some Turing machine.
- Every Turing machine is represented by infinitely many strings.
- Given a string α, a universal Turing machine U can simulate the machine M_α represented by α with a small (i.e., at most logarithmic) overhead: there is a function F(m) = O(m log m) such that for every α and every x ∈ {0,1}*, the running time of U(α, x) is at most F(T_α(x)), where T_α(x) is the running time of M_α(x).
Notational remark: For a fixed bijection ϕ: N → {0,1}* and for i ∈ N, we will also use the notation M_i for the machine represented by the string α = ϕ(i). One way to construct such a bijection is to define ϕ(i) as the binary expansion of i + 1 with the leading 1-bit ignored.
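The bijection from the notational remark is easy to make concrete. A minimal sketch in Python (the function names `phi`/`phi_inv` are ours, not from the slides):

```python
def phi(i: int) -> str:
    """Binary expansion of i + 1 with the leading 1-bit dropped."""
    return bin(i + 1)[3:]  # bin() yields '0b1...', so skip '0b' and the leading 1


def phi_inv(s: str) -> int:
    """Inverse map: restore the leading 1-bit and subtract 1."""
    return int("1" + s, 2) - 1


# The enumeration starts: '', '0', '1', '00', '01', '10', '11', '000', ...
print([phi(i) for i in range(8)])
for i in range(1000):
    assert phi_inv(phi(i)) == i  # round trip confirms it is a bijection
```

Every binary string occurs exactly once, so the map pairs each i ∈ N with one string in {0,1}*.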
Time Hierarchy Theorem
Theorem 3.1. For every pair of time-constructible functions f and g with f(n) log f(n) = o(g(n)) we have DTIME(f(n)) ⊊ DTIME(g(n)).
We will prove a lighter version of the theorem:
Theorem 3.1 (lighter version). For every ε > 0 we have DTIME(n) ⊊ DTIME(n^{1+ε}).
Proof of the time hierarchy theorem (lighter version)
For a fixed ε > 0, define a diagonal function D_{1,ε}: {0,1}* → {0,1} as follows:
D_{1,ε}(x) = 0 if M_x(x) outputs 1 within |x|^{1+ε/2} steps;
D_{1,ε}(x) = 1 otherwise (i.e., if M_x(x) outputs something else or does not stop within this time).
Due to the efficiency of universal simulation, D_{1,ε}(x) with x ∈ {0,1}^m can be computed in time
O(m^{1+ε/2} log(m^{1+ε/2})) = O((1 + ε/2) · m^{1+ε/2} log m) = o(m^{1+ε}).
So the language L_D of D_{1,ε} is in DTIME(n^{1+ε}). On the other hand, L_D cannot be in DTIME(n), because this would lead to a contradiction.
Suppose L_D ∈ DTIME(n). Then there is an infinite set A ⊆ {0,1}* such that for every α ∈ A, M_α is a machine with running time t(n) = O(n) that decides L_D, i.e., M_α(x) = D_{1,ε}(x) for every x ∈ {0,1}* and every α ∈ A. On the other hand, because t(n) = O(n), there exists n_0 such that t(n) ≤ n^{1+ε/2} for every n ≥ n_0, and hence for every x with |x| ≥ n_0 and every α ∈ A, the machine M_α(x) stops within |x|^{1+ε/2} steps. Taking α ∈ A with |α| ≥ n_0, we have that M_α(α) stops within |α|^{1+ε/2} steps, and hence, by the definition of D_{1,ε},
M_α(α) = D_{1,ε}(α) ≠ M_α(α).
A contradiction.
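The diagonal trick itself can be illustrated with a finite toy: a short list of Python predicates stands in for the enumeration of machines, and the diagonal function flips each machine's answer on that machine's own encoding. This is only a finite analogue of the argument, with an encoding scheme (`encode`) of our own choosing:

```python
# Toy stand-ins for M_0, M_1, M_2: simple predicates on bit strings.
machines = [
    lambda x: True,               # accepts everything
    lambda x: len(x) % 2 == 0,    # accepts even-length strings
    lambda x: x.startswith("1"),  # accepts strings starting with 1
]


def encode(i: int) -> str:
    """Hypothetical stand-in for the string representing machine i (unary here)."""
    return "1" * i


def D(x: str) -> bool:
    """The diagonal function: flip machine i's answer on its own encoding."""
    for i in range(len(machines)):
        if x == encode(i):
            return not machines[i](x)
    return False  # answer on non-encodings is irrelevant to the argument


# D differs from every listed machine on at least one input:
for i, M in enumerate(machines):
    assert D(encode(i)) != M(encode(i))
```

In the real proof the list is the (infinite) effective enumeration of all linear-time machines, and the time bound |x|^{1+ε/2} is what guarantees D can actually compute the flipped answers.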
Space Hierarchy Theorem
Completely analogous to the time hierarchy theorem. Space-constructible functions f: N → N are functions for which there is a machine that, given any n-bit input, computes f(n) in space O(f(n)).
Theorem 3.2: For every space-constructible function f: N → N, there exists a language L that is decidable in space O(f(n)) but not in space g(n) = o(f(n)), i.e., SPACE(g(n)) ⊊ SPACE(f(n)).
The proof is completely analogous to that of Theorem 3.1. The theorem does not have the logarithmic factor because the universal machine for space-bounded computation incurs only a constant-factor overhead in space.
Proof of the Space Hierarchy Theorem
Define
L := { (⟨M⟩, 10^k) : M(⟨M⟩, 10^k) does not output 1 using space f(|(⟨M⟩, 10^k)|) }.
For any M that decides a language in space o(f(n)), L will differ in at least one spot (at (⟨M⟩, 10^k) for a suitable k) from L_M. L is decided as follows:
1. On an input x, compute f(|x|) using space-constructibility, and mark off f(|x|) cells of tape. Whenever an attempt is made to use more than f(|x|) cells, reject.
2. If x is not of the form (⟨M⟩, 10^k) for some TM M, reject.
3. Simulate M(x) for at most 2^{f(|x|)} steps (using f(|x|) space). If M tries to use more than f(|x|) space or more than 2^{f(|x|)} steps, reject.
4. If M(x) outputs 1 during this simulation, reject; otherwise, accept.
Note on step 3: Execution is limited to 2^{f(|x|)} steps in order to avoid the case where M does not halt on the input x, i.e., the case where M uses only O(f(|x|)) space as required but runs for infinite time.
Nondeterministic Time Hierarchy Theorem
Theorem 3.3: If f, g are time-constructible functions satisfying f(n + 1) = o(g(n)), then NTIME(f(n)) ⊊ NTIME(g(n)).
The proof uses the observation that universal simulation of non-deterministic machines (with a non-deterministic universal Turing machine) can be done in O(T) time, where T is the running time of the simulated machine. We will prove a weaker version: NTIME(n) ⊊ NTIME(n^{1.5}).
Problems with the proof and lazy diagonalization
The technique from the previous section does not directly apply, since the diagonalizer has to determine the answer of a TM in order to flip it. To determine the answer of a non-deterministic machine that runs in O(n) time, we may need to examine as many as 2^{Ω(n)} possible strings of non-deterministic choices. So it is unclear how the diagonal machine can determine in O(n^{1.5}), or even O(n^{100}), time how to flip this answer. Instead we introduce a technique called lazy diagonalization, which is only guaranteed to flip the answer on some input in a fairly large range.
We define f: N → N as follows: f(1) = 2 and f(i + 1) = 2^{f(i)^{1.2}}. Our D will flip the answer of M_i on some input in {1^n : f(i) < n ≤ f(i + 1)}.
Proof
D (non-deterministic!) is defined as follows: On input x, if x ∉ 1*, reject. If x = 1^n, then compute i such that f(i) < n ≤ f(i + 1) and:
1. If f(i) < n < f(i + 1), simulate M_i on input 1^{n+1} using non-determinism in n^{1.1} time and output its answer. If the simulation takes more than that, halt in the accepting state.
2. If n = f(i + 1), accept 1^n iff M_i rejects 1^{f(i)+1} in (f(i)+1)^{1.1} time.
Part 2 requires going through all 2^{(f(i)+1)^{1.1}} branches of M_i(1^{f(i)+1}), but that is fine since D's input size is f(i + 1) = 2^{f(i)^{1.2}}. We conclude that D runs in less than O(n^{1.5}) time. Let L_D be the language decided by D. We claim that L_D ∉ NTIME(n).
Suppose that L_D is decided by a machine M in cn steps (for some c > 0). Since each machine is represented by infinitely many strings, we can find i large enough such that M = M_i and, on inputs of length n ≥ f(i), M_i can be simulated within n^{1.1} steps. Thus the two steps of D imply:
1. If f(i) < n < f(i + 1), then D(1^n) = M_i(1^{n+1}).
2. D(1^{f(i+1)}) ≠ M_i(1^{f(i)+1}).
By assumption, M_i and D agree on all inputs, including those of the form 1^n for f(i) < n ≤ f(i + 1). Chaining the equalities of (1) along this range implies D(1^{f(i+1)}) = M_i(1^{f(i)+1}), contradicting (2).
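The growth condition behind step 2 can be checked numerically. A small sketch (with ceilings added to keep the values integral, an inessential tweak of ours): the brute-force work 2^{(f(i)+1)^{1.1}} is small compared with the input length f(i + 1) = 2^{f(i)^{1.2}}, since (f(i)+1)^{1.1} < f(i)^{1.2} once f(i) is large enough.

```python
import math


def f(i: int) -> int:
    """f(1) = 2, f(i+1) = 2^(f(i)^1.2); ceilings keep the values integral."""
    v = 2
    for _ in range(i - 1):
        v = 2 ** math.ceil(v ** 1.2)
    return v


# From i = 2 on, the exponent of the brute-force count in step 2 is
# already below the exponent of the input length f(i+1):
for i in (2, 3):
    assert (f(i) + 1) ** 1.1 < math.ceil(f(i) ** 1.2)

print(f(1), f(2), f(3))  # 2 8 8192 -- (iterated-)exponential growth
```

So going through all branches of M_i on the short input 1^{f(i)+1} is cheap relative to the padded input 1^{f(i+1)}, which is the whole point of the padding.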
Ladner's Theorem on NP-intermediate problems
There is a large number of NP-complete problems. This phenomenon suggests a bold conjecture: every problem in NP is either in P or NP-complete. We show that if P ≠ NP then this is false. If P = NP then the conjecture is trivially true but uninteresting.
Theorem 3.4 (Ladner's Theorem): Suppose that P ≠ NP. Then there exists a language A ∈ NP \ P that is not NP-complete.
Proof of Ladner's theorem
Let M_1, M_2, ... be an enumeration of deterministic Turing machines, clocked so that M_i(x) runs in time |x|^i; this enumeration captures all of the languages in P. We also have a similar list f_1, f_2, ... of the polynomial-time computable functions. For every i, we have two sets of requirements to fulfill:
R_i: A ≠ L_{M_i}.
S_i: there exists x such that either x ∈ SAT and f_i(x) ∉ A, or x ∉ SAT and f_i(x) ∈ A.
NB! If all R_i hold, then A ∉ P; if all S_i hold, then no f_i reduces SAT to A, i.e., A is not NP-hard. We also have to show that A ∈ NP.
Let A = {x : x ∈ SAT and f(|x|) is even}. Note that if we make f(n) computable in n^{O(1)} time then A will be in NP. The function f records the current stage of the construction. Intuitively:
In stage 2i, we keep f(n) = 2i for all large enough n until requirement R_i is fulfilled. If R_i is never fulfilled, then A equals L_{M_i} ∈ P and differs from SAT only finitely, so SAT ∈ P, contradicting the assumption P ≠ NP.
In stage 2i + 1, we keep f(n) = 2i + 1 until requirement S_i is fulfilled. If S_i is never fulfilled, then A is finite and SAT reduces to A via f_i, which would put SAT in P, again contradicting P ≠ NP.
In order to make f poly-time computable we use lazy diagonalization: we do not start a new stage until we see that the requirement of the previous stage has been fulfilled on inputs so small that we can test it.
Let f(0) = f(1) = 2. For n ≥ 1 we define f(n + 1) as follows:
If (log n)^{f(n)} ≥ n, let f(n + 1) = f(n). Otherwise, i.e. if (log n)^{f(n)} < n, we have two cases:
f(n) = 2i: Check whether there is an input x with |x| ≤ log n such that either
(a) M_i(x) accepts, and f(|x|) is odd or x is not in SAT; or
(b) M_i(x) rejects, f(|x|) is even, and x is in SAT.
If such an x exists, then f(n + 1) = f(n) + 1; otherwise f(n + 1) = f(n).
f(n) = 2i + 1: Check whether there is an input x with |x| ≤ log n such that either
(a) x is in SAT, and f(|f_i(x)|) is odd or f_i(x) is not in SAT; or
(b) x is not in SAT, f(|f_i(x)|) is even, and f_i(x) is in SAT.
If such an x exists, then f(n + 1) = f(n) + 1; otherwise f(n + 1) = f(n).
Since to compute f(n + 1) we only examine x with |x| ≤ log n, and the running time of M_i(x) is
|x|^i ≤ (log n)^i ≤ (log n)^{f(n)} < n,
the running time of computing f(n + 1) from f(n) is O(2^{|x|} · n) = O(2^{log n} · n) = O(n^2), and hence the total running time for computing f(n + 1) is at most
Σ_{k=1}^{n} c·n^2 = O(n^3) = n^{O(1)};
so we can compute f(n) in time polynomial in n. It is straightforward to check that f(n) does not increase until the corresponding requirement is fulfilled, and that if f(n) remains constant for all large n then we have violated the assumption P ≠ NP.
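The staged definition of f can be sketched as a runnable toy. Everything problem-specific is a stand-in of ours: palindromes replace SAT, two hand-picked deciders replace the enumeration M_1, M_2, ..., two simple maps replace f_1, f_2, ..., and log is taken base 2. The scaffolding (the (log n)^{f(n)} < n gate, the |x| ≤ log n search window, the two kinds of stages) follows the slides.

```python
import itertools
import math


def in_L(x: str) -> bool:
    """Toy stand-in for 'x in SAT': palindromes over {0,1}."""
    return x == x[::-1]


# Toy stand-ins for the enumerations (finite, so f eventually freezes):
machines = [lambda x: False, lambda x: True]   # M_1 rejects all, M_2 accepts all
reductions = [lambda x: x, lambda x: x + "0"]  # f_1 identity, f_2 appends a bit


def strings_up_to(n: int):
    for length in range(n + 1):
        for bits in itertools.product("01", repeat=length):
            yield "".join(bits)


f_cache = {0: 2, 1: 2}


def f(n: int) -> int:
    """Stage counter defining A = {x : x in L and f(|x|) even}."""
    for k in range(2, n + 1):
        if k in f_cache:
            continue
        m, prev = k - 1, f_cache[k - 1]
        val = prev
        if math.log2(m) ** prev < m:            # enough room to run the check
            i = prev // 2                       # current stage is 2i or 2i+1
            bound = int(math.log2(m))
            if prev % 2 == 0 and i <= len(machines):      # stage 2i: test R_i
                M = machines[i - 1]
                for x in strings_up_to(bound):
                    if M(x) != (in_L(x) and f_cache[len(x)] % 2 == 0):
                        val = prev + 1          # M_i disagrees with A somewhere
                        break
            elif prev % 2 == 1 and i <= len(reductions):  # stage 2i+1: test S_i
                g = reductions[i - 1]
                for x in strings_up_to(bound):
                    y = g(x)
                    if in_L(x) != (in_L(y) and f_cache[len(y)] % 2 == 0):
                        val = prev + 1          # f_i fails as a reduction to A
                        break
        f_cache[k] = val
    return f_cache[n]


print(f(2), f(100), f(2000))  # stages advance only when a witness appears
```

Note how lazily f moves: the reject-all machine is refuted immediately (on the empty string), but the gate (log n)^{f(n)} < n then stays closed for a long stretch before the next stage can even be tested.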
Problems conjectured to be NP-intermediate
We do not know of a natural decision problem that, assuming NP ≠ P, is proven to be in NP \ P but not NP-complete, and there are remarkably few candidates for such languages. However, there are a few fascinating examples of languages not known to be either in P or NP-complete. Two such examples are the Factoring and Graph Isomorphism languages.
Limits of diagonalization
For concreteness, let us say that diagonalization is any technique that relies upon the following properties of Turing machines:
I. The existence of an effective representation of Turing machines by strings.
II. The ability of one TM to simulate any other without much overhead in time or space.
Any argument that only uses these facts treats machines as black boxes: the machines' internal workings do not matter. We define a variant of Turing machines, called oracle Turing machines, that still satisfy the two properties. However, relative to some oracles P = NP, whereas relative to others P ≠ NP. We conclude that to resolve P versus NP we need to use some property beyond the two above.
Oracle Turing Machines
Oracle machines are machines that are given access to an oracle that can magically solve the decision problem for some language O ⊆ {0,1}*. The machine has a special oracle tape on which it can write a string q ∈ {0,1}* and in one step get an answer to the question "Is q in O?". This can be repeated arbitrarily often with different queries. If O is a difficult language that cannot be decided in polynomial time, then this oracle gives added power to the TM.
Def. 3.6 (Oracle Turing machines): An oracle Turing machine is a TM M that has a special read/write tape, called M's oracle tape, and three special states q_query, q_yes, q_no. To execute M, we specify, in addition to the input, a language O ⊆ {0,1}* that is used as the oracle for M. Whenever during the execution M enters the state q_query, the machine moves into the state q_yes if q ∈ O and into q_no if q ∉ O, where q denotes the contents of the oracle tape. Note that, regardless of the choice of O, a membership query to O counts only as a single computational step. If M is an oracle machine, O ⊆ {0,1}* a language, and x ∈ {0,1}*, then we denote the output of M on input x with oracle O by M^O(x). Non-deterministic oracle TMs are defined similarly.
Def. 3.7: For every O ⊆ {0,1}*, P^O is the set of languages decided by a polynomial-time deterministic TM with oracle access to O, and NP^O is the set of languages decided by a polynomial-time non-deterministic TM with oracle access to O.
Claim 3.8:
1. The complement of SAT is in P^SAT.
2. If O ∈ P, then P^O = P.
3. If EXPCOM = {(⟨M⟩, x, 1^n) : M(x) outputs 1 within 2^n steps}, then P^EXPCOM = NP^EXPCOM = EXP := ∪_c DTIME(2^{n^c}).
Proof:
1. Given oracle access to SAT, to decide whether a formula ϕ is in the complement of SAT, the machine asks the oracle if ϕ ∈ SAT and outputs the opposite answer.
2. An oracle can only help compute more languages, so P ⊆ P^O. If O ∈ P then it is redundant as an oracle, since we can transform any polynomial-time oracle TM using O into a standard (no-oracle) TM by simply replacing each oracle call with the computation of O. Thus P^O ⊆ P.
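Part 1 of the claim is easy to act out in code. A minimal sketch, with the oracle played by a brute-force SAT solver (the CNF encoding as lists of signed integers is our choice, not from the slides): the machine makes one oracle query and flips the answer.

```python
from itertools import product


def sat_oracle(cnf, n_vars):
    """Brute-force stand-in for the SAT oracle. cnf is a list of clauses;
    each clause is a list of ints (i means variable i, -i its negation)."""
    for assign in product([False, True], repeat=n_vars):
        if all(any(assign[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf):
            return True
    return False


def unsat(cnf, n_vars):
    """Decides the complement of SAT with a single oracle query:
    ask the oracle and output the opposite answer (Claim 3.8, part 1)."""
    return not sat_oracle(cnf, n_vars)


print(unsat([[1], [-1]], 1))  # x AND (NOT x): unsatisfiable -> True
print(unsat([[1, 2]], 2))     # x OR y: satisfiable -> False
```

The point is not the solver (which here takes exponential time) but the query structure: with the oracle's answers counted as single steps, the complement of SAT becomes trivially decidable in polynomial time.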
3. Clearly, an oracle call to EXPCOM allows one to perform an exponential-time computation at the cost of one call, and so EXP ⊆ P^EXPCOM. On the other hand, if M is a non-deterministic polynomial-time oracle TM, we can simulate its execution with an EXPCOM oracle in exponential time: such time suffices both to enumerate all of M's non-deterministic choices and to answer the EXPCOM oracle queries. Thus, EXP ⊆ P^EXPCOM ⊆ NP^EXPCOM ⊆ EXP.
P vs NP cannot be solved by diagonalization
Theorem 3.9 (Baker, Gill, Solovay, 1975): There exist oracles A, B such that P^A = NP^A and P^B ≠ NP^B.
Proof: We can use A = EXPCOM. Now we construct B. For any language B, let U_B be the unary language
U_B = {1^n : some string of length n is in B}.
For every oracle B, the language U_B is clearly in NP^B, since a non-deterministic TM can guess the string x ∈ {0,1}^n and query the oracle whether x ∈ B. Below we construct an oracle B such that U_B ∉ P^B, and hence P^B ≠ NP^B.
Construction of B: We construct B in stages, where stage i ensures that M_i^B does not decide U_B in 2^n/10 time. Initially we let B be empty, and gradually add strings to it. Each stage determines the status (i.e., whether or not they will ultimately be in B) of a finite number of strings.
Stage i: So far, we have declared for a finite number of strings whether or not they are in B. Choose n large enough so that it exceeds the length of any such string, and run M_i(1^n) for 2^n/10 steps. Whenever it queries the oracle about strings whose status has been determined, we answer consistently. When it queries strings whose status is undetermined, we declare that the string is not in B. Note that up to this point, we have not declared any string of length n to be in B. Now we make sure that if M_i halts within 2^n/10 steps then its answer on 1^n is incorrect:
If M_i accepts, we declare that all strings of length n are not in B, thus ensuring 1^n ∉ U_B. If M_i rejects, we pick a string of length n that it has not queried (such a string exists because M_i made at most 2^n/10 < 2^n queries) and declare that it is in B, thus ensuring 1^n ∈ U_B. In either case, the answer of M_i is incorrect. This ensures that U_B is not in P^B (and in fact not in DTIME^B(f(n)) for every f(n) = o(2^n)).
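The stage-by-stage construction can be acted out with toy "oracle machines": Python functions that receive the input length and a query callback. Everything here is a stand-in of ours, step counting is dropped, and each stage simply uses a fresh input length longer than any string decided so far.

```python
import itertools

# Toy oracle machines standing in for M_1, M_2, ...: each gets the input
# length n and a query function, and returns accept/reject.
reject_all = lambda n, query: False
accept_all = lambda n, query: True
probe_zero = lambda n, query: query("0" * n)  # asks about 0^n only
machines = [reject_all, accept_all, probe_zero]

B = set()
n = 1                                   # fresh input length for each stage
for M in machines:
    asked = []

    def query(q, asked=asked):
        asked.append(q)
        return q in B                   # undetermined strings answered "no"

    ans = M(n, query)
    if ans:
        pass                            # declare every length-n string out of B
    else:                               # put one unqueried length-n string in B
        for bits in itertools.product("01", repeat=n):
            s = "".join(bits)
            if s not in asked:
                B.add(s)
                break
    in_U_B = any(len(s) == n for s in B)
    assert ans != in_U_B                # M is wrong about 1^n being in U_B
    n += 2                              # move past every string decided so far
```

Each machine in the list ends up answering 1^n incorrectly, which is exactly the per-stage guarantee of the real construction; there, the 2^n/10 step bound is what ensures an unqueried length-n string always exists.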
What have we learned?
Diagonalization uses the representation of Turing machines as strings to separate complexity classes. We can use it to show that giving a TM more of the same type of resource (time, non-determinism, space) allows it to solve more problems, and to show that, assuming NP ≠ P, NP has problems that are neither in P nor NP-complete.
Results proven solely using diagonalization relativize, in the sense that they hold also for TMs with oracle access to O, for every oracle O ⊆ {0,1}*. We can use this to show the limitations of such methods. In particular, relativizing methods alone cannot resolve the P vs. NP question.