OHJ-2306 Introduction to Theoretical Computer Science, Fall 2012 8.11.2012

The P vs. NP problem is a major unsolved problem in computer science.

- It is one of the seven Millennium Prize Problems selected by the Clay Mathematics Institute to carry a prize of 1,000,000 US dollars for the first correct solution.
- A proof either way would have profound implications for mathematics, cryptography, algorithm research, artificial intelligence, game theory, and multimedia processing.
- In a 2002 poll of 100 researchers:
  - 61 believed the answer to be P $\neq$ NP,
  - 9 believed the answer to be P = NP,
  - 22 were unsure of the answer, and
  - 8 believed the question may be independent of the currently accepted axioms and therefore impossible to prove or disprove.

Reasons to believe P $\neq$ NP

- After decades of studying these problems, no one has been able to find a polynomial-time algorithm for any of the more than 3,000 important known NP-complete problems.
- Furthermore, the result P = NP would imply many other startling results that are currently believed to be false.
- It is also intuitively argued that the existence of problems that are hard to solve but whose solutions are easy to verify matches real-world experience:

  If P=NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in "creative leaps," no fundamental gap between solving a problem and recognizing the solution once it's found. Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss... [Scott Aaronson, MIT]

On the other hand, some researchers believe that there is overconfidence in believing P $\neq$ NP. For example, in 2002 these statements were made:

  The main argument in favor of P $\neq$ NP is the total lack of fundamental progress in the area of exhaustive search. This is, in my opinion, a very weak argument. The space of algorithms is very large and we are only at the beginning of its exploration. [...] The resolution of Fermat's Last Theorem also shows that very simple questions may be settled only by very deep theories. [Moshe Y. Vardi, Rice University]

  Being attached to a speculation is not a good guide to research planning. One should always try both directions of every problem. Prejudice has caused famous mathematicians to fail to solve famous problems whose solution was opposite to their expectations, even though they had developed all the methods required. [Anil Nerode, Cornell University]

Consequences if P = NP

- A proof that P = NP could have stunning practical consequences, if the proof leads to efficient methods for solving some of the important problems in NP.
- A proof may not lead directly to efficient methods, for instance if the proof is non-constructive or the bounding polynomial is too large to be efficient in practice.
- The consequences, both positive and negative, arise because various NP-complete problems are fundamental in many fields.
- Cryptography, for example, relies on certain problems being difficult.
- A constructive and efficient solution to an NP-complete problem such as 3SAT would break most existing cryptosystems, including public-key cryptography, a foundation for many modern security applications such as secure economic transactions over the Internet.
- These would need to be modified or replaced by other solutions.

- On the other hand, there are enormous positive consequences that would follow from rendering tractable many currently mathematically intractable problems.
- For instance, many problems in operations research are NP-complete, such as some types of integer programming and the TSP.
- Efficient solutions to these problems would have enormous implications for logistics.
- Many other important problems, such as some problems in protein structure prediction, are also NP-complete.
- If these problems were efficiently solvable, it could spur considerable advances in biology.

But such changes may pale in significance compared to the revolution an efficient method for solving NP-complete problems would cause in mathematics itself. According to Stephen Cook:

  ...it would transform mathematics by allowing a computer to find a formal proof of any theorem which has a proof of a reasonable length, since formal proofs can easily be recognized in polynomial time. Example problems may well include all of the CMI prize problems.

- Research mathematicians spend their careers trying to prove theorems, and some proofs have taken decades or even centuries to find after the problems were stated; for instance, Fermat's Last Theorem took over three centuries to prove.
- A method that is guaranteed to find a proof of a theorem, should one of a "reasonable" size exist, would essentially end this struggle.

Consequences if P $\neq$ NP

- A proof that P $\neq$ NP would allow one to show in a formal way that many common problems cannot be solved efficiently, so that the attention of researchers can be focused on partial solutions or solutions to other problems.
- Due to widespread belief in P $\neq$ NP, much of this focusing of research has already taken place.
- Also, P $\neq$ NP still leaves open the average-case complexity of hard problems in NP. For example, it is possible that SAT requires exponential time in the worst case, but that almost all randomly selected instances of it are efficiently solvable.

7.4 NP-Completeness

A function $f\colon \Sigma^* \to \Sigma^*$ is polynomial time computable if there exist a Turing machine $M$ and a polynomial $p$ for which $f = f_M$ and $\mathrm{time}_M(n) \le p(n)$ for all $n$.

Let $A \subseteq \Sigma^*$ and $B \subseteq \Sigma^*$ be two formal languages.

Definition 7.29 Language $A$ is polynomial time reducible to language $B$, written $A \le_m^p B$, if a polynomial time computable function $f\colon \Sigma^* \to \Sigma^*$ exists, where for every $x \in \Sigma^*$,

$$x \in A \iff f(x) \in B.$$

Theorem 7.31 (Extended) For all languages $A$, $B$, $C$ it holds that
i. $A \le_m^p A$ (reflexive),
ii. if $A \le_m^p B$ and $B \le_m^p C$, then $A \le_m^p C$ (transitive),
iii. if $A \le_m^p B$ and $B \in$ NP, then $A \in$ NP, and
iv. if $A \le_m^p B$ and $B \in$ P, then $A \in$ P.

Note: as far as mapping reducibility is concerned, this theorem is exactly the same as Lemma J (Theorem 5.22). The difference is the polynomial time computability of the reduction.

Proof.
i. We choose $f(x) = x$ to be the reduction.
ii. Let $f$ be a polynomial time reduction from $A$ to $B$ and $g$ one from $B$ to $C$. The composite function $h(x) = g(f(x))$ is a reduction from $A$ to $C$, $h\colon A \le_m C$ (see Lemma J, Theorem 5.22). $h$ can be computed in polynomial time: Let $M_f$ ($M_g$) be the Turing machine computing function $f$ ($g$) in time bounded by polynomial $p$ ($q$). We can assume that $p$ and $q$ are everywhere nondecreasing. Let $M_g$, $M_{\mathrm{REW}}$, and $M_f$ work as in the proof of Lemma J.

[Figure: the TM $M_h$ computing the composite mapping. The input $x$ is fed to $M_f$, whose output $f(x)$ is passed through $M_{\mathrm{REW}}$ to $M_g$, which outputs $g(f(x)) = h(x)$.]

By combining the TMs as previously, we get a TM $M_h$ that computes the function $h$ and uses the following time on input $x$:

$$\mathrm{time}_{M_f}(x) + \mathrm{time}_{M_{\mathrm{REW}}}(f(x)) + \mathrm{time}_{M_g}(f(x)) \le p(|x|) + 2p(|x|) + q(|f(x)|) \le 3p(|x|) + q(p(|x|)) = O(q(p(|x|))),$$

which is polynomial in the length of $x$. The second inequality holds because $M_f$ cannot write more than $p(|x|)$ symbols, so $|f(x)| \le p(|x|)$, and $q$ is nondecreasing.

iii. (and iv.) By combining the TM $M_f$, which computes the reduction $f\colon A \le_m^p B$ in time bounded by the polynomial $p$, the TM $M_B$, which decides the language $B$ in time bounded by $q$, and $M_{\mathrm{REW}}$ similarly as in the proof of Lemma J, we get the TM $M_A$, which decides the language $A$ in time $O(q(p(|x|)))$. It is deterministic whenever $M_B$ is.
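Parts ii and iii of the proof can be illustrated with ordinary functions standing in for the Turing machines. This is only a sketch: the three toy languages and the names `f`, `g`, `decide_c`, `compose`, and `decide_via` are assumptions made for illustration and do not come from the course material.

```python
from typing import Callable

Reduction = Callable[[str], str]
Decider = Callable[[str], bool]


def compose(f: Reduction, g: Reduction) -> Reduction:
    """Return h(x) = g(f(x)): a reduction from A to C when f: A -> B, g: B -> C."""
    return lambda x: g(f(x))


def decide_via(f: Reduction, decide_b: Decider) -> Decider:
    """Build a decider for A from a reduction f: A -> B and a decider for B."""
    return lambda x: decide_b(f(x))


# Toy languages (hypothetical, for illustration only):
#   A = strings over {a, b} of even length
#   B = unary strings 1^n with n even
#   C = binary numerals of even numbers
def f(w: str) -> str:
    return "1" * len(w)            # w in A  <=>  f(w) in B


def g(u: str) -> str:
    return format(len(u), "b")     # u in B  <=>  g(u) in C


def decide_c(s: str) -> bool:
    return s.endswith("0")         # a binary numeral is even iff its last bit is 0


h = compose(f, g)                  # part ii: a reduction from A to C
decide_a = decide_via(h, decide_c) # part iii/iv: a decider for A
```

Both `f` and `g` run in time linear in their input, so the composite `h` stays polynomial, which mirrors the bound $O(q(p(|x|)))$ in the proof.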

Satisfiability of Boolean formulas, SAT

Given a Boolean formula $\varphi$, which consists of
- Boolean variables $x_1, \dots, x_n$,
- constant values 0 (false) and 1 (true), and
- Boolean operations $\lor$, $\land$, and $\lnot$.

Is $\varphi$ satisfiable? That is, is there an assignment of values 0 and 1 to the variables, $t\colon \{x_1, \dots, x_n\} \to \{0, 1\}$, such that $\varphi(t(x_1), \dots, t(x_n)) = 1$?

Let us guess the assignment $t$ of values for the variables and verify that $\varphi(t) = 1$. If $\varphi$ contains $n$ Boolean variables, then $t$ can be represented as a binary string of $n$ bits, and it can be verified in polynomial time.

- Stephen Cook and Leonid Levin discovered the class of NP-complete problems in the early 1970s.
- The individual complexity of an NP-complete problem is related to that of the entire class NP.
- If a polynomial-time algorithm exists for any of the NP-complete problems, all problems in NP would be polynomial-time solvable.
- NP-complete problems help to study the question of whether P = NP and to recognize difficult practical problems.

Theorem 7.27 (Cook-Levin theorem) SAT $\in$ P $\iff$ P = NP
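The guess-and-verify step for SAT can be sketched as follows. The formula encoding is an assumption made for this example only: a variable is a string, a constant is the integer 0 or 1, and a compound formula is a tuple whose first entry is `"not"`, `"and"`, or `"or"`. The names `evaluate` and `verify` are likewise hypothetical.

```python
from typing import Dict, Sequence


def evaluate(phi, t: Dict[str, bool]) -> bool:
    """Evaluate formula phi under assignment t (variable name -> truth value)."""
    if isinstance(phi, int):           # constants 0 (false) and 1 (true)
        return bool(phi)
    if isinstance(phi, str):           # a Boolean variable
        return t[phi]
    op, *args = phi
    if op == "not":
        return not evaluate(args[0], t)
    if op == "and":
        return all(evaluate(a, t) for a in args)
    if op == "or":
        return any(evaluate(a, t) for a in args)
    raise ValueError(f"unknown operation: {op!r}")


def verify(phi, variables: Sequence[str], certificate: str) -> bool:
    """Check a guessed assignment given as a bit string, one bit per variable."""
    if len(certificate) != len(variables) or set(certificate) - {"0", "1"}:
        return False
    t = {x: bit == "1" for x, bit in zip(variables, certificate)}
    return evaluate(phi, t)


# Example formula: (x1 or not x2) and (x2 or x3)
phi = ("and", ("or", "x1", ("not", "x2")), ("or", "x2", "x3"))
variables = ["x1", "x2", "x3"]
```

Each subformula is visited once, so checking a certificate takes time linear in the size of the formula; the hard part of SAT is finding the $n$-bit string, not checking it.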

The satisfiability problem for many special forms of Boolean formulas is also NP-complete.

A formula $\varphi$ is in conjunctive normal form (cnf) if it is a conjunction of clauses, $\varphi = C_1 \land C_2 \land \dots \land C_m$, where each clause $C_i$ is a disjunction $C_i = \lambda_{i1} \lor \lambda_{i2} \lor \dots \lor \lambda_{ir}$. The terms $\lambda_{ij}$ are literals: Boolean variables or their negations.

CSAT is the satisfiability problem for cnf-formulas:

CSAT = $\{ \varphi \mid \varphi \text{ is a satisfiable cnf-formula} \}$

Obviously, CSAT $\in$ NP. An arbitrary Boolean formula can be converted in polynomial time to a cnf-formula that is satisfiable exactly when the original formula is.

By restricting each clause of a cnf-formula to contain exactly $k$ literals, we get the $k$-conjunctive normal form ($k$-cnf). This gives a family of languages:

$k$SAT = $\{ \varphi \mid \varphi \text{ is a satisfiable } k\text{-cnf-formula} \}$

Language 2SAT belongs to P.

Theorem CSAT $\le_m^p$ 3SAT

Proof. The given cnf-formula $\varphi$ can be converted in polynomial time into a 3-cnf-formula $\varphi'$ that is satisfiable if and only if $\varphi$ is. Let $\varphi = C_1 \land C_2 \land \dots \land C_m$.

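Each clause $C_k = \lambda_1 \lor \lambda_2 \lor \dots \lor \lambda_r$, $r > 3$, is replaced by a 3-cnf-formula

$$C_k' = (\lambda_1 \lor \lambda_2 \lor t_1) \land (\lnot t_1 \lor \lambda_3 \lor t_2) \land \dots \land (\lnot t_{r-3} \lor \lambda_{r-1} \lor \lambda_r),$$

where $t_1, \dots, t_{r-3}$ are new variables. The formula $C_k'$ can clearly be obtained from clause $C_k$ in polynomial time.

We still need to check that the transformation satisfies reducibility: $\varphi \in$ CSAT $\iff \varphi' \in$ 3SAT.

1. $\varphi$ satisfiable $\Rightarrow$ $\varphi'$ satisfiable: For every clause $C_k$, the assignment satisfying $\varphi$ must set $\lambda_i = 1$ for some $\lambda_i \in C_k$. $C_k'$ gets satisfied when we set the values of the literals as in the assignment satisfying $C_k$ and give the new variables the values

$$t_j = \begin{cases} 1, & \text{if } j \le i - 2, \\ 0, & \text{if } j > i - 2. \end{cases}$$

2. $\varphi'$ is satisfiable $\Rightarrow$ $\varphi$ is satisfiable: Every subformula $C_k'$ corresponding to a clause $C_k$ of $\varphi$ must be satisfied. Then either
a) $t_1 = 0$ or $t_{r-3} = 1$, so the first or the last clause of $C_k'$ forces some literal $\lambda_i = 1$, $\lambda_i \in C_k$, and $C_k$ gets satisfied by it, or
b) $t_1 = 1$ and $t_{r-3} = 0$, so for some $i < r - 3$: $t_i = 1 \land t_{i+1} = 0$, and the clause $(\lnot t_i \lor \lambda_{i+2} \lor t_{i+1})$ requires that $\lambda_{i+2} = 1$, and again $C_k$ gets satisfied.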
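The clause-by-clause conversion can be sketched in Python. The encoding is an assumption made for this example: a literal is a nonzero integer, `+i` for $x_i$ and `-i` for its negation, and new variables are numbered after the existing ones. Clauses with at most three literals are padded in the standard way that the proof turns to after this step. The brute-force `satisfiable` helper is exponential and is included only to check tiny instances.

```python
from itertools import product
from typing import List, Tuple

Clause = List[int]   # +i stands for x_i, -i for its negation (assumed encoding)


def to_3cnf(clauses: List[Clause], num_vars: int) -> Tuple[List[Clause], int]:
    """Return an equisatisfiable 3-cnf formula and its number of variables."""
    out: List[Clause] = []
    last = num_vars                       # index of the last variable in use

    def fresh() -> int:
        nonlocal last
        last += 1
        return last

    for c in clauses:
        r = len(c)
        if r == 0:
            raise ValueError("empty clauses are not supported in this sketch")
        if r == 1:
            l, t1, t2 = c[0], fresh(), fresh()
            out += [[l, t1, t2], [l, -t1, t2], [l, t1, -t2], [l, -t1, -t2]]
        elif r == 2:
            t = fresh()
            out += [[c[0], c[1], t], [c[0], c[1], -t]]
        elif r == 3:
            out.append(list(c))
        else:
            ts = [fresh() for _ in range(r - 3)]       # t_1, ..., t_{r-3}
            out.append([c[0], c[1], ts[0]])
            for j in range(1, r - 3):
                out.append([-ts[j - 1], c[j + 1], ts[j]])
            out.append([-ts[-1], c[r - 2], c[r - 1]])
    return out, last


def satisfiable(clauses: List[Clause], num_vars: int) -> bool:
    """Brute-force check over all 2^n assignments (tiny inputs only)."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

A clause with $r > 3$ literals yields $r - 2$ new clauses and $r - 3$ new variables, so the output is linear in the size of the input formula.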

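If $r \le 3$, then
- $C_k = \lambda_1 \lor \lambda_2 \lor \lambda_3 \;\Rightarrow\; C_k' = C_k$,
- $C_k = \lambda_1 \lor \lambda_2 \;\Rightarrow\; C_k' = (\lambda_1 \lor \lambda_2 \lor t) \land (\lambda_1 \lor \lambda_2 \lor \lnot t)$,
- $C_k = \lambda \;\Rightarrow\; C_k' = (\lambda \lor t_1 \lor t_2) \land (\lambda \lor \lnot t_1 \lor t_2) \land (\lambda \lor t_1 \lor \lnot t_2) \land (\lambda \lor \lnot t_1 \lor \lnot t_2)$,

where $t$, $t_1$, and $t_2$ are new variables. The equivalence of satisfiability of the formulas is maintained.

Vertex Cover, VC

Given an undirected graph $G$ and a natural number $k$. Does $G$ contain a subset of $k$ nodes that covers every edge of $G$? A node covers an edge if the edge touches the node.

To represent VC as a formal language we need to encode graphs as strings. Encoding techniques similar to those used with Turing machines apply.

We can guess the given number $k$ of nodes from the given graph $G$, and then verify in time polynomial in the size of the graph that the chosen $k$ nodes cover all edges of $G$.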
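The verification half of this guess-and-verify argument might look as follows. This is a minimal sketch in which nodes are assumed to be strings and edges pairs of strings; the function name `verify_vertex_cover` is hypothetical.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def verify_vertex_cover(nodes: Set[str], edges: Iterable[Edge],
                        k: int, guess: Iterable[str]) -> bool:
    """Check that the guessed nodes number at most k and touch every edge."""
    chosen = set(guess)
    if len(chosen) > k or not chosen <= nodes:
        return False
    return all(u in chosen or v in chosen for u, v in edges)


# Example graph: the path a - b - c - d
nodes = {"a", "b", "c", "d"}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
```

The check makes one pass over the edges with constant-time set lookups, so it runs in time polynomial (here linear) in the size of the graph.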

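Theorem 7.44 3SAT $\le_m^p$ VC

Proof. Let $\varphi = C_1 \land C_2 \land \dots \land C_m$ be a 3-cnf-formula with variables $x_1, \dots, x_n$. The corresponding instance $\langle G, k \rangle$ of vertex cover is as follows:
- $G$ has a node corresponding to each literal $x_i$ and $\lnot x_i$.
- $G$ has 3 nodes $C_j^1, C_j^2, C_j^3$ corresponding to each clause $C_j$ of $\varphi$.
- $G$ has the edges $(x_i, \lnot x_i)$, $(C_j^1, C_j^2)$, $(C_j^2, C_j^3)$, $(C_j^3, C_j^1)$, and, if $C_j = \lambda_1 \lor \lambda_2 \lor \lambda_3$, then $(C_j^1, \lambda_1)$, $(C_j^2, \lambda_2)$, $(C_j^3, \lambda_3)$.
- $k = n + 2m$.

Clearly $G$ can be composed in polynomial time from formula $\varphi$.

[Figure: the graph $G$ for an example formula with two clauses, over the variables $x_1, x_3, x_4$ and $x_1, x_2, x_4$ respectively. It shows the literal nodes $x_1, \lnot x_1, \dots, x_4, \lnot x_4$ and the two clause triangles with corners $C_1^1, C_1^2, C_1^3$ and $C_2^1, C_2^2, C_2^3$.]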

1. $\varphi$ is satisfiable $\Rightarrow$ $G$ has a vertex cover of at most $k = n + 2m$ nodes:
- For each variable, we include in the vertex cover the node representing the literal that obtains value 1 in the satisfying value assignment ($n$ nodes).
- For each clause $C_j$, at least one edge $(C_j^r, \lambda_r)$ leaving the corresponding triangle is now covered.
- We take into the vertex cover the two remaining corners of the triangle (altogether $2m$ nodes).

2. $G$ has a vertex cover of at most $k$ nodes $\Rightarrow$ $\varphi$ is satisfiable:
- Let $V'$, $|V'| \le k$, be a vertex cover of $G$.
- For $V'$ to be able to cover all edges of $G$, it must contain at least one node per variable and at least two nodes from each $C_j$-triangle.
- Hence $|V'| = k = n + 2m$, and $V'$ contains exactly one literal node per variable and exactly two corners of each triangle.
- Let us set

$$t(x_i) = \begin{cases} 1, & \text{if } x_i \in V', \\ 0, & \text{if } x_i \notin V'. \end{cases}$$

- One of the edges starting from the corners of each $C_j$-triangle is covered by a literal node $\lambda \in V'$.
- Then $t(\lambda) = t(C_j) = 1$.
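The construction of Theorem 7.44 and the first direction of its proof can be sketched in Python. The encoding is an assumption made for illustration: a literal is a nonzero integer (`+i` for $x_i$, `-i` for its negation), literal nodes are tuples `("lit", i)`, and the corners of the triangle of clause $j$ are `("corner", j, r)`. The example formula and all function names are hypothetical, and the brute-force search is meant for tiny graphs only.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Clause = Tuple[int, int, int]   # +i stands for x_i, -i for its negation


def reduce_3sat_to_vc(clauses: List[Clause], num_vars: int):
    """Build (nodes, edges, k) from a 3-cnf formula."""
    nodes, edges = set(), set()
    for i in range(1, num_vars + 1):
        nodes |= {("lit", i), ("lit", -i)}
        edges.add((("lit", i), ("lit", -i)))        # edge (x_i, not x_i)
    for j, clause in enumerate(clauses):
        corners = [("corner", j, r) for r in range(3)]
        nodes.update(corners)
        edges.update(combinations(corners, 2))      # the triangle of C_j
        for corner, lit in zip(corners, clause):
            edges.add((corner, ("lit", lit)))       # corner to its literal
    return nodes, edges, num_vars + 2 * len(clauses)


def cover_from_assignment(clauses: List[Clause], t: Dict[int, bool]):
    """Direction 1 of the proof; assumes t satisfies every clause."""
    cover = {("lit", i if val else -i) for i, val in t.items()}
    for j, clause in enumerate(clauses):
        # position of some literal of C_j that is true under t
        sat = next(r for r, lit in enumerate(clause) if t[abs(lit)] == (lit > 0))
        cover |= {("corner", j, r) for r in range(3) if r != sat}
    return cover


def is_vertex_cover(edges, cover) -> bool:
    return all(u in cover or v in cover for u, v in edges)


def has_vertex_cover(nodes, edges, k: int) -> bool:
    """Brute force: a cover of at most k nodes can be padded to exactly k."""
    size = min(k, len(nodes))
    return any(is_vertex_cover(edges, set(c)) for c in combinations(nodes, size))


# Hypothetical example: (x1 or not x3 or not x4) and (not x1 or x2 or not x4)
clauses = [(1, -3, -4), (-1, 2, -4)]
t = {1: True, 2: True, 3: False, 4: False}
nodes, edges, k = reduce_3sat_to_vc(clauses, 4)
cover = cover_from_assignment(clauses, t)
```

The graph has $2n + 3m$ nodes and $n + 6m$ edges, so building it takes time linear in the size of the formula.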

Hence, SAT $\le_m^p$ CSAT $\le_m^p$ 3SAT $\le_m^p$ VC.

Definition 7.34 A language $B$ is NP-complete if it satisfies:
1. $B \in$ NP, and
2. $A \le_m^p B$ for every $A \in$ NP.

An NP-complete language can be decided deterministically in polynomial time if and only if all other languages in NP can also be decided deterministically in polynomial time.

Theorem 7.35 If $B$ is NP-complete and $B \in$ P, then P = NP.

Theorem 7.36 If $B$ is NP-complete and $B \le_m^p C$ for $C \in$ NP, then $C$ is NP-complete.

Proof. Because $B$ is NP-complete, by definition $A \le_m^p B$ for every language $A \in$ NP. On the other hand, $B \le_m^p C$, and by the transitivity of polynomial time reductions (Theorem 7.31) it must hold that $A \le_m^p C$ for all $A \in$ NP. By assumption $C \in$ NP, and the claim holds.

Hence, to show that language $C$ is NP-complete, it suffices to reduce in polynomial time some language $B$ known to be NP-complete to $C$ and, in addition, to verify that $C \in$ NP.

However, this method requires that we first find some initial NP-complete language.

Theorem 7.37 (Cook-Levin) Language SAT = $\{ \varphi \mid \varphi \text{ is a satisfiable Boolean formula} \}$ is NP-complete.

- We need to show that $A \le_m^p$ SAT for any $A \in$ NP.
- All that we know about $A$ is that it has a polynomial time nondeterministic decider $N$.
- The reduction for $A$ takes a string $w$ and produces a Boolean formula $\varphi_w$ that simulates $N$ on input $w$.
- $\varphi_w$ is satisfiable iff $w \in L(N) = A$.
- Each possible computation of $N$ corresponds to one truth value assignment of the variables in $\varphi_w$.
- The formula $\varphi_w$ is composed to express the conditions under which the given assignment corresponds to an accepting computation of $N$.

Corollary 7.42 CSAT, 3SAT, and VC are NP-complete.

Independent Set, IS: Given an undirected graph $G$ and a natural number $k$. Does $G$ contain at least $k$ nodes, no two of which are joined by an edge?

By the following lemma it is easy to compose reductions VC $\le_m^p$ IS and IS $\le_m^p$ CLIQUE.

Lemma Let $G = (V, E)$ be an undirected graph and $V' \subseteq V$. Then the following conditions are equivalent:
1. $V'$ is a vertex cover in $G$,
2. $V \setminus V'$ is an independent set, and
3. $V \setminus V'$ is a clique in the complement graph of $G$: $\overline{G} = (V, (V \times V) \setminus E)$.

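VC $\le_m^p$ IS: Let us choose the mapping $f$: $f(\langle G, k \rangle) = \langle G, |V| - k \rangle$. Clearly this transformation can be computed in polynomial time. Now, by the preceding lemma, $\langle G, k \rangle \in$ VC $\iff \langle G, |V| - k \rangle \in$ IS. Hence, $f$: VC $\le_m^p$ IS.

IS $\le_m^p$ CLIQUE: Let us now choose the mapping $f$: $f(\langle G, k \rangle) = \langle \overline{G}, k \rangle$. This transformation can be computed in polynomial time, and by the preceding lemma, $\langle G, k \rangle \in$ IS $\iff \langle \overline{G}, k \rangle \in$ CLIQUE. Thus, $f$: IS $\le_m^p$ CLIQUE.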
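Both mappings are short enough to write out and check on a small graph. In this sketch a graph is assumed to be a pair of a node set and a set of two-element frozensets, and the complement graph is taken over unordered pairs of distinct nodes (no self-loops). The brute-force deciders exist only to confirm the two equivalences on tiny inputs; all names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Set, Tuple

Graph = Tuple[Set[str], Set[FrozenSet[str]]]


def vc_to_is(graph: Graph, k: int) -> Tuple[Graph, int]:
    """f(<G, k>) = <G, |V| - k>."""
    nodes, _ = graph
    return graph, len(nodes) - k


def is_to_clique(graph: Graph, k: int) -> Tuple[Graph, int]:
    """f(<G, k>) = <complement of G, k>."""
    nodes, edges = graph
    complement = {frozenset(p) for p in combinations(nodes, 2)} - edges
    return (nodes, complement), k


def has_vertex_cover(graph: Graph, k: int) -> bool:
    """Is there a vertex cover of at most k nodes? (brute force)"""
    nodes, edges = graph
    if k < 0:
        return False
    size = min(k, len(nodes))
    return any(all(e & set(c) for e in edges) for c in combinations(nodes, size))


def has_independent_set(graph: Graph, k: int) -> bool:
    """Are there at least k pairwise non-adjacent nodes? (brute force)"""
    nodes, edges = graph
    if k <= 0:
        return True
    return any(all(frozenset(p) not in edges for p in combinations(c, 2))
               for c in combinations(nodes, k))


def has_clique(graph: Graph, k: int) -> bool:
    """Are there at least k pairwise adjacent nodes? (brute force)"""
    nodes, edges = graph
    if k <= 0:
        return True
    return any(all(frozenset(p) in edges for p in combinations(c, 2))
               for c in combinations(nodes, k))


# Example graph: the 4-cycle a - b - c - d - a
cycle: Graph = ({"a", "b", "c", "d"},
                {frozenset(e) for e in [("a", "b"), ("b", "c"),
                                        ("c", "d"), ("d", "a")]})
```

Computing $|V| - k$ is trivial and the complement graph has at most $|V|^2$ edges, so both mappings are polynomial time computable, as the text claims.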