A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length

Journal on Satisfiability, Boolean Modeling and Computation 1 (2005) 49-60

Evgeny Dantsin, Alexander Wolpert
Department of Computer Science, Roosevelt University
430 S. Michigan Av., Chicago, IL 60605, USA
edantsin@roosevelt.edu, awolpert@roosevelt.edu

Abstract

We give a randomized algorithm for testing satisfiability of Boolean formulas in conjunctive normal form with no restriction on clause length. This algorithm uses the clause-shortening approach proposed by Schuler [14]. The running time of the algorithm is $O(2^{n(1 - 1/\alpha)})$ where $\alpha = \ln(m/n) + O(\ln\ln m)$ and $n$, $m$ are respectively the number of variables and the number of clauses in the input formula. This bound is asymptotically better than the previously best known $2^{n(1 - 1/\log(2m))}$ bound for SAT.^1

Keywords: SAT with no restriction on clause length, upper bound, clause shortening

Submitted July 2005; revised November 2005; published November 2005

1. Introduction

During the past few years there has been considerable progress in obtaining upper bounds on the complexity of solving the Boolean satisfiability problem. This line of research has produced new algorithms for k-SAT. They were further used to prove nontrivial upper bounds for SAT (no restriction on clause length).

Upper bounds for k-SAT. The best known upper bounds for k-SAT are based on two approaches: the satisfiability coding lemma [9, 8] and multistart random walk [12, 13]; both give close upper bounds on the time of solving k-SAT. Schöning's randomized algorithm in [12] has the $(2 - 2/k)^n$ bound where $n$ is the number of variables in the input formula. The randomized algorithm in [8] has a slightly better bound (this algorithm is often referred to as the PPSZ algorithm). The multistart-random-walk approach is derandomized using covering codes in [3], which gives the best known $(2 - 2/(k+1))^n$ bound for deterministic k-SAT algorithms. The PPSZ algorithm is derandomized with the same bound for Unique-k-SAT [11].
For small values of k, these bounds are improved: for example, 3-SAT can be solved by a randomized algorithm with the $1.324^n$ bound [7] and by a deterministic algorithm with the $1.473^n$ bound [2].

^1 The results of this paper (without proofs) appeared in the proceedings of SAT 2005 [6].

© 2005 Delft University of Technology and the authors.

Upper bounds for SAT (no restriction on clause length). The first nontrivial upper bound for SAT is given in [10]: the $2^{n(1 - 1/(2\sqrt{n}))}$ bound for a randomized algorithm that uses the PPSZ algorithm as a subroutine. A close bound for a deterministic algorithm is proved in [4]. A much better bound for SAT is the $2^{n(1 - 1/\log(2m))}$ bound, where $m$ is the number of clauses in the input formula. This bound is due to Schuler [14]. His randomized algorithm is based on an approach that can be called clause shortening: the algorithm repeatedly applies a k-SAT subroutine to formulas obtained by shortening input clauses to length k. Schuler's algorithm is derandomized in [5]; the derandomization gives a deterministic algorithm that solves SAT with the same bound.

In this paper we improve the $2^{n(1 - 1/\log(2m))}$ bound: we give a randomized algorithm that solves SAT with the following upper bound on the running time:

$$2^{\,n\left(1 - \frac{1}{\ln(m/n) + O(\ln\ln m)}\right)} \qquad (1)$$

Idea of the algorithm. Our algorithm for SAT is a repetition of a polynomial-time procedure P that tests satisfiability of an input formula F. If F is satisfied by a truth assignment A, the procedure P finds A with probability at least $p$. As usual, repeating P on the input formula $O(1/p)$ times, we can find A with a constant probability.

When describing P, we view clauses as sequences (rather than sets) of literals. Each clause $l_1, l_2, \ldots, l_s$ such that $s > k$ is divided into two segments as follows:

  First segment: $l_1, \ldots, l_k$,
  Second segment: $l_{k+1}, \ldots, l_s$.

We say that the first segment is true under an assignment A if at least one of its literals is true under A; otherwise we say that the first segment is false under A. If A is clear from the context, we omit the words "under A".

Let F consist of clauses $C_1, \ldots, C_m$ and let A be a fixed satisfying assignment to F. The procedure P is based on the following dichotomy:

Case 1. The input formula F has many clauses in which the first segment is false. We call such clauses bad. Suppose that the number of bad clauses is greater than or equal to some $d$.
Then if we choose a clause $C_i$ from $C_1, \ldots, C_m$ at random, the first segment in $C_i$ is false with probability at least $d/m$. Assuming that $C_i$ is bad, we can simplify F by assigning false to all literals occurring in the first segment of $C_i$.

Case 2. The input formula F has few (less than $d$) bad clauses. If we consider a formula made up of the first segments of all input clauses, we get a k-CNF formula in which almost all clauses contain true literals. Only the first segments of bad clauses prevent us from finding A by applying a k-SAT algorithm to this formula. To fix it, we guess all bad clauses and replace their first segments by clauses of length k that contain true literals. More exactly, for each bad clause, we replace its first segment by k literals chosen at random from this clause. The probability that the resulting k-CNF formula is satisfied by A is easily estimated.

The procedure P first assumes that Case 2 holds. To process this case, P invokes a subroutine (denoted S) that transforms F into a k-CNF formula and applies a k-SAT

algorithm. If no satisfying assignment is found, Case 1 is considered to hold. Then P simplifies the input formula and recursively invokes itself on the simplified formula.

The procedure P (and the subroutine S) uses $k$ and $d$ as parameters. Clearly, the success probability of P depends on values of the parameters. What values of $k$ and $d$ maximize the success probability? In Sect. 4 we find values of the parameters ($k \approx \log(m/n) + O(\log\log(m))$ and $d \approx n/\log^3 m$) that give the following lower bound on the success probability:

$$2^{-n\left(1 - \frac{1}{\ln(m/n) + O(\ln\ln m)}\right)}$$

Sect. 2 contains basic definitions and notation. In Sect. 3 we describe the procedure P and the subroutine S. In Sect. 4 we prove a lower bound on the success probability of P. In Sect. 5 we define our algorithm for SAT and prove the claimed upper bound on its running time.

2. Definitions and Notation

We deal with Boolean formulas in conjunctive normal form (CNF). By a variable we mean a Boolean variable that takes truth values t (true) or f (false). A literal is a variable $x$ or its negation $\neg x$. If $l$ is a literal then $\bar{l}$ denotes its complement, i.e. if $l$ is $x$ then $\bar{l}$ denotes $\neg x$, and if $l$ is $\neg x$ then $\bar{l}$ denotes $x$. Similarly, if $v$ denotes one of the truth values t or f, we write $\bar{v}$ to denote its complement.

A clause C is a set of literals such that C contains no complementary literals. A formula F is a set of clauses; $n$ and $m$ denote, respectively, the number of variables and the number of clauses in F. If each clause in F contains at most k literals, we say that F is a k-CNF formula.

An assignment to variables $x_1, \ldots, x_n$ is a mapping from $\{x_1, \ldots, x_n\}$ to $\{t, f\}$. This mapping is extended to literals: each literal $\neg x_i$ is mapped to the complement of the truth value assigned to $x_i$. We say that a clause C is satisfied by an assignment A (or, C is true under A) if A assigns t to at least one literal in C. Otherwise, we say that C is falsified by A (or, C is false under A). The formula F is satisfied by A if every clause in F is satisfied by A. In this case, A is called a satisfying assignment for F.
Let F be a formula and $l_1, \ldots, l_s$ be literals. We write $F[l_1 = f, \ldots, l_s = f]$ to denote the formula obtained from F by assigning the value f to all of $l_1, \ldots, l_s$. This formula is obtained from F as follows: the clauses that contain any literal from $\bar{l}_1, \ldots, \bar{l}_s$ are deleted from F, and the literals $l_1, \ldots, l_s$ are deleted from the other clauses. Note that $F[l_1 = f, \ldots, l_s = f]$ may contain the empty clause or may be the empty formula.

Let $A$ and $A'$ be two assignments that differ only in the values assigned to a literal $l$. Then we say that $A'$ is obtained from $A$ by flipping the value of $l$.

By SAT we mean the following computational problem: Given a formula F in CNF, decide whether F is satisfiable. The k-SAT problem is the restricted version of SAT that allows only clauses consisting of at most k literals.

We write $\log x$ to denote $\log_2 x$.
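The notation of this section can be made concrete with a small sketch. The representation is an assumption made here for illustration and is not part of the paper: a literal is a nonzero integer (`+i` for $x_i$, `-i` for its complement), a clause is a tuple of literals, a formula is a list of clauses, and `assign_false` is a hypothetical helper that computes $F[l_1 = f, \ldots, l_s = f]$.

```python
from typing import Dict, Iterable, List, Tuple

Literal = int                   # assumption: +i encodes x_i, -i encodes its complement
Clause = Tuple[Literal, ...]    # clauses kept as sequences, as in the description of P
Formula = List[Clause]


def clause_satisfied(clause: Clause, assignment: Dict[int, bool]) -> bool:
    """A clause is true under A if A assigns t to at least one of its literals."""
    return any(assignment[abs(l)] == (l > 0) for l in clause)


def formula_satisfied(formula: Formula, assignment: Dict[int, bool]) -> bool:
    """F is satisfied by A if every clause in F is satisfied by A."""
    return all(clause_satisfied(c, assignment) for c in formula)


def assign_false(formula: Formula, literals: Iterable[Literal]) -> Formula:
    """Compute F[l_1 = f, ..., l_s = f].

    Clauses containing the complement of some l_i become true and are deleted;
    the literals l_i themselves are deleted from the remaining clauses.
    """
    false_lits = set(literals)
    result = []
    for clause in formula:
        if any(-l in false_lits for l in clause):
            continue  # clause contains a complement of some l_i, so it is satisfied
        result.append(tuple(l for l in clause if l not in false_lits))
    return result
```

For example, with F = [(1, 2), (-1, 3), (2, -3)], assigning f to the literal 1 deletes the second clause and shortens the first, and assigning f to both 1 and 2 produces the empty clause.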

3. Procedure P and Subroutine S

Both procedures S and P use the parameters $k$ and $d$; their values are determined in the next section.

3.1 Description of S

Suppose that an input formula F has a satisfying assignment A such that F has only few ($< d$) clauses in which the first segment is false under A, i.e. Case 2 of the dichotomy holds. Then the subroutine S finds such an assignment (with the probability estimated in Sect. 4). The subroutine takes two steps:

1. Reduction of F to a k-CNF formula $F'$ such that any satisfying assignment to $F'$ satisfies F.
2. Use of a k-SAT algorithm to find a satisfying assignment to $F'$.

First, S guesses all bad clauses, i.e. clauses in which the first segment is false under A. More exactly, the subroutine guesses a set $\{B_1, \ldots, B_{d-1}\}$ of clauses such that all bad clauses are contained in this set. For each clause $B_i$, the subroutine guesses k literals in $B_i$ such that this guessed set contains a true literal. Then the subroutine replaces each clause in F by a set of k literals chosen from this clause (either the guessed set for $B_1, \ldots, B_{d-1}$ or the first segments for the other clauses in F). The resulting formula is denoted by $F'$. It is obvious that if we guessed right then A satisfies $F'$.

To test satisfiability of k-CNF formulas at the second step, S uses a polynomial-time randomized algorithm that finds a satisfying assignment with an exponentially small probability. We choose Schöning's algorithm [12] to perform this testing (we could use any algorithm that has at least the same success probability, for example the PPSZ algorithm [8]). More exactly, we use one random walk of Schöning's algorithm, which has the success probability at least $(2 - 2/k)^{-n}$ up to a constant [13]. If S finds a satisfying assignment then the procedure returns it and halts. Otherwise, S should return "no" but for technical reasons, instead of "no", we require S return $F'$ which is used in the external procedure P.

Note that if $d = 1$, i.e.
there is no bad clause, then S simply finds a satisfying assignment to the k-CNF formula made up of the $m$ first segments. Also note that the smaller $d$, the higher the probability of guessing $F'$.

Subroutine S

Input: Formula F with $m$ clauses over $n$ variables, integers $k$ and $d$.
Output: Satisfying assignment or k-CNF formula.

1. Reduce F to a k-CNF formula $F'$ as follows:
   (a) Choose $d - 1$ clauses $B_1, \ldots, B_{d-1}$ in F at random.
   (b) Replace each clause that does not belong to $\{B_1, \ldots, B_{d-1}\}$ by its first segment.
   (c) For each $B_i$, choose k literals in $B_i$ at random and replace $B_i$ by the chosen set.

2. Test satisfiability of $F'$ using one random walk of length $3n$ (see [12] for details):
   (a) Choose an initial assignment $a$ uniformly at random;
   (b) Repeat $3n$ times:
       i. If $F'$ is satisfied by the assignment $a$ then return $a$ and halt;
       ii. Pick any clause C in $F'$ such that C is falsified by $a$. Choose a literal $l$ in C uniformly at random. Modify $a$ by flipping the value of $l$.
   (c) Return $F'$.

3.2 Description of P

Suppose that an input formula F has a satisfying assignment A. The procedure first calls Subroutine S. The result of S depends on which case of the dichotomy holds.

1. The input formula F has many ($\ge d$) bad clauses. Then S never finds A.
2. The input formula F has few ($< d$) bad clauses. Then S returns A (with some probability).

If S does not return a satisfying assignment, the procedure P simplifies F and recursively invokes itself on the simplified formula, i.e. P processes Case 1 of the dichotomy. To simplify the formula, P uses the fact that A falsifies at least one clause in $F'$. Note that such a clause has been obtained from a long clause (longer than k) in F because each short clause must contain a true literal. Therefore, P chooses a clause at random from those clauses in $F'$ that are obtained from long clauses in F. Let $l_1, \ldots, l_k$ be all literals of the chosen clause. Then P assigns f to these literals and reduces F to $F[l_1 = f, \ldots, l_k = f]$.

Procedure P

Input: Formula F with $m$ clauses over $n$ variables, integers $k$ and $d$.
Output: Satisfying assignment or "no".

1. Invoke S on F, $k$, and $d$.
2. Choose a clause at random from those clauses in $F'$ that are obtained from long (longer than k) clauses of F. Let the chosen clause consist of literals $l_1, \ldots, l_k$.
3. Simplify F by assigning f to all $l_1, \ldots, l_k$, i.e. reduce F to $F[l_1 = f, \ldots, l_k = f]$.
4. Recursively invoke P on $F[l_1 = f, \ldots, l_k = f]$.
5. Return "no".

Note that Schuler's algorithm [14] is a special case of P: take $d = 1$, $k = \log(2m)$, and use the PPSZ algorithm instead of Schöning's algorithm at step 2 in S.
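The two listings can be read as the following minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: literals are nonzero integers (`+i` for $x_i$, `-i` for its complement), clauses are tuples, and the names `assign_false`, `subroutine_s`, `procedure_p` and the `fixed` bookkeeping are hypothetical. Two details are choices made here rather than taken from the text: clauses with at most k literals are kept whole, and P records the values forced by each simplification so that it can return an assignment to all $n$ variables.

```python
import random


def assign_false(formula, literals):
    """Return F[l_1 = f, ..., l_s = f] for clauses given as tuples of ints."""
    false_lits = set(literals)
    return [tuple(l for l in c if l not in false_lits)
            for c in formula if not any(-l in false_lits for l in c)]


def subroutine_s(formula, n, k, d, rng):
    """One run of S. Returns (assignment or None, shortened formula F')."""
    m = len(formula)
    guessed = set(rng.sample(range(m), min(d - 1, m)))       # step 1(a)
    short = []                                               # pairs (clause, from_long)
    for i, c in enumerate(formula):
        if len(c) <= k:
            short.append((c, False))                         # already short, kept whole
        elif i in guessed:
            short.append((tuple(rng.sample(c, k)), True))    # step 1(c)
        else:
            short.append((c[:k], True))                      # step 1(b): first segment
    a = {v: rng.random() < 0.5 for v in range(1, n + 1)}     # step 2(a)
    for _ in range(3 * n):                                   # step 2(b)
        falsified = [c for c, _ in short
                     if not any(a[abs(l)] == (l > 0) for l in c)]
        if not falsified:
            return a, short                                  # F' is satisfied by a
        if not falsified[0]:
            break                                            # empty clause: no flip helps
        l = rng.choice(falsified[0])
        a[abs(l)] = not a[abs(l)]                            # flip the value of l
    return None, short                                       # step 2(c)


def procedure_p(formula, n, k, d, rng, fixed=None):
    """One run of P. Returns a satisfying assignment (dict) or None for 'no'."""
    fixed = dict(fixed or {})
    a, short = subroutine_s(formula, n, k, d, rng)           # step 1
    if a is not None:
        a.update(fixed)                                      # restore forced values
        return a
    from_long = [c for c, is_long in short if is_long]
    if not from_long:
        return None                                          # nothing left to simplify
    chosen = rng.choice(from_long)                           # step 2
    for l in chosen:
        fixed[abs(l)] = l < 0                                # literal l gets the value f
    reduced = assign_false(formula, chosen)                  # step 3
    return procedure_p(reduced, n, k, d, rng, fixed)         # steps 4 and 5
```

Any assignment returned by this sketch satisfies the input formula, because every clause of $F'$ is a subset of a clause of F and every deleted clause contains the complement of a literal that was set to f. A single call succeeds only with small probability, so in practice the call is repeated, as in Algorithm A of Sect. 5.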

4. Success Probability of Procedure P

Given an input formula F and some values of the parameters $k$ and $d$, Procedure P finds a fixed satisfying assignment A or returns "no". What is the probability of finding A? In this section, we give a lower bound on the success probability of P and choose values of $k$ and $d$ that maximize this bound.

We define the minimum success probabilities for S and P as follows:

$s(n, m, k, d)$ is the minimum success probability of S where the minimum is taken over all pairs F, A such that (i) A satisfies F; (ii) F has $n$ variables and $m$ clauses; (iii) F has less than $d$ bad clauses.

$p(n, m, k, d)$ is the minimum success probability of P where the minimum is taken over all pairs F, A such that (i) A satisfies F; (ii) F has $n$ variables and $m$ clauses.

In all next lemmas we assume that $m > d \ge 2$.

Lemma 1. Subroutine S has the following lower bound on the success probability:

$$s(n, m, k, d) > \left(\frac{1}{mn}\right)^{d-1} 2^{-n\left(1 - \frac{\log e}{k}\right)}$$

Proof. Let A be a fixed satisfying assignment to F. The probability of finding A is the product of three probabilities $s_1$, $s_2$, and $s_3$. The probability $s_1$ is the probability that the set $\{B_1, \ldots, B_{d-1}\}$ of chosen clauses contains all bad clauses (step 1a in S). The probability $s_2$ is the probability of guessing sets with true literals for all $B_1, \ldots, B_{d-1}$ (step 1c in S). The probability $s_3$ is the probability of finding A by one random walk (step 2 in S).

Since $B_1, \ldots, B_{d-1}$ are chosen at random from $m$ input clauses, we have (using Stirling's approximation as in [1, page 4], where $H$ denotes the binary entropy function):

$$s_1 \ge \frac{1}{\binom{m}{d-1}} \ge \sqrt{2\pi m \,\frac{d-1}{m}\left(1 - \frac{d-1}{m}\right)}\; 2^{-H\left(\frac{d-1}{m}\right) m} > \frac{3}{2}\, 2^{-H\left(\frac{d-1}{m}\right) m}$$

For further estimation we use the inequality $\log(1 + x) < x \log e$:

$$s_1 > \frac{3}{2}\, 2^{-H\left(\frac{d-1}{m}\right) m} = \frac{3}{2}\, 2^{-(d-1)\log\frac{m}{d-1} - (m-d+1)\log\left(1 + \frac{d-1}{m-d+1}\right)} > \frac{3}{2}\, 2^{-(d-1)\log\frac{m}{d-1} - (m-d+1)\frac{d-1}{m-d+1}\log e} = \frac{3}{2}\left(\frac{d-1}{em}\right)^{d-1}$$

To estimate $s_2$, we estimate the probability of guessing a set with a true literal for each $B_i$. Obviously, $B_i$ has at most $\binom{n}{k}$ subsets of literals of size k. For a fixed true literal in $B_i$, there are at most $\binom{n-1}{k-1}$ of such subsets that contain this literal.
Hence

$$s_2 \ge \left(\frac{\binom{n-1}{k-1}}{\binom{n}{k}}\right)^{d-1} = \left(\frac{k}{n}\right)^{d-1}$$
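The equality in the bound on $s_2$ rests on the identity $\binom{s-1}{k-1} / \binom{s}{k} = k/s$, which for a clause of length $s \le n$ gives a per-clause probability of at least $k/n$. A quick exact check of this identity, written as a sketch with the hypothetical helper name `hit_probability`:

```python
from fractions import Fraction
from math import comb


def hit_probability(s: int, k: int) -> Fraction:
    """Fraction of the k-subsets of an s-literal clause that contain one fixed literal."""
    return Fraction(comb(s - 1, k - 1), comb(s, k))


def s2_lower_bound(n: int, k: int, d: int) -> Fraction:
    """The bound (k/n)^(d-1) on s_2 for d - 1 guessed clauses over n variables."""
    return Fraction(k, n) ** (d - 1)
```

For instance, a clause of 10 literals with k = 3 gives `hit_probability(10, 3) == Fraction(3, 10)`, since 36 of the 120 three-element subsets contain the fixed literal.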

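A lower bound on $s_3$ is proved in [13]:

$$s_3 \ge \frac{2}{3}\left(2 - \frac{2}{k}\right)^{-n}$$

Using the inequality $1 - x \le 2^{-x \log e}$, we have

$$s_3 \ge \frac{2}{3}\left(2 - \frac{2}{k}\right)^{-n} = \frac{2}{3}\, 2^{-n}\left(1 - \frac{1}{k}\right)^{-n} \ge \frac{2}{3}\, 2^{-n}\, 2^{\frac{n \log e}{k}} = \frac{2}{3}\, 2^{-n\left(1 - \frac{\log e}{k}\right)}$$

The product of the above bounds for $s_1$, $s_2$, and $s_3$ gives the following bound:

$$s_1 s_2 s_3 > \frac{3}{2}\left(\frac{d-1}{em}\right)^{d-1}\left(\frac{k}{n}\right)^{d-1}\frac{2}{3}\, 2^{-n\left(1 - \frac{\log e}{k}\right)} = \left(\frac{k(d-1)}{emn}\right)^{d-1} 2^{-n\left(1 - \frac{\log e}{k}\right)}$$

Finally, we relax the above bound to

$$\left(\frac{1}{mn}\right)^{d-1} 2^{-n\left(1 - \frac{\log e}{k}\right)}.$$

Our goal is to choose values of the parameters $k$ and $d$ (viewed as functions of $n$ and $m$) so as to maximize the success probability of Procedure P. We choose the functions $k_0 = k_0(n, m)$ and $d_0 = d_0(n, m)$ in two steps. First, we find a lower bound on the success probability of P. The maximum of this bound is attained when $k$ is a certain function of $n$, $m$, and $d$. Then, substituting this function in the bound, we find $d_0(n, m)$ that maximizes the bound.

Lemma 2. For all $k$ such that

$$n \ge k \ge \frac{\log\frac{em}{d}}{1 + \frac{(d-1)\log e}{n}} \qquad (2)$$

Procedure P has the following lower bound on the success probability:

$$p(n, m, k, d) > \left(\frac{1}{mn}\right)^{d-1} 2^{-n\left(1 - \frac{\log e}{k}\right)}$$

Proof. Let A be a fixed satisfying assignment to F. Suppose that Procedure P finds A with $t$ recursive calls, where $t$ is some integer in the interval $0 \le t \le n/k$. Then P simplifies formulas $t$ times. For each simplification, the probability of guessing a clause in $F'$ that consists of false literals is at least $d/m$. Therefore, with probability at least $(d/m)^t$, we get a formula G such that G is still satisfied by A; G has less than $d$ bad clauses; G has at most $n - kt$ variables. Using Lemma 1, we get the following lower bound on the probability of finding A with $t$ recursive calls:

$$p(n, m, k, d) > \left(\frac{d}{m}\right)^{t}\left(\frac{1}{m(n - kt)}\right)^{d-1} 2^{-(n - kt)\left(1 - \frac{\log e}{k}\right)}$$
$$= \left(\frac{d}{m}\right)^{t}\left(\frac{1}{mn}\right)^{d-1}\left(1 - \frac{kt}{n}\right)^{-(d-1)} 2^{-n\left(1 - \frac{\log e}{k}\right) + kt\left(1 - \frac{\log e}{k}\right)}$$
$$= \beta(n, m, k, d)\left(\frac{d}{m}\right)^{t}\left(1 - \frac{kt}{n}\right)^{-(d-1)} 2^{kt\left(1 - \frac{\log e}{k}\right)}$$
$$= \beta(n, m, k, d)\; 2^{-t\log\frac{m}{d} - (d-1)\log\left(1 - \frac{kt}{n}\right) + kt\left(1 - \frac{\log e}{k}\right)}$$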

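where $\beta$ denotes the lower bound in the claim:

$$\beta(n, m, k, d) = \left(\frac{1}{mn}\right)^{d-1} 2^{-n\left(1 - \frac{\log e}{k}\right)}$$

Note that the $\beta$ bound is the same as the bound given by Lemma 1. Since $\log(1 - x) \le -x \log e$ for all $0 \le x < 1$, we have

$$p(n, m, k, d) > \beta(n, m, k, d)\; 2^{\,t\left(-\log\frac{m}{d} + \frac{k(d-1)}{n}\log e - \log e + k\right)} = \beta(n, m, k, d)\; 2^{\,t\left(k\left(1 + \frac{(d-1)\log e}{n}\right) - \log\frac{em}{d}\right)} \ge \min_t\left[\beta(n, m, k, d)\; 2^{\,t\left(k\left(1 + \frac{(d-1)\log e}{n}\right) - \log\frac{em}{d}\right)}\right]$$

where the minimum is taken over all $t$ such that $0 \le t \le n/k$. The second inequality in (2) is equivalent to

$$k\left(1 + \frac{(d-1)\log e}{n}\right) - \log\frac{em}{d} \ge 0.$$

Therefore, the minimum is attained when $t = 0$, which proves the claim.

It follows from Lemma 2 that Procedure P has the following lower bound on the success probability:

$$p(n, m, k, d) > \min_t\left[\beta(n, m, k, d)\; 2^{\,t\left(k\left(1 + \frac{(d-1)\log e}{n}\right) - \log\frac{em}{d}\right)}\right] \qquad (3)$$

What value of $k$ maximizes this bound for given $n$, $m$, and $d$? A more thorough analysis shows that this bound is maximum when the inequality on $k$ is replaced by the equality:

$$k = \frac{\log\frac{em}{d}}{1 + \frac{(d-1)\log e}{n}}$$

Consider the bound obtained from (3) by substituting this value for $k$. Differentiation by $d$ shows that this bound is maximum when $d$ is close to $n/\log^3(m)$. However, $k$ and $d$ must be integers, so we take the ceilings of those values of $k$ and $d$.

Lemma 3. For all $n$ and $m$ such that $10 \le n < m \le 2^{\sqrt[3]{n}/2}$, take the following values of $k$ and $d$:

$$k_0 = \left\lceil \frac{\log\frac{em}{n} + 3\log\log(m)}{1 + \frac{\log e}{\log^3(m)}} \right\rceil, \qquad d_0 = \left\lceil \frac{n}{\log^3(m)} \right\rceil \qquad (4)$$

Then

$$p(n, m, k_0, d_0) > 2^{-n\left(1 - \frac{\log e}{k_0 + 2}\right)}$$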

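Proof. First, we show that Lemma 2 can be applied for $k_0$ and $d_0$, i.e. we show that (2) holds. It is straightforward to check the first inequality in (2). For the second inequality we have

$$\left(1 + \frac{(d_0 - 1)\log e}{n}\right) k_0 - \log\frac{em}{d_0} = \left(1 + \frac{(d_0 - 1)\log e}{n}\right) k_0 - \log\frac{em}{n} - \log(n) + \log d_0$$
$$\ge \left(\log\frac{em}{n} + 3\log\log(m)\right) - \log\frac{em}{n} - \log(n) + \log\frac{n}{\log^3(m)} = 0.$$

Next, we substitute $k_0$ and $d_0$ in the bound given by Lemma 2. Note that $d_0 = n/\log^3(m) + \delta$ where $0 \le \delta < 1$. Then we have (using $n < m$)

$$p(n, m, k_0, d_0) > \left(\frac{1}{mn}\right)^{d_0 - 1} 2^{-n\left(1 - \frac{\log e}{k_0}\right)} = \left(\frac{1}{mn}\right)^{\frac{n}{\log^3(m)} + \delta - 1} 2^{-n\left(1 - \frac{\log e}{k_0}\right)} > \left(\frac{1}{mn}\right)^{\frac{n}{\log^3(m)}} 2^{-n\left(1 - \frac{\log e}{k_0}\right)}$$
$$= 2^{-\frac{n \log(mn)}{\log^3(m)} - n\left(1 - \frac{\log e}{k_0}\right)} > 2^{-\frac{2n\log(m)}{\log^3(m)} - n\left(1 - \frac{\log e}{k_0}\right)} = 2^{-\frac{2n}{\log^2(m)} - n\left(1 - \frac{\log e}{k_0}\right)}$$

It remains to prove that

$$\frac{2n}{\log^2(m)} + n\left(1 - \frac{\log e}{k_0}\right) \le n\left(1 - \frac{\log e}{k_0 + 2}\right)$$

This inequality is equivalent to

$$\frac{\log e}{k_0} - \frac{\log e}{k_0 + 2} \ge \frac{2}{\log^2(m)}$$

which, in turn, is equivalent to

$$\left(\frac{k_0}{\log(m)}\right)^2\left(1 + \frac{2}{k_0}\right) \le \log e \qquad (5)$$

Our assumption $m \le 2^{\sqrt[3]{n}/2}$ implies $k_0 < \log(m)$. Indeed,

$$k_0 < \log\frac{em}{n} + 3\log\log(m) + 1 = \log(m) - \left[\log\frac{n}{2e} - \log\left(\log^3(m)\right)\right] \le \log(m)$$

Therefore, the first factor in (5) is less than 1 under the assumption. To make sure that the second factor is less than $\log e \approx 1.44$, we notice that $k_0 > 5$. Indeed, if $n \ge 10$ then $k_0 > 3\log\log(m) > 3\log\log(10) > 5$.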
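The parameter choice can be checked numerically. The sketch below assumes the definitions of $k_0$ and $d_0$ in (4), condition (2) of Lemma 2, and inequality (5) in the forms stated in this section; the function names are hypothetical and the sample values of $n$ and $m$ are chosen here only to satisfy $10 \le n < m \le 2^{\sqrt[3]{n}/2}$.

```python
import math

LOG_E = math.log2(math.e)  # log e with log meaning log_2, about 1.44


def parameters(n: int, m: int):
    """Return (k0, d0) following the formulas assumed for (4)."""
    log_m = math.log2(m)
    k0 = math.ceil((math.log2(math.e * m / n) + 3 * math.log2(log_m))
                   / (1 + LOG_E / log_m ** 3))
    d0 = math.ceil(n / log_m ** 3)
    return k0, d0


def in_range(n: int, m: int) -> bool:
    """Assumption of Lemma 3: 10 <= n < m <= 2^(cuberoot(n)/2)."""
    return 10 <= n < m and math.log2(m) <= n ** (1 / 3) / 2


def condition_2(n: int, m: int, k: int, d: int) -> bool:
    """Both inequalities of (2): n >= k >= log(em/d) / (1 + (d-1) log e / n)."""
    return k <= n and (1 + (d - 1) * LOG_E / n) * k >= math.log2(math.e * m / d)


def condition_5(m: int, k: int) -> bool:
    """Inequality (5): (k / log m)^2 (1 + 2/k) <= log e."""
    return (k / math.log2(m)) ** 2 * (1 + 2 / k) <= LOG_E
```

With $n = 2^{18}$ and $m = 2^{20}$ this gives $k_0 = 17$ and $d_0 = 33$, and both (2) and (5) hold for these values, consistent with $5 < k_0 < \log(m) = 20$.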

5. Main Result

The main algorithm (Algorithm A) is a repetition of Procedure P. Lemma 3 shows that if we substitute $k_0$ and $d_0$ defined in (4) for the parameters of P then the success probability of P is at least

$$2^{-n\left(1 - \frac{\log e}{k_0 + 2}\right)}$$

The number $r$ of repetitions is taken to be the inverse to this probability:

$$r = 2^{\,n\left(1 - \frac{\log e}{k_0 + 2}\right)} \qquad (6)$$

which gives a constant success probability of Algorithm A.

Algorithm A

Input: Formula F in CNF with $m$ clauses over $n$ variables.
Output: Satisfying assignment or "no".

1. Compute $k_0$ and $d_0$ as in (4) and $r$ as in (6).
2. Repeat the following $r$ times:
   (a) Run $P(F, k_0, d_0)$;
   (b) If a satisfying assignment is found, return it and halt.
3. Return "no".

Theorem 1. Algorithm A runs in time

$$O(n^2 m)\; 2^{\,n\left(1 - \frac{\log e}{k_0 + 2}\right)}$$

For any satisfiable input formula such that $10 \le n < m \le 2^{\sqrt[3]{n}/2}$, Algorithm A finds a satisfying assignment with probability greater than 1/2.

Proof. To prove the first claim, we need to estimate the running time of Procedure P. The procedure recursively invokes itself at most $n/k_0$ times. Each call scans the input formula and eliminates at most $k_0$ variables. Therefore, the procedure performs less than $n$ scans. Algorithm A repeats the procedure at most $r$ times, which gives the bound in the claim.

Let $p_A(n, m)$ be the success probability of Algorithm A. Then

$$p_A(n, m) \ge 1 - \left(1 - p(n, m, k_0, d_0)\right)^r$$

Using Lemma 3 and the inequality $(1 - x)^r \le e^{-xr}$, we have $p_A(n, m) \ge 1 - e^{-1}$, which is greater than 1/2.

Remark 1. Note that we can write the bound in Theorem 1 as

$$O(n^2 m)\; 2^{\,n\left(1 - \frac{1}{\ln(m/n) + O(\ln\ln m)}\right)}.$$
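The repetition argument in the proof of Theorem 1 can be illustrated independently of P. The sketch below is a generic amplification loop under stated assumptions: `procedure` stands for any one-shot randomized routine that returns a result or `None`, and the names `repetitions`, `success_probability`, and `repeat` are hypothetical. Taking the ceiling of $1/p$ is a choice made here so that the repetition count is an integer.

```python
import math


def repetitions(p: float) -> int:
    """Number of runs r = ceil(1/p), the inverse of the one-run success probability."""
    return math.ceil(1 / p)


def success_probability(p: float, r: int) -> float:
    """Probability that at least one of r independent runs succeeds."""
    return 1 - (1 - p) ** r


def repeat(procedure, r: int):
    """Steps 2 and 3 of Algorithm A for an arbitrary one-shot procedure."""
    for _ in range(r):
        result = procedure()
        if result is not None:
            return result   # a satisfying assignment was found
    return None             # corresponds to the answer "no"
```

Since $(1 - p)^r \le e^{-pr} \le e^{-1}$ whenever $r \ge 1/p$, the amplified success probability is at least $1 - e^{-1} \approx 0.632$, whatever the value of $p$.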

Remark 2. For formulas with a constant clause density, i.e. if $m$ is linear in $n$, our bound is better than in the general case:

$$2^{\,n\left(1 - \frac{1}{O(\ln\ln m)}\right)}$$

Remark 3. Our algorithm is based on the clause-shortening approach proposed by Schuler [14]. His algorithm is derandomized with the same upper bound on the running time [5]. It would be natural to try to apply the same method of derandomization to our algorithm. However, the direct application gives a deterministic algorithm with a much worse upper bound. Is it possible to derandomize Algorithm A with the same or a slightly worse upper bound?

Acknowledgments

We thank Edward A. Hirsch and Daniel Rolf for helpful discussions. We also thank the anonymous reviewers for their help in improving the paper.

References

[1] B. Bollobás. Random Graphs. Cambridge University Press, 2nd edition, 2001.
[2] T. Brueggemann and W. Kern. An improved local search algorithm for 3-SAT. Theoretical Computer Science, 329(1-3):303-313, December 2004.
[3] E. Dantsin, A. Goerdt, E. A. Hirsch, R. Kannan, J. Kleinberg, C. Papadimitriou, P. Raghavan, and U. Schöning. A deterministic $(2 - 2/(k+1))^n$ algorithm for k-SAT based on local search. Theoretical Computer Science, 289(1):69-83, 2002.
[4] E. Dantsin, E. A. Hirsch, and A. Wolpert. Algorithms for SAT based on search in Hamming balls. In Proceedings of the 21st Annual Symposium on Theoretical Aspects of Computer Science, STACS 2004, volume 2996 of Lecture Notes in Computer Science, pages 141-151. Springer, March 2004.
[5] E. Dantsin and A. Wolpert. Derandomization of Schuler's algorithm for SAT. In Proceedings of the 7th International Conference on Theory and Applications of Satisfiability Testing, SAT 2004, pages 69-75, May 2004.
[6] E. Dantsin and A. Wolpert. An improved upper bound for SAT. In Proceedings of the 8th International Conference on Theory and Applications of Satisfiability Testing, SAT 2005, volume 3569 of Lecture Notes in Computer Science, pages 400-407. Springer, June 2005.
[7] K. Iwama and S. Tamaki. Improved upper bounds for 3-SAT. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, page 328, January 2004. A preliminary version appeared in Electronic Colloquium on Computational Complexity, Report No. 53, July 2003.

[8] R. Paturi, P. Pudlák, M. E. Saks, and F. Zane. An improved exponential-time algorithm for k-SAT. In Proceedings of the 39th Annual IEEE Symposium on Foundations of Computer Science, FOCS'98, pages 628-637, 1998.
[9] R. Paturi, P. Pudlák, and F. Zane. Satisfiability coding lemma. In Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer Science, FOCS'97, pages 566-574, 1997.
[10] P. Pudlák. Satisfiability - algorithms and logic. In Proceedings of the 23rd International Symposium on Mathematical Foundations of Computer Science (MFCS'98), volume 1450 of Lecture Notes in Computer Science, pages 129-141. Springer-Verlag, 1998.
[11] D. Rolf. Derandomization of PPSZ for Unique-k-SAT. In Proceedings of the 8th International Conference on Theory and Applications of Satisfiability Testing, SAT 2005, volume 3569 of Lecture Notes in Computer Science, pages 216-225. Springer, June 2005.
[12] U. Schöning. A probabilistic algorithm for k-SAT and constraint satisfaction problems. In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, FOCS'99, pages 410-414, 1999.
[13] U. Schöning. A probabilistic algorithm for k-SAT based on limited local search and restart. Algorithmica, 32(4):615-623, 2002.
[14] R. Schuler. An algorithm for the satisfiability problem of formulas in conjunctive normal form. Journal of Algorithms, 54(1):40-44, January 2005. A preliminary version appeared as a technical report in 2003.