Information Processing Letters 23 (1986) 103-106 North-Holland
20 August 1986
ON THE PROBABILISTIC PERFORMANCE OF ALGORITHMS FOR THE SATISFIABILITY PROBLEM *
John FRANCO
Department of Computer Science, Indiana University, Bloomington, IN 47405, U.S.A.
Communicated by Alan Shaw
Received 8 March 1985
Revised 15 August 1985
Keywords: Satisfiability, average analysis, probabilistic analysis, Davis-Putnam, NP-complete
1. Introduction

The satisfiability problem (SAT) is the problem of determining whether a given collection I of disjunctions (clauses) of boolean literals can all be satisfied (have value true) by some consistent assignment of truth values to the literals of I (truth assignment). SAT is NP-complete so there is no known efficient algorithm for solving this problem. The Davis-Putnam procedure (DPP) [4] is a well-known, much studied method for solving instances of SAT. The probabilistic analysis of variants of DPP under the assumption of constant-density input distributions such as in [7,8,9] has given the impression that the Davis-Putnam procedure is intrinsically a very fast method for solving most instances of SAT. This impression is moderated somewhat by the results of this letter which show that the following two trivial algorithms, run concurrently, solve 'more' instances of SAT in polynomial time than all previously studied algorithms.
    A1(I):
        Repeat
            Randomly choose a truth assignment t to variables in I
        Until t satisfies I
        Return (t)

    A2(I):
        Search I for a null clause
        If a null clause is found
            Then Return ("not satisfiable")
            Else Return ("cannot determine whether I is satisfiable")

The results reported here represent the conclusion of work begun in [6]. They indicate that the favorable results previously obtained are probably due more to the probabilistic model chosen than to characteristics of the Davis-Putnam procedure. Furthermore, it may be that a similar statement can be made about probabilistic results obtained for some NP-complete problems on random graphs because of a certain property of constant-density models. This point will be expanded later.

* This article is based on work supported by the Air Force Office of Scientific Research under Grant No. AFOSR-840372.

2. Previous probabilistic results for SAT
The results of [7,8,9] are based on the probabilistic model (or a closely related model) which we refer to as M1(n, r, p). According to M1(n, r, p), each of n clauses is constructed independently by including each of r boolean variables in the clause, independently, in the following way: with probability p the positive literal associated with the variable is placed in the clause and with probability p the negative literal associated with the variable is placed in the clause. Thus, both literals associated with the same variable are in the clause with probability $p^2$ and neither literal associated with the same variable is in the clause with probability $(1-p)^2$.

According to the results of [7,8,9], a collection of variants of DPP and other algorithms solves SAT in polynomial average time under M1(n, r, p) in each of the following cases: (a) $n \le t\ln(r)$, (b) $p \le t(\ln(r)/r)^{3/2}$, (c) $n > \exp\{cr\}$, and (d) $p > c$, where t is a positive constant equal to the exponent of the polynomial which bounds the average complexity and c is any constant greater than zero.

In this short article it is shown that algorithm A1 finds a solution to a random instance of SAT under M1(n, r, p) with probability tending to 1 when $p \ge \ln(n)/r$, and A2 verifies that no solution exists with probability tending to 1 when $p \le 0.4\ln(n)/r$ if $n < 2^r$ and when $p \le \ln(n)/(2r)$ if $\lim_{n,r\to\infty} n/2^{\sqrt{r}} = 0$. Note that satisfiability can be determined in polynomial time by exhaustive search if $n \ge 2^r$.

0020-0190/86/$3.50 © 1986, Elsevier Science Publishers B.V. (North-Holland)
Volume 23, Number 2
INFORMATION PROCESSING LETTERS
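The model M1(n, r, p) and the two algorithms from the introduction can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's code: the function names, the signed-integer encoding of literals (`+v` / `-v`), and the `max_tries` cap on A1 are assumptions introduced here (the source's A1 repeats until it succeeds).

```python
import random


def random_instance_m1(n, r, p, rng):
    """Draw n clauses over r variables under M1(n, r, p).

    Each of the 2r literals enters a clause independently with
    probability p. Literals are encoded as +v / -v (an assumption).
    """
    instance = []
    for _ in range(n):
        clause = set()
        for v in range(1, r + 1):
            if rng.random() < p:
                clause.add(v)    # positive literal of variable v
            if rng.random() < p:
                clause.add(-v)   # negative literal of variable v
        instance.append(frozenset(clause))
    return instance


def satisfies(assignment, instance):
    """True if every clause contains at least one literal made true.

    A null (empty) clause can never be satisfied.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in instance
    )


def a1(instance, r, rng, max_tries=1000):
    """A1 with a hypothetical cap on the number of random assignments."""
    for _ in range(max_tries):
        t = {v: rng.random() < 0.5 for v in range(1, r + 1)}
        if satisfies(t, instance):
            return t
    return None  # cap reached; the A1 of the text would keep looping


def a2(instance):
    """A2: report unsatisfiability only when a null clause is present."""
    if any(len(clause) == 0 for clause in instance):
        return "not satisfiable"
    return "cannot determine whether I is satisfiable"
```

Running `a1` and `a2` on the same instance, for example one drawn with `random_instance_m1(50, 20, 0.3, random.Random(0))`, mirrors the "run concurrently" idea: at most one of them gives a definite answer.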
3. Analysis of A1 and A2
The result for A1 requires finding the probability that a random truth assignment is a solution to a random instance I of SAT. The probability that a truth assignment, T, to r variables does not satisfy a random clause, c, is the probability that none of the r literals made true by T are in c, and this is $(1-p)^r$. Therefore, the probability that T satisfies n independent clauses is $(1-(1-p)^r)^n$. But, for $0 \le p \le 1$,

$$-(1-p)^r \ge -\exp\{-rp\}$$

so

$$(1-(1-p)^r)^n \ge (1-\exp\{-rp\})^n.$$

Suppose, for large r and n, $p = a\ln(n)/r$, where a is a constant. Then

$$(1-\exp\{-rp\})^n = (1-\exp\{-a\ln(n)\})^n = (1-n^{-a})^n.$$

But $(1-n^{-a})^n$ approaches 1 as n gets large when $a > 1$. Thus, T is a solution to a random instance of SAT, with probability tending to 1, when $\lim_{n,r\to\infty} p > \ln(n)/r$, and A1 finds a solution to a random instance of SAT in polynomial time with probability tending to 1 for the same limiting range of p.

Since any instance of SAT with a null clause is unsatisfiable (this is true under the interpretation given in [7,8,9]), algorithm A2 verifies that some instances of SAT are unsatisfiable in polynomial time. The analysis of A2 requires finding the probability that a null clause exists in a random instance of SAT. The probability that none of 2r literals is in a random clause is $(1-p)^{2r}$. Therefore, the probability that at least one literal is in c is $1-(1-p)^{2r}$ and the probability that at least one literal is in every clause of I is $(1-(1-p)^{2r})^n$. Thus, the probability that there is a zero-literal clause in I is $1-(1-(1-p)^{2r})^n$.

Suppose, for large n and r, that $p = a\ln(n)/r$, $a < 0.4$, and $n < 2^r$. It is easy to verify by comparing terms of the series expansions for the log of each expression that $(1-ax) \ge (1-a)^x$ for all $0 \le x \le 1$ and $0 \le a < 1/2$. Let $x = \ln(n)/r$. Then, since $n < 2^r$, $0 \le x \le 1$. Therefore,

$$1-(1-(1-p)^{2r})^n \ge 1-(1-(1-a)^{2\ln(n)})^n \ge 1-\exp\{-(1-a)^{2\ln(n)}\,n\} = 1-\exp\{-n^{2\ln(1-a)}\,n\} = 1-\exp\{-n^{2\ln(1-a)+1}\}.$$

But $a < 0.4$ so $1-\exp\{-n^{2\ln(1-a)+1}\}$ approaches 1 as n and r get large. Thus, I contains a zero-literal clause, with probability tending to 1, when $\lim_{n,r\to\infty} p < 0.4\ln(n)/r$ and when $n < 2^r$.

If $\lim_{n,r\to\infty} n/2^{\sqrt{r}} = 0$, then $\lim_{n,r\to\infty} rp^2 = 0$. But, for $0 \le p \le 1$ and $\lim_{n,r\to\infty} rp^2 = 0$,

$$(1-p)^{2r} = \exp\{-2rp\}\,(1-O(2rp^2))$$

(by taking the power series expansion). So,

$$1-(1-(1-p)^{2r})^n = 1-(1-\exp\{-2rp\}(1-O(2rp^2)))^n \ge 1-\exp\{-n\exp\{-2rp\}(1-O(2rp^2))\} = 1-\exp\{-n^{1-2a}(1-O(2rp^2))\}.$$

But $2rp^2$ approaches zero in the limit and if $a < 1/2$, then $1-\exp\{-n^{1-2a}(1-O(2rp^2))\}$ approaches 1 in the limit. Thus, I contains a zero-literal clause with probability tending to 1, when $\lim_{n,r\to\infty} p < \ln(n)/(2r)$ if $\lim_{n,r\to\infty} n/2^{\sqrt{r}} = 0$.

Since A2 runs in time bounded by a polynomial in n and r, A2 verifies that no solution to I exists in polynomial time with probability tending to 1 when $\lim_{n,r\to\infty} p < 0.4\ln(n)/r$ if $n < 2^r$ and when $\lim_{n,r\to\infty} p < \ln(n)/(2r)$ if $\lim_{n,r\to\infty} n/2^{\sqrt{r}} = 0$.
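The two closed-form probabilities used in this section can be evaluated numerically to watch the limits emerge. The sketch below is an illustration only: the function names and the choice $r = 2\lceil\log_2 n\rceil$ (made just to keep $n < 2^r$) are assumptions, not part of the analysis. It also computes where the exponent $2\ln(1-a)+1$ changes sign, which happens at $a = 1 - e^{-1/2} \approx 0.393$, close to the constant 0.4 quoted in the text.

```python
import math


def prob_assignment_satisfies(n, r, p):
    """(1 - (1-p)^r)^n: chance that one fixed assignment satisfies I."""
    miss = (1.0 - p) ** r          # a clause contains no literal made true
    if miss >= 1.0:
        return 0.0                 # p = 0: every clause is empty
    return math.exp(n * math.log1p(-miss))


def prob_null_clause(n, r, p):
    """1 - (1 - (1-p)^(2r))^n: chance that I has a zero-literal clause."""
    empty = (1.0 - p) ** (2 * r)   # a clause contains none of the 2r literals
    if empty >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-empty))


def zero_literal_exponent(a):
    """Exponent of n in the lower bound 1 - exp{-n^(2 ln(1-a) + 1)}."""
    return 2.0 * math.log(1.0 - a) + 1.0


# Value of a at which the exponent above is exactly zero.
A_STAR = 1.0 - math.exp(-0.5)

if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        r = 2 * math.ceil(math.log2(n))      # hypothetical choice, n < 2^r
        for a in (0.2, 2.0):
            p = a * math.log(n) / r
            print(n, r, a,
                  prob_assignment_satisfies(n, r, p),
                  prob_null_clause(n, r, p))
```

Using `log1p` and `expm1` avoids the loss of precision that a direct evaluation of `(1 - x) ** n` suffers when `x` is tiny and `n` is large.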
4. Concluding remarks

The results obtained assuming the constant-density model M1(n, r, p) are interesting in light of some probabilistic results obtained for the constant-component-size model M2(n, r, k): each of n clauses is selected independently and uniformly from the set of all possible k-literal clauses that can be constructed from r variables. Under M2(n, r, k), the algorithms of [7,8] and A1 and A2 require exponential average time for any constant limiting ratio of n to r (see [6]) but the algorithm of [5] finds solutions to random instances of SAT when $\lim_{n,r\to\infty} n/r < 1$ for any $k \ge 3$. Furthermore, algorithms based on the unit-clause rule find solutions to random instances of SAT under M2(n, r, k) in polynomial time with probability approaching 1 when $\lim_{n,r\to\infty} n/r < o(2^k/k)$ (see [2,3]).

The apparent disparity between the two sets of results is due in large part to a property of the constant-density model for SAT: the constant-density model allows zero-literal clauses with about the same probability that it allows one-literal or two-literal or m-literal clauses, with m constant. Therefore, a random instance generated according to M1(n, r, p) has no solution with high probability unless the average number of literals per clause is increasing with n and r. But, if the average number of literals per clause is increasing with n and r, then the probability that a clause is satisfied by a random truth assignment is increasing with n and r fast enough so that the probability that a random truth assignment satisfies a random instance of SAT under M1(n, r, p) is high in this case.

A similar phenomenon can be observed when constant-density models are used to generate random graphs or subsets of sets as instances for certain NP-complete problems. For example, consider the problem of finding a hamiltonian circuit in a given graph. This problem was studied in [1] under the assumption of the following constant-density model: n vertices are given and the probability that an edge (undirected) connects any pair of vertices is p, independent of any other edges appearing in the graph. Under this model, the probability that a vertex is isolated (no edges between this vertex and any other) is about the same as the probability that a vertex connects to one other vertex or two other vertices or m other vertices, where m is fixed. But a graph containing an isolated vertex has no hamiltonian circuit. Therefore, under the constant-density model, a random graph does not have a hamiltonian circuit unless the average number of edges per vertex is increasing with n. But then the chance of a hamiltonian circuit being in a random graph is great and one may be found with high probability using the algorithm of [1]. It would be interesting to see how well the algorithm of [1] works probabilistically under the constant-component-size model (known as the constant-degree model) for random graphs.

Another NP-complete problem for which a favorable analysis is obtained under a constant-density model but an unfavorable result (using the same algorithm) is obtained under a constant-component-size model is Set Cover. We will not consider this example here but we refer the reader to [10].
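The isolated-vertex observation for random graphs made above can be explored with a small simulation. This is a rough sketch under stated assumptions: the function names are hypothetical, and the graph is sampled by flipping one coin per vertex pair, which matches the constant-density description given for [1] but is not taken from that paper.

```python
import random


def prob_vertex_isolated(n, p):
    """Chance that one fixed vertex has none of its n - 1 possible edges."""
    return (1.0 - p) ** (n - 1)


def has_isolated_vertex(n, p, rng):
    """Sample a graph on n vertices with edge probability p; report
    whether some vertex ends up with degree zero."""
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:   # undirected edge {u, v} is present
                degree[u] += 1
                degree[v] += 1
    return any(d == 0 for d in degree)


def isolated_fraction(n, p, trials, rng):
    """Fraction of sampled graphs that cannot have a hamiltonian circuit
    because they contain an isolated vertex."""
    hits = sum(has_isolated_vertex(n, p, rng) for _ in range(trials))
    return hits / trials
```

Calling `isolated_fraction(200, c / 200, 100, random.Random(1))` for a few fixed values of `c` and then for `c` growing with the number of vertices gives a feel for why the average degree must increase before hamiltonian circuits become likely.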
The observations made here suggest that constant-density models may be inferior to constant-component-size models in the sense that they seem to generate more trivial instances.
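The contrast between the two models can be made concrete with a short sketch. Under M1(n, r, p) the number of literals in a clause is binomial with parameters 2r and p, so the probability of a null clause can be compared directly with that of a one-literal or two-literal clause; under M2(n, r, k) every clause has exactly k literals. The names below are hypothetical, and the M2 sampler assumes that a k-literal clause uses k distinct variables, which the text does not spell out.

```python
import math
import random


def clause_size_probability_m1(r, p, m):
    """Chance that a clause drawn under M1(n, r, p) has exactly m literals."""
    return math.comb(2 * r, m) * p**m * (1.0 - p) ** (2 * r - m)


def random_instance_m2(n, r, k, rng):
    """Draw n clauses of exactly k literals over r variables.

    Assumption: the k literals of a clause come from k distinct variables,
    each negated with probability 1/2. Literals are encoded as +v / -v.
    """
    instance = []
    for _ in range(n):
        variables = rng.sample(range(1, r + 1), k)
        clause = frozenset(v if rng.random() < 0.5 else -v for v in variables)
        instance.append(clause)
    return instance


if __name__ == "__main__":
    r = 50
    p = 1.0 / r   # hypothetical density: about two literals per clause
    for m in range(4):
        print(m, clause_size_probability_m1(r, p, m))
```

Because M2 never produces a null clause, an algorithm such as A2 can say nothing about instances drawn from it.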
References

[1] D. Angluin and L. Valiant, Fast probabilistic algorithms for hamiltonian circuits and matchings, Proc. 9th Ann. ACM Symp. on Theory of Computation (1977) 30-41.
[2] M.T. Chao and J. Franco, Probabilistic analysis of the unit clause and maximum occurring literal selection heuristics for the 3-satisfiability problem, Tech. Rept. No. 164, Indiana Univ., 1985.
[3] M.T. Chao and J. Franco, Probabilistic analysis of a generalization of the unit clause literal selection heuristic for the k-satisfiability problem, Tech. Rept. No. 165, Indiana Univ., 1985.
[4] M. Davis and H. Putnam, A computing procedure for quantification theory, J. ACM 7 (1960) 201-215.
[5] J. Franco, Probabilistic analysis of the pure literal heuristic for the satisfiability problem, Ann. Oper. Res. 1 (1984) 273-289.
[6] J. Franco and M. Paul, Probabilistic analysis of the Davis-Putnam procedure for solving the satisfiability problem, Discr. Appl. Math. 5 (1983) 77-87.
[7] A. Goldberg, P.W. Purdom and C.A. Brown, Average time analyses of simplified Davis-Putnam procedures, Inform. Process. Lett. 15 (1982) 72-75.
[8] P.W. Purdom, Search rearrangement backtracking and polynomial average time, Artificial Intelligence 21 (1983) 117-133.
[9] P.W. Purdom and C.A. Brown, The pure literal rule and polynomial average time, SIAM J. Comput. 14 (1985) 943-953.
[10] C. Vercellis, A probabilistic analysis of the Set Covering Problem, Ann. Oper. Res. 1 (1984).