Chapter 3

Polynomial time, nondeterministic polynomial time, and exponential time

3.1 Chapter overview and basic definitions

Programmers, even those who have had little exposure to complexity theory, use polynomial time, nondeterministic polynomial time, and exponential time as a rough and basic efficiency-related taxonomy for classifying algorithms. These three types of running time define the classes P, NP, and E (or EXP, depending on the definition of exponential time), on which we focus in this chapter. The class P contains all computational problems that can be solved in "reasonable" time, and, therefore, it is considered to be the class of feasible problems. P also contains problems solvable by algorithms with huge time complexity, such as, say, n^(2^100), which are in fact unusable. However, such problems do not arise in practice, and, on the other hand, there are many theoretical advantages in the identification "P = class of feasible problems." First, the solvability of a problem in polynomial time seems to be an intrinsic feature of the problem, independent of computability issues. Problems in P seem to have an underlying "nice" structure, which can be captured by some mathematical theory, and which polynomial-time algorithms can exploit. Secondly, the class P is robust to changes in the computational model, because all reasonable ("classical"^1) computational models can simulate each other in polynomial time.

^1 Note however that this argument has been recently challenged by some new computational models based on quantum theory. See Chapter 4.


The class NP contains all the problems that can be solved in polynomial time by nondeterministic machines. There exists an alternative characterization of NP that is more intuitive and also has the merit of being independent of a particular computational model: A set A is in NP if and only if input instances in A admit membership proofs whose validity can be verified in (deterministic) polynomial time (see Theorem 1.1.6 for a formal statement). For example, let us consider the NP set SAT. SAT consists of the set of satisfiable boolean formulas. We recall (see also Section 3.2) that a boolean formula φ is given by a well-formed expression consisting of variables and the logical operators ∧ (and), ∨ (or), and negation (x̄ denotes the negation of the variable x). A truth assignment for φ is an assignment of the boolean values T (true) and F (false) to each variable in φ. The formula φ is satisfiable if there exists a truth assignment that makes it true. For instance, the formula (x ∨ y) ∧ (x̄ ∨ y) is satisfiable. One truth assignment that satisfies it (which can serve as a membership proof, also sometimes called a solution) is the truth assignment x = T and y = T. The formula x ∧ ȳ ∧ (x̄ ∨ y) is not satisfiable. The set SAT is in NP, because each φ ∈ SAT has a truth assignment that makes it true and, given a truth assignment, one can check in polynomial time (in the length of φ) whether it makes φ true or not. Finding a satisfying truth assignment is another issue. In short, for P problems, it is feasible to find solutions, while, for NP problems, it is feasible to check solutions. Clearly, P ⊆ NP, and even though it is very reasonable to believe that the inclusion is proper, the issue of whether P is properly included in NP is, as everyone knows, open. The class NP is important also because numerous important computational problems are NP-complete.
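The polynomial-time verification just described can be sketched in a few lines. The encoding below (a CNF formula as a list of clauses, each clause a list of signed integers in DIMACS style, k for x_k and -k for x̄_k) is an assumption of this sketch, not notation from the text.

```python
# Sketch of the polynomial-time verifier behind "SAT is in NP".
# Assumed encoding (DIMACS-style): a CNF formula is a list of clauses;
# a clause is a list of nonzero ints, where k stands for the variable x_k
# and -k for its negation.

def check_assignment(cnf, assignment):
    """Return True iff `assignment` (dict: variable -> bool) satisfies every clause.

    Runs in time linear in the size of the formula, which is what makes a
    satisfying assignment a polynomial-time verifiable membership proof.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )

# The satisfiable example from the text, (x ∨ y) ∧ (x̄ ∨ y), with x = 1, y = 2:
phi = [[1, 2], [-1, 2]]
print(check_assignment(phi, {1: True, 2: True}))  # → True
```

Checking is the easy direction; the point of the paragraph above is that finding such an assignment is another matter entirely.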
Intuitively, a problem A is NP-complete if A ∈ NP and any other problem in NP can be solved in deterministic polynomial time with the "help" of A. Therefore, an NP-complete problem cannot be solved in deterministic polynomial time, unless all problems in NP are in P. The "help" is formalized by the concept of a polynomial-time reduction. There are several types of reductions and, correspondingly, several notions of NP-completeness. The most general type of polynomial-time reduction is obtained via oracle Turing machines. Intuitively, an oracle Turing machine operates like a normal Turing machine except that it is allowed to ask whether a given string belongs to a given set A, called the oracle set, and the correct answer is assumed to be automatically obtained in one computation step. More precisely, an oracle Turing machine M is obtained by adding to a normal machine an extra tape, called the oracle query tape, and three distinguished states q_query, q_y, and q_n. The oracle set A can be any language over the tape alphabet Γ. The machine, in addition to the normal operations, can write a string u ∈ Γ* on the oracle query tape. From time to time, as dictated by the transition function, the finite control enters the state q_query with some string u ∈ Γ* on the oracle query tape. We say that M queries the oracle A whether u ∈ A or not. When this happens, the machine goes in the next step to state q_y if u ∈ A, and to state q_n if u ∉ A. An oracle Turing machine M working with oracle set A is denoted M^A, and L(M^A) is the language


accepted by M with oracle A. We are now prepared to define the most general type of polynomial-time reducibility, the polynomial-time Turing reducibility, also called Cook reducibility. A problem B is polynomial-time Turing reducible to a problem A (notation B ≤_T^p A) if there is a polynomial-time oracle machine M such that B = L(M^A). In other words, B ≤_T^p A if B is polynomial-time computable given unrestricted access to A. The ≤_m^p reducibility (called polynomial-time many-one reducibility and also known as Karp reducibility) is basically the most restrictive type of reducibility. A problem B is ≤_m^p reducible to a problem A (notation B ≤_m^p A) if there is a polynomial-time computable function f such that, for all x, x ∈ B if and only if f(x) ∈ A (i.e., if A is viewed as an oracle, the reduction algorithm on input x constructs on the oracle query tape a single string f(x), and the answer of the oracle to the query "f(x) ∈ A?" directly determines whether x ∈ B or not). A problem A is NP-complete under ≤_T^p reduction if A ∈ NP and, for all B ∈ NP, B ≤_T^p A. In the analogous way, we define NP-completeness under ≤_m^p reduction. A problem is NP-complete if it is NP-complete under ≤_m^p reduction. The theory of NP-completeness starts with Cook's theorem from 1971 stating that SAT is NP-complete. Since then hundreds of problems have been shown to be NP-complete. Such problems arise from a large variety of domains, and many of them are quite important for real-world applications. The complexity classes E and EXP are important because they capture, typically, a natural category of "non-smart" algorithms that simply operate an exhaustive search to find the solution to a problem. EXP is also important because it is the smallest deterministic time class known to contain NP. This chapter undertakes a quantitative analysis of the classes P, NP, E, and EXP, and of the relations between them. We briefly overview the sections in this chapter.
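The Turing reducibility just defined can be made concrete by a classical example: recovering a satisfying assignment for a CNF formula using only a decision oracle for satisfiability, fixing variables one at a time. This is a hypothetical sketch, not code from the text; `sat_oracle` is a brute-force stand-in that, in the reduction, is treated as a unit-cost black box.

```python
from itertools import product

def substitute(cnf, var, value):
    """Plug `value` into variable `var`: drop satisfied clauses, shrink the rest."""
    out = []
    for clause in cnf:
        if (var in clause and value) or (-var in clause and not value):
            continue  # clause already satisfied; drop it
        out.append([lit for lit in clause if abs(lit) != var])
    return out

def sat_oracle(cnf, nvars):
    """Stand-in decision oracle: is the formula satisfiable? (brute force)"""
    return any(
        all(any(a[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in cnf)
        for a in product([False, True], repeat=nvars)
    )

def find_assignment(cnf, nvars):
    """The Turing reduction: polynomially many oracle queries recover a solution."""
    if not sat_oracle(cnf, nvars):
        return None
    assignment = {}
    for v in range(1, nvars + 1):
        # Query the oracle with x_v = T; keep whichever branch stays satisfiable.
        value = sat_oracle(substitute(cnf, v, True), nvars)
        assignment[v] = value
        cnf = substitute(cnf, v, value)
    return assignment
```

Each loop step makes one oracle query, so n + 1 queries suffice in total; note that this is a Turing reduction and not a many-one one, since the answers to several queries are combined.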
Section 3.2 attempts to clarify what is currently known about the hardness of NP-complete problems. There is a common perception that the only way to tackle an NP-complete problem is to consider the exponentially large list of all possible candidate solutions and to patiently try all of them till we hit a good one. This leads to the standard exponential-time algorithm for NP problems. In fact, better search strategies have been found for many important NP-complete problems, with significantly faster running time (although still exponential). We present such a method in Section 3.2 for the 3-SAT problem. More precisely, we describe a probabilistic algorithm for 3-SAT that runs in time p(n)(4/3)^n, for some polynomial p, which is much better than the standard Ω(2^n) algorithm. It is widely conjectured that P is properly included in NP. If this conjecture stands, what can we say quantitatively about NP − P? It is known that P is properly included in E. How "large" is E − P? These are the type of questions that we explore in Section 3.3 and in Section 3.4. In Section 3.3, we use the apparatus of topology, more precisely, the effective Baire classification schemas and the superset topology (see Section 1.2). The main results are that if P ≠ NP, then NP − P


is effectively of the second Baire category (thus, topology-wise, it is not small), while the class of NP-complete problems under Cook reducibility is effectively of the first Baire category (thus, it is small). It follows that if P ≠ NP, then there are (from a topological point of view) many problems that are neither in P nor NP-complete. In Section 3.4, we turn to measure-theoretical tools. In the context of analyzing classes such as P, NP, and E, it is meaningful to consider the variant of effective measure theory given by polynomial-time computable martingales (see Section 1.2). It can be seen that, in this framework, E does not have measure zero, while P does have measure zero. In fact, we present results that show that many classes that generalize P in quite various ways (such as, to give just one example, the class of P-selective sets) also have measure zero. These results show that, quantitatively speaking and using the yardsticks of effective measure theory, E − P is quite large. What about the measure-theoretical quantitative analysis of NP? Alas, NP remains evasive from this angle too: It is not known whether the effective measure of NP is zero (which would make NP similar to P) or not (which would make NP similar to E). Researchers working in this area have conjectured that the effective measure of NP is not zero (again, when the effectivity is based on polynomial-time martingales). This conjecture implies P ≠ NP (because the measure of P is zero). More interestingly, as a result in Section 3.4 shows, the conjecture also implies that there exist problems that are NP-complete with respect to Cook reducibility, but not NP-complete with respect to Karp reducibility (a separation which is not known to follow from the hypothesis P ≠ NP). Section 3.5 is dedicated to the quantitative analysis of the relation between relativized P and relativized NP.
For any set A, P^A (called P relativized with oracle A) denotes the class of sets computable by deterministic polynomial-time oracle machines working with oracle set A, and NP^A (called NP relativized with oracle A) denotes the class of sets computable by nondeterministic polynomial-time oracle machines working with oracle set A. It is known that there are oracle sets A and B such that P^A ≠ NP^A and P^B = NP^B. The main result in Section 3.5 shows that if the oracle set A is taken at random, then NP^A differs from P^A in a very strong way: There is a set T(A) in NP^A that cannot be even poorly approximated by any P^A algorithm, in the sense that any P^A algorithm is correct on only a fraction of (1/2 + ε) of the input strings of length at most n, for all sufficiently large n. The definitions of the classes DTIME[f(n)] and NTIME[f(n)], on which the canonical complexity classes P, NP, E, and EXP are built, are given in the framework of worst-case complexity analysis of problems. Of course, it is quite interesting to analyze how difficult a problem is on average, with respect to some relevant distributions on input strings at a given length. Section 3.6 introduces elements of average-case complexity theory and presents the analogues of P, NP, and NP-completeness in the framework of this theory.

3.2 Upper bound for 3-Satisfiability

IN BRIEF: A probabilistic algorithm for 3-SAT is presented that runs in time p(n)(4/3)^n, for some polynomial p, and which is correct with probability at least 1 − e^(−n).

The satisfiability (SAT) problem is in many ways the canonical representative of the class NP. To list just a few of the reasons, we recall that SAT has a simple formulation, it has been the first problem discovered to be NP-complete, and any nondeterministic polynomial-time computation can be transparently encoded as an instance of SAT (cf. the proof of Cook's Theorem). In particular, many practical NP-complete problems can be reduced quite directly to the SAT problem. It is thus important to study the complexity of this problem. No interesting general lower bound is known for the time needed to solve SAT (there are some good lower bounds for specific approaches to solving SAT). What about upper bounds? A brute-force algorithm runs in O(n·2^n) time on an instance φ with n variables by trying all possible truth assignments. A polynomial-time algorithm for SAT, of course, does not seem possible (it would imply P = NP). Remaining in the realm of exponential-time algorithms, it is of interest to find algorithms for k-SAT^2 that work in time O(c^n) for exponents c > 1 that are small. In this section we will present a probabilistic algorithm for 3-SAT that works in time p(n)(4/3)^n, for some polynomial p, and which gives the correct answer with probability at least 1 − e^(−n). We recall that ∧ denotes the boolean operation and, ∨ denotes the boolean operation or, T denotes the boolean value true, and F denotes the boolean value false. A boolean formula φ with variables x_1, ..., x_n is in the 3-Conjunctive Normal Form (briefly, 3-CNF) if it has the form φ = C_1 ∧ C_2 ∧ ... ∧ C_m, and each C_i (called a clause) has the form C_i = y_{i1} ∨ y_{i2} ∨ y_{i3}, where each y_{ik} is either a variable x_k ∈ {x_1, ..., x_n} or the negation x̄_k of a variable. For example, the following formula φ is in 3-CNF:

φ = (x_1 ∨ x_2 ∨ x_4) ∧ (x̄_1 ∨ x̄_3 ∨ x_4) ∧ (x_2 ∨ x_3 ∨ x_4).

The formula φ is satisfiable because the assignment x_1 = T, x_2 = T, x_3 = F, and x_4 = F makes the formula true (there are some other satisfying assignments as well). The 3-SAT problem is defined as follows.

Problem 3.2.1 3-SAT Problem:
Input: A formula φ in 3-CNF.
Question: Is φ satisfiable?

^2 The k-SAT problem is a variant of SAT in which each clause is restricted to have at most k literals. For k ≥ 3, k-SAT is still NP-complete.
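The baseline exhaustive-search algorithm mentioned above, which tries all 2^n truth assignments, can be sketched as follows; the signed-integer clause encoding is an assumption of the sketch.

```python
from itertools import product

def brute_force_sat(cnf, nvars):
    """Decide satisfiability by exhaustive search over all 2^n assignments.

    Each of the 2^n candidates is checked in time polynomial in the formula
    size, giving the O(poly(n) * 2^n) running time that the improved
    algorithm of this section beats.
    """
    for bits in product([False, True], repeat=nvars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf):
            return True
    return False
```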


We will first sketch a very simple probabilistic algorithm that, for some polynomial p, runs in time p(n)(3/2)^n (which is already much better than O(n·2^n)) and is correct with high probability. Let φ be a formula in 3-CNF. We will assume that φ is satisfiable, because this turns out to be the interesting case (in case φ is not satisfiable, the algorithm will always be correct). Let a* be a truth assignment that satisfies φ. We first pick a truth assignment a uniformly at random. If a satisfies φ, we are done. If a does not satisfy φ, there must be a clause C that is not satisfied by a, but that is satisfied by a*. Thus, if we pick uniformly at random a variable in C and we flip its truth value in a, we obtain a new a, which, with probability at least 1/3, is closer by one bit to a*. Suppose that the initial a and a* differ in j bits, for some 0 ≤ j ≤ n. It follows that, if we repeat the above procedure n times, we obtain with probability at least (1/3)^j an assignment that satisfies φ (this assignment may be a*, but we may also hit another satisfying assignment before getting to a*). There are 2^n possible initial assignments, and C(n, j) of these assignments differ from a* in j bits (C(n, j) denotes the binomial coefficient "n choose j"). Therefore the probability of success (i.e., of finding a satisfying assignment) is at least

Σ_{j=0}^{n} (1/3)^j · C(n, j)/2^n = (1/2^n) · (1 + 1/3)^n = (2/3)^n.

Consequently, the probability of failure is at most 1 − (2/3)^n. Therefore, if we repeat the whole procedure n·(3/2)^n times, the error probability is at most

(1 − (2/3)^n)^{n·(3/2)^n} ≤ e^{−n}.
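The middle step of the computation above is just the binomial theorem; as a sanity check, in exact rational arithmetic, the sum indeed collapses to (2/3)^n:

```python
from fractions import Fraction
from math import comb

# Success probability of one run of the simple algorithm, bounded below by
#   sum_{j=0}^{n} (1/3)^j * C(n, j) / 2^n = (1/2^n)(1 + 1/3)^n = (2/3)^n.
def success_lower_bound(n):
    return sum(Fraction(1, 3) ** j * comb(n, j) for j in range(n + 1)) / 2 ** n

for n in (1, 5, 20):
    assert success_lower_bound(n) == Fraction(2, 3) ** n
```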
Theorem 3.2.2 There is a probabilistic algorithm for 3-SAT that, for some polynomial p, runs in time p(n)(4/3)^n and gives the correct answer with probability at least 1 − e^{−n}.

Proof. The input for the algorithm is a boolean formula φ in 3-CNF with n variables. The algorithm consists of repeating the following routine, which we call the basic iteration; the number of repetitions is specified at the end of the analysis.

BASIC ITERATION
Step 1. Pick uniformly at random an initial assignment a_0 ∈ {0, 1}^n (as usual, we identify 0 with F and 1 with T).
Step 2. Repeat for i = 0, ..., 3n − 1:
If a_i satisfies φ, stop the whole run of the algorithm and accept.
Let C be a clause not satisfied by a_i. Pick one of its three literals uniformly at random and flip its value in a_i. Call the new assignment a_{i+1}.
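The basic iteration above can be sketched as runnable code. This is a hypothetical sketch: the signed-integer clause encoding and the concrete driver are assumptions, and the number of repetitions passed to the driver should be taken on the order of poly(n)·(4/3)^n, as the analysis below shows.

```python
import random

def _falsified_clause(cnf, a):
    """Return some clause not satisfied by assignment a (dict var -> bool), or None."""
    for clause in cnf:
        if not any(a[abs(lit)] == (lit > 0) for lit in clause):
            return clause
    return None

def basic_iteration(cnf, nvars, rng=random):
    # Step 1: pick a uniformly random initial assignment a_0.
    a = {v: rng.random() < 0.5 for v in range(1, nvars + 1)}
    # Step 2: up to 3n repair steps.
    for _ in range(3 * nvars):
        clause = _falsified_clause(cnf, a)
        if clause is None:
            return a  # a satisfies the formula: accept
        lit = rng.choice(clause)       # pick one of its (up to three) literals
        a[abs(lit)] = not a[abs(lit)]  # and flip its value
    return None  # this basic iteration failed

def random_walk_3sat(cnf, nvars, iterations, rng=random):
    """Repeat the basic iteration; return a satisfying assignment or None."""
    for _ in range(iterations):
        a = basic_iteration(cnf, nvars, rng)
        if a is not None:
            return a
    return None  # "probably unsatisfiable"
```

On unsatisfiable inputs the driver always answers None, matching the remark that the algorithm can only err on satisfiable formulas.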


As before, we consider that φ is satisfiable, because this is the interesting case. We focus on one fixed basic iteration. Let us call "Success" the event that the fixed basic iteration finds a satisfying assignment for φ. Let a* be a fixed satisfying assignment for φ. Then

Prob("Success") ≥ Prob(a_0 turns into a* in the basic iteration).    (3.1)

We want to find a lower bound for the right-hand side of the above inequality. The Hamming distance between two strings a_1 and a_2 of equal lengths, denoted dist(a_1, a_2), is, by definition, the number of positions in which the two strings differ. At an arbitrary iteration i of Step 2, assuming a_i is not a satisfying assignment, dist(a_{i+1}, a*) = dist(a_i, a*) ± 1. We call iteration i "good" if dist(a_{i+1}, a*) = dist(a_i, a*) − 1, and we call it "bad" if dist(a_{i+1}, a*) = dist(a_i, a*) + 1. As before, we have that the probability that iteration i is "good" is at least 1/3, because there must be at least one variable in C whose truth value in a_i is different from that in a*. The probability that iteration i is "bad" is, of course, at most 2/3. If we assume that the probability of a "good" iteration is exactly 1/3 and that the probability of a "bad" iteration is exactly 2/3, then we decrease the probability of turning a_0 into a*. Therefore, since we are interested in finding a lower bound for the right-hand side of the inequality (3.1), we make this assumption. Let us suppose that dist(a_0, a*) = j, for some 0 ≤ j ≤ n. We make one more simplification: In order to find a lower bound, it is enough if we consider only those transitions from a_0 to a* that have i "bad" iterations and j + i "good" iterations, for 0 ≤ i ≤ j (we must have j more "good" iterations than "bad" iterations to reach a*). Note that the total number of iterations is j + 2i ≤ 3j ≤ 3n. This is why we do 3n iterations in Step 2. The probability of a transition from a_0 to a* with j + i "good" iterations and i "bad" iterations, in which the positions of the "good" and "bad" iterations are fixed, is, under our assumption, (1/3)^{j+i} · (2/3)^i. We need to determine the number of transitions of this type. This can be done using a particular form of the Ballot Theorem (see [Fel68]).

Claim 3.2.3 The number of transitions from a_0 to a* with j + i "good" iterations and i "bad" iterations is (j/(j + 2i)) · C(j + 2i, i).

Proof. We consider the grid of pairs (x, y), with x and y integer numbers. The points of the grid form an oriented graph in which each node (x, y) has two outgoing edges, to the nodes (x + 1, y + 1) and (x + 1, y − 1). An edge of the first type is called an "up" edge, and an edge of the second type is called a "down" edge. Edges in


this graph correspond to iterations in Step 2. In a pair (x, y), we view x as being the iteration number, and y as being the Hamming distance between the current assignment (at iteration x) and a*. The assignment a_0 corresponds to the node (0, j), and a* corresponds to the node (j + 2i, 0). Observe that just before arriving at (j + 2i, 0), we must be at (j + 2i − 1, 1). Thus the number of transitions from a_0 to a* with j + i "good" iterations and i "bad" iterations is equal to the number of paths from node (0, j) to (j + 2i − 1, 1) that do not touch the x-axis (this is true because if at some iteration earlier than j + 2i we touch the x-axis, it means that we have reached a* before the j + 2i stipulated iterations). We will determine A, the total number of paths from (0, j) to (j + 2i − 1, 1), and B, the number of paths from (0, j) to (j + 2i − 1, 1) that touch and maybe even cross the x-axis. Next we will calculate A − B, which will give us the desired result. For A, observe that each path from (0, j) to (j + 2i − 1, 1) has j + 2i − 1 edges, and that the difference between "down" edges and "up" edges must be j − 1. Therefore, there are j + i − 1 "down" edges and i "up" edges. Since the "up" edges can occur anywhere in the path, it follows that

A = C(j + 2i − 1, i).

For B, observe that there is a 1-to-1 correspondence between the set of paths from (0, j) to (j + 2i − 1, 1) that touch or cross the x-axis and the (total) number of paths from (0, −j) to the same (j + 2i − 1, 1). Indeed, notice that (0, −j) is the symmetric of (0, j) with respect to the x-axis. Consider a path from (0, j) to (j + 2i − 1, 1) that touches the x-axis, and let us consider the initial segment of this path until (and including) the first touch of the x-axis. If we reflect this segment on the x-axis and leave the rest unmodified, we obtain, in a 1-to-1 manner, a path from (0, −j) to (j + 2i − 1, 1). Therefore, to determine B, we need to count how many paths there are from (0, −j) to (j + 2i − 1, 1). Such a path has j + 2i − 1 edges, and the difference between "up" edges and "down" edges must be j + 1. Consequently, there are j + i "up" edges and i − 1 "down" edges. Since the "down" edges can occur anywhere in the path, it follows that

B = C(j + 2i − 1, i − 1).

Thus,

A − B = C(j + 2i − 1, i) − C(j + 2i − 1, i − 1) = (j/(j + 2i)) · C(j + 2i, i).

This concludes the proof of Claim 3.2.3. ∎

Let p_j be the probability that there is a transition from a_0 to a* with j + i "good" iterations and i "bad" iterations, for some 0 ≤ i ≤ j, under our assumption that a "good" iteration happens with probability 1/3, and a "bad" iteration


happens with probability 2/3. According to our discussion above,

p_j = Σ_{i=0}^{j} (j/(j + 2i)) · C(j + 2i, i) · (1/3)^{j+i} · (2/3)^i ≥ (1/3) · C(3j, j) · (1/3)^{2j} · (2/3)^j.

The last inequality has been obtained by retaining only the last term (i = j) in the sum. It is known (see for example [Bol85]) that for 0 < λ < 1 and for any m,

C(m, λm) ≥ (1/(2J(λ, m))) · 2^{h(λ)·m} = (1/(2J(λ, m))) · (1/λ)^{λm} · (1/(1 − λ))^{(1−λ)m},

where h(λ) = −λ·log₂ λ − (1 − λ)·log₂(1 − λ) is the binary entropy function, and J(λ, m) = √(2λ(1 − λ)πm). Therefore,

C(3j, j) ≥ (1/(2J(1/3, 3j))) · 2^{h(1/3)·3j} = (1/(2J(1/3, 3j))) · (27/4)^j ≥ (1/(2J(1/3, 3n))) · (27/4)^j,

where the last inequality holds because J(1/3, 3j) ≤ J(1/3, 3n) for j ≤ n. Thus,

p_j ≥ (1/3) · (1/(2J(1/3, 3n))) · (27/4)^j · (1/3)^{2j} · (2/3)^j = (1/(6J(1/3, 3n))) · (1/2)^j.

The probability that the initial assignment a_0 has dist(a_0, a*) = j is C(n, j)/2^n. Therefore, the probability of the event "Success" (i.e., of the fact that in the fixed basic iteration we find a satisfying assignment) is at least

Σ_{j=0}^{n} (C(n, j)/2^n) · (1/(6J(1/3, 3n))) · (1/2)^j = (1/(6J(1/3, 3n))) · (1/2^n) · (1 + 1/2)^n = (1/(6J(1/3, 3n))) · (3/4)^n.

The probability of failure is at most 1 − (1/(6J(1/3, 3n))) · (3/4)^n. Consequently, if we make 6nJ(1/3, 3n) · (4/3)^n basic iterations, the probability that we fail to find a satisfying


assignment is at most

(1 − (1/(6J(1/3, 3n))) · (3/4)^n)^{6nJ(1/3, 3n)·(4/3)^n} ≤ e^{−n},

for sufficiently large n. ∎
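The arithmetic behind the last steps of the proof can be sanity-checked numerically (a sketch; exact rationals where possible):

```python
from fractions import Fraction
from math import comb, log2, isclose

# The entropy bound turns C(3j, j) into (27/4)^j, up to the 1/(2J(1/3, 3j)) factor:
h = -(1/3) * log2(1/3) - (2/3) * log2(2/3)
assert isclose(2 ** (3 * h), 27 / 4)

# The per-distance bound p_j >= (1/(6J(1/3, 3n))) * (1/2)^j rests on this product:
assert Fraction(27, 4) * Fraction(1, 3) ** 2 * Fraction(2, 3) == Fraction(1, 2)

# Averaging over the random initial assignment gives the (3/4)^n success bound:
n = 30
total = sum(Fraction(comb(n, j), 2 ** n) * Fraction(1, 2) ** j for j in range(n + 1))
assert total == Fraction(3, 4) ** n
```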

3.3 NP vs. P -- the topological view

IN BRIEF: Assume P ≠ NP. Then NP − P is topologically not small. On the other hand, the class of NP-complete problems is topologically small.

The question whether P is equal to NP or not is the most outstanding open question in computational complexity. Solving this problem is beyond reach at this moment. However, there is strong evidence that P is properly included in NP (for example, just think about the hundreds of natural NP-complete problems for which no polynomial-time algorithm is known). It is thus reasonable to assume that P ≠ NP and to develop a theory based on this hypothesis. In this section we undertake a topological analysis of some important subclasses of NP (assuming that P ≠ NP). As pointed out in Section 1.2.1, in principle, two topologies are relevant for such an analysis: the Cantor topology and the superset topology. The Cantor topology is more natural but, unfortunately, it is not adequate in this case. If we attempt to use Definition 2.1.2, we see easily that NP itself is "small" (more exactly, NP is effectively of the first category). Consequently, the Cantor topology approach classifies as small all the classes inside NP and, thus, is not capable of differentiating the relative sizes of such classes. Therefore, we consider the superset topology. This approach will allow us to compare the sizes of some interesting classes inside NP such as, to name just two, NP − P and the class of NP-complete problems. We show in this section that NP − P, if not empty, is not small with respect to the effective superset topology. More precisely, if not empty, NP − P is of the second Baire category. The superset topology has been introduced in Section 1.2.1. We recall that we consider the binary alphabet Σ = {0, 1}. Σ* denotes the set of finite binary strings, and Σ^∞ denotes the set of infinite binary sequences. For i ≥ 1, s_i denotes the i-th string in Σ* in lexicographical order. The length of v ∈ Σ* is the number of symbols in v and is denoted by |v|.
For v ∈ Σ* or v ∈ Σ^∞, for any natural number i ≥ 1 (and i ≤ |v|, in case v ∈ Σ*), v(i) denotes the i-th symbol in v. Thus, v = v(1)v(2)...v(|v|), for v ∈ Σ*. The base B^s = (U_v^s)_{v∈Σ*} of open sets in the superset topology is defined by

U_v^s = {w ∈ Σ^∞ | for all i, (1 ≤ i ≤ |v| and v(i) = 1) ⇒ w(i) = 1}.
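For finite prefixes, membership in a base open set U_v^s amounts to a simple check. A sketch, where v and w are given as 0/1 strings with w at least as long as v, so that w determines the relevant positions of a sequence:

```python
def in_superset_base(v, w):
    """Does a sequence with prefix w belong to U_v^s?  True iff w has a 1 in
    every position where v has a 1 (positions are 1-based in the text)."""
    assert len(w) >= len(v)
    return all(w[i] == "1" for i in range(len(v)) if v[i] == "1")

print(in_superset_base("0101", "1111"))  # every 1 of v is preserved → True
print(in_superset_base("0101", "0011"))  # v(2) = 1 but w(2) = 0 → False
```

In particular, U_v^s with v = 0^n is all of Σ^∞: zeros in v impose no constraint, which is what makes this topology "superset-like".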


Definition 3.3.1 (Baire classification with respect to the superset topology)
(1) A class A ⊆ Σ^∞ is effectively nowhere dense with respect to the superset topology if there is a computable function f: Σ* → Σ* such that, for every v in Σ*, (i) v is a prefix of f(v), and (ii) A ∩ U_{f(v)}^s = ∅.

(2) A class A ⊆ Σ^∞ is effectively of the first Baire category with respect to the superset topology if there is a countable decomposition A = ∪_{i∈N} A_i and a computable function f: N × Σ* → Σ* such that, for every i ∈ N and for every v ∈ Σ*, (i) v is a prefix of f(i, v), and (ii) A_i ∩ U_{f(i,v)}^s = ∅.

(3) A class A ⊆ Σ^∞ is effectively of the second Baire category with respect to the superset topology if it is not effectively of the first Baire category with respect to the superset topology.

Sometimes, when the context is clear and for the sake of simplicity, we will just say first or second Baire category, or even first or second category. For the motivations behind these definitions, the reader is invited to go back to Section 1.2.1. We present first a technical lemma that will be at the core of all proofs showing that various classes are of the second Baire category. The lemma utilizes some classical concepts from computable function theory, which we introduce next. We fix an enumeration {M_i}_{i∈N} of all standard Turing machines having as input one binary string. Let (W_i)_{i∈N} be the enumeration of the class of computably enumerable sets of binary strings defined by stipulating that, for each i, W_i is the domain of M_i, i.e., W_i is equal to the set of input strings on which M_i halts. Let W_{i,s} be the set of strings enumerated in W_i within the first s steps of a dove-tailing simulation of M_i (see [Soa87]). We also fix an enumeration {P_i}_{i∈N} of all deterministic polynomial-time machines, and an enumeration {NP_i}_{i∈N} of all nondeterministic polynomial-time machines. For each i, L(P_i) (respectively, L(NP_i)) denotes the language accepted by P_i (respectively, NP_i). If x ∈ Σ*, P_i(x) (NP_i(x)) denotes the output of the machine P_i (NP_i) on input x. We say that a set is co-finite if its complement is finite. Let TOT = {i | W_i = Σ*} and Co-FIN = {i | the complement of W_i is finite}. For a class A of computably enumerable sets, we say that TOT is m-reducible to (A, Co-FIN), notation TOT ≤_m (A, Co-FIN), if there is a computable function s such that i ∈ TOT implies W_{s(i)} ∈ A, and i ∉ TOT implies s(i) ∈ Co-FIN (i.e., W_{s(i)} is co-finite).

Lemma 3.3.2 Let A be a class of computably enumerable sets. If TOT ≤_m (A, Co-FIN), then A is effectively of the second Baire category with respect to the superset topology.

Proof. Suppose A is of the first category. This means that there is a countable decomposition A = ∪_{i∈N} A_i and a computable function f: N × Σ* → Σ* such that, for every integer i and every string w ∈ Σ*, w is a proper prefix of f(i, w) and U_{f(i,w)}^s ∩ A_i = ∅. Let

D = {j ∈ N | (∃i)(∀n)(∀s)(∃k)[n < k ≤ |f(i, 0^n)| and s_k ∉ W_{j,s}]}.

Clearly, Co-FIN is included in the complement of D, and D is in the Σ₂ level of the arithmetical hierarchy. Observe that W_j ∈ A implies j ∈ D. Indeed, if we assume that W_j ∈ A, then there exists some i such that W_j ∈ A_i. Also note that if, for some n, it holds true that, for all k with n < k ≤ |f(i, 0^n)|, s_k ∈ W_j, then we can conclude that W_j ∈ U_{f(i,0^n)}^s, which contradicts the fact that A_i ∩ U_{f(i,0^n)}^s = ∅, for all integers i and n. It follows that, for a computable function s witnessing TOT ≤_m (A, Co-FIN),
• i ∈ TOT implies W_{s(i)} ∈ A, which implies s(i) ∈ D, and
• i ∉ TOT implies s(i) ∈ Co-FIN ⊆ D̄ (D̄ denotes the complement of D).
Consequently, TOT ≤_m D, which is a contradiction because TOT is Π₂-complete and D is in Σ₂. ∎

We show now that if NP − P ≠ ∅, then NP − P is of the second category. By Lemma 3.3.2, all we have to do is to show that TOT ≤_m (NP − P, Co-FIN).

Lemma 3.3.3 If NP − P ≠ ∅, then TOT ≤_m (NP − P, Co-FIN).

Proof. Let {M_i}_{i∈N} be the fixed enumeration of all deterministic Turing machines such that W_i is the domain of M_i. For every Turing machine M_i we define a nondeterministic polynomial-time Turing machine NP_{s(i)} such that:
• if i ∈ TOT, then L(NP_{s(i)}) ∈ NP − P, and
• if i ∉ TOT, then NP_{s(i)} accepts all the inputs in Σ* except for a finite set.


We first overview informally the construction. The computation of NP_{s(i)} is performed in stages, starting with Stage 0. The machine NP_{s(i)} has two kinds of objectives, depending on whether the current stage is even or odd. Namely, if the current stage is 2e, for some integer e, NP_{s(i)} tries to find, in polynomial time, whether M_i accepts s_e. If and when this happens, we pass to the next stage, i.e., to Stage 2e + 1. In case NP_{s(i)} does not succeed in determining whether M_i accepts s_e, the current input is accepted. In this way, if i ∉ TOT, then clearly NP_{s(i)} remains stuck in an even stage and accepts a co-finite set. On the other hand, if the current stage is odd, say equal to 2e + 1 for some integer e, then NP_{s(i)} looks for a string y such that NP_{s(i)}(y) ≠ P_e(y). If no such y is found, the current input x is accepted if and only if x ∈ SAT. Consequently, if at Stage 2e + 1 only failures (to find a y as above) occur, then, starting from some string, NP_{s(i)} will be equal to SAT (formally, their characteristic functions will be equal; abusing notation, we often identify a set with its characteristic function). Hence, NP_{s(i)} will eventually find a string y such that NP_{s(i)}(y) ≠ P_e(y), because otherwise SAT would be, except for a finite number of inputs, equal to P_e and thus it would be in P, which contradicts our hypothesis that NP ≠ P. When this happens, Stage is incremented to 2e + 2. We present next the complete construction.

Construction of NP_{s(i)}
Initially, Stage = 0. On input x ∈ Σ* of length n, the computation of NP_{s(i)} proceeds as follows:
(a) For n steps, NP_{s(i)} simulates deterministically the previous computations NP_{s(i)}(s_1), NP_{s(i)}(s_2), ..., and determines how many stages have been completed so far. The variable Stage contains the value of the smallest uncompleted stage that NP_{s(i)} is able to find within the allowed n steps.
(b) Case 1: Stage = 2e. For n steps, NP_{s(i)} simulates M_i on input s_e. If M_i does not accept s_e in the allotted time, then x is accepted. Otherwise, Stage = 2e + 1, and the input x is accepted.
Case 2: Stage = 2e + 1. For n steps, NP_{s(i)} looks for a string y < x such that NP_{s(i)}(y) ≠ P_e(y). During this search, NP_{s(i)} is simulated deterministically on different inputs y. If such a y is found, then Stage = 2e + 2 and x is accepted. Otherwise, x is accepted if and only if x ∈ SAT.
End of construction of NP_{s(i)}

We need to show that (a) NP_{s(i)} is a polynomial-time nondeterministic machine, (b) if i ∈ TOT, then L(NP_{s(i)}) is not in P, and (c) if i ∉ TOT, then NP_{s(i)} accepts a co-finite set. These assertions are proven in the following Claims.

Claim 3.3.4 NPs(i) is a polynomial-time nondeterministic machine.

Proof. The only nondeterministic step in the computation of NPs(i) on an input x occurs in (b), Case 2. In this case, NPs(i) has to determine if x ∈ SAT, which can be done by a nondeterministic polynomial-time computation. All the other computations are performed in deterministic polynomial time (in fact, linear time). ∎

Chapter 3. P, NP, and E


Claim 3.3.5 If at a certain moment in the construction of NPs(i), Stage = 2e and Mi accepts se, then there exists a moment when Stage becomes 2e + 1.

Proof. For a sufficiently long input x, NPs(i) will have enough time to simulate the entire computation of Mi on input se. When this happens, Stage is increased to 2e + 1. ∎

Claim 3.3.6 If at a certain moment in the construction of NPs(i), Stage = 2e + 1, then L(NPs(i)) ≠ L(Pe) and there exists a later moment when Stage becomes 2e + 2.

Proof. Suppose that at Stage 2e + 1, NPs(i) fails to find any y such that NPs(i)(y) ≠ Pe(y). In this case, by the action of (b), Case 2, NPs(i)(y) will be equal to SAT(y) for almost every input y. Also, NPs(i) will be equal to Pe on almost every input. This implies that SAT can be solved in deterministic polynomial time, contrary to the hypothesis that NP ≠ P. ∎

Claim 3.3.7 If i ∉ TOT, then NPs(i) accepts a co-finite set.

Proof. Let se be the smallest string that is not accepted by Mi. By Claim 3.3.5 and Claim 3.3.6, Stage 2e is reached. Clearly, from this moment on, NPs(i) accepts every input string. ∎

Claim 3.3.8 If i ∈ TOT, then NPs(i) ∈ NP − P.

Proof. By Claim 3.3.4, NPs(i) ∈ NP. Taking into account that i ∈ TOT and Claim 3.3.5, it follows that the assertion in Claim 3.3.6 holds for every e, i.e., NPs(i) ≠ Pe for all e. ∎

This concludes the proof of Lemma 3.3.3. ∎

Using Lemma 3.3.2 and Lemma 3.3.3, we immediately obtain the following theorem.

Theorem 3.3.9 INFORMAL STATEMENT: If NP and P are different, then the class NP − P is not small from the point of view of the superset topology. FORMAL STATEMENT: If NP − P ≠ ∅, then NP − P is effectively of the second Baire category with respect to the superset topology.

The next theorem shows that, on the other hand, the set of NP-complete sets is of the first category unless P = NP. Let NPCOMP = {A ∈ NP | A is ≤_T^p-complete}.

Theorem 3.3.10 INFORMAL STATEMENT: If NP and P are different, then the class of NP-complete problems is small from the point of view of the superset topology. FORMAL STATEMENT: If NP − P ≠ ∅, then NPCOMP is effectively of the first Baire category with respect to the superset topology.

3.3. NP vs. P--the topological view


Proof. Let {Pi}_{i∈N} be this time³ an enumeration of deterministic polynomial-time oracle machines. Without loss of generality, we assume that the time complexity of Pk on an input of length n is bounded by n^k + k. Since for each A in NPCOMP there must be a machine Pk such that SAT = L(Pk^A), we can decompose NPCOMP = ⋃_{i∈N} NPCOMPi, where

NPCOMPi = {A ∈ NP | SAT = L(Pi^A)}.

We show that, for any i ∈ N, NPCOMPi is effectively (and uniformly in i) nowhere dense, from which the conclusion of the theorem follows. For all i ∈ N, and for all strings w and y in Σ*, let us denote by b(i, w, y) the string obtained by appending |y|^i + i 1s at the end of w, i.e.,

b(i, w, y) = w 1 1 ⋯ 1, with |y|^i + i ones appended.

Note that for every positive integer i and for every string w ∈ Σ*, there exists a string y such that Pi^{b(i,w,y)}(y) ≠ SAT(y), because otherwise SAT would be in P (here Pi^z, where z ∈ Σ*, means that the oracle machine Pi works with the finite oracle set whose characteristic function is encoded in the natural way by the string z). Consider the function f : N × Σ* → Σ* that, on input (i, w), acts as follows: First, for all v ∈ Σ* with |v| = |w|, it finds the smallest (lexicographically) string y(v) such that Pi^{b(i,v,y(v))}(y(v)) ≠ SAT(y(v)), and, next, it selects the longest such y(v) over all v with |v| = |w|. We denote the selected string by y0. The output of f(i, w) is b(i, w, y0). Clearly, the function f is computable, and, for each i ∈ N and each w ∈ Σ*, w is a prefix of f(i, w). It remains to show that, for all i and all w,

U^s_{f(i,w)} ∩ NPCOMPi = ∅.   (3.2)

Let A ∈ U^s_{f(i,w)} be an infinite set and let v be the initial segment of length |w| of the characteristic function of A. There is a smallest string y(v) such that Pi^{b(i,v,y(v))}(y(v)) ≠ SAT(y(v)). Furthermore, |y(v)| ≤ |y0|, because y0 is the longest such string among those resulting from strings of length |v|. As Pi on input y(v) does not ask the oracle about a string longer than |y(v)|^i + i, and since b(i, v, y(v)) is a prefix of the characteristic function of A (because A ∈ U^s_{f(i,w)}), it follows that Pi^A(y(v)) ≠ SAT(y(v)). Consequently, A is not in NPCOMPi, and Equation (3.2) is established. Hence NPCOMP is effectively of the first category. ∎

As a corollary, we obtain that, if P ≠ NP, there exist sets in NP that are neither in P nor NP-complete. Moreover, the class of such sets is quite large, being of the second Baire category.

³Note that, except for this proof, {Pi}_{i∈N} denotes a fixed enumeration of deterministic polynomial-time machines.
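The padding function used in the proof above is simple enough to transcribe directly into code (binary strings as Python `str`). The point of the padding is visible in the transcription: b(i, w, y) extends w by exactly |y|^i + i positions, so an oracle machine whose queries on input y never exceed that length is already fully determined by the encoded prefix.

```python
def b(i, w, y):
    # Append |y|**i + i ones at the end of w, as in the proof of Theorem 3.3.10.
    return w + "1" * (len(y) ** i + i)
```

For example, b(2, "01", "101") appends 3² + 2 = 11 ones to "01". Note that w is always a prefix of b(i, w, y), which is what makes f(i, w) an extension function witnessing that NPCOMPi is nowhere dense in the superset topology.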


Theorem 3.3.11 INFORMAL STATEMENT: If NP and P are different, then the class of problems that are neither in P nor NP-complete is not small from the point of view of the superset topology. FORMAL STATEMENT: Assume NP ≠ P. Then (NP − P) − NPCOMP is effectively of the second Baire category with respect to the superset topology.

Proof. Suppose that (NP − P) − NPCOMP is of the first Baire category. It is easy to see that the union of two sets of the first category is of the first category as well. Since NP − P = ((NP − P) − NPCOMP) ∪ NPCOMP, we obtain that NP − P is of the first Baire category, which contradicts Theorem 3.3.9. ∎

Using the same tools, we can investigate classes of sets achieving a stronger form of separation between P and NP. Recall that an NP problem is a decision problem, i.e., a problem for which the solution on any input instance is yes or no. Given the strong evidence for the existence of NP problems admitting no polynomial-time algorithm that gives the correct answer on all inputs, we may want to reconsider our demands and look for a polynomial-time algorithm for which only the yes answers are always correct, or perhaps only the no answers are always correct. The sets for which even these more modest objectives are not achievable are called P-immune sets and, respectively, P-simple sets.

Definition 3.3.12 (a) A set A is P-immune if A is infinite and has no infinite subset in P.

(b) A set A is P-simple if the complement of A is infinite and has no infinite subset in P (i.e., the complement of A is P-immune).

Even if we assume that P ≠ NP, it is not known whether there exists a set in NP that is P-immune, or P-simple. However, we will show that if there exists one P-immune set in NP, then there exist many such sets, and that the analogous assertion holds for P-simple sets.

Theorem 3.3.13 INFORMAL STATEMENT: The class of P-immune sets in NP is either empty or not small from the point of view of the superset topology. FORMAL STATEMENT: If there exists a P-immune set in NP, then the class of P-immune sets in NP is effectively of the second Baire category with respect to the superset topology.

Proof. Let A = {B ∈ NP | B is P-immune}. By hypothesis, A is not empty, so we fix a set H in A. We will construct, in an effective way, for each i a nondeterministic polynomial-time machine NPs(i) such that if i ∈ TOT then L(NPs(i)) ∈ A, and if i ∉ TOT, then the language accepted by NPs(i) is co-finite (i.e., NPs(i) accepts all strings except for a finite set). Then, by Lemma 3.3.2, the conclusion follows. Note that L(NPs(i)) ∈ A holds if, for any deterministic polynomial-time machine Pk, either L(Pk) is finite, or L(Pk) is not an infinite subset of L(NPs(i)). During the construction we use the variables Stage and NextCandidate, and a list, called List, which stores indices of deterministic polynomial-time machines that have been considered but not yet ruled out as accepting infinite subsets of


L(NPs(i)). List is viewed as a linear sequence of integers, so that we can speak about the first element of List, the second element of List, and so forth. Insertions in List are made in the last position; deletions can be made anywhere, but after a deletion List is compacted so as not to contain any gaps. We will take care not to let the size of List grow too much, so that insertions and deletions can be realized in linear time. The variable NextCandidate keeps the index j of the next deterministic polynomial-time machine Pj that we attempt to introduce in List. Stage is a variable that records the current stage in the construction of NPs(i), similarly to the construction in the proof of Lemma 3.3.3. If Stage has an even value, Stage = 2e, then NPs(i) tries to find out whether Mi accepts se. When this happens, Stage is incremented to 2e + 1. If Mi does not accept se, then Stage remains perpetually at the value 2e, causing NPs(i) to accept all further inputs. If Stage has an odd value, then NPs(i) tries to find, for each k in List, a string z such that z ∈ L(Pk) − L(NPs(i)), i.e., a string accepted by Pk but not by NPs(i). In this attempt, NPs(i) dedicates to each element of List an amount of time that decreases with its position in List, without exceeding n steps for the whole operation, where n is the length of the current input of NPs(i). Specifically, if k is in position j, n/2^j steps are spent in the effort of finding the desired string z. If no such z is found, then NPs(i) accepts or rejects the current input x depending on whether x ∈ H or, respectively, x ∉ H. However, as NPs(i) processes increasingly longer inputs, and assuming that L(Pk) is infinite, there will eventually be enough time to discover a z such that z ∈ L(Pk) − L(NPs(i)). The existence of such a z follows from the P-immunity of H and from the fact that repeated failures make L(NPs(i)) equal to H, modulo a finite set (i.e., their symmetric difference is a finite set).

Construction of NPs(i)

Initially, Stage = 0, NextCandidate = 0, List = ∅. On input x ∈ Σ* of length n, NPs(i) runs as follows:

(a) For n steps, NPs(i) simulates deterministically the previous computations (i.e., it computes NPs(i)(s1), NPs(i)(s2), ..., for as many inputs as the time bound allows) and determines the current values of Stage, NextCandidate, and the content of List. It will become clear from the construction that all these values are the same on all the nondeterministic branches of the computations that are simulated, so there is no ambiguity in determining the values of the three variables.

(b) Case 1: Stage = 2e. The machine Mi on input se is simulated for n steps. If Mi accepts se in the allotted time, x is accepted, Stage is increased to 2e + 1, and NPs(i) stops. Otherwise, x is also accepted, but Stage remains equal to 2e. In both cases, NPs(i) stops.

Case 2: Stage = 2e + 1. Let m be the number of elements in List. If m > n, then x is rejected and NPs(i) stops. Otherwise, NextCandidate is incremented by 1. For j ∈ {1, ..., m + 1}, let List[j] denote the value of the j-th element in List. Then, for every j ∈ {1, ..., m + 1}, for n/2^j steps, NPs(i) looks for a string z such that z ∈ L(P_List[j]) and z ∉ L(NPs(i)). In doing this, NPs(i) is simulated in a deterministic way.


Case 2.1: The search succeeds for some j. Then, for all such j's, List[j] is deleted from List, x is accepted, and Stage = 2e + 2.

Case 2.2: The search fails for all j. Then x is accepted if and only if x ∈ H.

End of construction of NPs(i)
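The time-slicing in Case 2 gives position j in List a budget of n/2^j steps, so the total search time stays below n no matter how long List gets. A minimal sketch (integer division standing in for step counts):

```python
def budgets(n, m):
    """Steps allotted to each of the m List positions on an input of length n:
    position j gets n // 2**j, so the total stays below n."""
    return [n // 2 ** j for j in range(1, m + 1)]
```

Early entries of List get generous budgets and are ruled out quickly; entries far down the list must wait for much longer inputs. This is why every index k with L(Pk) infinite is eventually processed, yet no single index can exhaust the linear step budget and block the others.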

The proof follows from the next series of claims.

Claim 3.3.14 NPs(i) is a polynomial-time nondeterministic machine.

Proof. The only nondeterministic step in the computation of NPs(i) occurs in (b), Case 2.2. This step is realized by simulating the nondeterministic polynomial-time machine that accepts H. All the other operations are performed in a deterministic way in polynomial time (in fact, linear time). Observe that the size of List is not allowed to grow too much, so that the insertions and the deletions can be done in time linear in the size of the current input. ∎

Claim 3.3.15 Suppose that at a certain moment in the construction of NPs(i), Stage = 2e and Mi accepts se. Then there exists a later moment when Stage is increased to 2e + 1.

Proof. This follows from the fact that, on a long enough input string x, NPs(i) has enough time to simulate the accepting computation of Mi on input se. ∎

Claim 3.3.16 Suppose that at a certain moment in the construction of NPs(i), Stage = 2e + 1. Then there exists a later moment when Stage is increased to 2e + 2.

Proof. Let us suppose the contrary. Clearly, while the construction is at Stage 2e + 1, there exists a moment when some index k such that L(Pk) is infinite is inserted in List. Since H is P-immune, L(Pk) − H ≠ ∅. Moreover, L(Pk) − H is an infinite set: if L(Pk) − H = B and B were a finite set, then L(Pk) − B ⊆ H, but L(Pk) − B is an infinite set in P, which would contradict the P-immunity of H. By our assumption, it follows that there exists a string x0 such that, for every string x ≥ x0, NPs(i) accepts x if and only if x ∈ H. It follows that L(NPs(i)) is itself P-immune. Therefore, on a long enough input, NPs(i) has enough time to discover a string z such that z ∈ L(Pk) − L(NPs(i)). This implies the incrementation of Stage to the value 2e + 2, which contradicts our assumption. ∎

Claim 3.3.17 If i ∉ TOT, then NPs(i) accepts a co-finite language.

Proof. Let e be minimal with the property that Mi does not accept se. By Claim 3.3.15 and Claim 3.3.16, it follows that Stage reaches the value 2e. From this moment on, NPs(i) accepts every input. ∎

Claim 3.3.18 If i ∈ TOT and L(Pk) is infinite, then L(Pk) − L(NPs(i)) ≠ ∅.

Proof. Since i ∈ TOT, it follows from Claim 3.3.15 and Claim 3.3.16 that there exists a moment when k is inserted in List. Then, by the argument used in the proof of Claim 3.3.16, we get that L(Pk) − L(NPs(i)) ≠ ∅. ∎


Claim 3.3.19 L(NPs(i)) is infinite.

Proof. If i ∉ TOT, the conclusion follows from Claim 3.3.17. If i ∈ TOT, then Claim 3.3.15 and Claim 3.3.16 together imply that the value of Stage passes through all positive integers. Any increase of Stage from an even value is done in (b), Case 1, and implies also the acceptance of the current input string x. ∎

From Claims 3.3.17, 3.3.18, and 3.3.19, it follows that TOT ≤_m (A, Co-FIN), where A is the class of P-immune sets in NP. By Lemma 3.3.2, we conclude that A is effectively of the second Baire category. ∎

The analogous result for sets in NP that are P-simple is derived similarly.

Theorem 3.3.20 INFORMAL STATEMENT: The class of P-simple sets in NP is either empty or not small from the point of view of the superset topology. FORMAL STATEMENT: If there exists a P-simple set in NP, then the class of P-simple sets in NP is effectively of the second Baire category with respect to the superset topology.

Proof. The proof relies again on Lemma 3.3.2. We fix H, a P-simple set in NP. For every integer i, we define a nondeterministic polynomial-time machine NPs(i) such that (a) i ∈ TOT implies that H is included in L(NPs(i)), and (b) i ∉ TOT implies that NPs(i) accepts a co-finite set. Note that (a) implies that L(NPs(i)) is P-simple, because if B is any infinite set in P, then B ∩ H ≠ ∅ and, therefore, B ∩ L(NPs(i)) ≠ ∅.

Construction of NPs(i)

On an input x of length n, NPs(i) does the following:

(a) For n steps, NPs(i) simulates deterministically the previous computations NPs(i)(s1), NPs(i)(s2), ..., for as many inputs as the time bound allows. It may happen that on some input sk the simulation of NPs(i)(sk) finds different values for the variable Stage on the originally nondeterministic branches of the computation of NPs(i)(sk). If this is the case, the least such value is selected for Stage.

(b) Case 1: Stage = 2e. Then Stage is increased to 2e + 1 and the nondeterministic polynomial-time machine N that accepts H is started on input x. If a computation of N(x) that accepts x is discovered, then NPs(i) accepts x and Stage is reset to the value 2e. Otherwise, NPs(i) rejects x. After these operations, NPs(i) stops.

Case 2: Stage = 2e + 1. For n steps, NPs(i) simulates Mi on input se. If Mi accepts se in the allotted time, then x is accepted, Stage is increased to 2e + 2, and NPs(i) stops. Otherwise, NPs(i) just accepts x (observe that x is accepted either way) and stops.

End of construction of NPs(i)

Suppose that i ∈ TOT. Then the value of Stage passes through all positive integers k. This is proved by induction on k. Suppose k = 2e. There exists a moment when the input x is such that x ∉ H (because the complement of H is infinite). At this


moment, Stage becomes 2e + 1 and is never decreased below this value later. In the case k = 2e + 1, since i ∈ TOT, we conclude that, for a sufficiently long input x, NPs(i) has enough time to discover that Mi accepts se and, consequently, to increase Stage to the value 2e + 2. Observe that any permanent increase of Stage in (b), Case 1, implies that NPs(i) rejects the current input. Hence the complement of L(NPs(i)) is infinite. On the other hand, H is included in L(NPs(i)), because NPs(i) rejects an input x only in case x ∉ H (see (b), Case 1). Clearly, the computation described for NPs(i) can be carried out in nondeterministic polynomial time. We conclude that L(NPs(i)) is a set in NP that is P-simple.

In case i ∉ TOT, it is easy to see that Stage stabilizes at the value 2e + 1, where se is the minimal string that Mi does not accept. From that moment on, NPs(i) accepts all further input strings. Hence, in this case, L(NPs(i)) is co-finite. ∎

To close this section, we note that classes which are effectively of the second Baire category with respect to the superset topology exhibit the same kind of logical independence which we have seen in Theorem 2.3.11. The proofs of the following results are almost identical to the ones we have seen in Proposition 2.3.9 and Proposition 2.3.10, and, therefore, we just state the results and sketch the proofs very briefly.

Proposition 3.3.21 Let 𝒫 be a property of computable predicates such that, if f is a computable predicate having property 𝒫, then f(x) = 0 for infinitely many x. Suppose that there is a sound deductive system T such that, for each predicate f having property 𝒫, there is a Turing machine M that calculates f for which the sentence "The function computed by M has property 𝒫" is a theorem of T. Then the set of predicates having property 𝒫 is effectively of the first Baire category with respect to the superset topology.

Proof. (Sketch) Similar to the proof of Proposition 2.3.10. ∎

Theorem 3.3.22 Let T be any sound deductive system and let C be a class of languages which is effectively of the second Baire category with respect to the superset topology. Suppose that every language A in C (a) is computable, and (b) has an infinite complement. Then there is a language A ∈ C such that, for each machine M computing A, the sentence "The function computed by M belongs to C" is not a theorem of T. Moreover, the set of such languages A is effectively of the second Baire category with respect to the superset topology.

Proof. For any language A in C, consider the assertion: For some machine M computing A, the sentence "The function computed by M belongs to C" is a theorem of T. For any language A in C, the above assertion is either true or false. By Proposition 3.3.21, the languages for which the assertion is true form a class which is effectively of the first Baire category with respect to the superset topology. Since C is of the second Baire category, it follows that the class of languages for which the assertion


is false is of the second Baire category (recall that the union of two classes of the first Baire category is of the first Baire category as well). ∎

Taking C = NP − P, we obtain the following result.

Corollary 3.3.23 Let T be any sound deductive system and assume P ≠ NP. Then there is a set A in NP − P such that, for any Turing machine M calculating the characteristic function of A, the sentence "The function computed by M belongs to NP − P" is not a theorem of T.

Similar results hold for all the other classes that have been shown to be effectively of the second Baire category.

3.4 P, NP, E--the measure-theoretical view

IN BRIEF: A type of resource-bounded measure (PF-measure) is considered, in which martingales are polynomial-time computable. It is shown that E = ⋃_{c>0} DTIME[2^{cn}] does not have PF-measure zero. On the other hand, classes of sets for which some very weak membership property is decidable in deterministic polynomial time have PF-measure zero. The PF-measure of NP is not known, but it is shown that if it is not zero, then NP many-one completeness and NP Turing completeness are different.

We now turn to the measure-theoretical approach (the reader may find it useful to review Section 1.2.2, and also Section 1.2.1 for the different notations regarding binary strings). This time, the computational requirements will be stronger: We will ask our constructions not only to be doable effectively (i.e., performed via computable functions), but to actually be doable in polynomial time. More precisely, in this section we will consider martingales that run in polynomial time, and, consequently, we will use Definition 1.2.11 with F = PF, where PF is the class of functions computable in polynomial time.⁴ We will see that this approach is useful for investigating the size of different classes of languages inside the class E = ⋃_{c>0} DTIME[2^{cn}]. First we need to show that E itself does not have PF-measure zero (otherwise, every class inside E would have PF-measure zero).

Theorem 3.4.1 E does not have PF-measure zero.

Proof. We show that, for every martingale d ∈ PF, there is a language A ∈ E such that d does not succeed on A. This implies that there is no martingale that can succeed on all languages in E, and, thus, E does not have PF-measure zero.

⁴Taking F to be the class of computable functions, as we did in the previous measure-theoretical (and, mutatis mutandis, topological) analysis, is not adequate, because it results in all classes of interest having measure zero.


Let d be a martingale in PF and let c be a positive constant such that d is computed by a machine whose running time on any input of length n is bounded by n^c + c. We define the function a : Σ* → Σ* by

a(x) = x0, if d(x0) ≤ d(x); a(x) = x1, otherwise.

We denote, for all i ≥ 1, by a^i(λ) = a(a(⋯ a(λ) ⋯)) the i-fold application of a, where λ is the empty word. Thus, a^1(λ) = a(λ), a^2(λ) = a(a(λ)), and so forth. Note that, for each i, the length of a^i(λ) is i. Let A be the set defined as follows: For all i ≥ 1, si ∈ A ⇔

the last bit of a^i(λ) is 1.
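The function a and the induced set A can be sketched directly in code: each step extends the current prefix with whichever bit does not let the martingale's capital grow. The martingale below is a toy stand-in (it bets everything on the all-ones sequence); any d satisfying the fairness condition d(x0) + d(x1) = 2d(x) would do.

```python
def diagonal_prefix(d, n):
    """First n bits of the characteristic sequence built against martingale d,
    i.e., a^n(lambda): always extend with the cheaper bit."""
    x = ""
    for _ in range(n):
        x += "0" if d(x + "0") <= d(x) else "1"
    return x

# Toy martingale: stake everything on every bit being 1.
# It satisfies d(x0) + d(x1) = 2 * d(x).
def d(x):
    return float(2 ** len(x)) if set(x) <= {"1"} else 0.0
```

Against this d, the construction immediately picks 0 and the capital drops to 0 forever; in general the capital along the constructed sequence never exceeds d(λ), which is exactly the failure of d to succeed on A.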

We first show that A ∈ E. Recall that |si| = ⌊log i⌋ for all i ≥ 1. To check whether si is in A, we need to calculate a^i(λ), which involves i evaluations of the martingale d on strings of length at most i − 1. These calculations take time bounded by i · ((i − 1)^c + c) ≤ i^{c+1} (for i sufficiently large) = 2^{(c+1) log i} ≤ 2^{(|si|+1)(c+1)} ≤ 2^{(c+2)|si|} (for |si| sufficiently large). Therefore A ∈ DTIME[2^{(c+2)n}] ⊆ E. Next we show that d does not succeed on A. Note that, by the definition of the function a, d(A(s1)) ≤ d(λ), and

d(A(s1)A(s2) ⋯ A(sn)) ≤ d(A(s1)A(s2) ⋯ A(s_{n−1})), for all n ≥ 2. Thus, for all n ≥ 1, d(A(s1)A(s2) ⋯ A(sn)) ≤ d(λ), which implies that d does not succeed on A. ∎

A basic result of complexity theory (also, perhaps, the best known one, even among people with an otherwise feeble acquaintance with complexity) is that P ⊊ E. We want to investigate the amplitude of this separation. We will demonstrate that the PF-measure of P is zero, which, together with the fact that E does not have measure zero, means that the class of problems solvable in polynomial time represents just a very small part of E. Moreover, we will consider classes C obtained through various relaxations of P, and we will show that these classes also have PF-measure zero. We first establish a lemma which is useful in showing that different classes have PF-measure zero. It provides conditions under which a countable union of PF-measure zero sets has PF-measure zero.

Lemma 3.4.2 Let C = ⋃_{i∈N} Ci, with Ci ⊆ Σ^∞ for all i ∈ N. Let d : N × Σ* → Σ* be a martingale system such that: (a) for all i ∈ N, di succeeds on Ci, and (b) there is a constant c > 0 and a Turing machine M such that, for all i ∈ N and for all inputs x of length n ≥ i, M computes di(x) in time bounded by n^c (log n)^{ci}. Then C has PF-measure zero.


Proof. For each i ∈ N, let C_{2^{2^i}} = Ci and d_{2^{2^i}} = di. For j ∈ N which is not of the form 2^{2^i}, we define Cj = ∅, and we let dj be the constant martingale that assigns 1 to all inputs. Note that C = ⋃_{i∈N} Ci and that, for all i ≥ 2, di can be calculated in time n^c (log n)^{c log log i}, for all inputs of length n ≥ log log i. Also, it is immediate to check that the family of functions (di)_{i∈N} forms a martingale system, and that each di succeeds on Ci. Observe that, for i ≥ 2, (log n)^{log log i} ≤ n, for all n ≥ i. Therefore, for all i ∈ N, di can be calculated in time bounded by n^{2c}, for all inputs of length n ≥ i.

Next we proceed as in the proof of Proposition 1.2.12. For each i ∈ N, let δi(x) be defined such that, for each x ≠ λ, di(x) = δi(x) · di(pred(x)), where pred(x) is the prefix of x of length |x| − 1. Also let

d̂i(x) = 2^{−i}, if |x| < i; d̂i(x) = δi(x) · d̂i(pred(x)), if |x| ≥ i.

One can check that, for all i, (a) d̂i is a martingale, (b) d̂i succeeds on Ci, and (c) d̂i(x) can be computed in time bounded by n^{2c+1}, for all inputs x of length n ≥ i. Let d(x) be defined by d(x) = Σ_{i=0}^{∞} d̂i(x). Then

d(x) = Σ_{i=0}^{|x|} d̂i(x) + Σ_{i=|x|+1}^{∞} 2^{−i} = Σ_{i=0}^{|x|} d̂i(x) + 2^{−|x|}.

Then d is a martingale and, from the above expression, it can be seen that d is in PF. Also, for each i ∈ N and each x with |x| ≥ i, d(x) ≥ d̂i(x). Since d̂i succeeds on Ci, it follows that d succeeds on Ci as well. Since this holds for all i ∈ N, we conclude that d succeeds on C, and, thus, C has PF-measure zero. ∎
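The reason d is polynomial-time computable despite being an infinite sum is the closed form above: on inputs of length n, all terms with i > n are the constant 2^{−i}, and their tail collapses to 2^{−n}. A small numeric sketch (with a hypothetical family d̂ passed in as a function of i and x):

```python
def combined(d_hat, x):
    """d(x) = sum_{i<=|x|} d_hat(i, x) + 2**(-|x|): only finitely many terms
    are needed, since the tail sum_{i>|x|} 2**(-i) equals 2**(-|x|)."""
    return sum(d_hat(i, x) for i in range(len(x) + 1)) + 2.0 ** (-len(x))

# Degenerate sanity check: with delta_i = 1, each d_hat_i is the constant
# martingale scaled by 2**(-i), and d is the constant martingale 2.
flat = lambda i, x: 2.0 ** (-i)
```

With this degenerate family, combined(flat, x) equals 2 for every x, as it must: a sum of constant martingales is constant, and the closed form reproduces d(λ) = Σ_{i≥0} 2^{−i} = 2.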

Theorem 3.4.3 P has PF-measure zero.

Proof. Let (Mi)_{i∈N} be an effective enumeration⁵ of polynomial-time Turing machines such that, for all i, the running time of Mi is bounded by n^i for all inputs of length n ≥ i. Let Ai be the language accepted by Mi. Then P = ⋃_{i∈N} {Ai}. For each i, we define a martingale di that succeeds on {Ai} as follows: di(λ) = 1, and, for every x ∈ Σ*,

di(x0) = 2di(x), if s_{|x|+1} ∉ Ai; di(x0) = 0, if s_{|x|+1} ∈ Ai,

and

di(x1) = 2di(x), if s_{|x|+1} ∈ Ai; di(x1) = 0, if s_{|x|+1} ∉ Ai.

⁵This means that there is one Turing machine M such that, for all i ∈ N and for all x ∈ Σ*, M(i, x) = Mi(x).

It follows that, for every n, di(Ai(s1)Ai(s2) ⋯ Ai(sn)) = 2^n, and, thus, di succeeds on {Ai}. The value of di on an input x of length n can be calculated in time bounded by |s_{|x|}|^i ≤ (log n)^i. By Lemma 3.4.2, it follows that P has PF-measure zero. ∎

The next result is a significant strengthening of Theorem 3.4.3. P is the class of languages A for which membership in the language (i.e., answering the question "Is x in A?") is solvable in polynomial time. Researchers have considered generalizations of P obtained by relaxing the membership question. We will study to what extent Theorem 3.4.3 extends to these generalizations of P. We list below the most important such generalizations of P, with references to the papers where they have been introduced (some of these concepts have generated extensive literature, but it is beyond our scope to review it here).

A set A ⊆ Σ* is P-selective [Sel82] if there exists f ∈ PF such that, for all pairs (x1, x2), f(x1, x2) ∈ {x1, x2} and (x1 ∈ A) ∨ (x2 ∈ A) ⇒ f(x1, x2) ∈ A.

A set A ⊆ Σ* is P-multiselective [HJRW97] if there exist f ∈ PF and a natural number constant q > 1 such that, for every q-tuple (x1, ..., xq), f(x1, ..., xq) ∈ {x1, ..., xq} and (x1 ∈ A) ∨ ⋯ ∨ (xq ∈ A) ⇒ f(x1, ..., xq) ∈ A.

A set A ⊆ Σ* is cheatable [Bei87] if there exist f ∈ PF and a natural number constant q ≥ 1 such that, for every q-tuple (x1, ..., xq), f(x1, ..., xq) outputs a set D ⊆ Σ^q of size q that contains A(x1) ⋯ A(xq).

A set A ⊆ Σ* is easily countable [HN93] if there exist f ∈ PF and a natural number constant q ≥ 1 such that, for every q-tuple (x1, ..., xq), f(x1, ..., xq) ∈ {0, ..., q} and f(x1, ..., xq) is not equal to the cardinality of A ∩ {x1, ..., xq}.

A set A ⊆ Σ* is easily approximable [KS91, BKS94] if there exist f ∈ PF and a natural number constant q > 1 such that, for every q-tuple (x1, ..., xq), f(x1, ..., xq) outputs a q-bit vector (y1, ...
, yq) for which at least half the numbers yi are equal to A(xi).

A set A is near testable [GHJY91] if there is f ∈ PF such that, for each ℓ ∈ N, f(s_ℓ) computes the truth value of the predicate (s_ℓ ∈ A) ⊕ (s_{ℓ+1} ∈ A), where ⊕ represents the "exclusive or."

A set A is nearly near testable [HH91] if there is f ∈ PF such that, for each ℓ ∈ N, f(s_ℓ) outputs the truth value of one of the following two predicates: (a) s_ℓ ∈ A, or (b) (s_ℓ ∈ A) ⊕ (s_{ℓ+1} ∈ A).

A set A is locally self-reducible [BS95] if there is a constant q ≥ 1 and a polynomial-time deterministic oracle machine M that recognizes A and, for every natural number i ≥ 1, M on input si queries only elements of the set ...

A set A is P-approximable [BKS94, Ogi94] if there exists some constant q such that, for every q-tuple (x1, ..., xq), one can exclude in polynomial time one possibility of how the characteristic function of A is defined on x1, ..., xq.
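A standard illustration of P-selectivity (a folklore example, not taken from the text above): any set that is downward closed under a polynomial-time computable order is P-selective, with the selector simply returning the smaller argument. Below, membership is by numeric value of a nonempty binary string, and the threshold N is a hypothetical parameter.

```python
N = 42  # hypothetical threshold; L is downward closed w.r.t. numeric value

def in_L(x):
    # membership test (assumes x is a nonempty binary string)
    return int(x, 2) <= N

def selector(x1, x2):
    # The P-selectivity function: return the argument with the smaller value.
    # If either input is in L, the returned one is too, since L is downward
    # closed; the selector never has to decide membership itself.
    return x1 if int(x1, 2) <= int(x2, 2) else x2
```

The selector only orders the two candidates; deciding which (if either) is actually a member is not required, which is exactly the gap between P-selectivity and membership in P.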



The properties used in the above definitions are collectively called polynomial-time weak membership properties. We consider another type of sets, defined in the same spirit.

Definition 3.4.4 (P-quasi-approximable sets) A set A is P-quasi-approximable if there exist a constant q and a polynomial-time algorithm M which takes as inputs q-tuples of strings and outputs either a q-long binary string or "I don't know" (denoted '?'), and which satisfies the following property: For infinitely many q-tuples (x1, ..., xq), with x_{i+1} being the lexicographical successor of xi for i = 1, ..., q − 1, M outputs a q-long binary string, and whenever this happens the q-long binary output string is different from A(x1) ⋯ A(xq).

All the above types of sets are obtained by stating that some property of the characteristic function can be decided in polynomial time. It is easy to check that P-quasi-approximability is at least as weak as any of the other properties (e.g., if a set is P-selective then it is P-quasi-approximable, etc.), and, thus, the class of P-quasi-approximable sets includes the class of sets having any of the other polynomial-time weak membership properties. The next theorem shows that the class of sets that are P-isomorphic to some P-quasi-approximable set has PF-measure zero.

Definition 3.4.5 (P-isomorphism) Two sets A, B ⊆ Σ* are P-isomorphic if there is a bijection h : Σ* → Σ* such that (a) both h and its inverse h^{−1} are in PF, and (b) for all x ∈ Σ*, x ∈ A if and only if h(x) ∈ B.

Theorem 3.4.6 The closure under P-isomorphism of the class of P-quasi-approximable sets has PF-measure zero.
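Definition 3.4.5 is easy to instantiate. For example, bitwise complement is a length-preserving polynomial-time bijection that is its own inverse, so any set A is P-isomorphic to the set of complements of its members. A minimal sketch (the sets here are tiny hypothetical examples):

```python
def h(x):
    # bitwise complement: a polynomial-time bijection on binary strings,
    # equal to its own inverse
    return x.translate(str.maketrans("01", "10"))

A = {"", "1", "01"}
B = {h(x) for x in A}  # B = {"", "0", "10"}; A and B are P-isomorphic via h
```

Checking the definition: h and h^{−1} = h are both trivially in PF, and x ∈ A if and only if h(x) ∈ B.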

Proof. Let (hi, hj)_{i,j∈N} be an enumeration of all pairs of polynomial-time functions hi, hj : Σ* → Σ*, and, for all natural numbers q ≥ 1, let (f_i^q)_{i∈N} be an enumeration of polynomial-time computable functions f_i^q : ({0,1}*)^q → {0,1}^q ∪ {?}. We assume without loss of generality that, for all i, hi and fi are computable in time bounded by n^{i/3}, for all inputs of length n ≥ i. The closure under P-isomorphism of the class of P-quasi-approximable sets is equal to the union of the classes (At)_{t∈N} defined as follows: Let us consider a bijection ⟨·, ·, ·, ·⟩ : N⁴ → N. For t = ⟨i, j, m, q⟩, in case hi is a bijection and hj is the inverse of hi, we let At be the class of sets A that are P-isomorphic via hi to some set that is P-quasi-approximable via q and f_m^q; otherwise, we let At be the empty set. By Lemma 3.4.2, it is enough to exhibit a martingale system (dt)_{t∈N} such that each class At has PF-measure zero via dt (i.e., dt succeeds on At) and dt runs in time O(n (log n)^t). It is thus sufficient to design such a martingale system (dt)_{t∈N}.

We fix i, j, m, q and t = ⟨i, j, m, q⟩, and write more simply h, h^{−1}, f, d, A instead of hi, hj, f_m^q, dt, At. We can take the bijection such that t ≥ max(i, j, m, q). In what follows, we assume that hi is a bijection and hj is its inverse; otherwise it is clear that d succeeds on A. The martingale d will take advantage of the fact that from


time to time (in fact, for infinitely many ℓ), f(s_ℓ, s_{ℓ+1}, ..., s_{ℓ+q−1}) returns a q-long binary string that is different from B(h⁻¹(s_ℓ)) ... B(h⁻¹(s_{ℓ+q−1})), for all sets B in 𝒜. Consequently, d bets 0 on all strings x such that x(pos(h⁻¹(s_{ℓ−1+j}))) = f(s_ℓ, s_{ℓ+1}, ..., s_{ℓ+q−1})(j), for j = 1, ..., q, and distributes the amount that becomes available to the other strings (recall that f(s_{ℓ(x)}, ..., s_{ℓ(x)+q−1})(j) is the j-th bit of f(s_{ℓ(x)}, ..., s_{ℓ(x)+q−1})). In this way, the amount that is allocated by d to these "other" strings is increased by a multiplicative factor of 2^q/(2^q − 1). The set of these "other" strings contains all the prefixes of length max(pos(h⁻¹(s_ℓ)), ..., pos(h⁻¹(s_{ℓ+q−1}))) of sets in 𝒜, because d has allocated 0 only to strings that cannot be prefixes of the characteristic function of any set in 𝒜. Since the redistribution can be done infinitely often, d succeeds on 𝒜. The redistribution task must start well in advance of reaching the point where d bets 0 on strings x as above. Therefore it is convenient that, as soon as a value ℓ as above is found during the computation of d on some input x, preparatory steps for all the further bets (i.e., the redistribution task) are made on the spot. The multiplicative factors of these antedated bets, denoted by d(·), are computed now and are stored for further use in a data structure called LIST(x), which will be transferred to the offspring of x, then to the offspring of the offspring, and so on, until the whole redistribution is finished. The strings x on which the redistribution task is performed will be marked active, as opposed to the other strings, which are marked inactive. This marking is used to prevent the overlapping of intervals of strings on which distinct redistribution tasks are performed.

We proceed to formally describe the computation of d on input x = x_1 ... x_n, where x_i ∈ {0,1}, i = 1, ..., n. We assume that q ≥ 2 (the case q = 1 is easier). If x = λ, then d(x) = 1 and x is marked inactive. Suppose x ≠ λ and let x' = x_1 ... x_{n−1} (i.e., x' is obtained from x by removing the last bit). We first compute d(y) for all strict prefixes y of x. There are two cases:

Case 1. x' is marked inactive. Let s_m = h(s_n). We want to see if s_m is part of a q-tuple (s_ℓ, s_{ℓ+1}, ..., s_{ℓ+q−1}) such that f(s_ℓ, s_{ℓ+1}, ..., s_{ℓ+q−1}) ∈ {0,1}^q. To this aim, we check the following TEST: "There is a natural number ℓ ∈ [m−q+1, m] such that f(s_ℓ, s_{ℓ+1}, ..., s_{ℓ+q−1}) ∈ {0,1}^q and n = min{pos(h⁻¹(s_ℓ)), pos(h⁻¹(s_{ℓ+1})), ..., pos(h⁻¹(s_{ℓ+q−1}))}."

Case 1.1. The answer to the TEST is NO. Then d(x) = d(x'), LIST(x) = ∅, and x is marked inactive.

Case 1.2. The answer to the TEST is YES, i.e., a "good" value ℓ has been found and a redistribution task can be started. We say that x triggers a redistribution task. We do right now the preparatory steps


for the redistribution task. Let ℓ(x) be the smallest value satisfying the TEST. We order lexicographically the set

{h⁻¹(s_{ℓ(x)}), ..., h⁻¹(s_{ℓ(x)+q−1})},

obtaining z_1 < z_2 < ... < z_q. This reordering defines a permutation π: {1, ..., q} → {1, ..., q}. We insert in LIST(x), in order, the following q triplets: (z_1, d(z_1), b_1), ..., (z_q, d(z_q), b_q), where

d(z_i) = (1/(2^q − 1)) · Σ_{h=i}^{q−1} 2^h

and, for i = 1, ..., q, b_i = f(s_{ℓ(x)}, ..., s_{ℓ(x)+q−1})(π⁻¹(i)).

Note that in the last triplet, d(z_q) = 0. A triplet (z, d(z), b) ∈ {0,1}* × ℝ × {0,1} signifies that when the computation of d reaches a successor of x of length pos(z), call it u, it will bet d(z) · d(x') on it if the bit in position pos(z) of u coincides with b, and will bet (2^q/(2^q − 1)) · d(x') if that bit does not coincide with b (d(u) will be computed according to Case 2.1 below). The redistribution starts with x, so that, according to the strategy stated above, we mark x active and define:

d(x) = (1/(2^q − 1)) · (Σ_{h=1}^{q−1} 2^h) · d(x'),   if x_n = f(s_{ℓ(x)}, ..., s_{ℓ(x)+q−1})(1),
d(x) = (2^q/(2^q − 1)) · d(x'),                       if x_n ≠ f(s_{ℓ(x)}, ..., s_{ℓ(x)+q−1})(1).

Case 2. x' is marked active (i.e., a redistribution task is in progress). Let LIST(x') = ((s_{i_1}, d(s_{i_1}), b_1), ..., (s_{i_q}, d(s_{i_q}), b_q)).

Case 2.1. One of (s_n, d(s_n), x_n) or (s_n, d(s_n), 1−x_n) is in LIST(x'). Then,

d(x) = d(s_n) · d(x(1 : i_1 − 1)),              if (s_n, d(s_n), x_n) ∈ LIST(x'),
d(x) = (2^q/(2^q − 1)) · d(x(1 : i_1 − 1)),     if (s_n, d(s_n), 1−x_n) ∈ LIST(x').

Next, if (s_n, d(s_n), x_n) or (s_n, d(s_n), 1−x_n) is not in the last position of LIST(x'), then LIST(x) = LIST(x') and x is marked active (the redistribution continues for the offspring of x). In case (s_n, d(s_n), x_n) or (s_n, d(s_n), 1−x_n) is in the last position of LIST(x'), then LIST(x) = ∅ and x is marked inactive (the redistribution task is finished).

Chapter 3. P, NP, and E

78

Case 2.2. Neither (s_n, d(s_n), x_n) nor (s_n, d(s_n), 1−x_n) is in LIST(x'). Then d(x) = d(x'), LIST(x) = LIST(x'), and x is marked active.

The following claims show that d achieves the purported goals.

Claim 3.4.7 d(x) can be computed in time O(n · (log n)^t), where n = |x|.

Proof. The computation of d(x) involves an autonomous part and the computation of d(y) for all strict prefixes y of x. Since there are |x| such prefixes, we only have to show that the autonomous part can be computed in time O((log n)^t). If Case 1 is entered, we have to compute h(s_{|x|}), check the TEST, and, if Case 1.2 occurs, find z_1, ..., z_q, insert q triplets in LIST(x), and do some easy computations. One can check that these operations take time O((log n)^t). The operations required by Case 2 take time O(t), since there is a constant number (namely q ≤ t) of elements in LIST(x'). ∎

Claim 3.4.8 d(·) is a martingale.

Proof. Let x = x_1x_2 ... x_n, x' = x_1x_2 ... x_{n−1} and x'' = x_1 ... x_{n−1}(1 − x_n), where x_i ∈ {0,1}, for all i ∈ {1, ..., n}. We show that

d(x') = (d(x) + d(x''))/2    (3.3)

for all x ∈ {0,1}*. We focus on the computation of d(x). If x' is marked inactive, then the computation of d(x'') will also find x' to be inactive, and the TEST evaluates the same in the computations of d(x) and d(x''). Now, relation (3.3) can be easily checked. Suppose next that x' is marked active and let LIST(x') = ((s_{i_1}, d(s_{i_1}), b_1), ..., (s_{i_q}, d(s_{i_q}), b_q)). It is clear that the same case among Case 2.1 and Case 2.2 applies to both d(x) and d(x''). If Case 2.2 applies to both d(x) and d(x''), relation (3.3) is checked immediately. Suppose that Case 2.1 applies to both d(x) and d(x''), with (s_n, d(s_n), x_n) in LIST(x') (the other situation, (s_n, d(s_n), 1−x_n) in LIST(x'), is symmetric). It follows that there is some p such that n = i_p and x_n = b_p. We claim that p ≠ 1 (i.e., (s_n, d(s_n), b_p) is not the first triplet in LIST(x')). For the sake of obtaining a contradiction, assume that p = 1. This means that i_1 = n, which implies that x and x'', both being strings of length n, are triggering the current redistribution task; this is impossible, because a redistribution task is triggered (in Case 1.2) only when the predecessor is marked inactive, whereas x' is marked active. Therefore, p ≠ 1. It is clear that for prefixes x(1 : r) of x, with i_{p−1} < r < i_p (if there are any), d(x(1 : r)) is computed according to Case 2.2, and, thus, d(x(1 : r)) = d(x(1 : i_{p−1})). Now, either x' = x(1 : i_{p−1}) or x' is an x(1 : r) as above. In both cases

d(x') = d(x(1 : i_{p−1})) = (1/(2^q − 1)) · (Σ_{h=p−1}^{q−1} 2^h) · d(x(1 : i_1 − 1)).

Since

d(x) = (1/(2^q − 1)) · (Σ_{h=p}^{q−1} 2^h) · d(x(1 : i_1 − 1))

and

d(x'') = (2^q/(2^q − 1)) · d(x(1 : i_1 − 1)),

relation (3.3) is verified. ∎

Claim 3.4.9 d succeeds on 𝒜.

Proof. We inductively define the infinite sequence of integers (ℓ_i)_{i∈ℕ} as follows. Let ℓ_0 = 1 and ℓ_{i+1} = "the smallest value ℓ > ℓ_i that is selected as ℓ(x) in Case 1.2 in the computation of d on some x ∈ {0,1}*." By the properties of f, it is clear that ℓ_i is defined for all i. For a value ℓ in the above sequence, let m_ℓ = min{pos(h⁻¹(s_ℓ)), ..., pos(h⁻¹(s_{ℓ+q−1}))} and M_ℓ = max{pos(h⁻¹(s_ℓ)), ..., pos(h⁻¹(s_{ℓ+q−1}))}. Since, for all sets B in 𝒜, f(s_ℓ, ..., s_{ℓ+q−1}) ≠ B(h⁻¹(s_ℓ)) ... B(h⁻¹(s_{ℓ+q−1})), it follows that d(B(1 : M_ℓ)) = (1 + 1/(2^q − 1)) · d(B(1 : m_ℓ − 1)). For any n, let T_n be such that (1 + 1/(2^q − 1))^{T_n} ≥ n. We conclude that, for all sets B in 𝒜, d(B(1 : M_{ℓ_{T_n}})) ≥ (1 + 1/(2^q − 1))^{T_n} · d(λ) ≥ n, because d(B(1 : m_{ℓ_{i+1}} − 1)) ≥ d(B(1 : M_{ℓ_i})) for all i. ∎

Claims 3.4.7, 3.4.8, and 3.4.9 show that the requirements of Lemma 3.4.2 are satisfied and, therefore, the closure under P-isomorphism of the class of P-quasi-approximable sets has PF-measure zero. ∎

Since the classes with weak membership properties mentioned in this section, as well as the class of sets that are not P-bi-immune⁶, are all included in the class of P-quasi-approximable sets, Theorem 3.4.6 has the following immediate corollary.

Corollary 3.4.10 The following classes have PF-measure zero:

(1) the class of P-selective sets,
(2) the class of P-multiselective sets,
(3) the class of cheatable sets,
(4) the class of easily countable sets,
(5) the class of easily approximable sets,
(6) the class of near-testable sets,
(7) the class of nearly near-testable sets,
(8) the class of locally self-reducible sets,
(9) the class of P-approximable sets,
(10) the class of sets that are not P-bi-immune.

⁶A set A is P-bi-immune if neither A nor its complement contains an infinite subset in P.


The class of sets that are in E and that are P-quasi-approximable also has PF-measure zero, simply because it is included in the class of P-quasi-approximable sets. Keeping in mind that E does not have PF-measure zero, it means that "most" (in the sense of PF-measure) sets in E are not P-quasi-approximable. More precisely, the PF-measure of the class of sets which are in E and which are not P-quasi-approximable is not zero. It follows that "most" (again, in the sense of PF-measure) sets in E do not have any of the polynomial-time weak membership properties.

We show next that Theorem 3.4.6 does not extend to the closure of the class of P-quasi-approximable sets under many-one polynomial-time equivalence.

Definition 3.4.11 Two sets A, B ⊆ Σ* are many-one polynomial-time equivalent if there are two functions f, g: Σ* → Σ* such that (a) f and g are computable in polynomial time, (b) x ∈ A ⟺ f(x) ∈ B, and (c) x ∈ B ⟺ g(x) ∈ A.

Theorem 3.4.12 The class of sets that are many-one polynomial-time equivalent to some P-quasi-approximable set does not have PF-measure zero.

Proof. Let

𝒜 = {A ⊆ Σ* | A has infinitely many strings of the form 0^n, n ∈ ℕ}.

Observe that if A is a set in E − 𝒜, then A is not P-bi-immune, because there is an n_0 such that the set D = {0^n | n > n_0} is infinite, in P, and included in the complement of A. Hence, by Corollary 3.4.10 (10), the class E − 𝒜 has PF-measure zero. Since the union of two PF-measure zero classes has PF-measure zero (this follows from Lemma 3.4.2), and since E does not have PF-measure zero, it follows that 𝒜 does not have PF-measure zero. We will show that any set A in 𝒜 is many-one polynomial-time equivalent to some set B which is not P-bi-immune (thus, B is P-quasi-approximable). Let {S_1 < S_2 < ...} be the lexicographical ordering of Σ* − {0^{2n} | n ∈ ℕ}. For each A ∈ 𝒜, we define B by: (1) S_i ∈ B ⟺ s_i ∈ A, for all i ∈ ℕ, and (2) 0^{2n} ∈ B ⟺ 0^n ∈ A, for all n ∈ ℕ. We take the function f: Σ* → Σ* to be defined by f(s_i) = S_i, for all i ∈ ℕ. We take the function g: Σ* → Σ* to be defined by g(S_i) = s_i, for all i ∈ ℕ, and, for all n ∈ ℕ, g(0^{2n}) = 0^n. Clearly, f is computable in polynomial time, and, for all x ∈ Σ*, x ∈ A ⟺ f(x) ∈ B. Also, g is computable in polynomial time, and, for all x ∈ Σ*, x ∈ B ⟺ g(x) ∈ A. Consequently, A is many-one polynomial-time equivalent to B. Also, B contains the set {0^{2n} | 0^n ∈ A}, which is infinite and in P. Thus, B is not P-bi-immune. ∎

Thus PF-measurability shows a big difference between exponential-time computation and polynomial-time computation, even if the latter is used to decide very weak properties of sets. The next natural question is: what is the PF-measure of NP? The problem is open and probably beyond the current state of affairs in complexity theory. Indeed, if NP has PF-measure zero, then NP ⊊ ⋃_{k∈ℕ} DTIME[2^{n^k}], and, if it does not have PF-measure zero, then P ⊊ NP.
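The reductions f and g in the proof of Theorem 3.4.12 are simple enough to implement. The following Python sketch is our illustration (not from the text): it takes n ≥ 1 in the family {0^{2n}} of padding strings, picks a toy set A, and checks the equivalence x ∈ A ⟺ f(x) ∈ B on an initial segment of the standard enumeration.

```python
from itertools import count, product

def std_strings():
    """Standard (length-lexicographic) enumeration of {0,1}*: λ, 0, 1, 00, 01, ..."""
    yield ""
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

def is_pad(y):
    """y of the form 0^(2n), n >= 1 (padding strings reserved for clause (2))."""
    return len(y) >= 2 and len(y) % 2 == 0 and set(y) == {"0"}

# Toy choice of A (any set works for this check): strings of odd length.
def in_A(x):
    return len(x) % 2 == 1

def in_B(y):
    """B as in the proof: S_i in B iff s_i in A, and 0^(2n) in B iff 0^n in A."""
    if is_pad(y):
        return in_A("0" * (len(y) // 2))
    i = 0                      # y = S_i: find its rank among non-padding strings
    for s in std_strings():
        if not is_pad(s):
            i += 1
            if s == y:
                break
    for j, s in enumerate(std_strings(), start=1):
        if j == i:             # s_i = i-th string of the full enumeration
            return in_A(s)

def f(x):
    """Reduction f(s_i) = S_i."""
    i = next(j for j, s in enumerate(std_strings(), start=1) if s == x)
    k = 0
    for s in std_strings():
        if not is_pad(s):
            k += 1
            if k == i:
                return s
```

The brute-force rank computations stand in for the (obviously polynomial-time) index arithmetic of the actual proof.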


Given our belief in the deep separation between polynomial-time computation and nondeterministic polynomial-time computation, the hypothesis that NP does not have PF-measure zero looks plausible. It is interesting to see some consequences of this hypothesis which are not known to be implied by the weaker hypothesis P ⊊ NP. The next theorem presents a very important such consequence: it shows that ≤^P_T NP-completeness and ≤^P_m NP-completeness are different.

Theorem 3.4.13 Assume that NP does not have PF-measure zero. Then there is a set which is NP-complete under ≤^P_T reduction and is not NP-complete under ≤^P_m reduction.

Proof. We introduce first some notation. For any set A ⊆ Σ*, we denote A_0 = {x | 0x ∈ A} and A_1 = {x | 1x ∈ A}. SAT is the set of satisfiable boolean formulas, encoded in some standard way as binary strings. As is well known, SAT is ≤^P_m-complete for NP. If B and C are two sets in Σ*, the tagged union of B and C is the set denoted B ⊕ C defined by

B ⊕ C = {0x | x ∈ B} ∪ {1x | x ∈ C}.

Claim 3.4.14 Let A ∈ NP. The set A_0 ⊕ ((A_0 ∩ SAT) ⊕ (A_0 ∪ SAT)) is NP-complete under ≤^P_T reduction.

Proof. Since A is in NP, A_0 is in NP as well. NP is closed under ⊕, ∩, and ∪; whence A_0 ⊕ ((A_0 ∩ SAT) ⊕ (A_0 ∪ SAT)) is in NP. To show completeness, observe that

x ∈ SAT ⟺ (x ∈ A_0 and x ∈ A_0 ∩ SAT) or (x ∉ A_0 and x ∈ A_0 ∪ SAT).

Therefore, writing D for A_0 ⊕ ((A_0 ∩ SAT) ⊕ (A_0 ∪ SAT)),

x ∈ SAT ⟺ (0x ∈ D and 10x ∈ D) or (0x ∉ D and 11x ∈ D).

Thus, with two easily computable queries to A_0 ⊕ ((A_0 ∩ SAT) ⊕ (A_0 ∪ SAT)) (of which the second depends on the answer to the first one), we can determine whether x ∈ SAT or not. ∎

Claim 3.4.15 The class 𝒜 = {A | A_1 ≤^P_m A_0 ⊕ ((A_0 ∩ SAT) ⊕ (A_0 ∪ SAT))} has PF-measure zero.

This will end the proof, because from the assumption that NP does not have PF-measure zero it follows that there is a set A ∈ NP such that A ∉ 𝒜. Consequently, A_0 ⊕ ((A_0 ∩ SAT) ⊕ (A_0 ∪ SAT)) is not NP-complete under ≤^P_m reduction, while, by Claim 3.4.14, it is NP-complete under ≤^P_T reduction.
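The two adaptive queries in the proof of Claim 3.4.14 can be sketched directly (our illustration: the predicates standing in for A_0 and SAT are toy stand-ins, and we build the tagged set from them; an actual reduction would have only oracle access to the tagged set):

```python
# Hypothetical membership predicates standing in for A0 and SAT (toy stand-ins).
def in_A0(x):
    return x.count("1") % 2 == 0

def in_SAT(x):
    return len(x) % 3 != 0

def in_C(y):
    """Tagged union A0 ⊕ ((A0 ∩ SAT) ⊕ (A0 ∪ SAT)):
    0x is in it iff x in A0; 10x iff x in A0 ∩ SAT; 11x iff x in A0 ∪ SAT."""
    if y.startswith("0"):
        return in_A0(y[1:])
    if y.startswith("10"):
        x = y[2:]
        return in_A0(x) and in_SAT(x)
    if y.startswith("11"):
        x = y[2:]
        return in_A0(x) or in_SAT(x)
    return False

def sat_via_two_queries(x, oracle):
    """Decide x in SAT with two adaptive oracle queries, as in Claim 3.4.14."""
    if oracle("0" + x):          # first query: is x in A0?
        return oracle("10" + x)  # then x in SAT iff x in A0 ∩ SAT
    else:
        return oracle("11" + x)  # else x in SAT iff x in A0 ∪ SAT
```

The case analysis mirrors the displayed equivalence: when x ∈ A_0, membership in the intersection decides SAT; when x ∉ A_0, membership in the union does.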




The idea of the proof of Claim 3.4.15 is that a ≤^P_m reduction from A_1 to A_0 ⊕ ((A_0 ∩ SAT) ⊕ (A_0 ∪ SAT)) induces dependencies between the bits of the characteristic sequence of A, dependencies which can be determined effectively. For example, if the ≤^P_m reduction is done via some function h ∈ PF and if, for some x ∈ A_1, h(x) = 10y, then it is not possible that x ∈ A_1 and y ∉ A_0, i.e., it is not possible that 1x ∈ A and 0y ∉ A. Such a dependency allows a martingale to bet zero on all sets that have 1 on position 1x and 0 on position 0y of their characteristic sequence, and this can be exploited to make the martingale succeed.

Let {h_i}_{i∈ℕ} be an enumeration of all functions in PF such that h_i(x) is computable in time bounded by n^i for all inputs x with |x| ≥ i. We can also assume that there is a polynomial-time algorithm which on input 1^i constructs a machine that computes h_i. Let 𝒜_i = {A | A_1 ≤^P_m A_0 ⊕ ((A_0 ∩ SAT) ⊕ (A_0 ∪ SAT)) via h_i}. Then 𝒜 = ⋃_{i∈ℕ} 𝒜_i. By Lemma 3.4.2, it is enough to show the following statement:

(S) There is a martingale system (d_i)_{i∈ℕ} such that, for all i, d_i is computable in time n³(log n)^i for all inputs of length n ≥ i, and d_i succeeds on 𝒜_i.

We will actually prove an auxiliary claim which implies statement (S).

Claim 3.4.16 For any i, there is a function f_i: Σ* → (Σ* × {0,1}) × (Σ* × {0,1}) ∪ {'?'} such that:

(1) For any i, f_i is computable in time bounded by 4n²(log n)^i for all inputs of length n ≥ i.

(2) For any i, there are infinitely many inputs x such that f_i(x) ∈ (Σ* × {0,1}) × (Σ* × {0,1}).

(3) For any i, if f_i(x) = ((x_1, b_1), (x_2, b_2)), then (a) x_1 > s_{|x|}, x_2 > s_{|x|}, x_1 ≠ x_2, and (b) for any A ∈ 𝒜_i such that A(1 : |x|) = x, (A(x_1), A(x_2)) ≠ (b_1, b_2).

(4) There is a polynomial-time procedure which, for all i, constructs a machine that computes f_i from a machine that computes h_i.

We will not give the full proof of the fact that Claim 3.4.16 implies statement (S), because such a proof is similar to (and easier than) the proof of Theorem 3.4.6. We just observe that the property of the class 𝒜_i asserted in Claim 3.4.16 resembles the property of P-quasi-approximability in that it states that it is possible to easily determine infinitely many pairs (x_1, b_1), (x_2, b_2) such that (A(x_1), A(x_2)) ≠ (b_1, b_2). Thus, as in the proof of Theorem 3.4.6, as soon as such a pair is determined, we can prepare future bets such that the martingale assigns 0 to all strings having b_1, b_2 in the positions x_1, x_2. As we have seen in the proof of Theorem 3.4.6, this is enough to make the martingale d_i succeed on 𝒜_i. One can see that d_i(x) can be calculated in time n³(log n)^i for all inputs x of length n ≥ i, and that there is a polynomial-time procedure that constructs a machine for d_i from a machine for f_i. End of Proof of Claim 3.4.15. ∎

Proof of Claim 3.4.16. There are four cases to consider (identified below as Case 1, Case 2.1, Case 2.2, and Case 2.3). We show that in each of the four cases,


a function f_i satisfying (1)-(4) can be constructed. One small problem is that we do not know which of the four cases actually holds. This is solved by attempting to compute all four variants, allowing n²(log n)^i steps for each variant. The first variant that produces an output different from '?' will give the output of the final f_i. If all variants produce '?', then this will be the output of the final f_i.

Case 1. The set {(x_1, x_2) ∈ Σ* × Σ* | x_1 ≠ x_2, h_i(x_1) = h_i(x_2)} is infinite. In this case the function f_i is defined by the following algorithm that computes it. On input x of length n, the algorithm calculates within the allowed time the strings h_i(s_1), h_i(s_2), ..., h_i(s_{n+1}) and checks if there is j ≤ n such that h_i(s_j) = h_i(s_{n+1}). Note that this computation requires at most (n+1)(log n)^i steps, and, for sufficiently large n, this is at most n²(log n)^i. Thus if the computation does not terminate within n²(log n)^i steps, we stop it and '?' is output. Otherwise (and this will be the case for almost every x), if there is such a j, the algorithm outputs the pair (s_{n+1}, 1 − x(j)) and some other arbitrary pair, say (s_{n+2}, 0) (the second pair is relevant only in Case 2). If there is no such j, the algorithm again outputs '?'. Note that, if h_i(s_j) = h_i(s_{n+1}), for any A ∈ 𝒜_i we have A(s_j) = A(s_{n+1}), because h_i is a reduction. Therefore, in this situation, no set A, for which x is a prefix of its characteristic function, can have the value 1 − x(j) in the (n+1)-th position of its characteristic sequence. This establishes (3) in the Claim. Statements (1), (2) and (4) are easy to check.

Case 2. There is a string x_0 ∈ Σ* such that, for all x_1 and x_2 in Σ* with x_0 < x_1 < x_2, h_i(x_1) ≠ h_i(x_2).

A first observation is that, in this case, for all sufficiently large n, there is a string x of length n such that |h_i(x)| ≥ |x|. The reason is that there are 2^n strings of length n and only 2^n − 1 strings of length at most n − 1, and thus it is not possible to map in a one-to-one manner all the strings of length n into strings that are shorter than n. Without loss of generality we can assume that there is a string x_0 such that x_0 is not in SAT and, for any A ∈ 𝒜_i, 0x_0 ∉ A (if this is not the case, we can further split 𝒜_i into the countable union of the sets that do not contain 0x_0, the sets that do not contain 00x_0, etc.). We can also assume that, for all x ∈ Σ*, h_i(x) starts with 0, 10, or 11, because otherwise h_i(x) ∉ A_0 ⊕ ((A_0 ∩ SAT) ⊕ (A_0 ∪ SAT)), and we could modify h_i(x) to be 0x_0. We define the sets

B_0 = {x | h_i(x) starts with 0},
B_10 = {x | h_i(x) starts with 10},
B_11 = {x | h_i(x) starts with 11},

and the functions

h_{i,0}(x) = y, if x ∈ B_0 and h_i(x) = 0y; and h_{i,0}(x) = x_0, otherwise;
h_{i,10}(x) = y, if x ∈ B_10 and h_i(x) = 10y; and h_{i,10}(x) = x_0, otherwise;
h_{i,11}(x) = y, if x ∈ B_11 and h_i(x) = 11y; and h_{i,11}(x) = x_0, otherwise.
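These extraction functions are easy to sketch (our illustration; the reduction h and the string x_0 below are hypothetical toy stand-ins): each one strips the tag from h(x) when it is present, so it shortens h(x) by at most two bits.

```python
x0 = "0"  # stand-in for the fixed string x0 of the proof

def make_h_z(h, tag):
    """h_{i,z}: strip the tag from h(x) when present, else return x0."""
    def h_z(x):
        y = h(x)
        return y[len(tag):] if y.startswith(tag) else x0
    return h_z

# Hypothetical reduction h (toy stand-in; always outputs a tagged string).
def h(x):
    return "10" + x[::-1] if len(x) % 2 == 0 else "0" + x

h0, h10, h11 = (make_h_z(h, t) for t in ("0", "10", "11"))

# Whenever the tag matches, |h_{i,z}(x)| >= |h(x)| - 2:
for x in ("", "0", "01", "110", "1010"):
    for hz, tag in ((h0, "0"), (h10, "10"), (h11, "11")):
        if h(x).startswith(tag):
            assert len(hz(x)) >= len(h(x)) - 2
```

The length inequality checked at the end is exactly the observation |h_{i,z}(x)| ≥ |h_i(x)| − 2 used below.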

We note that A_1 ∩ B_0 ≤^P_m A_0 via h_{i,0}, because

x ∈ A_1 ∩ B_0 ⟺ (h_i(x) ∈ A_0 ⊕ ((A_0 ∩ SAT) ⊕ (A_0 ∪ SAT)) and h_i(x) starts with 0) ⟺ h_{i,0}(x) ∈ A_0.

Similarly, A_1 ∩ B_10 ≤^P_m A_0 ∩ SAT via h_{i,10}, and A_1 ∩ B_11 ≤^P_m A_0 ∪ SAT via h_{i,11}. For any z ∈ {0, 10, 11} and for any x, |h_{i,z}(x)| ≥ |h_i(x)| − 2. Since B_0 ∪ B_10 ∪ B_11 = Σ*, by our observation above, there exists z ∈ {0, 10, 11} such that, for infinitely many x ∈ B_z, |h_{i,z}(x)| ≥ |x| − 2. Depending on z, we have three cases.

Case 2.1. (z = 0) There are infinitely many x ∈ B_0 such that |h_{i,0}(x)| ≥ |x| − 2. Since A_1 ∩ B_0 ≤^P_m A_0 via h_{i,0}, for any x ∈ B_0, x ∈ A_1 ⟺ h_{i,0}(x) ∈ A_0, and thus there are infinitely many x such that A(1x) = A(0h_{i,0}(x)). Therefore, for such an x, (A(1x), A(0h_{i,0}(x))) ≠ (1, 0). The function f_i is defined by the following algorithm. On input y of length n, the algorithm looks for an x that is lexicographically larger than s_n such that h_i(x) starts with 0 (i.e., x ∈ B_0) and |h_{i,0}(x)| ≥ |s_n|. If no such x is found within n²(log n)^i steps, the output is '?'. Otherwise the output is ((1x, 1), (0h_{i,0}(x), 0)). Note that, since we are in Case 2.1, for infinitely many y an x as above exists among the strings of length at most |s_n| + 2 = ⌊log n⌋ + 2. Since checking an x takes time O(|x|^i) = O((log n)^i), the function will output a pair ((1x, 1), (0h_{i,0}(x), 0)) infinitely many times. It follows that f_i satisfies conditions (1)-(4).

Case 2.2. There are infinitely many x ∈ B_10 such that |h_{i,10}(x)| ≥ |x| − 2. Since A_1 ∩ B_10 ≤^P_m A_0 ∩ SAT via h_{i,10}, for no x ∈ B_10 is it possible that x ∈ A_1 and h_{i,10}(x) ∉ A_0. Therefore, for such an x, (A(1x), A(0h_{i,10}(x))) ≠ (1, 0). Similarly to Case 2.1, the function f_i is defined by the following algorithm. On input y of length n, the algorithm looks for an x that is lexicographically larger than s_n such that h_i(x) starts with 10 (i.e., x ∈ B_10) and |h_{i,10}(x)| ≥ |s_n|. If no such x is found within n²(log n)^i steps, the output is '?'. Otherwise the output is ((1x, 1), (0h_{i,10}(x), 0)). As in Case 2.1, the function f_i satisfies (1)-(4).

Case 2.3. There are infinitely many x ∈ B_11 such that |h_{i,11}(x)| ≥ |x| − 2. Since A_1 ∩ B_11 ≤^P_m A_0 ∪ SAT via h_{i,11}, for no x ∈ B_11 is it possible that x ∉ A_1 and h_{i,11}(x) ∈ A_0. Therefore, for such an x,

(A(1x), A(0h_{i,11}(x))) ≠ (0, 1). Similarly to Case 2.1, the function f_i is defined by the following algorithm. On input y of length n, the algorithm looks for an x that is lexicographically larger


than s_n such that h_i(x) starts with 11 (i.e., x ∈ B_11) and |h_{i,11}(x)| ≥ |s_n|. If no such x is found within n²(log n)^i steps, the output is '?'. Otherwise the output is ((1x, 0), (0h_{i,11}(x), 1)). As in Case 2.1, the function f_i satisfies (1)-(4). This ends the proof of Claim 3.4.16 and of Theorem 3.4.13. ∎

3.5  Strong relativized separation of P and NP

IN BRIEF: For almost all oracle sets A, there is a set L in NP^A with the following property: any deterministic polynomial-time machine with access to A that attempts to determine whether a string x is in L is correct on only half of the inputs x of length at most n, for all sufficiently large n.

In some precise sense, it is provable that nondeterministic polynomial-time computation can do tasks that deterministic polynomial-time computation cannot. The catch is that we allow both types of computation to have access to an additional set, called the oracle set, which is viewed as a database that can be queried. We say that the computations are done relative to an oracle set (for details, see Section 3.1). Thus, questions of the type "Is x in the oracle set?" receive an immediate answer. The oracle set may contain a lot of information that is available for free, and, consequently, computations relative to an oracle set can be much more powerful than computations that are done "from scratch." For example, any computably enumerable set can be solved in deterministic polynomial time relative to the oracle set that encodes the HALTING PROBLEM. Nevertheless, in spite of the distortion introduced by oracles, the question of what can be done relative to various oracle sets is a viable topic worthy of scientific investigation.

For a set A, we denote by P^A the class of languages that can be solved in deterministic polynomial time relative to A. We denote by NP^A the class of languages that can be solved in nondeterministic polynomial time relative to A. There are sets A such that P^A = NP^A. For example, if A is a set that is ≤^P_m-complete for PSPACE, then PSPACE ⊆ P^A ⊆ NP^A ⊆ PSPACE^A = PSPACE, and, thus, P^A = NP^A. There also exist sets B such that P^B ⊊ NP^B (this is the result that we have referred to in the first paragraph). In this section we will prove a result that strengthens quantitatively the relativized separation of P from NP in two quite different directions.
Given the (apparently) conflicting views resulting from different oracle sets, it is natural to ponder which of the relations P^A = NP^A and P^A ⊊ NP^A happens for "most" oracles, i.e., what happens when A is chosen at random. We will see that the answer is that for "most" sets A, P^A ⊊ NP^A. This is the first direction of the generalization, which regards the size of the set of oracles relative to which we have the separation of P and NP. The second direction refers to a quantitative aspect of the separation itself. We will show that for "most" oracles A, there is a language L in NP^A such that no deterministic polynomial-time algorithm can answer correctly the question "Is x in L?" for more than, roughly speaking,


half of the inputs x. Since the answer is either YES or NO, it cannot be worse than this. Such a separation is called a separation with balanced immunity.

Definition 3.5.1 (P-balanced immunity) Let A ⊆ Σ*. We say that A is P-balanced immune if both A and its complement are infinite and each infinite set B ∈ P satisfies the property that lim_{n→∞} ‖(B ∩ A)^{≤n}‖ / ‖B^{≤n}‖ = 1/2. A class 𝒞 is P-balanced immune if there is a set A ∈ 𝒞 that is P-balanced immune.

To define what we mean by "most oracle sets," we utilize the apparatus of measure theory introduced in Section 1.2.2. We recall that a set A ⊆ Σ* is identified with the infinite binary sequence A(s_1)A(s_2) ... A(s_n) ... . Such a sequence is also identified with a real number in the interval [0,1], by associating to the above infinite binary sequence the real number having the binary representation 0.A(s_1)A(s_2) ... A(s_n) .... Via this representation, the family of all oracle sets can be viewed as the interval [0,1]. Therefore, a class of sets of binary strings represents a subset of [0,1]. Hence, such classes can be measured using the Lebesgue measure on [0,1]. This approach is very natural because the Lebesgue measure of the entire interval [0,1] is one, and, thus, the measure is a probability measure, which we denote Prob. Also, recall from Section 1.2.2 that the Lebesgue measure can be constructed starting with the basic intervals (B_x)_{x∈Σ*}, where, for x = x_1x_2 ... x_n, x_i ∈ {0,1}, B_x = [0.x_1x_2 ... x_n, 0.x_1x_2 ... x_n111 ...]. A natural way to define a random set A is to flip a fair coin infinitely many times and to use the i-th flip as the value of A(s_i) (by considering, say, that head represents 0 and tail represents 1). Intuitively, the probability that the random set A belongs to B_x is 2^{−|x|}. Since the length of the interval B_x is 2^{−|x|}, we see that the Lebesgue measure on [0,1] corresponds to the above method of building random sets of strings. Our previous informal statements asserting that some property holds for "most" oracles mean, formally, that the set of oracle sets for which that property is true has Lebesgue measure one. Or, in other words, if we build a set A by flipping a fair coin for each x ∈ Σ* to decide whether to put x in A or not, with probability one we obtain a set for which the property is true.
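The correspondence between coin flips and the lengths of the basic intervals can be checked exhaustively for short prefixes (a small illustration we add): among all characteristic-sequence prefixes of a fixed length k, exactly a fraction 2^{−|x|} extend a given x, matching the length of the interval B_x.

```python
from itertools import product
from fractions import Fraction

def prob_prefix(x, k):
    """Fraction of all k-bit sequences (k >= |x|) that begin with x --
    i.e., Prob(A in B_x) computed by exhaustive counting."""
    hits = sum(1 for bits in product("01", repeat=k)
               if "".join(bits).startswith(x))
    return Fraction(hits, 2**k)

for x in ("", "1", "01", "110"):
    assert prob_prefix(x, 6) == Fraction(1, 2**len(x))
```

The exact rational result does not depend on k, reflecting the fact that the measure of B_x is determined by x alone.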
The following terminology is very common. If P(·) is a property that depends on an oracle set A and if the set {A | P(A) holds true} has measure one, we say that property P holds relative to a random oracle.

Theorem 3.5.2 INFORMAL STATEMENT: For almost all oracle sets A, there is a set in NP^A that splits into half all infinite sets in P^A at all sufficiently large lengths. FORMAL STATEMENT: NP is P-balanced immune relative to a random oracle.⁷

Proof. For each oracle set A, we build a language T(A) and we show that (a) for all A, T(A) ∈ NP^A, and (b) for a set of oracle sets A having measure one, T(A) is P^A-balanced immune. In the construction, we split the characteristic sequence

⁷The notion of P-balanced immunity can be relativized in the obvious way, i.e., by letting the set B in Definition 3.5.1 be in relativized P.


of the oracle set into disjoint blocks that we attach to each x. Namely, for any x in Σ*, let

Block(x) = {y | (∃u ∈ Σ*)[y = xu1 and |y| = 9|x| and y is among the first ⌈(ln 2)2^{8|x|}⌉ strings of length 9|x| of the form y = xu1]}.

For y = xu1 in Block(x), we define ξ^A(y) = A(xu1)A(xu10) ... A(xu10^{|u|}). The language T(A) is defined by

T(A) = {x | (∃y ∈ Block(x))[ξ^A(y) = EIGHT(x)]},

where EIGHT(x) denotes the string obtained by concatenating x with itself eight times. Clearly T(A) is in NP^A, for all oracles A, and thus objective (a) is realized. We fix a deterministic polynomial-time oracle machine M and we let L(M^A) be the language accepted by M with oracle set A. We look at the set of oracles A relative to which either L(M^A) is finite or lim_{n→∞} ‖(L(M^A) ∩ T(A))^{≤n}‖ / ‖L(M^A)^{≤n}‖ exists and is equal to 1/2. We will show that this set has measure one. The intersection of all these sets taken over all deterministic polynomial-time machines has measure one as well, because it is a countable intersection of measure-one sets. Hence, for any oracle set A in this intersection, NP^A is P^A-balanced immune.

One important technical difficulty is that it is possible that, infinitely many times, M^A queries on some input v strings that may cause some string w > v to be in T(A), and this affects the independence of some random variables that will be considered later. We will first show that this can happen only for a set of oracle sets that has measure zero. A string y = xu1 that is in Block(x), for some x, is said to be examined by M^A(w) if during the computation of M^A(w) the oracle is queried about any string of the form xu10^k for some k ≤ |u|. Define

EXAM(A, w) = {y | y examined by M^A(w) and not examined by M^A(v) for v < w},

and

EVIDENCE(A) = {y | y ∈ Block(x) and ξ^A(y) = EIGHT(x) and (∃w ≤ x)[y ∈ EXAM(A, w)]}.

Let 𝒜_1 = {A | EVIDENCE(A) is finite}.

Claim 3.5.3 Prob(𝒜_1) = 1.

Proof. Since M^A can make only a polynomial number of queries, it follows that, for all w sufficiently long, EXAM(A, w) contains fewer than 2^{|w|} elements. The probability that a fixed y ∈ EXAM(A, w) satisfies ξ^A(y) = EIGHT(x), for some x ≥ w, is 2^{−8|x|} ≤ 2^{−7|w|}. Let E(w) be the event that there is y in EXAM(A, w)


such that ξ^A(y) = EIGHT(x), for some x ≥ w. The probability of E(w) is at most 2^{|w|} · 2^{−7|w|} = 2^{−6|w|}. Since the series Σ_{w∈Σ*} 2^{−6|w|} is convergent, it follows from the Borel–Cantelli Lemma that the probability that there are infinitely many w for which E(w) holds is zero. The conclusion follows. ∎

In the remainder of the proof we will consider only oracle sets that are in 𝒜_1. If L(M^A) is finite, L(M^A) cannot affect whether T(A) is P^A-balanced immune or not. So let us focus on oracle sets A such that L(M^A) is infinite. We say that M^A has evidence on a string x if the machine M on some input z < x queries some string y in Block(x) such that ξ^A(y) = EIGHT(x). For each k ≥ 1, we let x_k(A) be the k-th string, in the standard lexicographical ordering of Σ*, accepted by M^A without evidence. We need to define x_k(A) also for oracle sets A such that L(M^A) is finite. Thus, if A ∈ 𝒜_1 is such that L(M^A) is finite, then x_k(A) is the k-th string in the set of strings z with the properties (a) z is larger than the largest string accepted by M^A with evidence, and (b) M^A has no evidence on z. Note that, if A ∈ 𝒜_1 and L(M^A) is infinite, then L(M^A) is equal to the union of the set {x_k(A) | k ≥ 1} with the finite (possibly empty) set of the strings accepted with evidence, and thus computing

lim_{n→∞} ‖{x_m(A) | m ≤ n and x_m(A) ∈ T(A)}‖ / n

is sufficient for our purposes. The events "xk(A) E T(A)" conditioned by ,41 for different values of k are "almost" independent. This statement is formalized in the next Claim. C l a i m 3.5.4 Fix k > 1. Let 13 be an event of the form "x- (A) G T ( A ) and

... and xi. E T ( A ) and x j l ( A ) r T ( A ) a n d il, . . . i r , j l , . . . ,j8 < k. Then

... a n d x j ,

1

Prob(xk(A) E T(A) I B n A~) - -~

r T ( A ) " for some

16

<~.

Proof. The probability that $x_k(A) \in T(A)$, conditioned by $B \cap \mathcal{A}_1$, is at most equal to the probability that there is $y \in \mathrm{Block}(x_k(A))$ such that $\xi^A(y) = \mathrm{EIGHT}(x_k(A))$, and it is at least equal to the probability that there is $y \in \mathrm{Block}(x_k(A))$, not examined on any input less than $x_k(A)$, such that $\xi^A(y) = \mathrm{EIGHT}(x_k(A))$. Let $n$ be the length of $x_k(A)$. Noting that $k < 2^{n+1}$ (because there are $2^{n+1} - 1$ strings of length at most $n$), we infer that the number of queries on inputs $\lambda, 0, 1, \ldots, 1^n$ is less than $2^{2n}$, for $n$ sufficiently large. Consequently, the number of strings that have been examined on inputs less than $x_k(A)$ is at most $2^{2n}$, for $k$ sufficiently large. Thus,

$$1 - \left(1 - \frac{1}{2^{8n}}\right)^{(\ln 2)2^{8n} - 2^{2n}} \le \mathrm{Prob}(x_k(A) \in T(A) \mid B \cap \mathcal{A}_1) \le 1 - \left(1 - \frac{1}{2^{8n}}\right)^{(\ln 2)2^{8n}}.$$

3.5. Strong relativized separation of P and NP

From the Taylor expansion we get that, for $m$ sufficiently large,

$$\frac12 - \frac{1}{\sqrt{m}} < \left(1 - \frac{1}{m}\right)^{(\ln 2)m}, \quad\text{and}\quad \left(1 - \frac{1}{m}\right)^{(\ln 2)m - m^{1/4}} < \frac12 + \frac{1}{\sqrt{m}}.$$

By substituting $2^{8n}$ for $m$ in the above estimates, and taking into account again that $k < 2^{n+1}$, we obtain the statement of Claim 3.5.4. ∎

We define the random variables $(Y_j(A))_{j \ge 1}$ by

$$Y_j(A) = \begin{cases} 1, & \text{if } x_j(A) \in T(A) \\ -1, & \text{if } x_j(A) \notin T(A) \end{cases}$$

and, for any $k, m \in \mathbb{N}$,

$$S_{k,m}(A) = \sum_{j=km+1}^{(k+1)m} Y_j(A).$$

C l a i m 3.5.5 For any $\varepsilon > 0$ and for any $k \in \mathbb{N}$, $k \ge 1$, there exists a constant $c$ such that, for all $m$ sufficiently large,

$$\mathrm{Prob}\left( \left| \frac{S_{k,m}(A)}{m} \right| > \varepsilon \;\middle|\; \mathcal{A}_1 \right) \le \frac{c}{\varepsilon^4 \cdot m^2}.$$

Proof. To simplify notation we write $Y_i$ instead of $Y_i(A)$. From the Chebyshev inequality, applied to the fourth moment, we have that

$$\mathrm{Prob}\left( \left| \frac{S_{k,m}(A)}{m} \right| > \varepsilon \;\middle|\; \mathcal{A}_1 \right) \le \frac{1}{\varepsilon^4 m^4} \,\mathrm{E}[S_{k,m}^4 \mid \mathcal{A}_1], \quad (3.4)$$

and, expanding the fourth power (all indices below range over $\{km+1, \ldots, (k+1)m\}$),

$$\mathrm{E}[S_{k,m}^4 \mid \mathcal{A}_1] = \sum_{i} \mathrm{E}[Y_i^4 \mid \mathcal{A}_1] + 6 \sum_{i < j} \mathrm{E}[Y_i^2 Y_j^2 \mid \mathcal{A}_1] + 12 \sum_{\substack{u < v \\ i \notin \{u,v\}}} \mathrm{E}[Y_i^2 Y_u Y_v \mid \mathcal{A}_1] + 4! \sum_{i < j < u < v} \mathrm{E}[Y_i Y_j Y_u Y_v \mid \mathcal{A}_1].$$

We evaluate each of the four sums appearing on the right-hand side. Since $Y_i^4 = Y_i^2 Y_j^2 = 1$, we immediately get that

$$\sum_{i} \mathrm{E}[Y_i^4 \mid \mathcal{A}_1] = m \quad\text{and}\quad \sum_{i < j} \mathrm{E}[Y_i^2 Y_j^2 \mid \mathcal{A}_1] = \binom{m}{2} = \frac{m(m-1)}{2}.$$

The evaluation of the generic term in the third sum is

$$\begin{aligned} \mathrm{E}[Y_i^2 Y_u Y_v \mid \mathcal{A}_1] = \mathrm{E}[Y_u Y_v \mid \mathcal{A}_1] ={}& \mathrm{Prob}(x_u(A) \in T(A) \mid \mathcal{A}_1)\big(2\,\mathrm{Prob}(x_v(A) \in T(A) \mid \mathcal{A}_1 \text{ and } x_u(A) \in T(A)) - 1\big) \\ &- \mathrm{Prob}(x_u(A) \notin T(A) \mid \mathcal{A}_1)\big(2\,\mathrm{Prob}(x_v(A) \in T(A) \mid \mathcal{A}_1 \text{ and } x_u(A) \notin T(A)) - 1\big). \end{aligned}$$

By Claim 3.5.4, each factor of the form $2\,\mathrm{Prob}(\cdots) - 1$ has absolute value less than $32/v^4 \le 32/m^4$ (because $v > km \ge m$), so $|\mathrm{E}[Y_i^2 Y_u Y_v \mid \mathcal{A}_1]| \le 64/m^4$. Thus,

$$\left| \sum_{\substack{u < v \\ i \notin \{u,v\}}} \mathrm{E}[Y_i^2 Y_u Y_v \mid \mathcal{A}_1] \right| \le m \binom{m}{2} \cdot \frac{64}{m^4} \le \frac{32}{m}.$$

Next we consider the generic term in the fourth sum and in a similar way we obtain $|\mathrm{E}[Y_i Y_j Y_u Y_v \mid \mathcal{A}_1]| \le (2^7 + 2^{17})/m^4$, and therefore

$$\left| \sum_{i < j < u < v} \mathrm{E}[Y_i Y_j Y_u Y_v \mid \mathcal{A}_1] \right| \le m^4 \cdot (2^7 + 2^{17}) \cdot \frac{1}{m^4} = 2^7 + 2^{17}.$$

By substituting these evaluations in inequality (3.4), we obtain

$$\mathrm{Prob}\left( \left| \frac{S_{k,m}(A)}{m} \right| > \varepsilon \;\middle|\; \mathcal{A}_1 \right) \le \frac{1}{\varepsilon^4 m^4}\left( m + 6 \cdot \frac{m(m-1)}{2} + 12 \cdot \frac{32}{m} + 4!\,(2^7 + 2^{17}) \right) \le \frac{c}{\varepsilon^4 \cdot m^2},$$

for some constant $c$. ∎
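The fourth-moment computation above can be checked numerically in the idealized case of fully independent, unbiased signs (a simplification of the "almost independent" $Y_j$; the function and its name are ours, not the book's):

```python
from itertools import product

def fourth_moment(m: int) -> float:
    """E[(Y_1 + ... + Y_m)^4] for independent, unbiased Y_j = +/-1,
    computed by exact enumeration of all 2^m sign patterns."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=m))
    return total / 2 ** m

# The expansion in the proof predicts m terms E[Y_i^4] = 1 and
# 6 * C(m, 2) terms E[Y_i^2 Y_j^2] = 1, with all other terms vanishing
# under full independence, i.e., E[S^4] = m + 3m(m-1) = 3m^2 - 2m.
```

Dividing $3m^2 - 2m$ by $\varepsilon^4 m^4$ recovers the $O(1/(\varepsilon^4 m^2))$ decay used in Claim 3.5.5.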

Since the series $\sum_{m=1}^{\infty} \frac{c}{\varepsilon^4 m^2}$ is convergent, using the Borel–Cantelli Lemma we infer that, for every $\varepsilon > 0$ and every $k \ge 1$,

$$\mathrm{Prob}\left( \left| \frac{S_{k,m}(A)}{m} \right| < \varepsilon \text{ for almost every } m \;\middle|\; \mathcal{A}_1 \right) = 1.$$

Since $\mathrm{Prob}(\mathcal{A}_1) = 1$, it follows that, for every $\varepsilon > 0$ and every $k \ge 1$,

$$\mathrm{Prob}\left( \left| \frac{S_{k,m}(A)}{m} \right| < \varepsilon \text{ for almost every } m \right) = 1. \quad (3.5)$$

Let $\mathcal{A}_{\varepsilon,k}$ be the measure-one set of oracle sets for which the event in the above probability expression holds. We denote

$$\mathrm{IN}_A(m, n) = \|\{x_{m+1}(A), \ldots, x_n(A)\} \cap T(A)\|$$

and

$$\mathrm{OUT}_A(m, n) = \|\{x_{m+1}(A), \ldots, x_n(A)\} \cap \overline{T(A)}\|,$$

where $\overline{T(A)}$ is the complement of $T(A)$. Let us fix an arbitrary $\varepsilon > 0$ and $k \ge 1$. Relation (3.5) implies that, for each $A$ in $\mathcal{A}_{\varepsilon,k}$, there is $m_0$ such that, for all $m \ge m_0$,

$$|\mathrm{IN}_A(km, (k+1)m) - \mathrm{OUT}_A(km, (k+1)m)| \le \varepsilon \cdot m.$$

The set $\mathcal{A}_\varepsilon = \bigcap_{k \ge 1} \mathcal{A}_{\varepsilon,k}$ has measure one. Since $\mathrm{OUT}_A(km, (k+1)m) + \mathrm{IN}_A(km, (k+1)m) = m$, for any $\varepsilon > 0$, for every oracle set $A \in \mathcal{A}_\varepsilon$, for any $k \ge 2$, and for $m$ sufficiently large,

$$m(1 - \varepsilon) \le 2 \cdot \mathrm{IN}_A(m, 2m) \le m(1 + \varepsilon)$$
$$m(1 - \varepsilon) \le 2 \cdot \mathrm{IN}_A(2m, 3m) \le m(1 + \varepsilon)$$
$$\vdots$$
$$m(1 - \varepsilon) \le 2 \cdot \mathrm{IN}_A((k-1)m, km) \le m(1 + \varepsilon).$$

Summing up these inequalities, we obtain

$$(k-1)m(1 - \varepsilon) \le 2 \cdot \mathrm{IN}_A(m, km) \le (k-1)m(1 + \varepsilon),$$

which implies that, for all $\varepsilon > 0$, for all $A \in \mathcal{A}_\varepsilon$, for all $k$, and for sufficiently large $m$,

$$\frac12 - \frac{\varepsilon}{2} \le \frac{\mathrm{IN}_A(m, km)}{(k-1)m} \le \frac12 + \frac{\varepsilon}{2}. \quad (3.6)$$

We also have

$$\frac{\mathrm{IN}_A(0, km)}{km} = \frac{\mathrm{IN}_A(0, m) + \mathrm{IN}_A(m, km)}{km} \le \frac{m + \mathrm{IN}_A(m, km)}{km},$$


which, combined with relation (3.6), yields

$$\frac{k-1}{k}\left(\frac12 - \frac{\varepsilon}{2}\right) \le \frac{\mathrm{IN}_A(0, km)}{km} \le \frac{k-1}{k}\left(\frac12 + \frac{\varepsilon}{2}\right) + \frac{1}{k}.$$

For $km \le n < (k+1)m$,

$$\frac{\mathrm{IN}_A(0, km)}{n} \le \frac{\mathrm{IN}_A(0, n)}{n} \le \frac{\mathrm{IN}_A(0, km) + m}{n},$$

which implies

$$\frac12 - \frac{\varepsilon}{2} - \frac{2}{k} \le \frac{\mathrm{IN}_A(0, n)}{n} \le \frac12 + \frac{\varepsilon}{2} + \frac{2}{k}.$$

The above relation holds for all $\varepsilon > 0$, for all $A \in \mathcal{A}_\varepsilon$, for all $k$, and for all $n$ sufficiently large. The set $\mathcal{A} = \bigcap_{i \ge 1} \mathcal{A}_{1/i}$ has measure one. Hence, for all $A \in \mathcal{A}$, and thus with probability one over $A$,

$$\lim_{n\to\infty} \frac{\mathrm{IN}_A(0, n)}{n}$$

exists and is equal to $1/2$. As noted, this concludes the proof of Theorem 3.5.2. ∎

3.6 Average-case complexity

IN BRIEF: A theory of average-case complexity is developed and the average-case analogues of the classes P and NP are defined. It is shown that there are NP-complete problems that are easy on average. A natural example of a problem that is complete for the average-case analogue of NP is exhibited.

An NP-complete problem is considered to be a hard problem. However, NP-completeness only implies that there are some input instances on which the problem is infeasible (of course, assuming that P ≠ NP). It is possible that these instances are few, rare, and perhaps irrelevant, in the sense that a casual user may never be interested in solving them. In many applications it is more meaningful to know that a problem is hard or easy "on average." To tackle the issue of average complexity, we must first introduce a class of relevant probability distributions for the input instances. As usual, instances are encoded as strings over the binary alphabet $\Sigma = \{0, 1\}$. We consider the lexicographical ordering over $\Sigma^*$ and, for $x, y \in \Sigma^*$, $x < y$ means that $x$ precedes $y$ in this order. We denote the predecessor of $x$ by $x - 1$, for any non-empty string $x$. A distribution on $\Sigma^*$ can be given either by a distribution function or by a density function.

D e f i n i t i o n 3.6.1 (Distribution function) A distribution function is a function $\mu : \Sigma^* \to [0, 1]$ such that

(a) $\mu$ is non-decreasing, i.e., for all $x, y \in \Sigma^*$, $x < y$ implies $\mu(x) \le \mu(y)$, and

(b) $\mu$ converges to 1, i.e., $\lim_{x \to \infty} \mu(x) = 1$.


D e f i n i t i o n 3.6.2 (Density function) A density function is a function $\mu' : \Sigma^* \to [0, 1]$ such that $\sum_{x \in \Sigma^*} \mu'(x) = 1$.

For any distribution function $\mu$ there is an associated density function $\mu'$ defined by

$$\mu'(x) = \begin{cases} \mu(\lambda), & \text{if } x = \lambda \text{ (the empty string)} \\ \mu(x) - \mu(x-1), & \text{if } x \ne \lambda. \end{cases}$$

Also, for any density function $\mu'$ there is an associated distribution function $\mu$ defined by $\mu(x) = \sum_{y \le x} \mu'(y)$.

Therefore a pair $(\mu, \mu')$, with $\mu$ and $\mu'$ associated to each other, represents a unique object called a distribution.

D e f i n i t i o n 3.6.3 (Distribution) A distribution $\mu^*$ on $\Sigma^*$ is a pair $(\mu, \mu')$, where $\mu$ is a distribution function, $\mu'$ is a density function, and $\mu$ and $\mu'$ are associated to each other as above.

For technical convenience, we will assume that, if $\mu$ is a distribution function, $\mu(\lambda) = 0$. Also, we will allow density functions for which $\sum_{x \in \Sigma^*} \mu'(x)$ is equal to a constant $c > 0$ that may be different from 1 (or distribution functions with the limit in Definition 3.6.1(b) equal to an arbitrary constant $c > 0$), because they are easy to modify to satisfy the formal definition. For example, let us define a distribution on $\Sigma^*$ based on the following random experiment: (a) first we pick a natural number $n$ at random with some probability $p_n$, and (b) next we pick uniformly at random a string of length $n$. Thus, the probability that a given string $x$ is chosen is $p_x = p_n \cdot \frac{1}{2^n}$, where $n = |x|$. In principle, to obtain a density function we need

$$\sum_{x \in \Sigma^*} p_x = \sum_{n \ge 1} \sum_{x \in \Sigma^n} p_n \cdot \frac{1}{2^n} = \sum_{n \ge 1} p_n = 1.$$

If we take $p_n = \frac{1}{n^2}$, we have that $\sum_{n \ge 1} p_n = c \ne 1$ (actually $c = \frac{\pi^2}{6}$). The probabilities can be normalized by defining $p_n = \frac{1}{c} \cdot \frac{1}{n^2}$. However, we will consider $p_n$ acceptable as it is. The distribution defined by $\mu_1'(x) = \frac{1}{|x|^2} \cdot \frac{1}{2^{|x|}}$ is called the standard uniform distribution on $\Sigma^*$.
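The standard uniform distribution is easy to explore numerically. The sketch below (function names are ours, not the book's) computes the density $\mu_1'$ exactly and confirms that its total mass approaches the constant $c = \pi^2/6$ rather than 1:

```python
from fractions import Fraction

def mu1_density(x: str) -> Fraction:
    """Density mu_1'(x) = (1/|x|^2) * 2^(-|x|) of the standard uniform
    distribution (defined here for non-empty binary strings)."""
    n = len(x)
    return Fraction(1, n * n * 2 ** n)

def total_mass(max_len: int) -> Fraction:
    """Mass of all strings of length <= max_len. Each length n has
    2^n strings, contributing 2^n * mu_1'(x) = 1/n^2 in total."""
    return sum(Fraction(1, n * n) for n in range(1, max_len + 1))
```

The partial sums $\sum_{n \le N} 1/n^2$ climb toward $\pi^2/6 \approx 1.6449$, which is exactly why the text tolerates densities summing to a constant $c \ne 1$.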

D e f i n i t i o n 3.6.4 (Distributional problem) A distributional problem is a pair $(A, \mu^*)$, where $A$ is a language (equivalently, a decision problem) and $\mu^*$ is a distribution.

It is clear that some restrictions on distributions must be imposed; otherwise it is always possible to have the worst-case complexity be the same as the average-case complexity. It seems reasonable to require that the density function be polynomial-time computable. Such distributions are said to be P-samplable.


D e f i n i t i o n 3.6.5 (P-samplable distribution) A distribution is P-samplable if there is a polynomial-time algorithm $M$ that calculates the associated density function $\mu'$. This means that, for all $x \in \Sigma^*$, $\mu'(x)$ has a finite binary expansion and $M(x)$ outputs this expansion in time polynomial in $|x|$.

Unfortunately, this definition does not allow the development of a useful theory of average-case complexity. Ben-David et al. [BDCGL92] have shown that for every standard NP-complete problem it is possible to build a P-samplable distribution relative to which the problem is hard on average. Most commonly used distributions satisfy a stronger property: their distribution function is computable in polynomial time.

D e f i n i t i o n 3.6.6 (P-computable distribution) A distribution is P-computable if there is a polynomial-time algorithm $M$ that calculates the associated distribution function $\mu$. This means that, for all $x \in \Sigma^*$, $\mu(x)$ has a finite binary expansion and $M(x)$ outputs this expansion in time polynomial in $|x|$.

Clearly, a distribution that is P-computable is also P-samplable (because $\mu'(x) = \mu(x) - \mu(x-1)$). The converse is probably not true.

P r o p o s i t i o n 3.6.7 If P ≠ NP, then there is a distribution that is P-samplable but not P-computable.

Proof. We consider triples $(\phi, a, b)$, where $\phi$ is a boolean formula in CNF, $a$ is a truth assignment for the variables of $\phi$, and $b \in \{0, 1\}$. We encode such triples via a 1-to-1 mapping as binary strings, and we denote by $\langle \phi, a, b \rangle$ the encoding of $(\phi, a, b)$. It can easily be arranged that both encoding and decoding can be done in polynomial time and that, for all formulas $\phi$ and all assignments $a$ for $\phi$, $\langle \phi, a, 1 \rangle$ is lexicographically between $\langle \phi, (0, \ldots, 0), 1 \rangle$ and $\langle \phi, (1, \ldots, 1), 1 \rangle$. Let $|\phi|$ denote the length of some fixed natural encoding of the formula $\phi$, let $|a|$ be the number of variables to which $a$ assigns truth values, and let $t(\phi, a)$ be the truth value of $\phi$ under the assignment $a$.
Let us consider the function

$$\mu'(x) = \begin{cases} \dfrac{1}{2^{2|\phi|}} \cdot \dfrac{1}{2^{|a|}}, & \text{if } x = \langle \phi, a, b \rangle \text{ and } t(\phi, a) = b \\[4pt] 0, & \text{otherwise.} \end{cases}$$

Clearly, the encoding $\langle \phi, a, b \rangle$ can be chosen such that $\mu'$ is computable in polynomial time. We have

$$\sum_{x \in \Sigma^*} \mu'(x) = \sum_{n \ge 1} \sum_{\{\phi \mid |\phi| = n\}} \sum_{\{a, b \mid t(\phi, a) = b\}} \frac{1}{2^{2n}} \cdot \frac{1}{2^{|a|}} = \sum_{n \ge 1} \frac{1}{2^{2n}} \sum_{\{\phi \mid |\phi| = n\}} 2^{|a|} \cdot \frac{1}{2^{|a|}} \le \sum_{n \ge 1} \frac{1}{2^{2n}} \cdot 2^n = \sum_{n \ge 1} 2^{-n} = 1.$$


Thus $\mu'$ is a density function (up to the normalization convention discussed above) and the distribution associated to $\mu'$ is P-samplable. Note that, if $\mu$ is the distribution function associated to $\mu'$, then $\mu(\langle \phi, (1,\ldots,1), 1 \rangle) - \mu(\langle \phi, (0,\ldots,0), 1 \rangle) \ne 0$ if and only if there is a satisfying assignment for $\phi$ (recall that, for any truth assignment $a$ for a formula $\phi$, $\langle \phi, a, 1 \rangle$ is lexicographically between $\langle \phi, (0,\ldots,0), 1 \rangle$ and $\langle \phi, (1,\ldots,1), 1 \rangle$). Thus, if $\mu$ were computable in polynomial time, it would imply SAT ∈ P, and thus P = NP. ∎

We define next what it means for a (decision) problem to be feasible on average, i.e., to be solvable in polynomial time on average. At first sight, we should simply require that the expected running time over all inputs of a given length is bounded by a fixed polynomial, i.e., require that the running time $t_A(x)$ of an algorithm for a distributional problem $(A, \mu^*)$ satisfies, for some fixed $k$ and $c$,

$$\sum_{|x| = n} \mu'(x) \cdot t_A(x) \le c \cdot n^k, \quad \text{for all } n \in \mathbb{N}. \quad (3.7)$$

Unfortunately, this attempt, though natural, has serious deficiencies that make it unsuitable for developing a theory of average-case complexity. To illustrate the problems with this definition, let us consider the function

$$f(x) = \begin{cases} 2^n, & \text{if } x = 0^n \\ 0, & \text{otherwise.} \end{cases}$$

The expected value for inputs $x$ of length $n$ under the uniform distribution is

$$\sum_{|x| = n} \frac{1}{2^n} f(x) = 1.$$

However, the expected value of $f^2$ is

$$\sum_{|x| = n} \frac{1}{2^n} f^2(x) = 2^n.$$

Thus the class of functions with a polynomially bounded expected value is not closed under squaring and, in general, under multiplication. A definition of average-case complexity based on Equation (3.7) would be dependent on the type of machine that we are considering, because converting from one model to another (for example, from a Turing machine with two tapes to a Turing machine with one tape) usually implies a polynomial slow-down of the running time. Moreover, even for a fixed model of computation, there would be serious problems. For instance, if we compose two functions $t_A$ and $t_B$, both satisfying the relation in Equation (3.7), the resulting function may not satisfy that relation. Composing two functions is an operation that is needed when, to give just one example, we reduce one problem to another. Therefore, Levin [Lev86] has proposed another definition, which avoids all these problems and which is now widely accepted as the right definition of average polynomial time.


D e f i n i t i o n 3.6.8 Let $\mu^*$ be a distribution. A function $f$ is polynomial on $\mu^*$-average if there is a constant $\varepsilon > 0$ such that

$$\sum_{x \in \Sigma^*} \mu'(x) \cdot \frac{(f(x))^\varepsilon}{|x|} < \infty,$$

where $\mu'$ is the density function associated to $\mu^*$. Note that this definition states that, for some $\varepsilon > 0$, $(f(x))^\varepsilon$ is linear on average.

Let us check that the class of functions that are polynomial on $\mu^*$-average is closed under multiplication. Let us consider two functions $f$ and $g$ which, for some constants $\varepsilon_1 > 0$ and $\varepsilon_2 > 0$, satisfy

$$\sum_x \mu'(x) \frac{(f(x))^{\varepsilon_1}}{|x|} < \infty \quad\text{and}\quad \sum_x \mu'(x) \frac{(g(x))^{\varepsilon_2}}{|x|} < \infty.$$

Consider $\varepsilon = \frac12 \min(\varepsilon_1, \varepsilon_2)$. Then, since $(f(x)g(x))^\varepsilon \le (f(x))^{2\varepsilon} + (g(x))^{2\varepsilon} \le (f(x))^{\varepsilon_1} + (g(x))^{\varepsilon_2}$ (we may assume $f(x) \ge 1$ and $g(x) \ge 1$, as is the case for running times),

$$\sum_x \mu'(x) \frac{(f(x) \cdot g(x))^\varepsilon}{|x|} \le \sum_x \mu'(x) \frac{(f(x))^{\varepsilon_1}}{|x|} + \sum_x \mu'(x) \frac{(g(x))^{\varepsilon_2}}{|x|} < \infty.$$
Thus, Definition 3.6.8 avoids the problem we have seen before (as well as other deficiencies of our first attempt), and it provides the basis for defining feasibility on average.

D e f i n i t i o n 3.6.9 (AP) AP is the class of distributional problems $(A, \mu^*)$ that can be solved by a deterministic algorithm having a running time polynomial on $\mu^*$-average.

We next show that there are NP-complete problems that are feasible in the average-case sense with respect to quite natural distributions. We consider the following problem, which is one distributional version of the well-known NP-complete problem 3-COLORABILITY.

P r o b l e m 3.6.10 D-3COL Problem:

Input: A graph $G$.

Question: Is there a 3-coloring of the graph $G$? In other words, can the nodes of $G$ be colored with 3 colors such that no pair of adjacent vertices are colored with the same color?

Distribution: The density function $\mu'$ is defined as follows: a natural number $n$ is picked randomly with probability $1/n^2$. Next, a graph with vertices labeled $1, \ldots, n$ is picked by taking, independently for every two nodes $i$ and $j$, the edge $(i, j)$ with probability $1/2$.
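The decision procedure analyzed in the proposition below (reject immediately when a $K_4$ subgraph is present, since $K_4$ is not 3-colorable, and brute-force otherwise) can be sketched as follows; the adjacency-set representation is our choice:

```python
from itertools import combinations, product

def has_k4(adj):
    """Return True if the graph (dict: vertex -> set of neighbors)
    contains K4, the complete graph on four vertices, as a subgraph.
    Checking all (n choose 4) quadruples takes polynomial time."""
    for quad in combinations(adj, 4):
        if all(v in adj[u] for u, v in combinations(quad, 2)):
            return True
    return False

def is_3_colorable(adj):
    """Decide 3-colorability. A K4 subgraph forces 'no' at once;
    otherwise fall back to brute force over all 3^n colorings, which
    happens rarely under the random-graph distribution above."""
    if has_k4(adj):
        return False
    vertices = list(adj)
    for coloring in product(range(3), repeat=len(vertices)):
        color = dict(zip(vertices, coloring))
        if all(color[u] != color[v] for u in adj for v in adj[u]):
            return True
    return False
```

$K_4$ itself is rejected by the shortcut, while a 5-cycle must go through the brute-force phase and is accepted.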


P r o p o s i t i o n 3.6.11 D-3COL is in AP.

Proof. The proof is based on the fact that most graphs have $K_4$ as a subgraph ($K_4$ is the complete graph on four vertices). Such a graph obviously is not 3-colorable. Therefore, on input a graph $G$, our algorithm first checks for the presence of $K_4$ as a subgraph of $G$. If $K_4$ is detected (and this happens most of the time), then the verdict comes immediately: the graph is not 3-colorable. If $K_4$ is not detected, then, in a brute-force manner, we try all possible 3-colorings. This takes a long time, but because it is done only rarely, the average running time will be polynomial.

Let us do the calculations. The probability that four given vertices form a $K_4$ subgraph is $(1/2)^6 = 1/64$ (because there are six possible pairs of vertices). Suppose the number $n$ of vertices has been fixed. We group the vertices into disjoint groups of four. The probability that no group is a $K_4$ is $\left(1 - \frac{1}{64}\right)^{n/4} = \left(\frac{63}{64}\right)^{n/4}$, and therefore

$$\mathrm{Prob}(G \text{ has } n \text{ vertices and contains no } K_4) \le \frac{1}{n^2} \cdot \left(\frac{63}{64}\right)^{n/4}.$$

Let $H_n$ be the set of graphs with $n$ vertices that do not contain $K_4$, i.e., the event in the equation above. If the input graph $G$ with $n$ vertices has a $K_4$ subgraph, the running time $t(G)$ of the algorithm is bounded by a polynomial $p(n)$, because we only need to check the $\binom{n}{4} < n^4$ subsets of four vertices to find the $K_4$ subgraph. If the input graph $G$ with $n$ vertices does not contain $K_4$, then, in addition to the $p(n)$ steps above, the algorithm goes over all $3^n$ possible 3-colorings. Thus, in this case, the running time $t(G)$ is bounded by $3^n \cdot q(n)$, for some polynomial $q$, and this is less than $4^n$ for $n$ sufficiently large. We take $k$ such that (a) $p(n)^{1/k}$ is less than the length of the encoding of a graph $G$ with $n$ vertices (this length is denoted by $|G|$), and (b) $4^{1/k} \cdot \left(\frac{63}{64}\right)^{1/4} \le \alpha < 1$, for some constant $\alpha$. To finish the proof, it is sufficient to show that

$$\sum_G t(G)^{1/k} \cdot \frac{\mu'(G)}{|G|} < \infty.$$

We calculate a truncation of this series, discarding a finite number of initial terms corresponding to graphs for which $|G|$ is too small and does not satisfy the above inequalities. Clearly, since we are omitting a finite number of terms, it is sufficient to show the convergence of the truncated series.

$$\sum_G \frac{t(G)^{1/k} \cdot \mu'(G)}{|G|} = \sum_{G \text{ has } K_4} \frac{t(G)^{1/k} \cdot \mu'(G)}{|G|} + \sum_{G \text{ has no } K_4} \frac{t(G)^{1/k} \cdot \mu'(G)}{|G|}.$$

For the first term, observe that $t(G)^{1/k} \le p(n)^{1/k} \le |G|$, and therefore

$$\sum_{G \text{ has } K_4} \frac{t(G)^{1/k} \cdot \mu'(G)}{|G|} \le \sum_{G \text{ has } K_4} \mu'(G) \le 1.$$


For the second term, we have

$$\sum_{G \text{ has no } K_4} \frac{t(G)^{1/k} \cdot \mu'(G)}{|G|} \le \sum_n 4^{n/k} \sum_{G \in H_n} \mu'(G) \le \sum_n 4^{n/k} \cdot \frac{1}{n^2} \cdot \left(\frac{63}{64}\right)^{n/4} \le \sum_n \alpha^n < \infty.$$

This ends the proof of Proposition 3.6.11. ∎

Thus, there are problems that are hard in the worst case (in our example, hard meaning NP-complete) and easy on average. There also exist problems that remain hard on average. As in the case of worst-case analysis, a notion of completeness is helpful to describe this phenomenon. We first define an analogue of NP for the average case.

D e f i n i t i o n 3.6.12 (DistNP) DistNP is the class of distributional problems $(A, \mu^*)$ having the property that $A$ is a decision problem in NP and $\mu^*$ is a P-computable distribution.

We also need a notion of reducibility between distributional problems. The main requirements are (a) the transitivity of the reduction relation, and (b) the fact that if $(A, \mu^*)$ reduces to $(B, \nu^*)$ and $(B, \nu^*)$ is in AP, then $(A, \mu^*)$ is also in AP. To obtain these properties, in addition to the normal relation between the decision problems $A$ and $B$, we also need to ensure that the reduction does not map many instances of $A$ ("many" according to $\mu^*$) into few instances of $B$ ("few" according to $\nu^*$). Otherwise, it would be possible that most instances of $A$ are mapped to a few hard instances of $B$, and, thus, even if $(B, \nu^*)$ is in AP, it would not follow that $(A, \mu^*)$ is in AP. The needed technical concept is that of domination between distributions.

D e f i n i t i o n 3.6.13 (Domination) Let $\mu^*$ and $\nu^*$ be two distributions and $\mu'$ and $\nu'$ be, respectively, their associated density functions. We say that $\nu^*$ dominates $\mu^*$ (or $\mu^*$ is dominated by $\nu^*$), and we write $\mu^* \preceq \nu^*$, if there is a polynomial $p$ such that, for all $x \in \Sigma^*$, $\mu'(x) \le p(|x|) \cdot \nu'(x)$.

D e f i n i t i o n 3.6.14 (Average-case reduction) Let $(A, \mu^*)$ and $(B, \nu^*)$ be two distributional problems, and let $\mu'$ and $\nu'$ be the density functions of $\mu^*$ and, respectively, $\nu^*$. We say that $(A, \mu^*)$ is polynomial-time reducible to $(B, \nu^*)$ (notation: $(A, \mu^*) \le^{\mathrm{p}} (B, \nu^*)$) if there is a polynomial-time computable function $f$ such that

(1) for all $x$, $x \in A$ if and only if $f(x) \in B$, and



(2) there is a distribution $\tau^*$ such that $\mu^* \preceq \tau^*$ and, for all $y$ in the range of $f$, $\nu'(y) = \sum_{x \in f^{-1}(y)} \tau'(x)$ (where $\tau'$ is the density function associated to $\tau^*$).

We show that this notion of reducibility has the desired properties.

P r o p o s i t i o n 3.6.15 (1) If $(A_1, \mu_1^*) \le^{\mathrm{p}} (A_2, \mu_2^*)$ and $(A_2, \mu_2^*) \le^{\mathrm{p}} (A_3, \mu_3^*)$, then $(A_1, \mu_1^*) \le^{\mathrm{p}} (A_3, \mu_3^*)$. (2) If $(A, \mu^*) \le^{\mathrm{p}} (B, \nu^*)$ and $(B, \nu^*)$ is in AP, then $(A, \mu^*)$ is in AP.


Proof. (1) Let $f$ and $g$ be two functions such that $(A_1, \mu_1^*)$ is reducible to $(A_2, \mu_2^*)$ via $f$ and $(A_2, \mu_2^*)$ is reducible to $(A_3, \mu_3^*)$ via $g$. We show that $g \circ f$ reduces $(A_1, \mu_1^*)$ to $(A_3, \mu_3^*)$. Clearly, $x \in A_1 \Leftrightarrow f(x) \in A_2 \Leftrightarrow g(f(x)) \in A_3$. Since $(A_1, \mu_1^*) \le^{\mathrm{p}} (A_2, \mu_2^*)$ and $(A_2, \mu_2^*) \le^{\mathrm{p}} (A_3, \mu_3^*)$, there are distributions $\tau_1^*$ and $\tau_2^*$ and a polynomial $p$ (which we may assume increasing, and such that $|f(x)| \le p(|x|)$ for all $x$) with $\mu_1'(x) \le p(|x|)\,\tau_1'(x)$, $\mu_2'(y) = \sum_{x \in f^{-1}(y)} \tau_1'(x)$ for $y$ in the range of $f$, $\mu_2'(y) \le p(|y|)\,\tau_2'(y)$, and $\mu_3'(z) = \sum_{y \in g^{-1}(z)} \tau_2'(y)$ for $z$ in the range of $g$. Then, for any $z$ in the range of $g \circ f$,

$$\begin{aligned} \mu_3'(z) &\ge \sum_{y \in g^{-1}(z) \cap \mathrm{Range}(f)} \tau_2'(y) \ge \sum_{y \in g^{-1}(z) \cap \mathrm{Range}(f)} \frac{1}{p(|y|)} \cdot \mu_2'(y) \\ &= \sum_{y \in g^{-1}(z) \cap \mathrm{Range}(f)} \frac{1}{p(|y|)} \sum_{x \in f^{-1}(y)} \tau_1'(x) \ge \sum_{y \in g^{-1}(z) \cap \mathrm{Range}(f)} \frac{1}{p(|y|)} \sum_{x \in f^{-1}(y)} \frac{1}{p(|x|)} \cdot \mu_1'(x) \\ &\ge \sum_{x \in (g \circ f)^{-1}(z)} \frac{1}{p(p(|x|))} \cdot \frac{1}{p(|x|)} \cdot \mu_1'(x). \end{aligned}$$

Let

$$c(z) = \sum_{x \in (g \circ f)^{-1}(z)} \frac{1}{p(p(|x|))} \cdot \frac{1}{p(|x|)} \cdot \mu_1'(x).$$

The relation established above shows that $\mu_3'(g(f(x))) \ge c(g(f(x)))$, for all $x$. Consider the distribution $\tau_3^*$ having the associated density function

$$\tau_3'(x) = \frac{1}{p(p(|x|)) \cdot p(|x|)} \cdot \frac{\mu_3'(g(f(x)))}{c(g(f(x)))} \cdot \mu_1'(x).$$

Then, clearly, for any $z$ in the range of $g \circ f$, $\sum_{x \in (g \circ f)^{-1}(z)} \tau_3'(x) = \mu_3'(z)$, and

$$\mu_1'(x) \le p(p(|x|)) \cdot p(|x|) \cdot \tau_3'(x).$$

Therefore $(A_1, \mu_1^*) \le^{\mathrm{p}} (A_3, \mu_3^*)$.

(2) Let $f$ be the function that reduces $(A, \mu^*)$ to $(B, \nu^*)$, let $\tau^*$ be the corresponding distribution, and let $p$ be a polynomial such that $\mu'(x) \le p(|x|) \cdot \tau'(x)$ for all $x$. Let $M$ be a deterministic algorithm that solves $B$ in time $t$, polynomial on $\nu^*$-average, i.e., for some $\varepsilon > 0$,

$$\sum_y \nu'(y) \cdot \frac{(t(y))^\varepsilon}{|y|} < \infty.$$

Without loss of generality we can assume $\varepsilon < 1$. We know that $x \in A$ if and only if $f(x) \in B$ and, thus, the determination of whether $x \in A$ can be done by (a) calculating $f(x)$, and (b) running $M$ on $f(x)$. The time to do (a) is polynomial for all $x$, so it is sufficient to show that the time to do (b), which is $t(f(x))$, is polynomial on $\mu^*$-average. There exists $k > 0$ such that $|f(x)| \le |x|^k$ for all but finitely many $x$. Let

$$h(x) = \frac{t(f(x))}{p(|x|)^{k/\varepsilon}}.$$

We show that $h(x)$ is polynomial on $\mu^*$-average, and from here it follows that $t(f(x)) = h(x) \cdot p(|x|)^{k/\varepsilon}$ is polynomial on $\mu^*$-average (because of the closure under multiplication of the class of functions polynomial on $\mu^*$-average). We have

$$\begin{aligned} \sum_x \frac{(h(x))^{\varepsilon/k}}{|x|} \cdot \mu'(x) &= \sum_x \frac{(t(f(x)))^{\varepsilon/k}}{p(|x|) \cdot |x|} \cdot \mu'(x) \le \sum_x \frac{(t(f(x)))^{\varepsilon/k}}{|x|} \cdot \tau'(x) \\ &= \sum_{y \in \mathrm{Range}(f)} (t(y))^{\varepsilon/k} \sum_{x \in f^{-1}(y)} \frac{\tau'(x)}{|x|} \le \sum_{y \in \mathrm{Range}(f)} \frac{(t(y))^{\varepsilon/k}}{|y|^{1/k}} \cdot \nu'(y) \\ &= \sum_{y \in \mathrm{Range}(f)} \left( \frac{(t(y))^{\varepsilon}}{|y|} \right)^{1/k} \cdot \nu'(y) \le \sum_y \left( 1 + \frac{(t(y))^{\varepsilon}}{|y|} \right) \nu'(y) < \infty. \end{aligned}$$

(In the second line we used $|x| \ge |y|^{1/k}$, which holds for $x \in f^{-1}(y)$ because $|y| = |f(x)| \le |x|^k$, together with $\sum_{x \in f^{-1}(y)} \tau'(x) = \nu'(y)$; in the last step, $a^{1/k} \le 1 + a$ for $a \ge 0$.)

This ends the proof of Proposition 3.6.15.


Equipped with a reducibility, we can show that there are problems complete for DistNP.

P r o b l e m 3.6.16 Distributional Bounded Halting (D-BH) Problem:

Input: A triplet $(N, x, 1^k)$, where $N$ is a nondeterministic machine, $x$ is an input string for $N$, and $k$ is a natural number.

Question: Does $N$ halt on input $x$ within $k$ steps?

Distribution:

$$\mu'_{\text{D-BH}}(\langle N, x, 1^k \rangle) = \frac{1}{k^2} \cdot \frac{1}{|N|^2 \cdot 2^{|N|}} \cdot \frac{1}{|x|^2 \cdot 2^{|x|}}.$$

(This corresponds

to choosing $N$, $x$, and $1^k$ independently according to the standard uniform distribution.)

The Bounded Halting Problem (BH) (which is D-BH without the distribution) is easily shown to be NP-complete (in the standard sense). Indeed, let $A$ be a problem in NP. Then there is a nondeterministic polynomial-time machine $N_A$ which solves $A$ and which runs in time $p(n)$, for some polynomial $p$. Then $x \in A$ if and only if $(N_A, x, 1^{p(|x|)}) \in \mathrm{BH}$. Showing completeness in the average case is more delicate because we have to consider all problems $A$ in NP and, in addition, all P-computable distributions. It is possible that, according to such a distribution, a string $x$ has density much greater than $2^{-|x|}$, while the triplet that $x$ is mapped to by the standard reduction seen above has $\mu'_{\text{D-BH}}$ density less than $2^{-|x|}$. This violates the domination rule for a reduction among distributional problems. The problem is overcome by first mapping strings with high density into short strings. More precisely, a string $x$ is mapped into a string whose length is at most $1 + \log \frac{1}{\mu'(x)}$. This is achieved in the following lemma.

L e m m a 3.6.17 Let $\mu^*$ be a P-computable distribution and $\mu'$ its associated density function. There exists a function $\mathrm{code} : \Sigma^* \to \Sigma^*$ such that (1) code is 1-to-1, (2) code is computable in polynomial time, and (3) for every $x$,

$$|\mathrm{code}(x)| \le 1 + \min\left\{ |x|,\ \log \frac{1}{\mu'(x)} \right\}.$$

Proof. There are two categories of strings: (a) strings $x$ with $\mu'(x) \le 2^{-|x|}$, and (b) strings $x$ with $\mu'(x) > 2^{-|x|}$. We use two different encodings for the two categories. To keep the coding 1-to-1, the encoding of strings in category (a) starts with 0, and the encoding of strings in category (b) starts with 1. For strings in category (a), $\mathrm{code}(x) = 0x$. It is clear that conditions (1), (2), and (3) are verified. Let $\mu$ be the distribution function associated to $\mu^*$. For strings in category (b), $\mathrm{code}(x)$ is of the form $1z$, where $z$ is taken to be the binary expansion of a certain value in the interval $[\mu(x-1), \mu(x))$. This ensures the 1-to-1 property of the mapping (because the intervals $[\mu(x-1), \mu(x))$ are disjoint). The string $z$ is the longest common prefix of the binary representations of $\mu(x)$ and $\mu(x-1)$. This ensures that $\mathrm{code}(x)$ is computable in polynomial time. We still need to check


property (3). Note that, since $\mu'(x) = \mu(x) - \mu(x-1)$ and $\mu'(x) > 2^{-|x|}$, we have $|z| < |x|$ and

$$\mu'(x) = \mu(x) - \mu(x-1) \le \left( \sum_{i=1}^{|z|} z(i) \cdot 2^{-i} + \sum_{i=|z|+1}^{\infty} 2^{-i} \right) - \sum_{i=1}^{|z|} z(i) \cdot 2^{-i} = \sum_{i=|z|+1}^{\infty} 2^{-i} = 2^{-|z|},$$

because $\mu(x) \le 0.z_1 \ldots z_{|z|}111\ldots$ and $\mu(x-1) \ge 0.z$. Thus, $|z| \le \log \frac{1}{\mu'(x)}$, and, therefore, $|\mathrm{code}(x)| \le 1 + \min\{|x|, \log \frac{1}{\mu'(x)}\}$. ∎
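The two-case encoding in this lemma is concrete enough to implement. The sketch below takes $\mu(x-1)$ and $\mu(x)$ as exact rationals (which a P-computable distribution would supply) and mirrors the proof; the helper names and calling convention are ours, not the book's:

```python
from fractions import Fraction

def bits(q: Fraction, n: int) -> str:
    """First n bits of the binary expansion of q in [0, 1)."""
    out = []
    for _ in range(n):
        q *= 2
        out.append("1" if q >= 1 else "0")
        if q >= 1:
            q -= 1
    return "".join(out)

def code(x: str, mu_prev: Fraction, mu_x: Fraction) -> str:
    """Encoding of Lemma 3.6.17, given mu(x-1) and mu(x).
    Light strings (mu'(x) <= 2^-|x|) are copied behind a 0 flag; heavy
    strings are replaced by the longest common prefix z of the binary
    expansions of mu(x-1) and mu(x), behind a 1 flag. Since the interval
    [mu(x-1), mu(x)) has width mu'(x) > 2^-|x|, z has fewer than |x|
    bits, giving |code(x)| <= 1 + min(|x|, log(1/mu'(x)))."""
    density = mu_x - mu_prev
    if density <= Fraction(1, 2 ** len(x)):
        return "0" + x
    a, b = bits(mu_prev, len(x)), bits(mu_x, len(x))
    z = ""
    for u, v in zip(a, b):
        if u != v:
            break
        z += u
    return "1" + z
```

For instance, a heavy string whose interval is $[1/2, 7/8)$ is encoded as $1z$ with $z = 1$, the common prefix of $.10\ldots$ and $.11\ldots$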

T h e o r e m 3.6.18 D-BH is complete for DistNP.

Proof. Let $(A, \mu^*)$ be a distributional problem in DistNP and $N_A$ a nondeterministic polynomial-time machine that accepts $A$ in time $p_A(|x|)$, where $p_A$ is a polynomial. Consider $N_{A,\mu}$, the nondeterministic polynomial-time machine that, on input $y$, guesses nondeterministically $x$ such that $\mathrm{code}(x) = y$, and then runs $N_A$ on input $x$ (if there is no such $x$, it rejects). Let $p(n) = n + p_{\mathrm{code}}(n) + p_A(n)$, where $p_{\mathrm{code}}(n)$ is the time required to calculate $\mathrm{code}(x)$ for a string $x$ of length $n$. The reduction from $(A, \mu^*)$ to $(\text{D-BH}, \mu^*_{\text{D-BH}})$ is given by

$$f(x) = \langle N_{A,\mu}, \mathrm{code}(x), 1^{p(|x|)} \rangle.$$

It can be checked immediately that $x \in A \Leftrightarrow f(x) \in \text{D-BH}$, and that $f$ can be calculated in polynomial time. It remains to check the domination property. By Lemma 3.6.17, $\mu'(x) \le 2 \cdot 2^{-|\mathrm{code}(x)|}$. Therefore,

$$\mu'_{\text{D-BH}}(\langle N_{A,\mu}, \mathrm{code}(x), 1^{p(|x|)} \rangle) = \frac{1}{|N_{A,\mu}|^2 \cdot 2^{|N_{A,\mu}|}} \cdot \frac{1}{|\mathrm{code}(x)|^2 \cdot 2^{|\mathrm{code}(x)|}} \cdot \frac{1}{(p(|x|))^2} = c \cdot \frac{1}{|\mathrm{code}(x)|^2 \cdot 2^{|\mathrm{code}(x)|} \cdot (p(|x|))^2},$$

where $c = \frac{1}{|N_{A,\mu}|^2 \cdot 2^{|N_{A,\mu}|}}$ (i.e., it does not depend on $x$). It follows that

$$\mu'(x) \le \frac{2}{c} \cdot |\mathrm{code}(x)|^2 \cdot p(|x|)^2 \cdot \mu'_{\text{D-BH}}(\langle N_{A,\mu}, \mathrm{code}(x), 1^{p(|x|)} \rangle).$$

Therefore the domination requirement is satisfied if we take

$$\tau'(x) = \mu'_{\text{D-BH}}(\langle N_{A,\mu}, \mathrm{code}(x), 1^{p(|x|)} \rangle).$$

(Note that, since the coding is 1-to-1, $x$ is the only element mapped into $\langle N_{A,\mu}, \mathrm{code}(x), 1^{p(|x|)} \rangle$.) ∎


D-BH is the generic complete problem for the class DistNP, in the sense that, being built from the Bounded Halting Problem, it simply encompasses all NP problems with all their inputs. Such problems are not very useful for showing the existence of other complete problems via reductions. More natural examples of problems that are complete for DistNP are known, but the list of such problems is currently far smaller than the list of NP-complete problems. We content ourselves with presenting (following the exposition in [Wan97a]) just one example of a more natural DistNP-complete problem.

P r o b l e m 3.6.19 Distributional Post Correspondence Problem (D-PC):

Input: A nonempty set $\mathrm{LIST} = ((t_1, r_1), \ldots, (t_m, r_m))$ of pairs of binary strings and a positive integer $n$ written in the unary alphabet.

Question: Is there a sequence of at most $n$ integers $i_1, \ldots, i_k$, $k \le n$, such that $t_{i_1} t_{i_2} \cdots t_{i_k} = r_{i_1} r_{i_2} \cdots r_{i_k}$? (Such a sequence is called a solution of size $k$ of the problem.)

Distribution:

$$\mu'_{\text{D-PC}}(((t_1, r_1), \ldots, (t_m, r_m)), 1^n) = \frac{1}{n^2} \cdot \frac{1}{m^2} \cdot \prod_{i=1}^{m} \frac{1}{|t_i|^2 \cdot |r_i|^2 \cdot 2^{|t_i| + |r_i|}}$$

(i.e., the uniform distribution).
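Verifying a candidate solution of a D-PC instance takes only time linear in the total length of the selected pairs, which is what places the underlying decision problem in NP. A minimal checker (ours, not from the text), tried on a classic PCP instance:

```python
def is_pcp_solution(pairs, indices, n):
    """Check whether indices i_1,...,i_k (1-based, 0 < k <= n) solve the
    Post Correspondence instance 'pairs' = ((t_1, r_1), ..., (t_m, r_m)):
    the top concatenation t_{i_1}...t_{i_k} must equal the bottom one."""
    if not (1 <= len(indices) <= n):
        return False
    top = "".join(pairs[i - 1][0] for i in indices)
    bottom = "".join(pairs[i - 1][1] for i in indices)
    return top == bottom

# A well-known solvable instance: under the sequence (2, 1, 1, 3),
# both rows spell 101111110.
pairs = (("1", "111"), ("10111", "10"), ("10", "0"))
```

The sequence $(2, 1, 1, 3)$ is a size-4 solution of this instance, while, e.g., $(1, 2)$ is not.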

T h e o r e m 3.6.20 D-PC is complete for DistNP.

Proof. Let $(A, \mu^*)$ be a problem in DistNP. Thus, there is a nondeterministic polynomial-time Turing machine $M_1$ such that $M_1$ accepts $A$. We can assume without loss of generality that $M_1$ has only one accepting state and that, for all $x$, all the computation paths of $M_1$ on input $x$ are bounded by some polynomial in $|x|$. We will be using the function code from Lemma 3.6.17 for the density function $\mu'$ associated to $\mu^*$. Recall that, for all $x$, we have $\mu'(x) \le 2 \cdot 2^{-|\mathrm{code}(x)|}$. As in the proof of Theorem 3.6.18, from $M_1$ we build another nondeterministic Turing machine $M$ as follows. $M$ on input $1w$ guesses nondeterministically $x$ such that $\mathrm{code}(x) = w$. If the input does not start with 1 or if $x$ is not found, $M$ rejects immediately. Otherwise, $M$ simulates $M_1$ on input $x$. Clearly, $M$ also has exactly one accepting state, and there is a polynomial $p$ such that, for all $x$, $x \in A$ if and only if $1\,\mathrm{code}(x)$ is accepted by $M$ in time at most $p(|x|)$. We can assume as well that, for all $x$, all the computation paths of $M$ on input $x$ are bounded by $p(|x|)$, and also that $M$ has a single tape.

Next we build the reduction function $f$. We fix $x$ to be an input binary string for the problem $A$, and we have to build an instance of the D-PC problem. For the machine $M$, let $Q$ be the set of states, $q_0$ the starting state, $a$ the (unique) accepting state, $\delta$ the transition function, and $\Sigma$ the tape alphabet. Let $z = 1\,\mathrm{code}(x)$ and let $\Sigma_1 = Q \cup \Sigma \cup \{B, \Lambda, \square, !\}$, where $B$ is the blank symbol and $\Lambda$, $\square$, and $!$ are new symbols. The reduction $f$ will be 1-to-1 and, thus, we cannot have in the set LIST of the instance $f(x)$ a string longer than $c|x|$, with $c > 1$, because otherwise the domination property cannot be satisfied ($\mu'_{\text{D-PC}}(f(x))$ would be more than polynomially smaller than $\mu'(x)$). To take care of this, all the


strings in the D-PC instance that we are constructing will have length at most $|x| + O(\log |x|)$. We need an additional encoding function, which depends on $x$, and which we describe now. We define a bijective function $d : \Sigma_1 \to S \subseteq \{0, 1\}^L$, for some positive integer $L$, and we call the strings $d(s)$, with $s \in \Sigma_1$, codewords. The encoding $d$ has the following properties:

(1) $L = O(\log |x|)$,

(2) no codeword is a substring of $x$,

(3) the set of all proper prefixes of all codewords is disjoint from the set of all proper suffixes of all codewords,

(4) 1, 10, 000, 100 are not prefixes of any codeword.

Note that any string that starts with 1, and in particular $z = 1\,\mathrm{code}(x)$, can be decomposed in a unique way as a concatenation of 1, 10, 000, and 100. The function $d$ is built as follows. The codewords will belong to the regular set $R = 0100(00 + 11)^*11$. This ensures that conditions (3) and (4) hold. The value $L$ is taken to be the least even integer such that $2^{(L-6)/2} \ge |x| + \|\Sigma_1\|$. Therefore, $L = O(\log |x|)$. Also, since the string $x$ has at most $|x|$ substrings of length $L$, we can pick a set $S$ of strings in $R$ that does not contain any substring of $x$ and that can be put into a bijective correspondence, which is our $d$, with $\Sigma_1$ (note that $R$ has $2^{(L-6)/2}$ strings of length $L$). The encoding $d$ can be extended in the obvious way to $\Sigma_1^*$, i.e., for any $v \in \Sigma_1^*$, $d(v)$ is obtained by replacing each symbol in $v$ with the corresponding codeword. We now build $f(x)$ as an instance of the D-PC problem. Thus, $f(x)$ consists of a set of pairs of words, $\mathrm{LIST}(x)$, and of a nonnegative value $n$ written in unary. We will define $n$ later, so let us focus for now on $\mathrm{LIST}(x)$. The set $\mathrm{LIST}(x)$ consists of six groups of pairs of binary strings.

Group 1: (d(Δ), d(Δ)zd(□!q0));
Group 2: (u, d(u')), for all u ∈ {1, 10, 000, 100}, where u' is obtained from u by replacing each 0 with !0 and each 1 with !1;
Group 3: (d(X!), d(!X)), for each X ∈ Σ ∪ {B, □};
Group 4: for each q ∈ Q - {a} and for each X ∈ Σ ∪ {B},
    (d(q!X!), d(!Y!p)), if (p, Y, R) ∈ δ(q, X);
    (d(Z!q!X!), d(!p!Z!Y)), if (p, Y, L) ∈ δ(q, X);
    (d(q!□!), d(!Y!p!□)), if (p, Y, R) ∈ δ(q, B);
    (d(Z!q!□!), d(!p!Z!Y!□)), if (p, Y, L) ∈ δ(q, B);
Group 5: for all X, Y ∈ Σ ∪ {B}, (d(X!a!), d(!a)), (d(X!a!Y!), d(!a)), (d(a!Y!), d(!a));
Group 6: (d(a!□!□!), d(!□)).
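Group 2 relies on the claim made earlier that every string starting with 1 decomposes uniquely over {1, 10, 000, 100}. A small dynamic-programming check of that claim (illustrative code, not from the text):

```python
from functools import lru_cache

BLOCKS = ("1", "10", "000", "100")

def decompositions(z):
    """Count the ways of writing z as a concatenation of BLOCKS."""
    @lru_cache(maxsize=None)
    def ways(i):
        if i == len(z):
            return 1
        return sum(ways(i + len(b)) for b in BLOCKS if z.startswith(b, i))
    return ways(0)

# Every string that starts with 1 (such as z = 1code(x)) should have
# exactly one decomposition:
for suffix in range(64):
    assert decompositions("1" + format(suffix, "06b")) == 1
```

Uniqueness holds because {1, 10, 000, 100} is a suffix code: no block is a suffix of another, so a decomposition can be peeled off unambiguously from the right.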


If u ∈ (Σ ∪ {B})*, we denote by d̄(u) the string obtained by replacing in d(u) each codeword d(v) with d(v!) and omitting the last d(!). A partial solution of the problem is a pair of words (u, v) such that u is a prefix of v, and such that u and v are obtained from a sequence of (not necessarily distinct) pairs in LIST(x) by concatenating their left strings and, respectively (i.e., for v), their right strings. We will show that there is a sequence of partial solutions that describes in a natural way the computation of M on input z. The status of the machine M at a given time is described completely by the content of its tape at that moment, by the current state q ∈ Q, and by the position of the read/write head on the tape. All these elements taken together define the configuration of M at a given moment. If the content of the tape is αβ, with α, β ∈ Σ*, the read/write head is scanning the cell containing the rightmost symbol in α, and the current state is q, then the corresponding configuration can be represented by the string αqβ. For a configuration C = αqβ, we denote

⟨C⟩ = d̄(α)d(!q!)d̄(β)d(!□). The machine M starts in the initial configuration C0 = q0z and it moves successively through a sequence of configurations. Let ⟨START⟩ be the string d(Δ)zd(□!). Observe that if we try to build a solution for LIST(x), the only pair that can be used to start is the one from Group 1, (d(Δ), d(Δ)zd(□!q0)). Next, in order to build z in the left-hand side of the solution, we can only use pairs from Group 2; we will append z to the left-hand side, and in the right-hand side we get d(!)d̄(z). Next, to place d(□!) in the left-hand side, we can only use the pair from Group 3, (d(□!), d(!□)). Concatenating these pairs gives (⟨START⟩, ⟨START⟩⟨C0⟩), and this is the only way to start building a solution. Observe that the number of pairs from LIST(x) used to build this partial solution is bounded by a polynomial in |x|. Next, to continue building our solution, we need to append ⟨C0⟩ to the left-hand side of our partial solution. If q0 = a (the unique accepting state), we can complete a solution by appending pairs from Group 5 and, at the end, the pair from Group 6. If q0 ≠ a, then we can only use a pair from Group 4 that corresponds to a legal move of M from configuration C0. This legal move (if there is one) takes M into some configuration C1. Next it can be checked that we can only use pairs from Group 3. This leads us to the partial solution (⟨START⟩⟨C0⟩, ⟨START⟩⟨C0⟩⟨C1⟩). Observe that in the transition from the partial solution (⟨START⟩, ⟨START⟩⟨C0⟩) to the partial solution (⟨START⟩⟨C0⟩, ⟨START⟩⟨C0⟩⟨C1⟩), we have used a polynomial (in |x|) number of pairs from LIST(x). In a similar way, it can be checked that, if we have built the partial solution (⟨START⟩⟨C0⟩... ⟨Ck-1⟩, ⟨START⟩⟨C0⟩... ⟨Ck-1⟩⟨Ck⟩),

(3.8)


the only way to continue and place ⟨Ck⟩ in the left-hand side is: (a) if Ck contains a, then we can complete a solution by using pairs from Groups 3 and 5 and, at the end, the pair of Group 6, and (b) if Ck does not contain a, then there must be a legal move taking the machine M from configuration Ck to configuration Ck+1, and, in this case, the only partial solution that we can obtain is (⟨START⟩⟨C0⟩... ⟨Ck-1⟩⟨Ck⟩,

⟨START⟩⟨C0⟩... ⟨Ck⟩⟨Ck+1⟩).

(3.9)
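The step-by-step growth of partial solutions described above can be mimicked by a small brute-force search. The sketch below is purely illustrative (the function name and the toy instance are ours; the real LIST(x) instances are of course far too large for such a search): it grows pairs (u, v), keeping u a prefix of v, and reports whether some sequence of at most max_pairs pairs makes the two sides equal.

```python
from collections import deque

def has_bounded_solution(pairs, max_pairs):
    """Breadth-first search over partial solutions (u, v), u a prefix of v,
    obtained by concatenating the left and right components of pairs."""
    queue = deque((l, r, 1) for l, r in pairs if r.startswith(l))
    while queue:
        u, v, used = queue.popleft()
        if u == v:
            return True                  # a solution: both sides agree
        if used == max_pairs:
            continue
        for l, r in pairs:
            nu, nv = u + l, v + r
            if nv.startswith(nu):        # keep only genuine partial solutions
                queue.append((nu, nv, used + 1))
    return False

# Toy instance: the sequence 3, 2, 3, 1 gives bb.aa.bb.baa = bba.ab.bba.a.
toy = [("baa", "a"), ("aa", "ab"), ("bb", "bba")]
```

The bound max_pairs plays the role of the unary bound n attached to LIST(x): only solutions built from polynomially many pairs count.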

It can be again checked that, since Ck has size bounded by the fixed polynomial p(|x|), to make the transition from the partial solution given in Equation (3.8) to the partial solution given in Equation (3.9), we have used a number of pairs from LIST(x) that is bounded by a fixed polynomial in |x|. Therefore, the instance LIST(x) has a solution if and only if M on input z = 1code(x) has a computation path that goes through a sequence of consecutive configurations C0, C1, ..., Ck, and the last configuration, Ck, contains the unique accepting state a. The existence of such a computation path shows that M accepts x, and, as noted, this can only happen in at most p(|x|) steps. Thus, if Ck has the accepting state a, then k ≤ p(|x|). Recall our estimation of the number of pairs from LIST(x) necessary to make the transition from one partial solution to the next one containing a new ⟨C⟩ in the left-hand side. It follows that there is a polynomial g such that M accepts z = 1code(x) if and only if LIST(x) has a solution of size g(|x|). So, our reduction is

f(x) = (LIST(x), 1^{g(|x|)}). It is easy to check that f(x) is computable in polynomial time, and, using the above remarks, that x ∈ A if and only if f(x) ∈ D-PC. It remains to check the domination property. Note that the reduction f is 1-to-1 (this follows from the pair in Group 1 and because code(·) is injective). Also note that the length of the string d(Δ)zd(□!q0) (which appears in Group 1) is bounded by |code(x)| + O(L), where L = O(log(|x|)), and that the length of each of the other strings in LIST(x) is bounded by O(L) = O(log(|x|)). Consequently,

|f(x)| = |code(x)| + O(log(|x|)) + g(|x|). It follows that for some polynomial r,

μ'_{D-PC}(f(x)) ≥ 2^{-|code(x)|} · (1/r(|x|)) ≥ μ'(x) · (1/r(|x|)),

and, therefore, μ*_{D-PC} dominates μ*.

3.7 Comments and bibliographical notes

The probabilistic algorithm for 3-SAT in Section 3.2 is due to Schöning [Sch99]. It has been slightly improved several times, and, at the time of this writing, the most efficient probabilistic algorithm for 3-SAT runs in time O(1.32793^n) and has been developed by Rolf [Rol03]. Non-trivial exact algorithms for several NP-complete problems have been found, and the article of Woeginger [Woe03] is an informative survey of this area. As mentioned in Section 2.7, the classification schemas induced by effective Baire category concepts have been introduced in computation and complexity theory by Mehlhorn [Meh73]. Mehlhorn's approach has been extended in several directions, primarily by considering different types of open set extensions and by limiting the computing power of the extension functions (see the articles by Lutz [Lut90], Fenner [Fen95], Ambos-Spies [AS96], and Ambos-Spies and Reimann [ASR96]). The idea of using the superset topology in the context of effective Baire classification of classes inside NP is due to Zimand [Zim93]. The results from Section 3.3 are from the same paper [Zim93]. The technique used to demonstrate Theorem 3.3.9 and several other related results is called delayed diagonalization and was invented by Ladner [Lad75] to show that if P ≠ NP, then there exist sets in NP that are neither NP-complete nor in P. The main concepts of resource-bounded measure theory have been developed by Lutz [Lut90, Lut92]. He has brought to light some earlier studies of Schnorr [Sch73] and has shown the applicability of this theory in the exploration of some quantitative issues in computational complexity. It is now a mature area with its own ramifications, open problems, and all the other attributes of a vital theory. The survey papers of Lutz [Lut97] and Ambos-Spies and Mayordomo [ASM97] provide a good coverage of the core directions. Theorem 3.4.1 and Theorem 3.4.3 state simple and basic facts of resource-bounded measure theory. The class of P-quasiapproximable sets has been introduced by Zimand [Zim98].
It is a generalization of a large number of classes (see the list in Corollary 3.4.10) that capture in various ways the idea of a polynomial-time weak membership property. Theorem 3.4.6 and Theorem 3.4.12 have been shown by Zimand [Zim98]. The fact that the hypothesis "NP does not have p-measure zero" implies that NP-completeness under Cook reductions differs from NP-completeness under Karp reductions (i.e., Theorem 3.4.13) has been shown by Lutz and Mayordomo [LM96]. This result has been extended to other reductions by Ambos-Spies and Bentzien [ASB97]. Relativization is a basic notion in computability theory. It was first used in complexity theory by Baker, Gill, and Solovay [BGS75]. Their article shows the existence of oracle sets A and B such that P^A = NP^A and P^B ≠ NP^B. The study of complexity classes relativized with random oracles has been initiated by Bennett and Gill [BG81]. They have shown that relative to a random oracle A, NP^A is P^A-bi-immune. Theorem 3.5.2 is a strengthening of this result and has been obtained by Hemaspaandra and Zimand [HZ96]. The notion of P-balanced immunity has been introduced by Müller [Mü93]. Kautz and Miltersen [KM94] have shown that relative to a random oracle A, NP^A does not have effective measure zero with respect to P^A-computable martingales. The theory of average-case complexity was initiated by Levin [Lev86]. Levin's paper is very concise and does not elucidate the motivation behind some of the subtle and key elements of the theory. Further explanations have been given by Gurevich [Gur91a, Gur91b], and in the survey papers of Goldreich [Gol97] and Wang [Wan97b]. The fact that 3-COLORABILITY can be solved in average polynomial time (Proposition 3.6.11) has been shown by Wilf [Wil84]. The article of Wang [Wan97a] is a comprehensive survey of DistNP-complete problems (and of related matters). Theorem 3.6.18 and Theorem 3.6.20 are due to Gurevich [Gur91a].
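Schöning's probabilistic algorithm for 3-SAT, cited at the start of these notes, is simple enough to sketch. The code below is a minimal illustration under our own conventions (literals are signed integers; the function name is ours), not one of the improved variants from the literature: repeatedly restart from a random assignment and perform 3n random-walk steps, each flipping a random variable of some unsatisfied clause.

```python
import random

def schoening_3sat(clauses, n, tries=500):
    """Random-walk algorithm for 3-SAT (sketch).  Each try succeeds with
    probability roughly (3/4)^n, so O((4/3)^n) tries find a satisfying
    assignment of a satisfiable formula with high probability."""
    def falsified(assign):
        for c in clauses:
            if not any(assign[abs(lit)] == (lit > 0) for lit in c):
                return c                 # a clause all of whose literals are false
        return None
    for _ in range(tries):
        assign = {v: random.random() < 0.5 for v in range(1, n + 1)}
        for _ in range(3 * n):           # random walk of length 3n
            c = falsified(assign)
            if c is None:
                return assign
            lit = random.choice(c)       # flip a variable of an unsatisfied clause
            assign[abs(lit)] = not assign[abs(lit)]
        if falsified(assign) is None:
            return assign
    return None

formula = [(1, 2, 3), (-1, 2, 3), (1, -2, 3), (1, 2, -3)]
model = schoening_3sat(formula, n=3)
```

The restart-and-walk structure is exactly what the O(1.32793^n) refinements improve: they sharpen the analysis and the choice of walk length, not the basic scheme.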