Chapter 20

Complexity

Several works deal with complexity issues related to error-correcting codes (the problem of linear decoding, with or without preprocessing, the problem of computing the minimum distance of a linear code, ...), mainly from an NP-completeness viewpoint, that is, a worst-case approach. In the first two sections of this chapter, we also use the framework of the theory of NP-completeness, for covering problems, and determine the complexity of upperbounding the covering radius of a given binary code, linear or not.

We first give in Section 20.1 the necessary background concerning the polynomial hierarchy, by describing different classes of complexity. Then in Section 20.2 we show that computing an upper bound on the covering radius of a nonlinear code is co-NP-complete and that the same problem for linear codes is $\Pi_2$-complete. This means that it is unlikely that there are polynomial-time algorithms that solve these problems for all instances.

What can be done when facing an NP-complete (or worse) problem? One can use heuristics (see Section 3.8), approximations, ... or a brute-force attack. If only an existential result is required, then random methods often work. Recently a promising method has emerged: derandomization. The idea, roughly speaking, is to perform an exhaustive search on a suitably chosen small subset of the original set of possible solutions. We illustrate the method in Section 20.3 on a problem akin to classical covering, namely that of covering by unions of subcubes.

20.1

Basic facts about the polynomial hierarchy

Our intention here is to give an intuitive approach to the notion of completeness in the polynomial hierarchy.


We deal only with decision problems, that is, problems consisting of a question whose answer is either YES or NO. An algorithm $A$ solves a problem $\pi$ if, applied to any instance $I$ of $\pi$, it gives the correct answer to that instance $I$. An estimation of the size of an instance $I$ of $\pi$ is given by any "reasonable" encoding of $I$ (for instance, a reasonable encoding of an integer $m$ requires $\log m$ bits). The time complexity function of an algorithm $A$ that solves $\pi$ is, for each possible instance size, the maximal time required by $A$ to solve an instance of that size. A polynomial-time algorithm is one whose time complexity function can be bounded by a polynomial $p(n)$, where $n$ is the size of the instance we consider. The class of polynomial-time solvable problems is denoted by P.

A polynomial reduction from a problem $\pi_1$ to another problem $\pi_2$ is a polynomial constructive transformation that maps any instance of $\pi_1$ into an equivalent instance of $\pi_2$ (the answer is the same for both instances): thus, such a transformation provides the means for converting any polynomial-time algorithm that solves $\pi_2$ into a corresponding polynomial-time algorithm for solving $\pi_1$.

Next, we introduce the class NP: a decision problem belongs to NP if it can be solved by a polynomial-time nondeterministic algorithm, i.e., an algorithm consisting of two stages: a guessing stage and a polynomial-time checking stage. The first stage provides some structure $s$. The second stage proceeds in a deterministic way and correctly answers YES or NO. For example, consider the well-known Travelling Salesman (TS) problem, for which the instance is a set of cities, the set of integer distances between the cities and an upper bound $B$, and the question is whether there exists a Hamiltonian cycle with length at most $B$; the guessing stage provides a sequence $s$ of the cities and the checking stage checks in polynomial time whether $s$ is a Hamiltonian cycle of length no more than $B$.

For a set $S$ of problems, let co-$S$ be the set of problems that are complementary to those in $S$ (i.e., their answers are reversed). We have P = co-P $\subseteq$ NP $\cap$ co-NP, but membership in NP does not seem to imply membership in co-NP (see Figure 20.1). For instance, the complement of TS is to determine whether all Hamiltonian cycles have length at least $B + 1$ and there is no known way to verify a YES answer short of examining a very large proportion of all possible Hamiltonian cycles, which is not known to be achievable in polynomial time.

Among problems in NP, some have the property that all other problems in NP can be polynomially reduced to them. This particular class of problems is denoted by NP-C and its members are called NP-complete problems.
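The checking stage described above for TS can be sketched in a few lines. This is only an illustration: the function name `check_tour`, the distance-matrix encoding and the four-city instance are assumptions made for the example, not part of the text.

```python
def check_tour(dist, bound, s):
    """Checking stage for TS (illustrative sketch).

    dist  -- square matrix (list of lists) of integer distances between cities
    bound -- the upper bound B of the instance
    s     -- the guessed structure: a sequence of city indices
    Returns True iff s is a Hamiltonian cycle of length at most bound.
    """
    n = len(dist)
    # s must list every city exactly once.
    if sorted(s) != list(range(n)):
        return False
    # Add up the n legs of the cycle, including the return to the first city.
    length = sum(dist[s[i]][s[(i + 1) % n]] for i in range(n))
    return length <= bound


# Hypothetical instance with four cities.
DIST = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(check_tour(DIST, 18, [0, 1, 3, 2]))
```

The check runs in time polynomial in the number of cities, whereas the guessing stage is what a deterministic algorithm would have to replace by a search over all sequences.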
If one problem in NP-C could be solved in polynomial time, then so could every problem in NP and P would be equal to NP. The question "P = NP?" is still open and is one of the most challenging in the theory of complexity. Thus,

Figure 20.1: If NP $\neq$ co-NP. [Diagram: the classes NP and co-NP, with the complete problems of each.]

the NP-complete problems can be seen as the most difficult problems in NP. For example, TS is NP-complete and so is 3-satisfiability (3-SAT), for which the instance is a set of variables and a set of clauses containing exactly three different literals (a literal is either a variable $x_i$ or a negated variable $\bar{x}_j$), and the question is whether there exists a truth assignment to the variables such that each clause has at least one true literal. In other words, can the boolean formula $E$ be satisfied, where $E = C_1 \wedge C_2 \wedge \ldots \wedge C_m$, each clause $C_i = x_{i1} \vee x_{i2} \vee x_{i3}$ for $i = 1, 2, \ldots, m$ and $x_{i1}, x_{i2}, x_{i3}$ are three different literals? Such an expression for $E$ is called its conjunctive normal form.

Some problems might be harder than the NP-complete problems, and classes of problems of increasing apparent difficulty, all containing NP, can be defined, which form the polynomial hierarchy. For these classes, the notion of completeness can be extended: a problem $\pi$ belonging to a class $S$ of the polynomial hierarchy is $S$-complete if every problem in $S$ can be polynomially reduced to $\pi$. In particular, the polynomial hierarchy contains classes denoted by $\Pi_0, \Pi_1, \ldots, \Pi_k, \ldots$ and $\Sigma_0, \Sigma_1, \ldots, \Sigma_k, \ldots$, with the following properties: $\Pi_0 = \Sigma_0 = \text{P}$, $\Sigma_1 = \text{NP}$, $\Pi_1 = \text{co-NP}$, $\Pi_k = \text{co-}\Sigma_k$, $\Sigma_k \cup \Pi_k \subseteq \Sigma_{k+1} \cap \Pi_{k+1}$ (see Figure 20.2).

Roughly speaking, a problem is in $\Sigma_k$ if it can be solved by a polynomial-time nondeterministic algorithm with access to an oracle (a subroutine) that provides, in one step of computation, solutions for a problem in $\Sigma_{k-1}$. Another rather informal characterization of $\Sigma_k$ is to represent the instance of a problem $\pi$ by a string $z$; now $\pi \in \Sigma_k$ if and only if $\pi = \{z : \exists y_1 \forall y_2 \ldots$

Figure 20.2: If $\Sigma_{k+1} \neq \Pi_{k+1}$, for $k \geq 1$. [Diagram: the classes $\Sigma_{k+1}$ and $\Pi_{k+1}$, with the complete problems of each.]

$Q y_k \; R(z, y_1, y_2, \ldots, y_k)\}$, where the quantifiers alternate, $Q$ stands for $\forall$ if $k$ is even and $\exists$ if $k$ is odd, $R$ is a polynomial-time recognizable relation and the lengths of the strings $y_1, y_2, \ldots, y_k$ are polynomially bounded by the length of the string $z$. The same characterization holds for $\Pi_k$, with the alternating quantifiers $\forall\exists\forall\ldots$. Thus, the following problem is in $\Pi_k$ and moreover it is $\Pi_k$-complete.

NAME: $\forall_1\exists_2\forall_3 \ldots Q_k$-3-satisfiability ($\forall_1\exists_2\forall_3 \ldots Q_k$-3-SAT), where the quantifiers alternate and $Q$ stands for $\forall$ if $k$ is odd and $\exists$ if $k$ is even.
INSTANCE: $k$ integers $m_1, \ldots, m_k$, a quantified Boolean expression $\forall u_{1,1} \ldots \forall u_{1,m_1} \exists u_{2,1} \ldots \exists u_{2,m_2} \forall u_{3,1} \ldots \forall u_{3,m_3} \ldots Q u_{k,1} \ldots Q u_{k,m_k} E$, where $E$ is in conjunctive normal form, there are three distinct literals in each clause and the quantified variables are all the variables of $E$.
QUESTION: Is it true that for every truth assignment to $u_{1,1}, \ldots, u_{1,m_1}$, there exists a truth assignment to $u_{2,1}, \ldots, u_{2,m_2}$, such that for every truth assignment to $u_{3,1}, \ldots, u_{3,m_3}$, ..., $E$ is satisfied?

To prove that a problem $\pi$ is $S$-complete, we have to check that it belongs


to $S$ and that every problem in $S$ can be polynomially reduced to $\pi$. For the second step, it is sufficient to prove that some known $S$-complete problem $\pi_0$ is polynomially reducible to $\pi$, since all problems in $S$ are polynomially reducible to $\pi_0$ and the reduction process is transitive. Completeness results are conditional; for example, the NP-completeness of a problem $\pi$ means that a polynomial-time algorithm solving $\pi$ exists if and only if P = NP; analogously, for $k \geq 1$, the $\Sigma_k$-completeness of $\pi$ implies that $\pi \in \Sigma_k \setminus \Sigma_{k-1}$, unless $\Sigma_k = \Sigma_{k-1}$.

It is not known whether the polynomial hierarchy is finite or infinite. The first alternative occurs if P = NP; it also occurs if for some $k_0 \geq 1$, $\Sigma_{k_0} = \Pi_{k_0}$, since it can be shown that this would imply that for all $k \geq k_0$, $\Sigma_k = \Pi_k = \Sigma_{k_0}$. It is widely believed that P is not equal to NP, i.e., that no polynomial-time algorithm exists for NP-complete problems.
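To make the quantifier pattern $\forall\exists$ of the class $\Pi_2$ concrete, here is a minimal brute-force sketch for the case $k = 2$ of the problem above. The encoding of literals as signed integers and the names `eval_cnf` and `forall_exists_sat` are assumptions chosen for the illustration; the running time is exponential in the number of variables, as one would expect from exhaustive search.

```python
from itertools import product


def eval_cnf(clauses, assignment):
    """Evaluate a CNF formula.

    clauses    -- list of clauses, each a list of nonzero integers:
                  +i stands for variable i, -i for its negation
    assignment -- dict mapping each variable index to True or False
    """
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


def forall_exists_sat(clauses, universal, existential):
    """Decide by exhaustive search whether, for every assignment to the
    universal variables, some assignment to the existential variables
    satisfies the formula."""
    for u_vals in product([False, True], repeat=len(universal)):
        assignment = dict(zip(universal, u_vals))
        found = False
        for e_vals in product([False, True], repeat=len(existential)):
            assignment.update(zip(existential, e_vals))
            if eval_cnf(clauses, assignment):
                found = True
                break
        if not found:
            return False  # this universal assignment has no witness
    return True


# Hypothetical instance: for all x1 there exist x2, x3 satisfying both clauses.
print(forall_exists_sat([[1, 2, 3], [-1, -2, 3]], [1], [2, 3]))
```

The two nested loops mirror the two quantifiers; a $\Pi_k$ instance would need $k$ nested levels.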

20.2

The complexity of computing the covering radius of a binary code

The complexity of computing bounds on the covering radius of a binary code has been studied first in the linear case, then in the general unrestricted case. The seemingly rather paradoxical result is that the problem of upperbounding the covering radius of a linear code is $\Pi_2$-complete, whereas the problem of upperbounding the covering radius of a nonlinear code is "only" co-NP-complete. This fact can be explained by the more compact representation of a linear code: the size of a problem involving an $[n, k]$ (respectively, $(n, K)$) code is $n \times k$ (respectively, $n \times K$), but the latter representation provides, explicitly but uneconomically, all the elements of the code. Co-NP-completeness and $\Pi_2$-completeness results mean that it is unlikely that the covering radius problem may be solved by a polynomial-time algorithm.

We first show the $\Pi_2$-completeness of the linear version, then the co-NP-completeness of the unrestricted version (whose proof is much shorter and simpler). The problem of deriving an upper bound on the covering radius of a binary linear code can be stated (as a decision problem) as follows:

NAME: Upper bound on the covering radius of a binary linear code (UB-LIN).
INSTANCE: A binary linear code $C$, given by a parity check matrix $H$ of dimensions $m \times n$, an integer $w$.
QUESTION: Is it true that $\forall y \in \mathbb{F}^m$, $\exists x \in \mathbb{F}^n$, such that $Hx^T = y$ and $w(x) \leq w$?
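For very small instances the question of UB-LIN can be answered by exhaustive search. The sketch below assumes that each column of $H$ is encoded as an $m$-bit integer; the name `ub_lin` and the example matrices are illustrative choices, and the method is brute force, not an efficient algorithm.

```python
from itertools import combinations


def ub_lin(columns, m, w):
    """Brute-force decision of UB-LIN.

    columns -- the n columns of H, each encoded as an integer below 2**m
    m       -- number of rows of H
    w       -- the integer of the instance
    Returns True iff every y in F^m is a sum of at most w columns of H.
    """
    reachable = set()
    for size in range(w + 1):
        for subset in combinations(range(len(columns)), size):
            s = 0
            for j in subset:
                s ^= columns[j]  # addition in F^m is bitwise XOR
            reachable.add(s)
    return len(reachable) == 2 ** m


# Parity check matrix of the [7,4] Hamming code: columns 1, ..., 7 in binary.
hamming_columns = list(range(1, 8))
print(ub_lin(hamming_columns, 3, 1))
# Identity matrix of order 3 (the code {000}).
print(ub_lin([1, 2, 4], 3, 2), ub_lin([1, 2, 4], 3, 3))
```

The number of subsets examined grows like $\binom{n}{w}$, which is why this approach is limited to toy examples.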


Indeed, since the covering radius of $C$ is the smallest positive integer $R$ such that any $m$-tuple is the sum of at most $R$ columns of $H$ (see Theorem 2.1.9), we get a YES answer to the question if and only if $R \leq w$.

Theorem 20.2.1 The decision problem UB-LIN is $\Pi_2$-complete.

Proof. We see that the problem UB-LIN presents the pattern $\forall\exists$. Thus it belongs to $\Pi_2$. We reduce, in two steps, the problem $\forall\exists$-3-satisfiability, which is, as we already mentioned, $\Pi_2$-complete, to UB-LIN.

NAME: $\forall\exists$-3-satisfiability ($\forall\exists$-3-SAT).
INSTANCE: A quantified Boolean expression $\forall u_1 \ldots \forall u_{m_1} \exists v_{m_1+1} \ldots \exists v_{m_1+m_2} E$, where $E$ is in conjunctive normal form, there are three distinct literals in each clause and the quantified variables are all the variables of $E$.
QUESTION: Is it true that for every truth assignment to $u_1, \ldots, u_{m_1}$, there exists a truth assignment to $v_{m_1+1}, \ldots, v_{m_1+m_2}$ that satisfies $E$?

We first reduce $\forall\exists$-3-SAT to $\forall\exists$-3-dimensional matching ($\forall\exists$-3-DM), then we reduce $\forall\exists$-3-DM to UB-LIN.

NAME: $\forall\exists$-3-dimensional matching ($\forall\exists$-3-DM).
INSTANCE: Two disjoint subsets $M_1$ and $M_2$ of $X_1 \times X_2 \times X_3$, where $X_1$, $X_2$ and $X_3$ are three disjoint sets of the same cardinality.
QUESTION: Is it true that $\forall S_1 \subseteq M_1$, $\exists S_2 \subseteq M_2$, such that $S_1 \cup S_2$ is a matching?

Recall that a matching $S$ is a subset of $M_1 \cup M_2$ with $|X_1|$ elements such that no two triples in $S$ agree in any coordinate.

Starting from any instance of $\forall\exists$-3-SAT, we have to construct, in polynomial time, an instance of $\forall\exists$-3-DM in such a way that positive and negative instances correspond in $\forall\exists$-3-DM and $\forall\exists$-3-SAT. Let $\forall u_1 \ldots \forall u_{m_1} \exists v_{m_1+1} \ldots \exists v_{m_1+m_2} E$ be an instance of $\forall\exists$-3-SAT, with $E = C_1 \wedge C_2 \wedge \ldots \wedge C_m$. Let $n$ be the number of variables in $E$. For each variable $w_i$ that occurs in $E$, let $T_i = T_i^t \cup T_i^f$, where

$T_i^t = \{(\bar{w}_i[j], a_i[j], b_i[j]) : 1 \leq j \leq m\}$

and

$T_i^f = \{(w_i[j], a_i[j+1], b_i[j]) : 1 \leq j < m\} \cup \{(w_i[m], a_i[1], b_i[m])\}$.


The structure of $T_i$ depends on $m$. It involves "internal" elements $a_i[j] \in X_2$, $b_i[j] \in X_3$ which will not occur outside of $T_i$ and "external" elements $w_i[j]$, $\bar{w}_i[j]$, elements of $X_1$, which will occur in other triples. Since none of the internal elements will appear outside of $T_i$, any matching will have to include exactly $m$ triples from $T_i$, either all triples in $T_i^t$ or all triples in $T_i^f$. Hence the set $T_i$ forces a matching to choose between setting $w_i$ true and setting $w_i$ false. Thus, a matching $M'$ specifies a truth assignment, with a variable $w_i$ being set true if and only if $M' \cap T_i = T_i^t$.

For each clause $C_j$, let

$c_j = \{(w_i[j], s_1[j], s_2[j]) : w_i \in C_j\} \cup \{(\bar{w}_i[j], s_1[j], s_2[j]) : \bar{w}_i \in C_j\}$;

$|c_j| = 3$. The elements $s_1[j] \in X_2$, $s_2[j] \in X_3$ are internal. Thus any matching $M'$ will contain exactly one triple from $c_j$. This can only be done, however, if some $w_i[j]$ (or $\bar{w}_i[j]$) for a literal $w_i \in C_j$ ($\bar{w}_i \in C_j$) does not occur in the triples in $T_i \cap M'$, which will be the case if and only if the truth assignment, determined by $M'$, sets true $w_i$ ($\bar{w}_i$, respectively), i.e., $C_j$ is satisfied.

Up to now, $X_1$ contains $2mn$ elements: $w_i[j], \bar{w}_i[j]$ for $1 \leq i \leq n$ and $1 \leq j \leq m$; $X_2$ contains $m(n+1)$ elements: $a_i[j], s_1[j]$; $X_3$ contains $m(n+1)$ elements: $b_i[j], s_2[j]$. If a matching exists, because of the internal elements in the sets $T_i$ and $c_j$, it must contain exactly $mn + m = m(n+1)$ triples belonging to these sets. We now define a set $G$ consisting of triples containing the external elements $w_i[j], \bar{w}_i[j] \in X_1$ and $m(n-1)$ additional internal elements belonging to $X_2$ ($X_3$), so that $|X_1| = |X_2| = |X_3| = 2mn$:

$G = \{(w_i[j], g_1[k], g_2[k]), (\bar{w}_i[j], g_1[k], g_2[k]) : 1 \leq k \leq m(n-1), 1 \leq i \leq n, 1 \leq j \leq m\}$.

Each pair $g_1[k], g_2[k]$ must be matched with a $w_i[j]$ or $\bar{w}_i[j]$ that does not occur in any triple of $M' \setminus G$. There are $m(n-1)$ such external elements and the structure of $G$ ensures that they can always be matched by choosing $M' \cap G$ appropriately. Thus $G$ guarantees that, whenever a set satisfies all the constraints imposed by the sets $T_i$ and $c_j$, then it can be extended to a matching.

Finally, let $M = \left(\bigcup_{i=1}^{n} T_i\right) \cup \left(\bigcup_{j=1}^{m} c_j\right) \cup G$. It is a subset of $X_1 \times X_2 \times X_3$, with

$X_1 = \{w_i[j], \bar{w}_i[j] : 1 \leq i \leq n, 1 \leq j \leq m\}$,

$X_2 = \{a_i[j] : 1 \leq i \leq n, 1 \leq j \leq m\} \cup \{s_1[j] : 1 \leq j \leq m\} \cup \{g_1[j] : 1 \leq j \leq m(n-1)\}$,

$X_3 = \{b_i[j] : 1 \leq i \leq n, 1 \leq j \leq m\} \cup \{s_2[j] : 1 \leq j \leq m\} \cup \{g_2[j] : 1 \leq j \leq m(n-1)\}$.

All in all, $M$ contains $2mn + 3m + 2m^2 n(n-1)$ elements and is constructed in polynomial time. From the comments made during the description of $M$, it follows that $M$ cannot contain a matching unless $E$ can be satisfied by some truth assignment to its variables. Conversely, if there exists a truth assignment that satisfies $E$, let $M' \subseteq M$ be constructed as follows: for each clause $C_j$, let $z_j \in \{w_i, \bar{w}_i : 1 \leq i \leq n\} \cap C_j$ be a literal that is set true (there is at least one). Set

$M' = \left(\bigcup_{w_i = \text{true}} T_i^t\right) \cup \left(\bigcup_{w_i = \text{false}} T_i^f\right) \cup \left(\bigcup_{j=1}^{m} \{(z_j[j], s_1[j], s_2[j])\}\right) \cup G'$,

where $G'$ is a well chosen subset of $G$ that includes all the $g_1[k], g_2[k]$ and remaining $w_i[j]$ and $\bar{w}_i[j]$. Such a $G'$ can always be found and the resulting $M'$ is a matching included in $M$.

Next, for all variables $u_i$ ($1 \leq i \leq m_1$), let $M_1$ contain one triple belonging to $T_i^t$ and let $M_2 = M \setminus M_1$. This construction is polynomial. We now have to prove that for every truth assignment to $u_1, \ldots, u_{m_1}$, there exists a truth assignment to $v_{m_1+1}, \ldots, v_{m_1+m_2}$ that satisfies $E$ if and only if for all $S_1 \subseteq M_1$, there exists $S_2 \subseteq M_2$, such that $S_1 \cup S_2$ is a matching with sets $X_1, X_2, X_3, M_1$ and $M_2$ as above.

First suppose that for all $S_1 \subseteq M_1$, there exists $S_2 \subseteq M_2$, such that $S_1 \cup S_2$ is a matching. For any truth assignment to $u_1, \ldots, u_{m_1}$, let $S_1$ contain the triples in $M_1$ that correspond to the variables that are set true. Let $S_2 \subseteq M_2$ be such that $S_1 \cup S_2$ is a matching. Assign true to those variables $v_j$ for which $T_j^t \subseteq S_2$ and false to the others. This assignment makes $E$ true.

Conversely, assume that for every truth assignment to $u_1, \ldots, u_{m_1}$, there exists a truth assignment to $v_{m_1+1}, \ldots, v_{m_1+m_2}$ that satisfies $E$. Let $S_1$ be any subset of $M_1$. Assign true to those variables $u_i$ for which $T_i^t \cap S_1$ is nonempty and false to the others. Assign truth values to the variables $v_j$ so that $E$ is true. There exists a matching $S$ that contains $T_i^t$, $T_j^t$ for all the true variables $u_i, v_j$ and for none of the false variables. Let $S_2 = S \setminus S_1$: $S = S_1 \cup S_2$ and

$S_2 \subseteq M_2$.

Next, starting from any instance of $\forall\exists$-3-DM, we have to construct, in polynomial time, an instance of UB-LIN in such a way that positive and negative instances correspond in UB-LIN and $\forall\exists$-3-DM. Let $M_1, M_2, X_1, X_2, X_3$ be an instance of $\forall\exists$-3-DM. Let $M = M_1 \cup M_2$ (so $|M| = |M_1| + |M_2|$), let $p = |X_i|$ and $X_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,p}\}$ for $i = 1, 2, 3$. Let $w = p$ and

$H = \begin{bmatrix} H_1 \\ H_2 \end{bmatrix}$

be a binary matrix of dimensions $(3p + |M_1|) \times 8|M|$, where $H_1$ and $H_2$ have dimensions $3p \times 8|M|$ and $|M_1| \times 8|M|$, respectively. If $t = (x_{1,i}, x_{2,j}, x_{3,k})$ belongs to $M$, then we let $H_1$ contain one column, $H_1^{t(0)}$, with exactly three ones in the positions corresponding to $x_{1,i}$, $x_{2,j}$ and $x_{3,k}$, and seven columns, $H_1^{t(1)}, H_1^{t(2)}, \ldots, H_1^{t(7)}$, obtained from $H_1^{t(0)}$ by replacing ones by zeros in all possible ways. So each triple $t \in M$ is associated with eight columns in $H$. Each row in $H_2$ corresponds to a triple in $M_1$, with ones in the eight columns associated with this triple and zeros elsewhere. This construction is polynomial. We now have to prove that

$\forall S_1 \subseteq M_1$, $\exists S_2 \subseteq M_2$, such that $S_1 \cup S_2$ is a matching

if and only if

$\forall y \in \mathbb{F}^m$, $\exists x \in \mathbb{F}^n$, such that $Hx^T = y$ and $w(x) \leq w$,

with $m = 3p + |M_1|$, $n = 8|M|$, matrix $H$ as above and $w = p = |X_1|$.

Assume that for all $S_1 \subseteq M_1$, there exists $S_2 \subseteq M_2$, such that $S_1 \cup S_2$ is a matching. Let $y$ be any vector of length $3w + |M_1|$. The last $|M_1|$ coordinates of $y$ correspond to the triples in $M_1$ and the ones in these locations select a subset $S_1$ of $M_1$. Choose $S_2 \subseteq M_2$, such that $S = S_1 \cup S_2$ is a matching. Let $y'$ be the vector of length $3w$ obtained from $y$ by taking its first $3w$ components. Let $*$ stand for the componentwise product. Because $\sum_{t \in S} H_1^{t(0)} = 1^{3w}$, we get:

$y' = \left(\sum_{t \in S} H_1^{t(0)}\right) * y' = \sum_{t \in S} \left(H_1^{t(0)} * y'\right)$.

Since $H_1^{t(0)} * y' = H_1^{t(j)}$ for some $j$ between 0 and 7, this means that $y'$ is the sum of $|S| = w$ columns of $H_1$. But the way $H$ was constructed and $S_1$ was chosen from $y$ also shows that the sum of these same $w$ columns of $H$ is equal to $y$, i.e., $y = Hx^T$, with $w(x) = w$. Since $y$ was arbitrary, the UB-LIN property holds.

Conversely, assume that for all $y \in \mathbb{F}^{3w+|M_1|}$, there exists $x \in \mathbb{F}^{8|M|}$, such that $Hx^T = y$ and $w(x) \leq w$. For any set $S_1 \subseteq M_1$, let $y$ be the vector with all ones in the first $3w$ coordinates and with ones in those coordinates among the last $|M_1|$ that correspond to triples in $S_1$. Then $H_1 x^T = 1^{3w}$ and $H_1 x^T$ is the sum of at most $w$ columns of $H_1$, each column containing at most three ones. Thus, $H_1 x^T$ is the sum of exactly $w$ columns of $H_1$, each column containing exactly three ones. So $Hx^T$ is the sum of $w$ columns $H^{t(0)}$ of $H$ and these $w$ columns select a matching $S$. Since this sum has ones in just those positions among the last $|M_1|$ that correspond to triples in $S_1$, it follows that the triples in $M_1$ that are contained in $S$ are just those in $S_1$. Therefore $S = S_1 \cup S_2$, where $S_2 \subseteq M_2$. Thus the $\forall\exists$-3-dimensional matching property holds.
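The second reduction can be tried out on toy instances. The sketch below builds the columns of $H$ from an instance of $\forall\exists$-3-DM and compares, by exhaustive search, the answers to both questions. The encoding is an assumption made for the example: elements of each $X_i$ are numbered $0, \ldots, p-1$, a column is an integer whose bit $cp + a$ stands for element $a$ of $X_{c+1}$, and bit $3p + r$ stands for the $r$-th triple of $M_1$. All function names are hypothetical.

```python
from itertools import combinations


def is_matching(triples, p):
    """p triples that pairwise differ in every coordinate."""
    return len(triples) == p and all(
        len({t[c] for t in triples}) == p for c in range(3))


def subsets(items):
    for r in range(len(items) + 1):
        yield from combinations(items, r)


def forall_exists_3dm(m1, m2, p):
    """Brute-force answer to the forall-exists 3-DM question."""
    return all(any(is_matching(s1 + s2, p) for s2 in subsets(m2))
               for s1 in subsets(m1))


def build_h_columns(m1, m2, p):
    """Eight columns per triple of M = M1 u M2, as integers."""
    columns = []
    for idx, t in enumerate(m1 + m2):
        bits = [1 << (c * p + t[c]) for c in range(3)]
        # The rows of H2 tag the eight columns of each triple of M1.
        tag = 1 << (3 * p + idx) if idx < len(m1) else 0
        for keep in range(8):  # keep or zero each of the three ones
            top = sum(b for k, b in enumerate(bits) if keep >> k & 1)
            columns.append(top | tag)
    return columns


def covers_all(columns, m, w):
    """True iff every vector of F^m is a sum of at most w columns."""
    reachable = {0}
    frontier = {0}
    for _ in range(w):
        frontier = {s ^ c for s in frontier for c in columns}
        reachable |= frontier
    return len(reachable) == 2 ** m


# Hypothetical instances with p = 2.
p = 2
m1 = [(0, 0, 0)]
yes_m2 = [(1, 1, 1), (0, 1, 1), (1, 0, 0)]
no_m2 = [(1, 1, 1)]
for m2 in (yes_m2, no_m2):
    cols = build_h_columns(m1, m2, p)
    print(forall_exists_3dm(m1, m2, p),
          covers_all(cols, 3 * p + len(m1), p))
```

On both instances the two printed answers agree, as the proof predicts; this is of course a sanity check on small cases, not a verification of the theorem.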


This proves, together with UB-LIN $\in \Pi_2$, that UB-LIN is $\Pi_2$-complete. $\Box$

We now deal with the nonlinear case and state the (decision) problem of lowerbounding the covering radius of a binary code as follows:

NAME: Lower bound on the covering radius of a binary code (LB-NLIN).
INSTANCE: A binary code $C \subseteq \mathbb{F}^n$, given explicitly by its elements, an integer $w$.
QUESTION: Is it true that $\exists y \in \mathbb{F}^n$, such that $\forall c \in C$, $d(y, c) \geq w$?

We get a YES answer to the question if and only if $R(C) \geq w$. This representation of the problem immediately shows that it belongs to NP, since it can be checked in polynomial time, for a given vector $y$, whether $d(y, C) \geq w$ or not.

In order to prove the NP-completeness of LB-NLIN (Theorem 20.2.5 below), we need the following definitions, notations and easy lemmas. We say that a vector $v = (v_1, v_2, \ldots, v_{2n}) \in \mathbb{F}^{2n}$ is doubled if and only if $v_{2i-1} = v_{2i}$ for all $i = 1, 2, \ldots, n$. Let $u(i) \in \mathbb{F}^{2i}$ denote the vector $(0101 \ldots 01)$ and let $\bar{u}(i)$ denote its complement. Let

$Y_{2n}^1 = \{(01|u(n-1)), (10|u(n-1)), (01|\bar{u}(n-1)), (10|\bar{u}(n-1))\}$.

Lemma 20.2.2 If $v = (v_1, v_2, \ldots, v_{2n}) \in \mathbb{F}^{2n}$ is such that for all $y \in Y_{2n}^1$, $d(v, y) \leq n$, then $v_2 = v_1$. $\Box$

Let $Y_{2n}^j = s_j(Y_{2n}^1)$, where $s_j$ denotes the circular right shift of $2j - 2$ bits, for $j = 2, 3, \ldots, n$.

Lemma 20.2.3 If $v = (v_1, v_2, \ldots, v_{2n}) \in \mathbb{F}^{2n}$ is such that for all $y \in Y_{2n}^j$, $d(v, y) \leq n$, then $v_{2j} = v_{2j-1}$. $\Box$

Let $Y_{2n} = \bigcup_{j=1}^{n} Y_{2n}^j$; $|Y_{2n}| = 2n + 2$, since $(01|u(n-1))$ and $(10|\bar{u}(n-1))$ are invariant under all even circular shifts.

Lemma 20.2.4 A vector $v \in \mathbb{F}^{2n}$ is doubled if and only if $d(y, v) \leq n$ for all $y \in Y_{2n}$. $\Box$
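Lemma 20.2.4 is easy to confirm by exhaustive search for small $n$. The following sketch represents vectors as tuples of bits; the helper names (`build_y`, `lemma_holds`, and so on) are assumptions introduced for the illustration.

```python
from itertools import product


def u(i):
    """The vector (0101...01) of length 2i."""
    return (0, 1) * i


def complement(v):
    return tuple(1 - b for b in v)


def shift_right(v, k):
    """Circular right shift of v by k positions."""
    k %= len(v)
    return v[-k:] + v[:-k] if k else v


def build_y(n):
    """The set Y_{2n}: the four base vectors of Y^1_{2n} together with
    their circular right shifts by 2j - 2 bits, j = 1, ..., n."""
    tail = u(n - 1)
    base = [(0, 1) + tail, (1, 0) + tail,
            (0, 1) + complement(tail), (1, 0) + complement(tail)]
    return {shift_right(y, 2 * j - 2) for y in base for j in range(1, n + 1)}


def is_doubled(v):
    return all(v[2 * i] == v[2 * i + 1] for i in range(len(v) // 2))


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def lemma_holds(n):
    """Check Lemma 20.2.4 on all 2^(2n) vectors of length 2n."""
    y_set = build_y(n)
    return all(
        is_doubled(v) == all(hamming(y, v) <= n for y in y_set)
        for v in product((0, 1), repeat=2 * n))


print(lemma_holds(3), len(build_y(3)))
```

For $n = 3$ the set built by `build_y` has $2n + 2 = 8$ elements, in agreement with the count given above.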

We are now ready to prove the following:


Theorem 20.2.5 The decision problem LB-NLIN is NP-complete.

Proof. We reduce 3-satisfiability, which is mentioned to be NP-complete in Section 20.1, to LB-NLIN.

NAME: 3-satisfiability (3-SAT).
INSTANCE: A boolean formula $E$, in conjunctive normal form, with exactly three distinct literals in each clause.
QUESTION: Can $E$ be satisfied?

Starting from any instance of 3-SAT, we have to construct, in polynomial time, an instance of LB-NLIN in such a way that positive and negative instances correspond in LB-NLIN and 3-SAT. Let $E = C_1 \wedge C_2 \wedge \ldots \wedge C_m$ be an instance of 3-SAT, each clause $C_j$, defined over the set of variables $\{x_1, x_2, \ldots, x_n\}$, consisting of exactly three distinct literals. For each clause $C_j$, let $z(C_j) = (z_1, \ldots, z_{2n}) \in \mathbb{F}^{2n}$ be the vector defined by $z_{2i-1} = z_{2i} = 0$ if $C_j$ contains the literal $\bar{x}_i$; $z_{2i-1} = z_{2i} = 1$ if $C_j$ contains the literal $x_i$; $z_{2i-1} = 0$ and $z_{2i} = 1$ otherwise. We define $C \subseteq \mathbb{F}^{2n+2}$ as follows:

$C = \{(z(C_j)|00) : 1 \leq j \leq m\} \cup Y_{2(n+1)}$.

Finally, let $w = n + 1$. The code $C$ contains $m + 4 + 2n$ codewords and its construction is polynomial in $nm$, the size of the instance of 3-SAT. We now have to prove that $E$ can be satisfied if and only if $C$ has covering radius at least $w$.

First suppose that $E$ can be satisfied; a truth assignment to the variables $\{x_1, x_2, \ldots, x_n\}$ that satisfies $E$ can be represented by a vector $v = (v_1, v_2, \ldots, v_n) \in \mathbb{F}^n$. Let $v^* = (v_1, v_1, v_2, v_2, \ldots, v_n, v_n, 0, 0) \in \mathbb{F}^{2n+2}$. Then for all $c \in Y_{2(n+1)}$, $d(c, v^*) \leq n + 1$ (by Lemma 20.2.4). Moreover, in each clause $C_j$ there is at least one literal which is set true by $v$; it is easy to see then that $d((z(C_j)|00), v^*) \leq 2 + 2 + 0 + (n - 3) = n + 1$. Finally, for all $c \in C$, $d(c, v^*) \leq n + 1$. This implies that $d(C, \bar{v}^*) \geq 2n + 2 - (n + 1) = n + 1$, so $R(C) \geq w$.

Conversely, assume that $R(C) \geq w = n + 1$. Then there exists $v^* \in \mathbb{F}^{2n+2}$ such that $d(v^*, C) \geq n + 1$ and $d(\bar{v}^*, c) \leq n + 1$ for all codewords $c$. In particular, for all $c \in Y_{2(n+1)}$, $d(\bar{v}^*, c) \leq n + 1$. By Lemma 20.2.4, $\bar{v}^*$ is doubled:


$\bar{v}^* = (v_1, v_1, v_2, v_2, \ldots, v_{n+1}, v_{n+1})$. Furthermore, $d(\bar{v}^*, (z(C_j)|00)) \leq n + 1$ for all $j$, so

$d((v_1, v_1, v_2, v_2, \ldots, v_n, v_n), z(C_j)) \leq n + 1$.

Let $v = (v_1, v_2, \ldots, v_n)$. The structure of $z(C_j)$ shows that there exists $i \in \{1, \ldots, n\}$ such that $z_{2i-1} = z_{2i}$ and $d((v_i, v_i), (z_{2i-1}, z_{2i})) = 0$. This means that the truth assignment defined by $v$ satisfies clause $C_j$: if $z_{2i-1} = z_{2i} = v_i = 1$ (respectively, 0), then variable $x_i$ is true (respectively, false) and $x_i$ (respectively, $\bar{x}_i$) belongs to $C_j$. Hence $E$ is satisfied. $\Box$
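The reduction used in this proof can be exercised on small formulas. The sketch below builds the code $C$ from a 3-SAT instance and answers LB-NLIN by exhaustive search over $\mathbb{F}^{2n+2}$. It assumes that literals are encoded as signed integers and that the three literals of a clause involve three different variables; every function name is hypothetical.

```python
from itertools import product


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def build_y(n):
    """The set Y_{2n} of Lemma 20.2.4, as tuples of bits."""
    tail = (0, 1) * (n - 1)
    comp = tuple(1 - b for b in tail)
    base = [(0, 1) + tail, (1, 0) + tail, (0, 1) + comp, (1, 0) + comp]
    out = set()
    for y in base:
        for j in range(n):
            k = 2 * j  # circular right shift by an even number of bits
            out.add(y[-k:] + y[:-k] if k else y)
    return out


def z_vector(clause, n):
    """z(C_j): pair i is 11 for literal x_i, 00 for its negation, else 01."""
    z = [0, 1] * n
    for lit in clause:
        i = abs(lit) - 1
        z[2 * i] = z[2 * i + 1] = 1 if lit > 0 else 0
    return tuple(z)


def sat_to_lb_nlin(clauses, n):
    """Return the code C (a set of tuples of length 2n + 2) and w = n + 1."""
    code = {z_vector(c, n) + (0, 0) for c in clauses} | build_y(n + 1)
    return code, n + 1


def lb_nlin(code, length, w):
    """Brute force: is some vector at distance >= w from every codeword?"""
    return any(all(hamming(y, c) >= w for c in code)
               for y in product((0, 1), repeat=length))


def satisfiable(clauses, n):
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c)
                   for c in clauses)
               for bits in product((False, True), repeat=n))


# Hypothetical instances over n = 3 variables.
n = 3
sat_clauses = [(1, 2, 3), (-1, 2, -3)]
unsat_clauses = [(a, 2 * b, 3 * c) for a, b, c in product((1, -1), repeat=3)]
for clauses in (sat_clauses, unsat_clauses):
    code, w = sat_to_lb_nlin(clauses, n)
    print(satisfiable(clauses, n), lb_nlin(code, 2 * n + 2, w))
```

In both cases the two printed values coincide, which is what the theorem asserts for these instances.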

As an immediate consequence, the decision problem associated to the problem of upperbounding the covering radius of a nonlinear code is co-NP-complete.

20.3

Derandomization

Recall (see Definition 3.7.5) that a binary $n \times m$ array is called $t$-independent or $t$-surjective if any $n \times t$ subarray contains among its rows all $2^t$ possible $t$-tuples (see for instance Sections 3.7 and 6.2 for the use of surjectivity for covering codes, especially 2-surjectivity). We denote by $f(n, t)$ the maximum number of columns in a $t$-surjective array with $n$ rows, and by $g(m, t)$ the minimum number of rows in a $t$-surjective array with $m$ columns. The study of $f(n, t)$ is equivalent to that of $g(m, t)$. In Section 3.7, we use for $g(m, t)$ the alternative notation $ms_2(m, t)$. A family $\mathcal{F}$ of vectors of $\mathbb{F}^n$ is said to be $t$-independent, or $t$-surjective, if $M$, the $n \times |\mathcal{F}|$ two-dimensional array whose columns are the elements of $\mathcal{F}$, is $t$-surjective. Thus $f(n, t)$ is the maximum size of a $t$-independent family of vectors of $\mathbb{F}^n$. Let us mention two applications of the study of $f(n, t)$.

1. VLSI testing. Suppose a circuit $C$ has $N$ binary inputs with the property that each output is influenced by at most $t$ inputs. Let $\mathcal{F}$ be a $t$-independent family of vectors of $\mathbb{F}^n$ of size $|\mathcal{F}| = N$. Then the $n$ rows of $M$ make up an exhaustive set of test vectors for the circuit $C$: this means that if $C$ responds correctly to the $n$ input vectors, then we can guarantee that $C$ is not faulty.

2. Writing on binary memories with defects. In a defective memory, some positions are stuck at "0" while others are stuck at "1". Suppose the total number of defective positions does not exceed $t$. A code adapted to such a memory is a set $\{M_i\}$ of row-disjoint $n \times |\mathcal{F}_i|$ two-dimensional arrays, where each $\mathcal{F}_i$ is $t$-independent. Encoding message $i$ is done by picking a row out of $M_i$ that "matches" the memory's defects.
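The definition of a $t$-surjective array recalled at the start of this section translates directly into a test. The sketch below is a straightforward check of the definition; the name `is_t_surjective` and the representation of the array as a list of rows are assumptions made for the example.

```python
from itertools import combinations


def is_t_surjective(rows, t):
    """rows: the rows of a binary array, as equal-length tuples of 0/1.

    Returns True iff every choice of t columns shows all 2^t binary
    t-tuples among the rows.
    """
    m = len(rows[0])
    for cols in combinations(range(m), t):
        seen = {tuple(row[c] for c in cols) for row in rows}
        if len(seen) < 2 ** t:
            return False
    return True


# The four even-weight vectors of length 3 form a 2-surjective 4 x 3 array.
even_weight = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_t_surjective(even_weight, 2), is_t_surjective(even_weight, 3))
```

The test examines $\binom{m}{t}$ column sets, so it is polynomial in the size of the array for fixed $t$.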


For $t = 2$, the problem of determining $f(n, t)$ has been solved (cf. Section 6.2). In the general case $t \geq 3$ however, gaps remain between lower and upper estimates of the size of the largest $t$-independent families.

Constructive issues

We use the following terminology for $t$-independent families. An infinite sequence $(\mathcal{F}_n)_{n \in \mathbb{N}}$, where each $\mathcal{F}_n$ is a $t$-independent family of vectors of $\mathbb{F}^n$, is called constructive if there exists an algorithm that computes any member of $\mathcal{F}_n$ in (worst case) time complexity polynomial in $n$. It is called semi-constructive if there is an algorithm which computes any member of $\mathcal{F}_n$ in complexity polynomial in $|\mathcal{F}_n|$.

Nonconstructive bounds on the maximum size $f(n, t)$ of a $t$-independent family of vectors of $\mathbb{F}^n$, when $n$ goes to infinity for constant $t$, have been obtained by random arguments proving that $f(n, t) = 2^{c_t n}$ with

$\frac{1}{t 2^t \ln 2}(1 + o(1)) \leq c_t \leq 2^{\ldots}$    (20.3.1)

Semi-constructive $t$-independent families $(\mathcal{F}_n)$ which also satisfy

$\Omega(t^{-1} 2^{-t}) \leq \frac{1}{n} \log |\mathcal{F}_n|$    (20.3.2)

have been obtained, although the constants are worse than those obtained by random arguments. Theorem 20.3.5 below gives better semi-constructive family sizes. Up to now, the best result on the largest possible constructive exponential sized $t$-independent families is

$|\mathcal{F}_n| = 2^{c_t n}$, where $c_t = 8 t^{-3} 2^{-2t}(1 + o(1))$.    (20.3.3)

We now use an altogether different strategy, somewhat in the spirit of derandomization, for obtaining semi-constructive $t$-independent families. The object is to turn a randomized algorithm into a deterministic one. The idea is to replace random choice on an exponentially large space by exhaustive search on a polynomially small sample space that "inherits" the original probability distribution.

The problem of finding the minimum number $g(m, t)$ of rows of a $t$-surjective array of length $m$ can be rephrased (see Becker and Simon [64] and Zémor and Cohen [703]) as a transversal or covering problem, namely:

Problem 1 Find the minimum cardinality of a subset $\mathcal{T}$ of $\mathbb{F}^m$ such that $\mathcal{T} \cap K_{m-t} \neq \emptyset$ for every $(m - t)$-dimensional face $K_{m-t}$ of $\mathbb{F}^m$.


This is reminiscent of the classical covering problem:

Problem 2 Find the minimum cardinality of a subset $C$ of $\mathbb{F}^n$ such that $C \cap B_r(x) \neq \emptyset$ for every sphere of radius $r$.

The problem is now sufficiently transformed for us to apply a theorem which gives an efficient covering by means of a greedy algorithm. View $\mathbb{F}^m$ as the vertex set of a hypergraph, the set $\mathcal{H}$ of hyperedges being the $(m - t)$-dimensional faces (see Berge [66] for an account on hypergraphs). Note that every hyperedge has size $b = 2^{m-t}$ and that every vertex belongs to $\Delta = \binom{m}{t}$ hyperedges: $(\mathbb{F}^m, \mathcal{H})$ is $b$-uniform and $\Delta$-regular. Let us restate the Johnson-Stein-Lovász theorem in a form convenient to us.

Theorem 20.3.4 If $(V, \mathcal{H})$ is a $b$-uniform and $\Delta$-regular hypergraph, then a greedy algorithm outputs a transversal of $(V, \mathcal{H})$ with at most $\frac{|V|}{b}(1 + \ln \Delta)$ elements.

Proof. For an elementary proof, see Theorem 12.2.1.

Consequently,

$g(m, t) \leq 2^t \left(1 + \ln \binom{m}{t}\right)$.

Asymptotically (for large $m$), this reduces to the lower bound in (20.3.1). As it stands, the complexity of the greedy algorithm is exponential in $m$, and we have not gained anything on the existential result of (20.3.1).

We now improve on this in the following way. Consider an $[m, k]$ linear code $C$ such that its dual $C^\perp$ has minimum distance $t + 1$. Then (see the last paragraph of Section 2.2) $C$ is an orthogonal array of strength $t$. In other words, the $2^k$ codewords of $C$ make up the rows of a $t$-independent array with the stronger requirement that on any $t$ given column positions, every binary $t$-tuple appears in exactly $2^{k-t}$ rows. Consider now the induced hypergraph $(C, \mathcal{H}')$ where $\mathcal{H}' = \{H \cap C : H \in \mathcal{H}\}$. Clearly, $(C, \mathcal{H}')$ is $\binom{m}{t}$-regular and $2^{k-t}$-uniform because of the orthogonal array property. Applying Theorem 20.3.4, we obtain with a greedy algorithm a transversal of $(C, \mathcal{H}')$, which is also a transversal of $(\mathbb{F}^m, \mathcal{H})$. Its size is the same as before, but we now have a substantial gain in complexity: the set of vertices on which we are performing our greedy algorithm has size $2^k$ instead of $2^m$. For a given $t$ and for $m$ a large enough primitive length ($m = 2^r - 1$), we can choose for $C^\perp$ a BCH code (see Section 10.1) with dimension $m - k = m - [\ldots] \log(m + 1)$, yielding $|C| = (m + 1)^{[\ldots]}$. The greedy

Table 20.1: Achievable sizes for $t$-independent families $\mathcal{F}$: $|\mathcal{F}| \geq 2^{c_t n}$.

          constructive                        semi-constructive                     existential
$t = 3$   $1/12.34$ (i)                       $1/9.50$ (ii)                         $1/7.44$ (iii)
$t = 4$   $1/148.68$ (ii)                     $1/44.36$ (iv)                        $1/27.32$ (iii)
$t$       $c_t \approx 8/(t^3 2^{2t})$ (v)    $c_t \approx 1/(t 2^t \ln 2)$ (iv)    $c_t \approx 1/((t-1) 2^t \ln 2)$ (iii)

i. Sloane [593]. ii. Cohen and Zémor [172]. iii. Roux [565]. iv. Theorem 20.3.5. v. (20.3.3).

algorithm is now polynomial in $m$, which means that the $t$-independent family we obtain in this manner is semi-constructive. This is an example of derandomization. Reverting to the notation of $t$-independent families, we have obtained:

Theorem 20.3.5 There is a semi-constructive sequence $\mathcal{F}_n$ of $t$-independent families of $\mathbb{F}^n$ of size $2^{c_t n}$ with $c_t = (1/(t 2^t \ln 2))(1 + o(1))$. $\Box$

This coincides with the lower bound of (20.3.1), and is the best to date in the semi-constructive case for $t \geq 4$.
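The derandomized search can be illustrated on a very small case. The sketch below runs a greedy transversal algorithm on the 8 codewords of the $[7, 3]$ simplex code, whose dual is the $[7, 4]$ Hamming code of minimum distance 3, so that $t = 2$. The choice of this code in place of a BCH code, the tie-breaking rule of the greedy step and all function names are assumptions made for the example.

```python
from itertools import combinations, product
from math import comb, log


def faces(m, t):
    """All (m-t)-dimensional faces of F^m, as (positions, values) pairs."""
    return [(pos, vals) for pos in combinations(range(m), t)
            for vals in product((0, 1), repeat=t)]


def hits(vertex, face):
    pos, vals = face
    return all(vertex[p] == v for p, v in zip(pos, vals))


def greedy_transversal(vertices, m, t):
    """Repeatedly pick a vertex lying in the most faces not yet hit."""
    remaining = faces(m, t)
    chosen = []
    while remaining:
        best = max(vertices,
                   key=lambda v: sum(hits(v, f) for f in remaining))
        left = [f for f in remaining if not hits(best, f)]
        if len(left) == len(remaining):
            raise ValueError("some face contains no vertex of the given set")
        chosen.append(best)
        remaining = left
    return chosen


def simplex_code():
    """The 8 codewords of the [7,3] simplex code (assumed example code)."""
    cols = range(1, 8)  # columns of the generator matrix, as 3-bit integers
    return [tuple(bin(a & c).count("1") % 2 for c in cols) for a in range(8)]


def greedy_bound(m, t):
    """The bound 2^t (1 + ln C(m, t)) on g(m, t)."""
    return 2 ** t * (1 + log(comb(m, t)))


rows = greedy_transversal(simplex_code(), 7, 2)
print(len(rows), greedy_bound(7, 2))
```

The rows returned form a 2-surjective array with 7 columns, and the greedy search only ever looks at $2^k = 8$ candidate vertices instead of $2^m = 128$.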

20.4

Notes

For results on the complexity of other problems related to coding theory, see Berlekamp, McEliece and van Tilborg [69], Ntafos and Hakimi [504], Diaconis and Graham [200], Lobstein and Cohen [451], Bruck and Naor [106], Lobstein [450], Stern [623], Barg [50], Vardy [660], mainly from an NP-completeness point of view.

§20.1 For a deeper, more formal account of the theory of NP-completeness and polynomial hierarchy, we refer the interested reader to Garey and D. S. Johnson [247] and Barthélemy, Cohen and Lobstein [51], which have been largely used for the exposition of Section 20.1. The $\Pi_k$-completeness of $\forall_1\exists_2\forall_3 \ldots Q_k$-3-satisfiability is due to Meyer and Stockmeyer [484].

§20.2 Theorem 20.2.1 was proved by McLoughlin [481] in 1984. In that proof, in the reduction of $\forall\exists$-3-satisfiability to $\forall\exists$-3-dimensional matching, the construction of $M$, starting from formula $E$, is from Garey and D. S. Johnson [247]. Theorem 20.2.5 and the preceding lemmas are by Frances and Litman [241] (1994). However in [130], Carnielli mentions that "Interest in more specific methods of attack on the hyper-rook domain problem is justified because this problem is hard to treat in algorithmic terms: indeed, as pointed in [129], it is an NP-complete problem. This fact can be proved just by showing that a particular case of the problem, when formulated in algorithmic terms, reduces to the matrix domination problem (see [247, §...]) which is known to be an NP-complete problem."

§20.3 The study of $f(n, t)$ originates in Rényi [550], under the name of qualitative independence, and has since been extensively studied by Kleitman and Spencer [380], Alon [14], Roux [565], Freiman, Lipkin and Levitin [242], Alon, Bruck, J. Naor, M. Naor and Roth [18], Sloane [593]. For applications to VLSI testing, see Seroussi and Bshouty [577]. About writing on memories with defects, see, e.g., Dumer [211] and references therein. Let us also mention that $t$-independent families have applications in $\varepsilon$-biased probability spaces, see J. Naor and M. Naor [502], and derandomization, see Alon, Babai and Itai [16]. The values of $f(n, 2)$ are known (see Section 3.7). A $q$-ary generalization, namely the qualitative 2-independence problem, is solved asymptotically by Gargano, Körner and Vaccaro [248].

Inequalities (20.3.1) are due to Kleitman and Spencer [380]. An improvement of their argument yielding better numerical constants is given by Roux [565].

Inequality (20.3.2) is by Freiman, Lipkin and Levitin [242], where the problem of finding the largest possible semi-constructive $t$-independent families is studied. Theorem 20.3.5 is by Cohen and Zémor [172].

The problem of finding the largest possible constructive exponential sized $t$-independent families was studied first in Alon [14], then by Alon, Bruck, J. Naor, M. Naor and Roth [18], where the inequality $\Omega(t^{-1} 2^{-3t}) \leq \frac{1}{n} \log |\mathcal{F}_n|$ was established. The improvement (20.3.3) is due to Cohen and Zémor [172].