Upper bounds for covering arrays by tabu search




Discrete Applied Mathematics 138 (2004) 143 – 152

www.elsevier.com/locate/dam

Kari J. Nurmela
Department of Computer Science, Helsinki University of Technology, Espoo 012150, Finland
Received 31 October 2000; received in revised form 13 February 2003; accepted 22 February 2003

Abstract
A t-covering array is a collection of k vectors in a discrete space with the property that, in any t coordinate positions, all combinations of the coordinate values occur at least once. Such arrays have applications, for example, in software testing and data compression. Covering arrays are sometimes also called t-surjective arrays or qualitatively t-independent families; when t = 2, covering arrays are also called group covering designs or transversal covers. In an optimal covering array the number of vectors is minimized. Constructions for optimal covering arrays are known when t = 2 and the vectors are binary, but in the other cases only upper and lower bounds are known. In this work a tabu search heuristic is used to construct covering arrays that improve on the previously known upper bounds on the sizes of optimal covering arrays.
© 2003 Elsevier B.V. All rights reserved.

1. Introduction
Let $Z_q$ denote the set $\{0, 1, \ldots, q-1\}$ and let $Z = Z_{q_1} \times Z_{q_2} \times \cdots \times Z_{q_n}$ be an arbitrary mixed discrete space. A word in this space is an n-tuple (vector) $x = (x_1, x_2, \ldots, x_n)$, where $x_i \in Z_{q_i}$. In most cases in this paper $q = q_1 = q_2 = \cdots = q_n$, and we denote the space by $Z = Z_q \times Z_q \times \cdots \times Z_q = Z_q^n$. A code over Z is a multiset of words of Z. A t-covering array in Z is a code C over Z with the property that, given any word $w = (w_1, w_2, \ldots, w_n) \in Z$ and any t distinct coordinate positions $i_1, i_2, \ldots, i_t$, there is a word $x = (x_1, x_2, \ldots, x_n) \in C$ such that $x_{i_1} = w_{i_1}, x_{i_2} = w_{i_2}, \ldots, x_{i_t} = w_{i_t}$. The parameter t is sometimes called the strength of the covering array; most of the literature considers the cases t = 2 and t = 3.

Supported by the Academy of Finland.

0166-218X/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/S0166-218X(03)00291-9


Covering arrays have applications in a number of fields, including software testing, compression of inconsistent data, computer architecture design, search theory, and truth functions (see [16] for references). Covering arrays are sometimes also called t-surjective arrays or qualitatively t-independent families. When t = 2, covering arrays can be seen as group covering designs or transversal covers. It is easy to construct a covering array if the number of words does not have to be small, or if the number of coordinates is very small. In practical applications, arrays with a minimum number of words are desired. A covering array in Z with the smallest possible number of words is called optimal. Orthogonal arrays of index one are examples of optimal covering arrays; however, such orthogonal arrays exist only when the length of the words is small [5]. A generalized version of the problem of finding optimal t-covering arrays is NP-complete [12], and no efficient methods are known for constructing large optimal covering arrays, except for two-covering arrays in $Z_2^n$, a case solved completely by Rényi, Katona, Kleitman, and Spencer (see [13] for references).

1.1. Example
It is often convenient to write a covering array in matrix form. The matrix form of a t-covering array with k words in $Z = Z_{q_1} \times Z_{q_2} \times \cdots \times Z_{q_n}$ has k rows and n columns, where the rows are the words of the array. The order of the rows is not significant: matrices obtained by different permutations of the rows correspond to the same covering array. Furthermore, two covering arrays are equivalent if one can be obtained from the other by a permutation of coordinates of equal arity and by a permutation of the values in each coordinate.
A nontrivial example of a covering array is

$$
A = \left(\begin{array}{*{20}{c}}
0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&1&1&1&1&1&1&1&1&1&1&1&1&1\\
0&0&0&0&0&1&1&0&1&2&2&2&2&2&2&2&2&2&2&2\\
0&1&1&1&1&0&1&2&2&0&0&0&0&1&1&1&2&2&2&2\\
0&2&2&2&2&2&2&0&1&0&0&0&0&1&2&2&0&1&1&1\\
1&0&1&1&1&2&2&0&1&0&1&1&2&0&0&1&1&0&1&2\\
1&1&2&2&2&1&0&1&0&2&1&1&0&0&2&1&2&2&1&0\\
1&2&0&1&2&0&2&1&0&2&0&2&2&1&0&2&1&0&2&1\\
1&2&1&0&2&2&1&2&0&1&2&1&1&0&1&2&0&2&0&1\\
1&2&1&2&0&2&1&1&2&2&1&0&1&2&0&0&2&1&0&0\\
2&0&2&2&2&0&1&2&2&1&2&2&0&2&2&0&1&0&1&2\\
2&1&0&2&1&2&0&2&2&2&1&2&2&0&1&2&0&1&2&0\\
2&1&2&0&1&1&2&0&2&1&0&1&1&2&1&0&2&0&2&1\\
2&1&2&1&0&1&2&2&1&1&2&0&2&1&0&0&1&2&0&0\\
2&2&1&1&1&1&0&1&0&0&2&2&1&2&2&1&0&1&0&2
\end{array}\right), \tag{1}
$$


which is a two-covering array in $Z_3^{20}$ with 15 words.

We say that a t-combination of Z consists of t distinct coordinate positions together with values for them, each value conforming to the arity of its coordinate in Z. A t-combination is denoted by a set of ordered pairs $\{(p_1, u_1), (p_2, u_2), \ldots, (p_t, u_t)\}$, where $p_1, p_2, \ldots, p_t$ are distinct coordinate positions and $u_1, u_2, \ldots, u_t$ are the corresponding coordinate values. A word $x = (x_1, x_2, \ldots, x_n) \in Z$ covers a t-combination $p = \{(p_1, u_1), (p_2, u_2), \ldots, (p_t, u_t)\}$ if $x_{p_1} = u_1, x_{p_2} = u_2, \ldots, x_{p_t} = u_t$. A t-covering array in Z is thus a collection of words of Z that covers all possible t-combinations in Z. For example, the two-combination $\{(1, 0), (2, 1)\}$ is covered only by the word in the fourth row of (1), while the two-combination $\{(1, 0), (2, 0)\}$ is covered by each of the words in the first three rows. Previously published two-covering arrays in $Z_3^n$ with 15 words have n = 13 [3], n = 15 [4,13], and n = 16 [14], so n = 20 in (1) is a clear improvement. It is not known whether 20 is the maximal number of coordinates in this case, so n = 20 is only a lower bound on the number of ternary coordinates that can be present in a covering array with 15 words.

1.2. Bounds on covering arrays
Let $g_t(Z_q^n)$ be the smallest possible number of words in a t-covering array in $Z_q^n$. Array (1) shows that $g_2(Z_3^{20}) \le 15$. On the other hand, let $n_{t,q}(k)$ be the largest n such that a t-covering array with k words exists in $Z_q^n$. Array (1) gives $n_{2,3}(15) \ge 20$. Clearly, a lower bound n on $n_{t,q}(k)$ gives an upper bound k on $g_t(Z_q^n)$, and vice versa. Either approach can be used when trying to find (near-)optimal covering arrays; in this work we try to find good upper bounds on $g_t(Z)$ by constructing covering arrays with a computer search.

Lower bounds on $g_t(Z)$ include the trivial bounds $g_t(Z_q^n) \ge q^t$ and

$$ g_t(Z) \ge \prod_{q \in Q} q, \tag{2} $$

where $Q \subseteq \{q_1, q_2, \ldots, q_n\}$ consists of the arities of t distinct coordinates, so that $|Q| = t$. These bounds do not depend on the number of coordinates, in contrast to the lower bound $g_2(Z_q^n) \ge \frac{q}{2}\log_2 n + 1$ in [16].
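The covering condition and the trivial bound (2) are easy to check by brute force for small arrays. The following Python sketch is an illustration of our own: the names `uncovered`, `is_covering_array`, and `trivial_lower_bound` are hypothetical, and positions are 0-based rather than 1-based as in the text. It enumerates every t-combination and applies the check to array (1) as transcribed above.

```python
from itertools import combinations, product
from math import prod

def uncovered(rows, arities, t):
    """Return the t-combinations of Z that no row covers.

    A t-combination is a tuple of (position, value) pairs with distinct,
    increasing positions; positions are 0-based here.
    """
    missing = set()
    for cols in combinations(range(len(arities)), t):
        seen = {tuple(row[c] for c in cols) for row in rows}
        for vals in product(*(range(arities[c]) for c in cols)):
            if vals not in seen:
                missing.add(tuple(zip(cols, vals)))
    return missing

def is_covering_array(rows, arities, t):
    return not uncovered(rows, arities, t)

def trivial_lower_bound(arities, t):
    """Bound (2) with the best choice of Q: the t largest arities."""
    return prod(sorted(arities, reverse=True)[:t])

# Array (1): 15 words of Z_3^20, one string per row.
A1 = [[int(ch) for ch in s] for s in (
    "00000000000000000000", "00000001111111111111", "00000110122222222222",
    "01111012200001112222", "02222220100001220111", "10111220101120011012",
    "11222101021100212210", "12012021020221021021", "12102212012110120201",
    "12120211221012002100", "20222012212202201012", "21021202221220120120",
    "21201120210112102021", "21210122112021001200", "22111101002212210102",
)]

print(is_covering_array(A1, [3] * 20, 2))
print(trivial_lower_bound([3] * 20, 2))  # 9
# {(1,0),(2,1)} in the text's 1-based notation: which rows cover it?
print([i + 1 for i, row in enumerate(A1) if row[0] == 0 and row[1] == 1])  # [4]
```

Brute force is only practical for small n and t, since the number of t-combinations grows like $\binom{n}{t} q^t$.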


In some rare cases, integer programming has been used to prove lower bounds for specific instances of covering arrays [13]. Nonexistence results for orthogonal arrays improve slightly on the bound (2) (see, for example, [5]). For further lower bounds, see [6,9,10,14,16]. The sizes (and constructions) of the optimal two-covering arrays in $Z_2^n$ are known: we have

$$ n_{2,2}(k) = \binom{k-1}{\lceil k/2 \rceil} $$

(see [13] for references to the works by Rényi, Katona, Kleitman, and Spencer).

1.3. Covering arrays in software testing
When covering arrays are applied to software testing, each word of the array corresponds to a test case and each coordinate value gives the value of the corresponding test parameter; see, e.g., [3]. In this application the space is usually mixed, because different parameters can have different numbers of possible values. Often some t-combinations are impossible to test because the parameter values conflict (such combinations are here called infeasible). In these cases we can restrict the space Z so that some words are excluded; we then cover only the feasible t-combinations. Most approaches to testing with covering arrays have used t = 2, but it is also possible to use t > 2. The problem with larger values of t is that the number of test cases grows rapidly. The compromise proposed in [3] is "hierarchical" covering arrays: all feasible two-combinations are covered, and in addition all feasible three-combinations whose coordinate positions belong to a specially selected set (the most critical parameters of the system) are also covered. These extensions can be included easily and reasonably efficiently in the tabu search algorithm of Section 2. However, we did not attempt to tabulate upper bounds for these kinds of arrays, because such tables would necessarily be very large owing to the large number of possible parameter selections.

The rest of this paper is organized as follows. In Section 2 we describe the tabu search method that is used to find better covering arrays than previously known. The search results are summarized in Section 3; more specifically, we give tables of upper bounds on $g_2(Z_q^n)$ for $3 \le q \le 10$ (with emphasis on q = 3), a few improved upper bounds on $g_3(Z_q^n)$, and a couple of examples of improved upper bounds on $g_2(Z)$ where Z is a mixed space.

2. Tabu search for covering arrays
There have been several earlier computational approaches to constructing covering arrays. The AETG system [3] uses a greedy algorithm in which the array is constructed by adding new words one by one according to a special heuristic. AETG is reasonably fast, but it does not usually give optimal covering arrays.
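For comparison, the Python sketch below computes the exact binary optimum $g_2(Z_2^n)$ from the formula for $n_{2,2}(k)$ in Section 1.2 and builds a two-covering array greedily, one word at a time. The greedy rule used here (try a few random candidate words, each seeded with one still-uncovered pair, and keep the one that covers the most uncovered pairs) is a simplified stand-in of our own in the spirit of AETG, not the heuristic of [3]; `g2_binary`, `greedy_covering`, and the `candidates` parameter are hypothetical names.

```python
import random
from itertools import combinations, product
from math import comb

def g2_binary(n):
    """Exact g_2(Z_2^n): least k with n_{2,2}(k) = C(k-1, ceil(k/2)) >= n."""
    k = 1
    while comb(k - 1, (k + 1) // 2) < n:
        k += 1
    return k

def greedy_covering(arities, candidates=50, seed=0):
    """Greedy two-covering array: repeatedly add the best of a few random words."""
    rng = random.Random(seed)
    pairs = list(combinations(range(len(arities)), 2))
    todo = {((i, a), (j, b)) for i, j in pairs
            for a, b in product(range(arities[i]), range(arities[j]))}
    rows = []
    while todo:
        pool = sorted(todo)
        best, best_gain = None, -1
        for _ in range(candidates):
            (i, a), (j, b) = rng.choice(pool)      # one still-uncovered pair
            word = [rng.randrange(q) for q in arities]
            word[i], word[j] = a, b                # guarantees gain >= 1
            gain = sum(((i2, word[i2]), (j2, word[j2])) in todo
                       for i2, j2 in pairs)
            if gain > best_gain:
                best, best_gain = word, gain
        rows.append(best)
        todo -= {((i, best[i]), (j, best[j])) for i, j in pairs}
    return rows

rows = greedy_covering([2] * 10)
print(len(rows), g2_binary(10))   # greedy size vs. exact optimum; the second value is 6
```

Because every added word covers at least one new pair, the loop always terminates, but nothing forces the result to be optimal, which matches the remark about AETG above.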


Stevens uses simulated annealing in his thesis [14] to construct small covering arrays (transversal covers). The cost function is the number of uncovered two-combinations, and a proposed move is formed by selecting a random word and changing a random coordinate value in that word. A proposed move is accepted with a probability that decreases exponentially with time. The number of words is constant during one simulated annealing run; after the run, the number of words is adjusted for the next run according to whether or not a covering array was found. Sloane [13] also mentions an integer programming approach by Cook and a computer program, CATS, by Sherwood.

In this work we use a tabu search approach with a special heuristic neighborhood. Tabu search [7] is a framework for stochastic search heuristics, and it has been applied successfully to a variety of combinatorial optimization problems. Our approach is easiest to describe in terms of the matrix representation of a covering array. We start with a random k × n matrix, where the rows correspond to the words of the covering array. The number of uncovered t-combinations is the cost of the matrix. Next, we select one uncovered t-combination at random and check which rows require only a single element change to cover the selected combination. These changes are the moves of the current neighborhood. The cost change corresponding to each such move is calculated, and the move leading to the smallest cost is selected, provided that the move is not tabu. If there are several equally good nontabu moves, we select one of them at random. Then another uncovered t-combination is selected and the process is repeated until a matrix with zero cost (a covering array) is found or the number of moves reaches a prescribed limit. The number of words is constant during one tabu search run; if a covering array is found, we try to find a better array by reducing the number of words and restarting the tabu search.
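A compact Python sketch of one tabu search run, following the steps just described, is given below. It is an illustration under our own assumptions, not the author's C++ implementation: `tabu_search` and its parameters (`tabu_len` for T, `max_moves`, `seed`) are hypothetical names, coverage counts are kept in a plain dictionary rather than a cache, and when every move of a nonempty neighborhood is tabu the sketch simply spends the step without changing anything (the paper does not say what happens in that case). It also applies the tabu rule and the empty-neighborhood rule described in the next paragraph. Arities may differ, so mixed spaces are handled as well.

```python
import random
from itertools import combinations, product

def tabu_search(arities, t, k, tabu_len=5, max_moves=100_000, seed=0):
    """One run for a k-word t-covering array; returns the rows or None."""
    rng = random.Random(seed)
    n = len(arities)
    subsets = list(combinations(range(n), t))
    by_col = {p: [c for c in subsets if p in c] for p in range(n)}
    rows = [[rng.randrange(q) for q in arities] for _ in range(k)]
    # count[(cols, vals)] = number of rows covering that t-combination
    count = {(c, v): 0 for c in subsets
             for v in product(*(range(arities[i]) for i in c))}
    for row in rows:
        for c in subsets:
            count[(c, tuple(row[i] for i in c))] += 1
    uncovered = {key for key, m in count.items() if m == 0}  # cost = len()
    last_changed = {}  # (row, col) -> step at which it was last changed

    def delta(r, p, v):
        """Cost change if rows[r][p] (currently != v) is set to v."""
        d = 0
        for c in by_col[p]:
            old = tuple(rows[r][i] for i in c)
            new = tuple(v if i == p else rows[r][i] for i in c)
            d += (count[(c, old)] == 1) - (count[(c, new)] == 0)
        return d

    def apply(r, p, v, step):
        for c in by_col[p]:
            key = (c, tuple(rows[r][i] for i in c))
            count[key] -= 1
            if count[key] == 0:
                uncovered.add(key)
        rows[r][p] = v
        for c in by_col[p]:
            key = (c, tuple(rows[r][i] for i in c))
            count[key] += 1
            uncovered.discard(key)
        last_changed[(r, p)] = step

    for step in range(max_moves):
        if not uncovered:
            return rows
        cols, vals = rng.choice(sorted(uncovered))
        neighbourhood = []
        for r, row in enumerate(rows):
            diff = [(p, v) for p, v in zip(cols, vals) if row[p] != v]
            if len(diff) == 1:                  # one change covers it
                neighbourhood.append((r, *diff[0]))
        if not neighbourhood:                   # empty: change any row
            r = rng.randrange(k)
            for p, v in zip(cols, vals):
                if rows[r][p] != v:
                    apply(r, p, v, step)
            continue
        allowed = [(delta(r, p, v), r, p, v) for r, p, v in neighbourhood
                   if step - last_changed.get((r, p), -tabu_len - 1) > tabu_len]
        if allowed:                             # best nontabu move, ties random
            best = min(m[0] for m in allowed)
            _, r, p, v = rng.choice([m for m in allowed if m[0] == best])
            apply(r, p, v, step)
    return rows if not uncovered else None

# Restart with one word fewer after every success, as described above.
k, best = 12, None
while (found := tabu_search([3] * 4, 2, k, max_moves=20_000)) is not None:
    best, k = found, k - 1
print(len(best))
```

Keeping a count per t-combination makes the cost change of a single-element move depend only on the $\binom{n-1}{t-1}$ combinations through the changed column, which is what keeps each move cheap.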
The tabu condition prevents changing an element of the matrix if it has been changed during the last T moves. The purpose of the tabu condition is to prevent looping and possibly to diversify the search. Looping is not a big problem, because the neighborhood is not symmetric (if a move changes matrix A to matrix A', there is often no move in the neighborhood of A' that changes it back to A), so relatively small values of T work well; in our searches typically $1 \le T \le 10$. Occasionally, at the very beginning of the search, the neighborhood of the random initial matrix can be empty (on every row, more than a single element change is needed to cover the selected t-combination). In these cases we allow changing any of the rows to cover the selected t-combination.

Because very many moves are often required to find a covering array (many of the new arrays in this work were found only after millions of moves), it is important that moves can be performed quickly. It is easy and relatively fast to compute which t-combinations are covered by a given word whenever this information is needed, but it turns out to be advantageous to store this information in memory in case it is needed again in the near future of the computation (a cache for covering information). A complete look-up table is too large to fit in memory in most of the cases studied in this work; as the computation proceeds, the memory used by the least recently needed covering information must be reused.

We do not impose any symmetry constraints on the covering array at any stage of the search. When the covering array is large, this kind of unstructured search becomes inefficient. Limiting the structure of the covering array can help to find larger covering arrays, but such considerations are beyond the scope of this work.
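In Python, a bounded least-recently-used cache of covering information can be sketched with the standard `functools.lru_cache` decorator. This is only an analogue of the cache described above, whose actual layout in the C++ program is not given; `covered_by`, `cost`, and the `maxsize` value are our own assumptions, with `maxsize` playing the role of the memory limit.

```python
from functools import lru_cache
from itertools import combinations
from math import prod

@lru_cache(maxsize=4096)  # evicts the least recently used entries when full
def covered_by(word, t):
    """The t-combinations covered by `word` (a tuple), as ((pos, val), ...)."""
    return frozenset(tuple((p, word[p]) for p in cols)
                     for cols in combinations(range(len(word)), t))

def cost(rows, arities, t):
    """Number of uncovered t-combinations, using the cached covering sets."""
    covered = set()
    for row in rows:
        covered |= covered_by(tuple(row), t)
    total = sum(prod(arities[p] for p in cols)
                for cols in combinations(range(len(arities)), t))
    return total - len(covered)

covered_by.cache_clear()
rows = [[0, 0, 0], [1, 1, 1], [0, 1, 1], [1, 0, 0]]
print(cost(rows, [2, 2, 2], 2))        # 2 (values 01 and 10 in columns 2, 3)
print(cost(rows, [2, 2, 2], 2))        # 2, now served from the cache
print(covered_by.cache_info().hits)    # 4
```

Words must be hashable to serve as cache keys, hence the conversion of each row to a tuple.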

3. Results
The main goal of this study is to tabulate good upper bounds on $g_2(Z_q^n)$ for small q and n, since so little is known about even those values (except for q = 2). Table 1 gives the best currently known upper bounds on $g_2(Z_3^n)$ for $n \le 92378$; if the entry for a given n is missing, the entry for the next larger n applies. Table 2 gives upper bounds on $g_2(Z_q^n)$ for $n \le 15$ and $4 \le q \le 10$. All bounds marked exact (·) either meet (2) or are from [16], except the exact value $g_2(Z_3^5) = 11$, which is attributed to Applegate in [13]. Our search algorithm seems to work best for t = 2 and q = 3. The entries in Table 1 give many improvements on the bounds in [1,13,14]. Table 2 of [1] contains some erroneous values [2]; they are not included in Tables 1 and 2.

Table 1
Upper bounds for $g_2(Z_3^n)$

n       4      5      7      9      10     20     24     30     36     43     60     68     83     104    126
Bound   9·a    11·c   12·d   13d    14b    15b    17b    18b    19b    20b    21b    22b    23b    24b    25b

n       156    191    235    252    312    462    470    792    1716   3003   6435   11440  24310  43758  92378
Bound   26b    27b    28b    31e    32e    33f    34e    36f    39f    42f    45f    48f    51f    54f    57f

· Exact value. No marker: the bound follows from $g_2(Z_q^n) \le g_2(Z_q^{n+1})$ or $g_2(Z_q^n) \le g_2(Z_{q+1}^n)$.
a Orthogonal array is known to exist [5]. b Tabu search. c Ref. [8]. d Ref. [14]. e Eq. (4). f Eq. (3). g Ref. [15].


Table 2
Upper bounds for $g_2(Z_q^n)$, $4 \le q \le 10$, $3 \le n \le 15$ (markers as in Table 1)

n     q=4    q=5    q=6    q=7    q=8    q=9    q=10
3     16·    25·    36·a   49·    64·    81·    100·
4     16·    25·    37·b   49·    64·    81·    100·a
5     16·a   25·    39c    49·    64·    81·    109c
6     19·b   25·a   41c    49·    64·    81·    117c
7     21b    29·b   42     49·    64·    81·    120
8     23b    33c    42c    49·a   64·    81·    120
9     24     35c    48b    63     64·a   81·    120
10    24c    37c    52c    63b    80     81·a   120
11    25c    38c    55c    73c    80d    120    120
12    26c    40c    57c    76c    99c    120    120
13    27     41c    58c    79c    102c   120    120d
14    27c    42c    60c    81c    104c   131c   162c
15    28b    43c    61c    83c    107c   135c   166c

It is difficult to say how close to optimal the new bounds are, because the tabu search algorithm is stochastic: many of the new bounds can probably be improved slightly simply by running the same search method with more computing time. Most of the CPU time was used for q = 3 (the total amount of CPU time used was a few months on a 500 MHz Pentium PC). Implementation details affect the efficiency of the program; we think that our implementation is reasonably efficient, although some minor optimizations could be made. When searching for a two-covering array in $Z_3^{37}$ with 19 words, with the memory accessible to the process limited to 18 MB, our implementation (written in C++) makes roughly $3.3 \times 10^6$ moves per CPU hour on a 500 MHz Pentium running Linux.

The actual covering arrays giving the bounds marked with b in the tables take too much space to be listed here. They are available in matrix form on the World Wide Web at http://www.tcs.hut.fi/~kjnu/covarr.html. The arrays, like the matrices (1) and (5), are given in a canonical form (except the arrays in mixed spaces), in which the string obtained by writing the rows one after another is lexicographically minimized over all permutations of rows and columns and all permutations of the symbols in each column. It can be shown that two covering arrays have the same canonical form exactly when the arrays are equivalent. Tabu search can be used to find several nonequivalent arrays (assuming that several exist) for the same parameters by restarting the search from different initial matrices. In this work we were mainly interested in computing new upper bounds on the sizes of optimal covering arrays, so only one array for each bound is given at the web address above.

In addition to the bounds found by tabu search and the old bounds from the references, there are a few bounds in Table 1 obtained by simple constructions. The largest entries (marked with f) come from a construction in [13], where a ternary two-covering array is made from three copies of an optimal binary two-covering array (with one row removed). These arrays give the bound

$$ n_{2,3}(3k) \ge \binom{k}{\lceil (k+1)/2 \rceil}. \tag{3} $$
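Bound (3) is easy to tabulate. The short Python sketch below (the name `sloane_bound` is ours) evaluates it for k = 11, ..., 19 and reproduces the pairs of word counts and coordinate counts of the entries marked f in Table 1.

```python
from math import comb

def sloane_bound(k):
    """Eq. (3) as (words, coordinates): 3k words, C(k, ceil((k+1)/2)) columns."""
    return 3 * k, comb(k, (k + 2) // 2)   # (k + 2) // 2 == ceil((k + 1) / 2)

for k in range(11, 20):
    print(*sloane_bound(k))   # 33 462, 36 792, ..., 57 92378
```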

Another simple construction (marked with e in Table 1) gives a two-covering array in $Z_3^{2n}$ from a two-covering array in $Z_3^n$ by placing two copies of the original array side by side and adding six words. Assume that B is a k × n matrix corresponding to a two-covering array in $Z_3^n$. Then a matrix A corresponding to a two-covering array in $Z_3^{2n}$ with k + 6 rows can be constructed as

$$
A_{(k+6)\times 2n} = \begin{pmatrix}
B_{k\times n} & B_{k\times n} \\
0\;0\;\cdots\;0 & 1\;1\;\cdots\;1 \\
0\;0\;\cdots\;0 & 2\;2\;\cdots\;2 \\
1\;1\;\cdots\;1 & 0\;0\;\cdots\;0 \\
1\;1\;\cdots\;1 & 2\;2\;\cdots\;2 \\
2\;2\;\cdots\;2 & 0\;0\;\cdots\;0 \\
2\;2\;\cdots\;2 & 1\;1\;\cdots\;1
\end{pmatrix}.
$$

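To see that A covers all two-combinations of $Z_3^{2n}$, consider a two-combination $\{(i, v), (j, w)\}$ with i < j. If $j \le n$, the combination is covered by the left copy of B, and if $i > n$, it is covered by the right copy. Otherwise $i \le n < j$. If $j - n \ne i$, columns i and j of the first k rows of A are the distinct columns i and j − n of B, which together show every pair of values; if $j - n = i$ and $v = w$, a row of B with value v in column i covers the combination. The only remaining cases, $j - n = i$ with $v \ne w$, are covered by the six additional rows. Thus

$$ g_2(Z_3^{2n}) \le g_2(Z_3^n) + 6. \tag{4} $$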
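Construction (4) can be sketched directly in Python. The names `double` and `is_two_covering` are ours, and the `q` parameter applies the same pattern with q(q − 1) extra rows for other alphabet sizes, which is our own reading of the remark below that the construction generalizes easily. The example starts from the standard 9-word orthogonal array in $Z_3^4$ with rows (a, b, a + b, a + 2b) mod 3.

```python
from itertools import combinations, product

def double(B, q=3):
    """Construction (4): [B B] plus the rows (v...v | w...w) for all v != w."""
    n = len(B[0])
    A = [list(row) + list(row) for row in B]
    A += [[v] * n + [w] * n
          for v, w in product(range(q), repeat=2) if v != w]
    return A

def is_two_covering(rows, q):
    """Brute-force check that every column pair shows all q*q value pairs."""
    everything = set(product(range(q), repeat=2))
    return all({(r[i], r[j]) for r in rows} == everything
               for i, j in combinations(range(len(rows[0])), 2))

# B: an orthogonal array in Z_3^4 with 9 words, rows (a, b, a+b, a+2b) mod 3.
B = [[a, b, (a + b) % 3, (a + 2 * b) % 3] for a in range(3) for b in range(3)]
A = double(B)
print(len(A), len(A[0]), is_two_covering(A, 3))   # 15 8 True
```

Dropping any of the six extra rows breaks coverage, because in the first k rows each column i and its copy n + i always carry equal values.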

Both constructions (3) and (4) are easy to generalize to q > 3. However, it is difficult to say how good these bounds are for q > 3, since not enough is known about these cases.

Our search algorithm can also be used in mixed spaces. At present, very few results on covering arrays in mixed spaces have been published. Among the few examples are those given in [3], where AETG is used to find two-covering arrays showing that $g_2(Z_2^{29} \times Z_3^{17} \times Z_4^{15}) \le 41$ and $g_2(Z_2^{35} \times Z_3^{39} \times Z_4) \le 28$. The tabu search algorithm of Section 2 found covering arrays showing $g_2(Z_2^{29} \times Z_3^{17} \times Z_4^{15}) \le 29$ and $g_2(Z_2^{35} \times Z_3^{39} \times Z_4) \le 21$, so in these cases tabu search gave clearly better arrays than AETG.


We also found three new bounds for three-covering arrays in $Z_2^n$, although the search seems to work better for two-covering arrays. The matrix

$$
\left(\begin{array}{*{12}{c}}
0&0&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&1&1\\
0&0&0&0&0&0&1&1&1&1&0&1\\
0&0&0&1&1&1&0&0&0&1&0&1\\
0&0&1&0&1&1&0&1&1&0&1&0\\
0&1&0&1&0&1&1&0&1&0&1&0\\
0&1&1&0&1&0&1&0&0&0&0&1\\
0&1&1&1&0&0&0&1&0&1&1&0\\
1&0&0&1&1&0&1&1&0&0&1&0\\
1&0&1&0&0&1&1&0&0&1&1&0\\
1&0&1&1&0&0&0&0&1&0&0&1\\
1&1&0&0&0&1&0&1&0&0&0&1\\
1&1&0&0&1&0&0&0&1&1&1&0\\
1&1&1&1&1&1&1&1&1&1&0&0\\
1&1&1&1&1&1&1&1&1&1&1&1
\end{array}\right) \tag{5}
$$

is a three-covering array in $Z_2^{12}$ with 15 words, so $g_3(Z_2^{12}) \le 15$. Roux [11] gives the construction

$$ g_3(Z_2^{2n}) \le g_3(Z_2^n) + g_2(Z_2^n), \tag{6} $$

which, when applied to (5), gives two other new bounds, $g_3(Z_2^{24}) \le 22$ and $g_3(Z_2^{48}) \le 30$, improving slightly on Table III in [13] and Table I in [1].
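The arithmetic behind the two new bounds can be checked with a few lines of Python (the names `g2_binary` and `roux_chain` are ours): $g_2(Z_2^n)$ is computed exactly from the formula for $n_{2,2}(k)$ in Section 1.2, and (6) is applied twice starting from $g_3(Z_2^{12}) \le 15$.

```python
from math import comb

def g2_binary(n):
    """Exact g_2(Z_2^n): least k with C(k-1, ceil(k/2)) >= n."""
    k = 1
    while comb(k - 1, (k + 1) // 2) < n:
        k += 1
    return k

def roux_chain(n, g3, steps):
    """Repeatedly apply (6): g3(Z_2^{2n}) <= g3(Z_2^n) + g2(Z_2^n)."""
    chain = [(n, g3)]
    for _ in range(steps):
        g3 += g2_binary(n)
        n *= 2
        chain.append((n, g3))
    return chain

print(roux_chain(12, 15, 2))   # [(12, 15), (24, 22), (48, 30)]
```

The two steps add $g_2(Z_2^{12}) = 7$ and $g_2(Z_2^{24}) = 8$.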

Acknowledgements
The helpful comments of P.R.J. Östergård, M. Chateauneuf, B. Stevens, and D.L. Kreher are gratefully acknowledged.

References
[1] M. Chateauneuf, Covering arrays, Ph.D. Thesis, Michigan Technological University, Houghton, 2000.
[2] M. Chateauneuf, Private communication, 2000.
[3] D.M. Cohen, S.R. Dalal, M.L. Fredman, G.C. Patton, The AETG system: an approach to testing based on combinatorial design, IEEE Trans. Software Eng. 23 (1997) 437–444.
[4] D.M. Cohen, M.L. Fredman, New techniques for designing qualitatively independent systems, J. Combin. Design 6 (1998) 411–416.


[5] C.J. Colbourn, Orthogonal arrays of index more than one, in: C.J. Colbourn, J.H. Dinitz (Eds.), The CRC Handbook of Combinatorial Designs, CRC Press, Boca Raton, 1996, pp. 172–178.
[6] L. Gargano, J. Körner, U. Vaccaro, Sperner capacities, Graphs Combin. 9 (1993) 31–46.
[7] F. Glover, Tabu search—part I, ORSA J. Comput. 1 (1989) 190–206.
[8] P.R.J. Östergård, Constructions of mixed covering codes, Research Report A18, Digital Systems Laboratory, Helsinki University of Technology, Espoo, 1991.
[9] S. Poljak, A. Pultr, V. Rödl, On qualitatively independent partitions and related problems, Discrete Appl. Math. 6 (1983) 193–205.
[10] S. Poljak, Zs. Tuza, Improved bounds for the number of qualitatively independent partitions, J. Combin. Theory Ser. A 51 (1989) 111–116.
[11] G. Roux, k-propriétés dans des tableaux de n colonnes; cas particulier de la k-surjectivité et de la k-permutivité, Ph.D. Thesis, University of Paris 6, Paris, 1987.
[12] G. Seroussi, N.H. Bshouty, Vector sets for exhaustive testing of logic circuits, IEEE Trans. Inform. Theory 34 (1988) 513–522.
[13] N.J.A. Sloane, Covering arrays and intersecting codes, J. Combin. Design 1 (1993) 51–63.
[14] B. Stevens, Transversal covers and packings, Ph.D. Thesis, University of Toronto, Toronto, 1998.
[15] B. Stevens, A. Ling, E. Mendelsohn, A direct construction of transversal covers using group divisible designs, Ars Combin. 63 (2002) 145–159.
[16] B. Stevens, L. Moura, E. Mendelsohn, Lower bounds for transversal covers, Design Codes Cryptogr. 15 (1999) 279–299.