Tabu search for covering arrays using permutation vectors


Journal of Statistical Planning and Inference 139 (2009) 69 -- 80


Robert A. Walker II, Charles J. Colbourn∗
Computer Science and Engineering, Arizona State University, P.O. Box 878809, Tempe, AZ 85287, USA

Article info

Available online 27 May 2008
Keywords: Covering array; Orthogonal array; Permutation vector; Tabu search; Heuristic search

Abstract

A covering array CA(N; t, k, v) is an N × k array in which, in every N × t subarray, each of the $v^t$ possible t-tuples over v symbols occurs at least once. The parameter t is the strength of the array. Covering arrays have a wide range of applications for experimental screening designs, particularly for software interaction testing. A compact representation of certain covering arrays employs "permutation vectors" to encode $v^t \times 1$ subarrays of the covering array so that a covering perfect hash family whose entries correspond to permutation vectors yields a covering array. We introduce a method for effective search for covering arrays of this type using tabu search. Using this technique, improved covering arrays of strength 3, 4 and 5 have been found, as well as the first arrays of strength 6 and 7 found by computational search.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

A covering array CA(N; t, k, v) is an N × k array in which every subarray induced by a selection of t columns contains all possible t-tuples over v symbols. Fig. 1 shows a CA(13; 3, 10, 2). A CA($v^t$; t, k, v) is an orthogonal array, denoted OA(t, k, v); in this case every t-tuple occurs exactly once. The smallest N for which a CA(N; t, k, v) exists is the covering array number, denoted CAN(t, k, v).

Screening experiments are often used to indicate factors and levels that impact response; once such factors are identified, more detailed models can then be constructed to measure main effects and interactions. A particular case arises in testing a complex system for unexpected interactions; in experimental design, covering arrays arise primarily in this setting. Covering arrays have been the focus of much research, primarily due to their applications in software and hardware interaction testing. These applications are discussed in Cohen et al. (1997) and Colbourn (2004). Applications in biological sciences also arise (Shasha et al., 2001). Our focus is on construction techniques, rather than on the specific application to experimental design.

Techniques used to construct covering arrays include recursive methods (for examples see Hartman and Raskin, 2004; Martirosyan and Van Trung, 2004; Sloane, 1993), algebraic methods (Chateauneuf and Kreher, 2002; Hedayat et al., 1999), and computational search such as in Cohen (2004, 2005) and Nurmela (2004).

Recently, Sherwood et al. (2006) exploited a compact representation of covering arrays based on permutation vectors. When v is prime or a prime power, a covering perfect hash family CPHF(n; k, $v^{t-1}$, t) is an n × k array on $v^{t-1}$ symbols such that every n × t subarray contains at least one row which is "covering" in the following sense. Each of the $v^{t-1}$ symbols in a CPHF can be viewed as a (t − 1)-tuple on v symbols. This (t − 1)-tuple represents a permutation vector of length $v^t$ over the elements of the finite field $\mathbb{F}_v$. Given a (t − 1)-tuple $(h_1, h_2, \ldots, h_{t-1})$ with $h_j \in \{0, 1, \ldots, v-1\}$ for $1 \le j \le t-1$, a permutation vector $\overrightarrow{(h_1, h_2, \ldots, h_{t-1})}$ of length $v^t$ has the symbol

$$(h_{t-1} \cdot \beta_{t-1}^{(i)}) + \cdots + (h_2 \cdot \beta_2^{(i)}) + (h_1 \cdot \beta_1^{(i)}) + \beta_0^{(i)}$$

in position i



∗ Corresponding author. Tel.: +1 480 727 6631; fax: +1 480 965 2751.

0378-3758/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2008.05.020


Fig. 1. A CA(13; 3, 10, 2).

where i is represented in base v as $i = \sum_{k=0}^{t-1} v^k \cdot \beta_k^{(i)}$. A row is covering if the expansion of the permutation vectors into columns results in an OA(t, t, v). When every symbol in a CPHF is expanded in this manner, the result is a covering array.

When $i < v$, $\beta_k^{(i)} = 0$ for $k > 0$. Hence, every permutation vector starts with the sequence 0, 1, ..., v − 1. Eliminating these duplicate rows leads to the key theorem of Sherwood et al. (2006):

Theorem 1.1. If v is a prime or a prime power, and a CPHF(n; k, $v^{t-1}$, t) exists, then a CA($n \cdot (v^t - v) + v$; t, k, v) exists.

We typically omit the exponent, and refer to a CPHF(n; k, v, t) instead of a CPHF(n; k, $v^{t-1}$, t).

Using backtracking, Sherwood et al. (2006) found covering arrays for strengths 3 and 4 that improve upon other known constructions. In this paper, we employ the permutation vector representation as the basis of a tabu search method. In this way, we find a number of improved covering arrays for strengths 3–5; more surprisingly, we find the first covering arrays of strength 6 and 7 from computer search. We conclude by presenting the first existence tables for covering arrays of strength 5, partly to demonstrate the utility of the arrays found by the heuristic search method.

2. Forming CAs from CPHFs

In order to understand the construction underlying Theorem 1.1, we show the expansion of the following CPHF(2; 10, 3, 3) into a CA:

11 00 22 21 01 02 10 11 02 12
10 01 11 11 00 22 01 02 20 12

Write each of the $v^{t-1}$ symbols as a (t − 1)-tuple on v symbols (in this case, the $3^2$ symbols as 2-tuples on 3 symbols). To convert the symbol 11 ($h_1 = 1$, $h_2 = 1$) into a vector of length $3^3$, each row number i is written as a t-tuple in base v. For example, i = 0 is written as i = 000 and i = 17 is written as i = 122. For row i = 000, the vector is assigned the value 0 · 1 + 0 · 1 + 0 = 0. Continuing in this manner (arithmetic in $\mathbb{Z}_3$),

i = 001 : 0 · 1 + 0 · 1 + 1 = 1
i = 002 : 0 · 1 + 0 · 1 + 2 = 2
i = 010 : 0 · 1 + 1 · 1 + 0 = 1
i = 011 : 0 · 1 + 1 · 1 + 1 = 2
i = 012 : 0 · 1 + 1 · 1 + 2 = 0
i = 020 : 0 · 1 + 2 · 1 + 0 = 2
i = 021 : 0 · 1 + 2 · 1 + 1 = 0
i = 022 : 0 · 1 + 2 · 1 + 2 = 1
⋮
i = 212 : 2 · 1 + 1 · 1 + 2 = 2
i = 220 : 2 · 1 + 2 · 1 + 0 = 1
i = 221 : 2 · 1 + 2 · 1 + 1 = 2
i = 222 : 2 · 1 + 2 · 1 + 2 = 0
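The expansion described in this section, together with the row count of Theorem 1.1, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: v is assumed prime, so that field arithmetic reduces to arithmetic modulo v, and the function names (`permutation_vector`, `expand_cphf`, `is_covering_array`, `parse_row`) are hypothetical. The two rows of the CPHF(2; 10, 3, 3) are taken from the array given at the start of this section.

```python
from itertools import combinations


def permutation_vector(h, v):
    """Expand h = (h_1, ..., h_{t-1}) into its permutation vector of length v**t.

    Assumes v is prime, so arithmetic in F_v is arithmetic modulo v.
    """
    t = len(h) + 1
    vector = []
    for i in range(v ** t):
        # Base-v digits of i; beta[0] is the least significant digit.
        beta = [(i // v ** k) % v for k in range(t)]
        value = beta[0] + sum(h[r - 1] * beta[r] for r in range(1, t))
        vector.append(value % v)
    return vector


def expand_cphf(cphf, v):
    """Expand a CPHF (rows of (t-1)-tuples) into n * (v**t - v) + v rows."""
    k = len(cphf[0])
    t = len(cphf[0][0]) + 1
    # Every permutation vector starts 0, 1, ..., v-1: keep one copy of those rows.
    rows = [[s] * k for s in range(v)]
    for cphf_row in cphf:
        vectors = [permutation_vector(h, v) for h in cphf_row]
        for i in range(v, v ** t):
            rows.append([vec[i] for vec in vectors])
    return rows


def is_covering_array(rows, t, v):
    """Check that every choice of t columns contains all v**t tuples."""
    k = len(rows[0])
    return all(
        len({tuple(row[c] for c in cols) for row in rows}) == v ** t
        for cols in combinations(range(k), t)
    )


def parse_row(text):
    """Turn '11 00 22' into [(1, 1), (0, 0), (2, 2)]."""
    return [tuple(int(d) for d in symbol) for symbol in text.split()]


cphf = [
    parse_row("11 00 22 21 01 02 10 11 02 12"),
    parse_row("10 01 11 11 00 22 01 02 20 12"),
]
ca = expand_cphf(cphf, v=3)
print(len(ca), is_covering_array(ca, t=3, v=3))
```

The explicit covering check is only practical for small parameters; it is included here to confirm the expansion, whereas the search itself works on the CPHF directly.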

Fig. 2. A CA(54; 3, 10, 3)$^T$.

Expanding each symbol of the CPHF in this manner, the CA(54; 3, 10, 3) shown transposed in Fig. 2 is obtained. The first v rows (columns as shown) of every permutation vector are the same, in this case 0, 1, 2. We need only one copy of each, so we reduce the CA(54; 3, 10, 3) to a CA(51; 3, 10, 3). This process is described in more detail in Sherwood et al. (2006).

3. Mathematical preliminaries

We first discuss how to determine whether a set of permutation vectors is covering. Consider t permutation vectors:

$$\overrightarrow{(h_1^{(1)}, h_2^{(1)}, \ldots, h_{t-1}^{(1)})},\ \overrightarrow{(h_1^{(2)}, h_2^{(2)}, \ldots, h_{t-1}^{(2)})},\ \ldots,\ \overrightarrow{(h_1^{(t)}, h_2^{(t)}, \ldots, h_{t-1}^{(t)})}$$

This set of permutation vectors is covering if its expansion into a $v^t \times t$ array is an orthogonal array. To check if this condition is not met, we check whether the array contains some t-tuple twice as a row. Hence, a set of permutation vectors is non-covering if and only if we can find distinct $i, j \in \{0, 1, \ldots, v^t - 1\}$ so that

$$\begin{aligned}
\beta_0^{(i)} + (h_1^{(1)} \cdot \beta_1^{(i)}) + (h_2^{(1)} \cdot \beta_2^{(i)}) + \cdots + (h_{t-1}^{(1)} \cdot \beta_{t-1}^{(i)}) &= \beta_0^{(j)} + (h_1^{(1)} \cdot \beta_1^{(j)}) + (h_2^{(1)} \cdot \beta_2^{(j)}) + \cdots + (h_{t-1}^{(1)} \cdot \beta_{t-1}^{(j)}) \\
\beta_0^{(i)} + (h_1^{(2)} \cdot \beta_1^{(i)}) + (h_2^{(2)} \cdot \beta_2^{(i)}) + \cdots + (h_{t-1}^{(2)} \cdot \beta_{t-1}^{(i)}) &= \beta_0^{(j)} + (h_1^{(2)} \cdot \beta_1^{(j)}) + (h_2^{(2)} \cdot \beta_2^{(j)}) + \cdots + (h_{t-1}^{(2)} \cdot \beta_{t-1}^{(j)}) \\
&\;\;\vdots \\
\beta_0^{(i)} + (h_1^{(t)} \cdot \beta_1^{(i)}) + (h_2^{(t)} \cdot \beta_2^{(i)}) + \cdots + (h_{t-1}^{(t)} \cdot \beta_{t-1}^{(i)}) &= \beta_0^{(j)} + (h_1^{(t)} \cdot \beta_1^{(j)}) + (h_2^{(t)} \cdot \beta_2^{(j)}) + \cdots + (h_{t-1}^{(t)} \cdot \beta_{t-1}^{(j)})
\end{aligned} \tag{1}$$

Write $\gamma_r = \beta_r^{(i)} - \beta_r^{(j)}$ for $0 \le r \le t-1$ with fixed i and j. Then rewrite (1) as

$$\begin{aligned}
\gamma_0 + (h_1^{(1)} \cdot \gamma_1) + (h_2^{(1)} \cdot \gamma_2) + \cdots + (h_{t-1}^{(1)} \cdot \gamma_{t-1}) &= 0 \\
\gamma_0 + (h_1^{(2)} \cdot \gamma_1) + (h_2^{(2)} \cdot \gamma_2) + \cdots + (h_{t-1}^{(2)} \cdot \gamma_{t-1}) &= 0 \\
&\;\;\vdots \\
\gamma_0 + (h_1^{(t)} \cdot \gamma_1) + (h_2^{(t)} \cdot \gamma_2) + \cdots + (h_{t-1}^{(t)} \cdot \gamma_{t-1}) &= 0
\end{aligned} \tag{2}$$

The set of permutation vectors is non-covering if and only if there exist $\{\gamma_r : 0 \le r \le t-1\}$, with $\gamma_i \ne 0$ for at least one i, that solve the system of linear equations (2). As an example, consider the array:

CPHF(3; 22, 3, 3)
11 11 21 10 21 22 02 20 12 01 00 01 11 10 00 02 00 20 12 10 21 22
20 21 22 21 11 01 01 11 01 00 20 22 10 12 12 02 02 20 10 00 10 11
01 22 21 00 01 02 12 10 10 22 21 12 02 12 22 20 00 11 20 10 11 20

Each symbol represents a permutation vector. For instance, the symbol 01 represents the vector with $h_1 = 0$, $h_2 = 1$. For the example, the field $\mathbb{F}_3$ is simply $\mathbb{Z}_3$.


Let us consider the first three columns of this array. The first row contains 11 11 21. This row is non-covering due to the duplication of the vector represented by 11. The second row contains 20 21 22. For this set of vectors, a solution to (2) is given by $(\gamma_0, \gamma_1, \gamma_2) = (1, 1, 0)$ (arithmetic in $\mathbb{Z}_3$):

1 + 1 · 2 + 0 · 0 = 1 + 2 = 0
1 + 1 · 2 + 0 · 1 = 1 + 2 = 0
1 + 1 · 2 + 0 · 2 = 1 + 2 = 0

Hence the second row yields a distinct non-covering tuple. However, the third row, 01 22 21, is covering because there is no choice of $(\gamma_0, \gamma_1, \gamma_2)$, not all zero, that solves (2). Because this row is covering, it expands into an orthogonal array, shown transposed:

0 1 2 0 1 2 0 1 2 1 2 0 1 2 0 1 2 0 2 0 1 2 0 1 2 0 1
0 1 2 2 0 1 1 2 0 2 0 1 1 2 0 0 1 2 1 2 0 0 1 2 2 0 1
0 1 2 2 0 1 1 2 0 1 2 0 0 1 2 2 0 1 2 0 1 1 2 0 0 1 2
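The test for a non-covering set can be sketched by searching directly for a non-zero solution of system (2). The following Python fragment is an illustration only: it assumes v is prime (arithmetic modulo v), it enumerates all assignments by brute force rather than solving the linear system, and the names `find_gamma` and `is_covering` are hypothetical. It is applied to the three rows of the first three columns discussed above.

```python
from itertools import product


def find_gamma(vectors, v):
    """Return a non-zero (gamma_0, ..., gamma_{t-1}) solving system (2), or None.

    `vectors` holds t tuples (h_1, ..., h_{t-1}); v is assumed prime.
    """
    t = len(vectors)
    for gamma in product(range(v), repeat=t):
        if not any(gamma):
            continue  # the all-zero assignment does not count
        if all(
            (gamma[0] + sum(h_r * g for h_r, g in zip(h, gamma[1:]))) % v == 0
            for h in vectors
        ):
            return gamma
    return None


def is_covering(vectors, v):
    """A set of permutation vectors is covering iff (2) has no non-zero solution."""
    return find_gamma(vectors, v) is None


rows = {
    "first": [(1, 1), (1, 1), (2, 1)],   # 11 11 21
    "second": [(2, 0), (2, 1), (2, 2)],  # 20 21 22
    "third": [(0, 1), (2, 2), (2, 1)],   # 01 22 21
}
for name, vectors in rows.items():
    print(name, "covering" if is_covering(vectors, 3) else "non-covering")
```

Brute force costs $v^t$ evaluations per test, which is acceptable for an illustration but not for the inner loop of a search.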

We include the three constant rows (columns as shown) only once each. Hence the CPHF(3; 22, 3, 3) yields a CA(75; 3, 22, 3), since $3 \cdot (3^3 - 3) + 3 = 75$, and establishes the bound CAN(3, 22, 3) ≤ 75.

4. Tabu search

Nurmela's tabu search results for covering arrays (Nurmela, 2004) treat the actual covering array. Our tabu search method searches instead for the covering perfect hash family in order to reduce computation time significantly. We maintain a current candidate array with an associated score. We also maintain a list of recent states as a tabu list. We then generate a neighborhood of moves. We choose from these the move with the best score that does not take us into a state in the tabu list. More information on the tabu search method can be found in Glover and Laguna (1997).

The score S of a given candidate array is the number of sets of t columns that have no covering row; such a set of columns is uncovered, and each column in an uncovered set is deficient. By definition, $0 \le S \le \binom{k}{t}$ and an array with score S = 0 is a CPHF.

A move changes one element of the array to a new value. Not all potential moves need to be considered. Changing an element within a column that is not deficient can have no positive effect. Therefore, we limit the neighborhood of moves to deficient columns. We denote the number of deficient columns as D, so that D ≤ k and D ≤ S · t. There are $nD(v^{t-1} - 1)$ moves to consider. For each, we compute the score the new array would have. We cache information about which set(s) of columns are covered by which row(s) and thus consider only the $\binom{k-1}{t-1}$ sets of columns that contain the element being changed. This is slightly less efficient than the "cost change table" discussed in Fleurant and Ferland (1996), but employs similar ideas and uses less memory. In the event that two or more moves result in the same best score, we choose among them randomly.

We maintain a tabu list of the last 50,000 moves made. This number was chosen after extensive experimentation; larger tabu lists did not improve the results in our cases. Using this list, we are able to generate a list of moves that take the current array back to a tabu array. This is discussed in more detail in Section 4.1.

We start the search with a randomly generated array. At the beginning of the search, we usually have D = k, which leads to a large number of moves to consider. It is therefore helpful to restrict the neighborhood examined. To do this, we select one column in a weighted random fashion, where the weight of each column is the number of non-covering sets to which it belongs. By restricting the neighborhood to changes within this column, we can increase the speed by a factor of D with only a minor decrease in search effectiveness. Once D becomes small (ideally D < k), we remove this neighborhood restriction. When D does not reach the threshold to be considered small, we turn the neighborhood restriction off when the search appears to "stall".

4.1. The tabu list

Tabu search relies on keeping a fixed-length list of recent states known as a tabu list. The search program is then prohibited from revisiting these states. This technique helps to prevent cycles. For large arrays, storing a long list of arrays is prohibitive both for memory and computation time, when one must compare each target array with every array in this list.

Nurmela (2004) employs a general strategy discussed in Glover and Laguna (1997) in order to simplify the tabu condition. Instead of storing a long list of recent arrays, Nurmela stores a short list of recent positions that were modified. Changes to these positions are prohibited until enough moves have occurred that they are no longer considered tabu. This technique prevents revisiting any recent states; however, it can disallow moves that should not be tabu. The size of the tabu list with this technique must be shorter than the size of exact tabu lists; otherwise it is possible to make every move tabu.

In the search, we keep an undo list of recent moves on which a move is specified by a row, a column, and the value that was replaced. At each iteration of the search, we can convert this to a list of moves taking the current array back to a tabu state. To accomplish this, we use Algorithm 1. It works by keeping a list of differences between the old array and the current array.
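The conversion of the undo list into tabu moves (Algorithm 1) can be sketched in Python as follows. This is an illustrative version, not the authors' code: the undo history is assumed to be a list of `(row, col, old_value)` records stored oldest first, the explicit `StateCount` counter is replaced by the size of the `changes` dictionary, and the function name `tabu_moves_from_undo` is hypothetical.

```python
def tabu_moves_from_undo(current, undo_history):
    """Return the moves (row, col, value) that would recreate a recent array.

    `undo_history` lists (row, col, old_value) records, oldest first.
    """
    changes = {}  # position -> value held in the earlier array being reconstructed
    tabu = set()
    for row, col, old_value in reversed(undo_history):
        if current[row][col] == old_value:
            # The earlier array agrees with the current one here again.
            changes.pop((row, col), None)
        else:
            changes[(row, col)] = old_value
        if len(changes) == 1:
            # Exactly one position differs: restoring it revisits a tabu array.
            (r, c), value = next(iter(changes.items()))
            tabu.add((r, c, value))
    return tabu


# A 1 x 2 array that went [[0, 0]] -> [[2, 0]] -> [[2, 1]] -> [[1, 1]] -> [[1, 0]].
history = [(0, 0, 0), (0, 1, 0), (0, 0, 2), (0, 1, 1)]
print(sorted(tabu_moves_from_undo([[1, 0]], history)))
```

In this small example the earlier arrays [[1, 1]], [[2, 0]] and [[0, 0]] each differ from the current array in a single position, so three moves are reported as tabu, while [[2, 1]] differs in two positions and contributes none.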

Whenever there is exactly one position that differs, we declare changing that position to the value it held previously to be a tabu move.

Algorithm 1. Convert undo list into tabu list.

    StateCount ← 0
    Changes ← ∅
    TabuMoves ← ∅
    for Undo ∈ UndoHistory from most recent to oldest do
        if CurrentArray[Undo.Row][Undo.Col] = Undo.OldValue then
            Remove [Undo.Row][Undo.Col] from Changes
            StateCount ← StateCount − 1
        else
            if [Undo.Row, Undo.Col] in Changes then
                Changes[Undo.Row][Undo.Col] = Undo.OldValue
            else
                Insert [Undo.Row][Undo.Col] = Undo.OldValue into Changes
                StateCount ← StateCount + 1
            end if
        end if
        if StateCount = 1 then
            Insert Changes[1] into TabuMoves
        end if
    end for

Using this more exact tabu list resulted in much faster searches and better results than using the technique given by Nurmela. This algorithm can be used for any tabu search that operates on arrays. Similar techniques are discussed in much more detail in Glover and Laguna (1997); in particular, the method of "reverse elimination" discussed therein employs logical structure to infer tabu moves rather than storing them explicitly, as we do here.

4.2. The non-covering test

The search program very frequently needs to test whether a given set of permutation vectors is covering or non-covering. Since this test comprises the most executed portion of the program, it is important for it to run as fast as possible. Recall that a given set of permutation vectors is non-covering if and only if there exist $\gamma_r$, not all zero, that solve the equation

$$\gamma_0 + (h_1^{(i)} \times \gamma_1) + (h_2^{(i)} \times \gamma_2) + \cdots + (h_{t-1}^{(i)} \times \gamma_{t-1}) = 0 \tag{3}$$

for all $h^{(i)}$ in the set. It is possible to perform a lot of this work before the search begins. We think of it in reverse, as follows. Given non-zero values $\gamma_r$ there are $v^{t-2}$ permutation vectors $h^{(i)}$ so that (3) holds (Sherwood et al., 2006). We can pre-compute the set of vectors that is "solved" by each assignment of the $\gamma_r$'s. Call this set $V(\{\gamma_0, \gamma_1, \ldots, \gamma_{t-1}\})$. Then, a given set T of permutation vectors is non-covering if and only if there exist non-zero $\gamma_r$'s so that $T \subseteq V(\{\gamma_0, \gamma_1, \ldots, \gamma_{t-1}\})$. Testing this condition is much faster than solving the linear system, and uses very little memory. In the worst case, this method runs in $O(t \cdot v^{t-1} \cdot (t-2) \log v)$ time since there are t elements to test for inclusion in $O(v^{t-1})$ sets using binary search on $v^{t-2}$ elements. Because it uses so little memory, we use this method for large t and v, specifically when $v^{2(t-1)} > 5\,000\,000$.

Another method is to compute the set $V^{-1}((h_1, h_2, \ldots, h_{t-1}))$ of $\gamma_r$'s that solves (2) for permutation vector $(h_1, h_2, \ldots, h_{t-1})$. Then a given set T of permutation vectors is non-covering if and only if

$$\bigcap_{h \in T} V^{-1}(h) \ne \emptyset \tag{4}$$

We accelerate this by storing not only $V^{-1}(h)$ but also the values of $V^{-1}(h) \cap V^{-1}(k)$ for any two permutation vectors h and k. This uses significantly more space than the previous method. However, storing the $V^{-1}$ sets in a sorted list lets us check the non-covering condition in $O((\lceil t/2 \rceil - 1) v^{t-1})$ time. When t = 4 this is just $O(v^3)$ time. This method is used for t > 4 and t = 4, v ≥ 5.
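The second method, condition (4), can be sketched with precomputed sets and set intersection. The fragment below is a simplified illustration under stated assumptions: v is prime, the sets $V^{-1}(h)$ are stored as Python `frozenset` objects rather than sorted lists, the pairwise intersections mentioned above are not stored, and the names `build_solution_sets` and `is_non_covering` are hypothetical.

```python
from itertools import product


def build_solution_sets(v, t):
    """Map each permutation vector h to the non-zero gammas solving (2) for h."""
    gammas = [g for g in product(range(v), repeat=t) if any(g)]
    table = {}
    for h in product(range(v), repeat=t - 1):
        table[h] = frozenset(
            g for g in gammas
            if (g[0] + sum(h_r * g_r for h_r, g_r in zip(h, g[1:]))) % v == 0
        )
    return table


def is_non_covering(vectors, table):
    """Condition (4): non-covering iff the solution sets share a common gamma."""
    common = table[vectors[0]]
    for h in vectors[1:]:
        common = common & table[h]
        if not common:
            return False  # no shared solution, so the set is covering
    return True


table = build_solution_sets(v=3, t=3)
print(is_non_covering([(2, 0), (2, 1), (2, 2)], table))  # row 20 21 22
print(is_non_covering([(0, 1), (2, 2), (2, 1)], table))  # row 01 22 21
```

The table is built once before the search starts, so each test during the search reduces to at most t − 1 intersections.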

Fig. 3. A WCA(4; 8, 4).

For the fastest check, we simply store a single bit for every possible set of permutation vectors: 1 if it is covering, and 0 if it is not. This permits a constant time check, but requires $O((v^{t-1})^t)$ space, which is prohibitive for large t or v. We use this check when t = 3 and when t = 4 and v ≤ 4.

4.3. Related search problems

The method developed for CPHF search can be applied to many similar search problems. Let $C = \{C_i : i = 1, \ldots, \ell\}$ be a set of subsets of tuples of the same length over an alphabet of size v. Let the length of the tuples in set $C_i$ be denoted $t_i$. A C-(N, k)-array is an N × k array with entries from the same alphabet of size v, in which every N × t subarray has the property that for every i with $1 \le i \le \ell$, there exists a row of the subarray equal to a t-tuple in $C_i$. No assumption is made that $C_i$ and $C_j$ are disjoint, nor that $t_i = t_j$, nor that a given tuple appears in any of the sets. We also define a C∗-array to have the same property when we only consider N × t subarrays where the columns maintain their original ordering. Any C-array can be formulated as a C∗-array; however, the converse is not necessarily true.

Taking C to be the set of $v^t$ singleton sets each containing a distinct t-tuple, a C∗-array is a covering array. If we consider all groups of $C_i$ whose elements are merely permutations of each other, and keep only one of each in C, a C-array is a covering array. Taking C to have a single set in which t-tuples with distinct entries appear makes a C∗-array a perfect hash family, and restricting the set to contain only the covering t-tuples makes a C∗-array a CPHF. In general, the description of C is not an explicit listing of tuples; rather an oracle to test membership of a tuple in $C_i$ is assumed.

The generic method of search has been used to find perfect hash families in Walker and Colbourn (2007). The search method can find a C∗-array for any set C, but we optimized it for the case when |C| = 1.
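The C∗-array property with membership oracles can be sketched as a short checker. This is an illustration under assumptions not fixed by the text: each set $C_i$ is represented as a pair `(t_i, oracle)`, where `oracle` is a hypothetical callable that tests membership of a tuple, and for each $C_i$ the checker examines the subarrays on $t_i$ columns taken in their original order. The function name `is_c_star_array` is likewise hypothetical.

```python
from itertools import combinations, product


def is_c_star_array(array, families):
    """Check the C*-array property.

    `families` is a list of (t_i, oracle) pairs; oracle(tup) returns True when
    the tuple belongs to C_i. Columns are always taken in their original order.
    """
    k = len(array[0])
    for t_i, oracle in families:
        for cols in combinations(range(k), t_i):
            if not any(oracle(tuple(row[c] for c in cols)) for row in array):
                return False
    return True


# Covering array of strength 2 on 2 symbols: one singleton set per 2-tuple.
singletons = [
    (2, lambda tup, target=target: tup == target)
    for target in product(range(2), repeat=2)
]
ca = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(is_c_star_array(ca, singletons))

# Perfect hash family of strength 3: a single set of tuples with distinct entries.
distinct = [(3, lambda tup: len(set(tup)) == len(tup))]
phf = [[0, 1, 2, 0], [0, 1, 0, 2], [0, 0, 1, 2]]
print(is_c_star_array(phf, distinct))
```

Passing the sets as oracles mirrors the remark above that C need not be listed explicitly; only a membership test is required.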
To demonstrate the generic nature of this method, set C to be a single set $C_1$ consisting of common 4-letter English words. We refer to this as a WCA: word covering array of strength 4, and Fig. 3 gives an example. While the practical implications of such an array are likely non-existent, it is an effective demonstration of the generic nature of the algorithm. It is also interesting to see the kind of patterns "utilized" by the resulting array, since many of these same patterns probably appear less visibly in CPHFs.

5. Results

Given n, v and t, the maximum value of k for which a CPHF was found is shown in Tables 1–5. Items marked with an asterisk appeared in Sherwood et al. (2006). All others were created with the method in this paper.

Most of the arrays in these tables were found by a 2.66 GHz Pentium 4 in less than 10 min. Some arrays with k columns required more than a day of search; however, finding an array with k − 1 columns in these cases always took less than an hour. The most difficult arrays to find generally are those listed at the borders of the table, i.e. those with the highest k for a given v and t. This difficulty is based on the added size of the search space and the additional memory needed to process the non-covering condition. Explicit solutions appear in Walker (2005). We give one array of each strength in Table 6.

5.1. Analysis and reduction

In order to assess the quality of the covering arrays generated, we implemented an analysis program that tests for flexibility in the array. One form of flexibility is a "don't-care" position, a position whose value can be changed without affecting the covering property. Of the 24 strength 3 arrays generated, only 3 have don't-care positions, and they occur in fewer than 1% of the positions. For strength 4, 5 arrays of 25 have don't-care positions, making up roughly 4.4% of the positions. For strength 5, 5 of 12 arrays have don't-care positions, making up 4.5% of the positions. For strength 6, 7 of 9 arrays have flexible positions making up 8.7% of the positions. Finally, for strength 7, all 6 arrays have flexible positions making up 4.7% of the positions. The larger proportion of

Table 1
Table for k, given n and v, where t = 3

v     n = 2   n = 3   n = 4   n = 5   n = 6   n = 7
3     10      22      37      57      89      142
4     16∗     34      64      118     222
5     24∗     48      95      160
7     32      81      150
8     40∗     91      200
9     41      113     225
11    50      146
13    59      200


Table 2 Table for k, given n and v, where t = 4 v

2 3 4 5 7 8 9 11 13

n 2

3

4

5

6

7

8∗ 10∗ 13 15 18 20 21 23 24

9 16∗ 20 24 30 36 39

12 23 31 35 54

30 42 62

39

50

Table 3 Table for k, given n and v, where t = 5 v

n 2

3

4

5

6

2 3 4

10 10 11

12 13 15

14 16

16 19

17

2

3

4

5

6

7 10 11

9 12

11 14

12

14

2

3

4

5

6

9 12

11 12

12 13

12

13

Table 4 Table for k, given n and v, where t = 6 v

2 3 4

n

Table 5 Table for k, given n and v, where t = 7 v

2 3

n

don't-care positions as strength increases appears to be due to the additional complexity of searching for higher-strength arrays, in turn suggesting that better arrays are more likely to exist in these cases.

A higher degree of flexibility is indicated by the presence of a redundant row. This does not happen for any of the strength 3 arrays. However, it happens once for strength 4, twice for strength 5, six times for strength 6, and five times for strength 7. This again shows a strong trend of higher quality for lower strength. Removing these extra rows produces the following improved arrays:

CA(54; 4, 12, 2)
CA(110; 5, 14, 2), CA(176; 5, 17, 2)
CA(96; 6, 7, 2), CA(160; 6, 9, 2), CA(232; 6, 11, 2), CA(272; 6, 12, 2), CA(322; 6, 14, 2), CA(1449; 6, 10, 3)
CA(224; 7, 9, 2), CA(364; 7, 11, 2), CA(452; 7, 12, 2), CA(572; 7, 13, 2), CA(4347; 7, 12, 3)

5.2. Recursive bound improvements

The results found are often the best known covering arrays with small k. Thus they have the additional value of being excellent ingredients to recursive constructions. Establishing these claims is by no means straightforward, since to the best of our knowledge no one has tabulated values of CAN(5, k, v). We therefore developed a Maple environment to incorporate all of the known direct


Table 6
An example for each strength

CPHF(2; 16, 4, 3)
32 20 13 10 21 22 11 02 23 01 00 12 03 30 33 31
00 23 11 12 02 13 30 03 10 01 32 31 22 33 20 21

CPHF(2; 10, 3, 4)
122 100 020 202 112 222 201 000 212 120
112 100 101 221 122 200 021 201 120 011

CPHF(3; 13, 3, 5)
1001 1111 1002 0110 1012 1201 0000 2112 0210 2101 1110 2020 0221
0010 0101 2001 2200 0200 0121 0211 1220 2222 2112 1111 1112 0102
1021 2110 2010 0102 0012 2122 2111 0112 1212 0110 1022 0021 1001

CPHF(3; 9, 2, 6)
00101 00000 01010 10101 01100 00110 10000 10100 10001
11001 10100 10101 00000 11101 01111 11111 01011 00101
10010 01001 01100 01011 01000 01110 11110 11010 10001

CPHF(2; 9, 2, 7)
110000 101101 111101 100100 000111 101010 000010 101110 111011
100000 111010 010001 110111 001111 111111 000001 111100 111011

Table 7
Bounds on CAN(5, k, v) for 2 ≤ v ≤ 9 (for each v, the first row uses the CPHF constructions and the second row omits them)

v   k = 10   25       50       100      250      500      1000     2500     5000     10000
2   62       287      392      644      1007     1464     2047     2744     3038     3926
    112      347      392      674      1037     1494     2077     2744     3614     4502
3   483      2007     3765     5997     8667     12969    18369    28491    36651    42147
    999      2689     4113     6749     10517    15247    21103    31994    41278    51471
4   2044     7396     13656    20440    41245    57663    78713    110443   143562   153132
    3456     12736    19776    29354    46561    64593    86765    122088   156833   195258
5   9875     31735    49325    71095    113089   156847   210737   303763   387913   454755
    9875     37259    55809    82603    126789   175571   235461   334279   426269   494739
6   24474    78876    133942   202550   346098   492420   692610   1002556  1284624  1441860
    24474    82196    149102   226140   366853   520135   729525   1074301  1364799  1460130
7   48363    104433   179417   268765   461015   648259   899415   1334711  1686031  2146675
    48363    114729   212429   311893   517157   712753   985653   1479395  1840831  2319151
8   59048    191112   332696   520977   806834   1124979  1509029  2296905  2951596  3703209
    59048    212623   400904   589185   896553   1280603  1664653  2463183  3156290  3907903
9   59049    326925   575127   910945   1410635  1981605  2677055  3784425  4891283  6159485
    59049    367757   703575   1039393  1579915  2275365  2970815  4047337  5315539  6583741

and recursive constructions for covering arrays of strengths 2–5. For strengths 3 and 4, these tables and the known constructions appear in Colbourn et al. (2006). For strength 5, we use the Roux-type constructions of Martirosyan and Colbourn (2005) and Martirosyan and Van Trung (2004), the perfect hash family construction (Colbourn et al., 2006; Martirosyan and Van Trung, 2004), the Turán squaring construction of Hartman (2005), as well as direct constructions. The direct constructions arise from orthogonal arrays (Hedayat et al., 1999) and other computational search techniques (Cohen, 2004, 2005; Cohen et al., 2008; Nurmela, 2004).

To illustrate the impact of the new direct constructions, we present the best-known values for CAN(5, k, v) before and after the CPHF constructions are included. In Table 7, two upper bounds on CAN(5, k, v) are given for 2 ≤ v ≤ 9 and k ∈ {10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10 000}.


Table 8 Bounds for (N; 5, 2) 6

32o

12

v

92

17

176v

28

i

287

40

j

375

144

644q

10

62v

14

110

v

16

152v

20

194j

24

261i

32

i

34

357j

q

81

434q

8

176

j

1002

252

1007j

288

j

1031

56u

330

64

392

162

734j

169

770q

192

1005

j

224

1006j

256

1025j

280

1026j

320

j

324

1123j

q

1121

338

j

1217

361

1358

378

1455j

384

1456j

448

1461j

504

1464j

512

j

1504

560

1507

j

576

1519j

640

1609j

648

1614j

676

1708j

722

1849j

756

1946j

768

1947j

800

j

1952

896

j

2023

924

j

2029

1008

j

2047

1024

j

2096

1120

2102j

1152

2123j

1280

2213j

1296

2223j

1352

2317j

1408

2458j

1444

2461j

1512

j

1536

j

1600

j

2565

1620

j

1792

j

1848

2717j

6864

j

4096

2558

q

2744

6561

2559

q

3038

3614

8192

2706

j

3632

10 000

2707

j

3926

The first bound given uses the CPHFs produced here in conjunction with the known recursive constructions, and all other direct constructions of which we are aware. The second entry gives the bound calculated in the same manner, but omitting the CPHF direct constructions. Two things are striking. The impact of the "small" arrays produced on the recursions makes a substantial improvement for v ∈ {2, 3, 4}. Perhaps more surprising are the improvements for 5 ≤ v ≤ 9, since these result from the use in the recursions of covering arrays of strengths 3 and 4. Since CPHF constructions improve the bounds on CAN(t, k, v) for t ∈ {3, 4}, improvements arise for strength five as well.

We provide more detailed information in Tables 8–9 for v ∈ {2, 3, 4}, presenting the tables after the use of covering perfect hash families. Let (N; t, v) be the largest k for which CAN(t, k, v) ≤ N. As k increases, for many consecutive numbers of factors (columns), the covering array number does not change. Therefore reporting those values of (N; t, v) for which (N; t, v) > (N − 1; t, v), along with the corresponding value of N, enables one to determine all covering array numbers when k is no larger than the largest (N; t, v) value tabulated. Since the exact values for covering array numbers are unknown in general, we in fact report lower bounds on (N; t, v).

For each strength in turn, explicit constructions of covering arrays from direct and computational constructions are tabulated. Then each known construction is applied and its consequences tabulated (in the process, results implied by this for fewer factors are suppressed, so that one explanation ("authority") for each entry is maintained). Application of the recursions is repeated until no entries in the table improve. The authorities used are:

h   perfect hash family (Martirosyan and Van Trung, 2004)
i   Roux-type doubling (Martirosyan and Van Trung, 2004)
j   Roux-type doubling (Martirosyan and Colbourn, 2005)
o   orthogonal array (Hedayat et al., 1999)
q   Turán squaring (Hartman, 2005)
s   simulated annealing (Cohen, 2004, 2005)
u   miscellaneous direct construction
v   permutation vector (this paper)
z   composition
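The compressed tabulation described above, in which only the values of k where the bound changes are reported, can be sketched as follows. This is an illustration only: the names `compress` and `can_upper_bound` are hypothetical, and the sample pairs are the first entries of Table 9 (v = 3, strength 5), read as "CAN(5, k, 3) ≤ N for every k up to the listed value".

```python
from bisect import bisect_left

# (largest k, N) pairs taken from the first entries of Table 9.
breakpoints = [(6, 243), (7, 377), (8, 457), (10, 483), (13, 723), (16, 963)]


def compress(bounds):
    """Reduce a dict k -> best known N (non-decreasing in k) to breakpoints.

    For each distinct N only the largest k attaining it is kept.
    """
    table = []
    for k in sorted(bounds):
        n = bounds[k]
        if table and table[-1][1] == n:
            table[-1] = (k, n)  # same bound, more columns: extend the entry
        else:
            table.append((k, n))
    return table


def can_upper_bound(k, table):
    """Look up the tabulated upper bound on the covering array number for k."""
    ks = [entry[0] for entry in table]
    index = bisect_left(ks, k)
    if index == len(table):
        raise ValueError("k exceeds the largest tabulated value")
    return table[index][1]


print(can_upper_bound(9, breakpoints))
print(compress({6: 243, 7: 377, 8: 457, 9: 483, 10: 483}))
```

A lookup for k = 9 falls on the entry for k = 10, because an array with 10 columns yields one with 9 columns by deleting a column.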

For each v, we tabulate the entries for N and (N; 5, v). We also provide a plot showing the logarithm of the number of factors horizontally and the size of the covering array vertically. The plot simply demonstrates the growth rate, and the computed bound is that provided by the points tabulated.

Note added: Since this research was completed, three new methods have been proposed for the construction of covering arrays of higher strength: FireEye (Lei et al., 2007), density (Bryce and Colbourn, 2007), and PaintBall (Kuhn, 2006). Each is a heuristic method, and each results in some improvements to the tables presented.


Table 9
Bounds for κ(N; 5, 3). Each cell gives k, then N with its authority letter appended.

6 243o | 7 377s | 8 457s | 10 483v | 13 723v | 16 963v
19 1203v | 20 1530j | 26 2007j | 32 2247i | 38 2643j | 40 2970j
46 3555j | 48 3711j | 52 3765j | 60 4005j | 64 4113q | 81 4347q
86 5733j | 92 5787j | 96 5943j | 102 5997j | 104 6335j | 169 6507q
256 8667q | 258 9945j | 264 9949j | 268 9977j | 276 10031j | 288 10035j
306 10051j | 308 10299j | 312 10301j | 322 10377j | 324 10393j | 338 10413j
361 10827q | 388 12661j | 396 12715j | 414 12771j | 432 12929j | 444 12937j
450 12949j | 456 12957j | 468 12965j | 512 12969j | 516 14429j | 522 14433j
528 14449j | 536 14477j | 540 14531j | 552 14661j | 576 14669j | 594 14713j
600 14729j | 612 14745j | 616 15059j | 624 15061j | 644 15137j | 648 15153j
666 15181j | 676 15205j | 684 15619j | 702 15635j | 704 15671j | 722 15755j
736 17589j | 738 17657j | 756 17665j | 774 17673j | 776 17705j | 788 17759j
792 17867j | 800 17947j | 828 17959j | 840 18117j | 852 18137j | 864 18161j
888 18197j | 894 18209j | 900 18225j | 912 18233j | 918 18241j | 936 18285j
948 18289j | 968 18343j | 1024 18369j | 1032 19829j | 1044 19833j | 1056 19849j
1058 19877j | 1072 20153j | 1080 20207j | 1104 20337j | 1134 20345j | 1152 20451j
1176 20495j | 1188 20499j | 1200 20653j | 1224 20669j | 1232 20983j | 1242 20985j
1248 21155j | 1288 21231j | 1296 21247j | 1332 21291j | 1350 21343j | 1352 21359j
1368 21773j | 1404 21797j | 1408 21837j | 1428 21921j | 1444 21975j | 1458 23809j
1472 23897j | 1476 23965j | 1512 23973j | 1536 23981j | 1548 24117j | 1552 24149j
1576 24203j | 1584 24311j | 1600 24391j | 1656 24403j | 1680 24561j | 1704 24581j
1708 24605j | 1728 24659j | 1776 24695j | 1788 24707j | 1800 24723j | 1824 25031j
1836 25039j | 1872 25153j | 1896 25157j | 1936 25211j | 1944 25237j | 1998 25253j
2048 25289j | 2052 26749j | 2064 26773j | 2088 26777j | 2106 26793j | 2112 26833j
2116 26897j | 2144 27221j | 2160 27275j | 2208 27409j | 2214 27485j | 2268 27497j
2272 27607j | 2304 27615j | 2322 27659j | 2352 27703j | 2376 27707j | 2400 27893j
2448 27913j | 2464 28243j | 2484 28245j | 2496 28415j | 2520 28491j | 2556 28499j
2576 28539j | 2592 28555j | 2640 28627j | 2664 28647j | 2700 28699j | 2704 28715j
2736 29129j | 2754 29153j | 2808 29225j | 2816 29265j | 2856 29349j | 2880 29403j
2888 29409j | 2916 31243j | 2944 31331j | 2948 31399j | 2952 31507j | 3024 31515j
3042 31523j | 3072 31555j | 3096 31691j | 3104 31723j | 3132 31777j | 3152 31781j
3168 31889j | 3174 31985j | 3200 32245j | 3240 32257j | 3312 32277j | 3360 32435j
3402 32455j | 3408 32497j | 3416 32521j | 3456 32575j | 3510 32611j | 3528 32635j
3552 32687j | 3564 32699j | 3576 32759j | 3592 32775j | 3600 32829j | 3648 33137j
3672 33145j | 3744 33259j | 3792 33263j | 3872 33317j | 3888 33343j | 3996 33359j
4096 33395j | 4104 34855j | 4128 34879j | 4176 34883j | 4212 34899j | 4224 34939j
4232 35003j | 4288 35327j | 4320 35381j | 4416 35515j | 4428 35591j | 4536 35603j
4544 35713j | 4608 35721j | 4644 35765j | 4704 35809j | 4728 35813j | 4752 35867j
4800 36053j | 4896 36073j | 4928 36403j | 4968 36405j | 4992 36575j | 5040 36651j


Table 10
Bounds for κ(N; 5, 4). Each cell gives k, then N with its authority letter appended.

6 1024o | 11 2044v | 15 3064v | 16 5856j | 20 6064j | 22 6168j
24 7292j | 26 7396j | 28 8152j | 30 8256j | 32 11048j | 40 11256j
44 12116j | 48 13240j | 52 13656j | 56 14412j | 58 14516j | 60 14620j
62 17516j | 64 18272j | 68 18480j | 76 18584j | 80 18688j | 84 19652j
121 20440q | 124 27242j | 128 27998j | 130 28206j | 136 28893j | 138 29048j
152 29152j | 160 29274j | 169 30640q | 170 31836j | 190 31932j | 192 32067j
200 32142j | 216 32523j | 224 32568j | 232 32613j | 240 32970j | 242 33015j
248 39817j | 250 41245j | 256 41269j | 260 41582j | 272 42269j | 276 42484j
280 42588j | 288 42737j | 290 42841j | 300 42886j | 304 42931j | 310 43053j
320 43290j | 328 44764j | 336 44868j | 338 45369j | 340 46565j | 352 46769j
368 47180j | 372 47284j | 380 47341j | 384 47476j | 400 47655j | 416 48111j
420 48180j | 432 48801j | 448 48846j | 464 48891j | 472 49248j | 480 49352j
484 49433j | 496 56235j | 500 57663j | 512 57687j | 520 58495j | 528 59808j
544 59976j | 552 60191j | 560 60295j | 568 60444j | 576 60548j | 580 60652j
600 60697j | 608 60778j | 620 60900j | 640 61137j | 656 63343j | 660 63447j
664 63777j | 672 63881j | 676 64382j | 680 65578j | 682 65782j | 704 65857j
722 66268j | 736 66517j | 744 66621j | 760 66678j | 768 66948j | 800 67229j
832 68066j | 840 68135j | 864 68756j | 896 68846j | 928 68936j | 944 69338j
960 69442j | 968 69811j | 992 76613j | 1000 78713j | 1024 78761j | 1040 79674j
1056 80987j | 1088 81155j | 1104 81727j | 1120 81831j | 1136 82337j | 1152 82441j
1160 82545j | 1200 82635j | 1216 82761j | 1240 82883j | 1280 83357j | 1312 85671j
1320 85775j | 1328 86105j | 1344 86209j | 1352 87211j | 1360 88511j | 1364 88823j
1392 88898j | 1408 89002j | 1444 89824j | 1472 90073j | 1488 90177j | 1520 90291j
1536 90561j | 1584 90842j | 1600 90946j | 1664 91858j | 1680 91996j | 1728 93004j
1776 93094j | 1792 93198j | 1856 93288j | 1888 93794j | 1920 93898j | 1922 94267j
1936 94942j | 1984 101744j | 2000 103844j | 2048 103892j | 2080 105390j | 2100 106772j
2112 107393j | 2176 107561j | 2208 108133j | 2220 108237j | 2240 108741j | 2272 109247j
2280 109351j | 2304 109455j | 2320 109559j | 2360 109649j | 2400 109685j | 2432 109847j
2480 109969j | 2512 110443j | 2560 110547j | 2584 113959j | 2624 114100j | 2640 114204j
2656 114786j | 2664 114890j | 2688 114968j | 2704 115970j | 2720 117270j | 2728 117582j
2744 117687j | 2784 117791j | 2816 117895j | 2888 118717j | 2944 119329j | 2976 119478j
3040 119592j | 3072 119865j | 3168 120146j | 3200 120250j | 3208 121162j | 3328 121266j
3360 121404j | 3362 122412j | 3456 123222j | 3496 123312j | 3552 123450j | 3584 123554j
3712 123689j | 3776 124321j | 3800 124425j | 3840 124470j | 3844 125352j | 3872 126027j
3936 132829j | 3968 132874j | 4000 135646j | 4072 135718j | 4096 135724j | 4160 137282j
4200 138664j | 4224 139285j | 4256 139453j | 4352 139543j | 4370 140526j | 4416 140619j
4440 140723j | 4480 141227j | 4544 141778j | 4560 141882j | 4608 141986j | 4640 142090j
4720 142333j | 4750 142369j | 4800 142414j | 4864 142729j | 4960 142851j | 5024 143562j


6. Conclusions

By utilizing the compact search space afforded by covering perfect hash families, tabu search is able to find smaller arrays for higher strength more efficiently. This efficient representation of a covering array enables searches for t ≥ 5. When used in recursive constructions, the resulting arrays for small k improve on the best known bounds for many larger k.

Acknowledgments

We thank Sosina Martirosyan for helpful discussions regarding the non-covering conditions. We also thank the referees for improving the presentation.

References

Bryce, R.C., Colbourn, C.J., 2007. A density-based greedy algorithm for higher strength covering arrays. Software Testing Verif. Reliab., to appear.
Chateauneuf, M., Kreher, D., 2002. On the state of strength-three covering arrays. J. Combin. Des. 10 (4), 217–238.
Cohen, M.B., 2004. Designing test suites for software interaction testing. Ph.D. Thesis, University of Auckland.
Cohen, M.B., 2005. Private communications.
Cohen, D.M., Dalal, S.R., Fredman, M.L., Patton, G.C., 1997. The AETG system: an approach to testing based on combinatorial design. IEEE Trans. Software Eng. 23 (7), 437–444.
Cohen, M.B., Colbourn, C.J., Ling, A.C.H., 2008. Constructing strength 3 covering arrays with augmented annealing. Discrete Math. 308, 2709–2722.
Colbourn, C.J., 2004. Combinatorial aspects of covering arrays. Le Matematiche (Catania) 58, 121–167.
Colbourn, C.J., Martirosyan, S.S., Van Trung, T., Walker II, R.A., 2006. Roux-type constructions for covering arrays of strengths three and four. Designs Codes Crypt. 41, 33–57.
Fleurant, C., Ferland, J.A., 1996. Genetic and hybrid algorithms for graph coloring. Ann. Oper. Res. 63, 437–461.
Glover, F., Laguna, M., 1997. Tabu Search. Kluwer Academic Publishers, Norwell, MA.
Hartman, A., 2005. Software and hardware testing using combinatorial covering suites. In: Golumbic, M.C., Hartman, I.B.-A. (Eds.), Interdisciplinary Applications of Graph Theory, Combinatorics, and Algorithms. Springer, Norwell, MA, pp. 237–266.
Hartman, A., Raskin, L., 2004. Problems and algorithms for covering arrays. Discrete Math. 284, 149–156.
Hedayat, A.S., Sloane, N.J.A., Stufken, J., 1999. Orthogonal Arrays, Theory and Applications. Springer, Berlin.
Kuhn, D.R., 2006. An algorithm for generating very large covering arrays. Internal Report 7308, National Institute of Standards and Technology.
Lei, Y., Kacker, R., Kuhn, D.R., Okun, V., Lawrence, J., 2007. IPOG: a general strategy for t-way software testing, submitted for publication.
Martirosyan, S.S., Colbourn, C.J., 2005. Recursive constructions for covering arrays. Bayreuth. Math. Schr. 74, 266–275.
Martirosyan, S.S., Van Trung, T., 2004. On t-covering arrays. Designs Codes Crypt. 32, 323–339.
Nurmela, K., 2004. Upper bounds for covering arrays by tabu search. Discrete Appl. Math. 138, 143–152.
Shasha, D.E., Kouranov, A.Y., Lejay, L.V., Chou, M.F., Coruzzi, G.M., 2001. Using combinatorial design to study regulation by multiple input signals: a tool for parsimony in the post-genomics era. Plant Physiology 127, 1590–1594.
Sherwood, G., Martirosyan, S.S., Colbourn, C.J., 2006. Covering arrays of higher strength from permutation vectors. J. Combin. Design 14, 202–213.
Sloane, N.J.A., 1993. Covering arrays and intersecting codes. J. Combin. Design 1, 51–63.
Walker, R.A. II, 2005. Covering Arrays and Perfect Hash Families. Ph.D. Thesis, Computer Science and Engineering, Arizona State University.
Walker, R.A. II, Colbourn, C.J., 2007. Perfect hash families: constructions and existence. J. Math. Crypt. 1, 125–150.