Journal of Statistical Planning and Inference 139 (2009) 69 -- 80
Contents lists available at ScienceDirect
Journal of Statistical Planning and Inference journal homepage: www.elsevier.com/locate/jspi
Tabu search for covering arrays using permutation vectors Robert A. Walker II, Charles J. Colbourn∗ Computer Science and Engineering, Arizona State University, P.O. Box 878809, Tempe, AZ 85287, USA
ARTICLE INFO
Available online 27 May 2008 Keywords: Covering array Orthogonal array Permutation vector Tabu search Heuristic search
ABSTRACT
A covering array CA(N; t, k, v) is an N × k array in which, in every N × t subarray, each of the $v^t$ possible t-tuples over v symbols occurs at least once. The parameter t is the strength of the array. Covering arrays have a wide range of applications for experimental screening designs, particularly for software interaction testing. A compact representation of certain covering arrays employs “permutation vectors” to encode $v^t \times 1$ subarrays of the covering array so that a covering perfect hash family whose entries correspond to permutation vectors yields a covering array. We introduce a method for effective search for covering arrays of this type using tabu search. Using this technique, improved covering arrays of strength 3, 4 and 5 have been found, as well as the first arrays of strength 6 and 7 found by computational search. © 2008 Elsevier B.V. All rights reserved.
1. Introduction

A covering array CA(N; t, k, v) is an N × k array in which every subarray induced by a selection of t columns contains all possible t-tuples over v symbols. Fig. 1 shows a CA(13; 3, 10, 2). A CA($v^t$; t, k, v) is an orthogonal array, denoted OA(t, k, v); in this case every t-tuple occurs exactly once. The smallest N for which a CA(N; t, k, v) exists is the covering array number, denoted CAN(t, k, v).

Screening experiments are often used to indicate factors and levels that impact response; once such factors are identified, more detailed models can then be constructed to measure main effects and interactions. A particular case arises in testing a complex system for unexpected interactions; in experimental design, covering arrays arise primarily in this setting. Covering arrays have been the focus of much research, primarily due to their applications in software and hardware interaction testing. These applications are discussed in Cohen et al. (1997) and Colbourn (2004). Applications in biological sciences also arise (Shasha et al., 2001). Our focus is on construction techniques, rather than on the specific application to experimental design.

Techniques used to construct covering arrays include recursive methods (for examples see Hartman and Raskin, 2004; Martirosyan and Van Trung, 2004; Sloane, 1993), algebraic methods (Chateauneuf and Kreher, 2002; Hedayat et al., 1999), and computational search such as in Cohen (2004, 2005) and Nurmela (2004).

Recently, Sherwood et al. (2006) exploited a compact representation of covering arrays based on permutation vectors. When v is prime or a prime power, a covering perfect hash family CPHF(n; k, $v^{t-1}$, t) is an n × k array on $v^{t-1}$ symbols such that every n × t subarray contains at least one row which is “covering” in the following sense. Each of the $v^{t-1}$ symbols in a CPHF can be viewed as a (t − 1)-tuple on v symbols. This (t − 1)-tuple represents a permutation vector of length $v^t$ over the elements of the finite field $\mathbb{F}_v$. Given a (t − 1)-tuple $(h_1, h_2, \ldots, h_{t-1})$ with $h_j \in \{0, 1, \ldots, v-1\}$ for $1 \le j \le t-1$, the permutation vector $\overrightarrow{(h_1, h_2, \ldots, h_{t-1})}$ of length $v^t$ has the symbol
\[
(h_{t-1} \cdot \beta_{t-1}^{(i)}) + \cdots + (h_2 \cdot \beta_2^{(i)}) + (h_1 \cdot \beta_1^{(i)}) + \beta_0^{(i)}
\]
in position i,
∗
Corresponding author. Tel.: +1 480 727 6631; fax: +1 480 965 2751. E-mail addresses:
[email protected] (R.A. Walker),
[email protected] (C.J. Colbourn).
0378-3758/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2008.05.020
R.A. Walker, C.J. Colbourn / Journal of Statistical Planning and Inference 139 (2009) 69 -- 80
Fig. 1. A CA(13; 3, 10, 2).
where i is represented in base v as $i = \sum_{k=0}^{t-1} v^k \cdot \beta_k^{(i)}$. A row is covering if the expansion of the permutation vectors into columns results in an OA(t, t, v). When every symbol in a CPHF is expanded in this manner, the result is a covering array.

When i < v, $\beta_k^{(i)} = 0$ for k > 0. Hence, every permutation vector starts with the sequence 0, 1, . . . , v − 1. Eliminating these duplicate rows leads to the key theorem of Sherwood et al. (2006):

Theorem 1.1. If v is a prime or a prime power, and a CPHF(n; k, $v^{t-1}$, t) exists, then a CA($n \cdot (v^t - v) + v$; t, k, v) exists.

We typically omit the exponent, and refer to a CPHF(n; k, v, t) instead of a CPHF(n; k, $v^{t-1}$, t).

Using backtracking, Sherwood et al. (2006) found covering arrays for strengths 3 and 4 that improve upon other known constructions. In this paper, we employ the permutation vector representation as the basis of a tabu search method. In this way, we find a number of improved covering arrays for strengths 3–5; more surprisingly, we find the first covering arrays of strength 6 and 7 from computer search. We conclude by presenting the first existence tables for covering arrays of strength 5, partly to demonstrate the utility of the arrays found by the heuristic search method.

2. Forming CAs from CPHFs

In order to understand the construction underlying Theorem 1.1, we show the expansion of the following CPHF(2; 10, 3, 3) into a CA:

11 00 22 21 01 02 10 11 02 12
10 01 11 11 00 22 01 02 20 12
Write each of the $v^{t-1}$ symbols as a (t − 1)-tuple on v symbols (in this case, the $3^2$ symbols as 2-tuples on 3 symbols). To convert the symbol 11 ($h_1 = 1$, $h_2 = 1$) into a vector of length $3^3$, each row number i is written as a t-tuple in base v. Hence, for example, i = 0 is written as i = 000 and i = 17 is written as i = 122. For row i = 000, the vector is assigned the value 0 · 1 + 0 · 1 + 0 = 0. Continuing in this manner (arithmetic in $\mathbb{Z}_3$),

i = 001 : 0 · 1 + 0 · 1 + 1 = 1
i = 002 : 0 · 1 + 0 · 1 + 2 = 2
i = 010 : 0 · 1 + 1 · 1 + 0 = 1
i = 011 : 0 · 1 + 1 · 1 + 1 = 2
i = 012 : 0 · 1 + 1 · 1 + 2 = 0
i = 020 : 0 · 1 + 2 · 1 + 0 = 2
i = 021 : 0 · 1 + 2 · 1 + 1 = 0
i = 022 : 0 · 1 + 2 · 1 + 2 = 1
...
i = 212 : 2 · 1 + 1 · 1 + 2 = 2
i = 220 : 2 · 1 + 2 · 1 + 0 = 1
i = 221 : 2 · 1 + 2 · 1 + 1 = 2
i = 222 : 2 · 1 + 2 · 1 + 2 = 0
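The expansion rule can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes v is prime, so that field arithmetic is simply arithmetic modulo v (a prime power would need a genuine finite-field implementation), and the names `permutation_vector` and `expand_cphf` are hypothetical.

```python
def permutation_vector(h, v):
    """Expand the symbol h = (h_1, ..., h_{t-1}) into its permutation vector.

    Assumes v is prime, so field arithmetic is arithmetic modulo v.
    """
    t = len(h) + 1
    vector = []
    for i in range(v ** t):
        # Base-v digits of i: digits[k] is the coefficient of v**k.
        digits = [(i // v ** k) % v for k in range(t)]
        value = digits[0] + sum(h[k - 1] * digits[k] for k in range(1, t))
        vector.append(value % v)
    return vector


def expand_cphf(cphf, v):
    """Expand a CPHF (a list of rows of symbols) into the rows of an array.

    The first v entries of every permutation vector are 0, 1, ..., v-1,
    so those constant rows are kept only once.
    """
    k = len(cphf[0])
    t = len(cphf[0][0]) + 1
    rows = [[i] * k for i in range(v)]
    for cphf_row in cphf:
        columns = [permutation_vector(h, v) for h in cphf_row]
        for i in range(v, v ** t):
            rows.append([column[i] for column in columns])
    return rows


# The CPHF(2; 10, 3, 3) of this section; the symbol "ab" is the pair (h_1, h_2) = (a, b).
CPHF_2_10 = [
    [(1, 1), (0, 0), (2, 2), (2, 1), (0, 1), (0, 2), (1, 0), (1, 1), (0, 2), (1, 2)],
    [(1, 0), (0, 1), (1, 1), (1, 1), (0, 0), (2, 2), (0, 1), (0, 2), (2, 0), (1, 2)],
]
ca = expand_cphf(CPHF_2_10, 3)
print(len(ca), len(ca[0]))  # 51 10
```

The row count agrees with Theorem 1.1, since 2 · (27 − 3) + 3 = 51.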
Fig. 2. A CA(54; 3, 10, 3)$^{T}$.
Expanding each symbol of the CPHF in this manner, the CA(54; 3, 10, 3) shown transposed in Fig. 2 is obtained. The first v rows (columns as shown) of every permutation vector are the same, in this case 0, 1, 2. We only need to use one copy of each. So we reduce the CA(54; 3, 10, 3) to a CA(51; 3, 10, 3). This process is described in more detail in Sherwood et al. (2006).

3. Mathematical preliminaries

We first discuss how to determine whether a set of permutation vectors is covering. Consider t permutation vectors:
\[
\overrightarrow{(h_1^{(1)}, h_2^{(1)}, \ldots, h_{t-1}^{(1)})},\ \overrightarrow{(h_1^{(2)}, h_2^{(2)}, \ldots, h_{t-1}^{(2)})},\ \ldots,\ \overrightarrow{(h_1^{(t)}, h_2^{(t)}, \ldots, h_{t-1}^{(t)})}
\]
This set of permutation vectors is covering if its expansion into a $v^t \times t$ array is an orthogonal array. To check if this condition is not met, we check to see if the array contains some t-tuple twice as a row. Hence, a set of permutation vectors is non-covering if and only if we can find distinct $i, j \in \{0, 1, \ldots, v^t - 1\}$ so that
\[
\begin{aligned}
\beta_0^{(i)} + (h_1^{(1)} \cdot \beta_1^{(i)}) + (h_2^{(1)} \cdot \beta_2^{(i)}) + \cdots + (h_{t-1}^{(1)} \cdot \beta_{t-1}^{(i)}) &= \beta_0^{(j)} + (h_1^{(1)} \cdot \beta_1^{(j)}) + (h_2^{(1)} \cdot \beta_2^{(j)}) + \cdots + (h_{t-1}^{(1)} \cdot \beta_{t-1}^{(j)}) \\
\beta_0^{(i)} + (h_1^{(2)} \cdot \beta_1^{(i)}) + (h_2^{(2)} \cdot \beta_2^{(i)}) + \cdots + (h_{t-1}^{(2)} \cdot \beta_{t-1}^{(i)}) &= \beta_0^{(j)} + (h_1^{(2)} \cdot \beta_1^{(j)}) + (h_2^{(2)} \cdot \beta_2^{(j)}) + \cdots + (h_{t-1}^{(2)} \cdot \beta_{t-1}^{(j)}) \\
&\ \ \vdots \\
\beta_0^{(i)} + (h_1^{(t)} \cdot \beta_1^{(i)}) + (h_2^{(t)} \cdot \beta_2^{(i)}) + \cdots + (h_{t-1}^{(t)} \cdot \beta_{t-1}^{(i)}) &= \beta_0^{(j)} + (h_1^{(t)} \cdot \beta_1^{(j)}) + (h_2^{(t)} \cdot \beta_2^{(j)}) + \cdots + (h_{t-1}^{(t)} \cdot \beta_{t-1}^{(j)})
\end{aligned}
\tag{1}
\]
Write $\gamma_r = \beta_r^{(i)} - \beta_r^{(j)}$ for $0 \le r \le t-1$ with fixed i and j. Then rewrite (1) as
\[
\begin{aligned}
\gamma_0 + (h_1^{(1)} \cdot \gamma_1) + (h_2^{(1)} \cdot \gamma_2) + \cdots + (h_{t-1}^{(1)} \cdot \gamma_{t-1}) &= 0 \\
\gamma_0 + (h_1^{(2)} \cdot \gamma_1) + (h_2^{(2)} \cdot \gamma_2) + \cdots + (h_{t-1}^{(2)} \cdot \gamma_{t-1}) &= 0 \\
&\ \ \vdots \\
\gamma_0 + (h_1^{(t)} \cdot \gamma_1) + (h_2^{(t)} \cdot \gamma_2) + \cdots + (h_{t-1}^{(t)} \cdot \gamma_{t-1}) &= 0
\end{aligned}
\tag{2}
\]
The set of permutation vectors is non-covering if and only if there exist $\{\gamma_r : 0 \le r \le t-1\}$, with $\gamma_i \ne 0$ for at least one i, that solve the system of linear equations (2). As an example, consider the following CPHF(3; 22, 3, 3):

11 11 21 10 21 22 02 20 12 01 00 01 11 10 00 02 00 20 12 10 21 22
20 21 22 21 11 01 01 11 01 00 20 22 10 12 12 02 02 20 10 00 10 11
01 22 21 00 01 02 12 10 10 22 21 12 02 12 22 20 00 11 20 10 11 20
Each symbol represents a permutation vector. For instance, the symbol 01 represents the vector with h1 = 0, h2 = 1. For the example, the field F3 is simply Z3 .
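System (2) can be tested directly by enumerating every candidate $(\gamma_0, \ldots, \gamma_{t-1})$. The following Python sketch does this by brute force; it assumes v is prime (arithmetic modulo v), the function name `is_covering` is hypothetical, and it is meant only to illustrate the condition, not the faster tests used in the search.

```python
from itertools import product


def is_covering(symbols, v):
    """Return True if the t symbols (each a tuple (h_1, ..., h_{t-1})) are covering.

    The set is non-covering exactly when some gamma, not all zero,
    solves every equation of system (2). Assumes v is prime.
    """
    t = len(symbols)
    for gamma in product(range(v), repeat=t):
        if not any(gamma):
            continue  # the all-zero assignment does not count
        if all(
            (gamma[0] + sum(h[r - 1] * gamma[r] for r in range(1, t))) % v == 0
            for h in symbols
        ):
            return False
    return True


# Symbols from the first three columns of the CPHF(3; 22, 3, 3), written as (h_1, h_2).
print(is_covering([(1, 1), (1, 1), (2, 1)], 3))  # False (row 1, repeated symbol)
print(is_covering([(2, 0), (2, 1), (2, 2)], 3))  # False (row 2)
print(is_covering([(0, 1), (2, 2), (2, 1)], 3))  # True  (row 3)
```

The enumeration costs $v^t$ evaluations per test, which is acceptable for illustration but too slow for the inner loop of a search.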
Let us consider the first three columns of this array. The first row contains 11 11 21. This row is non-covering due to the duplication of the vector represented by 11. The second row contains 20 21 22. For this set of vectors, a solution to (2) is given by $(\gamma_0, \gamma_1, \gamma_2) = (1, 1, 0)$ (arithmetic in $\mathbb{Z}_3$):

1 + 1 · 2 + 0 · 0 = 1 + 2 = 0
1 + 1 · 2 + 0 · 1 = 1 + 2 = 0
1 + 1 · 2 + 0 · 2 = 1 + 2 = 0

Hence the second row yields a distinct non-covering tuple. However, the third row, 01 22 21, is covering because there is no choice of $(\gamma_0, \gamma_1, \gamma_2)$ to solve (2). Because this row is covering, it expands into an orthogonal array, shown transposed:

[ 0 1 2 0 1 2 0 1 2 1 2 0 1 2 0 1 2 0 2 0 1 2 0 1 2 0 1 ]
[ 0 1 2 2 0 1 1 2 0 2 0 1 1 2 0 0 1 2 1 2 0 0 1 2 2 0 1 ]
[ 0 1 2 2 0 1 1 2 0 1 2 0 0 1 2 2 0 1 2 0 1 1 2 0 0 1 2 ]
We include the three constant rows (columns as shown) only once each. Hence the CPHF(3; 22, 3, 3) yields a CA(75; 3, 22, 3), since $3 \cdot (3^3 - 3) + 3 = 75$, and establishes the bound CAN(3, 22, 3) $\le$ 75.

4. Tabu search

Nurmela's tabu search results for covering arrays (Nurmela, 2004) treat the actual covering array. Our tabu search method searches instead for the covering perfect hash family in order to reduce computation time significantly. We maintain a current candidate array with an associated score. We also maintain a list of recent states as a tabu list. We then generate a neighborhood of moves. We choose from these the move with the best score that does not take us into a state in the tabu list. More information on the tabu search method can be found in Glover and Laguna (1997).

The score S of a given candidate array is the number of sets of t columns that have no covering row; such a set of columns is uncovered, and each column in an uncovered set is deficient. By definition, $0 \le S \le \binom{k}{t}$ and an array with score S = 0 is a CPHF. A move changes one element of the array to a new value. Not all potential moves need to be considered. Changing an element within a column that is not deficient can have no positive effect. Therefore, we limit the neighborhood of moves to deficient columns. We denote the number of deficient columns as D, so that $D \le k$ and $D \le S \cdot t$.

There are $nD(v^{t-1} - 1)$ moves to consider. For each, we compute the score the new array would have. We cache information about which set(s) of columns are covered by which row(s) and thus consider only the $\binom{k-1}{t-1}$ sets of columns that contain the element being changed. This is slightly less efficient than the “cost change table” discussed in Fleurant and Ferland (1996), but employs similar ideas and uses less memory. In the event that two or more moves result in the same best score, we choose among them randomly. We maintain a tabu list of the last 50,000 moves made.
This number was chosen after extensive experimentation; larger tabu lists did not improve the results in our cases. Using this list, we are able to generate a list of moves that take the current array back to a tabu array. This is discussed in more detail in Section 4.1. We start the search with a randomly generated array. At the beginning of the search, we usually have D = k, which leads to a large number of moves to consider. It is therefore helpful to restrict the neighborhood examined. To do this, we select one column in a weighted random fashion, where the weight of each column is the number of non-covering sets to which it belongs. By restricting the neighborhood to changes within this column, we can increase the speed by a factor of D with only a minor decrease in search effectiveness. Once D becomes small (ideally D < k), we remove this neighborhood restriction. When D does not reach the threshold to be considered small, we turn the neighborhood restriction off when the search appears to “stall”. 4.1. The tabu list Tabu search relies on keeping a fixed-length list of recent states known as a tabu list. The search program is then prohibited from revisiting these states. This technique helps to prevent cycles. For large arrays, storing a long list of arrays is prohibitive both for memory and computation time, when one must compare each target array with every array in this list. Nurmela (2004) employs a general strategy discussed in Glover and Laguna (1997) in order to simplify the tabu condition. Instead of storing a long list of recent arrays, Nurmela stores a short list of recent positions that were modified. Changes to these positions are prohibited until enough moves have occurred that they are no longer considered tabu. This technique prevents revisiting any recent states; however, it can disallow moves that should not be tabu. 
The size of the tabu list with this technique must be shorter than the size of exact tabu lists; otherwise it is possible to make every move tabu. In the search, we keep an undo list of recent moves on which a move is specified by a row, a column, and the value that was replaced. At each iteration of the search, we can convert this to a list of moves taking the current array back to a tabu state. To accomplish this, we use Algorithm 1. It works by keeping a list of differences between the old array and the current array.
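The conversion of the undo list into tabu moves, stated as Algorithm 1 in this section, can be sketched in Python as follows. This is an illustrative rendering under stated assumptions, not the authors' code: the names `Undo` and `tabu_moves` are hypothetical, the dictionary `changes` plays the role of both Changes and StateCount (its length is the number of differing positions), and array entries are plain integers.

```python
from typing import NamedTuple


class Undo(NamedTuple):
    row: int
    col: int
    old_value: int  # value that the move replaced


def tabu_moves(current, undo_history):
    """Return the set of moves (row, col, value) that would recreate a past array.

    undo_history is ordered from oldest to most recent move.
    """
    changes = {}  # position -> value it held in the past array being reconstructed
    tabu = set()
    for undo in reversed(undo_history):
        position = (undo.row, undo.col)
        if current[undo.row][undo.col] == undo.old_value:
            # The past array agrees with the current one at this position again.
            changes.pop(position, None)
        else:
            changes[position] = undo.old_value
        if len(changes) == 1:
            # Exactly one position differs: restoring it revisits a past state.
            (row, col), value = next(iter(changes.items()))
            tabu.add((row, col, value))
    return tabu


# A 1 x 3 array that went [0,0,0] -> [1,0,0] -> [1,2,0] -> [1,0,0].
history = [Undo(0, 0, 0), Undo(0, 1, 0), Undo(0, 1, 2)]
print(sorted(tabu_moves([[1, 0, 0]], history)))  # [(0, 0, 0), (0, 1, 2)]
```

In the small example, setting position (0, 1) back to 2 or position (0, 0) back to 0 would each recreate an earlier array, so both moves are reported as tabu.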
Whenever there is exactly one position that differs, we declare changing that position to the value it held previously to be a tabu move.

Algorithm 1. Convert undo list into tabu list.
  StateCount ← 0
  Changes ← ∅
  TabuMoves ← ∅
  for Undo ∈ UndoHistory from most recent to oldest do
    if CurrentArray[Undo.Row][Undo.Col] = Undo.OldValue then
      Remove [Undo.Row][Undo.Col] from Changes
      StateCount ← StateCount − 1
    else
      if [Undo.Row][Undo.Col] in Changes then
        Changes[Undo.Row][Undo.Col] ← Undo.OldValue
      else
        Insert [Undo.Row][Undo.Col] = Undo.OldValue into Changes
        StateCount ← StateCount + 1
      end if
    end if
    if StateCount = 1 then
      Insert Changes[1] into TabuMoves
    end if
  end for

Using this more exact tabu list resulted in much faster searches and better results than using the technique given by Nurmela. This algorithm can be used for any tabu search that operates on arrays. Similar techniques are discussed in much more detail in Glover and Laguna (1997); in particular, the method of “reverse elimination” discussed therein employs logical structure to infer tabu moves rather than storing them explicitly, as we do here.

4.2. The non-covering test

The search program very frequently needs to test whether a given set of permutation vectors is covering or non-covering. Since this test comprises the most executed portion of the program, it is important for it to run as fast as possible. Recall that a given set of permutation vectors is non-covering if and only if there exist non-zero $\gamma_r$ that solve the equation
\[
\gamma_0 + (h_1^{(i)} \times \gamma_1) + (h_2^{(i)} \times \gamma_2) + \cdots + (h_{t-1}^{(i)} \times \gamma_{t-1}) = 0
\tag{3}
\]
for all $h^{(i)}$ in the set. It is possible to perform a lot of this work before the search begins. We think of it in reverse, as follows. Given non-zero values $\gamma_r$ there are $v^{t-2}$ permutation vectors $h^{(i)}$ so that (3) holds (Sherwood et al., 2006). We can pre-compute the set of vectors that is “solved” by each assignment of the $\gamma_r$'s. Call this set $V(\{\gamma_0, \gamma_1, \ldots, \gamma_{t-1}\})$. Then, a given set T of permutation vectors is non-covering if and only if there exist non-zero $\gamma_r$'s so that $T \subseteq V(\{\gamma_0, \gamma_1, \ldots, \gamma_{t-1}\})$. Testing this condition is much faster than solving the linear system, and uses very little memory. In the worst case, this method runs in $O(t \cdot v^{t-1} \cdot (t-2) \log v)$ time since there are t elements to test for inclusion in $O(v^{t-1})$ sets using binary search on $v^{t-2}$ elements. Because it uses so little memory, we use this method for large t and v, specifically when $v^{2(t-1)} > 5\,000\,000$.

Another method is to compute the set $V^{-1}((h_1, h_2, \ldots, h_{t-1}))$ of $\gamma_r$'s that solves (2) for permutation vector $(h_1, h_2, \ldots, h_{t-1})$. Then a given set T of permutation vectors is non-covering if and only if
\[
\bigcap_{h \in T} V^{-1}(h) \ne \emptyset
\tag{4}
\]
We accelerate this by storing not only $V^{-1}(h)$ but also the values of $V^{-1}(h) \cap V^{-1}(k)$ for any two permutation vectors h and k. This uses significantly more space than the previous method. However, storing the $V^{-1}$ sets in a sorted list lets us check the non-covering condition in $O((\lceil t/2 \rceil - 1) v^{t-1})$ time. When t = 4 this is just $O(v^3)$ time. This method is used for t > 4 and for t = 4, $v \ge 5$.
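The test based on condition (4) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes v is prime, it stores each $V^{-1}(h)$ as a Python `frozenset` rather than a sorted list, it omits the cached pairwise intersections, and the names `solution_sets` and `is_non_covering` are hypothetical.

```python
from itertools import product


def solution_sets(v, t):
    """Precompute, for every symbol h, the set of non-zero gamma with
    gamma_0 + h_1*gamma_1 + ... + h_{t-1}*gamma_{t-1} = 0 (mod v)."""
    gammas = [g for g in product(range(v), repeat=t) if any(g)]
    table = {}
    for h in product(range(v), repeat=t - 1):
        table[h] = frozenset(
            g for g in gammas
            if (g[0] + sum(h[r - 1] * g[r] for r in range(1, t))) % v == 0
        )
    return table


def is_non_covering(symbols, table):
    """Condition (4): the symbols are non-covering iff their sets share a gamma."""
    common = frozenset.intersection(*(table[h] for h in symbols))
    return len(common) > 0


TABLE = solution_sets(3, 3)
print(is_non_covering([(2, 0), (2, 1), (2, 2)], TABLE))  # True
print(is_non_covering([(0, 1), (2, 2), (2, 1)], TABLE))  # False
```

The table is built once before the search, so each test reduces to t − 1 set intersections; caching pairwise intersections, as described above, roughly halves that count.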
Fig. 3. A WCA(4; 8, 4).
For the fastest check, we simply store a single bit for every possible set of permutation vectors: 1 if it is covering, and 0 if it is not. This permits a constant time check, but requires $O((v^{t-1})^t)$ space, which is prohibitive for large t or v. We use this check when t = 3 and when t = 4 and $v \le 4$.

4.3. Related search problems

The method developed for CPHF search can be applied to many similar search problems. Let $C = \{C_i : i = 1, \ldots, \ell\}$ be a set of subsets of tuples of the same length over an alphabet of size v. Let the length of the tuples in set $C_i$ be denoted $t_i$. A C-(N, k)-array is an N × k array with entries from the same alphabet of size v, in which every N × t subarray has the property that for every i with $1 \le i \le \ell$, there exists a row of the subarray equal to a t-tuple in $C_i$. No assumption is made that $C_i$ and $C_j$ are disjoint, nor that $t_i = t_j$, nor that a given tuple appears in any of the sets. We also define a $C^*$-array to have the same property when we only consider N × t subarrays where the columns maintain their original ordering. Any C-array can be formulated as a $C^*$-array; however, the converse is not necessarily true.

Taking C to be the set of $v^t$ singleton sets each containing a distinct t-tuple, a $C^*$-array is a covering array. If we consider all groups of $C_i$ whose elements are merely permutations of each other, and keep only one of each in C, a C-array is a covering array. Taking C to have a single set in which t-tuples with distinct entries appear makes a $C^*$-array a perfect hash family, and restricting the set to contain only the covering t-tuples makes a $C^*$-array a CPHF. In general, the description of C is not an explicit listing of tuples; rather an oracle to test membership of a tuple in $C_i$ is assumed. The generic method of search has been used to find perfect hash families in Walker and Colbourn (2007). The search method can find a $C^*$-array for any set C, but we optimized it for the case when |C| = 1.
To demonstrate the generic nature of this method, set C to be a single set C1 consisting of common 4-letter English words. We refer to this as a WCA: word covering array of strength 4, and Fig. 3 gives an example. While the practical implications of such an array are likely non-existent, it is an effective demonstration of the generic nature of the algorithm. It is also interesting to see the kind of patterns “utilized” by the resulting array, since many of these same patterns probably appear less visibly in CPHFs. 5. Results Given n, v and t, the maximum value of k for which a CPHF was found is shown in Tables 1–5. Items marked with an asterisk appeared in Sherwood et al. (2006). All others were created with the method in this paper. Most of the arrays in these tables were found by a 2.66 GHz Pentium 4 in less than 10 min. Some arrays with k columns required more than a day of search; however, finding an array with k − 1 columns in these cases always took less than an hour. The most difficult arrays to find generally are those listed at the borders of the table, i.e. those with the highest k for a given v and t. This difficulty is based on the added size of the search space and the additional memory needed to process the non-covering condition. Explicit solutions appear in Walker (2005). We give one array of each strength in Table 6. 5.1. Analysis and reduction In order to assess the quality of the covering arrays generated, we implemented an analysis program that tests for flexibility in the array. One form of flexibility is a “don't-care” position, a position whose value can be changed without affecting the covering property. Of the 24 strength 3 arrays generated, only 3 have don't-care positions, and they occur in fewer than 1% of the positions. For strength 4, 5 arrays of 25 have don't-care positions, making up roughly 4.4% of the positions. For strength 5, 5 of 12 arrays have don't-care positions, making up 4.5% of the positions. 
For strength 6, 7 of 9 arrays have flexible positions making up 8.7% of the positions. Finally, for strength 7, all 6 arrays have flexible positions making up 4.7% of the positions.

Table 1
Table for k, given n and v, where t = 3
v \ n    2     3     4     5     6     7
3        10    22    37    57    89    142
4        16*   34    64    118   222
5        24*   48    95    160
7        32    81    150
8        40*   91    200
9        41    113   225
11       50    146
13       59    200

Table 2
Table for k, given n and v, where t = 4
v ∈ {2, 3, 4, 5, 7, 8, 9, 11, 13}; the entries for each n are listed in order of increasing v
n = 2: 8*, 10*, 13, 15, 18, 20, 21, 23, 24
n = 3: 9, 16*, 20, 24, 30, 36, 39
n = 4: 12, 23, 31, 35, 54
n = 5: 30, 42, 62
n = 6: 39
n = 7: 50

Table 3
Table for k, given n and v, where t = 5
v \ n    2     3     4     5     6
2        10    12    14    16    17
3        10    13    16    19
4        11    15

Table 4
Table for k, given n and v, where t = 6
v \ n    2     3     4     5     6
2        7     9     11    12    14
3        10    12    14
4        11

Table 5
Table for k, given n and v, where t = 7
v ∈ {2, 3}; the entries for each n are listed in order of increasing v
n = 2: 9, 12
n = 3: 11, 12
n = 4: 12, 13
n = 5: 12
n = 6: 13

The larger proportion of don't-care positions as strength increases appears to be due to the additional complexity of searching for higher-strength arrays, in turn suggesting that better arrays are more likely to exist in these cases. A higher degree of flexibility is indicated by the presence of a redundant row. This does not happen for any of the strength 3 arrays. However, it happens once for strength 4, twice for strength 5, six times for strength 6, and five times for strength 7. This again shows a strong trend of higher quality for lower strength. Removing these extra rows produces the following improved arrays:
CA(54; 4, 12, 2)
CA(110; 5, 14, 2), CA(176; 5, 17, 2)
CA(96; 6, 7, 2), CA(160; 6, 9, 2), CA(232; 6, 11, 2), CA(272; 6, 12, 2), CA(322; 6, 14, 2), CA(1449; 6, 10, 3)
CA(224; 7, 9, 2), CA(364; 7, 11, 2), CA(452; 7, 12, 2), CA(572; 7, 13, 2), CA(4347; 7, 12, 3)

Table 6
An example for each strength

CPHF(2; 16, 4, 3)
32 20 13 10 21 22 11 02 23 01 00 12 03 30 33 31
00 23 11 12 02 13 30 03 10 01 32 31 22 33 20 21

CPHF(2; 10, 3, 4)
122 100 020 202 112 222 201 000 212 120
112 100 101 221 122 200 021 201 120 011

CPHF(3; 13, 3, 5)
1001 1111 1002 0110 1012 1201 0000 2112 0210 2101 1110 2020 0221
0010 0101 2001 2200 0200 0121 0211 1220 2222 2112 1111 1112 0102
1021 2110 2010 0102 0012 2122 2111 0112 1212 0110 1022 0021 1001

CPHF(3; 9, 2, 6)
00101 00000 01010 10101 01100 00110 10000 10100 10001
11001 10100 10101 00000 11101 01111 11111 01011 00101
10010 01001 01100 01011 01000 01110 11110 11010 10001

CPHF(2; 9, 2, 7)
110000 101101 111101 100100 000111 101010 000010 101110 111011
100000 111010 010001 110111 001111 111111 000001 111100 111011

5.2. Recursive bound improvements

The results found are often the best known covering arrays with small k. Thus they have the additional value of being excellent ingredients to recursive constructions. Establishing these claims is by no means straightforward, since to the best of our knowledge no one has tabulated values of CAN(5, k, v). We therefore developed a Maple environment to incorporate all of the known direct and recursive constructions for covering arrays of strengths 2–5. For strengths 3 and 4, these tables and the known constructions appear in Colbourn et al. (2006). For strength 5, we use the Roux-type constructions of Martirosyan and Colbourn (2005) and Martirosyan and Van Trung (2004), the perfect hash family construction (Colbourn et al., 2006; Martirosyan and Van Trung, 2004), the Turán squaring construction of Hartman (2005), as well as direct constructions. The direct constructions arise from orthogonal arrays (Hedayat et al., 1999) and other computational search techniques (Cohen, 2004, 2005; Cohen et al., 2008; Nurmela, 2004).

Now to illustrate the impact of the new direct constructions, we present the best-known values for CAN(5, k, v) before and after the CPHF constructions are included. In Table 7, two upper bounds on CAN(5, k, v) are given for $2 \le v \le 9$ and k ∈ {10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10 000}.

Table 7
Bounds on CAN(5, k, v) for 2 ≤ v ≤ 9 (two rows per v: first bound, then second bound)
v \ k   10      25      50      100      250      500      1000     2500     5000     10000
2       62      287     392     644      1007     1464     2047     2744     3038     3926
        112     347     392     674      1037     1494     2077     2744     3614     4502
3       483     2007    3765    5997     8667     12969    18369    28491    36651    42147
        999     2689    4113    6749     10517    15247    21103    31994    41278    51471
4       2044    7396    13656   20440    41245    57663    78713    110443   143562   153132
        3456    12736   19776   29354    46561    64593    86765    122088   156833   195258
5       9875    31735   49325   71095    113089   156847   210737   303763   387913   454755
        9875    37259   55809   82603    126789   175571   235461   334279   426269   494739
6       24474   78876   133942  202550   346098   492420   692610   1002556  1284624  1441860
        24474   82196   149102  226140   366853   520135   729525   1074301  1364799  1460130
7       48363   104433  179417  268765   461015   648259   899415   1334711  1686031  2146675
        48363   114729  212429  311893   517157   712753   985653   1479395  1840831  2319151
8       59048   191112  332696  520977   806834   1124979  1509029  2296905  2951596  3703209
        59048   212623  400904  589185   896553   1280603  1664653  2463183  3156290  3907903
9       59049   326925  575127  910945   1410635  1981605  2677055  3784425  4891283  6159485
        59049   367757  703575  1039393  1579915  2275365  2970815  4047337  5315539  6583741
Table 8 Bounds for (N; 5, 2) 6
32o
12
v
92
17
176v
28
i
287
40
j
375
144
644q
10
62v
14
110
v
16
152v
20
194j
24
261i
32
i
34
357j
q
81
434q
8
176
j
1002
252
1007j
288
j
1031
56u
330
64
392
162
734j
169
770q
192
1005
j
224
1006j
256
1025j
280
1026j
320
j
324
1123j
q
1121
338
j
1217
361
1358
378
1455j
384
1456j
448
1461j
504
1464j
512
j
1504
560
1507
j
576
1519j
640
1609j
648
1614j
676
1708j
722
1849j
756
1946j
768
1947j
800
j
1952
896
j
2023
924
j
2029
1008
j
2047
1024
j
2096
1120
2102j
1152
2123j
1280
2213j
1296
2223j
1352
2317j
1408
2458j
1444
2461j
1512
j
1536
j
1600
j
2565
1620
j
1792
j
1848
2717j
6864
j
4096
2558
q
2744
6561
2559
q
3038
3614
8192
2706
j
3632
10 000
2707
j
3926
The first bound given uses the CPHFs produced here in conjunction with the known recursive constructions, and all other direct constructions of which we are aware. The second entry gives the bound calculated in the same manner, but omitting the CPHF direct constructions. Two things are striking. The impact of the “small” arrays produced on the recursions makes a substantial improvement for v ∈ {2, 3, 4}. Perhaps more surprising are the improvements for $5 \le v \le 9$, since these result from the use in the recursions of covering arrays of strengths 3 and 4. Since CPHF constructions improve the bounds on CAN(t, k, v) for t ∈ {3, 4}, improvements arise for strength five as well.

We provide more detailed information in Tables 8–10 for v ∈ {2, 3, 4}, presenting the tables after the use of covering perfect hash families. Let $\kappa(N; t, v)$ be the largest k for which CAN(t, k, v) $\le$ N. As k increases, for many consecutive numbers of factors (columns), the covering array number does not change. Therefore reporting those values of $\kappa(N; t, v)$ for which $\kappa(N; t, v) > \kappa(N - 1; t, v)$, along with the corresponding value of N, enables one to determine all covering array numbers when k is no larger than the largest $\kappa(N; t, v)$ value tabulated. Since the exact values for covering array numbers are unknown in general, we in fact report lower bounds on $\kappa(N; t, v)$.

For each strength in turn, explicit constructions of covering arrays from direct and computational constructions are tabulated. Then each known construction is applied and its consequences tabulated (in the process, results implied by this for fewer factors are suppressed, so that one explanation (“authority”) for each entry is maintained). Application of the recursions is repeated until no entries in the table improve. The authorities used are:

h  perfect hash family (Martirosyan and Van Trung, 2004)
i  Roux-type doubling (Martirosyan and Van Trung, 2004)
j  Roux-type doubling (Martirosyan and Colbourn, 2005)
o  orthogonal array (Hedayat et al., 1999)
q  Turán squaring (Hartman, 2005)
s  simulated annealing (Cohen, 2004, 2005)
u  miscellaneous direct construction
v  permutation vector (this paper)
z  composition
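The way such a table determines a bound for every number of factors can be sketched in Python. The lookup relies on the fact that deleting columns from a covering array leaves a covering array, so the bound for k factors is the N of the smallest tabulated entry with at least k factors. The pairs below are a small illustrative subset of the entries of Table 8 (t = 5, v = 2), and the names `KNOWN` and `can_upper_bound` are hypothetical.

```python
from bisect import bisect_left

# (k, N) pairs: a covering array of strength 5 on 2 symbols with N rows
# is known for k factors. Illustrative subset of the entries of Table 8.
KNOWN = [(6, 32), (10, 62), (12, 92), (14, 110), (16, 152), (17, 176)]


def can_upper_bound(k, known=KNOWN):
    """Upper bound on CAN(5, k, 2) implied by the tabulated (k, N) pairs."""
    factors = [entry[0] for entry in known]
    index = bisect_left(factors, k)  # first entry with at least k factors
    if index == len(known):
        raise ValueError("k exceeds the largest tabulated number of factors")
    return known[index][1]


print(can_upper_bound(10))  # 62
print(can_upper_bound(11))  # 92
```

For example, no entry is tabulated for 11 factors, so the bound comes from the entry for 12 factors.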
For each v, we tabulate the entries for N and $\kappa(N; 5, v)$. We also provide a plot showing the logarithm of the number of factors horizontally and the size of the covering array vertically. The plot simply demonstrates the growth rate, and the computed bound is that provided by the points tabulated.

Note added: Since this research was completed, three new methods have been proposed for the construction of covering arrays of higher strength: FireEye (Lei et al., 2007), density (Bryce and Colbourn, 2007), and PaintBall (Kuhn, 2006). Each is a heuristic method, and each results in some improvements to the tables presented.
Table 9 Bounds for (N; 5, 3) 6
243o
7
377s
8
457s
10
483v
13
723v
16
963v
i
26
2007j
38
j
2643
40
2970j
48
j
52
3765j
q
19
v
1203
20
j
32
2247
46
j
3555
60
j
1530
3711
64
4113
81
4347q
86
j
5733
92
j
5787
96
5943j
102
5997j
104
6335j
169
6507q
256
q
8667
258
j
268
j
306 322 361
4005
9977
264
9949j
j
288
10 035j
308
j
10 299
312
10 301j
324
j
338
10 413j
j
276 j
10051
j
10377
9945
10 031
10 393
q
10827
388
12 661
396
12 715j
414
12 771j
432
12 929j
444
12 937j
450
12 949j
456
12 957j
468
12 965j
512
12 969j
516
14 429j
522
14 433j
528
14 449j
536
14 477j
540
14 531j
552
14 661j
576
14 669j
594
14 713j
600
j
14 729
612
j
14 745
616
j
624
j
644
j
15 137
648
15 153j
666
j
15 181
676
j
704
j
15 671
722
15 755j
736 792
j
17 589 17 867j
738 800
17 657 17 947j
776 852
j
17 705 18 137j
788 864
17 759j 18 161j
888
18 197j
894
948
18 289j
968
1058
j
19 877
1072
j
20 153
1176
j
1188
j
20 495
j
15 205
j
j
15 061
j
702
15 635
774 840
j
17 673 18 117j
17 665 17 959j
18 209j
900
18 225j
912
18 233j
918
18 241j
936
18 285j
18 343j
1024
18 369j
1032
19 829j
1044
19 833j
1056
19 849j
1080
j
1104
20 337
j
1134
j
20 345
1152
20 451j
20 669
j
1232
j
1242
20 985j
j
20 499
j
1200
20 207
j
20 653
1224
21 247
1332
21 291
1350
21 343
1352
21 359j
1408
21 837
j
1428
j
21 921
1444
j
21 975
1458
23 809j
23 965j
1512
23 973j
1536
23 981j
1548
24 117j
1552
24 149j
24 311j
1600
24 391j
1656
24 403j
1680
24 561j
1704
24 581j
1776
24 695
j
1788
j
1800
j
24 723
1824
25 031j
25 157
j
1944
j
25 237
1998
25 253j
26 773
j
j
2160
27 275
j
1288
21 231
1368
j
21 773
1404
j
21 797
1472
23 897j
1476
1576
24 203j
1584
1708
j
24 605
1728
j
24 659
1836
j
1872
j
25 153
2052
j
26 749
2144
j
27 221
1296
1896
24 707
j
1936
25 211
2088
j
26 777
2106
26 793
2112
26 833j
2208
27 409
j
2214
j
27 485
2268
27 497j
2048
25 289
2116
j
26 897
2272
27 607j
2304
27 615j
2322
27 659j
2352
27 703j
2376
27 707j
2400
27 893j
2448
27 913j
2464
28 243j
2484
28 245j
2496
28 415j
2520
28 491j
2556
28 499j
2576
28 539j
2592
28 555j
2640
28 627j
2664
28 647j
2700
28 699j
2704
28 715j
2736
j
29 129
2754
j
29 153
2808
29 225
j
2816
j
29 265
2856
j
29 349
2880
29 403j
2888
j
j
31 331
j
2948
j
31 399
2952
j
3024
31 515j
31 691
j
3104
j
31 723
3132
j
31 777
3152
31 781j
29 409
2064
j
20 983
j
21 155
j
15 619
j
756 828
1248
25 039
684
15 059
2916
31 243
3042
j
31 523
3072
j
31 555
3168
31 889j
3174
31 985j
3200
32 245j
3240
32 257j
3312
32 277j
3360
32 435j
3402
32 455j
3408
32 497j
3416
32 521j
3456
32 575j
3510
32 611j
3528
32 635j
3552
32 687j
3564
32 699j
3576
32 759j
3592
32 775j
3600
32 829j
3648
33 137j
3672
j
33 145
3744
j
33 259
3792
33 263
j
3872
j
33 317
3888
j
33 343
3996
33 359j
4096
j
33 395
4104
j
34 855
34 879
j
4176
j
34 883
4212
j
34 899
4224
34 939j
4232
j
j
35 381
j
4416
j
35 515
4428
j
35 591
4536
35 603j
j
4704
35 809
j
4728
j
35 813
4752
35 867j
4968
36 405j
4992
36 575j
5040
36 651j
35 003
2944 3096
4128
4288
35 327
4544
j
35 713
4608
j
4320
35 721
4644
35 765
4800
36 053j
4896
36 073j
4928
36 403j
31 507
Table 10 Bounds for (N; 5, 4)
6 1024o
11 2044v
15 3064v
16 5856j
20 6064j
22 6168j
24
j
7292
26
j
30
j
8256
j
28
8152j
32
j
11 048
40
11 256j
48
j
52
13 656j
j
7396
44
12 116
56
j
14 412
58
14 516
60
14 620j
62
j
17 516
64
j
18 272
68
18 480j
76
18 584j
80
18 688j
84
19 652j
121
q
124
j
27 242
128
27 998j
136
j
28 893
138
29 048j
160
j
29 274
169
30 640q
j
130
20 440
28 206
j j
13 240
152
29 152
170
j
31 836
190
31 932
192
32 067j
200
j
32 142
216
j
32 523
224
32 568j
232
32 613j
240
32 970j
242
33 015j
248
39 817j
250
41 245j
256
41 269j
260
41 582j
272
42 269j
276
42 484j
280
j
42 588
288
j
42 737
290
42 841
j
300
42 886
j
304
j
42 931
310
43 053j
320
43 290
j
328
j
44 764
44 868
j
45 369
j
340
j
46 565
352
46 769j
47 180
j
372
j
47 284
47 341
j
47 476
j
400
j
47 655
416
48 111j
48 180
j
432
j
48 846
j
48 891
j
472
j
49 248
480
49 352j
j
512
57 687
j
520
j
58 495
528
59 808j
568
60 444j
576
60 548j
580
60 652j
640
61 137
j
656
j
63 343
660
63 447j
65 578
j
682
j
65 782
704
65 857j
66 678
j
768
j
66 948
800
67 229j
68 846
j
928
j
68 936
944
69 338j
j
1024
j
78 761
1040
79 674j
1136
82 337j
1152
82 441j
1280
j
83 357
1312
85 671j
1360
j
88 511
1364
88 823j
1488
j
1520
90 291j
1680
j
91 996
1728
93 004j
1920
j
93 898
1922
94 267j
368 420 484
j
49 433
544
59 976j
600
j
60 697
j
664
63 777
722
j
66 268
832
j
68 066
48 801
336 380 448
496
j
56 235
500
57 663
552
60 191j
560
60 295j
608
j
620
j
60 778
j
672
63 881
736
j
66 517
j
676 744
384 464
60 900
64 382
j
66 621
j
68 756
j j
1000
78 713
1120
81 831j
1240
82 883
j
1352
87 211
j
1472
j
680 760
840
68 135
960
j
69 442
968
j
69 811
992
76 613
1056
80 987j
1088
81 155j
1104
81 727j
1160
j
82 545
1200
j
82 635
1216
82 761
j
1320
j
85 775
1328
86 105
j
1344
j
1392
j
88 898
1408
j
1536
90 561
j
1776
93 094
j
1792
93 198
1936
94 942j
1984
101 744j
2000
2112
107 393j
2176
107 561j
2208
2280
109 351j
2304
109 455j
2320
109 559j
2480
j
2512
j
110 443
2560
j
2664
j
1584
109 969
j
2656
114 786
2744
j
117 687
2784
89 002
90 842
864
338
j j
j
1444
89 824
1600
j
1856
114 890
86 209
896
90 073
90 177
91 858
j
1888
93 794
j
103 844j
2048
103 892j
2080
105 390j
2100
106 772j
108 133j
2220
108 237j
2240
108 741j
2272
109 247j
2360
109 649j
2400
109 685j
2432
109 847j
2584
j
113 959
2624
j
2640
114 204j
2704
j
j
2728
117 582j
j
2976
119 478j
90 946
93 288
1664
j
110 547
j
2688
114 968
j
117 791
2816
j
117 895
2888
115 970
j
118 717
2720 2944
114 100
117 270
119 329
3040
119 592j
3072
119 865j
3168
120 146j
3200
120 250j
3208
121 162j
3328
121 266j
3360
121 404j
3362
122 412j
3456
123 222j
3496
123 312j
3552
123 450j
3584
123 554j
3712
j
123 689
3776
j
124 321
3800
j
124 425
3840
j
3844
125 352
j
3872
126 027j
3936
j
132 829
3968
j
132 874
4000
j
135 646
4096
135 724
j
4160
137 282j
4200
j
138 664
4224
j
139 285
4256
j
139 453
4370
j
4416
140 619j
4440
j
140 723
4480
j
141 227
4544
j
j
4640
142 090j
4720
142 333j
4750
142 369j
4800
5024
143 562j
124 470
j
4072
135 718
4352
j
139 543
141 778
4560
j
141 882
4608
141 986
142 414j
4864
142 729j
4960
142 851j
140 526
6. Conclusions

By utilizing the compact search space afforded by covering perfect hash families, tabu search is able to find smaller arrays of higher strength more efficiently. This efficient representation of a covering array enables searches for t ≥ 5. When used in recursive constructions, the resulting arrays for small k also improve the best known bounds for many larger k.

Acknowledgments

We thank Sosina Martirosyan for helpful discussions regarding the non-covering conditions. We also thank the referees for improving the presentation.

References

Bryce, R.C., Colbourn, C.J., 2007. A density-based greedy algorithm for higher strength covering arrays. Software Testing Verif. Reliab., to appear.
Chateauneuf, M., Kreher, D., 2002. On the state of strength-three covering arrays. J. Combin. Des. 10 (4), 217–238.
Cohen, M.B., 2004. Designing test suites for software interaction testing. Ph.D. Thesis, University of Auckland.
Cohen, M.B., 2005. Private communications.
Cohen, D.M., Dalal, S.R., Fredman, M.L., Patton, G.C., 1997. The AETG system: an approach to testing based on combinatorial design. IEEE Trans. Software Eng. 23 (7), 437–444.
Cohen, M.B., Colbourn, C.J., Ling, A.C.H., 2008. Constructing strength 3 covering arrays with augmented annealing. Discrete Math. 308, 2709–2722.
Colbourn, C.J., 2004. Combinatorial aspects of covering arrays. Le Matematiche (Catania) 58, 121–167.
Colbourn, C.J., Martirosyan, S.S., Van Trung, T., Walker II, R.A., 2006. Roux-type constructions for covering arrays of strengths three and four. Designs Codes Crypt. 41, 33–57.
Fleurant, C., Ferland, J.A., 1996. Genetic and hybrid algorithms for graph coloring. Ann. Oper. Res. 63, 437–461.
Glover, F., Laguna, M., 1997. Tabu Search. Kluwer Academic Publishers, Norwell, MA.
Hartman, A., 2005. Software and hardware testing using combinatorial covering suites. In: Golumbic, M.C., Hartman, I.B.-A. (Eds.), Interdisciplinary Applications of Graph Theory, Combinatorics, and Algorithms. Springer, Norwell, MA, pp. 237–266.
Hartman, A., Raskin, L., 2004. Problems and algorithms for covering arrays. Discrete Math. 284, 149–156.
Hedayat, A.S., Sloane, N.J.A., Stufken, J., 1999. Orthogonal Arrays, Theory and Applications. Springer, Berlin.
Kuhn, D.R., 2006. An algorithm for generating very large covering arrays. Internal Report 7308, National Institute of Standards and Technology.
Lei, Y., Kacker, R., Kuhn, D.R., Okun, V., Lawrence, J., 2007. IPOG: a general strategy for t-way software testing, submitted for publication.
Martirosyan, S.S., Colbourn, C.J., 2005. Recursive constructions for covering arrays. Bayreuth. Math. Schr. 74, 266–275.
Martirosyan, S.S., Van Trung, T., 2004. On t-covering arrays. Designs Codes Crypt. 32, 323–339.
Nurmela, K., 2004. Upper bounds for covering arrays by tabu search. Discrete Appl. Math. 138, 143–152.
Shasha, D.E., Kouranov, A.Y., Lejay, L.V., Chou, M.F., Coruzzi, G.M., 2001. Using combinatorial design to study regulation by multiple input signals: a tool for parsimony in the post-genomics era. Plant Physiology 127, 1590–1594.
Sherwood, G., Martirosyan, S.S., Colbourn, C.J., 2006. Covering arrays of higher strength from permutation vectors. J. Combin. Des. 14, 202–213.
Sloane, N.J.A., 1993. Covering arrays and intersecting codes. J. Combin. Des. 1, 51–63.
Walker, R.A. II, 2005. Covering Arrays and Perfect Hash Families. Ph.D. Thesis, Computer Science and Engineering, Arizona State University.
Walker, R.A. II, Colbourn, C.J., 2007. Perfect hash families: constructions and existence. J. Math. Crypt. 1, 125–150.