Computational Statistics & Data Analysis 51 (2007) 2958 – 2968 www.elsevier.com/locate/csda
Exploring k-circulant supersaturated designs via genetic algorithms

Christos Koukouvinos∗, Kalliopi Mylona, Dimitris E. Simos

Department of Mathematics, National Technical University of Athens, Zografou 15773, Athens, Greece

Received 9 May 2006; received in revised form 30 November 2006; accepted 30 November 2006
Available online 2 January 2007
Abstract

E(s²)-optimal or near optimal two-level k-circulant supersaturated designs are explored by means of genetic algorithms. All known k-circulant classes for k = 2, …, 6 have been rebuilt and improved. The successful application of genetic algorithms is further illustrated by the construction of several k-circulant supersaturated designs for k = 7, 8.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Cyclic generator; E(s²)-optimality; Factorial designs; Genetic algorithms
1. Introduction

A two-level design is called saturated when the number m of factors (columns) equals n − 1, where n is the number of experimental runs (rows). A supersaturated design is a two-level factorial design in which the number of experimental runs n does not exceed the number of factors m, that is, n ≤ m. Each factor of a two-level design has two possible settings, known as levels, which can be coded as ±1. Any combination of the levels of all factors under consideration is called a treatment combination. Let X = [c_1, c_2, …, c_m] be the design matrix of the experiment, in which each row represents one of the n treatment combinations and each column gives the sequence of levels of one factor. For each factor, both levels are of equal interest and each experimental result should have equal influence. Thus we consider designs with the equal occurrence property, where, for even n, every column consists of n/2 elements equal to 1 and n/2 elements equal to −1. Designs with the equal occurrence property are called balanced designs.

2. Optimality criteria for supersaturated designs

Orthogonality between all pairs of columns of the model matrix, which is formed from the design matrix by appending a column of 1's as the first column, is required to estimate all factor effects. This condition cannot be satisfied for all pairs of columns in a supersaturated design, where m ≥ n. Therefore we try to construct designs that are as nearly orthogonal as possible. We present here the three optimality criteria that we applied for the construction and evaluation of supersaturated designs.

∗ Corresponding author. Tel.: +30 210 7721706; fax: +30 210 7721775.
0167-9473/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2006.11.042
E(s²)-criterion: Let s_ij be the element in the ith row and jth column of the matrix X^T X. Booth and Cox (1962) proposed as a criterion for comparing designs the minimization of the average of s_ij² over all pairs i < j, denoted by ave(s²) or E(s²), where

$$E(s^2) = \frac{\sum_{1 \le i < j \le m} s_{ij}^2}{\binom{m}{2}}. \qquad (1)$$
The term s_ij measures the degree of non-orthogonality between factors i and j. If s_ij = 0, factors i and j are orthogonal. If n is even but not a multiple of 4 (i.e. n ≡ 2 (mod 4)), then in a balanced design s_ij cannot be equal to 0, because s_ij ≡ n (mod 4) for balanced columns; in this case factors i and j are called near orthogonal if s_ij is close to 0. When s_ij = ±n, then c_i = ±c_j, and c_i and c_j are completely dependent. Designs with any completely dependent factors are usually rejected. Nguyen (1996) and Tang and Wu (1997) independently showed that

$$E(s^2) \ge \frac{n^2 (m - n + 1)}{(n - 1)(m - 1)}. \qquad (2)$$
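For concreteness, the sketch below evaluates E(s²) in (1) and the bound (2) for a ±1 design matrix stored as a list of rows. It is plain Python with no external libraries; the function names are illustrative and not taken from the paper.

```python
from itertools import combinations


def off_diagonal_s(X):
    """Return s_ij = c_i . c_j for every column pair i < j of X (rows = runs)."""
    cols = list(zip(*X))                      # transpose: one tuple per factor
    return [sum(a * b for a, b in zip(cols[i], cols[j]))
            for i, j in combinations(range(len(cols)), 2)]


def e_s2(X):
    """E(s^2) of equation (1): average of s_ij^2 over the m(m-1)/2 column pairs."""
    s = off_diagonal_s(X)
    return sum(v * v for v in s) / len(s)


def lb_balanced(n, m):
    """Lower bound (2) on E(s^2) for balanced designs with n runs and m factors."""
    return n * n * (m - n + 1) / ((n - 1) * (m - 1))


# A 4-run design with three mutually orthogonal balanced columns ...
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
print(e_s2(X), lb_balanced(4, 3))            # both are zero
# ... and one whose third column repeats the first (s_13 = n = 4).
Y = [[1, 1, 1], [1, -1, 1], [-1, 1, -1], [-1, -1, -1]]
print(e_s2(Y))                               # 16/3
```

The second example shows how a single fully aliased pair (s_13 = ±n) dominates the average, which is why designs with completely dependent factors are rejected.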
Hence any supersaturated design that attains this lower bound (LB) is said to be E(s²)-optimal. The LB (2) is attainable when m = q(n − 1), where q is a positive integer and n ≡ 0 (mod 4). It is also attainable when q is even and n ≡ 2 (mod 4). Butler et al. (2001) derived lower bounds for E(s²) for supersaturated designs with n runs and m = q(n − 1) + k factors (|k| < n/2, q positive for n ≡ 0 (mod 4), q even for n ≡ 2 (mod 4)). More recently, Bulutoglu and Cheng (2004) presented improved lower bounds for E(s²) that apply to all cases, and these bounds were further improved by Ryan and Bulutoglu (2006). We give their result for n ≡ 2 (mod 4) and q odd.

Notation: For any x ∈ ℝ, ⌊x⌋_+ = max{0, ⌊x⌋} and ⌈x⌉_+ = max{0, ⌈x⌉}, where ⌊·⌋ and ⌈·⌉ are the floor and ceiling functions, respectively.

Theorem 1 (Ryan–Bulutoglu, 2006). Suppose m is a positive integer such that m > n − 1. Then there is a unique q such that −2n + 2 < m − q(n − 1) < 2n − 2 and (m + q) ≡ 2 (mod 4). Let g(q) = (m + q)² n − q² n² − m n². If n ≡ 2 (mod 4) and q is odd, then

$$E(s^2) \ge 4 + \frac{64}{m(m-1)} \left\lceil \frac{m(m-1)(h-4)}{64} \right\rceil_+,$$

where

$$h = \begin{cases} \dfrac{g(q) + 2n^2 - 4n}{m(m-1)} & \text{when } |m - q(n-1)| < n - 1, \\[2ex] \dfrac{g(q) - 2n^2 + 4n + 4n\,|m - q(n-1)|}{m(m-1)} & \text{when } n - 1 < |m - q(n-1)| \le \tfrac{3}{2}(n-1), \\[2ex] \dfrac{g(q) + 4n^2 - 12n + 8\,|m - q(n-1)| + 8}{m(m-1)} & \text{when } |m - q(n-1)| > \tfrac{3}{2}(n-1). \end{cases}$$
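The bound of Theorem 1 is easy to mis-evaluate by hand because of the case split and the ceiling. A minimal sketch in plain Python, following the statement above: the name H stands for m(m − 1)h, so every step stays in exact integer arithmetic and the ceiling is never disturbed by floating-point rounding. When no q satisfies the strict inequalities together with the congruence (for example m = k(n − 1) with k even, a case covered by bound (2)), the function returns None; the function name is illustrative.

```python
from fractions import Fraction


def ryan_bulutoglu_lb(n, m):
    """Lower bound of Theorem 1 as an exact Fraction, or None if it does not apply."""
    if m <= n - 1 or n % 4 != 2:
        return None
    # q with -2(n-1) < m - q(n-1) < 2(n-1) and m + q = 2 (mod 4).
    q = next((q for q in range(m // (n - 1) + 3)
              if -2 * (n - 1) < m - q * (n - 1) < 2 * (n - 1) and (m + q) % 4 == 2),
             None)
    if q is None or q % 2 == 0:
        return None
    g = (m + q) ** 2 * n - q * q * n * n - m * n * n
    d = abs(m - q * (n - 1))
    mm = m * (m - 1)
    # H = m(m-1) * h, following the three cases for h.
    if d < n - 1:
        H = g + 2 * n * n - 4 * n
    elif 2 * d <= 3 * (n - 1):
        H = g - 2 * n * n + 4 * n + 4 * n * d
    else:
        H = g + 4 * n * n - 12 * n + 8 * d + 8
    # ceil_+( m(m-1)(h-4)/64 ) = max(0, ceil((H - 4 m(m-1)) / 64)), integer ceiling.
    N = max(0, -((4 * mm - H) // 64))
    return 4 + Fraction(64 * N, mm)


print(float(ryan_bulutoglu_lb(18, 51)))    # k = 3, n = 18 in Table 2
print(float(ryan_bulutoglu_lb(22, 105)))   # k = 5, n = 22 in Table 3
```

For n = 18, m = 51 the sketch gives 33624/2550 ≈ 13.186, and for n = 22, m = 105 it gives about 18.699, where the ceiling lifts the bound slightly above h ≈ 18.696.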
r_max criterion: Another reasonable criterion for constructing and comparing supersaturated designs is the minimization of r_max = max_{i<j} |s_ij|/n, the largest absolute correlation between two columns of X.
[…] off-diagonal elements of X^T X whose absolute values are equal to i + 4(t − 1), where t is their position in (f)_i. For a finer comparison of designs, other (f)_i can be compared.

3. k-circulant supersaturated designs

To accommodate the large number of factors of a supersaturated design, Liu and Dean (2004) proposed cycling the elements of a generator k elements at a time, and called such a design a k-circulant supersaturated design. These designs generalize the designs of Plackett and Burman (1946), which are 1-circulant saturated designs. The generator of a k-circulant supersaturated design provides the first row of the design matrix X. Each remaining row of X is obtained from the previous row by moving its last k elements to the first k columns and shifting the other elements k columns to the right. Alternatively, a k-circulant supersaturated design can be obtained from a 1-circulant design matrix by selecting rows 1, k + 1, …, m − k + 1. In both cases a row of ones is appended to the design.

The next theorem, proved by Liu and Dean (2004), states necessary and sufficient conditions for the existence of two-level k-circulant supersaturated designs with the equal occurrence property.

Theorem 2 (Liu–Dean, 2004). Let D be a k-circulant supersaturated design with n runs and m factors, each having two levels coded +1 and −1. Suppose that the rows of D are obtained from the generator (g_1, g_2, …, g_m) by cycling k elements at each step and adding a row of +1's. Necessary and sufficient conditions for D to be balanced are
1. n = 2t and m = (2t − 1)k for some positive integer t;
2. the generator contains exactly kt elements equal to −1 and kt − k elements equal to +1;
3. Σ_{u=0}^{2t−2} g_{uk+j} + 1 = 0 for j = 1, …, k.

We briefly note the following consequences.
• The E(s²) LB (2) for balanced k-circulant designs with nk ≡ 0 (mod 4) becomes, according to Liu and Dean (2004),

$$\mathrm{LB} = \frac{n^2 (k - 1)}{nk - k - 1}. \qquad (3)$$
• In the case of balanced k-circulant designs with n ≡ 2 (mod 4) and k odd, the improved E(s²) LB of Theorem 1 becomes

$$\mathrm{LB} = 4 + \frac{64}{m(m-1)} \left\lceil \frac{m(m-1)(h-4)}{64} \right\rceil_+, \qquad (4)$$

where h = [k(k − 1)(n − 1)n² + 2n(n − 2)]/[k(n − 1)(kn − k − 1)].

Using the aforementioned LB, the E(s²)-efficiency is defined as

$$\frac{\mathrm{LB}}{E(s^2)}. \qquad (5)$$
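The construction, the balance conditions of Theorem 2 and the quantities (3)–(5) can be sketched together as follows. This is an illustrative plain-Python sketch, not the authors' program; the function names are hypothetical, and indices are 0-based, so the sum in condition 3 becomes the slice `gen[j::k]`.

```python
from fractions import Fraction


def k_circulant_design(gen, k):
    """Generator row, then repeated right shifts by k, then a final row of +1's."""
    m = len(gen)
    assert m % k == 0, "generator length must be a multiple of k"
    rows, row = [], list(gen)
    for _ in range(m // k):                   # n - 1 = m / k cyclic rows
        rows.append(row)
        row = row[-k:] + row[:-k]             # last k entries move to the front
    rows.append([1] * m)
    return rows


def satisfies_theorem2(gen, k):
    """Conditions 1-3 of Theorem 2 for a +/-1 generator (0-based indices)."""
    m = len(gen)
    if m % k or (m // k) % 2 == 0:            # need m = (2t - 1) k
        return False
    t = (m // k + 1) // 2
    if gen.count(-1) != k * t or gen.count(1) != k * t - k:
        return False
    # Condition 3: the entries g_j, g_{k+j}, ..., g_{(2t-2)k+j} sum to -1.
    return all(sum(gen[j::k]) + 1 == 0 for j in range(k))


def k_circulant_lb(n, k):
    """LB (3) if nk = 0 (mod 4); LB (4) if n = 2 (mod 4) and k is odd (n even)."""
    m = k * (n - 1)
    if (n * k) % 4 == 0:
        return Fraction(n * n * (k - 1), n * k - k - 1)
    mm = m * (m - 1)
    H = k * (k - 1) * (n - 1) * n * n + 2 * n * (n - 2)   # m(m-1) * h
    N = max(0, -((4 * mm - H) // 64))                     # ceil_+ of m(m-1)(h-4)/64
    return 4 + Fraction(64 * N, mm)


def efficiency(n, k, es2):
    """E(s^2)-efficiency (5): LB / E(s^2)."""
    return float(k_circulant_lb(n, k)) / es2


# Example 3 generator (Table 2, k = 2): "+" maps to +1, any minus sign to -1.
G = [1 if c == "+" else -1
     for c in "− − − − + + + − − + + − + − + − − − − + − + − + + − − + + +".split()]
D = k_circulant_design(G, 2)
print(len(D), len(D[0]), satisfies_theorem2(G, 2), float(k_circulant_lb(16, 2)))
```

As a consistency check, rows 1, k + 1, …, m − k + 1 of the 1-circulant matrix built from the same generator coincide with the first n − 1 rows of D, which matches the alternative construction described in Section 3.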
A design is called E(s²)-optimal when its E(s²)-efficiency is equal to 1.

4. Heuristic search for k-circulant supersaturated designs

In this section, we establish a genetic algorithm approach for the construction of k-circulant supersaturated designs. Genetic algorithms are a powerful heuristic that mimics processes from the theory of evolution: algorithmic analogues of biological concepts such as reproduction, crossover and mutation are defined and combined into a search algorithm. Genetic algorithms were introduced in the 1970s by Holland (1975), whose aim was to design an artificial system with properties similar to those of natural systems. In this paper, we assume some basic familiarity with genetic algorithm concepts. The concepts
necessary for a description of the simple genetic algorithm (SGA) can be found in Goldberg (1989), in Forrest's article (1993) and in the Handbook of Genetic Algorithms edited by Davis (1991). Genetic algorithms have been used before in computational design theory; see, for instance, Heredia-Langner et al. (2003) for the construction of D-optimal designs and Cela et al. (2000) for an analogous application to the construction of supersaturated designs.

4.1. Building supersaturated designs by means of genetic algorithms

A formalization of the simple genetic algorithm for supersaturated designs is given below, in terms of the parameters of the algorithm and the genetic operators that we have implemented.

Chromosome representation: A representation of the chromosomes arises naturally from the definition of k-circulant supersaturated designs: binary strings represent the row generators of the design, so the chromosome length equals the number of factors in the design. This encoding keeps the GA compact, since little storage is needed to represent a supersaturated design in the SGA. In particular, a supersaturated design with m factors and n runs requires only m bits of memory, because only the generator needs to be represented; representing the whole design would require n · m bits.

Initial population: The initial population consists of random chromosomes. We found it useful to generate these chromosomes by sampling rows of Hadamard matrices. We first selected rows of the Hadamard matrices that we produced and applied a random permutation to each of them. Finally, we shortened each row to fit the corresponding chromosome length. Details on Hadamard matrices are given in Geramita and Seberry (1979).
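The paper does not state which Hadamard matrices were used, so the sketch below assumes Sylvester-type matrices of the smallest power-of-two order N ≥ m and skips the all-ones first row; both choices, and the function names, are illustrative assumptions. Because shortening can alter the balance of ±1 entries, the resulting chromosomes need not satisfy Theorem 2, which is why a filtering routine based on that theorem is needed in the algorithm.

```python
import random


def sylvester_hadamard(order):
    """Sylvester construction H_2N = [[H, H], [H, -H]]; order must be a power of two."""
    assert order >= 1 and order & (order - 1) == 0, "order must be a power of two"
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def initial_population(m, size, rng):
    """size random +/-1 chromosomes of length m (m >= 2) from permuted, shortened rows."""
    assert m >= 2
    order = 1
    while order < m:
        order *= 2
    H = sylvester_hadamard(order)
    population = []
    for _ in range(size):
        row = list(H[rng.randrange(1, order)])   # any row except the all-ones row
        rng.shuffle(row)                          # random permutation of its entries
        population.append(row[:m])                # shorten to the chromosome length
    return population


pop = initial_population(30, 4, random.Random(2006))
for chromosome in pop:
    print("".join("+" if v > 0 else "-" for v in chromosome))
```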
An objective function for supersaturated designs: The choice of the objective function (OF) arises naturally from the optimality criteria given in Section 2. The OF is based on the E(s²)-criterion: the SGA attempts to minimize E(s²), or equivalently to maximize the E(s²)-efficiency given in (5). Whenever an E(s²)-efficiency in the range [0.95, 1.00] is detected, we regard it as a solution. Thus we are able to detect both E(s²)-optimal and near optimal supersaturated designs.

We now describe the three genetic operators, reproduction, crossover and mutation, as they were applied in our simple genetic algorithm.

Reproduction stipulates that chromosomes in the population with higher OF values (when the OF is maximized) must be given a higher probability of contributing offspring to the next generation: the probability of any string entering the mating pool is proportional to its fitness value. This genetic operator is an algorithmic analogue of natural selection in the theory of evolution. Although the chromosomes with the highest fitness values are the most likely to enter the mating pool in each generation, this does not guarantee that they will be better than the chromosomes they replace. We implemented the reproduction operator as a biased roulette wheel driven by uniformly distributed random numbers; it also determined the percentage of chromosomes to be replaced. According to their fitness values, the most highly fit generators are paired and the genetic operators are applied to them. The result of the reproduction operator is a mating pool, which contains the chromosomes of the new generation.

Crossover acts on the chromosomes in the mating pool (the new generation) in two steps. First, these chromosomes are mated randomly into pairs. Second, each pair undergoes crossover at a randomly selected crossover site k.
This means that each offspring takes the elements of one parent before the crossover site k and the elements of the other parent after it. We also applied a variation of this operator with two crossover sites: two-point crossover selects two points on the parent strings and exchanges the segment between them. More details on these operators can be found in Davis (1991). Mutation randomly changes a bit from −1 to 1 or from 1 to −1 with a certain probability, using uniformly distributed random numbers; this mutation probability is relatively small. We also applied another form of mutation based on condition 2 of Theorem 2. Since a generator of length m = k(2t − 1), with n = 2t, contains kt entries equal to −1 and kt − k entries equal to +1, the proportions of +1 and −1 entries in a row generator of a balanced k-circulant supersaturated design are (n − 2)/(2n − 2) and n/(2n − 2), respectively. Accordingly, for the different values of the number of runs n, the mutation probability was selected from the range [0.52, 0.60] (n/(2n − 2) equals 0.60 for n = 6 and about 0.52 for n = 22). Although this probability is relatively high for this genetic operator, it proved effective for our optimization problem.

Termination condition: the SGA was stopped after a predefined number of generations. This number was proportional to the size of the supersaturated design that the genetic algorithm was searching
for in each case. Thus, the SGA required only a few generations to find a small optimal or near optimal supersaturated design, while a larger design required more generations. Since the GA is a heuristic process, its running time was small compared with that of exhaustive search algorithms. Not all chromosomes produced by the simple genetic algorithm were suitable generators for balanced k-circulant supersaturated designs; we therefore implemented an additional routine (based on Theorem 2) that kept only suitable generators in the mating pool.

5. Results

In this section, we present some of the designs found via our GA; more details are given in the Appendix. Our initial heuristic search recovered all known k-circulant classes for k = 2, …, 6 in the existing literature. Going a step further, we obtained several improvements over a wide variety of runs and factors for these classes. Moreover, the flexibility of the genetic algorithm to evolve in the solution space defined by our formulation enabled us to build new k-circulant supersaturated designs for k = 7, 8. We obtained a vast number of E(s²)-optimal or near optimal designs, reaching hundreds of generators in some cases; the generators given in Tables 2–6 are only a sample. We also constructed many designs with the same properties as those of Liu and Dean (2004) with respect to the optimality criteria of Section 2. These generators were omitted from Tables 2–6, although they differ from those of Liu and Dean (2004). A brief summary of our results is given in Table 1.

Notation: We use the following notation throughout the tables:
(1) + and − denote 1 and −1, respectively.
(2) The designs marked “∗” have a better (f)_i vector than some or all of the corresponding k-circulant supersaturated designs of Liu and Dean (2004) and the same r_max.
(3) The designs marked “@” have a better (f)_i vector than all of the corresponding k-circulant supersaturated designs of Liu and Dean (2004), but a worse r_max.
(4) The designs marked “ ” have the same (f)_i vector and r_max as the corresponding k-circulant designs of Liu and Dean (2004).
(5) The designs that appear unmarked are new in the entire corresponding class of k-circulant supersaturated designs.

Some minor improvements on the k-circulant classes for k = 2, 3, 4 are presented in Table 2. The following example illustrates the construction of a 2-circulant supersaturated design and shows how Tables 2–6 can be used.

Example 3. An E(s²)-optimal 2-circulant design with m = 30 factors in n = 16 runs can be obtained from the following generator presented in Table 2:
(− − − − + + + − − + + − + − + − − − − + − + − + + − − + + +)
by repeatedly cycling the elements 2 positions to the right and moving the last two elements to the first two positions. This procedure gives the first 15 rows of the design matrix X. A 16th row of +1's is then added to produce a balanced

Table 1
E(s²)-optimal or near optimal k-circulant supersaturated designs with n runs and m = k(n − 1) factors constructed via genetic algorithms

k | Runs (n) | Factors (m)
2 | 6 ≤ n ≤ 22 | 10 ≤ m ≤ 42
3 | 8 ≤ n ≤ 18 | 21 ≤ m ≤ 51
4 | 8 ≤ n ≤ 16 | 28 ≤ m ≤ 60
5 | 8 ≤ n ≤ 22 | 35 ≤ m ≤ 105
6 | 10 ≤ n ≤ 16 | 54 ≤ m ≤ 90
7 | 10 ≤ n ≤ 22 | 63 ≤ m ≤ 147
8 | 12 ≤ n ≤ 20 | 88 ≤ m ≤ 152
Table 2
Sample generators of E(s²)-optimal k-circulant supersaturated designs with n runs and m = k(n − 1) factors for k = 2, 3, 4 constructed via genetic algorithms

k | (n, m) | E(s²)-efficiency | E(s²) | (f)_i | r_max | Generator
2 | (16, 30)∗ | 1.0000 | 8.83 | (65.52, 27.58, 6.90)_0 | 0.50 | (− − − − + + + − − + + − + − + − − − − + − + − + + − − + ++)
3 | (16, 45)∗ | 1.0000 | 11.64 | (59.09, 30.30, 10.61)_0 | 0.50 | (+ − − − − + + − − + + + − + + − − − − + + − − − − − − + − + − + + − + − + + − + − − + + +)
3 | (18, 51)∗ | 0.9850 | 13.39 | (84.00, 9.33, 6.67)_2 | 0.56 | (− + + + − − + + − − − − − − − − + − − + + + + − + + + − − + + + − − − − + − + − − + + − + + − − − + +)
3 | (18, 51)@ | 0.9850 | 13.39 | (86.67, 9.33, 1.33, 2.67)_2 | 0.78 | (− + − + + − + − − − + + − + + + − − − − + − − − − − + − − − − − + + + + + + − + + − + − + − + + + − −)
4 | (16, 60) | 1.0000 | 13.02 | (44.07, 47.46, 8.47)_0 | 0.50 | (− − + − + + − − − − + − − − + + − − − + + − − + − + + − − + − − + + + − + + − + + − − − + + + + + − −− − + − + − − ++)
4 | (16, 60) | 1.0000 | 13.02 | (45.76, 46.61, 6.78, 0.85)_0 | 0.75 | (− + + + − − + − + + + − + + − − − − + + + − − + + − + − + − + − − + − + + − + + − + − − + + − − − + −+ − − − − − − −+)
Table 3
Sample generators of E(s²)-optimal or near optimal 5-circulant supersaturated designs with n runs and m = 5(n − 1) factors constructed via genetic algorithms

(n, m) | E(s²)-efficiency | E(s²) | (f)_i | r_max | Generator
(16, 75) | 0.9756 | 14.18 | (46.49, 42.70, 10.27, 0.54)_0 | 0.75 | (+ − − − + − − + − − + − − + − + − + + − − + − − − − − + − + + + − − − + − − − − − − − + + + + + + − − + + + + − + + + + − − − − + + + + + − − + − −+)
(16, 75) | 0.9877 | 14.11 | (38.38, 52.97, 8.65)_0 | 0.50 | (+ − + + − − − + − + + − − − − + + − − + − + + + − + + + + + − + + − − − + − + − − − − + + + + + − + − − − − − + − − − + − − − − − − − + + + + + − +−)
(16, 75) | 1.0000 | 13.84 | (37.84, 54.05, 8.11)_0 | 0.50 | (− − − + + − − + − − − + + + − − + + − − + − − − + + − − − + − + − + + − + − − + − − + − − + + − − + − − − + − + + + + − + − − + − + − + − + + + + +−)
(18, 85) | 0.9858 | 15.73 | (71.90, 23.81, 4.29)_2 | 0.56 | (+ + + + + − + + + − + + + − + + + + + − − − − + + + + − − − − + − − − − − + − + − − − − + + + + − − − − − + − − − − + − + + + + + − − − − + + − − − − + − − − + − − + +−)
(18, 85) | 0.9954 | 15.58 | (71.43, 24.76, 3.81)_2 | 0.56 | (+ − − − − − − + + − − − + + − + + + − − + + − + − + − + + + + + − − + − + − + + − + + − − − + + − − + − − + + − + + − + + − − + + − − − − − − − − + + − + − − + + − + −−)
(20, 95) | 0.9766 | 17.43 | (40.85, 44.68, 13.19, 1.28)_0 | 0.60 | (+ + − + − + + − − + − − + + − − + + + + − + + + + + − + − + + − + + − − − + − − + + − − − + + + + − + − − − − − − + − + + − − + + − + − − − − + − − − − − + − + − − − + + + + − + + − − − −−)
(20, 95) | 0.9843 | 17.29 | (39.57, 45.96, 13.62, 0.85)_0 | 0.60 | (− + − + − + + + + − + − − + + − + + − − + + − − − + − − − − − − − − + − − − + + + − + − + − − + − + − + − − − − − + + + + + − + − + − − + − + − + − + − + − + − − + + − + + + + + + − − + −−)
(22, 105) | 0.9708 | 19.26 | (66.92, 26.92, 5.38, 0.77)_2 | 0.64 | (− − + − − − − − − + − + − + + − − − + − + + − + − + + + + + + − − + + − − + − + + − + − + − + − − − − + + − + + − + − − + + + + + − + + + − + + − − + + − − + − − + + − − − − − − − + + + + − − − − + + + − − −−)
(22, 105) | 0.9898 | 18.89 | (62.69, 32.69, 4.62)_2 | 0.46 | (+ + + − − − − + − − + + + − − − − − − + + − − + − + + + + + − − − + − + − + + − − − − + − − + + − + + + + + + − + − − + + + − − + + − + − + + − − + + − + − + + − + + − − − + − − − + − − + + − − + + − − − − −−)
Table 4
Sample generators of E(s²)-optimal or near optimal 6-circulant supersaturated designs with n runs and m = 6(n − 1) factors constructed via genetic algorithms

(n, m) | E(s²)-efficiency | E(s²) | (f)_i | r_max | Generator
(12, 66)∗ | 1.0000 | 11.08 | (44.62, 50.77, 4.62)_0 | 0.67 | (− − + + − − − − − + + + + − + − + + − + − + − − + + − − + + − + − − − − + + − − + − + − + + − + + − + − −+ − − − − + − − + + + −−)
(12, 66)∗ | 1.0000 | 11.08 | (50.77, 42.56, 6.67)_0 | 0.67 | (+ + + − − + − − − + + + + − − − + − − + + − + + − − − − − − − − + + − − − − − + − + + − + − + − + + + + −− + + − + + − − + − − −+)
(12, 66) | 1.0000 | 11.08 | (53.85, 38.46, 7.69)_0 | 0.67 | (+ − − − − + − + + − − − + − + + − − − − + − + − + + − + − − + + + + − + + − − − + − − − − − − + − − − + ++ − + − + + − − + + − ++)
(14, 78) | 0.9892 | 12.87 | (77.49, 19.91, 2.60)_2 | 0.71 | (− − − + − − − + − − + − + + + − + + + + − + − + + − + − − − + + − + + + + + + + − + − − + − + − − − + − −− − + − − + + + − − − − − − − + + − + − − − + +−)
(14, 78) | 0.9892 | 12.87 | (78.35, 18.61, 3.03)_2 | 0.71 | (− − − − − + + − + + − + + + + − − − − − − + − + + − − + + − − + − + + + + + − − − − + + − − + + − − + − ++ − + + − − − − + − + − − − − + − + − + − + + +−)
(16, 90) | 0.9836 | 14.62 | (40.07, 49.44, 10.49)_0 | 0.50 | (+ + + + − − − + − + − − − + + − + + + − − + + + − − − − − + − − + + − − + − − − − − − − + − + − − + − − +− − − − − − + − + + − + + + − + + − + + + − + + − + − − − − − + + + + ++)
(16, 90) | 0.9917 | 14.50 | (39.70, 50.19, 10.11)_0 | 0.50 | (+ + − − − − − + + − + + − + − − − − − − − + − + − − − − + − + + + − + − + − − + + − + + + − + + − − + + +− − − + + − + + + − − − + − + + + + + − − − + − − + − + + − − + − − − −+)
(16, 90) | 0.9917 | 14.50 | (41.57, 48.31, 9.74, 0.37)_0 | 0.75 | (− + − + + − − + − − + + − + − + − − + − + − − − − + − + + + + + − − + − + − + − + + + − + + − + + − − + −+ − − − − − − + − − + − + − + + + + − + + + − − − − − + − + + − − + − −−)
design, with the following design matrix X.
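Since X is completely determined by the generator, it can be regenerated mechanically. The following plain-Python sketch builds X for Example 3, prints it, and evaluates the three criteria of Section 2: E(s²), r_max taken as max |s_ij|/n (which matches the r_max values listed in Tables 2–6), and the percentage vector (f)_i of column pairs with |s_ij| = i, i + 4, i + 8, …. Its output can be compared with the first row of Table 2; variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Generator of Example 3 (first row of Table 2); "+" -> +1, anything else -> -1.
GEN = "− − − − + + + − − + + − + − + − − − − + − + − + + − − + + +"
k = 2
g = [1 if c == "+" else -1 for c in GEN.split()]
m = len(g)

rows, row = [], g
for _ in range(m // k):                 # 15 cyclic rows
    rows.append(row)
    row = row[-k:] + row[:-k]           # move the last k elements to the front
X = rows + [[1] * m]                    # 16th row of +1's
n = len(X)

for r in X:
    print(" ".join("+" if v > 0 else "-" for v in r))

cols = list(zip(*X))
s = [sum(a * b for a, b in zip(cols[i], cols[j])) for i, j in combinations(range(m), 2)]
es2 = sum(v * v for v in s) / len(s)                 # E(s^2), equation (1)
rmax = max(abs(v) for v in s) / n                    # largest |s_ij| / n
counts = Counter(abs(v) for v in s)
start = n % 4                                        # |s_ij| is congruent to n mod 4
f = [100 * counts[v] / len(s) for v in range(start, max(counts) + 1, 4)]
print(f"E(s^2) = {es2:.2f}, r_max = {rmax:.2f}, (f)_{start} = {[round(x, 2) for x in f]}")
```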
The statistical properties of this design with respect to the optimality criteria defined in Section 2 can be read from the last three columns of Table 2. In addition, Table 7 gives an extensive comparison, restricted to E(s²)-optimal supersaturated designs, of designs that have appeared in the existing literature and were found by algorithmic approaches, with n runs and m = k(n − 1)
Table 5
Sample generators of E(s²) near optimal 7-circulant supersaturated designs with n runs and m = 7(n − 1) factors constructed via genetic algorithms

(n, m) | E(s²)-efficiency | E(s²) | (f)_i | r_max | Generator
(10, 63) | 0.9966 | 9.75 | (82.03, 17.97)_2 | 0.60 | (+ + + − − − − − − + − − + + − − − + + − − + + + + + + + − + − − − − + − + − − + − + − − − + + − − + − + − − + − + − − + − +−)
(12, 77) | 0.9895 | 11.49 | (48.50, 44.74, 6.77)_0 | 0.67 | (− + + + − − + + − + + − + + − − − − − − − + − + − + − − − − − − + + − − − − − − + + − + − + − − + − + − − + − − + + + − + + − + + + + − − + + − − + + +−)
(14, 91) | 0.9900 | 13.24 | (75.56, 22.22, 2.22)_2 | 0.71 | (+ − + + − − − − − − − − − + + − − − + + − − + − + − + − − + − + − − − + − − + − − + − + + − + − + − − − + − + − − + + − + − + − − + − + + + + − + − + + − + + − − + − − + + + + − ++)
(16, 105) | 0.9711 | 15.21 | (43.41, 44.23, 12.09, 0.27)_0 | 0.75 | (− + + − − − − − + + − + − − + − − − + − − + + + − + + + + + − + − − − − + − − − − + + − − + − + + − + + + + + − − − + + + + − + − − + − − + − − − + − + − − − − + − − + + + − − + − + − − + − − + + + − + − + +−)
(18, 119) | 0.9885 | 16.71 | (69.01, 26.63, 4.36)_2 | 0.56 | (+ + + + − + − − + − + − − − − + + + + + − + − − − − + + − − + + − − + − − + − + + + + + + + − + − + + − + − − −+−+−−+−++−−+−+−−−++−+++−−+−+−−+−+−+−−+−++−−+−+−−−+−−−− + + − − − − + − −)
(18, 119) | 0.9885 | 16.71 | (69.25, 26.63, 3.87, 0.24)_2 | 0.78 | (− + + − − − + − − + − + + + − − − + − + + − − − + + + − + − + − + + − + − + − − − + − − + + + − − + + − + − + −−−+++++−+−−−−−+++−+−+−+−−−++++−++−−++++−−−++−−−−−+−−−− + − − − − + + − +)
(18, 119) | 0.9931 | 16.63 | (68.28, 27.85, 3.87)_2 | 0.56 | (+ + + + − + + − − − + − − − − − + − + + − + + + − + + + − − + − + + − + − + + + − + − + + + − − + + − − − + + ++++−−+−++−+−−−++−−++−−−−−+−−+−−+−−+−−−+−++−++−−−−−−−−+ − + − + − + − − −)
(20, 133) | 0.9813 | 18.53 | (39.61, 44.81, 13.85, 1.73)_0 | 0.60 | (+ − − − + − + + + + − + − − + − + + − + − − + + − + − − − − − + + − − − + − − + + + − − − + − + − + + + − − − +++−++++++−+−−−+−+−−−+−+−++−+−−++−+++−++−−−+−−+−+−−+−−− + + − − + − + + − − + + − − + − − − − − + − +)
(20, 133) | 0.9850 | 18.45 | (38.10, 45.89, 14.94, 1.08)_0 | 0.60 | (− + − − − + + − + + − − + + − − − + − + − − + + − + − − + + − + − + − + + − − + + − + − + + + − + − − + − − − +−−−++−−++++−−−−−−−−−+−−−++−+−−++++−+−+−+−−++−+−−++−+−+ − + + + − − − + − + − − + − + − − + + − + + +)
(22, 147) | 0.9562 | 20.85 | (67.71, 23.87, 7.24, 1.17)_2 | 0.64 | (− + + + + + − + + + + − + − + − − − − − + + − − + + + + + + − − − − − + + + − + − − − − − + − + − − − + − + − +−+−+−−++−−+−−−+−+−−−+++−−−−+−++++++−+−−+++−−−+++−−−−−+ − − − + + + + + − − − − − − + + − − + − + − − + + + − − − − + − + + − + +)
(22, 147) | 0.9798 | 20.34 | (66.14, 27.01, 5.68, 1.17)_2 | 0.64 | (+ − + + + − + + + − + + − − + + − + + + + − − − − + − − − + + − − + + − − − − − + + − − + − − − + − + − + + + −++++−−+−−+−+++−−+−+−−+−++−+−++−−+−−−−−−−+−−−++−+−++−++ + − + + − − − + + + + + + − − − + − − + − − + − − + − + − − − + − − − − +)
factors, where k is any positive integer; only E(s²)-optimal designs are compared because the OF of the SGA is based on the E(s²)-criterion. More precisely, the supersaturated designs being compared include the designs of Lin (1993a) based on half-fractions of Hadamard matrices, the designs of Lin (1995) found by a construction algorithm, the k-circulant designs of Liu and Dean (2004), the designs of Nguyen and Cheng (2006) based on BIBDs, and the designs of Ryan and Bulutoglu (2006) constructed by the NOAk algorithm.
Table 6
Sample generators of E(s²)-optimal or near optimal 8-circulant supersaturated designs with n runs and m = 8(n − 1) factors constructed via genetic algorithms

(n, m) | E(s²)-efficiency | E(s²) | (f)_i | r_max | Generator
(12, 88) | 0.9921 | 11.68 | (48.56, 44.25, 7.18)_0 | 0.67 | (− − − + − − − + − + − + − + − − + + − + + − + − − − + − + − + − + − − + + − − − − − + − − − + + − + − − + + + + + + + − − + + − − − + − − + − − + − − + − − − + + + + − + + − +)
(12, 88) | 1.0000 | 11.59 | (47.41, 45.98, 6.61)_0 | 0.67 | (− + − − − − − − + − − − + + + − + + + + + − − − − + + − + − + + + − − − − − − + − + + + − + + + + − + + − − − − + − + + − − − + − − − − + + + − − − − − + + − − − + − + − + + +)
(14, 104) | 0.9885 | 13.48 | (75.73, 21.60, 2.67)_2 | 0.71 | (− + − + + + − + − − − − + + − − + − + − − − + − + + + − + − − + + − − − + + − − − − + + − − − + + + − + − − + + + + + − − − + − − − + − − + + − − + − − + − + + − − + + − + + − + + − + − + − + − − − + + − − −)
(14, 104) | 0.9942 | 13.40 | (75.49, 22.09, 2.43)_2 | 0.71 | (− − − + + + − + + − − − + + + − − + + + − − + + − + + − − + + − − + − − − − + + + − − − − − − + + + − + − + + − − + + + + + − − + + + − + − + − + − − + + − − − − − + − − + − + + − + + + − − − − − − − − − − +)
(16, 120) | 0.9912 | 15.19 | (40.13, 49.58, 9.45, 0.84)_0 | 0.75 | (− + − + − + − + − − + + + − + − + − − − − + + + + + − − − + − + − − + − + + − + − + + − − − + − − + − + + − + +−++−−−−−++−−−+−−−−−+−−+−+−−−+−+++++++−+−+−+−++−−+−+++− − + − − − + − + −−)
(18, 136) | 0.9726 | 17.27 | (68.52, 26.48, 5.00)_2 | 0.56 | (− − − − + − + − + + − + + + − + − − − + − − − − − − − + − − + − − − − + − + − − + + − − + + − + + + + − − + − −−+−+−+++−−+−+++++−−−+−−−−−+−+−++−+−−+−−++−++−−+−−+++−+ − + + − + + + − + − + + + − − − − + + + + − − + +−)
(18, 136) | 0.9793 | 17.16 | (69.26, 26.11, 4.26, 0.37)_2 | 0.78 | (+ − + − − − + − + + − − − − + − + + − + + − + + − + − + − + − − + − + − − + − + − + + + + + − + − − − + + + + −−−++−+−+−−−−++−−−−+−−−+−−++++−−−++−+−−−++−−−−++++−−++− + − − + − − − + + + + − + − + − − + − + + − + − −−)
(18, 136) | 0.9895 | 16.98 | (67.78, 28.33, 3.70, 0.19)_2 | 0.78 | (+ + − − + + − + + + + − + + − + − − − − + + + − − + + + + − − − − − + + − − + + + + + + − + + − − + + − − − − ++−−−++−−−−−+−−−−++−−−−+−−−−+−−+++−++−+++−−−++−+−++−+−+ − + + − + − − − − − − − + − + − − + − + − − + + +−)
(20, 152) | 0.9615 | 19.28 | (39.57, 43.71, 14.74, 1.99)_0 | 0.60 | (+ − + − + − − + − − + − − + − − + − − − + + − − − − − + − − + − + + − − + − + + − − − + − − − + − − − − + + + −+++−−+++++++−−+−−+−+−+−+++++−−+−−−+++−++−−+−−+−−−++−−+ − − + + − + + − − + − + − − + + + − + − − − − − + + + + − + + − − − − − + + + + −+)
(20, 152) | 0.9777 | 18.96 | (37.42, 46.69, 14.24, 1.66)_0 | 0.60 | (− − − + − − − − − + + − + − − + + + − − + − + − − − − − − + − − + − + + + + + + − − + + − + + − + + + + + + − +−−+−−−++−−+−+−−+−+−+−−−−+−++−+++−−+−−+−++++−+−−−++−+−− − − − − − − − + + − + + − − + − + + + + − + − + + − + + − − + − + − − − − + + + −+)
(20, 152) | 0.9804 | 18.91 | (35.93, 49.17, 13.25, 1.49, 0.17)_0 | 0.80 | (− − − + + − − − + − + − + + + − − − − − − − − + + + − + − − + − + + − − − − − + − − + − − + + − + − + − − − + ++−++++−+−+−+−++−−−−−−−−+−++−++−−++−−++−+++−+−+−−−+++++ + − + − + + − − + + − − − + − − − − + + + + + + + − − + − − + − − + − − + − + − ++)
These comparisons show that our heuristic approach via genetic algorithms for the construction of E(s²)-optimal supersaturated designs is very promising, since it competes well with the existing algorithms. Furthermore, for k-circulant supersaturated designs our SGA was, in some cases, able to outperform the search algorithms used by Liu and Dean (2004).
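The comparisons above rest on the simple GA of Section 4. To make its moving parts concrete, here is a compact, self-contained sketch of such a GA for small balanced k-circulant designs. It is not the authors' program, and several choices are assumptions made only for illustration: the initial population is drawn directly from generators that satisfy Theorem 2 (instead of shortened Hadamard rows), fitness is the E(s²)-efficiency (5), reproduction is a fitness-proportional roulette wheel implemented with `random.choices`, offspring that violate Theorem 2 are replaced by their parents, and the second, Theorem 2-based mutation with probability n/(2n − 2) is omitted. All function names are hypothetical.

```python
import random
from itertools import combinations


def design(gen, k):
    """k-circulant design: cyclic right shifts by k plus a row of +1's."""
    rows, row = [], gen
    for _ in range(len(gen) // k):
        rows.append(row)
        row = row[-k:] + row[:-k]
    return rows + [[1] * len(gen)]


def e_s2(gen, k):
    cols = list(zip(*design(gen, k)))
    s = [sum(a * b for a, b in zip(cols[i], cols[j]))
         for i, j in combinations(range(len(cols)), 2)]
    return sum(v * v for v in s) / len(s)


def lower_bound(n, k):
    """LB (3) or (4) for m = k(n - 1)."""
    m = k * (n - 1)
    if (n * k) % 4 == 0:
        return n * n * (k - 1) / (n * k - k - 1)
    mm = m * (m - 1)
    H = k * (k - 1) * (n - 1) * n * n + 2 * n * (n - 2)
    return 4 + 64 * max(0, -((4 * mm - H) // 64)) / mm


def balanced(gen, k):
    """Theorem 2, condition 3 (with length m = (2t - 1)k it implies condition 2)."""
    return all(sum(gen[j::k]) == -1 for j in range(k))


def random_balanced(n, k, rng):
    """Assumption: fill each class j, j+k, ... with t entries -1 and t-1 entries +1."""
    t = n // 2
    gen = [0] * (k * (n - 1))
    for j in range(k):
        cls = [-1] * t + [1] * (t - 1)
        rng.shuffle(cls)
        gen[j::k] = cls
    return gen


def genetic_search(n, k, pop_size=30, generations=200, p_mut=0.05, seed=0):
    assert k >= 2 and pop_size % 2 == 0
    rng = random.Random(seed)
    lb = lower_bound(n, k)
    cache = {}

    def fitness(gen):                          # E(s^2)-efficiency (5)
        key = tuple(gen)
        if key not in cache:
            cache[key] = lb / e_s2(gen, k)
        return cache[key]

    pop = [random_balanced(n, k, rng) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):               # termination: fixed number of generations
        if fitness(best) >= 1 - 1e-12:         # or an E(s^2)-optimal generator was found
            break
        pool = rng.choices(pop, weights=[fitness(g) for g in pop], k=pop_size)  # roulette
        nxt = []
        for a, b in zip(pool[0::2], pool[1::2]):
            cut = rng.randrange(1, len(a))     # one-point crossover site
            for child, parent in ((a[:cut] + b[cut:], a), (b[:cut] + a[cut:], b)):
                child = [-x if rng.random() < p_mut else x for x in child]  # bit-flip mutation
                nxt.append(child if balanced(child, k) else parent)        # Theorem 2 filter
        pop = nxt
        best = max(pop + [best], key=fitness)
    return best, fitness(best)


best, eff = genetic_search(8, 2, generations=60, seed=1)
print(eff, "".join("+" if v > 0 else "-" for v in best))
```

Because one-point crossover and bit flips usually break the column balance, the filter discards many offspring in this sketch; a mutation that exchanges a +1 and a −1 within one residue class would preserve Theorem 2 and is a natural variation to try.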
Table 7
Comparison for E(s²)-optimal supersaturated designs with n runs and m = k(n − 1) factors (entries are the values of m)

n | Lin (1993a) | Lin (1995) | Liu and Dean (2004) | Nguyen and Cheng (2006) | Ryan and Bulutoglu (2006) | Section 5
6 | 10 | 10 | 10 | 10 | – | 10
8 | – | – | 14, 21, 28, 35 | 14 | – | 14, 21, 28, 35
10 | 18 | – | 18, 36, 54 | 18 | 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108, 117, 126 | 18, 36, 54
12 | 22 | 22, 33, 44, 55, 66 | 22, 33, 44, 55, 66 | 22 | 22, 33, 55, 66, 110, 132, 220 | 22, 33, 44, 55, 66, 88
14 | 26 | 26, 39, 52, 65, 78, 91, 104 | 26, 52 | 26 | 26, 39, 52, 65, 78, 104, 130, 156 | 26, 52
16 | – | 30 | 30, 45 | 30 | – | 30, 45, 60, 75
18 | 34 | 34, 51, 68, 85, 102 | 34 | 34 | – | 34
20 | 38 | – | 38 | 38 | – | 38
22 | 42 | 42, 63, 84 | 42 | 42 | – | 42
6. Conclusion

The construction of optimal supersaturated designs by means of optimization is of current interest. Our efforts concentrated on minimizing E(s²) via a heuristic search, because the E(s²)-criterion is one of the best-known criteria for comparing supersaturated designs. The flexibility of genetic algorithms allows different objective functions to be optimized; therefore, if another criterion were under consideration, our SGA could be applied in a similar manner, although a number of parameters would have to be re-tuned for such an application to succeed. Due to the randomness of genetic algorithms, a different approach may behave better or worse. Genetic algorithms proved to be a successful and promising approach for the construction of E(s²)-optimal or near optimal k-circulant supersaturated designs, since their compact encoding allowed us to use OF information (not derivatives) and probabilistic (not deterministic) transition rules. Furthermore, encoding the chromosomes as generators significantly reduced the space complexity, so we were able to represent large supersaturated designs with little storage. Finally, these features together produced a large population of k-circulant generators within a few generations, considerably reducing the running time of the GA.

Acknowledgements

The research of the second author was financially supported by the Greek State Scholarships Foundation (IKY).

Appendix

The best k-circulant supersaturated designs that we were able to construct with the genetic algorithm of Section 4 are presented in Tables 2–6.

References

Booth, K.H.V., Cox, D.R., 1962. Some systematic supersaturated designs. Technometrics 4, 489–495.
Bulutoglu, D.A., Cheng, C.S., 2004. Construction of E(s²)-optimal supersaturated designs. Ann. Statist. 32, 1662–1678.
Butler, N., Mead, R., Eskridge, K.M., Gilmour, S.G., 2001. A general method of constructing E(s²)-optimal supersaturated designs. J. Roy. Statist. Soc. Ser. B Statist. Methodol. 63, 621–632.
Cela, R., Martinez, E., Carro, A.M., 2000. Supersaturated designs. New approaches to building and using it: part I. Building optimal supersaturated designs by means of evolutionary algorithms. Chemometrics Intell. Lab. Syst. 52, 167–182.
Davis, L., 1991. Handbook of Genetic Algorithms. Van Nostrand Reinhold.
Forrest, S., 1993. Genetic algorithms: principles of natural selection applied to computation. Science 261, 872–878.
Geramita, A.V., Seberry, J., 1979. Orthogonal Designs: Quadratic Forms and Hadamard Matrices. Marcel Dekker, New York, Basel.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA.
Heredia-Langner, A., Carlyle, W.M., Montgomery, D.C., Borror, C.M., Runger, G.C., 2003. Genetic algorithms for the construction of D-optimal designs. J. Qual. Technol. 35, 28–46.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems, an Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. University of Michigan Press, Ann Arbor, Michigan.
Lin, D.K.J., 1993a. A new class of supersaturated designs. Technometrics 35, 28–31.
Lin, D.K.J., 1993b. Another look at first-order saturated designs: the p-efficient designs. Technometrics 35, 284–292.
Lin, D.K.J., 1995. Generating systematic supersaturated designs. Technometrics 37, 213–225.
Liu, Y., Dean, A., 2004. k-circulant supersaturated designs. Technometrics 46, 32–46.
Nguyen, N.K., 1996. An algorithmic approach to constructing supersaturated designs. Technometrics 38, 69–73.
Nguyen, N.K., Cheng, C.S., 2006. New E(s²)-optimal supersaturated designs constructed from incomplete block designs. Technometrics, in press.
Plackett, R.L., Burman, J.P., 1946. The design of optimum multifactorial experiments. Biometrika 33, 305–325.
Ryan, K.J., Bulutoglu, D.A., 2006. E(s²)-optimal supersaturated designs with good minimax properties. J. Statist. Plann. Inference, in press.
Tang, B., Wu, C.F.J., 1997. A method for constructing supersaturated designs and its E(s²)-optimality. Canad. J. Statist. 25, 191–201.
Wang, J.C., Wu, C.F.J., 1992. Nearly orthogonal arrays with mixed levels and small runs. Technometrics 34, 409–422.