Two-level supersaturated designs for $2^k$ runs and other cases


Journal of Statistical Planning and Inference 139 (2009) 23 -- 29


Journal homepage: www.elsevier.com/locate/jspi

Neil A. Butler∗

School of Mathematical Sciences, University of Nottingham, University Park, Nottingham NG7 2RD, UK

ARTICLE INFO

Available online 23 May 2008

Keywords: Effect sparsity; $E(s^2)$-optimality; Factor sparsity; Fractional factorial; Hadamard matrix; Minimax

ABSTRACT

Two-level supersaturated designs are constructed for $n = 2^k$ ($k \ge 5$) runs and $m$ factors where $n + 3 \le m \le 5(n - 4)$. The designs so formed are shown to have a maximum absolute correlation between factors of $1/4$ and to be efficient in terms of $E(s^2)$, particularly when the number of factors $m$ is approximately double the number of runs $n$ or greater. Thus, supersaturated designs with favourable properties are found for much higher numbers of runs than would be possible solely using algorithms. © 2008 Elsevier B.V. All rights reserved.

1. Introduction

Supersaturated designs (Satterthwaite, 1959; Booth and Cox, 1962; Lin, 1993, 1995) are designs with at least as many factor main effects as there are experimental runs. The designs are particularly useful when there are many factors and when experimentation is expensive. The objective of experimentation using supersaturated designs is to determine the so-called active factors, which have the most substantial effect on the response of interest. However, when using supersaturated designs, it is important to be aware that the complex aliasing structure between the factors makes it much more likely that the results of the analysis could be misinterpreted. Therefore, follow-up experiments (Meyer et al., 1996; Lewis and Dean, 2001) should usually be carried out on the factors that are highly correlated with the response to check that they represent real effects.

In choosing a two-level supersaturated design, the aim is to minimise in some sense the pairwise correlations between factors, or strictly speaking between factor main effects. The two main criteria introduced by Booth and Cox (1962) are the minimisation of $E(s^2)$, where the average squared correlation between factors is minimised, and minimax, where the maximum absolute correlation between factors is minimised. Work on the minimax criterion includes Lin (1993), Wu (1993), Cheng (1997), Butler (2005) and Ryan and Bulutoglu (2007). Work on $E(s^2)$-optimality includes Nguyen (1996), Cheng (1997), Tang and Wu (1997), Liu and Zhang (2000), Butler et al. (2001), Bulutoglu and Cheng (2004), Liu and Dean (2004), Eskridge et al. (2004), Bulutoglu (2007), Koukouvinos et al. (2007, 2008) and Nguyen and Cheng (2008).

In this paper, two-level supersaturated designs are constructed for $n = 2^k$ runs ($k \ge 5$) with $m$ factors where $n + 3 \le m \le 5(n - 4)$, except for the 11 cases where $3(n - 4) < m < 3n$. Designs are also constructed for $n$ a multiple of 64. The designs utilise some 16-run designs that are efficient in terms of $E(s^2)$ and minimax, previously given by Butler (2005), Ryan and Bulutoglu (2007) and Liu and Zhang (2000). The designs so formed are shown to be appealing in terms of both $E(s^2)$ and minimax, with a maximum absolute correlation between factors equal to $1/4$. Moreover, the methods in this paper allow much larger supersaturated designs, in terms of the number of runs, to be found than would be possible with search algorithms.



Tel.: +44 115 951 4949; fax: +44 115 951 4951. E-mail address: [email protected].

0378-3758/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2008.05.013


2. Background

Two-level supersaturated designs for $n$ runs and $m \ge n$ factors are specified by an $n \times m$ design matrix $X$. The design matrix $X$ has all elements equal to $\pm 1$, and has an equal number of $+1$'s and $-1$'s in each column so that factors are orthogonal to the overall mean. The absolute pairwise correlation between factors $i$ and $j$ with factor columns $x_i$ and $x_j$ is $|s_{ij}|/n$, where $s_{ij} = x_i' x_j$. Note that factors are orthogonal if $s_{ij} = 0$. The $E(s^2)$-optimality criterion involves minimising

$$E(s^2) = \frac{1}{m(m-1)} \sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \ne i}}^{m} s_{ij}^2,$$

where, for identifiability purposes, designs are not allowed to have completely aliased factors or columns.

It turns out that $E(s^2)$-optimal designs tend to rely heavily on Hadamard matrices. An $n \times n$ Hadamard matrix $H$ has all elements equal to $\pm 1$ and has orthogonal rows and columns so that $H'H = HH' = nI_n$; see, for example, Dey and Mukerjee (1999). A Hadamard matrix is said to be normalized if all the elements in the first row and first column of the matrix equal $+1$. Note that Hadamard matrices only exist if $n = 1, 2$ or $n$ is a multiple of 4, and the best known in statistics are the Plackett and Burman (1946) designs.

$E(s^2)$-optimal designs are easily constructed from Hadamard matrices when the number of runs $n$ is a multiple of 4 and the number of factors is $m = q(n - 1)$, where $q$ is an integer. In these cases, $E(s^2)$-optimality is achieved by designs with

$$X_0 = (F_1, \ldots, F_q), \tag{1}$$

where each $F_i$ is an $n \times (n - 1)$ matrix formed by excluding the first column of an $n \times n$ normalized Hadamard matrix. However, it must be checked that the designs do not have any completely aliased columns. The designs attain a lower bound LB for $E(s^2)$,

$$\mathrm{LB} = \frac{(m - n + 1)\,n^2}{(n - 1)(m - 1)},$$

given by Nguyen (1996) and Tang and Wu (1997). Also $X_0 X_0' = qnI_n - qJ_n$, where $I_n$ and $J_n$ are the $n \times n$ identity matrix and matrix of ones, respectively.

Designs for $n$ runs and $m = q(n - 1) + r$ factors ($0 < r < n - 1$) can be formed by combining the design (1) with $r$ extra columns (excluding the first column) from an additional Hadamard matrix; see Tang and Wu (1997). Again it needs to be checked that no factor columns are completely aliased. Such a design is called here a Hadamard constructed design. The following result appeared in Butler (2005) without proof but is proved here in the Appendix.

Theorem 1. A Hadamard constructed design for $n$ runs and $m = q(n - 1) + r$ factors ($0 \le r < n - 1$) has

$$\sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \ne i}}^{m} s_{ij}^2 = [m(q - 1) + r(q + 1)]\,n^2. \tag{2}$$
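As a small numerical illustration of Theorem 1 and of the lower bound LB, the sketch below builds one Hadamard constructed design for $n = 8$, $q = 1$, $r = 3$ and compares both sides of (2). It is only a sketch under stated assumptions: the helper names `sylvester` and `sum_sq`, the row permutation used to obtain a second normalized Hadamard matrix, and the choice of extra columns are illustrative and are not taken from the paper.

```python
import numpy as np

def sylvester(k):
    """Normalized 2^k x 2^k Hadamard matrix built by repeated Kronecker products."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.kron(H, np.array([[1, 1], [1, -1]]))
    return H

def sum_sq(X):
    """Sum of s_ij^2 over ordered pairs i != j, where s_ij = x_i' x_j."""
    S = X.T @ X
    return int((S ** 2).sum() - (np.diag(S) ** 2).sum())

n, q, r = 8, 1, 3
H_a = sylvester(3)
# Assumed second Hadamard matrix: cycling rows 2..8 keeps the first row and
# first column equal to +1, so the matrix stays normalized.
H_b = H_a[[0, 2, 3, 4, 5, 6, 7, 1], :]
F1 = H_a[:, 1:]                 # n x (n - 1): first column dropped
F_extra = H_b[:, [1, 3, 4]]     # r extra columns (illustrative choice, never column 0)
X = np.hstack([F1, F_extra])
m = X.shape[1]                  # m = q(n - 1) + r = 10

S = X.T @ X
# |s_ij| = n for some i != j would signal two completely aliased columns.
max_abs_s = int(np.abs(S - np.diag(np.diag(S))).max())

lhs = sum_sq(X)                                        # left-hand side of (2)
rhs = (m * (q - 1) + r * (q + 1)) * n ** 2             # right-hand side of (2)
e_s2 = lhs / (m * (m - 1))
lower_bound = (m - n + 1) * n ** 2 / ((n - 1) * (m - 1))

print(max_abs_s < n, lhs == rhs, round(e_s2, 3), round(lower_bound, 3))
```

The identity (2) depends only on the Hadamard structure, so any other normalized Hadamard matrix could be substituted for `H_b`; the aliasing check, however, has to be repeated for each choice.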

Hadamard constructed designs are $E(s^2)$-optimal when $\min(r, n - 1 - r) \le 2$ (Cheng, 1997), but they are not generally $E(s^2)$-optimal for other values of $r$ (Butler et al., 2001). However, we shall see that Hadamard constructed designs often perform reasonably well under the $E(s^2)$ criterion, motivating the following definition.

Definition. A design is said to be as $E(s^2)$-efficient as Hadamard constructed designs if

$$\sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \ne i}}^{m} s_{ij}^2 \le [m(q - 1) + r(q + 1)]\,n^2. \tag{3}$$

3. 16-Run designs

Butler (2005) found some two-level supersaturated designs for 16 runs with a maximum absolute correlation of $r_{\max} = 1/4$ which are as $E(s^2)$-efficient as Hadamard constructed designs. The construction of these designs is based on the following definitions. Define the $16 \times 16$ 'regular design' Hadamard matrix

$$H_0 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$


Table 1. $E(s^2)$ lower bound and $E(s^2)$ for the designs in Butler (2005) for $n = 16$ runs

| $m$ | $E(s^2)$ lower bound | $E(s^2)$ of designs in Butler (2005) |
|---|---|---|
| 36 | 10.57 | 10.97 |
| 48 | 12.14 | 12.26 |
| 52 | 12.55 | 12.74 |
| 56 | 12.80 | 12.97 |
| 60 | 13.02 | 13.02 |

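Now define the $16 \times 16$ diagonal matrices $D^{(i)}$ ($i = 1, \ldots, 5$) to have all diagonal elements equal to one except for

$$\begin{aligned}
D^{(1)}_{jj} &= -1 \quad \text{for } j = 1, 2, 3, 4, \\
D^{(2)}_{jj} &= -1 \quad \text{for } j = 1, 5, 9, 13, \\
D^{(3)}_{jj} &= -1 \quad \text{for } j = 1, 6, 11, 16, \\
D^{(4)}_{jj} &= -1 \quad \text{for } j = 1, 7, 12, 14, \\
D^{(5)}_{jj} &= -1 \quad \text{for } j = 1, 8, 10, 15.
\end{aligned}$$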
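The 16-run building blocks can be checked numerically. The sketch below forms $H_0$ and the sign changes $D^{(i)}$ listed above, takes $H_i = D^{(i)} H_0$ as in Butler (2005), and reports the cross-products between columns of different $H_i$, the number of columns of each $H_i$ that are orthogonal to the mean, and $E(s^2)$ for the values of $m$ in Table 1. The function names, the use of NumPy's Kronecker ordering for the rows of $H_0$, and the choice of the first $m - 48$ balanced columns of $H_5$ are assumptions made for illustration.

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]])
H0 = np.kron(np.kron(H2, H2), np.kron(H2, H2))   # 16 x 16 matrix H_0

# 1-based indices j with D^(i)_jj = -1, copied from the definitions above.
NEG = {1: (1, 2, 3, 4), 2: (1, 5, 9, 13), 3: (1, 6, 11, 16),
       4: (1, 7, 12, 14), 5: (1, 8, 10, 15)}

def H(i):
    """H_i = D^(i) H_0, i.e. H_0 with the listed rows multiplied by -1."""
    d = np.ones(16, dtype=int)
    d[[j - 1 for j in NEG[i]]] = -1
    return d[:, None] * H0

def X_part(i):
    """Columns of H_i that sum to zero, i.e. are orthogonal to the mean."""
    Hi = H(i)
    return Hi[:, Hi.sum(axis=0) == 0]

def e_s2(X):
    """Average of s_ij^2 over ordered pairs of distinct columns."""
    S = X.T @ X
    m = X.shape[1]
    return float(((S ** 2).sum() - (np.diag(S) ** 2).sum()) / (m * (m - 1)))

# Absolute cross-products between columns of different H_i; 4 = 16 * (1/4).
cross = {abs(int(v)) for a in NEG for b in NEG if a < b
         for v in (H(a).T @ H(b)).ravel()}

balanced_counts = [X_part(i).shape[1] for i in NEG]
full = np.hstack([X_part(i) for i in NEG])       # 16 x 60 design (X_1, ..., X_5)
table1 = {m: round(e_s2(full[:, :m]), 2) for m in (36, 48, 52, 56, 60)}
print(cross, balanced_counts, table1)
```

Taking leading columns of `full` gives $(X_1, X_2, X_3)$ for $m = 36$ and $(X_1, \ldots, X_4, V_5)$ for $48 \le m \le 60$, which are the cases tabulated in Table 1.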

Also define the $16 \times 16$ Hadamard matrices $H_i = D^{(i)} H_0$ for $i = 1, \ldots, 5$. Note that $H_0' = H_0$. It is effectively shown in Butler (2005) that the correlation between any column from $H_{i_1}$ and any column from $H_{i_2}$ equals $\pm 1/4$ for $i_1 \ne i_2$, $i_1, i_2 = 1, \ldots, 5$. It is also shown that 12 columns of each matrix $H_i$ ($i \ge 1$) are orthogonal to the mean effect. For each $i$, define $X_i$ to be the $16 \times 12$ matrix with those 12 columns. Then it is further shown in Butler (2005) that the design $X = (X_1, \ldots, X_t)$ for 16 runs and $12t$ factors ($t \le 5$) has $r_{\max} = 1/4$ and is as $E(s^2)$-efficient as Hadamard constructed designs. The final result in Butler (2005) is that $X = (X_1, X_2, X_3, X_4, V_5)$ has $r_{\max} = 1/4$ and is as $E(s^2)$-efficient as Hadamard constructed designs for 16 runs and $48 \le m \le 60$ factors, where $V_5$ contains any $m - 48$ columns of $X_5$.

The supersaturated designs for $2^k$ runs ($k \ge 5$) will be constructed using the $E(s^2)$-optimal and minimax ($r_{\max} = 1/4$) 16-run designs for $m = 19$ and $24$ provided by Ryan and Bulutoglu (2007) and for $m = 30$ provided by Liu and Zhang (2000). They will also use the above designs for $m = 36$ and $48 \le m \le 60$. Observe from Table 1 that the latter designs are very close to being $E(s^2)$-optimal. Note that the lower bound of $E(s^2)$ is that given by Butler et al. (2001) and Bulutoglu and Cheng (2004).

4. 32-Run designs

The proposed supersaturated designs for $n = 32$ runs and $m = 31q + r$ factors ($1 \le q \le 4$, $0 \le r < 31$) have design matrix

$$X = \begin{pmatrix} Z_0 & Z_1 \\ Z_0 & -Z_1 \end{pmatrix}. \tag{4}$$

Here, $Z_0$ is the design matrix of a design for 16 runs and $m_0 = 15q + r_0$ factors ($1 \le q \le 4$, $0 \le r_0 \le 15$) with $r_{\max} = 1/4$ that is $E(s^2)$-optimal or as $E(s^2)$-efficient as Hadamard constructed designs. Also, $Z_1 = (H_1, \ldots, H_q, G)$ is a $16 \times m_1$ matrix, where $m_1 = 16q + r_1$ ($1 \le q \le 4$, $0 \le r_1 \le 16$), $H_i$ ($i = 1, \ldots, q$) are defined in Section 3, and $G$ consists of $r_1$ columns of $H_{q+1}$. Note that $m = m_0 + m_1$ and $r = r_0 + r_1$ ($r_0, r_1 \ge 0$).

The design matrix (4) partitions the factor columns into two groups. This has previously been done to construct $E(s^2)$-optimal designs by Butler et al. (2001). In both cases $E(s^2)$-optimality or $E(s^2)$-efficiency is attained by finding an appropriate structure within the two orthogonal subspaces. Observe from (4) that factors from different groups are in orthogonal subspaces, as the effect of the last 16 runs cancels out the effect of the first 16 runs. Moreover, within groups of factors, as $r_{\max} = 1/4$ over both the first 16 runs and the last 16 runs, $r_{\max} = 1/4$ over all runs.

Theorem 2. The designs (4) have $r_{\max} = 1/4$ and are as $E(s^2)$-efficient as Hadamard constructed designs.

The proof is given in the Appendix. Table 2 provides the values of $m$, $m_0$, $m_1$ and $q$ for which the designs (4) can be found. It turns out that all cases $35 \le m \le 140$ are covered except for $85 \le m \le 95$. In some cases, such as $m = 40$, more than one choice of design is available but the choices are equivalent in terms of $E(s^2)$ and minimax.
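The construction (4) is easy to reproduce for the cases in which $Z_0$ comes from Butler (2005). The sketch below is a minimal illustration under stated assumptions: `design32` and `summary` are hypothetical helper names, $Z_0$ is taken as the first $m_0$ balanced columns of $(H_1, \ldots, H_5)$ (so only $m_0 = 36$ or $48 \le m_0 \le 60$ is supported, not the $m_0 = 19, 24, 30$ designs from other papers), and $G$ is taken as the first $r_1$ columns of $H_{q+1}$.

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]])
H0 = np.kron(np.kron(H2, H2), np.kron(H2, H2))
# 1-based rows whose sign is flipped by D^(i) (Section 3).
NEG = {1: (1, 2, 3, 4), 2: (1, 5, 9, 13), 3: (1, 6, 11, 16),
       4: (1, 7, 12, 14), 5: (1, 8, 10, 15)}

def H(i):
    """H_i = D^(i) H_0."""
    d = np.ones(16, dtype=int)
    d[[j - 1 for j in NEG[i]]] = -1
    return d[:, None] * H0

def design32(m0, m1, q):
    """Design (4); assumes m0 = 36 or 48 <= m0 <= 60 and m1 = 16q + r1."""
    balanced = np.hstack([H(i)[:, H(i).sum(axis=0) == 0] for i in NEG])
    Z0 = balanced[:, :m0]                          # (X_1, ..., X_t) or (X_1, ..., X_4, V_5)
    G = H(q + 1)[:, :m1 - 16 * q]                  # r1 columns of H_{q+1}
    Z1 = np.hstack([H(i) for i in range(1, q + 1)] + [G])
    return np.block([[Z0, Z1], [Z0, -Z1]])

def summary(X):
    """Return (r_max, E(s^2)) for a +-1 design matrix X."""
    n, m = X.shape
    S = X.T @ X
    off = S - np.diag(np.diag(S))
    return float(np.abs(off).max() / n), float((off ** 2).sum() / (m * (m - 1)))

cases = [(36, 44, 2), (52, 48, 3), (60, 80, 4)]    # m = 80, 100, 140
results = {m0 + m1: summary(design32(m0, m1, q)) for m0, m1, q in cases}
for m, (r_max, es2) in results.items():
    print(m, r_max, round(es2, 2))
```

Every column of the stacked matrix is balanced over the 32 runs even though some columns of $Z_1$ are not balanced over 16 runs, because the second half of each $Z_1$ column is the negative of the first half.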


Table 2. Designs (4) with $r_{\max} = 1/4$ for 32 runs and $m$ factors, with $m_0$ factors from the first part of the design and $m_1$ factors from the second part

| $m$ | $m_0$ | $m_1$ | $q$ |
|---|---|---|---|
| 35–51 | 19 | 16–32 | 1 |
| 40–56 | 24 | 16–32 | 1 |
| 57–62 | 30 | 27–32 | 1 |
| 62–78 | 30 | 32–48 | 2 |
| 68–84 | 36 | 32–48 | 2 |
| 96–124 | 48–60 | 48–64 | 3 |
| 124–140 | 60 | 64–80 | 4 |

Table 3. $E(s^2)$ lower bound and $E(s^2)$ for designs (4) in Table 2

| $m$ | $E(s^2)$ lower bound | $E(s^2)$ of designs (4) |
|---|---|---|
| 40 | 8.53 | 10.50 |
| 50 | 13.37 | 15.05 |
| 60 | 16.78 | 16.78 |
| 70 | 18.87 | 19.93 |
| 80 | 20.74 | 21.71 |
| 90 | 22.05 | 22.24 |
| 100 | 23.17 | 23.58 |
| 110 | 24.08 | 24.60 |
| 120 | 24.74 | 24.95 |
| 130 | 25.46 | 25.65 |
| 140 | 25.99 | 26.31 |

Table 3 provides $E(s^2)$ for the designs (4) in Table 2 along with the $E(s^2)$ lower bound, given by Butler et al. (2001) and Bulutoglu and Cheng (2004), for $m$ a multiple of 10. Note that the 'blip' at $m = 60$, where the two values are the same, occurs because Hadamard constructed designs are $E(s^2)$-optimal when $\min(r, n - 1 - r) \le 2$, as mentioned earlier. The table nevertheless shows that the designs in Table 2 perform well in terms of $E(s^2)$ when $m$ is 60 or more. As these designs also have $r_{\max} = 1/4$, they should make good designs in practice. Whether the designs should be used for $m \le 50$ is more debatable despite their good minimax properties.

5. Higher run designs

Let $n = 2^k$ ($k \ge 5$) or let $n$ be a multiple of 64. Also let $p = n/16$ and define $H^{(p)} = (1_p, c_1, \ldots, c_{p-1})$ to be a $p \times p$ normalized Hadamard matrix with columns $c_i$. The proposed supersaturated designs for $n$ runs and $m = q(n - 1) + r$ factors ($1 \le q \le 4$, $0 \le r < n - 1$) have design matrix

$$X = (1_p \otimes Z_0,\; c_1 \otimes Z_1,\; \ldots,\; c_{p-1} \otimes Z_{p-1}). \tag{5}$$

The design matrix $Z_0$ is again that of a design with $r_{\max} = 1/4$ for 16 runs and $m_0 = 15q + r_0$ factors ($1 \le q \le 4$, $0 \le r_0 \le 15$) that is as $E(s^2)$-efficient as Hadamard constructed designs. The $16 \times m_i$ matrices $Z_i$ ($1 \le i \le p - 1$) are given by $Z_i = (H_1, \ldots, H_q, G_i)$, where $m_i = 16q + r_i$ ($1 \le q \le 4$, $0 \le r_i \le 16$) and $G_i$ consists of $r_i$ columns of $H_{q+1}$. Note that $m = \sum_{i=0}^{p-1} m_i$ and $r = \sum_{i=0}^{p-1} r_i$. For example, for 64 runs,

$$X = \begin{pmatrix} Z_0 & Z_1 & Z_2 & Z_3 \\ Z_0 & Z_1 & -Z_2 & -Z_3 \\ Z_0 & -Z_1 & Z_2 & -Z_3 \\ Z_0 & -Z_1 & -Z_2 & Z_3 \end{pmatrix}.$$

In general, the design (5) partitions the factor columns into $p$ orthogonal groups of factors. Moreover, within each group of factors, as $r_{\max} = 1/4$ over each set of 16 runs, $r_{\max} = 1/4$ over all runs.

Theorem 3. The designs (5) have $r_{\max} = 1/4$ and are as $E(s^2)$-efficient as Hadamard constructed designs.

The proof is given in the Appendix. Table 4 provides the values of $m$, $m_0$, $m_i$ ($i \ge 1$) and $q$ for which the designs (5) can be found. The table covers all cases $n + 3 \le m \le 5(n - 4)$ except for $3(n - 4) < m < 3n$. Table 5 provides $E(s^2)$ for the designs (5) in Table 4 along with the $E(s^2)$ lower bound for $n = 64$ runs and $m$ a multiple of 20. As in Table 3, observe that the designs perform well in terms of $E(s^2)$ for $m$ factors when $m$ is at least approximately double the number of runs. Designs with $m < 2n$ appear to do less well.
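The general construction (5) can be sketched in the same way. The code below is an illustration only: it assumes $p = n/16$ is a power of 2 so that a Sylvester matrix can serve as $H^{(p)}$ (the paper also allows $n$ a multiple of 64, which would need another Hadamard matrix of order $p$), it takes $Z_0$ from the Butler (2005) family ($m_0 = 36$ or $48 \le m_0 \le 60$), and the names `design5`, `sylvester` and `sum_sq` are hypothetical.

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]])
H0 = np.kron(np.kron(H2, H2), np.kron(H2, H2))
# 1-based rows whose sign is flipped by D^(i) (Section 3).
NEG = {1: (1, 2, 3, 4), 2: (1, 5, 9, 13), 3: (1, 6, 11, 16),
       4: (1, 7, 12, 14), 5: (1, 8, 10, 15)}

def H(i):
    """H_i = D^(i) H_0."""
    d = np.ones(16, dtype=int)
    d[[j - 1 for j in NEG[i]]] = -1
    return d[:, None] * H0

def sylvester(p):
    """Normalized p x p Hadamard matrix; p is assumed to be a power of 2."""
    M = np.array([[1]])
    while M.shape[0] < p:
        M = np.kron(M, H2)
    return M

def design5(n, q, m0, r):
    """Design (5) for n = 16p runs; r[i - 1] is r_i, the size of G_i, for i >= 1."""
    p = n // 16
    Hp = sylvester(p)
    balanced = np.hstack([H(i)[:, H(i).sum(axis=0) == 0] for i in NEG])
    blocks = [np.kron(Hp[:, [0]], balanced[:, :m0])]          # 1_p (x) Z_0
    for i in range(1, p):
        Zi = np.hstack([H(j) for j in range(1, q + 1)] + [H(q + 1)[:, :r[i - 1]]])
        blocks.append(np.kron(Hp[:, [i]], Zi))                # c_i (x) Z_i
    return np.hstack(blocks)

def sum_sq(X):
    """Sum of s_ij^2 over ordered pairs of distinct columns."""
    S = X.T @ X
    return int((S ** 2).sum() - (np.diag(S) ** 2).sum())

X64 = design5(64, 4, 60, [16, 16, 16])                        # m = 5n - 20 = 300
m = X64.shape[1]
S = X64.T @ X64
r_max = float(np.abs(S - np.diag(np.diag(S))).max() / 64)
print(X64.shape, r_max, round(sum_sq(X64) / (m * (m - 1)), 2))
```

Because the columns $c_i$ of $H^{(p)}$ are mutually orthogonal, the Kronecker products place the $p$ groups of factors in orthogonal subspaces, which is the property used in the proof of Theorem 3.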

Table 4. Designs (5) with $r_{\max} = 1/4$ for $n$ runs and $m$ factors, with $m_i$ factors from the $(i + 1)$th part of the design

| $m$ | $m_0$ | $m_i$ ($i \ge 1$) | $q$ |
|---|---|---|---|
| $(n + 3)$–$(2n - 13)$ | 19 | 16–32 | 1 |
| $(n + 8)$–$(2n - 8)$ | 24 | 16–32 | 1 |
| $(2n - 7)$–$(2n - 2)$ | 30 | 27–32 | 1 |
| $(2n - 2)$–$(3n - 18)$ | 30 | 32–48 | 2 |
| $(2n + 4)$–$(3n - 12)$ | 36 | 32–48 | 2 |
| $3n$–$(4n - 4)$ | 48–60 | 48–64 | 3 |
| $(4n - 4)$–$(5n - 20)$ | 60 | 64–80 | 4 |

Table 5. $E(s^2)$ lower bound and $E(s^2)$ for the designs (5) in Table 4 for $n = 64$ runs

| $m$ | $E(s^2)$ lower bound | $E(s^2)$ of designs (5) |
|---|---|---|
| 80 | 14.91 | 20.74 |
| 100 | 25.03 | 29.79 |
| 120 | 31.70 | 32.70 |
| 140 | 36.41 | 38.31 |
| 160 | 39.93 | 42.18 |
| 180 | 42.65 | 43.48 |
| 200 | 44.90 | 45.69 |
| 220 | 46.76 | 47.95 |
| 240 | 48.20 | 48.84 |
| 260 | 49.48 | 49.88 |
| 280 | 50.65 | 51.38 |
| 300 | 51.60 | 52.06 |

In summary, supersaturated designs have been found in this paper with $r_{\max} = 1/4$ for $n = 32, 64, 128, 192, 256, \ldots$ runs and $m$ factors where $n + 3 \le m \le 5(n - 4)$. These designs have been shown also to perform very well in terms of $E(s^2)$ for $n = 32$ and $64$, particularly when $m$ is approximately double the number of runs $n$ or greater. The designs are thus worthy of consideration in practical applications.

Acknowledgement

I would like to thank the Editors and Referees for their helpful comments and suggestions.

Appendix

Proof of Theorem 1. First note that

$$\sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \ne i}}^{m} s_{ij}^2 = \operatorname{tr}((X'X)^2) - \sum_{i=1}^{m} s_{ii}^2 = \operatorname{tr}((XX')^2) - mn^2. \tag{6}$$

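Next observe that $X = (X_0, F)$, where $X_0$ is defined by (1) and $F$ is an $n \times r$ matrix with columns from the same Hadamard matrix. Thus

$$\operatorname{tr}((XX')^2) = \operatorname{tr}((X_0X_0' + FF')^2) = \operatorname{tr}((X_0X_0')^2) + 2\operatorname{tr}(X_0X_0'FF') + \operatorname{tr}((FF')^2).$$

Now

$$\begin{aligned}
\operatorname{tr}((X_0X_0')^2) &= \operatorname{tr}((qnI_n - qJ_n)^2) \\
&= q^2n^2\operatorname{tr}(I_n) - 2q^2n\operatorname{tr}(J_n) + q^2n\operatorname{tr}(J_n) \\
&= q^2n^3 - 2q^2n^2 + q^2n^2 = q^2n^2(n - 1) = (m - r)qn^2.
\end{aligned}$$

Also

$$\operatorname{tr}(X_0X_0'FF') = \operatorname{tr}((nqI_n - qJ_n)FF') = nq\operatorname{tr}(FF') = rqn^2,$$

$$\operatorname{tr}((FF')^2) = \operatorname{tr}((F'F)^2) = rn^2.$$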


Hence

$$\operatorname{tr}((XX')^2) = (m - r)qn^2 + 2rqn^2 + rn^2 = [mq + r(q + 1)]\,n^2. \tag{7}$$

The result then follows on substituting into (6).



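Proof of Theorem 2. As in Theorem 1,

$$\sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \ne i}}^{m} s_{ij}^2 = \operatorname{tr}((X'X)^2) - 32^2 m.$$

Also, as factors in separate groups are orthogonal,

$$\operatorname{tr}((X'X)^2) = 4\operatorname{tr}((Z_0'Z_0)^2) + 4\operatorname{tr}((Z_1'Z_1)^2). \tag{8}$$

Now, as $Z_0$ is as $E(s^2)$-efficient as Hadamard constructed designs for 16 runs and $m_0$ factors,

$$\operatorname{tr}((Z_0'Z_0)^2) = \operatorname{tr}((Z_0Z_0')^2) \le [m_0q + r_0(q + 1)]\,16^2. \tag{9}$$

Denote $Z_1 = (L, G)$ where $L = (H_1, \ldots, H_q)$. Then

$$\operatorname{tr}((Z_1'Z_1)^2) = \operatorname{tr}((Z_1Z_1')^2) = \operatorname{tr}((LL' + GG')^2) = \operatorname{tr}((LL')^2) + 2\operatorname{tr}(LL'GG') + \operatorname{tr}((GG')^2).$$

Now

$$LL' = H_1H_1' + \cdots + H_qH_q' = 16qI_{16}, \qquad G'G = 16I_{r_1}.$$

Therefore

$$\begin{aligned}
\operatorname{tr}((Z_1'Z_1)^2) &= \operatorname{tr}((LL')^2) + 2\operatorname{tr}(LL'GG') + \operatorname{tr}((G'G)^2) \\
&= 16^3q^2 + 2 \times 16^2 qr_1 + 16^2 r_1 \\
&= [(m_1 - r_1)q + 2qr_1 + r_1]\,16^2 = [m_1q + r_1(q + 1)]\,16^2.
\end{aligned}$$

Substituting into (8),

$$\operatorname{tr}((X'X)^2) \le [m_0q + r_0(q + 1)]\,32^2 + [m_1q + r_1(q + 1)]\,32^2 = [mq + r(q + 1)]\,32^2.$$

Hence, as $n = 32$,

$$\sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \ne i}}^{m} s_{ij}^2 \le [m(q - 1) + r(q + 1)]\,n^2$$

and so, from (3), the result is proved.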
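The two matrix identities used for $Z_1$ in this proof, and the resulting trace formula, can be confirmed numerically for a particular case. The sketch below uses $q = 2$ and $r_1 = 5$, with $G$ taken as the first five columns of $H_3$; these values, the column choice and the helper name `H` are illustrative assumptions rather than anything prescribed by the paper.

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]])
H0 = np.kron(np.kron(H2, H2), np.kron(H2, H2))
# 1-based rows whose sign is flipped by D^(i) (Section 3).
NEG = {1: (1, 2, 3, 4), 2: (1, 5, 9, 13), 3: (1, 6, 11, 16),
       4: (1, 7, 12, 14), 5: (1, 8, 10, 15)}

def H(i):
    """H_i = D^(i) H_0."""
    d = np.ones(16, dtype=int)
    d[[j - 1 for j in NEG[i]]] = -1
    return d[:, None] * H0

q, r1 = 2, 5
L = np.hstack([H(i) for i in range(1, q + 1)])   # L = (H_1, ..., H_q)
G = H(q + 1)[:, :r1]                             # r1 columns of H_{q+1}
Z1 = np.hstack([L, G])
m1 = Z1.shape[1]                                 # m1 = 16q + r1

LLt_ok = np.array_equal(L @ L.T, 16 * q * np.eye(16, dtype=int))   # LL' = 16q I_16
GtG_ok = np.array_equal(G.T @ G, 16 * np.eye(r1, dtype=int))       # G'G = 16 I_r1

M = Z1.T @ Z1
lhs = int(np.trace(M @ M))                       # tr((Z_1'Z_1)^2)
rhs = (m1 * q + r1 * (q + 1)) * 16 ** 2          # [m1 q + r1 (q + 1)] 16^2
print(LLt_ok, GtG_ok, lhs, rhs)
```

Both identities rely only on each $H_i$ being a Hadamard matrix, so they hold for any choice of the $r_1$ columns of $H_{q+1}$.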



Proof of Theorem 3. As in the proof of Theorem 2, as factors in separate groups are orthogonal,

$$\operatorname{tr}((X'X)^2) = (n/16)^2 \sum_{i=0}^{p-1} \operatorname{tr}((Z_i'Z_i)^2).$$

Now, by the argument used for $Z_1$ in the proof of Theorem 2, for $1 \le i \le p - 1$,

$$\operatorname{tr}((Z_i'Z_i)^2) = [m_iq + r_i(q + 1)]\,16^2 \tag{10}$$

and $\operatorname{tr}((Z_0'Z_0)^2)$ is bounded as in (9). Therefore,

$$\operatorname{tr}((X'X)^2) \le (n/16)^2 \sum_{i=0}^{p-1} [m_iq + r_i(q + 1)]\,16^2 = [mq + r(q + 1)]\,n^2.$$

Then, from (6),

$$\sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \ne i}}^{m} s_{ij}^2 \le [m(q - 1) + r(q + 1)]\,n^2$$

and the proof is complete.



References

Booth, K.H.V., Cox, D.R., 1962. Some systematic supersaturated designs. Technometrics 4, 489–495.
Bulutoglu, D.A., 2007. Cyclically constructed $E(s^2)$-optimal supersaturated designs. J. Statist. Plann. Inference 137, 2413–2428.
Bulutoglu, D.A., Cheng, C.S., 2004. Construction of $E(s^2)$-optimal supersaturated designs. Ann. Statist. 32, 1662–1678.
Butler, N.A., 2005. Minimax 16-run supersaturated designs. Statist. Probab. Lett. 73, 139–145.
Butler, N.A., Eskridge, K.M., Mead, R., Gilmour, S.G., 2001. A general method of constructing $E(s^2)$-optimal supersaturated designs. J. Roy. Statist. Soc. B 63, 621–632.
Cheng, C.S., 1997. $E(s^2)$-optimal supersaturated designs. Statist. Sinica 7, 929–939.
Dey, A., Mukerjee, R., 1999. Fractional Factorial Plans. Wiley, New York.
Eskridge, K.M., Gilmour, S.G., Mead, R., Butler, N.A., Travnicek, D.A., 2004. Large supersaturated designs. J. Statist. Comput. Simulation 74, 525–542.
Koukouvinos, C., Mylona, K., Simos, D.E., 2007. Exploring k-circulant supersaturated designs via genetic algorithms. Comput. Statist. Data Anal. 51, 2958–2968.
Koukouvinos, C., Mylona, K., Simos, D.E., 2008. $E(s^2)$-optimal and minimax optimal supersaturated designs via multi-objective simulated annealing. J. Statist. Plann. Inference 138, 1639–1646.
Lewis, S.M., Dean, A.M., 2001. Detection of interactions in experiments on large numbers of factors (with discussion). J. Roy. Statist. Soc. B 63, 633–672.
Lin, D.K.J., 1993. A new class of supersaturated designs. Technometrics 35, 28–31.
Lin, D.K.J., 1995. Generating systematic supersaturated designs. Technometrics 37, 213–225.
Liu, M.Q., Zhang, R.C., 2000. Construction of $E(s^2)$-optimal supersaturated designs using cyclic BIBDs. J. Statist. Plann. Inference 91, 139–150.
Liu, Y.F., Dean, A., 2004. k-Circulant supersaturated designs. Technometrics 46, 32–43.
Meyer, R.D., Steinberg, D.M., Box, G., 1996. Follow-up designs to resolve confounding in multifactor experiments. Technometrics 38, 303–313.
Nguyen, N.K., 1996. An algorithmic approach to constructing supersaturated designs. Technometrics 38, 69–73.
Nguyen, N.K., Cheng, C.S., 2008. New $E(s^2)$-optimal supersaturated designs obtained from incomplete block designs. Technometrics 50, 26–31.
Plackett, R.L., Burman, J.P., 1946. The design of optimum factorial experiments. Biometrika 33, 303–325.
Ryan, K.J., Bulutoglu, D.A., 2007. $E(s^2)$-optimal supersaturated designs with good minimax properties. J. Statist. Plann. Inference 137, 2250–2262.
Satterthwaite, F.E., 1959. Random balance experimentation (with discussion). Technometrics 1, 111–137.
Tang, B., Wu, C.F.J., 1997. A method for constructing supersaturated designs and its $E(s^2)$-optimality. Canad. J. Statist. 25, 191–201.
Wu, C.F.J., 1993. Construction of supersaturated designs through partially aliased interactions. Biometrika 80, 661–669.