Journal of Statistical Planning and Inference 15 (1986) 19-27 North-Holland
ON OPTIMAL DESIGNS AND COMPLETE CLASS THEOREMS FOR EXPERIMENTS WITH CONTINUOUS AND DISCRETE FACTORS OF INFLUENCE

W. WIERICH

Fachbereich Mathematik, Freie Universität Berlin, Arnimallee 2-6, 1000 Berlin 33, West Germany

Received 6 August 1985; revised manuscript received 10 January 1986

Recommended by F. Pukelsheim
Abstract: Linear models with one discrete factor (treatment) and several continuous factors are considered. Restriction to product designs, i.e. designs with equal regression design measure for all treatment levels, facilitates the optimal design problem considerably. It is shown that the product designs form an essentially complete class with respect to D-optimality for inference on (i) the treatment effects, (ii) the regression parameters, (iii) the combined parameter vector. These theorems yield characterizations of optimal designs in terms of optimal designs for pure regression setups. Applications of the results to special regression functions are included, as well as open problems concerning A-optimality.

AMS Subject Classification: Primary 62K05; Secondary 62J99.

Key words: Linear models; Qualitative and quantitative factors; Determinantal inequality; Complete class theorems; Product designs; D-optimality; A-optimality; Mixture designs.
1. Outline

Throughout the last decades there has been a large number of research papers on optimal designs both for linear models of the ANOVA type and of the linear regression type. The optimum design problem for more complex linear models which include an ANOVA term and an additive linear regression term, i.e. which include discrete (qualitative) factors of influence as well as continuous (quantitative) ones, has been attacked in recent years by several authors (see e.g. Harville (1975), Lopes-Troya (1982), Kurotschka and Wierich (1984), Wierich (1984)). Linear models with discrete and continuous factors of influence seem to be appropriate in many real-life situations. Different varieties of fertilizer (discrete factor) may be tested in different doses (continuous factor), several brands of tires may be tested with different atmospheric pressures, or some chemical process may be of interest in dependence on ingredients, temperature and atmospheric pressure, etc. The above-mentioned papers give results on exact designs, but the theory of exact designs leads
to difficult combinatorial problems even in moderately high-dimensional models. Recently Kurotschka dealt with the optimum design problem for experiments with qualitative and quantitative factors in terms of generalized designs (Kurotschka (1981)). There the case of complete interaction between discrete and continuous factors has been solved by reducing it in some sense to optimum design problems for the linear regression term, whereas for the case of no interaction between discrete and continuous factors he uses heuristic arguments to restrict attention to the class of product designs, i.e. the class of designs with equal allocation of the quantitative factors for each combination of the levels of the qualitative factors. In this paper the heuristic arguments are replaced by rigorous proofs for the D-optimality criterion; Section 3 includes the complete class theorems, based on two inequalities for matrices and determinants respectively, as well as counterexamples and open questions. Section 4 gives general results on D-optimal designs based on the complete class theorems, which are applied to several examples in Section 5.
2. Preliminary notions

We consider the following linear model:

EX(i, t) = β_i(1) + a'(t)β(2),   (I)

where i ∈ {1, ..., I} =: T(1) denotes the discrete factor level (treatment), t ∈ T(2) ⊂ ℝ^K the continuous factor combination, a' = (a_1, ..., a_K) the vector of known regression functions, β(1) ∈ ℝ^I the unknown vector of treatment effects and β(2) the unknown vector of regression parameters. Observations are assumed to be uncorrelated and to have equal variances. It should be realized that models with more than one discrete factor and complete interaction between those, i.e. models with ∏_{m=1}^L I_m treatment parameters β_{i_1,...,i_L}(1), may be reduced to the above one-discrete-factor case by simply relabelling the parameters. Let f'(i) := (δ_{1i}, ..., δ_{Ii}) denote the vector of Kronecker symbols and g' := (f', a'). Then we get for any generalized design, i.e. for any probability measure δ on T := T(1) × T(2), the following information matrix of δ for β' := (β'(1), β'(2)):

I_β(δ) = E_δ gg' = [ E_δ ff'   E_δ fa' ; E_δ af'   E_δ aa' ] =: [ I11(δ)   I12(δ) ; I'12(δ)   I22(δ) ].   (1)
Here E_δ denotes the integral with respect to δ. (All integrals are assumed to exist and to be finite.) From (1) the information matrices of δ for β(1) and β(2) may be derived:

I_{β(1)}(δ) = I11(δ) − I12(δ) I22^−(δ) I'12(δ),   (2)

I_{β(2)}(δ) = I22(δ) − I'12(δ) I11^−(δ) I12(δ).   (3)
(For any real matrix M a generalized inverse of M is denoted by M^−.) It is well known that for any exact design d the parameter vector β is linearly estimable iff I_β(d) is nonsingular, and that in this case I_β(d) is, up to a constant factor, the inverse of the covariance matrix of the BLUE for β; analogous results hold for β(1) and β(2). Any δ ∈ Δ (the class of generalized designs) can be written as δ = ν ⊗ σ, where ν denotes a probability measure on T(1) and σ a Markov kernel from T(1) to T(2). Thus we get for any δ = ν ⊗ σ ∈ Δ:

I11(ν ⊗ σ) = diag(ν(1), ..., ν(I)) =: I11(ν),   (4)

I'12(ν ⊗ σ) = (ν(1)E_{σ(·,1)}a, ..., ν(I)E_{σ(·,I)}a),   (5)

I22(ν ⊗ σ) = E_δ aa' = E_{σ_ν} aa' = I22(ν ⊗ σ_ν),   (6)

where σ_ν stands for the ν-mixture of σ. ν ⊗ σ_ν is a (generalized) product design, i.e. it belongs to the class

Δ_p := {δ = ν ⊗ μ; ν, μ probability measures on T(1) and T(2) resp.}.

Definition. For j = 1, 2 a design δ* ∈ Δ is called D_{β(j)}-optimal iff it maximizes det I_{β(j)}(δ) within Δ, and it is called D_β-optimal iff it maximizes det I_β(δ) within Δ.
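For a design with finite support, the information matrices (1)-(3) can be computed directly. The following sketch is illustrative only (the function name and the quadratic-regression example design are my own, not from the paper); it builds E_δ gg' and the two Schur complements, taking Moore-Penrose pseudoinverses as generalized inverses.

```python
import numpy as np

def information_matrices(points, weights, a, I):
    """Information matrices of a discrete design delta on T(1) x T(2):
    the full matrix (1) and the Schur complements (2) and (3), with
    generalized inverses taken as Moore-Penrose pseudoinverses."""
    def g(i, t):
        # g'(i, t) = (f'(i), a'(t)), f(i) the i-th unit vector in R^I
        f = np.zeros(I)
        f[i] = 1.0
        return np.concatenate([f, np.atleast_1d(a(t))])

    G = np.array([g(i, t) for i, t in points])
    w = np.asarray(weights, float)
    I_full = G.T @ (w[:, None] * G)                    # E_delta g g'
    I11, I12, I22 = I_full[:I, :I], I_full[:I, I:], I_full[I:, I:]
    I_b1 = I11 - I12 @ np.linalg.pinv(I22) @ I12.T     # eq. (2)
    I_b2 = I22 - I12.T @ np.linalg.pinv(I11) @ I12     # eq. (3)
    return I_full, I_b1, I_b2

# illustrative design: I = 2 treatments, quadratic regression a(t) = (t, t^2)'
pts = [(0, -1.0), (0, 1.0), (1, -1.0), (1, 0.0), (1, 1.0)]
w = [0.2] * 5
Ib, Ib1, Ib2 = information_matrices(pts, w, lambda t: np.array([t, t * t]), 2)
```

On such examples one can also check the block-determinant identity det I_β(δ) = det I22(δ) · det I_{β(1)}(δ), valid whenever I22(δ) is nonsingular.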
3. Complete class theorems

In Section 4 we will see that the design problem for product designs is much easier than for general designs. Therefore it is important to know whether Δ_p forms an essentially complete class with respect to the different optimality criteria, i.e. whether for any δ ∈ Δ there is a δ_p ∈ Δ_p which is 'not worse' than δ. We need the following lemma, the proof of which is due to L. Elsner.

Lemma 1. For e' := (1, ..., 1), any positive definite diagonal matrix A with trace(A) = 1 and any p.s.d. matrix B such that A − B is p.s.d., the following inequality holds:

det(A − B) ≤ det(A) e'(A − B)e.   (7)

Proof. If A − B is singular the inequality is obviously true. For positive definite A − B the matrix C := A^{−1/2}(A − B)A^{−1/2} has positive eigenvalues μ_i not greater than 1. Thus we get

∏_i μ_i = det C ≤ min_i μ_i ≤ (e'A^{1/2}CA^{1/2}e)/(e'A^{1/2}A^{1/2}e) = e'(A − B)e/e'Ae = e'(A − B)e,

which implies det(A − B) = det(A) det(C) ≤ det(A) e'(A − B)e.

Theorem 2. Δ_p is essentially complete with respect to D_{β(1)}-optimality.
Proof. Let ν ⊗ σ be any design with nonsingular information matrix I_{β(1)}(ν ⊗ σ). With

A := I11(ν)   and   B := I12(ν ⊗ σ) I22^−(ν ⊗ σ) I'12(ν ⊗ σ)

we calculate

I_{β(1)}(ν ⊗ σ) = A − B.   (8)

If σ_ν again denotes the ν-mixture of σ and J the I×I-matrix with all entries equal to one, we get I12(ν ⊗ σ_ν) = I11(ν) J I12(ν ⊗ σ), which implies (remember I22(ν ⊗ σ) = I22(ν ⊗ σ_ν)):

I_{β(1)}(ν ⊗ σ_ν) = I11(ν) − I11(ν) J I12(ν ⊗ σ) I22^−(ν ⊗ σ) I'12(ν ⊗ σ) J I11(ν) = A − I11(ν) J B J I11(ν) = A − e'Be (ν(i)ν(j))_{i,j}.

Consequently

det I_{β(1)}(ν ⊗ σ_ν) = det(A)(1 − e'Be) = det(A) e'(A − B)e.   (9)

Now it follows from (7), (8) and (9) that ν ⊗ σ_ν is not worse than ν ⊗ σ:

det I_{β(1)}(ν ⊗ σ) ≤ det I_{β(1)}(ν ⊗ σ_ν).   (10)
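Inequality (10) can be illustrated numerically. In the sketch below all names and the two-treatment kernel are illustrative choices of mine; det I_{β(1)} is computed via (2) and (4)-(6), once for a kernel σ and once for the product design ν ⊗ σ_ν.

```python
import numpy as np

def det_I_beta1(nu, kernel, a):
    """det I_{beta(1)}(nu x sigma) via (2) and (4)-(6); kernel[i] is the
    discrete measure sigma(., i) given as a list of (t, p) pairs."""
    I = len(nu)
    A = np.diag(nu)                                   # I11(nu), eq. (4)
    Ea = np.array([sum(p * a(t) for t, p in kernel[i]) for i in range(I)])
    I12 = A @ Ea                                      # rows nu(i) E_{sigma(.,i)} a'
    I22 = sum(nu[i] * p * np.outer(a(t), a(t))
              for i in range(I) for t, p in kernel[i])   # eq. (6)
    return np.linalg.det(A - I12 @ np.linalg.pinv(I22) @ I12.T)

def mixture(nu, kernel):
    """Replace every sigma(., i) by the nu-mixture sigma_nu (a product design)."""
    pooled = [(t, nu[i] * p) for i in range(len(nu)) for t, p in kernel[i]]
    return [pooled for _ in nu]

# illustrative two-treatment kernel with quadratic regression a(t) = (t, t^2)'
a = lambda t: np.array([t, t * t])
nu = [0.3, 0.7]
sigma = [[(-1.0, 0.5), (0.5, 0.5)], [(0.0, 0.5), (1.0, 0.5)]]
d_sigma = det_I_beta1(nu, sigma, a)
d_mix = det_I_beta1(nu, mixture(nu, sigma), a)
assert d_sigma <= d_mix + 1e-12                       # inequality (10)
```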
Theorem 3. Δ_p is essentially complete with respect to D_β-optimality.

Proof. Let ν ⊗ σ be any design with nonsingular I_β(ν ⊗ σ). Partition (1) and equations (2), (6) and (10) show

0 < det I_β(ν ⊗ σ) = det I22(ν ⊗ σ) det I_{β(1)}(ν ⊗ σ) = det I22(ν ⊗ σ_ν) det I_{β(1)}(ν ⊗ σ) ≤ det I22(ν ⊗ σ_ν) det I_{β(1)}(ν ⊗ σ_ν) = det I_β(ν ⊗ σ_ν).

Theorem 4. Δ_p is essentially complete with respect to D_{β(2)}-optimality.

Proof. For any design ν ⊗ σ the matrix I_{β(2)}(ν ⊗ σ_ν) − I_{β(2)}(ν ⊗ σ) is p.s.d. by Lemma 5 below, since I_{β(2)}(ν ⊗ σ_ν) = Cov_{σ_ν} a and

I_{β(2)}(ν ⊗ σ) = Σ_i ν(i)[E_{σ(·,i)}aa' − E_{σ(·,i)}a E_{σ(·,i)}a']

by means of (3)-(6).
Lemma 5. Let (Ω1, 𝒜1, ν) be a probability space, σ a Markov kernel from (Ω1, 𝒜1) to (Ω2, 𝒜2) and Z a real random vector on (Ω2, 𝒜2) whose covariance matrix with respect to σ_ν is finite. Then

Cov_{σ_ν} Z − ∫ Cov_{σ(·,ω1)} Z ν(dω1)

is p.s.d. The proof follows easily by first looking at the case of a one-dimensional Z and then using the identity y'(Cov Z)y = Var y'Z for all y.
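Lemma 5 is a matrix version of the law of total variance, and for discrete measures it can be checked directly. A minimal sketch with purely illustrative numbers:

```python
import numpy as np

def cov(points, probs):
    """Covariance matrix of a discrete random vector with the given
    support points (rows) and probabilities."""
    pts, p = np.asarray(points, float), np.asarray(probs, float)
    m = p @ pts
    return pts.T @ (p[:, None] * pts) - np.outer(m, m)

# illustrative kernel: two levels of omega_1, each carrying a two-point
# measure for the bivariate vector Z
nu = np.array([0.4, 0.6])
support = [np.array([[0.0, 0.0], [1.0, 2.0]]), np.array([[1.0, 0.0], [0.0, 1.0]])]
probs = [np.array([0.5, 0.5]), np.array([0.25, 0.75])]

# covariance under the mixture sigma_nu minus the nu-average of covariances
mix_pts = np.vstack(support)
mix_p = np.concatenate([nu[i] * probs[i] for i in range(2)])
gap = cov(mix_pts, mix_p) - sum(nu[i] * cov(support[i], probs[i]) for i in range(2))
assert np.all(np.linalg.eigvalsh(gap) >= -1e-12)   # the difference is p.s.d.
```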
4. D-optimal designs

We now obtain characterizations of the various D-optimal designs; these are stated in Theorem 6 below. It turns out that the results of Section 3 enable us to reduce the problem of determining D_β-, D_{β(1)}- and D_{β(2)}-optimal designs in the complex model (I) to the respective problems in a corresponding simple linear regression model (II):

EX(t) = γ + a'(t)β(2),   t ∈ T(2).   (II)
To this purpose we evaluate the relevant information matrices of product designs ν ⊗ μ and their determinants in model (I) (cf. Kurotschka (1981)):

I_β(ν ⊗ μ) = [ I11(ν)   v E_μ a' ; E_μ a v'   E_μ aa' ],   (1a)

with v := (ν(1), ..., ν(I))',

I_{β(1)}(ν ⊗ μ) = I11(ν) − c(μ) vv',   (2a)

with c(μ) := E_μ a' (E_μ aa')^− E_μ a, and

I_{β(2)}(ν ⊗ μ) = E_μ aa' − (E_μ a)(E_μ a') =: M(μ).   (3a)

(Note that M(μ) is the covariance matrix of a with respect to μ and that it is independent of ν.) These equations yield the following determinants:

det I_β(ν ⊗ μ) = det(M(μ)) ∏_{i=1}^I ν(i),   (11)

det I_{β(1)}(ν ⊗ μ) = (1 − c(μ)) ∏_{i=1}^I ν(i).   (12)
For the simple linear regression model (II) and any design μ on T(2) we get the information matrices

I_{(γ,β(2))}(μ) = [ 1   E_μ a' ; E_μ a   E_μ aa' ],   (13)

I_γ(μ) = 1 − c(μ),   (14)

I_{β(2)}(μ) = M(μ),   (15)

and the determinants

det I_{(γ,β(2))}(μ) = det I_{β(2)}(μ) = det M(μ).   (16)
Theorem 6. Set ν*(i) = 1/I for i = 1, ..., I.

(a) If μ* is D_{β(2)}-optimal in the linear regression model (II), i.e. maximizes det M(μ), then ν* ⊗ μ* is D_β-optimal in model (I), and for any probability measure ν on {1, ..., I}, ν ⊗ μ* is D_{β(2)}-optimal in model (I).

(b) If μ* is D_γ-optimal in model (II), i.e. minimizes c(μ), then ν* ⊗ μ* is D_{β(1)}-optimal in model (I).

The proof follows easily from the complete class results in Section 3 by comparing (15) with (3a) and (11), and by comparing (14) with (12).
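The identities (1a)-(3a), (11) and (12) behind Theorem 6 are easy to verify numerically for a concrete product design. In the sketch below all names and the example design are illustrative assumptions of mine, not from the paper.

```python
import numpy as np

def product_design_quantities(nu, pts, p, a):
    """For the product design nu x mu (mu discrete with support pts and
    weights p) return det I_beta, det I_beta(1), det M(mu) and c(mu),
    so that (11) and (12) can be checked directly."""
    w = np.asarray(nu, float)
    q = np.asarray(p, float)
    A = np.array([a(t) for t in pts])
    Ea = q @ A                                   # E_mu a
    Eaa = A.T @ (q[:, None] * A)                 # E_mu aa'
    M = Eaa - np.outer(Ea, Ea)                   # (3a)
    c = Ea @ np.linalg.pinv(Eaa) @ Ea            # c(mu) from (2a)
    I11 = np.diag(w)
    I12 = np.outer(w, Ea)                        # rows nu(i) E_mu a'
    I_full = np.block([[I11, I12], [I12.T, Eaa]])    # (1a)
    I_b1 = I11 - c * np.outer(w, w)              # (2a)
    return np.linalg.det(I_full), np.linalg.det(I_b1), np.linalg.det(M), c

nu = [0.2, 0.3, 0.5]
dI, dI1, dM, c = product_design_quantities(nu, [-1.0, 0.0, 1.0], [0.25, 0.5, 0.25],
                                           lambda t: np.array([t, t * t]))
assert np.isclose(dI, dM * np.prod(nu))          # (11)
assert np.isclose(dI1, (1 - c) * np.prod(nu))    # (12)
```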
5. Applications

We will now specialize the regression part of model (I), i.e. T(2) and the regression functions a, and see what Theorem 6 gives us when a is of polynomial, exponential and trigonometric type.

(a) Treatments with additional polynomial regression. Let T(2) = [a, b] be any finite real interval and a(t) = (t, t², ..., t^{r−1})' for t ∈ T(2). Set t_i := ½ s_i (b − a) + ½(a + b) for i = 1, ..., r, where s_1 = −1, s_r = 1 and s_2, ..., s_{r−1} denote the roots of the derivative of the (r−1)-th Legendre polynomial, and let μ*_1 give mass 1/r to each of the points t_i. From Hoel (1958) and Guest (1958) and the equivariance of the D-criterion for polynomial regression we know that μ*_1 is D_{γ,β(2)}-optimal and D_{β(2)}-optimal in the simple polynomial regression model (II). (For a different method of derivation of these results see Titterington (1975).) Consequently ν* ⊗ μ*_1, where ν*(i) = 1/I for all i, is D_β- and D_{β(2)}-optimal in the complex model (I).

When interest is in β(1) only (or mainly) we look at two different cases:

(α) 0 ∈ [a, b]. For the one-point measure ε_0 on 0 we get c(ε_0) = 0, which shows with respect to (14) that ε_0 is optimal for the intercept γ in model (II) with I_γ(ε_0) = 1. Consequently ν* ⊗ ε_0 is D_{β(1)}-optimal in (I).

(β) 0 ∉ [a, b], i.e. we are faced with an extrapolation problem. With

t_i := ε ½(b − a) cos((i − 1)π/(r − 1)) + ½(b + a)   and   L_i(x) := ∏_{ν≠i} (x − t_ν) / ∏_{ν≠i} (t_i − t_ν)   for i = 1, ..., r,

let μ*_2 denote the probability measure which gives weight

p_i := |L_i(−(b + a)/(b − a))| / Σ_{ν=1}^r |L_ν(−(b + a)/(b − a))|

to the points t_i. (ε is 1 if 0 < a and ε is −1 if b < 0.) From Hoel and Levine (1964) and equivariance considerations we know that μ*_2 is optimal for γ in the polynomial regression setup (II). (Note that Example 4.2.10 in Bandemer and Näther (1980) is incorrect.) Theorem 6 shows that ν* ⊗ μ*_2 is D_{β(1)}-optimal for the model (I). (Note that the important case of exponential instead of polynomial regression may be solved by a one-to-one transformation of the regression variable of the latter one.)

(b) Treatments with additional trigonometric regression. T(2) := [0, 2π], a(t) = (cos t, sin t, cos 2t, ..., sin kt)' for all t. If μ*_3 gives equal weight (2k + 1)^{−1} to each of the points t_i := 2π(i − 1)/(2k + 1) for i = 1, ..., 2k + 1, then μ*_3 is D_{γ,β(2)}- and D_{β(2)}-optimal in the trigonometric regression model (II) (see e.g. Fedorov (1972), Chapter 2.4). Thus in model (I) ν* ⊗ μ*_3 is D_β- and D_{β(2)}-optimal. Since c(μ*_3) (defined in (2a)) is zero, equality (14) shows that μ*_3 is also optimal for γ in the trigonometric regression setup (II). Now Theorem 6 proves D_{β(1)}-optimality of ν* ⊗ μ*_3 for model (I) with additional treatment effects.

(c) Treatments with additional multivariate linear regression on the unit q-cube. Let T(2) := {t ∈ ℝ^q; max_i |t_i| ≤ 1} and a(t) := t for all t ∈ T(2). If μ*_4 gives equal weight 2^{−q} to the corners of T(2), then μ*_4 is D_{γ,β(2)}- and D_{β(2)}-optimal in the multivariate linear regression model (II) (see e.g. Kiefer (1961)). Since c(μ*_4) is zero, (14) proves that μ*_4 is also D_γ-optimal in (II). Consequently ν* ⊗ μ*_4 is D_{β(1)}-, D_{β(2)}- and D_β-optimal for model (I).
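The support points of example (a) and the vanishing of c(μ) in example (b) can be computed with standard tools. A sketch assuming numpy's Legendre class (the function name and the concrete choices r = 4, k = 2 are mine):

```python
import numpy as np
from numpy.polynomial import Legendre

def hoel_guest_support(r, lo, hi):
    """Support of the D-optimal design for polynomial regression of degree
    r-1 on [lo, hi]: the endpoints plus the roots of the derivative of the
    (r-1)-th Legendre polynomial, mapped affinely to [lo, hi]; in example (a)
    each point carries mass 1/r."""
    s = Legendre.basis(r - 1).deriv().roots().real
    s = np.concatenate([[-1.0], np.sort(s), [1.0]])
    return 0.5 * s * (hi - lo) + 0.5 * (lo + hi)

# cubic regression (r = 4) on [-1, 1]: support -1, -1/sqrt(5), 1/sqrt(5), 1
support = hoel_guest_support(4, -1.0, 1.0)

# trigonometric design of example (b): equal weights on t_i = 2*pi*(i-1)/(2k+1)
k = 2
ts = 2 * np.pi * np.arange(2 * k + 1) / (2 * k + 1)
A = np.column_stack([f(m * ts) for m in range(1, k + 1) for f in (np.cos, np.sin)])
Ea = A.mean(axis=0)        # E_mu a vanishes, hence c(mu*) = 0 in (2a)
```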
6. Further criteria and unsolved problems

Two additional optimality criteria of statistical importance are U-optimality and A-optimality. For j = 1, 2 a design δ* ∈ Δ is called U_{β(j)}-optimal iff I_{β(j)}(δ*) − I_{β(j)}(δ) is positive semidefinite (p.s.d.) for any δ ∈ Δ with nonsingular I_{β(j)}(δ); it is called A_{β(j)}-optimal iff it minimizes trace I_{β(j)}^{−1}(δ) among all δ with nonsingular I_{β(j)}(δ). U_β- and A_β-optimality are defined analogously.

In Theorem 4 we actually proved that Δ_p is essentially complete with respect to U_{β(2)}-optimality, which implies essential completeness of Δ_p with respect to D_{β(2)}- and A_{β(2)}-optimality.
The following example shows that in general Δ_p is not essentially complete with respect to U_β-optimality.

Example 1. For K = 1 and a(t) := t the model becomes

EX(i, t) = β_i(1) + tβ(2).

For any design δ = ν ⊗ σ with ν(i) > 0 for all i and E_{σ(·,i)}a ≠ E_{σ(·,j)}a for at least one pair (i, j) one can show that there is no product design δ* with p.s.d. I_β(δ*) − I_β(δ).

To prove Theorems 2, 3 and 4 we used a simple procedure: substitute ν ⊗ σ by ν ⊗ σ_ν. Unfortunately this procedure does not work for A_β- and A_{β(1)}-optimality. One might then try to find the product design ν ⊗ σ(·, i_0) not worse than ν ⊗ σ, where σ(·, i_0) is the 'best' of the σ(·, i), i = 1, ..., I. But this procedure too does not work for A_β- and A_{β(1)}-optimality. This can all be seen from the following example.
Example 2. Take I = 2, T(2) = [−1, 1], a(t) = t for all t ∈ T(2), and let σ(·, 1) give mass 0.5 to each of the points 0 and 0.5 and σ(·, 2) give mass 0.5 to each of the points 0 and 1. With ν_1(1) = 1/10, ν_1(2) = 9/10 and ν_2(1) = 1/4, ν_2(2) = 3/4 elementary calculations yield

trace I_β^{−1}(ν_1 ⊗ σ) = 16.8 < 17.2 = trace I_β^{−1}(ν_1 ⊗ σ_{ν_1}),

trace I_{β(1)}^{−1}(ν_2 ⊗ σ) = 6.9 < 7.1 = trace I_{β(1)}^{−1}(ν_2 ⊗ σ_{ν_2}).

For β as well as for β(1) the mixture is worse than σ with respect to the trace. For the second procedure we see:

trace I_β^{−1}(ν_1 ⊗ σ) = 16.8 < 17.1 = trace I_β^{−1}(ν_1 ⊗ σ(·, 2)) < 29.1 = trace I_β^{−1}(ν_1 ⊗ σ(·, 1)),

trace I_{β(1)}^{−1}(ν_2 ⊗ σ) = 6.9 < 7.1 = trace I_{β(1)}^{−1}(ν_2 ⊗ σ(·, 1)) = trace I_{β(1)}^{−1}(ν_2 ⊗ σ(·, 2)).

So it may be worse with respect to the trace to use one of the measures σ(·, i) instead of the kernel. Nevertheless I conjecture that the product designs constitute an essentially complete class also with respect to A_β- and A_{β(1)}-optimality.
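The elementary calculations of Example 2 can be reproduced. The following sketch (naming is mine) builds I_β for the model EX(i, t) = β_i(1) + tβ(2) and confirms the first pair of traces:

```python
import numpy as np

def I_beta(nu, kernel):
    """I_beta(nu x sigma) for I = 2 and a(t) = t, i.e. the model
    EX(i, t) = beta_i(1) + t*beta(2); kernel[i] is a list of (t, p) pairs."""
    m = [sum(p * t for t, p in kernel[i]) for i in range(2)]   # E_{sigma(.,i)} t
    s = sum(nu[i] * p * t * t for i in range(2) for t, p in kernel[i])
    return np.array([[nu[0], 0.0, nu[0] * m[0]],
                     [0.0, nu[1], nu[1] * m[1]],
                     [nu[0] * m[0], nu[1] * m[1], s]])

sigma = [[(0.0, 0.5), (0.5, 0.5)], [(0.0, 0.5), (1.0, 0.5)]]
nu1 = [0.1, 0.9]
# the nu1-mixture sigma_{nu1} uses the pooled measure at both treatment levels
pool = [(t, nu1[i] * p) for i in range(2) for t, p in sigma[i]]
t_sigma = np.trace(np.linalg.inv(I_beta(nu1, sigma)))
t_mix = np.trace(np.linalg.inv(I_beta(nu1, [pool, pool])))
# 16.8 < 17.2: the mixture is worse with respect to the A_beta criterion
print(round(t_sigma, 1), round(t_mix, 1))
```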
Acknowledgement I would like to thank the referees for some suggestions which hopefully made the paper more streamlined.
References

Bandemer, H. and W. Näther (1980). Theorie und Anwendung der optimalen Versuchsplanung II. Akademie-Verlag, Berlin.

Fedorov, V.V. (1972). Theory of Optimal Experiments. Academic Press, New York-London.

Guest, P.G. (1958). The spacing of observations in polynomial regression. Ann. Math. Statist. 29, 294-299.

Harville, D.A. (1975). Computing optimum designs for covariance models. In: J.N. Srivastava, Ed., A Survey of Statistical Design and Linear Models. North-Holland, Amsterdam.

Hoel, P.G. (1958). Efficiency problems in polynomial estimation. Ann. Math. Statist. 29, 1134-1146.

Hoel, P.G. and A. Levine (1964). Optimal spacing and weighting in polynomial prediction. Ann. Math. Statist. 35, 1553-1560.

Kiefer, J. (1961). Optimum designs in regression problems, II. Ann. Math. Statist. 32, 298-325.

Kurotschka, V. (1981). A general approach to optimum design of experiments with qualitative and quantitative factors. In: Proc. of I.S.I. Golden Jubilee Intern. Conf. on Stat.: Appl. and New Directions, Calcutta, 353-368.

Kurotschka, V. and W. Wierich (1984). Optimale Planung eines Kovarianzanalyse- und eines Intraclass Regressions-Experiments. Metrika 31, 361-378.

Lopes-Troya, J. (1982). Optimal design for covariates models. J. Statist. Plann. Inference 6, 373-419.

Titterington, D.M. (1975). Optimal design: Some geometrical aspects of D-optimality. Biometrika 62, 313-320.

Wierich, W. (1984). Konkrete optimale Versuchspläne für ein lineares Modell mit einem qualitativen und zwei quantitativen Einflußfaktoren. Metrika 31, 285-301.