Journal of Statistical Planning and Inference 12 (1985) 347-352 North-Holland
ON UNIVERSALLY OPTIMAL DESIGNS FOR REGRESSION SETUPS

Olaf KRAFFT
Technical University Aachen, West Germany

Received 31 October 1984; revised manuscript received 11 March 1985
Recommended by F. Pukelsheim
Abstract: The set of optimality criteria to which Kiefer's maximum trace principle is applicable is discussed. Furthermore, for a linear regression model $y(x) = a'f(x)$ let $\xi^*$ be an (approximate) D-optimal design for estimating $a$ by its BLUE. Then $\xi^*$ is universally optimal for estimating $b = Ka$ by its BLUE, where $K$ is a regular matrix determined by $\xi^*$.

AMS Subject Classification: Primary 62K05; Secondary 62J05.
Key words: Universally optimal designs; Regression models.
1. Introduction

An unpleasant phenomenon in the theory of optimal design, as in other applications of optimization theory, is the lack of robustness against the optimality criterion. Some exceptions exist, however; the first example was given by Kiefer (1971): in the block-design setting, balanced block designs have such a robustness property, which was later called universal optimality, cf. Kiefer (1975). His method of proving universal optimality is based on the following result, which is reproduced here only for the full rank case, i.e. for models where the relevant covariance matrices are regular.

Let $\mathcal{B}_n$ be the set of all real nonnegative definite $n \times n$ matrices and $\Phi$ be the set of all functionals $\varphi : \mathcal{B}_n \to (-\infty, \infty]$ satisfying

$\varphi$ is convex,    (1a)

$0 \le b_1 \le b_2$ and $M \in \mathcal{B}_n$ imply $\varphi(b_1 M) \ge \varphi(b_2 M)$,    (1b)

$\varphi$ is orthogonally invariant.    (1c)

Let $\mathcal{C}_n \subset \mathcal{B}_n$ be such that

$\mathcal{C}_n$ contains a $C^*$ which is a multiple of $I_n$    (2a)

and

which maximizes $\operatorname{tr} C$ for $C \in \mathcal{C}_n$.    (2b)
0378-3758/85/$3.30 © 1985, Elsevier Science Publishers B.V. (North-Holland)
O. Krafft / Universally optimal designs
Then it holds that

$\varphi(C^*) \le \varphi(C)$ for all $C \in \mathcal{C}_n$ and all $\varphi \in \Phi$.    (3)
Applications of this method to full rank models seem to have been made mainly for problems of weighing or fractional factorial designs, cf. Cheng (1980) or Mukerjee (1982), and for second-order control processes, cf. Chang and Chiang (1980). In this note we show that the method also works for linear regression models, provided one accepts that the design problem is tied to the question of which parameters of the regression model are to be estimated. Before doing so, we briefly discuss Kiefer's result (1)-(3) in Section 2, restricting ourselves to the full rank case.
2. Some remarks on universal optimality
The known versions of the result (1)-(3) are in all generality carefully analyzed by Pukelsheim (1983), Section 3. We will discuss and illustrate by some examples his Theorem 1. In the special case of the full rank model this theorem says that (1a-c) can be replaced by the (weaker) conditions

$\varphi(M) \ge \varphi((n^{-1} \operatorname{tr} M) I_n)$ for all $M \in \mathcal{B}_n$    (4a)

and

$0 \le b_1 \le b_2$ imply $\varphi(b_1 I_n) \ge \varphi(b_2 I_n)$.    (4b)
(As Sinha and Mukerjee (1982) noticed, one can also replace (1b-c) by

$\varphi(\alpha I_n + \beta 1_{n \times n}) \ge \varphi(a I_n)$ for $a \ge \alpha + \beta$    (5a)

and

$\varphi$ is permutation invariant.)    (5b)
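A more condensed form of (4a-b) is

$M \in \mathcal{B}_n$ and $\operatorname{tr} M \le \operatorname{tr} b I_n$ imply $\varphi(M) \ge \varphi(b I_n)$.    (6)

In fact, from $\operatorname{tr} M \le \operatorname{tr} b I_n = bn$, with $b_1 = n^{-1} \operatorname{tr} M$ and $b_2 = b$ we get from (4a-b)

$\varphi(M) \ge \varphi((n^{-1} \operatorname{tr} M) I_n) \ge \varphi(b I_n)$.

Conversely, since $\operatorname{tr} M = \operatorname{tr}((n^{-1} \operatorname{tr} M) I_n)$, (6) implies (4a) and, since $b_1 \le b_2$ implies $\operatorname{tr} b_1 I_n \le \operatorname{tr} b_2 I_n$, (6) also implies (4b). The following example shows that there is some advantage in weakening condition (6) slightly.

Example 1 (Shah's criterion). Let $a > 0$ be given and let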
$\varphi_S(M) = \begin{cases} \operatorname{tr} M^2 & \text{if } \operatorname{tr} M = a, \\ \infty & \text{otherwise.} \end{cases}$

If $M$ and $b$ are such that $a = \operatorname{tr} M < \operatorname{tr}(b I_n)$, then (6) is not satisfied. But replacing (6) by

$M, b I_n \in \mathcal{C}_n$ and $\operatorname{tr} M \le \operatorname{tr} b I_n$ imply $\varphi(M) \ge \varphi(b I_n)$    (7)

and putting $\mathcal{C}_n = \{M \in \mathcal{B}_n : \operatorname{tr} M = a\}$, (7) holds true, since by the Cauchy-Schwarz inequality we have

$\varphi_S(M) = \operatorname{tr} M^2 \ge n^{-1} (\operatorname{tr} M)^2 = n^{-1} a^2 = n b^2 = \varphi_S(b I_n)$ for all $M, b I_n \in \mathcal{C}_n$.

Accordingly, for a subset $\mathcal{C}_n \subset \mathcal{B}_n$ we define as set $\Phi' = \Phi'(\mathcal{C}_n)$ of optimality criteria all $\varphi : \mathcal{C}_n \to (-\infty, \infty]$ satisfying (7). By an obvious modification of Proposition 1 in Kiefer (1975) one easily sees that $\Phi \subset \Phi'(\mathcal{C}_n)$ for all choices $\mathcal{C}_n \subset \mathcal{B}_n$ and also that $\Phi'$ covers the set of $\varphi$ satisfying (1a, 5a-b). It is even simpler to check that, if $\mathcal{C}_n \subset \mathcal{B}_n$ satisfies (2a-b), then (3) holds true with $\Phi$ replaced by $\Phi'(\mathcal{C}_n)$. Since $\Phi'$ is defined by weaker conditions than those defining $\Phi$, it is not surprising that a proof for $\varphi$ being a member of $\Phi'$ becomes simpler. This shall be illustrated by two further examples.
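The claim that (2a-b) and (7) together give (3) can be spelled out in two steps. The following derivation is only a sketch; it writes $\mathcal{C}_n$ for the subset of matrices in (2a-b), and the scalar $b^*$ with $C^* = b^* I_n$ is notation introduced here for illustration.

```latex
% Sketch: (2a-b) and (7) imply (3) for every \varphi in \Phi'(\mathcal{C}_n).
% b^* is notation introduced here: C^* = b^* I_n by (2a).
\begin{align*}
C \in \mathcal{C}_n
 &\;\Longrightarrow\; \operatorname{tr} C \le \operatorname{tr} C^* = \operatorname{tr}(b^* I_n)
   && \text{by (2b)} \\
 &\;\Longrightarrow\; \varphi(C) \ge \varphi(b^* I_n) = \varphi(C^*)
   && \text{by (7), since } C,\ b^* I_n \in \mathcal{C}_n .
\end{align*}
```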
Example 2 (Elfving's minimaxity criterion, cf. Elfving (1959), Kiefer (1974), p. 871). Let

$\varphi_E(M) = \begin{cases} \max_{1 \le i \le n} (M^{-1})_{ii} & \text{if } M \text{ is regular,} \\ \infty & \text{otherwise,} \end{cases}$

and $\mathcal{C}_n$ be the set of all regular $M \in \mathcal{B}_n$. $\varphi_E$ does not in general satisfy (1c). It does, however, fulfill (1a, 5a-b): Verification of (5a-b) is easy. To check (1a) one can argue that w.r.t. the Loewner ordering $M^{-1}$ is convex, cf. Olkin and Pratt (1958), and $(M^{-1})_{ii}$ is isotonic, that $(M^{-1})_{ii}$ as function of $M^{-1}$ is linear, and that the maximum of convex functions is convex. A direct proof runs as follows: Let $\lambda \in (0, 1)$, $C_1, C_2 \in \mathcal{C}_n$ and $\varphi_E(\lambda C_1 + (1 - \lambda) C_2) = ((\lambda C_1 + (1 - \lambda) C_2)^{-1})_{i_0 i_0}$. If $e$ is the unit vector having one as $i_0$-th component, then

$$
\begin{aligned}
\varphi_E(\lambda C_1 + (1 - \lambda) C_2)
 &= e'(\lambda C_1 + (1 - \lambda) C_2)^{-1} e \\
 &= \max\{2e'y - y'(\lambda C_1 + (1 - \lambda) C_2) y : y \in \mathbb{R}^n\} \\
 &= \max\{\lambda (2e'y - y'C_1 y) + (1 - \lambda)(2e'y - y'C_2 y) : y \in \mathbb{R}^n\} \\
 &\le \lambda \max\{2e'y - y'C_1 y : y \in \mathbb{R}^n\} + (1 - \lambda) \max\{2e'y - y'C_2 y : y \in \mathbb{R}^n\} \\
 &= \lambda e'C_1^{-1} e + (1 - \lambda) e'C_2^{-1} e \\
 &\le \lambda \max_{1 \le i \le n} (C_1^{-1})_{ii} + (1 - \lambda) \max_{1 \le i \le n} (C_2^{-1})_{ii} \\
 &= \lambda \varphi_E(C_1) + (1 - \lambda) \varphi_E(C_2).
\end{aligned}
$$
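Verification of (7) is simpler: From the arithmetic-harmonic means inequality we get

$n^{-1} \operatorname{tr} C \ge (n^{-1} \operatorname{tr} C^{-1})^{-1} = n \Big( \sum_{i=1}^{n} (C^{-1})_{ii} \Big)^{-1} \ge \Big( \max_{1 \le i \le n} (C^{-1})_{ii} \Big)^{-1}$,

so that $b \ge n^{-1} \operatorname{tr} C$ implies

$\varphi_E(b I_n) = b^{-1} \le \varphi_E(C)$.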
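The inequality $\varphi_E(b I_n) = b^{-1} \le \varphi_E(C)$ for $b \ge n^{-1} \operatorname{tr} C$ can be checked numerically. The sketch below uses only the Python standard library; the helper names (`inverse`, `phi_E`, `random_pd`, `check_condition_7`) and the random test family $G'G + I$ are assumptions made for illustration and are not part of the paper.

```python
import random

def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def phi_E(c):
    """Elfving's criterion for a regular matrix: largest diagonal entry of the inverse."""
    inv = inverse(c)
    return max(inv[i][i] for i in range(len(c)))

def random_pd(n, rng):
    """Random positive definite matrix G'G + I (an arbitrary test family)."""
    g = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    return [[sum(g[k][i] * g[k][j] for k in range(n)) + (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def check_condition_7(trials=200, n=3, seed=0):
    """Return True if 1/b <= phi_E(C) in every trial with b >= tr(C)/n."""
    rng = random.Random(seed)
    for _ in range(trials):
        c = random_pd(n, rng)
        # Any b with tr C <= tr(b I_n), i.e. b >= tr(C)/n.
        b = sum(c[i][i] for i in range(n)) / n * rng.uniform(1.0, 2.0)
        if 1.0 / b > phi_E(c) + 1e-12:
            return False
    return True

print(check_condition_7())
```

A random search of this kind cannot prove (7); it only fails to find a counterexample, which is what the arithmetic-harmonic means argument predicts.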
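Example 3 ($\varphi_p$-criteria). Let for $p \ge -1$,

$\varphi_p(M) = \begin{cases} \Big( n^{-1} \sum_{i=1}^{n} (\lambda_i(M))^{-p} \Big)^{1/p} & \text{if } M \text{ is regular,} \\ \infty & \text{otherwise,} \end{cases}$

where $\lambda_i(M)$ are the eigenvalues of $M$. Some effort, similar to that in Example 2, is necessary to verify (1a) for $\varphi_p$. To check (7), one only has to show that

$n^{-1} \sum_{i=1}^{n} \lambda_i(M) \ge \Big( n^{-1} \sum_{i=1}^{n} (\lambda_i(M))^{-p} \Big)^{-1/p}$.

But this is an immediate consequence of the monotonicity of $s$-norms

$m_{\alpha, x}(s) = \Big( \sum_{i=1}^{n} \alpha_i x_i^s \Big)^{1/s}$,

cf. Beckenbach and Bellman (1965), p. 17.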
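Since $\varphi_p$ depends on $M$ only through its eigenvalues, condition (7) can be probed directly on lists of positive eigenvalues. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the case $p = 0$ is handled as the limit $p \to 0$ (an assumption of this sketch, not stated in the text), and the eigenvalue range is arbitrary.

```python
import math
import random

def phi_p(eigs, p):
    """phi_p criterion computed from the positive eigenvalues of a regular M."""
    n = len(eigs)
    if p == 0:
        # Limit p -> 0 (assumption of this sketch): reciprocal geometric mean.
        return math.exp(-sum(math.log(l) for l in eigs) / n)
    return (sum(l ** (-p) for l in eigs) / n) ** (1.0 / p)

def satisfies_7(eigs, p, b, tol=1e-12):
    """Check phi_p(M) >= phi_p(b I_n) = 1/b for M with eigenvalues eigs."""
    return phi_p(eigs, p) >= 1.0 / b - tol

def check(ps=(-1.0, -0.5, 0.0, 1.0, 2.0, 5.0), trials=500, n=4, seed=1):
    """Return True if (7) holds in every random trial for every p in ps."""
    rng = random.Random(seed)
    for _ in range(trials):
        eigs = [rng.uniform(0.1, 10.0) for _ in range(n)]
        # Any b with tr M <= tr(b I_n), i.e. b >= mean of the eigenvalues.
        b = sum(eigs) / n * rng.uniform(1.0, 2.0)
        if not all(satisfies_7(eigs, p, b) for p in ps):
            return False
    return True

print(check())
# Outside the range p >= -1 the inequality can fail, e.g. p = -2:
print(satisfies_7([1.0, 3.0], -2.0, 2.0))
```

The last call illustrates why the restriction $p \ge -1$ matters: for eigenvalues 1 and 3 and $b = 2$, the mean of order 2 exceeds the arithmetic mean, so (7) is violated for $p = -2$.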
3. The linear regression model

Let

$y(x) = a'f(x) + e, \quad x \in X, \quad a' = (a_1, \dots, a_n) \in \mathbb{R}^n$,    (8)

where $f = (f_1, \dots, f_n)'$ is a known $\mathbb{R}^n$-valued function on the experimental region $X$ with compact range $f(X)$ and linearly independent $f_i$, $\mathrm{E}\, e = 0$, $\operatorname{Cov} e = \sigma^2 I_n$. Furthermore, let $\Xi$ be the set of discrete probability measures $\xi$ on $X$ for which $M(\xi) = \int f f' \, d\xi$ is regular. It is well known that for estimating $a$ by its BLUE $\hat{a}$, optimal designs are in general different for different optimality criteria. For instance, for the regression

$y(x) = a_0 + a_1 x + a_2 x^2$    (9)

on $X = [-1, 1]$ the (approximate) D-, A- and E-optimal designs have support $\{-1, 0, 1\}$ with weights $(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})$, $(\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4})$, $(\tfrac{1}{5}, \tfrac{3}{5}, \tfrac{1}{5})$, respectively. If the experimental region $X = [a, b]$ is not symmetric around zero, then in this example the corresponding designs even have different supports. Therefore, in regression models there seems to be no hope for robustness against the optimality criterion. But if one is mainly interested in the form of the function (9), one would have the same information from
the knowledge of $a' = (a_0, a_1, a_2)$ as from that of $b' = (a_0, a_0 - a_1 + a_2, a_0 + a_1 + a_2)$, since there is a 1-1 transformation between the two vectors. If one thus estimates $b$ by its BLUE $\hat{b}$, the situation might change. In fact, we will show that the D-optimum design for estimating $a$ is universally optimal for estimating $b$. (In the block-design setting a similar approach has been used by Pukelsheim (1983), Theorems 5 and 7.)

Let now for the model (8) $\xi^*$ be a D-optimal design for estimating $a$ by its BLUE $\hat{a}$, i.e. $\det M(\xi^*) \ge \det M(\xi)$ for all $\xi \in \Xi$, and let $K$ be any regular $n \times n$ matrix. Then $\xi^*$ is also D-optimal for estimating $b = Ka$ by its BLUE $\hat{b} = K\hat{a}$. It is even universally optimal, if $K$ is chosen in such a way that $K'K$ is a multiple of $M(\xi^*)$, $K'K = \alpha M(\xi^*)$ say. Putting $\mathcal{C}_n = \{(K M^{-1}(\xi) K')^{-1} : \xi \in \Xi\}$ we have $C^* = (K M^{-1}(\xi^*) K')^{-1} = \alpha^{-1} I_n \in \mathcal{C}_n$. Moreover, $\operatorname{tr}(K M^{-1}(\xi) K')^{-1} = \alpha^{-1} \operatorname{tr} M(\xi) M^{-1}(\xi^*)$ is maximized by $\xi^*$, since D-optimality of $\xi^*$ is equivalent to $\operatorname{tr} M(\xi) M^{-1}(\xi^*) \le n$ for all $\xi \in \Xi$, cf. e.g. Pukelsheim (1980), Theorem 8. Thus $\mathcal{C}_n$ satisfies (2a-b). Hence, applying the result of Section 2, we have proved:

Proposition. Let in the regression model (8) $\xi^*$ be a D-optimal design for estimating $a$ by its BLUE $\hat{a}$. Let $K$ be a regular $n \times n$ matrix such that $K'K = \alpha M(\xi^*)$, where $\alpha \in \mathbb{R}_+$. Then $\xi^*$ is $\varphi$-optimal for estimating $b = Ka$ by its BLUE for all $\varphi \in \Phi'(\mathcal{C}_n)$, where $\mathcal{C}_n = \{(K M^{-1}(\xi) K')^{-1} : \xi \in \Xi\}$.

Remark. As concerns the choice of $K$, one can of course always take $K = M^{1/2}(\xi^*)$; in fact the set of all regular $K$ satisfying $K'K = M(\xi^*)$ is identical with $\{Q \Lambda^{1/2} P' : Q \text{ orthogonal}\}$, where $M(\xi^*) = P \Lambda P'$ is the spectral decomposition of $M(\xi^*)$. In settings where $\xi^*$ has a support $\{x_1, \dots, x_n\}$ consisting of exactly $n$ points with weights $p_i$, $1 \le i \le n$, one can simply take $K = (p_i^{1/2} f_j(x_i))_{1 \le i, j \le n}$. Examples for this case are the polynomial regression

$y(x) = \sum_{i=1}^{n} a_i x^{i-1}$
on an interval $[a, b]$ (for $n = 3$ this is the example used at the beginning of this section), trigonometric regression

$y(x) = a_0 + \sum_{s=1}^{q} a_s \cos sx + \sum_{t=1}^{r} b_t \sin tx$
on $[0, 2\pi]$ or the (2, 2)-regression

$y(x) = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1^2 + a_4 x_2^2 + a_5 x_1 x_2$

on the sphere $\{x_1^2 + x_2^2 \le 2\}$.
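The Proposition and the Remark can be illustrated numerically for the quadratic regression (9) on $[-1, 1]$. The sketch below assumes the D-optimal design with equal weights on $\{-1, 0, 1\}$ and takes $K = (p_i^{1/2} f_j(x_i))$, so that $\alpha = 1$ and $C^* = I_3$. The helper names, the two comparison designs labelled "shifted support" and "four points", and the choice of reported criteria (trace of $C$, $\operatorname{tr} C^{-1}$, and Elfving's $\max_i (C^{-1})_{ii}$) are assumptions made for illustration only.

```python
import math

def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def f(x):
    """Regression functions of the quadratic model (9)."""
    return [1.0, x, x * x]

def info_matrix(points, weights):
    """M(xi) = sum_i w_i f(x_i) f(x_i)' for a discrete design xi."""
    return [[sum(w * f(x)[r] * f(x)[c] for x, w in zip(points, weights))
             for c in range(3)] for r in range(3)]

# D-optimal design for (9) on [-1, 1]: equal weights on {-1, 0, 1}.
STAR_POINTS = [-1.0, 0.0, 1.0]
STAR_WEIGHTS = [1 / 3, 1 / 3, 1 / 3]
# K = (p_i^{1/2} f_j(x_i)) as in the Remark, so that K'K = M(xi*).
K = [[math.sqrt(p) * fj for fj in f(x)] for x, p in zip(STAR_POINTS, STAR_WEIGHTS)]
K_INV = inverse(K)

def c_matrix(points, weights):
    """C(xi) = (K M^{-1}(xi) K')^{-1}, computed as (K^{-1})' M(xi) K^{-1}."""
    return matmul(matmul(transpose(K_INV), info_matrix(points, weights)), K_INV)

def criteria(points, weights):
    """Return (tr C, tr C^{-1}, max_i (C^{-1})_ii) for the design xi."""
    c = c_matrix(points, weights)
    c_inv = inverse(c)
    return trace(c), trace(c_inv), max(c_inv[i][i] for i in range(3))

# Comparison designs; the last two are arbitrary illustrative choices.
designs = {
    "xi* (D-optimal for a)": (STAR_POINTS, STAR_WEIGHTS),
    "A-optimal for a": ([-1.0, 0.0, 1.0], [0.25, 0.5, 0.25]),
    "E-optimal for a": ([-1.0, 0.0, 1.0], [0.2, 0.6, 0.2]),
    "shifted support": ([-0.8, 0.1, 0.9], [0.3, 0.4, 0.3]),
    "four points": ([-1.0, -0.5, 0.5, 1.0], [0.25, 0.25, 0.25, 0.25]),
}
for name, (pts, wts) in designs.items():
    tr_c, a_crit, elfving = criteria(pts, wts)
    print(f"{name:24s} tr C = {tr_c:.4f}  tr C^-1 = {a_crit:.4f}  max diag = {elfving:.4f}")
```

With this $K$, the vector $b = Ka$ is, up to the factor $3^{-1/2}$ and the order of its components, the vector $b$ of function values at $-1, 0, 1$ used at the beginning of this section. For every design in the list the trace of $C(\xi)$ should not exceed $n = 3$, and both $\operatorname{tr} C^{-1}$ and $\max_i (C^{-1})_{ii}$ should be smallest at $\xi^*$, in line with the Proposition.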
Acknowledgement

I thank Dr. N. Gaffke for helpful discussions concerning this topic and the coordinating editor and a referee for various remarks which led to several improvements of the paper.
References

Beckenbach, E.F. and R. Bellman (1965). Inequalities. Springer, Berlin.
Chang, Der-Shin and Yuan-Chin Chiang (1980). Designs of $\Phi$-optimal control for second-order processes. Ann. Inst. Statist. Math. 32, 275-281.
Cheng, Ching-Shui (1980). Optimality of some weighing and $2^n$ fractional factorial designs. Ann. Statist. 8, 436-446.
Elfving, G. (1959). Design of linear experiments. In: U. Grenander, Ed., Probability and Statistics, The Harald Cramér Volume. Wiley, New York, 58-74.
Kiefer, J. (1971). The role of symmetry and approximation in exact design optimality. In: S.S. Gupta and J. Yackel, Eds., Statistical Decision Theory and Related Topics. Academic, New York, 109-118.
Kiefer, J. (1974). General equivalence theory for optimum designs (approximate theory). Ann. Statist. 2, 849-879.
Kiefer, J. (1975). Construction and optimality of generalized Youden designs. In: J.N. Srivastava, Ed., A Survey of Statistical Design and Linear Models. North-Holland, Amsterdam, 333-353.
Mukerjee, R. (1982). Universal optimality of fractional factorial plans derivable through orthogonal arrays. Calcutta Statist. Assoc. Bull. 31, 63-68.
Olkin, I. and J.W. Pratt (1958). A multivariate Tchebycheff inequality. Ann. Math. Statist. 29, 226-234.
Pukelsheim, F. (1980). On linear regression designs which maximize information. J. Statist. Plann. Inference 4, 339-364.
Pukelsheim, F. (1983). On optimality properties of simple block designs in the approximate design theory. J. Statist. Plann. Inference 8, 193-208.
Sinha, B.K. and R. Mukerjee (1982). A note on the universal optimality criterion for full rank models. J. Statist. Plann. Inference 7, 97-100.