Journal of Statistical Planning and Inference 52 (1996) 225-240
Pre-test estimation and design in the linear model

Norbert Benda*

Institut für Medizinische Biometrie, Universität Tübingen, Westbahnhofstraße 55, D-72070 Tübingen, Germany

Received 7 April 1994; revised 24 June 1995
Abstract
As a robust method against model deviation we consider a pre-test estimation function. To optimize a continuous design for this problem we give an asymptotic risk matrix for the quadratic loss. The risk will then be given by an isotonic criterion function of the asymptotic risk matrix. As an optimization criterion we look for a design that minimizes the maximal risk in the deviation model under the restriction that the risk in the original model does not exceed a given bound. This optimization problem will be solved for polynomial regression, the deviation consisting in one additional regression function and the criterion function being the determinant.

AMS classification: Primary 62K05; secondary 62J05

Keywords: Linear model; Model deviation; Pre-test estimation function; Experimental design; Polynomial regression
1. Introduction
In the linear model

X(t_i) = a^(1)(t_i)^T β + Z_i,   t_i ∈ T,   1 ≤ i ≤ N,      (I)
where the unobservable random errors Z_i are assumed to be independent and normally distributed with expectation 0 and unknown variance σ², the unknown parameter vector β ∈ ℝ^p is usually estimated by the Gauss-Markov estimate (GME), the best linear unbiased estimate, β̂*_I. For a number of regression functions a^(1) = (a_1, ..., a_p)^T: T → ℝ^p the optimal design is known, i.e. the optimal choice of the t_i ∈ T, which minimizes a given real-valued function (for instance the determinant) of the covariance matrix of the estimate. Nevertheless, in many practical situations it might be possible to misspecify the response function of the experiment, μ = a^(1)T β, and to neglect further

* Tel.: 07071/293403; fax: 07071/440183; e-mail: [email protected]. This work was completed during the author's residence at the Institute of Mathematics of the Free University of Berlin.
regression functions of the form a^(2) = (a_{p+1}, ..., a_{p+q})^T. So a possible extended model could take the form

X(t_i) = a^(1)(t_i)^T β + a^(2)(t_i)^T γ + Z_i,   t_i ∈ T,   1 ≤ i ≤ N,      (II)
where γ ∈ ℝ^q. In the general case, where the GME of the first model, β̂*_I, differs from the GME of the extended model, β̂*_II, and β̂*_I has an unbounded bias, other methods for the estimation of β and other criteria for a good design should be investigated. A possible approach could discriminate with a first design between the two rival models and, after a decision for one model, estimate with the optimal design of this model. For the discrimination the design problem was investigated, for instance, by Atkinson and Fedorov (1975) and Fedorov and Khabarov (1986). Generally, the optimal design for discrimination differs completely from the optimal design for estimation. More reasonable might be to discriminate and estimate simultaneously, i.e. with the same design, which should be appropriate for this procedure. Therefore, this paper deals with estimation functions which are conditioned by a statistical test of the hypothesis γ = 0. Such methods are sometimes called testimators or pre-test estimators and were already applied in linear models (see e.g. Judge and Bock, 1978; Bock et al., 1973; Giles, 1991 or Boscher, 1991) and also in different outlier models (see e.g. Gather, 1986). The aim of this paper is to study and to optimize designs for such a procedure. The pre-test estimation function uses either the GME of the first or the GME of the second model, conditioned by the result of the F-test of the hypothesis γ = 0. That defines a set of estimation functions, parametrized by the critical test value c. Every estimation function will be judged by the mean square error risk matrix (MSE). This risk matrix for the pre-test estimation function was already calculated by Judge and Bock (1978). Due to the special properties of the F-test the usual transition to generalized designs (i.e. probability measures on T) is not possible for the pre-test estimation function. Therefore, we consider the asymptotic risk matrix R_as(γ; c, δ), which can be evaluated for any generalized design δ with a positive information matrix.
For a given criterion function Φ the risk cannot be minimized uniformly in γ. Hence, we choose the following criterion for an optimal design for the pre-test estimation: the design δ together with the critical test value c should be determined such that Φ(R_as(0; c, δ)) does not exceed a given bound and max_γ Φ(R_as(γ; c, δ)) is minimized under this restriction. Theorem 1 will lead us to the restriction to those δ and c for which the bound for γ = 0 is reached. For the polynomial regression on [−1, 1], q = 1 and Φ = det this fact allows explicit evaluation rules for optimal designs, because it can also be shown that the set of all symmetrical designs with a support of p + 1 points including −1 and 1 is an essentially complete class in the sense of the criterion mentioned above. For such designs the general optimization problem can be reduced to a numerically easily solvable, (p − 1)-dimensional problem. As an example we illustrate in Section 5 the case p = 2.
2. The estimators and their risks
Let D_N be the set of all N-observation designs on T, Δ the set of all generalized designs for a_1, ..., a_{p+q} on T and S(δ) the support of the generalized design δ. For the N-observation design

d_N = (   t_1    ...    t_J   )
      ( N(t_1)  ...  N(t_J) )

and

A^(i)_{d_N} := ( 1_{N(t_1)×1} a^(i)(t_1)^T )
               (            ...            )      i = 1, 2,
               ( 1_{N(t_J)×1} a^(i)(t_J)^T ),

A_{d_N} := (A^(1)_{d_N}, A^(2)_{d_N}),
we have the models (I) and (II) for d_N ∈ D_N:

X_{d_N} = A^(1)_{d_N} β + Z_{d_N},      (I)

X_{d_N} = A_{d_N} (β^T, γ^T)^T + Z_{d_N},      (II)

with

X_{d_N} := (X_1(t_1), ..., X_{N(t_1)}(t_1), ..., X_1(t_J), ..., X_{N(t_J)}(t_J))^T,
E(Z_{d_N}) = 0,   Cov(Z_{d_N}) = σ² E_N.
Let a := (a^(1)T, a^(2)T)^T. The regression function a is assumed to be continuous on T and T is assumed to be compact. For d_N ∈ D_N and rg(A_{d_N}) = p + q the Gauss-Markov estimates for β in model (I) and model (II), respectively, are

β̂*_{I,d_N}(X_{d_N}) = (A^(1)T_{d_N} A^(1)_{d_N})^{-1} A^(1)T_{d_N} X_{d_N},

β̂*_{II,d_N}(X_{d_N}) = (E_p, 0)(A^T_{d_N} A_{d_N})^{-1} A^T_{d_N} X_{d_N}.

For c ∈ [0, ∞], Z_{d_N} ~ N(0, σ² E_N) and the N-observation design d_N the pre-test estimation function for β is given by

β̂_{c,d_N}(X_{d_N}) := β̂*_{I,d_N}(X_{d_N}) 1_{[0,c)}(U_{d_N}) + β̂*_{II,d_N}(X_{d_N}) 1_{[c,∞]}(U_{d_N})

with

U_{d_N} := [(N − rg(A_{d_N}))/q] · X^T_{d_N}(pr^⊥(A^(1)_{d_N}) − pr^⊥(A_{d_N})) X_{d_N} / (X^T_{d_N} pr^⊥(A_{d_N}) X_{d_N}),
where pr^⊥(A) = E_N − pr(A) and pr(A) = A(A^T A)^{-1} A^T is the orthogonal projector onto the range of A. For any estimation function β̂ for β the risk matrix with respect to quadratic loss is given by

R_{β̂}(γ) := E((β̂ − β)(β̂ − β)^T).
The expectation and the risk of the pre-test estimation function were calculated by Judge and Bock (1978):

E(β̂_{c,d_N}) = β + k₁ · I^(I)(d_N)^{-1} (1/N) A^(1)T_{d_N} A^(2)_{d_N} γ,      (1)

R_{β̂_{c,d_N}}(γ) = (σ²/N) I^(II)_β(d_N)^{-1} − k₁ (σ²/N) D(d_N)
    + (2k₁ − k₂) I^(I)(d_N)^{-1} ((1/N) A^(1)T_{d_N} A^(2)_{d_N}) γγ^T ((1/N) A^(2)T_{d_N} A^(1)_{d_N}) I^(I)(d_N)^{-1},      (2)

with

k_i = F_{q+2i, N−p−q; λ²}(cq/(q + 2i)),   i = 1, 2,

λ² = (1/(2σ²)) γ^T A^(2)T_{d_N} pr^⊥(A^(1)_{d_N}) A^(2)_{d_N} γ,

D(d_N) = I^(II)_β(d_N)^{-1} − I^(I)(d_N)^{-1},

I^(II)_β(d_N) = (1/N) ((E_p, 0)(A^T_{d_N} A_{d_N})^{-1} (E_p, 0)^T)^{-1},

I^(I)(d_N) = (1/N) A^(1)T_{d_N} A^(1)_{d_N},

where F_{m,n;λ²} is the cumulative distribution function of the noncentral F-distribution with (m, n) degrees of freedom and noncentrality parameter λ². Especially, if c tends to infinity, we get the risk of β̂*_{I,d_N}, i.e.

R_{β̂*_{I,d_N}}(γ) = (σ²/N) I^(I)(d_N)^{-1} + I^(I)(d_N)^{-1} ((1/N) A^(1)T_{d_N} A^(2)_{d_N}) γγ^T ((1/N) A^(2)T_{d_N} A^(1)_{d_N}) I^(I)(d_N)^{-1},      (3)

which is unbounded in γ if A^(1)T_{d_N} A^(2)_{d_N} ≠ 0. The case A^(1)T_{d_N} A^(2)_{d_N} = 0 is not of interest for further considerations, because it yields the equality of the GMEs of the two models and of the pre-test estimation function.
3. The asymptotic risk
In the classical theory of optimal design the covariance matrix of the Gauss-Markov estimate can easily be generalized to the class of all continuous designs. This cannot be translated to the risk matrix of the pre-test estimation function, because of the special properties of the F-test. Therefore, we give in Lemma 1 an asymptotic risk, which can be calculated for all generalized designs.
For δ ∈ Δ, let I^(I)(δ) be the information matrix in model (I) and I^(II)_β(δ), I^(II)_γ(δ) the information matrices in model (II) for estimating β and γ, respectively, i.e.

I^(I)(δ) = ∫ a^(1)(t) a^(1)(t)^T δ(dt),

I^(II)_β(δ) = ((E_p, 0) (∫ a(t) a(t)^T δ(dt))^{-1} (E_p, 0)^T)^{-1},

I^(II)_γ(δ) = ((0, E_q) (∫ a(t) a(t)^T δ(dt))^{-1} (0, E_q)^T)^{-1}.
Lemma 1. If the regression functions a_i, 1 ≤ i ≤ p + q, are continuous and bounded on T and {d_N}_{N∈ℕ} is a sequence of N-observation designs the measures of which converge weakly to δ ∈ Δ for N → ∞, then the matrix N·R_{β̂_{c,d_N}}(γN^{-1/2}) converges for N → ∞ to

σ² I^(II)_β(δ)^{-1} − p₁ σ² D(δ) + (2p₁ − p₂) I^(I)(δ)^{-1} ∫ a^(1)(t) a^(2)(t)^T δ(dt) γγ^T ∫ a^(2)(t) a^(1)(t)^T δ(dt) I^(I)(δ)^{-1}

with

D(δ) = I^(II)_β(δ)^{-1} − I^(I)(δ)^{-1},   p_i = G_{q+2i;λ²}(cq),   i = 1, 2,

and

λ² = (1/(2σ²)) γ^T I^(II)_γ(δ) γ,
where G_{q+2i;λ²} is the cumulative distribution function of the noncentral χ² distribution with q + 2i degrees of freedom and noncentrality parameter λ².

Proof. Let δ_N be the measure corresponding to d_N. Then the risk given in (2) can be rewritten as

N·R_{β̂_{c,d_N}}(γN^{-1/2}) = σ² I^(II)_β(δ_N)^{-1} − k₁^(N) σ² D(δ_N)
    + (2k₁^(N) − k₂^(N)) I^(I)(δ_N)^{-1} ∫ a^(1)(t) a^(2)(t)^T δ_N(dt) γγ^T ∫ a^(2)(t) a^(1)(t)^T δ_N(dt) I^(I)(δ_N)^{-1}

with k_i^(N) = F_{q+2i, N−p−q; λ²_N}(cq/(q + 2i)) and λ²_N = (1/(2σ²)) γ^T (1/N) A^(2)T_{d_N} pr^⊥(A^(1)_{d_N}) A^(2)_{d_N} γ.
For Y_{q+2i, N−p−q; s²} ~ F_{q+2i, N−p−q}(s²) and Z_{q+2i; s²} ~ χ²_{q+2i}(s²) the random variable Y_{q+2i, N−p−q; s²} converges weakly to (1/(q + 2i)) Z_{q+2i; s²} (see Johnson and Kotz, 1970, p. 193). Since the regression functions are continuous and bounded, we have λ²_N → λ² = (1/(2σ²)) γ^T I^(II)_γ(δ) γ, and we get

lim_{N→∞} P(Y_{q+2i, N−p−q; λ²_N} ≤ cq/(q + 2i)) = P(Z_{q+2i; λ²} ≤ cq),

which completes the proof. □
With the help of this lemma we can define the asymptotic risk matrix for all generalized designs.
Definition 1. For δ ∈ Δ, c ∈ [0, ∞], and any sequence {d_N}_{N∈ℕ} of N-observation designs the measures of which converge weakly to δ, the asymptotic risk matrix of the pre-test estimation function with critical test value c for the design δ and the nuisance parameter γ is given by

R_as(γ; c, δ) := lim_{N→∞} (N · R_{β̂_{c,d_N}}(γN^{-1/2})).

By Lemma 1, R_as(γ; c, δ) is independent of the choice of the sequence {d_N}. For q = 1 we get after a few rearrangements a quite convenient formula for the asymptotic risk matrix.
Corollary 1. For q = 1,

R_as(γ; c, δ) = σ² { I^(I)(δ)^{-1} + f_c((γ²/(2σ²)) I^(II)_γ(δ)) D(δ) },

with

f_c(x) = 1 − p₁(c, x) + (2p₁(c, x) − p₂(c, x)) 2x,   p_i(c, x) = G_{1+2i; x}(c),   i = 1, 2.
For the function f_c we need the following properties.
Lemma 2.
(i) f₀(x) = 1, ∀x ∈ [0, ∞).
(ii) lim_{c→∞} f_c(x) = 2x, ∀x ∈ [0, ∞).
(iii) lim_{x→∞} f_c(x) = 1, ∀c ∈ [0, ∞).
(iv) f_{c₁}(0) > f_{c₂}(0) if 0 ≤ c₁ < c₂.
(v) m(c) := max_{x≥0} f_c(x) > 1, ∀c > 0.
(vi) max_{x≥0} f_{c₁}(x) < max_{x≥0} f_{c₂}(x) if 0 ≤ c₁ < c₂.
(vii) f_c(x) ≥ f_c(0), ∀x ≥ 0.
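The function f_c can be evaluated directly, and the properties of Lemma 2 spot-checked numerically. The sketch below (not from the paper) computes the noncentral χ² cdf as a Poisson mixture of central cdfs, using the noncentrality convention of Lemma A.1 in the appendix; grids and tolerances are ad hoc choices.

```python
# Illustrative sketch (not from the paper): evaluate f_c(x) from Corollary 1 and
# spot-check properties (i), (ii), (v), (vii) of Lemma 2.
import math

def chi2_cdf(df, z):
    """Central chi-square cdf: regularized lower incomplete gamma P(df/2, z/2)."""
    if z <= 0:
        return 0.0
    s, half = df / 2.0, z / 2.0
    term = math.exp(s * math.log(half) - half - math.lgamma(s + 1.0))
    total, k = term, 0
    while term > 1e-16 * total:
        k += 1
        term *= half / (s + k)
        total += term
    return min(total, 1.0)

def noncentral_chi2_cdf(df, ncp, z):
    """G_{df;ncp}(z) = sum_j e^{-ncp/2} (ncp/2)^j / j! * chi2_cdf(df + 2j, z)."""
    total, weight, acc, j = 0.0, math.exp(-ncp / 2.0), 0.0, 0
    while acc < 1.0 - 1e-13:
        total += weight * chi2_cdf(df + 2 * j, z)
        acc += weight
        j += 1
        weight *= (ncp / 2.0) / j
    return total

def f(c, x):
    """f_c(x) = 1 - p1 + (2 p1 - p2) 2x with p_i = G_{1+2i;x}(c) (Corollary 1, q = 1)."""
    p1 = noncentral_chi2_cdf(3, x, c)
    p2 = noncentral_chi2_cdf(5, x, c)
    return 1.0 - p1 + (2.0 * p1 - p2) * 2.0 * x

assert abs(f(0.0, 2.5) - 1.0) < 1e-12                   # (i):  f_0(x) = 1
assert abs(f(60.0, 1.0) - 2.0) < 1e-3                   # (ii): f_c(x) -> 2x
assert f(3.0, 1.5) >= f(3.0, 0.0)                       # (vii): f_c(x) >= f_c(0)
assert max(f(2.0, 0.1 * k) for k in range(150)) > 1.0   # (v): m(c) > 1 for c > 0
```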
The properties (i)-(iv) and (vii) are easy to see, but (vi) requires a quite lengthy and technical proof. Therefore, the proof of Lemma 2 is given in the appendix. The properties (iv) and (vi) of Lemma 2 lead to the fact that, for every isotonic functional, i.e. for every Φ: ℝ^{p×p} → ℝ with Φ(A) ≥ Φ(B) for A ≥ B ≥ 0, the risk for γ = 0 becomes greater if and only if the maximal risk becomes smaller.

Lemma 3. For c₁, c₂ ≥ 0, q = 1, δ ∈ Δ and any isotonic functional Φ: ℝ^{p×p} → ℝ the following statements are equivalent:
(i) c₁ ≤ c₂.
(ii) R_as(0; c₁, δ) ≥ R_as(0; c₂, δ).
(iii) max_{γ∈ℝ} Φ(R_as(γ; c₁, δ)) ≤ max_{γ∈ℝ} Φ(R_as(γ; c₂, δ)).

Proof. Using Corollary 1, property (iv) of Lemma 2 leads directly to (i) ⇔ (ii). The equivalence (i) ⇔ (iii) is straightforward from Lemma 2(vi), since I^(I)(δ) > 0, D(δ) ≥ 0 and I^(II)_γ(δ) > 0. □
4. Optimality of design

In the following we will assume q = 1 and Φ: ℝ^{p×p} → ℝ to be isotonic and continuous. Let

Δ' := {δ ∈ Δ; I^(II)_γ(δ) > 0},

m_Φ := inf_{δ∈Δ'} Φ(σ² I^(I)(δ)^{-1}),

M_Φ := inf_{δ∈Δ'} Φ(σ² I^(II)_β(δ)^{-1}),

and for η > 0,

Δ(η, Φ) := {δ ∈ Δ'; Φ(σ² I^(I)(δ)^{-1}) ≤ (1 + η) m_Φ}.

Lemma 4 will show that Δ(η, Φ) is the class of all designs for which the risk in model (I) does not exceed (1 + η)m_Φ if we choose a suitable critical test value c. This will be the class of our interest. Instead of the unbounded risk of the optimal procedure in model (I) we get, for any deviation ηm_Φ from the minimal risk in model (I) (i.e. from m_Φ), an estimation procedure with a bounded risk, and design and critical test value can be chosen to minimize the maximal risk.
Lemma 4. δ ∈ Δ(η, Φ) ⇔ ∃c ∈ [0, ∞] with Φ(R_as(0; c, δ)) ≤ (1 + η)m_Φ.

Proof. Lemma 2(ii) leads to lim_{c→∞} f_c(0) = 0. Because of f_c(0) ≥ 0, ∀c ∈ [0, ∞), and Corollary 1 we get

inf_{c∈[0,∞]} Φ(R_as(0; c, δ)) = Φ(σ² I^(I)(δ)^{-1}) = lim_{c→∞} Φ(R_as(0; c, δ)),

which leads to Lemma 4. □
Under the condition that the risk for γ = 0 (i.e. for the model (I)) does not differ by more than 100η% from the minimal risk, we want to minimize the maximal risk.
Definition 2. Let S(η, Φ) := {(δ, c); δ ∈ Δ(η, Φ), c ≥ 0 and Φ(R_as(0; c, δ)) ≤ (1 + η)m_Φ}. Then a design δ* ∈ Δ(η, Φ) is called asymptotically-Φ-η-optimal if there exists a c* with (δ*, c*) ∈ S(η, Φ) and

max_{γ∈ℝ} Φ(R_as(γ; c*, δ*)) = min_{(δ,c)∈S(η,Φ)} max_{γ∈ℝ} Φ(R_as(γ; c, δ)).
Due to Lemma 3, and hence by property (vi) of the function f_c in Lemma 2, we can restrict ourselves to all designs and critical test values for which in model (I) the given bound (1 + η)m_Φ is reached:
Theorem 1. For 0 < η ≤ M_Φ/m_Φ − 1 and δ ∈ Δ(η, Φ) there exists a c ≥ 0 with Φ(R_as(0; c, δ)) = (1 + η)m_Φ. Let c(δ, η) := min{c; Φ(R_as(0; c, δ)) = (1 + η)m_Φ}. Then a design δ* ∈ Δ(η, Φ) is asymptotically-Φ-η-optimal if and only if

max_{γ∈ℝ} Φ(R_as(γ; c(δ*, η), δ*)) = min_{δ∈Δ(η,Φ)} max_{γ∈ℝ} Φ(R_as(γ; c(δ, η), δ)).

Proof. Since η ≤ M_Φ/m_Φ − 1, we have Φ(R_as(0; 0, δ)) = Φ(σ² I^(II)_β(δ)^{-1}) ≥ M_Φ ≥ (1 + η)m_Φ, while δ ∈ Δ(η, Φ) together with the proof of Lemma 4 gives lim_{c→∞} Φ(R_as(0; c, δ)) = Φ(σ² I^(I)(δ)^{-1}) ≤ (1 + η)m_Φ. The existence of c(δ, η) then follows from the continuity of c ↦ Φ(R_as(0; c, δ)), and the optimality characterization is immediate from Lemma 3. □
5. D-optimal designs for the polynomial regression

Next we will treat the D-optimal design problem for the polynomial regression. First we will calculate the determinant of the risk matrix of the pre-test estimation function.

Lemma 5. Let δ ∈ Δ', q = 1,

M_ij(δ) := ∫ a^(i)(t) a^(j)(t)^T δ(dt),   1 ≤ i, j ≤ 2,

and m(c) as in Lemma 2(v). Then

(i) det R_as(γ; c, δ) = σ^{2p} det(M₁₁(δ)^{-1}) { 1 + [M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ) / (M₂₂(δ) − M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ))] f_c((γ²/(2σ²))(M₂₂(δ) − M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ))) },

(ii) max_{γ∈ℝ} det R_as(γ; c, δ) = σ^{2p} det(M₁₁(δ)^{-1}) { 1 + [M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ) / (M₂₂(δ) − M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ))] m(c) }.

Proof. (i) For q = 1, D(δ) can be written as uu^T, where

u = I^(I)(δ)^{-1} ∫ a^(1)(t) a^(2)(t)^T δ(dt) · I^(II)_γ(δ)^{-1/2} ∈ ℝ^p.

Since det(I^(I)(δ)^{-1} + uu^T) = (1 + u^T I^(I)(δ) u) det(I^(I)(δ)^{-1}) (see Rao, 1973, p. 34), (i) follows from Corollary 1 and the inversion formula for partitioned matrices. (ii) follows immediately from (i), since M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ)/(M₂₂(δ) − M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ)) ≥ 0 and det(M₁₁(δ)^{-1}) > 0. □

Theorem 2. For a^(1)(t) = (1, t, ..., t^{p−1})^T, a^(2)(t) = t^p, T = [−1, 1], δ ∈ Δ', c ∈ [0, ∞)
and

Δ* := {δ ∈ Δ'; δ is symmetrical, |S(δ)| = p + 1, {−1, 1} ⊂ S(δ)},

there exists a design δ* ∈ Δ* with

(i) det(R_as(0; c, δ*)) ≤ det(R_as(0; c, δ)),
(ii) max_{γ∈ℝ} det(R_as(γ; c, δ*)) ≤ max_{γ∈ℝ} det(R_as(γ; c, δ)).

Proof. With Lemma 5 we get

det(R_as(0; c, δ)) = σ^{2p} det(I^(I)(δ)^{-1}) { 1 + [M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ)/(M₂₂(δ) − M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ))] f_c(0) },

max_{γ∈ℝ} det(R_as(γ; c, δ)) = σ^{2p} det(I^(I)(δ)^{-1}) { 1 + [M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ)/(M₂₂(δ) − M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ))] m(c) }.

For δ ∈ Δ', let δ₋(A) := δ(−A), ∀A ∈ 𝔅¹ ∩ T, and δ̄ := ½(δ + δ₋). Since δ ∈ Δ' we have I^(II)_γ(δ) > 0, hence I^(II)_γ(δ̄) > 0. Thus we have det(I^(I)(δ̄)) ≥ det(I^(I)(δ)) and I^(II)_γ(δ̄) ≥ I^(II)_γ(δ). Since I^(II)_γ(δ) = M₂₂(δ) − M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ) ∈ ℝ and M₂₂(δ̄) = M₂₂(δ), we get

M₂₁(δ̄)M₁₁(δ̄)^{-1}M₁₂(δ̄) ≤ M₂₁(δ)M₁₁(δ)^{-1}M₁₂(δ).
Hence,

det(R_as(0; c, δ̄)) ≤ det(R_as(0; c, δ)),   max_{γ∈ℝ} det(R_as(γ; c, δ̄)) ≤ max_{γ∈ℝ} det(R_as(γ; c, δ)).

Let

ℳⁿ := {(μ₀, ..., μₙ)^T ∈ ℝ^{n+1}; μᵢ = ∫₋₁¹ tⁱ δ(dt), 0 ≤ i ≤ n, δ ∈ Δ},

μₙ⁺(μ₀, ..., μₙ₋₁) := max{z ∈ ℝ; (μ₀, ..., μₙ₋₁, z)^T ∈ ℳⁿ},

μᵢ(δ) := ∫₋₁¹ tⁱ δ(dt),   i ∈ ℕ₀

(see Karlin and Studden, 1966, Ch. II). If (μ₀(δ̄), ..., μ_{2p−1}(δ̄))^T ∉ Int ℳ^{2p−1} (the interior of ℳ^{2p−1}), we get (see Preitschopf, 1989, (1.8.17)) a δ₀ ∈ Δ with μᵢ(δ₀) = μᵢ(δ̄), 0 ≤ i ≤ 2p, and |S(δ₀)| < p + 1, which implies the singularity of I^(II)(δ̄) = I^(II)(δ₀) and contradicts δ̄ ∈ Δ'. Hence (μ₀(δ̄), ..., μ_{2p−1}(δ̄))^T ∈ Int ℳ^{2p−1}, which leads to a symmetrical δ* ∈ Δ* with |S(δ*)| = p + 1, {−1, 1} ⊂ S(δ*), μᵢ(δ*) = μᵢ(δ̄) for 0 ≤ i ≤ 2p − 1, and μ_{2p}(δ*) ≥ μ_{2p}(δ̄), i.e.

M₁₁(δ*) = M₁₁(δ̄),   M₁₂(δ*) = M₁₂(δ̄)   and   M₂₂(δ*) ≥ M₂₂(δ̄).

Hence,

det(R_as(0; c, δ*)) ≤ det(R_as(0; c, δ̄)) ≤ det(R_as(0; c, δ)),

max_{γ∈ℝ} det(R_as(γ; c, δ*)) ≤ max_{γ∈ℝ} det(R_as(γ; c, δ̄)) ≤ max_{γ∈ℝ} det(R_as(γ; c, δ))

for all c ∈ [0, ∞), which completes the proof. □
While searching for an optimal design we can thus restrict Δ to Δ*, which leads to a (p − 1)-dimensional optimization problem. The easiest case is p = 2, which is illustrated in the concluding example.
Example (p = 2). For p = 2 and μᵢ(δ) := ∫₋₁¹ tⁱ δ(dt) we get for any symmetrical design δ:

det(I^(I)(δ)) = μ₂(δ)   and   det(I^(II)_β(δ)) = μ₂(δ) − μ₂(δ)³/μ₄(δ).

Since μ₄(δ) ≤ μ₂(δ), det I^(II)_β(δ) attains its maximum if μ₂(δ) = μ₄(δ) = ½, i.e. for δ = δ₂* := ¼ε₋₁ + ½ε₀ + ¼ε₁, which is the D-optimal design for estimating β in model (II). Hence, we have M_det = 4σ⁴. The D-optimal design in model (I) is given by δ₁* = ½ε₋₁ + ½ε₁, leading to m_det = σ⁴. Therefore, we choose η ≤ M_det/m_det − 1 = 3.
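These quantities can be checked with exact rational arithmetic. The grid below (not from the paper) restricts to μ₄ = μ₂, i.e. to designs supported on {−1, 0, 1}.

```python
# Illustrative check (not from the paper) of the p = 2 example: for a symmetrical
# design, det I^(I) = mu_2 and det I^(II)_beta = mu_2 - mu_2^3 / mu_4; restricting
# to mu_4 = mu_2 (all mass on {-1, 0, 1}) a grid search recovers the optimum.
from fractions import Fraction

def det_I2_beta(mu2, mu4):
    return mu2 - mu2 ** 3 / mu4

best = max(det_I2_beta(Fraction(k, 100), Fraction(k, 100)) for k in range(1, 100))
assert best == det_I2_beta(Fraction(1, 2), Fraction(1, 2)) == Fraction(1, 4)

# det criterion: Phi(sigma^2 A^{-1}) = sigma^4 / det A for a 2x2 matrix A
M_det_over_sigma4 = 1 / best          # = 4, attained by delta_2* = (1/4, 1/2, 1/4)
m_det_over_sigma4 = 1                 # det I^(I) = mu_2 is maximal (= 1) for delta_1*
print(M_det_over_sigma4 / m_det_over_sigma4 - 1)   # admissible bound: eta <= 3
```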
Fig. 1.

With δ_w := ½wε₋₁ + (1 − w)ε₀ + ½wε₁, we have Δ* = {δ_w; 0 < w < 1}. Using Corollary 1 and μ₂(δ_w) = μ₄(δ_w) = w we get

δ_w ∈ Δ(η, det) ⇔ w ≥ 1/(1 + η),

det(R_as(γ; c, δ_w)) = (σ⁴/w) { 1 + [w/(1 − w)] f_c((γ²/(2σ²)) w(1 − w)) }.

Since f_c(0) = 1 − G₃(c), where G₃ is the cdf of the central χ²₃ distribution, the unique solution of det(R_as(0; c, δ_w)) = (1 + η)m_det, say c(δ_w, η), is therefore given by

c(δ_w, η) = G₃^{-1}(1 − (1 − w)(1 + η − w^{-1})).

Thus, we get an asymptotically-det-η-optimal design by

δ_{w*} = ½w* ε₋₁ + (1 − w*) ε₀ + ½w* ε₁

with

w* = w*(η) = arg min { (1/w)(1 + [w/(1 − w)] m(G₃^{-1}(1 − (1 − w)(1 + η − w^{-1})))); 1/(1 + η) < w < 1 }.
The function m(c) (see Lemma 2(v)) and the optimal weight w* have to be calculated numerically. Fig. 1 shows w*(η), which was evaluated with the help of the statistical programming language GAUSS, and Fig. 2 shows for the optimal design δ*(η) := δ_{w*(η)} the maximal risk (divided by σ⁴) as a function of η, i.e.

r_max(η) := (1/σ⁴) max_{γ∈ℝ} det(R_as(γ; c(δ*(η), η), δ*(η))).
Fig. 2.
The case p = 2 is a very simple design situation. However, it is the best way to show how our method works. For p > 2 the computation of the optimal design gets more laborious, but for a moderate p it remains well feasible, since there is only a (p − 1)-dimensional optimization problem.
6. Conclusion

The risk of the GME of the original model tends to infinity with an increasing nuisance parameter γ (except in the case of equal GMEs, see (3)). For example, in the case of linear versus quadratic regression we get by

lim_{c→∞} det(R_as(γ; c, δ_w)) = σ⁴/w + σ²γ²w

the risk for the GME of model (I). Setting w = 1 we get with σ⁴ + σ²γ² the risk of the GME of model (I) with the D-optimal design in (I). Thus, the best estimation method in model (I) is not robust against the occurrence of γ ≠ 0. But allowing any small deviation from its risk in model (I), we get with the pre-test estimation function a procedure that is unbiased in model (I) and always has a bounded risk in model (II) (i.e. for γ ≠ 0). To optimize this procedure under the restriction that a given deviation from the minimal risk in model (I) is not exceeded, the design should be chosen appropriately, for instance, to minimize the maximal risk. For the polynomial regression with one additional regression function and the determinant as a criterion function, this problem can be solved. Numerical results show that this is quite close but not equal to a uniform minimization under the mentioned restriction. Similar results for other optimization criteria, such as the trace, can be presumed, but their evaluations seem to be more troublesome. Finally, the case of more than one nuisance parameter, i.e. q > 1, remains a difficult task.
Appendix

Here we prove Lemma 2. The properties (i)-(v) and (vii) are quite easy to show, but part (vi) needs some technical calculations. First we need a special property of the cdf of the χ² distribution.

Lemma A.1. For v ∈ ℕ and x ≥ 0, let χ²_{v;x} be a χ²-distributed random variable with v degrees of freedom and noncentrality parameter x and with density function f_{χ²_{v;x}}. For all c ≥ 0 and v ∈ ℕ it holds:

P(χ²_{v+2;x} ≤ c) = P(χ²_{v;x} ≤ c) − 2 f_{χ²_{v+2;x}}(c).

Proof. From Johnson and Kotz (1970, Vol. 1, p. 173, (24)) we get

P(χ²_{v;0} ≤ c) = e^{-c/2} Σ_{j=0}^∞ (c/2)^{v/2+j} / Γ(v/2 + j + 1),

hence

P(χ²_{v;0} ≤ c) − P(χ²_{v+2;0} ≤ c) = e^{-c/2} (c/2)^{v/2} / Γ(v/2 + 1),

which yields Lemma A.1 for x = 0, since

f_{χ²_{v+2;0}}(c) = ½ e^{-c/2} (c/2)^{v/2} / Γ(v/2 + 1).

With Johnson and Kotz (1970, Vol. 2, p. 132, (2) and (3)) we get for x > 0

P(χ²_{v+2;x} ≤ c) = Σ_{j=0}^∞ [(x/2)^j / j!] e^{-x/2} P(χ²_{v+2j+2;0} ≤ c)
  = Σ_{j=0}^∞ [(x/2)^j / j!] e^{-x/2} [P(χ²_{v+2j;0} ≤ c) − 2 f_{χ²_{v+2j+2;0}}(c)]
  = P(χ²_{v;x} ≤ c) − 2 f_{χ²_{v+2;x}}(c). □
Proof of Lemma 2. We have

f_c(x) = 1 − P(χ²_{3;x} ≤ c) + [2P(χ²_{3;x} ≤ c) − P(χ²_{5;x} ≤ c)] 2x.

(i) f₀(x) = 1, ∀x ∈ [0, ∞).
(ii) lim_{c→∞} P(χ²_{v;x} ≤ c) = 1 ⇒ lim_{c→∞} f_c(x) = 2x, ∀x ∈ [0, ∞).
(iv) c₁ < c₂ ⇒ P(χ²₃ ≤ c₁) < P(χ²₃ ≤ c₂) ⇒ f_{c₁}(0) > f_{c₂}(0).
(vii) By Lemma A.1, 2P(χ²_{3;x} ≤ c) − P(χ²_{5;x} ≤ c) = P(χ²_{3;x} ≤ c) + 2f_{χ²_{5;x}}(c) ≥ 0, ∀c ≥ 0; together with P(χ²_{3;x} ≤ c) ≤ P(χ²_{3;0} ≤ c) this yields f_c(x) ≥ f_c(0), ∀x ≥ 0.

(iii), (v), (vi): Let g(c, x) := f_{c²}(x²), x, c ∈ [0, ∞), and denote by h and Φ the pdf and the cdf, respectively, of the standard normal distribution. Using Lemma A.1, the properties of the pdf of the noncentral χ² distribution and P(χ²_{1;x} ≤ c) = Φ(√c − √x) + Φ(√c + √x) − 1, we get for x > 0

g(c, x) = (2x² − 1)(Φ(x + c) − Φ(x − c)) + (2c − 2x − 1/x) h(c − x) + (2c + 2x + 1/x) h(c + x) + 1,

leading to (iii), since Φ(x + c) − Φ(x − c) < 2c h(x − c) for x > c. Setting g₁(c, x) := (∂/∂c) g(c, x) and g₂(c, x) := (∂/∂x) g(c, x), we get for x > 0:

g₁(c, x) = c [(4x + 1/x − 2c) h(c − x) − (4x + 1/x + 2c) h(c + x)],

g₂(c, x) = 4x (Φ(x + c) − Φ(x − c)) + (2c² − 4cx − c/x + 1/x²) h(c − x) − (2c² + 4cx + c/x + 1/x²) h(c + x).
For c > 0, x > 0 it holds:

g₁(c, x) = 0 ⇔ tanh(cx) = [2/(4x² + 1)] cx.

Because of 2/(4x² + 1) < 1 ⇔ x > ½, tanh(y) < y, ∀y > 0, (∂/∂y) tanh(y)|_{y=0} = 1 and (∂/∂y)² tanh(y) < 0, ∀y > 0, we get for all x > ½ a unique c̄(x) such that c̄(x) > 0 and g₁(c̄(x), x) = 0. For all x ≤ ½ we get g₁(c, x) < 0, ∀c > 0. Let

c*(x) := c̄(x) for x > ½,   c*(x) := 0 for x ≤ ½.

Since g₁(0, x) = (∂/∂c) g₁(c, x)|_{c=0} = 0, ∀x ≥ 0, (∂/∂c)² g₁(c, x)|_{c=0} = (16x² − 4) h(x) and (∂/∂c)³ g₁(c, ½)|_{c=0} < 0, it follows:

max_{c≥0} g(c, x) = g(c*(x), x)  { = g(0, x) = 1 for x ≤ ½;  > g(0, x) = 1 for x > ½ },      (A.1)
and for all x > ½ the function g(c, x) is strictly increasing in c for c < c*(x), with

2x − 1/(2x) < c*(x) < 2x + 1/(2x),   ∀x > ½.      (A.2)

Proof of (A.2): Since tanh²(z) + tanh(z)/z − 1 > 0, ∀z > 0, tanh(z) < 1, and c*(x) = (2x + 1/(2x)) tanh(c*(x) x), we get both bounds.

Next we will prove the monotonicity of g(c*(x), x) in x:

(d/dx) g(c*(x), x) > 0,   ∀x > ½.      (A.3)

Using g₁(c*(x), x) = 0 we have

(d/dx) g(c*(x), x) = ((d/dx) c*(x)) g₁(c*(x), x) + g₂(c*(x), x) = g₂(c*(x), x).
(i) For c* = c*(x) ≥ 2x we get

Φ(x + c*) − Φ(x − c*) ≥ c* h(c* − x) + c* h(c* + x),

hence

g₂(c*, x) ≥ (2c*² − c*/x + 1/x²) h(c* − x) + (4c*x − 2c*² − c*/x − 1/x²) h(c* + x) > 0,

since x > ½ and e^{2c*x} > 1 + 2c*x + 2c*²x².

(ii) For c* ≤ x (in which case ½ < x < 2^{-1/2} by (A.2)) we get

(h(x − c*) − h(x)) c* ≤ (h(x) − h(1))(1 − x).

For x + c* ≥ 1 we therefore get

Φ(x + c*) − Φ(x − c*) ≥ c* h(x + c*) + c* h(x) + ½ c* (h(x − c*) − h(x)) + ½ (1 − x)(h(x) − h(1))
  ≥ c* h(x + c*) + c* h(x) + c* h(x − c*) − c* h(x) = c* h(x + c*) + c* h(x − c*).

The same inequality holds for x + c* < 1 because of the concavity of h(z) for |z| < 1, and it leads again to g₂(c*, x) > 0.

(iii) For x ≤ c* ≤ 2x it holds that

Φ(x + c*) − Φ(x − c*) ≥ 2(c* − x) h(x − c*) + 2x h(x + c*).

Hence, using g₁(c*(x), x) = 0 and (A.2) we get

g₂(c*, x) = 4x (Φ(x + c*) − Φ(x − c*)) + (1/x²)(h(c* − x) − h(c* + x)) − 2(2c*² + 4c*x + c*/x) h(c* + x)
  ≥ (8c*x − 8x² + 1/x²) h(c* − x) + (8x² − 1/x² − 4c*² − 8c*x − 2c*/x) h(c* + x) > 0,

which completes the proof of (A.3).

For c > 0 the properties (A.1)-(A.3) imply the existence of an x_c > ½ with c*(x_c) = c and g(c, x_c) > 1, which proves Lemma 2(v), since g(c, 0) = 1 − P(χ²₃ ≤ c²) < 1, lim_{x→∞} g(c, x) = 1 and m(c²) = max_{x≥0} g(c, x).
Now let 0 ≤ c₁ < c₂ and x₁, x₂ ≥ 0 be such that g(cᵢ, xᵢ) = max_{x≥0} g(cᵢ, x), i = 1, 2. Since g(c, x₁) is strictly increasing in c for c < c*(x₁), we get for c₂ ≤ c*(x₁)

g(c₁, x₁) < g(c₂, x₁) ≤ g(c₂, x₂).

Otherwise, for c₂ > c*(x₁), property (A.2) implies the existence of x_{c₂} > x₁ with c*(x_{c₂}) = c₂, and using (A.3) we get

g(c₁, x₁) ≤ g(c*(x₁), x₁) < g(c*(x_{c₂}), x_{c₂}) = g(c₂, x_{c₂}) ≤ g(c₂, x₂),

which completes the proof of Lemma 2. □
References

Atkinson, A.C. and V.V. Fedorov (1975). The design of experiments for discriminating between two rival models. Biometrika 62, 57-70.
Baksalary, J.K. (1984). A study of the equivalence between a Gauss-Markov model and its augmentation by nuisance parameters. Math. Operationsforschung Statistik Ser. Stat. 15, 3-35.
Bock, M.E., G.G. Judge and T.A. Yancey (1973). The statistical consequences of preliminary test estimators in regression. J. Amer. Statist. Assoc. 68, 109-112.
Boscher, H. (1991). Contamination in linear regression models and its influence on estimators. Statist. Neerlandica 45, 9-19.
Fedorov, V.V. and V. Khabarov (1986). Duality of optimal designs for model discrimination and parameter estimation. Biometrika 73, 183-190.
Fellman, J. (1985). Estimation in linear models with nuisance parameters. Statistics Decisions (Suppl. 2), 161-164.
Gather, U. (1986). Robust estimation of the mean of the exponential distribution in outlier situations. Comm. Statist. A 15, 2323-2345.
Giles, J.A. (1991). Pre-testing in a mis-specified regression model. Comm. Statist. A 20, 3221-3238.
Johnson, N.L. and S. Kotz (1970). Continuous Univariate Distributions 1, 2. Houghton Mifflin, Boston.
Judge, G.G. and M.E. Bock (1978). The Statistical Implications of Pre-test and Stein-rule Estimators in Econometrics. North-Holland, Amsterdam.
Karlin, S. and W.J. Studden (1966). Tchebycheff Systems: With Applications in Analysis and Statistics. Wiley, New York.
Preitschopf, F. (1989). Bestimmung optimaler Versuchspläne in der polynomialen Regression. Ph.D. Thesis, Universität Augsburg.
Rao, C.R. (1973). Linear Statistical Inference and its Applications, 2nd Ed. Wiley, New York.