Journal of Statistical Planning and Inference 35 (1993) 319-333
North-Holland

Estimation of the variance and its applications

T. Kubokawa, K. Morita, S. Makita and K. Nagakura
Department of Mathematical Engineering and Information Physics, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan

Received 4 September 1990; revised manuscript received 14 April 1992
Abstract: For the variance of a normal distribution with an unknown mean, three types of truncated estimators superior to the best affine equivariant estimator are treated and their efficiencies are compared asymptotically and numerically. As an application, the simultaneous estimation of a multivariate normal mean is considered and it is demonstrated that using an improved estimator of the variance leads to the improvement on the James-Stein estimator for the mean vector. Also simulation results for the relative risk improvement are given.

AMS Subject Classification: Primary 62F10; secondary 62J07.

Key words and phrases: Point estimation; variance; multivariate normal mean; shrinkage; James-Stein estimator; efficiency comparison.
1. Introduction

In many applications such as designs of experiments and linear regression models, one has models with the following canonical form: the statistics $X$, $Y$ and $S$ are mutually independent random variables and
$$X \sim N_p(\theta, \sigma^2 I_p), \quad Y \sim N_q(\xi, \sigma^2 I_q) \quad \text{and} \quad S \sim \sigma^2\chi^2_n,$$
where $\theta$ and $\sigma^2$ are unknown parameters of interest and $\xi$ is an unknown nuisance parameter. In this paper we want to treat two problems: estimation of the variance $\sigma^2$ and of the mean vector $\theta$.

For estimation of $\sigma^2$, it is desired to find a superior estimator $\delta = \delta(S, X, Y)$ in the sense of minimizing the risk function $R(\omega, \delta) = E_\omega[(\delta/\sigma^2 - 1)^2]$ for $\omega = (\theta, \xi, \sigma^2)$. Let $Z = (X', Y')'$ and $\mu = (\theta', \xi')'$. This problem is invariant under the group of affine transformations
$$(S, Z) \to (a^2 S, a\Gamma Z + b), \quad (\sigma^2, \mu) \to (a^2\sigma^2, a\Gamma\mu + b),$$
for any positive constant $a > 0$, any vector $b$ and any $(p+q) \times (p+q)$ orthogonal matrix $\Gamma$.

Correspondence to: Prof. T. Kubokawa, Dept. of Mathematical Engineering and Information Physics, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan.

0378-3758/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved

Any affine equivariant estimator $\delta_0(S, Z)$, satisfying $\delta_0(a^2 S, a\Gamma Z + b) = a^2\delta_0(S, Z)$, can be expressed as $\delta_0(S, Z) = cS$ for $c > 0$. Then the best affine equivariant estimator in terms of risk is
$$\delta_0 = (n+2)^{-1}S, \tag{1.1}$$
with the risk $R(\omega, \delta_0) = 2/(n+2)$. For improving on $\delta_0$, Stein (1964) considered a subgroup described by
$$(S, Z) \to (a^2 S, a\Gamma Z), \quad (\sigma^2, \mu) \to (a^2\sigma^2, a\Gamma\mu),$$
and tried to look for a better estimator among a broader class of scale equivariant estimators $\delta_\phi(S, Z) = S\phi(\|Z\|^2/S)$ for a positive valued function $\phi(\cdot)$ and the Euclidean norm $\|\cdot\|$. In fact, he derived the improved estimator
$$\delta_1 = \min\left\{\delta_0,\ \frac{S + \|X\|^2 + \|Y\|^2}{n+p+q+2}\right\}, \tag{1.2}$$
which may be viewed as a preliminary test procedure in the sense that the decision whether to employ $\delta_0$ or $(S + \|X\|^2 + \|Y\|^2)(n+p+q+2)^{-1}$ as an estimator of $\sigma^2$ depends on a test statistic for testing $H_0\colon \mu = 0$ vs. $H_1\colon \mu \ne 0$. The above type of domination results in point and confidence estimation and various extensions have been studied by several authors; for the bibliography, see Maatta and Casella (1990). Among these, Sinha (1976) and Gelfand and Dey (1988) considered the estimator
$$\delta_2 = \min\left\{\delta_0,\ \frac{S + \|X\|^2}{n+p+2},\ \frac{S + \|X\|^2 + \|Y\|^2}{n+p+q+2}\right\}, \tag{1.3}$$
and verified that $\delta_2$ has a smaller risk than $\min\{\delta_0, (S + \|X\|^2)(n+p+2)^{-1}\}$. While the improvement by $\delta_1$ is possible for small $\|X\|^2 + \|Y\|^2$, the superiority of $\delta_2$ arises when either of $\|X\|^2$, $\|X\|^2 + \|Y\|^2$ is small. In fact, if $\|X\|^2$ is small and $\|Y\|^2$ is very large, then $\|X\|^2$ is available, but the effect of $\|X\|^2$ in $\delta_1$ disappears. This implies that $\delta_2$ is superior to $\delta_1$. However, when $\|X\|^2$ is very large, even if $\|Y\|^2$ is small, the effects of $\|Y\|^2$ in both $\delta_1$ and $\delta_2$ are gone. To eliminate this undesirable property, it may be reasonable to consider the estimator of the form
$$\delta_3 = \min\left\{\delta_0,\ \frac{S + \|X\|^2}{n+p+2},\ \frac{S + \|Y\|^2}{n+q+2},\ \frac{S + \|X\|^2 + \|Y\|^2}{n+p+q+2}\right\}, \tag{1.4}$$
which was suggested by George (1990), but no dominance properties for $\delta_3$ have been established; it seems difficult to present an exact decision-theoretic result. In Section 2, we derive the asymptotic risk expansions for the estimators $\delta_1$, $\delta_2$ and $\delta_3$ and compare numerically their second order terms. From the expansion for $\delta_3$, it is analytically demonstrated that $\delta_3$ is asymptotically better than $\delta_0$. Section 3 presents the results of Monte Carlo simulation for the relative risk improvement. As expected, if one has the prior knowledge that $\lambda = \|\theta\|^2/(2\sigma^2)$ or $\tau = \|\xi\|^2/(2\sigma^2)$ is small, the corresponding truncated estimator is preferred, and when one has no information about $\lambda$ and $\tau$, the estimator $\delta_3$ is desirable. It is also revealed that the relative risk improvements are getting greater for larger dimension $p$ or $q$.

In Section 4, we deal with the problem of estimating the mean vector $\theta$ simultaneously and consider an application of the estimation of variance. Stein (1956), James and Stein (1961) showed that for $p \ge 3$, the usual estimator $X$ is dominated by the shrinkage estimator
$$\hat\theta^{JS} = (1 - (p-2)\delta_0/\|X\|^2)X$$
relative to the loss function $\|\hat\theta - \theta\|^2/\sigma^2$. Since the estimator $\delta_0$ of $\sigma^2$ can be improved on by using the information contained in $X$ and $Y$, we have the question whether $\hat\theta^{JS}$ can be further improved on by employing better estimators of $\sigma^2$ instead of $\delta_0$. This is a conjecture of George (1990). The answer is affirmative, and it can be proved that $\hat\theta^{JS}$ is dominated by
$$X - \frac{p-2}{\|X\|^2}\min\left\{\delta_0,\ \frac{S + \|Y\|^2}{n+q+2},\ \frac{S + \|X\|^2 + \|Y\|^2}{n+p+q}\right\}X,$$
for instance. That is, using the improved estimator of the variance leads to the improvement on the James-Stein estimator of the mean vector. Also numerical investigations of relative risk improvements are given. Finally, two examples in designs of experiments are stated.
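For illustration, the estimators (1.1)-(1.4) can be coded directly. The following minimal Python sketch is ours, not the authors' (function names and the inputs `S`, `x2` $= \|X\|^2$, `y2` $= \|Y\|^2$ are our conventions); it simply evaluates the four formulas:

```python
# Sketch of the variance estimators (1.1)-(1.4).  S is the residual sum of
# squares, x2 = ||X||^2, y2 = ||Y||^2; n, p, q are as in the canonical model.

def delta0(S, n):
    return S / (n + 2)                                    # (1.1)

def delta1(S, x2, y2, n, p, q):                           # Stein (1964), (1.2)
    return min(delta0(S, n), (S + x2 + y2) / (n + p + q + 2))

def delta2(S, x2, y2, n, p, q):                           # Sinha / Gelfand-Dey, (1.3)
    return min(delta0(S, n),
               (S + x2) / (n + p + 2),
               (S + x2 + y2) / (n + p + q + 2))

def delta3(S, x2, y2, n, p, q):                           # George's suggestion, (1.4)
    return min(delta0(S, n),
               (S + x2) / (n + p + 2),
               (S + y2) / (n + q + 2),
               (S + x2 + y2) / (n + p + q + 2))
```

Since each successive estimator takes the minimum over a larger candidate set, $\delta_3 \le \delta_2 \le \delta_1 \le \delta_0$ pointwise; the dominance results of the paper concern their risks, not these pointwise values.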
2. Asymptotic properties of the truncated estimators

For investigating the nature of the risk improvements of the estimators given in Section 1, we shall derive their asymptotic risk expansions and compare their second order terms. Define the asymptotic risk difference for the estimators $\delta_0$ and $\delta$ by
$$\mathrm{ARD}(\delta_0, \delta) = \lim_{n\to\infty} n^2\{R(\omega, \delta_0) - R(\omega, \delta)\}. \tag{2.1}$$
Note that $\|X\|^2 \sim \sigma^2\chi^2_{p+2J}$ and $\|Y\|^2 \sim \sigma^2\chi^2_{q+2K}$, where $J$ and $K$ follow Poisson laws with means $\lambda = \|\theta\|^2/(2\sigma^2)$ and $\tau = \|\xi\|^2/(2\sigma^2)$.

Proposition 2.1. Let $u_1 = p - \|X\|^2/\sigma^2$ and $u_2 = q - \|Y\|^2/\sigma^2$. Then,
$$\mathrm{ARD}(\delta_0, \delta_1) = E[(u_1+u_2)(u_1+u_2+4J+4K)I(u_1+u_2 > 0)], \tag{2.2}$$
$$\mathrm{ARD}(\delta_0, \delta_2) = E[u_1(u_1+2u_2+4J+4K)I(u_1 \ge 0, u_2 < 0) + (u_1+u_2)(u_1+u_2+4J+4K)I(u_1+u_2 > 0, u_2 \ge 0)], \tag{2.3}$$
$$\begin{aligned}
\mathrm{ARD}(\delta_0, \delta_3) = E[&u_1(u_1+2u_2+4J+4K)I(u_1 \ge 0, u_2 < 0)\\
&+ u_2(u_2+2u_1+4J+4K)I(u_1 < 0, u_2 \ge 0)\\
&+ (u_1+u_2)(u_1+u_2+4J+4K)I(u_1 > 0, u_2 > 0)],
\end{aligned} \tag{2.4}$$
where $I(\cdot)$ designates the indicator function. The proof is given in the Appendix.

Noting that
$$u_1(u_1+2u_2+4J+4K) + u_2(u_2+2u_1+4J+4K) - 2u_1u_2 = (u_1+u_2)(u_1+u_2+4J+4K),$$
and that $E[u_1 \mid J] = -2J$ and $E[u_2 \mid K] = -2K$, we can rewrite (2.4) as
$$\begin{aligned}
\mathrm{ARD}(\delta_0, \delta_3) &= E[u_1(u_1+2u_2+4J+4K)I(u_1 > 0) + u_2(u_2+2u_1+4J+4K)I(u_2 > 0) - 2u_1u_2 I(u_1 \ge 0, u_2 \ge 0)]\\
&= E[u_1(u_1+4J)I(u_1 > 0) + u_2(u_2+4K)I(u_2 > 0) - 2u_1u_2 I(u_1 \ge 0, u_2 \ge 0)]\\
&= E[u_1(u_1+4J)I(u_1 > 0, u_2 < 0) + u_2(u_2+4K)I(u_1 < 0, u_2 > 0)\\
&\qquad + \{(u_1-u_2)^2 + 4Ju_1 + 4Ku_2\}I(u_1 \ge 0, u_2 > 0)] > 0,
\end{aligned} \tag{2.5}$$
where the second equality uses the conditional independence of $u_1$ and $u_2$ given $(J, K)$. Hence we get

Table 1
The asymptotic risk differences for $p = q = 2$: $\mathrm{ARD}(\delta_0, \delta)$ for $\delta = \delta_1, \delta_2, \delta_3$, tabulated over $\lambda = 0, 1, 3, 5$ (rows) and $\tau = 0, 1, \ldots, 5$ (columns).
Proposition 2.2. The estimator $\delta_3$ is asymptotically better than $\delta_0$.

Table 1 provides the numerical values of the asymptotic risk differences $\mathrm{ARD}(\delta_0, \delta)$ for $\delta = \delta_1$, $\delta_2$ and $\delta_3$ when $p = q = 2$. Table 1 reveals that (1) the asymptotic risk reduction of $\delta_1$ is great (resp. small) when both $\lambda$ and $\tau$ are small (resp. large), (2) $\delta_2$ and $\delta_3$ are better than $\delta_1$ when $\lambda$ is small and $\tau$ is large, (3) $\delta_3$ is better than $\delta_1$ and $\delta_2$ for large $\lambda$, (4) $\mathrm{ARD}(\delta_0, \delta_3)$ is concave in $\tau$ when $\lambda$ is small, and (5) $\mathrm{ARD}(\delta_0, \delta_1)$ and $\mathrm{ARD}(\delta_0, \delta_2)$ are decreasing in $\tau$. Although the maximum reduction for $\delta_3$ is smaller than those for $\delta_1$ and $\delta_2$, $\delta_3$ is superior over a wide area of the unknown parameters $\lambda$ and $\tau$.
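The sign of (2.5) can also be checked numerically. The following seeded Monte Carlo sketch is ours (not part of the paper); it estimates the expectation in (2.5) using the chi-square mixture representation $\|X\|^2/\sigma^2 \sim \chi^2_{p+2J}$, $\|Y\|^2/\sigma^2 \sim \chi^2_{q+2K}$:

```python
import math
import random

def ard_delta3_mc(p, q, lam, tau, n_rep=20000, seed=1):
    # Monte Carlo estimate of the expectation in (2.5), our own sketch.
    rng = random.Random(seed)

    def poisson(mean):
        # inversion sampling; adequate for small means
        if mean <= 0.0:
            return 0
        u, k = rng.random(), 0
        prob = math.exp(-mean)
        cum = prob
        while u > cum:
            k += 1
            prob *= mean / k
            cum += prob
        return k

    def chi2(df):
        return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

    total = 0.0
    for _ in range(n_rep):
        J, K = poisson(lam), poisson(tau)
        u1 = p - chi2(p + 2 * J)   # ||X||^2 / sigma^2 ~ chi^2_{p+2J}
        u2 = q - chi2(q + 2 * K)   # ||Y||^2 / sigma^2 ~ chi^2_{q+2K}
        term = 0.0
        if u1 > 0.0:
            term += u1 * (u1 + 4.0 * J)
        if u2 > 0.0:
            term += u2 * (u2 + 4.0 * K)
        if u1 >= 0.0 and u2 >= 0.0:
            term -= 2.0 * u1 * u2
        total += term
    return total / n_rep
```

With a seeded generator the estimate is positive for every $(\lambda, \tau)$ we tried, in line with Proposition 2.2.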
3. Simulation results

In this section we present the results of Monte Carlo simulation for the relative risk improvement, which is defined by
$$\mathrm{RRI}(\delta) = 100 \times \{R(\omega, \delta_0) - R(\omega, \delta)\}/R(\omega, \delta_0),$$
where $\delta = \delta_1, \delta_2$ and $\delta_3$ given in Section 1. This is done in the two cases of $(n=1, p=q=1)$ and $(n=1, p=q=10)$. Using a VAX8600 computer with the ULTRIX-32 operating system at the University of Tokyo, uniform deviates are generated by the linear congruential method stated in Fushimi (1989). Tables 2 and 3 report the average values of the relative risk improvements based on 10000 replications. In the tables, we see that for the small sample size ($n = 1$), the estimators $\delta_1$, $\delta_2$ and $\delta_3$ have the similar risk properties as described below Proposition 2.2. In the
Table 2
The relative risk improvements in estimation of the variance for $p = q = 1$ and $n = 1$ (in percents): $\mathrm{RRI}(\delta)$ for $\delta = \delta_1, \delta_2, \delta_3$, tabulated over values of $\lambda$ (rows) and $\tau = 0, 1, \ldots, 5$ (columns).
Table 3
The relative risk improvements in estimation of the variance for $p = q = 10$ and $n = 1$ (in percents)

lambda        tau=0   tau=1   tau=2   tau=3   tau=4   tau=5
0     d1       6.92    6.87    6.69    6.41    6.08    5.71
      d2       6.38    6.38    6.34    6.27    6.21    6.16
      d3       5.86    6.14    6.23    6.24    6.21    6.17
1     d1       6.83    6.65    6.39    6.06    5.70    5.32
      d2       6.56    6.45    6.32    6.20    6.09    5.99
      d3       6.08    6.28    6.31    6.24    6.14    6.04
3     d1       6.35    6.03    5.68    5.30    4.89    4.49
      d2       6.28    6.31    5.81    5.58    5.39    4.49
      d3       6.19    6.27    6.16    5.96    5.73    5.50
5     d1       5.65    5.29    4.89    4.48    4.09    3.70
      d2       5.64    5.34    5.04    4.76    4.50    4.28
      d3       6.17    6.15    5.93    5.62    5.27    4.93
sequel, as expected, if one has the prior knowledge that both $\lambda$ and $\tau$ are small, then $\delta_1$ is desired, and if it is known that $\lambda$ only is small, the estimator $\delta_2$ should be chosen. When one has no information about $\lambda$ and $\tau$, the estimator $\delta_3$ may be desirable. From the comparison of Tables 2 and 3, it is also revealed that one can get greater relative risk improvements for large dimension $p$ or $q$.
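The experiment just described can be reproduced in outline. The sketch below is our own (a modern seeded generator rather than the VAX/Fushimi setup of the paper) and estimates $\mathrm{RRI}(\delta)$ for $\delta_1, \delta_2, \delta_3$ by paired Monte Carlo:

```python
import random

def rri_variance(n, p, q, theta_norm2, xi_norm2, n_rep=100000, seed=7):
    # Paired Monte Carlo estimate of RRI(delta) = 100*(R(delta0)-R(delta))/R(delta0)
    # for delta1-delta3 of (1.2)-(1.4); sigma^2 = 1 without loss of generality.
    rng = random.Random(seed)

    def chi2(df, noncent2=0.0):
        # noncentral chi-square realized as a squared norm of a shifted normal
        s = rng.gauss(noncent2 ** 0.5, 1.0) ** 2
        for _ in range(df - 1):
            s += rng.gauss(0.0, 1.0) ** 2
        return s

    loss = [0.0, 0.0, 0.0, 0.0]            # delta0, delta1, delta2, delta3
    for _ in range(n_rep):
        S = chi2(n)
        x2 = chi2(p, theta_norm2)           # ||X||^2
        y2 = chi2(q, xi_norm2)              # ||Y||^2
        d0 = S / (n + 2)
        ests = (d0,
                min(d0, (S + x2 + y2) / (n + p + q + 2)),
                min(d0, (S + x2) / (n + p + 2), (S + x2 + y2) / (n + p + q + 2)),
                min(d0, (S + x2) / (n + p + 2), (S + y2) / (n + q + 2),
                    (S + x2 + y2) / (n + p + q + 2)))
        for i, d in enumerate(ests):
            loss[i] += (d - 1.0) ** 2
    return [100.0 * (loss[0] - l) / loss[0] for l in loss[1:]]
```

For example, `rri_variance(1, 10, 10, 0.0, 0.0)` corresponds to the $\lambda = \tau = 0$ column of Table 3; all three improvements come out positive, as the tables report.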
4. An application to simultaneous estimation of a mean vector

The improvements in estimation of the variance were discussed in the previous sections. As an application of the variance estimation, George (1990) suggested the problem of simultaneous estimation for the multinormal mean vector. Following his suggestion, we shall demonstrate that using the improved estimator of the variance leads to the improvement on the James-Stein estimator.

In the model given in Section 1, suppose that we want to estimate the mean vector $\theta$ relative to the loss $\|\hat\theta - \theta\|^2/\sigma^2$. Stein (1956), James and Stein (1961) showed that for $p \ge 3$, the estimator $X$ of $\theta$ is dominated by the shrinkage estimator
$$\hat\theta^{JS} = (1 - (p-2)\delta_0/\|X\|^2)X$$
with $\delta_0 = (n+2)^{-1}S$. Since then, as one of the most famous instances of inadmissibility, this Stein-rule estimation theory has been studied in a considerable literature. Looking at the James-Stein estimator $\hat\theta^{JS}$ in the model given in Section 1, we notice that the statistic $S$ is utilized for improving on $X$, while the statistic $Y$ is still neglected. Then we have the question: can $Y$ be used for dominating the James-Stein estimator? Noting that the random variables $X$, $Y$ and $S$ have a common parameter $\sigma^2$ in their distributions, following the results in the previous sections, we think of the idea that $Y$ may be available for estimation of the variance $\sigma^2$. In the next subsections, we shall verify that $\hat\theta^{JS}$ is further dominated by using an improved estimator based on $S$ and $Y$ for the variance $\sigma^2$ instead of $\delta_0$.

4.1. Use of the statistic Y for estimation of the variance

Consider the estimator
$$\hat\theta(f) = \left(1 - \frac{(p-2)Sf(\|Y\|^2/S)}{\|X\|^2}\right)X, \tag{4.1}$$
where $f(\cdot)$ is a positive valued function. Assuming that $Sf(\|Y\|^2/S)$ is an improved estimator of $\sigma^2$, that is,
$$E[\{Sf(\|Y\|^2/S)/\sigma^2 - 1\}^2] \le E[\{\delta_0/\sigma^2 - 1\}^2] \quad \text{for all } \omega, \tag{4.2}$$
we shall show that $\hat\theta^{JS}$ is dominated by $\hat\theta(f)$. For the purpose, the following Stein identity is useful:
$$E[(Z - \mu)h(Z)] = \sigma^2 E[h'(Z)], \tag{4.3}$$
where $Z$ is a random variable having $N(\mu, \sigma^2)$ and $h(\cdot)$ is an absolutely continuous function. By using the identity (4.3), the risk function of $\hat\theta(f)$ is written as
$$\begin{aligned}
R(\omega, \hat\theta(f)) &= p + E\left[\frac{(p-2)^2}{\|X\|^2}\,\frac{(Sf)^2}{\sigma^2} - \frac{2(p-2)}{\|X\|^2}\,\frac{X'(X-\theta)Sf}{\sigma^2}\right]\\
&= p + E\left[\frac{(p-2)^2}{\|X\|^2}\left\{\frac{(Sf)^2}{\sigma^2} - 2Sf\right\}\right]\\
&= p - E\left[\frac{(p-2)^2}{\|X\|^2}\,\sigma^2\right] + E\left[\frac{(p-2)^2}{\|X\|^2}\,\sigma^2\left(\frac{Sf}{\sigma^2} - 1\right)^2\right],
\end{aligned}$$
which, by the independence of $\|X\|^2$ and $(S, Y)$ together with (4.2), yields

Theorem 4.1. Let $p \ge 3$ and assume that (4.2) holds. Then the estimator $\hat\theta(f)$ given by (4.1) is better than $\hat\theta^{JS}$.

Theorem 4.1 implies that any improved estimator of the variance gives an improved version of the James-Stein rule. The usual choice for the function $f(\cdot)$ is $f(u) = \min\{(n+2)^{-1}, (n+q+2)^{-1}(1+u)\}$, giving
$$\hat\theta_1 = X - \frac{p-2}{\|X\|^2}\min\left\{\delta_0,\ \frac{S + \|Y\|^2}{n+q+2}\right\}X. \tag{4.4}$$
Other choices are the smooth function given by Brewster and Zidek (1974), several forms stated in the previous sections and so on. Theorem 4.1 is further extended in the next subsection.
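Condition (4.2) with this choice of $f$ is precisely Stein's (1964) domination result for the variance. As a sanity check, here is a hedged, seeded Monte Carlo sketch of our own (not the authors' experiment) comparing the two variance risks:

```python
import random

def variance_risks(n, q, xi_norm2=0.0, n_rep=100000, seed=11):
    # Monte Carlo check of condition (4.2) for
    # f(u) = min{(n+2)^{-1}, (n+q+2)^{-1}(1+u)}  (sigma^2 = 1):
    # compares E[(S f(||Y||^2/S) - 1)^2] with E[(delta0 - 1)^2] = 2/(n+2).
    rng = random.Random(seed)

    def chi2(df, noncent2=0.0):
        s = rng.gauss(noncent2 ** 0.5, 1.0) ** 2
        for _ in range(df - 1):
            s += rng.gauss(0.0, 1.0) ** 2
        return s

    risk0 = riskf = 0.0
    for _ in range(n_rep):
        S, y2 = chi2(n), chi2(q, xi_norm2)
        d0 = S / (n + 2)
        sf = min(d0, (S + y2) / (n + q + 2))   # S * f(||Y||^2 / S)
        risk0 += (d0 - 1.0) ** 2
        riskf += (sf - 1.0) ** 2
    return risk0 / n_rep, riskf / n_rep
```

The first returned value should be close to the exact risk $2/(n+2)$ of $\delta_0$, and the second strictly smaller, which is what Theorem 4.1 feeds on.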
4.2. Use of the statistics Y and X for estimation of the variance
In the estimation of the variance $\sigma^2$, we try to employ not only $Y$ but also $X$, and consider the estimator
$$\hat\theta(g) = \left(1 - \frac{(p-2)Sg(\|Y\|^2/S, \|X\|^2/S)}{\|X\|^2}\right)X, \tag{4.5}$$
where $g(\cdot, \cdot)$ is a positive-valued function. In this case, it should be noted that the estimator of the variance is not independent of the estimator $X$ for the mean. Assume that $g(u, v)$ is absolutely continuous with respect to $v$. Then from the Stein identity (4.3),
$$\begin{aligned}
R(\omega, \hat\theta(g)) &= p + E\left[\frac{(p-2)^2}{\|X\|^2}\,\frac{(Sg)^2}{\sigma^2} - \frac{2(p-2)}{\|X\|^2}\,\frac{X'(X-\theta)Sg}{\sigma^2}\right]\\
&= p - E\left[\frac{(p-2)^2}{\|X\|^2}\,\sigma^2\right] + (p-2)^2 E\left[\frac{\sigma^2}{\|X\|^2}\left(\frac{Sg}{\sigma^2} - 1\right)^2\right] - 4(p-2)E[g'_{(2)}],
\end{aligned} \tag{4.6}$$
where $g'_{(2)}(u, v) = (\partial/\partial v)g(u, v)$, evaluated at $(U, V) = (\|Y\|^2/S, \|X\|^2/S)$. Here we observe that for $n \ge 3$, the function minimizing the conditional expectation of $(\sigma^2/\|X\|^2)(Sg(U,V)/\sigma^2 - 1)^2$ given $(U, V) = (a, b)$ is
$$g_{\lambda,\tau}(a, b) = \frac{E[\chi^2_{n-2} \mid U=a, V=b]}{E[(\chi^2_{n-2})^2 \mid U=a, V=b]} = \frac{\int_0^\infty t^3 f_{n-2}(t) f_p(bt; \lambda) f_q(at; \tau)\,dt}{\int_0^\infty t^4 f_{n-2}(t) f_p(bt; \lambda) f_q(at; \tau)\,dt}, \tag{4.7}$$
where $f_p(x; \lambda)$ designates the density of $\chi^2_p(\lambda)$ and $f_{n-2}(x) = f_{n-2}(x; 0)$, namely the density of $\chi^2_{n-2}$. Since $f_p(x; \lambda)/f_p(x)$ is increasing in $x$, we have that
$$\begin{aligned}
\frac{\int_0^\infty t^3 f_{n-2}(t) f_p(bt; \lambda) f_q(at; \tau)\,dt}{\int_0^\infty t^4 f_{n-2}(t) f_p(bt; \lambda) f_q(at; \tau)\,dt}
&\le \frac{\int_0^\infty t^3 f_{n-2}(t) f_p(bt) f_q(at)\,dt}{\int_0^\infty t^4 f_{n-2}(t) f_p(bt) f_q(at)\,dt} \qquad (4.8)\\
&= \frac{\int_0^\infty t^{(n+p+q)/2-1} e^{-(1+a+b)t/2}\,dt}{\int_0^\infty t^{(n+p+q+2)/2-1} e^{-(1+a+b)t/2}\,dt} = \frac{1+a+b}{n+p+q}, \qquad (4.9)
\end{aligned}$$
which, together with (4.8), shows that
$$g_{\lambda,\tau}(u, v) \le \frac{1+u+v}{n+p+q}. \tag{4.10}$$
Hence, from (4.7), if we set
$$g^*(u, v) = \min\left\{g(u, v),\ \frac{1+u+v}{n+p+q}\right\}, \tag{4.11}$$
then for all $\omega$,
$$E\left[\frac{\sigma^2}{\|X\|^2}\left\{\frac{Sg^*(U, V)}{\sigma^2} - 1\right\}^2\right] \le E\left[\frac{\sigma^2}{\|X\|^2}\left\{\frac{Sg(U, V)}{\sigma^2} - 1\right\}^2\right]. \tag{4.12}$$
Therefore we get the following theorem.

Theorem 4.2. Let $n, p \ge 3$ and let $g^*(u, v)$ be given by (4.11). Assume the following conditions:
(a) $g(u, v)$ and $g^*(u, v)$ are absolutely continuous with respect to $v$;
(b) $E[(\partial/\partial v)\{g^*(U, v) - g(U, v)\}|_{v=V}] \ge 0$ for all $\omega$.
Then the estimator $\hat\theta(g^*)$ is better than $\hat\theta(g)$.

Corollary 4.3. Let $n, p \ge 3$ and put
$$\hat\theta_2 = X - \frac{p-2}{\|X\|^2}\min\left\{\delta_0,\ \frac{S + \|Y\|^2}{n+q+2},\ \frac{S + \|X\|^2 + \|Y\|^2}{n+p+q}\right\}X. \tag{4.13}$$
Then $\hat\theta_2$ dominates $\hat\theta_1$, being better than $\hat\theta^{JS}$.

In the case where the statistic $Y$ does not exist, the same arguments give

Corollary 4.4. Let $n, p \ge 3$ and put
$$\hat\theta_3 = X - \frac{p-2}{\|X\|^2}\min\left\{\delta_0,\ \frac{S + \|X\|^2}{n+p}\right\}X. \tag{4.14}$$
Then $\hat\theta_3$ is better than $\hat\theta^{JS}$.
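All of these mean estimators share one template: plug a variance estimate into the James-Stein shrinkage factor. A small deterministic sketch of our own (names ours; no positive-part correction, as in the paper):

```python
def shrink(X, var_est, p):
    # James-Stein-type step: X - (p - 2) * var_est / ||X||^2 * X
    x2 = sum(v * v for v in X)
    c = (p - 2) * var_est / x2
    return [v * (1.0 - c) for v in X]

def mean_estimators(X, Y, S, n):
    p, q = len(X), len(Y)
    x2 = sum(v * v for v in X)
    y2 = sum(v * v for v in Y)
    d0 = S / (n + 2)
    v1 = min(d0, (S + y2) / (n + q + 2))            # used in (4.4)
    v2 = min(v1, (S + x2 + y2) / (n + p + q))       # used in (4.13)
    v3 = min(d0, (S + x2) / (n + p))                # used in (4.14)
    return {"JS": shrink(X, d0, p), "t1": shrink(X, v1, p),
            "t2": shrink(X, v2, p), "t3": shrink(X, v3, p)}
```

Each successive estimator plugs a smaller (improved) variance estimate into the same shrinkage form, so it pulls $X$ toward the origin less aggressively whenever $\|Y\|^2$ (or $\|X\|^2$) indicates that $\delta_0$ overestimates $\sigma^2$.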
Table 4
The relative risk improvements in estimation of the mean vector for $p = 4$, $q = 2$ and $n = 1$ (in percents)

lambda            tau=0   tau=1   tau=2   tau=3   tau=4   tau=5
0     theta^JS    16.16   16.16   16.16   16.16   16.16   16.16
      theta_1     16.93   16.82   16.64   16.42   16.31   16.23
      theta_2     18.38   18.02   17.67   17.26   16.93   16.63
      theta_3     20.01   20.01   20.01   20.01   20.01   20.01
      theta_4     18.95   19.55   19.83   19.89   19.95   19.97
1     theta^JS    10.26   10.26   10.26   10.26   10.26   10.26
      theta_1     10.79   10.52   10.45   10.37   10.33   10.28
      theta_2     11.40   11.12   10.85   10.65   10.51   10.41
      theta_3     12.12   12.12   12.12   12.12   12.12   12.12
      theta_4     11.68   11.91   12.02   12.07   12.09   12.10
3     theta^JS     4.98    4.98    4.98    4.98    4.98    4.98
      theta_1      5.24    5.15    5.07    5.03    5.00    4.99
      theta_2      5.35    5.26    5.16    5.08    5.03    5.01
      theta_3      5.39    5.39    5.39    5.39    5.39    5.39
      theta_4      5.41    5.43    5.42    5.40    5.39    5.38
5     theta^JS     2.88    2.88    2.88    2.88    2.88    2.88
      theta_1      3.06    3.00    2.94    2.90    2.88    2.88
      theta_2      3.09    3.02    2.96    2.92    2.90    2.88
      theta_3      2.99    2.99    2.99    2.99    2.99    2.99
      theta_4      3.11    3.08    3.03    3.01    2.99    2.99
From the results given in Sections 2 and 3, one may consider the estimator
$$\hat\theta_4 = X - \frac{p-2}{\|X\|^2}\min\left\{\delta_0,\ \frac{S+\|X\|^2}{n+p},\ \frac{S+\|Y\|^2}{n+q+2},\ \frac{S+\|X\|^2+\|Y\|^2}{n+p+q}\right\}X. \tag{4.15}$$
For the five estimators $\hat\theta^{JS}$, $\hat\theta_1$, $\hat\theta_2$, $\hat\theta_3$ and $\hat\theta_4$ in the case of $p = 4$, $q = 2$ and $n = 1$, Table 4 provides the simulated values of the relative risk improvements, which are defined by
$$\mathrm{RRI}(\hat\theta) = 100 \times \{R(\omega, X) - R(\omega, \hat\theta)\}/R(\omega, X),$$
where $R(\omega, \hat\theta) = E[\|\hat\theta - \theta\|^2/\sigma^2]$. Table 4 reveals that
$$\hat\theta^{JS} \prec \hat\theta_1 \prec \hat\theta_2 \prec \hat\theta_4,$$
where $\delta \prec \delta^*$ means that $\delta^*$ is better than $\delta$. Also, if $\lambda$ is small, then $\hat\theta_3$ is desirable, and if $\lambda$ is large, the estimator $\hat\theta_4$ is a good choice. The above relations between the estimators in Table 4 suggest that Theorem 4.2 and Corollaries 4.3 and 4.4 hold without the condition $n \ge 3$. It is also expected that the relative risk improvements are getting greater for larger dimension $p$ or $q$.

Remark 4.1. Table 4 reveals that $\hat\theta_3$ has a substantially smaller risk when $\lambda$ is small, or $\theta$ is close to the origin. Also, a simulation result we tried for $n = 10$, $p = 4$ and $q = 2$ shows that the maximum relative risk improvement for $\hat\theta_3$ is 44.75 percents at $\lambda = 0$.
Thereby $\hat\theta_3$, shrunken toward the origin, is a good estimator when it is known that $\theta$ is around the origin. In general, based on vague prior information as to $\theta$, there are cases where $\theta$ may be assumed to be close to a linear subspace $V \subset \mathbb{R}^p$ of dimension $\dim V = r$. Sclove, Morris and Radhakrishnan (1972) considered shrinking $X$ toward the subspace $V$. Let $P_V X$ denote the projection of $X$ onto $V$. The resulting shrinkage estimator is given by
$$\hat\theta_S(V) = X - \frac{p-r-2}{\|X - P_V X\|^2}\min\left\{\delta_0,\ \frac{S + \|X - P_V X\|^2}{n+p-r}\right\}(X - P_V X), \tag{4.16}$$
which dominates $X$ for $p - r \ge 3$ and presents, of course, a great risk gain for $\theta \in V$. For instance, if $V$ is thought to be $\{v\colon v = \mu e\}$ for an unknown scalar $\mu$ and $e = (1, \ldots, 1)'$, then $\hat\theta_S(V)$ is the Lindley (1962) type estimator
$$\hat\theta_L = X - \frac{p-3}{\|X - \bar{X}e\|^2}\min\left\{\delta_0,\ \frac{S + \|X - \bar{X}e\|^2}{n+p-1}\right\}(X - \bar{X}e),$$
where $\bar{X} = \sum_{i=1}^p X_i/p$ (George, 1986b). George (1986a) further generalized the shrinkage estimators to the cases where several subspace targets $V_1, \ldots, V_k$ are taken, and considered shrinking $X$ towards $V_1, \ldots, V_k$. Such procedures are called multiple shrinkage estimators, which may also be considered for (4.16).

Remark 4.2. The same way of thinking as in Remark 4.1 is applicable to the estimation of the variance. Let $W$ be a linear subspace of $\mathbb{R}^q$ with $\dim W = t$. If $\xi$ is guessed to be close to
$W$, then the estimator $\delta_S(W)$ shrunken toward $W$ is proposed as
$$\delta_S(W) = \min\left\{\delta_0,\ \frac{S + \|Y - P_W Y\|^2}{n+q-t+2}\right\}, \tag{4.17}$$
which is better than $\delta_0$, and the maximum improvement is attained at $\|\xi - P_W \xi\|^2/(2\sigma^2) = 0$. For instance, putting $W = \{w\colon w = \mu e\}$ yields $\min\{\delta_0, (S + \|Y - \bar{Y}e\|^2)/(n+q+1)\}$ for $q \ge 2$. When several subspace targets $W_1, \ldots, W_k$ are chosen, a combined shrinkage estimator $\sum_{i=1}^k p_i\,\delta_S(W_i)$ may be considered as one of the improved procedures, where the $p_i$'s are positive constants satisfying $\sum_{i=1}^k p_i = 1$. If both $\theta$ and $\xi$ are thought to be close to guessed subspaces, an appropriate combination of (4.16) and (4.17) will provide a substantial improvement.

We conclude this section with examples.
Example 4.1 (one way analysis of variance). Consider the one way layout with $X_{ij}$ ($i = 1, \ldots, m$; $j = 1, 2$) independent normal variables with means $\mu_i$ and variances $\sigma^2$ (Cox and Hinkley, 1974, p. 17). Let $\bar{X}_i = \frac{1}{2}(X_{i1} + X_{i2})$ for $i = 1, \ldots, m$ and $S = \frac{1}{2}\sum_i\sum_j (X_{ij} - \bar{X}_i)^2$. Then $X = (\bar{X}_1, \ldots, \bar{X}_m)'$ and $S$ are independent random variables such that
$$X \sim N_m(\mu, (\sigma^2/2)I_m), \quad S \sim (\sigma^2/2)\chi^2_m,$$
where $\mu = (\mu_1, \ldots, \mu_m)'$. This is the situation where the statistic $Y$ does not exist. When we want to estimate $\mu$ for $m \ge 3$, by the Stein effect, $X$ can be improved on by the James-Stein rule. However, the estimator $\hat\theta^{JS}$ has a disturbance to a certain extent since the degrees of freedom in $S$ are small. In this case it seems meaningful to modify $\hat\theta^{JS}$ by using the statistic $X$, so that employing the estimator $\hat\theta_3$ may be better.

In some practical applications, the treatment contrasts $\mu_2 - \mu_1, \ldots, \mu_m - \mu_1$ are the parameters of interest. For example, this is the situation where $\mu_1$ is the effect of a control treatment, $\mu_2, \ldots, \mu_m$ are the effects of new competing treatments, and we want to know the treatment differences $\mu_i - \mu_1$. Let $Z = (\bar{X}_2 - \bar{X}_1, \ldots, \bar{X}_m - \bar{X}_1)'$, $Y = (2/m)^{1/2}(\bar{X}_1 + \cdots + \bar{X}_m)$, $\theta = (\mu_2 - \mu_1, \ldots, \mu_m - \mu_1)'$ and $\xi = (2/m)^{1/2}(\mu_1 + \cdots + \mu_m)$. Then $Z$ and $Y$ are independent variables such that
$$Z \sim N_{m-1}(\theta, \sigma^2 I_{m-1}), \quad Y \sim N(\xi, \sigma^2),$$
which is the model given in Section 1.

Example 4.2 (balanced incomplete block designs). Consider a balanced incomplete block design (BIBD)
$$x_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij} \quad \text{for } (i, j) \in D,$$
where $D$ is defined by a BIBD, $\alpha_i$ ($i = 1, \ldots, t$) and $\beta_j$ ($j = 1, \ldots, b$) are fixed effects of treatments and blocks, respectively, such that $\sum \alpha_i = \sum \beta_j = 0$, and the $\varepsilon_{ij}$ are independent random errors following $N(0, \sigma^2)$. Let $r = \#$ (the number) of replications and $k = \#$ of cells per block. Then Hirotsu (1976) derived the canonical form as
$$X \sim N_{t-1}(\theta, \sigma^2 I_{t-1}), \quad Y \sim N_b(\xi, \sigma^2 I_b), \quad S \sim \sigma^2\chi^2_{bk-b-t+1},$$
where $X$, $Y$, $S$ are minimal sufficient and mutually independent and $\theta$ corresponds to a vector of treatment contrasts. Here $\theta$ is of interest and $(\xi, \sigma^2)$ are nuisance. This is the model treated in the present paper. For the designs $(r, t, b, k) = (3, 4, 6, 2), (4, 5, 10, 2)$, we have $(t-1, b, bk-b-t+1) = (3, 6, 3), (4, 10, 6)$, respectively. In these cases, the dimension of $Y$ is greater than the degrees of freedom in $S$ and the information contained in $Y$ is available for improvement in the estimation of $\theta$.
Appendix

Proof of Proposition 2.1. Let
$$v = \frac{S + \|X\|^2 + \|Y\|^2}{\sigma^2}, \quad w_1 = \frac{S}{S + \|X\|^2} \quad \text{and} \quad w_2 = \frac{S + \|X\|^2}{S + \|X\|^2 + \|Y\|^2}.$$
Then the random variables $v$, $w_1$ and $w_2$ are, given $J$ and $K$, conditionally mutually independent and
$$v \sim \chi^2_{n+p+q+2J+2K}, \quad w_1 \sim \mathrm{beta}\left(\frac{n}{2}, \frac{p+2J}{2}\right) \quad \text{and} \quad w_2 \sim \mathrm{beta}\left(\frac{n+p+2J}{2}, \frac{q+2K}{2}\right).$$
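This beta/chi-square decomposition can be sanity-checked in the central case $J = K = 0$ (i.e. $\theta = \xi = 0$) by comparing empirical first moments with the beta means; the following seeded sketch is ours, not part of the proof:

```python
import random

def beta_moment_check(n, p, q, n_rep=200000, seed=3):
    # With theta = xi = 0 (J = K = 0): w1 = S/(S + ||X||^2) should have mean
    # n/(n+p), and w2 = (S + ||X||^2)/(S + ||X||^2 + ||Y||^2) should have mean
    # (n+p)/(n+p+q), matching beta(n/2, p/2) and beta((n+p)/2, q/2).
    rng = random.Random(seed)

    def chi2(df):
        return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

    m1 = m2 = 0.0
    for _ in range(n_rep):
        S, x2, y2 = chi2(n), chi2(p), chi2(q)
        m1 += S / (S + x2)
        m2 += (S + x2) / (S + x2 + y2)
    return m1 / n_rep, m2 / n_rep
```

This only checks first moments; the full conditional-independence structure is what the proofs below actually use.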
Proof of (2.2). Using the random variables $v$, $w_1$ and $w_2$, we express $\delta_1/\sigma^2$ as
$$\frac{\delta_1}{\sigma^2} = \min\left\{\frac{w_1w_2 v}{n+2}, \frac{v}{n+p+q+2}\right\} = \frac{w_1w_2 v}{n+2}I_{11} + \frac{v}{n+p+q+2}I_{12}, \tag{A.1}$$
where $I_{11} = I\{w_1w_2 \ge (n+2)/(n+p+q+2)\}$ and $I_{12} = 1 - I_{11}$. Letting $T_n = n\{w_1w_2 - (n+2)/(n+p+q+2)\}$, we can see that $T_n \to u_1 + u_2$ a.s. as $n \to \infty$. Note that $\delta_0/\sigma^2 = (n+2)^{-1}w_1w_2v(I_{11} + I_{12})$, and that given $J$ and $K$, $v$ is conditionally independent of $(w_1, w_2)$. Since $w_1w_2 = T_n/n + (n+2)/(n+p+q+2)$,
$$\begin{aligned}
R(\omega, \delta_0) - R(\omega, \delta_1) &= E\left[\left\{\left(\frac{\delta_0}{\sigma^2}-1\right)^2 - \left(\frac{v}{n+p+q+2}-1\right)^2\right\}I_{12}\right]\\
&= E\left[\left\{\frac{T_n^2 v^2}{n^2(n+2)^2} + \frac{2T_n v}{n(n+2)}\left(\frac{v}{n+p+q+2}-1\right)\right\}I_{12}\right]\\
&= n^{-2}E[\{T_n^2 + 4(J+K)T_n\}I_{12}] + o(n^{-2}),
\end{aligned} \tag{A.2}$$
where the last equality uses $E[v^2 \mid J, K] = (n+p+q+2J+2K)(n+p+q+2J+2K+2)$, which yields (2.2) by noting that $I_{12} \to I(u_1 + u_2 \ge 0)$ a.s. as $n \to \infty$.
Proof of (2.3). Similar to (A.1), we express $\delta_2/\sigma^2$ as
$$\frac{\delta_2}{\sigma^2} = \frac{w_1w_2 v}{n+2}I_{21} + \frac{w_2 v}{n+p+2}I_{22} + \frac{v}{n+p+q+2}I_{23},$$
where
$$\begin{aligned}
I_{21} &= I\{w_1 \le (n+2)/(n+p+2),\ w_1w_2 \le (n+2)/(n+p+q+2)\},\\
I_{22} &= I\{w_1 > (n+2)/(n+p+2),\ w_2 \le (n+p+2)/(n+p+q+2)\},\\
I_{23} &= I\{w_1w_2 > (n+2)/(n+p+q+2),\ w_2 > (n+p+2)/(n+p+q+2)\}.
\end{aligned}$$
Note that $I_{21} \to I(u_1 \le 0, u_1+u_2 \le 0)$, $I_{22} \to I(u_1 \ge 0, u_2 < 0)$ and $I_{23} \to I(u_1+u_2 > 0, u_2 \ge 0)$ a.s. as $n \to \infty$. Letting $U_n = n\{w_2 - (n+p+2)/(n+p+q+2)\}$, we can see that $U_n \to u_2$ a.s. as $n \to \infty$. By the same arguments as in (A.2),
$$R(\omega, \delta_0) - R(\omega, \delta_2) = n^{-2}E[(T_n - U_n)(T_n + U_n + 4J + 4K)I_{22} + T_n(T_n + 4J + 4K)I_{23}] + o(n^{-2}),$$
which shows (2.3).

Proof of (2.4). Write $\delta_3/\sigma^2$ as
$$\frac{\delta_3}{\sigma^2} = \frac{w_1w_2 v}{n+2}I_{31} + \frac{w_2 v}{n+p+2}I_{32} + \frac{(w_1w_2+1-w_2)v}{n+q+2}I_{33} + \frac{v}{n+p+q+2}I_{34},$$
where, noting that $(S + \|Y\|^2)/\sigma^2 = (w_1w_2 + 1 - w_2)v$,
$$\begin{aligned}
I_{31} &= I\{w_1 \le (n+2)/(n+p+2),\ qw_1w_2 \le (n+2)(1-w_2),\ w_1w_2 \le (n+2)/(n+p+q+2)\},\\
I_{32} &= I\{w_1 > (n+2)/(n+p+2),\ w_2 \le \tfrac{n+p+2}{n+q+2}(w_1w_2+1-w_2),\ w_2 \le (n+p+2)/(n+p+q+2)\},\\
I_{33} &= I\{qw_1w_2 > (n+2)(1-w_2),\ w_2 > \tfrac{n+p+2}{n+q+2}(w_1w_2+1-w_2),\ w_2(1-w_1) \ge p/(n+p+q+2)\},\\
I_{34} &= I\{w_1w_2 > (n+2)/(n+p+q+2),\ w_2 > (n+p+2)/(n+p+q+2),\ w_2(1-w_1) < p/(n+p+q+2)\}.
\end{aligned}$$
Note that $I_{31} \to I(u_1 \le 0, u_2 \le 0)$, $I_{32} \to I(u_1 \ge 0, u_2 < 0)$, $I_{33} \to I(u_1 < 0, u_2 \ge 0)$ and $I_{34} \to I(u_1 > 0, u_2 > 0)$ a.s. as $n \to \infty$. Letting $V_n = n\{w_1w_2 + 1 - w_2 - (n+q+2)/(n+p+q+2)\}$, we see that $V_n \to u_1$ a.s. as $n \to \infty$. Similar to (A.2),
$$R(\omega, \delta_0) - R(\omega, \delta_3) = n^{-2}E[(T_n - U_n)(T_n + U_n + 4J + 4K)I_{32} + (T_n - V_n)(T_n + V_n + 4J + 4K)I_{33} + T_n(T_n + 4J + 4K)I_{34}] + o(n^{-2}),$$
which establishes (2.4).
Acknowledgements

The authors are grateful to the Editor, an Associate Editor and the referees for their valuable comments and helpful suggestions.
References

Brewster, J.F. and J.V. Zidek (1974). Improving on equivariant estimators. Ann. Statist. 2, 21-38.
Cox, D.R. and D.V. Hinkley (1974). Theoretical Statistics. Chapman and Hall, London.
Fushimi, M. (1989). Random Numbers. Univ. of Tokyo Press, Tokyo (in Japanese).
Gelfand, A.E. and D.K. Dey (1988). Improved estimation of the disturbance variance in a linear regression model. J. Econometrics 39, 387-395.
George, E.I. (1986a). Minimax multiple shrinkage estimation. Ann. Statist. 14, 188-205.
George, E.I. (1986b). Combining minimax shrinkage estimators. J. Amer. Statist. Assoc. 81, 437-445.
George, E.I. (1990). Comment on the paper by J.M. Maatta and G. Casella. Statist. Science 5, 90-120.
Hirotsu, C. (1976). Analysis of Variance. Kyoiku Shuppan, Tokyo (in Japanese).
James, W. and C. Stein (1961). Estimation with quadratic loss. Proc. Fourth Berkeley Symp. Math. Statist. Probab. 1, 361-379, Univ. California Press.
Lindley, D.V. (1962). Contribution to the discussion of the paper by C.M. Stein. J. Roy. Statist. Soc. Ser. B 24, 285-287.
Maatta, J.M. and G. Casella (1990). Developments in decision-theoretic variance estimation. Statist. Science 5, 90-120.
Sclove, S.L., C. Morris and R. Radhakrishnan (1972). Nonoptimality of preliminary-test estimators for the mean of a multivariate normal distribution. Ann. Math. Statist. 43, 1481-1490.
Sinha, B.K. (1976). On improved estimators of the generalized variance. J. Mult. Anal. 6, 617-625.
Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proc. Third Berkeley Symp. Math. Statist. Probab. 1, 197-206, Univ. California Press.
Stein, C. (1964). Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Ann. Inst. Statist. Math. 16, 155-160.