Statistics & Probability Letters 7 (1989) 97-103, North-Holland

DOMINANCE OF THE POSITIVE-PART VERSION OF THE JAMES-STEIN ESTIMATOR

David M. Nickerson
Department of Statistics, University of Georgia, Athens, GA 30602, USA

Received August 1987; revised February 1988

Research partially supported by University of Georgia Research Foundation Grant Program.

Abstract: James-Stein estimators are shown to be dominated by their positive-part versions when estimating the mean of a $p$-variate normal distribution ($p \geq 3$) whose covariance matrix is known up to a constant. A Monte Carlo study is undertaken to compare their risks.

Keywords: estimation, James-Stein estimator, positive-part rule.

1. Introduction and summary

Let $X_1, X_2, \ldots$ be a sequence of i.i.d. $N(\theta, \sigma^2 V)$ random vectors, where $\theta \in \mathbb{R}^p$ and $\sigma^2 \in (0, \infty)$ are unknown ($p \geq 3$), while $V$ ($p \times p$) is a known positive definite matrix. Based on $(X_1, \ldots, X_n)$, if the estimator $\delta_n = \delta_n(X_1, \ldots, X_n)$ is used for estimating $\theta$, suppose the loss incurred is given by
$$L(\theta, \delta_n) = (\delta_n - \theta)' Q (\delta_n - \theta), \tag{1.1}$$

where $Q$ denotes a known positive definite matrix. It is well known that James-Stein estimators for this case dominate the sample mean $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$ with respect to risk (expected loss) under the loss (1.1). This is demonstrated in James and Stein (1961) when $Q = V = I_p$ and $\sigma^2$ is known, in Berger (1976) when $\sigma^2$ is known, and in Efron and Morris (1976) when $Q = V = I_p$ but $\sigma^2$ is unknown. For a general class of estimators dominating the sample mean under loss (1.1) one may refer to Baranchik (1970), Strawderman (1971), Efron and Morris (1976) and Berger (1976). However, what is widely accepted, but as yet not explicitly proven, is the belief that the positive-part versions of the James-Stein estimators for this case dominate the usual James-Stein estimators and hence the sample mean. The purpose of the present paper is to prove this assertion for the general class of estimators exhibited in Hudson (1974) and Berger (1976).

In Section 2 we exhibit the class of James-Stein estimators found in Hudson (1974) and Berger (1976) and its positive-part counterpart. Next we state the result concerning the dominance of the class of James-Stein estimators over the sample mean under the loss (1.1). We then give the main result of this paper, which concerns the dominance of the class of positive-part versions over the class of James-Stein estimators under the loss (1.1). Section 3 contains Monte Carlo simulations comparing the risks of the sample mean, the James-Stein estimators and their positive-part versions. As expected, the positive-part versions achieve greater risk reduction than the usual James-Stein estimators.
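As a point of reference for what follows, the risk of the sample mean under (1.1) is the standard quantity $n^{-1}\sigma^2\,\mathrm{tr}(QV)$, which is easy to check by simulation. A minimal sketch, with hypothetical choices of $Q$, $V$, and $\theta$ for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, sigma2 = 4, 10, 1.0
theta = rng.normal(size=p)
A = rng.normal(size=(p, p)); V = A @ A.T + p * np.eye(p)   # hypothetical positive definite V
B = rng.normal(size=(p, p)); Q = B @ B.T + p * np.eye(p)   # hypothetical positive definite Q
C = np.linalg.cholesky(sigma2 * V)                          # so rows of noise have cov sigma^2 V

# Monte Carlo risk of the sample mean under the quadratic loss (1.1)
losses = []
for _ in range(20_000):
    X = theta + rng.normal(size=(n, p)) @ C.T
    e = X.mean(axis=0) - theta
    losses.append(e @ Q @ e)

print(np.mean(losses), sigma2 * np.trace(Q @ V) / n)  # the two should agree
```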


2. The main result

From Hudson (1974) and Berger (1976), for a sample of size $n$ from the aforementioned distribution, define the class of James-Stein estimators as
$$\delta_n^{\mathrm{JS}} = \delta_n^{\mathrm{JS}}(X_1, \ldots, X_n) = \lambda + \left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)(\bar{X}_n - \lambda), \tag{2.1}$$

where $\lambda \in \mathbb{R}^p$ is a preassigned constant (usually taken as the prior mean), $I_p$ is the identity matrix of order $p$,
$$F_n = n(\bar{X}_n - \lambda)' V^{-1} Q^{-1} V^{-1} (\bar{X}_n - \lambda) / s_n^2,$$
$$s_n^2 = [(n-1)p + 2]^{-1} \sum_{j=1}^n (X_j - \bar{X}_n)' V^{-1} (X_j - \bar{X}_n)$$
for every $n \geq 2$, and $r$ is a function which will satisfy certain conditions. From the class of James-Stein estimators defined in (2.1) we define the class of positive-part James-Stein estimators as

$$\delta_n^+ = \delta_n^+(X_1, \ldots, X_n) = \lambda + \left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)^+ (\bar{X}_n - \lambda), \tag{2.2}$$

where
$$\left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)^+ = P^{-1} \operatorname{diag}\!\left(\left(1 - \frac{r(F_n)}{F_n} d_1^{-1}\right)^+, \ldots, \left(1 - \frac{r(F_n)}{F_n} d_p^{-1}\right)^+\right) P, \tag{2.3}$$

$(a)^+ = \max(0, a)$, and $P$ is a nonsingular matrix such that $P Q^{-1} P' = I_p$ and $P V P' = D = \operatorname{diag}(d_1, \ldots, d_p)$. (The existence of such a matrix is given by the simultaneous diagonalization theorem.)
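The estimators (2.1)-(2.3) are straightforward to compute once $P$ is obtained, e.g., from a Cholesky factor of $Q^{-1}$ and an eigendecomposition. Below is a minimal NumPy sketch; the function name is ours, and the default $r(F_n) = p - 2$ is only one illustrative constant choice (it satisfies condition (i) of Theorem 2.1 below), not a prescription from the paper.

```python
import numpy as np

def js_estimators(X, Q, V, lam, r=None):
    """James-Stein estimator (2.1) and its positive-part version (2.2)-(2.3).

    X: (n, p) sample from N(theta, sigma^2 V); Q, V: known positive
    definite p x p matrices; lam: shrinkage target lambda; r: function
    of F_n (defaults to the constant r(F) = p - 2 for illustration).
    """
    n, p = X.shape
    if r is None:
        r = lambda F: p - 2.0
    Xbar = X.mean(axis=0)
    Vinv = np.linalg.inv(V)
    Qinv = np.linalg.inv(Q)
    resid = X - Xbar
    # s_n^2 = [(n-1)p + 2]^{-1} sum_j (X_j - Xbar)' V^{-1} (X_j - Xbar)
    s2 = np.einsum('ji,ik,jk->', resid, Vinv, resid) / ((n - 1) * p + 2)
    d = Xbar - lam
    # F_n = n (Xbar - lam)' V^{-1} Q^{-1} V^{-1} (Xbar - lam) / s_n^2
    F = n * d @ Vinv @ Qinv @ Vinv @ d / s2
    # Simultaneous diagonalization: P Q^{-1} P' = I_p and P V P' = D
    L = np.linalg.cholesky(Qinv)
    Linv = np.linalg.inv(L)
    dvals, U = np.linalg.eigh(Linv @ V @ Linv.T)
    P = U.T @ Linv
    # (2.1): delta^JS = lam + (I - r(F)/F * Q^{-1} V^{-1}) (Xbar - lam)
    delta_js = lam + d - (r(F) / F) * (Qinv @ Vinv @ d)
    # (2.2)-(2.3): clip each diagonal shrinkage factor at zero
    factors = np.maximum(1.0 - (r(F) / F) / dvals, 0.0)
    delta_plus = lam + np.linalg.inv(P) @ (factors * (P @ d))
    return delta_js, delta_plus
```

When $Q = V = I_p$ this reduces to the familiar coordinatewise rule $\lambda + (1 - r(F_n)/F_n)^+ (\bar{X}_n - \lambda)$.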

This brings us to the following theorem.

Theorem 2.1. Consider the loss function (1.1). Assume that for every $n \geq 2$, $r$ is absolutely continuous and
(i) $0 < r(F_n) < 2(p-2)$;
(ii) $\phi_n(F_n) = F_n^{(p-2)/2} r(F_n) / (2(p-2) - r(F_n))^{1+2b_n}$ is nondecreasing in $F_n$, where $b_n = (p-2)/((n-1)p + 2)$.
Then $\delta_n^{\mathrm{JS}}$ dominates $\bar{X}_n$.

Proof. The proof is similar to the arguments found in Efron and Morris (1976).

At this stage we now know that $\delta_n^{\mathrm{JS}}$ dominates $\bar{X}_n$ with respect to risk under the loss (1.1). We next prove the dominance of $\delta_n^+$ over $\delta_n^{\mathrm{JS}}$.

Theorem 2.2. Consider the loss function (1.1). Assume that for every $n \geq 2$, $r$ is absolutely continuous. Then $\delta_n^+$ dominates $\delta_n^{\mathrm{JS}}$.

Proof. Note that

$$\begin{aligned}
R(\theta, \sigma^2, \delta_n^{\mathrm{JS}}) &= (\theta - \lambda)' Q (\theta - \lambda) - 2E\left[(\theta - \lambda)' Q \left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)(\bar{X}_n - \lambda)\right] \\
&\qquad + E\left[(\bar{X}_n - \lambda)' \left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)' Q \left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)(\bar{X}_n - \lambda)\right]. \tag{2.4}
\end{aligned}$$


Now, using the existence of a nonsingular matrix P as previously encountered in equation (2.3), we have

$$\begin{aligned}
E\left[(\theta - \lambda)' Q \left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)(\bar{X}_n - \lambda)\right]
&= E\left[(P(\theta - \lambda))' (PQ^{-1}P')^{-1} \left(I_p - \frac{r(F_n)}{F_n} (PQ^{-1}P')(PVP')^{-1}\right) P(\bar{X}_n - \lambda)\right] \\
&= E\left[(\mu - \gamma)' \left(I_p - \frac{r(F_n)}{F_n} D^{-1}\right)(\bar{Z}_n - \gamma)\right], \tag{2.5}
\end{aligned}$$

where $Z_j = PX_j$, $\bar{Z}_n = P\bar{X}_n$, $\mu = P\theta$, $\gamma = P\lambda$,
$$F_n = n(P(\bar{X}_n - \lambda))' (PVP')^{-1} (PQ^{-1}P') (PVP')^{-1} (P(\bar{X}_n - \lambda)) / s_n^2 = n(\bar{Z}_n - \gamma)' D^{-2} (\bar{Z}_n - \gamma)/s_n^2,$$
and
$$s_n^2 = [(n-1)p + 2]^{-1} \sum_{j=1}^n (P(X_j - \bar{X}_n))' (PVP')^{-1} (P(X_j - \bar{X}_n)) = [(n-1)p + 2]^{-1} \sum_{j=1}^n (Z_j - \bar{Z}_n)' D^{-1} (Z_j - \bar{Z}_n).$$

Next, using the Helmert orthogonal transformation, define $Y_2 = (Z_1 - Z_2)/\sqrt{2}$, $Y_3 = (Z_1 + Z_2 - 2Z_3)/\sqrt{6}, \ldots$. Then the $Y_j$'s are i.i.d. $N(0, \sigma^2 D)$ and $\bar{Z}_n$ is distributed independently of $(Y_2, \ldots, Y_n)$ for every $n \geq 2$. Further,
$$s_n^2 = [(n-1)p + 2]^{-1} \sum_{j=2}^n Y_j' D^{-1} Y_j.$$
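The Helmert step is easy to sanity-check numerically: the general row $Y_k = (Z_1 + \cdots + Z_{k-1} - (k-1)Z_k)/\sqrt{k(k-1)}$, $k = 2, \ldots, n$, matches the $Y_2$, $Y_3$ given above, and the rows reproduce the centered sum of quadratic forms exactly. A small sketch ($D$ here is an arbitrary stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 4
Z = rng.normal(size=(n, p))                     # stand-ins for Z_1, ..., Z_n
Dinv = np.diag(1.0 / np.arange(1.0, p + 1.0))   # a hypothetical D^{-1}

# Helmert rows Y_2, ..., Y_n
Y = np.array([(Z[:k - 1].sum(axis=0) - (k - 1) * Z[k - 1]) / np.sqrt(k * (k - 1))
              for k in range(2, n + 1)])

Zbar = Z.mean(axis=0)
lhs = sum(y @ Dinv @ y for y in Y)                     # sum_{j>=2} Y_j' D^{-1} Y_j
rhs = sum((z - Zbar) @ Dinv @ (z - Zbar) for z in Z)   # sum_j (Z_j - Zbar)' D^{-1} (Z_j - Zbar)
assert np.isclose(lhs, rhs)                            # both yield the same s_n^2
```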

Let $\beta_n$ denote the $\sigma$-algebra generated by $(Y_2, \ldots, Y_n)$. Then we have
$$E\left[(\mu - \gamma)' \left(I_p - \frac{r(F_n)}{F_n} D^{-1}\right)(\bar{Z}_n - \gamma)\right] = E\left\{E\left[(\mu - \gamma)' \left(I_p - \frac{r(F_n)}{F_n} D^{-1}\right)(\bar{Z}_n - \gamma) \,\middle|\, \beta_n\right]\right\}. \tag{2.6}$$

Now,
$$E\left[(\mu - \gamma)' \left(I_p - \frac{r(F_n)}{F_n} D^{-1}\right)(\bar{Z}_n - \gamma) \,\middle|\, \beta_n\right] = \sum_{i=1}^p n^{-1}\sigma^2 d_i\, E\left[\delta_i W_{ni}\left(1 - \frac{r(F_n)}{F_n} d_i^{-1}\right) \,\middle|\, \beta_n\right], \tag{2.7}$$

where $\delta_i = (\mu_i - \gamma_i)(n^{-1}\sigma^2 d_i)^{-1/2}$, $W_{ni} = (\bar{Z}_{ni} - \gamma_i)(n^{-1}\sigma^2 d_i)^{-1/2}$, the $W_{ni}$'s are independent with $W_{ni} \sim N(\delta_i, 1)$, and

$$F_n = \sigma^2 \sum_{j=1}^p W_{nj}^2 d_j^{-1} \Big/ s_n^2.$$

Further, for each $i = 1, \ldots, p$, we have
$$E\left[\delta_i W_{ni}\left(1 - \frac{r(F_n)}{F_n} d_i^{-1}\right) \,\middle|\, \beta_n\right] = \delta_i \int \cdots \int w_i \left(1 - \frac{r(F_n(w))}{F_n(w)} d_i^{-1}\right)(2\pi)^{-p/2} \exp\left[-\tfrac{1}{2}\sum_{j=1}^p (w_j - \delta_j)^2\right] \mathrm{d}w_1 \cdots \mathrm{d}w_p, \tag{2.8}$$
where $F_n(w) = \sigma^2 \sum_{j=1}^p w_j^2 d_j^{-1}/s_n^2$. Here, to obtain the equality in equation (2.8) we use the independence of $\bar{Z}_n$ and $(Y_2, \ldots, Y_n)$, together with the fact that $W_{ni}$ is a function of $\bar{Z}_{ni}$, $i = 1, \ldots, p$, while $s_n^2$ is a function of $(Y_2, \ldots, Y_n)$. Next, since $W_{ni}^2 \sim \chi_1^2(\delta_i^2)$, whose density is the Poisson$(\delta_i^2/2)$ mixture $\sum_{k_i=0}^\infty e^{-\delta_i^2/2}(\delta_i^2/2)^{k_i}/k_i!\, f_i$, with $f_i(x)$ the density of a central $\chi^2$ random variable with $1 + 2k_i$ degrees of freedom, we have, for each $i = 1, \ldots, p$, writing $f_n = \sigma^2 \sum_{j=1}^p x_j d_j^{-1}/s_n^2$,

$$\begin{aligned}
\exp(\delta_i^2/2)\, E\left[\left(1 - \frac{r(F_n)}{F_n} d_i^{-1}\right) W_{ni} \,\middle|\, \beta_n\right]
&= \frac{\partial}{\partial \delta_i}\left\{\exp(\delta_i^2/2) \int \cdots \int \left(1 - \frac{r(f_n)}{f_n} d_i^{-1}\right) \prod_{j=1}^p \left[\sum_{k_j=0}^\infty \frac{\exp(-\delta_j^2/2)(\delta_j^2/2)^{k_j}}{k_j!} f_j(x_j)\, \mathrm{d}x_j\right]\right\} \\
&= 2\delta_i^{-1} \exp(\delta_i^2/2) \int \cdots \int \left(1 - \frac{r(f_n)}{f_n} d_i^{-1}\right) \prod_{j \neq i} \left[\sum_{k_j=0}^\infty \frac{\exp(-\delta_j^2/2)(\delta_j^2/2)^{k_j}}{k_j!} f_j(x_j)\, \mathrm{d}x_j\right] \\
&\qquad \times \sum_{k_i=0}^\infty k_i \frac{\exp(-\delta_i^2/2)(\delta_i^2/2)^{k_i}}{k_i!} f_i(x_i)\, \mathrm{d}x_i. \tag{2.9}
\end{aligned}$$

Therefore, by combining equations (2.7)-(2.9) we have
$$\begin{aligned}
E\left[(\mu - \gamma)' \left(I_p - \frac{r(F_n)}{F_n} D^{-1}\right)(\bar{Z}_n - \gamma) \,\middle|\, \beta_n\right]
&= 2n^{-1}\sigma^2 \sum_{i=1}^p d_i \int \cdots \int \left(1 - \frac{r(f_n)}{f_n} d_i^{-1}\right) \prod_{j \neq i}\left[\sum_{k_j=0}^\infty \frac{\exp(-\delta_j^2/2)(\delta_j^2/2)^{k_j}}{k_j!} f_j(x_j)\, \mathrm{d}x_j\right] \\
&\qquad \times \sum_{k_i=0}^\infty k_i \frac{\exp(-\delta_i^2/2)(\delta_i^2/2)^{k_i}}{k_i!} f_i(x_i)\, \mathrm{d}x_i \\
&\leq 2n^{-1}\sigma^2 \sum_{i=1}^p d_i \int \cdots \int \left(1 - \frac{r(f_n)}{f_n} d_i^{-1}\right)^+ \prod_{j \neq i}\left[\sum_{k_j=0}^\infty \frac{\exp(-\delta_j^2/2)(\delta_j^2/2)^{k_j}}{k_j!} f_j(x_j)\, \mathrm{d}x_j\right] \\
&\qquad \times \sum_{k_i=0}^\infty k_i \frac{\exp(-\delta_i^2/2)(\delta_i^2/2)^{k_i}}{k_i!} f_i(x_i)\, \mathrm{d}x_i \\
&= E\left[(\mu - \gamma)' \operatorname{diag}\!\left(\left(1 - \frac{r(F_n)}{F_n} d_1^{-1}\right)^+, \ldots, \left(1 - \frac{r(F_n)}{F_n} d_p^{-1}\right)^+\right)(\bar{Z}_n - \gamma) \,\middle|\, \beta_n\right], \tag{2.10}
\end{aligned}$$
the inequality holding because every term in the mixture representation carries a nonnegative weight.
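The one-dimensional identity underlying (2.9)-(2.10), namely that for $W \sim N(\delta, 1)$,
$$\delta\, E[W g(W^2)] = 2 \sum_{k \geq 1} k\, \frac{e^{-\delta^2/2}(\delta^2/2)^k}{k!}\, E\big[g(\chi^2_{1+2k})\big],$$
can be checked by simulation; since every weight in the series is nonnegative, replacing $g$ by a pointwise larger function can only increase the right side, which is exactly the inequality in (2.10). A rough Monte Carlo sketch, with hypothetical values standing in for $\delta_i$ and the shrinkage constant:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
delta, c = 1.3, 0.7                         # hypothetical delta_i and r * d_i^{-1}
g = lambda x: np.maximum(1.0 - c / x, 0.0)  # the clipped factor (1 - c/x)^+

# Left side: delta * E[W g(W^2)] with W ~ N(delta, 1)
W = rng.normal(delta, 1.0, size=2_000_000)
lhs = delta * np.mean(W * g(W**2))

# Right side: 2 * sum_k k * Poisson(delta^2/2) weight * E[g(chi^2_{1+2k})]
lam = delta**2 / 2
rhs = 0.0
for k in range(1, 40):
    weight = math.exp(-lam) * lam**k / math.factorial(k)
    rhs += 2 * k * weight * np.mean(g(rng.chisquare(1 + 2 * k, size=200_000)))

print(lhs, rhs)  # the two estimates agree up to Monte Carlo error
```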

Consequently, by (2.5), (2.6) and (2.10), we have
$$E\left[(\theta - \lambda)' Q \left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)(\bar{X}_n - \lambda)\right] \leq E\left[(\theta - \lambda)' Q \left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)^+ (\bar{X}_n - \lambda)\right]. \tag{2.11}$$

Next, observe that
$$\begin{aligned}
&E\left[(\bar{X}_n - \lambda)' \left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)' Q \left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)(\bar{X}_n - \lambda)\right] \\
&\quad = E\left[(P(\bar{X}_n - \lambda))' \left(I_p - \frac{r(F_n)}{F_n}(PQ^{-1}P')(PVP')^{-1}\right)' (PQ^{-1}P')^{-1} \left(I_p - \frac{r(F_n)}{F_n}(PQ^{-1}P')(PVP')^{-1}\right)(P(\bar{X}_n - \lambda))\right] \\
&\quad = E\left[(P(\bar{X}_n - \lambda))' \operatorname{diag}\!\left(\left(1 - \frac{r(F_n)}{F_n} d_1^{-1}\right)^2, \ldots, \left(1 - \frac{r(F_n)}{F_n} d_p^{-1}\right)^2\right)(P(\bar{X}_n - \lambda))\right] \\
&\quad \geq E\left[(P(\bar{X}_n - \lambda))' \operatorname{diag}\!\left(\left(\left(1 - \frac{r(F_n)}{F_n} d_1^{-1}\right)^+\right)^2, \ldots, \left(\left(1 - \frac{r(F_n)}{F_n} d_p^{-1}\right)^+\right)^2\right)(P(\bar{X}_n - \lambda))\right] \\
&\quad = E\left[(\bar{X}_n - \lambda)' \left(\left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)^+\right)' Q \left(I_p - \frac{r(F_n)}{F_n} Q^{-1} V^{-1}\right)^+ (\bar{X}_n - \lambda)\right], \tag{2.12}
\end{aligned}$$
since $a^2 \geq ((a)^+)^2$ for every real $a$.

Combining equations (2.4), (2.11) and (2.12) we have
$$R(\theta, \sigma^2, \delta_n^{\mathrm{JS}}) \geq R(\theta, \sigma^2, \delta_n^+),$$
and hence the desired result.

Remark 2.1. For the case $Q = V = I_p$ the calculations for the proof of Theorem 2.2 can be found in Sclove et al. (1972).

Remark 2.2. Let $r(F_n) = b$ where $0 < b < 2(p-2)$, and replace $s_n^2$ by $\tilde{s}_n^2 = a s_n^2$ where $a$ is a prechosen constant. Then one can show the dominance of the class of positive-part versions of James-Stein estimators over the usual class of James-Stein estimators for the sequential sampling procedure and loss structure discussed in Ghosh et al. (1987).

3. Monte Carlo study

Consider a randomized block design with $p$ treatments per block and $n$ blocks. Writing each group of observations within a block as a $p$-dimensional vector, one has $Y_j = \mu\mathbf{1} + \alpha + \beta_j \mathbf{1} + \epsilon_j$, where $\mu$ is the overall mean, $\alpha$ the vector of treatment effects, $\beta_j$ the random block effect and $\epsilon_j$ the vector of random errors, $j = 1, \ldots, n$. Making the usual assumptions on the $\beta_j$'s and the $\epsilon_j$'s, i.e., $\beta_j$ i.i.d. $N(0, \sigma_\beta^2)$ and $\epsilon_j$ i.i.d. $N(0, \sigma_\epsilon^2 I)$, $j = 1, \ldots, n$, one obtains $Y_j$ i.i.d. $N(\mu\mathbf{1} + \alpha, \sigma_\epsilon^2 I + \sigma_\beta^2 J)$, $j = 1, \ldots, n$. Consequently, assuming $\sigma_\epsilon^2 = \sigma_\beta^2 = \sigma^2$, one can apply Theorem 2.2 with $\theta = \mu\mathbf{1} + \alpha$ and $V = I + J$.

The above situation was simulated with $Q = I$ (mean squared error loss) for $p = 6$ and $9$. For each dimension $\lambda = 0$, $\sigma^2 = 1$ and $\theta = 0$, $(1/\sqrt{p})\mathbf{1}$ and $(2/\sqrt{p})\mathbf{1}$. For each combination of $p$ and $\theta$, samples of size $n = 5$, $15$ and $25$ were generated 1000 times each. For each of the 1000 replications a loss in estimation was computed for $\bar{X}_n$, $\delta_n^{\mathrm{JS}}$ and $\delta_n^+$. Then an average of the 1000 losses was computed for each estimator to represent the respective risks. As a final step, the percentage risk improvements were computed for $\delta_n^{\mathrm{JS}}$ and $\delta_n^+$ relative to $\bar{X}_n$, where the percentage risk improvement of $\delta_n$ relative to $\bar{X}_n$ is defined as

$$\%\delta_n = 100\, \frac{R(\theta, \sigma^2, \bar{X}_n) - R(\theta, \sigma^2, \delta_n)}{R(\theta, \sigma^2, \bar{X}_n)}.$$

The results are contained in Table 3.1.

Table 3.1. Risks and percentage risk improvements over $\bar{X}_n$

$p$   $\|\theta\|$   $n$    $R(\theta, \sigma^2, \bar{X}_n)$   $\%\delta_n^{\mathrm{JS}}$   $\%\delta_n^+$
6     0              5      2.3835                             38.93                        46.85
6     0              15     0.7765                             40.83                        50.05
6     0              25     0.5063                             40.40                        47.65
6     1              5      2.3892                             35.76                        43.87
6     1              15     0.8262                             35.59                        41.04
6     1              25     0.4916                             28.32                        36.19
6     2              5      2.3510                             32.37                        37.07
6     2              15     0.7903                             27.97                        29.30
6     2              25     0.4796                             23.50                        23.97
9     0              5      3.6928                             42.74                        48.09
9     0              15     1.2113                             43.85                        49.74
9     0              25     0.7660                             42.42                        48.02
9     1              5      3.5762                             43.57                        48.18
9     1              15     1.2257                             41.64                        46.93
9     1              25     0.7764                             40.87                        45.03
9     2              5      3.5617                             39.84                        45.64
9     2              15     1.2060                             37.75                        41.31
9     2              25     0.6889                             36.27                        38.61

As can be seen from Table 3.1, there is improvement for both $\delta_n^{\mathrm{JS}}$ and $\delta_n^+$, with $\delta_n^+$ always having the greater improvement. Also of note is the decreasing improvement as $\theta$ moves away from $\lambda = 0$, i.e., when we shrink toward the wrong point. Further, when $\theta \neq 0$, the improvement decreases with increasing sample size.
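A compact sketch of one cell of this experiment ($p = 6$, $\theta = \lambda = 0$, $n = 5$) is given below, assuming the constant choice $r(F_n) = p - 2$ for illustration; the paper does not state which $r$ produced Table 3.1, so the output need not reproduce its entries exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 6, 5, 1000
theta = np.zeros(p)                      # true mean; shrinkage target lam = 0
lam = np.zeros(p)
V = np.eye(p) + np.ones((p, p))          # V = I + J from the block design
Vinv = np.linalg.inv(V)
dvals, U = np.linalg.eigh(V)             # Q = I, so P = U' diagonalizes both
C = np.linalg.cholesky(V)                # to draw N(theta, V), sigma^2 = 1
r = p - 2.0                              # constant r, satisfies (i) of Theorem 2.1

losses = {'mean': [], 'js': [], 'plus': []}
for _ in range(reps):
    X = theta + rng.normal(size=(n, p)) @ C.T
    Xbar = X.mean(axis=0)
    resid = X - Xbar
    s2 = np.einsum('ji,ik,jk->', resid, Vinv, resid) / ((n - 1) * p + 2)
    d = Xbar - lam
    F = n * d @ Vinv @ Vinv @ d / s2                 # F_n with Q = I
    js = lam + d - (r / F) * (Vinv @ d)              # (2.1)
    factors = np.maximum(1.0 - (r / F) / dvals, 0.0)
    plus = lam + U @ (factors * (U.T @ d))           # (2.2)-(2.3)
    for name, est in (('mean', Xbar), ('js', js), ('plus', plus)):
        losses[name].append(np.sum((est - theta) ** 2))   # Q = I loss

risk = {k: np.mean(v) for k, v in losses.items()}
for k in ('js', 'plus'):
    print(k, 100 * (risk['mean'] - risk[k]) / risk['mean'])  # % improvement
```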

References

Baranchik, A.J. (1970), A family of minimax estimators of the mean of a multivariate normal distribution, Ann. Math. Statist. 41, 642-645.
Berger, J. (1976), Admissible minimax estimation of a multivariate normal mean with arbitrary quadratic loss, Ann. Statist. 4, 223-226.
Efron, B. and C. Morris (1976), Families of minimax estimators of the mean of a multivariate normal distribution, Ann. Statist. 4, 11-21.
Ghosh, M., D.M. Nickerson and P.K. Sen (1987), Sequential shrinkage estimation, Ann. Statist. 15, 817-829.
Hudson, H.M. (1974), Empirical Bayes estimation, Technical Report No. 58, Department of Statistics, Stanford University.
James, W. and C. Stein (1961), Estimation with quadratic loss, in: Proc. Fourth Berkeley Symp. Math. Statist. Probab., Vol. 1 (University of California Press) 361-379.
Sclove, S.L., C. Morris and R. Radhakrishnan (1972), Nonoptimality of preliminary-test estimators for the mean of a multivariate normal distribution, Ann. Math. Statist. 43, 1481-1490.
Strawderman, W.E. (1971), Proper Bayes minimax estimators of the multivariate normal mean, Ann. Math. Statist. 42, 385-388.
