Comparisons of improved risk estimators of the multivariate mean vector

Computational Statistics & Data Analysis 50 (2006) 402–421
www.elsevier.com/locate/csda
B.U. Khan^a,*, S.E. Ahmed^b

^a Department of Mathematics and Computing Science, Saint Mary's University, Halifax, NS, Canada
^b Department of Mathematics and Statistics, University of Windsor, Windsor, Ont., Canada

Received 1 January 2004; accepted 1 July 2004 Available online 11 September 2004

Abstract

The estimation of the mean vector of a multivariate normal distribution, under the uncertain prior information (UPI) that the component means are equal but unknown, is considered. The positive part of the Stein-Rule estimator (PSE) and an improved preliminary test estimator (IPE) are proposed. It is demonstrated analytically as well as computationally that the positive part of the Stein-Rule estimator is superior to the usual Stein-Rule estimator (SE). Furthermore, it is shown that the proposed improved pretest estimator dominates the traditional preliminary test estimator (PE) regardless of the correctness of the nonsample information. The relative dominance of the proposed estimators is presented analytically as well as graphically. Percentage improvements of the proposed estimators over the unrestricted estimator (UE) are computed. It is shown that for $p \ge 3$, SE or PSE is the best to use, while for $p \le 2$, UE is preferable.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Uncertain prior information; Quadratic biases; Risk functions; Positive part of Stein-Rule estimator; Improved pretest estimator; Percentage risk improvements

1. Introduction

Suppose we have a random sample $x_1, x_2, \ldots, x_n$ from a multivariate normal distribution $N_p(\mu, \Sigma)$, where $\mu = (\mu_1, \ldots, \mu_p)'$ and $\Sigma$ is unknown. In this article the problem of estimating the mean vector under the uncertain prior information (UPI) that the component

* Corresponding author.
E-mail address: [email protected] (B.U. Khan).
doi:10.1016/j.csda.2004.07.027


means are equal but unknown, is considered. It is well documented in the literature that the preliminary test estimator (PE) performs better than the unrestricted estimator (UE) over a small portion of the parameter space but is inadmissible under squared error loss. Thus, in the spirit of Sclove et al. (1972), we propose an improved preliminary test estimator (IPE), and it is shown that the IPE improves upon PE uniformly. James and Stein (1961) showed that UE is inadmissible and provided a better estimator, the Stein-Rule estimator (SE), when $p \ge 3$. Ali and Saleh (1991) showed that SE combines the sample information and UPI in a superior way. The resulting estimator improves, in terms of risk, over UE regardless of the correctness of the UPI. However, the drawback of SE is that it may shrink beyond the hypothesis vector. Thus we propose a superior alternative to SE by considering its positive part only. The PSE not only controls the over-shrinking problem but is also superior to SE in the entire parameter space and prevents the changing of sign of UE. The IPE, in which PE is replaced by SE, dominates PE at the expense of larger bias. The estimators are proposed in Section 2. Biases, mean squared error matrices (MSEM) and risk functions of the proposed estimators are given in Section 3. In Section 4 we provide a detailed analysis of the biases and risk functions of the proposed estimators. The computed risk comparisons of the estimators are given in Section 5. Section 6 is devoted to concluding remarks.

2. The usual and proposed estimators

Our main goal is to estimate the population mean vector $\mu$ when it is suspected that

$\mu_1 = \mu_2 = \cdots = \mu_p = \mu_o$ (unknown).  (2.1)

The following estimators for the mean vector $\mu$ are usually considered in the literature when the relation in (2.1) is suspected:

1. Unrestricted estimator (UE): It is given by

$\hat{\mu} = (\hat{\mu}_1, \ldots, \hat{\mu}_p)', \qquad \hat{\mu}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}, \quad i = 1, 2, \ldots, p.$  (2.2)

$\hat{\mu}$ is an unbiased estimator of $\mu$ in the global model.

2. Restricted estimator (RE): This estimator is denoted by $\hat{\mu}^R$ and is defined as

$\hat{\mu}^R = (\hat{\mu}_R, \ldots, \hat{\mu}_R)' = \hat{\mu}_R 1_p, \qquad 1_p = (1, \ldots, 1)',$  (2.3)

where

$\hat{\mu}_R = \frac{1}{p}\sum_{i=1}^{p} \bar{x}_i = \frac{1}{p}\, 1_p' \hat{\mu}.$  (2.4)

It is also known as the pooled estimator of $\mu$ under the restricted model. It is a well-known fact that $\hat{\mu}^R$ performs better than $\hat{\mu}$ (under quadratic loss) when the restriction

in (2.1) holds, but as the component means deviate from one another, $\hat{\mu}^R$ becomes considerably biased and inefficient. However, the performance of $\hat{\mu}$ remains steady over such departures from the null hypothesis. As a result, when the prior information in (2.1) is rather uncertain, it may be desirable to have a preliminary test based on the UPI in the form of the null hypothesis

$H_0: \mu_1 = \cdots = \mu_p = \mu_o$ (unknown).  (2.5)

3. Preliminary test estimator (PE): It is denoted by $\hat{\mu}^P$ and is defined as

$\hat{\mu}^P = \hat{\mu}^R + (\hat{\mu} - \hat{\mu}^R)\, I(T^2 > t_\alpha^2) = \hat{\mu} - (\hat{\mu} - \hat{\mu}^R)\, I(T^2 \le t_\alpha^2),$  (2.6)

where $I(\cdot)$ is the indicator function defined on the set $(\cdot)$ and $T^2 = n \hat{\mu}' C' S^{-1} C \hat{\mu}$ is the Hotelling $T^2$ test statistic for the preliminary test on $H_0$ in (2.5) for a $(p-1)$-dimensional distribution. The (suitably scaled) test statistic $T^2$ has an F-distribution with $(p-1)$ and $m = n - p + 1$ degrees of freedom. The upper $(100\alpha)$th percentile of the $F_{p-1,m}$ distribution gives $t_\alpha^2 = \frac{(n-1)(p-1)}{m} F_{p-1,m}(\alpha)$, and $(n-1)S = C\left[\sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})'\right] C'$, where $C = I_p - \frac{1}{p} 1_p 1_p'$ is an idempotent and symmetric matrix of rank $(p-1)$ satisfying $C 1_p = 0$. When the restriction in (2.5) holds, $C\mu = 0$.

4. Shrinkage or Stein-Rule estimator (SE):

$\hat{\mu}^S = \hat{\mu} - c(n-1)T^{-2}\, (\hat{\mu} - \hat{\mu}^R), \quad p > 3.$  (2.7)

The proposed estimators of $\mu$ are given by:

5. Positive-part of Stein-Rule Estimator (PSE):

$\hat{\mu}^{S+} = \hat{\mu}^R + \left(1 - c(n-1)T^{-2}\right)^+ (\hat{\mu} - \hat{\mu}^R), \quad p > 3,$  (2.8)

where $a^+ = \max(0, a)$. Eq. (2.8) can be rewritten in the following form, which is more convenient for computational purposes:

$\hat{\mu}^{S+} = \hat{\mu}^S - \left(1 - c(n-1)T^{-2}\right) I(T^2 \le (n-1)c)\, (\hat{\mu} - \hat{\mu}^R).$  (2.9)

6. Improved Preliminary Test Estimator (IPE): The $\hat{\mu}^{P+}$ is obtained from Eq. (2.6) by replacing $\hat{\mu}$ with $\hat{\mu}^S$:

$\hat{\mu}^{P+} = \hat{\mu}^P - c(n-1)T^{-2}\, I(T^2 > t_\alpha^2)\, (\hat{\mu} - \hat{\mu}^R),$  (2.10)

where $c$ is the shrinkage constant, such that $0 < c < \frac{2(p-3)}{m+2}$ and $p > 3$. The properties of $\hat{\mu}^P$ relative to $\hat{\mu}$ have been studied by Da Silva and Han (1984) and Judge and Bock (1978), among others. The risk analysis of $\hat{\mu}$ and $\hat{\mu}^S$ is available in the literature. For the sake of computations and graphical presentations, we take the optimum value of the shrinkage constant, i.e. $c_{opt} = \frac{p-3}{m+2}$, in the expressions for the biases, mean squared error matrices, and risk functions of the proposed estimators provided in the following section.
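The six estimators above can be assembled in a few lines of code. The sketch below is a minimal illustration only, not the authors' program: the function name `estimators`, the use of a Moore–Penrose pseudo-inverse for the rank-deficient matrix $S$, and the default level `alpha=0.05` are our assumptions.

```python
import numpy as np
from scipy.stats import f as f_dist

def estimators(x, alpha=0.05):
    """Compute UE, RE, PE, SE, PSE and IPE from an (n x p) sample x.

    Follows Eqs. (2.2)-(2.10): the preliminary test uses Hotelling's T^2
    on the contrasts C*mu_hat, and c is taken at c_opt = (p-3)/(m+2).
    """
    n, p = x.shape
    m = n - p + 1
    mu_ue = x.mean(axis=0)                         # unrestricted estimator (2.2)
    mu_re = np.full(p, mu_ue.mean())               # restricted (pooled) estimator (2.3)-(2.4)
    d = mu_ue - mu_re                              # mu_hat - mu_hat^R = C mu_hat
    C = np.eye(p) - np.ones((p, p)) / p            # centering matrix, rank p-1
    resid = x - mu_ue
    S = C @ (resid.T @ resid) @ C / (n - 1)        # (n-1)S = C [sum (x_i-mu)(x_i-mu)'] C'
    # S is singular (rank p-1); the pseudo-inverse inverts it on the contrast space.
    T2 = n * (C @ mu_ue) @ np.linalg.pinv(S) @ (C @ mu_ue)
    t2_alpha = (n - 1) * (p - 1) / m * f_dist.ppf(1 - alpha, p - 1, m)
    c = (p - 3) / (m + 2)                          # optimal shrinkage constant
    shrink = c * (n - 1) / T2
    mu_pe = mu_ue - d * (T2 <= t2_alpha)           # preliminary test estimator (2.6)
    mu_se = mu_ue - shrink * d                     # Stein-Rule estimator (2.7)
    mu_pse = mu_re + max(0.0, 1 - shrink) * d      # positive-part SE (2.8)
    mu_ipe = mu_pe - shrink * (T2 > t2_alpha) * d  # improved pretest estimator (2.10)
    return mu_ue, mu_re, mu_pe, mu_se, mu_pse, mu_ipe
```

Note that PSE never moves past $\hat{\mu}^R$: when the shrinkage factor exceeds one, `max(0.0, 1 - shrink)` clips the step at the restricted estimator, which is exactly the over-shrinking control discussed above.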


3. Biases, MSEM and risk functions of the proposed estimators

In this section, we present the expressions for the biases, mean squared error matrices and risk functions of the usual and proposed estimators. Since $\hat{\mu} - \hat{\mu}^R = \hat{\mu} - \hat{\mu}_R 1_p = C\hat{\mu}$, we make the following transformations:

$Y = \sqrt{n}\, C\hat{\mu}$  (3.1)

and

$Z = MY,$  (3.2)

where $M$ is any symmetric and non-singular matrix such that $C\Sigma C' = (M'M)^{-1}$ and $M(C\Sigma C')M' = I_{p-1}$. Using the result from Anderson (1984), we can write

$\frac{T^2}{n-1} = \frac{Z'Z}{\chi_m^2}.$  (3.3)

Since $Z \sim N(\sqrt{n}\, MC\mu,\, I_{p-1})$, $Z'Z$ is distributed as a non-central chi-square variable $\chi_{p-1}^2(\Delta)$ with $(p-1)$ degrees of freedom and non-centrality parameter $\Delta = n\,(C\mu)'(C\Sigma C')^{-1}(C\mu)$. Since $Z'Z = \chi_{p-1}^2(\Delta)$ is independent of $\chi_m^2$, the quantity $\frac{m}{p-1}\frac{T^2}{n-1}$ is distributed as $F_{p-1,m}(\Delta)$. Expressions for the biases of the estimators are given in Theorem 1 as follows:
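The distributional result just stated — that $\frac{m}{p-1}\frac{T^2}{n-1}$ is a central $F_{p-1,m}$ variate under $H_0$ — can be checked by simulation. The sketch below is our illustration (the sample sizes, the equal-means value, and the use of a pseudo-inverse are our choices); it compares the empirical mean of the scaled statistic with the theoretical F mean $m/(m-2)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 5
m = n - p + 1                       # here m = 26
mu = np.full(p, 2.0)                # equal component means: H0 in (2.5) holds
C = np.eye(p) - np.ones((p, p)) / p

def scaled_T2(x):
    """m * T^2 / ((n-1)(p-1)), which should follow F_{p-1, m} under H0."""
    mu_hat = x.mean(axis=0)
    r = x - mu_hat
    S = C @ (r.T @ r) @ C / (n - 1)
    T2 = n * (C @ mu_hat) @ np.linalg.pinv(S) @ (C @ mu_hat)
    return m * T2 / ((n - 1) * (p - 1))

sims = np.array([scaled_T2(rng.normal(mu, 1.0, size=(n, p)))
                 for _ in range(5000)])
print(sims.mean(), m / (m - 2))     # empirical vs theoretical F mean, close
```

With 5000 replications the two printed values agree to roughly two decimal places, consistent with the stated F-distribution.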

Theorem 1. The biases of $\hat{\mu}^{S+}$ and $\hat{\mu}^{P+}$ are, respectively, given in (3.4) and (3.6):

$B(\hat{\mu}^{S+}) = B(\hat{\mu}^S) - \delta\, G_{p+1,m}\!\left(\frac{mc}{p+1}; \Delta\right) + mc\, \delta\, E\!\left[\chi_{p+1}^{-2}(\Delta)\, I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le c\right)\right],$  (3.4)

$B(\hat{\mu}^S) = -mc\, \delta\, E\!\left[\chi_{p+1}^{-2}(\Delta)\right],$  (3.5)

$B(\hat{\mu}^{P+}) = B(\hat{\mu}^P) - mc\, \delta\, E\!\left[\chi_{p+1}^{-2}(\Delta)\right] + mc\, \delta\, E\!\left[\chi_{p+1}^{-2}(\Delta)\, I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le \frac{p-1}{m} F_{(\alpha)}\right)\right],$  (3.6)

$B(\hat{\mu}^P) = -\delta\, G_{p+1,m}(F_{(\alpha)}^*; \Delta),$  (3.7)

where $\delta = C\mu$.

Here $G_{v_1,v_2}(\cdot\,; \Delta)$ is the cumulative distribution function of a non-central F distribution with $(v_1, v_2)$ degrees of freedom and noncentrality parameter $\Delta = n\,(C\mu)'(C\Sigma C')^{-1}(C\mu)$, and $F_{(\alpha)}^* = \frac{p-1}{p+1} F_{p-1,m}(\alpha)$. For the sake of simplicity, we write $F_{(\alpha)}$ instead of $F_{p-1,m}(\alpha)$, defined in Section 2.

Proof of Theorem 1. The proof of Theorem 1 is straightforward. The bias expressions are obtained by using the transformations in (3.1)–(3.3) and applying the results from


Judge and Bock (1978, Appendix B). Since the bias expressions of the estimators are not in scalar form, we convert them to quadratic form for graphical purposes. Thus, let us define the quadratic bias of an estimator $\hat{\mu}^*$ of $\mu$ by

$Q = B'(\hat{\mu}^*)\, \Sigma^{-1}\, B(\hat{\mu}^*).$

Corollary 1. The quadratic biases of the estimators $\hat{\mu}^{S+}$, $\hat{\mu}^S$, $\hat{\mu}^{P+}$ and $\hat{\mu}^P$ are, respectively, given in (3.8)–(3.11):

$Q(\hat{\mu}^{S+}) = \Delta \left\{ mc\, E\!\left[\chi_{p+1}^{-2}(\Delta)\, I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le c\right)\right] - G_{p+1,m}\!\left(\frac{mc}{p+1}; \Delta\right) - mc\, E\!\left[\chi_{p+1}^{-2}(\Delta)\right] \right\}^2,$  (3.8)

$Q(\hat{\mu}^S) = \Delta \left\{ mc\, E\!\left[\chi_{p+1}^{-2}(\Delta)\right] \right\}^2,$  (3.9)

$Q(\hat{\mu}^{P+}) = \Delta \left\{ mc\, E\!\left[\chi_{p+1}^{-2}(\Delta)\, I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le \frac{p-1}{m} F_{(\alpha)}\right)\right] - G_{p+1,m}\!\left(F_{(\alpha)}^*; \Delta\right) - mc\, E\!\left[\chi_{p+1}^{-2}(\Delta)\right] \right\}^2,$  (3.10)

$Q(\hat{\mu}^P) = \Delta \left\{ G_{p+1,m}\!\left(F_{(\alpha)}^*; \Delta\right) \right\}^2.$  (3.11)



Since $E[\chi_{p+1}^{-2}(\Delta)]$ is a decreasing convex function of $\Delta$, the bias functions of both $\hat{\mu}^{S+}$ and $\hat{\mu}^S$ start at $\Delta = 0$, decrease to a point, and then increase towards 0. However, the graph of the bias of $\hat{\mu}^{S+}$ remains below the graph of the bias of $\hat{\mu}^S$. These features are depicted in Fig. 1. The estimator $\hat{\mu}^{P+}$ is obtained from $\hat{\mu}^P$ by replacing the unbiased estimator $\hat{\mu}$ with the biased estimator $\hat{\mu}^S$, resulting in increased bias but smaller risk. The bias of $\hat{\mu}^{P+}$ is a function of $\Delta$ and $\alpha$. For fixed $\alpha$, as a function of $\Delta$, this function starts from 0, increases to a point, then decreases gradually to zero. On the other hand, as a function of $\alpha$ (for fixed $\Delta$) it is a decreasing function of $\alpha \in [0, 1)$, achieving a maximum value at $\alpha = 0$ and 0 at $\alpha = 1$. Fig. 2 displays this behavior of $\hat{\mu}^{P+}$ versus $\hat{\mu}^P$.

The quadratic bias of $\hat{\mu}^R$ is a function of $\Delta$ and is unbounded in $\Delta$: it goes to $\infty$ as $\Delta$ tends to $\infty$. On the other hand, the bias function of $\hat{\mu}^P$ is bounded in $\Delta$.


Fig. 1. Quadratic bias of PSE and SE.

Fig. 2. Quadratic bias of IPE and PE.

Definition. Let $\mu^*$ be an estimator of $\mu$ and $W$ be a positive semi-definite matrix. The quadratic loss function is given as

$L(\mu^*, \mu) = n(\mu^* - \mu)' W (\mu^* - \mu) = n\, \mathrm{tr}\{W(\mu^* - \mu)(\mu^* - \mu)'\};$

therefore

$R(\mu^*, \mu) = E\{L(\mu^*, \mu)\} = \mathrm{tr}(W\Gamma),$  (3.12)


where the MSEM is denoted by $\Gamma$ and is given by

$\Gamma = n\, E\{(\mu^* - \mu)(\mu^* - \mu)'\}.$

Expressions for the mean squared error matrices and the risk functions of the estimators are given in Theorems 2 and 3, respectively.

Theorem 2. The mean squared error matrices of the proposed estimators are given by (3.13)–(3.17), respectively:

$\Gamma(\hat{\mu}^{S+}; \Delta) = \Gamma(\hat{\mu}^S; \Delta) - C\Sigma C'\, G_{p+1,m}(c^*; \Delta) - \frac{m(p-3)}{m+2}\, C\Sigma C'\, E\!\left[\left((p-3)\chi_{p+1}^{-4}(\Delta) - 2\chi_{p+1}^{-2}(\Delta)\right) I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le \frac{p-3}{m+2}\right)\right] - \frac{m(p-3)}{m+2}\, n\delta\delta'\, E\!\left[\left((p-3)\chi_{p+3}^{-4}(\Delta) - 2\chi_{p+3}^{-2}(\Delta)\right) I\!\left(\frac{\chi_{p+3}^2(\Delta)}{\chi_m^2} \le \frac{p-3}{m+2}\right) + 2\chi_{p+1}^{-2}(\Delta)\, I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le \frac{p-3}{m+2}\right)\right] + n\delta\delta'\, \{2 G_{p+1,m}(c^*; \Delta) - G_{p+3,m}(c_o; \Delta)\},$  (3.13)

where $\delta = C\mu$,

$c^* = \frac{m(p-3)}{(p+1)(m+2)}, \qquad c_o = \frac{m(p-3)}{(p+3)(m+2)},$

and

$\Gamma(\hat{\mu}^S; \Delta) = \Sigma - \frac{m(p-3)}{m+2}\, C\Sigma C'\, E\!\left[2\chi_{p+1}^{-2}(\Delta) - (p-3)\chi_{p+1}^{-4}(\Delta)\right] + \frac{m(p-3)(p+1)}{m+2}\, n\delta\delta'\, E\!\left[\chi_{p+3}^{-4}(\Delta)\right],$  (3.14)

$\Gamma(\hat{\mu}; \Delta) = \Sigma.$  (3.15)

$\Gamma(\hat{\mu}^{P+}; \Delta) = \Gamma(\hat{\mu}^P; \Delta) - \frac{m(p-3)}{m+2}\, C\Sigma C'\, E\!\left[\left(2\chi_{p+1}^{-2}(\Delta) - (p-3)\chi_{p+1}^{-4}(\Delta)\right) I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} > \frac{p-1}{m} F_{(\alpha)}\right)\right] - \frac{m(p-3)}{m+2}\, n\delta\delta'\, E\!\left[\left(2\chi_{p+3}^{-2}(\Delta) - (p-3)\chi_{p+3}^{-4}(\Delta)\right) I\!\left(\frac{\chi_{p+3}^2(\Delta)}{\chi_m^2} > \frac{p-1}{m} F_{(\alpha)}\right) - 2\chi_{p+1}^{-2}(\Delta)\, I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} > \frac{p-1}{m} F_{(\alpha)}\right)\right],$  (3.16)

$\Gamma(\hat{\mu}^P; \Delta) = \Sigma - C\Sigma C'\, G_{p+1,m}(F_{(\alpha)}^*; \Delta) + n\delta\delta' \left\{2 G_{p+1,m}(F_{(\alpha)}^*; \Delta) - G_{p+3,m}(F_{(\alpha)}^{o}; \Delta)\right\},$  (3.17)

where $\delta = C\mu$ and

$F_{(\alpha)}^{o} = \frac{p-1}{p+3} F_{p-1,m}(\alpha).$

The mean squared error matrix measure gives the overall performance of the estimators, while the risk measure depends on the sum of the mean squared errors of the individual components. Moreover, the decision-theoretic approach leads to the risk function via the loss function. The decision whether to use the risk or the mean squared error analysis of the estimators rests on the user's discretion. Nevertheless, we obtain the same conclusions about the performance of the estimators by using either of the two approaches. The risk functions of the proposed estimators are presented in the following theorem:

Theorem 3. The risk functions of the proposed estimators under the quadratic loss function are given in Eqs. (3.18)–(3.22):

$R(\hat{\mu}^{S+}; \Delta) = R(\hat{\mu}^S; \Delta) - \mathrm{tr}(WC\Sigma C')\, G_{p+1,m}(c^*; \Delta) - \frac{m(p-3)}{m+2}\, \mathrm{tr}(WC\Sigma C')\, E\!\left[\left((p-3)\chi_{p+1}^{-4}(\Delta) - 2\chi_{p+1}^{-2}(\Delta)\right) I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le \frac{p-3}{m+2}\right)\right] - \frac{m(p-3)}{m+2}\, n\delta' W \delta\, E\!\left[\left((p-3)\chi_{p+3}^{-4}(\Delta) - 2\chi_{p+3}^{-2}(\Delta)\right) I\!\left(\frac{\chi_{p+3}^2(\Delta)}{\chi_m^2} \le \frac{p-3}{m+2}\right) + 2\chi_{p+1}^{-2}(\Delta)\, I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le \frac{p-3}{m+2}\right)\right] + n\delta' W \delta\, \{2 G_{p+1,m}(c^*; \Delta) - G_{p+3,m}(c_o; \Delta)\},$  (3.18)

where $\delta = C\mu$.

$R(\hat{\mu}^S; \Delta) = \mathrm{tr}(W\Sigma) - \frac{m(p-3)}{m+2}\, \mathrm{tr}(WC\Sigma C')\, E\!\left[2\chi_{p+1}^{-2}(\Delta) - (p-3)\chi_{p+1}^{-4}(\Delta)\right] + \frac{m(p-3)(p+1)}{m+2}\, n\delta' W \delta\, E\!\left[\chi_{p+3}^{-4}(\Delta)\right],$  (3.19)

$R(\hat{\mu}; \Delta) = \mathrm{tr}(W\Sigma),$  (3.20)

where $\delta = C\mu$.

$R(\hat{\mu}^{P+}; \Delta) = R(\hat{\mu}^P; \Delta) - \frac{m(p-3)}{m+2}\, \mathrm{tr}(WC\Sigma C')\, E\!\left[\left(2\chi_{p+1}^{-2}(\Delta) - (p-3)\chi_{p+1}^{-4}(\Delta)\right) I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} > \frac{p-1}{m} F_{(\alpha)}\right)\right] - \frac{m(p-3)}{m+2}\, n\delta' W \delta\, E\!\left[\left(2\chi_{p+3}^{-2}(\Delta) - (p-3)\chi_{p+3}^{-4}(\Delta)\right) I\!\left(\frac{\chi_{p+3}^2(\Delta)}{\chi_m^2} > \frac{p-1}{m} F_{(\alpha)}\right) - 2\chi_{p+1}^{-2}(\Delta)\, I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} > \frac{p-1}{m} F_{(\alpha)}\right)\right],$  (3.21)

$R(\hat{\mu}^P; \Delta) = \mathrm{tr}(W\Sigma) - \mathrm{tr}(WC\Sigma C')\, G_{p+1,m}(F_{(\alpha)}^*; \Delta) + n\delta' W \delta\, \left\{2 G_{p+1,m}(F_{(\alpha)}^*; \Delta) - G_{p+3,m}(F_{(\alpha)}^{o}; \Delta)\right\}.$  (3.22)

Note: See the Appendix for the proof of Theorem 3.

In order to portray the risk functions of the estimators, we assume in the sequel that $\Sigma$ is an equicorrelation matrix, i.e. $\Sigma = \sigma^2\{(1-\rho)I_p + \rho J\}$, and $W = \sigma^{-2} I_p$, where $\frac{-1}{p-1} < \rho < 1$ and $J = 1_p 1_p'$. Under these assumptions we have $C\Sigma C' = \sigma^2(1-\rho)C$, $\mathrm{tr}(WC\Sigma C') = (1-\rho)(p-1)$ and $n\,\delta' W \delta = (1-\rho)\Delta$.
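The matrix algebra behind these substitutions is easy to verify numerically. The short check below is our illustration (the values of $p$, $\sigma^2$ and $\rho$ are arbitrary choices within the stated constraints); it confirms $C\Sigma C' = \sigma^2(1-\rho)C$ and $\mathrm{tr}(WC\Sigma C') = (1-\rho)(p-1)$ for the equicorrelation model.

```python
import numpy as np

p, sigma2, rho = 6, 2.5, 0.4       # example values; need -1/(p-1) < rho < 1
J = np.ones((p, p))
Sigma = sigma2 * ((1 - rho) * np.eye(p) + rho * J)   # equicorrelation covariance
C = np.eye(p) - J / p              # centering matrix from Section 2 (C J = 0)
W = np.eye(p) / sigma2             # W = sigma^-2 I_p

# C Sigma C' = sigma^2 (1-rho) C, since CJ = 0 and C is idempotent.
assert np.allclose(C @ Sigma @ C, sigma2 * (1 - rho) * C)
# tr(W C Sigma C') = (1-rho) tr(C) = (1-rho)(p-1).
assert np.isclose(np.trace(W @ C @ Sigma @ C), (1 - rho) * (p - 1))
print("equicorrelation identities hold")
```

Both identities follow from $CJ = 0$ (because $C1_p = 0$) and the idempotence of $C$, so only the $(1-\rho)$ part of $\Sigma$ survives the projection.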

4. Risk analyses of the proposed estimators

Risk function analyses of the proposed estimators are provided in the following subsections.

4.1. Comparison of $\hat{\mu}^{S+}$, $\hat{\mu}^S$ and $\hat{\mu}$

Case 1: When $\Delta = 0$.


From Eq. (3.18) we obtain

$R(\hat{\mu}^{S+}; \Delta) - R(\hat{\mu}^S; \Delta) = (1-\rho)(p-1)k\left[G_{p-1,m}(k; 0) - k^{-1} G_{p+1,m}(k^*; 0) + \{G_{p-1,m}(k; 0) - G_{p-3,m}(k^*; 0)\}\right],$  (4.1)

where

$k = \frac{m(p-3)}{(p-1)(m+2)}, \qquad k^* = \frac{m}{m+2}.$

Since $G_{p-3,m}(k^*; 0) > G_{p-1,m}(k; 0)$ and $0 < k < 1$, the quantities in the brackets of Eq. (4.1) are negative, which implies that

$R(\hat{\mu}^{S+}; \Delta) < R(\hat{\mu}^S; \Delta).$

Using the Stein identity at $\Delta = 0$,

$E\!\left[2\chi_{p+1}^{-2}(0) - (p-3)\chi_{p+1}^{-4}(0)\right] = E\!\left[\chi_{p+1}^{-2}(0)\right] = \frac{1}{p-1},$

we can write Eq. (3.19) as follows:

$R(\hat{\mu}^S; \Delta) - R(\hat{\mu}; \Delta) = -\frac{m(p-3)}{m+2}(1-\rho).$

The above risk difference is negative, and hence at $\Delta = 0$ we have

$R(\hat{\mu}^{S+}; \Delta) < R(\hat{\mu}^S; \Delta) < R(\hat{\mu}; \Delta).$
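The central inverse moments used at $\Delta = 0$ are the standard chi-square results $E[\chi_\nu^{-2}] = (\nu-2)^{-1}$ and $E[\chi_\nu^{-4}] = [(\nu-2)(\nu-4)]^{-1}$, which for $\nu = p+1$ give $(p-1)^{-1}$ and $[(p-1)(p-3)]^{-1}$. A quick Monte Carlo check (our illustration; $p = 9$ is an arbitrary choice, large enough for the fourth inverse moment to have finite sampling variance):

```python
import numpy as np

rng = np.random.default_rng(7)
p = 9                               # arbitrary dimension with p > 3
nu = p + 1                          # chi-square degrees of freedom used above
x = rng.chisquare(nu, size=400_000)

# E[chi^-2] = 1/(nu-2) = 1/(p-1); E[chi^-4] = 1/((nu-2)(nu-4)) = 1/((p-1)(p-3))
print(np.mean(1 / x), 1 / (nu - 2))
print(np.mean(1 / x**2), 1 / ((nu - 2) * (nu - 4)))
```

Each printed pair agrees to about three decimal places, supporting the simplification of (3.19) at $\Delta = 0$.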

It is interesting to note that the magnitude of the risk improvement is greater when the correlation coefficient is negative.

Case 2: When $\Delta \ne 0$.

In order to establish the dominance behavior for the non-null case, we first consider a class of positive semi-definite matrices $W^*$ such that

$W^* = \left\{ W : \frac{\mathrm{tr}(WC)}{\lambda_{\max}(W)} \ge \frac{p+1}{2} \right\},$  (4.2)

where $C$ is defined in Section 2 and $\lambda_{\max}(W)$ is the largest eigenvalue of $W$. Using the following relation, due to Courant's theorem,

$\delta' W \delta \le \lambda_{\max}(W)\, \delta'\delta \quad \text{for all } \delta,$

one can easily deduce from Eqs. (3.19), (3.20) and (4.2) that

$R(\hat{\mu}^S; \Delta) \le R(\hat{\mu}; \Delta) \quad \text{for all } \Delta,\ W \in W^*,\ p > 3,$

where strict inequality holds for some $\Delta$.

In order to compare the risk function of $\hat{\mu}^{S+}$ with that of $\hat{\mu}^S$, we first note that

$I\!\left(\frac{\chi_{p+j}^2(\Delta)}{\chi_m^2} \le c\right) = \begin{cases} 1 & \text{if } \chi_{p+j}^2(\Delta)/\chi_m^2 \le c, \\ 0 & \text{otherwise}, \end{cases} \qquad j = 1, 3,$

and that on this set the term $\left(1 - c\,\chi_m^2/\chi_{p+j}^2(\Delta)\right) \le 0$. Eq. (A.4), given in the Appendix, can also be written in the following form:

$R(\hat{\mu}^{S+}; \Delta) - R(\hat{\mu}^S; \Delta) = -(1-\rho)(p-1)\, E\!\left[I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le c\right)\left(1 - c\,\frac{\chi_m^2}{\chi_{p+1}^2(\Delta)}\right)^2\right] - (1-\rho)\Delta\, E\!\left[I\!\left(\frac{\chi_{p+3}^2(\Delta)}{\chi_m^2} \le c\right)\left(1 - c\,\frac{\chi_m^2}{\chi_{p+3}^2(\Delta)}\right)^2\right] + 2(1-\rho)\Delta\, E\!\left[I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le c\right)\left(1 - c\,\frac{\chi_m^2}{\chi_{p+1}^2(\Delta)}\right)\right].$  (4.3)

Hence from Eq. (4.3), we conclude that

$R(\hat{\mu}^{S+}; \Delta) \le R(\hat{\mu}^S; \Delta) \quad \text{for all } \Delta,$

where strict inequality holds for some $\Delta$. Thus $\hat{\mu}^{S+}$ combines sample and nonsample information in a superior way. A risk improvement over $\hat{\mu}^S$, and hence over $\hat{\mu}$, under the quadratic loss function is guaranteed regardless of the correctness of the nonsample information. When $\Delta \to \infty$, the risk functions of both $\hat{\mu}^{S+}$ and $\hat{\mu}^S$ tend to the risk function of $\hat{\mu}$ from below. Hence the dominance picture of the estimators is

$\hat{\mu}^{S+} \succ \hat{\mu}^S \succ \hat{\mu}, \quad \text{for all } \Delta,\ W \in W^* \text{ and } p > 3,$

where $\succ$ stands for dominance.

In order to portray the above dominance picture graphically, the risk functions of $\hat{\mu}^{S+}$, $\hat{\mu}^S$ and $\hat{\mu}$ are provided in Fig. 3. We notice that the biggest risk improvement is obtained near $\Delta = 0$, because both estimators shrink $\hat{\mu}$ toward $\hat{\mu}^R$. In passing, we remark that James and Stein (1961) shrank their estimator towards the vector 0. However, there is no magic about the null vector, and these estimators could be shrunk toward any point.

Case 1. When  = 0

P

, ˆ and ˆ


Fig. 3. Risk behavior of PSE, SE, and UE.

The risk functions of $\hat{\mu}^{P+}$ and $\hat{\mu}^P$, for comparative purposes, can be rewritten using the relation

$I\!\left(\frac{\chi_{p+j}^2(\Delta)}{\chi_m^2} > \frac{p-1}{m} F_{(\alpha)}\right) = 1 - I\!\left(\frac{\chi_{p+j}^2(\Delta)}{\chi_m^2} \le \frac{p-1}{m} F_{(\alpha)}\right) \quad \text{for } j = 1, 3,$

and the Stein identities

$E\!\left[\chi_{p+1}^{-2}(\Delta)\right] - (p-3)\, E\!\left[\chi_{p+1}^{-4}(\Delta)\right] = 2\Delta\, E\!\left[\chi_{p+3}^{-4}(\Delta)\right],$

$E\!\left[\chi_{p+3}^{-2}(\Delta)\right] - E\!\left[\chi_{p+1}^{-2}(\Delta)\right] = -2\, E\!\left[\chi_{p+3}^{-4}(\Delta)\right].$

Thus the risk difference between $\hat{\mu}^{P+}$ and $\hat{\mu}^P$, given in Eq. (3.21), becomes

$R(\hat{\mu}^{P+}; \Delta) - R(\hat{\mu}^P; \Delta) = -\frac{m(p-3)}{m+2}(1-\rho)(p-1)\, E\!\left[\chi_{p+1}^{-2}(\Delta)\right] - \frac{2m(p-3)}{m+2}(1-\rho)(p-1)\Delta\, E\!\left[\chi_{p+3}^{-4}(\Delta)\right] - \frac{m(p-3)}{m+2}(1-\rho)(p-1)\, E\!\left[\left((p-3)\chi_{p+1}^{-4}(\Delta) - 2\chi_{p+1}^{-2}(\Delta)\right) I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le \frac{p-1}{m} F_{(\alpha)}\right)\right] - \frac{m(p-3)}{m+2}(1-\rho)\Delta\, E\!\left[\left((p-3)\chi_{p+3}^{-4}(\Delta) - 2\chi_{p+3}^{-2}(\Delta)\right) I\!\left(\frac{\chi_{p+3}^2(\Delta)}{\chi_m^2} \le \frac{p-1}{m} F_{(\alpha)}\right) + 2\chi_{p+1}^{-2}(\Delta)\, I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le \frac{p-1}{m} F_{(\alpha)}\right)\right],$  (4.4)


which for $\Delta = 0$ becomes

$R(\hat{\mu}^{P+}; \Delta) - R(\hat{\mu}^P; \Delta) = -\frac{m(p-3)}{m+2}(1-\rho)(p-1)\, E\!\left[\chi_{p+1}^{-2}\right] - \frac{m(p-3)}{m+2}(1-\rho)(p-1)\, E\!\left[\left((p-3)\chi_{p+1}^{-4} - 2\chi_{p+1}^{-2}\right) I\!\left(\frac{\chi_{p+1}^2}{\chi_m^2} \le \frac{p-1}{m} F_{(\alpha)}\right)\right] = -\frac{m(p-3)}{m+2}(1-\rho)\left[1 + \frac{p-1}{p-3}\, G_{p-3,m}(F_{(\alpha)}) - 2\, G_{p-1,m}(F_{(\alpha)})\right].$  (4.5)

The right-hand side of Eq. (4.5) is negative, and $\hat{\mu}^{P+}$ dominates $\hat{\mu}^P$ when the null hypothesis in (2.5) is true. Thus

$R(\hat{\mu}^{P+}; \Delta) < R(\hat{\mu}^P; \Delta).$

The difference between the risk functions of $\hat{\mu}^P$ and $\hat{\mu}$, from Eq. (3.22), reduces to

$R(\hat{\mu}^P; \Delta) - R(\hat{\mu}; \Delta) = -(1-\rho)(p-1)\, G_{p+1,m}(F_{(\alpha)}^*; 0),$  (4.6)

which is clearly a negative quantity, and hence

$R(\hat{\mu}^P; \Delta) < R(\hat{\mu}; \Delta).$

Hence the dominance picture under the null hypothesis $H_0$ is

$\hat{\mu}^{P+} \succ \hat{\mu}^P \succ \hat{\mu}.$

Whenever $F_{(\alpha)} \in [0, c]$, $\hat{\mu}^{P+}$ provides a minimax substitute for $\hat{\mu}^P$ and is in fact $\hat{\mu}^{S+}$ for $c_{opt} = \frac{p-3}{m+2}$. Thus, based on the comparative analyses of the risk functions in the preceding section, we conclude that $\hat{\mu}^{P+}$ dominates $\hat{\mu}^S$ as well as $\hat{\mu}$.

Case 2: When $\Delta \ne 0$.

The estimator $\hat{\mu}^{P+}$ is a function of both $\Delta$ and $\alpha$. Since $F_{(\alpha)}$ cannot be restricted and can take any value in the interval $(0, \infty)$, whenever $F_{(\alpha)} \notin [0, c]$ the estimator $\hat{\mu}^{P+}$ is no longer a minimax estimator, but it continues to dominate $\hat{\mu}^P$ for comparable values of $F_{(\alpha)}$. Thus, whenever $\frac{p-1}{m} F_{(\alpha)}$ is greater than $\frac{2(p-3)}{m+2}$, the estimator $\hat{\mu}^{P+}$ dominates $\hat{\mu}^P$, and as $\Delta \to \infty$ the risks of both $\hat{\mu}^{P+}$ and $\hat{\mu}^P$ approach the risk of $\hat{\mu}$. Fig. 4 exhibits plots of the risk curves for $\hat{\mu}^{P+}$, $\hat{\mu}^P$ and $\hat{\mu}$. The graphs confirm our analytical findings that $\hat{\mu}^{P+}$ dominates $\hat{\mu}^P$ for all values of $\Delta$, whereas it dominates $\hat{\mu}$ for a range of $\Delta$ values such that $F \in [0, c]$. It can be seen that $\hat{\mu}^{P+}$ has remarkably improved upon both $\hat{\mu}^P$ and $\hat{\mu}$ near $\Delta = 0$. We can summarize


Fig. 4. Risk behavior of IPE, PE, and UE.

P the properties of ˆ as follows: +

P P 1. The ˆ is superior to ˆ regardless of the correctness of the null hypothesis. P+ S+ P+ S 2. IfF() ∈ [0, c], then ˆ = ˆ , and hence the ˆ dominates ˆ and ˆ in the entire parameter space. P+ P 3. When F() takes value outside the interval [0, c] then ˆ behaves like ˆ as compared to +

P ˆ . In other words, the performance of ˆ depend on the correctness of UPI as compared P to ˆ , however it dominates ˆ .

5. Computed risk analysis

In this section we compute the percentage improvements in the risks of the various estimators over the UE by using the following formula:

$PI_b = \frac{100\,(R_1 - R_b)}{R_b}, \quad b = 2, 3, 4, 5,$

where $R_1$–$R_5$ are the risks of $\hat{\mu}$, $\hat{\mu}^{S+}$, $\hat{\mu}^S$, $\hat{\mu}^{P+}$, and $\hat{\mu}^P$, respectively. Tables 1 and 2 provide the percentage improvements in the risks of the various estimators over the risk of $\hat{\mu}$ for different combinations of $(n, p, \rho)$ and $\Delta = 0, 0.5, 1.0, 2.0$. We summarize the findings from the tables as follows. First, we consider the risk improvement of $\hat{\mu}^{S+}$ over $\hat{\mu}$. For given $n$ and $p$,


Table 1
Percentage improvement of various estimators over UE at $\Delta = 0$, $\alpha = 0.05$ and $\rho = -0.8$

p    Est.   n=16   18   20   22   24   26   28   30
4    PSE      54   55   56   57   57   58   58   59
     SE       39   39   40   41   41   41   42   42
     IPE      74   73   72   72   71   71   71   70
     PE       49   47   45   43   42   41   41   40
     RE      135  135  135  135  135  135  135  135
5    PSE      78   80   81   82   83   84   84   85
     SE       61   62   64   64   65   66   66   67
     IPE      86   86   86   86   86   87   87   87
     PE       35   32   30   28   27   26   25   24
     RE      144  144  144  144  144  144  144  144
6    PSE      91   94   96   97   98   99  100  101
     SE       75   77   79   80   81   82   83   83
     IPE      95   96   97   98   99   99  100  100
     PE       25   22   19   17   16   15   14   14
     RE      150  150  150  150  150  150  150  150
7    PSE     100  103  105  107  108  110  110  111
     SE       84   87   89   91   92   93   95   95
     IPE     100  102  104  105  106  107  108  109
     PE       18   14   12   11    9    9    8    7
     RE      154  154  154  154  154  154  154  154
8    PSE     105  109  112  114  115  117  118  119
     SE       90   94   96   98  100  101  102  103
     IPE     103  105  108  109  111  112  113  113
     PE       14   10    8    6    5    5    4    4
     RE      158  158  158  158  158  158  158  158
9    PSE     107  112  116  119  121  122  123  124
     SE       93   98  102  104  106  107  109  110
     IPE     104  107  110  112  113  115  116  116
     PE       12    7    5    4    3    3    2    2
     RE      160  160  160  160  160  160  160  160
10   PSE     108  114  119  122  124  126  127  128
     SE       95  101  105  108  110  112  113  115
     IPE     104  108  111  113  115  117  118  119
     PE       12    6    4    2    2    1    1    1
     RE      162  162  162  162  162  162  162  162

$PI_2(\Delta, \rho)$ is a function of $\Delta$ and $\rho$, and has its maximum at $\Delta = 0$. For fixed $\rho$, $PI_2(\Delta, \rho)$ is a decreasing function of $\Delta$. Interestingly, the maximum value $PI_2(0, \rho)$ is also a decreasing function of $\rho$. Also, as $\Delta \to \infty$, the percentage improvement goes to 0. Thus, $\hat{\mu}^{S+}$ dominates $\hat{\mu}$ for all values of $\Delta$. The behavior of $PI_3(\Delta, \rho)$ is similar to that of


Table 2
Percentage improvement of various estimators over UE at $\Delta = 0$, $\alpha = 0.05$ and $\rho = 0.8$

p    Est.   n=16   18   20   22   24   26   28   30
4    PSE       6    6    6    6    6    6    6    7
     SE        4    4    4    4    5    5    5    5
     IPE       8    8    8    8    8    8    8    8
     PE        5    5    5    5    5    5    5    4
     RE       15   15   15   15   15   15   15   15
5    PSE       9    9    9    9    9    9    9    9
     SE        7    7    7    7    7    7    7    7
     IPE      10   10   10   10   10   10   10   10
     PE        4    4    3    3    3    3    3    3
     RE       16   16   16   16   16   16   16   16
6    PSE      10   10   11   11   11   11   11   11
     SE        8    9    9    9    9    9    9    9
     IPE      11   11   11   11   11   11   11   11
     PE        3    2    2    2    2    2    2    2
     RE       17   17   17   17   17   17   17   17
7    PSE      11   11   12   12   12   12   12   12
     SE        9   10   10   10   10   10   10   11
     IPE      11   11   12   12   12   12   12   12
     PE        2    2    1    1    1    1    1    1
     RE       17   17   17   17   17   17   17   17
8    PSE      12   12   12   12   13   13   13   13
     SE       10   10   11   11   11   11   11   11
     IPE      11   12   12   12   12   12   13   13
     PE        2    1    1    1    1    1    0    0
     RE       18   18   18   18   18   18   18   18
9    PSE      12   12   13   13   13   14   14   14
     SE       10   11   11   12   12   12   12   12
     IPE      12   12   12   12   13   13   13   13
     PE        1    1    1    0    0    0    0    0
     RE       18   18   18   18   18   18   18   18
10   PSE      12   13   13   14   14   14   14   14
     SE       11   11   12   12   12   12   13   13
     IPE      12   12   12   13   13   13   13   13
     PE        1    1    0    0    0    0    0    0
     RE       18   18   18   18   18   18   18   18

$PI_2(\Delta, \rho)$. However, $PI_2(\Delta, \rho) \ge PI_3(\Delta, \rho)$


for fixed $\rho$ and all values of $\Delta$. Thus, $\hat{\mu}^{S+}$ is superior to $\hat{\mu}^S$ from the risk point of view. The risk improvement of $\hat{\mu}^{P+}$ over $\hat{\mu}$ is studied next; it is denoted by $PI_4(\Delta, \alpha, \rho)$ and is a function of $\Delta$, $\alpha$ and $\rho$, while $n$ and $p$ are fixed quantities. The function $PI_4(\Delta, \alpha, \rho)$ for $\Delta = 0$ has its maximum at $\alpha = 0$, which will be denoted by $PI_4(0, \alpha, \rho)$. For any fixed value of $\rho$, the maximum value of $PI_4(0, \alpha, \rho)$ is a decreasing function of $\alpha$. We notice that for fixed $\alpha$, $PI_4(\Delta, \alpha, \rho)$ is a decreasing function of $\Delta$. Also, for fixed values of $(\alpha, \rho)$ with $F_{(\alpha)} \notin (0, c)$, the magnitude of the percentage improvement decreases as $\Delta$ increases from 0, crosses the value $PI_4(\Delta, 1, \rho) = 0$, achieves a minimum value, and then increases asymptotically to 0. The behavior of the risk improvement function of $\hat{\mu}^P$ over $\hat{\mu}$ is similar to that of $PI_4(\Delta, \rho)$. However,

$PI_5(\Delta, \rho) \le PI_4(\Delta, \rho)$

for fixed $\alpha$, $\rho$ in the entire parameter space induced by $\Delta$. Thus, $\hat{\mu}^{P+}$ dominates $\hat{\mu}^P$. Finally, we observe that at $\Delta = 0$, the percentage improvement of $\hat{\mu}^{P+}$ over $\hat{\mu}$ is the largest as compared to the other proposed estimators. The tables indicate that $\hat{\mu}^{P+}$ has a larger gain over $\hat{\mu}$. Thus, the risks of $\hat{\mu}^{P+}$ and $\hat{\mu}^{S+}$ may not be comparable in general, even at $\Delta = 0$. Also, for large values of $\alpha$, both $\hat{\mu}^{P+}$ and $\hat{\mu}^P$ improve significantly over $\hat{\mu}$. However, the performance of $\hat{\mu}^P$ is adversely affected as $p$ increases.
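Percentage improvements of the kind tabulated above can be approximated by Monte Carlo simulation. The sketch below is our illustration, not the authors' program: the replication count, the random seed, and the normalization $100\,(R_{UE} - R_b)/R_{UE}$ (chosen so that positive values mean improvement over UE; the paper's exact normalization allows values above 100 and differs) are all assumptions.

```python
import numpy as np
from scipy.stats import f as f_dist

def percent_improvements(n, p, rho, alpha=0.05, reps=4000, seed=3):
    """Monte Carlo percentage risk improvement over UE under H0 (equal means)
    with an equicorrelated covariance; positive values mean improvement."""
    rng = np.random.default_rng(seed)
    m = n - p + 1
    c = (p - 3) / (m + 2)                              # c_opt
    t2a = (n - 1) * (p - 1) / m * f_dist.ppf(1 - alpha, p - 1, m)
    mu = np.zeros(p)                                   # H0: all component means equal
    Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
    L = np.linalg.cholesky(Sigma)
    C = np.eye(p) - np.ones((p, p)) / p
    losses = {k: 0.0 for k in ("UE", "SE", "PSE", "PE", "IPE")}
    for _ in range(reps):
        x = rng.standard_normal((n, p)) @ L.T + mu
        mu_ue = x.mean(axis=0)
        d = mu_ue - mu_ue.mean()                       # mu_ue minus the pooled mean
        r = x - mu_ue
        S = C @ (r.T @ r) @ C / (n - 1)
        T2 = n * (C @ mu_ue) @ np.linalg.pinv(S) @ (C @ mu_ue)
        shrink = c * (n - 1) / T2
        est = {"UE": mu_ue,
               "SE": mu_ue - shrink * d,
               "PSE": (mu_ue - d) + max(0.0, 1 - shrink) * d,
               "PE": mu_ue - d * (T2 <= t2a),
               "IPE": mu_ue - d * (T2 <= t2a) - shrink * (T2 > t2a) * d}
        for k, v in est.items():
            losses[k] += n * np.sum((v - mu) ** 2)     # accumulated quadratic loss
    r_ue = losses["UE"]
    return {k: 100 * (r_ue - v) / r_ue for k, v in losses.items() if k != "UE"}
```

Under $H_0$ every shrinkage estimator shows a positive improvement, and the PSE entry exceeds the SE entry, in line with the dominance results of Section 4.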

6. Concluding remarks

In this paper some of the statistical consequences of dealing with uncertain prior information for a multivariate normal distribution are considered. Two new estimators, $\hat{\mu}^{P+}$ and $\hat{\mu}^{S+}$, are proposed. It is noted that when $F_{(\alpha)} \notin [0, c]$, the proposed $\hat{\mu}^{P+}$, which makes use of both sample and nonsample information, dominates $\hat{\mu}^P$ in the entire parameter space regardless of the correctness of the nonsample information. It is known that the risk improvement of $\hat{\mu}^S$ over $\hat{\mu}$ is guaranteed regardless of the correctness of the nonsample prior information, and hence $\hat{\mu}$ is inadmissible. Sometimes $\hat{\mu}^S$ changes the sign of $\hat{\mu}$, which is not appreciated by practitioners; therefore the positive part of $\hat{\mu}^S$, denoted $\hat{\mu}^{S+}$, is considered. $\hat{\mu}^{S+}$ dominates $\hat{\mu}^S$ in the entire parameter space and prevents the over-shrinking behavior of $\hat{\mu}^S$ as well. For some values of $\alpha$, when $F_{(\alpha)} \in [0, c]$, the estimator $\hat{\mu}^{P+}$ behaves like $\hat{\mu}^{S+}$ and is superior to $\hat{\mu}$ for $W \in W^*$ and $p > 3$ for all values of the specification error. In this case $\hat{\mu}^P$ does not dominate $\hat{\mu}$ uniformly. Our analytical results are well supported by the computational work in the form of graphs and tables.


Acknowledgements

Thanks are due to the referees for their constructive suggestions, which brought the paper to its present form.

Appendix A

Proof of Theorem 3. (a) The proof for the risk function of $\hat{\mu}^S$ can be found in the literature and can be easily derived. To prove Eq. (3.18) given in Theorem 3, we write $\hat{\mu}^{S+}$ as follows:

$\hat{\mu}^{S+} = \hat{\mu}^S - I(T^2 \le (n-1)c)\left[1 - c(n-1)T^{-2}\right](\hat{\mu} - \hat{\mu}^R).$

Under the weighted loss function, the risk function of $\hat{\mu}^{S+}$ can be written as

$R(\hat{\mu}^{S+}; \Delta) = E\{n(\hat{\mu}^{S+} - \mu)' W (\hat{\mu}^{S+} - \mu)\} = E\{n(\hat{\mu}^S - \mu)' W (\hat{\mu}^S - \mu)\} - 2E\left[I(T^2 \le (n-1)c)\left(1 - c(n-1)T^{-2}\right) n(\hat{\mu}^S - \mu)' W (\hat{\mu} - \hat{\mu}^R)\right] + E\left[I(T^2 \le (n-1)c)\left(1 - c(n-1)T^{-2}\right)^2 n(\hat{\mu} - \hat{\mu}^R)' W (\hat{\mu} - \hat{\mu}^R)\right].$

Since

$\sqrt{n}(\hat{\mu}^S - \mu) = \sqrt{n}(\hat{\mu} - \mu) - c(n-1)T^{-2}\, Y,$

we can write

$R(\hat{\mu}^{S+}; \Delta) - R(\hat{\mu}^S; \Delta) = -2E\left[I(T^2 \le (n-1)c)\left(1 - c(n-1)T^{-2}\right)\left\{\sqrt{n}(\hat{\mu} - \mu) - c(n-1)T^{-2}\, Y\right\}' W Y\right] + E\left[I(T^2 \le (n-1)c)\left(1 - c(n-1)T^{-2}\right)^2 Y' W Y\right].$  (A.1)

Note that the first term of Eq. (A.1) can be written as

$-2E\left[I(T^2 \le (n-1)c)\left(1 - c(n-1)T^{-2}\right)\left(\sqrt{n}(\hat{\mu} - \mu)\right)' W Y\right] + 2E\left[I(T^2 \le (n-1)c)\left(1 - c(n-1)T^{-2}\right) c(n-1)T^{-2}\, Y' W Y\right].$  (A.2)

Eq. (A.2) can also be written as

$-2E\left[I(T^2 \le (n-1)c)\left(1 - c(n-1)T^{-2}\right) Y' W Y\right] + 2\sqrt{n}\, E\left[I(T^2 \le (n-1)c)\left(1 - c(n-1)T^{-2}\right) \mu' W Y\right] + 2E\left[I(T^2 \le (n-1)c)\left(1 - c(n-1)T^{-2}\right) c(n-1)T^{-2}\, Y' W Y\right].$  (A.3)

Thus, substituting (A.3) in (A.1), expanding the squared term in the last expression, and collecting like terms results in

$R(\hat{\mu}^{S+}; \Delta) - R(\hat{\mu}^S; \Delta) = -E\left[I(T^2 \le (n-1)c)\left(1 - c(n-1)T^{-2}\right)^2 Y' W Y\right] + 2\sqrt{n}\, \mu' W\, E\left[I(T^2 \le (n-1)c)\left(1 - c(n-1)T^{-2}\right) Y\right],$  (A.4)


which, by Eqs. (3.2) and (3.3), takes the form

$R(\hat{\mu}^{S+}; \Delta) - R(\hat{\mu}^S; \Delta) = -E\left[I\!\left(\frac{Z'Z}{\chi_m^2} \le c\right)\left(1 - c\,\frac{\chi_m^2}{Z'Z}\right)^2 Z' M^{-1} W M^{-1} Z\right] + 2\sqrt{n}\, \mu' W M^{-1}\, E\left[I\!\left(\frac{Z'Z}{\chi_m^2} \le c\right)\left(1 - c\,\frac{\chi_m^2}{Z'Z}\right) Z\right].$  (A.5)

Since $C\Sigma C' = (M'M)^{-1}$, we have $\mathrm{tr}(M^{-1} W M^{-1}) = \mathrm{tr}(WC\Sigma C')$. Applying Theorems 1 and 2 of Judge and Bock (1978, Appendix B), we obtain the following result:

$R(\hat{\mu}^{S+}; \Delta) - R(\hat{\mu}^S; \Delta) = -\mathrm{tr}(WC\Sigma C')\, E\left[I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le c\right)\left(1 - c\,\frac{\chi_m^2}{\chi_{p+1}^2(\Delta)}\right)^2\right] - n\delta' W \delta\, E\left[I\!\left(\frac{\chi_{p+3}^2(\Delta)}{\chi_m^2} \le c\right)\left(1 - c\,\frac{\chi_m^2}{\chi_{p+3}^2(\Delta)}\right)^2\right] + 2 n\delta' W \delta\, E\left[I\!\left(\frac{\chi_{p+1}^2(\Delta)}{\chi_m^2} \le c\right)\left(1 - c\,\frac{\chi_m^2}{\chi_{p+1}^2(\Delta)}\right)\right],$  (A.6)

where $\delta = C\mu$.

Expanding the squared terms and taking the conditional expectation with respect to $\chi_m^2$ in Eq. (A.6), the expression for the risk function of $\hat{\mu}^{S+}$ in Eq. (3.18) is established. Substituting $\mathrm{tr}(WC\Sigma C') = \sigma^2(1-\rho)\,\mathrm{tr}(WC) = (1-\rho)(p-1)$, $n\,\delta' W \delta = (1-\rho)\Delta$ and $c_{opt} = \frac{p-3}{m+2}$ in Eq. (3.18), we obtain an expression for computational purposes.

(b) In order to derive the risk function of $\hat{\mu}^{P+}$ in Eq. (3.21) of Theorem 3, we use the Stein identities, Theorems 1 and 2 of Judge and Bock (1978, Appendix B), and the relation $I\!\left(\frac{\chi_{p+j}^2(\Delta)}{\chi_m^2} > \frac{p-1}{m} F_{(\alpha)}\right) = 1 - I\!\left(\frac{\chi_{p+j}^2(\Delta)}{\chi_m^2} \le \frac{p-1}{m} F_{(\alpha)}\right)$ for $j = 1, 3$. To save space, the full proof is not provided here, but following the same procedure as in the derivation of the risk of $\hat{\mu}^{S+}$, the expression in Eq. (3.21) is obtained.

References

Ali, M.A., Saleh, A.K.Md.E., 1991. Preliminary test and empirical Bayes approach to shrinkage estimation of regression parameters. J. Japan Statist. Soc. 21, 22–30.
Anderson, T.W., 1984. An Introduction to Multivariate Statistical Analysis. Wiley, New York.
Da Silva, A.G., Han, C.P., 1984. Pooling means in a multivariate normal population. ESTADISTICA—J. Inter-American Statist. Institute 36 (126–127), 63–75.
James, W., Stein, C., 1961. Estimation with quadratic loss. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, CA, pp. 361–379.


Judge, G.G., Bock, M.E., 1978. The Statistical Implications of Pre-test and Stein-Rule Estimators in Econometrics. North-Holland, Amsterdam.
Sclove, S.L., Morris, C., Radhakrishnan, R., 1972. Non-optimality of preliminary-test estimators for the mean of a multivariate normal distribution. Ann. Math. Statist. 43, 1481–1490.