Statistics & Probability Letters 54 (2001) 331–340
MSE dominance of the pre-test iterative variance estimator over the iterative variance estimator in regression

Kazuhiro Ohtani*
Faculty of Economics, Kobe University, Rokko, Nada-ku, Kobe 657-8501, Japan
* Tel.: +81-78-881-1212; fax: +81-78-803-7293. E-mail address: [email protected] (K. Ohtani).

Received November 2000; received in revised form March 2001
Abstract

In this paper, we examine the small sample properties of the pre-test iterative variance estimator in regression. The explicit formula of MSE is derived, and it is shown that the pre-test iterative variance estimator with an appropriate critical value dominates the iterative variance estimator without pre-testing in terms of MSE. We also compare the MSE performances of the pre-test iterative variance estimators using the Stein-rule, minimum mean squared error, and adjusted minimum mean squared error estimators by numerical evaluations. © 2001 Elsevier Science B.V. All rights reserved.

MSC: 62J07; 62C15; 62F11

Keywords: Iterative variance estimator; Mean squared error; Pre-test; Regression error variance
1. Introduction and estimators

Consider a linear regression model,
$$
y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n), \tag{1}
$$
where $y$ is an $n \times 1$ vector of observations on a dependent variable, $X$ is an $n \times k$ full column rank matrix of observations on independent variables ($k < n$), $\beta$ is a $k \times 1$ vector of coefficients, and $\varepsilon$ is an $n \times 1$ random vector of normal error terms with $E[\varepsilon] = 0$ and $E[\varepsilon\varepsilon'] = \sigma^2 I_n$. The ordinary least squares (OLS) estimator of $\beta$ is
$$
b = S^{-1}X'y, \tag{2}
$$
where $S = X'X$, and the corresponding residual vector is
$$
e = y - Xb. \tag{3}
$$
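As a concrete companion to (1)–(3), the following minimal Python/NumPy sketch computes $b$, $S = X'X$, $e$, and $\nu = n - k$; the simulated data and the helper name are illustrative assumptions, not part of the paper.

```python
import numpy as np

def ols_quantities(y, X):
    """OLS quantities of (1)-(3): b = S^{-1} X'y, S = X'X, e = y - Xb."""
    S = X.T @ X                       # S = X'X (k x k, assumed nonsingular)
    b = np.linalg.solve(S, X.T @ y)   # OLS coefficient vector, eq. (2)
    e = y - X @ b                     # OLS residual vector, eq. (3)
    nu = len(y) - X.shape[1]          # degrees of freedom, nu = n - k
    return b, S, e, nu

# Illustrative data (an assumption, not from the paper)
rng = np.random.default_rng(0)
n, k, sigma = 20, 3, 1.0
X = rng.standard_normal((n, k))
beta = np.zeros(k)                    # model (1) under H0: beta = 0
y = X @ beta + sigma * rng.standard_normal(n)
b, S, e, nu = ols_quantities(y, X)
```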
If our concern is to estimate the error variance (i.e., $\sigma^2$), then the estimator
$$
s^2 = e'e/(\nu + 2), \tag{4}
$$
may be used, where $\nu = n - k$, since $s^2$ dominates the unbiased estimator of $\sigma^2$ in terms of mean squared error (MSE). We call $s^2$ the usual estimator. Although the usual estimator dominates the unbiased estimator, Stein (1964) showed that the usual estimator is dominated by the so-called Stein variance estimator
$$
\hat{\sigma}^2 = \min[\, y'y/(n+2),\; e'e/(\nu+2)\,]. \tag{5}
$$

Let the null hypothesis be $H_0: \beta = 0$, and the alternative hypothesis $H_1: \beta \neq 0$. If $H_0$ is accepted in the pre-test for $H_0$, then $y$ itself is used as a residual vector. However, if $H_0$ is rejected, then $e$ is used as a residual vector. Thus, if we conduct the pre-test for $H_0$, then a pre-test variance estimator is expressed as
$$
\hat{\sigma}^2(c) = I(F < c)\,\frac{y'y}{n+2} + I(F \geq c)\,\frac{e'e}{\nu+2}, \tag{6}
$$
where $F = (b'Sb/k)/(e'e/\nu)$ is the test statistic for $H_0: \beta = 0$, $c$ is a critical value of the pre-test, and $I(A)$ is an indicator function such that $I(A) = 1$ if an event $A$ occurs and $I(A) = 0$ otherwise. We see that if the critical value of the pre-test is $c = \nu/(\nu+2)$, then the pre-test variance estimator reduces to the Stein variance estimator.

Consider the following formally general shrinkage estimator of $\beta$:
$$
\hat{\beta} = \frac{b'Sb + \gamma_1 e'e}{b'Sb + \gamma_2 e'e}\, b, \tag{7}
$$
where $\gamma_1$ and $\gamma_2$ are appropriate constants. Then, $\hat{\beta}$ reduces to the minimum mean squared error (MMSE) estimator when $\gamma_1 = 0$ and $\gamma_2 = 1/\nu$, the adjusted MMSE (AMMSE) estimator when $\gamma_1 = 0$ and $\gamma_2 = k/\nu$, and the Stein-rule (SR) estimator when $\gamma_1 = -a$ and $\gamma_2 = 0$ ($0 \leq a \leq 2(k-2)/(\nu+2)$ for $k \geq 3$). [See, for example, Stein (1956), James and Stein (1961), and Ohtani (1996) for definitions of these estimators.] Also, as is shown in Ohtani (2000), $\hat{\beta}$ includes the double $k$-class estimator proposed by Ullah and Ullah (1978). If a residual vector is constructed based on $\hat{\beta}$, then the residual vector is
$$
\hat{e} = y - X\hat{\beta}. \tag{8}
$$

When the pre-test variance estimator given in (6) is constructed, $e$ is used as a residual vector when $H_0: \beta = 0$ is rejected. However, if $\hat{e}$ is used instead of $e$, then an alternative pre-test variance estimator is expressed as
$$
\hat{\sigma}^2_{\gamma_1\gamma_2}(c) = I(F < c)\,\frac{y'y}{n+2} + I(F \geq c)\,\frac{\hat{e}'\hat{e}}{\nu+2}. \tag{9}
$$

When $\gamma_1 = 0$, $\gamma_2 = 1/\nu$ and $c = 0$, $\hat{\sigma}^2_{\gamma_1\gamma_2}(c)$ reduces to
$$
\hat{\sigma}^2_{\mathrm{MM}} = (y - X\hat{\beta}_{\mathrm{MM}})'(y - X\hat{\beta}_{\mathrm{MM}})/(\nu+2), \tag{10}
$$
where $\hat{\beta}_{\mathrm{MM}}$ is Farebrother's (1975) operational variant of Theil's (1971) MMSE estimator. Wan and Kurumai (1999) show that $\hat{\sigma}^2_{\mathrm{MM}}$ may be viewed as the first step of the iterative procedure suggested by Vinod (1976). Thus, we call $\hat{\sigma}^2_{\mathrm{MM}}$ the iterative variance estimator using the MMSE estimator. In a similar way, we can obtain the iterative variance estimators using the AMMSE estimator ($\hat{\sigma}^2_{\mathrm{AMM}}$) and the SR estimator ($\hat{\sigma}^2_{\mathrm{SR}}$). We call $\hat{\sigma}^2_{\gamma_1\gamma_2}(c)$ the pre-test iterative variance estimator. We see that when $\gamma_1 = \gamma_2$ and $c = \nu/(\nu+2)$, the pre-test iterative variance estimator reduces to the Stein variance estimator.
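The estimators in (4)–(10) are simple functions of $b'Sb$, $e'e$ and the statistic $F$. The following sketch is illustrative only (the helper names and the particular SR constant are my assumptions): it implements the shrinkage family (7), the residual vector (8), and the pre-test iterative variance estimator (9), with the usual estimator (4) and the Stein variance estimator (5) recovered as the special cases noted above.

```python
import numpy as np

def pretest_iterative_variance(y, X, gamma1, gamma2, c):
    """Pre-test iterative variance estimator (9) for given (gamma1, gamma2) and critical value c."""
    n, k = X.shape
    nu = n - k
    S = X.T @ X
    b = np.linalg.solve(S, X.T @ y)
    e = y - X @ b
    bSb, ee = b @ S @ b, e @ e
    F = (bSb / k) / (ee / nu)          # test statistic for H0: beta = 0
    if F < c:                          # H0 "accepted": y itself is used as the residual vector
        return y @ y / (n + 2)
    beta_hat = (bSb + gamma1 * ee) / (bSb + gamma2 * ee) * b   # shrinkage estimator (7)
    e_hat = y - X @ beta_hat                                    # residual vector (8)
    return e_hat @ e_hat / (nu + 2)

def variance_estimators(y, X):
    """Special cases discussed in the text (k >= 3 assumed; the SR constant a is an illustrative choice)."""
    n, k = X.shape
    nu = n - k
    c = nu / (nu + 2)
    a = (k - 2) / (nu + 2)
    return {
        "usual":  pretest_iterative_variance(y, X, 0.0, 0.0, 0.0),   # gamma1 = gamma2, c = 0
        "Stein":  pretest_iterative_variance(y, X, 0.0, 0.0, c),     # gamma1 = gamma2, c = nu/(nu+2)
        "MMSE":   pretest_iterative_variance(y, X, 0.0, 1.0 / nu, c),
        "AMMSE":  pretest_iterative_variance(y, X, 0.0, k / nu, c),
        "SR":     pretest_iterative_variance(y, X, -a, 0.0, c),
    }
```

Setting $\gamma_1 = \gamma_2$ makes $\hat{\beta} = b$, so the same routine returns (6); this is why the usual and Stein estimators appear as special cases in the dictionary above.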
In this paper, we examine the small sample properties of the pre-test iterative variance estimator. In Section 2 the explicit formula of MSE is derived, and in Section 3 it is shown that the pre-test iterative variance estimator with an appropriate critical value dominates the iterative variance estimator without pre-testing in terms of MSE. In Section 4 we compare the MSE performances of the pre-test iterative variance estimators using the SR, MMSE and AMMSE estimators by numerical evaluations. Our numerical results show that although the pre-test variance estimators do not dominate the usual estimator, the pre-test variance estimators have smaller MSE than the Stein variance estimator when the noncentrality parameter is close to zero.

2. MSE

The MSE of $\hat{\sigma}^2_{\gamma_1\gamma_2}(c)$ is
$$
\mathrm{MSE}(\hat{\sigma}^2_{\gamma_1\gamma_2}(c)) = E\left[\left( I(F<c)\,\frac{y'y}{n+2} + I(F\ge c)\,\frac{\hat e'\hat e}{\nu+2} - \sigma^2 \right)^2\right]
$$
$$
= E\left[I(F<c)\,\frac{(y'y)^2}{(n+2)^2}\right] - 2\sigma^2 E\left[I(F<c)\,\frac{y'y}{n+2}\right] - 2\sigma^2 E\left[I(F\ge c)\,\frac{\hat e'\hat e}{\nu+2}\right] + E\left[I(F\ge c)\,\frac{(\hat e'\hat e)^2}{(\nu+2)^2}\right] + \sigma^4. \tag{11}
$$

Noting that $y'y = b'Sb + e'e$ and $b'X'y = b'Sb$, we obtain
$$
\hat e'\hat e = \left(y - \frac{b'Sb + \gamma_1 e'e}{b'Sb + \gamma_2 e'e}\,Xb\right)'\left(y - \frac{b'Sb + \gamma_1 e'e}{b'Sb + \gamma_2 e'e}\,Xb\right)
$$
$$
= y'y - 2\,\frac{b'Sb + \gamma_1 e'e}{b'Sb + \gamma_2 e'e}\,(b'X'y) + \left(\frac{b'Sb + \gamma_1 e'e}{b'Sb + \gamma_2 e'e}\right)^2 (b'Sb)
$$
$$
= e'e + \left(1 - \frac{b'Sb + \gamma_1 e'e}{b'Sb + \gamma_2 e'e}\right)^2 (b'Sb)
= e'e + \left(\frac{(\gamma_2 - \gamma_1)e'e}{b'Sb + \gamma_2 e'e}\right)^2 (b'Sb). \tag{12}
$$

Substituting (12) in (11), we have
$$
\mathrm{MSE}(\hat{\sigma}^2_{\gamma_1\gamma_2}(c)) = \frac{1}{(n+2)^2}\Big\{ E[I(F<c)(b'Sb)^2] + 2E[I(F<c)(b'Sb)(e'e)] + E[I(F<c)(e'e)^2] \Big\}
$$
$$
- \frac{2\sigma^2}{n+2}\Big\{ E[I(F<c)(b'Sb)] + E[I(F<c)(e'e)] \Big\}
$$
$$
- \frac{2\sigma^2}{\nu+2}\left\{ E[I(F\ge c)(e'e)] + E\left[ I(F\ge c)(b'Sb)(e'e)^2\left(\frac{\gamma_2-\gamma_1}{b'Sb+\gamma_2 e'e}\right)^2 \right] \right\}
$$
$$
+ \frac{1}{(\nu+2)^2}\left\{ E[I(F\ge c)(e'e)^2] + 2E\left[ I(F\ge c)(b'Sb)(e'e)^3\left(\frac{\gamma_2-\gamma_1}{b'Sb+\gamma_2 e'e}\right)^2 \right] + E\left[ I(F\ge c)(b'Sb)^2(e'e)^4\left(\frac{\gamma_2-\gamma_1}{b'Sb+\gamma_2 e'e}\right)^4 \right] \right\} + \sigma^4. \tag{13}
$$
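The identity (12) is the key step that reduces $\hat e'\hat e$ to powers of $b'Sb$ and $e'e$; a quick numerical check of the identity (a sketch only, with illustrative data and helper names of my own) can be written as follows.

```python
import numpy as np

def check_identity_12(y, X, gamma1, gamma2):
    """Verify e_hat'e_hat = e'e + ((gamma2-gamma1) e'e / (b'Sb + gamma2 e'e))^2 * b'Sb, eq. (12)."""
    S = X.T @ X
    b = np.linalg.solve(S, X.T @ y)
    e = y - X @ b
    bSb, ee = b @ S @ b, e @ e
    beta_hat = (bSb + gamma1 * ee) / (bSb + gamma2 * ee) * b
    e_hat = y - X @ beta_hat
    lhs = e_hat @ e_hat
    rhs = ee + ((gamma2 - gamma1) * ee / (bSb + gamma2 * ee)) ** 2 * bSb
    return np.isclose(lhs, rhs)

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))
y = rng.standard_normal(20)
print(check_identity_12(y, X, gamma1=0.0, gamma2=3.0 / 17.0))   # AMMSE weights k/nu = 3/17 -> True
```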
If we define the function $H(p,q,r;\gamma_1,\gamma_2,c)$ as
$$
H(p,q,r;\gamma_1,\gamma_2,c) = E\left[ I(F<c)\,(b'Sb)^p(e'e)^q \left(\frac{\gamma_2-\gamma_1}{b'Sb+\gamma_2 e'e}\right)^r \right], \tag{14}
$$
where $p$, $q$ and $r$ are parameters which satisfy $r < (\nu+k)/2 + p + q$, then the MSE of $\hat{\sigma}^2_{\gamma_1\gamma_2}(c)$ is written as
$$
\mathrm{MSE}(\hat{\sigma}^2_{\gamma_1\gamma_2}(c)) = \frac{1}{(n+2)^2}\big[ H(2,0,0;\gamma_1,\gamma_2,c) + 2H(1,1,0;\gamma_1,\gamma_2,c) + H(0,2,0;\gamma_1,\gamma_2,c) \big]
$$
$$
- \frac{2\sigma^2}{n+2}\big[ H(1,0,0;\gamma_1,\gamma_2,c) + H(0,1,0;\gamma_1,\gamma_2,c) \big]
$$
$$
- \frac{2\sigma^2}{\nu+2}\big[ (H(0,1,0;\gamma_1,\gamma_2,\infty) - H(0,1,0;\gamma_1,\gamma_2,c)) + (H(1,2,2;\gamma_1,\gamma_2,\infty) - H(1,2,2;\gamma_1,\gamma_2,c)) \big]
$$
$$
+ \frac{1}{(\nu+2)^2}\big[ (H(0,2,0;\gamma_1,\gamma_2,\infty) - H(0,2,0;\gamma_1,\gamma_2,c)) + 2(H(1,3,2;\gamma_1,\gamma_2,\infty) - H(1,3,2;\gamma_1,\gamma_2,c)) + (H(2,4,4;\gamma_1,\gamma_2,\infty) - H(2,4,4;\gamma_1,\gamma_2,c)) \big] + \sigma^4. \tag{15}
$$

Similar to Ohtani (2000), the explicit formula of $H(p,q,r;\gamma_1,\gamma_2,c)$ can be derived:
$$
H(p,q,r;\gamma_1,\gamma_2,c) = (2\sigma^2)^{p+q-r}(\gamma_2-\gamma_1)^r \sum_{i=0}^{\infty} w_i(\lambda)\, G_i(p,q,r;\gamma_1,\gamma_2,c), \tag{16}
$$
where
$$
G_i(p,q,r;\gamma_1,\gamma_2,c) = \frac{\Gamma((\nu+k)/2 + p + q + i - r)}{\Gamma(k/2+i)\,\Gamma(\nu/2)} \int_0^{c^*} \frac{t^{k/2+p+i-1}(1-t)^{\nu/2+q-1}}{[\gamma_2 + (1-\gamma_2)t]^r}\, dt, \tag{17}
$$
$c^* = kc/(\nu+kc)$, $w_i(\lambda) = \exp(-\lambda/2)(\lambda/2)^i/i!$, and $\lambda = \beta'S\beta/\sigma^2$ is the noncentrality parameter. Substituting (16) into (15), the explicit formula of MSE is obtained.
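For readers who wish to evaluate (14)–(17) directly, the following Python sketch is one possible implementation; it is not the author's FORTRAN program, and the use of SciPy's `quad` and the particular series-truncation rule are my assumptions. Each $G_i$ is a one-dimensional integral over $[0, c^*]$, and (16) is a Poisson-weighted series.

```python
import numpy as np
from scipy.special import gammaln
from scipy.integrate import quad

def G_i(p, q, r, g2, c, i, nu, k):
    """G_i(p, q, r; gamma1, gamma2, c) of (17); it depends on gamma2 only."""
    if c <= 0.0:
        return 0.0
    cstar = 1.0 if np.isinf(c) else k * c / (nu + k * c)      # c* = kc/(nu + kc)
    lncoef = gammaln((nu + k) / 2 + p + q + i - r) - gammaln(k / 2 + i) - gammaln(nu / 2)
    f = lambda t: t ** (k / 2 + p + i - 1) * (1 - t) ** (nu / 2 + q - 1) / (g2 + (1 - g2) * t) ** r
    val, _ = quad(f, 0.0, cstar)
    return np.exp(lncoef) * val

def H(p, q, r, g1, g2, c, nu, k, lam, sigma2=1.0, tol=1e-10, imax=500):
    """H(p, q, r; gamma1, gamma2, c) of (16); the Poisson-weighted series is truncated at tol."""
    total = 0.0
    for i in range(imax):
        if lam > 0.0:
            w = np.exp(-lam / 2 + i * np.log(lam / 2) - gammaln(i + 1))   # w_i(lambda)
        else:
            w = 1.0 if i == 0 else 0.0
        inc = w * G_i(p, q, r, g2, c, i, nu, k)
        total += inc
        if i > lam / 2 + 1 and inc < tol:      # stop once increments are negligible
            break
    return (2 * sigma2) ** (p + q - r) * (g2 - g1) ** r * total

def mse_pretest(g1, g2, c, nu, k, lam, sigma2=1.0):
    """MSE of the pre-test iterative variance estimator, assembled as in (15)."""
    n = nu + k
    h = lambda p, q, r, cc: H(p, q, r, g1, g2, cc, nu, k, lam, sigma2)
    inf = np.inf
    return ((h(2, 0, 0, c) + 2 * h(1, 1, 0, c) + h(0, 2, 0, c)) / (n + 2) ** 2
            - 2 * sigma2 * (h(1, 0, 0, c) + h(0, 1, 0, c)) / (n + 2)
            - 2 * sigma2 * ((h(0, 1, 0, inf) - h(0, 1, 0, c))
                            + (h(1, 2, 2, inf) - h(1, 2, 2, c))) / (nu + 2)
            + ((h(0, 2, 0, inf) - h(0, 2, 0, c)) + 2 * (h(1, 3, 2, inf) - h(1, 3, 2, c))
               + (h(2, 4, 4, inf) - h(2, 4, 4, c))) / (nu + 2) ** 2
            + sigma2 ** 2)
```

For example, `mse_pretest(0.0, 3/17, 17/19, nu=17, k=3, lam=0.0) / (2/19)` would give the kind of relative MSE reported in Section 4 for the AMMSE-based estimator with $c = \nu/(\nu+2)$, since $\mathrm{MSE}(s^2) = 2\sigma^4/(\nu+2)$.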
3. MSE performance

To examine the MSE performance of $\hat{\sigma}^2_{\gamma_1\gamma_2}(c)$, we differentiate the MSE with respect to $c$. To do so, the following formula is useful:
$$
\frac{\partial G_i(p,q,r;\gamma_1,\gamma_2,c)}{\partial c} = \frac{\Gamma((\nu+k)/2+p+q+i-r)}{\Gamma(k/2+i)\,\Gamma(\nu/2)}\; \frac{k^{k/2+p+i}\,\nu^{\nu/2+q}\,c^{k/2+p+i-1}}{(\gamma_2\nu + kc)^r\,(\nu+kc)^{(\nu+k)/2+p+q+i-r}}. \tag{18}
$$
Using (18), we differentiate (15) with respect to $c$:
$$
\frac{\partial\,\mathrm{MSE}(\hat{\sigma}^2_{\gamma_1\gamma_2}(c))}{\partial c} = \frac{1}{(n+2)^2}\left[ (2\sigma^2)^2 \sum_{i=0}^{\infty} w_i(\lambda)\frac{\partial G_i(2,0,0;\gamma_1,\gamma_2,c)}{\partial c} + 2(2\sigma^2)^2 \sum_{i=0}^{\infty} w_i(\lambda)\frac{\partial G_i(1,1,0;\gamma_1,\gamma_2,c)}{\partial c} + (2\sigma^2)^2 \sum_{i=0}^{\infty} w_i(\lambda)\frac{\partial G_i(0,2,0;\gamma_1,\gamma_2,c)}{\partial c} \right]
$$
$$
- \frac{2\sigma^2}{n+2}\left[ (2\sigma^2)\sum_{i=0}^{\infty} w_i(\lambda)\frac{\partial G_i(1,0,0;\gamma_1,\gamma_2,c)}{\partial c} + (2\sigma^2)\sum_{i=0}^{\infty} w_i(\lambda)\frac{\partial G_i(0,1,0;\gamma_1,\gamma_2,c)}{\partial c} \right]
$$
$$
+ \frac{2\sigma^2}{\nu+2}\left[ (2\sigma^2)\sum_{i=0}^{\infty} w_i(\lambda)\frac{\partial G_i(0,1,0;\gamma_1,\gamma_2,c)}{\partial c} + (2\sigma^2)(\gamma_2-\gamma_1)^2\sum_{i=0}^{\infty} w_i(\lambda)\frac{\partial G_i(1,2,2;\gamma_1,\gamma_2,c)}{\partial c} \right]
$$
$$
- \frac{1}{(\nu+2)^2}\left[ (2\sigma^2)^2\sum_{i=0}^{\infty} w_i(\lambda)\frac{\partial G_i(0,2,0;\gamma_1,\gamma_2,c)}{\partial c} + 2(2\sigma^2)^2(\gamma_2-\gamma_1)^2\sum_{i=0}^{\infty} w_i(\lambda)\frac{\partial G_i(1,3,2;\gamma_1,\gamma_2,c)}{\partial c} + (2\sigma^2)^2(\gamma_2-\gamma_1)^4\sum_{i=0}^{\infty} w_i(\lambda)\frac{\partial G_i(2,4,4;\gamma_1,\gamma_2,c)}{\partial c} \right]
$$
$$
= (2\sigma^2)^2 \sum_{i=0}^{\infty} w_i(\lambda)\, \frac{\Gamma((\nu+k)/2+i+2)}{\Gamma(k/2+i)\,\Gamma(\nu/2)}\, \frac{k^{k/2+i}\,\nu^{\nu/2}\,c^{k/2+i-1}}{(\nu+kc)^{(\nu+k)/2+i+1}}
$$
$$
\times\Bigg\{ \frac{1}{(n+2)^2}\left[ \frac{k^2c^2}{\nu+kc} + \frac{2kc\nu}{\nu+kc} + \frac{\nu^2}{\nu+kc} \right] - \frac{1}{n+2}\left[ \frac{kc}{(\nu+k)/2+i+1} + \frac{\nu}{(\nu+k)/2+i+1} \right]
$$
$$
+ \frac{1}{\nu+2}\left[ \frac{\nu}{(\nu+k)/2+i+1} + \frac{(\gamma_2-\gamma_1)^2 k\nu^2 c}{((\nu+k)/2+i+1)(\gamma_2\nu+kc)^2} \right]
$$
$$
- \frac{1}{(\nu+2)^2}\left[ \frac{\nu^2}{\nu+kc} + \frac{2(\gamma_2-\gamma_1)^2 k\nu^3 c}{(\gamma_2\nu+kc)^2(\nu+kc)} + \frac{(\gamma_2-\gamma_1)^4 k^2\nu^4 c^2}{(\gamma_2\nu+kc)^4(\nu+kc)} \right] \Bigg\}
$$
$$
= (2\sigma^2)^2 \sum_{i=0}^{\infty} w_i(\lambda)\, \frac{\Gamma((\nu+k)/2+i+2)}{\Gamma(k/2+i)\,\Gamma(\nu/2)}\, \frac{k^{k/2+i}\,\nu^{\nu/2}\,c^{k/2+i-1}}{(\nu+kc)^{(\nu+k)/2+i+1}}\; D_1(c)\,D_2(c), \tag{19}
$$
where
$$
D_1(c) = \frac{\nu}{\nu+2}\left[ 1 + \frac{(\gamma_2-\gamma_1)^2 k\nu c}{(\gamma_2\nu+kc)^2} \right] - \frac{\nu+kc}{n+2},
$$
$$
D_2(c) = \frac{1}{(\nu+k)/2+i+1} - \frac{1}{\nu+kc}\left[ \frac{\nu+kc}{n+2} + \frac{\nu}{\nu+2}\left( 1 + \frac{(\gamma_2-\gamma_1)^2 k\nu c}{(\gamma_2\nu+kc)^2} \right) \right]. \tag{20}
$$

Noting that $n = \nu + k$ and
$$
\frac{\nu}{\nu+2}\,\frac{(\gamma_2-\gamma_1)^2 k\nu c}{(\gamma_2\nu+kc)^2} \geq 0, \tag{21}
$$
we obtain
$$
D_1(c) \geq \frac{\nu}{\nu+2} - \frac{\nu+kc}{n+2} = \frac{-k[(\nu+2)c - \nu]}{(\nu+2)(n+2)}. \tag{22}
$$
Also, noting that $i \geq 0$, and
$$
\frac{1}{(\nu+k)/2+i+1} \leq \frac{1}{n/2+1}, \tag{23}
$$
we obtain
$$
D_2(c) \leq \frac{1}{(\nu+k)/2+i+1} - \frac{1}{\nu+kc}\left[ \frac{\nu+kc}{n+2} + \frac{\nu}{\nu+2} \right] \leq \frac{k[(\nu+2)c - \nu]}{(n+2)(\nu+kc)(\nu+2)}. \tag{24}
$$
We see from (22) and (24) that if $c < \nu/(\nu+2)$, then $D_1(c) > 0$ and $D_2(c) < 0$. This indicates that $\mathrm{MSE}(\hat{\sigma}^2_{\gamma_1\gamma_2}(c))$ decreases as $c$ increases from $0$ to $\nu/(\nu+2)$. Since the pre-test iterative variance estimator with $c = \nu/(\nu+2)$ has the smallest MSE among the pre-test estimators with $c \in [0, \nu/(\nu+2)]$, the critical value $c = \nu/(\nu+2)$ may be regarded as an appropriate critical value of the pre-test. Since the pre-test estimator reduces to the iterative variance estimator when $c = 0$, we have the following theorem:

Theorem 1. The pre-test iterative variance estimator with $c = \nu/(\nu+2)$ dominates the iterative variance estimator in terms of MSE.

Since Theorem 1 holds for all pairs of $\gamma_1$ and $\gamma_2$, we see that the estimators $\hat{\sigma}^2_{\mathrm{MM}}$, $\hat{\sigma}^2_{\mathrm{AMM}}$, and $\hat{\sigma}^2_{\mathrm{SR}}$ are inadmissible. When $\gamma_1 = \gamma_2$ and $c = \nu/(\nu+2)$, $\hat{\sigma}^2_{\gamma_1\gamma_2}(c)$ reduces to the Stein variance estimator. Also, when $\gamma_1 = \gamma_2$ and $c = 0$, $\hat{\sigma}^2_{\gamma_1\gamma_2}(c)$ reduces to the usual estimator. Thus, Theorem 1 gives an alternative proof of the well-known result that the Stein variance estimator dominates the usual estimator in terms of MSE. However, Theorem 1 does not necessarily indicate that the pre-test iterative variance estimator dominates the usual estimator. Also, although the MSE of $\hat{\sigma}^2_{\gamma_1\gamma_2}(c)$ decreases as $c$ increases from $0$ to $\nu/(\nu+2)$, the MSE performance needs to be explored when $c$ increases beyond $\nu/(\nu+2)$. Since further theoretical analysis of the MSE is difficult, we examine the MSE performance of $\hat{\sigma}^2_{\gamma_1\gamma_2}(c)$ by numerical evaluations in the next section.

4. Numerical analysis

We examine the MSE performances of the pre-test iterative variance estimators using the MMSE, AMMSE and SR estimators by numerical evaluations. The parameter values used in the numerical evaluations are: $n = 15, 20, 30, 40$; $k = 3, 4, 5, 6, 7, 8$; $\lambda =$ various values. The numerical evaluations are executed on a personal computer, using FORTRAN code. Since $G_i(p,q,r;\gamma_1,\gamma_2,c)$ given in (17) is expressed by an integral, we use Simpson's 3/8 rule with 500 equal subdivisions. Also, the infinite series in $H(p,q,r;\gamma_1,\gamma_2,c)$ given in (16) is judged to converge when the increment of the series becomes smaller than $10^{-10}$.

Table 1 shows the relative MSE of the pre-test iterative variance estimator using the AMMSE estimator to the usual estimator for various values of $c$. Since the relative MSE is defined as $\mathrm{MSE}(\hat{\sigma}^2_{\gamma_1\gamma_2}(c))/\mathrm{MSE}(s^2)$, where $\mathrm{MSE}(s^2) = 2\sigma^4/(\nu+2)$, the pre-test iterative variance estimator has smaller MSE than the usual estimator if the relative MSE is smaller than unity. We see from Table 1 that the relative MSE decreases as $c$ increases from zero to $\nu/(\nu+2) = 0.895$. This confirms the result given in Theorem 1. However, when $\lambda = 0$, the relative MSE decreases monotonically as $c$ increases from 0.895. When $\lambda = 1$, the relative MSE decreases as $c$ increases from 0.895 to 1.2, and it has a local minimum at $c = 1.2$. Also, as $c$ increases from 1.2, the MSE increases and it has a local maximum at $c = 2.2$. When $\lambda = 15$, the relative MSE decreases as $c$ increases from 0.895 to 1.2, and it increases monotonically as $c$ increases from 1.2. Thus, the MSE seems to have a global minimum at $c = 1.2$. When the MMSE and SR estimators are used in the residual vector, the MSE performance of the pre-test iterative variance estimator for various values of $c$ is similar to that of the pre-test iterative variance estimator using the AMMSE estimator. Thus, we do not show the results when the MMSE and SR estimators are used in the residual vector.
Table 1
Relative MSEs of the pre-test iterative variance estimator using the AMMSE estimator for k = 3 and n = 20

  c      λ = 0     λ = 1     λ = 15
 0.0    1.0070    1.0130    1.0319
 0.1    0.9984    1.0073    1.0318
 0.2    0.9859    0.9980    1.0318
 0.3    0.9740    0.9879    1.0317
 0.4    0.9641    0.9784    1.0315
 0.5    0.9566    0.9700    1.0311
 0.6    0.9513    0.9632    1.0307
 0.7    0.9479    0.9579    1.0302
 0.8    0.9458    0.9540    1.0296
 0.9    0.9446    0.9514    1.0291
 1.0    0.9441    0.9497    1.0286
 1.1    0.9440    0.9490    1.0282
 1.2    0.9440    0.9488    1.0281
 1.3    0.9440    0.9491    1.0284
 1.4    0.9438    0.9496    1.0291
 1.5    0.9435    0.9504    1.0305
 1.6    0.9429    0.9512    1.0327
 1.7    0.9421    0.9519    1.0358
 1.8    0.9410    0.9527    1.0399
 1.9    0.9397    0.9532    1.0451
 2.0    0.9383    0.9537    1.0517
 2.1    0.9366    0.9539    1.0597
 2.2    0.9348    0.9540    1.0693
 2.3    0.9329    0.9539    1.0805
 2.4    0.9308    0.9536    1.0934
 2.5    0.9287    0.9532    1.1081
 2.6    0.9265    0.9525    1.1247
 2.7    0.9243    0.9518    1.1431
 2.8    0.9221    0.9509    1.1635
 2.9    0.9199    0.9498    1.1859
 3.0    0.9177    0.9487    1.2102
The relative MSE decreases as $c$ increases from zero to $\nu/(\nu+2)$, and the value of $c$ at which the relative MSE has a local minimum depends on the unknown parameter $\lambda$. Thus, we use the critical value $c = \nu/(\nu+2)$ in the following numerical evaluations.

Figs. 1 and 2 show typical results of the numerical evaluations. In the figures, 'SV' indicates the Stein variance estimator, and 'MMSE', 'AMMSE' and 'SR' indicate the pre-test iterative variance estimators using the MMSE, AMMSE and SR estimators, respectively. When $\gamma_2 = 0$, the integral in (17) is expressed as
$$
\int_0^{c^*} t^{k/2+p-r+i-1}(1-t)^{\nu/2+q-1}\, dt. \tag{25}
$$
Since the minimum values of $p - r$ and $i$ are $-2$ and $0$, $k$ should be larger than or equal to 5 for the integral to exist. Since the value of $\gamma_2$ is 0 in the SR estimator, the MSE does not exist when $k \leq 4$. Thus, when $k = 3$, the relative MSE of the pre-test iterative variance estimator using the SR estimator is not shown.

We see from Fig. 1 that the pre-test iterative variance estimators using the MMSE and AMMSE estimators do not dominate the usual estimator. However, the pre-test iterative variance estimators have smaller MSE than the Stein variance estimator for $0 \leq \lambda \leq 3.05$. In particular, the pre-test iterative variance estimator using the AMMSE estimator has the smallest MSE for $0 \leq \lambda \leq 3.05$ among the three estimators considered here.
Fig. 1. Relative MSEs of the pre-test iterative variance estimators for k = 3 and n = 20.
Fig. 2. Relative MSEs of the pre-test iterative variance estimators for k = 6 and n = 20.
Comparing the maximum gain in MSE from using the pre-test iterative variance estimator instead of the usual estimator at $\lambda = 0$ with the maximum loss around $\lambda = 19$, the gain is larger. We see from Fig. 2 that both the maximum gain and the maximum loss in MSE from using the pre-test iterative variance estimator get larger when $k$ increases from 3 to 6. Again, the pre-test iterative variance estimator using the AMMSE estimator has the smallest MSE around $\lambda = 0$ among the four estimators considered here. The MSE performances of the Stein variance estimator and the pre-test iterative variance estimator using the MMSE estimator are comparable, though the relative MSE of the latter is slightly larger than unity for $\lambda \geq 19$. Also, although the pre-test iterative variance estimator using the AMMSE estimator has much smaller MSE around $\lambda = 0$ than the pre-test iterative variance estimator using the SR estimator, the maxima of the relative MSEs of these two estimators are almost identical. Again, comparing the maximum gain in MSE from using the pre-test iterative variance estimator instead of the usual estimator at $\lambda = 0$ with the maximum loss around $\lambda = 19$, the gain is larger.
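Comparisons of this kind can be reproduced numerically from the formulas of Section 2. The sketch below is illustrative only: it reuses the `mse_pretest` helper sketched after (17) rather than the author's FORTRAN program, and the grid of $\lambda$ values and the SR constant are my assumptions. It scans the relative MSE, $\mathrm{MSE}(\hat{\sigma}^2_{\gamma_1\gamma_2}(c))/\mathrm{MSE}(s^2)$ with $\mathrm{MSE}(s^2) = 2\sigma^4/(\nu+2)$, for the MMSE-, AMMSE- and SR-based pre-test iterative variance estimators at $c = \nu/(\nu+2)$.

```python
import numpy as np

# Assumes mse_pretest(g1, g2, c, nu, k, lam, sigma2) from the sketch after eq. (17).
def relative_mse_table(n=20, k=6, lams=np.arange(0.0, 30.1, 1.0), sigma2=1.0):
    nu = n - k
    c = nu / (nu + 2)                       # the critical value recommended by Theorem 1
    mse_usual = 2 * sigma2 ** 2 / (nu + 2)  # MSE(s^2) = 2*sigma^4/(nu + 2)
    a = (k - 2) / (nu + 2)                  # illustrative SR constant
    cases = {"MMSE": (0.0, 1.0 / nu), "AMMSE": (0.0, k / nu), "SR": (-a, 0.0)}
    rows = []
    for lam in lams:
        row = {"lambda": lam}
        for name, (g1, g2) in cases.items():
            row[name] = mse_pretest(g1, g2, c, nu, k, lam, sigma2) / mse_usual
        rows.append(row)
    return rows

for row in relative_mse_table():
    print(row)   # relative MSE < 1 means smaller MSE than the usual estimator s^2
```

Curves like those in Figs. 1 and 2 correspond to plotting these columns against $\lambda$; the Stein variance estimator is obtained from the same routine by setting $\gamma_1 = \gamma_2$.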
Acknowledgements

The author is grateful to the referee for very helpful comments and suggestions.
References

Farebrother, R.W., 1975. The minimum mean square error linear estimator and ridge regression. Technometrics 17, 127–128.
James, W., Stein, C., 1961. Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1, University of California Press, Berkeley, pp. 361–379.
Ohtani, K., 1996. On an adjustment of degrees of freedom in the minimum mean squared error estimator. Comm. Statist. Theory Methods 25, 3049–3058.
Ohtani, K., 2000. Pre-test double k-class estimators in linear regression. J. Statist. Plann. Inference 87, 287–299.
Stein, C., 1956. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, pp. 197–206.
Stein, C., 1964. Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Ann. Inst. Statist. Math. 16, 155–160.
Theil, H., 1971. Principles of Econometrics. Wiley, New York.
Ullah, A., Ullah, S., 1978. Double k-class estimators of coefficients in linear regression. Econometrica 46, 705–722.
Vinod, H.D., 1976. Simulation and extension of a minimum mean squared error estimator in comparison with Stein's. Technometrics 18, 491–496.
Wan, A.T.K., Kurumai, H., 1999. An iterative feasible minimum mean squared error estimator of the disturbance variance in linear regression under asymmetric loss. Statist. Probab. Lett. 45, 253–259.