Journal of Statistical Planning and Inference 115 (2003) 425 – 439
www.elsevier.com/locate/jspi
Confidence intervals based on robust regression

Christopher Field^a,*, Julie Zhou^b

^a Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada B3H 3J5
^b University of Victoria, Canada

Received 29 January 2001; accepted 16 March 2002
Abstract

In this paper, we consider constructing reliable confidence intervals for regression parameters using robust M-estimation, allowing for the possibility of time series correlation among the errors. The change of variance function is used to approximate the theoretical coverage of confidence intervals based on uncorrelated errors. Bootstrap techniques for dependent data and weighted empirical adaptive variance estimators are used to construct three variance estimates. A simulation study is conducted to investigate the performance of those variance estimates for small and large sample sizes. In particular, the simulated empirical coverages are examined for various models of correlated errors. Our procedures show improvement over intervals based on independence, but also illustrate the need for caution in carrying out inference in linear models for moderate sample sizes if correlation among the errors is suspected. © 2002 Elsevier Science B.V. All rights reserved.

MSC: primary 62E20; 62G35; secondary 62G15

Keywords: Bootstrap; Change of variance function; Correlated errors; M-estimator; Robust confidence interval
1. Introduction

In this paper, we investigate confidence intervals for regression parameters. When the errors in a regression model are uncorrelated, the usual t-interval works well and has coverage close to the nominal level. However, when the errors are weakly correlated, the coverage of the t-interval based on uncorrelated errors can be quite different from the nominal level. Thus the t-interval is not robust against small departures from
Research supported by the Natural Sciences and Engineering Research Council of Canada.
* Corresponding author. Tel.: +1-902-494-3339; fax: +1-902-494-5130.
E-mail address: [email protected] (C. Field).
uncorrelated errors. In many situations, particularly where the data are collected in sequence, it is likely that observations may be correlated, and it is important to construct robust (reliable) confidence intervals allowing for the possibility of correlation among the errors. Such intervals should have coverages close to the nominal level for both uncorrelated and weakly correlated errors. Two issues are addressed in the paper: (I) What is the coverage of the t-interval based on uncorrelated errors if the errors are weakly correlated? (II) How do we construct reliable confidence intervals in the presence of possible time series correlation among the errors? In particular, we use Huber's M-estimator to estimate regression parameters, but the results are quite general and can be applied to other estimators (e.g. the least squares estimator).

We consider a linear regression model,

  y_i = θᵀx_i + ε_i,  i = 1, …, n,

where θ ∈ R^q is an unknown parameter, and the errors {ε_i} may be correlated with mean 0 and variance σ². A robust M-estimator θ̂ (Huber, 1981, p. 162) is defined to be a solution of

  min_θ Σ_{i=1}^n ρ(y_i − θᵀx_i),

where the function ρ is usually symmetric and convex with a unique minimum at zero. With a proper choice of the function ρ, the M-estimator is robust against outliers in the response variable y. When the errors are uncorrelated, under some regularity conditions the M-estimator is asymptotically normally distributed (Huber, 1973, 1981), i.e.

  √n(θ̂ − θ) → N(0, ACOV(θ̂))  as n → ∞,

where, with ψ = ρ′ (′ denotes derivative in this paper) and design matrix X = (x₁, …, x_n)ᵀ,

  ACOV(θ̂) = σ² (E[ψ(ε)]² / (E[ψ′(ε)])²) lim_{n→∞} (XᵀX/n)⁻¹.

Using this asymptotic distribution, one can construct an approximate 100(1−α)% confidence interval for a linear combination aᵀθ of the regression parameters as

  aᵀθ̂ ± t_{α/2,0.6n} √(aᵀ COV(θ̂) a),   (1.1)

where COV(θ̂) = ACOV(θ̂)/n, and the t-distribution with 0.6n degrees of freedom is suggested by Field and Wiens (1994) to obtain good coverage and is also confirmed by our simulation results. Some theoretical basis for the reduced degrees of freedom for the studentized ratio is given in Huber (1970) based on expansions of the terms
involved in the ratio. The actual ratio is largely based on empirical studies as noted above.

An unbiased estimate for the covariance matrix COV(θ̂) is given in Huber (1981, p. 173),

  K² σ̂² ([1/(n−q)] Σ_i (ψ(r_i/σ̂))² / [(1/n) Σ_i ψ′(r_i/σ̂)]²) (XᵀX)⁻¹,

where σ̂ is a scale estimate of the residuals r_i = y_i − θ̂ᵀx_i, for example (Huber, 1981, p. 107)

  σ̂ = MAD(r_i) = 1.4826 med_i {|r_i − med_j r_j|},

and K is a correction factor given by K = 1 + (q/n) var(ψ′)/(E ψ′)².

When there is correlation among the errors, a good empirical coverage depends on a reliable covariance estimate. If we ignore the correlation and still use the covariance estimates for uncorrelated errors, then the empirical coverages of confidence intervals are very different from the nominal level. For instance, Table 2 of Field and Wiens (1994) presented the lower coverage (the proportion of those intervals whose upper limits lie below the true parameter value) for 90% confidence intervals based on Huber's M-estimate under MA(1) (the first-order moving average process), AR(1) (the first-order autoregressive process) and MA(7) correlation structures for sample size n = 20. These lower coverages are 0.117, 0.182 and 0.010, compared to the nominal level of 0.050.

The change of variance function (CVF) is a very useful tool to gain an understanding of Issue (I), and we will use it in Section 2 to assess the coverage. Both theoretical and simulated coverages of interval (1.1) are presented for various sample sizes, and the results are consistent. Even if the autocorrelation among the errors is very weak and sometimes hard to detect, the coverages are very different from the nominal level for all sample sizes.

Some research has been done to improve the empirical coverages of confidence intervals when data are dependent. Field and Wiens (1994) studied one-step M-estimators obtained by transforming the model to one of approximate independence. The empirical coverages were improved, but the differences between the empirical coverages and the nominal levels are still visible. Lumley and Heagerty (1999) investigated weighted empirical adaptive variance estimators, and simulated empirical coverages were obtained for the large sample size n = 100. Their approach is based on the sandwich estimate of the covariance matrix which arises when estimates are obtained from estimating equations. To take account of the dependence, they introduce weighting into the central term of the sandwich estimate. Their formulation includes quadratic spectral estimates, window or block resampling methods and jackknife methods.

In this paper, we construct reliable confidence intervals for regression parameters without knowing the correlation structure among the errors, when the dependence is mild. We propose three different procedures for calculating these intervals. In each case we estimate the variance of the estimate taking into account the unknown correlation in the errors. The first procedure uses a bootstrap procedure in which we estimate an AR structure for the residuals and then resample using this estimated AR model. The
remaining procedures build in a test for a simple correlation among the residuals. If we conclude there is no correlation, we use the uncorrelated variance estimate, while if it is correlated we use an estimate of the variance which assumes correlated errors. The first of these methods assumes an MA(1) structure and estimates the MA parameter. A second method uses a particular case of the weighted adaptive variance estimates of Lumley and Heagerty (1999). These confidence intervals have good empirical coverages for uncorrelated errors as well as for weakly correlated errors.

In Section 3, we give details of the three variance estimation methods for M-estimators which accommodate the possibility of correlation. In Section 4, the empirical coverages of 90% and 95% confidence intervals are examined through a simulation study for sample sizes n = 20, 40, 60 and 100. Concluding remarks are in Section 5.
2. Change of variance function

The change of variance function provides a natural mechanism for assessing coverage under deviations from the idealized model. We begin by looking at the change of variance function (CVF) for the asymptotic covariance matrix. In Field and Wiens (1994), the asymptotic covariance matrix for the M-estimator θ̂ under correlated errors is

  ACOV(θ̂) = σ² (E[ψ(ε)]²/(E[ψ′(ε)])²) lim_{n→∞} (XᵀX/n)⁻¹ (XᵀPX/n) (XᵀX/n)⁻¹,   (2.1)

where P is the autocorrelation matrix of {ψ(ε_i)}. The variance of aᵀθ̂ can be approximated by

  V(θ̂; X, P) = aᵀ COV(θ̂) a = c₀ aᵀ(XᵀX)⁻¹(XᵀPX)(XᵀX)⁻¹ a,

where COV(θ̂) = ACOV(θ̂)/n and c₀ = K² σ̂² [1/(n−q)] Σ_i (ψ(r_i/σ̂))² / [(1/n) Σ_i ψ′(r_i/σ̂)]². Denote the standard error by S(θ̂; X, P) = √V(θ̂; X, P).

As noted earlier, ignoring the error correlations (i.e. setting P = I) gives coverages for the confidence intervals which are very different from the nominal levels even if the correlation is very weak. To discuss this issue in more detail and find the coverage of interval (1.1), we use the CVF when there is a small departure from uncorrelated errors. The CVF is a very useful tool to study the robustness of a procedure and indicates the local robustness when there is a small departure from the underlying model assumptions. General theory about the CVF can be found in Hampel et al. (1986). Here we define the CVF of aᵀθ̂ in the direction of an autocorrelation matrix P₁ to be

  CVF(P₁) = [∂V(θ̂; X, (1−v)I + vP₁)/∂v]|_{v=0} / V(θ̂; X, I),

where ∂/∂v denotes the partial derivative with respect to v. From (2.1) we get

  CVF(P₁) = aᵀ(XᵀX)⁻¹(Xᵀ(P₁ − I)X)(XᵀX)⁻¹ a / aᵀ(XᵀX)⁻¹ a.   (2.2)
Remarks.
1. The CVF indicates a relative change in the variance when the errors move from independence to small correlation in the direction of the error structure P₁. If P₁ = I, then CVF(I) = 0.
2. If the errors {ε_i}, i = 1, …, n, are MA(1) errors, then the autocorrelation matrix is P(ρ₁) = (p_{i,j}) with p_{i,i} = 1, p_{i,i+1} = p_{i+1,i} = ρ₁, and p_{i,j} = 0 for |i − j| > 1. We can write, for ρ₁ > 0,

  P(ρ₁) = (1−v)I + vP₁  with v = ρ₁/0.5, P₁ = P(0.5).   (2.3)

For ρ₁ < 0, we have v = −ρ₁/0.5 and P₁ = P(−0.5).
3. For general errors, we can define v = |ρ₁|/0.5 and P₁ = I + (P − I)/v, where ρ₁ is the lag-1 autocorrelation.

Using the CVF, we can approximate the standard error of aᵀθ̂ by

  S(θ̂; X, P) ≈ S(θ̂; X, I) [1 + (v/2) CVF(P₁) − (v²/8) CVF²(P₁) + (v³/16) CVF³(P₁) + ⋯]   (2.4)

for some P₁ and v > 0 satisfying P = (1−v)I + vP₁; the series is the expansion of √(1 + v CVF(P₁)). Notice that S(θ̂; X, I) is the standard error of aᵀθ̂ for uncorrelated errors. From (2.4), if CVF(P₁) > 0 and v is small, then S(θ̂; X, I) underestimates S(θ̂; X, P). Therefore, the confidence interval based on S(θ̂; X, I) is shorter, and the empirical coverage is smaller than the nominal value. On the other hand, if CVF(P₁) < 0, the empirical coverage is larger than the nominal value. The following theorem gives an approximation for the empirical coverages of confidence interval (1.1) based on uncorrelated errors.

Theorem 2.1. For weakly correlated errors with autocorrelation matrix P, the empirical coverage of the 100(1−α)% confidence interval (1.1) based on uncorrelated errors is 100(1−α̃)%, where α̃ is approximately

  2 · P(T_{0.6n} > t_{α/2,0.6n} / √(1 + v CVF(P₁)))   (2.5)

for some P₁ and v > 0 satisfying P = (1−v)I + vP₁, and T_{0.6n} has a t-distribution with 0.6n degrees of freedom.

Proof. Since the 100(1−α)% confidence interval based on uncorrelated errors is

  aᵀθ̂ ± t_{α/2,0.6n} √(aᵀCOV(θ̂)a) = aᵀθ̂ ± t_{α/2,0.6n} S(θ̂; X, I),

from (2.4) we have

  t_{α/2,0.6n} S(θ̂; X, I) ≈ (t_{α/2,0.6n} / √(1 + v CVF(P₁))) S(θ̂; X, P).
Table 1
The change of variance function for MA(1) processes

MA(1)                            n = 10    n = 20    n = 40    n = 60    n = 100
θ₁ > 0, v = 2θ₁/(1 + θ₁²)        0.806     0.900     0.949     0.966     0.979
θ₁ < 0, v = −2θ₁/(1 + θ₁²)      −0.806    −0.900    −0.949    −0.966    −0.979

Table 2
The empirical coverages for 90% confidence intervals calculated from Theorem 2.1, when the errors are MA(1) processes with parameter θ₁

θ₁       n = 10    n = 20    n = 40    n = 60    n = 100
 0.1     0.878     0.872     0.869     0.868     0.867
 0.2     0.856     0.845     0.839     0.837     0.835
−0.1     0.921     0.926     0.929     0.930     0.931
−0.2     0.939     0.948     0.953     0.955     0.956

Table 3
The empirical coverages for 95% confidence intervals calculated from Theorem 2.1, when the errors are MA(1) processes with parameter θ₁

θ₁       n = 10    n = 20    n = 40    n = 60    n = 100
 0.1     0.936     0.931     0.928     0.928     0.927
 0.2     0.922     0.912     0.906     0.904     0.903
−0.1     0.962     0.966     0.968     0.969     0.969
−0.2     0.973     0.978     0.981     0.983     0.983
Thus when there is autocorrelation among the errors, the confidence coverage is 100(1−α̃)%, where

  t_{α̃/2,0.6n} = t_{α/2,0.6n} / √(1 + v CVF(P₁)),

which is equivalent to (2.5).

Example 1. We calculate the CVF and the coverages of 90% and 95% confidence intervals using Theorem 2.1. Consider the simple linear regression y_i = β₀ + β₁x_i + ε_i, with design points x_i = 5(i−1)/(n−1), i = 1, …, n. Assume the errors follow an MA(1) process, i.e. ε_i = w_i + θ₁w_{i−1}, where θ₁ is the MA(1) parameter and the w_i are white noise. The CVF of β₀ + β₁ is calculated from (2.2) with P₁ and v as in (2.3). Representative results are reported in Table 1 for various sample sizes. To compute the empirical coverages of the 90% and 95% confidence intervals (1.1) based on uncorrelated errors, we apply Theorem 2.1; the results are in Tables 2 and 3 for various values of θ₁ and various sample sizes. For θ₁ > 0, the CVF in Table 1 is positive, and the empirical coverages in Tables 2 and 3 are lower than the nominal levels. On the other hand, for θ₁ < 0, the CVF is negative, and the empirical coverages are higher than the nominal levels. This situation does not improve as n increases.
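The computations in Example 1 can be reproduced directly from (2.2), (2.3) and (2.5). The following Python sketch is ours, not the authors' code; the function names are hypothetical, and the lag-1 autocorrelation of the MA(1) process is ρ₁ = θ₁/(1 + θ₁²):

```python
import numpy as np
from scipy import stats

def cvf(n, a, sign=1.0):
    """CVF (2.2) of a'theta for the design x_i = 5(i-1)/(n-1),
    in the direction P1 = P(0.5*sign), an MA(1)-type matrix."""
    x = 5.0 * np.arange(n) / (n - 1)
    X = np.column_stack([np.ones(n), x])
    P1 = np.eye(n) + 0.5 * sign * (np.eye(n, k=1) + np.eye(n, k=-1))
    B = np.linalg.inv(X.T @ X)
    num = a @ B @ (X.T @ (P1 - np.eye(n)) @ X) @ B @ a
    return num / (a @ B @ a)

def coverage(n, a, theta1, alpha=0.10):
    """Approximate coverage (2.5) of interval (1.1) under MA(1) errors."""
    rho1 = theta1 / (1 + theta1**2)      # lag-1 autocorrelation of MA(1)
    v = abs(rho1) / 0.5                  # as in (2.3)
    c = cvf(n, a, np.sign(theta1))
    df = 0.6 * n
    t0 = stats.t.ppf(1 - alpha / 2, df)
    return 1 - 2 * stats.t.sf(t0 / np.sqrt(1 + v * c), df)

a = np.array([1.0, 1.0])                 # linear combination beta0 + beta1
print(round(cvf(10, a), 3))              # -> 0.806, the n = 10 entry of Table 1
print(round(coverage(100, a, 0.1), 3))   # close to the n = 100, theta1 = 0.1 entry of Table 2
```

For θ₁ < 0 the CVF is negative and the computed coverage exceeds the nominal level, matching the lower rows of Tables 2 and 3.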
Fig. 1. Simulated coverages of 90% confidence intervals based on uncorrelated errors, plotted against sample size n, when the errors are simulated from five processes: uncorrelated, and MA(1) with θ₁ = 0.1, 0.2, −0.1, −0.2.
In fact, as n → ∞ the empirical coverages converge to limits very different from the nominal levels.

Simulated coverages of confidence interval (1.1) for β₀ + β₁ are obtained to check the accuracy of the result in Theorem 2.1. The simulated coverage is the percentage of confidence intervals containing the true value of β₀ + β₁, based on 5000 simulation runs. Huber's M-estimator is applied, i.e. ψ(u) = min(c, max(u, −c)) with tuning constant c = 1.345 corresponding to 95% efficiency. More details on computing the M-estimate are given in Section 4. We take the true value β₀ + β₁ = 1 + 1 = 2. The results are plotted in Fig. 1
for 90% confidence intervals for various values of the MA(1) parameter. Similar results are obtained for 95% intervals and are omitted here. The values in Table 2 are consistent with Fig. 1. It is also very interesting to note that our results in Table 3 are very close to those in Scheffé (1959, p. 339) for the coverage of confidence intervals for a location estimate, where 95% confidence intervals and large sample sizes are considered. Therefore, the result in Theorem 2.1 provides a very good approximation to the coverage of the t-interval (1.1) based on uncorrelated errors. The change of variance function is a useful tool to assess the robustness of a confidence interval. The result in Theorem 2.1 does not depend on the function ρ used in the computation of θ̂. Thus the result holds for any M-estimator, and the problem shown in Fig. 1 is common to all M-estimators. It should be noted that the situation does not improve as n increases. As a special case, the least squares estimate also has this problem. Consequently, one needs to give attention to the construction of confidence intervals when errors may be correlated.
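The Huber M-estimate used above can be computed by iteratively reweighted least squares. The paper does not spell out its algorithm (its settings appear in Section 4), so the following is only a sketch under standard choices: ψ(u) = min(c, max(u, −c)) with c = 1.345, and scale σ̂ = MAD of the residuals; the function names are ours.

```python
import numpy as np

def mad(r):
    # MAD scale estimate: 1.4826 med_i |r_i - med_j r_j|
    return 1.4826 * np.median(np.abs(r - np.median(r)))

def huber_m_estimate(X, y, c=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimate of theta by iteratively reweighted least squares."""
    theta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares start
    for _ in range(max_iter):
        r = y - X @ theta
        u = r / mad(r)
        # IRLS weights w_i = psi(u_i)/u_i, with psi(u) = min(c, max(u, -c))
        w = np.where(np.abs(u) > c, c / np.abs(u), 1.0)
        WX = X * w[:, None]
        theta_new = np.linalg.solve(X.T @ WX, WX.T @ y)
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```

With c = 1.345 this ψ attains about 95% efficiency at the normal model, as noted above; re-estimating the scale inside the loop is one common variant among several.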
3. Confidence intervals for correlated errors

The key point in obtaining a reliable variance estimate using (2.1) is to find an accurate estimate of the autocorrelation matrix P from the residuals. When the sample size is small, this is difficult to do. Here we propose three methods to estimate the variance of the M-estimate when there is a possibility of correlation among the errors. One method gives an estimate of P, while the other two do not directly estimate P. Bootstrap techniques for dependent data and Lumley and Heagerty's (1999) weighted empirical adaptive variance estimators are incorporated in these methods. We now proceed with a detailed description of the procedures.

Method one (BP) is a bootstrap procedure for dependent data. The idea is to use the bootstrap to get a variance estimate for the parameter of interest. The error process is approximated with an AR model. Then, based on the estimated order and parameters of the AR model, we transform the residuals into white noise. Bootstrap resampling is applied to the white noise, and dependent errors are obtained from the same AR model. Here are the detailed steps.

B1. Let r_i = y_i − θ̂ᵀx_i, i = 1, …, n, be the residuals, and r̄ be the mean of the residuals.
B2. Center the residuals: r̃_i = r_i − r̄, i = 1, …, n.
B3. Estimate the order p and the AR parameters φ̂₁, …, φ̂_p from the centered residuals, r̃_i = φ₁r̃_{i−1} + ⋯ + φ_p r̃_{i−p} + u_i, i = p+1, …, n, where the u_i's are white noise.
B4. Compute the estimated white noise: û_i = r̃_i − φ̂₁r̃_{i−1} − ⋯ − φ̂_p r̃_{i−p}, i = p+1, …, n.
B5. Center the estimated white noise: ũ_i = û_i − (1/(n−p)) Σ_{j=p+1}^n û_j, i = p+1, …, n.
B6. For j = 1, …, k, construct bootstrap estimates θ̂ʲ as follows:
  (i) Sample n + 50 observations from the centered estimated white noise ũ_i, i = p+1, …, n, with replacement, and denote them u*_i, i = 1, …, n+50.
  (ii) Construct residuals from the u*_i based on the estimated AR model: r*_i = u*_i for i = 1, …, p, and r*_i = u*_i + φ̂₁r*_{i−1} + ⋯ + φ̂_p r*_{i−p} for i = p+1, …, n+50.
  (iii) Take the last n residuals as ε*_i = r*_{i+50}, i = 1, …, n, to obtain bootstrap observations on y, i.e. y*_i = θ̂ᵀx_i + ε*_i, i = 1, …, n.
  (iv) Compute the jth bootstrap estimate θ̂ʲ using the M-estimate based on the data (x_i, y*_i), i = 1, …, n.
B7. Estimate the covariance of θ̂ from the k bootstrap estimates θ̂ʲ, j = 1, …, k:

  COV̂(θ̂) = (1/(k−1)) Σ_{j=1}^k (θ̂ʲ − (1/k)Σ_{i=1}^k θ̂ⁱ)(θ̂ʲ − (1/k)Σ_{i=1}^k θ̂ⁱ)ᵀ.   (3.1)
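Steps B1–B7 might be sketched as follows. This is our code, not the authors'; the AR order is fixed at p = 1 here rather than selected from the data, and `m_estimate` stands for any routine returning the M-estimate:

```python
import numpy as np

def bp_cov(X, y, theta_hat, m_estimate, p=1, k=100, rng=None):
    """Bootstrap covariance estimate (3.1) via AR(p) model-based resampling."""
    rng = np.random.default_rng(rng)
    n = len(y)
    # B1-B2: centered residuals
    r = y - X @ theta_hat
    rt = r - r.mean()
    # B3: AR(p) coefficients by least squares on lagged residuals
    # (order selection, e.g. by AIC, is omitted in this sketch)
    Z = np.column_stack([rt[p - 1 - j:n - 1 - j] for j in range(p)])
    phi = np.linalg.lstsq(Z, rt[p:], rcond=None)[0]
    # B4-B5: centered estimated white noise
    u = rt[p:] - Z @ phi
    u -= u.mean()
    boot = []
    for _ in range(k):
        # B6(i): resample n + 50 innovations with replacement
        us = rng.choice(u, size=n + 50, replace=True)
        # B6(ii): rebuild an AR(p) residual series
        rs = np.zeros(n + 50)
        for i in range(n + 50):
            for j in range(p):
                if i - 1 - j >= 0:
                    rs[i] += phi[j] * rs[i - 1 - j]
            rs[i] += us[i]
        # B6(iii)-(iv): discard the first 50 values, form y* and re-estimate
        ys = X @ theta_hat + rs[50:]
        boot.append(m_estimate(X, ys))
    # B7: empirical covariance (3.1) of the k bootstrap estimates
    return np.cov(np.array(boot).T, ddof=1)
```

The 50 discarded values serve as a burn-in so the resampled residual series is approximately stationary, mirroring step B6(iii).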
It should be noted that we are using model-based resampling (see Davison and Hinkley, 1997, Section 8.2.2). An alternative would have been to use block resampling methods. However, simulation results in Smith and Field (1993) show that the performance of the block bootstrap is quite dependent on the window size, and we felt that the model-based method was the better choice. Bose (1988) showed that model-based resampling has good asymptotic properties in many cases. We also tried using the phase scrambling technique for bootstrapping (Davison and Hinkley, 1997, Section 8.2.4) and found the results were qualitatively quite similar to those from the model-based resampling.

The remaining two methods use the idea of testing for the presence of correlation among the errors and then either using the uncorrelated variance estimate or an estimate taking correlation into account.

Method two (TMA) is based on a testing procedure and an MA(1) approximation. In (2.1) we need an estimate of P to get the variance for correlated errors. For possibly weak correlation among the errors, we can write the autocorrelation matrix as

  P = I + M₁ + M₂ + ⋯,

where (M_k)_{ij} = ρ_k if |i − j| = k, and (M_k)_{ij} = 0 otherwise, for k = 1, 2, …, n−1. For very weak autocorrelation among the errors, we expect |ρ_k| to decrease as k increases, and we expect |ρ_k| to be quite small for all k. So it is reasonable to approximate P by I + M₁. In this way we only need to estimate the lag-1 autocorrelation ρ₁ to get P̂. On the other hand, I + M₁ represents the autocorrelation matrix of an MA(1) process. Thus P is estimated by

  P̂ = (p_{ij})_{n×n} with p_{ii} = 1 for i = 1, …, n; p_{ij} = ρ̂₁ for |i − j| = 1; p_{ij} = 0 otherwise,   (3.2)

where ρ̂₁ is an estimate of the lag-1 autocorrelation function of the errors. To estimate ρ₁, we propose the following two-step method.

Step 1 (Estimating). Apply the Durbin–Watson-type estimate to the residuals from the robust (M-estimate) regression, i.e.

  ρ̃₁ = Σ_{i=1}^{n−1} r_i r_{i+1} / Σ_{i=1}^n r_i².   (3.3)
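In code, the estimate (3.3) together with the pretest of Step 2 below might be sketched as follows (our naming; `alpha0` is the test level α₀):

```python
import numpy as np
from scipy import stats

def tma_rho1(r, alpha0=0.10):
    """Lag-1 autocorrelation estimate (3.3) with the pretest of Step 2;
    returns rho1_hat for the MA(1) matrix P_hat in (3.2)."""
    n = len(r)
    rho = np.sum(r[:-1] * r[1:]) / np.sum(r**2)   # (3.3)
    z0 = 0.5 * np.log((1 + rho) / (1 - rho))
    t = np.sqrt(n - 3) * (z0 - (3 * z0 + rho) / (4 * n))
    if 2 * stats.t.sf(abs(t), n - 2) > alpha0:    # cannot reject H0: rho1 = 0
        return 0.0
    return float(np.clip(rho, -0.5, 0.5))         # keep |rho1_hat| <= 0.5 (valid MA(1))
```

The returned ρ̂₁ is then placed on the first off-diagonal of P̂ in (3.2), which is substituted into (2.1) to give the TMA variance estimate.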
Step 2 (Testing). Test H₀: ρ₁ = 0 vs. H₁: ρ₁ ≠ 0 based on the estimate ρ̃₁ from Step 1. The test statistic is

  t = √(n−3) (z₀ − (3z₀ + ρ̃₁)/(4n)),

where z₀ = 0.5 log((1 + ρ̃₁)/(1 − ρ̃₁)). Under H₀, the test statistic has a t-distribution with n − 2 degrees of freedom. The level of significance for this test is α₀; the value of α₀ indicates the sensitivity of the test to autocorrelation. If we cannot reject H₀, we set ρ̂₁ = 0; otherwise we set ρ̂₁ = ρ̃₁. Also, for P̂ to be the autocorrelation matrix of an MA(1) process, we require |ρ̂₁| ≤ 0.5; if |ρ̂₁| > 0.5, we set ρ̂₁ = 0.5 sign(ρ̃₁).

Method three (TLH) is a modification of TMA; the modification is only in Step 2. If we reject H₀, method TLH applies a weighted empirical adaptive variance estimate of Lumley and Heagerty (1999). The weighted empirical adaptive variance estimate is based on modifying the usual 'sandwich' estimate of the asymptotic variance for estimates derived as the solution of estimating equations; Eq. (2.1) gives this estimate when the errors are correlated. The outer terms, which correspond to the mean of the derivative of the score function, are estimated by an empirical mean even for correlated data; we use the quantity I_n(θ̂) given below for the outer terms. The inner term involves the variance of the score function, and its estimation needs to take the correlation structure into account. We use the estimator proposed by Lumley and Heagerty (see their p. 461 for motivation), which proceeds by summing over correlated pairs in such a way that pairs which are close together receive higher weight and the weights
decrease as the distance between the observations increases. We use the quantity J_n(θ̂) given below as the estimate. The introduction of weights into the empirical estimate can be shown to include several methods in the literature, including quadratic spectral estimates, block or window resampling and jackknife methods. We have used a weight function derived from modified Bartlett weights of the form given below, where R(n) is suggested to be of order O(n^{1/3}). The variance estimate for an M-estimator is computed as follows (cf. Eq. (2.2) of Lumley and Heagerty, 1999):

  COV̂(θ̂) = (1/n) I_n⁻¹(θ̂) J_n(θ̂) I_n⁻¹(θ̂),

where

  I_n(θ̂) = (1/n) Σ_{i=1}^n ψ′(r_i/σ̂) x_i x_iᵀ,

  J_n(θ̂) = (σ̂²/n) Σ_{i,j=1}^n w_{ijn} w_n* ψ(r_i/σ̂) ψ(r_j/σ̂) x_i x_jᵀ,

  σ̂ = MAD(r_i),

  w_{ijn} = 1 − |i−j|/R(n) for |i−j| ≤ R(n), and w_{ijn} = 0 for |i−j| > R(n),
  w_n* = 1 − (1/n²) Σ_{i,j=1}^n w_{ijn}.
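A sketch of this variance estimate, following the formulas above (our code and naming; `R` is R(n), `sigma` the MAD scale, and ψ the Huber function with c = 1.345):

```python
import numpy as np

def tlh_cov(X, r, sigma, c=1.345, R=3):
    """Weighted empirical sandwich covariance for an M-estimator,
    COV(theta) = (1/n) I_n^{-1} J_n I_n^{-1}, with modified Bartlett weights."""
    n, q = X.shape
    u = r / sigma
    psi = np.clip(u, -c, c)                      # Huber psi
    dpsi = (np.abs(u) <= c).astype(float)        # psi'
    In = (X.T * dpsi) @ X / n                    # outer term I_n
    d = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    W = np.where(d <= R, 1.0 - d / R, 0.0)       # modified Bartlett weights w_ijn
    wstar = 1.0 - W.sum() / n**2                 # correction factor w_n*
    S = X * psi[:, None]                         # row i is psi(r_i/sigma) x_i^T
    Jn = sigma**2 * wstar * (S.T @ W @ S) / n    # inner term J_n
    In_inv = np.linalg.inv(In)
    return In_inv @ Jn @ In_inv / n
```

With R = R(n) of order n^{1/3}, the banded weight matrix keeps only pairs at most R(n) apart, which is how the estimator adapts to short-term dependence.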
The performance of the three proposed variance estimates is examined through a simulation study in the next section. Their empirical coverages are compared with those in Tables 2 and 3.
4. Simulation study and numerical results

A simulation study is conducted to compare the empirical coverages of confidence intervals for the three variance estimates proposed in Section 3. The simple linear regression model is considered throughout the simulation study, y_i = β₀ + β₁x_i + ε_i, with true value (β₀, β₁) = (1, 1), where the x_i are equally spaced on the interval [0, 5]. Huber's M-estimator with ψ(u) = min(c, max(u, −c)) and c = 1.345 is applied to calculate β̂₀ and β̂₁. The correction factor is approximated by K = 1 + (q/n)(1 − n₀)/n₀, where n₀ is the relative frequency of residuals satisfying −c < r_i/σ̂ < c. It is straightforward to show that n₀ is a moment estimate of E(ψ′) and n₀(1 − n₀) is a moment estimate of var(ψ′). The scale estimate is σ̂ = MAD(r_i). Confidence intervals are constructed for β₀ + β₁. In Step 2 of methods TMA and TLH, several values of the level of significance α₀ were tried, and we find that α₀ = 0.10 gives the best results. When bootstrap sampling is involved, the number of bootstrap runs conducted is k = 100. For method TLH, R(n) is set to 3 in the weights, which is approximately n^{1/3} for the range of n of interest. For each case, 1000 simulation runs are conducted to obtain empirical coverages. Coverages under two situations are studied: (i) the errors are uncorrelated, and (ii) the errors are weakly correlated with short-term dependence.

For uncorrelated errors, empirical coverages are obtained for each of the following error distributions: the standard normal N(0,1), the symmetric mixture normal 0.8N(0,1) + 0.2N(0,5²), and the asymmetric mixture normal 0.8N(0,1) + 0.2N(10,1). Both mixture models generate outliers. Table 4 presents the simulation results for 90% confidence intervals for the three variance estimates BP, TMA and TLH. The coverages for BP have large deviations from the nominal level.
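The correction factor K described above can be computed directly from the residuals. A small sketch (our naming):

```python
import numpy as np

def correction_K(r, sigma, q, c=1.345):
    """K = 1 + (q/n) var(psi')/(E psi')^2, approximated via n0, the relative
    frequency of residuals with |r_i/sigma| < c (Section 4)."""
    n = len(r)
    n0 = np.mean(np.abs(r / sigma) < c)
    return 1 + (q / n) * (1 - n0) / n0
```

The approximation works because ψ′ takes only the values 0 and 1 for the Huber ψ, so n₀ estimates E(ψ′) and n₀(1 − n₀) estimates var(ψ′).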
The coverages for TMA and TLH are all close to the nominal level, which indicates that the two estimates perform well for uncorrelated errors.

Table 4
Simulated coverages for three variance estimates for uncorrelated errors; sample sizes n = 20, 30; 1000 simulation runs for each case; nominal level 90%

Error                        n     BP      TMA     TLH     LH
N(0,1)                       20    0.864   0.896   0.888   0.857
                             30    0.869   0.880   0.889   0.869
0.8N(0,1) + 0.2N(0,5²)       20    0.841   0.885   0.905   0.872
                             30    0.855   0.884   0.881   0.880
0.8N(0,1) + 0.2N(10,1)       20    0.912   0.910   0.900   0.865
                             30    0.951   0.917   0.911   0.883
Fig. 2. Plots of simulated coverages of 90% and 95% confidence intervals vs. sample size n for method BP, with MA(1) and AR(1) errors (parameters 0.1, 0.2, −0.1, −0.2) in panels (a)–(d).
To see the effect of pretesting in TLH, we also obtain the empirical coverages for the Lumley and Heagerty (LH) method without testing in Table 4. It is clear that TLH has better coverages. The results with no correlation indicate that we do not pay a price by testing in the case of independence, even though we always run the risk of deciding there is independence when in fact there is none.

For correlated errors, since we are focusing on very weak correlation and short-term dependence, we consider the first-order moving average MA(1) and the first-order autoregressive AR(1) error processes. Coverages are simulated for MA(1) with four different parameter values ±0.1 and ±0.2, and likewise for AR(1). Both 90% and 95% confidence intervals are constructed for sample sizes n = 20, 40, 60 and 100, which allows us to check the asymptotic empirical coverages of the three methods. Results are presented in Figs. 2–4 for the standard normal N(0,1). Results for the normal mixtures
Fig. 3. Plots of simulated coverages of 90% and 95% confidence intervals vs. sample size n for method TMA, with MA(1) and AR(1) errors (parameters 0.1, 0.2, −0.1, −0.2) in panels (a)–(d).
are similar and are not reported. Those results show that the coverages for BP converge slowly to the nominal level, and all the coverages are lower than the nominal level; thus the variance is underestimated by BP. The coverages for TMA and TLH converge faster than those for BP. Almost all coverages for TMA are also lower than the nominal level, but not those for TLH. All three methods show improvement over the method based on uncorrelated errors when there is weak correlation among the errors. The improvement is especially significant for large sample sizes; see Figs. 1–4. Methods TMA and TLH also perform well for uncorrelated errors (Table 4). Therefore, we recommend TMA and TLH for constructing reliable confidence intervals allowing for the possibility of correlation among the errors. Interval (1.1) assuming uncorrelated errors performs poorly even if the correlation among the errors is very weak. It should be noted that we have not made explicit distributional assumptions for any of the methods. The TMA and TLH methods rely on approximating the error structure by an MA(1) process, while BP uses an AR(p) approximation.
Fig. 4. Plots of simulated coverages of 90% and 95% confidence intervals vs. sample size n for method TLH, with MA(1) and AR(1) errors (parameters 0.1, 0.2, −0.1, −0.2) in panels (a)–(d).
In the case of mild correlation, these give good approximations to the error structure.

We have considered a constant variance and a constant correlation parameter in the above simulations. We now consider the sensitivity of the methods if we allow the variance and correlation parameter to vary over time. One simulation is conducted for the following error model:

  ε_i = w_i + θ_{1,i} w_{i−1},  i = 1, …, n,

where the w_i's are white noise and the parameter θ_{1,i} is not constant. Three cases are considered for θ_{1,i}. Case I: θ_{1,1} = ⋯ = θ_{1,n/2} = 0.1 and θ_{1,n/2+1} = ⋯ = θ_{1,n} = 0.2. Case II: θ_{1,1} = ⋯ = θ_{1,n/2} = 0 and θ_{1,n/2+1} = ⋯ = θ_{1,n} = −0.1. Case III: θ_{1,i} = 0.1 + i/n, i = 1, …, n. Both the variance and the correlation parameter of ε_i vary over time.
Table 5
Simulated coverages for 90% confidence intervals for nonstationary error processes

Error model    n     TMA     TLH
Case I         30    0.883   0.878
               60    0.889   0.882
Case II        30    0.886   0.895
               60    0.895   0.902
Case III       30    0.872   0.871
               60    0.885   0.880
Empirical coverages of 90% confidence intervals for TMA and TLH are presented in Table 5. These results are consistent with those in Figs. 3(a) and 4(a).

5. Conclusion

From this study we can see that mild correlation among the errors in linear regression has a nonnegligible effect on the coverage of confidence intervals. Using the change of variance function, we can approximate the coverage of intervals based on uncorrelated errors. Among the three proposed methods, the two methods based on testing show improvement in the coverage, but there remains room for improvement for small and moderate sample sizes.

References

Bose, A., 1988. Edgeworth correction by bootstrap in autoregressions. Ann. Statist. 16, 1709–1722.
Davison, A.C., Hinkley, D.V., 1997. Bootstrap Methods and Their Application. Cambridge University Press, Cambridge.
Field, C., Wiens, D., 1994. One-step M-estimators in the linear model, with dependent errors. Canad. J. Statist. 22, 219–231.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A., 1986. Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
Huber, P.J., 1970. Studentizing robust estimates. In: Puri, M.L. (Ed.), Nonparametric Techniques in Statistical Inference. Cambridge University Press, Cambridge.
Huber, P.J., 1973. Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Statist. 1, 799–821.
Huber, P.J., 1981. Robust Statistics. Wiley, New York.
Lumley, T., Heagerty, P., 1999. Weighted empirical adaptive variance estimators for correlated data regression. J. Roy. Statist. Soc. B 61, 459–477.
Scheffé, H., 1959. The Analysis of Variance. Wiley, New York.
Smith, B., Field, C., 1993. Variance estimation for quadratic statistics. J. Time Ser. Anal. 14, 381–395.