COMPUTATIONAL STATISTICS & DATAANALYSIS ELSEVIER
Computational Statistics & Data Analysis 25 (1997) 43-53
A resampling method for regression models with serially correlated errors Jan Christoffersson* Swedish University of Agricultural Sciences, Department of Forest Resource Management and Geomatics, S-901 83 Umec~, Sweden Received December 1995; revised September 1996
Abstract
A method for using bootstrap in regression models with serially correlated errors to estimate the variance of the parameter vector is proposed. This method uses the real Fourier transform to eliminate the serial correlation of the errors. In the resulting heteroscedastic regression model, resampling of (yi, x~) pairs is performed. The conditional distribution of the bootstrap replications of the ordinary least-squares (OLS) estimator of the parameter vector is shown asymptotically to be close to the sampling distribution of the OLS estimator. The performance of the method is illustrated via simulation. @ 1997 Elsevier Science B.V.
Keywords: Resampling; Regression model; Serial correlation; Fourier transform AMS Classification: 62G09; 62J05; 60G10
1. Introduction This paper deals with bootstrapping of linear regression models where the random errors are serially correlated. Bootstrap, or resampling is a computer-intensive method to estimate the variance or distribution of an estimator of some parameter of interest. In this paper, the estimator which is bootstrapped is the OLS estimator of the parameter vector. Bootstrap has previously been applied to regression problems where the observations have been independent or consistent estimators of the covariance matrix have existed (see Freedman, 1981, 1984; Wu, 1986; Belyaev, 1995). Work has also been done for time series. In the time domain, resampling * Tel.: +46 90 16 66 46; Fax: +46 90 77 81 16; E-mail:
[email protected]. 0167-9473/97/$17.00 (~) 1997 Elsevier Science B.V. All rights reserved I'll S 0 1 6 7 - 9 4 7 3 ( 9 6 ) 0 0 0 8 1-3
44
J.
Christoffersson/Computational Statistics & Data Analysis 25 (1997) 43-53
of blocks (see Kiinsch, 1989; Politis and Romano, 1994), and resampling residuals from fitted autoregressive or moving-average models have been proposed. Methods for resampling in the frequency domain have been studied by Hurvich, et al. (1991), Franke and Hiirdle (1992), Nordgaard (1992) and Janas and Dahlhaus (1994). The method proposed in this paper uses the real Fourier transform to transform the model from a regression model with correlated, homoscedastic errors to a regression model with approximately uncorrelated, heteroscedastic errors. The heteroscedastic model can then be bootstrapped by resampling (yi, x~) pairs.
2. The basic regression model The model considered in this paper is
yt=x~fl+e,,
t=l,...,T.
(2.1)
The errors et are assumed to be a stationary stochastic process with expectation zero and autocovariance E(etet+s)=y~(Is[) and distributed independently of x~, the p × 1 vector of explanatory variables, xt is either a vector of fixed constants or a stationary stochastic vector. Both et and xt are assumed to be general linear processes of the form O<3
et =
~ U=
gut~t_u,
(2.2)
--OG
where the 6t are i.i.d, with E ( f t ) = 0 and E(64) < cx~ and satisfying
Igullul 1/= < U=
(2,3)
--O~
The corresponding representation and mixing condition for the multivariate xt series can be found in Hannan (1970). In matrix notation the model is
y = Xfl + e,
(2.4)
where y = (Yt) is a T x 1 vector and X = (x~) is a T × p matrix. The covariance matrix of e is written E(ee')----F where/" is such that the (i,j)th element is 7~(li-jl ). The OLS estimator of fl
(x'x)-'X'y is a consistent estimator of ft. However, the usual estimator of the variance of VOLS= a E ( x ' x ) - 1 , where
~2_
l_~(y,y_ fiX'y) T-p
(2.5)
J. Christoffersson/Computational Statistics & Data Analysis 25 (1997) 43-53
45
is a biased estimator of the variance of ~. If xt is fixed the variance of ~ is given by VOLS = ( X t X ) - I x t F x ( x t x )
-1.
(2.6)
This is also the conditional variance (conditional on X) of ~ when xt is stochastic. An explicit expression for the unconditional variance is not easily obtained. The variance of the asymptotic distribution of ~ is considered in Section 6.
3. Resampling regression models
Resampling regression models can be performed by sampling from the (Yt, X~) pairs to give the bootstrap sample ( y * , x * ' ) , (Y2* ,X2*' ),'- .,(Yr,Xr* *'). This sample is used to compute a bootstrap replication ~* as the OLS estimator based on the bootstrap sample
= (x*'x*
x * ' y *,
(3.1)
where X* is the T x p matrix with tth row x t, t and y , is the T × 1 vector with tth element y*. This is repeated B times and the distribution of the bootstrap replicates ~* - l~, j = 1,... ,B is used as an approximation of the distribution of i ~ - j0 where is the OLS estimator based on the original sample. The variance of ~ is estimated by B
v*-- 1 B-1
~ (ft, _ ~)(fi, _ fi),,
(3.2)
j=l
where ~* is the OLS estimate based on the jth bootstrap sample. Since the resampling process resamples x~, the distribution of fi* will mimic the unconditional distribution of fi, hence the unconditional variance is estimated. If the x~'s are fixed or the conditional variance is sought, resampling of residuals can be used as an alternative (see Freedman, 1981). However, resampling of residuals requires that the et's are homoscedastic. This is not necessary when resampling pairs. Whether the conditional or unconditional variances are to be estimated depends on the problem at hand. Asymptotically, they are equal but will generally differ for finite samples. If the et's are correlated these resampling schemes will result in biased estimators. A method for overcoming this difficulty is given in the following.
4. The real Fourier transform
Let Ut, t = 1 , . . . , T , be generated by a general linear process of the form (2.2) satisfying the mixing condition (2.3), with autocovariances 7u(h)= E(UtUt+h). This sample can be written in vector form as the T x 1 vector U = ( U t ) , with E ( U ) = 0
46
J. Christoffersson/ Computational Statistics & Data Analysis 25 (1997) 43-53
and V a r ( U ) = F. The real Fourier transform (RFT) is a transformation of the vector U which results in a vector Wv with approximately uncorrelated elements (see, for example, Harvey 1978). The RFT of the vector U is defined by wv = Z U ,
where the elements of the orthogonal matrix Z are given by zij = T -1/2,
i = 1,
zij=(2/T)l/2cos(7~i(jzij=(2/T)l/2sin(rt(izij=T-l/2(-1)
j+l,
1)/T),
i=2,4,6,...,T-
1 ) ( j - 1)/T), i=T
2 or T -
1,
i - - - - 3 , 5 , 7 , . . . , T - 1 or T
if T i s even, j = I , 2 , . . . , T .
The covariance matrix of Wv is then Var(wv ) = Z F Z '
~- 2reD,
where D is a diagonal matrix with the spectral densities of Ut defined as fu(),k) = 2-~1 ~
7 u ( h ) e -ih'~k,
21c = 2~k/T, k = O, . . . , T - 1
on the diagonal. The order in which the elements appear on the diagonal are given in Fuller (1976). The data are transformed by the RFT from a sequence of correlated variables U ( t ) , t = 1 . . . . , T with constant variance to a sequence of approximately uncorrelated random variables w v ( k ) , k = 1 , . . . , T which are heteroscedastic.
5. The use of the RFT in regression problems The property of the RFT to transform a sequence of correlated variables to a sequence of approximately uncorrelated variables can be very useful in regression problems. Applying the RFT to (2.4) yields wy = w:# +
(5.1)
where W~--ZX and E(w~w'~) ~= 2 n D where D contains the spectral densities of et on the diagonal. Note that if OLS is used to fit (2.4) the same estimator of 18 is obtained as when fitting (5.1) since ~ = ( Wxt Wx)-1Wxlwy = ( X I Z t Z X ) - I x t Z I Z y
= (XtX)-aXty.
This implies that the OLS estimator has two interpretations. One is as the OLS estimator of 18 in model (2.4) with correlated but homoscedastic errors and the second as the OLS estimator of 18 in the transformed model (5.1) with uncorrelated but heteroscedastic errors. A more efficient method for fitting (2.4) or (5.1) is generalized least squares (GLS). In the time domain the correlation structure of the et is modelled and F is estimated. In the frequency domain, on the other hand, the spectral density
J. Christoffersson/ Computational Statistics & Data Analysis 25 (1997) 43-53
47
function is estimated. These estimates can then be used for GLS estimation in the different domains. GLS estimation in the frequency domain is sometimes preferred when dealing with large data sets since only weak model assumptions have to be made. This is because f ( 2 ) may be estimated by non-parametric methods. To obtain a consistent estimator of the variance of 16 in the time domain the covariance matrix F must be estimated. In the frequency domain, we have the choice of either estimating f ( 2 ) or using a modified version of Whites (1980) estimator applied to the RFT data, yielding
= (
-1 P(
-1,
(5.2)
where V = T -l W~'XW~ and ,~ = diag(w2/(1-W'xi(W~Wx)-lwxi)) where Wei = wyi-W~xi~ is the ith RFT residual. This estimator is consistent even in the case of heteroscedastic errors implying that it can be used to estimate i~'s variance in the frequency domain. This estimator is similar to the weighted Jackknife proposed by Hinkley (1977).
6. Asymptotic justification of the bootstrap Freedman (1981) shows that the resampling of pairs method is valid in heteroscedastic linear models under the assumption of independence. In this section, the use of resampling of pairs in the approximately uncorrelated model induced by the RFT is justified. This is done by showing that the sampling distribution of the OLS estimator will be close to the distribution of the bootstrap replications of 1~. Consider first the asymptotic distribution of the OLS estimator in models with serially correlated errors. Let the model be (2.4) with stochastic x. The asymptotic distribution of the OLS estimator is given by v / T ( ~ - 16) ~ N(O, Q-1 VQ-~), where Q : Cxx(0), the covariance matrix of xt. The matrix V has (i,j)th element 2n f~_~ f ( 2 ) f i j ( 2 ) d 2 where fij(2) is the crosspectral density of xt, i and xt, j. When x is fixed the same limiting distribution is obtained if we let V = l i m r ~ T -1 X ' F X : ~h~=_o~ y~(h)cxij(h) where Cxij(h) is defined as
Cxij(h~,, = lim T---* ~
x - ~ T - - 1-- h 2_.t=0
Xt, iXt+ h,j
T
and Q = limT__.~T-1X'X; see Brillinger (1975) and Fuller (1976) for details and further references. Since x / - f ( ~ - 16) converge in distribution to a random variable with a N(0, Q-1 VQ-1 ) distribution it follows that sup
- 16 ) < z } -
,VQ , { z } l - - +
0,
z
where ~Q-1VQ-' is the distribution function of a N(0, vector.
Q-1 VQ-~)
distributed random
J. Christoffersson/ Computational Statistics & Data Analysis 25 (1997) 43-53
48
For the bootstrap, consider the resampling of (y~,x~) pairs in the RFT transformed model. For a given sample, the bootstrap samples are drawn from the population Wy i :
W;i ~ ' ~ Wei,
i = 1,..., T
where fl is the OLS estimate and Wei the ith RFT residual. Because of this relation the resampling of (Wyi, W'xi) pairs is equivalent to the resampling of (W'xi,Wei) pairs. Denoting the ith resampled pair by (w*',w*), one of the simple consequences of OLS estimation and the resampling procedure is that w*' is a random vector and E(w*'w*) = O, E[w*w*'] = T - 1 WxtWx : 0 a n d E[(we*i)2w~.Wx*i]= V. The justification for the resampling of pairs method is as follows: Theorem 1. Under the assumptions of model (2.4), with the additional assumption
that the ~t 9eneratin9 the xt series have finite sixth moment, the conditional distribution of x/-T(fl* - f l ) is close to the true samplin9 distribution of x/-T(fl- [J) in the sense that sup I P { v ~ ( f l * - ~) < x I ( y , X ) } - P { x / T ( f l -
fl) <
x}l--~ 0
x
in probability as T ~ c~. Proof. Write the bootstrap estimator of fl as :
~x
,
Wy : fl'-~-
Wx (Tx,)
-,
Wx*'W~
Using a generalization of the Berry-Essern theorem to R p by Bhattacharya (1975) we obtain sup IP{T -1/2"''*' ~x we* < z I ( y , X ) } - ~ { z } l z
< -C(p)~
--
T1/2
'
(6.1)
where C(p) is a universal constant depending only on p, f~ = Ellwxi*' we~ll * 3, and II-II denotes the Euclidean norm. With the assumption that bt have finite sixth moment it can be shown that fi is bounded in probability. Multiplying T-1/2 Wx,'W, from the left by (Wx*' W~*/T) -1 and conditionally on ( y , X ) we find that p lim(Wx*' Wx*/T) -1 = and hence
sup IP{v~(~*- ~) < z l ( y , X ) } -
~6_1~6_,{z}1-,
0
(6.2)
Z
in probability. Since p lim Q = Q and p lim V : V it follows that (6.2) is valid also if Q-1VQ-~ is replaced by Q-1VQ. An application of the triangle inequality yields the result. [] Given this result it seems reasonable to use the distribution of the bootstrap^replications of x/-T(fl* - fl) as an approximation to the true distribution of x/-T(fl - fl) even for moderately large data sets.
J. Christofferssonl Computational Statistics & Data Analysis 25 (1997) 43-53
49
7. Simulation study A small simulation study was conducted to illustrate the performance of the proposed resampling method. Data are generated from the model Yt = flxt + et,
t = l,...,T
The simulations were performed with a stochastic x variable, i.e. a new x-vector was simulated for each new replicate. Both et and xt are generated as AR(1) processes with E(xt) = E ( g t ) = O. The first observation in each series, 81 and Xl are simulated as N(0, 1/(1 - p ~ ) ) . For t ----2 , . . . , T , et are generated by et = pegt_l-~-ut and the xt by x, = pxXt-1 + yr. The ut are i.i.d. N(0, 1 ) and the vt are i.i.d. N(0, (1 - px2)/(1 - Pc)).2 fl is set to 1 and different combinations of p, and Px are used. The reason for choosing these specific variances for xt is that for a fixed r, the expected values of the coefficient of determination (R 2) are approximately equal for all combinations of p~ and Px. For this case with fl = 1, E ( R 2) ~ 0.5. The lengths of the series are set to T -- 16, 32, 64, 128, 256, 512 and 1024. 5000 replications are simulated for each sample size. The number of bootstrap samples used in each replication is B = 500. The variance estimators studied are the resampling of pairs variance estimator, v* (3.2) and Whites (5.2) variance estimator, v'z¢. These estimators are compared in terms of relative bias. The variance of fl is determined in a separate simulation study with 100000 replicates. The sample variance of these estimates, denoted OVoLs, is used as an estimate of the variance of/~. The averages over the 5000 replicates of vw and v*, denoted o~w, and or*, respectively, are used to calculate the relative bias as rb'~w = (vw - OVOLs)/OVOLS and analogously for the other estimators yielding rbv*. The standard errors for rb~w and rbv* are not reported in the figures. These ranged from 2% in the shorter series to 0.3% in the longer ones. The efficiency of the bootstrap with respect to Whites variance estimator, defined as the quotient between the mse of o~w and or*, ranged between 0.6 and 1.45 depending on Px, P~ and T. The efficiency being higher for shorter series and large correlation in the error terms. Correlation among the errors, especially for small samples, will carry over to the transformed errors to some extent. This causes the variance estimators to be negatively biased. The effects of correlation between the x's for vw is that higher correlation causes more bias. For v*, some interaction effects seem to be present. If p, is high, v* will become more negatively biased, as Px increases. If p~ is low, v* becomes more positively biased, as Px increases. A plausible explanation is that high correlation in x results in a very peaked spectral density for the Wxi'S. This means that some of the Wxi'S will have a large variance resulting in realizations with a few high leverage points. Whether these observations are included in the bootstrap samples or not has a large influence on v*. Samples not including the high leverage points result in ~*'s with higher variances than the samples including these points. Since the original sample contains these points, the variance estimator will be positively biased. As a whole, Whites estimator has most bias for large p~ and Px. The bootstrap estimator has its largest bias for high Px. The relative biases for
50
J. Christoffersson I Computational Statistics & Data Analysis 25 (1997) 43-53
0,4-
0,4 0,2
0,2 0 ),8
-05
-0,:
0,8 5
Px
Px P~
0,8
U,,L~
b
P~
0,8
Fig. 1. Observed relative bias of (a) v*, T = 32; (b) v'rv, T = 32.
0,05-
0,05 0,03 0,01 -0,01 -0,0~
0,8
-0,0,~
43,0,~
5 )1
a
P~
0,8
Px
v,v
b
Pe
0,8
Fig. 2. Observed relative bias o f (a) v*, T = 512; (b) vrv, T = 512.
the different correlations at samplesize T = 32 and T = 512 are shown in Figs. 1 and 2. Two types of confidence intervals f o r / / w e r e also compared. These were an interval based on Fw and the normal or t distribution (WCI) and the bootstrap percentile confidence interval (BPCI). The BPCI is constructed by using the ~/2 and (1 - ~ / 2 ) percentiles of the bootstrap distribution as endpoints of the confidence interval. The comparison is made with respect to confidence levels. The choice of the bootstrap percentile confidence interval is made because of its ease of computation and the belief that these confidence intervals are often used by practitioners. For details and examples of more efficient confidence intervals, see Efron and Tibshirani (1993) and references therein. Figs. 3 and 4 report the observed confidence levels and the average length of the intervals for the 5000 replications. The nominal confidence level is 95% and the standard errors of the observed confidence levels are about 0.3%. Both BPCI and WCI seem to perform well. The intervals tend to be somewhat short for short series but as the length of a series increases, the observed confidence levels
J. Christoffersson/ Computational Statistics & Data Analysis 25 (1997) 43-53
51
0,95-
0,95
0,93
0,93
0,91
0,91 0,8c;
0,89 0,87
0,8; 0,8~
],2
0,2
0,8~
Px
Px p~
0,8
b
P~
0,8
Fig. 3. Observed confidence level of (a) BPCI, T -- 32; (b) WCI, T = 32
0,955 0,950,945 0,94 0,93~ ],2
0,9"
3,2
Px P~
0.8
Px
b
P,~
0,8
Fig. 4. Observed confidence level of (a) BPCI, T = 512; (b) WCI, T = 512
come closer to the nominal level. For small samples, the BPCI has an advantage over WCI when the x t ' s have a high correlation. In this case the BPCIs observed confidence levels are closer to the nominal and have a shorter average length. This probably reflects the fact that the distribution is skewed and that these intervals are not forced to be symmetric like the WCIs are. Note that the unconditional distribution of ~ does not have to be normal even if, as in this simulation study, both xt and et are normally distributed.
8. Discussion The main advantage of the proposed resampling method when compared to other resampling methods for time series is its simplicity and the sparseness of model assumptions which have to be made. No modelling of correlation structures or spectral densities are necessary. The drawback when using the method for regression analysis
52
J. Christoffersson/Computational Statistics & Data Analysis 25 (1997) 43-53
is that it is limited to resampling of pairs. When working with fixed x, or when the analysis is conditional on x, bootstrapping of residuals is usually preferred. However, since the asymptotic distribution of ~ is the same irrespective of how x is treated, resampling of pairs may just as well be used for large samples. A method for bootstrapping time series in the frequency domain has previously been given by Hurvich et al. (1991). They estimated the spectral density function and transformed the data to approximately i.i.d, observations which were resampled and retransformed. Their method could be used for regression analysis and would have the advantage that residuals could be resampled. On the other hand, estimation of the spectral density function would require a large data set to be satisfactory. A large data set implies that the unconditional and conditional distributions are very similar. The need to use resampling of residuals to estimate the conditional variance would then be small since the unconditional distribution could be estimated by resampling pairs. To compare the bootstrap variance estimator to the conventional variance estimator might not seem to be a very interesting comparison since it is known that ~OLS is biased in the presence of serial correlation. There are also better estimators than when serial correlation is present as mentioned above (GLS). There are however, reasons why this comparison is important. If the assumptions about the form of the correlation structure are false, the estimated weights used in the GLS estimation will not result in uncorrelated errors. In this case GLS estimation is just an OLS estimation in a regression model with serially correlated errors and the associated variance estimator is the conventional variance estimator in the same serially correlated model. Another important area of application lies in the construction of confidence intervals. The bootstrap percentile confidence interval is not optimal. As seen from the simulation study, the small sample properties were no better than those obtained by using Whites estimator. More refined techniques for confidence interval construction could be used.
References Belyaev, Y.K., Bootstrap, Resampling and Mallows Metric. Lecture notes, No. 1, 1995. (Institute of Mathematical Statistics, Ume~ University). Brillinger, D.R., Time series: data analysis and theory (Holt, Rinehart and Winston, Inc., New York). Dahlhaus, R. and D. Janas, A frequency domain bootstrap for ratio statistics in time series analysis (Institut fiir Angewandte Mathematik, Universit~it Heidelberg 1995). Efron, B and R.J. Tibshirani, An introduction to the bootstrap (Chapman & Hall, New York, 1993). Franke, J. and W. H/irdle, On bootstrapping kernel spectral estimates, Ann. Statist., 20 (1992) 121-145. Freedman, D.A., Bootstrapping regression models, Ann. Statist., 9 (1981) 1218-1228. Freedman, D.A., On bootstrapping two-stage least-squares estimates in stationary linear models, Ann. Statist., 12 (1984) 827-842. Fuller, W.A., Introduction to statistical time series (Wiley, New York, 1976). Hannan, E.J., Multiple time series (Wiley, New York, 1970). Harvey, A.C., Linear regression in the frequency domain, lnt. Econ. Rev., 19 (1978) 507-512. Hinkley, D.V., Jackknifing in unbalanced situations, Technometrics, 19 (1977) 285-292. Hurvich, C.M., J.S. Simonov and S.L. Zeger, Variance estimation for sample autocovariances: direct and resampling approaches, Austral. J. Statist., 33 (1991) 23-42.
J. Christoffersson/ Computational Stat&tics & Data Analysis 25 (1997) 43-53
53
Kiinsch, H.R., The Jackknife and the bootstrap for general stationary observations, Ann. Statist., 17 (1989) 1217-1241. Nordgaard, A., Resampling stochastic processes using a bootstrap approach, in: K.-H., J6ckel, G. Rohde and W. Sendler (Eds.), Bootstrapping and related techniques, Lecture Notes in Economics and Mathematical Systems, Vol. 376 (Springer, Berlin, 1992). Politis, N. and J.P. Romano, The stationary bootstrap, JASA, 89 (1994) 1303-1313. White, H., A heteroscedacity-consistent covariance matrix estimator and a direct test for heteroscedacity, Econometrica, 48 (1980) 817-838. Wu, C.F.J., Jackknife, bootstrap and other resampling methods in regression analysis, Ann. Statist., 14 (1986) 1261-1295.