International Journal of Forecasting 31 (2015) 587–597
Testing for multiple-period predictability between serially dependent time series Chris Heaton Macquarie University, Department of Economics, Balaclava Road, North Ryde, New South Wales 2109, Australia
Keywords: Covariance estimation; Causality; Statistical tests; Inflation forecasting; Macroeconomic forecasting
Abstract

This paper reports the results of a simulation study that considers the finite-sample performances of a range of approaches for testing multiple-period predictability between two potentially serially correlated time series. In many empirically relevant situations, but not all, most of the test statistics considered are significantly oversized. In contrast, both an analytical approach proposed in this paper and a bootstrap are found to have accurate empirical sizes. In a small number of cases, the bootstrap is found to have superior power. The test procedures considered are applied to an empirical analysis of the predictive power of a Phillips curve model during the 'great moderation' period, which illustrates the practical importance of using test statistics with accurate empirical sizes. © 2015 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
1. Introduction

Tests of multiple-period-ahead predictability are complicated by the fact that the prediction errors necessarily have a moving average structure, due to the overlap of successive prediction periods. In a theoretically ideal large sample setting, this does not present any problem for applied researchers, since there exist many covariance matrix estimators that are consistent in the presence of this type of autocorrelation. However, for samples of the sorts of magnitudes that are encountered most frequently in economics, test statistics constructed using many of the well-known alternative heteroscedasticity and autocorrelation consistent (HAC) covariance estimators may be severely oversized. While this issue has received some attention in the literature (see Ang & Bekaert, 2007; Hodrick, 1992; Kilian, 1999; Nelson & Kim, 1993; Richardson & Smith, 1991; Smith & Yadav, 1996; Wei & Wright, 2009), the context has often been the prediction of asset returns, and so the null model has usually been a martingale difference sequence
E-mail address: chris.heaton@mq.edu.au.
(MDS). Consequently, while this body of literature has found evidence of significant size distortions when using well-known techniques for dealing with autocorrelation, and has suggested some superior methods, these methods are not usually directly applicable to cases in which the predicted variable is serially correlated under the null, as would be expected for macroeconomic series and many other applications of interest. Previous work that has considered a serially correlated predicted variable includes that of Lütkepohl and Burda (1997), who consider the Wald test in the context of a vector autoregression (VAR); Dufour, Pelletier, and Renault (2006), who use a parametric bootstrap to circumvent the technical difficulties of the Wald test; Pesaran, Pick, and Timmermann (2011), who pool non-overlapping regressions and propose a SURE estimator for a factor-augmented VAR; and Britten-Jones, Neuberger, and Nolte (2011), who propose a transformation to account for the serial correlation induced by the construction of the overlapping dependent variable, and deal with serial correlation in the variable from which it is constructed using the Newey–West estimator. In this paper, the covariance estimator that was proposed by Hodrick (1992) for the multiple-period prediction
http://dx.doi.org/10.1016/j.ijforecast.2014.09.004
regression of a variable that is an MDS under the null is generalized to cover cases in which the predicted variable is serially correlated. A simulation study is then conducted that compares the small-sample performance of a test for multiple-period predictability based on this estimator to those of a selection of other approaches that might be considered by applied researchers. The simulation study considers a range of prediction horizons, sample sizes, and degrees of serial correlation in both the predictor and predicted variables, and the results provide clear guidance for researchers who are interested in testing multiple-period predictability. West (1997) also proposed a generalization of the Hodrick (1992) estimator for dealing with regression models with moving average errors. However, his approach differs from that proposed in this paper. Furthermore, the simulation study reported in West’s paper considers only moving average orders of one and two, which is of little interest for applications in which predictions are being made more than three periods ahead.1 Other authors have reported simulation studies that consider longer horizons for test statistics based on different estimators. Simulation studies by Ang and Bekaert (2007), Britten-Jones et al. (2011), Dufour et al. (2006), and Smith and Yadav (1996) found the kernel-based estimators of Andrews (1991) and Newey and West (1987) to provide test statistics for multiple-period predictability that are significantly oversized. Smith and Yadav (1996) found that the statistics were still oversized when the prewhitening procedure of Andrews and Monahan (1992) was used. Similar results were found for the Hansen and Hodrick (1980) estimator by Ang and Bekaert (2007) and Smith and Yadav (1996). In contrast, Dufour et al. (2006) and Kilian (1999) found that a bootstrapped statistic may have an accurate empirical size, and Ang and Bekaert (2007) found a similar result for the method of Hodrick (1992). 
The simulation study reported in this paper extends this literature in several ways. Firstly, in contrast to the studies by Ang and Bekaert (2007) and Smith and Yadav (1996), this study considers serial correlation in the predicted variable, and, in contrast to Britten-Jones et al. (2011) and Dufour et al. (2006), a range of different strengths of this correlation are considered. Like Smith and Yadav (1996), I consider different strengths of the serial correlation in the predictor variable. Secondly, the present study considers a wider range of prediction horizons and sample sizes than has been considered in previous studies. Thirdly, this study considers nine different alternative test procedures, in contrast to the studies by Ang and Bekaert (2007), Dufour et al. (2006) and Smith and Yadav (1996), which consider seven, three and two, respectively. As a consequence, it provides a comparison of a wide range of test statistics within a single study design. The remainder of this paper is structured as follows. In Section 2, the generalization of the Hodrick (1992) covariance estimator is derived, and compared to the alternative generalization due to West (1997). In Section 3, the Monte
1 As is well-known, and as Section 2 shows, an h-step-ahead prediction regression has an error that follows an MA(h − 1) process.
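The footnote's claim is easy to check numerically: overlapping h-period sums of a serially uncorrelated innovation sequence have autocorrelation (h − k)/h at lag k, which cuts off at lag h − 1. A minimal sketch follows (not part of the paper; the paper's simulations were written in R, but Python is used here for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
h, T = 4, 200_000
eps = rng.standard_normal(T)

# h-step-ahead prediction error under an MDS null: the sum of the next h
# innovations. Overlapping windows share h - 1 innovations, so successive
# errors follow an MA(h - 1) process.
err = np.convolve(eps, np.ones(h), mode="valid")

def autocorr(x, lag):
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

# The theoretical autocorrelation at lag k is (h - k) / h for k < h, and
# zero at lag h and beyond.
acs = [autocorr(err, lag) for lag in range(1, h + 1)]
```

With h = 4, the estimated autocorrelations are close to 0.75, 0.50, 0.25 and 0.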
Carlo simulations are presented. Section 4 presents a brief application of all of the test procedures considered to the question of whether an expectations-augmented Phillips curve model was able to predict inflation over horizons of four to 12 quarters during the 'great moderation' period between the 1980s and the start of the financial crisis in 2008. Section 5 provides some concluding comments.

2. The estimator and test statistic

Suppose that we wish to test the null hypothesis that a vector w_t does not predict the change in a scalar variable y_t over h time periods. Let y_t^{(1)} = y_{t+1} - y_t be an observable variable. The change in y_t over h time periods is then y_t^{(h)} = \sum_{k=0}^{h-1} y_{t+k}^{(1)}. It is assumed that, under the null hypothesis, y_t^{(1)} may be approximated well by a stable, finite-ordered autoregression

y_t^{(1)} = \beta_0 + \sum_{j=1}^{p} \beta_j y_{t-j}^{(1)} + \varepsilon_{t+1},   t = p + 1, \ldots, T - 1.   (1)
The technical assumptions for what follows are that (under the null hypothesis) E(\varepsilon_{t+1} | \mathcal{F}_t) = 0, where \mathcal{F}_t = \sigma(\varepsilon_t, w_t, \varepsilon_{t-1}, w_{t-1}, \ldots); that the fourth moments of \varepsilon_t and w_t are finite; and that the characteristic roots of Eq. (1) lie outside the unit circle. Autoregressions are used widely in applied prediction problems, often quite successfully. Nonetheless, not all economic variables can be represented adequately by an autoregression. The working assumption made in this paper is that a stable, finite-order autoregression approximates the process of interest sufficiently well to provide errors that, at most, differ from an MDS only negligibly. Standard statistical tools exist for estimating the order and assessing the fit of an autoregression, and these should be applied prior to the application of the procedures proposed in this paper.

What follows may be understood more easily by first considering a simple special case. Set the order (p) in Eq. (1) to one, set \beta_0 = 0, and set the prediction horizon h to two. Since y_t^{(2)} = y_{t+1}^{(1)} + y_t^{(1)}, it is simple to show that

y_t^{(2)} = b y_{t-1}^{(1)} + \eta_t, \quad \text{where } \eta_t = (\beta + 1)\varepsilon_{t+1} + \varepsilon_{t+2} \text{ and } b = \beta(1 + \beta).   (2)

Let \hat{b} be the OLS estimator of b. Then

T(\hat{b} - b)^2 = S_T \left( \frac{1}{T} \sum_{t=2}^{T-2} y_{t-1}^{(1)2} \right)^{-2}, \quad \text{where } S_T = \frac{1}{T} \sum_{t=2}^{T-2} \sum_{s=2}^{T-2} \eta_t \eta_s y_{t-1}^{(1)} y_{s-1}^{(1)},   (3)
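The population slope in Eq. (2) can be verified by simulation. The following sketch (illustrative only; the variable names are my own, not the paper's) simulates the AR(1) null and confirms that the OLS slope of the two-period change on the lagged one-period change converges to b = β(1 + β):

```python
import numpy as np

rng = np.random.default_rng(1)
beta, T = 0.6, 200_000

# Simulate the one-period change y^(1) as a stable AR(1) driven by an MDS.
eps = rng.standard_normal(T)
y1 = np.zeros(T)
for t in range(1, T):
    y1[t] = beta * y1[t - 1] + eps[t]

# Two-period change y_t^(2) = y^(1)_{t+1} + y^(1)_t, regressed on y^(1)_{t-1}.
y2 = y1[2:] + y1[1:-1]
x = y1[:-2]
b_hat = (x @ y2) / (x @ x)
# The OLS slope converges to b = beta * (1 + beta) = 0.96 here.
```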
and the estimation of the variance of \hat{b} requires the construction of an estimator of E(S_T). This task is the main topic of the present paper. Popular approaches include the kernel-based estimators of Andrews (1991), Andrews and Monahan (1992) and Newey and West (1987). The new approach proposed here is related to, but distinct from, the approach taken by West (1997). Let

g_t = \varepsilon_{t+1}\left( (\beta + 1) y_{t-1}^{(1)} + y_{t-2}^{(1)} \right)

and e(2) = (\beta + 1)\varepsilon_3 y_1^{(1)} + \varepsilon_T y_{T-3}^{(1)}. Note that

\sum_{t=2}^{T-2} \eta_t y_{t-1}^{(1)} = \sum_{t=2}^{T-2} y_{t-1}^{(1)}\left( (\beta + 1)\varepsilon_{t+1} + \varepsilon_{t+2} \right) = \sum_{t=3}^{T-2} g_t + e(2).   (4)

Since, under the previously stated assumptions, \varepsilon_t is an MDS, it follows that

E\left( \frac{1}{T} S_T \right) = \frac{1}{T} \sum_{t=3}^{T-2} E\left( g_t g_t' \right) + O(T^{-1}).   (5)

Since the O(T^{-1}) term, which is due to the sum of the endpoint terms e(2), is asymptotically smaller in magnitude than the estimation error under standard assumptions, it is of no consequence in large samples, and can be ignored in the construction of an estimator of E(S_T). Note that Eq. (5) is similar to Equation 2.2 of West (1997). However, West assumed covariance stationarity in his derivation, a restriction that is clearly not necessary and may be undesirable for applications in finance and macroeconomics. Eq. (1) may be estimated by OLS in order to yield an estimator \hat{\beta} of the autoregressive parameter and the residuals \hat{\varepsilon}_t. From these, \hat{g}_t = \hat{\varepsilon}_{t+1}((\hat{\beta} + 1) y_{t-1}^{(1)} + y_{t-2}^{(1)}) may be calculated and the estimator of E(\frac{1}{T} S_T) constructed as

\hat{E}\left( \frac{1}{T} S_T \right) = \frac{1}{T} \sum_{t=3}^{T-2} \hat{g}_t \hat{g}_t'.   (6)

Now consider the general case with the inclusion of a predictor variable w_t. Writing Eq. (1) in companion matrix form,2 recursive substitution yields

y_t^{(h)} = c + b_1' z_t + \eta_t,   t = p + 1, \ldots, T - h,   (7)

where c and b_1 are functions of the parameters of Eq. (1)3 and z_t = (y_{t-1}^{(1)} \; y_{t-2}^{(1)} \; \cdots \; y_{t-p}^{(1)})'. Of particular interest is the form of the error term \eta_t = \sum_{k=0}^{h-1} \sum_{j=0}^{k} e_1' A^{k-j} e_1 \varepsilon_{t+j+1}, where A is the companion matrix of Eq. (1). Changing the order of the summation, it may be shown that

\eta_t = \sum_{j=1}^{h} \theta_j \varepsilon_{t+j}, \quad \text{where } \theta_j = \sum_{k=j}^{h} e_1' A^{k-j} e_1.   (8)

Thus, as is well-known, the h-step-ahead regression model has MA(h - 1) errors. Of particular interest for what follows is the fact that, if the 1-step-ahead variable has an autoregressive representation, then the moving average parameters are known functions of the autoregressive parameters, and may therefore be estimated by constructing the appropriate functions of the OLS estimator of the autoregression. Including the predictor variable w_t, Eq. (7) may be rewritten as

y_t^{(h)} = b' x_t + \nu_t,   (9)

where x_t = (1 \; z_t' \; w_t)' and b = (c \; b_1' \; b_0)'. Under the null hypothesis that w_t does not predict the h-period change in y_t, b_0 = 0 and \nu_t = \eta_t. Since E(\eta_t | x_t) = 0 under the null hypothesis, the parameter vector b may be estimated consistently by the ordinary least squares (OLS) method. The estimation of the covariance matrix of \hat{b} under H_0 requires the estimation of E(S_T), where

S_T = \frac{1}{T} \sum_{t=p+1}^{T-h} \sum_{s=p+1}^{T-h} \eta_t x_t \eta_s x_s'.   (10)

It follows from Eq. (8) that

\sum_{t=p+1}^{T-h} \eta_t x_t' = \sum_{t=p+h+2}^{T-h} \sum_{j=1}^{h} \theta_j \varepsilon_t x_{t-j}' + e(h),   (11)

where e(h) is the sum of h(h - 1) endpoint terms. Defining

g_t = \varepsilon_t \sum_{j=1}^{h} \theta_j x_{t-j},   (12)

it follows from the assumptions stated earlier that

E\left( \frac{1}{T} S_T \right) = \frac{1}{T} \sum_{t=p+h+2}^{T-h} E\left( g_t g_t' \right) + O(T^{-1}).   (13)

The West (1997) approach to covariance estimation involves, first, estimating Eq. (7) to produce a sequence of residuals \hat{\eta}_t; second, estimating the parameters and errors in the MA model \hat{\eta}_t = \sum_{j=1}^{h} \theta_j \varepsilon_{t+j}; third, using these estimates \hat{\theta}_j and \hat{\varepsilon}_t to construct \tilde{g}_t = \hat{\varepsilon}_t \sum_{j=1}^{h} \hat{\theta}_j x_{t-j}; and finally, constructing the estimator \hat{E}(\frac{1}{T} S_T) = \frac{1}{T} \sum_{t=p+h+2}^{T-h} \tilde{g}_t \tilde{g}_t'. In contrast, the approach proposed in this paper involves exploiting the known relationship between the autoregressive parameters of Eq. (1) and the moving average parameters of Eq. (7) given by Eq. (8). Specifically, the proposed method is as follows:

1. Estimate y_t^{(h)} = b' x_t + \nu_t using OLS to yield \hat{b}.
2. Estimate y_t^{(1)} = \beta_0 + \sum_{j=1}^{p} \beta_j y_{t-j}^{(1)} + \varepsilon_{t+1} by OLS. Denote the estimators of the parameters as \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p, and the residuals as \hat{\varepsilon}_t.
3. Construct the estimated companion matrix

\hat{A} = \begin{pmatrix} \hat{\beta}_1 & \hat{\beta}_2 & \cdots & \hat{\beta}_p \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix}.

4. Construct the estimators \hat{\theta}_j = \sum_{k=j}^{h} e_1' \hat{A}^{k-j} e_1 for j = 1, \ldots, h, where e_1 is a p × 1 vector of zeros, with a one in the first element. Note that \hat{\theta}_j is the top left element of \sum_{k=j}^{h} \hat{A}^{k-j}.
5. Construct \hat{g}_t = \hat{\varepsilon}_t \sum_{j=1}^{h} \hat{\theta}_j x_{t-j}.
6. Construct the estimator of \frac{1}{T} S_T,

\hat{E}\left( \frac{1}{T} S_T \right) = \frac{1}{T} \sum_{t=p+h+2}^{T-h} \hat{g}_t \hat{g}_t'.   (14)

2 See Davidson (2000, p. 68).
3 Similar calculations (with more detail) are provided by Dufour et al. (2006).
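The six steps above can be sketched in code. This is a simplified illustration rather than the author's implementation (which was written in R); the function name and the endpoint and timing conventions are assumptions of the sketch:

```python
import numpy as np

def hodrick_generalised_cov(y1, w, h, p):
    """Sketch of the six-step estimator of E(S_T / T) from Section 2.

    y1: one-period changes y^(1); w: predictor; h: horizon; p: AR order.
    The endpoint and residual-timing conventions are simplifying assumptions
    of this sketch, not the paper's exact implementation.
    """
    T = len(y1)

    # Step 1: OLS of y^(h) on x_t = (1, y^(1)_{t-1}, ..., y^(1)_{t-p}, w_t)'.
    yh = np.array([y1[t:t + h].sum() for t in range(T - h + 1)])
    ts = np.arange(p, T - h + 1)
    X = np.array([np.r_[1.0, y1[t - p:t][::-1], w[t]] for t in ts])
    b_hat = np.linalg.lstsq(X, yh[ts], rcond=None)[0]

    # Step 2: OLS autoregression of order p for y^(1); residuals eps_hat.
    Z = np.column_stack([np.ones(T - p)] + [y1[p - j:T - j] for j in range(1, p + 1)])
    beta_hat = np.linalg.lstsq(Z, y1[p:], rcond=None)[0]
    eps_hat = np.r_[np.zeros(p), y1[p:] - Z @ beta_hat]  # zero-padded start

    # Step 3: estimated p x p companion matrix A_hat.
    A_hat = np.zeros((p, p))
    A_hat[0, :] = beta_hat[1:]
    A_hat[1:, :-1] = np.eye(p - 1)

    # Step 4: theta_hat_j = sum_{k=j}^{h} e1' A_hat^{k-j} e1 for j = 1, ..., h,
    # i.e. the top-left element of sum_k A_hat^{k-j}.
    top_left = [np.linalg.matrix_power(A_hat, m)[0, 0] for m in range(h + 1)]
    theta_hat = [sum(top_left[:h - j + 1]) for j in range(1, h + 1)]

    # Steps 5-6: g_hat_t = eps_hat_t * sum_j theta_hat_j x_{t-j}, then average
    # the outer products over t = p+h+2, ..., T-h.
    xdict = dict(zip(ts, X))
    G = np.array([eps_hat[t] * sum(theta_hat[j - 1] * xdict[t - j]
                                   for j in range(1, h + 1))
                  for t in range(p + h + 2, T - h + 1)])
    return b_hat, (G.T @ G) / T

# Illustrative use on simulated data.
rng = np.random.default_rng(2)
T = 300
e = rng.standard_normal(T)
y1 = np.zeros(T)
for t in range(1, T):
    y1[t] = 0.3 + 0.5 * y1[t - 1] + e[t]
w = rng.standard_normal(T)
b_hat, V_hat = hodrick_generalised_cov(y1, w, h=4, p=2)
```

By construction the returned matrix is an average of outer products, so it is positive semi-definite, which is one of the practical advantages claimed for this class of estimator.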
This procedure offers two main advantages over the procedure of West (1997). Firstly, the estimator of the covariance matrix is straightforward to compute, since it involves only basic matrix operations and some summation. In contrast, West’s procedure requires the use of an iterative algorithm to solve the non-linear optimization problem required to produce the estimates. Secondly, since it imposes restrictions that are consistent with the assumed autoregressive (1) structure of yt on the moving average coefficients of the error, it might be hoped that the finite sample performance of the proposed estimator could be superior in cases in which the autoregression is a good approximation to the process of interest. Of course, the procedure is unlikely to (1) perform well in cases where yt cannot be represented adequately by a stable, finite-order autoregression.
3. Finite-sample size properties of the test statistic

In this section, the empirical size and power of a t-statistic for testing multiple-period predictability, constructed using the approach outlined in Section 2, are investigated using Monte Carlo simulations. The objective is to gain an understanding of the way in which the performance of the test might vary according to the sample size, the prediction horizon, and the extent of the serial correlation in the predictor and predicted variables. The model will be misspecified deliberately, in the sense that, while the data will be generated by ARMA(1, 1)-GARCH(1, 1) processes, the test statistic will be constructed from an assumed model of an AR(p) filter of white noise, as described in the previous section. The order of the autoregression will be treated as unknown and estimated using the Akaike Information Criterion (AIC). Comparisons will be made to a range of alternative approaches.

The data generating process for the Monte Carlo simulations is given by the following four equations:

y_t^{(1)} = 0.5 + \beta y_{t-1}^{(1)} + 0.5 \sigma_{t-1} \varepsilon_{t-1} + \sigma_t \varepsilon_t   (15)

\sigma_t^2 = 0.1 + 0.8 \sigma_{t-1}^2 + 0.1 \varepsilon_{t-1}^2   (16)

w_t = 0.5 + \phi w_{t-1} + \xi_t   (17)

\begin{pmatrix} \varepsilon_t \\ \xi_t \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right).   (18)

The values of \beta and \phi are each chosen from the set {0, 0.5, 0.9}, i.e., cases are considered in which the predictor variable and/or the predicted variable are serially uncorrelated, moderately serially correlated, or strongly serially correlated. The prediction horizon h is chosen from the set {4, 12, 24}, and the sample size T is chosen from the set {100, 200, 300}. These parameter ranges are chosen to reflect the prediction horizons and sample sizes that might be found in macroeconomic studies. A forecast horizon of 24 might be relevant for studies that use monthly data, but is unlikely to be of interest in studies that use quarterly or annual data. For each combination of values, 2000 samples are generated. For each of these samples, the following model is estimated by the least squares method:

y_t^{(h)} = b_0 + b_1 y_{t-1}^{(1)} + b_2 w_t + \eta_t,   (19)

where y_t^{(h)} = \sum_{k=0}^{h-1} y_{t+k}^{(1)}, and the null hypothesis H_0: b_2 = 0 is tested at the 5% significance level using each of the methods described below:

• H: A t-test constructed using the new covariance estimator detailed in Section 2.
• BS: A p-value estimated using a fixed-design wild bootstrap (Gonçalves & Kilian, 2004). Each iteration of the bootstrap is computed by first simulating an IID binary sequence u_t using the parameter values suggested by Mammen (1993). This is then used to construct \varepsilon_t^* = u_t \hat{\varepsilon}_t, where \hat{\varepsilon}_t is the sequence of residuals from an autoregression for y_t^{(1)} estimated by OLS. From this, the moving average errors may be simulated as \eta_t^* = \sum_{j=1}^{h} \hat{\theta}_j \varepsilon_{t+j}^*, where \hat{\theta}_j is as used in the construction of the statistic H. These errors are used with the OLS estimates of b_0 and b_1 in Eq. (19) to produce y_t^{(h)*} = \hat{b}_0 + \hat{b}_1 y_{t-1}^{(1)} + \eta_t^*. Finally, OLS is used to estimate b_2 in y_t^{(h)*} = b_0 + b_1 y_{t-1}^{(1)} + b_2 w_t + e_t. This procedure is repeated 1000 times and the p-values constructed from the empirical distribution of the estimates of b_2.
• W: A t-test constructed using the covariance estimator of West (1997).
• HH: A t-test constructed using the covariance estimator of Hansen and Hodrick (1980).
• BJNW: A t-test constructed using the method of Britten-Jones et al. (2011) with the Newey–West estimator.
• NW94: A t-test constructed using the Newey–West kernel-based covariance estimator, with the bandwidth selected using the data-based method of Newey and West (1994).
• A: A t-test constructed using a kernel-based estimator with a Quadratic Spectral kernel and the data-based bandwidth selection technique proposed by Andrews (1991).
• NWPW: A t-test constructed using a kernel-based estimator with a Bartlett kernel, the bandwidth selection technique of Newey and West (1994), and the VAR(1) prewhitening approach proposed by Andrews and Monahan (1992).
• APW: A t-test constructed using a kernel-based estimator with a Quadratic Spectral kernel, the bandwidth selection technique of Andrews (1991), and the VAR(1) prewhitening approach proposed by Andrews and Monahan (1992).
• TR: A t-test constructed using a kernel-based estimator with a truncated kernel (Andrews, 1991) and a bandwidth set equal to the prediction horizon.
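The BS procedure described above can be sketched as follows for the special case p = 1. The helper names, the indexing conventions, and the two-sided p-value rule are illustrative assumptions; the paper's own implementation was written in R:

```python
import numpy as np

def mammen_draws(rng, n):
    """Two-point wild-bootstrap multipliers of Mammen (1993): mean 0, variance 1."""
    prob = (np.sqrt(5) + 1) / (2 * np.sqrt(5))
    lo, hi = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2
    return np.where(rng.random(n) < prob, lo, hi)

def wild_bootstrap_pvalue(y1, w, h, n_boot=199, seed=0):
    """Fixed-design wild-bootstrap p-value for H0: b2 = 0 in Eq. (19), with p = 1.

    A simplified sketch: endpoint handling and the p-value rule are
    illustrative assumptions, not the paper's exact code.
    """
    rng = np.random.default_rng(seed)
    T = len(y1)

    # AR(1) for y^(1): residuals eps_hat and the theta_j implied by Eq. (8).
    Z = np.column_stack([np.ones(T - 1), y1[:-1]])
    ar = np.linalg.lstsq(Z, y1[1:], rcond=None)[0]
    eps_hat = np.r_[0.0, y1[1:] - Z @ ar]  # zero-padded at t = 0
    beta1 = ar[1]
    theta = [sum(beta1 ** (k - j) for k in range(j, h + 1)) for j in range(1, h + 1)]

    # OLS of y^(h) on (1, y^(1)_{t-1}, w_t): observed estimate of b2.
    yh = np.array([y1[t:t + h].sum() for t in range(T - h + 1)])
    ts = np.arange(1, T - h)  # keep t + h inside the sample
    X = np.column_stack([np.ones(len(ts)), y1[ts - 1], w[ts]])
    b_full = np.linalg.lstsq(X, yh[ts], rcond=None)[0]

    # Bootstrap under H0: eta*_t = sum_j theta_j eps*_{t+j}, eps*_t = u_t eps_hat_t.
    b2_star = np.empty(n_boot)
    for i in range(n_boot):
        eps_star = mammen_draws(rng, T) * eps_hat
        eta = np.array([sum(theta[j - 1] * eps_star[t + j] for j in range(1, h + 1))
                        for t in ts])
        yh_star = X[:, :2] @ b_full[:2] + eta  # impose b2 = 0
        b2_star[i] = np.linalg.lstsq(X, yh_star, rcond=None)[0][2]
    return np.mean(np.abs(b2_star) >= np.abs(b_full[2]))

# Illustrative use on data satisfying the null of no predictability.
rng = np.random.default_rng(3)
T = 120
e = rng.standard_normal(T)
y1 = np.zeros(T)
for t in range(1, T):
    y1[t] = 0.5 * y1[t - 1] + e[t]
w = rng.standard_normal(T)
pv = wild_bootstrap_pvalue(y1, w, h=4, n_boot=99)
```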
It may be noted that the rates of convergence of the BJNW, NW94, A, NWPW and APW estimators are strictly slower than the √T rate achieved by the other estimators under appropriate conditions. However, this does not guarantee superior performances in finite samples.

All simulations were written in the R programming language4 and run on a cluster of three multi-core PCs created using the Snowfall package,5 each running the
4 R Core Team (2014).
5 Knaus (2013).
Table 1
Empirical size of the test with a 5% theoretical size and φ = 0.

β    h   T    p̄      H      BS     W      HH     BJNW   NW94   A      NWPW   APW    TR
0    4   100  3.021  0.050  0.043  0.062  0.061  0.095  0.069  0.068  0.063  0.056  0.094
0    12  100  3.026  0.054  0.043  0.082  0.092  0.093  0.073  0.074  0.053  0.043  0.106
0    24  100  2.987  0.049  0.038  0.097  0.126  0.095  0.073  0.075  0.060  0.042  0.222
0    4   200  3.554  0.056  0.053  0.061  0.058  0.085  0.068  0.067  0.059  0.051  0.072
0    12  200  3.490  0.063  0.054  0.070  0.094  0.087  0.070  0.065  0.051  0.048  0.102
0    24  200  3.578  0.052  0.050  0.084  0.091  0.088  0.066  0.064  0.043  0.035  0.109
0    4   300  3.872  0.056  0.051  0.059  0.077  0.070  0.064  0.060  0.056  0.050  0.069
0    12  300  3.884  0.055  0.054  0.063  0.087  0.083  0.067  0.059  0.048  0.046  0.101
0    24  300  3.859  0.058  0.055  0.071  0.072  0.078  0.071  0.068  0.046  0.038  0.072
0.5  4   100  3.587  0.056  0.043  0.068  0.063  0.131  0.076  0.078  0.068  0.059  0.086
0.5  12  100  3.686  0.055  0.043  0.085  0.078  0.145  0.070  0.090  0.060  0.051  0.125
0.5  24  100  3.635  0.055  0.042  0.110  0.106  0.159  0.089  0.115  0.068  0.049  0.179
0.5  4   200  4.118  0.053  0.038  0.060  0.054  0.102  0.062  0.059  0.059  0.055  0.068
0.5  12  200  4.126  0.061  0.051  0.073  0.079  0.130  0.073  0.085  0.051  0.046  0.091
0.5  24  200  4.109  0.052  0.051  0.080  0.090  0.127  0.070  0.083  0.051  0.043  0.118
0.5  4   300  4.475  0.050  0.035  0.051  0.058  0.074  0.060  0.060  0.056  0.051  0.056
0.5  12  300  4.526  0.049  0.046  0.054  0.096  0.108  0.059  0.065  0.043  0.036  0.085
0.5  24  300  4.550  0.054  0.052  0.072  0.078  0.117  0.056  0.061  0.035  0.033  0.070
0.9  4   100  3.945  0.050  0.029  0.062  0.082  0.158  0.071  0.081  0.063  0.054  0.086
0.9  12  100  3.889  0.063  0.044  0.090  0.108  0.308  0.085  0.151  0.065  0.050  0.119
0.9  24  100  3.836  0.065  0.027  0.116  0.097  0.382  0.101  0.239  0.078  0.058  0.000
0.9  4   200  4.464  0.047  0.031  0.057  0.081  0.114  0.065  0.074  0.063  0.048  0.074
0.9  12  200  4.408  0.059  0.050  0.066  0.096  0.246  0.073  0.108  0.057  0.051  0.089
0.9  24  200  4.404  0.067  0.044  0.079  0.082  0.311  0.079  0.129  0.052  0.043  0.140
0.9  4   300  4.739  0.044  0.028  0.050  0.077  0.097  0.054  0.056  0.051  0.048  0.056
0.9  12  300  4.705  0.059  0.049  0.060  0.060  0.240  0.070  0.088  0.050  0.044  0.082
0.9  24  300  4.728  0.065  0.051  0.076  0.105  0.305  0.067  0.108  0.040  0.036  0.140
Ubuntu Linux 12.10 operating system. The kernel-based estimators were computed using the Sandwich package.6 For the Hansen and Hodrick (1980) and truncated kernel approaches, the covariance matrix is not guaranteed to be positive definite. Cases in which the estimated covariance matrix was not positive definite were excluded from the estimation of the size of the test statistics. The proportions of rejections of H0 are recorded and reported in Tables 1–3. For each set of 2000 simulations (i.e., each row of the tables), the mean autoregressive order estimated by the AIC is reported in the tables as p̄. Note that the results in these tables are generated for a theoretical size of 0.05, meaning that a good performance is indicated by a tabulated figure that is close to that value. Since each cell is generated from 2000 simulations, if the true size is 0.05, then the probability that the estimated size will fall between 0.04 and 0.06 is approximately 0.95. Note also that each row of each table was generated by a single set of 2000 simulations. As a consequence, for a given set of parameters, the rejection rates for the different test procedures are likely to be dependent. In contrast, the elements of each column were generated by independent sets of 2000 simulations. As such, for a given test procedure, the reported rejection rates for each set of parameters are independent of each other.

Consider first the truncated kernel (TR). In a small number of the cases considered, the empirical sizes of statistics based on this estimator are close to the theoretical size. However, in other cases the statistics are oversized, often severely so. At its most extreme, when φ = 0.9, β = 0, h = 24 and T = 100 (see Table 3), the estimated

6 Zeileis (2004).
empirical size for the test with a theoretical size of 0.05 is 0.369. Overall, it seems reasonable to conclude that statistics based on the truncated kernel estimator of the covariance matrix should not be used when testing for multiple-period predictability. Next consider the test statistics constructed from the Newey–West and Andrews estimators without prewhitening (NW94 and A). Interestingly, the statistic constructed from the Newey–West estimator is often only slightly oversized in cases where the predictor is serially uncorrelated (Table 1). The statistic based on the Andrews estimator is only slightly oversized in cases where the predictor is serially uncorrelated and the predicted variable is not serially correlated (β = 0 in Table 1). However, it tends to be oversized when the predicted variable is strongly serially correlated (β = 0.9 in Table 1). In contrast, when the predictor is serially correlated (Tables 2 and 3), the statistics based on both the Newey–West and Andrews estimators are significantly oversized, particularly when the sample size is small and the prediction horizon long, confirming the findings of Ang and Bekaert (2007), Britten-Jones et al. (2011), Dufour et al. (2006) and Smith and Yadav (1996). Prewhitening greatly improves the size performances of the test statistics constructed from the Newey–West and Andrews estimators (see columns NWPW and APW in Tables 1–3). Now, the empirical size is quite close to the theoretical size of 0.05 when the predictor variable is serially uncorrelated (Table 1), even in cases where the predicted variable is strongly serially correlated (i.e., β = 0.9). However, the empirical size becomes inflated in cases where the predictor variable is serially correlated (Tables 2 and 3). In particular, note that, in cases in which T = 100 in Tables 2 and 3 (moderate and strong serial correlation in the predictor respectively), the empirical size is inflated
Table 2
Empirical size of the test with a 5% theoretical size and φ = 0.5.

β    h   T    p̄      H      BS     W      HH     BJNW   NW94   A      NWPW   APW    TR
0    4   100  3.075  0.053  0.056  0.068  0.038  0.098  0.122  0.104  0.079  0.070  0.103
0    12  100  3.010  0.056  0.057  0.110  0.120  0.101  0.162  0.145  0.107  0.098  0.152
0    24  100  3.083  0.055  0.052  0.130  0.118  0.091  0.182  0.159  0.118  0.099  0.345
0    4   200  3.545  0.052  0.053  0.060  0.030  0.083  0.095  0.081  0.067  0.059  0.089
0    12  200  3.615  0.060  0.062  0.084  0.069  0.089  0.121  0.102  0.070  0.065  0.133
0    24  200  3.557  0.048  0.050  0.083  0.089  0.084  0.106  0.096  0.061  0.056  0.123
0    4   300  3.833  0.057  0.053  0.057  0.028  0.082  0.087  0.074  0.062  0.056  0.072
0    12  300  3.804  0.050  0.052  0.068  0.062  0.075  0.094  0.085  0.056  0.051  0.101
0    24  300  3.827  0.049  0.052  0.068  0.071  0.077  0.095  0.086  0.053  0.049  0.138
0.5  4   100  3.599  0.054  0.041  0.079  0.034  0.130  0.124  0.105  0.079  0.067  0.104
0.5  12  100  3.598  0.052  0.049  0.100  0.084  0.163  0.160  0.148  0.102  0.089  0.149
0.5  24  100  3.709  0.052  0.057  0.129  0.135  0.157  0.165  0.160  0.111  0.098  0.179
0.5  4   200  4.296  0.058  0.044  0.063  0.021  0.113  0.095  0.084  0.066  0.054  0.079
0.5  12  200  4.240  0.059  0.054  0.070  0.043  0.130  0.116  0.104  0.077  0.068  0.108
0.5  24  200  4.162  0.052  0.054  0.085  0.082  0.121  0.121  0.110  0.070  0.066  0.140
0.5  4   300  4.415  0.053  0.043  0.056  0.018  0.093  0.086  0.074  0.059  0.049  0.072
0.5  12  300  4.521  0.059  0.057  0.069  0.052  0.115  0.095  0.085  0.065  0.063  0.099
0.5  24  300  4.479  0.048  0.043  0.063  0.052  0.118  0.086  0.080  0.051  0.046  0.098
0.9  4   100  4.005  0.067  0.045  0.081  0.040  0.197  0.124  0.116  0.084  0.071  0.100
0.9  12  100  3.876  0.066  0.051  0.097  0.102  0.312  0.146  0.176  0.085  0.078  0.174
0.9  24  100  3.977  0.073  0.035  0.139  0.127  0.388  0.179  0.250  0.116  0.099  0.286
0.9  4   200  4.375  0.059  0.046  0.070  0.029  0.142  0.102  0.090  0.075  0.065  0.095
0.9  12  200  4.478  0.061  0.053  0.072  0.058  0.240  0.106  0.106  0.070  0.058  0.110
0.9  24  200  4.372  0.060  0.046  0.088  0.082  0.322  0.112  0.142  0.067  0.053  0.198
0.9  4   300  4.663  0.058  0.040  0.059  0.018  0.133  0.087  0.073  0.062  0.057  0.068
0.9  12  300  4.787  0.058  0.053  0.070  0.042  0.241  0.097  0.100  0.067  0.064  0.110
0.9  24  300  4.668  0.060  0.053  0.075  0.102  0.331  0.102  0.120  0.056  0.053  0.124
Table 3
Empirical size of the test with a 5% theoretical size and φ = 0.9.

β    h   T    p̄      H      BS     W      HH     BJNW   NW94   A      NWPW   APW    TR
0    4   100  3.067  0.062  0.067  0.090  0.029  0.105  0.177  0.183  0.068  0.060  0.156
0    12  100  3.045  0.050  0.067  0.132  0.086  0.108  0.281  0.320  0.152  0.143  0.241
0    24  100  2.910  0.048  0.062  0.216  0.192  0.102  0.399  0.458  0.243  0.225  0.369
0    4   200  3.555  0.061  0.060  0.071  0.027  0.088  0.117  0.117  0.046  0.037  0.091
0    12  200  3.483  0.049  0.057  0.082  0.026  0.084  0.190  0.194  0.079  0.076  0.149
0    24  200  3.527  0.051  0.052  0.100  0.061  0.080  0.261  0.257  0.112  0.109  0.208
0    4   300  3.789  0.060  0.059  0.070  0.026  0.077  0.105  0.096  0.035  0.027  0.076
0    12  300  3.864  0.050  0.051  0.069  0.022  0.076  0.144  0.144  0.053  0.051  0.125
0    24  300  3.886  0.048  0.054  0.079  0.037  0.068  0.203  0.210  0.093  0.093  0.176
0.5  4   100  3.560  0.068  0.064  0.097  0.029  0.157  0.173  0.169  0.074  0.066  0.139
0.5  12  100  3.643  0.064  0.074  0.140  0.087  0.141  0.299  0.328  0.150  0.139  0.254
0.5  24  100  3.567  0.059  0.077  0.234  0.207  0.153  0.421  0.501  0.261  0.242  0.324
0.5  4   200  4.132  0.064  0.052  0.076  0.014  0.126  0.134  0.126  0.048  0.035  0.103
0.5  12  200  4.208  0.056  0.056  0.081  0.032  0.106  0.184  0.199  0.066  0.060  0.159
0.5  24  200  4.130  0.060  0.073  0.117  0.094  0.146  0.280  0.312  0.139  0.135  0.222
0.5  4   300  4.557  0.059  0.050  0.070  0.015  0.115  0.105  0.093  0.040  0.027  0.078
0.5  12  300  4.494  0.059  0.059  0.075  0.041  0.119  0.149  0.159  0.059  0.056  0.126
0.5  24  300  4.488  0.057  0.062  0.083  0.051  0.125  0.217  0.214  0.082  0.080  0.182
0.9  4   100  3.881  0.093  0.077  0.129  0.040  0.343  0.219  0.194  0.118  0.100  0.167
0.9  12  100  3.896  0.093  0.089  0.171  0.106  0.356  0.332  0.357  0.188  0.174  0.309
0.9  24  100  3.859  0.076  0.077  0.254  0.269  0.424  0.427  0.517  0.275  0.254  0.333
0.9  4   200  4.434  0.081  0.065  0.092  0.016  0.289  0.146  0.124  0.077  0.059  0.110
0.9  12  200  4.507  0.072  0.072  0.100  0.040  0.305  0.194  0.190  0.096  0.090  0.154
0.9  24  200  4.479  0.070  0.063  0.127  0.105  0.326  0.271  0.295  0.139  0.134  0.259
0.9  4   300  4.790  0.069  0.047  0.072  0.000  0.267  0.116  0.093  0.056  0.035  0.085
0.9  12  300  4.823  0.071  0.066  0.086  0.033  0.311  0.151  0.140  0.060  0.060  0.119
0.9  24  300  4.702  0.077  0.070  0.106  0.061  0.342  0.232  0.225  0.111  0.111  0.171
when the prediction horizon (h) is large, or when the predicted variable is more strongly serially correlated (i.e., β is larger). Nonetheless, when the sample size is large (T = 300) and/or the prediction horizon is not large (h = 4 and h = 12), the test statistics constructed from the prewhitened versions of the Newey–West and Andrews estimators tend to have accurate empirical sizes. Smith
and Yadav (1996) report significant size distortions for the prewhitened Andrews estimator, but their results are not inconsistent with those presented here because they consider a maximum sample size of only 200.

The test statistic constructed using the transformation of Britten-Jones et al. (2011) is generally oversized, and severely so when the predicted variable is strongly serially correlated (β = 0.9). This is to be expected, since their transformation accounts for the autocorrelation induced by the construction of the h-step-ahead dependent variable, but not for the serial correlation in the variable from which it is constructed, which is accounted for by the Newey–West estimator. This is in contrast to the other kernel-based statistics reported to the right of BJNW in Tables 1–3, for which the serial correlation induced by the construction of the h-step-ahead variable is accounted for by the kernel-based covariance estimator, and the serial correlation in the variable from which the h-step-ahead change is constructed is accounted for by an assumed autoregressive structure. Consequently, there is not necessarily any reason to expect the Britten-Jones et al. (2011) statistic to have a superior size to the Newey–West-based statistic computed without the transformation that they propose. It has been observed previously (see e.g. Andrews & Monahan, 1992) that the test statistics constructed from kernel-based HAC estimators may be severely oversized when the dependent variable is strongly serially correlated, and this is the most likely explanation of the results reported in this paper. Britten-Jones et al. (2011) conduct simulations in which, under the null hypothesis, the one-period change in the dependent variable is an AR(1) plus white noise.7 Their model is equivalent to an ARMA(1, 1), with an autoregressive parameter equal to 0.8 and a moving average parameter equal to 0.55248. Their predictor variable is an AR(1) with an autoregressive parameter of 0.8. For T = 100 and h = 12, their test with a theoretical size of 0.05 has an empirical size of about 0.17, which is broadly consistent with the results reported in this paper. It should be noted that Britten-Jones et al. (2011) also report simulation results for examples with no serial correlation, in which their approach works well. Ang and Bekaert (2007), Britten-Jones et al.
(2011) and Smith and Yadav (1996) report that statistics based on the Hansen and Hodrick (1980) estimator are oversized. The results reported in column HH of Tables 1–3 confirm this finding in cases where the sample size is small and the prediction horizon long. However, in cases where the sample size is large and the prediction horizon relatively small (e.g., T = 300, h = 4 in Tables 1–3), the empirical size is usually under the theoretical size of 0.05, and thus, notwithstanding the fact that the estimator is not guaranteed to be positive definite, the approach based on Hansen and Hodrick (1980) may be useful in cases with a large sample and a small prediction horizon. It should be noted, however, that it is sometimes significantly undersized. In cases where the predictor variable is not serially correlated (Table 1), the performance of the West (1997) approach (W) is typically slightly inferior to that of the prewhitened estimators (NWPW and APW). When the predictor variable is serially correlated (Tables 2 and 3), the results are often quite close. This is consistent with the results reported by West (1997). In general, the statistics computed using the West (1997) approach tend to be oversized. In cases where the size performance is quite good,
7 See their Table 4.
statistics constructed from the prewhitened kernel estimators are often superior. Furthermore, there are many cases in which the West statistic is significantly oversized, particularly when the predictor is strongly serially correlated (Table 3), the sample size is small, and the prediction horizon is large. The wild bootstrap (BS) has excellent size control in general, and is clearly superior to everything that appears to its right in Tables 1–3. This is consistent with the results of Dufour et al. (2006) and Kilian (1999), who found accurate size for a parametric bootstrap and a non-parametric bootstrap respectively. Note, however, that a small size distortion does exist in cases where the sample size is small (T = 100) and the predictor and predicted variables are both strongly serially correlated (Table 3, β = 0.9). The empirical size of the test statistic constructed using the new approach proposed in this paper is presented in the column headed H in Tables 1–3. Overall, the performance of this statistic is similar to that of the bootstrap statistic. That is, the empirical size is accurate in almost all scenarios, with some size distortion when there is strong serial correlation in both the predicted and predictor variables, the sample size is small, and the prediction horizon is long (see Table 3, β = 0.9 and T = 100). However, the size distortion in the other statistics considered (except the bootstrap) is generally much more severe in these cases. It is particularly interesting to compare the size of this statistic to the size of the statistic computed using the method proposed by West (1997) (W ). In the cases where West’s statistic performs well, the new statistic has a similar performance. In the cases where West’s method is oversized, the new method is superior. This is particularly the case when the sample size is small, the prediction horizon is large and the predictor variable is serially correlated (see, for example, Table 3, T = 100). 
Since the difference between the two statistics is the method used to estimate the moving average parameters of the error process, this must be the reason for the superior performance. Note that the statistic proposed in this paper (H) is based on a covariance matrix estimator that is computed under the null hypothesis. This was also the approach taken by West (1997) (W). In contrast, the other statistics considered are computed using covariance matrix estimators that are computed under the alternative hypothesis. A referee asked whether the poor performance of the kernel-based approaches in some situations might be due to this. This was investigated by re-running the simulations with the kernel-based estimators being constructed using residuals computed under the null hypothesis instead of, as is standard, under the alternative hypothesis.8 It was found that this produced statistics that were significantly undersized in almost all situations for the prewhitened estimators, the Andrews (1991) estimator, and the truncated kernel estimator. For the Newey–West estimator (NW94), using the restricted residuals reduces the empirical size in all cases considered, but does not produce a statistic with satisfactory size performance over the entire range of parameter values considered.
8 The results are not presented in this paper, but are available from the author on request.
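The restricted-versus-unrestricted residual comparison described above can be sketched in a few lines. This is a minimal illustration with an arbitrary AR(1) design and a hand-rolled Bartlett-kernel (Newey–West) long-run variance, not the estimators or parameter settings used in the simulations; the function and variable names are invented for the example.

```python
import numpy as np

def newey_west_lrv(g, L):
    """Bartlett-kernel long-run variance of series g with lag truncation L."""
    g = g - g.mean()
    T = len(g)
    v = g @ g / T
    for j in range(1, L + 1):
        v += 2 * (1 - j / (L + 1)) * (g[j:] @ g[:-j]) / T
    return v

rng = np.random.default_rng(0)
T, L = 200, 8
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):          # unrelated AR(1) target and AR(1) predictor
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()

X = np.column_stack([np.ones(T), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
u_alt = y - X @ b              # residuals under the alternative (unrestricted OLS)
u_null = y - y.mean()          # residuals with the slope restricted to zero (the null)

for u, label in [(u_alt, "alternative"), (u_null, "null")]:
    lrv = newey_west_lrv((x - x.mean()) * u, L)
    se = np.sqrt(lrv / T) / np.var(x)      # HAC standard error of the slope
    print(label, round(b[1] / se, 3))
```

Swapping `u_alt` for `u_null` is the only change needed to move a kernel-based t-statistic from unrestricted to restricted residuals, which is what the re-run simulations described above do.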
Fig. 1. Estimated power curves for H and BS.
The conclusions that may be drawn from this analysis are that the new statistic (H) and the bootstrap statistic (BS) have accurate empirical sizes in almost all of the scenarios considered. In cases in which other statistics have accurate sizes, H and BS are just as good, and in other cases they have superior sizes. While the bootstrap requires some computational work, this is unlikely to be a problem in a modern computing environment, and the new statistic proposed is no more difficult to program and compute than other HAC estimators. Consequently, of the statistics considered, they are clearly the best choices for applied researchers who are interested in testing for multiple-period predictability. Since both the new statistic proposed in Section 2 and the bootstrap are considerably better than the other statistics considered from the perspective of empirical size, power will be investigated for these two approaches only. Fig. 1 shows estimated power curves for both test statistics. From left to right, the columns of Fig. 1 correspond to
sample sizes of T = 100, T = 200 and T = 300. From top to bottom, the rows correspond to prediction horizons of h = 4, h = 12 and h = 24. Each point on each power curve is estimated by simulating 1000 samples from Eqs. (15) to (18) with β = φ = 0.5, then, for each sample, generating y_t^{(h)} = \sum_{k=0}^{h−1} y_{t+k}^{(1)} and v_t = γ y_t^{(h)} + w_t, and estimating

y_t^{(h)} = b_0 + b_1 y_{t−1}^{(1)} + b_2 v_t + η_t.    (20)
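The estimation of a single point on such a power curve can be sketched as follows. Eqs. (15)–(18) are not reproduced in this section, so simple AR(1) processes for y_t^{(1)} and w_t stand in for the actual design, and an off-the-shelf Newey–West t-test stands in for the H and BS statistics being compared; all names and settings here are assumptions made for illustration only.

```python
import numpy as np

def nw_se(X, u, L):
    """Bartlett-kernel (Newey-West) standard errors for OLS coefficients."""
    s = X * u[:, None]                      # score contributions x_t * u_t
    S = s.T @ s
    for j in range(1, L + 1):
        w = 1 - j / (L + 1)
        G = s[j:].T @ s[:-j]
        S += w * (G + G.T)
    Qi = np.linalg.inv(X.T @ X)
    return np.sqrt(np.diag(Qi @ S @ Qi))

def power_point(gamma, T=100, h=4, beta=0.5, phi=0.5, nsim=200, seed=0):
    """Estimated rejection rate of H0: b2 = 0 in regression (20)."""
    rng = np.random.default_rng(seed)
    rej = 0
    for _ in range(nsim):
        y1 = np.zeros(T + h)
        w = np.zeros(T + h)
        for t in range(1, T + h):           # AR(1) stand-ins for Eqs. (15)-(18)
            y1[t] = beta * y1[t - 1] + rng.standard_normal()
            w[t] = phi * w[t - 1] + rng.standard_normal()
        yh = np.convolve(y1, np.ones(h), mode="valid")[:T]  # y_t^(h) = sum of next h values
        v = gamma * yh + w[:T]              # predictor, correlated with target when gamma != 0
        X = np.column_stack([np.ones(T - 1), y1[:T - 1], v[1:]])
        b, *_ = np.linalg.lstsq(X, yh[1:], rcond=None)
        u = yh[1:] - X @ b
        t2 = b[2] / nw_se(X, u, h)[2]       # t-statistic on b2, lag truncation h
        rej += abs(t2) > 1.96
    return rej / nsim
```

Calling `power_point(0.0)` approximates the empirical size and `power_point(g)` for g > 0 traces out a point on a power curve; varying g over a grid reproduces the shape of the curves in Fig. 1.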
Each point on each curve records the proportion of the 1000 samples for which H0: b2 = 0 is rejected at the 5% significance level for values of b2 ranging from 0 to 0.25. As can be seen in Fig. 1, the power levels of the two tests are very close in cases where the sample size is reasonably large and the prediction horizon relatively small (i.e., for plots that are not close to the bottom left corner of the figure). In contrast, in cases with small samples and large prediction horizons, the bootstrap (BS) has
an appreciable power advantage over the new procedure proposed in Section 2.

4. Application: Do Phillips curve models predict inflation multiple periods ahead?

Inflation forecasting is an important topic in applied macroeconomics, not least because of the role that inflation forecasts play in the determination of monetary policy. Unsurprisingly, the question of which model provides the best inflation forecasts has been the subject of extensive empirical research. Much of this research has focused on Phillips curve models, with a wide range of variables being used as measures of economic activity. Out-of-sample forecasting tests comparing Phillips curve models to simple AR or ARMA benchmarks have returned mixed results. Fisher, Liu, and Zhou (2002), Stockton and Glassman (1987) and Stock and Watson (1999) find evidence that Phillips curve models can beat the benchmark at forecast horizons of up to 8 quarters, 12 months, and 12 and 24 months respectively. Ang, Bekaert, and Wei (2007), Atkeson and Ohanian (2001) and Cecchetti, Chu, and Steindel (2000) do not. Stock and Watson (2008) provide an extensive review of the literature and conclude that the good performance of Phillips curve models is episodic. Fisher et al. (2002) suggest that Phillips curve models tend to perform poorly both during periods of low inflation volatility and following regime changes. In this section, the test statistics considered in Section 3 are used to test the null hypothesis that Phillips curve models do not predict inflation over prediction horizons ranging from 4 quarters to 12 quarters in the United States. The time period considered is 1983:1–2008:2, and is chosen to approximately cover the period of the ‘great moderation’.9 Since this was a period of relatively low inflation volatility, based on the work of Fisher et al. (2002) we might expect Phillips curve models to have little predictive power over this period. 
To establish the theoretical context of this study, define π_t = ln(CPI_t) − ln(CPI_{t−1}), let w_t be a variable that measures real economic activity, and consider the following equation:
∆^{(1)}π_t = β_0 + \sum_{i=1}^{p} β_i ∆^{(1)}π_{t−i} + γ w_t + ε_{t+1},    (21)

where ∆^{(1)}π_t = π_{t+1} − π_t. Note that the change in inflation is defined as a forward difference. Consequently, π_{t−1} and w_t become observable in the same time period, and ε_{t+1} is a shock that affects inflation over the subsequent time period and is assumed to have an expected value of zero conditional on π_{t−1}, . . . , π_{t−p} and w_t. Eq. (21) may be interpreted as an expectations-augmented Phillips curve. Assuming that the autoregressive component is stable, and taking w_t to be the unemployment rate, the non-accelerating-inflation rate of unemployment (NAIRU) is −β_0/γ. Inflation will be increasing (decreasing) when the unemployment rate is below (above) this level. Under H_0: γ = 0, the calculations in Section 2 provide the following equation for the h-period change in inflation:

∆^{(h)}π_t = b_0 + \sum_{i=1}^{p} b_i ∆^{(1)}π_{t−i} + γ w_t + η_t,    (22)

where ∆^{(h)}π_t = π_{t+h} − π_t, h ∈ N. The methodology employed in this section involves estimating Eq. (22) using ordinary least squares, constructing t-statistics for H_0: γ = 0 using all of the methods considered in Section 3, and comparing the p-values to conventional significance levels. The price variable used is the seasonally adjusted Consumer Price Index for All Urban Consumers: All Items (CPI_t). The measure of real activity used as the explanatory variable w_t is the GDP gap, measured as the cyclical component, estimated by the Hodrick–Prescott filter, of the seasonally adjusted real gross domestic product (with the Hodrick–Prescott filter parameter set to 1600). This variable has been used before in the literature; see for example Ang and Bekaert (2007) and Clark and McCracken (2006). All of the data series are taken from the Federal Reserve Economic Database (FRED).10 The lag order p in Eq. (22) was estimated by using the Akaike Information Criterion (AIC) to choose the lag order11 in an autoregression for ∆^{(1)}π_t. This produced an order of p = 2. A Box–Ljung test for serial correlation in the AR(2) error term returned a p-value of 0.1468. Prior to constructing the test statistic, a Monte Carlo simulation was conducted, using an AR(2) with the estimated parameter values to simulate data for the change in inflation. An autoregression was fitted to the GDP gap data with the order chosen using the AIC,12 and the estimated parameter values were used to simulate values for the GDP gap. 2000 simulated samples were generated for prediction horizons h of 4, 6, 8 and 12. For each simulated sample π_t^∗ and w_t^∗, ∆^{(h)}π_t^∗ = π_{t+h}^∗ − π_t^∗ was computed, the equation

∆^{(h)}π_t^∗ = b_0 + \sum_{i=1}^{p} b_i ∆^{(1)}π_{t−i}^∗ + γ w_t^∗ + η_t    (23)
9 See Stock and Watson (2003) for a discussion of the dating of the commencement of this period. It may be argued that this period ended with the large shock that occurred in 2008:3.
was estimated by OLS using 102 observations, and the null hypothesis H0 : γ = 0 was tested using a 5% significance level and each of the methods used in the simulations in Section 3. The rejection rates are presented in Table 4. Note that the empirical sizes reported in Table 4 are broadly in line with those reported in Section 3. In particular, the statistics computed using kernel-based estimators are significantly oversized, the H statistic has a good empirical size, the bootstrap mostly has a good size (although it is somewhat oversized for h = 4), the West statistic
10 http://research.stlouisfed.org/fred2/. The variable codes are CPIAUCSL and GDPC1. The monthly CPI series was converted to quarterly by using the value from the last month of the quarter.
11 The maximum lag order considered was 12.
12 The maximum lag order considered was 12, which gave an estimated autoregressive order of 3.
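The data handling described above (quarterly CPI taken from the last month of each quarter, inflation as a log difference, and an HP-filter GDP gap with λ = 1600) can be sketched as follows. Synthetic random-walk series stand in for the FRED data here, and the hand-rolled HP filter is a plain dense-matrix version written for the example, not the implementation used in the paper.

```python
import numpy as np

def hp_filter_cycle(y, lam=1600.0):
    """Cyclical component of the Hodrick-Prescott filter.
    The trend solves (I + lam * K'K) trend = y, where K takes second differences."""
    T = len(y)
    K = np.zeros((T - 2, T))
    for t in range(T - 2):
        K[t, t:t + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(T) + lam * (K.T @ K), y)
    return y - trend

# Synthetic stand-ins for the FRED series (CPIAUCSL is monthly, GDPC1 quarterly).
rng = np.random.default_rng(0)
cpi_monthly = 100 * np.exp(np.cumsum(0.002 + 0.001 * rng.standard_normal(300)))
gdp = np.exp(np.cumsum(0.005 + 0.01 * rng.standard_normal(100)))

cpi_q = cpi_monthly[2::3]                        # last month of each quarter
pi = np.diff(np.log(cpi_q))                      # quarterly inflation pi_t
d1_pi = pi[1:] - pi[:-1]                         # one-period change of inflation
gap = hp_filter_cycle(np.log(gdp), lam=1600.0)   # GDP gap w_t (lambda = 1600)
```

A useful sanity check on the filter is that a perfectly linear (log) series has a zero cyclical component, since its second differences vanish.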
Table 4
Empirical sizes of the tests, with a 5% theoretical size, using parameter estimates from the application.

h     H      BS     W      HH     BJNW   NW94   A      NWPW   APW    TR
4     0.044  0.084  0.064  0.010  0.006  0.146  0.155  0.155  0.160  0.120
6     0.046  0.062  0.071  0.020  0.011  0.189  0.164  0.154  0.158  0.153
8     0.045  0.060  0.076  0.048  0.024  0.213  0.178  0.149  0.151  0.171
12    0.048  0.060  0.094  0.076  0.009  0.230  0.187  0.169  0.160  0.180
Table 5
p-values for H0: γ = 0.

h     H      BS     W      HH     BJNW   NW94   A      NWPW   APW    TR
4     0.243  0.196  0.155  0.509  0.724  0.182  0.326  0.186  0.338  0.123
6     0.285  0.191  0.079  NA     0.738  0.029  0.133  0.097  0.127  NA
8     0.265  0.170  0.048  0.390  0.690  0.017  0.047  0.018  0.066  NA
12    0.396  0.231  0.051  NA     0.861  0.010  0.019  0.004  0.007  NA
has a slight size distortion that increases with the prediction horizon, and the statistic computed using the Hansen and Hodrick (1980) method is undersized for short prediction horizons and oversized for long prediction horizons. Interestingly, the BJNW statistic is undersized at all horizons considered, despite being oversized in the simulations presented in Section 3. Using the actual data on inflation and GDP, the hypothesis that the GDP gap does not predict the inflation rate (H0 : γ = 0) at various prediction horizons was tested using each of the methods considered in Section 3, and the p-values are presented in Table 5. For prediction horizons h = 6 and h = 12, the Hansen and Hodrick (1980) covariance estimator was singular, and so the relevant test statistic could not be computed. Similarly, the truncated kernel covariance estimator was singular for all prediction horizons except h = 4, and so the corresponding statistics are missing. Note that, for all prediction horizons considered, the p-values computed using H and BS are relatively large. Since these statistics were shown to have reasonably accurate sizes, it should be concluded that there is no convincing evidence in Table 5 that the particular form of the Phillips curve specified in this paper is able to predict inflation at the prediction horizons considered, at conventional levels of statistical significance. However, it is the other statistics that are of particular interest. Using a 5% significance level, the statistic computed using the Newey–West estimator (NW94) rejects the null hypothesis of no predictability at all prediction horizons except h = 4, and the statistic based on the Andrews estimator (A) rejects the null for horizons 8 and 12. Prewhitening does improve the situation a little, but the null hypothesis is rejected by both statistics based on prewhitening for at least one prediction horizon. 
The statistic based on West’s method produces p-values that are close to 0.05 for horizons 8 and 12, and rejects the null hypothesis for all prediction horizons except h = 4 when using a 10% significance level. This example provides a useful illustration of the importance of using test statistics with accurate finite-sample sizes in applied research on multiple-period prediction. In particular, it shows that statistics based on the more commonly used kernel-based covariance matrix estimators may be quite misleading.
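The mechanism behind these distortions is the overlap itself: when the one-period errors are white noise, the h-period errors are MA(h − 1) by construction, with population autocorrelation (h − j)/h at lag j < h and zero beyond. A quick simulation check (h = 4 here, chosen to match the shortest horizon in Table 4) confirms this:

```python
import numpy as np

rng = np.random.default_rng(0)
h, T = 4, 200_000
e = rng.standard_normal(T + h - 1)
u = np.convolve(e, np.ones(h), mode="valid")   # overlapping h-period error sums, length T

def acorr(x, j):
    """Sample autocorrelation of x at lag j."""
    x = x - x.mean()
    return (x[j:] @ x[:-j]) / (x @ x)

for j in range(1, h + 2):
    print(j, round(acorr(u, j), 3))
# population values: (h - j)/h for j < h, and 0 for j >= h
```

This strong, known autocorrelation at short lags is precisely what the HAC estimators compared in this paper must absorb, and what the kernel-based ones handle poorly in small samples.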
5. Conclusions

Accurate size control is critical for empirical researchers who wish to argue against a hypothesis on the basis of a test statistic. This paper has shown that most of the test statistics that might typically be chosen to assess multiple-period predictability are oversized in many scenarios that might be considered to be representative of applications in macroeconomics. An alternative statistic has been proposed, and has been shown to have an accurate size in almost all situations considered. A bootstrapped statistic has also been found to have a comparable empirical size in all situations considered, and superior power in cases in which the sample size is small and the prediction horizon very large.

Appendix A. Supplementary data

Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.ijforecast.2014.09.004.

References

Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59, 817–858.
Andrews, D. W. K., & Monahan, J. C. (1992). An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica, 60, 953–966.
Ang, A., & Bekaert, G. (2007). Stock return predictability: is it there? The Review of Financial Studies, 20(3), 651–707.
Ang, A., Bekaert, G., & Wei, M. (2007). Do macro variables, asset markets, or surveys forecast inflation better? Journal of Monetary Economics, 54, 1163–1212.
Atkeson, A., & Ohanian, L. E. (2001). Are Phillips curves useful for forecasting inflation? Federal Reserve Bank of Minneapolis Quarterly Review, 25(1), 2–11.
Britten-Jones, M., Neuberger, A., & Nolte, I. (2011). Improved inference in regression with overlapping observations. Journal of Business Finance and Accounting, 38(5–6), 657–683.
Cecchetti, S. G., Chu, R. S., & Steindel, C. (2000). The unreliability of inflation indicators. Current Issues in Economics and Finance, 6(4), 1–6.
Clark, T. E., & McCracken, M. W. (2006). The predictive content of the output gap for inflation: resolving in-sample and out-of-sample evidence. Journal of Money, Credit and Banking, 1127–1148.
Davidson, J. (2000). Econometric theory. Wiley-Blackwell.
Dufour, J.-M., Pelletier, D., & Renault, E. (2006). Short run and long run causality in time series: inference. Journal of Econometrics, 132, 337–362.
Fisher, J. D. M., Liu, C. T., & Zhou, R. (2002). When can we forecast inflation? Economic Perspectives, Federal Reserve Bank of Chicago, First Quarter, 30–42.
Gonçalves, S., & Kilian, L. (2004). Bootstrapping autoregressions with conditional heteroskedasticity of unknown form. Journal of Econometrics, 123(1), 89–120.
Hansen, L., & Hodrick, R. J. (1980). Forward exchange rates as optimal predictors of future spot rates: an econometric analysis. Journal of Political Economy, 88, 829–853.
Hodrick, R. J. (1992). Dividend yields and expected stock returns: alternative procedures for inference and measurement. The Review of Financial Studies, 5(3), 357–386.
Kilian, L. (1999). Exchange rates and monetary fundamentals: what do we learn from long-horizon regressions? Journal of Applied Econometrics, 14(5), 491–510.
Knaus, J. (2013). Snowfall: easier cluster computing (based on snow). R package version 1.84-4. URL: http://CRAN.R-project.org/package=snowfall.
Lütkepohl, H., & Burda, M. (1997). Modified Wald tests under nonregular conditions. Journal of Econometrics, 78, 315–332.
Mammen, E. (1993). Bootstrap and wild bootstrap for high dimensional linear models. The Annals of Statistics, 21(1), 255–285.
Nelson, C. R., & Kim, M. J. (1993). Predictable stock returns: the role of small sample bias. The Journal of Finance, 48(2), 641–661.
Newey, W. K., & West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55, 703–708.
Newey, W. K., & West, K. D. (1994). Automatic lag selection in covariance matrix estimation. Review of Economic Studies, 61, 631–653.
Pesaran, M. H., Pick, A., & Timmermann, A. (2011). Variable selection, estimation and inference for multi-period forecasting problems. Journal of Econometrics, 164(1), 173–187.
R Core Team (2014). R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Richardson, M., & Smith, T. (1991). Tests of financial models in the presence of overlapping observations. Review of Financial Studies, 4, 227–254.
Smith, J., & Yadav, S. (1996). A comparison of alternative covariance matrices for models with overlapping observations. Journal of International Money and Finance, 15(5), 813–823.
Stock, J., & Watson, M. (1999). Forecasting inflation. Journal of Monetary Economics, 44, 293–335.
Stock, J. H., & Watson, M. W. (2003). Has the business cycle changed and why? In NBER macroeconomics annual 2002, Vol. 17 (pp. 159–230). MIT Press.
Stock, J., & Watson, M. (2008). Phillips curve inflation forecasts. NBER Working Paper 14322.
Stockton, D. J., & Glassman, J. E. (1987). An evaluation of the forecast performance of alternative models of inflation. The Review of Economics and Statistics, 69, 108–117.
Wei, M., & Wright, J. (2009). Confidence intervals for long-horizon predictive regressions via reverse regressions. Technical report, Board of Governors of the Federal Reserve System.
West, K. D. (1997). Another heteroskedasticity and autocorrelation-consistent covariance matrix estimator. Journal of Econometrics, 76, 171–191.
Zeileis, A. (2004). Econometric computing with HC and HAC covariance matrix estimators. Journal of Statistical Software, 11(10), 1–17.