International Journal of Forecasting 35 (2019) 1240–1249
Do forecasters target first or later releases of national accounts data?

Michael P. Clements
ICMA Centre, Henley Business School, University of Reading, Reading RG6 6BA, United Kingdom
E-mail address: [email protected]

Keywords: Survey expectations; Data vintages; Advance estimates; Revised data; Growth forecasts

Abstract: We consider whether it is possible to determine if macro forecasters are attempting to forecast first estimates of data, or revised estimates. Our approach requires that data revisions are predictable prior to the first estimate being released. There is some evidence that this condition is met for some series, and that some forecasters put some weight on later estimates for consumers' expenditure and the GDP deflator.

Crown Copyright © 2019 Published by Elsevier B.V. on behalf of International Institute of Forecasters. All rights reserved.
1. Introduction
Most macro data are subject to revision over time, with different vintage estimates (or data maturities) being published as more complete information becomes available. Which vintage of data should be taken as the 'actual values' when calculating forecast errors in forecast evaluation exercises? Some studies in the literature use first-release actuals, while others use actuals that are available after a relatively small number of monthly revisions, or the vintage that is available immediately prior to a benchmark revision, or even (what might be described as) fully-revised data.[1] This paper suggests a simple method that can in principle be used to determine the target, although there is a stringent requirement that the future revision be predictable before the first estimate of the data value is released. We show that, for two of the five macro variables that we analyze, consumers' expenditure and the GDP deflator, there is evidence that some survey respondents look beyond the first or advance estimate and put some weight on later estimates.

In addition to the empirical evidence we present, one contribution of this paper is to bring to the fore the reasons why it is difficult to determine the vintage of data that is being targeted. The key requirement is the need to predict future revisions to the data before the first of the two estimates is available. The literature on data revisions typically considers whether they can be characterized by news or noise, as is explained below. They are noise if the revision can be predicted from a knowledge of the first estimate, and a number of papers have suggested that revisions to some series do constitute noise (see, e.g., Aruoba (2008)). Our requirement is that the revision be predictable prior to the first estimate being observed; that is, on an information set that does not include the first estimate but may include any other variable that is known at the time when the forecast is made.[2] Given that the revisions to some series are noise, it is an empirical question as to whether revisions are also predictable on an information set that excludes the first estimate. We find evidence that revisions to consumers' expenditure and residential investment are predictable at the 5% significance level, and that revisions to the GDP deflator are predictable at the 10% level.

[1] Revisions to some macro variables can be substantial. Croushore (2011, p. 249, Figure 9.1) illustrates the magnitudes of the revisions that have occurred to the growth rate of real residential investment by plotting the initial release of the 1976:Q3 growth rate and all subsequent estimates (up to those made in 2009). The estimates of the annual rate rose from less than 3% to nearly 16%, before finishing at around −5%!

[2] This is a more stringent requirement than that revisions are noise, because the first estimate is often found to predict the revision (this is the definition of a series that is subject to 'noise' revisions), and we exclude this variable. At the same time, we also allow a more general information set, in that any variable that is known at the forecast origin is a legitimate predictor of the revision.
One might wonder why statistical agencies (such as the US Bureau of Economic Analysis) would produce first estimates that are then subject to predictable revisions. Croushore (2011, pp. 260–261) argues that statistical agencies may release noisy estimates (rather than estimates which add 'news') if they choose to refrain from using judgment to adjust sample information subjectively. We do not pursue this line of inquiry here. The finding that revisions are predictable makes it possible to ask which vintage is being targeted.

Why does it matter what vintage is being targeted? If one were able to determine reliably the vintage of data being targeted, at the level of the individual forecaster, there would be the potential for an improved understanding of various aspects of the expectations process, and in particular, for the study of why forecasters disagree. This is a much-studied question,[3] with recent explanations stressing the role of informational rigidities (IR). Under full-information rational expectations (FIRE), in which all agents know the true structure of the economy and have access to the same information set, agents have identical expectations. IR have been used to explain the empirical finding of disagreement without jettisoning the basic notion that forecasters form their expectations rationally, subject to the information constraints that they face. The two leading contenders are sticky information[4] and noisy information.[5] However, if different individual forecasters do actually target different vintages, then a proportion of the observed disagreement may not reflect true disagreement at all, but may arise simply because different forecasters are forecasting different things, viz. the data vintage of the macro variable. As was stressed by Manski (2017), research by Patton and Timmermann (2007), Engelberg, Manski, and Williams (2009) and Clements (2009, 2010), inter alia, indicates that disagreement would arise from the heterogeneity of response practices even if all individuals shared the same underlying probability distributions regarding future outcomes. Target heterogeneity would generate an additional source of apparent disagreement. Our findings suggest that heterogeneity regarding targets is unlikely to be a major part of the story, at least for real GDP growth: we find little evidence against the null that the first estimate of GDP growth is being targeted. More generally, of course, forecast-accuracy comparisons among individual forecasters, and forecast efficiency assessments (e.g. using the approach popularized by Mincer & Zarnowitz, 1969), may depend on the vintage of actual values that is chosen.

In addition to facilitating research on expectations formation, survey forecasts have been shown to provide more accurate 'nowcasts' and short-horizon forecasts than purely model-based forecasts (see e.g. Ang, Bekaert, & Wei, 2007; Clements, 2015), and are increasingly being used as a complement to other approaches (e.g. Wright, 2013).

[3] See for example Zarnowitz and Lambros (1987), Bomberger (1996), Rich and Butler (1998), Capistrán and Timmermann (2009), Lahiri and Sheng (2008), Rich and Tracy (2010), and Patton and Timmermann (2010), inter alia.
[4] See inter alia Mankiw and Reis (2002), Mankiw, Reis, and Wolfers (2003), and Coibion and Gorodnichenko (2012, 2015).
[5] See Woodford (2002), Sims (2003), and Coibion and Gorodnichenko (2012, 2015), inter alia.
The plan of the remainder of the paper is as follows. Section 2 presents a simple statistical framework for illustrating the prediction problem when data are subject to revision, and shows that, in principle, a simple regression can be used to determine the vintage of data being forecast. Section 3 presents the empirical analysis, and Section 4 contains some concluding remarks.

2. Statistical framework and test

We set out a simple statistical framework for addressing the question of the circumstances under which we can determine the forecaster's target. Suppose that there are just two vintages. The first-vintage estimate of $y_t$ is denoted $y_t^{t+1}$, to reflect an assumed one-period delay in the release of the data, where the superscript indicates the $t+1$ data vintage and the subscript the reference period $t$. $y_t^{t+1}$ is assumed to be a noisy observation of the true value of $y_t$, denoted by $y_t^{t+2}$, that is:

$$y_t^{t+1} = y_t^{t+2} + \eta_t, \quad (1)$$

where $\eta_t \perp y_t^{t+2}$,[6] and $y_t^{t+2}$ follows an AR(1), say:

$$y_t^{t+2} = \rho\, y_{t-1}^{t+1} + v_t. \quad (2)$$

Here, the revision to the estimate of $y_t$, $\eta_t = y_t^{t+1} - y_t^{t+2}$, is assumed to be uncorrelated with the true value $y_t^{t+2}$. Finally, we assume that data revisions are serially correlated (as per Howrey, 1978, for example):

$$\eta_t = h\, \eta_{t-1} + w_t, \quad (3)$$

where $w_t$ is an innovation error term. Because of the one-period delay in the release of the data, the public information set at time $t$ will consist of $\Omega_t = \{y_{t-1}^t, y_{t-2}^t, \ldots;\; y_{t-2}^{t-1}, y_{t-3}^{t-1}, \ldots\}$. That is, the vintage-$t$ values of the variable through observation $t-1$; the vintage-$(t-1)$ values through $t-2$; and so on. Statistical frameworks that allow for multiple vintage estimates, and for more general models of the true process (here $y_t^{t+2}$), are provided by Jacobs and van Norden (2011) and Kishor and Koenig (2012), inter alia, although Eqs. (1)–(3) suffice for our purposes.

Consider optimal forecasts of the first and second estimates of reference quarter $y_t$ given information $\Omega_t$. From Eq. (1) we have:

$$E\left(y_t^{t+1} \mid \Omega_t\right) = E\left(y_t^{t+2} \mid \Omega_t\right) + E\left(\eta_t \mid \Omega_t\right).$$

It follows immediately that unless revisions are predictable (i.e., unless $h \neq 0$ in our framework), $E(\eta_t \mid \Omega_t) = E(w_t \mid \Omega_t) = 0$, and the prediction of the revision is zero. If the revision is unpredictable, then the forecasts of the different vintage estimates are identical, as shown, and there is no way of knowing, from the data evidence alone, whether the forecaster's intention is to forecast the first or second estimate.

[6] The assumption that the revision is uncorrelated with the true value is a feature of a noise revision: see e.g., Mankiw and Shapiro (1986). The implication is that the later-vintage estimate is predictable from the earlier estimate.
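To make the framework concrete, the following minimal sketch simulates the two-vintage process of Eqs. (1)–(3); the parameter values are illustrative assumptions rather than estimates from the paper, and the variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 500                    # number of reference quarters (illustrative)
rho, h = 0.5, 0.4          # AR(1) parameter and revision persistence (assumed values)
sigma_v, sigma_w = 1.0, 0.5

truth = np.zeros(T)        # y_t^{t+2}, the 'true' value, Eq. (2)
eta = np.zeros(T)          # the measurement error eta_t, Eq. (3)
for t in range(1, T):
    truth[t] = rho * truth[t - 1] + sigma_v * rng.standard_normal()
    eta[t] = h * eta[t - 1] + sigma_w * rng.standard_normal()

first = truth + eta        # first estimate y_t^{t+1} = y_t^{t+2} + eta_t, Eq. (1)
revision = truth - first   # y_t^{t+2} - y_t^{t+1} = -eta_t

# With h != 0 the revision is serially correlated, so it is predictable from
# revisions already observed at the forecast origin, without using y_t^{t+1}.
print(np.corrcoef(revision[2:], revision[:-2])[0, 1])   # roughly h**2
```

Setting h = 0 in the sketch makes the printed correlation collapse towards zero, which is exactly the case in which the two targets cannot be distinguished.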
To make progress, suppose that revisions exhibit some degree of predictability, so that $h \neq 0$. Given a reported forecast of $y_t$, denoted $f_t$, we wish to determine whether $f_t$ is a forecast of $y_t^{t+2}$ or of $y_t^{t+1}$. The intuition underlying our proposed strategy is simple. Suppose that we calculate the forecast error using $y_t^{t+1}$ as the actual value, i.e., assuming that the forecaster is targeting the first estimate. Assuming forecast efficiency in the sense of Mincer and Zarnowitz (1969),[7] the forecast error should not be related to any information that was known at the time when the forecast was made, including the expected data revision, if the assumption that the forecaster is targeting the first estimate is correct. If the assumption is incorrect and the forecaster was targeting a second estimate, say, then the expected data revision and the forecast error would be correlated. To see why this is the case, suppose that the data are expected to be revised up; that is, $E(y_t^{t+2} - y_t^{t+1} \mid \Omega_t) > 0$. If the forecaster is targeting $y_t^{t+2}$, then the forecast error using $y_t^{t+1}$ as the target will tend to be negative. In a regression context, we would expect $\beta = 0$ in Eq. (4) when the forecaster is targeting the first estimate (as is assumed by the construction of the dependent variable). Of course, the future revision is an endogenous explanatory variable, and Eq. (4) will need to be estimated by TSLS, with the explanatory variable replaced by the predicted value from the first-stage regression on suitable instruments:

$$y_t^{t+1} - f_t = \alpha + \beta\left(y_t^{t+2} - y_t^{t+1}\right) + \varepsilon_t. \quad (4)$$

[7] If the forecast error is correlated with anything that was known at the time when the forecast was made (including the forecast itself), the forecast is inefficient.
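To spell out the step behind this intuition (a sketch consistent with the Appendix, assuming Mincer–Zarnowitz efficiency), suppose the forecaster efficiently targets the second estimate, so that $f_t = E(y_t^{t+2} \mid \Omega_t)$. The error measured against the first estimate then decomposes as:

$$y_t^{t+1} - f_t = \left[y_t^{t+1} - E\left(y_t^{t+1} \mid \Omega_t\right)\right] - E\left(y_t^{t+2} - y_t^{t+1} \mid \Omega_t\right),$$

where the first term is unpredictable given $\Omega_t$. The measured error therefore moves one-for-one (with a negative sign) with the expected revision, so instrumenting the revision with elements of $\Omega_t$ drives the slope towards $\beta = -1$; under first-estimate targeting the second term is absent and $\beta = 0$.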
This explains the requirement that future revisions should be predictable in advance. Given the statistical setup in Eqs. (1)–(3), obvious instruments are $Z_t = \{y_{t-1}^t, y_{t-2}^t, y_{t-2}^{t-1}\}$, for example, which includes the revision for $y_{t-2}$, i.e., $y_{t-2}^t - y_{t-2}^{t-1}$. This follows from the assumption that revisions are serially correlated as per Howrey (1978), and as in the 'spillovers' of Jacobs and van Norden (2011). In practice, any variables which predict the future revision to observation quarter $t$, at time $t$, could be used. Our simple statistical framework suggests instruments and helps to fix ideas, but revision processes for US national accounts variables are more complicated in practice than we allow, and suggest that casting the net more widely may provide additional useful instruments.[8]

[8] Clements and Galvão (2018) provide a recent review of the forecasting of data subject to revision.

The Appendix details the derivation of the population values of $\beta$ in Eq. (4) when the forecaster is targeting either the first or the second release, where, as in Eq. (4), the forecast error is constructed using the first-estimate actual value, $y_t^{t+1}$. We show that the TSLS estimate of $\beta$ is $\beta = 0$ when the forecaster targets the first estimate, but that $\beta = -1$ when the second estimate is targeted. Note also that we have framed the problem in terms of the first estimate being targeted: the left-hand side of Eq. (4) is the forecast error using the first estimate, and we then consider whether there is evidence that weight is attached to a later release. However, this 'normalization' of the forecast error does not mean that $y_t^{t+1}$ and $y_t^{t+2}$ are being treated differently. The estimation of Eq. (4) using $Z_t$ as instruments is equivalent to estimation based on the moments:

$$E\left[(1-\gamma)\,y_t^{t+1} + \gamma\,y_t^{t+2} - f_t - \alpha \mid Z_t\right] = 0, \quad (5)$$
where $\gamma = -\beta$. It is evident from Eq. (5) that we are estimating the weights $(1-\gamma):\gamma$ on the first and second estimates, such that $\beta = 0$ in Eq. (4) implies no weight on the later estimate, and $\beta = -1$ ($\gamma = 1$) implies that only the later estimate is targeted.

2.1. Efficient forecasts

Two conditions must hold in order for $\beta = 0$ in Eq. (4) when the forecaster targets the first release, $y_t^{t+1}$. Firstly, the agent's information set that is used to generate $f_t$ must be at least as large as the information set used by the econometrician in the first-stage regression to predict the right-hand side, $y_t^{t+2} - y_t^{t+1}$. It need not be any larger (we do not require the agent to have additional information), but the possibility is not ruled out. Secondly, we require forecast efficiency in the Mincer–Zarnowitz sense. This ensures that the forecast error $y_t^{t+1} - f_t$ in Eq. (4) is uncorrelated with the instruments, $Z_t$, which is the so-called 'exclusion restriction' that is required for instrument validity. The Appendix illustrates this for the data generation process given by Eqs. (1) to (3).

2.2. Predictability of revisions

We stress that the requirement that $y_t^{t+2} - y_t^{t+1}$ can be predicted from $Z_t$ can be related to the literature on testing whether data revisions are news or noise, but does not correspond directly to either classification. Mankiw and Shapiro's (1986) standard characterization of revisions as news or noise suggests that revisions are predictable if they are noise. Data revisions are news when they add new information, and noise when they reduce the measurement error. If data revisions are noise, they are unrelated to the later estimate, so that $\gamma_{no} = 0$ in:

$$y_t^{t+2} - y_t^{t+1} = \alpha + \gamma_{no}\, y_t^{t+2} + \omega_t, \quad (6)$$

but are predictable from $y_t^{t+1}$. For news revisions, $\gamma_{ne} = 0$ in:

$$y_t^{t+2} - y_t^{t+1} = \alpha + \gamma_{ne}\, y_t^{t+1} + \omega_t; \quad (7)$$

that is, revisions are unpredictable given the first estimate. Revisions being noise is not sufficient to ensure predictability in the first-stage regression, because the first-stage regression considers an information set that excludes $y_t^{t+1}$. Hence, our key requirement is that the revision be predictable from an information set that does not include the first estimate. However, data revisions which are news based on Eqs. (6) and (7), i.e., with $\gamma_{no} \neq 0$ and $\gamma_{ne} = 0$, may still be predictable on an extended information set which excludes $y_t^{t+1}$ but includes other variables that are known at time $t$, the forecast origin.

The empirical findings in the literature regarding news/noise are somewhat mixed. Mankiw and Shapiro (1986) and Faust, Rogers, and Wright (2005) provide empirical evidence that data revisions to US real GDP are largely news, while Aruoba (2008) and Corradi, Fernandez, and Swanson (2009) provide extensions and more nuanced findings. Clements and Galvão (2017) consider the use of more general information sets than the variable being forecast, but always assume that the earlier estimate is in the information set.
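As an illustration of the news/noise regressions in Eqs. (6) and (7), the sketch below runs both on simulated pure-noise revisions (simulated data stand in for the RTDSM vintages, and the helper function is our own):

```python
import numpy as np

def ols_slope_tstat(x, y):
    """OLS of y on a constant and x; returns the slope and its t-statistic."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return b[1], b[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
later = rng.standard_normal(300)                  # stands in for y_t^{t+2}
first = later + 0.5 * rng.standard_normal(300)    # y_t^{t+1} = truth + noise
rev = later - first                               # y_t^{t+2} - y_t^{t+1}

g_no, t_no = ols_slope_tstat(later, rev)   # Eq. (6): revision on the later estimate
g_ne, t_ne = ols_slope_tstat(first, rev)   # Eq. (7): revision on the first estimate
print(f"gamma_no = {g_no:.2f} (t = {t_no:.1f}), gamma_ne = {g_ne:.2f} (t = {t_ne:.1f})")
# Pure-noise revisions: gamma_no is close to zero, while gamma_ne is significantly
# negative, i.e. the revision is predictable from the first estimate, as in the text.
```

The first-stage regression used in this paper differs precisely in that it excludes the first estimate from the right-hand side.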
2.3. Robust inference

Given that we use TSLS for our testing procedure, we need to take into account the possibility that our instruments may be weak. As was reported in the introduction, we find predictability of data revisions for three of our variables at conventional significance levels. However, the importance of strong instruments for reliable TSLS inference is evident from the recent literature on weak instruments, surveyed by Stock, Wright, and Yogo (2002). Their rule of thumb suggests a first-stage F statistic in excess of 10 for tolerable TSLS inference (see Stock et al., 2002, Table 1, p. 522, and their discussion of the meaning of 'tolerable'). Anticipating our empirical findings, in the event that the first-stage regressions' F statistics suggest that TSLS will be unreliable, we will instead report results using two of the partially-robust (to weak instruments) estimators discussed by Stock et al. (2002): limited-information maximum likelihood (LIML) and the Fuller-k estimator (Fuller, 1977), both of which can be seen as k-class estimators (see e.g. Davidson & MacKinnon, 1993, ch. 18). Letting $\gamma = (\alpha, \beta)'$, the general k-class estimator of $\gamma$ in Eq. (4) is:

$$\hat{\gamma}(k) = \left[X'(I - kM_Z)X\right]^{-1}\left[X'(I - kM_Z)y\right],$$

where $X$ is the $T \times 2$ matrix of the right-hand-side variables (the first column is a vector of ones and the second is the endogenous variable, denoted $Y_1$, with typical element $y_t^{t+2} - y_t^{t+1}$), and $M_Z$ is the annihilation matrix formed from the instruments (plus the constant), i.e., $M_Z = I - P_Z$, where $P_Z$ projects onto the space spanned by the instruments. $y$ is a vector with typical element $y_t^{t+1} - f_t$. $k = 0$ delivers OLS, and $k = 1$ TSLS. For LIML, $k$ is given by the smallest root of the determinantal equation $\left|Y'M_1Y - kY'M_ZY\right| = 0$, where $Y = (y \;\; Y_1)$, and $M_1$ is the annihilation matrix formed from the columns of the non-endogenous right-hand-side variables, here just a constant. When there is a single instrument, $k = 1$, so that LIML is the same as TSLS under just-identification. When the model is over-identified, $k > 1$. The Fuller-k estimator sets $k = k_{LIML} - b/(T - K)$, where $K$ is the number of instruments. We implement this with $b = 1$. Stock et al. (2002) discuss the merits of these and other estimators when the instruments are weak. The covariance matrix of the parameter estimates is given by $\hat{\sigma}^2\left[X'(I - kM_Z)X\right]^{-1}$, where $\hat{\sigma}^2$ is the estimator of $\sigma^2$, the error variance in Eq. (4).

Our attention in our empirical work will be confined to $Z_t$ as given above. This leaves open the possibility that better instruments may exist for the revisions to some of the variables we consider, so that TSLS estimation of Eq. (4) might give rise to reliable inference for $\beta$.
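The k-class formulas above translate directly into code. The following is a minimal sketch of our own (it is not the paper's Gauss implementation); `y` holds the forecast errors $y_t^{t+1} - f_t$, `endog` the revisions $y_t^{t+2} - y_t^{t+1}$, and `Z` the instruments $Z_t$:

```python
import numpy as np

def annihilator(W):
    """M = I - W(W'W)^{-1}W', the residual-maker for the columns of W."""
    n = W.shape[0]
    return np.eye(n) - W @ np.linalg.solve(W.T @ W, W.T)

def k_class(y, endog, Z, k):
    """k-class estimate of (alpha, beta) in Eq. (4): k=0 gives OLS, k=1 TSLS."""
    n = len(y)
    ones = np.ones((n, 1))
    X = np.column_stack([ones, endog])             # constant plus the endogenous regressor
    MZ = annihilator(np.column_stack([ones, Z]))   # instruments plus the constant
    A = X.T @ (np.eye(n) - k * MZ) @ X
    gamma_hat = np.linalg.solve(A, X.T @ (np.eye(n) - k * MZ) @ y)
    resid = y - X @ gamma_hat
    sigma2 = resid @ resid / (n - X.shape[1])
    return gamma_hat, sigma2 * np.linalg.inv(A)    # estimates and their covariance matrix

def k_liml(y, endog, Z):
    """Smallest root of |Y'M1 Y - k Y'MZ Y| = 0, with Y = (y, Y1)."""
    n = len(y)
    ones = np.ones((n, 1))
    Y = np.column_stack([y, endog])
    A = Y.T @ annihilator(ones) @ Y                       # M1 annihilates the constant only
    B = Y.T @ annihilator(np.column_stack([ones, Z])) @ Y
    return float(np.min(np.real(np.linalg.eigvals(np.linalg.solve(B, A)))))

def k_fuller(y, endog, Z, b=1.0):
    """Fuller-k with b = 1, as in the text: k = k_LIML - b/(T - K)."""
    return k_liml(y, endog, Z) - b / (len(y) - Z.shape[1])
```

A t-test of $\beta = 0$ then uses the second element of `gamma_hat` and the corresponding diagonal element of the covariance matrix, with `k` set to `k_liml(...)` or `k_fuller(...)`.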
3. Empirical findings

We use the US Survey of Professional Forecasters (SPF) as our source of expectations, as it is the oldest quarterly survey of macroeconomic forecasts in the US.[9] The SPF has been used extensively in research; for details, see the academic bibliography maintained at https://www.philadelphiafed.org/research-and-data/real-time-center/survey-of-professional-forecasters/academic-bibliography. We consider the forecasts of five variables: real GDP growth (RGDP), real personal consumption expenditures (RCONSUM), real non-residential fixed investment (RNRESIN), real residential fixed investment (RRESIN), and GDP deflator inflation (PGDP).

We focus on the current-quarter forecasts, e.g., the forecast made in 2010:Q1 (around the middle of the middle month of the quarter, 15th February) of the quarterly growth rate in the first quarter over the fourth quarter of 2009, and so on. The SPF also collects forecasts of the next quarter and of the quarters up to one year ahead, as well as forecasts of the current and next year's values of the variables. We chose the current-quarter forecasts because the survey forecasters are naturally able to forecast better at the shortest horizon (see e.g. Clements, 2015), and we suspect that they are also able to forecast the revised values better at this horizon than at longer horizons, should they choose to do so. Of course, the approach could be adapted to consider longer-horizon forecasts if desired. We use the forecasts made in response to the 129 surveys from 1981:Q3 to 2013:Q3, inclusive. The number of respondents has varied over time, with only nine respondents in 1990:Q2, a maximum in excess of 50 (the precise number depends on the variable), and a median of 33.

In conjunction with this dataset, we also use real-time vintage data from the Real-Time Data Set for Macroeconomists (RTDSM), which is likewise maintained by the Federal Reserve Bank of Philadelphia (see Croushore & Stark, 2001). The definitions of the variables in the RTDSM match exactly those of the variables being targeted in the SPF. The RTDSM gives quarterly vintage estimates of the quarterly-frequency national accounts variables, and this is reflected in our notation in Section 2, where, for example, $y_t^{t+1}$ is the value of $y$ in reference quarter $t$ for the data vintage $t+1$. However, monthly-vintage estimates of the data are also produced. The first estimate of $y_t$ is made approximately one month after quarter $t$. This is usually known as the 'advance' estimate. It is the first monthly estimate, and corresponds to the first quarterly estimate of the RTDSM. The second and third monthly estimates are produced at the ends of the following two months. The third monthly estimate pre-dates the publication of the second quarterly estimate ($y_t^{t+2}$). In addition to considering the revision between the second quarterly estimate and the advance, we also consider the (shorter) revision between the third monthly estimate and the advance, because it is possible that the survey respondents are targeting the third monthly estimate.

[9] The SPF began as the NBER-ASA survey in 1968:4 and runs through to the present day. Since June 1990, it has been run by the Philadelphia Fed under the name 'the Survey of Professional Forecasters' (SPF): see Zarnowitz (1969) and Croushore (1993).
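To fix ideas about how the quarterly estimates, revisions, and instruments are read off a real-time data set, the following sketch assumes an RTDSM-style table with one row per reference quarter and one column per successive vintage (the layout, names, and toy numbers are our own assumptions, not the RTDSM file format), and computes a first-stage F statistic of the kind reported in Section 3.1:

```python
import numpy as np
import pandas as pd

def estimate_series(vintages, lag):
    """The lag-th quarterly estimate of each reference quarter: entry (t, t + lag)
    of a table whose rows are reference quarters and whose columns are vintages."""
    vals = [vintages.iloc[t, t + lag] if t + lag < vintages.shape[1] else np.nan
            for t in range(len(vintages))]
    return pd.Series(vals, index=vintages.index)

def first_stage_F(rev, instruments):
    """F statistic for the joint significance of the instruments in a regression
    of the future revision on a constant and the instruments."""
    data = pd.concat([rev] + list(instruments), axis=1).dropna()
    y = data.iloc[:, 0].to_numpy()
    X = np.column_stack([np.ones(len(data)), data.iloc[:, 1:].to_numpy()])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss1 = np.sum((y - X @ b) ** 2)
    rss0 = np.sum((y - y.mean()) ** 2)
    q = X.shape[1] - 1
    return ((rss0 - rss1) / q) / (rss1 / (len(y) - X.shape[1]))

# A toy vintage table: the advance estimate is the truth plus noise, and all
# later vintages simply equal the 'truth' (pure-noise revisions).
n = 40
rng = np.random.default_rng(2)
truth = rng.standard_normal(n)
mat = np.full((n, n + 2), np.nan)
for t in range(n):
    mat[t, t + 1] = truth[t] + 0.3 * rng.standard_normal()  # advance, y_t^{t+1}
    mat[t, t + 2:] = truth[t]                               # later vintages
vintages = pd.DataFrame(mat)

first = estimate_series(vintages, 1)     # y_t^{t+1}
second = estimate_series(vintages, 2)    # y_t^{t+2}
rev = second - first                     # the future revision
Z = [first.shift(1), second.shift(2), first.shift(2)]  # {y_{t-1}^t, y_{t-2}^t, y_{t-2}^{t-1}}
print(first_stage_F(rev, Z))
```

In this toy example the revision is pure noise that is unrelated to anything dated earlier, so the F statistic is small; the question asked in Section 3.1 is whether statistics of this kind, computed for the actual RTDSM revisions, come anywhere near the Stock et al. (2002) rule-of-thumb value of 10.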
The notation could be adapted to accommodate monthly vintages by writing the revision between the second quarterly estimate and the advance as, say, $y_t^{t+1+1/3} - y_t^{t+1/3}$, where $y_t^{t+1+1/3}$ is the estimate at the end of the first month of quarter $t+2$ (i.e., the second quarterly estimate) and $y_t^{t+1/3}$ is the advance (available at the end of the first month of quarter $t+1$). Then, the instruments would be $y_{t-1}^{t-1+1/3}$, $y_{t-2}^{t-1+1/3}$ and $y_{t-2}^{t-2+1/3}$, and we could write the revision between the third monthly estimate and the advance as $y_t^{t+1} - y_t^{t+1/3}$, where $y_t^{t+1}$ is the end-of-quarter $t+1$ estimate (i.e., the third monthly estimate) and $y_t^{t+1/3}$ is the advance. In that case, the instruments would be $y_{t-1}^{t-1+1/3}$, $y_{t-2}^{t-1}$ and $y_{t-2}^{t-2+1/3}$. We dispense with this formality and use the quarterly vintage notation of Section 2, although it should be remembered that this needs to be adjusted when we consider monthly estimates. The SPF and RTDSM mnemonics are recorded in Table 1. Finally, all of the growth rates reported in this paper are calculated as annualized percentage quarter-on-quarter growth rates.

Table 1
Descriptions of the macroeconomic variables.

Variable | SPF code | RTDSM
Real GDP (GNP) | RGDP | ROUTPUT
Real personal consumption | RCONSUM | RCON
Real nonresidential fixed investment | RNRESIN | RINVBF
Real residential fixed investment | RRESINV | RINVRESID
GDP price index (implicit deflator, GNP/GDP deflator) | PGDP | P

Notes: The SPF data are from the Philadelphia Fed, http://www.phil.frb.org/econ/spf/. The RTDSM data are from http://www.philadelphiafed.org/research-and-data/real-time-center/real-time-data/.

3.1. Properties of data revisions

We begin with a consideration of the predictability of future revisions, given the need to carry out the first-stage regression in the estimation of Eq. (4). The results are shown in Table 2. Our primary focus is on the early revisions, either between the advance and the third monthly estimate or between the advance and the second quarterly estimate. The first quarterly estimate coincides with the first monthly; these are the advance estimates, released roughly one month after the reference quarter. The second quarterly estimate is the vintage that is available at the middle of the quarter two quarters after the reference quarter,[10] whereas the third monthly is released at the end of the third month after the reference quarter. To illustrate the timing, consider a forecaster who is responding to the 2010:Q1 survey. When comparing the advance and second quarterly estimates, the question that we ask is whether the forecast is of the advance value of y in 2010:Q1 that will be released at the end of April, or the value to be released by the middle of August. Alternatively, when we compare the advance and third monthly estimate, is it the value released at the end of April or that released at the end of June?

[10] See e.g. Landefeld, Seskin, and Fraumeni (2008) and Fixler, Greenaway-McGrevy, and Grimm (2014) for a discussion of the revisions to US national accounts data.

When we consider the predictability of the future revision for the first and second quarterly estimates, the first-stage regression is of $y_t^{t+2} - y_t^{t+1}$ on a constant and $Z_t = \{y_{t-1}^t, y_{t-2}^t, y_{t-2}^{t-1}\}$ (suitably adapted for monthly estimates, as described above). According to the advice of Stock et al. (2002), the F statistics presented in Table 2 are too small to ensure that the subsequent second stage of TSLS will yield reliable inference for any of the five variables, whether we look at advance to third-monthly revisions or advance to second-quarterly revisions. For RGDP, the F statistic for the shorter revision is smaller than that for the second quarterly estimate revision, whereas the situation for PGDP is reversed.

Table 2
First-stage regression F statistics: the predictability of data revisions.

Variable | 3rd monthly–1st | 2nd–1st | 5th–1st | 15th–1st
RGDP | 0.44 (0.72) | 1.74 (0.16) | 1.73 (0.16) | 1.26 (0.29)
RCONSUM | 5.10 (0.00) | 6.80 (0.00) | 2.98 (0.03) | 3.35 (0.02)
RNRESIN | 0.69 (0.56) | 1.87 (0.14) | 2.11 (0.10) | 0.98 (0.40)
RRESIN | 2.87 (0.04) | 2.44 (0.07) | 2.94 (0.04) | 0.69 (0.56)
PGDP | 2.28 (0.08) | 0.60 (0.61) | 1.13 (0.34) | 0.37 (0.77)

Notes: The instrument sets for the first and second quarterly estimates are $\{y_{t-1}^t, y_{t-2}^t, y_{t-2}^{t-1}\}$; those for the first and fifth quarterly estimates are $\{y_{t-1}^t, y_{t-5}^t, y_{t-5}^{t-4}\}$; and those for the first and 15th quarterly estimates are $\{y_{t-1}^t, y_{t-15}^t, y_{t-15}^{t-14}\}$. For the revision between the advance and third-monthly estimates, we follow the approach for the first–second quarterly revision but substitute the third monthly estimate for the second quarterly estimate. P-values are given in parentheses.

We also assess whether the longer-term revisions — the revision between the first and fifth, and between the first and fifteenth estimates — are more or less predictable in advance, and these results are also reported in Table 2. For these longer-term revisions, the requirement that the instruments are known at time $t$ necessarily means that the more mature data relate to a more distant reference quarter. For example, the first-stage regressions are of $y_t^{t+5} - y_t^{t+1}$ on $y_{t-1}^t$, $y_{t-5}^t$, $y_{t-5}^{t-4}$, and of $y_t^{t+15} - y_t^{t+1}$ on $y_{t-1}^t$, $y_{t-15}^t$, $y_{t-15}^{t-14}$, respectively. The table shows that the first-stage regression F statistics do not satisfy the 'rule of thumb' of an F statistic greater than 10 for these longer-term revisions either. Note that the F statistics do reject at the less stringent conventional significance levels for RCONSUM and RRESIN, and for the shortest revision for PGDP, suggesting that these revisions are predictable to some extent.

Our empirical work does not formally consider the possibility that the forecaster is targeting the two longer-term estimates, or even the most mature data available (the vintage available at the time when the study is undertaken). These more mature estimates are likely to embody the effects of methodological changes and other revisions that could not have been foreseen at the time. Nevertheless, even the revisions between the advance and the other early estimates can be substantial, as is shown by Table 3.
Table 3
The size and variability of data revisions.

Variable | Statistic | 3rd monthly–1st | 2nd–1st | 5th–1st | 15th–1st
RGDP | Mean | 0.14 | 0.10 | 0.08 | −0.09
RGDP | Std. dev. | 0.72 | 0.78 | 1.11 | 1.67
RGDP | Relative std. dev. | 0.31 | 0.33 | 0.47 | 0.68
RCONSUM | Mean | 0.01 | −0.03 | −0.11 | −0.20
RCONSUM | Std. dev. | 0.53 | 0.60 | 0.84 | 1.36
RCONSUM | Relative std. dev. | 0.23 | 0.26 | 0.36 | 0.57
RNRESIN | Mean | 1.07 | 0.81 | 0.87 | −0.88
RNRESIN | Std. dev. | 2.77 | 3.51 | 4.51 | 4.86
RNRESIN | Relative std. dev. | 0.29 | 0.37 | 0.47 | 0.49
RRESIN | Mean | −0.02 | −0.23 | −0.18 | −0.52
RRESIN | Std. dev. | 3.68 | 4.45 | 5.88 | 8.06
RRESIN | Relative std. dev. | 0.23 | 0.27 | 0.36 | 0.48
PGDP | Mean | 0.08 | 0.10 | 0.09 | 0.22
PGDP | Std. dev. | 0.41 | 0.44 | 0.63 | 0.85
PGDP | Relative std. dev. | 0.26 | 0.28 | 0.41 | 0.55

Notes: For each variable, the first two rows record the mean and standard deviation of the designated revision (third monthly, second quarterly, fifth and 15th quarterly estimates minus the first estimate). The third row is the ratio of the standard deviation of the revision to the standard deviation of the advance estimate of the variable.
Table 3 provides summary statistics for the various revisions and gives their magnitudes relative to the underlying variability of the (advance estimate of the) variable. The standard deviations of the revisions between the two early estimates (third monthly and second quarterly) and the advance estimates are generally between a quarter and a third of the standard deviation of the advance estimates. This ratio exceeds two thirds for the revision of the 15th estimate (relative to the advance) for RGDP. Hence, revisions can be large (as was suggested in footnote 1, referencing Croushore's (2011) illustration of the revisions to real residential investment).

3.2. The median forecasts

As a result of the findings for the first-stage regressions in Table 2, we next report results using the two estimators described in Section 2. Table 4 records the results of estimating Eq. (4) for the median forecasts, using LIML and Fuller-k. We report results for the second quarterly estimates (compared to the advance). There is no significant evidence against the null that $\beta = 0$ for any of the variables.[11] The results for the third monthly estimates were similar, and are not reported.

[11] The closest we came to rejecting is for RCONSUM, where the null of $\beta = 0$ is rejected against a one-sided alternative ($\beta < 0$) at the 11% level for both LIML and Fuller-k. (The reported p-value of 0.89 is the probability of obtaining a larger t-statistic than the calculated value under the null hypothesis, and so corresponds to rejection at the 11% level in a one-sided test.)
Table 4
Median forecasts: inference on β in Eq. (4) for the advance versus the second quarterly estimates.

SPF variable | N | LIML β̂ | LIML p-value | Fuller-k β̂ | Fuller-k p-value
RGDP | 124 | −0.04 | 0.51 | 0.06 | 0.48
RCONSUM | 124 | −0.88 | 0.89 | −0.87 | 0.89
RNRESIN | 124 | −0.12 | 0.55 | −0.15 | 0.57
RRESIN | 124 | 2.38 | 0.26 | 0.93 | 0.32
PGDP | 124 | 5.10 | 0.19 | 2.35 | 0.17

Notes: The table reports the coefficient estimates and p-values for the two estimators. The p-value is the probability of obtaining a larger t-statistic than that calculated under the null hypothesis, so that, for example, values less than 0.05 or greater than 0.95 would signify rejections of the null that β = 0 in a two-sided test at the 10% significance level.
3.3. The individual forecasts

The advantage of using the median forecasts is that we can use all of the forecasts, from the 1981:Q3 survey to the 2013:Q3 survey. However, Engelberg, Manski, and Williams (2011) argue that the changing composition of panels of forecasters suggests the need for care in interpreting aggregate measures (such as the median), and point out that temporal variation in the median (or the mean) might be due solely to changes in the composition; for example, joiners being more optimistic than leavers. Related concerns arise here if forecaster heterogeneity in terms of the vintage being targeted is masked by the use of the aggregate, and further confounded by changes in panel composition.

Fig. 1 records the cross-sectional inter-quartile range of the point forecasts. The inter-quartile range is more robust than the standard deviation,[12] and guards against aberrant forecasts that might reflect transcription errors or (legitimate) extreme views about the future course of the economy. The figure displays a marked level of disagreement across forecasters, given that these are current-quarter forecasts (that is, made roughly around the middle of the quarter being forecast) and that we are looking at the difference between the third and first quartiles. For all variables, disagreement increases around 2008–10, and is relatively high throughout for the investment variables. The introduction to the paper rehearsed possible explanations of why forecasters disagree; this sub-section considers the possibility that forecasters might be targeting different vintages of data.

Given the heterogeneity across forecasters, we estimate Eq. (4) separately for each individual who made 50 forecasts or more over the sample period. Table 5 reports the proportions of the (22 to 25) forecasters for whom we failed to reject the null H0: β = 0 in either a two-sided test (with alternative H1: β ≠ 0) or a one-sided test (against H1: β < 0), both at the 10% level. The individual-level results indicate that we reject the null that β = 0 in the direction of putting some weight on the later estimate for around a third of respondents for RCONSUM, and for a quarter to a half of all respondents for PGDP (depending on whether we consider the second quarterly estimate or the third monthly). Note that there tend to be more rejections against β < 0, which is consistent with some weight being put on the later estimate. For the other three variables, the proportions for which the null is rejected are smaller, corresponding to 1, 2 or 3 individuals.

[12] Half the distance between the 84th and 16th percentiles of the cross-section distribution of forecasts is also used at times, as this 'quasi-standard deviation' is equal to the standard deviation for normally distributed data (see e.g. Giordani & Söderlind, 2003).
Fig. 1. The inter-quartile ranges of the cross-section dispersions of the current-quarter growth forecasts (annualized), 1981–2013.

Table 5
Individual respondents: average results of inference on β in Eq. (4).

Third monthly estimates
Variable | Alternative | # forecasters | LIML | Fuller-k
RGDP | H1: β ≠ 0 | 25 | 1.00 | 1.00
RGDP | H1: β < 0 | 25 | 1.00 | 1.00
RCONSUM | H1: β ≠ 0 | 24 | 0.88 | 0.83
RCONSUM | H1: β < 0 | 24 | 0.67 | 0.63
RNRESIN | H1: β ≠ 0 | 23 | 1.00 | 1.00
RNRESIN | H1: β < 0 | 23 | 0.87 | 0.87
RRESIN | H1: β ≠ 0 | 23 | 0.91 | 0.91
RRESIN | H1: β < 0 | 23 | 0.96 | 0.91
PGDP | H1: β ≠ 0 | 22 | 0.68 | 0.68
PGDP | H1: β < 0 | 22 | 0.45 | 0.45

Second quarterly estimates
Variable | Alternative | # forecasters | LIML | Fuller-k
RGDP | H1: β ≠ 0 | 25 | 0.96 | 0.96
RGDP | H1: β < 0 | 25 | 0.96 | 0.96
RCONSUM | H1: β ≠ 0 | 24 | 0.96 | 0.96
RCONSUM | H1: β < 0 | 24 | 0.67 | 0.71
RNRESIN | H1: β ≠ 0 | 23 | 0.96 | 0.96
RNRESIN | H1: β < 0 | 23 | 0.87 | 0.83
RRESIN | H1: β ≠ 0 | 23 | 1.00 | 1.00
RRESIN | H1: β < 0 | 23 | 0.87 | 0.87
PGDP | H1: β ≠ 0 | 22 | 0.95 | 0.91
PGDP | H1: β < 0 | 22 | 0.73 | 0.64

Notes: The table reports the proportions of forecasters for whom the null H0: β = 0 was not rejected against the specified alternative (either H1: β ≠ 0 or H1: β < 0), at the 10% level.
The small number of rejections for these other variables is broadly consistent with the number that we would expect by chance when the null is true.

We undertook an additional comparison for RCONSUM and PGDP, the two variables for which we found evidence that a third to a half of all individuals looked beyond the advance estimate. For each respondent we calculated the ratio of the MSE obtained using the advance estimates to the MSE obtained using the later estimates. We then calculated the averages of these ratios separately for the set of forecasters who appear to target later releases (that is, for whom we reject the null that β = 0 in Eq. (4) at the 10% level on a one-sided test) and the set of forecasters for whom there is no evidence against the null of targeting the advance estimate (at the 10% level). The results are shown in Table 6. The average ratios of the first set ('reject β = 0') are larger than those for the second ('do not reject β = 0'), with one exception. That is, when the test suggests that an individual pays attention to later estimates, that individual's forecasts are more likely to be relatively more accurate when evaluated against later-release estimates.

Note that the average ratios when we compare advance and third monthly estimates exceed one for both sets, with the exception of RCONSUM, suggesting that the MSEs obtained using later-release actuals are smaller than those obtained using the advance-estimate actuals. This is consistent with the revisions being 'noise', in the sense that the revised data (either the third monthly or second quarterly estimates) remove measurement error relative to the advance estimates. The important point for our purposes is not whether the ratio is larger or smaller than one, but the size of the ratio for the two sets of forecasters (i.e., the set for whom we reject the null that β = 0 in Eq. (4) and the set for whom we do not). The table suggests that, on average (as measured by the median), the accuracy of the first set of forecasters at forecasting PGDP is some eight percentage points better for the second quarterly estimate (compared to the advance) relative to the second set of forecasters (compare the entries 1.096 and 1.017). That is, an explicit targeting of second quarterly estimates, rather than the advance, improves the relative accuracy of the forecasts of the later estimate by around 8%.
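A minimal sketch of this comparison (our own illustration; `errors_advance` and `errors_later` would hold each respondent's forecast errors measured against the advance and the later estimates, and `reject` flags the respondents for whom β = 0 was rejected in the one-sided test):

```python
import numpy as np

def mse_ratio_by_group(errors_advance, errors_later, reject):
    """Mean and median across forecasters of MSE(advance actuals) / MSE(later actuals),
    separately for those flagged as targeting later releases and for the rest."""
    ratios = np.array([np.mean(np.square(a)) / np.mean(np.square(b))
                       for a, b in zip(errors_advance, errors_later)])
    reject = np.asarray(reject, dtype=bool)
    out = {}
    for label, mask in (("reject beta=0", reject), ("do not reject beta=0", ~reject)):
        out[label] = {"mean": float(ratios[mask].mean()),
                      "median": float(np.median(ratios[mask]))}
    return out
```

A ratio above one for a respondent says that that respondent's forecasts are closer to the later estimate than to the advance estimate, which is how Table 6 should be read.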
Table 6
Targeting later releases and forecast accuracy.

Third monthly estimates
Variable | Statistic | Reject β = 0 | Do not reject β = 0
RCONSUM | Mean | 1.015 | 0.982
RCONSUM | Median | 0.995 | 0.997
PGDP | Mean | 1.073 | 1.038
PGDP | Median | 1.068 | 1.020

Second quarterly estimates
Variable | Statistic | Reject β = 0 | Do not reject β = 0
RCONSUM | Mean | 1.065 | 1.021
RCONSUM | Median | 1.059 | 1.009
PGDP | Mean | 1.067 | 1.044
PGDP | Median | 1.096 | 1.017

Notes: The entries are based on the MSE ratios for each individual forecaster, where we calculate the MSE using advance estimates as a ratio to the MSE using the third monthly, or second quarterly, estimates. The table reports the averages for the set of forecasters for whom we reject the null that β = 0 in Eq. (4) at the 10% level on a one-sided test, and for the set of forecasters for whom there is no evidence against the null of targeting the advance estimate (at the 10% level).
4. Conclusions

Questions concerning the appropriate data vintage of actual values to use in analyses of forecast performance are somewhat vexing. Recent papers have typically used an 'early estimate', such as the advance estimate, the third monthly estimate, or the second quarterly estimate, although other choices have been made, including the vintage available at the time when the study is undertaken. However, the revisions even between two sets of early estimates, such as the advance and the third monthly estimate, can be substantial. If these revisions were unforecastable, then the survey respondent's forecast would be the same regardless of which vintage the forecaster was putatively targeting, and the question that we ask in this paper would be unanswerable. However, there is some evidence that revisions are predictable using public information, most notably for consumption expenditure, although the instruments are weak and may result in unreliable inference in our tests of the weights attributed to early and later estimates by forecasters. For this reason we use estimators which are partially robust to weak instruments, and find evidence that some individual forecasters do not focus exclusively on the first (advance) estimate, at least for consumption and the GDP deflator.

An important caveat to our finding that some respondents target later estimates when they forecast consumption growth and the GDP deflator is that the tests may not be as well-behaved as we would hope, given the weak-instrument problem, even though the estimators that we use are designed to be partially robust. Another important caveat is that the forecasts need to be efficient, and to use at least as much information as is in the instrument set, in order to sustain the interpretation that a rejection of β = 0 in Eq. (4) in favour of β < 0 suggests that the later vintage is being targeted. The instruments consist of the advance and revised estimates of the variable in question. The respondents may or may not base their reported forecasts on additional sources of information.

There are a number of interesting avenues to explore. We have used as instruments those variables that are suggested directly by our simple model of the revisions process, although other variables may be relevant, and might alleviate the weak-instruments problem. A possible extension to the study would be to undertake a more general search for potential predictors of future revisions.

Acknowledgments
The computations were performed using code written in Gauss 14, Aptech Systems. The figure was plotted in OxMetrics 7.10, Timberlake Consultants. The helpful comments of seminar participants at the ICMA Centre, Reading are gratefully acknowledged, as are those of conference participants at the International Symposium on Forecasting 2018, and of two reviewers and an associate editor of this journal.

Appendix

Suppose that the data generation process (DGP) is defined by Eqs. (1) to (3). For the sake of simplicity, and to make the argument as transparent as possible, suppose that the only instrument used is $Z_t = \eta_{t-2} = y_{t-2}^{t-1} - y_{t-2}^t$. The instrument is suggested by Eq. (3), and by the fact that the right-hand-side endogenous variable is $-\eta_t$. The population value of the IV estimator of β in Eq. (4) is:

$$\beta_{IV} = \frac{E\left[\left(y_t^{t+1} - f_t\right)\eta_{t-2}\right]}{E\left[\left(y_t^{t+2} - y_t^{t+1}\right)\eta_{t-2}\right]}.$$

Eq. (3) shows that the denominator is simply $-h^2\sigma_\eta^2$, where $\sigma_\eta^2 = \mathrm{Var}(\eta_t)$.

A.1. Forecaster targets the first estimate

Suppose that $f_t = \gamma_1 E(y_t^{t+1} \mid \Omega_t)$, where $E(y_t^{t+1} \mid \Omega_t)$ is the optimal forecast, defined below. When $\gamma_1 \neq 1$, the actual forecast is non-optimal. We can evaluate $E(y_t^{t+1} \mid \Omega_t)$ as follows:

$$E\left(y_t^{t+1} \mid \Omega_t\right) = E\left(y_t^{t+2} \mid \Omega_t\right) + E\left(\eta_t \mid \Omega_t\right) = \rho E\left(y_{t-1}^{t+1} \mid \Omega_t\right) + E\left(v_t \mid \Omega_t\right) + E\left(\eta_t \mid \Omega_t\right),$$

and $E(v_t \mid \Omega_t) = 0$. Given that $\Omega_t$ includes $y_{t-1}^t$ as well as $y_{t-2}^t$, the minimum MSE predictor $E(y_{t-1}^{t+1} \mid \Omega_t)$ is given by:

$$E\left(y_{t-1}^{t+1} \mid \Omega_t\right) = \rho y_{t-2}^t + \gamma_H\left(y_{t-1}^t - h\left(y_{t-2}^{t-1} - y_{t-2}^t\right) - \rho y_{t-2}^t\right)$$

(see for example Kishor & Koenig, 2012), where $\gamma_H = \sigma_v^2\left(\sigma_v^2 + \sigma_w^2\right)^{-1}$. This weights the forecast optimally based on the last true value (the term $\rho y_{t-2}^t$) and the first estimate of period $t-1$, $y_{t-1}^t$, allowing for the revisions to be autocorrelated. Then, the forecast error is given by:

$$y_t^{t+1} - \gamma_1\left[\rho^2 y_{t-2}^t + \rho\gamma_H\left(y_{t-1}^t - h\left(y_{t-2}^{t-1} - y_{t-2}^t\right) - \rho y_{t-2}^t\right) + E\left(\eta_t \mid \Omega_t\right)\right]. \quad (A.1)$$

Next, substitute $y_t^{t+1} = y_t^{t+2} + \eta_t = \rho^2 y_{t-2}^t + \rho v_{t-1} + v_t + \eta_t$ and $E(\eta_t \mid \Omega_t) = h^2\eta_{t-2}$ into Eq. (A.1):

$$\rho^2 y_{t-2}^t + \rho v_{t-1} + v_t + \eta_t - \gamma_1\left[\rho^2 y_{t-2}^t + \rho\gamma_H\left(y_{t-1}^t - h\eta_{t-2} - \rho y_{t-2}^t\right) + h^2\eta_{t-2}\right]. \quad (A.2)$$

Calculating the unconditional expectation of the product of Eq. (A.2) and $Z_t = \eta_{t-2}$ results in:

$$h^2\sigma_\eta^2 - \gamma_1\left[\rho\gamma_H E\left(y_{t-1}^t\eta_{t-2}\right) - \rho\gamma_H h\sigma_\eta^2 + h^2\sigma_\eta^2\right],$$

which is the numerator of the population value of $\beta_{IV}$. Substituting $E\left(y_{t-1}^t\eta_{t-2}\right) = E\left[\left(y_{t-1}^{t+1} + \eta_{t-1}\right)\eta_{t-2}\right] = E\left[\left(\rho y_{t-2}^t + v_{t-1} + \eta_{t-1}\right)\eta_{t-2}\right] = h\sigma_\eta^2$, we obtain:

$$h^2\sigma_\eta^2\left(1 - \gamma_1\right).$$

Hence, $\beta = 0$ iff $\gamma_1 = 1$, that is, when the forecast is efficient.

A.2. Forecaster targets the second estimate

Suppose now that $f_t = \gamma_1 E(y_t^{t+2} \mid \Omega_t)$. The forecast error is:

$$y_t^{t+1} - \gamma_1 E\left(y_t^{t+2} \mid \Omega_t\right) = \rho^2 y_{t-2}^t + \rho v_{t-1} + v_t + \eta_t - \gamma_1\left[\rho^2 y_{t-2}^t + \rho\gamma_H\left(y_{t-1}^t - h\eta_{t-2} - \rho y_{t-2}^t\right)\right],$$

and:

$$E\left[\left(\rho^2 y_{t-2}^t + \rho v_{t-1} + v_t + \eta_t - \gamma_1\left[\rho^2 y_{t-2}^t + \rho\gamma_H\left(y_{t-1}^t - h\eta_{t-2} - \rho y_{t-2}^t\right)\right]\right)\eta_{t-2}\right] = h^2\sigma_\eta^2,$$

so that $\beta = -1$.

A.3. Inefficient forecasts

In our setting, an example of an inefficient forecast would be using the latest data point $y_{t-1}^t$ as an estimate of the true value of $y_{t-1}$, and calculating the forecast as $\tilde{f}_t = \rho y_{t-1}^t$. Relative to the optimal forecast of the first estimate, given by $f_t = \gamma_1 E(y_t^{t+1} \mid \Omega_t)$, $\tilde{f}_t$ neglects to combine $y_{t-1}^t$ with the model prediction. Notice though that $\tilde{f}_t$ is not optimal for the second estimate either. Suppose now that we estimate Eq. (4) using IV, with the first estimate as the actual value; then, the forecast error is:

$$y_t^{t+1} - \tilde{f}_t = \rho^2 y_{t-2}^t + \rho v_{t-1} + v_t + \eta_t - \rho y_{t-1}^t,$$

and $E\left[\left(y_t^{t+1} - \tilde{f}_t\right)\eta_{t-2}\right] = \sigma_\eta^2$, so $\beta = -h^{-2} \neq 0$. Thus, the test will reject the null hypothesis that the forecaster is targeting the first estimate. In order for our tests to be informative about the data vintage being targeted by the forecaster, the forecaster must make efficient forecasts of the vintage being targeted, as is argued in the main text.
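As a numerical check on these population values, the following Monte Carlo sketch (our own, with illustrative parameter values) simulates Eqs. (1)–(3), builds the forecasts as in A.1 and A.2 with $\gamma_1 = 1$, and computes the just-identified IV slope of Eq. (4) with the single instrument $\eta_{t-2}$:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100_000                      # long sample, so estimates sit close to population values
rho, h = 0.5, 0.4                # illustrative parameter values (assumptions)
sig_v, sig_w = 1.0, 0.5
gamma_H = sig_v**2 / (sig_v**2 + sig_w**2)

v = sig_v * rng.standard_normal(T)
w = sig_w * rng.standard_normal(T)
truth = np.zeros(T)              # y_t^{t+2}
eta = np.zeros(T)                # eta_t
for t in range(1, T):
    truth[t] = rho * truth[t - 1] + v[t]
    eta[t] = h * eta[t - 1] + w[t]
first = truth + eta              # y_t^{t+1}

def iv_slope(err, rev, z):
    """Just-identified IV slope of err on rev (with a constant), instrument z."""
    err, rev, z = err - err.mean(), rev - rev.mean(), z - z.mean()
    return (z @ err) / (z @ rev)

t = np.arange(2, T)
eta_lag2 = eta[t - 2]            # known at the forecast origin
pred_true_lag = rho * truth[t - 2] + gamma_H * (first[t - 1] - h * eta_lag2 - rho * truth[t - 2])
f_first = rho * pred_true_lag + h**2 * eta_lag2   # forecast targeting y_t^{t+1} (A.1)
f_second = rho * pred_true_lag                    # forecast targeting y_t^{t+2} (A.2)

rev = truth[t] - first[t]        # y_t^{t+2} - y_t^{t+1}
print(iv_slope(first[t] - f_first, rev, eta_lag2))    # approximately 0
print(iv_slope(first[t] - f_second, rev, eta_lag2))   # approximately -1
```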
References

Ang, A., Bekaert, G., & Wei, M. (2007). Do macro variables, asset markets, or surveys forecast inflation better? Journal of Monetary Economics, 54(4), 1163–1212.
Aruoba, S. (2008). Data revisions are not well-behaved. Journal of Money, Credit and Banking, 40, 319–340.
Bomberger, W. (1996). Disagreement as a measure of uncertainty. Journal of Money, Credit and Banking, 28, 381–392.
Capistrán, C., & Timmermann, A. (2009). Disagreement and biases in inflation expectations. Journal of Money, Credit and Banking, 41, 365–396.
Clements, M. P. (2009). Internal consistency of survey respondents' forecasts: Evidence based on the Survey of Professional Forecasters. In J. Castle, & N. Shephard (Eds.), The methodology and practice of econometrics: A festschrift in honour of David F. Hendry, Chapter 8 (pp. 206–226). Oxford: Oxford University Press.
Clements, M. P. (2010). Explanations of the inconsistencies in survey respondents' forecasts. European Economic Review, 54(4), 536–549.
Clements, M. P. (2015). Are professional macroeconomic forecasters able to do better than forecasting trends? Journal of Money, Credit and Banking, 47(2–3), 349–381.
Clements, M. P., & Galvão, A. B. (2017). Predicting early data revisions to US GDP and the effects of releases on equity markets. Journal of Business & Economic Statistics, 35(3), 389–406.
Clements, M. P., & Galvão, A. B. (2018). Data revisions and real-time forecasting. The Oxford Research Encyclopedia of Economics and Finance. Oxford University Press. (Forthcoming).
Coibion, O., & Gorodnichenko, Y. (2012). What can survey forecasts tell us about information rigidities? Journal of Political Economy, 120(1), 116–159.
Coibion, O., & Gorodnichenko, Y. (2015). Information rigidity and the expectations formation process: A simple framework and new facts. American Economic Review, 105(8), 2644–2678.
Corradi, V., Fernandez, A., & Swanson, N. (2009). Information in the revision process of real-time datasets. Journal of Business & Economic Statistics, 27, 455–467.
Croushore, D. (1993). Introducing: The Survey of Professional Forecasters. Federal Reserve Bank of Philadelphia Business Review, 3–15.
Croushore, D. (2011). Forecasting with real-time data vintages, Chapter 9. In M. Clements, & D. Hendry (Eds.), The Oxford handbook of economic forecasting (pp. 247–267). Oxford University Press.
Croushore, D., & Stark, T. (2001). A real-time data set for macroeconomists. Journal of Econometrics, 105(1), 111–130.
Davidson, R., & MacKinnon, J. (1993). Estimation and inference in econometrics. New York: Oxford University Press.
Engelberg, J., Manski, C., & Williams, J. (2009). Comparing the point predictions and subjective probability distributions of professional forecasters. Journal of Business & Economic Statistics, 27(1), 30–41.
Engelberg, J., Manski, C. F., & Williams, J. (2011). Assessing the temporal variation of macroeconomic forecasts by a panel of changing composition. Journal of Applied Econometrics, 26(7), 1059–1078.
Faust, J., Rogers, J., & Wright, J. (2005). News and noise in G-7 GDP announcements. Journal of Money, Credit and Banking, 37(3), 403–417.
Fixler, D., Greenaway-McGrevy, R., & Grimm, B. (2014). The revisions to GDP, GDI, and their major components. Survey of Current Business, August, 1–23.
Fuller, W. (1977). Some properties of a modification of the limited information estimator. Econometrica, 45, 939–953.
Giordani, P., & Söderlind, P. (2003). Inflation forecast uncertainty. European Economic Review, 47(6), 1037–1059.
Howrey, E. (1978). The use of preliminary data in economic forecasting. The Review of Economics and Statistics, 60, 193–201.
Jacobs, J., & van Norden, S. (2011). Modeling data revisions: Measurement error and dynamics of 'true' values. Journal of Econometrics, 161, 101–109.
Kishor, N., & Koenig, E. (2012). VAR estimation and forecasting when data are subject to revision. Journal of Business & Economic Statistics, 30(2), 181–190.
Lahiri, K., & Sheng, X. (2008). Evolution of forecast disagreement in a Bayesian learning model. Journal of Econometrics, 144(2), 325–340.
Landefeld, J. S., Seskin, E. P., & Fraumeni, B. M. (2008). Taking the pulse of the economy. Journal of Economic Perspectives, 22, 193–216.
Mankiw, N. G., & Reis, R. (2002). Sticky information versus sticky prices: A proposal to replace the New Keynesian Phillips curve. Quarterly Journal of Economics, 117, 1295–1328.
Mankiw, N. G., Reis, R., & Wolfers, J. (2003). Disagreement about inflation expectations. Cambridge, MA: National Bureau of Economic Research.
Mankiw, N., & Shapiro, M. (1986). News or noise: An analysis of GNP revisions. Survey of Current Business (May 1986), 20–25. US Department of Commerce, Bureau of Economic Analysis.
Manski, C. F. (2017). Survey measurement of probabilistic macroeconomic expectations: Progress and promise. In NBER Macroeconomics Annual 2017: Vol. 32. National Bureau of Economic Research. https://ideas.repec.org/h/nbr/nberch/13907.html.
Mincer, J., & Zarnowitz, V. (1969). The evaluation of economic forecasts. In J. A. Mincer (Ed.), Economic forecasts and expectations: Analysis of forecasting behavior and performance (pp. 3–46). New York: National Bureau of Economic Research.
Patton, A. J., & Timmermann, A. (2007). Testing forecast optimality under unknown loss. Journal of the American Statistical Association, 102, 1172–1184.
Patton, A. J., & Timmermann, A. (2010). Why do forecasters disagree? Lessons from the term structure of cross-sectional dispersion. Journal of Monetary Economics, 57(7), 803–820.
Rich, R., & Butler, J. (1998). Disagreement as a measure of uncertainty: A comment on Bomberger. Journal of Money, Credit and Banking, 30, 411–419.
Rich, R., & Tracy, J. (2010). The relationships among expected inflation, disagreement, and uncertainty: Evidence from matched point and density forecasts. Review of Economics and Statistics, 92(1), 200–207.
Sims, C. A. (2003). Implications of rational inattention. Journal of Monetary Economics, 50, 665–690.
Stock, J. H., Wright, J. H., & Yogo, M. (2002). A survey of weak instruments and weak identification in generalized method of moments. Journal of Business & Economic Statistics, 20(4), 518–529.
Woodford, M. (2002). Imperfect common knowledge and the effects of monetary policy. In P. Aghion, R. Frydman, J. Stiglitz, & M. Woodford (Eds.), Knowledge, information, and expectations in modern macroeconomics: In honor of Edmund Phelps (pp. 25–58). Princeton University Press.
Wright, J. (2013). Evaluating real-time VAR forecasts with an informative democratic prior. Journal of Applied Econometrics, 28, 762–776.
Zarnowitz, V. (1969). The new ASA-NBER survey of forecasts by economic statisticians. The American Statistician, 23(1), 12–16.
Zarnowitz, V., & Lambros, L. (1987). Consensus and uncertainty in economic prediction. Journal of Political Economy, 95(3), 591–621.
Michael P. Clements is Professor of Econometrics at the ICMA Centre, Henley Business School, University of Reading, and an associate member of the Institute for New Economic Thinking at the Oxford Martin School, University of Oxford. He obtained a DPhil in Econometrics from Nuffield College, University of Oxford before moving to Warwick University Economics Department as a Research Fellow, and moved to Reading in 2013. He has been an editor or associate editor of the International Journal of Forecasting since 2001, and is currently Series Editor of the Palgrave Texts in Econometrics and Palgrave Advanced Texts in Econometrics.