Forecasting GDP growth with NIPA aggregates: In search of core GDP

Forecasting GDP growth with NIPA aggregates: In search of core GDP

International Journal of Forecasting 35 (2019) 1814–1828 Contents lists available at ScienceDirect International Journal of Forecasting journal home...

1MB Sizes 0 Downloads 29 Views

International Journal of Forecasting 35 (2019) 1814–1828

Contents lists available at ScienceDirect

International Journal of Forecasting journal homepage: www.elsevier.com/locate/ijforecast

Forecasting GDP growth with NIPA aggregates: In search of core GDP Christian Garciga, Edward S. Knotek II



Federal Reserve Bank of Cleveland, United States

article

info

Keywords: Forecasts GDP GDI Real-time data Survey of professional forecasters Consumption

a b s t r a c t In addition to GDP, which is measured using expenditure data, the U.S. national income and product accounts (NIPAs) provide a variety of measures of economic activity, including gross domestic income and other aggregates that exclude one or more of the components that make up GDP. Similarly to the way in which economists have attempted to use core inflation—which excludes volatile energy and food prices—to predict headline inflation, the omission of GDP components may be useful in extracting a signal as to where GDP is going. We investigate the extent to which these NIPA aggregates constitute ‘‘core GDP.’’ In an out-of-sample forecasting exercise using a novel real-time dataset of NIPA aggregates, we find that consumption growth and the growth of GDP excluding inventories and trade have historically outperformed a canonical univariate benchmark for forecasting GDP growth, suggesting that these are promising measures of core GDP growth. © 2019 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

1. Introduction Real gross domestic product (GDP) is a comprehensive measure of an economy’s output. In the United States, the Bureau of Economic Analysis (BEA) publishes estimates of GDP in the National Income and Product Accounts (NIPAs). These estimates are derived using expenditure data: real GDP is the chain-weighted sum of the components consumption, investment, government, and net exports; see Landefeld, Seskin, and Fraumeni (2008). These components in turn can be separated into subcomponents; e.g., investment comprises nonresidential fixed investment, residential investment, and private inventory investment. The BEA calculates and publishes a number of NIPA aggregates, which are combinations that typically omit one or more of the components or subcomponents of GDP.1 While the expenditure-based GDP measure ∗ Correspondence to: Research Department, Federal Reserve Bank of Cleveland, P.O. Box 6387, Cleveland, OH 44101-1387, United States. E-mail address: [email protected] (E.S. Knotek II). 1 While some economists may view consumption and investment as ‘‘aggregates’’, our choice of terminology in calling combinations of

receives a considerable amount of attention, the NIPAs also contain estimates of the theoretically-equivalent income-based measure of the economy’s output—gross domestic income (GDI)—and the average of GDP and GDI. Similarly to the way in which economists have attempted to use core inflation—which excludes the volatile food and energy prices—to predict headline inflation, the omission of components or subcomponents of GDP in these NIPA aggregates may be useful in extracting the signal as to where GDP is going.2 For example, because inventory investment can be volatile from one quarter to the next, a measure of GDP that excludes the change in inventories may be helpful for forecasting GDP. In practice, economists appear to have drawn conclusions about which aggregates most closely resemble overall GDP and components ‘‘NIPA aggregates’’ is in line with the usage by BEA; see in particular pages 2–11 through 2–14 in Bureau of Economic Analysis (2017). 2 Aruoba, Diebold, Nalewaik, Schorfheide, and Song (2012, 2016) posit that the GDP is a noisy measure of the true output growth; by extension, NIPA aggregates may be useful in extracting the true underlying state of the economy.

https://doi.org/10.1016/j.ijforecast.2019.03.024 0169-2070/© 2019 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828

therefore are most likely to be helpful for forecasting on the basis of the in-sample fit using the most revised data; see e.g. Council of Economic Advisers (2015) and Kawa (2017). This paper considers both in-sample and out-ofsample exercises in an attempt to investigate the extent to which NIPA aggregates constitute ‘‘core GDP’’. We undertake two in-sample exercises. First, we conduct LASSO regressions that rank the strengths of the relationships between the six main components and subcomponents of the expenditure approach to constructing GDP—consumption, residential investment, business fixed investment, inventory investment, trade, and government spending—and contemporaneous and future GDP growth. In general, the LASSO regressions reveal relatively strong relationships between GDP and consumption, residential investment, and business fixed investment, suggesting that these components and aggregates thereof are important for thinking about what constitutes the core of GDP. Second, we borrow from the core inflation literature and look at the ability of NIPA aggregates to approximate the trend in GDP growth. We show that consumption growth by itself and the growth of GDP excluding inventories and trade both perform relatively well as measures of core GDP growth based on this metric. We consider it necessary to use real-time data in order to conduct out-of-sample forecasting exercises. While real-time vintages for real GDP and real personal consumption expenditures (PCE) are available readily from real-time data repositories, a major contribution of the paper is to compile, or, in some cases, reconstruct, realtime vintages for NIPA aggregates. In addition to GDP and PCE, we also collect real-time vintage data on the growth rates of: GDI; the average of GDP and GDI (GDA); real final sales of domestic product; real final sales to domestic purchasers, which we refer to as domestic final purchases (DFP); and real final sales to private domestic purchasers, which we refer to as private domestic final purchases (PDFP). We format the vintages of real-time data to match the formatting of other real-time data repositories, ready for others to use in their research. Using these real-time data vintages, we assess the ability of NIPA aggregates to forecast GDP growth out-ofsample. Our idea of considering a range of NIPA aggregates in a real-time, out-of-sample forecasting exercise for GDP growth is novel. Despite a great interest in forecasting GDP growth and a large body of research, the survey article by Chauvet and Potter (2013) shows that, in general, a univariate AR(2) process is a difficult forecasting benchmark to beat.3 Using simple forecasting techniques, we show that, historically, a variety of forecasts generated using NIPA aggregates have been more accurate than the forecasts from the AR(2) benchmark. The results are strongest for PCE and DFP—which is the sum of PCE, nonresidential fixed investment, residential investment, and government purchases—over horizons of one to three 3 Faust and Wright (2009) document that the GDP growth forecasts in the Greenbook projections of the Federal Reserve Board of Governors’ staff have been more accurate historically than forecasts made with recursive and direct AR(4) models for the jumping-off (i.e., nowcast) quarter, but that the advantage over the AR(4) models disappears thereafter.

1815

quarters. The gains in forecasting accuracy are especially visible during recessions, although we document modest gains during expansions as well. Overall, our analysis finds that PCE growth is a useful measure of core GDP growth that specifically helps in predicting future GDP growth, while DFP is not far behind. Our results also illustrate a key limitation of GDI. Nalewaik (2010) advocates for GDI as a more compelling measure of economic output than GDP, and documents that it also has explanatory power for GDP in real time, and Aruoba et al. (2012, 2016) and Nalewaik (2012) provide further support for GDI. However, when asking and answering a different question, our results indicate that the ability of GDI to forecast GDP is hampered by data release lags. 2. Real-time data vintages We compile or construct real-time vintage data for seven quarterly series in the NIPAs; see the Bureau of Economic Analysis (2017) for more details on the series. We do this using data from the Federal Reserve Bank of St. Louis’ Archival Economic Data (ALFRED) online database; the Federal Reserve Bank of Philadelphia’s online Real-Time Data Set for Macroeconomists (RTDSM), as documented by Croushore and Stark (2001); the BEA Data Archive: National Accounts (NIPA); historical print issues of the BEA’s monthly Survey of Current Business, available on the Federal Reserve Bank of St. Louis’ FRASER website; and Nalewaik (2012). The seven series are: 1. Real gross domestic product (GDP). 2. Real gross domestic income (GDI). 3. The average of GDP and GDI, which we denote by GDA (for ‘‘average’’). The BEA defines GDA as the arithmetic average of nominal GDP and nominal GDI, divided by the implicit GDP deflator. The BEA began publishing this measure officially in July 2015 with the release of the first or ‘‘advance’’ estimate of second quarter GDP for 2015, although in practice economists could have been calculating it themselves at any point before then; in fact, see the discussion following the article by Nalewaik (2010) for evidence that this was being done in practice.4 4. Real final sales of domestic product (FSDP). The BEA defines this series as GDP minus the change in private inventories; alternatively, it can be built up as the sum of PCE, nonresidential fixed investment, residential investment, government spending, and net exports.5 5. Real domestic final purchases (DFP). Formally referred to by the BEA as real final sales to domestic purchasers, this aggregate is GDP minus the change in private inventories and net exports; alternatively, it can be built up as the sum of PCE, nonresidential fixed investment, residential investment, and government spending. 4 See Fixler, Greenaway-McGrevy, and Grimm (2014) for an analysis of combinations of GDP and GDI, as well as the research cited below. 5 For the sake of exposition in the text, we abstract from chainweighting when summing series to form aggregates, but we follow BEA practice as necessary when constructing the measures.

1816

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828

6. Real private domestic final purchases (PDFP). Formally referred to by the BEA as real final sales to private domestic purchasers, but also referred to as PDFP (see Council of Economic Advisers, 2015) or private final domestic purchases (PFDP) (see, e.g., Federal Reserve Board of Governors, 2006), this aggregate is GDP minus the change in private inventories, trade, and government spending; alternatively, it is the sum of PCE, nonresidential fixed investment, and residential investment. As with GDA, the BEA did not begin publishing this series officially until July 2015, although economists could have been calculating it themselves at any point before then, in an effort to remove the volatility in GDP that arises from inventories, trade, and government spending. 7. Real personal consumption expenditures (PCE). The real-time data vintages that we construct are similar to those in ALFRED. The appendix provides a detailed accounting of the way in which we compiled or constructed vintages of each aggregate. The first row in our dataset contains vintage dates, which identify when the NIPA data releases occurred. This convention allows us to generate forecasts using the vintage of data that would have been available at any point in the past. The first data vintage for real GDP in ALFRED is December 4, 1991, while most of the other series have earlier vintage dates.6 The last real-time data vintage comes from the NIPA release on September 28, 2017, with the third release of 2017:Q2 data. For our core GDP exercises in the following section, we also collect the ‘‘current’’ vintage of data, available as of November 7, 2018. We compute quarterly annualized growth rates using the standard BEA formula as g(Xt ) = 100 ×[(Xt /Xt −1 )4 − 1], where Xt is the level of the series in quarter t. Most of the data vintages have their first growth rate observation in 1947:Q2.7 While GDI, GDA, and PCE differ from the other aggregate measures that we consider in their construction, we include them under the rubric of ‘‘NIPA aggregates’’ for the sake of expositional simplicity. 3. What constitutes the core of GDP? Because some of our aggregates omit relatively volatile components or subcomponents from GDP, our study has parallels to parts of the inflation forecasting literature that have examined the ability of ‘‘core’’ inflation measures— which can be interpreted broadly as inflation measures that exclude food and energy prices, focus on price changes in the center of the monthly distribution by using medians or trimmed means, or omit other components

than food and energy—to predict headline inflation. The results from that literature have been mixed, with some studies reporting that the forecasts of headline inflation from various core measures are relatively more accurate than those from headline inflation itself, while others find little forecasting benefit from core measures vis-àvis headline inflation; see Bryan and Cecchetti (1994), Crone, Khettry, Mester, and Novak (2013), Meyer and Pasaogullari (2010), Meyer and Venkatu (2014) and Smith (2004).8 However, the definition of what comprises ‘‘core’’ inflation across these and other related studies varies. Thus, before proceeding to formal out-of-sample forecasting exercises for GDP growth, we consider two exercises to shed light on what constitutes the ‘‘core’’ of GDP. In the first exercise, we use least absolute shrinkage and selection operator (LASSO) regressions to assess the abilities of the major components and subcomponents of the expenditure approach to defining GDP to explain and predict GDP growth. In particular, GDP is often written as the sum of consumption, residential investment, business fixed investment, inventory investment, government purchases, and net exports. Let Yt be the quarterly annualized GDP growth at time t and Xt be a vector of the quarterly annualized growth rates of these six components.9 We consider LASSO regressions for both the contemporaneous relationship between Xt and Yt and the relationships between the current values of the components (Xt ) and future GDP growth (Yt +1 to Yt +3 ). We estimate each of the four LASSO regressions recursively on expanding windows of data, with the first estimates using data from 1964:Q2 to 1992:Q1 and each subsequent estimation using an additional quarter until the sample extends from 1964:Q2 to 2017:Q2. In each estimation, we recover the coefficients conditional on the penalty size, which starts at zero and increases in steps of 0.001 until the coefficients on all of the components of Xt are shrunk to zero. As we increase the size of the penalty, we record the order in which the coefficients on the components in Xt are shrunk to zero. Within each estimation, the predictor variable whose coefficient goes to zero first is assigned a rank of six, while the last remaining variable with a non-zero coefficient is assigned a rank of one. The ranks are then averaged across estimation samples for each of the six components within each of the five different variations of the LASSO regression. Table 1 shows the results of this exercise. Consumption and residential investment generally exhibit the longest

6 The earliest GDI and GDA vintages are from January 29, 1992. The earliest vintages for FSDP, DFP, and PDFP are from August 19, 1965, while the earliest vintage for PCE is from December 21, 1958. 7 Because the levels of most series begin in 1947:Q1, the growth

8 The ability of core inflation to forecast headline inflation is related to a broader body of literature on the usefulness of disaggregates for forecasting an aggregate; see e.g. Hendry and Hubrich (2011) or Lütkepohl (2006). While much of this literature has focused on inflation, recent work on nowcasting suggests the potential for gains from aggregating across component nowcasts when generating both inflation nowcasts (see Knotek II & Zaman, 2017) and GDP nowcasts (see Foroni & Marcellino, 2014; Higgins, 2014). 9 All components are given in real terms, and for the sake of

rates begin in Q2. Complete time series data extending all the way back to 1947:Q1 were not always available, with gaps at the beginning of some series, especially around comprehensive revisions. In such cases, we make the reasonable assumption that forecasters would have backfilled the missing growth rate data temporarily using the data that had been available prior to the revision.

simplicity we use current-vintage data for this exercise. For inventories and trade, we use the change in the change in inventories and 400 times the log of exports divided by imports, respectively. For each regression, we first standardize the variables in Xt by demeaning and dividing by the standard deviation. We also include an intercept in Xt that is not subject to the LASSO penalty.

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828

1817

Table 1 Average survival rank in a LASSO regression with an increasing penalty. Lead of GDP

Consumption

Residential investment

Business fixed investment

Inventories

Government

Net exports

t t +1 t +2 t +3

1 2 1.8 2.4

4.9 1 1.2 1

2 3.1 4.6 2.6

3 4 4.9 4.6

6 5.1 4.5 5.9

4.1 5.9 4.1 4.4

Notes: The results are averages based on recursive estimates obtained using an expanding window, with the first estimation sample spanning the period 1964:Q2 to 1992:Q1 and the last estimation sample spanning the period 1964:Q2 to 2017:Q2, using current-vintage data. The penalty starts at zero and increases in 0.001 increments until all coefficients have been shrunk to zero. Rank = 1 implies the longest survival on average, and rank = 6 implies the shortest survival on average.

survivals, followed by business fixed investment.10 These results suggest that the current period’s consumption growth or the growth of PDFP—which consists of consumption, residential investment, and business fixed investment—are important parts of core GDP and may be useful for forecasting GDP growth. In contrast, inventories, government spending, and net exports exhibit the shortest survivals on average, suggesting that they are relatively less important components when thinking about the core of GDP, with the exception that inventories are relatively important in the nowcast (time t) quarter. The second exercise borrows directly from the core inflation literature and considers the abilities of NIPA aggregates to capture the trend in GDP growth. If GDP growth exhibits transitory fluctuations around a trend, then a measure of core GDP growth that behaves similarly to the trend in GDP growth may prove a useful predictor. Bryan, Cecchetti, and Wiggins II (1997) use a 36-month centered moving average to capture the inflation trend when evaluating measures of core inflation; with current-vintage quarterly data, we use a 13-quarter centered moving average to approximate the trend in GDP growth. Table 2 shows the root mean squared errors (RMSEs) for the quarterly annualized growth rates of the six NIPA aggregates from the previous section compared with the GDP growth trend, plus the RMSE for contemporaneous GDP growth with its own trend. The results are similar qualitatively whether we look over a longer sample starting in 1964 and ending in 2017 or a shorter sample starting in 1992. In both cases, consumption growth (PCE) registers the smallest RMSEs, followed by DFP, which is GDP excluding inventories and trade. While the PDFP aggregate appeared promising based on the LASSO results, it fares poorly by this measure, and the results for GDI show little improvement over the contemporaneous value of GDP itself for capturing the trend in GDP growth.11 Taken together, these two exercises suggest that consumption growth by itself is one of the most reliable measures of core GDP growth—an intuitive result given that consumer 10 The ability of residential investment to serve as a predictor for GDP growth is in line with the empirical findings of Aastveit, Anundsen, and Herstad (2019) regarding its ability to predict recessions. 11 With the exception of those for consumption growth, the squared errors for most of the aggregates compared with the GDP growth trend were very large during 2008:Q4 and 2009:Q1. However, excluding these quarters from the RMSE calculations does not change the results qualitatively.

spending comprises roughly two-thirds of GDP, although broader aggregates such as DFP growth or PDFP growth deserve further exploration.12 4. Forecasting methodology We build on the in-sample findings by evaluating the ability of NIPA aggregates, in combination with simple forecasting methods, to forecast quarterly annualized real GDP growth out-of-sample using real-time data. When using real-time data, the timing of data releases is important, and some of the NIPA aggregates that we consider have longer publication lags than others. We capture the importance of the available information set by conducting two separate forecasting exercises involving the recursive generation of forecasts using the same information sets that professional forecasters participating in either the Survey of Professional Forecasters (SPF) or the Blue Chip Economic Indicators (BC) survey would have had available. The first exercise uses the SPF information sets. The SPF survey is conducted once per quarter, in the first half of the middle month of the quarter.13 We take the forecast origin in each quarter to be the closing date for submissions to that quarter’s SPF release. By this point in quarter t, we generally have the ‘‘advance’’ (i.e., first) release of GDP for quarter t − 1, along with many NIPA aggregates. However, publication delays in GDI, and thus GDA, mean that the last observation for these variables is usually from quarter t − 2.14 In addition, the BEA makes the advance estimate using incomplete source data. Hence, this exercise implicitly allows us to examine the effects of measurement errors in the initial estimates of GDP and many NIPA series compared with the publication lags in GDI and GDA on the abilities of these aggregates to forecast real GDP growth. The second exercise uses BC information sets. The BC survey is released three times each quarter on the 10th 12 An aggregate that combines consumer spending and residential investment would appear promising based on the LASSO results. Because the BEA does not publish such an aggregate series, we leave the exploration of it for future research. 13 The SPF survey and release dates are available from the Federal Reserve Bank of Philadelphia’s website at https://www.phil.frb.org//media/research-and-data/real-time-center/survey-of-professionalforecasters/spf-release-dates.txt?la=en. 14 While this delay is the normal pattern, there are a few exceptions in our dataset where GDI was available at this point in the quarter.

1818

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828 Table 2 Root mean squared errors for quarterly annualized NIPA aggregate growth v. the GDP growth trend. Sample

GDP

GDI

GDA

FSDP

DFP

PDFP

PCE

1964:Q2–2017:Q1 1992:Q1–2017:Q1

2.94 2.02

2.83 2.42

2.70 1.95

3.21 2.08

2.43 1.63

3.07 2.14

2.34 1.53

Notes: The RMSEs are in percentage points. The results are based on current-vintage data. The GDP growth trend is calculated using a centered 13-quarter moving average. GDI is gross domestic income; GDA is the average of GDP and GDI; FSDP is final sales of domestic product, or GDP minus the change in inventories; DFP is domestic final purchases, or FSDP minus net exports; PDFP is private domestic final purchases, or DFP minus government; and PCE is personal consumption expenditures.

of each month, with the surveys being conducted in the previous week.15 Following Chauvet and Potter (2013), we produce out-of-sample forecasts in the first month of each quarter using the data that would have been available to Blue Chip survey contributors when submitting their forecasts. At this point in quarter t, we have available the BEA’s third NIPA release with GDP and GDI for quarter t − 2. Thus, this timing allows us to assess the forecasting ability of NIPA aggregates conditional on using the BEA’s third estimates—which include more complete source data than are included in the advance release—through the same final quarter for all series. We look at the ability of each NIPA aggregate to generate out-of-sample forecasts using real-time data for 1- to 3-step-ahead GDP growth.16 We focus on relatively simple forecasting methods and time series models in order to let the information in the NIPA aggregates speak. The contributions of the data versus the model become less clear as we use more complicated forecasting models and methodologies. While this is a potential limitation of our analysis, the principle of parsimony tends to be successful in a variety of forecasting contexts.17 For the sake of exposition, assume that, when making a forecast in quarter t, the information set includes the most recently available observation on GDP growth from quarter t −j, g(GDP t −j ), while the most recently available observed growth rate for the aggregate X is from quarter τ = t − k, where k ≥ j: g(Xτ ). We consider four basic forecasting methods: 1. RW(p): Our first method takes the flavor of a random walk. The forecast for quarterly GDP growth going forward, which we denote by a hat, ‘‘∧ ’’, is the average of the most recent p observations on the growth of the aggregate X :

ˆ g(GDP t −j+i ) =

p−1 1∑

p

g(Xτ −h ), i = 1, 2, 3, 4.

(1)

h=0

15 The BC survey is conducted over a two-day period. If it is reported in the release, we use the closing date for submissions. In cases where the BC survey date is not available, we follow Knotek II and Zaman (2017) and assume that the survey date was the first Thursday of the month, unless that is the first day of the month, in which case we assume that the survey date was the first Tuesday of the month. 16 While these results are not reported here, we found our measures to have essentially no ability to outperform univariate GDP growth forecasts from the 4-step horizon onward, consistent with the findings of Chauvet and Potter (2013) and other studies such as that of Bańbura, Giannone, Modugno, and Reichlin (2013). 17 See for example Chauvet and Potter (2013) and Edge, Kiley, and Laforte (2010) for GDP, and Atkeson and Ohanian (2001) and Faust and Wright (2013) for inflation.

If p = 1, a forecaster would simply predict future GDP growth by reading the most recentlyavailable growth rate of the aggregate X from the NIPA release. 2. RW-AR(p): The second method estimates an AR(p) model using the growth in the aggregate X until time τ : g(Xτ ) = a +

p ∑

bh g(Xτ −h ) + eτ .

(2)

h=1

We estimate and then use Eq. (2) iteratively to forecast the growth of the aggregate X at time t −j+ ˆ t −j+i ), for i = 1, 2, 3, 4. Then, as in Eq. (1), we i, g(X treat that forecast as the forecast of GDP growth: ˆ ˆ t −j+i ). g(GDP t −j+i ) = g(X 3. Direct(p): The third method makes direct i-stepahead forecasts for GDP growth using p lags of the aggregate X. We estimate regressions of the form: g(GDPτ +i ) = a +

p ∑

bh g(Xτ +1−h ) + eτ +i , i = 1, 2, 3, 4,

h=1

(3) and use the estimated coefficients from Eq. (3) to ˆ generate forecasts for g(GDP t −j+i ). 4. VAR(p): The fourth method uses bivariate vector autoregressions in GDP growth and the growth of the aggregate X with p lags. We use equationby-equation OLS to estimate VARs of the form:

[

g(GDPτ ) g(Xτ )

] = A+

p ∑ h=1

[ Bh

g(GDPτ −h ) g(Xτ −h )

]

+eτ , (4)

and use the estimated coefficients from Eq. (4) to ˆ generate forecasts of g(GDP t −j+i ) iteratively for i = 1, 2, 3, 4. Implicitly, the RW(p) and RW-AR(p) methods assume that the growth rate of aggregate X is similar to the underlying growth rate of GDP and that forecasts can benefit from considering this relationship, while the direct(p) and VAR(p) methods are able to capture persistent differences in growth rates. For the RW-AR(p), direct(p), and VAR(p) models, we estimate the parameters of the respective model using expanding windows with the first data point in 1964:Q2. We consider p lags between 1 and 4. For our purposes, we will refer to a given combination of a forecasting method, choice of lag length p, and NIPA aggregate as a ‘‘model’’. For each aggregate, we also report

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828

results from simple model averaging across all of the forecasting methods and lag lengths. We assess our models’ forecasting performances by examining root mean squared forecast errors (RMSFEs) relative to the benchmark used by Chauvet and Potter (2013): an AR(2) model with an expanding estimation window starting in 1964:Q2. Edge et al. (2010) also employ an AR(2) benchmark for GDP growth; D’Agostino, Giannone, and Surico (2006) document that it is difficult to improve upon the GDP growth forecasts from simple benchmarks in the post-1985 period, a finding that is echoed implicitly in the exercises of Faust and Wright (2009) when examining the accuracy of the Federal Reserve Board of Governors’ staff Greenbook forecasts for GDP growth. As in all studies using real-time data, we need to take a stand on what constitutes the ‘‘true’’ GDP growth rate for measuring the forecasting accuracy, and we chose to use the third NIPA release for a given quarter from the RTDSM; Tulip (2009) provides a justification for using this vintage as the ‘‘truth’’. The evaluation period is 102 quarters long, beginning in 1992:Q1 and ending in 2017:Q2.18 Furthermore, like Chauvet and Potter (2013), we look at forecasting performances based on the entire sample, the subsample including only expansions (as defined by the National Bureau of Economic Research’s (NBER) Business Cycle Dating Committee), and the subsample including only recessions. We stress that NBER recessions and expansions are not known in real time, but examining the results conditional on the business cycle augments our understanding of what drives the full-sample, true real-time results. Our evaluation sample includes two recessions that lasted a combined eleven quarters: the 2001:Q1–2001:Q4 recession and the 2007:Q4–2009:Q2 recession. We examine predictive ability conditional on being in an economic expansion by removing all forecasting errors that correspond to quarters in which the U.S. economy was in recession, and follow a similar procedure when we condition on being in a recession. For models with relative RMSFEs of less than one, we assess the statistical significance using the Diebold and Mariano (1995) (DM) test for equal predictive accuracy between a given forecast and the forecast from the AR(2) benchmark with the small-sample correction of Harvey, Leybourne, and Newbold (1997), using the MSE as the metric and two-sided t-statistic tests.19 18 Following the BEA, we use fixed-weight real GDP at the beginning of the evaluation period and chain-weighted real GDP from 1996 onward. 19 Clark and McCracken (2009) discuss issues with tests of equal forecast accuracy when using real-time data, and hence, we view these DM test statistics as approximations. The appendix to the working paper version of this paper presents results from the directional forecast accuracy approach of Pesaran and Timmermann (2009) and shows success ratios that measure the frequencies with which, for ˆ a given forecast horizon i, the forecast, g(GDP t −j+i ) and the third release g(GDP t −j+i ) are both either greater than or less than the most recently available real-time GDP growth rate at the time of the forecast, g(GDP t −j ). We document a large number of cases in which the success ratios from our NIPA aggregate models are greater than those from the AR(2) benchmark.

1819

5. Survey of professional forecasters exercise results The SPF surveys are conducted in the first half of the second month of each quarter. At that point, the first estimate of GDP growth is available for the previous (t −1) quarter. The first forecast of the quarterly annualized real GDP growth in each survey is for the quarter t in which the survey is released (i.e., the nowcast). Our exercise considers forecasts for quarters t, t + 1, and t + 2.20 Using matched SPF information sets, Table 3 presents relative RMSFEs, with the AR(2) benchmark in the denominator, for the full sample results across all model specifications and forecast horizons of one to three steps ahead. We shade entries that outperform the benchmark, with darker shades implying lower relative RMSFEs. We note three findings. First, NIPA aggregates are useful for generating forecasts of GDP growth that outperform the AR(2) benchmark over short horizons, although relatively few of the improvements in forecast accuracy are statistically significant at the 10% level based on the DM test. At the one-step horizon, using PCE, DFP, or PDFP produces lower RMSFEs than the benchmark model across nearly all methods. Based on model averaging, the RMSFE gains are 7% for PCE and DFP and 5% for PDFP. At the two-step horizon, forecasting gains remain widespread among these aggregates but are smaller: model averaging produces RMSFE gains over the AR(2) benchmark of 5% for PCE, 3% for DFP, and 2% for PDFP. At the three-step horizon, PCE, PDFP, and DFP continue to achieve improvements over the AR(2) benchmark, but the magnitude of the gains has decreased again to between 1% and 2%.21 These forecasting results reinforce our earlier findings that PCE growth, DFP growth, and PDFP growth possess properties that make them good measures of core GDP growth. Second, the publication lag in GDI compared with GDP and the other NIPA aggregates in this exercise renders the forecasting accuracy of GDI- and GDA-based models nearly uniformly worse than that of the AR(2) benchmark for forecasting future GDP growth. This is true even though the AR(2) benchmark is using the imprecise advance GDP release for quarter t − 1 to inform future forecasts iteratively. Third, the RMSFEs from the SPF median forecasts are smaller than those from the models using NIPA aggregates and the AR(2) benchmark, but the gains relative to the benchmark are statistically significant at the 10% level based on the DM test only at the 1-step (nowcast) horizon. In comparison, nowcasting models that use mixed-frequency or higher-than-quarterly frequency data 20 As an example, the 2016:Q1 SPF survey was conducted in February 2016. At that time, the advance estimate of 2015:Q4 GDP was available, as were estimates for most of the other NIPA aggregates that we consider; however, the GDI and GDA were only available through 2015:Q3. The one- to three-step-ahead forecasts were for 2016:Q1, 2016:Q2, and 2016:Q3. 21 Meaningful forecast improvements disappear at longer forecast horizons, leaving the forecasts that use NIPA aggregates about as good (or, depending on one’s perspective, about as bad; see Edge & Gürkaynak, 2010) as those from the benchmark.

1820

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828 Table 3 Relative RMSFEs for real GDP forecasts at SPF survey dates, full sample.

Notes: The AR(2) benchmark reports RMSFEs for quarterly annualized real GDP growth; all other statistics are relative to the AR(2) benchmark. For model relative RMSFEs that are less than one (shaded), *, **, and *** denote rejections of the null hypothesis of equal predictive accuracy of the model and the benchmark at the 10%, 5%, and 1% levels, respectively, based on the Diebold and Mariano (1995) test with the small-sample correction of Harvey et al. (1997). See Table 2 for the definitions of the variables. The forecasting methods are described in Section 4.

and evaluation periods similar to ours—such as the dynamic factor models of Bańbura et al. (2013) and Liebermann (2014), and the mixed-frequency models estimated with Bayesian methods and featuring stochastic volatility of Carriero, Clark, and Marcellino (2015)—produce nowcast RMSFEs that are comparable to, but greater than, those from SPF. Even though we are only using quarterly data on NIPA aggregates for near-term forecasting, the combination of these aggregates with simple forecasting tools helps to close the gap between the quarterly AR(2) benchmark and the nowcasts from SPF that incorporate professionals’ judgment and high-frequency data. Chauvet and Potter (2013) show that the relative forecasting abilities of competing models vary over the business cycle. Table 4 examines this issue by presenting the RMSFEs conditional on a given forecast quarter being part of an expansion, as defined by the NBER.22 While we document a number of cases in which NIPA aggregates and simple forecasting models improve upon the AR(2) benchmark during expansions, these improvements are more model-specific than in the full sample. For example, the RW-AR(1) model outperforms the benchmark for all measures except for PDFP at the 1-step horizon and for all measures at the 2-step horizon, with small gains from three of the six measures at the 3-step horizon. The VAR(1) performs well across all measures at the 2-step horizon. These highly parsimonious models appear to do well for the same reason that the AR(2) benchmark does well: in normal times, both the aggregates and GDP growth tend to exhibit transitory deviations from trend. In looking across the NIPA aggregates, PCE appears to be the most useful during expansions; together with a RW-AR(p), it offers the largest improvements in forecast 22 Again, we stress that these results are calculated on an ex post basis, conditional on knowing the NBER-defined state of the business cycle. In this sense, they are useful in explaining which episodes drive our full-sample results.

accuracy over the benchmark—4% to 5% at the 1-step horizon and 3% to 4% at the 2-step horizon. When conducting model averaging, it is the only measure with a relative RMSFE of less than 1. While GDI and GDA suffer from release lags in the full-sample results, there are some cases in which they outperform the benchmark conditional on the economy being in an expansion. This finding essentially says that old information can still be relevant, provided that the economy does not suddenly turn down. Finally, while SPF outperforms the benchmark in the nowcast quarter during expansions, it underperforms the benchmark at other horizons. Notably, using PCE with the RW-AR methods produces forecasts that essentially match SPF at the 1-step horizon and outperform the SPF forecasts at the 2- and 3-step horizons. Table 5 displays RMSFEs conditional on a given forecast quarter being part of an NBER-defined recession. With only eleven recession quarters, inference is based on a small sample. Nevertheless, the table shows that PCE, PDFP, and DFP all outperform the AR(2) benchmark across most methods at all horizons conditional on the economy being in a recession, with the relative RMSFEs from model averaging being well below one. The largest improvements in forecast accuracy come from the simplest RW(1) method for using PCE, PDFP, and DFP—in which the forecaster would simply read off the growth rates for the aggregate from the NIPA release and use those values as the forecast for GDP growth—with RMSFE gains near 40% at the 1-step horizon and 25% at the 2- and 3-step horizons. Because the economy behaves in fundamentally different ways during recessions and expansions, the models that work well in expansions and feature transitory deviations from trend perform poorly during recessions.23 In contrast, the RW(1) 23 This is true for the benchmark as well: as is indicated in Tables 4 and 5, the RMSFEs from the AR(2) model are about twice as large during recessions as during expansions.

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828

1821

Table 4 Relative RMSFEs for real GDP forecasts at SPF survey dates, expansion-only sample.

Notes: See the notes to Table 3.

Table 5 Relative RMSFEs for real GDP forecasts at SPF survey dates, recession-only sample.

Notes: See the notes to Table 3.

models capture the weakness most efficiently and extrapolate it best over the remainder of the recession, a point that we return to below. Conditioning on recession quarters, historically the SPF forecasts have been more accurate than both those of the AR(2) benchmark and our model-based forecasts 1 and 2 steps ahead, but by the 3-step horizon a random walk in either DFP or PDFP is substantially more accurate. Comparing the split sample and full sample results, the relative strength of PCE, DFP, and PDFP in the full sample is driven in part by the relative forecasting performances of these aggregates during recessions. Conversely, GDI and GDA do poorly in the full sample because their lagged releases relative to both the survey dates and the other series cause them to miss turning points. The relatively strong forecasting performance of SPF comes from handily outperforming the benchmark in and around recessions and being about on a par with it—outside the nowcast quarter—during expansions.

6. Blue chip exercise results We highlight the role of information by running a second exercise in which we generate our forecasts using the closing date of the first BC survey in each quarter. At the start of the first month of quarter t, all of our series— including GDI and GDA—would be available up to quarter t − 2.24 Furthermore, the observations would come from the third NIPA release for quarter t − 2, which incorporates more comprehensive information than is available 24 For example, for the BC survey released in January 2016, the end of our estimation sample across models is 2015:Q3. Then, the 1-step forecast across the models is made for 2015:Q4, the 2-step forecast is for 2016:Q1, and the 3-step forecast is for 2016:Q2. We omit the BC forecasts from the comparison because the survey respondents have a tremendous informational advantage: at the BC survey date, the previous quarter has already ended, and the BC respondents have access to much of the higher-frequency data from that quarter to inform their 1-step forecasts (which are backcasts in this case).

1822

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828

in the advance release. This exercise replicates the design of Chauvet and Potter (2013), who compare the realtime forecasting ability of an AR(2) in real GDP growth with those of twelve competing models, including the BC survey. Table 6 presents the relative RMSFEs for the entire sample. The results at all three forecast horizons for FSDP, DFP, PDFP, and PCE are broadly similar to those obtained when forecasting at the SPF date. The main difference from the SPF exercise comes from the results for GDI and GDA: because these series no longer lag the other measures, simple models that utilize GDI or GDA now also outperform the AR(2) benchmark, and these gains are statistically significant based on the DM test in a number of cases, even though they are usually smaller than those from the equivalent forecasting methods using other measures such as PCE or DFP. The results conditional on only expansionary quarters, shown in Table 7, are similar to those in Table 4. Across forecasting methods, the RW-AR(1) and VAR(1) again produce notable gains in forecast accuracy relative to the AR(2) benchmark, a number of which are statistically significant based on the DM test. Across measures, PCE generally performs better than the others, according to the lower relative RMSFEs from model averaging. During recessions, as is shown in Table 8, the GDIand GDA-based models now produce broad gains over the AR(2) benchmark across forecasting horizons, with a number of statistically significant forecasting gains based on the DM test. Clearly, the improvements in forecast accuracy that come from using GDA and GDI in the full sample are due to their usefulness during recessionary quarters. Across all aggregate measures and all forecasting methods, the smallest relative RMSFEs again come from the simplest models—the RW(1) specification using PCE growth for the 1- and 2-step horizons, and either DFP or PDFP at the 3-step horizon. 7. Discussion and related literature Our Blue Chip results are directly comparable to those of Chauvet and Potter (2013). Those authors document that a range of models are unable to produce more accurate forecasts than an AR(2) benchmark during expansions at the 1- and 2-step horizons.25 We show that NIPA aggregates can help in this regard, with some models producing lower RMSFEs from the early 1990s onward. During recessions, Chauvet and Potter (2013) find that an autoregressive model augmented with additional regressors from a dynamic factor model with Markov switching (AR-DFMS) produces RMSFEs relative to the AR(2) benchmark of 0.575 at the 1-step horizon and 0.583 at the 2-step horizon.26 In the AR-DFMS model, the Markov 25 Nowcasting approaches that take advantage of higher-frequency data have enjoyed success in outperforming univariate benchmarks at these horizons; see e.g. Bańbura et al. (2013), Carriero et al. (2015), Giannone, Reichlin, and Small (2008), and Higgins (2014). 26 This approach builds on work by Chauvet (1998), Chauvet and Hamilton (2006), Chauvet and Piger (2008), and Chauvet, Lu, and Potter (2012).

switching is driven by monthly data on four coincident indicators that the NBER Business Cycle Dating Committee uses to date recessions, one of which is real manufacturing and trade sales, which is released with a two-month lag. Thus, when making a forecast at the second quarter’s BC survey date, which is in early April, for example, the model is taking signals from monthly data that are available through January. In our approach, the relative RMSFE from our best measure during recessions, the RW(1) in PCE, is 0.585 at the 1-step horizon and 0.760 at the 2-step horizon. The former reading essentially matches the performance of the AR-DFMS model, while the ARDFMS model clearly improves upon the latter. However, it is worth noting that the PCE reading that we use is from two quarters earlier; e.g., in early April, we would be using the fourth-quarter PCE reading when making this forecast. Based on the timing of data releases, monthly PCE readings would be available through February by that point. Given that PCE is a useful near-term predictor of growth during recessions—and arguably at the ‘‘core’’ of GDP, as has been documented above—utilizing the monthly PCE data fully could yield further forecasting gains over those that we present based entirely on quarterly data. Of course, in practice, determining whether or not the economy is in a recession is extremely challenging. Our results have shown that, historically, the best simple models for forecasting GDP growth have differed depending on the state of the business cycle. In theory, the forecast accuracy would have benefitted from switching from the RW-AR(2) in PCE growth during expansions to the RW(1) in PCE growth during recessions. One could operationalize such switching by tying these forecasting models to a reliable real-time indicator of the state of the business cycle, such as the dynamic factor model with regime switching of Chauvet (1998). Allowing for such switching would seem a more fruitful avenue for research than performing simpler forecast combinations, precisely because the models that do relatively well in forecasting during expansions do not do so well during recessions and vice versa, as has been described. Thus, while we find gains in forecast accuracy from NIPA aggregates over the full sample, maximizing those gains would require a more complex framework than those pursued here. In practice, though, the inevitable uncertainty regarding the state of the economy implies that the forecasting gains that we document above from perfectly-timed switches would represent an upper bound. The main, and novel, contribution of this paper is its approach to forecasting GDP growth out-of-sample by using real-time data on NIPA aggregates. This exercise replicates the forecasting process and often generates different results from the in-sample fit using the most revised data. Notably, PDFP is intuitively appealing because it omits idiosyncratic quarterly fluctuations in inventories, trade, and government spending (especially those from defense spending). Dropping these three components of GDP lines up with the results of the LASSO exercise, and it appears to have the highest correlation with GDP growth at the 1-step horizon based on revised data (Council of Economic Advisers, 2015). However, we find little

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828 Table 6 Relative RMSFEs for real GDP forecasts at Blue Chip survey dates, full sample.

Notes: See the notes to Table 3.

Table 7 Relative RMSFEs for real GDP forecasts at Blue Chip survey dates, expansion-only sample.

Notes: See the notes to Table 3.

Table 8 Relative RMSFEs for real GDP forecasts at Blue Chip survey dates, recession-only sample.

Notes: See the notes to Table 3.

1823

1824

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828

evidence that this measure generally produces forecasts that are superior to those of other measures. Overall, a consistent picture emerges from our analysis that PCE growth is a useful measure of core GDP growth that specifically helps in predicting future GDP growth, frequently outperforming a typical univariate forecasting benchmark across time and within different phases of the business cycle. Along these dimensions, an aggregate measure of GDP that excludes inventories and trade data—DFP—is not far behind. The results in this paper using the SPF survey dates find little benefit from looking at GDI growth compared with a univariate benchmark for forecasting GDP growth, whereas when we use the BC survey dates, GDI is competitive with other measures and often outperforms GDP growth.27 On the surface, these mixed findings are in contrast to other recent research that looks favorably on GDI, although the details matter. In one study, Nalewaik (2012) finds that, historically, real-time GDI growth has been a better indicator than real-time GDP growth for determining that the economy has fallen into a recession. In working with the real-time data, Nalewaik (2012) uses the BEA’s third NIPA release—by which point both GDP and GDI data are available—to compute the recession probability for quarter t. Due to data release lags, the third NIPA release occurs at the end of the third month of quarter t + 1; hence, the study has an important backcasting component to it. When making forecasts based on BC survey dates, we also use the data from the BEA’s third NIPA release, and we find more positive results for GDI and GDA at that point.28 Nevertheless, our performance metric—forecasting GDP growth per se—differs from determining whether the economy is in a recession or not. While we document that GDI has difficulty in capturing the trend in GDP growth compared with other measures (see Table 2), thereby diminishing its attractiveness as a measure of core GDP, Nalewaik (2010) finds that GDI tends to provide a better broad summary of the state of the economy than GDP across a range of indicators. Using the real-time history of third NIPA releases, Nalewaik (2010) shows that lagged GDI growth readings have more explanatory content for 1- and 2-step-ahead GDP growth than lagged GDP growth. Running a formal outof-sample forecasting exercise using third-release data, our results at the BC survey dates are similarly favorable for GDI-based forecasts. However, we also document that the predictive content of GDI for forecasting GDP growth out-of-sample is diminished greatly at the SPF survey dates due to data release lags; in attempting to generate an ‘‘advance’’ estimate of GDI that would match up with the advance estimate of GDP, Nalewaik (2010) finds 27 Indeed, the GDI- and GDA-based forecasts enjoyed more statistically significant improvements than did the other measures. 28 For example, Nalewaik (2012) estimates the probability that the economy was in recession in Q1 using the third NIPA release of Q1 data, which would have become available at the end of June, which is in Q2. When making forecasts using BC data, our forecasts made in Q3 (July) use the data available at that time, which would have been the estimates of Q1 data released in June. As was noted in footnote 24, the 1-step forecast would have been for Q2, etc.

that the explanatory power of GDI declines substantially. In related research, the object of underlying interest of Aruoba et al. (2016) is the unobserved true state of output growth in the economy, whereas both GDP growth and GDI growth are assumed to contain measurement error; this unobserved true output growth loads on GDI growth more heavily than on GDP growth.29 However, Aruoba et al. (2016) note that delays in the arrival of GDI data would pose challenges in real time, and they do not examine the ability of their approach to forecast (reported) GDP growth. 8. Conclusion The U.S. national income and product accounts (NIPAs) provide a variety of measures of economic activity beyond expenditure-constructed GDP, including gross domestic income and other aggregates that exclude one or more of the components that make up GDP. Similarly to the way in which economists have attempted to use core inflation—which excludes volatile food and energy prices—to predict headline inflation, the omission of components or subcomponents of GDP from these NIPA aggregates may be useful in extracting a signal as to where GDP is going. This paper uses both in-sample and out-of-sample exercises to investigate the extents to which six NIPA aggregates constitute ‘‘core GDP’’. The six aggregates range from income-based GDI, to measures that exclude one or two components, to the single largest major component of GDP—consumption. Our in-sample exercises use LASSO regressions and comparisons with trend GDP growth to show that PCE growth, DFP growth, and PDFP growth have promising properties as measures of core GDP growth. We conduct an out-of-sample forecasting exercise by building a novel real-time dataset of NIPA aggregates through the collection, and in some cases the reconstruction, of real-time data vintages for multiple aggregate series in the BEA’s national income and product accounts. We format these vintages of real-time data to match the formatting conventions of other real-time data repositories in order to enable others to use them in their research. Using these real-time vintages in out-of-sample GDP growth forecasting exercises, we document gains in forecasting accuracy from the use of NIPA aggregates combined with relatively simple time series models relative to a canonical autoregressive benchmark model, over horizons of one to three quarters. Of the NIPA aggregates, the forecasting gains are strongest for consumption growth and DFP growth (i.e., the growth of GDP excluding inventories and trade), further strengthening their case as promising measures of core GDP growth. In contrast, the ability of GDI to compete with these other measures as a forecasting tool depends on its availability, as it suffers relative to the other advance NIPA data in cases where it has not yet been released. 29 The same results are true of Aruoba et al. (2012), who used a forecast-error-based approach to recovering the unobserved true state of output growth.

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828

Acknowledgments We thank the editor, associate editor, and two anonymous referees for helpful suggestions, along with conference participants at the 38th International Symposium on Forecasting and the 2nd Forecasting at Central Banks conference. The views expressed herein are those of the authors and do not necessarily represent the views of the Federal Reserve Bank of Cleveland or the Federal Reserve System. Appendix A. Real-time data Here, we detail the sources of the seven real-time data series that we use in the paper. Real gross domestic product (GDP) Real-time vintages for real GDP come from the Federal Reserve Bank of St. Louis’ ALFRED online database. Real gross domestic income (GDI) Our real-time real GDI vintages begin on January 29, 1992, and end on September 28, 2017. For vintages dated September 27, 2002, to September 28, 2017, we construct real-time vintages of the nominal GDP, nominal GDI, and real GDP using the BEA Data Archive: National Accounts (NIPA).30 Real-time vintage dates are assigned based on the dates of the NIPA releases, as reported by the BEA. From these, we deflate the nominal GDI using the implicit GDP deflator computed from the nominal and real GDP.31 For vintages dated January 29, 1992, to August 29, 2002, we hand-collect real GDI data as published in the BEA’s monthly Survey of Current Business (SCB) series. Most SCB publications include the most recent estimate of real GDI and a handful of earlier observations, while those that correspond to BEA annual revision releases include observations from the previous three years. We fill in the remaining time series of these vintages by using the real GDI vintage dataset of Nalewaik (2012).32 The columns (which represent different vintages) in this dataset are unlabeled, which requires us to assign a vintage date to each column. We do this by finding the intersection of the hand-collected SCB data and the Nalewaik (2012) 30 The data are available at: https://www.bea.gov/histdata/histChildLevels.cfm?HMI=7. 31 The time series history prior to 1995:Q1 is missing for the vintage dated July 31, 2009, for nominal GDP, nominal GDI, and real GDP. The ALFRED data indicate that the deep-history for this vintage is the same as that for the following vintage, dated August 27, 2009, for both nominal GDP and real GDP. Hence, we fill in the history of the vintages dated July 31, 2009, using the August 27, 2009, vintages of nominal GDP, nominal GDI, and real GDP. The real GDP vintage dated December 23, 2003, is also missing. The ALFRED data indicate that the data for this vintage are the same as those for the January 30, 2004, vintage, except for the last (i.e., the 2003:Q4) observation. Hence, we fill in the vintage of real GDP dated December 23, 2003, using all but the last value of the vintage dated January 30, 2004. 32 Data are available at: https://jmcb.osu.edu/archive/volume-44.

1825

data.33 , 34 On the basis of this match, we then fill in the observations for the 1992M1–2002M8 vintages going back to 1959Q4. Finally, we correct the following three vintage labels in the final real GDI dataset:

• The May 1997 SCB had the GDP release date listed as April 30, 1997. However, the SCB data actually come from the release on May 7, 1997, because the deep-history of real GDP in the May 7, 1997 vintage of ALFRED matches that in this SCB (i.e., this SCB coincided with a comprehensive revision). Also, matching the Nalewaik (2012) data to the data collected by hand in the SCB shows that the deephistory data changes from the vintage of March 28, 1997, to that of April 30, 1997. Hence, the date of this vintage in the collected-by-hand file was changed from April 30, 1997, to May 7, 1997. • The November 1999 SCB had the GDP release date listed as October 28, 1999. However, judging by the real GDP and GDP deflator ALFRED vintages, there was a release on October 29, 1999, that had the exact same data from 1994 onwards and additional observations earlier which differed from the previous vintage with a deep-history. That is, this release coincided with a comprehensive revision. Also, matching the Nalewaik (2012) data to the data collected by hand from the SCB shows that the deep history data changes from the vintage dated September 30, 1999, to that dated October 28, 1999. Hence, we changed the date of this vintage from October 28, 1999, to October 29, 1999. • The April 2000 SCB had the GDP release date listed as March 30, 2000. However, the SCB data actually come from the release on April 3, 2000, as, according to this SCB: ‘‘On March 30, 2000, as part of the comprehensive revision of the NIPA’s, BEA released revised NIPA estimates for 1929–58 that incorporated the definitional and statistical changes that had been incorporated earlier into the estimates beginning with 1959. In addition, BEA released revised estimates beginning with 1959 that incorporated corrections and a previously announced methodological improvement. The revisions were not sizable enough to affect the average annual growth rate in real GDP for 1929–58 or for 1959–98, but the growth rates for individual years were revised by as much as 0.5 percentage point’’ (page i). This 33 As a robustness check, we also assigned vintage labels to the Nalewaik (2012) data by using ALFRED vintages of nominal GDP, real GDP, and the GDP deflator, as well as vintages of nominal GDI from Haver Analytics going back to 1999M9. The resulting real-time real GDI set is identical to that derived by assigning vintage labels to the Nalewaik (2012) data on the basis of the hand-collected SCB data. 34 In cases where a column in the Nalewaik (2012) data clearly corresponds to a particular SCB vintage but the data differ between the two, we assign the vintage label to the Nalewaik (2012) column but take the available SCB data to be the truth. This occurs for five SCB vintages: February 23, 1996; May 2, 1996; August 1, 1996; October 30, 1996; and October 28, 1999. In each case, either there is one additional observation in the SCB vintage beyond the final observation available in the Nalewaik (2012) column, or the final observation differs between the two.

1826

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828

SCB had a longer history than most SCBs. In the real GDP ALFRED data, the deep history for the vintage dated April 3, 2000, is slightly different from that for the vintage dated March 30, 2000. Finally, after matching the Nalewaik (2012) data to the SCB data that were collected by hand, the deep history data changes slightly from the vintage dated February 25, 2000, to that dated March 30, 2000. Hence, we changed the date of this vintage in the collected-by-hand file from March 30, 2000, to April 3, 2000. While these changes have no impact on our results for GDI, we make them in order to match the vintages of real GDI with those of the real GDP more accurately when constructing the average of GDP and GDI. The average of GDP and GDI, which we denote by GDA Given real-time vintages of quarterly annualized growth rates of real GDP and real GDI, we calculate the real GDA growth by first matching each vintage of GDI with a vintage of GDP. All but a handful of vintages of GDI have exact matches. For these few, we take the nearest GDP vintage that occurs within five days of the GDI vintage and in the same month, and label the GDA vintage the later of the GDI and GDP vintages. We are able to match each vintage of GDI in this way. Due to the identities (real GDP) = (nominal GDP)/deflator and (real GDI) = (nominal GDI)/ deflator, the real GDA may be calculated as the arithmetic average of the real GDP and the real GDI. Therefore, we convert the growth rates of the real GDP and real GDI to level indices, take the arithmetic average, and compute the annualized quarter-over-quarter growth rates.

In addition to real-time vintage data on real PCE, we also collect real-time vintage data on real nonresidential (or business) fixed investment, BFI, and real residential investment, RES. In both cases, we require data from ALFRED and the RTDSM.35 Prior to the changeover to chainweighting with the January 19, 1996, NIPA release, real PDFP was simply the sum of real PCE, real BFI, and real RES. From the NIPA releases of January 19, 1996, to July 30, 2015, we construct PDFP using the chain-weighting formula of, e.g., Landefeld, Moulton, and Vojtech (2003):

√ It = It −1

P′t −1 Qt ′

Pt −1 Qt −1

×

P′t Qt P′t Qt −1

,

(5)

where It is the chain-type quantity index at time t for PDFP, Qt = [PCE, BFI, RES]′ is the vector of real quantities at time t for the components included in PDFP, and Pt is the vector with their associated chain-type price indexes. The PCE price index is available in ALFRED starting with the January 19, 1996, NIPA release. We compiled the BFI and RES price indexes from multiple sources. For vintages dated January 19, 1996, to August 29, 2002, we handcollected the BFI and RES price indexes from monthly issues of the SCB. For vintages dated September 27, 2002, to January 30, 2013, we collected real-time data on the BFI and RES price indexes from the BEA via the BEA Data Archive: National Accounts (NIPA). For vintages dated February 28, 2013, onward, the BFI and RES price indexes are available via ALFRED.36 As a check, for the NIPA vintage of July 30, 2015, we compared the time series history of annualized growth rates from our procedure for reconstructing the series with the annualized growth rates of the as-reported BEA series from ALFRED. The root mean squared error was 0.0151 percentage points, and the maximum absolute error was 0.0692 percentage points.

Real final sales of domestic product (FSDP) Real domestic final purchases (DFP) Real-time vintages for real final sales of domestic product (FSDP) come from ALFRED. Personal consumption expenditures (PCE) Real-time vintages for real PCE come from ALFRED. Real private domestic final purchases (PDFP) Formally referred to by the BEA as real final sales to private domestic purchasers, PDFP is GDP minus the change in private inventories, trade, and government spending; alternatively, it is the sum of PCE, nonresidential fixed investment, and residential investment. As with GDA, the BEA only began publishing this series officially in July 2015, although economists could have been calculating it themselves at any point prior. The real-time vintage data for PDFP are available in ALFRED from the July 30, 2015, NIPA release date onward. Prior to that time, we reconstruct the growth rates of PDFP that economists would have been able to construct for themselves in real time. To do so, we sum components, rather than starting from GDP and subtracting components.

Formally referred to by the BEA as real final sales to domestic purchasers, DFP is GDP minus the change in private inventories and net exports; alternatively, it can be built up as the sum of PCE, nonresidential fixed investment, residential investment, and government spending. While the BEA has a long history of reporting DFP, the real-time vintage data in ALFRED begin with the February 28, 2013, vintage. For vintages prior to January 19, 1996, which do not use chain-weighting, we collect real-time vintages of data on real government spending, GOVT, from ALFRED, and we compute real DFP as the sum of real PCE, real BFI, real RES, and real GOVT. For the January 19, 1996, vintage, 35 Following the BEA, the ALFRED data on BFI and RES from December 2003 onward do not have the entire time series history, but stop in the 1990s. The RTDSM data go back to 1947 by using the chain-weight quantity indexes to backfill the missing observations. We splice the series together to fill in the missing data. 36 The ALFRED data contain errors in both cases. The January 10, 2014, vintage is labeled erroneously, so we drop it. In addition, the vintages from August 29, 2013, to December 20, 2013, are missing. We insert these data using data collected in real time from the BEA.

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828

we collect real DFP by hand from the SCB for the period 1992:Q1 to 1995:Q3. For observations prior to 1992:Q1, we collect the chain-type price index for government spending from the SCB and use Eq. (5) to extend the real DFP series back to 1959:Q3. For vintages from January 19, 1996, to August 29, 2002, we hand-collect the real DFP from issues of the SCB. For vintages dated September 27, 2002, to January 30, 2013, we collected real-time data on real DFP from the BEA via the BEA Data Archive: National Accounts (NIPA). Appendix B. Supplementary data Supplementary material related to this article can be found online at https://doi.org/10.1016/j.ijforecast.2019. 03.024. References Aastveit, K. A., Anundsen, A. K., & Herstad, E. (2019). Residential investment and recession predictability. International Journal of Forecasting, (forthcoming). Aruoba, S. B., Diebold, F. X., Nalewaik, J., Schorfheide, F., & Song, D. (2012). Improving GDP measurement: a forecast combination perspective. In X. Chen, & N. Swanson (Eds.), Recent advances and future directions in causality, prediction, and specification analysis: essays in honor of Halbert L. White Jr. Springer. Aruoba, S. B., Diebold, F. X., Nalewaik, J., Schorfheide, F., & Song, D. (2016). Improving GDP measurement: a measurement-error perspective. Journal of Econometrics, 191(2), 384–397. Atkeson, A., & Ohanian, L. E. (2001). Are Phillips curves useful for forecasting inflation? Federal Reserve Bank of Minneapolis Quarterly Review, 25(1), 2–11. Bańbura, M., Giannone, D., Modugno, M., & Reichlin, L. (2013). Nowcasting and the real-time data flow. In G. Elliott, & A. Timmermann (Eds.), Handbook of economic forecasting: Vol. 2. North Holland: Elsevier. Bryan, M. F., & Cecchetti, S. G. (1994). Measuring core inflation. In N. G. Mankiw (Ed.), Monetary policy. Chicago: University of Chicago Press. Bryan, M. F., Cecchetti, S. G., & Wiggins II, R. L. (1997). Efficient inflation estimation. NBER Working Paper 6183. Bureau of Economic Analysis (2017). Concepts and methods of the U.S. national income and product accounts. U.S. Department of Commerce. https://www.bea.gov/sites/default/files/methodologies/nipahandbook-all-chapters.pdf. (Accessed 20 January 2019). Carriero, A., Clark, T. E., & Marcellino, M. (2015). Realtime nowcasting with a Bayesian mixed frequency model with stochastic volatility. Journal of the Royal Statistical Society, Series A (Statistics in Society), 178(4), 837–862. Chauvet, M. (1998). An econometric characterization of business cycle dynamics with factor structure and regime switches. International Economic Review, 39(4), 969–996. Chauvet, M., & Hamilton, J. D. (2006). Dating business cycle turning points. In C. Milas, P. Rothman, & D. van Dijk (Eds.), Nonlinear time series analysis of business cycles. North Holland: Elsevier. Chauvet, M., Lu, M. E., & Potter, S. (2012). Forecasting output growth during the Great Recession. Unpublished manuscript.. Chauvet, M., & Piger, J. (2008). A comparison of the real-time performance of business cycle dating methods. Journal of Business & Economic Statistics, 26(1), 42–49. Chauvet, M., & Potter, S. (2013). Forecasting output. In G. Elliott, & A. Timmermann (Eds.), Handbook of economic forecasting: Vol. 2. North Holland: Elsevier. Clark, T. E., & McCracken, M. W. (2009). Tests of equal predictive ability with real-time data. Journal of Business & Economic Statistics, 27, 441–454. Council of Economic Advisers (2015). Economic Report of the President. Washington, DC.

1827

Crone, T. M., Khettry, N. N. K., Mester, L. J., & Novak, J. A. (2013). Core measures of inflation as predictors of total inflation. Journal of Money, Credit and Banking, 45(2–3), 505–519. Croushore, D., & Stark, T. (2001). A real-time data set for macroeconomists. Journal of Econometrics, 105(1), 111–130. D’Agostino, A., Giannone, D., & Surico, P. (2006). (Un)Predictability and macroeconomic stability. European Central Bank working paper series no 605. Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13, 253–263. Edge, R. M., & Gürkaynak, R. S. (2010). How useful are estimated DSGE model forecasts for central bankers? Brookings Papers on Economic Activity, 41(2), 209–259. Edge, R. M., Kiley, M., & Laforte, J.-P. (2010). A comparison of forecast performance between Federal Reserve staff forecasts, simple reduced-form models, and a DSGE model. Journal of Applied Econometrics, 25, 720–754. Faust, J., & Wright, J. H. (2009). Comparing greenbook and reduced form forecasts using a large realtime dataset. Journal of Business & Economic Statistics, 27(4), 468–479. Faust, J., & Wright, J. H. (2013). Forecasting inflation. In G. Elliott, & A. Timmermann (Eds.), Handbook of economic forecasting: Vol. 2. North Holland: Elsevier. Federal Reserve Board of Governors (2006). Minutes of the Federal Open Market Committee, October (2006) 24–25. https: //www.federalreserve.gov/fomc/minutes/20061025.htm. (Accessed 26 February 2018). Fixler, D. J., Greenaway-McGrevy, R., & Grimm, B. T. (2014). The revisions to GDP, GDI, and their major components. Survey of Current Business, 94, 1–23. Foroni, C., & Marcellino, M. (2014). A comparison of mixed frequency approaches for nowcasting euro area macroeconomic aggregates. International Journal of Forecasting, 30, 554–568. Giannone, D., Reichlin, L., & Small, D. (2008). Nowcasting: the real time information al content of macroeconomic data releases. Journal of Monetary Economics, 55(4), 665–676. Harvey, D., Leybourne, S., & Newbold, P. (1997). Testing the equality of prediction mean squared errors. International Journal of Forecasting, 13, 281–291. Hendry, D. F., & Hubrich, K. (2011). Combining disaggregate forecasts or combining disaggregate information to forecast an aggregate. Journal of Business & Economic Statistics, 29, 216–227. Higgins, P. (2014). GDPNow: a model for GDP ‘nowcasting’. Federal Reserve Bank of Atlanta working paper 2014-7. Kawa, L. (2017). Here comes the economic growth that confidence data’s predicting. https://www.bloomberg.com/news/articles/201702-06/here-comes-the-economic-growth-that-confidence-data-spredicting. (Accessed 6 February 2017). Knotek II, E. S., & Zaman, S. (2017). Nowcasting U.S. headline and core inflation. Journal of Money, Credit and Banking, 49(5), 931–968. Landefeld, J. S., Moulton, B. R., & Vojtech, C. M. (2003). Chained-dollar indexes: issues, tips on their use, and upcoming changes. Survey of Current Business, 83, 8–16. Landefeld, J. S., Seskin, E. P., & Fraumeni, B. M. (2008). Taking the pulse of the economy: measuring GDP. Journal of Economic Perspectives, 22(2), 193–216. Liebermann, J. (2014). Real-time nowcasting of GDP: a factor model vs. professional forecasters. Oxford Bulletin of Economics and Statistics, 76(6), 783–811. Lütkepohl, H. (2006). Forecasting with VARMA models. In G. Elliott, C. W. J. Granger, & A. Timmermann (Eds.), Handbook of economic forecasting: Vol. 1. North Holland: Elsevier. Meyer, B. H., & Pasaogullari, M. (2010). Simple ways to forecast inflation: what works best? Federal Reserve Bank of Cleveland Economic Commentary 2010-17. Meyer, B., & Venkatu, G. (2014). Trimmed-mean inflation statistics: just hit the one in the middle. Federal Reserve Bank of Cleveland working paper no. 12-17R. Nalewaik, J. J. (2010). The income- and expenditure-side estimates of U.S. output growth. Brookings Papers on Economic Activity, 71–106. Nalewaik, J. J. (2012). Estimating probabilities of recession in real time using GDP and GDI. Journal of Money, Credit and Banking, 44(1), 235–253.

1828

C. Garciga and E.S. Knotek II / International Journal of Forecasting 35 (2019) 1814–1828

Pesaran, M. H., & Timmermann, A. (2009). Testing dependence among serially correlated multicategory variables. Journal of the American Statistical Association, 104, 325–337. Smith, J. K. (2004). Weighted median inflation: is this core inflation? Journal of Money, Credit and Banking, 36, 253–263. Tulip, P. (2009). Has the economy become more predictable? Changes in Greenbook forecast accuracy. Journal of Money, Credit and Banking, 41(6), 1217–1231.

Christian Garciga is a data scientist at the Federal Reserve Bank of Cleveland. He received an M.A. in economics from Duke University in 2015. Edward S. Knotek II is a senior vice president and research economist at the Federal Reserve Bank of Cleveland, where he leads the Research Department’s Macroeconomic Forecasting Group. He received a Ph.D. in economics from the University of Michigan in 2005.