Empirical simultaneous prediction regions for path-forecasts

Empirical simultaneous prediction regions for path-forecasts

International Journal of Forecasting 29 (2013) 456–468 Contents lists available at SciVerse ScienceDirect International Journal of Forecasting journ...

682KB Sizes 0 Downloads 28 Views

International Journal of Forecasting 29 (2013) 456–468

Contents lists available at SciVerse ScienceDirect

International Journal of Forecasting journal homepage: www.elsevier.com/locate/ijforecast

Empirical simultaneous prediction regions for path-forecasts Òscar Jordà a,b , Malte Knüppel c , Massimiliano Marcellino d,e,∗ a

Economic Research, MS 1130, Federal Reserve Bank of San Francisco, 101 Market St., San Francisco, CA 94105, United States

b

Department of Economics, University of California, Davis, One Shields Ave., Davis, CA 95616, United States

c

Deutsche Bundesbank, Wilhelm-Epstein-Str. 14, D-60431 Frankfurt am Main, Germany

d

Department of Economics, European University Institute, Via della Piazzuola 43, 50133 Firenze, Italy

e

Department of Economics, Bocconi University, Via Roberto Sarfatti 25, 20100 Milano, Italy

article

info

Keywords: Path-forecast Forecast uncertainty Simultaneous prediction region Scheffé’s S-method Mahalanobis distance

abstract This paper investigates the problem of constructing prediction regions for forecast trajectories 1 to H periods into the future—a path forecast. When the null model is only approximative, or completely unavailable, one cannot either derive the usual analytic expressions or resample from the null model. In this context, this paper derives a method for constructing approximate rectangular regions for simultaneous probability coverage that correct for serial correlation in the case of elliptical distributions. In both Monte Carlo studies and an empirical application to the Greenbook path-forecasts of growth and inflation, the performance of this method is compared to the performances of the Bonferroni approach and the approach which ignores simultaneity. © 2013 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

1. Introduction In this paper we determine how to measure the uncertainty associated with the forecast of a variable over several periods, known as a path-forecast in the terminology of Jordà and Marcellino (2010), when the model generating the forecast is unknown or unavailable. For example, several forecasting firms (Blue Chip Forecasting, Macroeconomic Advisors, Standard and Poor, etc.) release U.S. inflation forecasts but do not provide sufficient details about the forecasting model. The same holds for the macroeconomic path-forecasts published by many central banks like the Bank of England or the U.S. Federal Reserve. Providing appropriate prediction intervals for pathforecasts is complicated. The region containing a given proportion of realizations each period will not, in general, contain that proportion of the entire path-forecast’s

∗ Corresponding author at: Department of Economics, European University Institute, Via della Piazzuola 43, 50133 Firenze, Italy. E-mail addresses: [email protected], [email protected] (Ò. Jordà), [email protected] (M. Knüppel), [email protected] (M. Marcellino).

realizations. Conversely, the region designed to simultaneously contain a certain proportion of all future trajectories will, in general, lack the desired per-period coverage. This discrepancy is related to having multiple probability statements, serial correlation in multi-step-ahead forecast errors, and differences in the prediction error rate one which may wish to control for. These are some of the central themes in our paper. Prediction intervals for a path-forecast should summarize the information in high-dimensional densities appropriately. We investigate a simple method for their construction based on the work of Williams and Goodman (1971), whose ideas precede but are related to the resampling methods in Efron’s (1979) well-known bootstrap procedures, and subsequent literature (see for example Politis, Romano, & Wolf, 1999). In particular, the method we propose is based on a sample of forecast errors at different horizons and on the assumption that they have an elliptical distribution. Given typical predictive sample sizes, non-parametric estimates of the joint forecast error distribution and its relevant quantile contours (which one would need for constructing non-parametric simultaneous prediction regions) are impractical. Instead, we

0169-2070/$ – see front matter © 2013 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.ijforecast.2012.12.002

Ò. Jordà et al. / International Journal of Forecasting 29 (2013) 456–468

construct the empirical Mahalanobis (1936) distance, and from it we induce a simultaneous prediction region for the path forecast. Next, we provide a rectangular approximation to the induced simultaneous prediction region, in order to obtain appropriate, per-period, prediction intervals using a version of Scheffé’s (1953) projections. The resulting prediction intervals are an alternative to the wellknown Bonferroni intervals. Our proposed procedure relies on the assumption that the empirical distribution of the forecast errors remains constant over time. While this may appear restrictive, it is a weaker assumption than making conjectures about the data generating process, and hence its stability over time. In fact, a changing DGP approximated by an appropriately changing forecasting model may nevertheless produce stable forecast errors. This way of thinking differs from common applications of bootstrap procedures, such as forecasting applications with ARIMA models by Masarotto (1990), Thombs and Schucany (1990), Kim (1999), Pascual, Romo, and Ruiz (2004), and Clements and Taylor (2001). This literature typically relies on residual resampling given the null model. In contrast, we make no assumptions about whether the null model is correct or not. The performance of our procedure in Monte Carlo experiments is reasonably good, particularly when compared with the standard approach that ignores serial correlation in the elements of the path forecast. The method also works rather well when the distributional assumption is incorrect. As an empirical illustration, we construct prediction regions using Greenbook forecasts produced by the staff of the U.S. Federal Reserve (now referred to as Tealbook forecasts). The model used to generate the forecasts is not made available to the public. Nevertheless, we are able to compute empirical prediction regions for these forecasts, which is important for the evaluation of forecast uncertainty. The paper is structured as follows. Section 2 presents our method. Section 3 evaluates its performance in Monte Carlo experiments. Section 4 contains the empirical illustration. Section 5 summarizes the main results and concludes. 2. Statistical discussion Let yτ (h) be a random variable and let  yτ (h) denote a realization of this random variable—an h-step-ahead prediction of yτ +h , whose observed realization is denoted as  yτ +h . Hence, the forecast error  uhτ ≡  yτ +h −  yτ (h) is a realization of the random variable uhτ ≡ yτ +h − yτ (h), whose distribution is assumed to have mean zero (i.e., without loss of generality, we assume that the forecasts are unbiased) and constant variance σh2 . Further, we assume that uhτ is weakly stationary, which is less restrictive than making a stationarity assumption directly on yτ (h) or yτ +h . We are interested in the 1- to H-step-ahead predictions, with τ = R + 1, . . . , T − H observations being available for forecast evaluation, where N = T − H − R, and R indicates the estimation sample of the (possibly unknown) model underlying the forecast. We also find it convenient to define the H × 1 random vectors

457

Yτ (H ) ≡ [yτ (1), . . . , yτ (H )]′ ; Yτ ,H ≡ [yτ +1 , . . . , yτ +H ]′ ′    and Uτ (H ) ≡ [u1τ , . . . , uH τ ] , with Yτ (H ), Yτ ,H and Uτ (H ) denoting the actual forecasts, realizations and forecast errors, respectively. We assume that the distribution of the forecast error path Uτ (H ) is elliptically contoured, that is, it has the form: fU (Uτ (H )) = cH det (ΩH )−1/2 g [(Uτ (H ) − µU )′

× ΩH−1 (Uτ (H ) − µU )],

(1)

where cH is a normalizing constant, g [.] is a non-negative function which does not depend on τ , µU = 0, and ΩH is a positive definite H ×H matrix, the forecast error covariance matrix in what follows.1 This distributional assumption is restrictive but encompasses many of the forecasting environments encountered in practice, and makes the justification of the calculations that follow straightforward. We call Eq. (1) the path-error generating distribution, and the availability of N observations for evaluation will allow us to characterize it empirically. The path-error generating distribution reflects a variety of sources of uncertainty associated with the predictions in  Yτ (H ). Among these, the most common are: (i) uncertainty about the data generating distribution of Yτ ,H from which innovations are drawn; (ii) uncertainty about the null model used to describe the data generating distribution; and (iii) uncertainty about the parameter estimates of the null model in finite samples with respect to their population values. The reader is referred to Clements and Hendry (1998) for a more exhaustive list. For our purposes, it is unnecessary to be explicit about this breakdown, though it is important that the path-error generating distribution be stable over time. Our framework also extends to forecasting the paths of a vector of random variables and to the case of biased forecasts (µU ̸= 0), but we prefer to keep the discussion simple here. In the following subsections we discuss some particular statistical issues related to measuring forecast uncertainty, distinguishing the results that do not require distributional assumptions from those that do. 2.1. Period-specific prediction intervals Let us consider first the simpler problem of constructing a prediction interval for the individual elements of the path-forecast. That is, for each yτ +h , we want an interval Sh such that P (yτ +h ∈ Sh ) ≥ 1 − α for some prespecified error-level α chosen by the researcher and for h = 1, . . . , H. In the case of a unimodally distributed uhτ (in order to avoid having to discuss disjoint intervals here), Sh = yτ +h : c ≤ wh (uhτ ) ≤ c ,





(2)

where wh (.) : R → R. In the simplest case, wh (uhτ ) = uhτ = yτ +h − yτ (h), and an empirical prediction interval could

1 To be more precise, g [.] is a non-negative Lebesgue measurable  

function on [0, ∞), and cH equals cH = Γ

∞

H 2

H  2

/ Qπ

H 2

, with Q =

q g [q] dq < ∞. Note also that an elliptical distribution is not 0 necessarily normal. −1

458

Ò. Jordà et al. / International Journal of Forecasting 29 (2013) 456–468

be calculated from the (α/2)- and (1 − α/2)-quantiles of

 T −h  h T −h  uhτ ) τ =R+1 when N is sufficiently large. uτ τ =R+1 = wh (  h  T −h If the  uτ τ =R+1 are approximately Gaussian, and use

 σh2 =

1

T −h 

h ( uhτ − u )2 ,

N − 1 τ =R +1 h

with u denoting the sample mean, then another example is

wh ( uhτ ) ≡

h  uhτ − u ∼ t N −1 ,  σh

from which symmetric and approximately 100(1 − α)% coverage level predictive intervals can easily be constructed from the appropriate quantiles of the t-distribution. In order to assess the coverage errors from different methods, we introduce the auxiliary variable dhτ ∈ {0, 1}. This variable records whether a given realization yτ +h falls outside the pre-determined prediction region Sh , dhτ = 1, or not, dhτ = 0. If we choose Sh such that P (yτ +h ∈ Sh ) ≥ 1 − α , then E (dhτ ) = P (dhτ = 1) ≤ α, and the auxiliary variable dhτ provides a simple way to calculate the empirical value of α by averaging, either in applications or in simulations. 2.2. Simultaneous prediction intervals Constructing prediction regions for Yτ ,H is considerably more complicated. Suppose that the {Sh }H h=1 individual prediction regions are independent across horizons to begin with (an assumption that usually fails in time series data, but which we make temporarily for expositional purposes). In this case, the probability that all of the next H realizations of yτ will fall within the individual per-period prediction intervals is given by (1 − α)H . At this point it is useful to define another auxiliary variable for the purposes of evaluating path-forecast error H h rates. Let kH τ = h=1 dτ , that is, the number of realizations in the path that fall outside the union of per-period prediction regions Sh . In multiple comparison situations, it is often desirable to be able to control the probability error rate P (kH τ > 0) = γ , which we call the family-wise prediction error rate (FWPER), in order to use a nomenclature similar to that used in hypothesis testing (though our problem is different). In the example just described, note that by choosing constant individual error rates with P (dhτ = 1) = αh = α for h = 1, . . . , H, we have that H P (kH τ > 0) = 1 − (1 − α) .

For example, γ = 0.2649 if H = 6 and α = 0.05. A crude bound relating γ to the individual αh is provided by Bonferroni’s inequality, using the fact that 1 − P (kH τ > 0) ≥ 1 − α1 − · · · − αH ,

that is h P (∩H h=1 dτ = 0) ≥ 1 −

H 

P (dhτ = 1).

h=1

Hence, choosing αh = α/H ensures that γ ≤ α . There are, of course, several alternatives to providing a bound for γ using the αh under more restrictive assumptions, and there are also step-wise methods that can be used, such as Holm’s (1979) procedure. However, Bonferroni’s procedure is by far the most commonly used adjustment to correct for simultaneity, when an adjustment is made at all. For extensions of the Bonferroni procedure, the reader is referred to Lehmann and Romano (2005). Common practice in economics is to construct perperiod (point-wise) prediction intervals with the desired coverage, ignoring any issues related to simultaneity and control of the FWPER. In what follows, these intervals will be called marginal bands or intervals. Marginal bands are not entirely without justification. Implicit in this common practice is perhaps the desire to control the expected prediction error rate (EPER)

 E

kH τ



H

= ξ,

which has the nice property of being directly related to the marginal performance of each individual prediction interval. The EPER is related to αh by

 E

kH τ

 =

H

α1 + · · · + αH H

= ξ.

Although there is no requirement that αh = α , it is immediately clear that such a choice would guarantee that

ξ = α.

In the special case where the family consists of a single prediction interval,

ξ =E



k1τ 1



  = E d1τ = 1 = P (d1τ = 1) = α,

and EPER and FWPER coincide. However, in general,

 E

kH τ H



  ≤ P kHτ > 0 .

Independence across horizons will often fail in a forecasting setting, since positive serial correlation in yt means that if yτ +1 is above  yτ (1) then yτ +2 is more likely to be above  yτ (2) than not. In terms of the path-error generating distribution of Uτ (H ), this simply means that ΩH will not be diagonal. Consider then the problem of constructing a prediction region S such that P (Yτ ,H ∈ S ) ≥ 1 − α . In particular, it helps to transform the problem as follows. Let S = Yτ ,H : δ ≤ W (Uτ (H )) ≤ δ ,





where W (.) : RH → R is a one-dimensional distance function. In statistics, a popular distance function W (.) related to the class of elliptical distributions described in Eq. (1) is the

Ò. Jordà et al. / International Journal of Forecasting 29 (2013) 456–468

Mahalanobis distance Wτ =



Yτ (H ) − Yτ ,H

′

ΩH

Yτ (H ) − Yτ ,H

 −1



∈ [0, ∞),

(3)

where ΩH can be estimated from the evaluation sample using

H = Ω

T −H

1

   ′  Uτ (H ) − U (H ) , Uτ (H ) − U (H ) 

N − 1 τ =R+1 T −H

1   U (H ) = Uτ (H ), with E (U (H )) = 0 if the forecasts are unbiased. The Mahalanobis distance evaluates the distance of a point from the center of the ellipsoid, accounting for correlation. To help draw the connection between a prediction region based on Eq. (3) and the per-period prediction regions based on Eq. (2), notice that ΩH is symmetric and positive definite, and hence admits a unique Cholesky decomposition ΩH = QQ ′ , where Q is a lower triangular matrix. Another way to think of this decomposition is as the projection of the h-step-ahead forecast error into the 1 to h − 1 previous forecast errors. Let us define the path of orthogonalized forecast errors as Vτ (H ) = Q −1 (Uτ (H ) − U (H )). Rewriting Eq. (3) in terms of these orthogonalized forecast errors, we have

=

(Uτ (H ) − U (H ))′ ΩH−1 (Uτ (H ) − U (H ))



Vτ (H )′ Vτ (H ) (4)

h =1

Summarizing, the simultaneous prediction region S with P (Yτ ,H ∈ S ) = 1 − α based on the Mahalanobis distance W is an ellipsoid given by the condition Pr(Wτ2 ≤ δ 2 ) = 1 − α, which can be transformed into the sphere given by the condition P

H 

and hence P Wτ2 ≤ χ22 (1 − α) = 1 − α,





(6)

where χ22 (1 −α) refers to the critical value of a chi-squared random variable with two degrees of freedom evaluated at a confidence level of 1 − α . This prediction region is displayed in Fig. 1 for α = 0.05. Scheffé (1953) introduced a rectangular projection of this region using an inversion of the usual F -statistic in a simultaneous hypothesis test. The goal was to search for all of the directions where the null hypothesis could be rejected. Such a search is implicitly a data-snooping exercise, and the virtue of the Scheffé (1953) procedure is that it corrects the statistical statements to account for this search process. Geometrically, and in the context of the illustrative example in Fig. 1, the Scheffé (1953) intervals can be calculated without difficulty. These are yτ +h ∈ yτ (h) ± 

χ22 (1 − α) for h = 1, 2 (when the covariance matrix

  H  vτ (h)2 . =



Yτ ,2 ∼ N (Yτ (2), I ).

(Yτ ,2 − Yτ (2))′ (Yτ ,2 − Yτ (2)) ∼ χ22 ,

N τ =R +1



within the multi-dimensional sphere described in that expression for an appropriately chosen value of δ . However, we cannot represent it in two-dimensional space, while this is often desirable for clarity of presentation. One solution is to apply a version of Scheffé’s (1953) projections to our problem. Consider the easier case where H = 2 first, with the additional simplification

Then, the region such that P (Yτ ,2 ∈ S ) = 1 − α is the circumference resulting from noting that

where

Wτ =

459

has to be estimated, one can instead use the F -distribution critical values, which is the original form given by Scheffé, 1953). The Scheffé (1953) bounds are displayed in Fig. 1 as the box that contains the circumference corresponding to Eq. (6). As Miller (1981) explains, in practical applications, Scheffé (1953) intervals will generally be wider than Bonferroni intervals for small values of H, but will be narrower when H is large. The following lemma from Turner and Bowden (1977, p. 887), which is a version of the Cauchy–Schwarz inequality, will be useful in the derivations that follow.

 v τ ( h) ≤ δ 2

2

= 1 − α.

(5)

h=1

A sphere such as that in the previous expression is still a complicated object to represent in two-dimensional space—a rectangular region is far more convenient. The question we examine next is how to choose the vertices of this rectangular region. 2.3. Per-period simultaneous prediction intervals: a balanced approach Eq. (5) is a correct probabilistic statement about the proportion of realizations of Yτ ,H which are expected to fall

Lemma 1. Let a and V be H × 1 real vectors.   Let δ be a real constant. Then V ′ V ≤ δ 2 if and only if a′ V  ≤ δ(a′ a)1/2 for all real vectors a. Returning to Eq. (4) and applying the lemma, we have P (Wτ2 ≤ δ 2 ) = P (Vτ (H )′ Vτ (H ) ≤ δ 2 )

   = P a′ Vτ (H ) ≤ δ(a′ a)1/2 for all a ,

where a is any H × 1 real valued vector. As was explained by Turner and Bowden (1977), if at least one element of a is restricted to a specific value, one obtains the inequality P (Vτ (H )′ Vτ (H ) ≤ δ 2 ) ≤ P a′ Vτ (H ) ≤ δ(a′ a)1/2 .







460

Ò. Jordà et al. / International Journal of Forecasting 29 (2013) 456–468

Fig. 1. The geometry of two-dimensional prediction regions. Notes: 95% chi-square circumference. The square encasing this circumference indicates Scheffé (1953) bounds. The Turner–Bowden Rhombus results from applying the Cauchy–Schwarz inequality. The bands that we propose are shown by the square encircled by the 95% chi-square circumference, the result of rotating the Turner–Bowden rhombus.

For reasons that will be apparent soon, consider what happens when a = ( H1 , . . . , H1 )′ . Notice that a′ a = H1 , and hence





  P a′ Vτ (H ) ≤

δ2



H

    H 1   δ2   =P  vτ (h) ≤  H h=1  H   ′ 2 ≥ P Vτ (H ) Vτ (H ) ≤ δ = 1 − α.

 (7)

  P a′ Vτ (H ) ≤



δ2 H





H   ′  a Vτ (H ) ≤ ≥P h h=1

≥ 1 − α,



δ2



H (8)

where the middle term describes an H-cube. Fig. 1 displays the H-cube associated with this probabilistic statement for H = 2, which can be seen as the rhombus encasing the circumference given by Eq. (6). Rotating this rhombus gives the box corresponding to the Scheffé bounds mentioned above.

2 These cases are given by H = 2 and H = 4k with k ∈ N, where the orthogonal vectors can be obtained as the columns of an H × H Hadamard matrix multiplied by 1/H. In the other cases, some elements of a2 , a3 , . . . , aH differ from ± H1 .

δ2

iH , (9) H where iH is an H × 1 vector of ones, and δ will depend on the desired α . Note that this region guarantees that the allocation of prediction-error control is balanced in the sense of Beran (1988). Recall that Vτ (H ) = Q −1 (Uτ (H ) − U (H )), so that, to express the intervals in terms of the original forecast errors, we have

|Vτ (H )| ≤

The middle term of Eq. (7) makes a probabilistic statement about the interval associated with the average value of the orthogonalized components of the pathforecast. Now suppose that we consider H vectors of identical length, the vector a1 = a = ( H1 , . . . , H1 )′ , and all H − 1 vectors a2 , a3 , . . . , aH orthogonal to it. cases, these  In many  can be vectors with typical element ± H1 . 2 We then have that



For reasons that we will explain below, we propose the use of a smaller H-cube, which intersects the original circumference at the point where vτ (1) and vτ (2) simultaneously achieve maximum variability. In H-dimensional space, this region is defined by the H conditions

  −1  Q (Uτ (H ) − U (H )) ≤

δ2 H

iH .

(10)

Premultiplying Eq. (10) by |Q | and then applying the matrix inequality |AB| ≤ |A| |B| to the first expression, we get3

 2   Uτ (H ) − U (H ) ≤ |Q | δ iH ,

H and if the forecasts are unbiased such that U (H ) = 0 so that Y τ ,H = Y τ (H ), then

 Yτ ,H ∈ Yτ (H ) ± |Q |

δ2 H

iH .

(11)

3 Jordà and Marcellino (2010) neglected to invert Q using the absolute value. This is usually not a problem in common economic applications because path-forecasts tend to be positively correlated. However, the general result requires the absolute value transformation. This correction was first introduced in a manuscript by Staszewska-Bystrova (2012). The matrix inequality used is that given by Lütkepohl (1996, p. 40).

Ò. Jordà et al. / International Journal of Forecasting 29 (2013) 456–468

Eq. (11) forms the basis for the per-period prediction intervals that we propose in this paper and which we will call Scheffé bands or intervals, to use the Jordà and Marcellino (2010) nomenclature, and as distinct from the Scheffé bounds discussed earlier. The Scheffé bands are designed to deliver the correct Wald coverage. This means that the probability of observing a path with a Mahalanobis distance larger than that of the Scheffé bands is not larger than α .4 While any point on the sphere defined by Eq. (5) could be chosen for this purpose, the intervals defined by Eq. (9) minimize the FWPER for the orthogonalized forecast errors. Put differently, the Scheffé bands are the bands which maximize the traditional coverage, i.e., the coverage related to the FWPER, for the orthogonalized forecast errors, given correct Wald coverage. It should be noted that the traditional coverage of the Scheffé bands is smaller than 1 − α for the orthogonalized forecast errors by construction. By delivering the correct Wald coverage using the method described above, the Scheffé bands provide information about the dynamics of the forecast errors which is not contained in either marginal or Bonferroni bands. As a simple example, suppose that the DGP is an AR(1)-process with a positive coefficient, but that the forecaster has failed to detect the serial correlation in the data, and therefore uses the unconditional mean as his point forecast, meaning that the prediction intervals are constructed based on the empirical forecast errors.5 In this case, the expected widths of the marginal and Bonferroni bands, respectively, are identical for all forecast horizons, whereas the Scheffé bands become wider as the horizon increases. Two points deserve a mention. First, returning to Fig. 1, the intervals proposed in Eq. (11) correspond to the inner square mentioned earlier. That square is tangential to the circumference given by Eq. (6) and to the rhombus given by the middle term of Eq. (7). The point-wise intervals implied by that square are of equal lengths, in order to achieve the desired balance. Second, notice that the Scheffé intervals that we propose in Eq. (11) are narrower than the intervals based on the marginal distribution of each element (which in this example would correspond to the [−1.96, 1.96]×[−1.96, 1.96] square), and much narrower than the Scheffé bounds. In practice, when the null distribution is unknown and one wants to make as few assumptions as possible, the evaluation sample, N-sample for short (R-sample will refer to the sample used to estimate the model and T = R + N + H comprises all of the available data), can be used to construct the Scheffé intervals. Suppose that one wants to provide prediction intervals for  YT (H ). The NH , and the Cholesky sample can then be used to estimate Ω decomposition to calculate  Q . Finally, using N-sample H , one can generate estimates of Y τ (H ), together with Ω τ2 }T −H and use the (1 − α)-quantile of this sequence {W τ =R +1

4 The expression ‘Wald control’ seems appropriate, since the Mahalanobis distance is the main ingredient used in constructing the celebrated Wald statistic. 5 See below for the details of this construction.

461

as an estimate of δ 2 . Alternatively, with small N-samples and if the data are reasonably well approximated by the Gaussian distribution, δ 2 could be calculated as the critical value of a chi-square distributed random variable at a 1 −α confidence level or, if a small sample correction is desired, as H times the critical value of an F -distributed random variable with H , N degrees of freedom at a 1 −α confidence level. Another practical matter has to do with the fact that the width of the intervals depends on the choice of H. This is a natural consequence of the simultaneous nature of the evaluation problem. However, one could construct stable per-period intervals with some additional coverage by replacing the intervals in Eq. (11) with

H 2 δ ∈ Yτ (H ) ± | Q |  h  

Yτ ,H

h

,

(12)

h=1

where δh2 denotes the critical value calculated for period h (either empirically or using the chi-square or F distribution assumption) and where  the H × 1 vector in δ2

h brackets has a typical element for h = 1, . . . , H. h This is the version of the Scheffé bands we use for the Monte Carlo simulations and the application below, since we feel that researchers may feel more comfortable choosing a method of prediction interval construction that is independent of H.6 The Wald control of the prediction bands is evaluated as follows. Each type of band considered, i.e., the marginal, Bonferroni and Scheffé bands, generates an implied value for W 2 in expressions such as Eq. (3). Call this value Ws2 . In Monte Carlo simulations, we then compute the proportion

2

τ exceeds Ws2 . of the simulated path-forecasts whose W A proper calibration of the bands produces Wald control that corresponds to the desired α error rate. It should be clear that neither version of the Scheffé bands proposed is designed to guarantee correct FWPER control. In contrast, Bonferroni bands control the FWPER (by design), but their Wald coverage can deviate substantially from the prespecified nominal level. We examine these and many other features in the simulation exercises discussed in the next section. 3. Probability coverages of alternative prediction bands: simulation evidence We now present simulation evidence on the probability coverages of different types of prediction bands for a pathforecast. The next subsection describes the experimental design, and the following subsection reports the results. With regard to FWPER control, if the actual coverage of the bands exceeds (falls short of) the nominal level, the bands are commonly said to be too wide (narrow). For convenience, we adopt the same terminology when evaluating the Wald coverage.

6 The difference between Eqs. (11) and (12) depends on the choice of α . For common values of α such as 0.05 or 0.32, Eq. (12) tends to yield wider intervals.

462

Ò. Jordà et al. / International Journal of Forecasting 29 (2013) 456–468

3.1. Experimental design We simulate data from the AR(1)-model yt = ρ yt −1 + ϵt ,

(13)

with  the coefficient ρ equal to 0.5 or 0.9, E [ϵt ] = 0 and E ϵt2 = 1. Three different distributions are considered for the errors ϵt . The first is the standard normal distribution, and the second is the t-distribution with 4 degrees of freedom, scaled to have a variance equal to 1. In both cases, the errors are i.i.d. For the third distribution, ϵt is generated by the ARCH process

ϵt =



z t vt ,

zt = a0 + a1 ϵt2−1 , where vt is i.i.d. standard normal, and a0 = 1 −√a1 ,  so that E ϵt2 = 1. The parameter a1 is set to 1/ 3. Thus, in the case of the t-distributed errors and the ARCH errors, the parameters are set to the least extreme values, for which the kurtosis of ϵt does not exist. Choosing these parameter values yields similar unconditional nonnormal distributions of ϵt . For example, using the critical value c = 0.99 (1.96, 2.58), which yields a coverage of P (−c < ϵt < c ) = 0.68 (0.95, 0.99) in the case of the standard normal distribution, the coverage equals 0.77 (0.95, 0.98) in the case of the t-distribution and 0.75 (0.95, 0.98) in the case where ϵt is generated by the ARCH process. Note that the (unconditional) covariance matrices of the forecast errors are identical for all three of the error distributions considered. Again, let R denote the estimation sample size, N denote the predictive sample size, and H denote the length of the path-forecast considered. We consider the values R = 100, N = 40, and H = 1, 4, 8 and 12. For each value of H, we generate 10,000 Monte Carlo samples.7 At each replication, we use that sample to generate path-forecasts of length H over a predictive sample N using OLS estimates from an AR(1) model without a constant, fitted on R observations. Then, we construct three types of prediction bands: (1) traditional marginal error bands; (2) Bonferroni bands; and (3) Scheffé bands (using Eq. (12)). We consider both of the analytical formulas derived by Jordà and Marcellino (2010), based on the null model being the DGP, as well as the empirical formulas described in the previous section, based on the forecast errors only.8 The empirical covariance matrices of the forecast errors are estimated assuming unbiased forecasts. Thus, in the case of model-based   bands, two parameters have to be estimated, ρ and E ϵt2 . In the case of bands based on forecast errors, marginal bands and Bonferroni bands only require the H , estimation of the H elements on the main diagonal of Ω H with whereas for the Scheffé bands, the entire matrix Ω H (H + 1) /2 distinct elements has to be estimated. We examine bands for 68% and 95% probability coverage rates.

7 Additional simulations with R = 25 and N = 80 are available in the online Appendix of this article. 8 In the case of using the analytical formulas, the size N of the predictive sample is obviously of no consequence.

As was mentioned above, coverage is evaluated in terms of FWPER and Wald control. FWPER control means that any path with one or more elements outside the prescribed bands is considered to fall outside the bands. Wald control is calculated using the Mahalanobis distance squared, recording all those paths with distances above Ws2 , as defined in the previous section, as being outside the bands. We do not use the empirical critical values because the empirical predictive sample sizes usually do not allow for accurate estimates of their (1 − α)-quantiles. In practice, a normal distribution for the error terms is frequently assumed, and it is probable that the normality hypothesis often could not be rejected due to the small sample sizes, even if the errors were not normal. We therefore consider situations in which the errors are nonnormal, but the bands are constructed using the normality assumption. In addition to the design described above, we also investigate a simple case of misspecification, where we assume that the forecaster uses the misspecified model yt = ϵt

(14)

instead of the true model given by Eq. (13), so that  YT (H ) is an H × 1 vector of zeros. The only parameter that has to be estimated in the case of model-based bands is the second moment of yt . Finally, the true covariance matrix needed for evaluating the Wald coverage is assumed to be unknown, and therefore has to be estimated from the Nsample.9 3.2. Simulation results Table 1 shows the results obtained using the wellspecified model and model-based bands. In general, the marginal bands are much too narrow with respect to both FWPER control and Wald control. The coverage of the Bonferroni bands usually exceeds the nominal level only slightly with respect to FWPER control, but often does so strongly with respect to Wald control. The Wald coverage is mostly close to its nominal level for the Scheffé bands, but the bands are overly narrow with respect to FWPER control in many cases. This problem is less pronounced in the case of shorter forecast horizons or higher persistence. Similar results were found by Staszewska-Bystrova (2011), Staszewska-Bystrova and Winker (2013), and Wolf and Wunderli (2012). However, it should be noted that certain extreme results reported in these studies can be avoided by using |Q | instead of Q in Eq. (12). It can also be seen that, in general, the distribution of the forecast errors does not matter too much if the coverage is close to its nominal level. However, if the coverage deviates strongly from the nominal level, as is the case for the Wald coverage of the marginal bands, for example, large differences based on the distribution can sometimes be observed. Interestingly, the coverage of the bands often becomes larger in the case of non-normal errors. This

9 Additional simulations and results for a VAR model based on Stock and Watson (2001) with several types of misspecifications are provided by Jordà, Knüppel, and Marcellino (2010).

Ò. Jordà et al. / International Journal of Forecasting 29 (2013) 456–468

463

Table 1 Actual coverage in percent with the correct null model, model-based bands, R = 100. H

ρ

S

H

ρ

1 −α

M

B

S

ρ

1 −α

M

M

B

S

Norm FWPER t ARCH

1 1 1

0.5 68 0.5 68 0.5 68

67 76 74

67 76 74

67 76 74

1 1 1

0.9 0.9 0.9

68 68 68

67 77 75

67 77 75

67 77 75

4 4 4

0.5 0.5 0.5

68 68 68

25 39 44

73 77 82

49 60 64

4 4 4

0.9 68 0.9 68 0.9 68

34 47 52

77 82 84

60 69 71

Norm t ARCH

1 1 1

0.5 68 0.5 68 0.5 68

67 76 74

67 76 74

67 76 74

1 1 1

0.9 0.9 0.9

68 68 68

67 77 75

67 77 75

67 77 75

4 4 4

0.5 0.5 0.5

68 68 68

28 44 49

81 84 86

65 75 78

4 4 4

0.9 68 0.9 68 0.9 68

16 29 35

63 74 77

66 75 78

Norm FWPER t ARCH

1 1 1

0.5 95 0.5 95 0.5 95

95 95 96

95 95 96

95 95 96

1 1 1

0.9 0.9 0.9

95 95 95

95 95 95

95 95 95

95 95 95

4 4 4

0.5 0.5 0.5

95 95 95

82 83 86

95 92 93

89 88 90

4 4 4

0.9 95 0.9 95 0.9 95

85 87 88

96 94 94

93 93 93

Norm t ARCH

1 1 1

0.5 95 0.5 95 0.5 95

95 95 96

95 95 96

95 95 96

1 1 1

0.9 0.9 0.9

95 95 95

95 95 95

95 95 95

95 95 95

4 4 4

0.5 0.5 0.5

95 95 95

89 89 90

98 95 95

97 94 94

4 4 4

0.9 95 0.9 95 0.9 95

76 81 83

93 92 92

95 93 93

Norm FWPER t ARCH

8 8 8

0.5 68 0.5 68 0.5 68

7 16 23

74 74 80

39 50 55

8 8 8

0.9 0.9 0.9

68 68 68

19 29 34

80 82 82

59 67 68

12 12 12

0.5 0.5 0.5

68 68 68

2 8 13

73 72 77

32 42 48

12 12 12

0.9 68 0.9 68 0.9 68

11 19 23

80 82 82

58 67 68

Norm t ARCH

8 8 8

0.5 68 0.5 68 0.5 68

10 24 33

91 89 89

65 75 77

8 8 8

0.9 0.9 0.9

68 68 68

1 6 10

50 64 68

65 75 76

12 12 12

0.5 0.5 0.5

68 68 68

4 16 24

95 91 90

66 74 77

12 12 12

0.9 68 0.9 68 0.9 68

0 1 2

39 55 62

65 74 76

Norm FWPER t ARCH

8 8 8

0.5 95 0.5 95 0.5 95

68 71 77

94 89 91

83 81 84

8 8 8

0.9 0.9 0.9

95 95 95

77 79 81

95 93 93

92 92 92

12 12 12

0.5 0.5 0.5

95 95 95

58 60 68

94 86 89

76 72 76

12 12 12

0.9 95 0.9 95 0.9 95

69 73 75

95 93 92

92 91 91

Norm t ARCH

8 8 8

0.5 95 0.5 95 0.5 95

86 87 87

100 97 96

98 94 93

8 8 8

0.9 0.9 0.9

95 95 95

44 59 65

87 87 87

95 92 91

12 12 12

0.5 0.5 0.5

95 95 95

85 84 85

100 97 96

98 93 92

12 12 12

0.9 95 0.9 95 0.9 95

22 39 49

81 82 83

96 92 91

Wald

Wald

Wald

Wald

1 −α

M

B

H

B

S

H

ρ

1 −α

Notes: H refers to the forecast horizon, ρ refers to the autoregressive parameter of the DGP, 1 − α refers to the nominal coverage, M stands for ‘‘marginal’’, B stands for ‘‘Bonferroni’’, and S stands for ‘‘Scheffé’’.

might appear surprising at first sight, but it is directly related to the coverage for certain critical values reported above. For a nominal coverage of 68%, bands based on a normal distribution are actually too wide if the errors are t-distributed or follow an ARCH process.10 Table 2 displays the results for the well-specified model and bands based on forecast errors. It can be seen that, in general, the coverage is lower than with the model-based bands. While the differences are relatively small for H = 1 and H = 4, they can become large for the longer forecast horizons. For example, with H = 12, 1 −α = 0.68, ρ = 0.9 and ARCH errors, the Wald coverages of the Bonferroni and Scheffé bands drop by 0.22 and 0.26, respectively. Considering 1 − α = 0.95 and all the same settings as before otherwise, the Wald coverage of the marginal bands decreases from 0.49 if based on the model to 0.29 based on the forecast errors. In general, the coverage of the Scheffé bands appears to be affected more strongly, especially for the longer horizons. This seems plausible, because the differences with respect to the results in Table 1 are due to higher levels of estimation uncertainty, and the Scheffé bands require the estimation of the entire covariance matrix of the forecast errors, whereas for the marginal and Bonferroni bands, only the elements on the main diagonal are needed. The results in Table 3 are the coverage rates that can be observed if the misspecified model in Eq. (14)

10 Had we considered 99% bands, we would, of course, have found that the bands based on the normal distribution tend to have a smaller coverage in the cases of non-normal errors.

is used for the construction of the bands. For the marginal and Bonferroni bands, the results with respect to FWPER control are very similar to those reported in Table 1 for the correct model. The reason for this is that the misspecification leads to a larger estimate of the residual variance, which the model uses as the estimate of the forecast error variances for all horizons. This causes the bands to become wider, and therefore the coverage rates are broadly as before, although the variances of the forecast errors are larger than before. For the Scheffé bands, the coverage with respect to FWPER control decreases, because the true covariances of the forecast errors differ from the model-based covariances, which equal zero. For the Wald coverage evaluation, the covariance matrix implied by the misspecified model is used. Thus, the results are necessarily misleading. For the marginal and Bonferroni bands, the reported Wald coverage is larger than the true coverage of these bands, because the correlation of the forecast errors is ignored in the calculation of the Mahalanobis measure. For the Scheffé bands, the errors made in their construction and their evaluation tend to cancel each other out, so that their reported Wald coverage is always relatively close to the nominal level. Table 4 contains the results for the misspecified model and bands based on forecast errors. With respect to FWPER control, the performances of the marginal and Bonferroni bands are similar to those based on the misspecified model, as reported in Table 3. The coverage of the Scheffé bands increases strongly, because here the correlation of the forecast errors is considered in their construction, so that the bands become wider. Since the Wald coverage

464

Ò. Jordà et al. / International Journal of Forecasting 29 (2013) 456–468

Table 2 Actual coverage in percent with the correct null model, forecast error-based bands, R = 100, N = 40. H

ρ

M

B

S

H

ρ

1 −α

M

B

S

ρ

1 −α

M

B

S

M

B

S

Norm FWPER t ARCH

1 1 1

0.5 68 0.5 68 0.5 68

68 75 71

68 75 71

68 75 71

1 1 1

0.9 0.9 0.9

68 68 68

68 74 70

68 74 70

68 74 70

4 4 4

0.5 0.5 0.5

68 68 68

26 36 39

72 73 75

49 56 57

4 4 4

0.9 68 0.9 68 0.9 68

35 44 45

76 79 77

59 66 64

Norm t ARCH

1 1 1

0.5 68 0.5 68 0.5 68

68 75 71

68 75 71

68 75 71

1 1 1

0.9 0.9 0.9

68 68 68

68 74 70

68 74 70

68 74 70

4 4 4

0.5 0.5 0.5

68 68 68

27 40 41

78 79 78

62 68 67

4 4 4

0.9 68 0.9 68 0.9 68

15 26 27

60 67 66

63 69 68

Norm FWPER t ARCH

1 1 1

0.5 95 0.5 95 0.5 95

94 93 93

94 93 93

94 93 93

1 1 1

0.9 0.9 0.9

95 95 95

94 94 93

94 94 93

94 94 93

4 4 4

0.5 0.5 0.5

95 95 95

81 80 80

94 89 89

87 84 84

4 4 4

0.9 95 0.9 95 0.9 95

83 83 83

94 92 91

92 90 89

Norm t ARCH

1 1 1

0.5 95 0.5 95 0.5 95

94 93 93

94 93 93

94 93 93

1 1 1

0.9 0.9 0.9

95 95 95

94 94 93

94 94 93

94 94 93

4 4 4

0.5 0.5 0.5

95 95 95

87 85 83

97 93 91

94 91 88

4 4 4

0.9 95 0.9 95 0.9 95

71 74 73

90 87 85

92 89 87

Norm FWPER t ARCH

8 8 8

0.5 68 0.5 68 0.5 68

8 15 20

72 68 70

34 40 42

8 8 8

0.9 0.9 0.9

68 68 68

19 27 30

78 76 76

57 62 61

12 12 12

0.5 0.5 0.5

68 68 68

3 7 11

71 64 66

25 29 32

12 12 12

0.9 68 0.9 68 0.9 68

12 18 21

77 76 73

56 61 59

Norm t ARCH

8 8 8

0.5 68 0.5 68 0.5 68

13 23 26

82 80 78

53 59 59

8 8 8

0.9 0.9 0.9

68 68 68

1 5 7

41 51 52

53 59 60

12 12 12

0.5 0.5 0.5

68 68 68

10 17 20

83 80 78

43 51 51

12 12 12

0.9 68 0.9 68 0.9 68

0 1 2

28 38 40

44 51 50

Norm FWPER t ARCH

8 8 8

0.5 95 0.5 95 0.5 95

66 64 67

93 85 86

73 69 70

8 8 8

0.9 0.9 0.9

95 95 95

74 74 73

93 89 87

91 89 86

12 12 12

0.5 0.5 0.5

95 95 95

55 53 57

92 82 81

57 54 54

12 12 12

0.9 95 0.9 95 0.9 95

67 67 66

92 88 86

89 87 84

Norm t ARCH

8 8 8

0.5 95 0.5 95 0.5 95

79 77 75

97 93 91

91 86 83

8 8 8

0.9 0.9 0.9

95 95 95

37 46 47

77 75 74

87 82 80

12 12 12

0.5 0.5 0.5

95 95 95

72 71 70

96 92 90

85 80 77

12 12 12

0.9 95 0.9 95 0.9 95

16 27 29

64 65 63

79 75 72

Wald

Wald

Wald

Wald

1 −α

H

H

ρ

1 −α

Notes: H refers to the forecast horizon, ρ refers to the autoregressive parameter of the DGP, 1 − α refers to the nominal coverage, M stands for ‘‘marginal’’, B stands for ‘‘Bonferroni’’, and S stands for ‘‘Scheffé’’.

Table 3 Actual coverage in percent with the misspecified null model, bands based on the model, R = 100. H

ρ

ρ

1−α

M

1−α

M

Norm FWPER t ARCH

1 1 1

0.5 68 0.5 68 0.5 68

68 73 73

68 73 73

68 1 73 1 73 1

0.9 0.9 0.9

68 68 68

65 65 65 67 67 67 68 68 68

4 0.5 68 4 0.5 68 4 0.5 68

26 37 40

73 76 77

30 41 44

4 0.9 4 0.9 4 0.9

68 68 68

42 47 48

79 81 81

45 50 51

Norm t ARCH

1 1 1

0.5 68 0.5 68 0.5 68

68 73 73

68 73 73

68 1 73 1 73 1

0.9 0.9 0.9

68 68 68

65 65 65 67 67 67 68 68 68

4 0.5 68 4 0.5 68 4 0.5 68

61 68 69

96 94 92

67 72 73

4 0.9 4 0.9 4 0.9

68 68 68

63 64 66

91 91 90

66 67 69

Norm FWPER t ARCH

1 1 1

0.5 95 0.5 95 0.5 95

95 94 94

95 94 94

95 1 94 1 94 1

0.9 0.9 0.9

95 95 95

93 93 93 93 93 93 93 93 93

4 0.5 95 4 0.5 95 4 0.5 95

82 82 83

94 91 91

70 73 76

4 0.9 4 0.9 4 0.9

95 95 95

85 86 86

95 94 93

77 78 79

Norm t ARCH

1 1 1

0.5 95 0.5 95 0.5 95

95 94 94

95 94 94

95 1 94 1 94 1

0.9 0.9 0.9

95 95 95

93 93 93 93 93 93 93 93 93

4 0.5 95 4 0.5 95 4 0.5 95

98 96 95

100 99 98

96 93 92

4 0.9 4 0.9 4 0.9

95 95 95

94 93 93

98 98 97

90 90 90

Norm FWPER t ARCH

8 8 8

0.5 68 0.5 68 0.5 68

8 15 20

73 72 74

11 8 19 8 24 8

0.9 0.9 0.9

68 68 68

26 82 30 12 0.5 68 30 82 34 12 0.5 68 33 80 37 12 0.5 68

2 7 11

74 67 71

4 12 0.9 9 12 0.9 15 12 0.9

68 68 68

17 21 25

81 81 79

20 25 28

Norm t ARCH

8 8 8

0.5 68 0.5 68 0.5 68

58 100 64 97 66 96

66 8 70 8 71 8

0.9 0.9 0.9

68 68 68

61 96 66 12 0.5 68 63 95 67 12 0.5 68 64 94 68 12 0.5 68

56 61 65

100 99 97

67 12 0.9 69 12 0.9 71 12 0.9

68 68 68

60 61 63

98 97 95

66 66 67

Norm FWPER t ARCH

8 8 8

0.5 95 0.5 95 0.5 95

68 68 72

95 87 89

41 8 49 8 54 8

0.9 0.9 0.9

95 95 95

78 94 59 12 0.5 95 78 93 62 12 0.5 95 78 91 64 12 0.5 95

57 56 62

94 84 86

22 12 0.9 30 12 0.9 38 12 0.9

95 95 95

72 73 71

94 93 90

47 50 51

Norm t ARCH

8 8 8

0.5 95 0.5 95 0.5 95

99 100 97 99 96 99

96 8 92 8 92 8

0.9 0.9 0.9

95 95 95

95 99 88 12 0.5 95 94 99 87 12 0.5 95 93 98 87 12 0.5 95

100 97 96

100 100 99

96 12 0.9 91 12 0.9 90 12 0.9

95 95 95

95 100 95 99 93 98

87 86 85

Wald

Wald

Wald

Wald

1−α

M

B

S

H

B

S

H

ρ

1−α

M

B

S

H

ρ

B

S

Notes: H refers to the forecast horizon, ρ refers to the autoregressive parameter of the DGP, 1 − α refers to the nominal coverage, M stands for ‘‘marginal’’, B stands for ‘‘Bonferroni’’, and S stands for ‘‘Scheffé’’.

is evaluated based on the empirical covariance matrix of the forecast errors, on average, the correct Mahalanobis distance is used for the evaluation. Therefore, the results obtained are broadly in line with those reported in Table 2.

To summarize, if the bands are based on forecast errors instead of a model, the increased estimation uncertainty tends to lead to lower coverage rates. These effects are strongest for the Scheffé bands, because they require an

Ò. Jordà et al. / International Journal of Forecasting 29 (2013) 456–468

465

Table 4 Actual coverage in percent with the misspecified null model, bands based on forecast errors, N = 40. H

ρ

M

B

S

H

ρ

M

B

S

ρ

1 −α

M

B

S

ρ

1 −α

M

B

S

Norm FWPER t ARCH

1 1 1

0.5 68 0.5 68 0.5 68

67 73 71

67 73 71

67 73 71

1 1 1

0.9 68 0.9 68 0.9 68

61 62 63

61 62 63

61 62 63

4 4 4

0.5 0.5 0.5

68 68 68

26 36 39

71 72 75

49 54 57

4 4 4

0.9 0.9 0.9

68 68 68

38 41 43

75 75 75

57 57 58

Norm t ARCH

1 1 1

0.5 68 0.5 68 0.5 68

67 73 71

67 73 71

67 73 71

1 1 1

0.9 68 0.9 68 0.9 68

61 62 63

61 62 63

61 62 63

4 4 4

0.5 0.5 0.5

68 68 68

27 38 42

77 78 78

61 67 67

4 4 4

0.9 0.9 0.9

68 68 68

11 17 19

50 55 56

61 64 64

Norm FWPER t ARCH

1 1 1

0.5 95 0.5 95 0.5 95

94 94 93

94 94 93

94 94 93

1 1 1

0.9 95 0.9 95 0.9 95

92 92 92

92 92 92

92 92 92

4 4 4

0.5 0.5 0.5

95 95 95

81 80 80

93 90 89

86 84 83

4 4 4

0.9 0.9 0.9

95 95 95

81 81 80

92 91 89

88 87 87

Norm t ARCH

1 1 1

0.5 95 0.5 95 0.5 95

94 94 93

94 94 93

94 94 93

1 1 1

0.9 95 0.9 95 0.9 95

92 92 92

92 92 92

92 92 92

4 4 4

0.5 0.5 0.5

95 95 95

86 85 83

96 93 91

94 90 89

4 4 4

0.9 0.9 0.9

95 95 95

60 65 65

82 81 80

87 85 84

Norm FWPER t ARCH

8 8 8

0.5 68 0.5 68 0.5 68

8 15 19

71 68 70

34 38 42

8 8 8

0.9 68 0.9 68 0.9 68

24 27 29

75 74 73

55 54 55

12 12 12

0.5 0.5 0.5

68 68 68

3 7 11

71 63 66

25 29 31

12 12 12

0.9 0.9 0.9

68 68 68

17 20 22

74 73 71

54 53 52

Norm t ARCH

8 8 8

0.5 68 0.5 68 0.5 68

13 23 26

82 80 78

53 59 58

8 8 8

0.9 68 0.9 68 0.9 68

1 3 5

33 40 43

52 56 56

12 12 12

0.5 0.5 0.5

68 68 68

11 18 21

82 79 78

43 50 51

12 12 12

0.9 0.9 0.9

68 68 68

0 1 2

24 33 34

43 48 48

Norm FWPER t ARCH

8 8 8

0.5 95 0.5 95 0.5 95

66 64 67

92 85 85

71 68 69

8 8 8

0.9 95 0.9 95 0.9 95

72 72 70

90 88 86

85 84 82

12 12 12

0.5 0.5 0.5

95 95 95

56 52 57

92 81 81

56 53 54

12 12 12

0.9 0.9 0.9

95 95 95

66 64 63

88 86 83

83 81 79

Norm t ARCH

8 8 8

0.5 95 0.5 95 0.5 95

78 76 76

96 92 91

91 85 83

8 8 8

0.9 95 0.9 95 0.9 95

28 37 38

67 67 66

82 78 76

12 12 12

0.5 0.5 0.5

95 95 95

72 71 70

95 92 90

86 80 78

12 12 12

0.9 0.9 0.9

95 95 95

14 22 25

54 57 56

74 71 68

Wald

Wald

Wald

Wald

1 −α

1 −α

H

H

Notes: H refers to the forecast horizon, ρ refers to the autoregressive parameter of the DGP, 1 − α refers to the nominal coverage, M stands for ‘‘marginal’’, B stands for ‘‘Bonferroni’’, and S stands for ‘‘Scheffé’’.

Table 5 Coverages of alternative bands for US inflation and growth. Nominal

Inflation

Growth

68

95

68

95

Marg Bonf FWPER control

Schef

Marg

Bonf

Schef

Marg

Bonf

Schef

Marg

Bonf

Schef

Actual

31 79 Wald control

60

88

92

93

27

72

51

73

100

91

Actual

16

75

75

88

91

40

83

73

85

99

99

73

Note: The table reports the coverage rates of alternative bands for forecast paths in percent. The sample 1974:Q2–1985:Q1 is used for the estimation of the first variance covariance matrix of forecast errors. The first forecast path used starts in 1985:Q2, and the exercise is repeated in a rolling fashion. The last available forecast path comes from 2003:Q4, so that the coverages of bands are evaluated for 75 paths.

additional estimation of covariances. In the case of model misspecification, the evaluation of the Wald coverage can give very misleading results. Moreover, the coverage of the Scheffé bands in particular can be strongly affected if they are based on a misspecified model. Both problems can be avoided if the empirical forecast errors are used for both the construction of the bands and the evaluation of the Wald coverage. If the errors are assumed to be normal although they are not, the effects on the coverage depend on the nominal level of α and the type of nonnormality, but they are relatively small in most of the settings considered here.

colloquially known as the Greenbook forecasts (and more recently as the Tealbook forecasts), are made available to all of the members of the Federal Open Market Committee (FOMC) prior to each FOMC meeting. Because the forecasting model is not made available, nor is there routine reporting of the forecasting uncertainty, we find it a natural environment for the application of our methods. The sample of Greenbook forecasts is relatively large, meaning that it can be used to evaluate our prediction regions simultaneously by splitting the predictive sample and saving the second subsample for evaluation. Notice that, while the forecasts and the realizations are available, the null model is not, and hence, one has to rely entirely on the empirical approach we have proposed.

4. An application to the Greenbook forecasts 4.1. Data and the construction of bands The staff at the Board of Governors of the Federal Reserve routinely produce forecasts of inflation and output, among many other variables. These forecasts,

We have collected quarterly Greenbook forecasts starting in 1974:Q2, for which H = 5 is available. The

466

Ò. Jordà et al. / International Journal of Forecasting 29 (2013) 456–468

Fig. 2. Widths of the empirical prediction bands based on Greenbook forecasts of output and inflation. Sample: 1974Q2–2003Q4. Notes: These bands show how much should be added to/subtracted from a path-forecast in order to construct prediction bands with a 95% coverage, as defined in the paper (i.e., they have been re-centered at zero). The bands are constructed from the empirical sample of forecast errors using a rolling window that begins with the sample 1974Q2–1985Q1. The data come from the Federal Reserve’s Greenbook forecasts (now called Tealbook). Output is measured as GNP until 1991 and GDP afterwards. We report the forecasts based on the annualized quarterly growth rate. The inflation forecast is based on the GNP/GDP deflator and calculated as the annualized quarterly growth rate. For h = 0, it refers to a nowcast. For h = 1, 2, 3, 4, it refers to the h-period-ahead forecast.

data for h = 1 represents a nowcast, so that we have path-forecasts up to 1 year ahead. The sample ends in 2003:Q4, since the Greenbook data are released with a delay of 5 years. The forecasts are for output growth (where output is measured by GNP until 1991 and GDP afterwards) and inflation (measured as growth in the GNP deflator until 1991 and in the GDP deflator afterwards), measured as 400 ∗ ln(yt /yt −1 ). To enable comparisons with actual

data, we use the chain-weighted GNP and GDP, and their deflators.11 Using forecasts and the corresponding actual data, we have constructed the sequences of path-forecast errors and tested them for normality using the test of Bai and

11 Here, we use the data as published in 2009. Jordà et al. (2010) also considered the role of real time data in the context of this application.

Ò. Jordà et al. / International Journal of Forecasting 29 (2013) 456–468

Ng (2005). Having obtained non-rejections of normality in all cases, we proceed using the normality assumption.12 We then used the error sequences to estimate the ΩH matrix empirically on the predictive sample. Of course, the analytical approach is not available, since the null model that produces the forecasts is unknown. Next, we construct marginal, Bonferroni and Scheffé bands for the growth and inflation forecasts for different nominal coverage rates. In Fig. 2, we plot the (one-sided) 95% bands, centered at zero for convenience.13 In general, the Bonferroni bands are wider than the Scheffé bands for low values of h, but they are sometimes narrower for higher values of h.14 The commonly-used marginal bands are clearly the narrowest, except for h = 1. We now assess the relative performances of the bands in the application at hand in terms of actual coverage rates. For the evaluation, we split the sample at T ∗ = 1985:Q1. With the sample of forecast paths starting in 1974:Q2 and ending at T ∗ , we get 40 forecast paths with the corresponding errors, from which we construct the H . This estimate is used to construct empirical estimate Ω uncertainty bands for the forecast path from T ∗ + 1, H by rolling the i.e., from 1985:Q2. We then calculate Ω window of forecast paths forward, one observation at a time, by deleting the first path and adding one path at the end; we also roll the forecast path for which we construct uncertainty bands. We keep doing this until uncertainty bands have been constructed for the final path starting in T = 2003:Q4, for which the final window of forecast paths H . starting from 1992:Q4 to 2002:Q3 is used to determine Ω In this way, we produce uncertainty bands for 75 forecast paths. This procedure yields a set of measures of pathforecast uncertainty that can be compared with the actual realizations in order to compute the actual coverage rate of each type of prediction band. Table 5 reports the actual coverage rates under FWPER and Wald control for U.S. inflation and output growth for nominal levels of 68% and 95%. As expected, the marginal bands perform very poorly, particularly for the 68% nominal coverage rates, but they are also dominated by the Bonferroni and Scheffé bands for the 95% level. The ranking of the latter is less clear-cut. On the whole, in terms of FWPER control, the Bonferroni bands are preferable. For Wald control, the Scheffé bands tend to yield the best results. Thus, the empirical results are in line with those found in the Monte Carlo simulations overall, but they also confirm a certain robustness in the coverage of the bands, even in the likely presence of structural changes in the distribution of the forecast errors due to the Great Moderation described by McConnell and Pérez Quirós (2000).

12 Detailed results are available upon request. 13 This means that, for any given path-forecast, adding or subtracting the values that we provide in the figure would generate the appropriate 95% prediction interval. 14 This counterintuitive behavior is likely to be due to the lack of predictability for horizons with h ≥ 4, which causes the forecast error variances to be very similar for these horizons, as well as to the estimation uncertainty implied by the small sample size.

467

5. Conclusions This paper introduces new methods for computing prediction intervals with simultaneous probability coverages. We make few assumptions in deriving our results, and they can be applied in situations where the model generating the forecasts may be unavailable. Using our methods, one can still construct prediction intervals using the empirical distribution of past forecast errors. However, in small samples, distributional assumptions can hardly be avoided. We assess the performance of the proposed prediction intervals, the Scheffé bands, as well as the performances of the marginal and Bonferroni bands. This evaluation rests on two criteria, the coverage with respect to FWPER control and the Wald coverage. In Monte Carlo studies, the Bonferroni bands tend to yield the best results with respect to FWPER control, whereas the Scheffé bands perform best with respect to Wald coverage, in general. In the case of model misspecification, it is advisable to construct and evaluate the bands based on past forecast errors. In an empirical application to the Greenbook forecasts for inflation and output growth, we find that the use of traditional point-wise intervals based on the marginal distribution of forecast errors can be very misleading. As in the Monte Carlo studies, the Bonferroni bands turn out to be preferable with respect to FWPER control, whereas the Scheffé bands generally have the best Wald coverage. Acknowledgments This paper represents the authors’ personal opinions and does not necessarily reflect the views of the Deutsche Bundesbank, the Federal Reserve Bank of San Francisco or the Federal Reserve System. We thank the Editor Mike Clements, three anonymous referees, and seminar participants at the Bundesbank for helpful comments and suggestions. Jordà acknowledges partial financial support from the Spanish Ministerio de Ciencia e Innovación, grant SEJ-2007-63098. References Bai, J., & Ng, S. (2005). Tests for skewness, kurtosis and normality for time series data. Journal of Business and Economic Statistics, 23, 49–60. Beran, R. (1988). Balanced simultaneous confidence sets. Journal of the American Statistical Association, 83, 679–686. Clements, M. P., & Hendry, D. F. (1998). Forecasting economic time series. Cambridge, U.K: Cambridge University Press. Clements, M. P., & Taylor, N. (2001). Bootstrapping prediction intervals for autoregressive models. International Journal of Forecasting, 17, 247–267. Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7(1), 1–26. Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70. Jordà, Ò., Knüppel, M., & Marcellino, M. G. (2010). Empirical simultaneous prediction regions for path-forecasts. Discussion Paper Series 1: Economic Studies 2010, 06, Deutsche Bundesbank, Research Centre. Jordà, Ò., & Marcellino, M. G. (2010). Path-forecast evaluation. Journal of Applied Econometrics, 25, 635–662. Kim, J. H. (1999). Asymptotic and bootstrap prediction regions for vector autoregression. International Journal of Forecasting, 15, 393–403. Lehmann, E. L., & Romano, J. P. (2005). Testing statistical hypotheses. New York, NY: Springer.

468

Ò. Jordà et al. / International Journal of Forecasting 29 (2013) 456–468

Lütkepohl, H. (1996). Handbook of matrices. Chichester, U.K: John Wiley & Sons. Mahalanobis, P. C. (1936). On the generalized distance in statistics. Procedures of the National Institute of Sciences of India, 2(1), 49–55. Masarotto, G. (1990). Bootstrap prediction intervals for autoregressions. International Journal of Forecasting, 6, 229–239. McConnell, M. M., & Pérez Quirós, G. (2000). Output fluctuations in the United States: what has changed since the early 1980s? American Economic Review, 90(5), 1464–1476. Miller, R. G. (1981). Simultaneous statistical inference (2nd ed.). New York, NY: Springer-Verlag. Pascual, L., Romo, J., & Ruiz, E. (2004). Bootstrap predictive inference for ARIMA processes. Journal of Time Series Analysis, 4(7), 449–465. Politis, D. N., Romano, J. P., & Wolf, M. (1999). Subsampling. In Springer Series in Statistics. New York, NY: Springer. Scheffé, H. (1953). A method for judging all contrasts in the analysis of variance. Biometrika, 40(1-2), 87–110. Staszewska-Bystrova, A. (2011). Bootstrap prediction bands for forecast paths from vector autoregressive models. Journal of Forecasting, 30(8), 721–735. Staszewska-Bystrova, A. (2012). Modified Scheffé’s prediction bands. Mimeo. Staszewska-Bystrova, A., & Winker, P. (2013). Constructing narrowest pathwise bootstrap prediction bands using threshold accepting. International Journal of Forecasting, 29(2), 221–233. Stock, J. H., & Watson, M. W. (2001). Vector autoregressions. Journal of Economic Perspectives, 15(4), 101–115. Thombs, L. A., & Schucany, W. R. (1990). Bootstrap prediction intervals for autoregression. Journal of the American Statistical Association, 85, 486–492. Turner, D. L., & Bowden, D. C. (1977). Simultaneous confidence bands for percentile lines in the general linear model. Journal of the American Statistical Association, 72(360), 886–889.

Williams, W. H., & Goodman, M. L. (1971). A simple method for the construction of empirical confidence limits for economic forecasts. Journal of the American Statistical Association, 66(336), 752–754. Wolf, M., & Wunderli, D. (2012). Bootstrap joint prediction regions. Working Paper No. 64. Department of Economics, University of Zurich.

Òscar Jordà is research advisor at the Federal Reserve Bank of San Francisco and professor of Economics at the University of California, Davis. His research interests are in the areas of time series econometrics and monetary economics, and his current research focuses on time series methods for macroeconomics, forecasting, non-linear time series, and classification of economic cyclical phenomena. He has a vast publication record, including articles in the American Economic Review, Journal of Political Economy, Review of Economics and Statistics, and International Economic Review. Malte Knüppel is an economist in the Research Centre of the Deutsche Bundesbank. His research interests include forecasting, business cycle analysis and monetary policy. He has published papers in the Journal of Business and Economic Statistics, the International Journal of Central Banking, and Macroeconomic Dynamics. Massimiliano Marcellino holds the Pierre Werner Chair on the European Monetary Union at the European University Institute, on leave from Bocconi University where he is professor of Econometrics. He is also a fellow of the Center for Economic Policy Research, and Scientific Vice Chair of the Euro Area Business Cycle Network. He has published a large number of academic articles in leading international journals on applied macroeconomics and econometrics, his main areas of research and teaching. He is currently an editor of the Journal of Forecasting and the coordinator of the European Forecasting Network.