Bootstrap prediction intervals for SETAR models

Bootstrap prediction intervals for SETAR models

International Journal of Forecasting 27 (2011) 320–332 www.elsevier.com/locate/ijforecast Bootstrap prediction intervals for SETAR models Jing Li ∗ D...

289KB Sizes 0 Downloads 110 Views

International Journal of Forecasting 27 (2011) 320–332 www.elsevier.com/locate/ijforecast

Bootstrap prediction intervals for SETAR models Jing Li ∗ Department of Economics, Scobey Hall, Box 504, South Dakota State University, Brookings, SD 57007, United States

Abstract This paper considers four methods for obtaining bootstrap prediction intervals (BPIs) for the self-exciting threshold autoregressive (SETAR) model. Method 1 ignores the sampling variability of the threshold parameter estimator. Method 2 corrects the finite sample biases of the autoregressive coefficient estimators before constructing BPIs. Method 3 takes into account the sampling variability of both the autoregressive coefficient estimators and the threshold parameter estimator. Method 4 resamples the residuals in each regime separately. A Monte Carlo experiment shows that (1) accounting for the sampling variability of the threshold parameter estimator is necessary, despite its super-consistency; (2) correcting the small-sample biases of the autoregressive parameter estimators improves the small-sample properties of bootstrap prediction intervals under certain circumstances; and (3) the two-sample bootstrap can improve the long-term forecasts when the error terms are regimedependent. c 2010 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved. ⃝ Keywords: Bootstrap; Interval forecasting; SETAR models; Time series; Simulation

1. Introduction As was shown by Chatfield (1993) and Christoffersen (1998), interval forecasts can provide useful information about how close the true future value is likely to be to a predicted value. Using interval forecasts, we can make statements such as “the true future value is believed to lie within a certain interval with such and such a level of confidence”. For an autoregressive (AR) model, one can use the BoxJenkins method and construct classical prediction intervals (CPIs), see Granger and Newbold (1986). In ∗ Tel.: +1 605 688 4848; fax: +1 605 688 6386.

E-mail address: [email protected].

practice, three factors may worsen the finite sample performances of CPIs. The first factor is that the distribution of prediction errors may be non-normal. The second is that CPIs ignore the sampling variability of autoregressive coefficient estimators. The third factor, which Shaman and Stine (1988) emphasize, is the small sample bias of the autoregressive coefficient estimator. As was documented by Thombs and Schucany (1990) and Kim (2001), these factors cause CPIs to under-cover the nominal coverage rate. This paper focuses on bootstrap prediction intervals (BPIs) for the self-exciting threshold autoregressive (SETAR) model of Tong (1983), for which the non-normality of prediction errors is a prominent fea-

c 2010 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved. 0169-2070/$ - see front matter ⃝ doi:10.1016/j.ijforecast.2010.01.013

321

J. Li / International Journal of Forecasting 27 (2011) 320–332

ture. All of the autoregressive coefficient estimators in the SETAR model are functions of the threshold value estimator, which was shown by Hansen (2000) to follow a nonstandard distribution. Therefore the prediction error of the SETAR model, calculated as the difference between the true future value and the predicted value, follows a nonstandard distribution by construction. Inference in the threshold autoregressive model is discussed by Enders, Falk, and Siklos (2007), Gonzalo and Wolf (2005) and Hansen (1997, 1999). This paper considers four alternative methods of obtaining BPIs for the SETAR model. Chan (1993) proves that the threshold value estimator using a grid search is super-consistent. In this paper we investigate whether accounting for the sampling variability of the threshold value estimator improves the small-sample performance of BPIs. Method 1 ignores the sampling variability of the threshold value estimator, while Method 3 takes the sampling variability into account. Following Kilian (1998) and Kim (2001), we propose Method 2 and obtain BPIs based on the bias-corrected estimators for the autoregressive coefficients. Method 4 is applicable to the SETAR models with regimedependent error terms. For these models, constructing BPIs belongs to the “two-sample problem”, in the terminology of Efron and Tibshirani (1993). Method 4 resamples the residuals in each regime separately (known as the two-sample bootstrap hereafter), instead of resampling the pooled residuals. In this paper we do not consider the percentile-t method, since it is theoretically difficult to compute the asymptotic standard error and its bootstrap counterpart (σˆ kc (h) and σˆ k∗ (h) in Kim, 2001) for the SETAR model. There are several studies which are concerned with BPIs for linear autoregressive models. For instance, Thombs and Schucany (1990) generate bootstrap replicates based on backward AR models. Kim (2001, 2002) apply the bootstrap-after-bootstrap method of Kilian (1998) to correct the biases of autoregressive coefficient estimators. Kim (1999, 2004) consider BPIs for vector autoregressions based on backward AR models, while, in contrast, Grigoletto (1998) and Masarotto (1990) build BPIs based on forward AR models. Clements and Taylor (2001) apply the bootstrap-after-bootstrap method to forward AR models. De Gooijer and Kumar (1992) provide a good overview of nonlinear time series forecasting. De

Gooijer and de Bruin (1998) derive approximate expressions for the conditional mean and conditional variance of future values for various SETAR models. Clements and Smith (1997) find that the bootstrap (BS) method has a desirable performance for point forecasting, and we generalize their BS method for interval forecasting in this paper. Clements and Smith (1999) investigate the forecasting performances of several empirical SETAR models, while Clements, Franses, Smith, and Van Dijk (2003) compare the forecasting performances of SETAR and AR models. In this paper, the small-sample performances of four BPIs for the SETAR model are compared using an extensive Monte Carlo experiment. Special attention is paid to the effects of (i) the magnitude of varying coefficients across regimes; (ii) the number of observations subject to regime-switching; (iii) the heterogeneous error terms; and (iv) the non-stationarity. Points (i), (ii) and (iii) have not been discussed previously in papers on BPIs for linear AR models. The remainder of the paper is organized as follows. Section 2 specifies the SETAR model, and a simple simulation is used to demonstrate the nonstandard distribution of the predicted values. The first three methods of constructing BPIs, for the SETAR model with regime-invariant error terms, are proposed in Section 3. Section 4 presents the fourth method, which is for the SETAR model with regime-varying error terms. A Monte Carlo experiment is conducted in Section 5. Section 6 discusses several technical issues, and Section 7 concludes. The nominal coverage rate is set to 0.95 throughout the paper.

2. The SETAR model Let Yt be a time series of interest which generates the observed data (y1 , . . . , yn ), with pre-sample values (y0 , y−1 , . . . , y− p+1 ). A two-regime self-exciting threshold autoregressive (SETAR) model of order p is written as   p − Yt = β10 + β1 j Yt− j 1(Yt−1 > γ ) j=1

 +

β20 +

p − j=1

 β2 j Yt− j 1(Yt−1 ≤ γ ) + et , (1)

322

J. Li / International Journal of Forecasting 27 (2011) 320–332

where et is the error term whose distribution function is Fe . We do not assume a specific parametric form for Fe . However, we do assume that et ∼ iid(0, σ 2 ), in order to facilitate the bootstrap based on residual resampling. We assume that the lag order p is known. In practice, p can be chosen based on information criteria, such as the AIC and BIC. See page 144 of De Gooijer and Kumar (1992) for more discussions. γ denotes the threshold value and 1( .) denotes the indicator function, which equals one if the event inside the parentheses is true and zero otherwise. In regime one, yt−1 is greater than γ , while in regime two it is less than or equal to γ . At a given lag j, model (1) allows for different autoregressive coefficients across regimes. The threshold effect exists if β1 j ̸= β2 j for some j. Formal tests for the threshold effect were developed by Chan (1990) and Hansen (1996). The unknown threshold value1 can be estimated by a grid search as follows. For a given value of γ , we define the indicator function and fit model (1) by OLS. We do this for a range of 0 = [γ l , γ u ], where γ l and γ u are the τ th and (100 − τ )th percentiles of the empirical distribution of yt−1 . We follow Andrews (1993) and let τ = 15 throughout the paper.2 The value of γ is estimated by minimizing the residual sum of squares (RSS), i.e.,

γ is needed. For expositional purposes, the following discussion focuses on the basic model (Eq. (1)).

γˆ = argmin RSS(γ ).

where yˆt = yt (t = n, n − 1, . . . , n − p + 1).

(2)

γ ∈[γ l ,γ u ]

Chan (1993) shows that the threshold value estimator (Eq. (2)) is super-consistent with the rate of convergence of O p (n −1 ).3 Model (1) can be generalized in several ways. For example, an exogenous variable can serve as the threshold variable entering the indicator. A three-regime or band SETAR model can be specified by defining 1(yt−1 > γ2 ) for regime 1, 1(γ1 ≤ yt−1 ≤ γ2 ) for regime 2, and 1(yt−1 < γ1 ) for regime 3. In addition, one can specify the indicator based on yt−d , where d is unknown. Then a joint grid search for d and 1 Simulations show that assuming a known threshold value leads to prediction intervals for which the coverage rate is lower than for those with an unknown threshold value. This is because assuming a known γ adds no variability to the prediction intervals. 2 We find no qualitative changes to our results if we let τ = 25. 3 We check the consistency of γˆ using simulations. We let γ = 0, c1 = 0.2, and c2 = 0.8 in DGP (12). Based on 500 replications, we find that the mean square error (MSE) of γˆ decreases from 0.5858 when n = 100 to 0.1528 when n = 1000.

Once γˆ has been obtained, the residual is computed as  eˆt = yt − βˆ10 +

p −

 βˆ1 j yt− j 1(yt−1 > γˆ )

j=1

 −

βˆ20 +

p −

 βˆ2 j yt− j 1(yt−1 ≤ γˆ ),

(3)

j=1

where βˆi j denotes the least squares estimate of the coefficient, conditional on γˆ . By construction, the residual is centered at zero. We do not re-scale the residual, since preliminary simulations show little effect of rescaling. Let h denote the forecast horizon. The h-stepahead forecast, conditional on the last p observations of yt , can be computed as   p − yˆt+h = βˆ10 + βˆ1 j yˆt+h− j 1( yˆt+h−1 > γˆ ) j=1

 +

βˆ20 +

p −

 βˆ2 j yˆt+h− j 1( yˆt+h−1 ≤ γˆ ),

(4)

j=1

Hansen (2000) proves that the threshold value estimator (Eq. (2)) follows a nonstandard distribution. The predicted values of the SETAR model are linear combinations of autoregressive coefficient estimators, which are themselves functions of the threshold value estimator. This implies that the predicted values and the prediction errors of the SETAR model follow nonstandard distributions. To demonstrate this, we generate 5000 1-step-ahead predicted values from the SETAR model: yt = 0.2yt−1 1(yt−1 > 0) + 0.8yt−1 1(yt−1 ≤ 0) + et , t = 1, . . . , n, where n = 60, et ∼ iidN(0, 1), and y0 = 0. After obtaining γˆ as in Eq. (2), we compute yˆt+1 as in Eq. (4). Fig. 1 displays a histogram of the standardized yˆt+1 and the corresponding statistics. The values of the skewness and kurtosis are found to be highly significant, which makes normality doubtful. The test of Jarque and Bera (1987) also rejects normality at the 0.01 level. This simulation shows that the assumption of normal prediction errors is not valid for the SETAR model.

323

J. Li / International Journal of Forecasting 27 (2011) 320–332

1.6

Histogram of the 1-step-ahead predicted values of TAR models Skewness = – 0.69, Kurtosis = 47.52, Jarque-Bera test = 64366.10

 ∗ ∗ = βˆ10 yˆt+h +

p −

 ∗ ∗ βˆ1∗ j yˆt+h− j 1( yˆt+h−1 > γˆ )

j=1 1.4

 +

1.2

∗ βˆ20

+

p −

 ∗ βˆ2∗ j yˆt+h− j

∗ ∗ , 1( yˆt+h−1 ≤ γˆ ) + et+h

j=1

1.0

∗ et+h

0.8 0.6 0.4 0.2 0.0 -1

-2

0

1

2

Fig. 1. Histogram of standardized 1-step-ahead predicted values of TAR models.

3. One-sample bootstrap prediction intervals In this section we propose bootstrap prediction intervals for the SETAR model with regime-invariant error terms. The point is to use the bootstrap to “automatically” account for the variability in the parameter estimators and the non-normality of the prediction errors. Chan (1993) shows that the threshold value estimator (Eq. (2)) is n-consistent. Method 1 ignores the sampling variability of γˆ , on the basis of its superconsistency, and involves the following steps. Method 1 Step 1.1. Let Feˆ denote the empirical cdf of eˆt computed by Eq. (3). Generate the bootstrap replicate of yt (denoted by yt∗ ) as yt∗ = yt , 

(t = 1, . . . , p),  p − ∗ ∗ βˆ10 + βˆ1 j yt− j 1(yt−1 > γˆ )

yt∗ =

j=1

 +

βˆ20 +

p −

 ∗ βˆ2 j yt− j

∗ 1(yt−1 ≤ γˆ ) + et∗ ,

j=1

(t > p), where et∗ is a random draw from Feˆ with replacement. Step 1.2. Re-estimate model (1) using yt∗ (t = 1, . . . , n) and γˆ , and obtain the bootstrap coefficient βˆi∗j . Then compute the bootstrap h-step-ahead forecast ∗ ), (denoted by yˆt+h yˆt∗ = yt

(t = n, n − 1, . . . , n − p + 1),

is a random draw from Feˆ with where replacement. Step 1.3. Repeat Steps 1.1 and 1.2 B times, and obtain B  ∗ (i) i=1 . Method 1’s a set of bootstrap forecasts yˆt+h h-step-ahead BPIs at the 0.95 nominal level are given by   0.025 0.975 BPI1 = yˆt+h , yˆt+h , (5) 0.025 and yˆ 0.975 are the 2.5th and 97.5th where yˆt+h t+h  ∗ B percentiles of the empirical cdf of yˆt+h (i) i=1 . Notice that in Step 1.1 the bootstrap replicates are generated using the forward model. The backward representation used by Kim (2001), Thombs and Schucany (1990) and others is not available for SETAR models because it is impossible to invert the lag polynomial augmented with indicators. Following Efron and Tibshirani (1986), we use the first p observations of the observed series as the initial values for bootstrap replicates. Alternatively, one could use any block of p observations of yt . In Step 1.2 we compute the bootstrap forecasts conditional on the last p observations of the observed series. The importance of this conditionality on the last p observations is stressed by Chatfield (1993), Kim (2001) and Maekawa (1987). In Step 1.3 we construct BPIs (Eq. (5)) using the percentile method of Efron and Tibshirani (1993). Hall (1988) compares different percentile methods. De Gooijer and Kumar (1992) emphasize that the percentile methods work well when the conditional distribution of the predicted values is unimodal. To check the unimodality, we apply the DIP test of Hartigan and Hartigan (1985) to the series of 500 1-stepahead forecasts for the SETAR model: yt = (0.2 + 0.2yt−1 )1(yt−1 > 0) + (−0.8 + 0.8yt−1 )1(yt−1 ≤ 0) + et , (t = 1, . . . , 100), where et ∼ iidN(0, 1), y0 = 0. A Monte Carlo experiment shows that the rejection frequency of the DIP test based on 1000 replications is nearly zero. We try several other SETAR models and find similar rejection frequencies. These findings support the idea that the distribution of the predicted values of the SETAR models is unimodal.

324

J. Li / International Journal of Forecasting 27 (2011) 320–332

Shaman and Stine (1988) provide the bias formula for autoregressive coefficient estimators in the linear AR model. Our simulations show that for the SETAR model the small sample biases of autoregressive coefficient estimators increase as the difference between the coefficients across regimes rises. The small-sample biases are corrected by Method 2 as follows. Method 2 Step 2.1. Same as Step 1.1. Step 2.2. Re-estimate model (1) using yt∗ (t = 1, . . . , n) and γˆ , and obtain the bootstrap coefficient βˆi∗j . Repeat this process C times to get a set of bootstrap coef C ficients βˆ ∗ (k) . Then compute the bias-corrected ij

k=1

autoregressive coefficient as βˆicj = 2βˆi j − C −1

C −

βˆi∗j (k),

k=1

i = 1, 2, j = 0, 1, . . . , p.

(6)

Step 2.3. Use βˆicj , and generate the bias-corrected bootstrap replicate as ytc∗ = yt (t = 1, . . . , p),   p − c∗ c c c∗ c∗ yt = βˆ10 + βˆ1 j yt− j 1(yt−1 > γˆ ) j=1

 +

c βˆ20 +

p −

 c∗ c∗ ∗ βˆ2c j yt− j 1(yt−1 ≤ γˆ ) + et ,

j=1

t > p, where et∗ is a random draw from Feˆ with replacement. Step 2.4. Re-estimate model (1) using ytc∗ (t = 1, . . . , n) and γˆ , and obtain the bootstrap coefficient βˆic∗j . Next, compute the bias-corrected bootstrap forecast as yˆtc∗ = yt (t = n, n − 1, . . . , n − p + 1),   p − c∗ c∗ c∗ ∗ c∗ ˆ ˆ > γˆ ) yˆt+h = β10 + β1 j yˆt+h− j 1( yˆt+h−1 j=1

 +

c∗ βˆ20 +

p −

 c∗ c∗ ∗ βˆ2c∗j yˆt+h− j 1( yˆt+h−1 ≤ γˆ ) + et+h ,

j=1 ∗ et+h

where is a random draw from Feˆ with replacement. Step 2.5. Repeat Steps 2.1, 2.2, 2.3 and 2.4 B times and obtain a set of bias-corrected bootstrap forecasts  c∗ B yˆt+h (i) i=1 . Method 2’s h-step-ahead BPIs are given

by   0.025c 0.975c BPI2 = yˆt+h , yˆt+h ,

(7)

0.025c and yˆ 0.975c are the 2.5th and 97.5th where yˆt+h t+h  c∗ B percentiles of the empirical cdf of yˆt+h (i) i=1 . Following Kilian (1998) and Kim (2001), we calculate the bias-corrected estimates for the AR coefficients in Step 2.2. Then Step 2.3 generates the bootstrap replicates using the bias-corrected coefficients. Note that Steps 2.3 and 2.4 resample the residuals given in (3). Alternatively, one may resample the residuals implied by the bias-corrected autoregressive coefficients. The stationarity correction of Kilian (1998) is needed when the data are nonstationary. We will discuss this issue later. Corradi and Swanson (2007) discuss the issues of bootstrapping for models that are recursively estimated. Methods 1 and 2 both ignore the variability of the threshold value estimator. Method 3, on the other hand, explicitly takes the variability of γˆ into account.

Method 3 Step 3.1. Same as Step 1.1. Step 3.2. Estimate the bootstrap threshold value γˆ ∗ from Eq. (2) using yt∗ , t = 1, . . . , n. Then re-estimate model (1) using yt∗ (t = 1, . . . , n) and γˆ ∗ , and γ∗ obtain the bootstrap coefficient βˆi j . Next compute γ∗ the bootstrap h-step-ahead forecast yˆt = yt , t = n, n − 1, . . . , n − p + 1,   p − γ∗ γ∗ γ∗ γ∗ γ∗ ˆ ˆ yˆt+h = β10 + β1 j yˆt+h− j 1( yˆt+h−1 > γˆ ∗ ) j=1

 +

γ∗ βˆ20 +

p −



γ∗ γ∗ γ∗ βˆ2 j yˆt+h− j 1( yˆt+h−1 ≤ γˆ ∗ )

j=1 ∗ + et+h , ∗ where et+h is a random draw from Feˆ .

Step 3.3. Repeat Steps 3.1 and 3.2 B times to obtain  γ∗ B a set of bootstrap forecasts yˆt+h (i) i=1 . Method 3’s h-step-ahead BPI is given by   0.025γ 0.975γ BPI3 = yˆt+h , yˆt+h , (8) 0.025γ

where yˆt+h

0.975γ

and yˆt+h

are the 2.5th and 97.5th  γ∗ B percentiles of the empirical cdf of yˆt+h (i) i=1 .

325

J. Li / International Journal of Forecasting 27 (2011) 320–332

Note that the variability of γˆ has been accounted for, since the threshold value is re-estimated in Step 3.2. 4. Two-sample bootstrap prediction intervals Model (1) assumes that the error term follows a regime-invariant distribution. Thus, bootstrapping model (1) belongs to the “one-sample problem” defined by Efron and Tibshirani (1993). More generally, we can write a SETAR model with regime-dependent error terms as   p − β1 j yt− j + e1t 1(yt−1 > γ ) yt = β10 + j=1

 +

β20 +

p −

 β2 j yt− j + e2t 1(yt−1 ≤ γ ), (9)

j=1

where e1t ∼ iid(0, σ12 ) and e2t ∼ iid(0, σ22 ). Let Fe1 and Fe2 denote the distribution functions for e1t and e2t . Then model (9) allows Fe1 ̸= Fe2 . Bootstrapping this model now becomes a “two-sample problem”. Therefore, instead of resampling the pooled residuals, Method 4 resamples the residuals in regimes one and two separately. Method 4 Step 4.1. Estimate γˆ as in Eq. (2), and define regimes 1 and 2 accordingly. Then fit model (1) and compute the residual eˆt as in Eq. (3). Collect the residuals in regime one as {eˆ1t }n1 t=1 , and the residuals in regime two as {eˆ2t }n2 , where n1 + n2 = n − p. t=1 Step 4.2. Denote the empirical cdfs of {eˆ1t }n1 t=1 and {eˆ2t }n2 by F and F . Generate the bootstrap e1 ˆ e2 ˆ t=1 replicate of yt as yttwo∗ = yt (t = 1, . . . , p),   p − two∗ two∗ ∗ two∗ yt = βˆ10 + βˆ1 j yt− j + e1t 1(yt−1 > γˆ ) j=1

 +

βˆ20 +

p −

 two∗ βˆ2 j yt− j

∗ + e2t

two∗ 1(yt−1 ≤ γˆ ),

j=1

t > p, ∗ and e∗ are random draws from F and F where e1t e1 ˆ e2 ˆ 2t with replacement. Step 4.3. Re-estimate model (1) using yttwo∗ (t = 1, . . . , n) and γˆ , and obtain the bootstrap coefficient

βˆitwo∗ j . Then compute the bootstrap h-step-ahead forecast, yˆttwo∗ = yt (t = n, n − 1, . . . , n − p + 1),   p − ∗ two∗ two∗ two∗ two∗ ˆ ˆ β1 j yˆt+h− j + e1t+h yˆt+h = β10 + j=1 two∗ × 1( yˆt+h−1 > γˆ )  p − two∗ two∗ + βˆ2two∗ + βˆ20 j yˆt+h− j j=1 two∗ ≤ γˆ ), × 1( yˆt+h−1

 ∗ + e2t+h

∗ ∗ where e1t+h and e2t+h are random draws from Fe1 ˆ and Fe2 with replacement. ˆ

Step 4.4. Repeat Steps 4.2 and 4.3 B times and obtain  two∗  B a set of bootstrap forecasts yˆt+h (i) i=1 . Method 4’s h-step-ahead BPIs are given by   0.025two 0.975two BPI4 = yˆt+h , yˆt+h , (10) 0.025two and yˆ 0.975two are the 2.5th and 97.5th where yˆt+h t+h  two∗  B percentiles of the empirical cdf of yˆt+h (i) i=1 .

5. Monte Carlo experiment This section investigates the performances of BPIs using a Monte Carlo experiment. Following Thombs and Schucany (1990), the criterion of comparison is the average coverage rate, computed as m −1

m −

1 (yt+h ∈ PI) ,

(11)

i=1

where m = 100 for each replication and PI denotes BPI1 (Eq. (5)), BPI2 (Eq. (7)), BPI3 (Eq. (8)) and BPI4 (Eq. (10)). The forecast horizon h ranges from 1 to 8.4 The number of Monte Carlo simulations (replications) is set to 500. The number of bootstrap replications is set to B = 999 for Methods 1, 3 and 4. For Method 2, the number of bootstrap replications is C = 200 for Step 2.2, and B = 999 for Step 2.5. The method that produces the prediction intervals with the average coverage rate closest to 0.95 is deemed to be the best method. 4 The results are similar when h ranges from 1 to 12.

326

J. Li / International Journal of Forecasting 27 (2011) 320–332

We first consider the data generating process (DGP) with the regime-invariant error terms: yt = c1 yt−1 1(yt−1 > γ ) + c2 yt−1 1(yt−1 ≤ γ ) + et , t = 1, . . . , n, (12) where y0 = 0. Figs. 2–5 plot the average coverage rates of BPI1, BPI2 and BPI3 against the forecast horizon h (this is a “one-sample” problem, so we ignore BPI4). A note about the key in those figures: the number after the underlining sign indexes the graphs. For example, in Fig. 2, BPI1 1 denotes the coverage rate of BPI1 in the first graph, BPI3 2 denotes the coverage rate of BPI3 in the second graph, etc. Letting c1 = 0.2, c2 = 0.8, γ = 0.0, and n = 60 in Eq. (12), Fig. 2 shows the coverage rates of the BPIs when the error terms are simulated using the standard normal distribution, the fat-tailed Student-t distribution with 5 degrees of freedom, and the skewed chi-square distribution with 4 degrees of freedom. Following Thombs and Schucany (1990), the non-normal error terms are standardized before simulation. There are four findings in Fig. 2. First, there is no severe distortion in coverage rates; the coverage rates of all BPIs are between 0.94 and 0.96 in most cases. This implies that bootstrap prediction intervals generally work well for various distributions. Second, BPI1 has the lowest coverage rate among the three methods, just because BPI1 ignores the variability of the threshold value estimator. On the other hand, BPI2 has the highest coverage rate except for h = 1. This fact is consistent with the findings of Efron and Tibshirani (1993), who point out that more variability is introduced by the bias-corrected statistics. The coverage rate of BPI3 is in the middle, because BPI3 accounts for the variability of γˆ but does not correct the small-sample biases for the autoregressive parameter estimators. The third finding is that the chi-square distribution causes more coverage distortion than the Student-t distribution. Finally, we find that the coverage rates of all BPIs increase as the forecast horizon increases. This result is typical, since long-term forecasting intrinsically involves more uncertainty than short-term forecasting. Fig. 3 shows the effects of autoregressive coefficients on the coverage rates. We let c2 = 0.8, γ = 0.0, n = 60, and et ∼ iidN(0, 1) in Eq. (12), and consider various values for c1 . The threshold effect is present when c1 = 0.2, and the results are the same

as in the left graph of Fig. 2. The threshold effect disappears and the SETAR model reduces to a linear AR model when c1 = c2 = 0.8. The middle graph shows that BPI1 and BPI3 suffer more reduced coverage distortion than BPI2, which shows the benefit of bias-correction. This finding agrees with Kilian (1998) and Kim (2001). We have an interesting finding when c1 = 1.0 and the data are nonstationary in that regime: we see a severe coverage distortion for BPI1, as its coverage rate declines monotonically with the forecast horizon. By contrast, the coverage rate of BPI2 is relatively stable and slightly above the nominal level. Fig. 3 is obtained without using the stationarity correction of Kilian (1998). By contrast, Fig. 12 is produced using Kilian’s stationarity correction. When comparing Figs. 12 and 3, we see small differences in the left and middle graphs, but in the right graph the stationarity correction shifts the BPI2 curve down by a significant amount. This is because the stationarity correction takes effect only when nonstationarity is present, i.e., only when c1 = 1.0. The prediction intervals with the stationarity correction are shorter than the intervals without the stationarity correction, since the stationarity correction avoids pushing the coefficient estimates into the nonstationary region. Fig. 4 illustrates the way in which the frequency of regime-switching affects the coverage rates. We consider various values for γ and let c1 = 0.2, c2 = 0.8, n = 60, and et ∼ iidN(0, 1) in Eq. (12). An increasing γ decreases the likelihood of regime-switching. In the limit as γ → ∞, the regime-switching tends to stop when the threshold autoregression converges to the linear autoregression. Fig. 4 shows that when γ = 0.0, regime-switching occurs frequently and the results are almost the same as in the left graph of Fig. 2. When γ = 1.0, regime-switching occurs less frequently, and thus the SETAR model behaves more like a AR model and BPI2 has the best performance. Fig. 5 investigates the way in which sample sizes affect the coverage rates. We consider n = 50, 100, 150, and let c1 = 0.2, c2 = 0.8, and γ = 0.0, et ∼ iidN(0, 1) in Eq. (12). We find that the coverage rates of the BPIs stay closer to the nominal level 0.95 in the large sample (n = 150) than in the small sample (n = 50). For instance, the coverage rate of BPI1 is between 0.935 and 0.945 when n = 50, but between 0.945 and 0.953 when n = 150. This finding confirms the consistency of the point forecast.

327

J. Li / International Journal of Forecasting 27 (2011) 320–332 Error distribution Normal

Student -t

Chi-squared

0.970

0.970

0.965

0.965

0.960

0.960

0.960

0.955

0.955

0.955

0.950

coverage

0.965

coverage

coverage

0.970

0.950

0.950

0.945

0.945

0.945

0.940

0.940

0.940

0.935

0.935

0.935

0.930

0.930

BPI1_1

BPI2_1

0.930 1 2 3 4 5 6 7 8 horizon

1 2 3 4 5 6 7 8 horizon BPI3_1

BPI1_2

BPI2_2

1 2 3 4 5 6 7 8 horizon

BPI3_2

BPI1_3

BPI2_3

BPI3_3

Fig. 2. Average coverage rates of BPIs for different error distributions. Threshold effect c1=0.8

c1=1.0

0.97

0.97

0.96

0.96

0.96

0.95

0.95

0.95

0.94

coverage

0.97

coverage

coverage

c1=0.2

0.94

0.94

0.93

0.93

0.93

0.92

0.92

0.92

0.91

0.91

0.91

0.90

0.90 1 2 3 4 5 6 7 8

0.90 1 2 3 4 5 6 7 8

BPI1_1

BPI2_1

1 2 3 4 5 6 7 8

horizon

horizon BPI3_1

BPI1_2

BPI2_2

horizon BPI3_2

BPI1_3

BPI2_3

BPI3_3

Fig. 3. Average coverage rates of BPIs for different autoregressive coefficients without the stationarity correction.

In order to investigate the performance of BPI4, we consider the following SETAR model with regimedependent error terms: yt = (c1 yt−1 + e1t )1(yt−1 > γ ) + (c2 yt−1 + e2t ) × 1(yt−1 ≤ γ ), t = 1, . . . , n, (13) where y0 = 0, e1t ∼ iidN(0, 1), e2t ∼ iidN(0, s 2 ), s = 0.5, 1.0, 2.0, c1 = 0.2, c2 = 0.8, n = 60, and

γ = 0.0. Fig. 6 compares only BPI1 and BPI4; BPI2 and BPI3 are excluded so as to highlight the difference between the one-sample bootstrap and the two-sample bootstrap. First of all, Fig. 6 shows that the coverage rates of both BPI1 and BPI4 are below the nominal level, simply because both methods ignore the variability of γˆ . Second, BPI4 has less coverage distortion than

328

J. Li / International Journal of Forecasting 27 (2011) 320–332 Threshold value Gamma=0.5

Gamma=1.0

0.970

0.970

0.965

0.965

0.965

0.960

0.960

0.960

0.955

0.955

0.955

0.950

coverage

0.970

coverage

coverage

Gamma=0.0

0.950

0.950

0.945

0.945

0.945

0.940

0.940

0.940

0.935

0.935

0.935

0.930

0.930 1 2 3 4 5 6 7 8

0.930

horizon BPI1_1

BPI2_1

1 2 3 4 5 6 7 8

1 2 3 4 5 6 7 8 horizon

BPI3_1

BPI1_2

BPI2_2

horizon BPI3_2

BPI1_3

BPI2_3

BPI3_3

Fig. 4. Average coverage rates of BPIs for different threshold values.

0.970

Sample size n=100 0.970

0.970

0.965

0.965

0.965

0.960

0.960

0.960

0.955

0.955

0.955

0.950

n=150

coverage

coverage

coverage

n=50

0.950

0.950

0.945

0.945

0.945

0.940

0.940

0.940

0.935

0.935

0.935

0.930

0.930

0.930 1 2 3 4 5 6 7 8

1 2 3 4 5 6 7 8

horizon BPI1_1

BPI2_1

1 2 3 4 5 6 7 8

horizon BPI3_1

BPI1_2

BPI2_2

horizon BPI3_2

BPI1_3

BPI2_3

BPI3_3

Fig. 5. Average coverage rates of BPIs for different sample sizes.

BPI1 in most cases, except when s = 0.5 and h is small. It is interesting to notice that BPI4 outperforms BPI1 when the error terms are homoskedastic (s = 1.0). Thus, it loses nothing by using the two-sample bootstrap when homoskedasticity is true. Finally, the position of the coverage line for BPI4 is relatively fixed in the three graphs, whereas the coverage line

for BPI1 keeps shifting down as s rises. Thus, loosely speaking, BPI4 is “heteroskedasticity-robust” while BPI1 is not. Fig. 6 simulates e2t using the normal distribution. Fig. 7, on the other hand, simulates e2t using the standard normal distribution, the Student-t distribution with 5 degrees of freedom, and the chi-square distribution with 4 degrees of freedom. Now e1t and

329

J. Li / International Journal of Forecasting 27 (2011) 320–332 Heteroskedasticity s=1.0

s=0.5

s=2.0

0.96

0.96

0.95

0.95

0.94

0.94

0.94

coverage

0.95

coverage

coverage

0.96

0.93

0.93

0.93

0.92

0.92

0.92

0.91

0.91 1 2 3 4 5 6 7 8

0.91 1 2 3 4 5 6 7 8

horizon BPI1_1

1 2 3 4 5 6 7 8

horizon

BPI4_1

BPI1_2

horizon

BPI4_2

BPI1_3

BPI4_3

Fig. 6. Average coverage rates of BPIs for heteroskedastic errors. Heterogeneous errors Chi-squared

Student -t

0.960

0.960

0.955

0.955

0.955

0.950

0.950

0.950

0.945

0.945

0.945

0.940

coverage

0.960

coverage

coverage

Normal

0.940

0.940

0.935

0.935

0.935

0.930

0.930

0.930

0.925

0.925

0.925

0.920

0.920 1 2 3 4 5 6 7 8

0.920 1 2 3 4 5 6 7 8

horizon BPI1_1

BPI4_1

1 2 3 4 5 6 7 8

horizon BPI1_2

BPI4_2

horizon BPI1_3

BPI4_3

Fig. 7. Average coverage rates of BPIs for heterogeneous errors.

e2t become more heterogeneous, and the findings from Fig. 7 are more favorable to BPI4. Notice that the chi-square and Student-t distributions improve the coverage rate of BPI4 by pushing the BPI4 curve closer to the nominal level. Overall, BPI4 outperforms BPI1, and the gain from using the twosample bootstrap increases as the forecast horizon

rises. For short term forecasting, BPI1 may do better than BPI4, thanks to its relatively simple algorithm. 6. Discussion 1. Discontinuous SETAR models. The SETAR model may become discontinuous at long run values if

330

J. Li / International Journal of Forecasting 27 (2011) 320–332 Discontinuous SETAR models

Length properties

0.970

5.50

0.965

5.25 average length

coverage

0.960 0.955 0.950 0.945

5.00 4.75 4.50 4.25

0.940

4.00

0.935

3.75

0.930 1

2

3

4

5

6

7

8

3.50 1

2

3

horizon BPI1_1

BPI2_1

BPI3_1

BPI1_1

Fig. 8. Average coverage rates of BPIs for discontinuous SETAR models. Baseline linear modals 0.98

coverage

0.97 0.96 0.95 0.94 0.93 1

2

3

4

5

6

7

8

horizon BPI1_1

BPI2_1

BPI3_1

PI_1

Fig. 9. Average coverage rates of BPIs for SETAR models and linear AR models. SETAR(2,1,3) model with a delay lag of 2 0.970 0.965

coverage

0.960 0.955 0.950 0.945 0.940 0.935 0.930 1

2

3

4

5

6

7

8

horizon BPI1_1

BPI2_1

BPI3_1

Fig. 10. Average coverage rates of BPIs for the SETAR(2, 1, 3) model.

intercept terms are used. To investigate the effect of discontinuity, we consider the DGP of yt = (0.2 + 0.2yt−1 )1(yt−1 > 0) + (0.6 + 0.8yt−1 )1(yt−1 ≤

4 5 horizon BPI2_1

6

7

8

BPI3_1

Fig. 11. Average lengths of BPIs.

0) + et , et ∼ iidN(0, 1), t = 1, . . . , 60. The series converges to the long run value (tractor) of 0.25 = 0.2/(1 − 0.2) in regime 1, while it converges to 3 = 0.6/(1 − 0.8) in regime 2. This model is discontinuous because the two long run values (tractors) are different. By comparing Fig. 8 to the left graph of Fig. 2, we find that discontinuity tends to reduce the coverage rates of all BPIs when h is small, but increase the coverage rates when h is big. Enders et al. (2007) discuss the effect of discontinuity on the confidence intervals of parameter estimates. 2. Linear baseline models. We investigate how well the BPI performs if we ignore the nonlinearity and fit a linear AR(1) model for the DGP (Eq. (12)) with c1 = 0.2, c2 = 0.8, γ = 0.0, et ∼ iidN(0, 1), n = 100. Fig. 9 compares the coverage rates of BPI1, BPI2, BPI3 and the BPIs for the baseline AR(1) model (denoted by ∇ PI 1). We can see that the BPIs for the baseline AR(1) model are excessively conservative, with the coverage rate stable at 0.97. 3. Pretesting methods. So far, we have constructed BPIs under the assumption that the nonlinearity is known, i.e., assuming that the true model is SETAR. In most cases the nonlinearity is not certain, so practitioners need to pretest the SETAR models against the linear autoregressive models before forecasting is performed. To examine the effects of the pretesting methods, the sup F test of Hansen (1997) is applied to the DGP (Eq. (12)), with c1 = 0.2, c2 = 0.8, γ = 0.0, et ∼ iidN(0, 1), and n = 100. We find that the rejection frequency (power) of the sup F test based on 200 replications

331

J. Li / International Journal of Forecasting 27 (2011) 320–332 Threshold effect c1=0.8

c1=1.0

0.97

0.97

0.96

0.96

0.96

0.95

0.95

0.95

0.94

0.94

0.94

0.93

coverage

0.97

coverage

coverage

c1=0.2

0.93

0.93

0.92

0.92

0.92

0.91

0.91

0.91

0.90

0.90 1 2 3 4 5 6 7 8

0.90 1 2 3 4 5 6 7 8

horizon BPI1_1

BPI2_1

1 2 3 4 5 6 7 8

horizon BPI3_1

BPI1_2

BPI2_2

horizon BPI3_2

BPI1_3

BPI2_3

BPI3_3

Fig. 12. Average coverage rates of BPIs for different autoregressive coefficients with stationarity correction.

is 0.75. This means that for this test, the pretesting (weighted) coverage rate of prediction intervals can be computed as 0.75 × (coverage rate of SETAR prediction intervals) + (1 − 0.75) × (coverage rate of linear prediction intervals) (which is about 0.97, according to PI 1 in Fig. 9). The power varies for different tests and different data generating processes. Thus, the pretesting coverage rate should be evaluated on a case-by-case basis. Overall, the coverage rate with pretesting methods is bounded between the coverage rate for the linear model and the coverage rate for the SETAR model. 4. To evaluate the performances of the proposed BPIs for a general SETAR model, we consider the DGP of yt = 0.2yt−1 1(yt−d > 0) + (−1.0yt−1 + 0.38yt−2 − 0.04yt−3 )1(yt−d ≤ 0) + et , d = 2, et ∼ iidN(0, 1), n = 100. This is a stationary SETAR(2, 1, 3) model, similar to the model studied by Clements and Smith (1999). Fig. 10 shows that BPI1, BPI2 and BPI3 all perform reasonably well. 5. Length properties. We also investigate the length properties of BPIs using the DGP (Eq. (12)), with c1 = 0.2, c2 = 0.8, γ = 0.0, et ∼ iidN(0, 1), n = 60. Fig. 11 displays the average lengths of BPI1, BPI2 and BPI3. We find that the prediction intervals of Methods 2 and 3 are generally wider than those of Method 1. This result is intuitive, since Method

2 adopts the bias-correction procedure and Method 3 accounts for the sampling variability of γˆ . 7. Conclusion This paper compares four methods of obtaining bootstrap prediction intervals for the SETAR model. Method 1 accounts for the sampling variability of the autoregressive coefficient estimators but ignores the sampling variability of the threshold value estimator. Method 2 corrects the small-sample biases of the autoregressive coefficient estimators before constructing prediction intervals. Method 3 takes into account the sampling variability of the threshold value estimator. Method 4 resamples the residuals in each regime separately. The Monte Carlo experiment shows that bootstrap prediction intervals perform reasonably well for a variety of SETAR models. The coverage rate is generally the lowest for the prediction intervals produced by Method 1. The bias-correction used by Method 2 leads to improved small-sample properties of the bootstrap prediction intervals when the autoregressive coefficients vary by a small amount across regimes, when data are nonstationary, and when the number of observations subject to regime-switching is small. The twosample bootstrap used by Method 4 outperforms the

332

J. Li / International Journal of Forecasting 27 (2011) 320–332

one-sample bootstrap when the error terms are regimedependent and when the forecast horizon is long. Acknowledgements The paper was previously titled “Bootstrap Prediction Intervals for Threshold Autoregressive Models”. We are grateful to three anonymous referees and an Associate Editor for many helpful comments and suggestions. References Andrews, D. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica, 61, 821–856. Chan, K. S. (1990). Testing for threshold autoregression. The Annals of Statistics, 18, 1886–1894. Chan, K. S. (1993). Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model. The Annals of Statistics, 21, 520–533. Chatfield, C. (1993). Calculating interval forecasts. Journal of Business and Economic Statistics, 11, 121–135. Christoffersen, P. F. (1998). Evaluating interval forecasts. International Economic Review, 39, 841–862. Clements, M. P., & Smith, J. (1997). The performance of alternative forecasting methods for SETAR models. International Journal of Forecasting, 13, 463–475. Clements, M. P., & Smith, J. (1999). A Monte Carlo study of the forecasting performance of empirical SETAR models. Journal of Applied Econometrics, 14, 123–141. Clements, M. P., & Taylor, N. (2001). Boostrapping prediction intervals for autoregressive models. International Journal of Forecasting, 17, 247–267. Clements, M. P., Franses, P. H., Smith, J., & Van Dijk, D. (2003). On SETAR non-linearity and forecasting. Journal of Forecasting, 22, 359–375. Corradi, V., & Swanson, N. R. (2007). Nonparametric bootstrap procedures for predictive inference based on recursive estimation schemes. International Economic Review, 48, 67–109. De Gooijer, J. G., & de Bruin, P. T. (1998). On forecasting SETAR processes. Statistics and Probability Letters, 37, 7–14. De Gooijer, J. G., & Kumar, K. (1992). Some recent developments in non-linear time series modeling, testing, and forecasting. International Journal of Forecasting, 8, 135–156. Efron, B., & Tibshirani, R. J. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1, 54–75. Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. London: Chapman and Hall.

Enders, W., Falk, B., & Siklos, P. (2007). A threshold model of real US GDP and the problem of constructing confidence intervals in TAR models. Studies in Nonlinear Dynamics and Econometrics, 11(3), Article 4. Gonzalo, J., & Wolf, M. (2005). Subsampling inference in threshold autoregressive models. Journal of Econometrics, 127, 201–224. Granger, C., & Newbold, P. (1986). Forecasting economic time series. San Diego: Academic Press. Grigoletto, M. (1998). Bootstrap prediction intervals for autoregressions: some alternatives. International Journal of Forecasting, 14, 447–456. Hall, P. (1988). Theoretical comparison of bootstrap confidence intervals. Annals of Statistics, 16, 927–953. Hansen, B. E. (1996). Inference when a nuisance parameter is not identified under the null hypothesis. Econometrica, 64, 413–430. Hansen, B. E. (1997). Inference in TAR model. Studies in Nonlinear Dynamics and Econometrics, 2, 1–14. Hansen, B. E. (1999). The grid bootstrap and the autoregressive model. Review of Economics and Statistics, 2, 1–14. Hansen, B. E. (2000). Sample splitting and threshold estimation. Econometrica, 68, 575–603. Hartigan, J. A., & Hartigan, P. M. (1985). The DIP test of unimodality. Annals of Statistics, 13, 70–84. Jarque, C. M., & Bera, A. K. (1987). A test for normality of observations and regression residuals. International Statistical Review, 55, 163–172. Kilian, L. (1998). Small sample confidence intervals for impulse response functions. The Review of Economics and Statistics, 80, 218–230. Kim, J. (1999). Asymptotic and bootstrap prediction regions for vector autoregression. International Journal of Forecasting, 15, 393–403. Kim, J. (2001). Bootstrap-after-bootstrap prediction intervals for autoregressive models. Journal of Business and Economic Statistics, 19, 117–128. Kim, J. (2002). Bootstrap prediction intervals for autoregressive models of unknown or infinite lag order. Journal of Forecasting, 21, 265–280. Kim, J. (2004). Bias-corrected bootstrap prediction regions for vector autoregression. Journal of Forecasting, 23, 141–154. Maekawa, K. (1987). Finite sample properties of several predictors from an autoregressive model. Econometric Theory, 3, 359–370. Masarotto, G. (1990). Bootstrap prediction intervals for autoregressions. International Journal of Forecasting, 6, 229–239. Shaman, P., & Stine, R. A. (1988). The bias of autoregressive coefficient estimators. Journal of the American Statistical Association, 83, 842–848. Thombs, L. A., & Schucany, W. R. (1990). Bootstrap prediction intervals for autoregression. Journal of the American Statistical Association, 85, 486–492. Tong, H. (1983). Threshold models in non-linear time series analysis. New York: Springer-Verlag.