Statistical Methodology 5 (2008) 263–276
Forecasting with univariate TAR models

Fabio H. Nieto
Universidad Nacional de Colombia, A.A. 72157 Bogotá, Colombia

Received 28 November 2006; received in revised form 24 July 2007; accepted 14 September 2007
Abstract

The forecasting stage in the analysis of a univariate threshold-autoregressive model, with exogenous threshold variable, is developed in this paper via the computation of the so-called predictive distributions. The procedure permits one to forecast the response and exogenous variables simultaneously. An important issue in this work is the treatment of possible missing observations present in the two time series before obtaining forecasts.
Keywords: Forecasting; Nonlinear time series; Regime-switching models; Predictive distributions; Threshold-autoregressive models
(This research was partially sponsored by DIB, the research division of the National University of Colombia at Bogotá.)

1. Introduction

Nieto (see [13]) developed an MCMC-based Bayesian approach for fitting a univariate threshold-autoregressive (TAR) model with exogenous threshold variable, in the presence of missing data in the output variable of interest and in the input threshold variable. This model can be seen as a particular case of Tsay's multivariate TAR model [19], the main differences being that, in Nieto's work, the fitting approach is Bayesian and there are missing observations in the involved variables. For forecasting nonlinear models, different approaches have been proposed, each depending on the type of nonlinear model under consideration. The best-known procedures are those for self-exciting threshold-autoregressive (SETAR) models (see [6]), smooth
transition autoregressive (STAR) models [10], and regime-switching autoregressive models (see [8]), among others. Under the minimum mean square error (MMSE) criterion, the goal is to obtain both the conditional expectation of the variable to be predicted, as the optimal predictor, and its MMSE, given historical observations of the variables of interest. In the case of linear models, this task is relatively easy and exact analytical formulas are well known. In the case of nonlinear models, the computation of the first and second moments quoted above is not always easy. The general strategy is to obtain the so-called predictive distributions and then to compute those moments from them. The task can be done by means of numerical integration, Monte Carlo simulation, or bootstrapping (see [7,18], and [20], among others). Following this approach, I address the problem of developing a procedure for computing point forecasts of a variable that follows Nieto's TAR model, together with their corresponding uncertainty measures, via the quantiles of the predictive distributions. Additionally, the paper contributes (i) a way of forecasting the variable of interest and the exogenous threshold variable simultaneously and (ii) an indication of how the presence of missing data in the two variables can be taken into account before forecasting. I illustrate the methodological results with a real application in the hydrological/meteorological field. Here, I highlight the important aspect of forecasting the threshold variable when it does not follow a linear process but can instead be adequately modeled by a Markov chain. For now, I do not address the very important problem of comparing the forecasting performance of this TAR model relative to linear-autoregressive models, Markov-switching models, or SETAR models, as was done in [1–4] and [12], among others. This topic will be considered promptly in the economic setting, as those authors did.

The paper is organized as follows. In Section 2, I present the main theoretical and methodological results. In Section 3, I include the real-data application in the hydrological/meteorological scenario, and in Section 4 I draw some conclusions.

2. The model and main results

2.1. Theoretical background

Let {X_t} and {Z_t} be stochastic processes related by the equation (TAR model)

X_t = a_0^{(j)} + \sum_{i=1}^{k_j} a_i^{(j)} X_{t-i} + h^{(j)} ε_t,        (1)
if Z_t belongs to the real interval B_j = (r_{j-1}, r_j] for some j, j = 1, ..., l, with r_0 = −∞ and r_l = ∞. Here, a_i^{(j)} and h^{(j)}, j = 1, ..., l, i = 0, 1, ..., k_j, are real numbers and {ε_t} is a Gaussian zero-mean white noise process with variance 1. The real numbers r_j, j = 1, ..., l − 1, are called the threshold values of the process {Z_t} and they define l regimes for it. Additionally, the nonnegative integers k_1, ..., k_l denote, respectively, the autoregressive orders of {X_t} in each regime. We shall use the symbol TAR(l; k_1, ..., k_l) to denote this model, and l, r_1, ..., r_{l−1}, and k_1, ..., k_l are called the structural parameters of the model. These kinds of models were introduced by Tong (see [16]) and Tong and Lim (see [17]), specifically in the case where the threshold variable is the lagged variable X_{t−d}, where d is some positive integer. In this case, the model is known as the self-exciting TAR (SETAR) model and, at present, there is a large literature on the analysis of these models. Extending the SETAR model's scope, Tsay [19] developed a multivariate threshold-autoregressive model, with
exogenous delayed threshold variable denoted by Z_{t−d}, d ≥ 1. Under this scenario, and putting d = 0, model (1) can be seen as a particular case of Tsay's multivariate model. Also, I assume that {Z_t} is exogenous in the sense that there is no feedback of {X_t} towards it, and that {Z_t} is a homogeneous pth-order Markov chain with initial distribution F_0(z; θ_z) and kernel distribution F_p(z_t | z_{t−1}, ..., z_{t−p}; θ_z), where θ_z is a parameter vector in an appropriate numerical space (this is a more general assumption than in Tsay's model). Furthermore, we assume that these distributions have densities in the Lebesgue-measure sense. Let f_0(z; θ_z) and f_p(z_t | z_{t−1}, ..., z_{t−p}; θ_z) be, respectively, the initial and kernel density functions of the distributions above. In what follows, we assume that the p-dimensional Markov chain {Z_t} has an invariant or stationary distribution f_p(z; θ_z). For more details on Markov chains with general state space, the reader can see Meyn and Tweedie [11]. We remark here that a stationary distribution implies that sample paths from {Z_t} look stable in the long term, in the sense that they are analogous to observed time series that come from stationary stochastic processes. This assumption on the process {Z_t} covers the case where it is a linear-autoregressive (AR) process and also, after an approximation to an AR process, the case where {Z_t} is an MA process (or an ARMA process with a nontrivial MA part).

2.2. The forecasting procedure

Now, I am concerned with the prediction of the variable X_{T+h}, given observations of the variables X_t and Z_t for t = 1, ..., T, where T is the length of the sample period and h ≥ 1 denotes the forecast horizon. It is well known that, under the MMSE criterion, the best prediction is given by E(X_{T+h} | x_T, z_T), where x_T = (x_1, ..., x_T) and z_T = (z_1, ..., z_T). Contrary to the linear case, exact analytical expressions for this first-order moment are not easy to obtain in the nonlinear situation. Hence, I address the problem of computing this conditional expectation via the Bayesian approach; that is, I shall look for the predictive distribution of X_{T+h}, given x_T and z_T. Let us assume that this distribution has a Lebesgue-measure density, which I denote p(x_{T+h} | x_T, z_T). Then, I have the following result.

Proposition. With the assumptions in model (1) and, furthermore, supposing that (i) Z_{T+i} and X_{T+j} are independent for all i > j ≥ 0, conditional on x_{T+j−1} and z_{T+j}, with the convention that the conditioning set is only z_T when j = 0, and (ii) the set {Z_{T+h−1}, ..., Z_{T+1}} is independent of X_T conditional on z_{T+h} and z_T, then, for each h ≥ 1, the predictive density of X_{T+h}, given x_T and z_T, is given by

p(x_{T+h} | x_T, z_T) = \int p(x_{T+h} | z_{T+h}, x_T, z_T) p(z_{T+h} | z_T) dz_{T+h},        (2)

where

p(z_{T+h} | z_T) = \int \cdots \int p(z_{T+h} | z_{T+h−1}, ..., z_{T+1}, z_T) p(z_{T+h−1} | z_{T+h−2}, ..., z_{T+1}, z_T) \cdots p(z_{T+1} | z_T) dz_{T+1} \cdots dz_{T+h−1}        (3)

and

p(x_{T+h} | z_{T+h}, x_T, z_T) = \int \cdots \int p(x_{T+h} | x_{T+h−1}, z_{T+h}) p(x_{T+h−1} | x_{T+h−2}, z_{T+h−1}) \cdots p(x_{T+1} | x_T, z_{T+1}) dx_{T+h−1} \cdots dx_{T+1},        (4)

with x_{T+i} = (x_T, x_{T+1}, ..., x_{T+i}) and z_{T+i} defined in a similar manner.
Proof. Notice that

p(x_{T+h} | z_{T+h}, x_T, z_T) = \int \cdots \int p(x_{T+h}, x_{T+h−1}, ..., x_{T+1}, z_{T+h−1}, ..., z_{T+1} | z_{T+h}, x_T, z_T) dx_{T+h−1} \cdots dx_{T+1} dz_{T+h−1} \cdots dz_{T+1}
  = \int \cdots \int p(x_{T+h} | x_{T+h−1}, z_{T+h}) \cdots p(x_{T+1} | x_T, z_{T+h}) p(z_{T+h−1}, ..., z_{T+1} | z_{T+h}, z_T, x_T) dx_{T+h−1} \cdots dx_{T+1} dz_{T+h−1} \cdots dz_{T+1},

and that \int \cdots \int p(z_{T+h−1}, ..., z_{T+1} | z_{T+h}, z_T) dz_{T+h−1} \cdots dz_{T+1} = 1. Then, the result follows from these observations.

Some important remarks on this result are the following:

(1) The integrand in expression (3) is the product of kernel densities of the Markov chain {Z_t}. These are nothing but the Chapman–Kolmogorov equations. According to Congdon (see [5]), to draw from p(z_{T+h} | x_T, z_T), first we can draw z_{T+1} from p(z_{T+1} | x_T, z_T), then z_{T+2} from p(z_{T+2} | z_{T+1}, x_T, z_T), and so on. In this way, one has a procedure for computing forecasts of the variable Z.

(2) In expression (4), the ith factor of the integrand, i = 1, ..., h, is a normal density with mean a_0^{(j)} + \sum_{m=1}^{k_j} a_m^{(j)} x_{T+i−m} and variance [h^{(j)}]^2 if z_{T+i} ∈ B_j, for some j = 1, ..., l. To draw from p(x_{T+h} | z_{T+h}, x_T, z_T), we proceed in an order analogous to that indicated for expression (3), using the values z_{T+i}, i = 1, ..., h, generated previously with expression (3).
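The two remarks above translate directly into a simulation scheme: draw a path z_{T+1}, ..., z_{T+h}, then a path x_{T+1}, ..., x_{T+h} conditional on it, and repeat. The sketch below is hypothetical illustrative Python code (not the implementation used in the paper); for concreteness it assumes a two-regime model and a Gaussian AR(1)-type kernel for {Z_t}, and it summarizes the Monte Carlo draws of X_{T+h} by their mean and a 90% quantile band.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative TAR(l; k_1,...,k_l) parameters: one dict per regime j, where
# a = (a_0^{(j)}, a_1^{(j)}, ..., a_{k_j}^{(j)}) and h is the noise scale h^{(j)}.
thresholds = [0.0]                                    # r_1, ..., r_{l-1}
regimes = [dict(a=[0.5, 0.3], h=1.0),
           dict(a=[1.0, -0.4, 0.2], h=2.0)]

def regime_of(z):
    """Index j such that z falls in B_j = (r_{j-1}, r_j]."""
    return int(np.searchsorted(thresholds, z))

def draw_z_path(z_last, horizon):
    """Draw Z_{T+1},...,Z_{T+h} from an assumed AR(1)-type kernel (stands in for Eq. (3))."""
    path, z = [], z_last
    for _ in range(horizon):
        z = 0.8 * z + rng.normal()                    # hypothetical kernel p(z_t | z_{t-1})
        path.append(z)
    return path

def draw_x_path(x_hist, z_path):
    """Draw X_{T+1},...,X_{T+h} given the simulated threshold path (stands in for Eq. (4))."""
    hist = list(x_hist)
    for z in z_path:
        reg = regimes[regime_of(z)]
        a, kj = reg["a"], len(reg["a"]) - 1
        mean = a[0] + sum(a[i] * hist[-i] for i in range(1, kj + 1))
        hist.append(mean + reg["h"] * rng.normal())   # Gaussian factor of remark (2)
    return hist[len(x_hist):]

def forecast(x_hist, z_last, horizon=5, n_sims=20_000):
    """Monte Carlo approximation of the predictive mean and 90% band at each horizon."""
    sims = np.array([draw_x_path(x_hist, draw_z_path(z_last, horizon)) for _ in range(n_sims)])
    return sims.mean(axis=0), np.quantile(sims, [0.05, 0.95], axis=0)

point, bands = forecast(x_hist=[0.2, -0.1, 0.4], z_last=0.3)
```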
2.3. Other characteristics of a TAR process

Before proceeding with additional forecasting aspects of the TAR model, I note the following point: taking into account that the dynamic behavior of the process {X_t} is determined by the threshold process {Z_t}, and that its regimes allow one to define a Markov chain {J_t} with discrete state space, where J_t = j if Z_t ∈ B_j, j = 1, ..., l, one is tempted to use Hamilton's forecasting approach. In fact, model (1) can be redefined in terms of {J_t} instead of {Z_t}. Following the regime-switching approach, certain interesting properties of the TAR model emerge (which were not studied by Nieto), as can be seen below.

First of all, for each t, the distribution of X_t is a mixture, as happens in Hamilton's model, given by the distribution function

F_t(x) = \sum_{j=1}^{l} p_j F_{t,j}(x),    x ∈ R,

where p_j = P(Z_t ∈ B_j) = P(J_t = j), F_{t,j}(x) = P(X_t ≤ x | Z_t ∈ B_j), and R denotes the set of real numbers. Clearly, p_j = F_z(r_j) − F_z(r_{j−1}), where F_z is the invariant distribution function of the Markov chain {Z_t}, which is assumed to be of continuous type. Under the assumptions on model (1), it is straightforward to compute p_j, but not F_{t,j}(x), especially when the autoregressive polynomial φ_j(B) = a_0^{(j)} − \sum_{i=1}^{k_j} a_i^{(j)} B^i, j = 1, ..., l, has unit roots.
With F_t(x) one can compute the first and second moments of X_t, for each t. Indeed, one has that

E(X_t) = \sum_{j=1}^{l} p_j \int_R x_t dF_{t,j}(x_t) = \sum_{j=1}^{l} p_j μ_{t,j,1},        (5)

where μ_{t,j,1} = \int_R x_t dF_{t,j}(x_t), j = 1, ..., l, and that

E(X_t^2) = \sum_{j=1}^{l} p_j \int_R x_t^2 dF_{t,j}(x_t) = \sum_{j=1}^{l} p_j μ_{t,j,2},        (6)

with μ_{t,j,2} = \int_R x_t^2 dF_{t,j}(x_t), j = 1, ..., l. The values μ_{t,j,i}, j = 1, ..., l, i = 1, 2, can be considered as the first two moments of X_t "in each regime". Consequently, I have that Var(X_t) = E(X_t^2) − [E(X_t)]^2. In summary, one has a way of computing the mean and variance functions of the process {X_t}.
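Before turning to the conditional distribution, here is a small numerical sketch of expressions (5) and (6) (hypothetical helper code; the regime weights p_j and the regime-wise moments are assumed to be already available, for instance from the invariant distribution of {Z_t} and the regime-wise autoregressive representations):

```python
import numpy as np

def mixture_mean_variance(p, mu1, mu2):
    """Mean and variance of X_t from regime weights p[j] = P(Z_t in B_j) and
    regime moments mu1[j] = E(X_t | regime j), mu2[j] = E(X_t^2 | regime j),
    following Eqs. (5) and (6)."""
    p, mu1, mu2 = (np.asarray(v, dtype=float) for v in (p, mu1, mu2))
    mean = np.sum(p * mu1)             # Eq. (5)
    second = np.sum(p * mu2)           # Eq. (6)
    return mean, second - mean ** 2    # Var(X_t) = E(X_t^2) - [E(X_t)]^2
```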
Second, if our interest is in the conditional distribution of X_t, given X_s = x_s, s = t − 1, ..., 1, it is not difficult to show that it is also a mixture distribution with the same weights p_j, j = 1, ..., l, as above, but now with l component Gaussian densities with means a_0^{(j)} + \sum_{i=1}^{k_j} a_i^{(j)} x_{t−i} and variances [h^{(j)}]^2, j = 1, ..., l. This fact shows that {X_t} is also a homogeneous Markov chain, with transition kernel density f_X(x_t | x_{t−1}, ..., x_{t−k+1}) given by the mixture just described, where k = max{k_j | j = 1, ..., l}. It is an issue for future research to find out whether this Markov chain has an invariant distribution (and other properties). It is important to note here that the conditional distribution of X_t, given x_{t−1}, ..., x_{t−k+1} and z_t, is Gaussian.

Now, the predictive distribution function F_{T+h}(x | x_T, z_T), h ≥ 1, is also a mixture distribution, and its computation is accomplished by obtaining the l component distribution functions F_{T+h,j}(x | z_{T+h} ∈ B_j, x_T, z_T) and the weights p_j(h) = P(Z_{T+h} ∈ B_j | x_T, z_T), j = 1, ..., l. One plausible route would be to use expressions (3) and (4) above for obtaining the corresponding densities, although this would demand the calculation of complex integrals. In any case, the mathematical form of the predictive distribution function leads one to compute the h-step-ahead forecast of the variable X, given the observed information through time T, by means of

x_{T+h|T} = E(X_{T+h} | x_T, z_T) = \sum_{j=1}^{l} p_j(h) E(X_{T+h} | z_{T+h} ∈ B_j, x_T, z_T),
which is an analogous result to that in [8]. Note that this value is a weighted average of the forecasts "in each regime", a fact similar to that observed in expressions (5) and (6).

Example. I consider a theoretical example to illustrate some of the points raised in this section. Let us assume that {X_t} follows the TAR(3; 1, 2, 1) model given by

X_t = 0.5 X_{t−1} + ε_t,                           if Z_t < 2.5,
X_t = 2.0 + 0.3 X_{t−1} − 0.9 X_{t−2} + 1.5 ε_t,   if 2.5 ≤ Z_t < 4.63,
X_t = −1.0 − 0.7 X_{t−1} + 3.0 ε_t,                if 4.63 ≤ Z_t,

where {Z_t} obeys the AR(1) model Z_t = 1.5 + 0.6 Z_{t−1} + a_t, with {a_t} a zero-mean Gaussian white noise process with variance 1, which is mutually independent of {ε_t}. Obviously, {Z_t} is a homogeneous first-order Markov chain with invariant distribution N(3.75, 1.5625), which can be considered as the initial distribution of the chain. Its transition kernel density f_1(z_t | z_{t−1}) corresponds to a N(1.5 + 0.6 z_{t−1}, 1). Fig. 1 shows two simulated time series {x_t} and {z_t} generated from this model.
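As an aside, a minimal Python sketch of how such a pair of series can be simulated is given below (hypothetical code; the series analyzed here were generated with the RATS package, so seeds and generators do not correspond):

```python
import numpy as np

def simulate_example(n=1000, seed=1):
    """Simulate the TAR(3;1,2,1) example: Z_t follows the AR(1) process
    Z_t = 1.5 + 0.6 Z_{t-1} + a_t and X_t switches regime according to Z_t."""
    rng = np.random.default_rng(seed)
    z = np.empty(n)
    x = np.zeros(n)
    z[0] = rng.normal(3.75, 1.25)              # start from the invariant N(3.75, 1.5625)
    for t in range(1, n):
        z[t] = 1.5 + 0.6 * z[t - 1] + rng.normal()
        eps = rng.normal()
        if z[t] < 2.5:                         # regime 1
            x[t] = 0.5 * x[t - 1] + eps
        elif z[t] < 4.63:                      # regime 2
            x[t] = 2.0 + 0.3 * x[t - 1] - 0.9 * x[t - 2] + 1.5 * eps
        else:                                  # regime 3
            x[t] = -1.0 - 0.7 * x[t - 1] + 3.0 * eps
    return x, z

x_sim, z_sim = simulate_example()
```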
Fig. 1. Simulated time series for the theoretical example: (a) variable Z , (b) variable X .
Each of the simulated series has 1000 observations; one can note the apparent stationarity of {z_t} and bursts of large values in {x_t}, although its mean function seems to be constant. The two time series were simulated using the RATS package (V. 6.02), with seed 16 440 for simulating {ε_t} and seed 14 400 for {a_t}.

Before obtaining some forecasts for the process {X_t}, I compute its mean and variance functions using expressions (5) and (6). First of all, I note that the autoregressive operators for each regime have their roots outside the unit circle; thus, the distribution functions F_{t,j}(x), j = 1, 2, 3, can be easily computed from the infinite stochastic sum that represents X_t in terms of the present and past variables of the noise {ε_t}. Indeed, we find that each one corresponds to a normal distribution, with means 0, 1.25, and −10/3 and variances 4/3, 12.15, and 17.65 for the first, second, and third regimes, respectively (independent of t). From this, one can say that, conditional on the regimes, {X_t} is more volatile in the third regime. I find now that E(X_t) = −0.20 (see Fig. 1 for comparison) and Var(X_t) = 17.58 for all t. That is, the mean and variance functions of the process {X_t} are constant, as happens in stationary processes. Conditions for establishing the stationarity of a process that obeys the TAR model (1) still need to be developed, to the author's current knowledge. All in all, it is important to remark here that the distribution of X_t is the same for all t.

Using the results established in the proposition, with 20 000 iterates for the Monte Carlo simulation, I obtained the forecasts presented in Table 1, where the sample period considered was 1–990 and the forecast horizon was 10. I include the simulated values for the time period 991–1000, the point forecasts, the root mean square errors (RMSE), 90% credible intervals, and 90% highest-density regions (HDR) (as suggested in [9]).
Table 1
Forecasting results in the theoretical example

t    | x_t (a) | x_{t|990} | RMSE (b) | 90% CI (c)      | 90% HDR
-----|---------|-----------|----------|-----------------|---------------------------------
991  | −8.51   | −2.40     | 5.71     | [−8.97, 9.70]   | [−10.75, 6.63] ∪ [6.98, 8.17]
992  | 2.00    | 2.04      | 6.22     | [−9.33, 10.48]  | [−8.96, −8.21] ∪ [−7.13, 12.05]
993  | 9.39    | 2.50      | 7.81     | [−12.86, 12.49] | [−10.84, 13.96]
994  | 1.28    | 0.33      | 7.72     | [−13.17, 12.21] | [−12.94, 12.79]
995  | −2.13   | −0.27     | 8.05     | [−13.14, 14.43] | [−13.63, 13.06]
996  | −0.99   | 0.74      | 8.28     | [−12.86, 14.55] | [−12.03, 15.19]
997  | −0.40   | 1.28      | 8.41     | [−14.09, 14.76] | [−13.03, 15.28]
998  | 0.90    | 0.79      | 8.64     | [−14.11, 14.50] | [−13.94, 14.80]
999  | 1.11    | 0.38      | 8.70     | [−14.45, 14.96] | [−14.21, 14.91]
1000 | 0.72    | 0.54      | 8.71     | [−14.00, 14.87] | [−13.41, 15.97]

(a) Simulated value. (b) RMSE = root mean square error. (c) CI: credible interval.
One can see that all the simulated values fall in the corresponding credible intervals and that the RMSEs increase, as one would expect. Also, it is worth noting that the HDRs are approximately the same as the credible intervals, except for the first and second predictive distributions (i.e., h = 1, 2), where small disjoint subintervals make up the regions. Looking at the corresponding estimated densities, only a very small mode seems to lie at the right side of the density for h = 1 and at the left side for h = 2, implying numerically that the HDR has an extra component subinterval. Nevertheless, one can say that, essentially, all the predictive distributions considered at this forecast horizon are unimodal.

Now, I re-compute the forecast for X_991 using the regime-switching approach described in Section 2.3. Here, p_1(1) = 0.0537, p_2(1) = 0.6448, and p_3(1) = 0.3015, and the forecasts for each regime, i.e., x_{991|990}^{(j)} = E(X_991 | z_991 ∈ B_j, x_990, z_990), j = 1, 2, 3, are the following: x_{991|990}^{(1)} = −1.35, x_{991|990}^{(2)} = −4.08, and x_{991|990}^{(3)} = 0.89. Then, x_{991|990} = −2.43, a value that is close to that obtained with the Bayesian approach (see Table 1).

To compute the minimum mean square error of X_{991|990} (note that I use capital letters for denoting predictors) following the regime-switching approach, I note that, in general,

E[(X_{T+h} − X_{T+h|T})^2 | x_T, z_T] = \int (x_{T+h} − x_{T+h|T})^2 f(x_{T+h} | x_T, z_T) dx_{T+h}
  = \sum_{j=1}^{l} p_j(h) MSE_j(X_{T+h|T}^{(j)}),        (7)

where

MSE_j(X_{T+h|T}^{(j)}) = \int (x_{T+h} − x_{T+h|T})^2 dF_{T+h,j}(x_{T+h} | z_{T+h} ∈ B_j, x_T, z_T),        (8)

for each j = 1, ..., l. Using (7) and (8), I obtain that the RMSE of the predictor X_{991|990} is 5.21, a value that is very close to that in Table 1. The computations for the one-step-ahead point predictor and its MMSE were relatively easy. For h > 1, these calculations become complex because the component predictive distributions needed for obtaining the quantities MSE_j(X_{T+h|T}^{(j)}), j = 1, 2, 3, must be obtained by means of iterated integrations.
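For a single forecast horizon, the combination in (7) and (8) is easy to code once each component predictive distribution has been summarized by a regime mean and a regime variance (for instance, from Monte Carlo draws). The helper below is a hypothetical sketch of that combination, not the computation behind the numbers quoted above:

```python
import numpy as np

def pooled_forecast_and_rmse(weights, regime_means, regime_vars):
    """Combine regime-wise forecasts as in Eqs. (7)-(8), assuming each component
    predictive distribution F_{T+h,j} is summarized by its mean and variance.
    Returns the pooled point forecast and its RMSE."""
    w = np.asarray(weights, dtype=float)        # p_j(h)
    m = np.asarray(regime_means, dtype=float)   # E(X_{T+h} | z_{T+h} in B_j, x_T, z_T)
    v = np.asarray(regime_vars, dtype=float)    # Var(X_{T+h} | z_{T+h} in B_j, x_T, z_T)
    forecast = np.sum(w * m)                    # weighted average of the regime forecasts
    rmse = np.sqrt(np.sum(w * (v + (m - forecast) ** 2)))  # Eq. (7), with Eq. (8) in closed form
    return forecast, rmse

# Purely illustrative inputs (hypothetical, not the values of the example above):
fcst, rmse = pooled_forecast_and_rmse([0.2, 0.5, 0.3], [0.0, 1.5, -2.0], [1.0, 2.3, 9.0])
```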
Fig. 2. Time series for the empirical example: (a) rainfall, (b) flow.
Because of this computational complexity, I recommend using the procedure described in the proposition in order to forecast a univariate process that obeys the TAR model given in expression (1), for any forecast horizon h ≥ 1.

3. An empirical application

Nieto (see [13]) presented a real-data application for illustrating his methodology. The time series considered were the daily rainfall (in mm), as the threshold variable, and a daily river flow (in m^3/s), as the response variable, in a certain Colombian geographical region. The rainfall was measured at the San Rafael Lagoon's meteorological station, with an altitude of 3420 m and geographical coordinates 2.23 north (latitude) and 76.23 west (longitude). The flow corresponds to the Bedon river, a small one in hydrological terms, and was measured at the San Rafael Lagoon's hydrological station, with an altitude of 3300 m and coordinates 2.19 north and 76.15 west. These stations are located close to the Earth's equator and in a very dry geographical zone. This last characteristic permits the control of hydrological/meteorological factors that might distort the kind of dynamical relationship explained by the TAR model. The data set corresponds to the sample period from January 1, 1992, up to November 30, 2000 (3256 observations), and it was assembled by IDEAM, the official Colombian agency for hydrological and meteorological studies. In Fig. 2 one can see the two time series; note the dynamical relationship between the two variables, in the sense that the greater the precipitation, the greater the river flow. Additionally, one can see certain stable paths in both variables, although there are bursts of large values. This fact is a major characteristic to be taken into account for explaining the dynamical behavior of the river flow in terms of precipitation.
Let P_t and X_t be, respectively, the rainfall and the river flow at day t. Because of the universal convention for measuring these two variables, I needed to put Z_t = P_{t−1}; that is, I translated the precipitation one period back in order to relate it to the river flow. As suggested in [13], the flow time series needs two transformations: (1) the square root of the data and (2) an adjustment for (conditional) heteroscedasticity via an ARCH(1) model. From now on, the flow data to be analyzed will be the transformed ones, and I denote them as {x̃_t}. I remark here that the two time series have missing data, 32 in {x_t} and 52 in {z_t}, and in this last case almost all of them are located very close to the end of the time series. Then, I shall address the time series forecasting exercise in the presence of these missing observations.

As explained in [13], the distributions characterizing the first-order Markov chain {Z_t} are not absolutely continuous, hence initial and kernel densities do not exist. In that paper, approximate continuous-type distributions given by f_n(z) = p h_n(z) + (1 − p) g(z), as the invariant density, and f_n(z_t | z_{t−1}) = p(z_{t−1}) h_n(z_t) + [1 − p(z_{t−1})] g(z_t | z_{t−1}), as the kernel density, were used, where n is a positive integer, p = P(Z = 0), g(·) and g(·|·) denote normal densities truncated at z = 0, the first one with mean 3.24 and variance 7.762 and the second with mean z_{t−1} and the same variance, p(z_{t−1}) = P(Z_t = 0 | z_{t−1}) = p^{(j)} if z_{t−1} belongs to regime j, and, finally,

h_n(z) = 0,                          −∞ < z < −1/n,
h_n(z) = (nπ/2) cos(nzπ + π/2),      −1/n ≤ z ≤ 0,
h_n(z) = 0,                          z > 0.

It was found that there are l = 2 regimes for precipitation, that p̂ = 0.26, p̂^{(1)} = 0.87, and p̂^{(2)} = 0.13, and that the TAR model is given by

X̃_t = 1.32 + 0.59 X̃_{t−1} + 1.35 ε_t,    Z_t < 6.0,
X̃_t = 1.98 + 0.58 X̃_{t−1} + 2.28 ε_t,    Z_t ≥ 6.0,

which was identified and estimated in the presence of the missing data. These missing observations can be replaced by the estimated ones, if needed, for forecasting purposes. Following the results presented in Section 2.3, one can say that, conditional on the regimes, the average and the variability of the flow are higher in the second regime than in the first one. Nevertheless, the (marginal) average and the (marginal) variance are 3.82 and 5.28, respectively, and are constant over t. These values were obtained using p_1 = F_z(6.0) = 0.60 and p_2 = 1 − F_z(6.0) = 0.40, where F_z denotes the exact invariant-distribution function of {Z_t} (see [13] for details). Here, it is important to remark that the mean and variance functions of the process {X̃_t} are constant because the autoregressive polynomials for each regime have their roots outside the unit circle. Additionally, the sample partial and simple autocorrelation functions of this process decay quickly, as can be seen in Fig. 3. This observation, and the fact that its mean and variance functions are constant, indicate that the process {X̃_t} could be second-order stationary. From this point of view, I obtained an estimate of the density function of the variable X̃, which is plotted in Fig. 4. This estimation was accomplished nonparametrically, using the Epanechnikov kernel (via the RATS package with its default value for the bandwidth). One can see there that the distribution is asymmetric and that it looks bimodal.

An aspect noted in [13], when estimating the missing data in the time series {x̃_t}, was that credible-interval lower extremes can be negative, an inadequate value taking into account that the (transformed) river flow variable is positive. The explanation for this fact is that the TAR model noise can take negative values and there are no explicit restrictions on the model parameters to ensure that the variable X̃ takes positive values. Hence, an important topic for future research emerges here: allowing a nonnegative noise in the model or finding restrictions on the nonstructural parameters of the model.
Fig. 3. Sample autocorrelation functions for the flow variable: partial at top and simple at bottom.
Fig. 4. Estimated density function for the distribution of the (transformed) flow variable.
After having analyzed the statistical properties of the TAR model, I proceed to obtain some forecasts for the variables X̃ (the transformed river flow) and Z (the delayed precipitation). The key part of the forecasting exercise lies in the simulation of the variables Z_{T+h}, h ≥ 1, via expression (3) in the proposition, since the kernels are mixture distributions whose second component is a truncated normal distribution. I follow Nieto's approach with a modification for drawing from the latter distribution. Strictly speaking, this is a modification of Robert and Casella's approach (see [15], p. 52); the details are given in the Appendix.
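To make the simulation of Z_{T+1} concrete, the following hypothetical sketch draws one step ahead from the approximate kernel f_n(z_t | z_{t−1}) described above. It approximates the h_n spike by an exact zero (a dry day) and, as a shortcut, draws the truncated-normal component with scipy.stats.truncnorm rather than with the accept–reject modification of the Appendix; the regime threshold and the probabilities p̂^{(j)} are taken from the fitted model quoted above, while the standard deviation of the truncated-normal component is only a placeholder.

```python
import numpy as np
from scipy import stats

def draw_next_rainfall(z_prev, rng, threshold=6.0,
                       p_hat_regime=(0.87, 0.13), sigma=7.76):
    """One draw of Z_{t} from the approximate kernel
    f_n(z_t | z_{t-1}) = p(z_{t-1}) h_n(z_t) + [1 - p(z_{t-1})] g(z_t | z_{t-1})."""
    j = 0 if z_prev < threshold else 1            # regime of z_{t-1}
    if rng.uniform() < p_hat_regime[j]:           # dry day: Z_t = 0 with probability p^(j)
        return 0.0
    a = (0.0 - z_prev) / sigma                    # left-truncation point in standardized units
    return stats.truncnorm(a, np.inf, loc=z_prev, scale=sigma).rvs(random_state=rng)

rng = np.random.default_rng(7)
sample = [draw_next_rainfall(2.0, rng) for _ in range(5)]
```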
Table 2
Flow forecasts in the empirical example

t    | x_{t|3256} | RMSE | 90% CI
-----|------------|------|---------------
3257 | 4.08       | 2.46 | [0.59, 7.70]
3258 | 3.74       | 2.65 | [−0.27, 7.85]
3259 | 3.58       | 2.68 | [−0.54, 7.89]
3260 | 3.50       | 2.68 | [−0.71, 7.66]
3261 | 3.36       | 2.73 | [−0.83, 7.56]

Table 3
Forecasts for rainfall in the empirical example

t    | z_{t|3256} | 90% CI
-----|------------|------------
3257 | 1.58       | [0, 13.80]
3258 | 0.95       | [0, 7.91]
3259 | 0.86       | [0, 6.90]
3260 | 0.83       | [0, 7.14]
3261 | 0.86       | [0, 7.27]
In Tables 2 and 3, I present forecasts for both variables, including 90% credible intervals, for the forecast horizon 3257–3261. One can see there that the credible-interval lower extremes for the variable Z are zero, a consistent value, while for the variable X some of them are negative, a fact explained previously.

In order to obtain the previous forecasts, I used the time series without estimating the missing data. Actually, this estimation is not necessary because (i) the autoregressive orders for the process {X̃_t} are 1 in both regimes and (ii) {Z_t} is a Markov chain of order 1. Otherwise, estimates of the missing data would be necessary for implementing the results obtained in the proposition, especially for the time series {z_t}, which has missing data at the time points t = 3247, 3248, ..., 3255 (9 observations), close to the end of the time series. In fact, almost all of the missing data in this time series are located in the time period 3198–3255. An exercise for obtaining the same forecasts, using the estimated missing data (with Nieto's smoother), was conducted. As expected, and except for very small numerical differences, the results were practically the same. The bottom line is that imputation of missing observations does not provide new information (in Fisher's or Kullback–Leibler's sense) about the data-generating process.

I did an ex post exercise, taking 1–3190 as the sample period and leaving out the 7 data points located in the period 3191–3197. As mentioned in the preceding paragraph, after this time period there are missing observations, so treating estimated data as observed ones over a forecast horizon is not so reasonable. In Tables 4 and 5, I include the results of this forecasting exercise, where one can see that (i) for the flow variable, all the observed data fall in the 90% credible intervals and there is no systematic forecasting error, and (ii) for the rainfall variable, all the observed values are in the 98% credible intervals, which are practically the same as the HDRs, but there is a negative systematic prediction error. I believe this fact is due to the relatively high proportion of zeroes (about 25%) and small values in the sample, which make the interquartile range lie between 0 and 8.0 mm, and to the stationarity of this Markov chain. Together, these two empirical facts make the forecasts tend to a low quantile. Fig. 5 displays the box plot of the rainfall variable, which is computed under the Markov-chain invariant-distribution assumption.
Table 4
Ex post forecasts for the flow variable

t    | x_t (a) | x_{t|3190} | RMSE | 90% CI         | 90% HDR
-----|---------|------------|------|----------------|----------------
3191 | 2.22    | 6.58       | 2.32 | [3.14, 10.08]  | [3.03, 9.80]
3192 | 3.00    | 5.26       | 2.59 | [1.37, 9.29]   | [1.27, 9.17]
3193 | 1.30    | 4.47       | 2.67 | [0.33, 8.68]   | [0.18, 8.44]
3194 | 1.23    | 3.98       | 2.65 | [−0.18, 8.12]  | [−0.23, 8.19]
3195 | 4.91    | 3.72       | 2.74 | [−0.60, 8.10]  | [−0.66, 7.87]
3196 | 3.80    | 3.55       | 2.73 | [−0.68, 7.79]  | [−0.59, 7.78]
3197 | 3.16    | 3.46       | 2.76 | [−0.85, 7.74]  | [−0.71, 7.65]

(a) Observed data.
Table 5
Ex post forecast results for the rainfall variable

t    | z_t (a) | z_{t|3190} | 98% CI
-----|---------|------------|------------
3191 | 15.0    | 1.21       | [0, 19.14]
3192 | 15.0    | 0.91       | [0, 16.12]
3193 | 2.0     | 0.88       | [0, 15.21]
3194 | 1.0     | 0.84       | [0, 15.05]
3195 | 1.0     | 0.87       | [0, 14.04]
3196 | 9.0     | 0.91       | [0, 14.98]
3197 | 5.0     | 0.85       | [0, 14.39]

(a) Observed data.
Fig. 5. Box plot of the (delayed) precipitation variable. The scale is in mm.
With respect to the 90% HDRs for the flow-variable forecasts, overall, they are slightly shorter than the corresponding credible intervals and indicate that all the predictive distributions considered over that forecast horizon are unimodal.
4. Conclusions

I have developed a procedure for computing forecasts of the two variables involved in a univariate TAR model. Essentially, the method consists of obtaining the predictive distributions of the variables of interest and then drawing samples from them via Monte Carlo simulation, in order to estimate their means and variances. Because the TAR model can be seen as a regime-switching model, I have also computed the forecasts under this approach and obtained general expressions for the predictors, which can be implemented in certain particular cases. Importantly, the predictive distributions are mixtures. However, because of the lack of generality in computing both the point forecasts and uncertainty measures (such as the MMSE) along this route, I recommend using the predictive distributions directly. As a byproduct of this approach, I also find some statistical properties of the TAR process that were unknown from previous studies on the subject. Specifically, it can be said that (i) the output process of the dynamical system described by the TAR model is a homogeneous Markov chain and (ii) the kernel and univariate marginal densities are mixtures.

Forecasting with the univariate TAR model can be done in the presence of missing data. If needed, the missing data can be estimated before forecasting, but this is not a mandatory requirement. Methodologically, first, one tests for nonlinearity in the presence of missing observations (see [14]); second, one fits the assumed TAR model and estimates the missing data, if necessary; and, finally, one computes predictions. Finally, I consider that the results can be easily extended to the multivariate TAR model with exogenous covariates (as in [19]), except, perhaps, for computational details.

Appendix. Simulating from a truncated normal distribution

Following [15], the so-called kernels of the standard normal density and the translated exponential density are exp{−z^2/2} and exp{−α(z − a)}, respectively, with α > 0 and a the translation point. The ratio between the first and the second kernel becomes exp{−(z − α)^2/2} exp{α^2/2 − αa}. Since exp{−(z − α)^2/2} ≤ 1 for all real numbers z, an appropriate upper bound for the ratio is given by M(α) = exp{α^2/2 − αa}. There are two possibilities for a: (i) a > 0 and (ii) a ≤ 0. In the first case, M(α) takes its minimum value at α = a and, in the second, its minimum value does not exist. In practice, in the latter case, one can set α very close to zero.

The analysis above was made for standard normal densities (as in [15]); hence, for applying the accept–reject method to nonstandard normal distributions, I propose the following modified procedure. Assume X is distributed according to a normal with mean μ and variance σ^2, left-truncated at x = b. With M and α selected as indicated above, and setting a = (b − μ)/σ, do the following for drawing from X:

1. Draw y from Exp(α) (the non-translated exponential distribution with parameter α).
2. Set z = y + a.
3. Compute r(z) = exp{−z^2/2}/[M exp{−α(z − a)}] and draw u from the U(0, 1) distribution.
4. If u ≤ r(z), set x = σz + μ; otherwise, go back to step 1.

Step 2 is justified by the following argument: let Y ~ Exp(α) and let Z = Y + a; then the density function of Z is f(z) = α exp{−α(z − a)}, which corresponds to an exponential distribution translated towards a and with parameter α. Step 4 is motivated by the fact that, given Z ~ N(0, 1), then X = μ + σZ ~ N(μ, σ^2). Finally, note that if z > a, then x > b.
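A minimal Python sketch of these four steps follows (hypothetical code, not the paper's implementation; when a ≤ 0 it uses a small positive α, as suggested above, although in that case the scheme can become inefficient):

```python
import numpy as np

def rtruncnorm_left(mu, sigma, b, rng, max_iter=100_000):
    """Draw from a N(mu, sigma^2) left-truncated at x = b via the
    translated-exponential accept-reject scheme described in the Appendix."""
    a = (b - mu) / sigma                        # truncation point on the standard normal scale
    alpha = a if a > 0 else 1e-2                # minimizer of M(alpha) when a > 0; small value otherwise
    log_M = alpha ** 2 / 2 - alpha * a          # M(alpha) = exp{alpha^2/2 - alpha a}
    for _ in range(max_iter):
        y = rng.exponential(scale=1.0 / alpha)  # step 1: y ~ Exp(alpha)
        z = y + a                               # step 2: translated exponential, so z >= a
        log_r = -z ** 2 / 2 - (log_M - alpha * (z - a))  # step 3: log of the ratio r(z)
        if np.log(rng.uniform()) <= log_r:      # step 4: accept with probability r(z)
            return mu + sigma * z               # transform back to the N(mu, sigma^2) scale
    raise RuntimeError("accept-reject sampler did not terminate")

rng = np.random.default_rng(123)
draw = rtruncnorm_left(mu=0.0, sigma=1.0, b=2.0, rng=rng)  # N(0,1) truncated to x > 2
```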
References

[1] M.P. Clements, J. Smith, The performance of alternative forecasting methods for SETAR models, International Journal of Forecasting 13 (1997) 463–475.
[2] M.P. Clements, J. Smith, A Monte Carlo study of the forecasting performance of empirical SETAR models, Journal of Applied Econometrics 14 (1999) 123–141.
[3] M.P. Clements, H.-M. Krolzig, A comparison of the forecast performance of Markov-switching and threshold autoregressive models of US GNP, Econometrics Journal 1 (1998) C47–C75.
[4] M.P. Clements, P.H. Franses, J. Smith, D. van Dijk, On SETAR non-linearity and forecasting, Journal of Forecasting 22 (2003) 359–375.
[5] P. Congdon, Bayesian Statistical Modeling, John Wiley & Sons, New York, 2001.
[6] J.G. De Gooijer, P. De Bruin, On SETAR forecasting, Statistics and Probability Letters 37 (1997) 7–14.
[7] P.H. Franses, D. van Dijk, Nonlinear Time Series Models in Empirical Finance, Cambridge University Press, Cambridge, 2000.
[8] J.D. Hamilton, Time Series Analysis, Princeton University Press, Princeton, 1994.
[9] R.J. Hyndman, Computing and graphing highest density regions, The American Statistician 50 (1996) 120–126.
[10] S. Lundbergh, T. Teräsvirta, Forecasting with smooth transition autoregressive models, in: M.P. Clements, D.F. Hendry (Eds.), A Companion to Economic Forecasting, Blackwell, Oxford, 2002, pp. 485–509.
[11] S.P. Meyn, R.L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, 1993.
[12] A.L. Montgomery, V. Zarnowitz, R.S. Tsay, G.C. Tiao, Forecasting the US unemployment rate, Journal of the American Statistical Association 93 (1998) 478–493.
[13] F.H. Nieto, Modeling bivariate threshold autoregressive processes in the presence of missing data, Communications in Statistics: Theory and Methods 34 (2005) 905–930.
[14] F.H. Nieto, On testing for nonlinear autoregressive processes in the presence of missing data, Technical Report No. 6, Department of Statistics, National University of Colombia, Bogotá, 2006. (This manuscript can be obtained from the author upon request.)
[15] C.P. Robert, G. Casella, Monte Carlo Statistical Methods, Springer-Verlag, New York, 1999.
[16] H. Tong, On a threshold model, in: C.H. Chen (Ed.), Pattern Recognition and Signal Processing, Sijthoff & Noordhoff, Amsterdam, 1978.
[17] H. Tong, K.S. Lim, Threshold autoregression, limit cycles, and cyclical data, Journal of the Royal Statistical Society Series B 42 (1980) 245–292.
[18] H. Tong, Nonlinear Time Series: A Dynamical System Approach, Oxford University Press, Oxford, 1990.
[19] R.S. Tsay, Testing and modeling multivariate threshold models, Journal of the American Statistical Association 93 (1998) 1188–1202.
[20] R.S. Tsay, Analysis of Financial Time Series, John Wiley & Sons, New York, 2002.