Computational Statistics and Data Analysis (
)
–
Contents lists available at ScienceDirect
Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda
Bayesian estimation of smoothly mixing time-varying parameter GARCH models Cathy W.S. Chen a,∗ , Richard Gerlach b , Edward M.H. Lin a a
Department of Statistics, Feng Chia University, Taiwan
b
Discipline of Business Analytics, University of Sydney, Australia
article
info
Article history: Received 21 January 2013 Received in revised form 20 September 2013 Accepted 21 September 2013 Available online xxxx Keywords: Forecasting Markov chain Monte Carlo method Smooth transition Structure breaks Value-at-Risk Time-varying GARCH model
abstract Smoothly time-varying (TV) GARCH models via an asymmetric logistic function mechanism are proposed, which are incorporated into the conditional volatility equation for capturing smooth structural breaks in the volatility of financial time series. The proposed models allow smooth transitions of varying ‘‘speed’’ between multiple, persistent regimes. A Bayesian computational method is employed to identify the locations of smooth structural transitions, and for estimation and inference, simultaneously accounting for heteroskedasticity. An informative prior is proposed to help ensure identification and allow accurate inference. The proposed methods are illustrated using simulated data, and an empirical study provides evidence for significant improvements in fit for the proposed smooth asymmetric time-varying volatility TV-GARCH models in two international stock market return series. A forecast study shows the proposed models significantly add to forecast accuracy for both volatility and Value-at-Risk. © 2013 Elsevier B.V. All rights reserved.
1. Introduction Modeling time-varying volatility of financial returns has been explored extensively, following from the Autoregressive Conditional Heteroskedastic (ARCH) model in Engle (1982) and the Generalized ARCH (GARCH) model in Bollerslev (1986). Such models usually capture the shorter-term dynamics of volatility clustering only. We will investigate whether parameter constancy is valid in GARCH models applied to long time series of returns. It is expected that economic or political events or changes may cause the structure of volatility models to also change over time. Further, empirically the assumption of stationarity may be inappropriate for volatility whose longer-run properties generally appear to change slowly over time. The main purpose of this paper is to propose a more general heteroskedastic model for capturing smooth structural breaks in the volatility of financial time series. The fractionally integrated GARCH (FIGARCH) model by Baillie et al. (1996) attempts to explain the long-run nonstationary behavior of volatility via a long-memory time series model; Baillie and Morana (2009) allow a deterministically changing intercept to generalize this model. Another line of research has focused on structural change, via sharp or smooth transition functions. Dahlhaus and Subba Rao (2006) introduce a time-varying ARCH model to describe this kind of smooth structural change. Amado and Teräsvirta (2008, 2013) introduce two non-stationary GARCH models in this context. Their smooth transition time-varying (TV-)GARCH model allows the unconditional volatility to transition to different levels via either an additive or multiplicative structure, as follows: y t = µt + εt ,
εt = σt ζt , ∗
ζt ∼ D(0, 1)
Corresponding author. Tel.: +886 4 24517250; fax: +886 4 2451 7092. E-mail addresses:
[email protected],
[email protected] (C.W.S. Chen).
0167-9473/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.csda.2013.09.019
2
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
σt2 = ht + gt or σt2 = ht × gt q p ht = α0 + αi εt2−i + βj ht −j i=1
gt =
α0 + ∗
j =1
q
αi∗ εt2−i
+
i =1
G(t ; γ ; c ) =
βj ht −j G(t ∗ ; γ ; c ) ∗
j =1
∗
p
1 + exp −γ
K
−1 (t − ck ) ∗
,
t ∗ = t /T , 0 < c1 < · · · < cK < 1,
(1)
k=1
which is Eq. (6) in Amado and Teräsvirta (2008, 2013), and where yt is the time t financial return, D(0, 1) is a distribution with mean 0 and variance 1, and {εt } is an i.i.d. innovation sequence with the conditional mean E (εt |Ft −1 ) = 0 and conditional variance E (εt2 |Ft −1 ) = σt2 . The term ht here is the conditional heteroskedasticity when G(t ∗ ) = 0; the model assumes this process follows the qstandard GARCH(p, q) of Bollerslev (1986). The term gt is the amount of structural change, with a maximum of (α0∗ + i=1 αi∗ εt2−i + pj=1 βj∗ ht −j ), tempered by the magnitude of G(t ∗ ), which is the transition function value; G(t ∗ ) is a continuous, nonnegative function bounded between 0 and 1, which depends on a smoothness, or speed of transition, parameter γ , γ > 0, a threshold location parameter ck , the number of thresholds K , and t ∗ = t /T , where T is the total number of observations. Larger values of γ increase the speed of transition between regimes. Here G(t ∗ ) is a symmetric transition function, as the speed of transition back and forth between regimes is constant. However, such a symmetric transition function may be insufficient in describing the volatility in asset returns, since most financial time series exhibit an asymmetric behavior in volatility; volatility is higher when prices are falling than when they are rising; as such prices tend to fall (rise) faster than they rise (fall), so that volatility tends to rise faster than it falls, suggesting that regime transitions may occur at different speeds. Hence, in this paper we consider asymmetric transition functions. We also adjust the transition function to aid in identifiability of the structural change parameters in gt and build a prior distribution that also aids in that goal. We restrict our focus to the additive representation above, though the methods we employ extend in a straightforward manner to multiplicative models. In this paper, we propose a more general and adjusted asymmetric transition TV-GARCH model with an additive structure. A computational and adaptive Bayesian Markov chain Monte Carlo (MCMC) method is developed and employed to simultaneously identify the locations of smooth structural transitions, and to perform estimation and inference, while also accounting for heteroskedasticity and autocorrelation in the data; this includes interval estimation for the regime transition locations and the speed of transitions between them. The proposed methods are illustrated using a simulation study and empirical examples. Model selection is important here, since the number of transitions is unknown and must be chosen at some stage. We employ a forecast study to compare between models and choose the optimal number of transitions for each data set. The rest of the paper is as follows: Section 2 describes the proposed the time-varying heteroskedastic model. Section 3 discusses estimation in the Bayesian framework, specifically describing the MCMC methods employed and also describes the method of model selection. Section 4 presents and discusses the results of some Monte Carlo simulation studies. An empirical study, including selection of the optimal number of regimes, of the US and Hong Kong stock market index returns, is presented in Section 5. Concluding remarks are in Section 6. 2. The proposed model We consider the additive structure defined in Eq. (1). Fig. 1 shows some examples of the symmetric logistic transition function for various values of the smoothness parameter γ . Suppose K = 1 with c1 = 0.5, as in Fig. 1(a). The transition occurs just once, and the shape of the transition function increases from 0 to 1, since the restriction γ > 0 is employed for identification purposes. Functions employing γ = 5, 10, 25, 50 and 100 are shown, highlighting that the speed of transition between regimes increases with γ . Smaller values of γ cause smoother, slower transitions, while γ > 50 is effectively a sharp or abrupt transition: i.e. a structural break. When K = 2, there are two changes or transitions: Fig. 1(b) highlights this for c1 = 0.1, c2 = 0.7 and γ = 5, 10, 25, 50, 100 and 200. The transition function starts (t ∗ = 0) at a relatively high value (not necessarily 1), then decreases with t ∗ , while not necessarily reaching 0, and then increases back up again, without necessarily reaching 1. The functions are symmetric about c +c t ∗ = 1 2 2 , implying equal speed of transition when volatility regimes change. To potentially be more in line with empirical financial return behavior, we extend the symmetric transition TV-GARCH model in a number of ways. First, we allow for potentially asymmetric transition functions, allowing different values of the smoothness parameter γk for each transition between regimes. The improved TV-GARCH model is defined as: yt = µt + εt ,
εt = σt ζt , ζt ∼ D(0, 1) σt2 = ht + gt
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
3
Fig. 1. Plots of logistic transition function for: (a) c1 = 0.5 and (b) c1 = 0.1, c2 = 0.7.
ht = α0 +
q
αi εt2−i +
i =1
gt =
K
p
βj σt2−j
j=1
q
α0 + ∗
αi∗ εt2−i
+
p
i=1
l =1
βj σ ∗
2 t −j
Gl (t ∗ ; γl ; cl )I [t ∗ ∈ Sl ]
(2)
j=1
−1
Gl (t ∗ ; γl ; cl ) = 1 + exp (−1)K −l+1 × γl × (t ∗ − cl )
,
where γl > 0, l = 1, . . . , K and 0 < c1 < · · · < cK < 1 and Sl stands for the local region where G(t ∗ ) is monotonic and equals G(t ∗ ) = Gl (t ∗ ), as detailed below. I [·] is a 0–1 indicator variable. If K = 2, so that there are two regime transitions, then the structural change part of the model is:
gt =
α0 + ∗
q
αi∗ εt2−i
+
i =1
+
α0 + ∗
p
βj σ ∗
G1 (t ∗ ; γ1 ; c1 )I [t ∗ ≤ tc ]
2 t −j
j=1
q
αi∗ εt2−i
i =1
+
p
βj σ ∗
2 t −j
G2 (t ∗ ; γ2 ; c2 )I [t ∗ > tc ],
j=1
where I [·] is a 0–1 indicator and where tc is a connection point between the two transition functions, calculated by: tc =
γ1 × c1 + γ2 × c2 . γ1 + γ2
(3)
Thus we have S1 ≡ t ∗ ≤ tc and S2 ≡ t ∗ > tc . Here, the functions G1 , G2 are defined as: G1 (t ∗ ; γ1 ; c1 ) = 1 + exp[γ1 × (t ∗ − c1 )]
−1
G2 (t ∗ ; γ2 ; c2 ) = 1 + exp[−γ2 × (t ∗ − c2 )]
−1
.
This is a continuous function that is smooth everywhere except at the point t ∗ = tc , and is still non-negatively bounded between 0 and 1. This new transition function includes two regions, as shown in Fig. 2. The shape of the function tilts one way or the other, depending on which section has the higher value of γ and thus the faster transition speed. Clearly this function can accommodate large differences in transition speed between decreasing and increasing transitions, and is more flexible than those in Fig. 1. As both Figs. 1 and 2 illustrate, there could be parameter identification issues if any of the parameters γl are too low, i.e. too close to 0, or if any of the cl are too close together. If the speed of transition is too slow, there may not be enough observations in a particular regime to identify or estimate either or both of the GARCH parameters α0 , αi , βi or the structural change parameters α0∗ , αi∗ , βi∗ . The same thing may occur if c1 is too close to 0 or cK is too close to 1. It seems sub-optimal to set direct limits on the γl , since they combine to influence this issue and work in unison to determine the transition function values. As such, to help ensure identification, we propose to ensure that min(Gl (t ∗ ; γl ; cl )) < δ and max(Gl (t ∗ ; γl ; cl )) > 1 − δ , during estimation, while also min(cl − cl−1 ) ≥ hl . Suitable choices of δ, hl could be 0.25 and 0.1 respectively (or similar),
4
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
Fig. 2. Plots of asymmetric logistic transition function for K = 2: (a) c1 = 0.1, c2 = 0.7 and (b) c1 = 0.2, c2 = 0.35, G1 with γ1 on the left of crossing point, and G2 with γ2 on the right.
depending on user-preference and sample size. As an example, if we consider the case: c1 = 0.2, c2 = 0.35 (see Fig. 2(b)), these criteria are not achieved for Gl (t ∗ ; γl ; cl ) < δ until γl reaches a value of about 20, after which they are achieved for all higher values of γl . However, γl below 20 is a problem, since then regimes can be ill-defined and identification could be a problem. Clearly, much higher values of γl are required for the TV-GARCH model when K > 1, compared to K = 1 (see Fig. 1), to ensure identification and reasonable estimation properties. An important criterion for Gl (t ∗ ; γl ; cl ) is min(Gl (t ∗ ; γl ; cl )) < δ , since we can see from Figs. 1(b) and 2 that the max(Gl ∗ (t ; γl ; cl )) is easily greater than 1 − δ when min(Gl (t ∗ ; γl ; cl )) < δ . Further, the minimum of Gl (t ∗ ; γl ; cl ) always occurs at tc , defined in Eq. (3). Thus, to ensure min(Gl (t ∗ ; γl ; cl )) < δ ≡ Gl (tc ; γl ; cl ) < δ , we propose the limits on limit γl , in the case K = 2, of:
1−δ
(tc − c1 ), δ 1−δ γ2 > log (c2 − tc ). δ
γ1 > log
In a Bayesian framework, this restriction simply becomes part of the prior distribution, as detailed below. Thus, for K = 2 we can estimate γ1 , γ2 , constrained to be larger than this lower bound, which will ensure that min(Gl (t ∗ ; γl ; cl )) < δ is satisfied, while max(Gl (t ∗ ; γl ; cl )) > 1−δ can be assured empirically. For K > 2, the conditions min(Gl (t ∗ ; γl ; cl )) < δ and max(Gl (t ∗ ; γl ; cl )) > 1 − δ can be ensured empirically, as in Hudson and Gerlach (2008), during the MCMC simulation, as discussed below. 3. Bayesian inference Bayesian inference usually requires the specification of a likelihood function, and a prior probability distribution for the model parameters, which combine via Bayes rule to give the posterior distribution used for estimation and inference. This section presents the general likelihood functional form for the TV-GARCH models considered in the paper, together with details of the prior distributions employed, under both Gaussian and Student-t errors. 3.1. Log-likelihood function From Bollerslev et al. (1992), in general a GARCH(1,1) model will be sufficient to capture the volatility clustering in the data, so we restrict our focus to that specification, though our methods are straightforward to extend to higher order models. We thus focus on the asymmetric TV-GARCH(1,1) model with K = 2, defined as: yt = µt + εt ,
εt = σt ζt ,
µt = 0 ζt ∼ i.i.d. D(0, 1)
σt2 = α0 + α1 εt2−1 + β1 σt2−1 + (α0∗ + α1∗ εt2−1 + β1∗ σt2−1 ) (1 + exp[γ1 (t ∗ − c1 )])−1 I [t ∗ ≤ tc ] + (1 + exp[−γ2 (t ∗ − c2 )])−1 (1 − I [t ∗ ≤ tc ]) , where D(0, 1) is a distribution with mean 0 and variance 1.
(4)
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
5
Let θ = (α, γ, c )′ be a vector of model parameters, where α = (α0 , α1 , β1 , α0∗ , α1∗ , β1∗ )′ , γ = (γ1 , γ2 )′ , and c = (c1 , c2 )′ . Let y = (y1 , y2 , . . . , yT ) denote the log return data. The likelihood function, under Gaussian errors (i.e. D follows a standard normal) is given by: L(y |θ) =
T
√
t =1
1
2π σt
exp −
y2t 2σt2
.
(5)
Evidence in the literature shows that residuals from financial time series models tend to be leptokurtic and hence nonGaussian. Thus, in the empirical study we assume the innovation sequence {εt } follows a standardized Student-t distribution with degrees of freedom ν . Therefore, the likelihood becomes:
− ν+2 1 Γ ν+2 1 (yt − µ)2 1 √ L(y |θ) = 1+ . Γ ν2 (ν − 2)σt2 (ν − 2)π σt t =1 T
The TV-GARCH model is overall not stationary, as there is no constant long-run volatility that the conditional volatility series reverts to. However, each regime is assumed as a 2nd order stationary process. Thus, we employ sufficient conditions for non-negativeness and covariance-stationarity in each regime for the asymmetric TV-GARCH(1,1) model, which are:
α0 > 0, α1 > 0, β1 > 0 Non-negativity of the conditional variance: α0 + α0∗ ≥ 0, α1 + α1∗ ≥ 0, α1 + β1 < 1, Covariance-stationarity: α1 + β1 + (α1∗ + β1∗ ) < 1.
β1 + β1∗ ≥ 0.
(6)
(7)
Thus, there is a finite unconditional mean volatility in each regime and the process smoothly moves between the regimes with these finite mean volatilities. However, the process does not necessarily ever revert to a single long-run variance, so it is non-stationary overall. These likelihood functions are not integrable in terms of γl and frequentist standard errors are problematic for γl too. Amado and Teräsvirta (2008, 2013) experienced difficulty estimating these parameters, since they reported estimates of exactly γl = 100, with—reported for standard errors (pp. 31 and 34). We alleviate this issue in two ways, both via the prior distribution: first through the restrictions on γl described above, and secondly through having an informative prior distribution. 3.2. Priors and posteriors Prior distributions must be assumed. Conjugate priors are not available under the Gaussian or Student-t error distributions, as the likelihood is a non-standard form in all model parameters. Thus we turn to computational methods, such as MCMC algorithms. Gerlach and Chen (2008) demonstrate the favorable performance of Bayesian inference via MCMC methods for doubly smooth transition GARCH models, and we extend and adapt their MCMC sampling scheme in this paper. First, we assume γk |c has a log-normal distribution, denoted ln(γk ) ∼ N (µγ , σγ2 ), constrained to satisfy min(Gl (t ∗ ; γl ; cl )) < δ and max(Gl (t ∗ ; γl ; cl )) > 1 − δ . If the number of breaks is two, i.e. K = 2, the prior for the threshold parameters, as proposed by Chen et al. (2010), for three regimes is: c1 ∼ Unif(lb1 , ub1 );
c2 |c1 ∼ Unif(lb2 , ub2 ),
where we can set lb1 = h1 and ub1 = (1 − h1 − h2 ), with hl defined previously. e.g. If h1 = h2 = 0.1, then c1 ∈ (0.1, 0.8). Further, set ub2 = (1 − h2 ) and lb2 = c1 + h1 . Thus, the prior for c1 , c2 is flat over the region ensuring c1 + h1 ≤ c2 and at least min(h1 , h2 )100% of observations are contained in each regime. A joint Jeffreys-like prior is assumed for α, constrained to satisfy the required conditions in (6) and (7), i.e.,
π (α) = I (α0 ∈ (0, a))I (α0∗ ∈ (−α0 , b))I (α−0 ∈ S )
1 − α1 − β1 1 − α1 − α1∗ − β1 − β1∗
α0
α0 + α0∗
where α = (α0 , α1 , β1 , α0∗ , α1∗ , β1∗ )′ , α−0 is the parameter vector after removing α0 and α0∗ , and S is the region satisfying the conditions in (6) and (7). This prior is inversely proportional to the unconditional mean volatility in each regime and follows the spirit of a Jeffreys prior on a variance parameter. We assume a priori independence among the groupings α, (γ, c ). Multiplication of each prior followed by the conditional likelihood function in (5) leads to the conditional posterior density for each parameter group. These conditional posterior distributions are not of standard forms, thus we decide to use adaptive MCMC methods for computational inference, employing Metropolis (M) and M–Hastings (Hastings, 1970; Metropolis et al., 1953) accept–reject methods. The adaptive sampling methods combine random walk (RW) MH and independent kernel (IK) MH algorithms to increase the speed of convergence and efficiency (see Chen and So, 2006 and Gerlach and Chen, 2008). Mixtures of distributions are employed as proposal distributions, both in the burn-in and sampling period, so as to ensure good coverage of the posterior and to help jump out of local modes where standard algorithms may get stuck. Our method is to use mixtures that ensure fat-tailed proposals
6
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
and builds on suggestions in this area from Chen et al. (2012) and Lin et al. (2012), being also a simplified, special case of the more complicated methods in Hoogerheide et al. (2007). For a general parameter grouping θ , we use a RW-MH algorithm with a mixture of three Gaussians as a proposal density, 3 during the burn-in period. At the ith MCMC iteration, θ [i] ∼ j=1 ρj N (µ, kj Ω ), where we set µ to be the last iterated value for that parameter group and also ρ = {0.7, 0.2, 0.1} and k = {1, 9, 81}. Small changes to either or both of ρ, k cause no practical change to the results reported in this paper. This mixture proposal density allows reasonably heavy tails, that will likely lead to better and more efficient MCMC samples, see Chen et al. (2012) and Lin et al. (2012), compared to the usual Gaussian proposal RW-M method. The detailed procedure for our MCMC algorithm for the asymmetric TV-GARCH model is described as follows: Step Step Step Step Step
1: 2: 3: 4: 5:
Give initial values for all parameters, i.e. for i = 1. The set i = 2. Generate γ (i) |c (i−1) , y , α(i−1) from its posterior. Generate c (i) |γ (i) , y , α(i−1) from its posterior. Generate α(i) |y , c (i) , γ (i) from its posterior. Set i = i + 1.
Repeat these steps until i = M. Each of steps 2–4 above involves simulating from the mixture proposal density detailed above and accepting the new proposed value, or retaining the previous iterate, based on the usual MH acceptance probability. For the sampling period, we employ an IK-MH algorithm, where the proposal distribution is the mixture of three Gaussians above, except that µ, Ω are set equal to the sample mean vector and sample variance–covariance matrix, respectively, for the final 50% of the burn-in period iterates, for each parameter group. The same steps above are followed for i = M + 1, . . . , N. Only these sampling period iterates are used for estimation. 4. Simulation study To examine the effectiveness of the MCMC sampling scheme, we conduct a simulation study to evaluate the proposed methodology. All calculations and simulations are performed in Fortran. We simulate 500 replicated time series with a sample size of 2000 from the TV-GARCH models. Two different parameter specifications are employed, as in Model 1 and Model 2 below. Model 1: yt = εt
εt = ζt σt , σ
2 t
εt |Ft −1 ∼ N (0, ht ),
ζt ∼ N (0, 1)
= 0.04 + 0.05ε + 0.85ht −1 + (0.04 + 0.02εt2−1 + 0.06ht −1 ) × (1 + exp[20(t ∗ − 0.35)])−1 I [t ∗ ≤ tc ] + (1 + exp[−40(t ∗ − 0.65)])−1 I [t ∗ > tc ] . 2 t −1
Model 2: yt = εt
εt = ζt σt ,
εt |Ft −1 ∼ N (0, ht ),
ζt ∼ N (0, 1)
σt2 = 0.05 + 0.10εt2−1 + 0.80ht −1 + (0.02 − 0.01εt2−1 + 0.08ht −1 ) × (1 + exp[40(t ∗ − 0.40)])−1 I [t ∗ ≤ tc ] + (1 + exp[−25(t ∗ − 0.70)])−1 I [t ∗ > tc ] . We use a burn-in sample of M = 8000 and a total sample of N = 20 000 iterations. The initial MCMC iterates for the volatility parameters are set as 0.01 and 0.02, and for the thresholds c1 , c2 as 0.1, 0.1, respectively. For the change points, we set h = 0.25, ensuring that these points are separated by at least 500 observations. Thus, c1 will follow Unif(0.25, 0.5) and c2 will follow Unif(c1 + 0.25, 0.75). For the speed of transition parameters γ , based on the plots in Figs. 1(b) and 2, we assume γk ∼ ln N (µγ = ln(30), σγ2 = 0.52 ), so that these priors put low weight on values of γ below 15, large weight on values between 10 and 100, and low weight above 100 (which is essentially a sharp transition). To ensure identification, we set δ = 0.25. It is worth discussing the level of persistence generated by these TV-GARCH models. Fig. 3 presents the first 100 autocorrelations of absolute returns simulated from the TV-GARCH processes from Models 1 and 2 above. It is clear that both specifications can generate behavior that looks like long-range dependence. MCMC methods require starting values for the parameters. These are randomly simulated from the prior distributions. Table 1 displays the true values plus average posterior means, medians, standard deviations (Std.), and the 95% credibility intervals (95% CI) of the 500 replications for the model parameters for Models 1–2. All the posterior means and medians seem not significantly different from their true values and lie in the Bayesian intervals of between 2.5% and 97.5%. Trace and autocorrelation (ACF) plots of the after burn-in iterates are inspected to check on MCMC convergence for some of the simulated data sets. Although not provided, the ACF plots of those iterates show a reasonable reduction of autocorrelations to zero as lag increases. These plots are available upon request. Histograms and kernel density estimates of the 500 replicated estimates for Model 2 are given in Fig. 4. The red lines represent the true values. It is clear that the estimates center around about the true values, though γ1 seems slightly underestimated.
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
7
Fig. 3. The first 100 autocorrelations of absolute returns of the simulated TV-GARCH processes.
Table 1 Simulation results of the asymmetric TV-GARCH(1, 1) models for 500 replications. Model Model 1
Parameter
α0 α1 β1 α0∗ α1∗ β1∗ c1 c2
γ1 γ2 Model 2
α0 α1 β1 α0∗ α1∗ β1∗ c1 c2
γ1 γ2
True
Mean
Median
Std.
2.5%
97.5%
0.04 0.05 0.85 0.04 0.02 0.06 0.35 0.65 20.00 40.00
0.0402 0.0506 0.8549 0.0394 0.0247 0.0507 0.3418 0.6551 25.0456 38.5473
0.0402 0.0500 0.8551 0.0395 0.0249 0.0510 0.3425 0.6546 22.5999 34.8653
0.0078 0.0122 0.0150 0.0072 0.0077 0.0079 0.0208 0.0168 10.7795 17.0496
0.0261 0.0297 0.8273 0.0260 0.0107 0.0363 0.2995 0.6231 12.3380 17.0182
0.0551 0.0743 0.8821 0.0511 0.0378 0.0636 0.3808 0.6893 52.4666 82.2143
0.05 0.10 0.80 0.02 −0.01 0.08 0.40 0.70 40.00 25.00
0.0675 0.1008 0.7531 0.0508 0.0049 0.0916 0.3983 0.7048 33.7187 26.9883
0.0666 0.1004 0.7540 0.0492 0.0045 0.0920 0.3994 0.7056 29.7210 23.6411
0.0178 0.0225 0.0351 0.0279 0.0233 0.0257 0.0225 0.0198 17.0424 13.4221
0.0371 0.0600 0.6859 0.0031 −0.0358 0.0457 0.3506 0.6644 13.6275 12.1350
0.1033 0.1433 0.8146 0.1054 0.0488 0.1361 0.4396 0.7398 77.8478 61.9943
5. Empirical examples We illustrate our methods using daily stock prices from the S&P500 index of the US stock market and the Hang Seng index (HSI) of the Hong Kong stock market. The data are obtained from Yahoo Finance and cover the period 4th January, 2000 to 24th February, 2010 for S&P500, with sample size 2550; and 4th January, 2000 to 5th March, 2010 for HSI, with sample size 2509. For these markets, the daily returns are the logarithmic difference of the daily price index times 100, i.e. rt = log(Pt /Pt −1 ) × 100, where Pt is the closing price on day t. Time series plots of the log returns for S&P500 and HSI are shown in Fig. 5, showing that these series apparently have a higher volatility level starting around August, 2008, consistent with the effects from the global financial crisis (GFC). Table 2 shows summary statistics for these series. Both markets have a positive return median, but mean closer to 0, whilst the average volatility of the Hong Kong market is slightly higher than that of the US market. Both series have mild negative skewness and fat tails: positive excess kurtosis. Models with no transitions (K = 0), one transition (K = 1), two transitions (K = 2) and three transitions (K = 3), as in Eq. (2), are considered for both time series. The same prior settings as for the simulation study were employed, except that
8
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
Fig. 4. Histogram of simulation results for 500 replications from Model 2.
Table 2 Summary statistics of daily stock log returns.
SP500 HSI
N
Mean
Std.
Max.
Min.
Median
Skew.
Ex. Kurt.
2550 2509
−0.011
1.397 1.710
10.957 13.407
−9.470 −13.582
0.049 0.039
−0.108 −0.038
7.625 7.460
0.007
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
9
10 8 6 SP500
4 2 0 –2 –4 –6
01 Jan 02
01 Jan 04
01 Jan 06
01 Jan 08
01 Jan 10
01 Jan 06
01 Jan 08
01 Jan 10
Date
10
HSI
5 0 –5 –10 01 Jan 02
01 Jan 04 Date
Fig. 5. Time series plots of daily log returns. Table 3 Parameter estimates of a TV-GARCH model with K = 1 and a Gaussian distribution. Market S&P500
Parameter
α0 α1 β1 α0∗ α1∗ β1∗ c
γ HSI
α0 α1 β1 α0∗ α1∗ β1∗ c
γ
Mean 0.0827 0.0826 0.8798 −0.0721 −0.0146 0.0431 0.2346 32.6221 0.0720 0.0738 0.9079 −0.0560 −0.0091 0.0207 0.3772 26.6223
Date
23rd May 2002
3rd November 2003
Median
Std.
2.5%
97.5%
0.0990 0.0776 0.8748 −0.0852 −0.0189 0.0524 0.2404 29.2322
0.0233 0.0130 0.0180 0.0229 0.0113 0.0165 0.0607 15.4567
0.0270 0.0565 0.8438 −0.0905 −0.0339 0.0053 0.1163 12.9732
0.1012 0.1054 0.9215 −0.0193 0.0146 0.0571 0.3366 70.6043
0.0769 0.0741 0.9026 −0.0667 −0.0078 0.0253 0.2034 23.4984
0.0449 0.0103 0.0150 0.0515 0.0112 0.0123 0.2987 13.4552
0.0098 0.0568 0.8832 −0.1090 −0.0312 0.0010 0.1064 10.0512
0.1220 0.0984 0.9357 0.0669 0.0112 0.0432 0.9343 61.8610
for the change points we set h1 = 0.1, h2 = 0.05, so c1 follows a Unif(0.1, 0.85) and c2 follows a Unif(c1 + 0.1, 0.95). This allows regime changes to occur reasonably late in each series, as late as the last 6 months of trading days. Again, initial MCMC iterates are randomly simulated from their prior distribution. 20 000 MCMC iterations are simulated with a burn-in period of the first 10 000 iterates. For illustration, parameter estimates for models where K = 1, 2 and 3 are summarized in Tables 3–7 for models with Gaussian or Student-t errors. The simple GARCH model (K = 0) (not shown) shows a high volatility persistence (α1 +β1 ) estimate of about 0.99 in each market. For the model with K = 1, the transition point for the S&P500 is estimated as c = 0.2346 being on the 23rd May, 2002 (day 598), while the speed of transition is estimated as γ = 32.6. For the HSI, the transition is estimated as c = 0.3772, which is on the 3rd November, 2003 (day 946), while the speed of transition is again estimated as γ = 26.6. Later, the forecasting study will reveal that models with K = 1 are not too accurate, and so we will not discuss these further, though these slow transitions seem to be from a medium to low volatility level when observing Fig. 5. Clearly when K = 1 the global financial crisis (GFC) in 2008–2009 is not well captured. The Gaussian error model with K = 2 transitions, for S&P500, the first transition is estimated as c1 = 0.367, 24th September, 2003 (day 935). The second is c2 = 0.722, on the 3rd May, 2007 (day 1842). For HSI, the first transition is estimated c1 = 0.436, occurring on the 10th June, 2004 (day 1093), and the second is c2 = 0.628, occurring on the 26th May, 2006 (day 1577). The parameter estimates indicate that average volatility (e.g. ≈3.2 for S&P500) is much higher at the start and at end of the sample, i.e. in the 1st and last regimes, and lower in the middle period (≈0.39 for S&P500), in both markets, which can be seen fairly clearly in Fig. 5; but recall that there is a smooth transition between these average volatility levels. Parameter estimates for the models where K = 2 with Student-t errors are summarized in Table 5. The estimates are highly similar to those for the Gaussian error models: the transition points for the S&P500 are estimated as
10
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
Table 4 Parameter estimates of an asymmetric TV-GARCH model with K = 2 and a Gaussian distribution. Market S&P500
Parameter
α0 α1 β1 α0∗ α1∗ β1∗ c1 c2
γ1 γ2 Pr(γ1 < γ2 )
HSI
α0 α1 β1 α0∗ α1∗ β1∗ c1 c2
γ1 γ2 Pr(γ1 < γ2 )
Mean 0.0412 0.0393 0.8585 0.0081 0.0536 0.0332 0.3665 0.7222 28.6989 43.3799 0.7972 0.0500 0.0224 0.8778 −0.0030 0.0546 0.0328 0.4356 0.6284 32.6138 32.0161 0.5127
Date
24th September 2003 3rd May 2007
10th June 2004 26th May 2006
Median
Std.
2.5%
97.5%
0.0391 0.0387 0.8650 0.0080 0.0540 0.0248 0.3647 0.7247 26.4556 40.2067
0.0141 0.0152 0.0332 0.0175 0.0198 0.0299 0.0257 0.0198 12.2905 17.2225
0.0214 0.0116 0.7554 −0.0260 0.0144 0.0012 0.3205 0.6738 12.8726 19.5412
0.0859 0.0710 0.9055 0.0429 0.0905 0.1159 0.4208 0.7558 59.7081 86.3337
0.0468 0.0192 0.8865 −0.0018 0.0572 0.0225 0.4425 0.6248 29.6745 29.2436
0.0179 0.0171 0.0319 0.0244 0.0207 0.0313 0.0401 0.0327 15.2518 11.3390
0.0215 0.0007 0.7909 −0.0552 0.0023 0.0008 0.3448 0.5773 12.9584 18.6343
0.0950 0.0652 0.9191 0.0485 0.0877 0.1123 0.4918 0.7013 68.9114 61.8501
Table 5 Parameter estimates of an asymmetric TV-GARCH model with K = 2 and a Student-t distribution. Market S&P500
Parameter
Mean
α0 α1 β1 α0∗ α1∗ β1∗
0.0297 0.0480 0.8726 0.0111 0.0382 0.0289 0.4047 0.7280 22.6482 46.2823 10.5233 0.8905
c1 c2
γ1 γ2 ν Pr(γ1 < γ2 )
HSI
α0 α1 β1 α0∗ α1∗ β1∗ c1 c2
γ1 γ2 ν Pr(γ1 < γ2 )
0.0479 0.0262 0.8854 0.0096 0.0440 0.0337 0.3853 0.6482 24.1517 32.5671 7.9943 0.7521
Date
12th February 2004 23rd May 2007
4th December 2003 4th August 2006
Median
Std.
2.5%
97.5%
0.0288 0.0471 0.8758 0.0117 0.0377 0.0227 0.3980 0.7299 20.0916 42.5537 10.3087
0.0119 0.0180 0.0246 0.0188 0.0225 0.0226 0.0495 0.0255 11.0138 19.0518 1.6007
0.0093 0.0136 0.8197 −0.0291 −0.0070 0.0007 0.3270 0.6719 9.6035 20.4204 8.0726
0.0563 0.0846 0.9108 0.0488 0.0807 0.0802 0.5240 0.7667 49.6596 94.1190 14.1881
0.0448 0.0241 0.8969 0.0093 0.0439 0.0218 0.4029 0.6478 19.0609 29.8028 7.8292
0.0217 0.0160 0.0353 0.0356 0.0212 0.0347 0.0794 0.0414 15.0459 12.7704 1.0612
0.0166 0.0020 0.7855 −0.0659 −0.0068 0.0011 0.2047 0.5747 8.5962 17.1201 6.3818
0.1002 0.0635 0.9335 0.0861 0.0800 0.1361 0.4978 0.7182 62.8184 63.0996 10.4504
c1 = 0.405, 12th February, 2004 (day 1032) and c2 = 0.728, being 23rd May, 2007 (day 1856); while for the HSI, we have c1 = 0.385, 4th December, 2003 (day 966) and c2 = 0.648, being 4th August, 2006 (day 1626). These are very close to those previously estimated under Gaussian errors when K = 2. The estimated logistic transition functions for both markets and K = 1, 2, 3 are shown in Fig. 6, while Fig. 7 shows the level of uncertainty in these estimated curves (for K = 2, 3 only) by randomly choosing 200 MCMC iterates of parameter values and plotting the calculated functions using these. For the S&P500 market especially, the speed of transition from a high volatility to low volatility period is much slower (i.e. γ1 < γ2 for both K = 2, 3) than the 2nd transition from low to high volatility period incorporating the GFC and also slower than the 3rd transition for K = 3. This agrees with the well known empirical phenomenon that financial volatility rises (and prices fall) faster than it falls (and prices rise).
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
11
Table 6 Parameter estimates of an asymmetric TV-GARCH model with K = 3 and a Gaussian distribution. Market S&P500
Parameter
Mean
α0 α1 β1 α0∗ α1∗ β1∗ γ1 γ2 γ3
0.0222 0.0559 0.8910 0.0358 0.0318 0.0097 23.4101 35.1531 28.2259 0.3120 0.7475 0.9296 0.7835 0.3584
c1 c2 c3 Pr(γ1 < γ2 ) Pr(γ1 < γ3 < γ2 ) HSI
α0 α1 β1 α0∗ α1∗ β1∗ γ1 γ2 γ3 c1 c2 c3 Pr(γ1 < γ2 ) Pr(γ1 < γ3 < γ2 )
0.0389 0.0431 0.8921 0.1799 0.0395 0.0057 8.7637 27.2940 24.8792 0.1401 0.7508 0.9445 0.9998 0.5267
Date
Median
Std.
2.5%
97.5%
7th March 2003 3rd August 2007 9th June 2009 Pr(γ3 < γ2 )
0.0220 0.0555 0.8926 0.0352 0.0305 0.0065 21.1327 31.8165 25.7939 0.3195 0.7473 0.9383 0.6462
0.0082 0.0138 0.0156 0.0187 0.0200 0.0090 13.7438 14.7925 11.9745 0.0413 0.0211 0.0199 Pr(γ1 < γ3 )
0.0099 0.0336 0.8587 0.0033 −0.0075 0.0004 6.7238 16.3242 12.5286 0.2056 0.7098 0.8839 0.6517
0.0392 0.0867 0.9188 0.0805 0.0692 0.0325 58.4318 71.8402 57.5421 0.3748 0.7926 0.9494
8th June 2001 21st August 2007 14th August 2009 Pr(γ3 < γ2 )
0.0352 0.0404 0.8920 0.1679 0.0394 0.0069 8.3789 25.5327 23.8295 0.1357 0.7508 0.9473 0.5453
0.0097 0.0100 0.0123 0.0528 0.0174 0.0052 2.0634 9.2078 8.4004 0.0296 0.0164 0.0092 Pr(γ1 < γ3 )
0.0197 0.0317 0.8603 0.0728 0.0097 0.0002 6.1678 15.8072 11.8306 0.1015 0.7186 0.9198 0.9814
0.0637 0.0622 0.9176 0.2459 0.0865 0.0196 14.1023 51.8421 43.1238 0.2130 0.7769 0.9499
Table 7 Parameter estimates of an asymmetric TV-GARCH model with K = 3 and a Student-t distribution. Market S&P500
Parameter
Mean
α0 α1 β1 α0∗ α1∗ β1∗ γ1 γ2 γ3
0.0164 0.0673 0.8976 0.0296 0.0124 0.0111 22.2269 37.3686 29.7611 0.3472 0.7712 0.9292 10.5736 0.8195 0.4338
c1 c2 c3
ν
Pr(γ1 < γ2 ) Pr(γ1 < γ3 < γ2 ) HSI
α0 α1 β1 α0∗ α1∗ β1∗ γ1 γ2 γ3 c1 c2 c3
ν
Pr(γ1 < γ2 ) Pr(γ1 < γ3 < γ2 )
0.0324 0.0532 0.8996 0.1777 0.0204 0.0130 10.4706 29.5253 22.9170 0.1320 0.7645 0.9171 8.2088 0.9974 0.6386
Date
15th July 2003 30th October 2007 5th June 2009 Pr(γ3 < γ2 )
10th May 2001 10th October 2007 7th May 2009 Pr(γ3 < γ2 )
Median
Std.
2.5%
97.5%
0.0162 0.0670 0.8975 0.0270 0.0109 0.0088 18.7552 33.3854 27.0133 0.3482 0.7619 0.9401 10.3777 0.6621
0.0081 0.0166 0.0158 0.0215 0.0233 0.0091 13.1242 15.9629 12.1495 0.0790 0.0442 0.0298 1.5786 Pr(γ1 < γ3 )
0.0022 0.0314 0.8623 −0.0084 −0.0261 0.0010 7.6768 17.9173 14.2428 0.1733 0.7030 0.8395 8.1126 0.7297
0.0325 0.1051 0.9251 0.0766 0.0600 0.0364 57.3633 77.1978 59.1921 0.4959 0.8500 0.9500 14.2509
0.0291 0.0571 0.9027 0.1799 0.0235 0.0076 10.2645 25.5131 21.2809 0.1289 0.7629 0.9149 8.0698 0.6636
0.0154 0.0121 0.0197 0.0301 0.0157 0.0123 2.2287 12.2645 8.0578 0.0208 0.0193 0.0159 1.0534 Pr(γ1 < γ3 )
0.0123 0.0291 0.8460 0.1493 −0.0194 0.0019 6.7428 16.0369 12.3262 0.1017 0.7315 0.8892 6.5400 0.9750
0.0628 0.0844 0.9211 0.2604 0.0533 0.0380 16.2128 56.6438 41.8387 0.1779 0.8055 0.9491 10.5707
12
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
Fig. 6. Estimated smooth transition functions: K = 1, 2 and 3 for the US and Hong Kong markets.
Fig. 7. The estimated logistic transition functions Gl (t ∗ ; γl ; cl ) for US and Hong Kong markets plus transition functions based on 200 randomly selected joint MCMC iterates for γl and cl .
For both markets TV-GARCH models with K = 3 have fairly similar values of c1 , c2 as their corresponding model with K = 2, especially for c2 ; the values for c1 between K = 2, 3 are a bit more spread out, which is likely influenced by the much lower estimates of γ1 when K = 3 compared to K = 2 (see Tables 4 and 6 and Fig. 6). In particular, for HSI, the estimate of γ1 = 10.5 represents a very slow estimated transition that is highlighted by Fig. 6. The TV-GARCH models with K = 3 estimate c3 as close to 0.92. For S&P500 the estimate is c3 = 0.929 which is 8th June, 2009 (day 2370); for HSI the estimate is c3 = 0.917 which is 7th May, 2009 (day 2301). In each case the 1st transition (whose mid-point is c1 ) from a higher to lower volatility regime is estimated to be the slowest (γ1 = 22.23 for S&P500; γ1 = 10.47 for HSI); the 2nd transition (whose mid-point is c2 ) back to a higher volatility regime, which includes the GFC period in both markets, is the fastest (γ2 = 37.37 for S&P500; γ2 = 29.53 for HSI). The final transition (whose mid-point is c3 for K = 3 models only) back to a lower volatility level, in mid-2009, has a comparatively medium transition speed (γ3 = 29.76 for S&P500; γ3 = 22.92 for HSI). For the Student-t error K = 3 model in the S&P500 market the posterior probability in favor of γ1 < γ3 < γ2 is 0.434, i.e. 43.4% of MCMC iterates had the transition parameters in that order. That is the highest probability ordering of these parameters, the next
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
13
Fig. 8. The time series plot of daily S&P500 returns. The estimated break dates are indicated by vertical lines; the single breaks for K = 1 have a dashed (green) line; the two breaks for K = 2 are in solid red and those for K = 3 are in dot-dashed magenta.
Fig. 9. The time series plot of daily HSI returns. The estimated break dates are indicated by vertical lines; the single breaks for K = 1 have a dashed (green) line; the two breaks for K = 2 are in solid red and those for K = 3 are in dot-dashed magenta.
being 0.24 for γ1 < γ2 < γ3 . The posterior probability of γ1 < γ3 < γ2 for HSI is 0.639, the next highest probability being 0.33, again for γ1 < γ2 < γ3 . Time series plots of log returns in each market, with estimated transition points marked, are shown in Figs. 8 and 9. The estimated break dates are indicated by vertical lines; the single breaks for K = 1 have a dashed (green) line; the two breaks for K = 2 are in solid red and those for K = 3 are in dot-dashed magenta. In each market, the second transition point for K = 2 is before, and leads into the start of the GFC period. The estimated break points for the models with K = 1 are closest to the estimates for c1 when K = 2, 3 but are not really close to that for HSI. This may seem curious, since the GFC period clearly represents a large increase in volatility and it might have been expected that a single break point would occur leading into the crisis period. However, there is a large reduction in volatility level visible in each market series just prior to 2004, that the models with K = 2, 3 have captured as a likely break point, so it is not too surprising that the K = 1 model puts the break close to that period too, at least for the SP500 series. Further, the single break model seems mis-specified in this case, with at least two clearly visible and large changes in volatility level during the time period, and since mis-specified models often induce bias into parameter estimation, that may explain the curious K = 1 result for HSI. 5.1. Forecasting study This section considers volatility and Value-at-Risk forecasting for the last 400 trading days in each sample return series. For S&P500 the forecast sample period is 24th July 2008–24th February, 2010. For HSI it is 24th July, 2008 to 5th March, 2010. This period includes the onset and major effects of the GFC in each market.
14
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
Table 8 Forecast accuracy for all 10 models for both markets under three proxies. AT-N ∗
GARCH-t
TVGARCH-t
2
0
1
2
3
2
1.559
1.557
1.588
1.572
1.564
1.547
1.561
0.940
0.931
1.005
0.974
0.952
0.925
0.945
0.953
0.925
0.922
0.968
0.960
0.933
0.918
0.930
1.178
1.176
1.180
1.177
1.195
1.180
1.184
1.152
1.179
0.728
0.726
0.727
0.720
0.753
0.727
0.734
0.684
0.728
0.673
0.672
0.667
0.662
0.690
0.678
0.672
0.641
0.667
Proxy 1
1.787
1.780
1.761
1.755
1.765
1.813
1.800
1.780
1.761
Proxy 2
1.352
1.332
1.306
1.276
1.313
1.389
1.362
1.342
1.313
Proxy 3
1.237
1.221
1.195
1.175
1.200
1.273
1.250
1.225
1.203
Proxy 1
1.301
1.296
1.286
1.292
1.319
1.315
1.311
1.271
1.289
Proxy 2
1.071
1.060
1.046
1.062
1.109
1.104
1.109
0.996
1.065
Proxy 3
0.920
0.912
0.899
0.914
0.955
0.949
0.954
0.860
0.914
Model
GARCH-N
TVGARCH-N
K
0
1
2
Proxy 1
1.569
1.568
Proxy 2
0.973
0.965
Proxy 3
0.952
Proxy 1 Proxy 2 Proxy 3
3
AT-t ∗
S&P500 RMSE
MAD
HSI RMSE
MAD
AT-N (AT-t) denotes a TV-GARCH with K = 2 and the restriction γ1 = γ2 , with Gaussian (Student-t) errors.
During this period one-step ahead forecasts will be generated for volatility, i.e. hn+1 , and 1% Value-at-Risk under a range of models, including: GARCH(1,1) with Gaussian and Student-t errors, TVGARCH models with Gaussian and Student-t errors where K = 1, 2 and 3. Also, as a proxy for Amado and Teräsvirta’s (2008, 2013) b3 model, we consider a TV-GARCH with K = 2 and the restriction γ1 = γ2 , with Gaussian and Student-t errors which are denoted AT-N and AT-t respectively. Models √ are estimated by MCMC, as described above, and one-step-ahead forecasts of conditional standard deviation and 1% VaR, hn+1 , VaRn+1 , are estimated by their respective posterior means over the MCMC iterates in the post-burn-in period, which are re-simulated for each of the 400 days forecast. Table 8 shows forecast accuracy for all models under three volatility proxies: absolute returns and two range-based proxies, in both markets, using root mean square error (RMSE) and mean absolute deviation (MAD) as loss functions. The first range-based proxy is that by Parkinson (1980), the second is based on Alizadeh et al. (2002); these were employed by Lin et al. (2012). The formulas for the three volatility proxies are: 1. Proxy1 = |yt |. √ 2. Proxy2 = Rt / 4 ln(2) (Parkinson, 1980). 3. Proxy3 = exp[ln(Rt ) − 0.43 + 0.292 /2] (Alizadeh et al., 2002). Fig. 10 shows the competing volatility forecasts for some models together with absolute returns for S&P500 during the forecast period. Clearly, for S&P500 the most accurate volatility forecast model is the TV-GARCH with K = 3 and Gaussian errors, whilst the least accurate forecaster is the GARCH-t model, during this 400 day period. In Fig. 10 the TV-GARCH with K = 3 and Gaussian errors is the thick dark line which is usually the minimum forecast for hn+1 for each day. The maximum forecast for volatility is the thin dark green line given by the GARCH-t model. Clearly, since the majority of absolute returns are below the volatility forecasts, and especially after the crisis the TVGARCH models, led by the K = 3 Gaussian model, recovered with the fastest decreasing volatility forecasts, this explains why the TVGARCH models did better than the GARCH models, and especially why K = 3 was the most accurate overall. There is a similar story for HSI, except that under RMSE the most accurate volatility forecast model is the TV-GARCH with K = 3 and Student-t errors. Overall, models with K = 3 outperformed models with K = 2 which outperformed models with K = 1 and K = 0. Clearly the TV-GARCH specifications have added forecasting power over simple GARCH models in these two market return series. It is possible that higher values of K could lead to further increases in forecast accuracy. However, since the identification of higher numbers of regimes, for a fixed sample size, becomes problematic, we leave this exploration for future work on larger and longer samples of data. Table 9 shows the violation rates for each model’s 400 1% VaR forecasts in the forecast sample in each market. Also shown are the p-values from the three standard tests for coverage, UC Kupiec (1995) and independence and coverage: CC Christoffersen (1998) and DQ Engle and Manganelli (2004). For the S&P500 market, all models with Gaussian errors are rejected for having too high a violation rate: thus they significantly under-estimate the 1% VaR level. Whilst the GARCH-t and
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
15
Fig. 10. The forecast volatility for the S&P500 index in the forecast sample under various GARCH and TVGARCH models.
Table 9 1% VaR forecast violation rate for all 10 models for both markets during forecast sample. AT-N ∗
GARCH-t
TVGARCH-t
3
2
0
1
2
3
2
0.025 0.011 0.031 0.000
0.045 0.000 0.000 0.000
0.025 0.011 0.031 0.000
0.015 0.349 0.589 0.006
0.0175 0.173 0.349 0.000
0.629 0.835 0.974
0.0175 0.173 0.349 0.438
0.015 0.349 0.589 0.774
1.000 0.9603 0.9744
0.015 0.349 0.589 0.916
1.000 0.960 0.974
1.000 0.960 0.995
1.000 0.960 0.993
1.000 0.960 0.980
1.000 0.960 0.945
1.000 0.960 0.985
Model
GARCH-N
TVGARCH-N
K
0
1
2
0.025 0.011 0.031 0.000
0.035 0.0001 0.0003 0.000
0.0125 0.629 0.835 0.992
0.015 0.349 0.589 0.947
AT-t ∗
S&P500
αˆ pUC pCC pDQ HSI
αˆ pUC pCC pDQ
AT-N (AT-t) denotes a TV-GARCH with K = 2 and the restriction γ1 = γ2 , with Gaussian (Student-t) errors.
TVGARCH-t with K = 1 are rejected by the DQ test, the TV-GARCH with K = 2 and K = 3 and the model with γ1 = γ2 , all with Student-t errors, cannot be rejected by the DQ test. For HSI, all models do well at forecasting the 1% VaR level, despite the effects of the GFC, and none can be rejected by any test. 6. Conclusions This paper proposes a Bayesian approach to detect smooth structural regime-transitions via estimation of a smoothly time-varying parameter GARCH model. An informative prior distribution is proposed that aids in identification of parameters in each regime, and in inference for the speed of transition parameter. A simulation study shows that the proposed Bayesian approach can provide an appropriate and accurate inference for all parameters of the proposed model. The empirical study illustrates our methods using daily stock returns of the S&P500 index of the US stock market and Hang Seng index (HSI) of the Hong Kong stock market, finding that volatility was typically high in the early part of the century, followed by a slow change to a period of low volatility, as markets rose slowly in value, followed by a subsequent faster change to a high period of volatility during and after the global financial crisis, precipitated by the housing price falls in mid-2006, finally followed by a transition post-crisis back to a lower level of volatility. The results of a forecast study show that the asymmetric TVGARCH model with three breaks, giving rise to this behavior, is clearly preferred for both stock markets regarding volatility forecasting, whilst models with two or three breaks and Student-t errors are comparable for Value-at-Risk forecasting. Future work could consider more regimes and generalize or diversify the types of different within-regime behaviors allowed.
Acknowledgments We thank the Editor, the Associate Editor, and anonymous referees for their insightful and helpful comments and suggestions which improved this paper. Cathy Chen is supported by the grants (NSC 99-2118-M-035-001-MY2 and NSC 101-2118-M-035-006-MY2) from the National Science Council (NSC) of Taiwan. The authors thank Miss Jane-Lin Wu for her initial empirical work on this project that did not make the current version of the paper and for her assistance with Fig. 1.
16
C.W.S. Chen et al. / Computational Statistics and Data Analysis (
)
–
References Alizadeh, S., Brandt, M.W., Diebold, F.X., 2002. Range-based estimation of stochastic volatility models. Journal of Finance 57, 1047–1092. Amado, C., Teräsvirta, T., 2008. Modelling conditional and unconditional heteroskedasticity with smoothly time-varying structure. Working Paper, Stockholm School of Economics. Amado, C., Teräsvirta, T., 2013. Modelling volatility by variance decomposition. Journal of Econometrics 175, 142–153. Baillie, R.T., Bollerslev, T., Mikkelsen, H.O., 1996. Fractionally integrated generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 74, 3–30. Baillie, R.T., Morana, C., 2009. Modeling long memory and structural breaks in conditional variances: an adaptive FIGARCH approach. Journal of Economic Dynamics and Control 33, 1577–1592. Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327. Bollerslev, T., Chou, R.Y., Kroner, K.F., 1992. ARCH modeling in finance; a review of the theory and empirical evidence. Journal of Econometrics 52, 5–59. Chen, C.W.S., Gerlach, R., Lin, M.H., 2010. Falling and explosive, dormant, and rising markets via multiple-regime financial time series models. Applied Stochastic Models in Business and Industry 26, 28–49. Chen, Q., Gerlach, R., Lu, Z., 2012. Bayesian value-at-risk and expected shortfall forecasting via the asymmetric Laplace distribution. Computational Statistics and Data Analysis 56, 3498–3516. Chen, C.W.S., So, M.K.P., 2006. On a threshold heteroscedastic model. International Journal of Forecasting 22, 73–89. Christoffersen, P., 1998. Evaluating interval forecasts. International Economic Review 39, 841–862. Dahlhaus, R., Subba Rao, S., 2006. Statistical inference for time-varying ARCH processes. Annals of Statistics 34, 1075–1114. Engle, R.F., 1982. Autoregressive conditional heteroskedasticity with estimates of United Kingdom inflation. Econometrica 50, 987–1007. Engle, R.F., Manganelli, S., 2004. CAViaR: conditional autoregressive value at risk by regression quantiles. Journal of Business and Economic Statistics 22, 367–381. Gerlach, R.H., Chen, C.W.S., 2008. Bayesian inference and model comparison for asymmetric smooth transition heteroskedastic models. Statistics and Computing 18, 391–408. Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109. Hoogerheide, L.F., Kaashoek, J.F., van Dijk, H.K., 2007. On the shape of posterior densities and credible sets in instrumental variable regression models with reduced rank: an application of flexible sampling methods using neural networks. Journal of Econometrics 139, 154–180. Hudson, B., Gerlach, R., 2008. A Bayesian approach to relaxing parameter restrictions in multivariate GARCH models. TEST 17, 606–627. Kupiec, P.H., 1995. Techniques for verifying the accuracy of risk measurement models. The Journal of Derivatives 3, 73–84. Lin, E.M.H., Chen, C.W.S., Gerlach, R., 2012. Forecasting volatility with asymmetric smooth transition dynamic range models. International Journal of Forecasting 28, 384–399. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equations of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087–1091. Parkinson, M., 1980. The extreme value method for estimating the variance of the rate of return. Journal of Business 53, 61–65.