Moments of Markov switching models

Moments of Markov switching models

Journal of Econometrics 96 (2000) 75}111 Moments of Markov switching modelsq Allan Timmermann Department of Economics UCSD, 9500 Gilman Drive, La Jol...

2MB Sizes 0 Downloads 108 Views

Journal of Econometrics 96 (2000) 75}111

Moments of Markov switching modelsq Allan Timmermann Department of Economics UCSD, 9500 Gilman Drive, La Jolla, CA 92093-0508, USA Received 1 July 1997; received in revised form 1 April 1999

Abstract This paper derives the moments for a range of Markov switching models. We characterize in detail the patterns of volatility, skewness and kurtosis that these models can produce as a function of the transition probabilities and parameters of the underlying state densities entering the switching process. The autocovariance of the level and squares of time series generated by Markov switching processes is also derived and we use these results to shed light on the relationship between volatility clustering, regime switches and structural breaks in time-series models. ( 2000 Elsevier Science S.A. All rights reserved. JEL classixcation: C1 Keywords: Markov switching; Higher-order moments; Mixtures of normals; Volatility clustering

1. Introduction Markov switching models have become increasingly popular in economic studies of industrial production, interest rates, stock prices and unemployment rates. However, so far no study has characterized in any detail the moments that

q Two anonymous referees and an associate editor provided many useful suggestions for improvements of the paper. Discussions with Jim Hamilton and Martin Sola also were very helpful. E-mail address: [email protected] (A. Timmermann)

0304-4076/00/$ - see front matter ( 2000 Elsevier Science S.A. All rights reserved. PII: S 0 3 0 4 - 4 0 7 6 ( 9 9 ) 0 0 0 5 1 - 2

76

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

these models can generate. This is an important omission since Markov switching models are often adopted by reseachers wishing to account for speci"c features of economic time series such as the asymmetry of economic activity over the business cycle (Hamilton, 1989; Neftci, 1984) or the fat tails, volatility clustering and mean reversion in stock prices (Cecchetti et al., 1990; Pagan and Schwert, 1990; Turner et al., 1989) and interest rates (Gray, 1996; Hamilton, 1988). These features translate into the higher-order moments and serial correlation of the data-generating process, so a characterization of the moments and autocorrelation function generated by Markov switching will allow researchers to better understand when to make use of this class of models. The contribution of this paper is to characterize the moments and serial correlation of the level and the squared values of Markov switching processes. Markov switching models belong to a general class of mixture distributions. Econometricians' initial interest in this class of distributions was based on their ability to #exibly approximate general classes of density functions and generate a wider range of values for the skewness and kurtosis than is obtainable through use of a single distribution. Along these lines, Granger and Orr (1972) and Clark (1973) considered time-independent mixtures of normal distributions as a means of modeling non-normally distributed data. These initial models, however, did not capture the time dependence in the conditional variance found in many economic time series, as evidenced by the vast literature on ARCH models that started with Engle (1982). By allowing the mixing probabilities to display time dependence, Markov switching models can be seen as a natural generalisation of the original timeindependent mixture of normals model and we show that this feature enables them to generate a wide range of coe$cients of skewness, kurtosis and serial correlation even when based on a very small number of underlying states. Regime switches in economic time series can be parsimomiously represented by Markov switching models by letting the mean, variance and possibly the dynamics of the series depend on the realization of a "nite number of discrete states. In increasing order of generality we consider in this paper three types of models, each of which has been adopted in applied econometric studies. The basic Markov switching model is y "k t#p te , (1) t s s t where S "1, 2,2, k denotes the unobserved state indicator which follows an t ergodic k-state Markov process and e is a zero-mean random variable which is t identically and independently distributed over time. The number of states, k, is assumed to be "nite. This model has been used in empirical work by Engel and Hamilton (1990). The second Markov switching model allows for state-independent autoregressive dynamics: q y "k t# + / (y !k t~j)#p te . (2) t s j t~j s s t j/1

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

77

Hamilton (1989) used this type of autoregressive model to analyze growth in US GDP. Finally, we also consider a model with state-dependent dynamics in the autoregressive part: q y "k t# + / t~j(y !k t~j)#p te . (3) t s js t~j s s t j/1 For all models the errors are assumed to be independently distributed with respect to all past and future realizations of the state variable, i.e. F(e DS )"F(e ) for all values of i, where F (.) denotes the cumulative density t t`i t function of e .1 The stochastic transition probability matrix P that determines t the evolution in S is given by t Prob(S "j D S "i, S "k,2,)"Prob(S "j D S "i)"p , t`j t t~1 t`1 t ij k + p "1 for all i, (4) ij j/1 so that the states follow a homogenous Markov chain. In practice, if the process is not irreducible and not all states are visited with non-zero probability in the steady state, then the moment analysis can simply be conducted on the subset of states occurring with non-zero stationary probability.2 The higher is p , the ii longer the process is expected to remain in state i. For this reason we shall refer to p as measuring the &persistence' of the mixing of the underlying state ii densities. We "nd in this paper that these persistence parameters are very important in determining the higher order moments of the Markov switching process. Furthermore, once autoregressive parameters are introduced into the process as in the second and third switching models, this gives rise to cross-product terms that enhance the set of third- and fourth-order moments and the patterns in serial correlation and volatility dynamics that these models can generate. Even low-order autoregressive Markov switching processes with a small number of states provide the basis for very #exible econometric models. Our results prove useful in understanding the literature on structural breaks and volatility clustering since the models that have been adopted in this literature are closely related to Markov switching models with infrequent switches between states. 0)p )1, ij

1 Herein lies a key di!erence to ARCH models which is another type of time-dependent mixture process. While Markov switching models mix a "nite number of states with di!erent mean and volatility parameters based on an exogenous state process, ARCH models mix distributions with volatility parameters drawn from an in"nite set of states driven by lagged innovations to the series. 2 An analysis of the unconditional moments starting from the steady state need not assume that the Markov process is irreducible since, if either a single state or a block of states is absorbing, all other states will have zero steady-state probabilities.

78

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

The paper is organized as follows. Section 2 provides results on the moments of the basic Markov switching process without autoregressive terms. Section 3 extends the results to the second model with state-independent, autoregressive terms, while Section 4 analyses the moments of the most general model with state dependence both in the mean, variance and autoregressive coe$cients. Section 5 reports the autocovariance structure generated by the three Markov switching models and Section 6 discusses their relation to structural break and ARCH models.

2. The basic Markov switching model Let p"(n ,2,n )@ be the k-vector of steady-state (ergodic) probabilities that 1 k solve the system of equations P@p"p. These probabilities can be computed as the eigenvector (scaled so that its elements sum to one) associated with the unit eigenvalue of P@. Assuming that the state process started an in"nite number of periods back in time or that it is initialized by a random variable drawn from the stationary distribution, p is the vector of unconditional probabilities applying to the k states. The following proposition provides the moments of the basic Markov switching model. Proposition 1. Suppose the stationary Markov switching process (1), (4) started from its steady state characterized by the set of unconditional probabilities (p). Then the centered moments of the process are given by n k E[(y !k)n]" + n + C pj E[ej ](k !k)n~j, t i n j i t i i/1 j/0 where C "n!/(n!j)!j!. When e is t-distributed with q degrees of freedom, the n j t centered moments are k n E[(y !k)n]" + n + C pja (k !k)n~j, t i n j i j i i/1 j/0 where

G

qj@2b(j`1, q~j ) 2 2 b(1, q ) a" 22 j 0

if q'j and j is even, otherwise

and b(.) is the Beta function. When e follows a normal distribution we have t n k E[(y !k)n]" + n + C pjb (k !k)n~j, t i n j i j i i/1 j/0

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

79

where

G

j@2 < (2h!1) provided j is even and b" h/1 j 0 otherwise. A proof of Proposition 1 is given in the appendix. Since researchers are often particularly interested in the variance, skewness and kurtosis of their data, we characterize these moments more explicitly for the empirically popular mixture of normals model with two states: Corollary 1. Suppose that there are two states and that the increments are Gaussian. Then the unconditional variance (p2), coezcient of skewness (Jb ) and 1 coezcient of excess kurtosis (b ) of the basic Markov switching process (1), (4) are 2 given by p2"(1!n )p2#n p2#(1!n )n (k !k )2, 1 1 2 1 1 1 1 2 E[(y !k)3] t Jb , 1 (E[(y !k)2])3@2 t n (1!n )(k !k )M3(p2!p2)#(1!2n )(k !k2)2N 1 , 2 1 2 1 1 1 2 " 1 ((1!n )p2#n p2#(1!n )n (k !k )2)3@2 1 1 2 1 1 1 1 2 E[(y !k)4]!(3E[(y !k)2])2 a t t b , " , 2 (E[(y !k)2])2 b t where a"3n (1!n )(p2!p2)2#6(k !k )2n (1!n )(2n !1)(p2!p2) 1 2 1 2 1 1 1 1 1 1 2 #n (1!n )(k !k )4(1!6n (1!n )), 1 1 2 1 1 1 b"((1!n )p2#n p2#(1!n )n (k !k )2)2. 1 1 2 1 1 1 1 2 For a proof, see the appendix. It follows from Corollary 1 that, when the innovations are drawn from a Gaussian density, a necessary condition for the Markov switching process to generate skewness is that the means in the states di!er (k Ok ) and that di!erences in the variances of the states alone are 1 2 insu$cient to generate skewness.3 This also suggests that Markov switching

3 This is similar to the result in Bollerslev (1986) that the skewness of standard GARCH models without leverage e!ects is zero.

80

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

models "tted to high-frequency "nancial data whose means are often very small in all states may have trouble replicating successfully the skewness found in these data. It is useful to evaluate these expressions through some numerical examples. To establish a benchmark for some relevant parameter values, we report in Table 1 the maximum likelihood estimates from a simple two-state Markov switching model "tted to monthly excess returns on the stock price index in four countries. In three out of four markets the switching model identi"es an &outlier' state in which excess returns are very volatile with a large negative mean and a &normal' state with low volatility and small positive mean returns. The di!erence between the volatility parameter in the two states can be very large and is around three times higher in state 2 than in state 1 for the UK and US markets. Likewise the mean return di!erentials across the two states are very large in three of the four markets although the mean parameter is imprecisely estimated in the state with high volatility. In Germany the mean return parameters in the two states are roughly of the same magnitude but of opposite signs. Based on this evidence we consider in our numerical examples a variety of combinations of di!erences in the relative size of the mean and volatility parameters in two or more states. Fig. 1 presents a plot of the coe$cients of skewness and excess kurtosis of a two-state Markov switching process with parameters l "(1!1), r "(1 1). s s Thus the skewness and kurtosis in Fig. 1 is entirely driven by di!erences in the mean parameters of the model, while the variances are identical in the two states. To construct the "gure, these parameters were kept "xed and the probabilities of staying in the two states were varied over the grids [0.01, 0.99]. High, positive values of both skewness and excess kurtosis are obtained when the probability of staying in the second state (p ) is around 0.9 and the probability of staying in 22 the "rst state (p ) is not too high. Likewise, a large negative skewness and 11 a large positive excess kurtosis is obtained when p is around 0.9 and p is 11 22 small. A large, negative excess kurtosis is obtained along the line where the two transition probabilities are identical so that the process spends the same time in the two states. This has the e!ect of moving some of the tail probability mass towards the centre and hence lowers the kurtosis. To investigate the impact on the skewness and kurtosis of switching between two states with di!erent means and volatilities, we used a second parameter con"guration, setting l "(1 !3), r "(2 4). This parameter con"guration is s s close to what was found for three of the four models in Table 1. The plot, which is presented in Fig. 2, is very di!erent from Fig. 1. Now a very large excess kurtosis is produced when the probability of staying in the low volatility state (p ) is high and the probability of staying in the high volatility state (p ) 11 22 is low. This case mixes large, but rare, outliers with a low dispersion distribution, thus generating the extreme kurtosis. Since the high volatility state also has a low mean, a high value of p and a low value of p generate large 11 22

Fig. 1. Skewness and kurtosis of the two-state Markov switching model: e!ect of di!erent mean parameters. Based on the parameter values k "(1,!1), p "(1,1), the "gure plots the coe$cients of skewness and excess kurtosis of a two-state Markov switching model with Gaussian innovations s s as a function of the probability of staying in state 1 (P ) and the probability of staying in state 2 (P ). 11 22

A. Timmermann / Journal of Econometrics 96 (2000) 75}111 81

!19.7 34.6 76.8 12.8

State 2 Mean Volatility Stayer probability % Steady-state probability % 31.3 6.3 14.0

4.2 1.1 2.4

!5.3 28.2 91.3 24.3

5.2 13.7 97.2 75.7 13.1 3.3 5.7

3.5 0.9 1.9

Standard error

!24.9 48.6 85.3 9.2

7.8 15.7 98.5 90.8

Estimate

UK

31.5 7.1 7.5

3.2 0.8 0.9

Standard error

!39.8 31.7 71.2 5.6

7.8 13.2 98.3 94.4

Estimate

US

35.1 8.2 19.6

2.9 0.7 1.7

Standard error

Note: These results are based on Morgan Stanley Capital International index returns minus the risk-free T-bill rate. In each country state 1 is the state with the highest mean return.

9.4 17.6 96.6 87.2

State 1 Mean Volatility Stayer probability % Steady-state probability %

Estimate

Estimate

Standard error

Germany

France

Table 1 Regime switching model "tted to excess stock returns (monthly returns, 1970}1997). This table reports the result from estimating the following return model r "k #p ) u , where k is the mean, p is the standard deviation, s is the state at time t which follows a two-state Markov switching process, and u(t) t st st t t is independently and identically distributed normal variable. The mean and volatility parameters are reported in annual percentage terms

82 A. Timmermann / Journal of Econometrics 96 (2000) 75}111

Fig. 2. Skewness and kurtosis of the two-state Markov switching model: e!ect of di!erent mean and volatility parameters. Based on the parameter values k "(1,!3), p "(2,4), the "gure plots the coe$cients of skewness and excess kurtosis of a two-state Markov switching model with Gaussian s s innovations as a function of the probability of staying in state 1 (P ) and the probability of staying in state 2 (P ). 11 22

A. Timmermann / Journal of Econometrics 96 (2000) 75}111 83

84

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

negative skewness because of the presence in the distribution of large negative outliers.4 Researchers using Markov switching models sometimes identify states with similar means but very di!erent volatilities. To shed light on the sort of moments this situation gives rise to, Fig. 3 plots the excess kurtosis against combinations of variance parameters in two states that vary between 0.1 and 10. The mean parameters are identical in the two states and P " 0.97, P " 0.75, so state 11 22 1 is highly persistent while state 2 is not, matching the estimates in Table 1. Since the mean parameters in the two states are the same, the skewness equals zero throughout. However, a very substantial excess kurtosis is generated when p2 is 1 very small and p2 is very high. In this case the process spends most of the time in 2 the low volatility regime but occasionally shifts to a high volatility state, thus increasing the kurtosis. These "ndings are also representative of mixture processes with more than two states. Suppose that the asssumptions of Proposition 1 hold and that the increments are normally distributed. We refer to such a process as MSI. Then it follows from the proposition that the higher-order moments have the convenient representation E[(y !k)n]"p@M@ m , n n t where m is the (n#1)-vector (1 C 2 C 2 C 1)@, and n n 1 n j n n~1

C

(k !k)n 1 0

p2(k !k)n~2 1 1 M" 0 n 3p4(k !k)n~4 1 1 . b pn n 1

. . . . . . . . . . . . . .

(k !k)n k 0 p2(k !k)n~2 k k 0

(5)

D

3p4(k !k)n~4 k k . b pn n k

is the (n#1)xk matrix of moments E[pjtej(k t!k)n~j] for each of the normal s t s state densities entering the mixture distribution. It is clear from (5) and the form of M that the moment expressions are symmetric with respect to the underlying n 4 Clark (1973) showed that time-independent mixtures of normals can generate skewness and kurtosis beyond that of a single Gaussian distribution. Corollary 1 readily allows us to measure the e!ect on these moments of Markovian dependence in the state probabilities. We can illustrate this point using the grid points and parameter values underlying Fig. 2. Under time-independent switching p "1!p and the coe$cient of excess kurtosis ranges from !0.02 to 0.22 while with 22 11 state-dependent switching the range is much wider at !0.02 to 2.47.

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

85

Fig. 3. Kurtosis of the two-state Markov switching model: e!ect of di!erent volatility parameters. Based on the parameter values k "(1,1), and letting the state variances go from 0.1 to 10, the "gure s plots the coe$cient of excess kurtosis of a two-state Markov switching model with Gaussian innovations. The probability of staying in state 1 (P ) and the probability of staying in state 2 (P ) 11 22 were set equal to 0.97 and 0.75, respectively.

state densities. To illustrate this point, Fig. 4 plots the coe$cients of skewness and kurtosis as a function of p and p for the 3-state case with p " 11 22 13 p "0.1, p "p "p "1/3, l "(1 !3 0), r "(2 4 3). Hence a third 23 31 32 33 s s &intermediate' state is mixed with the two states from Fig. 2. Naturally, the same high values of skewness and kurtosis are not reached in Figs. 2 and 4 since p and p are now constrained to be less than 0.90, but the shapes of the two 11 22 "gures in this range of p , p -values are very similar. 11 22

Fig. 4. Skewness and kurtosis of the three-state Markov switching model: e!ect of di!erent mean and volatility parameters. Based on the parameter values k "(1,!3,0), p "(2,4,3), the "gure plots the coe$cients of skewness and excess kurtosis of a three-state Markov switching model with Gaussian s s innovations as a function of the probability of staying in state 1 (P ) and the probability of staying in state 2 (P ). The transition probabilities for state 11 22 3 all equal 1/3, while P "P "0.10. 13 23

86 A. Timmermann / Journal of Econometrics 96 (2000) 75}111

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

87

3. The simple autoregressive Markov switching model Before dealing with the general case of Eq. (2) that involves q lags of (y !k t), t s we "rst consider the more tractable "rst-order autoregressive model with normally distributed increments. This case demonstrates the e!ect of additional terms entering the expression for the moments that researchers pay most attention to. Notice from (2) that when q"1, the Markov switching process becomes y "k t#/ (y !k t~1)#p te , t s 1 t~1 s s t which we shall refer to as MSII. Upon substituting backward we get

(6)

= y !k t" + /i p t~ie . (7) 1 s t~i t s i/0 From the assumption that E[e DS ]"E[e ]"0 it still holds that t~i t t~i E[y !k t~1DS ]"0. Thus the "rst moment of y is unchanged and t~1 s t~1 t E[y ]"p@ E[y DS ]"p@l , where E[y DS ] is the k-vector whose ith element t t t s t t consists of E[y DS "i]. t t To state concisely the moments for this process, let i be a k-vector of ones, while I is the k-dimensional identity matrix, ( is the element-by-element k multiplication operator and B is the (k]k) matrix of transition probabilities for the &time-reversed' Markov chain that moves back in time: k + b "1. (8) ij j/1 Since Prob(S "jWS "i)"Prob(S "iDS "j)Prob(S "j)" t t`1 t`1 t t Prob(S "jDS "i)Prob(S "i), the &backward' transition probability t t`1 t`1 matrix B is related to the &forward' transition probabilities as follows:5 Prob(S "jDS "i)"b , 0)b )1, t t`1 ij ij

AB

n j , b "p (9) ij ji n i assuming n '0, and b "0 if j belongs to a group of absorbing states while i ij i does not. Again, if this condition is not satis"ed, the analysis can be performed on the sub-set of states for which the steady-state probabilities exceed zero. Using this notation, Proposition 2 presents expressions for the second to fourth centered moments of the "rst-order autoregressive Markov switching model:6 5 If b "p for all i and j, then the process is said to be time reversible. Since the diagonal elements ij ij of B and P are identical, the probability of remaining in a given state is always the same regardless of whether time moves forward or backward. Furthermore, in the case with two states it is easy to verify from the de"nitions of n and n that the Markov chain will be time reversible, although this 1 2 does not hold generally for processes with several states. 6 The proposition requires that (I !/ B) is invertible. Since D/ D(1 this will automatically be k 1 1 satis"ed for all transition probability matrices since B has a single eigenvalue equal to unity and its remaining eigenvalues are less than one.

88

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

Proposition 2. Suppose y follows the xrst-order autoregressive Gaussian Markov t switching process y "k t#/ (y !k t~1)#p te , D/ D(1, e &IIN(0,1), 1 t~1 s s t 1 t t s and assume that the process started from its steady-state distribution. Then the variance, skewness and kurtosis of y are given by t r2 , E[(y !k)2]"p@ (l !ki)((l !ki)# s s s t 1!/2 1 E[(y !k)3]"p@((l !ki)((l !ki)((l !ki)) t s s s #3/2p@((B ) (I !/2B)~1r2)((l !ki)) s s 1 1 k #3p@((l !ki)(r2), s s E[(y !k)4]"p@((l !ki)((l !ki)((l !ki)((l !ki)) t s s s s #6p@((l !ki)((l !ki)( r2) s s s #p@(I !/4B)~1(3r4#6/2(B ) (I !/2B)~1r2)(r2) s s 1 1 k s 1 k #6/2p@((B(I !/2B)~1r2)((l !ki)((l !ki)). s s s 1 1 k

A

B

Comparing Proposition 2 to the results for the case with normal increments stated in Proposition 1, we see that the unconditional variance is increased by the persistence in the y process. This is no di!erent from the usual autoregrest sive model without a Markov switching e!ect. Turning next to the skewness, a new term re#ecting the expectation of the cross-product term (k !k) st (y !k t~1)2 enters into this expression. This term will be negative when low t~1 s mean states tend to follow high variance states and will otherwise be positive. To demonstrate this e!ect, Fig. 5 plots skewness for the same parameter values as in Fig. 2 and with / " 0.9. A comparison of Figs. 2 and 5 shows that introducing 1 an autoregressive term into the model produces a very di!erent pro"le of the skewness. Large negative skew is still obtained when p and p are both high 11 22 so that the process does not switch very often and the negative term E[(k t!k)p2t] dominates. For lower values of p and p the process switches s s 11 22 more often and the positive term E[(k t!k)(y !k t~1)2] #attens the skewness s t~1 s pro"le compared to Fig. 2. Three components change in the expression for the kurtosis once an autoregressive lag is introduced in the Markov switching process. First, the contribution to kurtosis from p4t e4 is scaled by a factor (I !/4B)~1. Furthermore, terms 1 s t k re#ecting the expectation of (y !k t~1)2 times (k t!k)2 or p2t also contribute t~1 s s s to kurtosis. Hence, if high volatility states are followed by states with either high volatility or a mean parameter far away from the unconditional mean, this will tend to create fat tails and increase kurtosis. The highest kurtosis now occurs

Fig. 5. Skewness and kurtosis of the two-state autoregressive Markov switching model: e!ect of autoregressive parameter. Based on the parameter values k "(1,!3), p "(2, 4), / "0.9 the "gure plots the coe$cients of skewness and excess kurtosis of a two-state autoregressive Markov switching s s 1 model with Gaussian innovations as a function of the probability of staying in state 1 (P ) and the probability of staying in state 2 (P ). 11 22

A. Timmermann / Journal of Econometrics 96 (2000) 75}111 89

90

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

when both p and p are high but not too close to one so that rare jumps 11 22 resulting from mean shifts also add to the tail mass. Proposition 2 states results for a "rst-order autoregressive process whose moments are analytically tractable to derive. Similar analytical expressions for higher-order autoregressive Markov switching processes contain the same types of components and are not very insightful to derive as they quickly become intractable. However, we still need to have a method for obtaining the moments of speci"c higher order processes that researchers have in mind. We demonstrate how to do this in the case of the second moment of such processes and note that the skewness and kurtosis can also be derived using similar techniques. In the general case with an arbitrary, but "nite, number of autoregressive lags, the second moment of the Markov process can be derived in one of two ways. First, from (2) we have q E[(y !k)2]"E[(k t!k)2]# + /2E[(y !k t~j)2]#E[p2te2] t s j t~j s s t j/1 q q #2 + + / / E[(y !k t~i)(y !k t~j)]. (10) i j t~i s t~j s i;j j/1 Applying the steady-state probabilities to the "rst and third terms we get q E[(y !k)2]"p@((l !ki)((l !ki))# + /2E[(y !k t~j)2]#p@r2 s t s s j t~j s j/1 q q # 2 + + / / E[(y !k t~i)(y !k t~j)]. (11) i j t~i s t~j s i;j j/1 This equation involves variances and covariances of the process relative to the state-speci"c means. To derive an expression for these terms, we exploit the associated Yule}Walker equations. Multiplying (2) through by (y !k t~1), t~1 s 2,(yt~q!kst~q), we get a system of equations q E[(y !k t)(y !k t~m)]" + / E[(y !k t~i)(y !k t~m)], s i t~i s t~m s t s t~m i/1 m"1,2,q.

(12)

An extra term p@r2 appears on the right-hand side of (12) when m"0. This gives s (q#1) equations that can be used to jointly determine E[(y !k t)2],2, t s E[(y !k t)(y !k t~q)]. These terms can then be substituted into (11) to get the t s t s unconditional variance of the process around k. An alternative, and as far as we know, new method for obtaining second moments of Markov switching processes combines the companion form of the autoregressive model with the technique of expanding the state space of a Markov process proposed by Cox and Miller (1965). This is very convenient

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

91

from a computational point of view and does not require solving a set of q#1 equations in the autocovariances. We can always write (2) as

C D C D C DC D C DC D kt / / s 1 2 k t~1 y 1 0 s t~1 k t~2 1 y t~2 " s # ) ) y t

)

)

)

y t~q`1

k t~q`1 s

q~1

1

1

)

0 0

)

0

0

#

0

0

y

!k t~1 t~1 s y !k t~2 t~2 s y !k t~3 t~3 s ) )

1

pt 0 s 0 0

/ q

0

0

)

/

0

y !k t~q t~q s

e

t e t~1 e t~2 , )

0

(13)

)

0

e t~q`1

or, in matrix form,

z " l Ht #W(z !l Ht~1)#+ H e , t s t~1 s st t

(14)

where l Ht now lies in the kH"k ) q dimensional state space formed as the s Cartesian product S ]S ]S of the original q state spaces. t t~1 t~q`1 Taking expectations of z around the unconditional mean vector we get the t q]q covariance matrix E[(z !ki)(z !ki)@]"E[(l Ht !ki)(l Ht !ki)@] t t s s # WE[(z !l Ht~1)(z !l Ht~1)@]W@ t~1 s t~1 s

C

D

C

D

# E + H e e@+ @ H , st t t st

(15)

where we used that z !l Ht~1"+= Wi+ Ht~i~1e is uncorrelated with i/0 s t~1 s t~i~1 (k Ht !ki) and + Ht e . Notice also from (14) that s s t E[(z !l Ht )(z !l Ht )@]"WE[(z !l Ht~1)(z !l Ht~1)@]W@ t s t s t~1 s t~1 s # E + H e e@+ @ H , s t t t st

(16)

where E[+ Ht e e@+@ Ht ] is a q]q matrix whose "rst element is E[p2t] with all other s s tt s elements being zero. We refer to this matrix as <. It then follows that, provided

92

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

z is a stationary process so E[(z !l Ht )(z !l Ht )@]"E[(z !l Ht~1) t t s t s t~1 s (z !l Ht~1)@], the q2-vector of second moments centered around the state t~1 s means is vec(E[(z !l Ht )(z !l Ht )@])"(W?W) ) vec(E[(z !l Ht )(z !l Ht )@]) s t s t s t s t

AC

#vec E + H e e@+ @ H st t t st so that

DB

,

(17)

vec(E[(z !l Ht )(z !l Ht )@])"(I 2!(W?W))~1vec(<). (18) t s t s q Substituting this back into (15) we get the q2-vector of unconditional second moments: vec(E[(z !ki)(z !ki)@])"vec(E[(l Ht !ki)(l Ht !ki)@]) t t s s #(I 2#(W?W)(I 2!(W?W))~1)vec(<). q q

(19)

This expression is very convenient for computation of the variance of higherorder autoregressive Markov switching models. It also allows calculation of autocovariances from the cross-product terms. For example, E[(y !k) t (y !k)] will be the second element of the vector in (19). t~1 4. State-dependent autoregressive dynamics Again we initially consider the "rst-order autoregressive model and then demonstrate how to generalize the results to the less tractable, but similar, general case. The underlying model (which we shall refer to as MSIII) is y "k t#/ t~1(y !k t~1)#p te , D/ t~1D(1, e &IIN(0,1). t s 1s t~1 s s t s t Backwards substitution gives

A

B

(20)

l = (21) y !k t" + < / t~jp t~j e #p te , 1s s t~l s t t s l/1 j/1 so that E[y !k t]"0 and E[y ]"p@ls"k. To state parsimoniously the mot t s ments of this process, it is convenient to de"ne the (k]k) diagonal matrix of state-speci"c autoregressive coe$cients

C

U"

/ 0 11 0 / )

12

) )

D

,

/ 1k

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

93

where / is the "rst-order autoregressive coe$cient in state r. Proposition 1r 3 provides the second to fourth moments of this process: Proposition 3. Consider the xrst-order Gaussian Markov switching process with a state-dependent autoregressive term y "k t#/ t~1(y !k t~1)#p te , e &IIN(0,1), D/ t~1D(1, 1s t~1 s s t t s t s and suppose that the process started from its steady state. The variance, skewness and kurtosis of y are given by t E[(y !k)2]"p@((l !ki)((l !ki)#U2(I !BU2)~1r2#r2), s s t s s k E[(y !k)3]"p@((l !ki)((l !ki)((l !ki)) t s s s #3p@((BU2(I !BU2)~1r2)((l !ki)) s s k #3p@((l !ki)(r2), s s E[(y !k)4]"p@((l !ki)((l !ki)((l !ki)((l !ki)) t s s s s #6p@((BU2(I !BU2)~1r2)((l !ki)((l !ki)) s s s k #6p@((l !ki)((l !ki)(r2) s s s #p@U4(I !BU4)~1(3r4#6(BU2(I !BU2)~1r2)(r2) s s s k k #6p@((BU2(I !BU2)~1r2)(r2)#3p@r4. s s s k To demonstrate the e!ect on the coe$cients of skewness and kurtosis of having di!erent autoregressive parameters in the di!erent states, Fig. 6 keeps the parameters from Fig. 5 but sets / "0.99 and / "0.81, so that, compared 11 12 with the choice of / in Fig. 5, the serial correlation is now 0.09 higher in state 1 1 and 0.09 lower in state 2. The skewness is now positive for high values of p and p and is otherwise negative.7 The kurtosis still peaks when p and 11 22 11 p are large but is now #at on a larger part of the grid. Letting the auto22 regressive coe$cients di!er between the two states can signi"cantly change the skewness and kurtosis pro"les and a comparison of Figs. 2, 5 and 6 illustrate the complexity of the interaction between the Markov switching parameters and the autoregressive parameters in determining the higher-order moments.

7 The positive skewness for high values of the transition probabilities is caused by the term E[(k t!k)/2 t~1(y !k t~1)2] which is positive when a switch occurs from state 2 to state 1. This s 1s t~1 s term is higher in Fig. 6 (/ t~1"0.99 for s "2) than in Fig. 5 (/ "0.90). t~1 1 1s

Fig. 6. Skewness and kurtosis of the two-state autoregressive Markov switching model: e!ect of di!erent autoregressive parameters. Based on the parameter values k "(1,!3), p "(2, 4), / "0.99, / "0.81, the "gure plots the coe$cients of skewness and excess kurtosis of a two-state s s 11 12 autoregressive Markov switching model with Gaussian innovations as a function of the probability of staying in state 1 (P ) and the probability of staying 11 in state 2 (P ). 22

94 A. Timmermann / Journal of Econometrics 96 (2000) 75}111

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

95

Second moment results for higher-order autoregressive processes can again be obtained through the companion form of y in the expanded state space: t

C D C D C DC D C DC D kt / t~1 / t~2 s 1s 2s k t~1 1 0 y s t~1 1 k t~2 y t~2 " s # ) ) y t

)

)

y t~q`1

k t~q`1 s

)

1

)

0 0

)

0

0

#

0

0

y !k t~1 t~1 s y !k t~2 t~2 s y !k t~3 t~3 s )

1

)

1

pt 0 s 0 0

/ t~q qs

0

0

)

)

0

e t e t~1 e t~2 )

0

y !k t~q t~q s

,

(22)

)

0

e t~q`1

or, in matrix form,

z "l Ht #W Ht~1(z !l Ht~1)#+ H e . s t~1 s t s st t

(23)

To analyze this case we "rst derive an expression for E[(z !l Ht )(z !l Ht )@]. t s t s Let vec(E[(z !ik Ht )2 D SH"i]) be short-hand notation for the q2-vector of ext t s pectations of (z !ik Ht )(z !ik Ht )@ conditional on SH"i, let < , t s t t i s vec(E[+ Ht e e@+@ Ht D SH"i]) be a q2 vector of conditional expectations of the s tt s t residual term and let W be the q]q matrix W Ht~1 conditional on SH "i. As in i s t~1 Section 3, the dimension of the expanded state space is kH"kq. Stacking the moment conditions resulting from (23), we get the kHq2-vector

C

vec(E[(z !ik Ht )2 D SH"1]) t t s vec(E[(z !ik Ht )2 D SH"2]) t s t 2

D

vec(E[(z !ik Ht )2 D SH"kH]) t t s

C

(W ?W )vec(E[(z !ik Ht~1)2 D SH "1]) t~1 1 1 t~1 s (W ?W )vec(E[(z !ik Ht~1)2 D SH "2]) 2 2 t~1 s t~1 "(BH?I 2) q 2 (W H?W H)vec(E[(z !ik Ht~1)2 D SH "kH]) t~1 k k t~1 s

D

96

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

CD

< 1 < # 2 2
(24)

C

DCD

vec(E[(z !l Ht~1)2 D SH "1]) < t~1 t~1 s 1 vec(E[(z !l Ht~1)2 D SH "2]) < t~1 s t~1 "(BH?I 2)WK # 2 , q 2 2 vec(E[(z !l Ht~1)2 D SH "kH])
C

W ?W 1 1

WK "

0

D

0 W ?W 2 2

.

2

W H?W H k k

Under stationarity of the y process we thus have t

C

D

CD

vec(E[(z !ik Ht )2 D SH"1]) t s t vec(E[(z !ik Ht )2 D SH"2]) t t s "(I !(BH?I 2)WK )~1 m q 2 vec(E[(z !ik Ht )2 D SH"kH]) t s t

<

1

<

2 , 2
(25)

where m"kHq2. The q2-vector of unconditional second moments centered around the state means can now be extracted from (25) by pre-multiplying by a (q2]q2kH) matrix K given by K"((pH)@?I 2) q

C

n

2 n2 0 0 n 0 2 n2 1 " 2 2 2 2 2 0 2 n1 0 2 so that

1

0

0 0

2 nkH 2 0

2

0

2

0

2 2 2 2 n 0 2 2 2

D

,

2 nH k

E[(z !l Ht )(z !l Ht )@]"K(I !(BH?I 2)WK )~1(< < 2 < H)@. t s t s m q 1 2 k

(26)

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

97

Finally, the unconditional variance of z follows from (23) as t vec(E[(z !ki)(z !ki)@])"vec(E[(l Ht !ki)(l Ht !ki)@]) s t t s

CD

< 1 < # K ) WK (I !(BH?I 2)WK )~1 2 #vec(<), m q . <

kH

(27)

where <"E[+ Ht e e@+@ Ht ]. Again these expressions are easy to implement, do not s tt s require solving a set of Yule}Walker equations, and should prove useful to researchers wanting to characterize the moments of this quite complex class of Markov switching processes with state-dependent autoregressive parameters.

5. Autocorrelations of the Markov switching processes To fully understand the dynamic patterns that the Markov switching models generate, we need to characterize their autocovariance functions. Just how strong the serial correlation is for a given set of parameters can be computed from Proposition 4:8 Proposition 4. The autocovariance functions of the stationary Gaussian Markov switching processes MSI, MSII and MSIII starting from their steady state are given by E[(y !k)(y !k)] t`n t "p@((Bn(l !ki))((l !ki)) s s

(MSI)

p@((Bn(l !ki))((l !ki))#/n p@(I !/2B)~1r2 s s 1 k 1 s

(MSII)

p@((Bn(l !ki))((l !ki))#p@C (I !BU2)~1r2 s s s n k

(MSIII)

8 We remind the reader that the three Markov switching models are MSI: y "k t#p te , e &IIN(0,1), t s s t t MSII: y "k t#/ (y !k t~1)#p te , e &IIN(0,1), D/ D(1, t s 1 t~1 s s t t 1 MSIII: y "k t#/ t~1(y !k t~1)#p te , e &IIN(0,1), D/ t~1D(1. t s 1s t~1 s s t t 1s

98

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

where C is the (k]k) diagonal matrix n

C

C" n

c 1,n 0

0 c . 2,n . . c

D

k,n

with diagonal elements c "E[(
(28)

where v "((1!n ),!(1!n ),!n , n )@. Of particular interest is the "rst1 1 1 1 1 order autocovariance for the levels of the process which is given by (k !k )2n (1!n )(p #p !1). This expression has an intuitive interpreta1 2 1 1 11 22 tion since the autocovariance will be positive provided the presence of the process in the two states is persistent (p #p '1), otherwise a negative 11 22 autocovariance will result. Without a Markov switching e!ect (p "1!p ) 11 22 there will be no "rst order autocorrelation in the process. Many studies "nd signi"cant serial correlation in the squared values of economic time series. The success of ARCH models in empirical work can be explained by the fact that these models can generate such autocorrelation patterns. We show in the following Proposition that Markov switching models can also give rise to autocorrelation in the squares of a time series: Proposition 5. The autocovariance function of the squared values of the stationary Markov switching process starting from its steady state is given by E[(y2!E[y2])(y2 !E[y2 ])] t~n t~n t t " p@((Bn(r2#l2))((l2#r2))!(E[y2])2 t s s s s

(MSI)

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

"p@((Bn(I !/2B)~1r2)(l2)#p@((Bnl2)(l2) s s s s 1 k # /2np@(I !/4B)~1(3r4#6/2((B(I !/2B)~1r2)(r2)) s s 1 1 k s 1 1 k # /2np@(((I !/2B)~1r2)(l2) s s 1 1 k n # p@ + /2(n~i)(Bi(I !/2B)~1r2#Bil2)(r2 s s s 1 1 k i/1 # 4/n p@(Bn(((I !/2B)~1r2)(l ))(l )!(E[y2])2 1 k 1 s s s t " p@((Bn(I !BU2)~1r2)(l2)#p@((Bnl2)(l2) s s s s k # p@B (I !BU4)~1(3r4#6(BU2(I !BU2)~1r2)(r2) s s s k 2,n k # p@((B (I !BU2)~1r2)(l2) 2,n k s s n #p@ + (B (I !BU2)~1r2#B l2) s i s i k i/1 # 4p@(C (((I !BU2)~1r2)(l )!(E[y2])2, n k s s t

A

A

99

B

(MSII)

B

(MSIII)

where B is a diagonal k]k matrix whose rth diagonal element is given by 2,n B [r, r]"E[(
100

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

Intuition is provided by the autocovariance of y2 in the basic two-state model t which is given by E[(y2!E[y2])(y2 !E[y2 ])] t t t~n t~n (30) "n (1!n )(k2!k2#p2!p2)2vec(Pn)@v . 1 1 2 1 1 1 2 Hence the "rst-order autocovariance is E[(y2!E[y2]) t t (y2 !E[y2 ])]"n (1!n )(k2!k2#p2!p2)2(p #p !1). Again this 1 11 22 2 1 t~1 1 1 2 t~1 expression has the intuitive interpretation that there will be positive autocorrelation in y2 if the state mixing process is persistent (p #p '1), otherwise the t 11 22 squared values of the process will be negatively autocorrelated. If p "1!p , 11 22 there will be no "rst-order serial correlation in the squared values of the series. If there is no switching between the two regimes, i.e. n (1!n )"0, there will be 1 1 no autocorrelation in the squared values of the series. Fig. 7 plots the "rst-order autocorrelation in the squared levels of a Markov switching process for the case where l "(1, 1) and r "(2, 4). Since the two s s mean parameters are identical across states, there is no serial correlation in the level of the series. Re#ecting the di!erent volatility parameters, quite high serial correlation in the squares of the series is obtained, however, when the persistence of the two states is high.

6. Discussion of results Our results on the moments of the regime switching models show that the Markovian dependency in the mixture probabilities signi"cantly expands the scope for asymmetry and fat tails that can be generated by time-independent mixture models. This may help to explain the relative success that these models have had in applications to economic time series that clearly display such features. Our theoretical results are also closely related to the literature on structural breaks and persistence. Structural breaks are usually thought of as one-o! events that introduce non-stationarities in the data. In the context of a Markov switching model a structural break can be modeled as follows. Consider the following partitioned matrix of transition probabilities:

C

P

D

P 2 , P P 3 4 where, say, P is a k ]k matrix, P is a k ]k matrix and k #k "k. If some 1 1 1 4 2 2 1 2 element of P is nonzero while the elements of P all equal zero, then once a state 2 3 ordered k #1 or higher is reached then the process can never return to a state 1 ordered k or lower, so this event represents a break from a switching model that 1 mixes one block of states to another model mixing a di!erent block of states. If P"

1

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

101

Fig. 7. Autocorrelation in the squared levels of a two-state Markov switching model. Based on the parameter values k "(1, 1), p "(2, 4), the "gure plots the "rst-order autocorrelation of the squared s s levels of a two-state Markov switching model with Gaussian innovations as a function of the probability of staying in state 1 (P ) and the probability of staying in state 2 (P ). 11 22

the values of P and P are close to, but not all equal to, zero then there exists 2 3 a stationary &hyper model' that includes both blocks of states, but it will be very di$cult in any "nite sample to distinguish between a structural break and infrequently occurring regime switches. The parameter values for which the Markov switching models seem best capable of generating volatility clustering and autocorrelation, i.e. those values representing infrequent mixing of regimes with quite di!erent mean and variance parameters, are also those for which the switching models most closely resemble structural break models. Thus, our "ndings are clearly related to the literature on structural breaks and persistence in the "rst and second moments of time series. Perron (1989) considers a single exogenous shift in the level or slope of the trend function of a time series while Perron (1990) investigates the case of an exogenous break in the mean level. This latter case closely corresponds to a Markov switching model with k"2, p "p , and k Ok . In both 1 2 1 2

102

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

cases such a break in the time series is found to lead to higher rates of non-rejection of the null of a unit root since it introduces persistence in the series centered around the sample mean. As the size of the mean shift gets larger, the estimated autoregressive parameter in a model without serial correlation but with a single shift in the mean gets closer to one.9 We carried out an experiment to show that these studies are closely related to Proposition 4. Setting p2"p2"1, p "p "0.99, k "1, and varying k we 2 11 22 1 2 1 obtained the following population moments for the "rst-order autocorrelation of the process:

Value of k 2 2 3 4

First-order autocorrelation MSI 0.196 0.495 0.688

MSII (/ "0.9) 1 0.904 0.916 0.930

Even when there is no autocorrelation in any of the states (as for MSI), a shift in the mean can still produce very substantial persistence around the unconditional mean, and the e!ect is an increasing function of the size of the shift. As the shift gets large, and consistent with Perron (1990), the autocorrelation coe$cient for this case with rare shift gets closer to one. The proposition that persistence in the squared level of a Markov switching process can be the result of rare, large shifts in the unconditional variance is also related to earlier empirical "ndings. Lamoureux and Lastrapes (1990) study second moments and "nd that the existence of deterministic structural shifts in the unconditional variance, when not accounted for, will increase the persistence of squared residuals.10 They show that the resulting upward bias in GARCH estimates of persistence of variance can be quite substantial. For a number of individual stock return series once a structural break in the intercept term in the conditional variance equation is accounted for, the estimated persistence declines from an average of 0.98 to an average of 0.82. Indeed, Lamoureux and Lastrapes propose to use Markov switching models as a way of handling misspeci"ation problems due to occasional shifts in the conditional variance. Likewise, Diebold (1986) suggests that the very high persistence in the conditional variance observed in GARCH models may re#ect a failure to include

9 In related work, Hendry and Neale (1991) use Monte Carlo methods to quantify the e!ect that the introduction of shifts in the intercept of an autoregressive process has on the loss in the power of standard unit root tests. 10 Yet another possibility is that a misspeci"ed mean leads to overrejection of the null of no conditional heteroskedasticiy as recently reported by Lumsdaine and Ng (1997).

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

103

dummies for shifts in the intercept in the variance equation caused by exogenous shocks such as monetary policy regime changes.11

Appendix This appendix contains the proofs of the propositions and the corollary. Proof of Proposition 1. From the law of iterated expectations we have k E[(y !k)n]"E[E[(y !k)nDS ]]" + n E[(k #p e !k)n] t t t i i it i/1 k n " + n + C pj(k !k)n~jE[ej ], t i n j i i i/1 j/0

(A.1)

where we used Newton's binomial formula and the assumption that the steady-state probabilities apply. The expressions for the cases where e t follows a t-distribution or a normal distribution are based on the momentgenerating functions for these distributions. For example, from the momentgenerating function of the normal distribution we have that E[ej ]"0, if j is odd t and j@2 E[ej]" < (2h!1),b , t j h/1 otherwise. Substituting this expression into (A.1) we get the result. Proof of Corollary 1. First consider the variance. From the law of iterated expectations we have E[(y !k)2]"E[E[(y !k)2DS ]] t t t "n E[(k #p e !k)2]#(1!n )E[(k #p e !k)2] 1 1 1t 1 2 2t "(1!n )p2#n p2#(1!n )(k !k)2#n (k !k)2 1 2 1 1 1 1 1 2 "(1!n )p2#n p2#n (1!n )(k !k )2. 1 1 2 1 1 1 1 2

(A.2)

11 In principle, ARCH and Markov switching e!ects could be combined to produce a highly #exible, nonlinear mixture model as proposed by Hamilton and Susmel (1994). Our results can explain why such an extension may be necessary because the Markov switching model only appears to be able to generate limited persistence in the squared values of a time series.

104

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

We next compute the skewness and kurtosis of this model: E[(y !k)3]"E[E[(y !k)3DS ]] t t t "(1!n )E[(k #p e !k)3]#n E[(k #p e !k)3] 1 2 2t 1 1 1t "n (1!n )(k !k )M3(p2!p2)#(1!2n )(k !k )2N, 2 1 2 1 1 1 1 1 2 (A.3) and hence the coe$cient of skewness is given by E[(y !k)3] n (1!n )(k !k )M3(p2!p2)#(1!2n )(k !k )2N t 2 1 2 1 . 1 1 1 2 " 1 (E[(y !k)2])3@2 ((1!n )p2#n p2#n (1!n )(k !k )2)3@2 1 1 2 1 1 1 t 1 2 (A.4) To compute the coe$cient of kurtosis we proceed as follows: E[(y !k)4]"E[E[(y !k)4DS ]] t t t "(1!n )E[(k #p e !k)4]#n E[(k #p e !k)4] 1 2 2t 1 1 1t "(1!n )M3p4#(k !k)4#6p2(k !k)2N 2 2 2 2 1 #n M3p4#(k !k)4#6p2(k !k)2N. 1 1 1 1 1

(A.5)

Simplifying the expression by using that k"n k #(1!n )k , we get the 1 1 1 2 coe$cient of excess kurtosis E[(y !k)4]!3(E[(y !k)2])2 a t t , , (E[(y !k)2])2 b t a"3n (1!n )(p2!p2)2#6(k !k )2n (1!n )(2n !1)(p2!p2) 1 2 1 2 1 1 1 1 1 1 2 #n (1!n )(k !k )4(1!6n (1!n )), 1 1 2 1 1 1 b"((1!n )p2#n p2#n (1!n )(k !k )2)2. 1 1 2 1 1 1 1 2

(A.6)

Proof of Proposition 2. Squaring y around its unconditional mean and taking t expectations we have E[(y !k)2]"E[(k t!k)2#/2(y !k t~1)2#p2te2] t s 1 t~1 s s t #2/ Cov(k t!k, y !k t~1)#2Cov(k t!k, p te ) 1 s t~1 s s s t #2/ Cov(y !k t~1, p te ). 1 t~1 s s t

(A.7)

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

105

From (7), the assumption that e is iid and the independence at all leads and lags t between e and S , it follows that all three covariance terms are equal to zero so t t that E[(y !k)2]"p@E[(y !ki)((y !ki)DS ] t t t t

A

B

= "p@(l !ki)((l !ki)#p@/2 + /2iPir2 #p@r2 s 1 1 s s s i/0 "p@((l !ki)((l !ki)#/2(1!/2)~1r2#r2) s s 1 1 s s r2 "p@ (l !ki)((l !ki)# s , (A.8) s s 1!/2 1 where i is a k-vector of ones, ( is the element-by-element multiplication operator and E[(y !ki)((y !ki)DS ] stacks the vector E[(y !k)( t t t t (y !k)DS "i] for i"1,2,k. The third equality uses the property of the t t steady-state probabilities that p@Pi"p@. The third centered moment can be derived using the result that

A

B

E[(y !k)3]"E[(k t!k)3]#3/2E[(k t!k)(y !k t~1)2] t s 1 s t~1 s (A.9) #3E[(k t!k)p2te2], s t s which can be veri"ed by expanding Eq. (6) around k. To derive an expression for the second term notice that k E[(y !k t)2DS "i]"/2 + E[(y !k t~1)2DS "jWS "i] ' t s 1 t~1 s t t~1 t j/1 Prob(S "jDS "i)#p2 i t~1 t k "/2 + E[(y !k t~1)2DS "j] ) b #p2, (A.10) 1 t~1 s i t~1 ij j/1 since the expectation of (y !k t~1)2 conditional on S is the same as its t~1 s t~1 expectation conditional on S and S . Stacking the equations resulting from t~1 t setting i"1,2,k, we have E[(y !l t)2D S ]"/2 ) B ) E[(y !l t~1)2D S ]#r2, 1 t~1 s s t s t t~1 so that, under stationarity of the process,

(A.11)

(A.12) !l t~1)2D S ]"(I !/2B)~1 ) r2, s t~1 s 1 t~1 k where I is the k-dimensional identity matrix. Using this in (A.11) and noting k that E[(y !l t~1)2D S ]"B(I !/2B)~1 ) r2, we get t~1 s s 1 t k E[(y !k)3]"p@((l !ki)((l !ki)((l !ki)) t s s s #3/2p@((B(I !/2B)~1r2)((l !ki)) s s 1 1 k (A.13) #3p@((l !ki)(r2). s s E[(y

106

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

The fourth centered moment is given by E[(y !k)4]"E[(k t!k)4]#6/2E[(k t!k)2(y !k t~1)2] 1 s t s t~1 s #6E[(k t!k)2p2te2]#/4E[(y !k t~1)4] 1 t~1 s s t s #6/2E[(y !k t~1)2p2te2]#E[p4te4]. s t 1 t~1 s s t Most of these terms are simple to evaluate, but notice that

(A.14)

E[(y !l t)4D S ]"/4B ) E[(y !l t~1)4D S ] t s 1 t~1 s t t~1 #3r4#6/2E[(y !l t~1)2r2tD S ]. 1 t~1 s s s t From (6) we also have !l t~1)2r2tD S ]"(B ) E[(y !l t~1)2D S ])(r2 t~1 s s t t~1 s t~1 s "(B ) (I !/2B)~1r2)(r2. k 1 s s Thus, provided the process is stationary, we get E[(y

E[(y !l t)4D S ]"(I !/4B)~1(3r4#6/2(B ) (I !/2B)~1r2)(r2). s s 1 1 k s 1 t s t k (A.15) Using this together with the equation E[(y !k)4]"p@E[(y !ki)4DS ], (A.14) t t t becomes E[(y !k)4]"p@((l !ki)((l !ki)((l !ki)((l !ki)) t s s s s #6/2p@((B(I !/2B)~1r2)((l !ki)((l !ki)) s s s 1 1 k #6p@((l !ki)((l !ki)(r2) s s s #/4p@(I !/4B)~1(3r4#6/2(B ) (I !/2B)~1r2)(r2) s s 1 1 k s 1 1 k (A.16) #6/2p@((B ) (I !/2B)~1r2)(r2)#3p@r4. s s s 1 1 k Collecting terms we get the expression in Proposition 2. Proof of Proposition 3. The unconditional variance of the process underlying MSIII can be evaluated from the expression E[(y !k)2]"E[(k !k)2#/2 (y !k )2#p2e2] t st 1st~1 t~1 st~1 st t #2Cov(k t!k, p te )#2Cov(k t!k, / t~1(y !k t~1)) s t s 1s t~1 s s (A.17) #2Cov(/ t~1(y !k t~1), p te ). t~1 s s t 1s Again the covariance terms are zero so only the second term has changed from (A.7). To evaluate this term we condition on S to get the set of equations t (A.18) E[(y !l t)2D S ]"B ) U2E[(y !l t~1)2D S ]#r2, s t t~1 s t~1 t s

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

107

where U is the k]k diagonal matrix de"ned just before Proposition 3. Under stationarity of y we have t (A.19) E[(y !l t~1)2D S ]"(I !BU2)~1r2, s t~1 k t~1 s from which the variance of y easily follows from (A.17): t E[(y !k)2]"p@((l !ki)((l !ki))#p@(U2(I !BU2)~1r2#r2). s s t s s k (A.20) To derive the third moment, notice that the only new term relative to (A.9) is 3E[(k t!k)/2 t~1(y !k t~1)2]. Conditioning on S and applying the steadys 1s t~1 s t state probabilities, we get E[(k t!k)/2 t~1(y !k t~1)2] 1s t~1 s s (A.21) "p@((B ) U2(I !BU2)~1r2)((l !ki)). s s k Inserting this into (A.9) and using the similar results for MSII, the skewness result follows immediately. The fourth moment of MSIII can be evaluated from the expression E[(y !k)4]"E[(k t!k)4]#6E[(k t!k)2/2 t~1(y !k t~1)2] t s s 1s t~1 s #6E[(k t!k)2p2te2]#E[/4 t~1(y !k t~1)4] s t 1s t~1 s s (A.22) #6E[/2 t~1(y !k t~1)2p2te2]#E[p4te4]. t~1 s s t s t 1s The second, fourth and "fth terms in this expression have changed compared with (A.14). We derive these expressions as follows: E[(k t!k)2/2 t~1(y !k t~1)2] 1s t~1 s s "p@((BU2(I !BU2)~1r2)((l !ki)((l !ki)), k s s s E[/2 t~1(y !k t~1)2p2te2]"p@((BU2(I !BU2)~1r2)(r2). t~1 s s t k s s 1s Finally, conditioning on S we get the system of equations t E[(y !l t)4D S ]"BU4 ) E[(y !l t~1)4D S ]#3r4 s t t~1 s t~1 t s #6(BU2(I !BU2)~1r2)(r2. s s k Hence E[(y !l t)4D S ]"(I !BU4)~1(3r4#6(BU2(I !BU2)~1r2)(r2), s s t s s k t k

(A.23) (A.24)

(A.25)

(A.26)

and the fourth term entering (A.22) is p@U4 times this expression. Inserting these terms into (A.22), we get the kurtosis expression stated in Proposition 3.

108

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

Proof of Proposition 4. We "rst compute E[(y !k)(y !k)] for MSI and then t t~1 demonstrate how to generalize the result. Notice that E[(y !k)(y !k)] t t~1 "E[((k t!k)#p te )((k t !k)#p t~1e )] s t~1 s s t s ~1 (A.27) "E[(k t!k)(k t !k)]"p@((B(l !ki))((l !ki)). s s s ~1 s When we consider the nth-order autocovariance of y , the only part of the t calculation that changes is the transition probabilities, i.e. instead of using B we use Bn. To compute the autocovariances of MSII, notice that E[(y !k)(y !k)]"E[((k t!k)#/ (y !k t~1)#p te ) t t~1 s 1 t~1 s s t ((y !k t~1)#(k t~1!k))]. (A.28) t~1 s s Obviously, E[p te (y !k t~1)]"E[p te (k t~1!k)]"0, and, from the indepens s t s s t t~1 dence of e and S , t~i t E[((k t!k)(y !k t~1))]"E[(k t~1!k)/ (y !k t~1)]"0. (A.29) s t~1 s s 1 t~1 s This leaves us with the terms E[(k t!k)(k t~1!k)#/ (y !k t~1)2], and s s 1 t~1 s hence E[(y !k)(y !k)]"p@((B(l !ki))((l !ki)) t t~1 s s (A.30) #/ p@(I !/2B)~1r2. s 1 1 k This can easily be generalized to obtain the nth-order autocovariance by noting that n y !k"k t`n!k#/n (y !k t)# + /n~ip t`ie , t`n s 1 t s 1 s t`i i/1 so that the autocovariance becomes

(A.31)

E[(y !k)(y !k)]"E[(k t`n!k)(k t!k)#/n (y !k t)2] 1 t s t`n t s s "p@((Bn(l !ki))((l !ki))#/n p@(I !/2B)~1r2. s 1 1 k s s (A.32) To obtain the "rst-order autocovariance of the MSIII process, we need to evaluate E[(k t!k)(k t~1!k)#/ t~1(y !k t~1)2]: s s 1s t~1 s E[(y !k)(y !k)]"p@((B(l !ki))((l !ki))#p@U(I !BU2)~1r2. s t t~1 s s k (A.33) Second- and higher-order autocorrelations do not follow as easily from this expression as (A.32) follows from (A.31), however, due to the state dependence in

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

109

/. To evaluate the nth-order autocovariance, we need to compute E[(y !k)(y !k)]"E[(k t`n!k)(k t!k)] t`n t s s n~1 #E < / t`i (y !k t)2 . 1s t s i/0 This expression can be written as stated in Proposition 4.

CA

B

D

(A.34)

Proof of Proposition 5. Again we "rst derive the result for the simple MS model (MSI). Note that for this process y2"(k2t#2k tp te #p2te2), t s s s t s t y2 "(y !k t~1)2#2k t~1(y !k t~1)#k2t~1, t~1 t~1 s s s t~1 s so their expected cross-product is

(A.35) (A.36)

E[y2y2 ]"E[k2t(y !k t~1)2#k2tk2t~1 s t~1 t t~1 s s s #p2t e2(y !k t~1)2#p2te2k2t~1] s t t~1 s s t s "p@((Br2)(l2)#p@((Bl2)(l2)#p@((Br2)(r2) s s s s s s (A.37) #p@((Bl2)(r2), s s where we used that, for the simple Markov switching model, E[(y !l )2D S ]"r2. The expression in Proposition 5 follows by collecting s t s t terms. Once again, the general expression for E[(y2!E[y2])(y2 !E[y2 ])] t~n t~n t t can be derived by substituting Bn for B. For MSII, the "rst-order autocovariance of y can be based on t y2!E[y2]"k2#/2(y !k )2#p2e2#2k / (y !k ) st t t 1 t~1 st~1 st t st 1 t~1 st~1 (A.38) #2k tp te #2/ (y !k t~1)p te !E[y2], t s s t 1 t~1 s s t and y2 !E[y2 ]"(y !k t~1)2#2k t~1(y !k t~1)#k2t~1!E[y2 ]. t~1 t~1 s t~1 s t~1 s t~1 s (A.39) Using that the expected value of cross-product terms in which e enters linearly is t zero we get E[(y2!E[y2])(y2 !E[y2 ])] t~1 t~1 t t "E[k2t(y !k t~1)2#k2t k2t~1 s t~1 s s s #/2(y !k t~1)4#/2(y !k t~1)2k2t~1 1 t~1 s 1 t~1 s s #p2te2(y !k t~1)2#p2te2k2t~1#4/ k tk t~1(y !k t~1)2] s t t~1 s s t s 1 s s t~1 s (A.40) !(E[y2])2. t

110

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

A similar expression holds for MSIII, substituting / t~1 in place of / . To get 1s 1 the nth-order autocovariance we use that, for MSII,

A

B

2 n y2 "k2t`n#/2n(y !k t)2# + /(n~i)p t`ie s 1 s t`i t`n 1 t s i/1 #2k t`n/n (y !k t)#f (e ,2,e ), s t`n t`1 s 1 t

(A.41)

where f ( ) ) is a linear function of its arguments and hence will be uncorrelated with terms dated period t or earlier. For MSIII, the similar expression is more complex:

A A

B B

A A

B

B

2 n~1 n n~1 y2 "k2t`n# < /2 t`i (y !k t)2# + < / t`j p t`ie s t`n 1s 1s t s s t`i i/0 i/1 j/i n~1 #2k t`n < / t`i (y !k t)#g(e ,2,e ), (A.42) 1s s t s t`n t`1 i/0 where again g( ) ) is a linear function of its arguments and we de"ne
(A.43)

A

B

n #((y !k t)2#k2t) + /2(n~i)p2t`i 1 s s t s i/1 #4/n k t`nk t(y !k t)2]!(E[y2])2 1 s s t t s

n~1 "E[k2t`n(y !k t)2#k2t`nk2t #(y !k t)4 < /2 t`i 1s s s s t s t s i/0 n~1 #(y !k t)2k2t < /2 t`i 1s t s s i/0 n n~1 #((y !k t)2#k2t) + < /2 t`j p2t`i 1s s s t s i/1 j/i n~1 #4k t`nk t(y !k t)2 < / t`i]!(E[y2])2 1s t s s t s i/0 which can be written as stated in Proposition 5.

A

(MSII)

B

(MSIII)

A. Timmermann / Journal of Econometrics 96 (2000) 75}111

111

References Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307}327. Cecchetti, S.G., Lam, P., Mark, N.C., 1990. Mean reversion in equilibrium asset prices. American Economic Review 80, 398}418. Clark, P.C., 1973. A subordinated stochastic process model with "nite variance for speculative prices. Econometrica 41, 135}155. Cox, D.R., Miller, H.D., 1965. The Theory of Stochastic Processes. Chapman & Hall, New York. Diebold, F., 1986. Comment on modelling the persistence of conditional variance. Econometric Reviews 5, 51}56. Engel, C., Hamilton, J.D., 1990. Long swings in the dollar: are they in the data and do markets know it? American Economic Review 89, 689}713. Engle, R.F., 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of U.K. in#ation. Econometrica 50, 987}1008. Granger, C.W.J., Orr, D., 1972. In"nite variance and research strategy in time series analysis. Journal of American Statistical Association 67, 275}285. Gray, S.F., 1996. Modeling the conditional distribution of interest rates as a regime switching process. Journal of Financial Economics 42, 27}62. Hamilton, J.D., 1988. Rational expectations econometric analysis of changes in regime. An investigation of the term structure of interest rates. Journal of Economic Dynamics and Control 12, 385}423. Hamilton, J.D., 1989. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57, 357}84. Hamilton, J.D., Susmel, R., 1994. Autoregressive conditional heteroskedasticity and changes in regime. Journal of Econometrics 64, 307}333. Hendry, D., Neale, A.J., 1991. A Monte Carlo study of the e!ects of structural breaks on tests for unit roots. In Huckl, P. Westlund, A. (Eds.), Economic Structural Change. Analysis and Forecasting. Springer, Berlin, pp. 95}119. Lamoureux, C.G., Lastrapes, W.D., 1990. Persistence in variance, structural change and the GARCH model. Journal of Business and Economic Statistics 8, 225}234. Lumsdaine, R.L., Ng, S., 1997. Testing for ARCH in the presence of a possibly misspeci"ed conditional mean. Working paper, Princeton University and Boston College. Neftci, S.N., 1984. Are economic time series asymmetric over the business cycle? Journal of Political Economy 92, 307}328. Pagan, A., Schwert, G.W., 1990. Alternative models for conditional stock volatility. Journal of Econometrics 45, 267}90. Perron, P., 1989. The great crash, the oil price shock and the unit root hypothesis. Econometrica 57, 1361}1401. Perron, P., 1990. Testing for a unit root in a time series with a changing mean. Journal of Business and Economic Statistics 8, 153}162. Turner, C.M., Startz, R., Nelson, C.R., 1989. A Markov model of heteroskedasticity, risk, and learning in the stock market. Journal of Financial Economics 25, 3}22.