Available online at www.sciencedirect.com
Mathematics and Computers in Simulation 93 (2013) 19–28
Original article
On a constrained mixture vector autoregressive model C.S. Wong ∗ Department of Finance, The Chinese University of Hong Kong, Hong Kong, China Received 26 July 2012; received in revised form 20 January 2013; accepted 3 May 2013 Available online 13 May 2013
Abstract

A mixture vector autoregressive (MVAR) model has recently been introduced to the literature. Although this model is a promising candidate for nonlinear multiple time series modeling, the high dimensionality of its parameters and the lack of a method for computing the standard errors of estimates limit its application to real data. The contribution of this paper is threefold. First, a form of parameter constraints is introduced, together with an efficient EM algorithm for estimation. Second, an accurate method for computing standard errors is presented for the model with and without parameter constraints. Lastly, a hypothesis-testing approach based on likelihood ratio tests is proposed, which aids in the identification of unnecessary parameters and leads to greater estimation efficiency. A case study employing U.S. Treasury constant maturity rates illustrates the applicability of the mixture vector autoregressive model with parameter constraints and the importance of using a reliable method to compute standard errors.

© 2013 IMACS. Published by Elsevier B.V. All rights reserved.

Keywords: EM algorithm; Interest rate; Likelihood ratio test; Non-linear time-series model
1. Introduction

Over the past decade, several new time series models that generalize the idea of the finite mixture distribution [11] to practical nonlinear time series model building have been proposed. Le et al. [6] introduced Gaussian mixture transition distribution (GMTD) models to capture the flat stretches, bursts and outliers in univariate time series. The mixture autoregressive (MAR) models introduced by Wong and Li [15] can be considered a generalization of the GMTD models. MAR models have several advantages over other nonlinear time series models. First, the mixture of a nonstationary autoregressive (AR) component with a stationary AR component can result in a stationary process. Second, the conditional distributions of the time series given the past history change over time. Lastly, MAR models can capture conditional heteroscedasticity [4], which is common in financial time series. Several extensions of these models have been proposed, e.g., including conditional heteroscedasticity in the components [17], allowing the mixing proportions to change over time [16], and replacing the Gaussian assumption with the Student t-assumption [13,14]. Fong et al. [5] recently generalized MAR models to mixture vector autoregressive (MVAR) models for multiple time series modeling. These models consist of a mixture of K vector autoregressive (VAR) components. These researchers gave the first- and second-order stationarity conditions and proposed an EM algorithm for estimation. As in the univariate case, a stationary MVAR model may consist of a mixture of stationary and nonstationary VAR components.
E-mail address: [email protected]

http://dx.doi.org/10.1016/j.matcom.2013.05.001
Although the MVAR model is a promising candidate for nonlinear multiple time series modeling, the methodology of Fong et al. [5] has two limitations. First, they presented no method for computing or approximating the standard errors of the estimates. Approximating the standard errors by numerical procedures can be seriously inaccurate owing to irregularities of the log-likelihood surface encountered by the EM algorithm, such as local maxima and flat regions. Reliance on numerical procedures to obtain standard errors may thus lead to false conclusions. Second, as a multivariate model, the MVAR model contains a large number of parameters in its specification, so estimation efficiency for short time series may be poor. It is therefore important to consider procedures that eliminate unnecessary parameters and to estimate the models under parameter constraints. In this paper, we introduce a form of parameter constraints for MVAR models similar to that presented for VAR models in Lütkepohl [8]. We call MVAR models with and without parameter constraints constrained and full MVAR models, respectively. We derive an EM algorithm for estimation in the constrained case. The standard errors of the estimates for both the full and constrained MVAR models are computed using Louis' [7] method. In addition to the criterion-based model selection approach, we also propose a hypothesis-testing approach based on likelihood ratio tests for identifying unnecessary parameters in MVAR models. Extensive simulation studies are conducted to verify the applicability of our approaches, especially for short time series. Our proposed methodology is applied to the same dataset considered by Fong et al. [5] and Tsay [12], which demonstrates the usefulness of constrained estimation and the importance of approximating the standard errors accurately in MVAR modeling.
2. The mixture vector autoregressive model

The K-component MVAR model for the n-dimensional vector Y_t = (Y_{t,1}, ..., Y_{t,n})', denoted by MVAR(n, K; p_1, ..., p_K), is defined as

F(Y_t \mid \mathcal{F}_{t-1}) = \sum_{k=1}^{K} \alpha_k N\{\Sigma_k^{-1/2}(Y_t - \Phi_{k0} - \Phi_{k1}Y_{t-1} - \cdots - \Phi_{kp_k}Y_{t-p_k})\}.   (1)
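As a concrete illustration of Eq. (1) as a data-generating mechanism, the sketch below simulates one path from the two-component bivariate model used as "Model I" in the simulation study of Section 4. The seeding and burn-in choices are ours, not part of the model.

```python
import numpy as np

# Minimal simulation sketch for an MVAR(2, 2; 1, 1) process as defined in
# Eq. (1): at each t a component k is drawn with probability alpha_k and
# Y_t = Phi_k0 + Phi_k1 Y_{t-1} + eps_t with eps_t ~ N(0, Sigma_k).
# Parameter values are those of "Model I" in Section 4.1.
rng = np.random.default_rng(0)

alpha = np.array([0.3, 0.7])
phi0 = [np.array([0.1, 0.3]), np.array([-0.2, 0.5])]
phi1 = [np.array([[-0.7, 0.6], [0.0, -0.9]]),   # upper-triangular Phi_11
        np.array([[0.2, -0.8], [0.0, 0.6]])]    # upper-triangular Phi_21
sigma = [np.array([[0.9, 0.7], [0.7, 1.6]]),
         np.array([[1.2, 0.8], [0.8, 0.7]])]

def simulate_mvar(T, burn_in=100):
    y = np.zeros((T + burn_in, 2))
    for t in range(1, T + burn_in):
        k = rng.choice(2, p=alpha)               # draw the component Z_t
        mean = phi0[k] + phi1[k] @ y[t - 1]
        y[t] = rng.multivariate_normal(mean, sigma[k])
    return y[burn_in:]                           # discard the burn-in

path = simulate_mvar(200)
print(path.shape)   # (200, 2)
```

Note that the switching component index Z_t is drawn independently at each time point, so the conditional distribution of Y_t given the past is a mixture even though each simulated observation comes from a single VAR component.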
Here, F(Y_t | F_{t-1}) is the conditional cumulative distribution function of Y_t given past information; F_t is the information set up to time t; N(·) is the cumulative distribution function of the standard multivariate Gaussian distribution; Φ_k0 is an n-dimensional vector; the Φ_ki (i = 1, ..., p_k) are n × n autoregressive coefficient matrices; Σ_k is the n × n positive definite variance–covariance matrix of the kth component; α_1 + ⋯ + α_K = 1; and α_k > 0 (k = 1, ..., K). Let p = max(p_1, ..., p_K).

We write S_{i1,i2} for the (i1, i2) element of a matrix S and e_i for the ith element of a vector e. We also write I_n for the n × n identity matrix and |S| for the determinant of a matrix S. The vectorization operator, vec, creates a column vector from a matrix A by stacking the columns of A one after another, and the Kronecker product A ⊗ B is defined as (A_{ij}B). The half-vectorization operator, vech, stacks the columns of the lower triangle of a matrix. Every n × n symmetric matrix S can be recovered from vech S, and there exists a unique n² × n(n + 1)/2 duplication matrix D_n such that vec S = D_n vech S for any such S. As an example, if S = (S_{ij}) is a 2 × 2 symmetric matrix, then vech S = (S_11, S_21, S_22)' and vec S = (S_11, S_21, S_12, S_22)'. The transpose of a vector or matrix is denoted by a prime. See Magnus and Neudecker [9] for the relationships between the various vector and matrix operators.

2.1. Estimation of full model

Suppose that the observations Y_1, ..., Y_T are generated from the MVAR model (1), referred to as the full MVAR model hereafter. Define α = (α_1, ..., α_{K−1})', θ_k = vec Θ_k where Θ_k = (Φ_k0, Φ_k1, ..., Φ_kp_k) (k = 1, ..., K), ω_k = vech Σ_k (k = 1, ..., K), and the vector of parameters θ = (α', θ_1', ω_1', ..., θ_K', ω_K')'. Let Z_t = (Z_{t,1}, ..., Z_{t,K})' (t = 1, ..., T) be the unobservable random vectors that indicate from which component Y_t evolves, where Z_{t,k} = 1 if Y_t evolves from the kth component and Z_{t,k} = 0 otherwise. Given Z_t, the log-likelihood of the complete data (Y_t, Z_t), conditional on the first p observations, is

\ell = \sum_{t=p+1}^{T}\sum_{k=1}^{K} Z_{t,k}\log\alpha_k - \frac{1}{2}\sum_{t=p+1}^{T}\sum_{k=1}^{K} Z_{t,k}\log|\Sigma_k| - \frac{1}{2}\sum_{t=p+1}^{T}\sum_{k=1}^{K} Z_{t,k}\,e_{kt}'\Sigma_k^{-1}e_{kt},   (2)
where e_{kt} = Y_t − Θ_k X_{kt} and X_{kt} = (1, Y_{t−1}', ..., Y_{t−p_k}')'. The first derivatives of the log-likelihood function (2) with respect to θ are

\frac{\partial\ell}{\partial\alpha_k} = \sum_{t=p+1}^{T}\left(\frac{Z_{t,k}}{\alpha_k} - \frac{Z_{t,K}}{\alpha_K}\right) \quad (k = 1, \ldots, K-1),   (3)

\frac{\partial\ell}{\partial\Theta_k} = \sum_{t=p+1}^{T} Z_{t,k}\,\Sigma_k^{-1}e_{kt}X_{kt}' \quad (k = 1, \ldots, K),   (4)

\frac{\partial\ell}{\partial\Sigma_k} = \frac{1}{2}\sum_{t=p+1}^{T} Z_{t,k}\left(\Sigma_k^{-1}e_{kt}e_{kt}'\Sigma_k^{-1} - \Sigma_k^{-1}\right) \quad (k = 1, \ldots, K).   (5)
The iterative EM algorithm for estimation can be found in Fong et al. [5].

2.2. Estimation of constrained model

A form of linear constraints can be imposed on the parameters of the MVAR model (1). We use a form similar to the constraints imposed on the VAR model by Lütkepohl [8]. For k = 1, ..., K, let m_k be the number of free parameters in Θ_k (the intercept and autoregressive coefficient matrices) of component k, and let θ_k^{(r)} be the m_k-vector of free parameters. The constraints under consideration are

\theta_k = \operatorname{vec}\Theta_k = R_k\theta_k^{(r)} + \operatorname{vec} r_k \quad (k = 1, \ldots, K),   (6)

where the R_k are known n(np_k + 1) × m_k matrices and the r_k are known constant matrices with the same dimensions as Θ_k. We refer to the MVAR model (1) together with the constraints (6) as the constrained MVAR model. For each set of constraints specified by m_k, R_k, and r_k (k = 1, ..., K), there exist a {(K − 1) + \sum_{k=1}^{K} n(np_k + 1) + Kn(n + 1)/2} × {(K − 1) + \sum_{k=1}^{K} m_k + Kn(n + 1)/2} matrix R and a {(K − 1) + \sum_{k=1}^{K} n(np_k + 1) + Kn(n + 1)/2}-vector r such that

\theta = R\theta^{(r)} + r,   (7)

where θ^{(r)} = (α', θ_1^{(r)'}, ω_1', ..., θ_K^{(r)'}, ω_K')'. As an example, suppose that the first component of an MVAR(2,2;1,1) model is constrained to have an upper triangular autoregressive matrix (i.e., upper triangular Φ_11). We then have m_1 = 5; m_2 = 6; R_1 is a 6 × 5 matrix with R_{1,11} = R_{1,22} = R_{1,33} = R_{1,54} = R_{1,65} = 1 and R_{1,ij} = 0 otherwise; R_2 = I_6; θ_1^{(r)} = (Φ_{10,1}, Φ_{10,2}, Φ_{11,11}, Φ_{11,12}, Φ_{11,22})'; θ_2^{(r)} = θ_2; r_1 = 0; r_2 = 0; R is a block diagonal matrix with the blocks 1, R_1, I_3, R_2, I_3 on the diagonal; and r = 0.

The conditional log-likelihood of the complete data (Y_t, Z_t) for the constrained model is the same as (2), except that it is defined for the new vector of parameters θ^{(r)}. The first derivatives with respect to α and Σ_k are given by (3) and (5), and for k = 1, ..., K,

\frac{\partial\ell}{\partial\theta_k^{(r)}} = \sum_{t=p+1}^{T} Z_{t,k}\, R_k'\left\{\operatorname{vec}\left[\Sigma_k^{-1}(Y_t - r_k X_{kt})X_{kt}'\right] - (X_{kt}X_{kt}' \otimes \Sigma_k^{-1})R_k\theta_k^{(r)}\right\}.   (8)
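The upper-triangular example above can be made concrete by constructing R_1 explicitly. The numeric values below are illustrative only, chosen to satisfy the constraint; this is a sketch, not the author's code.

```python
import numpy as np

# Constraint (6) for the example in Section 2.2: an MVAR(2, 2; 1, 1) whose
# first component has upper-triangular Phi_11 (i.e. Phi_11,21 = 0).
# Theta_1 = (Phi_10, Phi_11) is 2 x 3, so vec Theta_1 has 6 entries, of
# which m_1 = 5 are free: (Phi_10,1, Phi_10,2, Phi_11,11, Phi_11,12, Phi_11,22).
free = [0, 1, 2, 4, 5]            # free positions in vec Theta_1 (0-based)
R1 = np.zeros((6, 5))
for col, row in enumerate(free):
    R1[row, col] = 1.0            # R1_{11}=R1_{22}=R1_{33}=R1_{54}=R1_{65}=1

# Hypothetical numeric values, used only to check the mapping (6) with r_1 = 0
phi10 = np.array([0.1, 0.3])
phi11 = np.array([[-0.7, 0.6], [0.0, -0.9]])                 # upper triangular
theta1 = np.concatenate([phi10, phi11.flatten(order="F")])   # vec Theta_1
theta1_r = theta1[free]                                      # the 5 free entries

assert np.allclose(R1 @ theta1_r, theta1)   # theta_1 = R_1 theta_1^(r) holds
```

The selection-matrix structure of R_1 simply re-inserts zeros at the constrained positions; a nonzero fixed value would instead be carried by vec r_k.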
The log-likelihood function is maximized through an iterative EM procedure [3] in which the E step and the M step are alternated until the likelihood converges.

E step. The missing data Z_t are replaced by their expectations, conditional on the parameters and the observed data Y_1, ..., Y_T. The conditional expectation of the kth component of Z_t is the conditional probability that observation Y_t comes from the kth component of the mixture distribution. Let τ_{t,k} denote this conditional expectation. Then,

\tau_{t,k} = \frac{\alpha_k|\Sigma_k|^{-1/2}\exp\left(-\frac{1}{2}e_{kt}'\Sigma_k^{-1}e_{kt}\right)}{\sum_{l=1}^{K}\alpha_l|\Sigma_l|^{-1/2}\exp\left(-\frac{1}{2}e_{lt}'\Sigma_l^{-1}e_{lt}\right)} \quad (k = 1, \ldots, K).   (9)
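A minimal sketch of the E step (9), assuming the component residuals have been computed beforehand; working on the log scale avoids numerical underflow in the exponentials. This is an illustration, not the author's implementation.

```python
import numpy as np

# E step (9): tau_{t,k} is the posterior probability that Y_t was generated
# by component k.  The (2*pi)^{-n/2} factors cancel in the ratio, so only
# alpha_k, |Sigma_k| and the quadratic forms are needed.
def e_step(resid, sigma, alpha):
    """resid: list of K arrays, each (T-p) x n, of residuals e_kt;
    sigma: list of K covariance matrices; alpha: K mixing proportions."""
    K = len(alpha)
    log_w = np.empty((resid[0].shape[0], K))
    for k in range(K):
        inv = np.linalg.inv(sigma[k])
        quad = np.einsum('ti,ij,tj->t', resid[k], inv, resid[k])
        log_w[:, k] = (np.log(alpha[k])
                       - 0.5 * np.log(np.linalg.det(sigma[k])) - 0.5 * quad)
    log_w -= log_w.max(axis=1, keepdims=True)   # stabilise the exponentials
    tau = np.exp(log_w)
    return tau / tau.sum(axis=1, keepdims=True)

tau = e_step([np.zeros((4, 2)), np.ones((4, 2))],
             [np.eye(2), np.eye(2)], [0.3, 0.7])
print(tau.sum(axis=1))   # each row sums to one
```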
M step. Suppose that the missing data are known. The estimates of the parameters are obtained by setting the first derivatives of ℓ to zero. Then, for k = 1, ..., K,

\hat\alpha_k = \frac{1}{T-p}\sum_{t=p+1}^{T}\tau_{t,k},   (10)

\hat\theta_k^{(r)} = \left\{R_k'\left[\left(\sum_{t=p+1}^{T}\tau_{t,k}X_{kt}X_{kt}'\right)\otimes\Sigma_k^{-1}\right]R_k\right\}^{-1} R_k'\left\{\sum_{t=p+1}^{T}\tau_{t,k}\operatorname{vec}\left[\Sigma_k^{-1}(Y_t - r_k X_{kt})X_{kt}'\right]\right\},   (11)

\hat\Sigma_k = \left(\sum_{t=p+1}^{T}\tau_{t,k}\right)^{-1}\sum_{t=p+1}^{T}\tau_{t,k}\,\hat e_{kt}\hat e_{kt}',   (12)
where ê_{kt} = Y_t − Θ̂_k X_{kt}. Owing to the interdependence between θ̂_k^{(r)} and Σ̂_k, Eqs. (11) and (12) are iterated until convergence.

2.3. Observed information matrix

The standard errors of the parameter estimates can be obtained by the missing information principle introduced by Louis [7]. The observed information matrix of the full model, I, can be computed from the complete information matrix, I_c, and the missing information matrix, I_m, through the relation

I = I_c - I_m = E\left[-\frac{\partial^2\ell}{\partial\theta\,\partial\theta'}\,\Big|\,Y\right]_{\theta=\hat\theta} - \operatorname{var}\left[\frac{\partial\ell}{\partial\theta}\,\Big|\,Y\right]_{\theta=\hat\theta}.   (13)

For the constrained model, the observed information matrix, I^{(r)}, can be computed from I as

I^{(r)} = R' I R.   (14)
The variance matrix of the estimates θ̂ (θ̂^{(r)}) is given by the inverse of the observed information matrix I (I^{(r)}). The variance of the estimate α̂_K = 1 − α̂_1 − ⋯ − α̂_{K−1} is given by \sum_{k=1}^{K-1}\operatorname{var}(\hat\alpha_k) + \sum_{k=1}^{K-1}\sum_{l=1,\,l\neq k}^{K-1}\operatorname{cov}(\hat\alpha_k, \hat\alpha_l).

3. Model selection and hypothesis testing

Model selection for MVAR models comprises several aspects. First, the number of components, K, is an important parameter to be determined. Second, we have to select the orders of the vector autoregressive components, the p_k. Finally, we may need to omit some of the insignificant parameters in the autoregressive coefficient matrices. Note that for univariate mixture time series models, the last aspect of model selection is of less interest, as the efficiency gain in parameter estimation from dropping insignificant parameters is minimal. The testing and model selection problems for K are difficult to handle, as they correspond to testing problems with nuisance parameters that do not exist under the null hypothesis [1,2]. In particular, the likelihood ratio statistic for a hypothesis such as Φ_1i = Φ_2i for all i = 0, ..., p does not have the standard χ² null distribution. Fong et al. [5] suggested the use of the Bayesian information criterion (BIC) [10] for this problem. Through simulation experiments, they illustrated that the BIC exhibits satisfactory performance in the selection of K. In the context of the constrained estimation of MVAR models, the BIC is defined as −2ℓ* + log(T − p){\sum_{k=1}^{K} m_k + Kn(n + 1)/2 + K − 1}, where ℓ* = \sum_{t=p+1}^{T} \log f(Y_t | F_{t−1}) is the observed-data log-likelihood and f(Y_t | F_{t−1}) is the conditional density corresponding to (1). After the number of components, K, has been selected, the BIC can be used to select the orders p_k, as well as the insignificant parameters to be dropped from the autoregressive coefficient matrices.
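The E step (9), the constrained M step (10)–(12), and the BIC above can be combined into a single (simplified) EM pass. The sketch below assumes all component orders p_k = 1 and all r_k = 0, and is our own illustration, not the author's implementation; R[k] is the constraint matrix of Eq. (6), with identity matrices giving the full model.

```python
import numpy as np

# One EM pass for a constrained MVAR(n, K; 1, ..., 1) model with r_k = 0.
def em_pass(y, alpha, theta, sigma, R):
    """y: T x n data; alpha: K mixing proportions; theta[k]: (n+1) x n array
    holding Theta_k' (intercept row first); sigma[k]: n x n; R[k]: n(n+1) x m_k."""
    T, n = y.shape
    K = len(alpha)
    X = np.hstack([np.ones((T - 1, 1)), y[:-1]])        # rows are X_kt'
    Y = y[1:]

    # E step, Eq. (9): responsibilities tau, computed on the log scale
    log_w = np.empty((T - 1, K))
    for k in range(K):
        e = Y - X @ theta[k]                            # rows are e_kt'
        inv = np.linalg.inv(sigma[k])
        quad = np.einsum('ti,ij,tj->t', e, inv, e)
        log_w[:, k] = (np.log(alpha[k])
                       - 0.5 * np.log(np.linalg.det(sigma[k])) - 0.5 * quad)
    shift = log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w - shift)
    loglik = float(np.sum(shift[:, 0] + np.log(w.sum(axis=1)))
                   - (T - 1) * n / 2 * np.log(2 * np.pi))   # observed-data l*
    tau = w / w.sum(axis=1, keepdims=True)

    # M step, Eqs. (10)-(12); (11) and (12) would be iterated in practice
    for k in range(K):
        alpha[k] = tau[:, k].mean()                                   # (10)
        inv = np.linalg.inv(sigma[k])
        A = (X * tau[:, k, None]).T @ X                 # sum_t tau X X'
        C = inv @ (Y * tau[:, k, None]).T @ X           # sum_t tau Sig^{-1} Y X'
        H = np.kron(A, inv)                             # A kron Sigma_k^{-1}
        th_r = np.linalg.solve(R[k].T @ H @ R[k],
                               R[k].T @ C.flatten(order='F'))         # (11)
        theta[k] = (R[k] @ th_r).reshape((n, n + 1), order='F').T
        e = Y - X @ theta[k]
        sigma[k] = (e * tau[:, k, None]).T @ e / tau[:, k].sum()      # (12)

    m = sum(Rk.shape[1] for Rk in R)
    bic = -2 * loglik + np.log(T - 1) * (m + K * n * (n + 1) / 2 + K - 1)
    return alpha, theta, sigma, loglik, bic
```

Because each pass maximizes the expected complete-data log-likelihood coordinate-wise, repeated calls cannot decrease the observed-data log-likelihood ℓ*, and the returned BIC can be compared across candidate constraint sets.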
Alternatively, classical likelihood ratio tests can be performed when the hypotheses specify nested models, such as that a single parameter in an autoregressive coefficient matrix is zero or that the autoregressive coefficient matrix at a particular lag is zero. The likelihood ratio statistic is defined as −2(ℓ*_0 − ℓ*_1), where ℓ*_0 and ℓ*_1 are the maximized log-likelihoods ℓ* under the null and alternative hypotheses, respectively, and it has the standard χ² null distribution. The availability of the likelihood ratio test enhances the procedure for parameter selection and hence improves estimation efficiency by reducing the number of free parameters. Care must be taken, however, because two MVAR models with the same orders may not be nested, as the model structures may be completely different. For example, a two-component MVAR model with α_1 = α_2 = 0.5 is considered to have a different structure from that of a two-component model with α_1 = 0.95 and α_2 = 0.05.

We have performed a number of simulation experiments on model selection with the BIC. The results are similar to those in Fong et al. [5] and hence are not shown here. The performance of the BIC in the selection of K and the p_k is satisfactory in the case of constrained estimation. The simulation results assessing the validity of the likelihood ratio tests are given in the next section.

A referee suggested that a Lagrange multiplier test would be a good alternative to a likelihood ratio test. When a constrained MVAR model is preferable to its unconstrained counterpart, the Lagrange multiplier test requires estimation only of the constrained MVAR model rather than of both (constrained and unconstrained) models, as in the likelihood ratio test. The two tests are asymptotically equivalent, although there might be small differences in their small-sample performance. Comparing the performance of the two tests would be an interesting topic for future research.

4. Simulation studies
4.1. Performance of constrained estimation and approximation of standard errors

The performance of the constrained estimation and of the approximation of standard errors based on Louis' method [7] is assessed through simulation experiments. In each experiment, 5000 independent sample paths of length 200 were generated from Model I, an MVAR(2,2;1,1) model with upper triangular autoregressive coefficient matrices and α_1 = 0.3, α_2 = 0.7, vech Σ_1 = (0.9, 0.7, 1.6)', vech Σ_2 = (1.2, 0.8, 0.7)', vec Φ_10 = (0.1, 0.3)', vec Φ_11 = (−0.7, 0.0, 0.6, −0.9)', vec Φ_20 = (−0.2, 0.5)', and vec Φ_21 = (0.2, 0.0, −0.8, 0.6)'. Three sets of parameter constraints are imposed in the estimation. In Set A, no parameter constraints are specified. In Set B, the autoregressive coefficient matrices are restricted to be upper triangular. Finally, in Set C, Φ_21,22 is fixed at 0.6 in addition to the constraints of Set B. Note that all three sets of constraints correctly specify the model. The results are shown in Table 1.

The simulation results reveal that the EM estimation procedure for constrained models has small biases and reasonable standard errors even with a small sample size of 200. Moreover, the standard errors based on Louis' method match the empirical standard errors well, with small downward biases. The effect of fewer model parameters on estimation can be observed by comparing the results for the three sets of constraints. The average absolute biases over all parameters excluding Φ_11,21, Φ_21,21, and Φ_21,22 are 0.0168, 0.0125, and 0.0107 for Sets A, B, and C, respectively. The average ratios of the empirical standard errors to the standard errors based on Louis' method for the same set of parameters are 1.0294, 1.0288, and 1.0281 for Sets A, B, and C, respectively. The biases in the parameter estimates and in the approximated standard errors thus decrease as the number of free parameters decreases. Moreover, the precision of the estimates of unconstrained parameters may be substantially improved by introducing correct parameter constraints. For example, the standard errors of Φ̂_21,11 and Φ̂_21,12 are reduced by half when the constraints in Set C are imposed. We repeated the simulation experiments with sample sizes of 500 and 1000. The results confirm that the biases in parameter estimation and in the approximated standard errors decrease with a larger sample size. The results are available from the author.

4.2. Performance of likelihood ratio tests

The validity and power of the likelihood ratio tests are assessed through further extensive simulation experiments. The results of eight experiments are reported here. Experiments 1–6 assessed the empirical sizes of the tests, and experiments 7 and 8 assessed their empirical powers. In all of the simulation experiments, 10,000 independent sample paths were generated from Model I with lengths of 200, 500, and 1000. We compared the computed test statistics with the 10%, 5%, and 1% points of the χ² distribution, with the degrees of freedom given by the difference in the number
Table 1
Simulation experiments assessing the performance of constrained estimation and approximation of standard errors.

k                    αk      Σk,11    Σk,21    Σk,22    Φk0,1    Φk0,2    Φk1,11   Φk1,21   Φk1,12   Φk1,22

Model parameters
1                    0.300   0.900    0.700    1.600     0.100    0.300   −0.700    0.000    0.600   −0.900
2                    0.700   1.200    0.800    0.700    −0.200    0.500    0.200    0.000   −0.800    0.600

Set A: no constraints are imposed
1  Mean est.(a)      0.300   0.849    0.662    1.514     0.097    0.289   −0.700   −0.002    0.605   −0.889
   Emp. s.e.(b)      0.035   0.172    0.186    0.309     0.146    0.190    0.073    0.095    0.092    0.125
   The. s.e.(c)      0.035   0.166    0.180    0.294     0.138    0.183    0.070    0.093    0.088    0.118
2  Mean est.         0.700   1.172    0.782    0.685    −0.195    0.505    0.198   −0.001   −0.807    0.594
   Emp. s.e.         0.035   0.150    0.107    0.088     0.103    0.079    0.052    0.040    0.067    0.051
   The. s.e.         0.035   0.147    0.105    0.086     0.102    0.078    0.051    0.039    0.065    0.050

Set B: Φ11,21 = Φ21,21 = 0
1  Mean est.         0.300   0.855    0.675    1.544     0.098    0.291   −0.699    0.000    0.604   −0.891
   Emp. s.e.         0.035   0.173    0.188    0.311     0.146    0.188    0.062    –        0.091    0.116
   The. s.e.         0.035   0.168    0.183    0.299     0.138    0.183    0.057    –        0.087    0.112
2  Mean est.         0.700   1.179    0.788    0.690    −0.195    0.505    0.199    0.000   −0.807    0.595
   Emp. s.e.         0.035   0.150    0.108    0.088     0.103    0.078    0.026    –        0.065    0.048
   The. s.e.         0.035   0.148    0.106    0.086     0.102    0.078    0.025    –        0.063    0.047

Set C: Φ11,21 = Φ21,21 = 0 and Φ21,22 = 0.6
1  Mean est.         0.300   0.855    0.674    1.543     0.098    0.292   −0.699    0.000    0.604   −0.892
   Emp. s.e.         0.035   0.173    0.188    0.311     0.146    0.188    0.062    –        0.090    0.116
   The. s.e.         0.035   0.168    0.183    0.299     0.138    0.183    0.057    –        0.087    0.112
2  Mean est.         0.700   1.186    0.794    0.695    −0.198    0.502    0.199    0.000   −0.801    0.600
   Emp. s.e.         0.035   0.151    0.108    0.089     0.098    0.073    0.026    –        0.033    –
   The. s.e.         0.035   0.149    0.106    0.087     0.097    0.073    0.025    –        0.032    –

(a) Mean est.: mean of the estimates.
(b) Emp. s.e.: standard deviation of the estimates.
(c) The. s.e.: mean of the standard errors based on Louis' method.
of parameters under the null and alternative hypotheses. Selected results are shown in Table 2. The empirical sizes are close to the nominal sizes even for a small sample when the tests involve only a few parameters. When they involve a larger number of parameters, the empirical sizes may be much larger than the nominal sizes for a small sample, but they improve with larger sample sizes. As indicated by the last two experiments, the likelihood ratio tests have reasonable power.

5. Example: U.S. Treasury constant maturity rates

The constrained MVAR modeling and the importance of approximating standard errors via Louis' method are illustrated using monthly data on 1-year and 3-year U.S. Treasury constant maturity rates from April 1953 to January 2001. These data can be obtained from the Federal Reserve Bank of St. Louis at stlouisfed.org. Fong et al. [5] and Tsay [12] also investigated these data and drew slightly different conclusions. Figs. 1 and 2 show the original and log-differenced time series, respectively. The original series are clearly nonstationary. The log-differenced series appear to be stationary, with no large difference in variability over time despite two large extreme values. Let Y_t = (Y_{t,1}, Y_{t,2})' be the vector of the log-differenced (or, equivalently, approximately the percentage change in) 1-year and 3-year Treasury constant maturity rates. The log-differenced series have a length of 573. The log-differenced data exhibit heavy tails in comparison with a bivariate normal distribution: the excess kurtoses are 5.34 and 1.77 for Y_{t,1} and Y_{t,2}, respectively. Models that imply a marginal bivariate normal distribution, such as the VAR model, may therefore not capture the features of these data well.
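The log-difference transformation and the excess-kurtosis diagnostic used above can be sketched as follows; the rate values below are hypothetical placeholders, not the actual Treasury data.

```python
import numpy as np

# Y_t is the log-difference of the interest-rate series, approximately its
# percentage change.  The rates below are illustrative placeholder values.
rates = np.array([5.0, 5.5, 5.2, 4.9, 5.1, 5.6, 5.4])
y = np.diff(np.log(rates))                     # Y_t = log r_t - log r_{t-1}

# Sample excess kurtosis (normal distribution = 0); values well above zero,
# such as the 5.34 and 1.77 reported for the data, indicate heavy tails.
excess_kurtosis = ((y - y.mean()) ** 4).mean() / y.var() ** 2 - 3
print(len(y))                                  # 6, one fewer than the rates
```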
Table 2
Simulation experiments assessing the validity and power of the likelihood ratio tests.

No.  Hypotheses                                    d.f.(a)  T(b)    Theoretical size (in %)(c)
                                                                    10       5        1
1    H0: Φ11,21 = 0                                1        200     10.94    5.62     1.25
                                                            500     10.46    5.25     0.84
2    H0: Φ11,21 = Φ21,21 = 0                       2        200     10.72    5.46     1.16
                                                            500      9.68    4.73     0.96
3    H0: Φ11,21 = Φ21,21 = 0, Φ21,22 = 0.6         3        200     10.70    5.37     1.05
                                                            500     10.03    4.77     0.93
4    H0: MVAR(2,2;1,1) vs. H1: MVAR(2,2;2,1)       4        200     14.37    7.57     2.03
                                                            500     11.57    5.78     1.23
                                                            1000    10.99    5.57     1.23
5    H0: MVAR(2,2;1,1) vs. H1: MVAR(2,2;2,2)       8        200     14.04    7.63     1.95
                                                            500     11.44    6.12     1.29
                                                            1000    11.08    5.93     1.20
6    H0: MVAR(2,2;1,1) vs. H1: MVAR(2,2;3,3)       16       200     17.11    9.57     2.52
                                                            500     11.44    6.12     1.29
                                                            1000    10.79    5.34     0.93
7    H0: Φ21,11 = 0                                1        200     98.31    96.30    88.04
8    H0: Φ21,11 = Φ21,21 = 0                       2        200     100.00   100.00   100.00

(a) d.f.: degrees of freedom of the null χ² distribution.
(b) T: length of the simulated series.
(c) Empirical sizes in % are shown under the columns for the theoretical sizes.

Fig. 1. U.S. 1-year and 3-year constant maturity rates.

Fig. 2. Log-differenced U.S. 1-year and 3-year constant maturity rates.
Table 3
Summary results for two-component full mixture vector autoregressive models.

p1   p2   ℓ*        BIC       No. of parameters   Times selected(a)   LRT(b) against MVAR(2,2;2,1)   p-Value
4    4    3340.18   −6407.6   43                  16                  25.06                          0.199
4    3    3339.78   −6432.1   39                  10                  24.26                          0.084
4    2    3337.01   −6452.0   35                  10                  18.72                          0.096
4    1    3331.50   −6466.3   31                  4                   7.70                           0.463
3    3    3336.97   −6451.9   35                  25                  18.64                          0.098
3    2    3334.49   −6472.3   31                  15                  13.68                          0.090
3    1    3330.79   −6490.3   27                  50                  6.28                           0.179
2    2    3331.41   −6491.5   27                  24                  7.52                           0.111
2    1    3327.65   −6509.4   23                  49                  –                              –
2    0    3313.30   −6506.1   19                  21                  28.70                          0.000
1    1    3313.14   −6505.7   19                  50                  29.02                          0.000

(a) Times selected: number of times the model is selected out of 50 trials.
(b) LRT: likelihood ratio test.
We consider two- and three-component MVAR models with a maximum allowable order of four for the autoregressive components. The numbers of parameters in the most complicated models are 43 and 65 for the MVAR(2,2;4,4) and MVAR(2,3;4,4,4) models, respectively. We start our selection process with the values of the p_k. The initial values for the EM estimation are α_k = 1/K and Σ_k = I_2 (k = 1, ..., K), and random numbers generated from the uniform distribution on the interval [−0.8, 0.8] are used for the parameters in θ_k (k = 1, ..., K). For each model, 50 sets of estimation results are obtained with different initial values. The estimation results based on different initial values can differ substantially because of the presence of local maxima in the log-likelihood. Table 3 shows the results for the two-component models, with the values of the log-likelihood and BIC and the number of times each model is selected in the 50 trials. The results in Table 3 are slightly different from those in Fong et al. [5] because of the large number of trials used in searching for the best models. The estimation results for all of the models in this table are available from the author. The models in Table 3 are quite similar in the sense that their main characteristics do not change substantially. For example, the estimated mixing proportion parameters for component 1, α̂_1, all lie in the interval (0.355, 0.424). Moreover, the estimated parameters in the two variance–covariance matrices are all similar in magnitude. Because the lower-order models can be considered to be nested in the higher-order models, likelihood ratio tests can be performed; the results are shown in Table 3 for reference. Note that it is important that a sufficiently large number of initial values be used for complicated models, as the best model may be selected in as few as 8% of the trials in our example. The best two-component model is the MVAR(2,2;2,1) model, with a BIC of −6509.4.
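The likelihood-ratio entries in Table 3 can be reproduced directly from the log-likelihood values; for instance, testing MVAR(2,2;2,2) against the nested MVAR(2,2;2,1):

```python
from scipy.stats import chi2

# Reproducing one likelihood-ratio entry of Table 3: MVAR(2,2;2,2)
# (l* = 3331.41, 27 parameters) against the nested MVAR(2,2;2,1)
# (l* = 3327.65, 23 parameters).
ll_alt, ll_null = 3331.41, 3327.65
df = 27 - 23
lrt = 2 * (ll_alt - ll_null)          # 7.52, matching Table 3
p_value = chi2.sf(lrt, df)            # about 0.111, matching Table 3
print(round(lrt, 2), round(p_value, 3))
```

Since the p-value exceeds conventional levels, the extra four parameters of the larger model are not supported, consistent with the selection of MVAR(2,2;2,1).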
The likelihood ratio tests confirm that this model is the most parsimonious one that is still statistically acceptable. As three-component models are more complicated, special attention is required in model selection to avoid overfitting. For example, the MVAR(2,3;4,4,4) model with the smallest BIC contains a component with a mixing proportion close to 0.05, which represents around 28 data points but carries 22 parameters. After examining all possible three-component models, the MVAR(2,3;2,2,1) model, with a BIC of −6502.9, is selected. As the MVAR(2,2;2,1) model has a smaller BIC, we examine it in greater detail.

The full MVAR(2,2;2,1) model is shown in Table 4. The parameters in the autoregressive matrices with small absolute estimate-to-standard-error ratios are dropped sequentially, starting with the off-diagonal elements. The new models are estimated with the methodology of Sections 2.2 and 2.3. Likelihood ratio tests are used to check whether the dropped parameters should be retained in the models. Based on this strategy, the sequence of parameters dropped is Φ21,21, Φ12,21, Φ12,12, Φ11,22, and Φ11,12. The estimated constrained MVAR(2,2;2,1) model, with 18 parameters, is shown in Table 5. The values of the log-likelihood and BIC are 3325.04 and −6535.9, respectively. A likelihood ratio test against the full MVAR(2,2;2,1) model gives a statistic of 5.22 with a p-value of 0.390, which indicates that the constrained model is preferable.

In the constrained MVAR(2,2;2,1) model, the log-differenced 1-year and 3-year interest rates are modeled by two VAR components with different degrees of variability. The first component has larger variances (Σ̂1,11 = 0.00749 and Σ̂1,22 = 0.00385) and a mixing proportion of 0.358. In this component, the 3-year interest rate depends on the past values of the 1-year interest rate but not vice versa. Changes in the short-term interest rate have some
Table 4
Estimated full MVAR(2,2;2,1) model.

Parameter        α1       Σ1,11     Σ1,12     Σ1,21     Σ1,22     Φ10,1    Φ10,2    Φ11,11   Φ11,21   Φ11,12   Φ11,22   Φ12,11   Φ12,21   Φ12,12   Φ12,22
Estimate         0.375    0.00708   0.00451   0.00451   0.00367   −0.003   −0.001   0.359    0.290    0.476    0.295    −0.056   0.092    −0.203   −0.469
Standard error   0.051    0.00089   0.00057   0.00057   0.00043    0.006    0.005   0.181    0.132    0.242    0.177     0.174   0.126     0.239    0.176

Parameter        α2       Σ2,11     Σ2,12     Σ2,21     Σ2,22     Φ20,1    Φ20,2    Φ21,11   Φ21,21   Φ21,12   Φ21,22
Estimate         0.625    0.00128   0.00114   0.00114   0.00117    0.003    0.001   0.031    −0.065   0.256    0.347
Standard error   0.051    0.00018   0.00014   0.00014   0.00013    0.002    0.002   0.108     0.103   0.144    0.131
influence on the long-term interest rate in a volatile interest rate market. The second component, with a mixing proportion of 0.642, has smaller variances (Σ̂2,11 = 0.00132 and Σ̂2,22 = 0.00119). In this component, the 1-year interest rate depends on the past values of the 3-year interest rate but not vice versa. Changes in the long-term interest rate have some influence on the short-term interest rate in a calm interest rate market.

Our conclusion is somewhat different from that of Tsay [12], who suggested that the 3-year interest rate does not depend on the past values of the 1-year interest rate. His conclusion is based on VAR modeling, which may be inappropriate, as the log-differenced interest rates exhibit heavy tails. Fong et al. [5] suggested, based on a full MVAR(2,2;3,4) model, that the 3-year interest rate depends on the past values of the 1-year interest rate but not vice versa in the first component, and that the two interest rate series are mutually correlated in the second component. However, they computed the standard errors of the estimates with a numerical procedure that may be inappropriate because of the irregularity of the log-likelihood surface. For example, the estimates and standard errors for vec Φ24 are (−0.169, −0.090, 0.196, 0.096)' and (0.012, 0.015, 0.030, 0.022)', respectively, in their estimated model [5]. Using Louis' [7] method, as presented in Section 2.3, the standard errors are (0.102, 0.093, 0.118, 0.113)', which implies that none of the parameters in vec Φ̂24 has a large estimate-to-standard-error ratio. This finding illustrates the importance of using Louis' method to compute the standard errors after EM estimation.
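The contrast above can be checked arithmetically from the quoted figures, using the common rule of thumb that an absolute estimate-to-standard-error ratio above about 2 suggests significance:

```python
import numpy as np

# Estimate-to-standard-error ratios for vec Phi_24 of the full MVAR(2,2;3,4)
# model of Fong et al. [5]: numerically approximated standard errors versus
# those computed by Louis' method.
est      = np.array([-0.169, -0.090, 0.196, 0.096])
se_num   = np.array([0.012, 0.015, 0.030, 0.022])    # numerical procedure
se_louis = np.array([0.102, 0.093, 0.118, 0.113])    # Louis' method

print(np.abs(est / se_num).round(2))     # every ratio exceeds 2
print(np.abs(est / se_louis).round(2))   # every ratio is below 2
```

The same point estimates thus look strongly significant under the numerical standard errors but insignificant under Louis' method, which is the discrepancy discussed in the text.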
Table 5
Estimated constrained MVAR(2,2;2,1) model.

Parameter        α1       Σ1,11     Σ1,12     Σ1,21     Σ1,22     Φ10,1    Φ10,2    Φ11,11   Φ11,21   Φ11,12   Φ11,22   Φ12,11   Φ12,21   Φ12,12   Φ12,22
Estimate         0.358    0.00749   0.00477   0.00477   0.00385   −0.003   −0.001   0.668    0.479    0.000    0.000    −0.189   0.000    0.000    −0.338
Standard error   0.051    0.00096   0.00061   0.00061   0.00046    0.007    0.005   0.084    0.062    –        –         0.066   –        –         0.070

Parameter        α2       Σ2,11     Σ2,12     Σ2,21     Σ2,22     Φ20,1    Φ20,2    Φ21,11   Φ21,21   Φ21,12   Φ21,22
Estimate         0.642    0.00132   0.00117   0.00117   0.00119    0.003    0.001   0.082    0.000    0.229    0.296
Standard error   0.051    0.00018   0.00014   0.00014   0.00013    0.002    0.002   0.036    –        0.083    0.058
Acknowledgments

An earlier version of this paper was presented at MODSIM 2011, the 19th International Congress on Modelling and Simulation, Perth, Australia. The author appreciates the comments received from the session chair and conference participants, which have been incorporated in this fully revised version of the original conference paper. The author also wishes to thank Michael McAleer and two anonymous referees for their valuable comments and suggestions. This work was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Competitive Earmarked Research Grant Project No. CUHK4733/05H).

References

[1] R.B. Davies, Hypothesis testing when a nuisance parameter is present only under the alternative, Biometrika 64 (1977) 247–254.
[2] R.B. Davies, Hypothesis testing when a nuisance parameter is present only under the alternative, Biometrika 74 (1987) 33–43.
[3] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm (with discussion), Journal of the Royal Statistical Society B 39 (1977) 1–38.
[4] R.F. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50 (1982) 987–1007.
[5] P. Fong, W.K. Li, C.W. Yau, C.S. Wong, On a mixture vector autoregressive model, The Canadian Journal of Statistics 35 (2007) 135–150.
[6] N.D. Le, R.D. Martin, A.E. Raftery, Modelling flat stretches, bursts, and outliers in time series using mixture transition distribution models, Journal of the American Statistical Association 91 (1996) 1504–1514.
[7] T.A. Louis, Finding the observed information matrix when using the EM algorithm, Journal of the Royal Statistical Society B 44 (1982) 226–233.
[8] H. Lütkepohl, New Introduction to Multiple Time Series Analysis, Springer, Berlin, 2006.
[9] J.R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley, Chichester, 1988.
[10] G. Schwarz, Estimating the dimension of a model, Annals of Statistics 6 (1978) 461–464.
[11] D.M. Titterington, A.F.M. Smith, U.E. Makov, Statistical Analysis of Finite Mixture Distributions, Wiley, New York, 1985.
[12] R.S. Tsay, Analysis of Financial Time Series, Wiley, New York, 2005.
[13] C.S. Wong, Modeling Hong Kong's stock index with the Student t-mixture autoregressive model, Mathematics and Computers in Simulation 81 (2011) 1334–1343.
[14] C.S. Wong, W.S. Chan, P.L. Kam, A Student t-mixture autoregressive model with applications to heavy-tailed financial data, Biometrika 96 (2009) 751–760.
[15] C.S. Wong, W.K. Li, On a mixture autoregressive model, Journal of the Royal Statistical Society B 62 (2000) 95–115.
[16] C.S. Wong, W.K. Li, On a logistic mixture autoregressive model, Biometrika 88 (2001) 833–846.
[17] C.S. Wong, W.K. Li, On a mixture autoregressive conditional heteroscedastic model, Journal of the American Statistical Association 96 (2001) 982–995.