Mathematics and Computers in Simulation 79 (2009) 2556–2565
On pseudo maximum likelihood estimation for multivariate time series models with conditional heteroskedasticity

Shuangzhe Liu a,∗, Heinz Neudecker b

a School of Information Sciences and Engineering, University of Canberra, Canberra, ACT 2601, Australia
b School of Economics and Business, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands

∗ Corresponding author. E-mail addresses: [email protected] (S. Liu), [email protected] (H. Neudecker).
Available online 24 December 2008
Abstract

We consider a general multivariate conditional heteroskedastic model under a conditional distribution that is not necessarily normal. This model contains autoregressive conditional heteroskedastic (ARCH) models as a special class. We use the pseudo maximum likelihood estimation method and derive a new estimator of the asymptotic variance matrix for the pseudo maximum likelihood estimator. We also study four special cases in this class, which are conditional heteroskedastic autoregressive moving-average models, regression models with ARCH errors, models with constant conditional correlations, and ARCH in mean models.
© 2008 IMACS. Published by Elsevier B.V. All rights reserved.

JEL classification: C13; C32

Keywords: CHARMA; R-ARCH; CCC; ARCH-M; Asymptotic variance matrix
1. Introduction

Autoregressive conditional heteroskedastic (ARCH) models and statistical inference on them have received considerable attention. Initial research with applications in economics and finance was done by Engle [4], and further work by Bollerslev [2] and Engle et al. [5]. A handbook survey of volatility models can be found in Palm [18]. McAleer [16] uses the specific-to-general methodological approach to illustrate a number of important developments in the modelling of univariate and multivariate financial volatility, and discusses 20 issues in the specification, estimation and testing of conditional and stochastic volatility models for automated inference. A collection of papers on these models and applications to the pricing of derivatives can be found in Jarrow [10]. Several ARCH models and applications are discussed within the framework of financial econometrics; see, e.g. Kariya [11], Campbell et al. [3], Rachev and Mittnik [20], Gouriéroux and Jasiak [7], Hall and Yao [8], Liu [14], Poon [19], Straumann [21] and Tsay [22].

In a comprehensive monograph on ARCH models and applications, Gouriéroux [6] advocates the pseudo maximum likelihood (PML) estimation method for estimating a conditional heteroskedastic time series model with conditional distribution not necessarily normal. He gives two key matrices for expressing the asymptotic variance in the univariate case. In a recent survey of theoretical results for time series models with generalised ARCH (GARCH) errors, Li et al. [12] discussed PML estimation (quasi-maximum likelihood estimation, in their terminology) for a variety of ARMA-GARCH models. They stated it can be shown that the PML estimator of the parameter vector of the conditional
mean in the ARMA-GARCH model under normality, i.e. the maximum likelihood estimator, is more efficient than the least squares estimator. The PML estimation method is also discussed by Arminger [1] in a different setting. It is in a way related to the quasi-likelihood estimation theory as developed by Heyde [9]. See also Straumann [21].

In this paper, we make a further study mainly based on Gouriéroux [6], Li et al. [12] and McAleer [16]. In line with Gouriéroux’s argumentation, we consider a unified approach to PML estimation of a general multivariate conditional heteroskedastic time series model in which the distribution of the disturbance vector is not necessarily conditionally normal. A feature of this model is that it contains several ARCH models as a special class, including the following: conditional heteroskedastic autoregressive moving-average models (CHARMA), regression models with ARCH errors (R-ARCH), models with constant conditional correlations (CCC), and ARCH in mean models (ARCH-M). Our main result is a new estimator of the asymptotic variance matrix of the PML estimator, which is an important contribution from a statistical point of view. We obtain it by applying matrix differential calculus as developed by Magnus and Neudecker [15].

The structure of the paper is as follows. In Section 2, we introduce the model and the PML method. In Section 3, we consider the first order maximum conditions. In Section 4, we derive the asymptotic variance matrix. In Section 5, we give results for the CHARMA, R-ARCH, CCC and ARCH-M models. We present concluding remarks in Section 6.

2. Model and pseudo maximum likelihood estimation

Statistical analysis of multivariate conditional heteroskedastic time series models can be based fruitfully on the PML estimation method, under the assumption of a conditionally normal distribution. Consider

$$
y_t = \mu_t + u_t, \qquad t = 1, \ldots, T, \qquad (1)
$$
where $u_t$ is an $N \times 1$ disturbance vector with $N \times 1$ conditional mean vector $E_0(u_t \mid C_{t-1}) = 0$ and $N \times N$ positive definite conditional variance matrix $H_t = E_0(u_t u_t' \mid C_{t-1}) > 0$, where $E_0$ indicates the expectation taken with respect to the unknown true distribution (which has a conditional distribution not necessarily normal) as used in Gouriéroux [6], $C_{t-1}$ indicates the information set available at time $(t-1)$, $y_t$ is an $N \times 1$ vector of observable variables, $\mu_t = E_0(y_t \mid C_{t-1})$ is an $N \times 1$ conditional mean vector, and $\mu_t = \mu_t(\theta)$ and $H_t = H_t(\theta)$ are functions of $\theta$, a $p \times 1$ vector of ‘fundamental’ parameters (with restrictions in such a way that, e.g. $H_t > 0$). Note that (1) is our general model. When we study a particular model or use a specific estimation method, we shall accordingly make further specifications; e.g. we assume (25)–(28) to follow to use PML.

Although the conditional distribution of $y_t$ given $C_{t-1}$ is not necessarily normal, we shall use the assumption of a conditionally normal distribution to build the following pseudo log-likelihood function

$$
L = L(y, \theta) = \sum_{t=1}^{T} L_t, \qquad (2)
$$

where

$$
L_t = -\frac{1}{2} \log|H_t| - \frac{1}{2} u_t' H_t^{-1} u_t \qquad (3)
$$

is the conditional log-likelihood function associated with $y_t$, $t = 1, \ldots, T$. Properly speaking, $\theta$ in (2) is a mathematical variable and is in fact different from the unknown true parameter $\theta$ as introduced in (1). This also applies to $\mu_t(\theta)$ and $H_t(\theta)$. However, we shall not always make this distinction in the following computations for expository reasons. It is usually clear from the context how to read $\theta$. When necessary we shall denote the (unknown) parameter by $\theta_0$.

We define the pseudo maximum likelihood estimator as a maximizer of $L$ in (2) and denote it as $\hat{\theta}_T$. This estimator $\hat{\theta}_T$ is used even when the true underlying distribution is not conditionally normal, i.e. (3) does not hold for the underlying distribution (when the underlying distribution is conditionally normal, $\hat{\theta}_T$ becomes the genuine maximum likelihood estimator). The properties of $\hat{\theta}_T$ depend on the (unknown) true underlying distribution and on the distribution used to compute the likelihood function (here the normal distribution). It is, however, claimed that under standard regularity conditions, $\hat{\theta}_T$ asymptotically exists and is consistent and asymptotically normal even if the underlying distribution is not conditionally normal; see (8) to follow. This means that this property does not depend on the distribution which
is used to build the likelihood function; see, e.g. Gouriéroux ([6], p. 44) and Ling and Li [13]. For comments on computation of PML estimators, see Arminger [1].

Let us define

$$
g_t = \frac{\partial L_t(y, \theta)}{\partial \theta}, \qquad (4)
$$

$$
F_t = \frac{\partial^2 L_t(y, \theta)}{\partial \theta \, \partial \theta'}, \qquad (5)
$$

$$
A = E_0(g_t g_t'), \qquad (6)
$$

$$
B = -E_0(F_t), \qquad (7)
$$

where $g_t = g_t(\theta)$ is a $p \times 1$ vector and $F_t = F_t(\theta)$ is a (symmetric) $p \times p$ matrix. $A = A(\theta_0)$ and $B = B(\theta_0)$ are assumed to be $p \times p$ positive definite matrices.
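For concreteness, the following minimal sketch (an illustration added here, not part of the paper; Python/NumPy is used) evaluates the Gaussian pseudo log-likelihood contribution (3) and approximates the score (4) and Hessian (5) numerically. The functions mu_fn and H_fn are hypothetical user-supplied parameterizations of $\mu_t(\theta)$ and $H_t(\theta)$.

```python
import numpy as np

def pseudo_loglik_t(theta, y_t, mu_fn, H_fn):
    """Gaussian pseudo log-likelihood contribution L_t in (3).

    mu_fn(theta) and H_fn(theta) are assumed user-supplied parameterizations
    of the conditional mean mu_t and conditional variance H_t.
    """
    u = y_t - mu_fn(theta)
    H = H_fn(theta)
    sign, logdet = np.linalg.slogdet(H)
    return -0.5 * logdet - 0.5 * u @ np.linalg.solve(H, u)

def score_t(theta, y_t, mu_fn, H_fn, eps=1e-5):
    """Numerical score g_t in (4) by central differences."""
    p = theta.size
    g = np.zeros(p)
    for i in range(p):
        e = np.zeros(p); e[i] = eps
        g[i] = (pseudo_loglik_t(theta + e, y_t, mu_fn, H_fn)
                - pseudo_loglik_t(theta - e, y_t, mu_fn, H_fn)) / (2 * eps)
    return g

def hessian_t(theta, y_t, mu_fn, H_fn, eps=1e-4):
    """Numerical Hessian F_t in (5), obtained by differencing the score."""
    p = theta.size
    F = np.zeros((p, p))
    for i in range(p):
        e = np.zeros(p); e[i] = eps
        F[:, i] = (score_t(theta + e, y_t, mu_fn, H_fn)
                   - score_t(theta - e, y_t, mu_fn, H_fn)) / (2 * eps)
    return 0.5 * (F + F.T)  # symmetrize
```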
We can obtain $\hat{\theta}_T$ (analytically or numerically; see (20)–(21) to follow) with the following property; see Gouriéroux [6]:

$$
T^{1/2}(\hat{\theta}_T - \theta_0) \xrightarrow{d} N(0, \Omega); \qquad (8)
$$

$$
\Omega = B^{-1} A B^{-1} \qquad (9)
$$
is the (positive definite) asymptotic variance matrix and the two key matrices $A$ and $B$ are as given in (6) and (7). In general, $A$ and $B$ differ. But if the true distribution is conditionally normal, $A$ and $B$ are identical. In practice, $A$ and $B$ may be consistently estimated as follows:

$$
\hat{A}_T = T^{-1} \sum_{t=1}^{T} g_t(\hat{\theta}_T) g_t(\hat{\theta}_T)', \qquad (10)
$$

$$
\hat{B}_T = -T^{-1} \sum_{t=1}^{T} F_t(\hat{\theta}_T), \qquad (11)
$$

where $g_t(\hat{\theta}_T)$ and $F_t(\hat{\theta}_T)$ are $g_t(\theta)$ and $F_t(\theta)$ both evaluated at $\hat{\theta}_T$. Then $\Omega$ can be estimated by

$$
\hat{\Omega}_T = \hat{B}_T^{-1} \hat{A}_T \hat{B}_T^{-1}. \qquad (12)
$$
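As a minimal illustration (not from the paper), the estimators (10)–(12) can be assembled from the per-observation scores and Hessians, e.g. the numerical ones from the previous sketch; the array shapes below are assumptions about how they would be stacked.

```python
import numpy as np

def sandwich_variance(scores, hessians):
    """Estimate Omega in (12) from per-observation quantities.

    scores:   array of shape (T, p), row t holding g_t evaluated at theta_hat
    hessians: array of shape (T, p, p), entry t holding F_t at theta_hat
    Returns (A_hat, B_hat, Omega_hat) as in (10)-(12).
    """
    T = scores.shape[0]
    A_hat = scores.T @ scores / T            # (10): average of g_t g_t'
    B_hat = -hessians.mean(axis=0)           # (11): minus the average Hessian
    B_inv = np.linalg.inv(B_hat)
    Omega_hat = B_inv @ A_hat @ B_inv        # (12): sandwich form
    return A_hat, B_hat, Omega_hat

# Asymptotic standard errors of theta_hat would then be
# np.sqrt(np.diag(Omega_hat) / T).
```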
Gouriéroux ([6], (ii) and (iii) in Appendix 4.1) gives expressions for the conditional moments related to $A$ and $B$ in the univariate case where $H_t$ is a scalar ($N = 1$), viz. $E_{t-1}(g_t g_t')$ and $E_{t-1}(F_t)$, to be introduced and discussed in Section 4 to follow. The conditional moment related to $A$ depends on the conditional third and fourth moments of $y_t$. Gouriéroux also gives (8) through (12) in the multivariate case ($N > 1$), and a consistent estimator of $B$ with explicit conditional moments; see his (6.23). We establish an expression for the conditional moments related to $A$ and rederive one related to $B$.

We use the matrix differential techniques of Magnus and Neudecker [15] to derive $dL = d_\theta L$ and $d^2 L = d_\theta^2 L$, which are the first and second differentials of $L$ with respect to $\theta$. We take $dL = 0$ to establish the first order conditions. We derive the expressions for $A$ and $B$ by using the following equations:

$$
E_0(dL_t)^2 = (d\theta)' A \, d\theta, \qquad (13)
$$

$$
-E_0(d^2 L_t) = (d\theta)' B \, d\theta. \qquad (14)
$$

These equations hold because

$$
(d\theta)' A \, d\theta = (d\theta)' E_0(g_t g_t') \, d\theta = E_0\!\left[(d\theta)' g_t g_t' \, d\theta\right] = E_0\!\left[(d\theta)' \frac{\partial L_t}{\partial \theta} \frac{\partial L_t}{\partial \theta'} \, d\theta\right] = E_0(dL_t)^2,
$$

$$
(d\theta)' B \, d\theta = -(d\theta)' E_0(F_t) \, d\theta = -E_0\!\left[(d\theta)' F_t \, d\theta\right] = -E_0\!\left[(d\theta)' \frac{\partial^2 L_t}{\partial \theta \, \partial \theta'} \, d\theta\right] = -E_0(d^2 L_t),
$$

where $d\theta$ is nonrandom and $L_t$ is a scalar.
3. First order conditions

For model (1), we derive the differential $dL_t$ of $L_t$ with respect to $\theta$ (via $H_t$ and $\mu_t$) as follows:

$$
\begin{aligned}
dL_t &= -\frac{1}{2} \, d\log|H_t| - \frac{1}{2} \, \mathrm{tr}\, d(H_t^{-1} u_t u_t') = -\frac{1}{2} \mathrm{tr}(H_t^{-1} dH_t) + \frac{1}{2} \mathrm{tr}(H_t^{-1} u_t u_t' H_t^{-1} dH_t) + u_t' H_t^{-1} d\mu_t \\
&= \frac{1}{2} [v(u_t u_t' - H_t)]' D'(H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \theta'} d\theta + u_t' H_t^{-1} \frac{\partial \mu_t}{\partial \theta'} d\theta,
\end{aligned}
\qquad (15)
$$

where $v$ denotes the vectorization operator that eliminates all supradiagonal elements of the matrix, $v(H_t)$ is an $n \times 1$ vector with $n = N(N+1)/2$, $\otimes$ denotes the Kronecker product, and $D$ is the $N^2 \times n$ duplication matrix with property $Dv(H_t) = \mathrm{vec}\,H_t$, where $\mathrm{vec}$ denotes the vectorization operator that stacks the columns of the matrix, and $\mathrm{vec}\,H_t$ is an $N^2 \times 1$ vector (clearly $N^2 > n$, as $N > 1$). Note that $v(H_t)$ contains the distinct elements of $H_t$ as $H_t$ is symmetric. For more properties on $v$, $\mathrm{vec}$, $\otimes$ and $D$, see Magnus and Neudecker [15]. From (15), we obtain the first order condition:

$$
g(\theta) = \sum_{t=1}^{T} g_t = \frac{1}{2} \sum_{t=1}^{T} \frac{\partial v(H_t)'}{\partial \theta} D'(H_t^{-1} \otimes H_t^{-1}) D v(u_t u_t' - H_t) + \sum_{t=1}^{T} \frac{\partial \mu_t'}{\partial \theta} H_t^{-1} u_t = 0. \qquad (16)
$$
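To fix ideas about the operators $v$, $\mathrm{vec}$ and $D$ used in (15) and (16), here is a small illustrative sketch (added here, not part of the paper) that builds the duplication matrix for a given $N$ and checks the defining property $Dv(H) = \mathrm{vec}\,H$.

```python
import numpy as np

def vech(H):
    """v(H): stack the on-and-below-diagonal elements of H column by column."""
    N = H.shape[0]
    return np.concatenate([H[j:, j] for j in range(N)])

def duplication_matrix(N):
    """D of order N^2 x N(N+1)/2 such that D @ vech(H) = vec(H) for symmetric H."""
    n = N * (N + 1) // 2
    D = np.zeros((N * N, n))
    col = 0
    for j in range(N):
        for i in range(j, N):
            # vec stacks columns: element (i, j) sits at position j*N + i
            D[j * N + i, col] = 1.0
            D[i * N + j, col] = 1.0   # symmetric counterpart (j, i)
            col += 1
    return D

# Quick check with an arbitrary symmetric positive definite H (N = 3)
N = 3
M = np.random.randn(N, N)
H = M @ M.T + N * np.eye(N)
D = duplication_matrix(N)
assert np.allclose(D @ vech(H), H.flatten(order="F"))  # vec(H) stacks columns
```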
Actually, (16) is equivalent to $\partial L / \partial \theta$ in (6.21) of Gouriéroux [6]. We also refer to the theorems on the nonlinear regression model (with underlying normal errors) in Magnus and Neudecker ([15], Chapter 15).

For some applications we assume the following partition of $\theta$:

$$
\theta = (\alpha', \beta', \gamma')' \qquad (17)
$$

with

$$
\mu_t = \mu_t(\alpha, \beta), \qquad (18)
$$

$$
H_t = H_t(\beta, \gamma), \qquad (19)
$$
where $\alpha$ is a $k \times 1$ vector involved only in the mean, $\beta$ is an $l \times 1$ vector common to both mean and variance, $\gamma$ is an $m \times 1$ vector specific for the variance, and $\theta$ is a $p \times 1$ vector ($p = k + l + m$). Based on (16) through (18), the first order conditions for the estimators of $\alpha_0$, $\beta_0$ and $\gamma_0$ are

$$
\frac{\partial L}{\partial \alpha} = \sum_{t=1}^{T} \frac{\partial \mu_t'}{\partial \alpha} H_t^{-1} u_t = 0, \qquad (20)
$$

$$
\frac{\partial L}{\partial \beta} = \sum_{t=1}^{T} \left[ \frac{1}{2} \frac{\partial v(H_t)'}{\partial \beta} D'(H_t^{-1} \otimes H_t^{-1}) D v(u_t u_t' - H_t) + \frac{\partial \mu_t'}{\partial \beta} H_t^{-1} u_t \right] = 0, \qquad (21)
$$

$$
\frac{\partial L}{\partial \gamma} = \sum_{t=1}^{T} \frac{1}{2} \frac{\partial v(H_t)'}{\partial \gamma} D'(H_t^{-1} \otimes H_t^{-1}) D v(u_t u_t' - H_t) = 0, \qquad (22)
$$

with $L$ as defined in (2). If (20) through (21) can be solved explicitly, then we obtain an analytical expression for the PML estimator $\hat{\theta}_T$. Otherwise, they have to be solved numerically via, e.g. the method of scoring or the BHHH method; see, e.g. Engle [4] and Bollerslev [2].
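Purely as an illustrative sketch of the numerical route just mentioned (not the authors' implementation), a BHHH-type iteration solves $g(\theta) = 0$ using only the per-observation scores from (16); score_fn is a hypothetical user-supplied function, and a step-size search is omitted.

```python
import numpy as np

def bhhh_step(theta, score_fn, step=1.0, ridge=1e-8):
    """One BHHH update: theta + step * (sum g_t g_t')^{-1} * sum g_t.

    score_fn(theta) is assumed to return an array of shape (T, p) whose
    rows are the per-observation scores g_t(theta).
    """
    G = score_fn(theta)                                  # (T, p)
    grad = G.sum(axis=0)                                 # total score
    outer = G.T @ G + ridge * np.eye(G.shape[1])         # sum of g_t g_t', regularized
    return theta + step * np.linalg.solve(outer, grad)

def bhhh(theta0, score_fn, max_iter=200, tol=1e-8):
    """Iterate BHHH steps until the update becomes negligibly small."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        new = bhhh_step(theta, score_fn)
        if np.max(np.abs(new - theta)) < tol:
            return new
        theta = new
    return theta
```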
4. Asymptotic variance matrix

We use (9), (13) and (14) to derive the asymptotic variance matrix for model (1). For simplicity let $E_{t-1}$ denote the conditional expectation given the past values computed with respect to the true distribution, i.e. $E_{t-1}(g_t g_t') = E_0(g_t g_t' \mid C_{t-1})$ and $E_{t-1}(F_t) = E_0(F_t \mid C_{t-1})$, as used in Gouriéroux [6]. We use the law of iterated expectation, see, e.g. Engle [4] and Gouriéroux ([6], Section 6.3.2), to find

$$
E_0(g_t g_t') = E_0[E_0(g_t g_t' \mid C_{t-1})] = E_0[E_{t-1}(g_t g_t')], \qquad (23)
$$

$$
E_0(F_t) = E_0[E_0(F_t \mid C_{t-1})] = E_0[E_{t-1}(F_t)], \qquad (24)
$$
where $g_t$ and $F_t$ are the same as in (4) and (5). To implement the PML estimation approach for model (1), we assume that the third and fourth moments exist and we specify

$$
E_{t-1}(u_t) = 0, \qquad (25)
$$

$$
E_{t-1}(u_t u_t') = H_t, \qquad (26)
$$

$$
E_{t-1}(u_t u_t' \otimes u_t') = V_t, \qquad (27)
$$

$$
E_{t-1}(u_t u_t' \otimes u_t u_t') = W_t, \qquad (28)
$$

where the first two moments in (25) and (26) were introduced already, the second two moments in (27) and (28) may be considered as a multivariate version of skewness and kurtosis, $u_t$ is of order $N \times 1$, $V_t$ is of order $N \times N^2$ and $W_t$ is of order $N^2 \times N^2$. For detailed discussions on model assumptions and statistical properties, see, e.g. Gouriéroux [6], Li et al. [12] and McAleer [16].
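To illustrate the orders of $V_t$ and $W_t$ in (27) and (28), the following sketch (an added illustration, not part of the paper) approximates them by simulation for any assumed conditional sampler of $u_t$; under conditional normality the simulated $V_t$ is, up to Monte Carlo error, zero.

```python
import numpy as np

def skewness_kurtosis_moments(draw_u, n_draws=100_000, seed=0):
    """Monte Carlo approximation of V_t in (27) and W_t in (28).

    draw_u(rng) is assumed to return one draw of the N-dimensional
    disturbance u_t from its conditional distribution given C_{t-1}.
    """
    rng = np.random.default_rng(seed)
    N = draw_u(rng).shape[0]
    V = np.zeros((N, N * N))        # E[u u' (x) u'] has order N x N^2
    W = np.zeros((N * N, N * N))    # E[u u' (x) u u'] has order N^2 x N^2
    for _ in range(n_draws):
        u = draw_u(rng)
        uu = np.outer(u, u)
        V += np.kron(uu, u[None, :])
        W += np.kron(uu, uu)
    return V / n_draws, W / n_draws

# Example: conditionally normal u_t with a given H_t
H = np.array([[1.0, 0.3], [0.3, 2.0]])
chol = np.linalg.cholesky(H)
V_hat, W_hat = skewness_kurtosis_moments(lambda rng: chol @ rng.standard_normal(2))
```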
Based on the second equality in (15) we have

$$
\begin{aligned}
(dL_t)^2 &= \frac{1}{4} \mathrm{tr}((dH_t)H_t^{-1}) \, \mathrm{tr}(H_t^{-1}(dH_t)) + \frac{1}{4} \mathrm{tr}((dH_t)H_t^{-1} u_t u_t' H_t^{-1}) \, \mathrm{tr}(H_t^{-1} u_t u_t' H_t^{-1}(dH_t)) \\
&\quad + (d\mu_t)' H_t^{-1} u_t u_t' H_t^{-1} d\mu_t - \frac{1}{2} \mathrm{tr}((dH_t)H_t^{-1}) \, \mathrm{tr}(H_t^{-1} u_t u_t' H_t^{-1}(dH_t)) - (d\mu_t)' H_t^{-1} u_t \, \mathrm{tr}(H_t^{-1} dH_t) \\
&\quad + (d\mu_t)' H_t^{-1} u_t \, \mathrm{tr}(H_t^{-1} u_t u_t' H_t^{-1} dH_t).
\end{aligned}
\qquad (29)
$$

Using

$$
\mathrm{tr}(H_t^{-1} u_t u_t' H_t^{-1} dH_t) = (\mathrm{vec}\,u_t u_t')'(H_t^{-1} \otimes H_t^{-1}) \, d\,\mathrm{vec}\,H_t = (u_t' \otimes u_t')(H_t^{-1} \otimes H_t^{-1}) \, d\,\mathrm{vec}\,H_t,
$$

$(d\mu_t)' H_t^{-1} u_t = (d\mu_t)' H_t^{-1}(u_t \otimes 1)$ and using (27) and (28), we then obtain from (29) that

$$
\begin{aligned}
E_{t-1}(dL_t)^2 &= -\frac{1}{4} \mathrm{tr}((dH_t)H_t^{-1}) \, \mathrm{tr}(H_t^{-1} dH_t) + \frac{1}{4} (d\,\mathrm{vec}\,H_t)'(H_t^{-1} \otimes H_t^{-1})\big[E_{t-1}((u_t \otimes u_t)(u_t' \otimes u_t'))\big](H_t^{-1} \otimes H_t^{-1}) \, d\,\mathrm{vec}\,H_t \\
&\quad + (d\mu_t)' H_t^{-1} d\mu_t + (d\mu_t)' H_t^{-1}\big[E_{t-1}((u_t \otimes 1)(u_t' \otimes u_t'))\big](H_t^{-1} \otimes H_t^{-1}) \, d\,\mathrm{vec}\,H_t \\
&= -\frac{1}{4} dv(H_t)' D'(H_t^{-1} \otimes H_t^{-1}) D v(H_t) v(H_t)' D'(H_t^{-1} \otimes H_t^{-1}) D \, dv(H_t) \\
&\quad + \frac{1}{4} dv(H_t)' D'(H_t^{-1} \otimes H_t^{-1}) W_t (H_t^{-1} \otimes H_t^{-1}) D \, dv(H_t) + (d\mu_t)' H_t^{-1} d\mu_t \\
&\quad + \frac{1}{2} (d\mu_t)' H_t^{-1} V_t (H_t^{-1} \otimes H_t^{-1}) D \, dv(H_t) + \frac{1}{2} dv(H_t)' D'(H_t^{-1} \otimes H_t^{-1}) V_t' H_t^{-1} d\mu_t \\
&= (d\theta)' \bigg\{ \frac{1}{4} \frac{\partial v(H_t)'}{\partial \theta} D'(H_t^{-1} \otimes H_t^{-1})[W_t - Dv(H_t)v(H_t)'D'](H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \theta'} + \frac{\partial \mu_t'}{\partial \theta} H_t^{-1} \frac{\partial \mu_t}{\partial \theta'} \\
&\qquad\quad + \frac{1}{2} \frac{\partial \mu_t'}{\partial \theta} H_t^{-1} V_t (H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \theta'} + \frac{1}{2} \frac{\partial v(H_t)'}{\partial \theta} D'(H_t^{-1} \otimes H_t^{-1}) V_t' H_t^{-1} \frac{\partial \mu_t}{\partial \theta'} \bigg\} d\theta = (d\theta)' Q_t \, d\theta,
\end{aligned}
\qquad (30)
$$

where

$$
\begin{aligned}
Q_t &= \frac{1}{4} \frac{\partial v(H_t)'}{\partial \theta} D'(H_t^{-1} \otimes H_t^{-1})[W_t - Dv(H_t)v(H_t)'D'](H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \theta'} + \frac{\partial \mu_t'}{\partial \theta} H_t^{-1} \frac{\partial \mu_t}{\partial \theta'} \\
&\quad + \frac{1}{2} \frac{\partial \mu_t'}{\partial \theta} H_t^{-1} V_t (H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \theta'} + \frac{1}{2} \frac{\partial v(H_t)'}{\partial \theta} D'(H_t^{-1} \otimes H_t^{-1}) V_t' H_t^{-1} \frac{\partial \mu_t}{\partial \theta'}
\end{aligned}
\qquad (31)
$$
and $D$ is the $N^2 \times n$ duplication matrix with $n = N(N+1)/2$. Using (13) and (30), we establish

$$
A = E_0(Q_t), \qquad (32)
$$

$$
\hat{A}_T = T^{-1} \sum_{t=1}^{T} Q_t(\hat{\theta}_T), \qquad (33)
$$
where $Q_t$ is given in (31) and $Q_t(\hat{\theta}_T)$ is $Q_t(\theta)$ evaluated at $\hat{\theta}_T$.

We then take the differential of the second equality in (15):

$$
\begin{aligned}
d^2 L_t &= \frac{1}{2} \mathrm{tr}(H_t^{-1}(dH_t)H_t^{-1} dH_t) - \mathrm{tr}(H_t^{-1}(dH_t)H_t^{-1} u_t u_t' H_t^{-1} dH_t) - \mathrm{tr}(H_t^{-1}(d\mu_t) u_t' H_t^{-1} dH_t) \\
&\quad - (d\mu_t)' H_t^{-1} d\mu_t - u_t' H_t^{-1}(dH_t) H_t^{-1} d\mu_t.
\end{aligned}
\qquad (34)
$$

Using (25) and (26), we then obtain from (34) that

$$
\begin{aligned}
-E_{t-1}(d^2 L_t) &= \frac{1}{2} \mathrm{tr}(H_t^{-1}(dH_t)H_t^{-1} dH_t) + (d\mu_t)' H_t^{-1} d\mu_t \\
&= \frac{1}{2} (d\,\mathrm{vec}\,H_t)'(H_t^{-1} \otimes H_t^{-1}) \, d\,\mathrm{vec}\,H_t + (d\mu_t)' H_t^{-1} d\mu_t \\
&= (d\theta)' \left[ \frac{1}{2} \frac{\partial v(H_t)'}{\partial \theta} D'(H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \theta'} + \frac{\partial \mu_t'}{\partial \theta} H_t^{-1} \frac{\partial \mu_t}{\partial \theta'} \right] d\theta = (d\theta)' R_t \, d\theta,
\end{aligned}
\qquad (35)
$$

where

$$
R_t = \frac{1}{2} \frac{\partial v(H_t)'}{\partial \theta} D'(H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \theta'} + \frac{\partial \mu_t'}{\partial \theta} H_t^{-1} \frac{\partial \mu_t}{\partial \theta'}. \qquad (36)
$$
Due to (14) and (35) we get

$$
B = E_0(R_t), \qquad (37)
$$

$$
\hat{B}_T = T^{-1} \sum_{t=1}^{T} R_t(\hat{\theta}_T), \qquad (38)
$$

where $R_t$ is given in (36) and $R_t(\hat{\theta}_T)$ is $R_t(\theta)$ evaluated at $\hat{\theta}_T$.

We may obtain $\hat{\Omega}_T$ in (12) by using (33) and (38), with the advantage that they are computable using only the first order derivatives and not the second order derivatives, which (11) requires. Note that (36) is the same as (6.23) in Gouriéroux [6] or (2.7) in Ling and Li [13], but (31) is new and generalizes (iii) in Appendix 4.1 of Gouriéroux [6] in the univariate case.
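To make the computational advantage concrete, the following sketch (added for illustration, not part of the paper) evaluates $Q_t$ from (31) and $R_t$ from (36) given the Jacobians of $\mu_t$ and $v(H_t)$ with respect to $\theta'$ together with $H_t$, $V_t$ and $W_t$; averaging $Q_t$ and $R_t$ over $t$ at $\hat{\theta}_T$ gives $\hat{A}_T$ in (33) and $\hat{B}_T$ in (38), and $\hat{\Omega}_T$ follows from (12). The argument names and shapes are assumptions about how the model pieces would be supplied; duplication_matrix refers to the earlier sketch.

```python
import numpy as np

def Q_R_t(dmu, dvH, H, V, W, Dup):
    """Q_t from (31) and R_t from (36).

    dmu : N x p Jacobian of mu_t with respect to theta'
    dvH : n x p Jacobian of v(H_t) with respect to theta', n = N(N+1)/2
    H   : N x N conditional variance H_t
    V, W: conditional moments (27) and (28), of orders N x N^2 and N^2 x N^2
    Dup : N^2 x n duplication matrix, so that Dup @ v(H) = vec(H)
    """
    Hinv = np.linalg.inv(H)
    HH = np.kron(Hinv, Hinv)              # H_t^{-1} (x) H_t^{-1}
    vecH = H.flatten(order="F")           # vec(H_t) = Dup @ v(H_t)
    J = HH @ Dup @ dvH                    # (H^{-1} (x) H^{-1}) D dv(H_t)/dtheta'
    M = dmu.T @ Hinv                      # (dmu_t/dtheta')' H_t^{-1}
    cross = 0.5 * M @ V @ J               # mean-variance cross term of (31)
    R = 0.5 * dvH.T @ Dup.T @ HH @ Dup @ dvH + M @ dmu
    Q = 0.25 * J.T @ (W - np.outer(vecH, vecH)) @ J + M @ dmu + cross + cross.T
    return Q, R

# Averaging Q_t and R_t over t = 1, ..., T at theta_hat gives A_hat in (33) and
# B_hat in (38); Omega_hat then follows from (12) as inv(B_hat) @ A_hat @ inv(B_hat).
```

As a sanity check of the claim that $A = B$ under conditional normality, one can plug in $V_t = 0$ and the Gaussian fourth-moment matrix $W_t = (I + K)(H_t \otimes H_t) + \mathrm{vec}\,H_t (\mathrm{vec}\,H_t)'$, where $K$ is the commutation matrix (a standard result not stated in the paper); $Q_t$ then reduces to $R_t$:

```python
def commutation_matrix(N):
    """K of order N^2 x N^2 with K @ vec(A) = vec(A')."""
    K = np.zeros((N * N, N * N))
    for r in range(N):
        for c in range(N):
            K[c * N + r, r * N + c] = 1.0
    return K

N, p = 2, 3
rng = np.random.default_rng(1)
M0 = rng.standard_normal((N, N))
H = M0 @ M0.T + N * np.eye(N)
dmu = rng.standard_normal((N, p))
dvH = rng.standard_normal((N * (N + 1) // 2, p))
Dup = duplication_matrix(N)                      # from the sketch after (16)
K = commutation_matrix(N)
vecH = H.flatten(order="F")
W_norm = (np.eye(N * N) + K) @ np.kron(H, H) + np.outer(vecH, vecH)
Q, R = Q_R_t(dmu, dvH, H, np.zeros((N, N * N)), W_norm, Dup)
assert np.allclose(Q, R)                         # A = B under conditional normality
```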
In the univariate case, e.g. (25) in Bollerslev [2], the finite fourth moment is used to implement the PML estimation approach. When $N = 1$, the set of assumptions made previously in (26) through (28) corresponds to that in (25) of Bollerslev [2]. For some special models we may need only weaker assumptions; see, e.g. Palm [18]. In fact, we see from our derivations that we have to assume $E_{t-1}(dL_t)^2$ and $E_{t-1}(d^2 L_t)$ in (30) and (34) to exist. For a particular case where the true distribution is conditionally normal, we have $A = B$, and therefore from (9) we get the asymptotic variance matrix $\Omega = A^{-1}$.

5. Conditional moments

To establish $\Omega = \Omega(\theta)$ in (9), we need to obtain $B$ and $A$. In general, $\theta$ is partitioned as in (17). We then obtain for $B$ in (37) the following conditional moments:

$$
R_t = \begin{pmatrix} R_{t,\alpha\alpha} & R_{t,\alpha\beta} & 0 \\ R_{t,\alpha\beta}' & R_{t,\beta\beta} & R_{t,\beta\gamma} \\ 0 & R_{t,\beta\gamma}' & R_{t,\gamma\gamma} \end{pmatrix}, \qquad (39)
$$

where

$$
R_{t,\alpha\alpha} = \frac{\partial \mu_t'}{\partial \alpha} H_t^{-1} \frac{\partial \mu_t}{\partial \alpha'}, \qquad (40)
$$

$$
R_{t,\alpha\beta} = \frac{\partial \mu_t'}{\partial \alpha} H_t^{-1} \frac{\partial \mu_t}{\partial \beta'}, \qquad (41)
$$

$$
R_{t,\beta\beta} = \frac{1}{2} \frac{\partial v(H_t)'}{\partial \beta} D'(H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \beta'} + \frac{\partial \mu_t'}{\partial \beta} H_t^{-1} \frac{\partial \mu_t}{\partial \beta'}, \qquad (42)
$$

$$
R_{t,\beta\gamma} = \frac{1}{2} \frac{\partial v(H_t)'}{\partial \beta} D'(H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \gamma'}, \qquad (43)
$$

$$
R_{t,\gamma\gamma} = \frac{1}{2} \frac{\partial v(H_t)'}{\partial \gamma} D'(H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \gamma'}. \qquad (44)
$$

We obtain for $A$ in (32) the following conditional moments:

$$
Q_t = \begin{pmatrix} Q_{t,\alpha\alpha} & Q_{t,\alpha\beta} & Q_{t,\alpha\gamma} \\ Q_{t,\alpha\beta}' & Q_{t,\beta\beta} & Q_{t,\beta\gamma} \\ Q_{t,\alpha\gamma}' & Q_{t,\beta\gamma}' & Q_{t,\gamma\gamma} \end{pmatrix}, \qquad (45)
$$

where

$$
Q_{t,\alpha\alpha} = \frac{\partial \mu_t'}{\partial \alpha} H_t^{-1} \frac{\partial \mu_t}{\partial \alpha'}, \qquad (46)
$$

$$
Q_{t,\alpha\beta} = \frac{\partial \mu_t'}{\partial \alpha} H_t^{-1} \frac{\partial \mu_t}{\partial \beta'} + \frac{1}{2} \frac{\partial \mu_t'}{\partial \alpha} H_t^{-1} V_t (H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \beta'}, \qquad (47)
$$

$$
Q_{t,\alpha\gamma} = \frac{1}{2} \frac{\partial \mu_t'}{\partial \alpha} H_t^{-1} V_t (H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \gamma'}, \qquad (48)
$$

$$
\begin{aligned}
Q_{t,\beta\beta} &= \frac{1}{4} \frac{\partial v(H_t)'}{\partial \beta} D'(H_t^{-1} \otimes H_t^{-1}) W_t (H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \beta'} \\
&\quad - \frac{1}{4} \frac{\partial v(H_t)'}{\partial \beta} D' D v(H_t^{-1}) \, v(H_t^{-1})' D' D \frac{\partial v(H_t)}{\partial \beta'} + \frac{\partial \mu_t'}{\partial \beta} H_t^{-1} \frac{\partial \mu_t}{\partial \beta'} \\
&\quad + \frac{1}{2} \frac{\partial \mu_t'}{\partial \beta} H_t^{-1} V_t (H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \beta'} + \frac{1}{2} \frac{\partial v(H_t)'}{\partial \beta} D'(H_t^{-1} \otimes H_t^{-1}) V_t' H_t^{-1} \frac{\partial \mu_t}{\partial \beta'},
\end{aligned}
\qquad (49)
$$

$$
\begin{aligned}
Q_{t,\beta\gamma} &= \frac{1}{4} \frac{\partial v(H_t)'}{\partial \beta} D'(H_t^{-1} \otimes H_t^{-1}) W_t (H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \gamma'} \\
&\quad - \frac{1}{4} \frac{\partial v(H_t)'}{\partial \beta} D' D v(H_t^{-1}) \, v(H_t^{-1})' D' D \frac{\partial v(H_t)}{\partial \gamma'} + \frac{1}{2} \frac{\partial \mu_t'}{\partial \beta} H_t^{-1} V_t (H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \gamma'},
\end{aligned}
\qquad (50)
$$

$$
\begin{aligned}
Q_{t,\gamma\gamma} &= \frac{1}{4} \frac{\partial v(H_t)'}{\partial \gamma} D'(H_t^{-1} \otimes H_t^{-1}) W_t (H_t^{-1} \otimes H_t^{-1}) D \frac{\partial v(H_t)}{\partial \gamma'} \\
&\quad - \frac{1}{4} \frac{\partial v(H_t)'}{\partial \gamma} D' D v(H_t^{-1}) \, v(H_t^{-1})' D' D \frac{\partial v(H_t)}{\partial \gamma'}.
\end{aligned}
\qquad (51)
$$
We now give the asymptotic variance matrices in four special cases which are illustrative examples.

5.1. Case 1

We consider $\theta$ partitioned as $\theta = (\alpha', \gamma')'$ for $\Omega = \Omega(\theta)$ in (9), where $\alpha$ appears only in the mean $\mu_t = \mu_t(\alpha)$ and $\gamma$ only in the variance matrix $H_t = H_t(\gamma)$. Examples are, e.g. the ARCH models studied by Mills [17], and the CHARMA models by Wong and Li [23]; see also Li et al. [12] and Tsay [22]. We get for $B$ and $A$:

$$
R_t = \begin{pmatrix} R_{t,\alpha\alpha} & 0 \\ 0 & R_{t,\gamma\gamma} \end{pmatrix}, \qquad (52)
$$

$$
Q_t = \begin{pmatrix} Q_{t,\alpha\alpha} & Q_{t,\alpha\gamma} \\ Q_{t,\alpha\gamma}' & Q_{t,\gamma\gamma} \end{pmatrix}, \qquad (53)
$$

where $R_{t,\alpha\alpha}$, $R_{t,\gamma\gamma}$, $Q_{t,\alpha\alpha}$, $Q_{t,\alpha\gamma}$ and $Q_{t,\gamma\gamma}$ are the same as in (40), (44), (46), (48) and (51) respectively. Let $\Omega_\alpha$ and $\Omega_\gamma$ be the asymptotic variance matrices of $\hat{\alpha}_T$ and $\hat{\gamma}_T$ respectively. We then have, based on (8),

$$
T^{1/2}(\hat{\alpha}_T - \alpha_0) \xrightarrow{d} N(0, \Omega_\alpha), \qquad (54)
$$

$$
T^{1/2}(\hat{\gamma}_T - \gamma_0) \xrightarrow{d} N(0, \Omega_\gamma), \qquad (55)
$$

where

$$
\Omega_\alpha = (E_0 R_{t,\alpha\alpha})^{-1}, \qquad (56)
$$

$$
\Omega_\gamma = (E_0 R_{t,\gamma\gamma})^{-1} (E_0 Q_{t,\gamma\gamma}) (E_0 R_{t,\gamma\gamma})^{-1}, \qquad (57)
$$

with $E_0 R_{t,\alpha\alpha}$, $E_0 R_{t,\gamma\gamma}$ and $E_0 Q_{t,\gamma\gamma}$ being assumed to exist. This extends the univariate results in Note 4.10 of Gouriéroux [6]. If the true distribution is conditionally normal, $\Omega$ depends only on $R_t$ in (52) and $\Omega_\gamma$ reduces to

$$
\Omega_\gamma = (E_0 R_{t,\gamma\gamma})^{-1}, \qquad (58)
$$

where $R_{t,\gamma\gamma}$ is the same as in (44).
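As a small added illustration (not from the paper), the Case 1 variance matrices (56) and (57) could be estimated by averaging the corresponding blocks over $t$ and sandwiching the $\gamma$ block; the block arrays below are assumed inputs evaluated at $\hat{\theta}_T$.

```python
import numpy as np

def case1_asymptotic_variances(R_aa, R_gg, Q_gg):
    """Estimate Omega_alpha in (56) and Omega_gamma in (57) for Case 1.

    R_aa, R_gg, Q_gg are arrays of shape (T, k, k), (T, m, m), (T, m, m)
    holding R_{t,alpha alpha}, R_{t,gamma gamma} and Q_{t,gamma gamma}
    evaluated at theta_hat.
    """
    Omega_alpha = np.linalg.inv(R_aa.mean(axis=0))        # (56)
    Rg_inv = np.linalg.inv(R_gg.mean(axis=0))
    Omega_gamma = Rg_inv @ Q_gg.mean(axis=0) @ Rg_inv     # (57) sandwich
    return Omega_alpha, Omega_gamma
```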
5.2. Case 2

Consider $\theta = (\beta', \gamma')'$, where $\beta$ is shared by the mean $\mu_t = \mu_t(\beta)$ and the variance $H_t = H_t(\beta, \gamma)$, and $\gamma$ is used only for the variance. An example is an R-ARCH model in Gouriéroux [6]. We have

$$
R_t = \begin{pmatrix} R_{t,\beta\beta} & R_{t,\beta\gamma} \\ R_{t,\beta\gamma}' & R_{t,\gamma\gamma} \end{pmatrix}, \qquad (59)
$$
$$
Q_t = \begin{pmatrix} Q_{t,\beta\beta} & Q_{t,\beta\gamma} \\ Q_{t,\beta\gamma}' & Q_{t,\gamma\gamma} \end{pmatrix}, \qquad (60)
$$
where $R_{t,\beta\beta}$, $R_{t,\beta\gamma}$, $R_{t,\gamma\gamma}$, $Q_{t,\beta\beta}$, $Q_{t,\beta\gamma}$ and $Q_{t,\gamma\gamma}$ are the same as in (42), (43), (44), (49), (50) and (51) respectively.

5.3. Case 3

Consider $\theta = \gamma$, where $\gamma$ is used only for the variance $H_t = H_t(\gamma)$ and the mean is $\mu_t = 0$. An example may be the CCC model based on the concentrated likelihood function of $\gamma$ in Gouriéroux [6]; see also McAleer [16]. We have

$$
R_t = R_{t,\gamma\gamma}, \qquad (61)
$$

$$
Q_t = Q_{t,\gamma\gamma}, \qquad (62)
$$
where $R_{t,\gamma\gamma}$ and $Q_{t,\gamma\gamma}$ are the same as in (44) and (51) respectively.

5.4. Case 4

Consider $\theta = (\alpha', \beta')'$, where $\alpha$ appears only in the mean $\mu_t = \mu_t(\alpha, \beta)$ and $\beta$ is shared by the mean $\mu_t = \mu_t(\alpha, \beta)$ and the variance $H_t = H_t(\beta)$. An example is an ARCH-M model in Gouriéroux [6] (Case 4 of this paper was suggested by him); see also Engle et al. [5]. We have

$$
R_t = \begin{pmatrix} R_{t,\alpha\alpha} & R_{t,\alpha\beta} \\ R_{t,\alpha\beta}' & R_{t,\beta\beta} \end{pmatrix}, \qquad (63)
$$

$$
Q_t = \begin{pmatrix} Q_{t,\alpha\alpha} & Q_{t,\alpha\beta} \\ Q_{t,\alpha\beta}' & Q_{t,\beta\beta} \end{pmatrix}, \qquad (64)
$$

where $R_{t,\alpha\alpha}$, $R_{t,\alpha\beta}$, $R_{t,\beta\beta}$, $Q_{t,\alpha\alpha}$, $Q_{t,\alpha\beta}$ and $Q_{t,\beta\beta}$ are the same as in (40), (41), (42), (46), (47) and (49) respectively.

6. Concluding remarks

We have studied a general setting in which the conditional distribution for the multivariate conditional heteroskedastic model is not necessarily normal. Our key result is the new estimator of the asymptotic variance matrix for the PML estimator in Section 4. The advantage is that the calculations are computable, requiring only the first-order derivatives and not the second-order derivatives. The special cases in this class of models in Section 5 are useful examples which show that the key result is indeed statistically valuable and practical.

Acknowledgements

The authors would like to thank David Allen, Richard Baillie, Peter Boswijk, Christian Gouriéroux and Michael McAleer for discussion and suggestions. Liu would also like to acknowledge the financial support provided by the University of Canberra.

References

[1] G. Arminger, Specification and estimation of mean structures: Regression models, in: G. Arminger, C.C. Clogg, M.E. Sobel (Eds.), Handbook of Statistical Modeling for the Social and Behavioral Sciences, Plenum Press, New York, 1995, pp. 77–183.
[2] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, J. Econometrics 31 (1986) 307–327.
[3] J.Y. Campbell, A.W. Lo, A.C. MacKinlay, The Econometrics of Financial Markets, Princeton University Press, Princeton, NJ, 1997.
[4] R.F. Engle, Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50 (4) (1982) 987–1006.
[5] R.F. Engle, D.M. Lilien, R.P. Robins, Estimating time-varying risk premia in the term structure: the ARCH-M model, Econometrica 55 (2) (1987) 391–407.
[6] C. Gouriéroux, ARCH Models and Financial Applications, Springer, New York, 1997.
[7] C. Gouriéroux, J. Jasiak, Financial Econometrics, Princeton University Press, Princeton, NJ, 2001.
[8] P. Hall, Q. Yao, Inference in ARCH and GARCH models with heavy-tailed errors, Econometrica 71 (2003) 285–317.
[9] C.C. Heyde, Quasi-Likelihood and Its Application: A General Approach to Optimal Parameter Estimation, Springer, New York, 1997.
[10] R. Jarrow, Volatility. New Estimation Techniques for Pricing Derivatives, Risk Books, London, UK, 1998.
[11] T. Kariya, Quantitative Methods for Portfolio Analysis—MTV Model Approach, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1993.
[12] W.K. Li, S. Ling, M. McAleer, Recent theoretical results for time series models with GARCH errors, J. Econ. Surv. 16 (3) (2002) 245–269.
[13] S. Ling, W.K. Li, Diagnostic checking of nonlinear multivariate time series with multivariate ARCH errors, J. Time Ser. Anal. 18 (5) (1997) 447–464.
[14] S. Liu, On diagnostics in conditionally heteroskedastic time series models under elliptical distributions, Stochastic Methods Appl.: J. Appl. Prob. Special 41A (2004) 393–405.
[15] J.R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, revised edition, Wiley, Chichester, UK, 1999 (original edition, 1988).
[16] M. McAleer, Automated inference and learning in modeling financial volatility, Econometric Theory 21 (2005) 232–261.
[17] T.C. Mills, The Econometric Modelling of Financial Time Series, Cambridge University Press, Cambridge, UK, 1993.
[18] F.C. Palm, GARCH models of volatility, in: G.S. Maddala, C.R. Rao (Eds.), Handbook of Statistics, Elsevier Science B.V., Amsterdam, The Netherlands, 1996, pp. 209–240.
[19] S.-H. Poon, A Practical Guide to Forecasting Financial Market Volatility, Wiley, New York, 2005.
[20] S. Rachev, S. Mittnik, Stable Paretian Models in Finance, Wiley, Chichester, UK, 2000.
[21] D. Straumann, Estimation in Conditionally Heteroscedastic Time Series Models, Springer, Berlin, 2005.
[22] R.S. Tsay, Analysis of Financial Time Series, John Wiley and Sons, Hoboken, NJ, 2005.
[23] H. Wong, W.K. Li, On a multivariate conditional heteroskedastic model, Biometrika 84 (1997) 111–123.