Modelling time-varying higher moments with maximum entropy density

Modelling time-varying higher moments with maximum entropy density

Available online at www.sciencedirect.com Mathematics and Computers in Simulation 79 (2009) 2767–2778 Modelling time-varying higher moments with max...

1MB Sizes 1 Downloads 10 Views

Available online at www.sciencedirect.com

Mathematics and Computers in Simulation 79 (2009) 2767–2778

Modelling time-varying higher moments with maximum entropy density Felix Chan School of Economics and Finance, Curtin University of Technology, GPO Box U1987, Perth, WA 6845, Australia Received 26 June 2008; received in revised form 13 November 2008; accepted 16 November 2008 Available online 6 December 2008

Abstract Since the introduction of the Autoregressive Conditional Heteroscedasticity (ARCH) model of Engle [R. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50 (1982) 987–1007], the literature of modelling the conditional second moment has become increasingly popular in the last two decades. Many extensions and alternate models of the original ARCH have been proposed in the literature aiming to capture the dynamics of volatility more accurately. Interestingly, the Quasi Maximum Likelihood Estimator (QMLE) with normal density is typically used to estimate the parameters in these models. As such, the higher moments of the underlying distribution are assumed to be the same as those of the normal distribution. However, various studies reveal that the higher moments, such as skewness and kurtosis of the distribution of financial returns are not likely to be the same as the normal distribution, and in some cases, they are not even constant over time. These have significant implications in risk management, especially in the calculation of Value-at-Risk (VaR) which focuses on the negative quantile of the return distribution. Failed to accurately capture the shape of the negative quantile would produce inaccurate measure of risk, and subsequently lead to misleading decision in risk management. This paper proposes a solution to model the distribution of financial returns more accurately by introducing a general framework to model the distribution of financial returns using maximum entropy density (MED). The main advantage of MED is that it provides a general framework to estimate the distribution function directly based on a given set of data, and it provides a convenient framework to model higher order moments up to any arbitrary finite order k. However this flexibility comes with a high cost in computational time as k increases, therefore this paper proposes an alternative model that would reduce computation time substantially. Moreover, the sensitivity of the parameters in the MED with respect to the dynamic changes of moments is derived analytically. This result is important as it relates the dynamic structure of the moments to the parameters in the MED. The usefulness of this approach will be demonstrated using 5 min intra-daily returns of the Euro/USD exchange rate. © 2008 IMACS. Published by Elsevier B.V. All rights reserved. Keywords: Conditional higher moment; Entropy; Maximum entropy density; Skewness; Kurtosis

1. Introduction Modelling conditional second moment has recently become a standard practice in analysing financial time series following the success of the Autoregressive Conditional Heteroscedasticity (ARCH) model of Engle [8] and the Generalized ARCH (GARCH) model of Bollerslev [3]. A natural extension to model time varying second moment is

E-mail address: [email protected]. 0378-4754/$36.00 © 2008 IMACS. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.matcom.2008.11.016

2768

F. Chan / Mathematics and Computers in Simulation 79 (2009) 2767–2778

to model the dynamics of higher order moments such as the third and fourth moments, which relate to the skewness and kurtosis of the underlying distribution, respectively. However, the values of the third and fourth moments are predetermined by the first and second moments under the standard assumption of normality. Although empirical evidence, such as [13,14], show that the normality assumption is often unrealistic for financial time series, the parameters of most volatility models, especially those belong to the GARCH-family, are typically estimated by Quasi Maximum Likelihood Estimator (QMLE) with normal density. Moreover, Value-at-Risk (VaR) forecasts based on the normality assumption often lead to excessive violation due to the restrictive assumptions on the third and fourth moments (see for example [6]). This is particular important as VaR has now become a standard tool for forecasting, evaluating and managing market risk (see [12]) and excessive violation would imply that the VaR forecasts were consistently underestimating market risk, which could lead to devastating consequence for the financial market. Therefore, it is important to improve the methodology for VaR forecasts by accommodating the higher order moment structure and consider more flexible distributions. The standard approach to relax the normality assumption in the literature is to replace the normal distribution by more flexible distributions, see for examples, [4,10,11,16]. Although these studies considered more flexible distributions with the possibility of time-varying higher moments, the distributional assumption may still be too restrictive for the following reasons: (i) these distributions usually only allow the first four moments to be time varying, and the values of the higher moments are subsequently pre-determined by the first four moments; and (ii) the properties of the associated (Quasi) Maximum Likelihood Estimator ((Q)MLE) is unclear, especially if the distribution assumption was violated. In a seminal paper, Rockinger and Jondeau [17] proposed to estimate the distribution function directly using maximum entropy density (MED). The main advantage of MED is that it provides a general framework to estimate the density function directly based on a given set of data, and it provides a convenient framework to model higher order moment up to any arbitrary finite order k. That is, it is possible to investigate the dynamic nature of higher order moments up to any finite order k using MED. More importantly, it is possible to determine k before specifying the dynamic structure of the moments using techniques such as those proposed in [19]. However, this flexibility comes with a high cost in computation time as k increases. The aim of this paper is to propose a flexible framework to estimate MED for financial returns accommodating the dynamic structure of higher order moments. Moreover, the new specification reduces the computation burden substantially relative to the standard approach. It is much simpler to implement in practice and it also avoids various computational issues caused by the Stieltjes and Hamburger moment problems. Moreover, the sensitivity of the parameters in the MED with respect to the changes of moments is derived analytically. This result is important as it relates the dynamic structure of the moments to the parameters in the MED. The empirical usefulness of the model will be investigated using 5 min intra-daily Euro/USD exchange rate data. The paper is organised as follows: Section 2 introduces the concept of maximum entropy density and its estimation methods, with a special emphasis on the computational issues as well as various specifications for the dynamics of higher order moments. A new model will be introduced in Section 3. This is followed by an empirical example in Sections 4 and 5 contains some concluding remarks. The proofs of all the results can be found in the Appendix A. 2. Maximum entropy density The basic idea of MED is to estimate the density function by maximising certain entropy functional subject to a set of moment constraints. The continuous counterpart of the original Shannon’s discrete entropy functional is defined as follows (see [18]):  Ent = − p(x) log p(x) dx, (1) A

where A denotes the appropriate set in which the integration takes place. This entropy aims to measure the difference in the information content between the density p(x) and the density of the uniform distribution. The motivation of the Shannon’s entropy relies on the fact that the uniform distribution is used in the absence of any information and therefore, the distance between p(x) and the uniform density would provide a measure of information content in p(x). Given this interpretation, it would then natural to estimate p(x) by maximising its information content through a set of moment constraints. This is equivalent to maximise Eq. (1) subject to a set of moment constraints up to order k,

F. Chan / Mathematics and Computers in Simulation 79 (2009) 2767–2778

namely

2769



p(x) = argmax −

p(x) log p(x) dx

p

A

subject to  A

p(x) dx = 1 (2)

xi p(x) dx = mi

i = 1, . . . , k

A

where mi denotes the ith raw moment of the distribution. Since the moments of a distribution can be estimated through a given set of data, therefore MED essentially provides a density that capture as much information from the data as the k moments could provide. The conventional way to solve the optimisation problem (2) is to define the following Hamiltonian:  H(p) = − A

p(x) log p(x) dx + λ0

 A

    k p(x) dx − 1 + λi xi p(x) dx − mi , i=1

(3)

A

where the maximisation of H(p) can be easily solved using calculus of variation, which leads to the following closed form solution:  k  i p(x) = exp(λ0 ) exp (4) λi x i=1

 −1



k i dx where λ0 = λ0 − 1. Since p(x) dx = 1, it is straightforward to show that exp(λ0 ) = A exp λ x . i=1 i Thus, the MED is defined to be  k  −1 i p(x) = Q exp (5) λi x i=1

 k i dx. It is clear that the density of the normal distribution is a special case of Eq. (5), where Q = A exp i=1 λi x which can be stated formally as follows:



Proposition 1. The maximum entropy density as defined in Eq. (5) is equivalent to a normal distribution, N(−λ1 /2λ2 , 1/2|λ2 |) if and only if λ2 < 0 and λit = 0∀i = 3, . . . , k. 

Proof. See Appendix A.

Proposition 1 implies that if k = 4 and λ3 = / 0 or λ4 = / 0 then the maximum entropy density is non-normal. It is straightforward to extend the analysis above to cover the case of conditional probability density, pt (xt |t−1 ), where t denotes the information set containing all relevant information up to time t. The Shannon Entropy in this case is defined as  Ent(pt |t−1 ) = − pt (xt |t−1 ) log pt (xt |t−1 ) dxt . (6) A

Under the principal of maximum entropy, the conditional density, pt (xt |t−1 ) can be estimated by solving the following optimisation problem  pt (xt |t−1 ) = argmax − pt (xt |t−1 ) log pt (xt |t−1 ) dxt pt

A

2770

F. Chan / Mathematics and Computers in Simulation 79 (2009) 2767–2778

subject to  pt (xt |t−1 ) dxt = 1 A



A

(7) xti pt (xt |t−1 ) dxt

= mit

i = 1, . . . , k

Similar to the case of unconditional density, the above optimisation problem has the following closed form solution  k  −1 i (8) λit xt . pt (xt |t−1 ) = Qt exp i=1

 k i where Qt = A exp I=1 λit xt dxt . Obviously, the Lagrange multiplier, λi is a nonlinear function of the moments, mi , for all i. Therefore if there is a dynamic structure underlying the moments, then the λi must also be time varying. This can be seen formally from the following proposition:



Proposition 2. Let mi denotes the ith moment of a MED and λi be the parameters in the MED, for i = 1, . . . , k. If m2k < ∞ then ∂λi = (mi+j − mi mj )−1 ∂mj Proof. See Appendix A.

∀i, j = 1, . . . , k.



Hence, if the dynamics of mit is conditional on its own past, then λit will be time-varying conditional on the past values of λt and mt , where λt = (λ1t , . . . , λkt ) and mt = (m1t , . . . , mkt ) . Rockinger and Jondeau [17] proposed a set of parametric models for the dynamics of the first four moments, k = 4. The parameters in the model can then be estimated by standard Maximum Likelihood approach with density equal to the corresponding MED as given in Eq. (8). While this approach is conceptually straightforward and easy to understand, it imposes significant computational burden on the estimation of the parameters. Under the time-varying assumption of the moments, the estimation of MED would require the computation of λit for every t at every iteration in the optimisation routine. Since there is no closed-form solution for λit given a set of moments, {mit }ki=1 , for k > 2, the computation of λit must rely on numerical procedure. This makes the parameter estimation a time consuming exercise, and may not be feasible for large sample set such as those typically seen in financial time series. Another drawback of imposing dynamic structure on the moments directly is related to the Stieltjes and Hamburger moment problems. Essentially, the problem seeks to find the necessary and sufficient condition in which a sequence of number, {mi }ki=1 , must satisfy in order to ensure the existence of a proper density function such that its ith moment is mi for i = 1, . . . , k. Although such conditions were derived in [9,15], it is virtually impossible to restrict the parameters so that the model could always produce a sequence of number that satisfies the conditions for every t. In order to resolve these issues, this paper propose to model the dynamics of λit directly. This method only requires the computation of λit once and therefore significantly reduces the computation time. Model specification and estimation issues will be discussed in the next section. 3. Specification and estimation 3.1. A model Let λt = (λ1t , . . . , λkt ) and mt = (m1t , . . . , mkt ) with pt (xt ) = pt (xt |t−1 ). Consider the following specification for the dynamics of λt : λt = Ω +

p  j=1

Θj mt−j +

q  l=1

Γl λt−l

(9)

F. Chan / Mathematics and Computers in Simulation 79 (2009) 2767–2778

2771

where Ω is a k × 1 vector, Θj and Γl are k × k matrices for j = 1, . . . , p and l = 1, . . . , q, respectively. Let vec(A) denotes the vec operator of a matrix, A, the parameter vector, Θ = (Ω ,vec(Θ1 ) , . . . ,vec(Θp ) ,vec(Γ1 ) , . . . ,vec(Γq ) ) , can then be estimated by minimising the Hellinger distance. This leads to the Minimum Hellinger Distance Estimator (MHDE) as follows: T   2 ˆ = argmin (pˆ t (xt )1/2 − pt (xt )1/2 ) dxt , (10) Θ Θ

t=1

where ˆ −1 pˆ t (xt ) = Q t exp

A

 k 

λˆ it xti

(11)

i=1

such that ˆ + λˆ t = Ω

p 

ˆ j mt−j + Θ

j=1

 ˆt = Q

exp

Γˆ l λt−l

l=1

 k 

A

q 

λˆ i xi

dx.

i=1

MHDE has been particularly useful for estimating density function, see for example, [2] for a theoretical treatment on the statistical properties of MHDE and [5] for applying MHDE to estimate the parameters in standard GARCH model. The density function, pt (xt ), for Eq. (10) can be constructed using the estimated moments from the sample. Consider the standard raw moment estimator m ˆ it = t −1

t 

xτi ,

∀i = 1, . . . , k,

(12)

τ=1

λit can then be calculated numerically based on m ˆ it . [17] provides an excellent account on the different efficient methods to compute λit given a set of m ˆ it . More importantly, [17] proved that the corresponding λit exists and is unique given a sequence of moments, for all i = 1, . . . , k, by using the result from [15]. Given the recent availability of intra-daily data, the MED can be constructed based on sample moments from intra-daily data for every day, then the dynamic of the MED can be modelled directly using Eq. (9) and the parameter estimates can be obtained through MHDE as defined in Eq. (10). This is the methodology used for the empirical example in this paper. Divide the total trading time in a day into h equally spaced intervals and let pt,j and pt,j+Δ denote the price of a particular asset at the beginning and the end of the j th interval in day t, respectively for all t = 1, . . . , T and j = 1, . . . , h, that is, the price of the asset is being recorded on equally spaced intervals, h + 1, times a day, for T days. Note that pt,j+Δ = pt,j+1 . Then the return within each interval can be calculated as rt,j = 100 log(pt,j+Δ /pt,j ),

∀j ≥ 2

(13)

which produces h intra-daily returns for all t = 1, . . . , T . The sample moments, and subsequently, the associated λit and the entropy density of the daily return can then be constructed based on the h intra-daily returns for all t = 1, . . . , T . Given the set of λit , the parameters in model (9) can then be estimated using the MHDE as stated in Eq. (10). The procedure can be summarised into the following steps: Step. 1 For every t = 1, . . . , T , calculate the intra-daily return using Eq. (13). Step. 2 Calculate the k sample moments, mit , i = 1, . . . , k using the intradaily returns as calculated in Step (1) for every t = 1, . . . , T . Step. 3 Compute λit for i = 1, . . . , k given the sample moment m ˆ it , for every t = 1, . . . , T .

2772

F. Chan / Mathematics and Computers in Simulation 79 (2009) 2767–2778

Step. 4 Given the set of λit , construct pt (xt ) for every t = 1, . . . , T . Step. 5 Estimate the parameter vector Θ by minimising the Hellinger distance as defined in Eq. (10). Notice this approach allows the specification of λit to be flexible with minimum computational cost. Using the ˆ as defined in Eq. (10) is shown to be asymptotically theorems derived in [2], (see also [5]), the MHDE for Θ, Θ, normal, that is √

 A ˆ − Θ)→N T (Θ

0, 4

−1

  A

∂p(x ˆ t) ∂Θ



∂p(x ˆ t) ∂Θ

    

−1 dx

.

(14)

ˆ Θ=Θ

ˆ λˆ it can then be calculated using Eq. (9), and hence the estimated MED can then be constructed using Eq. Given Θ, (11). Recently, the impact of market microstructure noise on the estimation of realized volatility has been receiving a lot of attentions, see [1] for a detailed treatment. In the present case, the idea of MED is to estimate the density for the observed intra-daily return. Therefore, if rt = r˜t + εt where r˜t and εt denote the efficient returns and the market microstructure noise, respectively, then the estimated MED would in fact be the estimated density of rt which is a sum of two random variables. Since the focus of the paper is on the dynamics of the observed returns, rt , and not on the dynamics of the efficient returns, r˜t , it is not necessary to separate efficient returns from the market microstructure noise.

Fig. 1. Daily sample moments.

F. Chan / Mathematics and Computers in Simulation 79 (2009) 2767–2778

2773

3.2. Numerical integration Unless otherwise stated, the following closed Newton-Cotes formula is used for all the numerical integrations in the estimation procedure, which is defined as      b h b−a f (x) dx = f (a) + 4 + f (b) . (15) 3 2 a The reason for choosing the Newton–Cotes formula over the more sophisticated algorithm such as Gaussian Quadrature, is due to its computation simplicity. In the present case, the interval in which the integration takes place is often A = (−∞, ∞) and in the absence of any closed form solution, the integral must be approximated which implies the

b

∞ improper integral, −∞ f (x) dx must be approximated by a f (x) dx for some finite values a and b. Let Δ be some predetermined step-size and tol be a pre-determined tolerance level. Moreover, denotes S(a, b|f (x)) as the approximation

b of the integral a f (x) dx using the Newton–Cotes formula, then a and b are chosen in this paper when |S(a − Δ, b + Δ|f (x)) − S(a, b|f (x))| < tol. 4. Empirical results This section provides an empirical example by estimating MED with higher conditional moments using intradaily data of exchange rate between Euro and US dollar. The data used in this paper was collected through the Philadelphia exchange with the exchange rate being recorded every 5 min from 3/1/2005 to 3/6/2006. As a result, there are 84 intra-daily returns for each day over 330 days, which makes 27,720 observations in total. For this empirical example,

Fig. 2. Descriptive statistics.

2774

F. Chan / Mathematics and Computers in Simulation 79 (2009) 2767–2778

k = 4, p = q = 1 and the coefficient matrices, Θ1 and Γ1 , are restricted to be diagonal matrices, that is λˆ it = ωi + θi mit−1 + γi λit−1

∀i = 1, . . . , 4.

(16)

All the estimation in this paper was conducted using Ox version 4.10 (see [7]) and the computing codes are available upon request. Since most of the integrals required for the estimation do not have closed form solutions, integration must be computed numerically. As mentioned in the previous section, this paper adopted the Newton–Cotes formula for numerical integration with tol = 10−6 . Fig. 1 contains the plots of 330 days sample moments calculated using 84 intra-daily returns for each day, where Fig. 2 contains the sample estimates of mean, variance, skewness and kurtosis of the return distribution for each day. As shown in both figures, the mean, the third moment and subsequently the skewness are all centred around 0 indicating the distribution of daily returns are symmetric on average. However, the plots also reveal two negative outliers in the third moment and interestingly, these two outliers in the third moment (skewness) match by the two positive outliers in the fourth moment (kurtosis) as shown in Figs. 1 and 2. Moreover, the dynamics of the second moment and the variance follow quite closely to the typical financial time series for returns, especially the clustering of high and low volatilities. Fig. 3 contains the plots of the corresponding λt for t = 1, . . . , 330. Notice all λ4t < 0 for all t which satisfies Qt < ∞ for all t and therefore MED exists for every t. Interestingly, while λ1t and λ3t fluctuate around 0 and do not exhibit any definitive pattern, both λ2t and λ4t exhibit patterns very similar to the patterns found in the variance and kurtosis. However, it is important to note that each λit is a function of all four moments, so it would be misleading to associate λit with the moment of a particular order. Table 1 contains the parameter estimates for Eq. (16) by minimising the Hellinger distance as defined in Eq. (10) with the corresponding t-ratios in the parenthesis. Interestingly only three parameters are significant, namely, ω2 , ω4

Fig. 3. Daily λ’s.

F. Chan / Mathematics and Computers in Simulation 79 (2009) 2767–2778

2775

Table 1 Parameter estimates of the MED model with the associated t-ratios in parentheses. ** denotes 1% statistical significance. Parameters ωˆ 1 ωˆ 2 ωˆ 3 ωˆ 4 θˆ 1 θˆ 2 θˆ 3 θˆ 4 γˆ 1 γˆ 2 γˆ 3 γˆ 4

Estimates 0.0380 (1.151) −4.446** (−16.566) −0.039 (−0.258) −0.549** (−8.356) 0.052 (0.877) 0.192** (2.715) −0.024 (−0.427) −0.012 (−0.219) 1.251 (1.717) 0.559 (0.423) 0.047 (0.099) 0.023 (0.708)

and θ2 . This has the following implications about the distribution of the daily return for Euro. Firstly, only λ2t evolves over time, as only θ2 is significant. Secondly, since ω3 , θ3 and γ3 are not statistically significant, implying that λ3t is 0 on average. Moreover, both λ2t and λ4t are negative ∀t. Therefore, the distribution of the daily return for Euro is symmetric on average. Thirdly, Proposition 1 implies that the distribution of the daily return for Euro is non-normal since ω4 is statistically different from 0 indicating λ4t is not 0 on average. For the purposes of demonstration, Figs. 4 and 5 contain the plots of the estimated MED, MED and the distribution of the Euro/USD exchange rate under the assumption of normality for 3/6/2006. As shown in Fig. 4, the shape of the estimated MED resembled closely to the MED, which is negatively skewed and this could not be captured by the normal distribution. Moreover, the MED also has a much thicker negative tail than the normal distribution. This implies

Fig. 4. Estimated maximum entropy density.

2776

F. Chan / Mathematics and Computers in Simulation 79 (2009) 2767–2778

Fig. 5. Negative quantile of the estimated maximum entropy density.

that negative returns have higher probability to occur under the empirical distribution than the normal distribution. The feature of thick negative tail has also been captured by the estimated MED which has a slightly thicker tail at the negative quantile than the normal distribution. This suggests that the estimated MED as proposed in this paper can capture the probability of negative return more accurately than the standard assumption of normality. 5. Conclusions This paper proposed a new method to analyse the distribution of financial time series using maximum entropy density by accommodating potential dynamic structure of higher order moments. The new method is more computational efficient than the conventional MED methods and the dynamic structure of the moments is modelled through their corresponding parameters in the MED. The parameters in the model are then estimated through minimising the Hellinger Distance (MHDE). The usefulness of this approach was demonstrated by using 5 min intra-daily data of the Euro/US exchange rate. The results provide useful insight into the dynamics of the distribution for Euro/US exchange and showed that the method proposed in this paper can improve the accuracy in modelling the probability of negative returns. Acknowledgements The author would like to thank the editors and two anonymous referees for helpful comments. The author would also like to thank Hiroaki Suenaga and Zdravetz Lazarov for insightful discussions as well as the participants of MODSIM 2007 International Congress on Modelling and Simulation. The financial support from the School of Economics and Finance, Curtin University of Technology is gracefully acknowledged. Appendix A Proof of Proposition 1. By completing the squares, the function below can be rewritten as      λ1 2 λ21 2 2 = exp λ2 x + − exp λ1 x + λ2 x 2λ2 4λ2       −λ21 λ1 2 = exp exp λ2 x + 4λ2 2λ2

F. Chan / Mathematics and Computers in Simulation 79 (2009) 2767–2778

2777

Set μ = −λ1 /2λ2 and σ 2 = −1/2λ2 , the expression above becomes     −λ21 (x − μ)2 exp exp . 4λ2 2σ 2 Hence,



Q

=



exp(λ1 x + λ2 x2 ) dx     (x − μ)2 −λ21 exp dx exp 4λ2 2σ 2 −∞     ∞ −λ21 (x − μ)2 exp dx exp 4λ2 2σ 2 −∞   −λ21 √ exp 2πσ 2 . 4λ2

−∞  ∞

= = = This implies 

−1

exp(λ1 x + λ2 x ) dx = √

Q

2

A

1 2πσ 2





(x − μ)2 exp 2σ 2 A

 dx.



This completes the proof.

Proof of Proposition 2. Notice that the density p(x) has to satisfy the constraint  

i p(x) = Q−1 A exp i λi x dx, it follows that    −1 j i Q x exp λi x dx = mj . A



A p(x) dx

= mj

∀j and since

i

By implicitly differentiating λl , l = 1, . . . , k with respect to mq for q = 1, . . . , k yields,       ∂λl ∂Q ∂λ ∂mj l −Q−2 xj exp λi xi dx + Q−1 xj+l exp λi x i dx = ∂λl ∂mq A ∂m ∂m q q A i

which can be rewritten as   ∂λl ∂mj −1 ∂Q −Q mj + mj+l = . ∂mq ∂λl ∂mq Notice that ∂Q = ∂λl



l

x exp A

 

i

(17)

λi x

i

dx,

i

which implies Q−1

∂Q = ml . ∂λl

(18)

Combining Eqs. (17) and (18) gives  ∂mj ∂λl  −ml mj + mj+l = . ∂mq ∂mq

(19)

2778

F. Chan / Mathematics and Computers in Simulation 79 (2009) 2767–2778

Since this holds for all j = 1, . . . , k, this implies   ∂mj ∂mi −1 −ml mi + mi+l = 1 ∀i, j, l = 1, . . . , k. ∂mq ∂mq −ml mj + mj+l and therefore −ml mj + mj+l ∂mj = . ∂mi −ml mi + mi+l

∀i, j, l = 1, . . . , k.

Substitute the final expression in Eq. (19) gives ∂λl = (ml+q − ml mq )−1 , ∂mq and replace the indexes, l, q by i, j yields the result. This completes the proof.



References [1] Y. Aït-Sahalia, P.A. Mykland, L. Zhang, How often to sample a continuous-time process in the presence of market microstructure noise, The Review of Financial Studies 18 (2005) 351–416. [2] R. Beran, Minimum Hellinger distance estimates for parametric models, The Annals of Statistics 5 (3) (1977) 445–463. [3] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 31 (1986) 307–327. [4] T. Bollerslev, A conditional heteroskedastic time series model for speculative prices and rates of return, Review of Economics and Statistics 69 (1987) 542–547. [5] S.A. Chandra, M. Taniguchi, Minimum α-divergence estimation for ARCH models, Journal of Time Series Analysis 27 (1) (2006) 19–39. [6] B. da Veiga, M. McAleer, F. Chan, It pays to violate: model choice and critical value assumption for forecasting value-at-risk thresholds, in: A. Zerger, R. Argent (Eds.), MODSIM 2005 International Congress on Modelling and Simulation, 2005, pp. 2304–2311. [7] J. Doornik, Object-Oriented Matrix Programming Using Ox, Timberlake Consultants Press, London, 2002. [8] R. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50 (1982) 987–1007. [9] M. Frontini, A. Tagliani, Entropy-convergence in Stieltjes and Hamburger moment problem, Applied Mathematics and Computation 88 (1997) 39–51. [10] B. Hansen, Autoregressive conditional density estimation, International Economic Review 35 (3) (1994) 705–730. [11] C. Harvey, A. Siddique, Autoregressive conditional skewness, Journal of Financial and Quantitative Analysis 34 (4) (1999) 465–486. [12] P. Jorion, Value at Risk: The New Benchmark for Managing Financial Risk, McGraw-Hill, New York, 2000. [13] B. Mandelbrot, The variation of certain speculative prices, The Journal of Business 36 (1963) 394–419. [14] B. Mandelbrot, The variation of some other speculative prices, The Journal of Business 40 (1967) 393–413. [15] L. Mead, N. Papanicolaou, Maximum entropy in the problem of moment, Journal of Mathematical Physics 25 (8) (1984) 2404–2417. [16] D. Nelson, Conditional heteroskedasticity in asset returns—a new approach, Econometrica 59 (2) (1991) 347–370. [17] M. Rockinger, E. Jondeau, Entropy densities with an application to autoregressive conditional skewness and kurtosis, Journal of Econometrics 106 (2002) 119–142. [18] C. Shannon, The mathematical theory of communication, Bell Systems Technical Journal 27 (1948) 349–423. [19] X. Wu, Calculation of maximum entropy density with application to income distribution, Journal of Econometrics 115 (2003) 347–354.