Generalized spectral testing for multivariate continuous-time models

Generalized spectral testing for multivariate continuous-time models

Journal of Econometrics 164 (2011) 268–293 Contents lists available at ScienceDirect Journal of Econometrics journal homepage: www.elsevier.com/loca...

541KB Sizes 0 Downloads 35 Views

Journal of Econometrics 164 (2011) 268–293

Contents lists available at ScienceDirect

Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom

Generalized spectral testing for multivariate continuous-time models Bin Chen a,∗ , Yongmiao Hong b,c a

Department of Economics, University of Rochester, Rochester, NY, 14627, United States

b

Department of Economics & Department of Statistical Science, Cornell University, Ithaca, NY 14850, United States

c

Wang Yanan Institute for Studies in Economics (WISE) & MOE Key Laboratory in Econometrics, Xiamen University, Xiamen 361005, China

article

info

Article history: Received 17 December 2008 Received in revised form 21 May 2011 Accepted 1 June 2011 Available online 15 June 2011 JEL classification: C4 E4 G0

abstract We develop an omnibus specification test for multivariate continuous-time models using the conditional characteristic function, which often has a convenient closed-form or can be accurately approximated for many multivariate continuous-time models in finance and economics. The proposed test fully exploits the information in the joint conditional distribution of underlying economic processes and hence is expected to have good power in a multivariate context. A class of easy-to-interpret diagnostic procedures is supplemented to gauge possible sources of model misspecification. Our tests are also applicable to discrete-time distribution models. Simulation studies show that the tests provide reliable inference in finite samples. © 2011 Elsevier B.V. All rights reserved.

Keywords: Affine jump–diffusion model Conditional characteristic function Discrete-time distribution model Generalized cross-spectrum Lévy processes Model specification test Multivariate continuous-time model

1. Introduction Multivariate continuous-time models have proved to be versatile and productive tools in finance and economics (e.g., Andersen et al. (2002), Chernov et al. (2003), Dai and Singleton (2003), Pan and Singleton (2008), Jarrow et al. (2010), and Piazzesi (2010)). Compared with that of discrete-time models, the econometric analysis of continuous-time models is often challenging. In the past decade or so, substantial progress has been made in developing estimation methods for continuous-time models.1 However, relatively little effort has been devoted to specification and evaluation of continuous-time models. A continuous-time model essentially specifies the transition density of underlying processes. Model misspecification generally renders inconsistent parameter estimators and their variance–covariance matrix estimators, yielding misleading conclusions in inference and hypothesis testing.



Corresponding author. E-mail addresses: [email protected] (B. Chen), [email protected] (Y. Hong). 1 Sundaresan (2001) points out that ‘‘perhaps the most significant development in the continuous-time field during the last decade has been the innovations in econometric theory and in the estimation techniques for models in continuous time.’’ For other reviews of this literature, see (e.g.) Tauchen (1997) and Ait-Sahalia (2007). 0304-4076/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2011.06.001

Correct model specification is also crucial for valid economic interpretations of model parameters. More importantly, a misspecified model can lead to large errors in pricing, hedging and managing risk. However, economic theories usually do not suggest any concrete functional form for continuous-time models; the choice of a model is somewhat arbitrary. A prime example of this practice is the pricing and hedging literature, where continuous-time models are generally assumed to have a functional form that results in a closed-form pricing formula. The models, though convenient, are often incorrect or suboptimal. To avoid this pitfall, the development of reliable specification tests for continuous-time models is necessary. In a pioneer paper, Ait-Sahalia (1996a) develops a nonparametric test for univariate diffusion models. By observing that the drift and diffusion functions completely characterize the stationary (i.e., marginal) density of a diffusion model, Ait-Sahalia (1996a) compares the model-implied stationary density with a smoothed kernel density estimator based on discretely sampled data.2 Gao and King (2004) develop a simulation procedure to improve the finite sample performance of Ait-Sahalia’s (1996a) test. These tests are 2 Ait-Sahalia (1996a) also proposes a transition density-based test that exploits the ‘‘transition discrepancy’’ characterized by the forward and backward Kolmogorov equations, although the marginal density-based test is most emphasized.

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

convenient to implement. Nevertheless, they may pass over a misspecified model that has a correct stationary density. Hong and Li (2005) develop an omnibus nonparametric specification test for continuous-time models. The test uses the transition density, which depicts the full dynamics of a continuoustime process. When a univariate continuous-time model is correctly specified, the probability integral transform (PIT) of data via the model-implied transition density is i.i.d. U[0, 1]. Hong and Li (2005) check the joint hypothesis of i.i.d. U [0, 1] by using a smoothed kernel estimator of the joint density of the PIT series. However, Hong and Li’s (2005) approach cannot be extended to a multivariate model. The PIT series with respect to a modelimplied multivariate transition density are no longer i.i.d U[0, 1], even if the model is correctly specified. In an application, Hong and Li (2005) evaluate multivariate affine term structure models (ATSMs) for interest rates by using the marginal PIT for each state variable. This is analogous to Singleton’s (2001) conditional characteristic function-based limited-information maximum likelihood (LML-CCF) estimator in a related context, which is based on the conditional density function of an individual state variable. As pointed out by Singleton (2001), ‘‘the LML-CCF estimator fully exploits the information in the conditional likelihood function of the individual yj,t +1 , but not the information in the joint conditional distribution of yt +1 .’’ This generally leads to an asymptotic efficiency loss in estimation. Similarly, Hong and Li’s (2005) test, when applied to each state variable of a multivariate process, may fail to detect misspecification in the joint dynamics of state variables. In particular, the test may easily overlook misspecification in the conditional correlations between state variables. Moreover, the use of the transition density may not be computationally convenient because the transition densities of most continuous-time models have no closed-form. Gallant and Tauchen (1996) propose a class of Efficient Method of Moments (EMM) tests that can be used to test multivariate continuous-time models. They propose a χ 2 test for model misspecification, and a class of appealing diagnostic t-tests that can be used to gauge possible sources for model failure. Since these tests are by-products of the EMM algorithm, they cannot be used when the model is estimated by other methods. This may limit the scope of these tests’ otherwise useful applications. Bhardwaj et al. (2008) consider a simulation-based test, which is an extension of Andrews’ (1997) conditional Kolmogorov test, for multivariate diffusion processes. The limiting distribution of the test statistic is not nuisance parameter free and hence asymptotically critical values must be obtained via the block bootstrap, which may be time-consuming. There have been other tests for univariate diffusion models in the recent literature. Ait-Sahalia et al. (2009) propose some tests by comparing the model-implied transition density and distribution function with their nonparametric counterparts. Chen et al. (2008) also propose a transition density-based test using a nonparametric empirical likelihood approach. Li (2007) focuses on the parametric specification of the diffusion function by measuring the distance between the model-implied diffusion function and its kernel estimator. These approaches could be extended to multivariate continuous-time models. However, all these tests maintain the Markov assumption for the data generating process (DGP), and consider the finite order lag only. If the DGP is non-Markov, these tests may miss some dynamic misspecifications. This paper proposes a new approach to testing the adequacy of a multivariate continuous-time model that uses the full information of the joint dynamics of state variables. In a multivariate context, modeling the joint dynamics of state variables is important in many applications (e.g., Alexander (1998)). For example, as the conditional correlations between asset returns change over time, the specific weight allocated to each asset within a portfolio should

269

be adjusted accordingly. Similarly, hedging requires knowledge of conditional correlations between the returns on different assets within the hedge. Conditional correlations are also important in pricing structured products such as rainbow options, which are based on multiple underlying assets whose prices are correlated. In the term structure literature, models of interest rate term structure impose dynamic cross-sectional restrictions, as implied by the no-arbitrage condition on bond yields of different maturities. This joint dynamics can be used to investigate the transmission mechanism that transfers the impact of government policy from spot rates to longer-term yields. There is a conflict between the flexibilities of modeling instantaneous conditional variances and instantaneous conditional correlations of bond yields. Dai and Singleton (2000) find that not only do swap rate data consistently call for negative conditional correlations between bond yields, but also factor correlations help explain the shape of the term structure of bond yields’ volatilities. Indeed, as Engle (2002) points out, ‘‘the quest for reliable estimates of correlations between financial variables has been the motivation for countless academic articles, practitioner conferences and Wall Street research’’. There has been a long history of using the CF in estimation and hypotheses testing in statistics and econometrics. To name a few, Koutrouvelis (1980) constructs a chi-squared goodness-of-fit test for simple null hypotheses with empirical characteristic function (ECF). Fan (1997) takes the CF approach to testing multivariate distributions. But both tests maintain the i.i.d. assumption and hence are not suitable for the time series data. Recently, Su and White (2007) test conditional independence by comparing the unrestricted and restricted CCFs via a kernel regression. All above works deal with discrete-time models, in recent years, the CF approach has attracted an increasing attention in the continuoustime literature. For most continuous-time models, the transition density has no closed-form, which makes estimation of and testing for continuous-time models rather challenging. However, for a general class of affine jump–diffusion (AJD) models (e.g., Duffie et al. (2000)) and time-changed Lévy processes (e.g., Carr and Wu (2003, 2004)), the CCF has a closed-form as an exponential-affine function of state variables up to a system of ordinary differential equations. This fact has been exploited to develop new estimation methods for multifactor continuous-time models in the literature. Specifically, Chacko and Viceira (2003) suggest a spectral GMM estimator based on the average of the differences between the ECF and the model-implied CF. Jiang and Knight (2002) derive the unconditional joint CF of an AJD model and use it to develop some GMM and ECF estimation procedures. Singleton (2001) proposes both time-domain estimators based on the Fourier transform of the CCF, and frequency-domain estimators directly based on the CCF. By extending Carrasco and Florens (2000), Carrasco et al. (2007) propose GMM estimators with a continuum of moment conditions (C-GMM) via the CF. All these estimation methods differ in their uses of the conditional information set. As of yet, no attempt has been made in the literature to use the CF to test continuoustime models, although Carrasco et al. (2007) do mention that ‘‘the future work will have to refine our results on estimation of nonMarkovian processes and latent states as well as develop tests in the framework of CF-based continuum of moment conditions’’. In light of the convenient closed-form of the CCF of AJD models, we provide a new test of the adequacy of multivariate continuoustime models. The multivariate continuous-time models can include jumps and the underlying DGPs of observable state variables need not be Markov. Naturally, as a special case, our test can be used to check univariate continuous-time models. Compared with the existing tests for continuous-time models in the literature, our approach has several main advantages. First, because the CCF is the Fourier transform of the transition density, our omnibus test fully exploits the information in the

270

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

joint transition density of state variables rather than only the information in the transition densities of individual state variables. Thus, it can detect misspecifications in the joint transition density even if the transition density of each state variable is correctly specified. When the underlying multivariate continuous-time process is Markov, our omnibus test is consistent against any model misspecification. This is unattainable by some existing tests for multivariate continuous-time models. Moreover, we have used a novel generalized cross-spectral approach, which embeds the CCF in a spectral framework, thus enjoying the appealing features of spectral analysis. For example, it checks many lags. This is particularly useful when the DGP is non-Markov. Second, besides the omnibus test, we propose a class of diagnostic tests by differentiating the generalized cross-spectrum of the state vector. These tests can evaluate how well a continuoustime model captures various specific aspects of the joint dynamics and they are easy to interpret. In particular, these tests can provide valuable information about neglected dynamics in conditional means, conditional variances, and conditional correlations of state variables, respectively. Therefore, they complement Gallant and Tauchen’s (1996) popular EMM-based individual t-tests. All our omnibus test and diagnostic tests are derived from a unified framework. Third, our tests are applicable to a wide variety of continuoustime models and discrete-time multivariate distribution models, since we impose regularity conditions on the CCF of discretely observed samples with some fixed sample frequency, rather than on stochastic differential equations (SDEs). By using the CCF to characterize the adequacy of a model, our tests are most convenient whenever the model has a closed-form CCF and many popular continuous-time models in finance (e.g., the class of multivariate AJD models and the class of time-changed Lévy processes) have a closed-form CCF, although they have no closedform transition density. Of course, our tests can also evaluate the multivariate continuous-time models with no closed-form CCF. In this case, we need to recover the model-implied CCF by using inverse Fourier transforms or simulation methods. Unlike tests based on CFs in the statistical literature, which often have nonstandard asymptotic distributions, our tests have a convenient null asymptotic N (0, 1) distribution. Fourth, we do not require a particular estimation method. √ Any T -consistent parametric estimators can be used. Parameter estimation uncertainty does not affect the asymptotic distribution of our test statistics. One can proceed as if true model parameters were known and equal to parameter estimates. This makes our tests easy to implement, particularly in view of the notorious difficulty of estimating multivariate continuous-time models. The only inputs needed to calculate the test statistics are the discretely observed data and the model-implied CCF. In Section 2, we introduce the framework, state the hypotheses, and characterize the correct specification of a multivariate continuous-time model. In Section 3, we propose a generalized cross-spectral omnibus test, and in Section 4 we derive the asymptotic null distribution of our omnibus test and discuss its asymptotic power property. In Section 5, we develop a class of generalized cross-spectral derivative tests that focuses on various specific aspects of the joint dynamics of a time series model. In Section 6, we assess the reliability of the asymptotic theory in finite samples by simulation. Section 7 concludes our work. All mathematical proofs are collected in Appendix. A GAUSS code to implement our tests is available from the authors upon request. Throughout the paper, we will use C to denote a generic bounded constant, ‖·‖ for the Euclidean norm, and A∗ for the complex conjugate of A.

2. Hypotheses of interest For concreteness, we focus on a multivariate continuous-time setup.3 For a given complete probability space (Ω , F , P) and an information filtration (Ft ), we assume that a N × 1 state vector Xt is a continuous-time DGP in some state space D ⊂ RN . We permit but do not require Xt to be Markov, which is often assumed in the continuous-time modeling in finance and macroeconomics.4 In financial modeling, the following class M of continuous-time models is often used to capture the dynamics of Xt 5 : dXt = µ (Xt , θ) dt + σ (Xt , θ) dWt + dJt (θ),

θ ∈ 2,

(2.1)

where Wt is an N × 1 standard Brownian motion in R , 2 is a finite-dimensional parameter space, µ : D × 2 → RN is a drift function (i.e., instantaneous conditional mean), σ : D × 2 → RN ×N is a diffusion function (i.e., instantaneous conditional standard deviation), and Jt is a pure jump process whose jump size follows a probability distribution ν : D × 2 → R+ and whose jump times arrive with intensity λ : D × 2 → R+ .6 We allow some state variables to be unobservable. One example is the stochastic volatility (SV) model, where the volatility, which is a proxy for the information inflow, is a latent process. The setup (2.1) is a general multivariate specification that nests most existing continuous-time models in finance and economics. For example, suppose we restrict the drift µ (·, ·), the instantaneous covariance matrix σ (·, ·) σ (·, ·)′ and the jump intensity λ (·, ·) to be affine functions of the state vector Xt ; namely, N

 µ(Xt , θ) = K0 + K1 Xt ,   [σ(Xt , θ)σ(Xt , θ)′ ]jl = [H0 ]jl + [H1 ]jl Xt ,   j, l = 1, . . . , N , ′ λ (Xt , θ) = L0 + L1 Xt ,

(2.2)

where K0 ∈ RN , K1 ∈ RN ×N , H0 ∈ RN ×N , H1 ∈ RN ×N ×N , L0 ∈ R, and L1 ∈ RN are unknown parameters. Then we obtain the class of AJD models of Duffie et al. (2000). It is well known that for a continuous-time model characterized by a SDE, the specification of the drift µ(Xt , θ), the diffusion σ(Xt , θ) and the jump process Jt (θ) completely determines the joint transition density of Xt . We use p(x, t |Fs , θ) to denote the model-implied transition density of Xt = x given Fs , where s < t. Suppose Xt has a true transition density, say p0 (x, t |Fs ). Then the continuous-time model is correctly specified for the full dynamics of Xt if there exists some parameter value θ0 ∈ 2 such that

H0 : p(·,t |Fs , θ0 ) = p0 (·, t |Fs ) almost surely (a.s.) and for all t , s, s < t .

(2.3)

3 Our test is applicable to both continuous-time and discrete-time models but we focus on a continuous-time setup due to the following reasons. Our approach is most convenient when the CCF has a closed-form and many popular continuoustime models in finance have no closed-form transition density, but do have a closedform CCF. On the other hand, to our knowledge, no CF-based test is available to check the specification of continuous-time models, although the CF approach has been used in the estimation of continuous-time models. Hence, our test nicely fills the gap in the literature. 4 However, Easley and O’Hara (1992) develop an economic structural model and show that financial time series, such as prices and volumes, are likely non-Markov. 5 Despite the fact that most continuous-time models characterized by SDEs in the literature are Markov, it is still important to allow Xt to be non-Markov. Even the null model is Markov, to allow for non-Markov DGPs under the alternative will ensure the power of the test against a wider range of misspecification, particularly, dynamic misspecification. On the other hand, the model may be Markov but involves some latent variables, as is the case of SV models. As a result, observable state variables themselves will be non-Markov, even under the null. 6 We assume that the functions µ, σ , ν and λ are regular enough to have a unique strong solution to (2.1). See (e.g.) Ait-Sahalia (1996a), Duffie et al. (2000) and GenonCatalot et al. (2000) for more discussions.

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

Alternatively, if for all θ ∈ 2, we have

Given the equivalence between the transition density and the CCF, we can write the hypotheses of interest H0 in (2.3) versus HA in (2.4) as follows:

HA : p(·, t |Fs , θ) ̸= p0 (·, t |Fs ) for some s < t with positive probability measure,

(2.4)

then the continuous-time model is misspecified for the full dynamics of Xt . The transition density characterization can be used to test correct specification of a continuous-time model. When Xt is univariate, Hong and Li (2005) propose a kernel-based test for a continuous-time model by checking whether the PIT Zt (θ0 ) ≡

Xt



p(x, t |It −1 , θ0 )dx ∼ i.i.d.U [0, 1],

(2.5)

−∞

which holds under H0 , where It −1 = {Xt −1 , Xt −21 , . . .} is the information set available at time t − 1, where 1 is a fixed sampling interval for an observed sample. The i.i.d. U [0, 1] property for the PIT has also been used in other contexts (e.g., Diebold et al. (1998)). However, there are some limitations to this approach. For example, for most continuous-time diffusion models (except such simple diffusion models as Vasicek’s (1977) model), the transition densities, have no closed-form. Most importantly, the PIT cannot be applied to the multivariate joint transition density p(x, t |Fs , θ), because when N > 1, Zt (θ0 ) =

X1t





XNt

··· −∞

p(x, t |It −1 , θ 0 )dx

(2.6)

−∞

is no longer i.i.d. U [0, 1] even if H0 holds. Hong and Li (2005) suggest using the PIT for each state variable. This is valid, but it does not make full use of the information contained in the joint distribution of Xt . In particular, it may miss important model misspecification in the joint dynamics of Xt . For example, consider the DGP

  κ11 0 θ1 − X1,t dt κ21 κ22 θ2 − X2,t     σ11 0 W1,t + d , 0 σ22 W2,t   where W1,t , W2,t are independent standard Brownian motions and κ21 ̸= 0. Suppose we fit the data using the model      X1,t κ11 0 θ1 − X1,t d = dt X2,t 0 κ22 θ2 − X2,t     σ11 0 W1,t + d . 0 σ22 W2,t 

d

X1,t X2,t





=

Then this model is misspecified because it ignores correlations in drift. Now, following Hong and Li (2005), we calculate the gener alized residuals Z1,t , Z2,t , Z1,t −1 , Z2,t −1 , . . . , where Z1,t and Z2,t are the PITs of X1,t and X2,t with respect to the conditional density models p(X1,t , t |Xt −1 , X2,t , θ) and p(X2,t , t |Xt −1 , θ) respectively, and θ = (κ11 , κ22 , θ1 , θ2 , σ11 , σ22 )′ . Then Hong and Li’s (2005) test will have no power because each of these PITs is an i.i.d. U [0, 1] sequence respectively. As the Fourier transform of the transition density, the CCF can capture the full dynamics of Xt . Let ϕ(u, t , |Fs , θ) be the modelimplied CCF of Xt , conditional on Fs at time s < t; that is,

    ϕ(u, t |Fs , θ) ≡ Eθ exp iu′ Xt |Fs ∫ √   = exp iu′ x p(x, t |Fs , θ)dx, u ∈ RN , i = −1,

271

(2.7)

RN

where Eθ (·|Fs ) denotes the conditional expectation under the model-implied transition density p(x, t |Fs , θ). Note that for a Markov model, the filtration Fs can be replaced by Xs .

H0 : E exp iu′ Xt |Fs = ϕ(u, t |Fs , θ0 ) a.s. for all u ∈ RN









and for some θ0 ∈ 2

(2.8)

versus

HA : E exp iu′ Xt |Fs ̸= ϕ(u, t |Fs , θ)









with positive probability for all θ ∈ 2.

(2.9) Xt tT=11

Suppose we have a discrete random sample { } of size T . For notational simplicity, we set 1 = 1 below, where time is measured in units of the sampling interval of data.7 Also, we denote It −1 = {Xt −1 , Xt −2 , . . .}, the information set available at time t − 1. Define the process Zt (u, θ) ≡ exp iu′ Xt − ϕ(u, t |It −1 , θ),





u ∈ RN .

(2.10)

Then H0 is equivalent to the following martingale difference sequence (MDS) characterization: E [Zt (u, θ0 )|It −1 ] = 0

for all u ∈ RN

and some θ0 ∈ 2, a.s.

(2.11)

We may call Zt (u, θ) a ‘‘CCF-based generalized residual’’ of the continuous-time model M . This can be seen obviously from the auxiliary regression exp iu′ Xt = ϕ(u, t |It −1 , θ0 ) + Zt (u, θ0 ),





(2.12)

where ϕ(u, t |It −1 , θ0 ) is the regression model for the dependent variable exp iu′ Xt conditional on information set It −1 , and Zt (u, θ0 ) is a MDS regression disturbance (under H0 ). In principle, we can always obtain the CCF by the inverse Fourier transform, provided the transition density of Xt is given. Even if the CCF or the transition density has no closed-form, we can accurately approximate the model transition density by using (e.g.) the Hermite expansion method of Ait-Sahalia (2002), the simulation methods of Brandt and Santa-Clara (2002) and Pedersen (1995), or the closed-form approximation method of Duffie et al. (2003), and then calculating the Fourier transform. Nevertheless, our test is most convenient when the CCF has a closed-form. AJD models are a class of continuous-time models with a closedform CCF, developed and popularized by Dai and Singleton (2000), Duffie and Kan (1996), and Duffie et al. (2000). These models have proven fruitful in capturing the dynamics of economic variables, such as interest rates, exchange rates and stock prices. It has been shown (e.g., Duffie et al. (2000)) that for AJDs, the CCF of Xt conditional on It −1 is a closed-form exponential-affine function of Xt −1 :

  ϕ(u, t |It −1 , θ) = exp αt −1 (u) + βt −1 (u)′ Xt −1 ,

(2.13)

where αt −1 : R → R and βt −1 : R → R satisfy the complexvalued Riccati equations: N

N

N

     1  β˙ t = K′1 βt + β′t H1 βt + L1 g βt − 1 , 2     1 ′  α˙ t = K′0 βt + βt H0 βt + L0 g βt − 1 ,

(2.14)

2

with boundary conditions βT (u) = iu and aT (u) = 0.8 7 Our approach can be adapted to the case of 1 ̸= 1 easily. See Singleton (2001, p. 117) for related discussion. 8 Assuming that the spot rate is an affine function of the state vector X and that t Xt follows an affine diffusion, Duffie and Kan (1996) show that the yield of the zero coupon bond has a closed-form CCF. Assuming that the spot rate is a quadratic function of the normally distributed state vector, Ahn et al. (2002) also derive the closed-form CCF for the yield of the zero coupon bond.

272

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

Affine SV models are another popular class of multivariate continuous-time models with a closed-form CCF (e.g., Heston (1993), Bates (1996, 2000), Bakshi et al. (1997), Das and Sundaram (1999)). SV models can capture salient properties of volatility such as randomness and persistence. Basic affine SV models, affine SV models with Poisson jumps and affine SV models with Lévy processes have been widely used in modeling asset return dynamics as they allow for closed-form solutions for European option prices (e.g., Heston (1993), Das and Sundaram (1999), Chacko and Viceira (2003), Carr and Wu (2003, 2004), and Huang and Wu (2003)). To test SV models, where Vt is a latent process, we need to modify the characterization (2.11) to make it operatable. Generally, we put Xt = (X′1,t , X′2,t )′ , where X1,t ⊂ RN1 denotes the observable state variables, X2,t ⊂ RN2 denotes the unobservable state variables, and N1 + N2 = N. Also, partition u conformably as u =  ′ ′ ′ u1 , u2 . Then we can define Z1,t (u1 , θ) = exp iu′1 X1,t − φ(u1 , t |I1,t −1 , θ),



j

j J X2,t −1 j=1 . By the Bayes rule,

step forward to get the new particles { ˆ we have p x2,t −1 |I1,t −1 , θ



 

p x1,t −1 |I1,t −2 , θ







,

where p x2,t −1 |I1,t −2 , θ







p x2,t −1 |x2,t −2 , I1,t −2 , θ p x2,t −2 |I1,t −2 , θ dx2,t −2 .



=

 



We can approximate p x2,t −1 |I1,t −1 , θ up to some proportionality; namely,







ˆ 2,t −1 , Iˆ1,t −2 , θ pˆ x2,t −1 |Iˆ1,t −1 , θ ∝ pˆ x1,t −1 |X 





×

J −



  πtj−1 pˆ x2,t −1 |Xˆ 2,t −2 , Iˆ1,t −2 , θ ,

j =1

  φ(u1 , t |I1,t −1 , θ) ≡ Eθ [exp iu′1 X1,t |I1,t −1 ] = Eθ {ϕ[(u1 , 0 ) , t |It −1 , θ]|I1,t −1 },  I1,t −1 = X1,t −1 , X1,t −2 , . . . is the information set on the observables that is available at time t − 1 and the second equality follows ′

′ ′



from the law of iterated expectations. Then we have E Z1,t (u1 , θ0 ) |I1,t −1 = 0,





for all u1 ∈ RN1

and some θ0 ∈ 2, a.s.

(2.15)

This provides a basis for constructing operational tests for multivariate continuous-time models with partially observable state variables.9 Although the model-implied CCF ϕ (u, t |It −1 , θ), where It −1 = (I1,t −1 , I2,t −1 ), may have a closed-form, the conditional expectation φ(u1 , t |I1,t −1 , θ) generally has no closed-form. However, one can approximate it accurately by using some simulation techniques. For almost all continuous-time models   characterized  by SDEs in the literature, the CCF ϕ u, t |It −1, θ = ϕ u, t |Xt −1, θ is a Markov process. In this case, Eθ {ϕ[(u′1 , 0′ )′ , t |It −1 , θ]|I1,t −1 }







=

}

p x1,t −1 |x2,t −1 , I1,t −2 , θ p x2,t −1 |I1,t −2 , θ

where

=

J

ˆ 2,t −2 }j=1 one The key of this method is to propagate particles {X

ϕ[(u′1 , 0′ )′ , t |X1,t −1 , x2,t −1 , θ]p(x2,t −1 |I1,t −1 , θ)dx2,t −1 ,

where p(x2,t −1 |I1,t −1 , θ) is the model-implied transition density of the unobservable X2,t −1 given the observable information I1,t −1 . Noting that models with latent variables are the leading examples that may yield non-Markov observables in the continuous-time literature, we discuss several popular methods to estimate the model-implied CCF based on observables. We first consider particle filters, which have been developed by Gordon et al. (1993), Pitt and Shephard (1999) and Johannes et al. (2009). The term ‘‘particle’’ was first used by Kitagawa (1996) in this literature to denote the simulated discrete data with random support. Particle filters are the class of simulation filters that recursively approximate the filtering random variable x2,t −1 |I1,t −1 , θ by

ˆ 12,t −1 , Xˆ 22,t −1 , . . . , Xˆ J2,t −1 with discrete probability ‘‘particles’’ X J 1 mass of πt −1 , πt2−1 , . . . , πt −1 . Hence a continuous variable is approximately a discrete one with random  support. These discrete points are viewed as samples from p x2,t −1 |I1,t −1 , θ and as J → ∞, the particles can approximate the conditional density increasingly well. 9 This characterization has been used in Singleton (2001, Equation 50) to estimate affine asset pricing models with unobservable components.

∑J

ˆ 2,t −1 , Iˆ1,t −2 , θ) and where pˆ (x1,t −1 |X

j=1

πtj−1 pˆ (x2,t −1 |Xˆ 2,t −2 ,

Iˆ1,t −2 , θ) can be viewed as the likelihood and prior respectively. As pointed out by Gordon et al. (1993), the particle filters require ˆ 2t −1 can be that the likelihood function can be evaluated and that X ˆ ˆ sampled from pˆ (x2,t −1 |X2,t −2 , I1,t −2 , θ). These can be achieved by using time-discretized solutions to the SDEs.10 To implement particle filters, we can use Johannes et al. (2009) and Pitt and Shephard (1999) algorithm. First we generate a simj J ˆM ˆM ˆ ˆ ulated sample {(X 2,t −2 ) }j=1 , where X2,t −2 = {X2,t −2 , X2,t −2+ 1 , M

. . . , Xˆ 2,t −2+ M −1 } and M is an integer. Then we simulate them one M

step forward, evaluate the likelihood function, and set

πtj−1 =

j ˆ ˆM pˆ [x1,t −1 |(X 2,t −1 ) , I1,t −2 , θ] J ∑ j =1

,

j = 1, . . . , J .

j ˆ ˆM pˆ [x1,t −1 |(X 2,t −1 ) , I1,t −2 , θ] j

J

Finally, we resample J particles with weights {πt −1 }j=1 to obtain a 11

new random sample of size J. It has been shown (e.g., Bally and Talay (1996), Del Moral et al. (2001)) that with both large J and M, this algorithm sequentially generates valid simulated samples from p(x2,t −1 |I1,t −1 , θ). Hence φ(u1 , t |I1,t −1 , θ) can be estimated by Monte Carlo averages.12 The second method to approximate p(x2,t −1 , t − 1|I1,t −1 , θ) is Gallant and Tauchen’s (1998) SNP-based reprojection technique, which can characterize the dynamic response of a partially observed nonlinear system to its past observable history. First, ˆ 1,t −1 }Jt =2 and {Xˆ 2,t −1 }Jt =2 we can generate simulated samples {X from the continuous-time model, where J is a large integer. ˆ 2,t −1 }Jt =2 onto a Hermite Then, we project the simulated data {X series representation of the transition density p(x2,t −1 , t −

ˆ 1,t −1, Xˆ 1,t −2 , . . . Xˆ 1,t −L ), where L denotes a truncation lag order. 1|X 10 There are many discretization methods in practice, such as the Euler scheme, the Milstein scheme, and the explicit strong scheme. See (e.g.) Kloeden et al. (1994) for more discussion. 11 This is called sampling/importance resampling (SIR) in the literature. Alternative methods include rejection sampling and Markov chain Monte Carlo (MCMC) algorithm. See Doucet et al. (2001) and Pitt and Shephard (1999) for more discussion. 12 In a related estimation context, Chacko and Viceira (2003), Jiang and Knight (2002) and Singleton (2001) derive analytical expressions for Eθ {ϕ[(u′1 , 0′ )′ , t |It −1 , θ]|I˜1,t −1 } for some suitable subset I˜1,t −1 of I1,t −1 . For example, Chacko and Viceira (2003) obtain a closed-form expression for E [ϕlog S (u, t |It −1 , θ)| log St −1 ], by integrating out Vt −1 . This is computationally more convenient, but it is less efficient.

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

With a suitable choice of L via some information criteria such as AIC or BIC, we can approximate p(x2,t −1 , t − 1|Iˆ1,t −1 , θ) arbitrarily well. The final step is to evaluate the estimated density function at the observed data in the conditional information set. See Gallant and Tauchen (1998) for more discussion. For models whose CCF is exponentially affine in Xt −1 ,13 we can ˆ u1 , t |I1,t −1 , θ). also adopt Bates’ (2007) approach to compute φ( First, at time t = 1, we initialize the CCF of the latent vector X2,t −1 conditional on I1,t −1 at its unconditional CF. Then, by exploiting the Markov property and the affine structure of the CCF, we can evaluate the model-implied CCF conditional on data observed through period t, namely, Eθ [ϕ(u, t |Xt −1 , θ)|I1,t −1 ], and thus an estimator for φ(u1 , t |I1,t −1 , θ) is obtained.

We now propose an omnibus specification test for the adequacy of a multivariate continuous-time model by using the CCF characterization in (2.11) or (2.15). For notational convenience, below we focus on the case where Xt is fully observable. The proposed procedures are readily applicable to the case where only X1,t is observable, with Z1,t (u, θ) and I1,t −1 replacing Zt (u, θ) and It −1 respectively. It is not a trivial task to check (2.11) because the MDS property must hold for each u ∈ RN , and because {Zt (u, θ)} may display serial dependence in higher order conditional moments. Any test for (2.11) should be robust to time-varying conditional heteroskedasticity and higher order moments of unknown form in {Zt (u, θ)}. To check the MDS property of Zt (u, θ), we substantively extend Hong’s (1999) univariate generalized spectrum to a multivariate generalized cross-spectrum.14 The generalized spectrum is an analytic tool for nonlinear time series that embeds the CF in a spectral framework. It can capture nonlinear dynamics while maintaining the nice features of spectral analysis. Because the dimension of It −1 can be infinity, we encounter the so-called ‘‘curse of dimensionality’’ problem in checking the MDS property in (2.11). Fortunately, the generalized spectral approach provides a solution to tackle this difficulty. It checks many lags in a pairwise manner, thus avoiding the ‘‘curse of dimensionality’’. Define the generalized covariance function

  Γj (u, v) = cov[Zt (u, θ), exp iv′ Xt −|j| ],

u, v ∈ RN .

(3.1)

With the generalized cross-covariance Γj (u, v), we can define the generalized cross-spectrum F (ω, u, v) =

1

a ‘‘flat’’ spectrum: F (ω, u, v) = F0 (ω, u, v) ≡

∞ −

2π j=−∞

Γj (u, v) exp (−ijω) ,

ω ∈ [−π , π], u, v ∈ RN ,

(3.2)

where ω is the frequency. This is the Fourier transform of the  generalized covariance function Γj (u, v) and thus contains the same information as Γj (u, v) . An advantage of generalized crossspectral analysis is that it can capture cyclical patterns caused by both linear and nonlinear cross dependence. Examples include volatility spillover, the comovements of tail distribution clustering between state variables, and asymmetric spillover of business cycles cross different sectors or countries. Another attractive feature of F (ω, u, v) is that it does not require the existence of any moment condition on Xt . Under H0 , we have Γj (u, v) = 0 for all u, v ∈ RN and all j ̸= 0. Consequently, the generalized cross-spectrum F (ω, u, v) becomes





13 Examples include AJD models (Duffie et al., 2000) and time-changed Lévy processes (Carr and Wu, 2003, 2004). 14 This is not a trivial extension since Hong’s (1999) test is under an i.i.d. assumption.

1 2π

Γ0 (u, v),

ω ∈ [−π , π], u, v ∈ RN .

(3.3)

We can test H0 by checking whether a consistent estimator for F (ω, u, v) is flat with respect to frequency ω. Any significant deviation from a flat generalized cross-spectrum is evidence of model misspecification. Suppose we have a discretely observed sample {Xt }Tt=1 of size T . Then we estimate the generalized covariance Γj (u, v) by its sample analogue

Γˆ j (u, v) =

3. Omnibus testing

273

1

T −

T − |j| t =|j|+1

ˆ exp iv′ Xt −|j| − ϕˆ j (v) , Zˆt (u, θ) 







u, v ∈ RN ,

(3.4)

where Ď

ˆ ˆ ≡ exp iu′ Xt − ϕ(u, t |I , θ), Zˆt (u, θ) t −1 



Ď

It is the observed information set available at time t that may



ˆ involve certain initial values,  estimator for ∑ θ is a T-consistent θ0 and ϕˆ j (v) = (T − |j|)−1 Tt=|j|+1 exp iv′ Xt −|j| is the empirical unconditional CF of Xt . We note that the information set It −1 = {Xt −1 , Xt −2 , . . .} dates back to past infinity and is not feasible. Thus, when Xt is non-Markov, we may need to assume some ˆ . Therefore, we have to initial values in computing ϕ(u, t |It −1 , θ) Ď replace It −1 with a truncated information set It −1 , which contains some initial values. We provide a condition (see Assumption A.5 in Section 4) to ensure that the use of initial values has no impact on the asymptotic distribution of the proposed test statistics. A consistent estimator for F0 (ω, u, v) is Fˆ0 (ω, u, v) =

1 2π

Γˆ 0 (u, v),

ω ∈ [−π , π], u, v ∈ RN .

(3.5)

Consistent estimation for F (ω, u, v) is more challenging. We use a smoothed kernel estimator Fˆ (ω, u, v) =

1

T −1 −

2π t =1−T

(1 − |j| /T )1/2 k(j/p)Γˆ j (u, v)e−ijω ,

ω ∈ [−π , π], u, v ∈ RN ,

(3.6)

where p ≡ p(T ) → ∞ is a bandwidth or an effective lag order, and k : R → [−1, 1] is a kernel function, assigning weights to various lags. Examples of k(·) include the Bartlett kernel, the Parzen kernel and the Quadratic-Spectral (QS) kernel. In (3.6), the factor (1 − |j| /T )1/2 is a finite sample correction. It could be replaced by unity. Under regularity conditions, Fˆ (ω, u, v) and Fˆ0 (ω, u, v) are consistent for F (ω, u, v) and F0 (ω, u, v) respectively. These estimators converge to the same limit under H0 but they generally converge to different limits under HA , giving the power of the test. We can measure the distance between Fˆ (ω, u, v) and Fˆ0 (ω, u, v) by the quadratic form

πT

∫∫∫

2

π −π

 2 ˆ  F (ω, u, v) − Fˆ0 (ω, u, v) dωdW (u) dW (v)

T −1

=

− j =1

k2 (j/p)(T − j)

∫∫  2 ˆ  Γj (u, v) dW (u) dW (v) ,

(3.7)

where the equality follows by Parseval’s identity, W : RN → R+ is a positive nondecreasing right-continuous function, and the unspecified integrals are all taken over the support of W (·). An example of W (·) is the N (0, IN ) CDF, where IN is a N × N identity

274

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

matrix. The function W (·) can also be a step function, analogous to the CDF of a discrete random vector. Our omnibus test statistic for H0 against HA is a standardized version of (3.7): Qˆ (0, 0) =

 T −1 −

k2 (j/p)(T − j)

j=1

×

]  ∫∫  2 ˆ  ˆ (0, 0), (3.8) D Γj (u, v) dW (u) dW (v) − Cˆ (0, 0)

T −1 −

k2 (j/p)(T − j)−1

∫∫  T 2 − ˆ ˆ  Zt (u, θ)

t =j+1

j=1

 2   × ψˆ t −j (v) dW (u) dW (v) , ˆ (0, 0) = 2 D

T −2 − T −2 −

k2 (j/p)k2 (l/p)

j=1 l=1

∫∫∫∫ ×

dW (u1 ) dW (u2 ) dW (v1 ) dW (v2 )

  1  ×  T − max(j, l)

T − t =max(j,l)+1

2   

ˆ Zˆt (u2 , θ) ˆ ψˆ t −j (v1 )ψˆ t −l (v2 ) , Zˆt (u1 , θ)

iv′ Xt

∑ ′ − ϕ( ˆ v), and ϕ( ˆ v) = T −1 Tt=1 eiv Xt . The ˆ (0, 0) are the approximately mean and factors Cˆ (0, 0) and D ˆ t (v) = e where ψ

Assumption A.6. k : R → [−1, 1] is a symmetric function that is continuous at zero and all points in R except for a finite number of points, with k (0) = 1 and k (z ) ≤ c |z |−b for some b > 12 as z → ∞. Assumption A.7. W : RN → R+ is a nondecreasing right-conti nuous function with RN dW (u) < ∞ and RN ‖u‖4 dW (u) < ∞.  Furthermore, W = w dv for some nonnegative weighting function w and some measure v , where W is absolutely continuous with respect to v and w is symmetric about the origin.

where the centering and scaling factors Cˆ (0, 0) =

Ď

Assumption A.5. Let It be the observed feasible information set available at time t that may involve certain initial values. Then ∑T Ď supu∈RN limT →∞ t =1 E supθ∈2 |ϕ(u, t |It −1 , θ) − ϕ(u, t |It −1 , 2 θ)| ≤ C .

variance of the quadratic form in (3.7). They have taken into account the impact of higher order dependence in the generalized residual {Zt (u, θ0 )}. As a result, the Qˆ (0, 0) test is robust to conditional heteroskedasticity and time-varying higher order conditional moments of unknown form in {Zt (u, θ0 )}. In practice, when W (·) is continuous, Qˆ (0, 0) can be calculated by numerical integration or simulation. This may be computationally costly when the dimension N of Xt is large. Alternatively, one can use a finite number of grid points for u and v, which is equivalent to using a discrete CDF. For example, we can generate finitely many numbers of u and v from the N (0, IN ) distribution. This will reduce the computational cost, but at a cost of some power loss. 4. Asymptotic theory To derive the null asymptotic distribution of the test statistic Qˆ (0, 0) and investigate its asymptotic power property, we impose following regularity conditions. Assumption A.1. A discrete-time sample {Xt }Tt =11 , where 1 ≡ 1 is the sampling interval, is observed at equally spaced discrete times. Assumption A.2. Let ϕ (u, t |It −1 , θ) be the CCF of Xt given It −1 ≡ {Xt −1 , Xt −2 , . . .} for a time series model for Xt . (i) For each θ ∈ 2, each u ∈ RN , and each t, ϕ (u, t |It −1 , θ) is measurable with respect to It −1 ; (ii) for each θ ∈ 2, each u ∈ RN , and each t, ϕ (u, t |It −1 , θ) is twice continuously differentiable with respect ∑T to θ ∈ 2 with probability one; (iii) supu∈RN limT →∞ T −1 t =1 ∂ ϕ(u, t |It −1 , θ)‖2 ≤ C and supu∈RN limT →∞ T −1 E supθ∈2 ‖ ∂θ

∑T

t =1

∂ E supθ∈2 ‖ ∂θ∂θ ′ ϕ (u, t |It −1 , θ) ‖ ≤ C . 2



Assumption A.3. θˆ is a parameter estimator such that T (θˆ − θ∗ ) = OP (1), where θ∗ ≡ p limT →∞ θˆ and θ∗ = θ0 under H0 . ∂ Assumption A.4. {Xt , ϕ (u, t |It −1 , θ0 ) , ∂θ ϕ (u, t |It −1 , θ 0 )} is a strictly stationary β -mixing process with the mixing coefficient |β (l)| ≤ Cl−ν for some constant ν > 2.

Assumption A.1 imposes some regularity conditions on the discretely observed random sample. Both univariate and multivariate continuous-time or discrete-time processes are covered, and we allow but do not require Xt to be Markov. This distinguishes our test from all other tests in the literature. We emphasize that it is important to allow the DGP to be non-Markov, even if one is interested in testing the adequacy of a Markov model. This is because the misspecification of a Markov model may come from not only the improper specification in functional forms, but also the violation of the Markov assumption. The non-Markov assumption for the DGPs of observable state variables is also necessary when some state variables are latent, as is the case of SV models. There are two kinds of asymptotics in the literature on continuous-time models. The first is to let the sampling interval 1 → 0. This implies that the number of observations per unit of time tends to infinity. The second is to let the time horizon T → ∞. As argued by Ait-Sahalia (1996b), the first approach hardly matches the way in which new data are added to the sample. Even if such ultra-high-frequency data are available, market micro-structural problems are likely to complicate the analysis considerably. Hence, like Ait-Sahalia (1996b) and Singleton (2001), we fix the sampling interval 1 and derive the asymptotic properties of our test for an expanding sampling period (i.e., T → ∞). Unlike Ait-Sahalia (1996a,b), however, we do not impose additional conditions on the SDEs for Xt . Instead, we impose conditions directly on the model-implied CCF, which is equivalent to imposing conditions on the model-implied transition density. For these reasons, our approach is more general and the proposed tests are applicable to both continuous-time and discrete-time models. Assumption A.2 imposes regularity conditions on the CCF of the multivariate time series model. As the CCF is the Fourier transform of the transition density, we can ensure Assumption A.2 by imposing the following conditions on the model-implied transition density p (x, t |It −1 , θ): (i) for each t, each x ∈ D, and each θ ∈ 2, p (x, t |It −1 , θ) is measurable with respect to It −1 ; (ii) for each t, each x ∈ D, and each given It −1 , p (x, t |It −1 , θ) is twice continuously differentiable with respect to θ ∈ 2 with probabil∑T ∂ ln p(x, t | ity one; and (iii) supx∈D limT →∞ T −1 t =1 E supθ∈2 ‖ ∂θ

It −1 , θ)‖2 ≤ C , and supx∈D limT →∞ T −1 (x, t |It −1 , θ) ‖ ≤ C .



∑T

t =1

∂ E supθ∈2 ‖ ∂θ∂θ ′ ln p 2

Assumption A.3 requires a T -consistent estimator θˆ under H0 . Examples include asymptotically optimal and suboptimal estimators, such as Gallant and Tauchen’s (1996) EMM, Singleton’s (2001) ML-CCF and GMM-CCF, Ait-Sahalia’s (2002) and Ait-Sahalia and Kimmel’s (2010) approximated MLE, Carrasco et al.’s (2007) C-GMM, and Chib et al.’s (2010) MCMC method. We do not require any asymptotically most efficient estimator or a specified estimator. This is attractive for practitioners given the notorious

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

difficulty of efficient estimation of many multivariate nonlinear time series models. Assumption A.4 is a regularity condition on the temporal de∂ pendence of the process {Xt , ϕ (u, t |It −1 , θ0 ) , ∂θ ϕ(u, t |It −1 , θ0 )}. The β -mixing assumption is a standard condition for discrete time series analysis. In the continuous-time context, almost all continuous-time models in the literature are Markov. For Markov ∂ models, ϕ (u, t |It −1 , θ0 ) and ∂θ ϕ (u, t |It −1 , θ0 ) are functions of Xt −1 . Thus, Assumption A.4 holds if the discretely observed random sample {Xt }Tt=1 is a strictly stationary β -mixing process with |β (l)| ≤ Cl−ν for some constant ν > 2.15 Assumption A.5 is a start-up value condition, which ensures Ď that the impact of initial values (if any) assumed in It −1 is asymptotically negligible. This condition holds automatically for Markov models and easily for many non-Markov models. For simplicity, we illustrate such an initial value condition by a √ discrete-time GARCH model: Xt = εt ht , where ht = ω +α ht −1 + β Xt2−1 and {εt } is i.i.d.N (0, 1). Here we have ϕ (u, t |It −1 , θ) = exp(− 21 u2 ht ), where It −1 = {Xt −1 , Xt −2 , . . .}. Furthermore, we Ď

have It −1 = {Xt −1 , Xt −2 , . . . , X1 , h¯ 0 }, where h¯ 0 is an initial value assumed for h0 . By recursive substitution, we obtain



Ď

var (Xt |It −1 , θ) − var Xt |It −1 , θ



=ω+β

t −2 −

α j Xt2−1−j

j =0

+ βα t −1 h0 − ω − β

t −2 −

α j Xt2−1−j − βα t −1 h¯ 0 .

j =0

It follows that sup

T −

u∈R t =1

 



Ď

2 

E sup ϕ (u, t |It −1 , θ) − ϕ u, t |It −1 , θ 

≤ sup

θ∈Θ

T −

u∈R t =1

E sup θ∈2

    2   1 − exp 1 u2 var (X |I , θ) − var X |IĎ , θ  t t − 1 t   t −1 2   1  ×  2 exp 2 u ht     T − u2 βα t −1 h0 − h¯ 0    ≤C ≤ sup E sup exp u2 ω θ∈2 u∈RN t =1 provided ω > 0, 0 < α, β < 1, α + β < 1 and E h¯ 0 < ∞. Assumption A.6 is the regularity condition on the kernel function. The continuity of k (·) at 0 and k (0) = 1 ensure that the bias of the generalized cross-spectral estimator Fˆ (ω, u, v) vanishes to zero asymptotically as T → ∞. The condition on the tail behavior of k (·) ensures that higher order lags have asymptotically negligible impact on the statistical properties of Fˆ (ω, u, v). Assumption A.6 covers most commonly used kernels. For kernels with bounded support, such as the Bartlett and Parzen kernels, b = ∞. For kernels with unbounded support, b is some finite positive real number. For example, b = 2 for the QuadraticSpectral kernel. Assumption A.7 imposes mild conditions on the function W (·). The CDF of any symmetric distribution with finite fourth moment satisfies Assumption A.7. Note that W (·) can be the CDF of continuous or discrete distribution and w(·) is its Radon–Nikodym 15 Suggested by Hansen and Scheinkman (1995) and Ait-Sahalia (1996a), one set of sufficient conditions for the β -mixing when N = 1 is: (i) limx→l or x→u σ (x, θ) π (x, θ) = 0; and (ii) limx→l or x→u |σ (x, θ) /{2µ (x, θ)−σ (x, θ) [∂σ (x, θ) / ∂ x]} < ∞, where l and u are left and right boundaries of Xt with possibly l = −∞ and/or u = +∞, and π (x, θ) is the model-implied marginal density.

275

derivative with respect to v . If W (·) is the CDF of some continuous distribution, v is Lebesgue measure; if W (·) is the CDF of some discrete distribution, v is some counting measure. This provides a convenient way to implement our tests, because we can avoid high-dimensional numerical integrations by using a finite number of grid points for u and v. This is equivalent to using the CDF of a discrete random vector. We now state the asymptotic distribution of the omnibus test Qˆ (0, 0) under H0 . Theorem 1. Suppose Assumptions A.1–A.7 hold, and p = cT λ for 1+δ < λ < (3 + 4b1−2 )−1 , where 0 < c , δ < ∞ and ν is defined νδ d

in Assumption A.4. Then Qˆ (0, 0) → N (0, 1) under H0 as T → ∞. An important feature of Qˆ (0, 0) is that the use of the estimated ˆ in place of the true unobservable generalized residuals {Zˆt (u, θ)} generalized residuals {Zt (u, θ0 )} has no impact on the limiting distribution of Qˆ (0, 0). One can proceed as if the true parameter value θ0 were known and equal to θˆ . Intuitively, the parametric estimator θˆ converges to θ0 faster than the nonparametric estimator Fˆ (ω, u, v) converges to F (ω, u, v) as T → ∞. Consequently, the limiting distribution of Qˆ (0, 0) is solely determined by Fˆ (ω, u, v), and replacing θ0 by θˆ has no impact √ asymptotically. This delivers a convenient procedure, because any T -consistent estimator can be used. We allow for weakly dependent data and data dependence has some impact on the feasible range of the bandwidth p. The condition on the tail behavior of the kernel function k(·) also has some impact. For kernels with bounded support (e.g., the Bartlett and 6 Parzen), λ < 13 because b = ∞. For the QS kernel (b = 2), λ < 19 . These conditions are mild. Next, we investigate the asymptotic power of Qˆ (0, 0) under HA . Theorem 2. Suppose Assumptions A.1–A.7 hold, and p = cT λ for 0 < λ < 12 and 0 < c < ∞. Then as T → ∞, 1

∞ ∫∫  −  1 Γj (u, v)2 dW (u) dW (v) Qˆ (0, 0) →p √ T D (0, 0) j=1

p2

∫∫∫ π π |F (ω, u, v) − F0 (ω, u, v)|2 = √ 2 D (0, 0) −π × dωdW (u) dW (v) , where D (0, 0) = 2





k (z ) dz 4

∫∫

|Σ0 (u1 , u2 )|2 dW (u1 ) dW (u2 )

0

∫∫ − ∞   Ωj (v1 , v2 )2 dW (v1 ) dW (v2 ) , × j=−∞

and Σ0 (u, v) cov(e

iu′ Xt

,e

     = cov Zt u, θ∗ , Zt v, θ∗ , and Ωj (u, v) = t −|j| ).

iv′ X

The function G (ω, u, v) is the generalized spectral density of the state vector {Xt }. It captures temporal dependence in {Xt }. The dependence of the constant D (0, 0) on G (ω, u, v) is due to the fact that the conditioning variable exp iv′ Xt −|j| is a time series process. Following Bierens (1982) and Stinchcombe and White (1998), N we have that for j > 0, Γ  j (u, v) = 0 for all u, vN ∈ R if ∗ and only if E Zt (u, θ )|Xt −j = 0 a.s. for all u ∈ R . Suppose E Zt (u, θ∗ )|Xt −j ̸= 0 at some lag j > 0 under HA . Then we have

  2   Γj (u, v) dW (u) dW (v) ≥ C > 0 for any function W (·)

that is positive, monotonically increasing and continuous, with unbounded support on RN . As a result, P [Qˆ (0, 0) > C (T )] → 1 for

276

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

any sequence of constants {C (T ) = o(T /p1/2 )}. Thus Qˆ (0, 0) has asymptotic unit  power at any  given significance level α ∈ (0, 1), whenever E Zt (u, θ∗ )|Xt −j is nonzero at some lag j > 0 under HA . Note Markov process Xt , we always have   a multivariate  that for E Zt u, θ∗ |Xt −j ̸= 0 at least for some j > 0 under HA . Hence, Qˆ (0, 0) is consistent against HA when Xt is Markov. Gallant and Tauchen’s (1996) EMM test does not have this property. process Xt , the hypothesis that E   For a non-Markovian Zt (u, θ0 )|Xt −j = 0 a.s. for all u ∈ RN and some θ0 ∈ 2 and all j > 0 is not equivalent to the hypothesis that E [Zt (u, θ0 )|It −1 ] = 0 a.s. for all u ∈ RN and some θ0 ∈ 2. The latter implies the former but not vice versa. This is the price we have to pay for dealing with the difficulty of the ‘‘curse of dimensionality’’. Nevertheless, our test is expected to have power against a wide range of nonMarkovian processes, since we check many lag orders. The use of a large number of lags might cause the loss of power, due to the loss of a large number of degrees of freedom. Fortunately, such power loss is substantially alleviated for Qˆ (0, 0), thanks to the downward weighting by k2 (·) for higher order lags. Generally speaking, the state vector Xt is more affected by the recent events than the remote past events. In such scenarios, equal weighting to each lag is not expected to be powerful. Instead, downward weighting is expected to enhance better power because it discounts past information. Thus, we expect that the power of our test is not so sensitive to the choice of the lag order. This is confirmed by our simulation studies below. 5. Diagnostic testing When a multivariate time series model M is rejected by the omnibus test Qˆ (0, 0), it would be interesting to explore possible sources of the rejection. For example, one may like to know whether the misspecification comes from conditional mean/drift dynamics, conditional variance/diffusion dynamics, or conditional correlations between state variables. Such information, if any, will be valuable in reconstructing the model. The CCF is a convenient and useful tool to gauge possible sources of model misspecification. As is well known, the CCF can be differentiated to obtain conditional moments. We now develop a class of diagnostic tests by differentiating the generalized crossspectrum F (ω, u, v). This class of diagnostic tests can provide useful information about how well a multivariate time series model can capture the dynamics of various conditional moments and conditional cross-moments of state variables. Define an N × 1 index vector m = (m1 , m2 , . . . , mN )′ ,

∑N

=

c =1

(m,0)

(0, v) = cov

If m = (0, 1), then

Γj

(m,0)

   (0, v) = icov X2t − Eθ (X2t |It −1 ), exp iv′ Xt −|j| .

Thus, the choice of |m| = 1 can be used to check misspecifications in the conditional mean dynamics of X1t and X2t respectively. • Case 2: |m| = 2. We have m = (2, 0), (0, 2) or (1, 1). If m = (2, 0),

Γj

(m,0)

   (0, v) = i2 cov X1t2 − Eθ (X1t2 |It −1 ), exp iv′ Xt −|j| .

If m = (0, 2),

Γj

(m,0)

   (0, v) = i2 cov X2t2 − Eθ (X2t2 |It −1 ), exp iv′ Xt −|j| .

Finally, if m = (1, 1),

Γj

(m,0)

(0, v) = i2 cov    × X1t X2t − Eθ (X1t X2t |It −1 ) , exp iv′ Xt −|j| .

We now define the class of diagnostic test statistics as follows: Qˆ (m, 0) =

 T −1 −

∞ −

2π j=−∞

Γj

(m,0)

(0, v) exp (−ijω) ,

 N ∏

 ∫  2  ˆ (m,0)  ˆ ˆ (m, 0), × Γj (0, v) dW (v) − C (m, 0) D

c =1

 − Eθ

     ′  mc  (iXct )  It −1 , exp iv Xt −|j| .  c =1

N ∏

Here, as before, Eθ (·|It −1 ) is the conditional expectation under the model-implied transition density p (x, t |It −1 , θ).

(5.3)

where the centering and scaling factors Cˆ (m, 0) =

T −1 −

k2 (j/p)

j =1

1 T −j

∫  T 2   −   ˆ (m) ˆ 2  ˆ Zt (0, θ) ψt −j (v) dW (v) ,

×

t =|j|+1

T −2 − T −2 −

k2 (j/p)k2 (l/p)

j=1 l=1

∫ ∫  T   − 1  ˆ (m) ˆ 2  × Zt (0, θ)   T − max(j, l) t =max(j,l)+1 2   ˆ ˆ × ψt −j (v1 )ψt −j (v2 ) dW (v1 ) dW (v2 ) , 

(5.1)

(iXct )mc

k2 (j/p)(T − j)

j =1

  ∂ m1 ∂ mN  · · · F (ω, u , v ) m1 mN  ∂ uN ∂ u1 u =0 1

(5.2)

Thus, the choice of |m| = 2 can be used to check model misspecifications in the conditional volatility of state variables as well as their conditional correlations.

mc . Then

where the derivative of the generalized cross-covariance function

Γj

• Case 1: |m| = 1. We have m = (1, 0) or m = (0, 1). If m = (1, 0),    (m,0) Γj (0, v) = icov X1t − Eθ (X1t |It −1 ), exp iv′ Xt −|j| .

ˆ (m, 0) = 2 D

where mc ≥ 0 for all 1 ≤ c ≤ N, and put |m| = we define the generalized cross-spectral derivative F (0,m,0) (ω, 0, v) ≡

To gain insight into the generalized cross-spectral derivative F (0,m,0) (ω, 0, v), we consider a bivariate process Xt = (X1t , X2t )′ and examine the cases of |m| = 1 and |m| = 2 respectively:

and (m) ˆ = Zˆt (0, θ)

N ∏

(iXct )

c =1

mc

− Eθˆ

 N ∏

 (iXct )

mc

|ItĎ−1

.

c =1

To derive the limit distribution of Qˆ (m, 0) under H0 , we impose some moment conditions.

∑T (i) limT →∞ T −1 t =1 E supθ∈2  [ ]2  ∂ ∂ m1  ∂ mN    ∂θ ∂ um1 1 · · · ∂ umN N ϕ (u, t |It −1 , θ) |u=0  ≤ C ;

Assumption A.8.

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

(ii) limT →∞ T

∑T −1

E sup

θ∈2 t =1  [ ]2   ∂2 m m 1 N ∂ ∂    ∂θ∂θ′ ∂ um1 1 · · · ∂ umN N ϕ (u, t |It −1 , θ) |u=0  ≤ C ; ∑T (iii) limT →∞ T −1 t =1 E supθ∈2  4  ∂ m1   m · · · ∂ mmN ϕ (u, t |It −1 , θ) |u=0  ≤ C ; N  ∂ u1 1  ∂ uN   8(1+δ)mc (iv) E ΠcN=1 Xct ≤ C.

277

6. Finite sample performance It is unclear how well the asymptotic theory can provide reliable reference and guidance when applied to financial time series data, which is well known to display highly persistent serial dependence. We now investigate the finite sample performance of the proposed tests for the adequacy of a multivariate affine diffusion model. 6.1. Simulation design

m1 mN Assumption A.9. {Xt , ∂ m1 · · · ∂ mN ϕ (u, t |It −1 , θ0 ) ,

∂ ∂ m1 [ ∂θ ∂ um1

···

1

∂ mN m ∂ uN N

∂ u1

∂ uN

ϕ (u, t |It −1 , θ0 ) |u=0 ]} is a strictly stationary β -

mixing process with the mixing coefficient |β (l)| ≤ Cl−ν for some constant ν > 2. Ď

Assumption A.10. Let It be the observed feasible information set available at time t that may involve certain initial values. Then ∑T m1 mN limT →∞ t =1 E supθ∈2 | ∂ m1 · · · ∂ mN ϕ (u, t |It −1 , θ) |u=0



∂ m1 m ∂ u1 1

···

∂ mN m ∂ uN N

∂ u1

∂ uN

ϕ(u, t |ItĎ−1 , θ)|u=0 |2 ≤ C .

Theorem 3. Suppose Assumptions A.1, A.3 and A.6–A.10 hold for +δ some pre-specified m and p = cT λ for 1νδ < λ < (3 + 4b1−2 )−1 , where 0 < c , δ < ∞ and ν is defined in Assumption A.9. Then d

Qˆ (m, 0) → N (0, 1) under H0 as T → ∞. Like the omnibus test Qˆ (0, 0), Qˆ (m, 0) has a convenient asymptotic N (0, 1) distribution and parameter estimation uncertainty√in θˆ has no impact on the asymptotic distribution of Qˆ (m, 0). Any T -consistent estimator can be used. Moreover, different choices of m can examine various specific dynamic aspects of the state vector Xt and thus provide information on how well a multivariate time series model fits various aspects of the conditional distribution of Xt .16 It should be pointed out that the generalized cross-spectral derivative tests may fail to detect certain specific model misspecifications. In particular, they will fail to detect some time-invariant model misspecifications. For example, suppose a time series model assumes a zero conditional correlation between state variables, while there exists a nonzero but time-invariant conditional correlation. Then the derivative of the generalized covariance in (5.2) is identically zero. In this case, the correlation test will fail to detect such time-invariant conditional correlation. Of course, the tests Qˆ (m, 0) will generally have power against time-varying misspecifications. These diagnostic tests are designed to test specification of various conditional moments, i.e., whether the conditional moments of Xt are correctly specified given the discrete sample information. We note that the first two conditional moments differ from the instantaneous conditional mean (drift) and instantaneous conditional variance (squared diffusion). In general, the conditional moments tested here are functions of drift, diffusion and jump (see, e.g., Eqs. (6.2) and (6.3) in Section 6). Therefore, careful interpretation is needed. However, our simulation studies below show that the tests Qˆ (m, 0) with |m| = 1 and 2 are very indicative of drift and diffusion misspecifications respectively.

To examine the size of Qˆ (m, 0) under H0 , we consider the following DGP:

• DGP0 (Uncorrelated Gaussian Diffusion):      κ11 0 0 θ1 − X1t X1t θ2 − X2t dt 0 κ22 0 d X2t = θ3 − X3t X3t 0 0 κ33     σ11 0 0 W1t + 0 σ22 0 d W2t . W3t 0 0 σ33

We set (κ11 , κ22 , κ33 , θ1 , θ2 , θ3 , σ11 , σ22 , σ33 ) = (0.5, 1, 2, 0, 0, 0, 1, 1, 1). We use the Euler scheme to simulate 1000 data sets of the random sample {Xt }Tt =11 at the monthly frequency for T = 250, 500 and 1000 respectively. These sample sizes correspond to about twenty to one hundred years of monthly data. Each simulated sample path is generated using 120 intervals per month. We then discard 119 out of every 120 observations, obtaining discrete observations at the monthly frequency. For each data set, we use MLE to estimate model (6.1), with no restrictions on the intercept coefficients. All data are generated using the Matlab 6.1 random number generator on a PC. With a diagonal matrix κ = diag(κ11 , κ22, κ33 ), DGP0 is an uncorrelated 3-factor Gaussian diffusion process. As shown in Duffee (2002), the Gaussian diffusion model has analytic expressions for the conditional mean and the conditional variance respectively: E (Xt |Xs ) = I − e−κ(t −s) θ + e−κ(t −s) Xs ,



var(Xt |Xs ) =



t



(6.2)

e−κ(t −m) 66′ e−κ (t −m) dm ′

s

2 σ11 σ2 [1 − e2κ11 (s−t ) ], 22 2κ11 2κ22  2 σ33 2κ22 (s−t ) × [1 − e ], [1 − e2κ33 (s−t ) ] , (6.3) 2κ33   where θ = (θ1 , θ2 , θ3 )′ and 6 = diag σ11, σ22 , σ33 . To investigate the power of Qˆ (m, 0) in distinguishing model



= diag

(6.1) from alternative processes, we also generate data from five affine diffusion processes respectively:

• DGP1 [Correlated Gaussian Diffusion, with Constant Correlation in Drift]: X1t d X2t X3t





 =

0.5 −0.5 0.5

 16 In the literature, there have been diagnostic tests based on PITs to gauge possible sources of model rejection (e.g., Diebold et al. (1998), Hong and Li (2005)). As pointed out by Bontemps and Meddahi (2010), however, the diagnostic tests based on the PIT have some drawback: when one rejects a specification of transformed variables, it is difficult to know how to modify the model for original state variables to obtain a correct specification. However, our tests are directly based on state variables and thus avoid this drawback.

(6.1)

+

1 0 0

0 1 0.5 0 1 0

  −X1t −X2t dt −X3t    0 0 2

0 W1t 0 d W2t 1 W3t

;

• DGP2 [Uncorrelated CIR (Cox et al., 1985) Diffusion]:      0.5 0 0 2 − X1t X1t 0 1 0 1 − X2t dt d X2t = X3t 0 0 2 1 − X3t

(6.4)

278

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293



X1t

+ 0

   0 W1t  d W2t ; 0

0

X2t

0

W3t

X3t

0

(6.5)

• DGP3 [Correlated Gaussian Diffusion, with Constant Correlation in Diffusion]: X1t d X2t X3t





 =

0.5 0 0

 +

0 1 0

1 0.5 0.5

  −X1t −X2t dt −X3t   

0 0 2

0 1 0.5

0 W1t 0 d W2t W3t 1

;

(6.6)

• DGP4 [Correlated CIR Diffusion]:      0.5 0 0 2 − X1t X1t 0 1 0 1 − X2t dt d X2t = 1 − X3t X3t 0 0 2      X1t 0 W1t 0    + 0.5X1t d W2t . X2t  0 W3t 0.5 X1t 0.5 X2t X3t • DGP5 [Mixture of Gaussian and CIR Processes]:      X1t 0.5 0 0 2 − X1t 0 1 0 1 − X2t dt d X2t = X3t 0 0 2 −X3t      W1t X1t 0 0  + 0.5 X X 0 d W2t . 1t

0

2t

0

1

specified, but there are dynamic misspecifications in conditional variances and conditional covariances of state variables. DGP5 is a mixture of Gaussian and CIR diffusion processes, where {X1t , X2t } is a correlated 2-factor CIR diffusion process and {X3t } is a univariate Gaussian process. Under DGP5, model (6.1) is misspecified for the conditional variances of X1t and X2t and the conditional covariance between X1t and X2t . The conditional means of X1t , X2t and X3t and the conditional covariances between X1t and X3t , X2t and X3t are correctly specified. For each of DGPs 1–5, we generate 500 data sets of the random sample {Xt }Tt =11 by the Euler scheme, for T = 250, 500 and 1000 respectively at the monthly sample frequency. For each data set, we estimate model (6.1) via MLE. Because model (6.1) is misspecified under all five DGPs, our omnibus test Qˆ (0, 0) is expected to have nontrivial power under DGPs 1–5, provided the sample size T is sufficiently large. We will also examine how diagnostic tests Qˆ (m, 0) for |m| > 0 can reveal information about various model misspecifications. 6.2. Monte Carlo evidence

(6.7)

(6.8)

W3t

With a nondiagonal matrix κ, DGP1 is a correlated 3-factor Gaussian diffusion process, a special case of the canonical A0 (3) defined in Dai and Singleton (2000). Under DGP1, model (6.1) is misspecified for the drift function but is correctly specified for the diffusion function. Eqs. (6.2) and (6.3) show that the conditional means and the conditional variances of X2t and X3t (but not X1t ), and the conditional covariances between X1t , X2t and X3t are misspecified, if model (6.1) is used to fit the data generated from DGP1. However, the misspecifications in the conditional variances and conditional correlations of Xt are time-invariant. DGP2 is an uncorrelated 3-factor CIR diffusion process, which has a non-central χ 2 transition distribution. Under DGP2, model (6.1) is correctly specified for the drift function but is misspecified for the diffusion function because it fails to capture the ‘‘level effect’’. If model (6.1) is used to fit the data generated from DGP2, the conditional means and conditional covariances of X1t , X2t and X3t are correctly specified, but their conditional variances are misspecified. DGP3 is another correlated Gaussian diffusion process, where the correlations between state variables come from diffusions rather than drifts. Under DGP3, model (6.1) is correctly specified for the drift function but is misspecified for the diffusion function. If model (6.1) is used to fit data generated from DGP3, the conditional means of X1t , X2t and X3t are correctly specified, but the conditional variances of X2t and X3t , and the conditional covariances between X1t , X2t and X3t are misspecified. Specifically, model (6.1) assumes zero conditional correlations between state variables, whereas there exist nonzero but time-invariant conditional correlations between state variables under DGP3. DGP4 is a correlated 3-factor CIR diffusion process, where the conditional variances and covariances of state variables depend on state variables. If we use model (6.1) to fit data generated from DGP4, the conditional means of X1t , X2t and X3t are correctly

To reduce computational costs, we generate u and v from a N (0, I3 ) distribution, with each u and v having 30 grid points in R3 respectively. We use the Bartlett kernel, which has a bounded support and is computationally efficient. Our simulation experience suggests that the choices of W (·) and k (·) have little impact on both size and power of the Qˆ (m, 0) tests.17 Like Hong (1999), we use a data-driven pˆ via a plug-in method that minimizes the asymptotic integrated mean squared error of the generalized cross-spectral Fˆ (ω, u, v), with the Bartlett kernel k¯ (·) used in some preliminary generalized cross-spectral estimators. To examine the sensitivity of the choice of the preliminary bandwidth p¯ on the size and power of the tests, we consider p¯ in the range of 10–40.18 Table 1 reports the rejection rates (in terms of percentage) of Qˆ (m, 0) under DGP0 at the 10% and 5% significance levels, using the asymptotic theory. The omnibus Qˆ (0, 0) test tends to underreject when T = 250, but it improves as T increases. The size Qˆ (0, 0) is not very sensitive to the preliminary lag order p. For example, when T = 250, the rejection rate at the 5% level attains its maximum 3.8% at p = 10, and attains its minimum 2.5% at p = 31, 33, 34. For T = 1000, the rejection rate at the 5% level attains its maximum 6.2% at p = 17, 18, 19, 35, and attains its minimum 5.6% at p¯ = 11. We also consider the diagnostic tests Qˆ (m, 0) for |m| = 1, 2, which check model misspecifications in conditional means, conditional variances and conditional correlations of state variables. The Qˆ (m, 0) tests have similar size patterns to Qˆ (0, 0), except that some of them tend to overreject a bit when T = 1000. Overall, both omnibus and diagnostic tests have reasonable sizes at both the 10% and 5% levels and the sizes are robust to the choice of the preliminary lag order p¯ . Next, we turn to examine the power of Qˆ (m, 0). Tables 2–6 report the rejection rates of Qˆ (m, 0) under DGPs 1–5 at the 10% and 5% levels respectively. Under DGP1, model (6.1) ignores nonzero constant correlations in drifts. The omnibus test Qˆ (0, 0) is able to detect such model misspecification. The rejection rate of Qˆ (0, 0) is about 65% at the 5% level when T = 1000. Because only the misspecifications of the conditional means in X2t and X3t are time-varying when model (6.1) is used to fit the data from DGP1, we expect that the mean tests, Qˆ ((0, 1, 0)′ , 0) and √



17 We have tried the Parzen kernel for k (·) and the Uniform(−2 3, 2 3) CDF for W (·), obtaining similar results. 18 We have tried p¯ in the range of 1–10 and results are similar.

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

279

Table 1 Sizes of specification tests under DGP0. Lag order

10

T

250

20

α Qˆ (0, 0)

.10 .056

.05 .038

.10 .072

.05 .045

.10 .088

.05 .056

.10 .056

.05 .033

.10 .087

.05 .047

.10 .086

.05 .061

500

1000

250

500

1000

Qˆ 1

Qˆ ((1, 0, 0), 0) Qˆ ((0, 1, 0), 0) Qˆ ((0, 0, 1), 0)

.072 .075 .085

.035 .043 .053

.090 .073 .074

.068 .046 .048

.078 .087 .076

.051 .052 .046

.057 .071 .078

.037 .036 .046

.084 .080 .073

.060 .048 .043

.080 .085 .075

.045 .057 .045

Qˆ 2

Qˆ ((2, 0, 0), 0) Qˆ ((0, 2, 0), 0) Qˆ ((0, 0, 2), 0)

.069 .066 .066

.044 .034 .035

.077 .092 .106

.050 .049 .065

.097 .114 .104

.066 .075 .064

.058 .050 .057

.033 .024 .031

.064 .080 .099

.036 .038 .058

.100 .120 .105

.065 .072 .065

Qˆ 3

Qˆ ((1, 1, 0), 0) Qˆ ((0, 1, 1), 0) Qˆ ((1, 0, 1), 0)

.090 .079 .098

.057 .049 .052

.096 .094 .080

.057 .063 .049

.106 .115 .106

.068 .075 .075

.078 .071 .076

.049 .044 .038

.089 .090 .075

.054 .061 .046

.113 .111 .107

.075 .072 .076

Lag order

30

Qˆ (0, 0)

.064

.026

.082

.049

.082

.059

.057

40 .029

.086

.047

.093

.059

Qˆ 1

Qˆ ((1, 0, 0), 0) Qˆ ((0, 1, 0), 0) Qˆ ((0, 0, 1), 0)

.059 .064 .066

.033 .034 .039

.080 .072 .075

.048 .048 .044

.073 .079 .073

.045 .045 .047

.060 .064 .059

.034 .031 .031

.083 .074 .079

.043 .051 .036

.068 .073 .082

.041 .052 .041

Qˆ 2

Qˆ ((2, 0, 0), 0) Qˆ ((0, 2, 0), 0) Qˆ ((0, 0, 2), 0)

.049 .041 .052

.028 .019 .019

.061 .070 .088

.031 .036 .055

.092 .114 .101

.055 .066 .054

.043 .035 .042

.020 .016 .020

.059 .062 .077

.025 .030 .050

.090 .104 .093

.049 .061 .054

Qˆ 3

Qˆ ((1, 1, 0), 0) Qˆ ((0, 1, 1), 0) Qˆ ((1, 0, 1), 0)

.066 .073 .067

.040 .039 .035

.091 .085 .063

.050 .052 .042

.111 .108 .103

.077 .066 .062

.059 .074 .066

.032 .039 .031

.085 .073 .059

.042 .048 .039

.108 .107 .088

.071 .063 .059

Notes: (i) DGP0 is an uncorrelated Gaussian diffusion process, given in Eq. (6.1); (ii) Qˆ (0, 0) is the omnibus test; Qˆ 1 , Qˆ 2 and Qˆ 3 are conditional mean tests, conditional variance tests and conditional correlation tests respectively; (iii) The p values are based on the results of 1000 iterations.

Table 2 Powers of specification tests under DGP1. Lag order

10

T

250

20

α Qˆ (0, 0)

.10 .120

.05 .078

.10 .304

.05 .214

.10 .746

.05 .674

.10 .122

.05 .082

.10 .282

.05 .212

.10 .722

.05 .650

500

1000

250

500

1000

Qˆ 1

Qˆ ((1, 0, 0), 0) Qˆ ((0, 1, 0), 0) Qˆ ((0, 0, 1), 0)

.096 .226 .616

.064 .158 .498

.060 .432 .914

.040 .352 .886

.078 .834 1.00

.052 .776 .998

.086 .198 .570

.046 .122 .468

.070 .422 .906

.044 .320 .850

.068 .830 1.00

.052 .770 .998

Qˆ 2

Qˆ ((2, 0, 0), 0) Qˆ ((0, 2, 0), 0) Qˆ ((0, 0, 2), 0)

.072 .066 .072

.042 .036 .050

.098 .096 .098

.058 .052 .060

.120 .126 .124

.082 .096 .088

.066 .058 .064

.028 .026 .038

.086 .082 .086

.042 .052 .048

.114 .130 .112

.074 .090 .080

Qˆ 3

Qˆ ((1, 1, 0), 0) Qˆ ((0, 1, 1), 0) Qˆ ((1, 0, 1), 0)

.078 .082 .092

.060 .040 .050

.098 .088 .108

.068 .056 .072

.102 .138 .112

.070 .086 .068

.066 .078 .084

.046 .042 .046

.106 .080 .100

.068 .060 .062

.106 .142 .106

.068 .100 .078

Lag order

30

Qˆ (0, 0)

.112

.072

.272

.200

.690

.618

.114

.064

.260

.200

.656

.568

.078 .170 .530

.044 .096 .430

.078 .388 .876

.058 .284 .832

.064 .796 1.00

.052 .736 .996

.078 .156 .500

.050 .084 .384

.088 .366 .856

.060 .264 .814

.068 .772 .998

.040 .688 .992

.060 .054 .062

.024 .026 .030

.072 .084 .072

.036 .042 .046

.106 .126 .104

.066 .084 .070

.054 .050 .052

.024 .022 .034

.056 .080 .070

.030 .040 .038

.096 .112 .100

.052 .074 .070

.066 .072 .074

.030 .036 .048

.098 .082 .088

.070 .058 .058

.104 .142 .102

.064 .098 .074

.050 .068 .078

.022 .034 .048

.102 .076 .090

.062 .056 .050

.106 .136 .108

.066 .096 .066

Qˆ 1

Qˆ 2

Qˆ 3

Qˆ ((1, 0, 0), 0) Qˆ ((0, 1, 0), 0) Qˆ ((0, 0, 1), 0) Qˆ ((2, 0, 0), 0) Qˆ ((0, 2, 0), 0) Qˆ ((0, 0, 2), 0) Qˆ ((1, 1, 0), 0) Qˆ ((0, 1, 1), 0) Qˆ ((1, 0, 1), 0)

40

Notes: (i) DGP1 is a correlated Gaussian diffusion process with correlation in drift, given in Eq. (6.4); (ii) Qˆ (0, 0) is the omnibus test; Qˆ 1 , Qˆ 2 and Qˆ 3 are conditional mean tests, conditional variance tests and conditional correlation tests respectively; (iii) The p values are based on the results of 500 iterations.

280

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

Table 3 Powers of specification tests under DGP2. Lag order

10

T

250

20

α Qˆ (0, 0)

.10 .204

.05 .150

.10 .412

.05 .360

.10 .832

.05 .770

.10 .186

.05 .116

.10 .390

.05 .320

.10 .802

.05 .748

500

1000

250

500

1000

Qˆ 1

Qˆ ((1, 0, 0), 0) Qˆ ((0, 1, 0), 0) Qˆ ((0, 1, 0), 0)

.050 .058 .070

.024 .034 .046

.056 .066 .066

.024 .040 .044

.044 .048 .058

.032 .034 .038

.046 .048 .074

.024 .028 .054

.048 .054 .066

.020 .026 .036

.060 .048 .068

.038 .030 .042

Qˆ 2

Qˆ ((2, 0, 0), 0) Qˆ ((0, 2, 0), 0) Qˆ ((0, 0, 2), 0)

.952 .958 .764

.922 .924 .678

1.00 1.00 .980

.996 1.00 .970

1.00 1.00 1.00

1.00 1.00 1.00

.932 .922 .680

.900 .882 .576

1.00 1.00 .968

.996 1.00 .942

1.00 1.00 1.00

1.00 1.00 1.00

Qˆ 3

Qˆ ((1, 1, 0), 0) Qˆ ((0, 1, 1), 0) Qˆ ((1, 0, 1), 0)

.080 .080 .072

.044 .050 .038

.082 .082 .094

.050 .062 .064

.080 .094 .106

.056 .066 .075

.062 .066 .062

.038 .040 .034

.062 .080 .078

.038 .048 .042

.086 .090 .102

.060 .058 .070

Lag order

30

Qˆ (0, 0)

.174

.110

.360

.296

.768

.694

40 .156

.088

.340

.278

.724

.660

Qˆ 1

Qˆ ((1, 0, 0), 0) Qˆ ((0, 1, 0), 0) Qˆ ((0, 0, 1), 0)

.038 .048 .078

.024 .024 .050

.042 .054 .052

.024 .032 .036

.058 .048 .068

.026 .024 .044

.042 .044 .070

.018 .028 .046

.044 .056 .050

.020 .030 .028

.054 .046 .072

.030 .026 .042

Qˆ 2

Qˆ ((2, 0, 0), 0) Qˆ ((0, 2, 0), 0) Qˆ ((0, 0, 2), 0)

.916 .910 .600

.872 .850 .512

.998 1.00 .942

.992 .996 .886

1.00 1.00 1.00

1.00 1.00 1.00

.896 .882 .560

.858 .806 .444

.998 .996 .892

.992 .996 .832

1.00 1.00 1.00

1.00 .998 .998

Qˆ 3

Qˆ ((1, 1, 0), 0) Qˆ ((0, 1, 1), 0) Qˆ ((1, 0, 1), 0)

.060 .060 .048

.038 .036 .034

.066 .076 .068

.038 .046 .052

.086 .080 .094

.062 .054 .070

.058 .058 .044

.028 .034 .028

.060 .076 .068

.030 .040 .050

.088 .072 .088

.058 .054 .066

Notes: (i) DGP2 is an uncorrelated CIR diffusion process, given in Eq. (6.5); (ii) Qˆ (0, 0) is the omnibus test; Qˆ 1 , Qˆ 2 and Qˆ 3 are conditional mean tests, conditional variance tests and conditional correlation tests respectively; (iii) The p values are based on the results of 500 iterations.

Table 4 Powers of specification tests under DGP3. Lag order

10

T

250

20

α Qˆ (0, 0)

.10 .468

.05 .366

.10 .868

.05 .792

.10 1.00

.05 1.00

.10 .406

.05 .310

.10 .804

.05 .714

.10 .992

.05 .980

500

1000

250

500

1000

Qˆ 1

Qˆ ((1, 0, 0), 0) Qˆ ((0, 1, 0), 0) Qˆ ((0, 1, 0), 0)

.032 .082 .092

.014 .056 .054

.062 .080 .102

.030 .052 .064

.054 .072 .084

.030 .038 .056

.036 .090 .086

.024 .054 .052

.044 .076 .098

.022 .050 .068

.048 .072 .092

.032 .044 .066

Qˆ 2

Qˆ ((2, 0, 0), 0) Qˆ ((0, 2, 0), 0) Qˆ ((0, 0, 2), 0)

.074 .080 .094

.060 .052 .060

.076 .122 .096

.040 .088 .060

.128 .116 .084

.078 .072 .052

.066 .068 .084

.038 .040 .040

.076 .110 .072

.036 .080 .042

.122 .094 .074

.084 .056 .046

Qˆ 3

Qˆ ((1, 1, 0), 0) Qˆ ((0, 1, 1), 0) Qˆ ((1, 0, 1), 0)

.080 .082 .094

.044 .044 .052

.124 .094 .118

.078 .068 .082

.094 .112 .088

.056 .074 .058

.056 .054 .078

.042 .032 .042

.126 .102 .100

.072 .062 .064

.086 .102 .092

.050 .070 .064

Lag order

30

Qˆ (0, 0)

.368

.248

.748

.656

.974

.962

40 .318

.226

.706

.606

.958

.938

Qˆ 1

Qˆ ((1, 0, 0), 0) Qˆ ((0, 1, 0), 0) Qˆ ((0, 1, 0), 0)

.032 .080 .080

.020 .036 .044

.050 .072 .098

.022 .040 .068

.048 .084 .092

.034 .046 .060

.034 .078 .066

.022 .040 .040

.054 .070 .094

.022 .040 .070

.050 .084 .084

.032 .052 .054

Qˆ 2

Qˆ ((2, 0, 0), 0) Qˆ ((0, 2, 0), 0) Qˆ ((0, 0, 2), 0)

.056 .062 .058

.028 .028 .024

.068 .104 .070

.028 .066 .040

.108 .082 .074

.070 .050 .042

.044 .054 .044

.022 .022 .012

.062 .090 .070

.026 .060 .032

.102 .078 .066

.072 .046 .044

Qˆ 3

Qˆ ((1, 1, 0), 0) Qˆ ((0, 1, 1), 0) Qˆ ((1, 0, 1), 0)

.052 .050 .064

.030 .026 .032

.114 .096 .098

.072 .048 .054

.072 .084 .082

.038 .060 .052

.052 .044 .060

.024 .026 .028

.102 .076 .092

.062 .042 .052

.070 .086 .082

.038 .046 .050

Notes: (i) DGP3 is a correlated Gaussian diffusion process with correlation in diffusion, given in Eq. (6.6); (ii) Qˆ (0, 0) is the omnibus test; Qˆ 1 , Qˆ 2 and Qˆ 3 are conditional mean tests, conditional variance tests and conditional correlation tests respectively; (iii) The p values are based on the results of 500 iterations.

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

281

Table 5 Powers of specification tests under DGP4. Lag order

10

T

250

20

α Qˆ (0, 0)

.10 .946

.05 .924

.10 .994

.05 .990

.10 1.00

.05 1.00

.10 .914

.05 .886

.10 .988

.05 .988

.10 1.00

.05 1.00

500

1000

250

500

1000

Qˆ 1

Qˆ ((1, 0, 0), 0) Qˆ ((0, 1, 0), 0) Qˆ ((0, 1, 0), 0)

.036 .068 .092

.026 .052 .058

.052 .074 .094

.034 .044 .070

.114 .090 .078

.070 .064 .054

.036 .078 .090

.022 .044 .048

.060 .056 .076

.026 .040 .056

.090 .088 .086

.062 .058 .044

Qˆ 2

Qˆ ((2, 0, 0), 0) Qˆ ((0, 2, 0), 0) Qˆ ((0, 0, 2), 0)

.982 .958 .906

.970 .930 .858

1.00 1.00 1.00

1.00 .996 1.00

1.00 1.00 1.00

1.00 1.00 1.00

.972 .944 .846

.942 .896 .780

1.00 .998 .998

1.00 .996 .990

1.00 1.00 1.00

1.00 1.00 1.00

Qˆ 3

Qˆ ((1, 1, 0), 0) Qˆ ((0, 1, 1), 0) Qˆ ((1, 0, 1), 0)

.834 .832 .800

.768 .768 .722

.992 .986 .978

.980 .974 .956

1.00 1.00 1.00

1.00 1.00 1.00

.800 .786 .762

.712 .716 .682

.982 .978 .970

.972 .968 .952

1.00 1.00 1.00

1.00 1.00 1.00

Lag order

30

Qˆ (0, 0)

.892

.844

.988

.982

1.00

1.00

.868

40 .820

.988

.974

1.00

1.00

Qˆ 1

Qˆ ((1, 0, 0), 0) Qˆ ((0, 1, 0), 0) Qˆ ((0, 1, 0), 0)

.034 .066 .080

.018 .028 .044

.050 .056 .072

.022 .038 .042

.078 .088 .086

.056 .054 .038

.030 .048 .064

.014 .024 .034

.046 .056 .070

.014 .038 .040

.074 .084 .080

.044 .050 .038

Qˆ 2

Qˆ ((2, 0, 0), 0) Qˆ ((0, 2, 0), 0) Qˆ ((0, 0, 2), 0)

.954 .916 .802

.926 .854 .720

1.00 .996 .990

.996 .996 .976

1.00 1.00 1.00

1.00 1.00 1.00

.940 .888 .762

.912 .798 .668

.998 .996 .986

.996 .990 .960

1.00 1.00 1.00

1.00 1.00 1.00

Qˆ 3

Qˆ ((1, 1, 0), 0) Qˆ ((0, 1, 1), 0) Qˆ ((1, 0, 1), 0)

.766 .742 .728

.672 .646 .640

.974 .974 .966

.954 .952 .942

1.00 1.00 1.00

1.00 1.00 1.00

.730 .698 .696

.630 .622 .590

.966 .964 .956

.942 .940 .930

1.00 1.00 1.00

1.00 1.00 1.00

Notes: (i) DGP4 is a correlated CIR diffusion process, given in Eq. (6.7); (ii) Qˆ (0, 0) is the omnibus test; Qˆ 1 , Qˆ 2 and Qˆ 3 are conditional mean tests, conditional variance tests and conditional correlation tests respectively; (iii) The p values are based on the results of 500 iterations.

Table 6 Powers of specification tests under DGP5. Lag order

10

T

250

20

α Qˆ (0, 0)

.10 .438

.05 .366

.10 .726

.05 .676

.10 .970

.05 .960

.10 .458

.05 .366

.10 .728

.05 .670

.10 .970

.05 .958

500

1000

250

500

1000

Qˆ 1

Qˆ ((1, 0, 0), 0) Qˆ ((0, 1, 0), 0) Qˆ ((0, 1, 0), 0)

.070 .062 .078

.044 .042 .042

.072 .054 .064

.046 .026 .038

.098 .058 .056

.072 .038 .036

.068 .068 .066

.032 .036 .042

.064 .052 .070

.044 .030 .046

.092 .058 .072

.060 .038 .038

Qˆ 2

Qˆ ((2, 0, 0), 0) Qˆ ((0, 2, 0), 0) Qˆ ((0, 0, 2), 0)

.938 .974 .076

.902 .946 .048

1.00 1.00 .106

.998 1.00 .062

1.00 1.00 .116

1.00 1.00 .078

.918 .946 .066

.888 .910 .038

1.00 1.00 .096

1.00 1.00 .054

1.00 1.00 .116

1.00 1.00 .080

Qˆ 3

Qˆ ((1, 1, 0), 0) Qˆ ((0, 1, 1), 0) Qˆ ((1, 0, 1), 0)

.750 .094 .072

.648 .062 .044

.982 .096 .112

.964 .064 .076

1.00 .100 .106

1.00 .072 .070

.714 .072 .062

.610 .054 .034

.978 .088 .110

.952 .052 .068

1.00 .108 .110

1.00 .066 .072

Lag order

30

Qˆ (0, 0)

.444

.350

.718

.652

.964

.948

.430

40 .338

.700

.634

.968

.950

Qˆ 1

Qˆ ((1, 0, 0), 0) Qˆ ((0, 1, 0), 0) Qˆ ((0, 1, 0), 0)

.056 .066 .066

.030 .036 .030

.054 .050 .070

.036 .036 .034

.082 .054 .072

.054 .032 .038

.052 .058 .070

.028 .028 .030

.058 .048 .062

.030 .034 .032

.076 .050 .066

.050 .030 .036

Qˆ 2

Qˆ ((2, 0, 0), 0) Qˆ ((0, 2, 0), 0) Qˆ ((0, 0, 2), 0)

.904 .912 .060

.866 .858 .036

1.00 1.00 .096

.996 .996 .052

1.00 1.00 .108

1.00 1.00 .070

.890 .892 .052

.832 .814 .022

1.00 1.00 .086

.990 .996 .044

1.00 1.00 .104

1.00 1.00 .066

Qˆ 3

Qˆ ((1, 1, 0), 0) Qˆ ((0, 1, 1), 0) Qˆ ((1, 0, 1), 0)

.672 .068 .050

.572 .042 .028

.968 .082 .090

.936 .044 .062

1.00 .100 .096

1.00 .066 .066

.644 .058 .040

.522 .032 .020

.960 .066 .088

.912 .040 .052

1.00 .094 .088

1.00 .068 .064

Notes: (i) DGP5 is a mixture of Gaussian and CIR Processes, given in Eq. (6.8); (ii) Qˆ (0, 0) is the omnibus test; Qˆ 1 , Qˆ 2 and Qˆ 3 are conditional mean tests, conditional variance tests and conditional correlation tests respectively; (iii) The p values are based on the results of 500 iterations.

282

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

Qˆ ((0, 0, 1)′ , 0), will be powerful but the variance and covariance tests Qˆ (m, 0), where |m| = 2, will have no power. This is indeed confirmed in our simulation. Table 2 shows that the Qˆ ((0, 1, 0)′ , 0) and Qˆ ((0, 0, 1)′ , 0) tests are able to capture time-varying mean misspecifications in X2t and X3t , as their rejection rates are about 75% and 99% respectively at the 5% level when T = 1000.19 However, the rejection rates of all variance and covariance tests Qˆ (m, 0) for |m| = 2 are close to significance levels. Under DGP2, model (6.1) ignores the so-called ‘‘level effect’’ in diffusions. The omnibus test Qˆ (0, 0) has good power under DGP2, with a rejection rate of 75% at the 5% level when T = 1000. Note that the omnibus test Qˆ (0, 0) is less powerful than the conditional variance tests, because Qˆ (0, 0) has to check all possible directions whereas only the conditional variances are misspecified under DGP2. The variance tests Qˆ ((2, 0, 0)′ , 0), Qˆ ((0, 2, 0)′ , 0) and Qˆ ((0, 0, 2)′ , 0) have excellent power, which increases with T and approaches unity when T = 1000. Interestingly, the powers of the diagnostic tests for conditional means and conditional covariances are close to significance levels, indicating that these diagnostic tests do not overreject correctly specified conditional means and conditional covariances of state variables. Under DGP3, model (6.1) is correctly specified for both conditional means and conditional variances of state variables but is misspecified for conditional correlations between state variables, because it ignores the nonzero but time-invariant correlations in diffusions. As expected, the omnibus test Qˆ (0, 0) has excellent power when model (6.1) is used to fit the data generated from DGP3. The power of Qˆ (0, 0) increases significantly with the sample size T and approaches unity when T = 1000. However, our correlation diagnostic tests fail to capture the misspecifications in correlations, because they are time-invariant. Under DGP4, model (6.1) is correctly specified for conditional means of state variables but is misspecified for conditional variances and conditional correlations of state variables, because it ignores both the ‘‘level effect’’ and time-varying conditional correlations in diffusions. The omnibus test Qˆ (0, 0) has excellent power when (6.1) is used to fit data generated from DGP4. This is consistent with the fact that DGP4 deviates most from model (6.1). The Qˆ (m, 0) tests with |m| = 2 have good power in detecting misspecifications in conditional variances and correlations of state variables. The mean tests Qˆ (m, 0) with |m| = 1 have no power, because the conditional means of state variables are correctly specified. Under DGP5, model (6.1) is correctly specified for conditional means of state variables but is misspecified for conditional variances of X1t and X2t and the conditional correlation between X1t and X2t . Again, the omnibus test Qˆ (0, 0) has good power under DGP5, with a rejection rate of 95% at the 5% level when T = 1000. The variance tests Qˆ ((2, 0, 0)′ , 0), Qˆ ((0, 2, 0)′ , 0) and the covariance test Qˆ ((1, 1, 0)′ , 0) have excellent power, which increases with T and attains unity when T = 1000. The powers of other diagnostic tests are close to significance levels, because those conditional moments are correctly specified. To sum up, we observe:

• The Qˆ (m, 0) tests have reasonable sizes in finite samples. Although the omnibus test Qˆ (0, 0) tends to underreject when T = 250, it improves dramatically as the sample size T increases. The sizes of all tests Qˆ (m, 0) are robust to the choice 19 The power of the mean test Qˆ ((1, 0, 0) , 0) for the first state variable X is close 1t to the significance level. This is expected because the drift function of X1t is correctly specified.

of a preliminary lag order used to estimate the generalized cross-spectrum.

• The omnibus test Qˆ (0, 0) has good omnibus power in detecting various model misspecifications. It has reasonable power even when the sample size T is as small as 250. This demonstrates the nice feature of the proposed cross-spectral approach, which can capture various model misspecifications.

• The diagnostic tests Qˆ (m, 0) can check various specific aspects of model misspecifications. Generally speaking, the mean tests, Qˆ (m, 0), with |m| = 1, can detect misspecification in drifts; the variance and covariance tests Qˆ (m, 0), with |m| = 2, can check misspecifications in variances and covariances respectively. However, the correlation tests fail to detect neglected nonzero but time-invariant conditional correlations in diffusions. 7. Conclusion The CCF-based estimation of multivariate continuous-time models has attracted increasing attention in econometrics. We have complemented this literature by proposing a CCF-based omnibus specification test for the adequacy of a multivariate continuous-time model, which has not been attempted in the previous literature. The proposed test can be applied to a variety of univariate and multivariate continuous-time models, including those with jumps. It is also applicable to testing the adequacy of discrete-time dynamic multivariate distribution models. The most appealing feature of our omnibus test is that it fully exploits the information in the joint dynamics of state variables and thus can capture misspecification in modeling the joint dynamics, which may be easily missed by existing procedures. Indeed, when the underlying economic process is Markov, our omnibus test is consistent against any type of model misspecification. We assume that the DGP of state variables may not be Markov. This not only ensures the power of the proposed tests against a wider range of misspecification but also makes our approach applicable to testing models with latent variables, such as SV models. Our omnibus test is supplemented by a class of diagnostic procedures, which is obtained by differentiating the CCF and focuses on various specific aspects of the joint dynamics such as whether there exists neglected dynamics in conditional means, conditional variances, and conditional correlations of state variables respectively. Such information is useful for practitioners in reconstructing a misspecified model. Our procedures are most useful when the CCF of a multivariate time series model has a closed-form, as are the class of AJD models and the class of time-changed Lévy processes that have been commonly used in the literature. All test statistics follow a convenient asymptotic N (0, 1) distribution, and they are applicable to various estimation methods, including suboptimal consistent estimators. Moreover, parameter estimation uncertainty has no impact on the asymptotic distribution of the test statistics. Simulation studies show that the proposed tests perform reasonably well in finite samples. Acknowledgments We thank the co-editor, A. Ronald Gallant, and two referees for careful and constructive comments. We also thank Yacine AitSahalia, Zongwu Cai, Jianqing Fan, Jerry Hausman, Roger Klein, Chung-ming Kuan, Arthur Lewbel, Norm Swanson, Yiu Kuen Tse, Chunchi Wu and Zhijie Xiao, and seminar participants at Boston College, Rutgers University and Singapore Management University for their comments and discussions. Any remaining errors are solely ours.

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

Appendix. Mathematical appendix

ˆ T . Also, C ∈ (1, ∞) denotes a generic bounded sample {Zˆt (u, θ)} t =1 constant. Proof of Theorem 1. The proof of Theorem 1 consists of the proofs of Theorems A.1 and A.2 below.  Theorem A.1. Under the conditions of Theorem 1, Qˆ (0, 0) − p

Q˜ (0, 0) → 0. Theorem A.2. Under the conditions of Theorem 1 and q = p

(ln2 T )

1+ 4b1−2

d

, Q˜ (0, 0) → N (0, 1).

Proof of Theorem A.1. Put Tj ≡ T − |j|, and let Γ˜ j (u, v) be defined

ˆ replaced by in the same way as Γˆ j (u, v) in (3.4), with Zˆt (u, θ) p

Zt (u, θ0 ). To show Qˆ (0, 0) − Q˜ (0, 0) → 0, it suffices to show 1

ˆ − 2 (0, 0) D

∫∫ − T −1



k2 (j/p)Tj |Γˆ j (u, v)|2 − |Γ˜ j (u, v)|2

T   −   ˆ − Zt (u, θ0 ) + ϕ (v) − ϕˆ j (v) Tj−1 Zˆt (u, θ) t =j +1

= Bˆ 1j (u, v) + Bˆ 2j (u, v), say. (A.3)  ∑2 ∑T −1 2 2 |Bˆ aj (u, v)| dW (u) It follows that Aˆ 1 ≤ 2 a=1 j=1 k (j/p)Tj dW (v). Proposition A.1 follows from Lemmas A.1 and A.2 below, and p/T → 0.   ∑T −1 2 |Bˆ 1j (u, v)|2 dW (u)dW (v) = OP (1). Lemma A.1. j=1 k (j/p)Tj  ∑T −1 2 |Bˆ 2j (u, v)|2 dW (u)dW (v) = OP Lemma A.2. j=1 k (j/p)Tj (p/T ). We now show these lemmas. Throughout, we put aT (j) ≡ k2 (j/p)Tj−1 . Proof of Lemma A.1. A second order Taylor series expansion yields Bˆ 1j (u, v) = −(θˆ − θ0 )′ Tj−1

 1

− (θˆ − θ0 )′ Tj−1 2

p

× dW (u)dW (v) → 0,

(A.1)

1

ˆ (0, 0) − D˜ (0, 0)] = oP (1) C˜ (0, 0)] = OP (T − 2 ) and p−1 [D are straightforward. We note that it is necessary to obtain the 1

convergence rate OP (pT − 2 ) for Cˆ (0, 0) − C˜ (0, 0) to ensure that replacing Cˆ (0, 0) with C˜ (0, 0) has asymptotically negligible impact given p/T → 0. To show (A.1), we first decompose

  ∂2 ¯ ′ ϕ u, t |It −1 , θ ∂θ∂θ t =j +1 T −

−ϕ

Ď u, t |It −1 , θˆ



= −Bˆ 11j (u, v) − Bˆ 12j (u, v) + Bˆ 13j (u, v) ,

∫∫ − T −1 j =1

 

T −1 −

k2 (j/p) Tj

j =1

(A.4)

∫∫  2  ˆ B12j (u, v) dW (u) dW (v)  T

−1

T −

 2  ∂ sup sup   ∂θ∂θ′ N θ∈2

t =1 u∈R

 2   T −1  −  × ϕ (u, t |It −1 , θ)  αT (j)  j =1

(A.2)

×dW (u) dW (v) = OP (p/T ) ,

2 

k2 (j/p)Tj Γˆ j (u, v) − Γ˜ j (u, v)

where we made use of the fact that T −1 −

× dW (u)dW (v) , ∫∫ − T −1   Aˆ 2 = k2 (j/p)Tj Γˆ j (u, v) − Γ˜ j (u, v)

aT (j) =

j =1

T −1 −

k2 (j/p)Tj−1 = O(p/T ),

(A.5)

j=1

given p = cT λ for λ ∈ (0, 1), as shown in Hong (1999, (A.15), p. 1213). For the third term in (A.4), we have

j =1

× Γ˜ j (u, v)∗ dW (u) dW (v), where Re (Aˆ 2 ) is the real part of Aˆ 2 and Γ˜ j (u, v)∗ is the complex conjugate of Γ˜ j (u, v). Then, (A.1) follows from Propositions A.1 and A.2 below, and p → ∞ as T → ∞.  Proposition A.1. Under the conditions of Theorem 1, Aˆ 1 = OP (1). − 12

Proposition A.2. Under the conditions of Theorem 1, p ′

say,

for some θ¯ between θˆ and θ0 . For the second term in (A.4), we have

where Aˆ 1 =

  [ϕ u, t |It −1 , θˆ

]ψt −j (v)

j=1

× dW (u)dW (v) = Aˆ 1 + 2 Re(Aˆ 2 ),

T −

t =j +1



√ 4 ∫ ∫   ˆ ≤ C  T (θ − θ0 )

  k2 (j/p)Tj |Γˆ j (u, v)|2 − |Γ˜ j (u, v)|2

T − ∂ ϕ (u, t |It −1 , θ0 ) ψt −j (v) ∂θ t =j +1

× ψt −j (v) (θˆ − θ0 ) + Tj−1

1

ˆ (0, 0) − D˜ (0, 0)] = p−1 [Cˆ (0, 0) − C˜ (0, 0)] = OP (T − 2 ), and p−1 [D ˜ (0, 0) are defined in the same way as oP (1), where C˜ (0, 0) and D ˆ replaced by Zt (u, θ0 ). For ˆ (0, 0) in (3.8), with Zˆt (u, θ) Cˆ (0, 0) and D space, we focus on the proof of (A.1); the proofs for p−1 [Cˆ (0, 0) −



ˆ − Zt (u, θ0 ) ψt −j (v) Zˆt (u, θ)

t =j +1

j =1

∫∫ − T −1

T  −

Γˆ j (u, v) − Γ˜ j (u, v) = Tj−1

Throughout the appendix, we let Q˜ (0, 0) be defined in the same way as Qˆ (0, 0) in (3.8) with the unobservable generalized residual sample {Zt (u, θ0 )}Tt=1 replacing the estimated generalized residual

1 2b−1

283

T −1 − j =1

∫∫  2 ˆ  k (j/p) Tj B13j (u, v) dW (u) dW (v) 2

≤4

T −1 − j=1

∫∫  − T     aT (j) ϕ u, t |It −1 , θˆ t =j+1

p

Aˆ 2 → 0.

Proof of Proposition A.1. Put ψt (v) ≡ eiv Xt − ϕ (v) and ϕ(v) ≡ ′ E (eiv Xt ). Then straightforward algebra yields that for j > 0,

−ϕ



Ď u, t |It −1 , θˆ

  

2 dW (u) dW (v) = OP (p/T ) ,

where we have used Assumptions A.5–A.7.

(A.6)

284

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

= Bˆ 21j (u, v) + Bˆ 22j (u, v) ,

For the first term in Eq. (A.4), we have

∫∫  2 − ˆ  k2 (j/p) Tj B11j (u, v) dW (u) dW (v)

T −1 −

j=1

T −1 √ 2 −   ≤  T (θˆ − θ0 ) k2 (j/p)

∫∫  2 ˆ  B21j (u, v) dW (u) dW (v)

k2 (j/p) Tj

j =1

j =1

T −1 −



2 ∫∫  T     −1 − ∂ ϕ (u, t |It −1 , θ0 ) ψt −j (v) × Tj   ∂θ t =j+1 × dW (u) dW (v) = OP (1) ,

say.

For the first term, we have

T −1

k (j/p) Tj 2

∫∫

  ϕ (v) − ϕˆ j (v)2 T −1 j

j =1 T   2 −   ϕ (u, t |It −1 , θ0 ) − ϕ u, t |It −1 , θˆ 

×

t =j +1

(A.7)

∂ as is shown below: Put ηj (u, v) ≡ E [ ∂θ ϕ (u, t |It −1 , θ0 ) ψt −j (v)] ∂ = cov[ ∂θ ϕ (u, t |It −1 , θ0 ) , ψt −j (v)]. Then we have supu,v∈R2N Σj∞ =1

  ηj (u, v) ≤ C by Assumption A.4. Next, expressing the moments

by cumulants via well-known formulas (e.g., Hannan (1970, (5.1), p. 23), for real-valued processes), we can obtain

× dW (u) dW (v) = OP (p/T ) , given Assumption A.4 and

T   2  2 −     ϕ (u, t |It −1 , θ0 ) − ϕ u, t |It −1 , θˆ  ≤ Tj θˆ − θ0  Tj−1

t =j +1

 2 T    −1 − ∂  Tj E Tj ϕ (u, t |It −1 , θ0 ) ψt −j (v) − ηj (u, v)   ∂θ t =j+1

T −

×

 2 ∂   sup sup  ϕ (u, t |It −1 , θ)  = OP (1). ∂θ N θ∈2

t =1 u∈R

  Tj −   ∂   ⩽ cov ∂θ ϕ u, max(0, τ ) + 2|Imax(0,τ )+1 , θ0 ,

For the second term, we have T −1 −

τ =−Tj

 ′  ∂  ϕ u, max(0, τ ) + 2 − τ |Imax(0,τ )+1−τ , θ0   ∂θ

k2 (j/p) Tj

j =1

Tj −    ηj+|τ | (u, −v) ηj−|τ | (u, v) × |Ωτ (v, −v)| +



T −1 −

∫∫  2 ˆ  B22j (u, v) dW (u) dW (v)

k2 (j/p) Tj

∫∫

  ϕ (v) − ϕˆ j (v)2 T −1 j

 



j =1

τ =−Tj

T −

×

Tj

−   κj,|τ |,j+|τ | (u, v) + τ =−Tj

(A.8)

2 

Ď

sup sup ϕ (u, t |It −1 , θ) − ϕ u, t |It −1 , θ 

t =j+1 u∈R

⩽ C,

(A.9)

2  where we made use of the fact that E ϕ (v) − ϕˆ j (v) ≤ CTj−1

N

θ∈2

× dW (u) dW (v) = OP (p/T ) ,

(A.10)

given Assumption A.4, where κj,l,τ (v) is the fourth order cumulant ∂ of the joint distribution of the process { ∂θ ϕ (u, t |It −1 , θ0 ) , ψt −j (v),

 2 where we made use of the fact that E ϕ (v) − ϕˆ j (v) ≤ CTj−1

Consequently, from (A.5), (A.8), |k(·)| ≤ 1, and p/T → 0, we have

Proof of Proposition A.2. Given the decomposition in (A.3), we have

∂ ϕ ∂θ

T −1 − j=1

(u, t |It −1 , θ0 ) , ψt∗−j (v)}. See also (A.7) of Hong (1999, p. 1212).

2 ∫∫  T    −1 − ∂  k (j/p)E ϕ (u, t |It −1 , θ0 ) ψt −j (v) Tj   ∂θ t =j +1 2

× dW (u)dW (v) T −1 ∫ ∫ T −1 − −   ηj (u, v)2 dW (u)dW (v) + C ≤C aT (j) j =1

given Assumptions A.4–A.7. The desired result of Lemma A.2 follows from (A.9) and (A.10). 

2   −  ˆ  |Bˆ aj (u, v)||Γ˜ j (u, v)|, [Γj (u, v) − Γ˜ j (u, v)]Γ˜ j (u, v)∗  ≤ a=1

where the Bˆ aj (u, v) are defined in (A.3), a = 1, 2. We first consider the term with a = 2. By the Cauchy–Schwarz inequality, we have

j=1

= OP (1) + OP (p/T ) = OP (1). Hence (A.7) is OP (1). The desired result of Lemma A.1 follows from (A.5)–(A.7).  Proof of Lemma A.2. We first decompose Bˆ 2j (u, v) to

T −1 −

k (j/p)Tj 2

∫∫

|Bˆ 2j (u, v)||Γ˜ j (u, v)|dW (u)dW (v)

j =1

 ≤

T −1 −

k (j/p)Tj 2

∫∫

 12 2 ˆ |B2j (u, v)| dW (u)dW (v)

j =1

 Bˆ 2j (u, v) = ϕ (v) − ϕˆ j (v) Tj−1 T    − ϕ (u, t |It −1 , θ0 ) − ϕ u, t |It −1 , θˆ × 

t =j +1

t =j +1

− ϕ

Ď u, t |It −1 , θˆ

×

T −1 −



k (j/p)Tj 2

∫∫

 21 |Γ˜ j (u, v)| dW (u)dW (v) 2

j =1 1

T    −   + ϕ (v) − ϕˆ j (v) Tj−1 ϕ u, t |It −1 , θˆ





1

1

1

= OP (p 2 /T 2 )OP (p 2 ) = oP (p 2 ),

(A.11)

given Lemma A.2, and p/T → 0, where p

−1

∑T −1 j =1

k (j/p)Tj 2



|Γ˜ j (u, v)|2 dW (u)dW (v) = OP (1) by Markov’s inequality, the m.d.s. hypothesis of Zt (u, θ0 ), and (A.5).

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

For a = 1, by (A.4) and the triangular inequality, we have T −1 −

∫∫

k2 (j/p)Tj



 2   ≤ CT θˆ − θ0  T −1

|Bˆ 1j (u, v)‖Γ˜ j (u, v)|dW (u)dW (v)

j =1

×



T −1 −

k2 (j/p)Tj

∫∫

+

k (j/p)Tj 2

∫∫

k2 (j/p)Tj

T −1 −

j=1

∫∫ ×

k2 (j/p)Tj

j =1

|Bˆ 13j (u, v)‖Γ˜ j (u, v)|dW (u)dW (v) .

(A.12)

≤C

For the first term in (A.12), we have T −1 − j =1

t =j+1 u∈R

  Γ˜ j (u, v) dW (u)dW (v)

by the Cauchy–Schwarz inequality, Markov’s inequality, Assumptions A.2, A.3, A.6 and A.7, and E |Γ˜ j (u, v) |2 ≤ CTj−1 . For the third term in (A.12), we have

|Bˆ 12j (u, v)‖Γ˜ j (u, v)| T −1 −

k2 (j/p)

∫∫

  1 = OP p/T 2 ,

j =1

× dW (u)dW (v) +

T −1 − j =1

|Bˆ 11j (u, v)‖Γ˜ j (u, v)|dW (u)dW (v)

j =1 T −1 −

285

  2 T −   ∂  sup sup  ′ ϕ (u, t |It −1 , θ)  ∂θ∂θ N θ∈2

∫∫    ˆ  k (j/p)Tj B11j (u, v) Γ˜ j (u, v) dW (u)dW (v)

×

2

T −

∫∫    ˆ  B13j (u, v) Γ˜ j (u, v) dW (u)dW (v)  



Ď

 

sup sup ϕ (u, t |It −1 , θ) − ϕ u, t |It −1 , θ 

N t =1 u∈R θ∈2

T −1 −

k2 (j/p)

∫∫

  Γ˜ j (u, v) dW (u)dW (v)

j =1 1

= OP (p/T 2 ),

T −1  −   k2 (j/p)Tj ≤ θˆ − θ0  j =1

 ∫∫  T    −1 − ∂  × ϕ (u, t |It −1 , θ0 ) ψt −j (v) Tj   ∂θ t =j +1   × Γ˜ j (u, v) dW (u)dW (v) 1

1

= OP (1 + p/T 2 ) = oP (p 2 ), given p → ∞, p/T → 0, Assumptions A.2, A.3, A.6 and A.7, and Tj E |Γ˜ j (u, v)|2 ≤ C . Note that we have made use of the fact that

   T     −1 − ∂  ˜ E Tj ϕ (u, t |It −1 , θ0 ) ψt −j (v) Γj (u, v)   ∂θ t =j +1   2  21 T   − ∂   ≤ E Tj−1 ϕ (u, t |It −1 , θ0 ) ψt −j (v)    ∂θ t =j+1   2  12 × E Γ˜ j (u, v) [ ] 1   − − 12   Tj 2 , ≤ C ηj (u, v) + CTj

by the Cauchy–Schwarz inequality, Markov’s inequality, Assumptions A.2, A.3 and A.5–A.7, and E |Γ˜ j (u, v) |2 ≤ CTj−1 . Hence, we have T −1 −

k2 (j/p)Tj

∫∫

|Bˆ 1j (u, v)||Γ˜ j (u, v)|dW (u)dW (v)

j =1

      1 1 1 = OP 1 + p/T 2 + OP p/T 2 + OP p/T 2  1 = oP p 2 .

(A.13)

Combining (A.11) and (A.13) then yields the result of this proposition.  Proof of Theorem A.2. We first state a lemma, which will be used in the proof.  n Lemma A.3. Suppose that Im are the σ -fields generated by a stationary β -mixing process Xi with mixing coefficient β (i). For some t positive integers m let yi ∈ Isii where s1 < t1 < s2 < t2 < · · · < tm and suppose ti − si > τ for all i. Assume that

‖yi ‖ppii = E |yi |pi < ∞, for some pi > 1 for which Q =

by (A.7), and consequently,

∫∫  T −1 T  −  ˆ   −1 − ∂ k2 (j/p)Tj E ϕ θ − θ0  Tj  ∂θ j=1 t =j +1     × (u, t |It −1 , θ0 ) ψt −j (v) Γ˜ j (u, v) dW (u)dW (v)  ∫ ∫ T −1 −   ηj (u, v) dW (u)dW (v) ≤C

< 1. Then     l l l  ∏  ∏ ∏   ‖yi ‖pi . yi − E (yi ) ≤ 10 (l − 1) β (1−Q ) (τ ) E  i=1  i =1 i=1 ∑l

1 i=1 pi

Proof of Lemma A.3. See Theorem 5.4 of Roussas and Ioannides (1987).  Let q = p A.4 below.

1+ 4b1−2

1

(ln2 T ) 2b−1 . We shall show Propositions A.3 and

j =1 1

+ CT − 2

T −1 −

Proposition A.3. Under the conditions of Theorem 1, 1

k2 (j/p) = O(1 + p/T 2 ),

j =1

given |k(·)| ≤ 1 and Assumption A.7. For the second term in (A.12), we have T −1 − j =1

k2 (j/p)Tj

∫∫    ˆ  B12j (u, v) Γ˜ j (u, v) dW (u)dW (v)

1

p− 2

T −1 −

k2 (j/p)Tj

∫∫

  Γ˜ j (u, v)2 dW (u) dW (v)

j =1

= p−1/2 C˜ (0, 0) + p−1/2 V˜ q + oP (1),  ∑T ∑q ∑t −2q−1 ∗ where V˜ q = Zt (u, θ0 ) j=1 aT (j)ψt −j (v)[ s=1 Zs t =2q+2 (u, θ0 ) ψs∗−j (v)]dW (u) dW (v).

286

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293 t −2q−1

˜ −1/2 (0, 0) Proposition A.4. Under the conditions of Theorem 1, D V˜ q → N (0, 1).



k2 (j/p)Tj

∫∫

≡ V˜ q + R˜ 4 ,

q; and the second term R˜ 4 is contributed by the lag orders j > q. It follows from (A.14)–(A.17) that

2 ∫ ∫  − T −1 T  −   = aT (j) Zt (u, θ0 ) ψt −j (v) dW (u)dW (v)    j =1 t =j +1   2 ∫∫  − T −1 T   − 2   + aT (j) Zt (u, θ0 ) ϕ (v) − ϕˆ j (v)    j =1 t =j +1

T −1 −

j =1

×

1 (0, 0) − C˜ (0, 0)] = oP (1) and p− 2 R˜ a = oP (1) given q =  −1 1 1 1+ p 4b−2 (ln2 T ) 2b−1 and p = cT λ for 0 < λ < 3 + 4b1−2 . 

Lemma A.4. Let C˜ 1 (0, 0) be defined as in (A.15). Then C˜ 1 (0, 0) − 1

C˜ (0, 0) = OP (p/T 2 ).

t =j +1

(A.14)

Next we write



aT (j)

∫∫ − T

Lemma A.6. Let R˜ 2 be defined as in (A.14). Then R˜ 2 = OP (p/T 2 ).

 2 |Zt (u, θ0 )|2 ψt −j (v)

1

Lemma A.7. Let R˜ 3 be defined as in (A.16). Then R˜ 3 = OP (qp/T 2 ).

t =j+1

j =1

× dW (u)dW (v) + 2 Re

T −1 −

Lemma A.8. Let R˜ 4 be defined as in (A.17). Then R˜ 4 = OP (p2b ln T /q2b−1 ).

aT (j)

j=1

×

∫∫ − T t −1 −

Proof of Lemma A.4. By Markov’s inequality and E |C˜ 1 (0, 0) − ∑ T −1 C˜ (0, 0)| ≤ Cp/T 1/2 given j=1 (j/p)aT (j) = O(p/T ) and E |ϕ(v) −

Zt (u, θ0 ) Zs∗ (u, θ0 ) ψt −j (v)ψs∗−j (v)

t =j+2 s=j+1

× dW (u) dW (v) ≡ C˜ 1 (0, 0) + 2 Re(U˜ ),

(A.15)

where we further decompose U˜ =

Lemma A.5. Let R˜ 1 be defined as in (A.14). Then R˜ 1 = OP (p/T ). 1

T −1

˜q = M

∫∫ T −

Zt (u, θ0 )

t −1 −

t =2q+2

aT (j)ψt −j (v)

j =1

ϕˆ j (v)|4 = O(T −2 ), which are shown in Hong (1999) and below respectively. Let t1 , . . . , t4 be distinct integers and |j| + 1 ≤ ti ≤ T , let |j| + 1 ≤ r1 < · · · < r4 ≤ T be the permutation of t1 , . . . , t4 in ascending order and let dc be the cth largest difference among rl+1 − rl , l = 1, 2, 3. By Lemma 1 of Yoshihara (1976), we have

t −2q−1



×

Zs∗ (u, θ0 ) ψs∗−j (v)dW (u) dW (v)

− |j|+1≤r1 <···
s=j+1 T ∫∫ −

+

Zt (u, θ0 )

t −1 −

t =2

aT (j)ψt −j (v)

×



Zs (u, θ0 ) ψs∗−j (v)dW (u)dW ∗



(v)

≡ U˜ 1 + R˜ 3 ,

(A.16)

∫∫ T −

Zt (u, θ0 )

t =2q+2

q −



T −3 −

T −2 −

T −1 −

T −

1

δ

4C 1+δ β 1+δ (r2 − r1 )

)

T −3 −

1

≤ 4C 1+δ

T −2 −

δ

(r2 − r1 )2 β 1+δ (r2 − r1 )

r1 =|j|+1 r2 =r1 +1 T −

1

≤ 4TC 1+δ

δ

r 2 β 1+δ (r ) = O (T ) .

Similarly,

j =1

− Zs∗ (u, θ0 ) ψs∗−j (v)dW (u) dW (v)

s=j+1

∫∫ T − t =2q+2

 ′  ′   − ϕ (v) eiv Xr3 −j − ϕ (v) eiv Xr4 −j − ϕ (v) 

r =|j|+1

aT (j)ψt −j (v)

t −2q−1

+

iv′ Xr −j 2

(

where in the first term U˜ 1 , we have t − s > 2q so that we can bound it with the mixing inequality. In the second term R˜ 3q , we have 0 < t − s ≤ 2q. Finally, we write

×

e

r1 =|j|+1 r2 =r1 +max rj −rj−1 r3 =r2 +1 r4 =r3 +1 j ≥3

s=max(j+1,t −2q)

U˜ 1 =



  ′   E eiv Xr1 −j − ϕ (v)

j =1 t −1

×

  Γ˜ j (u, v)2 dW (u)dW (v)

1 It suffices to show Lemmas A.4–A.8 below, which imply p− 2 [C˜ 1

t =j +1

˜ + R˜ 1 + 2 Re(R˜ 2 ). ≡M

k (j/p)Tj

= C˜ 1 (0, 0) + 2 Re(V˜ q ) + R˜ 1 − 2 Re(R˜ 2 − R˜ 3 − R˜ 4 ).

∗   Zt (u, θ0 ) ϕ (v) − ϕˆ j (v) dW (u) dW (v)



∫∫

2

j =1

× dW (u) dW (v)  ∫∫  − T −1 T − + 2 Re αT (j) Zt (u, θ0 ) ψt −j (v) T

(A.17)

where the first term V˜ q is contributed by the lag orders j from 1 to

  Γ˜ j (u, v)2 dW (u) dW (v)

j=1



Zs∗ (u, θ0 ) ψs∗−j (v)dW (u)dW (v)

s=j+1

Proof of Proposition A.3. We first decompose T −1



×

d

Zt (u, θ0 )

t −1 − j=q+1

aT (j)ψt −j (v)

|j|+1≤r1 <···
×



e

  ′   E eiv Xr1 −j − ϕ (v)

iv′ Xr −j 2

= O (T ) .

 ′  ′   − ϕ (v) eiv Xr3 −j − ϕ (v) eiv Xr4 −j − ϕ (v) 

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

Proof of Lemma A.7. By the m.d.s. property of Zt (u, θ0 ), Minkowski’s inequality and (A.5), we have

Further, it can be shown in a similar way that

− |j|+1≤r1 <···


×

e

  ′   E eiv Xr1 −j − ϕ (v)

iv′ Xr −j 2

 2

=O T

− ϕ (v)



e

iv′ Xr −j 3

  ′  − ϕ (v) eiv Xr4 −j − ϕ (v) 



e

Zs (u, θ0 ) ψs∗−j (v)dW ∗

s=max(j+1,t −2q)

  2  E eiv′ Xt1 −j − ϕ (v) 

|j|+1≤t1 ,t2 ,t3 ≤T t1 ,t2 ,t3 different

t −1 −

×

Similar to the above, we can show that if rs are not distinct to each other, we have

×

 ∫∫ T t −1  2 − − ˜   E R3  = E aT (j) Zt (u, θ0 ) ψt −j (v)  j =1 t =2

.



287

 T − t −1 −



aT (j)

    E Zt (u, θ0 ) ψt −j (v)  

∫∫

 j =1

t =2

2   (u) dW (v) 

2 2 1/2     × Zs∗ (u, θ0 ) ψs∗−j (v)  dW (u) dW (v)    s=max(j+1,t −2q) t −1 −

iv′ Xt −j 2

 ′   − ϕ (v) eiv Xt3 −j − ϕ (v) 

  = O T2 , 2

≤ CTq

and

 T −1 −

2 aT (j)

= O(p2 q2 /T ). 

j=1

− |j|+1≤t1 ,t2 ≤T t1 ,t2 different

  2  ′ 2   iv Xt −j E eiv′ Xt1 −j − ϕ (v)  2 e − ϕ v ( )  

 2

=O T

. 4

Therefore, E ϕ (v) − ϕˆ j (v) = O T −2 .







 ∫∫ t −1 T −  2 −  ˜  aT (j) Zt (u, θ0 ) ψt −j (v) E E R4  = j=q+1 t =2q+2 2 t− 2q−1  −  ∗ ∗ × Zs (u, θ0 ) ψs−j (v)dW (u) dW (v)  s=j+1  ∫ ∫ t −1 T   − − 4 1/4 aT (j) E Zt (u, θ0 ) ψt −j (v) ⩽



Proof of Lemma A.5.

E |R˜ 1 | ≤

T −1 −

aT (j)

∫∫

j =1

  4  21 T −    E  Zt (u, θ0 )  t =j+1 

  4  21 × E ϕ (v) − ϕˆ j (v) dW (u) dW (v) = O (p/T ) , ∑  T

4 

Zt (u, θ0 )

≤ CTj2 by  4  −2  Rosenthal’s inequality and E ϕ (v) − ϕˆ j (v) = O T .  where we have used the fact E 

t =j +1

Proof of Lemma A.6. By the m.d.s. property of Zt (u, θ0 ), the Cauchy–Schwarz inequality, we have

E |R˜ 2 | ≤ 2

T −1 − j=1

aT (j)

∫∫

  2  21 T −   E  Zt (u, θ0 ) ψt −j (v)  t =j+1 

  2  21 T  −     × E Z (u, θ0 ) ϕ (v) − ϕˆ j (v)  dW (u) dW (v)  t =j+1 t   ≤2

T −1 − j =1

 21 ∫∫  − T  2  Z (u, θ0 ) ψ 2 (v) aT (j) E t t −j t =j +1

j =q +1

t =2q+2

2   4 1/4 2q−1  t −−    × E  Zs∗ (u, θ0 ) ψs∗−j (v)  dW (u) dW (v)   s=j+1   ⩽ CT

2

2

T −1 −

aT (j)

 ⩽ CT

j=q+1

T −1 −

2

2 (j/p)

−2b −1

Tj

j =q +1

= O(p4b ln2 T /q4b−2 ), given Assumption A.6 (i.e., k(z ) ⩽ C |z |−b as z → ∞). Proof of Proposition A.4. We rewrite V˜ q = Vq (t ) =

∫∫

Zt (u, θ0 )

q −

∑T

t =2q+2



Vq (t ), where

aT (j)ψt −j (v)Hj,t −2q−1 (u, v)

j =1

× dW (u) dW (v), ∑t −2q−1 ∗ ∗ and Hj,t −2q−1 (u, v) = s=j+1 Zs (u, θ0 )ψs−j (v). We apply Brown’s 1 (1971) martingale limit theorem, which states var(2 Re V˜ q )− 2 2 Re d

  4  14 T −    × E  Zt (u, θ0 )  t =j+1 

V˜ q → N (0, 1) if var(2 Re V˜ q )−1

  4  41 × E ϕ (v) − ϕˆ j (v) dW (u) dW (v)   1 = O p/T 2 ,  4   given E ϕ (v) − ϕˆ j (v) = O T −2 .

Proof of Lemma A.8. By the m.d.s. property of Zt (u, θ0 ) and Minkowski’s inequality, we have



T − 

2

2 Re Vq (t )

1

t =2q+2

   1 × 2 Re Vq (t ) > η · var(2 Re V˜ q ) 2 → 0 ∀η > 0, var(2 Re V˜ q )−1

T − t =2q+2

E



2

2 Re Vq (t )

 p |It −1 → 1.

(A.18) (A.19)

288

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

First, we compute var(2 Re V˜ q ). By the m.d.s. property of Zt (u, θ0 ) under H0 , we have E (V˜ q2 ) =

∫ ∫

T −

Zt (u, θ0 )

E

q −

aT (j)ψt −j (v)

×

Zs (u, θ0 ) ψs∗−j (v)dW (u)dW ∗

q − q −

aT (j)aT (l)

j =1 l =1

∫∫∫∫ ×

t −2q−1

T −



β δ/(1+δ) (t − s)

t =2q+2 s=j+1

2

t −2q−1

|V1 (t )| ≤

t =2q+2

j=1

t =2q+2



T −

  2(1+δ)  2(11+δ) × E Zt (u1 , θ0 ) Zt (u2 , θ0 ) ψt −j (v1 )ψt −l (v2 )

(v)

s=j+1

=

q − q −

  2(1+δ)  2(11+δ) × E Zs∗ (u1 , θ0 ) Zs∗ (u2 , θ0 ) ψs∗−j (v1 )ψs∗−l (v2 )

aT (j)aT (l)

× dW (u1 ) dW (u2 ) dW (v1 )dW (v2 )   = O p2 T −1 q−νδ/(1+δ)+1 ,

j =1 l =1

∫∫∫∫ ×

T

t −2q−1





E [Zt (u1 , θ 0 ) Zt (u2 , θ0 )

t =2q+2 s=j+1



× ψt −j (v1 )ψt −l (v2 )] q − q − = aT (j)aT (l) T −

∫∫∫∫ ×

×

× dW (u1 ) dW (u2 ) dW (v1 )dW (v2 ) +

q − q −

×

t −2q−1 s1 −1





E {{Zt (u1 , θ0 )

× dW (u1 ) dW (u2 ) dW (v1 )dW (v2 ) ∫∫∫∫ − q t− 2q−1 s− T 1 −1 − − ≤2 a2T (j) β δ/(1+δ) (t − s1 )

aT (j)aT (l)

j =1 l =1

j =1

t −2q−1



T −

× Zt (u2 , θ0 ) ψt −j (v1 )ψt −j (v2 ) − E   × Zt (u1 , θ0 ) Zt (u2 , θ0 ) ψt −j (v1 )ψt −j (v2 ) } × Zs∗1 (u1 , θ0 ) Zs∗2 (u2 , θ0 ) ψs∗1 −j (v1 )ψs∗2 −j (v2 )}

E [Zt (u1 , θ 0 ) Zt (u2 , θ0 )

× ψt −j (v1 )ψt −l (v2 )] × E [Zs∗ (u1 , θ0 ) Zs∗ (u2 , θ0 ) ψs∗−j (v1 )ψs∗−l (v2 )]

T −

a2T (j)

t =2q+2 s1 =j+1 s2 =j+1

t =2q+2 s=j+1

∫∫∫∫

− j =1

∫∫∫∫

t −2q−1



|V2 (t )| ≤ 2

t =2q+2

j =1 l =1

(A.21)

q

T

cov[Zt (u1 , θ 0 ) Zt (u2 , θ0 )

t =2q+2 s=j+1

× ψt −j (v1 )ψt −l (v2 ) × Zs∗ (u1 , θ0 ) Zs∗ (u2 , θ0 ) ψs∗−j (v1 )ψs∗−l (v2 )] × dW (u1 ) dW (u2 ) dW (v1 )dW (v2 ) + 2

q −

t =2q+2 s1 =j+1 s2 =j+1

  2(1+δ)  2(11+δ) × E Zt (u1 , θ0 ) Zt (u2 , θ0 ) ψt −j (v1 )ψt −j (v2 )   × E Zs∗1 (u1 , θ 0 ) Zs∗2 (u2 , θ0 ) ψs∗1 −j (v1 ) 2(1+δ)  2(11+δ) × ψs∗2 −j (v2 ) dW (u1 ) dW (u2 ) dW (v1 )dW (v2 )  −νδ/(1+δ)+1  = O pq , (A.22)

a2T (j)

j =1 T −

∫∫∫∫ ×

t −2q−1 s1 −1





and

E [Zt (u1 , θ0 )Zt (u2 , θ0 )

t =2q+2 s1 =j+1 s2 =j+1

× ψt −j (v1 )ψt −j (v2 ) × Zs∗1 (u1 , θ0 ) Zs∗2 (u2 , θ0 ) ψs∗1 −j (v1 )ψs∗2 −j (v2 )] × dW (u1 ) dW (u2 ) dW (v1 )dW (v2 ) + 4

q − j −1 −

T −

|V3 (t )| ≤ 4

t =2q+2

aT (j)aT (l)

T −

∫∫∫∫

∫∫∫∫ ×



t =2q+2 s1 =j+1 s2 =l+1

× |E [Zq,j+l (u1 , θ0 ) Zq,j+l (u2 , θ0 ) ψq,l (v1 )ψq,j (v2 )]|

2

t =2q+2

V1 (t ) +

T − t =2q+2

V2 ( t ) +

T −

E {{Zt (u1 , θ0 )

j=1 l=1

∫∫∫∫

V3 (t ) ,

T −

t −2q−1 s1 −1





β δ/(1+δ) (t − s1 )

t =2q+2 s1 =j+1 s2 =j+1

  2(1+δ)  2(11+δ) × E Zt (u1 , θ0 ) Zt (u2 , θ0 ) ψt −j (v1 )ψt −l (v2 )

× dW (u1 ) dW (u2 ) dW (v1 )dW (v2 ) [1 + o(1)] T −



× dW (u1 ) dW (u2 ) dW (v1 )dW (v2 ) q − q − ≤4 aT (j)aT (l)

2 j =1 l =1

+



× (u1 , θ0 ) Zs∗2 (u2 , θ0 ) ψs∗1 −j (v1 )ψs∗2 −l (v2 )}

× dW (u1 ) dW (u2 ) dW (v1 )dW (v2 ) q q 1 −− 2 = k (j/p)k2 (l/p) ×

t −2q−1 s1 −1

× Zt (u2 , θ0 ) ψt −j (v1 )ψt −l (v2 )   − E Zt (u1 , θ0 ) Zt (u2 , θ0 ) ψt −j (v1 )ψt −l (v2 ) }Zs∗1

E [Zt (u1 , θ0 ) Zt (u2 , θ0 )

× ψt −j (v1 )ψt −l (v2 ) × Zs∗1 (u1 , θ0 ) Zs∗2 (u2 , θ0 ) ψs∗1 −j (v1 )ψs∗2 −l (v2 )]

∫∫∫∫

T −

t =2q+2 s1 =j+1 s2 =j+1

t −2q−1 s1 −1



aT (j)aT (l)

j =1 l =1

j =2 l =1

×

q − q −

(A.20)

t =2q+2

where the first term is O (p) as shown in (A.25) and the remaining terms are o(p) following the arguments below

  2(1+δ)  2(11+δ) × E Zs∗1 (u1 , θ0 ) Zs∗2 (u2 , θ0 ) ψs∗1 −j (v1 )ψs∗2 −l (v2 ) × dW (u1 ) dW (u2 ) dW (v1 )dW (v2 )   = O p2 q−νδ/(1+δ)+1 .

(A.23)

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

where we used the fact that for any given m, p

Similarly, we can obtain E (V˜ q∗ )2 =

q q 1 −−

2 j=1 l=1

∫∫∫∫ ×

(

k2 (j/p)k2 (l/p)

j −m p

)→

   E Zj+l (u1 , θ0 ) Zj+l (u2 , θ0 ) ψl (v1 )ψj (v2 ) 2



 q −

× dW (u1 ) dW (u2 ) dW (v1 )dW (v2 ) [1 + o(1)] .

(A.24)

 E Zj+l (u1 , θ0 ) Zj+l (u2 , θ0 ) ψl (v1 )ψj (v2 ) = C (0, j, l) + Σ0 (u1 , u2 ; θ0 ) Ωl−j (v1 , v2 ),    E Zj+l (u1 , θ0 ) Zj+l (u2 , θ0 ) ψl (v1 )ψj (v2 ) 2  2 = |C (0, j, l)|2 + Σ0 (u1 , u2 ; θ0 ) Ωl−j (v1 , v2 ) + 2 Re[C (0, j, l)Σ0∗ (u1 , u2 ; θ0 ) Ωl∗−j (v1 , v2 )]. ∑∞ ∑∞ Given j=−∞ l=−∞ |C (0, j, l)| ≤ C and |k(·)| ≤ 1, we have 

k (j/p)k (l/p) 2

∫∫

  Ωl−j (v1 , v2 )2

j=1 l=1

× dW (v1 )dW (v2 )

∫∫

|Σ0 (u1 , u2 ; θ0 )|2

× dW (u1 ) dW (u2 ) [1 + o(1)]   q−1 q − − −1 2 2 = 2p p k (j/p)k [(j − m)/p] m=1−q

j=m+1

∫∫

|Ωm (v1 , v2 )|2 dW (v1 )dW (v2 )

∫∫

|Σ0 (u1 , u2 ; θ0 )|2 dW (u1 ) dW (u2 ) [1 + o(1)]

×

aT (j)

= O(p4 t 2 /T 4 ).

j =1



q − q −

aT (j)aT (l)

∫∫∫∫



Σt (u1 , u2 ; θ0 )

j=1 l=1

Put C (0, j, l) ≡ E {[Zj+l (u1 , θ0 ) Zj+l (u2 , θ0 ) − Σ0 (u1 , u2 ; θ0 )] ψl (v1 )ψj (v2 )}, where Σj (u1 , u2 ; θ0 ) ≡ E [Zt (u1 , θ0 ) Zt −j (u2 , θ0 )]. Then

2

4

E [Vq2 (t )|It −1 ] =

   E Zj+l (u1 , θ0 ) Zj+l (u2 , θ0 ) ψl (v1 )ψj (v2 ) 2

q − q −

 q −

∑T

k2 (j/p)k2 (l/p)

var(2 Re V˜ q ) = 2

⩽ Ct

2

4 4 It follows that = o(p2 ) given p2 / t =2q+2 E |Vq (t )| = O p /T T → 0. Thus, (A.18) holds. Next, we verify condition (A.19). Put Σt (u1 , u2 ; θ0 ) ≡ E [Zt (u1 , θ0 )Zt (u2 , θ0 ) |It −1 ]. Then

˜∗ 2

j=1 l=1

×

∫∫   4 1/4 aT (j) E Zt (u, θ0 ) ψq,t −j (v)Hj,t −2q−1 (u, v)

× dW (u)dW (v)

 2   var(2 Re V˜ q ) = E ( ˜ ) + E (Vq ) + 2E V˜ q 

∫∫∫∫

8

4

Because dW (·) weighs sets symmetric about zero equally, we have E |V˜ q |2 = E (V˜ q2 ) = E (V˜ q∗ )2 . Hence,

q − q −

k (z )dz as p → ∞.

j =1

× dW (u1 ) dW (u2 ) dW (v1 )dW (v2 ) [1 + o(1)] .

=2

k2 (j/p)k2

E |Vq (t )|4

   E Zj+l (u1 , θ0 ) Z ∗ (u2 , θ0 ) ψl (v1 )ψ ∗ (v2 ) 2 j j +l

Vq2

0

j=m+1

4



2 j=1 l=1

×

∞

We now verify condition (A.18). Noting that E Hj,t −2q−1 (u, v) ≤ Ct 4 for 1 ≤ j ≤ q given the m.d.s. property of Zt (u, θ0 ) and Rosenthal’s inequality (cf. Hall and Heyde (1980, p. 23)), we have

× dW (u1 ) dW (u2 ) dW (v1 )dW (v2 ) [1 + o(1)] , q q  2 1 −− 2   E V˜ q  = k (j/p)k2 (l/p) ∫∫∫∫

289

∑q −1

× ψt −j (v1 )ψt −l (v2 )Hj,t −2q−1 (u1 , v1 )Hl,t −2q−1 (u2 , v2 ) × dW (u1 )dW (u2 ) dW (v1 )dW (v2 ) ∫∫∫∫ q − q − = aT (j)aT (l) E [Σt (u1 , u2 ; θ0 ) j =1 l =1

 × ψt −j (v1 )ψt −l (v2 ) Hj,t −2q−1 (u1 , v1 )Hl,t −2q−1 (u2 , v2 ) q − q − × dW (u1 )dW (u2 ) dW (v1 )dW (v2 ) + aT (j)aT (l) j=1 l=1

∫∫∫∫ ×

j ,l

L˜ t (u1 , u2 , v1 , v2 )Hj,t −2q−1

× (u1 , v1 )Hl,t −2q−1 (u2 , v2 ) × dW (u1 )dW (u2 ) dW (v1 )dW (v2 ) ≡ S1 (t ) + V4 (t ), say,

(A.26)

j ,l

where L˜ t (u1 , u2 , v1 , v2 ) ≡ Σt (u1 , u2 ; θ0 ) ψt −j (v1 )ψt −l (v2 ) −  E Σt (u1 , u2 ; θ0 ) ψt −j (v1 )ψt −l (v2 ) . We further decompose S1 (t ) =

q − q −

aT (j)aT (l)

∫∫∫∫

E Σt (u1 , u2 ; θ0 ) ψt −j (v1 )



j=1 l=1

  × ψt −l (v2 )] E Hj,t −2q−1 (u1 , v1 )Hl,t −2q−1 (u2 , v2 ) × dW (u1 )dW (u2 ) dW (v1 )dW (v2 ) ∫∫∫∫ q − q − + aT (j)aT (l) E [Σt (u1 , u2 ; θ0 ) j =1 l =1

×





k4 (z )dz

= 2p 0

∞ ∫∫ −

|Ωm (v1 , v2 )|2 dW (v1 )dW (v2 )

m=−∞

∫∫

|Σ0 (u1 , u2 ; θ0 )|2 dW (u1 ) dW (u2 ) [1 + o(1)] ∫ ∞ ∫∫∫ π = 4π p k4 (z )dz |G(ω, v1 , v2 )|2 dωdW (v1 )dW (v2 ) ×

0

∫∫ ×

−π

|Σ0 (u1 , u2 ; θ0 )|2 dW (u1 ) dW (u2 ) [1 + o(1)], (A.25)

 × ψt −j (v1 )ψt −l (v2 ) Hj,t −2q−1 (u1 , v1 )Hl,t −2q−1 (u2 , v2 )   − E Hj,t −2q−1 (u1 , v1 )Hl,t −2q−1 (u2 , v2 ) × dW (u1 )dW (u2 ) dW (v1 )dW (v2 ) ≡ V0 (t ) + S2 (t ), say, (A.27) where V0 ( t ) =

q − q −

min(t − 2q − 1 − j, t − 2q − 1 − l)aT (j)aT (l)

j=1 l=1

∫∫∫∫ ×

   E Σt (u1 , u2 ; θ0 ) ψt −j (v1 )ψt −l (v2 ) 2

290

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

× dW (u1 )dW (u2 ) dW (v1 )dW (v2 )   = E Vq2 (t ) − V1 (t ) − V2 (t ) − V3 (t ) ,

Lemma A.12. Let V7 (t ) be defined as in (A.30). Then E | V7 (t )|2 = O(p).

where Vj (t ) , j = 1, 2, 3, are defined in (A.20). Put

j ,l

  − E Zs (u1 , θ0 ) Zs (u2 , θ0 ) ψs−j (v1 )ψs−l (v2 ) .

(A.28)

Then we write q − q −

t −2q−1

  8  14 × E Hl,t −2q−1 (u2 , v2 )

Ljs,l (u1 , u2 , v1 , v2 )dW (u1 )dW (u2 )



×

t =2q+2

 E Zt (u1 , θ0 ) Zt (u2 , θ0 ) ψt −j (v1 )ψt −l (v2 ) 

×

+2

s=max(j,l) q − q −

× dW (v1 )dW (v2 ) + ×

s −1 −

Zs u1, θ0 ψs−j (v1 )Zτ (u2 , θ0 )



0


  4  14 × E Hj,t −2q−1 (u1 , v1 )

s=max(j,l)+1 τ =l+1

× ψτ −l (v2 )dW (u1 )dW (u2 ) dW (v1 )dW (v2 ) ≡ V5 (t ) + S3 (t ),

say,

(A.29)

where S3 ( t ) =

q − q −

aT (j)aT (l)

∫∫∫∫

E [Zt (u1 , θ0 ) Zt (u2 , θ0 )

× ψt −j (v1 )ψt −l (v2 )] −− × Zs (u1 , θ0 ) ψs−j (v1 )Zτ (u2 , θ0 )

0
0
× ψτ −l (v2 )dW (u1 )dW (u2 ) dW (v1 )dW (v2 ) ∫∫∫∫ q − q − + aT (j)aT (l) E [Zt (u1 , θ0 ) Zt (u2 , θ0 ) j =1 l =1

−−

  4  14 × E Hl,t −2q−1 (u2 , v2 )   4  14 4  14   E Hl,τ −2q−1 (u2 , v2 ) × E Hj,τ −2q−1 (u1 , v1 )   −− δ  +2 β 1+δ (2q) E L˜ jq,,lt (u1 , u2 , v1 , v2 )L˜ jq,,τl

j=1 l=1

× ψt −j (v1 )ψt −l (v2 )]

[  2(1+δ) ] 2(11+δ)  ˜ j ,l  (2q) E Lq,t (u1 , u2 , v1 , v2 )

2(1+δ)  2(11+δ) × Hj,τ −2q−1 (u1 , v1 )Hl,τ −2q−1 (u2 , v2 )  − −  j,l  l +2 E L˜ q,t (u1 , u2 , v1 , v2 )L˜ jq,,τ (u1 , u2 , v1 , v2 ) .

 E Zt (u1 , θ0 ) Zt (u2 , θ0 ) ψt −j (v1 )ψt −l (v2 )



β

δ

1+δ

   × E L˜ jq,,τl (u1 , u2 , v1 , v2 )Hj,t −2q−1 (u1 , v1 )Hl,t −2q−1 (u2 , v2 )

aT (j)aT (l)



t −2q−1

×

−− t −τ >2q

j =1 l =1

∫∫∫∫

 2 T  −    j , l E L˜ q,t (u1 , u2 , v1 , v2 )Hj,t −2q−1 (u1 , v1 )Hl,t −2q−1 (u2 , v2 ) t =2q+2  [  T 4 ] 12   − 8  14  ˜ j ,l  ≤ E Lq,t (u1 , u2 , v1 , v2 ) E Hj,t −2q−1 (u1 , v1 )

aT (j)aT (l)

j=1 l=1

∫∫∫∫

Zs (u1 , θ 0 ) ψs−j (v1 )Zτ (u2 , θ0 )

 2(11+δ) × (u1 , u2 , v1 , v2 )|2(1+δ)   × E Hj,t −2q−1 (u1 , v1 )Hl,t −2q−1 (u2 , v2 )Hj,τ −2q−1 (u1 , v1 ) 2(1+δ)  2(11+δ) × Hl,τ −2q−1 (u2 , v2 )         νδ νδ = O T 3 + O T 4 q− 1+δ +1 + O T 3 q + O T 3 q− 1+δ +1 ,

s−τ >2q

≡ V6 (t ) + V7 (t ),

say.

(A.30)

It follows from (A.26) and (A.27) and (A.29) and (A.30) that ∑T ∑T ∑7 {E [Vq2 (t )|It −1 ] − E [Vq2 (t )]} = t =2q+2 [ a=4 Va (t ) − ∑t3=2q+2 V ( t )] . It suffices to show Lemmas A.9–A.12 below, which a a=1 p

1+ 4b1−2

∑T

t =2q+2

(ln2 T )

E [Vq2 (t )|It −1 ] − E [Vq2 (t )]|2 = o(p2 ) given q =

1 2b−1

and p = cT λ for 0 < λ < (3 +

)

1 −1 . 4b−2

Thus,

d

condition (A.19) holds, and so Q˜ q (0, 0) → N (0, 1) by Brown’s (1971) theorem.  Lemma A.9. Let V4 (t ) be defined as in (A.26). Then E | νδ

∑T

t =2q+2

V4 (t )|2 = O(p/q 1+δ ). Lemma A.10. Let V5 (t ) be defined as in (A.29). Then E | V5 (t )|2 = O(qp4 /T ). Lemma A.11. Let V6 (t ) be defined as in (A.30). Then E | V6 (t )| = O(qp /T ). 2

4

8

where we have made use of the fact that E Hj,t −2q−1 (u1 , v1 ) ≤ Ct 4 for 1 ≤ j ≤ q. It follows by Minkowski’s inequality and (A.5) that



× ψτ −l (v2 )dW (u1 )dW (u2 ) dW (v1 )dW (v2 )

imply E |

t =2q+2

Proof of Lemma A.9. Recalling the definition of L˜ t (u1 , u2 , v1 , v2 ) as in (A.26), we can obtain

Ljs,l (u1 , u2 , v1 , v2 ) ≡ Zs (u1 , θ0 ) Zs (u2 , θ0 ) ψs−j (v1 )ψs−l (v2 )

S2 ( t ) =

∑T

∑T

t =2q+2

∑T

t =2q+2

  2  q − q T  −  −   E V4 (t ) ⩽ a (j)aT (l) t =2q+2   j =1 l =1 T    ∫∫∫∫ T  −  j ,l × E L˜ q,t (u1 , v1 , u2 , v2 ) t =2q+2 × Hj,t −2q−1 (u1 , v1 )Hl,t −2q−1 (u2 , v2 )dW (u1 )dW (u2 )  2  12 2    × dW (v1 )dW (v2 )       4  = O qp /T .  j,l

Proof of Lemma A.10. Recalling the definition of Lq,s (u1 , v1 , u2 , v2 ) in (A.28), we have

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293

=

|s−τ |≤2q

q − q − q − q −

aT (j1 )aT (j2 )aT (l1 )aT (l2 )

j1 =1 j2 =1 l1 =1 l2 =1

  2(1+δ)  2(11+δ) δ β 1+δ (2q) E Ljs,l (u1 , v1 , u2 , v2 )



+

291

2   × ψq,τ −l (v2 )dW (u1 )dW (u2 ) dW (v1 )dW (v2 )  

 2 2q−1  t −−    j ,l E Ls (u1 , v1 , u2 , v2 ) s=max(j,l)  −  j ,l  E Ls (u1 , v1 , u2 , v2 )Ljτ,l (u1 , v1 , u2 , v2 ) =



E [Z0 (u11 , θ0 ) Z0 (u21 , θ0 ) ψ−j1 (v11 )ψ−l1 (v21 )]

×

|s−τ |>2q

R8N

× E [Z0∗ (u12 , θ0 ) Z0∗ (u22 , θ0 ) ψ−∗ j2 (v12 )ψ−∗ l2 (v22 )]

  2(1+δ)  2(11+δ) × E Ljτ,l (u1 , v1 , u2 , v2 )

t −2q−1

= O (tq) .

×



E Zs (u11 , θ0 ) Zs (u12 , θ0 ) ψs−j1 (v11 )ψs−j2 (v12 )





s=2q+2

It follows that

s−2q−1

 2  q − q T T  −  − −   E V5 (t ) ⩽ aT (j)aT (l) t =2q+2  t =2q+2 j=1 l=1 ∫∫∫∫ |E [Zt (u1 , θ0 ) Zt (u2 , θ0 ) ×



×

E Zτ∗ (u21 , θ0 ) Zτ∗ (u22 , θ0 )



τ =max(l1 ,l2 )+1

 × ψτ∗−l1 (v21 )ψτ∗−l2 (v22 ) dW (u11 )dW (u12 ) dW (u21 ) × dW (u22 ) dW (v11 )dW (v12 ) dW (v21 )dW (v22 ) q − q − q − q − aT (j1 )aT (j2 )aT (l1 )aT (l2 ) +

  2  12 2q−1  t −−     × ψt −j (v1 )ψt −l (v2 )] E  Ljs,l (u1 , v1 , u2 , v2 )  s=max(j,l) 

j1 =1 j2 =1 l1 =1 l2 =1



E [Z0 (u11 , θ0 ) Z0 (u21 , θ0 ) ψ−j1 (v11 )ψ−l1 (v21 )]

× R8N

2

× E [Z0 (u12 , θ0 ) Z0∗ (u22 , θ0 ) ψ−∗ j2 (v12 )ψ−∗ l2 (v22 )]  2q−1 2q−1 t −  t −− − δ  1 +δ Z (u , θ ) Z (u , θ ) ×β (2q) E  s =2q+2 s =2q+2 s1 11 0 s2 12 0 ∗

× dW (u1 )dW (u2 ) dW (v1 )dW (v2 )   = O qp4 /T . 

1

Proof of Lemma A.11. The result that E |

(qp2 /T ) by Minkowski’s inequality and  q − q − 2 aT (j)aT (l) E |V6 (t )| ⩽

s −2q−1 s −2q−1  1− 2−  Z (u , θ ) Z (u , θ ) ψ (v ) ×E  τ =2q+2 τ =2q+2 τ1 21 0 τ2 22 0 τ1 −j1 21 1

j=1 l=1

∫∫∫∫ ×

2

2(1+δ)  2(11+δ)  × ψτ2 −j2 (v22 )

  E [Zt (u1 , θ0 ) Zt (u2 , θ0 ) ψt −j (v1 )ψt −l (v2 )]

   ×  E Zs (u1 , θ0 ) ψs−j (v1 ) s=max(j,l)  2  12  −  × Zτ (u2 , θ0 ) ψτ −l (v2 )  dW (u1 )dW (u2 )  s−τ ≤2q 

2

2(1+δ)  2(11+δ)  × ψs1 −j1 (v11 )ψs2 −j2 (v12 )

2 = O t =2q+2 V6 (t )|

∑T

t −2q−1

× dW (u11 )dW (u12 ) dW (u21 )dW (u22 ) dW (v11 )dW (v12 ) × dW (v21 )dW (v22 )   νδ = O(t 2 pT −4 ) + O t 3 pq− 1+δ +1 T −4 ,



given Assumption A.7.



Proof of Theorem 2. The proof of Theorem 2 consists of the proofs of Theorems A.3 and A.4 below. 

2 × dW (v1 )dW (v2 )

1

 ⩽ Ctq

q



Theorem A.3. Under the conditions of Theorem 2, (p 2 /T )[Qˆ (0, 0)−

4 aT (j)

p

Q˜ (0, 0)] → 0.

  = O tqp4 /T 4 . 

j =1

Theorem A.4. Under the conditions of Theorem 2, Proof of Lemma A.12. The result that E | − νδ +1 Tpq 1+δ

(

∑T

t =2q+2

V7 (t )|2 = O

) follows from Minkowski’s inequality, p → ∞, and

t −2q−1

×

− s=2q+2

s−2q−1

Zq,s (u1 , θ0 ) ψq,s−j (v1 )

− τ =l +1

1

∫∫∫

π

|F (ω, u, v)

−π

− F0 (ω, u, v)|2 dωdW (u)dW (v) .

the fact that

 ∫∫∫∫ q − q −   2 E |V7 (t )| = E  a (j)aT (l) E Zq,j+l (u1 , θ0 )  j=1 l=1 T  × Zq,j+l (u2 , θ0 ) ψq,l (v1 )ψq,j (v2 )

p

1

(p 2 /T )Q˜ (0, 0) →(D (0, 0))− 2

Proof of Theorem A.3. It suffices to show that T

−1

∫∫ − T −1



k2 (j/p)Tj |Γˆ j (u, v)|2 − |Γ˜ j (u, v)|2



j =1

Zq,τ (u2 , θ0 )

p

× dW (u)dW (v) → 0,

(A.31)

292

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293 p

ˆ (0, 0) − D˜ (0, 0)] → 0, p−1 [Cˆ (0, 0) − C˜ (0, 0)] = OP (1), and p−1 [D ˜ ˜ where C (0, 0) and D(0, 0) are defined in the same way as Cˆ (0, 0) ˆ (0, 0) in (3.8), with Zˆt (u) replaced by Zt (u, θ∗ ). Since the and D ˆ (0, 0) − proofs for p−1 [Cˆ (0, 0) − C˜ (0, 0)] = OP (1) and p−1 [D p

˜ (0, 0)] → 0 are straightforward, we focus on the proof of (A.31). D From(A.5), the Cauchy–Schwarz inequality, and the fact that ∑ T −1 2 2 ˜ T −1 = OP (1) as is j=1 k (j/p)Tj |Γj (u, v)| dW (u)dW (v) implied by Theorem A.4 (the proof of Theorem A.4 does not p

depend on Theorem A.3), it suffices to show that T −1 Aˆ 1 → 0, where Aˆ 1 is defined as in (A.2). Given (A.3), we shall show that p

2 2 ˆ T −1 = 1, 2. j=1 k (j/p)Tj |Baj (u, v)| dW (u)dW (v) → 0, a We first consider a = 1. By the Cauchy–Schwarz inequality and |ψt −j (v)| ≤ 2, we have

 ∑T −1

|Bˆ 1j (u, v)|2 ≤ CTj−1

T    2 −    ϕ u, t |It −1 , θ∗ − ϕ u, t |It −1 , θˆ 

t =j +1 T

  2 −    Ď ϕ u, t |It −1 , θˆ − ϕ u, t |It −1 , θˆ 

+ CTj−1

t =j +1

 2 T 2  − ∂     sup sup  ≤ CTj−1 T θˆ − θ0  T −1 ϕ u , t | I , θ) ( t −1   ∂θ N t =1 u∈R θ∈8 T −

+ CTj−1

 

N

t =j+1 u∈R



 −1

= OP Tj



Ď

2 

sup sup ϕ (u, t |It −1 , θ) − ϕ u, t |It −1 , θ  θ∈8

.

It follows from (A.4) and Assumption A.7 that T −1

∫∫ − T −1

k2 (j/p)Tj |Bˆ 1j (u, v)|2 dW (u)dW (v)

j =1

= OP (p/T ). The proof for a = 2 is similar. This completes the proof for Theorem A.3.  Proof of Theorem A.4. The proof is very similar to Hong (1999, Proof of Thm. 5), for the case (m, l) = (0, 0). The consistency result ∑T ∞ π follows from (a) p−1 j=1 k4 (j/p) → 0 k4 (ω) dω; (b) −π

|Fˆ (ω, u, v) − F (ω, u, v) |2 × dωdW (u)dW (v) → 0; (c) Cˆ (0, 0) = ˆ (0, 0) →p D (0, 0).  OP (p); (d) D Proof of Theorem 3. The proof is very similar to Theorem 1, (m) ˆ Zt(m) (0, θ0 ), Γˆ (m,0) (0, v), Cˆ (m, 0), Dˆ (m, 0) replacwith Zˆt (0, θ), j

ˆ Zt (u, θ0 ), Γˆ j (u, v), Cˆ (0, 0), Dˆ (0, 0) respectively. ing Zˆt (u, θ),



References Ahn, D., Dittmar, R., Gallant, A.R., 2002. Quadratic term structure models: theory and evidence. Review of Financial Studies 15, 243–288. Ait-Sahalia, Y., 1996a. Testing continuous-time models of the spot interest rate. Review of Financial Studies 9, 385–426. Ait-Sahalia, Y., 1996b. Nonparametric pricing of interest rate derivative securities. Econometrica 64, 527–560. Ait-Sahalia, Y., 2002. Maximum-likelihood estimation of discretely sampled diffusions: a closed-form approach. Econometrica 70, 223–262. Ait-Sahalia, Y., 2007. Estimating continuous-time models using discretely sampled data. In: Blundell, B., Torsten, P., Newey, W. (Eds.), Advances in Economics and Econometrics, Theory and Applications, Ninth World Congress. Cambridge University Press, Cambridge. Ait-Sahalia, Y., Kimmel, R., 2010. Estimating affine multifactor term structure models using closed-form likelihood expansions. Journal of Financial Economics 98, 113–144. Ait-Sahalia, Y., Fan, J., Peng, H., 2009. Nonparametric transition-based tests for diffusions. Journal the of American Statistical Association 104, 1102–1116. Alexander, C.O., 1998. Volatility and correlation: methods, models and applications. In: Risk Management and Analysis: Measuring and Modelling Financial Risk.

Andersen, T.G., Benzon, L., Lund, J., 2002. An empirical investigation of continuoustime equity return models. Journal of Finance 57, 1239–1284. Andrews, D.W.K., 1997. A conditional kolmogorov test. Econometrica 65, 1097–1128. Bakshi, G., Cao, C., Chen, Z., 1997. Empirical performance of alternative option pricing models. Journal of Finance 52, 2003–2049. Bally, V., Talay, D., 1996. The law of the Euler scheme for stochastic differential equations: I. Convergence rate of the distribution function. Probability Theory and Related Fields 104, 43–60. Bates, D., 1996. Jumps and stochastic volatility: exchange rate processes implicit in PHLX deutsche mark options. Review of Financial Studies 9, 69–107. Bates, D., 2000. Post-’87 crash fears in the S& P 500 futures option market. Journal of Econometrics 94, 181–238. Bates, D., 2007. Maximum likelihood estimation of latent affine processes. Review of Financial Studies 19, 909–965. Bhardwaj, G., Corradi, V., Swanson, N.R., 2008. A simulation based specification test for diffusion processes. Journal of Business & Economic Statistics 176–193. Bierens, H., 1982. Consistent model specification tests. Journal of Econometrics 20, 105–134. Bontemps, C., Meddahi, N., 2010. Testing distributional assumptions: a gmm approach. Working paper, University of Montreal. Brandt, M., Santa-Clara, P., 2002. Simulated likelihood estimation of diffusions with an application to exchange rate dynamics in incomplete markets. Journal of Financial Economics 63, 161–210. Brown, B.M., 1971. Martingale limit theorems. Annals of Mathematical Statistics 42, 59–66. Carr, P., Wu, L., 2003. The finite moment log stable process and option pricing. Journal of Finance 58, 753–777. Carr, P., Wu, L., 2004. Time-changed Lévy processes and option pricing. Journal of Financial Economics 71, 113–141. Carrasco, M., Florens, J.P., 2000. Generalization of GMM to a continuum of moment conditions. Econometric Theory 16, 797–834. Carrasco, M., Chernov, M., Florens, J.P., Ghysels, E., 2007. Efficient estimation of jump diffusions and general dynamic models with a continuum of moment conditions. Journal of Econometrics 140, 529–573. Chacko, G., Viceira, L., 2003. Spectral GMM estimation of continuous-time processes. Journal of Econometrics 116, 259–292. Chen, S.X., Gao, J., Tang, C.Y., 2008. A test for model specification of diffusion processes. Annals of Statistics 36, 162–198. Chernov, M., Gallant, A.R., Ghysels, E., Tauchen, G., 2003. Alternative models for stock price dynamics. Journal of Econometrics 116, 225–257. Chib, S., Pitt, M.K., Shephard, N., 2010. Likelihood based inference for diffusion driven state space models. Working paper, University of Oxford. Cox, J.C., Ingersoll, J.E., Ross, S.A., 1985. A theory of the term structure of interest rates. Econometrica 53, 385–407. Dai, Q., Singleton, K., 2000. Specification analysis of affine term structure models. Journal of Finance 55, 1943–1978. Dai, Q., Singleton, K., 2003. Term structure dynamics in theory and reality. Review of Financial Studies 16, 631–678. Das, S., Sundaram, R., 1999. Of smiles and smirks: a term structure perspective. Journal of Financial and Auantitative Analysis 34, 211–239. Del Moral, P., Jacod, J., Protter, P., 2001. The Monte–Carlo method for filtering with discrete-time observations. Probability Theory and Related Fields 120, 346–368. Diebold, F.X., Gunther, T., Tay, A., 1998. Evaluating density forecasts with applications to financial risk management. International Economic Review 39, 863–883. Doucet, A., Freitas, N., Gordon, N., 2001. Sequential Monte Carlo Methods in Practice. Springer-Verlag. Duffee, G., 2002. Term premia and interest rate forecasts in affine models. Journal of Finance 57, 405–443. Duffie, D., Kan, R., 1996. A yield-factor model of interest rates. Mathematical Finance 6, 379–406. Duffie, D., Pan, J., Singleton, K., 2000. Transform analysis and asset pricing for affine jump-diffusions. Econometrica 68, 1343–1376. Duffie, D., Pedersen, L., Singleton, K., 2003. Modeling credit spreads on sovereign debt: a case study of Russian bonds. Journal of Finance 55, 119–159. Easley, D, O’Hara, M., 1992. Time and the process of security price adjustment. Journal of Finance 47, 577–605. Engle, R., 2002. Dynamic conditional correlation — a simple class of multivariate GARCH models. Journal of Business and Economic Statistics 20, 339–350. Fan, Y., 1997. Goodness-of-fit tests for a multivariate distribution by the empirical characteristic function. Journal of Multivariate Analysis 62, 36–63. Gallant, A.R., Tauchen, G., 1996. Which moments to match? Econometric Theory 12, 657–681. Gallant, A.R., Tauchen, G., 1998. Reprojecting partially observed systems with application to interest rate diffusions. Journal of the American Statistical Association 93, 10–24. Gao, J., King, M., 2004. Adaptive testing in continuous-time diffusion models. Econometric Theory 20, 844–882. Genon-Catalot, V., Jeantheau, T., Laredo, C., 2000. Stochastic volatility models as hidden markov models and statistical applications. Bernoulli 6, 1051–1079.

B. Chen, Y. Hong / Journal of Econometrics 164 (2011) 268–293 Gordon, N., Salmond, D., Smith, A., 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F140, 107–113. Hall, P., Heyde, C., 1980. Martingale Limit Theory and its Application. Academic Press. Hannan, E., 1970. Multiple Time Series. Wiley, New York. Hansen, L.P., Scheinkman, J.A., 1995. Back to the future: generating moment implications for continuous-time Markov processes. Econometrica 63, 767–804. Heston, S., 1993. A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies 6, 327–344. Hong, Y., 1999. Hypothesis testing in time series via the empirical characteristic function: a generalized spectral density approach. Journal of the American Statistical Association 94, 1201–1220. Hong, Y., Li, H., 2005. Nonparametric specification testing for continuous-time models with applications to term structure of interest rates. Review of Financial Studies 18, 37–84. Huang, J., Wu, L., 2003. Specification analysis of option pricing models based on time-changed Lévy processes. Journal of Finance 59, 1405–1439. Jarrow, R., Li, H., Liu, S., Wu, C., 2010. Reduced-form valuation of callable corporate bonds: theory and evidence. Journal of Financial Economics 95, 227–248. Jiang, G., Knight, J., 2002. Estimation of continuous-time processes via the empirical characteristic function. Journal of Business and Economic Statistics 20, 198–212. Johannes, M., Polson, N., Stroud, J., 2009. Optimal filtering of jump diffusions: extracting latent states from asset prices. Review of Economic Studies 22, 2759–2799. Kitagawa, G., 1996. Monte Carlo filter and smoother for non-gaussian nonlinear state space models. Journal of Computational and Graphical Statistics 5, 1–25. Kloeden, P., Platen, E., Schurz, H., 1994. Numerical Solution of SDE through Computer Experiments. Springer-Verlag, Berlin, Heidelberg, New York. Koutrouvelis, I.A., 1980. A goodness-of-fit test of simple hypotheses based on the empirical characteristic function. Biometrika 67, 238–240.

293

Li, F., 2007. Testing the parametric specification of the diffusion function in a diffusion process. Econometric Theory 23, 221–250. Pan, J., Singleton, K., 2008. Default and recovery implicit in the term structure of sovereign CDS spreads. Journal of Finance 63, 2345–2384. Pedersen, A.R., 1995. A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations. Scandinavian Journal of Statistics 22, 55–71. Piazzesi, M., 2010. Affine term structure models. In: Ait-Sahalia, Y., Hansen, L. (Eds.), Handbook of Financial Econometrics. pp. 691–766. Pitt, M.K., Shephard, N., 1999. Filtering via simulation: auxiliary particle filters. Journal of the American Statistical Association 94, 590–599. Roussas, G., Ioannides, D., 1987. Moment inequalities for mixing sequences of random variables. Stochastic Analysis and Applications 5, 60–120. Singleton, K., 2001. Estimation of affine asset pricing models using the empirical characteristic function. Journal of Econometrics 102, 111–141. Stinchcombe, M., White, H., 1998. Consistent specification testing with nuisance parameters present only under the alternative. Econometric Theory 14, 295–324. Su, L., White, H., 2007. A consistent characteristic-function-based test for conditional independence. Journal of Econometrics 141, 807–834. Sundaresan, S., 2001. Continuous-time methods in finance: a review and an assessment. Journal of Finance 55, 1569–1622. Tauchen, G., 1997. New minimum chi-square methods in empirical finance. In: Kreps, D., Wallis, K. (Eds.), Advances in Econometrics: Seventh World Congress. Cambridge University Press. Vasicek, O., 1977. An equilibrium characterization of the term structure. Journal of Financial Economics 5, 177–188. Yoshihara, K., 1976. Limiting behavior of U-statistics for stationary, absolutely regular processes. Zeitschrift fur Wahrscheinlichkeitstheorie und Verwandte Gebiete 35, 237–252.