Time-invariant restrictions of volatility functionals: Efficient estimation and specification tests



Journal of Econometrics xxx (xxxx) xxx

Contents lists available at ScienceDirect

Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom

Xiye Yang
Department of Economics, Rutgers University, 75 Hamilton Street, New Jersey Hall, New Brunswick, NJ 08901, USA

Article info

Article history: Received 24 December 2018; received in revised form 19 August 2019; accepted 11 October 2019; available online xxxx

JEL classification: C12, C58

Keywords: Specification test; Volatility functionals; Efficiency; High-frequency data

Abstract

This paper studies the estimation and inference problems of time-invariant restrictions on certain known functions of the stochastic volatility process. We first develop a more efficient GMM estimator and derive the efficiency bound under such restrictions. Then we construct an integrated Hausman-type test by summing up the squared differences between this more efficient estimator and the unrestricted estimator computed at different time points. Although less efficient under the null, the latter estimator is consistent under both the null and the alternative. The efficient GMM estimator can also be used to update an existing Bierens-type test and simplify the calculation of the asymptotic variance. Since the quadratic function puts more weight on large values, the Hausman-type test can have greater power than the Bierens-type test, which is based on a linear function of the differences. The simulation study shows that, except for very small local window sizes, the Hausman-type test has good size and superior power. We finally apply these tests to the constant beta hypothesis using empirical data and find substantial evidence against this hypothesis. © 2019 Published by Elsevier B.V.

1. Introduction

Model specification testing is one of the core research fields in econometrics, and it serves as a useful tool for evaluating the various restrictions of economic and econometric models. The early literature mainly focuses on the orthogonality specification between the regressors and residuals in linear regression models (e.g., Ramsey (1974), Wu (1973), and Hausman (1978)). The later literature extends the focus to various specifications: conditional moment tests for maximum likelihood models (e.g., Newey (1985) and Tauchen (1985)), consistent tests for general functional form in regression models (Bierens (1982, 1983, 1990), Bierens and Ploberger (1997), Hong and White (1995), and Fan and Li (1996), among others), and consistent hypothesis tests for time series models (e.g., Hong (1996), Chen and Fan (1999), and Wang and Phillips (2012)).

In recent years, the scope of model specification tests has been further extended to study time-invariant restrictions on certain known functions of the spot covariance matrix of nonstationary high-frequency (intra-day) asset returns. Typically, the spot covariance matrix is assumed to be stochastic, in order to capture the time-varying and random features of the volatilities of the assets' returns. To reduce model complexity, it is often desirable to impose certain time-invariant restrictions, which are subject to misspecification. By default, the model specification in high-frequency econometrics involves a continuum of restrictions with respect to time (see, e.g., (2.5)), whereas, in most of the cases mentioned above, the number of restrictions remains finite.

To the best of our knowledge, there are just two papers studying specification tests in such a nonstationary high-frequency setting. Reiß et al. (2015) have proposed a nonparametric test for constant beta in a continuous-time regression model. The statistic can be viewed as an integrated Anderson–Rubin statistic
https://doi.org/10.1016/j.jeconom.2019.10.003 0304-4076/© 2019 Published by Elsevier B.V.

Please cite this article as: X. Yang, Time-invariant restrictions of volatility functionals: Efficient estimation and specification tests. Journal of Econometrics (2019), https://doi.org/10.1016/j.jeconom.2019.10.003.


X. Yang / Journal of Econometrics xxx (xxxx) xxx

(cf. Anderson and Rubin (1949)). Li et al. (2016) have developed a consistent specification test, which is a Bierens-type test, for more general forms of time-invariant restrictions on the spot covariance matrix. The primary goal of this paper is to advance this branch of the literature by providing a more efficient estimator under time-invariant restrictions and some alternative test statistics for such model specifications.

Li et al. (2016) have demonstrated that the estimation problem under the time-invariant restrictions is essentially a minimum distance type estimation problem under the occupation measure of the volatility process, instead of its stationary distribution, which may not exist in a typical high-frequency setting. The authors first integrate the volatility functionals that may be subject to the restrictions and then construct the estimator by minimizing the weighted distance measure between such integrated volatility functionals.

When estimating the restricted model, there is an issue analogous to the classical heteroskedasticity problem in linear regression models, where the residuals have different variances. The spot estimates at different time points are all informative about the restricted parameter, which is constant over time. Intuitively, the spot estimates at different observation time points can have different asymptotic variances, just like the heteroskedastic residuals in linear regression models. It is then better to weight such spot estimates according to their variances when integrating them, just as generalized least squares (GLS) is a better choice than ordinary least squares (OLS) in the presence of heteroskedasticity. Following this idea, we find a more efficient estimator by first weighting the spot estimates and then integrating them to get the distance measure. We show that the optimal estimator constructed from this new procedure is asymptotically more efficient than the one proposed by Li et al. (2016).
We also derive the asymptotic efficiency bound for the estimators under the time-invariant restrictions and provide a condition to check whether the GMM estimator attains the efficiency bound.

In the second part of this paper, we develop a new integrated Hausman-type test. At each observation time point, we can have two estimators: the unrestricted one is consistent under both the null and the alternative hypotheses, while the restricted one is more efficient under the null but inconsistent under the alternative. Hence, we can construct spot Hausman statistics at each observation time point. Intuitively, those spot Hausman statistics calculated from a sequence of non-overlapping local windows are asymptotically independent, as implied by the independence property of Brownian increments, which are the key components of the estimation errors. Those statistics from the overlapping windows behave like a mixing sequence. We show that the averages of these non-overlapping and overlapping statistics are asymptotically normally distributed with the same asymptotic variance under the null and that they diverge to infinity under the alternative hypothesis.

As a by-product, we propose some alternative Bierens-type test statistics based on the more efficient estimator under restrictions. To be more specific, we estimate the model both with and without constraints, and then evaluate the differences between the restricted and the unrestricted estimators using a Bierens-type determining class of weighting functions (see Section 3.2 for more details). One can also add a volatility-dependent weighting function to the differences. The corresponding test statistics are similar to the one discussed by Li et al. (2016), but they have simpler forms for their asymptotic variances.

An obvious advantage of the Hausman-type test is that it is pivotal and easy to implement. In particular, one does not have to correct for various higher-order biases, a procedure that can be very tedious.
Besides, since the asymptotic distributions of those Bierens-type test statistics are not available under the null hypothesis, one has to employ a simulation-based method to obtain the critical values, which might be computationally demanding. Another advantage is that the Hausman-type test can have superior testing power. The reason is that it is constructed from a quadratic function of the differences between the unrestricted and the restricted estimator at different time points. Hence, it puts more weight on larger differences. In contrast, the Bierens-type statistics use the identity function, which puts less weight on larger differences than the quadratic function. Besides, differences with opposite signs might cancel each other out in the Bierens-type statistics, reducing the likelihood of rejecting the null even when it is false.

We apply these tests to the constant beta hypothesis, which has been studied by Reiß et al. (2015), using high-frequency data from sector ETFs. Those authors developed an Anderson–Rubin-type test and found substantial evidence against this null using several individual stocks. Our results with the sector ETFs are in line with such a finding. Consistent with the simulation study, the rejection rates of the Hausman-type statistics are substantially higher than those of the Bierens-type statistics.

The rest of this paper is organized as follows. Section 2 discusses the estimation under both the unrestricted and the restricted models, develops the asymptotic properties of the various estimators, and gives the asymptotic efficiency bound for the restricted models. Section 3 introduces the Hausman-type test and the alternative Bierens-type test. The Monte Carlo simulation results are summarized in Section 4. We present the empirical study in Section 5. Finally, Section 6 concludes.

2. Generalized method of integrated moments

We use $\xrightarrow{P}$ and $\xrightarrow{\mathcal{L}_{st}}$ to denote convergence in probability and stable convergence in law, respectively. For any matrix $A$, we denote its transpose by $A^{\intercal}$ (or $[A]^{\intercal}$ when the expression of $A$ is long), and its row and column numbers by $r_A$ and $c_A$, respectively. We use $\mathrm{vec}(\cdot)$ to denote the vectorization operator and $\otimes$ to denote the Kronecker product. The commutation matrix $T_{r_A,c_A}$ is uniquely defined as the $(r_A c_A)$-by-$(r_A c_A)$ matrix that transforms $\mathrm{vec}(A)$ into $\mathrm{vec}(A^{\intercal})$. That is, $T_{r_A,c_A}\,\mathrm{vec}(A) = \mathrm{vec}(A^{\intercal})$. This definition implies that the commutation matrix is orthogonal.
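The defining property of the commutation matrix can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `vec` and `commutation_matrix` are hypothetical and chosen only for illustration. It builds the matrix as a permutation of the entries of vec(A), which is also why it is orthogonal.

```python
import numpy as np

def vec(a):
    """Stack the columns of a matrix into one long vector (column-major order)."""
    return np.asarray(a, dtype=float).reshape(-1, order="F")

def commutation_matrix(r, c):
    """Return the (r*c)-by-(r*c) matrix T with T @ vec(A) == vec(A.T) for any r-by-c A."""
    t = np.zeros((r * c, r * c))
    for i in range(r):
        for j in range(c):
            # A[i, j] sits at position j*r + i in vec(A) and at i*c + j in vec(A.T).
            t[i * c + j, j * r + i] = 1.0
    return t

A = np.arange(6.0).reshape(2, 3)   # an arbitrary 2-by-3 matrix
T = commutation_matrix(2, 3)
print(np.allclose(T @ vec(A), vec(A.T)))   # defining property
print(np.allclose(T @ T.T, np.eye(6)))     # orthogonality
```

Because each row and each column of the matrix contains exactly one entry equal to one, its transpose is its inverse.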


2.1. Setup and assumption

Let $Z$ be a $d$-dimensional stochastic process defined on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,T]}, \mathbb{P})$. Throughout this paper, we assume that $Z$ is a multivariate Itô semimartingale of the following form:

\[
Z_t = Z_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + \int_0^t\!\!\int_E \delta(s,z)\,\mu(ds,dz), \tag{2.1}
\]

where $b = (b_s)$ is a $d$-dimensional càdlàg adapted process representing the "genuine" drift, the process $\sigma = (\sigma_s)$ is an $\mathbb{R}^d \times \mathbb{R}^{d'}$-valued càdlàg process adapted to the filtration, $W$ is a $d'$-dimensional standard Brownian motion, the Poisson random measure $\mu$ is defined on $\mathbb{R}_+ \times E$ for some auxiliary Polish space $E$ (e.g., $\mathbb{R}^d$), and $\delta = \delta(\omega,t,z)$ is a predictable $\mathbb{R}^d$-valued function on $\Omega \times \mathbb{R}_+ \times E$. The compensator of $\mu$ is given by $\nu(\omega,dt,dx) = dt \otimes \lambda(\omega,t,dx)$ for some $\sigma$-finite measure $\lambda(\omega,t,dx)$ on $E$. For simplicity, the last double integral is often denoted by $\delta * \mu_t$ in the literature.

We denote the spot volatility matrix of $Z$ at time $t$ by $c_t \equiv \sigma_t\sigma_t^{\intercal}$, which takes values in the space $\mathcal{M}_d^+$ consisting of $d$-dimensional positive definite matrices. Note that it is a $d$-by-$d$ dimensional càdlàg adapted process. We assume that the vectorization of $c$, denoted by $\mathrm{vec}(c)$, has the following representation:

\[
\mathrm{vec}(c_t) = \mathrm{vec}(c_0) + \int_0^t \tilde b_s\,ds + \int_0^t \tilde\sigma_s\,d\tilde W_s + \int_0^t\!\!\int_E \tilde\delta(s,x)\,\mu(ds,dx), \tag{2.2}
\]

where the Brownian motion $\tilde W$ can be correlated with $W$. We use the same jump measure $\mu$ as above for simplicity. The processes $\tilde b$ and $\tilde\sigma$ are càdlàg and adapted.

Assumption (HF). The log-price process $Z$ and its volatility process $c$ are given by (2.1) and (2.2), respectively. Let $r \in [0,2]$ and $L$ be some positive finite number. There exists a localizing sequence $(\tau_m)$ of stopping times such that, for each $m$, the following conditions hold:

(i) There exists a nonnegative bounded $\lambda$-integrable function $\Gamma_m$ on $E$ satisfying the following conditions: $\int_E [\Gamma_m(z)]^r\,\lambda(dz) < \infty$, and $\|\delta(\omega,t,z)\| \wedge 1 \le \Gamma_m(z)$ for all $(\omega,t,z)$ with $t \le \tau_m(\omega)$. Moreover, when $r \in [0,1]$, the "genuine" drift process $b$ satisfies $\|b_t\| \le L$ for all $t \le \tau_m$. When $r \in (1,2]$, $\big\|b_t + \int_E \delta(t,z)\,1_{\{\|\delta(t,z)\|\le 1\}}\,\lambda(t,dz)\big\| \le L$ for all $t \le \tau_m$.

(ii) There is a sequence of convex compact subsets $K_m$ of $\mathcal{M}_d^+$ such that $c_t \in K_m$ for all $t \le \tau_m$. Furthermore, $\|\tilde\sigma_t\| \le L$ and $\big\|\tilde b_t + \int_E \delta(t,z)\,1_{\{\|\delta(t,z)\|\le 1\}}\,\lambda(t,dz)\big\| \le L$ for any $t \le \tau_m$.

We define $Z'$, which will be used in the examples below, as follows:

\[
Z'_t =
\begin{cases}
Z_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s & \text{if } r \le 1, \\[4pt]
Z_0 + \int_0^t \Big( b_s + \int_E \delta(s,z)\,1_{\{\|\delta(s,z)\|\le 1\}}\,\lambda(s,dz) \Big)\,ds + \int_0^t \sigma_s\,dW_s & \text{if } r > 1.
\end{cases}
\]

By definition, this process almost surely has continuous paths. Under condition (i), this process is also locally bounded.

Assumption HF is very general and widely used in the high-frequency econometrics literature (see, e.g., Jacod and Protter (2011), Jacod and Rosenbaum (2013), Aït-Sahalia and Jacod (2014), and Li et al. (2016)). It includes most of the continuous-time models used in economics and finance as special cases. Conditions (i) and (ii) describe the behaviors of $Z$ and $c$, respectively. Intuitively speaking, this assumption requires that the various components in the representations of $Z$ and $c$ are all locally bounded over $[0,T]$. The parameter $r$ is known as the jump activity index in the literature. It is worth noting that condition (ii) essentially imposes no restrictions on the jump activity index of the volatility process $c$.

2.2. A continuum of moment constraints

In this paper, we are interested in testing a continuum of linear restrictions on some possibly nonlinear transformations of the volatility process $c$:

\[
y(c_t) = x(c_t)\,\theta_0, \quad \text{almost everywhere on } [0,T]. \tag{2.3}
\]

Here, $y(\cdot)$ is an $\mathbb{R}^p$-valued function, $x(\cdot)$ is an $\mathbb{R}^{p\times q}$-valued function, and $\theta_0$ is a $q$-by-1 time-invariant vector (but $\theta_0$ can take different values for different realizations). For simplicity, we may denote $x(c_t)$ by $x_t$, and $y(c_t)$ by $y_t$. The unrestricted model can be written as

\[
y(c_t) = x(c_t)\,\theta_t, \quad \text{almost everywhere on } [0,T], \tag{2.4}
\]

where $\theta = (\theta_t)$ is a $q$-dimensional stochastic process. Suppose this process $\theta_t$ is uniquely defined by (2.4); then the testing problem can also be framed as

\[
\theta_t = \theta_0, \quad \text{almost everywhere on } [0,T]. \tag{2.5}
\]

This restriction requires the entire path of $\theta_t$, almost everywhere on $[0,T]$, to take the same value $\theta_0$. Hence, it imposes a continuum of restrictions on the process $\theta$, which otherwise could be stochastic.


The testing problem is to determine whether a given realization $\omega \in \Omega$ belongs to the following set:

\[
\Omega_T^0 = \{\omega \in \Omega : y(c_t(\omega)) = x(c_t(\omega))\,\theta_0(\omega), \text{ almost everywhere on } [0,T]\}.
\]

Note that this is the typical format of a null hypothesis in the high-frequency econometrics literature (cf. Aït-Sahalia and Jacod (2009), Jacod and Todorov (2009, 2010), and Jacod (2012), among others).

Example (Continuous-time Regression). Let $Z = (X^{\intercal}, Y^{\intercal})^{\intercal}$, where $X$ and $Y$ are two Itô semimartingales with dimensions $K$ and $P$, respectively. Consider the following continuous-time regression model

\[
Y'_t = Y_0 + \int_0^t \beta_s\,dX'_s + U_t, \quad \forall t \in [0,T], \tag{2.6}
\]

or equivalently, in a "locally linear" format,

\[
dY'_t = \beta_t\,dX'_t + dU_t, \quad \forall t \in [0,T], \tag{2.7}
\]

where $U$ is a $P$-dimensional continuous Itô semimartingale orthogonal to $X$, that is, $[X,U] \equiv 0$, where $[\cdot,\cdot]$ is the quadratic co-variation operator. This model has been studied by Mykland and Zhang (2006), Bollerslev et al. (2016), and Li et al. (2017a). The parameter of interest $\beta$ is referred to as the continuous beta in the literature. It is worth pointing out that $\beta$ can be either deterministic or stochastic. The possible time-varying/stochastic feature of $\beta$ is in line with conditional asset pricing models, which often allow the exposure of assets to fundamental risks to change over time (e.g., Hansen and Richard (1987)). For simplicity, one may assume that it is time-invariant, so it is of interest to test whether $\beta$ is constant or not. Note that, in this case, the instantaneous volatility matrix at time $t$ can be written as

\[
c_t = \begin{pmatrix} c_{XX,t} & c_{XY,t} \\ c_{YX,t} & c_{YY,t} \end{pmatrix},
\]

where, for example, $c_{XY,t} \equiv d[X',Y']_t/dt$. Accordingly, let $I_d^X$ be the first $K$ rows of the identity matrix $I_d$, and $I_d^Y$ the remaining $P$ rows, so that $c_{YX,t} = I_d^Y c_t [I_d^X]^{\intercal}$ and $c_{XX,t} = I_d^X c_t [I_d^X]^{\intercal}$ are both functions of the volatility process $c_t$. Under the orthogonality condition $[X',U] \equiv 0$, it is easy to derive that

\[
c_{YX,t} \equiv \beta_t\,c_{XX,t},
\]

which is equivalent to

\[
\mathrm{vec}(c_{YX,t}) \equiv (c_{XX,t}^{\intercal} \otimes I_P)\,\mathrm{vec}(\beta_t),
\]

where we use the identity $\mathrm{vec}(ABC) = (C^{\intercal} \otimes A)\,\mathrm{vec}(B)$. Hence, in this case, we have $\theta_t \equiv \mathrm{vec}(\beta_t)$ with

\[
y(c_t) = \mathrm{vec}(c_{YX,t}) = \mathrm{vec}\big(I_d^Y c_t [I_d^X]^{\intercal}\big)
\quad\text{and}\quad
x(c_t) = c_{XX,t}^{\intercal} \otimes I_P = \big(I_d^X c_t [I_d^X]^{\intercal}\big)^{\intercal} \otimes I_P.
\]

An alternative way to formulate the null hypothesis is as follows. Suppose $c_{XX,t}$ is invertible almost everywhere on $[0,T]$; then we get $\beta_t = c_{YX,t}\,c_{XX,t}^{-1}$. Accordingly, we have $y(c_t) = \mathrm{vec}(c_{YX,t}\,c_{XX,t}^{-1})$ and $x(c_t) = I_{PK}$.
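To make the mapping from the spot covariance matrix to the moment functions concrete, the following minimal sketch (assuming NumPy; all numerical values and variable names are hypothetical) builds a covariance matrix from a known beta, forms y(c) and x(c) with the selection matrices described above, and recovers vec(beta) under both formulations.

```python
import numpy as np

def vec(a):
    """Stack the columns of a matrix into one long vector (column-major order)."""
    return np.asarray(a, dtype=float).reshape(-1, order="F")

K, P = 2, 1          # dimensions of X and Y (illustrative choice)
d = K + P
rng = np.random.default_rng(0)

# Hypothetical spot quantities: a positive definite c_XX, a beta, and c_UU.
root = rng.standard_normal((K, K))
c_xx = root @ root.T + np.eye(K)
beta = np.array([[0.5, -1.2]])          # P-by-K
c_uu = np.array([[0.3]])

# Spot covariance matrix implied by dY' = beta dX' + dU with [X', U] = 0.
c_yx = beta @ c_xx
c_yy = beta @ c_xx @ beta.T + c_uu
c = np.block([[c_xx, c_yx.T], [c_yx, c_yy]])

sel_x = np.eye(d)[:K]    # first K rows of I_d
sel_y = np.eye(d)[K:]    # remaining P rows of I_d

# First formulation: y(c) = vec(c_YX), x(c) = c_XX' (Kronecker) I_P.
y = vec(sel_y @ c @ sel_x.T)
x = np.kron((sel_x @ c @ sel_x.T).T, np.eye(P))
theta = np.linalg.solve(x.T @ x, x.T @ y)    # identity weight matrix

# Alternative formulation: y(c) = vec(c_YX c_XX^{-1}), x(c) = I_{PK}.
theta_alt = vec(c_yx @ np.linalg.inv(c_xx))

print(theta, theta_alt)   # both should reproduce vec(beta)
```

With an exactly specified covariance matrix, both formulations return the same vector; with estimated spot covariances they generally differ in finite samples.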

Example (Continuous-time Regression with Instruments). It may happen that the orthogonality condition $[X',U] \equiv 0$ does not hold. In such a case, we have an endogeneity problem. Suppose we have a set of instrument processes $W$ such that $[W',U] \equiv 0$ and $[W',X'] \ne 0$ on $[0,T]$. Accordingly, let $Z = (X^{\intercal}, Y^{\intercal}, W^{\intercal})^{\intercal}$. The instantaneous coefficient $\beta_t$ satisfies $c_{YW,t} = \beta_t\,c_{XW,t}$. Then the unrestricted moment conditions for $\theta_t \equiv \mathrm{vec}(\beta_t)$ can be written as ($d_W$ is the dimension of $W$):

\[
\mathrm{vec}(c_{YW,t}) = (c_{XW,t}^{\intercal} \otimes I_{d_W})\,\theta_t, \quad \text{almost everywhere on } [0,T].
\]

That is, we have

\[
y(c_t) = \mathrm{vec}\big(I_d^Y c_t [I_d^W]^{\intercal}\big)
\quad\text{and}\quad
x(c_t) = \big(I_d^X c_t [I_d^W]^{\intercal}\big)^{\intercal} \otimes I_{d_W}.
\]

Example (Idiosyncratic Volatility Restrictions). For simplicity, consider the continuous-time regression model with the orthogonality condition. It can be shown that the instantaneous idiosyncratic volatility is given by

\[
c_{UU,t} = c_{YY,t} - \beta_t\,c_{XX,t}\,\beta_t^{\intercal} = c_{YY,t} - c_{YX,t}\,c_{XX,t}^{-1}\,c_{XY,t}.
\]

There are several restrictions on the idiosyncratic volatility process that one may want to test. A first possible restriction is that the idiosyncratic volatility is constant. We can write the null hypothesis as $\mathrm{vec}(c_{UU,t}) = \theta_0$, almost everywhere on $[0,T]$. Another possibility is to test whether the noise-to-signal ratio is constant: $c_{UU,t} \equiv M c_{XX,t}$ for some constant $P$-by-$K$ matrix $M$. This condition is equivalent to $\mathrm{vec}(c_{UU,t}) \equiv (c_{XX,t}^{\intercal} \otimes I_P)\,\theta_0$, where $\theta_0 = \mathrm{vec}(M)$. Combining the above two cases, we can get a more general constraint: $c_{UU,t} \equiv M_0 + M_1 c_{XX,t}$, for two constant matrices $M_0$ and $M_1$. In this case, the null hypothesis can be written as

\[
\mathrm{vec}(c_{UU,t}) \equiv \big(I_{P^2},\; c_{XX,t}^{\intercal} \otimes I_P\big)\,\theta_0,
\quad\text{where}\quad
\theta_0 = \begin{pmatrix} \mathrm{vec}(M_0) \\ \mathrm{vec}(M_1) \end{pmatrix}.
\]
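The stacked form of the general constraint can be verified directly. The sketch below, assuming NumPy and purely hypothetical values for the matrices (with K = P so that all products are P-by-P), checks that the linear representation reproduces vec(c_UU).

```python
import numpy as np

def vec(a):
    """Stack the columns of a matrix into one long vector (column-major order)."""
    return np.asarray(a, dtype=float).reshape(-1, order="F")

P = 2  # here K = P, so that M1 @ c_XX is P-by-P

# Hypothetical constant matrices and a hypothetical spot value of c_XX.
m0 = np.array([[0.2, 0.0], [0.0, 0.1]])
m1 = np.array([[0.5, 0.1], [0.1, 0.4]])
c_xx = np.array([[1.0, 0.3], [0.3, 2.0]])

# Idiosyncratic volatility under the restriction c_UU = M0 + M1 c_XX.
c_uu = m0 + m1 @ c_xx

# x(c) = (I_{P^2}, c_XX' (Kronecker) I_P) and theta_0 = (vec(M0)', vec(M1)')'.
x = np.hstack([np.eye(P * P), np.kron(c_xx.T, np.eye(P))])
theta0 = np.concatenate([vec(m0), vec(m1)])

print(np.allclose(x @ theta0, vec(c_uu)))
```

Setting `m1` to zero gives the constant idiosyncratic volatility case, and setting `m0` to zero gives the constant noise-to-signal ratio case.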


2.3. The unrestricted model: estimation and asymptotics

Under the unrestricted model (2.4), we need the following identification assumption.

Assumption (ID). There exists a positive semi-definite (stochastic or deterministic) matrix process $\Phi = (\Phi_t)_{t\ge 0}$ such that the stochastic process $x^{\intercal}\Phi x = (x_t^{\intercal}\Phi_t x_t)_{t\ge 0}$ is almost surely invertible, for Lebesgue almost every $t \in [0,T]$.

According to Assumption ID, we almost surely have $\theta_t = (x_t^{\intercal}\Phi_t x_t)^{-1} x_t^{\intercal}\Phi_t y_t$, almost everywhere on $[0,T]$. Hence, the process $\theta$ can be viewed as a functional of the volatility process $c$. The spot estimator $\hat\theta_t = (\hat x_t^{\intercal}\hat\Phi_t\hat x_t)^{-1}\hat x_t^{\intercal}\hat\Phi_t\hat y_t$ can be viewed as a minimum distance estimator:

\[
\hat\theta_t \equiv \operatorname*{argmin}_{\theta}\,(\hat y_t - \hat x_t\theta)^{\intercal}\,\hat\Phi_t\,(\hat y_t - \hat x_t\theta), \quad \text{almost everywhere on } [0,T],
\]

where $\hat x_t = x(\hat c_t)$ and $\hat y_t = y(\hat c_t)$. Clearly, we need to estimate the spot volatility process in order to obtain the estimator $\hat\theta_t$. For simplicity, we assume that $Z$ is observed at equally spaced time points $i\Delta_n$, $i = 0, 1, \ldots, \lfloor T/\Delta_n \rfloor$, with $\Delta_n \to 0$ asymptotically, where $\lfloor\cdot\rfloor$ is the floor function. It has been established in the literature (e.g., Jacod and Protter (2011) and Aït-Sahalia and Jacod (2014), among others) that the spot covariance matrix at $i\Delta_n$ can be estimated by

\[
\hat c_i^n := \frac{1}{k_n\Delta_n} \sum_{j=1}^{k_n} (\Delta_{i+j}^n Z)(\Delta_{i+j}^n Z)^{\intercal}\, 1_{\{-u_n \le \Delta_{i+j}^n Z \le u_n\}}. \tag{2.8}
\]

Here $k_n$ is a tuning parameter (local window size) satisfying $k_n \to \infty$ and $k_n\Delta_n \to 0$, and $u_n = (u_{1,n}, \ldots, u_{d,n})^{\intercal}$ is a $d$-dimensional vector of truncation thresholds. For each $k = 1, \ldots, d$, $u_{k,n} = u_k\Delta_n^{\varpi}$ with $\varpi \in (0, 1/2)$ and some positive finite number $u_k$.

In general, for any $t \in ((i-1)\Delta_n, i\Delta_n]$, the spot volatility $c_t$ can be consistently estimated by $\hat c_i^n$, because of the càdlàg property of volatility paths. Hence, in the limit, we will be able to recover the entire volatility path on $[0,T]$. However, if we test the null (2.3) at each time instant, then we will have a multiple testing problem. One solution is to construct test statistics based on integrated functionals of the stochastic volatility process.

The volatility occupation measure $F$ on a finite interval $[0,T]$ is defined as $F(B) = \int_0^T 1_{\{c_s \in B\}}\,ds$, for any Borel subset $B \subset \mathbb{R}^{d\times d}$ (see, e.g., Li et al. (2013, 2016) for more details). For any function $g$ defined on $\mathbb{R}^{d\times d}$, we have (we vectorize $g$ because central limit theorems are usually stated for vector-valued random variables rather than matrix-valued ones):

\[
Fg \equiv \int \mathrm{vec}\big(g(c)\big)\,F(dc) = \int_0^T \mathrm{vec}\big(g(c_s)\big)\,ds.
\]

Hence, the normalized measure $\frac{1}{T}F$ can be viewed as an "expectation" operator, which gives the "mean" of the process $g(c)$ over $[0,T]$. Therefore, $Fg$ gives an integrated moment of the spot volatility process. If we plug in spot volatility estimates at different times, then we will get a consistent estimator of $Fg$, given by

\[
\hat F_n g := \sum_{i=0}^{\lfloor T/\Delta_n \rfloor - k_n} \mathrm{vec}\big(g(\hat c_i^n)\big)\,\Delta_n.
\]
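The truncated spot covariance estimator (2.8) and the raw plug-in estimator can be sketched as follows. This is an illustrative implementation only, assuming NumPy; the function names, the simulated constant-volatility data, and the threshold constants are hypothetical choices, and no bias correction is applied.

```python
import numpy as np

def vec(a):
    """Stack the columns of a matrix (or wrap a scalar) into a 1-D vector."""
    return np.asarray(a, dtype=float).reshape(-1, order="F")

def spot_covariances(increments, k_n, delta_n, u_n):
    """Truncated local estimates c_hat[i], i = 0, ..., n - k_n, from increments i+1, ..., i+k_n."""
    n, d = increments.shape
    # Keep an increment only if every component lies within its threshold.
    keep = np.all(np.abs(increments) <= u_n, axis=1)
    outer = np.einsum("ij,ik->ijk", increments, increments) * keep[:, None, None]
    csum = np.concatenate([np.zeros((1, d, d)), np.cumsum(outer, axis=0)])
    # Rolling sum over the k_n increments that follow observation i.
    return (csum[k_n:] - csum[:-k_n]) / (k_n * delta_n)

def plug_in(g, c_hat, delta_n):
    """Raw plug-in estimate: sum of vec(g(c_hat_i)) * delta_n over all local windows."""
    return sum(vec(g(c)) for c in c_hat) * delta_n

# Simulated example with constant volatility and no jumps (hypothetical values).
rng = np.random.default_rng(1)
n, k_n = 2000, 45
delta_n = 1.0 / n
c_true = np.array([[0.04, 0.01], [0.01, 0.09]])
dz = rng.multivariate_normal(np.zeros(2), c_true * delta_n, size=n)
u_n = 4.0 * np.sqrt(np.diag(c_true)) * delta_n ** 0.49   # u_k * delta_n**varpi

c_hat = spot_covariances(dz, k_n, delta_n, u_n)
integrated = plug_in(lambda c: c, c_hat, delta_n)   # estimates vec of the integrated covariance
print(c_hat.shape, integrated)
```

The cumulative-sum construction avoids recomputing each overlapping window from scratch, so the cost is linear in the number of observations.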

However, it is well known that such a plug-in type estimator has higher-order biases (see, e.g., Jacod and Rosenbaum (2013, 2015), Li and Xiu (2016), Li et al. (2017a), Li et al. (2019), and Yang (2018)). One can construct a bias-corrected estimator by using either the multi-scale jackknife method proposed by Li et al. (2019) or the analytical-based correction method proposed by Yang (2018). More specifically, the multi-scale jackknife estimator is given by

\[
F_n g = \sum_{l=1}^{L} \psi_l\,\hat F_n g(k_{l,n}), \tag{2.9}
\]

where $\hat F_n g(k_{l,n})$ means that the spot volatility estimates are constructed with window size $k_{l,n}$. The weights $\psi_l$ and local windows $k_{l,n}$, $1 \le l \le L$, satisfy

\[
\sum_{l=1}^{L} \psi_l = 1, \qquad
\sum_{l=1}^{L} \psi_l\,k_{l,n}^{-1} = o(\Delta_n^{1/2}), \qquad
\sum_{l=1}^{L} \psi_l\,k_{l,n} = o(\Delta_n^{-1/2}).
\]
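One simple way to obtain weights that meet these three conditions is to make the second and third sums exactly zero, which trivially satisfies the two order conditions. The sketch below does this for L = 3 by solving a linear system; it assumes NumPy, uses a hypothetical helper name and hypothetical window sizes, and is not necessarily the construction used by Li et al. (2019).

```python
import numpy as np

def jackknife_weights(windows):
    """Solve for three weights with sum 1, sum(psi / k) = 0, and sum(psi * k) = 0."""
    k = np.asarray(windows, dtype=float)
    if k.size != 3 or np.unique(k).size != 3:
        raise ValueError("this sketch expects exactly three distinct window sizes")
    # Rows: normalisation, under-smoothing condition, over-smoothing condition.
    system = np.vstack([np.ones(3), 1.0 / k, k])
    return np.linalg.solve(system, np.array([1.0, 0.0, 0.0]))

windows = [20, 40, 80]            # hypothetical local window sizes k_{l,n}
psi = jackknife_weights(windows)
print(psi, psi.sum())
```

The weights necessarily take both signs, so the combined estimator is a linear extrapolation across window sizes rather than a convex average.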

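On the other hand, the analytical-based correction estimator is given by

\[
F_n g = \hat F_n g(k_n) - \frac{1}{k_n} B_n^u g + k_n\Delta_n B_n^o g. \tag{2.10}
\]

Here $B_n^u g$ is an estimator of the "under-smoothing" bias, given by

\[
B_n^u g = \sum_{i=0}^{\lfloor T/\Delta_n \rfloor - k_n} \frac{1}{2} \Big[ I_{r_g c_g} \otimes \mathrm{vec}\big( (I_{d^2} + T_{d,d})(\hat c_i^n \otimes \hat c_i^n) \big) \Big]^{\intercal} \mathrm{vec}\big( [H(g(\hat c_i^n))]^{\intercal} \big)\,\Delta_n,
\]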


where $r_g$ and $c_g$ are the row and column numbers of $g$, respectively (we use a similar notation for any matrix to avoid introducing too many parameters), and $Hg$ gives the Hessian matrix of the function $g$. We follow Magnus and Neudecker (1999) and define it as $Hg := D[Dg]^{\intercal}$, where $Dg := \partial\,\mathrm{vec}(g)/\partial[\mathrm{vec}(c)]^{\intercal}$. There could be other matrix representations of the differential operator $D$; refer to Kollo and von Rosen (2006) for more details. The notation $B_n^o g$ represents an estimator of the "over-smoothing" bias. When $c$ does not jump, it is given by (see Yang (2018) for more details):

\[
B_n^o g = \sum_{i=k_n+1}^{\lfloor T/\Delta_n \rfloor - k_n} \frac{1}{8k_n} \Big( I_{r_g c_g} \otimes \mathrm{vec}(\Delta\hat c_i^n)^{\intercal} \Big) H(g(\hat c_i^n))\,\mathrm{vec}(\Delta\hat c_i^n) - \frac{1}{k_n^2\Delta_n} B_n^u g + \frac{1}{2}\,\mathrm{vec}\Big( g(\hat c_1^n) + g\big(\hat c^n_{\lfloor T/\Delta_n \rfloor - k_n - 1}\big) \Big).
\]

Remark 1. The above $F_n g$ is defined by using all $\hat c_i^n$'s, which are constructed from overlapping local windows. One can also use spot estimates that do not overlap with each other. For any $v \in [0,1)$, let $v_n = \lfloor v k_n \rfloor$. Then, for example, one can use the following raw estimator:

\[
\hat F_n g_{v_n} := \sum_{i=0}^{\lfloor (T - v_n\Delta_n)/(k_n\Delta_n) \rfloor - 1} \mathrm{vec}\big( g(\hat c^n_{i k_n + v_n}) \big)\,\Delta_n.
\]

Then the above overlapping $\hat F_n g$ can be viewed as the average of $\{\hat F_n g_{v_n}\}_{v_n=0}^{k_n-1}$. Since these $k_n$ different nonoverlapping estimates use almost the same set of observations, with the possible exception of the first and last $k_n$ observations, they are nearly perfectly correlated at sufficiently high sampling frequencies. This explains why the nonoverlapping estimators and the overlapping one all have the same asymptotic variance (see Jacod and Rosenbaum (2013) and Yang (2018) for more discussion). The same conclusion also applies to the test statistics introduced in Section 3.

The following lemma, which is a summary of the main results of Li et al. (2019) and Yang (2018), describes the asymptotic behavior of the bias-corrected estimator $F_n g$, constructed by using either one of the above methods. The analytical form of the asymptotic variance (2.12) is from the latter paper.

Lemma 1. Let Assumption HF hold for some $r \in [0,1)$. Consider a function $g$ that is $C^3$ on $K_m^{\epsilon} := \{M \in \mathcal{M}_d^+ : \inf_{A\in K_m}\|M - A\| \le \epsilon\}$ for some $\epsilon > 0$, where $K_m$ is given in Assumption HF. We choose the truncation parameter $\varpi$ such that $\varpi \in \big[\frac{1}{2(2-r)}, \frac{1}{2}\big)$. Furthermore, suppose the following conditions hold for the local window size $k_n$:

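\[
\begin{aligned}
& k_n^3\Delta_n \longrightarrow \infty, \qquad k_n^3\Delta_n^2 \longrightarrow 0, \\
& \sup_{t\in[0,T]}\;\sup_{u\in(0,k_n\Delta_n)} \Big( \big(\|E(\tilde c_{t+u} - \tilde c_t)\|\big)^2 + E\big(\|\tilde c_{t+u} - \tilde c_t\|^2\big) \Big)\,k_n^2\Delta_n \longrightarrow 0.
\end{aligned} \tag{2.11}
\]

Then we have the following functional-stable-convergence-in-law result:

\[
\frac{1}{\sqrt{\Delta_n}}\big( F_n g - Fg \big) \xrightarrow{\mathcal{L}_{st}} Z,
\]

where, conditionally on $\mathcal{F}$, the process $Z$ is a continuous centered Gaussian martingale with variance given by

\[
FV(g,g) := \int_0^T V_{gg,t}\,dt = \int_0^T \big( D g(c_t) \big) \big( (I_{d^2} + T_{d,d})(c_t \otimes c_t) \big) \big[ D g(c_t) \big]^{\intercal}\,dt, \tag{2.12}
\]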

where $D(g(c)) := \partial\,\mathrm{vec}(g)/\partial[\mathrm{vec}(c)]^{\intercal}$ and the commutation matrix $T_{p,q}$ is defined in such a way that, for any $p$-by-$q$ matrix $A$, $T_{p,q}\,\mathrm{vec}(A) = \mathrm{vec}(A^{\intercal})$. Note that when $g$ is a matrix-valued function, $FV(g,g)$ is equivalent to $FV(\mathrm{vec}(g), \mathrm{vec}(g))$.

When the volatility of volatility process $\tilde c$ is also an Itô semimartingale, which holds true for most stochastic volatility models used in practice, the second line of (2.11) is equivalent to $k_n^3\Delta_n^2 \to 0$, imposing no more restrictions than the first line. Let $k_n \propto \Delta_n^{-\kappa}$; then the first line of (2.11) can be rewritten as $\kappa \in (1/3, 2/3)$. It is well established in the literature that the optimal convergence rate for the nonparametric spot volatility estimator is achieved when $k_n^2\Delta_n$ converges to some positive finite number, i.e., $\kappa = 1/2$. That is to say, whether the local window is under-smoothing ($\kappa < 1/2$) or over-smoothing ($\kappa > 1/2$), as long as $\kappa \in (1/3, 2/3)$, one can always get a feasible central limit theorem for the bias-corrected estimator $F_n g$. For more discussion, refer to Li et al. (2019) and Yang (2018).

Moreover, note that the asymptotic variance (2.12) is also an integrated function of the volatility process. As long as one can find the functional form, one can estimate the asymptotic variance by using either $\hat F_n V(g,g)$, which is consistent and sufficient for conducting inference on $g$, or one of the bias-corrected estimators (if one wants better finite-sample performance).
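The integrand of (2.12) can be evaluated numerically for a given function g and a given spot value of c. The sketch below, assuming NumPy, approximates the Jacobian D g(c) by central finite differences; the helper names and the step size are hypothetical choices, and an analytical Jacobian would be preferable whenever it is available.

```python
import numpy as np

def vec(a):
    """Stack the columns of a matrix (or wrap a scalar) into a 1-D vector."""
    return np.asarray(a, dtype=float).reshape(-1, order="F")

def commutation_matrix(d):
    """Return T with T @ vec(A) == vec(A.T) for any d-by-d matrix A."""
    t = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            t[i * d + j, j * d + i] = 1.0
    return t

def jacobian(g, c, step=1e-6):
    """Central-difference approximation of d vec(g) / d vec(c)' at c."""
    d = c.shape[0]
    cols = []
    for m in range(d * d):
        e = np.zeros(d * d)
        e[m] = step
        pert = e.reshape(d, d, order="F")   # perturb one entry of c at a time
        cols.append((vec(g(c + pert)) - vec(g(c - pert))) / (2.0 * step))
    return np.column_stack(cols)

def variance_integrand(g, c):
    """Evaluate D g(c) (I + T)(c kron c) [D g(c)]' at a single spot value c."""
    d = c.shape[0]
    dg = jacobian(g, c)
    middle = (np.eye(d * d) + commutation_matrix(d)) @ np.kron(c, c)
    return dg @ middle @ dg.T

c = np.array([[0.04, 0.01], [0.01, 0.09]])       # hypothetical spot covariance
v_identity = variance_integrand(lambda m: m, c)  # g(c) = c
v_log = variance_integrand(np.log, np.array([[0.04]]))  # scalar case, g = log
print(v_identity[0, 0], v_log)
```

For g(c) = c the top-left entry equals twice the squared spot variance of the first component, and for the scalar logarithm the integrand equals 2 regardless of the volatility level.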


In view of Lemma 1, the instantaneous asymptotic variance for $g = (y, x)$ is given by

\[
\begin{pmatrix} V_{yy,t} & V_{yx,t} \\ V_{xy,t} & V_{xx,t} \end{pmatrix}
=
\begin{pmatrix}
D(y_t)\big( (I_{d^2} + T_{d,d})(c_t \otimes c_t) \big)[D(y_t)]^{\intercal} & D(y_t)\big( (I_{d^2} + T_{d,d})(c_t \otimes c_t) \big)[D(x_t)]^{\intercal} \\
D(x_t)\big( (I_{d^2} + T_{d,d})(c_t \otimes c_t) \big)[D(y_t)]^{\intercal} & D(x_t)\big( (I_{d^2} + T_{d,d})(c_t \otimes c_t) \big)[D(x_t)]^{\intercal}
\end{pmatrix},
\]

where $x_t = x(c_t)$ and $y_t = y(c_t)$. We allow for the possibility that $\Phi_t = \Phi(c_t)$, $\forall t \in [0,T]$, whereas Li et al. (2016) assume $\Phi$ to be an $\mathcal{F}$-measurable time-invariant matrix (see Assumption ID therein). Suppose $\Phi(\cdot)$ is differentiable; then we have

\[
D(\theta) = \partial_x\theta\,D(x) + \partial_y\theta\,D(y) + \partial_\Phi\theta\,D(\Phi),
\]

where, for instance, $\partial_x\theta = \partial\,\mathrm{vec}(\theta)/\partial[\mathrm{vec}(x)]^{\intercal}$. When $\Phi$ is also a function of $c$, Lemma 1 implies that its estimator can be correlated with those of $x$ and $y$ (this may also happen even if $\Phi$ is not a function of $c$). Hence, one may wonder whether $FV(\Phi,\Phi)$, $FV(\Phi,x)$, and $FV(\Phi,y)$ affect the asymptotic variance $FV(\theta,\theta)$ (recall (2.12) for the definition of $FV(\cdot,\cdot)$). In fact, whether $\Phi$ depends on $c$ or not (more precisely, whether $D(\Phi) \equiv 0$ or not), it can be shown that $\partial_\Phi\theta \equiv 0$. Consequently, none of $FV(\Phi,\Phi)$, $FV(\Phi,x)$, and $FV(\Phi,y)$ appears in the expression of $FV(\theta,\theta)$.

Besides, we have $\partial_y\theta_t = (x_t^{\intercal}\Phi_t x_t)^{-1} x_t^{\intercal}\Phi_t$ and $\partial_x\theta_t = -\theta_t^{\intercal} \otimes \partial_y\theta_t + (x_t^{\intercal}\Phi_t x_t)^{-1} \otimes [y_t^{\intercal}(\Phi_t - \Phi_t^{\intercal})]$ (see Appendix B.2 and Lemma B.1 for the derivations). When $\Phi$ is a symmetric matrix process, we get $\partial_x\theta_t = -\theta_t^{\intercal} \otimes \partial_y\theta_t$. Note that both $\partial_y\theta_t$ and $\partial_x\theta_t$ depend on $\Phi_t$. We present the expression for $FV(\theta,\theta)$ in the following theorem. For notational simplicity, we follow the notation used by Li et al. (2016) and define $\bar S := \int_0^T S_t\,dt$ for any process $S = (S_t)_{t\in[0,T]}$.

Theorem 2. Suppose that the assumptions in Lemma 1 and Assumption ID all hold true. Let $\Phi = (\Phi_t)_{t\in[0,T]}$ be a symmetric matrix process. The asymptotic variance of the bias-corrected estimator $F_n\theta$ is given by

\[
FV(\theta,\theta) = \overline{(x^{\intercal}\Phi x)^{-1}\,x^{\intercal}\Phi\,\Sigma_{\theta\theta}\,\Phi x\,(x^{\intercal}\Phi x)^{-1}}, \tag{2.13}
\]

where

\[
\Sigma_{\theta\theta,t} = \big( I_p,\; -\theta_t^{\intercal} \otimes I_p \big)
\begin{pmatrix} V_{yy,t} & V_{yx,t} \\ V_{xy,t} & V_{xx,t} \end{pmatrix}
\begin{pmatrix} I_p \\ -\theta_t \otimes I_p \end{pmatrix}, \tag{2.14}
\]

for t ∈ [0, T ]. The integrand in (2.13) takes the same form as the sandwich variance–covariance matrix of the weighted least square −1 (WLS) estimators. Following a similar argument, the optimal choice for symmetric Φ is Φt∗ = Σθθ, t for t ∈ [0, T ]. The corresponding optimal asymptotic variance for θ is given as follows: −1 −1 FV(θ, θ )∗ = (x⊺ Σθθ x) .

The corresponding estimator $F_n\theta^*$ is an extension of the generalized least squares (GLS) estimator to the continuous-time case.

2.4. The restricted model: estimation and asymptotics

Under the restricted model (2.3), a consistent estimator of $\theta_0$ is given by $\frac{1}{T}F_n\theta$ (note that $\frac{1}{T}F\theta = \frac{1}{T}\int_0^T\theta_t\,dt = \theta_0$ under the null). Besides, one can also construct other estimators. The estimator for a constant $\theta$ considered by Li et al. (2016) can be written as

\[
\hat\theta_n^{\mathrm{LTT}} = (\bar x_n^\intercal\bar\Phi_n\bar x_n)^{-1}\,\bar x_n^\intercal\bar\Phi_n\bar y_n,
\]

where, for instance, $\bar x_n$ is a bias-corrected estimator of $\bar x$. This estimator can be obtained by first estimating the integrated $x$, $\Phi$, and $y$, and then combining them. We note that, under the null hypothesis of a time-invariant $\theta$, the above estimator remains the same if one replaces the possibly time-varying weight matrix process $(\Phi_t)_{t\in[0,T]}$ by a time-invariant one, $\bar\Phi$. In view of this invariance property, it does not matter whether $\Phi$ is time-invariant or not for the estimator $\hat\theta_n^{\mathrm{LTT}}$ considered by Li et al. (2016). However, for other estimators under the null, whether $\Phi$ is time-varying or not can make a difference. Consider the following estimator:

\[
\hat\theta_n^{\mathrm{new}} = (\overline{x^\intercal\Phi x})_n^{-1}\,(\overline{x^\intercal\Phi y})_n.
\]

Similar to the spot estimator $\hat\theta_t$ discussed in the previous subsection, these two estimators are also of the form $\hat B^{-1}\hat A$. Provided that $\bar x^\intercal\bar\Phi\bar x$ and $\overline{x^\intercal\Phi x}$ are invertible, we can use the delta method to derive the asymptotic variances of both $\hat\theta_n^{\mathrm{LTT}}$ and $\hat\theta_n^{\mathrm{new}}$.
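As a rough numerical illustration of the two restricted estimators above, the following Python sketch approximates time integrals by Riemann sums on an equally spaced grid. The array names (`x`, `Phi`, `y`), the helper `integrate`, and the omission of any bias correction are assumptions made purely for illustration; this is not the paper's implementation.

```python
import numpy as np

def integrate(path, dt):
    """Riemann-sum approximation of the integral of a path over [0, T]."""
    return path.sum(axis=0) * dt

def theta_ltt(x, Phi, y, dt):
    """Integrate x, Phi and y first, then combine (LTT-type form)."""
    xb, Pb, yb = integrate(x, dt), integrate(Phi, dt), integrate(y, dt)
    return np.linalg.solve(xb.T @ Pb @ xb, xb.T @ Pb @ yb)

def theta_new(x, Phi, y, dt):
    """Integrate the products x'Phi x and x'Phi y, then combine."""
    xt = np.swapaxes(x, 1, 2)                     # spot transposes, shape (n, q, p)
    A = integrate(xt @ Phi @ x, dt)               # integrated x' Phi x, shape (q, q)
    b = integrate((xt @ Phi @ y[..., None])[..., 0], dt)  # integrated x' Phi y
    return np.linalg.solve(A, b)

# Hypothetical noise-free example: y_t = x_t theta_0 at every grid point.
rng = np.random.default_rng(0)
n, p, q = 50, 3, 2
x = rng.normal(size=(n, p, q))
M = rng.normal(size=(n, p, p))
Phi = M @ np.swapaxes(M, 1, 2) + np.eye(p)        # symmetric positive definite weights
theta0 = np.array([0.8, -0.3])
y = x @ theta0
```

With noise-free inputs that satisfy the restriction exactly, both functions recover `theta0`; the two estimators differ only once estimation error enters, which is what the asymptotic comparison below is about.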

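First, note that the functional form of $\hat\theta_n^{\mathrm{LTT}}$ with respect to $\bar x_n$, $\bar\Phi_n$, and $\bar y_n$ is exactly the same as that of $\theta_t$ with respect to $x_t$, $\Phi_t$, and $y_t$. Quite similar to the case of $D\theta$ (see Appendix B.3 for some explanations), when $\Phi$ is symmetric, we obtain

\[
\partial_{\bar y}\hat\theta_n^{\mathrm{LTT}} = (\bar x_n^\intercal\bar\Phi_n\bar x_n)^{-1}\bar x_n^\intercal\bar\Phi_n,\qquad
\partial_{\bar x}\hat\theta_n^{\mathrm{LTT}} = -[\hat\theta_n^{\mathrm{LTT}}]^\intercal\otimes\partial_{\bar y}\hat\theta_n^{\mathrm{LTT}},\qquad
\partial_{\bar\Phi}\hat\theta_n^{\mathrm{LTT}} = 0.
\]

Once again, the estimation error associated with $\bar\Phi_n$ does not affect the asymptotic variance of $\hat\theta_n^{\mathrm{LTT}}$. As for $\hat\theta_n^{\mathrm{new}}$, we have

\[
\partial_{\overline{x^\intercal\Phi y}}\,\hat\theta_n^{\mathrm{new}} = (\overline{x^\intercal\Phi x})_n^{-1}
\quad\text{and}\quad
\partial_{\overline{x^\intercal\Phi x}}\,\hat\theta_n^{\mathrm{new}} = -[\hat\theta_n^{\mathrm{new}}]^\intercal\otimes(\overline{x^\intercal\Phi x})_n^{-1}.
\]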

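The covariance between $(\overline{x^\intercal\Phi x})_n$ and $(\overline{x^\intercal\Phi y})_n$ is quite complicated. However, as shown in the following theorem, the estimation errors associated with $\Phi$ still have no impact on the asymptotic variance of $\hat\theta_n^{\mathrm{new}}$.

Theorem 3. Let the assumptions in Theorem 2 hold true. For symmetric $\Phi$ (either a stochastic matrix process or a time-invariant matrix), we have the following convergence results under the null hypothesis (2.3).

(i) If the matrix $\bar x^\intercal\bar\Phi\bar x$ is almost surely invertible, then

\[
\frac{1}{\sqrt{\Delta_n}}\big(\hat\theta_n^{\mathrm{LTT}}-\theta_0\big)\ \overset{\mathcal{L}_{st}}{\longrightarrow}\ MN\Big(0,\ (\bar x^\intercal\bar\Phi\bar x)^{-1}\,\bar x^\intercal\bar\Phi\,\bar\Sigma_{\theta_0\theta_0}\,\bar\Phi\bar x\,(\bar x^\intercal\bar\Phi\bar x)^{-1}\Big), \tag{2.15}
\]

where MN stands for mixed normal, $\Sigma_{\theta_0\theta_0}$ is given by (2.14) with $\theta_t$ replaced by $\theta_0$, and $\bar\Sigma_{\theta_0\theta_0} := F\Sigma_{\theta_0\theta_0}$.

(ii) If the matrix $\overline{x^\intercal\Phi x}$ is almost surely invertible, then

\[
\frac{1}{\sqrt{\Delta_n}}\big(\hat\theta_n^{\mathrm{new}}-\theta_0\big)\ \overset{\mathcal{L}_{st}}{\longrightarrow}\ MN\Big(0,\ (\overline{x^\intercal\Phi x})^{-1}\,\overline{x^\intercal\Phi\Sigma_{\theta_0\theta_0}\Phi x}\,(\overline{x^\intercal\Phi x})^{-1}\Big). \tag{2.16}
\]

Part (i) of Theorem 3 generalizes Proposition 1 in Li et al. (2016), where $\Phi$ is the identity matrix, by allowing $\Phi$ to be an unknown time-invariant matrix or even a stochastic process. To the best of our knowledge, part (ii) is new to the literature.

Let $\mathrm{Var}(\hat\theta^{\mathrm{LTT}})$ denote the asymptotic variance of $\hat\theta_n^{\mathrm{LTT}}$ given in (2.15). It has the same sandwich structure as the WLS estimators. Therefore, the optimal choice of $\bar\Phi$ is $\bar\Sigma_{\theta_0\theta_0}^{-1}$, and the optimal variance is given by

\[
\mathrm{Var}(\hat\theta^{\mathrm{LTT}})^* = (\bar x^\intercal\bar\Sigma_{\theta_0\theta_0}^{-1}\bar x)^{-1}.
\]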

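To find the optimal asymptotic variance for $\hat\theta_n^{\mathrm{new}}$, we need the following matrix version of the Cauchy–Schwarz inequality.

Lemma 4 (Cauchy–Schwarz Inequality for Matrix-valued Processes). Let $A = (A_t)_{t\in[0,T]}$ and $B = (B_t)_{t\in[0,T]}$ be two matrix processes such that all the following matrix products are well-defined. Assume that $\overline{B^\intercal B}$ is invertible. Then the matrix

\[
\overline{A^\intercal A} - \overline{A^\intercal B}\,(\overline{B^\intercal B})^{-1}\,\overline{B^\intercal A}
\]

is positive semi-definite. If $\overline{A^\intercal A}$ and $\overline{A^\intercal B}$ are also invertible, the above conclusion is equivalent to the statement that the matrix

\[
(\overline{B^\intercal A})^{-1}\,\overline{B^\intercal B}\,(\overline{A^\intercal B})^{-1} - (\overline{A^\intercal A})^{-1}
\]

is positive semi-definite. Both matrices are identically zero if and only if $A_t \equiv B_t C$, where $C$ is a time-invariant matrix.

Theorem 5. Suppose that Assumption ID holds true. Moreover, assume that $\Sigma_{\theta_0\theta_0} = (\Sigma_{\theta_0\theta_0,t})_{t\in[0,T]}$ is almost surely, almost everywhere positive definite.

(i) Let $\Phi$ be a symmetric matrix process such that $\overline{x^\intercal\Phi x}$ is almost surely invertible. Then the matrix

\[
\mathrm{Var}(\hat\theta^{\mathrm{new}}) - \mathrm{Var}(\hat\theta^{\mathrm{new}})^*
= (\overline{x^\intercal\Phi x})^{-1}\,\overline{x^\intercal\Phi\Sigma_{\theta_0\theta_0}\Phi x}\,(\overline{x^\intercal\Phi x})^{-1} - (\overline{x^\intercal\Sigma_{\theta_0\theta_0}^{-1}x})^{-1}
\]

is almost surely positive semi-definite.

(ii) The following matrices

\[
\mathrm{Var}(\hat\theta^{\mathrm{LTT}})^* - \mathrm{Var}(\hat\theta_n^{\mathrm{new}})^*
= (\bar x^\intercal\bar\Sigma_{\theta_0\theta_0}^{-1}\bar x)^{-1} - (\overline{x^\intercal\Sigma_{\theta_0\theta_0}^{-1}x})^{-1},
\]

\[
\frac{1}{T^2}FV(\theta,\theta)^*\big|_{\theta_t\equiv\theta_0} - \mathrm{Var}(\hat\theta_n^{\mathrm{new}})^*
= \frac{1}{T^2}\,\overline{(x^\intercal\Sigma_{\theta_0\theta_0}^{-1}x)^{-1}} - (\overline{x^\intercal\Sigma_{\theta_0\theta_0}^{-1}x})^{-1}
\]

are almost surely positive semi-definite.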

Since $\Sigma_{\theta_0\theta_0,t}$ is almost surely positive definite on $[0,T]$, there exists an invertible matrix process $\Xi$ such that $\Xi_t^\intercal\Xi_t \equiv \Sigma_{\theta_0\theta_0,t}^{-1}$. Then part (i) readily follows from the matrix version of the Cauchy–Schwarz inequality by letting $A_t \equiv \Xi_t x_t$ and $B_t \equiv \Xi_t^{-\intercal}\Phi_t x_t$. The conclusion of part (i) suggests that the optimal asymptotic variance for $\hat\theta_n^{\mathrm{new}}$ is given by

\[
\mathrm{Var}(\hat\theta_n^{\mathrm{new}})^* = (\overline{x^\intercal\Sigma_{\theta_0\theta_0}^{-1}x})^{-1}.
\]

This lower bound can be achieved by choosing $\Phi_t \equiv \Sigma_{\theta_0\theta_0,t}^{-1}$.

The first conclusion of part (ii) is equivalent to the statement that the matrix

\[
\overline{x^\intercal\Sigma_{\theta_0\theta_0}^{-1}x} - \bar x^\intercal\bar\Sigma_{\theta_0\theta_0}^{-1}\bar x
\]

is almost surely positive semi-definite, which is a direct result of the matrix version of the Cauchy–Schwarz inequality by letting $A_t \equiv \Xi_t x_t$ and $B_t \equiv \Xi_t^{-\intercal}$. Hence, unless $\Sigma_{\theta_0\theta_0,t}^{-1}x_t$ is time-invariant on $[0,T]$, the estimator $\hat\theta_n^{\mathrm{new}}$ is more efficient than $\hat\theta_n^{\mathrm{LTT}}$ under the null hypothesis.

Next, Assumption ID and the positive definiteness of $\Sigma_{\theta_0\theta_0}$ together imply that the matrix process $x^\intercal\Sigma_{\theta_0\theta_0}^{-1}x$ is also almost surely positive definite on $[0,T]$. Hence, there exists an invertible matrix process $\chi$ such that $\chi_t^\intercal\chi_t \equiv x_t^\intercal\Sigma_{\theta_0\theta_0,t}^{-1}x_t$. Then the second conclusion can be proved by letting $A_t \equiv \chi_t^{-\intercal}$ and $B_t \equiv \chi_t$. Hence, unless $x_t^\intercal\Sigma_{\theta_0\theta_0,t}^{-1}x_t$ is time-invariant on $[0,T]$, the estimator $\hat\theta_n^{\mathrm{new}}$ is more efficient than $\frac{1}{T}F_n\theta$ under the null hypothesis.
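The efficiency ranking in Theorem 5(ii) can be checked numerically on a discretized path. The sketch below is only an illustration under stated assumptions: time integrals are replaced by Riemann sums, the paths `x` and `Sigma` are randomly generated rather than estimated, and the function names are hypothetical. Because the matrix Cauchy–Schwarz inequality also holds for finite sums, both variance gaps should be positive semi-definite up to rounding error.

```python
import numpy as np

def integrate(path, dt):
    """Riemann-sum approximation of the integral of a path over [0, T]."""
    return path.sum(axis=0) * dt

def efficiency_gaps(x, Sigma, dt):
    """Return Var(LTT)* - Var(new)* and FV*/T^2 - Var(new)* on a time grid."""
    T = x.shape[0] * dt
    xt = np.swapaxes(x, 1, 2)
    info = xt @ np.linalg.inv(Sigma) @ x            # spot x' Sigma^{-1} x
    var_new = np.linalg.inv(integrate(info, dt))    # optimal variance of theta_new
    x_bar, S_bar = integrate(x, dt), integrate(Sigma, dt)
    var_ltt = np.linalg.inv(x_bar.T @ np.linalg.inv(S_bar) @ x_bar)
    var_unres = integrate(np.linalg.inv(info), dt) / T**2
    return var_ltt - var_new, var_unres - var_new

def min_eig(M):
    """Smallest eigenvalue of the symmetrized matrix."""
    return np.linalg.eigvalsh((M + M.T) / 2).min()

# Hypothetical time-varying paths with p = 3 and q = 2.
rng = np.random.default_rng(1)
n, p, q = 200, 3, 2
base = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x = base + 0.5 * rng.normal(size=(n, p, q))
L = rng.normal(size=(n, p, p))
Sigma = L @ np.swapaxes(L, 1, 2) + np.eye(p)        # spot positive definite matrices
gap_ltt, gap_unres = efficiency_gaps(x, Sigma, 1.0 / n)
```

Both `min_eig(gap_ltt)` and `min_eig(gap_unres)` come out non-negative (up to floating-point error) for any such path, and they shrink toward zero as the paths approach the time-invariance conditions stated above.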

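Example (Continuous-time Regression Continued). Recall that in the example of continuous-time regression (without instruments), we have $\theta_t \equiv \mathrm{vec}(\beta_t) \equiv \mathrm{vec}(c_{YX,t}\,c_{XX,t}^{-1})$, where $c_{XX}$ is assumed to be almost surely invertible on $[0,T]$. After some tedious calculation (refer to Appendix B.5 for more details), we obtain

\[
\Sigma_{\theta\theta,t} = c_{XX,t}\otimes\big(c_{YY,t} - c_{YX,t}\,c_{XX,t}^{-1}\,c_{XY,t}\big) =: c_{XX,t}\otimes c_{UU,t}.
\]

Hence, $x_t^\intercal\Sigma_{\theta\theta,t}^{-1}x_t = c_{XX,t}\otimes c_{UU,t}^{-1}$ and $FV(\theta,\theta)^* = \overline{(x^\intercal\Sigma_{\theta\theta}^{-1}x)^{-1}} = \overline{c_{XX}^{-1}\otimes c_{UU}}$ (recall that $x_t \equiv c_{XX,t}^\intercal\otimes I_P \equiv c_{XX,t}\otimes I_P$). Under the null hypothesis that $\theta_t\equiv\theta_0$, the above estimators with their corresponding optimal weighting process $\Phi$ become

\[
\frac{1}{T}F_n\theta^* = \frac{1}{T}\,\overline{\mathrm{vec}(c_{YX}\,c_{XX}^{-1})}_n,
\]

\[
\hat\theta_n^{\mathrm{LTT},*} = (\bar x_n^\intercal\bar\Sigma_{\theta_0\theta_0,n}^{-1}\bar x_n)^{-1}\,\bar x_n^\intercal\bar\Sigma_{\theta_0\theta_0,n}^{-1}\bar y_n = (\overline{c_{XX}}\otimes I_P)_n^{-1}\,\mathrm{vec}(\overline{c_{YX}})_n,
\]

\[
\hat\theta_n^{\mathrm{new},*} = (\overline{x^\intercal\Sigma_{\theta_0\theta_0}^{-1}x})_n^{-1}\,(\overline{x^\intercal\Sigma_{\theta_0\theta_0}^{-1}y})_n = (\overline{c_{XX}\otimes c_{UU}^{-1}})_n^{-1}\,\big(\overline{(I_K\otimes c_{UU}^{-1})\,\mathrm{vec}(c_{YX})}\big)_n.
\]

The condition for $\mathrm{Var}(\hat\theta^{\mathrm{LTT}})^* = \mathrm{Var}(\hat\theta_n^{\mathrm{new}})^*$ now becomes that the matrix process $c_{UU,t}$ is time-invariant on $[0,T]$ (we have $\Sigma_{\theta_0\theta_0,t}^{-1}x_t = I_K\otimes c_{UU,t}^{-1}$ in this case), which is the constant idiosyncratic volatility restriction discussed in Section 2.2.

To make $\frac{1}{T^2}FV(\theta,\theta)^* = \mathrm{Var}(\hat\theta_n^{\mathrm{new}})^*$, we need the matrix $c_{XX,t}\otimes c_{UU,t}^{-1}$ to be time-invariant on $[0,T]$. To some extent, this is quite similar to the constant noise-to-signal restriction discussed in Section 2.2. When both $X$ and $Y$ are univariate, i.e., $K = P = 1$, these estimators reduce to $\hat\beta_{n2}$, $\hat\beta_{n1}$, and $\hat\beta_n^*$, respectively, all of which have been studied by Li et al. (2017a) (see Section 4.2 therein).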
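The Kronecker-product identities behind this example are easy to verify numerically at a single time point. The following Python sketch uses randomly generated positive definite matrices in place of the spot covariances; the dimensions `K` and `P`, the helper `random_spd`, and the column-major `vec` convention are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
K, P = 2, 3                                   # dimensions of X and Y (illustrative)

def random_spd(d):
    """Random symmetric positive definite d-by-d matrix."""
    A = rng.normal(size=(d, d))
    return A @ A.T + d * np.eye(d)

def vec(M):
    """Column-stacking vec operator."""
    return M.reshape(-1, order="F")

c_XX, c_UU = random_spd(K), random_spd(P)
beta = rng.normal(size=(P, K))
c_YX = beta @ c_XX                            # so that beta = c_YX c_XX^{-1}

x = np.kron(c_XX.T, np.eye(P))                # x_t = c_XX' (x) I_P
Sigma = np.kron(c_XX, c_UU)                   # Sigma_t = c_XX (x) c_UU
Sigma_inv = np.linalg.inv(Sigma)
c_UU_inv = np.linalg.inv(c_UU)

# y_t = x_t theta_t with y = vec(c_YX) and theta = vec(beta)
ok_moment = np.allclose(x @ vec(beta), vec(c_YX))
# x' Sigma^{-1} x = c_XX (x) c_UU^{-1}
ok_info = np.allclose(x.T @ Sigma_inv @ x, np.kron(c_XX, c_UU_inv))
# Sigma^{-1} x = I_K (x) c_UU^{-1}
ok_weight = np.allclose(Sigma_inv @ x, np.kron(np.eye(K), c_UU_inv))
```

All three flags evaluate to `True`, which is why the two equal-efficiency conditions in the example reduce to time-invariance of $c_{UU,t}$ and of $c_{XX,t}\otimes c_{UU,t}^{-1}$, respectively.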

2.5. Efficiency bound

The previous subsection indicates that $\hat\theta_n^{\mathrm{new},*}$ is the most efficient GMM-type estimator under restriction (2.3). One may wonder whether it is the most efficient estimator overall. It is well known that the GMM estimator cannot always achieve the Cramér–Rao lower bound, hence it can be inefficient relative to the maximum likelihood estimator (MLE). Godambe (1960) has demonstrated that, when using the score function, the method of moments estimator is as efficient as the MLE. Carrasco and Florens (2014) have further extended this result and shown that the necessary and sufficient condition for the GMM estimator to be as efficient as the MLE is that the true score belongs to the closure of the linear space spanned by the moment conditions.

In a continuous-time setting, the Local Asymptotic Normality (LAN) property often does not hold. Instead, Clement et al. (2013) have established the Local Asymptotic Mixed Normality (LAMN) property by extending the Hajek convolution theorem in a quite general setup. Jacod and Rosenbaum (2013) have shown that the estimator $F_n g$ of the integrated volatility functionals satisfies the LAMN property and attains the efficiency bound in the unrestricted case where $g(c_t)$ is a time-varying stochastic process. In the context of continuous-time regression, Li et al. (2017a,b) have proved that estimators of the time-invariant continuous beta and jump beta can achieve their corresponding efficiency bounds. Based on these papers, we introduce the following definition.

Definition 1 (LAMN). The sequence $(P_n^\theta)$ satisfies the LAMN property at $\theta = \theta_0$ if there exist a sequence of $q$-dimensional random vectors $\zeta_n$ and a sequence of almost surely positive definite $q$-by-$q$ random matrices $\Gamma_n$ such that, for any $l\in\mathbb{R}^q$,

\[
\log\frac{dP_n^{\theta_0+l/\sqrt{n}}}{dP_n^{\theta_0}} = l^\intercal\Gamma_n^{1/2}\zeta_n - \frac{1}{2}\,l^\intercal\Gamma_n l + o_P(1)
\quad\text{and}\quad
(\zeta_n,\Gamma_n)\ \overset{\mathcal{L}}{\longrightarrow}\ (\zeta,\Gamma),
\]

where $\Gamma$ is an almost surely positive definite $q$-by-$q$ random matrix and $\zeta$ is a $q$-dimensional standard normal random vector independent of $\Gamma$.

To find the efficiency bound of estimating $\theta_0$, we introduce the following simplified version of Assumption HF.

Assumption (SHF). Let $(b_s)$ and $(a_s)$ be two locally bounded stochastic processes on $[0,T]$, with dimensions $d$-by-1 and $d$-by-$d$, respectively. In addition, assume that $(a_s)$ is almost surely, almost everywhere invertible on $[0,T]$. Let $P_n^{\theta+l/\sqrt{n}}$ be the probability distribution of

\[
Z_t^l = Z_0 + \int_0^t b_s\,ds + \int_0^t a_s(\theta + l/\sqrt{n})\,dB_s,
\]

where $\theta, l\in\mathbb{R}^q$ and $B$ is a $d$-by-1 dimensional Brownian motion.

We can also assume that both $b$ and $a$ depend on $Z_t^l$, just like Clement et al. (2013) and Li et al. (2017a). However, this will not change the efficiency bound. Hence, we discard such dependence to make the notation in the proof a bit less complicated. Let $D$, $D_a$, and $D_\theta$ be the derivatives with respect to $c$, $a$, and $\theta$, respectively.

Theorem 6. Assume that $D_\theta a$ is almost surely, almost everywhere well-defined on $[0,T]$. We have the following results:

(i) If Assumption SHF holds true, then the sequence $(P_n^\theta)$ satisfies the LAMN property with

\[
\Gamma = \overline{[D_\theta a]^\intercal (I_d\otimes a^{-\intercal})(I_{d^2} + T_{d,d})(I_d\otimes a^{-1})[D_\theta a]}.
\]

(ii) Let $\gamma(\theta) \equiv \big(Dy - (\theta^\intercal\otimes I_p)Dx\big)[D_a c][D_\theta a]$. Under the null hypothesis, we have $x \equiv \gamma(\theta_0)$ and $x^\intercal\Sigma_{\theta_0\theta_0}^{-1}x \equiv \gamma(\theta_0)^\intercal\Sigma_{\theta_0\theta_0}^{-1}\gamma(\theta_0)$. Hence, if the following condition holds

\[
\gamma(\theta_0)^\intercal\Sigma_{\theta_0\theta_0}^{-1}\gamma(\theta_0) \equiv [D_\theta a]^\intercal (I_d\otimes a^{-\intercal})(I_{d^2} + T_{d,d})(I_d\otimes a^{-1})[D_\theta a], \tag{2.17}
\]

then the limiting distribution of $\sqrt{n}(\hat\theta_n^{\mathrm{new},*}-\theta_0)$, conditionally on $\Gamma$, can be represented as a convolution between a mixed Gaussian distribution $MN(0,\Gamma^{-1})$ and another transition kernel.

Typically, one needs to know the form of $D_\theta a$ to verify (2.17). In the constant beta model studied by Li et al. (2017a), the explicit functional form of $a$ and $\beta$ ($\theta_0 = \mathrm{vec}(\beta_0)$ in this case) can be derived. Hence, the authors can directly evaluate the derivative $D_\theta a$. In more general cases, one probably has to solve the following equation ($\gamma(\theta_0)\equiv x$):

\[
\big(Dy - (\theta_0^\intercal\otimes I_p)Dx\big)[D_a c][D_\theta a] \equiv x.
\]

With the matrix representation adopted in this paper, the matrix $\big(Dy - (\theta_0^\intercal\otimes I_p)Dx\big)[D_a c]$ is $p$-by-$d^2$ dimensional, hence it is not invertible in general. The solution will therefore involve the generalized inverse of that matrix. Thus, it is probably not easy to verify (2.17) in general. One may have to check it on a case-by-case basis. Besides, we would not expect (2.17) to hold in every case, just like the above-mentioned finding that the GMM estimator cannot always achieve the efficiency bound in the classic asymptotic setting. Intuitively, we would expect that the necessary and sufficient condition for (2.17) might be quite similar to (or even the same as) the conclusion of Carrasco and Florens (2014) in the classic asymptotic setting. Yet it does not seem straightforward to show this either. Hence, we leave this to future exploration. However, as will be shown in Section 3.2, even without knowing explicitly whether the efficiency bound can be achieved or not, we can still get the result that the variance of the difference between the two estimators discussed therein is the difference of the respective variances, which is a key property used by Hausman (1978).

3. Consistent tests for a continuum of moment restrictions

We first present a Hausman-type test, which is new in this context. Then we show that the GMM-efficient estimator $\hat\theta_n^{\mathrm{new},*}$ can also be used to update the existing Bierens-type test. The resulting asymptotic variance is easier to compute.

3.1. Hausman-type test

According to the results in the previous section, we know that $\hat\theta_t^* = (\hat x_t^\intercal\hat\Phi_t^*\hat x_t)^{-1}\hat x_t^\intercal\hat\Phi_t^*\hat y_t$, where $\hat\Phi_t^* = \hat\Sigma_{\theta\theta,t}^{-1}$, is a consistent estimator of $\theta_t$ under both the null and alternative hypotheses for any $t\in[0,T]$, while $\hat\theta_n^{\mathrm{new},*}$ is only consistent under the null but more efficient. In fact, the convergence rate of $\hat\theta_t^*$ is $\sqrt{k_n}\wedge\frac{1}{\sqrt{k_n\Delta_n}}$ (see Jacod and Protter (2011) and Aït-Sahalia and Jacod (2014)), since it only uses local observations. In contrast, the estimator $\hat\theta_n^{\mathrm{new},*}$ has a much faster convergence rate, i.e., $1/\sqrt{\Delta_n}$, because it uses all of the observations. Consequently, we do not even need to consider the estimation error associated with $\hat\theta_n^{\mathrm{new},*}$, which will always be dominated by the error associated with $\hat\theta_t^*$. However, we include the asymptotic variance of $\hat\theta_n^{\mathrm{new},*}$ in order to obtain a (possibly) better finite sample performance.

Lemma 7. Let Assumption HF hold for some $r\in[0,1)$. If the following conditions hold

\[
0 < \kappa < \frac{1}{2}
\quad\text{and}\quad
\frac{\kappa\wedge(1-\kappa)}{2(2-r)} \le \varpi < \frac{1}{2}, \tag{3.1}
\]

then under the null hypothesis, for any $t\in[0,T]$, we have

\[
\Big(\frac{1}{k_n}(\hat x^\intercal\hat\Sigma_{\theta\theta}^{-1}\hat x)_t^{-1} - \Delta_n\,[F_n(x^\intercal\Sigma_{\theta\theta}^{-1}x)]^{-1}\Big)^{-1/2}\big(\hat\theta_t^* - \hat\theta_n^{\mathrm{new}}\big)\ \overset{\mathcal{L}_{st}}{\longrightarrow}\ N(0, I_q).
\]

Theoretically speaking, one can also have a central limit theorem for some $\kappa\in[1/2, 2/3)$. However, the asymptotic variance in this case depends on the volatility of volatility, i.e., $\tilde c$, which is notoriously hard to estimate (Vetter, 2015; Aït-Sahalia and Jacod, 2014; Clinet and Potiron, 2017). Therefore, we simply discard this case. In view of the above discussion, we can construct a localized or spot (that is, for any given $t\in[0,T]$) Hausman-type statistic as follows:

\[
h_{i\Delta_n} := \big(\hat\theta_{i\Delta_n}^* - \hat\theta_n^{\mathrm{new}}\big)^\intercal\Big(\frac{1}{k_n}(\hat x^\intercal\hat\Sigma_{\theta\theta}^{-1}\hat x)_{i\Delta_n}^{-1} - \Delta_n\,[F_n(x^\intercal\Sigma_{\theta\theta}^{-1}x)]^{-1}\Big)^{-1}\big(\hat\theta_{i\Delta_n}^* - \hat\theta_n^{\mathrm{new}}\big).
\]

The above discussion and Lemma 7 imply that $h_{ik_n\Delta_n}$ converges stably in law to a chi-square random variable with $q$ degrees of freedom. However, since the null hypothesis (2.3) is a continuum of restrictions on $[0,T]$, the final test is based on the sum of these localized test statistics. More specifically, consider the following test statistics using either non-overlapping or overlapping windows:

\[
H_{\mathrm{non}} = \frac{1}{T}\sum_{i=0}^{\lfloor T/(k_n\Delta_n)\rfloor-1}\frac{h_{ik_n\Delta_n}-q}{\sqrt{2q}}\,k_n\Delta_n,
\qquad
H_{\mathrm{over}} = \frac{1}{T}\sum_{i=0}^{\lfloor T/\Delta_n\rfloor-k_n}\frac{h_{i\Delta_n}-q}{\sqrt{2q}}\,\Delta_n.
\]
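The aggregation of the spot statistics into the two integrated Hausman-type statistics can be sketched as follows. This is a minimal Python illustration, assuming the spot statistics have already been computed and stored in an array `h_spot` indexed by the observation number; the function names and the use of the integer sample size `n = T/∆n` in place of floor expressions are assumptions, not the paper's code.

```python
import numpy as np

def hausman_statistics(h_spot, q, kn, n, T=1.0):
    """Aggregate spot statistics h_spot[i] (time i*dn, i = 0..n-kn) into H_non, H_over."""
    dn = T / n
    z = (np.asarray(h_spot, dtype=float) - q) / np.sqrt(2.0 * q)  # centred and scaled
    m = n // kn                                   # number of non-overlapping windows
    H_non = z[np.arange(m) * kn].sum() * kn * dn / T
    H_over = z[: n - kn + 1].sum() * dn / T
    return H_non, H_over

def standardized(H, kn, n):
    """sqrt(T / (kn * dn)) * H, using T / dn = n."""
    return np.sqrt(n / kn) * H

# Toy input: every spot statistic sits one standard deviation above its mean q.
n, kn, q = 1000, 10, 2
h_toy = np.full(n - kn + 1, q + np.sqrt(2.0 * q))
H_non, H_over = hausman_statistics(h_toy, q, kn, n)
```

In this toy input `H_non` equals 1 and `H_over` equals `(n - kn + 1) / n`; with genuine spot statistics, the value returned by `standardized` is the quantity to be compared with standard normal critical values.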

The statistic $\hat T_n(b)$ proposed by Reiß et al. (2015) is constructed in a similar way. It is based on the square of a spot statistic $\hat C_j^n(b)$, which is the sample spot quadratic covariance between $X$ and $U$ in the continuous-time regression example. Hence, in a certain sense, the statistic $\hat T_n(b)$ can be viewed as an integrated Anderson–Rubin statistic (cf. Anderson and Rubin (1949)). The asymptotic behavior of the Hausman-type test is summarized in the following theorem.

Theorem 8. Let $k_n^2\Delta_n\to 0$, $k_n^3\Delta_n\to\infty$, and $\varpi\in\big[\frac{1}{4(2-r)},\frac{1}{2}\big)$. In restriction to the null, for $H = H_{\mathrm{non}}$ or $H = H_{\mathrm{over}}$, we have

\[
\sqrt{\frac{T}{k_n\Delta_n}}\,H\ \overset{\mathcal{L}_{st}}{\longrightarrow}\ N(0,1).
\]

If $\hat\theta_n^{\mathrm{new}} - \theta_t = O_P(1)$ under the alternative, we have

\[
\sqrt{T/(k_n\Delta_n)}\,H = O_P\big(\sqrt{k_n/\Delta_n}\big).
\]



As shown above, the convergence rates of the two Hausman-type test statistics are both $1/\sqrt{k_n\Delta_n}$. As a comparison, the convergence rate of $F_n g$ in Lemma 1 is $1/\sqrt{\Delta_n}$. This difference is actually quite crucial. With a faster convergence rate, the condition (2.11) in Lemma 1 can only make the integrated third- and higher-order terms in the Taylor expansion of $\hat\theta_t-\theta_t$ asymptotically negligible, while the integrated second-order term leads to the higher-order bias(es) when estimating $F^\tau\theta$ (see the proofs in Jacod and Rosenbaum (2013) and Yang (2018) for more details). However, for the Hausman-type statistics, the condition $k_n^3\Delta_n\to\infty$ can make the second-order term in the same Taylor expansion of $\hat\theta_t-\theta_t$, which includes those biases given in (2.10), asymptotically negligible (see the proof in Appendix B.8 for more details). To put it differently, the sources of biases are essentially the same in both cases. When estimating $F^\tau\theta$, it is impossible to make them asymptotically negligible at the rate $1/\sqrt{\Delta_n}$. In contrast, for the Hausman-type statistics, we only need them to be negligible at the rate of $1/\sqrt{k_n\Delta_n}$, which is much less stringent. Consequently, one does not have to do bias-correction when constructing the Hausman-type test statistics as long as the condition $k_n^3\Delta_n\to\infty$ is satisfied. Hence, it is easier to implement the Hausman-type test than the Bierens-type tests to be discussed in the next subsection.

Another advantage of the Hausman-type test is that it is pivotal. Hence, one does not have to use a simulation-based method to construct the rejection region. From a theoretical point of view, its disadvantage is that it has a more stringent requirement on the local window size. On the one hand, one cannot choose very large local window sizes because of the requirement $k_n^2\Delta_n\to 0$. However, our simulation study shows that its finite sample performance is quite good for a reasonably large range of window sizes.

On the other hand, with smaller local window sizes, the estimation errors of spot $\theta_t$ are more likely to be large, and then the squared errors will lead to large spot $h$ statistics. Hence, this test may have distorted size when the local window size is relatively small. However, this feature can also increase the testing power, as the quadratic function puts more weight on large differences between the unrestricted and the restricted estimators than the identity function, which is used in Bierens-type statistics. Moreover, under the alternative hypothesis,
the divergence rate for the Bierens-type statistics (see the next subsection) is $1/\sqrt{\Delta_n}$, which is slower than $\sqrt{k_n/\Delta_n}$, the divergence rate of the Hausman-type test statistics.

In contrast, one needs to use a simulation-based method to find the critical values for the Bierens-type statistics, which can be very time consuming if the dimension is relatively large. However, the Bierens-type statistics can be applied with a broader range of local window sizes. Moreover, since we use a linear function of $\hat\theta_t-\hat\theta_n^{\mathrm{new}}$ in the construction of the Bierens-type statistics, it is more likely that some of the weighted differences between the unrestricted and the restricted estimator are (partially) canceled by others with the opposite sign. In view of this, one could expect that the Bierens-type statistics have better size even with a small local window. However, this may yield inferior testing power compared to the Hausman-type test.

3.2. Bierens-type tests

Consider a set of functions $W$ and define the following subset of $\Omega$:

\[
\Omega_T^0(W) := \Big\{\omega\in\Omega:\ \int_0^T\theta(t,\omega)\,w(t)\,dt = \int_0^T\theta_0(\omega)\,w(t)\,dt,\ \ \forall w\in W\Big\},
\]

provided that the two integrals are well-defined. We will call $W$ a determining class of $\Omega_T^0$ if $\Omega_T^0(W) = \Omega_T^0$. It is evident that the set of all functions such that the two integrals are well-defined makes up a determining class. The question is whether we can find a smaller set of functions for $W$. In particular, denote by $\delta_\tau$ the Dirac delta function centered at $\tau$, $\forall\tau\in[0,T]$. Note that for any function $g$ of time, we have $\int_0^T g(t)\,\delta_\tau(t)\,dt = g(\tau)$. It readily follows that for any $\tau\in[0,T]$

\[
\theta(\tau,\omega) = \theta_0(\omega)
\iff
\int_0^T\theta(t,\omega)\,\delta_\tau(t)\,dt = \int_0^T\theta_0(\omega)\,\delta_\tau(t)\,dt.
\]

Therefore, $W = \{\delta_\tau\}_{\tau\in[0,T]}$ forms a determining class. Bierens (1982, 1990) has shown that $W = \{\exp(i\tau t)\}_{\tau\in\mathbb{R}}$, where $i$ is the imaginary unit, is also a determining class. Bierens and Ploberger (1997) have further proved that $W = \{w_\tau(t)\}_{\tau\in\Gamma}$, where $\Gamma$ is a compact subset of $\mathbb{R}$, is a determining class if (i) every $w_\tau(\cdot)$ is infinitely many times continuously differentiable at zero and (ii) the set $\{k\in\mathbb{N}: (d/du)^k w(u)|_{u=0} = 0\}$ is finite. One example of such a determining class is $W = \{w_\tau(t) = \cos(\tau t)+\sin(\tau t)\}_{\tau\in\Gamma}$. To construct a Bierens-type test, we can consider the following quantity:

\[
\zeta^1(\tau) := F^\tau\theta - F^\tau\theta_0 = F^\tau(\theta-\theta_0) = \int_0^T w_\tau^\intercal(t)\,\big(\theta(c_t)-\theta_0\big)\,dt.
\]

Note that for a univariate $w_\tau$, one can transform it into a vector using $w_\tau\mathbf{1}$, where $\mathbf{1}$ is a column vector whose elements are all 1. Under the null hypothesis (2.3), we have $\zeta^1(\tau) = 0$, $\forall w_\tau\in W$. The case where $\theta_0$ is known is relatively easy to handle. Below we mainly discuss the case where $\theta_0$ is unknown and needs to be estimated. If we use $\hat\theta_n^{\mathrm{new}}$ to estimate $\theta_0$, then we have the following estimator of $\zeta^1(\tau)$:

\[
\zeta_n^1(\tau) := F_n^\tau\theta - F_n^\tau\hat\theta_n^{\mathrm{new}} = \sum_{i=0}^{\lfloor T/\Delta_n\rfloor-k_n} w_\tau^\intercal(i\Delta_n)\,\big(\theta(\hat c_{i\Delta_n})-\hat\theta_n^{\mathrm{new}}\big)\,\Delta_n. \tag{3.2}
\]
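The empirical process in (3.2) is a weighted Riemann sum and can be evaluated on a grid of $\tau$ values in a few lines. The Python sketch below assumes a univariate weight $w_\tau(t) = \cos(\tau t)+\sin(\tau t)$ turned into a vector via $w_\tau\mathbf{1}$, and takes the spot estimates and the restricted estimate as given inputs; the function and argument names are hypothetical, and no bias correction is applied.

```python
import numpy as np

def zeta1(theta_spot, theta_restricted, taus, dn):
    """Evaluate zeta_n^1(tau) on a grid of tau values.

    theta_spot       : array (m, q), spot estimates at times i*dn, i = 0..m-1
    theta_restricted : array (q,), restricted (time-invariant) estimate
    taus             : 1-d array of tau values
    Returns an array of shape (len(taus),).
    """
    t = np.arange(theta_spot.shape[0]) * dn
    taus = np.asarray(taus, dtype=float)
    phase = np.outer(taus, t)
    w = np.cos(phase) + np.sin(phase)                   # w_tau(i*dn), shape (n_tau, m)
    diff = (theta_spot - theta_restricted).sum(axis=1)  # 1'(theta_i - theta_hat)
    return w @ diff * dn

# Toy check: spot estimates exceed the restricted estimate by 1 in each of q = 2 entries.
theta_hat = np.array([0.8, 0.5])
spot = np.tile(theta_hat + 1.0, (100, 1))
values = zeta1(spot, theta_hat, taus=[0.0, 1.0], dn=0.01)
```

At `tau = 0` the weight is identically one, so the first entry of `values` is `100 * 2 * 0.01 = 2`; when the spot estimates coincide with the restricted estimate, the process is zero for every `tau`.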

Under the null hypothesis, we have

\[
\zeta_n^1(\tau) = \zeta_n^1(\tau)-\zeta^1(\tau) = \big(F_n^\tau\theta - F^\tau\theta\big) - \big(F_n^\tau\hat\theta_n^{\mathrm{new}} - F^\tau\theta_0\big).
\]

Since both estimators $F_n^\tau\theta$ and $\hat\theta_n^{\mathrm{new}}$ are constructed from the same sample, they can be correlated. The following theorem describes the asymptotic behavior of the $\tau$-indexed empirical process $\zeta_n^1(\tau)$. In general, one can also include a volatility-dependent weighting function:

\[
\zeta_n^2(\tau) = F_n^\tau y^\star - F_n^\tau x^\star\,\hat\theta_n^{\mathrm{new}}, \tag{3.3}
\]

where $y^\star = z^\star y$ and $x^\star = z^\star x$ for the function $z^\star$. This testing process is very similar to the one considered by Li et al. (2016).¹ Note that $\zeta_n^2(\tau) = F_n^\tau x^\star(\theta-\hat\theta_n^{\mathrm{new}})$. That is, this testing process can be viewed as a $w_\tau x^\star$-weighted average of the difference $(\hat\theta_t-\hat\theta_n^{\mathrm{new}})$. Hence, if the difference between the unrestricted and the restricted estimators is larger when $x^\star$ is larger, then the test based on $\zeta_n^2(\tau)$ may have larger testing power than that based on $\zeta_n^1(\tau)$.

One can also set $y^\star = x^\intercal\Phi y$ and $x^\star = x^\intercal\Phi x$. Provided that $F_n^\tau x^\intercal\Phi x$ is invertible, we get the following empirical process from (3.3) by multiplying it by $(F_n^\tau x^\intercal\Phi x)^{-1}$:

\[
(F_n^\tau x^\intercal\Phi x)^{-1}F_n^\tau x^\intercal\Phi y - \hat\theta_n^{\mathrm{new}} =: \hat\theta_{n,\tau}^{\mathrm{new}} - \hat\theta_n^{\mathrm{new}}.
\]

¹ In fact, if we replace $\hat\theta_n^{\mathrm{new}}$ by $\hat\theta_n^{\mathrm{LTT}}$ and let the weighting process $z^\star$ be a scalar function of the spot volatility process, the above empirical process becomes the one studied in the cited paper.

That is, we are comparing two different estimators of $\theta_0$ in the restricted model: the estimator $\hat\theta_n^{\mathrm{new}}$ puts equal weights on all of the observations, while $\hat\theta_{n,\tau}^{\mathrm{new}}$ assigns weights according to the weighting function $w_\tau$. But the asymptotic variance of this choice is a bit more involved than that associated with $y^\star = y$ and $x^\star = x$. For simplicity, we will focus on the latter case in what follows.

Theorem 9. Suppose that the assumptions in Lemma 1 and Assumption ID hold true. Denote by $\Phi$ and $\Psi$ the weighting processes used in the construction of $\hat\theta_t$ and $\hat\theta_n^{\mathrm{new}}$, respectively.

Under the null hypothesis, the sequence $\Delta_n^{-1/2}\zeta_n^1(\tau)$ of $\tau$-indexed processes converges stably in law, uniformly in $\tau\in\Gamma$, to a process $Z^1(\tau)$, which is defined on an extension of the original filtered probability space. Conditional on $\mathcal{F}$, the process $Z^1(\tau)$ is centered Gaussian with asymptotic variance $V^1(\tau,\tau;\theta_0,\Phi,\Psi)$. The conditional variance function $V^1(\cdot,\cdot;\theta_0,\Phi,\Psi)$ is given by

\[
\begin{aligned}
V^1(\tau,\eta;\theta_0,\Phi,\Psi) ={}& \overline{w_\tau^\intercal(x^\intercal\Phi x)^{-1}x^\intercal\Phi\Sigma_{\theta_0\theta_0}\Phi x(x^\intercal\Phi x)^{-1}w_\eta} \\
&- \overline{w_\tau^\intercal(x^\intercal\Phi x)^{-1}x^\intercal\Phi\Sigma_{\theta_0\theta_0}\Psi x}\,(\overline{x^\intercal\Psi x})^{-1}\,\bar w_\eta \\
&- \bar w_\tau^\intercal\,(\overline{x^\intercal\Psi x})^{-1}\,\overline{x^\intercal\Psi\Sigma_{\theta_0\theta_0}\Phi x(x^\intercal\Phi x)^{-1}w_\eta} \\
&+ \bar w_\tau^\intercal\,(\overline{x^\intercal\Psi x})^{-1}\,\overline{x^\intercal\Psi\Sigma_{\theta_0\theta_0}\Psi x}\,(\overline{x^\intercal\Psi x})^{-1}\,\bar w_\eta,
\end{aligned}
\]

which can be consistently estimated, uniformly in $\tau$ and $\eta$, by $\hat F_n V^1(\tau,\eta;\theta_0,\Phi,\Psi)$.

For $y^\star = y$ and $x^\star = x$, we have a similar convergence result for $\Delta_n^{-1/2}\zeta_n^2(\tau)$ under the null hypothesis. The conditional variance function $V^2(\cdot,\cdot;\theta_0,\Psi)$ is given by

\[
\begin{aligned}
V^2(\tau,\eta;\theta_0,\Psi) ={}& \overline{w_\tau^\intercal\Sigma_{\theta_0\theta_0}w_\eta} - \overline{w_\tau^\intercal\Sigma_{\theta_0\theta_0}\Psi x}\,(\overline{x^\intercal\Psi x})^{-1}\,\overline{x^\intercal w_\eta} \\
&- \overline{w_\tau^\intercal x}\,(\overline{x^\intercal\Psi x})^{-1}\,\overline{x^\intercal\Psi\Sigma_{\theta_0\theta_0}w_\eta} + \overline{w_\tau^\intercal x}\,\mathrm{Var}(\hat\theta^{\mathrm{new}})\,\overline{x^\intercal w_\eta},
\end{aligned}
\]

where $\mathrm{Var}(\hat\theta^{\mathrm{new}}) = (\overline{x^\intercal\Psi x})^{-1}\,\overline{x^\intercal\Psi\Sigma_{\theta_0\theta_0}\Psi x}\,(\overline{x^\intercal\Psi x})^{-1}$.

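When choosing the optimal weighting process for both $\hat\theta_t$ and $\hat\theta_n^{\mathrm{new}}$, we obtain

\[
V^1(\tau,\eta;\theta_0,\Sigma_{\theta_0\theta_0}^{-1},\Sigma_{\theta_0\theta_0}^{-1}) = \overline{w_\tau^\intercal(x^\intercal\Sigma_{\theta_0\theta_0}^{-1}x)^{-1}w_\eta} - \bar w_\tau^\intercal\,(\overline{x^\intercal\Sigma_{\theta_0\theta_0}^{-1}x})^{-1}\,\bar w_\eta.
\]

That is, the variance of the $w_\tau$-weighted differences between $\hat\theta_t$ and $\hat\theta_n^{\mathrm{new}}$ at different time points is the difference between their $w_\tau$-weighted variances (cf. the discussion preceding Lemma 2.1 of Hausman (1978)). Similarly, we have

\[
V^2(\tau,\eta;\theta_0,\Sigma_{\theta_0\theta_0}^{-1},\Sigma_{\theta_0\theta_0}^{-1}) = \overline{w_\tau^\intercal\Sigma_{\theta_0\theta_0}w_\eta} - \overline{w_\tau^\intercal x}\,(\overline{x^\intercal\Sigma_{\theta_0\theta_0}^{-1}x})^{-1}\,\overline{x^\intercal w_\eta}.
\]

This property does not hold if we replace $\hat\theta_n^{\mathrm{new}}$ with $\hat\theta_n^{\mathrm{LTT}}$, because the latter estimator is generally less efficient than $\hat\theta_n^{\mathrm{new}}$ under the null. Therefore, the asymptotic variances associated with $\hat\theta_n^{\mathrm{LTT}}$ are more involved (there are two additional terms, even with the optimal weighting matrix).

Suppose $W = \{w_\tau(t)\}_{\tau\in\Gamma}$ is a determining class for the null hypothesis $\Omega_T^0$. For either $\zeta = \zeta^1$ or $\zeta^2$, one can construct a Kolmogorov–Smirnov type test statistic

\[
BKS_n = \sup_{\tau\in\Gamma}\big|(F_nV)^{-1}\Delta_n^{-1/2}\zeta_n(\tau)\big|.
\]

If one prefers a univariate $w_\tau(t)$, then one can use the following statistic:

\[
BKS_n = \sup_{\tau\in\Gamma}\max_{1\le m\le p}\big|(F_nV_{mm})^{-1}\Delta_n^{-1/2}\zeta_n(\tau)_m\big|,
\]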

where $V_{mm}$ is the $(m,m)$-element of the asymptotic variance matrix, and $\zeta_n(\tau)_m$ is the $m$th element of $\zeta_n(\tau)$. One can then follow the simulation procedure described by Li et al. (2016) to find the critical value for any given significance level. More specifically, first estimate the asymptotic variance–covariance matrix $V^1$ (or $V^2$) for different values of $\tau$ and $\eta$. Then simulate a large number of centered Gaussian processes $\{\zeta_n^1(\cdot)_j^*\}_{j=1}^M$ (or $\{\zeta_n^2(\cdot)_j^*\}_{j=1}^M$) from $N(\cdot,\cdot;\hat V^1)$ (or $N(\cdot,\cdot;\hat V^2)$), where $\hat V^1$ (or $\hat V^2$) is a consistent estimator of $V^1$ (or $V^2$). Finally, let $CV_{n,\alpha}$ be the $1-\alpha$ quantile of $\{BKS_{n,j}^*\}_{j=1}^M$. One can reject the null hypothesis if $BKS_n > CV_{n,\alpha}$.

Corollary 10. Let $\alpha\in(0,1)$ be a constant. Suppose the conditions in Theorem 9 are satisfied, and $W = \{w_\tau(t)\}_{\tau\in\Gamma}$ is a determining class. Moreover, suppose $V^1(\tau,\tau;\theta_0,\Sigma_{\theta_0\theta_0}^{-1},\Sigma_{\theta_0\theta_0}^{-1})$ and $V^2(\tau,\tau;\theta_0,\Sigma_{\theta_0\theta_0}^{-1},\Sigma_{\theta_0\theta_0}^{-1})$ are almost surely positive definite, for all $\tau$. Then, for $\zeta = \zeta^1$ or $\zeta^2$, the following statements hold true for the critical region $\{BKS_n > CV_{n,\alpha}\}$:

(i) Under the null hypothesis, the test associated with the above critical region has asymptotic size $\alpha$. That is, $P(BKS_n > CV_{n,\alpha}\mid\Omega_T^0(W))\to\alpha$.

(ii) Under the alternative hypothesis $\Omega_T^a(W)$, the corresponding test has an asymptotic power of one. That is, $P(BKS_n > CV_{n,\alpha}\mid\Omega_T^a(W))\to 1$.
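The simulation step for the critical value can be sketched in Python as follows. The sketch assumes that the estimated covariance function has already been evaluated on a finite grid of $\tau$ values and arranged in a matrix `cov` for the (already normalized) process; the function name, the grid discretization of the supremum, and the eigenvalue-based square root are illustrative choices rather than the procedure of Li et al. (2016) in detail.

```python
import numpy as np

def simulate_critical_value(cov, alpha, M=2000, seed=0):
    """1 - alpha quantile of sup_tau |Z(tau)| for a centred Gaussian vector Z ~ N(0, cov)."""
    cov = np.asarray(cov, dtype=float)
    rng = np.random.default_rng(seed)
    # Symmetric square root via eigen-decomposition; tolerates a singular covariance.
    vals, vecs = np.linalg.eigh((cov + cov.T) / 2)
    root = vecs * np.sqrt(np.clip(vals, 0.0, None))
    draws = rng.standard_normal((M, cov.shape[0])) @ root.T   # each row ~ N(0, cov)
    sup_stats = np.abs(draws).max(axis=1)                     # simulated sup statistics
    return np.quantile(sup_stats, 1.0 - alpha)

# Sanity check with a single grid point and unit variance.
cv_single = simulate_critical_value(np.array([[1.0]]), alpha=0.05, M=200_000)
```

With a single grid point and unit variance, `cv_single` is close to the two-sided 5% normal critical value of about 1.96; with several weakly correlated grid points, the simulated critical value is larger, which is the adjustment the sup-type statistic requires.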

Table 1
Parameter values in Monte Carlo simulations.

κ

θ

η

κ˜

θ˜

η˜

κˇ

θˇ

4

0.06

0.08

1.5

0.8

0.8

5

0.05

0.3

ρ −0.6

µ

λX 4/T

X µ√ − 3 ∆n

λU

0.05

X µ√ + 2 ∆n

U µ√ +

µX− √ 1.5 ∆n

σ µ √ ∆n

8/T

1 ∆n

ηˇ 2

4. Monte Carlo simulation

The simulation model we used is a simplified version of the one used by Aït-Sahalia and Xiu (2017), where the authors consider a multi-dimensional log-price process with a three-factor structure. For simplicity, we consider the univariate case with one factor:

\[
\begin{cases}
dY_t = \beta_t\,dX_t + dU_t,\\
dX_t = \mu\,dt + \sigma_t\,dW_t + dJ_t^X,\\
dU_t = \gamma_t\,dB_t + dJ_t^U.
\end{cases}
\]

The process $Y$ can be viewed as the log-price process of an individual stock, while $X$ represents the market factor, whose coefficient $\beta_t$ is always positive, and $U$ captures the idiosyncratic risk. The jump processes $J^X$ and $J^U$ are two compound Poisson processes with intensities $\lambda^X$ and $\lambda^U$, respectively. The jump size distributions for $dJ^X$ and $dJ^U$ are double exponential, with the probabilities of a negative jump being $p^X = 0.65$ and $p^U = 0.55$ and with the means of the positive and negative parts being $\mu_+^X$, $\mu_-^X$, $\mu_+^U$ and $\mu_-^U$, respectively. The time-varying coefficients are given by

\[
\begin{cases}
d\sigma_t^2 = \check\kappa(\check\theta-\sigma_t^2)\,dt + \check\eta\,\sigma_t\,d\check W_t + dJ_t^{\sigma^2},\\
d\gamma_t^2 = \kappa(\theta-\gamma_t^2)\,dt + \eta\,\gamma_t\,d\bar B_t,\\
d\beta_t = \tilde\kappa(\tilde\theta-\beta_t)\,dt + \tilde\eta\sqrt{\beta_t}\,d\tilde B_t,
\end{cases}
\]

where $\check W$ can be correlated with $W$ and the correlation is given by $\rho$.

All values of the parameters are given in Table 1. Many parameters take the same values as in Aït-Sahalia and Xiu (2017). Those with different values are of the same order as those in the cited paper. Under the null hypothesis of constant $\beta$, we set $\tilde\kappa = \tilde\eta = 0$ and $\beta_t\equiv\tilde\theta = 0.8$. A feature of the above model is that $\sigma_t^2$, $\gamma_t^2$ and $\beta_t$ all follow mean-reverting square-root diffusions. Thus, one needs to make sure that the Feller condition is satisfied, to avoid negative values for these processes (all of which are positive by definition) in a simulation. For instance, the parameter $\tilde\eta$ must satisfy $\tilde\eta < \sqrt{2\tilde\kappa\tilde\theta}$, which is 1.5492 with $\tilde\kappa = 1.5$ and $\tilde\theta = 0.8$.
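The bound on $\tilde\eta$ and a simple way to simulate the $\beta$ process can be sketched in Python as follows. The discretization used here is an Euler scheme with full truncation, a standard device for square-root diffusions that is swapped in for illustration; the paper does not state which scheme it uses, and the function names, step size, and seed are assumptions.

```python
import numpy as np

def feller_bound(kappa, theta):
    """Upper bound sqrt(2 * kappa * theta) on the volatility-of-beta parameter."""
    return np.sqrt(2.0 * kappa * theta)

def simulate_sqrt_process(kappa, theta, eta, x0, n, dt, seed=0):
    """Euler scheme with full truncation for dX = kappa (theta - X) dt + eta sqrt(X) dB."""
    rng = np.random.default_rng(seed)
    path = np.empty(n + 1)
    path[0] = x0
    for i in range(n):
        xp = max(path[i], 0.0)                    # truncate inside drift and diffusion
        shock = rng.standard_normal()
        path[i + 1] = path[i] + kappa * (theta - xp) * dt + eta * np.sqrt(xp * dt) * shock
    return np.maximum(path, 0.0)                  # report the non-negative part

bound = feller_bound(1.5, 0.8)                    # parameters of the beta process
# One month of 5-minute steps (78 per day, 21 days), with time measured in years.
beta_path = simulate_sqrt_process(1.5, 0.8, 0.8, x0=0.8, n=78 * 21, dt=1.0 / (252 * 78))
```

Here `bound` reproduces the value 1.5492 quoted above, and the chosen $\tilde\eta = 0.8$ lies well inside it, so the truncation is rarely, if ever, active along the simulated path.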

Note that under the alternative hypothesis of time-varying β , the parameter η˜ controls the variation of β , when the other parameters are fixed. Hence, it can be viewed as a measure of the deviation from the constant null hypothesis. To check its effect on the finite sample testing power, we consider a sequence of values of η˜ from 0 to 1.5, with the step size being 0.02. Fig. 1 gives the results for the overlapping Hausman-type test (the non-overlapping test has similar results). The reason why we do not show the results for the Bierens-type tests is that the asymptotic variances of the processes ζn1 and ζn2 may not always be positive semi-definite for larger values of η˜ . This implies that the Hausman-type test is probably more applicable than the Bierens-type tests in practice. As shown in the figure, the testing power generally increases with η˜ , for any given sampling frequency. On the other hand, for any given value of η˜ , the testing power increases with the sampling frequency, consistent with the implication of Theorem 8. In particular, the sample rejection rates are quite close to 1 for larger values of η˜ when the sampling frequency is relatively high (1-minute or 30-second). To conduct a comparison between the two types of tests, we fix η˜ = 0.8 (the general pattern found below remains the same for the other values) in the following and consider several different scenarios: the first three all have a 1-month time horizon, while the sampling frequencies are 5 min, 1 min, and 30 s, respectively; the last one has a 3-month time horizon and a 5-minute sampling frequency. The unit of time is 1 year. In the first three scenarios, we are testing the same hypothesis (the process β is constant over a month) under different sampling frequencies. Strictly speaking, the hypothesis being tested in the last scenario is different (e.g., consider the case where the value of β changes every month). 
The corresponding testing results are given in Tables 2–5, respectively. We consider both the non-overlapping and the overlapping versions of the Hausman-type test. For the Bierens-type tests, we consider both BKS1 and BKS2, each computed with the new estimator $\hat{\theta}_n^{\mathrm{new}}$ and with the estimator $\hat{\theta}_n^{\mathrm{LTT}}$. Both estimators are constructed using the corresponding optimal weighting matrices in order to have a fair comparison. Thus, the statistics associated with $\hat{\theta}_n^{\mathrm{LTT}}$ are different from the one used by Li et al. (2016), which is constructed using the identity matrix. We vary the local window size from 30 to 150 (the results for 180 and 300 are similar to those for 150), with the step size being 30. In each setting, the simulation is conducted 5000 times. To find the critical values of those Bierens-type statistics, we draw 2000 Monte Carlo realizations from their corresponding distributions.

Please cite this article as: X. Yang, Time-invariant restrictions of volatility functionals: Efficient estimation and specification tests. Journal of Econometrics (2019), https://doi.org/10.1016/j.jeconom.2019.10.003.


Fig. 1. Sample rejection rates of the overlapping Hausman-type test against η˜ (T = 1 month and kn = 60).

Table 2. Testing results for ∆n = 5 min and T = 1 month.

Panel  kn   Level  Hnon    Hover   BKS1new  BKS2new  BKS1LTT  BKS2LTT
Size   30   1%     6.12%   3.02%   0.58%    0.66%    0.70%    0.64%
Size   30   5%     13.04%  8.00%   3.86%    3.82%    3.84%    3.72%
Size   30   10%    19.30%  13.24%  8.00%    7.90%    7.88%    7.78%
Size   60   1%     2.88%   1.00%   0.92%    0.88%    0.86%    0.86%
Size   60   5%     7.66%   3.54%   4.22%    4.40%    4.00%    4.10%
Size   60   10%    11.90%  6.38%   9.28%    8.76%    9.20%    8.54%
Size   90   1%     2.36%   0.74%   0.80%    0.90%    0.78%    0.80%
Size   90   5%     6.38%   2.80%   4.56%    4.48%    4.34%    4.38%
Size   90   10%    10.94%  4.98%   9.52%    9.40%    9.22%    9.06%
Size   120  1%     2.18%   0.76%   1.10%    1.18%    0.98%    0.98%
Size   120  5%     5.92%   2.48%   4.80%    5.08%    4.78%    4.58%
Size   120  10%    10.30%  4.82%   9.86%    10.22%   9.44%    9.32%
Size   150  1%     2.24%   0.80%   1.06%    0.88%    0.86%    0.78%
Size   150  5%     5.40%   2.60%   4.84%    4.86%    4.46%    4.34%
Size   150  10%    8.88%   4.68%   9.72%    10.28%   9.14%    9.26%
Power  30   1%     46.92%  45.28%  34.02%   36.18%   33.44%   35.68%
Power  30   5%     57.20%  56.14%  44.80%   47.32%   44.34%   46.48%
Power  30   10%    63.56%  63.20%  52.04%   53.38%   51.36%   52.90%
Power  60   1%     47.42%  45.58%  40.88%   42.02%   39.28%   40.88%
Power  60   5%     56.62%  54.70%  52.12%   53.74%   51.42%   52.62%
Power  60   10%    62.00%  60.38%  59.50%   60.82%   58.66%   59.56%
Power  90   1%     51.40%  48.40%  43.34%   44.20%   42.80%   43.18%
Power  90   5%     59.74%  56.34%  55.36%   55.70%   54.64%   54.78%
Power  90   10%    64.38%  61.40%  62.38%   62.88%   62.02%   62.14%
Power  120  1%     53.32%  49.70%  45.62%   46.10%   44.52%   45.02%
Power  120  5%     61.28%  57.84%  58.12%   59.00%   57.34%   57.46%
Power  120  10%    65.80%  62.92%  65.60%   66.36%   64.94%   65.12%
Power  150  1%     53.94%  50.80%  45.60%   46.50%   44.64%   44.68%
Power  150  5%     61.22%  58.74%  58.82%   59.56%   57.58%   57.84%
Power  150  10%    65.62%  63.48%  66.10%   66.66%   64.88%   64.90%

Table 3. Testing results for ∆n = 1 min and T = 1 month.

Panel  kn   Level  Hnon    Hover   BKS1new  BKS2new  BKS1LTT  BKS2LTT
Size   30   1%     14.20%  9.82%   0.70%    0.74%    0.78%    0.64%
Size   30   5%     28.34%  25.06%  3.88%    4.18%    3.80%    4.04%
Size   30   10%    37.74%  35.38%  8.18%    8.30%    8.14%    8.32%
Size   60   1%     3.36%   1.34%   0.78%    0.92%    0.90%    0.86%
Size   60   5%     9.16%   5.10%   4.46%    4.82%    4.44%    4.74%
Size   60   10%    15.02%  9.52%   9.84%    9.70%    9.76%    9.78%
Size   90   1%     2.46%   0.80%   0.88%    0.86%    0.88%    0.94%
Size   90   5%     7.42%   3.54%   4.58%    4.46%    4.64%    4.48%
Size   90   10%    12.36%  6.76%   8.88%    8.94%    8.94%    9.12%
Size   120  1%     1.74%   0.68%   1.12%    1.24%    1.18%    1.28%
Size   120  5%     6.36%   2.70%   4.92%    4.76%    4.78%    4.68%
Size   120  10%    11.94%  5.86%   9.30%    9.38%    9.44%    9.50%
Size   150  1%     1.68%   0.48%   0.90%    0.80%    0.86%    0.80%
Size   150  5%     5.90%   2.34%   4.50%    4.16%    4.42%    4.34%
Size   150  10%    11.64%  5.32%   9.40%    9.48%    9.62%    9.34%
Power  30   1%     79.68%  80.42%  55.48%   57.44%   55.16%   57.42%
Power  30   5%     86.92%  88.16%  63.60%   65.50%   63.48%   65.56%
Power  30   10%    90.24%  91.68%  67.82%   69.86%   67.96%   69.90%
Power  60   1%     74.66%  74.64%  56.72%   58.38%   56.54%   58.00%
Power  60   5%     81.50%  82.44%  65.12%   66.42%   64.96%   66.50%
Power  60   10%    85.62%  86.52%  69.98%   71.42%   69.96%   71.32%
Power  90   1%     76.80%  76.68%  76.54%   77.36%   76.28%   77.16%
Power  90   5%     83.38%  83.66%  82.62%   83.22%   82.62%   83.22%
Power  90   10%    86.46%  87.00%  85.76%   86.48%   85.76%   86.46%
Power  120  1%     79.16%  78.80%  78.84%   79.40%   78.74%   79.24%
Power  120  5%     84.60%  84.64%  85.08%   85.64%   84.88%   85.42%
Power  120  10%    87.56%  87.84%  88.42%   88.38%   88.26%   88.40%
Power  150  1%     80.98%  80.72%  77.76%   78.18%   77.68%   78.00%
Power  150  5%     86.12%  85.96%  83.58%   84.08%   83.48%   84.00%
Power  150  10%    89.04%  88.58%  86.84%   87.18%   86.98%   87.12%

When the local window is relatively small (e.g., kn = 30, or even kn = 60 when the sampling frequency is higher than 1-minute), the Hausman-type statistics over-reject the true null hypothesis. The size distortion is more severe with either a higher sampling frequency or a longer time horizon. Hence, we do not recommend using a very small window size for the Hausman-type statistics. For larger window sizes, the non-overlapping Hausman-type statistic sometimes still tends to over-reject the true null, but the size distortion becomes much less severe. In contrast, the overlapping Hausman-type statistic rejects the true null less often than the nominal significance level. Besides, these two tests have similar testing powers in all scenarios. Therefore, the overlapping Hausman-type statistic is a better choice than the non-overlapping one.

As for the Bierens-type statistics, the overlapping ones have inferior powers compared to the non-overlapping ones at 1-minute and 5-minute frequencies (about 10% to 20% less, respectively), yet have similar powers at higher sampling frequencies (refer to Table 8). This suggests that, although the asymptotic properties of these statistics are the same, the overlapping Bierens-type statistics could have inferior power when the sampling frequency is relatively low. A detailed analysis of this finite-sample effect, however, is beyond the scope of this paper. Here, we just report the results for the non-overlapping ones to save space. These Bierens-type tests have similar sizes and powers in all cases.

When comparing all of the test statistics, the overlapping Hausman-type statistic is more conservative when the null is true (except for very small local windows). It has larger testing power than the Bierens-type tests when the null is false for nearly all of the sampling frequencies, time horizons, and local window sizes considered here. In addition, it is pivotal and does not require bias correction, which can be quite tedious. Hence, in general, we recommend the overlapping Hausman-type statistic in practice. If the computation of the Bierens-type statistics is not very demanding, then one can also calculate these statistics as a double check.

Last but not least, we further investigate the power gain of the Hausman-type test when kn = 60.
We calculate the sample frequency with which the Hausman test is significant when the BKS test is only "close-to-be-significant". Specifically, we calculate the following quantities:

\[
P\big(|H_{\mathrm{over}}| \ge z_{0.975} \,\big|\, CV_{n,0.15} \le \mathrm{BKS2}^{\mathrm{new}} < CV_{n,0.05}\big)
\quad\text{and}\quad
P\big(|H_{\mathrm{over}}| \ge z_{0.975} \,\big|\, CV_{n,0.25} \le \mathrm{BKS2}^{\mathrm{new}} < CV_{n,0.05}\big),
\]

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution. In the scenario where ∆n is 1 min and T is 1 month, these values are around 84% and 80%, respectively. In fact, such a possibility is the main source of power gain for the Hausman-type test.
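The conditional rejection frequencies above are simple to compute once the simulated statistics are stored. The sketch below is a hedged illustration rather than the paper's code: the function name `conversion_rate`, the toy arrays and the toy critical values are all hypothetical, and the normal quantile is hard-coded as an approximation of $z_{0.975}$.

```python
import numpy as np

Z_975 = 1.959964  # approximate 97.5% quantile of the standard normal distribution


def conversion_rate(h_stats, bks_stats, cv_low, cv_high, z_crit=Z_975):
    """Share of replications with |H| >= z_crit among those with cv_low <= BKS < cv_high.

    cv_low and cv_high play the roles of CV_{n,0.15} (or CV_{n,0.25}) and CV_{n,0.05}.
    Returns nan when no replication falls in the "close-to-be-significant" band.
    """
    h = np.asarray(h_stats, dtype=float)
    b = np.asarray(bks_stats, dtype=float)
    near = (b >= cv_low) & (b < cv_high)
    if not near.any():
        return float("nan")
    return float(np.mean(np.abs(h[near]) >= z_crit))


# Toy inputs (purely illustrative, not simulation output from the paper).
toy_h = [2.5, 0.3, 3.1, 1.0, 2.2]
toy_bks = [1.2, 1.3, 0.4, 1.4, 2.5]
toy_rate = conversion_rate(toy_h, toy_bks, cv_low=1.0, cv_high=2.0)
```

In this toy example three replications fall in the band and one of them has a significant Hausman statistic, so `toy_rate` equals one third.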

Table 4. Testing results for ∆n = 30 s and T = 1 month.

Panel  kn   Level  Hnon    Hover   BKS1new  BKS2new  BKS1LTT  BKS2LTT
Size   30   1%     26.48%  21.96%  0.50%    0.64%    0.48%    0.60%
Size   30   5%     45.36%  45.32%  3.44%    3.72%    3.48%    3.68%
Size   30   10%    55.72%  58.48%  7.48%    8.02%    7.66%    8.20%
Size   60   1%     4.16%   1.86%   0.72%    0.80%    0.74%    0.72%
Size   60   5%     11.06%  6.46%   4.36%    4.26%    4.44%    4.18%
Size   60   10%    18.02%  12.20%  8.84%    8.86%    8.86%    8.88%
Size   90   1%     2.28%   0.88%   0.80%    0.78%    0.72%    0.80%
Size   90   5%     7.36%   3.54%   4.24%    4.80%    4.12%    4.90%
Size   90   10%    13.06%  7.28%   9.26%    9.38%    9.42%    9.44%
Size   120  1%     1.74%   0.50%   0.86%    0.78%    0.82%    0.84%
Size   120  5%     6.38%   2.80%   4.60%    4.52%    4.34%    4.56%
Size   120  10%    11.86%  5.88%   9.10%    9.44%    9.04%    9.30%
Size   150  1%     1.74%   0.44%   0.92%    0.98%    1.02%    1.00%
Size   150  5%     6.18%   2.48%   4.72%    4.64%    4.66%    4.54%
Size   150  10%    11.32%  5.20%   9.56%    9.76%    9.62%    9.98%
Power  30   1%     90.82%  91.50%  72.50%   73.80%   72.40%   74.06%
Power  30   5%     95.32%  96.24%  78.68%   80.32%   78.78%   80.18%
Power  30   10%    96.58%  97.92%  82.10%   83.32%   82.08%   83.24%
Power  60   1%     84.02%  84.20%  73.86%   74.28%   73.88%   74.62%
Power  60   5%     89.62%  89.84%  79.94%   80.70%   80.06%   80.72%
Power  60   10%    91.76%  92.42%  82.92%   83.86%   83.00%   83.96%
Power  90   1%     85.12%  85.14%  74.10%   74.50%   74.08%   74.56%
Power  90   5%     89.94%  90.06%  80.00%   80.58%   80.08%   80.60%
Power  90   10%    91.94%  92.24%  83.36%   84.40%   83.46%   84.40%
Power  120  1%     86.36%  86.86%  74.34%   74.86%   74.44%   74.98%
Power  120  5%     90.56%  90.92%  80.64%   81.00%   80.52%   81.04%
Power  120  10%    92.42%  92.90%  83.64%   84.32%   83.60%   84.38%
Power  150  1%     88.06%  88.38%  74.32%   74.70%   73.92%   74.62%
Power  150  5%     91.72%  91.70%  80.20%   80.80%   80.26%   80.74%
Power  150  10%    93.50%  93.44%  83.60%   84.40%   83.56%   84.42%

5. Empirical study

In this section, we test the constant beta hypothesis using empirical data. The intra-day observations are collected from the TAQ database for the SPDR S&P 500 ETF (SPY) and the 9 S&P sector ETFs, including Materials (XLB), Energy (XLE), Financial (XLF), Industrial (XLI), Technology (XLK), Consumer Staples (XLP), Utilities (XLU), Health Care (XLV), and Consumer Discretionary (XLY). The sample period is from January 2, 2008, to November 16, 2018. We first exclude the abnormal prices that deviate from the preceding and following price levels for only a few seconds.² We calculate returns with both 1-minute and 5-minute sampling frequencies and discard overnight returns. We regress each sector ETF on the market ETF (i.e., SPY) and then test the constant beta hypothesis over both 1-month (with 1-minute frequency) and 1-quarter (with 5-minute frequency) time horizons.

To save space, we only report the testing results for the industrial (XLI) and technology (XLK) sectors, in Tables 6 and 7 respectively. The significance level is 5% for all test statistics. The results for the other sectors have similar patterns. The numbers reported are the percentages of months (131 months in total) or quarters (44 quarters in total) in which the constant beta null hypothesis is rejected. The numbers in parentheses are the corresponding results obtained from the Benjamini–Hochberg procedure, which helps reduce potential multiple testing bias. To save space, we only show the results with kn being 60, 90, and 120 (the pattern for kn = 150 is quite similar).

From Tables 6 and 7, one can see that the constant beta null hypothesis can be rejected quite often, under both scenarios. We also note that Reiß et al. (2015) found substantial evidence of time-varying beta for individual stocks. Hence, we believe that the above results are very unlikely to be spurious (note that we also report the Benjamini–Hochberg rejection rates).
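For readers unfamiliar with the Benjamini–Hochberg procedure mentioned above, the following minimal sketch shows the standard step-up rule applied to a vector of per-period p-values. It is an illustration only: the function name and the toy p-values are hypothetical, and the paper does not state how its implementation is coded.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean rejection mask from the Benjamini-Hochberg step-up rule at level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                         # indices sorting p-values ascending
    thresholds = alpha * np.arange(1, m + 1) / m  # alpha * k / m for rank k
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.nonzero(below)[0].max()        # largest rank meeting its threshold
        reject[order[:k_max + 1]] = True          # reject all hypotheses up to that rank
    return reject


# Toy p-values (illustrative only, not taken from the paper's data).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
naive_rejections = int(np.sum(np.asarray(toy_p) <= 0.05))
bh_rejections = int(benjamini_hochberg(toy_p, alpha=0.05).sum())
```

In this toy example five p-values fall below 5% individually, but only two rejections survive the correction. This illustrates why the rates in parentheses in Tables 6 and 7 are never larger than the unadjusted rates.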
In all cases, the two Hausman-type statistics have substantially higher rejection rates (10 to 40 percentage points higher). Take XLI as an example. When kn = 60, the rejection rates of Hover are about 30 percentage points higher than those of the Bierens-type tests, regardless of whether the time horizon is 1 month or 1 quarter. To better understand this power improvement, we check the number of testing periods that have "close-to-be-significant" BKS2new statistics but significant Hover statistics. There are 35 months (26.72% of 131) whose BKS2new statistics have (simulation-based) p-values between 5% and 25%. Among them, 34 months are associated with Hover statistics that are significant at 5%. In the quarterly case, the numbers of periods become 10 (22.73% of 44) and 8 (18.18% of 44). We note that the "conversion" rates (34/35 ≈ 97% and 8/10 = 80%) are consistent with what has been found in the simulation study. As mentioned above, this is because when the differences between the unrestricted and the restricted estimates are large, the quadratic function used by the Hausman-type statistics puts more weight on such differences, thus yielding superior power. This also helps explain why the two Hausman-type statistics are more robust to the Benjamini–Hochberg correction than the Bierens-type statistics.

² This makes the volatility signature plot much flatter. See Figs. 2 and 3 for more details.

Table 5. Testing results for ∆n = 5 min and T = 3 months.

Panel  kn   Level  Hnon    Hover   BKS1new  BKS2new  BKS1LTT  BKS2LTT
Size   30   1%     10.58%  5.98%   0.74%    0.72%    0.72%    0.74%
Size   30   5%     21.78%  16.82%  3.96%    3.72%    3.96%    3.80%
Size   30   10%    30.18%  25.14%  8.20%    8.36%    8.32%    8.40%
Size   60   1%     3.20%   1.16%   0.78%    0.92%    0.80%    0.88%
Size   60   5%     8.30%   4.36%   4.20%    4.48%    4.28%    4.28%
Size   60   10%    13.32%  8.24%   8.52%    8.84%    8.54%    8.64%
Size   90   1%     2.46%   0.78%   0.56%    0.62%    0.52%    0.72%
Size   90   5%     6.96%   3.10%   4.18%    4.10%    4.18%    4.06%
Size   90   10%    12.18%  6.18%   9.70%    9.28%    9.72%    9.20%
Size   120  1%     1.60%   0.50%   0.64%    0.80%    0.66%    0.74%
Size   120  5%     6.16%   2.60%   4.36%    4.30%    4.22%    4.18%
Size   120  10%    10.36%  5.46%   9.06%    9.38%    8.86%    9.06%
Size   150  1%     1.90%   0.44%   1.06%    1.08%    1.00%    0.96%
Size   150  5%     5.70%   2.32%   5.12%    5.16%    4.86%    4.96%
Size   150  10%    10.84%  5.20%   9.22%    9.44%    9.00%    9.38%
Power  30   1%     90.16%  90.76%  67.64%   70.00%   67.70%   69.98%
Power  30   5%     93.82%  94.76%  75.02%   76.98%   74.96%   76.98%
Power  30   10%    95.42%  96.20%  78.94%   80.90%   79.02%   80.96%
Power  60   1%     89.36%  90.06%  82.60%   83.30%   82.10%   83.00%
Power  60   5%     92.88%  93.08%  88.48%   88.80%   88.32%   88.56%
Power  60   10%    94.30%  94.68%  91.16%   91.10%   91.06%   90.98%
Power  90   1%     91.18%  90.98%  86.88%   87.40%   86.96%   87.58%
Power  90   5%     93.84%  94.12%  91.56%   92.18%   91.44%   92.10%
Power  90   10%    95.24%  95.32%  93.44%   94.00%   93.38%   93.96%
Power  120  1%     91.84%  92.08%  86.92%   87.26%   86.66%   87.10%
Power  120  5%     94.38%  94.62%  91.94%   92.26%   91.74%   92.00%
Power  120  10%    95.78%  95.92%  94.08%   94.28%   93.98%   93.96%
Power  150  1%     93.08%  92.84%  91.14%   91.78%   90.74%   91.64%
Power  150  5%     95.30%  95.02%  94.74%   95.00%   94.60%   94.88%
Power  150  10%    96.48%  96.34%  96.10%   96.54%   95.98%   96.40%

Table 6. Testing results for XLI (Industrial Sector ETF).

Horizon  kn   Hnon             Hover            BKS1new          BKS2new          BKS1LTT          BKS2LTT
Month    60   96.18% (96.18%)  97.71% (97.71%)  58.78% (49.62%)  64.89% (57.25%)  60.31% (49.62%)  69.47% (59.54%)
Month    90   97.71% (97.71%)  96.95% (96.95%)  72.52% (67.18%)  73.28% (70.99%)  70.99% (65.65%)  77.10% (72.52%)
Month    120  91.60% (91.60%)  95.42% (94.66%)  77.86% (76.34%)  84.73% (83.21%)  79.39% (76.34%)  85.50% (83.97%)
Quarter  60   86.36% (86.36%)  90.91% (88.64%)  59.09% (50.00%)  56.82% (38.64%)  63.64% (54.55%)  63.64% (43.18%)
Quarter  90   86.36% (84.09%)  88.64% (88.64%)  56.82% (45.45%)  61.36% (52.27%)  59.09% (34.09%)  63.64% (59.09%)
Quarter  120  84.09% (84.09%)  86.36% (86.36%)  68.18% (61.36%)  63.64% (47.73%)  72.73% (65.91%)  70.45% (59.09%)

Table 7. Testing results for XLK (Technology Sector ETF).

Horizon  kn   Hnon             Hover            BKS1new          BKS2new          BKS1LTT          BKS2LTT
Month    60   92.37% (92.37%)  92.37% (92.37%)  44.27% (32.82%)  48.85% (39.69%)  45.04% (32.06%)  48.85% (41.22%)
Month    90   87.02% (87.02%)  90.84% (90.84%)  61.07% (50.38%)  61.07% (47.33%)  58.02% (49.62%)  61.83% (49.62%)
Month    120  88.55% (87.79%)  87.02% (86.26%)  54.96% (51.15%)  62.60% (52.67%)  55.73% (50.38%)  62.60% (51.15%)
Quarter  60   79.55% (77.27%)  84.09% (81.82%)  40.91% (20.45%)  47.73% (27.27%)  38.64% (22.73%)  50.00% (31.82%)
Quarter  90   77.27% (72.73%)  86.36% (81.82%)  45.45% (34.09%)  52.27% (47.73%)  45.45% (34.09%)  52.27% (47.73%)
Quarter  120  68.18% (63.64%)  75.00% (72.73%)  50.00% (40.91%)  45.45% (40.91%)  52.27% (36.36%)  45.45% (40.91%)

Fig. 2. Log-prices of SPY on 1/2/2018. The three log-prices marked by red squares are identified as outliers. Note the last two are associated with four abnormal 1-second returns whose magnitudes are larger than 1%, which is quite large even for daily returns. Mathematically, the cleaned log-price process can be viewed as a modification (see p.3 of Protter (2005)) of the raw process. According to Theorem 2 of Protter (2005), the modification and the original processes are indistinguishable if they are càdlàg, which we assumed in Assumption HF.

6. Conclusion

This paper studies time-invariant restrictions on certain known functions of the stochastic volatility process. We have developed a GMM estimator that is more efficient than the existing ones and derived the efficiency bound under such constraints. We have also provided a condition to check whether the efficiency bound is attained. Under the unrestricted model, we can have another estimator that is less efficient under the null but is consistent under both the null and the alternative. We went on to construct an integrated Hausman-type test by summing up the squared differences between this more efficient estimator and the unrestricted estimator computed at different time points. The more efficient GMM estimator can also be used to update an existing Bierens-type test and simplify the calculation of the asymptotic variance. Our simulation results have shown that, except for very small local windows, the overlapping Hausman-type test has good size and superior power. We finally applied these tests to examine the constant beta hypothesis studied in the literature and found substantial evidence against it.

Appendix A. Additional table and figures

See Table 8, Figs. 2 and 3.

Table 8. Power comparison for BKS tests at higher frequencies.

∆n = 30 s and T = 1 month

                     kn = 60                   kn = 90
Statistic  Version   1%      5%      10%       1%      5%      10%
BKS1new    non       73.86%  79.94%  82.92%    74.10%  80.00%  83.36%
BKS1new    over      73.08%  79.20%  82.00%    73.34%  79.36%  81.84%
BKS2new    non       74.28%  80.70%  83.86%    74.50%  80.58%  84.40%
BKS2new    over      73.98%  80.14%  83.06%    74.20%  80.00%  82.98%
BKS1LTT    non       73.88%  80.06%  83.00%    74.08%  80.08%  83.46%
BKS1LTT    over      73.22%  79.14%  82.04%    73.46%  79.44%  81.96%
BKS2LTT    non       74.62%  80.72%  83.96%    74.56%  80.60%  84.40%
BKS2LTT    over      73.98%  80.10%  82.98%    74.12%  80.06%  82.88%

∆n = 15 s and T = 1 month

                     kn = 60                   kn = 90
Statistic  Version   1%      5%      10%       1%      5%      10%
BKS1new    non       90.96%  94.00%  95.44%    91.32%  94.16%  95.50%
BKS1new    over      90.90%  93.98%  95.20%    91.12%  94.22%  95.26%
BKS2new    non       91.58%  94.56%  95.66%    91.72%  94.56%  95.64%
BKS2new    over      91.52%  94.58%  95.60%    91.56%  94.54%  95.64%
BKS1LTT    non       90.92%  94.04%  95.44%    91.22%  94.08%  95.56%
BKS1LTT    over      91.06%  94.00%  95.22%    91.04%  94.18%  95.24%
BKS2LTT    non       91.60%  94.56%  95.60%    91.80%  94.58%  95.64%
BKS2LTT    over      91.60%  94.54%  95.62%    91.60%  94.48%  95.62%

Fig. 3. Volatility signature plots. We calculate the realized variances at different sampling frequencies. The black line corresponds to the raw data with only obvious outliers removed (those 1-second returns with magnitudes larger than 50%). The red line corresponds to the cleaned data with most bouncing-back log-prices removed. As shown in Fig. 2, our current cleaning method is effective in removing relatively large price outliers, but less so for relatively small ones (e.g., there are three suspicious yet un-identified log-prices between 14:00 and 15:00 in the above figure). This may explain why the realized variance increases when the sampling frequency is very high.

Appendix B. Proofs

According to the localization argument (see, e.g., p.84 and p.118 of Jacod and Protter (2011)), one can replace those locally bounded conditions in Assumption HF by bounded ones, and then prove the desired results under such strengthened assumptions. One important implication of this is that the volatility process $c_t$ can be viewed as uniformly bounded on [0, T].

B.1. Proof of Lemma 1

This lemma is a summary of the main results of Li et al. (2019) and Yang (2018). Refer to the cited papers for detailed proofs.


B.2. Proof of Theorem 2

Lemma B.1 (Product Rules for Matrix Functions). Let A and B be two matrix functions of a matrix variable X. Let $c_A = r_B$ so that the product AB is well-defined. Then we have

\[
D(AB)(X) = (B^{\intercal} \otimes I_{r_A})\, DA(X) + (I_{c_B} \otimes A)\, DB(X).
\]

Proof. It is easy to verify that $d(AB) = (dA)B + A(dB)$. Then it readily follows from the equality $\mathrm{vec}(CDE) = (E^{\intercal} \otimes C)\,\mathrm{vec}(D)$ that

\[
D(AB)(X) := \frac{d\,\mathrm{vec}(AB)}{d\,\mathrm{vec}(X)^{\intercal}} = \frac{\mathrm{vec}(d(AB))}{d\,\mathrm{vec}(X)^{\intercal}}
= (B^{\intercal} \otimes I_{r_A})\,\frac{d\,\mathrm{vec}(A)}{d\,\mathrm{vec}(X)^{\intercal}} + (I_{c_B} \otimes A)\,\frac{d\,\mathrm{vec}(B)}{d\,\mathrm{vec}(X)^{\intercal}}.
\]

This completes the proof. □

Lemma B.1 yields that (we omit the subscript t for simplicity)

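\[
\begin{aligned}
D(x^{\intercal}\Phi y) &= x^{\intercal}\Phi\,Dy + (y^{\intercal}\otimes x^{\intercal})\,D\Phi + \big[(\Phi y)^{\intercal}\otimes I_{c_x}\big]\,T_{r_x,c_x}\,Dx\\
&= x^{\intercal}\Phi\,Dy + (y^{\intercal}\otimes x^{\intercal})\,D\Phi + \big[I_{c_x}\otimes(\Phi y)^{\intercal}\big]\,Dx,\\
D(x^{\intercal}\Phi x) &= \big[I_{c_x}\otimes(x^{\intercal}\Phi)\big]\,Dx + (x^{\intercal}\otimes x^{\intercal})\,D\Phi + \big[(\Phi x)^{\intercal}\otimes I_{c_x}\big]\,T_{r_x,c_x}\,Dx\\
&= (I_{c_x^2} + T_{c_x,c_x})\big[I_{c_x}\otimes(x^{\intercal}\Phi)\big]\,Dx + (x^{\intercal}\otimes x^{\intercal})\,D\Phi.
\end{aligned}
\]

From the equality $x^{\intercal}\Phi y = (x^{\intercal}\Phi x)\theta$, we get

\[
D(x^{\intercal}\Phi y) = (x^{\intercal}\Phi x)\,D\theta + (\theta^{\intercal}\otimes I_p)\,D(x^{\intercal}\Phi x).
\]

It then follows that (using $y = x\theta$, so that $\theta^{\intercal}x^{\intercal} = y^{\intercal}$)

\[
\begin{aligned}
D\theta &= -\big[\theta^{\intercal}\otimes(x^{\intercal}\Phi x)^{-1}\big]\,D(x^{\intercal}\Phi x) + (x^{\intercal}\Phi x)^{-1}\,D(x^{\intercal}\Phi y)\\
&= -\big[\theta^{\intercal}\otimes(x^{\intercal}\Phi x)^{-1}\big](I_{c_x^2} + T_{c_x,c_x})\big[I_{c_x}\otimes(x^{\intercal}\Phi)\big]\,Dx - \big(y^{\intercal}\otimes[(x^{\intercal}\Phi x)^{-1}x^{\intercal}]\big)\,D\Phi\\
&\quad + (x^{\intercal}\Phi x)^{-1}x^{\intercal}\Phi\,Dy + \big(y^{\intercal}\otimes[(x^{\intercal}\Phi x)^{-1}x^{\intercal}]\big)\,D\Phi + \big[(x^{\intercal}\Phi x)^{-1}\otimes(\Phi y)^{\intercal}\big]\,Dx\\
&= (x^{\intercal}\Phi x)^{-1}x^{\intercal}\Phi\,Dy - \big(\theta^{\intercal}\otimes[(x^{\intercal}\Phi x)^{-1}(x^{\intercal}\Phi)]\big)\,Dx - \big((x^{\intercal}\Phi x)^{-1}\otimes[\theta^{\intercal}x^{\intercal}\Phi]\big)\,Dx + \big[(x^{\intercal}\Phi x)^{-1}\otimes(\Phi y)^{\intercal}\big]\,Dx\\
&= (x^{\intercal}\Phi x)^{-1}x^{\intercal}\Phi\,Dy + \Big\{-\theta^{\intercal}\otimes[(x^{\intercal}\Phi x)^{-1}(x^{\intercal}\Phi)] + (x^{\intercal}\Phi x)^{-1}\otimes[y^{\intercal}(\Phi^{\intercal} - \Phi)]\Big\}\,Dx.
\end{aligned}
\]

This shows that $\partial_y\theta_t = (x_t^{\intercal}\Phi_t x_t)^{-1}x_t^{\intercal}\Phi_t$, $\partial_x\theta_t = -\theta_t^{\intercal}\otimes\partial_y\theta_t + (x_t^{\intercal}\Phi_t x_t)^{-1}\otimes[y_t^{\intercal}(\Phi_t^{\intercal} - \Phi_t)]$, and $\partial_\Phi\theta \equiv 0$. Then for symmetric Φ, we have

\[
D\theta_t = (x_t^{\intercal}\Phi_t x_t)^{-1}x_t^{\intercal}\Phi_t
\begin{pmatrix} I_p, & -\theta_t^{\intercal}\otimes I_p \end{pmatrix}
\begin{pmatrix} Dy_t\\ Dx_t \end{pmatrix}. \tag{B.1}
\]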
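The two blocks of the Jacobian in (B.1) can be checked by finite differences. The sketch below is an illustrative check under stated assumptions, not part of the proof: it fixes a symmetric positive definite Φ, imposes $y = x\theta$ exactly, treats y and vec(x) as free inputs (instead of as functions of the spot volatility), and uses a hypothetical helper `numerical_jacobian`.

```python
import numpy as np


def theta_map(y, x, phi):
    """Weighted least-squares map theta = (x' Phi x)^{-1} x' Phi y."""
    return np.linalg.solve(x.T @ phi @ x, x.T @ phi @ y)


def numerical_jacobian(func, v, eps=1e-6):
    """Central finite-difference Jacobian of func with respect to the vector v."""
    cols = []
    for k in range(v.size):
        step = np.zeros_like(v)
        step[k] = eps
        cols.append((func(v + step) - func(v - step)) / (2.0 * eps))
    return np.column_stack(cols)


rng = np.random.default_rng(0)
r, q = 5, 2                                   # x is r-by-q, theta has q entries
x0 = rng.normal(size=(r, q))
theta0 = rng.normal(size=q)
y0 = x0 @ theta0                              # the restriction y = x theta holds exactly
root = rng.normal(size=(r, r))
phi = root @ root.T + r * np.eye(r)           # symmetric positive definite weight

# Analytic blocks: d theta / d y and d theta / d vec(x) = -theta' (x) (d theta / d y).
d_y = np.linalg.solve(x0.T @ phi @ x0, x0.T @ phi)
d_x = -np.kron(theta0[None, :], d_y)

num_d_y = numerical_jacobian(lambda y: theta_map(y, x0, phi), y0)
num_d_x = numerical_jacobian(
    lambda v: theta_map(y0, v.reshape((r, q), order="F"), phi),  # vec stacks columns
    x0.flatten(order="F"))

max_err_y = float(np.abs(num_d_y - d_y).max())
max_err_x = float(np.abs(num_d_x - d_x).max())
```

Both `max_err_y` and `max_err_x` should be at the level of finite-difference error, which is consistent with the block structure $(I_p, -\theta_t^{\intercal}\otimes I_p)$ in (B.1).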

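Then one can easily derive the following result from Lemma 1:

\[
\begin{aligned}
FV(\theta,\theta) &= \int_0^T D\theta(c_t)\,\big[(I_{d^2} + T_{d,d})(c_t\otimes c_t)\big]\,[D\theta(c_t)]^{\intercal}\,dt\\
&= \int_0^T (x_t^{\intercal}\Phi_t x_t)^{-1}x_t^{\intercal}\Phi_t
\begin{pmatrix} I_p, & -\theta_t^{\intercal}\otimes I_p \end{pmatrix}
\begin{pmatrix} Dy_t\\ Dx_t \end{pmatrix}
\big[(I_{d^2} + T_{d,d})(c_t\otimes c_t)\big]
\begin{pmatrix} Dy_t\\ Dx_t \end{pmatrix}^{\intercal}\\
&\qquad\times
\begin{pmatrix} I_p\\ -\theta_t\otimes I_p \end{pmatrix}
\Phi_t x_t (x_t^{\intercal}\Phi_t x_t)^{-1}\,dt\\
&= \int_0^T (x_t^{\intercal}\Phi_t x_t)^{-1}x_t^{\intercal}\Phi_t\,\Sigma_{\theta\theta,t}\,\Phi_t x_t (x_t^{\intercal}\Phi_t x_t)^{-1}\,dt
= \overline{(x^{\intercal}\Phi x)^{-1}x^{\intercal}\Phi\,\Sigma_{\theta\theta}\,\Phi x (x^{\intercal}\Phi x)^{-1}}.
\end{aligned}
\]

This completes the proof.

B.3. Proof of Theorem 3

We first prove the part related to $\hat{\theta}_n^{\mathrm{LTT}} = (\bar{x}_n^{\intercal}\bar{\Phi}_n\bar{x}_n)^{-1}\bar{x}_n^{\intercal}\bar{\Phi}_n\bar{y}_n$. The estimator $\hat{\theta}_n^{\mathrm{LTT}}$ is a function of the vector $(\bar{y}_n, \bar{\Phi}_n, \bar{x}_n)$. According to Lemma 1, we have

\[
\frac{1}{\sqrt{\Delta_n}}\left(
\begin{pmatrix} \bar{y}_n\\ \bar{\Phi}_n\\ \bar{x}_n \end{pmatrix}
-
\begin{pmatrix} \bar{y}\\ \bar{\Phi}\\ \bar{x} \end{pmatrix}
\right)
\xrightarrow{\;\mathcal{L}_{st}\;}
\mathcal{MN}\left(
\begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix},
\begin{pmatrix}
FV(y,y) & FV(y,\Phi) & FV(y,x)\\
FV(\Phi,y) & FV(\Phi,\Phi) & FV(\Phi,x)\\
FV(x,y) & FV(x,\Phi) & FV(x,x)
\end{pmatrix}
\right).
\]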


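Following a similar calculation to the previous subsection, we get the following result for symmetric $\bar{\Phi}_n$:

\[
D\hat{\theta}_n^{\mathrm{LTT}} = (\bar{x}_n^{\intercal}\bar{\Phi}_n\bar{x}_n)^{-1}\bar{x}_n^{\intercal}\bar{\Phi}_n\,D\bar{y}_n - \Big([\hat{\theta}_n^{\mathrm{LTT}}]^{\intercal}\otimes\big[(\bar{x}_n^{\intercal}\bar{\Phi}_n\bar{x}_n)^{-1}(\bar{x}_n^{\intercal}\bar{\Phi}_n)\big]\Big)\,D\bar{x}_n.
\]

The derivation is quite similar to the previous one about $D\theta$: one just needs to replace x by $\bar{x}_n$, Φ by $\bar{\Phi}_n$, and y by $\bar{y}_n$. The derivative $D\bar{y}_n$ is also understood as the derivative of y with respect to the spot volatility matrix c at some t ∈ [0, T], just like Dy. Hence, Lemma 1 applies in the usual way. Note that the estimation errors of c at times t and s (t ≠ s) are asymptotically independent, because the main estimation errors come from the Brownian increments driving the log-price processes, which are independent at different time points. Using the delta method, it then readily follows that

\[
\frac{1}{\sqrt{\Delta_n}}\big(\hat{\theta}_n^{\mathrm{LTT}} - \theta_0\big)
\xrightarrow{\;\mathcal{L}_{st}\;}
\mathcal{MN}\Big(0,\ (\bar{x}^{\intercal}\bar{\Phi}\bar{x})^{-1}\bar{x}^{\intercal}\bar{\Phi}\,\bar{\Sigma}_{\theta_0\theta_0}\,\bar{\Phi}\bar{x}(\bar{x}^{\intercal}\bar{\Phi}\bar{x})^{-1}\Big).
\]

As for $\hat{\theta}_n^{\mathrm{new}} = (\overline{x^{\intercal}\Phi x})_n^{-1}(\overline{x^{\intercal}\Phi y})_n$, it is a function of the vector $\big((\overline{x^{\intercal}\Phi y})_n, (\overline{x^{\intercal}\Phi x})_n\big)$. According to Lemma 1, we have

\[
\frac{1}{\sqrt{\Delta_n}}\left(
\begin{pmatrix} (\overline{x^{\intercal}\Phi y})_n\\ (\overline{x^{\intercal}\Phi x})_n \end{pmatrix}
-
\begin{pmatrix} \overline{x^{\intercal}\Phi y}\\ \overline{x^{\intercal}\Phi x} \end{pmatrix}
\right)
\xrightarrow{\;\mathcal{L}_{st}\;}
\mathcal{MN}\left(
\begin{pmatrix} 0\\ 0 \end{pmatrix},
\begin{pmatrix}
FV(x^{\intercal}\Phi y, x^{\intercal}\Phi y) & FV(x^{\intercal}\Phi y, x^{\intercal}\Phi x)\\
FV(x^{\intercal}\Phi x, x^{\intercal}\Phi y) & FV(x^{\intercal}\Phi x, x^{\intercal}\Phi x)
\end{pmatrix}
\right).
\]

On the other hand, we have

\[
D\hat{\theta}_n^{\mathrm{new}} = (\overline{x^{\intercal}\Phi x})_n^{-1}\,D(\overline{x^{\intercal}\Phi y})_n - \Big([\hat{\theta}_n^{\mathrm{new}}]^{\intercal}\otimes(\overline{x^{\intercal}\Phi x})_n^{-1}\Big)\,D(\overline{x^{\intercal}\Phi x})_n.
\]

Therefore, the variance–covariance matrix of $\frac{1}{\sqrt{\Delta_n}}(\hat{\theta}_n^{\mathrm{new}} - \theta_0)$ is given by

\[
\begin{aligned}
&(\overline{x^{\intercal}\Phi x})^{-1}
\begin{pmatrix} I_q, & -\theta_0^{\intercal}\otimes I_q \end{pmatrix}
\begin{pmatrix}
FV(x^{\intercal}\Phi y, x^{\intercal}\Phi y) & FV(x^{\intercal}\Phi y, x^{\intercal}\Phi x)\\
FV(x^{\intercal}\Phi x, x^{\intercal}\Phi y) & FV(x^{\intercal}\Phi x, x^{\intercal}\Phi x)
\end{pmatrix}
\begin{pmatrix} I_q\\ -\theta_0\otimes I_q \end{pmatrix}
(\overline{x^{\intercal}\Phi x})^{-1}\\
&= (\overline{x^{\intercal}\Phi x})^{-1}
\begin{pmatrix} I_q, & -\theta_0^{\intercal}\otimes I_q \end{pmatrix}
\int_0^T
\begin{pmatrix}
x_t^{\intercal}\Phi_t & y_t^{\intercal}\otimes x_t^{\intercal} & I_q\otimes(\Phi_t y_t)^{\intercal}\\
0 & x_t^{\intercal}\otimes x_t^{\intercal} & (I_{q^2} + T_{q,q})\big[I_q\otimes(x_t^{\intercal}\Phi_t)\big]
\end{pmatrix}
\begin{pmatrix}
V_{yy,t} & V_{y\Phi,t} & V_{yx,t}\\
V_{\Phi y,t} & V_{\Phi\Phi,t} & V_{\Phi x,t}\\
V_{xy,t} & V_{x\Phi,t} & V_{xx,t}
\end{pmatrix}\\
&\qquad\times
\begin{pmatrix}
\Phi_t x_t & 0\\
y_t\otimes x_t & x_t\otimes x_t\\
I_q\otimes(\Phi_t y_t) & \big[I_q\otimes(\Phi_t x_t)\big](I_{q^2} + T_{q,q})
\end{pmatrix}
dt
\begin{pmatrix} I_q\\ -\theta_0\otimes I_q \end{pmatrix}
(\overline{x^{\intercal}\Phi x})^{-1}\\
&= (\overline{x^{\intercal}\Phi x})^{-1}
\int_0^T x_t^{\intercal}\Phi_t
\begin{pmatrix} I_p, & 0, & -\theta_0^{\intercal}\otimes I_p \end{pmatrix}
\begin{pmatrix}
V_{yy,t} & V_{y\Phi,t} & V_{yx,t}\\
V_{\Phi y,t} & V_{\Phi\Phi,t} & V_{\Phi x,t}\\
V_{xy,t} & V_{x\Phi,t} & V_{xx,t}
\end{pmatrix}
\begin{pmatrix} I_p\\ 0\\ -\theta_0\otimes I_p \end{pmatrix}
\Phi_t x_t\,dt\;
(\overline{x^{\intercal}\Phi x})^{-1}\\
&= (\overline{x^{\intercal}\Phi x})^{-1}\;\overline{x^{\intercal}\Phi\,\Sigma_{\theta_0\theta_0}\,\Phi x}\;(\overline{x^{\intercal}\Phi x})^{-1}.
\end{aligned}
\]

This completes the proof.

B.4. Proof of Lemma 4

Following the idea of Bultheel (1982), we introduce the following "inner product" of square matrix processes on [0, T]. For any $A = (A_t)_{t\in[0,T]}$ and $B = (B_t)_{t\in[0,T]}$ with the same dimension, define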

\[
\langle A, B\rangle_M = \int_0^T A_t^{\intercal}B_t\,dt = \overline{A^{\intercal}B}.
\]

It is not hard to verify that this operator $\langle\cdot,\cdot\rangle_M$ has the following properties:

• Symmetric up to transpose: $\langle A, B\rangle_M = \overline{A^{\intercal}B} = \big(\overline{B^{\intercal}A}\big)^{\intercal} = \langle B, A\rangle_M^{\intercal}$.
• Bilinearity with respect to scalars and square matrices of the same dimension (right multiplication): $\langle A\alpha, B\beta + C\gamma\rangle_M = \alpha^{\intercal}\langle A, B\rangle_M\beta + \alpha^{\intercal}\langle A, C\rangle_M\gamma$.
• Positive semi-definiteness: for any vector v such that Av is well-defined, we have $v^{\intercal}\langle A, A\rangle_M v \ge 0$.

Hence, for any β, the matrix

\[
\langle A + B\beta, A + B\beta\rangle_M = \langle A, A\rangle_M + \langle A, B\rangle_M\beta + \beta^{\intercal}\langle B, A\rangle_M + \beta^{\intercal}\langle B, B\rangle_M\beta
\]

is positive semi-definite. Then by taking $\beta = -\langle B, B\rangle_M^{-1}\langle B, A\rangle_M = -\big(\overline{B^{\intercal}B}\big)^{-1}\overline{B^{\intercal}A}$, we deduce that the following matrix

\[
\begin{aligned}
&\langle A, A\rangle_M - \langle A, B\rangle_M\langle B, B\rangle_M^{-1}\langle B, A\rangle_M - \langle A, B\rangle_M\langle B, B\rangle_M^{-1}\langle B, A\rangle_M + \langle A, B\rangle_M\langle B, B\rangle_M^{-1}\langle B, A\rangle_M\\
&= \langle A, A\rangle_M - \langle A, B\rangle_M\langle B, B\rangle_M^{-1}\langle B, A\rangle_M
\end{aligned}
\]

is positive semi-definite, which is the desired result.

B.5. More details about the continuous-time regression example

The calculation of $\Sigma_{\theta\theta,t}$ in this case relies on the following lemmas.

Lemma B.2. Let $e_i$ (i = 1, ..., K) be a column vector whose ith element is 1 whereas all the other elements are zero. Let $E_{ij} := e_i e_j^{\intercal}$. Note that $I_K = \sum_{i=1}^K E_{ii}$. Let T denote the commutation matrix such that $T_{r_A,c_A}\mathrm{vec}(A) = \mathrm{vec}(A^{\intercal})$. Then we have $T_{K,K} = \sum_{i,j=1}^K E_{ij}\otimes E_{ij}^{\intercal}$ and

\[
T_{r_B,r_A}(A\otimes B) = (B\otimes A)\,T_{c_B,c_A},
\qquad
\mathrm{vec}(A\otimes B) = \big(I_{c_A}\otimes T_{c_B,r_A}\otimes I_{r_B}\big)\big(\mathrm{vec}(A)\otimes\mathrm{vec}(B)\big).
\]

Proof. See Ghazal and Neudecker (2000) and references therein.
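The identities in Lemma B.2 are easy to verify numerically for small dimensions. The following is a minimal sketch rather than a proof: the helper names `vec` and `commutation_matrix` and the chosen dimensions are illustrative assumptions, and vec is taken to stack columns.

```python
import numpy as np


def vec(a):
    """Stack the columns of a matrix into one vector."""
    return a.flatten(order="F")


def commutation_matrix(m, n):
    """Return T_{m,n} with T_{m,n} vec(A) = vec(A') for every m-by-n matrix A."""
    t = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # A[i, j] sits at position j*m + i in vec(A) and at i*n + j in vec(A').
            t[i * n + j, j * m + i] = 1.0
    return t


rng = np.random.default_rng(0)
r_a, c_a, r_b, c_b = 2, 3, 4, 2
a = rng.normal(size=(r_a, c_a))
b = rng.normal(size=(r_b, c_b))

# Defining property: T_{rA,cA} vec(A) = vec(A').
ok_transpose = np.allclose(commutation_matrix(r_a, c_a) @ vec(a), vec(a.T))

# T_{K,K} = sum over i, j of E_ij (x) E_ij'.
k = 3
eye_k = np.eye(k)
t_sum = sum(np.kron(np.outer(eye_k[i], eye_k[j]), np.outer(eye_k[j], eye_k[i]))
            for i in range(k) for j in range(k))
ok_sum = np.allclose(t_sum, commutation_matrix(k, k))

# T_{rB,rA} (A (x) B) = (B (x) A) T_{cB,cA}.
ok_swap = np.allclose(commutation_matrix(r_b, r_a) @ np.kron(a, b),
                      np.kron(b, a) @ commutation_matrix(c_b, c_a))

# vec(A (x) B) = (I_cA (x) T_{cB,rA} (x) I_rB)(vec(A) (x) vec(B)).
middle = np.kron(np.kron(np.eye(c_a), commutation_matrix(c_b, r_a)), np.eye(r_b))
ok_vec = np.allclose(vec(np.kron(a, b)), middle @ np.kron(vec(a), vec(b)))
```

All four flags evaluate to `True` for these random inputs, which gives a quick sanity check on the index conventions used for T.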



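Lemma B.3 (Derivatives of Kronecker Product). Let A and B be two matrix functions. Denote the row and column numbers of A by $r_A$ and $c_A$, respectively (same for B). Then we have

\[
D(A\otimes B) = \frac{\partial(A\otimes B)}{\partial A}\,DA + \frac{\partial(A\otimes B)}{\partial B}\,DB,
\]

where

\[
\begin{aligned}
\frac{\partial(A\otimes B)}{\partial A} &= \big(I_{c_A}\otimes T_{c_B,r_A}\otimes I_{r_B}\big)\big(I_{r_A c_A}\otimes\mathrm{vec}(B)\big),\\
\frac{\partial(A\otimes B)}{\partial B} &= \big(I_{c_A}\otimes T_{c_B,r_A}\otimes I_{r_B}\big)\big(\mathrm{vec}(A)\otimes I_{r_B c_B}\big).
\end{aligned}
\]

Proof. According to the definition of partial matrix derivative and Lemma B.2, we readily get

\[
\begin{aligned}
\frac{\partial(A\otimes B)}{\partial A} = \frac{\mathrm{vec}(dA\otimes B)}{d\,\mathrm{vec}(A)^{\intercal}}
&= \big(I_{c_A}\otimes T_{c_B,r_A}\otimes I_{r_B}\big)\left(\frac{d\,\mathrm{vec}(A)}{d\,\mathrm{vec}(A)^{\intercal}}\otimes\mathrm{vec}(B)\right)\\
&= \big(I_{c_A}\otimes T_{c_B,r_A}\otimes I_{r_B}\big)\big(I_{r_A c_A}\otimes\mathrm{vec}(B)\big).
\end{aligned}
\]

Similarly, one can prove the result for $\frac{\partial(A\otimes B)}{\partial B}$. Note that Lemma B.2 also implies that

\[
d\,\mathrm{vec}(A\otimes B) = \big(I_{c_A}\otimes T_{c_B,r_A}\otimes I_{r_B}\big)\big(d\,\mathrm{vec}(A)\otimes\mathrm{vec}(B) + \mathrm{vec}(A)\otimes d\,\mathrm{vec}(B)\big).
\]

Then the result for $D(A\otimes B)$ readily follows. □

Recall that $\theta_t = \mathrm{vec}(\beta_t) = (c_{XX,t}^{-1}\otimes I_P)\,\mathrm{vec}(c_{YX,t})$ is a PK-by-1 vector, i.e. p = PK in (2.14). Hence $x_t \equiv c_{XX,t}^{\intercal}\otimes I_P$ and $y_t \equiv \mathrm{vec}(c_{YX,t})$. Note that $\mathrm{vec}(y_t) \equiv \mathrm{vec}(c_{YX,t})$, which implies $Dy_t \equiv Dc_{YX,t}$. Lemma B.3 implies that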



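\[
Dx_t = (I_K\otimes T_{P,K}\otimes I_P)\big(I_{K^2}\otimes\mathrm{vec}(I_P)\big)\,Dc_{XX,t}^{\intercal}
= (I_K\otimes T_{P,K}\otimes I_P)\big(T_{K,K}\otimes\mathrm{vec}(I_P)\big)\,Dc_{XX,t}.
\]

Then (B.1) can be written as

\[
D\theta_t = (x_t^{\intercal}\Phi_t x_t)^{-1}x_t^{\intercal}\Phi_t
\begin{pmatrix} I_p, & -(\theta_t^{\intercal}\otimes I_p)(I_K\otimes T_{P,K}\otimes I_P)\big(T_{K,K}\otimes\mathrm{vec}(I_P)\big) \end{pmatrix}
\begin{pmatrix} Dc_{YX,t}\\ Dc_{XX,t} \end{pmatrix}.
\]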



According to Lemma B.2, we have (note that ei ⊗ ej = (ei ⊗ ej )vec(1) = vec(ej ei ) ) K ∑

TK ,K =



Eij ⊗ Eij =

i,j=1

K ∑





(ei ⊗ ej )(ei ⊗ ej ) =

i,j=1

K ∑





vec(ej ei )[vec(ej ei )]⊺

i,j=1

It then follows that (θ t ⊗ IPK )(IK ⊗ TP ,K ⊗ IP )(TK ,K ⊗ vec(Ip ))

=

K ∑ (

vec(βt )⊺ ⊗ IPK (IK ⊗ TP ,K ⊗ IP ) vec(Eji ) ⊗ vec(IP ) [vec(Eji )]⊺

)

(

)

i,j=1


=

K ∑ (

vec(βt )⊺ ⊗ IPK vec(Eji ⊗ IP )[vec(Eji )]⊺

)

i,j=1

=

K ∑

vec IPK (Eji ⊗ IP )vec(βt ) [vec(Eji )]⊺ =

(

)

i,j=1

K ∑

vec(βt Eji )[vec(Eji )]⊺

i,j=1

= (IK ⊗ βt )

K ∑

vec(Eji )[vec(Eji )]⊺ = (IK ⊗ βt ) TK ,K .

i,j=1

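Therefore, (2.14) becomes

\[
\Sigma_{\theta\theta,t} =
\begin{pmatrix} I_p, & -(I_K\otimes\beta_t)\,T_{K,K} \end{pmatrix}
\begin{pmatrix}
V_{c_{YX}c_{YX},t} & V_{c_{YX}c_{XX},t}\\
V_{c_{XX}c_{XY},t} & V_{c_{XX}c_{XX},t}
\end{pmatrix}
\begin{pmatrix} I_p\\ -T_{K,K}(I_K\otimes\beta_t^{\intercal}) \end{pmatrix}.
\]

According to our definition of D, we get $D\theta_t \equiv D\beta_t$. Moreover, we also have $Dc_{YX,t} = I_{d_X}\otimes I_{d_Y}$, $Dc_{XX,t} = I_{d_X}\otimes I_{d_X}$ and $Dc_{XX,t}^{\intercal} = T_{K,K}\,Dc_{XX,t}$. To get the asymptotic variance $FV(\theta,\theta)$, it is easier to consider θ (β essentially) as a function of $c_{YX,t}$ and $c_{XX,t}$. On the other hand, one can check that

\[
\begin{aligned}
V_{c_{YX}c_{YX},t} &= Dc_{YX,t}\,(I_{d^2} + T_{d,d})(c_t\otimes c_t)\,[Dc_{YX,t}]^{\intercal} = c_{XX,t}\otimes c_{YY,t} + T_{K,P}\,(c_{YX,t}\otimes c_{YX,t}^{\intercal}),\\
V_{c_{YX}c_{XX},t} &= Dc_{YX,t}\,(I_{d^2} + T_{d,d})(c_t\otimes c_t)\,[Dc_{XX,t}]^{\intercal} = (c_{XX,t}\otimes c_{YX,t})(I_{K^2} + T_{K,K}),\\
V_{c_{XX}c_{XY},t} &= Dc_{XX,t}\,(I_{d^2} + T_{d,d})(c_t\otimes c_t)\,[Dc_{XY,t}]^{\intercal} = (I_{K^2} + T_{K,K})(c_{XX,t}\otimes c_{XY,t}),\\
V_{c_{XX}c_{XX},t} &= Dc_{XX,t}\,(I_{d^2} + T_{d,d})(c_t\otimes c_t)\,[Dc_{XX,t}]^{\intercal} = (I_{K^2} + T_{K,K})(c_{XX,t}\otimes c_{XX,t}).
\end{aligned}
\]

It then follows that (note that $c_{XY,t} \equiv c_{YX,t}^{\intercal}$, $c_{YX,t}^{\intercal} = c_{XX,t}\beta_t^{\intercal}$, and $T_{K,K}T_{K,K} = I_{K^2}$)

\[
\begin{aligned}
\Sigma_{\theta\theta,t} &= V_{c_{YX}c_{YX},t} - V_{c_{YX}c_{XX},t}\,T_{K,K}(I_K\otimes\beta_t^{\intercal}) - (I_K\otimes\beta_t)\,T_{K,K}\,V_{c_{XX}c_{XY},t} + (I_K\otimes\beta_t)\,T_{K,K}\,V_{c_{XX}c_{XX},t}\,T_{K,K}(I_K\otimes\beta_t^{\intercal})\\
&= c_{XX,t}\otimes c_{YY,t} + T_{K,P}\,(c_{YX,t}\otimes c_{XY,t}) - (c_{XX,t}\otimes c_{YX,t})(I_{K^2} + T_{K,K})(I_K\otimes\beta_t^{\intercal})\\
&\quad - (I_K\otimes\beta_t)(I_{K^2} + T_{K,K})(c_{XX,t}\otimes c_{XY,t}) + (I_K\otimes\beta_t)(I_{K^2} + T_{K,K})(c_{XX,t}\otimes c_{XX,t})(I_K\otimes\beta_t^{\intercal})\\
&= c_{XX,t}\otimes c_{YY,t} + T_{K,P}\,(c_{YX,t}\otimes c_{XY,t}) - c_{XX,t}\otimes(\beta_t c_{XX,t}\beta_t^{\intercal}) - T_{K,P}\,[c_{YX,t}\otimes(c_{XX,t}\beta_t^{\intercal})]\\
&\quad - c_{XX,t}\otimes(\beta_t c_{XX,t}\beta_t^{\intercal}) - T_{K,P}\,(c_{YX,t}\otimes c_{XY,t}) + c_{XX,t}\otimes(\beta_t c_{XX,t}\beta_t^{\intercal}) + T_{K,P}\,(c_{YX,t}\otimes c_{XY,t})\\
&= c_{XX,t}\otimes c_{YY,t} - c_{XX,t}\otimes(\beta_t c_{XX,t}\beta_t^{\intercal}) = c_{XX,t}\otimes\big(c_{YY,t} - c_{YX,t}\,c_{XX,t}^{-1}\,c_{XY,t}\big) = c_{XX,t}\otimes c_{UU,t}.
\end{aligned}
\]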

B.6. Proof of Theorem 6 The proof essentially follows the appendix of Clement et al. (2013), where the parameter of interest is a scalar. To save space, we just focus on those key steps therein and highlight the important changes that are needed for the general vector case. Following Clement et al. (2013), we consider the following process η

t



Zt = Z0 +

t



0



as (θ + ηl / n)dWs ,

bs ds + 0

where $\eta\in\mathbb{R}$ is a scalar. Comparing (47) in Clement et al. (2013) with the above equation, it is clear that our $b$ and $a$ do not depend on $Z^{\theta}$. Hence the process $Y(t)$ defined in (49) of the cited paper is just the identity matrix $I_d$. This makes some of the following terms simpler, but will not fundamentally change the proof. It then follows that $D_sZ^{\eta}$ (the Malliavin derivative of $Z^{\eta}$) and $\gamma_{Z^{\eta}}$ (the corresponding Malliavin variance–covariance matrix) become ($t_i^n = (i-1)\Delta_n$ in our context)

$$D_sZ^{\eta}(t_{i+1}^n) = a_s(\theta+\eta l/\sqrt{n}) \quad\text{and}\quad \gamma_{Z^{\eta}(t_{i+1}^n)} = \int_{t_i^n}^{t_{i+1}^n} D_sZ^{\eta}(t_{i+1}^n)\,[D_sZ^{\eta}(t_{i+1}^n)]^{\intercal}\,ds.$$

Let $D_{\eta}Z^{\eta}$ be the derivative of $Z^{\eta}$ with respect to $\eta$. It is actually the counterpart of $\dot{X}^{\theta}$ in Clement et al. (2013). Following the matrix representation of such a derivative, we get

$$D_{\eta}Z^{\eta}(t_{i+1}^n) = \frac{1}{\sqrt{n}}\int_{t_i^n}^{t_{i+1}^n} \left(dW^{\intercal}\otimes I_d\right)[D_{\theta}a]\,l + \text{other terms}.$$

The function $a$ is evaluated at the time point $(i-1)\Delta_n$ and we omit this subscript to simplify the notation (same below). Basically, the term $\dot{a}\,l\,dW$ in the cited paper becomes $(dW^{\intercal}\otimes I_d)[D_{\theta}a]\,l$.

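Then the term $P_n(s)$, which is constant for $s\in[t_i^n, t_{i+1}^n)$, becomes (note that $t_{i+1}^n - t_i^n \equiv \Delta_n$ in the current context)

$$P_n(s) = \frac{1}{\sqrt{n}\,\Delta_n}\, l^{\intercal}[D_{\theta}a]^{\intercal}(\Delta_i^nW\otimes a^{-\intercal}) = \frac{1}{\sqrt{n}\,\Delta_n}\, l^{\intercal}[D_{\theta}a]^{\intercal}(I_d\otimes a^{-\intercal})(\Delta_i^nW\otimes I_d).$$

It is worth pointing out that the term $P_n(s)$ does not depend on the process $Y(t)$ defined in (49) of the cited paper, even though both $b$ and $a$ depend on $X_t^{\theta}$ (the counterpart of $Z_t^{\eta}$ in this paper) therein. Hence, we do not lose any generality by assuming that $b$ and $a$ do not depend on $Z^{\theta}$.

We note that the term $\Delta W_{t_i^n}^{\intercal}(a^{-1}\dot{a})^{\intercal}\Delta W_{t_i^n}$ in (53) of Clement et al. (2013) is a scalar. Hence, it is the same as its trace. Therefore, we get

$$E\big[\mathrm{Tr}\big(\Delta W_{t_i^n}^{\intercal}(a^{-1}\dot{a})^{\intercal}\Delta W_{t_i^n}\big)\big] = E\big[\mathrm{Tr}\big((a^{-1}\dot{a})^{\intercal}\Delta W_{t_i^n}\Delta W_{t_i^n}^{\intercal}\big)\big] = (t_{i+1}^n - t_i^n)\,\mathrm{Tr}(a^{-1}\dot{a})^{\intercal},$$

where the right-hand-side term is the second term in (53) of Clement et al. (2013).

For a $d$-dimensional normal random variable $x\sim N(\mu, c)$, Ghazal and Neudecker (2000) prove that

$$E(x\otimes x) = \mathrm{vec}(c) + \mu\otimes\mu \quad\text{and}\quad \mathrm{Var}(x\otimes x) = (I_{d^2}+T_{d,d})\big(c\otimes c + (\mu\mu^{\intercal})\otimes c + c\otimes(\mu\mu^{\intercal})\big).$$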
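For the zero-mean case, the variance formula of Ghazal and Neudecker (2000) can be checked exactly rather than by simulation, because the fourth moments of a centered Gaussian vector follow from Isserlis' formula $E(x_ix_jx_kx_l) = c_{ij}c_{kl} + c_{ik}c_{jl} + c_{il}c_{jk}$. The sketch below is only an illustration under that assumption ($\mu = 0$); the helper names are hypothetical.

```python
import numpy as np

def commutation_matrix(d):
    """T_{d,d}: the permutation with T @ vec(A) = vec(A.T), column-major vec."""
    T = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            T[i * d + j, j * d + i] = 1.0
    return T

def var_x_kron_x_gaussian(c):
    """Var(x kron x) for x ~ N(0, c), built from Isserlis' fourth moments."""
    d = c.shape[0]
    fourth = (np.einsum("ij,kl->ijkl", c, c)
              + np.einsum("ik,jl->ijkl", c, c)
              + np.einsum("il,jk->ijkl", c, c))
    second = c.reshape(d * d)      # E(x kron x) = vec(c) when mu = 0
    return fourth.reshape(d * d, d * d) - np.outer(second, second)

def check_identity(d=3, seed=0):
    """Compare the exact variance with (I + T)(c kron c)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((d, d))
    c = a @ a.T + np.eye(d)        # symmetric positive definite
    lhs = var_x_kron_x_gaussian(c)
    rhs = (np.eye(d * d) + commutation_matrix(d)) @ np.kron(c, c)
    return np.allclose(lhs, rhs)
```

The same `commutation_matrix` helper also satisfies $T_{d,d}T_{d,d} = I_{d^2}$, the property used repeatedly in the derivations of this appendix.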

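This implies that $E[(dW\otimes dW)(dW\otimes dW)^{\intercal}]/dt = I_{d^2}+T_{d,d}$. According to the above results, and upon noticing that the counterpart of $\Delta W_{t_i^n}^{\intercal}(a^{-1}\dot{a})^{\intercal}\Delta W_{t_i^n}$ is $[D_{\theta}a]^{\intercal}(I_d\otimes a^{-\intercal})(\Delta_i^nW\otimes\Delta_i^nW)$, we readily get

$$\delta(P_n) = \frac{1}{\sqrt{n}\,\Delta_n}\, l^{\intercal}[D_{\theta}a]^{\intercal}(I_d\otimes a^{-\intercal})\big((\Delta_i^nW\otimes\Delta_i^nW) - \mathrm{vec}(I_d)\,\Delta_n\big),$$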

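where $\delta$ is the divergence operator, defined as the adjoint of the derivative operator. In this context, it can be interpreted as the Skorohod integral (see Chapter 1.3 of Nualart (2006) for more details). Now equation (54) in Clement et al. (2013) becomes ($n = \lfloor T/\Delta_n\rfloor$)

$$\begin{aligned}
\log\frac{dP_n^{\theta_0+l/\sqrt{n}}}{dP_n^{\theta_0}} &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{1}{\Delta_n}\, l^{\intercal}[D_{\theta}a]^{\intercal}(I_d\otimes a^{-\intercal})\big((\Delta_i^nW\otimes\Delta_i^nW) - \mathrm{vec}(I_d)\,\Delta_n\big) \\
&\quad - \frac{1}{2n}\sum_{i=1}^{n}\frac{1}{\Delta_n}\, l^{\intercal}[D_{\theta}a]^{\intercal}(I_d\otimes a^{-\intercal})(\Delta_i^nW\otimes\Delta_i^nW)(\Delta_i^nW^{\intercal}\otimes\Delta_i^nW^{\intercal})(I_d\otimes a^{-1})[D_{\theta}a]\,l + o_P(1) \\
&= l^{\intercal}\Gamma_n^{1/2}\zeta_n - \frac{1}{2}\,l^{\intercal}\Gamma_n l + o_P(1),
\end{aligned}$$

where $\zeta_n := \Gamma_n^{-1/2}\tilde{\zeta}_n$ and

$$\tilde{\zeta}_n := \frac{1}{\sqrt{n}}\sum_{i=1}^{n}[D_{\theta}a]^{\intercal}(I_d\otimes a^{-\intercal})\left(\frac{\Delta_i^nW}{\sqrt{\Delta_n}}\otimes\frac{\Delta_i^nW}{\sqrt{\Delta_n}} - \mathrm{vec}(I_d)\right),$$

$$\Gamma_n := \frac{1}{n}\sum_{i=1}^{n}\frac{1}{\Delta_n}\,[D_{\theta}a]^{\intercal}(I_d\otimes a^{-\intercal})(\Delta_i^nW\otimes\Delta_i^nW)(\Delta_i^nW^{\intercal}\otimes\Delta_i^nW^{\intercal})(I_d\otimes a^{-1})[D_{\theta}a].$$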

It is not very hard to verify that $\Gamma_n \xrightarrow{P} \Gamma$, which is given by

$$\Gamma = [D_{\theta}a]^{\intercal}(I_d\otimes a^{-\intercal})(I_{d^2}+T_{d,d})(I_d\otimes a^{-1})[D_{\theta}a].$$

Moreover, one can also verify that $\tilde{\zeta}_n$ converges stably in law to a random variable that is $\mathcal{F}$-conditionally normal with zero mean and variance given by $\Gamma$. Hence $\zeta_n \xrightarrow{L_{st}} N(0, I_q)$. This proves part (i).

By taking the derivative with respect to $\theta$ on both sides of the restriction $y(c_t)\equiv x(c_t)\,\theta_0$, we obtain (our matrix representation of the derivative preserves the classic chain rule)

$$\big(Dy - (\theta_0^{\intercal}\otimes I_p)\,Dx\big)[D_ac]\,[D_{\theta}a] \equiv x,$$

where $D$, $D_a$, and $D_{\theta}$ represent the derivatives with respect to $c$, $a$, and $\theta$, respectively. Then the optimal asymptotic variance of $\hat{\theta}_n^{new,*}$ can be re-written as

$$\mathrm{Var}(\hat{\theta}_n^{new,*}) = (x^{\intercal}\Sigma_{\theta_0\theta_0}^{-1}x)^{-1} = \big(\gamma(\theta_0)^{\intercal}\Sigma_{\theta_0\theta_0}^{-1}\gamma(\theta_0)\big)^{-1},$$

where $\gamma(\theta_0) \equiv \big(Dy - (\theta_0^{\intercal}\otimes I_p)\,Dx\big)[D_ac]\,[D_{\theta}a]$. It is clear that if

$$\gamma(\theta_0)^{\intercal}\Sigma_{\theta_0\theta_0}^{-1}\gamma(\theta_0) \equiv [D_{\theta}a]^{\intercal}(I_d\otimes a^{-\intercal})(I_{d^2}+T_{d,d})(I_d\otimes a^{-1})[D_{\theta}a],$$

then $\hat{\theta}_n^{new,*}$ can attain the efficiency bound.
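To make the information matrix $\Gamma$ concrete, the following sketch evaluates $[D_{\theta}a]^{\intercal}(I_d\otimes a^{-\intercal})(I_{d^2}+T_{d,d})(I_d\otimes a^{-1})[D_{\theta}a]$ numerically. It assumes that $D_{\theta}a$ is stored as the $d^2\times q$ Jacobian of the column-major $\mathrm{vec}(a)$ with respect to $\theta$, which is how the dimensions in the proof fit together; this storage convention and the function names are assumptions of the illustration.

```python
import numpy as np

def commutation_matrix(d):
    """T_{d,d}: the permutation with T @ vec(A) = vec(A.T), column-major vec."""
    T = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            T[i * d + j, j * d + i] = 1.0
    return T

def information_matrix(a, D_theta_a):
    """Gamma = [D a]' (I kron a^{-T}) (I + T) (I kron a^{-1}) [D a].

    a          : (d, d) invertible volatility coefficient
    D_theta_a  : (d*d, q) Jacobian of vec(a) w.r.t. theta (assumed layout)
    """
    d = a.shape[0]
    a_inv = np.linalg.inv(a)
    M = np.kron(np.eye(d), a_inv) @ D_theta_a      # (I_d kron a^{-1}) [D_theta a]
    middle = np.eye(d * d) + commutation_matrix(d)  # I_{d^2} + T_{d,d}
    return M.T @ middle @ M
```

In the scalar case $d = q = 1$ this reduces to $2(\dot{a}/a)^2$, and for $a = \theta I_d$ it gives $2d/\theta^2$; both special cases follow directly from the formula for $\Gamma$.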

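B.7. Proof of Lemma 7

Yang (2018) proved the following lemma, which has been modified to adapt to the notation of this paper.

Lemma B.4. Let $X$ and $c$ satisfy Assumption HF for some $r\in[0,2)$. Suppose function $f$ is twice continuously differentiable on $K_m^{\epsilon}$, for each stopping time $\tau_m$ appearing in Assumption HF. If the following conditions hold (let $1/0 = \infty$)

$$0 < \kappa\wedge(1-\kappa) < (2-r)\left(1\wedge\frac{1}{r}\right) \quad\text{and}\quad \frac{\rho\wedge(1-\rho)}{2(2-r)} \le \varpi < \frac{1}{2}, \qquad \text{(B.2)}$$

then, we have

$$\left(\sqrt{k_n}\wedge\frac{1}{\sqrt{k_n\Delta_n}}\right)\big(f(\hat{c}^{\,n}_{\tau_l}) - f(\hat{c}^{\,\prime n}_{\tau_l})\big) \xrightarrow{P} 0 \quad \forall l.$$

Moreover, for any $\kappa\in(0,1)$, we have

$$\begin{pmatrix} \sqrt{k_n}\,\mathrm{vec}\big(f(\hat{c}^{\,\prime n}_{\tau_l}) - f(\bar{c}^{\,n}_{\tau_l})\big) \\[4pt] \dfrac{1}{\sqrt{k_n\Delta_n}}\,\mathrm{vec}\big(f(\bar{c}^{\,n}_{\tau_l}) - f(c_{\tau_l})\big) \end{pmatrix}_{l\ge 0} \xrightarrow{L_{st}} \begin{pmatrix} Df(c_{\tau_l})\,Z_{\tau_l} \\[4pt] Df(c_{\tau_l})\,\bar{Z}_{\tau_l} \end{pmatrix}_{l\ge 0},$$

where $(\cdot)_{l\ge 0}$ means any finite number of elements of the input.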

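Conditionally on $\mathcal{F}$, the above two processes $Z$ and $\bar{Z}$ are two independent $\mathbb{R}^{d^2}$-valued Gaussian white noises. Their $\mathcal{F}$-conditional variance–covariance matrices at time $t$ are given by

$$\tilde{E}\big(Z_tZ_t^{\intercal}\mid\mathcal{F}\big) = (I_{d^2}+T_{d,d})(c_t\otimes c_t) \quad\text{and}\quad \tilde{E}\big(\bar{Z}_t\bar{Z}_t^{\intercal}\mid\mathcal{F}\big) = \frac{1}{3}\,\tilde{c}_t, \qquad \text{(B.3)}$$

where $T_{d,d}$ is the commutation matrix and $\tilde{c} = \tilde{\sigma}\tilde{\sigma}^{\intercal}$ is the volatility of the vectorized volatility $\mathrm{vec}(c)$, known as volatility of volatility in the literature.

Note that $D\theta^*(c_t)(I_{d^2}+T_{d,d})(c_t\otimes c_t)[D\theta^*(c_t)]^{\intercal} = (x^{\intercal}\Sigma_{\theta\theta}^{-1}x)_t^{-1}$. Hence, when $\kappa\in(0,1/2)$ (equivalently, $k_n^2\Delta_n\to 0$), the above lemma implies that

$$\left(\frac{1}{k_n}(x^{\intercal}\Sigma_{\theta\theta}^{-1}x)_t^{-1}\right)^{-1/2}\big(\hat{\theta}^*_t - \theta_t\big) \xrightarrow{L_{st}} N(0, I_q).$$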

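Note that this conclusion holds under both the null and the alternative.

According to Theorem 3, we have the following result under the null hypothesis for the bias-corrected $\hat{\theta}_n^{new}$:

$$\frac{1}{\sqrt{\Delta_n}}\big(\hat{\theta}_n^{new} - \theta_0\big) = O_P(1) \;\Longrightarrow\; \big(\hat{\theta}_n^{new} - \theta_0\big) = O_P(\sqrt{\Delta_n}) = o_P\!\left(\frac{1}{\sqrt{k_n}}\right).$$

Even for the non-bias-corrected $\hat{\theta}_n^{new}$, Lemma 1 also gives the order of those biases. Hence, we have

$$\hat{\theta}_n^{new} - \theta_0 = O_P\!\left(\frac{1}{k_n}\right) + O_P(k_n\Delta_n) + O_P(\sqrt{\Delta_n}).$$

To make all these terms $o_P(1/\sqrt{k_n})$, it is sufficient to have $k_n\to\infty$, $k_n\Delta_n\to 0$ and $k_n^3\Delta_n^2\to 0$, all of which are satisfied by the assumption of this lemma. Hence, the biases associated with $\hat{\theta}_n^{new}$ are asymptotically negligible at rate $1/\sqrt{k_n}$. In other words, one does not have to correct the biases in $\hat{\theta}_n^{new}$. In both cases, we can conclude that, under the assumption of this lemma, we get

$$\left(\frac{1}{k_n}(x^{\intercal}\Sigma_{\theta\theta}^{-1}x)_t^{-1}\right)^{-1/2}\big(\hat{\theta}_n^{new} - \theta_0\big) \xrightarrow{P} 0.$$

Together with the fact $\Delta_n\big(\overline{x^{\intercal}\Sigma_{\theta\theta}^{-1}x}\big)^{-1} = o_P(1/k_n)$, the above results yield that

$$\left(\frac{1}{k_n}(x^{\intercal}\Sigma_{\theta\theta}^{-1}x)_t^{-1} + \Delta_n\big(\overline{x^{\intercal}\Sigma_{\theta\theta}^{-1}x}\big)^{-1}\right)^{-1/2}\big(\hat{\theta}^*_t - \hat{\theta}_n^{new}\big) \xrightarrow{L_{st}} N(0, I_q).$$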

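Moreover, since $(\hat{x}^{\intercal}\hat{\Sigma}_{\theta\theta}^{-1}\hat{x})_t^{-1}$ and $[F_n(x^{\intercal}\Sigma_{\theta\theta}^{-1}x)]^{-1}$ are consistent estimators of $(x^{\intercal}\Sigma_{\theta\theta}^{-1}x)_t^{-1}$ and $\big(\overline{x^{\intercal}\Sigma_{\theta\theta}^{-1}x}\big)^{-1}$, respectively, the desired result readily follows.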
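The rate conditions in this proof can be verified by simple exponent arithmetic. The sketch below assumes the parametrization $k_n \propto \Delta_n^{-\kappa}$, which is the reading suggested by the equivalence between $\kappa\in(0,1/2)$ and $k_n^2\Delta_n\to 0$ stated above; under it, each term is a power of $\Delta_n$ and "negligible relative to $1/\sqrt{k_n}$" means having a strictly larger exponent than $\kappa/2$. The function names are hypothetical.

```python
from fractions import Fraction

def bias_exponents(kappa):
    """Exponents e with term = Delta_n**e, assuming k_n = Delta_n**(-kappa)."""
    kappa = Fraction(kappa)
    return {
        "1/k_n": kappa,                    # 1/k_n = Delta_n**kappa
        "k_n*Delta_n": 1 - kappa,          # k_n * Delta_n = Delta_n**(1 - kappa)
        "sqrt(Delta_n)": Fraction(1, 2),   # sqrt(Delta_n) = Delta_n**(1/2)
    }

def biases_negligible(kappa):
    """True if every bias term is o(1/sqrt(k_n)) = o(Delta_n**(kappa/2))."""
    target = Fraction(kappa) / 2
    return all(e > target for e in bias_exponents(kappa).values())

def stated_conditions_hold(kappa):
    """k_n -> inf, k_n*Delta_n -> 0 and k_n**3 * Delta_n**2 -> 0, as exponents."""
    kappa = Fraction(kappa)
    return kappa > 0 and 1 - kappa > 0 and 2 - 3 * kappa > 0
```

On any grid of $\kappa\in(0,1)$ the two predicates agree, and both reduce to $\kappa < 2/3$, which is the content of the condition $k_n^3\Delta_n^2\to 0$.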


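B.8. Proof of Theorem 8

Recall that

$$h_t := \big(\hat{\theta}^*_t - \hat{\theta}_n^{new}\big)^{\intercal}\left(\frac{1}{k_n}(\hat{x}^{\intercal}\hat{\Sigma}_{\theta\theta}^{-1}\hat{x})_t^{-1} - \Delta_n[F_n(x^{\intercal}\Sigma_{\theta\theta}^{-1}x)]^{-1}\right)^{-1}\big(\hat{\theta}^*_t - \hat{\theta}_n^{new}\big).$$

We introduce the following term, which is to be shown to be a good approximation of $h_t$:

$$h'_t = k_n\big(\hat{\theta}^*_t - \hat{\theta}_n^{new}\big)^{\intercal}\big(x^{\intercal}\Sigma_{\theta\theta}^{-1}x\big)_t\big(\hat{\theta}^*_t - \hat{\theta}_n^{new}\big).$$

For simplicity, let $f(c_t) = (x^{\intercal}\Sigma_{\theta\theta}^{-1}x)_t$. Assume that $f$ is twice continuously differentiable. Then Lemma B.4 implies that, when $k_n^2\Delta_n\to 0$, we have

$$k_n\,E\Big(\big\|(\hat{x}^{\intercal}\hat{\Sigma}_{\theta\theta}^{-1}\hat{x})_t - (x^{\intercal}\Sigma_{\theta\theta}^{-1}x)_t\big\|^2 \,\Big|\, \mathcal{F}_t\Big) \xrightarrow{P} Df(c_t)(I_{d^2}+T_{d,d})(c_t\otimes c_t)[Df(c_t)]^{\intercal}.$$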

According to the localization argument (see, e.g., p. 84 and p. 118 of Jacod and Protter (2011)), it is sufficient to prove the desired result by replacing any locally bounded conditions in Assumption HF by bounded ones. As a consequence, the right hand side above is uniformly bounded on $[0,T]$. Similarly, under the null hypothesis, $\sqrt{k_n}\,(\hat{\theta}^*_t - \hat{\theta}_n^{new})$ is also uniformly bounded on $[0,T]$. Hence, it can be shown that $\|h_t - h'_t\| \le L/k_n$, uniformly in $t\in[0,T]$, where $L$ is some finite positive number (same below). This implies that

$$\sqrt{\frac{T}{k_n\Delta_n}}\sum_{i=1}^{\lfloor T/\Delta_n\rfloor}\|h_{i\Delta_n} - h'_{i\Delta_n}\|\,\frac{\Delta_n}{T} \le \frac{L}{\sqrt{k_n^3\Delta_n}} \to 0.$$

The same conclusion holds if we take the sum with step $k_n\Delta_n$. Hence, it is sufficient to study $H'_{non}$ and $H'_{over}$, which are defined by replacing $h$ by $h'$ in $H_{non}$ and $H_{over}$, respectively. Next, define the following terms

$$\xi_t = \sqrt{k_n}\,\big(x^{\intercal}\Sigma_{\theta\theta}^{-1}x\big)_t^{1/2}\big(\hat{\theta}^*_t - \theta_t\big) \quad\text{and}\quad \zeta_t = \sqrt{k_n}\,\big(x^{\intercal}\Sigma_{\theta\theta}^{-1}x\big)_t^{1/2}\big(\theta_t - \hat{\theta}_n^{new}\big).$$

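It then follows that $h'_t = \xi_t^{\intercal}\xi_t + \zeta_t^{\intercal}\zeta_t + \xi_t^{\intercal}\zeta_t + \zeta_t^{\intercal}\xi_t$. A second-order Taylor expansion yields that

$$\hat{\theta}^*_t - \theta_t = D\theta^*(c_t)\,(\hat{c}^{\,n}_t - c_t) + \big(I_q\otimes\mathrm{vec}(\hat{c}^{\,n}_t - c_t)^{\otimes 2}\big)^{\intercal}\,\mathrm{vec}\big([H\theta^*(\tilde{c}_t)]^{\intercal}\big) + \text{higher-order terms},$$

where $\|\tilde{c}_t - c_t\| \le \|\hat{c}^{\,n}_t - c_t\|$. Then accordingly, we can decompose $\xi_t$ as $\xi_t = \xi'_t + \xi''_t$. The results in the previous section imply that

$$\xi'_t \xrightarrow{L_{st}} N\Big(0,\; \big(x^{\intercal}\Sigma_{\theta\theta}^{-1}x\big)_t^{1/2}\,D\theta^*(c_t)(I_{d^2}+T_{d,d})(c_t\otimes c_t)[D\theta^*(c_t)]^{\intercal}\,\big(x^{\intercal}\Sigma_{\theta\theta}^{-1}x\big)_t^{1/2}\Big) = N(0, I_q).$$

It is then easy to verify that the following results hold uniformly in $t\in[0,T]$:

$$E\big([\xi'_t]^{\intercal}\xi'_t \mid \mathcal{F}_t\big) \xrightarrow{P} q, \qquad E\big(([\xi'_t]^{\intercal}\xi'_t - q)^2 \mid \mathcal{F}_t\big) \xrightarrow{P} 2q, \qquad E\big(([\xi'_t]^{\intercal}\xi'_t - q)^4 \mid \mathcal{F}_t\big) \le L.$$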
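The three limits above are the moments of a chi-square variable with $q$ degrees of freedom, since $[\xi'_t]^{\intercal}\xi'_t$ is asymptotically the squared norm of an $N(0, I_q)$ vector. A small Monte Carlo sketch, using exact Gaussian draws in place of $\xi'_t$ (an idealization, not the finite-sample estimator of the paper), illustrates the values $q$ and $2q$ and the boundedness of the fourth moment; the function name and default sample size are arbitrary choices.

```python
import numpy as np

def quadratic_form_moments(q, n_draws=200_000, seed=0):
    """Simulate xi ~ N(0, I_q) and return the sample versions of
    E[xi'xi], E[(xi'xi - q)^2] and E[(xi'xi - q)^4]."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n_draws, q))
    s = np.sum(xi ** 2, axis=1)        # xi' xi, chi-square with q d.o.f.
    centered = s - q
    return s.mean(), np.mean(centered ** 2), np.mean(centered ** 4)
```

For example, `quadratic_form_moments(3)` should return a mean close to 3 and a second centered moment close to 6, up to simulation error.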

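These results imply that

$$\sqrt{\frac{k_n\Delta_n}{T}}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1} E\left(\frac{([\xi']^{\intercal}\xi')_{ik_n\Delta_n} - q}{\sqrt{2q}} \,\Big|\, \mathcal{F}_{ik_n\Delta_n}\right) \xrightarrow{P} 0,$$

$$\frac{k_n\Delta_n}{T}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1} E\left(\frac{\big(([\xi']^{\intercal}\xi')_{ik_n\Delta_n} - q\big)^2}{2q} \,\Big|\, \mathcal{F}_{ik_n\Delta_n}\right) \xrightarrow{P} 1,$$

$$\frac{(k_n\Delta_n)^2}{T^2}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1} E\left(\frac{\big(([\xi']^{\intercal}\xi')_{ik_n\Delta_n} - q\big)^4}{(2q)^2} \,\Big|\, \mathcal{F}_{ik_n\Delta_n}\right) \le L\,k_n\Delta_n \xrightarrow{P} 0.$$

For any continuous martingale $M$, we are going to show that

$$\frac{k_n\Delta_n}{T}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1} E\left(\frac{\big(([\xi']^{\intercal}\xi')_{ik_n\Delta_n} - q\big)^2}{2q}\,\big(M_{(i+1)k_n\Delta_n} - M_{ik_n\Delta_n}\big) \,\Big|\, \mathcal{F}_{ik_n\Delta_n}\right) \xrightarrow{P} 0.$$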

In general, a continuous martingale $M$ can be written as the sum of two components: one is orthogonal to the Brownian motion $W$ driving log-prices, and the other is a stochastic integral with respect to $W$. For the orthogonal component, the conditional expectations of all Brownian motion related terms in the expansion are identically zero. For the other component, note that $\big(([\xi']^{\intercal}\xi')_{ik_n\Delta_n} - q\big)^2$ is an even function of the Brownian increments in the local window from which we construct $([\xi']^{\intercal}\xi')_{ik_n\Delta_n}$. Hence, the product is an odd function of those Brownian increments, the conditional expectation of which is zero. Thus, in either case, what is left in the conditional expectation is the product of two integrals of some locally bounded processes over a local window of length $k_n\Delta_n$. According to the localization argument mentioned above, we can assume that all those locally bounded processes are bounded on $[0,T]$. It then follows that

$$\begin{aligned}
\frac{k_n\Delta_n}{T}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1}\left|E\left(\frac{\big(([\xi']^{\intercal}\xi')_{ik_n\Delta_n} - q\big)^2}{2q}\,\big(M_{(i+1)k_n\Delta_n} - M_{ik_n\Delta_n}\big) \,\Big|\, \mathcal{F}_{ik_n\Delta_n}\right)\right|
&\le \frac{k_n\Delta_n}{T}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1} O\big((k_n\Delta_n)^2\big) \\
&= O\left(\frac{k_n\Delta_n}{T}\cdot\frac{T}{k_n\Delta_n}\,(k_n\Delta_n)^2\right) = O\big((k_n\Delta_n)^2\big) \to 0.
\end{aligned}$$

This implies the desired result. Overall, the above result holds for any continuous martingale $M$ defined on the same filtered probability space. Therefore, according to Theorems 2.2.13 (pages 56 and 57) and 2.2.15 (page 58) in Jacod and Protter (2011), we conclude that

$$\sqrt{\frac{T}{k_n\Delta_n}}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1}\frac{([\xi']^{\intercal}\xi')_{ik_n\Delta_n} - q}{\sqrt{2q}}\,\frac{k_n\Delta_n}{T} \xrightarrow{L_{st}} N(0,1).$$
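The normalization in this central limit theorem can be sketched in a few lines. Since the sum runs over roughly $T/(k_n\Delta_n)$ non-overlapping blocks, the factor $\sqrt{k_n\Delta_n/T}$ acts like one over the square root of the number of blocks. The helper below takes a vector of block-wise quadratic forms and applies that standardization; it is an illustration that uses the exact number of supplied blocks as the normalizer, and it is not the definition of $H_{non}$ from the main text.

```python
import numpy as np

def standardized_block_sum(h_blocks, q):
    """Return sum_i (h_i - q) / sqrt(2q), scaled by 1 / sqrt(number of blocks).

    h_blocks : quadratic forms computed on non-overlapping local windows
    q        : dimension of the parameter (chi-square degrees of freedom)
    """
    h = np.asarray(h_blocks, dtype=float)
    n_blocks = h.size
    return np.sum((h - q) / np.sqrt(2.0 * q)) / np.sqrt(n_blocks)
```

When the block-wise quantities are independent chi-square draws with $q$ degrees of freedom, repeated evaluation of `standardized_block_sum` gives values with mean near zero and variance near one, in line with the $N(0,1)$ limit above.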

As for the second-order term, according to the strengthened assumption, $H\theta^*(\tilde{c}_t)$ is uniformly bounded on $[0,T]$. Then Lemma B.4 implies that, for any $t\in[0,T]$, we have

$$E\Big(\Big\|\big(I_q\otimes\mathrm{vec}(\hat{c}^{\,n}_t - c_t)^{\otimes 2}\big)^{\intercal}\,\mathrm{vec}\big([H\theta^*(\tilde{c}_t)]^{\intercal}\big)\Big\|\Big) \le \frac{L}{k_n}.$$

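Therefore, $\xi''_t$ is uniformly bounded by $L/k_n$ and $([\xi'']^{\intercal}\xi'')_t$ is uniformly bounded by $L^2/k_n^2$, on $[0,T]$. Consequently, we obtain

$$\begin{aligned}
\sqrt{\frac{k_n\Delta_n}{T}}\;E\left\|\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1}\frac{(\xi^{\intercal}\xi)_{ik_n\Delta_n} - ([\xi']^{\intercal}\xi')_{ik_n\Delta_n}}{\sqrt{2q}}\right\|
&\le \sqrt{\frac{k_n\Delta_n}{T}}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1} L\,E\big(\|(\xi^{\intercal}\xi)_{ik_n\Delta_n} - ([\xi']^{\intercal}\xi')_{ik_n\Delta_n}\|\big) \\
&\le \sqrt{\frac{k_n\Delta_n}{T}}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1} L\left(E\big(\|([\xi'']^{\intercal}\xi'')_{ik_n\Delta_n}\|\big) + \sqrt{E\big(\|\xi'\|^2\|\xi''\|^2\big)}\right) \\
&\le \sqrt{\frac{k_n\Delta_n}{T}}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1}\frac{L}{k_n} \le \frac{L}{\sqrt{k_n^3\Delta_n}}.
\end{aligned}$$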

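The right hand side shrinks to zero if and only if $k_n^3\Delta_n\to\infty$, or equivalently $\kappa > 1/3$.

Under the null hypothesis $\theta_t\equiv\theta_0$, the discussion in the previous subsection implies that, even if we do not correct the biases in $\hat{\theta}_n^{new}$, we still have (note that the biases are all bounded under the strengthened assumption, as they are continuous functions of the volatility matrix)

$$E\big(\|\theta_0 - \hat{\theta}_n^{new}\|\big) \le L\left(\frac{1}{k_n} + k_n\Delta_n + \sqrt{\Delta_n}\right) \quad\text{and}\quad E\big(\|\theta_0 - \hat{\theta}_n^{new}\|^2\big) \le L\left(\frac{1}{k_n^2} + k_n^2\Delta_n^2 + \Delta_n\right).$$

Under the strengthened assumption, the above result implies that $E(\|\zeta_t\|) \le L(1/\sqrt{k_n} + k_n^{3/2}\Delta_n)$ and $E(\|\zeta_t\|^2) \le L(1/k_n + k_n^3\Delta_n^2)$, uniformly on $[0,T]$. It then follows that

$$\sqrt{\frac{k_n\Delta_n}{T}}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1} E\big(\|([\zeta]^{\intercal}\zeta)_{ik_n\Delta_n}\|\big) \le \sqrt{\frac{k_n\Delta_n}{T}}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1} L\left(\frac{1}{k_n} + k_n^3\Delta_n^2\right) \le \frac{L}{\sqrt{k_n^3\Delta_n}} + L\sqrt{k_n\Delta_n}\,(k_n^2\Delta_n).$$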

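What is left to show is that the following result holds under the null hypothesis:

$$\sqrt{\frac{k_n\Delta_n}{T}}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1}\big(\xi^{\intercal}_{ik_n\Delta_n}\zeta_{ik_n\Delta_n} + \zeta^{\intercal}_{ik_n\Delta_n}\xi_{ik_n\Delta_n}\big) \xrightarrow{P} 0.$$

Under the null, we readily get

$$\xi^{\intercal}_{ik_n\Delta_n}\zeta_{ik_n\Delta_n} = k_n\big(\hat{\theta}^*_{ik_n\Delta_n} - \theta_0\big)^{\intercal}\big(x^{\intercal}\Sigma_{\theta\theta}^{-1}x\big)_{ik_n\Delta_n}\big(\theta_0 - \hat{\theta}_n^{new}\big).$$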


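It then follows that

$$\sqrt{\frac{k_n\Delta_n}{T}}\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1}\xi^{\intercal}_{ik_n\Delta_n}\zeta_{ik_n\Delta_n} = \sqrt{\frac{k_nT}{\Delta_n}}\left(\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1}\big(\hat{\theta}^*_{ik_n\Delta_n} - \theta_0\big)^{\intercal}\big(x^{\intercal}\Sigma_{\theta\theta}^{-1}x\big)_{ik_n\Delta_n}\frac{k_n\Delta_n}{T}\right)\times\big(\theta_0 - \hat{\theta}_n^{new}\big).$$

Suppose that $\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1}\hat{\theta}^*_{ik_n\Delta_n}k_n\Delta_n$ is an estimator of $\theta_0$ without any bias correction. Suppose we do not correct the bias in $\hat{\theta}_n^{new}$ either. Then, according to (2.10) and Lemma 1, we can deduce that

$$\begin{aligned}
&E\left\|\left(\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1}\big(\hat{\theta}^*_{ik_n\Delta_n} - \theta_0\big)^{\intercal}\big(x^{\intercal}\Sigma_{\theta\theta}^{-1}x\big)_{ik_n\Delta_n}\frac{k_n\Delta_n}{T}\right)\times\big(\theta_0 - \hat{\theta}_n^{new}\big)\right\| \\
&\quad\le \left(E\left(\left\|\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1}\big(\hat{\theta}^*_{ik_n\Delta_n} - \theta_0\big)^{\intercal}\big(x^{\intercal}\Sigma_{\theta\theta}^{-1}x\big)_{ik_n\Delta_n}\frac{k_n\Delta_n}{T}\right\|^2\right)\times E\big(\|\theta_0 - \hat{\theta}_n^{new}\|^2\big)\right)^{1/2} \\
&\quad\le L\left(\left(\frac{1}{k_n^2} + k_n^2\Delta_n^2 + \Delta_n\right)\times\left(\frac{1}{k_n^2} + k_n^2\Delta_n^2 + \Delta_n\right)\right)^{1/2} = L\left(\frac{1}{k_n^2} + k_n^2\Delta_n^2 + \Delta_n\right).
\end{aligned}$$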

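Therefore, we get

$$\begin{aligned}
\sqrt{\frac{k_n\Delta_n}{T}}\;E\left\|\sum_{i=1}^{\lfloor T/(k_n\Delta_n)\rfloor-1}\big(\xi^{\intercal}_{ik_n\Delta_n}\zeta_{ik_n\Delta_n} + \zeta^{\intercal}_{ik_n\Delta_n}\xi_{ik_n\Delta_n}\big)\right\|
&\le L\sqrt{\frac{k_nT}{\Delta_n}}\left(\frac{1}{k_n^2} + k_n^2\Delta_n^2 + \Delta_n\right) \\
&\le L\left(\frac{1}{\sqrt{k_n^3\Delta_n}} + \sqrt{k_n\Delta_n}\times k_n^2\Delta_n + \sqrt{k_n\Delta_n}\right) \to 0.
\end{aligned}$$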

Then the desired result readily follows. Thus, we can conclude that, under the null hypothesis, we have

$$H_{non} - \sqrt{\frac{T}{k_n\Delta_n}} \xrightarrow{L_{st}} N(0,1).$$

Following the argument of Yang (2018), non-overlapping estimators with different starting points are perfectly correlated for sufficiently large $n$ (or, equivalently, for sufficiently small $\Delta_n$). Hence, their average $H_{over}$ converges to the same limit.

Finally, it is obvious that if $\hat{\theta}_n^{new} - \theta_t = O_P(1)$ under the alternative, the term $(\zeta^{\intercal}\zeta)_{ik_n\Delta_n}$ will diverge to infinity at rate $k_n$. Hence, the divergence rate for $H$ is

$$\sqrt{\frac{k_n\Delta_n}{T}}\times\frac{T}{k_n\Delta_n}\times k_n \;\propto\; \sqrt{\frac{k_n}{\Delta_n}}.$$

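This completes the proof.

B.9. Proof of Theorem 9

First of all, we have

$$\zeta_n^1(\tau) = w_{\tau}^{\intercal}(x^{\intercal}\Phi x)^{-1}x^{\intercal}\Phi y - w_{\tau}^{\intercal}\big(\overline{x^{\intercal}\Phi x}\big)^{-1}x^{\intercal}\Phi y.$$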

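According to previous results, the covariance matrix $\mathrm{Cov}(w_{\tau}^{\intercal}\theta, \hat{\theta}_n^{new})$ converges to

$$\begin{aligned}
&\int_0^T w_{\tau}^{\intercal}(t)(x_t^{\intercal}\Phi_tx_t)^{-1}x_t^{\intercal}\Phi_t\,\big(I_p,\; -\theta_0^{\intercal}\otimes I_p\big)\begin{pmatrix} Dy_t \\ Dx_t \end{pmatrix}(I_{d^2}+T_{d,d})(c_t\otimes c_t) \\
&\qquad\times\big(Dy_t,\; D\Psi_t,\; Dx_t\big)^{\intercal}\begin{pmatrix} \Psi_tx_t & 0 \\ y_t\otimes x_t & x_t\otimes x_t \\ I_q\otimes(\Psi_ty_t) & I_q\otimes(\Psi_tx_t)(I_{q^2}+T_{q,q}) \end{pmatrix}\begin{pmatrix} I_q \\ -\theta_0\otimes I_q \end{pmatrix}(x^{\intercal}\Psi x)^{-1}\,dt \\
&= \int_0^T w_{\tau}^{\intercal}(t)(x_t^{\intercal}\Phi_tx_t)^{-1}x_t^{\intercal}\Phi_t\,\big(I_p,\; -\theta_0^{\intercal}\otimes I_p\big)\begin{pmatrix} V_{yy,t} & V_{y\Psi,t} & V_{yx,t} \\ V_{xy,t} & V_{x\Psi,t} & V_{xx,t} \end{pmatrix}\begin{pmatrix} I_p \\ 0 \\ -\theta_0\otimes I_p \end{pmatrix}\Psi_tx_t\,(x^{\intercal}\Psi x)^{-1}\,dt \\
&= \int_0^T w_{\tau}^{\intercal}(t)(x_t^{\intercal}\Phi_tx_t)^{-1}x_t^{\intercal}\Phi_t\,\Sigma_{\theta_0\theta_0,t}\,\Psi_tx_t\,(x^{\intercal}\Psi x)^{-1}\,dt = w_{\tau}^{\intercal}(x^{\intercal}\Phi x)^{-1}x^{\intercal}\Phi\,\Sigma_{\theta_0\theta_0}\,\Psi x\,(x^{\intercal}\Psi x)^{-1}.
\end{aligned}$$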


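Therefore, the asymptotic covariance matrix $V^1(\tau,\eta;\theta_0,\Phi,\Psi)$ is given by

$$\begin{aligned}
&\mathrm{Cov}(w_{\tau}^{\intercal}\theta, w_{\eta}^{\intercal}\theta) - \mathrm{Cov}(w_{\tau}^{\intercal}\theta, w_{\eta}^{\intercal}\hat{\theta}_n^{new}) - \mathrm{Cov}(w_{\tau}^{\intercal}\hat{\theta}_n^{new}, w_{\eta}^{\intercal}\theta) + w_{\tau}^{\intercal}\mathrm{Var}(\hat{\theta}_n^{new})\,w_{\eta} \\
&= w_{\tau}^{\intercal}(x^{\intercal}\Phi x)^{-1}x^{\intercal}\Phi\Sigma_{\theta_0\theta_0}\Phi x(x^{\intercal}\Phi x)^{-1}w_{\eta} - w_{\tau}^{\intercal}(x^{\intercal}\Phi x)^{-1}x^{\intercal}\Phi\Sigma_{\theta_0\theta_0}\Psi x(x^{\intercal}\Psi x)^{-1}w_{\eta} \\
&\quad - w_{\tau}^{\intercal}(x^{\intercal}\Psi x)^{-1}x^{\intercal}\Psi\Sigma_{\theta_0\theta_0}\Phi x(x^{\intercal}\Phi x)^{-1}w_{\eta} + w_{\tau}^{\intercal}(x^{\intercal}\Psi x)^{-1}x^{\intercal}\Psi\Sigma_{\theta_0\theta_0}\Psi x(x^{\intercal}\Psi x)^{-1}w_{\eta}.
\end{aligned}$$

Note that the above results hold uniformly in $\tau\in\Gamma$, where $\Gamma$ is a compact set. This completes the first part.

As for $\zeta_n^2(\tau)$, note that it can be written as

$$\zeta_n^2(\tau) = w_{\tau}^{\intercal}y - w_{\tau}^{\intercal}x\,\big(\overline{x^{\intercal}\Psi x}\big)^{-1}x^{\intercal}\Psi y.$$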

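Following a similar calculation, we deduce that the covariance matrix $\mathrm{Cov}(w_{\tau}^{\intercal}y, w_{\eta}^{\intercal}x\,\hat{\theta}_n^{new})$ converges to

$$\begin{aligned}
&\int_0^T\Bigg\{ w_{\tau}^{\intercal}(t)\,Dy_t\,(I_{d^2}+T_{d,d})(c_t\otimes c_t)\,\big(Dy_t,\; D\Psi_t,\; Dx_t\big)^{\intercal}\begin{pmatrix} \Psi_tx_t & 0 \\ y_t\otimes x_t & x_t\otimes x_t \\ I_q\otimes(\Psi_ty_t) & I_q\otimes(\Psi_tx_t)(I_{q^2}+T_{q,q}) \end{pmatrix}\begin{pmatrix} I_q \\ -\theta_0\otimes I_q \end{pmatrix} \\
&\qquad\times(x^{\intercal}\Psi x)^{-1}x^{\intercal}w_{\eta} + \big(w_{\tau}^{\intercal}(t)\,V_{yx,t}\,(\theta_0\otimes I_p)\,w_{\eta}(t)\big)\Bigg\}\,dt \\
&= w_{\tau}^{\intercal}(V_{yy} - V_{yx}\,\theta_0\otimes I_p)\,\Psi x\,(x^{\intercal}\Psi x)^{-1}x^{\intercal}w_{\eta} + w_{\tau}^{\intercal}V_{yx}(\theta_0\otimes I_p)\,w_{\eta}.
\end{aligned}$$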

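On the other hand, the variance–covariance matrix $\mathrm{Cov}(w_{\tau}^{\intercal}x\,\hat{\theta}_n^{new}, w_{\eta}^{\intercal}x\,\hat{\theta}_n^{new})$ converges to

$$\begin{aligned}
& w_{\tau}^{\intercal}x\,\mathrm{Var}(\hat{\theta}^{new})\,x^{\intercal}w_{\eta} + w_{\tau}^{\intercal}x\,(x^{\intercal}\Psi x)^{-1}(V_{yx}\,\theta_0\otimes I_p - \theta_0^{\intercal}\otimes I_p\,V_{xx}\,\theta_0\otimes I_p)\,w_{\eta} \\
&\quad + w_{\tau}^{\intercal}(\theta_0^{\intercal}\otimes I_p\,V_{xy} - \theta_0^{\intercal}\otimes I_p\,V_{xx}\,\theta_0\otimes I_p)\,(x^{\intercal}\Psi x)^{-1}x^{\intercal}w_{\eta} + w_{\tau}^{\intercal}\,\theta_0^{\intercal}\otimes I_p\,V_{xx}\,\theta_0\otimes I_p\,w_{\eta}.
\end{aligned}$$

Putting all the above results together, we deduce that the asymptotic covariance matrix $V^2(\tau,\eta;\theta_0,\Psi,\Psi)$ is given by

$$\begin{aligned}
&\mathrm{Cov}(w_{\tau}^{\intercal}y, w_{\eta}^{\intercal}y) - \mathrm{Cov}(w_{\tau}^{\intercal}y, w_{\eta}^{\intercal}x\,\hat{\theta}_n^{new}) - \mathrm{Cov}(w_{\tau}^{\intercal}x\,\hat{\theta}_n^{new}, w_{\eta}^{\intercal}y) + \mathrm{Cov}(w_{\tau}^{\intercal}x\,\hat{\theta}_n^{new}, w_{\eta}^{\intercal}x\,\hat{\theta}_n^{new}) \\
&= w_{\tau}^{\intercal}V_{yy}w_{\eta} - w_{\tau}^{\intercal}(V_{yy} - V_{yx}\,\theta_0\otimes I_p)\,\Psi x\,(x^{\intercal}\Psi x)^{-1}x^{\intercal}w_{\eta} - w_{\tau}^{\intercal}V_{yx}(\theta_0\otimes I_p)\,w_{\eta} \\
&\quad - w_{\tau}^{\intercal}x\,(x^{\intercal}\Psi x)^{-1}x^{\intercal}\Psi\,(V_{yy} - \theta_0^{\intercal}\otimes I_p\,V_{xy})\,w_{\eta} - w_{\tau}^{\intercal}(\theta_0^{\intercal}\otimes I_p)\,V_{xy}\,w_{\eta} \\
&\quad + w_{\tau}^{\intercal}x\,\mathrm{Var}(\hat{\theta}^{new})\,x^{\intercal}w_{\eta} + w_{\tau}^{\intercal}x\,(x^{\intercal}\Psi x)^{-1}(V_{yx}\,\theta_0\otimes I_p - \theta_0^{\intercal}\otimes I_p\,V_{xx}\,\theta_0\otimes I_p)\,w_{\eta} \\
&\quad + w_{\tau}^{\intercal}(\theta_0^{\intercal}\otimes I_p\,V_{xy} - \theta_0^{\intercal}\otimes I_p\,V_{xx}\,\theta_0\otimes I_p)\,(x^{\intercal}\Psi x)^{-1}x^{\intercal}w_{\eta} + w_{\tau}^{\intercal}\,\theta_0^{\intercal}\otimes I_p\,V_{xx}\,\theta_0\otimes I_p\,w_{\eta} \\
&= w_{\tau}^{\intercal}\Sigma_{\theta_0\theta_0}w_{\eta} - w_{\tau}^{\intercal}\Sigma_{\theta_0\theta_0}\Psi x\,(x^{\intercal}\Psi x)^{-1}x^{\intercal}w_{\eta} - w_{\tau}^{\intercal}x\,(x^{\intercal}\Psi x)^{-1}x^{\intercal}\Psi\Sigma_{\theta_0\theta_0}w_{\eta} + w_{\tau}^{\intercal}x\,\mathrm{Var}(\hat{\theta}^{new})\,x^{\intercal}w_{\eta}.
\end{aligned}$$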

Once again, the above results hold uniformly in $\tau\in\Gamma$. This completes the proof.

B.10. Proof of Corollary 10

Part (i) directly follows from Theorem 9. As for part (ii), note that under the alternative hypothesis, there exists $\tau$, such that $BKS_n$ diverges to infinity. Then the conclusion readily follows.

References

Aït-Sahalia, Y., Jacod, J., 2009. Testing for jumps in a discretely observed process. Ann. Statist. 37, 184–222.
Aït-Sahalia, Y., Jacod, J., 2014. High Frequency Financial Econometrics. Princeton University Press.
Aït-Sahalia, Y., Xiu, D., 2017. Principal component analysis of high frequency data. J. Am. Stat. Assoc. (forthcoming).
Anderson, T.W., Rubin, H., 1949. Estimation of the parameters of a single equation in a complete system of stochastic equations. Ann. Math. Stat. 20, 46–63.
Bierens, H.J., 1982. Consistent model specification tests. J. Econometrics 20, 105–134.
Bierens, H.J., 1983. Uniform consistency of kernel estimators of a regression function under generalized conditions. J. Amer. Statist. Assoc. 78, 699–707.
Bierens, H.J., 1990. A consistent conditional moment test of functional form. Econometrica 58, 1443–1458.
Bierens, H.J., Ploberger, W., 1997. Asymptotic theory of integrated conditional moment tests. Econometrica 65, 1129–1151.
Bollerslev, T., Li, S.Z., Todorov, V., 2016. Roughing up beta: continuous vs. discontinuous betas, and the cross section of expected stock returns. J. Financ. Econ. 120, 464–490.
Bultheel, A., 1982. Inequalities in Hilbert modules of matrix-valued functions. Proc. Amer. Math. Soc. 85, 369–372.
Carrasco, M., Florens, J.-P., 2014. On the asymptotic efficiency of GMM. Econometric Theory 30, 372–406.
Chen, X., Fan, Y., 1999. Consistent hypothesis testing in semiparametric and nonparametric models for econometric time series. J. Econometrics 91, 373–401.
Clement, E., Delattre, S., Gloter, A., 2013. An infinite dimensional convolution theorem with applications to the efficient estimation of the integrated volatility. Stochastic Process. Appl. 123, 2500–2521.
Clinet, S., Potiron, Y., 2017. Estimation for high-frequency data under parametric market microstructure noise. Working paper, arXiv preprint arXiv:1712.01479.
Fan, Y., Li, Q., 1996. Consistent model specification tests: omitted variables and semiparametric functional forms. Econometrica 865–890.


Ghazal, G.A., Neudecker, H., 2000. On second-order and fourth-order moments of jointly distributed random matrices: a survey. Linear Algebra Appl. 321, 61–93.
Godambe, V.P., 1960. An optimum property of regular maximum likelihood estimation. Ann. Math. Stat. 31, 1208–1211.
Hansen, L.P., Richard, S.F., 1987. The role of conditioning information in deducing testable. Econometrica 55, 587–613.
Hausman, J.A., 1978. Specification tests in econometrics. Econometrica 1251–1271.
Hong, Y., 1996. Consistent testing for serial correlation of unknown form. Econometrica 837–864.
Hong, Y., White, H., 1995. Consistent specification testing via nonparametric series regression. Econometrica 1133–1159.
Jacod, J., 2012. In: Kessler, M., Lindner, A., Sorensen, M. (Eds.), Statistical Methods for Stochastic Differential Equations. In: Lecture Notes, CRC Press, pp. 191–310.
Jacod, J., Protter, P., 2011. Discretization of Processes. Springer.
Jacod, J., Rosenbaum, M., 2013. Quarticity and other functionals of volatility: efficient estimation. Ann. Statist. 41, 1462–1484.
Jacod, J., Rosenbaum, M., 2015. Estimation of volatility functionals: the case of a √n window. In: Large Deviations and Asymptotic Methods in Finance. Springer, pp. 559–590.
Jacod, J., Todorov, V., 2009. Testing for common arrivals of jumps for discretely observed multidimensional processes. Ann. Statist. 37, 1792–1838.
Jacod, J., Todorov, V., 2010. Do price and volatility jump together? Ann. Appl. Probab. 20, 1425–1469.
Kollo, T., von Rosen, D., 2006. Advanced Multivariate Statistics with Matrices, vol. 579. Springer Science & Business Media.
Li, J., Liu, Y., Xiu, D., 2019. Efficient estimation of integrated volatility functionals via multiscale jackknife. Ann. Statist. 47, 156–176.
Li, J., Todorov, V., Tauchen, G., 2013. Volatility occupation times. Ann. Statist. 41, 1865–1891.
Li, J., Todorov, V., Tauchen, G., 2016. Inference theory for volatility functional dependencies. J. Econometrics 193, 17–34.
Li, J., Todorov, V., Tauchen, G., 2017a. Adaptive estimation of continuous-time regression models using high-frequency data. J. Econometrics 200, 36–47.
Li, J., Todorov, V., Tauchen, G., 2017b. Jump regressions. Econometrica 85, 173–195.
Li, J., Xiu, D., 2016. Generalized method of integrated moments for high-frequency data. Econometrica 84, 1613–1633.
Magnus, J.R., Neudecker, H., 1999. Matrix Differential Calculus with Applications in Statistics and Econometrics, second ed. In: Wiley Series in Probability and Mathematical Statistics, Wiley.
Mykland, P.A., Zhang, L., 2006. ANOVA for diffusions and Itô processes. Ann. Statist. 34, 1931–1963.
Newey, W.K., 1985. Maximum likelihood specification testing and conditional moment tests. Econometrica 1047–1070.
Nualart, D., 2006. The Malliavin Calculus and Related Topics, second ed. In: Probability and its Applications, Springer.
Protter, P.E., 2005. Stochastic Integration and Differential Equations, second ed. Springer.
Ramsey, J.B., 1974. Classical model selection through specification error tests. Front. Econom. 13–47.
Reiß, M., Todorov, V., Tauchen, G., 2015. Nonparametric test for a constant beta between Ito semi-martingales based on high-frequency data. Stochastic Process. Appl. 125, 2955–2988.
Tauchen, G., 1985. Diagnostic testing and evaluation of maximum likelihood models. J. Econometrics 30, 415–443.
Vetter, M., 2015. Estimation of integrated volatility of volatility with applications to goodness-of-fit testing. Bernoulli 21, 2393–2418.
Wang, Q., Phillips, P.C., 2012. A specification test for nonlinear nonstationary models. Ann. Statist. 40, 727–758.
Wu, D.-M., 1973. Alternative tests of independence between stochastic regressors and disturbances. Econometrica 733–750.
Yang, X., 2018. Semiparametric estimation in continuous-time: asymptotics for integrated volatility functionals with small and large bandwidths. Working paper.
