Simulated likelihood estimators for discretely observed jump–diffusions

K. Giesecke (a), G. Schwenkler (b,∗)

(a) Department of Management Science & Engineering, Stanford University, United States
(b) Department of Finance, Boston University Questrom School of Business, United States

Article info

Article history: Received 13 October 2016; received in revised form 23 July 2018; accepted 8 January 2019; available online xxxx.

JEL classification: C13, C51, C58, C63

Abstract

This paper develops an unbiased Monte Carlo approximation to the transition density of a jump–diffusion process with state-dependent drift, volatility, jump intensity, and jump magnitude. The approximation is used to construct a likelihood estimator of the parameters of a jump–diffusion observed at fixed time intervals that need not be short. The estimator is asymptotically unbiased for any sample size. It has the same large-sample asymptotic properties as the true but uncomputable likelihood estimator. Numerical results illustrate its properties. © 2019 Elsevier B.V. All rights reserved.

Keywords: Unbiased density estimator; Jump–diffusions; Likelihood inference; Asymptotic efficiency; Computational efficiency

1. Introduction

This paper addresses the parameter inference problem for a discretely-observed jump–diffusion process. We develop an unbiased Monte Carlo approximation to the transition density of the process and use it to construct likelihood estimators of the parameters specifying the dynamics of the process. The results include asymptotic unbiasedness, consistency, and asymptotic normality as the sample period grows. Our approach is motivated by (i) the fact that it offers consistent and asymptotically efficient estimation at any observation frequency, and (ii) its computational advantages in calculating and maximizing the likelihood.

✩ Schwenkler acknowledges support from a Mayfield Fellowship and a Lieberman Fellowship. We are grateful to Rama Cont, Darrell Duffie, Peter Glynn, Emmanuel Gobet, Marcel Rindisbacher, Olivier Scaillet, Alexander Shkolnik (who pointed out an error in an earlier version of this work), Stefan Weber, and the participants at the Bachelier World Congress, the BU Conference on Credit and Systemic Risk, the Conference on Computing in Economics and Finance, the European Meeting of the Econometric Society, the INFORMS Annual Meeting, the SIAM Financial Mathematics and Engineering Conference, and seminars at Boston University, Carnegie Mellon University, the Federal Reserve Board, Princeton University, the University of California at Berkeley, and the Worcester Polytechnic Institute for useful comments. We are also grateful to Francois Guay for excellent research assistance. An implementation in R of the methods of this paper can be downloaded at http://www.gustavo-schwenkler.com.
∗ Corresponding author. E-mail address: [email protected] (G. Schwenkler). URL: http://www.gustavo-schwenkler.com (G. Schwenkler).
https://doi.org/10.1016/j.jeconom.2019.01.015
0304-4076/© 2019 Elsevier B.V. All rights reserved.

Please cite this article as: K. Giesecke and G. Schwenkler, Simulated likelihood estimators for discretely observed jump–diffusions. Journal of Econometrics (2019), https://doi.org/10.1016/j.jeconom.2019.01.015.


More specifically, we consider a one-dimensional jump–diffusion process whose drift, volatility, jump intensity, and jump magnitude are allowed to be arbitrary parametric functions of the state. We develop unbiased simulation estimators of the transition density of the process and its partial derivatives. Volatility and measure transformation arguments are first used to represent the transition density as a mixture of weighted Gaussian distributions, generalizing the results of Dacunha-Castelle and Florens-Zmirou (1986) and Rogers (1985) for diffusions. A weight takes the form of a conditional probability that a certain doubly-stochastic Poisson process has no jumps in a given interval. We develop an unbiased Monte Carlo approximation of that probability using ideas from the exact sampling of solutions of stochastic differential equations.1 The resulting transition density estimator is unbiased, non-negative, and guaranteed to have finite variance under mild conditions. For any given time horizon, the accuracy of the estimator depends only on the number of Monte Carlo replications used to compute it. Moreover, the estimator can be evaluated at any value of the parameter and arguments of the density function without re-simulation. This property reduces the maximization of the simulated likelihood to a deterministic problem that can be solved using standard methods.

We analyze the asymptotic behavior of the estimator maximizing the simulated likelihood. The estimator converges almost surely to the true likelihood estimator as the number of Monte Carlo replications grows, for a fixed sample period. Therefore, the estimator inherits the consistency, asymptotic normality, and asymptotic efficiency of the true likelihood estimator if the number of Monte Carlo replications grows at least as fast as the sample period.
Our estimator does not suffer from the second-order bias generated by a conventional Monte Carlo approximation of the transition density, which entails a time-discretization of the process and non-parametric kernel estimation.2 Our exact Monte Carlo approach eliminates the need to discretize the process and perform kernel estimation. It facilitates efficient likelihood estimation for jump–diffusion processes with state-dependent coefficients.

Numerical results illustrate the accuracy and computational efficiency of the density approximation as well as the performance of the simulated likelihood estimator. Our density estimator outperforms alternative estimators of the transition density, both in terms of accuracy and computational efficiency. The error of our density estimator converges at the fastest rate. Moreover, our density estimator entails the smallest computational cost per observation when calculating the likelihood. The cost decreases as more observations become available. Performing likelihood estimation, we see that our estimator compares favorably to alternative simulation-based likelihood estimators when a fixed computational budget is given.3

Our results have several important applications. Andrieu et al. (2010) show that an unbiased density estimator can be combined with a Markov chain Monte Carlo method to perform exact Bayesian estimation. Thus, our density estimator enables exact Bayesian estimation of jump–diffusion models with general coefficients.4 Our density estimator can also be used to perform efficient generalized method of moments (GMM) estimation of jump–diffusions. It is well known that the optimal instrument that yields maximum likelihood efficiency for GMM estimators is a function of the underlying transition density (see Feuerverger and McDunnough (1981) and Singleton (2001)). Our unbiased density estimator can be used instead of the unknown true density, enabling efficient GMM estimation for jump–diffusions.5 Finally, the transition density approximation can be used to estimate derivative prices without the need to re-simulate for different model parameters. This yields computational efficiency in econometric applications. Cosma et al. (2018) highlight potential applications of our density estimator in the valuation of path-dependent derivatives.

The remainder of the introduction discusses the related literature. Section 2 formulates the inference problem. Section 3 develops a representation of the transition density of a jump–diffusion. Section 4 uses this representation to construct an unbiased Monte Carlo estimator of the density and its partial derivatives. Section 5 discusses the implementation of the estimator. Section 6 analyzes the asymptotic behavior of the estimator maximizing the simulated likelihood. Section 7 presents numerical results. A technical appendix contains the proofs.

1.1. Related literature

Prior research on the parametric inference problem for discretely-observed stochastic processes has focused mostly on diffusions. Of particular relevance to our work is the Monte Carlo likelihood estimator for diffusions with state-dependent coefficients proposed by Beskos et al. (2009). They use the exact sampling method of Beskos et al. (2006) to approximate the likelihood for a discretely-observed diffusion. Their estimator is a special case of ours. Because we require weaker assumptions on the coefficient functions of the process, our estimator has a broader scope even for a diffusion. Moreover, our approach allows us to optimize the computational efficiency of the estimator. Numerical results indicate that the efficiency gains can be substantial.

1 See Beskos and Roberts (2005), Casella and Roberts (2011), Chen and Huang (2013), Giesecke and Smelov (2013), Gonçalves and Roberts (2014), and others.
2 Detemple et al. (2006) analyze this bias in the diffusion case and propose a bias-corrected discretization scheme. For jump–diffusions with state-independent coefficients, Kristensen and Shin (2012) provide conditions under which the bias is zero.
3 We have implemented the transition density approximation and the likelihood estimator in R. The code can be downloaded at http://www.gustavo-schwenkler.com. It can be easily customized to treat a given jump–diffusion.
4 See Sermaidis et al. (2013) for Bayesian estimators for diffusions with state-dependent coefficients.
5 Approximate GMM estimators that achieve efficiency have recently been proposed by Carrasco et al. (2007), Chen et al. (2013), and Jiang and Knight (2010).



Lo (1988) treats a jump–diffusion with state-independent Poisson jumps by numerically solving the partial integro-differential equation governing the transition density. Kristensen and Shin (2012) analyze a nonparametric kernel estimator of the transition density of a jump–diffusion with state-independent coefficients.6 Aït-Sahalia and Yu (2006) develop saddlepoint expansions of the transition densities of Markov processes, focusing on jump–diffusions with state-independent Poisson jumps and Lévy processes. Filipović et al. (2013) analyze polynomial expansions of the transition density of an affine jump–diffusion. Li and Chen (2016) study a power series expansion of the transition density of a jump–diffusion with state-independent Poisson jumps. Yu (2007) provides a small-time expansion of the transition density of a jump–diffusion in a high-frequency observation regime, assuming a state-independent jump size. The associated estimator inherits the asymptotic efficiency of the likelihood estimator as the observation frequency grows large; see Chang and Chen (2011) for the diffusion case.

Unlike the transition density approximations developed in the aforementioned papers, our unbiased density estimator applies to jump–diffusions with general state-dependent drift, diffusion, jump intensity, and jump size. Beyond mild regularity, no structure is imposed on the coefficient functions. The approximation has a significantly wider scope than existing estimators, including, in particular, models with state-dependent jumps and non-affine formulations. The simulated likelihood estimator inherits the asymptotic efficiency of the theoretical likelihood estimator as both the sample period and the number of Monte Carlo replications grow, for any observation frequency. Our approach harnesses the exact sampling methods for diffusions developed by Beskos and Roberts (2005) and Chen and Huang (2013).
While these exact methods have been extended to jump–diffusions (see Casella and Roberts (2011), Giesecke and Smelov (2013), and Gonçalves and Roberts (2014)), their direct application together with kernel estimation generates biased density estimators. The results of Detemple et al. (2006) suggest that likelihood inference using biased density estimators leads to biased or inefficient parameter estimators. Our approach exploits a novel density representation, which eliminates the need for kernel estimation and facilitates an unbiased density approximation.

2. Inference problem

Fix a complete probability space (Ω, F, P) and a right-continuous, complete information filtration (F_t)_{t≥0}. Let X be a Markov jump–diffusion process valued in S ⊂ R that is governed by the stochastic differential equation (SDE)

\[ dX_t = \mu(X_t;\theta)\,dt + \sigma(X_t;\theta)\,dB_t + dJ_t, \tag{1} \]

where X_0 is a constant, µ : S × Θ → R is the drift function, σ : S × Θ → R_+ is the volatility function, B is a standard Brownian motion, and

\[ J_t = \sum_{n=1}^{N_t} \Gamma(X_{T_n-}, D_n;\theta), \tag{2} \]

where N is a non-explosive counting process with event stopping times (T_n)_{n≥0}, where T_0 = 0, and intensity λ_t = Λ(X_t; θ) for a function Λ : S × Θ → [0, ∞). The function Γ : S × D × Θ → R governs the jump magnitudes of X, and (D_n)_{n≥1} is a sequence of i.i.d. mark variables with probability density function π on D ⊂ R. If Γ ≡ 0, then X has continuous sample paths and Λ can be freely chosen. The drift, volatility, jump intensity, and jump size functions are specified by a parameter θ ∈ Θ, where Θ is a compact subset of R^r with non-empty interior for r ∈ N. The SDE (1) is assumed to have a unique strong solution; see Protter (2004, Theorem V.3.7) for sufficient conditions. Finally, we assume that the boundary of S is unattainable.

Our goal is to estimate the parameter θ specifying the dynamics of X given a sequence of values X = {X_{t_0}, …, X_{t_m}} of X observed at fixed times 0 = t_0 < ⋯ < t_m < ∞. For ease of exposition, we assume that t_i − t_{i−1} = ∆ for all i and some fixed ∆ > 0. The data X is a random variable valued in S^m and measurable with respect to B^m, where B is the Borel σ-algebra on S. The likelihood of the data is the Radon–Nikodym density of the law of X with respect to the Lebesgue measure on (S^m, B^m). Let p_t(x, ·; θ) be the Radon–Nikodym density of the law of X_t given X_0 = x with respect to the Lebesgue measure on (S, B), i.e., the transition density of X (see Theorem 3.1 for existence). Given the Markov property of X, the likelihood function takes the form

\[ L(\theta) = \prod_{i=1}^{m} p_\Delta\big(X_{(i-1)\Delta}, X_{i\Delta};\theta\big). \]

A maximum likelihood estimator (MLE) θ̂_m satisfies θ̂_m ∈ arg max_{θ∈Θ} L(θ) almost surely. We only consider interior MLEs for which ∇L(θ)|_{θ=θ̂_m} = 0, and we assume consistency, asymptotic normality, and asymptotic efficiency; standard conditions can be found in Newey and McFadden (1994) and Singleton (2006), among others. Letting θ* ∈ int Θ denote the true data-generating parameter, these assumptions imply that θ̂_m → θ* in probability and √m (θ̂_m − θ*) → N(0, Σ_{θ*}^{−1}) in distribution as m → ∞. Here, Σ_{θ*} is the Fisher information matrix.

Throughout, we use ∇ and ∇² to denote the gradient and the Hessian matrix operators, respectively. For any 1 ≤ i_1, …, i_n ≤ r, we write ∂ⁿ_{i_1,…,i_n} for the nth partial derivative with respect to θ_{i_1}, …, θ_{i_n}. We also write P_θ and E_θ for the probability and expectation operators when the parameter governing (1)–(2) is θ.

Several generalizations of the model formulation are possible. We can extend to time-inhomogeneous Markov jump–diffusions, for which the coefficient functions may depend on state and time. It is straightforward to treat the case of

6 The assumption that the distribution of ε_t is independent of t and θ in equation (1) of Kristensen and Shin (2012) effectively restricts their model to state-independent jump–diffusions.


observation interval lengths t_i − t_{i−1} that vary across i, the case of a random initial value X_0, and the case of mark variables with parameter-dependent density function π = π(·; θ). The analysis can be extended to certain multi-dimensional jump–diffusions, namely those that are reducible in the sense of Definition 1 in Aït-Sahalia (2008). Finally, we can also extend to settings in which the boundary ∂S is attainable. The results presented below hold until the first hitting time of ∂S.

3. Transition density

Using volatility and measure transformation arguments, this section develops a weighted Gaussian mixture representation of the transition density p_∆ of X.

3.1. Change of variables

We begin by applying a change of variables to transform X into a unit-volatility process. Define the Lamperti transform

\[ F(w;\theta) = \int_{X_0}^{w} \frac{1}{\sigma(u;\theta)}\, du \]

for w ∈ S and θ ∈ Θ. For every θ ∈ Θ, the mapping w ↦ F(w; θ) is well-defined given that σ(u; θ) > 0 for all u ∈ S. Set Y_t(θ) = F(X_t; θ) on the state space S_Y = F(S; θ). If σ(x; θ) is continuously differentiable in x, then Itô's formula implies that Y = Y(θ) solves the SDE

\[ dY_t = \mu_Y(Y_t;\theta)\,dt + dB_t + dJ_t^Y, \qquad Y_0 = F(X_0;\theta) = 0. \]

Since 0 < σ(u; θ) < ∞ for all u ∈ S, θ ∈ Θ, it follows that F is invertible with respect to w ∈ S. Let F^{−1}(y; θ) denote the inverse of F, such that F(F^{−1}(y; θ); θ) = y. The drift function of Y satisfies

\[ \mu_Y(y;\theta) = \frac{\mu(F^{-1}(y;\theta);\theta)}{\sigma(F^{-1}(y;\theta);\theta)} - \frac{1}{2}\,\sigma'(F^{-1}(y;\theta);\theta) \]

in the interior of S_Y. The Lamperti transformation does not affect the jump intensity of N, but it alters the jump magnitudes of the state process. The process J^Y describing the jumps of Y is given by J_t^Y = \sum_{n=1}^{N_t} \Gamma_Y(Y_{T_n-}, D_n;\theta) for the jump size function

\[ \Gamma_Y(y,d;\theta) = F\big( F^{-1}(y;\theta) + \Gamma(F^{-1}(y;\theta), d;\theta);\theta \big) - y, \qquad y \in S_Y. \]

The Lamperti transformation can be understood as a change of variables. If X has a transition density p_∆, then Y has a transition density, denoted p^Y_∆. We have

\[ p_\Delta(v,w;\theta) = \frac{1}{\sigma(w;\theta)}\, p_\Delta^Y\big(F(v;\theta), F(w;\theta);\theta\big). \tag{3} \]
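When no closed form is available, the Lamperti transform can be evaluated numerically. The Python sketch below (the volatility σ(u) = θ√u on S = (0, ∞) is a hypothetical example, chosen because F then has the closed form 2(√w − √X₀)/θ to check against) computes F by quadrature and inverts it by bisection.

```python
import math

# Hedged numerical sketch of the Lamperti transform.  The volatility
# sigma(u) = theta * sqrt(u) is an illustrative assumption, not a model
# from the paper.

def lamperti(w, x0, sigma, n=20000):
    """F(w) = integral from x0 to w of du / sigma(u), composite trapezoid rule."""
    h = (w - x0) / n
    total = 0.5 * (1.0 / sigma(x0) + 1.0 / sigma(w))
    for i in range(1, n):
        total += 1.0 / sigma(x0 + i * h)
    return total * h

def lamperti_inverse(y, x0, sigma, lo, hi, tol=1e-10):
    """Solve F(w) = y by bisection; F is strictly increasing since sigma > 0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lamperti(mid, x0, sigma) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta, x0 = 0.4, 1.0
sigma = lambda u: theta * math.sqrt(u)
y = lamperti(2.0, x0, sigma)                  # close to 2*(sqrt(2)-1)/theta
w = lamperti_inverse(y, x0, sigma, 0.5, 5.0)  # recovers w = 2.0
```

Monotonicity of F (σ > 0) is what makes the bisection inversion valid; the same structure applies to any volatility function meeting the paper's assumptions.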

3.2. Change of measure

To facilitate the computation of the transition density p^Y_∆, we change from P_θ to a measure P*_θ under which Y has constant drift ρ ∈ R and jumps arrive according to a Poisson process with fixed rate ℓ ≥ 0. Let Z_∆ = Z^D_∆ Z^P_∆, where

\[ Z_\Delta^D = \exp\left( -\frac{1}{2}\int_0^\Delta \big(\mu_Y(Y_s;\theta)-\rho\big)^2\, ds - \int_0^\Delta \big(\mu_Y(Y_s;\theta)-\rho\big)\, dB_s \right) \tag{4} \]

\[ Z_\Delta^P = \exp\left( \int_0^\Delta \big(\Lambda(F^{-1}(Y_s;\theta);\theta) - \ell\big)\, ds \right) \prod_{n=1}^{N_\Delta} \frac{\ell}{\Lambda(F^{-1}(Y_{T_n-};\theta);\theta)} \tag{5} \]

for θ ∈ Θ (by convention we take \prod_{n=1}^{0} = 1). If E_θ[Z_∆] = 1, we can define an equivalent probability measure P*_θ on (Ω, F_∆) by P*_θ[Y_∆ ∈ A] = E_θ[Z_∆ 1_{\{Y_∆ ∈ A\}}] for any A ∈ σ(S_Y). If µ_Y(y; θ) is continuously differentiable in y, integration by parts implies that

\[ Z_\Delta = \exp\left( a(Y_0;\theta) - a(Y_\Delta;\theta) + \int_0^\Delta b(Y_s;\theta)\, ds \right) \prod_{n=1}^{N_\Delta} \frac{1}{c(Y_{T_n-}, D_n;\theta)} \tag{6} \]

where a, b : S_Y × Θ → R and c : S_Y × D × Θ → R are given by

\[ a(y;\theta) = \int_0^{y} \big(\mu_Y(u;\theta) - \rho\big)\, du, \]

\[ b(y;\theta) = \Lambda(F^{-1}(y;\theta);\theta) - \ell + \frac{\mu_Y^2(y;\theta) - \rho^2 + \mu_Y'(y;\theta)}{2}, \]

\[ c(y,d;\theta) = \frac{\Lambda(F^{-1}(y;\theta);\theta)}{\ell} \exp\left( -\int_{y}^{y+\Gamma_Y(y,d;\theta)} \big(\mu_Y(u;\theta) - \rho\big)\, du \right). \]


The Theorems of Girsanov, Lévy, and Watanabe imply that, under P*_θ and on [0, ∆], W_t = B_t + ∫_0^t (µ_Y(Y_s; θ) − ρ) ds is a standard P*_θ-Brownian motion, jumps arrive according to a Poisson process with rate ℓ, the random variables (D_n)_{n≥1} are i.i.d. with density π, and Y is governed by the SDE

\[ dY_t = \rho\, dt + dW_t + dJ_t^Y. \tag{7} \]

Under P*_θ, the process Y is a jump–diffusion with state-independent Poisson jumps that arrive at rate ℓ. The size of the nth jump of Y is a function of the state Y_{T_n−}, the parameter θ, and D_n, where the D_n's are i.i.d. variables with density π. Between jumps, Y follows a Brownian motion with drift ρ.
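The simple dynamics of Y under P*_θ are straightforward to simulate: a Poisson number of jumps, jump times distributed as uniform order statistics, Gaussian increments with drift ρ between jumps, and jump sizes from Γ_Y. A minimal Python sketch follows; the jump-size function `gamma_Y` and the standard-normal mark density are illustrative assumptions standing in for Γ_Y(y, d; θ) and π.

```python
import math
import random

def sample_poisson(mean, rng):
    """Knuth-style Poisson sampler via products of uniforms."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_Y_skeleton(y0, rho, ell, delta, gamma_Y, rng):
    """Return the jump records [(T_n, Y_{T_n-}, Y_{T_n})] and Y_delta under P*."""
    n = sample_poisson(ell * delta, rng)
    jump_times = sorted(rng.uniform(0.0, delta) for _ in range(n))  # order statistics
    records, y, t = [], y0, 0.0
    for tn in jump_times:
        # Brownian leg with drift rho between jumps
        y_pre = y + rho * (tn - t) + rng.gauss(0.0, math.sqrt(tn - t))
        d = rng.gauss(0.0, 1.0)              # mark D_n (illustrative density pi)
        y = y_pre + gamma_Y(y_pre, d)
        records.append((tn, y_pre, y))
        t = tn
    y_delta = y + rho * (delta - t) + rng.gauss(0.0, math.sqrt(delta - t))
    return records, y_delta

rng = random.Random(7)
records, y_delta = sample_Y_skeleton(0.0, 0.1, 2.0, 1.0, lambda y, d: 0.5 * d, rng)
```

This is exactly the kind of exact skeleton sampling that the estimator of Section 4 requires: no time-discretization is involved at any step.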

3.3. Density representation

We exploit the volatility and measure transformations to represent the transition density p_∆ as a mixture of weighted Gaussian distributions.

Theorem 3.1. Fix ∆ > 0. Suppose the following assumptions hold.

(A1) For any θ ∈ Θ, the function u ↦ µ(u; θ) is continuously differentiable and the function u ↦ σ(u; θ) is twice continuously differentiable.
(A2) For any θ ∈ Θ, the variable Z_∆ has unit mean: E_θ[Z_∆] = 1.

Let v, w be arbitrary points in S, and set x = F(v; θ) and y = F(w; θ). Let Y^x be the solution of the SDE (7) on [0, ∆] with Y_0 = x. Then the transition density p_∆ exists and

\[ p_\Delta(v,w;\theta) = \frac{e^{a(y;\theta)-a(x;\theta)}}{\sigma(w;\theta)}\, \Psi_\Delta(x,y;\theta) \tag{8} \]

for any θ ∈ Θ, where

\[ \Psi_\Delta(x,y;\theta) = \mathrm{E}^*_\theta\left[ q_{\Delta-T^*_{N^*_\Delta}}\Big(Y^x_{T^*_{N^*_\Delta}}, y\Big)\, f_{\Delta-T^*_{N^*_\Delta}}\Big(Y^x_{T^*_{N^*_\Delta}}, y;\theta\Big) \prod_{n=1}^{N^*_\Delta} c\Big(Y^x_{T^*_n-}, D_n;\theta\Big)\, f_{T^*_n - T^*_{n-1}}\Big(Y^x_{T^*_{n-1}}, Y^x_{T^*_n-};\theta\Big) \right], \tag{9} \]

\[ q_t(u,z) = \frac{1}{\sqrt{2\pi t}} \exp\left( -\frac{(z - \rho t - u)^2}{2t} \right), \tag{10} \]

\[ f_t(u,z;\theta) = \mathrm{E}^*_\theta\left[ \exp\left( -\int_0^t b(u + \rho s + W_s;\theta)\, ds \right) \,\bigg|\, W_t = z - u - \rho t \right] \tag{11} \]

for u, z ∈ S_Y, and N* is a P*_θ-Poisson process with rate ℓ and jump times (T*_n)_{n≥1}.

Assumption (A1) of Theorem 3.1 is standard. Assumption (A2) ensures that the change of measure is well-defined. Sufficient conditions for (A2) are given by Blanchet and Ruf (2016) and Protter and Shimbo (2008), for example.7

The density representation of Theorem 3.1 involves the expectations (11) of path functionals of Brownian bridges. These bridges represent the behavior of Y between the Poisson jump times T*_n under P*_θ given the values (T*_n, Y^x_{T*_n}, Y^x_{T*_n−}). The representation is valid for any choice of the measure change parameters ρ and ℓ. As we will see, these parameters have a significant impact on the efficiency of a Monte Carlo estimator of (8) developed in Sections 4 and 5. We show in Section 5.2 how to select ρ and ℓ so as to optimize the efficiency of that estimator.

Theorem 3.1 also covers the special case of a diffusion. To specify X as a diffusion we can set the jump intensity Λ ≡ 0 in (1). In this case, the equivalence of the measures P_θ and P*_θ requires that ℓ = 0 as well, so that N*_∆ = 0 and T*_{N*_∆} = T*_0 = 0 almost surely. Theorem 3.1 then yields the density representation (8) with

\[ \Psi_\Delta(x,y;\theta) = q_\Delta(x,y)\, f_\Delta(x,y;\theta). \tag{12} \]

If we set ρ = 0, then this is the diffusion density representation developed by Dacunha-Castelle and Florens-Zmirou (1986) and Rogers (1985).

7 If Γ ≡ 0, Λ is constant, and ℓ = Λ, then Z^P_∆ = 1 almost surely and the measure change does not impact the dynamics of jump arrival (N = N*). In that case, a sufficient condition for (A2) is Novikov's Condition. If µ_Y is constant and ρ = µ_Y, then Z^D_∆ = 1 almost surely and the measure change does not alter the drift of Y. In this case, sufficient conditions for (A2) are given by Brémaud (1980) and Giesecke and Shkolnik (2019).
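As a numerical sanity check (a sketch in Python, not part of the paper's implementation), the kernel q_t of (10) is the transition density of a Brownian motion with drift ρ, so it should integrate to one in its second argument:

```python
import math

# Hedged check of Eq. (10): q_t(u, .) is a Gaussian density with mean
# u + rho*t and variance t, so its mass over a wide grid is close to 1.

def q(t, u, z, rho):
    return math.exp(-(z - rho * t - u) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

t, u, rho, h = 0.5, 0.3, 1.2, 0.001
center = u + rho * t
# Riemann sum over [center - 5, center + 5]; the tails beyond are negligible.
mass = h * sum(q(t, u, center + (k * h - 5.0), rho) for k in range(10000))
# mass is approximately 1
```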


An alternative approach to specifying X as a diffusion is to set Γ ≡ 0 in (2). Even if the jump intensity Λ is nontrivial, the jumps will not impact X because J ≡ 0. While the function Λ can be freely chosen in this case, a convenient choice is Λ = ℓ.8 With that choice, Theorem 3.1 gives the density representation (8) with c = 1 in (9). Following this approach offers an additional degree of freedom relative to (12), namely the choice of ℓ. The degrees of freedom can be exploited; see Section 7 for numerical results.

We harness the representation (8) to develop conditions guaranteeing the smoothness of the transition density with respect to the parameter θ. Smoothness is often required for consistency and asymptotic normality of a MLE θ̂_m.

Proposition 3.2. Suppose the conditions of Theorem 3.1 hold. Furthermore, suppose that the following conditions also hold.

(A3) The functions (u, θ) ↦ Λ(u; θ) and (u, d, θ) ↦ Γ(u, d; θ) are n-times continuously differentiable in (u, d, θ) ∈ S × D × Θ. The function (u, θ) ↦ µ(u; θ) is (n + 1)-times continuously differentiable. The function (u, θ) ↦ σ(u; θ) is (n + 2)-times continuously differentiable in (u, θ) ∈ S × Θ.
(A4) For every u, z ∈ S_Y and θ ∈ Θ there exists a neighborhood A_{u,z,θ} ⊂ S_Y² × Θ such that

\[ \sup_{(u',z',\theta') \in A_{u,z,\theta}} \mathrm{E}^*_\theta\left[ \exp\left( \int_0^1 \Big| b\big(u' + s(z'-u') - \rho s\Delta + W_{s\Delta} - sW_\Delta;\theta'\big) \Big|\, ds \right) \right] < \infty. \]

Then θ ↦ p_∆(v, w; θ) is n-times continuously differentiable for any v, w ∈ S.

4. Transition density estimator

This section uses the representation in Theorem 3.1 to develop an unbiased Monte Carlo estimator of the transition density p_∆. The key is estimating the expectation (9). We require exact samples of several random quantities. Samples of N*_∆ and the jump times (T*_n)_{n≤N*_∆} of the P*_θ-Poisson process N* can be generated exactly (using the order statistics property, for example). Under P*_θ, the variables (Y^x_{T*_n−})_{n≤N*_∆} are conditionally Gaussian, so samples can also be generated exactly. Moreover, the marks (D_n)_{n≤N*_∆} can be sampled exactly from the P*_θ-density π by the inverse transform method, for example. The only non-trivial task is the unbiased estimation of the expectation (11). We follow an approach developed by Beskos and Roberts (2005) for the exact sampling of a diffusion and its extension by Chen and Huang (2013).

To see how an unbiased estimator of (11) can be constructed, suppose for the moment that the function b is positive. Then (11) is the conditional probability that a doubly-stochastic Poisson process with intensity b(u + ρs + W_s; θ) has no jumps in the interval [0, t], given the value W_t. If b is also bounded, then this probability can be estimated without bias using a simple thinning scheme (Lewis and Shedler, 1979). Here, one generates the jump times τ_1 < ⋯ < τ_p of a dominating Poisson process on [0, t] with intensity max_{u'} b(u'; θ), and a skeleton W_{τ_1}, …, W_{τ_p} of a Brownian bridge starting from 0 at time 0 and ending at z − u − ρt at time t. An estimator of the desired no-jump probability is

\[ \prod_{i=1}^{p} \left( 1 - \frac{b(u + \rho\tau_i + W_{\tau_i};\theta)}{\max_{u'} b(u';\theta)} \right), \tag{13} \]

which is the empirical probability of rejecting the jump times of the dominating Poisson process as jump times of a doubly-stochastic Poisson process with intensity b(u + ρs + W_s; θ) conditional on W_t = z − u − ρt.

This approach extends to the case where b is not necessarily bounded or positive; see Chen and Huang (2013) for the case ρ = 0. We partition the interval [0, t] into segments in which W is bounded. Let η = min{t > 0 : W_t ∉ [−L, L]} for some level L > 0 whose choice will be discussed below. If the function b is continuous, then

\[ b(u + \rho t + W_t;\theta) - \min_{|w| \le L + |\rho|\eta} b(u + w;\theta) \tag{14} \]

is non-negative and bounded for t ∈ [0, η]. Consequently, we can use the estimator (13) locally up to time η if we replace the function b with (14).

We iterate this localization argument to obtain an unbiased estimator of (11). Let (η_i : i = 1, …, I + 1) be i.i.d. samples of η for I = sup{i : η_1 + ⋯ + η_i ≤ t}, and set E_0 = 0, E_i = η_1 + ⋯ + η_i. In addition, let τ_{i,1} < ⋯ < τ_{i,p_i} be a sample of the jump times of a dominating Poisson process on [0, η_i] with rate M_i, where

\[ M_i = \max_{|u'| \le L + |\rho|\eta_i} b\big(u + \rho E_{i-1} + W_{E_{i-1}} + u';\theta\big) - m_i, \qquad m_i = \min_{|u'| \le L + |\rho|\eta_i} b\big(u + \rho E_{i-1} + W_{E_{i-1}} + u';\theta\big). \]

Note that 0 ≤ M_i < ∞ for all y ∈ S_Y if b is continuous. Finally, define τ_{i,0} = 0, w_{i,0} = W_{E_{i−1}}, and let w_{i,1}, …, w_{i,p_i} be a skeleton of the Brownian bridge starting at W_{E_{i−1}} at time E_{i−1} and finishing at W_{E_i} at time E_i.

8 Of course, any parameters governing Λ that do not enter the drift and volatility functions will not be identifiable. The choice Λ = ℓ is optimal for the Monte Carlo estimator of the density developed in Sections 4–5.

An unbiased estimator of


(11) is given by

\[ \hat f_t(u,z;\theta) = \prod_{i=1}^{I+1} e^{-m_i (E_i - E_{i-1})} \prod_{j=1}^{p_i} \left( 1 - \frac{b\big(u + \rho(E_{i-1} + \tau_{i,j}) + w_{i,j};\theta\big) - m_i}{M_i - m_i} \right), \tag{15} \]

where E_{I+1} = t (so that the final segment has length t − ∑_{i=1}^{I} η_i).
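The thinning mechanism behind (13) can be sketched in Python for the simplified case of a positive function b with a known upper bound M; the localization layer of (15) is omitted, and the constant intensity used in the check is an illustrative assumption, not a model from the paper.

```python
import math
import random

def sample_poisson(mean, rng):
    """Knuth-style Poisson sampler via products of uniforms."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def no_jump_estimate(b, M, u, z, rho, t, rng):
    """One thinning replication: unbiased for
    E[exp(-int_0^t b(u + rho*s + W_s) ds) | W_t = z - u - rho*t],
    assuming 0 <= b <= M."""
    end = z - u - rho * t                        # conditioned terminal value of W
    p = sample_poisson(M * t, rng)               # dominating Poisson process
    taus = sorted(rng.uniform(0.0, t) for _ in range(p))
    est, w_prev, t_prev = 1.0, 0.0, 0.0
    for tau in taus:
        # sequential sampling of the Brownian bridge pinned at (t, end)
        frac = (tau - t_prev) / (t - t_prev)
        mean = w_prev + frac * (end - w_prev)
        var = (tau - t_prev) * (t - tau) / (t - t_prev)
        w = rng.gauss(mean, math.sqrt(var))
        est *= 1.0 - b(u + rho * tau + w) / M    # probability of rejecting this time
        w_prev, t_prev = w, tau
    return est

# Check with b = 1 and M = 2 over t = 1: each factor equals 1/2, so the
# estimator is 0.5**P with P ~ Poisson(2), whose mean is exp(-1).
rng = random.Random(1)
n = 20000
avg = sum(no_jump_estimate(lambda x: 1.0, 2.0, 0.0, 0.5, 0.1, 1.0, rng)
          for _ in range(n)) / n
```

The constant-b check is degenerate (the bridge skeleton does not affect the factors) but makes the unbiasedness of the thinning product easy to verify in closed form; the full estimator (15) applies the same product construction segment by segment.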

Theorem 4.1. Suppose that the conditions of Theorem 3.1 hold. Fix the localization level L > 0, the Poisson rate ℓ ≥ 0, and the drift ρ ∈ R. Set pˆ ∆ (v, w; θ ) =

ea(y;θ )−a(x;θ )

σ (w; θ )

( q∆−T ∗

N∗



YTx∗ N∗ ∆

∗ N∆

×

∏ (

c YTx∗ − , Dn ; θ n

)

)

, y fˆ∆−T ∗∗

(

N



fˆTn∗ −T ∗

n−1

YTx∗ N∗ ∆

, y; θ

(

YTx∗ , YTx∗ − ; θ n−1

)

)

(16)

n

n=1

for x = F (v; θ ) and y = F (w; θ ). Suppose the following conditions hold: (B1) For any θ ∈ Θ , v ↦ → Λ(v; θ ) is continuous, v ↦ → µ(v; θ ) is continuously differentiable, and v ↦ → σ (v; θ ) is twice continuously differentiable. (B2) The function b is not constant. Then pˆ ∆ (v, w; θ ) is an unbiased estimator of the transition density p∆ (v, w; θ ) for any v, w ∈ S , and θ ∈ Θ . That is, E∗θ [ˆp∆ (v, w; θ )] = p∆ (v, w; θ ). Theorem 4.1 provides mild conditions on the coefficient functions of X guaranteeing the unbiasedness of the transition density estimator (16). Section 5 discusses the practical implementation of the estimator, including the optimal selection of L, ℓ and ρ . Theorem 4.1 also applies in the case that X is a diffusion (Λ ≡ 0 or Γ ≡ 0). If we take Λ ≡ 0, ρ = ℓ = 0, and L = ∞, then (16) reduces to the diffusion density estimator of Beskos et al. (2009). However, we note that, even with these choices, (16) has a slightly broader scope than the estimator of Beskos et al. (2009), which requires the coefficient functions of X to satisfy additional conditions.9 More importantly, (16) allows us to implement the diffusion case by setting Γ ≡ 0 and Λ = ℓ, and choosing ℓ, ρ and L so as to optimize the efficiency of (16) as explained in Section 5.2. The numerical results in Section 7.4 illustrate the efficiency improvements that can be achieved with this approach. A simulation estimator of the transition density p∆ (v, w; θ ) of the jump–diffusion X is given by the average of independent Monte Carlo samples of pˆ ∆ (v, w; θ ) drawn from P∗θ . If pˆ ∆ (v, w; θ ) has finite variance, then it is guaranteed to converge at the square root rate to the true density. We establish the finite variance property. Proposition 4.2. Suppose that the conditions of Theorem 4.1 are satisfied. Assume that the following conditions also hold for any θ ∈ Θ : (B3) The function u ↦ → b(u; θ ) decreases at most at a linear rate as u grows large. That is, lim infu→±∞ u > −∞. 
∗ (B4) For any u ⏐∈ SY , the expected ⏐ jump magnitude Eθ [ΓY (u, D1 ; θ )] diverges at most at a linear rate as u → ±∞. That is, b(u;θ )

⏐∫

supu→±∞ ⏐

ΓY (u,s;θ ) u

⏐ π (s)ds⏐ < ∞.

∗ (B5) There exists an integer k∗ ∈ N0 such that( for any ∫ ) u ∈ SY , the expectation Eθ [c(u, D1 ; θ )] is bounded above by a multiple u k∗ u k∗ of e u . That is, c(u, s; θ )π (s)ds = O e u .

Then p̂∆(v, w; θ) has finite variance for any v, w ∈ S and θ ∈ Θ.

Assumptions (B3)–(B5) place mild restrictions on the coefficient functions governing (1).10 They essentially require that the drift µ(x; θ), the squared volatility σ²(x; θ), the jump intensity Λ(x; θ), and the expected jump magnitude E∗θ[Γ(x, D1; θ)] diverge at most at a linear rate as x → ±∞. If this restriction did not hold, then the coefficient functions could not be Lipschitz continuous, violating a well-known condition for the non-explosiveness of X.

Often, one is interested in evaluating partial derivatives of p∆(v, w; θ). For example, many sufficient conditions for consistency and asymptotic normality of a MLE θ̂m are formulated in terms of partial derivatives of the density. Conveniently, under certain conditions, the density estimator (16) can be differentiated to obtain unbiased estimators of the partial derivatives of the transition density.

9 Beskos et al. (2009) require inf_{w∈SY} b(w; θ) > −∞ and sup_{w∈SY} b(w; θ) − inf_{w∈SY} b(w; θ) < ∞ for all θ ∈ Θ.
10 Similar conditions are assumed in Dacunha-Castelle and Florens-Zmirou (1986).

Please cite this article as: K. Giesecke and G. Schwenkler, Simulated likelihood estimators for discretely observed jump–diffusions. Journal of Econometrics (2019), https://doi.org/10.1016/j.jeconom.2019.01.015.


Corollary 4.3. Suppose that the conditions of Proposition 3.2 and Theorem 4.1 hold. In addition, suppose that:

(B6) For any ξ > 0, the following functions are n-times continuously differentiable: (u, θ) ↦→ min_{u′∈[−ξ,ξ]} b(u + u′; θ) and (u, θ) ↦→ max_{u′∈[−ξ,ξ]} b(u + u′; θ).

Then the points of discontinuity of the mapping θ ↦→ p̂∆(v, w; θ) form a null set. In addition, any nth partial derivative of p̂∆(v, w; θ) with respect to θ is an unbiased estimator of the corresponding partial derivative of p∆(v, w; θ). That is, for i1, . . . , in ∈ {1, . . . , r},

E∗θ[ ∂ⁿ_{i1,...,in} p̂∆(v, w; θ) ] = ∂ⁿ_{i1,...,in} p∆(v, w; θ).

The assumptions of Proposition 3.2 together with Assumption (B6) imply that the function f̂t in (15) is almost surely n-times continuously differentiable with respect to all its arguments. This is a necessary condition for the differentiability of the density estimator p̂∆. Note that the number of factors in (15) may change when the parameter changes, so that p̂∆ may not be smooth at any parameter for which τi,pi = ηi. However, {τi,pi = ηi} is a zero-probability event, making discontinuity a zero-probability phenomenon.

5. Implementation of density estimator

This section explains the practical implementation of the transition density estimator (16). The algorithms stated below have been implemented in R and are available for download at http://www.gustavo-schwenkler.com.

5.1. Algorithms

Fix L > 0, ℓ ≥ 0 and ρ ∈ R. To generate a sample of p̂∆(v, w; θ), we require a vector R = (P, T, E, W, V, D) of variates with the following properties:

• P ∼ Poisson(ℓ∆) is a sample of the jump count N∗∆ under P∗θ
• T = (Tn)_{1≤n≤P} is a sample of the jump times (Tn)_{1≤n≤N∗∆} under P∗θ
• E = (Eᵏn)_{1≤n≤P+1, k≥1} is a collection of i.i.d. samples of the exit time η
• W = (Wⁱn)_{1≤n≤P+1, i≥1} is a collection of i.i.d. uniforms on {−L, L}
• V = (Vⁱ,ʲn)_{1≤n≤P+1, i,j≥1} is a collection of i.i.d. standard uniforms
• D = (Dn)_{1≤n≤P} is a collection of i.i.d. samples from the density π

The variates P and D can be sampled exactly using standard methods. The Poisson jump times T can be generated exactly as the order statistics of P uniforms on [0, ∆]. The collection E of exit times can be generated exactly using an acceptance–rejection scheme that uses gamma variates; see Section 4.1 of Chen and Huang (2013). The sampling of the remaining variates is trivial.

Algorithm 5.1 (Computation of Density Estimator). For given v, w ∈ S and θ ∈ Θ, do:

(i) Set Y^x_0 = x = F(v; θ) and y = F(w; θ).
(ii) For n = 1, . . . , P, do:
  (a) Draw samples of Y^x_{Tn−} and Y^x_{Tn} under P∗θ according to (7).
  (b) Compute f̂_{Tn−Tn−1}(Y^x_{Tn−1}, Y^x_{Tn−}; θ) according to (15).
(iii) Compute f̂_{∆−TP}(Y^x_{TP}, y; θ) according to (15).
(iv) Compute

p̂∆(v, w; θ) = ( e^{a(y;θ)−a(x;θ)} / σ(w; θ) ) q̂_{∆−TP}(Y^x_{TP}, y; θ) f̂_{∆−TP}(Y^x_{TP}, y; θ) × ∏_{n=1}^{P} c(Y^x_{Tn−}, Dn; θ) f̂_{Tn−Tn−1}(Y^x_{Tn−1}, Y^x_{Tn−}; θ).
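The exactly-samplable components of R can be generated with textbook methods. The following is a minimal Python sketch, illustrative only: the paper's reference implementation is in R, the function names are ours, and the exit times E, levels W, and uniforms V are omitted since they are model- and scheme-specific.

```python
import math
import random

def sample_poisson(lam, rng):
    # Knuth's product-of-uniforms method (fine for the small means used here).
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_R_basics(ell, delta, pi_sampler, rng):
    """Jump count P ~ Poisson(ell * delta) under the proposal measure, jump
    times T as the order statistics of P uniforms on [0, delta], and marks D
    drawn from the density pi."""
    P = sample_poisson(ell * delta, rng)
    T = sorted(rng.uniform(0.0, delta) for _ in range(P))
    D = [pi_sampler(rng) for _ in range(P)]
    return P, T, D

rng = random.Random(7)
# ell = 27.70 is the optimized proposal rate reported for Model (18) in
# Section 7; the marks are standard normal as in that model.
P, T, D = sample_R_basics(27.70, 1.0 / 12, lambda r: r.gauss(0.0, 1.0), rng)
```

A single call produces one draw of these components, which, as noted in Section 5.1, can then be reused for every (v, w, θ) at which the density estimator is evaluated.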

Only Steps (ii)(b) and (iii) of Algorithm 5.1 are nontrivial. The following algorithm details their implementation.

Algorithm 5.2 (Computation of f̂t(u, z; θ) for Given t > 0 and u, z ∈ SY). Let I = max{i ≥ 1 : E1 + · · · + Ei ≤ t} and set w1,0 = u and wi,0 = wi−1,0 + ρEi−1 + Wi−1 for i = 2, . . . , I + 1. For i = 1, . . . , I + 1, do:

(i) Set tend = Ei if i = 1, . . . , I and tend = t − ∑_{j=1}^{I} Ej if i = I + 1.
(ii) Compute mi = min_{|u′|≤L+|ρ|tend} b(wi,0 + u′; θ) and Mi = max_{|u′|≤L+|ρ|tend} b(wi,0 + u′; θ) − mi.


(iii) Draw samples of the jump times τi,1, . . . , τi,pi of a dominating Poisson process with rate Mi: Set τi,0 = 0, τi,j = τi,j−1 − (log Vi,j)/Mi for j ≥ 1 while τi,j ≤ Ei, and pi = max{j ≥ 0 : τi,j ≤ tend}.
(iv) Set wend = Wi if i = 1, . . . , I and wend = z − wI,0 − ρtend if i = I + 1, and compute a skeleton wi,1, . . . , wi,pi of a Brownian bridge starting at 0 at time 0 and ending at wend at time tend.
(v) Compute the normalizing factor ei = exp(−mi tend).

A sample of f̂t(u, z; θ) is given by

∏_{i=1}^{I+1} ei ∏_{j=1}^{pi} ( 1 − ( b(wi,0 + wi,j + ρτi,j; θ) − mi ) / Mi ).
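Step (iii) and the final product can be condensed for a single interval i. The following Python fragment is illustrative only, with a toy coefficient function b and a stand-in bridge path; it is not the paper's R implementation.

```python
import math
import random

def thinning_weight(b, w0, bridge, t_end, m, M, rng):
    """One interval's contribution: sample the jump times of a dominating
    Poisson process with rate M on [0, t_end] via tau_j = tau_{j-1} - log(V)/M,
    then return e^{-m * t_end} * prod_j (1 - (b(w0 + bridge(tau_j)) - m) / M)."""
    taus, tau = [], 0.0
    while True:
        tau -= math.log(rng.random()) / M
        if tau > t_end:
            break
        taus.append(tau)
    weight = math.exp(-m * t_end)
    for tau in taus:
        weight *= 1.0 - (b(w0 + bridge(tau)) - m) / M
    return weight

b = lambda u: 0.5 + 0.25 * math.sin(u)   # toy b with m = 0.25, M = 0.5 on any window
rng = random.Random(11)
w = thinning_weight(b, w0=0.0, bridge=lambda s: 0.3 * s, t_end=1.0, m=0.25, M=0.5, rng=rng)
```

Because m and m + M bracket b on the window, every factor lies in [0, 1]; conditionally on the bridge path, the weight is an unbiased estimator of exp(−∫₀ᵗ b) over the interval, which is the mechanism behind (15).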

The correctness of Algorithm 5.2 follows from Chen and Huang (2013) after noting that an exact sample of the first jump time of a Poisson process with rate Mi is given by −log(U)/Mi, where U is standard uniform. This observation is used in Step (iii). The skeleton of a Brownian bridge required in Step (iv) can be sampled exactly using the procedure outlined in Section 6.4 of Beskos et al. (2009), which constructs a skeleton in terms of three independent standard normals.

Note that the vector of basic random variates R used in the algorithms above does not depend on the arguments (v, w; θ) of the density estimator (16). Thus, a single sample of R suffices to generate samples of p̂∆(v, w; θ) for any v, w ∈ S and θ ∈ Θ. This property generates significant computational benefits for the maximization of the simulated likelihood based on the estimator (16); see Section 6.

5.2. Selection of tuning parameters

We discuss the optimal selection of the level L, the Poisson rate ℓ, and the drift ρ. While these parameters have no impact on the unbiasedness of the density estimator, they influence its variance and computational efficiency. The Poisson rate ℓ governs the frequency of the jumps under P∗θ. If ℓ is small, then T∗_{N∗∆} ≈ 0 with high probability, and the estimator (16) approximates the density p∆ as a weighted Gaussian density, ignoring the jumps of Y^x. Thus, the estimator (16) has large variance in the tails. On the other hand, the computational effort required to evaluate the estimator increases with ℓ. This is because E∗θ[P] = ℓ∆, so that Step (ii)(b) of Algorithm 5.1 is repeated more frequently for large values of ℓ. The level parameter L controls the number of iterations I + 1 in Algorithm 5.2. Note that P∗θ[|Y^x_t| > L] → 0 as L → ∞ for any fixed 0 ≤ t ≤ ∆ because Y^x is non-explosive under P∗θ. Thus, the larger L, the smaller I, and the fewer iterations of Algorithm 5.2 are needed to compute (15).
On the other hand, large values of L make Mi large, which increases pi in (15) for all 1 ≤ i ≤ I + 1. This, in turn, increases both the variance of the thinning estimator (15) and the computational effort required to evaluate it. Similarly, large positive or negative values of ρ also make Mi large, increasing both the variance of the density estimator and the computational effort necessary to evaluate it.

We propose to choose the quantities ℓ, L, and ρ so as to optimally trade off computational effort and variance. We adopt the efficiency concept of Glynn and Whitt (1992) for simulation estimators, defining efficiency as the inverse of the product of the variance of the estimator and the work required to evaluate the estimator. Thus, we select ℓ, L, and ρ as the solution of the optimization problem

inf_{ℓ≥0, L>0, ρ∈R}  sup_{v,w∈S, θ∈Θ}  Var∗θ( p̂∆(v, w; θ) ) × R(v, w; θ),    (17)

where R(v, w; θ) is the time required to compute the estimator for given v, w, θ, ℓ, L, and ρ. The solution of this optimization problem leads to a density estimator that is efficient across the state and parameter spaces.11 The problem (17) is a constrained non-linear optimization problem, which can be solved numerically using standard methods. A run of Algorithm 5.1 yields R(v, w; θ) for given v, w, θ, ℓ, L, and ρ. An additional run of a slight variant of Algorithm 5.1 yields an unbiased estimator of the second moment E∗θ[p̂²∆(v, w; θ)], which serves as an upper bound for Var∗θ(p̂∆(v, w; θ)).

6. Simulated likelihood estimators

This section analyzes the asymptotic behavior of the simulated likelihood estimator of the parameter θ of the jump–diffusion process X. Let p̂K∆ be a transition density estimator based on K ∈ N Monte Carlo samples of (16). The simulated likelihood function of θ at the data X is given by L̂K(θ) = ∏_{n=1}^{m} p̂K∆(X(n−1)∆, Xn∆; θ); this is the Monte Carlo counterpart of the true likelihood L discussed in Section 2. A simulated maximum likelihood estimator (SMLE) θ̂Km satisfies almost surely θ̂Km ∈ arg max_{θ∈Θ} L̂K(θ). Conveniently, the maximization of the simulated likelihood is effectively a deterministic optimization problem. We draw K Monte Carlo samples of the basic random variate R to construct the density estimator p̂K∆(v, w; θ); see Section 5.

11 One could also choose ''locally'' optimal parameters by solving min_{ℓ≥0, L>0, ρ∈R} Var∗θ(p̂∆(v, w; θ))R(v, w; θ) for each (v, w; θ). However, this may be computationally burdensome.


Because p̂K∆(v, w; θ) is a deterministic function of (v, w, θ) given the samples of R, it can be evaluated at various data points (v, w) = (X(n−1)∆, Xn∆) without re-simulation. Thus, given the samples of R and the data X, the likelihood L̂K(θ) is a deterministic function of the parameter θ. There is no need to re-simulate the likelihood during the optimization, eliminating the need to deal with a simulation optimization problem.

We study the properties of a SMLE θ̂Km. We first establish that θ̂Km is asymptotically unbiased. A sufficient condition for this property is that limK→∞ L̂K = L almost surely and uniformly over the parameter space Θ. The strong law of large numbers and the continuous mapping theorem imply that the above convergence occurs almost surely in our setting, but they do not provide uniformity of the convergence. We use the strong law of large numbers for random elements in separable Banach spaces to prove uniform convergence, exploiting the compactness of the parameter space Θ and the fact that L̂K takes values in a separable Banach space (see, e.g., Beskos et al. (2009) and Straumann and Mikosch (2006)). Conveniently, asymptotic unbiasedness implies (strong) consistency of a SMLE if a MLE is (strongly) consistent.

Theorem 6.1. Suppose the conditions of Theorem 3.1 hold, and assume the conditions of Corollary 4.3 for differentiation up to first order. Moreover, suppose that:

(C1) For any v, w ∈ S, E[ sup_{θ∈Θ} p̂∆(v, w; θ) ] < ∞.

Then any SMLE θ̂Km is an asymptotically unbiased estimator of θ̂m, i.e., θ̂Km → θ̂m almost surely as K → ∞. If a MLE θ̂m is (strongly) consistent, then a SMLE θ̂Km is also a (strongly) consistent estimator of the true parameter θ∗ if K → ∞ as m → ∞. In other words, θ̂Km → θ∗ in Pθ∗-probability (almost surely) as m → ∞ and K → ∞.

Theorem 6.1 states that, for a given realization of the data X, a SMLE θ̂Km converges to a theoretical MLE as the number of Monte Carlo samples K → ∞. This implies that a SMLE inherits the consistency of a MLE if more Monte Carlo samples are used as more data become available. Condition (C1) is mild and implies that the simulated likelihood is bounded in expectation.

How many Monte Carlo samples K of the density estimator (16) need to be generated for each additional observation of X? In general, the number of samples influences the variance of the density estimator, and this affects the asymptotic distribution of a SMLE. Standard Monte Carlo theory asserts that the error from approximating the true transition density p∆ by the estimator p̂K∆ is of order O(K^{−1/2}). Thus, the Monte Carlo error arising from using p̂K∆ instead of p∆ for a single observation vanishes as K → ∞. However, the aggregate Monte Carlo error associated with the simulated likelihood function L̂K may explode as m → ∞ if K is not chosen optimally in accordance with m. The following theorem indicates the optimal choice.

Theorem 6.2. Suppose the conditions of Theorem 6.1 hold, and assume the conditions of Corollary 4.3 for differentiation up to second order. In addition, suppose the following conditions hold.

(C2) The mapping θ ↦→ p∆(v, w; θ) is three-times continuously differentiable for any v, w ∈ S.
(C3) For any θ ∈ int Θ, sup_{v,w∈S} Var∗θ( ∇p̂∆(v, w; θ) / p∆(v, w; θ) ) < ∞.

If m/K → c ∈ [0, ∞) as m → ∞ and K → ∞, then a SMLE θ̂Km is asymptotically normal. That is, √m(θ̂Km − θ∗) → N(0, Σ_{θ∗}^{−1}) in Pθ∗-distribution as m → ∞ and K → ∞. On the other hand, if m/K → ∞ as m → ∞ and K → ∞, then √K(θ̂Km − θ∗) → 0 almost surely.
A SMLE inherits the asymptotic normality property under these conditions if m/K converges to some finite constant. There is no loss of efficiency when estimating θ using the simulated likelihood rather than the theoretical likelihood: the Monte Carlo variance does not impact the asymptotic distribution of a SMLE. If the number of Monte Carlo samples K grows fast enough, then the Monte Carlo variance vanishes in the limit as m → ∞. This is guaranteed by the choice K = O(m).12 Sufficient conditions for Condition (C2) are given in Proposition 3.2. Condition (C3) is mild but necessary; it implies that the variance of the simulated score function is finite.

That our Monte Carlo approximation of the transition density does not affect the asymptotic distribution of the estimator is a consequence of the unbiasedness of our density approximation. To appreciate this feature, consider a conventional Monte Carlo approximation of the transition density, where one first approximates X on a discrete-time grid and then applies a nonparametric kernel to the Monte Carlo samples. In the special case of a diffusion that is approximated using an Euler scheme, Detemple et al. (2006) show that this approach distorts the asymptotic distribution of the likelihood estimator. More precisely, letting θ̂Euler_m denote the estimator obtained from Km i.i.d. Monte Carlo samples of the Euler approximation of X∆ that are based on km discretization steps, Theorem 12 of Detemple et al. (2006) implies that if Km → ∞ and km → ∞ as m → ∞, then √m(θ̂Euler_m − θ∗) → N(β, ΣEuler) as m → ∞ in Pθ∗-distribution,

12 In the case of a diffusion, Beskos et al. (2009) obtain a result similar to Theorem 6.2 under the assumptions that K = O(√m) and that independent samples of their density estimator are evaluated for each data point. In contrast, we require K = O(m) because we assume that the same samples of R are re-used for all data points.


where either β ≠ 0 or ΣEuler ≠ Σ_{θ∗}^{−1}. In particular, Detemple et al. (2006) show that β = 0 and ΣEuler = Σ_{θ∗}^{−1} cannot hold simultaneously. Thus, this approach either generates size-distorted asymptotic standard errors or is inefficient.13 Our exact Monte Carlo approach, in contrast, facilitates efficient parameter estimation and produces correct asymptotic standard errors at the same time. It eliminates the need to discretize X and generates an estimator that has the same asymptotic distribution as a true MLE.

7. Numerical results

This section illustrates the performance of the density and simulated likelihood estimators. We consider two alternative models. The first is the mean-reverting interest rate model of Das (2002). We specify the jump–diffusion SDE (1) by choosing the following functions for θ = (κ, X̄, σ, l0, l1, γ1, γ2) ∈ R+ × R × R3+ × R × R+, x ∈ S = R, and d ∈ D = R: µ(x; θ) = κ(X̄ − x), σ(x; θ) = σ, Γ(x, d; θ) = γ1 + γ2d, and Λ(x; θ) = l0 + l1x. The jump–diffusion X has dynamics

dXt = κ(X̄ − Xt)dt + σdBt + dJt,    (18)

where Jt = ∑_{n=1}^{Nt} (γ1 + γ2Dn) and N is a counting process with state-dependent intensity λt = l0 + l1Xt. The marks (Dn)n≥1 are i.i.d. standard normal. We choose the parameter space as Θ = [0.0001, 3] × [−1, 1] × [0.0001, 1] × [0.0001, 100] × [−10, 10] × [−0.1, 0.1] × [0.0001, 0.1]. The true parameter θ∗ is taken as (0.8542, 0.0330, 0.0173, 54.0500, 0.0000, 0.0004, 0.0058) ∈ int Θ, the value estimated by Das (2002) from daily data of the Fed Funds rate between 1988 and 1997.14 We take X0 = X̄ = 0.0330.

The second model we consider is the double-exponential stock price model of Kou (2002). For this model, we set θ = (µ, σ, l0, l1, p, η1, η2) ∈ R × R+ × R2+ × [0, 1] × R2+, x ∈ S = R+, d ∈ D = R2+ × [0, 1], µ(x; θ) = µx, σ(x; θ) = σx, Λ(x; θ) = l0 + l1x, and

Γ(x, d; θ) = x(e^{d1/η1} − 1) if d3 < p, and Γ(x, d; θ) = x(e^{−d2/η2} − 1) otherwise.

This results in a jump–diffusion X with dynamics

dXt = µXt−dt + σXt−dBt + Xt−dJt    (19)

with Jt = ∑_{n=1}^{Nt} (e^{Un} − 1) for a counting process N with state-dependent intensity λt = l0 + l1Xt. The random variable Un has an asymmetric double exponential distribution with Un = D(1)n/η1 if D(3)n < p, and Un = −D(2)n/η2 otherwise, for a mark variable Dn = (D(1)n, D(2)n, D(3)n) that satisfies D(1)n, D(2)n ∼ Exp(1) and D(3)n ∼ Unif[0, 1]. We choose Θ = [−1, 1] × [0.0001, 1] × [0.0001, 100] × [−1, 1] × [0, 1] × [0.001, 100]². The true parameter is θ∗ = (0.15, 0.20, 10, 0, 0.3, 1/0.02, 1/0.04) ∈ int Θ, which is consistent with the choice of Kou (2002). We take X0 = 10.

Model (18) is affine in the sense of Duffie et al. (2000), allowing us to compute the ''true'' transition density of X by Fourier inversion of the characteristic function of X. For Model (19), only the process log X is affine. We can nonetheless recover the transition density of log X in semi-analytical form via Fourier inversion, and then compute the density of X via a change of variables. The density estimators derived from Fourier inversion in these ways serve as benchmarks for Models (18) and (19). We implement the Fourier inversion via numerical quadrature with 10³ discretization points in [−10³, 10³]. The characteristic functions of X in Model (18) and log X in Model (19) are known in closed form if l1 = 0. When l1 ≠ 0, the characteristic functions solve sets of ordinary differential equations that need to be solved numerically. We use a Runge–Kutta method based on 50 time steps to numerically solve these differential equations.

We save in advance K ∈ N i.i.d. samples of the basic random variates R of Section 5, where we restrict the number P to be at most 2000. This bound is exceeded only with very small probability under our model specification and implementation. If it is exceeded, we sequentially generate additional basic random variates to ensure unbiasedness of our density estimator. The numerical results reported below are based on an implementation in R, running on a 2 × 8-core 2.6 GHz Intel Xeon E5-2670, 128 GB server at Boston University with a Linux Centos 6.4 operating system. All R codes can be downloaded at http://www.gustavo-schwenkler.com.

7.1. Transition density estimator

We begin by verifying the conditions of Sections 3 and 4 ensuring the unbiasedness and convergence of our density estimator. The coefficient functions in Models (18) and (19) are infinitely often continuously differentiable, so that Conditions (A1) and (B1) are satisfied for both models. Furthermore, both models have linear drift functions and non-explosive point processes N. Blanchet and Ruf (2016) therefore tell us that Assumption (A2) is satisfied in both models.

13 Efficiency can be achieved if the number of Euler discretization steps km is chosen according to the square-root rule of Duffie and Glynn (1995), i.e., km = O(√Km). Detemple et al. (2006) develop an improved discretization scheme for diffusions for which β = 0. For jump–diffusions with state-independent coefficients, Kristensen and Shin (2012) show that β = 0 can be achieved under conditions on the kernel.
14 Das (2002) assumed l1 = 0.


Assumption (B2) is satisfied in both models because the intensity function Λ is linear. As a result, Theorems 3.1 and 4.1 hold for both models. For Model (18), we have

b(y; θ) = l0 + l1(σy + X0) − ℓ + (1/2)[ ( κ(X̄ − X0)/σ − κy − ρ )² − κ ],

c(y, d; θ) = ( (l0 + l1(σy + X0)) / ℓ ) exp( ((γ1 + γ2d)/σ)( κ(X̄ − X0)/σ − ρ ) − (κ/2)( 2y(γ1 + γ2d)/σ + (γ1 + γ2d)²/σ² ) ),

ΓY(y, d; θ) = (γ1 + γ2d)/σ.
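As a quick numerical sanity check of (B3), the function b for Model (18), under our reading of the displayed expression, is an upward-opening quadratic in y and hence bounded below. A few lines of Python (illustrative only; the constants are the Das (2002) parameter values together with the proposal values ℓ = 27.70 and ρ = 0.36 reported later in this section) confirm this on a wide grid:

```python
# Parameter values of Das (2002) for Model (18), plus the proposal
# intensity ell and drift rho used by the density estimator.
kappa, Xbar, sigma = 0.8542, 0.0330, 0.0173
l0, l1, X0 = 54.0500, 0.0000, 0.0330
ell, rho = 27.70, 0.36

def b(y):
    # b for Model (18): intensity part minus ell, plus half the squared
    # rho-shifted transformed drift, corrected by kappa.
    alpha = kappa * (Xbar - X0) / sigma - kappa * y - rho
    return l0 + l1 * (sigma * y + X0) - ell + 0.5 * (alpha ** 2 - kappa)

# (B3) asks that b not decrease faster than linearly as |y| grows; here b
# is quadratic with positive leading coefficient, so it is bounded below.
values = [b(-100.0 + 0.1 * i) for i in range(2001)]
```

At these parameters the minimum over the grid is about 25.9, so (B3) holds with room to spare.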

Assumptions (B4) and (B5) are therefore satisfied with k∗ = 1. Assumption (B3) is also satisfied given that inf_{y∈SY} b(y; θ) > −∞. For Model (19),

b(y; θ) = l0 + l1X0e^{σy} − ℓ + (1/2)( µ/σ − σ/2 − ρ )²,

c(y, d; θ) = ( (l0 + l1X0e^{σy}) / ℓ ) exp( ( µ/σ − σ/2 − ρ )( (d1/(ση1)) 1{d3 < p} − (d2/(ση2)) 1{d3 ≥ p} ) ),

ΓY(y, d; θ) = d1/(ση1) if d3 < p, and ΓY(y, d; θ) = −d2/(ση2) otherwise.
,

It follows that infy∈SY b(y; θ ) > ∞. Assumptions (B3)–(B5) are thus satisfied with k∗ = 0. It follows that Proposition 4.2 also holds in both models, and the density estimator pˆ ∆ (v, w; θ ) is unbiased with finite variance. Consequently, the Monte Carlo estimator pˆ K∆ (v, w; θ ) converges at square-root rate to the true density in both models. We evaluate the accuracy of the unbiased density (UD) estimator. Figs. 1–3 show pˆ K∆ (X0 , w; θ ∗ ) for Models (18) and 1 (19), ∆ ∈ { 12 , 14 , 21 }, and each of several K , along with pointwise empirical confidence bands obtained from 103 bootstrap 15 samples. We compare the UD estimator with several simulation-based alternatives:

• A Gaussian kernel estimator obtained from K samples of X∆ generated with the exact method of Giesecke and Smelov (2013).

• A Gaussian kernel estimator obtained from K samples of X∆ generated√with the Euler discretization method of Giesecke et al. (2019). The number of discretization steps is taken as and Glynn (1995).

K , as suggested by the results of Duffie

The UD estimator oscillates around the true transition density, which is governed by the expectation Ψ∆ in (9). The expectation Ψ∆ can be viewed as a weighted sum of infinitely many normal distributions. The UD estimator approximates this infinite sum by a finite sum. The oscillations of pˆ K∆ correspond to the normal densities that are mixed by the UD estimator (16). The amplitude of the oscillations of pˆ K∆ is large for small values of K . As K increases, the amplitude of the oscillations decreases and the confidence bands become tight. This confirms the unbiasedness result of Theorem 4.1. If the number K of Monte Carlo samples is sufficiently large, the UD estimator pˆ K∆ accurately approximates the transition density over the entire range of X . In contrast, both kernel density estimators are biased. The biases are relatively large in the tails of the distribution; see Fig. 2. They are also large when the time ∆ between consecutive observations is large, as can be seen in Fig. 3. By construction, our density estimator respects the boundary of the state space S . That is, our density estimator assigns no probability mass to values outside of S . Fig. 4 illustrates this property for Model (19), in which X has the constraint state space S = [0, ∞). The 90% confidence bands of pˆ ∆ (X0 , w; θ ∗ ) are tight and close to zero for very small values of w regardless of the size of X0 . Although not displayed in Fig. 4, we know that the Gaussian kernel estimator derived from exact samples of X∆ will also restrict itself to S . If the samples are derived from Euler’s method, then the Gaussian kernel estimator may not satisfy this property because Euler’s method does not ensure that the approximate solution of the SDE (19) will stay within the state space S . Fig. 4 also shows that our density estimator is always non-negative. This property holds because our estimator is a mixture of Gaussian densities and the weights are non-negative. 
Although not displayed, the kernel density estimators are also always non-negative. However, Fig. 4 shows that the Fourier inverse estimator may become negative for extreme values of the state space S . This occurs because the numerical inversion of the Fourier transform may become numerically unstable in certain situations. 7.2. Computational efficiency We run three experiments to assess the computational efficiency of our UD estimator. We start by analyzing the convergence of the root-mean-squared error (RMSE) of the estimator pˆ K∆ (v, w; θ ) at a random parameter θ and 120 15 We find that the solutions of (17) yielding optimal configurations of the density estimator are ℓ = 27.70, L = 2.10 and ρ = 0.36 for Model (18) and ℓ = 20.94, L = 8.19, and ρ = 0.42 for Model (19). The optimizations were solved using the Nelder–Mead method. Please cite this article as: K. Giesecke and G. Schwenkler, Simulated likelihood estimators for discretely observed jump–diffusions. Journal of Econometrics (2019), https://doi.org/10.1016/j.jeconom.2019.01.015.


Fig. 1. Estimator pˆ K∆ (X0 , w; θ ∗ ) for Model (18) with K ∈ {1000, 5000, 10000}, w ∈ [0, 0.1] and ∆ = 1/12, along with 90% bootstrap confidence bands, as well as the true transition density and kernel estimators obtained from Euler’s method and exact simulation.

randomly selected points (v, w). The bias component of the RMSE is computed relative to the ''true'' density obtained by Fourier inversion. We take ∆ = 1/12 for Model (18) and ∆ = 1/4 for Model (19). Repeated calculations of the transition density at the points (v, w) = (X(n−1)∆, Xn∆) are required for evaluating the likelihood. Thus, our analysis also indicates the efficiency of computing the likelihood given 10 years of monthly data for Model (18), and 30 years of quarterly data for Model (19). Fig. 5 shows the RMSE of p̂K∆(v, w; θ) as a function of the time required to evaluate the estimator at a randomly selected θ ∈ Θ and 120 randomly selected pairs (v, w) ∈ S. It also shows the RMSE of the alternative estimators discussed above. The UD estimator has the fastest error convergence rate. It also has the smallest RMSE when the time between observations is small or when the available computational budget is large, consistent with the asymptotic computational efficiency concept of Glynn and Whitt (1992).

We next study the computational effort required to estimate the likelihood for different sample sizes. To this end, we select a random parameter θ ∈ Θ and m random pairs (v, w) ∈ S, and measure the average time it takes to evaluate p̂K∆(v, w; θ) across all m pairs (v, w) for ∆ = 1/12 and K = 10³. Fig. 6 shows the average run time as a function of m for Model (18). It also compares our average run times to those of the alternative density estimators. Our density estimator involves the smallest per-observation cost when evaluating the likelihood for a given m, and the per-observation cost decreases as m grows. Similar findings hold for Model (19) and alternative choices of ∆.

The UD estimator performs well in these experiments for two reasons. First, the samples of the basic random variates R used to compute the density estimator (see Section 5) need to be generated only once.
They can be reused to compute p̂K∆(v, w; θ) for any v, w ∈ S and any θ ∈ Θ. This yields the small and decreasing per-observation cost for estimating the likelihood. Second, our density estimator is unbiased. Alternative estimators do not exhibit both of these properties. Kernel density estimation introduces bias, which slows down the rate of convergence of the RMSE. Euler discretization introduces additional bias. As a result, the discretization-based density estimator has the slowest rate of convergence. For the other simulation-based estimator, which is based on exact samples of X∆, the samples of the process must be re-generated for every pair (v, w) at which the transition density is estimated. This increases computational costs. Finally, the Fourier density estimator of Duffie et al. (2000) is essentially error-free. However, evaluating it requires numerically solving a system of ordinary differential equations for each pair (v, w), which increases the computational costs.


Fig. 2. Estimator pˆ K∆ (X0 , w; θ ∗ ) for Model (19) with K ∈ {1000, 5000, 10000}, w ∈ [0, 0.1] and ∆ = 1/12, along with 90% bootstrap confidence bands, as well as the true transition density and kernel estimators obtained from Euler’s method and exact simulation. The y-axis in this plot is displayed in log scale.

Fig. 3. Estimator p̂K∆(X0, ·; θ∗) for Model (19) with K = 1000, X0 = 10, and ∆ ∈ {1/4, 1/2}, along with 90% bootstrap confidence bands, as well as the true transition density and kernel estimators obtained from Euler's method and exact simulation. The y-axes of the right plots are displayed in log scale.


Fig. 4. Estimator pˆ K∆ (X0 , ·; θ ∗ ) for Model (19) with K = 1000, X0 ∈ {0.1, 0.01, 0.001, 0.0001}, and ∆ = 1/12, along with 90% bootstrap confidence bands, as well as the true transition density.

Fig. 5. Root-mean-squared error (RMSE) of different density estimators as a function of computation time. The RMSE is the square root of the average squared error of an estimator of p∆ (v, w; θ ) over 120 randomly selected pairs (v, w ) and a randomly selected parameter θ . For Model (18) with ∆ = 1/12, we randomly select (v, w ) ∈ [0, 0.1]2 and θ = (0.9924, 0.0186, 0.0345, 32.6581, 2.3996, 0.0006, 0.0039) ∈ Θ . For Model (19) with ∆ = 1/4, we randomly select (v, w ) ∈ [5, 15]2 and θ = (−0.1744, 0.2342, 28.4388, 0.5418, 0.9278, 80.8277, 69.6841) ∈ Θ .

A Monte Carlo estimator of the density has the implicit benefit that it can be computed in parallel using multiple processors. If a density estimator requires K i.i.d. Monte Carlo samples and Kp processors are available, then each processor only needs to compute K/Kp Monte Carlo samples. We analyze the computational gains generated by parallel computing our


Fig. 6. Per-observation run time required to compute the likelihood of Model (18) with different density estimators as a function of the sample size m. The per-observation run time is measured as the average time necessary to estimate the density p_∆(v, w; θ) at m randomly selected pairs (v, w) ∈ [0, 0.1]² for ∆ = 1/12 and θ = (0.9924, 0.0186, 0.0345, 32.6581, 2.3996, 0.0006, 0.0039) selected randomly from Θ. We take K = 1000 for the UD estimator. For the Kernel (Euler) estimator, we use 1000 Euler samples of X_∆ to achieve roughly the same RMSE (see Fig. 5). Due to computational constraints, for the Kernel (Exact) estimator we use only 500 exact samples of X_∆.

Table 1
Run times of the different operations necessary to compute the density estimator p̂^K_∆(v, w; θ) for Model (18) at a random parameter θ ∈ Θ and 120 random (v, w) ∈ [0, 0.1]² using K_p processors.

                                                       K_p = 1   K_p = 2   K_p = 4   K_p = 6
Generating K i.i.d. samples of R                         9.84      9.84      9.84      9.84
Computing one sample of p̂_∆ given R                      0.10      0.10      0.10      0.10
Computing p̂^K_∆ given K samples of R for K = 1000       98.86     53.64     31.93     27.19
Total run time in seconds                               108.70     63.48     41.77     37.03
Speed-up factor                                             —       1.71      2.60      2.94

density estimator. For Model (18), we measure the time it takes to generate K = 1000 i.i.d. samples of the basic random variates R and parallel compute p̂_∆(v, w; θ) at a random parameter θ ∈ Θ and 120 random pairs (v, w) ∈ [0, 0.1]² using K_p ∈ {1, 2, 4, 6} processors. Table 1 shows that parallelization results in a significant reduction of the run time necessary to evaluate our density estimator. Increasing the number of processors from 1 to 4 reduces the total run time by a factor of 2.6. There are other possibilities to further reduce the run time required to evaluate our UD estimator. Graphics processing units (GPUs) may be used, for example.

7.3. Simulated likelihood estimators

A Monte Carlo analysis illustrates the properties of the simulated maximum likelihood estimators (SMLE). We generate 100 samples of the data X = {X_0, X_∆, ..., X_{m∆}} from the law P_{θ*} for m = 600 and ∆ = 1/12 for Model (18), and m = 400 and ∆ = 1/4 for Model (19), using the exact algorithm of Giesecke and Smelov (2013). This corresponds to 50 years of monthly data for Model (18) and 100 years of quarterly data for Model (19). For each data sample, we compute an SMLE θ̂_m^K by maximizing the simulated likelihood L̂^K, and an MLE θ̂_m by maximizing the likelihood obtained from the true transition density. The Nelder–Mead method, initialized at θ*, is used to numerically solve the optimization problems. In accordance with Theorem 6.2, we choose K = 10m for Model (18) and K = 15m for Model (19) in order to guarantee asymptotic normality of the SMLE.

We test the asymptotic unbiasedness of a SMLE (Theorem 6.1). Table 2 compares the average deviation E[θ̂_m^K − θ̂_m] of a SMLE from a MLE to the average deviation (E[(θ̂_m − θ*)²])^{1/2} of a MLE from the true parameter, for Model (18) with ∆ = 1/12 in Panel A, and Model (19) with ∆ = 1/4 in Panel B. The expectations are estimated by sample averages across all 100 data sets.
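The estimation loop can be made concrete with a stylized sketch. All modeling choices below are our own illustrative assumptions (a drift–diffusion with compound Poisson Gaussian jumps, not Model (18) or (19)): conditioning on the total jump size makes the one-step transition Gaussian, so averaging the conditional density over simulated jumps yields an unbiased Monte Carlo transition density in the spirit of p̂^K_∆; the simulated log-likelihood is then summed over observation pairs and maximized. The paper uses Nelder–Mead; for a dependency-free sketch we scan a coarse grid for the drift parameter, reusing a common random seed across candidates.

```python
import math
import random

# Stylized sketch of simulated maximum likelihood. Illustrative model
# (an assumption, not the paper's Model (18)/(19)):
#   X_{t+D} = X_t + mu*D + sigma*sqrt(D)*Z + sum of N jumps,
#   N ~ Poisson(l0*D), jump sizes ~ N(g1, g2^2).
# Conditioning on the total jump size J makes the transition Gaussian, so
# averaging the conditional Gaussian density over K simulated J's gives an
# unbiased Monte Carlo transition-density estimator.

def norm_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def poisson(lam, rng):
    """Poisson draw by inversion of the cdf."""
    u, p, cdf, n = rng.random(), math.exp(-lam), 0.0, 0
    while True:
        cdf += p
        if u <= cdf:
            return n
        n += 1
        p *= lam / n

def density_hat(v, w, theta, delta, K, rng):
    mu, sigma, l0, g1, g2 = theta
    acc = 0.0
    for _ in range(K):
        J = sum(rng.gauss(g1, g2) for _ in range(poisson(l0 * delta, rng)))
        acc += norm_pdf(w, v + mu * delta + J, sigma * math.sqrt(delta))
    return acc / K

def sim_loglik(data, theta, delta, K, rng):
    return sum(math.log(density_hat(v, w, theta, delta, K, rng))
               for v, w in zip(data[:-1], data[1:]))

# Simulate a path under a known parameter, then recover the drift mu by
# maximizing the simulated likelihood over a coarse grid (the paper uses
# Nelder-Mead instead). A common seed per candidate stabilizes comparison.
rng = random.Random(7)
delta, m = 1.0 / 12, 400
x, data = 0.0, [0.0]
for _ in range(m):
    J = sum(rng.gauss(0.1, 0.05) for _ in range(poisson(2.0 * delta, rng)))
    x += 0.5 * delta + 0.3 * math.sqrt(delta) * rng.gauss(0, 1) + J
    data.append(x)
grid = [0.5 * i for i in range(-4, 5)]  # candidate mu values
best_mu = max(grid, key=lambda mu: sim_loglik(
    data, (mu, 0.3, 2.0, 0.1, 0.05), delta, 200, random.Random(0)))
```

With the jump and volatility parameters held at their true values, the simulated likelihood sharply identifies the drift, so the grid search recovers mu = 0.5.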
The values in Table 2 show that the average "error" of a SMLE is small when compared to the average error of a MLE. The null hypothesis that the error of a SMLE is equal to zero cannot be rejected for any model parameter over any horizon based on the asymptotic distribution implied by Theorem 6.2. This verifies the asymptotic unbiasedness property.

We analyze the finite-sample distribution of a SMLE. Tables 3 and 4 compare the mean and the standard deviation of the scaled errors √m(θ̂_m^K − θ*) for a SMLE and √m(θ̂_m − θ*) for a MLE across all 100 data sets for Models (18) and (19). They also indicate the theoretical asymptotic means and standard deviations of √m(θ̂_m^K − θ*) and √m(θ̂_m − θ*) in accordance


Table 2
Average deviation of a SMLE from a MLE and average deviation of a MLE from the true parameter θ* over all 100 data samples for Models (18) and (19).

Panel A: Model (18), ∆ = 1/12

        m = 120, K = 1200                          m = 600, K = 6000
        E[θ̂_m^K − θ̂_m]   (E[(θ̂_m − θ*)²])^{1/2}    E[θ̂_m^K − θ̂_m]   (E[(θ̂_m − θ*)²])^{1/2}
k        0.0898           0.3627                    0.3332           0.4296
X̄        0.0103           0.1739                   −0.0734           0.1161
σ        0.0138           0.0173                    0.0159           0.0154
l0      −1.1346           3.2255                   −0.5079           2.5916
l1      −2.1320           3.7097                   −3.0502           4.1564
γ1      −0.0003           0.0019                    0.0007           0.0008
γ2       0.0023           0.0027                    0.0006           0.0016

Panel B: Model (19), ∆ = 1/4

        m = 200, K = 3000                          m = 400, K = 6000
        E[θ̂_m^K − θ̂_m]   (E[(θ̂_m − θ*)²])^{1/2}    E[θ̂_m^K − θ̂_m]   (E[(θ̂_m − θ*)²])^{1/2}
μ        0.1218           0.3465                    0.1885           0.2987
σ        0.1809           0.1618                    0.2882           0.2293
l0       2.5005          12.8842                   15.6174          14.2461
l1      −0.1475           0.8127                   −0.1935           0.8247
p        0.0524           0.3797                    0.2666           0.3907
η1      −0.7392          14.1629                   −1.1473          18.1835
η2      −1.4902           9.4348                   −4.5843          12.1261

with Theorem 6.2. For most parameters, the moments of the scaled error of a SMLE are similar to the corresponding moments of the scaled error of a MLE over all time horizons for both models. Based on the asymptotic standard errors indicated in the column "Asymp." of Table 3, we cannot reject the null hypothesis that the differences between the scaled error moments of a SMLE and a MLE are equal to zero for any value of m. This tells us that the distribution of a SMLE is similar to the distribution of a MLE, confirming the results of Theorem 6.2.

For Model (19) with ∆ = 1/4, Table 4 also shows the means and standard deviations of the scaled errors of the parameter estimators derived from the Kernel (Euler) estimator; see Section 7.1.[16] This method is implemented such that the evaluation of the corresponding likelihood takes the same computational effort as the evaluation of the simulated likelihood derived from our density estimator. We see that the moments of the scaled parameter errors derived from the Kernel (Euler) estimator differ strongly from those of our SMLE and the true MLE. The standard deviations are too small compared to the theoretical asymptotic distribution of an MLE. These findings suggest that the parameter estimators derived from kernel estimation based on Euler discretization have distorted finite-sample and asymptotic distributions, consistent with the findings of Detemple et al. (2006). Furthermore, the findings indicate that, when using the same computational budget, our simulated likelihood estimators are more accurate approximations of true likelihood estimators than the parameter estimators derived from kernel estimation based on Euler discretization.

7.4. Diffusion case

Our density estimator also applies in the diffusion case. In that case, our estimator can entail computational advantages relative to the diffusion density estimator of Beskos et al. (2009).
This is because we can specify a diffusion model in (1)–(2) by taking the jump magnitude Γ = 0, setting Λ ≡ ℓ, and optimally selecting the tuning parameters (ℓ, ρ, L) to obtain a computationally efficient estimator (see Section 5.2). In a final analysis, we study the computational benefits of our approach in the diffusion case by repeating the experiment of Fig. 5 for three diffusive models: Model (18) with l0 = ℓ and l1 = γ1 = γ2 = 0; a CIR model for which X solves

    dX_t = 0.8542 (0.08 − X_t) dt + 0.173 √X_t dB_t;

and the logistic model of Beskos et al. (2009) for which X solves[17]

    dX_t = 0.1 X_t (1 − 0.001 X_t) dt + 0.1 X_t dB_t.

[16] We do not carry out the analogous analysis using the Kernel (Exact) estimator from Section 7.1 because this would require that one re-simulates samples of X_∆ for every evaluation of the density, which makes the numerical optimization of the corresponding likelihood approximation unstable and imprecise.
[17] The parameter values for the CIR process are chosen so as to roughly match the annual dynamics of the S&P 500 index. For the logistic model we take the same parameters as Beskos et al. (2009). We do not consider a diffusive version of Model (19) because the estimator of Beskos et al. (2009) is trivial for this model.
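For reference, the "true transition density" benchmark for the CIR model above is available in closed form as a scaled noncentral chi-square distribution. A stdlib-only sketch, where the Poisson-mixture truncation and the integration grid are our own tolerance choices:

```python
import math

# Exact CIR transition density via the noncentral chi-square law: for
#   dX = kappa*(xbar - X) dt + sigma*sqrt(X) dB,
#   c * X_t | X_0 = x ~ ncx2(df = 4*kappa*xbar/sigma^2, nc = c*x*e^{-kappa*t}),
#   c = 4*kappa / (sigma^2 * (1 - e^{-kappa*t})).
# The ncx2 pdf is evaluated as a Poisson mixture of central chi-square
# pdfs, truncated at n_terms (a tolerance choice of ours).

def chi2_logpdf(z, df):
    return ((0.5 * df - 1.0) * math.log(z) - 0.5 * z
            - 0.5 * df * math.log(2.0) - math.lgamma(0.5 * df))

def ncx2_pdf(z, df, nc, n_terms=300):
    total = 0.0
    for j in range(n_terms):
        log_w = -0.5 * nc + j * math.log(0.5 * nc) - math.lgamma(j + 1.0)
        total += math.exp(log_w + chi2_logpdf(z, df + 2.0 * j))
    return total

def cir_density(x0, y, t, kappa, xbar, sigma):
    """Transition density of X_t at y given X_0 = x0."""
    c = 4.0 * kappa / (sigma ** 2 * (1.0 - math.exp(-kappa * t)))
    nc = c * x0 * math.exp(-kappa * t)
    df = 4.0 * kappa * xbar / sigma ** 2
    return c * ncx2_pdf(c * y, df, nc)

# CIR parameters from the model above, over a monthly horizon.
kappa, xbar, sigma, x0, t = 0.8542, 0.08, 0.173, 0.08, 1.0 / 12
# Sanity check: the density should integrate to ~1 over a wide grid.
grid = [0.001 + i * (0.25 - 0.001) / 1200 for i in range(1201)]
vals = [cir_density(x0, y, t, kappa, xbar, sigma) for y in grid]
step = grid[1] - grid[0]
integral = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
```

A closed-form benchmark of this kind is what allows the RMSE of the simulated estimators to be computed exactly in the diffusion experiments.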


Table 3
Empirical means ("M") and standard deviations ("SD") of the scaled error √m(θ̂_m^K − θ*) of a SMLE and the scaled error √m(θ̂_m − θ*) of a MLE, over 100 independent data samples for Model (18) with ∆ = 1/12. The table also shows the differences ("Diff.") between the sample moments of the scaled errors of a SMLE and a MLE, as well as the average asymptotic moments ("Asymp.") of a SMLE in accordance with Theorem 6.2. The asymptotic mean is zero; the asymptotic standard deviations are given by the average of the square roots of the diagonal entries of Σ_{θ*}^{−1} in Theorem 6.2 across all 100 data sets.

m = 120, K = 1200

                          SMLE       MLE       Diff.     Asymp.
√m(k̂ − k*)        M      0.741    −0.243      0.984      0
                  SD     2.586     3.965     −1.379      4.614
√m(X̄̂ − X̄*)        M      0.251     0.138      0.113      0
                  SD     0.926     1.900     −0.974      2.089
√m(σ̂ − σ*)        M      0.091    −0.060      0.151      0
                  SD     0.241     0.179      0.062      0.499
√m(l̂0 − l0*)      M      6.517    18.946    −12.429      0
                  SD    23.988    29.763     −5.775     50.588
√m(l̂1 − l1*)      M     10.033    33.388    −23.355      0
                  SD    19.316    22.921     −3.605    110.867
√m(γ̂1 − γ1*)      M     −0.003    −0.001     −0.003      0
                  SD     0.043     0.021      0.022      0.161
√m(γ̂2 − γ2*)      M      0.009    −0.016      0.025      0
                  SD     0.104     0.025      0.079      0.207

m = 600, K = 6000

                          SMLE       MLE       Diff.     Asymp.
√m(k̂ − k*)        M     −1.012    −9.207      8.195      0
                  SD     5.925     5.026      0.900      4.614
√m(X̄̂ − X̄*)        M     −0.182    −1.689      1.507      0
                  SD     4.181     2.376      1.805      2.089
√m(σ̂ − σ*)        M      0.183    −0.219      0.403      0
                  SD     0.529     0.309      0.220      0.499
√m(l̂0 − l0*)      M     32.369    46.822    −14.453      0
                  SD   103.731    42.760     60.972     50.588
√m(l̂1 − l1*)      M     18.342    91.691    −73.349      0
                  SD    54.420    42.808     11.612    110.867
√m(γ̂1 − γ1*)      M      0.006    −0.011      0.018      0
                  SD     0.034     0.016      0.018      0.161
√m(γ̂2 − γ2*)      M      0.000    −0.013      0.013      0
                  SD     0.047     0.037      0.010      0.207

We select 120 random pairs (v, w) ∈ S and evaluate the RMSE of our density estimator with the optimal choices of ℓ, L, and ρ across all 120 pairs of (v, w).[18] In Fig. 7 we plot the resulting RMSE for the two diffusive models against the time it takes to compute our density estimator for the 120 pairs (v, w). We also show the analogous plot for the density estimator of Beskos et al. (2009).[19] Because both our estimator and the estimator of Beskos et al. (2009) are unbiased, they converge at the same rate. However, our density estimator achieves smaller errors than the estimator of Beskos et al. (2009) for any given computational budget.

Appendix A. Proofs

[18] For the CIR model ℓ* = 0.11, L* = 9.67, and ρ* = 0.43. For the logistic model ℓ* = 0.13, L* = 19.34, and ρ* = −0.22. For Model (18) we use the same values as in Section 7.2.
[19] The CIR model and Model (18) violate a key boundedness assumption of Beskos et al. (2009) so that their density estimator is not directly applicable. We circumvent this issue by incorporating the localization step of Chen and Huang (2013) into the density estimator of Beskos et al. (2009) in a similar fashion as in Section 4. We use the same localization boundary L as for our estimator.

Proof of Theorem 3.1. We begin by showing that the transition density of Y^x exists and deriving a representation for this density. For clarity, we write Y instead of Y^x. Assumption (A2) implies that Z_∆ is a valid Radon–Nikodym derivative and that the local P_θ-martingale (Z_t)_{t≤∆} given by Z_t = E_θ[Z_∆ | F_t] is a uniformly integrable martingale. Thus, the measures


Table 4
Empirical means ("M") and standard deviations ("SD") of the scaled error √m(θ̂_m^K − θ*) of a SMLE and the scaled error √m(θ̂_m − θ*) of a MLE, over 100 independent data samples of Model (19) with ∆ = 1/4. When computing SMLE, we set K = 6000 for m = 400, and K = 3000 for m = 200. The table also shows the analogous scaled errors of the parameter estimators derived from the Kernel (Euler) estimator ("Euler") as described in Section 7.1, and the average asymptotic moments ("Asymp.") of a SMLE in accordance with Theorem 6.2. The asymptotic mean is zero; the asymptotic standard deviations are given by the average of the square roots of the diagonal entries of Σ_{θ*}^{−1} across all 100 data sets. For Kernel (Euler), we set K = 1400 for m = 200, and K = 2300 for m = 400.

Model (19), ∆ = 1/4

              m = 200                          m = 400                          Asymp.
          SMLE      MLE      Euler        SMLE      MLE      Euler
μ   M     1.870     0.148    −0.018       4.147     0.446     0.057             0
    SD    3.078     4.462     0.635       5.372     4.771     0.717            20.445
σ   M     2.158    −0.401    −0.058       5.485    −0.311    −0.039             0
    SD    3.230     1.001     0.242       5.746     0.843     0.227             6.264
l0  M    92.747    57.385     9.990     371.237    66.263     8.424             0
    SD  132.869   166.556    14.138     264.964   182.975    15.685           181.520
l1  M    −0.994     1.092     0.862      −4.014     0.094     0.601             0
    SD    2.167    11.421     1.503       5.987    15.617     2.075           216.612
p   M     0.908     0.167     0.197       4.824    −0.578     0.114             0
    SD    3.533     4.749     0.810       6.438     5.500     0.984            43.010
η1  M   −18.765    −8.310    17.309     −76.052   −61.220    36.686             0
    SD  114.614   194.367    23.556     307.697   320.415    36.806           338.527
η2  M     6.459    27.533    24.319     −65.716    33.399    21.623             0
    SD  141.653   130.804    26.252     228.121   172.182    31.143           127.635

P* = P*_θ and P_θ are equivalent. In this case, the distribution of N under P* is the same as that of N*. It follows for any measurable set A:[20]

    P[Y_∆ ∈ A] = E*[ 1_{Y_∆ ∈ A} / Z_∆ ]
               = Σ_{n≥0} E*[ (e^{−a(Y_{T_n^*})} / Z_{T_n^*}) 1_{T_n^* ≤ ∆} E*[ 1_{Y_∆ ∈ A} 1_{N_∆^* = n} e^{a(Y_∆) − ∫_{T_n^*}^{∆} b(Y_s) ds} | F_{T_n^*} ] ]
               = Σ_{n≥0} E*[ (e^{−a(Y_{T_n^*})} / Z_{T_n^*}) 1_{T_n^* ≤ ∆} E*[ 1_{Y_∆ ∈ A} 1_{T_{n+1}^* > ∆} e^{a(Y_∆) − ∫_{T_n^*}^{∆} b(Y_s) ds} | F_{T_n^*} ] ].

Consider the inner expectation in the previous equality. On the set {T_n^* ≤ ∆},

    E*[ 1_{Y_∆ ∈ A} 1_{T_{n+1}^* > ∆} e^{a(Y_∆)} e^{−∫_{T_n^*}^{∆} b(Y_s) ds} | F_{T_n^*} ]
      = E*[ 1_{Y_∆ ∈ A} e^{a(Y_∆)} e^{−∫_{T_n^*}^{∆} b(Y_s) ds} | F_{T_n^*}, T_{n+1}^* > ∆ ] P*[ T_{n+1}^* > ∆ | F_{T_n^*} ]
      = E*[ 1_{Y_∆ ∈ A} e^{a(Y_∆)} f_{∆−T_n^*}(Y_{T_n^*}, Y_∆) | F_{T_n^*}, T_{n+1}^* > ∆ ] P*[ T_{n+1}^* > ∆ | F_{T_n^*} ]
      = ∫_A q_{∆−T_n^*}(Y_{T_n^*}, z) e^{a(z)} f_{∆−T_n^*}(Y_{T_n^*}, z) dz · P*[ T_{n+1}^* > ∆ | F_{T_n^*} ].

The second equality above follows from iterated expectations. To see this, note that on the conditioning event {T_{n+1}^* > ∆}, the process Y evolves on the interval [T_n^*, ∆] as a P*-Brownian motion with drift ρ and initial value Y_{T_n^*}. Therefore, on {T_n^* ≤ ∆} we get

    E*[ e^{−∫_{T_n^*}^{∆} b(Y_s) ds} | F_{T_n^*}, Y_∆, T_{n+1}^* > ∆ ] = f_{∆−T_n^*}(Y_{T_n^*}, Y_∆).

That "conditionally Brownian" property of Y also allows us to invoke the Gaussian density q_{∆−T_n^*} in the third equality above. Given that all integrands are non-negative, Fubini's Theorem yields

    P[Y_∆ ∈ A] = ∫_A Σ_{n≥0} E*[ 1_{N_∆^* = n} (e^{a(z) − a(Y_{T_n^*})} / Z_{T_n^*}) q_{∆−T_n^*}(Y_{T_n^*}, z) f_{∆−T_n^*}(Y_{T_n^*}, z) ] dz

[20] Note that if ℓ = 0, then N_∆^* = 0 and T_n^* = ∞ almost surely for n > 0.


Fig. 7. Root-mean-squared error (RMSE) as a function of computation time in the diffusion case. The RMSE is the square root of the average squared error of an estimator of p_∆(v, w; θ) over 120 randomly selected pairs (v, w) and ∆ = 1/12. For the CIR model, we randomly select (v, w) ∈ [0.02, 0.16]². For the logistic model, we randomly select (v, w) ∈ [600, 800]². For Model (18), we randomly select (v, w) ∈ [0, 0.1]².

               = ∫_A Σ_{n≥0} E*[ 1_{N_∆^* = n} e^{a(z) − a(Y_{T_n^*})} ( ∏_{k=1}^{n} Z_{T_{k−1}^*} / Z_{T_k^*} ) q_{∆−T_n^*}(Y_{T_n^*}, z) f_{∆−T_n^*}(Y_{T_n^*}, z) ] dz,

where, by convention, ∏_{k=1}^{0} Z_{T_{k−1}^*} / Z_{T_k^*} = 1. Note that

    Z_{T_{k−1}^*} / Z_{T_k^*} = exp( a(Y_{T_k^*}) − a(Y_{T_{k−1}^*}) ) c( Y_{T_k^*−}, D_k ) exp( −∫_{T_{k−1}^*}^{T_k^*} b(Y_s) ds ).

(Y_{T_k^*})_{k≤n} is a Markov chain on the event {N_∆^* = n}. A repeated application of iterated expectations, conditioning on the σ-algebras generated by (Y_t, N_t^*)_{t ∈ [0, T_{k−1}^*] ∪ [T_k^*, ∆]}, thus yields:

    P[Y_∆ ∈ A] = ∫_A Σ_{n≥0} E*[ 1_{N_∆^* = n} e^{a(z) − a(Y_0)} q_{∆−T_n^*}(Y_{T_n^*}, z) f_{∆−T_n^*}(Y_{T_n^*}, z)
                     × ∏_{k=1}^{n} c( Y_{T_k^*−}, D_k ) f_{T_k^* − T_{k−1}^*}( Y_{T_{k−1}^*}, Y_{T_k^*−} ) ] dz
               = ∫_A e^{a(z) − a(Y_0)} Ψ_∆(Y_0, z; θ) dz.


It follows that the transition density of Y exists and satisfies p_∆^Y(x, y; θ) = e^{a(y) − a(x)} Ψ_∆(x, y; θ). Given that Y_∆ = F(X_∆; θ) and F is invertible, the change-of-variable rule implies that the transition density of X also exists and is given by (3). □

Proof of Proposition 3.2. We will show that Conditions (A1)–(A4) of Glasserman (2003, Section 7.2.2) are satisfied. Note that the P*_θ-evolution of Y^x is driven by a standard Brownian motion W and a Poisson process N* with rate ℓ such that

    Y_t^x = x + ρt + W_t + Σ_{n=1}^{N_t^*} Γ_Y( Y_{T_n^*−}^x, D_n; θ )

for D_n ∼ π. Thus, Y_t^x is pathwise differentiable as in Section 7.2 of Glasserman (2003). Assumption (A3) therefore implies that Condition (A1) of Glasserman (2003, Section 7.2.2) is satisfied. Let H denote the integrand of Ψ_∆(x, y; θ) in (9) so that Ψ_∆(x, y; θ) = E*_θ[H(x, y; θ)]. Under P*_θ, N* is a Poisson process with rate ℓ and therefore does not depend upon θ. The Gaussian density q is n-times continuously differentiable in all of its arguments, as is the function c under Assumption (A3). Because of the Brownian bridge structure,

    f_t(u, z; θ) = E*_θ[ exp( −∫_0^t b( u + ρs + (s/t)(z − u − ρt) + W_s − (s/t)W_t ; θ ) ds ) ]

for t ∈ [0, ∆]. Assumption (A4) implies that the order of integration and differentiation can be interchanged for f_t, so that Assumption (A3) implies that f_t is also n-times continuously differentiable in all of its arguments. As a result, H(x, y; θ) is continuously differentiable in x, y, and θ, and Condition (A2) of Glasserman (2003, Section 7.2.2) holds. The discussion on p. 395 of Glasserman (2003) implies that it suffices if Condition (A4) holds locally over bounded subsets. Given that H(x, y; θ) is continuously differentiable, it follows that H(x, y; θ) is locally Lipschitz continuous over bounded subsets, so that Condition (A4) of Glasserman (2003, Section 7.2.2) holds. Condition (A3) of Glasserman (2003, Section 7.2.2) naturally holds (in our setting, the function f of Glasserman (2003) is the identity). □

Proof of Theorem 4.1. The law of iterated expectations implies the claim if f̂_t is an unbiased estimator of f_t. We therefore only show that f̂_t is an unbiased estimator of f_t. Assumption (B1) implies that the function y ↦ b(y; θ) is continuous, so the functions m_i and M_i are well defined and finite. Assumptions (B1) and (B2) together imply that M_i − m_i > 0, so that the dominating Poisson process with jump times (τ_{i,j})_{1≤j≤p_i} in (15) is well-defined. Standard thinning arguments (see Lewis and Shedler (1979) or Theorem 4.3 of Chen and Huang (2013)) imply that

    ∏_{j=1}^{p_i} ( 1 − ( b( u + ρ(E_{i−1} + τ_{i,j}) + w_{i,j} ; θ ) − m_i ) / ( M_i − m_i ) )

is an unbiased estimator of

    e^{m_i t} E*_θ[ exp( −∫_0^t b( u + ρs + W_s ; θ ) ds ) | W_t ].

The claim follows from (15) after fixing W_t = z − u − ρt. □
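The thinning identity invoked in this proof can be checked numerically. A sketch of the resulting unbiased Poisson estimator of exp(−∫_0^t b(s) ds) for b bounded in [m, M] — here with a deterministic test integrand of our own choosing, whereas the proof evaluates b along the Brownian bridge:

```python
import math
import random

# Unbiased Poisson-thinning estimator underlying the proof: for b with
# m <= b(s) <= M on [0, t] and tau_j the points of a rate-(M - m)
# Poisson process on [0, t],
#   exp(-m*t) * prod_j (1 - (b(tau_j) - m) / (M - m))
# has expectation exp(-integral_0^t b(s) ds). The integrand b below is a
# deterministic test function (an assumption for illustration only).

def thinning_estimator(b, m, M, t, rng):
    lam = M - m
    est = math.exp(-m * t)
    s = rng.expovariate(lam)  # first arrival of the dominating process
    while s <= t:
        est *= 1.0 - (b(s) - m) / lam
        s += rng.expovariate(lam)
    return est

rng = random.Random(0)
b = lambda s: 1.0 + 0.5 * math.sin(s)  # bounded in [0.5, 1.5]
n = 200000
avg = sum(thinning_estimator(b, 0.5, 1.5, 2.0, rng) for _ in range(n)) / n
exact = math.exp(-(2.0 + 0.5 * (1.0 - math.cos(2.0))))  # exp(-int_0^2 b)
```

Because each factor 1 − (b(τ_j) − m)/(M − m) lies in [0, 1], the estimator is nonnegative and bounded, and its Monte Carlo average converges to the exact exponential functional.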

Proof of Proposition 4.2. Fix v, w ∈ S and θ ∈ Θ, and set x = F(v; θ) and y = F(w; θ). We will bound the second moments of the factors that go into the density estimator (16) individually, and then combine all bounds. For clarity, we suppress θ in E*_θ and P*_θ, and x in Y^x. We begin with terms of the type f̂_t(u, z; θ) for u, z ∈ S_Y and t ∈ [0, ∆]. It can easily be seen that f̂(u, z; θ) ≤ e^{−m̄} for m̄ = min_{|u′| ≤ L(I+1)+ρ∆} b(u + u′; θ). Assumption (B3) implies that if I = i for sufficiently large i, then

    e^{−m̄} = O( e^{|u| + L(i+1)} ).

Now, P*[I = i] = P*[W_∆ ∈ [iL, (i+1)L)] ≤ P*[|W_∆| ≥ iL]. Given that W is a standard Brownian motion under P*, we have W_∆ ∼ √∆ U for U ∼ N(0, 1). Markov's inequality yields

    E*[ f̂²(u, z; θ) ] = Σ_{i≥0} E*[ e^{−2m̄} | I = i ] P*[I = i]
      ≤ Σ_{i≥0} E*[ e^{−2m̄} | I = i ] P*[|W_∆| ≥ iL]
      ≤ E*[ e^{U²/4} ] Σ_{i≥0} E*[ e^{−2m̄} | I = i ] e^{−L²i²/(4∆)}
      = O( Σ_{i≥0} e^{2(|u| + L(i+1)) − L²i²/(4∆)} ) = O( e^{2|z|} ).    (20)

Next, we bound the conditional second moment of the term q_{∆−T_{N_∆^*}^*}(u, y) given N_∆^* = n for u ∈ S_Y. Note that T_{N_∆^*}^* ∼ ∆U_{(n)} conditional on N_∆^* = n under P*, where U_{(n)} is the nth order statistic of n independent standard uniform samples. The distribution of U_{(n)} is Beta(n, 1). As a result,

    E*[ q²_{∆−T_{N_∆^*}^*}(u, y) | N_∆^* = n ] = ∫_0^1 (Γ(n+1)/(Γ(n)Γ(1))) s^{n−1} q²_{∆(1−s)}(u, y) ds ≤ n sup_{s∈[0,1]} q²_{∆(1−s)}(u, y).


If u ≠ y, then q_∆(u, y) > 0 and lim_{s↗1} q_{∆(1−s)}(u, y) = 0. The continuity and nonnegativity of the Gaussian density imply that sup_{s∈[0,1]} q²_{∆(1−s)}(u, y) < ∞, so that

    E*[ q²_{∆−T_{N_∆^*}^*}(u, y) | N_∆^* = n ] = O(n)

if u ≠ y. Together with (20), Assumption (B5), and the fact that Y_{T_{N_∆^*}^*} ≠ y almost surely, we conclude that

    E*[ p̂²_∆(v, w; θ) ] = O( E*[ N_∆^* Σ_{n=1}^{N_∆^*} Y_{T_n^*−}^{2k*} e^{4Y_{T_n^*−}} ] ).    (21)

Given that Y follows a Brownian motion with drift between jump times, we have for given y₁ ∈ S_Y, n ≤ N_∆^*, and t ∈ [0, ∆]:

    E*[ Y_{T_n^*−}^{2k*} e^{4Y_{T_n^*−}} | Y_{T_{n−1}^*} = y₁, T_n^* − T_{n−1}^* = t ] = E*[ (y₁ + ρt + √t U)^{2k*} e^{4(y₁ + ρt + √t U)} ] = e^{4(y₁ + ρt) + 8t} E*[ Ũ^{2k*} ],

where Ũ ∼ N(y₁ + (ρ + 4)t, t). As a result,

    E*[ Y_{T_n^*−}^{2k*} e^{4Y_{T_n^*−}} | Y_{T_{n−1}^*} = y₁, T_n^* − T_{n−1}^* = t ] = O( y₁^{2k*} e^{4y₁} ).    (22)

Assumption (B4) tells us that

    E*[ Y_{T_n^*}^{2k*} e^{4Y_{T_n^*}} | Y_{T_n^*−} = y₁ ] = O( y₁^{2k*} e^{4y₁} ).    (23)

We combine (21)–(23) and conclude that

    E*[ p̂²_∆(v, w; θ) ] = O( E*[ N_∆^* x^{2k*} e^{4x} ] ) = O( x^{2k*} e^{4x} ℓ∆ ) < ∞.    (24)

In light of (24), the claim follows from the unbiasedness of the density estimator established in Theorem 4.1. □

Proof of Corollary 4.3. We again show that Conditions (A1)–(A4) of Glasserman (2003, Section 7.2.2) are satisfied (in our setting, the function f in Glasserman (2003) is the identity function). Assumptions (A3) and (B6) imply that f̂_{t₂−t₁}(Y_{t₁}^x, Y_{t₂}^x; θ) is n-times continuously differentiable in θ ∈ int Θ and x ∈ int S_Y except if τ_{k,p_k} = η. Nevertheless, this event occurs with zero probability. Thus, the set of discontinuities of the mapping θ ↦ p̂_∆(v, w; θ) is a null set, and Conditions (A1) and (A2) of Glasserman (2003, Section 7.2.2) hold. Proposition 3.2 together with Assumption (B6) implies that p̂_∆(v, w; θ) is locally Lipschitz continuous, so that Condition (A4) of Glasserman (2003, Section 7.2.2) is satisfied. Finally, Condition (A3) of Glasserman (2003, Section 7.2.2) is naturally satisfied. □

Proof of Theorem 6.1. Since L̂^K is the product of the simulated transition densities as in (18), it suffices to show that the simulated transition densities converge almost surely, and uniformly in the parameter space, to the true transition densities. Fix v, w ∈ S and t > 0. Set x = F(v; θ) and y = F(w; θ). Define the function

    θ ∈ Θ ↦ H(θ) = p̂^K_∆(v, w; θ) − p_∆(v, w; θ).    (25)

Corollary 4.3 implies that H is a continuous function on the parameter space Θ. Define (C_Θ, ∥·∥) as the space of continuous functions on Θ, equipped with the supremum norm. Since Θ is compact, (C_Θ, ∥·∥) is a separable Banach space. Now, H ∈ C_Θ and we can rewrite H = (1/K) Σ_{k=1}^K h_k for a sequence (h_k)_{k=1,...,K} of i.i.d. samples of p̂_∆(v, w; θ) − p_∆(v, w; θ). Thus,

    E[∥h_k∥] ≤ sup_{θ∈Θ} p_∆(v, w; θ) + E[ sup_{θ∈Θ} p̂_∆(v, w; θ) ] < ∞

by Theorem 3.1 and Assumption (C1). Further, E[h_k] = 0 by Theorem 4.1. Thus, we are in position to use the strong law of large numbers in separable Banach spaces (see, e.g., Beskos et al. (2009, Theorem 2)). It follows that sup_{θ∈Θ} |p̂^K_∆(v, w; θ) − p_∆(v, w; θ)| → 0 almost surely as K → ∞. Consequently, also sup_{θ∈Θ} |L̂^K − L| → 0 almost surely as K → ∞, and the asymptotic unbiasedness of θ̂_m^K follows. For (strong) consistency, note that if K → ∞ as m → ∞, also lim_{m→∞} θ̂_m^K = lim_{m→∞} lim_{K→∞} θ̂_m^K = lim_{m→∞} θ̂_m = θ* in probability (almost surely for strong consistency), given that Θ is compact. This follows since the SMLE θ̂_m^K is asymptotically unbiased almost surely and the MLE θ̂_m is (strongly) consistent. □

Proof of Theorem 6.2. A Taylor expansion together with the smoothness of p̂^K_∆ implied by Corollary 4.3, the consistency of θ̂_m^K as in Theorem 6.1, and Condition (C2) lead to

    ( −(1/m) ∇² log L̂^K(θ*) ) √m (θ̂_m^K − θ*) = (1/√m) [ ∇ log L̂^K(θ*) − ∇ log L(θ*) ] + (1/√m) ∇ log L(θ*) + o_P(1).    (26)


The asymptotic efficiency of an MLE θ̂_m implies that the second summand on the right-hand side of (26) converges in P_{θ*}-distribution to N(0, Σ_{θ*}), where Σ_{θ*} is the Fisher information matrix. Thus, it remains to study the convergence of the first summand on the right-hand side of (26), as well as of the term −(1/m) ∇² log L̂^K(θ*). The strong law of large numbers and Corollary 4.3 imply that, for any v, w ∈ S and θ ∈ int Θ,

    ∇² p̂^K_∆(v, w; θ) → E*_θ[ ∇² p̂_∆(v, w; θ) ] = ∇² E*_θ[ p̂_∆(v, w; θ) ] = ∇² p_∆(v, w; θ)

almost surely as K → ∞. It follows that

    (1/m) ∇² log L̂^K(θ*) → lim_{m→∞} (1/m) ∇² log L(θ*) = −Σ_{θ*}    (27)

in probability as m → ∞ and K → ∞ because of the asymptotic efficiency of an MLE.

Write G_m^K = (1/√m) [ ∇ log L̂^K(θ*) − ∇ log L(θ*) ] for the first summand on the right-hand side of (26). Now G_m^K = (1/√m) Σ_{n=1}^m (1/h_n^K) (1/K) Σ_{k=1}^K g_n^k for

    g_n^k = ∇( p̂^k_∆(X_{(n−1)∆}, X_{n∆}; θ) / p_∆(X_{(n−1)∆}, X_{n∆}; θ) ) |_{θ=θ*},    h_n^K = (1/K) Σ_{k=1}^K p̂^k_∆(X_{(n−1)∆}, X_{n∆}; θ*) / p_∆(X_{(n−1)∆}, X_{n∆}; θ*),

and a sequence (p̂^k_∆)_{k=1,...,K} of i.i.d. samples of the unbiased density estimator. From Theorem 4.1 and Proposition 4.2 we conclude that h_n^K → E*_θ[ h_n^1 | X_{(n−1)∆}, X_{n∆} ] = 1 almost surely as K → ∞. Thus, it suffices to study the convergence of

    Ḡ_m^K = (1/√m) Σ_{n=1}^m (1/K) Σ_{k=1}^K g_n^k.

First, consider the case m/K → 0. Chebyshev's inequality and the i.i.d. property of (p̂^k_∆)_{1≤k≤K} imply that, for any ϵ > 0,

    P*_θ[ |Ḡ_m^K| > ϵ ] ≤ (1/(mK²ϵ²)) E*_θ[ ( Σ_{n=1}^m Σ_{k=1}^K g_n^k )² ]
      ≤ (1/(mKϵ²)) Σ_{n=1}^m (1/K) Σ_{k=1}^K E*_θ[ (g_n^k)² ] + (1/(mKϵ²)) Σ_{1≤n<l≤m} (1/K) Σ_{k=1}^K E*_θ[ |g_n^k g_l^k| ]
      ≤ (1/(Kϵ²)) max_{1≤n≤m} E*_θ[ (g_n^1)² ] + ((m−1)/(Kϵ²)) max_{1≤n<l≤m} E*_θ[ |g_n^1 g_l^1| ].

Hölder's inequality and Assumption (C3) imply that P*_θ[ |Ḡ_m^K| > ϵ ] → 0 as m → ∞ and K → ∞ if m/K → 0. We conclude that Ḡ_m^K → 0 in P_θ-probability as m, K → ∞.

Next, consider the case m/K → c ∈ (0, ∞). Asymptotically, we have K = O(m) and

    Ḡ_m^K = O_P( c (1/m) Σ_{n=1}^m (1/m) Σ_{k=1}^{m/c} g_n^k ).

Assumption (C3) and the i.i.d. property of (g_n^k)_{1≤k≤K} imply that the central limit theorem holds under P*_θ, so that

    (1/m) Σ_{k=1}^{m/c} g_n^k → N( (1/c) E*_θ[ g_n^1 | X_{(n−1)∆}, X_{n∆} ], (1/c) Var*_θ[ g_n^1 | X_{(n−1)∆}, X_{n∆} ] ).

Corollary 4.3 states that the partial derivatives of p̂^k_∆ are unbiased, which leads to

    E*_θ[ g_n^k | X_{(n−1)∆}, X_{n∆} ] = 0.

Thus, (g_n^1)_{n≥1} are uncorrelated. We conclude from the strong law of large numbers that Ḡ_m^K → 0 almost surely.

Finally, consider the case K/m → ∞, so that m/K → 0. A Taylor expansion yields

    ( −(1/m) ∇² log L̂^K(θ*) ) √K (θ̂_m^K − θ*) = √(K/m) G_m^K + (√K/m) ∇ log L(θ*) + o_P(1).

The second term on the right-hand side converges to zero almost surely as m → ∞ and K → ∞ due to the asymptotic efficiency of the MLE. For the first term, note that

    √(K/m) G_m^K = (1/m) Σ_{n=1}^m (1/h_n^K) (1/√K) Σ_{k=1}^K g_n^k → 0


almost surely according to the central limit theorem and the strong law of large numbers as in the case m/K → c > 0. It follows that √K (θ̂_m^K − θ*) → 0 almost surely given (27). □

References

Aït-Sahalia, Yacine, 2008. Closed-form likelihood expansions for multivariate diffusions. Ann. Statist. 36 (2), 906–937.
Aït-Sahalia, Yacine, Yu, Jialin, 2006. Saddlepoint approximations for continuous-time Markov processes. J. Econometrics 134 (2), 507–551.
Andrieu, Christophe, Doucet, Arnaud, Holenstein, Roman, 2010. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B Stat. Methodol. 72 (3), 269–342.
Beskos, Alex, Papaspiliopoulos, Omiros, Roberts, Gareth, 2006. Retrospective exact simulation of diffusion sample paths with applications. Bernoulli 12 (6), 1077–1098.
Beskos, Alex, Papaspiliopoulos, Omiros, Roberts, Gareth, 2009. Monte Carlo maximum likelihood estimation for discretely observed diffusion processes. Ann. Statist. 37, 223–245.
Beskos, Alexander, Roberts, Gareth, 2005. Exact simulation of diffusions. Ann. Appl. Probab. 15 (4), 2422–2444.
Blanchet, Jose, Ruf, Johannes, 2016. A weak convergence criterion for constructing changes of measure. Stoch. Models 32 (2), 233–252.
Brémaud, Pierre, 1980. Point Processes and Queues – Martingale Dynamics. Springer-Verlag, New York.
Carrasco, Marine, Chernov, Mikhail, Florens, Jean-Pierre, Ghysels, Eric, 2007. Efficient estimation of general dynamic models with a continuum of moment conditions. J. Econometrics 140 (2), 529–573.
Casella, Bruno, Roberts, Gareth O., 2011. Exact simulation of jump-diffusion processes with Monte Carlo applications. Methodol. Comput. Appl. Probab. 13 (3), 449–473.
Chang, Jinyuan, Chen, Song Xi, 2011. On the approximate maximum likelihood estimation for diffusion processes. Ann. Statist. 39 (6), 2820–2851.
Chen, Nan, Huang, Zhengyu, 2013. Localization and exact simulation of Brownian motion driven stochastic differential equations. Math. Oper. Res. 38, 591–616.
Chen, Song X., Peng, Liang, Yu, Cindy L., 2013. Parameter estimation and model testing for Markov processes via conditional characteristic functions. Bernoulli 19 (1), 228–251.
Cosma, Antonio, Galluccio, Stefano, Pederzoli, Paola, Scaillet, Olivier, 2018. Early exercise decision in American options with dividends, stochastic volatility and jumps. J. Financ. Quant. Anal. 1–26.
Dacunha-Castelle, D., Florens-Zmirou, D., 1986. Estimation of the coefficients of a diffusion from discrete observations. Stochastics 19 (4), 263–284.
Das, Sanjiv R., 2002. The surprise element: jumps in interest rates. J. Econometrics 106 (1), 27–65.
Detemple, Jérôme, Garcia, René, Rindisbacher, Marcel, 2006. Asymptotic properties of Monte Carlo estimators of diffusion processes. J. Econometrics 134 (1), 1–68.
Duffie, Darrell, Glynn, Peter, 1995. Efficient Monte Carlo estimation of security prices. Ann. Appl. Probab. 4 (5), 897–905.
Duffie, Darrell, Pan, Jun, Singleton, Kenneth, 2000. Transform analysis and asset pricing for affine jump-diffusions. Econometrica 68, 1343–1376.
Feuerverger, Andrey, McDunnough, Philip, 1981. On the efficiency of empirical characteristic function procedures. J. R. Stat. Soc. Ser. B Stat. Methodol. 43 (1), 20–27.
Filipović, Damir, Mayerhofer, Eberhard, Schneider, Paul, 2013. Density approximations for multivariate affine jump-diffusion processes. J. Econometrics 176 (2), 93–111.
Giesecke, Kay, Shkolnik, Alexander, 2019. Reducing bias in event time simulations using measure changes. Math. Oper. Res. forthcoming.
Giesecke, Kay, Shkolnik, Alexander, Teng, Gerald, Wei, Yexiang, 2019. Numerical solution of jump-diffusion SDEs. Oper. Res. forthcoming.
Giesecke, Kay, Smelov, Dmitry, 2013. Exact sampling of jump-diffusions. Oper. Res. 61 (4), 894–907.
Glasserman, Paul, 2003. Monte Carlo Methods in Financial Engineering. Springer-Verlag, New York.
Glynn, Peter W., Whitt, Ward, 1992. The asymptotic efficiency of simulation estimators. Oper. Res. 40 (3), 505–520.
Gonçalves, F., Roberts, G.O., 2014. Exact simulation problems for jump-diffusions. Methodol. Comput. Appl. Probab. 16, 907–930.
Jiang, George J., Knight, John L., 2010. ECF estimation of Markov models where the transition density is unknown. Econom. J. 13 (2), 245–270.
Kou, Steven G., 2002. A jump-diffusion model for option pricing. Manage. Sci. 48 (8), 1086–1101.
Kristensen, Dennis, Shin, Yongseok, 2012. Estimation of dynamic models with nonparametric simulated maximum likelihood. J. Econometrics 167 (1), 76–94.
Lewis, P., Shedler, G., 1979. Simulation of nonhomogeneous Poisson processes by thinning. Nav. Logist. Q. 26, 403–413.
Li, Chenxu, Chen, Dachuan, 2016. Estimating jump-diffusions using closed-form likelihood expansions. J. Econometrics 195 (1), 51–70.
Lo, Andrew W., 1988. Maximum likelihood estimation of generalized Itô processes with discretely sampled data. Econometric Theory 4 (2), 231–247.
Newey, Whitney K., McFadden, Daniel, 1994. Large sample estimation and hypothesis testing. In: Handbook of Econometrics, Vol. 4. Elsevier, pp. 2111–2245.
Protter, Philip, 2004. Stochastic Integration and Differential Equations. Springer-Verlag, New York.
Protter, Philip, Shimbo, Kazuhiro, 2008. No arbitrage and general semimartingales. In: Ethier, Stewart N., Feng, Jin, Stockbridge, Richard H. (Eds.), Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz. In: Collections, vol. 4, Institute of Mathematical Statistics, pp. 267–283.
Rogers, L.C.G., 1985. Smooth transition densities for one-dimensional diffusions. Bull. Lond. Math. Soc. 17 (2), 157–161.
Sermaidis, Giorgos, Papaspiliopoulos, Omiros, Roberts, Gareth O., Beskos, Alexandros, Fearnhead, Paul, 2013. Markov chain Monte Carlo for exact inference for diffusions. Scand. J. Stat. 40 (2), 294–321.
Singleton, Kenneth J., 2001. Estimation of affine asset pricing models using the empirical characteristic function. J. Econometrics 102 (1), 111–141.
Singleton, Kenneth J., 2006. Empirical Dynamic Asset Pricing. Princeton University Press.
Straumann, Daniel, Mikosch, Thomas, 2006. Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: A stochastic recurrence equations approach. Ann. Statist. 34 (5), 2449–2495.
Yu, Jialin, 2007. Closed-form likelihood approximation and estimation of jump-diffusions with an application to the realignment risk of the Chinese Yuan. J. Econometrics 141 (2), 1245–1280.
