Available online at www.sciencedirect.com
ScienceDirect Stochastic Processes and their Applications 124 (2014) 51–80 www.elsevier.com/locate/spa
Parametric inference for discretely observed multidimensional diffusions with small diffusion coefficient
Romain Guy a,b,∗, Catherine Larédo a,b, Elisabeta Vergu a
a UR 341 Mathématiques et Informatique Appliquées, INRA, Jouy-en-Josas, France
b UMR 7599 Laboratoire de Probabilités et Modèles aléatoires, Université Denis Diderot Paris 7 and CNRS, Paris, France
Received 6 June 2012; received in revised form 3 May 2013; accepted 23 July 2013
Available online 6 August 2013
Abstract
We consider a multidimensional diffusion $X$ with drift coefficient $b(\alpha, X_t)$ and diffusion coefficient $\epsilon\sigma(\beta, X_t)$. The diffusion sample path is discretely observed at times $t_k = k\Delta$ for $k = 1, \dots, n$ on a fixed interval $[0, T]$. We study minimum contrast estimators derived from the Gaussian process approximating $X$ for small $\epsilon$. We obtain consistent and asymptotically normal estimators of $\alpha$ for fixed $\Delta$ and $\epsilon \to 0$, and of $(\alpha, \beta)$ for $\Delta \to 0$ and $\epsilon \to 0$ without any condition linking $\epsilon$ and $\Delta$. We compare the estimators obtained with various methods and for various magnitudes of $\Delta$ and $\epsilon$ based on simulation studies. Finally, we investigate the interest of using such methods in an epidemiological framework.
© 2013 Elsevier B.V. All rights reserved.
Keywords: Minimum contrast estimators; Low frequency data; High frequency data; Epidemic data
∗ Corresponding author at: UR 341 Mathématiques et Informatique Appliquées, INRA, Jouy-en-Josas, France. Tel.: +33 0 1 34 65 22 58. E-mail address: [email protected] (R. Guy).
0304-4149/$ - see front matter. http://dx.doi.org/10.1016/j.spa.2013.07.009

1. Introduction
In this study we focus on the parametric inference in the drift coefficient $b(\alpha, X_t^\epsilon)$ and in the diffusion coefficient $\epsilon\sigma(\beta, X_t^\epsilon)$ of a multidimensional diffusion model $(X_t^\epsilon)_{t \ge 0}$ with small diffusion coefficient, when it is observed at discrete times on a fixed time interval in the
asymptotics $\epsilon \to 0$. This asymptotic framework has been widely studied and has proved fruitful in applied problems, see e.g. [6]. Our interest in this kind of diffusion is motivated by the fact that they are natural approximations of epidemic processes. Indeed, the classical stochastic SIR model in a closed population, describing variations over time in Susceptible (S), Infectious (I) and Removed (R) individuals, is a bi-dimensional continuous-time Markovian jump process. The normalization of this process by the population size (N) asymptotically leads to an ODE system. Before passing to the limit, the forward Kolmogorov diffusion equation allows describing the epidemic dynamics through a bi-dimensional diffusion, with diffusion coefficient proportional to $1/\sqrt{N}$. Moreover, epidemics are discretely observed and therefore we are interested in the statistical setting defined by discrete data sampled at times $t_k = k\Delta$ on a fixed interval $[0, T]$ with $T = n\Delta$. The number of data points is $n$ and $\Delta$, the sampling interval, is not necessarily small.
Historically, statistics for diffusions were developed for continuously observed processes, leading to explicit formulations of the likelihood [16,18]. In this context, two asymptotics exist for estimating $\alpha$ for a diffusion continuously observed on a time interval $[0, T]$: $T \to \infty$ for recurrent diffusions, and $T$ fixed with the diffusion coefficient tending to 0. In practice, however, observations are not continuous but partial, with various mechanisms underlying the missingness, which leads to intractable likelihoods. One classical case consists of sample paths discretely observed with a sampling interval $\Delta$. This adds another asymptotic framework, $\Delta \to 0$, and raises the question of estimating parameters in the diffusion coefficient (see [8,24] for $T$ fixed and [12,15,23] for $T \to \infty$). Since the nineties, statistical methods associated with discrete data have been developed in the asymptotics of a small diffusion coefficient (e.g. [17,7,22]).
Considering a discretely observed diffusion on $\mathbb{R}$ with constant ($= \epsilon$) diffusion coefficient, Genon-Catalot (1990) obtained, using the Gaussian approximating process [2], a consistent, $\epsilon^{-1}$-normal and efficient estimator of $\alpha$ under the condition $\epsilon \to 0$, $\Delta \to 0$, $\epsilon/\sqrt{\Delta} = O(1)$. The author additionally proved that this estimator also possesses good properties for $\Delta$ fixed. Uchida [22] obtained similar results using approximate martingale estimating equations. Then, Sørensen [20] obtained, as $\epsilon \to 0$, consistent and $\epsilon^{-1}$-normal estimators of a parameter $\theta$ present in both the drift and diffusion coefficients, with no assumption on $\Delta$, but under additional conditions not verified in the case of distinct parameters in the drift and diffusion coefficients. For this latter case, Sørensen and Uchida [21] obtained consistent and $\epsilon^{-1}$-normal estimators of $\alpha$ and consistent and $\sqrt{n}$-normal estimators of $\beta$ under the condition $\Delta/\epsilon \to 0$ and $\sqrt{\Delta}/\epsilon$ bounded. This result was later extended by Gloter and Sørensen [10] to the case where $\epsilon^{-1}\Delta^{\rho}$ is bounded for some $\rho > 0$. Their results rely on a class of contrast processes based on the expansion of the infinitesimal generator of the diffusion. The order of the expansion is driven by the respective magnitudes of $\epsilon$ and $\Delta$ and requires this knowledge (the value of $\rho$), which might be a drawback when applying the method. Moreover, this contrast becomes difficult to handle for values of $\Delta$ that are not very small with respect to $\epsilon$. To overcome this drawback, we consider a simple contrast based on the Gaussian approximation of the diffusion process $X^\epsilon$ [2,6]. Contrary to Gloter and Sørensen [10], our contrast has a generic formulation, regardless of the ratio between $\Delta$ and $\epsilon$. Thus, the standard balance condition between $\epsilon$ and $\Delta$ of previous works is removed here. Our study extends the results of [7] to the case of multidimensional diffusion processes with parameters in both the drift and diffusion coefficients.
We consider successively the cases $\Delta$ fixed and $\Delta \to 0$. We obtain consistent and $\epsilon^{-1}$-normal estimators of $\alpha$ (when $\beta$ is unknown or equal to a known function of $\alpha$) for fixed $\Delta$. For high frequency data, we obtain results similar to [10], but without any assumption on $\epsilon$ with respect to $\Delta$. The estimators obtained are analytically calculated on a simple example, the Cox–Ingersoll–Ross (CIR) model. Finally, they are compared based on simulation studies in the case of a financial two-factor model [19] and of the epidemic SIR model [5], for various magnitudes of $\Delta$ and $\epsilon$.
The paper is structured as follows. After this introduction, Section 2 contains the notations and preliminary results on the stochastic Taylor expansion of the diffusion. Sections 3 and 4, which constitute the core of the paper, present analytical results, both in terms of contrast functions and of estimator properties. We investigate in Section 3 the inference when $\Delta$ is fixed and $\epsilon \to 0$ in three contexts, depending on whether the parameter $\beta$ in the diffusion coefficient is unknown, equal to a known function of $\alpha$ (with the special case $\beta = \alpha$), or whether the diffusion coefficient is multiplicative. Section 4 is devoted to the case $\Delta \to 0$. Results are applied in Section 5 to the CIR model. Moreover, the different estimators obtained are compared, based on numerical simulations, to the minimum contrast estimator of Gloter and Sørensen [10], mainly in the context of epidemic data.
2. Notations and preliminary results
Let us consider on a probability space $(\Omega, \mathcal{A}, (\mathcal{A}_t)_{t \ge 0}, P)$ the $p$-dimensional diffusion process satisfying the stochastic differential equation
\[
dX_t^\epsilon = b(\alpha, X_t^\epsilon)\,dt + \epsilon\sigma(\beta, X_t^\epsilon)\,dB_t, \qquad X_0^\epsilon = x_0, \tag{2.1}
\]
where $x_0 \in \mathbb{R}^p$ is prescribed, $\epsilon > 0$, $\theta = (\alpha, \beta)$ are unknown multidimensional parameters, $b(\alpha, x)$ is a vector in $\mathbb{R}^p$, $\sigma(\beta, x)$ is a $p \times p$ matrix and $(B_t)_{t \ge 0}$ is a $p$-dimensional Brownian motion defined on $(\Omega, \mathcal{A})$.
Throughout the paper we use the convention that objects are indexed by $\theta$ when there is a dependence on both $\alpha$ and $\beta$, and by $\alpha$ or $\beta$ alone otherwise. Let us denote by $M_p(\mathbb{R})$ the set of $p \times p$ matrices, and by ${}^{t}M$, $\mathrm{Tr}(M)$ and $\det(M)$ respectively the transpose, trace and determinant of a matrix $M$. We denote the partial derivatives of a function $f(\alpha, x)$ at $(\alpha_0, x_0)$ by $\frac{\partial f}{\partial\alpha}(\alpha_0, x_0)$ and $\frac{\partial f}{\partial x}(\alpha_0, x_0)$. Moreover, if $x = x(\alpha, t)$, the derivative of the function $\alpha \mapsto f(\alpha, x(\alpha, t))$ at $\alpha_0$ will be denoted by
\[
\frac{\partial f(\alpha, x(\alpha, t))}{\partial\alpha}(\alpha_0) = \frac{\partial f}{\partial\alpha}(\alpha_0, x(\alpha_0, t)) + \frac{\partial f}{\partial x}(\alpha_0, x(\alpha_0, t))\,\frac{\partial x}{\partial\alpha}(\alpha_0, t).
\]
We set
\[
\Sigma(\beta, x) = \sigma(\beta, x)\,{}^{t}\sigma(\beta, x). \tag{2.2}
\]
In what follows, we assume that $\mathcal{A} = \sup(\mathcal{A}_t, t \ge 0)$, $(\mathcal{A}_t)_{t \ge 0}$ is right-continuous and
(H1) (i) $\exists U$, open set of $\mathbb{R}^p$ such that, for small enough $\epsilon$, $\forall t \in [0, T]$, $X_t^\epsilon \in U$
     (ii) $b(\alpha, \cdot) \in C^2(U, \mathbb{R}^p)$, $\sigma(\beta, \cdot) \in C^2(U, M_p)$
     (iii) $\exists K > 0$, $\|b(\alpha, x) - b(\alpha, y)\|^2 + \|\sigma(\beta, x) - \sigma(\beta, y)\|^2 \le K\|x - y\|^2$
(H2) $\forall x \in U$, $\Sigma(\beta, x)$ is invertible.
Assumptions (H1) and (H2) ensure the existence and uniqueness of a strong solution of (2.1), with infinite explosion time (see e.g. [13]).
2.1. Results on the ordinary differential equation
Consider the solution $x_\alpha(t)$ of the ODE associated with $\epsilon = 0$ in (2.1),
\[
dx_\alpha(t) = b(\alpha, x_\alpha(t))\,dt, \qquad x_\alpha(0) = x_0 \in \mathbb{R}^p. \tag{2.3}
\]
Under (H1), this solution is well defined, unique and belongs to $C^2(U, \mathbb{R}^p)$. Let us consider the matrix $\Phi_\alpha(\cdot, t_0) \in M_p$, the solution of
\[
d\Phi_\alpha(t, t_0) = \frac{\partial b}{\partial x}(\alpha, x_\alpha(t))\,\Phi_\alpha(t, t_0)\,dt, \qquad \Phi_\alpha(t_0, t_0) = I_p. \tag{2.4}
\]
Under (H1), it is well known (see e.g. [3]) that, for $t_0 \in [0, T]$, $\Phi_\alpha(\cdot, t_0)$ is twice continuously differentiable on $[0, T]$ and satisfies the semi-group property
\[
\forall (t_0, t_1, t_2) \in [0, T]^3, \qquad \Phi_\alpha(t_2, t_0) = \Phi_\alpha(t_2, t_1)\,\Phi_\alpha(t_1, t_0). \tag{2.5}
\]
A consequence of (2.5) is that the matrix $\Phi_\alpha(t_1, t_0)$ is invertible with inverse $\Phi_\alpha(t_0, t_1)$.

2.2. Taylor stochastic expansion of the diffusion $(X_t^\epsilon)$
We use in the sequel some known results for small perturbations of dynamical systems (see [6,2]). The family of diffusion processes $(X_t^\epsilon, t \in [0, T])$ solution of (2.1) satisfies the following theorem.
Theorem 2.1. Under (H1),
\[
X_t^\epsilon = x_\alpha(t) + \epsilon g_\theta(t) + \epsilon^2 R_\theta^{2,\epsilon}(t) \quad \text{with} \quad \sup_{t \in [0, T]} \|\epsilon R_\theta^{2,\epsilon}(t)\| \xrightarrow[\epsilon \to 0]{} 0 \text{ in probability}, \tag{2.6}
\]
and with $x_\alpha(\cdot)$ defined in (2.3) and $g_\theta(t)$ satisfying
\[
dg_\theta(t) = \frac{\partial b}{\partial x}(\alpha, x_\alpha(t))\,g_\theta(t)\,dt + \sigma(\beta, x_\alpha(t))\,dB_t, \quad \text{with } g_\theta(0) = 0. \tag{2.7}
\]
Remark 2.1. We also use in the sequel the Taylor expansion of order 1,
\[
X_t^\epsilon = x_\alpha(t) + \epsilon R_\theta^{1,\epsilon}(t) \quad \text{with} \quad \sup_{t \in [0, T]} \|\epsilon R_\theta^{1,\epsilon}(t)\| \xrightarrow[\epsilon \to 0]{} 0 \text{ in probability}. \tag{2.8}
\]
Corollary 2.1. Under (H1), the process $g_\theta(\cdot)$ is the continuous Gaussian martingale on $[0, T]$ defined, using (2.4), by
\[
g_\theta(t) = \int_0^t \Phi_\alpha(t, s)\,\sigma(\beta, x_\alpha(s))\,dB_s. \tag{2.9}
\]
Proof. Using (2.5), the matrix $\Phi_\alpha(t, 0)$ is invertible with inverse $\Phi_\alpha(0, t)$. The process $C(t)$ defined by $g_\theta(t) = \Phi_\alpha(t, 0)C(t)$ satisfies, using (2.7), $dC(t) = \Phi_\alpha(0, t)\sigma(\beta, x_\alpha(t))\,dB_t$ and $C(0) = 0$. Thus, applying (2.5) yields (2.9).
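The deterministic quantities $x_\alpha$ and $\Phi_\alpha$ of Section 2.1 are rarely explicit, so in practice they would be obtained by integrating (2.3) and (2.4) jointly. The following is a minimal sketch of that idea in dimension $p = 1$. The logistic drift $b(\alpha, x) = \alpha x(1 - x)$, the fixed-step Runge–Kutta scheme and the function names are all assumptions made for illustration, not choices taken from the paper.

```python
import math

def drift(alpha, x):
    # Hypothetical scalar drift b(alpha, x) = alpha * x * (1 - x).
    return alpha * x * (1.0 - x)

def drift_dx(alpha, x):
    # Its derivative with respect to x.
    return alpha * (1.0 - 2.0 * x)

def flow(alpha, x_start, t_start, t_end, steps=200):
    """Integrate (2.3) and (2.4) jointly with a fixed-step RK4 scheme.

    Returns (x(t_end), Phi(t_end, t_start)) given x(t_start) = x_start
    and Phi(t_start, t_start) = 1.
    """
    h = (t_end - t_start) / steps
    x, phi = x_start, 1.0

    def rhs(x_val, phi_val):
        return drift(alpha, x_val), drift_dx(alpha, x_val) * phi_val

    for _ in range(steps):
        k1x, k1p = rhs(x, phi)
        k2x, k2p = rhs(x + 0.5 * h * k1x, phi + 0.5 * h * k1p)
        k3x, k3p = rhs(x + 0.5 * h * k2x, phi + 0.5 * h * k2p)
        k4x, k4p = rhs(x + h * k3x, phi + h * k3p)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        phi += h * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return x, phi

# Numerical check of the semi-group property (2.5) on [0, 1] and [1, 2].
alpha, x0 = 1.5, 0.1
x1, phi_10 = flow(alpha, x0, 0.0, 1.0)
x2, phi_21 = flow(alpha, x1, 1.0, 2.0)
x2_direct, phi_20 = flow(alpha, x0, 0.0, 2.0)
print(phi_20, phi_21 * phi_10)
```

For this particular drift the flow is known in closed form, $x_\alpha(t) = x_0 e^{\alpha t}/(1 - x_0 + x_0 e^{\alpha t})$, which gives an independent check of the integrator.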
Corollary 2.2. Assume (H1). If, moreover, $b(\alpha, \cdot)$ and $\sigma(\beta, \cdot)$ have uniformly bounded derivatives on $U$, then there exist constants $C_1$, $C_2$ only depending on $T$ and $\theta$ such that
(i) $\forall t \in [0, T]$, $E\left(\|R_\theta^{2,\epsilon}(t)\|^2\right) < C_1$,
(ii) $\forall t \in [0, T]$, as $h \to 0$, $E\left(\|R_\theta^{2,\epsilon}(t + h) - R_\theta^{2,\epsilon}(t)\|^2\right) < C_2 h$.
The result 2.2-(i) is given in [6, Theorem 2.2 p. 56]; the proof of 2.2-(ii) is given in Appendix A.2.
An important consequence of Corollary 2.1 is the following lemma on the Gaussian process $g_\theta$. Let us define
\[
Z_k^\theta = \frac{1}{\sqrt{\Delta}} \int_{t_{k-1}}^{t_k} \Phi_\alpha(t_k, s)\,\sigma(\beta, x_\alpha(s))\,dB_s, \tag{2.10}
\]
\[
S_k^{\alpha,\beta} = \frac{1}{\Delta} \int_{t_{k-1}}^{t_k} \Phi_\alpha(t_k, s)\,\Sigma(\beta, x_\alpha(s))\,{}^{t}\Phi_\alpha(t_k, s)\,ds. \tag{2.11}
\]
Lemma 2.1. Under (H1), the random variables $g_\theta(t_k)$ verify, for $t_k = k\Delta$, $k = 1, \dots, n$,
\[
g_\theta(t_k) = \Phi_\alpha(t_k, t_{k-1})\,g_\theta(t_{k-1}) + \sqrt{\Delta}\,Z_k^\theta, \tag{2.12}
\]
where $(Z_k^\theta)_{1 \le k \le n}$ defined in (2.10) is a sequence of $\mathbb{R}^p$-dimensional independent centered Gaussian random variables, $\mathcal{A}_{t_k}$-measurable and with covariance matrix $S_k^{\alpha,\beta}$.
Proof. Using (2.9) and the semi-group property of $\Phi_\alpha(t, s)$ yields $g_\theta(t_k) = \Phi_\alpha(t_k, t_{k-1})\,g_\theta(t_{k-1}) + \int_{t_{k-1}}^{t_k} \Phi_\alpha(t_k, s)\,\sigma(\beta, x_\alpha(s))\,dB_s$. The proof is completed by identifying $\sqrt{\Delta}\,Z_k^\theta$ in this relation.
Note that (H1) and (H2) ensure that $S_k^{\alpha,\beta}$ is a positive definite matrix.
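When $\Phi_\alpha$ is available, the covariance (2.11) can be approximated by numerical quadrature. The sketch below does this in dimension $p = 1$ with a midpoint rule. The function name `s_k`, the number of quadrature nodes and the linear-drift test case are assumptions chosen for illustration.

```python
import math

def s_k(phi, sigma2, t_prev, t_k, m=1000):
    """Midpoint-rule approximation of (2.11) for p = 1:
    S_k = (1/Delta) * integral over [t_prev, t_k] of phi(t_k, s)^2 * sigma2(s) ds.
    """
    delta = t_k - t_prev
    h = delta / m
    total = 0.0
    for j in range(m):
        s = t_prev + (j + 0.5) * h          # midpoint of the j-th sub-interval
        total += phi(t_k, s) ** 2 * sigma2(s)
    return total * h / delta

# Hypothetical test case: b(alpha, x) = alpha * x and constant sigma = beta,
# for which Phi(t, s) = exp(alpha * (t - s)) and the integral is explicit.
alpha, beta, delta = 0.7, 1.3, 0.5
phi = lambda t, s: math.exp(alpha * (t - s))
approx = s_k(phi, lambda s: beta ** 2, 1.0, 1.0 + delta)
exact = beta ** 2 * (math.exp(2.0 * alpha * delta) - 1.0) / (2.0 * alpha * delta)
print(approx, exact)
```

In this linear case $S_k^{\alpha,\beta}$ does not depend on $k$, which is why the interval $[1, 1.5]$ used above is arbitrary.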
2.3. Statistical framework
Let $C = C([0, T], \mathbb{R}^p)$ denote the space of continuous functions defined on $[0, T]$ with values in $\mathbb{R}^p$ endowed with the uniform convergence topology, $\mathcal{C}$ the $\sigma$-algebra of the Borel sets, $(X_t)$ the canonical coordinates of $(C, \mathcal{C})$ and $\mathcal{F}_t = \sigma(X_s, 0 \le s \le t)$. Finally, let $P_\theta^\epsilon = P_{\alpha,\beta}^\epsilon$ be the distribution on $(C, \mathcal{C})$ of the diffusion process solution of (2.1). From now on, let $\theta_0 = (\alpha_0, \beta_0) \in \Theta$ be the true value of the parameter. We assume
(S1) $(\alpha, \beta) \in K_a \times K_b = \Theta$ with $K_a$, $K_b$ compact sets of $\mathbb{R}^a$, $\mathbb{R}^b$; $\theta_0 \in \mathring{\Theta}$
(S2) (H1)–(H2) hold for all $(\alpha, \beta) \in \Theta$ with constant $K$ not depending on $\theta$
(S3) The function $b(\alpha, x)$ is $C^3(K_a \times U, \mathbb{R}^p)$ and $\sigma(\beta, x) \in C^2(K_b \times U, M_p)$
(S4) $\Delta \to 0$: $\alpha \ne \alpha' \Rightarrow b(\alpha, x_\alpha(\cdot)) \ne b(\alpha', x_{\alpha'}(\cdot))$
(S4′) $\Delta$ fixed: $\alpha \ne \alpha' \Rightarrow \{\exists k, 1 \le k \le n, x_\alpha(t_k) \ne x_{\alpha'}(t_k)\}$
(S5) $\beta \ne \beta' \Rightarrow \Sigma(\beta, x_{\alpha_0}(\cdot)) \ne \Sigma(\beta', x_{\alpha_0}(\cdot))$.
Assumptions (S1)–(S3) are classical for the inference for diffusion processes. The differentiability in (S3) comes from the regularity conditions required on $\alpha \mapsto \Phi_\alpha(t, s)$. Indeed, (S3) on $b(\alpha, x)$ ensures that $\Phi_\alpha(t, t_0)$ belongs to $C^2(K_a \times [0, T]^2, M_p)$ (see Appendix A.1 for the proof). (S4) is the usual identifiability assumption for a continuously observed diffusion on $[0, T]$. Note that (S4) ensures that, for $\Delta$ small enough, (S4′) holds.
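For a sample path $y(\cdot) \in C([0, T], \mathbb{R}^p)$, let us define the quantity depending on $x_\alpha(\cdot)$, $\Phi_\alpha(\cdot, \cdot)$ and on the discrete sampling $(y_{t_k}, k = 1, \dots, n)$,
\[
N_k(y, \alpha) = y(t_k) - x_\alpha(t_k) - \Phi_\alpha(t_k, t_{k-1})\left(y(t_{k-1}) - x_\alpha(t_{k-1})\right). \tag{2.13}
\]
Note that $N_k(x_\alpha, \alpha) = 0$. Let us also define the Gaussian process $(Y_t^\epsilon)_{t \in [0, T]} \in C$, $Y_t^\epsilon = x_\alpha(t) + \epsilon g_\theta(t)$. Using (2.12) and (2.13), we can express the random variables $Z_k^\theta$ using $Y_t^\epsilon$,
\[
Z_k^\theta = \frac{Y_{t_k}^\epsilon - x_\alpha(t_k)}{\epsilon\sqrt{\Delta}} - \Phi_\alpha(t_k, t_{k-1})\,\frac{Y_{t_{k-1}}^\epsilon - x_\alpha(t_{k-1})}{\epsilon\sqrt{\Delta}} = \frac{1}{\epsilon\sqrt{\Delta}}\,N_k(Y^\epsilon, \alpha). \tag{2.14}
\]
Then, the $n$-sample $(Y_{t_k}, k = 1, \dots, n)$ has an explicit log-likelihood $l(\alpha, \beta; (Y_{t_k}))$ which is, using (2.13) and (2.14),
\[
l(\alpha, \beta; (Y_{t_k})) = -\frac{1}{2} \sum_{k=1}^{n} \log\left(\det S_k^{\alpha,\beta}\right) - \frac{1}{2\epsilon^2\Delta} \sum_{k=1}^{n} {}^{t}N_k(Y, \alpha)\,(S_k^{\alpha,\beta})^{-1}\,N_k(Y, \alpha). \tag{2.15}
\]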
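Once $x_\alpha(t_k)$, $\Phi_\alpha(t_k, t_{k-1})$ and $S_k^{\alpha,\beta}$ are tabulated, (2.15) is a plain sum over the observations. A minimal scalar sketch follows. The function name and the list-based layout of the inputs are assumptions made for illustration, and the numbers in the example are arbitrary rather than drawn from any model in the paper.

```python
import math

def gaussian_loglik(y, x_alpha, phi, s, eps, delta):
    """Scalar version of (2.15).

    y[k] and x_alpha[k] are given for k = 0..n (index 0 is time 0);
    phi[k] = Phi(t_k, t_{k-1}) and s[k] = S_k are read at positions 1..n
    (position 0 is unused).
    """
    n = len(y) - 1
    log_det = 0.0
    quad = 0.0
    for k in range(1, n + 1):
        # N_k(y, alpha) as in (2.13)
        n_k = y[k] - x_alpha[k] - phi[k] * (y[k - 1] - x_alpha[k - 1])
        log_det += math.log(s[k])
        quad += n_k * n_k / s[k]
    return -0.5 * log_det - quad / (2.0 * eps ** 2 * delta)

# Arbitrary illustrative inputs with n = 2 observations after time 0.
x_path = [1.0, 2.0, 4.0]
phi_tab = [None, 2.0, 2.0]
s_tab = [None, 1.0, math.e]
print(gaussian_loglik(x_path, x_path, phi_tab, s_tab, eps=0.1, delta=1.0))
```

When the observations coincide with the deterministic path, every $N_k$ vanishes and only the log-determinant term remains.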
3. Parametric inference for a fixed sampling interval
For the diffusion parameter $\beta$, all existing results for discretized observations on a fixed sampling interval are provided in the context of the asymptotics $\Delta \to 0$ ($T = n\Delta$). In this section we focus on a different asymptotics ($\epsilon \to 0$) as $\Delta$ is assumed to be fixed. We build a contrast process based on the functions $N_k(X, \alpha)$ defined in (2.13). Except for some specific cases (e.g. linear drift in the diffusion process), the two deterministic quantities $x_\alpha(\cdot)$, $\Phi_\alpha(\cdot, \cdot)$ appearing in the $N_k$'s are not explicit and are approximated by solving numerically an ODE with dimension $p \times (p + 1)$.

3.1. One-dimensional Ornstein–Uhlenbeck process
The one-dimensional Ornstein–Uhlenbeck process is an appropriate illustration of the limitations imposed by the assumption $\Delta$ fixed. Indeed, assuming that $\alpha$ is known and equal to $\alpha_0$, the diffusion process $(X_t)_{t \in [0, T]}$ following $dX_t = \alpha_0 X_t\,dt + \epsilon\beta\,dB_t$, $X_0 = x_0 \in \mathbb{R}$, is equal to its Gaussian approximation ($X_t = x_{\alpha_0}(t) + \epsilon g_{\alpha_0,\beta}(t)$), and $l(\alpha_0, \beta)$ is then the log-likelihood of Gaussian observations. Noting that
\[
S_k^{\alpha_0,\beta} = \beta^2\,\frac{e^{2\alpha_0\Delta} - 1}{2\alpha_0\Delta}, \qquad x_{\alpha_0}(t) = x_0 e^{\alpha_0 t} \quad \text{and} \quad \Phi_{\alpha_0}(t_k, t_{k-1}) = e^{\alpha_0\Delta},
\]
the maximum likelihood estimator of $\beta$ is given by
\[
\hat\beta_{\epsilon,\Delta}^2 = \frac{1}{n}\,\frac{2\alpha_0}{\epsilon^2\left(e^{2\alpha_0\Delta} - 1\right)} \sum_{k=1}^{n} \left(X_{t_k} - e^{\alpha_0\Delta} X_{t_{k-1}}\right)^2.
\]
Under $P_{\theta_0}$, $\hat\beta_{\epsilon,\Delta}^2 = \frac{\beta_0^2}{n} \sum_{k=1}^{n} U_k^2$, where
\[
U_k = \sqrt{\frac{2\alpha_0}{e^{2\alpha_0\Delta} - 1}} \int_{t_{k-1}}^{t_k} e^{\alpha_0(t_k - s)}\,dB_s.
\]
Hence, $(U_k)_{1 \le k \le n}$ are i.i.d. $\mathcal{N}(0, 1)$ random variables, and $\hat\beta_{\epsilon,\Delta}^2$ is unbiased for all $\epsilon$ but has no other properties as $\epsilon \to 0$.
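The behaviour of $\hat\beta_{\epsilon,\Delta}^2$ described above can be explored by simulation, since the Ornstein–Uhlenbeck transition is Gaussian and can be sampled exactly on the grid $t_k = k\Delta$. The sketch below is an illustration under assumed parameter values and function names, none of which come from the paper. A negative $\alpha_0$ is used only to keep long paths bounded.

```python
import math
import random

def simulate_ou(alpha, beta, eps, x0, delta, n, rng):
    """Exact simulation of dX = alpha X dt + eps beta dB at times k * delta."""
    a = math.exp(alpha * delta)
    # Standard deviation of X_{t_k} - a X_{t_{k-1}}
    sd = eps * beta * math.sqrt((a * a - 1.0) / (2.0 * alpha))
    path = [x0]
    for _ in range(n):
        path.append(a * path[-1] + sd * rng.gauss(0.0, 1.0))
    return path

def beta2_hat(path, alpha0, eps, delta):
    """Maximum likelihood estimator of beta^2 from Section 3.1 (alpha0 known)."""
    a = math.exp(alpha0 * delta)
    n = len(path) - 1
    rss = sum((path[k] - a * path[k - 1]) ** 2 for k in range(1, n + 1))
    return 2.0 * alpha0 * rss / (n * eps ** 2 * (a * a - 1.0))

rng = random.Random(0)
path = simulate_ou(alpha=-0.5, beta=1.2, eps=0.05, x0=1.0, delta=0.5, n=4000, rng=rng)
print(beta2_hat(path, alpha0=-0.5, eps=0.05, delta=0.5))  # compare with beta^2 = 1.44
```

Rerunning with a smaller `eps` and the same seed leaves the estimate unchanged up to rounding error, because the estimator equals $\beta_0^2 n^{-1}\sum_k U_k^2$ whatever the value of $\epsilon$. Only increasing `n` reduces its spread.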
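3.2. General case ($\beta$ unknown)
In the case where we have no information on $\beta$, it is quite natural to consider a contrast process derived from the conditional least squares for $(Y_{t_k})$, which does not depend on $\beta$ and is defined using (2.4) and (2.13) by
\[
\bar U_{\epsilon,\Delta}\left(\alpha; (X_{t_k})\right) = \bar U_{\epsilon,\Delta}(\alpha) = \frac{1}{\Delta} \sum_{k=1}^{n} {}^{t}N_k(X, \alpha)\,N_k(X, \alpha). \tag{3.1}
\]
Then, the conditional least squares estimator is defined as any solution of
\[
\bar\alpha_{\epsilon,\Delta} = \underset{\alpha \in K_a}{\mathrm{argmin}}\ \bar U_{\epsilon,\Delta}\left(\alpha, (X_{t_k})\right). \tag{3.2}
\]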
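For a linear drift the contrast (3.1) can be minimised by hand, which gives a convenient sanity check. The sketch below assumes the scalar model $b(\alpha, x) = \alpha x$ with $\sigma(\beta, x) = \beta$, for which $x_\alpha(t) = x_0 e^{\alpha t}$, $\Phi_\alpha(t_k, t_{k-1}) = e^{\alpha\Delta}$ and hence $N_k(X, \alpha) = X_{t_k} - e^{\alpha\Delta} X_{t_{k-1}}$. The function names, parameter values and the closed-form minimiser (valid when the lag-one ratio is positive and the minimiser lies in $K_a$) are assumptions of this example.

```python
import math
import random

def simulate(alpha, beta, eps, x0, delta, n, rng):
    """Exact simulation of dX = alpha X dt + eps beta dB at times k * delta."""
    a = math.exp(alpha * delta)
    sd = eps * beta * math.sqrt((a * a - 1.0) / (2.0 * alpha))
    path = [x0]
    for _ in range(n):
        path.append(a * path[-1] + sd * rng.gauss(0.0, 1.0))
    return path

def cls_contrast(alpha, path, delta):
    """Contrast (3.1) for the linear drift: N_k = X_k - exp(alpha delta) X_{k-1}."""
    a = math.exp(alpha * delta)
    return sum((path[k] - a * path[k - 1]) ** 2 for k in range(1, len(path))) / delta

def cls_estimate(path, delta):
    """Minimiser of the contrast in alpha, obtained by least squares in exp(alpha delta)."""
    num = sum(path[k] * path[k - 1] for k in range(1, len(path)))
    den = sum(path[k - 1] ** 2 for k in range(1, len(path)))
    return math.log(num / den) / delta

rng = random.Random(1)
obs = simulate(alpha=-1.0, beta=0.5, eps=0.01, x0=1.0, delta=0.5, n=10, rng=rng)
alpha_bar = cls_estimate(obs, delta=0.5)
print(alpha_bar, cls_contrast(alpha_bar, obs, 0.5))
```

Only ten observations with $\Delta = 0.5$ are used here: the accuracy comes from the small value of $\epsilon$, not from the number of data points, in line with the fixed-$\Delta$ asymptotics of this section.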
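Let us also define
\[
\bar K_\Delta(\alpha_0, \alpha) = \frac{1}{\Delta} \sum_{k=1}^{n} {}^{t}N_k(x_{\alpha_0}, \alpha)\,N_k(x_{\alpha_0}, \alpha).
\]
Clearly, $\bar K_\Delta(\alpha_0, \alpha) \ge 0$ and $\bar K_\Delta(\alpha_0, \alpha_0) = 0$. Now, $\bar K_\Delta(\alpha_0, \alpha) = 0$ if, for all $k$, $x_\alpha(t_k) - x_{\alpha_0}(t_k) = \Phi_\alpha(t_k, t_{k-1})\left(x_\alpha(t_{k-1}) - x_{\alpha_0}(t_{k-1})\right)$. The matrix $\Phi_\alpha(t_k, t_{k-1})$ being invertible, this is the identifiability assumption (S4′).
Lemma 3.1. Assume (S1), (S2). Then, under $P_{\theta_0}$,
\[
\bar U_{\epsilon,\Delta}(\alpha) \xrightarrow[\epsilon \to 0]{} \bar K_\Delta(\alpha_0, \alpha) \text{ in probability.} \tag{3.3}
\]
Using (2.13) and the stochastic Taylor formula (2.6), the proof is immediate.
In order to study $\bar\alpha_{\epsilon,\Delta}$, we define for $1 \le i \le a$ and for $1 \le k \le n$,
\[
D_{k,i}(\alpha) = \frac{1}{\Delta}\left(-\frac{\partial x_\alpha(t_k)}{\partial\alpha_i}(\alpha) + \Phi_\alpha(t_k, t_{k-1})\,\frac{\partial x_\alpha(t_{k-1})}{\partial\alpha_i}(\alpha)\right) \in \mathbb{R}^p, \tag{3.4}
\]
and
\[
M_\Delta(\alpha) = \left(\Delta \sum_{k=1}^{n} {}^{t}D_{k,i}(\alpha)\,D_{k,j}(\alpha)\right)_{1 \le i, j \le a} \in M_a(\mathbb{R}). \tag{3.5}
\]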
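Proposition 3.1. Assume (S1)–(S3) and (S4′). Then, under $P_{\theta_0}$,
(i) $\bar\alpha_{\epsilon,\Delta} \xrightarrow[\epsilon \to 0]{} \alpha_0$ in probability.
(ii) If $M_\Delta(\alpha_0)$ is invertible, $\epsilon^{-1}\left(\bar\alpha_{\epsilon,\Delta} - \alpha_0\right) \xrightarrow[\epsilon \to 0]{} \mathcal{N}\left(0, J_\Delta^{-1}(\alpha_0, \beta_0)\right)$ in distribution, with
\[
J_\Delta(\alpha_0, \beta_0) = M_\Delta(\alpha_0) \left(\Delta \sum_{k=1}^{n} {}^{t}D_{k,i}(\alpha_0)\,S_k^{\theta_0}\,D_{k,j}(\alpha_0)\right)_{1 \le i, j \le a}^{-1} {}^{t}M_\Delta(\alpha_0). \tag{3.6}
\]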
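As an illustration, the quantities entering Proposition 3.1 can be computed by hand in the scalar linear case. The choice $b(\alpha, x) = \alpha x$, $\sigma(\beta, x) = \beta$ with $x_0 \ne 0$ is an assumption of this worked example, not a case singled out at this point of the paper. Since $x_\alpha(t) = x_0 e^{\alpha t}$ and $\Phi_\alpha(t_k, t_{k-1}) = e^{\alpha\Delta}$, definition (3.4) gives

```latex
\[
D_k(\alpha) = \frac{1}{\Delta}\left(-x_0 t_k e^{\alpha t_k} + e^{\alpha\Delta}\,x_0 t_{k-1} e^{\alpha t_{k-1}}\right)
            = \frac{x_0 e^{\alpha t_k}}{\Delta}\,(t_{k-1} - t_k) = -x_0 e^{\alpha t_k},
\]
\[
M_\Delta(\alpha_0) = \Delta\,x_0^2 \sum_{k=1}^{n} e^{2\alpha_0 t_k},
\qquad
S_k^{\theta_0} = \beta_0^2\,\frac{e^{2\alpha_0\Delta} - 1}{2\alpha_0\Delta} \quad \text{(constant in } k\text{)},
\]
\[
J_\Delta(\alpha_0, \beta_0) = \frac{M_\Delta(\alpha_0)^2}{\Delta \sum_{k=1}^{n} D_k(\alpha_0)^2\,S_k^{\theta_0}}
 = \frac{M_\Delta(\alpha_0)}{S_k^{\theta_0}}
 = \frac{2\alpha_0\Delta^2\,x_0^2}{\beta_0^2\left(e^{2\alpha_0\Delta} - 1\right)} \sum_{k=1}^{n} e^{2\alpha_0 t_k}.
\]
```

As $\Delta \to 0$, the Riemann sum $\Delta \sum_k e^{2\alpha_0 t_k}$ tends to $\int_0^T e^{2\alpha_0 s}\,ds$ and $2\alpha_0\Delta/(e^{2\alpha_0\Delta} - 1) \to 1$, so that $J_\Delta(\alpha_0, \beta_0) \to x_0^2 (e^{2\alpha_0 T} - 1)/(2\alpha_0\beta_0^2)$ in this example.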
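Proof. The proof of (i) is classical and relies on the control of the continuity modulus of $\bar U_{\Delta,\epsilon}(\alpha, X_{t_k}(\omega))$ (see Appendix A.3 for details).
Let us just study (ii). Expanding $\frac{\partial \bar U_\epsilon}{\partial\alpha}$ in Taylor series at point $\alpha_0$ yields
\[
0 = \frac{1}{\epsilon}\frac{\partial \bar U_\epsilon}{\partial\alpha}(\bar\alpha_\epsilon) = \frac{1}{\epsilon}\frac{\partial \bar U_\epsilon}{\partial\alpha}(\alpha_0) + \left[\frac{\partial^2 \bar U_\epsilon}{\partial\alpha^2}(\alpha_0) + \int_0^1 \left(\frac{\partial^2 \bar U_\epsilon}{\partial\alpha^2}\left(\alpha_0 + t(\bar\alpha_{\epsilon,\Delta} - \alpha_0)\right) - \frac{\partial^2 \bar U_\epsilon}{\partial\alpha^2}(\alpha_0)\right) dt\right] \epsilon^{-1}\left(\bar\alpha_\epsilon - \alpha_0\right).
\]
First, we study the term
\[
\frac{1}{\epsilon}\frac{\partial \bar U_\epsilon}{\partial\alpha_i}(\alpha_0) = 2\sqrt{\Delta} \sum_{k=1}^{n} {}^{t}\!\left(\frac{1}{\Delta}\frac{\partial N_k(X, \alpha)}{\partial\alpha_i}(\alpha_0)\right) \frac{1}{\epsilon\sqrt{\Delta}}\,N_k(X, \alpha_0).
\]
Using (2.6) and (2.10),
\[
\frac{1}{\epsilon\sqrt{\Delta}}\,N_k(X, \alpha_0) \xrightarrow[\epsilon \to 0]{} Z_k^{\theta_0} \quad \text{in } P_{\theta_0}\text{-probability.} \tag{3.7}
\]
Now, by (2.13) and (3.4), based on (2.8) we obtain
\[
\frac{\partial N_k(X, \alpha)}{\partial\alpha_i}(\alpha_0) = \Delta D_{k,i}(\alpha_0) - \frac{\partial \Phi_\alpha(t_k, t_{k-1})}{\partial\alpha_i}(\alpha_0)\left(X_{t_{k-1}} - x_{\alpha_0}(t_{k-1})\right),
\]
and
\[
\frac{\partial N_k(X, \alpha)}{\partial\alpha_i}(\alpha_0) \xrightarrow[\epsilon \to 0]{} \Delta D_{k,i}(\alpha_0). \tag{3.8}
\]
By Slutsky's lemma, ${}^{t}\!\left(\frac{1}{\Delta}\frac{\partial N_k(X, \alpha)}{\partial\alpha_i}(\alpha_0)\right) \frac{1}{\epsilon\sqrt{\Delta}} N_k(X, \alpha_0) \xrightarrow[\epsilon \to 0]{} {}^{t}D_{k,i}(\alpha_0)\,Z_k^{\theta_0}$, and definition (2.11) yields that, under $P_{\theta_0}$, as $\epsilon \to 0$,
\[
\frac{1}{\epsilon}\frac{\partial \bar U_\epsilon}{\partial\alpha}(\alpha_0) \xrightarrow[\epsilon \to 0]{} \mathcal{N}\left(0,\ 4\Delta\left(\sum_{k=1}^{n} {}^{t}D_{k,i}(\alpha_0)\,S_k^{\alpha_0,\beta_0}\,D_{k,j}(\alpha_0)\right)_{1 \le i, j \le a}\right) \quad \text{in distribution.}
\]
In Appendix A.3 we prove, by using the matrix defined in (3.5), that $P_{\theta_0}$-a.s.,
\[
\frac{\partial^2 \bar U_\epsilon}{\partial\alpha_i\,\partial\alpha_j}(\alpha_0) \xrightarrow[\epsilon \to 0]{} 2M_\Delta(\alpha_0)_{i,j}
\quad \text{and} \quad
\sup_{t \in [0, 1]} \left\|\frac{\partial^2 \bar U_\epsilon}{\partial\alpha^2}\left(\alpha_0 + t(\bar\alpha_{\epsilon,\Delta} - \alpha_0)\right) - \frac{\partial^2 \bar U_\epsilon}{\partial\alpha^2}(\alpha_0)\right\| \xrightarrow[\epsilon \to 0]{} 0,
\]
which completes the proof of (ii).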
It is well known that the Fisher information matrix for a continuously observed diffusion in the asymptotics of $\epsilon \to 0$ is (see e.g. [16])
\[
I_b(\alpha_0, \beta_0) = \left(\int_0^T {}^{t}\frac{\partial b}{\partial\alpha_i}(\alpha_0, x_{\alpha_0}(s))\,\Sigma^{-1}(\beta_0, x_{\alpha_0}(s))\,\frac{\partial b}{\partial\alpha_j}(\alpha_0, x_{\alpha_0}(s))\,ds\right)_{1 \le i, j \le a}. \tag{3.9}
\]
Setting
\[
F_b(\alpha_0, M) = \left(\int_0^T {}^{t}\frac{\partial b}{\partial\alpha_i}(\alpha_0, x_{\alpha_0}(s))\,M(s)\,\frac{\partial b}{\partial\alpha_j}(\alpha_0, x_{\alpha_0}(s))\,ds\right)_{1 \le i, j \le a},
\]
we have that $M_\Delta(\alpha_0) \to F_b(\alpha_0, I_p)$ and $J_\Delta(\alpha_0, \beta_0) \to F_b(\alpha_0, I_p)\left(F_b(\alpha_0, \Sigma(\beta_0, x_{\alpha_0}(\cdot)))\right)^{-1}\,{}^{t}F_b(\alpha_0, I_p)$ as $\Delta \to 0$. This is different from the Fisher information matrix $I_b(\alpha_0, \beta_0) = F_b(\alpha_0, \Sigma^{-1}(\beta_0, x_{\alpha_0}(\cdot)))$, but possesses the right rate of convergence.

3.3. Case of additional information on $\beta$
In this section we will consider successively the case where $\beta$ is a known regular function of $\alpha$ and the multiplicative case for parameter $\beta$, which applies to Ornstein–Uhlenbeck or Cox–Ingersoll–Ross models for example. In the former context, one particular subcase, interesting in applications, is given by $\alpha = \beta$ (see Section 5.3).
3.3.1. Case of $\beta = f(\alpha)$, $f$ known
In many applied situations, such as the modeling of epidemic spread (see Section 5.3), we have $\beta = \alpha$. Using a contrast depending on $\beta$ through $\alpha$ leads to the optimal asymptotic information. We regroup these cases in a more general formulation with $\beta = f(\alpha)$, where $f$ is known and regular. Since the Gaussian process $(Y_t)$ is a good approximation of $(X_t)$ for small $\epsilon$ (Theorem 2.1), we use the likelihood (2.15) to derive a contrast process for $(X_{t_k})$. The sampling interval $\Delta$ being fixed, the first term of (2.15) converges to a finite limit as $\epsilon \to 0$. This leads to the contrast process, using (2.11),
\[
\tilde U_{\Delta,\epsilon}\left(\alpha; (X_{t_k})\right) = \frac{1}{\Delta} \sum_{k=1}^{n} {}^{t}N_k(X, \alpha)\,(S_k^{\alpha, f(\alpha)})^{-1}\,N_k(X, \alpha). \tag{3.10}
\]
Then, we can define
\[
\tilde\alpha_{\epsilon,\Delta} = \underset{\alpha \in K_a}{\mathrm{argmin}}\ \tilde U_{\epsilon,\Delta}\left(\alpha, (X_{t_k})\right). \tag{3.11}
\]
Clearly, under (S1)–(S3), $\tilde U_{\Delta,\epsilon}(\alpha) \xrightarrow[\epsilon \to 0]{} \tilde K_\Delta(\alpha_0, \alpha)$, where
\[
\tilde K_\Delta(\alpha_0, \alpha) = \frac{1}{\Delta} \sum_{k=1}^{n} {}^{t}N_k(x_{\alpha_0}, \alpha)\,(S_k^{\alpha, f(\alpha)})^{-1}\,N_k(x_{\alpha_0}, \alpha). \tag{3.12}
\]
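Assumption (S4′) ensures that $\tilde K_\Delta(\alpha_0, \alpha)$ is non negative and has a strict minimum at $\alpha = \alpha_0$.
Proposition 3.2. Assume (S1)–(S3), (S4′). Then,
(i) $\tilde\alpha_{\epsilon,\Delta} \xrightarrow[\epsilon \to 0]{} \alpha_0$ in $P_{\theta_0}$-probability.
(ii) If $I_\Delta(\alpha_0, \beta_0)$ is invertible, $\epsilon^{-1}\left(\tilde\alpha_{\epsilon,\Delta} - \alpha_0\right) \xrightarrow[\epsilon \to 0]{} \mathcal{N}\left(0, I_\Delta^{-1}(\alpha_0, \beta_0)\right)$ under $P_{\theta_0}$ in distribution, with
\[
I_\Delta(\alpha_0, \beta_0) = \left(\Delta \sum_{k=1}^{n} {}^{t}D_{k,i}(\alpha_0)\,(S_k^{\theta_0})^{-1}\,D_{k,j}(\alpha_0)\right)_{1 \le i, j \le a}. \tag{3.13}
\]
The proof of (i) is a repetition of the proof of Proposition 3.1. The proof of (ii) relies again on two properties: under $P_{\theta_0}$, $\epsilon^{-1}\frac{\partial \tilde U_{\Delta,\epsilon}}{\partial\alpha}(\alpha_0) \xrightarrow[\epsilon \to 0]{} \mathcal{N}\left(0, 4I_\Delta(\alpha_0, \beta_0)\right)$ in distribution and $\frac{\partial^2 \tilde U_{\Delta,\epsilon}}{\partial\alpha^2}(\alpha_0) \xrightarrow[\epsilon \to 0]{} 2I_\Delta(\alpha_0, \beta_0)$ in probability. Contrary to Proposition 3.1, additional terms appear due to the differentiation of $S_k^{\alpha, f(\alpha)}$. Those terms are controlled using the regularity of $\alpha \mapsto \Phi_\alpha(t_k, t)$ and $\alpha \mapsto \Sigma(f(\alpha), x_\alpha(t))$. Details of the proof are given in Appendix A.4.
Remark 3.1. Contrary to the previous contrast (3.1), the covariance matrix is asymptotically optimal in the sense that $I_\Delta(\alpha_0, \beta_0) \xrightarrow[\Delta \to 0]{} I_b(\alpha_0, \beta_0)$, where $I_b$ is defined in (3.9).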
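The convergence stated in Remark 3.1 can be checked numerically in a simple case. The sketch below assumes the scalar linear model $b(\alpha, x) = \alpha x$, $\sigma(\beta, x) = \beta$, for which $D_k(\alpha) = -x_0 e^{\alpha t_k}$, $S_k^{\theta} = \beta^2 (e^{2\alpha\Delta} - 1)/(2\alpha\Delta)$ and $\partial b/\partial\alpha = x$. The model, the parameter values and the function names are illustrative assumptions.

```python
import math

def i_delta(alpha, beta, x0, T, n):
    """I_Delta of (3.13) for the scalar model dX = alpha X dt + eps beta dB."""
    delta = T / n
    # S_k is constant in k for this model
    s = beta ** 2 * (math.exp(2.0 * alpha * delta) - 1.0) / (2.0 * alpha * delta)
    total = 0.0
    for k in range(1, n + 1):
        d_k = -x0 * math.exp(alpha * k * delta)   # D_k from (3.4)
        total += d_k * d_k / s
    return delta * total

def i_b(alpha, beta, x0, T):
    """I_b of (3.9) for the same model: integral of (x0 e^{alpha s})^2 / beta^2 over [0, T]."""
    return x0 ** 2 * (math.exp(2.0 * alpha * T) - 1.0) / (2.0 * alpha * beta ** 2)

alpha0, beta0, x0, T = 0.8, 1.5, 2.0, 2.0
for n in (10, 100, 10000):
    print(n, i_delta(alpha0, beta0, x0, T, n) / i_b(alpha0, beta0, x0, T))
```

The printed ratio approaches 1 as $n$ grows, that is as $\Delta = T/n$ decreases.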
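3.3.2. The multiplicative case ($\Sigma(\beta, x) = f(\beta)\Sigma_0(x)$)
The case of $\Sigma(\beta, x) = f(\beta)\Sigma_0(x)$ with $f(\cdot)$ a strictly positive known function of $C(\mathbb{R}^b, \mathbb{R}_+^*)$ often occurs in practice. Note that $S_k^{\alpha,\beta} = f(\beta)\,S_k^{\alpha,0}$ with
\[
S_k^{\alpha,0} = \frac{1}{\Delta} \int_{t_{k-1}}^{t_k} \Phi_\alpha(t_k, s)\,\Sigma_0(x_\alpha(s))\,{}^{t}\Phi_\alpha(t_k, s)\,ds.
\]
Define the contrast process
\[
\tilde U_{\Delta,\epsilon}\left(\alpha; (X_{t_k})\right) = \frac{1}{\Delta} \sum_{k=1}^{n} {}^{t}N_k(X, \alpha)\,(S_k^{\alpha,0})^{-1}\,N_k(X, \alpha),
\]
then we have the following.
Corollary 3.1. Assume (S1)–(S3) and (S4′). Then, under $P_{\theta_0}$, as $\epsilon \to 0$,
(i) $\tilde\alpha_{\epsilon,\Delta} \xrightarrow[\epsilon \to 0]{} \alpha_0$ in $P_{\theta_0}$-probability.
(ii) If $I_\Delta(\alpha_0, \beta_0)$ is invertible, $\epsilon^{-1}\left(\tilde\alpha_{\epsilon,\Delta} - \alpha_0\right) \xrightarrow[\epsilon \to 0]{} \mathcal{N}\left(0, I_\Delta^{-1}(\alpha_0, \beta_0)\right)$ in distribution.
Its study is similar to the case $\beta = f(\alpha)$, with a substitution of $S_k^{\alpha, f(\alpha)}$ by $S_k^{\alpha,0}$ in (3.10).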
4. Parametric inference for a small sampling interval
We assume now that $\Delta = \Delta_n \to 0$, so that the number of observations $n = T/\Delta_n$ goes to infinity. The results obtained by Gloter and Sørensen [10] state that, under the additional condition ($\exists\rho > 0$, $\Delta_n^\rho/\epsilon$ bounded), the rates of convergence for $\alpha$, $\beta$ are respectively $\epsilon^{-1}$ and $1/\sqrt{\Delta_n}$. Indeed, considering the one-dimensional Ornstein–Uhlenbeck process, the estimator $\hat\beta_{\epsilon,\Delta}^2$ obtained in Section 3.1 is still the MLE, is consistent and satisfies $\sqrt{n}\left(\hat\beta_{\epsilon,\Delta}^2 - \beta_0^2\right) \xrightarrow[\epsilon,\Delta \to 0]{} \mathcal{N}(0, 2\beta_0^4)$. In the sequel, we follow [21,10], which allow the study of contrast estimators of parameters that converge at different rates: we prove the consistency of $\check\alpha_{\epsilon,\Delta}$ in Proposition 4.1 and the tightness of the sequence $(\check\alpha_{\epsilon,\Delta} - \alpha_0)/\epsilon$ in Proposition 4.2. From this, we deduce the consistency of $\check\beta_{\epsilon,\Delta}$ in Proposition 4.3. Asymptotic normality for both estimators is finally proved in Theorem 4.1. For clarity, we omit the index $n$ in $\Delta_n$. Using (2.13), let us consider now the contrast process $\check U_{\epsilon,\Delta}\left((\alpha, \beta), (X_{t_k})\right) = \check U_{\epsilon,\Delta}(\alpha, \beta)$,
\[
\check U_{\epsilon,\Delta}(\alpha, \beta) = \sum_{k=1}^{n} \log\det\Sigma(\beta, X_{t_{k-1}}) + \frac{1}{\epsilon^2\Delta} \sum_{k=1}^{n} {}^{t}N_k(X, \alpha)\,\Sigma^{-1}(\beta, X_{t_{k-1}})\,N_k(X, \alpha). \tag{4.1}
\]
The minimum contrast estimators are defined as any solution of
\[
(\check\alpha_{\epsilon,\Delta}, \check\beta_{\epsilon,\Delta}) = \underset{(\alpha,\beta) \in \Theta}{\mathrm{argmin}}\ \check U_{\epsilon,\Delta}(\alpha, \beta). \tag{4.2}
\]
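To make (4.1) and (4.2) concrete, the sketch below evaluates and minimises the contrast for a scalar example. It assumes the model $dX_t = \alpha X_t\,dt + \epsilon\beta\,dB_t$, so that $\Sigma(\beta, x) = \beta^2$ and $N_k(X, \alpha) = X_{t_k} - e^{\alpha\Delta} X_{t_{k-1}}$. In that case the minimisation can be done by profiling: $\alpha$ minimises the residual sum of squares and $\beta^2$ is the rescaled minimum. The model, parameter values and function names are assumptions of this illustration.

```python
import math
import random

def simulate_ou(alpha, beta, eps, x0, delta, n, rng):
    """Exact simulation of dX = alpha X dt + eps beta dB at times k * delta."""
    a = math.exp(alpha * delta)
    sd = eps * beta * math.sqrt((a * a - 1.0) / (2.0 * alpha))
    path = [x0]
    for _ in range(n):
        path.append(a * path[-1] + sd * rng.gauss(0.0, 1.0))
    return path

def contrast(alpha, beta, path, eps, delta):
    """Contrast (4.1) with Sigma(beta, x) = beta^2 (scalar case)."""
    a = math.exp(alpha * delta)
    n = len(path) - 1
    rss = sum((path[k] - a * path[k - 1]) ** 2 for k in range(1, n + 1))
    return n * math.log(beta ** 2) + rss / (eps ** 2 * delta * beta ** 2)

def minimise(path, eps, delta):
    """Joint minimiser of the contrast for this model, obtained by profiling in beta."""
    n = len(path) - 1
    num = sum(path[k] * path[k - 1] for k in range(1, n + 1))
    den = sum(path[k - 1] ** 2 for k in range(1, n + 1))
    a_hat = num / den
    rss = sum((path[k] - a_hat * path[k - 1]) ** 2 for k in range(1, n + 1))
    return math.log(a_hat) / delta, math.sqrt(rss / (n * eps ** 2 * delta))

rng = random.Random(2)
obs = simulate_ou(alpha=-1.0, beta=0.5, eps=0.01, x0=1.0, delta=0.001, n=2000, rng=rng)
alpha_check, beta_check = minimise(obs, eps=0.01, delta=0.001)
print(alpha_check, beta_check)
```

For models without explicit $x_\alpha$ and $\Phi_\alpha$, the same contrast would instead be evaluated from numerically integrated flows and handed to a generic optimiser over $\Theta$.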
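For studying these estimators, we need to state some lemmas on the behavior of $N_k(X, \alpha)$.

4.1. Asymptotic properties of $N_k(X, \alpha)$
Clearly, as $\epsilon$ goes to zero, $N_k(X, \alpha)$ converges to $N_k(x_{\alpha_0}, \alpha)$ by (2.8) under $P_{\theta_0}$. Let us define the function
\[
\Gamma(\alpha_0, \alpha; t) = b(\alpha_0, x_{\alpha_0}(t)) - b(\alpha, x_\alpha(t)) - \frac{\partial b}{\partial x}(\alpha, x_\alpha(t))\left(x_{\alpha_0}(t) - x_\alpha(t)\right) \in \mathbb{R}^p. \tag{4.3}
\]
Then, the functions $\left(N_k(x_{\alpha_0}, \alpha)\right)_{k \le n}$ satisfy
Lemma 4.1. Under (S2),
\[
\sup_{k \in \{1, \dots, n\},\ \alpha \in K_a} \left\|\frac{N_k(x_{\alpha_0}, \alpha)}{\Delta} - \Gamma(\alpha_0, \alpha; t_{k-1})\right\| \xrightarrow[\Delta \to 0]{} 0.
\]
Proof. First, note that $N_k(x_{\alpha_0}, \alpha)$ defined in (2.13) can be written
\[
N_k(x_{\alpha_0}, \alpha) = x_{\alpha_0}(t_k) - x_{\alpha_0}(t_{k-1}) - \left(x_\alpha(t_k) - x_\alpha(t_{k-1})\right) - \left(\Phi_\alpha(t_k, t_{k-1}) - I_p\right)\left(x_{\alpha_0}(t_{k-1}) - x_\alpha(t_{k-1})\right). \tag{4.4}
\]
Hence, using (4.3) we have
\[
\begin{aligned}
\frac{1}{\Delta} N_k(x_{\alpha_0}, \alpha) = {} & \Gamma(\alpha_0, \alpha; t_{k-1}) + \left[\frac{1}{\Delta}\left(x_{\alpha_0}(t_k) - x_{\alpha_0}(t_{k-1})\right) - b(\alpha_0, x_{\alpha_0}(t_{k-1}))\right] \\
& - \left[\frac{1}{\Delta}\left(x_\alpha(t_k) - x_\alpha(t_{k-1})\right) - b(\alpha, x_\alpha(t_{k-1}))\right] \\
& - \left[\frac{1}{\Delta}\left(\Phi_\alpha(t_k, t_{k-1}) - I_p\right) - \frac{\partial b}{\partial x}(\alpha, x_\alpha(t_{k-1}))\right]\left(x_{\alpha_0}(t_{k-1}) - x_\alpha(t_{k-1})\right).
\end{aligned}
\]
The uniform approximation is then obtained using the analytical properties (A.1), (A.4) of $x_\alpha$ and $\Phi_\alpha$ given in Appendix A.1.
Let us now study the properties of $N_k(X, \alpha)$.
Lemma 4.2. Assume (S1)–(S3). Then, under $P_{\theta_0}$, for all $k$ ($1 \le k \le n$),
\[
\frac{1}{\Delta}\left[N_k(X, \alpha) - N_k(X, \alpha_0)\right] = \frac{1}{\Delta} N_k(x_{\alpha_0}, \alpha) + \epsilon\,\|\alpha - \alpha_0\|\,\eta_k,
\]
where $\eta_k = \eta_k(\alpha_0, \alpha, \epsilon, \Delta)$ is $\mathcal{F}_{t_{k-1}}$-measurable and satisfies that, under $P_{\theta_0}$, as $\epsilon, \Delta \to 0$, $\sup_{k \in \{1, \dots, n\},\ \alpha \in K_a} \|\eta_k\|$ is bounded in probability.
Proof. Using (2.6) and (2.13), $N_k(X, \alpha)$ can be written
\[
N_k(X, \alpha) = N_k(X, \alpha_0) + N_k(x_{\alpha_0}, \alpha) + \left(\Phi_{\alpha_0}(t_k, t_{k-1}) - \Phi_\alpha(t_k, t_{k-1})\right)\epsilon R_{\theta_0}^{1,\epsilon}(t_{k-1}).
\]
Applying (A.1) yields that $\frac{1}{\Delta}\left\|\Phi_{\alpha_0}(t_k, t_{k-1}) - \Phi_\alpha(t_k, t_{k-1})\right\| \le 2\left\|\frac{\partial b}{\partial x}(\alpha_0, x_{\alpha_0}(t_{k-1})) - \frac{\partial b}{\partial x}(\alpha, x_\alpha(t_{k-1}))\right\| \le K\|\alpha - \alpha_0\|$. Assumption (S3) ensures that $(t, \alpha) \mapsto \frac{\partial b}{\partial x}(\alpha, x_\alpha(t))$ is uniformly continuous on $[0, T] \times K_a$, and (2.6) that $\sup_{t \in [0, T]} \|R_{\theta_0}^{1,\epsilon}(t)\|$ is bounded in probability under $P_{\theta_0}$. The proof is completed by setting
\[
\eta_k = \frac{\Phi_{\alpha_0}(t_k, t_{k-1}) - \Phi_\alpha(t_k, t_{k-1})}{\Delta\,\|\alpha - \alpha_0\|}\,R_{\theta_0}^{1,\epsilon}(t_{k-1})
\]
and noting that $R_{\theta_0}^{1,\epsilon}(t_{k-1})$ is $\mathcal{F}_{t_{k-1}}$-measurable.
The following lemma concerns the properties of $N_k(X, \alpha_0)$.
Lemma 4.3. Assume (S1)–(S3). Then, under $P_{\theta_0}$,
\[
N_k(X, \alpha_0) = \epsilon\,\sigma(\beta_0, X_{t_{k-1}})\left(B_{t_k} - B_{t_{k-1}}\right) + E_k,
\]
where $E_k = E_k(\alpha_0, \beta_0)$ satisfies, for $m \ge 2$, $E\left(\|E_k\|^m \mid \mathcal{F}_{t_{k-1}}\right) \le C\epsilon^m\Delta^m$.
The proof of Lemma 4.3 follows the proof of [10] and is given in Appendix A.5. The properties of the derivatives of $N_k(X, \alpha)$ are given in the following lemma.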
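Lemma 4.4. Assume (S1)–(S3). Then, for all $i, j$, $1 \le i, j \le a$, as $\epsilon, \Delta \to 0$,
(i) $\frac{1}{\Delta}\frac{\partial N_k(X, \alpha)}{\partial\alpha_i}(\alpha_0) = -\frac{\partial b}{\partial\alpha_i}(\alpha_0, x_{\alpha_0}(t_{k-1})) + \epsilon\,\zeta_{k,i} + r_{k,i}$, where $\zeta_{k,i} = \zeta_{k,i}(\alpha_0, \epsilon, \Delta)$ is $\mathcal{F}_{t_{k-1}}$-measurable and satisfies that $\sup_{k \in \{1, \dots, n\}} \|\zeta_{k,i}\|$ is bounded in $P_{\theta_0}$-probability as $\epsilon, \Delta \to 0$, and $r_{k,i} = r_{k,i}(\alpha_0, \Delta)$ is deterministic and satisfies $\sup_{k \in \{1, \dots, n\}} \|r_{k,i}\| \xrightarrow[\Delta \to 0]{} 0$.
(ii) $\frac{1}{\Delta}\frac{\partial^2 N_k(X, \alpha)}{\partial\alpha_i\,\partial\alpha_j}(\alpha_0)$ is bounded in $P_{\theta_0}$-probability.
Proof. Let us first prove (i). Differentiating $N_k(X, \alpha)$ in (2.13) with respect to $\alpha_i$ and using (2.8), we get
\[
\frac{\partial N_k(X, \alpha)}{\partial\alpha_i}(\alpha_0) = -\frac{\partial x_\alpha(t_k)}{\partial\alpha_i}(\alpha_0) + \frac{\partial x_\alpha(t_{k-1})}{\partial\alpha_i}(\alpha_0) + \left(\Phi_{\alpha_0}(t_k, t_{k-1}) - I_p\right)\frac{\partial x_\alpha(t_{k-1})}{\partial\alpha_i}(\alpha_0) - \epsilon\,\frac{\partial \Phi_\alpha(t_k, t_{k-1})}{\partial\alpha_i}(\alpha_0)\,R_{\theta_0}^{1,\epsilon}(t_{k-1}).
\]
Set $\zeta_{k,i} = -\frac{1}{\Delta}\frac{\partial \Phi_\alpha(t_k, t_{k-1})}{\partial\alpha_i}(\alpha_0)\,R_{\theta_0}^{1,\epsilon}(t_{k-1})$. Using (2.6) and (A.2) we obtain, as $\Delta \to 0$, that $\left\|\frac{1}{\Delta}\frac{\partial \Phi_\alpha(t_k, t_{k-1})}{\partial\alpha_i}(\alpha_0)\right\| \le 2\left\|\frac{\partial^2 b(\alpha, x_\alpha(t_{k-1}))}{\partial\alpha_i\,\partial x}(\alpha_0)\right\|$, and that $\sup_{t \in [0, T]} \|R_{\theta_0}^{1,\epsilon}(t)\|$ is bounded in $P_{\theta_0}$-probability. It remains to study the deterministic part
\[
E_{k,i} = -\frac{\partial x_\alpha(t_k)}{\partial\alpha_i}(\alpha_0) + \frac{\partial x_\alpha(t_{k-1})}{\partial\alpha_i}(\alpha_0) + \left[\Phi_{\alpha_0}(t_k, t_{k-1}) - I_p\right]\frac{\partial x_\alpha(t_{k-1})}{\partial\alpha_i}(\alpha_0).
\]
According to (A.1) and (A.5), as $\Delta \to 0$, $\frac{1}{\Delta}\left(\Phi_{\alpha_0}(t_k, t_{k-1}) - I_p\right)$ (resp. $\frac{1}{\Delta}\left(\frac{\partial x_\alpha(t_k)}{\partial\alpha_i} - \frac{\partial x_\alpha(t_{k-1})}{\partial\alpha_i}\right)(\alpha_0)$) is approximated by $\frac{\partial b}{\partial x}(\alpha_0, x_{\alpha_0}(t_{k-1}))$ (resp. $\frac{\partial b(\alpha, x_\alpha(t_{k-1}))}{\partial\alpha_i}(\alpha_0)$). Noting that
\[
\frac{\partial b(\alpha, x_\alpha(t))}{\partial\alpha_i}(\alpha_0) = \frac{\partial b}{\partial\alpha_i}(\alpha_0, x_{\alpha_0}(t)) + \frac{\partial b}{\partial x}(\alpha_0, x_{\alpha_0}(t))\,\frac{\partial x_\alpha(t)}{\partial\alpha_i}(\alpha_0),
\]
we get that $\frac{1}{\Delta}E_{k,i} = -\frac{\partial b}{\partial\alpha_i}(\alpha_0, x_{\alpha_0}(t_{k-1})) + r_{k,i}$ with $\sup_{k \in \{1, \dots, n\}} \|r_{k,i}\| \xrightarrow[\Delta \to 0]{} 0$, which completes the proof. The proof of (ii) is given in Appendix A.6.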
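4.2. Study of the contrast process $\check U_{\epsilon,\Delta}$
First, consider the estimation of parameters present in the drift coefficient. Using (4.3), we define
\[
K_1(\alpha_0, \alpha; \beta) = \int_0^T {}^{t}\Gamma(\alpha_0, \alpha; t)\,\Sigma^{-1}(\beta, x_{\alpha_0}(t))\,\Gamma(\alpha_0, \alpha; t)\,dt. \tag{4.5}
\]
Note that $K_1$ is non negative and, by (S4), if $\alpha \ne \alpha_0$, the function $\Gamma(\alpha_0, \alpha, \cdot)$ is not identically zero. Thus, $K_1(\alpha_0, \alpha, \beta)$ is equal to 0 if and only if $\alpha = \alpha_0$, and defines a contrast function for all $\beta$.
Proposition 4.1. Assume (S1)–(S4). Then, as $\epsilon \to 0$ and $\Delta \to 0$, under $P_{\theta_0}$, using definition (4.1) for $\check U_{\epsilon,\Delta}$,
(i) $\sup_{\theta \in \Theta} \left|\epsilon^2\left(\check U_{\epsilon,\Delta}(\alpha, \beta) - \check U_{\epsilon,\Delta}(\alpha_0, \beta)\right) - K_1(\alpha_0, \alpha; \beta)\right| \to 0$ in probability;
(ii) $\check\alpha_{\epsilon,\Delta} \xrightarrow[\epsilon,\Delta \to 0]{} \alpha_0$ in probability.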
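Proof. Let us prove (i). Using (4.1), we get
\[
\epsilon^2\left(\check U_{\epsilon,\Delta}(\alpha, \beta) - \check U_{\epsilon,\Delta}(\alpha_0, \beta)\right) = \frac{1}{\Delta} \sum_{k=1}^{n} {}^{t}\left[N_k(X, \alpha) - N_k(X, \alpha_0)\right]\Sigma^{-1}(\beta, X_{t_{k-1}})\left[N_k(X, \alpha) + N_k(X, \alpha_0)\right].
\]
Using that $N_k(x_{\alpha_0}, \alpha_0) = 0$, an application of Lemma 4.2 yields
\[
\epsilon^2\left(\check U_{\epsilon,\Delta}(\alpha, \beta) - \check U_{\epsilon,\Delta}(\alpha_0, \beta)\right) = \Delta \sum_{k=1}^{n} {}^{t}\!\left(\frac{N_k(x_{\alpha_0}, \alpha)}{\Delta}\right)\Sigma^{-1}(\beta, x_{\alpha_0}(t_{k-1}))\,\frac{N_k(x_{\alpha_0}, \alpha)}{\Delta} + R(\alpha_0, \alpha, \beta; \epsilon, \Delta).
\]
The first term of the above formula is a Riemann sum which converges by Lemma 4.1 to the function $K_1(\alpha_0, \alpha, \beta)$ defined in (4.5) as $\Delta \to 0$. This convergence is uniform with respect to the parameters. Let us now study the remainder term. Using Lemma 4.2, we get that $R(\alpha_0, \alpha, \beta; \epsilon, \Delta) = T_1 + T_2 + T_3$, where
\[
T_1 = \Delta \sum_{k=1}^{n} {}^{t}\!\left(\frac{N_k(x_{\alpha_0}, \alpha)}{\Delta}\right)\left[\Sigma^{-1}(\beta, X_{t_{k-1}}) - \Sigma^{-1}(\beta, x_{\alpha_0}(t_{k-1}))\right]\frac{N_k(x_{\alpha_0}, \alpha)}{\Delta},
\]
\[
T_2 = \Delta\,\epsilon\,\|\alpha - \alpha_0\| \sum_{k=1}^{n} {}^{t}V_k\,\eta_k, \quad \text{with } V_k = \Sigma^{-1}(\beta, X_{t_{k-1}})\left(\frac{N_k(x_{\alpha_0}, \alpha)}{\Delta} + \epsilon\,\|\alpha - \alpha_0\|\,\eta_k\right),
\]
\[
T_3 = 2 \sum_{k=1}^{n} {}^{t}V_k\,N_k(X, \alpha_0).
\]
Using Lemma 4.1 yields $|T_1| \le 2n\Delta \sup_{t \in [0, T],\ \alpha \in K_a} \|\Gamma(\alpha_0, \alpha; t)\| \sup_{\beta \in K_b} \|\Sigma^{-1}(\beta, X_{t_{k-1}}) - \Sigma^{-1}(\beta, x_{\alpha_0}(t_{k-1}))\|$. By the Taylor stochastic formula this supremum goes to zero in $P_{\theta_0}$-probability as $\epsilon \to 0$. The term $T_2$ contains the random variables $\eta_k$ and $V_k$, which are uniformly bounded in $P_{\theta_0}$-probability by Lemma 4.2. Hence $|T_2| \le \epsilon T \sup_{k \in \{1, \dots, n\},\ \alpha \in K_a} \|\eta_k\| \sup_{k \in \{1, \dots, n\}} \|V_k\|$, which yields that $T_2$ goes to 0 as $\epsilon, \Delta \to 0$. Finally, we prove that $T_3$ goes to zero in $P_{\theta_0}$-probability, by establishing the more general result:
\[
\text{if } \sup_{k \in \{1, \dots, n\}} \|V_k\| < \infty, \quad \sum_{k=1}^{n} {}^{t}V_k\,N_k(X, \alpha_0) \xrightarrow[\epsilon,\Delta \to 0]{} 0 \quad \text{in } P_{\theta_0}\text{-probability.} \tag{4.6}
\]
Indeed, Lemma 4.3 yields $\left|E\left({}^{t}V_k N_k(X, \alpha_0) \mid \mathcal{F}_{t_{k-1}}\right)\right| = \left|{}^{t}V_k\,E\left(E_k \mid \mathcal{F}_{t_{k-1}}\right)\right| \le \sup_{k \in \{1, \dots, n\}} \|V_k\| \left(E\left(\|E_k\|^2 \mid \mathcal{F}_{t_{k-1}}\right)\right)^{1/2} \le C\Delta\epsilon$. Using (A.9) in Appendix A.5 yields $E\left(({}^{t}V_k N_k(X, \alpha_0))^2 \mid \mathcal{F}_{t_{k-1}}\right) \le C'\Delta\epsilon^2$. Set $X_{n,k} = {}^{t}V_k N_k(X, \alpha_0)$. We get (4.6) using an application of Lemma 9 in [8] (Lemma A.2 in the Appendix). All convergences above are uniform with respect to $\theta$ and the proof of (i) is complete.
Let us now prove (ii). The uniformity with respect to $\alpha$ in (i) ensures that the continuity modulus of $\check U_{\epsilon,\Delta}$ is dominated, as $\epsilon, \Delta \to 0$, by the continuity modulus of $K_1$. By compactness of $K_a$, we can extract a sub-sequence $\left(\check\alpha_{\epsilon_k,\Delta_k}\right)_{k \ge 1}$ of $\check\alpha_{\epsilon,\Delta}$ with $\check\alpha_{\epsilon_k,\Delta_k} \xrightarrow[k \to \infty]{} \alpha_\infty \in K_a$. Then, by definition (4.2) of $\check\alpha_{\epsilon,\Delta}$, $0 \le K_1(\alpha_0, \alpha_\infty, \beta) \le K_1(\alpha_0, \alpha_0, \beta) = 0$, which yields, by (S4), $\alpha_\infty = \alpha_0$. So any convergent sub-sequence of $\check\alpha_{\epsilon,\Delta}$ goes to $\alpha_0$, which completes the proof.
The following proposition studies the tightness of $\epsilon^{-1}\left(\check\alpha_{\epsilon,\Delta} - \alpha_0\right)$ with respect to $\beta$.
Proposition 4.2. Assume (S1)–(S4). If $I_b(\alpha_0, \beta_0)$ defined in (3.9) is invertible, as $\epsilon, \Delta \to 0$, $\sup_{\beta \in K_b} \left\|\epsilon^{-1}\left(\check\alpha_{\epsilon,\Delta} - \alpha_0\right)\right\|$ is bounded in $P_{\theta_0}$-probability.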
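Using definition (3.9) for $I_b$, the proof given in Appendix A.7 relies on the following two properties: for all $\beta\in K_b$,
$$\forall (i,j)\in\{1,\dots,a\}^2,\qquad \epsilon^2\frac{\partial^2\check U_\epsilon}{\partial\alpha_i\partial\alpha_j}(\alpha_0,\beta)\xrightarrow[\epsilon,\Delta\to0]{} 2I_b(\alpha_0,\beta)_{i,j}, \tag{4.7}$$
$$\epsilon\frac{\partial\check U_\epsilon}{\partial\alpha}(\alpha_0,\beta)\xrightarrow[\epsilon,\Delta\to0]{} N\left(0,4I_b(\alpha_0,\beta)\right). \tag{4.8}$$
For studying the estimation of $\beta$, let us define
$$K_2(\alpha_0;\beta_0,\beta)=\frac1T\int_0^T \mathrm{Tr}\left(\Sigma^{-1}(\beta,x_{\alpha_0}(t))\Sigma(\beta_0,x_{\alpha_0}(t))\right)dt-\frac1T\int_0^T\log\det\left(\Sigma^{-1}(\beta,x_{\alpha_0}(t))\Sigma(\beta_0,x_{\alpha_0}(t))\right)dt-p. \tag{4.9}$$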
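Using the inequality $\mathrm{Tr}(A)-p-\log(\det(A))\ge0$, valid for symmetric positive definite $p\times p$ matrices $A$ with equality if and only if $A=I_p$, we get that, for all $\beta$, $K_2(\alpha_0;\beta_0,\beta)$ is non-negative and is equal to 0 if and only if, for all $t$, $\Sigma(\beta_0,x_{\alpha_0}(t))=\Sigma(\beta,x_{\alpha_0}(t))$, which implies $\beta=\beta_0$ by (S5).

Proposition 4.3. Assume (S1)–(S5). Then, if $I_b(\alpha_0,\beta_0)$ is invertible, the following holds in $P_{\theta_0}$-probability, using (4.1), (4.2) and (4.9):
(i) $\sup_{\beta\in K_b}\left|\frac1n\left(\check U_{\Delta,\epsilon}(\check\alpha_{\epsilon,\Delta},\beta)-\check U_{\Delta,\epsilon}(\check\alpha_{\epsilon,\Delta},\beta_0)\right)-K_2(\alpha_0;\beta_0,\beta)\right|\xrightarrow[\epsilon,\Delta\to0]{}0$;
(ii) $\check\beta_{\epsilon,\Delta}\xrightarrow[\epsilon,\Delta\to0]{}\beta_0$.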
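To make the identifiability property of $K_2$ concrete, the following minimal sketch evaluates (4.9) in a scalar case. It assumes $p=1$ and $\Sigma(\beta,x)=\beta^2x$ (the form used in Section 5.1), so that $\Sigma^{-1}(\beta,x)\Sigma(\beta_0,x)=\beta_0^2/\beta^2$ does not depend on $t$ and $K_2$ reduces to $r-\log r-1$ with $r=\beta_0^2/\beta^2$. The function name `k2_scalar` is illustrative only.

```python
import math


def k2_scalar(beta0: float, beta: float) -> float:
    """K_2 of (4.9) for p = 1 and Sigma(beta, x) = beta**2 * x (assumed form).

    The ratio Sigma^{-1}(beta, x) * Sigma(beta0, x) = (beta0 / beta)**2 is
    constant in t, so both time averages collapse to a single value.
    """
    r = (beta0 / beta) ** 2
    return r - math.log(r) - 1.0


if __name__ == "__main__":
    beta0 = 1.0
    for beta in (0.5, 0.8, 1.0, 1.25, 2.0):
        print(f"beta = {beta:4.2f}   K2 = {k2_scalar(beta0, beta):.4f}")
```

In this scalar case the function vanishes at $\beta=\beta_0$ and is strictly positive elsewhere, which is the behaviour the contrast relies on.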
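Proof. Let us first prove (i). Using (4.1), we get
$$\frac1n\left(\check U_{\Delta,\epsilon}(\alpha,\beta)-\check U_{\Delta,\epsilon}(\alpha,\beta_0)\right)=A_1(\beta_0,\beta)+A_2(\alpha,\beta_0,\beta)$$
with
$$A_1(\beta_0,\beta)=\frac1n\sum_{k=1}^n\log\det\left[\Sigma(\beta,X_{t_{k-1}})\Sigma^{-1}(\beta_0,X_{t_{k-1}})\right], \tag{4.10}$$
$$A_2(\alpha,\beta_0,\beta)=\frac{1}{n\Delta\epsilon^2}\sum_{k=1}^n {}^tN_k(X,\alpha)\left(\Sigma^{-1}(\beta,X_{t_{k-1}})-\Sigma^{-1}(\beta_0,X_{t_{k-1}})\right)N_k(X,\alpha). \tag{4.11}$$
Using that, under (S2), $x\mapsto\log\det\left[\Sigma(\beta,x)\Sigma^{-1}(\beta_0,x)\right]$ is differentiable on $U$, an application of the Taylor stochastic formula yields
$$A_1(\beta_0,\beta)=\frac{\Delta}{T}\sum_{k=1}^n\left(\log\det\left[\Sigma(\beta,x_{\alpha_0}(t_{k-1}))\Sigma^{-1}(\beta_0,x_{\alpha_0}(t_{k-1}))\right]+\epsilon R^{1,\epsilon}_{\alpha_0,\beta,\beta_0}(t_{k-1})\right)$$
with $\|R^{1,\epsilon}_{\alpha_0,\beta,\beta_0}\|$ uniformly bounded in $P_{\theta_0}$-probability. Hence, $A_1(\beta_0,\beta)$, as a Riemann sum, converges to $\frac1T\int_0^T\log\det\left[\Sigma(\beta,x_{\alpha_0}(t))\Sigma^{-1}(\beta_0,x_{\alpha_0}(t))\right]dt$ as $\epsilon,\Delta\to0$.

Applying Lemma 4.3 to $N_k(X,\alpha_0)$ yields
$$A_2(\alpha_0,\beta_0,\beta)=\frac{\Delta}{T}\sum_{k=1}^n {}^tU_kM_kU_k+\frac{1}{\epsilon^2T}\sum_{k=1}^n {}^tE_k\left(\Sigma^{-1}(\beta,X_{t_{k-1}})-\Sigma^{-1}(\beta_0,X_{t_{k-1}})\right)E_k$$
with $U_k=\frac{1}{\sqrt\Delta}\left(B_{t_k}-B_{t_{k-1}}\right)$ and $M_k={}^t\sigma(\beta_0,X_{t_{k-1}})\left(\Sigma^{-1}(\beta,X_{t_{k-1}})-\Sigma^{-1}(\beta_0,X_{t_{k-1}})\right)\sigma(\beta_0,X_{t_{k-1}})$. The random vectors $U_k$ are $N(0,I_p)$ and independent of $\mathcal F_{t_{k-1}}$. Hence, using that $E({}^tUMU)=\mathrm{Tr}(M)$ for $U\sim N(0,I_p)$, we get
$$E\left[{}^tU_kM_kU_k\,|\,\mathcal F_{t_{k-1}}\right]=\mathrm{Tr}(M_k)=\mathrm{Tr}\left(\Sigma^{-1}(\beta,X_{t_{k-1}})\Sigma(\beta_0,X_{t_{k-1}})-I_p\right).$$
The first term of $A_2(\alpha_0,\beta_0,\beta)$ converges to $\frac1T\int_0^T\mathrm{Tr}\left(\Sigma^{-1}(\beta,x_{\alpha_0}(t))\Sigma(\beta_0,x_{\alpha_0}(t))\right)dt-p$. Joining this result with the one for $A_1$, we obtain consistency towards $K_2$ defined in (4.9). The detailed proofs for the consistency of $A_2(\alpha_0,\beta_0,\beta)$ and the control of the error term $A_2(\check\alpha_{\epsilon,\Delta},\beta_0,\beta)-A_2(\alpha_0,\beta_0,\beta)$ are given in Appendix A.8. The proof of (ii) is a repetition of the one of Proposition 4.1-(ii).

Let us now study the asymptotic properties of our estimators. Define the $b\times b$ matrix
$$I_\sigma(\alpha_0,\beta_0)_{i,j}=\frac{1}{2T}\int_0^T\mathrm{Tr}\left[\frac{\partial\Sigma}{\partial\beta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta_j}\Sigma^{-1}\right](\beta_0,x_{\alpha_0}(s))\,ds. \tag{4.12}$$

Theorem 4.1. Assume (S1)–(S5). If $I_b(\alpha_0,\beta_0)$ and $I_\sigma(\alpha_0,\beta_0)$, defined in (3.9) and (4.12), are invertible, we have under $P_{\theta_0}$, in distribution,
$$\begin{pmatrix}\epsilon^{-1}(\check\alpha_{\epsilon,\Delta}-\alpha_0)\\ \sqrt n\,(\check\beta_{\epsilon,\Delta}-\beta_0)\end{pmatrix}\to N\left(0,\begin{pmatrix}I_b^{-1}(\alpha_0,\beta_0)&0\\0&I_\sigma^{-1}(\alpha_0,\beta_0)\end{pmatrix}\right).$$

We have already studied the limits as $\epsilon,\Delta\to0$ of $\epsilon^2\frac{\partial^2\check U_\epsilon}{\partial\alpha_i\partial\alpha_j}(\alpha_0,\beta_0)$ and $\epsilon\frac{\partial\check U_\epsilon}{\partial\alpha}(\alpha_0,\beta_0)$ in Lemma 4.2. These results lead to the asymptotic normality of $\check\alpha_{\epsilon,\Delta}$. For $\check\beta_{\epsilon,\Delta}$ we have to establish that $\frac{1}{\sqrt n}\frac{\partial\check U_{\Delta,\epsilon}}{\partial\beta_i}(\theta_0)\to N(0,4I_\sigma(\theta_0))$ in distribution and that $\frac1n\frac{\partial^2\check U_{\epsilon,\Delta}}{\partial\beta_i\partial\beta_j}(\theta_0)\to2I_\sigma(\theta_0)_{i,j}$ in probability. Finally, for the crossed terms it is sufficient to prove that $\frac{\epsilon}{\sqrt n}\frac{\partial^2\check U_{\epsilon,\Delta}}{\partial\beta_i\partial\alpha_j}(\theta_0)\to0$ in probability. Details are provided in Appendix A.9.

5. Examples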
5.1. Exact calculations on the Cox–Ingersoll–Ross model (CIR)

Consider the diffusion on $\mathbb R^+$ defined for $\alpha>0$ by
$$dX_t=\alpha X_t\,dt+\epsilon\beta\sqrt{X_t}\,dB_t,\qquad X_0=x_0.$$
We have $b(\alpha,x)=\alpha x$, $\sigma(\beta,x)=\beta\sqrt x$ and $x_\alpha(t)=x_0e^{\alpha t}$. The function $\Phi_\alpha$ defined in (2.4) is explicit, with $\Phi_\alpha(t_2,t_1)=e^{\alpha(t_2-t_1)}$. Moreover $\Sigma(\beta,x)=\beta^2x$, and
$$S_k^{\alpha,\beta}=x_0\beta^2\,\frac{e^{\alpha\Delta}-1}{\alpha\Delta}\,e^{\alpha k\Delta}$$
depends on $k$ (contrary to the Ornstein–Uhlenbeck process in Section 3.1). This is an AR(1) process, but the noise is not homoscedastic. Let us define $a=e^{\alpha\Delta}$. We then have
$$\bar a_{\epsilon,\Delta}=\sum_{k=1}^nX_{t_k}X_{t_{k-1}}\Big/\sum_{k=1}^nX_{t_{k-1}}^2.$$
With the notations introduced in the previous sections for the different estimators, we have for (3.2): $\bar\alpha_{\epsilon,\Delta}=\frac1\Delta\ln(\bar a_{\epsilon,\Delta})$. No explicit formula can be obtained for $\tilde\alpha_{\epsilon,\Delta}$ and $\check\alpha_{\epsilon,\Delta}$ defined in (3.11) and (4.2).

We can also calculate the asymptotic covariance matrix (3.9): $I_b(\alpha,\beta)=\frac{x_0(e^{\alpha T}-1)}{\beta^2\alpha}$. Noting that $D_k(\alpha)=\Delta e^{\alpha k\Delta}$, we get for (3.13):
$$I_\Delta(\alpha,\beta)=I_b(\alpha,\beta)\times\left(\frac{\ln(a)}{a-1}\right)^2a.$$
Setting $J_b(\alpha,\beta)=\frac{3x_0\left(e^{2\alpha T}-1\right)^2}{4\alpha\beta^2\left(e^{3\alpha T}-1\right)}$, we obtain for (3.6):
$$J_\Delta(\alpha,\beta)=J_b(\alpha,\beta)\times\frac{\ln(a)^2}{a-1}\,\frac{4a\left(a^3-1\right)}{3\left(a^2-1\right)^2}.$$
We remark that $J_b(\alpha,\beta)\le I_b(\alpha,\beta)$ for all $T>0$. So, as expected, $J_\Delta(\alpha,\beta)\le I_\Delta(\alpha,\beta)$ for all $\Delta>0$. Hence, contrast estimation with prior knowledge on the model multiplicativity (see Section 3.3.2) leads to a more accurate confidence interval than the general case with no available information on $\beta$.
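The explicit estimator of this section is easy to try out numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it simulates the model with a plain Euler scheme (the number of sub-steps, the seed and all function names are choices made here), computes $\bar a_{\epsilon,\Delta}$ and $\bar\alpha_{\epsilon,\Delta}=\ln(\bar a_{\epsilon,\Delta})/\Delta$ from the discrete observations, and evaluates the closed forms of $I_b$ and $J_b$ as they are stated in this section.

```python
import math
import random


def simulate_cir_like(alpha, beta, eps, x0, delta, n, substeps=100, seed=0):
    """Euler scheme for dX = alpha*X dt + eps*beta*sqrt(X) dB, observed every delta.

    `substeps` (Euler steps per sampling interval) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    h = delta / substeps
    x = x0
    obs = [x]
    for _ in range(n):
        for _ in range(substeps):
            db = rng.gauss(0.0, math.sqrt(h))
            # max(x, 0) guards the square root against a negative Euler iterate.
            x += alpha * x * h + eps * beta * math.sqrt(max(x, 0.0)) * db
        obs.append(x)
    return obs


def alpha_bar(obs, delta):
    """alpha_bar = ln(a_bar)/delta with a_bar = sum X_k X_{k-1} / sum X_{k-1}^2."""
    num = sum(obs[k] * obs[k - 1] for k in range(1, len(obs)))
    den = sum(obs[k - 1] ** 2 for k in range(1, len(obs)))
    return math.log(num / den) / delta


def info_b(alpha, beta, x0, T):
    """I_b(alpha, beta) = x0 (e^{alpha T} - 1) / (beta^2 alpha)."""
    return x0 * math.expm1(alpha * T) / (beta ** 2 * alpha)


def info_j(alpha, beta, x0, T):
    """J_b(alpha, beta) = 3 x0 (e^{2 alpha T} - 1)^2 / (4 alpha beta^2 (e^{3 alpha T} - 1))."""
    return (3.0 * x0 * math.expm1(2 * alpha * T) ** 2
            / (4.0 * alpha * beta ** 2 * math.expm1(3 * alpha * T)))


if __name__ == "__main__":
    obs = simulate_cir_like(alpha=0.5, beta=1.0, eps=0.01, x0=2.0, delta=0.1, n=20)
    print("alpha_bar =", alpha_bar(obs, 0.1))
    print("J_b / I_b =", info_j(0.5, 1.0, 2.0, 2.0) / info_b(0.5, 1.0, 2.0, 2.0))
```

On the noise-free path $x_0e^{\alpha t}$ the estimator returns $\alpha$ up to rounding, which is a convenient sanity check before adding noise.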
Table 1
Mean (standard deviation) of minimum contrast estimators for parameters of (5.1) based on 400 simulated trajectories with µ1 = µ2 = m = κ1 = κ2 = 1, ρ = 0.3, using ϵ = 0.01 and n = 10, 20, 50, 100.

ϵ = 0.01, estimator $\bar\alpha$ (β unknown)
n      $\bar\mu_1$     $\bar\mu_2$     $\bar m$
10     1.001 (0.01)    1.007 (0.13)    0.996 (0.04)
20     1.000 (0.01)    1.012 (0.12)    0.999 (0.04)
50     1.000 (0.01)    1.012 (0.12)    0.999 (0.04)
100    1.000 (0.01)    1.013 (0.12)    0.999 (0.04)

ϵ = 0.01, estimator $\tilde\alpha$ (β = β0 fixed)
n      $\tilde\mu_1$   $\tilde\mu_2$   $\tilde m$
10     1.000 (0.01)    1.005 (0.12)    0.997 (0.04)
20     1.000 (0.01)    1.003 (0.12)    0.997 (0.04)
50     1.000 (0.01)    1.000 (0.12)    0.996 (0.04)
100    1.000 (0.01)    0.995 (0.12)    0.995 (0.04)

ϵ = 0.01, estimator $(\check\alpha,\check\beta)$ (small ∆)
n      $\check\mu_1$   $\check\mu_2$   $\check m$      $\check\kappa_1^2$   $\check\kappa_2^2$   $\check\rho$
10     1.000 (0.01)    1.013 (0.13)    0.999 (0.04)    0.971 (0.45)         0.728 (0.37)         0.328 (0.33)
20     1.000 (0.01)    1.012 (0.12)    0.999 (0.04)    0.973 (0.32)         0.853 (0.29)         0.306 (0.23)
50     1.000 (0.01)    1.012 (0.12)    0.999 (0.04)    0.982 (0.20)         0.910 (0.19)         0.302 (0.14)
100    1.000 (0.01)    1.011 (0.12)    0.999 (0.04)    1.001 (0.14)         0.953 (0.14)         0.310 (0.09)
Table 2
Mean (standard deviation) of minimum contrast estimators for parameters of (5.1) based on 400 simulated trajectories with µ1 = µ2 = m = κ1 = κ2 = 1, ρ = 0.3, using ϵ = 0.1 and n = 10, 20, 50, 100.

ϵ = 0.1, estimator $\bar\alpha$ (β unknown)
n      $\bar\mu_1$     $\bar\mu_2$     $\bar m$
10     1.000 (0.10)    1.723 (1.23)    0.892 (0.43)
20     1.001 (0.10)    1.754 (1.24)    0.922 (0.40)
50     1.000 (0.10)    1.760 (1.23)    0.928 (0.40)
100    1.001 (0.10)    1.778 (1.23)    0.933 (0.40)

ϵ = 0.1, estimator $\tilde\alpha$ (β = β0 fixed)
n      $\tilde\mu_1$   $\tilde\mu_2$   $\tilde m$
10     1.005 (0.10)    1.052 (0.92)    0.667 (0.49)
20     1.011 (0.10)    0.930 (0.90)    0.590 (0.51)
50     1.029 (0.10)    0.509 (0.70)    0.342 (0.61)
100    1.051 (0.10)    0.122 (0.27)    0.410 (1.22)

ϵ = 0.1, estimator $(\check\alpha,\check\beta)$ (small ∆)
n      $\check\mu_1$   $\check\mu_2$   $\check m$      $\check\kappa_1^2$   $\check\kappa_2^2$   $\check\rho$
10     0.998 (0.10)    1.678 (1.23)    0.997 (0.41)    0.927 (0.43)         0.769 (0.36)         0.422 (0.23)
20     1.000 (0.10)    1.718 (1.20)    0.930 (0.39)    0.966 (0.29)         0.864 (0.29)         0.344 (0.18)
50     1.001 (0.10)    1.82 (1.18)     0.994 (0.31)    0.971 (0.09)         0.832 (0.08)         0.167 (0.07)
100    1.000 (0.10)    1.825 (1.19)    0.987 (0.33)    0.979 (0.07)         0.846 (0.06)         0.156 (0.05)
5.2. A two factor model

We consider here the same example as [10] (see e.g. [19]). Let us define $X_t=(Y_t,R_t)$ as the solution on $[0,1]$ of
$$\begin{aligned}dY_t&=(R_t+\mu_1)\,dt+\epsilon\kappa_1\,dB_t^1,&Y_0&=y_0\in\mathbb R,\\ dR_t&=\mu_2(m-R_t)\,dt+\epsilon\kappa_2\sqrt{R_t}\left(\rho\,dB_t^1+\sqrt{1-\rho^2}\,dB_t^2\right),&R_0&=r_0>0.\end{aligned}\tag{5.1}$$
Hence, we get that
$$\Sigma((\kappa_1,\kappa_2,\rho),(Y_t,R_t))=\begin{pmatrix}\kappa_1^2&\kappa_1\kappa_2\rho\sqrt{R_t}\\ \kappa_1\kappa_2\rho\sqrt{R_t}&\kappa_2^2R_t\end{pmatrix}.$$
For $r_0\ne m$ the diffusion process satisfies (S1)–(S5) and we can estimate the parameters $\alpha=(\mu_1,\mu_2,m)$ and $\beta=(\kappa_1^2,\kappa_2^2,\rho)$ with our minimum contrast estimators defined in (3.2), (3.11) and (4.2). As in [10], we investigate the case $\mu_1=\mu_2=m=\kappa_1=\kappa_2=1$, $\rho=0.3$ and $(y_0,r_0)=(0,1.5)$, for two values of $\epsilon$, 0.1 and 0.01. Similarly, we present in Tables 1 and 2 contrast estimators (empirical means and standard deviations) over 400 runs of the diffusion process (5.1) simulated with an Euler scheme. For each of these simulations, different numbers of observations ($n=10,20,50,100$) are used to infer the parameters.
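The simulation design described above can be sketched as follows. This is only an illustrative Euler scheme for (5.1): the number of Euler sub-steps between two observations, the random seed and the function name `simulate_two_factor` are assumptions of this sketch, since the text only states that an Euler scheme was used.

```python
import math
import random


def simulate_two_factor(mu1, mu2, m, kappa1, kappa2, rho, eps,
                        y0=0.0, r0=1.5, T=1.0, n_obs=10, substeps=100, seed=0):
    """Euler scheme for (5.1); returns the n_obs + 1 observations (Y, R) at t_k = k*T/n_obs."""
    rng = random.Random(seed)
    h = T / (n_obs * substeps)          # Euler step (finer than the sampling step)
    sqrt_h = math.sqrt(h)
    y, r = y0, r0
    obs = [(y, r)]
    for _ in range(n_obs):
        for _ in range(substeps):
            db1 = rng.gauss(0.0, sqrt_h)
            db2 = rng.gauss(0.0, sqrt_h)
            sqrt_r = math.sqrt(max(r, 0.0))   # guard against a negative Euler iterate
            dy = (r + mu1) * h + eps * kappa1 * db1
            dr = (mu2 * (m - r) * h
                  + eps * kappa2 * sqrt_r * (rho * db1 + math.sqrt(1.0 - rho ** 2) * db2))
            y, r = y + dy, r + dr
        obs.append((y, r))
    return obs


if __name__ == "__main__":
    # Parameter values of the simulation study: mu1 = mu2 = m = kappa1 = kappa2 = 1, rho = 0.3.
    for eps in (0.01, 0.1):
        path = simulate_two_factor(1, 1, 1, 1, 1, 0.3, eps, n_obs=10)
        print(f"eps = {eps}: (Y_1, R_1) = {path[-1]}")
```

With `eps=0` the scheme reduces to an Euler discretisation of the deterministic system, whose second component is $m+(r_0-m)e^{-\mu_2t}$; this gives a quick way to check the drift part of the implementation.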
Results of Gloter and Sørensen were reproduced using their contrast based on an expansion at order 2 of the function defined in Section 2.3 in [10]. For $\epsilon=0.01$, the results of Table 1 are very similar to those in [10]. When $\epsilon=0.1$, we can distinguish two different patterns. For $\bar\alpha$ and $(\check\alpha,\check\beta)$, results exhibit a lack of accuracy on $\mu_2$, similarly to those of [10]. The second pattern concerns $\tilde\alpha$, where the bias on $\mu_2$ (for $n=10,20,50$), less important than for $\bar\alpha$ and $(\check\alpha,\check\beta)$, is partially balanced by an increase in the uncertainty of $m$. These results show that prior knowledge on the model (more specifically, fixing the diffusion parameters to their true value) leads to a different behavior of the estimator. From a theoretical point of view, this is explained only by the shape of $S_k^{\alpha,\beta_0}$, which does not give equal weights to all observations. In addition, a decrease in accuracy is obtained when increasing the number of observations. This could be explained by the behavior of $N_k(X,\alpha)$, which depends on the variation of a slope between two consecutive data points. Indeed, in this particular model (5.1), where the drift is almost linear (and hence the local gradient close to zero), variations of local slopes increase with the number of observations randomly distributed around the global slope. When performing the estimation on a longer time interval with the same number of observations ($[0,5]$, $n=50$), the decrease in accuracy following the increase in the number of observations is partially counterbalanced.

5.3. Epidemic models and data

Here, we present an example where $\epsilon$, corresponding to the normalizing constant $1/\sqrt N$, has an intrinsic meaning. One of the simplest models for the study of epidemic spread is the SIR (Susceptible–Infectious–Removed from the infectious chain) model, where each individual is, at a given time, in one of these three mutually exclusive health states.
One classical representation of the SIR model in a closed population is the bi-dimensional continuous-time Markovian jump process $X_t=(S_t,I_t)$ with initial state $X_0=(N-m,m)$ and transitions
$$(S,I)\xrightarrow{\;\frac{\lambda}{N}SI\;}(S-1,I+1)\qquad\text{and}\qquad(S,I)\xrightarrow{\;\gamma I\;}(S,I-1).$$
The normalization of this process based on the population size $N$ asymptotically leads to an ODE system: $x(t)=(s(t),i(t),r(t)=1-s(t)-i(t))$, with $x(0)=(1-m/N,\,m/N,\,0)$, which is a solution of (2.3) for
$$b((\lambda,\gamma),x)=\begin{pmatrix}-\lambda x_1x_2\\ \lambda x_1x_2-\gamma x_2\end{pmatrix}.$$
Before passing to the limit, by defining
$$\Sigma((\lambda,\gamma),x)=\begin{pmatrix}\lambda x_1x_2&-\lambda x_1x_2\\ -\lambda x_1x_2&\lambda x_1x_2+\gamma x_2\end{pmatrix},$$
we can write the infinitesimal generator of the renormalized Markovian jump process $(X(t)/N)$ as
$$A_N(f(x))=N\lambda x_1x_2\left(f\left(x_1-\tfrac1N,x_2+\tfrac1N\right)-f(x_1,x_2)\right)+N\gamma x_2\left(f\left(x_1,x_2-\tfrac1N\right)-f(x_1,x_2)\right).$$
We also have $A_N(f(x))=A_N^{(2)}(f(x))+A_N^{(3+)}(f(x))$, with
$$A_N^{(2)}(f(x))=b((\lambda,\gamma),x)\cdot\nabla f(x)+\frac{1}{2N}\sum_{i,j=1}^2\frac{\partial^2f}{\partial x_i\partial x_j}(x)\,\Sigma((\lambda,\gamma),x)_{i,j},$$
and where $A_N^{(3+)}$ contains all the derivative terms of order 3 and above. Then, approximating the renormalized Markovian jump process by a Markov process with generator $A_N^{(2)}$ leads to a diffusion process $X_t=(s_t,i_t)$ with drift $b$ and diffusion matrix $\Sigma$, which can be rewritten as the solution of
$$\begin{aligned}ds_t&=-\lambda s_ti_t\,dt+\frac{1}{\sqrt N}\sqrt{\lambda s_ti_t}\,dB_1(t),\\ di_t&=(\lambda s_ti_t-\gamma i_t)\,dt-\frac{1}{\sqrt N}\sqrt{\lambda s_ti_t}\,dB_1(t)+\frac{1}{\sqrt N}\sqrt{\gamma i_t}\,dB_2(t).\end{aligned}\tag{5.2}$$
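The jump process whose normalization leads to (5.2) can be simulated exactly with a standard Gillespie-type scheme, and then discretized at times $t_k=k\Delta$. The sketch below is a minimal illustration under that reading of the transitions above (infection at rate $\lambda SI/N$, recovery at rate $\gamma I$); the function names, the seed and the parameter values used in the demo are choices made here for illustration.

```python
import random


def gillespie_sir(N, m, lam, gamma, T, seed=0):
    """Exact simulation of (S_t, I_t) on [0, T], starting from (N - m, m).

    Returns the jump times and the states right after each jump.
    """
    rng = random.Random(seed)
    t, s, i = 0.0, N - m, m
    times, states = [0.0], [(s, i)]
    while i > 0:
        rate_inf = lam * s * i / N      # (S, I) -> (S - 1, I + 1)
        rate_rec = gamma * i            # (S, I) -> (S, I - 1)
        total = rate_inf + rate_rec
        t += rng.expovariate(total)     # exponential waiting time to the next jump
        if t > T:
            break
        if rng.random() * total < rate_inf:
            s, i = s - 1, i + 1
        else:
            i -= 1
        times.append(t)
        states.append((s, i))
    return times, states


def observe(times, states, delta, n):
    """State of the piecewise-constant path at t_k = k*delta, k = 0..n."""
    obs, j = [], 0
    for k in range(n + 1):
        tk = k * delta
        while j + 1 < len(times) and times[j + 1] <= tk:
            j += 1
        obs.append(states[j])
    return obs


if __name__ == "__main__":
    N = 1000
    times, states = gillespie_sir(N=N, m=10, lam=0.4, gamma=1.0 / 3.0, T=50.0)
    data = observe(times, states, delta=5.0, n=10)
    # Dividing by N gives the normalized observations (s, i) used by the contrasts.
    print([(s / N, i / N) for s, i in data])
```

Keeping the full list of jump times is convenient when a reference estimator based on all jumps is also wanted; only the discretized `observe` output is needed for the contrast estimators.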
Here, $\lambda$ and $\gamma$ represent the transmission and recovery rates, respectively, and are the two parameters to be estimated. So, system (5.2) can naturally be viewed as a diffusion with a small diffusion coefficient ($\epsilon=N^{-1/2}$). Moreover, the parameters to be estimated appear both in the drift and in the diffusion coefficients, with the specificity that $\alpha=(\lambda,\gamma)=\beta$ (with the notations of (2.1)). Besides, since epidemics are discretely observed, the statistical setting is defined by data on a fixed interval $[0,T]$, at times $t_k=k\Delta$, with $T=n\Delta$ and $n$ the number of data points ($\Delta$ not necessarily small).

The performances of our method for epidemic models, in the case of a fixed sampling interval $\Delta$ and for $\Delta\to0$, were evaluated on discretized exact simulated trajectories of the pure Markov jump process $X_t$ and compared to the estimators provided by the method of [10]. We considered the Maximum Likelihood Estimator (MLE) [1] of the Markov jump process, built using all the jumps, as the reference. Simulated data were generated by using the Gillespie algorithm [9] after specifying $(N,m,\lambda,\gamma)$. Two population sizes were considered, $N\in\{100,\,10\,000\}$, and $m/N$ was set to 0.01 for all simulations. $(\lambda,\gamma)$ were chosen such that their ratio takes a realistic value. Indeed, for the SIR model used here, $\lambda/\gamma$ defines a key parameter in epidemiology, $R_0$, which represents the mean number of secondary infections generated by a primary case in a totally susceptible population. We have chosen $R_0=1.2$, $\gamma=1/3$ (days$^{-1}$) (and hence $\lambda=0.4$ (days$^{-1}$)) to represent a realistic scenario (parameter values close to influenza epidemics). We considered $T=50$ days, in order to capture the pattern of an average trajectory for the case $N=100$ (shorter epidemic duration than for $N=10\,000$). Three values of $n$ were tested: 10, 50, 100. For each simulated scenario, means and theoretical confidence intervals (95%) for $\lambda$ and $\gamma$ were calculated on 1000 runs for each parameter and for each estimation method. Figs. 1 and 2 summarize the numerical results (only drift estimators are provided).

According to our findings, contrast based estimators are very effective even for a small number of observations, compared with the MLE. As expected, for all scenarios, we can see an improvement in accuracy as the number of observations increases for the estimators $\bar\alpha$, $\check\alpha$ and the estimator of [10]. On the contrary, the accuracy of the $\tilde\alpha$ estimators decreases as the number of observations increases. This phenomenon is due to the shape of $S_k^{\alpha,\beta}$ (defined in (2.11)), which confers greater weights to the beginning and the end of the data (as for the two factor model (5.1) above). For $N=10\,000$, it is important to notice that the bias is quite negligible from an epidemiological point of view. Indeed, the bias for $1/\gamma$ has an order of magnitude of one hour, whereas an accuracy of one day would be acceptable. For the case $N=100$, only emerging trajectories were considered, based on an epidemiologically relevant criterion (epidemic size above 10% of the population size). We can remark that the MLE provides less satisfactory estimations for $\gamma$. Our contrast estimators for $n=100$ perform globally well, except for $\tilde\alpha$. But even in this last case, contrary to the MLE, the ratio $\lambda/\gamma$ is close to the true value despite a bias on both $\lambda$ and $\gamma$ separately. Our results are promising in the epidemiological context, since the minimum contrast estimators are both accurate and not computationally expensive, even for very noisy data ($N=100$). Ongoing research is devoted to the extension of these findings to the more realistic case of partially observed epidemic data.

Acknowledgment

Partial financial support for this research was provided by Ile de France Regional Council under MIDEM project in the framework DIM Malinf.
Fig. 1. Mean values and theoretical CI (95%) of the estimators of λ and γ . Labels are 0: MLE (with all data available), 1: α¯ (β unknown), 2: α˜ (β0 known), 3: α˜ (β = α), 4: αˇ (small ∆) and 5: the estimator of drift parameters in [10]. Results based on 1000 runs for N = 10 000, T = 50, λ = 0.4, γ = 1/3 and for n = 10, 50, 100.
Fig. 2. Mean values and theoretical CI (95%) of the estimators of λ and γ . Labels are 0: MLE (with all data available), 1: α¯ (β unknown), 2: α˜ (β0 known), 3: α˜ (β = α), 4: αˇ (small ∆) and 5: the estimator of drift parameters in [10]. Results based on 1000 runs for N = 100, T = 50, λ = 0.4, γ = 1/3 and for n = 10, 50, 100.
Appendix

A.1. Some useful analytical properties

We state here a series of regularity properties of $(\alpha,t)\mapsto\Phi_\alpha(t,t_0)$ and $x_\alpha(t)$. Let us first consider $\Phi_\alpha$. A Taylor expansion of $t\mapsto\Phi_\alpha(t,t_{k-1})$ yields, using (2.4), $\Phi_\alpha(t_k,t_{k-1})=I_p+\Delta\frac{\partial b}{\partial x}(\alpha,x_\alpha(t_{k-1}))+\Delta\,r(\alpha,t_{k-1},\Delta)$, where $r(\alpha,t_{k-1},\Delta)$ converges uniformly to 0 on $[0,T]\times K_a$. Hence,
$$\left\|\frac1\Delta\left(\Phi_\alpha(t_k,t_{k-1})-I_p\right)-\frac{\partial b}{\partial x}(\alpha,x_\alpha(t_{k-1}))\right\|\xrightarrow[\Delta\to0]{}0. \tag{A.1}$$
As a consequence, $(\alpha,t)\mapsto\Phi_\alpha(t,t_0)$ is uniformly bounded on $K_a\times[0,T]$. Consider now the properties of $\alpha\mapsto\Phi_\alpha(t,t_0)$.

Lemma A.1. Under the assumption that $b(\alpha,x)\in C^3(K_a\times U)$, the function $\alpha\mapsto\Phi_\alpha(t,t_0)$ is in $C^2(K_a)$.
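Proof. Classically, we just prove here that $\Phi_\alpha$ is continuous w.r.t. $\alpha$ if $\alpha\mapsto\frac{\partial b}{\partial x}(\alpha,x_\alpha(t))$ is continuous. Set $M_h(t)=\Phi_{\alpha+h}(t,t_0)-\Phi_\alpha(t,t_0)$. Using (2.4), we have
$$M_h(t)=\int_{t_0}^t\frac{\partial b}{\partial x}(\alpha+h,x_{\alpha+h}(s))M_h(s)\,ds+\int_{t_0}^t\left(\frac{\partial b}{\partial x}(\alpha+h,x_{\alpha+h}(s))-\frac{\partial b}{\partial x}(\alpha,x_\alpha(s))\right)\Phi_\alpha(s,t_0)\,ds.$$
By (A.1) and the continuity of $t\mapsto\Phi_\alpha(t,t_0)$, we can define $K_0=\sup_{K_a\times[0,T]}\|\Phi_\alpha(t,t_0)\|$ and $K=\sup_{K_a\times[0,T]}\left\|\frac{\partial b}{\partial x}(\alpha,x_\alpha(t))\right\|$. Setting $\gamma_h(t)=K_0\int_{t_0}^t\left\|\frac{\partial b}{\partial x}(\alpha+h,x_{\alpha+h}(s))-\frac{\partial b}{\partial x}(\alpha,x_\alpha(s))\right\|ds$, we have $\|M_h(t)\|\le\gamma_h(t)+K\int_{t_0}^t\|M_h(s)\|\,ds$. Applying Gronwall's inequality to $\|M_h\|$ yields $\|M_h(t)\|\le\gamma_h(t)+K\int_{t_0}^t\gamma_h(s)e^{K(t-s)}ds$. By the Lebesgue dominated convergence theorem, $\gamma_h(t)$ goes to 0 as $h\to0$, which implies the same property for $M_h(t)$. The existence of the derivatives $\frac{\partial\Phi_\alpha}{\partial\alpha_i}$, $\frac{\partial^2\Phi_\alpha}{\partial\alpha_i\partial\alpha_j}$ of $\Phi_\alpha$ w.r.t. $\alpha$ is obtained similarly. Moreover, expanding in Taylor series at point $t_{k-1}$, they satisfy
$$\forall i\le a,\quad\left\|\frac1\Delta\frac{\partial\Phi_\alpha(t_k,t_{k-1})}{\partial\alpha_i}(\alpha_0)-\frac{\partial^2b(\alpha,x_\alpha(t_{k-1}))}{\partial x\,\partial\alpha_i}(\alpha_0,x_{\alpha_0}(t_{k-1}))\right\|\xrightarrow[\Delta\to0]{}0, \tag{A.2}$$
$$\forall i,j\le a,\quad\left\|\frac1\Delta\frac{\partial^2\Phi_\alpha(t_k,t_{k-1})}{\partial\alpha_i\partial\alpha_j}(\alpha_0)-\frac{\partial^3b(\alpha,x_\alpha(t_{k-1}))}{\partial x\,\partial\alpha_i\partial\alpha_j}(\alpha_0,x_{\alpha_0}(t_{k-1}))\right\|\xrightarrow[\Delta\to0]{}0, \tag{A.3}$$
and all left terms are bounded as $\Delta\to0$.

Let us now consider $x_\alpha$ and its derivatives. Using (2.3), and expanding $t\mapsto x_\alpha(t)$ in Taylor series at point $t_{k-1}$, as above, yields
$$\left\|\frac1\Delta\left(x_\alpha(t_k)-x_\alpha(t_{k-1})\right)-b(\alpha,x_\alpha(t_{k-1}))\right\|\xrightarrow[\Delta\to0]{}0, \tag{A.4}$$
$$\left\|\frac1\Delta\left(\frac{\partial x_\alpha(t_k)}{\partial\alpha_i}(\alpha_0)-\frac{\partial x_\alpha(t_{k-1})}{\partial\alpha_i}(\alpha_0)\right)-\frac{\partial b(\alpha,x_\alpha(t_{k-1}))}{\partial\alpha_i}(\alpha_0)\right\|\xrightarrow[\Delta\to0]{}0, \tag{A.5}$$
$$\left\|\frac1\Delta\left(\frac{\partial^2x_\alpha(t_k)}{\partial\alpha_i\partial\alpha_j}(\alpha_0)-\frac{\partial^2x_\alpha(t_{k-1})}{\partial\alpha_i\partial\alpha_j}(\alpha_0)\right)-\frac{\partial^2b(\alpha,x_\alpha(t_{k-1}))}{\partial\alpha_i\partial\alpha_j}(\alpha_0)\right\|\xrightarrow[\Delta\to0]{}0. \tag{A.6}$$

A.2. Proof of Corollary 2.2

The proof of (i) is given in [6, Theorem 2.2] but we need a more refined result on the increments of $R^{2,\epsilon}_{\theta_0}(t)$. For the sake of clarity, we omit in the sequel $\theta$ and $\alpha$ (and therefore denote $\frac{\partial f}{\partial x}(x_0)$ by $f'(x_0)$), and we denote by $\|\cdot\|$ either a norm on $\mathbb R^p$ or on $M_p(\mathbb R)$. We study successively $R^{1,\epsilon}_{\theta_0}(t)$ and $R^{2,\epsilon}_{\theta_0}(t)$.

Using $X_t^\epsilon=x(t)+\epsilon R^{1,\epsilon}(t)$ and (2.1), $R^{1,\epsilon}$ satisfies
$$R^{1,\epsilon}(t)=\int_0^t\frac1\epsilon\left(b(x(s)+\epsilon R^{1,\epsilon}(s))-b(x(s))\right)ds+\int_0^t\sigma(x(s)+\epsilon R^{1,\epsilon}(s))\,dB_s,\qquad R^{1,\epsilon}(0)=0.$$
Hence, $R^{1,\epsilon}$ satisfies a stochastic differential equation with drift $d_\epsilon(t,z)$ and diffusion coefficient $v_\epsilon(t,z)$, where $d_\epsilon(t,z)=\frac1\epsilon\left(b(x(t)+\epsilon z)-b(x(t))\right)$ and $v_\epsilon(t,z)=\sigma(x(t)+\epsilon z)$. Using that the derivatives of $b$ and $\sigma$ are uniformly bounded on $U$, these two coefficients satisfy $\|d_\epsilon(t,z)-d_\epsilon(t,z')\|\le\sup_{x\in U}\|b'(x)\|\,\|z-z'\|$, $\|v_\epsilon(t,z)-v_\epsilon(t,z')\|\le\sup_{x\in U}\|\sigma'(x)\|\,\epsilon\,\|z-z'\|$ and $\|d_\epsilon(t,z)\|^2+\|v_\epsilon(t,z)\|^2\le C_1(1+\|z\|^2)$, where $C_1=\max\left(\sup_{x\in U}\|b'(x)\|,\sup_{x\in U}\|\sigma'(x)\|\right)$.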
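Hence, $C_1$ is finite and independent of $\epsilon$. An application of Theorem 2.9 of [14] yields that there is a constant $C$ depending only on $C_1$ and $T$ such that, $\forall t\le T$, $E\left[\|R^{1,\epsilon}(t)\|^2\right]\le Ce^{Ct}$.

Let us now study $R^{2,\epsilon}(t)$. Using (2.6) we get
$$R^{2,\epsilon}(t)=\int_0^t\tilde d_\epsilon(s,\omega,R^{2,\epsilon}(s))\,ds+\int_0^t\tilde v_\epsilon(s,\omega)\,dB_s,\qquad R^{2,\epsilon}(0)=0, \tag{A.7}$$
with $\tilde d_\epsilon(s,\omega,z)=\frac{1}{\epsilon^2}\left(b(x(s)+\epsilon g(s,\omega)+\epsilon^2z)-b(x(s))-\epsilon b'(x(s))g(s,\omega)\right)$ and $\tilde v_\epsilon(s,\omega)=\frac1\epsilon\left(\sigma(x(s)+\epsilon R^{1,\epsilon}(s,\omega))-\sigma(x(s))\right)$.

First, let us check that the stochastic integral above is well defined. For this, we compute $E\left[\left\|\int_0^t\tilde v_\epsilon(s,\omega)\,{}^t\tilde v_\epsilon(s,\omega)\,ds\right\|\right]$. Applying a Taylor expansion to $\sigma(x(s))$ yields $\tilde v_\epsilon(s,\omega)=\left(\int_0^1\sigma'(x(s)+u\epsilon R^{1,\epsilon}(s))\,du\right)R^{1,\epsilon}(s)$. Hence, $\|\tilde v_\epsilon(s,\omega)\|\le\sup_{x\in U}\|\sigma'(x)\|\,\|R^{1,\epsilon}(s)\|$ and
$$E\left[\left\|\int_0^t\tilde v_\epsilon(s,\omega)\,{}^t\tilde v_\epsilon(s,\omega)\,ds\right\|\right]\le\sup_{x\in U}\|\sigma'(x)\|^2\int_0^tE\left[\|R^{1,\epsilon}(s)\|^2\right]ds\le C_1^2\,\frac{e^{Ct}-1}{C}.$$
Consider now the drift term $\tilde d_\epsilon(s,\omega,z)$. A Taylor expansion with integral remainder yields
$$\begin{aligned}\epsilon^2\tilde d_\epsilon(s,\omega,z)&=\left[b(x(s)+\epsilon g(s)+\epsilon^2z)-b(x(s)+\epsilon g(s))\right]+\left[b(x(s)+\epsilon g(s))-b(x(s))-\epsilon b'(x(s))g(s)\right]\\&=\epsilon^2\left(\int_0^1b'(x(s)+\epsilon g(s)+u\epsilon^2z)\,du\right)z+\epsilon^2\,{}^tg(s)\left(\int_0^1(1-u)\,b''(x(s)+u\epsilon g(s))\,du\right)g(s).\end{aligned}$$
Hence, $\tilde d_\epsilon(s,\omega,z)$ is bounded independently of $\epsilon$ by $\|\tilde d_\epsilon(s,\omega,z)\|\le\sup_{x\in U}\|b'(x)\|\,\|z\|+\sup_{x\in U}\|b''(x)\|\,\|g(s)\|^2$. Now, using (A.7), we get
$$\|R^{2,\epsilon}(t)\|^2\le2\left\|\int_0^t\tilde d_\epsilon(s,\omega,R^{2,\epsilon}(s))\,ds\right\|^2+2\left\|\int_0^t\tilde v_\epsilon(s,\omega)\,dB_s\right\|^2.$$
We already proved that the last term above has a finite expectation. It remains to study the first term. We have
$$E\left[\|R^{2,\epsilon}(t)\|^2\right]\le2C_1^2\int_0^tE\left[\|R^{2,\epsilon}(s)\|^2\right]ds+H(t)\quad\text{with}\quad H(t)=2\sup_{x\in U}\|b''(x)\|^2\int_0^tE\left[\|g(s)\|^4\right]ds+C_1^2\,\frac{e^{Ct}-1}{C}.$$
Applying Gronwall's inequality yields $E\left[\|R^{2,\epsilon}(t)\|^2\right]\le H(t)+2C_1^2\int_0^tH(s)e^{2C_1^2(t-s)}ds$. Since $g(s)$ is a continuous Gaussian process, $\sup_{s\in[0,T]}E\left[\|g(s)\|^4\right]$ is finite and $|H(t)|\le Kt$, so that $E\left[\|R^{2,\epsilon}(t)\|^2\right]\le K't$ with $K'=K(1+2C_1^2)$.

Consider now (ii). We have
$$R^{2,\epsilon}(t+h)-R^{2,\epsilon}(t)=\int_t^{t+h}\tilde d_\epsilon(s,\omega,R^{2,\epsilon}(s))\,ds+\int_t^{t+h}\tilde v_\epsilon(s,\omega)\,dB_s,$$
and $E\left[\|R^{2,\epsilon}(t+h)-R^{2,\epsilon}(t)\|^2\right]=E\left[E\left[\|R^{2,\epsilon}(t+h)-R^{2,\epsilon}(t)\|^2\,|\,\mathcal F_t\right]\right]=E\left[E_{X_t}\left[\|R^{2,\epsilon}(h)\|^2\right]\right]$. By the Markov property of $X_t$ we get that $E_{X_t}\left[\|R^{2,\epsilon}(h)\|^2\right]\le K'h$.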
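A.3. Proof of Proposition 3.1

Let us first prove (i). The processes $\bar U_{\epsilon,\Delta}(\alpha,(X_{t_k}(\omega)))$ are almost surely continuous with continuity modulus
$$w(\bar U_{\epsilon,\Delta},\eta)=\sup\left\{\left|\bar U_{\epsilon,\Delta}(\alpha,\cdot)-\bar U_{\epsilon,\Delta}(\alpha',\cdot)\right|,\ (\alpha,\alpha')\in\bar K_a^2,\ \|\alpha-\alpha'\|\le\eta\right\}.$$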
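We have
$$\left|\bar U_{\epsilon,\Delta}(\alpha,(X_{t_k}))-\bar U_{\epsilon,\Delta}(\alpha',(X_{t_k}))\right|\le\left|\bar U_{\epsilon,\Delta}(\alpha,(X_{t_k}))-\bar U_{\epsilon,\Delta}(\alpha,(x_{\alpha_0}(t_k)))\right|+\left|\bar U_{\epsilon,\Delta}(\alpha',(X_{t_k}))-\bar U_{\epsilon,\Delta}(\alpha',(x_{\alpha_0}(t_k)))\right|+\left|\bar K_\Delta(\alpha_0,\alpha)-\bar K_\Delta(\alpha_0,\alpha')\right|.$$
Using formula (2.8), $N_k(X,\alpha)-N_k(x_{\alpha_0},\alpha)=\epsilon R^{1,\epsilon}_{\theta_0}(t_k)-\Phi_\alpha(t_k,t_{k-1})\,\epsilon R^{1,\epsilon}_{\theta_0}(t_{k-1})$ and
$$\begin{aligned}\left|\bar U_{\epsilon,\Delta}(\alpha,(X_{t_k}))-\bar U_{\epsilon,\Delta}(\alpha,(x_{\alpha_0}(t_k)))\right|&\le\frac1\Delta\sum_{k=1}^n\left\|N_k(X,\alpha)-N_k(x_{\alpha_0},\alpha)\right\|\,\left\|N_k(x_{\alpha_0},\alpha)+N_k(X,\alpha)\right\|\\&\le\frac{2n}{\Delta}\sup_{t\in[0,T]}\left\|\epsilon R^{1,\epsilon}_{\theta_0}(t)\right\|\sup_{\alpha\in K_a,\,k\in\{1,\dots,n\}}\left\|I_p+\Phi_\alpha(t_k,t_{k-1})\right\|\,\left\|N_k(x_{\alpha_0},\alpha)\right\|.\end{aligned}$$
Let $\varphi(\eta)=\sup\left\{\left|\bar K_\Delta(\alpha_0,\alpha)-\bar K_\Delta(\alpha_0,\alpha')\right|,\ (\alpha,\alpha')\in\bar K_a^2,\ \|\alpha-\alpha'\|\le\eta\right\}$. We obtain $w(\bar U_{\epsilon,\Delta},\eta)\to\varphi(\eta)$ as $\epsilon\to0$ under $P_{\theta_0}$. Assumptions (S1)–(S4) ensure that $\varphi(\eta)\to0$ as $\eta\to0$. The proof of (i) is completed using Theorem 3.2.8 of [4].

Consider now the second derivatives of $\bar U_\epsilon(\alpha,\cdot)$. Noting that
$$\frac{\partial^2N_k(X,\alpha)}{\partial\alpha_i\partial\alpha_j}(\alpha_0)=\Delta\frac{\partial D_{k,i}(\alpha)}{\partial\alpha_j}(\alpha_0)+\frac{\partial\Phi_\alpha(t_k,t_{k-1})}{\partial\alpha_j}(\alpha_0)\frac{\partial x_\alpha(t_{k-1})}{\partial\alpha_i}(\alpha_0)+\frac{\partial^2\Phi_\alpha(t_k,t_{k-1})}{\partial\alpha_i\partial\alpha_j}(\alpha_0)\left(X_{t_{k-1}}-x_{\alpha_0}(t_{k-1})\right),$$
we have
$$\frac{\partial^2\bar U_\epsilon}{\partial\alpha_i\partial\alpha_j}(\alpha_0)=2\epsilon\sqrt\Delta\,E_1+2\Delta\,E_2,$$
with
$$E_1=\sum_{k=1}^n{}^t\!\left(\frac1\Delta\frac{\partial^2N_k(X,\alpha)}{\partial\alpha_i\partial\alpha_j}(\alpha_0)\right)\frac{1}{\epsilon\sqrt\Delta}N_k(X,\alpha_0),\qquad E_2=\sum_{k=1}^n{}^t\!\left(\frac1\Delta\frac{\partial N_k(X,\alpha)}{\partial\alpha_i}(\alpha_0)\right)\left(\frac1\Delta\frac{\partial N_k(X,\alpha)}{\partial\alpha_j}(\alpha_0)\right).$$
Using (3.7), (3.8) and that $\frac1\Delta\,{}^t\frac{\partial^2N_k(X,\alpha)}{\partial\alpha_i\partial\alpha_j}(\alpha_0)$ is bounded in probability yields that $E_1$ and $E_2$ are bounded in probability. Hence,
$$\frac{\partial^2\bar U_\epsilon}{\partial\alpha_i\partial\alpha_j}(\alpha_0)\xrightarrow[\epsilon\to0]{}2\Delta\sum_{k=1}^n{}^tD_{k,i}(\alpha_0)D_{k,j}(\alpha_0)=2M_\Delta(\alpha_0)_{i,j}.$$
The consistency result obtained in (i) and the uniform continuity of $\alpha\mapsto\Phi_\alpha$ and its derivatives (see Lemma A.1) yield that, under $P_{\theta_0}$,
$$\sup_{t\in[0,1]}\left\|\frac{\partial^2\bar U_\epsilon}{\partial\alpha^2}(\alpha_0+t(\bar\alpha_{\epsilon,\Delta}-\alpha_0))-\frac{\partial^2\bar U_\epsilon}{\partial\alpha^2}(\alpha_0)\right\|\xrightarrow[\epsilon\to0]{}0,$$
which completes the proof of (ii).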
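A.4. Proof of Proposition 3.2

The proof of (i) is a repetition of the proof of Proposition 3.1. The proof of (ii) contains additional terms due to the presence of $S_k^{\alpha,f(\alpha)}$ in the contrast process: $\epsilon^{-1}\frac{\partial U_{\Delta,\epsilon}}{\partial\alpha_i}(\alpha_0)=T_1^i+\epsilon T_2^i$, with
$$T_1^i=2\sqrt\Delta\sum_{k=1}^n{}^t\!\left(\frac1\Delta\frac{\partial N_k(X,\alpha)}{\partial\alpha_i}(\alpha_0)\right)\left(S_k^{\alpha_0,\beta_0}\right)^{-1}\left(\frac{N_k(X,\alpha_0)}{\epsilon\sqrt\Delta}\right),\qquad T_2^i=\sum_{k=1}^n\frac{{}^tN_k(X,\alpha_0)}{\epsilon\sqrt\Delta}\,\frac{\partial\left(S_k^{\alpha,f(\alpha)}\right)^{-1}}{\partial\alpha_i}(\alpha_0)\,\frac{N_k(X,\alpha_0)}{\epsilon\sqrt\Delta}.$$
For all $i$, the term $T_2^i$ is bounded in probability since $S_k^{\alpha,f(\alpha)}$ inherits from $\Phi_\alpha$ its differentiability with respect to $\alpha$ and since $\frac{N_k(X,\alpha)}{\epsilon\sqrt\Delta}$ is bounded in probability by (3.7). Using now (3.7) and (3.8), we obtain, as before, that $\left(T_1^i\right)_{1\le i\le a}\to N(0,4I_\Delta(\alpha_0,\beta_0))$ as $\epsilon\to0$. Moreover,
$$\frac{\partial^2U_{\Delta,\epsilon}(\alpha_0,\beta_0)}{\partial\alpha_i\partial\alpha_j}=T_1^{i,j}+2\epsilon\sqrt\Delta\,T_2^{i,j}+\epsilon^2T_3^{i,j},$$
where for all $i,j\le a$:
$$\begin{aligned}T_1^{i,j}&=2\Delta\sum_{k=1}^n{}^t\!\left(\frac1\Delta\frac{\partial N_k(X,\alpha)}{\partial\alpha_i}(\alpha_0)\right)\left(S_k^{\alpha_0,\beta_0}\right)^{-1}\left(\frac1\Delta\frac{\partial N_k(X,\alpha)}{\partial\alpha_j}(\alpha_0)\right),\\T_2^{i,j}&=\sum_{k=1}^n\left[{}^t\!\left(\frac1\Delta\frac{\partial^2N_k(X,\alpha)}{\partial\alpha_i\partial\alpha_j}(\alpha_0)\right)\left(S_k^{\alpha_0,\beta_0}\right)^{-1}\frac{N_k(X,\alpha_0)}{\epsilon\sqrt\Delta}+{}^t\!\left(\frac1\Delta\frac{\partial N_k(X,\alpha)}{\partial\alpha_i}(\alpha_0)\right)\frac{\partial\left(S_k^{\alpha,f(\alpha)}\right)^{-1}}{\partial\alpha_j}(\alpha_0)\frac{N_k(X,\alpha_0)}{\epsilon\sqrt\Delta}\right],\\T_3^{i,j}&=\sum_{k=1}^n\frac{{}^tN_k(X,\alpha_0)}{\epsilon\sqrt\Delta}\,\frac{\partial^2\left(S_k^{\alpha,f(\alpha)}\right)^{-1}}{\partial\alpha_i\partial\alpha_j}(\alpha_0)\,\frac{N_k(X,\alpha_0)}{\epsilon\sqrt\Delta}.\end{aligned}$$
The two terms $T_2^{i,j}$ and $T_3^{i,j}$ are bounded in probability and therefore
$$\frac{\partial^2U_{\Delta,\epsilon}(\alpha_0,\beta_0)}{\partial\alpha_i\partial\alpha_j}\xrightarrow[\epsilon\to0]{}2I_\Delta(\alpha_0,\beta_0)_{i,j}.$$

A.5. Proof of Lemma 4.3

Proof. Let us study the term $E_k$ defined in Lemma 4.3. We have $E_k=E_k^1+E_k^2$ with
$$E_k^1=\int_{t_{k-1}}^{t_k}\left(b(\alpha_0,X_t)-b(\alpha_0,x_{\alpha_0}(t))\right)dt+\left(I_p-\Phi_{\alpha_0}(t_k,t_{k-1})\right)\left(X_{t_{k-1}}-x_{\alpha_0}(t_{k-1})\right)$$
and $E_k^2=\epsilon\int_{t_{k-1}}^{t_k}\left(\sigma(\beta_0,X_s)-\sigma(\beta_0,X_{t_{k-1}})\right)dB_s$. Using that $x\mapsto b(\alpha,x)$ is Lipschitz, we obtain
$$\left\|E_k^1\right\|\le\Delta C\sup_{t\in[t_{k-1};t_k]}\left\|X_t-x_{\alpha_0}(t)\right\|+\Delta\epsilon\left\|\int_0^1\frac{\partial b}{\partial x}(\alpha_0,x_{\alpha_0}(t))\Phi_{\alpha_0}(t,t_{k-1})\,dt\right\|\,\left\|R^{1,\epsilon}_{\theta_0}(t_{k-1})\right\|\le C'\epsilon\Delta\sup_{t\in[t_{k-1};t_k]}\left\|R^{1,\epsilon}_{\theta_0}(t)\right\|.$$
The proof for $E_k^2$ follows the sketch given in [10, Lemma 1]. We prove this result under the stronger condition that $\Sigma$ and $b$ are bounded (similarly to Gloter and Sørensen in Proposition 1 of [10]). We use successively the Burkholder–Davis–Gundy inequality and Jensen's inequality to obtain
$$\begin{aligned}E\left[\left\|E_k^2\right\|^m\,|\,\mathcal F_{t_{k-1}}\right]&\le C\epsilon^mE\left[\left(\int_{t_{k-1}}^{t_k}\left\|\sigma(\beta_0,X_s)-\sigma(\beta_0,X_{t_{k-1}})\right\|^2ds\right)^{m/2}\,\Big|\,\mathcal F_{t_{k-1}}\right]\\&\le C\epsilon^m\Delta^{m/2-1}\int_{t_{k-1}}^{t_k}E\left[\left\|\sigma(\beta_0,X_s)-\sigma(\beta_0,X_{t_{k-1}})\right\|^m\,|\,\mathcal F_{t_{k-1}}\right]ds.\end{aligned}$$
Then, using that $x\mapsto\sigma(\beta,x)$ is Lipschitz, we obtain:
$$\begin{aligned}E\left[\left\|E_k^2\right\|^m\,|\,\mathcal F_{t_{k-1}}\right]&\le C'\epsilon^m\Delta^{m/2-1}\int_{t_{k-1}}^{t_k}E\left[\left\|X_s-X_{t_{k-1}}\right\|^m\right]ds\\&\le C'\epsilon^m\Delta^{m/2-1}\int_{t_{k-1}}^{t_k}E\left[\left\|\int_{t_{k-1}}^s\left(b(\alpha_0,X_u)\,du+\epsilon\sigma(\beta_0,X_u)\,dB_u\right)\right\|^m\right]ds.\end{aligned}$$
Since $b$ is bounded on $U$, $\left\|\int_{t_{k-1}}^sb(\alpha_0,X_u)\,du\right\|\le K|s-t_{k-1}|$, and Itô's isometry yields
$$E\left[\left\|\int_{t_{k-1}}^s\sigma(\beta_0,X_u)\,dB_u\right\|^m\right]\le E\left[\left\|\int_{t_{k-1}}^s\Sigma(\beta_0,X_u)\,du\right\|^{m/2}\right]\le K|s-t_{k-1}|^{m/2}.$$
Thus,
$$E\left[\left\|E_k^2(\beta_0)\right\|^m\,|\,\mathcal F_{t_{k-1}}\right]\le C''\epsilon^m\Delta^{m/2-1}\int_{t_{k-1}}^{t_k}|s-t_{k-1}|^{m/2}ds\le C^{(3)}\epsilon^m\Delta^m.$$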
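The two following results are consequences of Lemma 4.3. Define, for $M$ a symmetric positive random matrix, using (2.13):
$$N_{k,0}^2(M)={}^tN_k(X,\alpha_0)\,M\,\Sigma^{-1}(\beta_0,X_{t_{k-1}})\,N_k(X,\alpha_0)\in\mathbb R. \tag{A.8}$$
Now, for $i=1,2$, if $(M_{k-1}^{(i)})_{k\ge1}$ is a sequence of $\mathcal F_{t_{k-1}}$-measurable symmetric positive matrices of $M_p(\mathbb R)$ such that $\sup_{k\ge1}\|M_{k-1}^{(i)}\|$ is finite in probability, then
$$\frac{1}{\sqrt n}\sum_{k=1}^n\left|\frac{1}{\epsilon^2\Delta}E\left[N_{k,0}^2(M_{k-1}^{(1)})\,|\,\mathcal F_{t_{k-1}}\right]-\mathrm{Tr}(M_{k-1}^{(1)})\right|\xrightarrow[\epsilon,\Delta\to0]{}0, \tag{A.9}$$
$$\sum_{k=1}^n\left|\frac{1}{\epsilon^4\Delta^2}E\left[N_{k,0}^2(M_{k-1}^{(1)})N_{k,0}^2(M_{k-1}^{(2)})\,|\,\mathcal F_{t_{k-1}}\right]-\left(\mathrm{Tr}(M_{k-1}^{(1)})\mathrm{Tr}(M_{k-1}^{(2)})+2\mathrm{Tr}(M_{k-1}^{(1)}M_{k-1}^{(2)})\right)\right|\xrightarrow[\epsilon,\Delta\to0]{}0. \tag{A.10}$$
Indeed, under $P_{\theta_0}$, we have
$$\begin{aligned}E\left[{}^tN_k(X,\alpha_0)M_{k-1}^{(1)}N_k(X,\alpha_0)\,|\,\mathcal F_{t_{k-1}}\right]&=\sum_{i,j=1}^p(M_{k-1}^{(1)})_{i,j}\,E\left[N_k(X,\alpha_0)_iN_k(X,\alpha_0)_j\,|\,\mathcal F_{t_{k-1}}\right]\\&=\sum_{i,j=1}^p(M_{k-1}^{(1)})_{i,j}\left(\epsilon^2\Delta\,\mathbf 1_{\{i=j\}}\Sigma(\beta_0,X_{t_{k-1}})_{i,j}+E\left[(E_k)_i(E_k)_j\,|\,\mathcal F_{t_{k-1}}\right]\right),\end{aligned}$$
which leads to
$$\begin{aligned}A&=\frac{1}{\sqrt n}\sum_{k=1}^n\left|\frac{1}{\epsilon^2\Delta}E\left[{}^tN_k(X,\alpha_0)M_{k-1}^{(1)}N_k(X,\alpha_0)\,|\,\mathcal F_{t_{k-1}}\right]-\mathrm{Tr}\left(M_{k-1}^{(1)}\Sigma(\beta_0,X_{t_{k-1}})\right)\right|\\&\le\frac{C}{\epsilon^2\sqrt{T\Delta}}\sup_{k\in\{1,\dots,n\}}\left\|M_{k-1}^{(1)}\right\|\sum_{k=1}^nE\left[\|E_k\|^2\,|\,\mathcal F_{t_{k-1}}\right]\le C'\sqrt\Delta.\end{aligned}$$
The proof of (A.10) is similar and not detailed here.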
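The limits appearing in (A.9) and (A.10) mirror two classical identities for a standard Gaussian vector $U\sim N(0,I_p)$ and symmetric matrices $A$, $B$: $E[{}^tUAU]=\mathrm{Tr}(A)$ and $E[({}^tUAU)({}^tUBU)]=\mathrm{Tr}(A)\mathrm{Tr}(B)+2\mathrm{Tr}(AB)$. The following Monte Carlo sketch checks them for $p=2$; the matrices, sample size and helper names are arbitrary choices made for this illustration and are not taken from the paper.

```python
import random


def quad_form(u, m):
    """Return u^T m u for a 2-vector u and a 2x2 matrix m (nested lists)."""
    return sum(u[i] * m[i][j] * u[j] for i in range(2) for j in range(2))


def trace(m):
    return m[0][0] + m[1][1]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def monte_carlo_moments(a, b, n_samples=100_000, seed=0):
    """Empirical E[U^T a U] and E[(U^T a U)(U^T b U)] for U ~ N(0, I_2)."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n_samples):
        u = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        qa, qb = quad_form(u, a), quad_form(u, b)
        s1 += qa
        s2 += qa * qb
    return s1 / n_samples, s2 / n_samples


if __name__ == "__main__":
    A = [[2.0, 0.5], [0.5, 1.0]]
    B = [[1.0, -0.3], [-0.3, 3.0]]
    m1, m2 = monte_carlo_moments(A, B)
    print("E[U'AU]       ~", m1, " vs Tr(A) =", trace(A))
    print("E[U'AU U'BU]  ~", m2, " vs",
          trace(A) * trace(B) + 2 * trace(matmul(A, B)))
```

In the setting of this appendix, the role of $U$ is played by the normalized Brownian increment, so these identities explain the form of the limits once the remainder $E_k$ has been controlled.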
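A.6. Proof of Lemma 4.4-(ii)

Using (2.13), we have $\frac1\Delta\frac{\partial^2N_k(X,\alpha)}{\partial\alpha_i\partial\alpha_j}(\alpha_0)=f_\Delta^{(i,j)}(\alpha_0,t_{k-1})+\eta_{k-1}^{(3),(i,j)}(\alpha_0)$ with $\eta_{k-1}^{(3),(i,j)}=\frac1\Delta\frac{\partial^2\Phi_\alpha(t_k,t_{k-1})}{\partial\alpha_i\partial\alpha_j}(\alpha_0)\left(X_{t_{k-1}}-x_{\alpha_0}(t_{k-1})\right)$, and
$$\begin{aligned}f_\Delta^{(i,j)}(\alpha_0,t_{k-1})&=\frac1\Delta\left(\frac{\partial^2x_\alpha(t_k)}{\partial\alpha_i\partial\alpha_j}(\alpha_0)-\Phi_{\alpha_0}(t_k,t_{k-1})\frac{\partial^2x_\alpha(t_{k-1})}{\partial\alpha_i\partial\alpha_j}(\alpha_0)\right)\\&\quad+\frac1\Delta\left(\frac{\partial\Phi_\alpha(t_k,t_{k-1})}{\partial\alpha_i}(\alpha_0)\frac{\partial x_\alpha(t_{k-1})}{\partial\alpha_j}(\alpha_0)+\frac{\partial\Phi_\alpha(t_k,t_{k-1})}{\partial\alpha_j}(\alpha_0)\frac{\partial x_\alpha(t_{k-1})}{\partial\alpha_i}(\alpha_0)\right).\end{aligned}$$
Using (A.2), (A.3) and (A.6) we obtain that the deterministic quantity $\left\|f_\Delta^{(i,j)}\right\|_\infty$ is bounded as $\Delta\to0$. Finally, $\eta_{k-1}^{(3),(i,j)}$ is $\mathcal F_{t_{k-1}}$-measurable and goes to zero as $\epsilon,\Delta\to0$, due to the Taylor stochastic formula.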
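A.7. Proof of Proposition 4.2

Let us first recall Lemma 9 in [8], which we use in the proof, adapted to our notations.

Lemma A.2. Let $(X_n^k)$ be an $\mathcal F_{t_k}$-measurable random variable (with $t_k=kT/n$). Assume that $\sum_{k=1}^nE\left[X_n^k\,|\,\mathcal F_{t_{k-1}}\right]\to U$, with $U$ a random variable, and that $\sum_{k=1}^nE\left[(X_n^k)^2\,|\,\mathcal F_{t_{k-1}}\right]\to0$. Then $\sum_{k=1}^nX_n^k\to U$. All the convergences are in probability.

A Taylor expansion with integral remainder of the function $\frac{\partial\check U_{\epsilon,\Delta}}{\partial\alpha_i}$ at point $(\alpha_0,\check\beta_{\epsilon,\Delta})$ yields, for $i\le a$,
$$-\epsilon\frac{\partial\check U_{\epsilon,\Delta}}{\partial\alpha_i}(\alpha_0,\check\beta_{\epsilon,\Delta})=\sum_{j=1}^a\left(\epsilon^2\int_0^1\frac{\partial^2\check U_{\epsilon,\Delta}}{\partial\alpha_i\partial\alpha_j}(\alpha_0+t(\check\alpha_{\epsilon,\Delta}-\alpha_0),\check\beta_{\epsilon,\Delta})\,dt\right)\epsilon^{-1}(\check\alpha_{\epsilon,\Delta}-\alpha_0)_j.$$
Then, setting
$$\check\eta(\alpha_0,\check\beta_{\epsilon,\Delta})_{i,j}=\epsilon^2\int_0^1\left(\frac{\partial^2\check U_{\epsilon,\Delta}}{\partial\alpha_i\partial\alpha_j}(\alpha_0+t(\check\alpha_{\epsilon,\Delta}-\alpha_0),\check\beta_{\epsilon,\Delta})-\frac{\partial^2\check U_{\epsilon,\Delta}}{\partial\alpha_i\partial\alpha_j}(\alpha_0,\check\beta_{\epsilon,\Delta})\right)dt+\epsilon^2\frac{\partial^2\check U_{\epsilon,\Delta}}{\partial\alpha_i\partial\alpha_j}(\alpha_0,\check\beta_{\epsilon,\Delta})-2I_b(\alpha_0,\check\beta_{\epsilon,\Delta})_{i,j},$$
with $I_b$ defined in (3.9), we get
$$\left(2I_b(\alpha_0,\check\beta_{\epsilon,\Delta})+\check\eta(\alpha_0,\check\beta_{\epsilon,\Delta})\right)\epsilon^{-1}\left(\check\alpha_{\epsilon,\Delta}-\alpha_0\right)=-\epsilon\frac{\partial\check U_{\epsilon,\Delta}}{\partial\alpha}(\alpha_0,\check\beta_{\epsilon,\Delta}). \tag{A.11}$$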
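To obtain the tightness of the sequence $\epsilon^{-1}(\check\alpha_{\epsilon,\Delta}-\alpha_0)$ w.r.t. $\beta$, we first study the right hand side of (A.11). Using Lemma 4.4-(i), for all $i\in\{1,\dots,a\}$, $\epsilon\frac{\partial\check U_{\epsilon,\Delta}(\alpha_0,\beta)}{\partial\alpha_i}=\sum_{k=1}^n\left(C_k^i+D_k^i\right)$, with
$$C_k^i=\frac2\epsilon\,{}^t\frac{\partial b}{\partial\alpha_i}(\alpha_0,x_{\alpha_0}(t_{k-1}))\,\Sigma^{-1}(\beta,x_{\alpha_0}(t_{k-1}))\,N_k(X,\alpha_0)$$
and
$$D_k^i=\frac2\epsilon\,{}^t\frac{\partial b}{\partial\alpha_i}(\alpha_0,x_{\alpha_0}(t_{k-1}))\left(\Sigma^{-1}(\beta,X_{t_{k-1}})-\Sigma^{-1}(\beta,x_{\alpha_0}(t_{k-1}))\right)N_k(X,\alpha_0)+2\|\alpha-\alpha_0\|\,{}^t\eta_k\,\Sigma^{-1}(\beta,X_{t_{k-1}})\,N_k(X,\alpha_0).$$
Set $\tilde C_k^i=C_k^i-E\left[C_k^i\,|\,\mathcal F_{t_{k-1}}\right]$. Let us consider the centered martingale $\sum_{k=1}^n\tilde C_k^i$. In order to apply a central limit theorem (see [11, Theorem 3.2 p. 58]) we have to prove that $\sup_k|\tilde C_k^i|\to0$ and $\sum_{k=1}^n\tilde C_k^i\tilde C_k^j\to4I_b(\alpha_0,\beta)_{i,j}$ as $\epsilon,\Delta\to0$, and that $E\left[\sup_k(\tilde C_k^i)^2\right]<\infty$. Note that, since the limit $I_b(\alpha_0,\beta)$ is deterministic, no nesting condition on the $\sigma$-fields is required.

Applying the Taylor stochastic formula to $N_k(X,\alpha_0)$ in the expression of $\tilde C_k^i$ yields
$$\tilde C_k^i=2\,{}^t\frac{\partial b}{\partial\alpha_i}(\alpha_0,x_{\alpha_0}(t_{k-1}))\,\Sigma^{-1}(\beta,x_{\alpha_0}(t_{k-1}))\left(\sqrt\Delta\,Z_k^{\alpha_0,\beta_0}+\epsilon\left(R^{2,\epsilon}_{\theta_0}(t_k)-R^{2,\epsilon}_{\theta_0}(t_{k-1})\right)\right).$$
Hence, $\sup_k\left|C_k^i-E\left[C_k^i\,|\,\mathcal F_{t_{k-1}}\right]\right|\to0$ as $\epsilon,\Delta\to0$ and $E\left[\sup_k\left(C_k^i-E\left[C_k^i\,|\,\mathcal F_{t_{k-1}}\right]\right)^2\right]<\infty$. It remains to prove that $\sum_{k=1}^n\tilde C_k^i\tilde C_k^j\to4I_b(\alpha_0,\beta)_{i,j}$ as $\epsilon,\Delta\to0$. Let us apply Lemma A.2 with $X_{n,k}=\tilde C_k^i\tilde C_k^j$. Then, $\sum_{k=1}^nE\left[\tilde C_k^i\tilde C_k^j\,|\,\mathcal F_{t_{k-1}}\right]=\frac{1}{\epsilon^2}\sum_{k=1}^nE\left[N_{k,0}^2(M_{k-1})\,|\,\mathcal F_{t_{k-1}}\right]$, where $N_{k,0}^2$ is defined in (A.8) and $M_{k-1}=4\,{}^t\frac{\partial b}{\partial\alpha_i}(\alpha_0,x_{\alpha_0}(t_{k-1}))\,\Sigma^{-1}(\beta,x_{\alpha_0}(t_{k-1}))\,\frac{\partial b}{\partial\alpha_j}(\alpha_0,x_{\alpha_0}(t_{k-1}))$. Using (A.9) yields that $\left|\sum_{k=1}^nE\left[\tilde C_k^i\tilde C_k^j\,|\,\mathcal F_{t_{k-1}}\right]-4I_b(\alpha_0,\beta)_{i,j}\right|\to0$ as $\epsilon,\Delta\to0$. Moreover, (A.10) leads to $\sum_{k=1}^nE\left[(\tilde C_k^i\tilde C_k^j)^2\,|\,\mathcal F_{t_{k-1}}\right]=O(\Delta)\to0$.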
i Ck |Ftk−1 −−−−→ 0 and nk=1 Dki −−−−→ ϵ,∆→0 ϵ,∆→0 0 in probability. For Dki , (2.6) ensures that 1ϵ Σ −1 (β, X tk−1 ) − Σ −1 (β, xα0 (tk−1 )) is t∂b bounded in Pθ0 -probability. Hence, using Lemma 4.2 Vki = 2ϵ ∂α (α0 , xα0 (tk−1 )) Σ −1 (β, X tk−1 ) i −Σ −1 (β, xα0 (tk−1 )) + 2 ∥α − α0 ∥ tηk Σ −1 (β, X tk−1 ) is bounded in probability for all k. Since Dki = tVki Nk (X, α0 ), (4.6) ensures that nk=1 Dki −−−−→ 0 in Pθ0 -probability. Now, we prove that the centering term
n
k=1 E
ϵ,∆→0
∂b Σ −1 (β, xα0 (tk−1 )) ∂α i
Set Vk−1 = (α0 , xα0 (tk−1 )). Then, E[Cki |Ftk−1 ] = α0 )|Ftk−1 ]. By the Taylor stochastic formula
1t ϵ Vk−1 E[Nk (X,
√ α ,β 2,ϵ Nk (X, α0 ) = ϵ ∆Z k 0 0 + ϵ 2 Rθ2,ϵ (t ) − Φ (t , t )R (t ) . k α k k−1 k−1 0 θ0 0 Using the fact that Z k is independent of Ftk−1 , E tVk−1 Nk (X, α0 )|Ftk−1 = ϵ 2 tVk−1 [E[Rθ2,ϵ 0 (I p −Φα (tk ,tk−1 ))
0 Rθ2,ϵ (tk−1 )]. An Abel transformation to the series yields (t ) − R 2,ϵ (t )] + ∆ ∆ 0 k θ0 k−1 1 n V −V 2,ϵ t ϵ k=1 E Vk−1 Nk (X, α0 )|Ftk−1 ≤ T supk∈{1,...,n} k ∆k−1 supt∈[0,T ] ϵ Rθ0 (t). Using
R. Guy et al. / Stochastic Processes and their Applications 124 (2014) 51–80
77
k−1 now that supk∈{1,...,n} Vk −V is bounded, we obtain that in probability ∆ n 1 t Vk−1 E Nk (X, α0 )|Ftk−1 −−−−→ 0. ϵ,∆→0 ϵ k=1
(A.12)
Combining all these results we get (4.8). Let us now study η(α ˇ 0 , βˇϵ,∆ )i, j defined by (A.11). t 2U ˇ ∂ i, j i, j i, j n 1 ∂ Nk (X,α) ϵ 2 ∂αi ϵ,∆ = 2∆ + B with A (α , β) = A (α ) Σ −1 (β, X tk−1 ) 0 0 k=1 k k k αj ∂αi ∆ 2 N (X,α) t i, j 1 ∂ Nk (X,α) k (α0 ) and Bk = ∆1 ∂ ∂α (α0 )Σ −1 (β, X tk−1 )Nk (X, α0 ). ∂α j ∆ iαj −1 Using that Σ −1 (β, X tk−1 ) converges towards Σ (β0 , xα0 (tk−1 )), Lemma 4.4-(i) yields that n i, j k=1 Ak −−−−→ 2Ib (α0 , β0 )i, j additional terms are negligible since they are bounded by ϵ,∆→0 (2) n∆ supk∈{1,...,n},α∈K a ηϵ,∆ . ∞ i, j Applying Lemma 4.4-(ii) and (4.6) yields that nk=1 Bk → 0 in Pθ0 -probability. Joining ∂ 2 Uˇ
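all the results we get (4.7). In addition, since the limit of $\epsilon^2\frac{\partial^2\check U_{\epsilon,\Delta}}{\partial\alpha_i\partial\alpha_j}(\alpha_0,\beta)$ is deterministic, we have
$$\forall t\in[0,1],\quad \sup_{\beta\in K_b}\left|\epsilon^2\frac{\partial^2\check U_{\epsilon,\Delta}}{\partial\alpha_i\partial\alpha_j}\big(\alpha_0 + t(\check\alpha_{\epsilon,\Delta}-\alpha_0),\beta\big) - \epsilon^2\frac{\partial^2\check U_{\epsilon,\Delta}}{\partial\alpha_i\partial\alpha_j}(\alpha_0,\beta)\right| \le K\left\|\check\alpha_{\epsilon,\Delta}-\alpha_0\right\|. \tag{A.13}$$
Joining (4.7) and (A.13) ensures that $\sup_{\beta\in K_b}\|\check\eta(\alpha_0,\beta)\| \xrightarrow[\epsilon,\Delta\to 0]{} 0$. It remains to prove that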
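$I_b(\alpha_0,\beta)$ is invertible for all $\beta$. According to (S2), $\Sigma(\beta,x)$ is invertible for all $(\beta,x)\in K_b\times U$, which ensures that $\Sigma^{-1}(\beta,x)$ is a coercive bilinear application. The set $K_b$ being compact, the coercive constant can be chosen independently of $\beta$. Using (3.9), $\inf_{\beta\in K_b}\det(I_b(\alpha_0,\beta)) \ge C\,\frac{1}{T}\int_0^T\left\|\frac{\partial b(\alpha_0,x_{\alpha_0}(s))}{\partial\alpha}\right\|^2 ds = C_0$, with $C_0$ strictly positive because $I_b(\alpha_0,\beta_0)$ is invertible. Noting $\mathrm{Com}(M)$ the comatrix of $M$, we have that $\sup_{\beta\in K_b}\|{}^t\mathrm{Com}(I_b(\alpha_0,\beta))\| < \infty$ as a continuous function of $\beta$, and
$$\lim\left\|\epsilon^{-1}(\check\alpha_{\epsilon,\Delta}-\alpha_0)\right\| \le \frac{1}{C_0}\sup_{\beta\in K_b}\left\|{}^t\mathrm{Com}(I_b(\alpha_0,\beta))\right\|\,\sup_{\beta\in K_b}\left\|\epsilon\frac{\partial\check U_{\epsilon,\Delta}}{\partial\alpha}(\alpha_0,\beta)\right\|.$$
Hence $\epsilon^{-1}(\check\alpha_{\epsilon,\Delta}-\alpha_0)$ is bounded in $P_{\theta_0}$-probability, uniformly w.r.t. $\beta$, which achieves the proof of Proposition 4.2.

A.8. Proof of Proposition 4.3

Using notations (4.10) and (4.11), we get
$$\frac{1}{n}\left(\check U_{\Delta,\epsilon}(\check\alpha_{\epsilon,\Delta},\beta) - \check U_{\Delta,\epsilon}(\check\alpha_{\epsilon,\Delta},\beta_0)\right) = A_1(\beta,\beta_0) + A_2(\alpha_0,\beta,\beta_0) + \left[A_2(\check\alpha_{\epsilon,\Delta},\beta,\beta_0) - A_2(\alpha_0,\beta,\beta_0)\right].$$
We already obtained the convergence result for $A_1(\beta,\beta_0)$. Let us study $A_2(\alpha_0,\beta,\beta_0)$. Using (A.8), $A_2(\alpha_0,\beta,\beta_0) = \frac{1}{\epsilon^2T}\sum_{k=1}^{n} N_{k,0}^2(M_{k-1}(\beta,\beta_0))$, with $M_{k-1}(\beta,\beta_0) = \Sigma^{-1}(\beta,X_{t_{k-1}})\Sigma(\beta_0,X_{t_{k-1}}) - I_p$. Let us now control the conditional moments of $N_{k,0}^2(M_{k-1}(\beta,\beta_0))$. Using (A.9) yields
$$\sup_{\beta\in K_b}\left|\frac{1}{\epsilon^2T}\sum_{k=1}^{n} E\left[N_{k,0}^2(M_{k-1}(\beta,\beta_0)) \mid \mathcal F_{t_{k-1}}\right] - \frac{1}{n}\sum_{k=1}^{n}\mathrm{Tr}(M_{k-1}(\beta,\beta_0))\right| \to 0.$$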
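Hence,
$$\frac{1}{\epsilon^2T}\sum_{k=1}^{n} E\left[N_{k,0}^2(M_{k-1}(\beta,\beta_0)) \mid \mathcal F_{t_{k-1}}\right] \to \frac{1}{T}\int_0^T \mathrm{Tr}\left(\Sigma^{-1}(\beta,x_{\alpha_0}(t))\,\Sigma(\beta_0,x_{\alpha_0}(t))\right)dt - p,$$
uniformly w.r.t. $\beta$. Using (A.10) yields
$$\frac{1}{\epsilon^4T^2}\sum_{k=1}^{n} E\left[\left(N_{k,0}^2(M_{k-1}(\beta,\beta_0))\right)^2 \mid \mathcal F_{t_{k-1}}\right] - \frac{\Delta}{n}\sum_{k=1}^{n}\left[\mathrm{Tr}^2(M_{k-1}(\beta,\beta_0)) + 2\,\mathrm{Tr}(M_{k-1}^2(\beta,\beta_0))\right] \to 0.$$
The last term is $O(\Delta)$ and goes to zero. Applying Lemma A.2 to $X_{n,k} = \frac{1}{\epsilon^2T}N_{k,0}^2(M_{k-1}(\beta,\beta_0))$ yields $A_2(\alpha_0,\beta,\beta_0) \to \frac{1}{T}\int_0^T \mathrm{Tr}\left(\Sigma(\beta_0,x_{\alpha_0}(t))\,\Sigma^{-1}(\beta,x_{\alpha_0}(t)) - I_p\right)dt$ in $P_{\theta_0}$-probability. Joining these two results we obtain that $A_1(\beta,\beta_0) + A_2(\alpha_0,\beta,\beta_0) \xrightarrow[\epsilon,\Delta\to 0]{} K_2(\alpha_0,\beta_0,\beta)$ uniformly w.r.t. $\beta$.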
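The trace expressions $\mathrm{Tr}(M)$ and $\mathrm{Tr}^2(M) + 2\,\mathrm{Tr}(M^2)$ used with (A.9) and (A.10) are the first two moments of a Gaussian quadratic form. The sketch below checks them numerically under a simplifying assumption made here for illustration only: the normalised quantity is treated as $q = {}^tZ M Z$ with $Z$ a standard Gaussian vector and $M$ a fixed symmetric matrix. The function names and the test matrix are hypothetical and do not come from the paper.

```python
import numpy as np

def quadratic_form_moments(M, n_samples=200_000, seed=0):
    """Monte Carlo estimates of E[q] and E[q^2] for q = Z^T M Z, Z ~ N(0, I_p)."""
    rng = np.random.default_rng(seed)
    p = M.shape[0]
    Z = rng.standard_normal((n_samples, p))
    # Row-wise quadratic form: q[n] = sum_{i,j} Z[n,i] * M[i,j] * Z[n,j]
    q = np.einsum("ni,ij,nj->n", Z, M, Z)
    return q.mean(), (q ** 2).mean()

def closed_form_moments(M):
    """Closed-form moments for symmetric M: Tr(M) and Tr(M)^2 + 2 Tr(M^2)."""
    first = np.trace(M)
    second = np.trace(M) ** 2 + 2.0 * np.trace(M @ M)
    return first, second

# Hypothetical symmetric test matrix (illustration only).
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])

mc_first, mc_second = quadratic_form_moments(M)
cf_first, cf_second = closed_form_moments(M)
print(f"E[q]   Monte Carlo: {mc_first:.3f}   closed form: {cf_first:.3f}")
print(f"E[q^2] Monte Carlo: {mc_second:.3f}   closed form: {cf_second:.3f}")
```

The Monte Carlo values should agree with the closed forms up to sampling error. The actual definition of $N_{k,0}^2$ is the one given in (A.8).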
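It remains to prove that $A_2(\check\alpha_{\epsilon,\Delta},\beta,\beta_0) - A_2(\alpha_0,\beta,\beta_0) \to 0$ in probability, uniformly w.r.t. $\beta$. We use Lemma 4.2 to set $h_k(\alpha,\alpha_0) = \epsilon^{-1}\left(\frac{N_k(x_{\alpha_0},\alpha)}{\Delta} + \|\alpha-\alpha_0\|\,\eta_k\right)$. Then, $A_2(\alpha,\beta,\beta_0) - A_2(\alpha_0,\beta,\beta_0) = T_1(\alpha,\alpha_0,\beta) + T_2(\alpha,\alpha_0,\beta)$, with
$$T_1(\alpha,\alpha_0,\beta) = \frac{\Delta}{n}\sum_{k=1}^{n} {}^th_k(\alpha,\alpha_0)\,\Sigma^{-1}(\beta,X_{t_{k-1}})\,h_k(\alpha,\alpha_0)$$
and
$$T_2(\alpha,\alpha_0,\beta) = \frac{2}{n\epsilon}\sum_{k=1}^{n} {}^th_k(\alpha,\alpha_0)\,\Sigma^{-1}(\beta,X_{t_{k-1}})\,N_k(X,\alpha_0).$$
By Lemmas 4.1 and 4.2, $\sup_{k\in\{1,\dots,n\}}\|h_k(\alpha,\alpha_0)\| \le K\epsilon^{-1}\|\alpha-\alpha_0\|$, which leads to $\sup_{\beta\in K_b}|T_1(\check\alpha_{\epsilon,\Delta},\alpha_0,\beta)| \le K\Delta\left\|\epsilon^{-1}(\check\alpha_{\epsilon,\Delta}-\alpha_0)\right\|^2 \sup_{\beta\in K_b}\left\|\Sigma^{-1}(\beta,X_{t_{k-1}})\right\|$. Applying Proposition 4.2 yields that this term goes to zero. Moreover, $\sup_{k\in\{1,\dots,n\}}\left\|{}^th_k(\check\alpha_{\epsilon,\Delta},\alpha_0)\,\Sigma^{-1}(\beta,X_{t_{k-1}})\right\| \le K\left\|\epsilon^{-1}(\check\alpha_{\epsilon,\Delta}-\alpha_0)\right\|$ is bounded in $P_{\theta_0}$-probability by Proposition 4.2. Finally, applying (4.6) ensures that $T_2(\check\alpha_{\epsilon,\Delta},\alpha_0,\beta) \to 0$ and the proof is achieved.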
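The prefactors of $T_1$ and $T_2$ can be read off from a purely algebraic expansion. As a hedged sketch, assume (this is an assumption made here, not a statement from the paper) that $N_k(X,\alpha) = N_k(X,\alpha_0) + \epsilon\Delta\,h_k$ and that $S$ is a symmetric matrix standing in for $\Sigma^{-1}(\beta,X_{t_{k-1}})$; $N_k$ and $h_k$ abbreviate $N_k(X,\alpha_0)$ and $h_k(\alpha,\alpha_0)$:

```latex
\frac{1}{n\epsilon^{2}\Delta}
\Big[ {}^t(N_k + \epsilon\Delta h_k)\, S\, (N_k + \epsilon\Delta h_k) - {}^tN_k\, S\, N_k \Big]
 \;=\; \frac{2}{n\epsilon}\, {}^th_k\, S\, N_k \;+\; \frac{\Delta}{n}\, {}^th_k\, S\, h_k .
```

Summing over $k$, the cross term has the form of $T_2$ and the quadratic term has the form of $T_1$. The sign convention for $h_k$ is part of the assumption.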
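A.9. Proof of Theorem 4.1

The asymptotic normality of $\epsilon^{-1}(\check\alpha_{\epsilon,\Delta}-\alpha_0)$ is obtained by just adding the consistency result on $\check\beta_{\epsilon,\Delta}$ in the proof of Proposition 4.2. Taylor expansion of $\frac{\partial\check U_{\epsilon,\Delta}}{\partial\theta}$ at point $\theta_0 = (\alpha_0,\beta_0)$, setting $\theta_t = \left(\alpha_0 + t(\check\alpha_{\epsilon,\Delta}-\alpha_0),\ \beta_0 + t(\check\beta_{\epsilon,\Delta}-\beta_0)\right)$,
$$M_\alpha(\theta) = \left(\epsilon^2\frac{\partial^2\check U_{\epsilon,\Delta}}{\partial\alpha_i\partial\alpha_j}(\theta)\right)_{1\le i,j\le a},\quad M_\beta(\theta) = \left(\frac{1}{n}\frac{\partial^2\check U_{\epsilon,\Delta}}{\partial\beta_i\partial\beta_j}(\theta)\right)_{1\le i,j\le b},\quad M_{\alpha,\beta}(\theta) = \left(\frac{\epsilon}{\sqrt n}\frac{\partial^2\check U_{\epsilon,\Delta}}{\partial\alpha_i\partial\beta_j}(\theta)\right)_{1\le i\le a,\,1\le j\le b},$$
provides
$$-\begin{pmatrix}\epsilon\,\dfrac{\partial\check U_{\epsilon,\Delta}}{\partial\alpha}(\alpha_0,\beta_0)\\[2mm] \dfrac{1}{\sqrt n}\dfrac{\partial\check U_{\epsilon,\Delta}}{\partial\beta}(\alpha_0,\beta_0)\end{pmatrix} = \int_0^1\begin{pmatrix}M_\alpha & M_{\alpha,\beta}\\ {}^tM_{\alpha,\beta} & M_\beta\end{pmatrix}(\theta_t)\,dt\ \begin{pmatrix}\epsilon^{-1}(\check\alpha_{\epsilon,\Delta}-\alpha_0)\\ \sqrt n\,(\check\beta_{\epsilon,\Delta}-\beta_0)\end{pmatrix}.$$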
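Let us first study the asymptotic normality of $\beta$. Setting $M_{k-1}^i = \Sigma^{-1}(\beta_0,X_{t_{k-1}})\frac{\partial\Sigma}{\partial\beta_i}(\beta_0,X_{t_{k-1}})$ and noting that $\frac{\partial\log(\det(\Sigma(\beta,X_{t_{k-1}})))}{\partial\beta_i} = \mathrm{Tr}(M_{k-1}^i)$, we obtain using definition (A.8),
$$\frac{1}{\sqrt n}\frac{\partial\check U_{\Delta,\epsilon}}{\partial\beta_i}(\alpha_0,\beta_0) = \sum_{k=1}^{n} A_k^i,\quad\text{with}\quad A_k^i = \frac{1}{\sqrt n}\mathrm{Tr}(M_{k-1}^i) - \frac{1}{\epsilon^2\Delta\sqrt n}N_{k,0}^2(M_{k-1}^i).$$
Let us first apply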
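Lemma A.2 with $X_{n,k} = A_k^iA_k^j$.
$$\begin{aligned}\sum_{k=1}^{n} E\left[A_k^iA_k^j \mid \mathcal F_{t_{k-1}}\right] ={}& \frac{1}{n}\sum_{k=1}^{n}\left(\mathrm{Tr}(M_{k-1}^i)\mathrm{Tr}(M_{k-1}^j) - \frac{\mathrm{Tr}(M_{k-1}^i)}{\epsilon^2\Delta}\,E\left[N_{k,0}^2(M_{k-1}^j) \mid \mathcal F_{t_{k-1}}\right]\right)\\ &+ \frac{1}{n}\sum_{k=1}^{n}\left(\mathrm{Tr}(M_{k-1}^j)\mathrm{Tr}(M_{k-1}^i) - \frac{\mathrm{Tr}(M_{k-1}^j)}{\epsilon^2\Delta}\,E\left[N_{k,0}^2(M_{k-1}^i) \mid \mathcal F_{t_{k-1}}\right]\right)\\ &+ \frac{1}{n}\sum_{k=1}^{n}\left(\frac{1}{\epsilon^4\Delta^2}\,E\left[N_{k,0}^2(M_{k-1}^i)N_{k,0}^2(M_{k-1}^j) \mid \mathcal F_{t_{k-1}}\right] - \mathrm{Tr}(M_{k-1}^i)\mathrm{Tr}(M_{k-1}^j) - 2\,\mathrm{Tr}(M_{k-1}^iM_{k-1}^j)\right)\\ &+ \frac{2}{n}\sum_{k=1}^{n}\mathrm{Tr}(M_{k-1}^iM_{k-1}^j).\end{aligned}$$
Using (A.9) and (A.10), the first three summation terms go to zero, while the last one goes to $4I_\sigma(\alpha_0,\beta_0)_{i,j}$ as a Riemann sum. Using that $\sqrt n\,A_k^i$ is bounded in probability yields that $(A_k^iA_k^j)^2 = O(\frac{1}{n^2})$, leading to $\sum_{k=1}^{n} E\left[(A_k^iA_k^j)^2 \mid \mathcal F_{t_{k-1}}\right] \to 0$. Thus, we obtain $\sum_{k=1}^{n} X_{n,k} \to 4I_\sigma(\alpha_0,\beta_0)$. In addition, (A.9) yields
$$\sum_{k=1}^{n} E\left[A_k^i \mid \mathcal F_{t_{k-1}}\right] = \frac{1}{\sqrt n}\sum_{k=1}^{n}\left(\mathrm{Tr}(M_{k-1}^i) - \frac{1}{\epsilon^2\Delta}\,E\left[N_{k,0}^2(M_{k-1}^i) \mid \mathcal F_{t_{k-1}}\right]\right) \xrightarrow[\epsilon,\Delta\to 0]{} 0$$
and $\sup_{k\in\{1,\dots,n\}} E\left[A_k^i \mid \mathcal F_{t_{k-1}}\right] \to 0$. Now, setting $\tilde A_k^i = A_k^i - E\left[A_k^i \mid \mathcal F_{t_{k-1}}\right]$ leads to $\sum_{k=1}^{n} E\left[\tilde A_k^i\tilde A_k^j \mid \mathcal F_{t_{k-1}}\right] \to 4I_\sigma(\alpha_0,\beta_0)$. Using the Taylor stochastic formula,
$$\tilde A_k^i = \frac{1}{\sqrt n}\mathrm{Tr}\left(M_k^i\left(S_k^{\alpha_0,\beta_0}\,\Sigma^{-1}(\beta_0,X_{t_{k-1}}) - I_p\right)\right) + \epsilon^2\,{}^t\!\left(R^{2,\epsilon}_{\theta_0}(t_k) - R^{2,\epsilon}_{\theta_0}(t_{k-1})\right)M_k^i\,\Sigma^{-1}(\beta_0,X_{t_{k-1}})\left(R^{2,\epsilon}_{\theta_0}(t_k) - R^{2,\epsilon}_{\theta_0}(t_{k-1})\right).$$
Using (2.11) and (A.1) yields that $S_k^{\alpha_0,\beta_0} = \Sigma(\beta_0,x_{\alpha_0}(t_{k-1})) + O(\Delta)$. Hence, $\sup_{k\in\{1,\dots,n\}}|\tilde A_k^i| \xrightarrow[\epsilon,\Delta\to 0]{} 0$. Using Corollary 2.2-(ii) yields $E\left[\sup_{k\in\{1,\dots,n\}}(\tilde A_k^i)^2\right] < \infty$.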
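We can now apply Theorem 3.2, p. 58 in [11] to the centered martingale $\tilde A_k^i$ to obtain the asymptotic normality:
$$\frac{1}{\sqrt n}\frac{\partial\check U_{\Delta,\epsilon}}{\partial\beta_i}(\alpha_0,\beta_0) \to \mathcal N\left(0,\,4I_\sigma(\alpha_0,\beta_0)\right).$$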
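Let us now study the second derivatives of $\check U_{\epsilon,\Delta}$: $\frac{1}{n}\frac{\partial^2\check U_{\Delta,\epsilon}}{\partial\beta_i\partial\beta_j}(\alpha_0,\beta_0) = \sum_{k=1}^{n} B_k^{i,j}$ with
$$B_k^{i,j} = \frac{1}{n}\left[\mathrm{Tr}(L_{k-1}^{i,j}) - \mathrm{Tr}(M_{k-1}^iM_{k-1}^j)\right] - \frac{1}{\epsilon^2T}N_{k,0}^2\left(L_{k-1}^{i,j} - M_{k-1}^iM_{k-1}^j - M_{k-1}^jM_{k-1}^i\right),$$
and $L_{k-1}^{i,j} = \Sigma^{-1}(\beta_0,X_{t_{k-1}})\frac{\partial^2\Sigma}{\partial\beta_i\partial\beta_j}(\beta_0,X_{t_{k-1}})$.
Using (A.9) we have $\left|\sum_{k=1}^{n} E\left[B_k^{i,j} \mid \mathcal F_{t_{k-1}}\right] - \frac{1}{n}\sum_{k=1}^{n}\mathrm{Tr}(M_{k-1}^iM_{k-1}^j)\right| \xrightarrow[\epsilon,\Delta\to 0]{} 0$. Moreover, $\|B_k^{i,j}\|^2 = O(\frac{1}{n^2})$, so $\sum_{k=1}^{n} E\left[(B_k^{i,j})^2 \mid \mathcal F_{t_{k-1}}\right] \to 0$. Hence Lemma A.2 yields $\sum_{k=1}^{n} B_k^{i,j} \xrightarrow[\epsilon,\Delta\to 0]{} 2I_\sigma(\alpha_0,\beta_0)_{i,j}$.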
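It remains to study $\frac{\epsilon}{\sqrt n}\frac{\partial^2\check U_{\Delta,\epsilon}}{\partial\alpha_i\partial\beta_j}(\alpha_0,\beta_0) = \sum_{k=1}^{n} C_k^{i,j}$ with
$$C_k^{i,j} = \frac{1}{\epsilon\Delta\sqrt n}\,{}^t\frac{\partial N_k(X,\alpha)}{\partial\alpha_i}(\alpha_0)\,M_{k-1}^j\,\Sigma^{-1}(\beta_0,X_{t_{k-1}})\,N_k(X,\alpha_0).$$
Posing $V_{k-1}^{i,j} = \frac{1}{\sqrt n}\Sigma^{-1}(\beta_0,X_{t_{k-1}})\,{}^tM_{k-1}^j\,\frac{1}{\Delta}\frac{\partial N_k(X,\alpha)}{\partial\alpha_i}(\alpha_0)$, we have $\sup_{k\in\{1,\dots,n\}}\left\|\frac{V_k^{i,j} - V_{k-1}^{i,j}}{\Delta}\right\| \xrightarrow[\epsilon,\Delta\to 0]{} 0$ and $\sum_{k=1}^{n} E\left[C_k^{i,j} \mid \mathcal F_{t_{k-1}}\right] = \frac{1}{\epsilon}\sum_{k=1}^{n} {}^tV_{k-1}^{i,j}\,E\left[N_k(X,\alpha_0) \mid \mathcal F_{t_{k-1}}\right]$. Thus (A.12) leads to $\sum_{k=1}^{n} E\left[C_k^{i,j} \mid \mathcal F_{t_{k-1}}\right] \to 0$. Moreover, setting $M_{k-1}^{i,j} = n\,V_{k-1}^{i,j}\,{}^tV_{k-1}^{i,j}$ we apply (A.9) to obtain $\sum_{k=1}^{n} E\left[(C_k^{i,j})^2 \mid \mathcal F_{t_{k-1}}\right] = \frac{1}{n\epsilon^2}\sum_{k=1}^{n} E\left[N_{k,0}^2(M_{k-1}^{i,j}) \mid \mathcal F_{t_{k-1}}\right] \to 0$. Lemma A.2 leads to $\sum_{k=1}^{n} C_k^{i,j} \xrightarrow[\epsilon,\Delta\to 0]{} 0$. The proof is then achieved.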
References

[1] H. Andersson, T. Britton, Stochastic Epidemic Models and their Statistical Analysis, Springer, 2000.
[2] R. Azencott, Stochastic Taylor formula and Feynmann integrals, in: Geometrie Differentielle Stochastique, in: Seminaire Prob., vol. XVI, 1982.
[3] H. Cartan, Differential Calculus, Hermann, 1971.
[4] D. Dacunha-Castelle, M. Duflo, Probabilités et statistiques 2, in: Problèmes à Temps Mobile, Masson, 1993.
[5] O. Diekmann, J.A.P. Heesterbeek, Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation, Wiley, 2000.
[6] M.I. Freidlin, A.D. Wentzell, Random Perturbations of Dynamical Systems, Springer-Verlag, 1984.
[7] V. Genon-Catalot, Maximum contrast estimation for diffusion processes from discrete observations, Statistics 21 (1990) 99–116.
[8] V. Genon-Catalot, J. Jacod, On estimating the diffusion coefficient for multidimensional processes, Annales de l’Institut Henri Poincaré Probabilités Statistiques 29 (1993) 119–151.
[9] D.T. Gillespie, Exact stochastic simulation of coupled chemical reactions, The Journal of Physical Chemistry 81 (1977) 2340–2361.
[10] A. Gloter, M. Sørensen, Estimation for stochastic differential equations with a small diffusion coefficient, Stochastic Processes and their Applications 119 (2009) 679–699.
[11] P. Hall, C.C. Heyde, Martingale limit theory and its application, Probability and Mathematical Statistics (1980).
[12] L.P. Hansen, J.A. Scheinkman, Back to the future: generating moment implications for continuous time Markov processes, Econometrica 63 (1995) 767–804.
[13] N. Ikeda, S. Watanabe, Stochastic Differential Equations and Diffusion Processes, North-Holland Publishing Company, 1989.
[14] I. Karatzas, S.E. Shreve, Brownian Motion and Stochastic Calculus, second ed., Springer, 1991.
[15] M. Kessler, Simple and explicit estimating functions for a discretely observed diffusion process, Scandinavian Journal of Statistics. Theory and Applications 27 (2000) 65–82.
[16] Y. Kutoyants, Parameter Estimation for Stochastic Processes, Heldermann, Berlin, 1984.
[17] C. Laredo, A sufficient condition for asymptotic sufficiency of incomplete observations of a diffusion process, The Annals of Statistics 18 (1990) 1151–1178.
[18] R.N. Lipster, A.N. Shiryaev, Statistics of Random Processes, Springer, New York, 2001.
[19] F. Longstaff, E. Schwartz, A simple approach to valuing risky fixed and floating rate debt, The Journal of Finance 1 (1995) 789–819.
[20] M. Sørensen, Small dispersion asymptotics for diffusion martingale estimating functions, 2000. Preprint.
[21] M. Sørensen, M. Uchida, Small diffusion asymptotics for discretely sampled stochastic differential equations, Bernoulli 9 (2003) 1051–1069.
[22] M. Uchida, Estimation for discretely observed small diffusions based on approximate martingale estimating functions, Scandinavian Journal of Statistics 31 (4) (2004) 553–566.
[23] M. Uchida, N. Yoshida, Adaptive estimation of an ergodic diffusion process based on sampled data, Stochastic Processes and their Applications 122 (2012) 2885–2924.
[24] N. Yoshida, Estimation for diffusion processes from discrete observation, Journal of Multivariate Analysis 41 (1992) 220–242.