Journal of the Korean Statistical Society

Optimal Berry–Esseen bound for an estimator of parameter in the Ornstein–Uhlenbeck process

Yoon Tae Kim, Hyun Suk Park (Department of Statistics, Hallym University, Chuncheon, Gangwon-Do 200-702, South Korea)

Article history: Received 21 September 2016; Accepted 16 January 2017.

Abstract. This paper is concerned with the rate of the central limit theorem for the maximum likelihood estimator $\hat\theta_T$ of the unknown parameter $\theta > 0$, based on the observation $X = \{X_t, 0 \le t \le T\}$, occurring in the drift coefficient of an Ornstein–Uhlenbeck process $dX_t = -\theta X_t\,dt + dW_t$, $X_0 = 0$, $0 \le t \le T$, where $\{W_t, t \ge 0\}$ is a standard Brownian motion. The tool we use is an Edgeworth expansion with an explicitly expressed remainder. We prove that upper and lower bounds, obtained by controlling the remainder term, give the optimal rate $1/\sqrt{T}$ in the Kolmogorov distance for the normal approximation of $\hat\theta_T$.

AMS 2000 subject classifications: primary 60H07; secondary 60F25.

Keywords: Malliavin calculus; Ornstein–Uhlenbeck process; Berry–Esseen bound; Maximum likelihood estimator; Multiple stochastic integral; Edgeworth expansion.

© 2017 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.

1. Introduction

In this paper, we find an optimal rate of convergence of the distribution of the maximum likelihood estimator (MLE) of the unknown parameter $\theta \in \Theta \subseteq \mathbb{R}^+$ based on the observation $X = \{X_t, 0 \le t \le T\}$ given by
$$dX_t = -\theta X_t\,dt + dW_t, \qquad X_0 = 0, \quad 0 \le t \le T, \tag{1}$$

where $\{W_t, t \ge 0\}$ is a standard Brownian motion. When the whole path $\{X_t, 0 \le t \le T\}$ is observed, the MLE $\hat\theta_T$ satisfies
$$\sqrt{\frac{T}{2\theta}}\,(\hat\theta_T-\theta) = -\frac{\sqrt{2\theta/T}\;S_T}{(2\theta/T)\,\langle S\rangle_T}, \tag{2}$$
where
$$S_T = \int_0^T X_t\,dW_t \quad\text{and}\quad \langle S\rangle_T = \int_0^T X_t^2\,dt.$$
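The estimator above is easy to illustrate by simulation. The following sketch is our own illustration, not part of the paper; the Euler–Maruyama step size and parameter values are arbitrary choices. It approximates $\hat\theta_T = -\int_0^T X_t\,dX_t / \int_0^T X_t^2\,dt$, which satisfies $\hat\theta_T - \theta = -S_T/\langle S\rangle_T$:

```python
import numpy as np

def ou_mle(theta, T, n_steps, rng):
    """Simulate dX = -theta*X dt + dW by Euler-Maruyama (X_0 = 0) and return
    the discretized continuous-record MLE  theta_hat = -int X dX / int X^2 dt."""
    dt = T / n_steps
    X = 0.0
    num = 0.0  # accumulates -X_t dX_t
    den = 0.0  # accumulates X_t^2 dt
    for _ in range(n_steps):
        dX = -theta * X * dt + rng.normal(0.0, np.sqrt(dt))
        num -= X * dX
        den += X * X * dt
        X += dX
    return num / den

rng = np.random.default_rng(1)
est = ou_mle(theta=1.0, T=200.0, n_steps=20000, rng=rng)
```

For large $T$ the estimate is close to $\theta$ (here $\theta = 1$, with fluctuations of order $\sqrt{2\theta/T} = 0.1$); the object of study in this paper is precisely the fluctuation $\sqrt{T/(2\theta)}(\hat\theta_T - \theta)$.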

✩ This research was supported by Hallym University Research Fund, 2015 (HRF-201509-009), and by Basic Science Research Program through the National Research Foundation of Korea (NRF-2013R1A1A2008478). ∗ Corresponding author. E-mail addresses: [email protected] (Y.T. Kim), [email protected] (H.S. Park).

doi:10.1016/j.jkss.2017.01.002


Here we use the quadratic-variation label $\langle S\rangle_T$ purely as notation. It is well known that $\hat\theta_T$ is strongly consistent and that $\sqrt{T/(2\theta)}(\hat\theta_T-\theta)$ converges to a Gaussian random variable with mean zero and unit variance as $T$ tends to infinity (see Basawa & Prakasa Rao, 1980). For the Berry–Esseen bound of the MLE $\hat\theta_T$, Mishra and Prakasa Rao (1985) obtained the rate $O(T^{-1/5})$ by using the technique of Michel and Pfanzagl (1971). In Bose (1985), the author decomposed the numerator in (2) into two parts by using the Itô formula, and obtained the rate $O(T^{-1/2}\log T)$. In Bishwal and Bose (1995), the authors improved the rate to $O(T^{-1/2}\sqrt{\log T})$ by using a characteristic function for the normal approximation of the numerator and a moment generating function for the convergence of the denominator. Afterwards, Bishwal (2000) improved the Berry–Esseen bound for $\hat\theta_T$ to $O(T^{-1/2})$ through the squeezing technique developed by Pfanzagl (1971) for minimum contrast estimators (for more information, see Bishwal, 2008). The aim of the present work is to show that the upper bound $T^{-1/2}$ obtained by Bishwal (2000) is sharp, by finding a lower bound with the same speed; that is, an optimal Berry–Esseen bound for $\hat\theta_T$. As a tool for this, we use a one-term Edgeworth expansion from Kim and Park (submitted for publication). By using this method, we also recover the upper bound $T^{-1/2}$ obtained by Bishwal (2000). We stress that our technique is more straightforward than the squeezing technique used in Bishwal (2000). Moreover, our technique applies widely to obtaining optimal rates for parameter estimation of Gaussian processes. We briefly describe the method of the one-term Edgeworth expansion. Let $\{F_n, n \ge 1\}$ be a sequence of random variables, functionals of an infinite-dimensional Gaussian field associated with an isonormal Gaussian process defined on a probability space $(\Omega, \mathcal{F}, P)$.
In Kim and Park (submitted for publication), by combining Malliavin calculus and repeated applications of Stein's equation, the authors find a one-term Edgeworth expansion with an explicitly expressed remainder $R_n(z)$:
$$P(F_n \le z) - P(Z \le z) = -\frac{1}{3!}H_2(z)\phi(z)\kappa_3(F_n) + R_n(z), \tag{3}$$
where $H_2(z)$ denotes the second Hermite polynomial, $\kappa_3(F_n)$ denotes the third cumulant of $F_n$, $Z$ is a standard Gaussian random variable and
$$\phi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}.$$



For our work, we use the Edgeworth expansion (3) of the distribution function $P\big(\sqrt{T/(2\theta)}\,(\hat\theta_T-\theta)\le z\big)$. By controlling the remainder after the required terms, we obtain an optimal Berry–Esseen bound for the sequence $\sqrt{T/(2\theta)}\,(\hat\theta_T-\theta)$. We say that the bound $\varphi(T)$ is optimal for the sequence $\{F_T, T \ge 0\}$ with respect to the distance $d$ if there exist constants $0 < c < C < \infty$ (not depending on $T$) such that, for sufficiently large $T$,
$$c \le \frac{d(F_T, N)}{\varphi(T)} \le C. \tag{4}$$
In this paper, we focus on the normal approximation of random variables with respect to the Kolmogorov distance, defined by
$$d(X,Y) = \sup_{z\in\mathbb{R}}\big|P(X\le z) - P(Y\le z)\big|.$$
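When one of the two distributions is empirical, the supremum in this distance is attained at sample points, which gives a simple Monte Carlo estimator. The following helper is our own sketch (the function name is an assumption, not the paper's notation); it estimates $d$ between a sample and $N(0,1)$:

```python
import numpy as np
from math import erf

def kolmogorov_distance_to_std_normal(samples):
    """sup_z |F_n(z) - Phi(z)| for the empirical CDF F_n of the sample.
    The supremum is attained just above or just below a sample point."""
    s = np.sort(np.asarray(samples, dtype=float))
    Phi = np.array([0.5 * (1.0 + erf(x / np.sqrt(2.0))) for x in s])
    n = len(s)
    upper = np.arange(1, n + 1) / n - Phi   # ECDF evaluated at the point
    lower = Phi - np.arange(0, n) / n       # ECDF just below the point
    return float(max(upper.max(), lower.max()))

rng = np.random.default_rng(0)
d_normal = kolmogorov_distance_to_std_normal(rng.normal(size=10000))        # ~ O(1/sqrt(n))
d_shifted = kolmogorov_distance_to_std_normal(rng.normal(size=10000) + 1.0)  # ~ 0.38
```

Applied to exact $N(0,1)$ draws this returns a value of order $1/\sqrt{n}$; applied to draws of $\sqrt{T/(2\theta)}(\hat\theta_T-\theta)$ it estimates the distance studied in this paper.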

The rest of the paper is organized as follows. Section 2 reviews some basic notation and results of Gaussian analysis and Malliavin calculus. Section 3 presents the one-term Edgeworth expansion and auxiliary lemmas, and Section 4 proves that the rate $T^{-1/2}$ is an optimal rate of the CLT for the MLE $\hat\theta_T$.

2. Preliminaries

In this section, we recall some basic facts about Malliavin calculus for Gaussian processes. The reader is referred to Nourdin and Peccati (2012) and Nualart (2006) for a more detailed explanation. Suppose that $H$ is a real separable Hilbert space with scalar product denoted by $\langle\cdot,\cdot\rangle_H$. Let $B = \{B(h), h \in H\}$ be an isonormal Gaussian process, that is, a centered Gaussian family of random variables such that $E[B(h)B(g)] = \langle h, g\rangle_H$. For every $n \ge 1$, let $\mathcal{H}_n$ be the $n$th Wiener chaos of $B$, that is, the closed linear subspace of $L^2(\Omega)$ generated by $\{H_n(B(h)) : h \in H, \|h\|_H = 1\}$, where $H_n$ is the $n$th Hermite polynomial. We define a linear isometric mapping $I_n : H^{\odot n} \to \mathcal{H}_n$ by $I_n(h^{\otimes n}) = n!\,H_n(B(h))$, where $H^{\odot n}$ is the symmetric tensor product. It is well known that any square integrable random variable $F \in L^2(\Omega, \mathcal{G}, P)$ ($\mathcal{G}$ denotes the $\sigma$-field generated by $B$) can be expanded into a series of multiple stochastic integrals:
$$F = \sum_{k=0}^{\infty} I_k(f_k),$$
where $f_0 = E[F]$, the series converges in $L^2$, and the functions $f_k \in H^{\odot k}$ are uniquely determined by $F$.


Let $\{e_l, l \ge 1\}$ be a complete orthonormal system in $H$. If $f \in H^{\odot p}$ and $g \in H^{\odot q}$, the contraction $f \otimes_r g$, $1 \le r \le p \wedge q$, is the element of $H^{\otimes(p+q-2r)}$ defined by
$$f \otimes_r g = \sum_{l_1,\dots,l_r=1}^{\infty}\langle f, e_{l_1}\otimes\cdots\otimes e_{l_r}\rangle_{H^{\otimes r}} \otimes \langle g, e_{l_1}\otimes\cdots\otimes e_{l_r}\rangle_{H^{\otimes r}}. \tag{5}$$

The following formula for the product of multiple stochastic integrals will be used frequently to prove the main result of this paper.

Proposition 1. Let $f \in H^{\odot p}$ and $g \in H^{\odot q}$ be two symmetric functions. Then
$$I_p(f)\,I_q(g) = \sum_{r=0}^{p\wedge q} r!\binom{p}{r}\binom{q}{r}\,I_{p+q-2r}(f\otimes_r g). \tag{6}$$

Let $\mathcal{S}$ be the class of smooth and cylindrical random variables $F$ of the form
$$F = f(B(\varphi_1),\dots,B(\varphi_n)), \tag{7}$$
where $n \ge 1$, $f \in C_b^\infty(\mathbb{R}^n)$ and $\varphi_i \in H$, $i = 1,\dots,n$. The Malliavin derivative of $F$ with respect to $B$ is the element of $L^2(\Omega, H)$ defined by
$$DF = \sum_{i=1}^{n}\frac{\partial f}{\partial x_i}(B(\varphi_1),\dots,B(\varphi_n))\,\varphi_i. \tag{8}$$

We can define the iteration of the operator $D$ in such a way that, for a random variable $F \in \mathcal{S}$, the iterated derivative $D^kF$ is a random variable with values in $H^{\otimes k}$. We denote by $\mathbb{D}^{l,p}$ the closure of the class of smooth random variables with respect to the norm
$$\|F\|_{l,p}^p = E(|F|^p) + \sum_{k=1}^{l}E\big(\|D^kF\|^p_{H^{\otimes k}}\big).$$

We denote by $\delta$ the adjoint of the operator $D$, also called the divergence operator. The domain of $\delta$, denoted by $\mathrm{Dom}(\delta)$, consists of those $u \in L^2(\Omega; H)$ such that
$$|E(\langle DF, u\rangle_H)| \le C\,(E|F|^2)^{1/2} \quad\text{for all } F \in \mathbb{D}^{1,2}.$$
If $u \in \mathrm{Dom}(\delta)$, then $\delta(u)$ is the element of $L^2(\Omega)$ defined by the duality relationship (the "integration by parts formula")
$$E[F\delta(u)] = E[\langle DF, u\rangle_H] \quad\text{for every } F \in \mathbb{D}^{1,2}. \tag{9}$$

Let $F \in L^2(\Omega)$ be a square integrable random variable. For each $n \ge 1$, we denote by $J_n : L^2(\Omega) \to \mathcal{H}_n$ the orthogonal projection on the $n$th Wiener chaos $\mathcal{H}_n$. The operator $L$, defined through the projection operators as $L = \sum_{n=0}^{\infty}-nJ_n$, is called the infinitesimal generator of the Ornstein–Uhlenbeck semigroup. The relationship between the operators $D$, $\delta$ and $L$ is given as follows: $\delta DF = -LF$; that is, for $F \in L^2(\Omega)$ the statement $F \in \mathrm{Dom}(L)$ is equivalent to $F \in \mathrm{Dom}(\delta D)$ (i.e. $F \in \mathbb{D}^{1,2}$ and $DF \in \mathrm{Dom}(\delta)$), and in this case $\delta DF = -LF$. We also define the operator $L^{-1}$, the pseudo-inverse of $L$, as $L^{-1}F = -\sum_{n=1}^{\infty}\frac{1}{n}J_n(F)$. Note that $L^{-1}$ is an operator with values in $\mathbb{D}^{2,2}$ and $LL^{-1}F = F - E[F]$ for all $F \in L^2(\Omega)$.

3. Edgeworth expansion and Lemmas

3.1. Edgeworth expansion

For fixed $z \in \mathbb{R}$, we consider the Stein equation
$$f'(x) - xf(x) = 1_{(-\infty,z]}(x) - \Phi(z), \tag{10}$$
where $\Phi(z) = P(Z \le z)$. It is well known (see e.g. Chen, Goldstein, &amp; Shao, 2011) that for every fixed $z \in \mathbb{R}$, the function
$$f_z(x) = e^{x^2/2}\int_{-\infty}^{x}\big[1_{(-\infty,z]}(u)-\Phi(z)\big]e^{-u^2/2}\,du = \begin{cases}\sqrt{2\pi}\,e^{x^2/2}\,\Phi(x)(1-\Phi(z)) &amp; \text{if } x \le z,\\ \sqrt{2\pi}\,e^{x^2/2}\,\Phi(z)(1-\Phi(x)) &amp; \text{if } x > z,\end{cases} \tag{11}$$


is a solution to the Stein equation (10) such that $\|f_z\|_\infty \le \sqrt{2\pi}/4$ and $\|f_z'\|_\infty \le 1$. For a given function $h:\mathbb{R}\to\mathbb{R}$ such that $E[|h(Z)|]<\infty$, we define
$$(U_Z h)(w) = e^{\frac{1}{2}w^2}\int_{-\infty}^{w}\big(h(x)-E[h(Z)]\big)\,e^{-\frac{1}{2}x^2}\,dx, \tag{12}$$
and write $(U_Z 1_{(-\infty,z]})(w) = f_z(w)$. The next lemma gives some properties of the function $U_Z h$ (see e.g. Nourdin &amp; Peccati, 2012).

Lemma 1 (Stein's Equation). Given a bounded function $h:\mathbb{R}\to\mathbb{R}$, there exists an absolutely continuous function $U_Z h$ solving the Stein equation
$$f'(x) - xf(x) = h(x) - E[h(Z)] \quad\text{for all } x\in\mathbb{R}, \tag{13}$$
and satisfying
$$\|U_Z h\|_\infty \le \sqrt{\frac{\pi}{2}}\,\|h-E[h(Z)]\|_\infty \quad\text{and}\quad \|(U_Z h)'\|_\infty \le 2\,\|h-E[h(Z)]\|_\infty. \tag{14}$$

Before stating the Edgeworth expansion obtained by Kim and Park (submitted for publication), we introduce some notation, including the Gamma operators $\Gamma_j$ (see Nourdin &amp; Peccati, 2010). We denote by $H_q$, $q \ge 0$, the Hermite polynomials, defined by $H_0 = 1$ and, for $q \ge 1$,
$$H_q(x) = (-1)^q\,e^{\frac{1}{2}x^2}\,\frac{d^q}{dx^q}e^{-\frac{1}{2}x^2}, \quad x \in \mathbb{R}.$$
Let $F$ be a real-valued random variable such that $E[|F|^m] < \infty$ for some integer $m \ge 1$. The $j$th cumulant of $F$, for $j = 1,2,\dots,m$, is
$$\kappa_j(F) = (-i)^j\,\frac{d^j}{dt^j}\log E[e^{itF}]\Big|_{t=0}.$$
Let $F \in \mathbb{D}^{1,2}$. Define $\Gamma_0(F) = F$ and $\Gamma_1(F) = \langle DF, -DL^{-1}F\rangle_H$. If $\Gamma_{j-1}(F)$, $j \ge 1$, is a well-defined element of $L^2(\Omega)$, we write
$$\Gamma_j(F) = \langle DF, -DL^{-1}\Gamma_{j-1}(F)\rangle_H, \quad j \ge 1.$$
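As a quick symbolic check (ours, not part of the paper), the Rodrigues-type formula above can be evaluated directly; it reproduces, for example, $H_2(x) = x^2 - 1$, the polynomial appearing in the one-term Edgeworth expansion (3):

```python
import sympy as sp

x = sp.symbols('x')

def hermite(q):
    """Probabilists' Hermite polynomial H_q(x) = (-1)^q e^{x^2/2} (d/dx)^q e^{-x^2/2}."""
    return sp.expand((-1) ** q * sp.exp(x ** 2 / 2) * sp.diff(sp.exp(-x ** 2 / 2), x, q))
```

With this definition, `hermite(2)` gives the polynomial $x^2 - 1$ and `hermite(3)` gives $x^3 - 3x$, matching the usual probabilists' family.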

Similarly, for $F \in \mathbb{D}^{1,2}$ we define $\tilde\Gamma_0(F) = F$ and $\tilde\Gamma_1(F) = \Gamma_1(F)$. If $\tilde\Gamma_j(F) \in \mathbb{D}^{1,2}$ for fixed $j \ge 1$, we write
$$\tilde\Gamma_{j+1}(F) = \langle -DL^{-1}F,\, D\tilde\Gamma_j(F)\rangle_H.$$
Let $F \in \mathbb{D}^{j,2}$ for a fixed integer $j \ge 1$. Then, for all $k = 1,\dots,j$, it is easy to see from Lemma 4.2 in Nourdin and Peccati (2010) that $\tilde\Gamma_k(F)$ is a well-defined element of $\mathbb{D}^{j-k,2}$. Let us set
$$(\Psi_{i_1,\dots,i_k}g)(x) = \Big(U_Z\big(\cdots\big(U_Z(U_Z g)^{(i_1)}\big)^{(i_2)}\cdots\big)^{(i_k)}\Big)(x) \quad\text{for } i_1,\dots,i_k \in \{0,1\},$$
where
$$(U_Z g)^{(i)}(x) = \begin{cases}(U_Z g)(x) &amp; \text{for } i = 0,\\ (U_Z g)'(x) &amp; \text{for } i = 1.\end{cases}$$

In Kim and Park (submitted for publication), the authors prove the following relation by repeated applications of the duality relationship (9) and the Stein equation (10).

Lemma 2 (One-Term Edgeworth Expansion). Let $F \in \mathbb{D}^{3,2}$ with zero mean and $E[F^2] = 1$. Assume, moreover, that $F$ has an absolutely continuous law with respect to Lebesgue measure. Then, for every $z \in \mathbb{R}$, we have
$$P(F \le z) - P(Z \le z) = -\frac{1}{3!}H_2(z)\phi(z)\kappa_3(F) + R_z^{(3)}(F). \tag{15}$$
Here $H_2(z)$, $\phi(z)$ and $\kappa_3(F)$ are as in (3), and the remainder term $R_z^{(3)}(F)$ has the expression
$$R_z^{(3)}(F) = E\big[\Psi_1 f_z'(F)\,(1-\Gamma_1(F))^2\big] + E\big[\Psi_{0,1}f_z'(F)\,(1-\Gamma_1(F))\tilde\Gamma_2(F)\big] - E\big[\Psi_{0,0}f_z'(F)\,\tilde\Gamma_3(F)\big]. \tag{16}$$

Remark 1. The above lemma need not specify in which sense this expansion approximates $P(F \le z)$, since Lemma 2 simply states an identity with an explicit remainder. Usually Edgeworth expansions of $P(F \le z)$ are available only as asymptotic series, even when the random variable $F$ has the specific form of partial sums. For our work, even in the case of a non-decomposable $F$, an explicit expression of the remainder, as given by $R_z^{(3)}(F)$ in Lemma 2, is needed to control the remainder after the required terms in order to obtain an upper (or lower) bound in the Kolmogorov distance.


3.2. Lemmas

Using the relationship between the multiple stochastic integral and the iterated Itô integral, we write $S_T = \frac{1}{2}I_2(f_2)$, where $f_2(s,t) = e^{-\theta|t-s|}$ for $s,t \in [0,T]$. Next, the quantity $\langle S\rangle_T$ in (2) will be computed explicitly. The product formula (6) for multiple stochastic integrals yields
$$I_1\big(e^{-\theta t}1_{[0,t]}e^{\theta\cdot}\big)^2 = I_2\big((e^{-\theta t}1_{[0,t]}e^{\theta\cdot})\otimes(e^{-\theta t}1_{[0,t]}e^{\theta\cdot})\big) + (e^{-\theta t}1_{[0,t]}e^{\theta\cdot})\otimes_1(e^{-\theta t}1_{[0,t]}e^{\theta\cdot}). \tag{17}$$

By the Fubini theorem and (17), we have
$$\langle S\rangle_T = I_2\Big(\int_0^T e^{-2\theta t}\,\big(1_{[0,t]}(u)e^{\theta u}\big)\otimes\big(1_{[0,t]}(v)e^{\theta v}\big)\,dt\Big) + \int_0^T\!\!\int_0^t e^{-2\theta(t-s)}\,ds\,dt = I_2(g_2) + \frac{T}{2\theta} - \frac{1}{4\theta^2}\big(1-e^{-2\theta T}\big), \tag{18}$$
where
$$g_2(u,v) = \frac{e^{\theta(u+v)}}{2\theta}\big(e^{-2\theta(u\vee v)} - e^{-2\theta T}\big).$$

Eq. (2) can be written as
$$\sqrt{\frac{T}{2\theta}}\,(\hat\theta_T-\theta) = \frac{-F_T}{2(G_T+Q_T)}, \tag{19}$$
where $F_T$ and $G_T$ are the double stochastic integrals
$$F_T = \sqrt{\frac{2\theta}{T}}\,I_2(f_2), \qquad G_T = \frac{2\theta}{T}\,I_2(g_2), \tag{20}$$
and
$$Q_T = 1 - \frac{1}{2\theta T}\big(1-e^{-2\theta T}\big).$$
Let us set
$$\Psi_T(z) = -F_T - 2zG_T = I_2\Big(-\sqrt{\frac{2\theta}{T}}\,f_2 - \frac{4\theta z}{T}\,g_2\Big) \quad\text{for } z \in \mathbb{R}.$$

Lemma 3. For fixed $z \in \mathbb{R}$, we have
$$\kappa_3(\Psi_T(z)) = \frac{48}{\sqrt{2\theta T}} + o\Big(\frac{1}{\sqrt T}\Big) + z\,O\Big(\frac{1}{T}\Big) + z^2\,O\Big(\frac{1}{T\sqrt T}\Big) + z^3\,O\Big(\frac{1}{T^2}\Big), \tag{21}$$
$$\kappa_4(\Psi_T(z)) = \frac{480}{\theta T} + o\Big(\frac{1}{T}\Big) + z\,O\Big(\frac{1}{T\sqrt T}\Big) + z^2\,O\Big(\frac{1}{T^2}\Big) + z^3\,O\Big(\frac{1}{T^2\sqrt T}\Big) + z^4\,O\Big(\frac{1}{T^3}\Big), \tag{22}$$
where the notation $o(1/T^\alpha)$ means a quantity such that $T^\alpha\,o(1/T^\alpha) \to 0$ as $T \to \infty$.

Proof. Put $h_{2,z} = \sqrt{2\theta/T}\,f_2 + \frac{4\theta z}{T}\,g_2$. For any fixed $z \in \mathbb{R}$, denote by $\kappa_p(\Psi_T(z))$, $p \ge 1$, the $p$th cumulant of $\Psi_T(z)$. The following relation, giving an explicit expression for the cumulants of $\Psi_T(z)$, is well known (see, e.g., Fox &amp; Taqqu, 1987 and Nualart &amp; Peccati, 2009):
$$\kappa_p(\Psi_T(z)) = 2^{p-1}(p-1)!\,\big\langle h_{2,z}\otimes_1^{(p-1)}h_{2,z},\,h_{2,z}\big\rangle_{H^{\otimes 2}}, \tag{23}$$
where $h_{2,z}\otimes_1^{(p)}h_{2,z}$, $p \ge 1$, is defined by $h_{2,z}\otimes_1^{(1)}h_{2,z} = h_{2,z}$ and, for $p \ge 2$,
$$h_{2,z}\otimes_1^{(p)}h_{2,z} = \big(h_{2,z}\otimes_1^{(p-1)}h_{2,z}\big)\otimes_1 h_{2,z}.$$
Obviously, $h_{2,z}\otimes_1 h_{2,z} = h_{2,z}\,\tilde\otimes_1\,h_{2,z}$; indeed,
$$h_{2,z}\otimes_1 h_{2,z}(s,t) = h_{2,z}\otimes_1 h_{2,z}(t,s) = \int_0^T h_{2,z}(s,r)\,h_{2,z}(t,r)\,dr.$$
From the case $p = 3$ in formula (23), the third cumulant of $\Psi_T(z)$ is given by
$$\kappa_3(\Psi_T(z)) = 8\,\langle h_{2,z}\otimes_1 h_{2,z},\,h_{2,z}\rangle_{H^{\otimes 2}}. \tag{24}$$
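Formula (23) lends itself to a numerical sanity check (our own sketch, not part of the paper): for a second-chaos variable $I_2(h)$ the cumulants equal $\kappa_p = 2^{p-1}(p-1)!\,\mathrm{tr}(A_h^p)$, where $A_h$ is the integral operator with kernel $h$. Discretizing the kernel $\sqrt{2\theta/T}\,f_2$ on a grid (grid size and the values $\theta = 1$, $T = 20$ are arbitrary choices) reproduces the leading term $48/\sqrt{2\theta T}$ of Lemma 3 to within a few percent:

```python
import numpy as np

def third_cumulant_I2(kernel, T, n=1000):
    """Approximate kappa_3(I_2(h)) = 8 * tr(A_h^3) for a symmetric kernel h
    on [0,T]^2, using a midpoint discretization of the integral operator."""
    dt = T / n
    t = (np.arange(n) + 0.5) * dt
    M = kernel(t[:, None], t[None, :]) * dt       # discretized operator A_h
    return 8.0 * float(np.sum((M @ M) * M))       # tr(M^3), since M is symmetric

theta, T = 1.0, 20.0
h = lambda s, t: np.sqrt(2 * theta / T) * np.exp(-theta * np.abs(t - s))
kappa3 = third_cumulant_I2(h, T)
leading = 48.0 / np.sqrt(2.0 * theta * T)         # leading term in (21)
```

The discrepancy between `kappa3` and `leading` is of relative order $1/(\theta T)$, in line with the boundary terms discarded in the proof below.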


First we write
$$(h_{2,z}\otimes_1 h_{2,z})(s,t) = \sum_{i=1}^{4}\tilde\ell_{i,T}(s,t), \tag{25}$$
where
$$\tilde\ell_{1,T}(s,t) = \frac{2\theta}{T}\int_0^T f_2(s,r)f_2(t,r)\,dr,$$
$$\tilde\ell_{2,T}(s,t) = \frac{4\theta z}{T}\sqrt{\frac{2\theta}{T}}\int_0^T f_2(s,r)g_2(t,r)\,dr,$$
$$\tilde\ell_{3,T}(s,t) = \frac{4\theta z}{T}\sqrt{\frac{2\theta}{T}}\int_0^T f_2(t,r)g_2(s,r)\,dr,$$
$$\tilde\ell_{4,T}(s,t) = \frac{16\theta^2 z^2}{T^2}\int_0^T g_2(s,r)g_2(t,r)\,dr.$$

Direct computations give
$$\tilde\ell_{1,T}(s,t) = \ell_{1,T}(s,t) + \ell_{1,T}(t,s), \qquad \tilde\ell_{2,T}(s,t) = \tilde\ell_{3,T}(t,s), \qquad \tilde\ell_{4,T}(s,t) = \ell_{4,T}(s,t) + \ell_{4,T}(t,s), \tag{26}$$
where
$$\ell_{1,T}(s,t) = \frac{2\theta}{T}\Big[\frac{1}{2\theta}e^{-\theta(t+s)}\big(e^{2\theta s}-1\big) + e^{-\theta(t-s)}(t-s) + \frac{1}{2\theta}e^{\theta(t+s)}\big(e^{-2\theta t}-e^{-2\theta T}\big)\Big]1_{[s\le t]},$$
$$\tilde\ell_{2,T}(s,t) = \frac{2\sqrt{2\theta}\,z}{T\sqrt T}\Big[\frac{1}{2\theta}e^{\theta(t-s)}\big(e^{-2\theta t}-e^{-2\theta T}\big)\big(e^{2\theta s}-1\big) + e^{\theta(t+s)}\big(e^{-2\theta t}-e^{-2\theta T}\big)(t-s) + e^{\theta(t+s)}\Big(\frac{1}{2\theta}\big(e^{-2\theta t}-e^{-2\theta T}\big) - e^{-2\theta T}(T-t)\Big)\Big]1_{[s\le t]}$$
$$\qquad + \frac{2\sqrt{2\theta}\,z}{T\sqrt T}\Big[\frac{1}{2\theta}e^{\theta(t-s)}\big(e^{-2\theta t}-e^{-2\theta T}\big)\big(e^{2\theta t}-1\big) + e^{\theta(t-s)}\Big((s-t) - \frac{1}{2\theta}e^{-2\theta T}\big(e^{2\theta s}-e^{2\theta t}\big)\Big) + e^{\theta(t+s)}\Big(\frac{1}{2\theta}\big(e^{-2\theta s}-e^{-2\theta T}\big) - e^{-2\theta T}(T-s)\Big)\Big]1_{[t\le s]},$$
$$\ell_{4,T}(s,t) = \frac{4z^2}{T^2}\,e^{\theta(t+s)}\Big[\frac{1}{2\theta}\big(e^{-2\theta t}-e^{-2\theta T}\big)\big(e^{-2\theta s}-e^{-2\theta T}\big)\big(e^{2\theta s}-1\big) + \big(e^{-2\theta t}-e^{-2\theta T}\big)\Big((t-s) - \frac{1}{2\theta}e^{-2\theta T}\big(e^{2\theta t}-e^{2\theta s}\big)\Big) + \frac{1}{2\theta}\big(e^{-2\theta t}-e^{-2\theta T}\big) - 2e^{-2\theta T}(T-t) + \frac{1}{2\theta}e^{-4\theta T}\big(e^{2\theta T}-e^{2\theta t}\big)\Big]1_{[s\le t]}.$$

From (25), we decompose the inner product $\langle h_{2,z}\otimes_1 h_{2,z},\,h_{2,z}\rangle_{H^{\otimes 2}}$ into the following eight integrals:
$$\langle h_{2,z}\otimes_1 h_{2,z},\,h_{2,z}\rangle_{H^{\otimes 2}} = \sum_{i=1}^{8}A_{i,T},$$
where
$$A_{1,T} = \frac{2\theta}{T}\sqrt{\frac{2\theta}{T}}\int_0^T\!\!\int_0^T\!\!\int_0^T f_2(s,t)f_2(s,r)f_2(t,r)\,dr\,ds\,dt,$$
$$A_{2,T} = \frac{8\theta^2 z}{T^2}\int_0^T\!\!\int_0^T\!\!\int_0^T f_2(s,t)f_2(s,r)g_2(t,r)\,dr\,ds\,dt,$$
$$A_{3,T} = \frac{8\theta^2 z}{T^2}\int_0^T\!\!\int_0^T\!\!\int_0^T f_2(s,t)f_2(t,r)g_2(s,r)\,dr\,ds\,dt,$$
$$A_{4,T} = \frac{16\theta^2\sqrt{2\theta}\,z^2}{T^2\sqrt T}\int_0^T\!\!\int_0^T\!\!\int_0^T f_2(s,t)g_2(s,r)g_2(t,r)\,dr\,ds\,dt,$$
$$A_{5,T} = \frac{8\theta^2 z}{T^2}\int_0^T\!\!\int_0^T\!\!\int_0^T g_2(s,t)f_2(s,r)f_2(t,r)\,dr\,ds\,dt,$$
$$A_{6,T} = \frac{16\theta^2\sqrt{2\theta}\,z^2}{T^2\sqrt T}\int_0^T\!\!\int_0^T\!\!\int_0^T g_2(s,t)f_2(s,r)g_2(t,r)\,dr\,ds\,dt,$$
$$A_{7,T} = \frac{16\theta^2\sqrt{2\theta}\,z^2}{T^2\sqrt T}\int_0^T\!\!\int_0^T\!\!\int_0^T g_2(s,t)f_2(t,r)g_2(s,r)\,dr\,ds\,dt,$$
$$A_{8,T} = \frac{64\theta^3 z^3}{T^3}\int_0^T\!\!\int_0^T\!\!\int_0^T g_2(s,t)g_2(s,r)g_2(t,r)\,dr\,ds\,dt.$$

Obviously, since $f_2$ and $g_2$ are symmetric functions, we have
$$A_{2,T} = A_{3,T} = A_{5,T} \quad\text{and}\quad A_{4,T} = A_{6,T} = A_{7,T}.$$
Since the integrand in $A_{1,T}$ is symmetric, we have
$$A_{1,T} = 3!\,\frac{2\theta}{T}\sqrt{\frac{2\theta}{T}}\int_0^T\!\!\int_0^t\!\!\int_0^s f_2(s,t)f_2(s,r)f_2(t,r)\,dr\,ds\,dt = 3!\,\frac{2\theta}{T}\sqrt{\frac{2\theta}{T}}\int_0^T\!\!\int_0^t\!\!\int_0^s e^{-2\theta t}e^{2\theta r}\,dr\,ds\,dt$$
$$= 3!\,\frac{2\theta}{T}\sqrt{\frac{2\theta}{T}}\Big[\frac{1}{4\theta^2}\int_0^T\big(1-e^{-2\theta t}\big)\,dt - \frac{1}{2\theta}\int_0^T t\,e^{-2\theta t}\,dt\Big] = \frac{3!}{\sqrt{2\theta T}} + o\Big(\frac{1}{\sqrt T}\Big). \tag{27}$$

For $A_{2,T}$, we compute separately the two cases $0 \le s \le t \le T$ and $0 \le t \le s \le T$. For $t \ge s$,
$$\int_0^T f_2(s,t)f_2(s,r)g_2(t,r)\,dr = \Big(\int_0^s + \int_s^t + \int_t^T\Big)f_2(s,t)f_2(s,r)g_2(t,r)\,dr$$
$$= \frac{1}{4\theta^2}\big(e^{-2\theta t}-e^{-2\theta T}\big)\big(e^{2\theta s}-1\big) + \frac{1}{2\theta}\big(e^{-2\theta t}-e^{-2\theta T}\big)e^{2\theta s}(t-s) + \frac{1}{2\theta}e^{2\theta s}\Big(\frac{1}{2\theta}\big(e^{-2\theta t}-e^{-2\theta T}\big) - e^{-2\theta T}(T-t)\Big). \tag{28}$$

Direct computations, together with (28), yield
$$\int_0^T\!\!\int_0^T\!\!\int_0^T f_2(s,t)f_2(s,r)g_2(t,r)1_{[s\le t]}\,dr\,ds\,dt = \frac{1}{4\theta^2}\int_0^T\big(e^{-2\theta t}-e^{-2\theta T}\big)\Big[\frac{1}{2\theta}\big(e^{2\theta t}-1\big)-t\Big]dt$$
$$+ \frac{1}{2\theta}\int_0^T\big(e^{-2\theta t}-e^{-2\theta T}\big)\,\frac{e^{2\theta t}-1-2\theta t}{4\theta^2}\,dt + \frac{1}{2\theta}\int_0^T\Big[\frac{1}{2\theta}\big(e^{-2\theta t}-e^{-2\theta T}\big) - e^{-2\theta T}(T-t)\Big]\frac{e^{2\theta t}-1}{2\theta}\,dt = \frac{3T}{8\theta^3} + o(T). \tag{29}$$

By a similar computation as for the case $t \ge s$, we have, for $s \ge t$,
$$\int_0^T\!\!\int_0^T\!\!\int_0^T f_2(s,t)f_2(s,r)g_2(t,r)1_{[t\le s]}\,dr\,ds\,dt = \frac{3T}{8\theta^3} + o(T). \tag{30}$$
From (29) and (30), it follows that, for fixed $z \in \mathbb{R}$,
$$A_{2,T} = \frac{6z}{\theta T} + z\,o\Big(\frac{1}{T}\Big). \tag{31}$$

As for the integral $A_{4,T}$, we have
$$A_{4,T} = \frac{32\theta^2\sqrt{2\theta}\,z^2}{T^2\sqrt T}\cdot\frac{1}{4\theta^2}\Big[\frac{1}{\theta}\int_0^T\!\!\int_0^t e^{-2\theta(t-s)}\,ds\,dt + \int_0^T\!\!\int_0^t e^{-2\theta(t-s)}(t-s)\,ds\,dt + o(T)\Big] = \frac{6\sqrt{2\theta}\,z^2}{\theta^2\,T\sqrt T} + z^2\,o\Big(\frac{1}{T\sqrt T}\Big). \tag{32}$$

Obviously, the integral $A_{8,T}$ can be computed as
$$A_{8,T} = 2\times\frac{64\theta^3 z^3}{T^3}\Big[\frac{1}{8\theta^4}\int_0^T\!\!\int_0^t e^{-2\theta(t-s)}\,ds\,dt + \frac{1}{8\theta^3}\int_0^T\!\!\int_0^t e^{-2\theta(t-s)}(t-s)\,ds\,dt + o(T)\Big] = \frac{12z^3}{\theta^2 T^2} + z^3\,o\Big(\frac{1}{T^2}\Big). \tag{33}$$

The above results on $A_{i,T}$, $i = 1,\dots,8$, and (24) prove that, for fixed $z \in \mathbb{R}$,
$$\kappa_3(\Psi_T(z)) = \frac{48}{\sqrt{2\theta T}} + o\Big(\frac{1}{\sqrt T}\Big) + z\,O\Big(\frac{1}{T}\Big) + z^2\,O\Big(\frac{1}{T\sqrt T}\Big) + z^3\,O\Big(\frac{1}{T^2}\Big). \tag{34}$$

By formula (23), the fourth cumulant of $\Psi_T(z)$ is given by
$$\kappa_4(\Psi_T(z)) = 48\,\|h_{2,z}\otimes_1 h_{2,z}\|^2_{H^{\otimes 2}}. \tag{35}$$
Next we compute the right-hand side of (35). First we note that $\int_0^T\int_0^T\tilde\ell_{1,T}(s,t)^2\,ds\,dt$ is the main term in (35), in the sense that it determines the asymptotics of $\kappa_4(\Psi_T(z))$ for fixed $z \in \mathbb{R}$ as $T \to \infty$. Also, from (26), the main terms in $\tilde\ell_{1,T}(s,t)$ are
$$\frac{2}{T}\,e^{-\theta(t-s)}1_{[s\le t]} \quad\text{and}\quad \frac{2\theta}{T}\,e^{-\theta(t-s)}(t-s)1_{[s\le t]}.$$
Hence direct computations and (25) yield
$$\kappa_4(\Psi_T(z)) = 48\int_0^T\!\!\int_0^T\Big(\sum_{i=1}^{4}\tilde\ell_{i,T}(s,t)\Big)^2 ds\,dt = (48\times 2)\int_0^T\!\!\int_0^t\Big[\frac{2}{T}e^{-\theta(t-s)} + \frac{2\theta}{T}e^{-\theta(t-s)}(t-s)\Big]^2 ds\,dt$$
$$\qquad + o\Big(\frac{1}{T}\Big) + z\,O\Big(\frac{1}{T\sqrt T}\Big) + z^2\,O\Big(\frac{1}{T^2}\Big) + z^3\,O\Big(\frac{1}{T^2\sqrt T}\Big) + z^4\,O\Big(\frac{1}{T^3}\Big)$$
$$= \frac{480}{\theta T} + o\Big(\frac{1}{T}\Big) + z\,O\Big(\frac{1}{T\sqrt T}\Big) + z^2\,O\Big(\frac{1}{T^2}\Big) + z^3\,O\Big(\frac{1}{T^2\sqrt T}\Big) + z^4\,O\Big(\frac{1}{T^3}\Big). \tag{36}$$
Therefore the proof of the lemma is complete. □

As mentioned in Remark 1, formula (15) is quite different from the classical version of the Edgeworth expansion. Hence the following lemma plays an important role in estimating the remainder in the one-term Edgeworth expansion.


Lemma 4. Let $F = I_q(f)$, where $f \in H^{\odot q}$, with $E[F^2] = 1$. Then we have

(a) $E[|1-\Gamma_1(F)|^2] \le C_q\,\kappa_4(F)$,
(b) $E[|(1-\Gamma_1(F))\tilde\Gamma_2(F)|] \le C_q\,\sqrt{\kappa_4(F)}\,\big[|\kappa_3(F)| + \kappa_4(F)^{3/4}\big]$,
(c) $E[|\tilde\Gamma_3(F)|] \le C_q\,\kappa_4(F)$.

Proof. By following the proof of Theorem 5.1 in Nourdin and Peccati (2010) for $\Gamma_s(F)$, an explicit expression for $\tilde\Gamma_s(F)$, $s \ge 1$, is given by
$$\tilde\Gamma_s(F) = \sum_{r_1=1}^{q}\cdots\sum_{r_s=1}^{[sq-2r_1-\cdots-2r_{s-1}]\wedge q}\alpha_q(r_1,\dots,r_{s-1})\,(sq-2r_1-\cdots-2r_{s-1})\,(r_s-1)!\binom{sq-2r_1-\cdots-2r_{s-1}-1}{r_s-1}\binom{q-1}{r_s-1}$$
$$\qquad\times 1_{\{r_1+\cdots+r_{s-2}<\frac{(s-1)q}{2}\}}\,1_{\{r_1+\cdots+r_{s-1}<\frac{sq}{2}\}}\,I_{(s+1)q-2r_1-\cdots-2r_s}\big((\cdots(f\,\tilde\otimes_{r_1}f)\,\tilde\otimes_{r_2}f\cdots)\,\tilde\otimes_{r_s}f\big), \tag{37}$$
where the combinatorial constants $\alpha_q(r_1,\dots,r_a)$ are recursively defined by
$$\alpha_q(r) = q\,(r-1)!\binom{q-1}{r-1}^2,$$
and, for any $a \ge 2$,
$$\alpha_q(r_1,\dots,r_a) = \alpha_q(r_1,\dots,r_{a-1})\,(aq-2r_1-\cdots-2r_{a-1})\,(r_a-1)!\binom{aq-2r_1-\cdots-2r_{a-1}-1}{r_a-1}\binom{q-1}{r_a-1}.$$
For $s = 1$, formula (37) gives
$$\Gamma_1(F) = \sum_{r=1}^{q}\alpha_q(r)\,I_{2q-2r}(f\,\tilde\otimes_r f). \tag{38}$$
By the Cauchy–Schwarz inequality and Lemma 4.2 in Bishwal and Bose (1995), we estimate
$$E[|1-\Gamma_1(F)|^2] \le \sum_{r=1}^{q-1}|\alpha_q(r)|^2\,\sum_{r=1}^{q-1}E\big[|I_{2q-2r}(f\,\tilde\otimes_r f)|^2\big] \le C_q\max_{1\le r\le q-1}\|f\otimes_r f\|^2_{H^{\otimes(2q-2r)}} \le C_q\,\kappa_4(F). \tag{39}$$
For $s = 2$, formula (37) can be expressed as
$$\tilde\Gamma_2(F) = \sum_{r_1=1}^{q}\sum_{r_2=1}^{(2q-2r_1)\wedge q}\alpha_q(r_1,r_2)\,1_{\{r_1<q\}}\,I_{3q-2r_1-2r_2}\big((f\,\tilde\otimes_{r_1}f)\,\tilde\otimes_{r_2}f\big). \tag{40}$$
By the Cauchy–Schwarz inequality and the estimate (4.10) in Bishwal and Bose (1995), we have
$$E\Big[\Big(\tilde\Gamma_2(F)-\frac{1}{2}\kappa_3(F)\Big)^2\Big] = E\big[\big(\tilde\Gamma_2(F)-E[\tilde\Gamma_2(F)]\big)^2\big]$$
$$\le \sum_{r_1=1}^{q}\sum_{r_2=1}^{(2q-2r_1)\wedge q}\alpha_q(r_1,r_2)^2 \times \sum_{\substack{r_1,r_2\\ r_1+r_2\ne 3q/2}}1_{\{r_1<q\}}\,(3q-2r_1-2r_2)!\,\big\|(f\,\tilde\otimes_{r_1}f)\,\tilde\otimes_{r_2}f\big\|^2_{H^{\otimes(3q-2r_1-2r_2)}} \le C_q\max_{1\le r\le q-1}\|f\otimes_r f\|^3_{H^{\otimes(2q-2r)}}. \tag{41}$$
Here, for the first equation in (41), we use the relation $E[\tilde\Gamma_2(F)] = E[\Gamma_2(F)] = \frac{1}{2}\kappa_3(F)$, and for the second inequality in (41), we use
$$E[\tilde\Gamma_2(F)] = \sum_{r_1=1}^{q}\sum_{r_2=1}^{(2q-2r_1)\wedge q}\alpha_q(r_1,r_2)\,1_{\{r_1<q\}}\,1_{\{r_1+r_2=\frac{3q}{2}\}}\,(f\,\tilde\otimes_{r_1}f)\,\tilde\otimes_{r_2}f.$$
The estimate (4.5) in Bishwal and Bose (1995) and (41) yield
$$E[\tilde\Gamma_2(F)^2] \le C_q\big[\kappa_3(F)^2 + \kappa_4(F)^{3/2}\big]. \tag{42}$$
By (39), (42) and the Cauchy–Schwarz inequality, we have
$$E\big[|(1-\Gamma_1(F))\tilde\Gamma_2(F)|\big] \le \sqrt{E[(1-\Gamma_1(F))^2]}\,\sqrt{E[\tilde\Gamma_2(F)^2]} \le C_q\,\sqrt{\kappa_4(F)}\,\big[|\kappa_3(F)| + \kappa_4(F)^{3/4}\big]. \tag{43}$$
Obviously, from (37), it follows that
$$\tilde\Gamma_3(F) = \sum_{r_1=1}^{q}\sum_{r_2=1}^{(2q-2r_1)\wedge q}\sum_{r_3=1}^{(3q-2r_1-2r_2)\wedge q}\alpha_q(r_1,r_2,r_3)\,1_{\{r_1<q\}}\,1_{\{r_1+r_2<\frac{3q}{2}\}}\,I_{4q-2r_1-2r_2-2r_3}\big(((f\,\tilde\otimes_{r_1}f)\,\tilde\otimes_{r_2}f)\,\tilde\otimes_{r_3}f\big). \tag{44}$$
Using the estimates (4.5) and (4.11) in Bishwal and Bose (1995) for Eq. (44) yields
$$E[\tilde\Gamma_3(F)^2] \le C_q\max_{1\le r\le q-1}\|f\otimes_r f\|^4_{H^{\otimes(2q-2r)}} \le C_q\,\kappa_4(F)^2. \tag{45}$$
The estimate (45) and the Cauchy–Schwarz inequality prove that (c) holds. □

4. Main result

In this section, we prove that the optimal rate in the Kolmogorov distance for the CLT of the MLE $\hat\theta_T$ is $1/\sqrt T$. First we compute an upper bound and then find a lower bound with the same speed.

4.1. Upper bound

In this section, we show that an upper bound in the Kolmogorov distance for the normal approximation of the MLE $\hat\theta_T$ is given by the rate $1/\sqrt T$.

Theorem 1 (Upper Bound). For sufficiently large $T > 0$, there exists a constant $C_\theta > 0$ such that
$$\sup_{z\in\mathbb{R}}\Big|P\Big(\sqrt{\tfrac{T}{2\theta}}\,(\hat\theta_T-\theta)\le z\Big) - P(Z\le z)\Big| \le \frac{C_\theta}{\sqrt T}. \tag{46}$$

Proof. Recall that $G_T$ and $Q_T$ are given in (20). Thus $\langle S\rangle_T = \int_0^T X_t^2\,dt = \frac{T}{2\theta}(G_T+Q_T) > 0$ a.s. for all $T > 0$. By (19), we obtain
$$P\Big(\sqrt{\tfrac{T}{2\theta}}\,(\hat\theta_T-\theta)\le z\Big) = P(-F_T - 2zG_T \le 2zQ_T). \tag{47}$$
Obviously, for every $z \in \mathbb{R}$, $\Psi_T(z) = -F_T - 2zG_T$ is an element of $\mathbb{D}^{3,2}$. Recall that a non-zero finite sum of multiple integrals has an absolutely continuous law with respect to the Lebesgue measure (see Shigekawa, 1978); hence $\Psi_T(z)$ has an absolutely continuous law. By the one-term Edgeworth expansion of Lemma 2, we have
$$P\Big(\sqrt{\tfrac{T}{2\theta}}\,(\hat\theta_T-\theta)\le z\Big) - P(Z\le z) = P(\Psi_T(z)\le 2zQ_T) - P(Z\le 2zQ_T) + \delta_T(z) = -\frac{1}{3!}H_2(2zQ_T)\,\phi(2zQ_T)\,\kappa_3(\Psi_T(z)) + R(\Psi_T(z)) + \delta_T(z), \tag{48}$$
where $\delta_T(z) = P(Z\le 2zQ_T) - P(Z\le z)$ and the remainder term $R$ is given by
$$R(\Psi_T(z)) = E\big[\Psi_1 f'_{2zQ_T}(\Psi_T(z))(1-\Gamma_1(\Psi_T(z)))^2\big] + E\big[\Psi_{0,1}f'_{2zQ_T}(\Psi_T(z))(1-\Gamma_1(\Psi_T(z)))\tilde\Gamma_2(\Psi_T(z))\big] - E\big[\Psi_{0,0}f'_{2zQ_T}(\Psi_T(z))\tilde\Gamma_3(\Psi_T(z))\big].$$
By using Lemmas 1 and 4, we estimate
$$|R(\Psi_T(z))| \le C\Big(\kappa_4(\Psi_T(z)) + \sqrt{\kappa_4(\Psi_T(z))}\,\big[|\kappa_3(\Psi_T(z))| + \kappa_4(\Psi_T(z))^{3/4}\big]\Big). \tag{49}$$
For the upper bounds of the terms in (48), it suffices to consider $z \ge 0$, because for $z < 0$ we may apply the result to $-\sqrt{T/(2\theta)}\,(\hat\theta_T-\theta)$. We prove the theorem in two cases. Consider a positive sequence $a_T$ such that, as $T \to \infty$,
$$a_T \to \infty \quad\text{and}\quad \frac{a_T}{T^\alpha} \to 0 \ \text{for every } \alpha > 0. \tag{50}$$
(i) Case $z \ge a_T$: Obviously,
$$\sup_{z\ge a_T}|H_2(z)|\,\phi(z) \le C. \tag{51}$$
From (48) and (51),
$$\Big|P\Big(\sqrt{\tfrac{T}{2\theta}}(\hat\theta_T-\theta)\le z\Big) - P(Z\le z)\Big| \le \Big|P\Big(\sqrt{\tfrac{T}{2\theta}}(\hat\theta_T-\theta)\le a_T\Big) - P(Z\le a_T)\Big| + 2P(Z\ge a_T)$$
$$\le C\,|\kappa_3(\Psi_T(a_T))|\Big(1 + \frac{|R(\Psi_T(a_T))|}{|\kappa_3(\Psi_T(a_T))|}\Big) + |\delta_T(a_T)| + 2P(Z\ge a_T). \tag{52}$$
From Lemma 3 and (50), it follows that
$$\kappa_3(\Psi_T(a_T)) = \frac{48}{\sqrt{2\theta T}} + o\Big(\frac{1}{\sqrt T}\Big) \quad\text{and}\quad \kappa_4(\Psi_T(a_T)) = \frac{480}{\theta T} + o\Big(\frac{1}{T}\Big). \tag{53}$$
The mean value theorem yields, for $z > 0$,
$$-\delta_T(z) = \frac{z}{2\theta T}\big(1-e^{-2\theta T}\big)\,\frac{1}{\sqrt{2\pi}}\,e^{-\lambda_T^2/2}, \tag{54}$$
where $\lambda_T$ satisfies
$$\Big(1 - \frac{1}{2\theta T}\big(1-e^{-2\theta T}\big)\Big)z < \lambda_T < z.$$
We take $T \ge T_0$ such that $\frac{1}{2\theta T}(1-e^{-2\theta T}) < 1/2$. From (54), we estimate
$$|\delta_T(z)| \le \frac{z}{2\theta T}\big(1-e^{-2\theta T}\big)\,\frac{1}{\sqrt{2\pi}}\,e^{-z^2/8} \le \frac{C}{\theta T}. \tag{55}$$
Also the estimate (49) of the remainder, together with (53), proves that, as $T \to \infty$,
$$\frac{|R(\Psi_T(a_T))|}{|\kappa_3(\Psi_T(a_T))|} \to 0. \tag{56}$$
The last term in (52) can be estimated as
$$P(Z\ge a_T) \le \frac{1}{a_T\sqrt{2\pi}}\,e^{-a_T^2/2}. \tag{57}$$
If we take $a_T = \sqrt{2\log T}$, then we have, from (57), that
$$P(Z\ge a_T) \le \frac{1}{\sqrt{2\pi}\,T\sqrt{2\log T}}. \tag{58}$$
By combining (53), (55), (56) and (58), there exists a constant $C_\theta > 0$ such that, for sufficiently large $T > 0$,
$$\sup_{z\ge a_T}\Big|P\Big(\sqrt{\tfrac{T}{2\theta}}(\hat\theta_T-\theta)\le z\Big) - P(Z\le z)\Big| \le \frac{C_\theta}{\sqrt T}. \tag{59}$$
(ii) Case $0 \le z \le a_T$: By a similar estimate as in case (i), we have, for sufficiently large $T$,
$$\sup_{0\le z\le a_T}\Big|P\Big(\sqrt{\tfrac{T}{2\theta}}(\hat\theta_T-\theta)\le z\Big) - P(Z\le z)\Big| \le \sup_{0\le z\le a_T}\big|P(\Psi_T(z)\le 2zQ_T) - P(Z\le 2zQ_T)\big| + \sup_{0\le z\le a_T}|\delta_T(z)|$$
$$\le C\sup_{0\le z\le a_T}\big(|\kappa_3(\Psi_T(z))| + |R(\Psi_T(z))|\big) + \sup_{0\le z\le a_T}|\delta_T(z)| \le \frac{C_\theta}{\sqrt T}. \tag{60}$$
By combining (59) and (60), the proof of the theorem is complete. □

4.2. Lower bound

In this section, we obtain a lower bound of the Kolmogorov distance for the CLT of the MLE $\hat\theta_T$.

Theorem 2 (Lower Bound). For sufficiently large $T > 0$, there exists a constant $c_\theta > 0$ such that
$$\sup_{z\in\mathbb{R}}\Big|P\Big(\sqrt{\tfrac{T}{2\theta}}\,(\hat\theta_T-\theta)\le z\Big) - P(Z\le z)\Big| \ge \frac{c_\theta}{\sqrt T}. \tag{61}$$

Proof. Plugging $z = 0$ into the one-term Edgeworth expansion (48) and using $H_2(0) = -1$, we have
$$P\Big(\sqrt{\tfrac{T}{2\theta}}\,(\hat\theta_T-\theta)\le 0\Big) - P(Z\le 0) = P(\Psi_T(0)\le 0) - P(Z\le 0) = \frac{1}{3!\sqrt{2\pi}}\,\kappa_3(\Psi_T(0)) + R(\Psi_T(0)). \tag{62}$$
Eq. (62) gives
$$\Big|P\Big(\sqrt{\tfrac{T}{2\theta}}\,(\hat\theta_T-\theta)\le 0\Big) - P(Z\le 0)\Big| \ge \max\Big(|\kappa_3(\Psi_T(0))|\Big[\frac{1}{3!\sqrt{2\pi}} - \frac{|R(\Psi_T(0))|}{|\kappa_3(\Psi_T(0))|}\Big],\, 0\Big). \tag{63}$$
From (49), it follows that, as $T \to \infty$,
$$\frac{|R(\Psi_T(0))|}{|\kappa_3(\Psi_T(0))|} \to 0. \tag{64}$$
Hence we have, from (63) and (64), that, for sufficiently large $T > 0$,
$$\sup_{z\in\mathbb{R}}\Big|P\Big(\sqrt{\tfrac{T}{2\theta}}\,(\hat\theta_T-\theta)\le z\Big) - P(Z\le z)\Big| \ge \Big|P\Big(\sqrt{\tfrac{T}{2\theta}}\,(\hat\theta_T-\theta)\le 0\Big) - P(Z\le 0)\Big| \ge \frac{c_\theta}{\sqrt T}, \tag{65}$$
where $c_\theta$ is a positive constant. Hence, from (65), the lower bound is obtained. □

Combining Theorems 1 and 2, we obtain an optimal Berry–Esseen bound for the sequence $\big\{\sqrt{T/(2\theta)}\,(\hat\theta_T-\theta),\ T \ge 0\big\}$.

Theorem 3 (Optimal Bound). For sufficiently large $T > 0$, there exist constants $0 < c_\theta < C_\theta < \infty$ such that
$$\frac{c_\theta}{\sqrt T} \le \sup_{z\in\mathbb{R}}\Big|P\Big(\sqrt{\tfrac{T}{2\theta}}\,(\hat\theta_T-\theta)\le z\Big) - P(Z\le z)\Big| \le \frac{C_\theta}{\sqrt T}. \tag{66}$$
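Theorem 3 can also be probed empirically. The following is our own experiment, not from the paper; all simulation sizes are arbitrary choices, and Monte Carlo plus Euler discretization error limits the precision. For several horizons $T$ it estimates $d_T = \sup_z\big|P\big(\sqrt{T/(2\theta)}(\hat\theta_T-\theta)\le z\big) - \Phi(z)\big|$ and checks that $\sqrt T\,d_T$ stays bounded, as (66) predicts:

```python
import numpy as np
from math import erf

def normalized_mle_errors(theta, T, n_steps, n_paths, rng):
    """sqrt(T/(2 theta)) * (theta_hat - theta) over independent Euler-Maruyama paths."""
    dt = T / n_steps
    X = np.zeros(n_paths)
    num = np.zeros(n_paths)
    den = np.zeros(n_paths)
    for _ in range(n_steps):
        dX = -theta * X * dt + rng.normal(0.0, np.sqrt(dt), n_paths)
        num -= X * dX
        den += X * X * dt
        X += dX
    return np.sqrt(T / (2 * theta)) * (num / den - theta)

def kolmogorov_distance(samples):
    """Kolmogorov distance between the empirical CDF of the sample and N(0,1)."""
    s = np.sort(samples)
    Phi = np.array([0.5 * (1 + erf(v / np.sqrt(2))) for v in s])
    n = len(s)
    return max((np.arange(1, n + 1) / n - Phi).max(), (Phi - np.arange(n) / n).max())

rng = np.random.default_rng(7)
scaled = {T: np.sqrt(T) * kolmogorov_distance(
    normalized_mle_errors(1.0, T, int(100 * T), 4000, rng)) for T in (5.0, 10.0, 20.0)}
```

Up to sampling noise, the values in `scaled` should neither blow up nor collapse toward zero as $T$ grows, which is consistent with the two-sided bound (66).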


Remark 2. As mentioned in the Introduction, our technique is more straightforward than the squeezing technique used in Bishwal (2000). Indeed, in order to use the squeezing technique, the rates for the normal approximation of the quantities entering the MLE must be obtained first (see Corollary 2.3 and Theorem 2.5 in Bishwal, 2000):
(a) $\sup_{z\in\mathbb{R}}\big|P\big(\sqrt{2\theta/T}\,S_T \le z\big) - P(Z\le z)\big| \le C/\sqrt T$;
(b) $\sup_{z\in\mathbb{R}}\big|P\big(\sqrt{2\theta/T}\,\big(\langle S\rangle_T - \tfrac{T}{2\theta}\big) \le z\big) - P(Z\le z)\big| \le C/\sqrt T$;
(c) $\sup_{z\in\mathbb{R}}\big|P\big(-\sqrt{2\theta/T}\,S_T - \big(\tfrac{2\theta}{T}\langle S\rangle_T - 1\big) \le z\big) - P(Z\le z)\big| \le C/\sqrt T$.
To prove (a), (b) and (c), Bishwal (2000) estimates the distance between the characteristic functions of the above quantities and that of the standard normal. As seen in the proof of Theorem 2.5 in Bishwal (2000), the squeezing technique is not easy to apply. In our work, only the third and fourth cumulants of a second-order multiple stochastic integral are needed, although the computations involved are tedious, as seen in Lemma 3.

Acknowledgments

We would like to thank the Editor-in-Chief, an Associate Editor and two anonymous referees for comments which considerably improved the paper.

References

Basawa, I. V., &amp; Prakasa Rao, B. L. S. (1980). Statistical inference for stochastic processes. New York–London: Academic Press.
Bishwal, J. P. N. (2000). Sharp Berry–Esseen bound for the maximum likelihood estimators in the Ornstein–Uhlenbeck process. Sankhyā, Series A, 62, 1–10.
Bishwal, J. P. N. (2008). Parameter estimation in stochastic differential equations. Lecture Notes in Mathematics, vol. 1923. Springer-Verlag.
Bishwal, J. P. N., &amp; Bose, A. (1995). Speed of convergence of the maximum likelihood estimators in the Ornstein–Uhlenbeck process. Calcutta Statistical Association Bulletin, 45, 245–251.
Bose, A. (1985). Rate of convergence of the maximum likelihood estimators in the Ornstein–Uhlenbeck process. Technical Report 4/85. Calcutta: Stat-Math Unit, Indian Statistical Institute.
Chen, L. H. Y., Goldstein, L., &amp; Shao, Q.-M. (2011). Normal approximation by Stein's method. Probability and its Applications. Heidelberg: Springer.
Fox, R., &amp; Taqqu, M. S. (1987). Central limit theorems for quadratic forms in random variables having long-range dependence. Probability Theory and Related Fields, 74, 213–240.
Kim, Y. T., &amp; Park, H. S. (2016). An Edgeworth expansion for functionals of Gaussian fields and its applications. Submitted for publication.
Michel, R., &amp; Pfanzagl, J. (1971). The accuracy of the normal approximation for minimum contrast estimates. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 18, 73–84.
Mishra, M. N., &amp; Prakasa Rao, B. L. S. (1985). On the Berry–Esseen theorem for maximum likelihood estimator for linear homogeneous diffusion processes. Sankhyā, Series A, 47, 392–398.
Nourdin, I., &amp; Peccati, G. (2010). Cumulants on the Wiener space. Journal of Functional Analysis, 258(11), 3775–3791.
Nourdin, I., &amp; Peccati, G. (2012). Normal approximations with Malliavin calculus: From Stein's method to universality. Cambridge Tracts in Mathematics, vol. 192. Cambridge: Cambridge University Press.
Nualart, D. (2006). Malliavin calculus and related topics (2nd ed.). Springer.
Nualart, D., &amp; Peccati, G. (2009). Stein's method and exact Berry–Esseen asymptotics for functionals of Gaussian fields. The Annals of Probability, 37(6), 2231–2261.
Pfanzagl, J. (1971). The Berry–Esseen bound for minimum contrast estimators. Metrika, 17, 82–91.
Shigekawa, I. (1978). Absolute continuity of probability laws of Wiener functionals. Proceedings of the Japan Academy, Series A, Mathematical Sciences, 54, 230–233.