Journal of Statistical Planning and Inference 91 (2000) 323–340
www.elsevier.com/locate/jspi
Optimal inference for discretely observed semiparametric Ornstein–Uhlenbeck processes

Marc Hallin^{a,∗,1}, Christophe Koell^{b,2}, Bas J.M. Werker^{a,3}

a Institut de Statistique et de Recherche Opérationnelle and Département de Mathématique, Université Libre de Bruxelles, CP 210, Boulevard du Triomphe, B-1050 Bruxelles, Belgium
b Institut de Recherche Mathématique Avancée, Université Louis Pasteur et C.N.R.S., 7 Rue René Descartes, F-67084 Strasbourg Cedex, France
Abstract

In this paper we discuss statistical inference about the continuous-time parameters of a semiparametric Ornstein–Uhlenbeck process observed in discrete time. The model is semiparametric in the sense that we do not necessarily assume that the driving process is a Brownian motion. The main results are stated for a more general time-series model: a quantile autoregressive model. For this semiparametric model we will construct locally asymptotically efficient estimators. Finally, we investigate the implications for the semiparametric Ornstein–Uhlenbeck model. © 2000 Elsevier Science B.V. All rights reserved.

MSC: 62M10

Keywords: Ornstein–Uhlenbeck process; Semiparametric time-series
1. Introduction

One of the simplest and most often used continuous-time stochastic processes is the Ornstein–Uhlenbeck process. This process is traditionally driven by a standard Brownian motion. In this paper, we leave this Gaussian framework by extending the Ornstein–Uhlenbeck model to a semiparametric one without parametric distributional assumptions. Such an extension is important in, e.g., financial applications, where
∗ Corresponding author. E-mail address: [email protected] (M. Hallin).
1 Research supported by an A.R.C. contract of the Communauté française de Belgique and the Fonds d'Encouragement à la Recherche de l'Université Libre de Bruxelles.
2 This work was partly completed when the author visited the Institut de Statistique et de Recherche Opérationnelle at the Université Libre de Bruxelles on a European Union Human Capital and Mobility postdoctoral fellowship (under Programme ERB-CHRX-CT 940693) and later on the above-mentioned A.R.C. contract.
3 Also affiliated with Tilburg University.
first-order autoregressive structures are often found (for interest rates), but evidence for the presence of jumps (i.e. non-continuous sample paths) also exists; see, e.g., Jorion (1988). Clearly, these jumps are incompatible with the Gaussian Ornstein–Uhlenbeck process.

We start by recollecting some well-known results for the traditional Gaussian Ornstein–Uhlenbeck model in Section 2. This model is uniformly locally asymptotically normal (ULAN). In this fully parametric setup, hence, efficient estimators and optimal tests are easily constructed. In Section 3, we then consider two semiparametric extensions obtained when we replace the driving Brownian motion with a general (non-Gaussian, but with finite variance) Lévy process. Such an extension has been studied in a different context by Barndorff-Nielsen and Shephard (1998). They solve the problem of finding a Lévy process that generates a prespecified stationary distribution for the process. Instead, we consider models where the autoregressive structure of the discretely observed Ornstein–Uhlenbeck process is maintained, but the innovations' distribution is a functional nuisance parameter. In Section 4, we construct efficient estimators for these semiparametric models. The techniques used are similar to those of Bickel et al. (1993) and Greenwood and Wefelmeyer (1995). It turns out that the efficient score function is closely related to the generating group of our semiparametric model. This is a consequence of more general results outlined in Hallin and Werker (1999) and outside the scope of this paper.

2. The Gaussian Ornstein–Uhlenbeck process

Let (Ω, G, (G_t: t ≥ 0), P) be a filtered probability space. The stochastic process (X_t)_{t≥0} is called a standard Ornstein–Uhlenbeck process if it satisfies the stochastic differential equation

  dX_t = −θX_t dt + dW_t,  t ≥ 0,  (2.1)

where θ > 0 is the parameter of interest and W some standard P-Brownian motion with respect to (G_t: t ≥ 0). One may think of X as being the speed of a particle of mass m which, at time t, is subjected to a force composed of two parts, a frictional force −mθX_t and a fluctuating force (formally written as m dW_t/dt), so that this equation is, again formally, nothing else but Newton's law. It is known as the Langevin equation. Or, one may think of X as the prevailing short-term interest rate in a term-structure model. A slight extension of the model is obtained by a location–scale transformation of the process X, i.e., X_t ↦ μ + σX_t, yielding

  dX_t = −θ(X_t − μ) dt + σ dW_t.  (2.2)
In order to simplify notation and to focus on the drift coefficient θ, we stick to model (2.1), but the results derived for (2.1) are easily extended to (2.2). We suppose that we are given equidistant discrete-time observations from the process X. That is, we observe X_0, X_h, X_{2h}, ..., X_{jh}, ..., X_{nh}, with h fixed. We consider asymptotics as the span of the time series tends to infinity with constant inter-observation
time h. Of course, the justification of this choice of asymptotics strongly depends on the application one has in mind.

The hypothesis under which X_0, X_h, ..., X_{nh} are generated by discretization from (2.1) is denoted by H_G^{(n)}(θ). It is well known that applying the Itô formula to d(exp(θt)X_t) yields

  X_{t+h} = e^{−θh} X_t + ∫_0^h e^{−θ(h−s)} dW_{t+s}.  (2.3)

Note that

  ∫_0^h e^{−θ(h−s)} dW_{t+s} ~ N(0, ∫_0^h e^{−2θ(h−s)} ds) = N(0, (1 − e^{−2θh})/(2θ)),

so that H_G^{(n)}(θ) coincides with a simple hypothesis of Gaussian first-order autoregressive dependence, with AR parameter and innovation variance of the form exp(−θh) and (1 − exp(−2θh))/(2θ), respectively. Note that θ > 0 implies exp(−θh) ∈ (0, 1), so that (2.1) generates a causal AR(1) model. Of course, all this strongly depends on the assumed normality of the continuous-time innovation W. We will come back to this point in Section 3. Let us define

  ρ(θ) := exp(−θh)  and  σ(θ) := ((1 − exp(−2θh))/(2θ))^{1/2}.  (2.4)

Now, H_G^{(n)}(θ) holds if and only if the observations X_t, t = h, 2h, ..., nh, are such that

  ε_t(θ) := (X_t − ρ(θ)X_{t−h})/σ(θ),  t = h, ..., nh,  (2.5)
are i.i.d. standard normal. For these kinds of models, LAN is easily obtained, e.g., following the lines of Drost et al. (1997). Let ρ′(θ) and σ′(θ) denote the derivatives of ρ(θ) and σ(θ) with respect to θ. Then we obtain the following (conditional) score function for the observation at time t:

  (ρ′(θ)/σ(θ)) X_{t−h} ε_t(θ) − (σ′(θ)/σ(θ)) (1 − ε_t(θ)²).  (2.6)

The Gaussian distribution is symmetric, so that ε_t and 1 − ε_t² are uncorrelated. Therefore, the Fisher information for θ is given by

  I_G(θ) := ρ′(θ)²/(1 − ρ(θ)²) + 2 σ′(θ)²/σ(θ)².  (2.7)
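The exact discretization (2.3)–(2.5) and the information formula (2.7) are easy to check by simulation. The following sketch (plain NumPy; all parameter values are illustrative) generates the AR(1) recursion with coefficients (2.4), recovers the innovations through (2.5), and compares the empirical variance of the score (2.6) with I_G(θ) from (2.7).

```python
import numpy as np

rng = np.random.default_rng(1)
theta, h, n = 0.7, 0.5, 200_000                 # illustrative values

rho = np.exp(-theta * h)                        # rho(theta), eq. (2.4)
sig2 = (1 - np.exp(-2 * theta * h)) / (2 * theta)
sig = np.sqrt(sig2)                             # sigma(theta), eq. (2.4)
drho = -h * rho                                 # rho'(theta)
dsig2 = h * np.exp(-2 * theta * h) / theta - (1 - np.exp(-2 * theta * h)) / (2 * theta**2)
dsig = dsig2 / (2 * sig)                        # sigma'(theta)

# exact discretization (2.3), started from the stationary distribution
eps = rng.standard_normal(n)
X = np.empty(n + 1)
X[0] = rng.standard_normal() * sig / np.sqrt(1 - rho**2)
for t in range(n):
    X[t + 1] = rho * X[t] + sig * eps[t]

res = (X[1:] - rho * X[:-1]) / sig              # residuals (2.5) recover eps exactly
score = (drho / sig) * X[:-1] * res - (dsig / sig) * (1 - res**2)   # eq. (2.6)
I_G = drho**2 / (1 - rho**2) + 2 * dsig**2 / sig2                   # eq. (2.7)
print(score.var(), I_G)                         # the two numbers agree closely
```

The empirical variance of the score typically matches I_G(θ) to within Monte Carlo error of a fraction of a percent.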
In the sequel, we suppose that X_0 is observed and we consider the models conditionally on the value of X_0. As is to be expected, the limiting experiments will not depend on the value of X_0, which shows that the starting conditions are inessential for these causal autoregressive models.

Formally, we have the following uniform local asymptotic normality (ULAN) property. Let P_θ^{(n)} denote the joint distribution of (X_h, X_{2h}, ..., X_{nh}) under the hypothesis
H_G^{(n)}(θ). Consider the sequence of statistical experiments

  E^{(n)} := (R^n, B(R^n), {P_θ^{(n)}: θ > 0}),

where B(R^n) denotes the Borel sigma-field on R^n.

Theorem 2.1. Consider the sequence of experiments (E^{(n)})_{n≥1}. Fix θ > 0 and let θ^{(n)} = θ + τ_n/√n, n ≥ 1, be a sequence of local alternatives (i.e. τ_n is bounded) and Λ^{(n)}_{θ^{(n)}|θ} = log(dP^{(n)}_{θ^{(n)}}/dP^{(n)}_θ) the logarithm of the likelihood ratio of H_G^{(n)}(θ^{(n)}) with respect to H_G^{(n)}(θ). Define

  Δ_G^{(n)}(θ) := (1/√n) Σ_{t=h,2h,...,nh} { (ρ′(θ)/σ(θ)) ε_t(θ) X_{t−h} − (σ′(θ)/σ(θ)) (1 − ε_t(θ)²) },  (2.8)

where ρ and σ are defined in (2.4) and ε_t(θ) is the residual defined in (2.5) (which, of course, coincides with the error ε_t under the hypothesis H_G^{(n)}(θ)). Then the following two properties hold:

(i) (Local asymptotic quadraticity, or LAQ)

  Λ^{(n)}_{θ^{(n)}|θ} − τ_n Δ_G^{(n)}(θ) + ½ τ_n² I_G(θ)

converges to zero in P_θ^{(n)}-probability, as n tends to infinity, uniformly w.r.t. θ in compact intervals of (0, ∞).

(ii) (Asymptotic normality) The central sequence Δ_G^{(n)}(θ) converges in distribution to a centered normal random variable with variance I_G(θ), as n tends to infinity, under the hypothesis H_G^{(n)}(θ), uniformly w.r.t. θ in compact intervals of (0, ∞).

The sequence of experiments (E^{(n)})_{n≥1} is thus ULAN. Hence, optimal estimators and tests for the Gaussian parametric case follow along standard lines. The central sequence presents a mixture of location and scale features, due to the dependence of ρ and σ on θ. In the remainder of this paper, we will consider semiparametric extensions of the model.

3. Semiparametric extensions

In the previous section, we saw how the standard Ornstein–Uhlenbeck process (2.1), observed at equidistant time points, satisfies the ULAN property with score function (2.6). In this section, we consider the model in which the driving Brownian motion W in (2.1) is replaced by a more general process, for instance a Lévy process:

  dX_t = −θX_t dt + dL_t.  (3.1)

As before, the Itô formula leads to the discrete-time model

  X_{t+h} = e^{−θh} X_t + ∫_0^h e^{−θ(h−s)} dL_{t+s}  (3.2)
for any h > 0 and any t ≥ 0. Let us denote this latter (stochastic) integral by ε̃_{t,h}. Note that the continuous-time innovation process L only enters the distribution of the discrete-time innovations ε̃_{t,h}, and not the autoregressive parameter in (3.2). However, the exact form of this distribution depends on θ as well. To elaborate on this somewhat further, we consider the following example, in which we calculate the characteristic function of the discrete-time innovations ε̃_{t,h} in case the characteristic function of L_t equals exp(−ct|u|^α) for some c > 0 and 0 < α ≤ 2 (α denotes the index of the Lévy process L). Setting

  Z_{t,h,j,m} = e^{−θh(1−(j−1)/m)} (L_{t+hj/m} − L_{t+h(j−1)/m}),

we have, for all H > 0,

  sup_{0≤h≤H} | Σ_{j=1}^m Z_{t,h,j,m} − ε̃_{t,h} | = o_P(1),  m → ∞.

But since L is a Lévy process, the random variables Z_{t,h,j,m} are independent and share the same distribution as, say,

  Z̃_{h,j,m} = e^{−θh(1−(j−1)/m)} L_{h/m}.

The characteristic function of ε̃_{t,h} is, therefore, given by

  φ_{ε̃_{t,h}}(u) = lim_{m→∞} Π_{j=1}^m φ_{Z̃_{h,j,m}}(u) = lim_{m→∞} Π_{j=1}^m φ_{L_{h/m}}(e^{−θh(1−(j−1)/m)} u)
    = lim_{m→∞} Π_{j=1}^m exp(−c |e^{−θh(1−(j−1)/m)} u|^α h/m)
    = exp(−c|u|^α lim_{m→∞} Σ_{j=1}^m e^{−αθh(1−(j−1)/m)} h/m)
    = exp(−c|u|^α ∫_0^h e^{−αθ(h−s)} ds) = exp(−c|u|^α (1 − e^{−αθh})/(αθ))
    = φ_{L_{(1−exp(−αθh))/(αθ)}}(u) = φ_{L_1}(((1 − exp(−αθh))/(αθ))^{1/α} u)
    = φ_{((1−exp(−αθh))/(αθ))^{1/α} L_1}(u).

Note that this characteristic function does not depend on t. For any positive t, the three random variables

  ε̃_{t,h},  L_{(1−e^{−αθh})/(αθ)},  and  ((1 − e^{−αθh})/(αθ))^{1/α} L_1

are thus identically distributed. The discrete-time model (3.2) thus reduces to

  X_t = ρ̃(θ) X_{t−h} + σ̃(θ) ε_t,

with ρ̃(θ) = exp(−θh) and σ̃(θ) = ((1 − exp(−αθh))/(αθ))^{1/α} = ((1 − ρ̃(θ)^α)/(αθ))^{1/α}, where (ε_t)_{t≥1} is a sequence of independent random variables (also supposed to be independent
of X_0) sharing the same distribution as L_1. A consequence of the stability of the latter random variable is that its density is infinitely differentiable (see, e.g., Chow and Teicher, 1988, p. 457, Exercise 3). This example shows that the characteristics of the Lévy process L enter, in general, in a non-trivial way in the discrete-time innovation distribution, thereby generating a large class of possible discrete-time distributions.

In order to analyse this semiparametric model, we shall introduce two semiparametric extensions which take the form of first-order autoregressive models with quantile restrictions on the innovations. To simplify notation, we put the distance between observation dates to h = 1. Recall that in the Gaussian case we found

  X_{t+1} = ρ(θ)X_t + σ(θ)ε_{t+1},

where the discrete-time innovations ε_t are i.i.d. standard normal. This latter normality crucially depends on the normality of the Brownian motion as continuous-time innovation process. We will relax this latter normality assumption by assuming that the density f of the innovations ε_t belongs to some non-parametric class. Formally, we will denote by H_f^{(n)}(θ) the hypothesis under which

  ε_t(θ) := (X_t − ρ(θ)X_{t−1})/σ(θ),  t = 1, 2, 3, ..., n,

are i.i.d. with density f, where ρ(θ) and σ(θ) are defined in (2.4) and f belongs to a broad class of densities described in Assumptions A1–A4 below. Note that this semiparametric model contains all discretized versions of (3.1), including those for which the Lévy process L contains jumps, such as the compound Poisson process. However, it is not guaranteed that for every density f there exists a Lévy process such that discretizing (3.1) leads to innovations ε_t(θ) that have density f.

We will throughout assume that the median of f is zero. In order to identify the scale transformation σ(θ), we need of course to fix the scale of the density f in some way. We will consider two possibilities. First, we will fix the 25% and 75% quantiles of the distribution of ε_t at −1 and +1.
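As a quick numerical sanity check of the characteristic-function limit computed in the example above, take the Brownian case α = 2, c = 1/2 (so that L is a standard Brownian motion and L_1 ~ N(0, 1)). The Riemann sum Σ_j Z_{t,h,j,m} then approximates the stochastic integral, and its variance should be close to (1 − e^{−αθh})/(αθ). The parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, h = 0.7, 1.0
alpha = 2.0                      # Brownian case: L_1 ~ N(0, 1)
m, nsim = 400, 200_000

# Z_{t,h,j,m} = e^{-theta h (1-(j-1)/m)} (L_{t+hj/m} - L_{t+h(j-1)/m})
j = np.arange(1, m + 1)
weights = np.exp(-theta * h * (1 - (j - 1) / m))
incr = rng.standard_normal((nsim, m)) * np.sqrt(h / m)   # Brownian increments
eps_tilde = incr @ weights                               # approximates the integral

pred = (1 - np.exp(-alpha * theta * h)) / (alpha * theta)
print(eps_tilde.var(), pred)     # both close to 0.36
```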
In this case the probabilities that ε_t belongs to (−∞, −1], (−1, 0], (0, 1], or (1, +∞) all equal 1/4. Note that −1 and 1 are not the 25% and 75% quantiles of a standard normal law, so that we implicitly introduced a rescaling of the Gaussian model. We stick to −1 and 1 for notational convenience. Another possibility would be to fix the median of ε_t² at unity. This gives rise to a comparable semiparametric model, but without imposing any extra symmetry conditions. We will see how these two models differ with respect to optimal inference.

The semiparametric models are now formally specified by requiring that f belongs to the class F of densities defined by the following set of assumptions.

Assumption A1. f is strictly positive, with finite variance, i.e., ∫ u²f(u) du < ∞.
A2. f is absolutely continuous with a.e. derivative f′.
A3. I_ll := ∫ (f′(u)/f(u))² f(u) du < ∞ and I_ss := ∫ (1 + u f′(u)/f(u))² f(u) du < ∞.
A4. ∫_{−∞}^{−1} f(u) du = ∫_{−1}^{0} f(u) du = ∫_{0}^{1} f(u) du = ∫_{1}^{∞} f(u) du = 1/4, or
A4′. ∫_{−∞}^{0} f(u) du = ∫_{−1}^{1} f(u) du = 1/2.

We also define

  I_ls := I_sl := ∫ u f′(u)²/f(u) du,  μ_f := ∫ u f(u) du,  and  σ_f² := ∫ u² f(u) du − μ_f².  (3.3)
The semiparametric hypothesis is denoted by H^{(n)}(θ) := ∪_{f∈F} H_f^{(n)}(θ) if F is defined by Assumptions A1–A4, and by H′^{(n)}(θ) if F is defined by Assumptions A1–A4′.

The traditional way of standardizing densities is to fix mean and variance. The results in this paper are easily translated to that case. However, choosing the median and some quantiles of the residuals to standardize shows that this more classical mean–variance standardization is quite arbitrary and not essential in order to carry out asymptotic analysis. An advantage of our standardization is that it allows for an underlying group invariance structure. More precisely, under Assumption A4, any continuous, monotonically increasing transformation g of the innovations ε_t with g(−1) = −1, g(0) = 0, and g(1) = 1 yields new innovations that again satisfy Assumption A4. These transformations clearly constitute a group G. As we will see later, the efficient estimators that we construct are closely related to the invariance structure induced by G. Such a group does not exist if densities are standardized in the classical sense, using the mean and the variance. This is discussed in more detail for a similar model in Hallin et al. (1999).

In order to construct efficient inference procedures for θ in the case f ∈ F is unknown, we need to calculate the so-called efficient score function. Let us first remark that under Assumptions A1–A3, the ULAN property of Theorem 2.1 remains true (see, e.g., Drost et al. (1997)), provided that Δ_G^{(n)}(θ) and I_G(θ) are replaced with

  Δ^{(n)}(θ) := (1/√n) Σ_{t=1}^n l̇_t(θ)

and

  I_f(θ) := lim_{t→∞} Var(l̇_t(θ))
    = ρ′(θ)² [ σ_f²/(1 − ρ(θ)²) + μ_f²/(1 − ρ(θ))² ] I_ll + 2 (ρ′(θ)σ′(θ)μ_f/((1 − ρ(θ))σ(θ))) I_ls + (σ′(θ)²/σ(θ)²) I_ss,

with

  l̇_t(θ) := −(ρ′(θ)/σ(θ)) X_{t−1} f′(ε_t(θ))/f(ε_t(θ)) − (σ′(θ)/σ(θ)) (1 + ε_t(θ) f′(ε_t(θ))/f(ε_t(θ))),  (3.4)

respectively. Note that I_f(θ) is continuous in θ. Let us also recall the following consequence of the ULAN property.
Lemma 3.1. Under Assumptions A1–A3 and with l̇_t(θ) defined by (3.4), we have, for all sequences θ_n = θ_0 + O(1/√n) and θ̃_n = θ_0 + O(1/√n), under P^{(n)}_{θ_n,f},

  (1/√n) Σ_{t=1}^n [ l̇_t(θ̃_n) − l̇_t(θ_n) ] = −I_f(θ_0) √n (θ̃_n − θ_n) + o_P(1).

Proof. This is well known. Expand, using the ULAN property,

  0 = Λ_{θ̃_n|θ_n} + Λ_{θ_n|θ̃_n},

and use the continuity of I_f(θ) in θ.

The efficient score is generally obtained by taking the residual of the projection of the score function (3.4) on the tangent space generated by the infinite-dimensional nuisance parameter f. We will calculate this projection now. Let us define the indicators

  A_1(u) = I_{(−∞,−1]}(u),  A_2(u) = I_{(−1,0]}(u),  A_3(u) = I_{(0,1]}(u),  A_4(u) = I_{(1,+∞)}(u)

and

  B_1(u) = I_{(−∞,0]}(u)  and  B_2(u) = I_{(−1,1]}(u).

The tangent space under Assumption A4 is then easily seen to be

  T(θ) = { h(ε): E_f h(ε)² < ∞, E_f h(ε)A_j(ε) = 0, j = 1, ..., 4 },  (3.5)

where we use ε as generic notation for an innovation and E_f denotes the expectation under the hypothesis that the density of ε is f. Under Assumption A4′ the tangent space is given by

  T′(θ) = { h(ε): E_f h(ε)² < ∞, E_f h(ε)B_1(ε) = E_f h(ε)B_2(ε) = E_f h(ε) = 0 }.  (3.6)

The score for θ consists of two parts: a 'location score', −(f′(ε_t)/f(ε_t))X_{t−1}, and a 'scale score', −(1 + ε_t f′(ε_t)/f(ε_t)). In order to project the θ-score on the tangent space, we consider these two parts separately. Let Π(·|T) denote the projection operator onto T. Note that, for a < b,

  E_f (f′(ε)/f(ε)) I_{(a,b]}(ε) = f(b) − f(a),
  E_f (1 + ε f′(ε)/f(ε)) I_{(a,b]}(ε) = b f(b) − a f(a).
Using these relations, one easily verifies, for the model defined by Assumptions A1–A4, that

  Π( −(f′(ε_t)/f(ε_t)) X_{t−1} | T(θ) ) = ( −f′(ε_t)/f(ε_t) − Σ_{j=1}^4 α_j A_j(ε_t) ) E X_{t−1}  (3.7)

and

  Π( −(1 + ε_t f′(ε_t)/f(ε_t)) | T(θ) ) = −(1 + ε_t f′(ε_t)/f(ε_t)) + Σ_{j=1}^4 β_j A_j(ε_t),  (3.8)
where

  α_1 := −4f(−1),  α_2 := −4(f(0) − f(−1)),  α_3 := −4(f(1) − f(0)),  α_4 := 4f(1),
  β_1 := −4f(−1),  β_2 := 4f(−1),  β_3 := 4f(1),  and  β_4 := −4f(1).

Note that

  Σ_{j=1}^4 α_j A_j(ε) = 2f(0) sgn(ε) + [4f(sgn(ε)) − 2f(0)] sgn(ε² − 1) sgn(ε),

  Σ_{j=1}^4 β_j A_j(ε) = −4f(sgn(ε)) sgn(ε² − 1).
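The two closed-form identities above are easy to verify mechanically. The following sketch checks them at one point in each of the four cells, for an arbitrary (purely hypothetical) triple of density values f(−1), f(0), f(1):

```python
import numpy as np

# alpha_j, beta_j from the projections (3.7)-(3.8), for a hypothetical
# choice of the three density values f(-1), f(0), f(1)
fm1, f0, f1 = 0.21, 0.35, 0.17
alpha = np.array([-4*fm1, -4*(f0 - fm1), -4*(f1 - f0), 4*f1])
beta  = np.array([-4*fm1, 4*fm1, 4*f1, -4*f1])

def cell(u):
    """Index j with A_j(u) = 1, cells (-inf,-1], (-1,0], (0,1], (1,inf)."""
    return np.searchsorted([-1.0, 0.0, 1.0], u, side='left')

u = np.array([-2.3, -0.4, 0.6, 1.7])   # one point per cell, away from +-1
sgn = np.sign(u)
s2 = np.sign(u**2 - 1)
fsgn = np.where(u > 0, f1, fm1)        # f(sgn(u))

lhs_a = alpha[cell(u)]
rhs_a = 2*f0*sgn + (4*fsgn - 2*f0) * s2 * sgn
lhs_b = beta[cell(u)]
rhs_b = -4*fsgn * s2
print(np.allclose(lhs_a, rhs_a), np.allclose(lhs_b, rhs_b))   # True True
```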
Recall that the efficient score is given by the residual of the projection of the parametric score on the tangent space (compare Bickel et al., 1993). This suggests that the efficient score for θ at time t is given by

  l̃_t(θ) := −(ρ′(θ)/σ(θ)) (f′(ε_t)/f(ε_t)) (X_{t−1} − E X_{t−1}) + 4 (σ′(θ)/σ(θ)) f(sgn(ε_t)) sgn(ε_t² − 1)
    + 2 (ρ′(θ)/σ(θ)) f(0) sgn(ε_t) E X_{t−1}
    + (ρ′(θ)/σ(θ)) [4f(sgn(ε_t)) − 2f(0)] sgn(ε_t² − 1) sgn(ε_t) E X_{t−1}.  (3.9)
The information for estimating θ in the semiparametric model defined by Assumptions A1–A4 thus is apparently given by

  Ĩ_f(θ) := lim_{t→∞} Var l̃_t(θ)
    = (ρ′(θ)² σ_f² / (1 − ρ(θ)²)) I_ll + 8 (σ′(θ)²/σ(θ)²) [f(1)² + f(−1)²]
    + 8 (ρ′(θ)² μ_f² / (1 − ρ(θ))²) [f(1)² + f(0)² + f(−1)² − f(0)(f(1) + f(−1))]
    + 16 (ρ′(θ)σ′(θ)μ_f / (σ(θ)(1 − ρ(θ)))) [f(1)² − f(−1)² − ½ f(0)(f(1) − f(−1))].  (3.10)
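As a concrete check of (3.10), consider a Laplace density rescaled so that its quartiles sit at −1, 0, +1, i.e. f(u) = (ln 2/2)e^{−|u| ln 2}, which satisfies Assumptions A1–A4. For this symmetric choice μ_f = 0, so the last two terms of (3.10) vanish and E X_{t−1} = 0 in (3.9). The following Monte Carlo sketch (illustrative parameter values) compares the empirical variance of the efficient score with Ĩ_f(θ):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
ln2 = np.log(2.0)

# Laplace density with quartiles at -1, 0, +1: f(u) = (ln2/2) exp(-|u| ln2)
f1 = ln2 / 4                     # f(1) = f(-1)
I_ll = ln2**2                    # since f'/f = -ln2 * sign(u)
sig_f2 = 2 / ln2**2              # innovation variance; mu_f = 0 by symmetry

theta, h = 0.7, 0.5              # illustrative parameter values
rho = np.exp(-theta * h)
sig = np.sqrt((1 - np.exp(-2 * theta * h)) / (2 * theta))
drho = -h * rho
dsig = (h * np.exp(-2*theta*h)/theta - (1 - np.exp(-2*theta*h))/(2*theta**2)) / (2*sig)

eps = rng.laplace(scale=1/ln2, size=n)       # innovations with density f
X = np.empty(n + 1); X[0] = 0.0
for t in range(n):
    X[t + 1] = rho * X[t] + sig * eps[t]

# efficient score (3.9); E X_{t-1} = 0 here, and f(sgn(eps)) = f1 always
lf = -ln2 * np.sign(eps)                     # f'(eps)/f(eps)
ltil = -(drho / sig) * lf * X[:-1] + 4 * (dsig / sig) * f1 * np.sign(eps**2 - 1)

# information (3.10); the mu_f terms drop out for this symmetric f
I_eff = drho**2 * sig_f2 * I_ll / (1 - rho**2) + 8 * (dsig / sig)**2 * (2 * f1**2)
print(ltil.var(), I_eff)                     # the two numbers agree closely
```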
Very similar calculations give the results for the semiparametric model defined by Assumptions A1–A4′. The efficient score function for this model is then found to be

  l̃′_t(θ) := −(ρ′(θ)/σ(θ)) (f′(ε_t)/f(ε_t)) (X_{t−1} − E X_{t−1})
    + 4 (σ′(θ)/σ(θ)) [ (p_1 − p_2)(f(1) + f(−1)) sgn(ε_t) + ((f(1) + f(−1))/2) sgn(ε_t² − 1) ] / (1 + 4(p_2 − p_1)(p_3 − p_2))
    + 2 (ρ′(θ)/σ(θ)) [ (f(0) + 2(p_1 − p_2)(f(1) − f(−1))) / (1 + 4(p_2 − p_1)(p_3 − p_2)) ] sgn(ε_t) E X_{t−1}
    + (ρ′(θ)/σ(θ)) [ (4(p_3 − p_2)f(0) + 2(f(1) − f(−1))) / (1 + 4(p_2 − p_1)(p_3 − p_2)) ] sgn(ε_t² − 1) E X_{t−1},  (3.11)

where we have set p_j := E_f A_j(ε), j = 1, ..., 4. Note that Assumption A4′ implies that p_1 = p_3 and p_2 = p_4, and Assumption A1 ensures that 1 + 4(p_2 − p_1)(p_3 − p_2) never vanishes. Expressing l̃′_t(θ) as a function of γ := p_2 − p_1 yields

  l̃′_t(θ) = −(ρ′(θ)/σ(θ)) (f′(ε_t)/f(ε_t)) (X_{t−1} − E X_{t−1})
    + 4 (σ′(θ)/σ(θ)) [ ½(f(1) + f(−1)) sgn(ε_t² − 1) − γ(f(1) + f(−1)) sgn(ε_t) ] / (1 − 4γ²)
    + 2 (ρ′(θ)/σ(θ)) [ (f(0) − 2γ(f(1) − f(−1))) / (1 − 4γ²) ] sgn(ε_t) E X_{t−1}
    + (ρ′(θ)/σ(θ)) [ (2(f(1) − f(−1)) − 4γf(0)) / (1 − 4γ²) ] sgn(ε_t² − 1) E X_{t−1}.

The information for the model defined by Assumption A4′ is then given by

  Ĩ′_f(θ) := lim_{t→∞} Var l̃′_t(θ)
    = (ρ′(θ)² σ_f² / (1 − ρ(θ)²)) I_ll
    + 8 (σ′(θ)²/σ(θ)²) (½[f(1) + f(−1)]²) / (1 − 4γ²)
    + 8 (ρ′(θ)² μ_f² / (1 − ρ(θ))²) [ ½(f(1) − f(−1))² + ½f(0)² − 2γf(0)(f(1) − f(−1)) ] / (1 − 4γ²)
    + 16 (ρ′(θ)σ′(θ)μ_f / (σ(θ)(1 − ρ(θ)))) [ (f(1) + f(−1))(½f(1) − ½f(−1) − γf(0)) ] / (1 − 4γ²).  (3.12)
One easily checks that the information Ĩ′_f(θ) in the model defined by Assumption A4′ is (for γ = 0) no larger than the information Ĩ_f(θ) in the model defined by Assumption A4, as it should be. Moreover, the two information numbers are, generally, not equal, which means that the limiting experiments are, generally, different for the two models.

Formally, a semiparametric lower bound is found by considering all parametric lower bounds for (parametric) submodels of the semiparametric model under study. The highest lower bound is generated, by definition, by the so-called least favorable parametric submodel. For i.i.d. models, it is well known that this least favorable parametric submodel is generated by the projections calculated above (see, e.g., Bickel et al., 1993).
However, in our time-series setup these tangent-space calculations are still somewhat heuristic. We need to verify that one can indeed construct a parametric submodel which satisfies the LAN property with scores given by (3.4) and (3.9) (or, under Assumption A4′, (3.11)). In other words, we have to construct a parametric submodel of the full semiparametric model in which (3.9) (or (3.11)) exactly generates the central sequence for θ. Then, we define an estimator that is semiparametric in the sense that it does not depend on the density f and is (locally and asymptotically) efficient for this least favorable parametric submodel. In this way we prove that the proposed estimator is semiparametrically efficient. We start with a lemma whose interest will become apparent later.

Lemma 3.2. Let μ be a sigma-finite measure and let s_0 be a non-negative measurable function with ‖s_0‖ := (∫ s_0(x)² dμ(x))^{1/2} = 1. Let h be a measurable function such that hs_0 is square-integrable and ∫ h(x)s_0(x)² dμ(x) = 0. Finally, let ψ: R → [0, 2] be defined by ψ(x) := (1 + x)I(|x| ≤ 1). Then, for η ∈ R, r_η := s_0 ψ(ηh) is non-negative and differentiable in quadratic mean, with derivative hs_0, i.e.,

  ‖r_η − s_0 − ηhs_0‖ = o(η),  η → 0.

Moreover, s_η := r_η/‖r_η‖ is also non-negative and differentiable in quadratic mean, with derivative hs_0, and has norm 1.

Proof. Note that, for all x,

  (r_η(x) − s_0(x))/η = s_0(x) [ h(x)I(|ηh(x)| ≤ 1) − (1/η)I(|ηh(x)| > 1) ] → s_0(x)h(x)

as η → 0. Moreover, similar computations yield

  ‖(r_η − s_0)/η‖² ≤ ‖s_0 h‖² + ‖s_0 h I(|ηh| > 1)‖² → ‖s_0 h‖²

by the bounded convergence theorem. Vitali's theorem now gives the desired differentiability in quadratic mean for r_η. The other claims are easily verified using the fact, since ∫ h(x)s_0(x)² dμ(x) = 0, that ‖r_η − s_0 − ηhs_0‖ = o(η) implies that ‖r_η‖ = 1 + o(η).

Let us define a parametric submodel of the complete semiparametric model defined by Assumptions A1–A4. The reasoning for the alternative model defined by
Assumptions A1–A4′ is very similar and, hence, will be omitted. In addition to the parameter θ, we introduce another parameter η ∈ (−1, 1) in such a way that it generates the score

  h(ε) = −(ρ′(θ)/σ(θ)) (σ(θ)μ_f/(1 − ρ(θ))) { f′(ε)/f(ε) + 2f(0) sgn(ε) + [4f(sgn(ε)) − 2f(0)] sgn(ε² − 1) sgn(ε) }
    − (σ′(θ)/σ(θ)) { 1 + ε f′(ε)/f(ε) + 4f(sgn(ε)) sgn(ε² − 1) }.  (3.13)

In order to do this, we define for fixed f_0 the alternative density

  f_η(x) = (1/4) Σ_{i=1}^4 A_i(x) f_0(x) ψ²(ηh(x)/2) / ∫ A_i(u) f_0(u) ψ²(ηh(u)/2) du,  (3.14)

where ψ is as in Lemma 3.2. In general, and in particular in our model, it is not guaranteed that the local density (3.14) satisfies Assumptions A1–A3. As such, {f_η: η ∈ (−1, 1)} will not define a parametric submodel. However, the square roots of the densities that do satisfy Assumptions A1–A3 are dense in the set of all square-root densities. Therefore, one can always pick f_η in such a way that they obey the same differentiability properties as those defined by (3.14) and satisfy Assumptions A1–A3. In the following we tacitly assume that we have chosen such local alternative densities.

We now consider the model H_sub^{(n)} = {P_{θ,η}^{(n)}: θ > 0, η ∈ (−1, 1)}. For (slight) notational convenience, we write P_{θ,η}^{(n)} = P_{θ,f_η}^{(n)}. In order to derive a LAN condition joint in θ and η, we need the following asymptotic linearity condition.
Lemma 3.3. For θ_n = θ_0 + O(1/√n) and θ̃_n = θ_n + O(1/√n) we have, under P^{(n)}_{θ_n,0},

  (1/√n) Σ_{t=1}^n [ h(ε_t(θ̃_n)) − h(ε_t(θ_n)) ] = −√n (θ̃_n − θ_n)(I_f(θ_n) − Ĩ_f(θ_n)) + o_P(1).  (3.15)
Proof. First of all, note that the asymptotic linearity for the terms with f′(ε_t(θ))/f(ε_t(θ)) and 1 + ε_t(θ)f′(ε_t(θ))/f(ε_t(θ)) follows from Lemma 3.1. So, it suffices to consider

  (1/√n) Σ_{t=1}^n [ ψ(ε_t(θ̃_n)) − ψ(ε_t(θ_n)) ]

for piecewise constant functions ψ. Writing

  ψ(ε_t(θ̃_n)) − ψ(ε_t(θ_n)) = [ψ(ε_t(θ̃_n)) − ψ(ε_t(θ_0))] − [ψ(ε_t(θ_n)) − ψ(ε_t(θ_0))],

and using contiguity of {P^{(n)}_{θ_n,f}} with respect to {P^{(n)}_{θ_0,f}}, shows that we may assume in this proof that θ_n = θ_0. Writing ψ as a linear combination of indicator functions shows that, without loss of generality, we may take ψ(x) = I_{(−∞,a]}(x). Now the arguments follow Appendix A.2 of Jurečková and Sen (1996), using the absolute regularity of the bivariate process ((ε_t(θ̃_n), ε_t(θ_0)): t ∈ Z) following from Pham and Tran (1985).
We can now prove the following LAN property. Note that this property is uniform in θ, but not necessarily in η.

Theorem 3.4. Let θ_n = θ_0 + O(1/√n), with θ_0 > 0. Suppose that θ̃_n = θ_n + τ_n/√n and η_n = 0 + ν_n/√n (n ≥ 1) denote a sequence of local alternatives, and write, as usual, Λ^{(n)}_{θ̃_n,η_n|θ_n,0} = log(dP^{(n)}_{θ̃_n,η_n}/dP^{(n)}_{θ_n,0}). Then the following two properties hold:

(i) (Local asymptotic quadraticity)

  Λ^{(n)}_{θ̃_n,η_n|θ_n,0} − (τ_n, ν_n) Δ_sub^{(n)}(θ_n) + ½ (τ_n, ν_n) I_f^{sub}(θ_n) (τ_n, ν_n)^T

tends to zero in P^{(n)}_{θ_n,0}-probability (and hence in P^{(n)}_{θ̃_n,0}-probability) as n goes to infinity, where

  Δ_sub^{(n)}(θ) := (1/√n) Σ_{t=1}^n ( −(ρ′(θ)/σ(θ)) (f′(ε_t(θ))/f(ε_t(θ))) X_{t−1} − (σ′(θ)/σ(θ)) (1 + ε_t(θ) f′(ε_t(θ))/f(ε_t(θ))) ,  h(ε_t(θ)) )^T,

  I_f^{sub}(θ) := ( I_f(θ)  I_f(θ) − Ĩ_f(θ) ; I_f(θ) − Ĩ_f(θ)  I_f(θ) − Ĩ_f(θ) ),

and ε_t(θ) = (X_t − ρ(θ)X_{t−1})/σ(θ), t = 1, ..., n, denote again the residuals.

(ii) (Asymptotic normality) The central sequence Δ_sub^{(n)}(θ_n) converges in distribution to a centred normal random vector with covariance matrix I_f^{sub}(θ_0), as n tends to infinity under P^{(n)}_{θ_n,0}.

Proof. Using LAN of {P^{(n)}_{θ̃_n,η}: η ∈ (−1, 1)} around η = 0 (see Example 3.2.1 of Bickel et al. (1993) and use Lemma 3.2), continuity of the Fisher information, ULAN of {P^{(n)}_{θ,0}: θ > 0} around θ = θ_0, and Lemma 3.3, we obtain, under P^{(n)}_{θ̃_n,0},

  Λ^{(n)}_{θ̃_n,η_n|θ_n,0} = log(dP^{(n)}_{θ̃_n,η_n}/dP^{(n)}_{θ̃_n,0}) + log(dP^{(n)}_{θ̃_n,0}/dP^{(n)}_{θ_n,0})
    = (ν_n/√n) Σ_{t=1}^n h(ε_t(θ̃_n)) − (ν_n²/2)(I_f(θ̃_n) − Ĩ_f(θ̃_n)) + τ_n Δ^{(n)}(θ_n) − (τ_n²/2) I_f(θ_n) + o_P(1)
    = (ν_n/√n) Σ_{t=1}^n h(ε_t(θ_n)) − τ_n ν_n (I_f(θ_n) − Ĩ_f(θ_n)) − (ν_n²/2)(I_f(θ_n) − Ĩ_f(θ_n)) + τ_n Δ^{(n)}(θ_n) − (τ_n²/2) I_f(θ_n) + o_P(1)
    = (τ_n, ν_n) (1/√n) Σ_{t=1}^n ( l̇_t(θ_n), h(ε_t(θ_n)) )^T − ½ (τ_n, ν_n) I_f^{sub}(θ_n) (τ_n, ν_n)^T + o_P(1).
The required asymptotic normality follows trivially from the martingale central limit theorem.

As an immediate consequence, we obtain the asymptotic linearity condition for the efficient score for θ. For this asymptotic linearity it is apparently sufficient to have uniformity of the LAN condition with respect to θ alone.

Lemma 3.5. Fix θ_0 > 0 and let θ_n = θ_0 + τ_n/√n with τ_n a bounded sequence in R. Then we have, under P^{(n)}_{θ_0,0},

  (1/√n) Σ_{t=1}^n [ l̃_t(θ_n) − l̃_t(θ_0) ] = −Ĩ_f(θ_0) √n (θ_n − θ_0) + o_P(1),  n → ∞.  (3.16)

Proof. Follows trivially from Lemmas 3.1 and 3.3.

4. Efficient inference in the semiparametric model

In the previous section, we constructed a parametric submodel of the given semiparametric model which satisfies the local asymptotic normality property uniformly in the parameter of interest θ and pointwise in the nuisance parameter η. The convolution theorem now implies that efficient estimators θ̂_n are necessarily of the form

  θ̂_n = θ_n + Ĩ_f(θ_n)^{-1} (1/n) Σ_{t=1}^n l̃_t(θ_n) + o_P(1/√n),  (4.1)

under θ_n = θ_0 + O(1/√n). Similarly, (locally asymptotically) optimal tests should be based upon

  Ĩ_f(θ)^{-1} (1/n) Σ_{t=1}^n l̃_t(θ),

or asymptotically equivalent expressions. In order to obtain semiparametric procedures, we construct an estimated efficient score l̂_t(θ) that does not depend on the unknown density f and is equivalent to l̃_t(θ) in the sense that, under P^{(n)}_{θ,0},

  (1/√n) Σ_{t=1}^n [ l̂_t(θ) − l̃_t(θ) ] = o_P(1),  n → ∞.

The continuity of Ĩ_f(θ) in θ then implies that inference based on l̂_t(θ) is locally and asymptotically optimal in the parametric submodel and, therefore, by definition, in the semiparametric model. The idea is simple enough: we replace f in l̃_t(θ) by an estimator f̂_n. Therefore we will make the following assumption.

Assumption A5. Given ε_1, ..., ε_n that are i.i.d. with density f, there exists an estimator f̂_n of f, based on ε_1, ..., ε_n, such that

  ∫ ( f̂_n′(u)/f̂_n(u) − f′(u)/f(u) )² f(u) du → 0 in probability,
  ∫ ( f̂_n′(u)/f̂_n(u) )² f̂_n(u) du → I_ll in probability,

and

  f̂_n(i) → f(i) in probability, for i = −1, 0, 1.
(1) where X n = (1=n )
Pn
s=1
Xs−1 ; and; for t = n + 1; : : : ; n; !
0 0 () fˆ1n (t ()) (2) (Xt−1 − X n ) lˆt () = − () fˆ1n (t ())
0 () ˆ f (sgn(t ()))sgn(t ()2 − 1) () 1n 0 () ˆ (2) +2 f (0)sgn(t ())X n () 1n 0 () ˆ (2) + [4f1n (sgn(t ())) − 2fˆ1n (0)] sgn(t ()2 − 1)sgn(t ())X n ; () +4
(2) where X n = (1=(n − n ))
Pn
s=n +1
√ Xs−1 . Then; for n = 0 + O(1= n) and under P(n) ; 0
n n √ 1P 1P lˆt (n ) = l˜t (n ) + oP (1= n): n t=1 n t=1
(4.2)
Moreover,

  Î_{in}(θ_0) = (ρ′(θ_0)²/(1 − ρ(θ_0)²)) [ (1/n) Σ_{t=1}^n ε_t(θ_0)² − ((1/n) Σ_{t=1}^n ε_t(θ_0))² ] ∫ ( f̂′_{in}(u)/f̂_{in}(u) )² f̂_{in}(u) du
    + 8 (σ′(θ_0)²/σ(θ_0)²) [ f̂_{in}(1)² + f̂_{in}(−1)² ]
    + 8 (ρ′(θ_0)²/(1 − ρ(θ_0))²) ((1/n) Σ_{t=1}^n ε_t(θ_0))² [ f̂_{in}(1)² + f̂_{in}(0)² + f̂_{in}(−1)² − f̂_{in}(0)(f̂_{in}(1) + f̂_{in}(−1)) ]
    + 16 (ρ′(θ_0)σ′(θ_0)/(σ(θ_0)(1 − ρ(θ_0)))) (1/n) Σ_{t=1}^n ε_t(θ_0) [ f̂_{in}(1)² − f̂_{in}(−1)² − ½ f̂_{in}(0)(f̂_{in}(1) − f̂_{in}(−1)) ],  (4.3)

i = 1, 2, both converge to Ĩ_f(θ_0) in P^{(n)}_{θ_0}-probability.
+
n 0 (n ) ˆ 1 P [f (sgn(t (n ))) − f(sgn(t (n ))] sgn(t (n )2 − 1) 4 n t=1 (n ) 2n
+
n n 0 (n ) ˆ 1 P 1 P [f2n (0) − f(0)] sgn(t (n )) 2 Xs−1 n t=1 (n ) n s=1
+
n 0 ( ) 1 P n [4fˆ2n (sgn(t (n ))) − 4f(sgn(t (n ))) − 2fˆ2n (0) + 2f(0)] n t=1 (n )
×sgn(t (n )2 − 1)sgn(t (n )) −
n 1 P 0 (n ) f0 (t (n )) − n t=1 (n ) f(t (n ))
n 1 P Xs−1 n s=1
EXt−1 −
n 1 P Xs−1 n s=1
M. Hallin et al. / Journal of Statistical Planning and Inference 91 (2000) 323–340
339
n n 1 P 0 (n ) 1 P f(0) sgn(t (n )) EXt−1 − − 2 Xs−1 n t=1 (n ) n s=1 n 0 ( ) 1 P n [4f(sgn(t (n ))) − 2f(0)]sgn(t (n )2 − 1)sgn(t (n )) n t=1 (n ) n 1 P Xs−1 × EXt−1 − n s=1 p =oP (1= n )
−
by the martingale central limit theorem applied to each term separately (conditional on f̂_{2n}) together with Assumption A5, the law of large numbers for (1/λ_n) Σ_{s=1}^{λ_n} X_{s−1}, and the contiguity of {P^{(n)}_{θ_n}} and {P^{(n)}_{θ_0}}.

An initial √n-consistent estimator for θ in the semiparametric model is easily obtained from, e.g., the first-order empirical autocorrelation. This shows that there exist estimators satisfying the following assumption.

Assumption A6. There exists an estimator θ̄_n for θ in the semiparametric model which is √n-consistent (i.e. √n(θ̄_n − θ) = O_P(1) under P^{(n)}_θ) and discretized (i.e. for each M > 0 the number of possible values of θ̄_n with √n‖θ̄_n‖ ≤ M is bounded in n).

We are now ready to present our efficient estimator.

Theorem 4.2. Suppose that (λ_n) is a sequence of positive integers such that there exist 0 < λ_− ≤ λ_+ < 1 with λ_− ≤ lim inf_{n→∞} λ_n/n ≤ lim sup_{n→∞} λ_n/n ≤ λ_+. Let θ̄_n be an initial estimator for θ satisfying Assumption A6. Denote by ε̂_t = ε_t(θ̄_n) the estimated residuals. Finally, let f̂_{1n} be the estimator of Assumption A5 based on ε̂_1, ..., ε̂_{λ_n} and f̂_{2n} the estimator of Assumption A5 based on ε̂_{λ_n+1}, ..., ε̂_n. Then

  θ̂_n = θ̄_n + Î_n(θ̄_n)^{-1} (1/n) Σ_{t=1}^n l̂_t(θ̄_n)  (4.4)

is an efficient semiparametric estimator for θ.

Proof. This follows along standard lines using the discreteness of θ̄_n, its √n-consistency, and Lemma 4.1. See, e.g., Bickel et al. (1993, Section 2.5).
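To make the construction of Theorem 4.2 concrete, here is an end-to-end toy sketch: simulate a discretely observed non-Gaussian Ornstein–Uhlenbeck path whose discrete-time innovations are Laplace with quartiles at −1, 0, +1 (so Assumption A4 holds), take the empirical first-order autocorrelation as the initial √n-consistent estimator, and perform the one-step update (4.4) with kernel plug-ins. For brevity the sketch skips the sample splitting of Lemma 4.1 and replaces Î_n of (4.3) by the empirical variance of the estimated scores, so it illustrates the one-step logic rather than reproducing every regularity device; all tuning choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
theta0, h, n = 0.7, 0.5, 5_000
ln2 = np.log(2.0)

# simulate: AR(1) with Laplace innovations whose quartiles are -1, 0, +1
rho0 = np.exp(-theta0 * h)
sig0 = np.sqrt((1 - np.exp(-2 * theta0 * h)) / (2 * theta0))
eps = rng.laplace(scale=1/ln2, size=n)
X = np.empty(n + 1); X[0] = 0.0
for t in range(n):
    X[t + 1] = rho0 * X[t] + sig0 * eps[t]

def coeffs(th):
    r = np.exp(-th * h)
    s = np.sqrt((1 - np.exp(-2 * th * h)) / (2 * th))
    dr = -h * r
    ds = (h * np.exp(-2*th*h)/th - (1 - np.exp(-2*th*h))/(2*th**2)) / (2 * s)
    return r, s, dr, ds

# initial estimator: empirical first-order autocorrelation (Assumption A6)
rho_hat = (X[1:] @ X[:-1]) / (X[:-1] @ X[:-1])
th_bar = -np.log(rho_hat) / h

# kernel plug-ins for f'/f and f(-1), f(0), f(1) from the residuals
r, s, dr, ds = coeffs(th_bar)
res = (X[1:] - r * X[:-1]) / s
bw = 1.06 * res.std() * n ** (-0.2)

def fhat(x, chunk=1000):
    f, df = [], []
    for i in range(0, x.size, chunk):       # chunked to bound memory use
        u = (x[i:i+chunk, None] - res[None, :]) / bw
        k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
        f.append(k.mean(axis=1) / bw)
        df.append((-u * k).mean(axis=1) / bw**2)
    return np.concatenate(f), np.concatenate(df)

(fm1, f0v, f1v), _ = fhat(np.array([-1.0, 0.0, 1.0]))
fr, dfr = fhat(res)                          # note: no sample splitting here
score = dfr / np.maximum(fr, 1e-12)

# estimated efficient scores (cf. Lemma 4.1) and one-step update (4.4)
Xb = X[:-1].mean()
sgn, s21 = np.sign(res), np.sign(res**2 - 1)
fsgn = np.where(res > 0, f1v, fm1)
lhat = (-(dr / s) * score * (X[:-1] - Xb)
        + 4 * (ds / s) * fsgn * s21
        + 2 * (dr / s) * f0v * sgn * Xb
        + (dr / s) * (4 * fsgn - 2 * f0v) * s21 * sgn * Xb)
I_hat = lhat.var()                           # crude stand-in for (4.3)
th_hat = th_bar + lhat.mean() / I_hat        # eq. (4.4)
print(th_bar, th_hat)                        # both close to theta0 = 0.7
```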
Acknowledgements

The authors thank Catherine Vermandele and two referees for their pertinent remarks.

References

Barndorff-Nielsen, O.E., Shephard, N., 1998. Aggregation and model construction for volatility models. Preprint, Nuffield College, Oxford.
Bickel, P.J., Klaassen, C.A.J., Ritov, Y., Wellner, J.A., 1993. Efficient and Adaptive Statistical Inference for Semiparametric Models. Johns Hopkins University Press, Baltimore.
Chow, Y., Teicher, H., 1988. Probability Theory: Independence, Interchangeability, Martingales. Springer-Verlag, New York.
Drost, F.C., Klaassen, C.A.J., Werker, B.J.M., 1997. Adaptive estimation in time-series models. Ann. Statist. 25, 786–818.
Greenwood, P.E., Wefelmeyer, W., 1995. Efficiency of empirical estimators for Markov chains. Ann. Statist. 23, 132–143.
Hallin, M., Vermandele, C., Werker, B.J.M., 1999. Rank and sign based efficient inference for the median autoregressive model. Preprint, Institut de Statistique, Université Libre de Bruxelles, Brussels.
Hallin, M., Werker, B.J.M., 1999. Semiparametrically efficient invariant inference. Preprint, Institut de Statistique, Université Libre de Bruxelles, Brussels.
Jorion, P., 1988. On jump processes in the foreign exchange and stock markets. Rev. Financial Studies 1, 427–445.
Jurečková, J., Sen, P.K., 1996. Robust Statistical Procedures: Asymptotics and Interrelations. Wiley, New York.
Pham, T.D., Tran, L.T., 1985. Some mixing properties of time series models. Stochastic Processes Appl. 19, 297–303.