Journal of Statistical Planning and Inference 126 (2004) 495 – 520
www.elsevier.com/locate/jspi
Hölder norm test statistics for epidemic change
Alfredas Račkauskas^a, Charles Suquet^b,∗
^a Department of Mathematics, Institute of Mathematics and Informatics, Vilnius University, Naugarduko 24, Lt-2006 Vilnius, Lithuania
^b Université des Sciences et Technologies de Lille, Mathématiques Appliquées, F.R.E. CNRS 2222, Bât. M2, U.F.R. de Mathématiques, F-59655 Villeneuve d'Ascq Cedex, France
Received 5 December 2002; accepted 4 September 2003
Abstract
To detect epidemic change in the mean of a sample of size n, we introduce new test statistics UI and DI based on weighted increments of partial sums. We obtain their limit distributions under the null hypothesis of no change in the mean. Under the alternative hypothesis our statistics can detect very short epidemics, of length log^γ n, γ > 1. Using self-normalization and adaptiveness to modify UI and DI allows us to prove the same results under very relaxed moment assumptions. Trimmed versions of UI and DI are also studied.
© 2003 Elsevier B.V. All rights reserved.
MSC: 62E20; 62G10; 60F17
Keywords: Change point; Epidemic alternative; Functional central limit theorem; Hölder norm; Partial sums processes; Self-normalization
1. Introduction

An important question in the large area of change point problems involves testing the null hypothesis of no parameter change in a sample against the alternative that parameter changes do take place at an unknown time. For a survey we refer to the books by Brodsky and Darkhovsky (1993) or Csörgő and Horváth (1997). In this paper we suggest a class of new statistics for testing a change in the mean under the so-called epidemic alternative. More precisely, given a sample X_1, X_2, …, X_n, we want to test the
Research supported by a cooperation agreement CNRS/LITHUANIA (4714).
∗ Corresponding author. Fax: +33-320-436774.
E-mail address: [email protected] (C. Suquet).
0378-3758/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2003.09.004
496 A. Račkauskas, C. Suquet / Journal of Statistical Planning and Inference 126 (2004) 495–520
standard null hypothesis of constant mean

(H0): X_1, …, X_n all have the same mean, denoted by μ_0,

against the epidemic alternative

(HA): there are integers 1 < k* < m* < n and a constant μ_1 ≠ μ_0 such that

  E X_i = μ_0 + (μ_1 − μ_0) 1_{{k* < i ≤ m*}},  i = 1, 2, …, n.

Writing l* := m* − k* for the length of the epidemic, we assume throughout the paper that both l* and n − l* go to infinity with n. In this paper we follow the classical methodology of building test statistics from continuous functionals of a partial sums process. Set

  S(0) = 0,  S(t) = Σ_{k≤t} X_k,  0 < t ≤ n.
When A is a set of integers, S(A) will denote Σ_{i∈A} X_i. For instance we suggest using, with 0 < α < 1/2,

  UI(n, α) := max_{1≤i<j≤n} |S(j) − S(i) − S(n)(j/n − i/n)| / [(j/n − i/n)(1 − (j − i)/n)]^α.
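To make the definition concrete, UI(n, α) can be evaluated by a direct scan over all pairs i < j. The following is a quadratic-time sketch; the function name and array handling are ours, not from the paper:

```python
import numpy as np

def ui_stat(x, alpha):
    """Naive O(n^2) evaluation of UI(n, alpha):
    max over 1 <= i < j <= n of
    |S(j) - S(i) - S(n)(j/n - i/n)| / [(j/n - i/n)(1 - (j - i)/n)]^alpha."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.concatenate(([0.0], np.cumsum(x)))  # s[k] = S(k)
    best = 0.0
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            h = (j - i) / n                    # here 0 < h < 1
            num = abs(s[j] - s[i] - s[n] * h)
            den = (h * (1.0 - h)) ** alpha
            best = max(best, num / den)
    return best
```

For α = 0 the weight disappears and the scan reduces to the unweighted statistic UI(n, 0) mentioned below.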
This is a functional, continuous in some Hölder topology, of the classical Donsker–Prokhorov polygonal line process. The statistic UI(n, 0) was suggested by Levin and Kline (1985); see Csörgő and Horváth (1997). To motivate the definition of such test statistics, let us assume just for a moment that the change times k* and m* are known. Suppose moreover under (H0) that the (X_i, 1 ≤ i ≤ n) are independent identically distributed with finite variance σ². Under (HA), suppose that the (X_i, i ∈ I_n) and the (X_i, i ∈ I_n^c) are separately independent identically distributed with the same finite variance σ², where

  I_n := {i ∈ {1, …, n} : k* < i ≤ m*},  I_n^c := {1, …, n} \ I_n.

Then we simply have a two-sample problem with known variances. It is then natural to accept (H0) for small values of the statistic |Q| and to reject it for large ones, where

  Q := (S(I_n) − l* S(n)/n)/(l*)^{1/2} − (S(I_n^c) − (n − l*) S(n)/n)/(n − l*)^{1/2}.

After some algebra, Q may be recast as

  Q = [n^{1/2}/(l*(n − l*))^{1/2}] (S(I_n) − (l*/n) S(n)) [(1 − l*/n)^{1/2} + (l*/n)^{1/2}].
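The algebra behind this recast can be checked numerically. The sketch below (variable names are ours) computes Q both ways for an arbitrary sample with assumed known change points:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 50, 12, 30           # k = k*, m = m*, assumed known here
x = rng.normal(size=n)
S = np.concatenate(([0.0], np.cumsum(x)))
l = m - k                      # epidemic length l*
S_In = S[m] - S[k]             # S(I_n)
S_Ic = S[n] - S_In             # S(I_n^c)

# Q as a two-sample contrast
Q = (S_In - l * S[n] / n) / l**0.5 - (S_Ic - (n - l) * S[n] / n) / (n - l)**0.5

# Q recast: n^{1/2}/(l*(n-l*))^{1/2} (S(I_n) - (l*/n)S(n)) [(1-l*/n)^{1/2}+(l*/n)^{1/2}]
Q2 = (n**0.5 / (l * (n - l))**0.5) * (S_In - l * S[n] / n) \
     * ((1 - l / n)**0.5 + (l / n)**0.5)

assert abs(Q - Q2) < 1e-10     # the two expressions agree identically
```

The identity is exact: S(I_n^c) − (n − l*)S(n)/n equals −(S(I_n) − l*S(n)/n), so both terms of Q share the same numerator.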
As the last factor in square brackets ranges between 1 and 2^{1/2}, we may drop it and so replace |Q| by the statistic

  R := [n^{1/2}/(l*(n − l*))^{1/2}] |S(I_n) − (l*/n) S(n)| = n^{−1/2} |S(I_n) − (l*/n) S(n)| / [(l*/n)(1 − l*/n)]^{1/2}.

Introducing now the notation

  t_k = t_{n,k} := k/n,  0 ≤ k ≤ n,

enables us to rewrite R as

  R = n^{−1/2} |S(m*) − S(k*) − S(n)(t_{m*} − t_{k*})| / [(t_{m*} − t_{k*})(1 − (t_{m*} − t_{k*}))]^{1/2}.
Now, in the more realistic situation where k* and m* are unknown, it is reasonable to replace R by taking the maximum over all possible indexes for k* and m*. This leads us to consider

  UI(n, 1/2) := max_{1≤i<j≤n} |S(j) − S(i) − S(n)(t_j − t_i)| / [(t_j − t_i)(1 − (t_j − t_i))]^{1/2}.
It is worth noting that the same statistic arises from likelihood arguments in the special case where the observations X_i are Gaussian; see Yao (1993). The asymptotic distribution of UI(n, 1/2) is unknown, due to difficulties caused by the denominator (for historical remarks see Csörgő and Horváth, 1997, p. 183). In our setting, the X_i's are not supposed to be Gaussian. Moreover it seems fair to pay something in terms of normalization when passing from R to n^{−1/2} UI(n, 1/2). Intuitively the cost should depend on the moment assumptions made about the X_i's. To discuss this, let us introduce the polygonal partial sums process ξ_n defined by linear interpolation between the points (t_k, S(k)). Then UI(n, 1/2) appears as the discretization through the grid (t_k, 0 ≤ k ≤ n) of the functional T_{1/2}(ξ_n), where

  T_{1/2}(x) := sup_{0<s<t<1} |x(t) − x(s) − (x(1) − x(0))(t − s)| / [(t − s)(1 − (t − s))]^{1/2}.   (1)

This functional is continuous in the Hölder space H^o_{1/2} of functions x : [0,1] → R such that |x(t + h) − x(t)| = o(h^{1/2}), uniformly in t. Obviously the finite dimensional distributions of n^{−1/2} σ^{−1} ξ_n converge to those of a standard Brownian motion W. However Lévy's theorem on the modulus of uniform continuity of W implies that H^o_{1/2} has too strong a topology to support a version of W. So n^{−1/2} ξ_n cannot converge in distribution to σW in the space H^o_{1/2}. This forbids us to obtain the limiting distribution of T_{1/2}(n^{−1/2} ξ_n) by an invariance principle in H^o_{1/2} via continuous mapping. Fortunately, Hölderian invariance principles do exist for, roughly speaking, the whole scale of Hölder spaces H^o_ρ of functions x such that |x(t + h) − x(t)| = o(ρ(h)), uniformly in t, provided that the weight function ρ satisfies lim_{h↓0} ρ(h)(h log(1/h))^{−1/2} = ∞. This type of invariance
principle goes back to Lamperti (1962), who studied the case ρ(h) = h^α (0 < α < 1/2). A complete characterization in terms of moment assumptions on X_1 was obtained recently by the authors in the general case (see Theorem 11 below). This leads us to replace T_{1/2}(n^{−1/2} ξ_n) by T_ρ(n^{−1/2} ξ_n), obtained by substituting the denominator in (1) by ρ((t − s)(1 − (t − s))). Going back to the discretization, we finally suggest the class UI (uniform increments) of statistics, which includes in particular UI(n, α) and the similar statistics UI(n, ρ) built with a general weight ρ(h) instead of h^α. Together with UI we consider the class of DI (dyadic increments) statistics, which includes in particular

  DI(n, α) = max_{1≤j≤log₂ n} 2^{jα} max_{r∈D_j} |S(nr) − ½ S(nr + n2^{−j}) − ½ S(nr − n2^{−j})|,

where D_j is the set of dyadic numbers of level j and log₂ denotes the logarithm to base 2. DI(n, α) and UI(n, α) have similar asymptotic behaviors. Moreover, dyadic increments statistics are of particular interest since their limiting distributions are completely specified (see Theorem 10 below). Due to the independence of the X_i's, it is easy to see that even stochastic boundedness of either of n^{−1/2} UI(n, α) or n^{−1/2} DI(n, α) yields that of n^{−1/2+α} max_{1≤i≤n} |X_i|. Hence, necessarily P(|X_1| > t) = O(t^{−p(α)}), where p(α) = (1/2 − α)^{−1}. So, the heavier the weight ρ(h), the stronger the moment assumptions required to obtain the convergence of n^{−1/2} UI(n, ρ). On the other hand, the interest of a heavy weight lies in the detection of short epidemics. Indeed (see Theorem 4 below), UI(n, 0) can detect only epidemics whose length l* is such that n^{1/2} = o(l*). For 0 < α < 1/2, UI(n, α) detects epidemics with n^δ = o(l*), where δ = (1 − 2α)/(2 − 2α). With the weight ρ(h) = h^{1/2} log^β(c/h), β > 1/2, UI(n, ρ) detects epidemics such that log^{2β} n = o(l*). We consider two ways to preserve the sensitivity to short epidemics while relaxing moment assumptions.
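As an illustration, DI(n, α) can be computed directly from its definition. The sketch below is ours (not from the paper); S(t) is extended to non-integer t by S(t) = S(⌊t⌋):

```python
import numpy as np

def di_stat(x, alpha):
    """DI(n, alpha): for each level j with 2^j <= n and each dyadic
    r = (2l-1)2^{-j}, weight |S(nr) - S(nr + n2^{-j})/2 - S(nr - n2^{-j})/2|
    by 2^{j*alpha} and take the overall maximum."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.concatenate(([0.0], np.cumsum(x)))
    S = lambda t: s[int(np.floor(t))]          # S(t) = S(floor(t))
    best = 0.0
    j = 1
    while 2**j <= n:
        for l in range(1, 2**(j - 1) + 1):
            r = (2 * l - 1) * 2.0**(-j)
            incr = abs(S(n * r) - 0.5 * S(n * r + n * 2.0**(-j))
                                - 0.5 * S(n * r - n * 2.0**(-j)))
            best = max(best, 2.0**(j * alpha) * incr)
        j += 1
    return best
```

For a constant sample the dyadic increments all vanish, while a single isolated jump is picked up at the finest levels, where the weight 2^{jα} is largest.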
One is trimming and the other is self-normalization together with adaptive selection of partial sums increments. Trimming leads to a class of statistics which includes

  DI(n, α, γ) = max_{1≤j≤γ log₂ n} 2^{jα} max_{r∈D_j} |S(nr) − ½ S(nr + n2^{−j}) − ½ S(nr − n2^{−j})|,

where 0 < γ < 1. The adaptive construction of the partial sums process and the corresponding functional central limit theorem proved in Račkauskas and Suquet (2001a) allow us to deal with a class of statistics which includes e.g. SUI(n, α), obtained by replacing in UI(n, α) the deterministic points t_k by the random points v_k := V_k²/V_n², where V_k² = X_1² + ··· + X_k², k = 1, …, n:

  SUI(n, α) = max_{1≤i<j≤n} |S(j) − S(i) − S(n)(v_j − v_i)| / [(v_j − v_i)(1 − (v_j − v_i))]^α.
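The self-normalized variant only changes the grid: the deterministic points k/n become the random points v_k. A minimal sketch (our own naming) mirroring the UI scan is:

```python
import numpy as np

def sui_stat(x, alpha):
    """SUI(n, alpha): as UI(n, alpha) but with the deterministic grid
    t_k = k/n replaced by the self-normalized points v_k = V_k^2 / V_n^2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.concatenate(([0.0], np.cumsum(x)))
    v = np.concatenate(([0.0], np.cumsum(x**2)))
    v /= v[-1]                                 # v_k = V_k^2 / V_n^2
    best = 0.0
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            h = v[j] - v[i]
            den = (h * (1.0 - h)) ** alpha
            if den > 0.0:
                best = max(best, abs(s[j] - s[i] - s[n] * h) / den)
    return best
```

In the limit theorems below the statistic is used in its normalized form V_n^{−1} SUI(n, α); the function above returns the unnormalized maximum.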
When X_1 is symmetric, the convergence in distribution of V_n^{−1} SUI(n, α) requires only that X_1 belong to the domain of attraction of the normal law; otherwise the existence of E|X_1|^{2+ε} for some ε > 0 is sufficient. The paper is organized as follows. In Section 2, the definitions of the classes of statistics are presented and their limiting distributions are given. All proofs are deferred to Section 3.
2. UI and DI statistics and their asymptotics

All the test statistics studied in this paper may be viewed as discretizations of some Hölder norms or semi-norms. The following subsection contains the relevant background.

2.1. The Hölderian framework

Let ρ : [0,1] → R_+ be a weight function. Membership of a continuous function x : [0,1] → R in the Hölder space H^o_ρ means roughly that |x(t + h) − x(t)| = o(ρ(|h|)) uniformly in t. The classical scale of Hölder spaces uses ρ(h) = h^α, 0 < α < 1. Lévy's theorem on the modulus of continuity of the Brownian motion restricts the investigation of invariance principles in this scale to the range 0 < α < 1/2. The same result leads naturally to consider the Hölder spaces built with the functions ρ(h) = h^{1/2} log^β(c/h) for β > 1/2. We include these two cases of practical interest in the following rather general class R.

Definition 1. We denote by R the class of non-decreasing functions ρ : [0,1] → R_+ satisfying

(i) for some 0 < α ≤ 1/2 and some function L, positive on [1, ∞) and normalized slowly varying at infinity,

  ρ(h) = h^α L(1/h),  0 < h ≤ 1;

(ii) θ(t) = t^{1/2} ρ(1/t) is C¹ on [1, ∞);
(iii) there is a β > 1/2 and some a > 0 such that θ(t) log^{−β}(t) is non-decreasing on [a, ∞).

Let us recall that L is normalized slowly varying at infinity if and only if for every δ > 0, t^δ L(t) is ultimately increasing and t^{−δ} L(t) is ultimately decreasing (Bingham et al., 1987, Theorem 1.5.5). The main practical examples we have in mind may be parametrized by

  ρ(h) = ρ(h, α, β) := h^α log^β(c/h).

We write C[0,1] for the Banach space of continuous functions x : [0,1] → R endowed with the supremum norm ‖x‖_∞ := sup{|x(t)| : t ∈ [0,1]}.

Definition 2. Let ρ be a real valued non-decreasing function on [0,1], null and right continuous at 0. We denote by H^o_ρ the Hölder space

  H^o_ρ := {x ∈ C[0,1] : lim_{δ→0} ω_ρ(x, δ) = 0},
where

  ω_ρ(x, δ) := sup_{s,t∈[0,1], 0<t−s<δ} |x(t) − x(s)| / ρ(t − s).

H^o_ρ is a separable Banach space for its native norm

  ‖x‖_ρ := |x(0)| + ω_ρ(x, 1).

Under technical assumptions, satisfied by any ρ in R, the space H^o_ρ may be endowed with an equivalent norm ‖x‖^seq_ρ built on weighted dyadic increments of x. Let us denote by D_j the set of dyadic numbers in [0,1] of level j, i.e.

  D_0 = {0, 1},  D_j = {(2l − 1)2^{−j} : 1 ≤ l ≤ 2^{j−1}},  j ≥ 1.

We write D (resp. D*) for the set of dyadic numbers in [0,1] (resp. (0,1]):

  D := ∪_{j=0}^∞ D_j,  D* := D \ {0}.

Put for r ∈ D_j, j ≥ 0,

  r⁻ := r − 2^{−j},  r⁺ := r + 2^{−j}.

For any function x : [0,1] → R, define its Schauder coefficients λ_r(x) by

  λ_r(x) := x(r) − (x(r⁺) + x(r⁻))/2,  r ∈ D_j,  j ≥ 1,   (2)

and in the special case j = 0 by λ_0(x) := x(0), λ_1(x) := x(1). When ρ belongs to R, we have on the space H^o_ρ the equivalence of norms (see Račkauskas and Suquet, 2001b; Semadeni, 1982)

  ‖x‖_ρ ∼ ‖x‖^seq_ρ := sup_{j≥0} (1/ρ(2^{−j})) max_{r∈D_j} |λ_r(x)|.

Let W = (W(t), t ∈ [0,1]) be a standard Wiener process and B = (B(t), t ∈ [0,1]) the corresponding Brownian bridge, B(t) = W(t) − tW(1), t ∈ [0,1]. Consider for ρ in R the following random variables:

  UI(ρ) := sup_{0<t−s<1} |B(t) − B(s)| / ρ((t − s)(1 − (t − s)))   (3)

and

  DI(ρ) := sup_{j≥1} (1/ρ(2^{−j})) max_{r∈D_j} |W(r) − ½W(r⁺) − ½W(r⁻)| = ‖B‖^seq_ρ.   (4)

These variables serve as limits for the uniform increment (UI) and dyadic increment (DI) statistics respectively. No analytical form seems to be known for the distribution function of UI(ρ), whereas the distribution of DI(ρ) is completely specified by Theorem 10 below.

Remark 1. We do not treat in this paper the problem of low regularity Hölder norms associated to the weights ρ(h) = L(1/h) with L slowly varying. In this case nothing
seems known about the equivalence of the norms ‖·‖_ρ and ‖·‖^seq_ρ, which plays a key role in our proofs of Hölderian invariance principles. The extension of our results to this boundary case would require other techniques and seems to be of limited practical scope.

2.2. Statistics UI(n, ρ) and DI(n, ρ)

To simplify notation, put

  ρ̄(h) := ρ(h(1 − h)),  0 ≤ h ≤ 1.

For ρ ∈ R, define (recalling that t_k = k/n, 0 ≤ k ≤ n)

  UI(n, ρ) = max_{1≤i<j≤n} |S(j) − S(i) − S(n)(t_j − t_i)| / ρ̄(t_j − t_i)

and

  DI(n, ρ) = max_{1<2^j≤n} (1/ρ(2^{−j})) max_{r∈D_j} |S(nr) − ½S(nr⁺) − ½S(nr⁻)|.
To obtain the limiting distribution of these statistics we shall work with a stronger null hypothesis, namely

(H0): X_1, …, X_n are independent identically distributed random variables with mean denoted μ_0.

Theorem 3. Under (H0), assume that ρ ∈ R and for every A > 0,

  lim_{t→∞} t P(|X_1| > A θ(t)) = 0.   (5)

Then

  σ^{−1} n^{−1/2} UI(n, ρ) →_D UI(ρ)  as n → ∞   (6)

and

  σ^{−1} n^{−1/2} DI(n, ρ) →_D DI(ρ)  as n → ∞,   (7)

where σ² = Var(X_1) and UI(ρ), DI(ρ) are defined by (3) and (4) respectively.

When α < 1/2, it is enough to take A = 1 in (5). In the special case ρ(h) = ρ(h, α, 0) = h^α, (5) may be recast as

  P(|X_1| > t) = o(t^{−p})  with p := 1/(1/2 − α).

In the other special case ρ(h) = ρ(h, 1/2, β) = h^{1/2} log^β(c/h), where β > 1/2, it is not possible to drop the constant A in condition (5), which is easily seen to be equivalent to

  E exp{λ|X_1|^{1/β}} < ∞  for each λ > 0.   (8)

Let us note that either of (6) or (7) yields stochastic boundedness of the random variable max_{1≤k≤n} |X_k|/θ(n), which in its turn gives (8).
Remark 2. In the case where the variance σ² is unknown, the results remain valid if σ² is replaced by its standard estimator σ̂².

Remark 3. Under (H0), the statistic n^{−1/2} UI(n, ρ) keeps the same value when each X_i is replaced by X̃_i := X_i − E X_i. This property is only asymptotically true for n^{−1/2} DI(n, ρ); see the proof of Theorem 3. For practical use of DI(n, ρ), it is preferable to replace X_i by X̃_i if μ_0 is known, or else by X_i − X̄ where X̄ = n^{−1}(X_1 + ··· + X_n). This will avoid a bias term which may be of the order of |E X_1| ln^{−β} n in the worst cases.

To obtain consistency of the tests rejecting the null hypothesis in favor of the epidemic alternative (HA) for large values of n^{−1/2} UI(n, ρ), we naturally assume that the numbers of observations k*, m* − k*, n − m* before, during and after the epidemic go to infinity with n. Write l* := m* − k* for the length of the epidemic.

Theorem 4. Let ρ ∈ R. Assume under (HA) that the X_i's are independent and σ_0² := sup_{k≥1} Var(X_k) is finite. If

  lim_{n→∞} n^{1/2} (h_n/ρ(h_n)) |μ_1 − μ_0| = ∞,  where h_n := (l*/n)(1 − l*/n),   (9)

then

  n^{−1/2} UI(n, ρ) →_P ∞  and  n^{−1/2} DI(n, ρ) →_P ∞  as n → ∞.   (10)
In our setting μ_0 and μ_1 are constants, but the proof of Theorem 4 remains valid when μ_1 and μ_0 are allowed to depend on n. This explains the presence of the factor |μ_1 − μ_0| in (9). This dependence on n of μ_0 and μ_1 is discarded in the following examples. When ρ(h) = h^α, (9) is equivalent to

  lim_{n→∞} n^{1/(2−2α)} h_n = ∞.   (11)

In this case one can detect short epidemics such that n^{(1−2α)/(2−2α)} = o(l*) as well as long epidemics such that n^{(1−2α)/(2−2α)} = o(n − l*). When ρ(h) = h^{1/2} log^β(c/h) with β > 1/2, (9) is satisfied provided that h_n = n^{−1} log^γ n with γ > 2β. This leads to the detection of short epidemics such that log^γ n = o(l*) as well as of long ones verifying log^γ n = o(n − l*).

2.3. Statistics SUI and SDI

To introduce the adaptive self-normalized statistics, we shall restrict the null hypothesis by assuming that μ_0 = 0. In practice the reduction to this case by centering requires the knowledge of μ_0. Although this seems a quite reasonable assumption, it is fair to point out that this knowledge was not required for the UI and DI statistics.
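The trade-off between the weight exponent α and the shortest detectable epidemic is easy to quantify. A small sketch (names ours) computes the detection exponent δ = (1 − 2α)/(2 − 2α) for ρ(h) = h^α:

```python
def detection_exponent(alpha):
    """delta = (1 - 2*alpha)/(2 - 2*alpha): with rho(h) = h^alpha,
    UI(n, alpha) detects epidemics of length l* with n**delta = o(l*)."""
    return (1 - 2 * alpha) / (2 - 2 * alpha)

# For n = 10**6 the shortest detectable epidemic shrinks quickly with alpha:
n = 10**6
thresholds = {a: n ** detection_exponent(a) for a in (0.0, 0.25, 0.4)}
# alpha = 0 gives n**(1/2) = 1000, alpha = 0.25 gives n**(1/3) = 100,
# alpha = 0.4 gives n**(1/6) = 10.
```

This illustrates why a heavier weight is attractive despite the stronger moment conditions it demands.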
For any index set A ⊂ {1, …, n}, define

  V²(A) := Σ_{i∈A} X_i².

Then V_k² = V²({1, …, k}), V_0² = 0. To simplify notation we write v_k for the random points of [0,1]

  v_k := V_k²/V_n²,  k = 0, 1, …, n.

For ρ ∈ R define

  SUI(n, ρ) = max_{1≤i<j≤n} |S(j) − S(i) − S(n)(v_j − v_i)| / ρ̄(v_j − v_i).

Introduce for any t ∈ [0,1]

  τ_n(t) := max{i ≤ n : V_i² ≤ tV_n²}

and, denoting by log₂ the logarithm to base 2 (log₂(2^t) = t = log(e^t)),

  J_n := log₂(V_n²/X_{n:n}²),  where X_{n:n} := max_{1≤k≤n} |X_k|.

Now define

  SDI(n, ρ) := max_{1≤j≤J_n} (1/ρ(2^{−j})) max_{r∈D_j} |S(τ_n(r)) − ½S(τ_n(r⁺)) − ½S(τ_n(r⁻))|.
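The stopping times τ_n and the random number of levels J_n translate directly into code. The following sketch (our own naming, specialized to ρ(h) = h^α so that 1/ρ(2^{−j}) = 2^{jα}) computes SDI:

```python
import numpy as np

def sdi_stat(x, alpha):
    """SDI(n, rho) for rho(h) = h^alpha, using
    tau_n(t) = max{i <= n : V_i^2 <= t V_n^2} and levels j <= J_n."""
    x = np.asarray(x, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(x)))
    v2 = np.concatenate(([0.0], np.cumsum(x**2)))   # V_i^2, i = 0..n
    vn2 = v2[-1]
    xnn = np.max(np.abs(x))                         # X_{n:n}
    Jn = int(np.log2(vn2 / xnn**2))

    def tau(t):  # largest i with V_i^2 <= t * V_n^2
        return int(np.searchsorted(v2, t * vn2, side='right') - 1)

    best = 0.0
    for j in range(1, max(Jn, 0) + 1):
        for l in range(1, 2**(j - 1) + 1):
            r = (2 * l - 1) * 2.0**(-j)
            incr = abs(s[tau(r)] - 0.5 * s[tau(r + 2.0**(-j))]
                                 - 0.5 * s[tau(r - 2.0**(-j))])
            best = max(best, 2.0**(j * alpha) * incr)
    return best
```

As with SUI, the limit theorems below concern the normalized quantity V_n^{−1} SDI(n, ρ).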
Theorem 5. Assume that under (H0), μ_0 = 0 and the random variables X_1, …, X_n are either symmetric and belong to the domain of attraction of the normal law, or E|X_1|^{2+ε} < ∞ for some ε > 0. Then for every ρ ∈ R,

  V_n^{−1} SUI(n, ρ) →_D UI(ρ)  as n → ∞   (12)

and

  V_n^{−1} SDI(n, ρ) →_D DI(ρ)  as n → ∞.   (13)

Theorem 6. Let ρ ∈ R. Under (HA) with μ_0 = 0, assume that the X_i's satisfy

  S(I_n) = l*μ_1 + O_P(l*^{1/2}),  S(I_n^c) = O_P((n − l*)^{1/2})   (14)

and

  V²(I_n)/l* →_P b_1,  V²(I_n^c)/(n − l*) →_P b_0,   (15)

for some finite constants b_0 and b_1. Assume that l*/n converges to a limit c ∈ [0,1] and, when c = 0 or 1, assume moreover that

  lim_{n→∞} n^{1/2} h_n/ρ(h_n) = ∞,  where h_n := (l*/n)(1 − l*/n).   (16)
Then

  V_n^{−1} SUI(n, ρ) →_P ∞  as n → ∞.   (17)

Note that in Theorem 6, no assumption is made about the dependence structure of the X_i's. It is easy to verify the general hypotheses (14) and (15) in various situations of weak dependence like mixing or association. Under independence of the X_i's (without assuming identical distributions inside each block I_n and I_n^c), it is enough to have sup_{k≥1} E|X_k|^{2+ε} < ∞ and E V²(I_n)/l* → b_1, E V²(I_n^c)/(n − l*) → b_0. This follows easily from Lindeberg's condition for the central limit theorem (giving (14)) and from Theorem 5, p. 261 in Petrov (1975), giving the weak law of large numbers required for the X_i²'s.

Theorem 7. Let ρ ∈ R and set X̃_i := X_i − E(X_i). Under (HA) with μ_0 = 0, assume that the random variables X̃_i are independent identically distributed and that E|X̃_1|^{2+ε} is finite for some ε > 0. Then under (16),

  V_n^{−1} SDI(n, ρ) →_P ∞  as n → ∞.   (18)
Remark 4. Theorems 5 and 7 are proved using the Hölderian FCLT for self-normalized partial sums processes (see Račkauskas and Suquet, 2001a). As far as we know, removing the assumption of a finite 2 + ε moment in the non-symmetric case for the Hölderian FCLT remains an open problem. Of course when there is no weight (ρ(h) = 1), we can apply the weak invariance principle in C[0,1] under self-normalization. In this special case, Theorems 5 and 7 are valid with X_1 (resp. X̃_1) in the domain of attraction of the normal distribution.

2.4. Trimmed statistics

Let ρ ∈ R and let the sequence m_n → ∞ be such that m_n ≤ log₂ n. Define

  DI(n, ρ, m_n) := max_{1≤j≤m_n} (1/ρ(2^{−j})) max_{r∈D_j} |S(nr) − ½(S(nr⁺) + S(nr⁻))|.

Theorem 8. If under (H0), E|X_1|^{2+δ} < ∞ for some 0 < δ ≤ 1 and

  lim_{n→∞} m_n^{1+δ} n^{−δ/2} ρ^{−2−δ}(2^{−m_n}) = 0   (19)

then

  σ^{−1} n^{−1/2} DI(n, ρ, m_n) →_D DI(ρ)  as n → ∞.

Let (ε_n) and (ε′_n) be two sequences of positive real numbers, both converging to 0. Define, with t_k = k/n,

  UI(n, ρ, ε_n, ε′_n) := max_{nε_n≤i<j≤n(1−ε′_n)} |S(j) − S(i) − S(n)(t_j − t_i)| / ρ̄(t_j − t_i).
Theorem 9. If under (H0), E|X_1|^{2+δ} < ∞ for some 0 < δ ≤ 1 and

  lim_{n→∞} n^{−δ/2} ρ^{−2−δ}(ε_n ε′_n) log^{1+δ} n = 0,

then

  σ^{−1} n^{−1/2} UI(n, ρ, ε_n, ε′_n) →_D UI(ρ)  as n → ∞.

2.5. The distribution of DI(ρ)

The distribution function of DI(ρ) may be conveniently expressed in terms of the error function

  erf x = (2/π^{1/2}) ∫_0^x exp(−s²) ds.

The following asymptotic expansion will be useful:

  erf x = 1 − exp(−x²)/(x π^{1/2}) (1 + O(x^{−2})),  x → ∞.

Theorem 10. Let c = lim sup_{j→∞} j^{1/2}/θ(2^j).

(i) If c = ∞ then DI(ρ) = ∞ almost surely.
(ii) If 0 ≤ c < ∞, then DI(ρ) is almost surely finite and its distribution function is given by

  P(DI(ρ) ≤ x) = ∏_{j=1}^∞ {erf(θ(2^j) x)}^{2^{j−1}},  x > 0.

The distribution function of DI(ρ) is continuous with support [c(log 2)^{1/2}, ∞).

Remark 5. The exact distribution of UI(ρ) is not known. To obtain critical values for UI(n, ρ), one can use a general result on large deviations for Gaussian norms (see e.g. Ledoux and Talagrand, 1991), which reads

  P(UI(ρ) > x) ≤ 4 exp(−x²/(8 E‖B‖²_ρ)),  x > 0.   (20)

Another approach is to use the equivalence of the norms ‖B‖_ρ and ‖B‖^seq_ρ, which provides constants c_ρ and C_ρ such that

  c_ρ DI(ρ) ≤ UI(ρ) ≤ C_ρ DI(ρ).   (21)

Both methods give only rough estimates of the critical value for UI(n, ρ).
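Theorem 10 makes DI(ρ) directly usable for critical values. The following sketch (our own code, specialized to ρ(h) = h^α so that θ(2^j) = 2^{j(1/2−α)}) evaluates the infinite product truncated at a finite number of levels, which converges very fast, and inverts it by bisection:

```python
import math

def di_cdf(x, alpha, levels=60):
    """Truncated product for P(DI(rho) <= x) with rho(h) = h^alpha:
    prod_{j>=1} erf(theta(2^j) x)^{2^{j-1}}, theta(t) = t^{1/2} rho(1/t)."""
    if x <= 0.0:
        return 0.0
    log_p = 0.0
    for j in range(1, levels + 1):
        theta = 2.0 ** (j * (0.5 - alpha))
        e = math.erf(theta * x)
        if e <= 0.0:
            return 0.0
        log_p += 2.0 ** (j - 1) * math.log(e)   # 0.0 once erf saturates at 1
    return math.exp(log_p)

def di_quantile(p, alpha, lo=0.0, hi=50.0):
    """Bisection for the p-quantile of DI(rho), rho(h) = h^alpha."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if di_cdf(mid, alpha) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Because θ(2^j) grows geometrically, the factors reach 1 in floating point after a few dozen levels, so the truncation error is negligible for quantiles of practical interest.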
3. Proofs

3.1. Tools

The proofs of Theorems 3 and 5 are based on invariance principles in Hölder spaces for the partial sums processes ξ_n = (ξ_n(t), t ∈ [0,1]) and ζ_n = (ζ_n(t), t ∈ [0,1]) defined by

  ξ_n is the polygonal line with vertices (k/n, S(k)), 0 ≤ k ≤ n,   (22)
  ζ_n is the polygonal line with vertices (V_k²/V_n², S(k)), 0 ≤ k ≤ n.   (23)

The relevant invariance principles are proved in Račkauskas and Suquet (2001b) and Račkauskas and Suquet (2001a) respectively and may be stated as follows.

Theorem 11. Let ρ be in R. Then under (H0) with μ_0 = 0, the convergence

  σ^{−1} n^{−1/2} ξ_n →_D W

takes place in H^o_ρ if and only if condition (5) holds.

Theorem 12. Under the conditions of Theorem 5,

  V_n^{−1} ζ_n →_D W in H^o_ρ

for any ρ ∈ R.

The next two lemmas are useful to simplify and unify the proofs of Theorems 3 and 5.

Lemma 13. Let (η_n)_{n≥1} be a tight sequence of random elements in the separable Banach space B and let g_n, g be continuous functionals B → R. Assume that g_n converges pointwise to g on B and that (g_n)_{n≥1} is equicontinuous. Then

  g_n(η_n) = g(η_n) + o_P(1).

Proof. By the tightness assumption, there is for every ε > 0 a compact subset K of B such that for every n ≥ 1, P(η_n ∉ K) < ε. Now by a classical corollary of Ascoli's theorem, the equicontinuity and pointwise convergence of (g_n) give its uniform convergence to g on the compact K. Then for every δ > 0, there is some n_0 = n_0(δ, K) such that

  sup_{x∈K} |g_n(x) − g(x)| < δ,  n ≥ n_0.

Therefore we have for n ≥ n_0,

  P(|g_n(η_n) − g(η_n)| ≥ δ) ≤ P(η_n ∉ K) < ε,

which is the expected conclusion.
Lemma 14. Let (B, ‖·‖) be a normed vector space and q : B → R such that

(a) q is subadditive: q(x + y) ≤ q(x) + q(y), x, y ∈ B;
(b) q is symmetric: q(−x) = q(x), x ∈ B;
(c) for some constant C, q(x) ≤ C‖x‖, x ∈ B.

Then q satisfies the Lipschitz condition

  |q(x + y) − q(x)| ≤ C‖y‖,  x, y ∈ B.   (24)

If F is any set of functionals q fulfilling (a), (b), (c) with the same constant C, then (a), (b), (c) are inherited by g(x) := sup{q(x) : q ∈ F}, which therefore satisfies (24).

The proof is elementary and will be omitted.

3.2. Proof of Theorem 3

Convergence of UI(n, ρ). Consider the functionals g_n, g defined on H^o_ρ by

  g_n(x) := max_{1≤i<j≤n} I(x; i/n, j/n),  g(x) := sup_{0<s<t<1} I(x; s, t),   (25)

where

  I(x; s, t) := |x(t) − x(s) − (t − s)x(1)| / ρ̄(t − s),  0 < t − s < 1,   (26)

and ρ̄(h) = ρ(h(1 − h)), h ∈ [0,1]. Observe that

  UI(n, ρ) = g_n(ξ_n),  UI(ρ) = g(W).   (27)

Clearly the functional q = I(·; s, t) satisfies conditions (a) and (b) of Lemma 14. Let us check condition (c). From Definition 1 it is clear that for ρ ∈ R, the ratio ρ(h)/ρ(h/2) is continuous on (0,1] and has the finite limit 2^α at zero. Hence it is bounded on [0,1] by a constant a = a(ρ) < ∞. Similarly the ratio ρ*(h) := h/ρ(h) vanishes at 0 and is bounded on [0,1] by some b = b(ρ) < ∞. If 0 < t − s ≤ 1/2, then ρ̄(t − s) ≥ ρ((t − s)/2) ≥ a^{−1} ρ(t − s) and

  I(x; s, t) ≤ a |x(t) − x(s)|/ρ(t − s) + a ((t − s)/ρ(t − s)) |x(1)|   (28)
    ≤ a(1 + b) ‖x‖_ρ.   (29)

If 1/2 < t − s < 1, then ρ̄(t − s) ≥ ρ((1 − t + s)/2) ≥ a^{−1} ρ(1 − t + s) and ρ(1 − t + s) is bigger than both ρ(s) and ρ(1 − t). Assuming that x(0) = 0, we write

  x(t) − x(s) − (t − s)x(1) = x(t) − x(1) + x(0) − x(s) + (1 − t + s)x(1)

and obtain

  I(x; s, t) ≤ a |x(1) − x(t)|/ρ(1 − t) + a |x(s) − x(0)|/ρ(s) + a ((1 − t + s)/ρ(1 − t + s)) |x(1)|   (30)
    ≤ a(2 + b) ‖x‖_ρ.   (31)
Introduce the closed subspace B := {x ∈ Ho( ; x(0) = 0}. From (29) and (31) we see that the functionals q = I (:; s; t) satisfy on B the Condition (c) of Lemma 14 with the same constant C = C(() = a(2 + b). It follows by Lemma 14 that gn as well as g are Lipschitz on B with the same constant C. In particular the sequence (gn )n¿2 is equicontinuous on B. Now we shall apply Lemma 13 with :n := !−1 n−1=2 $n with $n deJned by (22). By Theorem 11, (:n )n¿2 is tight on B. To check the pointwise convergence on B of gn to g, it is enough to show that for each x ∈ B, the function (s; t) → I (x; s; t) can be extended by continuity to the compact set T = {(s; t) ∈ [0; 1]2 ; 0 6 s 6 t 6 1}. From (28) we get 0 6 I (x; s; t) 6 a!( (x; t − s) + a|x(1)|(? (t − s), which allows the continuous extension along the diagonal putting I (x; s; s) := 0. From (30) we get 0 6 I (x; s; t) 6 2a!( (x; 1 + t − s) + a|x(1)|(? (1 + t − s) which allows the continuous extension at the point (0; 1) putting I (x; 0; 1) := 0. The pointwise convergence of (gn ) being now established, Lemma 13 gives gn (!−1 n−1=2 $n ) = g(!−1 n−1=2 $n ) + oP (1)
(32)
and the convergence of !−1 n−1=2 UI(n; () to UI(() follows from Theorem 11, (27) and (32). Xi
Convergence of DI(n; (). Let DI (n; () be deJned like DI(n; (), replacing Xi by = Xi − EXi . As Xi − Xi + c(r)EX1 ; Xi − Xi = nr − ¡i6nr
nr¡i6nr +
nr¡i6nr +
nr − ¡i6nr
with |c(r)| 6 1, we clearly have n−1=2 (DI(n; () − DI (n; ()) = oP (1), so we may and do assume without loss of generality that 0 = 0 in the proof. First we observe that 1 max |2r (S(n :))|; 1¡2 6n ((2−j ) r∈Dj
DI(n; () = max j
(33)
where the 2r are deJned by (2) and S(n :) is the discontinuous process (S(nt); 0 6 t 6 1). This process may also be written as S(nt) = $n (t) − (nt − [nt])X[nt]+1 ;
t ∈ [0; 1];
from which we see that |2r (S(n :)) − 2r ($n )| 6 2 max |Xi |: 16i6n
Plugging this estimate into (33) leads to the representation !−1 n−1=2 DI(n; () = gn (!−1 n−1=2 $n ) + Zn ; where gn (x) := max max j 1¡2 6n r∈Dj
|2r (x)| ; ((2−j )
x ∈ Ho(
(34)
A. Ra5ckauskas, C. Suquet / Journal of Statistical Planning and Inference 126 (2004) 495 – 520 509
and the random variable Zn satisJes 2 2 max |Xi |: |Zn | 6 1=2 max |Xi | = !n ((1=n) 16i6n !/(n) 16i6n
(35)
Hypothesis (5) in Theorem 3 gives immediately the convergence in probability to zero of max16i6n |Xi |=/(n). The functionals qr (x) := |2r (x)|=((r − r − ) satisfy obviously the hypotheses of Lemma 14 with the same constant C = 1. The same holds for gn = max1¡2j 6n maxr∈Dj qr and for g(x) := sup max qr (x): j¿1 r∈Dj
Accounting (34) and (35), Lemma 13 leads to the representation !−1 n−1=2 DI(n; () = g(!−1 n−1=2 $n ) + oP (1); from which the conclusion follows by Theorem 11 and continuous mapping. 3.3. Proof of Theorem 4 Divergence of UI(n; (). DeJne the random variables Xi − 0 if i ∈ Icn ; Xi := Xi − 1 if i ∈ In : In order to Jnd a lower bound for UI(n; (), let us consider the contribution to its numerator of the increment corresponding to the full length of epidemics. It may be written as l∗ l∗ S(In ) − S(Icn ) S(m∗ ) − S(k ∗ ) − S(n)(tm∗ − tk ∗ ) = 1 − n n l∗ (1 − 0 ) + Rn ; = l∗ 1 − (36) n where
l∗ l∗ Xi + 1 − Xi : Rn := − n n c i∈In
i∈In
It is easy to see that n−1=2 Rn = OP (hn1=2 ). Indeed 2 2 l∗ 1 l∗ 1 −1=2 ∗ 2 1− Rn ) 6 (n − l )!0 + l∗ !02 = !02 hn : var(n n n n n This estimate together with (36) leads to the lower bound hn1=2 −1=2 1=2 hn : UI(n; () ¿ n n |1 − 0 | − OP ((hn ) ((hn )
(37)
From Definition 1(iii), it is easily seen that lim_{h_n→0} h_n^{1/2}/ρ(h_n) = 0, so (10) follows from condition (9) via (37).
Divergence of DI(n, ρ). Let us first estimate λ_r(S(n·)) under the special configuration r⁻ ≤ t_{k*} < t_{m*} ≤ r (then necessarily l*/n ≤ 1/2). For notational simplification, define for 0 ≤ s < t ≤ 1 and μ ∈ R,

  S_n(s, t) := Σ_{ns<k≤nt} X_k  and  S_n(s, t; μ) := Σ_{ns<k≤nt} (X̃_k + μ).

Then we have

  2λ_r(S(n·)) = S_n(r⁻, r) − S_n(r, r⁺)
    = S_n(r⁻, t_{k*}; μ_0) + S_n(t_{k*}, t_{m*}; μ_1) + S_n(t_{m*}, r; μ_0) − S_n(r, r⁺; μ_0)
    = (μ_1 − μ_0) l* + O(1) + Σ_{nr⁻<k≤nr⁺} ε_k X̃_k,

where O(1) and the ε_k's are deterministic and ε_k = ±1. If the level j of r (r ∈ D_j) is such that 2^{−j−1} < l*/n ≤ 2^{−j}, this gives for DI(n, ρ) the same lower bound as the right-hand side of (37). Clearly the same result holds true when [t_{k*}, t_{m*}] is included in [r, r⁺]. Denoting by z the middle of [t_{k*}, t_{m*}], we obtain the same lower bound (up to multiplicative constants) under the configurations where [z, t_{m*}] ⊂ [r⁻, r] or [t_{k*}, z] ⊂ [r, r⁺]. Assume still that l*/n ≤ 1/2 and fix the level j by 2^{−j−1} < l*/n ≤ 2^{−j}. There is a unique r_0 ∈ D_j such that r_0⁻ ≤ z < r_0⁺. Then the divergence of n^{−1/2} DI(n, ρ) follows from hypothesis (9) by applying the above remarks in each of the following cases.

(a) r_0⁻ ≤ z < r_0⁻ + 2^{−j−1}. Then z + l*/(2n) ≤ r_0⁻ + 2^{−j−1} + 2^{−j−1} = r_0, so [z, t_{m*}] ⊂ [r_0⁻, r_0].
(b) r_0 + 2^{−j−1} ≤ z < r_0⁺. Then z − l*/(2n) ≥ r_0, so [t_{k*}, z] ⊂ [r_0, r_0⁺].
(c) r_0 − 2^{−j−1} ≤ z < r_0 + 2^{−j−1}. Then r_0⁻ ≤ t_{k*} < t_{m*} ≤ r_0⁺. Only one of the two dyadics r_0⁻ and r_0⁺ has level j − 1. If r_0⁻ ∈ D_{j−1}, writing r_1 := r_0⁻ we have r_1⁺ = r_0⁺ and [t_{k*}, t_{m*}] ⊂ [r_1, r_1⁺]. Else r_0⁺ ∈ D_{j−1} and, with r_1 := r_0⁺, r_1⁻ = r_0⁻, so that [t_{k*}, t_{m*}] ⊂ [r_1⁻, r_1].

It remains to consider the case where l*/n > 1/2. Assume first that t_{k*} ≥ 1 − t_{m*}, so that t_{k*} ≥ (1 − l*/n)/2. Then there is a unique j such that 0 < 2^{−j−1} < t_{k*} ≤ 2^{−j} ≤ 1/2 < t_{m*}. Choosing r_0 := 2^{−j} ∈ D_j, we easily obtain

  2λ_{r_0}(S(n·)) = (μ_0 − μ_1)(n t_{k*}) + O(1) + Σ_{nr_0⁻<k≤nr_0⁺} ε_k X̃_k.

Now the divergence of n^{−1/2} DI(n, ρ) follows from (9) in view of the lower bound

  |λ_{r_0}(S(n·))| ≥ |μ_0 − μ_1| (n − l*)/4 − O_P((n − l*)^{1/2}).

When t_{k*} < 1 − t_{m*}, fixing j by 1 − 2^{−j} ≤ t_{m*} < 1 − 2^{−j−1} and choosing r_0 := 1 − 2^{−j} ∈ D_j leads clearly to the same lower bound. The proof is complete.
3.4. Proof of Theorem 5

Convergence of SUI(n, ρ). We use the straightforward representation

  SUI(n, ρ) = max_{1≤i<j≤n} I(ζ_n; V_i²/V_n², V_j²/V_n²),   (38)

where ζ_n and I are defined by (23) and (26) respectively. By Lemma 9 in Račkauskas and Suquet (2001a), if X_1 is in the domain of attraction of the normal distribution then

  max_{0≤k≤n} |V_k²/V_n² − k/n| →_P 0  as n → ∞.   (39)

Theorem 12 and the same arguments as in the proof of Theorem 3 give

  g_n(V_n^{−1} ζ_n) = max_{1≤i<j≤n} I(V_n^{−1} ζ_n; i/n, j/n) = g(V_n^{−1} ζ_n) + o_P(1),   (40)

with g defined by (25). By continuous mapping, the right-hand side of (40) converges in distribution to UI(ρ). Hence the convergence of V_n^{−1} SUI(n, ρ) to UI(ρ) will follow from the estimate

  V_n^{−1} SUI(n, ρ) − g_n(V_n^{−1} ζ_n) = o_P(1).

Due to the tightness in H^o_ρ of (V_n^{−1} ζ_n)_{n≥1}, this follows in turn from (38), (39) and the purely analytical next lemma.

Lemma 15. For any x ∈ H^o_ρ such that x(0) = 0, denote by I_x the continuous function

  I_x : T → R,  (s, t) → I(x; s, t),

where T := {(s, t) ∈ [0,1]² : 0 ≤ s ≤ t ≤ 1} and I is defined by (26) and its continuous extension to the boundary of T, putting I(x; s, s) := 0 and I(x; 0, 1) := 0. Let K be a compact subset of H^o_ρ of functions x such that x(0) = 0. Then the set of functions E_K := {I_x : x ∈ K} is equicontinuous.

Proof. Put for notational simplification y(t) := x(t) − tx(1) and define

  ẏ_ρ̄(s, t) := (y(t) − y(s))/ρ̄(|t − s|),  0 < |t − s| < 1,

with ẏ_ρ̄(s, t) := 0 when |t − s| = 0 or 1. This reduces the problem to the proof of the equicontinuity of E′_K := {ẏ_ρ̄ : T → R, y ∈ K′}, where K′ is a compact subset of H^o_ρ of functions vanishing at 0 and 1. It is convenient to estimate the increments of y in terms of the function ρ̄(h) = ρ(h(1 − h)). First we always have, for 0 ≤ s ≤ s + h ≤ 1,

  |y(s + h) − y(s)| ≤ ρ(h) ω_ρ(y, h).   (41)
Because y(0) = y(1) = 0 and ρ and ω_ρ(y; ·) are non-decreasing, we also have, for 0 ≤ s ≤ s + h ≤ 1,

|y(s + h) − y(s)| ≤ |y(s + h) − y(1)| + |y(s) − y(0)|
 ≤ ρ(1 − s − h) ω_ρ(y; 1 − s − h) + ρ(s) ω_ρ(y; s)
 ≤ 2 ρ(1 − h) ω_ρ(y; 1 − h).   (42)

Recalling the constant a = a(ρ) = sup{ρ(h)/ρ(h/2); 0 < h ≤ 1} < ∞, we see that

ρ(h) ∧ ρ(1 − h) ≤ a ρ(h(1 − h)) = a ρ̄(h).   (43)

By (41)–(43) there is a constant c = c(ρ) such that

|y(s + h) − y(s)| ≤ c ρ̄(h) ω_ρ(y; h ∧ (1 − h)),  0 ≤ s ≤ s + h ≤ 1.

Writing for simplicity ω̃_ρ(y; h) := c ω_ρ(y; h ∧ (1 − h)), the above estimate leads to

y(t) = y(s) + ẏ(s, t) ρ̄(t − s),  (s, t) ∈ [0, 1]²,   (44)

where

|ẏ(s, t)| ≤ ω̃_ρ(y; |t − s|).   (45)
Using expansion (44), we get, for any (s, t) and (s', t') in the interior of T (writing h := t − s and h' := t' − s'),

ẏ(s', t') − ẏ(s, t) = (y(t) − y(s)) (ρ̄(h) − ρ̄(h'))/(ρ̄(h) ρ̄(h')) + (ẏ(t, t') ρ̄(|t' − t|) − ẏ(s, s') ρ̄(|s' − s|))/ρ̄(h')
 = ẏ(s, t) (ρ̄(h) − ρ̄(h'))/ρ̄(h') + (ẏ(t, t') ρ̄(|t' − t|) − ẏ(s, s') ρ̄(|s' − s|))/ρ̄(h').

Due to (45), this gives

|ẏ(s', t') − ẏ(s, t)| ≤ c_{y,ρ} (|ρ̄(h) − ρ̄(h')| + ρ̄(|t' − t|) + ρ̄(|s' − s|))/ρ̄(h').   (46)

To deal with the points close to the boundary of T, we complete this estimate with

|ẏ(s', t') − ẏ(s, t)| ≤ c_ρ (ω̃_ρ(y; h) + ω̃_ρ(y; h')).   (47)

Now the equicontinuity of E'_K follows easily from (46), (47), the continuity of ρ, the boundedness in H°_ρ of the compact K' and the uniform convergence to zero over K' of ω_ρ(y; δ) as δ goes to zero. This last property is again an application of Ascoli's theorem or of Dini's theorem.
Convergence of SDI(n, ρ). Recall that τ_n(t) = max{i ≤ n; V_i² ≤ t V_n²}, t ∈ [0, 1], and J_n = log₂(V_n²/X_{n:n}²), where X_{n:n} := max_{1≤k≤n} |X_k|. Note also that SDI(n, ρ) may be recast as

SDI(n, ρ) = max_{1≤j≤J_n} (1/ρ(2^{-j})) max_{r∈D_j} |λ_r(S ∘ τ_n)|.

By O'Brien (1980), X₁ is in the domain of attraction of the normal distribution if and only if

V_n^{-1} max_{1≤k≤n} |X_k| = o_P(1).   (48)

The polygonal random line ξ_n defined by (23) may be written as

ξ_n(t) = S(τ_n(t)) + ((t − V²_{τ_n(t)} V_n^{-2})/(X²_{τ_n(t)+1} V_n^{-2})) X_{τ_n(t)+1} =: S(τ_n(t)) + Δ_n(t).

Noting that |Δ_n(t)| ≤ X_{n:n}, we have |λ_r(Δ_n)| ≤ 2 X_{n:n} for every dyadic r. This provides us with the representation

V_n^{-1} SDI(n, ρ) = max_{1≤j≤J_n} (1/ρ(2^{-j})) max_{r∈D_j} |λ_r(V_n^{-1} ξ_n)| + Z_n,

where

|Z_n| ≤ 2 X_{n:n}/(V_n ρ(X_{n:n}² V_n^{-2})) = 2/θ(V_n² X_{n:n}^{-2}),

with θ(t) = t^{1/2} ρ(1/t). Recalling that for ρ ∈ R, lim_{t→∞} θ(t) = ∞, we see from (48) that Z_n = o_P(1). Introducing again the functionals q_r(x) := |λ_r(x)|/ρ(r − r⁻), g_n := max_{1≤j≤log₂ n} max_{r∈D_j} q_r and g(x) := sup_{j≥1} max_{r∈D_j} q_r(x), which satisfy the hypotheses of Lemma 14 with the same constant C = 1, we have

V_n^{-1} SDI(n, ρ) = g_{J_n}(V_n^{-1} ξ_n) + Z_n.

By (48), J_n goes to infinity in probability, so an obvious extension of Lemma 13 leads to the representation V_n^{-1} SDI(n, ρ) = g(V_n^{-1} ξ_n) + o_P(1), and the convergence of V_n^{-1} SDI(n, ρ) to DI(ρ) follows from Theorem 12 by continuous mapping.

3.5. Proof of Theorem 6

Divergence of SUI(n, ρ). We shall exploit the lower bound

SUI(n, ρ) ≥ |S(I_n) − S(n)(v_{m*} − v_{k*})|/ρ̄(v_{m*} − v_{k*}).   (49)
Writing again X̄_i := X_i − E(X_i), we get

S(I_n) − S(n)(v_{m*} − v_{k*}) = (V²(I_n^c)/V_n²) l* μ₁ + R_n,   (50)

where

R_n := (V²(I_n^c)/V_n²) Σ_{i∈I_n} X̄_i − (V²(I_n)/V_n²) Σ_{i∈I_n^c} X̄_i.

From (15) it follows that

V_n²/n = c b₁ + (1 − c) b₀ + o_P(1),   (51)

so by (14) and (15) we easily obtain

n^{-1/2} R_n = O_P(h_n^{1/2}).   (52)

Rewriting the principal term in (50) as

(V²(I_n^c)/V_n²) l* μ₁ = (n − l*) μ₁ ((n − l*)^{-1} V²(I_n^c))/(n^{-1} V_n²) (l*/n)

leads by (15) to the equivalence in probability

n^{-1/2} (V²(I_n^c)/V_n²) l* μ₁ ∼_P (μ₁ b₀/(c b₁ + (1 − c) b₀)) n^{1/2} h_n.   (53)

Finally, we also have from (14)

(V²(I_n)/V_n²)(V²(I_n^c)/V_n²) ∼_P (b₀ b₁/(c b₁ + (1 − c) b₀)²) h_n,

from which (recalling that ρ(h) = h^α L(1/h) with L slowly varying) we deduce that for some constant d > 0,

ρ̄(v_{m*} − v_{k*}) ≤ d ρ(h_n)(1 + o_P(1)).   (54)

Going back to (49) with the estimates (52), (53) and (54) provides the lower bound

n^{-1/2} SUI(n, ρ) ≥ (C − o_P(1)) n^{1/2} h_n/ρ(h_n) − O_P(h_n^{1/2}/ρ(h_n)),   (55)

with a positive constant C depending on μ₁, b₀, b₁, c and ρ. As ρ ∈ R, h_n^{1/2} = o(ρ(h_n)) when h_n goes to zero, so (55) gives the divergence of n^{-1/2} SUI(n, ρ), accounting for Hypothesis (16). Due to (51), the divergence of V_n^{-1} SUI(n, ρ) follows.
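To make the objects in this proof concrete, here is a small Python sketch (ours, not the authors') of a self-normalized uniform-increments statistic of the SUI type: increments of partial sums taken at the random points v_k = V_k²/V_n², weighted by ρ̄(h) = ρ(h(1 − h)) with the assumed admissible choice ρ(h) = h^α. The sketch mirrors the lower-bound expression (49) rather than the exact definition via (23) and (26).

```python
import numpy as np

def SUI(X, alpha=0.4):
    """Self-normalized uniform-increments statistic (a sketch)."""
    n = len(X)
    S = np.concatenate(([0.0], np.cumsum(X)))
    V2 = np.concatenate(([0.0], np.cumsum(np.square(X))))
    v = V2 / V2[-1]                       # self-normalized time points v_k
    best = 0.0
    for i in range(n):
        d = v[i + 1:] - v[i]              # increments v_j - v_i, j > i
        num = np.abs(S[i + 1:] - S[i] - d * S[-1])
        w = (d * (1.0 - d)) ** alpha      # weight rho(d * (1 - d))
        ok = w > 0
        if ok.any():
            best = max(best, (num[ok] / w[ok]).max())
    return best / np.sqrt(V2[-1])
```

For a constant sample the bridge-type numerator vanishes identically, while an epidemic mean change inflates the increment over the epidemic interval, in line with the lower bound (55).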
3.6. Proof of Theorem 7

Divergence of SDI(n, ρ). For A ⊂ {1, …, n} and 1 ≤ k ≤ n, define

S̄(A) := Σ_{i∈A} X̄_i,  S̄(0) := 0,  S̄(k) := S̄({1, …, k})

and

V̄²(A) := Σ_{i∈A} X̄_i²,  V̄_k² := V̄²({1, …, k}),  v̄₀ := 0,  v̄_k := V̄_k²/V̄_n².

Denote by ξ̄_n the polygonal line with vertices (v̄_k, S̄(k)), 0 ≤ k ≤ n. As a preliminary step, we look for some control on the ratios V_n/V̄_n and (v_{m*} − v_{k*})/(v̄_{m*} − v̄_{k*}). Clearly

V_n²/V̄_n² = 1 + l* μ₁²/V̄_n² + 2 μ₁ S̄(I_n)/V̄_n².

Under the assumptions made on the sequence (X_i), V̄_n^{-1} S̄(I_n) = O_P(1) and n^{-1} V̄_n² converges in probability to σ² := E(X̄₁²), so

1 ≤ lim inf_{n→+∞} V_n²/V̄_n² ≤ lim sup_{n→+∞} V_n²/V̄_n² ≤ 1 + μ₁²/σ²  in probability.   (56)

Similarly we have

(v_{m*} − v_{k*})/(v̄_{m*} − v̄_{k*}) = (V̄_n²/V_n²) (V̄²(I_n) + l* μ₁² + 2 μ₁ S̄(I_n))/V̄²(I_n)
 = (V̄_n²/V_n²) (1 + (μ₁² l*/V̄²(I_n))(1 + 2 S̄(I_n)/(μ₁ l*))).

Applying the weak law of large numbers to V̄²(I_n)/l* and S̄(I_n)/l*, we see that, in probability,

(1 + μ₁²/σ²)^{-1} ≤ lim inf_{n→+∞} (v_{m*} − v_{k*})/(v̄_{m*} − v̄_{k*}) ≤ lim sup_{n→+∞} (v_{m*} − v_{k*})/(v̄_{m*} − v̄_{k*}) ≤ 1 + μ₁²/σ².   (57)

The statistic SDI(n, ρ) may be expressed as

SDI(n, ρ) = max_{1≤j≤J_n} (1/(2ρ(2^{-j}))) max_{r∈D_j} |Σ_{r<v_k≤r⁺} X_k − Σ_{r⁻<v_k≤r} X_k|.

Define the random index J by 2^{-J} ≥ v_{m*} − v_{k*} > 2^{-J-1} and consider first the configuration where r⁻ ≤ v_{k*} < v_{m*} ≤ r, with r ∈ D_J. Then

V_n^{-1} SDI(n, ρ) ≥ l* |μ₁|/(2 V_n ρ(2^{-J})) − R_n/2,
where

R_n := (1/(V_n ρ(2^{-J}))) |Σ_{r<v_k≤r⁺} X̄_k − Σ_{r⁻<v_k≤r} X̄_k|.

Due to (56), (57), the definition of J and the form of ρ, there are some positive random variables K_i, not depending on n, such that

l* |μ₁|/(V_n ρ(2^{-J})) ≥ K₁ l*/(V_n ρ(v_{m*} − v_{k*})) ≥ K₂ (l*/V̄_n²) V̄_n/ρ(V̄²(I_n)/V̄_n²) ≥ K₃ n^{1/2} h_n/ρ(h_n),

and this lower bound goes to infinity in probability by Condition (16). So it remains to check that R_n = O_P(1). To this end, we split R_n into four blocks R_{n,1}, …, R_{n,4}, indexed respectively by r⁻ < v_k with k ≤ k*; k ∈ I_n; v_k ≤ r with k > m*; and r < v_k ≤ r⁺. Recall that τ_n(t) := max{i ≤ n; v_i ≤ t}. To bound R_{n,1}, note that if τ_n(r⁻) ≥ k*, then R_{n,1} = 0. Assuming now that τ_n(r⁻) ≤ k*, we get

R_{n,1} = (1/ρ(2^{-J})) (V̄_n/V_n) |S̄((τ_n(r⁻), k*])|/V̄_n
 ≤ (V̄_n/V_n) ‖V̄_n^{-1} ξ̄_n‖_ρ ρ(v̄_{k*} − v̄_{τ_n(r⁻)})/ρ(2^{-J})
 ≤ (V̄_n/V_n) ‖V̄_n^{-1} ξ̄_n‖_ρ ρ(v̄_{k*} − v̄_{τ_n(r⁻)})/ρ(v_{k*} − v_{τ_n(r⁻)}).   (58)

In the bound (58), V̄_n/V_n = O_P(1) by (56) and ‖V̄_n^{-1} ξ̄_n‖_ρ = O_P(1) by Theorem 12. To bound the last fraction, we observe that

(v̄_{k*} − v̄_{τ_n(r⁻)})/(v_{k*} − v_{τ_n(r⁻)}) = (V̄²((τ_n(r⁻), k*])/V̄_n²) (V_n²/V²((τ_n(r⁻), k*])) = V_n²/V̄_n²,

since X_i = X̄_i for i ∈ (τ_n(r⁻), k*]. As ρ is of the form ρ(h) = h^α L(1/h) with L slowly varying, it easily follows from (56) that

ρ(v̄_{k*} − v̄_{τ_n(r⁻)})/ρ(v_{k*} − v_{τ_n(r⁻)}) = O_P(1).

Similarly, the control of R_{n,2} reduces to establishing the stochastic boundedness of the ratio ρ(v̄_{m*} − v̄_{k*})/ρ(v_{m*} − v_{k*}), which in turn follows from (57). The control of R_{n,3} and R_{n,4} is similar to that of R_{n,1}. Finally, the proof is completed by a discussion of the configurations as in the proof of Theorem 4.
3.7. Proof of Theorem 8

As in the proof of the convergence of DI(n, ρ) (Theorem 3), it is enough to consider the case μ₀ = 0. We shall assume σ² = 1. Denoting

ν_{ni}(j, t) = (1/(2ρ(2^{-j}))) n^{-1/2} [1{nt⁻ < i ≤ nt} − 1{nt < i ≤ nt⁺}],

where j = 1, …, m_n, t ∈ D_j and i = 1, …, n, we easily find the representation

DI(n, ρ; m_n) = max_{1≤j≤m_n} max_{t∈D_j} |Σ_{i=1}^n X_i ν_{ni}(j, t)|.

Let (η_i, i = 1, …, n) be a collection of independent standard normal random variables and consider

V_{n,ρ}(m_n) = max_{1≤j≤m_n} max_{t∈D_j} |Σ_{i=1}^n η_i ν_{ni}(j, t)|.

Set k_n = 1 + 2 + ⋯ + 2^{m_n−1} = 2^{m_n} − 1. For x = (x₁, …, x_{k_n}) ∈ R^{k_n}, write

‖x‖_∞ := max_{1≤l≤k_n} |x_l|.

Consider for each ε > 0 and r > ε a function f_{r,ε} : R^{k_n} → R with the following properties:
(i) for each x ∈ R^{k_n},

1{‖x‖_∞ ≤ r − ε} ≤ f_{r,ε}(x) ≤ 1{‖x‖_∞ ≤ r + ε};   (59)

(ii) the function f_{r,ε} is infinitely differentiable and for each ℓ ≥ 1 there exists an absolute constant C_ℓ > 0 such that

‖f_{r,ε}^{(ℓ)}‖ := sup{|f_{r,ε}^{(ℓ)}(x)(h)^ℓ|; x, h ∈ R^{k_n}, ‖h‖_∞ ≤ 1} ≤ C_ℓ ε^{-ℓ} log^{ℓ-1} k_n.   (60)

Here f_{r,ε}^{(ℓ)} denotes the ℓth derivative of the function f_{r,ε} and f_{r,ε}^{(ℓ)}(x)(h)^ℓ the corresponding differential. Such functions are constructed in Bentkus (1984). Denoting, for i = 1, …, n, X_{ni} = (X_i ν_{ni}(j, t); j = 1, …, m_n, t ∈ D_j) and Y_{ni} = (η_i ν_{ni}(j, t); j = 1, …, m_n, t ∈ D_j), we have

DI(n, ρ; m_n) = ‖Σ_{i=1}^n X_{ni}‖_∞,  V_{n,ρ}(m_n) = ‖Σ_{i=1}^n Y_{ni}‖_∞.

Applying property (59) to the functions f_ε = f_{r+ε,ε} and f_ε = f_{r−ε,ε}, we derive, for r > ε,

Δ_n(r) := |P(DI(n, ρ; m_n) ≤ r) − P(V_{n,ρ}(m_n) ≤ r)|
 ≤ max |E f_ε(Σ_{i=1}^n X_{ni}) − E f_ε(Σ_{i=1}^n Y_{ni})| + P(r − ε ≤ V_{n,ρ}(m_n) ≤ r + ε),

where the maximum extends over the two functions f_ε = f_{r−ε,ε} and f_ε = f_{r+ε,ε}.
Writing Z_{nk} = Σ_{i=1}^{k-1} X_{ni} + Σ_{i=k+1}^{n} Y_{ni} and noting that Z_{nk} + X_{nk} = Z_{n,k+1} + Y_{n,k+1}, we have

I := E f_ε(Σ_{i=1}^n X_{ni}) − E f_ε(Σ_{i=1}^n Y_{ni}) = Σ_{k=1}^n [E f_ε(Z_{nk} + X_{nk}) − E f_ε(Z_{nk} + Y_{nk})].

Next we shall use Taylor's expansion

f_ε(x + h) = f_ε(x) + f'_ε(x)h + 2^{-1} f''_ε(x)h² + R,

with interpolated remainder |R| ≤ 2^{1-δ} ‖f''_ε‖^{1-δ} ‖f'''_ε‖^{δ} ‖h‖_∞^{2+δ}, where 0 < δ ≤ 1. Noting that, for each k = 1, …, n,

E f'_ε(Z_{nk})(X_{nk}) = E f'_ε(Z_{nk})(Y_{nk}) = 0 and E f''_ε(Z_{nk})(X_{nk})² = E f''_ε(Z_{nk})(Y_{nk})²,

we clearly obtain

|I| ≤ 2^{1-δ} ‖f''_ε‖^{1-δ} ‖f'''_ε‖^{δ} Σ_{i=1}^n [E‖X_{ni}‖_∞^{2+δ} + E‖Y_{ni}‖_∞^{2+δ}].

Moreover,

E‖X_{ni}‖_∞^{2+δ} ≤ E|X₁|^{2+δ} n^{-(2+δ)/2} ρ^{-(2+δ)}(2^{-m_n})

and

E‖Y_{ni}‖_∞^{2+δ} ≤ c_δ n^{-(2+δ)/2} ρ^{-(2+δ)}(2^{-m_n}),

where c_δ depends on δ only. Finally, accounting for (60), we obtain

|I| ≤ c_δ ε^{-3} n^{-δ/2} ρ^{-(2+δ)}(2^{-m_n}) m_n^{1+δ}.

Therefore, under Condition (19), we have for each r > ε

lim sup_{n→∞} Δ_n(r) ≤ lim_{n→∞} P(r − ε ≤ V_{n,ρ}(m_n) ≤ r + ε).

If r < ε, we have evidently Δ_n(r) ≤ Δ_n(ε) + 2 P(V_{n,ρ}(m_n) ≤ ε); hence

lim sup_{n→∞} Δ_n(r) ≤ 3 lim_{n→∞} P(r − ε ≤ V_{n,ρ}(m_n) ≤ r + ε)   (61)

for all r > 0. Since for each a > 0

lim_{n→∞} P(V_{n,ρ}(m_n) ≤ a) = F_ρ(a) = P(DI(ρ) ≤ a)

and the distribution function of DI(ρ) is continuous, we complete the proof by letting ε → 0 in (61). The proof of Theorem 9 is similar to that of Theorem 8. The only differences are in the definition of the random vectors X_{ni} (the modification being evident) and in the dimension k_n, which is now O(n²).
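The one-summand-at-a-time replacement above (Z_{nk} + X_{nk} versus Z_{nk} + Y_{nk}) is the classical Lindeberg scheme: a smooth function of a sum of independent, centred, variance-matched summands changes little when the summands are swapped for Gaussians, the third-order Taylor remainder being controlled here by (60). A toy one-dimensional Python illustration (the test function and the distributions are our choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 400, 4000

def f(s):
    return np.tanh(s)                 # a smooth bounded test function

# centred, variance-1, non-Gaussian summands versus standard normals
X = rng.exponential(size=(reps, n)) - 1.0
Y = rng.standard_normal((reps, n))
gap = abs(f(X.sum(axis=1) / np.sqrt(n)).mean()
          - f(Y.sum(axis=1) / np.sqrt(n)).mean())
```

The gap is small because the first two moments of the summands match; in the proof the same comparison is run coordinate-wise in R^{k_n}, which is where the dimension-dependent factor log^{ℓ-1} k_n of (60) enters.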
3.8. Proof of Theorem 10

The result relies essentially on the representation of the standard Brownian motion as a series of triangular functions. For r ∈ D_j, j ≥ 1, let H_r be the L²[0,1]-normalized Haar function, defined on [0, 1] by

H_r(t) = +(r⁺ − r⁻)^{-1/2} = +2^{(j-1)/2} if t ∈ (r⁻, r],
H_r(t) = −(r⁺ − r⁻)^{-1/2} = −2^{(j-1)/2} if t ∈ (r, r⁺],
H_r(t) = 0 else.

Define moreover H₁(t) := 1_{[0,1]}(t). For r ∈ D_j, j ≥ 1, the triangular Faber–Schauder functions F_r are defined by

F_r(t) := (t − r⁻)/(r − r⁻) = 2^j (t − r⁻) if t ∈ (r⁻, r],
F_r(t) := (r⁺ − t)/(r⁺ − r) = 2^j (r⁺ − t) if t ∈ (r, r⁺],
F_r(t) := 0 else.

In the special case j = 0, we set F₁(t) := t. The F_r's are linked to the H_r's in the general case r ∈ D_j, j ≥ 1, by

F_r(t) = 2(r⁺ − r⁻)^{-1/2} ∫₀ᵗ H_r(s) ds = 2^{(j+1)/2} ∫₀ᵗ H_r(s) ds   (62)

and in the special case r = 1 by

F₁(t) = ∫₀ᵗ H₁(s) ds.

It is well known that any function x ∈ C[0, 1] such that x(0) = 0 may be expanded in the uniformly convergent series

x = λ₁(x) F₁ + Σ_{j=1}^∞ Σ_{r∈D_j} λ_r(x) F_r,

where the λ_r(x)'s are given by (2). Let us recall the Lévy–Kampé de Fériet representation of the Brownian motion, which was generalized by Shepp (1966) to any orthonormal basis. Let {Y₁, Y_r; r ∈ D_j, j ≥ 1} be a collection of independent standard normal random variables. Then the series

W(t) := Y₁ t + Σ_{j=1}^∞ Σ_{r∈D_j} Y_r ∫₀ᵗ H_r(s) ds,  t ∈ [0, 1],   (63)

converges almost surely uniformly on [0, 1], and W is a Brownian motion started at 0. Removing the first term in (63) gives a Brownian bridge

B(t) := W(t) − t W(1) = Σ_{j=1}^∞ Σ_{r∈D_j} Y_r ∫₀ᵗ H_r(s) ds,  t ∈ [0, 1].   (64)
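Expansions (62)–(64) are straightforward to implement. The Python sketch below (ours, not the authors') simulates a Brownian bridge by truncating the Faber–Schauder series at a level J; by (62), the coefficient of F_r at level j is λ_r(B) = 2^{-(j+1)/2} Y_r, the quantity whose weighted level-wise maxima build the sequential norm.

```python
import numpy as np

rng = np.random.default_rng(1)
J = 10                                    # truncation level of series (64)
t = np.linspace(0.0, 1.0, 2 ** J + 1)     # dyadic grid of mesh 2^-J

def F(r, j, tt):
    """Triangular Faber-Schauder function, peak 1 at r, support (r - 2^-j, r + 2^-j)."""
    return np.clip(1.0 - np.abs(tt - r) * 2.0 ** j, 0.0, None)

B = np.zeros_like(t)
Y = {}                                    # standard normal coefficients Y_r
for j in range(1, J + 1):
    r = (2 * np.arange(2 ** (j - 1)) + 1) * 2.0 ** (-j)   # dyadic level D_j
    Y[j] = rng.standard_normal(len(r))
    # contribution of level j: sum over r of 2^{-(j+1)/2} Y_r F_r, cf. (62)-(64)
    for rr, y in zip(r, Y[j]):
        B += 2.0 ** (-(j + 1) / 2) * y * F(rr, j, t)
```

On the simulated path the grid second difference at any r ∈ D_j recovers 2^{-(j+1)/2} Y_r exactly, which is the identity behind the explicit formula (65) below.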
Now, from (62) to (64), it is clear that the sequential Hölder norm of B may be written as

‖B‖_ρ^{seq} = sup_{j≥1} (1/(2^{1/2} θ(2^j))) max_{r∈D_j} |Y_r|.   (65)

Recalling that DI(ρ) has the same distribution as ‖B‖_ρ^{seq}, the explicit formula for its distribution function is easily derived from (65) using standard techniques for the maxima of independent identically distributed Gaussian variables.

References

Bentkus, V.Yu., 1984. Differentiable functions defined in the spaces c₀ and R^k. Lithuanian Math. J. 23 (2), 146–154.
Bingham, N.H., Goldie, C.M., Teugels, J.L., 1987. Regular Variation. Encyclopaedia of Mathematics and its Applications. Cambridge University Press, Cambridge.
Brodsky, B.E., Darkhovsky, B.S., 1993. Nonparametric Methods in Change Point Problems. Kluwer Academic Publishers, Dordrecht.
Csörgő, M., Horváth, L., 1997. Limit Theorems in Change-Point Analysis. Wiley, New York.
Lamperti, J., 1962. On convergence of stochastic processes. Trans. Amer. Math. Soc. 104, 430–435.
Ledoux, M., Talagrand, M., 1991. Probability in Banach Spaces. Springer, Berlin, Heidelberg.
O'Brien, G.L., 1980. A limit theorem for sample maxima and heavy branches in Galton–Watson trees. J. Appl. Probab. 17, 539–545.
Petrov, V.V., 1975. Sums of Independent Random Variables. Springer, Berlin.
Račkauskas, A., Suquet, Ch., 2001a. Invariance principles for adaptive self-normalized partial sums processes. Stochastic Process. Appl. 95, 63–81.
Račkauskas, A., Suquet, Ch., 2001b. Necessary and sufficient condition for the Hölderian functional central limit theorem. Pub. IRMA Lille 57-I, preprint.
Semadeni, Z., 1982. Schauder bases in Banach spaces of continuous functions. Lecture Notes in Mathematics, Vol. 918. Springer, Berlin.
Shepp, L.A., 1966. Radon–Nikodym derivatives of Gaussian measures. Ann. Math. Statist. 37, 321–354.
Yao, Q., 1993. Tests for change-points with epidemic alternatives. Biometrika 80, 179–191.