Exponential and polynomial tailbounds for change-point estimators


Journal of Statistical Planning and Inference 92 (2001) 73–109

www.elsevier.com/locate/jspi

Dietmar Ferger*
Department of Mathematics, Dresden University of Technology, Mommsenstrasse 13, 01062 Dresden, Germany
Received 2 June 1999; received in revised form 25 January 2000; accepted 1 May 2000

Abstract

Let $X_{1n},\dots,X_{nn}$ be independent random elements with an unknown change point $\theta \in (0,1)$, that is, $X_{in}$ has a distribution $\nu_1$ or $\nu_2$, respectively, according to $i \le [n\theta]$ or $i > [n\theta]$. We propose an estimator $\hat\theta_n$ of $\theta$, which is defined as the maximizer of a weighted empirical process on $(0,1)$. Finding upper bounds of polynomial and exponential type for the tails of $n\hat\theta_n - [n\theta]$, we are able to derive rates of almost sure convergence, of distributional convergence, of $L_p$-convergence and of convergence in the Ky Fan and in the Prokhorov metric. © 2001 Elsevier Science B.V. All rights reserved.

MSC: 62F05; 62J05

Keywords: Change-point estimator; Exponential and polynomial tail bounds; Rates of convergence; Martingale maximal inequalities; Weighted empirical processes

1. Introduction

Let $X_{1n},\dots,X_{nn}$, $n \in \mathbb{N}$, be a triangular array of rowwise independent random variables defined on a common probability space $(\Omega,\mathcal{A},P)$ with values in a measurable space $(\mathcal{X},\mathcal{F})$. We assume that there exists a real number $\theta \in (0,1)$ such that $X_{in}$ has a distribution $\nu_1$ or $\nu_2$, respectively, according to $i \le [n\theta]$ or $i > [n\theta]$. The goal is to estimate the unknown change point $\theta$. Here nothing is known about the underlying distributions $\nu_1$ and $\nu_2$ except that $\nu_1 \ne \nu_2$. Our estimator is based on the empirical process
$$ r_n(t) = n^{-2}\sum_{i=[nt]+1}^{n}\sum_{j=1}^{[nt]} K(X_{in},X_{jn}), \qquad 0 \le t \le 1, $$

∗ Fax: +49-0351-4637287. E-mail address: [email protected] (D. Ferger).

0378-3758/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0378-3758(00)00148-8


where $K : \mathcal{X}^2 \to \mathbb{R}$ is any $\mathcal{F}\otimes\mathcal{F}$-measurable mapping. The process $r_n$ was introduced by Csörgő and Horváth (1988) with $K$ either symmetric or antisymmetric. If the triangular array $(X_{in})$ arises in the usual way from a sequence $(X_i)$ of i.i.d. random variables (no change), they prove a weak approximation of $r_n$ in weighted sup-norms by a Gaussian process. Their results were extended by Szyszkowicz (1991) to the case of contiguous alternatives and by Ferger (1994a) to a two-parameter generalization of $r_n$. We define our estimator as
$$ \hat\theta_n = \arg\max_{t\in G_n} w(t)\,|r_n(t)|, \tag{1.1} $$
where $G_n = \{kn^{-1}: 1 \le k \le n-1\}$ and $w : (0,1) \to (0,\infty)$ is a weight function. Besides $\hat\theta_n$ we also consider its one-sided counterpart
$$ \hat\theta_n^+ = \arg\max_{t\in G_n} w(t)\,r_n(t). \tag{1.2} $$
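As a concrete illustration of (1.1)–(1.2), the estimator can be computed from the kernel matrix with two-dimensional prefix sums. The following sketch is our own illustration (the function name, the simulated mean-shift data, and the particular kernel and weight are assumptions, not taken from the paper):

```python
import numpy as np

def change_point_estimate(x, K, w):
    """Weighted estimator (1.1): argmax over t = k/n (k = 1, ..., n-1) of
    w(t) * |r_n(t)|, where r_n(t) = n^-2 sum_{i>nt} sum_{j<=nt} K(x_i, x_j)."""
    n = len(x)
    M = K(x[:, None], x[None, :])        # kernel matrix K(x_i, x_j)
    P = M.cumsum(axis=0).cumsum(axis=1)  # 2-D prefix sums of M
    k = np.arange(1, n)
    # sum of M over rows k+1..n and columns 1..k (1-based indices)
    block = P[n - 1, k - 1] - P[k - 1, k - 1]
    t = k / n
    return t[np.argmax(w(t) * np.abs(block / n**2))]

rng = np.random.default_rng(0)
theta, n = 0.4, 400
x = np.concatenate([rng.normal(0.0, 1.0, int(n * theta)),
                    rng.normal(2.0, 1.0, n - int(n * theta))])
# antisymmetric kernel K(x, y) = x - y, weight w(t) = (t(1-t))^(-1/2)
est = change_point_estimate(x, lambda a, b: a - b,
                            lambda t: (t * (1 - t)) ** -0.5)
```

For the one-sided variant (1.2) one would simply drop the absolute value; the prefix-sum trick keeps the cost at $O(n^2)$ kernel evaluations instead of $O(n^3)$.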

Ferger and Stute (1992) introduced $\hat\theta_n^+$, but with no weights ($w \equiv 1$). They prove an exponential inequality for the tails of $\hat\theta_n^+ - \theta$ provided the kernel $K$ is bounded. As a consequence, $|\hat\theta_n^+ - \theta| \le Cn^{-1}\log n$ eventually for all $n \in \mathbb{N}$ with probability one, for some positive constant $C$. Ferger (1994c) investigates $\hat\theta_n$ in the case of small disorders. He points out that $\hat\theta_n$ with $w \equiv 1$ overestimates or underestimates the true change point if $\theta$ is close to zero or one, respectively. In order to eliminate these boundary effects, weight functions need to be introduced. It is shown, for $\nu_1 - \nu_2 = \nu_{1n} - \nu_{2n}$ converging to zero at a rate $\delta_n$ (in some specified sense), that $n\delta_n^2(\hat\theta_n - \theta)$ converges in distribution to the a.s. unique maximizer of a two-sided Brownian motion with a negative linear drift. However, very restrictive assumptions on $w$ are needed, in particular the boundedness of $w$. Natural candidates for weight functions, namely
$$ w(t) = t^{-a}(1-t)^{-b}, \qquad 0 < t < 1,\ 0 \le a, b \le 1, \tag{1.3} $$
are therefore excluded. In this paper we consider a class of weight functions — including those in (1.3) — which is given by the following conditions: there are $0 < \tau_0 \le \tau_1 < 1$ such that
$$ w \text{ is monotone decreasing on } (0,\tau_0) \text{ and monotone increasing on } (\tau_1,1), \tag{1.4} $$
$$ \sup_{0<t<\tau_0} t\,w(t) < \infty \quad\text{and}\quad \sup_{\tau_1<t<1} (1-t)\,w(t) < \infty \tag{1.5} $$
and
$$ w \text{ is Lipschitz-continuous on each closed subinterval of } (0,1). \tag{1.6} $$

It will turn out (see Remark 2.4 below) that with probability one the weighted empirical processes $w(t)r_n(t)$ converge uniformly on $(0,1)$ to the deterministic function $\rho(t) = w(t)r(t)$


with
$$ r(t) = \mathbf{1}_{(0,\theta]}(t)\,t\,[\kappa(\theta-t) + \gamma(1-\theta)] + \mathbf{1}_{(\theta,1)}(t)\,(1-t)\,[\gamma\theta + \lambda(t-\theta)], $$
where $\kappa = \int K\,d\nu_1\otimes\nu_1$, $\lambda = \int K\,d\nu_2\otimes\nu_2$ and $\gamma = \int K\,d\nu_2\otimes\nu_1$. A forerunner of this result can be found in Theorem 3.1 of Csörgő and Horváth (1988). They prove that
$$ r_n(t) \to r(t) \quad\text{as } n \to \infty \quad P\text{-stochastically} $$
for each fixed $t \in [0,1]$. Condition (1.5) is not only technical but has a natural explanation: it guarantees the boundedness of the function $\rho(t)$ on $(0,1)$. This is essential because the crucial condition for consistency of our estimator $\hat\theta_n$ in (1.1) is given through the functional shape of the limit function $\rho$. It is required that the change point $\theta$ is the unique maximizer of $|\rho|$ and, in addition, that $|\rho|$ has a peak at $\theta$. Analytically this means the validity of either (P+) or (P−), where

(P+) $\quad \rho(\theta) - \rho(t) \ge L\,|t-\theta| \quad \forall t \in (0,1) \quad$ and $\quad -\inf_{0<t<1}\rho(t) < \rho(\theta)$

(P−) $\quad \rho(t) - \rho(\theta) \ge L\,|t-\theta| \quad \forall t \in (0,1) \quad$ and $\quad -\sup_{0<t<1}\rho(t) > \rho(\theta)$

with $L$ denoting a positive constant. For short we say that (P) holds. If only the first condition of (P+) is fulfilled we say that (P$^+$) holds, which is the corresponding condition for the one-sided estimator $\hat\theta_n^+$ in (1.2). As already mentioned, the process
$$ \rho_n(t) = w(t)r_n(t), \qquad 0 < t < 1, $$
has been studied by Csörgő and Horváth (1988) under the hypothesis of no change. The difference to their approach mainly lies in that we do not consider the process $\rho_n(t)$, $0 < t < 1$, but the pertaining discrete localized process
$$ Y_n(t) = n\left[\rho_n\!\left(\frac{[n\theta]}{n} + \frac{t}{n}\right) - \rho_n\!\left(\frac{[n\theta]}{n}\right)\right], \qquad t \in \mathbb{Z}. $$
This process is completely different from $\rho_n$ and will be analyzed under the alternative of a change. As a consequence we do not obtain Csörgő and Horváth's (1988) weight functions of Chibisov–O'Reilly type, which instead of (1.5) need to be compared with $t^{-1/2}$ and $(1-t)^{-1/2}$ in a neighborhood of zero and one, respectively. The reason why we introduce the process $Y_n$ is that it allows for the representation
$$ n\hat\theta_n - [n\theta] = \arg\max_{t\in\mathbb{Z}} Y_n(t). $$

It will turn out that $Y_n$ converges to a discrete process $Y$ in an appropriate distributional sense, where $Y(t) = w(\theta)S(t) + \Delta(t)$ with
$$ \Delta(t) = \begin{cases} \rho'(\theta+)\,t, & t \ge 0, \\ \rho'(\theta-)\,t, & t < 0, \end{cases} $$
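The shape of the limit process is easy to see by simulation. The sketch below is our own illustration (the Gaussian steps and the drift slope $\pm 0.3$ are arbitrary assumptions): it draws a centered two-sided random walk $S(t)$, adds a piecewise-linear drift $\Delta(t)$ that is negative on both sides of the origin, as it is under (P+), and locates the argmax, which is a.s. finite by the Strong Law of Large Numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500  # horizon on each side of the origin

# centered two-sided random walk S(t) on Z with S(0) = 0
right = np.concatenate([[0.0], rng.normal(0, 1, T).cumsum()])
left = np.concatenate([[0.0], rng.normal(0, 1, T).cumsum()])[::-1]
S = np.concatenate([left[:-1], right])
t = np.arange(-T, T + 1)

# drift Delta(t): slope rho'(theta+) = -0.3 for t >= 0 and
# rho'(theta-) = +0.3 for t < 0, i.e. negative drift on both sides
drift = np.where(t >= 0, -0.3 * t, 0.3 * t)
Y = S + drift
t_star = int(t[np.argmax(Y)])  # argmax of the drifted walk
```

Because $Y(0) = 0$ and the drift dominates the fluctuations of $S$, the maximizer concentrates near the origin — exactly the tightness phenomenon quantified by the tail bounds below.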


and $S(t)$ being a centered two-sided random walk on $\mathbb{Z}$. Now it becomes clear how the peak property (P) comes into play. Namely, if e.g. (P+) holds, then the random walk $Y(t)$ has a negative linear drift function. Thus, by the Strong Law of Large Numbers, there is at least one maximizer of $Y(t)$. Finally, we will use arguments very close to those of the argmax Continuous Mapping Theorem of van der Vaart and Wellner (1996, p. 286). Roughly speaking, it states that if $Y_n$ converges to $Y$, then the argmax of $Y_n$ converges to the argmax of $Y$. However, the essential condition there, and in our case, is the tightness of the sequence $n\hat\theta_n - [n\theta] = \arg\max Y_n(t)$. For that reason, upper bounds for the tail probabilities
$$ P(|n\hat\theta_n - [n\theta]| > x), \qquad x > 0, $$

are indispensable. Note that there is no symmetry or antisymmetry assumption on the kernel $K$. However, for antisymmetric $K$ the properties (P) and (P$^+$) may simplify considerably, for then $\kappa = \lambda = 0$ by Fubini's Theorem.

Example 1.1. Let $K$ be antisymmetric and let $w$ be as in (1.3). Then (P) or (P$^+$) holds iff $\gamma \ne 0$ or $\gamma > 0$, respectively. Many examples of antisymmetric kernels with that property are given in Ferger (1994c).

Several estimators known in the change-point literature are special cases of $\hat\theta_n$ or $\hat\theta_n^+$.

Example 1.2. Let $\mathcal{X} = \mathbb{R}$, $K(x,y) = \mathbf{1}\{x \le y\}$ and $w(t) = \mathbf{1}_{[a,1-a]}(t)\,t^{-1}(1-t)^{-1}$ for some $a > 0$ with $[a,1-a]$ containing $\theta$. Then $\hat\theta_n^+$ coincides with Darkhovskh's (1976) estimator.

Example 1.3. Let $\mathcal{X} = \mathbb{R}$, $K(x,y) = x - y$ and $w(t) = t^{-1/2}(1-t)^{-1/2}$. Then
$$ \hat\theta_n = \frac{1}{n}\arg\max_{1\le k\le n-1} (k(n-k))^{1/2}\left| \frac{1}{k}\sum_{i=1}^{k}X_{in} - \frac{1}{n-k}\sum_{i=k+1}^{n}X_{in} \right|. \tag{1.7} $$
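Equation (1.7) can be checked directly against definition (1.1): with $K(x,y) = x - y$ and $w(t) = t^{-1/2}(1-t)^{-1/2}$ one has $w(k/n)|r_n(k/n)| = n^{-1}(k(n-k))^{1/2}$ times the absolute difference of the two subsample means, so both objectives have the same maximizer. A small numerical confirmation (our own sketch; the simulated data are an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
x = np.concatenate([rng.normal(0, 1, 24), rng.normal(1.5, 1, 36)])
k = np.arange(1, n)

# definition (1.1) with K(x, y) = x - y, w(t) = t^(-1/2) (1-t)^(-1/2):
# r_n(k/n) = n^-2 [ k * sum_{i>k} x_i - (n-k) * sum_{j<=k} x_j ]
csum = x.cumsum()
total = csum[-1]
r = (k * (total - csum[:-1]) - (n - k) * csum[:-1]) / n**2
t = k / n
obj_def = np.abs(r) / np.sqrt(t * (1 - t))

# closed form (1.7), up to the common factor 1/n:
means_diff = csum[:-1] / k - (total - csum[:-1]) / (n - k)
obj_17 = np.sqrt(k * (n - k)) * np.abs(means_diff) / n

k_hat_def = k[np.argmax(obj_def)]
k_hat_17 = k[np.argmax(obj_17)]
```

Both objective vectors agree elementwise (up to floating-point rounding), hence so do the maximizing indices.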

We see that, for each $1 \le k \le n-1$, the sample $X_{1n},\dots,X_{nn}$ is divided into two subsamples $X_{1n},\dots,X_{kn}$ and $X_{k+1,n},\dots,X_{nn}$, and the corresponding subsample means are compared. If, for example, the first moments of $\nu_1$ and $\nu_2$ differ, then the index $k$ with a maximal distance between the subsample means is indeed a plausible estimator for the true moment $m = [n\theta]$ of change. The estimator in (1.7) is due to Bhattacharya and Brockwell (1976), who derived it in a normal-shift model by a two-step maximum likelihood method. Note that $\hat\theta_n$ can be rewritten as
$$ \hat\theta_n = \frac{1}{n}\arg\max_{1\le k\le n-1} |T_n(k)|, $$
where
$$ T_n(k) = \frac{\dfrac{1}{k}\displaystyle\sum_{i=1}^{k}X_{in} - \dfrac{1}{n-k}\displaystyle\sum_{i=k+1}^{n}X_{in}}{\sqrt{\dfrac{1}{k} + \dfrac{1}{n-k}}} $$


is the well-known t-test statistic for the hypothesis of equality of the means of the first $k$ and the last $(n-k)$ observations. Antoch and Hušková (1998) extend Bhattacharya and Brockwell's (1976) estimator to weight functions $w(t) = t^{-a}(1-t)^{-a}$, $0 \le a \le 1/2$.

In this paper we study the asymptotic properties of $\hat\theta_n$ and $\hat\theta_n^+$. For the sake of simplicity, the results are only stated for $\hat\theta_n$, but everything carries over to $\hat\theta_n^+$ under (P$^+$). In Section 2 we derive upper bounds for the tails of $n\hat\theta_n - [n\theta]$. They are the key bounds for establishing rates of almost sure convergence (Section 3), of convergence in law (Section 4), of $L_p$-convergence (Section 5) and of convergence in the Prokhorov and Ky Fan metrics (Section 6). The appendix contains several inequalities which are needed in our proofs and which may be of interest on their own.

2. Exponential and polynomial tail bounds

First we have to investigate the deviation between the weighted empirical process $\rho_n(t) = w(t)r_n(t)$ and its limit $\rho(t) = w(t)r(t)$. For that purpose we consider
$$ \beta_n(\varepsilon) = P\left( \sup_{t\in G_n} w(t)\,|r_n(t) - \bar r_n(t)| > \varepsilon \right), \qquad \varepsilon > 0, $$

where $\bar r_n(t) = Er_n(t)$ denotes the mean process. The next proposition gives an upper bound for $\beta_n(\varepsilon)$, which involves the following quantities:
$$ M_p = \max\left\{ \int |K|^p\,d\nu_i\otimes\nu_j : 1 \le i, j \le 2 \right\}, \qquad p \ge 1, $$
$$ \gamma_{n,0}(p) = -\int_{1/n}^{\tau_0} t^{\varphi(p)}\,w^p(dt) \quad\text{and}\quad \gamma_{n,1}(p) = \int_{\tau_1}^{1-1/n} (1-t)^{\varphi(p)}\,w^p(dt), $$
where $\varphi(p) = \max(p/2,1)$ and $w$ satisfies (1.4). Finally, put
$$ c_n(p) = \gamma_{n,0}(p) + \gamma_{n,1}(p). $$

Proposition 2.1. (1) Assume $M_p < \infty$ for some $p \ge 1$ and let $w$ satisfy (1.4) and (1.5). Then there exists a positive constant $C_p$ such that for all $\varepsilon > 0$
$$ \beta_n(\varepsilon) \le C_pM_p\varepsilon^{-p}\begin{cases} (1+c_n(p))n^{-(p-1)} + n^{-(2p-3)}, & 1 \le p < 2, \\ (1+c_n(2)+\log n)\,n^{-1}, & p = 2, \\ (1+c_n(p))\,n^{-p/2}, & p > 2. \end{cases} \tag{2.1} $$
If $w$ is bounded then we actually have
$$ \beta_n(\varepsilon) \le C_pM_p\varepsilon^{-p}n^{-\psi(p)} \tag{2.2} $$
with $\psi(p) = \min(p/2,\,p-1)$.
(2) Assume $K$ is bounded ($p = \infty$) and let $w$ satisfy (1.4) and
$$ \sup_{0<t<\tau_0} t^{1/2}w(t) < \infty \quad\text{and}\quad \sup_{\tau_1<t<1} (1-t)^{1/2}w(t) < \infty. \tag{2.3} $$


Then there exists a positive constant $C$ such that for all $\varepsilon > 0$
$$ \beta_n(\varepsilon) \le 24n\exp\{-Cn\varepsilon^2\}. \tag{2.4} $$

Proof. Let $A, B, C$, etc. denote generic positive constants. We have
$$ \beta_n(\varepsilon) \le P(W_n > \varepsilon) + P(V_n > \varepsilon), \tag{2.5} $$
where
$$ W_n = \max_{n\theta<k<n} w\!\left(\frac{k}{n}\right)\left| r_n\!\left(\frac{k}{n}\right) - Er_n\!\left(\frac{k}{n}\right)\right| \quad\text{and}\quad V_n = \max_{1\le k\le n\theta} w\!\left(\frac{k}{n}\right)\left| r_n\!\left(\frac{k}{n}\right) - Er_n\!\left(\frac{k}{n}\right)\right|. $$
For the first term we have
$$ W_n \le n^{-2}\max_{n\theta<k<n} w\!\left(\frac{k}{n}\right)\left|\sum_{i=k+1}^{n}\sum_{j=1}^{[n\theta]}[K(X_{in},X_{jn}) - \gamma]\right| + n^{-2}\max_{n\theta<k<n} w\!\left(\frac{k}{n}\right)\left|\sum_{i=k+1}^{n}\sum_{j=[n\theta]+1}^{k}[K(X_{in},X_{jn}) - \lambda]\right| = W_{n1} + W_{n2}. \tag{2.6} $$
Put, for $x, y \in \mathcal{X}$,
$$ R_{12}(y) = \int K(x,y)\,\nu_2(dx), \qquad R_{21}(x) = \int K(x,y)\,\nu_1(dy) \tag{2.7} $$
and
$$ H(x,y) = K(x,y) - R_{12}(y) - R_{21}(x) + \gamma. $$
Then we have
$$\begin{aligned} W_{n1} &\le n^{-2}\max_{n\theta<k<n} w\!\left(\frac{k}{n}\right)\left|\sum_{i=k+1}^{n}\sum_{j=1}^{[n\theta]} H(X_{in},X_{jn})\right| + n^{-2}\max_{n\theta<k<n} w\!\left(\frac{k}{n}\right)(n-k)\left|\sum_{j=1}^{[n\theta]}[R_{12}(X_{jn}) - \gamma]\right| \\ &\quad + n^{-2}\max_{n\theta<k<n} w\!\left(\frac{k}{n}\right)[n\theta]\left|\sum_{i=k+1}^{n}[R_{21}(X_{in}) - \gamma]\right| = A_n + B_n + C_n. \end{aligned}$$
Note that, by (1.5),
$$ A_n = n^{-1}\max_{n\theta<k<n} w\!\left(\frac{k}{n}\right)\left(1-\frac{k}{n}\right)\frac{1}{n-k}\left|\sum_{i=1}^{n-k}\sum_{j=1}^{[n\theta]} H(X_{n-i+1,n},X_{jn})\right| \le Cn^{-1}\max_{1\le l\le n-[n\theta]}\frac{1}{l}|S_l|, \tag{2.8} $$


where
$$ S_l = \sum_{i=1}^{l}\sum_{j=1}^{[n\theta]} H(X_{n-i+1,n},X_{jn}) $$

is a martingale w.r.t. $\mathcal{F}_l = \sigma(X_{in}: i \in \{1,\dots,[n\theta]\}\cup\{n-l+1,\dots,n\})$, $1 \le l \le n-[n\theta]$. Therefore, by Chow's inequality for submartingales, it follows that
$$ P(A_n > \varepsilon/6) \le C\varepsilon^{-p}n^{-p}\left\{ \sum_{1\le l<n-[n\theta]}\left(\frac{1}{l^p} - \frac{1}{(l+1)^p}\right)E|S_l|^p + (n-[n\theta])^{-p}E|S_{n-[n\theta]}|^p \right\}. \tag{2.9} $$
An application of Lemma A.1 below yields
$$ E|S_l|^p \le a_l(p)\sum_{k=1}^{l} E\left|\sum_{j=1}^{[n\theta]} H(X_{n-k+1,n},X_{jn})\right|^p, \tag{2.10} $$
where
$$ a_l(p) := \begin{cases} B_p\,l^{p/2-1}, & p > 2, \\ 2, & 1 \le p \le 2. \end{cases} \tag{2.11} $$
Note that for all $1 \le j \le [n\theta]$ and for all $1 \le k \le l \le n-[n\theta]$, we have the relation $n-k+1 > [n\theta] \ge j$. Thus by conditioning on $X_{n-k+1,n}$ we obtain
$$ E\left|\sum_{j=1}^{[n\theta]} H(X_{n-k+1,n},X_{jn})\right|^p = \int E\left|\sum_{j=1}^{[n\theta]} H(x,X_{jn})\right|^p \nu_2(dx). \tag{2.12} $$
For each fixed $x \in \mathcal{X}$, the sequence $(H(x,X_{jn}): 1 \le j \le [n\theta])$ consists of i.i.d. centered random variables with
$$ m_p(x) := E|H(x,X_{jn})|^p = \int |H(x,y)|^p\,\nu_1(dy) \qquad \forall\,1 \le j \le [n\theta]. $$
A repeated application of Lemma A.1 to the integrand on the RHS of (2.12) gives
$$ E\left|\sum_{j=1}^{[n\theta]} H(x,X_{jn})\right|^p \le a_{[n\theta]}(p)\,[n\theta]\,m_p(x) \le a_n(p)\,n\,m_p(x) \qquad \forall x \in \mathcal{X}. $$
Using Hölder's inequality and Jensen's inequality, we get
$$ \int m_p(x)\,\nu_2(dx) \le 4^pM_p. $$
From (2.10)–(2.12) we may conclude that
$$ E|S_l|^p \le CM_p\,b_{n,l}(p) \qquad \forall\,1 \le l \le n-[n\theta], $$
where
$$ b_{n,l}(p) := \begin{cases} n^{p/2}\,l^{p/2}, & p > 2, \\ n\,l, & 1 < p \le 2, \end{cases} \qquad \forall n, l \in \mathbb{N}. \tag{2.13} $$


Taking into account that $l^{-p} - (l+1)^{-p} \le p\,l^{-p-1}$ for all $l \in \mathbb{N}$, a combination of (2.9) and (2.13) yields
$$ P(A_n > \varepsilon/6) \le CM_p\varepsilon^{-p}n^{-\psi(p)}, \tag{2.14} $$
where $\psi(p) = \min(p/2,\,p-1)$. If $K$ is bounded ($p = \infty$) we have to modify Hoeffding's (1963) arguments, because a direct application of his exponential inequality is possible but not sufficiently good. Since $A_n \le Cn^{-1}\max_{1\le l\le n-[n\theta]}(1/l)|S_l|$, it follows that
$$ P(A_n > \varepsilon/6) \le \sum_{l=1}^{n-[n\theta]} P\left( |S_l| > \frac{1}{6}C^{-1}\varepsilon n l \right). \tag{2.15} $$
Now for all $\eta > 0$ we have, with $y = (y_1,\dots,y_{[n\theta]})$,
$$\begin{aligned} P(S_l > \eta) &= \int P\left( \sum_{i=1}^{l}\sum_{j=1}^{[n\theta]} H(X_{n-i+1,n},y_j) > \eta \right)\nu_1^{[n\theta]}(dy) \\ &\le e^{-s\eta}\int E\prod_{i=1}^{l}\exp\left\{ s\sum_{j=1}^{[n\theta]} H(X_{n-i+1,n},y_j)\right\}\nu_1^{[n\theta]}(dy) \quad \text{for all } s > 0 \\ &= e^{-s\eta}\int\left[\int\exp\left\{ s\sum_{j=1}^{[n\theta]} H(x,y_j)\right\}\nu_2(dx)\right]^l\nu_1^{[n\theta]}(dy) \quad \text{by independence} \\ &\le e^{-s\eta}\int\int\exp\left\{ sl\sum_{j=1}^{[n\theta]} H(x,y_j)\right\}\nu_2(dx)\,\nu_1^{[n\theta]}(dy) \quad \text{by Jensen's inequality} \\ &= e^{-s\eta}\int E\exp\left\{ sl\sum_{j=1}^{[n\theta]} H(x,X_{jn})\right\}\nu_2(dx) \quad \text{by Fubini's Theorem.} \end{aligned}$$
According to Hoeffding's (1963) proof, the integrand is bounded by
$$ \exp\left\{ \tfrac{1}{2}s^2l^2n\|H\|^2 \right\} $$
uniformly for all $x \in \mathcal{X}$, with $\|\cdot\|$ denoting the sup-norm. Hence, we obtain
$$ P(S_l > \eta) \le \exp\{-s\eta + \tfrac{1}{2}s^2l^2n\|H\|^2\} \qquad \forall s > 0. $$
Minimization in $s$ finally gives
$$ P(S_l > \eta) \le \exp\{-\tfrac{1}{2}\eta^2l^{-2}n^{-1}\|H\|^{-2}\} \qquad \forall \eta > 0. $$
Taking $\eta = \frac{1}{6}C^{-1}\varepsilon nl > 0$, it follows from (2.15) that
$$ P(A_n > \varepsilon/6) \le 2n\exp\{-Cn\varepsilon^2\}. \tag{2.16} $$
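The minimization step is a routine calculus check, spelled out here for completeness:

```latex
\frac{d}{ds}\Big[-s\eta + \tfrac{1}{2}s^2 l^2 n\|H\|^2\Big]
  = -\eta + s\,l^2 n\|H\|^2 = 0
  \;\Longrightarrow\;
  s^{*} = \frac{\eta}{l^2 n\|H\|^2},
\qquad
-s^{*}\eta + \tfrac{1}{2}(s^{*})^2 l^2 n\|H\|^2
  = -\frac{\eta^2}{2\,l^2 n\|H\|^2}.
```

Substituting $\eta = \frac{1}{6}C^{-1}\varepsilon nl$ makes the exponent $-\varepsilon^2 n/(72C^2\|H\|^2)$, which is the $-Cn\varepsilon^2$ rate appearing in (2.16).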


Summing up (2.14) and (2.16) we obtain
$$ P(A_n > \varepsilon/6) \le \begin{cases} CM_p\varepsilon^{-p}n^{-\psi(p)}, & 1 \le p < \infty, \\ 2n\exp\{-Cn\varepsilon^2\}, & p = \infty. \end{cases} \tag{2.17} $$

Since, by (1.5),
$$ B_n \le Cn^{-1}\left|\sum_{j=1}^{[n\theta]}[R_{12}(X_{jn}) - \gamma]\right|, $$

an application of Markov's inequality, inequality (A.3) and Hoeffding's (1963) exponential inequality yields
$$ P(B_n > \varepsilon/6) \le \begin{cases} CM_p\varepsilon^{-p}n^{-\psi(p)}, & 1 \le p < \infty, \\ 2\exp\{-Cn\varepsilon^2\}, & p = \infty. \end{cases} \tag{2.18} $$
In the following we assume w.l.o.g. that $\theta < \tau_1$. Then we have
$$\begin{aligned} P(C_n > \varepsilon/6) &\le P\left( \max_{n\theta<k<n\tau_1} w\!\left(\frac{k}{n}\right)\left|\sum_{i=k+1}^{n}[R_{21}(X_{in}) - \gamma]\right| > \frac{1}{12}\varepsilon n \right) \\ &\quad + P\left( \max_{n\tau_1\le k<n} w\!\left(\frac{k}{n}\right)\left|\sum_{i=k+1}^{n}[R_{21}(X_{in}) - \gamma]\right| > \frac{1}{12}\varepsilon n \right) = P_{n1} + P_{n2}. \end{aligned}$$
Using the boundedness of $w$ on $(\theta,\tau_1)$ we find that
$$ P_{n1} \le P\left( \max_{1\le l\le n-[n\theta]}\left|\sum_{i=1}^{l}[R_{21}(X_{n-i+1,n}) - \gamma]\right| > C\varepsilon n \right) \le CM_p\varepsilon^{-p}n^{-\psi(p)}, $$
where the last inequality is a consequence of Doob's inequality for submartingales and inequality (A.3). An application of Chow's inequality together with (A.3) gives
$$\begin{aligned} P_{n2} &\le P\left( \max_{1\le l\le n-[n\tau_1]} w\!\left(1-\frac{l}{n}\right)\left|\sum_{i=1}^{l}[R_{21}(X_{n-i+1,n}) - \gamma]\right| > \frac{1}{12}\varepsilon n \right) \\ &\le C\varepsilon^{-p}n^{-p}\left\{ \sum_{1\le l<n-[n\tau_1]}\left[ w^p\!\left(1-\frac{l}{n}\right) - w^p\!\left(1-\frac{l+1}{n}\right)\right]E\left|\sum_{i=1}^{l}[R_{21}(X_{n-i+1,n}) - \gamma]\right|^p \right. \\ &\qquad\qquad\left. + w^p\!\left(\frac{[n\tau_1]}{n}\right)E\left|\sum_{i=1}^{n-[n\tau_1]}[R_{21}(X_{n-i+1,n}) - \gamma]\right|^p \right\} \\ &\le C\varepsilon^{-p}n^{-\psi(p)}\left\{ \sum_{k=[n\tau_1]+1}^{n-1}\left(1-\frac{k}{n}\right)^{\varphi(p)}\left[ w^p\!\left(\frac{k}{n}\right) - w^p\!\left(\frac{k-1}{n}\right)\right] + O(1)\right\} \\ &\le C\varepsilon^{-p}n^{-\psi(p)}\left\{ \int_{\tau_1}^{1-1/n}(1-t)^{\varphi(p)}\,w^p(dt) + O(1)\right\} \le C\varepsilon^{-p}n^{-\psi(p)}(\gamma_{n,1}(p) + A). \end{aligned}$$


Thus we obtain for $1 \le p < \infty$
$$ P(C_n > \varepsilon/6) \le C\varepsilon^{-p}n^{-\psi(p)}(\gamma_{n,1}(p) + 1). \tag{2.19} $$

If $K$ is bounded, then by Hoeffding's inequality and by (2.3),
$$ P(C_n > \varepsilon/6) \le \sum_{n\theta<k<n} P\left( \left|\sum_{i=k+1}^{n}[R_{21}(X_{in}) - \gamma]\right| > \frac{\varepsilon n}{6w(k/n)} \right) \le 2\sum_{n\theta<k<n}\exp\left\{ -C\frac{\varepsilon^2 n}{w^2(k/n)(1-k/n)}\right\} \le 2n\exp\{-C\varepsilon^2n\}. \tag{2.20} $$
From (2.8) and (2.17)–(2.20) we can conclude that
$$ P(W_{n1} > \varepsilon/2) \le \begin{cases} CM_p\varepsilon^{-p}n^{-\psi(p)}(1+\gamma_{n,1}(p)), & 1 \le p < \infty, \\ 6n\exp\{-C\varepsilon^2n\}, & p = \infty. \end{cases} \tag{2.21} $$

For the second term $W_{n2}$ in (2.6), we introduce
$$ R_{22}(x) = \int K(x,y)\,\nu_2(dy) \quad\text{and}\quad h(x,y) = K(x,y) - R_{12}(y) - R_{22}(x) + \lambda, \qquad x, y \in \mathcal{X}. $$
We then have
$$\begin{aligned} W_{n2} &\le n^{-2}\max_{n\theta<k<n} w\!\left(\frac{k}{n}\right)\left|\sum_{i=k+1}^{n}\sum_{j=[n\theta]+1}^{k} h(X_{in},X_{jn})\right| + n^{-2}\max_{n\theta<k<n} w\!\left(\frac{k}{n}\right)(n-k)\left|\sum_{j=[n\theta]+1}^{k}[R_{12}(X_{jn}) - \lambda]\right| \\ &\quad + n^{-2}\max_{n\theta<k<n} w\!\left(\frac{k}{n}\right)(k-[n\theta])\left|\sum_{i=k+1}^{n}[R_{22}(X_{in}) - \lambda]\right| = D_n + E_n + F_n. \end{aligned}$$

Observe that, by (1.5),
$$ D_n \le Cn^{-1}\max_{1\le l<n-[n\theta]}\frac{1}{n-[n\theta]-l}\left|\sum_{i=l+1}^{n-[n\theta]}\sum_{j=1}^{l} h(X_{[n\theta]+i,n},X_{[n\theta]+j,n})\right|. $$
With the abbreviations $m = n-[n\theta]$ and $\xi_i = X_{[n\theta]+i,n}$ we obtain
$$ P(D_n > \varepsilon/6) \le \sum_{1\le l<m} P_{m,l}, \tag{2.22} $$
where
$$\begin{aligned} P_{m,l} &= P\left( \frac{1}{m-l}\left|\sum_{i=l+1}^{m}\sum_{j=1}^{l} h(\xi_i,\xi_j)\right| > \frac{1}{6}C^{-1}\varepsilon n \right) \\ &= \int P\left( \frac{1}{m-l}\left|\sum_{i=l+1}^{m}\sum_{j=1}^{l} h(\xi_i,y_j)\right| > \frac{1}{6}C^{-1}\varepsilon n \right)d\nu_2^l(y_1,\dots,y_l) \end{aligned}$$


$$\begin{aligned} &\le C\varepsilon^{-p}n^{-p}(m-l)^{\varphi(p)-p}\int E\left|\sum_{j=1}^{l} h(\xi_1,y_j)\right|^p d\nu_2^l(y_1,\dots,y_l) \\ &= C\varepsilon^{-p}n^{-p}(m-l)^{\varphi(p)-p}\int E\left|\sum_{j=1}^{l} h(x,\xi_j)\right|^p \nu_2(dx) \le CM_p\varepsilon^{-p}n^{-p}(m-l)^{\varphi(p)-p}\,l^{\varphi(p)}. \end{aligned}$$
Here both inequalities follow from (A.3) and the last equality is an application of Fubini's Theorem. Therefore, by Lemma A.3,
$$ P(D_n > \varepsilon/6) \le CM_p\varepsilon^{-p}\begin{cases} n^{-(2p-3)}, & 1 \le p < 2, \\ n^{-1}\log n, & p = 2, \\ n^{-p/2}, & 2 < p < \infty. \end{cases} \tag{2.23} $$
The bounded case ($p = \infty$) is treated similarly to the derivation of (2.17). Namely, for all $\eta > 0$ we have
$$\begin{aligned} P\left( \sum_{i=l+1}^{m}\sum_{j=1}^{l} h(\xi_i,\xi_j) > \eta \right) &= \int P\left( \sum_{i=l+1}^{m}\sum_{j=1}^{l} h(\xi_i,y_j) > \eta \right)d\nu_2^l(y_1,\dots,y_l) \\ &\le e^{-s\eta}\int E\prod_{i=l+1}^{m}\exp\left\{ s\sum_{j=1}^{l} h(\xi_i,y_j)\right\}d\nu_2^l(y_1,\dots,y_l) \quad \forall s > 0 \\ &= e^{-s\eta}\int\left[\int\exp\left\{ s\sum_{j=1}^{l} h(x,y_j)\right\}\nu_2(dx)\right]^{m-l}d\nu_2^l(y_1,\dots,y_l) \\ &\le e^{-s\eta}\int\int\exp\left\{ s(m-l)\sum_{j=1}^{l} h(x,y_j)\right\}\nu_2(dx)\,d\nu_2^l(y_1,\dots,y_l) \quad \text{by Jensen's inequality} \\ &= e^{-s\eta}\int E\exp\left\{ s(m-l)\sum_{j=1}^{l} h(x,\xi_j)\right\}\nu_2(dx) \quad \text{by Fubini's Theorem} \\ &\le \exp\{-s\eta + \tfrac{1}{2}s^2(m-l)^2l\|h\|^2\}. \end{aligned}$$
The last expression attains its minimal value at $s = \eta\|h\|^{-2}(m-l)^{-2}l^{-1}$, whence $P_{m,l} \le 2\exp\{-Cn\varepsilon^2\}$ and consequently
$$ P(D_n > \varepsilon/6) \le 2n\exp\{-Cn\varepsilon^2\}. \tag{2.24} $$
With the same arguments as in the proofs of (2.19)–(2.21), one shows that
$$ P(E_n > \varepsilon/6) \le \begin{cases} CM_p\varepsilon^{-p}n^{-\psi(p)}, & 1 \le p < \infty, \\ 2\exp\{-Cn\varepsilon^2\}, & p = \infty \end{cases} \tag{2.25} $$


and
$$ P(F_n > \varepsilon/6) \le \begin{cases} CM_p\varepsilon^{-p}n^{-\psi(p)}(1+\gamma_{n,1}(p)), & 1 < p < \infty, \\ 2n\exp\{-Cn\varepsilon^2\}, & p = \infty. \end{cases} \tag{2.26} $$
Combining (2.6) and (2.22)–(2.26) we arrive at
$$ P(W_n > \varepsilon) \le CM_p\varepsilon^{-p}\begin{cases} n^{-(p-1)}(1+\gamma_{n,1}(p)) + n^{-(2p-3)}, & 1 \le p < 2, \\ n^{-1}(1+\gamma_{n,1}(2)+\log n), & p = 2, \\ n^{-p/2}(1+\gamma_{n,1}(p)), & 2 < p < \infty \end{cases} $$
and, if $K$ is bounded,
$$ P(W_n > \varepsilon) \le 12n\exp\{-Cn\varepsilon^2\}. $$
Analogous upper bounds for $P(V_n > \varepsilon)$ can be derived in the same manner, which shows (2.1) and (2.4). To make the proof of Proposition 2.1 complete, assume that $w$ is bounded. Then
$$ D_n \le Cn^{-2}\max_{1\le l\le n-[n\theta]}\left|\sum_{i=l+1}^{n-[n\theta]}\sum_{j=1}^{l} h(X_{[n\theta]+i,n},X_{[n\theta]+j,n})\right|, $$
whence by Lemma A.2
$$ P(D_n > \varepsilon/6) \le CM_p\varepsilon^{-p}n^{-2\psi(p)}. $$
Replacing (2.23) by this sharper bound gives (2.2).

Remark 2.2. The rates of convergence in Proposition 2.1 involve the growth behavior of the weight function $w$ at the boundary of $(0,1)$. If $w$ is differentiable such that
$$ w'(t) \ge -A_0t^{-(\mu+1)} \qquad \forall t \in (0,\tau_0) \tag{2.27} $$
and
$$ w'(t) \le A_1(1-t)^{-(\sigma+1)} \qquad \forall t \in (\tau_1,1) \tag{2.28} $$
with constants $A_0, A_1 > 0$ and $0 \le \mu, \sigma \le 1$, then for $p \ge 2$
$$ \gamma_{n,0}(p) \le C\begin{cases} 1, & 0 \le \mu < 1/2, \\ \log n, & \mu = 1/2, \\ n^{p(\mu-1/2)}, & 1/2 < \mu \le 1 \end{cases} \tag{2.29} $$
and for $3/2 < p < 2$
$$ \gamma_{n,0}(p) \le C\begin{cases} 1, & 0 \le \mu < 1/p, \\ \log n, & \mu = 1/p, \\ n^{\mu p-1}, & 1/p < \mu \le 1. \end{cases} \tag{2.30} $$
A corresponding result holds for $\gamma_{n,1}(p)$.


Example 2.3. Consider
$$ w(t) = t^{-a}(1-t)^{-b}, \qquad 0 \le a, b \le 1, $$
which satisfies (2.27) and (2.28) with $\mu = a$ and $\sigma = b$, respectively. So, for example, in the case of $a = b$ and $p > 2$, it follows from (2.1), (2.2) and (2.29) that
$$ \beta_n(\varepsilon) \le C_pM_p\varepsilon^{-p}\begin{cases} n^{-p/2}, & 0 \le a < 1/2, \\ n^{-p/2}\log n, & a = 1/2, \\ n^{-p(1-a)}, & 1/2 < a \le 1. \end{cases} $$
Note that $\beta_n(\varepsilon)$ in general does not converge to zero if $a = 1$. This means that weights of order $t^{-1}$ and $(1-t)^{-1}$ overdo it and therefore must be omitted.

Remark 2.4. It is easy to extend Proposition 2.1 to $\sup_{0<t<1} w(t)|r_n(t) - Er_n(t)|$. Since under (1.5)
$$ \sup_{0<t<1} w(t)\,|Er_n(t) - r(t)| \to 0 \quad\text{as } n \to \infty, $$
it follows that
$$ \sup_{0<t<1} w(t)\,|r_n(t) - r(t)| \to 0 \quad\text{as } n \to \infty, $$
$P$-stochastically or $P$-almost surely, respectively, according to $3/2 < p \le 2$ or $p > 2$.

The following lemma shows that $(\hat\theta_n)$ and $(\hat\theta_n^+)$ are stochastically equal, asymptotically.

Lemma 2.5. Assume (1.5) and the second part of (P+) hold. Then there exists $\varepsilon_0 > 0$ such that
$$ P(\hat\theta_n \ne \hat\theta_n^+) \le 2\beta_n(\varepsilon_0) \qquad \text{eventually for all } n \in \mathbb{N}. $$

Proof. An application of Lemma A.4 with $T = G_n$ and $f = wr_n$ yields
$$ P(\hat\theta_n \ne \hat\theta_n^+) \le P\left( \max_{t\in G_n} w(t)r_n(t) \le -\min_{t\in G_n} w(t)r_n(t) \right) = P_n. $$
According to the second part of (P+) we know that $\varepsilon_0 = \frac{1}{2}(\rho(\theta) + \inf_{0<t<1}\rho(t))$ is strictly positive. With $\bar r_n(t) = Er_n(t)$ and $m_n = \min_{t\in G_n} w(t)\bar r_n(t)$ it follows that
$$\begin{aligned} P_n &\le P\left( -\min_{t\in G_n} w(t)r_n(t) \ge w\!\left(\frac{[n\theta]}{n}\right)r_n(\theta) \right) \\ &\le P\left( -\min_{t\in G_n} w(t)r_n(t) \ge w\!\left(\frac{[n\theta]}{n}\right)r_n(\theta),\ w\!\left(\frac{[n\theta]}{n}\right)|r_n(\theta) - \bar r_n(\theta)| \le \varepsilon_0 \right) + \beta_n(\varepsilon_0). \end{aligned}$$
The first summand in this last expression is bounded from above by
$$ P\left( -\min_{t\in G_n} w(t)(r_n(t) - \bar r_n(t)) \ge w\!\left(\frac{[n\theta]}{n}\right)\bar r_n(\theta) - \varepsilon_0 + m_n \right) \le P\left( \max_{t\in G_n} w(t)|r_n(t) - \bar r_n(t)| \ge w\!\left(\frac{[n\theta]}{n}\right)\bar r_n(\theta) - \varepsilon_0 + m_n \right). $$


Observe that by (1.5)
$$ \liminf_{n\to\infty}\left[ w\!\left(\frac{[n\theta]}{n}\right)\bar r_n(\theta) + m_n \right] \ge \rho(\theta) + \inf_{0<t<1}\rho(t) = 2\varepsilon_0, $$
from which we can deduce the assertion of the lemma immediately.

Remark 2.6. Note that for antisymmetric $K$ with $\gamma > 0$ the second part of (P+) is automatically fulfilled because of $\inf_{0<t<1}\rho(t) \ge 0$. Especially, in this case $\varepsilon_0 = \frac{1}{2}\rho(\theta) = \frac{1}{2}w(\theta)\gamma\theta(1-\theta) > 0$ is suited.

In view of Lemma 2.5 it suffices to investigate the one-sided estimator $\hat\theta_n^+$. From a technical point of view we may omit the absolute value. In the next step we continue reducing the problem to the truncated estimator
$$ \hat\theta_n^* = \arg\max_{t\in G_n\cap(\alpha,\beta)} w(t)r_n(t), $$
where $0 < \alpha < \beta < 1$ are such that $\theta \in (\alpha,\beta)$.

Lemma 2.7. Under (1.5) and (P$^+$), there exists $\varepsilon_0 > 0$ such that
$$ P(\hat\theta_n^+ \ne \hat\theta_n^*) \le 4\beta_n(\varepsilon_0) \qquad \text{eventually for all } n \in \mathbb{N}. $$

Proof. First note that, because $\theta \in (\alpha,\beta)$, we have
$$\begin{aligned} P(\hat\theta_n^+ \ne \hat\theta_n^*) &\le P(\hat\theta_n^+ \notin G_n\cap(\alpha,\beta)) \le P\left( \max_{\substack{1\le k\le n\alpha \\ \text{or } n\beta\le k<n}} w\!\left(\frac{k}{n}\right)r_n\!\left(\frac{k}{n}\right) \ge w(\hat\theta_n^*)r_n(\hat\theta_n^*) \right) \\ &\le P\left( \max_{1\le k\le n\alpha} w\!\left(\frac{k}{n}\right)r_n\!\left(\frac{k}{n}\right) \ge w\!\left(\frac{[n\theta]}{n}\right)r_n(\theta) \right) + P\left( \max_{n\beta\le k<n} w\!\left(\frac{k}{n}\right)r_n\!\left(\frac{k}{n}\right) \ge w\!\left(\frac{[n\theta]}{n}\right)r_n(\theta) \right) = P_n + Q_n. \end{aligned}$$
Recall that $\theta$, by (P$^+$), is the unique maximizer of $\rho$, whence
$$ \varepsilon_0 := \tfrac{1}{2}\left( \rho(\theta) - \max_{t\notin(\alpha,\beta)}\rho(t) \right) $$
is strictly positive. Put
$$ k_n = \max_{t\in G_n,\ t\notin(\alpha,\beta)} w(t)\,Er_n(t) $$


and observe that
$$\begin{aligned} P_n &\le P\left( \max_{1\le k\le n\alpha} w\!\left(\frac{k}{n}\right)\left[ r_n\!\left(\frac{k}{n}\right) - Er_n\!\left(\frac{k}{n}\right)\right] \ge w\!\left(\frac{[n\theta]}{n}\right)r_n(\theta) - k_n \right) \\ &\le P\left( \max_{1\le k\le n\alpha} w\!\left(\frac{k}{n}\right)\left[ r_n\!\left(\frac{k}{n}\right) - Er_n\!\left(\frac{k}{n}\right)\right] \ge w\!\left(\frac{[n\theta]}{n}\right)r_n(\theta) - k_n,\ w\!\left(\frac{[n\theta]}{n}\right)|r_n(\theta) - \bar r_n(\theta)| \le \varepsilon_0 \right) + \beta_n(\varepsilon_0) \\ &\le P\left( \max_{1\le k\le n\alpha} w\!\left(\frac{k}{n}\right)\left[ r_n\!\left(\frac{k}{n}\right) - Er_n\!\left(\frac{k}{n}\right)\right] \ge w\!\left(\frac{[n\theta]}{n}\right)r(\theta) - \varepsilon_0 - k_n \right) + \beta_n(\varepsilon_0). \end{aligned}$$
Using $\limsup_{n\to\infty} k_n \le \max_{t\notin(\alpha,\beta)}\rho(t)$ and $\lim_{n\to\infty} w([n\theta]/n)\,r(\theta) = \rho(\theta)$, we obtain
$$ P_n \le 2\beta_n(\varepsilon_0) \qquad \text{eventually for all } n \in \mathbb{N}. $$
Treating the second probability $Q_n$ in the same manner completes the proof.

Remark 2.8. Let $K$ be antisymmetric and $w(t) = t^{-a}(1-t)^{-b}$ with $0 \le a, b < 1$. From Example 1.1 we know that (1.4)–(1.6) are satisfied and that (P$^+$) holds iff $\gamma > 0$. Moreover, we have $\varepsilon_0 = C_0\gamma$ for a certain positive constant $C_0$.

A simple combination of Lemmata 2.5 and 2.7 yields the following.

Corollary 2.9. Let $\mathcal{B}$ denote the Borel $\sigma$-algebra on $\mathbb{R}$ and assume that (1.5) and (P+) hold. Then there exists an $\varepsilon_0 > 0$ such that
$$ \sup_{B\in\mathcal{B}}\,|P(\hat\theta_n \in B) - P(\hat\theta_n^* \in B)| \le 6\beta_n(\varepsilon_0) \qquad \text{eventually for all } n \in \mathbb{N}. $$

Now we are in the position to prove the main result of this paper. It will be used to derive rates for several modes of convergence. For its formulation we introduce
$$ \delta_n(p) = \begin{cases} (1+c_n(p))n^{-(p-1)} + n^{-(2p-3)}, & 3/2 < p < 2, \\ (1+c_n(2)+\log n)n^{-1}, & p = 2, \\ (1+c_n(p))n^{-p/2}, & 2 < p < \infty. \end{cases} $$
This enables us to rewrite (2.1) of Proposition 2.1 shortly as $\beta_n(\varepsilon) \le C_pM_p\varepsilon^{-p}\delta_n(p)$. The following theorem provides upper bounds for the tail probability
$$ d_n(x) = P(|n\hat\theta_n - [n\theta]| > x), \qquad x > 0. $$

Theorem 2.10. (1) Assume that (1.4)–(1.6) and (P) are satisfied. If $M_p$ is finite for some $p > 3/2$ then there exist positive constants $A_p$, $B_p$ and $x_0$ such that for eventually all $n \in \mathbb{N}$
$$ d_n(x) \le A_pM_pL^{-p}\begin{cases} x^{-(p-1)} + n^{-(2p-3)}, & 3/2 < p < 2, \\ x^{-1} + n^{-1}\log n, & p = 2, \\ x^{-p/2} + n^{-p/2}, & 2 < p < \infty \end{cases} \;+\; B_pM_p\,\delta_n(p) $$


for all $x \ge x_0$. The constants $A_p$ and $B_p$ depend only on $w$ and $\theta$ and are continuous in $\theta$.
(2) Assume (1.4), (1.6), (2.3) and (P) are satisfied. If $K$ is bounded ($p = \infty$) then there exist positive constants $A, B, C$ and $D$ such that for eventually all $n \in \mathbb{N}$
$$ d_n(x) \le C\exp\{-AL^2x\} + 64n\exp\{-BL^2n\} + 24n\exp\{-Dn\} $$
for all $x \ge x_0$.
(3) If $w \equiv 1$ then $x_0 = 0$ and, in addition, for the one-sided estimator $\hat\theta_n^+$ the $\delta_n(p)$ term in (1) and the last term in (2) vanish.

Proof. W.l.o.g. we can assume that (P+) holds; otherwise apply the following arguments to $-K$. By Corollary 2.9 we may consider $\hat\theta_n^*$. Set $\rho_n(t) = w(t)r_n(t)$ and $\bar\rho_n(t) = w(t)Er_n(t)$, $0 < t < 1$, as well as $Z = \{l \in \mathbb{Z}: n\alpha \le l \le n\beta\}$. Now check that
$$\begin{aligned} P(|n\hat\theta_n^* - [n\theta]| > x) &\le P\left( \max_{\substack{|k-[n\theta]|>x \\ k\in Z}} \rho_n\!\left(\frac{k}{n}\right) \ge \rho_n\!\left(\frac{[n\theta]}{n}\right) \right) \\ &\le P\left( \max_{[n\theta]+x<k\le n\beta} \rho_n\!\left(\frac{k}{n}\right) \ge \rho_n\!\left(\frac{[n\theta]}{n}\right) \right) + P\left( \max_{n\alpha\le k<[n\theta]-x} \rho_n\!\left(\frac{k}{n}\right) \ge \rho_n\!\left(\frac{[n\theta]}{n}\right) \right) \\ &= P(B_n(x)) + P(\tilde B_n(x)) = P_n + \tilde P_n. \end{aligned} \tag{2.31} $$
For all $k \in Z$ with $[n\theta]+x < k \le n\beta$, centering with $\bar\rho_n(k/n)$ gives
$$ \rho_n\!\left(\frac{k}{n}\right) - \rho_n\!\left(\frac{[n\theta]}{n}\right) = S_{n,1}(k) + S_{n,2}(k) - \Delta_n(k), $$
where
$$ S_{n,1}(k) = \left[ w\!\left(\frac{k}{n}\right) - w\!\left(\frac{[n\theta]}{n}\right)\right]\left[ r_n\!\left(\frac{k}{n}\right) - \bar r_n\!\left(\frac{k}{n}\right)\right], $$
$$ S_{n,2}(k) = w\!\left(\frac{[n\theta]}{n}\right)\left[ r_n\!\left(\frac{k}{n}\right) - \bar r_n\!\left(\frac{k}{n}\right) - r_n(\theta) + \bar r_n(\theta)\right] $$
and
$$ \Delta_n(k) = \bar\rho_n\!\left(\frac{[n\theta]}{n}\right) - \bar\rho_n\!\left(\frac{k}{n}\right). $$


After some algebra and taking into account (1.5), (1.6) and (P+), it follows that
$$ \Delta_n(k) \ge Ln^{-1}(k-[n\theta]) - Cn^{-1} \ge \tfrac{1}{2}Ln^{-1}(k-[n\theta]) \tag{2.32} $$
for all $x \ge \tilde x := 2CL^{-1}$, since $x < k-[n\theta]$ by assumption on $k$. Thus for all $x \ge \tilde x$
$$\begin{aligned} P_n &\le P\left( \max_{[n\theta]+x<k\le n\beta}\frac{1}{k-[n\theta]}|S_{n,1}(k)| > \frac{1}{4}Ln^{-1} \right) + P\left( \max_{[n\theta]+x<k\le n\beta}\frac{1}{k-[n\theta]}|S_{n,2}(k)| > \frac{1}{4}Ln^{-1} \right) \\ &= P(B_{n1}(x)) + P(B_{n2}(x)) = P_{n1} + P_{n2}. \end{aligned} \tag{2.33} $$
Conclude with (1.6) that
$$ P_{n1} \le P\left( \max_{t\in G_n} |r_n(t) - \bar r_n(t)| > CL \right), $$
whence by (2.2)
$$ P_{n1} \le CM_pL^{-p}n^{-\psi(p)} \tag{2.34} $$
and
$$ P_{n1} \le 24n\exp\{-B_1nL^2\} \qquad \text{for bounded } K. $$

In the sequel we write shortly $X_i$ instead of $X_{in}$. Then we have
$$\begin{aligned} P_{n2} &\le P\left( \max_{[n\theta]+x<k<n\beta}\frac{1}{k-[n\theta]}\left|\sum_{i=k+1}^{n}\sum_{j=[n\theta]+1}^{k}[K(X_i,X_j) - \lambda]\right| > CLn \right) \\ &\quad + P\left( \max_{[n\theta]+x<k<n\beta}\frac{1}{k-[n\theta]}\left|\sum_{i=[n\theta]+1}^{k}\sum_{j=1}^{[n\theta]}[K(X_i,X_j) - \gamma]\right| > CLn \right) \\ &= P(\hat B_{n2}(x)) + P(\tilde B_{n2}(x)) = \hat P_{n2} + \tilde P_{n2}. \end{aligned}$$
With $h$ as defined in the proof of Proposition 2.1, we obtain
$$\begin{aligned} \hat P_{n2} &\le P\left( \max_{1\le k<n-[n\theta]}\frac{1}{k}\left|\sum_{i=k+1}^{n-[n\theta]}\sum_{j=1}^{k} h(X_{[n\theta]+i},X_{[n\theta]+j})\right| > CLn \right) \\ &\quad + P\left( \max_{x<k<n-[n\theta]}\frac{1}{k}(n-[n\theta]-k)\left|\sum_{j=1}^{k}[R_{12}(X_{[n\theta]+j}) - \lambda]\right| > CLn \right) \\ &\quad + P\left( \max_{x<k<n-[n\theta]}\left|\sum_{i=k+1}^{n-[n\theta]}[R_{22}(X_{[n\theta]+i}) - \lambda]\right| > CLn \right) = \hat Q_n + \hat V_n + \hat W_n. \end{aligned}$$
Since
$$ \hat Q_n = P\left( \max_{1\le l<n-[n\theta]}\frac{1}{m-l}\left|\sum_{i=l+1}^{m}\sum_{j=1}^{l} \tilde h(X_{n-i+1},X_{n-j+1})\right| > CLn \right), \tag{2.35} $$


where $\tilde h(x,y) := h(y,x)$, we can proceed similarly as for (2.24) and (2.25). This gives
$$ \hat Q_n \le CM_pL^{-p}\begin{cases} n^{-(2p-3)}, & 3/2 < p < 2, \\ n^{-1}\log n, & p = 2, \\ n^{-p/2}, & 2 < p < \infty \end{cases} \tag{2.36} $$
and, if $K$ is bounded, then
$$ \hat Q_n \le 2n\exp\{-B_2nL^2\}. \tag{2.37} $$
By Example 6.5.5(c) in Gänssler and Stute (1977),
$$ \left( k^{-1}\sum_{1\le j\le k}[R_{12}(X_{[n\theta]+j}) - \lambda],\ 1 \le k \le n-[n\theta] \right) $$
is an inverse martingale. Therefore, by Doob's maximal inequality and inequality (A.3) we arrive at
$$ \hat V_n \le CM_pL^{-p}x^{-\psi(p)}, \qquad 3/2 < p < \infty. \tag{2.38} $$
If $K$ is bounded, then by Hoeffding's inequality
$$ \hat V_n \le \sum_{x<k<n-[n\theta]} P\left( \frac{1}{k}\left|\sum_{j=1}^{k}[R_{12}(X_{[n\theta]+j}) - \lambda]\right| > CL \right) \le 2\sum_{x<k<n-[n\theta]}\exp\{-AL^2k\} \le 2\int_x^{n-[n\theta]}\exp\{-AL^2s\}\,ds \le C\exp\{-AL^2x\}. \tag{2.39} $$
Note that
$$ \hat W_n \le P\left( \max_{1\le l<n-[n\theta]}\left|\sum_{i=1}^{l}[R_{22}(X_{n-i+1}) - \lambda]\right| > CLn \right), $$
so that again by Doob's maximal inequality and by (A.3)
$$ \hat W_n \le CM_pL^{-p}n^{-\psi(p)}, \qquad 3/2 < p < \infty, \tag{2.40} $$
and, if $K$ is bounded, then by Hoeffding's inequality
$$ \hat W_n \le 2\exp\{-BL^2n\}. \tag{2.41} $$
Conclude, with $H$ defined as in the proof of Proposition 2.1, that
$$\begin{aligned} \tilde P_{n2} &\le P\left( \max_{[n\theta]+x<k<n\beta}\frac{1}{k-[n\theta]}\left|\sum_{i=[n\theta]+1}^{k}\sum_{j=1}^{[n\theta]} H(X_i,X_j)\right| > CLn \right) + P\left( \left|\sum_{j=1}^{[n\theta]}[R_{12}(X_j) - \gamma]\right| > CLn \right) \\ &\quad + P\left( \max_{[n\theta]+x<k<n\beta}\frac{1}{k-[n\theta]}\left|\sum_{i=[n\theta]+1}^{k}[R_{21}(X_i) - \gamma]\right| > CL \right) = \tilde Q_n + \tilde V_n + \tilde W_n. \end{aligned}$$


Put, for $1 \le k \le n-[n\theta]$,
$$ L_k = \sum_{i=1}^{k}\sum_{j=1}^{[n\theta]} H(X_{[n\theta]+i},X_j) \quad\text{and}\quad \mathcal{F}_k = \sigma(X_1,\dots,X_{[n\theta]+k}). $$
Then $(L_k,\mathcal{F}_k)_{1\le k\le n-[n\theta]}$ is a martingale and the same arguments as in (2.9)–(2.18) yield
$$ \tilde Q_n \le \begin{cases} CM_pL^{-p}n^{-\psi(p)}, & 3/2 < p < \infty, \\ 2n\exp\{-BnL^2\}, & p = \infty. \end{cases} \tag{2.42} $$
By (A.3) and Hoeffding's inequality we immediately obtain for $\tilde V_n$ the same upper bound as for $\tilde Q_n$. Finally,
$$ \tilde W_n = P\left( \max_{x<k<n-[n\theta]}\frac{1}{k}\left|\sum_{i=1}^{k}[R_{21}(X_{[n\theta]+i}) - \gamma]\right| > CL \right), $$
so that we can proceed as in the derivation of (2.39) and (2.40). Taking this into account and summing up (2.34)–(2.43) leads to
$$ P_n \le CM_pL^{-p}\begin{cases} x^{-(p-1)} + n^{-(2p-3)}, & 3/2 < p < 2, \\ x^{-1} + n^{-1}\log n, & p = 2, \\ x^{-p/2} + n^{-p/2}, & 2 < p < \infty \end{cases} $$
and, if $K$ is bounded, to
$$ P_n \le C\exp\{-AL^2x\} + 32n\exp\{-BL^2n\}. $$
The second probability $\tilde P_n$ in (2.31) can be treated in the same manner. Thus, by (2.33), Corollary 2.9 and Proposition 2.1, assertions (1) and (2) of the theorem follow.

As to (3), observe that for $w \equiv 1$
$$ \Delta_n(k) = n^{-1}(k-[n\theta])\left[ \gamma\frac{[n\theta]}{n} - \lambda\left(1-\frac{k}{n}\right)\right] \ge n^{-1}(k-[n\theta])\left[ \gamma\frac{[n\theta]}{n} - \lambda\left(1-\frac{[n\theta]}{n}\right)\right] $$
because $k > [n\theta]$ in the case of $P_n$. Now the third factor $[\cdots]$ in the lower bound converges to $\gamma\theta - \lambda(1-\theta) = -r'(\theta+)$. Since by (P+) the right-hand-side derivative $r'(\theta+)$ is smaller than $-L$, we can infer that $\Delta_n(k) \ge Ln^{-1}(k-[n\theta])$ eventually for all $n \in \mathbb{N}$, which means that $\tilde x = 0$. This proves the first part of (3). Moreover, if $w \equiv 1$, the proof of (2.33) may be extended to the extreme case $(\alpha,\beta) = (0,1)$, in which $\hat\theta_n^*$ and $\hat\theta_n^+$ coincide.

In the following corollary we consider the case which is most relevant in practice, as pointed out in Example 1.1.


Corollary 2.11. Let $K$ be antisymmetric and $w(t) = t^{-a}(1-t)^{-b}$, $0 < t < 1$, $0 \le a, b \le 1/2$. Assume $M_p < \infty$ for some $p > 3/2$ and $\gamma = \int K\,d\nu_2\otimes\nu_1 \ne 0$. Then there exist positive constants $A_p$ and $x_0$ such that for eventually all $n \in \mathbb{N}$
$$ P(|n\hat\theta_n - [n\theta]| > x) \le A_pM_p|\gamma|^{-p}\begin{cases} x^{-(p-1)} + n^{-(2p-3)}, & 3/2 < p < 2, \\ x^{-1} + n^{-1}\log n, & p = 2, \\ x^{-p/2} + n^{-p/2}, & 2 < p < \infty,\ a, b < 1/2, \\ x^{-p/2} + n^{-p/2}\log n, & 2 < p < \infty,\ a \text{ or } b = 1/2 \end{cases} $$
for all $x \ge x_0$. If $K$ is bounded then there exist positive constants $A, B$ and $C$ such that for eventually all $n \in \mathbb{N}$
$$ P(|n\hat\theta_n - [n\theta]| > x) \le C\exp\{-A\gamma^2x\} + 88n\exp\{-B\gamma^2n\} \qquad \forall x \ge x_0. $$

Proof. The proof of Theorem 2.10 actually shows that the constants $B_p$ and $D$ in (1) and (2), respectively, are of the type $B_p = C\varepsilon_0^{-p}$ and $D = C\varepsilon_0^2$, with $\varepsilon_0$ as in Corollary 2.9. Using Remarks 2.6 and 2.8 we see that $\varepsilon_0 = C|\gamma|$. Finally, $L = |\gamma|\min(\theta,1-\theta)$ because of antisymmetry, which in view of Remark 2.2 completes the proof.
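Corollary 2.11 says in particular that, for a bounded antisymmetric kernel, the localization error $|n\hat\theta_n - [n\theta]|$ has exponential tails and hence stays stochastically bounded as $n$ grows. This is easy to observe in a Monte Carlo sketch (entirely our own illustration: the sign kernel, the normal mean shift of size 1.5, the weight and the replication counts are assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 0.5

def theta_hat(x):
    # estimator (1.1) with bounded antisymmetric kernel K(x,y) = sign(x-y)
    # and weight w(t) = (t(1-t))^(-1/2), computed via 2-D prefix sums
    n = len(x)
    M = np.sign(x[:, None] - x[None, :])
    P = M.cumsum(axis=0).cumsum(axis=1)
    k = np.arange(1, n)
    r = (P[n - 1, k - 1] - P[k - 1, k - 1]) / n**2
    t = k / n
    return k[np.argmax(np.abs(r) / np.sqrt(t * (1 - t)))] / n

def median_localization_error(n, reps=100):
    errs = []
    for _ in range(reps):
        m = int(n * theta)
        x = np.concatenate([rng.normal(0, 1, m), rng.normal(1.5, 1, n - m)])
        errs.append(abs(n * theta_hat(x) - m))
    return float(np.median(errs))

# |n*theta_hat - [n*theta]| should stay bounded as n grows
err_small = median_localization_error(100)
err_large = median_localization_error(200)
```

Doubling $n$ should leave the median of $|n\hat\theta_n - [n\theta]|$ essentially unchanged, in contrast to the error on the $t$-scale, $|\hat\theta_n - \theta|$, which halves.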

3. Optimal rates of almost sure convergence

The following proposition is a simple consequence of Theorem 2.10, Remark 2.2 and Example 2.3.

Proposition 3.1. Suppose the assumptions of Theorem 2.10 are fulfilled.
(1) If $p > 3/2$ and $\delta_n(p) = o(1)$, then $\hat\theta_n - \theta = O_P(n^{-1})$.
(2) If $p > 2$ and $\sum_{n\ge1}\delta_n(p) < \infty$, then
$$ \hat\theta_n - \theta = o(n^{-1+2/p+\varepsilon}) \quad \text{a.s.} $$
for all $\varepsilon > 0$.
(3) If $K$ is bounded, then there exists a positive constant $C$ such that a.s.
$$ |\hat\theta_n - \theta| \le Cn^{-1}\log n \qquad \text{eventually for all } n \in \mathbb{N}. $$
(4) If $w(t) = t^{-a}(1-t)^{-b}$, $0 \le a, b \le 1/2$, then the pertaining $\delta_n(p)$ satisfy the assumptions in (1) and (2).

Up to now we did not specify the probabilistic relation between the rows of the array $(X_{in})$. Ferger (1995) considers the special embedding
$$ X_{in} = \begin{cases} \xi_{[n\theta]-i+1}, & 1 \le i \le [n\theta], \\ \eta_{i-[n\theta]}, & [n\theta] < i \le n, \end{cases} \tag{3.1} $$


where (ξ_i)_{i∈N} and (η_i)_{i∈N} are two independent sequences of i.i.d. random variables with ξ_i ∼ ν_1 and η_i ∼ ν_2. He shows that in the situation of (3.1), assertion (2) of Proposition 3.1 can be improved to

    θ_n − θ = o(n^{−1} l_n)   a.s.   for all (l_n) with l_n → ∞.        (3.2)

Since for any sequence (a_n) of real numbers one has

    a_n = O(1)  ⇔  a_n = o(l_n)  for all (l_n) with l_n → ∞,            (3.3)

we may conjecture in view of (3.2) that

    θ_n − θ = O(n^{−1})   a.s.                                           (3.4)

Note that (3.4) cannot be deduced from (3.2) and (3.3), because the set of sequences (l_n) with l_n → ∞ is not countable. However, the next result gives a positive answer.

Theorem 3.2. Assume (P) holds and M_p < ∞ for some p > 2. If w with (1.4)–(1.6) is such that Σ_{n≥1} δ_n(p) < ∞, then (3.4) holds for the special embedding (3.1).

Proof. By Proposition 2.1 and Lemmata 2.5 and 2.7 we have with probability one θ_n = θ_n^* for eventually all n ∈ N. Therefore, it suffices to prove

    P( lim sup_{n→∞} { n |θ_n^* − [nθ]/n| > M } ) = 0,

where M is some random variable on (Ω, A, P), which will be specified below. We follow the steps in the proof of Theorem 2.10. With the same notation, put

    E_n = { n |θ_n^* − [nθ]/n| > M }

and confer (2.32) to see that

    E_n ⊆ B_n(M) ∪ B̃_n(M)   for all n ∈ N.

Clearly,

    lim sup_{n→∞} E_n ⊆ lim sup_{n→∞} B_n(M) ∪ lim sup_{n→∞} B̃_n(M).

First we consider P(lim sup_{n→∞} B_n(M)). Since w.l.o.g. M ≥ x_0 (otherwise take max(M, x_0)), we have that

    lim sup_{n→∞} B_n(M) ⊆ lim sup_{n→∞} B_{n1}(M) ∪ lim sup_{n→∞} B_{n2}(M).

Following the proof of (2.35), we obtain P(B_{n1}(M)) ≤ C M_p L^{−p} n^{−p/2}, which ensures that

    P( lim sup_{n→∞} B_{n1}(M) ) = 0.


Furthermore, we have that

    lim sup_{n→∞} B_{n2}(M) ⊆ lim sup_{n→∞} B̂_{n2}(M) ∪ lim sup_{n→∞} B̃_{n2}(M)

and

    B̂_{n2}(M) ⊆ Q_n ∪ V_n ∪ W_n,

where Q_n, V_n and W_n denote the events pertaining to the probabilities Q̂_n, V̂_n and Ŵ_n, respectively, with x = M. Since the upper bounds in (2.37) and (2.41) remain valid even for our random x = M (because they do not depend on x), it follows that

    P( lim sup_{n→∞} Q_n ) = P( lim sup_{n→∞} W_n ) = 0.

Next observe that by (3.1)

    V_n ⊆ { max_{M<k≤n} (1/k) | Σ_{j=1}^{k} [R_12(η_j) − ER_12(η_1)] | > CL }.

Define the random variable

    M = inf{ l ∈ N: max_{k≥l} (1/k) | Σ_{j=1}^{k} [R_12(η_j) − ER_12(η_1)] | ≤ (1/2) CL }.

Then by the Strong Law of Large Numbers we have that M is finite (M ∈ N) with probability one, whence P(lim sup_{n→∞} V_n) = 0. This shows P(lim sup_{n→∞} B̂_{n2}(M)) = 0. Analogously, one proves that P(lim sup_{n→∞} B̃_{n2}(M)) = 0, whence P(lim sup_{n→∞} B_n(M)) = 0. In the same manner we can conclude that there exists a random variable M̃ such that P(lim sup_{n→∞} B̃_n(M̃)) = 0. Replacing M with max(M, M̃, x_0) yields the desired result.

We like to stress that one cannot adapt the above proof to arbitrary double-indexed schemes (X_in), since then the conclusion P(lim sup_{n→∞} V_n) = 0 is no longer possible. Yao et al. (1994) proved a result like (3.4) for Dümbgen's (1991) estimator, and also used the special embedding (3.1).

4. Rates of convergence in law

Theorems 3.1 and 3.2 suggest that nθ_n − [nθ] converges in distribution, which indeed has been proved by Ferger (1994b) in the case of bounded kernels K and w = 1 (no weights). We extend this result to general weight functions w with (1.4)–(1.6) and a general K. Moreover, rates of distributional convergence are established. Our limit theorems involve the two-sided random walk Z defined by

    Z(k) = Σ_{j=1}^{k} g(η_j),        k ≥ 0,
    Z(k) = −Σ_{j=k+1}^{0} g(ξ_j),     k < 0,


where (ξ_j)_{j∈Z} and (η_j)_{j∈Z} are two independent sequences of i.i.d. random elements with ξ_j ∼ ν_1 and η_j ∼ ν_2. The mapping g is given by

    g(x) = (1 − θ)R_12(x) − θR_21(x),   x ∈ X,

with R_12 and R_21 as introduced in (2.7). Let m(k) = EZ(k), k ∈ Z. If w′(θ) exists, put

    δ(k) = w′(θ)θ(1 − θ)Δ k + w(θ)m(k),   k ∈ Z.

Set

    Y(k) = w(θ)[Z(k) − m(k)] + δ(k),   k ∈ Z.

Note that Y = {Y(k): k ∈ Z} is a two-sided random walk with two-sided linear drift δ, which also has the representation

    δ(k) = ϱ′(θ+)k,  k ≥ 0;    δ(k) = ϱ′(θ−)k,  k < 0.

If ϱ has, e.g., the property (P+), then Y has a negative drift. Especially, by the Strong Law of Large Numbers, Y(k) → −∞ as |k| → ∞ a.s., whence the smallest and the largest maximizer of Y,

    T_min = min{k_0 ∈ Z: Y(k) ≤ Y(k_0) for all k ∈ Z}

and

    T_max = max{k_0 ∈ Z: Y(k) ≤ Y(k_0) for all k ∈ Z},

exist a.s.

Theorem 4.1. Assume w with (1.4)–(1.6) is continuously differentiable at θ and δ_n(p) = o(1). If M_p < ∞ for some p > 3/2 and (P) is satisfied, say w.l.o.g. (P+), then for all z ∈ Z

    lim inf_{n→∞} P(nθ_n − [nθ] ≤ z) ≥ P(T_max ≤ z)

and

    lim sup_{n→∞} P(nθ_n − [nθ] ≤ z) ≤ P(T_min ≤ z).

Consequently, if T_min =_L T_max, then

    nθ_n − [nθ] →_L T,                                                   (4.1)

where T denotes the a.s. unique maximizing point of Y.

Proof. By Lemma 2.5 it suffices to prove the assertions for θ_n^+. If we define the stochastic process Y_n = {Y_n(k): −[nθ] ≤ k ≤ n − [nθ]} through

    Y_n(k) = n[ ϱ_n( ([nθ] + k)/n ) − ϱ_n( [nθ]/n ) ],


then the following basic relation holds:

    nθ_n^+ − [nθ] = argmax{Y_n(k): −[nθ] ≤ k ≤ n − [nθ]}.

Let Y_n^{(d)} and Y^{(d)} denote the restrictions of Y_n and Y, respectively, to Z ∩ [−d, d], d ∈ N. We will show that

    Y_n^{(d)} →_L Y^{(d)},   n → ∞,   for all d ∈ N.                     (4.2)

For that purpose, fix d ∈ N and write

    t_n(k) = ([nθ] + k)/n.

Then Y_n admits the decomposition

    Y_n(k) = w([nθ]/n) Z_n(k) + n[ w(t_n(k)) − w([nθ]/n) ] r_n(t_n(k))

with

    Z_n(k) = n[ r_n(t_n(k)) − r_n([nθ]/n) ].

Consider the centered process

    X_n(k) = Y_n(k) − EY_n(k) = w([nθ]/n)[Z_n(k) − EZ_n(k)] + R_n(k)     (4.3)

with

    R_n(k) = n[ w(t_n(k)) − w([nθ]/n) ][ r_n(t_n(k)) − Er_n(t_n(k)) ].

Using (1.6) and Proposition 2.1, we can infer that R_n = {R_n(k): −d ≤ k ≤ d} is stochastically negligible:

    sup_{−d≤k≤d} |R_n(k)| = o_P(1)   as n → ∞.                           (4.4)

It is easy to verify that for all n > n_0 = n_0(d) = d max(θ^{−1}, (1 − θ)^{−1})

    {Z_n(k) − EZ_n(k): −d ≤ k ≤ d} =_L {Z_n^*(k): −d ≤ k ≤ d},           (4.5)

where

    Z_n^*(k) = n^{−1} Σ_{j=1}^{k} ( Σ_{i=k+1}^{n−[nθ]} [K(η_j, η_i) − EK(η_1, η_2)] − Σ_{i=1}^{[nθ]} [K(ξ_i, η_j) − EK(ξ_1, η_1)] )

for 0 ≤ k ≤ d and

    Z_n^*(k) = n^{−1} Σ_{i=k+1}^{0} ( Σ_{j=−[nθ]+1}^{k} [K(ξ_j, ξ_i) − EK(ξ_1, ξ_2)] − Σ_{j=[nθ]+1}^{n} [K(ξ_i, η_j) − EK(ξ_1, η_1)] )

for −d ≤ k < 0. Next we approximate Z_n^* by the process Z̃_n = {Z̃_n(k): −d ≤ k ≤ d} defined by

    Z̃_n(k) = E(Z_n^*(k) | η_1, …, η_k),        0 ≤ k ≤ d,
    Z̃_n(k) = E(Z_n^*(k) | ξ_{k+1}, …, ξ_0),    −d ≤ k < 0,

so that Z̃_n(k) = Z(k) − m(k) − V_n(k),


where, e.g., for 0 ≤ k ≤ d

    V_n(k) = n^{−1} Σ_{j=1}^{k} [ (nθ − [nθ] − k)(R_12(η_j) − ER_12(η_1)) + (nθ − [nθ])(R_21(η_j) − ER_21(η_1)) ].

By Doob's maximal inequality in combination with (A.3) one shows that

    P( max_{−d≤k≤d} |V_n(k)| > ε ) ≤ C M_p ε^{−p} n^{−p} d^{γ(p)+p}   for all ε > 0.   (4.6)

Furthermore, with h and H as in the proof of Proposition 2.1, we obtain for 0 ≤ k ≤ d

    Z_n^*(k) − Z̃_n(k) = n^{−1} Σ_{i=k+1}^{n−[nθ]} Σ_{j=1}^{k} h(η_i, η_j) − n^{−1} Σ_{i=1}^{[nθ]} Σ_{j=1}^{k} H(η_j, ξ_i)
                         + (k/n) Σ_{i=k+1}^{n−[nθ]} (R_22(η_i) − ER_22(η_1)) − (k/n) Σ_{i=1}^{[nθ]} (R_12(ξ_i) − ER_12(ξ_1))

and a similar representation for −d ≤ k < 0. Tedious arguments which are similar to the proof of Proposition 2.1 (for details see Ferger, 1995) lead to

    P( max_{−d≤k≤d} |Z_n^*(k) − Z̃_n(k)| > ε ) ≤ C M_p ε^{−p} { d^{2γ(p)} n^{−p} + d^{γ(p)} δ_n(p) + d^p δ_n(p) }.   (4.7)

Taking into account (4.3)–(4.7), Slutsky's lemma gives

    {X_n(k): −d ≤ k ≤ d} →_L {w(θ)[Z(k) − m(k)]: −d ≤ k ≤ d}

as n → ∞ for each fixed d ∈ N. Finally, an elementary calculation shows that

    max_{−d≤k≤d} |EY_n(k) − δ(k)| → 0,

from which (4.2) can be deduced by another application of Slutsky's lemma. Now the rest of the proof follows with Theorem 2.10 in complete analogy to the proof of Theorem 2.3 of Ferger (1994b). Here we have to add that indeed T = argmax_{k∈Z} Y(k) is a.s. unique. To see this recall that T_min =_L T_max by assumption and T_min ≤ T_max by definition. Therefore,

    0 ≤ E(T_max − T_min) = ET_max − ET_min = 0,

whence T_max = T_min a.s.

Remark 4.2. (1) Recall that δ_n(p) = o(1) for w(t) = t^{−a}(1 − t)^{−b}, 0 ≤ a, b ≤ 1.
(2) Combining Theorems 2.10 and 4.1 gives an upper bound for the tails of the limit variable T. Namely, for all x ≥ x_0, we have

    P(|T| > x) ≤ C x^{−(p−1)},   3/2 < p ≤ 2,
    P(|T| > x) ≤ C x^{−p/2},     2 < p < ∞,
    P(|T| > x) ≤ C exp{−Dx},     p = ∞.

Especially, T is integrable for p > 2.
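To make the limit variable T concrete, here is a small simulation (ours, not the paper's construction: Gaussian increments with an assumed drift stand in for the walk built from g, ξ and η). It builds a two-sided walk with negative two-sided drift and locates its maximizer, which the argument above shows to be a.s. attained inside any sufficiently large finite window.

```python
import random

def two_sided_walk(d, drift=-0.5, seed=0):
    """Y(0) = 0; independent N(drift, 1) increments on each side of 0,
    so Y has a negative two-sided linear drift.  Returns {k: Y(k)}."""
    rng = random.Random(seed)
    y = {0: 0.0}
    for k in range(1, d + 1):             # right branch k = 1..d
        y[k] = y[k - 1] + rng.gauss(drift, 1.0)
    for k in range(-1, -d - 1, -1):       # left branch k = -1..-d
        y[k] = y[k + 1] + rng.gauss(drift, 1.0)
    return y

y = two_sided_walk(200)
T = max(y, key=lambda k: (y[k], -abs(k)))  # argmax of Y; ties resolved toward 0
```

Since Y(0) = 0 is always a candidate, Y(T) ≥ 0, and the drift keeps |T| small compared with the window, mirroring the tail bound of Remark 4.2(2).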


Let H_n and H denote the distribution functions of ϑ_n = nθ_n − [nθ] and T, respectively. Then by Lemma 3.1 of Ferger (1994b), (4.1) implies

    lim_{n→∞} sup_{x∈R} |H_n(x) − H(x)| = 0.                             (4.8)

It is now natural to ask for rates of convergence in (4.8). For the solution of this problem we investigate the probabilities

    P(ϑ_n ≠ T_n),   n ∈ N,

where T_n is an appropriate random variable on (Ω, A, P) with T_n =_L T for all n ∈ N. Consider the extended array (X_in: i ∈ Z, n ∈ N) of rowwise independent random elements on (Ω, A, P) with

    P ◦ X_in^{−1} = ν_1,  i ≤ [nθ];    P ◦ X_in^{−1} = ν_2,  i > [nθ].

Moreover, set

    f(x) = w(θ)g(x) + w′(θ)ϱ(θ),   x ∈ X,

and

    Ỹ^{(n)}(k) = Σ_{j=1}^{k} f(X_{[nθ]+j,n}),        k ≥ 0,
    Ỹ^{(n)}(k) = −Σ_{j=1}^{−k} f(X_{[nθ]−j+1,n}),    k < 0.

Obviously, Y =_L Ỹ^{(n)} for all n ∈ N. Thus

    T_n := argmax_{k∈Z} Ỹ^{(n)}(k) =_L T   for all n ∈ N,

as desired.

Proposition 4.3. Assume w with (1.4)–(1.6) has a bounded second derivative w″ in a neighborhood of θ. Furthermore, let the distribution functions F_1 and F_2 of ν_1 ◦ f^{−1} and ν_2 ◦ f^{−1} be Lipschitz continuous. If M_p < ∞ for some p > 3/2 and (P) is satisfied, say (P+) w.l.o.g., then

    P(nθ_n − [nθ] ≠ T_n) ≤ C_1 · { n^{−(2p−3)},              3/2 < p < p_0,
                                   n^{−(p−1)²/(p²+3p−1)},    p_0 ≤ p < 2,
                                   n^{−p/(2(p+7))},          2 ≤ p < ∞ }  + C_2 δ_n(p).

Here p_0 is the largest solution of the cubic equation 2p³ + 2p² − 9p + 2 = 0, i.e. p_0 = 1.523241256… . If K is bounded, then

    P(nθ_n − [nθ] ≠ T_n) ≤ C n^{−1/2} (log n)^{7/2}.

Remark 4.4. Since (P+) is fulfilled and F_1 and F_2 are continuous, T_n is the a.s. unique maximizer of Ỹ^{(n)}. Consequently, (4.1) holds.


Proof of Proposition 4.3. By Lemma 2.5 we have

    P(nθ_n − [nθ] ≠ T_n) ≤ P(nθ_n^+ − [nθ] ≠ T_n) + π_n(ε_0)             (4.9)

for some positive ε_0. Let (d_n) be positive real numbers with d_n → ∞ and d_n = o(n/log n). Then

    P(nθ_n^+ − [nθ] ≠ T_n)
      ≤ P(T_n ≠ nθ_n^+ − [nθ] ∈ [−d_n, d_n], T_n ∈ [−d_n, d_n])
        + P(|nθ_n^+ − [nθ]| > d_n) + P(|T_n| > d_n)
      = P_n + Q_n + R_n.                                                 (4.10)

An application of Theorem 2.10 and Remark 4.2(2) immediately yields

    Q_n + R_n ≤ C · { d_n^{−(p−1)} + n^{−(2p−3)} + δ_n(p),  3/2 < p < 2,
                      d_n^{−p/2} + δ_n(p),                  2 ≤ p < ∞,
                      exp{−A d_n},                          p = ∞. }     (4.11)

Set

    ϑ_n^+ = nθ_n^+ − [nθ]   and   ζ_n = min_{−d_n≤k≤d_n, k≠T_n} [ Y(T_n) − Y(k) ],

where Y is shorthand notation for Ỹ^{(n)}. Note that ζ_n > 0 a.s. by Remark 4.4. We then get

    {T_n ≠ ϑ_n^+ ∈ [−d_n, d_n], T_n ∈ [−d_n, d_n]}
      ⊆ { sup_{−d_n≤k≤d_n} |Y_n(k) − Y(k)| ≥ ζ_n/2 } ∩ {T_n ∈ [−d_n, d_n]}

with Y_n as defined in the proof of Theorem 4.1. To prove the inclusion assume that T_n ≠ ϑ_n^+ ∈ [−d_n, d_n] and T_n ∈ [−d_n, d_n], but

    sup_{−d_n≤k≤d_n} |Y_n(k) − Y(k)| < ζ_n/2.

Then

    Y_n(T_n) − Y_n(k) = Y(T_n) − Y(k) + [Y_n(T_n) − Y(T_n) + Y(k) − Y_n(k)]
                      ≥ ζ_n − 2 sup_{−d_n≤k≤d_n} |Y_n(k) − Y(k)| > 0

for all −d_n ≤ k ≤ d_n with k ≠ T_n, which implies

    T_n = argmax_{−d_n≤k≤d_n} Y_n(k).

In the proof of Theorem 4.1 we saw that

    ϑ_n^+ = argmax_{−[nθ]≤k≤n−[nθ]} Y_n(k).


Now by assumption ϑ_n^+ ∈ [−d_n, d_n], and consequently T_n = ϑ_n^+, which is a contradiction to our assumption T_n ≠ ϑ_n^+. This proves the inclusion. Thus for all ε > 0

    P_n ≤ P(ζ_n < 2ε, |T_n| ≤ d_n) + P( sup_{−d_n≤k≤d_n} |Y_n(k) − Y(k)| ≥ ε )
        = P_{n1}(ε) + P_{n2}(ε).                                         (4.12)

Next observe that

    P_{n1}(ε) = P(ζ_n < 2ε, 0 ≤ T_n ≤ d_n) + P(ζ_n < 2ε, −d_n ≤ T_n < 0)
              = α_n(ε) + β_n(ε)                                          (4.13)

and

    α_n(ε) ≤ P( min_{0≤k≤d_n, k≠T_n} [Y(T_n) − Y(k)] ≤ 2ε, 0 ≤ T_n ≤ d_n )
             + P( min_{−d_n≤k<0} [Y(T_n) − Y(k)] ≤ 2ε, 0 ≤ T_n ≤ d_n )
           = α_{n1}(ε) + α_{n2}(ε).

On the set {0 ≤ T_n ≤ d_n} we have

    min_{0≤k≤d_n, k≠T_n} [Y(T_n) − Y(k)] ≥ min_{0≤i<j≤d_n} | Σ_{l=i+1}^{j} f(X_{[nθ]+l,n}) |,

whence α_{n1}(ε) is less than or equal to

    P( min_{0≤i<j≤d_n} | Σ_{l=i+1}^{j} f(X_{[nθ]+l,n}) | ≤ 2ε )
      ≤ Σ_{0≤i<j≤d_n} P( −2ε ≤ Σ_{l=i+1}^{j} f(X_{[nθ]+l,n}) ≤ 2ε )
      = Σ_{0≤i<j≤d_n} ∫ P( −2ε − Σ_{l=i+1}^{j−1} x_l ≤ f(X_{[nθ]+j,n}) ≤ 2ε − Σ_{l=i+1}^{j−1} x_l ) dF_2^{j−i−1}(x_{i+1}, …, x_{j−1})
      ≤ C d_n² ε,

where we used the Lipschitz continuity of F_2 in the last inequality. Moreover, since

    {0 ≤ T_n ≤ d_n} ⊆ { Y(T_n) = max_{0≤k≤d_n} Y(k) ≥ max_{−d_n≤k<0} Y(k) },

we can conclude that

    α_{n2}(ε) = P( Y(T_n) − max_{−d_n≤k<0} Y(k) ≤ 2ε, 0 ≤ T_n ≤ d_n )
             ≤ P( max_{−d_n≤k<0} Y(k) ≤ max_{0≤k≤d_n} Y(k) ≤ max_{−d_n≤k<0} Y(k) + 2ε )
             = ∫_R P( x ≤ max_{0≤k≤d_n} Y(k) ≤ x + 2ε ) P ◦ ( max_{−d_n≤k<0} Y(k) )^{−1}(dx)
             ≤ C d_n ε   by Lemma A.5.

Therefore, α_n(ε) ≤ C d_n² ε, and an analogous argument gives the same upper bound for β_n(ε). Hence we arrive at

    P_{n1}(ε) ≤ C d_n² ε.                                                (4.14)

For P_{n2}(ε) we use the abbreviations θ_n = [nθ]/n, t_n(k) = ([nθ] + k)/n, g = (1 − θ)R_12 − θR_21 and introduce the random walk Z^{(n)} on Z defined by

    Z^{(n)}(k) = Σ_{j=1}^{k} g(X_{[nθ]+j,n}),        k ≥ 0,
    Z^{(n)}(k) = −Σ_{j=1}^{−k} g(X_{[nθ]−j+1,n}),    k < 0.

Note that Z^{(n)} =_L Z for all n ∈ N. Here Z is shorthand for Z^{(n)}. Using the notation as in the proof of Theorem 4.1, we obtain the following decomposition:

    Y_n(k) − Y(k) = W_n(k) + R_n(k) + U_n(k) + V_n(k),   k ∈ Z,

where

    W_n(k) = w(θ_n)[Z_n(k) − EZ_n(k)] − w(θ)[Z(k) − m(k)],
    R_n(k) = n[w(t_n(k)) − w(θ_n)][r_n(t_n(k)) − Er_n(t_n(k))],
    U_n(k) = w(θ_n)EZ_n(k) − w(θ)m(k)

and

    V_n(k) = n[w(t_n(k)) − w(θ_n)]Er_n(t_n(k)) − w′(θ)kϱ(θ).

It is easy to check that

    sup_{−d_n≤k≤d_n} |U_n(k)| ≤ C n^{−1} d_n²   for all n ∈ N            (4.15)

and, by boundedness of w″, that

    sup_{−d_n≤k≤d_n} |V_n(k)| ≤ C n^{−1} d_n²   for all n ∈ N.           (4.16)

Using (1.6) and applying (2.2) and (2.4) we obtain

    P( sup_{−d_n≤k≤d_n} |R_n(k)| > ε ) ≤ C ε^{−p} d_n^p n^{−γ(p)},      1 ≤ p < ∞,
    P( sup_{−d_n≤k≤d_n} |R_n(k)| > ε ) ≤ 24n exp{−C n d_n^{−2} ε²},     p = ∞,      (4.17)

for all n > d_n max(θ^{−1}, (1 − θ)^{−1}). Since d_n = o(n/log n), (4.17) holds for eventually all n ∈ N. Furthermore, we have

    W_n(k) = [w(θ) − w(θ_n)][Z(k) − m(k)] + w(θ_n)[Z(k) − m(k) − (Z_n(k) − EZ_n(k))]
           = W_{n1}(k) + W_{n2}(k).


Since |w(θ) − w(θ_n)| ≤ C n^{−1}, the usual combination of Doob's inequality, (A.3) and Hoeffding's inequality yields

    P( sup_{−d_n≤k≤d_n} |W_{n1}(k)| > ε ) ≤ C ε^{−p} d_n^{γ(p)} n^{−p},  1 ≤ p < ∞,
    P( sup_{−d_n≤k≤d_n} |W_{n1}(k)| > ε ) ≤ 2 exp{−C n² d_n^{−1} ε²},    p = ∞,      (4.18)

for all ε > 0. We continue with

    sup_{−d_n≤k≤d_n} |W_{n2}(k)| ≤ C n^{−1} d_n + B sup_{−d_n≤k≤d_n} |Z_n(k) − Z(k)|.   (4.19)

With the same technique as for (2.6) in Ferger (1994b), one shows in the case of bounded K that

    P( sup_{−d_n≤k≤d_n} |Z_n(k) − Z(k)| ≥ ε ) ≤ C_0 d_n exp(−C_1 ε² n d_n^{−2})          (4.20)

for all ε > 0. Now let (ε_n) be a positive sequence with ε_n → 0 such that n^{−1} d_n² = o(ε_n). Then, by (4.11)–(4.20),

    P(ϑ_n^+ ≠ T_n) ≤ C_0 exp{−A d_n} + C_1 d_n² ε_n + 24n exp{−B n d_n^{−2} ε_n²}

for eventually all n ∈ N. Choosing d_n = a log n with a > A^{−1} and ε_n = b n^{−1/2} d_n (log n)^{1/2} with b > (2/B)^{1/2} gives the assertion of Proposition 4.3 for bounded kernels. Finally, consider 3/2 < p < ∞. Following the proof of Theorem 4.1, we put

    Z_n^* = Z_n − EZ_n

and define

    Z̃_n(k) = E(Z_n^*(k) | X_{[nθ]+1,n}, …, X_{[nθ]+k,n}),    0 ≤ k ≤ d_n,
    Z̃_n(k) = E(Z_n^*(k) | X_{[nθ],n}, …, X_{[nθ]+k+1,n}),    −d_n ≤ k < 0.

Since Z̃_n = Z − m + V_n, it follows that

    sup_{−d_n≤k≤d_n} |W_{n2}(k)| ≤ sup_{−d_n≤k≤d_n} |Z_n^*(k) − Z̃_n(k)| + sup_{−d_n≤k≤d_n} |V_n(k)|.

Consequently, by (4.7), we obtain for eventually all n ∈ N

    P( sup_{−d_n≤k≤d_n} |W_{n2}(k)| > ε_n ) ≤ C ε_n^{−p} { d_n^{2γ(p)} n^{−p} + d_n^{γ(p)} δ_n(p) + d_n^p δ_n(p) }.   (4.21)

Choosing ε_n = n^{−a} and d_n = n^b with a > 0, 0 < b < 1 and 2b + a < 1, we can conclude from (4.11)–(4.18) and (4.21) that

    P(ϑ_n^+ ≠ T_n) ≤ C( n^{−bγ(p)} + n^{−(a−2b)} + n^{−(γ(p)−bp−ap)} + n^{−(2p−3)} 1_{(3/2,2)}(p) + δ_n(p) )   (4.22)

for eventually all n ∈ N. Finding the best possible rate in (4.22) leads to the linear equations γ(p)b = a − 2b = γ(p) − bp − ap. Solving this linear system and plugging in the solutions gives the desired result for p ≥ p_0. If p < p_0, then n^{−(2p−3)} is the leading term.
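The threshold p_0 can be checked numerically: it is the largest real root of 2p³ + 2p² − 9p + 2 = 0, the point at which the two rate exponents 2p − 3 and (p − 1)²/(p² + 3p − 1) coincide. A quick sketch (ours) via plain bisection:

```python
def cubic(p):
    # the polynomial from Proposition 4.3
    return 2 * p**3 + 2 * p**2 - 9 * p + 2

def bisect(f, lo, hi, tol=1e-13):
    """Plain bisection; f(lo) and f(hi) must have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if flo * f(mid) <= 0:
            hi = mid
        else:
            lo, flo = mid, f(mid)
    return (lo + hi) / 2

p0 = bisect(cubic, 1.4, 1.6)    # largest of the three real roots
```

At p_0 the two exponents agree, which is exactly the cross-multiplied identity (2p − 3)(p² + 3p − 1) = (p − 1)² behind the cubic.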


Example 4.5. Consider w(t) = t^{−a}(1 − t)^{−b} with 0 ≤ a, b ≤ 1. If p ≥ 2, then δ_n(p) is negligible by Remark 4.2, and therefore we have

    P(nθ_n − [nθ] ≠ T_n) ≤ C n^{−p/(2(p+7))}.

Let μ_n = L(ϑ_n) and μ = L(T) denote the distributions of ϑ_n = nθ_n − [nθ] and T, respectively. With the help of Proposition 4.3 we can easily establish rates of convergence for the variational distance between μ_n and μ, because

    sup_{B∈B} |μ_n(B) − μ(B)| ≤ P(nθ_n − [nθ] ≠ T_n).                   (4.23)

Consequently, we obtain the same rates as in Proposition 4.3.

5. Exact rates of L_p-convergence

In this section we investigate how fast θ_n converges to θ in the L_p-norm. The following lemma provides the basic tool.

Lemma 5.1. Assume M_s < ∞ for some s > 2 and w with (1.4)–(1.6) is such that δ_n(s) = o(n^{−p}). If (P) holds, then for all 0 < p < s/2 the sequence ϑ_n^p = (nθ_n − [nθ])^p is uniformly integrable.

Proof. We have to show

    lim_{a→∞} lim sup_{n→∞} ∫_{{|ϑ_n|^p > a}} |ϑ_n|^p dP = 0.

Now we have

    ∫ 1_{{|ϑ_n|^p > a}} |ϑ_n|^p dP = ∫_{R_+} P(|ϑ_n|^p > max(a, x)) λ(dx)
        = a P(|ϑ_n|^p > a) + ∫_a^∞ P(|ϑ_n|^p > x) dx = A + B.

From Theorem 2.10 we can infer that for all a ≥ x_0^p

    A ≤ C_1 a^{1−s/2p} + C_2 a n^{−s/2} + C_3 a δ_n(s)

and in addition, taking into account |ϑ_n| ≤ n, that

    B = ∫_a^{n^p} P(|ϑ_n| > x^{1/p}) λ(dx) ≤ C_1 ∫_a^{n^p} x^{−s/2p} λ(dx) + C_2 n^{p−s/2} + C_3 n^p δ_n(s).

Thus for all a ≥ x_0^p

    lim sup_{n→∞} ∫_{{|ϑ_n|^p > a}} |ϑ_n|^p dP ≤ C a^{1−s/2p}.

104

D. Ferger / Journal of Statistical Planning and Inference 92 (2001) 73–109

Theorem 5.2. Let Ms ¡ ∞ for some s ¿ 2. Assume w with (1:4)– (1:6) is continuously di erentiable at  and n (s) = o(n−p ). If (P) holds and T in (4:1) is a.s. unique then for all 0 ¡ p ¡ s=2 |T |p is integrable and

n − [n] ∼ ||T ||n−p n p

as well as n − [n] ∼ ||T ||n−1 n p

if 0 ¡ p ¡ 1

if 16p ¡ s=2:

Here ||X ||p :={E|X |p }min(1; 1=p) for a p-fold integrable random variable. L

Proof. Since |n |p → |T |p by Theorem 4.1, Lemma 4:1 and Theorem 5:4 in Billingsley (1968) give the desired result. Example 5.3. Let w(t) = t −a (1 − t)−b ; 06a; b ¡ 1, so that  −s=2 ; 06a ¡ 1=2; n n (s) = n−s=2 log n; a = 1=2;  −s(1−a) ; 1=2 ¡ a ¡ 1; n in the case of s ¿ 2 (see Example 2.3). Thus the condition n (s) = o(n−p ) is ful lled for all 06a61=2. If 1=2 ¡ a ¡ 1 one additionally has to require s ¿ p=(1 − a).

6. Rates of convergence in the Prokhorov metric and in the Ky-Fan metric

In Sections 3 and 5 we saw that the sequence (θ_n) converges to θ almost surely as well as in L_p. Especially, we have stochastic convergence and convergence in distribution:

    θ_n →_P θ,   n → ∞,                                                  (6.1)

and

    θ_n →_L θ,   n → ∞.                                                  (6.2)

Again it is obvious to ask for rates of convergence in (6.1) and (6.2). To make this statement precise, recall that stochastic convergence may be expressed by means of the Ky-Fan metric α, defined on the set Z of all real-valued random variables on a common probability space (Ω, A, P):

    α(X, Y) = inf{ε > 0: P(|X − Y| ≥ ε) ≤ ε},   X, Y ∈ Z.
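For paired samples on the same space, the Ky-Fan distance of the empirical distribution of the differences d_i = |x_i − y_i| admits a closed form: with d sorted in decreasing order and d[n] := 0 as a sentinel, the infimum equals min_k max(k/n, d[k]). The small utility below is our own illustration, not taken from the paper.

```python
def ky_fan(x, y):
    """Ky-Fan distance alpha(X, Y) = inf{eps > 0: P(|X - Y| >= eps) <= eps},
    evaluated for the empirical distribution of the paired samples x, y.
    For eps just above d[k] at most k differences are >= eps, so the
    condition holds as soon as eps >= k/n; hence the min-max formula."""
    n = len(x)
    d = sorted((abs(a - b) for a, b in zip(x, y)), reverse=True) + [0.0]
    return min(max(k / n, d[k]) for k in range(n + 1))
```

In a simulation study, α(θ_n, θ) from (6.3) could be estimated this way from replicated samples.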


Let (X_n)_{n≥1} and X be in the pseudometric space (Z, α). Then X_n →_P X iff α(X_n, X) → 0. Analogously, convergence in law is equivalent to convergence in the pseudometric space (Z, π), with π denoting the Prokhorov metric, which is given by

    π(X, Y) = inf{ε > 0: P(X ∈ A) ≤ P(Y ∈ A^ε) + ε  for all closed A ⊆ R},

where A^ε = {y ∈ R: there exists x ∈ A with |x − y| < ε} is the ε-neighborhood of A. Hence (6.1) and (6.2) are equivalent to

    α(θ_n, θ) → 0                                                        (6.3)

and

    π(θ_n, θ) → 0.                                                       (6.4)

Our goal now is to establish rates of convergence in (6.3) and (6.4). It is easy to prove that

    π(X, Y) ≤ α(X, Y) ≤ ε + P(|X − Y| ≥ ε)                               (6.5)

for all X, Y ∈ Z and for all ε > 0. Since θ is constant, Dudley's (1968) Theorem 1 actually gives

    π(θ_n, θ) = α(θ_n, θ) ≤ ε + P(|θ_n − θ| ≥ ε).                        (6.6)

Therefore, again Theorem 2.10 comes into play, with which one can easily prove

Theorem 6.1. Let M_p < ∞ for some p > 3/2 and w with (1.4)–(1.6). If (P) holds, then

    π(θ_n, θ) = α(θ_n, θ) ≤ C_0 · { n^{−(2p−3)},    3/2 < p ≤ 1 + √2/2,
                                    n^{−(p−1)/p},   1 + √2/2 < p < 2,
                                    n^{−p/(p+2)},   2 ≤ p < ∞ }  + C_1 δ_n(p).

If K is bounded and (2.3) holds, then π(θ_n, θ) = α(θ_n, θ) ≤ C n^{−1} log n.

Inequality (6.5) is often used in the literature to obtain rates of convergence in functional limit theorems for partial-sum processes. See Borovkov (1973), Komlós et al. (1974), Häusler (1984) or Sakhanenko (1985). With this in mind we may expect that Theorem 6.1, too, establishes sharp bounds in the change-point analysis.
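The rate n^{−p/(p+2)} for 2 ≤ p < ∞ is obtained by balancing the two summands in (6.6): with P(|θ_n − θ| ≥ ε) of order (nε)^{−p/2}, the choice ε = n^{−p/(p+2)} makes both terms equal up to constants. A quick numerical confirmation (our sketch, not from the paper):

```python
def balance_exponent(p):
    """Exponent e such that eps = n**(-e) solves eps = (n*eps)**(-p/2),
    the balance point of the two summands in (6.6) for 2 <= p < infinity."""
    return p / (p + 2)

for p in (2.0, 3.0, 5.0):
    n = 1e8
    eps = n ** (-balance_exponent(p))
    # fixed-point property: eps equals (n*eps)**(-p/2) up to rounding
    assert abs(eps - (n * eps) ** (-p / 2)) <= 1e-12 * eps
```

For p = 2 this gives the exponent 1/2; as p → ∞ the exponent tends to 1, consistent with the bounded-kernel rate n^{−1} log n.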

Appendix: Miscellaneous inequalities

Lemma A.1. Let (ξ_i, F_i)_{1≤i≤n} be a martingale difference array and let S_k = Σ_{i=1}^{k} ξ_i, 1 ≤ k ≤ n. Then we have

    E| Σ_{i=1}^{n} ξ_i |^p ≤ B_p n^{p/2−1} Σ_{i=1}^{n} E|ξ_i|^p   for all p ≥ 2,   (A.1)

where B_p = (8(p − 1) max(1, 2^{p−3}))^p, and

    E| Σ_{i=1}^{n} ξ_i |^p ≤ 2 Σ_{i=1}^{n} E|ξ_i|^p   for all 1 ≤ p ≤ 2.           (A.2)

Especially, if the ξ_1, …, ξ_n are i.i.d., have expectation zero and m_p = E|ξ_1|^p < ∞, then

    E| Σ_{i=1}^{n} ξ_i |^p ≤ B_p m_p n^{p/2},  2 < p < ∞;    ≤ 2 m_p n,  1 ≤ p ≤ 2.   (A.3)

For the proof of (A.1), see Dharmadhikari et al. (1968). A proof of (A.2) is given in Koroljuk and Borovskich (1994, p. 69).

Lemma A.2. Let ξ_1, …, ξ_n, n ∈ N, be independent and identically distributed X-valued random variables with distribution μ, and let h: X² → R be a measurable mapping such that

    ∫ h(x, y) μ(dx) = ∫ h(y, x) μ(dx) = 0   μ-a.e.

If m_p = E|h(ξ_1, ξ_2)|^p < ∞ for some p ≥ 1, then there exists a positive constant A_p which does not depend on n such that for all ε > 0

    P( max_{1≤l≤n} | Σ_{i=l+1}^{n} Σ_{j=1}^{l} h(ξ_i, ξ_j) | > ε ) ≤ A_p m_p ε^{−p} n^{κ(p)}   (A.4)

with κ(p) = max(p, 2).

Proof. For p ≥ 2 this is Lemma 3.1 of Ferger (1994d). For 1 ≤ p ≤ 2 one has to replace Burkholder's and Hölder's inequality therein by the inequality of von Bahr and Esseen (1965).

Lemma A.3. (1) For all m ∈ N and r ≥ 0 we have

    Σ_{l=1}^{m−1} l^r (m − l)^{−r} ≤ (r/(r − 1)) m^r,   r > 1;
                                   ≤ m log m,            r = 1;
                                   ≤ (1/(1 − r)) m,      0 ≤ r < 1.

(2) For all m ∈ N and 0 ≤ r < 1 we have

    Σ_{l=1}^{m−1} l (m − l)^{−r} ≤ (1/(1 − r)) m^{2−r}.
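Before the proof, the bounds of Lemma A.3 are easy to confirm numerically; the following check is our own addition.

```python
import math

def lhs1(m, r):
    # left-hand side of Lemma A.3(1)
    return sum(l**r * (m - l)**(-r) for l in range(1, m))

def bound1(m, r):
    # right-hand side of Lemma A.3(1)
    if r > 1:
        return r / (r - 1) * m**r
    if r == 1:
        return m * math.log(m)
    return m / (1 - r)

for m in (10, 50, 200):
    for r in (0.0, 0.5, 1.0, 2.0, 3.5):
        assert lhs1(m, r) <= bound1(m, r)
    for r in (0.0, 0.5, 0.9):
        # part (2)
        assert sum(l * (m - l)**(-r) for l in range(1, m)) <= m**(2 - r) / (1 - r)
```

The dominant contribution for r > 1 comes from l near m − 1, which is why the bound grows like m^r rather than m.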

Proof. (1) For all p, q > 1 with p^{−1} + q^{−1} = 1 it follows from Hölder's inequality that

    Σ_{l=1}^{m−1} l^r (m − l)^{−r} ≤ ( Σ_{l=1}^{m−1} l^{pr} )^{1/p} ( Σ_{l=1}^{m−1} l^{−qr} )^{1/q}.

Moreover,

    Σ_{l=1}^{m−1} l^{pr} ≤ ∫_1^m x^{pr} dx ≤ m^{pr+1}/(pr + 1)

and similarly if, e.g., r > 1, then

    Σ_{l=1}^{m−1} l^{−qr} ≤ 1 + ∫_1^m x^{−qr} dx ≤ 1 + 1/(qr − 1) = qr/(qr − 1).     (A.5)

Thus we obtain

    Σ_{l=1}^{m−1} l^r (m − l)^{−r} ≤ (1/(pr + 1))^{1/p} (qr/(qr − 1))^{1/q} m^{r+1/p}.

Passing to the limit p ↑ ∞ (or equivalently q ↓ 1) gives (1) in the case of r > 1. If 0 ≤ r < 1, replace (A.5) by

    Σ_{l=1}^{m−1} l^{−qr} ≤ m^{1−qr}/(1 − qr).

Finally, if r = 1, then the sum in (1) simplifies to

    Σ_{l=1}^{m−1} l/(m − l) = m Σ_{l=1}^{m−1} 1/l − m + 1 ≤ m log m.

In the same manner one proves (2).

The proof of the following lemma is elementary and can be found in Ferger (1995).

Lemma A.4. Let T ⊆ R and f: T → R be a function with the properties that f has at least one maximizer, i.e. there exists a t_0 ∈ T with

    f(t) ≤ f(t_0)   for all t ∈ T,

and

    −inf_{t∈T} f(t) < f(t_0).

Then the set of maximizers of |f| coincides with that of f.

Lemma A.5. Let ξ_1, …, ξ_n, n ∈ N, be i.i.d. random variables with a distribution function F which is Lipschitz continuous on R_+, i.e. there exists a positive constant L such that F(x + h) − F(x) ≤ Lh for all x ≥ 0 and h ≥ 0. Then the distribution function H_n of M_n = max_{0≤k≤n} Σ_{i=1}^{k} ξ_i satisfies

    H_n(x + h) − H_n(x) ≤ Lnh   for all x ≥ 0, h ≥ 0, n ∈ N.

Proof. For n = 1 we have H_1(x) = P(max(0, ξ_1) ≤ x) = F(x) for all x ≥ 0, and the assertion is true by assumption on F. We shall prove the general case by induction on n. By Theorem 4 in Chow and Teicher (1978, p. 368), M_{n+1} =_L max(0, M_n + ξ), where M_n and ξ ∼ F are independent. Thus for all x ≥ 0 and h ≥ 0

    H_{n+1}(x + h) − H_{n+1}(x)
      = ∫_{−∞}^{x+h} H_n(x + h − y) F(dy) − ∫_{−∞}^{x} H_n(x − y) F(dy)
      = ∫_{x}^{x+h} H_n(x + h − y) F(dy) + ∫_{−∞}^{x} [H_n(x − y + h) − H_n(x − y)] F(dy)
      ≤ F(x + h) − F(x) + Lnh ≤ L(n + 1)h

by the induction hypothesis.

References

Antoch, J., Hušková, M., 1998. Estimators of changes. In: Ghosh, S. (Ed.), Asymptotics, Nonparametrics and Time Series. M. Dekker, New York, pp. 533–578.
Bhattacharya, P.K., Brockwell, P.J., 1976. The minimum of an additive process with applications to signal estimation and storage theory. Z. Wahrsch. Verw. Gebiete 37, 51–75.
Billingsley, P., 1968. Convergence of Probability Measures. Wiley, New York.
Borovkov, A.A., 1973. On the rate of convergence for the invariance principle. Theory Probab. Appl. 18, 207–225.
Chow, Y.S., Teicher, H., 1978. Probability Theory. Springer, New York.
Csörgő, M., Horváth, L., 1988. Invariance principles for change-point problems. J. Multivariate Anal. 27, 151–168.
Darkhovsky, B.S., 1976. A nonparametric method for the a posteriori detection of the "disorder" time of a sequence of independent random variables. Theory Probab. Appl. 21, 178–183.
Dharmadhikari, S.W., Fabian, V., Jogdeo, K., 1968. Bounds on the moments of martingales. Ann. Math. Statist. 39, 1719–1723.
Dudley, R.M., 1968. Distances of probability measures and random variables. Ann. Math. Statist. 39, 1563–1572.
Dümbgen, L., 1991. The asymptotic behavior of some nonparametric change-point estimators. Ann. Statist. 19, 1471–1495.
Ferger, D., 1994a. An extension of the Csörgő–Horváth functional limit theorem and its applications to change-point problems. J. Multivariate Anal. 51, 338–351.
Ferger, D., 1994b. Asymptotic distribution theory of change-point estimators and confidence intervals based on bootstrap approximation. Math. Methods Statist. 3, 362–378.
Ferger, D., 1994c. Change-point estimators in case of small disorders. J. Statist. Planning Inference 40, 33–49.
Ferger, D., 1994d. On exact rates of convergence in functional limit theorems for U-statistic type processes. J. Theoret. Probab. 7, 709–723.
Ferger, D., 1995.
Change-point estimators based on weighted empirical processes with applications to the two-sample problem in general measurable spaces. Habilitationsschrift, University of Giessen (in German).
Ferger, D., Stute, W., 1992. Convergence of change-point estimators. Stochastic Process. Appl. 42, 345–351.
Gänssler, P., Stute, W., 1977. Wahrscheinlichkeitstheorie. Springer, Heidelberg.
Häusler, E., 1984. An exact rate of convergence in the functional central limit theorem for special martingale difference arrays. Z. Wahrsch. Verw. Gebiete 65, 523–534.
Hoeffding, W., 1963. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13–30.
Komlós, J., Major, P., Tusnády, G., 1974. Weak convergence and embedding. In: Révész, P. (Ed.), Colloquia Mathematica Societatis János Bolyai, Vol. 11. Limit Theorems of Probability Theory. North-Holland, Amsterdam, pp. 149–165.


Koroljuk, V.S., Borovskich, Yu.V., 1994. Theory of U-Statistics. Kluwer Academic Publishers, Dordrecht.
Sakhanenko, A.I., 1985. Estimates in invariance principle. Proc. Inst. Math. Novosibirsk 5, 37–44.
Szyszkowicz, B., 1991. Change-point problems and contiguous alternatives. Statist. Probab. Lett. 11, 299–308.
van der Vaart, A.W., Wellner, J.A., 1996. Weak Convergence and Empirical Processes. Springer, New York.
von Bahr, B., Esseen, C.-G., 1965. Inequalities for the r-th absolute moment of a sum of random variables, 1 ≤ r ≤ 2. Ann. Math. Statist. 36, 299–303.
Yao, Y.-C., Huang, D., Davis, R., 1994. On almost sure behavior of change-point estimators. In: Carlstein, E., Müller, H.-G., Siegmund, D. (Eds.), Change-Point Problems, IMS Lecture Notes–Monograph Series, Vol. 23, pp. 359–372.