Tests for continuity of regression functions

Journal of Statistical Planning and Inference 137 (2007) 753–777
www.elsevier.com/locate/jspi

Jaromír Antoch (a,*), Gerard Gregoire (b), Marie Hušková (a)

(a) Department of Statistics, Faculty of Mathematics and Physics, Charles University of Prague, Sokolovská 83, CZ-186 75 Praha 8, Czech Republic
(b) Université Joseph Fourier, Grenoble, France

Available online 21 July 2006

Abstract

Tests for continuity of regression functions based on local linear estimators are developed and their limit properties are studied. A bootstrap method is proposed to obtain approximations for the critical values. A small simulation study was conducted in order to check how the tests work in finite sample situations.
© 2006 Elsevier B.V. All rights reserved.

MSC: 62G20; 62E20; 60F17

Keywords: Non-smooth regression; Local linear estimators; Limit properties

1. Introduction

We consider the regression model

Y_i = m(X_i) + σ(X_i) ε_i,  i = 1, …, n,   (1.1)

where (X_1, Y_1), …, (X_n, Y_n) are independent identically distributed (i.i.d.) random vectors and ε_1, …, ε_n are i.i.d. random errors with

E ε_i = 0,  var ε_i = 1  and  E|ε_i|^(2+Δ) < ∞   (1.2)

for some Δ > 1/2. The regression function m(·), the variance function σ²(·) and the density f(·) of the X_i are unknown functions that are supposed to be smooth except possibly at a finite number of points. The density f(·) is bounded away from 0 on [0, 1] and is equal to 0 outside [0, 1]. We are interested in the testing problem concerning the smoothness of the regression function m(·); namely, we want to test

H_0: m(·) is a smooth function on [0, 1]   (1.3)

against

H_1: m(·) has at least one jump in (0, 1).   (1.4)

* Corresponding author.
E-mail addresses: [email protected] (J. Antoch), [email protected] (G. Gregoire), [email protected] (M. Hušková).

© 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2006.06.007


The aim of this paper is to develop procedures for the testing problem (H_0, H_1) based on nonparametric estimators of m(·) and σ²(·), in particular on their local linear estimators. Most papers in the area of nonparametric regression deal with efficient estimation of smooth regression curves; see, e.g., Gijbels and Fan (1996), Antoch et al. (2003) or Hart (1997) for basic information. During the last 10 years, papers on estimation of the locations of jumps in an otherwise smooth regression function have appeared, e.g., Wu and Chu (1993a, b), Müller and Song (1997), Eubank and Speckman (1994) and Eubank et al. (1995). However, as soon as one suspects that the regression function m(·) may have discontinuity points, one should start by testing continuity versus discontinuity of the regression function, then, if appropriate, estimate the location(s) of the jump(s), and finally estimate the regression function itself. There are only a few papers concerning this testing problem. Müller and Stadtmüller (1999) developed a test of (1.3) versus (1.4) in model (1.1) with constant nonzero variance and fixed design X_i = i/n, i = 1, …, n, based on simple quadratic-type test statistics. Horváth and Kokoszka (2002) developed a test procedure based on one-sided local polynomial estimators for the same model as Müller and Stadtmüller (1999). Our test procedure is related to the procedure proposed by Horváth and Kokoszka (2002); however, we admit a more general model. Moreover, we also provide estimation of the locations of jumps. In particular, our procedure is based on the supremum of properly standardized absolute values of the difference of the one-sided local linear estimators of the regression function m(·). We derive the limit distribution of the proposed test statistic under the null hypothesis, which turns out to be of extreme value type (see Theorems 3.1 and 3.2 below). This provides approximations for the critical values. The rest of the paper is organized as follows.
The test procedure is developed in Section 2. The results on the limit behavior both under the null and alternative hypotheses are stated and discussed in Section 3. Approximations to the desired critical values based on resampling methods are discussed in Section 4. An illustrative example is presented in Section 5. The proofs of the theorems from Sections 3 and 4 are contained in Section 6.

2. Test statistics

Our test procedure for (H_0, H_1) is based on local linear estimators of

m_+(x) = lim_{y→x+} m(y)  and  m_-(x) = lim_{y→x-} m(y),

i.e., of the right and left limits of m(·) at the point x, x ∈ (0, 1). We recall some basic notions. The local linear estimator m̂_+(x) of m_+(x) is defined as m̂_+(x) = α̂_+(x), where (α̂_+(x), β̂_+(x)) minimizes

Σ_{i=1}^n (Y_i − α(x) − β(x)(x − X_i))² k_+((x − X_i)/h_n)

w.r.t. α(x) and β(x); here k_+(·) is a one-sided kernel function with support [−1, 0] and h_n is a bandwidth. The local linear estimator m̂_+(x) of m_+(x) can be expressed as

m̂_+(x) = Σ_{i=1}^n w_i^+(x) Y_i / Σ_{i=1}^n w_i^+(x),  x ∈ [0, 1),   (2.1)

where

w_i^+(x) = k_+((x − X_i)/h_n) { Σ_{j=1}^n k_+((x − X_j)/h_n)(x − X_j)² − (x − X_i) Σ_{j=1}^n k_+((x − X_j)/h_n)(x − X_j) }.   (2.2)

Now, m̂_-(·) and the weights w_i^-(·) are defined accordingly, based on the kernel k_-(·) related to k_+(·) by

k_-(x) = k_+(−x),  x ∈ R¹,   (2.3)

i.e.,

m̂_-(x) = Σ_{i=1}^n w_i^-(x) Y_i / Σ_{i=1}^n w_i^-(x),  x ∈ (0, 1],   (2.4)

where

w_i^-(x) = k_-((x − X_i)/h_n) { Σ_{j=1}^n k_-((x − X_j)/h_n)(x − X_j)² − (x − X_i) Σ_{j=1}^n k_-((x − X_j)/h_n)(x − X_j) }.   (2.5)
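For concreteness, the one-sided estimators (2.1)–(2.5) can be sketched as follows. The quadratic one-sided kernel and the simulated data are illustrative assumptions, not the paper's choices; the normalization imposed later in (A.6.1) is omitted here because the estimators are invariant to rescaling of the kernel.

```python
import numpy as np

def k_plus(z):
    # Illustrative one-sided kernel with support [-1, 0] (an assumption).
    return np.where((z >= -1.0) & (z <= 0.0), 1.0 - z**2, 0.0)

def k_minus(z):
    # Mirrored kernel with support [0, 1], cf. (2.3).
    return k_plus(-z)

def m_hat(x, X, Y, h, kernel):
    """One-sided local linear estimate via the closed-form weights (2.2)/(2.5)."""
    k = kernel((x - X) / h)
    d = x - X
    # w_i = k_i * ( sum_j k_j d_j^2  -  d_i * sum_j k_j d_j )
    w = k * (np.sum(k * d**2) - d * np.sum(k * d))
    return np.sum(w * Y) / np.sum(w)

# A local linear fit reproduces a linear regression function exactly:
X = np.linspace(0.0, 1.0, 201)
Y = 2.0 * X + 1.0                       # noiseless linear m(x) = 2x + 1
left  = m_hat(0.5, X, Y, 0.2, k_minus)  # uses only X_i in [0.3, 0.5]
right = m_hat(0.5, X, Y, 0.2, k_plus)   # uses only X_i in [0.5, 0.7]
```

Both `left` and `right` recover m(0.5) = 2 up to rounding, so for smooth m the difference of the two one-sided fits is small; this is the property the test statistic below exploits.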

It is expected that the difference

|m̂_+(x) − m̂_-(x)|,  x ∈ (0, 1),

is sensitive to jumps in the regression function m(·). Large values for some x indicate possible jump(s) of the regression function m(·) in a neighborhood of x. Therefore, functionals based on m̂_+(x) − m̂_-(x), x ∈ (0, 1), can be used to construct test statistics. This leads to the max-type test statistic defined as

T_n = T_n(k_±, h_n) = sup_{x ∈ [h_n, 1−h_n]} √(n h_n / v̂(x)) · |m̂_+(x) − m̂_-(x)|,   (2.6)

where v̂(x), x ∈ (0, 1), is an estimator of the function

v(x) = σ²_+(x)/f_+(x) + σ²_-(x)/f_-(x),  x ∈ (0, 1),

with the properties, as n → ∞,

sup_{x ∈ [h_n, 1−h_n] − (D_n^σ(h_n) ∪ D_n^f(h_n))} (log h_n^{−1}) · |v̂^{−1}(x) − v^{−1}(x)| = o_P(1),   (2.7)

sup_{x ∈ D_n^σ(h_n) ∪ D_n^f(h_n)} v̂^{−1}(x) = O_P(1),   (2.8)

where D_n^σ(h_n) and D_n^f(h_n) are unions of h_n-neighborhoods of the discontinuity points of σ²(·) and f(·), respectively; for details see (6.1)–(6.3). Here

f_+(x) = lim_{y→x+} f(y),  f_-(x) = lim_{y→x-} f(y),   (2.9)

with f(x) being the density of X_i, and

σ²_+(x) = lim_{y→x+} σ²(y),  σ²_-(x) = lim_{y→x-} σ²(y).   (2.10)

Alternatively, one can introduce the sum-type test statistic defined as

T_n(q) = T_n(k_±, h_n, q) = (1/n) Σ_{i=1}^n (q(X_i)/v̂(X_i)) · |m̂_+(X_i) − m̂_-(X_i)|,   (2.11)

where q(·) is a weight function. We will not study properties of these test statistics here.

The proposed test statistics T_n and T_n(q) are functionals of weighted distances between the estimators of m_+(x) and m_-(x); in other words, for each x we compare estimators of m(x) based on the observations (X_i, Y_i) for which the X_i's lie in the right or left neighborhood of x, respectively. The function

√(n h_n / v̂(x)) · |m̂_+(x) − m̂_-(x)|,  x ∈ [h_n, 1−h_n],

can be used for diagnostic purposes because it has peaks close to the jump points of m(·); therefore, it can be useful for the estimation of jump points and, possibly, of the number of jumps.
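The construction above can be made concrete with a short numerical sketch, assuming a uniform design and a constant error standard deviation so that v(x) = 2σ²/f(x) = 2σ² can be plugged in directly; the kernel, the sample sizes and the regression function are illustrative, and in practice a data-driven v̂(x) satisfying (2.7)–(2.8) would be used.

```python
import numpy as np

def k_plus(z):                        # illustrative one-sided kernel, support [-1, 0]
    return np.where((z >= -1.0) & (z <= 0.0), 1.0 - z**2, 0.0)

def k_minus(z):                       # mirrored kernel, support [0, 1]
    return k_plus(-z)

def m_hat(x, X, Y, h, kernel):        # one-sided local linear fit, cf. (2.1)-(2.5)
    k = kernel((x - X) / h)
    d = x - X
    w = k * (np.sum(k * d**2) - d * np.sum(k * d))
    return np.sum(w * Y) / np.sum(w)

def max_type_statistic(X, Y, h, grid, v_hat):
    """Evaluate sqrt(n h / v_hat(x)) |m+ - m-| on a grid; the max is T_n of (2.6)."""
    n = len(X)
    tn = np.array([np.sqrt(n * h / v_hat(x))
                   * abs(m_hat(x, X, Y, h, k_plus) - m_hat(x, X, Y, h, k_minus))
                   for x in grid])
    return tn.max(), tn

rng = np.random.default_rng(0)
n, h, sigma = 400, 0.1, 0.05
X = np.sort(rng.uniform(0.0, 1.0, n))
grid = np.linspace(h, 1.0 - h, 81)
v_hat = lambda x: 2.0 * sigma**2              # v(x) = 2 sigma^2 / f(x) with f = 1 here

Y_smooth = np.sin(2.0 * X) + sigma * rng.standard_normal(n)   # smooth m, i.e. H0
T0, _ = max_type_statistic(X, Y_smooth, h, grid, v_hat)
Y_jump = Y_smooth + 1.0 * (X >= 0.5)                          # H1: jump of size 1 at 0.5
T1, tn1 = max_type_statistic(X, Y_jump, h, grid, v_hat)
```

Under the jump alternative T1 is far larger than T0, and the peak of tn1 lies near the jump point, which is exactly the diagnostic use described above.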


Large values of the test statistic T_n indicate that the null hypothesis H_0 is not true, so the α-level test has the rejection region

T_n ≥ c_n(α)   (2.12)

with c_n(α) defined by P_{H_0}(T_n ≥ c_n(α)) = α, where P_{H_0} denotes the probability under H_0. An approximation to c_n(α) can be obtained through the limit distribution of T_n under H_0 or through resampling methods.

As soon as the variance function σ²(·) and the density function f(·) are continuous at x, the function v(·) is also continuous at x and v(x) = 2σ²(x)/f(x). If the variance function σ²(·) is constant and the density function f(x) of X_i is uniform on (0, 1), then the test statistic T_n reduces to

max_{h_n ≤ x ≤ 1−h_n} √(n h_n / (2σ̂²_n)) · |m̂_+(x) − m̂_-(x)|,

where σ̂²_n is an estimator of σ² = var Y_i with the property, as n → ∞, (log h_n^{−1})(σ̂²_n − σ²) = o_P(1).

One can think about a number of other modifications, e.g., using local polynomial estimators instead of the local linear ones; see the remarks at the end of Section 3. Along the same lines one can also develop test procedures for testing the smoothness of the variance function σ²(·). The results are formulated for random designs; however, one can introduce accordingly test statistics for the case of a fixed design, for details see Remark 3.5. For the fixed design with X_i = i/n, i = 1, …, n, the considered test statistic T_n reduces to the kernel-type test statistics considered in change point analysis by Hušková and Slabý (2001). A similar type of test statistic was considered by Grégoire and Hamrouni (2002), who constructed a test for smoothness at a fixed point. Horváth and Kokoszka (2002) considered local polynomial estimators, a fixed design with equidistant design points and σ²(x) = σ² > 0, x ∈ [0, 1].

3. Limit distribution of T_n under H_0 and H_1

Throughout this section the following assumptions on the model are made:

(A.0) (X_1, Y_1), …, (X_n, Y_n) are i.i.d. random vectors such that

Y_i = m(X_i) + σ(X_i) ε_i,  i = 1, …, n,

and the random vectors (ε_1, …, ε_n) and (X_1, …, X_n) are independent.

(A.1) ε_1, …, ε_n are i.i.d. random variables with zero mean, unit variance and E|ε_i|^(2+Δ) < ∞ for some Δ > 1/2.

(A.2) The regression function m(x), x ∈ [0, 1], has a Lipschitz first derivative, i.e., |m′(x) − m′(y)| ≤ c|x − y| for all x, y ∈ [0, 1], with some c > 0; the derivatives at the end points are one-sided.

(A.3) The variance function σ²(x), x ∈ [0, 1], is bounded away from 0 and can be decomposed as

σ²(x) = σ²_0(x) + Σ_{j=1}^p b_j I{x ≥ a_j},  x ∈ [0, 1],

where σ²_0(x) is Lipschitz on [0, 1] and b_1, …, b_p, a_1, …, a_p are real numbers such that 0 < a_1 < ⋯ < a_p < 1 and b_j ≠ b_{j+1}, j = 1, …, p − 1.


(A.4) The density f(x) of X_i is bounded away from 0 on [0, 1], is equal to zero outside [0, 1] and can be decomposed as

f(x) = f_0(x) + Σ_{j=1}^q d_j I{x ≥ c_j},  x ∈ [0, 1],

where f_0(·) is Lipschitz on [0, 1] and d_1, …, d_q, c_1, …, c_q are real numbers such that 0 < c_1 < ⋯ < c_q < 1, d_j ≠ d_{j+1}, j = 1, …, q − 1, and a_i ≠ c_j, i = 1, …, p, j = 1, …, q.

The assumption on the bandwidth {h_n} is the following:

(A.5) {h_n} is a sequence of positive numbers such that

lim_{n→∞} n(h_n² + h_n^{(2+Δ)/Δ}) / (log n)² = ∞  and  lim_{n→∞} n h_n⁵ log² n = 0.

The assumptions on the kernels k_+(·) and k_-(·) are the following:

(A.6.1) The kernels k_-(·) and k_+(·) are functions equal to zero outside the intervals [0, 1] and [−1, 0], respectively, their first derivatives (with one-sided derivatives at the end points) are Lipschitz, and

∫_0^1 k_-(x) dx · ∫_0^1 x² k_-(x) dx − (∫_0^1 x k_-(x) dx)² = 1,

k_-(0) = k_-(1) ∫_0^1 u(1 − u) k_-(u) du = 0,

k_+(x) = k_-(−x)  for all x ∈ R¹.

Here lim_{y→0+} k_-(y) = k_-(0) and lim_{y→1−} k_-(y) = k_-(1).

(A.6.2) The kernels k_-(·) and k_+(·) are functions equal to zero outside the intervals [0, 1] and [−1, 0], respectively, such that their first derivatives (with one-sided derivatives at the end points) are Lipschitz and

∫_0^1 k_-(x) dx · ∫_0^1 x² k_-(x) dx − (∫_0^1 x k_-(x) dx)² = 1,

k_-(0) ≠ 0  and/or  k_-(1) ∫_0^1 u(1 − u) k_-(u) du ≠ 0,

k_+(x) = k_-(−x)  for all x ∈ R¹.

Remark 3.1. Notice that assumption (A.3) admits that the variance function σ²(·) is a sum of a smooth function and a jump function. If b_1 = ⋯ = b_p = 0, then the variance function σ²(·) is smooth. Similar remarks apply to the density function f(·). It is assumed that σ²(·) and f(·) have no jumps at the same points.

Remark 3.2. Assumption (A.5) is satisfied for h_n = n^{−η} with η ∈ (1/5, min(1/2, Δ/(Δ + 2))).
Remark 3.3. Notice that k_+(·) and k_-(·) are one-sided kernels and that the assumption

∫_0^1 k_-(x) dx · ∫_0^1 x² k_-(x) dx − (∫_0^1 x k_-(x) dx)² = 1

reflects a standardization that simplifies the formulation of the assertions; moreover, it means that k_-(·) cannot be constant on [0, 1] and that

∫_0^1 k_-(x) dx · ∫_0^1 x² k_-(x) dx > 0.

Now we formulate the main assertions on the limit behavior of the test statistic T_n under the null hypothesis.


Theorem 3.1. Let assumptions (A.0)–(A.5) and (A.6.1) be satisfied and let the estimator v̂(·) satisfy (2.7). Then for all x ∈ R¹, as n → ∞,

P( a(h_n) T_n / √(2 ∫ k*_+²(u) du) ≤ x + b_1(h_n) ) → exp{−2 exp{−x}},   (3.1)

where

a(h) = √(2 log(h^{−1})),  0 < h < 1,   (3.2)

b_1(h) = 2 log(h^{−1}) + (1/2) log( ∫ (k*_+′(t))² dt / (4 ∫ k*_+²(t) dt) ) − log π,  0 < h < 1,   (3.3)

where

k*_±(z) = k_±(z) ( ∫ u² k_±(u) du − z ∫ u k_±(u) du ),  z ∈ R¹,   (3.4)

and k*_±′(·) denotes the derivative of k*_±(·).

Theorem 3.2. Let assumptions (A.0)–(A.5) and (A.6.2) be satisfied and let the estimator v̂(·) satisfy (2.7). Then for all x ∈ R¹, as n → ∞,

P( a(h_n) T_n / √(2 ∫ k*_+²(u) du) ≤ x + b_2(h_n) ) → exp{−2 exp{−x}},   (3.5)

where a(h) is defined by (3.2) and

b_2(h) = 2 log h^{−1} + (1/2) log log h^{−1} + (1/2) log( (2k*_+²(0) + k*_+²(−1)) / (2 ∫ k*_+²(x) dx) ) − log π,  0 < h < 1.   (3.6)
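As Remark 3.6 below indicates, approximate critical values follow from the limit (3.1) by inverting the Gumbel-type distribution exp{−2e^{−x}}. The sketch below does this for the illustrative kernel k_-(u) = u(1 − u) on [0, 1] (an assumption, rescaled so that the normalization of (A.6.1) equals one, so that Theorem 3.1 applies); the constants in b_1 are taken from the reconstructed (3.3), so the resulting numbers are indicative only.

```python
import numpy as np

# Illustrative kernel on [0, 1], rescaled so that
# int k * int u^2 k - (int u k)^2 = 1, as required in (A.6.1).
u = np.linspace(0.0, 1.0, 20001)
du = u[1] - u[0]

def integ(y):                                  # trapezoidal rule on the uniform grid
    return float((y[:-1] + y[1:]).sum() * du / 2.0)

k = u * (1.0 - u)
k = k * (integ(k) * integ(u**2 * k) - integ(u * k) ** 2) ** -0.5

# Equivalent kernel k*_- from (3.4) and the integrals entering (3.2)-(3.3).
s1, s2 = integ(u * k), integ(u**2 * k)
kstar = k * (s2 - u * s1)
S = integ(kstar**2)                            # int (k*)^2
D = integ(np.gradient(kstar, du) ** 2)         # int (k*')^2, numerically

def critical_value(alpha, h):
    """Approximate c_n(alpha) from (3.1): solve exp(-2 e^{-x}) = 1 - alpha."""
    a = np.sqrt(2.0 * np.log(1.0 / h))
    b1 = 2.0 * np.log(1.0 / h) + 0.5 * np.log(D / (4.0 * S)) - np.log(np.pi)
    x = -np.log(-np.log(1.0 - alpha) / 2.0)
    return np.sqrt(2.0 * S) * (x + b1) / a
```

For example, `critical_value(0.05, 0.1)` gives an approximate 5% critical value for h_n = 0.1; it increases as α decreases and as h_n decreases, as the form of a(h) and b_1(h) suggests.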

Next, we formulate an assertion on a uniform weak invariance principle for m̂_+(x) − m̂_-(x), x ∈ [h_n, 1−h_n] − (D_n^σ(h_n) ∪ D_n^f(h_n)), which provides the crucial tool in the proofs of Theorems 3.1 and 3.2 and is of independent interest. For the definition of D_n^σ(h_n) and D_n^f(h_n) see (6.1)–(6.3).

Theorem 3.3. Let the assumptions of Theorem 3.1 or 3.2 be satisfied. Then there exists a sequence of Wiener processes W_n = {W_n(t): t ≥ 0} such that, as n → ∞,

√(n h_n / v̂(x)) (m̂_+(x) − m̂_-(x)) = ∫ (k*_+(x/h_n − z) − k*_-(x/h_n − z)) dW_n(z) + o_P((log n)^{−1})   (3.7)

uniformly for x ∈ [h_n, 1−h_n] − (D_n^σ(h_n) ∪ D_n^f(h_n)), and

sup_{x: |x−x_0| ≤ h_n} √(n h_n / v̂(x)) |m̂_+(x) − m̂_-(x)| = O_P(1)

for any x_0 ∈ [h_n, 1−h_n].

The proofs are postponed to Section 6. Notice that the process

∫ (k*_+(x/h_n − z) − k*_-(x/h_n − z)) dW(z),  x ∈ [h_n, 1−h_n],   (3.8)


is a stationary Gaussian process with correlation function

∫ (k*_+(x/h_n − z) − k*_-(x/h_n − z)) (k*_+(y/h_n − z) − k*_-(y/h_n − z)) dz,  x, y ∈ R¹,

that does not depend on X_1, …, X_n. Going through the proof we find that the Wiener process W_n in (3.7) is related to ε_1, …, ε_n and not to X_1, …, X_n.

Remark 3.4. The limit distribution of T_n under the null hypothesis is of extreme value type. Moreover, it depends on the choice of the kernel. The limit behavior of T_n remains unchanged if the maximum in T_n is taken over a suitable smaller set of x, e.g., if all h_n-neighborhoods of the discontinuity points of f(·) and σ²(·) are omitted. Notice that lim_{n→∞} (b_2(h_n) − b_1(h_n)) = ∞.

Remark 3.5. The above theorems remain true under various modifications of the assumptions; we discuss some of them. The assertions remain true if instead of the random design (X_i's random variables) we consider a fixed design x_{1n}, …, x_{nn} such that there is an increasing function F(x) on [0, 1], lim_{x→−∞} F(x) = 0, lim_{x→∞} F(x) = 1, having a derivative f that is Lipschitz and bounded away from zero, and such that, as n → ∞,

max_{1≤i≤n} |x_{in} − F^{−1}(i/n)| = o(n^{−1}).

For such fixed designs the assumption (A.1) on the error terms can be relaxed; e.g., the error terms can be dependent. In particular, assumption (A.1) can be replaced by:

(A*.1) There exists a sequence of Wiener processes {W_n(t): t ≥ 0} such that, as n → ∞,

max_{1≤k≤n} | Σ_{i=1}^k ε_i − σ W_n(k) | = O_P(n^{1/(2+Δ)})

with some positive constant σ and some positive Δ. This is satisfied, e.g., if ε_1, …, ε_n form a linear process

ε_i = Σ_{j=0}^∞ v_j e_{i−j},  i = 1, 2, …,

where {e_j}_{j=−∞}^∞ are i.i.d. random variables with zero mean, nonzero variance and E|e_j|^(2+Δ) < ∞ for some Δ > 1/2, and the weights {v_j}_{j=0}^∞ satisfy

Σ_{j=0}^∞ j|v_j| < ∞,  Σ_{j=0}^∞ v_j ≠ 0.

Remark 3.6. The approximations to the critical values can easily be calculated from (3.1) or (3.5) using only a pocket calculator. Another possibility is to apply some of the resampling methods discussed in Section 4.

Remark 3.7. The function v(·) can be estimated by

v̂(x) = σ̂²_-(x)/f̂_-(x) + σ̂²_+(x)/f̂_+(x),  x ∈ [h_n, 1−h_n],   (3.9)

where f̂_±(x) and σ̂²_±(x) are (one-sided) kernel estimators of f_±(x) and σ²_±(x), respectively. In particular, for x ∈ [h_n, 1−h_n] we can choose

f̂_±(x) = (1 / (n h_n ∫ k_±(u) du)) Σ_{i=1}^n k_±((x − X_i)/h_n)   (3.10)

and

σ̂²_±(x) = (1 / (2 Σ_{i=1}^n k_±((x − X_i)/h_n))) Σ_{i=2}^n (Y(i) − Y(i−1))² k_±((x − X_{(i)})/h_n),   (3.11)

where X_{(1)} ≤ ⋯ ≤ X_{(n)} are the order statistics corresponding to X_1, …, X_n and Y(1), …, Y(n) denote the corresponding observations. If the assumptions (A.0)–(A.5) and (A.6.1) or (A.6.2) are satisfied, then, as n → ∞,

sup_{x ∈ [h_n, 1−h_n] − D_n^f(h_n)} (log h_n^{−1}) |f̂_±(x) − f_±(x)| = o_P(1)   (3.12)

and

sup_{x ∈ [h_n, 1−h_n] − (D_n^σ(h_n) ∪ D_n^f(h_n))} (log h_n^{−1}) |σ̂²_±(x) − σ²_±(x)| = o_P(1),   (3.13)

where D_n^σ(h_n) and D_n^f(h_n) are the unions of h_n-neighborhoods of the discontinuity points of σ²(·) and f(·), respectively. Moreover,

sup_{x ∈ (x_0−h_n, x_0+h_n)} f̂_±(x) = O_P(1)   (3.14)

and

sup_{x ∈ (x_0−h_n, x_0+h_n)} σ̂_±^{−2}(x) = O_P(1)   (3.15)

for any x_0 ∈ [h_n, 1−h_n]. The kernels and the bandwidths need not be identical with those generating the estimators m̂_±(x); they only have to fulfill proper assumptions. For the proofs see Lemma 6.3 below.

Test statistics for (H_0, H_1) based on local polynomial estimators can be developed along the same lines, and the corresponding test statistic has the same limit behavior as T_n with properly modified k*_±(·). Hamrouni (1999) in his PhD thesis developed and studied a test based on local linear estimators for the hypothesis that the regression function m(·) is continuous at a fixed point x_0 against the alternative that m(·) has a jump at x_0. He also studied a related estimator of the jump location, and carried out a simulation study to check the performance of the proposed procedures for finite sample sizes. The assertion of Theorem 3.2 coincides with the results of Horváth and Kokoszka (2002), who consider the model with the regression function m(·) constant under the null hypothesis, constant variance function σ²(·), x_{in} = i/n, i = 1, …, n, and local polynomial estimators. Hušková and Slabý (2001) derived the limit behavior of the test statistics corresponding to kernel estimators in the model with the regression function m(·) constant under the null hypothesis, constant variance function σ²(·) and x_{in} = i/n, i = 1, …, n. They proved that Theorems 3.1 and 3.2 hold true for these test statistics if k*_±(·) is replaced by k_±(·).

At the end of this section we briefly discuss the consistency of the test based on T_n. To avoid cumbersome expressions we present the results for a smooth density f(·) of the design points, i.e., in (A.4) we put f(x) = f_0(x), a smooth variance function σ²(·), i.e., in (A.3) we put σ²(x) = σ²_0(x), and a regression function m(·) with one jump only. In particular, we assume that the regression function m(·) has the form

m(x) = m_0(x) + δ I{x ≥ x_0},  x ∈ [0, 1],   (3.16)

where m_0(·) satisfies (A.2), δ ≠ 0 and x_0 ∈ (0, 1).

Theorem 3.4. Let assumptions (A.0), (A.1), (A.3) with σ(·) = σ_0(·), (A.4) with f(·) = f_0(·), (A.5) and (A.6.1) or (A.6.2) be satisfied, and let the estimators f̂_±(·) and σ̂_±(·) be defined by (3.10) and (3.11), respectively. Let the regression


function m(·) satisfy (3.16). Then there exists a sequence of Wiener processes W_n = {W_n(t): t ≥ 0} such that, as n → ∞,

√(n h_n / v̂(x)) (m̂_+(x) − m̂_-(x)) = ∫ (k*_+(x/h_n − z) − k*_-(x/h_n − z)) dW_n(z)
  + δ √(n h_n / v(x)) ( ∫_{−1}^{(x−x_0)/h_n} k*_+(y) dy · I{x ≥ x_0 − h_n} − ∫_0^{(x−x_0)/h_n} k*_-(y) dy · I{x ≥ x_0} )
  + o_P( (log n)^{−1} + √(log n / (n h_n)) )   (3.17)

uniformly for x ∈ [h_n, 1−h_n] − (D_n^σ(h_n) ∪ D_n^f(h_n)), and

sup_{x: |x−x_0| ≤ h_n} √(n h_n / v̂(x)) |m̂_+(x) − m̂_-(x)| = O_P(1)

for any x_0 ∈ [h_n, 1−h_n]. Moreover, the test is consistent.

Remark 3.8. A more detailed investigation of the asymptotic behavior of T_n under alternatives would give information on the optimal choice of the bandwidth h_n and the kernel k_±(·). This will be done in a different paper.

Remark 3.9. Clearly, the right-hand side of (3.17) has two components; the former is random and the latter is nonrandom and dominates the random one in a neighborhood of the jump point. Theorem 3.4 implies that under the considered assumptions, as n → ∞,

T_n = √(n h_n) |δ| √( f(x_0) / (2σ²(x_0)) ) + O_P(√(log n)).

Remark 3.10. Going through the proofs, we find that a jump in the regression function m(·) changes the limit distribution of T_n; in particular, under H_1 we have (log n)^{−1/2} T_n → ∞ in probability, while under H_0, (log n)^{−1/2} T_n = O_P(1). However, jump(s) in either the variance function σ²(·) or the density f(·) do not influence the limit behavior of T_n. In particular, in the estimation of v(x) one could use estimators that do not take into account possible jumps of σ²(·) or f(·).

4. Resampling

It is well known that the convergence in Theorems 3.1 and 3.2 is rather slow, so the asymptotic critical values based on these theorems will not provide a good approximation to the critical values for small and moderate sample sizes. It appears useful to apply some of the resampling methods. Here we apply the bootstrap based on estimated residuals drawn without replacement, which is connected with the permutation principle used in classical testing; for more details see, e.g., Good (2000) and Lehmann (1991). Some more insight into this approach in testing for smoothness can be found, e.g., in Antoch and Hušková (2001), Hušková and Slabý (2001) and Hušková (2004).

By the assumptions, the error terms ε_1, …, ε_n are i.i.d. Therefore, using the general permutation principle, we permute ε_1, …, ε_n and work with the permutation counterpart of T_n, in which ε_1, …, ε_n are replaced by ε_{R_1}, …, ε_{R_n}, where {R_1, …, R_n} is a random permutation of {1, …, n}. Since ε_1, …, ε_n are unknown, we replace them by their estimators, more precisely by the standardized estimated residuals ε̃_1, …, ε̃_n, and permute these residuals. The permutation counterpart T_n(R) of T_n is developed as follows. We define the estimated residuals by

ε̂_i = (Y_i − m̂(X_i)) / σ̂(X_i),  i = 1, …, n,   (4.1)


where m̂(X_i) and σ̂(X_i) are estimators of m(X_i) and σ(X_i), respectively, with the properties, as n → ∞,

(1/n) Σ_i |m̂(X_i) − m(X_i)|^(2+ν) = o_P(1),   (4.2)

max_{i: X_i ∉ D_n^σ(h_n)} |σ̂^{−2}(X_i) − σ^{−2}(X_i)| + max_{i: X_i ∉ D_n^f(h_n)} |f̂(X_i) − f(X_i)| = o_P(1)   (4.3)

and

max_{i: X_i ∈ D_n^σ(h_n)} σ̂^{−2}(X_i) + max_{i: X_i ∈ D_n^f(h_n)} f̂(X_i) = O_P(1)   (4.4)

for some ν > 0. This is satisfied by a number of estimators. For example, by Lemma 6.4 and its corollaries one can choose

m̂(X_i) = (m̂_+(X_i) + m̂_-(X_i)) / 2,   (4.5)

σ̂(X_i) = (σ̂_+(X_i) + σ̂_-(X_i)) / 2,   (4.6)

f̂(X_i) = (f̂_+(X_i) + f̂_-(X_i)) / 2,   (4.7)

where m̂_±(·), f̂_±(·) and σ̂_±(·) are defined by (2.1), (2.4), (3.10) and (3.11), respectively. The standardized estimated residuals are defined by

ε̃_i = (ε̂_i − (1/n) Σ_{j=1}^n ε̂_j) / ( (1/n) Σ_{k=1}^n (ε̂_k − (1/n) Σ_{j=1}^n ε̂_j)² )^{1/2},  i = 1, …, n.   (4.8)

Let R = (R_1, …, R_n) be a random permutation of {1, …, n}, independent of (Y_i, X_i), i = 1, …, n. Then the permutation counterpart of T_n is defined as

T_n(R) = sup_{x ∈ [h_n, 1−h_n]} √(n h_n f̂(x)/2) · |m̂_+(x, R) − m̂_-(x, R)|,   (4.9)

where

m̂_±(x, R) = Σ_{i=1}^n ε̃_{R_i} w_i^±(x) / Σ_{i=1}^n w_i^±(x)   (4.10)

and f̂(x) is an estimator of f(x) with the properties

sup_{x ∈ [h_n, 1−h_n] − D_n^f(h_n)} (log h_n^{−1}) · |f̂(x) − f(x)| = o_P(1),  sup_{x ∈ D_n^f(h_n)} |f̂(x)| = O_P(1).   (4.11)
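The construction (4.1)–(4.8) can be sketched as follows, using the two-sided average (4.5) for m̂ and, purely to keep the sketch short, a known constant σ in place of (4.6); design points closer than h_n to the boundary are skipped so that both one-sided fits are well defined. All data and parameter choices are illustrative.

```python
import numpy as np

def k_plus(z):                       # illustrative one-sided kernel, support [-1, 0]
    return np.where((z >= -1.0) & (z <= 0.0), 1.0 - z**2, 0.0)

def k_minus(z):
    return k_plus(-z)

def m_side(x, X, Y, h, kernel):      # one-sided local linear fit, cf. (2.1)/(2.4)
    k = kernel((x - X) / h)
    d = x - X
    w = k * (np.sum(k * d**2) - d * np.sum(k * d))
    return np.sum(w * Y) / np.sum(w)

def standardized_residuals(X, Y, h, sigma):
    """Residuals (4.1) with m_hat from (4.5), standardized as in (4.8)."""
    keep = (X >= h) & (X <= 1.0 - h)
    m2 = np.array([(m_side(x, X, Y, h, k_plus) + m_side(x, X, Y, h, k_minus)) / 2.0
                   for x in X[keep]])
    eps = (Y[keep] - m2) / sigma
    centered = eps - eps.mean()
    return centered / np.sqrt(np.mean(centered**2))

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0.0, 1.0, 300))
Y = np.sin(2.0 * X) + 0.1 * rng.standard_normal(300)
eps_tilde = standardized_residuals(X, Y, 0.1, sigma=0.1)
```

By construction the ε̃_i have sample mean 0 and sample variance 1, which is what the permutation counterpart T_n(R) requires.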

Since the distribution of R is known, the conditional distribution of T_n(R) given (Y_i, X_i), i = 1, …, n, is also known. The corresponding 100(1−α)% quantile is then used as an approximation to the desired critical value c_n(α). The question that arises immediately is whether this approximation is reasonable. Next we formulate results that answer this question, at least asymptotically. A simulation study should be performed in order to find out how good the approximation is for finite sample sizes.

Theorem 4.1. Let assumptions (A.0), (A.2)–(A.5) and (A.6.1) be satisfied, let m(·) satisfy either (A.2) or (3.16) and let the estimators σ̂(·), m̂(·) and f̂(·) satisfy (4.2)–(4.4). Then for all x ∈ R¹, as n → ∞,

P( a(h_n) T_n(R) / √(2 ∫ k*_+²(u) du) ≤ x + b_1(h_n) | (Y_1, X_1), …, (Y_n, X_n) ) − exp{−2 exp{−x}} → 0 in probability,


where P(· | (Y_1, X_1), …, (Y_n, X_n)) denotes the conditional probability, and a(h) and b_1(h) are defined by (3.2) and (3.3), respectively.

Theorem 4.2. Let assumptions (A.0), (A.2)–(A.5) and (A.6.2) be satisfied, let m(·) satisfy either (A.2) or (3.16) and let the estimators σ̂(·), m̂(·) and f̂(·) satisfy (4.2)–(4.4). Then for all x ∈ R¹, as n → ∞,

P( a(h_n) T_n(R) / √(2 ∫ k*_+²(u) du) ≤ x + b_2(h_n) | (Y_1, X_1), …, (Y_n, X_n) ) − exp{−2 exp{−x}} → 0 in probability,

where a(h) and b_2(h) are defined by (3.2) and (3.6), respectively.

It is of great importance that these assertions hold true both when the observations follow H_0 and under the alternatives considered in Theorem 3.4. Therefore, the permutation principle, or equivalently resampling without replacement, provides approximations to the critical values. We reject the hypothesis at level α if T_n > c_n(α, (Y, X)), where c_n(α, (Y, X)) is the 100(1−α)% conditional quantile of T_n(R) given (Y, X). Going carefully through the proofs, one realizes that the above assertions remain true even for the bootstrap with replacement, i.e., resampling the standardized estimated residuals ε̃_1, …, ε̃_n with replacement. It can be shown that under H_0, as n → ∞, T_n − T_n(R) = o_P((log n)^{−1}).
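Putting the pieces together, the permutation approximation to c_n(α) can be sketched as follows: permute the standardized residuals, rebuild the one-sided fits as in (4.10), evaluate (4.9) on a grid, and take an empirical quantile of the resulting statistics. The residuals, design and f̂ ≡ 1 below are simulated placeholders; in practice ε̃ comes from (4.8) and f̂ satisfies (4.11).

```python
import numpy as np

def k_plus(z):
    return np.where((z >= -1.0) & (z <= 0.0), 1.0 - z**2, 0.0)

def k_minus(z):
    return k_plus(-z)

def weights(x, X, h, kernel):        # local linear weights (2.2)/(2.5)
    k = kernel((x - X) / h)
    d = x - X
    return k * (np.sum(k * d**2) - d * np.sum(k * d))

def T_R(eps, X, h, grid, f_hat):
    """Permutation statistic (4.9) for one permuted residual vector eps."""
    vals = []
    for x in grid:
        wp, wm = weights(x, X, h, k_plus), weights(x, X, h, k_minus)
        diff = np.sum(wp * eps) / np.sum(wp) - np.sum(wm * eps) / np.sum(wm)
        vals.append(np.sqrt(len(X) * h * f_hat(x) / 2.0) * abs(diff))
    return max(vals)

def permutation_cv(eps_tilde, X, h, grid, f_hat, alpha, n_perm, rng):
    stats = [T_R(rng.permutation(eps_tilde), X, h, grid, f_hat)
             for _ in range(n_perm)]
    return float(np.quantile(stats, 1.0 - alpha))

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0.0, 1.0, 200))
e = rng.standard_normal(200)
eps_tilde = (e - e.mean()) / e.std()          # stand-in for (4.8)
grid = np.linspace(0.1, 0.9, 41)
cv = permutation_cv(eps_tilde, X, 0.1, grid, lambda x: 1.0, 0.05, 50, rng)
```

With a few hundred permutations, cv approximates the conditional 95% quantile of T_n(R); H_0 is rejected when the observed T_n exceeds it.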

5. Illustrative example

To demonstrate the usefulness of our test procedure and to show its "sensitivity", we prepared several illustrative examples; taking into account the limited space we have for our contribution, only a small part of them is presented here. As the basic model (of course, not the only one) we used a sine curve with one jump, i.e.,

Y_i = a + b sin(x_i/c) + σ(x_i) e_i,  i = 1, …, m,
Y_i = a + δ + b sin(x_i/c) + σ(x_i) e_i,  i = m + 1, …, n.   (5.1)
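A data generator for model (5.1) can be sketched as follows; all parameter values are illustrative placeholders rather than the ones behind the figures, and σ(x) is taken linear in x as described in the text.

```python
import numpy as np

def simulate_51(n, a, b, c, delta, x0, rng):
    """Draw (x_i, Y_i) from model (5.1) with a jump of size delta at x0."""
    x = np.sort(rng.uniform(0.0, 1.0, n))
    m = a + b * np.sin(x / c) + delta * (x >= x0)
    sigma = 0.05 + 0.3 * x               # linear sigma(x), an illustrative choice
    y = m + sigma * rng.standard_normal(n)
    return x, y

# Same seed twice: identical design and noise, so the jump is the only difference.
x0_, y0 = simulate_51(500, a=2.0, b=1.0, c=0.2, delta=0.0, x0=0.5,
                      rng=np.random.default_rng(3))
x1_, y1 = simulate_51(500, a=2.0, b=1.0, c=0.2, delta=1.0, x0=0.5,
                      rng=np.random.default_rng(3))
```

Here y1 − y0 equals exactly 1·I{x ≥ 0.5}, which is precisely the signal the statistic T_n must pick out of the noise.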

The values x_i, x_1 < x_2 < ⋯ < x_n, were simulated from the uniform distribution R(0, 1). Aside from model (5.1), several other forms of m(·) were considered. As for the σ(x_i)'s, we used a linear model for them as well; more precisely, our σ(x_i)'s follow a straight line. Typical data are shown in Fig. 1. Concerning the "free" parameters, we varied the number of observations, the location of the change point and, especially, the size of the change (the size of the jump δ); changing the value of δ shows how "small" a change we are able to detect with our procedure. Aside from the points mentioned above, we were also interested in whether resampling (for details see Section 4) is worth using despite being quite time consuming in the given setup. In Fig. 1 the data are shown together with the "left-hand side" estimator m̂_-(x); analogously, Fig. 2 shows the data with the "right-hand side" estimator m̂_+(x). Fig. 3 presents the values of the function

t_n(x) = √(n h_n / v̂(x)) · |m̂_+(x) − m̂_-(x)|,   (5.2)


Fig. 1. Data together with the left-sided estimator.

Fig. 2. Data together with the right-sided estimator.

on which the test statistic T_n is based. With a slight abuse of terminology, we say throughout the paper that the values of the test statistic T_n are shown. Finally, Fig. 4 presents the smoothed density of X and the smoothed standard deviations of the error term. Concerning the choice of h_n, cross-validation was used.


Fig. 3. Values of the test statistic.

Fig. 4. Values of the smoothed density of X and the smoothed standard deviations of the error term.

6. Proofs

We concentrate on the proof of Theorem 3.3, because it contains the fundamental results and all the others are its consequences or modifications. The proof is quite technical; the main steps correspond to the lemmas, with Lemma 6.2 containing the key assertion. Notice that the index n is omitted whenever possible. Throughout the proofs the following


notation is used:

D_{n-}^σ(h) = ⋃_{j=1}^p [a_j − h, a_j],  D_{n+}^σ(h) = ⋃_{j=1}^p [a_j, a_j + h],
D_{n-}^f(h) = ⋃_{j=1}^q [c_j − h, c_j],  D_{n+}^f(h) = ⋃_{j=1}^q [c_j, c_j + h],   (6.1)

D_n^σ(h) = D_{n+}^σ(h) ∪ D_{n-}^σ(h)   (6.2)

and

D_n^f(h) = D_{n+}^f(h) ∪ D_{n-}^f(h).   (6.3)

Notice that D_{n-}^σ(h) and D_{n+}^σ(h) are the h-left and h-right neighborhoods of the discontinuity points of σ²(·), respectively, and similarly for D_{n-}^f(h) and D_{n+}^f(h).

r = 0, 1, 2, . . . , z ∈ R 1 .

(6.4)

We notice that wi+ (x)h−2 =



(g+ ((x − Xi )/ h; 0) g+ ((x − Xj )/ h; 2)

j

− g+ ((x − Xi )/ h; 1) g+ ((x − Xj )/ h; 1)),

x ∈ h, 1 − h, i = 1, . . . , n,

(6.5)

± (·) can be expressed through the and hence the estimators m functions g± ((x − Xi )/ h; r), i = 1, . . . , n; r = 0, 1, 2. n Therefore, it is useful to study various limit properties of i=1 i g± ((x − Xi )/ h; r)(Xi ), x ∈ h, 1 − h, r = n 0, 1, 2, . . . , and of i=1 g± ((x − Xi )/ h; r), x ∈ h, 1 − h, r = 0, 1, 2, . . . . Their main limit properties are stated in Lemmas 6.1 and 6.2. Lemma 6.1. Under assumptions (A.0), (A.2), (A.4), (A.5) and (A.6.1) or (A.6.2), as n → ∞,       sup  (g± ((x − Xi )/ h; r) − E g± ((x − Xi )/ h; r)) = OP ((nh log n)1/2 ),   x∈h,1−h

(6.6)

i

f

       E g+ ((x − Xi )/ h; r) − nhf (x) g+ (z) dz = O(nh2 ),   

f

       E g− ((x − Xi )/ h; r) − nhf (x) g− (z) dz = O(nh2 ),   

sup x∈h,1−h−Dn− (h)

sup x∈h,1−h−Dn+ (h)

(6.7)

i

(6.8)

i

  f+ (cj )(x−cj )/ h   sup E g+ ((x − Xi )/ h; r) − nh f+ (cj ) g+ (t; r) dt  −1 x∈cj −h,cj   i   0   +f− (cj ) g+ (t; r) dt  = O(nh2 ), j = 1, . . . , q  (x−cj )/ h

(6.9)

J. Antoch et al. / Journal of Statistical Planning and Inference 137 (2007) 753 – 777

and

  (x−cj )/ h   sup E g− ((x − Xi )/ h; r) − nh f+ (cj ) g− (t; r) dt  0 x∈cj ,cj +h  i   1   +f− (cj ) g− (t; r) dt  = O(nh2 ), j = 1, . . . , q, r = 0, 1, 2, . . . .  (x−cj )/ h

767

(6.10)

Proof. We show the assertions for g_+(\cdot) only. Since k_+(\cdot) can be expressed as a difference of two nonnegative nondecreasing functions and since (x - X_i)^s I\{(x - X_i)/h \in \langle -1, 0\rangle\}, s = 0, 1, 2, \ldots, i = 1, \ldots, n, are monotone functions in x, we can assume without loss of generality that the kernel k_+(\cdot) is nonnegative and nondecreasing. This immediately implies that for any grid points -1 = t_0 < t_1 < \cdots < t_L = 0 we have

  \sup_{x \in \langle h, 1-h\rangle} \Big| \sum_i (g_+((x-X_i)/h; r) - E\, g_+((x-X_i)/h; r)) \Big|
  \le \max_{1 \le \ell \le L} \Big| \sum_i (g_+((t_\ell - X_i)/h; r) - E\, g_+((t_\ell - X_i)/h; r)) \Big|
  + \max_{1 \le \ell \le L} \sum_i \bigl| E \bigl(g_+((t_\ell - X_i)/h; r) - g_+((t_{\ell-1} - X_i)/h; r)\bigr) \bigr|.   (6.11)

Standard calculations give

  \Big| \sum_i E \bigl(g_+((t_\ell - X_i)/h; r) - g_+((t_{\ell-1} - X_i)/h; r)\bigr) \Big| \le D_1 |t_\ell - t_{\ell-1}|\, hn,  \ell = 1, \ldots, L,   (6.12)

with some D_1 > 0. We take t_\ell - t_{\ell-1} = 1/L. Since \sum_i [g_+((x-X_i)/h; r) - E\, g_+((x-X_i)/h; r)] is a sum of bounded i.i.d. variables, the Bernstein inequality yields

  P\Bigl( \Big| \sum_i [g_+((t_\ell - X_i)/h; r) - E\, g_+((t_\ell - X_i)/h; r)] \Big| \ge A (hn \log n)^{1/2} \Bigr)
  \le 2 \exp\Bigl\{ - \frac{A^2 nh \log n}{2/3 \cdot nhC^2 + CA \sqrt{nh \log n}} \Bigr\}
  = 2 \exp\Bigl\{ - \frac{A^2 \log n}{2/3 \cdot C^2 + CA \sqrt{\log n/(nh)}} \Bigr\}

for any t_\ell \in \langle h, 1-h\rangle and for any A > 0, where we also used \mathrm{var}\, \sum_i g_+((t_\ell - X_i)/h; r) \le nhC^2, C = \max_x |k_+(x)|. It immediately implies that for any B > 0 there exists A > 0 such that for r = 0, 1, \ldots

  P\Bigl( \Big| \sum_i [g_+((t_\ell - X_i)/h; r) - E\, g_+((t_\ell - X_i)/h; r)] \Big| \ge A (nh \log n)^{1/2} \Bigr) < 2 n^{-B}.

Since B can be chosen arbitrarily large, the distance L^{-1} between the equidistant grid points t_\ell can be chosen small (of order n^{-\kappa} for some \kappa > 0), and the proof of (6.6) can be easily finished. The assertions (6.7) and (6.8) are immediate. Concerning (6.9), the points c_j are points of discontinuity of f; therefore, the corresponding terms have to be treated more carefully. We have, for x \in (c_j - h, c_j),

  E\, g_+((x-X_i)/h; r) = \int_0^1 g_+\Bigl(\frac{x-y}{h}; r\Bigr) f(y)\, dy = h \int g_+(t; r)\, f(x - ht)\, dt
  = h \Bigl( f_+(c_j) \int_{-1}^{(x-c_j)/h} g_+(t; r)\, dt + f_-(c_j) \int_{(x-c_j)/h}^{0} g_+(t; r)\, dt \Bigr) + O(h^2).

The assertion (6.10) can be shown in the same way and is therefore omitted. □
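The uniform fluctuation bound (6.6) can be illustrated numerically. In the sketch below, the sample size, bandwidth, and a one-sided Epanechnikov-type kernel supported on \langle -1, 0\rangle are illustrative assumptions, not choices taken from the paper:

```python
import numpy as np

# Monte Carlo illustration of (6.6): for X_i i.i.d. uniform on (0, 1) and a
# bounded kernel, sup_x |sum_i g_+((x - X_i)/h; 0) - E g_+((x - X_i)/h; 0)|
# is of the order (nh log n)^{1/2}.
rng = np.random.default_rng(0)
n, h = 50_000, 0.02
X = rng.uniform(0, 1, n)

def k_plus(z):  # one-sided kernel on <-1, 0> integrating to 1 (an assumption)
    return 1.5 * (1.0 - z**2) * ((z >= -1.0) & (z <= 0.0))

grid = np.linspace(h, 1 - h, 400)
# For uniform f, E sum_i g_+((x - X_i)/h; 0) = n h at interior points x.
sup_dev = max(abs(np.sum(k_plus((x - X) / h)) - n * h) for x in grid)
rate = np.sqrt(n * h * np.log(n))
ratio = sup_dev / rate
print(ratio)  # should be of order one
```

The ratio stabilizes near a constant as n grows, which is what the grid-plus-Bernstein argument above formalizes.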


Corollary 6.1. Under the assumptions of Lemma 6.1 we have, as n \to \infty,

  \sup_{x \in \langle h, 1-h\rangle} \Big| \sum_i w_{i,\pm}(x) - n^2 h^4\, \widetilde w_\pm(x) \Big|
  = O_P\Bigl( n^2 h^4 \sqrt{\frac{\log n}{nh}} + n^2 h^5 \Bigr),   (6.13)

where w_{i,\pm}(x) is defined by (2.2) and

  \widetilde w_+(x) = f^2(x),  if x \in \langle h, 1-h\rangle - D^{f}_{n-}(h),
  = \int_{-1}^{0}\!\int_{-1}^{0} k_+(z_1) k_+(z_2)(z_1^2 - z_1 z_2)
    \bigl( f_+^2(c_j)\, I\{\max(z_1, z_2) \le (x - c_j)/h\}
    + f_-^2(c_j)\, I\{\min(z_1, z_2) > (x - c_j)/h\}
    + f_-(c_j) f_+(c_j)\, I\{\min(z_1, z_2) \le (x - c_j)/h < \max(z_1, z_2)\} \bigr)\, dz_1\, dz_2,
  if x \in \langle c_j - h, c_j\rangle,  j = 1, \ldots, q,   (6.14)

and

  \widetilde w_-(x) = f^2(x),  if x \in \langle h, 1-h\rangle - D^{f}_{n+}(h),
  = \int_{0}^{1}\!\int_{0}^{1} k_-(z_1) k_-(z_2)(z_1^2 - z_1 z_2)
    \bigl( f_+^2(c_j)\, I\{\max(z_1, z_2) \le (x - c_j)/h\}
    + f_-^2(c_j)\, I\{\min(z_1, z_2) > (x - c_j)/h\}
    + f_-(c_j) f_+(c_j)\, I\{\min(z_1, z_2) \le (x - c_j)/h < \max(z_1, z_2)\} \bigr)\, dz_1\, dz_2,
  if x \in \langle c_j, c_j + h\rangle,  j = 1, \ldots, q.   (6.15)

Proof. By Lemma 6.1 and (6.5) we get for \sum_i w_{i+}(x), uniformly in x \in \langle h, 1-h\rangle,

  \frac{1}{h^2} \sum_i w_{i+}(x)
  = Q_+(x, h; 0)\, Q_+(x, h; 2) + M_+(x, h; 0)\, Q_+(x, h; 2) + Q_+(x, h; 0)\, M_+(x, h; 2) + M_+(x, h; 0)\, M_+(x, h; 2)
    - \{Q_+^2(x, h; 1) + 2 M_+(x, h; 1)\, Q_+(x, h; 1) + M_+^2(x, h; 1)\}
  = O_P\bigl(nh \log n + (nh)^{3/2} \sqrt{\log n}\bigr)
    + n(n-1) \int\!\!\int k_+\Bigl(\frac{x - x_1}{h}\Bigr) k_+\Bigl(\frac{x - x_2}{h}\Bigr)
      \Bigl( \Bigl(\frac{x - x_2}{h}\Bigr)^2 - \frac{x - x_1}{h} \cdot \frac{x - x_2}{h} \Bigr) f(x_1) f(x_2)\, dx_1\, dx_2
  = n(n-1) h^2 \int k_+(z_1)\, k_+(z_2)\, (z_2^2 - z_1 z_2)\, f(x - z_1 h)\, f(x - z_2 h)\, dz_1\, dz_2
    + O_P\bigl(nh \log n + (nh)^{3/2} \sqrt{\log n}\bigr),

where

  Q_+(x, h; r) = \sum_i [g_+((x - X_i)/h; r) - E\, g_+((x - X_i)/h; r)],  x \in \langle h, 1-h\rangle,
  M_+(x, h; r) = \sum_i E\, g_+((x - X_i)/h; r),  x \in \langle h, 1-h\rangle,  r = 0, 1, \ldots.

Standard calculation gives

  \int k_+(z_1)\, k_+(z_2)\, (z_2^2 - z_1 z_2)\, f(x - z_1 h)\, f(x - z_2 h)\, dz_1\, dz_2 = \widetilde w_+(x) + O_P(h),

which together with the above considerations implies (6.13) for \sum_i w_{i,+}(x). The proof of (6.13) for \sum_i w_{i-}(x) follows analogously and, therefore, it is omitted. □
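Away from the discontinuities of f, the normalization in (6.13) can be checked by simulation. In the sketch below (uniform design and a hypothetical one-sided Epanechnikov kernel; none of these choices come from the paper), \sum_i w_{i+}(x)/(n^2 h^4) is compared with f^2(x)(\mu_0\mu_2 - \mu_1^2), where \mu_r = \int z^r k_+(z)\,dz is the constant produced by the double integral in the proof:

```python
import numpy as np

# Check that sum_i w_{i+}(x) ~ n^2 h^4 f^2(x)(mu0*mu2 - mu1^2) at a point
# where f is smooth, with w_{i+} built from (6.5):
#   w_{i+}(x) h^{-2} = sum_j (g(z_i;0) g(z_j;2) - g(z_i;1) g(z_j;1)),
#   z_i = (x - X_i)/h,  g(z; r) = k_+(z) z^r.
rng = np.random.default_rng(3)
n, h, x = 20_000, 0.05, 0.5
X = rng.uniform(0, 1, n)           # f = 1 on (0, 1)

def k_plus(z):
    return 1.5 * (1.0 - z**2) * ((z >= -1.0) & (z <= 0.0))

z = (x - X) / h
S0, S1, S2 = (np.sum(k_plus(z) * z**r) for r in range(3))
sum_w = h**2 * (S0 * S2 - S1**2)   # = sum_i w_{i+}(x)

mu0, mu1, mu2 = 1.0, -0.375, 0.2   # moments of this k_+ on <-1, 0>
limit = mu0 * mu2 - mu1**2         # = 0.059375 for f(x) = 1
print(sum_w / (n**2 * h**4), limit)
```

The relative error is of the stated order \sqrt{\log n/(nh)} + h, so the two printed values agree to roughly ten percent.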

Corollary 6.2. Let the assumptions of Lemma 6.1 be satisfied. Then

  \max_{x \in \langle h, 1-h\rangle} \sqrt{nh}\, \Big| \frac{\sum_i w_{i,\pm}(x)\, m(X_i)}{\sum_i w_{i\pm}(x)} - m_\pm(x) \Big|
  = O_P\Bigl( \sqrt{nh^5} + \frac{\log n}{\sqrt{nh}} \Bigr).   (6.16)

Proof. By the assumptions on the regression function m(\cdot) we have

  \max_{x \in \langle h, 1-h\rangle} \Big| \sum_i w_{i+}(x)\,(m(X_i) - m(x)) - \sum_i w_{i+}(x)\,(X_i - x)\, m'_+(x) \Big|
  = O_P\Bigl( \max_{x \in \langle h, 1-h\rangle} \sum_i |w_{i+}(x)|\, (X_i - x)^2 \Bigr).   (6.17)

Since (6.6) in Lemma 6.1 remains true even if g_\pm(\cdot; r) is replaced by |g_\pm(\cdot; r)|, we get after some standard calculation

  \sum_i |w_{i+}(x)| \Bigl( \frac{X_i - x}{h} \Bigr)^2
  \le h^2 \sum_i \sum_j |g_+((x-X_i)/h; 2)|\, |g_+((x-X_j)/h; 2)|
     + h^2 \sum_i \sum_j |g_+((x-X_i)/h; 3)|\, |g_+((x-X_j)/h; 1)|
  \le h^2 \sum_i \sum_j E\bigl( |g_+((x-X_i)/h; 2)| \cdot |g_+((x-X_j)/h; 2)|
     + |g_+((x-X_i)/h; 3)| \cdot |g_+((x-X_j)/h; 1)| \bigr) + O_P(nh^3 \log n + n^{3/2} h^{7/2})
  = n^2 h^4 \int |k_+(z_1)| \cdot |k_+(z_2)|\, (z_2^2 + |z_1 z_2|)\, z_1^2\, f(x - z_1 h)\, f(x - z_2 h)\, dz_1\, dz_2
     + O_P(n^{3/2} h^{7/2} + nh^3 \log n)
  = O_P(n^2 h^4 + nh^3 \log n).   (6.18)

It can be easily checked that

  \frac{1}{h^2} \sum_i \frac{X_i - x}{h}\, w_{i+}(x)
  = - \sum_i g_+((x-X_i)/h; 1) \sum_j g_+((x-X_j)/h; 2) + \sum_i g_+((x-X_i)/h; 2) \sum_j g_+((x-X_j)/h; 1) = 0.   (6.19)

The assertion immediately follows from (6.5), (6.13) and (6.17)–(6.19). □
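The identity (6.19), i.e. \sum_i (X_i - x)\, w_{i+}(x) = 0, holds exactly for local linear weights, not only asymptotically. A quick numerical confirmation (kernel and design are again illustrative assumptions):

```python
import numpy as np

# Exact check of (6.19): with z_i = (x - X_i)/h and
#   w_{i+}(x) = h^2 (g(z_i;0) * S2 - g(z_i;1) * S1),  S_r = sum_j g(z_j;r),
# the cross terms cancel and sum_i (X_i - x) w_{i+}(x) = 0 up to rounding.
rng = np.random.default_rng(4)
n, h, x = 500, 0.1, 0.4
X = rng.uniform(0, 1, n)

def k_plus(z):
    return 1.5 * (1.0 - z**2) * ((z >= -1.0) & (z <= 0.0))

z = (x - X) / h
g0, g1, g2 = k_plus(z), k_plus(z) * z, k_plus(z) * z**2
w = h**2 * (g0 * np.sum(g2) - g1 * np.sum(g1))
residual = np.sum((X - x) * w)
print(residual)  # zero up to floating-point error
```

This cancellation is what removes the first-order bias term in (6.17) and drives the rate in (6.16).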

The next assertion provides a crucial tool in the proofs of our theorems. It is a modification of the results of Johnstone (1982) or Liero (1982), among others. Our assumptions differ from those considered by Johnstone (1982) and Liero (1982); moreover, another approach is used. It can still be easily generalized: e.g., the design need not be random, and certain dependence among \varepsilon_1, \ldots, \varepsilon_n can also be allowed.


Lemma 6.2. Under assumptions (A.1)–(A.5) and (A.6.1) or (A.6.2) there exists a sequence of Wiener processes \{W_n(t);\ t \in \langle 0, 1\rangle\}, n = 1, 2, \ldots, such that for r = 0, 1, \ldots, as n \to \infty,

  \sup_{x \in \langle h, 1-h\rangle - (D^{\sigma}_{n-}(h) \cup D^{f}_{n-}(h))}
  \Big| \frac{1}{\sqrt{nh}\, f(x)} \sum_{i=1}^{n} \varepsilon_i\, g_+((x - X_i)/h; r)\, \sigma(X_i)
  - \frac{\sigma(x)}{\sqrt{h f(x)}} \int g_+((x - F^{-1}(u))/h; r)\, \frac{1}{\sqrt{f(F^{-1}(u))}}\, dW_n(u) \Big|
  = o_P((\log n)^{-1}),   (6.20)

  \sup_{x \in \langle h, 1-h\rangle - (D^{\sigma}_{n+}(h) \cup D^{f}_{n+}(h))}
  \Big| \frac{1}{\sqrt{nh}\, f(x)} \sum_{i=1}^{n} \varepsilon_i\, g_-((x - X_i)/h; r)\, \sigma(X_i)
  - \frac{\sigma(x)}{\sqrt{h f(x)}} \int g_-((x - F^{-1}(u))/h; r)\, \frac{1}{\sqrt{f(F^{-1}(u))}}\, dW_n(u) \Big|
  = o_P((\log n)^{-1}),   (6.21)

  \sup_{x \in \langle x_0 - h, x_0 + h\rangle}
  \Big| \frac{1}{\sqrt{nh}} \sum_{i=1}^{n} \varepsilon_i\, g_\pm((x - X_i)/h; r)\, \frac{\widetilde\sigma_\pm(X_i)}{\widetilde f_\pm(x)} \Big| = O_P(1)   (6.22)

for any fixed x_0 \in \langle h, 1-h\rangle, where

  \widetilde f_+(x) = f_+(x),  if x \in D^{c,f}_{n-}(h),
  = \Bigl( f_+(c_j) \int_{-1}^{(x-c_j)/h} k_+(t)\, dt + f_-(c_j) \int_{(x-c_j)/h}^{0} k_+(t)\, dt \Bigr)
    \Bigl( \int_{-1}^{0} k_+(t)\, dt \Bigr)^{-1},  if x \in \langle c_j - h, c_j\rangle,  j = 1, \ldots, q,   (6.23)

  \widetilde f_-(x) = f_-(x),  if x \in D^{c,f}_{n+}(h),
  = \Bigl( f_+(c_j) \int_{0}^{(x-c_j)/h} k_-(t)\, dt + f_-(c_j) \int_{(x-c_j)/h}^{1} k_-(t)\, dt \Bigr)
    \Bigl( \int_{0}^{1} k_-(t)\, dt \Bigr)^{-1},  if x \in \langle c_j, c_j + h\rangle,  j = 1, \ldots, q,   (6.24)

  \widetilde\sigma_+(x) = \sigma_+(x),  if x \in D^{c,\sigma}_{n-}(h),
  = \Bigl( \sigma_+(a_j) \int_{-1}^{(x-a_j)/h} k_+(t)\, dt + \sigma_-(a_j) \int_{(x-a_j)/h}^{0} k_+(t)\, dt \Bigr)
    \Bigl( \int_{-1}^{0} k_+(t)\, dt \Bigr)^{-1},  if x \in \langle a_j - h, a_j\rangle,  j = 1, \ldots, p,   (6.25)

and

  \widetilde\sigma_-(x) = \sigma_-(x),  if x \in D^{c,\sigma}_{n+}(h),
  = \Bigl( \sigma_+(a_j) \int_{0}^{(x-a_j)/h} k_-(t)\, dt + \sigma_-(a_j) \int_{(x-a_j)/h}^{1} k_-(t)\, dt \Bigr)
    \Bigl( \int_{0}^{1} k_-(t)\, dt \Bigr)^{-1},  if x \in \langle a_j, a_j + h\rangle,  j = 1, \ldots, p.   (6.26)

Proof. It is a consequence of the theorems in Hušková (2006). □

Lemma 6.3. Let the assumptions of Theorem 3.1 or 3.2 be satisfied. Then

  \sup_{x \in \langle h, 1-h\rangle} \frac{\sqrt{nh}}{n^2 h^4}
  \Big| \sum_i (w_{i\pm}(x) - \widetilde w_{i\pm}(x))\, \varepsilon_i\, \sigma(X_i) \Big|
  = O_P\Bigl( \frac{\log n}{\sqrt{nh}} \Bigr),

where

  \widetilde w_{i+}(x) = nh^3\, k_+^*\Bigl( \frac{x - X_i}{h} \Bigr) f(x),  if x \in D^{c,f}_{n-}(h),
  = nh^3\, k_+\Bigl( \frac{x - X_i}{h} \Bigr) \int_{-1}^{0} k_+(z) \Bigl( z^2 - z\, \frac{x - X_i}{h} \Bigr)
    \bigl( f_+(c_j)\, I\{-1 \le z \le (x - c_j)/h\} + f_-(c_j)\, I\{(x - c_j)/h < z \le 0\} \bigr)\, dz,
  if x \in \langle c_j - h, c_j\rangle,  j = 1, \ldots, q,

and

  \widetilde w_{i-}(x) = nh^3\, k_-^*\Bigl( \frac{x - X_i}{h} \Bigr) f(x),  if x \in D^{c,f}_{n+}(h),
  = nh^3\, k_-\Bigl( \frac{x - X_i}{h} \Bigr) \int_{0}^{1} k_-(z) \Bigl( z^2 - z\, \frac{x - X_i}{h} \Bigr)
    \bigl( f_+(c_j)\, I\{0 \le z \le (x - c_j)/h\} + f_-(c_j)\, I\{(x - c_j)/h < z \le 1\} \bigr)\, dz,
  if x \in (c_j, c_j + h),  j = 1, \ldots, q.

Proof. Applying Lemma 6.1 and (6.5) we receive, uniformly for x \in \langle h, 1-h\rangle,

  \sum_i \bigl( w_{i\pm}(x) - E(w_{i\pm}(x) \mid X_i) \bigr)\, \varepsilon_i\, \sigma(X_i)
  = h^2 \Bigl( \sum_i g_\pm((x-X_i)/h; 0)\, \varepsilon_i\, \sigma(X_i) \Bigr)
        \Bigl( \sum_j (g_\pm((x-X_j)/h; 2) - E\, g_\pm((x-X_j)/h; 2)) \Bigr)
  - h^2 \Bigl( \sum_i g_\pm((x-X_i)/h; 1)\, \varepsilon_i\, \sigma(X_i) \Bigr)
        \Bigl( \sum_j (g_\pm((x-X_j)/h; 1) - E\, g_\pm((x-X_j)/h; 1)) \Bigr)
  = h^2 \Bigl( \sum_i g_\pm((x-X_i)/h; 0)\, \varepsilon_i\, \sigma(X_i) \Bigr) O_P(\sqrt{nh \log n})
  - h^2 \Bigl( \sum_i g_\pm((x-X_i)/h; 1)\, \varepsilon_i\, \sigma(X_i) \Bigr) O_P(\sqrt{nh \log n})
  = O_P(nh^3 \log n).

Moreover,

  E(w_{i\pm}(x) \mid X_i) = (n-1) h^3\, k_\pm\Bigl( \frac{x - X_i}{h} \Bigr)
    \int k_\pm(z) \Bigl( z^2 - z\, \frac{x - X_i}{h} \Bigr) f(x - zh)\, dz + O(nh^4)

uniformly in x \in \langle h, 1-h\rangle. Treating separately x \in (c_j - h, c_j), (c_j, c_j + h), j = 1, \ldots, q, and x \in \langle h, 1-h\rangle - D^{f}_{n\pm}(h), we obtain the assertions after some standard steps. □

Next, the following lemma implies (3.12)–(3.15).


Lemma 6.4. Let the assumptions of Theorem 3.1 or 3.2 be satisfied. Then

  \sup_{x \in \langle h, 1-h\rangle} |\widehat\sigma_\pm(x) - \widetilde\sigma_\pm(x)|
  = O_P\Bigl( \frac{\log n}{\sqrt{nh}} + (nh^{(2+\Delta)/\Delta})^{-\Delta/(2+\Delta)} \Bigr),   (6.27)

and

  \sup_{x \in \langle h, 1-h\rangle} |\widehat f_\pm(x) - \widetilde f_\pm(x)|
  = O_P\Bigl( \sqrt{\frac{\log n}{nh}} + h \Bigr),   (6.28)

where \widehat\sigma_\pm(x), \widetilde\sigma_\pm(x), \widehat f_\pm(\cdot) and \widetilde f_\pm(\cdot) are defined by (3.11), (6.25)–(6.26), (3.10) and (6.23)–(6.24), respectively.

Proof. We have for x \in \langle h, 1-h\rangle

  \widehat\sigma^2_\pm(x) \cdot \frac{1}{nh} \sum_{i=1}^{n} k_\pm\Bigl( \frac{x - X_i}{h} \Bigr)
  = \frac{1}{2nh} \sum_{i=2}^{n} (Y_{(i)} - Y_{(i-1)})^2\, k_\pm\Bigl( \frac{x - X_{(i)}}{h} \Bigr)
  \stackrel{d}{=} \frac{1}{2nh} \sum_{i=2}^{n} \bigl( \varepsilon_i \sigma(X_{(i)}) - \varepsilon_{i-1} \sigma(X_{(i-1)})
    - (m(X_{(i)}) - m(X_{(i-1)})) \bigr)^2 k_\pm\Bigl( \frac{x - X_{(i)}}{h} \Bigr)
  = A_{n\pm}(x) + B_{n\pm}(x) - 2 C_{n\pm}(x),  say,

where

  A_{n\pm}(x) = \frac{1}{2nh} \sum_{i=2}^{n} (\varepsilon_i \sigma(X_{(i)}) - \varepsilon_{i-1} \sigma(X_{(i-1)}))^2\, k_\pm\Bigl( \frac{x - X_{(i)}}{h} \Bigr),

  B_{n\pm}(x) = \frac{1}{2nh} \sum_{i=2}^{n} (m(X_{(i)}) - m(X_{(i-1)}))^2\, k_\pm\Bigl( \frac{x - X_{(i)}}{h} \Bigr),

  C_{n\pm}(x) = \frac{1}{2nh} \sum_{i=2}^{n} (\varepsilon_i \sigma(X_{(i)}) - \varepsilon_{i-1} \sigma(X_{(i-1)}))\,(m(X_{(i)}) - m(X_{(i-1)}))\, k_\pm\Bigl( \frac{x - X_{(i)}}{h} \Bigr).

Concerning B_{n\pm}(x), by assumption (A.4) we have, uniformly for x \in \langle h, 1-h\rangle,

  B_{n\pm}(x) = O_P\Bigl( \frac{1}{2nh} \sum_{i=2}^{n} (X_{(i)} - X_{(i-1)})^2\, k_\pm\Bigl( \frac{x - X_{(i)}}{h} \Bigr)
    \Bigl( \sum_{j=1}^{q+1} I\{c_{j-1} \le X_{(i-1)} \le X_{(i)} \le c_j\} + \sum_{j=1}^{q} I\{X_{(i-1)} < c_j < X_{(i)}\} \Bigr) \Bigr)
  = O_P\Bigl( \frac{\log^2 n}{n^2 h} \cdot \frac{1}{n} \sum_{i=2}^{n} k_\pm\Bigl( \frac{x - X_{(i)}}{h} \Bigr)
      \sum_{j=1}^{q+1} I\{c_{j-1} \le X_{(i-1)} \le X_{(i)} \le c_j\} \Bigr) + O_P\Bigl( \frac{1}{nh} \Bigr)
  = O_P\Bigl( \frac{(\log n)^2}{n^2} + \frac{1}{nh} \Bigr) = O_P\Bigl( \frac{1}{nh} \Bigr),

where c_0 = 0, c_{q+1} = 1. Standard arguments yield that, uniformly in x \in \langle h, 1-h\rangle,

  A_{n\pm}(x) = \frac{1}{2nh} \sum_{i=2}^{n} (\varepsilon_i \sigma(X_{(i)}) - \varepsilon_{i-1} \sigma(X_{(i-1)}))^2\, k_\pm\Bigl( \frac{x - X_{(i)}}{h} \Bigr)
    \Bigl( \sum_{v=1}^{p+1} I\{a_{v-1} \le X_{(i-1)} \le X_{(i)} \le a_v\} + \sum_{v=1}^{p} I\{X_{(i-1)} < a_v < X_{(i)}\} \Bigr)
  = \frac{1}{2nh} \sum_{i=2}^{n} (\varepsilon_i - \varepsilon_{i-1})^2\, \sigma^2(X_{(i)})
    \Bigl( \sum_{v=1}^{p+1} I\{a_{v-1} \le X_{(i-1)} \le X_{(i)} \le a_v\} \Bigr) k_\pm\Bigl( \frac{x - X_{(i)}}{h} \Bigr)
    + O_P\Bigl( \frac{\log n}{n} \Bigr) + O_P\Bigl( \frac{1}{nh} \Bigr)

with a_0 = 0, a_{p+1} = 1. Denoting

  Q_i = \sum_{j=1}^{i} \bigl( (\varepsilon_j - \varepsilon_{j-1})^2 - E(\varepsilon_j - \varepsilon_{j-1})^2 \bigr),  i = 1, \ldots, n,

we notice that

  \max_{1 \le i \le n} n^{-2/(2+\Delta)} |Q_i| = O_P(1).

Since k_\pm has finite variation, applying the Abel summation we have

  \frac{1}{nh} \sum_{i=1}^{n} \bigl( (\varepsilon_i - \varepsilon_{i-1})^2 - 2 \bigr)\, \sigma^2(X_{(i)})\, k_\pm\Bigl( \frac{x - X_{(i)}}{h} \Bigr)
  = \frac{1}{nh} \sum_{i=1}^{n-1} Q_i \Bigl( k_\pm\Bigl( \frac{x - X_{(i)}}{h} \Bigr) \sigma^2(X_{(i)})
    - k_\pm\Bigl( \frac{x - X_{(i+1)}}{h} \Bigr) \sigma^2(X_{(i+1)}) \Bigr)
    + \frac{1}{nh} Q_n\, k_\pm\Bigl( \frac{x - X_{(n)}}{h} \Bigr) \sigma^2(X_{(n)})
  = O_P\Bigl( \frac{n^{2/(2+\Delta)}}{nh} \Bigr) = O_P\bigl( (nh^{(2+\Delta)/\Delta})^{-\Delta/(2+\Delta)} \bigr).

Hence

  \sup_{x \in \langle h, 1-h\rangle}
  \Big| A_{n\pm}(x) \Bigl( \frac{1}{nh} \sum_i k_\pm\Bigl( \frac{x - X_i}{h} \Bigr) \Bigr)^{-1}
  - \frac{\sum_i k_\pm((x - X_i)/h)\, \sigma^2(X_i)}{\sum_i k_\pm((x - X_i)/h)} \Big|
  = O_P\Bigl( \frac{1}{nh} + (nh^{(2+\Delta)/\Delta})^{-\Delta/(2+\Delta)} \Bigr),

which together with assumption (A.3) yields, after a few standard steps,

  \sup_{x \in \langle h, 1-h\rangle}
  \Big| A_{n\pm}(x) \Bigl( \frac{1}{nh} \sum_{i=1}^{n} k_\pm\Bigl( \frac{x - X_i}{h} \Bigr) \Bigr)^{-1}
  - \widetilde\sigma^2_\pm(x) \Big|
  = O_P\Bigl( \frac{1}{nh} + (nh^{(2+\Delta)/\Delta})^{-\Delta/(2+\Delta)} \Bigr).

Combining the results on A_{n\pm}(x) and B_{n\pm}(x) and applying the Hölder inequality, we receive

  \sup_{x \in \langle h, 1-h\rangle} |C_{n\pm}(x)| = O_P\bigl( \sqrt{A_{n\pm}(x)\, B_{n\pm}(x)} \bigr) = O_P((nh)^{-1/2}).

The assertion (6.27) is proved. The proof of (6.28) is quite analogous and hence is omitted. □
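The difference-based estimator \widehat\sigma^2_\pm analysed in this proof can be illustrated on simulated data. The model, kernel, and bandwidth below are illustrative assumptions; the point is that halved squared differences of neighbouring responses remove the smooth trend m (whose contribution is of the order of the maximal spacing, \log n/n) and estimate \sigma^2:

```python
import numpy as np

# Sketch of the difference-based variance estimator of Lemma 6.4:
#   sigma2_hat_+(x) = [sum (Y_(i)-Y_(i-1))^2 k_+((x-X_(i))/h) / 2]
#                     / [sum k_+((x-X_i)/h)].
rng = np.random.default_rng(1)
n, h, x, sigma = 5000, 0.1, 0.3, 0.5
X = np.sort(rng.uniform(0, 1, n))
Y = np.sin(2 * np.pi * X) + sigma * rng.standard_normal(n)

def k_plus(z):  # one-sided kernel supported on <-1, 0> (an assumption)
    return 1.5 * (1.0 - z**2) * ((z >= -1.0) & (z <= 0.0))

num = 0.5 * np.sum(np.diff(Y)**2 * k_plus((x - X[1:]) / h))
den = np.sum(k_plus((x - X) / h))
sigma2_hat = num / den
print(sigma2_hat)  # close to sigma^2 = 0.25
```

Since E(\varepsilon_i - \varepsilon_{i-1})^2 = 2, the halved squared differences have conditional mean \sigma^2(X_{(i)}) plus a negligible trend term, which is exactly the mechanism exploited in the treatment of A_{n\pm} and B_{n\pm} above.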

774

J. Antoch et al. / Journal of Statistical Planning and Inference 137 (2007) 753 – 777

Now, we finish the proof of Theorem 3.3. By Corollaries 6.1 and 6.2 and by Lemmas 6.2 and 6.3 we have

  \sqrt{nh}\, (\widehat m_+(x) - \widehat m_-(x))\, v^{-1/2}(x)
  = \frac{1}{\sqrt{nh}\, f(x)} \sum_{i=1}^{n} \varepsilon_i\, \sigma(X_i)
    \bigl( k_+^*((x - X_i)/h) - k_-^*((x - X_i)/h) \bigr)\, v^{-1/2}(x) + o_P((\log n)^{-1})
  = \widetilde V_n(x/h, k^*) \Bigl( 2 \int k_+^{*2}(u)\, du \Bigr)^{1/2} + o_P((\log n)^{-1})   (6.29)

uniformly for x \in \langle h, 1-h\rangle - (D^{\sigma}_n(h) \cup D^{f}_n(h)), where

  \widetilde V_n(x/h, k^*)
  = \Bigl( \int_{F(x)}^{F(x+h)} k_+^*((x - F^{-1}(u))/h)\, \frac{1}{\sqrt{f_+(F^{-1}(u))}}\, dW_n(u)
    - \int_{F(x-h)}^{F(x)} k_-^*((x - F^{-1}(u))/h)\, \frac{1}{\sqrt{f_-(F^{-1}(u))}}\, dW_n(u) \Bigr)
    \Bigl( 2h \int k_+^{*2}(u)\, du \Bigr)^{-1/2},
  x \in \langle h, 1-h\rangle,   (6.30)

where \{W_n(t);\ t \ge 0\} is the Wiener process from Lemma 6.2. The process \{\widetilde V_n(t, k^*);\ t \in \langle 1, h^{-1} - 1\rangle\} is a Gaussian process with zero mean and the covariance function

  \mathrm{cov}\{\widetilde V_n(t_1, k^*), \widetilde V_n(t_2, k^*)\}
  = \frac{1}{2h \int k_+^{*2}(u)\, du} \int \bigl( k_+^*(t_1 - F^{-1}(u)/h) - k_-^*(t_1 - F^{-1}(u)/h) \bigr)
    \bigl( k_+^*(t_2 - F^{-1}(u)/h) - k_-^*(t_2 - F^{-1}(u)/h) \bigr)\, \frac{1}{f(F^{-1}(u))}\, du
  = \frac{1}{2 \int k_+^{*2}(u)\, du} \int \bigl( k_+^*(t_1 - z) - k_-^*(t_1 - z) \bigr)
    \bigl( k_+^*(t_2 - z) - k_-^*(t_2 - z) \bigr)\, dz.

Therefore, the Gaussian process \{\widetilde V_n(t, k^*);\ t \in \langle 1, h^{-1} - 1\rangle\} has the same correlation function as the Gaussian process \{V(t, k^*);\ t \ge 1\} defined by

  V(t, k^*) = \frac{1}{\sqrt{2 \int k_+^{*2}(u)\, du}} \int \bigl( k_+^*(t - z) - k_-^*(t - z) \bigr)\, dW(z),   (6.31)

where \{W(t);\ t \ge 0\} is a Wiener process, and, therefore, the processes \{\widetilde V_n(t; k^*);\ t \ge 1\} and \{V(t; k^*);\ t \ge 1\} have the same distribution. This together with (6.29) and Lemma 6.4 implies (3.7). The assertion (3.8) follows from Corollaries 6.1 and 6.2, Lemmas 6.3 and 6.4 and (6.21), (6.22) and (3.7). □

Proof of Theorems 3.1 and 3.2. We start with an auxiliary lemma concerning the behavior of the process \{V(t, k^*);\ t \ge 1\} defined by (6.31).

Lemma 6.5. (i) Let the assumptions (A.5) and (A.6.1) be satisfied. Then for all x, as n \to \infty,

  P\Bigl( a(h) \sup_{t \in \langle 1, h^{-1} - 1\rangle} |V(t, k^*)| \le x + b_1(h) \Bigr) \to \exp\{-2 \exp\{-x\}\},   (6.32)

where a(h) and b_1(h) are defined by (3.2) and (3.3), respectively.


(ii) Let the assumptions (A.5) and (A.6.2) be satisfied. Then for all x, as n \to \infty,

  P\Bigl( a(h) \sup_{t \in \langle 1, h^{-1} - 1\rangle} |V(t, k^*)| \le x + b_2(h) \Bigr) \to \exp\{-2 \exp\{-x\}\},   (6.33)

where b_2(h) is defined by (3.6).

Proof of Lemma 6.5. It follows from Hušková (2006). □

Now, we can finish the proofs of Theorems 3.1 and 3.2. Notice that Lemma 6.5 implies

  \sup_{t \in \langle 1, h^{-1} - 1\rangle} |V(t, k^*)| = O_P(\sqrt{\log n}),

so that we can infer the assertions of both theorems from Theorem 3.3 and Lemma 6.5. □
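The bound \sup|V(t, k^*)| = O_P(\sqrt{\log n}) used above can be visualized by simulating a discretized version of the moving-average Gaussian process (6.31). The kernel below is a hypothetical odd function standing in for k_+^* - k_-^*, not the kernel of the paper:

```python
import numpy as np

# Discretized simulation of V(t, k*) from (6.31): a stationary Gaussian
# moving average of Wiener increments, normalized to unit variance.  Its
# maximum over an interval of length T grows like sqrt(2 log T), with T of
# the order 1/h.
rng = np.random.default_rng(5)
dz = 0.01
z = np.arange(-1.0, 1.0, dz)
kernel = np.sign(z) * (1.0 - np.abs(z))          # illustrative odd kernel

dW = rng.standard_normal(400_000) * np.sqrt(dz)  # Wiener increments on a dz-grid
V = np.convolve(dW, kernel, mode="valid")
V /= np.sqrt(np.sum(kernel**2) * dz)             # unit variance
T = len(V) * dz
sup_V = np.max(np.abs(V))
print(sup_V, np.sqrt(2 * np.log(T)))
```

The two printed numbers are of the same magnitude, which is the content of the extreme-value normalization in Lemma 6.5.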

Proof of Theorem 3.4. We decompose \widehat m_\pm into the "smooth" and the "non-smooth" part, i.e.,

  \widehat m_\pm(x) = \widehat m_{0\pm}(x) + \widehat m_{1\pm}(x),   (6.34)

  \widehat m_{0\pm}(x) = \Bigl( \sum_{i=1}^{n} w_{i\pm}(x) \Bigr)^{-1} \sum_{i=1}^{n} \bigl( \varepsilon_i \sigma(X_i) + m_0(X_i) \bigr)\, w_{i\pm}(x),   (6.35)

  \widehat m_{1\pm}(x) = \Bigl( \sum_{i=1}^{n} w_{i\pm}(x) \Bigr)^{-1} \sum_{i=1}^{n} I\{X_i \ge x_0\}\, w_{i\pm}(x).   (6.36)

Clearly, \widehat m_{0\pm}(\cdot) relates to the "smooth" part of the regression function m(\cdot), while \widehat m_{1\pm}(\cdot) reflects the jump at x_0. Moreover,

  \widehat m_+(x) - \widehat m_-(x) = \widehat m_{0,+}(x) - \widehat m_{0,-}(x),  x \notin \langle x_0 - h, x_0 + h\rangle.   (6.37)

Noticing that Lemma 6.4 holds true even under the assumptions of Theorem 3.4 and applying Theorem 3.3 to (\widetilde Y_j, X_j), j = 1, \ldots, n, with \widetilde Y_j = m_0(X_j) + \varepsilon_j \sigma(X_j), we realize that there exists a sequence of Wiener processes W_n = \{W_n(t);\ t \ge 0\} such that, as n \to \infty,

  \sqrt{nh_n}\, \frac{\widehat m_{0+}(x) - \widehat m_{0-}(x)}{\sqrt{\widehat v(x)}}
  = \int \bigl( k_+^*(x/h_n - z) - k_-^*(x/h_n - z) \bigr)\, dW_n(z) + o_P\Bigl( \frac{1}{\sqrt{\log n}} \Bigr)   (6.38)

uniformly for x \in \langle h_n, 1 - h_n\rangle - (D^{\sigma}_n(h_n) \cup D^{f}_n(h_n)) and

  \sup_{x;\, |x - x^0| \le h_n} \sqrt{nh_n}\, \frac{|\widehat m_{0+}(x) - \widehat m_{0-}(x)|}{\sqrt{\widehat v(x)}} = O_P(1)

for any x^0 \in \langle h_n, 1 - h_n\rangle. By Lemma 6.1 and Corollary 6.1 we receive

  \widehat m_{1+}(x) = \int_{-1}^{(x - x_0)/h} k_+^*(z)\, dz + O_P\Bigl( \sqrt{\frac{\log n}{nh}} \Bigr),  x \ge x_0 - h,
  = 0,  x < x_0 - h,   (6.39)

and

  \widehat m_{1-}(x) = \int_{0}^{(x - x_0)/h} k_-^*(z)\, dz + O_P\Bigl( \sqrt{\frac{\log n}{nh}} \Bigr),  x \ge x_0,
  = 0,  x < x_0.   (6.40)

Combining (6.34)–(6.40) we obtain the assertion of Theorem 3.4. □


Proofs of Theorems 4.1 and 4.2. Going through the proofs of Theorems 3.1–3.3 and comparing T_n with T_n(R), we see that the crucial issue is to prove that, given (Y_i, X_i), i = 1, \ldots, n, there exists a sequence of Wiener processes \{W_n^*(t);\ t \ge 0\}_n such that, as n \to \infty,

  P\Bigl( \max_{1 \le k \le n} \Big| \sum_{i=1}^{k} \widetilde\varepsilon_{R_i} - W_n^*(k) \Big|\, n^{-1/(2+\Delta)} \ge c
  \,\Big|\, (Y_i, X_i),\ i = 1, \ldots, n \Bigr) \stackrel{P}{\to} 0

for any c > 0. To show this we apply the results of Hušková (1997); we have to verify the assumptions there. Since

  \frac{1}{n} \sum_{i=1}^{n} \widetilde\varepsilon_i = 0
  \quad and \quad
  \frac{1}{n} \sum_{i=1}^{n} \widetilde\varepsilon_i^{\,2} = 1,

it remains to show that

  \frac{1}{n} \sum_{i=1}^{n} |\widetilde\varepsilon_i|^{2+\Delta} = O_P(1)   (6.41)

with some \Delta > 0. Notice that, taking into account the assumptions (4.2)–(4.4) and applying standard tools, we obtain

  \frac{1}{n} \sum_{i=1}^{n} |\widehat\varepsilon_i|^{2+\Delta}
  = O_P\Bigl( \max_i (\widehat\sigma(X_i))^{-(2+\Delta)}
    \Bigl( \frac{1}{n} \sum_{i=1}^{n} |\varepsilon_i|^{2+\Delta}
    + \frac{1}{n} \sum_{i=1}^{n} |m(X_i) - \widehat m(X_i)|^{2+\Delta} \Bigr) \Bigr) = O_P(1),

  \frac{1}{n} \sum_{i=1}^{n} \widehat\varepsilon_i^{\,2}
  = \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i^2
    + \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i^2\, \frac{\sigma^2(X_i) - \widehat\sigma^2(X_i)}{\widehat\sigma^2(X_i)}
    + \frac{2}{n} \sum_{i=1}^{n} \frac{\varepsilon_i\, \sigma(X_i)\, (m(X_i) - \widehat m(X_i))}{\widehat\sigma^2(X_i)}
    + \frac{1}{n} \sum_{i=1}^{n} \frac{(m(X_i) - \widehat m(X_i))^2}{\widehat\sigma^2(X_i)}
  = \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i^2 + o_P(1).

These two relations then imply (6.41). The proof is finished. □
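The moment conditions verified above — centred, unit-variance standardized residuals with a bounded (2+\Delta)th empirical moment — are easy to reproduce numerically. Everything below (the model, a Nadaraya–Watson pilot fit, and \Delta = 1) is an illustrative assumption, not the paper's construction:

```python
import numpy as np

# Standardized residuals for a resampling scheme: estimate m by a simple
# Nadaraya-Watson smoother, form eps_hat_i = (Y_i - m_hat(X_i)) / sigma,
# then centre and scale so that the empirical mean is 0 and variance 1.
# The (2 + Delta)-th empirical moment then stays bounded, as in (6.41).
rng = np.random.default_rng(6)
n, h, sigma, Delta = 2000, 0.1, 0.5, 1.0
X = rng.uniform(0, 1, n)
Y = np.sin(2 * np.pi * X) + sigma * rng.standard_normal(n)

K = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)  # Gaussian kernel matrix
m_hat = (K @ Y) / K.sum(axis=1)                          # NW estimate at each X_i

eps = (Y - m_hat) / sigma
eps_tilde = (eps - eps.mean()) / eps.std()
moment = np.mean(np.abs(eps_tilde) ** (2 + Delta))
print(eps_tilde.mean(), eps_tilde.var(), moment)
```

Centering and scaling hold exactly by construction; the bounded higher moment is what lets the strong approximation of Hušková (1997) be applied conditionally on the data.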

Acknowledgement

The work of the first and third authors was supported by grants GAČR 201/06/0186 and MSM 0021620839.

References

Antoch, J., Hušková, M., 2001. Permutation tests for change point analysis. Statist. Probab. Lett. 53, 37–46.
Antoch, J., Hušková, M., Jarušková, D., 2003. Off-line quality control. In: Lauro, N.C., et al. (Eds.), Multivariate Total Quality Control: Foundations and Recent Advances. Springer, Heidelberg, pp. 1–86.
Eubank, R.L., Speckman, P.L., 1994. Nonparametric estimation of functions with jump discontinuities. In: Carlstein, E., Müller, H.G., Siegmund, D. (Eds.), Change-Point Problems. IMS, pp. 130–144.
Eubank, R.L., Cline, D.B.H., Speckman, P.L., 1995. Nonparametric estimation of regression curves with discontinuous derivatives. Technical Report, Texas A&M University and University of Missouri-Columbia.
Gijbels, I., Fan, J., 1996. Local Polynomial Modelling and its Applications. Chapman & Hall, London.
Good, P., 2000. Permutation Tests. Springer, New York.
Grégoire, G., Hamrouni, Z., 2002. Change point estimation by local linear smoothing. J. Multivariate Anal. 83, 56–83.
Hamrouni, Z., 1999. Inférence statistique par lissage linéaire local pour une fonction de régression présentant des discontinuités. Dissertation, University Joseph Fourier, Grenoble.
Hart, J.D., 1997. Nonparametric Smoothing and Lack-of-Fit Tests. Springer, New York.
Horváth, L., Kokoszka, P., 2002. Change-point detection in non-parametric regression. Statistics 36, 9–31.
Hušková, M., 1997. Limit theorems for rank statistics. Statist. Probab. Lett. 32, 45–55.
Hušková, M., 2004. Permutation principle and bootstrap in change point analysis. In: Horváth, L., Szyszkowicz, B. (Eds.), Asymptotic Methods in Stochastics. Fields Institute Communications, vol. 44, pp. 273–291.
Hušková, M., 2006. Results on invariance principles useful in nonlinear regression. In preparation.
Hušková, M., Slabý, A., 2001. Permutation principle for multiple changes. Kybernetika 37, 606–622.
Johnstone, G.J., 1982. Probabilities of maximal deviations for nonparametric regression function estimates. J. Multivariate Anal. 12, 402–414.
Lehmann, E.L., 1991. Theory of Point Estimation. Wadsworth & Brooks/Cole, CA.
Liero, H., 1982. On the maximal deviation of the kernel regression function estimate. Math. Operationsforsch. Statist. Ser. Statist. 13, 171–182.
Müller, H.G., Song, K.S., 1997. Two-stage change-point estimators in smooth regression models. Statist. Probab. Lett. 34, 323–335.
Müller, H.G., Stadtmüller, U., 1999. Discontinuous versus smooth regression. Ann. Statist. 27, 299–337.
Wu, J.S., Chu, C.K., 1993a. Kernel-type estimators of jump points and values of a regression function. Ann. Statist. 21, 1545–1566.
Wu, J.S., Chu, C.K., 1993b. Kernel-type function estimations and bandwidth selection for discontinuous regression functions. Statist. Sinica 3, 557–576.