The adaptive rate of convergence in a problem of pointwise density estimation


Statistics & Probability Letters 47 (2000) 85 – 90

Cristina Butucea (a,b)

(a) Université Paris VI, Paris, France
(b) Humboldt-Universität zu Berlin, SFB 373, Spandauer Strasse 1, D-10178 Berlin, Germany

Received April 1998; received in revised form May 1999

Abstract

We estimate the common density function of $n$ i.i.d. observations, at a fixed point, over Sobolev classes of functions having regularity $\beta$. We prove that the optimal rate of convergence cannot be attained in adaptive estimation, i.e. uniformly over $\beta$ in some set $B_n$. A slower rate is shown to be adaptive. © 2000 Published by Elsevier Science B.V. All rights reserved

Keywords: Sobolev classes; Minimax risk; Adaptive density estimation

1. Introduction

This work is related to the exact adaptive result proven in Butucea (1998). There we showed that the rate $(\log n/n)^{(\beta - 1/2)/(2\beta)}$, where $n$ is the sample size and $\beta$ is the regularity parameter of the estimated density, is attainable in adaptive density estimation on the scale of Sobolev classes. In the present paper, we complement this result by showing that the adaptive rate of convergence is, in fact, of order $(\log n/n)^{(\beta - 1/2)/(2\beta)}$: no adaptive estimator can attain the faster optimal rate uniformly in $\beta$. These results extend work on Hölder classes by Lepskii (1990) and Brown and Low (1996) to Sobolev classes of densities.

Let us recall the framework of Butucea (1998). Let $X_1, \dots, X_n$ be $n$ i.i.d. random variables having common probability density $p : \mathbb{R} \to [0, +\infty)$. We want to estimate $p$ at a fixed point $x_0$, assuming that $p$ belongs to the class of functions $W_n(\beta, L)$, defined below, where $\beta > 1/2$ characterizes the smoothness (e.g. the number of continuous derivatives, if $\beta$ is an integer). Let us define the ball of radius $L > 0$ in the Sobolev space of density functions as follows:

$$W(\beta, L) = \Big\{ p : \mathbb{R} \to [0, +\infty) : \int_{\mathbb{R}} p = 1,\ \int_{\mathbb{R}} |F(p)(x)|^2 |x|^{2\beta}\, dx \le 2\pi L^2 \Big\},$$

E-mail addresses: [email protected], [email protected] (C. Butucea)

0167-7152/00/$ - see front matter © 2000 Published by Elsevier Science B.V. All rights reserved. PII: S0167-7152(99)00141-8


where $F(p)(x) = \int_{\mathbb{R}} p(y) e^{-ixy}\, dy$ denotes the Fourier transform of the function $p$. Now, we introduce the class

$$W_n(\beta, L) = \{ p \in W(\beta, L) : p(x_0) \ge \rho_n \},$$

where the sequence $\rho_n$ of positive real numbers satisfies $\lim_{n\to\infty} \rho_n = 0$ and $\liminf_{n\to\infty} \rho_n \log n > 0$. Indeed, as we will see in Proposition 2 below, we need to introduce the truncated class of Sobolev densities $W_n(\beta, L)$ in order to avoid the possible case of different densities for each $n$ such that $p(x_0) \to 0$ too fast as $n \to \infty$ (see Butucea, 1998 for further details).

Let us consider an estimator $p_n$ of $p$, a fixed point $x_0 \in \mathbb{R}$ and a sequence of positive real numbers $\varphi_{n,\beta}$ which goes to 0 when $n \to \infty$, and define the maximal risk of an estimator $p_n$ of $p$ by

$$R_{n,\beta}(p_n, \varphi_{n,\beta}) = \sup_{p \in W_n(\beta, L)} \varphi_{n,\beta}^{-q}\, E_p |p_n(x_0) - p(x_0)|^q, \tag{1}$$

where $q > 0$ and $E_p$ is the expectation with respect to the distribution of $X_1, \dots, X_n$ when the underlying probability density is $p$. From now on, $c$, $C$ and $C'$ denote finite, positive constants that may be different in each context.

Definition 1. The sequence $\varphi_{n,\beta}$ is an optimal rate of convergence on $W_n(\beta, L)$, with respect to the risk (1), if there exists an estimator $p^*_{n,\beta}$ such that

$$c \le \liminf_{n\to\infty} \inf_{p_n} R_{n,\beta}(p_n, \varphi_{n,\beta}) \le \limsup_{n\to\infty} R_{n,\beta}(p^*_{n,\beta}, \varphi_{n,\beta}) \le C,$$

where the infimum is taken over all possible estimators $p_n$ of $p$.

The following result gives the optimal rate of convergence over the Sobolev class of density functions when estimating at a fixed point. It follows easily from the argument of Donoho and Low (1992).

Proposition 1. For a fixed value $\beta$, the sequence $\varphi_{n,\beta} = (1/n)^{(\beta - 1/2)/(2\beta)}$ is an optimal rate of convergence on the class $W_n(\beta, L)$, with respect to the risk (1).

We are interested in adaptive estimation, i.e. in finding the asymptotically best estimator $p_n^*$ independent of $\beta$, provided that $\beta$ belongs to a set $B_n$. More precisely, the set $B_n = \{\beta_1, \dots, \beta_{N_n}\}$ is such that $1/2 < \beta_1 < \dots < \beta_{N_n} < +\infty$. We suppose that $\beta_1$ is fixed (independent of $n$), $\lim_{n\to\infty} \beta_{N_n} = +\infty$, $\{N_n\}_{n \ge 1}$ is a nondecreasing sequence of positive integers and $\Delta_n = \min_{i=1,\dots,N_n - 1} (\beta_{i+1} - \beta_i)$ satisfies

$$\limsup_{n\to\infty} \Delta_n < +\infty. \tag{2}$$

Moreover, we assume that

$$\frac{\Delta_n \log n}{\beta_{N_n}^2 \log\log n} \to +\infty \quad \text{as } n \to \infty. \tag{3}$$

By construction, $B_n$ is a grid of points which are at least $\Delta_n$ apart. In particular, we can take equidistant stepwidths of $\Delta_n = 1$ or $\Delta_n = 1/\sqrt{\log n}$ and $\beta_{N_n} = \log\log n$. The first natural step is to look for an estimator $p_n^*$ such that

$$\limsup_{n\to\infty} \sup_{\beta \in B_n} R_{n,\beta}(p_n^*, \varphi_{n,\beta}) \le C, \tag{4}$$


where $\varphi_{n,\beta}$ is the optimal rate of convergence. If such an estimator exists, it is called an optimal rate adaptive estimator. Lepskii (1990) and Brown and Low (1996) showed that it is not possible to attain the optimal rate of convergence $\varphi_{n,\beta}$ uniformly in $\beta$ belonging to a set $B_n$, when estimating functions in Hölder classes at a fixed point $x_0$. Here, we want to establish a similar result for density functions in the Sobolev classes.

The question is whether condition (4) is satisfied when $\varphi_{n,\beta}$ is replaced by some other sequence $\psi_{n,\beta}$. The candidate for such a $\psi_{n,\beta}$ is the exact asymptotic normalization found in Butucea (1998). More precisely, we define $\bar{B} = B_n \setminus \{\beta_{N_n}\}$ and $a = a(\beta, L, q, p(x_0)) > 0$, for all $\beta$ in $\bar{B}$, such that $\limsup_{\beta\to\infty} a(\beta) < \infty$ and $\liminf_{\beta\to\infty} a(\beta) > 0$, and choose

$$\psi_{n,\beta} = \begin{cases} a\, (\log n / n)^{(\beta - 1/2)/(2\beta)} & \text{for } \beta \in \bar{B}, \\ (1/n)^{(\beta - 1/2)/(2\beta)} & \text{for } \beta = \beta_{N_n}. \end{cases} \tag{5}$$

Proposition 2 (Butucea, 1998). There exist an estimator $p_n^*$, independent of $\beta$ in $B_n$ and explicitly given, and a precise constant $a = a(\beta, L, q, p(x_0)) > 0$ associated with $\psi_{n,\beta}$ in (5), such that

$$\liminf_{n\to\infty} \inf_{p_n} \sup_{\beta \in \bar{B}} R_{n,\beta}(p_n, \psi_{n,\beta}) = \limsup_{n\to\infty} \sup_{\beta \in \bar{B}} R_{n,\beta}(p_n^*, \psi_{n,\beta}) = 1,$$

$$\limsup_{n\to\infty} R_{n,\beta_{N_n}}(p_n^*, \psi_{n,\beta_{N_n}}) < \infty.$$
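As a purely numerical illustration (not part of the paper), the sketch below first checks the Sobolev-ball condition for the standard normal density, whose Fourier transform under the convention $F(p)(x) = \int p(y)e^{-ixy}dy$ is $e^{-x^2/2}$, so the integral has the closed form $\Gamma(\beta + 1/2)$; it then tabulates the ratio $\psi_{n,\beta}/\varphi_{n,\beta} = (\log n)^{(\beta-1/2)/(2\beta)}$, the "price of adaptation", taking $a = 1$. The function names, the midpoint rule and its tolerances are ours.

```python
import math

# Part 1 (illustrative): check \int |F(p)(x)|^2 |x|^{2 beta} dx <= 2 pi L^2
# for the standard normal density; here |F(p)(x)|^2 = exp(-x^2), and the
# integral equals Gamma(beta + 1/2) in closed form.

def sobolev_integral(beta, h=1e-3, xmax=10.0):
    # midpoint rule for \int exp(-x^2) |x|^{2 beta} dx over [-xmax, xmax]
    n = int(2 * xmax / h)
    return sum(
        math.exp(-x * x) * abs(x) ** (2 * beta) * h
        for x in (-xmax + (i + 0.5) * h for i in range(n))
    )

beta = 1.0
closed_form = math.gamma(beta + 0.5)
# the normal density lies in W(beta, L) as soon as 2*pi*L^2 >= closed_form
L_min = math.sqrt(closed_form / (2 * math.pi))

# Part 2 (illustrative): the ratio of the adaptive-candidate rate
# (log n / n)^{(beta-1/2)/(2 beta)} to the optimal rate n^{-(beta-1/2)/(2 beta)}
# equals (log n)^{(beta-1/2)/(2 beta)}, which grows without bound, slowly.

def phi(n, beta):
    return n ** (-(beta - 0.5) / (2 * beta))

def psi(n, beta):
    return (math.log(n) / n) ** ((beta - 0.5) / (2 * beta))

print(sobolev_integral(beta), closed_form, L_min)
for n in [10**3, 10**6, 10**12]:
    print(n, psi(n, beta) / phi(n, beta))
```

The slow logarithmic growth of the ratio is exactly why the loss for adaptation is modest in practice, yet unavoidable by Theorem 1 below.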

It may seem that a natural way to define an adaptive rate of convergence $\psi_{n,\beta}$ for the risk $R_{n,\beta}$ is via the relation

$$c \le \liminf_{n\to\infty} \inf_{p_n} \sup_{\beta \in B_n} R_{n,\beta}(p_n, \psi_{n,\beta}) \le \limsup_{n\to\infty} \sup_{\beta \in B_n} R_{n,\beta}(p_n^*, \psi_{n,\beta}) \le C \tag{6}$$

with some estimator $p_n^*$ that does not depend on $\beta$. As we see in the example below, relation (6) does not determine uniquely the adaptive rate of convergence.

Example 1. Let us consider the case of a set $B = \{\beta_1, \beta_2\}$ with $\beta_1 < \beta_2$ and two sequences

$$\psi^{(1)}_{n,\beta} \equiv \varphi_{n,\beta_1} = \left(\frac{1}{n}\right)^{(\beta_1 - 1/2)/(2\beta_1)}, \qquad \psi^{(2)}_{n,\beta_1} = \left(\frac{\log n}{n}\right)^{(\beta_1 - 1/2)/(2\beta_1)}, \quad \psi^{(2)}_{n,\beta_2} = \left(\frac{1}{n}\right)^{(\beta_2 - 1/2)/(2\beta_2)}.$$

By Proposition 1 at $\beta = \beta_1$, there exists an estimator $p^*_{n,\beta_1}$ such that

$$\limsup_{n\to\infty} R_{n,\beta_1}(p^*_{n,\beta_1}, \psi^{(1)}_{n,\beta}) \le C$$

for $C > 0$, as $\psi^{(1)}_{n,\beta} = \varphi_{n,\beta_1}$. Using also the fact that $W_n(\beta_1, L) \supseteq W_n(\beta_2, L)$ when $\beta_1 < \beta_2$, we get

$$\limsup_{n\to\infty} R_{n,\beta_2}(p^*_{n,\beta_1}, \psi^{(1)}_{n,\beta}) \le \limsup_{n\to\infty} R_{n,\beta_1}(p^*_{n,\beta_1}, \psi^{(1)}_{n,\beta}) \le C.$$

From the above inequality, we obtain the upper bounds in relation (6) for $\psi^{(1)}_{n,\beta}$. By Proposition 2, for $C \ge 1$ and for the estimator $p_n^*$, the same upper bounds are valid for $\psi^{(2)}_{n,\beta}$. For the lower bounds, we get

$$\liminf_{n\to\infty} \inf_{p_n} \sup_{\beta \in B} R_{n,\beta}(p_n, \psi^{(1)}_{n,\beta}) \ge \liminf_{n\to\infty} \inf_{p_n} R_{n,\beta_1}(p_n, \psi^{(1)}_{n,\beta_1}) \ge \liminf_{n\to\infty} \inf_{p_n} R_{n,\beta_1}(p_n, \varphi_{n,\beta_1}) \ge c,$$


where we applied Proposition 1 to $\beta = \beta_1$. For the sequence $\psi^{(2)}_{n,\beta}$, Proposition 2 gives us direct lower bounds in relation (6). This shows that there exist two quite different rates $\psi^{(1)}_{n,\beta}$ and $\psi^{(2)}_{n,\beta}$ that satisfy condition (6), for different choices of $p_n^*$.

In Section 2, we shall prove that $\psi_{n,\beta}$ defined by (5) is an adaptive rate of convergence over the Sobolev class of densities, in the sense of Definition 2, given below. This problem was considered for the Gaussian white noise model, first by Lepskii (1990) on the Hölder classes and, recently, by Tsybakov (1998) on the Sobolev classes.

2. Results

The following definition is a modification of Lepskii's (1990) definition (see Tsybakov, 1998), adapted to the density estimation problem.

Definition 2. The sequence $\psi_{n,\beta}$ is an adaptive rate of convergence if:

(1) there exists an estimator $p_n^*$, independent of $\beta$ over $B_n$, which is called a rate adaptive estimator, such that

$$\limsup_{n\to\infty} \sup_{\beta \in B_n} R_{n,\beta}(p_n^*, \psi_{n,\beta}) \le C;$$

(2) if there exist another sequence of positive real numbers $\eta_{n,\beta}$ and an estimator $p_n^{**}$ such that

$$\limsup_{n\to\infty} \sup_{\beta \in B_n} R_{n,\beta}(p_n^{**}, \eta_{n,\beta}) \le C'$$

and, at some $\beta'$ in $B_n$, $\eta_{n,\beta'} / \psi_{n,\beta'} \to 0$ as $n \to \infty$, then there exists another $\beta''$ in $B_n$ such that

$$\frac{\eta_{n,\beta'}}{\psi_{n,\beta'}} \cdot \frac{\eta_{n,\beta''}}{\psi_{n,\beta''}} \to +\infty \quad \text{as } n \to \infty.$$

In other words, if another rate satisfies the lower and upper uniform bounds in (6) and if at one point $\beta'$ this rate is faster than the adaptive rate, then there exists some other point $\beta''$ where the loss with respect to the adaptive rate is infinitely larger than the gain at $\beta'$.

Remark. If an optimal adaptive estimator exists, it is also rate adaptive. Indeed, an optimal adaptive estimator satisfies condition (1) by definition, for the optimal rate of convergence $\varphi_{n,\beta}$. We can verify that in this case condition (2) in Definition 2 is redundant, since such a sequence $\eta_{n,\beta}$ does not exist.

Remark. Let us return to the previous section and see that both $\psi^{(1)}_{n,\beta}$ and $\psi^{(2)}_{n,\beta}$ satisfy the first statement in Definition 2. In terms of the second statement, $\psi^{(2)}_{n,\beta}$ is preferred to $\psi^{(1)}_{n,\beta}$. Indeed, for $\beta_1$ we get $\psi^{(1)}_{n,\beta_1} / \psi^{(2)}_{n,\beta_1} \to 0$ as $n \to \infty$, but there exists $\beta_2$ such that $(\psi^{(1)}_{n,\beta_1} / \psi^{(2)}_{n,\beta_1}) \cdot (\psi^{(1)}_{n,\beta_2} / \psi^{(2)}_{n,\beta_2}) \to +\infty$.
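The comparison in this remark can be checked numerically. The sketch below (an illustration of ours, not from the paper; the helper `rate` and the sample values $\beta_1 = 1$, $\beta_2 = 2$ are assumptions) evaluates the gain of $\psi^{(1)}$ over $\psi^{(2)}$ at $\beta_1$ and the loss at $\beta_2$, and shows their product growing.

```python
import math

# psi^(1) is the constant-in-beta sequence phi_{n, beta1}; psi^(2) equals
# the adaptive rate at beta1 and the optimal rate at beta2.
def rate(n, beta, with_log):
    # (log n / n)^{(beta-1/2)/(2 beta)} if with_log, else (1/n)^{...}
    num = math.log(n) if with_log else 1.0
    return (num / n) ** ((beta - 0.5) / (2 * beta))

beta1, beta2 = 1.0, 2.0
for n in [10**2, 10**4, 10**8]:
    gain = rate(n, beta1, False) / rate(n, beta1, True)   # psi1/psi2 at beta1 -> 0
    loss = rate(n, beta1, False) / rate(n, beta2, False)  # psi1/psi2 at beta2 -> +inf
    print(n, gain, loss, gain * loss)                     # product grows with n
```

The gain is only logarithmic in $n$ while the loss is polynomial, which is why the product diverges and $\psi^{(2)}$ is preferred by Definition 2.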

Theorem 1. Let $B_n = \{\beta_1, \dots, \beta_{N_n}\}$ be such that $1/2 < \beta_1 < \dots < \beta_{N_n} < +\infty$, where $\beta_1$ is fixed (independent of $n$), $\lim_{n\to\infty} \beta_{N_n} = +\infty$, and $\Delta_n$ and $\beta_{N_n}$ satisfy conditions (2) and (3). Then there is no optimal adaptive estimator for the risk (1), and $\psi_{n,\beta}$, given by (5) with $a = a(\beta, L, q, p(x_0)) > 0$, is the adaptive rate of convergence in the sense of Definition 2.


Proof. Suppose there is an optimal adaptive estimator $p_n^*$. Then the optimal rate of convergence $\varphi_{n,\beta} = (1/n)^{(\beta - 1/2)/(2\beta)}$ fulfills

$$C \ge \limsup_{n\to\infty} \sup_{\beta \in B_n} R_{n,\beta}(p_n^*, \varphi_{n,\beta}) \ge \limsup_{n\to\infty} \sup_{\beta \in \bar{B}} \left(\frac{\psi_{n,\beta}}{\varphi_{n,\beta}}\right)^q R_{n,\beta}(p_n^*, \psi_{n,\beta}) \ge \liminf_{n\to\infty} \left( \sup_{\beta \in \bar{B}} \left(\frac{\psi_{n,\beta}}{\varphi_{n,\beta}}\right)^q \inf_{p_n} \sup_{\beta \in \bar{B}} R_{n,\beta}(p_n, \psi_{n,\beta}) \right). \tag{7}$$

This leads to a contradiction, as the right-hand side expression tends to $\infty$ by Proposition 2 and the fact that

$$\sup_{\beta \in \bar{B}} \frac{\psi_{n,\beta}}{\varphi_{n,\beta}} \ge \sup_{\beta \in \bar{B}} c\, (\log n)^{1/2 - 1/(4\beta)} \ge c\, (\log n)^{1/2} \exp\left( -\frac{\log\log n}{4\beta_{N_n - 1}} \right) \to +\infty \quad \text{as } n \to \infty.$$

Indeed, $\log n / \beta_{N_n} \to +\infty$ as $n \to \infty$, as a consequence of hypothesis (3) (see Butucea, 1998). The first statement of Theorem 1 is proven.

According to Proposition 2, there exists an estimator $p_n^*$, independent of $\beta$, such that

$$\limsup_{n\to\infty} \sup_{\beta \in B_n} R_{n,\beta}(p_n^*, \psi_{n,\beta}) < C.$$
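The divergence of the factor $\sup_{\beta\in\bar B}\psi_{n,\beta}/\varphi_{n,\beta}$ used above can be observed numerically. The sketch below is an illustration of ours (not from the paper): it takes the example grid with stepwidth $\Delta_n = 1$ and $\beta_{N_n} = \log\log n$, sets $a = 1$, and assumes the starting point $\beta_1 = 0.6$.

```python
import math

# psi_{n,beta}/phi_{n,beta} = (log n)^{(beta-1/2)/(2 beta)}, taking a = 1
def ratio(n, beta):
    return math.log(n) ** ((beta - 0.5) / (2 * beta))

def sup_ratio(n):
    # equidistant grid beta_1, beta_1 + 1, ... up to beta_max = log log n
    beta_max = math.log(math.log(n))
    grid = []
    b = 0.6                      # beta_1 > 1/2, fixed with n (assumed value)
    while b <= beta_max:
        grid.append(b)
        b += 1.0                 # stepwidth Delta_n = 1
    return max(ratio(n, b) for b in grid) if grid else ratio(n, 0.6)

for n in [10**5, 10**20, 10**80]:
    # the supremum grows without bound, ruling out an optimal adaptive estimator
    print(n, sup_ratio(n))
```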

Suppose now that there exist another sequence $\eta_{n,\beta}$ and an estimator $p_n^{**}$, that does not depend on $\beta$, for which the previous inequality holds. Using the same technique as in (7), we find that, for each particular $\beta$, $\eta_{n,\beta}$ cannot converge to 0 faster than the optimal rate $\varphi_{n,\beta}$. We also have

$$\limsup_{n\to\infty} \max\{ R_{n,\beta'}(p_n^{**}, \eta_{n,\beta'}),\; R_{n,\beta_{N_n}}(p_n^{**}, \eta_{n,\beta_{N_n}}) \} \le C. \tag{8}$$

Moreover, suppose $\eta_{n,\beta'} / \psi_{n,\beta'} \to 0$ as $n \to \infty$, at some $\beta'$ in $B_n$. Therefore $\beta' \in \bar{B}$, since at $\beta = \beta_{N_n}$ the rate $\psi_{n,\beta_{N_n}} = \varphi_{n,\beta_{N_n}}$ is optimal over the class $W_n(\beta_{N_n}, L)$. It remains to show that there is another $\beta''$ in $B_n$ such that $(\eta_{n,\beta'} / \psi_{n,\beta'}) \cdot (\eta_{n,\beta''} / \psi_{n,\beta''}) \to +\infty$. We denote $\beta_N = \beta_{N_n}$ from now on.

In our context, $\eta_{n,\beta_N}$ is infinitely slower than some $\varepsilon_n^r$, where $\varepsilon_n = n^{-1}$ and $r = 1/2 - \delta/\beta'$, for some fixed $\delta \in (0,1)$ as small as needed. Indeed, if $\eta_{n,\beta_N} / \varepsilon_n^r \to +\infty$ as $n \to \infty$, we choose $\beta'' = \beta_N$. We prove that the alternative case, when $\liminf_{n\to\infty} \eta_{n,\beta_N} / \varepsilon_n^r \le c_1$, leads to a contradiction. Thus $\psi_{n,\beta}$ is the adaptive rate of convergence in both cases.

Let $r' = 1/2 - 2\delta/\beta'$. From (8) we obtain

$$C \ge \limsup_{n\to\infty} \min\left\{ \left(\frac{\psi_{n,\beta'}}{\eta_{n,\beta'}}\right)^q,\; \left(\frac{\varepsilon_n^r}{\eta_{n,\beta_N}}\right)^q \varepsilon_n^{(r'-r)q} \right\} \cdot \liminf_{n\to\infty} \inf_{p_n} \max\{ R_{n,\beta'}(p_n, \psi_{n,\beta'}),\; R_{n,\beta_N}(p_n, \varepsilon_n^{r'}) \}.$$

The right-hand side tends to infinity, because $\psi_{n,\beta'} / \eta_{n,\beta'} \to +\infty$ as $n \to \infty$ by hypothesis, $\limsup_{n\to\infty} (\varepsilon_n^r / \eta_{n,\beta_N})\, \varepsilon_n^{r' - r} \ge (1/c_1) \limsup_{n\to\infty} \varepsilon_n^{-\delta/\beta'} = \infty$, as $\delta > 0$, and

$$\liminf_{n\to\infty} \inf_{p_n} \max\{ R_{n,\beta'}(p_n, \psi_{n,\beta'}),\; R_{n,\beta_N}(p_n, \varepsilon_n^{r'}) \} > 0. \tag{9}$$
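The exponent bookkeeping behind this step can be sanity-checked numerically. The sketch below is an illustration of ours (the concrete values $\beta' = 1$ and $\delta = 0.05$ are assumptions): with $\varepsilon_n = 1/n$, $r = 1/2 - \delta/\beta'$ and $r' = 1/2 - 2\delta/\beta'$, the factor $\varepsilon_n^{r'-r} = n^{\delta/\beta'}$ diverges, and $q_n = \psi_{n,\beta'}/\varepsilon_n^{r'}$ diverges as soon as $\delta < 1/8$.

```python
import math

beta_p, delta = 1.0, 0.05            # assumed beta' and a small delta < 1/8
r = 0.5 - delta / beta_p
rp = 0.5 - 2 * delta / beta_p        # r' < r, so eps_n^{r'-r} = n^{delta/beta'}

def psi(n, beta):
    # adaptive rate (log n / n)^{(beta-1/2)/(2 beta)}, taking a = 1
    return (math.log(n) / n) ** ((beta - 0.5) / (2 * beta))

for n in [10**3, 10**6, 10**12]:
    eps_factor = n ** (delta / beta_p)        # eps_n^{r'-r} -> +inf
    q_n = psi(n, beta_p) / (1.0 / n) ** rp    # q_n -> +inf since delta < 1/8
    print(n, eps_factor, q_n)
```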

In order to show (9), we construct, similarly to Butucea (1998), difficult enough experiments $p_{n,0}$ in $W_n(\beta_N, L)$ and $p_{n,1}$ in $W_n(\beta', L)$. That is, the non-asymptotic expression in (9) is further bounded from below by

$$R_n = \inf_{p_n} \max\{ \psi_{n,\beta'}^{-q}\, E_{p_{n,1}} |(p_n - p_{n,1})(x_0)|^q,\; \varepsilon_n^{-r'q}\, E_{p_{n,0}} |(p_n - p_{n,0})(x_0)|^q \}.$$


We proved in Butucea (1998) that, for some $\lambda > 0$ and small $\alpha$ in $(0,1)$,

$$R_n \ge \frac{(1-\alpha)\, q_n^q (2\lambda)^q (1-2\alpha)^{2q}}{(1-2\alpha)^q + q_n^q (2\lambda)^q},$$

where $q_n = \psi_{n,\beta'} / \varepsilon_n^{r'}$. By choosing $\lambda = \lambda_n$ and $\delta$ small enough, we get $q_n^q \to \infty$ with $n$ and, finally, $\liminf_{n\to\infty} R_n \ge (1-\alpha)(1-2\alpha)^{2q} > 0$.
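The passage to the limit in this last bound is elementary and can be verified numerically. The sketch below (an illustration of ours; the values of $\alpha$, $q$ and the grid of $Q$ values are assumptions) shows the bound approaching the strictly positive limit $(1-\alpha)(1-2\alpha)^{2q}$ as $Q = q_n^q (2\lambda)^q \to \infty$.

```python
# lower bound R_n >= (1-alpha) Q (1-2 alpha)^{2q} / ((1-2 alpha)^q + Q)
def lower_bound(Q, alpha, q):
    return (1 - alpha) * Q * (1 - 2 * alpha) ** (2 * q) / ((1 - 2 * alpha) ** q + Q)

alpha, q = 0.1, 1.0
limit = (1 - alpha) * (1 - 2 * alpha) ** (2 * q)   # the limit as Q -> infinity
for Q in [1.0, 1e3, 1e9]:
    print(Q, lower_bound(Q, alpha, q), limit)
```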

References

Brown, L.D., Low, M.G., 1996. A constrained risk inequality with application to nonparametric functional estimation. Ann. Statist. 24, 2524–2535.
Butucea, C., 1998. Exact adaptive pointwise estimation on Sobolev classes of densities. Doc. Travail 9818, CREST.
Donoho, D.L., Low, M.G., 1992. Renormalization exponents and optimal pointwise rates of convergence. Ann. Statist. 20, 944–970.
Lepskii, O.V., 1990. On a problem of adaptive estimation in Gaussian white noise. Theory Probab. Appl. 35, 454–466.
Tsybakov, A.B., 1998. Pointwise and sup-norm sharp adaptive estimation of functions on the Sobolev classes. Ann. Statist. 26, 2420–2460.