Minimax lower bounds and moduli of continuity

Statistics & Probability Letters 50 (2000) 279 – 284

Geurt Jongbloed
Department of Mathematics, Vrije Universiteit, De Boelelaan 1081a, 1081 HV Amsterdam, Netherlands

Received October 1999; received in revised form February 2000

Abstract

The asymptotic behaviour of the minimax risk can be used as a measure of how ‘hard’ an estimation problem is. We relate the asymptotic behaviour of this quantity to an appropriate modulus of continuity, using elementary ideas and techniques only. © 2000 Elsevier Science B.V. All rights reserved.

MSC: 60J65; 60G40; 60J25

Keywords: Minimax risk; Continuity modulus; Density estimation; Decreasing density

1. Introduction

A way of assessing the hardness of an estimation problem is to study the asymptotic behaviour of its minimax risk. Donoho and Liu (1991) study the asymptotics of the minimax risk in a general density estimation setting. Fan (1993) considers minimax risks in the problem of estimating a regression function. In both these papers, the geometric concept ‘modulus of continuity’ of a certain functional over a class of functions appears. Many references to work on minimax risks can also be found there.

We consider a measure space $(X, \mathcal{A}, \mu)$ where $\mu$ is $\sigma$-finite. On this space, let $\mathcal{G}$ be a class of densities with respect to $\mu$. Moreover, let $T$ be a functional defined on $\mathcal{G}$. The problem is to estimate the functional $Tg$ of $g$, which is known to be contained in $\mathcal{G}$, based on a sample from that density. Let $l$ be an increasing convex loss function on $\mathbb{R}_+$, and let $g^{\otimes n}$ denote the $n$-fold product density associated with $g$. As in Donoho and Liu (1991), we define the minimax risk for this problem by
$$R_l(n, T, \mathcal{G}) = \inf_{t_n} \sup_{g \in \mathcal{G}} E_{g^{\otimes n}}\, l(|t_n(X) - Tg|). \tag{1}$$

The infimum is taken over all measurable functions $t_n : X^n \to \mathbb{R}$. In this paper we derive an asymptotic lower bound for $R_l(n, T, \mathcal{G})$ in terms of an appropriate modulus of continuity, using elementary techniques only. This lower bound is similar to that given by (1.2b) in Donoho and Liu (1991), but is derived quite differently.


It should be noted that the main theorem of this paper, Theorem 1, is a generalized version of Theorem 3.1 in Jongbloed (1995). That original result has been rephrased and applied in e.g. Groeneboom (1996), de Wolf (1999) and Cator (2000).

2. A minimax lower bound theorem

Theorem 1 below can be used to derive an asymptotic lower bound for the optimal rate for estimating a functional of a density which is known to belong to a class of densities $\mathcal{G}$. Before stating this theorem, we introduce the Hellinger distance between two probability densities.

Definition 1. Let $f$ and $h$ be probability densities on a measurable space $(\Omega, \mathcal{B})$ with respect to a dominating $\sigma$-finite measure $\mu$. The Hellinger distance $H(f, h)$ between $f$ and $h$ is then defined as the square root of
$$H^2(f, h) = \frac{1}{2} \int \left(\sqrt{f(x)} - \sqrt{h(x)}\right)^2 d\mu(x) = 1 - \int \sqrt{f(x) h(x)} \, d\mu(x).$$
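For concreteness, Definition 1 is easy to evaluate numerically. A minimal sketch (the quadrature-based helper and the exponential example densities are illustrative choices, not from the paper):

```python
import numpy as np
from scipy.integrate import quad

def hellinger(f, h, a, b):
    """Hellinger distance between densities f and h on (a, b):
    H^2(f, h) = 1 - int sqrt(f h) dmu, as in Definition 1."""
    affinity, _ = quad(lambda x: np.sqrt(f(x) * h(x)), a, b)
    return np.sqrt(max(0.0, 1.0 - affinity))

# Example: Exp(1) versus Exp(2); here the Hellinger affinity is
# int_0^inf sqrt(2) e^{-3x/2} dx = 2 sqrt(2)/3, so H = sqrt(1 - 2 sqrt(2)/3).
f = lambda x: np.exp(-x)
h = lambda x: 2.0 * np.exp(-2.0 * x)
d = hellinger(f, h, 0.0, np.inf)
```

The same helper applies to any pair of densities with a common dominating Lebesgue measure on an interval.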

If we use the special loss function $l(x) = |x|$, as in Theorem 1, we denote the minimax risk $R_l$ as defined in (1) by $R_1$.

Theorem 1. Let $\mathcal{G}$ be a class of densities on $X$ and $T$ a functional on $\mathcal{G}$. Fix $g \in \mathcal{G}$. Let $(g_n)$ be a sequence of densities in $\mathcal{G}$ such that
$$\limsup_{n \to \infty} \sqrt{n}\, H(g_n, g) \le \varepsilon.$$
Then
$$\liminf_{n \to \infty} |Tg_n - Tg|^{-1} R_1(n, T, \{g, g_n\}) \ge \tfrac{1}{4} e^{-2\varepsilon^2}.$$

In the proof of Theorem 1 we need some facts concerning the Hellinger distance. A very useful relation exists between the Hellinger distance of the product densities $f^{\otimes n}(x_1, \dots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$ and $h^{\otimes n}(x_1, \dots, x_n) = h(x_1) h(x_2) \cdots h(x_n)$, with respect to the dominating product measure $\mu^{\otimes n}$ on $X^n$, and the distance between the densities $f$ and $h$ on $X$:
$$1 - H^2(f^{\otimes n}, h^{\otimes n}) = \int_{X^n} \sqrt{f^{\otimes n} h^{\otimes n}} \, d\mu^{\otimes n} = \left( \int_X \sqrt{f h} \, d\mu \right)^n = (1 - H^2(f, h))^n. \tag{2}$$

We shall also need the inequality given in the lemma below. This inequality is sometimes referred to as Le Cam's inequality.

Lemma 1. Let $f$ and $h$ be probability densities on a measurable space $(X, \mathcal{A})$ with respect to a dominating measure $\mu$. Then
$$(1 - H^2(f, h))^2 \le 1 - \left(1 - \int f \wedge h \, d\mu\right)^2 \le 2 \int f \wedge h \, d\mu.$$
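Both sides of Le Cam's inequality are plain one-dimensional integrals, so the chain can be checked numerically for concrete densities; a sketch with two illustrative normal densities (a choice made here, not taken from the paper):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f = norm(0.0, 1.0).pdf   # N(0, 1)
h = norm(1.0, 1.0).pdf   # N(1, 1)

# Hellinger affinity int sqrt(f h) dmu; for these normals it equals e^{-1/8}
affinity, _ = quad(lambda x: np.sqrt(f(x) * h(x)), -np.inf, np.inf)
H2 = 1.0 - affinity

# Testing affinity int f ^ h dmu
overlap, _ = quad(lambda x: min(f(x), h(x)), -np.inf, np.inf)

lhs = (1.0 - H2) ** 2                 # (1 - H^2(f, h))^2
mid = 1.0 - (1.0 - overlap) ** 2      # 1 - (1 - int f ^ h)^2
rhs = 2.0 * overlap                   # 2 int f ^ h
```

By property (2), the corresponding affinity of the $n$-fold product densities is simply `affinity ** n`.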


Proof. Writing out the square, the second inequality is trivial. The first inequality is essentially Cauchy–Schwarz:
$$(1 - H^2(f, h))^2 + \left(1 - \int f \wedge h \, d\mu\right)^2 = \left(\int \sqrt{f h} \, d\mu\right)^2 + \left(\frac{1}{2} \int |f - h| \, d\mu\right)^2$$
$$= \left(\int \sqrt{f h} \, d\mu\right)^2 + \frac{1}{4} \left(\int |\sqrt{f} - \sqrt{h}|(\sqrt{f} + \sqrt{h}) \, d\mu\right)^2$$
$$\le \left(\int \sqrt{f h} \, d\mu\right)^2 + \frac{1}{4} \int (\sqrt{f} - \sqrt{h})^2 \, d\mu \int (\sqrt{f} + \sqrt{h})^2 \, d\mu = 1.$$

Proof of Theorem 1. We will now show that for each $n \ge 1$,
$$R_1(n, T, \{g, g_n\}) \ge \tfrac{1}{4} |Tg_n - Tg| (1 - H^2(g_n, g))^{2n}.$$
The theorem then follows by letting $n$ tend to infinity. Since the maximum of two numbers is always larger than the average, we get
$$R_1(n, T, \{g, g_n\}) \ge \tfrac{1}{2} \inf_{t_n} \left( E_{g^{\otimes n}} |t_n(X) - Tg| + E_{g_n^{\otimes n}} |t_n(X) - Tg_n| \right).$$
The expectations at the right-hand side decrease when they are both taken with respect to $g^{\otimes n} \wedge g_n^{\otimes n}$ instead of $g^{\otimes n}$ and $g_n^{\otimes n}$, respectively. Hence, also using the triangle inequality, we obtain for each estimation procedure $t_n$:
$$E_{g^{\otimes n}} |t_n(X) - Tg| + E_{g_n^{\otimes n}} |t_n(X) - Tg_n|$$
$$\ge \int_{X^n} (|t_n(x) - Tg| + |t_n(x) - Tg_n|) \, g^{\otimes n}(x) \wedge g_n^{\otimes n}(x) \, d\mu^{\otimes n}(x)$$
$$\ge |Tg_n - Tg| \int_{X^n} g^{\otimes n}(x) \wedge g_n^{\otimes n}(x) \, d\mu^{\otimes n}(x)$$
$$\ge \tfrac{1}{2} |Tg_n - Tg| (1 - H^2(g^{\otimes n}, g_n^{\otimes n}))^2 = \tfrac{1}{2} |Tg_n - Tg| (1 - H^2(g_n, g))^{2n}.$$
In the last line we first apply Lemma 1, followed by property (2) of the Hellinger distance.

Two observations can be made in connection with this theorem. The first is that for convex loss functions $l$, Jensen's inequality yields that
$$\liminf_{n \to \infty} l\left(\tfrac{1}{4} e^{-2\varepsilon^2} |Tg_n - Tg|\right)^{-1} R_l(n, T, \{g, g_n\}) \ge 1$$
under the conditions of Theorem 1. The second observation is that the lower bound on the minimax risk as given in Theorem 1 holds for each subclass of $\mathcal{G}$ containing $g$ and the densities $g_n$ for all $n$ sufficiently large. In particular, the lower bound holds for all Hellinger balls around $g$ with positive radius.

Considering the lower bound given in Theorem 1, and the fact that we may choose $(g_n)$ arbitrarily as long as the condition on its Hellinger distance to $g$ is satisfied, we can maximise the lower bound. If we fix $\varepsilon$, we should therefore make $Tg_n - Tg$ as big as possible under the restriction that, asymptotically, $H(g_n, g) \le \varepsilon/\sqrt{n}$.


A formal way of stating this problem is to define the modulus of continuity of $T$ over $\mathcal{G}$ with respect to the Hellinger metric, locally at $g$:
$$m(\varepsilon) = \sup\{|Th - Tg| : h \in \mathcal{G} \text{ and } H(h, g) \le \varepsilon\}. \tag{3}$$
Theorem 1 then leads to

Corollary 1. Let $\mathcal{G}$ be a class of densities on $X$ and $T$ a functional on $\mathcal{G}$. Fix $g \in \mathcal{G}$, and let the function $m$ be defined as in (3). Then for each subset $\mathcal{G}_g$ of $\mathcal{G}$ containing some Hellinger ball of positive radius around $g$,
$$\liminf_{n \to \infty} m(\varepsilon/\sqrt{n})^{-1} R_1(n, T, \mathcal{G}_g) \ge \tfrac{1}{4} e^{-2\varepsilon^2}$$
for each positive $\varepsilon$.

Proof. Fix $\delta > 0$. For all $n$ sufficiently large, the Hellinger ball of radius $\varepsilon/\sqrt{n}$ around $g$ in $\mathcal{G}$ is contained in $\mathcal{G}_g$. For each $\tau > 0$, choose $h_\tau \in \mathcal{G}$ such that $H(h_\tau, g) \le \tau$ and $|Th_\tau - Tg| > m(\tau)(1 - \delta)$. Then
$$\liminf_{n \to \infty} m(\varepsilon/\sqrt{n})^{-1} R_1(n, T, \mathcal{G}_g) \ge \liminf_{n \to \infty} m(\varepsilon/\sqrt{n})^{-1} R_1(n, T, \{h_{\varepsilon/\sqrt{n}}, g\})$$
$$\ge \liminf_{n \to \infty} |Tg - Th_{\varepsilon/\sqrt{n}}|^{-1} (1 - \delta)\, R_1(n, T, \{h_{\varepsilon/\sqrt{n}}, g\}) \ge \tfrac{1}{4} (1 - \delta) e^{-2\varepsilon^2}$$
by Theorem 1. Since $\delta > 0$ was arbitrary, the result follows.

It is especially the behaviour of the function $m$ near zero that is important for the lower bound of the minimax risk. In many problems, of which we will encounter an example in Section 3, this behaviour can be described by
$$m(\varepsilon) = (c\varepsilon)^r (1 + o(1)) \quad \text{as } \varepsilon \downarrow 0 \tag{4}$$
for some positive parameters $c$ and $r$. Then we get the following result.

Corollary 2. Let $\mathcal{G}$ be a class of densities on $X$ and $T$ a functional on $\mathcal{G}$. Fix $g \in \mathcal{G}$, and let the function $m$ be defined as in (3), allowing an asymptotic expansion as given in (4). Then for each subset $\mathcal{G}_g$ of $\mathcal{G}$ containing some Hellinger ball around $g$,
$$\liminf_{n \to \infty} n^{r/2} R_1(n, T, \mathcal{G}_g) \ge \tfrac{1}{4} \left(\tfrac{1}{2} c \sqrt{r}\right)^r e^{-r/2}.$$

Proof. From Corollary 1 we have that for each $\varepsilon > 0$
$$\liminf_{n \to \infty} n^{r/2} R_1(n, T, \mathcal{G}_g) \ge \tfrac{1}{4} e^{-2\varepsilon^2} (c\varepsilon)^r.$$
Maximizing this lower bound with respect to $\varepsilon$ (the maximum is attained at $\varepsilon = \sqrt{r}/2$) gives the result.

We end this section stating a useful lemma for computing the Hellinger distance between two densities on the real line.

Lemma 2. Let $g$ and $(g_n)_{n=1}^\infty$ be densities on $(-\infty, \infty)$, such that $\{x : g_n(x) > 0\} \subset \{x : g(x) > 0\}$ for all $n$ sufficiently large and
$$\sup_{\{x : g(x) > 0\}} \left| \frac{g_n(x) - g(x)}{g(x)} \right| \to 0 \quad \text{as } n \to \infty. \tag{5}$$


Then there is a vanishing sequence of positive numbers $c_n$ such that
$$(1 - c_n) \int_{\{x : g(x) > 0\}} \frac{(g_n(x) - g(x))^2}{g(x)} \, dx \le 8 H^2(g_n, g) \le (1 + c_n) \int_{\{x : g(x) > 0\}} \frac{(g_n(x) - g(x))^2}{g(x)} \, dx$$
for all $n$.

Proof. Using (5), the first inequality follows from
$$\int_{\{x : g(x) > 0\}} \frac{(g_n(x) - g(x))^2}{g(x)} \, dx = \int_{\{x : g(x) > 0\}} \left(\sqrt{g_n(x)} - \sqrt{g(x)}\right)^2 \left(1 + \sqrt{\frac{g_n(x)}{g(x)}}\right)^2 dx$$
$$\le 2 \sup_{\{x : g(x) > 0\}} \left(1 + \sqrt{\frac{g_n(x)}{g(x)}}\right)^2 H^2(g_n, g) \le (1 + \alpha_n)\, 8 H^2(g_n, g),$$
where $\alpha_n$ is a vanishing sequence of positive numbers. The second inequality can be shown as follows, using that $\sqrt{1 + u} \ge 1 + \tfrac{1}{2} u - (1 + o(1)) \tfrac{1}{8} u^2$ for $u \to 0$:
$$H^2(g_n, g) = \frac{1}{2} \int_{\{x : g(x) > 0\}} \left(\sqrt{g_n(x)} - \sqrt{g(x)}\right)^2 dx = \int_{\{x : g(x) > 0\}} \left(1 - \sqrt{1 + \frac{g_n(x) - g(x)}{g(x)}}\right) g(x) \, dx$$
$$\le \int_{\{x : g(x) > 0\}} \left( -\frac{g_n(x) - g(x)}{2 g(x)} + (1 + \eta_n) \frac{(g_n(x) - g(x))^2}{8 g(x)^2} \right) g(x) \, dx = (1 + \eta_n) \int_{\{x : g(x) > 0\}} \frac{(g_n(x) - g(x))^2}{8 g(x)} \, dx,$$

where $\eta_n \ge 0$ for all $n$ and $\eta_n \to 0$; in the last equality we used that $\int (g_n - g) \, dx = 0$, both densities integrating to one.

3. Example

Consider the problem of estimating a decreasing density at a fixed point. Denote by $\mathcal{G}$ the class of decreasing density functions on $[0, \infty)$. Fix a decreasing density $g \in \mathcal{G}$ and a point $x_0 > 0$ such that $g$ is differentiable at $x_0$ and $g'(x_0) < 0$. We will determine the behaviour of the function $m$ for small values of $\varepsilon$, where
$$m(\varepsilon) = \sup\{|h(x_0) - g(x_0)| : h \in \mathcal{G} \text{ and } H(h, g) \le \varepsilon\}.$$
This we can do by determining the function $e$ for small values of $\alpha$, where
$$e(\alpha) = \inf\{H(h, g) : h \in \mathcal{G} \text{ and } |h(x_0) - g(x_0)| \ge \alpha\},$$


since $m$ and $e$ are inverses of each other, in the sense that $m(\varepsilon) \ge \alpha \Leftrightarrow e(\alpha) \le \varepsilon$. It is easily seen that for small $\alpha$ the infimum in the definition of $e$ is attained at the function $h_\alpha$, where
$$h_\alpha(x) = \begin{cases} g(x_0) + \alpha & \text{for } x_0 - c_1 \alpha \le x \le x_0, \\ g(x_0) - c_2 \alpha & \text{for } x_0 < x \le x_0 + c_3 \alpha, \\ g(x) & \text{elsewhere}, \end{cases}$$
where $c_1$, $c_2$ and $c_3$ are positive constants depending on $\alpha$ and $g$ such that $h_\alpha$ integrates to one and is continuous at the points $x_0 - c_1 \alpha$ and $x_0 + c_3 \alpha$. We may without loss of generality assume that $c_2 \alpha < 1$ for each small $\alpha$. Approximating $g$ locally near $x_0$ by its linearization at $x_0$, we see that for $\alpha \downarrow 0$, $c_1 \to -1/g'(x_0)$, $c_2 \to 1$ and $c_3 \to -1/g'(x_0)$. Therefore, in view of Lemma 2, we have for $\alpha \downarrow 0$,
$$e(\alpha) \sim \left( \int_0^\infty \frac{(h_\alpha(x) - g(x))^2}{8 g(x)} \, dx \right)^{1/2} \sim (-12\, g(x_0) g'(x_0))^{-1/2} \alpha^{3/2},$$
giving that, for $\varepsilon \downarrow 0$, $m(\varepsilon) \sim ((-12\, g(x_0) g'(x_0))^{1/2} \varepsilon)^{2/3}$. Corollary 2 now gives that for any subclass $\mathcal{G}_g$ of $\mathcal{G}$ containing some Hellinger ball around $g$ of positive radius,
$$\liminf_{n \to \infty} n^{1/3} R_1(n, T, \mathcal{G}_g) \ge \tfrac{1}{4} \left( \tfrac{1}{2} (-12\, g(x_0) g'(x_0))^{1/2} \sqrt{2/3} \right)^{2/3} e^{-1/3} = \tfrac{1}{4} (-2\, g(x_0) g'(x_0))^{1/3} e^{-1/3}. \tag{6}$$
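The simplification of the constant in (6) can be reconciled numerically: with $c = (-12\, g(x_0) g'(x_0))^{1/2}$ and $r = 2/3$, maximizing the bound $\tfrac14 e^{-2\varepsilon^2}(c\varepsilon)^r$ from Corollary 1 over $\varepsilon$ reproduces $\tfrac14 (-2\, g(x_0) g'(x_0))^{1/3} e^{-1/3}$. A sketch, with an arbitrary illustrative value for $-g(x_0) g'(x_0)$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

v = 0.7                      # illustrative value of -g(x0) g'(x0) > 0
c = np.sqrt(12.0 * v)        # constant in m(eps) = (c eps)^{2/3} (1 + o(1))
r = 2.0 / 3.0

# Bound from Corollary 1 combined with the expansion (4)
bound = lambda eps: 0.25 * np.exp(-2.0 * eps ** 2) * (c * eps) ** r
best = minimize_scalar(lambda e: -bound(e), bounds=(1e-6, 5.0), method="bounded")

closed_form = 0.25 * (0.5 * c * np.sqrt(r)) ** r * np.exp(-r / 2.0)  # Corollary 2
simplified = 0.25 * (2.0 * v) ** (1.0 / 3.0) * np.exp(-1.0 / 3.0)    # r.h.s. of (6)
```

The maximizer is $\varepsilon = \sqrt{r}/2$, and `closed_form` and `simplified` agree to machine precision.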

Let $X_1, X_2, \dots$ be a sequence of independent identically distributed random variables with density $g$, as fixed above. Then the evaluation at the point $x_0$ of the maximum likelihood estimator $\hat{g}_n$ of $g$, computed merely knowing $g$ to be decreasing on $[0, \infty)$, has the following asymptotic behaviour (see e.g. Robertson et al., 1988, Theorem 7.2.5):
$$n^{1/3} (\hat{g}_n(x_0) - g(x_0)) \stackrel{D}{\to} 2^{2/3} (-g(x_0) g'(x_0))^{1/3} Z. \tag{7}$$
Here $Z$ is distributed as the location of the maximum of the process $W(t) - t^2$, where $W$ is a standard two-sided Brownian motion on $\mathbb{R}$. Note that the constants at the right-hand sides of (6) and (7) both depend on $g$ only via the quantity $(-g(x_0) g'(x_0))^{1/3}$.

References

Cator, E.A., 2000. Deconvolution with arbitrarily smooth kernels, submitted for publication.
Donoho, D.L., Liu, R.C., 1991. Geometrizing rates of convergence III. Ann. Statist. 19, 668–701.
Fan, J., 1993. Local linear regression smoothers and their minimax efficiencies. Ann. Statist. 21, 196–216.
Groeneboom, P., 1996. Lectures on inverse problems. In: Lectures on Probability Theory and Statistics (Saint-Flour, 1994), Lecture Notes in Mathematics, Vol. 1648. Springer, Berlin, pp. 67–164.
Jongbloed, G., 1995. Three statistical inverse problems. Ph.D. dissertation, Delft University.
Robertson, T., Wright, F.T., Dykstra, R.L., 1988. Order Restricted Statistical Inference. Wiley, New York.
de Wolf, P.-P., 1999. Estimating the extreme value index. Ph.D. dissertation, Delft University.