On the non-consistency of an estimate of Chiu

On the non-consistency of an estimate of Chiu

ELSEVIER Statistics & Probability Letters On the non-consistency 20 (1994) 183-188 of an estimate of Chiu* Luc Devroye McGill University, Sc...

367KB Sizes 0 Downloads 25 Views

ELSEVIER

Statistics

& Probability

Letters

On the non-consistency

20 (1994) 183-188

of an estimate

of Chiu*

Luc Devroye McGill

University,

School qf‘ Computer

Science, 3480 University

Received

1993; revised June

January

Street,

Montreal,

Canada

1993

Abstract

We show that for some densities, a bandwidth selection method of Chiu (1991) for kernel density estimates is not consistent. While the method shows promise for some densities, it should be used with caution. Kq words: Density estimation; Kernel estimate; Convergence; characteristic function; Nonparametric estimation

Smoothing

factor; Bandwidth

selection; Empirical

1. Introduction

The purpose of this note is to give examples of densities for which one of Chiu’s (1991) bandwidth selectors by is not consistent. We consider an i.i.d. sample X, , . . . ,X, drawn from a univariate densityf; and estimatef

fnh(x)= i

,$Kh(X-

1-l

xi),

and h > 0 is the smoothing factor where K is the kernel (a function integrating to one), Kh(x) = (l/h)K(x/h), (Akaike, 1954; Parzen, 1962; Rosenblatt, 1956). The fundamental problem in kernel density estimation is that of the joint choice of h and K in the absence of a priori information regardingf: Theoretical studies going back to Watson and Leadbetter (1963) show that the choice of h and K should not be split into two independent subproblems. Also, the choice of K largely depends upon the smoothness off: Watson and Leadbetter started from Parseval’s identity E

s

(.Mx)

-f(x))‘dx

= &E

s

l(~n,,(t) - q(t)12dt>

where q and ~p,,~are the Fourier transforms offandf,,. Let $ be the Fourier transform of the function K, (note that h is absorbed in this definition). Then the expected L, error given above is minimal for the choice

*The author’s

research

was sponsored

by NSERC

Grant

A3456 and FCAR

0167.7152/94/$7.00 0 1994 Elsevier Science B.V. All rights reserved SSDI 0167.7152(93)E0130-L

Grant

90-ER-0291.

184

L. Devroye/

With this choice, the minimal

expected

Statistics

& Probability

Letters 20 (1994)

183-188

L, error reduces to

IvIZ(l - lv12) i 1 + (n - l)l$?12’ These fundamental results were at the basis of a number of fine results. (a) Davis (1975, 1977) looked at the rate of decrease of the expected L, error for various rates of decrease of cp, when tj(t) = ((th) for a fixed form 5. Here one is not allowed to vary the form of the kernel with n. She looked in particular at the Fourier kernel K(x) = sin(x)/( xx, ) and showed it to be nearly optimal for many densities. (b) Davis (1977) proposed letting h = l/t, where t is the smallest t for which the estimate of the optimal 1~1’ equals l/(n + 1). For the Fourier kernel sin(x)/(nx), this estimate is

I@(t)12 =

5

((

iit1

COS(IXi))2

+ (h

sin(tXi)

,$ I-1

2 >>

- 1 n-

1’

Bloxom (1979) provides some encouraging experimental results with this estimate. (c) Wahba (1981) and Hall and Marron (1988) adaptively estimate parameters of the optimal Fourier transform of the kernel under certain tail conditions on the characteristic function off: This would require knowledge of the underlying class of densities. (d) Cline (1988) uses the Watson-Leadbetter result to point out that the optimal Fourier transform always is symmetric and positive. As the Fourier transform (say, 5) of the Epanechnikov kernel takes negative values, it can always be replaced by t+, its positive truncation, for a strict improvement in the expected L2 error. (e) Cline (1990) gives precise asymptotic analysis of the expected L2 error based upon the Fourier transform approach. Finally, based upon recent developments in data-based bandwidth selection, several methods have been proposed for picking the bandwidth that have their origin in the expressions given by Watson and Leadbetter. Chiu (1991) has a plug-in-method that is based upon the empirical characteristic function q,(t) =

k,$eitxj. J-1

Let n be the smallest positive t such that jcp,,(t)12 d 3/n (where the constant recommends any constant > 1). Then use

h=

3 is a design parameter;

Chiu

(Cn(J-ff:K)2)llS.

where A

cc1n: o t4(ldt)12 s

This is related

CAn

- 1/4dt.

to a plug-in-method

s1

t4(Ivn(t)12

suggested

by Park and Marron

(1990), in which C is taken as

- l14ti2(h’4dc

and h’ = C’s3”3h’0”3 is a pilot bandwidth, C’ is a function off(which in turn is estimated by the reference density approach) and s is a given measure of the scale off: This yields an implicit equation in h, which must be solved. The nonconsistency dealt with in this paper is due to a problem that is endemic in most L,-based cross-validation methods including such ‘solve-an-equation’ schemes.

185

L. Devroye / Statistics & Probability Letters 20 (1994) 183-188

2. An inequality

for empirical characteristic

functions

The literature on empirical characteristic functions contains many strong results (Csiirgii, 1981, Keller, 1988), but none of these really fits our needs, as we require an inequality for p

sup

i II/< r

1981; Marcus,

Iv(t) - cp,(r)l> B 2 I

where /? and c(depend upon n in an arbitrary fashion. For fi near l/J%, the results of Csiirg6 (1981) are useful. For p fixed, we are in large deviation territory (Keller, 1988). We believe that the following inequality is of independent general utility. Theorem 1. Let X be a random variable with characteristicfunction cpandfinite$rst moment, and let q,, be the empirical characteristic function based upon an i.i.d. sample of size n drawn from X. Then, for a > 0, p > 0 possibly dependent upon n,

where the o(1) term is uniform over all 01and /3. Proof. Define

P y = 4EIXI’ We find numbers tl < t2 < ... < tk with the property assure this with k < 1 + 2a/y. We begin with p

sup Iv(t) - cpn(t)l > a d p sup ldt) I i jr - 31< ; Ifl< 1

+ p

that tl = - cc,tk = a, 1ti - ti+ 116 y. Clearly,

- cp@)l ’ fB

sup l%(t) II- .VI <)

we can

I

- (Pn(s)I >38

+ iiI pi lVtti) - Vntti)l > 3P) “2 I + II + III. Note that Iv(t) -

&)I

d 41 - ei(*ms)x)d El(t - s)Xl < YE/X) d @,

when 1t - SI < y. Therefore, I = 0. Next, we let Y be the random Xi’s in the sample drawn from X. Then Iv,,(t) - q,,(s)I G El1 - ei”-“‘YI <

El(t - s)YI

=)t-Sl

I I Xi.

t,i;

I-1

variable

that puts mass l/n at each of the

186

L. Devroyej Statistics & Probability Letters 20 (1994) 183-188

Therefore, ll~p{Y~~jlxi~

a:8}

by the law of large numbers. Finally, cp. Let [, and qn be the corresponding fixed ti, p{

lV(ti)

-

VnCti)I



4P)

by Hoeffding’s inequality Theorem 1. 0

G

p(

<

4e-n82/72,

Ii

we let [ and q be the real and imaginary parts of the Fourier transform empirical functions. For example, i,,(t) = (l/n)Cy= i COS(tXi). Then, for

-

for bounded

in(

>

random

&P}

+

p{

variables

Irl(ti)

-

Yln(ti)l

(Hoeffding,

>

bP)

1963). This concludes

the proof

of

3. The main result In the Theorem below, we describe a simple class of densities attempt was made to obtain a general result.

for which Chiu’s method

is nonconsistent.

No

Theorem 2. Let f be a density with jinite first moment, and with real unimodal characteristic function cp satisfying q(t) - t-’ as t + 00, where c < f is ajixed constant. Then, if H is the bandwidth choice for Chiu’s method, we have 1iminfE n-r=

lfnH-fl >O. s

An example. The condition of the theorem is satisfied for the random variable X = Z - Z’, where Z and Z’ are i.i.d. gamma random variables with parameter a < 4 (note that cp(t) = l/(1 + t’)“). Other modes of convergence. We cannot give an L, version of the Theorem, as s f2 = co for the densities under consideration. It should come as no surprise that it is precisely for these densities that problems occur, as the design of the method is L,-based. This raises the interesting question of whether we should test for the finiteness of 1f 2 before applying an L,-based bandwidth selector. Nevertheless, fnHis undesirable by any standard as we will show that nH -+ 0 in probability, so that we will not even have pointwise convergence at any point. Other bandwidth selectors. Chiu’s bandwidth selector shares is anomalous behavior with most of the L2 cross-validation criteria. For example, a similar nonconsistency was pointed out in Devroye (1989) for the original L2 cross-validation method (Bowman, 1974; Rudemo, 1974). For a survey of other methods in this class, and for some fixes, see for example Jones and Kappenman (1992) or Marron (1988,1989). Also, we have not considered Chiu’s stabilized method or one of its modifications (Chiu, 1992). Proof. If the constant C in Chiu’s method is such that C/n4 + 00 in probability, then nH -+ 0 in probability as well. By necessary conditions for consistency (Devroye and Gyiirfi, 1985) this implies that for any kernel, 1iminfE n-rm

\fnH--fl >O. s

187

L. Devroye/ Statistics & Probability Letters 20 (1994) 183-188

Define a constant 1

z with

4C

5 - 2c

tPZ’-

Define cx as the solution

of

cp(t) = n-=. Note in particular

that as n -+ 00, CI- n”‘. Define

fi = f inf Iv(t)] = $cp(a) = in-‘. 111 <1 Let A, be the event that sup Iv(t) - cp,(r)l d B. Ifl< 1 If A, holds, then for t E [ - c(,x], IMr)l

3 Iv(r)1 - B > inf Iv(t)] - P > &a-‘. IfI
For n large enough, and 1tl d CI,q,(t) l/n d $cp’(t), we see that C 3 i

:t4(lq.(a)l’

- l/n)dt

:i4($la(z)12

- l/n)dt

> J3/ n, and thus /i > 2. As on C-M, a], under A,, q,(t)

2 )cp(t) and

s 3 k s

N5-2c -

8(5 - 2c)n n(z/c)(5/c)

- 8(5 - 2c)rc’ By our choice of z and c, the exponent P{A,} + 1, then for any constant M,

of n on the right-hand

side is greater

than 4. We conclude

that if

lim P{C > Mn4} = 1, n+30 or, equivalently, lim P{nH< n-cc

for any constant

E > 0,

E} = 1.

To prove that P{A,) + 1, invoke Theorem 1. The constants a and /3 there are the same ones we introduced in the proof of Theorem 2. Note that B = in-’ with z < f, and that LX/Pgrows polynomially with n. Thus, P{A,} -+ 1 as required. 0

188

L. Devroye/

Statistics

& Probability

Letters 20 (1994)

183-188

Acknowledgment I would like to thank

the referee.

References Akaike, H. (1954), An approximation to the density function, Ann. Inst. Statist. Math. 6, 127-132. Bloxom, B. (1979) A Fourier integral density estimate: a Monte Carlo study, Comm. Statist. B 8, 391-396. Bowman, A.W. (1984), An alternative method of cross-validation for the smoothing of density estimates, Biometrika 71, 353-360. Chiu, S.-T. (1991), Bandwidth selection for kernel density estimation, Ann. Statist. 19, 1883-1905. Chiu, S.-T. (1992), An automatic bandwidth selector for kernel density estimate, Biometrika 79, 771-782. Cline, D.B.H. (1988), Admissible kernel estimators of a multivariate density, Ann. Statist. 16, 1421-1427. Cline, D.B.H. (1990), Optimal kernel estimation of densities, Ann. Inst. Smtist. Math. 42, 287-303. Csiirgii, S. (1981), Limit behavior of the empirical characteristic function, Ann. Probab. 9, 130-144. Davis, K.B. (1975), Mean square error properties of density estimates, Ann. Statist. 5, 1025-1030. Davis, K.B. (1977), Mean integrated square error properties of density estimates, Ann. Statist. 5, 530-535. Devroye, L. (1989), On the non-consistency of the L2 cross-validated kernel density estimate, Stat&. Probab. Lett. 8, 425-433. Devroye, L. and L. Gy&fi (1985). Nonparametric Densiry Estimarion: The Lf View (Wiley, New York). Hall, P. and J.S. Marron (1988), Choice of kernel order in density estimation, Ann. Statist. 16, 161- 173. Hoeffding, W. (1963), Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. 58, 13-30. Jones, M.C. and R.F. Kappenman (1992), On a class of kernel density estimate bandwidth selectors, Stand. J. Statist. 19, 337-349. Keller, H.-D. (1988), Large deviations of the empirical characteristic function, Acta Sci. Math. Hunyar. 52, 207-214. Marcus, M.B. (1981), Weak convergence of the empirical characteristic function, Arm. Probab. 9, 194-201. Marron, J.S. (1988). Automatic smoothing parameter selection: a survey, Empirical Econom. 13, 187-208. Marron, J.S. (1989), Automatic smoothing parameter selection: a survey, in: A. Ullah, ed., Semiparametric and Nonparametric Economics, (Heidelberg) pp. 65-86. Park, B.U. and J.S. Marron (1990), Comparison of data-driven bandwidth selectors, J. Amer. Statist. Assoc. 85, 66-72. Parzen, E. (1962), On the estimation of a probability density function and the mode, Ann. Math. Statist. 33, 1065-1076. Rosenblatt, M. (1956), Remarks on some nonparametric estimates of a density function, Ann. Marh. Statist. 27, 832-837. Rudemo, M. (1982), Empirical choice of histograms and kernel density estimators, Stand. J. Statist. 9, 65-78. Wahba, G. (1981), Data-based optimal smoothing of orthogonal series density estimates, Ann. Suztist. 9, 146-156. Watson, G.S. and M.R. Leadbetter (1963), On the estimation of the probability density, Ann. Math. Statist. 34, 480-491.