
Statistics & Probability Letters 14 (1992) 141-152
North-Holland

27 May 1992

Bump hunting in regression analysis

Nancy E. Heckman
Department of Statistics, University of British Columbia, Vancouver, BC, Canada

Received March 1990
Revised August 1991

Correspondence to: Nancy E. Heckman, Dept. of Statistics, University of British Columbia, 2021 West Mall, Vancouver, BC, Canada V6T 1Z2. Research supported by the Natural Sciences and Engineering Research Council of Canada under grant 5-87969.

Abstract: Computer techniques for scatter-plot smoothing are valuable exploratory analysis tools. However, one must exercise caution when making inferences based on features of the smoothed data. Here, the size and number of observed modes are studied: the asymptotic distributions are determined under the assumption of constancy of the true regression function, and sufficient conditions for consistency of the estimated number of modes and their locations are given in the general case.

Keywords: Mode, bump hunting, Poisson approximation.

1. Introduction

Computer-intensive regression techniques such as spline and kernel smoothing are valuable exploratory data analysis tools for the applied researcher. However, the practitioner must often guess which features of the smoothed data are 'real' as opposed to those which are mere chance variations. In addition, the features of interest are greatly influenced by the amount of smoothing.

A bump is a feature which can be of enormous significance. For instance, in evolutionary biology, if the probability of survival of an individual of a species is largest at a particular body weight, then stabilizing selection occurs, with individuals at that best body weight being favoured in the evolutionary process. The existence of the bump would then be of primary importance, not its location or size. A commonly used technique to determine the presence of stabilizing selection is parabolic logistic regression. But this technique has limitations, since a data cloud with a bump can often be better fit by a line than by a parabola. Recently, Schluter (1988) used a non-parametric form of logistic regression called GLIM splining to study the shape of probability survival curves. The resulting data displays often exhibited bumps which were not detectable by standard analyses. How does one know if these are indeed bumps, or merely random fluctuations which can be expected in any data set? This question underlies the analysis in this paper.

One might decide that a bump has occurred if the estimates of the regression function at specified values of the independent variable rise and then fall over a range of the independent variable. Specifically, let $\hat Y_j$, $j = 1, \ldots, N$, represent these estimates. (Typically, N will be much smaller than the sample size, to insure that we do not search for a bump over too small a range of the independent variable.)

A 'bump' of size k (or a 'k-bump') at the jth estimate, $k < j \le N - k$, occurs if $\hat Y_{j-k} < \cdots < \hat Y_{j-1} < \hat Y_j$ and $\hat Y_j > \hat Y_{j+1} > \cdots > \hat Y_{j+k}$. The value of k must be at least 1 (in which case a bump involves 3 $\hat Y_j$ values) and $N > 2k$.

The purpose of this paper is to study the asymptotic distribution of $S_N^k$, the number of k-bumps, and $K_N$, the maximum bump size observed. These results can be used in hypothesis tests which reject that the regression function is monotone if either $S_N^k > 0$ or, equivalently, if $K_N \ge k$. The results also give some indication of how large a bump should be before it can be considered 'real'. In addition, $S_N^k$ is a consistent estimate of the number of modes of the regression function, under conditions on the regression function, k, N, the sample size, and the amount of smoothing.

Assumptions and notation are given in Section 2. Section 3 contains the derivation of the asymptotic distribution of $S_N^k$ and $K_N$ in the simple case in which the true regression function is a constant. In particular, it is shown that $P\{S_N^k > 0\}$ converges to zero if $k, N \to \infty$, and converges to 1 if $N \to \infty$ with k fixed. In Section 4, bounds on $P\{S_N^k > 0\} = P\{K_N \ge k\}$ are obtained assuming that the regression function is monotone. These bounds imply that $P\{S_N^k > 0\}$ converges to zero if $N/(k+1)! \to 0$. $S_N^k$ is shown to be a consistent estimator of the number of local maxima of a general regression function. Section 5 contains simulation results, which confirm that 'real' bumps must be of a substantial size.

Several authors have studied the bump-hunting problem in density estimation. Good and Gaskins (1980) propose an ad hoc method for indexing the 'believability' of a bump. Silverman (1981, 1983) proposes a clever test statistic which exploits the sensitivity of the number of modes to the amount of normal kernel smoothing done. Although his test statistic appears to be sensible for other kernels and for regression problems, his proof of the test's consistency depends heavily upon the use of a normal kernel and the application to density estimation, and thus may not easily extend to the regression model considered here. Muller (1985) proposes estimating the height and location of a mode of a regression function by the size and location of a mode of a kernel estimate of the regression function, but he does not consider multi-modal functions, nor does he discuss the behavior of his estimates when the regression function is monotone. In addition, Muller's conditions on the regression function are more restrictive than those considered here. He assumes that the regression function is twice continuously differentiable. Here, this condition is replaced by weaker ones that insure that a local maximum is 'sufficiently visible' (see Section 4). In particular, the regression function need not be continuous. However, Muller derives the asymptotic distribution of the location of a mode, while Theorem 4.2 below states only that the estimated locations of the bumps converge to the true locations.
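To fix ideas, here is a minimal Python sketch of these statistics (my illustration, not code from the paper): it counts the k-bumps in a sequence of smoothed estimates and finds the maximum observed bump size $K_N$.

```python
import numpy as np

def count_k_bumps(y_hat, k):
    """S_N^k: the number of k-bumps among the smoothed estimates.

    A k-bump occurs at index j (0-based) when
    y[j-k] < ... < y[j-1] < y[j] and y[j] > y[j+1] > ... > y[j+k].
    """
    y = np.asarray(y_hat, dtype=float)
    S = 0
    for j in range(k, len(y) - k):
        rising = all(y[j - k + i] < y[j - k + i + 1] for i in range(k))
        falling = all(y[j + i] > y[j + i + 1] for i in range(k))
        S += rising and falling
    return S

def max_bump_size(y_hat):
    """K_N: the largest k with at least one k-bump (0 if none).

    A k-bump is also a (k-1)-bump, so the first k with no k-bump stops the search.
    """
    K, k = 0, 1
    while 2 * k < len(y_hat) and count_k_bumps(y_hat, k) > 0:
        K, k = k, k + 1
    return K
```

Rejecting monotonicity when $S_N^k > 0$ is then the same as rejecting when $K_N \ge k$, which is how the tests above are stated.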

2. Notation

Suppose that $(t_i, Y_i)$, $i = 1, \ldots, n$, are raw data, with $a \le t_1 \le \cdots \le t_n \le b$ and $Y_i = f(t_i) + \varepsilon_i$, the $\varepsilon_i$'s independent and identically distributed with mean zero. The smoothed estimates at the points $\hat t_1 < \cdots < \hat t_N$ are

$$\hat Y_j = \hat f(\hat t_j) = \frac{\sum_{i=1}^{n} w\big((\hat t_j - t_i)/h\big)\,Y_i}{\sum_{i=1}^{n} w\big((\hat t_j - t_i)/h\big)},$$
where w is non-negative, with support contained in the interval $[-\frac12, \frac12]$. Typically, N is much smaller than n; the existence of a bump is inferred from the $\hat f(\hat t_j)$'s rather than from the $\hat f(t_i)$'s, which are more likely to exhibit a spurious bump. For ease of calculations, the following assumption is made to insure that the $\hat Y_j - E(\hat Y_j)$'s are independent and identically distributed. Assume that $t_i = i/(n+1)$, $\hat t_j = j/(N+1)$,

$$m = m_n = \frac{n+1}{N+1} \qquad (1)$$

is an integer, and $h < 1/(N+1)$. Thus, $\hat Y_j$ is a weighted average of $Y_i$'s with $(j - \frac12)m < i < (j + \frac12)m$.

Let $I_j^k = 1$ if $\hat Y_{j-k} < \hat Y_{j-k+1} < \cdots < \hat Y_j$ and $\hat Y_j > \hat Y_{j+1} > \cdots > \hat Y_{j+k}$, and $I_j^k = 0$ else. Thus, a k-bump has occurred at $\hat t_j$ if and only if $I_j^k = 1$. $S_N^k$, the number of k-bumps, is $I_{k+1}^k + \cdots + I_{N-k}^k$ and, if $K_N$ is the maximum bump size observed,

$$P\{K_N \ge k\} = P\{S_N^k \ne 0\}.$$

Thus, the distribution of the number of bumps of size k determines the distribution of $K_N$, and thus helps determine what size bump is 'real'. Note that $I_j^k$ and $I_l^k$ are independent, provided $|j - l| > 2k$, and that $I_j^k I_l^k = 0$ for $0 < |j - l| < 2k$. If $p_j^{(k)} = E(I_j^k)$, then

$$E(S_N^k) = \sum_{j=k+1}^{N-k} p_j^{(k)}$$

and

$$\mathrm{var}(S_N^k) = \sum_{j=k+1}^{N-k} p_j^{(k)}\big(1 - p_j^{(k)}\big) - 2\sum_{j=k+1}^{N-k-1}\,\sum_{l=1}^{2k} p_j^{(k)}\,p_{j+l}^{(k)} + 2\sum_{j=k+1}^{N-3k} E\big(I_j^k I_{j+2k}^k\big).$$
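To make the construction concrete, here is a small Python sketch of the estimator on the equally spaced design of (1) (my illustration; the uniform kernel below is just one choice satisfying the support condition):

```python
import numpy as np

def smooth_on_grid(y, N, h=None):
    """Kernel estimates Y_hat_j = f_hat(j/(N+1)) from raw data y_1, ..., y_n.

    Uses t_i = i/(n+1), t_hat_j = j/(N+1) and the uniform kernel
    w(x) = 1{|x| <= 1/2}. Taking h < 1/(N+1) keeps the averaging
    windows of distinct j disjoint, so the Y_hat_j's are independent.
    """
    n = len(y)
    if (n + 1) % (N + 1) != 0:
        raise ValueError("m = (n+1)/(N+1) must be an integer")
    if h is None:
        h = 0.999 / (N + 1)
    t = np.arange(1, n + 1) / (n + 1)
    y_hat = np.empty(N)
    for j in range(1, N + 1):
        w = (np.abs((j / (N + 1) - t) / h) <= 0.5).astype(float)
        y_hat[j - 1] = w @ y / w.sum()  # weighted average of the y_i's near t_hat_j
    return y_hat
```

Each $\hat Y_j$ averages only the $y_i$'s with $(j-\frac12)m < i < (j+\frac12)m$, which is exactly the independence structure used in the proofs below.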
The results of Sections 3 and 4 may hold under different constructions of the $\hat Y_j$'s. The proofs of the theorems use the independence of the $\hat Y_j$'s and an analysis of the $E(\hat Y_j)$'s. For instance, if f is monotone, as in Theorem 4.1, then for the construction here $E(\hat Y_1) \le E(\hat Y_2) \le \cdots \le E(\hat Y_N)$. This would certainly hold for many other estimates of f. The weakening of the independence structure is more delicate, as it complicates the analysis of

$$P\{\tilde\varepsilon_{j-k} < \cdots < \tilde\varepsilon_{j-1} < \tilde\varepsilon_j,\ \tilde\varepsilon_j > \tilde\varepsilon_{j+1} > \cdots > \tilde\varepsilon_{j+k}\},$$

where $\tilde\varepsilon_j = \hat Y_j - E(\hat Y_j)$. If this probability is sufficiently close to the corresponding probability in the independent case (e.g. the difference is o(N/k)), the results should hold.
3. The case of a constant regression function

If f is constant, then the $\hat Y_j$'s are independent and identically distributed and every ordering of the $\hat Y_j$'s is equally likely. Thus, the distribution of $S_N^k$ does not depend on w or h, provided that the $\hat Y_j$'s are independent; nor does it depend upon the error distribution, provided only that it is continuous. Thus, the results of this section actually apply not only to the $\hat Y_j$'s, but also to the original data, where $N = n$, $m = 1$, and $\hat Y_i = Y_i$. In this case $p^{(k)} = p_j^{(k)}$ and $C^{(k)} = C_j^{(k)} = E(I_j^k I_{j+2k}^k)$ do not depend on j, yielding

$$E(S_N^k) = (N - 2k)\,p^{(k)}$$

and

$$\mathrm{var}(S_N^k) = (N - 2k)\,p^{(k)} - \big(p^{(k)}\big)^2\big(N(4k+1) - 4k - 12k^2\big) + 2(N - 4k)\,C^{(k)}.$$
Lemma 3.1.

$$p^{(k)} = (2k+1)^{-1}(k!)^{-2},$$

$$C^{(k)} = E\big(I_j^k I_{j+2k}^k\big) = \int_0^1 \bigg[\int_u^1 \frac{s^k (s-u)^{k-1}}{k!\,(k-1)!}\,ds\bigg]^2 du = \sum_{j,l=0}^{k} \frac{(-1)^{j+l}}{(k-j)!\,(k-l)!\,(k+j)!\,(k+l)!\,(2k+j+l+1)} \le (2k+1)\big(p^{(k)}\big)^2.$$

Proof. The superscript k will be omitted. Since the distribution of the $I_j$'s doesn't depend upon the distribution of the $\hat Y_j$'s, assume without loss of generality that the $\hat Y_j$'s are uniformly distributed on the unit interval. Let

$$f(a, b, l) = P\{a < \hat Y_1 < \hat Y_2 < \cdots < \hat Y_l < b\}.$$

It can be shown by induction on l that $f(a, b, l) = (b-a)^l/l!$, and thus

$$h(u) \equiv P\{\hat Y_1 < \cdots < \hat Y_k < \hat Y_{k+1},\ \hat Y_{k+1} > \hat Y_{k+2} > \cdots > \hat Y_{2k} > u\} = \int_{y_{k+1}=u}^{1} \frac{y_{k+1}^k\,(y_{k+1} - u)^{k-1}}{k!\,(k-1)!}\,dy_{k+1}.$$

Hence,

$$p = \int_0^1 h(u)\,du = (2k+1)^{-1}(k!)^{-2},$$

and

$$E\big(I_j I_{j+2k}\big) = \int_0^1 h^2(u)\,du,$$

by the independence and exchangeability of the $\hat Y_j$'s. By a binomial expansion,

$$h(u) = \int_{y=u}^1 \sum_{l=0}^{k} \binom{k}{l} \frac{(-1)^l (1-y)^l (y-u)^{k-1}}{k!\,(k-1)!}\,dy = \sum_{l=0}^{k} \binom{k}{l} \frac{(-1)^l}{k!\,(k-1)!}\,\frac{\Gamma(l+1)\Gamma(k)}{\Gamma(l+k+1)}\,(1-u)^{l+k} = \sum_{l=0}^{k} \frac{(-1)^l (1-u)^{l+k}}{(k-l)!\,(k+l)!}.$$

The summation expression for C follows easily by term by term integration of $h^2$. Since $y^k \le 1$, $h(u) \le (1-u)^k (k!)^{-2}$, and integration yields that $C \le (2k+1)\big(p^{(k)}\big)^2$. □
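The quantities in Lemma 3.1 are easy to evaluate numerically; the following sketch (my illustration) computes $p^{(k)}$, the double sum for $C^{(k)}$, and the moments of $S_N^k$ given earlier in this section.

```python
from math import factorial

def p_k(k):
    """p^(k) = P{k-bump at a fixed j} for a constant regression function."""
    return 1.0 / ((2 * k + 1) * factorial(k) ** 2)

def C_k(k):
    """C^(k) = E(I_j I_{j+2k}), via the double sum of Lemma 3.1."""
    return sum(
        (-1) ** (j + l)
        / (factorial(k - j) * factorial(k - l)
           * factorial(k + j) * factorial(k + l) * (2 * k + j + l + 1))
        for j in range(k + 1) for l in range(k + 1)
    )

def mean_var_S(N, k):
    """E(S_N^k) and var(S_N^k) when f is constant."""
    p, C = p_k(k), C_k(k)
    mean = (N - 2 * k) * p
    var = (mean - p ** 2 * (N * (4 * k + 1) - 4 * k - 12 * k ** 2)
           + 2 * (N - 4 * k) * C)
    return mean, var

# e.g. C^(1) = 2/15, the proportion of up-down-up-down orderings of 5 values,
# and the bound C^(k) <= (2k+1) (p^(k))^2 of the lemma holds:
for k in (1, 2, 3, 4):
    assert C_k(k) <= (2 * k + 1) * p_k(k) ** 2 + 1e-15
print(mean_var_S(100, 3))
```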
For the remainder of the section, $Z_N$ will denote a Poisson random variable with parameter

$$\lambda_N = (N - 2k)\,p^{(k)}.$$

Theorem 3.1.

$$\sup_A \big|P\{S_N^k \in A\} - P\{Z_N \in A\}\big| \le 2p^{(k)}(4k+1)\big(1 - e^{-\lambda_N}\big), \qquad P\{S_N^k \ne 0\} \le (N - 2k)\,p^{(k)}.$$

Proof. The first statement follows immediately from Lemma 3.1 and an application of Theorem 1 of Arratia, Goldstein, and Gordon (1988). The second statement of the theorem holds since

$$P\{S_N^k \ne 0\} = P\{\text{bump at } \hat t_{k+1} \text{ or at } \ldots \text{ or at } \hat t_{N-k}\} \le (N - 2k)\,p^{(k)}. \qquad \square$$
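Theorem 3.1's Poisson approximation is easy to probe by simulation. Under a constant f the $\hat Y_j$'s may be taken i.i.d. uniform, so a sketch like the following (my illustration) suffices:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
N, k, runs = 100, 3, 5000
p = 1.0 / ((2 * k + 1) * factorial(k) ** 2)
lam = (N - 2 * k) * p                      # lambda_N of Theorem 3.1

def count_k_bumps(y, k):
    return sum(
        all(y[j - k + i] < y[j - k + i + 1] for i in range(k)) and
        all(y[j + i] > y[j + i + 1] for i in range(k))
        for j in range(k, len(y) - k)
    )

S = np.array([count_k_bumps(rng.uniform(size=N), k) for _ in range(runs)])
print("P{S = 0} estimate:", np.mean(S == 0), " exp(-lambda_N):", np.exp(-lam))
```

For N = 100 and k = 3, $\lambda_N \approx 0.373$, and the simulated $P\{S_N^k = 0\}$ should fall within the theorem's total variation bound (about 0.03 here) of $e^{-\lambda_N} \approx 0.69$.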
The following theorem and corollaries illustrate the danger of not allowing the size of the bump to increase adequately with the sample size. If k, the bump size, does not increase with N, then we will 'see' more and more bumps as the sample size grows, even when the regression function is constant. This is intuitive: if k is fixed, as we gather more data, we will be searching for a bump in too small a range of the independent variable. However, the probability of seeing a bump of size k will converge to zero, even if k approaches infinity slowly (see Corollary 3.2).

Theorem 3.2. Suppose that k is fixed. Then $\big(S_N^k - E(S_N^k)\big)/\sqrt{\mathrm{var}(S_N^k)}$ is asymptotically standard normal.

Proof. Using the k-dependent form of the Central Limit Theorem (Theorem 7.3.1 of Chung, 1974), it suffices to show that $N^{-2/3}\,\mathrm{var}(S_N^k)$ converges to infinity. Since $C \ge 0$,

$$\mathrm{var}(S_N^k) \ge Np\big(1 - p(4k+1) + o(1)\big) = O(N) \quad \text{for } k \ge 2.$$

The last equality follows from the fact that $p(4k+1)$ is a decreasing function of k for $k \ge 1$ which equals 9/20 when k = 2. For k = 1, direct computation using Lemma 3.1 yields $\mathrm{var}(S_N^1) = O(N)$. □

Corollary 3.1. If $Np^{(k)}$ approaches infinity, then $S_N^k$ is asymptotically normal with mean and variance equal to $Np^{(k)}$. If $Np^{(k)}$ approaches $\lambda$, finite and non-zero, then $S_N^k$ is asymptotically Poisson with parameter $\lambda$. If $Np^{(k)}$ approaches zero, then $S_N^k$ converges to zero in probability. □

Corollary 3.2. If k is fixed, $P\{K_N < k\} \to 0$. If $k \to \infty$ then

$$\big|P\{K_N < k\} - \exp(-\lambda_N)\big| \le 2p^{(k)}(4k+1),$$

which converges to zero. □
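Read together with Theorem 3.1, Corollary 3.2 gives a practical cutoff: a k-bump is surprising under a constant regression function once $\lambda_N = (N-2k)p^{(k)} \le \alpha$, since then $P\{S_N^k \ne 0\} \le \lambda_N \le \alpha$. A small sketch (my illustration) of the resulting minimal 'credible' bump size:

```python
from math import factorial

def smallest_credible_k(N, alpha=0.05):
    """Smallest k with lambda_N = (N - 2k) p^(k) <= alpha."""
    k = 1
    while 2 * k < N:
        lam = (N - 2 * k) / ((2 * k + 1) * factorial(k) ** 2)
        if lam <= alpha:
            return k
        k += 1
    return None

for N in (10, 50, 100, 1000):
    print(N, smallest_credible_k(N))   # -> 3, 4, 4, 5
```

The required size grows extremely slowly with N, reflecting the $(k!)^2$ in $p^{(k)}$.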
4. The case of a general regression function

The consistency of $S_N^k$ is stated in Theorem 4.1 (assuming the regression function is non-decreasing) and in Theorem 4.2, for the general regression function. The proofs of the theorems are given at the end of this section.

Theorem 4.1. Suppose that f is non-decreasing. Then

$$P\{S_N^k \ne 0\} \le \frac{N - 2k}{(k+1)!}.$$

In the general regression function case, suppose that f has exactly l local maxima at $a < b_1 < \cdots < b_l < b$ and that there exist $c > 0$ and $\delta > 0$ such that

$$f(t) - f(s) \ge c(t - s)^2$$

whenever either $b_i - \delta \le s < t \le b_i$ or $b_i \le t < s \le b_i + \delta$. This condition is satisfied, for example, if f is twice continuously differentiable with $f''(b_i) < 0$ for each i. It is also satisfied for some non-continuous f. The regression function must satisfy monotonicity properties on subintervals of [a, b]. Specifically, assume that there exists a finite number of closed intervals, on each of which f is either non-decreasing or non-increasing. To understand the necessity of this assumption, consider f defined on [0, 1] with $f(x) = x$ for $x < \frac12$, $f(\frac12) = 1$, and $f(x) = 2 - x$ for $x > \frac12$. Then f has no local maxima, but most smoothing methods would produce an estimate of f with a local maximum at x near $\frac12$. Assume that there exists a $\Delta > 0$ such that for each i, either

$$\inf_{x \in [0,\Delta]} \big(f(b_i + x) - f(b_i - x)\big) \ge 0 \quad \text{or} \quad \inf_{x \in [0,\Delta]} \big(f(b_i - x) - f(b_i + x)\big) \ge 0.$$

Suppose that the kernel w, and $w^2$, are Riemann integrable, symmetric about zero, uniformly continuous on $[-\frac12, \frac12]$, and bounded away from zero. These assumptions appear to be an artifact of the method of proof. Fortunately they are mild, being satisfied by commonly used kernels and by most well-behaved functions.

Theorem 4.2. Assume that the $\varepsilon_i$'s have variance $\sigma^2$ and let

$$\mu_2 = \frac{\big[\int_{-1/2}^{1/2} w^2(x)\,dx\big]^{1/2}}{\int_{-1/2}^{1/2} w(x)\,dx}.$$

Suppose that

$$\frac{k}{N} \to 0, \qquad \frac{N}{(k+1)!} \to 0, \qquad (N+1)h_N \to h^* \in (0, 1],$$

and that f satisfies the following Lipschitz condition: there exist $\alpha > 0$, $\beta > 0$ such that

$$|f(b_i) - f(x)| \le \beta\,|b_i - x|^{\alpha}$$

for all x in a neighbourhood of $b_i$ (since, by assumption, f is increasing on $[b_i - \delta, b_i]$ and decreasing on $[b_i, b_i + \delta]$ with a maximum at $b_i$, this condition is always satisfied for $\alpha = 0$). Let m be as in (1) and c as in the quadratic condition above. If there exists $\gamma > 0$ such that

$$\bigg[1 - \frac{4\mu_2^2\sigma^2(1+\gamma)}{h^* c^2}\,\frac{n^4}{m^5}\bigg]^k \to 1, \qquad (2)$$

then

$$P\{\text{there exist exactly } l \text{ bumps of size } k\} \to 1,$$

provided that $n^{12}/m^{13+\alpha} \to 0$. Moreover, the probability that the l bumps occur at $\hat t_j$'s that are within $3/(2(N+1))$ of the $b_i$'s converges to one. If we assume that the $\varepsilon_i$'s are normally distributed, then (2) can be replaced by the weaker condition:

$$\bigg[1 - 2\Phi\bigg(\frac{-c\,h^{*1/2}\,m^{5/2}}{2\mu_2\sigma n^2(1+\gamma)}\bigg)\bigg]^k \to 1, \qquad (2')$$

where $\Phi$ denotes the cumulative distribution function of a standard normal.

Note that conditions (2) and (2') imply that $m^{5/2}/n^2 \to \infty$ and thus $n^{12}/m^{13+\alpha} \to 0$ for $\alpha \ge 2$.

The proofs of both theorems require analysis of $\tilde\varepsilon_j = \hat Y_j - E(\hat Y_j)$ and of $E(\hat Y_j - \hat Y_{j+1})$.

$$\tilde\varepsilon_j = \frac{\sum_{i=1}^{n} w\big(h^{-1}(\hat t_j - t_i)\big)\,\varepsilon_i}{\sum_{i=1}^{n} w\big(h^{-1}(\hat t_j - t_i)\big)}.$$

Therefore, the $\tilde\varepsilon_j$'s are independent and identically distributed with

$$\mathrm{var}(\tilde\varepsilon_j) = \sigma^2\,\frac{\sum_{(j-1/2)m < i < (j+1/2)m} w^2\big(h^{-1}(\hat t_j - t_i)\big)}{\Big[\sum_{(j-1/2)m < i < (j+1/2)m} w\big(h^{-1}(\hat t_j - t_i)\big)\Big]^2} \approx \sigma^2\mu_2^2\,(h^* m)^{-1}. \qquad (3)$$

Similar calculations yield

$$E(\hat Y_j) = \frac{\sum_{(j-1/2)m < i < (j+1/2)m} w\big(h^{-1}(\hat t_j - t_i)\big)\,f(t_i)}{\sum_{(j-1/2)m < i < (j+1/2)m} w\big(h^{-1}(\hat t_j - t_i)\big)}. \qquad (4)$$
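Equation (3) can be checked directly; the sketch below (my illustration, using an arbitrary truncated-parabola kernel that meets the stated conditions) compares the exact variance factor with the approximation $\sigma^2\mu_2^2(h^*m)^{-1}$.

```python
import numpy as np

def var_factor(n, N, hstar=1.0, sigma=1.0):
    """Exact var(eps_tilde_j) = sigma^2 sum(w^2) / (sum w)^2 versus the
    approximation sigma^2 mu2^2 / (hstar m) of (3), for w(x) = 1 - x^2
    supported on [-1/2, 1/2]."""
    m = (n + 1) // (N + 1)
    h = hstar / (N + 1)
    t = np.arange(1, n + 1) / (n + 1)
    x = ((N // 2) / (N + 1) - t) / h            # a central t_hat_j
    w = np.where(np.abs(x) <= 0.5, 1.0 - x ** 2, 0.0)
    exact = sigma ** 2 * (w ** 2).sum() / w.sum() ** 2
    # for this kernel: int w = 11/12 and int w^2 = 203/240 on [-1/2, 1/2]
    mu2_sq = (203 / 240) / (11 / 12) ** 2
    return exact, sigma ** 2 * mu2_sq / (hstar * m)

print(var_factor(n=10099, N=100))               # the two values nearly agree
```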
Proof of Theorem 4.1. Equation (4) and f non-decreasing imply that $E(\hat Y_j) \le E(\hat Y_{j+1})$. Therefore, if $\hat Y_j > \hat Y_{j+1}$, then $\tilde\varepsilon_j\ \big(= \hat Y_j - E(\hat Y_j)\big) > \tilde\varepsilon_{j+1}$, and

$$P\{S_N^k \ne 0\} = P\big\{\hat Y_{j-k} < \cdots < \hat Y_j \text{ and } \hat Y_j > \cdots > \hat Y_{j+k} \text{ for some } j,\ k+1 \le j \le N-k\big\} \le (N - 2k)\,P\{\tilde\varepsilon_1 > \cdots > \tilde\varepsilon_{k+1}\} = \frac{N - 2k}{(k+1)!},$$

since the $\tilde\varepsilon_j$'s are independent and identically distributed. □
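Theorem 4.1's bound is conservative but simple to check by Monte Carlo; in the sketch below (my construction: normal errors and disjoint-window averages standing in for the kernel estimate) the empirical frequency of a k-bump stays below $(N-2k)/(k+1)!$.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
n, N, k, runs = 2000, 100, 4, 2000
m = n // N                                   # disjoint windows of width m
f = np.linspace(0.0, 1.0, n)                 # a non-decreasing regression function
hits = 0
for _ in range(runs):
    y = f + rng.normal(size=n)
    y_hat = y.reshape(N, m).mean(axis=1)     # window averages: independent
    hits += any(
        all(y_hat[j - k + i] < y_hat[j - k + i + 1] for i in range(k)) and
        all(y_hat[j + i] > y_hat[j + i + 1] for i in range(k))
        for j in range(k, N - k)
    )
print(hits / runs, "<=", (N - 2 * k) / factorial(k + 1))
```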
Proof of Theorem 4.2. The proof proceeds by considering two sets of $\hat t_j$'s: those far from the $b_i$'s and those close to the $b_i$'s. A bump will occur far from the $b_i$'s with low probability. With high probability, the only bump that will occur in a neighborhood of $b_i$ will occur at one of the nearest $\hat t_j$'s. Specifically, let W denote the set of $\hat t_j$'s for which the interval $[\hat t_{j-k}, \hat t_{j+k}]$ contains no local maximum $b_i$, and suppose that N is large enough to ensure that $(k+2)/(N+1)$ is less than both $\delta$ and $\Delta$.

Fix $\hat t_j \in W$. Since there are no local maxima on the interval $[\hat t_{j-k}, \hat t_{j+k}]$, f must be either non-increasing on $[\hat t_{j-k}, \hat t_j]$ or non-decreasing on $[\hat t_j, \hat t_{j+k}]$. In the latter case, by an argument similar to that in the proof of Theorem 4.1, $P\{\text{bump at } \hat t_j\} \le P\{\tilde\varepsilon_j > \cdots > \tilde\varepsilon_{j+k}\} \le 1/(k+1)!$. The same bound holds in the case that f is non-increasing on $[\hat t_{j-k}, \hat t_j]$. Thus $P\{\text{bump at some } \hat t_j \in W\} \le N/(k+1)! \to 0$.

To consider behavior near $b_i$, fix i and let j be such that $b_i = (j + \Delta_n)/(N+1) \in [\hat t_j, \hat t_{j+1})$ for some $\Delta_n \in [0, 1)$. Note that $\{s: |s - b_i| \le (k+1)/(N+1)\} \subseteq [\hat t_{j-k-1}, \hat t_{j+k+1}]$. Assume that $\Delta_n \in [0, \frac12)$. The proof for the case $\Delta_n \in [\frac12, 1)$ is similar. It will be shown that

$$P\{\hat Y_{j-k-1} < \cdots < \hat Y_{j-1},\ \hat Y_{j+1} > \cdots > \hat Y_{j+k+1}\} \to 1$$

and

$$P\{\hat Y_j < \hat Y_{j-1},\ \hat Y_j < \hat Y_{j+1}\} \to 0.$$

These facts imply that the probability of a k-bump at $\hat t_{j-1}$ or $\hat t_j$ (and not at $\hat t_{j-k-1}, \ldots, \hat t_{j-2}, \hat t_{j+1}, \ldots, \hat t_{j+k+1}$) converges to one.

For $j+1 \le i \le j+k$, $E(\hat Y_i) - E(\hat Y_{i+1}) > cm^2(n+1)^{-2}$. Therefore, if both $|\hat Y_i - E(\hat Y_i)|$ and $|\hat Y_{i+1} - E(\hat Y_{i+1})|$ are less than $cm^2/(2(n+1)^2)$, then $\hat Y_i - \hat Y_{i+1} > 0$. Similarly, for $j-k-1 \le i \le j-2$, $\hat Y_{i+1} - \hat Y_i > 0$. Thus

$$P\{\hat Y_{j-k-1} < \cdots < \hat Y_{j-1},\ \hat Y_{j+1} > \cdots > \hat Y_{j+k+1}\} \ge P\Big\{\big|\hat Y_{j+i} - E(\hat Y_{j+i})\big| < \frac{cm^2}{2(n+1)^2},\ i = -k-1, \ldots, -1, 1, \ldots, k+1\Big\}.$$

Using Chebyshev's inequality, (3), and (2), for n sufficiently large, this probability is bounded below by

$$1 - 2(k+1)\,\mathrm{var}(\tilde\varepsilon_j)\bigg[\frac{2(n+1)^2}{cm^2}\bigg]^2,$$

which converges to one. If the $\varepsilon_i$'s are normally distributed, then the same conclusion follows from (3) and (2').

To show that $P\{\hat Y_j < \hat Y_{j-1},\ \hat Y_j < \hat Y_{j+1}\}$ converges to zero, first consider the case $\inf_{x\in[0,\Delta]}\big(f(b_i - x) - f(b_i + x)\big) \ge 0$. Then

$$E(\hat Y_{j-1}) - E(\hat Y_j) = \frac{A}{G} + \frac{B}{G},$$

where

$$A = \sum_{(j-1/2)m < i \le (j+\Delta_n)m} w\big(h^{-1}(\hat t_j - t_i)\big)\big(f(t_{i-m}) - f(t_i)\big),$$

$$B = \sum_{(j+\Delta_n)m < i < (j+1/2)m} w\big(h^{-1}(\hat t_j - t_i)\big)\big(f(t_{i-m}) - f(t_i)\big),$$

and

$$G = \sum_{(j-1/2)m < i < (j+1/2)m} w\big(h^{-1}(\hat t_j - t_i)\big) \approx h^* m \int_{-1/2}^{1/2} w(x)\,dx.$$

In A, f is evaluated at points in $(b_i - \delta, b_i]$, where f is increasing. Thus

$$A \le -\frac{cm^2}{(n+1)^2} \sum_{(j-1/2)m < i \le (j+\Delta_n)m} w\big(h^{-1}(\hat t_j - t_i)\big) \le -\frac{cm^2}{(n+1)^2}\cdot\frac{m}{2}\,\inf_{|x|\le 1/2} w(x).$$

Therefore, there exists $C_A > 0$ such that $A/G \le -C_A m^2 n^{-2}$. Whenever $(j+\Delta_n)m < i < (j+\frac12)m$,

$$f(t_{i-m}) - f(t_i) \le f(b_i) - f\bigg(\frac{(j+\frac12)m}{n+1}\bigg),$$

and by the Lipschitz condition on f there exists $C_B > 0$ such that $B/G \le C_B m^\alpha n^{-\alpha}\big(\frac12 - \Delta_n\big)^{1+\alpha}$.

Using the symmetry of w and assuming that $(j+\frac12)m$ is an integer, for ease of calculation, write

$$E(\hat Y_{j+1}) - E(\hat Y_j) = \frac{C}{G} + \frac{D}{G},$$

where

$$C = \sum_{0 < i \le (1/2 - \Delta_n)m} w\bigg(h^{-1}\frac{i}{n+1}\bigg)\bigg[f\bigg(\frac{(j+\frac12)m + i}{n+1}\bigg) - f\bigg(\frac{(j+\frac12)m - i}{n+1}\bigg)\bigg]$$

and D is the corresponding sum over $(\frac12 - \Delta_n)m < i < \frac m2$.

In C, f is evaluated at $t \in [b_i, b_i + \delta)$, where f is decreasing, so $C/G \le 0$. Since, for $(\frac12 - \Delta_n)m < i < \frac m2$, $\big((j+\frac12)m + i\big)/(n+1) \in (b_i, b_i + \Delta]$ and $\big((j+\frac12)m - i\big)/(n+1) < b_i$,

$$f\bigg(\frac{(j+\frac12)m + i}{n+1}\bigg) - f\bigg(\frac{(j+\frac12)m - i}{n+1}\bigg) \le -4c\,\frac{m^2}{(n+1)^2}\Big(\frac12 - \Delta_n\Big)^2.$$

Therefore, there exists $C_D > 0$ such that $D/G \le -C_D m^2 n^{-2}\big(\frac12 - \Delta_n\big)^2$. Combining the above bounds yields that

$$P\{\hat Y_j < \hat Y_{j-1},\ \hat Y_j < \hat Y_{j+1}\} \le P\Big\{\tilde\varepsilon_j - \tilde\varepsilon_{j-1} < -C_A\frac{m^2}{n^2} + C_B\frac{m^\alpha}{n^\alpha}\Big(\frac12 - \Delta_n\Big)^{1+\alpha},\ \tilde\varepsilon_j - \tilde\varepsilon_{j+1} < -C_D\frac{m^2}{n^2}\Big(\frac12 - \Delta_n\Big)^2\Big\}. \qquad (5)$$

Since $\mathrm{var}(\tilde\varepsilon_j - \tilde\varepsilon_{j+1})$ is of order 1/m, if $\Delta_{n'}$ is a subsequence with $m^{5/2}n^{-2}(\frac12 - \Delta_{n'})^2 \to \infty$, then (5) tends to zero, by Chebyshev's inequality. If $\Delta_{n'}$ is a subsequence with $m^{5/2}n^{-2}(\frac12 - \Delta_{n'})^2 \le M^2 < \infty$ for all n', some M, then

$$-C_A\frac{m^2}{n^2} + C_B\frac{m^\alpha}{n^\alpha}\Big(\frac12 - \Delta_{n'}\Big)^{1+\alpha} \le -\frac{m^2}{n^2}\big(C_A - C_B M^{1+\alpha}\,o(1)\big) = -\frac{m^2}{n^2}\big(C_A + o(1)\big),$$

since $n^{12}/m^{13+\alpha} \to 0$. Again, we see that (5) converges to zero, using Chebyshev's inequality and the facts that $\mathrm{var}(\tilde\varepsilon_j - \tilde\varepsilon_{j-1})$ is of order 1/m and $m^{5/2}/n^2 \to \infty$.

For the case that $\inf_{x\in[0,\Delta]}\big(f(b_i + x) - f(b_i - x)\big) \ge 0$, it will be shown that $P\{\hat Y_j < \hat Y_{j-1}\}$ converges to zero. Using the same notation and argument as above, there exists $C_A > 0$ such that $A/G \le -C_A m^2 n^{-2}$. For $(j+\Delta_n)m < i < (j+\frac12)m$, $f(t_{i-m}) - f(t_i) \le 0$, since f is increasing on $[b_i - \delta, b_i]$. Therefore, $B \le 0$ and

$$P\{\hat Y_j < \hat Y_{j-1}\} \le P\big\{\tilde\varepsilon_j - \tilde\varepsilon_{j-1} < -C_A m^2 n^{-2}\big\},$$

which converges to zero, since $\mathrm{var}(\tilde\varepsilon_j) = O(m^{-1})$ and $m^{5/2}/n^2$ converges to infinity. □

5. Simulation results
Simulations were carried out using two regression functions defined on the unit interval: $f(t) = -8(t - \frac12)^2$, which has one bump, and $f(t) = \sin(3\pi t)$, which has two. This scaling was chosen so that the differences between the maximum and minimum values of f were the same. One hundred equally spaced points were used, that is, $t_i = i/101$. The regression errors were normal, mean zero random variables, and simulations were run with the regression error standard deviations equal to 0.1, 0.5, and 1. Five hundred runs were carried out for each combination of regression function and error variance. The results are summarized in Tables 1 and 2. For instance, when $f(t) = \sin(3\pi t)$ and the error standard deviation is 1, consider looking for bumps of size k = 3 among the N = 50 $\hat Y_j$'s.
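A condensed sketch of one cell of this experiment (my reconstruction; the paper does not specify the smoother used here, so the disjoint-window averages below are an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
runs, n, sigma, N, k = 500, 100, 1.0, 50, 3

def count_k_bumps(y, k):
    return sum(
        all(y[j - k + i] < y[j - k + i + 1] for i in range(k)) and
        all(y[j + i] > y[j + i + 1] for i in range(k))
        for j in range(k, len(y) - k)
    )

t = np.arange(1, n + 1) / (n + 1)
f = np.sin(3 * np.pi * t)                 # two true bumps, at t = 1/6 and t = 5/6
m = n // N                                # n = 100, N = 50 gives m = 2
S = np.empty(runs, dtype=int)
for r in range(runs):
    y = f + sigma * rng.normal(size=n)
    y_hat = y.reshape(N, m).mean(axis=1)  # smoothed estimates Y_hat_1..Y_hat_N
    S[r] = count_k_bumps(y_hat, k)
print("runs with exactly two 3-bumps:", 100 * np.mean(S == 2), "%")
```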
Table 1
$g(t) = -8(t - \frac12)^2$. Entries are percentages of the 500 runs.

sigma   k     N=100   N=50    N=25    N=20    N=10    N=5     N=4
0.1     1       0.0     0.0    11.0    42.0   100.0   100.0   100.0
        2       6.6    42.4    52.4    63.6   100.0   100.0
        3      22.8     8.4    23.4    46.0   100.0
        4       1.2     0.4    15.4    42.8   100.0
        5                      12.2    42.0
        6                      11.4    42.0
        7                      11.2    42.0
        8                      11.2    42.0
        9                      11.0    36.8
        10                     11.0
        11                     11.0
        12                      5.6

0.5     1       0.0     0.0     0.0     0.0    50.4    99.4   100.0
        2       2.2    20.8    43.4    46.6    65.4    95.8
        3      29.4    19.2     9.0     9.0    51.8
        4       2.4     1.4     1.0     2.0    45.8
        5               0.2             0.8

1       1       0.0     0.0     0.0     0.0    11.6    93.2   100.0
        2       2.0    18.4    41.4    44.6    46.8    74.6
        3      29.6    17.0     7.4     7.4    16.0
        4       2.6     1.2     0.6     1.4     7.8
Table 2
$g(t) = \sin(3\pi t)$. Entries are percentages of the 500 runs.

sigma   k     N=100   N=50    N=25    N=20    N=10
0.1     1       0.0     0.0    85.8    98.4   100.0
        2      24.4    27.8    92.0    98.4
        3       2.6     4.0    91.4    65.6
        4               1.6    30.8
        5               1.0
        6               0.6
        7               0.2
        8               0.2

0.5     1       0.0     0.0     0.2     6.6    98.4
        2       6.6    31.2    24.0    31.8     0.6
        3       4.6     1.4     2.2     7.2

1       1       0.0     0.0     0.0     0.2    77.2
        2       7.2    34.6    24.4    21.0     2.6
        3       5.2     1.2     0.4     1.0
The tables indicate how to choose N and k, for functions with shapes similar to those considered here. For a sample size of 100, to locate a central bump, one should take N = 5 and k = 1, while for two 'side' bumps, one should take N = 10 and k = 1.

References

Arratia, R., L. Goldstein and L. Gordon (1988), Two moments suffice for Poisson approximations: the Chen-Stein method, Ann. Probab. 17, 9-25.
Chen, L.H.Y. (1975), Poisson approximation for dependent trials, Ann. Probab. 3, 534-545.
Chung, K.L. (1974), A Course in Probability Theory (Academic Press, New York).
Good, I.J. and R.A. Gaskins (1980), Density estimation and bump-hunting by the penalized likelihood method exemplified by scattering and meteorite data, J. Amer. Statist. Assoc. 75, 42-56.
Muller, H.G. (1985), Kernel estimators of zeros and of location and size of extrema of regression functions, Scand. J. Statist. 12, 221-232.
Schluter, D. (1988), Estimating the form of natural selection on a quantitative trait, Evolution 42, 849-861.
Silverman, B.W. (1981), Using kernel density estimates to investigate multimodality, J. Roy. Statist. Soc. Ser. B 43, 97-99.
Silverman, B.W. (1983), Some properties of a test for multimodality based on kernel density estimates, London Math. Soc. Lecture Note Ser. 79 (Cambridge University Press, Cambridge).