Statistics & Probability Letters 14 (1992) 141-152
North-Holland, 27 May 1992
Bump hunting in regression analysis

Nancy E. Heckman
Department of Statistics, University of British Columbia, Vancouver, BC, Canada

Received March 1990; revised August 1991
Abstract: Computer techniques for scatter-plot smoothing are valuable exploratory analysis tools. However, one must exercise caution when making inferences based on features of the smoothed data. Here, the size and number of observed modes are studied: the asymptotic distributions are determined under the assumption of constancy of the true regression function, and sufficient conditions for consistency of the estimated number of modes and their locations are given in the general case.
Keywords: Mode, bump hunting, Poisson approximation.
1. Introduction

Computer-intensive regression techniques such as spline and kernel smoothing are valuable exploratory data analysis tools for the applied researcher. However, the practitioner must often guess which features of the smoothed data are 'real' as opposed to those which are mere chance variations. In addition, the features of interest are greatly influenced by the amount of smoothing.

A bump is a feature which can be of enormous significance. For instance, in evolutionary biology, if the probability of survival of an individual of a species is largest at a particular body weight, then stabilizing selection occurs, with individuals at that best body weight being favoured in the evolutionary process. The existence of the bump would then be of primary importance, not its location or size. A commonly used technique to determine the presence of stabilizing selection is parabolic logistic regression. But this technique has limitations, since a data cloud with a bump can often be better fit by a line than a parabola. Recently, Schluter (1988) used a non-parametric form of logistic regression called GLIM splining to study the shape of probability survival curves. The resulting data displays often exhibited bumps which were not detectable by standard analyses. How does one know if these are indeed bumps, or merely random fluctuations which can be expected in any data set? This question underlies the analysis in this paper.

One might decide that a bump has occurred if the estimates of the regression function at specified values of the independent variable rise and then fall over a range of the independent variable. Specifically, let $\hat Y_i$, $i = 1, \ldots, N$, represent these estimates.
(Correspondence to: Nancy E. Heckman, Dept. of Statistics, University of British Columbia, 2021 West Mall, Vancouver, BC, Canada V6T 1Z2. Research supported by the Natural Sciences and Engineering Research Council of Canada under grant 5-87969.)
(Typically, N will be much smaller than the sample size, to insure that we do not search for a bump over too small a range of the independent variable.) A 'bump' of size k (or a 'k-bump') at the jth estimate occurs if

$$\hat Y_{j-k} < \hat Y_{j-k+1} < \cdots < \hat Y_j \quad \text{and} \quad \hat Y_j > \hat Y_{j+1} > \cdots > \hat Y_{j+k}.$$

The value of k must be at least 1 (in which case a bump involves 3 $\hat Y$ values) and N > 2k.

The purpose of this paper is to study the asymptotic distribution of $S_N^k$, the number of k-bumps, and $K_N$, the maximum bump size observed. These results can be used in hypothesis tests which reject that the regression function is monotone if either $S_N^k > 0$ or, equivalently, if $K_N \ge k$. The results also give some indication of how large a bump should be before it can be considered 'real'. In addition, $S_N^k$ is a consistent estimate of the number of modes of the regression function, under conditions on the regression function, k, N, the sample size, and the amount of smoothing.

Assumptions and notation are given in Section 2. Section 3 contains the derivation of the asymptotic distribution of $S_N^k$ and $K_N$ in the simple case in which the true regression function is a constant. In particular, it is shown that $P\{S_N^k > 0\}$ converges to zero if $k, N \to \infty$, and converges to 1 if $N \to \infty$ with k fixed. In Section 4 bounds on $P\{S_N^k > 0\} = P\{K_N \ge k\}$ are obtained assuming that the regression function is monotone. These bounds imply that $P\{S_N^k > 0\}$ converges to zero if $N/(k+1)! \to 0$. $S_N^k$ is shown to be a consistent estimator of the number of local maxima of a general regression function. Section 5 contains simulation results, which confirm that 'real' bumps must be of a substantial size.

Several authors have studied the bump-hunting problem in density estimation. Good and Gaskins (1980) propose an ad hoc method for indexing the 'believability' of a bump. Silverman (1981, 1983) proposes a clever test statistic which exploits the sensitivity of the number of modes to the amount of normal kernel smoothing done. Although his test statistic appears to be sensible for other kernels and for regression problems, his proof of the test's consistency depends heavily upon the use of a normal kernel and the application to density estimation, and thus may not easily extend to the regression model considered here. Müller (1985) proposes estimating the height and location of a mode of a regression function by the size and location of a mode of a kernel estimate of the regression function, but he does not consider multi-modal functions, nor does he discuss the behavior of his estimates when the regression function is monotone. In addition, Müller's conditions on the regression function are more restrictive than those considered here. He assumes that the regression function is twice continuously differentiable. Here, this condition is replaced by weaker ones that insure that a local maximum is 'sufficiently visible' (see Section 4). In particular, the regression function need not be continuous. However, Müller derives the asymptotic distribution of the location of a mode, while Theorem 4.2 below states only that the estimated locations of the bumps converge to the true locations.
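To fix ideas, here is a minimal sketch of the bump-counting statistics (our illustration, not code from the paper; the function names are ours). The indexing follows the definition above, shifted to 0-based arrays:

```python
def count_k_bumps(y_hat, k):
    """S_N^k: number of j with y[j-k] < ... < y[j] and y[j] > ... > y[j+k] (0-based)."""
    N = len(y_hat)
    count = 0
    for j in range(k, N - k):
        rising = all(y_hat[i] < y_hat[i + 1] for i in range(j - k, j))
        falling = all(y_hat[i] > y_hat[i + 1] for i in range(j, j + k))
        if rising and falling:
            count += 1
    return count

def max_bump_size(y_hat):
    """K_N: the largest k (with N > 2k) for which a k-bump is observed; 0 if none."""
    N, K = len(y_hat), 0
    for k in range(1, (N - 1) // 2 + 1):
        if count_k_bumps(y_hat, k) > 0:
            K = k
    return K

# Example: a 2-bump at the middle estimate.
print(count_k_bumps([0.1, 0.5, 0.9, 0.4, 0.2], 2))   # -> 1
print(max_bump_size([0.1, 0.5, 0.9, 0.4, 0.2]))      # -> 2
```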
2. Notation

Suppose that $(t_i, Y_i)$, $i = 1, \ldots, n$, are raw data, with $a \le t_i \le b$ and $Y_i = f(t_i) + \epsilon_i$, where the $\epsilon_i$'s are independent and identically distributed with mean zero. The estimate of $f$ at the point $\hat t_j$, $j = 1, \ldots, N$, is

$$\hat f(\hat t_j) = \hat Y_j = \frac{\sum_{i=1}^n w\bigl((\hat t_j - t_i)/h\bigr) Y_i}{\sum_{i=1}^n w\bigl((\hat t_j - t_i)/h\bigr)},$$
where $w$ is non-negative, with support contained in the interval $[-\frac12, \frac12]$. Typically, N is much smaller than n; the existence of a bump is inferred from the $\hat f(\hat t_j)$'s rather than from the $\hat f(t_i)$'s, which are more likely to exhibit a spurious bump.

For ease of calculations, the following assumption is made to insure that the $\hat Y_j - E(\hat Y_j)$'s are independent and identically distributed. Assume that $t_i = i/(n+1)$, $\hat t_j = j/(N+1)$,

$$m = m_n = (n+1)/(N+1) \qquad (1)$$

is an integer, and $h \le 1/(N+1)$. Thus, $\hat Y_j$ is a weighted average of $Y_i$'s with $|jm - i| \le m/2$.

Let $I_j^k = 1$ if $\hat Y_{j-k} < \hat Y_{j-k+1} < \cdots < \hat Y_j$ and $\hat Y_j > \hat Y_{j+1} > \cdots > \hat Y_{j+k}$, and $I_j^k = 0$ else. Thus, a k-bump has occurred at $\hat t_j$ if and only if $I_j^k = 1$. $S_N^k$, the number of k-bumps, is $I_{k+1}^k + \cdots + I_{N-k}^k$ and, if $K_N$ is the maximum bump size observed,

$$P\{K_N \ge k\} = P\{S_N^k \ne 0\}.$$

Thus, the distribution of the number of bumps of size k determines the distribution of $K_N$, and thus helps determine what size bump is 'real'. Note that $I_j^k$ and $I_l^k$ are independent, provided $|j - l| > 2k$, and that $I_j^k I_l^k = 0$ for $0 < |j - l| < 2k$. If $p_j^{(k)} = E(I_j^k)$, then

$$E(S_N^k) = \sum_{j=k+1}^{N-k} p_j^{(k)},$$

and

$$\mathrm{var}(S_N^k) = \sum_{j=k+1}^{N-k} p_j^{(k)}\bigl(1 - p_j^{(k)}\bigr) - 2 \sum_{j=k+1}^{N-k-1} \sum_{l=1}^{2k} p_j^{(k)} p_{j+l}^{(k)} + 2 \sum_{j=k+1}^{N-3k} E\bigl(I_j^k I_{j+2k}^k\bigr).$$
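A sketch of this construction, under the stated design ($t_i = i/(n+1)$, $\hat t_j = j/(N+1)$) and a boxcar kernel chosen purely for illustration; the paper allows any $w$ satisfying its assumptions. With $h = h^*/(N+1)$, $h^* \le 1$, the windows for distinct $\hat t_j$'s are disjoint, so the $\hat Y_j$'s are independent:

```python
import numpy as np

def boxcar(u):
    """A kernel with support in [-1/2, 1/2], bounded away from zero there."""
    return np.where(np.abs(u) <= 0.5, 1.0, 0.0)

def smoothed_estimates(y, N, h_star=1.0, w=boxcar):
    """Y_hat[j-1] = f_hat(j/(N+1)), j = 1..N, from raw y observed at t_i = i/(n+1).

    h = h_star/(N+1) with h_star <= 1, so each Y_hat[j] averages its own disjoint
    block of roughly m = (n+1)/(N+1) observations (assumes n >> N, so no window
    is empty)."""
    n = len(y)
    t = np.arange(1, n + 1) / (n + 1)
    h = h_star / (N + 1)
    t_hat = np.arange(1, N + 1) / (N + 1)
    wts = w((t_hat[:, None] - t[None, :]) / h)   # N x n weight matrix
    return (wts @ y) / wts.sum(axis=1)

# Usage: 20 local averages from 2000 noisy observations of sin(3*pi*t).
y = np.sin(3 * np.pi * np.arange(1, 2001) / 2001)
y += 0.5 * np.random.default_rng(0).standard_normal(2000)
print(smoothed_estimates(y, N=20).round(2))
```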
The results of Sections 3 and 4 may hold under different constructions of the $\hat Y_j$'s. The proofs of the theorems use the independence of the $\hat Y_j$'s and an analysis of the $E(\hat Y_j)$'s. For instance, if f is constant, as in Section 3, for the construction here $E(\hat Y_1) = E(\hat Y_2) = \cdots = E(\hat Y_N)$. This would certainly hold for many other estimates of f. The weakening of the independence structure is more delicate, as it complicates the analysis of

$$P\bigl\{\hat\epsilon_{j-k} + E(\hat Y_{j-k}) < \cdots < \hat\epsilon_j + E(\hat Y_j),\ \hat\epsilon_j + E(\hat Y_j) > \cdots > \hat\epsilon_{j+k} + E(\hat Y_{j+k})\bigr\},$$

where $\hat\epsilon_j = \hat Y_j - E(\hat Y_j)$. If this probability is sufficiently close to the probability in the independent case (e.g. the difference is $o(N/k)$), the results should hold.
3. The case of a constant regression function
If f is constant, then the $\hat Y_j$'s are independent and identically distributed and every ordering of the $\hat Y_j$'s is equally likely. Thus, the distribution of $S_N^k$ does not depend on w or h, provided that the $\hat Y_j$'s are independent; nor does it depend upon the error distribution, provided only that it is continuous. Thus, the results of this section actually apply not only to the $\hat Y_j$'s, but also to the original data, where $N = n$, $m = 1$, and $\hat Y_i = Y_i$. In this case $p^{(k)} = p_j^{(k)}$ and $C^{(k)} = C_j^{(k)} = E(I_j^k I_{j+2k}^k)$ don't depend on j, yielding

$$E(S_N^k) = (N - 2k)p^{(k)},$$

and

$$\mathrm{var}(S_N^k) = (N - 2k)p^{(k)} - (p^{(k)})^2\bigl(N(4k+1) - 4k - 12k^2\bigr) + 2(N - 4k)C^{(k)}.$$
Lemma 3.1.

$$p^{(k)} = (2k+1)^{-1}(k!)^{-2},$$

$$C^{(k)} = E\bigl(I_j^k I_{j+2k}^k\bigr) = \int_0^1 \left[\int_u^1 \frac{s^k(s-u)^{k-1}}{k!(k-1)!}\,ds\right]^2 du = \sum_{j,l=0}^{k} \frac{(-1)^{j+l}}{(k-j)!\,(k-l)!\,(k+j)!\,(k+l)!\,(2k+j+l+1)} \le (2k+1)\bigl(p^{(k)}\bigr)^2.$$
Proof. The superscript k will be omitted. Since the distribution of the $I_j$'s doesn't depend upon the distribution of the $\hat Y_j$'s, assume without loss of generality that the $\hat Y_j$'s are uniformly distributed on the unit interval. Let

$$f(a, b, l) = P\{a < \hat Y_1 < \cdots < \hat Y_l < b\}.$$

It can be shown by induction on l that $f(a, b, l) = (b-a)^l/l!$, and thus, conditioning on the peak value $\hat Y_{k+1} = u$,

$$p = \int_0^1 \frac{u^k}{k!}\cdot\frac{u^k}{k!}\,du = (2k+1)^{-1}(k!)^{-2},$$

and, conditioning on the valley value $\hat Y_{2k+1} = u$,

$$C = P\{\hat Y_1 < \cdots < \hat Y_{k+1} > \cdots > \hat Y_{2k+1} < \cdots < \hat Y_{3k+1} > \cdots > \hat Y_{4k+1}\} = \int_0^1 h^2(u)\,du,$$

by the independence and exchangeability of the $\hat Y_j$'s, where

$$h(u) = \int_{y=u}^1 \frac{y^k(y-u)^{k-1}}{k!(k-1)!}\,dy.$$

By a binomial expansion,

$$h(u) = \sum_{l=0}^k \binom{k}{l}(-1)^l \int_u^1 \frac{(1-y)^l(y-u)^{k-1}}{k!(k-1)!}\,dy = \sum_{l=0}^k \binom{k}{l}(-1)^l\,\frac{\Gamma(l+1)\Gamma(k)}{\Gamma(l+k+1)}\,\frac{(1-u)^{l+k}}{k!(k-1)!} = \sum_{l=0}^k \frac{(-1)^l}{(k-l)!(k+l)!}(1-u)^{l+k}.$$
The summation expression for C follows easily by term by term integration of $h^2$. Since $y^k \le 1$ in the integrand, $h(u) \le (1-u)^k/(k!)^2$, and integration yields that

$$C \le \frac{1}{(2k+1)(k!)^4} = (2k+1)\bigl(p^{(k)}\bigr)^2. \qquad \Box$$
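Lemma 3.1 is easy to check numerically. A quick Monte Carlo sketch (ours), exploiting the proof's observation that under a constant f one may take the $\hat Y_j$'s to be i.i.d. uniform:

```python
import math
import numpy as np

def is_k_bump(y, j, k):
    return (all(y[i] < y[i + 1] for i in range(j - k, j)) and
            all(y[i] > y[i + 1] for i in range(j, j + k)))

rng = np.random.default_rng(0)
k, reps = 2, 200_000
p_exact = 1.0 / ((2 * k + 1) * math.factorial(k) ** 2)   # Lemma 3.1: 1/20 for k = 2

# P{k-bump at a fixed j} involves only the 2k+1 surrounding (iid uniform) values.
hits = sum(is_k_bump(rng.random(2 * k + 1), k, k) for _ in range(reps))
print(hits / reps, "vs exact", p_exact)                  # ~0.05 vs 0.05
```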
For the remainder of the section, $Z_N$ will denote a Poisson random variable with parameter $\lambda_N = (N - 2k)p^{(k)}$.

Theorem 3.1.

$$\sup_A \bigl|P\{S_N^k \in A\} - P\{Z_N \in A\}\bigr| \le 2p^{(k)}(4k+1)\bigl(1 - e^{-\lambda_N}\bigr),$$

$$P\{S_N^k \ne 0\} \le (N - 2k)p^{(k)}.$$

Proof. The first statement follows immediately from Lemma 3.1 and an application of Theorem 1 of Arratia, Goldstein, and Gordon (1988). The second statement of the theorem holds since

$$P\{S_N^k \ne 0\} = P\{\text{bump at } \hat t_{k+1} \text{ or } \ldots \text{ or at } \hat t_{N-k}\} \le (N - 2k)p^{(k)}. \qquad \Box$$
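The second bound yields a conservative test of monotonicity: reject at level $\alpha$ when a k-bump is observed, with k chosen so that $(N-2k)p^{(k)} \le \alpha$. A sketch (ours, not the paper's) of that choice:

```python
import math

def p_k(k):
    """p^(k) = 1 / ((2k+1) (k!)^2) from Lemma 3.1."""
    return 1.0 / ((2 * k + 1) * math.factorial(k) ** 2)

def min_significant_k(N, alpha=0.05):
    """Smallest bump size k with (N - 2k) p^(k) <= alpha, so that observing a
    k-bump rejects constancy/monotonicity at level alpha (conservative)."""
    for k in range(1, (N - 1) // 2 + 1):
        if (N - 2 * k) * p_k(k) <= alpha:
            return k
    return None

for N in (25, 50, 100):
    print(N, min_significant_k(N))   # -> 4 in each case: small bumps are not evidence
```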
The following theorem and corollaries illustrate the danger of not allowing the size of the bump to increase adequately with the sample size. If k, the bump size, does not increase with N, then we will 'see' more and more bumps as the sample size grows, even when the regression function is constant. This is intuitive: if k is fixed, as we gather more data, we will be searching for a bump in too small a range of the independent variable. However, the probability of seeing a bump of size k will converge to zero, even if k approaches infinity slowly (see Corollary 3.2).

Theorem 3.2. Suppose that k is fixed. Then $\bigl(S_N^k - E(S_N^k)\bigr)/\sqrt{\mathrm{var}(S_N^k)}$ is asymptotically standard normal.

Proof. Using the k-dependent form of the Central Limit Theorem (Theorem 7.3.1 of Chung, 1974), it suffices to show that $N^{-2/3}\,\mathrm{var}(S_N^k)$ converges to infinity. Since $C \ge 0$,

$$\mathrm{var}(S_N^k) \ge Np\bigl(1 - p(4k+1) + o(1)\bigr) = O(N) \quad \text{for } k \ge 2.$$

The last equality follows from the fact that $p(4k+1)$ is a decreasing function of k for $k \ge 1$ which equals 9/20 when k = 2. For k = 1, direct computation using Lemma 3.1 yields $\mathrm{var}(S_N^1) = O(N)$. $\Box$

Corollary 3.1. If $Np^{(k)}$ approaches infinity, then $S_N^k$ is asymptotically normal with mean and variance equal to $Np^{(k)}$. If $Np^{(k)}$ approaches $\lambda$, finite and non-zero, then $S_N^k$ is asymptotically Poisson with parameter $\lambda$. If $Np^{(k)}$ approaches zero, then $S_N^k$ converges to zero in probability. $\Box$

Corollary 3.2. If k is fixed, $P\{K_N < k\} \to 0$. If $k \to \infty$, then

$$\bigl|P\{K_N < k\} - \exp(-\lambda_N)\bigr| \le 2p^{(k)}(4k+1),$$

which converges to zero. $\Box$
4. The case of a general regression function

The consistency of $S_N^k$ is stated in Theorem 4.1 (assuming the regression function is non-decreasing) and in Theorem 4.2, for the general regression function. The proofs of the theorems are given at the end of this section.

Theorem 4.1. Suppose that f is non-decreasing. Then

$$P\{S_N^k \ne 0\} \le \frac{N - 2k}{(k+1)!}.$$
In the general regression function case, suppose that f has exactly l local maxima, at $a < b_1 < \cdots < b_l < b$, and that there exist $c > 0$ and $\delta > 0$ such that

$$f(t) - f(s) \ge c(t-s)^2 \quad \text{whenever either } b_i - \delta \le s \le t \le b_i \text{ or } b_i \le t \le s \le b_i + \delta.$$

This condition holds, for example, if f is twice continuously differentiable with $f'' < 0$ at each $b_i$. It is also satisfied for some non-continuous f.

The regression function must satisfy monotonicity properties on subintervals of [a, b]. Specifically, assume that there exists a finite number of closed intervals, on each of which f is either non-decreasing or non-increasing. To understand the necessity of this assumption, consider f defined on [0, 1] with $f(x) = x$ for $x < \frac12$, $f(\frac12) = 1$, and $f(x) = 2 - x$ for $x > \frac12$. Then f has no local maxima, but most smoothing methods would produce an estimate of f with a local maximum at x near $\frac12$.

Assume that there exists a $\Delta > 0$ such that for each i,

$$\text{either} \quad \inf_{x \in [0, \Delta]} \bigl(f(b_i + x) - f(b_i - x)\bigr) \ge 0 \quad \text{or} \quad \inf_{x \in [0, \Delta]} \bigl(f(b_i - x) - f(b_i + x)\bigr) \ge 0.$$
Suppose that the kernel w, and $w^2$, are Riemann integrable, symmetric about zero, uniformly continuous on $[-\frac12, \frac12]$, and bounded away from zero. These assumptions appear to be an artifact of the method of proof. Fortunately they are mild, being satisfied by commonly used kernels and by most well behaved functions.

Theorem 4.2. Assume that the $\epsilon_i$'s have variance $\sigma^2$ and let

$$\mu_2 = \frac{\int_{-1/2}^{1/2} w^2(x)\,dx}{\left(\int_{-1/2}^{1/2} w(x)\,dx\right)^2}.$$

Suppose that

$$\frac{k}{N} \to 0, \qquad \frac{N}{(k+1)!} \to 0, \qquad h_N(N+1) \to h^* \in (0, 1],$$

and that f satisfies the following Lipschitz condition: there exist $\alpha > 0$, $\beta > 0$ such that

$$|f(b_i) - f(x)| \le \beta\,|b_i - x|^\alpha$$
for all x in a neighbourhood of $b_i$ (since, by assumption, f is increasing on $[b_i - \delta, b_i]$ and decreasing on $[b_i, b_i + \delta]$ with a maximum at $b_i$, this condition is always satisfied for $\alpha = 0$). Let m be as in (1) and c as in Theorem 4.1. If there exists $\gamma > 0$ such that

$$\left[1 - \frac{4\sigma^2\mu_2(1+\gamma)\,n^4}{h^* c^2 m^5}\right]^{2k+1} \to 1, \qquad (2)$$

then

$$P\{\text{there exist exactly } l \text{ bumps of size } k\} \to 1,$$

provided that $n^{12}/m^{13+\alpha} \to 0$. Moreover, the probability that the l bumps occur at $\hat t_j$'s that are within $3/(2(N+1))$ of the $b_i$'s converges to one. If we assume that the $\epsilon_i$'s are normally distributed, then (2) can be replaced by the weaker condition:

$$\left[1 - 2\Phi\left(-\frac{c\,h^{*1/2} m^{5/2}}{2\sigma\mu_2^{1/2} n^2 (1+\gamma)}\right)\right]^{2k+1} \to 1, \qquad (2')$$

where $\Phi$ denotes the cumulative distribution function of a standard normal.

Note that conditions (2) and (2') imply that $m^{5/2}/n^2 \to \infty$ and thus $n^{12}/m^{13+\alpha} \to 0$ for $\alpha \ge 2$.

The proofs of both theorems require analysis of $\hat\epsilon_j = \hat Y_j - E(\hat Y_j)$ and of $E(\hat Y_j - \hat Y_{j+1})$. Since

$$\hat\epsilon_j = \frac{\sum_{i=1}^n w\bigl((\hat t_j - t_i)/h\bigr)\,\epsilon_i}{\sum_{i=1}^n w\bigl((\hat t_j - t_i)/h\bigr)},$$

the $\hat\epsilon_j$'s are independent and identically distributed with

$$\mathrm{var}(\hat\epsilon_j) = \sigma^2\,\frac{\sum_{|i - jm| \le m/2} w^2\bigl(h^{-1}(\hat t_j - t_i)\bigr)}{\left[\sum_{|i - jm| \le m/2} w\bigl(h^{-1}(\hat t_j - t_i)\bigr)\right]^2} \approx \sigma^2\mu_2\,(h^* m)^{-1}. \qquad (3)$$
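A numerical check of approximation (3) (our sketch; boxcar kernel with $h^* = 1$, for which $\mu_2 = 1$):

```python
import numpy as np

def var_eps_hat(n, N, sigma=1.0, h_star=1.0):
    """Exact var(eps_hat_j) = sigma^2 sum(w_i^2) / (sum w_i)^2, boxcar kernel."""
    t = np.arange(1, n + 1) / (n + 1)
    h = h_star / (N + 1)
    wts = (np.abs((1 / (N + 1) - t) / h) <= 0.5).astype(float)   # weights at j = 1
    return sigma ** 2 * np.sum(wts ** 2) / np.sum(wts) ** 2

n, N = 1000, 10
m = (n + 1) / (N + 1)
print(var_eps_hat(n, N), "vs", 1.0 / m)   # sigma^2 mu_2 / (h* m) with sigma = mu_2 = h* = 1
```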
Similar calculations yield

$$E(\hat Y_j - \hat Y_{j+1}) = \frac{\sum_{(j-1/2)m < i \le (j+1/2)m} w\bigl(h^{-1}(\hat t_j - t_i)\bigr)\bigl(f(t_i) - f(t_{i+m})\bigr)}{\sum_{(j-1/2)m < i \le (j+1/2)m} w\bigl(h^{-1}(\hat t_j - t_i)\bigr)}. \qquad (4)$$
Proof of Theorem 4.1. Equation (4) and f non-decreasing imply that $E(\hat Y_j) \le E(\hat Y_{j+1})$. Therefore, if $\hat Y_j > \hat Y_{j+1}$, then $\hat\epsilon_j$ ($= \hat Y_j - E(\hat Y_j)$) $> \hat\epsilon_{j+1}$, and

$$P\{S_N^k \ne 0\} = P\left\{\bigcup_{j=k+1}^{N-k}\bigl\{\hat Y_{j-k} < \cdots < \hat Y_j,\ \hat Y_j > \cdots > \hat Y_{j+k}\bigr\}\right\} \le P\{\hat\epsilon_j > \cdots > \hat\epsilon_{j+k} \text{ for some } j\} \le (N - 2k)\,P\{\hat\epsilon_1 > \cdots > \hat\epsilon_{k+1}\} = \frac{N - 2k}{(k+1)!},$$

since the $\hat\epsilon_j$'s are independent and identically distributed. $\Box$
Proof of Theorem 4.2. The proof proceeds by considering two sets of $\hat t_j$'s: those far from the $b_i$'s and those close to the $b_i$'s. A bump will occur far from the $b_i$'s with low probability. With high probability, the only bump that will occur in a neighborhood of $b_i$ will occur at one of the nearest $\hat t_j$'s. Specifically, let

$$W = \bigl\{\hat t_j: |\hat t_j - b_i| \ge (k+2)/(N+1),\ i = 1, \ldots, l\bigr\},$$

and suppose that N is large enough to ensure that $(k+2)/(N+1)$ is less than both $\delta$ and $\Delta$.

Fix $\hat t_j \in W$. Since there are no local maxima on the interval $[\hat t_{j-k}, \hat t_{j+k}]$, f must be either non-increasing on $[\hat t_{j-k}, \hat t_j]$ or non-decreasing on $[\hat t_j, \hat t_{j+k}]$. In the latter case, by an argument similar to that in the proof of Theorem 4.1, $P\{\text{bump at } \hat t_j\} \le P\{\hat\epsilon_j > \cdots > \hat\epsilon_{j+k}\} \le 1/(k+1)!$. The same bound holds in the case that f is non-increasing on $[\hat t_{j-k}, \hat t_j]$. Thus $P\{\text{bump at some } \hat t_j \in W\} \le N/(k+1)! \to 0$.

To consider behavior near $b_i$, fix i and let j be such that $b_i = (j + \Delta_n)/(N+1) \in [\hat t_j, \hat t_{j+1})$ for some $\Delta_n \in [0, 1)$. Note that $\{s: |\hat t_s - b_i| \le (k+1)/(N+1)\} \subset \{j-k-1, \ldots, j+k+1\}$. Assume that $\Delta_n \in [0, \frac12)$. The proof for the case $\Delta_n \in [\frac12, 1)$ is similar. It will be shown that

$$P\bigl\{\hat Y_{j-k-1} < \cdots < \hat Y_{j-1},\ \hat Y_{j+1} > \cdots > \hat Y_{j+k}\bigr\} \to 1$$

and

$$P\bigl\{\hat Y_j < \hat Y_{j-1},\ \hat Y_j < \hat Y_{j+1}\bigr\} \to 0.$$

These facts imply that the probability of a k-bump at $\hat t_{j-1}$ or $\hat t_j$ (and not at $\hat t_{j-k-1}, \ldots, \hat t_{j-2}, \hat t_{j+1}, \ldots, \hat t_{j+k+1}$) converges to one.

For $j + 1 \le i < j + k$, $E(\hat Y_i) - E(\hat Y_{i+1}) > cm^2(n+1)^{-2}$. Therefore, if both $|\hat Y_i - E(\hat Y_i)|$ and $|\hat Y_{i+1} - E(\hat Y_{i+1})|$ are less than $cm^2/(2(n+1)^2)$, then $\hat Y_i - \hat Y_{i+1} > 0$. Similarly, for $j - k - 1 \le i \le j - 2$, if both deviations are less than $cm^2/(2(n+1)^2)$, then $\hat Y_{i+1} - \hat Y_i > 0$. Thus

$$P\bigl\{\hat Y_{j-k-1} < \cdots < \hat Y_{j-1},\ \hat Y_{j+1} > \cdots > \hat Y_{j+k}\bigr\} \ge P\bigl\{|\hat Y_{j+i} - E(\hat Y_{j+i})| < cm^2/(2(n+1)^2),\ i = -k-1, \ldots, -1, 1, \ldots, k\bigr\}.$$

Using Chebyshev's inequality, (3), and (2), for n sufficiently large, this probability is bounded below by

$$\left[1 - \mathrm{var}(\hat\epsilon_j)\,\frac{4(n+1)^4}{c^2 m^4}\right]^{2k+1} \to 1.$$
If the $\epsilon_i$'s are normally distributed, then, by (3) and (2'), the same probability converges to one.

To show that $P\{\hat Y_j < \hat Y_{j-1},\ \hat Y_j < \hat Y_{j+1}\}$ converges to zero, first consider the case

$$\inf_{x \in [0, \Delta]} \bigl(f(b_i - x) - f(b_i + x)\bigr) \ge 0.$$

Then
$$E(\hat Y_{j-1}) - E(\hat Y_j) = \frac{A + B}{G},$$

where

$$A = \sum_{(j-1/2)m < i \le (j+\Delta_n)m} w\bigl(h^{-1}(\hat t_j - t_i)\bigr)\bigl(f(t_{i-m}) - f(t_i)\bigr),$$

$$B = \sum_{(j+\Delta_n)m < i \le (j+1/2)m} w\bigl(h^{-1}(\hat t_j - t_i)\bigr)\bigl(f(t_{i-m}) - f(t_i)\bigr),$$

and

$$G = \sum_{(j-1/2)m < i \le (j+1/2)m} w\bigl(h^{-1}(\hat t_j - t_i)\bigr) \approx h^* m \int_{-1/2}^{1/2} w(x)\,dx.$$

In A, f is evaluated at points in $(b_i - \delta, b_i]$ where f is increasing. Thus

$$A \le -c\,\frac{m^2}{(n+1)^2} \sum_{(j-1/2)m < i \le jm} w\bigl(h^{-1}(\hat t_j - t_i)\bigr) \le -c\,\frac{m^2}{(n+1)^2}\,\tfrac12\, h^* m \inf_{|x| \le 1/2} w(x).$$

Therefore, there exists $C_A > 0$ such that $A/G \le -C_A m^2 n^{-2}$. Whenever $(j + \Delta_n)m < i \le (j + \frac12)m$, $t_{i-m} < b_i < t_i$ and $t_i - b_i \le (\frac12 - \Delta_n)m/(n+1)$, yielding

$$f(t_{i-m}) - f(t_i) \le f(b_i) - f(t_i) \le \beta\left(\frac{(\frac12 - \Delta_n)m}{n+1}\right)^{\alpha}.$$

By the Lipschitz condition on f there exists $C_B > 0$ such that $B/G \le C_B m^\alpha n^{-\alpha}(\frac12 - \Delta_n)^{1+\alpha}$. Using the symmetry of w, and assuming that $(j + \frac12)m$ is an integer for ease of calculation, write

$$E(\hat Y_{j+1}) - E(\hat Y_j) = \frac{C + D}{G},$$
where

$$C = \sum_{0 < i \le (\frac12 - \Delta_n)m} w\!\left(\frac{\frac{m}{2} - i}{h(n+1)}\right)\left[f\!\left(\frac{(j+\frac12)m + i}{n+1}\right) - f\!\left(\frac{(j+\frac12)m - i}{n+1}\right)\right]$$

and

$$D = \sum_{(\frac12 - \Delta_n)m < i \le m} w\!\left(\frac{\frac{m}{2} - i}{h(n+1)}\right)\left[f\!\left(\frac{(j+\frac12)m + i}{n+1}\right) - f\!\left(\frac{(j+\frac12)m - i}{n+1}\right)\right].$$

In C, f is evaluated at $t \in [b_i, b_i + \delta)$, where f is decreasing, so $C/G \le 0$. Since, for $(\frac12 - \Delta_n)m < i \le m$, $((j+\frac12)m + i)/(n+1) \in (b_i, b_i + \Delta]$ and $((j+\frac12)m - i)/(n+1) < b_i$,

$$f\!\left(\frac{(j+\frac12)m + i}{n+1}\right) - f\!\left(\frac{(j+\frac12)m - i}{n+1}\right) \le f\!\left(2b_i - \frac{(j+\frac12)m + i}{n+1}\right) - f\!\left(\frac{(j+\frac12)m - i}{n+1}\right) \le -4c\,\frac{m^2}{(n+1)^2}\left(\tfrac12 - \Delta_n\right)^2.$$

Thus

$$D \le -4c\,\frac{m^2}{(n+1)^2}\left(\tfrac12 - \Delta_n\right)^2 m\left(\tfrac12 + \Delta_n\right)\inf_{|x| \le 1/2} w(x).$$

Therefore, there exists $C_D > 0$ such that $D/G \le -C_D m^2 n^{-2}(\frac12 - \Delta_n)^2$. Combining the above bounds yields that

$$P\{\hat Y_j < \hat Y_{j-1},\ \hat Y_j < \hat Y_{j+1}\} = P\left\{\hat\epsilon_j - \hat\epsilon_{j-1} < \frac{A+B}{G},\ \hat\epsilon_j - \hat\epsilon_{j+1} < \frac{C+D}{G}\right\} \le P\left\{\hat\epsilon_j - \hat\epsilon_{j-1} < -C_A \frac{m^2}{n^2} + C_B \frac{m^\alpha}{n^\alpha}\left(\tfrac12 - \Delta_n\right)^{1+\alpha},\ \hat\epsilon_j - \hat\epsilon_{j+1} < -C_D \frac{m^2}{n^2}\left(\tfrac12 - \Delta_n\right)^2\right\}. \qquad (5)$$

Since $\mathrm{var}(\hat\epsilon_j - \hat\epsilon_{j+1})$ is of order $1/m$, if $\Delta_{n'}$ is a subsequence with $m^{5/2}n^{-2}(\frac12 - \Delta_{n'})^2 \to \infty$, then (5) tends to zero, by Chebyshev's inequality. If $\Delta_{n'}$ is a subsequence with $m^{5/2}n^{-2}(\frac12 - \Delta_{n'})^2 \le M^2 < \infty$ for all $n'$, some M, then

$$-C_A\frac{m^2}{n^2} + C_B\frac{m^\alpha}{n^\alpha}\left(\tfrac12 - \Delta_{n'}\right)^{1+\alpha} \le -\frac{m^2}{n^2}\left(C_A - C_B M^{1+\alpha}\,\frac{n^3}{m^{(13+\alpha)/4}}\right) = -\frac{m^2}{n^2}\bigl(C_A + o(1)\bigr),$$

since $n^{12}/m^{13+\alpha} \to 0$. Again, we see that (5) converges to zero, using Chebyshev's inequality and the facts that $\mathrm{var}(\hat\epsilon_j - \hat\epsilon_{j-1})$ is of order $1/m$ and $m^{5/2}/n^2 \to \infty$.

For the case that $\inf_{x \in [0,\Delta]}\bigl(f(b_i + x) - f(b_i - x)\bigr) \ge 0$, it will be shown that $P\{\hat Y_j < \hat Y_{j-1}\}$ converges to zero. Using the same notation and argument as above, there exists $C_A > 0$ such that $A/G \le -C_A m^2 n^{-2}$. For $(j + \Delta_n)m < i \le (j + \frac12)m$, $f(t_{i-m}) \le f(t_i)$, since f is increasing on $[b_i - \delta, b_i]$. Therefore, $B \le 0$ and

$$P\{\hat Y_j < \hat Y_{j-1}\} = P\left\{\hat\epsilon_j - \hat\epsilon_{j-1} < \frac{A+B}{G}\right\} \le P\bigl\{\hat\epsilon_j - \hat\epsilon_{j-1} < -C_A m^2 n^{-2}\bigr\},$$

which converges to zero, since $\mathrm{var}(\hat\epsilon_j) = O(m^{-1})$ and $m^{5/2}/n^2$ converges to infinity. $\Box$

5. Simulation results
Simulations were carried out using two regression functions defined on the unit interval: $f(t) = -8(t - \frac12)^2$, which has one bump, and $f(t) = \sin(3\pi t)$, which has two. This scaling was chosen so that the differences between the maximum and minimum values of f were the same. One hundred equally spaced points were used, that is, $t_i = i/101$. The regression errors were normal, mean zero random variables, and simulations were run with the regression error standard deviations equal to 0.1, 0.5, and 1. Five hundred runs were carried out for each combination of regression function and error variance. The results are summarized in Tables 1 and 2. For instance, when $f(t) = \sin(3\pi t)$ and the error standard deviation is 1, consider looking for bumps of size k = 3 among the N = 50 $\hat Y_j$'s, where $\hat Y_j$ is the average of the two $Y_i$'s nearest $\hat t_j$ (see Table 2).
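A sketch in the spirit of this design (ours; the precise smoothing and tabulated events follow the paper only approximately, so it will not reproduce Tables 1 and 2 exactly):

```python
import numpy as np

rng = np.random.default_rng(1)

def smooth(y, N):
    """Boxcar kernel estimates at t_hat_j = j/(N+1), h = 1/(N+1)."""
    n = len(y)
    t = np.arange(1, n + 1) / (n + 1)
    c = np.arange(1, N + 1) / (N + 1)
    w = (np.abs(c[:, None] - t[None, :]) <= 0.5 / (N + 1)).astype(float)
    return (w @ y) / w.sum(axis=1)

def has_k_bump(y, k):
    N = len(y)
    return any(
        all(y[i] < y[i + 1] for i in range(j - k, j)) and
        all(y[i] > y[i + 1] for i in range(j, j + k))
        for j in range(k, N - k))

n, N, k, sigma, runs = 100, 50, 3, 1.0, 500
t = np.arange(1, n + 1) / (n + 1)
f = np.sin(3 * np.pi * t)                     # two true bumps
hits = sum(has_k_bump(smooth(f + sigma * rng.standard_normal(n), N), k)
           for _ in range(runs))
print(f"{100 * hits / runs:.1f}% of runs show a {k}-bump")
```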
Table 1. $g(t) = -8(t - \frac12)^2$ (entries in % of the 500 runs).

σ = 0.1
          k=1     k=2     k=3     k=4    k=5    k=6    k=7    k=8    k=9    k=10   k=11   k=12
N = 100    0.0     6.6    22.8     1.2
N = 50     0.0    42.4     8.4     0.4
N = 25    11.0    52.4    23.4    15.4   12.2   11.4   11.2   11.2   11.0   11.0   11.0    5.6
N = 20    42.0    63.6    46.0    42.8   42.0   42.0   42.0   42.0   36.8
N = 10   100.0   100.0   100.0   100.0
N = 5    100.0   100.0
N = 4    100.0

σ = 0.5
          k=1     k=2     k=3     k=4    k=5
N = 100    0.0     2.2    29.4     2.4
N = 50     0.0    20.8    19.2     1.4    0.2
N = 25     0.0    43.4     9.0     1.0
N = 20     0.0    46.6     9.0     2.0    0.8
N = 10    50.4    65.4    51.8    45.8
N = 5     99.4    95.8
N = 4    100.0

σ = 1
          k=1     k=2     k=3     k=4
N = 100    0.0     2.0    29.6     2.6
N = 50     0.0    18.4    17.0     1.2
N = 25     0.0    41.4     7.4     0.6
N = 20     0.0    44.6     7.4     1.4
N = 10    11.6    46.8    16.0     7.8
N = 5     93.2    74.6
N = 4    100.0
Table 2. $g(t) = \sin(3\pi t)$ (entries in % of the 500 runs).

σ = 0.1
          k=1     k=2     k=3     k=4    k=5    k=6    k=7    k=8
N = 100    0.0    24.4     2.6
N = 50     0.0    27.8     4.0     1.6    1.0    0.6    0.2    0.2
N = 25    85.8    92.0    91.4    30.8
N = 20    98.4    98.4    65.6
N = 10   100.0
N = 5      0.6

σ = 0.5
          k=1     k=2     k=3
N = 100    0.0     6.6     4.6
N = 50     0.0    31.2     1.4
N = 25     0.2    24.0     2.2
N = 20     6.6    31.8     7.2
N = 10    98.4     0.6

σ = 1
          k=1     k=2     k=3
N = 100    0.0     7.2     5.2
N = 50     0.0    34.6     1.2
N = 25     0.0    24.4     0.4
N = 20     0.2    21.0     1.0
N = 10    77.2     2.6
The simulations indicate how N and k should be chosen to detect bumps of the regression function, for functions with shapes similar to those considered here. For a sample size of 100, to locate a central bump, one should take N = 5 and k = 1, while for two 'side' bumps, one should take N = 10 and k = 1.
References

Arratia, R., L. Goldstein and L. Gordon (1988), Two moments suffice for Poisson approximations: the Chen-Stein method, Ann. Probab. 17, 9-25.
Chen, L.H.Y. (1975), Poisson approximation for dependent trials, Ann. Probab. 3, 534-545.
Chung, K.L. (1974), A Course in Probability Theory (Academic Press, New York).
Good, I.J. and R.A. Gaskins (1980), Density estimation and bump-hunting by the penalized likelihood method exemplified by scattering and meteorite data, J. Amer. Statist. Assoc. 75, 42-56.
Müller, H.G. (1985), Kernel estimators of zeros and of location and size of extrema of regression functions, Scand. J. Statist. 12, 221-232.
Schluter, D. (1988), Estimating the form of natural selection on a quantitative trait, Evolution 42, 849-861.
Silverman, B.W. (1981), Using kernel density estimates to investigate multimodality, J. Roy. Statist. Soc. Ser. B 43, 97-99.
Silverman, B.W. (1983), Some properties of a test for multimodality based on kernel density estimates, London Math. Soc. Lecture Note Ser. 79 (Cambridge University Press, Cambridge).