Statistics & Probability Letters 14 (1992) 119-128
North-Holland
27 May 1992

Limit theorems for the simplicial depth

Lutz Dümbgen
Institut für angewandte Mathematik, Heidelberg, Germany

Received July 1990
Revised July 1991
Abstract: The large sample behavior and a robustness property of the simplicial depth for multivariate data, introduced by R. Liu (1990), are studied. A functional CLT is derived and applied to an L-statistic, which estimates the center of a symmetric smooth distribution on d-dimensional Euclidean space.

Keywords: Simplicial depth, Vapnik-Cervonenkis class, empirical process, robustness, multivariate L-statistic.
1. Introduction

Let $F$ be a probability distribution on $\mathbb{R}^d$, $d \ge 2$. Liu (1990) introduces the simplicial depth of a point $y \in \mathbb{R}^d$ with respect to $F$,

$$D(y, F) := \Pr\{y \in \mathrm{ch}^o(X_1, X_2, \ldots, X_{d+1})\}, \qquad (X_1, X_2, \ldots, X_{d+1}) \sim F^{d+1},$$

where $\mathrm{ch}^o(x_1, x_2, x_3, \ldots)$ denotes the interior of the convex hull¹ of $x_1, x_2, x_3, \ldots$. This quantity $D(y, F)$ is a measure of 'centrality' of the point $y$ with respect to $F$. For instance one can define the simplicial median $M(F)$ of $F$ to be a point with maximal depth $D(\cdot, F)$. Another possible application is to extend L-functionals for univariate distributions, e.g. the trimmed mean, to higher dimensions. The simplest way is to use weighted averages such as

$$L(F) := \int y\, W(D(y, F))\, F(dy) \Big/ \int W(D(y, F))\, F(dy),$$

with a suitable weight function $W : [0, 1] \to [0, 1]$. As for the connection to the univariate case note that for a continuous c.d.f. $F$ on the line the above definition yields $D(y, F) = 2F(y)(1 - F(y))$. Hence the median of $F$ is a point maximizing $D(\cdot, F)$, and the $\alpha$-trimmed mean of $F$,

$$(1 - \alpha)^{-1} \int_{F^{-1}(\alpha/2)}^{F^{-1}(1 - \alpha/2)} y\, F(dy),$$

is just $L(F)$ with $W(r) := I\{r \ge \alpha(1 - \alpha/2)\}$.

Let us briefly describe some of Liu's (1990) results. An important property of $D(\cdot, \cdot)$ is its affine equivariance: if $A$ is an affine transformation of $\mathbb{R}^d$, then

$$D(Ay, \mathcal{L}_{X \sim F}(AX)) = D(y, F) \quad \text{for all } y \in \mathbb{R}^d.$$

Correspondence to: Prof. Lutz Dümbgen, Inst. f. angew. Mathematik, INF 294, 69 Heidelberg, Germany.

Partly supported by the Deutsche Forschungsgemeinschaft.

¹ Liu (1990) used the (closed) convex hull rather than its interior. For smooth distributions F this makes no difference, and using the interior simplifies some arguments; see Dümbgen (1990) for the technical details without this modification.

0167-7152/92/$05.00 © 1992 - Elsevier Science Publishers B.V. All rights reserved
Hence location parameters such as $M(F)$ or $L(F)$ above are affine equivariant. The function $D(y, F)$ tends to 0 as $|y| \to \infty$. If $F$ is smooth in the sense that

$$F(H) = 0 \quad \text{for arbitrary hyperplanes } H \subset \mathbb{R}^d, \tag{1}$$

then $D(\cdot, F)$ is continuous. In addition let $F$ be symmetric around $\mu \in \mathbb{R}^d$, that means $\mathcal{L}_{X \sim F}(X) = \mathcal{L}_{X \sim F}(2\mu - X)$. Then $D(\mu, F) = 2^{-d}$ and $D(\mu + ry, F)$ is monotone decreasing in $r \ge 0$ for arbitrary $y \in \mathbb{R}^d$. In particular $M(F) = \mu$ and, by equivariance, $L(F) = \mu$.

In Section 2 it is shown that $F \mapsto D(\cdot, F)$ is continuous in a certain sense. In particular $D(\cdot, \hat F_n)$ is a uniformly consistent estimator for $D(\cdot, F)$, where $\hat F_n$ is the empirical distribution of $n$ independent random variables $X_1, X_2, \ldots, X_n$ with distribution $F$. Moreover, if $F$ satisfies (1) and $F'$ is any other distribution on $\mathbb{R}^d$, then $D(\cdot, F')$ tends uniformly to $D(\cdot, F)$ whenever $F'$ converges weakly to $F$. In this sense the simplicial depth $D(\cdot, \cdot)$ is robust. In Section 3 a functional CLT for $\sqrt{n}\,(D(\cdot, \hat F_n) - D(\cdot, F))$ is derived under assumption (1) on $F$. This CLT is applied in Section 4 to derive the asymptotic normality of $L(\hat F_n)$ for smooth $W$.
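As a concrete illustration of the definition, the empirical depth in the plane ($d = 2$) can be computed by brute force: count the triangles spanned by sample points whose interior contains $y$. The following is a minimal sketch and not an efficient algorithm. The names `cross`, `in_open_triangle` and `simplicial_depth_2d` are illustrative choices, and the count is normalized over unordered triples of distinct sample points.

```python
from itertools import combinations


def cross(o, a, b):
    """Cross product of the planar vectors (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_open_triangle(y, a, b, c):
    """True if y lies in the interior of the triangle (a, b, c).

    Points on an edge and degenerate (collinear) triangles give False,
    matching the open convex hull used in the text.
    """
    d1, d2, d3 = cross(a, b, y), cross(b, c, y), cross(c, a, y)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def simplicial_depth_2d(y, points):
    """Fraction of triangles with distinct sample vertices that contain y."""
    triples = list(combinations(points, 3))
    hits = sum(in_open_triangle(y, a, b, c) for a, b, c in triples)
    return hits / len(triples)


# Four sample points; two of the four possible triangles contain the origin.
sample = [(0, 2), (-2, -1), (2, -1), (5, 5)]
depth_at_origin = simplicial_depth_2d((0, 0), sample)
```

For this sample `depth_at_origin` equals 0.5. For the four points $(\pm 1, 0)$, $(0, \pm 1)$ the origin lies on an edge of every triangle, so its depth with respect to open triangles is 0.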
2. Consistency and robustness
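For any second distribution $F'$ on $\mathbb{R}^d$ there is a useful representation of the function $D(\cdot, F') - D(\cdot, F)$, which relates it to the function $((F' - F)(A) : A \in \mathcal{A})$ with a simple class $\mathcal{A}$ of subsets of $\mathbb{R}^d$. At first we have to introduce some notation: for $y \in \mathbb{R}^d$ and $x_1, x_2, \ldots, x_{d+1} \in \mathbb{R}^d$ let $\tilde v := (x_1, x_2, \ldots, x_{d+1})$, $v := (x_1, x_2, \ldots, x_d)$, and $A(y, v) := \{x \in \mathbb{R}^d : y \in \mathrm{ch}^o(v, x)\}$. Then, by the symmetry of $I\{y \in \mathrm{ch}^o(\tilde v)\}$ as a function of $\tilde v$, one can write

$$\begin{aligned}
D(y, F') - D(y, F) &= \int I\{y \in \mathrm{ch}^o(\tilde v)\}\, (F'^{\,d+1} - F^{d+1})(d\tilde v) \\
&= \int I\{y \in \mathrm{ch}^o(\tilde v)\} \sum_{k=0}^{d} F'^{\,k} F^{d-k} (F' - F)(d\tilde v) \\
&= \int (F' - F)(A(y, v)) \sum_{k=0}^{d} F'^{\,k} F^{d-k}(dv).
\end{aligned} \tag{2}$$

Now consider the sets $A(y, v)$. One may write

$$\begin{aligned}
A(y, v) &= \{x \in \mathbb{R}^d : 0 \in \mathrm{ch}^o(x_1 - y, x_2 - y, \ldots, x_d - y, x - y)\} \\
&= y + A'(x_1 - y, x_2 - y, \ldots, x_d - y),
\end{aligned}$$

where generally $A'(v) := \{x \in \mathbb{R}^d : 0 \in \mathrm{ch}^o(v, x)\}$. One can easily show that

$$A'(v) = -\,\text{interior of } \Bigl\{ \sum_{i=1}^{d} \lambda_i x_i : \lambda_1, \lambda_2, \ldots, \lambda_d > 0 \Bigr\}.$$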
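The cone description of $A(y, v)$ can be checked numerically in the plane. The sketch below compares two membership tests for $x \in A(y, (x_1, x_2))$: the definition (is $y$ inside the open triangle spanned by $x_1, x_2, x$?) and the shifted open cone $y - \{\lambda_1 (x_1 - y) + \lambda_2 (x_2 - y) : \lambda_1, \lambda_2 > 0\}$. All function names are hypothetical helpers introduced only for this illustration, and the case $d = 2$ is an assumption made to keep the code short.

```python
def cross2(a, b):
    """Planar cross product a[0]*b[1] - a[1]*b[0]."""
    return a[0] * b[1] - a[1] * b[0]


def sub(a, b):
    """Componentwise difference a - b of two planar points."""
    return (a[0] - b[0], a[1] - b[1])


def in_A_by_triangle(x, y, x1, x2):
    """x in A(y, (x1, x2)) by definition: y in the open triangle (x1, x2, x)."""
    d1 = cross2(sub(x2, x1), sub(y, x1))
    d2 = cross2(sub(x, x2), sub(y, x2))
    d3 = cross2(sub(x1, x), sub(y, x))
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def in_A_by_cone(x, y, x1, x2):
    """x in y - {l1*(x1 - y) + l2*(x2 - y): l1, l2 > 0}."""
    a, b, t = sub(x1, y), sub(x2, y), sub(y, x)  # t = -(x - y)
    det = cross2(a, b)
    if det == 0:  # x1 - y and x2 - y linearly dependent: the set is empty
        return False
    l1 = cross2(t, b) / det  # Cramer's rule for l1*a + l2*b = t
    l2 = cross2(a, t) / det
    return l1 > 0 and l2 > 0


# Compare both tests on an integer grid for one configuration.
y, x1, x2 = (0, 0), (1, 0), (0, 1)
grid = [(i, j) for i in range(-3, 4) for j in range(-3, 4)]
agree = all(in_A_by_triangle(x, y, x1, x2) == in_A_by_cone(x, y, x1, x2) for x in grid)
```

With $y = 0$, $x_1 = (1, 0)$ and $x_2 = (0, 1)$ the set $A(y, v)$ is the open negative quadrant, an intersection of two open halfplanes, and both tests agree on every grid point.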
The latter set is nonvoid, if and only if $x_1, x_2, \ldots, x_d$ are linearly independent, in which case it is the intersection of $d$ open halfspaces. Consequently, with

$$\mathcal{A} := \{\text{intersections of } d \text{ open halfspaces in } \mathbb{R}^d\}$$

and the corresponding Kolmogorov-Smirnov norm $\|\cdot\|_{\mathcal{A}}$, the representation (2) yields:

Theorem 1. For arbitrary distributions $F$, $F'$ on $\mathbb{R}^d$,

$$\|D(\cdot, F') - D(\cdot, F)\|_\infty \le (d + 1)\, \|F' - F\|_{\mathcal{A}}. \qquad \Box$$

The class $\mathcal{A}$ is a Vapnik-Cervonenkis class, which implies that

$$\|\hat F_n - F\|_{\mathcal{A}} = o(1) \quad \text{almost surely}, \tag{3}$$

see Pollard (1984), Chapter II. In addition a CLT holds for the normalized empirical process $B_n := \sqrt{n}\,(\hat F_n - F)$:

The processes $(B_n(A) : A \in \mathcal{A})$ converge in distribution to a centered Gaussian process $(B(A) : A \in \mathcal{A})$ having bounded, uniformly continuous paths with respect to the pseudodistance $\rho(A, A') := F(A \,\triangle\, A')$, (4)

see Pollard (1984), Chapters IV and VII.5. These two facts and Theorem 1 together show that $D(\cdot, \hat F_n)$ is uniformly consistent:

Corollary 1.

$$\|D(\cdot, \hat F_n) - D(\cdot, F)\|_\infty = o(1) \quad \text{almost surely}$$

and

$$\|D(\cdot, \hat F_n) - D(\cdot, F)\|_\infty = O_p(1/\sqrt{n}).$$
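For completeness, the short argument behind Theorem 1 can be spelled out. It uses only that every set $A(y, v)$ is either empty or an intersection of $d$ open halfspaces, and that each product $F'^{\,k} F^{d-k}$ is a probability measure on $(\mathbb{R}^d)^d$. This is a sketch of the routine step that the representation (2) leaves to the reader:

```latex
\begin{align*}
|D(y,F') - D(y,F)|
  &= \Bigl| \int (F'-F)(A(y,v)) \sum_{k=0}^{d} F'^{\,k} F^{d-k}(dv) \Bigr| \\
  &\le \sum_{k=0}^{d} \int \bigl| (F'-F)(A(y,v)) \bigr| \, F'^{\,k} F^{d-k}(dv) \\
  &\le (d+1)\, \|F'-F\|_{\mathcal{A}} .
\end{align*}
```

Taking the supremum over $y$ gives the stated bound, and the factor $d + 1$ is simply the number of summands.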
Another consequence of Theorem 1 is the robustness of $D(\cdot, \cdot)$:

Corollary 2. If $F$ satisfies (1), then

$$\|D(\cdot, F') - D(\cdot, F)\|_\infty \to 0 \quad \text{whenever } F' \to F \text{ weakly}.$$

With a simple compactness argument one can show that (1) implies

$$\sup_{H \in \mathcal{H}} F(U(H, \delta)) \to 0 \quad \text{as } \delta \downarrow 0, \tag{5}$$

where $\mathcal{H}$ is the family of hyperplanes in $\mathbb{R}^d$, and

$$U(S, \delta) := \{x \in \mathbb{R}^d : \mathrm{dist}(x, S) < \delta\}, \qquad \mathrm{dist}(x, S) := \inf_{y \in S} |x - y|$$

for subsets $S$ of $\mathbb{R}^d$. Since the boundary of any set $A \in \mathcal{A}$ is contained in a union of $d$ hyperplanes, it follows from (5) and Theorem 2 of Billingsley and Topsoe (1967) that $\|F' - F\|_{\mathcal{A}} \to 0$ whenever $F' \to F$ weakly. $\Box$
3. A central limit theorem

Here and in Section 4 we generally assume that $F$ meets (1). We consider the process

$$G_n := (G_n(y) : y \in K), \qquad G_n(y) := \sqrt{n}\,(D(y, \hat F_n) - D(y, F)),$$

where $K := \mathbb{R}^d \cup \{\infty\}$ is the Alexandrov compactification of $\mathbb{R}^d$, and ($D(\infty, \cdot) := 0$). Using (2) one can write

$$G_n(y) = \int B_n(A(y, v)) \sum_{k=0}^{d} \hat F_n^{\,k} F^{d-k}(dv) \tag{6}$$

($A(\infty, \cdot) := \emptyset$). The result consists of two parts: One can approximate this process $G_n$ uniformly by the usual empirical process

$$G_n^*(y) := (d + 1) \int B_n(A(y, v))\, F^d(dv) = \int g(y, x)\, B_n(dx) \qquad (y \in K),$$

where

$$g(y, x) := (d + 1)\, F^d\{v : x \in A(y, v)\} = (d + 1) \Pr\{y \in \mathrm{ch}^o(X_1, \ldots, X_d, x)\}.$$

Then a CLT holds for $G_n$ and $G_n^*$:

Theorem 2. If $F$ meets (1), then

$$\|G_n^* - G_n\|_\infty = o_p(1).$$

Further the processes $G_n^{(*)}$ converge in distribution to a centered Gaussian process $(G(y) : y \in K)$ with continuous paths and covariances given by

$$E(G(y) G(y')) = \mathrm{Cov}_{X \sim F}(g(y, X), g(y', X)). \tag{7}$$
'Convergence in distribution' means the following: There is a sequence of stochastic processes $\tilde G_n = (\tilde G_n(y) : y \in K)$ with continuous paths, where $\tilde G_n$ and $G_n^{(*)}$ are defined on the same probability space, such that $\|\tilde G_n - G_n^{(*)}\|_\infty$ tends to 0 in (outer) probability, and $\tilde G_n$ converges in distribution to $G$ in the space of continuous functions on $K$. By an approximation argument similar to the one in the proof of Theorem 21 in Pollard (1984), Chapter VII.5, one can show that the weak convergence of the finite dimensional distributions, (9) below, and the stochastic equicontinuity, (10) below, are sufficient for Theorem 2 to hold.

Proof of Theorem 2. Since the functions $\tilde v \mapsto I\{y \in \mathrm{ch}^o(\tilde v)\}$, $y \in \mathbb{R}^d$, are uniformly bounded,

$$\|D(\cdot, \hat F_n) - D_n\|_\infty \le O(1/n),$$

with a nonrandom $O(1/n)$, where $D_n(y)$ is the U-statistic

$$D_n(y) := \binom{n}{d+1}^{-1} \sum_{1 \le i(1) < \cdots < i(d+1) \le n} I\{y \in \mathrm{ch}^o(X_{i(1)}, \ldots, X_{i(d+1)})\}.$$
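The nonrandom $O(1/n)$ bound can be made explicit in the plane. A triple with a repeated index spans a degenerate triangle with empty interior, so $D(y, \hat F_n)$ (all $n^3$ ordered triples) equals the U-statistic $D_n(y)$ times $(1 - 1/n)(1 - 2/n)$. The sketch below checks this identity by brute force on a small, arbitrarily chosen point set; the function names and the restriction to $d = 2$ are assumptions made for illustration only.

```python
from itertools import combinations, product


def cross(o, a, b):
    """Cross product of the planar vectors (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_open_triangle(y, a, b, c):
    """True if y is strictly inside the triangle; False if degenerate."""
    d1, d2, d3 = cross(a, b, y), cross(b, c, y), cross(c, a, y)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def depth_v_statistic(y, points):
    """D(y, F_n): average over all n**3 ordered triples, repetitions allowed."""
    n = len(points)
    hits = sum(in_open_triangle(y, a, b, c) for a, b, c in product(points, repeat=3))
    return hits / n**3


def depth_u_statistic(y, points):
    """D_n(y): average over the C(n, 3) triples of distinct indices."""
    triples = list(combinations(points, 3))
    return sum(in_open_triangle(y, a, b, c) for a, b, c in triples) / len(triples)


points = [(0, 2), (-2, -1), (2, -1), (5, 5), (-3, 4), (1, -4)]
n = len(points)
v_stat = depth_v_statistic((0, 0), points)
u_stat = depth_u_statistic((0, 0), points)
shrink = (1 - 1 / n) * (1 - 2 / n)  # n(n-1)(n-2) / n**3
```

Here `v_stat` agrees with `shrink * u_stat` up to rounding, so the two depths differ by at most $1 - (1 - 1/n)(1 - 2/n) \le 3/n$ uniformly in $y$.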
Moreover, the standardized U-statistic $\sqrt{n}\,(D_n(y) - D(y, F))$ is equal to

$$G_n^*(y) + o_p(1),$$

for arbitrary fixed $y \in \mathbb{R}^d$, see Hoeffding (1948). Consequently,

$$G_n^*(y) - G_n(y) = o_p(1) \quad \text{for arbitrary fixed } y \in K. \tag{8}$$

The usual CLT implies that:

The finite dimensional marginal distributions of $G_n^*$ converge weakly to centered Gaussian distributions with covariances given by (7). (9)

Now suppose that the processes $G_n^{(*)}$ are stochastically equicontinuous in the following sense:

$$\limsup_{n \to \infty} \Pr\Bigl\{ \sup_{y, y' \in K,\ m(y, y') < \delta} |G_n^{(*)}(y') - G_n^{(*)}(y)| > \eta \Bigr\} \to 0 \quad \text{as } \delta \downarrow 0,\ \forall \eta > 0, \tag{10}$$

where $m(\cdot, \cdot)$ is any metric for $K$. Then the first assertion in Theorem 2 follows easily from (8) and (10). The second part is a consequence of (9) and (10). Assertion (10) itself is a consequence of the following result, which is proved at the end of this section.

Lemma.
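Let $G^*$ be an arbitrary function on $K$ of the form

$$G^*(y) = \int b(A(y, v)) \Bigl( \bigotimes_{i=1}^{d} F_i \Bigr)(dv),$$

where $F_1, F_2, \ldots, F_d$ are probability measures on $\mathbb{R}^d$, and $b$ is a bounded function on $\mathcal{A}$ with $b(\emptyset) = 0$. Then the following inequalities hold for arbitrary $R, \varepsilon > 0$, where $\omega(b, \cdot)$ denotes the modulus of continuity of $b$ with respect to $\rho(\cdot, \cdot)$:

for all $y \in \mathbb{R}^d \setminus U(0, R)$:

$$|G^*(y)| \le \|b\|_{\mathcal{A}} \sum_{i=1}^{d} F_i(\mathbb{R}^d \setminus U(0, R)) + \omega(b, F(\mathbb{R}^d \setminus U(0, R))); \tag{11}$$

for all $y, y' \in U(0, R)$:

$$|G^*(y') - G^*(y)| \le 2 \|b\|_{\mathcal{A}} \sum_{i=1}^{d} F_i(\mathbb{R}^d \setminus U(0, R)) + \omega(b, \alpha) + 2 \|b\|_{\mathcal{A}} \sum_{i=1}^{d} \sup_{H \in \mathcal{H}} F_i(U(H, \gamma)), \tag{12}$$

where

$$\alpha := 2 F(\mathbb{R}^d \setminus U(0, R)) + 2d \sup_{H \in \mathcal{H}} F(U(H, \beta)), \qquad \beta := |y' - y|\,(1 + (2R)/\varepsilon), \qquad \gamma := ((4R)^{d-1} \varepsilon)^{1/d}.$$

If we apply the Lemma to the processes $G_n$ and $G_n^*$, we have to take $b = B_n$ and $F_i \in \{F, \hat F_n\}$. The CLT (4) for $B_n$ implies that

$$\limsup_{n \to \infty} \Pr\{\omega(B_n, \delta) > \eta\} \to 0 \quad \text{as } \delta \downarrow 0,\ \forall \eta > 0.$$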
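Further, by (3) and the Law of Large Numbers,

$$\sup_{H \in \mathcal{H}} \hat F_n(U(H, \delta)) = \sup_{H \in \mathcal{H}} F(U(H, \delta)) + o_p(1),$$

$$\hat F_n(\mathbb{R}^d \setminus U(0, R)) = F(\mathbb{R}^d \setminus U(0, R)) + o_p(1),$$

and the limits on the right hand sides are arbitrarily small for $\delta$ sufficiently small and $R$ sufficiently large; see (5). Together with the Lemma this yields (10). $\Box$

Proof of the Lemma. First of all, if $y \in \mathbb{R}^d \setminus U(0, R)$ and $v \in U(0, R)^d$, then the set $A(y, v)$ is a subset of $\mathbb{R}^d \setminus U(0, R)$. Thus

$$|G^*(y)| = \Bigl| \int \bigl( b(A(y, v)) - b(\emptyset) \bigr) \Bigl( \bigotimes_{i=1}^{d} F_i \Bigr)(dv) \Bigr| \le \|b\|_{\mathcal{A}} \sum_{i=1}^{d} F_i(\mathbb{R}^d \setminus U(0, R)) + \omega(b, F(\mathbb{R}^d \setminus U(0, R))),$$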
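which is (11). In order to prove (12) let $y$, $y'$ be arbitrary points in $U(0, R)$. One can easily show that

$$\begin{aligned}
|G^*(y') - G^*(y)| \le{} & 2 \|b\|_{\mathcal{A}} \sum_{i=1}^{d} F_i(\mathbb{R}^d \setminus U(0, R)) + \omega(b, \alpha') \\
& + \|b\|_{\mathcal{A}} \Bigl( \bigotimes_{i=1}^{d} F_i \Bigr) \{v \in U(0, R)^d : \mathrm{dist}(y, \mathrm{ch}(v)) < \varepsilon\} \\
& + \|b\|_{\mathcal{A}} \Bigl( \bigotimes_{i=1}^{d} F_i \Bigr) \{v \in U(0, R)^d : \mathrm{dist}(y', \mathrm{ch}(v)) < \varepsilon\},
\end{aligned} \tag{13}$$

where

$$\alpha' := \sup_{v \in [\mathbb{R}^d]^d,\ \mathrm{dist}(y, \mathrm{ch}(v)) \ge \varepsilon,\ \mathrm{dist}(y', \mathrm{ch}(v)) \ge \varepsilon} F(A(y, v) \,\triangle\, A(y', v));$$

$\mathrm{ch}(\cdot)$ denotes the convex hull. The term $\alpha'$ in (13) may be bounded as follows. For $v$ as in the definition of $\alpha'$,

$$F(A(y, v) \setminus A(y', v)) \le F(\mathbb{R}^d \setminus U(0, R)) + d \sup_{H \in \mathcal{H}} F(U(H, \beta)), \tag{14}$$

where $\beta := |y' - y|\,(1 + (2R)/\varepsilon)$. The last two summands on the right hand side of (13) may be bounded as follows:

$$\Bigl( \bigotimes_{i=1}^{d} F_i \Bigr) \{v \in U(0, R)^d : \mathrm{dist}(y, \mathrm{ch}(v)) < \varepsilon\} \le \sum_{i=1}^{d} \sup_{H \in \mathcal{H}} F_i(U(H, \gamma)), \tag{15}$$

where $\gamma := [(4R)^{d-1} \varepsilon]^{1/d}$. These two bounds (14) and (15), applied to (13), yield (12).

Proof of (14). Let $x \in A(y, v)$, say

$$x = y - \sum_{i=1}^{d} \lambda_i (x_i - y) \qquad (\lambda_1, \lambda_2, \ldots, \lambda_d > 0).$$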
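Then

$$|x| \ge \mathrm{dist}(y, \mathrm{ch}(v)) \sum_{i=1}^{d} \lambda_i - |y| > \varepsilon \sum_{i=1}^{d} \lambda_i - R.$$

The distance between $x$ and $x' := y' - \sum_{i=1}^{d} \lambda_i (x_i - y')$ is equal to $|y' - y|\,(1 + \sum_{i=1}^{d} \lambda_i)$. If the vectors $x_1 - y', \ldots, x_d - y'$ are linearly independent, then the point $x'$ lies in $A(y', v)$. In this case our bounds show that

$$A(y, v) \setminus A(y', v) \subset (\mathbb{R}^d \setminus U(0, R)) \cup \bigl( U(A(y', v), \beta) \setminus A(y', v) \bigr),$$

where $\beta$ is the constant in (14). Now $A(y', v)$ is an intersection of $d$ open halfspaces. Hence $U(A(y', v), \beta) \setminus A(y', v)$ is contained in the union of the $\beta$-neighborhoods of $d$ hyperplanes. If the $x_i - y'$ are linearly dependent, then $A(y', v) = \emptyset$, and $x'$ lies in a fixed hyperplane $H$ containing the $x_i - y'$. Hence

$$A(y, v) \subset (\mathbb{R}^d \setminus U(0, R)) \cup U(H, \beta),$$

and in any case (14) holds.

Proof of (15). Obviously $F_1\{x_1 \in U(0, R) : \mathrm{dist}(y, \{x_1\}) < \gamma\}$ is not greater than $\sup_{H \in \mathcal{H}} F_1(U(H, \gamma))$. Now let $1 \le k < d$ and $v_k = (x_1, \ldots, x_k) \in U(0, R)^k$ such that the distance from $y$ to $\mathrm{ch}(x_1, \ldots, x_k)$ is not less than $\delta > 0$. Let $H$ be a hyperplane containing $y, x_1, \ldots, x_k$, and let $x_{k+1}$ be any point in $U(0, R) \setminus U(H, \gamma)$. Then one can bound $\mathrm{dist}(y, \mathrm{ch}(x_1, \ldots, x_{k+1}))$ as follows: One may write $x_{k+1} = h + r$, where $h$ is the orthogonal projection of $x_{k+1}$ onto $H$. Then, for arbitrary $x \in \mathrm{ch}(x_1, \ldots, x_k)$ and $\lambda \in [0, 1]$,

$$\begin{aligned}
|y - ((1 - \lambda) x + \lambda x_{k+1})|^2 &= |(1 - \lambda)(y - x) + \lambda (y - h)|^2 + |\lambda r|^2 \\
&\ge \bigl( (1 - \lambda) |y - x| - \lambda |y - h| \bigr)^2 + \lambda^2 \gamma^2 \\
&\ge |y - x|^2 \gamma^2 \big/ \bigl( (|y - x| + |y - h|)^2 + \gamma^2 \bigr).
\end{aligned}$$

Since $\delta \le |y - x| < 2R$ and $|y - h|^2 = |y - x_{k+1}|^2 - |r|^2 < (2R)^2 - \gamma^2$, the right hand side of the preceding inequality is not less than $\delta^2 \gamma^2 / (4R)^2$. Hence

$$\mathrm{dist}(y, \mathrm{ch}(x_1, \ldots, x_{k+1})) \ge \delta \gamma / (4R),$$

and

$$\Bigl( \bigotimes_{i=1}^{k+1} F_i \Bigr) \{v_{k+1} \in U(0, R)^{k+1} : \mathrm{dist}(y, \mathrm{ch}(v_{k+1})) < \delta \gamma / (4R)\} \le \Bigl( \bigotimes_{i=1}^{k} F_i \Bigr) \{v_k \in U(0, R)^k : \mathrm{dist}(y, \mathrm{ch}(v_k)) < \delta\} + \sup_{H \in \mathcal{H}} F_{k+1}(U(H, \gamma)).$$

Now (15) follows inductively. $\Box$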
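The last inequality in the chain used in the proof of (15) is a one-variable minimization, and the bound on its denominator is elementary. Writing $a := |y - x|$ and $b := |y - h|$ (abbreviations introduced only for this remark), the steps can be sketched as follows, using $a < 2R$ and $b^2 < (2R)^2 - \gamma^2$:

```latex
\begin{align*}
q(\lambda) &:= \bigl((1-\lambda)a - \lambda b\bigr)^2 + \lambda^2\gamma^2
            = a^2 - 2\lambda a(a+b) + \lambda^2\bigl((a+b)^2 + \gamma^2\bigr), \\
\min_{\lambda \in \mathbb{R}} q(\lambda)
  &= q\!\left(\frac{a(a+b)}{(a+b)^2+\gamma^2}\right)
   = a^2 - \frac{a^2(a+b)^2}{(a+b)^2+\gamma^2}
   = \frac{a^2\gamma^2}{(a+b)^2+\gamma^2}, \\
(a+b)^2 + \gamma^2 &\le 2a^2 + 2b^2 + \gamma^2
   < 2(2R)^2 + 2\bigl((2R)^2 - \gamma^2\bigr) + \gamma^2 < (4R)^2 .
\end{align*}
```

Together with $a \ge \delta$ this gives the lower bound $\delta^2 \gamma^2 / (4R)^2$ for the squared distance.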
4. Application to an L-statistic
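As an example we apply the results of Sections 2 and 3 to the L-statistic $L(\hat F_n)$, where $L(\cdot)$ is defined in Section 1. Our assumptions on the weight function $W$ are:

$$W \text{ is continuously differentiable with derivative } w; \tag{16}$$

$$\exists\, r_0 > 0 \text{ such that } W(r) = 0 \text{ for } 0 \le r \le r_0 \text{ and } W(r) > 0 \text{ for } r > r_0; \tag{17}$$

$$\int W(D(y, F))\, F(dy) > 0. \tag{18}$$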
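To make the object concrete, here is a minimal sketch of the depth-weighted mean $L(\hat F_n)$ in the plane. The weight `smooth_weight` is one hypothetical choice satisfying (16) and (17), namely $W(r) = ((r - r_0)_+)^2$ with an arbitrarily picked $r_0 = 0.01$; the function names and the brute-force depth computation are assumptions for illustration, not the author's procedure.

```python
from itertools import combinations


def cross(o, a, b):
    """Cross product of the planar vectors (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_open_triangle(y, a, b, c):
    """True if y is strictly inside the triangle (a, b, c)."""
    d1, d2, d3 = cross(a, b, y), cross(b, c, y), cross(c, a, y)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def empirical_depth(y, points):
    """D(y, F_n) for d = 2; triples with repeated indices never contain y."""
    n = len(points)
    hits = sum(in_open_triangle(y, a, b, c) for a, b, c in combinations(points, 3))
    return 6 * hits / n**3  # each unordered triple appears 3! times


def smooth_weight(r, r0=0.01):
    """Illustrative W: zero on [0, r0], positive and C^1 for r > r0."""
    return max(r - r0, 0.0) ** 2


def l_statistic(points, weight=smooth_weight):
    """Depth-weighted average of the sample points, L(F_n)."""
    w = [weight(empirical_depth(p, points)) for p in points]
    total = sum(w)
    if total == 0:
        raise ValueError("all weights vanish; the analogue of (18) fails")
    return tuple(sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(2))


# A sample that is symmetric around (1, 2).
center = (1.0, 2.0)
offsets = [(3, 0), (0, 3), (1, 0.5), (-0.5, 1)]
sample = [(center[0] + s * ox, center[1] + s * oy) for ox, oy in offsets for s in (1, -1)]
estimate = l_statistic(sample)
```

Because the sample is symmetric around `center`, mirrored points receive equal depths and equal weights, so `estimate` reproduces the center up to rounding. The cost is of order $n^4$, which is acceptable only for small samples.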
The results mentioned in the introduction indicate that $r_0$ in (17) should be less than $2^{-d}$. For if $F$ is symmetric around $\mu \in \mathbb{R}^d$ and has a continuous density $f$ with $f(\mu) > 0$, then (18) holds automatically. From the robustness result of Section 2 one can easily deduce that $L(\cdot)$ is robust, too:

$$\text{Under (1) and (16)-(18),} \quad L(F') \to L(F) \quad \text{as } F' \to F \text{ weakly}. \tag{19}$$

Further, one can apply the CLT of Section 3 to prove asymptotic normality of $L(\hat F_n)$:

Theorem 3. If (1) and (16)-(18) hold, then the distribution of $\sqrt{n}\,(L(\hat F_n) - L(F))$ converges weakly to a d-variate Gaussian distribution with mean 0 and covariance matrix

$$\Sigma := \mathrm{Cov}_{X \sim F_c}(K(X)),$$

where $F_c := \mathcal{L}_{X \sim F}(X - L(F))$,

$$K(x) := \bigl( E(x) + x\, W(D(x, F_c)) \bigr) \Big/ \int W(D(x', F_c))\, F_c(dx'),$$

$$E(x) := \int y\, w(D(y, F_c))\, g(y, x)\, F_c(dy).$$
The covariance matrix $\Sigma$ in Theorem 3 is difficult to treat analytically. If one is interested in confidence ellipsoids for $L(F)$, a possible way out is to use a bootstrap approximation for the distribution of $\sqrt{n}\,(L(\hat F_n) - L(F))$. For Corollary 1, Theorem 2 and Theorem 3 remain valid, if the $n$ random variables defining $\hat F_n$ have distribution $F_n$, and if $F_n$ converges weakly to a distribution $F$ satisfying (1) and (16)-(18).

Proof of Theorem 3. By equivariance we may assume without loss of generality that $L(F) = 0$ and thus $F_c = F$. One can write

$$\begin{aligned}
\sqrt{n}\, L(\hat F_n) \int W(D(x, \hat F_n))\, \hat F_n(dx) &= \sqrt{n} \int x\, W(D(x, \hat F_n))\, \hat F_n(dx) \\
&= \sqrt{n} \Bigl( \int x\, W(D(x, \hat F_n))\, \hat F_n(dx) - \int x\, W(D(x, F))\, F(dx) \Bigr) \\
&= \int x \sqrt{n}\, \bigl( W(D(x, \hat F_n)) - W(D(x, F)) \bigr)\, \hat F_n(dx) + \int x\, W(D(x, F))\, B_n(dx) \\
&= \int x\, w(\eta_n(x))\, G_n(x)\, \hat F_n(dx) + \int x\, W(D(x, F))\, B_n(dx),
\end{aligned}$$
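where $\|\eta_n - D(\cdot, F)\|_\infty \le R_n := \|G_n\|_\infty / \sqrt{n} = O_p(1/\sqrt{n})$. Now we want to show that the integral $\int x\, w(\eta_n(x))\, G_n(x)\, \hat F_n(dx)$ can be approximated by $\int E(x)\, B_n(dx)$, so that

$$\int x\, w(\eta_n(x))\, G_n(x)\, \hat F_n(dx) = \int E(x)\, B_n(dx) + o_p(1). \tag{20}$$

On the one hand,

$$\Bigl| \int x\, \bigl( w(\eta_n(x)) - w(D(x, F)) \bigr)\, G_n(x)\, \hat F_n(dx) \Bigr| \le \|G_n\|_\infty \sup_{r, r' \in [0, 1],\ |r - r'| \le R_n} |w(r') - w(r)| \int |x|\, I\{D(x, F) + R_n > r_0\}\, \hat F_n(dx) = o_p(1).$$

Further the integral $\int x\, w(D(x, F))\, G_n(x)\, \hat F_n(dx)$ is equal to

$$\int x\, w(D(x, F))\, G_n(x)\, F(dx) + o_p(1) = \int x\, w(D(x, F))\, G_n^*(x)\, F(dx) + o_p(1) = \int E(x)\, B_n(dx) + o_p(1).$$

This is a consequence of the tightness of the sequence $(G_n)$: For every $\varepsilon > 0$ there are finitely many continuous functions $g_1, g_2, \ldots, g_L$ on $K$ such that

$$\limsup_{n \to \infty} \Pr\Bigl\{ \min_{j = 1, \ldots, L} \|G_n - g_j\|_\infty > \varepsilon \Bigr\} \le \varepsilon.$$

The functions $x \mapsto x\, w(D(x, F))\, g_j(x)$ are bounded and continuous on $\mathbb{R}^d$. Hence

$$\Bigl| \int x\, w(D(x, F))\, G_n(x)\, (\hat F_n - F)(dx) \Bigr| \le 2 \varepsilon \max_{x \in \mathbb{R}^d} |x\, w(D(x, F))| + \max_{j = 1, \ldots, L} \Bigl| \int x\, w(D(x, F))\, g_j(x)\, (\hat F_n - F)(dx) \Bigr| = O(\varepsilon) + o_p(1)$$

with asymptotic probability not less than $1 - \varepsilon$. Letting $\varepsilon \downarrow 0$ yields (20). Similar (or more elementary) considerations show that

$$\int W(D(x, \hat F_n))\, \hat F_n(dx) = \int W(D(x, F))\, F(dx) + o_p(1). \tag{21}$$

Then (20) and (21) together lead to

$$\sqrt{n}\, L(\hat F_n) = \int K(x)\, B_n(dx) + o_p(1), \tag{22}$$

and Theorem 3 follows from the multivariate CLT. $\Box$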
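The bootstrap approximation mentioned after Theorem 3 could be sketched as below. This is a generic nonparametric bootstrap for the second moments of $\sqrt{n}\,(T^* - T)$, where $T$ is a statistic evaluated on the sample and $T^*$ on a resample. The interface (`statistic` as a function from a list of points to a tuple), the number of resamples, and the use of the sample mean as a stand-in statistic are all assumptions made for illustration; with a depth-weighted L-statistic one would pass that function instead and would have to handle resamples on which all weights vanish.

```python
import random


def bootstrap_covariance(statistic, points, n_boot=200, seed=0):
    """Bootstrap estimate of the covariance matrix of sqrt(n) * (T* - T).

    `statistic` maps a list of points to a tuple of floats (hypothetical
    interface); T is its value on `points`, T* its value on a resample.
    """
    rng = random.Random(seed)
    n = len(points)
    base = statistic(points)
    dim = len(base)
    cov = [[0.0] * dim for _ in range(dim)]
    for _ in range(n_boot):
        # Draw n points with replacement from the original sample.
        resample = [points[rng.randrange(n)] for _ in range(n)]
        value = statistic(resample)
        dev = [n ** 0.5 * (value[k] - base[k]) for k in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += dev[i] * dev[j] / n_boot
    return cov


def sample_mean(points):
    """Stand-in statistic used only to exercise the routine."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


data = [(0.0, 1.0), (2.0, -1.0), (1.0, 3.0), (-2.0, 0.5), (4.0, 2.0), (-1.0, -2.0)]
cov_hat = bootstrap_covariance(sample_mean, data)
```

The resulting matrix can serve as a plug-in for $\Sigma$ when forming an approximate confidence ellipsoid; whether this is adequate in a given application is not addressed by this sketch.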
Acknowledgements
While working on the revision of this paper I learned that Arcones and Giné (1991) had worked on U-processes, and their general results cover part of the present material. Nevertheless I think the approach developed here has its own merit, because it is elementary, geometrical and gives some insight into continuity and robustness of the simplicial depth. I would like to thank the referee for his criticisms and useful comments.
References

Arcones, M.A. and E. Giné (1991), Limit theorems for U-processes, Preprint, submitted.
Billingsley, P. and F. Topsoe (1967), Uniformity in weak convergence, Z. Wahrsch. Verw. Gebiete 7, 1-16.
Dümbgen, L. (1990), Limit theorems for the empirical simplicial depth, Preprint 581, SFB 123, Universität Heidelberg.
Hoeffding, W. (1948), A class of statistics with asymptotically normal distribution, Ann. Math. Statist. 19, 293-325.
Liu, R. (1990), On a notion of data depth based on random simplices, Ann. Statist. 18, 405-414.
Pollard, D. (1984), Convergence of Stochastic Processes (Springer, New York).