Limit theorems for the simplicial depth

Statistics & Probability Letters 14 (1992) 119-128
North-Holland
27 May 1992

Lutz Dümbgen
Institut für angewandte Mathematik, Heidelberg, Germany

Received July 1990
Revised July 1991

Abstract: The large sample behavior and a robustness property of the simplicial depth for multivariate data, introduced by R. Liu (1990), are studied. A functional CLT is derived and applied to an L-statistic, which estimates the center of a smooth symmetric distribution on d-dimensional Euclidean space.

Keywords: Simplicial depth, Vapnik-Červonenkis class, empirical process, robustness, multivariate L-statistic.

1. Introduction

Let $F$ be a probability distribution on $\mathbb{R}^d$, $d \ge 2$. Liu (1990) introduces the simplicial depth of a point $y \in \mathbb{R}^d$ with respect to $F$,

$$D(y, F) := \Pr\{ y \in \mathrm{ch}^{\circ}(X_1, X_2, \ldots, X_{d+1}) \}, \qquad (X_1, X_2, \ldots, X_{d+1}) \sim F^{d+1},$$

where $\mathrm{ch}^{\circ}(x_1, x_2, x_3, \ldots)$ denotes the interior of the convex hull ¹ of $x_1, x_2, x_3, \ldots$. This quantity $D(y, F)$ is a measure of 'centrality' of the point $y$ with respect to $F$. For instance one can define the simplicial median $M(F)$ of $F$ to be a point with maximal depth $D(\cdot, F)$. Another possible application is to extend L-functionals for univariate distributions, e.g. the trimmed mean, to higher dimensions. The simplest way is to use weighted averages such as

$$L(F) := \int y\, W(D(y, F))\, F(dy) \Big/ \int W(D(y, F))\, F(dy),$$

with a suitable weight function $W: [0, 1] \to [0, 1]$. As for the connection to the univariate case note that for a continuous c.d.f. $F$ on the line the above definition yields $D(y, F) = 2F(y)(1 - F(y))$. Hence the median of $F$ is a point maximizing $D(\cdot, F)$, and the $\alpha$-trimmed mean of $F$,

$$(1 - \alpha)^{-1} \int_{F^{-1}(\alpha/2)}^{F^{-1}(1 - \alpha/2)} y\, F(dy),$$

is just $L(F)$ with $W(r) := I\{r \ge \alpha(1 - \alpha/2)\}$.

Let us briefly describe some of Liu's (1990) results. An important property of $D(\cdot, \cdot)$ is its affine equivariance: if $A$ is an affine transformation of $\mathbb{R}^d$, then

$$D(Ay, \mathcal{L}_{X \sim F}(AX)) = D(y, F) \quad \text{for all } y \in \mathbb{R}^d.$$

Correspondence to: Prof. Lutz Dümbgen, Inst. f. angew. Mathematik, INF 294, 69 Heidelberg, Germany.

Partly supported by the Deutsche Forschungsgemeinschaft.

¹ Liu (1990) used the (closed) convex hull rather than its interior. For smooth distributions $F$ this makes no difference, and using the interior simplifies some arguments; see Dümbgen (1990) for the technical details without this modification.

0167-7152/92/$05.00 © 1992 - Elsevier Science Publishers B.V. All rights reserved

Hence location parameters such as $M(F)$ or $L(F)$ above are affine equivariant. The function $D(y, F)$ tends to 0 as $|y| \to \infty$. If $F$ is smooth in the sense that

$$F(H) = 0 \quad \text{for arbitrary hyperplanes } H \subset \mathbb{R}^d, \tag{1}$$

then $D(\cdot, F)$ is continuous. In addition let $F$ be symmetric around $\mu \in \mathbb{R}^d$, that means $\mathcal{L}_{X \sim F}(X) = \mathcal{L}_{X \sim F}(2\mu - X)$. Then $D(\mu, F) = 2^{-d}$ and $D(\mu + ry, F)$ is monotone decreasing in $r \ge 0$ for arbitrary $y \in \mathbb{R}^d$. In particular $M(F) = \mu$ and, by equivariance, $L(F) = \mu$.

In Section 2 it is shown that $F \mapsto D(\cdot, F)$ is continuous in a certain sense. In particular $D(\cdot, \hat{F}_n)$ is a uniformly consistent estimator for $D(\cdot, F)$, where $\hat{F}_n$ is the empirical distribution of $n$ independent random variables $X_1, X_2, \ldots, X_n$ with distribution $F$. Moreover, if $F$ satisfies (1) and $F'$ is any other distribution on $\mathbb{R}^d$, then $D(\cdot, F')$ tends uniformly to $D(\cdot, F)$ whenever $F'$ converges weakly to $F$. In this sense the simplicial depth $D(\cdot, \cdot)$ is robust. In Section 3 a functional CLT for $\sqrt{n}\,(D(\cdot, \hat{F}_n) - D(\cdot, F))$ is derived under assumption (1) on $F$. This CLT is applied in Section 4 to derive the asymptotic normality of $L(\hat{F}_n)$ for smooth $W$.
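As an illustration of the definition in the planar case $d = 2$, the following minimal sketch computes the fraction of sample triangles whose open interior contains a point $y$. This is a U-statistic version of the empirical depth and not code from the paper; the function names `cross`, `in_open_triangle` and `simplicial_depth_2d` are hypothetical, and the brute-force enumeration over all triples is an assumption made for clarity, not for efficiency.

```python
from itertools import combinations


def cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_open_triangle(y, a, b, c):
    """True if y lies in the interior of the triangle (a, b, c).

    Points on an edge, and degenerate (collinear) triangles, give False,
    which matches the use of the open convex hull.
    """
    d1, d2, d3 = cross(a, b, y), cross(b, c, y), cross(c, a, y)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def simplicial_depth_2d(y, points):
    """Fraction of triangles spanned by distinct sample points containing y."""
    triples = list(combinations(points, 3))
    if not triples:
        raise ValueError("need at least three sample points")
    hits = sum(in_open_triangle(y, a, b, c) for a, b, c in triples)
    return hits / len(triples)


if __name__ == "__main__":
    square = [(0, 0), (4, 0), (0, 4), (4, 4)]
    print(simplicial_depth_2d((1, 2), square))
```

With the four corners of a square as sample, a point such as (1, 2) lies strictly inside two of the four triangles. A point on a diagonal lies on an edge of some triangles and is not counted for them, which is where the open and the closed convex hull differ for non-smooth distributions.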

2. Consistency and robustness

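For any second distribution $F'$ on $\mathbb{R}^d$ there is a useful representation of the function $D(\cdot, F') - D(\cdot, F)$, which relates it to the function $((F' - F)(A) : A \in \mathcal{A})$ with a simple class $\mathcal{A}$ of subsets of $\mathbb{R}^d$. At first we have to introduce some notation: for $y \in \mathbb{R}^d$ and $x_1, x_2, \ldots, x_{d+1} \in \mathbb{R}^d$ let $\bar{x} := (x_1, x_2, \ldots, x_{d+1})$, $v := (x_1, x_2, \ldots, x_d)$, and $A(y, v) := \{x \in \mathbb{R}^d : y \in \mathrm{ch}^{\circ}(v, x)\}$. Then, by the symmetry of $I\{y \in \mathrm{ch}^{\circ}(\bar{x})\}$ as a function of $\bar{x}$, one can write

$$\begin{aligned}
D(y, F') - D(y, F) &= \int I\{y \in \mathrm{ch}^{\circ}(\bar{x})\}\, \bigl((F')^{d+1} - F^{d+1}\bigr)(d\bar{x}) \\
&= \int I\{y \in \mathrm{ch}^{\circ}(\bar{x})\}\, \sum_{k=0}^{d} (F')^{k} F^{d-k} (F' - F)(d\bar{x}) \\
&= \int (F' - F)(A(y, v))\, \sum_{k=0}^{d} (F')^{k} F^{d-k}(dv).
\end{aligned} \tag{2}$$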
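The telescoping step behind this representation can be checked exactly in a toy setting. The sketch below uses the one-dimensional analogue, which is an assumption made purely for illustration since the paper treats $d \ge 2$: the "simplex" is the open interval between two points, $A(y, x)$ is the open half-line on the far side of $y$ from $x$, and the class of sets consists of open half-lines. Distributions are finitely supported and stored as dictionaries of `Fraction` weights; all function names are hypothetical.

```python
from fractions import Fraction


def mass_below(dist, y):
    return sum(p for x, p in dist.items() if x < y)


def mass_above(dist, y):
    return sum(p for x, p in dist.items() if x > y)


def depth_1d(y, dist):
    """Pr{y lies strictly between X1 and X2} for X1, X2 i.i.d. from dist."""
    return 2 * mass_below(dist, y) * mass_above(dist, y)


def diff_on_A(y, x, f_new, f_old):
    """(F' - F)(A(y, x)), where A(y, x) = {x' : y strictly between x and x'}."""
    if x < y:
        return mass_above(f_new, y) - mass_above(f_old, y)
    if x > y:
        return mass_below(f_new, y) - mass_below(f_old, y)
    return Fraction(0)  # A(y, y) is empty


def representation_rhs(y, f_new, f_old):
    """Sum over k = 0, 1 of the integral of (F' - F)(A(y, x)) against F'^k F^(1-k)."""
    total = Fraction(0)
    for mixing in (f_old, f_new):
        for x, p in mixing.items():
            total += p * diff_on_A(y, x, f_new, f_old)
    return total


def halfline_norm(f_new, f_old):
    """sup over open half-lines of |F'(A) - F(A)| (prefix sums suffice)."""
    running, best = Fraction(0), Fraction(0)
    for x in sorted(set(f_new) | set(f_old)):
        running += f_new.get(x, 0) - f_old.get(x, 0)
        best = max(best, abs(running))
    return best


if __name__ == "__main__":
    F_old = {0: Fraction(1, 2), 2: Fraction(1, 2)}
    F_new = {0: Fraction(1, 4), 1: Fraction(1, 4), 3: Fraction(1, 2)}
    for y in (Fraction(1, 2), 1, Fraction(3, 2), 2):
        lhs = depth_1d(y, F_new) - depth_1d(y, F_old)
        print(y, lhs, representation_rhs(y, F_new, F_old))
```

In this analogue the two sides agree exactly for every $y$, and each of the two summands is bounded by the half-line norm, which gives a bound with the constant $d + 1 = 2$.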

Now consider the sets $A(y, v)$. One may write

$$A(y, v) = \{x \in \mathbb{R}^d : 0 \in \mathrm{ch}^{\circ}(x_1 - y, x_2 - y, \ldots, x_d - y, x - y)\} = y + A^{\circ}(x_1 - y, x_2 - y, \ldots, x_d - y),$$

where generally $A^{\circ}(v) := \{x \in \mathbb{R}^d : 0 \in \mathrm{ch}^{\circ}(v, x)\}$. One can easily show that

$$A^{\circ}(v) = -\,\text{interior of } \Bigl\{ \sum_{i=1}^{d} \lambda_i x_i : \lambda_1, \lambda_2, \ldots, \lambda_d > 0 \Bigr\}.$$

The latter set is nonvoid if and only if $x_1, x_2, \ldots, x_d$ are linearly independent, in which case it is the intersection of $d$ open halfspaces. Consequently, with

$$\mathcal{A} := \{\text{intersections of } d \text{ open halfspaces in } \mathbb{R}^d\}$$

and the corresponding Kolmogorov-Smirnov norm $\|\cdot\|_{\mathcal{A}}$, the representation (2) yields:

Theorem 1. For arbitrary distributions $F$, $F'$ on $\mathbb{R}^d$,

$$\|D(\cdot, F') - D(\cdot, F)\|_{\infty} \le (d + 1)\, \|F' - F\|_{\mathcal{A}}. \qquad \Box$$

The class $\mathcal{A}$ is a Vapnik-Červonenkis class, which implies that

$$\|\hat{F}_n - F\|_{\mathcal{A}} = o(1) \quad \text{almost surely}, \tag{3}$$

see Pollard (1984), Chapter II. In addition a CLT holds for the normalized empirical process

$$B_n := \sqrt{n}\,(\hat{F}_n - F): \tag{4}$$

The processes $(B_n(A) : A \in \mathcal{A})$ converge in distribution to a centered Gaussian process $(B(A) : A \in \mathcal{A})$ having bounded, uniformly continuous paths with respect to the pseudodistance

$$\rho(A, A') := F(A \,\triangle\, A'),$$

see Pollard (1984), Chapters IV and VII.5. These two facts and Theorem 1 together show that $D(\cdot, \hat{F}_n)$ is uniformly consistent:

Corollary 1.

$$\|D(\cdot, \hat{F}_n) - D(\cdot, F)\|_{\infty} = o(1) \quad \text{almost surely}$$

and

$$\|D(\cdot, \hat{F}_n) - D(\cdot, F)\|_{\infty} = O_p(1/\sqrt{n}).$$
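The cone description of the sets $A(y, v)$ used for Theorem 1 can be checked numerically in the plane. The sketch below is an illustration under stated assumptions rather than part of the paper: it takes $d = 2$, solves $x - y = -(\lambda_1 (x_1 - y) + \lambda_2 (x_2 - y))$ by Cramer's rule, and compares the condition $\lambda_1, \lambda_2 > 0$ with a direct open-triangle test. The names `in_A_via_cone` and `in_open_triangle` are hypothetical.

```python
def cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_open_triangle(y, a, b, c):
    """Direct test: y lies in the interior of the triangle (a, b, c)."""
    d1, d2, d3 = cross(a, b, y), cross(b, c, y), cross(c, a, y)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def in_A_via_cone(y, x1, x2, x):
    """Test x in A(y, (x1, x2)) through the cone representation.

    x belongs to the set iff x - y = -(lam1*(x1 - y) + lam2*(x2 - y))
    with lam1, lam2 > 0; the set is empty if x1 - y and x2 - y are
    linearly dependent.
    """
    u1 = (x1[0] - y[0], x1[1] - y[1])
    u2 = (x2[0] - y[0], x2[1] - y[1])
    det = u1[0] * u2[1] - u1[1] * u2[0]
    if det == 0:
        return False
    r = (y[0] - x[0], y[1] - x[1])  # right-hand side -(x - y)
    lam1 = (r[0] * u2[1] - r[1] * u2[0]) / det
    lam2 = (u1[0] * r[1] - u1[1] * r[0]) / det
    return lam1 > 0 and lam2 > 0


if __name__ == "__main__":
    y, x1, x2 = (1, 1), (0, 0), (4, 0)
    grid = [(i, j) for i in range(-6, 7) for j in range(-6, 7)]
    agree = all(
        in_A_via_cone(y, x1, x2, x) == in_open_triangle(y, x1, x2, x)
        for x in grid
    )
    print("cone test agrees with triangle test:", agree)
```

The two positivity constraints correspond to the two open halfplanes whose intersection forms the set, which is why the class $\mathcal{A}$ of intersections of $d$ open halfspaces is the natural index set here.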

Another consequence of Theorem 1 is the robustness of $D(\cdot, \cdot)$:

Corollary 2. If $F$ satisfies (1), then

$$\|D(\cdot, F') - D(\cdot, F)\|_{\infty} \to 0 \quad \text{whenever } F' \to F \text{ weakly}.$$

With a simple compactness argument one can show that (1) implies

$$\sup_{H \in \mathcal{H}} F(U(H, \delta)) \to 0 \quad \text{as } \delta \downarrow 0, \tag{5}$$

where $\mathcal{H}$ is the family of hyperplanes in $\mathbb{R}^d$, and

$$U(S, \delta) := \{x \in \mathbb{R}^d : \mathrm{dist}(x, S) < \delta\}, \qquad \mathrm{dist}(x, S) := \inf_{y \in S} |x - y|$$

for subsets $S$ of $\mathbb{R}^d$. Since the boundary of any set $A \in \mathcal{A}$ is contained in a union of $d$ hyperplanes, it follows from (5) and Theorem 2 of Billingsley and Topsoe (1967) that $\|F' - F\|_{\mathcal{A}} \to 0$ whenever $F' \to F$ weakly. $\Box$

3. A central limit theorem

Here and in Section 4 we generally assume that $F$ meets (1). We consider the process

$$G_n := (G_n(y) : y \in K),$$

where $K := \mathbb{R}^d \cup \{\infty\}$ is the Alexandrov compactification of $\mathbb{R}^d$, and

$$G_n(y) := \sqrt{n}\,\bigl(D(y, \hat{F}_n) - D(y, F)\bigr)$$

($D(\infty, \cdot) := 0$). Using (2) one can write

$$G_n(y) = \int B_n(A(y, v))\, \sum_{k=0}^{d} \hat{F}_n^{k} F^{d-k}(dv) \tag{6}$$

($A(\infty, \cdot) := \emptyset$). The result consists of two parts: One can approximate this process $G_n$ uniformly by the usual empirical process

$$G_n^*(y) := (d + 1) \int B_n(A(y, v))\, F^d(dv) = \int g(y, x)\, B_n(dx) \qquad (y \in K),$$

where

$$g(y, x) := (d + 1)\, F^d\{v : x \in A(y, v)\}.$$

Then a CLT holds for $G_n$ and $G_n^*$:

Theorem 2. If $F$ meets (1), then

$$\|G_n^* - G_n\|_{\infty} = o_p(1).$$

Further the processes $G_n^{(*)}$ converge in distribution to a centered Gaussian process $(G(y) : y \in K)$ with continuous paths and covariances given by

$$E(G(y)\, G(y')) = \mathrm{Cov}_{X \sim F}\bigl(g(y, X),\, g(y', X)\bigr). \tag{7}$$

'Convergence in distribution' means the following: There is a sequence of stochastic processes $\tilde{G}_n = (\tilde{G}_n(y) : y \in K)$ with continuous paths, where $\tilde{G}_n$ and $G_n^{(*)}$ are defined on the same probability space, such that $\|\tilde{G}_n - G_n^{(*)}\|_{\infty}$ tends to 0 in (outer) probability, and $\tilde{G}_n$ converges in distribution to $G$ in the space of continuous functions on $K$. By an approximation argument similar to the one in the proof of Theorem 21 in Pollard (1984), Chapter VII.5, one can show that the weak convergence of the finite dimensional distributions, (9) below, and the stochastic equicontinuity, (10) below, are sufficient for Theorem 2 to hold.

Proof of Theorem 2. Since the functions $\bar{x} \mapsto I\{y \in \mathrm{ch}^{\circ}(\bar{x})\}$, $y \in \mathbb{R}^d$, are uniformly bounded,

$$\|D(\cdot, \hat{F}_n) - D_n\|_{\infty} \le O(1/n),$$

with a nonrandom $O(1/n)$, where $D_n(y)$ is the U-statistic

$$D_n(y) := \binom{n}{d+1}^{-1} \sum_{1 \le i(1) < \cdots < i(d+1) \le n} I\{y \in \mathrm{ch}^{\circ}(X_{i(1)}, \ldots, X_{i(d+1)})\}.$$
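The size of the gap between $D(\cdot, \hat{F}_n)$ and the U-statistic $D_n$ can be made concrete. A tuple with a repeated index spans a degenerate simplex with empty interior, so $D(y, \hat{F}_n) = c_n D_n(y)$ with $c_n = n(n-1)\cdots(n-d)/n^{d+1}$, and hence $|D(y, \hat{F}_n) - D_n(y)| \le 1 - c_n = O(1/n)$. The following sketch checks this for $d = 2$ by brute force; the function names are hypothetical and the enumeration over all $n^3$ ordered triples is only meant for small samples.

```python
from itertools import combinations, product
from math import comb


def cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_open_triangle(y, a, b, c):
    d1, d2, d3 = cross(a, b, y), cross(b, c, y), cross(c, a, y)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def depth_v_statistic(y, points):
    """D(y, F_n-hat): ordered triples drawn with replacement from the sample."""
    n = len(points)
    hits = sum(in_open_triangle(y, a, b, c)
               for a, b, c in product(points, repeat=3))
    return hits / n ** 3


def depth_u_statistic(y, points):
    """D_n(y): unordered triples of distinct sample indices."""
    n = len(points)
    hits = sum(in_open_triangle(y, a, b, c)
               for a, b, c in combinations(points, 3))
    return hits / comb(n, 3)


if __name__ == "__main__":
    sample = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 1)]
    n = len(sample)
    c_n = n * (n - 1) * (n - 2) / n ** 3
    y = (1, 2)
    print(depth_v_statistic(y, sample), c_n * depth_u_statistic(y, sample))
```

The bound $1 - c_n$ does not depend on the data, which is the sense in which the $O(1/n)$ term is nonrandom.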

Moreover, the standardized U-statistic $\sqrt{n}\,(D_n(y) - D(y, F))$ is equal to $G_n^*(y) + o_p(1)$ for arbitrary fixed $y \in \mathbb{R}^d$, see Hoeffding (1948). Consequently,

$$G_n^*(y) - G_n(y) = o_p(1) \quad \text{for arbitrary fixed } y \in K. \tag{8}$$

The usual CLT implies that:

$$\text{The finite dimensional marginal distributions of } G_n^* \text{ converge weakly to centered Gaussian distributions with covariances given by (7).} \tag{9}$$

Now suppose that the processes $G_n^{(*)}$ are stochastically equicontinuous in the following sense:

$$\limsup_{n \to \infty} \Pr\Bigl\{ \sup_{y, y' \in K,\ m(y, y') < \delta} |G_n^{(*)}(y') - G_n^{(*)}(y)| > \eta \Bigr\} \to 0 \quad \text{as } \delta \downarrow 0,\ \forall \eta > 0, \tag{10}$$

where $m(\cdot, \cdot)$ is any metric for $K$. Then the first assertion in Theorem 2 follows easily from (8) and (10). The second part is a consequence of (9) and (10). Assertion (10) itself is a consequence of the following result, which is proved at the end of this section.

Lemma.

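Let $G^*$ be an arbitrary function on $K$ of the form

$$G^*(y) = \int b(A(y, v))\, \Bigl(\bigotimes_{i=1}^{d} F_i\Bigr)(dv),$$

where $F_1, F_2, \ldots, F_d$ are probability measures on $\mathbb{R}^d$, and $b$ is a bounded function on $\mathcal{A}$ with $b(\emptyset) = 0$. Then the following inequalities hold for arbitrary $R, \varepsilon > 0$, where $w(b, \cdot)$ denotes the modulus of continuity of $b$ with respect to $\rho(\cdot, \cdot)$:

for all $y \in \mathbb{R}^d \setminus U(0, R)$:

$$|G^*(y)| \le \|b\|_{\mathcal{A}} \sum_{i=1}^{d} F_i(\mathbb{R}^d \setminus U(0, R)) + w\bigl(b, F(\mathbb{R}^d \setminus U(0, R))\bigr); \tag{11}$$

for all $y, y' \in U(0, R)$:

$$|G^*(y') - G^*(y)| \le 2\|b\|_{\mathcal{A}} \sum_{i=1}^{d} F_i(\mathbb{R}^d \setminus U(0, R)) + w(b, \alpha) + 2\|b\|_{\mathcal{A}} \sum_{i=1}^{d} \sup_{H \in \mathcal{H}} F_i(U(H, \gamma)), \tag{12}$$

where

$$\alpha := 2F(\mathbb{R}^d \setminus U(0, R)) + 2d \sup_{H \in \mathcal{H}} F(U(H, \beta)), \qquad \beta := |y' - y|\,(1 + (2R)/\varepsilon), \qquad \gamma := \bigl((4R)^{d-1} \varepsilon\bigr)^{1/d}.$$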

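If we apply the Lemma to the processes $G_n$ and $G_n^*$, we have to take $b = B_n$ and $F_i \in \{F, \hat{F}_n\}$. The CLT (4) for $B_n$ implies that

$$\limsup_{n \to \infty} \Pr\{w(B_n, \delta) > \eta\} \to 0 \quad \text{as } \delta \downarrow 0,\ \forall \eta > 0.$$

Further, by (3) and the Law of Large Numbers,

$$\sup_{H \in \mathcal{H}} \hat{F}_n(U(H, \delta)) = \sup_{H \in \mathcal{H}} F(U(H, \delta)) + o_p(1),$$

$$\hat{F}_n(\mathbb{R}^d \setminus U(0, R)) = F(\mathbb{R}^d \setminus U(0, R)) + o_p(1),$$

and the limits on the right hand sides are arbitrarily small for $\delta$ sufficiently small and $R$ sufficiently large; see (5). Together with the Lemma this yields (10). $\Box$

Proof of the Lemma. First of all, if $y \in \mathbb{R}^d \setminus U(0, R)$ and $v \in U(0, R)^d$, then the set $A(y, v)$ is a subset of $\mathbb{R}^d \setminus U(0, R)$. Thus

$$|G^*(y)| = \Bigl| \int \bigl(b(A(y, v)) - b(\emptyset)\bigr) \Bigl(\bigotimes_{i=1}^{d} F_i\Bigr)(dv) \Bigr| \le \|b\|_{\mathcal{A}} \sum_{i=1}^{d} F_i(\mathbb{R}^d \setminus U(0, R)) + w\bigl(b, F(\mathbb{R}^d \setminus U(0, R))\bigr),$$

which is (11). In order to prove (12) let $y, y'$ be arbitrary points in $U(0, R)$. One can easily show that

$$\begin{aligned}
|G^*(y') - G^*(y)| \le{} & 2\|b\|_{\mathcal{A}} \sum_{i=1}^{d} F_i(\mathbb{R}^d \setminus U(0, R)) + w(b, \alpha') \\
& + [\ldots]\, \{v \in U(0, R)^d : \mathrm{dist}(y, \mathrm{ch}(v)) < \varepsilon\} \\
& + [\ldots]\, \{v \in U(0, R)^d : \mathrm{dist}(y', \mathrm{ch}(v)) < \varepsilon\},
\end{aligned} \tag{13}$$

where

$$\alpha' := \sup_{v \in U(0, R)^d,\ \mathrm{dist}(y, \mathrm{ch}(v)) \ge \varepsilon,\ \mathrm{dist}(y', \mathrm{ch}(v)) \ge \varepsilon} F\bigl(A(y, v) \,\triangle\, A(y', v)\bigr);$$

$\mathrm{ch}(\cdot)$ denotes the convex hull. The term $\alpha'$ in (13) may be bounded as follows. For $v$ as in the definition of $\alpha'$,

$$F\bigl(A(y, v) \setminus A(y', v)\bigr) \le F(\mathbb{R}^d \setminus U(0, R)) + d \sup_{H \in \mathcal{H}} F(U(H, \beta)), \tag{14}$$

where $\beta := |y' - y|\,(1 + (2R)/\varepsilon)$. The last two summands on the right hand side of (13) may be bounded by means of

$$\Bigl(\bigotimes_{i=1}^{d} F_i\Bigr)\{v \in U(0, R)^d : \mathrm{dist}(y, \mathrm{ch}(v)) < \varepsilon\} \le \sum_{i=1}^{d} \sup_{H \in \mathcal{H}} F_i(U(H, \gamma)), \tag{15}$$

where $\gamma := [(4R)^{d-1} \varepsilon]^{1/d}$. These two bounds (14) and (15), applied to (13), yield (12).

Proof of (14). Let $x \in A(y, v)$, say

$$x = y - \sum_{i=1}^{d} \lambda_i (x_i - y) \qquad (\lambda_1, \lambda_2, \ldots, \lambda_d > 0).$$

Then

$$|x| \ge \mathrm{dist}(y, \mathrm{ch}(v)) \sum_{i=1}^{d} \lambda_i - |y| \ge \varepsilon \sum_{i=1}^{d} \lambda_i - R.$$

The distance between $x$ and $x' := y' - \sum_{i=1}^{d} \lambda_i (x_i - y')$ is equal to

$$|y' - y|\,\Bigl(1 + \sum_{i=1}^{d} \lambda_i\Bigr).$$

If the vectors $x_1 - y', \ldots, x_d - y'$ are linearly independent, then the point $x'$ lies in $A(y', v)$. In this case our bounds show that

$$A(y, v) \setminus A(y', v) \subset (\mathbb{R}^d \setminus U(0, R)) \cup \bigl(U(A(y', v), \beta) \setminus A(y', v)\bigr),$$

where $\beta$ is the constant in (14). Now $A(y', v)$ is an intersection of $d$ open halfspaces. Hence $U(A(y', v), \beta) \setminus A(y', v)$ is contained in the union of the $\beta$-neighborhoods of $d$ hyperplanes. If the $x_i - y'$ are linearly dependent, then $A(y', v) = \emptyset$, and $x'$ lies in a fixed hyperplane $H$ containing the $x_i - y'$. Hence

$$A(y, v) \subset (\mathbb{R}^d \setminus U(0, R)) \cup U(H, \beta),$$

and in any case (14) holds.

Proof of (15). Obviously $F_1\{x_1 \in U(0, R) : \mathrm{dist}(y, \{x_1\}) < \gamma\}$ is not greater than $\sup_{H \in \mathcal{H}} F_1(U(H, \gamma))$. Now let $x_1, \ldots, x_k \in U(0, R)$ such that the distance from $y$ to $\mathrm{ch}(x_1, \ldots, x_k)$ is not less than $\delta > 0$. Let $H$ be a hyperplane containing $y, x_1, \ldots, x_k$, and let $x_{k+1}$ be any point in $U(0, R) \setminus U(H, \gamma)$. Then one can bound $\mathrm{dist}(y, \mathrm{ch}(x_1, \ldots, x_{k+1}))$ as follows: One may write $x_{k+1} = h + r$, where $h$ is the orthogonal projection of $x_{k+1}$ onto $H$. Then, for arbitrary $x \in \mathrm{ch}(x_1, \ldots, x_k)$ and $\lambda \in [0, 1]$,

$$\begin{aligned}
|y - ((1 - \lambda)x + \lambda x_{k+1})|^2 &= |(1 - \lambda)(y - x) + \lambda(y - h)|^2 + |\lambda r|^2 \\
&\ge \bigl((1 - \lambda)|y - x| - \lambda|y - h|\bigr)^2 + \lambda^2 \gamma^2 \\
&\ge |y - x|^2 \gamma^2 \big/ \bigl((|y - x| + |y - h|)^2 + \gamma^2\bigr).
\end{aligned}$$

Since $\delta \le |y - x| < 2R$ and $|y - h|^2 = |y - x_{k+1}|^2 - |r|^2 < (2R)^2 - \gamma^2$, the right hand side of the preceding inequality is not less than $\delta^2 \gamma^2 / (4R)^2$. Hence

$$\mathrm{dist}(y, \mathrm{ch}(x_1, \ldots, x_{k+1})) \ge \delta\gamma / (4R),$$

and

$$\Bigl(\bigotimes_{i=1}^{k+1} F_i\Bigr)\{v_{k+1} \in U(0, R)^{k+1} : \mathrm{dist}(y, \mathrm{ch}(v_{k+1})) < \delta\gamma/(4R)\} \le \Bigl(\bigotimes_{i=1}^{k} F_i\Bigr)\{v_k \in U(0, R)^k : \mathrm{dist}(y, \mathrm{ch}(v_k)) < \delta\} + \sup_{H \in \mathcal{H}} F_{k+1}(U(H, \gamma)).$$

Now (15) follows inductively. $\Box$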

4. Application to an L-statistic

As an example we apply the results of Sections 2 and 3 to the L-statistic $L(\hat{F}_n)$ defined in Section 1. Our assumptions on the weight function $W$ are:

$$W \text{ is continuously differentiable with derivative } w; \tag{16}$$

$$\exists\, r_0 > 0 \text{ such that } W(r) = 0 \text{ for } 0 \le r \le r_0 \text{ and } W(r) > 0 \text{ for } r > r_0; \tag{17}$$

$$\int W(D(y, F))\, F(dy) > 0. \tag{18}$$

The results mentioned in the introduction indicate that $r_0$ in (17) should be less than $2^{-d}$. For if $F$ is symmetric around $\mu \in \mathbb{R}^d$ and has a continuous density $f$ with $f(\mu) > 0$, then (18) holds automatically. From the robustness result of Section 2 one can easily deduce that $L(\cdot)$ is robust, too: Under (1) and (16)-(18),

$$L(F') \to L(F) \quad \text{as } F' \to F \text{ weakly}. \tag{19}$$

Further, one can apply the CLT of Section 3 to prove asymptotic normality of $L(\hat{F}_n)$:

Theorem 3. If (1) and (16)-(18) hold, then the distribution of $\sqrt{n}\,(L(\hat{F}_n) - L(F))$ converges weakly to a d-variate Gaussian distribution with mean 0 and covariance matrix

$$\Sigma := \mathrm{Cov}_{X \sim F_c}(K(X)),$$

where $F_c := \mathcal{L}_{X \sim F}(X - L(F))$,

$$K(x) := \bigl(E(x) + x\, W(D(x, F_c))\bigr) \Big/ \int W(D(x', F_c))\, F_c(dx'),$$

$$E(x) := \int y\, w(D(y, F_c))\, g(y, x)\, F_c(dy).$$
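To make the statistic $L(\hat{F}_n)$ concrete, the following minimal sketch computes a depth-weighted mean for planar data. Several choices are assumptions made for illustration and are not taken from the paper: the depth of each sample point is computed as the fraction of sample triangles containing it (the U-statistic version), the weight function is the hypothetical $W(r) = \max(0, r - r_0)^2$ with $r_0 = 0.05 < 2^{-2}$, which satisfies (16) and (17), and the names `depth`, `smooth_weight` and `l_statistic` are made up.

```python
from itertools import combinations


def cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_open_triangle(y, a, b, c):
    d1, d2, d3 = cross(a, b, y), cross(b, c, y), cross(c, a, y)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def depth(y, points):
    """Fraction of sample triangles whose open interior contains y."""
    triples = list(combinations(points, 3))
    return sum(in_open_triangle(y, a, b, c) for a, b, c in triples) / len(triples)


def smooth_weight(r, r0=0.05):
    """Hypothetical weight: zero on [0, r0], positive and C^1 beyond r0."""
    return max(0.0, r - r0) ** 2


def l_statistic(points, weight=smooth_weight):
    """Depth-weighted mean: sum of W(D(x_i)) x_i divided by sum of W(D(x_i))."""
    weights = [weight(depth(p, points)) for p in points]
    total = sum(weights)
    if total <= 0:
        raise ValueError("all weights vanish; condition (18) fails for this sample")
    return tuple(
        sum(w * p[k] for w, p in zip(weights, points)) / total for k in range(2)
    )


if __name__ == "__main__":
    symmetric = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
                 (3, 3), (-3, -3), (3, -3), (-3, 3)]
    print(l_statistic(symmetric))
```

For a sample that is symmetric around the origin the depths of $x$ and $-x$ coincide, so the weighted mean returns the center, in line with $L(F) = \mu$ for symmetric $F$. Points of low depth receive weight zero, which is the multivariate counterpart of trimming.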

The covariance matrix $\Sigma$ in Theorem 3 is difficult to treat analytically. If one is interested in confidence ellipsoids for $L(F)$, a possible way out is to use a bootstrap approximation for the distribution of $\sqrt{n}\,(L(\hat{F}_n) - L(F))$. For Corollary 1, Theorem 2 and Theorem 3 remain valid if the $n$ random variables defining $\hat{F}_n$ have distribution $F_n$, and if $F_n$ converges weakly to a distribution $F$ satisfying (1) and (16)-(18).

Proof of Theorem 3. By equivariance we may assume without loss of generality that $L(F) = 0$ and thus $F_c = F$. One can write

$$\begin{aligned}
\sqrt{n} \int x\, W(D(x, \hat{F}_n))\, \hat{F}_n(dx) &= \sqrt{n}\, \Bigl( \int x\, W(D(x, \hat{F}_n))\, \hat{F}_n(dx) - \int x\, W(D(x, F))\, F(dx) \Bigr) \\
&= \int x\, \sqrt{n}\, \bigl(W(D(x, \hat{F}_n)) - W(D(x, F))\bigr)\, \hat{F}_n(dx) + \int x\, W(D(x, F))\, B_n(dx) \\
&= \int x\, w(\eta_n(x))\, G_n(x)\, \hat{F}_n(dx) + \int x\, W(D(x, F))\, B_n(dx),
\end{aligned}$$

where $\|\eta_n - D(\cdot, F)\|_{\infty} \le R_n := \|G_n\|_{\infty} / \sqrt{n} = O_p(1/\sqrt{n})$. Now we want to show that the integral $\int x\, w(\eta_n(x))\, G_n(x)\, \hat{F}_n(dx)$ can be approximated by $\int E(x)\, B_n(dx)$, so that

$$\sqrt{n} \int x\, W(D(x, \hat{F}_n))\, \hat{F}_n(dx) = \int \bigl(E(x) + x\, W(D(x, F))\bigr)\, B_n(dx) + o_p(1). \tag{20}$$

On the one hand,

$$\Bigl| \int x\, \bigl(w(\eta_n(x)) - w(D(x, F))\bigr)\, G_n(x)\, \hat{F}_n(dx) \Bigr| \le \|G_n\|_{\infty} \sup_{r, r' \in [0, 1],\ |r - r'| \le R_n} |w(r') - w(r)| \int |x|\, I\{D(x, F) + R_n \ge r_0\}\, \hat{F}_n(dx) = o_p(1).$$

Further the integral $\int x\, w(D(x, F))\, G_n(x)\, \hat{F}_n(dx)$ is equal to

$$\int x\, w(D(x, F))\, G_n(x)\, F(dx) + o_p(1) = \int x\, w(D(x, F))\, G_n^*(x)\, F(dx) + o_p(1) = \int E(x)\, B_n(dx) + o_p(1).$$

This is a consequence of the tightness of the sequence $(G_n)$: For every $\varepsilon > 0$ there are finitely many continuous functions $g_1, g_2, \ldots, g_L$ on $K$ such that

$$\limsup_{n \to \infty} \Pr\Bigl\{ \min_{j = 1, \ldots, L} \|G_n - g_j\|_{\infty} > \varepsilon \Bigr\} \le \varepsilon.$$

The functions $x \mapsto x\, w(D(x, F))\, g_j(x)$ are bounded and continuous on $\mathbb{R}^d$. Hence

$$\Bigl| \int x\, w(D(x, F))\, G_n(x)\, (\hat{F}_n - F)(dx) \Bigr| \le 2\varepsilon \max_{x \in \mathbb{R}^d} |x\, w(D(x, F))| + \max_{j = 1, \ldots, L} \Bigl| \int x\, w(D(x, F))\, g_j(x)\, (\hat{F}_n - F)(dx) \Bigr| = O(\varepsilon) + o_p(1)$$

with asymptotic probability not less than $1 - \varepsilon$. Letting $\varepsilon \downarrow 0$ yields (20). Similar (or more elementary) considerations show that

$$\int W(D(x, \hat{F}_n))\, \hat{F}_n(dx) = \int W(D(x, F))\, F(dx) + o_p(1). \tag{21}$$

Then (20) and (21) together lead to

$$\sqrt{n}\, L(\hat{F}_n) = \int K(x)\, B_n(dx) + o_p(1),$$

and Theorem 3 follows from the multivariate CLT. $\Box$

Acknowledgements

While working on the revision of this paper I learned that Arcones and Giné (1991) had worked on U-processes, and their general results cover part of the present material. Nevertheless I think the approach developed here has its own merit, because it is elementary, geometrical and gives some insight into continuity and robustness of the simplicial depth. I would like to thank the referee for his criticisms and useful comments.

References

Arcones, M.A. and E. Giné (1991), Limit theorems for U-processes, Preprint, submitted.
Billingsley, P. and F. Topsoe (1967), Uniformity in weak convergence, Z. Wahrsch. Verw. Gebiete 7, 1-16.
Dümbgen, L. (1990), Limit theorems for the empirical simplicial depth, Preprint 581, SFB 123, Universität Heidelberg.
Hoeffding, W. (1948), A class of statistics with asymptotically normal distribution, Ann. Math. Statist. 19, 293-325.
Liu, R. (1990), On a notion of data depth based on random simplices, Ann. Statist. 18, 405-414.
Pollard, D. (1984), Convergence of Stochastic Processes (Springer, New York).