Journal of Statistical Planning and Inference 139 (2009) 366 -- 384
Multivariate trimmed means based on the Tukey depth
Jean-Claude Massé
Département de Mathématiques et de Statistique, Université Laval, Sainte-Foy, QC, Canada G1K 7P4
Article info
Article history: Received 17 October 2005; received in revised form 27 February 2008; accepted 19 March 2008; available online 17 May 2008.
Keywords: Asymptotic relative efficiency; Breakdown point; Functional delta method; Hadamard derivative; Multivariate trimmed mean; Robustness; Trimmed region; Tukey depth

Abstract
In univariate statistics, the trimmed mean has long been regarded as a robust and efficient alternative to the sample mean. A multivariate analogue calls for a notion of trimmed region around the center of the sample. Using Tukey's depth to achieve this goal, this paper investigates two types of multivariate trimmed means obtained by averaging over the trimmed region in two different ways. For both trimmed means, conditions ensuring asymptotic normality are obtained; in this respect, one of the main features of the paper is the systematic use of Hadamard derivatives and empirical processes methods to derive the central limit theorems. Asymptotic efficiency relative to the sample mean as well as breakdown point are also studied. The results provide convincing evidence that these location estimators have nice asymptotic behavior and possess highly desirable finite-sample robustness properties; furthermore, relative to the sample mean, both of them can in some situations be highly efficient for dimensions between 2 and 10. © 2008 Elsevier B.V. All rights reserved.
1. Introduction

Through his notion of depth, Tukey (1975) initiated a very fruitful approach to defining multivariate location estimators. A depth function can be seen as a device for measuring the centrality of a multivariate data point within a data cloud. Each such function induces a center-outward ranking of data points within a given multivariate data set, thus allowing a multivariate generalization of univariate location estimators such as the median, trimmed means or, more generally, L-estimators. Various depth functions have since appeared in the statistical literature (Zuo and Serfling, 2000), each one giving rise to a ranking of the data and therefore to a family of DL-estimators of location, i.e. L-estimators based on a depth function (Liu et al., 1999). Besides Tukey's depth, the best known depth functions are the simplicial depth of Liu (1990) and the projection depth function introduced in Liu (1992).

Donoho (1982) and Donoho and Gasko (1992) studied the breakdown properties of location estimators based on the Tukey depth, including the median and a constant weight trimmed mean. Robustness of Tukey's median was the subject of Chen (1995) and Chen and Tyler (2002). Nolan (1999) derived the limit distribution of Tukey's median for dimension d = 2, a result that was extended for d > 2 by Bai and He (1999) and, under a different set of assumptions, by Massé (2002). In an extensive Monte Carlo study, Massé and Plante (2003) observed that, for moderate and large sample sizes, Tukey's median and one of the constant weight Tukey depth-based trimmed means are very good alternatives to the sample mean for the estimation of a bivariate location parameter.

Dümbgen (1992) derived asymptotic normality of simplicial depth-based trimmed means after establishing a central limit theorem for the corresponding empirical depth process. Asymptotics for DL-location estimators can be found in Zuo et al. (2004a);
This research was supported by grants from the National Sciences and Engineering Research Council of Canada. Tel.: +1 418 656 2131x2028; fax: +1 418 656 2817. E-mail address:
[email protected].
0378-3758/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2008.03.038
the same authors as well as Zuo et al. (2004b) have also investigated robustness and efficiency of projection depth-based location estimators. Asymptotic properties of the empirical Tukey depth process are studied in Massé (2004), where, as an application of the main result, asymptotic normality of a special class of Tukey depth-based trimmed means is obtained.

The current paper focuses on two types of Tukey depth-based trimmed means from the standpoints of asymptotic distribution, asymptotic efficiency and finite-sample robustness. Trimmed means are viewed throughout as statistical functionals defined on some space of distribution functions, an approach allowing derivation of asymptotic normality through the use of the Hadamard derivatives methodology. Section 2 covers some useful properties of the Tukey depth and introduces the two types of multivariate trimmed means; in addition, Hadamard derivatives and Donsker's theorem for empirical processes are recalled and some of the notation used throughout the paper is set down. Section 3 deals with asymptotic normality of the trimmed means. Section 4 is concerned with asymptotic relative efficiency and finite-sample robustness as measured by the breakdown point. Technical details and most proofs can be found in Section 5.

2. Depth-based trimmed means and the functional delta method

Let $F$ be a probability distribution on $\mathbb{R}^d$ and let $\mathcal{H}$ denote the class of closed halfspaces $H$ in $\mathbb{R}^d$. The Tukey depth (or halfspace depth) of a point $x \in \mathbb{R}^d$ (with respect to $F$) is defined as

$$D(x) \equiv D_F(x) := \inf\{FH : H \in \mathcal{H},\ x \in H\}.$$

Write $H[x,u] := \{y \in \mathbb{R}^d : u \cdot y \ge u \cdot x\}$ for every $u \in S^{d-1} := \{u \in \mathbb{R}^d : |u| = 1\}$. Then clearly $D(x) = \inf_{u \in S^{d-1}} FH[x,u]$. It is well known that $D$ is upper semicontinuous (Donoho and Gasko, 1992, Lemma 6.1) and that $D$ attains its supremum (Rousseeuw and Ruts, 1999, Proposition 7). According to Donoho and Gasko (1992, Lemma 6.3), the maximal value $\alpha^* \equiv \alpha^*(F)$ of $D$ is positive and such that

$$\frac{1}{d+1} \le \alpha^* \le 1.$$

In case $F$ is absolutely continuous, it can be checked that $D$ is continuous (Donoho and Gasko, 1992, Lemma 6.1). Another nice property of the Tukey depth is affine invariance. Specifically, for any affine map $x \mapsto Ax + b$, where $A$ is a $d \times d$ nonsingular matrix and $b \in \mathbb{R}^d$,

$$D_F(x) = D_{F_{A,b}}(Ax + b), \qquad x \in \mathbb{R}^d,$$

where $F_{A,b}$ is the image probability distribution of $F$ by the affine map. The affine invariance property ensures that the Tukey depth of a point does not depend on the underlying coordinate system.

Given an independent identically distributed sample $X_1, \dots, X_n$ from $F$, let $\hat F_n$ denote the empirical distribution. The Tukey depth with respect to $\hat F_n$, denoted by $D_n \equiv D_{\hat F_n}$, is called the empirical Tukey depth function.

For any $\alpha > 0$, define the $\alpha$-trimmed region $Q_\alpha \equiv Q_\alpha(F) := \{x : D_F(x) \ge \alpha\}$. Trimmed regions are known to be compact; moreover, they are nonempty and convex whenever $\alpha \le \alpha^*$ (Rousseeuw and Ruts, 1999, Corollary to Proposition 1, Propositions 5 and 7). Let $Q_{\alpha n} := Q_\alpha(\hat F_n)$ denote the empirical $\alpha$-trimmed region. In the following, we will assume $0 < \alpha < \alpha^*$.

Clearly a depth-based multivariate trimmed mean can be defined by averaging those elements of the sample that belong to an empirical trimmed region. A weight function allows us to obtain a more general class of trimmed means by averaging over weighted observations. A statistical functional description of this approach is given next. In what follows, let $W : [0,1] \to \mathbb{R}$ be a continuously differentiable weight function, where differentiability at 0 (resp. 1) means right-differentiability (resp. left-differentiability). Assuming that $0 < |\int_{Q_\alpha} W(D_F(x))\,F(dx)| \le \int_{Q_\alpha} |W(D_F(x))|\,F(dx) < \infty$, put
$$L_1(F) := \int_{Q_\alpha} W(D_F(x))\,x\,F(dx) \Big/ \int_{Q_\alpha} W(D_F(x))\,F(dx).$$
We define the sample trimmed mean of the first type (or first trimmed mean) as

$$L_1(\hat F_n) = \frac{\int_{Q_{\alpha n}} W(D_n(x))\,x\,\hat F_n(dx)}{\int_{Q_{\alpha n}} W(D_n(x))\,\hat F_n(dx)} = \frac{\sum_{i : D_n(X_i) \ge \alpha} W(D_n(X_i))\,X_i}{\sum_{i : D_n(X_i) \ge \alpha} W(D_n(X_i))}, \qquad (1)$$

so that every $X_i$ is given a weight depending on its depth and proportional to $W(D_n(X_i))$. Thus, this trimmed mean is well defined provided there exists at least one $X_i$ such that $D_n(X_i) \ge \alpha$. In the special case of a constant weight function, the breakdown properties of the sample trimmed mean of the first type were investigated in Donoho (1982) and Donoho and Gasko (1992). A central limit theorem for DL-statistics is obtained in Zuo et al. (2004a) under the restriction that points of lower depth have a weight close to 0, without making use of Hadamard derivatives.
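The definition of the first trimmed mean can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions, not the author's implementation (the paper reports that its calculations were done with R): it works in dimension 2, and it approximates the infimum over all closed halfplanes by a minimum over a finite grid of directions, so the computed depth can only overestimate the exact empirical depth. The function names `tukey_depth_2d` and `first_trimmed_mean` and the parameter `n_dirs` are hypothetical.

```python
import math

def tukey_depth_2d(x, sample, n_dirs=360):
    """Approximate empirical Tukey depth of the point x in the plane.

    The infimum over all closed halfplanes {y : u.y >= u.x} is replaced by a
    minimum over n_dirs equally spaced unit vectors u, so the result can only
    overestimate the exact depth.
    """
    n = len(sample)
    best = n
    for k in range(n_dirs):
        t = 2.0 * math.pi * k / n_dirs
        c, s = math.cos(t), math.sin(t)
        threshold = c * x[0] + s * x[1] - 1e-12  # tolerance for boundary ties
        count = sum(1 for y in sample if c * y[0] + s * y[1] >= threshold)
        best = min(best, count)
    return best / n

def first_trimmed_mean(sample, alpha, weight=lambda a: 1.0, n_dirs=360):
    """Average of the X_i with D_n(X_i) >= alpha, weighted by W(D_n(X_i))."""
    depths = [tukey_depth_2d(x, sample, n_dirs) for x in sample]
    kept = [(weight(dep), x) for dep, x in zip(depths, sample) if dep >= alpha]
    total = sum(w for w, _ in kept)
    if total == 0:
        raise ValueError("total weight of the retained observations is zero")
    return (sum(w * x[0] for w, x in kept) / total,
            sum(w * x[1] for w, x in kept) / total)

# Four corners of a square, its center, and one gross outlier.
sample = [(1, 1), (1, -1), (-1, 1), (-1, -1), (0, 0), (100, 100)]
print(tukey_depth_2d((0, 0), sample))    # depth of the center
print(first_trimmed_mean(sample, 0.4))   # trimmed mean at alpha = 0.4
```

In this toy sample the outlier has the smallest possible depth and is discarded as soon as the trimming level exceeds 1/6, which is the mechanism behind the robustness of the estimator.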
The centroid of a trimmed region can be regarded as another form of trimmed mean. Extended with a weight function, this second approach is described next through a statistical functional defined as an average over a trimmed region with respect to Lebesgue measure. Assuming that $0 < |\int_{Q_\alpha} W(D_F(x))\,dx| \le \int_{Q_\alpha} |W(D_F(x))|\,dx < \infty$, let

$$L_2(F) := \int_{Q_\alpha} W(D_F(x))\,x\,dx \Big/ \int_{Q_\alpha} W(D_F(x))\,dx.$$

The sample trimmed mean of the second type (or centroid trimmed mean) is then defined to be

$$L_2(\hat F_n) = \int_{Q_{\alpha n}} W(D_n(x))\,x\,dx \Big/ \int_{Q_{\alpha n}} W(D_n(x))\,dx. \qquad (2)$$
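Letting $\alpha^*_n := \max D_n(x)$, this can also be written as the weighted average

$$L_2(\hat F_n) = \frac{\sum_{\alpha \le k/n \le \alpha^*_n} W(k/n) \int_{D_n(x)=k/n} dx \cdot \left[\int_{D_n(x)=k/n} x\,dx \Big/ \int_{D_n(x)=k/n} dx\right]}{\sum_{\alpha \le k/n \le \alpha^*_n} W(k/n) \int_{D_n(x)=k/n} dx},$$

where each centroid $\int_{D_n(x)=k/n} x\,dx / \int_{D_n(x)=k/n} dx$ has a weight proportional to $W(k/n) \int_{D_n(x)=k/n} dx$. This statistic is well defined provided the empirical $\alpha$-trimmed region $Q_{\alpha n}$ has nonempty interior. In the special case where $W$ is a nonzero constant, $L_2(\hat F_n)$ is seen to coincide with the centroid of $Q_{\alpha n}$:

$$L_2(\hat F_n) = \int_{Q_{\alpha n}} x\,dx \Big/ \int_{Q_{\alpha n}} dx. \qquad (3)$$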
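A rough numerical sketch of the centroid trimmed mean may help fix ideas. It is an illustration under stated assumptions rather than the method used in the paper: it works in dimension 2, approximates the Lebesgue integrals by a midpoint Riemann sum over the bounding box of the sample, and approximates the empirical depth by a minimum over finitely many directions. All names (`depth_2d`, `centroid_trimmed_mean`, `cells`, `n_dirs`) are hypothetical.

```python
import math

def depth_2d(x, sample, n_dirs=72):
    """Approximate empirical Tukey depth of x using n_dirs directions."""
    n = len(sample)
    best = n
    for k in range(n_dirs):
        t = 2.0 * math.pi * k / n_dirs
        c, s = math.cos(t), math.sin(t)
        threshold = c * x[0] + s * x[1] - 1e-12  # tolerance for boundary ties
        count = sum(1 for y in sample if c * y[0] + s * y[1] >= threshold)
        best = min(best, count)
    return best / n

def centroid_trimmed_mean(sample, alpha, weight=lambda a: 1.0, cells=30, n_dirs=72):
    """Midpoint-rule approximation of the weighted centroid of {x : D_n(x) >= alpha}.

    The trimmed region lies inside the convex hull of the sample, hence inside
    its bounding box, which is split into cells x cells rectangles. The common
    cell area cancels in the ratio of the two sums.
    """
    xs = [p[0] for p in sample]
    ys = [p[1] for p in sample]
    x0, y0 = min(xs), min(ys)
    hx, hy = (max(xs) - x0) / cells, (max(ys) - y0) / cells
    sw = sx = sy = 0.0
    for i in range(cells):
        for j in range(cells):
            p = (x0 + (i + 0.5) * hx, y0 + (j + 0.5) * hy)
            dep = depth_2d(p, sample, n_dirs)
            if dep >= alpha:
                w = weight(dep)
                sw += w
                sx += w * p[0]
                sy += w * p[1]
    if sw == 0:
        raise ValueError("no grid cell falls in the trimmed region")
    return (sx / sw, sy / sw)

# With three points and alpha = 0.3 the trimmed region is the whole triangle,
# whose centroid is (1, 1); the grid answer is close to that value.
print(centroid_trimmed_mean([(0, 0), (3, 0), (0, 3)], 0.3))
```

The grid resolution controls the accuracy near the boundary of the region; an exact computation would instead use the polygonal structure of the empirical depth contours.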
It appears that Donoho and Gasko (1987) were the first to consider (3); it is mentioned in Nolan (1992) and briefly discussed in van der Vaart and Wellner (1996, p. 395). These authors do not, however, make any reference to Tukey's depth; in fact, elementary examples based on discrete distributions show that their notion of trimming is slightly different from ours. For $F$ sufficiently smooth, van der Vaart and Wellner (1996) assert that (3) is asymptotically normal but do not describe the covariance matrix of the limit distribution.

It is not too difficult to see that the trimmed means $L_1(\hat F_n)$ and $L_2(\hat F_n)$ are random variables for any $F$; furthermore, under mild restrictions on $F$, both can be shown to be strongly consistent. Put $X := \{X_1, \dots, X_n\}$. Recall that a statistic $T(X)$ is said to be affine equivariant if $T(AX + b) = AT(X) + b$ for any $d \times d$ nonsingular matrix $A$ and any $b \in \mathbb{R}^d$, where $AX + b = \{AX_1 + b, \dots, AX_n + b\}$. Using the affine invariance of the Tukey depth, it is straightforward to check that $L_1(\hat F_n)$ and $L_2(\hat F_n)$ are both affine equivariant.

In this paper, central limit theorems for the trimmed means are obtained through an extension of the classical delta method, the so-called functional delta method (van der Vaart, 1998, chap. 20). According to the functional delta method, given that a statistical functional has a derivative in some appropriate sense, the asymptotic distribution of the corresponding statistic is determined by the derivative and the asymptotic behavior of the empirical process. References on this approach to asymptotics of classical L-statistics can be found in Boos (1979), Serfling (1980), Fernholz (1983) and van der Vaart (1998). The functional delta method relies on the theory of weak convergence for empirical processes. Our main reference for the necessary tools is van der Vaart and Wellner (1996), for short, VW.
Since the methodology is applied to vector-valued functionals, central limit theorems for empirical processes indexed by classes of vector-valued maps are needed. Extension of classical theorems stated for real-valued maps is straightforward and, except for some new notation, does not require going into details. Let $L_2(F, \mathbb{R}^m)$ be the space of $F$-square-integrable $\mathbb{R}^m$-valued maps defined on $\mathbb{R}^d$ and normed by $(F(|f|^2))^{1/2}$. Suppose that $\mathcal{F} \subset L_2(F, \mathbb{R}^m)$ is nonempty and totally bounded with respect to the $L_2$ semimetric $\rho_F(f,g) := [F(|f-g|^2)]^{1/2}$ and define $UC(\mathcal{F}, \rho_F)$ to be the space of $\mathbb{R}^m$-valued uniformly continuous functions on $\mathcal{F}$ with respect to $\rho_F$. Let $\mathbb{G}_n := \sqrt{n}[\hat F_n - F]$ be the empirical process viewed as an element of $\ell^\infty(\mathcal{F}, \mathbb{R}^m)$, the space of bounded $\mathbb{R}^m$-valued maps on $\mathcal{F}$ equipped with the sup norm $\|t\|_{\mathcal{F}} := \sup_{f \in \mathcal{F}} |t(f)|$. If $\mathcal{F}$ is a Donsker class for $F$, then it is known that $\mathbb{G}_n \rightsquigarrow \mathbb{G}_F$ in $\ell^\infty(\mathcal{F}, \mathbb{R}^m)$, where $\mathbb{G}_F$ is the $F$-Brownian bridge; furthermore, there always exists a version of $\mathbb{G}_F$ such that all sample paths belong to $UC(\mathcal{F}, \rho_F)$ (for the real-valued case, see VW, 1996, p. 81, 89). An example of a Donsker class is $\mathcal{F} = \mathcal{H} \subset L_2(F) \equiv L_2(F, \mathbb{R})$, where each closed halfspace is identified with its indicator function (VW, 1996, p. 152, Problem 14).

A convenient notion of differentiability for statistical functionals is the following. Given a map $\phi : \mathbb{D}_\phi \subset \mathbb{D} \to \mathbb{E}$ where $\mathbb{D}$, $\mathbb{E}$ are normed spaces, $\phi$ is said to be Hadamard differentiable tangentially to a set $\mathbb{D}_0 \subset \mathbb{D}$ at $\theta \in \mathbb{D}_\phi$ (VW, 1996, pp. 372--373) if there exists a continuous linear map $\phi'_\theta : \mathbb{D}_0 \to \mathbb{E}$ such that

$$\lim_{n \to \infty} \frac{\phi(\theta + t_n h_n) - \phi(\theta)}{t_n} = \phi'_\theta(h)$$

for all sequences $(t_n)$ in $\mathbb{R}$ and $(h_n)$ in $\mathbb{D}$ such that $t_n \to 0$, $h_n \to h \in \mathbb{D}_0$ and $\theta + t_n h_n \in \mathbb{D}_\phi$ for every $n$. In the literature, Hadamard differentiability is also called compact differentiability (van der Vaart, 1998, p. 297).
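As a minimal illustration of this definition (an example added here for orientation, not taken from the paper), write $\phi$ for the functional and $\theta$ for the point of differentiation, and consider the evaluation map at a fixed real-valued $f_0$ belonging to the index class $\mathcal{F}$. Because this map is linear and bounded for the sup norm, the difference quotient can be computed exactly:

```latex
\[
\phi(\theta) := \theta(f_0), \qquad \theta \in \ell^\infty(\mathcal{F}),
\qquad |\phi(\theta)| \le \|\theta\|_{\mathcal{F}},
\]
\[
\frac{\phi(\theta + t_n h_n) - \phi(\theta)}{t_n}
  = \frac{t_n\, h_n(f_0)}{t_n}
  = h_n(f_0) \longrightarrow h(f_0)
  \quad \text{whenever } \|h_n - h\|_{\mathcal{F}} \to 0,
\]
\[
\text{so that } \phi'_\theta(h) = h(f_0) \text{ for every } \theta .
\]
```

For this linear functional the derivative does not depend on the point of differentiation, and applying it to the empirical process simply returns the centered and scaled sample average of $f_0$. The trimmed mean functionals studied in the paper are nonlinear, which is why their derivatives require the smoothness conditions introduced in Section 3.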
In the next section, application of the functional delta method always takes the following form: assuming that the statistical functional $\phi$ is Hadamard differentiable at a distribution $F$ tangentially to some set $\mathbb{D}_0$ and given that $\mathbb{G}_n \rightsquigarrow \mathbb{G}_F$ where $\mathbb{G}_F$ has its sample paths in $\mathbb{D}_0$, then $\sqrt{n}[\phi(\hat F_n) - \phi(F)] \rightsquigarrow \phi'_F(\mathbb{G}_F)$ (VW, 1996, Theorem 3.9.4).

3. Asymptotic normality

Given $x \in \mathbb{R}^d$, a halfspace $H[x,u]$ is said to be minimal (with respect to $F$) at $x$ if $D(x) = FH[x,u]$. If $D$ is continuous, then there exists at least one minimal halfspace at every $x$. Define the set of smooth points of $F$ to be

$$S_F := \{x : x \text{ has a unique minimal halfspace or } D(x) = 0\}.$$

Examples 4.9 and 4.10 in Massé (2004) illustrate the distinction between smooth and unsmooth (rough) points. Note that $D(x) = 0$ occurs only if $x$ is on the boundary of the support of $F$ or outside of it. In what follows, statistical functionals are Hadamard differentiable at $F$ provided $S_F$ is large enough.

For any $F$, it has been seen that $D$ always attains its supremum. Although the compact convex set of maximal points $Q_{\alpha^*}$ is nonempty, it is not necessarily a singleton. The centroid of $Q_{\alpha^*}$ is sometimes called the halfspace median or Tukey median of $F$ (Rousseeuw and Ruts, 1999, p. 219). By virtue of affine invariance and without loss of generality, we shall assume in the rest of the section that the Tukey median of the true distribution $F$ is 0.

Let $\mathcal{M}_0$ denote the set of probability distributions $F$ such that 0 belongs to the interior of $Q_\alpha$. If $F$ is absolutely continuous and its Tukey median is 0, then clearly $F \in \mathcal{M}_0$; moreover, if $F_n \rightsquigarrow F$, then Corollary 5.3 (b) implies that for $n$ large enough $F_n \in \mathcal{M}_0$. In the following, the domain of trimmed means functionals is always taken to be a subset of $\mathcal{M}_0$.

Assuming that $F \in \mathcal{M}_0$, define the radius functional

$$r(u) \equiv r_F(u) := \inf\{\rho \ge 0 : \rho u \notin Q_\alpha\}, \qquad u \in S^{d-1}.$$

It is readily seen that $r$ is continuous on $S^{d-1}$ and that the map $u \mapsto r(u)u$ is bijective from $S^{d-1}$ onto $\partial Q_\alpha$. In the following, it is shown that trimmed means functionals can always be expressed in terms of the radius functional.

Let $\mathcal{M}_{01} \subset \mathcal{M}_0$ denote the domain of the trimmed mean functional $L_1$. Let $T$ be the measurable map from $\mathbb{R}^d$ into $S^{d-1} \times [0,\infty)$ defined by

$$Tx = \begin{cases} (x/|x|,\ |x|), & x \ne 0 \\ (u_0,\ 0), & x = 0, \end{cases}$$

where $u_0$ is any fixed element of $S^{d-1}$. For any $F \in \mathcal{M}_{01}$, define $\tilde F$ as the image probability distribution $F \circ T^{-1}$ on $S^{d-1} \times [0,\infty)$ and put $\bar F(A) := \tilde F(A \times [0,\infty))$ for any Borel subset $A$ of $S^{d-1}$. If $(u,\rho)$ denotes an element of $S^{d-1} \times [0,\infty)$, let $\tilde F_u$ be a conditional distribution of $\rho$ given $u$ (Dudley, 1989, 10.2). Then the first trimmed mean functional can be written as

$$L_1(F) = \frac{\int_{S^{d-1}} \int_0^{r(u)} W(D(\rho u))\,\rho u\,\tilde F_u(d\rho)\,\bar F(du)}{\int_{S^{d-1}} \int_0^{r(u)} W(D(\rho u))\,\tilde F_u(d\rho)\,\bar F(du)}. \qquad (4)$$

In case $F$ is absolutely continuous with density $f$, the following spherical coordinate representation is also available:

$$L_1(F) = \frac{\int_\Theta \int_0^{r(u(\theta))} W(D(\rho u(\theta)))\,f(\rho u(\theta))\,\rho u(\theta)\,J(\rho,\theta)\,d\rho\,d\theta}{\int_\Theta \int_0^{r(u(\theta))} W(D(\rho u(\theta)))\,f(\rho u(\theta))\,J(\rho,\theta)\,d\rho\,d\theta},$$

where for $\theta \equiv (\theta_1, \dots, \theta_{d-1}) \in \Theta := (0,\pi)^{d-2} \times (0,2\pi)$,

$$u(\theta) := (\cos\theta_1,\ \sin\theta_1\cos\theta_2,\ \dots,\ \sin\theta_1\sin\theta_2\cdots\cos\theta_{d-1},\ \sin\theta_1\sin\theta_2\cdots\sin\theta_{d-1})$$

and $J(\rho,\theta) = \rho^{d-1}(\sin\theta_1)^{d-2}(\sin\theta_2)^{d-3}\cdots\sin\theta_{d-2}$ is the Jacobian.

The following smoothness conditions are needed in most of the section. Both are concerned with the behavior of $F$ near the boundary $\partial Q_\alpha$.

(A1) $F$ has a density function $f$ which is continuous and positive on a neighborhood of $\partial Q_\alpha$.

(A2) There exists $\alpha_0 \in (\alpha, 1/2]$ such that $Q^c_{\alpha_0} \equiv \{x : D(x) \le \alpha_0\} \subset S_F$ and $W$ is constant on $[\alpha_0, 1]$.

Remark 3.1. Condition (A1) implies that $\partial Q_\alpha = \{x : D(x) = \alpha\}$ so that $F\{x : D(x) = \alpha\} = 0$. Under (A2), $\partial Q_\alpha \subset S_F$ and, more generally, all points of depth small enough are smooth. For a large class of absolutely continuous centrally symmetric distributions $F$, $Q^c_{\alpha_0} \subset S_F$ for any $\alpha_0$ such that $\alpha < \alpha_0 < \alpha^*(F) = 1/2$.
For any $x \in \mathbb{R}^d$, let $H[x] \equiv H_F[x]$ denote any minimal halfspace for $F$ at $x$. Recall that such a minimal halfspace exists and is unique if $x \in S_F$ and $D(x) > 0$. Assuming that (A2) holds, then there exists a unique minimal halfspace at every $x = r(u)u \in \partial Q_\alpha$. Let $n(u) \equiv n_F(u)$ denote the unit vector normal and outward pointing to $\partial H[x]$, so that $H[x] \equiv H[r(u)u, n(u)]$. Define the convergence of a sequence of halfspaces $(H[x_n, u_n])$ to $H[x,u]$ to mean that $x_n \to x$ and $u_n \to u$. Then a compactness argument and the Lebesgue Dominated Convergence Theorem imply that $u \mapsto n(u)$ is continuous on $S^{d-1}$.

Given that $X \sim F$ and $u \in S^{d-1}$, let $F(\cdot\,; u)$ denote the distribution function of the univariate projection $u \cdot X$. Let $q(u) \equiv q_F(u) := \inf\{a : F((-\infty, a]; u) \ge 1 - \alpha\}$ denote the $(1-\alpha)$-quantile of $F(\cdot\,; u)$. The following condition is concerned with the regularity of the univariate distributions $F(\cdot\,; n(u))$ near $q(n(u))$. For $\varepsilon > 0$ and $u \in S^{d-1}$, let $J_u(\varepsilon) := [q(u) - \varepsilon, q(u) + \varepsilon]$.

(A3) For every $u \in S^{d-1}$, $F(\cdot\,; u)$ has a density function $f(\cdot\,; u)$ such that the map $u \mapsto f(q(n(u)); n(u))$ is continuous and for some $\varepsilon > 0$

$$0 < \inf_{n(u)} \inf_{a \in J_{n(u)}(\varepsilon)} f(a; n(u)) \le \sup_{n(u)} \sup_{a \in J_{n(u)}(\varepsilon)} f(a; n(u)) < \infty.$$

Remark 3.2. Suppose that (A2) and (A3) hold. Then

$$\inf_u F\big(H[r(u)u - \varepsilon n(u), n(u)] \setminus H[r(u)u + \varepsilon n(u), n(u)]\big) > 0,$$

where, for each $u$, $H[r(u)u - \varepsilon n(u), n(u)] \setminus H[r(u)u + \varepsilon n(u), n(u)]$ is seen to be a strip centered on the boundary of the support hyperplane at $r(u)u$. Roughly, this says that there is no gap in the probability distribution around the boundary of the trimmed region. To give an instance where (A3) is met, consider an elliptically symmetric distribution $F$ with density of the form $f(x) = |\Sigma|^{-1/2} g(x^T \Sigma^{-1} x)$, where $\Sigma$ is a symmetric positive definite matrix of order $d$ and $g \ge 0$. Then, according to Chen and Tyler (2002),

$$f(a; u) = \frac{f_0\big(a/\sqrt{u^T \Sigma u}\big)}{\sqrt{u^T \Sigma u}}, \qquad a \in \mathbb{R},\ u \in S^{d-1},$$

where

$$f_0(y_1) = \int g\Big(y_1^2 + \sum_{i=2}^d y_i^2\Big)\,dy_2 \cdots dy_d.$$

Since $q(n(u)) = r(u)\,u \cdot n(u)$, (A3) holds provided $f_0$ is continuous and bounded away from 0 and $\infty$ on some appropriate interval of positive numbers, a condition which clearly occurs if $g$ is well chosen.

Assuming that $F$ and $W$ satisfy conditions (A1)--(A3), define the bounded maps $G_{1i} : \mathbb{R}^d \to \mathbb{R}^d$, $i = 1, 2, 3$, as follows:

$$G_{11}(x) := \int_{Q_\alpha} W'(D(y))\,1_{H[y]}(x)\,[y - L_1(F)]\,F(dy) \Big/ \int_{Q_\alpha} W(D(y))\,F(dy),$$

$$G_{12}(x) := W(D(x))\,1_{Q_\alpha}(x)\,[x - L_1(F)] \Big/ \int_{Q_\alpha} W(D(y))\,F(dy)$$

and

$$G_{13}(x) := -\int_\Theta \frac{W(\alpha)\,f(r(u(\theta))u(\theta))\,J(r(u(\theta)),\theta)}{u(\theta) \cdot n(u(\theta))\,f(q(n(u(\theta))); n(u(\theta)))}\,1_{H(u(\theta))}(x)\,[r(u(\theta))u(\theta) - L_1(F)]\,d\theta \Big/ \int_{Q_\alpha} W(D(y))\,F(dy),$$

where $H(u) \equiv H_F(u) := H[r(u)u, -n(u)]$. The following central limit theorems hold for the first trimmed mean.

Theorem 3.3. (a) Let $d \le 2$. Assume that $F$ and $W$ satisfy conditions (A1)--(A3), where in the case $d = 2$ the density function $f$ is bounded on the interior of $Q_{\alpha - \eta}$ for some $0 < \eta < \alpha$. Then $\sqrt{n}[L_1(\hat F_n) - L_1(F)] \rightsquigarrow N(0, \mathrm{Cov}(G_{11}(X) + G_{12}(X) + G_{13}(X)))$.
(b) Let $d \ge 1$. Assume that $F$ and $W$ satisfy conditions (A1) and (A2) and $W = 0$ on $[0, \alpha]$. Then $\sqrt{n}[L_1(\hat F_n) - L_1(F)] \rightsquigarrow N(0, \mathrm{Cov}(G_{11}(X) + G_{12}(X)))$.

Remark 3.4. The proof of Theorem 3.3(a) relies on the fact that for $d \le 2$ the collection of compact convex subsets of a bounded set of $\mathbb{R}^d$ is a Donsker class. According to Dudley (1999, p. 365), if $d \ge 3$, then the corresponding collection is not a Donsker class with respect to the uniform distribution over a bounded set.
Remark 3.5. Trimmed means such that $W = 0$ on $[0, \alpha]$ were considered by Dümbgen (1992), Massé (2004) and Zuo et al. (2004a). Continuous differentiability of $W$ implies that observations near the boundary $\partial Q_{\alpha n}$ have a weight close to 0, thus excluding application to sums of the form $\sum_{i : D_n(X_i) \ge \alpha} X_i / \#\{i : D_n(X_i) \ge \alpha\}$. Theorem 3.3(b) has been obtained in Massé (2004, Theorem 3.2) using a different proof based on the asymptotic tightness of the empirical Tukey depth process.

Remark 3.6. Under the hypotheses of Theorem 3.3(a) or (b), $L_1(F)$ has a bounded influence function (Hampel et al., 1986). For instance, in part (a) conditions (A1)--(A3) imply that the influence function is

$$IF(x) \equiv IF(x; L_1, F) = G_{11}(x) + G_{12}(x) + G_{13}(x) - FG_{11} - FG_{12} - FG_{13}, \qquad x \in \mathbb{R}^d,$$

and $\sqrt{n}[L_1(\hat F_n) - L_1(F)] \rightsquigarrow N(0, E[IF(X) \cdot IF(X)^T])$. A similar result holds for the centroid trimmed mean functional and Theorem 3.8(a) and (b) later on in this section.

Example 3.7. Consider the case $d = 1$. Assume that (A1)--(A3) hold for $F$, $W$ and some $\alpha < \alpha_0 < 1/2$, so that $F$ has a density $f$ which is positive and continuous in the neighborhood of $F^{-1}(\alpha)$ and $F^{-1}(1-\alpha)$, 0 is the median of $F$ and $W$ is constant on $[\alpha_0, 1]$. Assume that $f$ is bounded in some neighborhood of $Q_\alpha = [F^{-1}(\alpha), F^{-1}(1-\alpha)]$. Then Theorem 3.3 says that $\sqrt{n}[L_1(\hat F_n) - L_1(F)] \rightsquigarrow N(0, \sigma^2)$, where $\sigma^2$ is the variance of $G_{11}(X) + G_{12}(X) + G_{13}(X)$, and $G_{13}(x)$ is given by

$$G_{13}(x) = \begin{cases} \dfrac{-W(\alpha)[F^{-1}(1-\alpha) - L_1(F)]}{\int_{F^{-1}(\alpha)}^{F^{-1}(1-\alpha)} W(D(y))\,F(dy)}, & x \le F^{-1}(\alpha) \\[2ex] \dfrac{-W(\alpha)[F^{-1}(\alpha) + F^{-1}(1-\alpha) - 2L_1(F)]}{\int_{F^{-1}(\alpha)}^{F^{-1}(1-\alpha)} W(D(y))\,F(dy)}, & F^{-1}(\alpha) < x < F^{-1}(1-\alpha) \\[2ex] \dfrac{-W(\alpha)[F^{-1}(\alpha) - L_1(F)]}{\int_{F^{-1}(\alpha)}^{F^{-1}(1-\alpha)} W(D(y))\,F(dy)}, & F^{-1}(1-\alpha) \le x. \end{cases}$$

Under the above hypotheses, Stigler (1974, Theorem 5, Remark 3) has shown that $L_1(\hat F_n)$ is asymptotically $N(L_1(F), n^{-1}\sigma^2(W,F))$, where

$$\sigma^2(W,F) := \int_{F^{-1}(\alpha)}^{F^{-1}(1-\alpha)} \int_{F^{-1}(\alpha)}^{F^{-1}(1-\alpha)} J(F(x))\,J(F(y))\,[F(\min(x,y)) - F(x)F(y)]\,dx\,dy$$

and

$$J(u) := \frac{W(D(F^{-1}(u)))}{\int_{Q_\alpha} W(D(y))\,F(dy)}, \quad \alpha < u < 1-\alpha, \qquad J(u) := 0 \text{ otherwise}.$$

According to Serfling (1980, p. 276, 280), $\sigma^2(W,F) = E[IF(X; L_1, F)^2]$, where $IF(X; L_1, F)$ is centered and defined by

$$IF(x; L_1, F) := -\int_{F^{-1}(\alpha)}^{F^{-1}(1-\alpha)} [1_{[x,\infty)}(y) - F(y)]\,J(F(y))\,dy.$$

Using integration by parts, it is readily checked that $IF(x; L_1, F) - G_{11}(x) - G_{12}(x) - G_{13}(x)$ is a constant, so that Theorem 3.3(a) overlaps Stigler's central limit theorem. In case $W$ is constant, $IF(x; L_1, F)$ is seen to reduce to the well-known expression of the influence function of the classical trimmed mean (see for example Huber, 1981, p. 58).

Let $\mathcal{M}_{02} \subset \mathcal{M}_0$ denote the domain of the trimmed mean functional $L_2$. For any $F \in \mathcal{M}_{02}$, $L_2$ can be represented through spherical coordinates as

$$L_2(F) = \frac{\int_\Theta \int_0^{r(u(\theta))} W(D(\rho u(\theta)))\,\rho u(\theta)\,J(\rho,\theta)\,d\rho\,d\theta}{\int_\Theta \int_0^{r(u(\theta))} W(D(\rho u(\theta)))\,J(\rho,\theta)\,d\rho\,d\theta}.$$

Assuming that $F$ and $W$ satisfy conditions (A1)--(A3), define the bounded maps $G_{21}, G_{23} : \mathbb{R}^d \to \mathbb{R}^d$ to be

$$G_{21}(x) := \int_{Q_\alpha} W'(D(y))\,1_{H[y]}(x)\,[y - L_2(F)]\,dy \Big/ \int_{Q_\alpha} W(D(y))\,dy$$

and

$$G_{23}(x) := -\int_\Theta \frac{W(\alpha)\,J(r(u(\theta)),\theta)}{u(\theta) \cdot n(u(\theta))\,f(q(n(u(\theta))); n(u(\theta)))}\,1_{H(u(\theta))}(x)\,[r(u(\theta))u(\theta) - L_2(F)]\,d\theta \Big/ \int_{Q_\alpha} W(D(y))\,dy.$$
The following central limit theorems hold for the centroid trimmed mean.

Theorem 3.8. (a) Let $d \ge 1$. Assume that $F$ and $W$ satisfy conditions (A1)--(A3). Then $\sqrt{n}[L_2(\hat F_n) - L_2(F)] \rightsquigarrow N(0, \mathrm{Cov}(G_{21}(X) + G_{23}(X)))$.
(b) Let $d \ge 1$. Assume that $F$ and $W$ satisfy conditions (A1)--(A2) and $W = 0$ on $[0, \alpha]$. Then $\sqrt{n}[L_2(\hat F_n) - L_2(F)] \rightsquigarrow N(0, \mathrm{Cov}(G_{21}(X)))$.

Example 3.9. If $d = 1$ and $W$ is constant, the centroid functional is seen to take the pleasant form

$$L_2(F) = \frac{F^{-1}(\alpha) + F^{-1}(1-\alpha)}{2}.$$

Let $IQ_\alpha := F^{-1}(1-\alpha) - F^{-1}(\alpha)$ denote the length of the $\alpha$-interquantile interval. Then $IF(x; L_2, F) = G_{23}(x) - F(G_{23})$, where $G_{23}$ is the step function given by

$$IQ_\alpha \cdot G_{23}(x) = \begin{cases} -\dfrac{F^{-1}(1-\alpha) - L_2(F)}{f(F^{-1}(1-\alpha))}, & x \le F^{-1}(\alpha) \\[2ex] -\left[\dfrac{F^{-1}(\alpha) - L_2(F)}{f(F^{-1}(\alpha))} + \dfrac{F^{-1}(1-\alpha) - L_2(F)}{f(F^{-1}(1-\alpha))}\right], & F^{-1}(\alpha) < x < F^{-1}(1-\alpha) \\[2ex] -\dfrac{F^{-1}(\alpha) - L_2(F)}{f(F^{-1}(\alpha))}, & F^{-1}(1-\alpha) \le x. \end{cases}$$

In case $F$ is symmetric, it follows that

$$IF(x; L_2, F) = \begin{cases} -\dfrac{1}{2f(F^{-1}(1-\alpha))}, & x \le F^{-1}(\alpha) \\[1.5ex] 0, & F^{-1}(\alpha) < x < F^{-1}(1-\alpha) \\[1.5ex] \dfrac{1}{2f(F^{-1}(1-\alpha))}, & F^{-1}(1-\alpha) \le x. \end{cases}$$

Therefore, under the assumptions of Theorem 3.8 and symmetry,

$$\sqrt{n}[L_2(\hat F_n) - L_2(F)] \rightsquigarrow N\big(0,\ \alpha/(2f^2(F^{-1}(\alpha)))\big),$$

a well-known result, found for example in Staudte and Sheather (1990, p. 142).

4. Efficiency and finite-sample robustness

4.1. Asymptotic efficiency

As an application of the foregoing results, we study the relative asymptotic efficiency of trimmed means with respect to the sample mean in the case when $F$ has a spherically symmetric density $f$. Assume without loss of generality that 0 is the point of symmetry, so that $L_1(F) = L_2(F) = 0$. Let $|x|$ denote the euclidean norm. Then $f(x) = g(|x|)$ and $D(x) = \delta(|x|)$ for some nonnegative functions $g$ and $\delta$, hence the trimmed region $Q_\alpha$ is a ball of radius $R_\alpha$ and $u \cdot n(u) = 1$ for every $u \in S^{d-1}$. Furthermore, it follows that all univariate projections $u \cdot X$ have the same distribution.

Under spherical symmetry, it is well known that $|X|$ and $X/|X|$ are independent and $X/|X|$ is uniformly distributed over $S^{d-1}$. In the following this is used to compute the asymptotic covariance matrices in two special cases when $F$ is a multivariate t distribution. Let

$$f_T(x; d, \nu) := \frac{\Gamma((d+\nu)/2)}{(\nu\pi)^{d/2}\,\Gamma(\nu/2)} \left(1 + \frac{|x|^2}{\nu}\right)^{-(d+\nu)/2} \equiv g(|x|), \qquad x \in \mathbb{R}^d,$$

denote the $d$-dimensional t distribution with $\nu$ degrees of freedom. Let $f_0$ be the common density of the univariate projections. Then clearly $f_0(t) = f_0(-t) \equiv f_T(t; 1, \nu)$, so that $R_\alpha$ is the $(1-\alpha)$-quantile of the corresponding univariate t distribution.

For a spherically symmetric $F$ and each $d > 1$, it can be shown (see Section 5) that

$$\int_{Q_\alpha} W(\delta(|y|))\,F(dy) \cdot G_{11}(x) = \frac{2\pi^{(d-1)/2}}{(d-1)\,\Gamma((d-1)/2)} \int_0^{\min(R_\alpha, |x|)} W'(\delta(\rho)) \left(1 - \frac{\rho^2}{|x|^2}\right)^{(d-1)/2} g(\rho)\,\rho^d\,d\rho \cdot \frac{x}{|x|} \qquad (5)$$
Table 1
Asymptotic relative efficiency of $L_1(\hat F_n)$ relative to the sample mean in dimension 2 at the bivariate t distribution with $\nu$ degrees of freedom ($\nu = \infty$ corresponds to the standard bivariate normal distribution) for various trimmings $\alpha$ in the case of a constant weight function

| $\nu$ \ $\alpha$ | 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 1.80 | 1.91 | 1.93 | 1.90 | 1.84 | 1.77 | 1.69 | 1.61 | 1.54 |
| 8 | 1.06 | 1.03 | 0.99 | 0.95 | 0.90 | 0.85 | 0.81 | 0.77 | 0.74 |
| 15 | 0.98 | 0.94 | 0.89 | 0.84 | 0.80 | 0.76 | 0.72 | 0.68 | 0.65 |
| 30 | 0.95 | 0.89 | 0.85 | 0.80 | 0.75 | 0.71 | 0.67 | 0.64 | 0.61 |
| $\infty$ | 0.92 | 0.86 | 0.81 | 0.76 | 0.72 | 0.68 | 0.64 | 0.61 | 0.58 |
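and

$$\int_{Q_\alpha} W(\delta(|y|))\,F(dy) \cdot G_{13}(x) = \begin{cases} \dfrac{2\pi^{(d-1)/2}\,W(\alpha)\,g(R_\alpha)\,R_\alpha^d}{(d-1)\,\Gamma((d-1)/2)\,f_0(R_\alpha)} \left(1 - \dfrac{R_\alpha^2}{|x|^2}\right)^{(d-1)/2} \dfrac{x}{|x|}, & x \notin Q_\alpha \\[2ex] 0, & x \in Q_\alpha. \end{cases} \qquad (6)$$

Moreover,

$$\int_{Q_\alpha} W(\delta(|y|))\,F(dy) = \omega_d \cdot \int_0^{R_\alpha} W(\delta(\rho))\,g(\rho)\,\rho^{d-1}\,d\rho,$$

where

$$\omega_d := \int_\Theta J(1,\theta)\,d\theta = \frac{2\pi^{d/2}}{\Gamma(d/2)}$$

is the area of $S^{d-1}$. Analogous expressions for $\int_{Q_\alpha} W(\delta(|y|))\,dy \cdot G_{21}(x)$ and $\int_{Q_\alpha} W(\delta(|y|))\,dy \cdot G_{23}(x)$ are obtained by taking $g(\rho) \equiv 1$ in (5) and (6).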
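The polar-coordinate identity above is easy to check numerically. The sketch below is an illustration only: it takes a constant weight equal to 1 and the standard normal distribution as assumptions, computes the area of the unit sphere from the Gamma-function formula, and compares the radial integral with the closed-form probability of a disk in dimension 2. The helper names `sphere_area`, `normal_radial_density` and `ball_mass` are hypothetical.

```python
import math

def sphere_area(d):
    """Surface area of the unit sphere S^(d-1): 2 * pi^(d/2) / Gamma(d/2)."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

def normal_radial_density(rho, d):
    """g(rho) for the standard normal density on R^d, f(x) = g(|x|)."""
    return (2.0 * math.pi) ** (-d / 2) * math.exp(-rho * rho / 2.0)

def ball_mass(d, radius, g, n=2000):
    """omega_d * integral_0^radius g(rho) rho^(d-1) d rho, by the midpoint rule."""
    h = radius / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * h
        total += g(rho) * rho ** (d - 1)
    return sphere_area(d) * total * h

def g2(rho):
    """Radial density of the standard bivariate normal distribution."""
    return normal_radial_density(rho, 2)

# In dimension 2 the probability of the disk of radius R under the standard
# normal distribution is 1 - exp(-R^2 / 2); the two numbers should agree closely.
R = 1.5
print(ball_mass(2, R, g2), 1.0 - math.exp(-R * R / 2.0))
```

The same routine with a nonconstant weight inside the integrand would give the normalizing constant appearing on the left-hand sides of (5) and (6).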
Throughout this section, it follows from the above that all $G_{ij}(X)$ components have 0 expectation, and $E[X \cdot X^T/|X|^2] = I_d/d$, where $I_d$ is the $d$-dimensional identity matrix.

4.1.1. Case when W is constant on [0, 1]

Then $G_{11} \equiv 0$ and $G_{12} \cdot G_{13}^T \equiv 0$, where $\int_{Q_\alpha} F(dy) \cdot G_{12}(x) = |x|\,1_{[0,R_\alpha]}(|x|)\,x/|x|$. According to Theorem 3.3(a), a central limit theorem for $L_1(\hat F_n)$ holds when $d = 1, 2$. When $d = 2$, the asymptotic covariance matrix is given by

$$E[G_{12}(X) \cdot G_{12}(X)^T] + E[G_{13}(X) \cdot G_{13}(X)^T] = \frac{1}{2\pi \left(\int_0^{R_\alpha} \rho\,g(\rho)\,d\rho\right)^2} \left[\int_0^{R_\alpha} \rho^3 g(\rho)\,d\rho + \frac{4\,g(R_\alpha)^2 R_\alpha^4}{f_0(R_\alpha)^2} \int_{R_\alpha}^\infty \rho\,g(\rho) \left(1 - \frac{R_\alpha^2}{\rho^2}\right) d\rho\right] \cdot I_2/2.$$

For any $\nu > 2$ and any $d$, the asymptotic covariance matrix of the sample mean under the multivariate t distribution with $\nu$ degrees of freedom is known to be $\nu/(\nu-2)\,I_d$. Throughout this subsection, the asymptotic covariance matrix of trimmed means is seen to be a multiple of the identity matrix, so that asymptotic relative efficiency (ARE) with respect to the sample mean takes the form of the inverse of that multiple times $\nu/(\nu-2)$. Table 1 displays ARE of the first trimmed mean for nine trimmings and five t distributions when $d = 2$. It is seen that, except for $\nu = 3$, efficiency of the first trimmed mean decreases as $\alpha$ increases. When $\nu \le 8$, the trimmed mean is more efficient than the sample mean provided trimming is low enough. All calculations in this section were done with R. Programs can be obtained from the author.

Similarly, for any $d$ the asymptotic covariance matrix of $L_2(\hat F_n)$ is given by the following multiple of the identity matrix:

$$E[G_{23}(X) G_{23}(X)^T] = \frac{4\pi^{d-1} d^2}{\omega_d\,(d-1)^2\,\Gamma((d-1)/2)^2\,f_0(R_\alpha)^2} \int_{R_\alpha}^\infty \left(1 - \frac{R_\alpha^2}{\rho^2}\right)^{d-1} g(\rho)\,\rho^{d-1}\,d\rho \cdot I_d/d.$$
Table 2 displays ARE of the centroid trimmed mean for various trimmings and t distributions in dimensions 2--10. Remarkably, in all cases efficiency increases as dimension grows. It is also noted that for every $\nu$, highest efficiency is achieved when $\alpha \ge 0.15$. Separate calculations have shown that low trimming ($\alpha \le 0.01$) determines poor efficiency for every $\nu$.
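To make the efficiency calculation concrete, the following sketch evaluates the two constant-weight covariance multiples of this subsection for the standard normal distribution, which corresponds to the $\nu = \infty$ rows of the tables and for which the sample mean has covariance $I_d$. It is an illustration, not the author's R program: the integrals are approximated with a composite Simpson rule truncated at an assumed upper limit of 12, only the Python standard library is used (which provides normal but not t quantiles), and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def sphere_area(d):
    """Surface area of the unit sphere S^(d-1)."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

def are_first_normal_2d(alpha, upper=12.0):
    """ARE of the constant-weight first trimmed mean, N(0, I_2), d = 2."""
    R = NormalDist().inv_cdf(1.0 - alpha)       # radius of the trimmed disk
    f0 = NormalDist().pdf(R)                    # projection density at R
    g = lambda r: math.exp(-r * r / 2.0) / (2.0 * math.pi)
    a = simpson(lambda r: r * g(r), 0.0, R)
    b = simpson(lambda r: r ** 3 * g(r), 0.0, R)
    c = simpson(lambda r: r * g(r) * (1.0 - R * R / (r * r)), R, upper)
    multiple = (b + 4.0 * g(R) ** 2 * R ** 4 / f0 ** 2 * c) / (2.0 * math.pi * a * a) / 2.0
    return 1.0 / multiple                       # sample mean covariance is I_2

def are_centroid_normal(d, alpha, upper=12.0):
    """ARE of the constant-weight centroid trimmed mean, N(0, I_d), d >= 2."""
    R = NormalDist().inv_cdf(1.0 - alpha)
    f0 = NormalDist().pdf(R)
    g = lambda r: (2.0 * math.pi) ** (-d / 2) * math.exp(-r * r / 2.0)
    coef = (4.0 * math.pi ** (d - 1) * d ** 2) / (
        sphere_area(d) * (d - 1) ** 2 * math.gamma((d - 1) / 2) ** 2 * f0 ** 2)
    tail = simpson(lambda r: (1.0 - R * R / (r * r)) ** (d - 1) * g(r) * r ** (d - 1),
                   R, upper)
    multiple = coef * tail / d                  # covariance = multiple * I_d
    return 1.0 / multiple

print(round(are_first_normal_2d(0.05), 2))      # compare with Table 1, nu = infinity
print(round(are_centroid_normal(2, 0.30), 2))   # compare with Table 2, nu = infinity, d = 2
```

Under these assumptions the computed values should land close to the tabulated normal-case entries; reproducing the finite-$\nu$ rows would additionally require the quantile function and density of the univariate t distribution.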
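Table 2
Asymptotic relative efficiency of $L_2(\hat F_n)$ relative to the sample mean at the multivariate t distribution with $\nu$ degrees of freedom ($\nu = \infty$ corresponds to the standard multivariate normal distribution), nine dimensions, four trimmings and constant weight function

| $\nu$ | $\alpha$ | d = 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 0.05 | 0.31 | 0.33 | 0.35 | 0.36 | 0.36 | 0.37 | 0.37 | 0.37 | 0.38 |
| 3 | 0.15 | 1.20 | 1.26 | 1.30 | 1.32 | 1.33 | 1.34 | 1.35 | 1.35 | 1.36 |
| 3 | 0.30 | 1.99 | 2.07 | 2.11 | 2.14 | 2.15 | 2.17 | 2.18 | 2.18 | 2.19 |
| 3 | 0.45 | 2.04 | 2.18 | 2.25 | 2.30 | 2.34 | 2.37 | 2.38 | 2.40 | 2.41 |
| 8 | 0.05 | 0.41 | 0.46 | 0.49 | 0.51 | 0.52 | 0.53 | 0.54 | 0.55 | 0.55 |
| 8 | 0.15 | 0.88 | 0.93 | 0.96 | 0.97 | 0.98 | 0.99 | 0.99 | 0.99 | 1.00 |
| 8 | 0.30 | 1.08 | 1.12 | 1.14 | 1.15 | 1.16 | 1.16 | 1.17 | 1.17 | 1.17 |
| 8 | 0.45 | 1.01 | 1.07 | 1.11 | 1.14 | 1.16 | 1.17 | 1.18 | 1.19 | 1.19 |
| 15 | 0.05 | 0.48 | 0.55 | 0.59 | 0.61 | 0.63 | 0.64 | 0.66 | 0.66 | 0.67 |
| 15 | 0.15 | 0.87 | 0.92 | 0.95 | 0.96 | 0.97 | 0.98 | 0.98 | 0.98 | 0.98 |
| 15 | 0.30 | 0.98 | 1.02 | 1.03 | 1.04 | 1.05 | 1.06 | 1.06 | 1.06 | 1.07 |
| 15 | 0.45 | 0.90 | 0.96 | 0.99 | 1.01 | 1.03 | 1.04 | 1.05 | 1.06 | 1.06 |
| 30 | 0.05 | 0.53 | 0.61 | 0.65 | 0.68 | 0.71 | 0.73 | 0.74 | 0.75 | 0.76 |
| 30 | 0.15 | 0.87 | 0.92 | 0.95 | 0.96 | 0.97 | 0.98 | 0.98 | 0.98 | 0.99 |
| 30 | 0.30 | 0.94 | 0.97 | 0.99 | 1.00 | 1.00 | 1.01 | 1.01 | 1.02 | 1.02 |
| 30 | 0.45 | 0.85 | 0.91 | 0.94 | 0.96 | 0.97 | 0.98 | 0.99 | 1.00 | 1.00 |
| $\infty$ | 0.05 | 0.59 | 0.67 | 0.73 | 0.77 | 0.80 | 0.82 | 0.84 | 0.86 | 0.87 |
| $\infty$ | 0.15 | 0.87 | 0.93 | 0.96 | 0.97 | 0.98 | 0.98 | 0.99 | 0.99 | 0.99 |
| $\infty$ | 0.30 | 0.90 | 0.93 | 0.95 | 0.96 | 0.96 | 0.97 | 0.97 | 0.97 | 0.98 |
| $\infty$ | 0.45 | 0.81 | 0.86 | 0.89 | 0.91 | 0.92 | 0.93 | 0.94 | 0.95 | 0.95 |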
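4.2. Case when W = 0 on [0, α]

Let $\beta > 0$ be such that $\alpha_0 := \alpha + \beta < 1/2$, so that $R_{\alpha_0} < R_\alpha$. Here we consider weight functions of the form

$$W(a) \equiv W_{\alpha,\beta}(a) := \begin{cases} 0, & 0 \le a < \alpha \\ 2(a-\alpha)^2/\beta^2, & \alpha \le a < \alpha + \beta/2 \\ -2(a-\alpha)^2/\beta^2 + 4(a-\alpha)/\beta - 1, & \alpha + \beta/2 \le a < \alpha + \beta \\ 1, & \alpha + \beta \le a \le 1. \end{cases}$$

Then $W$ is of the type considered in (A2) with continuous derivative given by

$$W'(a) = \begin{cases} 4(a-\alpha)/\beta^2, & \alpha \le a < \alpha + \beta/2 \\ -4(a-\alpha)/\beta^2 + 4/\beta, & \alpha + \beta/2 \le a < \alpha + \beta \\ 0, & \text{elsewhere.} \end{cases}$$

For such $W$ we note that $G_{11}(x) = 0$ for $|x| \le R_{\alpha_0}$. Fig. 1 shows $W$ and $W'$ when $\alpha = 0.10$ and $\beta = 0.35$ ($\alpha_0 = 0.45$). For $W = W_{\alpha,\beta}$, the asymptotic covariance matrix of $L_1(\hat F_n)$ is given by

$$E[G_{11}(X)G_{11}(X)^T] + E[G_{12}(X)G_{12}(X)^T] + 2E[G_{11}(X)G_{12}(X)^T]$$
$$= \omega_d^{-1} \left(\int_0^{R_\alpha} W(\delta(\rho))\,g(\rho)\,\rho^{d-1}\,d\rho\right)^{-2} \Bigg[ \frac{4\pi^{d-1}}{(d-1)^2\,\Gamma((d-1)/2)^2} \int_{R_{\alpha_0}}^\infty \left(\int_{R_{\alpha_0}}^{\min(\rho, R_\alpha)} W'(\delta(a)) \left(1 - \frac{a^2}{\rho^2}\right)^{(d-1)/2} g(a)\,a^d\,da\right)^2 g(\rho)\,\rho^{d-1}\,d\rho$$
$$+ \int_0^{R_\alpha} W(\delta(\rho))^2\,g(\rho)\,\rho^{d+1}\,d\rho + \frac{4\pi^{(d-1)/2}}{(d-1)\,\Gamma((d-1)/2)} \int_{R_{\alpha_0}}^{R_\alpha} \left(\int_{R_{\alpha_0}}^{\rho} W'(\delta(a)) \left(1 - \frac{a^2}{\rho^2}\right)^{(d-1)/2} g(a)\,a^d\,da\right) W(\delta(\rho))\,g(\rho)\,\rho^{d}\,d\rho \Bigg] \cdot I_d/d.$$

For $L_1(\hat F_n)$, calculations show that best overall performance is achieved when $\alpha \le 0.01$ and $\nu$ is small. It can also be checked that higher trimming determines very low efficiency in dimensions $\ge 6$. Even though $\beta$ has less effect than $\alpha$ on overall performance, a large $\beta$ increases efficiency when $\nu$ is small.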
[Figure 1. Panels: W(x, 0.1, 0.35) and W′(x, 0.1, 0.35), each plotted against x.]
Fig. 1. Weight function W = W_{α,β} and derivative W′ when α = 0.10 and β = 0.35 (α0 = 0.45).
Table 3
Asymptotic relative efficiency of L1(F̂n) relative to the sample mean at the multivariate t distribution with ν degrees of freedom (ν = ∞ corresponds to the standard multivariate normal distribution), where for each d and each ν the first entry refers to W_{0.001,0.05} and the second to W_{0.0005,0.1}

ν     Weight            d=2    d=3    d=4    d=5    d=6    d=7    d=8    d=9    d=10
3     W_{0.001,0.05}    1.795  1.856  1.885  1.894  1.889  1.875  1.855  1.830  1.802
      W_{0.0005,0.1}    1.978  2.025  2.030  2.010  1.976  1.933  1.885  1.834  1.782
5     W_{0.001,0.05}    1.254  1.247  1.218  1.176  1.127  1.076  1.023  0.970  0.918
      W_{0.0005,0.1}    1.311  1.288  1.241  1.180  1.113  1.043  0.974  0.907  0.844
10    W_{0.001,0.05}    1.092  1.055  0.998  0.927  0.854  0.779  0.705  0.633  0.565
      W_{0.0005,0.1}    1.106  1.056  0.983  0.895  0.806  0.717  0.632  0.553  0.481
30    W_{0.001,0.05}    1.024  0.971  0.893  0.806  0.714  0.620  0.529  0.442  0.362
      W_{0.0005,0.1}    1.019  0.955  0.861  0.757  0.651  0.546  0.452  0.353  0.276
∞     W_{0.001,0.05}    0.997  0.936  0.851  0.751  0.648  0.542  0.440  0.388  0.285
      W_{0.0005,0.1}    0.984  0.910  0.810  0.696  0.579  0.524  0.388  0.277  0.196
To illustrate cases of relatively good performance, two pairs of values of (α, β) have been selected, namely (0.001, 0.05) and (0.0005, 0.1). As seen from Table 3, for both choices efficiency can decrease significantly as dimension grows. Finally, for the weight function W = W_{α,β}, the asymptotic covariance matrix of L2(F̂n) takes the form E[G21(X)G21(X)^T] =
4 d−1 2 d 0 W( ())d−1 d (d − 1)2 ((d − 1)/2)2 ⎛ ⎞2 d−1 ∞ min(,R ) 2 2 a ⎜ ⎟ × W ( (a))ad 1 − da⎠ g()d−1 d · Id /d. ⎝ R
R0
R0
2
Calculations show that L2 ( Fn ) achieves good efficiency when 0.1, provided is not too small. They also show that low trimming ( 0.05) implies low efficiency when 8 and d is large; furthermore, it can be checked that high trimming determines high efficiency when is small. To illustrate situations of high efficiency, two pairs of (, ) have been selected, namely (0.25, 0.15) and (0.35, 0.01). Table 4 exhibits ARE in dimensions 2 to 10 for these W = W, . The main feature to be seen is that efficiency is little dependent on the dimensions considered. To summarize, in some situations both trimmed means can have high efficiency relative to the sample mean. However, in the cases considered a well-chosen centroid trimmed mean tends to perform much better than the trimmed mean of the first type. This is most obvious for a weight function of the type W, . Here, for dimensions 2 to 10, efficiency of the first trimmed mean is seen to decrease as dimension grows (except for = 3). This behavior is far different from that of the centroid whose efficiency, in some instances, increases with dimension and, in general, is little dependent on that variable. Finally, in defence of the first trimmed mean, one can point out that its two-dimensional version can achieve higher efficiency than the centroid.
Table 4
Asymptotic relative efficiency of L2(F̂n) relative to the sample mean at the multivariate t distribution with ν degrees of freedom (ν = ∞ corresponds to the standard multivariate normal distribution), where for each d and each ν the first entry refers to W_{0.25,0.15} and the second to W_{0.35,0.01}

ν     Weight           d=2    d=3    d=4    d=5    d=6    d=7    d=8    d=9    d=10
3     W_{0.25,0.15}    2.028  2.091  2.116  2.129  2.134  2.237  2.253  2.241  2.212
      W_{0.35,0.01}    2.085  2.166  2.217  2.192  2.240  2.296  2.338  2.374  2.395
8     W_{0.25,0.15}    1.085  1.120  1.138  1.148  1.238  1.209  1.165  1.125  1.098
      W_{0.35,0.01}    1.075  1.112  1.136  1.142  1.204  1.257  1.283  1.284  1.262
15    W_{0.25,0.15}    0.986  1.020  1.036  1.046  1.146  1.090  1.025  0.975  0.946
      W_{0.35,0.01}    0.970  1.002  1.023  1.094  1.126  1.115  1.079  1.035  0.997
30    W_{0.25,0.15}    0.943  0.974  0.991  1.000  1.110  1.026  0.940  0.878  0.847
      W_{0.35,0.01}    0.922  0.952  0.972  1.071  1.097  1.065  1.002  0.938  0.889
∞     W_{0.25,0.15}    0.905  0.934  0.949  0.960  0.966  0.971  0.975  0.978  0.981
      W_{0.35,0.01}    0.873  0.907  0.926  0.938  0.947  0.952  0.957  0.960  0.964
4.3. Breakdown point
Finite-sample robustness of a location statistic can be measured by means of the enlargement breakdown point. Let t be a location statistic well defined at the data set X := {X1, ..., Xn} in R^d. The enlargement breakdown point of t at X is defined to be the smallest fraction m/(n + m) of outliers that can make the statistic arbitrarily large:
\[
\varepsilon^*(t, X) := \min_{m \ge 1} \Bigl\{ \frac{m}{n+m} : \sup_{X_m} \lVert t(X_m) \rVert = \infty \Bigr\},
\]
where the supremum is taken over all corrupted samples X_m obtained by enlarging X with m arbitrary (possibly equal) points, and where we assume that t(X_m) is well defined for any such sample. For location equivariant statistics, it is well known that the enlargement breakdown point is at most 1/2 (Donoho, 1982).
It is understood next that a trimmed mean is evaluated on a corrupted sample X_m by using the empirical depth function with respect to X_m. For clarity, let D(·, X) and D(·, X_m) denote the empirical depth functions based on the data sets X and X_m, respectively. For any data set X of size n, define Q^n_α(X) := {x : D(x, X) ≥ α}. For any real number r, let ⌈r⌉ denote the smallest integer greater than or equal to r. The following proposition and corollary extend Lemma 3.1 and Proposition 3.3 in Donoho and Gasko (1992) to trimmed means with a nonconstant weight function and, in the case of the corollary, to noncentrally symmetric distributions.
Proposition 4.1. Let 0 < min W ≤ max W < ∞ on [α, 1]. Assume that the α-trimmed mean L1 is well defined at X and that for some 0 < γ ≤ 1 there exists Xi belonging to X ∩ Q^n_γ(X). Then, provided 0 < α < γn/(⌈αn/(1 − α)⌉ + n),
\[
\varepsilon^*(L_1, X) = \frac{\lceil \alpha n/(1-\alpha) \rceil}{n + \lceil \alpha n/(1-\alpha) \rceil}.
\]
Corollary 4.2. Let X^{(n)} be a sample of size n from a distribution F such that FQ_{α*−ε} > 0 for every 0 < ε < α*. Let W be as in the proposition and let α/(1 − α) < α*. Then almost surely, for n large enough, the α-trimmed mean L1(X^{(n)}) is well defined and
\[
\lim_{n\to\infty} \varepsilon^*(L_1, X^{(n)}) = \alpha.
\]
Remark 4.3. The condition FQ_{α*−ε} > 0 for every 0 < ε < α* is true for instance if F has a density function which is positive in a neighborhood of the deepest point. If F is absolutely continuous, then α* ≤ 1/2, so that the condition α/(1 − α) < α* means that the degree of trimming is less than 1/3. In the worst case scenario where α* = 1/(d + 1), α must be less than 1/(d + 2). If d = 1, it is well known that the α-trimmed mean of the first type has the breakdown point ⌈αn/(1 − α)⌉/(n + ⌈αn/(1 − α)⌉) for any α < 1/2; in view of the fact that α* ≤ 1/2, the condition α/(1 − α) < α* is therefore not necessary. This is because the trimmed mean is defined for every one-dimensional data set when α < 1/2.
Proposition 4.4. Let 0 < min W ≤ max W < ∞ on [α, 1]. Let the α-trimmed mean statistic L2 be well defined at a data set X of size n ≥ d + 1. Assume that, for some 0 < γ ≤ 1, Q^n_γ(X) contains d + 1 elements X_{i_j} of X, 1 ≤ j ≤ d + 1, not belonging to the same hyperplane.
Then, provided 0 < α < γn/(⌈αn/(1 − α)⌉ + n),
\[
\varepsilon^*(L_2, X) = \frac{\lceil \alpha n/(1-\alpha) \rceil}{n + \lceil \alpha n/(1-\alpha) \rceil}.
\]
The proof of the following corollary is similar to that of Corollary 4.2. In the statement, absolute continuity of F ensures that with probability one data sets are in general position, i.e. no more than d data points are contained in any hyperplane.
Corollary 4.5. Let W be as in the proposition and assume α/(1 − α) < α*. Suppose X^{(n)} is a sample of size n from an absolutely continuous distribution F whose density function is positive in some neighborhood of the deepest point. Then almost surely, for n large enough, L2(X^{(n)}) is well defined and
\[
\lim_{n\to\infty} \varepsilon^*(L_2, X^{(n)}) = \alpha.
\]
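The finite-sample breakdown value in Propositions 4.1 and 4.4 is simple to tabulate. The sketch below is illustrative only: the function name is hypothetical, the conditions of the propositions are assumed to hold, and exact rational arithmetic is used (an implementation choice made here) so that the ceiling is not affected by floating-point rounding.

```python
import math
from fractions import Fraction


def breakdown_point(n, alpha):
    """Return ceil(alpha*n/(1-alpha)) / (n + ceil(alpha*n/(1-alpha))) exactly.

    `alpha` should be a Fraction (or a string such as "0.2") with 0 < alpha < 1.
    """
    alpha = Fraction(alpha)
    m_star = math.ceil(alpha * n / (1 - alpha))  # number of added outliers
    return Fraction(m_star, n + m_star)


if __name__ == "__main__":
    alpha = Fraction(1, 4)
    for n in (10, 100, 1000, 10**6):
        print(n, float(breakdown_point(n, alpha)))
```

As n grows, the printed values approach the trimming level, in line with the limits stated in Corollaries 4.2 and 4.5.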
5. Technical details and proofs
The following lemma is needed in several convergence arguments.
Lemma 5.1. Assume F_n ⇒ F. Let g_n, g : R^d → R^m be Borel measurable functions such that |g_n| ≤ h, where h : R^d → [0, ∞) is measurable and bounded. For every x belonging to a set of F-probability 1 and for every sequence (x_n) converging to x, suppose that g_n(x_n) → g(x). Then
\[
\lim_{n\to\infty} \int g_n(x)\, F_n(dx) = \int g(x)\, F(dx).
\]
Proof. We use the Skorokhod representation theorem for weakly convergent probability distributions (Dudley, 1989, Theorem 11.7.2). If F_n ⇒ F, this says that there exist on some probability space R^d-valued random variables (X_n) and X such that X_n and X have distributions F_n and F, respectively, and X_n → X almost surely. Since g_n(X_n) → g(X) almost surely and (g_n(X_n)) is bounded, the result follows by the Dominated Convergence Theorem.
Given probability distributions F and G, let
\[
\|D_F - D_G\|_\infty := \sup_x |D_F(x) - D_G(x)|
\]
and
\[
\|F - G\|_{\mathcal H} := \sup_{H \in \mathcal H} |FH - GH|.
\]
The next proposition and corollary describe uniform consistency of sequences of depth functions and trimmed regions. Let ⇒ denote weak convergence of probability distributions.
Proposition 5.2. (a) (Donoho and Gasko, 1992, pp. 1816--1817) For any F, almost surely
\[
\lim_{n\to\infty} \|D_n - D\|_\infty = 0.
\]
(b) Let F be absolutely continuous and assume F_n ⇒ F. Then
\[
\lim_{n\to\infty} \|D_{F_n} - D\|_\infty = 0. \tag{7}
\]
Proof. (b) Absolute continuity of F implies that F(∂H) = 0 for any H ∈ ℋ. Since ℋ is contained in the class of measurable convex subsets of R^d, it follows from Billingsley and Topsøe (1967, p. 14) that
\[
\lim_{n\to\infty} \|F_n - F\|_{\mathcal H} = 0.
\]
The conclusion then follows from the inequality
\[
\|D_F - D_G\|_\infty \le \|F - G\|_{\mathcal H}, \tag{8}
\]
which holds for any F and G.
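To make the empirical depth function concrete, here is a minimal Python sketch that approximates the empirical Tukey (halfspace) depth of a point in R^2, i.e. the smallest fraction of sample points lying in a closed halfplane whose boundary passes through the point. Scanning a finite grid of directions is a simplification chosen here for brevity, not the paper's method; the returned value is therefore an upper bound on the exact empirical depth, and the function name and the `n_dirs` parameter are hypothetical.

```python
import math


def tukey_depth_2d(x, points, n_dirs=360):
    """Approximate empirical halfspace depth of x with respect to `points`.

    For each direction u on a regular grid of angles, count the sample points p
    with u . (p - x) >= 0, and return the smallest such fraction.
    """
    n = len(points)
    best = 1.0
    for k in range(n_dirs):
        theta = 2 * math.pi * k / n_dirs
        ux, uy = math.cos(theta), math.sin(theta)
        count = sum(
            1 for (px, py) in points
            if ux * (px - x[0]) + uy * (py - x[1]) >= 0
        )
        best = min(best, count / n)
    return best


if __name__ == "__main__":
    cross = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    print(tukey_depth_2d((0.0, 0.0), cross))   # central point
    print(tukey_depth_2d((10.0, 0.0), cross))  # point far outside the cloud
```

A central point receives a large depth while a point outside the convex hull of the data receives depth 0, which is the center-outward ranking used to define the trimmed regions.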
Corollary 5.3. (a) For any F and for every 0 < ε < min(α, α* − α), almost surely Q_{α+ε} ⊂ Q^n_α ⊂ Q_{α−ε} if n is large enough. (b) Let F be absolutely continuous and F_n ⇒ F. Then for every 0 < ε < min(α, α* − α), Q_{α+ε}(F) ⊂ Q_α(F_n) ⊂ Q_{α−ε}(F) if n is large enough.
Proof. According to Proposition 5.2(a), almost surely there exists n0 such that, uniformly in x, D_n(x) − ε < D(x) < D_n(x) + ε for all n ≥ n0. Clearly this implies (a) for n ≥ n0. Part (b) is proved similarly using Proposition 5.2(b).
Remark 5.4. For the sequence (F̂_n) of empirical distributions, note that (7) holds almost surely for any F (Proposition 5.2(a)). It is not difficult to see that Proposition 5.2(b) may fail for an appropriate sequence if F is not absolutely continuous.
The domain of a statistical functional is some subset of M, the set of probability distributions on R^d. To apply Hadamard differentiability, such a domain must be some subset of a normed space containing the sample paths of the Brownian bridge, typically a space of the form ℓ^∞(ℱ, R^m). In all cases where sup_{f∈ℱ} |f| ≤ G for some F-integrable envelope G, this paper identifies F ∈ M with the map f ↦ F(f). The ensuing embedding of M is denoted by M ⊂ ℓ^∞(ℱ, R^m).
Example 5.5. Provided closed halfspaces are identified with their indicator function, M can be embedded as above into a bounded subset of ∞ (H) := ∞ (H, R). Note that Fn − FH → 0 implies that Fn F (Cramér--Wold Theorem), but that the converse can be false for a discrete F. However, if F is absolutely continuous, then Fn → F in ∞ (H) if and only if Fn F, as can be seen from the proof of Proposition 5.2(b). In the following, the Hadamard derivative of a trimmed mean functional is proved to exist at F under absolute continuity of F and some additional conditions. In that situation, if F is approached along a sequence Fn = F + tn hn of the above form, then Fn − FH = tn hn H = |tn |hn H → 0, so that Fn F. In what follows, this property of the embedding is used in some convergence arguments. For any nonempty A ⊆ Rd , the depth functional D : F DF clearly defines a map from M into ∞ (A). The next two lemmas are critical in deriving Hadamard differentiability of the trimmed means functionals. For any h ∈ UC(H, F ), it can be observed that h is constant on the class of halfspaces of F-probability 0. In the sequel, this entails that hH[x] does not depend on the choice of the minimal halfspace H[x] whenever the latter is nonunique and D(x) = 0. Lemma 5.6. Assume F is absolutely continuous and let A be a nonempty closed subset of Rd such that A ⊆ SF . Then D : M ⊂
∞ (H)∞ (A) is Hadamard-differentiable at F tangentially to UC(H, F ) with derivative DF (h) = hH[·] ∈ ∞ (A),
h ∈ UC(H, F ),
or, equivalently, lim sup |hH[x] − (DFn (x) − D(x))/tn | = 0,
n→∞ x∈A
for all sequences (tn ) in R and (hn ) in ∞ (H) such that tn → 0, hn → h ∈ UC(H, F ) and Fn := F + tn hn ∈ M. Proof. Choose (tn ), (hn ), h and (Fn ) as in the statement. For any x and n such that tn 0, let Hn (x) = H[x, un (x)] such that Fn Hn (x) DFn (x) + |tn |/n. Then, for any x ∈ Rd , tn hn Hn (x) = (Fn − F)Hn (x) DFn (x) − D(x) + |tn |/n (Fn − F)H[x] + |tn |/n = tn hn H[x] + |tn |/n. Dividing the terms of the last inequalities by tn and taking the limit, it suffices to prove that limn hn Hn (x) = limn hn H[x] = hH[x] uniformly in x ∈ A. Since limn supx∈A |(hn − h)Hn (x)| = 0 and h is uniformly continuous on H, we need only show that, for any > 0, supx∈A F(Hn (x) H[x]) if n is large enough, where denotes the symmetric difference. Assume this is not true. Then there exist 0 and a subsequence (nk ) such that supx∈A F(Hnk (x) H[x]) > 0 for k = 1, 2, . . . Moreover, for any x ∈ A such that D(x) < 0 /5, the proof of Proposition 5.2(b) and (8) imply that F(Hnk (x)H[x]) FHnk (x) + FH[x] |(F − Fnk )Hnk (x)| + Fnk Hnk (x) + 0 /5
sup |(Fnk − F)H| + DFn (x) + (|tnk | + 1)/nk + 0 /5 0 H
k
if k is large enough. Thus, there exists a subsequence (xnk ) in A ∩ Q /5 such that F(Hnk (xnk ) H[xnk ]) > 0 for k = 1, 2, . . . By 0 compactness, there exist subsequences (nk ) and (xn ) such that xn → x0 , H[xn ] → H[x0 , v1 ] and Hn (xn ) → H[x0 , v2 ] for some k
k
k
k
k
x0 ∈ A ∩ Q /5 and v1 , v2 ∈ Sd−1 , where we refer to p. 4 for halfspace convergence. Then, by the Lebesgue Dominated Convergence 0 Theorem, we have lim FHF [xn ] = FH[x0 , v1 ] = limk D(xn ) = D(x0 ) 0 /5. k
k
k
Furthermore, the Lebesgue Dominated Convergence Theorem, the above definition of Hn (x), Proposition 5.2(b) and (8) yield FH[x0 , v2 ] = lim FHn (xn ) = lim Fn Hn (xn ) = lim DF (xn ) = lim D(xn ) = D(x0 ). n k k k k k k k k k k k k
Since x0 ∈SF , there is a unique minimal halfspace at x0 , so that H[x0 , v1 ]=H[x0 , v2 ]=HF [x0 ]. This entails that F(Hn (xn ) H[xn ])< 0 k k k
if k is large enough, a contradiction. Thus DF (h)(x) = lim
n→∞
DFn (x) − D(x) = hH[x] tn
uniformly in x ∈ A, proving the lemma.
Define the functional W ◦ D : M∞ (Rd ) as the map F W ◦ DF . Lemma 5.7. Assume that F and W satisfy conditions (A1) and (A2). Then W ◦ D : M ⊂ ∞ (H)∞ (Rd ) is Hadamard-differentiable at F tangentially to UC(H, F ) with derivative (W ◦ D)F (h)(·) = W (D(·))hH[·] ∈ ∞ (Rd ),
h ∈ UC(H, F ),
or, equivalently, lim sup |W (D(x))hH[x] − (W(DFn (x)) − W(D(x)))/tn | = 0, x∈Rd
n→∞
for all sequences (tn ) in R and (hn ) in ∞ (H) such that tn → 0, hn → h ∈ UC(H, F ) and Fn := F + tn hn ∈ M. Proof. Choose (tn ), (hn ), h and (Fn ) as in the statement. The mean value theorem implies that for any x ∈ Rd DF (x) − D(x) W(DFn (x)) − W(D(x)) = W (n (x)) n , tn tn
(9)
for some n (x) between DFn (x) and D(x). Now the chain of inequalities at the beginning of the proof of Lemma 5.6 shows that % % % DF (x) − D(x) % % < ∞; sup sup %% n % tn n x
(10)
moreover, by Proposition 5.2(b) and uniform continuity of W , lim W (n (x)) = W (D(x)) n
uniformly in x ∈ Rd . The proposition thus follows from Lemma 5.6 applied with A = Qc and the fact that W (D(x)) = 0 for 0 x ∈ Q0 . Next we reformulate in terms of Hadamard differentiability a result in Nolan (1992). For a proof, see VW (1996, Lemma 3.9.32). Lemma 5.8. Assume that F ∈ M0 satisfies (A3) and that for every x = r(u)u ∈ jQ there exists a unique minimal halfspace at x. Then the radius functional r : M0 ⊂ ∞ (H)∞ (Sd−1 ) is Hadamard-differentiable at F tangentially to UC(H, F ) and hH(u) rF (h)(u) = − , u · n(u)f (q(n(u)); n(u))
h ∈ UC(H, F ),
where rF (h) is a uniform limit in the sense that
% % % % hH(u) sup %%− − (rFn (u) − r(u))/tn %% = 0, n→∞ u · n(u)f (q(n(u)); n(u)) d−1 lim
u∈S
for all sequences (tn ) in R and (hn ) in ∞ (H) such that tn → 0, hn → h ∈ UC(H, F ) and Fn := F + tn hn ∈ M0 .
The following technical lemma allows us to apply Fubini's theorem in some of the proofs later on. Lemma 5.9. Let F be absolutely continuous and assume Qc ⊂ SF for some 0 ∈ (, 1/2]. Then the map (x, y)1H [x] (y) is jointly F 0 Borel measurable when restricted to (Q ∩ Qc ) × Rd . 0
Proof. We need only show that C := {(x, y) : D(x) 0 , y ∈ H[x]} is a closed set. Assume that (x0 , y0 ) ∈ (Q ∩ Qc ) × Rd is a limit point of C, meaning that (xn , yn ) → (x0 , y0 ) for some sequences 0 (xn ) in Q ∩ Qc and (yn ) with the property that yn ∈ H[xn ]. For any n, write H[xn ] := H[xn , un ]. Since un · yn un · xn for each 0 n, it is enough to prove that un → u0 . By compactness, H[xnk , unk ] → H[x0 , v0 ] for some sequence (unk ), so that the Lebesgue Dominated Convergence Theorem yields D(xnk ) = FH[xnk , unk ] → FH[x0 , v0 ] = D(x0 ). Because x0 ∈ SF and D(x0 ) , this implies that v0 = u0 . Since this holds for any subsequence of (un ), it follows that un → u0 , which ends the proof. The next lemma is proved like Lemma 5.9 and used in the same way. Lemma 5.10. Assume that for every x = r(u)u ∈ jQ there exists a unique minimal halfspace at x. Then the map (, y)1H(u()) (y) is jointly Borel measurable on × Rd .
To derive Hadamard derivatives of trimmed mean functionals, it will be necessary to embed M into two or three spaces of the above form, simultaneously. Given, for example, three such spaces ∞ (Fi , Rmi ), i = 1, 2, 3, this is done by embedding M into ∞ (F1 × F2 × F3 , Rm1 +m2 +m3 ) through the identification map (f1 , f2 , f3 )(F(f1 ), F(f2 ), F(f3 )),
fi ∈ Fi , i = 1, 2, 3.
Assume now that F is approached along a sequence Fn = F + tn hn ∈ M where tn → 0 and hn → h. Then hn = (Fn − F)/tn defines a sequence of bounded signed measures that can be embedded as above into ∞ (F1 × F2 × F3 , Rm1 +m2 +m3 ). Thus, hn → h in ∞ (F1 × F2 × F3 , Rm1 +m2 +m3 ) means that hn − hF ×F ×F = 1 2 3
sup
(f1 ,f2 ,f3 )∈F1 ×F2 ×F3
|(hn (f1 ), hn (f2 ), hn (f3 )) − h(f1 , f2 , f3 )| → 0,
so that hn (fi ) converges uniformly on Fi for i = 1, 2, 3. Without risk of confusion, for those h which arise as uniform limits of the above form, we set h(fi ) := limn hn (fi ), i = 1, 2, 3, meaning that h(f1 , f2 , f3 ) is identified with (h(f1 ), h(f2 ), h(f3 )). Finally, UC(F1 × F2 × F3 , F ) will denote the set of h in ∞ (F1 × F2 × F3 , Rm1 +m2 +m3 ) that are uniformly continuous with respect to a semimetric F defining the product topology on (F1 , F ) × (F2 , F ) × (F3 , F ). If h is a uniform limit of the above form, this says that h ∈ UC(Fi , F ) for i = 1, 2, 3. For any 0 < < , let K := {K compact convex : K ⊂ Q− }. Define p : Rd Rd by W(D(y))F(dy) . p(x) := W(D(x))[x − L1 (F)] Q
Let p · K := {p · 1K : K ∈ K }. Proposition 5.11. (a) Assume that F and W satisfy conditions (A1)--(A3). Let G1 := {G11 + G12 + G13 }, where G1i ∈ ∞ (Rd , Rd ), i=1, 2, 3, are defined in Section 3. Then, for every 0 < < , the trimmed mean functional L1 : M01 ⊂ ∞ (G1 × H ×p· K , R2d+1 )Rd is Hadamard-differentiable at F tangentially to UC(G1 × H × p · K , F ) with derivative given by L1F (h) = h(G11 + G12 + G13 ),
h ∈ UC(G1 × H × p · K , F ).
(b) Assume that F and W satisfy conditions (A1) and (A2) and W = 0 on [0, ]. Let G01 := {G11 + G12 }. Then L1 : M01 ⊂ ∞ (G01 × H, Rd+1 )Rd is Hadamard-differentiable at F tangentially to UC(G01 × H, F ) with derivative given by L1F (h) = h(G11 + G12 ),
h ∈ UC(G01 × H, F ).
Proof. (a) Assume tn → 0 in R and hn → h are such that h ∈ UC(G1 × H × p · K , F ) and Fn := F + tn hn ∈ M01 . Let A1 (F) and B1 (F) denote the numerator and denominator of L1 (F), respectively.
Consider
\[
\frac{L_1(F_n) - L_1(F)}{t_n} = \frac{1}{B_1(F_n)} \left[ \frac{A_1(F_n) - A_1(F)}{t_n} - \frac{B_1(F_n) - B_1(F)}{t_n}\, L_1(F) \right]. \tag{11}
\]
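As a quick check of (11), the following sketch uses only the facts stated in the proof, namely that A1 and B1 are the numerator and denominator of L1, so that A1(F) = B1(F) L1(F):

```latex
\begin{align*}
L_1(F_n) - L_1(F)
  &= \frac{A_1(F_n)}{B_1(F_n)} - L_1(F) \\
  &= \frac{1}{B_1(F_n)} \bigl[ A_1(F_n) - B_1(F_n)\, L_1(F) \bigr] \\
  &= \frac{1}{B_1(F_n)} \bigl[ A_1(F_n) - A_1(F)
       - \bigl( B_1(F_n) - B_1(F) \bigr) L_1(F) \bigr].
\end{align*}
```

The last step adds and subtracts A1(F) = B1(F) L1(F); dividing both sides by t_n gives (11).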
Then Lemma 5.1 implies that B1 (Fn ) → B1 (F). Since Fn F (Example 5.5), this can indeed be seen by taking gn (x)=1Q (Fn ) (x)W(DFn (x)) and g(x) = 1Q (x)W(D(x)), then by using Proposition 5.2(b), Corollary 5.3(b) and continuity of W ◦ D. Thus, it suffices to study convergence of the term within the brackets of (11). From (4) we have r (u) Fn A1 (Fn ) − A1 (F) 1 = W(DFn (u))uF( nu (d)Fn (du) tn tn Sd−1 0 r(u) W(D(u))uFu (d)F(du) − Sd−1 0
=
1 tn
Sd−1 0
+
r (u) Fn
Sd−1 0
+
r (u) Fn
r (u) Fn
Sd−1 r(u)
[W(DFn (u)) − W(D(u))]uF( nu (d)Fn (du)
W(DF (u))u[F( nu (d)Fn (du) − Fu (d)F(du)] W(DF (u))uFu (d)F(du)
:= I1n + I2n + I3n . The first term of the decomposition can be written W (D(x))hH[x]xF(dx) + o(1) = W (D(x))hn H[x]xF(dx) + o(1) I1n = Q
Q
& =
Q
' W (D(x))1H[x] (y)xF(dx) hn (dy) + o(1).
Indeed the first equality follows from Corollary 5.3(b), Lemmas 5.1 and 5.7, the second one from uniform convergence of (hn ) and the third one from Lemma 5.9 and Fubini's theorem. The first equality requires further explanation. With respect to the F semimetric, it is clear that xH[x] is continuous on Q ∩ Qc , hence xhH[x] is itself continuous on that set. Since W = 0 on 0
[0 , 1], this ensures that xW (D(x))hH[x]x is continuous on Rd , therefore Lemma 5.1 can be applied. Now, since F is absolutely continuous, I3n can be represented through spherical coordinates. Putting k(, ) := W(D(u()))f (u())J(, ), an application of the mean value theorem to the inner integral entails that I3n =
r (u()) Fn [rFn (u()) − r(u())] 1 k(, )u() d d = k(n (), )n ()u() d, tn r(u()) tn
(12)
where n () lies between r(u()) and rFn (u()). Then by Lemma 5.8, the Lebesgue Dominated Convergence Theorem, uniform convergence of (hn ), Lemma 5.10 and Fubini's theorem, hn H(u()) I3n = − k(r(u()), )r(u())u() d + o(1) u() · n(u())f (q(n(u())); n(u())) ' & W()f (r(u())u())J(r(u()), ) 1H(u()) (y)r(u())u() d hn (dy) + o(1). = − u() · n(u())f (q(n(u())); n(u())) As above, we have the decomposition B1 (Fn ) − B1 (F) := J1n + J2n + J3n . tn Now limn F(Q (Fn )Q ) = 0 by the Lebesgue Dominated Convergence Theorem. Since hn → h ∈ UC(p · K , F ) and Q (Fn ) ⊂ Q− for n large enough (Corollary 5.3(b)), it follows that I2n − J2n L1 (F) = W(D(y))[y − L1 (F)]hn (dy) = W(D(y))[y − L1 (F)]hn (dy) + o(1). Q (Fn )
Q
Thus from (11) L1 (Fn ) − L1 (F) = hn (G11 + G12 + G13 ) + o(1), tn proving part (a) of the proposition. To prove part (b), note that under the condition W = 0 on [0, ] a trimmed mean can be expressed as a ratio of integrals over Rd without explicit need of a trimmed region or a radius functional. Clearly this entails that A1 (Fn ) − A1 (F) = tn
[W(DFn (x)) − W(D(x))] tn
&
=
Q
xF n (dx) +
W(D(x))xhn (dx)
' W (D(x))1H[x] (y)xF(dx) hn (dx) +
Q
W(D(x))xhn (dx) + o(1),
so that the above proof still applies in a simplified form. Proof of Theorem 3.3. Part (a) is proved by applying the functional delta method. We need to verify that n F in ∞ (G1 × H × p · K , R2d+1 ), where some version of F has its sample paths in UC(G1 × H × p · K , F ). Clearly this amounts to checking that the empirical central limit theorem holds componentwise. Since G1 is a singleton made up of a bounded measurable map and since H is a well-known Donsker class for any probability distribution, it comes down to verify that p · K is a Donsker class for F. If d = 1, then any collection of compact intervals is known to be a VC- class (VW, 1996, Problem 14), hence a Donsker class for any probability distribution. If d = 2, then the class K has the bracketing entropy bound log N[] ( , K , L2 (F))
c
,
> 0,
for some constant c depending only on Q− and F, which entails that K is a Donsker class for F (VW, 1996, p. 83, 85, 163). For d 2, this implies that p · K = p · 1Q− · K is itself a Donsker class (VW, 1996, Example 2.10.10). By Proposition 5.11 and the functional delta method, the conclusion follows, where the limit distribution is obtained from the multivariate central limit theorem. Finally, part (b) follows from the fact that G1 and H are Donsker classes for any F and d. Proposition 5.12. (a) Assume that F and W satisfy conditions (A1)--(A3). Let G2 := {G21 + G23 }, where G21 , G23 are defined in Section 3. Then the trimmed mean functional L2 : M02 ⊂ ∞ (G2 × H, Rd+1 )Rd is Hadamard-differentiable at F tangentiallly to UC(G2 × H, F ) with derivative given by L2F (h) = h(G21 + G23 ),
h ∈ UC(G2 × H, F ).
(b) Assume that F and W satisfy conditions (A1) and (A2) and that W = 0 on [0, ]. Let G02 := {G21 }. Then L2 : M02 ⊂ ∞ (G02 ×
H, Rd+1 )Rd is Hadamard-differentiable at F tangentiallly to UC(G02 × H, F ) with derivative given by L2F (h) = h(G21 ),
h ∈ UC(G02 × H, F ).
Proof. Similar to that of Proposition 5.11 but easier. Using the standard notation, a sequence of the form (L2 (Fn ) − L2 (F))/tn can be entirely expressed in terms of spherical coordinates. Since the integrals involve the Lebesgue measure, only two terms are needed to describe the behavior of (A2 (Fn ) − A2 (F))/tn and Dominated Convergence Theorem can be applied instead of Lemma 5.1. Proof of Theorem 3.8. Very similar to that of Theorem 3.3. Since G2 and H are Donsker classes for any d, (a) and (b) hold without restriction on dimension. Proof of Eq. (5) and (6). We start with Eq. (6). If x ∈ Q , then 1H(u()) (x) = 1 for any , so that G13 (x) = 0 by symmetry. If x ∈/ Q , then Q
W( (|y|))F(dy) · G13 (x) = −
= −
W()g(R )J(R , ) 1H(u()) (x)R u() d f0 (R )
W()g(R )Rd 1H(u()) (x)J(1, )u() d. f0 (R )
For any x = (|x|, 0, . . . , 0) with |x| > R , let 0 = { : x ∈ H(u())}. If 01 := arcsin(1 − R2 /|x|2 )1/2 ∈ (0, /2) and d 2, then the first component of the last integral is given by & ' ' & J(1, ) cos 1 d = 0 cos 1 (sin 1 )d−2 d1 ··· 2 sin d−2 dd−2 · · · d2 1 0 0 0 d−1 2 R2 2 (d−1)/2 = − , 1− (d − 1)((d − 1)/2) |x|2 while all other components are 0. By rotational equivariance, this yields (6). Now note that W( (|y|))F(dy) · G11 (x) = W ( (|y|))1H[y] (x)yF(dy) Q
Q
=
R 0
W ( ())
&
' (1 − 1H[u()]c (x))J(, )u() d g() d,
c
where, up to a null set, H[u()] and H[u(), −n(u())] are the same. Using the close analogy between the sets H[u, −n(u)] and H(u) ≡ H[r(u)u, −n(u)], this implies that the inner integral can be evaluated as above for every min(R , |x|). For any , J(, )u() d = 0, hence (5) follows. Proof of Proposition 4.1.. As in the proof of Proposition 3.2 in Donoho and Gasko (1992), one can see that L1 is well defined at any corrupted sample Xm such that m 1 and n/(n + m); indeed we have for some Xi ∈ X (n + m)D(Xi , Xm ) nD(Xi , X) n (n + m). Since W > 0 on [, 1], we note that the trimmed mean takes its value in the convex hull of the data set. Put m∗ = n/(1 − ). As in the proof of Lemma 3.1 in Donoho and Gasko (1992), one can check that ∗ (L1 , X) m∗ /n + m∗ . To show that breakdown occurs if we take m = m∗ , define Xm∗ by adjoining to X a data set of m∗ points all equal to the value Y ∈/ X. Put c1 = inf a∈[,1] W(a), c2 = supa∈[,1] W(a). Then, since L1 is well defined on Xm∗ , it follows that 1 L1 (Xm∗ ) (n + m∗ )c2
) ) ) ) ∗ )m W(D(Y, Xm∗ ))Y + ) ∗ ) X ∈X∩Q n+m (X i
) ) ) 1 ) ⎢ ∗ m c Y − ) ⎣ 1 ) (n + m∗ )c2 )X ∈X∩Q n+m∗ (X ⎡
i
) ) ) ) W(D(Xi , Xm∗ ))Xi ) ) ) ∗)
m
)⎤ ) ) )⎥ W(D(Xi , Xm∗ ))Xi )⎦ . ) ) ∗)
m
Since Y can be taken arbitrarily large, it follows that L1 is unbounded over the class of such corrupted samples. Proof of Corollary 4.2. Let and be such that /(1 − ) < < + < ∗ . According to the strong law of large numbers, lim #{Xi : DF (Xi ) + }/n = F(Q+ ) > 0
n→∞
almost surely. By Proposition 5.2(a), almost surely, for n large enough, there exists Xi such that Dn (Xi ) . Since < n/( n/(1 − ) + n) if n is large enough, Proposition 4.1 can be applied. Proof of Proposition 4.4. As in the beginning of the proof of Proposition 4.1, one can see that ∪{Xij : 1 j d + 1} ⊂ Qn+m (Xm ) for any corrupted sample Xm , where m 1 is such that n/(n + m). Since the Xij 's do not all belong to the same hyperplane, their convex hull has a nonempty interior, hence L2 (Xm ) is well defined. Note that the hypothesis on W implies that L2 takes its value within the convex hull of the data set where it is defined. Let m∗ := n/(1 − ). Just as for L1 , it can be proved that ∗ (L2 , X) m∗ /n + m∗ . To show that breakdown occurs when m = m∗ , ∗ consider the points Xi1 , . . . , Xid , all belonging to Qn+m (Xm∗ ) for any corrupted sample Xm∗ . Denote the unique hyperplane they
define by H0 := {x : uT0 x = c0 }, where u0 ∈ Sd−1 . Then the (d − 1)-simplex generated by Xij , 1 j d, contains some (d − 1)-ball B(y0 , r0 ), where y0 ∈ H0 and r0 > 0. Enlarge X by putting m∗ points at the single point Y = y0 + u0 , > 0. Since D(Y, Xm∗ )
m∗ m∗ + n
n/(1 − ) = , n/(1 − ) + n
this defines Xm∗ such that Qn+m (Xm∗ ) ⊃ {Y} ∪ {Xij , 1 j d}. Because Qn+m (Xm∗ ) is convex, it must contain the right circular cone C0 with base B(y0 , r0 ) and vertex Y. Suppose is large enough so that Y is outside the convex hull of X. Let C1 denote the smallest right circular cone that contains the convex hull of Xm∗ , whose vertex is Y and base is a (d − 1)-ball centered at y0 − u0 for some 0. Without loss of generality, it can be assumed that coordinates are such that the x1 axis coincides with the axis {y0 + tu0 : t ∈ R} and that the origin lies at y0 . Then (resp. + ) is seen to be the height of the cone C0 (resp. C1 ). Let r1 r0 denote the radius of the base of C1 . Put c1 = minx∈[,1] W(x) and c2 = maxx∈[,1] W(x). Then, since ∗
C0 ⊆ Qn+m (Xm∗ ) ⊆ C1 , standard multiple integration techniques yield ∗ W(D(x, Xm∗ ))x1 dx c1 1 c1 r0 d−1 2 Q n+m (Xm∗ ) C0 x1 dx = , d + 1 c2 r1 W(D(x, Xm∗ )) dx c2 C dx +
n+m∗ Q
(Xm∗ )
1
where r0, r1 and the offset of the base of C1 from y0 are either independent of X_{m*} or bounded. Letting the height of C0 tend to ∞, it is seen that the last term tends to ∞, hence m* points suffice to send to infinity the first coordinate of the centroid trimmed mean.

Acknowledgements
The author greatly appreciates the careful reading, constructive remarks and suggestions made by an Associate Editor and two referees, which led to improvement of the paper.

References
Bai, Z.D., He, X., 1999. Asymptotic distributions of the maximal depth estimators for regression and multivariate location. Ann. Statist. 27, 1616--1637.
Billingsley, P., Topsøe, F., 1967. Uniformity in weak convergence. Z. Wahrsch. Verw. Geb. 7, 1--16.
Boos, D.D., 1979. A differential for L-statistics. Ann. Statist. 7, 955--959.
Chen, Z., 1995. Robustness of the half-space median. J. Statist. Plann. Inference 46, 175--181.
Chen, Z., Tyler, D.E., 2002. The influence function and maximum bias of Tukey's median. Ann. Statist. 30, 1737--1759.
Donoho, D., 1982. Breakdown properties of multivariate location estimators. Ph.D. Qualifying Paper, Department of Statistics, Harvard University.
Donoho, D., Gasko, M., 1987. Multivariate generalizations of the median and trimmed mean, I. Technical report, University of California, Berkeley.
Donoho, D., Gasko, M., 1992. Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Ann. Statist. 20, 1803--1827.
Dudley, R.M., 1989. Real Analysis and Probability. Wadsworth & Brooks/Cole, Pacific Grove, CA.
Dudley, R.M., 1999. Uniform Central Limit Theorems. Cambridge University Press, Cambridge, UK.
Dümbgen, L., 1992. Limit theorems for the simplicial depth. Statist. Probab. Lett. 14, 119--128.
Fernholz, L.T., 1983. von Mises Calculus for Statistical Functionals. Lecture Notes in Statistics, vol. 19. Springer, New York.
Hampel, F.R., Ronchetti, E., Rousseeuw, P.J., Stahel, W.A., 1986. Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
Huber, P.J., 1981. Robust Statistics. Wiley, New York.
Liu, R.Y., 1990. On a notion of data depth based on random simplices. Ann. Statist. 18, 405--414.
Liu, R.Y., 1992. Data depth and multivariate rank tests. In: Dodge, Y. (Ed.), L1-Statistics and Related Methods. North-Holland, Amsterdam, pp. 279--294.
Liu, R.Y., Parelius, J.M., Singh, K., 1999. Multivariate analysis by data depth: Descriptive statistics, graphics and inference (with discussion). Ann. Statist. 27, 783--858.
Massé, J.-C., 2002. Asymptotics for the Tukey median. J. Multivariate Anal. 81, 286--306.
Massé, J.-C., 2004. Asymptotics for the Tukey depth process, with an application to a multivariate trimmed mean. Bernoulli 10, 397--419.
Massé, J.-C., Plante, J.-F., 2003. A Monte Carlo study of the accuracy and robustness of ten bivariate location estimators. Comput. Statist. Data Anal. 42, 1--26.
Nolan, D., 1992. Asymptotics for multivariate trimming. Stochastic Process. Appl. 42, 157--169.
Nolan, D., 1999. On min-max majority and deepest points. Statist. Probab. Lett. 43, 325--333.
Rousseeuw, P.J., Ruts, I., 1999. The depth function of a population distribution. Metrika 49, 213--244.
Serfling, R., 1980. Approximation Theorems of Mathematical Statistics. Wiley, New York.
Staudte, R.G., Sheather, S.J., 1990. Robust Estimation and Testing. Wiley, New York.
Stigler, S.M., 1974. Linear functions of order statistics with smooth weight functions. Ann. Statist. 2, 676--693.
Tukey, J.W., 1975. Mathematics and picturing data. In: Proceedings of the 1975 International Congress of Mathematics, vol. 2, pp. 523--531.
van der Vaart, A.W., 1998. Asymptotic Statistics. Cambridge University Press, Cambridge.
van der Vaart, A.W., Wellner, J.A., 1996. Weak Convergence and Empirical Processes. Springer, New York.
Zuo, Y., Serfling, R., 2000. General notions of statistical depth function. Ann. Statist. 28, 461--482.
Zuo, Y., Cui, H., He, X., 2004a. On the Stahel--Donoho estimator and depth-weighted means of multivariate data. Ann. Statist. 32, 167--188.
Zuo, Y., Cui, H., Young, D., 2004b. Influence function and maximum bias of projection depth based estimators. Ann. Statist. 32, 189--218.