Asymptotics of generalized depth-based spread processes and applications

Journal of Multivariate Analysis (2018). https://doi.org/10.1016/j.jmva.2018.09.012. Received 1 May 2018.

Jin Wang∗

Department of Mathematics and Statistics, Northern Arizona University, Flagstaff, Arizona 86011-5717, USA

Abstract

In this paper, we study the asymptotic behavior of generalized depth-based spread processes, which include the scale curve of Liu et al. [25] as a special case. Both uniform strong and weak convergence of the generalized depth-based spread processes are established. As applications, we obtain the asymptotic distributions of some nonparametric multivariate kurtosis measures. Applications to comparing the spread and kurtosis of two multivariate data sets, and to assessing multivariate normality, are also discussed.

Keywords: Asymptotics, Depth function, Generalized spread process, Multivariate kurtosis, Multivariate normality, Nonparametric method, Scale curve

1. Introduction

Generally, a depth function D_F(x) is a nonnegative real-valued mapping which provides a distribution-based center-outward ordering of points x in R^d. Desirable properties for a depth function are: (1) affine invariance, namely D_{F_{AX+b}}(Ax + b) = D_{F_X}(x) for any nonsingular d × d matrix A and d-vector b; (2) maximality at the "center"; (3) monotonicity relative to the deepest point; (4) vanishing at infinity. As an important tool, depth functions have been widely used in nonparametric multivariate analysis. Various nonparametric multivariate descriptive measures have been defined via depth functions, such as depth-based medians, weighted and trimmed means [7, 8, 25, 30, 41, 47] for location, the scale curve [25], the generalized spread function [45] and depth-weighted scatter matrices [25, 48] for spread, the multivariate skewness measures of Liu et al. [25], the Lorenz curves, shrinkage plots and fan plots of Liu et al. [25], and the nonparametric multivariate kurtosis functional of Wang and Serfling [42] for kurtosis. Some multivariate rank tests have also been constructed through depth functions; see, e.g., [24, 26, 27]. Two distributions in R^d can be compared visually by depth-depth (DD) plots [25] and generalized depth-based quantile-quantile plots [45]. In addition, depth functions have been applied to regression [36], discrimination and classification [9, 14, 20, 21, 31], cluster analysis [17, 19], outlier detection [6], and functional data analysis [5, 13, 28, 29]. General discussion of depth functions and their applications can be found in [25, 39, 49].

Given a depth function D_F(x) and α ≥ 0, the α-depth trimmed region (or α-depth inner region) is defined as T_F(α) = {x ∈ R^d : D_F(x) ≥ α}, and the p-th central region is C_F(p) = sup_α{T_F(α) : Pr{T_F(α)} ≥ p}, i.e., the smallest α-depth trimmed region having probability weight at least p.
The boundary ∂T_F(α) of T_F(α) is called the α-depth contour and, when the origin is the deepest point, can be characterized by the radius function

r_F(α, u) = inf{r ≥ 0 : ru ∉ T_F(α)},  u ∈ S^{d−1} = {u ∈ R^d : |u| = 1}.

Denote by F_D and F^{−1}_D the cdf and the quantile function of the random depth D_F(X), respectively. If F_D is continuous, which is assumed throughout this paper, then C_F(p) = T_F{F^{−1}_D(1 − p)}. To characterize the spread of a distribution F in R^d, Liu et al. [25] introduced the depth-based scale curve defined, for all p ∈ [0, 1), by V_F(p) = volume{C_F(p)}.

∗ Corresponding author. URL: [email protected] (Jin Wang)

The influence function of V_F(p) was derived by Wang and Serfling [43] for a general class of depth functions, and the asymptotic distribution of its sample version V_{F_n}(p) was established by Wang and Serfling [44] for the univariate case. Various applications of V_F(p) have appeared in the literature. For example, V_F(p) was utilized to define a multivariate kurtosis measure, a "fan plot", by Liu et al. [25] and a nonparametric multivariate kurtosis functional by Wang and Serfling [42]. To cover other applications, Wang and Zhou [45] proposed a generalized depth-based spread function defined, for all p ∈ [0, 1), by λ_F(p) = m{C_F(p)} for any measure m in R^d, which may relate to F and is thus denoted by m_F hereafter. As an increasing function of p, λ_F(p) characterizes the spread of F in some sense. Broad applications of λ_F(p) can be found in Wang and Zhou [45]. When m_F is the Lebesgue measure in R^d, λ_F(p) becomes the depth-based scale curve of Liu et al. [25]. If m_F is absolutely continuous with respect to the Lebesgue measure, then by the Radon–Nikodym Theorem, there exists a density ϕ_F(x) of m_F such that

λ_F(p) = ∫_{C_F(p)} ϕ_F(x) dx.
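To make the objects C_F(p) and V_F(p) concrete, here is a minimal numpy sketch assuming the affine-invariant Mahalanobis depth D(x) = 1/{1 + (x − μ)⊤S⁻¹(x − μ)}, for which the sample central regions are ellipsoids with closed-form volume; the function names are illustrative, not from the paper.

```python
import numpy as np
from math import gamma, pi

def mahalanobis_depth(x, mean, cov_inv):
    """Affine-invariant Mahalanobis depth D(x) = 1/(1 + (x-mu)' S^{-1} (x-mu))."""
    diff = x - mean
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)  # row-wise quadratic forms
    return 1.0 / (1.0 + d2)

def sample_scale_curve(data, ps):
    """V_{F_n}(p) under the Mahalanobis depth: C_{F_n}(p) is the ellipsoid through
    the ceil(n*p)-th deepest sample point, whose volume has a closed form."""
    n, d = data.shape
    mu = data.mean(axis=0)
    S = np.cov(data, rowvar=False)
    depth = mahalanobis_depth(data, mu, np.linalg.inv(S))
    depth_sorted = np.sort(depth)[::-1]            # deepest points first
    ball = pi ** (d / 2) / gamma(d / 2 + 1)        # volume of the unit d-ball
    det_root = np.sqrt(np.linalg.det(S))
    vols = []
    for p in ps:
        k = int(np.ceil(n * p))
        q = 1.0 / depth_sorted[k - 1] - 1.0        # squared Mahalanobis radius of C_{F_n}(p)
        vols.append(ball * det_root * q ** (d / 2))
    return np.array(vols)
```

Since the sample central regions are nested in p, the resulting curve is nondecreasing, matching the monotonicity of λ_F(p) noted above.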

We will confine attention to those measures. Given a sample X_n = {X_1, . . . , X_n}, the sample version λ_{F_n}(p) of λ_F(p) is obtained by replacing F with F_n, the empirical distribution function of X_n. Because C_F(p) = {x ∈ R^d : D_F(x) ≥ F^{−1}_D(1 − p)}, λ_F(p) is a positive decreasing function of F^{−1}_D(1 − p), which depends on F in general. If this function is denoted by φ_F, i.e., λ_F(p) = φ_F{F^{−1}_D(1 − p)}, then λ^{−1}_F(r) = 1 − F_D{φ^{−1}_F(r)} = F_{φ_F(D)}(r). It follows that λ_F(p) is the quantile function of φ_F(D). Given a probability measure P in R^d, a class A of Borel sets in R^d, and a real-valued set function κ defined on A, Einmahl and Mason [11] defined a generalized quantile function given, for all t ∈ (0, 1), by U(t) = inf{κ(A) : P(A) ≥ t, A ∈ A}. Using this notion, Serfling [38] introduced a generalized depth-based quantile function given, for all p ∈ (0, 1), by

U(p; P, C_F) = inf{κ(C) : P(C) ≥ p, C ∈ C_F},

where C_F = {T_F(α) : 0 < α < α*_F = sup_{x∈R^d} D_F(x)}. When κ{T_F(α)} is strictly decreasing in α, i.e., under assumption A3 (ii) of [38], then for all p ∈ (0, 1), U(p; P, C_F) = κ{C_F(p)}. Furthermore, if κ is a measure in R^d, U(p; P, C_F) is a special case of λ_F(p) = m_F{C_F(p)}. However, neither U(t) nor U(p; P, C_F) covers λ_F(p) in general, since the set function κ in them is given, whereas the measure m_F may relate to F and thus may be unknown. Invoking some assumptions on P, D_F(x) and κ, Serfling [38] showed that for each closed interval [a, b] ⊂ (0, 1),

{n^{1/2}{U(p; P_n, C_F) − U(p; P, C_F)} : p ∈ [a, b]} ⇝ {U′(p)B(p) : p ∈ [a, b]},

where U′(p) = dU(p; P, C_F)/dp, P_n is the empirical probability measure, and B(p) denotes a Brownian bridge, i.e., a Gaussian process with E{B(p)} = 0 and cov{B(p_1), B(p_2)} = min(p_1, p_2) − p_1 p_2. It should be pointed out that even if U(p; P, C_F) = λ_F(p), U(p; P_n, C_F) is essentially different from λ_{F_n}(p), since it involves C_F, which is usually unknown in practical problems. For example, when both κ and m_F are the Lebesgue measure in R^d, U(p; P, C_F) = V_F(p) = λ_F(p). However, U(p; P_n, C_F) becomes V(p; P_n, C_F) = inf{volume(C) : P_n(C) ≥ p, C ∈ C_F}, i.e., the volume of the smallest T_F(α) containing at least a fraction p of the data points, which clearly differs from V_{F_n}(p).

In this paper, we study the generalized depth-based spread function λ_F(p) and its sample version λ_{F_n}(p). Important properties of λ_F(p) are given in Section 2. Uniform strong and weak convergence of λ_{F_n}(p) are established in Sections 3 and 4, respectively. In Section 5, we obtain the asymptotic distributions of the "fan plot" of Liu et al. [25] and the nonparametric multivariate kurtosis functional of Wang and Serfling [42]. Furthermore, applications to comparing the spread and kurtosis of two multivariate data sets, and to assessing multivariate normality, are also discussed. Some concluding remarks are given in Section 6.

Throughout this paper, we use uppercase letters to denote distribution functions and their lowercase counterparts to denote density functions if they exist. For example, we denote by F_X and f_X the cdf and density of a random

vector X in R^d, respectively. When X is a random variable, the quantile function of X is denoted by F^{−1}_X. Without confusion, we will omit the subscript. The indicator function of a set A is denoted by 1_A, and S^{d−1} = {u ∈ R^d : |u| = 1}. Finally, "→ a.s." denotes almost sure convergence and "⇝" convergence in distribution.

2. Important properties

For the univariate case, an important property of a spread (scale or dispersion) measure σ_F is that σ_{F_{aX+b}} = |a| σ_{F_X} for any a ≠ 0 and b. Bickel and Lehmann [4] and Oja [33] even considered it as a criterion for a univariate spread measure. Does λ_F(p) possess a similar property, and if so, when? The following result provides some answers.

Proposition 1. (i) If D_F(x) and ϕ_F(x) are affine invariant, then λ_{F_{AX+b}}(p) = |det(A)| λ_{F_X}(p) and λ_{AF_n+b}(p) = |det(A)| λ_{F_n}(p) for any nonsingular d × d matrix A and d-vector b, where AF_n + b denotes the empirical distribution function of AX_n + b = {AX_1 + b, . . . , AX_n + b}. (ii) If D_F(x) and ϕ_F(x) are invariant under translations, then for any d-vector b, λ_{F_{X+b}}(p) = λ_{F_X}(p) and λ_{F_n+b}(p) = λ_{F_n}(p).

Proof. Since D_F(x) is affine invariant, C_{F_{AX+b}}(p) = AC_{F_X}(p) + b by Wang and Serfling [42]. Thus

λ_{F_{AX+b}}(p) = ∫_{C_{F_{AX+b}}(p)} ϕ_{F_{AX+b}}(y) dy = ∫_{AC_{F_X}(p)+b} ϕ_{F_{AX+b}}(Ax + b) d(Ax + b) = ∫_{C_{F_X}(p)} ϕ_{F_X}(x) |det(A)| dx = |det(A)| λ_{F_X}(p).

In the same way, we have λ_{AF_n+b}(p) = |det(A)| λ_{F_n}(p), and hence (i) holds. As for assertion (ii), it is straightforward.

Remark 1. When D_F(x) is affine invariant, a typical case for ϕ_F(x) to be affine invariant is ϕ_F(x) = w{D_F(x)} for any function w not relating to F.

Another question of interest is as follows. If λ_{F_Y}(p) = c λ_{F_X}(p) for some c > 0, what is the relationship between X and Y? We have the following result for elliptically symmetric distributions. A continuous distribution F in R^d is called elliptically symmetric, denoted by E_d(h; θ, Λ), if it has a density given, for all x ∈ R^d, by

f(x) = |Λ|^{−1/2} h{(x − θ)⊤ Λ^{−1} (x − θ)}

for a nonnegative function h and a positive definite matrix Λ.

Proposition 2. Suppose that X ∼ E_d(h_1; θ_X, Λ_X) and Y ∼ E_d(h_2; θ_Y, Λ_Y), that the depth function is affine invariant and attains its maximum at the center, and that the measure m_F has a density of the form ϕ_F(x) = γ_F {(x − θ)⊤ Λ^{−1} (x − θ)}^β for some elliptically symmetric distribution F = E_d(h; θ, Λ), where γ_F is a positive constant, which may relate to F, and β ≠ −d/2. Then for any c > 0, λ_{F_Y}(p) = c λ_{F_X}(p) if and only if Y = AX + b in distribution for some nonsingular d × d matrix A and d-vector b.

Proof. The assertion follows from Theorem 2.3 of Wang and Zhou [45] with trivial modifications. Under the conditions of Proposition 2, we have the following explicit formula for λF (p).

Proposition 3. Suppose that the depth function is affine invariant and attains its maximum at the center. If X ∼ F = E_d(h; θ, Λ) and ϕ_F(x) = γ_F {(x − θ)⊤ Λ^{−1} (x − θ)}^β with γ_F > 0 and β ∈ R, then

λ_F(p) = {γ_F π^{d/2} |Λ|^{1/2} / [Γ(d/2)(β + d/2)]} {F^{−1}_{|Λ^{−1/2}(X−θ)|²}(p)}^{β+d/2},

where | · | denotes the Euclidean norm. As a special case,

V_F(p) = {π^{d/2} |Λ|^{1/2} / Γ(d/2 + 1)} {F^{−1}_{|Λ^{−1/2}(X−θ)|²}(p)}^{d/2}.

Proof. By Wang and Serfling [42],

C_F(p) = {x ∈ R^d : (x − θ)⊤ Λ^{−1} (x − θ) ≤ F^{−1}_{|Λ^{−1/2}(X−θ)|²}(p)}.

Taking the transformation y = Λ^{−1/2}(x − θ) first and then using the polar coordinates representation, we obtain

λ_F(p) = γ_F ∫_{C_F(p)} {(x − θ)⊤ Λ^{−1} (x − θ)}^β dx = γ_F |Λ|^{1/2} ∫_{{y ∈ R^d : y⊤y ≤ F^{−1}_{|Λ^{−1/2}(X−θ)|²}(p)}} (y⊤y)^β dy
= γ_F |Λ|^{1/2} {2π^{d/2}/Γ(d/2)} ∫_0^{{F^{−1}_{|Λ^{−1/2}(X−θ)|²}(p)}^{1/2}} r^{2β} r^{d−1} dr
= {γ_F π^{d/2} |Λ|^{1/2} / [Γ(d/2)(β + d/2)]} {F^{−1}_{|Λ^{−1/2}(X−θ)|²}(p)}^{β+d/2}.
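To see Proposition 3 in action, the following hedged check works out the special case d = 2, θ = 0, Λ = I, γ_F = 1, β = 0, where |X|² ∼ χ²₂ has quantile function −2 log(1 − p), so V_F(p) = π{−2 log(1 − p)}; it then compares the closed form against a crude Monte Carlo volume estimate of the central disk. All variable names are illustrative.

```python
import numpy as np

def vf_bivariate_normal(p):
    """Proposition 3 for the standard bivariate normal: V_F(p) = pi * (-2*log(1-p))."""
    return np.pi * (-2.0 * np.log1p(-p))

# Monte Carlo check: the p-th central region is the disk |x|^2 <= -2*log(1-p);
# estimate its area by the hit rate of uniform points on a bounding square.
rng = np.random.default_rng(1)
p = 0.5
radius2 = -2.0 * np.log1p(-p)                    # squared radius of C_F(p)
side = 2.0 * np.sqrt(radius2)                    # side of the bounding square
u = rng.uniform(-side / 2, side / 2, size=(200_000, 2))
mc_volume = side ** 2 * np.mean((u ** 2).sum(axis=1) <= radius2)
```

With 200,000 points the Monte Carlo estimate agrees with the closed form to within a few thousandths.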

Since γ_F = 1 and β = 0 for V_F(p), the expression for V_F(p) is straightforward.

3. Uniform strong convergence

In this section, we focus on the uniform strong convergence of λ_{F_n}(p). Since λ_{F_n}(p) is a function of F^{−1}_{D_n}(1 − p), as discussed in Section 1, we start from F_{D_n}(α) and F^{−1}_{D_n}(p), the sample versions of F_D(α) and F^{−1}_D(p), respectively. The following conditions are assumed.

(C1) D_F(x) is strictly decreasing along any ray from the deepest point and vanishes at infinity.
(C2) sup_{x∈S} |D_{F_n}(x) − D_F(x)| → 0 almost surely for any bounded set S ⊂ R^d.
(C3) T_{F_n}(α) is convex and closed almost surely for any n and α ∈ (0, α*_F), where α*_F = sup_{x∈R^d} D_F(x).
(C4) F is continuous.

Remark 2. It has been proved that sup_x |D_{F_n}(x) − D_F(x)| → 0 almost surely for the half-space depth by Donoho and Gasko [7], the simplicial depth by Liu [23], the majority and Mahalanobis depths by Liu and Singh [26], and the projection depth and all Type D depth functions by Zuo and Serfling [50]. Thus condition (C2) is satisfied for all those depth functions. Condition (C3) can be replaced by

(C3′) There is a Glivenko–Cantelli class G such that T_F(α) ∈ G and T_{F_n}(α) ∈ G almost surely for any n and α ∈ (0, α*_F).

It is satisfied by many depth functions; see Remark 5 for details.

Theorem 1. Under conditions (C1)–(C4), as n → ∞,
(i) F_{D_n}(α) → F_D(α) almost surely, uniformly in α ∈ [a_0, α*_F] for any a_0 ∈ (0, α*_F);
(ii) F^{−1}_{D_n}(p) → F^{−1}_D(p) almost surely, uniformly in p ∈ [p_0, 1] for any p_0 ∈ (0, 1).

Proof. (i) Let C be the set of all measurable convex sets in R^d. Then by (C3), T_{F_n}(α) ∈ C almost surely for any n and α ∈ (0, α*_F), which along with (C2) implies T_F(α) ∈ C for any α ∈ (0, α*_F). Since F is continuous, we see from Theorem 4.2 of Ranga Rao [35] that C is a Glivenko–Cantelli class. Thus

|P_n − P|_C = sup{|P_n(C) − P(C)| : C ∈ C} → 0 almost surely.

By Theorem 4.5 of Dyckerhoff [10], T_{F_n}(α) → T_F(α) almost surely, uniformly in α ∈ [a_0, α*_F]. Then Lebesgue's Dominated Convergence Theorem yields

P{T_{F_n}(α)} → P{T_F(α)} almost surely, uniformly in α ∈ [a_0, α*_F].

Thus

|P_n{T_{F_n}(α)} − P{T_F(α)}| ≤ |P_n{T_{F_n}(α)} − P{T_{F_n}(α)}| + |P{T_{F_n}(α)} − P{T_F(α)}| ≤ |P_n − P|_C + |P{T_{F_n}(α)} − P{T_F(α)}| → 0 almost surely, uniformly in α ∈ [a_0, α*_F].

In the same way, we obtain

P_n{T_{F_n}(α)\∂T_{F_n}(α)} → P{T_F(α)\∂T_F(α)} almost surely, uniformly in α ∈ [a_0, α*_F].

Therefore

P_n{∂T_{F_n}(α)} → P{∂T_F(α)} almost surely, uniformly in α ∈ [a_0, α*_F].

Combining those results, we have

F_{D_n}(α) = P_n(D_n ≤ α) = 1 − P_n{T_{F_n}(α)} + P_n{∂T_{F_n}(α)} → 1 − P{T_F(α)} + P{∂T_F(α)} = F_D(α) almost surely, uniformly in α ∈ [a_0, α*_F].

(ii) Since F_{D_n}(α) → F_D(α) almost surely, uniformly in α, the assertion follows by a well-known result for quantile processes; see, e.g., Lemma 1.5.6 of Serfling [37]. This concludes the proof of Theorem 1.

Zuo and Serfling [50] and Dyckerhoff [10] showed that the sample α-depth trimmed region satisfies T_{F_n}(α) → T_F(α) almost surely, uniformly in α ∈ [α_0, α*_F] for any α_0 ∈ (0, α*_F). He and Wang [16] established the uniform strong convergence of the sample p-th central region C_{F_n}(p) for elliptically symmetric distributions. Given the uniform strong convergence of F^{−1}_{D_n}(p), we have the following general uniform strong convergence result for C_{F_n}(p).

Corollary 1. Under conditions (C1)–(C4), for any ε > 0 and n sufficiently large, C_F(p − ε) ⊂ C_{F_n}(p) ⊂ C_F(p + ε) almost surely, uniformly in p ∈ [0, p_0] for any p_0 ∈ (0, 1). In other words, C_{F_n}(p) → C_F(p) almost surely, uniformly in p ∈ [0, p_0] for any p_0 ∈ (0, 1) as n → ∞.

Proof. By Theorem 1, F^{−1}_{D_n}(1 − p) → F^{−1}_D(1 − p) almost surely, uniformly in p ∈ [0, p_0]. Then the result follows from Theorem 4.1 of Zuo and Serfling [50] with α_n = F^{−1}_{D_n}(1 − p) and α = F^{−1}_D(1 − p).

Remark 3. Dyckerhoff [10] pointed out that strict monotonicity of D_F(x) is needed for the uniform strong convergence of T_{F_n}(α). This condition is included in (C1).

Theorem 2. Suppose that conditions (C1)–(C4) are satisfied. If ϕ_F(x) is bounded and ϕ_{F_n}(x) → ϕ_F(x) almost surely, uniformly in x ∈ R^d, then λ_{F_n}(p) → λ_F(p) almost surely, uniformly in p ∈ [0, p_0] for any p_0 ∈ (0, 1) as n → ∞.

Proof. Since ϕ_F(x) is bounded and ϕ_{F_n}(x) → ϕ_F(x) almost surely, uniformly in x ∈ R^d, we obtain by Corollary 1 and Lebesgue's Dominated Convergence Theorem,

λ_{F_n}(p) = ∫_{C_{F_n}(p)} ϕ_{F_n}(x) dx → ∫_{C_F(p)} ϕ_F(x) dx = λ_F(p) almost surely, uniformly in p ∈ [0, p_0].

This completes the argument. As a special case, we have the uniform strong convergence result for V_{F_n}(p).

Corollary 2. Under conditions (C1)–(C4), V_{F_n}(p) → V_F(p) almost surely, uniformly in p ∈ [0, p_0] for any p_0 ∈ (0, 1) as n → ∞.

Proof. For this case, ϕ_F(x) = 1 = ϕ_{F_n}(x). The assertion follows from Theorem 2 immediately.

Remark 4. Wang and Serfling [44] established the uniform strong convergence of V_{F_n}(p) under a different set of conditions.
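A small simulation, assuming the Mahalanobis depth and bivariate normal data, illustrates the convergence asserted in Corollary 2: the sample scale curve approaches the population value V_F(1/2) = π{−2 log(1/2)} as n grows. This is an illustrative sketch, not part of the paper's formal development.

```python
import numpy as np
from math import pi

def sample_vf(data, p):
    """Mahalanobis-depth sample scale curve V_{F_n}(p) for d = 2 (ellipse area)."""
    n = len(data)
    mu = data.mean(axis=0)
    S = np.cov(data, rowvar=False)
    d2 = np.sort(np.einsum('ij,jk,ik->i', data - mu, np.linalg.inv(S), data - mu))
    q = d2[int(np.ceil(n * p)) - 1]                # empirical p-quantile of distances
    return pi * np.sqrt(np.linalg.det(S)) * q      # area of the sample central ellipse

rng = np.random.default_rng(2)
p = 0.5
target = pi * (-2.0 * np.log1p(-p))                # population value for N(0, I_2)
errors = [abs(sample_vf(rng.standard_normal((n, 2)), p) - target)
          for n in (200, 20_000)]
```

At n = 20,000 the error is typically well under a few percent of the target.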

4. Uniform weak convergence

Now we investigate the uniform weak convergence of λ_{F_n}(p). We also start from F_{D_n}(α) and F^{−1}_{D_n}(p). Throughout this section, we assume that

(A0) D_F(x) and ϕ_F(x) are invariant under translations.

Then by Proposition 1 (ii), both λ_F(p) and λ_{F_n}(p) are unaffected by any translation. Thus we can assume without loss of generality that the origin is the deepest point.

4.1. Uniform weak convergence of F_{D_n}(α)

Since the uniform weak convergence of n^{1/2}{F_{D_n}(α) − F_D(α)} is stronger than the uniform strong convergence of F_{D_n}(α), we need stronger assumptions, as follows.

(A1) There exists a Donsker class D of sets such that T_F(α) ∈ D and T_{F_n}(α) ∈ D almost surely for any n and α ∈ (0, α*_F).
(A2) n^{1/2}{r_{F_n}(α, u) − r_F(α, u)} = ∫ ψ(x, α, u) dv_n(x) + o_p(1) with {ψ(x, α, u) : α ∈ (0, α*_F), u ∈ S^{d−1}} being a Donsker class, where v_n = n^{1/2}(F_n − F).
(A3) F has a continuous density f.

Remark 5. Compared with the conditions for uniform strong convergence, assumptions (A1) and (A2) are not imposed directly on the depth function. Assumption (A1) is satisfied by the half-space depth, the Mahalanobis depth, the projection depth and the simplicial depth. Actually, for those depth functions there is a Vapnik–Chervonenkis (VC) class that contains T_F(α) and T_{F_n}(α) almost surely for any n and α ∈ (0, α*_F). For a fixed α ∈ (0, α*_F), the asymptotic representation of n^{1/2}{r_{F_n}(α, u) − r_F(α, u)} was derived for the half-space depth by Nolan [32] and for the projection depth by Zuo [47]. Considering α there as a variable, we see that (A2) is satisfied by those depth functions.

First we prove a lemma, which is important for our uniform weak convergence results.

Lemma 1. If A = {g(x, w, t) : w ∈ W ⊂ R^k, t ∈ T ⊂ R^m} with W bounded is a Donsker class and ∫_W g(x, w, t) dw exists for t ∈ T, then

B = {∫_W g(x, w, t) dw : t ∈ T}

is a Donsker class.

Proof. Let ∆_1, . . . , ∆_n be a partition of W and, for each i ∈ {1, . . . , n}, let A_i be the area or volume of ∆_i, with max(A_i) → 0 as n → ∞. By the definition of the integral,

∫_W g(x, w, t) dw = lim_{n→∞} Σ_{i=1}^n A_i g(x, w_i, t),

where w_i ∈ ∆_i. That is, ∫_W g(x, w, t) dw is in the pointwise sequential closure of the symmetric convex hull of A. Then the assertion follows by Theorems 2.10.2 and 2.10.3 of van der Vaart and Wellner [40].

Theorem 3. Under assumptions (A0)–(A3), for any closed interval [a, b] ⊂ (0, α*_F),

n^{1/2}{F_{D_n}(α) − F_D(α)} = −∫ {τ_1(x, α) + τ_2(x, α)} dv_n(x) + o_p(1) for all α ∈ [a, b],

where

τ_1(x, α) = 1_{T_F(α)}(x),   τ_2(x, α) = ∫_{S^{d−1}} f{r_F(α, u)u} |J{r_F(α, u), u}| ψ(x, α, u) du,

with J(r, u) being the Jacobian of the polar transformation x = ru, and v_n = n^{1/2}(F_n − F). Furthermore,

{n^{1/2}{F_{D_n}(α) − F_D(α)} : α ∈ [a, b]} ⇝ {G_D(α) : α ∈ [a, b]},

where G_D(α) is a Gaussian process with E{G_D(α)} = 0 and covariance structure

cov{G_D(α_1), G_D(α_2)} = E[{τ_1(X, α_1) + τ_2(X, α_1)}{τ_1(X, α_2) + τ_2(X, α_2)}] − E{τ_1(X, α_1) + τ_2(X, α_1)} E{τ_1(X, α_2) + τ_2(X, α_2)}.

Proof. Decompose P_n{T_{F_n}(α)} − P{T_F(α)} as follows:

P_n{T_{F_n}(α)} − P{T_F(α)} = ∫_{T_{F_n}(α)} dF_n(x) − ∫_{T_F(α)} dF(x) = I_{1n} + I_{2n},

where

I_{1n} = ∫_{T_{F_n}(α)} dF_n(x) − ∫_{T_{F_n}(α)} dF(x),   I_{2n} = ∫_{T_{F_n}(α)} dF(x) − ∫_{T_F(α)} dF(x).

It is obvious that

n^{1/2} I_{1n} = ∫_{T_{F_n}(α)} dv_n(x).

Since D is a Donsker class and (A2) implies T_{F_n}(α) → T_F(α) almost surely, uniformly in α, we then have the equicontinuity of the empirical processes v_n. Lemma VII.15 of Pollard [34] leads to

n^{1/2} I_{1n} = ∫ 1_{T_F(α)}(x) dv_n(x) + o_p(1).

Now we work on I_{2n}. Using the polar coordinates representation x = ru with r = |x| and u = x/|x|, we have

n^{1/2} I_{2n} = n^{1/2} {∫_{T_{F_n}(α)} dF(x) − ∫_{T_F(α)} dF(x)} = n^{1/2} ∫_{S^{d−1}} {∫_{r_F(α,u)}^{r_{F_n}(α,u)} f(ru) |J(r, u)| dr} du.

Then by the Mean Value Theorem, there exists δ_n(α, u) between r_{F_n}(α, u) and r_F(α, u) such that

n^{1/2} I_{2n} = ∫_{S^{d−1}} f{δ_n(α, u)u} |J{δ_n(α, u), u}| [n^{1/2}{r_{F_n}(α, u) − r_F(α, u)}] du + o_p(1).

Since (A2) implies r_{F_n}(α, u) → r_F(α, u) almost surely and f is continuous, we have δ_n(α, u) = r_F(α, u) + o_p(1), |J{δ_n(α, u), u}| = |J{r_F(α, u), u}| + o_p(1), and f{δ_n(α, u)u} = f{r_F(α, u)u} + o_p(1) uniformly in α ∈ [a, b] and u ∈ S^{d−1} for n sufficiently large. Combining all those results, and calling on assumption (A2) and Fubini's Theorem, we obtain

n^{1/2} I_{2n} = ∫_{S^{d−1}} f{r_F(α, u)u} |J{r_F(α, u), u}| {∫ ψ(x, α, u) dv_n(x)} du + o_p(1)
= ∫ [∫_{S^{d−1}} f{r_F(α, u)u} |J{r_F(α, u), u}| ψ(x, α, u) du] dv_n(x) + o_p(1).

Given (A1), {τ_1(x, α) : α ∈ [a, b]} is a Donsker class by Theorem 2.10.1 of van der Vaart and Wellner [40], and {τ_2(x, α) : α ∈ [a, b]} is a Donsker class by Lemma 1, so F = {τ_1(x, α) + τ_2(x, α) : α ∈ [a, b]} is a Donsker class; see, e.g., Example 2.10.7 of van der Vaart and Wellner [40]. Then by the Central Limit Theorem for empirical processes,

{n^{1/2}[P_n{T_{F_n}(α)} − P{T_F(α)}] : α ∈ [a, b]} ⇝ {G_D(α) : α ∈ [a, b]}.

Since each sample path of the Gaussian process is uniformly continuous with respect to the L_2(P) seminorm on F, we conclude that n^{1/2}[P_n{∂T_{F_n}(α)} − P{∂T_F(α)}] = o_p(1) uniformly in α ∈ [a, b]. Therefore,

n^{1/2}{F_{D_n}(α) − F_D(α)} = n^{1/2}([1 − P_n{T_{F_n}(α)} + P_n{∂T_{F_n}(α)}] − [1 − P{T_F(α)} + P{∂T_F(α)}]) = −n^{1/2}[P_n{T_{F_n}(α)} − P{T_F(α)}] + o_p(1),

which leads to the assertions in Theorem 3.
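For intuition about the depth cdf F_D appearing in Theorem 3, the sketch below uses the Mahalanobis depth with the true parameters of N(0, I₂), where F_D(α) = exp{−(1/α − 1)/2} in closed form via the χ²₂ tail. Note this uses the population depth D_F evaluated at a sample, a simplified stand-in for F_{D_n}, which would use the sample depth D_{F_n}.

```python
import numpy as np

# D(x) = 1/(1 + |x|^2) for N(0, I_2), so F_D(alpha) = P(D <= alpha)
# = P(|X|^2 >= 1/alpha - 1) = exp(-(1/alpha - 1)/2), since |X|^2 ~ chi^2_2.
rng = np.random.default_rng(3)
x = rng.standard_normal((50_000, 2))
depths = 1.0 / (1.0 + (x ** 2).sum(axis=1))       # depth of each sample point

alphas = np.linspace(0.05, 0.95, 19)
emp = np.array([(depths <= a).mean() for a in alphas])   # empirical depth cdf
thy = np.exp(-(1.0 / alphas - 1.0) / 2.0)                # closed-form F_D
max_gap = np.abs(emp - thy).max()
```

The uniform gap shrinks at the usual n^{−1/2} rate, consistent with the empirical-process scaling in this section.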

4.2. Uniform weak convergence of F^{−1}_{D_n}(p)

It is now relatively straightforward to obtain the uniform weak convergence of F^{−1}_{D_n}(p).

Theorem 4. Suppose that assumptions (A0)–(A3) are satisfied. If f_D{F^{−1}_D(p)} > 0 on [a, b] ⊂ (0, 1), then

n^{1/2}{F^{−1}_{D_n}(p) − F^{−1}_D(p)} = {1/f_D{F^{−1}_D(p)}} ∫ [τ_1{x, F^{−1}_D(p)} + τ_2{x, F^{−1}_D(p)}] dv_n(x) + o_p(1),   p ∈ [a, b],

and

{n^{1/2}{F^{−1}_{D_n}(p) − F^{−1}_D(p)} : p ∈ [a, b]} ⇝ {−G_D{F^{−1}_D(p)}/f_D{F^{−1}_D(p)} : p ∈ [a, b]}.

Proof. By Lemma 3.9.20 of van der Vaart and Wellner [40], the map φ : F_D(α) → F^{−1}_D(p) is Hadamard-differentiable at F_D tangentially to the set of functions g that are continuous at F^{−1}_D(p), with derivative φ′_{F_D}(g) = −g{F^{−1}_D(p)}/f_D{F^{−1}_D(p)}. Then the assertions follow by the Delta method; see, e.g., Theorem 3.9.4 of van der Vaart and Wellner [40].

Remark 6. As a special case of U(p; P_n, C_F), Serfling [38] treated a different sample version of F^{−1}_D(p). Compared with his uniform weak convergence conclusion for that sample version, an additional term is involved in our result for F^{−1}_{D_n}(p).

4.3. Uniform weak convergence of λ_{F_n}(p)

To establish the uniform weak convergence of λ_{F_n}(p), we first strengthen assumption (A2) to

(A2′) n^{1/2}{r_{F_n}(α, u) − r_F(α, u)} = ∫ ψ(x, α, u) dv_n(x) + o_p(1) with ψ(x, α, u) being continuous in α and {ψ(x, α, u) : α ∈ (0, α*_F), u ∈ S^{d−1}} being a Donsker class. Furthermore, r_F(α, u) has a continuous partial derivative with respect to α.

We also invoke an additional assumption, viz.

(A4) n^{1/2}{ϕ_{F_n}(x) − ϕ_F(x)} = ∫ η(x, y) dv_n(y) + o_p(1) with {η(x, y) : x ∈ C_F(b)\C_F(a), 0 < a < b < 1} being a uniformly bounded Donsker class, and ϕ_F is continuous.

Remark 7. Essentially, (A4) requires uniform weak convergence of n^{1/2}{ϕ_{F_n}(x) − ϕ_F(x)}. Uniform weak convergence of n^{1/2}{D_{F_n}(x) − D_F(x)} has been established for the simplicial depth by Arcones and Giné [2] and Dümbgen [8], for the half-space depth by Massé [30], and for the projection depth and some generalized half-space depths by Arcones et al. [1]. For those depth functions, if ϕ_F(x) = w{D_F(x)} with w a continuously differentiable function, then (A4) is satisfied.

In addition, we need the density function f_D(α) of D_F(X) and λ′_F(p) = dλ_F(p)/dp, which are given in the following lemma.

Lemma 2. Suppose that assumption (A0) is satisfied.
(i) If ∂r_F(α, u)/∂α exists, then

f_D(α) = −∫_{S^{d−1}} f{r_F(α, u)u} |J{r_F(α, u), u}| (∂/∂α) r_F(α, u) du.

(ii) If f_D{F^{−1}_D(1 − p)} > 0 and ∂r_F(α, u)/∂α exists at α = F^{−1}_D(1 − p), then

λ′_F(p) = dλ_F(p)/dp = −{1/f_D{F^{−1}_D(1 − p)}} ∫_{S^{d−1}} ϕ_F[r_F{F^{−1}_D(1 − p), u}u] |J[r_F{F^{−1}_D(1 − p), u}, u]| (∂/∂α) r_F(α, u)|_{α=F^{−1}_D(1−p)} du.

Furthermore, if both ϕ_F[r_F{F^{−1}_D(1 − p), u}u] and f[r_F{F^{−1}_D(1 − p), u}u] do not depend on u, i.e., ϕ_F(x) and f(x) are constant over the boundary ∂C_F(p) of C_F(p), denoted by ϕ_F{∂C_F(p)} and f{∂C_F(p)} respectively, then dλ_F(p)/dp = ϕ_F{∂C_F(p)}/f{∂C_F(p)}.

Proof. Using the polar coordinates representation, we obtain

F_D(α) = Pr{D_F(X) ≤ α} = 1 − ∫_{T_F(α)} f(x) dx = ∫_{S^{d−1}} {∫_{r_F(α,u)}^{∞} f(ru) |J(r, u)| dr} du,

λ_F(p) = ∫_{C_F(p)} ϕ_F(x) dx = ∫_{S^{d−1}} {∫_0^{r_F{F^{−1}_D(1−p), u}} ϕ_F(ru) |J(r, u)| dr} du.

Taking derivatives, the results for f_D(α) and dλ_F(p)/dp follow. It is straightforward that dλ_F(p)/dp = ϕ_F{∂C_F(p)}/f{∂C_F(p)} when both ϕ_F[r_F{F^{−1}_D(1 − p), u}u] and f[r_F{F^{−1}_D(1 − p), u}u] do not depend on u.

Remark 8. Suppose that F is an elliptically symmetric distribution, and that the depth function is affine invariant and attains its maximum at the center. Then f[r_F{F^{−1}_D(1 − p), u}u] does not depend on u. Also, ϕ_F[r_F{F^{−1}_D(1 − p), u}u] does not depend on u when ϕ_F(x) = w{D_F(x)} for any function w.

Theorem 5. Suppose that assumptions (A0), (A1), (A2′), (A3) and (A4) are satisfied.
(i) If f_D{F^{−1}_D(1 − p)} > 0 on [a, b] ⊂ (0, 1), then

n^{1/2}{λ_{F_n}(p) − λ_F(p)} = ∫ ℓ(x, p) dv_n(x) + o_p(1),

where ℓ(x, p) = ℓ_1(x, p) + ℓ_2(x, p) + ℓ_3(x, p) + ℓ_4(x, p) with

ℓ_1(y, p) = ∫_{C_F(p)} η(x, y) dx,
ℓ_2(x, p) = ∫_{S^{d−1}} ϕ_F[r_F{F^{−1}_D(1 − p), u}u] |J[r_F{F^{−1}_D(1 − p), u}, u]| ψ{x, F^{−1}_D(1 − p), u} du,
ℓ_3(x, p) = −λ′_F(p) τ_1{x, F^{−1}_D(1 − p)},
ℓ_4(x, p) = −λ′_F(p) τ_2{x, F^{−1}_D(1 − p)}.

In addition, {n^{1/2}{λ_{F_n}(p) − λ_F(p)} : p ∈ [a, b]} ⇝ {G_λ(p) : p ∈ [a, b]}, where G_λ(p) is a zero-mean Gaussian process with covariance structure

cov{G_λ(p_1), G_λ(p_2)} = E{ℓ(X, p_1)ℓ(X, p_2)} − E{ℓ(X, p_1)} E{ℓ(X, p_2)}.

(ii) Furthermore, if both ϕ_F[r_F{F^{−1}_D(1 − p), u}u] and f[r_F{F^{−1}_D(1 − p), u}u] do not depend on u, then the above results hold with ℓ(x, p) = ℓ_1(x, p) + ℓ_3(x, p).

Proof. (i) Decompose λ_{F_n}(p) − λ_F(p) as follows:

λ_{F_n}(p) − λ_F(p) = ∫_{C_{F_n}(p)} ϕ_{F_n}(x) dx − ∫_{C_F(p)} ϕ_F(x) dx = I′_{1n} + I′_{2n},

where

I′_{1n} = ∫_{C_{F_n}(p)} ϕ_{F_n}(x) dx − ∫_{C_{F_n}(p)} ϕ_F(x) dx,   I′_{2n} = ∫_{C_{F_n}(p)} ϕ_F(x) dx − ∫_{C_F(p)} ϕ_F(x) dx.

Since assumption (A2′) implies that T_{F_n}(α) → T_F(α) almost surely, uniformly in α, and Theorem 4 implies that F^{−1}_{D_n}(1 − p) → F^{−1}_D(1 − p) almost surely, uniformly in p, we conclude that C_{F_n}(p) → C_F(p) almost surely, uniformly in p, which along with (A4) and Fubini's Theorem yields

n^{1/2} I′_{1n} = ∫_{C_{F_n}(p)} n^{1/2}{ϕ_{F_n}(x) − ϕ_F(x)} dx = ∫_{C_F(p)} n^{1/2}{ϕ_{F_n}(x) − ϕ_F(x)} dx + o_p(1)
= ∫_{C_F(p)} {∫ η(x, y) dv_n(y)} dx + o_p(1) = ∫ {∫_{C_F(p)} η(x, y) dx} dv_n(y) + o_p(1).

For I′_{2n}, using the polar coordinates representation x = ru with r = |x| and u = x/|x|, we have

n^{1/2} I′_{2n} = n^{1/2} ∫_{S^{d−1}} {∫_{r_F{F^{−1}_D(1−p), u}}^{r_{F_n}{F^{−1}_{D_n}(1−p), u}} ϕ_F(ru) |J(r, u)| dr} du.

Then by the Mean Value Theorem, there exists ξ_n(p, u) between r_{F_n}{F^{−1}_{D_n}(1 − p), u} and r_F{F^{−1}_D(1 − p), u} such that

n^{1/2} I′_{2n} = ∫_{S^{d−1}} ϕ_F{ξ_n(p, u)u} |J{ξ_n(p, u), u}| n^{1/2}[r_{F_n}{F^{−1}_{D_n}(1 − p), u} − r_F{F^{−1}_D(1 − p), u}] du.

For brevity, write ρ_p(u) = ϕ_F[r_F{F^{−1}_D(1 − p), u}u] |J[r_F{F^{−1}_D(1 − p), u}, u]|. Since C_{F_n}(p) → C_F(p) almost surely, uniformly in p ∈ [a, b], and ϕ_F is continuous, we have ξ_n(p, u) = r_F{F^{−1}_D(1 − p), u} + o_p(1), |J{ξ_n(p, u), u}| = |J[r_F{F^{−1}_D(1 − p), u}, u]| + o_p(1) and ϕ_F{ξ_n(p, u)u} = ϕ_F[r_F{F^{−1}_D(1 − p), u}u] + o_p(1). Thus

n^{1/2} I′_{2n} = ∫_{S^{d−1}} ρ_p(u) n^{1/2}[r_{F_n}{F^{−1}_{D_n}(1 − p), u} − r_F{F^{−1}_D(1 − p), u}] du + o_p(1).

Decomposing r_{F_n}{F^{−1}_{D_n}(1 − p), u} − r_F{F^{−1}_D(1 − p), u}, we obtain n^{1/2} I′_{2n} = n^{1/2}(I′_{2n1} + I′_{2n2}) + o_p(1), where

n^{1/2} I′_{2n1} = ∫_{S^{d−1}} ρ_p(u) n^{1/2}[r_{F_n}{F^{−1}_{D_n}(1 − p), u} − r_F{F^{−1}_{D_n}(1 − p), u}] du,
n^{1/2} I′_{2n2} = ∫_{S^{d−1}} ρ_p(u) n^{1/2}[r_F{F^{−1}_{D_n}(1 − p), u} − r_F{F^{−1}_D(1 − p), u}] du.

By (A2′),

n^{1/2} I′_{2n1} = ∫_{S^{d−1}} ρ_p(u) {∫ ψ{x, F^{−1}_{D_n}(1 − p), u} dv_n(x)} du + o_p(1).

Since Theorem 4 implies F^{−1}_{D_n}(1 − p) → F^{−1}_D(1 − p) almost surely, uniformly in p, and ψ(x, α, u) is continuous in α, the equicontinuity of the empirical processes v_n and Fubini's Theorem lead to

n^{1/2} I′_{2n1} = ∫_{S^{d−1}} ρ_p(u) {∫ ψ{x, F^{−1}_D(1 − p), u} dv_n(x)} du + o_p(1) = ∫ {∫_{S^{d−1}} ρ_p(u) ψ{x, F^{−1}_D(1 − p), u} du} dv_n(x) + o_p(1).

For n^{1/2} I′_{2n2}, by the Mean Value Theorem, there exists θ_n(p) between F^{−1}_{D_n}(1 − p) and F^{−1}_D(1 − p) such that

r_F{F^{−1}_{D_n}(1 − p), u} − r_F{F^{−1}_D(1 − p), u} = (∂/∂α) r_F(α, u)|_{α=θ_n(p)} {F^{−1}_{D_n}(1 − p) − F^{−1}_D(1 − p)}.

Since F^{−1}_{D_n}(1 − p) → F^{−1}_D(1 − p) almost surely and ∂r_F(α, u)/∂α is continuous in α, θ_n(p) = F^{−1}_D(1 − p) + o_p(1) and

(∂/∂α) r_F(α, u)|_{α=θ_n(p)} = (∂/∂α) r_F(α, u)|_{α=F^{−1}_D(1−p)} + o_p(1)

uniformly in u ∈ S^{d−1} for n sufficiently large. Combining all those results, and using Theorem 4 and Lemma 2 (ii), we obtain

n^{1/2} I′_{2n2} = ∫_{S^{d−1}} ρ_p(u) n^{1/2}(∂/∂α) r_F(α, u)|_{α=θ_n(p)} {F^{−1}_{D_n}(1 − p) − F^{−1}_D(1 − p)} du
= {∫_{S^{d−1}} ρ_p(u) (∂/∂α) r_F(α, u)|_{α=F^{−1}_D(1−p)} du} × {1/f_D{F^{−1}_D(1 − p)}} ∫ [τ_1{x, F^{−1}_D(1 − p)} + τ_2{x, F^{−1}_D(1 − p)}] dv_n(x) + o_p(1)
= −λ′_F(p) ∫ [τ_1{x, F^{−1}_D(1 − p)} + τ_2{x, F^{−1}_D(1 − p)}] dv_n(x) + o_p(1).

Since {η(x, y) : x ∈ R^d} is a uniformly bounded Donsker class by (A4) and {1_{C_F(p)}(x) : p ∈ [a, b]} is a uniformly bounded Donsker class, {η(x, y) 1_{C_F(p)}(x) : p ∈ [a, b], x ∈ R^d} is a Donsker class; see, e.g., Example 2.10.8 of van der Vaart and Wellner [40]. By Lemma 1, {ℓ_1(x, p) : p ∈ [a, b]} is a Donsker class. Since {ψ{x, F^{−1}_D(1 − p), u} : p ∈ [a, b], u ∈ S^{d−1}} is a Donsker class and ρ_p(u) does not depend on x, {ρ_p(u) ψ{x, F^{−1}_D(1 − p), u} : p ∈ [a, b], u ∈ S^{d−1}} is a Donsker class, and thus {ℓ_2(x, p) : p ∈ [a, b]} is a Donsker class by Lemma 1. Since {τ_1{x, F^{−1}_D(1 − p)} : p ∈ [a, b]} and {τ_2{x, F^{−1}_D(1 − p)} : p ∈ [a, b]} are Donsker classes, it is straightforward that {ℓ_3(x, p) : p ∈ [a, b]} and {ℓ_4(x, p) : p ∈ [a, b]} are Donsker classes. Therefore, {ℓ(x, p) : p ∈ [a, b]} is a Donsker class, and the uniform weak convergence result follows by the Central Limit Theorem for empirical processes. This completes the proof of part (i).

(ii) If both ϕ_F[r_F{F^{−1}_D(1 − p), u}u] and f[r_F{F^{−1}_D(1 − p), u}u] do not depend on u, then ℓ_2(x, p) = −ℓ_4(x, p), and the assertions follow. This concludes the proof of Theorem 5.

As a special case, we have the following results for V_{F_n}(p).

Corollary 3. Suppose that assumptions (A$_0$), (A$_1$), (A$_2$), and (A$_3'$) are satisfied.
(i) If $f_D\{F_D^{-1}(1-p)\} > 0$ on $[a,b] \subset (0,1)$, then
\[
n^{1/2}\{V_{F_n}(p) - V_F(p)\} = \int \tau(x,p)\,dv_n(x) + o_p(1),
\]
where $\tau(x,p) = \tau_0\{x,F_D^{-1}(1-p)\} - v_F(p)\big[\tau_1\{x,F_D^{-1}(1-p)\} + \tau_2\{x,F_D^{-1}(1-p)\}\big]$ with
\[
\tau_0\{x,F_D^{-1}(1-p)\} = \int_{S^{d-1}} \big|J[r_F\{F_D^{-1}(1-p),u\},u]\big|\,\psi\{x,F_D^{-1}(1-p),u\}\,du
\]
and $v_F(p) = dV_F(p)/dp$. In addition, $\{n^{1/2}\{V_{F_n}(p) - V_F(p)\} : p \in [a,b]\}$ converges weakly to $\{G_V(p) : p \in [a,b]\}$, where $G_V(p)$ is a zero-mean Gaussian process with covariance structure
\[
\operatorname{cov}\{G_V(p_1), G_V(p_2)\} = E\{\tau(X,p_1)\tau(X,p_2)\} - E\{\tau(X,p_1)\}E\{\tau(X,p_2)\}.
\]
(ii) Furthermore, if $f[r_F\{F_D^{-1}(1-p),u\}u]$ does not depend on $u$, then the above results hold with
\[
\tau(x,p) = -v_F(p)\,\tau_1\{x,F_D^{-1}(1-p)\} = -v_F(p)\, I_{C_F(p)}(x).
\]
Proof. (i) For this special case, $\varphi_F(x) = 1 = \varphi_{F_n}(x)$ and thus $\eta(x,y) = 0$. The assertions then follow from Theorem 5 (i). (ii) First, $\varphi_F[r_F\{F_D^{-1}(1-p),u\}u] = 1$ does not depend on $u$. If $f[r_F\{F_D^{-1}(1-p),u\}u]$ does not depend on $u$, the result follows from Theorem 5 (ii).

Remark 9. Much effort has been made to study the uniform weak convergence of sample versions of $V_F(p)$. Serfling [38] obtained the uniform weak convergence result for $V(p; P_n, \mathcal{C}_F) = \inf\{\mathrm{volume}(C) : P_n(C) \ge p,\ C \in \mathcal{C}_F\}$. Wang and Serfling [44] established the uniform weak convergence of $V_n^*(p) = \inf\{\mathrm{volume}(C) : P_n(C) \ge p,\ C \in \mathcal{E}\}$, where $\mathcal{E}$ is the set of all ellipsoids in $\mathbb{R}^d$. While both $V(p; P_n, \mathcal{C}_F)$ and $V_n^*(p)$ differ from $V_{F_n}(p)$, each of them can be considered a sample version of $V_F(p)$ under some conditions.

5. Applications

5.1. Asymptotics of some nonparametric multivariate kurtosis measures

The function $\lambda_F(p)$ not only serves as a nonparametric spread measure for multivariate distributions $F$; it also carries information about the kurtosis of $F$. Using the generalized spread functions of distributions $F$ and $G$ in $\mathbb{R}^d$, Wang and Zhou [45] defined a generalized multivariate kurtosis ordering:
\[
F \le_k G \iff \lambda_G\{\lambda_F^{-1}(r)\}\ \text{is convex for}\ r \ge 0.
\]
An important special case is the $k_V$-ordering, viz.
\[
F \le_{k_V} G \iff V_G\{V_F^{-1}(r)\}\ \text{is convex for}\ r \ge 0.
\]

In addition, some nonparametric multivariate kurtosis measures have been defined via $V_F(p)$. For example, treating kurtosis and tailweight as the same notion, Liu et al. [25] introduced a depth-based multivariate kurtosis measure, the "fan plot", where for all $t \in [0,1]$, $b_F(t|p_0) = V_F(tp_0)/V_F(p_0)$. Extending the Groeneveld and Meeden [15] kurtosis measure for univariate symmetric distributions, Wang and Serfling [42] proposed a nonparametric multivariate kurtosis functional, defined for all $p \in (0,1)$ by
\[
k_F(p) = \frac{V_F\{(1+p)/2\} + V_F\{(1-p)/2\} - 2V_F(1/2)}{V_F\{(1+p)/2\} - V_F\{(1-p)/2\}}.
\]
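For readers who wish to experiment with these measures, a minimal sketch follows. It is illustrative only: it replaces the projection depth used in the examples below with Mahalanobis depth and approximates central-region volumes by convex hulls of the deepest sample points; the helper names (`scale_curve`, `fan_plot`, `k_fn`) are ours, not part of any package.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))  # sample from N_3(0, I)

def scale_curve(Z, ps):
    """V_{F_n}(p): volume of the convex hull of the ceil(np) deepest points,
    with depth measured by Mahalanobis distance (an illustrative substitute
    for the projection depth used in the paper's examples)."""
    mu, Sinv = Z.mean(axis=0), np.linalg.inv(np.cov(Z, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', Z - mu, Sinv, Z - mu)
    order = np.argsort(d2)                     # smallest distance = deepest
    n, d = Z.shape
    return np.array([ConvexHull(Z[order[:max(int(np.ceil(p * n)), d + 1)]]).volume
                     for p in ps])

def fan_plot(Z, ts, p0=0.9):
    """Sample fan plot b_{F_n}(t | p0) = V_{F_n}(t p0) / V_{F_n}(p0)."""
    return scale_curve(Z, [t * p0 for t in ts]) / scale_curve(Z, [p0])[0]

def k_fn(Z, p):
    """Sample version of the Wang-Serfling kurtosis functional k_F(p)."""
    v_lo, v_mid, v_hi = scale_curve(Z, [(1 - p) / 2, 0.5, (1 + p) / 2])
    return (v_hi + v_lo - 2 * v_mid) / (v_hi - v_lo)
```

Since the sample central regions are nested, $V_{F_n}(p)$ is nondecreasing in $p$; consequently $b_{F_n}(t|p_0)$ increases to $1$ at $t = 1$, and $k_{F_n}(p)$ always lies in $(-1, 1)$.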

Wang and Zhou [45] showed that the $k_V$-ordering is preserved by $k_F(p)$ and is inversely preserved by $b_F(t|p_0)$. Thus a higher $k_F(p)$ means higher kurtosis, whereas a higher $b_F(t|p_0)$ implies lower kurtosis. As applications of our asymptotic results, we study the asymptotic behavior of $b_{F_n}(t|p_0)$ and $k_{F_n}(p)$. First we have the following uniform strong convergence results.

Theorem 6. Under conditions (C$_1$)–(C$_4$),
(i) $b_{F_n}(t|p_0) \xrightarrow{\text{a.s.}} b_F(t|p_0)$ uniformly in $t \in [0,1]$ for any $p_0 \in (0,1)$;
(ii) $k_{F_n}(p) \xrightarrow{\text{a.s.}} k_F(p)$ uniformly in $p \in [a,b]$ for any closed interval $[a,b] \subset (0,1)$.

Proof. The results follow immediately from Corollary 2.

With the uniform weak convergence result for $V_{F_n}(p)$, it is relatively straightforward to establish the uniform weak convergence of $b_{F_n}(t|p_0)$ and $k_{F_n}(p)$.

Theorem 7. Suppose that assumptions (A$_0$), (A$_1$), (A$_2$) and (A$_3'$) are satisfied. If $f_D\{F_D^{-1}(1-p)\} > 0$, then
(i) for any $p_0, t_0 \in (0,1)$,
\[
\big\{n^{1/2}\{b_{F_n}(t|p_0) - b_F(t|p_0)\} : t \in [t_0,1]\big\} \rightsquigarrow \Big\{\frac{1}{V_F(p_0)}\,G_V(tp_0) - \frac{V_F(tp_0)}{\{V_F(p_0)\}^2}\,G_V(p_0) : t \in [t_0,1]\Big\};
\]
(ii) for any closed interval $[a,b] \subset (0,1)$,
\[
\begin{aligned}
\big\{n^{1/2}\{k_{F_n}(p) - k_F(p)\} : p \in [a,b]\big\} \rightsquigarrow \Big\{&\frac{2[V_F(1/2) - V_F\{(1-p)/2\}]}{[V_F\{(1+p)/2\} - V_F\{(1-p)/2\}]^2}\,G_V\{(1+p)/2\} - \frac{2}{V_F\{(1+p)/2\} - V_F\{(1-p)/2\}}\,G_V(1/2)\\
&+ \frac{2[V_F\{(1+p)/2\} - V_F(1/2)]}{[V_F\{(1+p)/2\} - V_F\{(1-p)/2\}]^2}\,G_V\{(1-p)/2\} : p \in [a,b]\Big\},
\end{aligned}
\]
where $G_V$ is the Gaussian process given in Corollary 3.

Proof. Observe that
\[
\begin{aligned}
n^{1/2}\{b_{F_n}(t|p_0) - b_F(t|p_0)\} &= n^{1/2}\Big\{\frac{V_{F_n}(tp_0)}{V_{F_n}(p_0)} - \frac{V_F(tp_0)}{V_F(p_0)}\Big\}\\
&= \frac{V_F(p_0)\,n^{1/2}\{V_{F_n}(tp_0) - V_F(tp_0)\} - V_F(tp_0)\,n^{1/2}\{V_{F_n}(p_0) - V_F(p_0)\}}{V_{F_n}(p_0)\,V_F(p_0)}\\
&= \frac{1}{V_{F_n}(p_0)}\big[n^{1/2}\{V_{F_n}(tp_0) - V_F(tp_0)\} - b_F(t|p_0)\,n^{1/2}\{V_{F_n}(p_0) - V_F(p_0)\}\big].
\end{aligned}
\]
In the same way, we have
\[
\begin{aligned}
n^{1/2}\{k_{F_n}(p) - k_F(p)\} &= n^{1/2}\Big\{\frac{V_{F_n}\{(1+p)/2\} + V_{F_n}\{(1-p)/2\} - 2V_{F_n}(1/2)}{V_{F_n}\{(1+p)/2\} - V_{F_n}\{(1-p)/2\}} - \frac{V_F\{(1+p)/2\} + V_F\{(1-p)/2\} - 2V_F(1/2)}{V_F\{(1+p)/2\} - V_F\{(1-p)/2\}}\Big\}\\
&= \frac{1}{V_{F_n}\{(1+p)/2\} - V_{F_n}\{(1-p)/2\}}\Big(n^{1/2}\big[V_{F_n}\{(1+p)/2\} + V_{F_n}\{(1-p)/2\} - 2V_{F_n}(1/2)\\
&\qquad - \big(V_F\{(1+p)/2\} + V_F\{(1-p)/2\} - 2V_F(1/2)\big)\big]\\
&\qquad - k_F(p)\,n^{1/2}\big[V_{F_n}\{(1+p)/2\} - V_{F_n}\{(1-p)/2\} - \big(V_F\{(1+p)/2\} - V_F\{(1-p)/2\}\big)\big]\Big).
\end{aligned}
\]
Then the assertions follow from Corollary 3 and Slutsky's lemma.

The above uniform weak convergence results can also be obtained by the delta method. With the asymptotic distributions of $b_{F_n}(t|p_0)$ and $k_{F_n}(p)$ in hand, these measures can be used for statistical inference.

5.2. Comparing the spread and kurtosis of two multivariate data sets

Wang and Zhou [45] designed a two-dimensional visual device to compare two distributions $F$ and $G$ in any dimension with respect to spread and kurtosis using a plot of $z = \lambda_G\{\lambda_F^{-1}(r)\}$. Based on the uniform strong convergence result, we can now compare the spread and kurtosis of two data sets, $X_n = \{X_1, \ldots, X_n\}$ and $Y_m = \{Y_1, \ldots, Y_m\}$, in $\mathbb{R}^d$ via the plot of $z = \lambda_{G_m}\{\lambda_{F_n}^{-1}(r)\}$ with $r = \lambda_{F_n}(p)$, where $F_n$ and $G_m$ are the empirical distribution functions of $\{X_1, \ldots, X_n\}$ and $\{Y_1, \ldots, Y_m\}$, respectively. Specifically: (1) if the plot is above (below) the line $z = r$ at $p$ (equivalently, $\lambda_{G_m}(p) > (<)\ \lambda_{F_n}(p)$), then $Y_m$ is more (less) scattered than $X_n$ at $p$; (2) if the plot is convex (concave), then $Y_m$ has higher (lower) kurtosis than $X_n$; (3) if the plot exhibits a substantially linear pattern, then $X_n$ and $Y_m$ have approximately the same kurtosis.

The aforementioned plot is essentially the plot of $(\lambda_{F_n}(p), \lambda_{G_m}(p))$, so we call it a generalized multivariate spread-spread plot. In the following example, we illustrate the method based on the plot of $(V_{F_n}(p), V_{G_m}(p))$ using simulated data sets from some elliptically symmetric distributions $E_d(h; \mu, \Lambda)$. The following notation will be used:

$N_d(\mu, \Lambda)$: a $d$-dimensional normal distribution with mean vector $\mu$ and covariance matrix $\Lambda$.
$Mt_d(\nu, \mu, \Lambda)$: a $d$-dimensional Student $t$ distribution with $\nu$ degrees of freedom, location $\mu$ and scatter matrix $\Lambda$.
$MU_d(\mu, \Lambda)$: a $d$-dimensional uniform distribution on $\{x \in \mathbb{R}^d : (x-\mu)^\top \Lambda^{-1}(x-\mu) \le 1\}$.

Example 1. Let us compare the spread and kurtosis of two data sets in $\mathbb{R}^3$ by the plot of $(V_{F_n}(p), V_{G_m}(p))$ for $p \in [0,1]$ with step $\Delta = 0.01$. The projection depth and the R package DepthProc of Zawadzki et al. [46] are used. Since neither $V_{F_n}(p)$ nor $V_{G_m}(p)$ is affected by $\mu$ or by any rotation, according to Proposition 1 (i), we can assume without loss of generality that $\mu = 0$ and that $\Lambda$ is a diagonal matrix. For clarity, we use the notation $(V_{X_n}(p), V_{Y_m}(p))$ instead of $(V_{F_n}(p), V_{G_m}(p))$. Let

$X_{500}$: a sample of size 500 from $N_3(0, I_{3\times 3})$.
$Y_{600}$: a sample of size 600 from $N_3[0, \mathrm{diag}(1, 1.44, 2.25)]$.
$Z_{600}$: a sample of size 600 from $Mt_3[5, 0, \mathrm{diag}(1, 0.25, 0.64)]$.
$U_{600}$: a sample of size 600 from $MU_3[0, \mathrm{diag}(1, 2.25, 4)]$.

Figure 1 (a) shows the plot of $(V_{X_{500}}(p), V_{Y_{600}}(p))$. Since the plot exhibits a substantially linear pattern and lies above the line $z = r$, $Y_{600}$ has approximately the same kurtosis as $X_{500}$ and is more scattered than $X_{500}$. From Figure 1 (b), we see that $Z_{600}$ has higher kurtosis than $X_{500}$. In addition, $Z_{600}$ is more scattered than $X_{500}$ for $p$ such that $V_{X_{500}}(p) \ge 47.12$ (correspondingly, $p \ge 0.91$) and less scattered otherwise. Figure 1 (c) reveals that $U_{600}$ has lower kurtosis and less spread than $X_{500}$.
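A rough numerical cross-check of case (a) can be sketched as follows. The sketch substitutes Mahalanobis depth and convex-hull volumes for the projection depth and DepthProc used above, so its volumes only approximate those in Figure 1; `scale_curve` is our own helper.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 3))                               # X_500 ~ N_3(0, I)
Y = rng.standard_normal((600, 3)) * np.sqrt([1.0, 1.44, 2.25])  # Y_600 ~ N_3(0, diag(1, 1.44, 2.25))

def scale_curve(Z, ps):
    # V(p): convex-hull volume of the p-fraction of Mahalanobis-deepest points.
    mu, Sinv = Z.mean(axis=0), np.linalg.inv(np.cov(Z, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', Z - mu, Sinv, Z - mu)
    order = np.argsort(d2)                     # smallest distance = deepest
    n, d = Z.shape
    return np.array([ConvexHull(Z[order[:max(int(np.ceil(p * n)), d + 1)]]).volume
                     for p in ps])

ps = np.arange(0.05, 1.0, 0.05)
vx, vy = scale_curve(X, ps), scale_curve(Y, ps)
# The spread-spread plot is the curve (vx, vy): vy > vx indicates that Y is
# more scattered than X at level p, and near-perfect correlation of the two
# curves reflects the linear (equal-kurtosis) pattern seen in Figure 1 (a).
```

Because the depth regions are nested, each curve is increasing in $p$, and for these two normal samples the point cloud $(v_x, v_y)$ lies close to a straight line above $z = r$.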

(FIGURE 1 ABOUT HERE)

Figure 1. Spread-spread plots to compare spread and kurtosis of two multivariate data sets: (a) the plot of $(V_{X_{500}}(p), V_{Y_{600}}(p))$, (b) the plot of $(V_{X_{500}}(p), V_{Z_{600}}(p))$, and (c) the plot of $(V_{X_{500}}(p), V_{U_{600}}(p))$.

5.3. Assessing multivariate normality

In the univariate case, the quantile-quantile plot is a popular graphical method for assessing normality. As discussed in Section 1, $\lambda_F(p)$ is essentially a multivariate quantile function and thus can be used to assess multivariate normality. Suppose that a sample $X_n = \{X_1, \ldots, X_n\}$ comes from a $d$-dimensional elliptically symmetric distribution $F$. Under the conditions of Proposition 2, $F$ is a $d$-dimensional normal distribution if and only if $\lambda_F(p) = c\,\lambda_{N_d(0,I)}(p)$, where $N_d(0,I)$ denotes the $d$-dimensional standard normal distribution. Since $\lambda_{F_n}(p) \xrightarrow{\text{a.s.}} \lambda_F(p)$ uniformly in $p$, the plot of $(\lambda_{N_d(0,I)}(p), \lambda_{F_n}(p))$ should resemble a straight line through the origin when $n$ is large, and it therefore provides a visual device for assessing multivariate normality. Since $\lambda_F(p)$ is essentially a multivariate quantile function, we call this plot a generalized multivariate quantile-quantile plot. In fact, the well-known chi-square plot is a special case of this plot, as can be seen from Proposition 3. In the following example, we demonstrate the method based on the plot of $(V_{N_d(0,I)}(p), V_{F_n}(p))$ using simulated data sets from some elliptically symmetric distributions $E_d(h; \mu, \Lambda)$.

Example 2. Let us assess multivariate normality by the plot of $(V_{N_d(0,I)}(p), V_{F_n}(p))$ for $p \in [0.01, 0.99]$ with step $\Delta = 0.01$. The projection depth and the R package DepthProc of Zawadzki et al. [46] are used. Since $V_{F_n}(p)$ is not affected by $\mu$ or by any rotation, according to Proposition 1 (i), we can assume without loss of generality that $\mu = 0$ and that $\Lambda$ is a diagonal matrix. For clarity, we use the notation $V_{X_n}(p)$ instead of $V_{F_n}(p)$. Let

$X_{200}$: a sample of size 200 from $N_2[0, \mathrm{diag}(1, 0.36)]$.
$Y_{200}$: a sample of size 200 from $Mt_2[4, 0, \mathrm{diag}(1, 0.36)]$.
$U_{200}$: a sample of size 200 from $MU_2[0, \mathrm{diag}(2.25, 9)]$.

(1) Figure 2 (a) shows the plot of $(V_{N_2(0,I)}(p), V_{X_{200}}(p))$. The plot exhibits a substantially linear pattern, which indicates normality of the data. (2) The plot of $(V_{N_2(0,I)}(p), V_{Y_{200}}(p))$ is given in Figure 2 (b). Since a curved pattern is observed, the data do not come from a bivariate normal distribution. Furthermore, since the plot is convex, we can conclude that the data come from a population with higher kurtosis than a bivariate normal distribution. (3) Figure 2 (c) is for $U_{200}$. It reveals a concave curved pattern, which indicates that the data come from a population with lower kurtosis than a bivariate normal distribution.
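The theoretical axis of this plot has a closed form: for an affine-invariant depth such as the projection depth, the $p$-th central region of $N_d(0, I)$ is the ball of radius $\{\chi^2_{d;p}\}^{1/2}$ by spherical symmetry, which is also why the chi-square plot arises as a special case. A small sketch (the helper name `v_normal` is ours):

```python
import numpy as np
from math import gamma, pi
from scipy.stats import chi2

def v_normal(p, d):
    """V_{N_d(0,I)}(p): the p-th central region of the standard d-variate
    normal is the ball of radius sqrt(chi2.ppf(p, d)), whose volume is
    pi^(d/2) / Gamma(d/2 + 1) * chi2.ppf(p, d)^(d/2)."""
    return pi ** (d / 2) / gamma(d / 2 + 1) * chi2.ppf(p, d) ** (d / 2)

# Consistency check in dimension 2, where chi2.ppf(p, 2) = -2 log(1 - p),
# so that V_{N_2(0,I)}(p) = -2 pi log(1 - p):
v_formula = v_normal(0.5, 2)
v_exact = -2 * pi * np.log(1 - 0.5)
```

Pairing these theoretical volumes with sample volumes at a grid of $p$ values yields the points of the quantile-quantile plot in Figure 2.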

(FIGURE 2 ABOUT HERE)

Figure 2. Quantile-quantile plots to assess multivariate normality: (a) the plot of $(V_{N_2(0,I)}(p), V_{X_{200}}(p))$, (b) the plot of $(V_{N_2(0,I)}(p), V_{Y_{200}}(p))$, and (c) the plot of $(V_{N_2(0,I)}(p), V_{U_{200}}(p))$.

Remark 10. When the above method is used to assess multivariate normality for a practical data set, a curved pattern in the plot is sufficient to conclude that the data come from a non-normal population. However, if the plot exhibits a substantially linear pattern, it is still necessary to check the elliptical symmetry of $F$. Tests for elliptical symmetry have been proposed by Beran [3], Fang, Li and Zhu [12], and Li, Fang and Zhu [22].

In classical multivariate analysis, a well-known graphical procedure for assessing multivariate normality is the chi-square plot; see, e.g., Johnson and Wichern [18] for details. For comparison, the chi-square plots for the data sets in Example 2 are given in Figure 3. Overall, each chi-square plot shows a pattern similar to that of its corresponding quantile-quantile plot in Figure 2. This is not surprising, since the chi-square plot is also a special case of the generalized multivariate quantile-quantile plot, as discussed above. Meanwhile, the top four ordered squared Mahalanobis distances in the chi-square plot for $X_{200}$ are quite variable, which is no coincidence: in our simulations, the few largest squared Mahalanobis distances frequently deviate from a linear pattern in the chi-square plot of a multivariate normal data set. One may argue that the plot in Figure 2 (a) excludes $V_{X_{200}}(0.995)$ and $V_{X_{200}}(1)$, the volumes of the sample central regions covering the two data points of least depth. While this is true, even if the largest two squared Mahalanobis distances are removed from the chi-square plot for $X_{200}$, the quantile-quantile plot in Figure 2 (a) still performs better, showing a stronger linear pattern. (FIGURE 3 ABOUT HERE)

Figure 3. The chi-square plots for the data sets in Example 2: (a) the chi-square plot for $X_{200}$, (b) the chi-square plot for $Y_{200}$, and (c) the chi-square plot for $U_{200}$.

6. Concluding remarks

In this paper, we studied a generalized spread function $\lambda_F(p)$, especially the asymptotic behavior of its sample version $\lambda_{F_n}(p)$. Many important depth-based functionals, such as $V_F(p)$ and $F_D^{-1}(p)$, are special cases of $\lambda_F(p)$, and various descriptive measures have been defined through them. As discussed in Section 1, $\lambda_F(p)$ can be considered a generalized multivariate quantile function. In the univariate case, order statistics and quantile functions play a central role in nonparametric statistical analysis and inference; it is not surprising that $\lambda_F(p)$ should play a similar role in nonparametric multivariate statistics. Our asymptotic results for $\lambda_{F_n}(p)$ provide a foundation for its applications.

A variant of $\lambda_F(p)$ is
\[
\lambda_F^*(p) = \int_{C_F(p)} \varphi_F(x)\,dF(x).
\]
Both the depth-based Lorenz curves of Liu et al. [25] and Serfling [38] are special cases of $\lambda_F^*(p)$. When $F$ has a density function $f(x)$,
\[
\lambda_F^*(p) = \int_{C_F(p)} \varphi_F(x) f(x)\,dx,
\]
which is similar to $\lambda_F(p)$. However, its sample version
\[
\lambda_{F_n}^*(p) = \int_{C_{F_n}(p)} \varphi_{F_n}(x)\,dF_n(x)
\]
is quite different from $\lambda_{F_n}(p)$. Under the same conditions, we can also establish the uniform strong and weak convergence of $\lambda_{F_n}^*(p)$.

Theorem 8. Suppose that conditions (C$_1$)–(C$_4$) are satisfied. If $\varphi_F(x)$ is bounded and $\varphi_{F_n}(x) \xrightarrow{\text{a.s.}} \varphi_F(x)$ uniformly in $x \in \mathbb{R}^d$, then $\lambda_{F_n}^*(p) \xrightarrow{\text{a.s.}} \lambda_F^*(p)$ uniformly in $p \in [0, p_0]$ for any $p_0 \in (0,1)$ as $n \to \infty$.

Theorem 9. Suppose that assumptions (A$_0$), (A$_1$), (A$_2$), (A$_3'$) and (A$_4$) are satisfied.
(i) If $f_D\{F_D^{-1}(1-p)\} > 0$ on $[a,b] \subset (0,1)$, then
\[
n^{1/2}\{\lambda_{F_n}^*(p) - \lambda_F^*(p)\} = \int \ell^*(x,p)\,dv_n(x) + o_p(1),
\]
where $\ell^*(x,p) = \ell_0^*(x,p) + \ell_1^*(x,p) + \ell_2^*(x,p) + \ell_3^*(x,p) + \ell_4^*(x,p)$ with
\[
\ell_0^*(x,p) = \varphi_F(x)\, I_{C_F(p)}(x), \qquad \ell_1^*(y,p) = \int_{C_F(p)} \eta(x,y) f(x)\,dx,
\]
\[
\ell_2^*(x,p) = \int_{S^{d-1}} \varphi_F[r_F\{F_D^{-1}(1-p),u\}u]\, f[r_F\{F_D^{-1}(1-p),u\}u]\,\big|J[r_F\{F_D^{-1}(1-p),u\},u]\big|\,\psi\{x,F_D^{-1}(1-p),u\}\,du,
\]
and
\[
\ell_3^*(x,p) = -\Big\{\frac{d}{dp}\lambda_F^*(p)\Big\}\tau_1\{x,F_D^{-1}(1-p)\}, \qquad \ell_4^*(x,p) = -\Big\{\frac{d}{dp}\lambda_F^*(p)\Big\}\tau_2\{x,F_D^{-1}(1-p)\}.
\]
In addition, $\{n^{1/2}\{\lambda_{F_n}^*(p) - \lambda_F^*(p)\} : p \in [a,b]\}$ converges weakly to $\{G_{\lambda^*}(p) : p \in [a,b]\}$, where $G_{\lambda^*}(p)$ is a zero-mean Gaussian process with covariance structure
\[
\operatorname{cov}\{G_{\lambda^*}(p_1), G_{\lambda^*}(p_2)\} = E\{\ell^*(X,p_1)\ell^*(X,p_2)\} - E\{\ell^*(X,p_1)\}E\{\ell^*(X,p_2)\}.
\]
(ii) Furthermore, if $\varphi_F[r_F\{F_D^{-1}(1-p),u\}u]$ does not depend on $u$, then the above results hold with $\ell^*(x,p) = \ell_0^*(x,p) + \ell_1^*(x,p) + \ell_3^*(x,p)$.

The proofs are similar to those of Theorems 2 and 5, respectively, and thus are omitted. It should be pointed out that $\lambda_F^*(p)$ does not possess the properties of $\lambda_F(p)$ given in Propositions 1 and 2. It is therefore inappropriate to consider $\lambda_F^*(p)$ a spread measure or to use $\lambda_{F_n}^*(p)$ to assess multivariate normality.

In our simulations, we also used various other elliptically symmetric distributions, such as Pearson Type II distributions and Kotz-type distributions, including multivariate exponential power distributions, as well as various other sample sizes and steps. Examples 1 and 2 present only the representative patterns observed. The performance of both the plot of $(V_{F_n}(p), V_{G_m}(p))$ and the plot of $(V_{N_d(0,I)}(p), V_{F_n}(p))$ improves as the sample sizes increase. A good choice of the step $\Delta$ can also improve the performance of the plots. In general, the step $\Delta$ should be selected such that $\min(n,m)\Delta \ge 1$ for the plot of $(V_{F_n}(p), V_{G_m}(p))$ and $n\Delta \ge 1$ for the plot of $(V_{N_d(0,I)}(p), V_{F_n}(p))$, while keeping the number of points in the plot adequate.

In Section 5, we developed a graphical method to compare the spread and kurtosis of two multivariate data sets and a graphical procedure to assess multivariate normality. Based on our uniform weak convergence results, tests for equal kurtosis of two multivariate data sets and new tests for multivariate normality can be further designed. The idea for such tests is straightforward: select suitable $p_1, \ldots, p_k$ first, then test for equal kurtosis by testing the linearity of the points $(\lambda_{F_n}(p_i), \lambda_{G_m}(p_i))$, and test for multivariate normality by testing the linearity of the points $(\lambda_{N_d(0,I)}(p_i), \lambda_{F_n}(p_i))$, $i \in \{1, \ldots, k\}$.
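To illustrate the linearity idea (this sketch is ours, not a calibrated test; critical values would have to come from the asymptotic distributions above), the points of the quantile-quantile plot can be summarized by their correlation coefficient, in the spirit of univariate correlation tests for normality. Mahalanobis depth with convex-hull volumes again stands in for the projection depth, and the helper names are hypothetical:

```python
import numpy as np
from math import gamma, pi
from scipy.spatial import ConvexHull
from scipy.stats import chi2

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 2)) * np.sqrt([1.0, 0.36])   # N_2(0, diag(1, 0.36))

def sample_volumes(Z, ps):
    # V_{F_n}(p) via Mahalanobis depth and convex-hull volumes (areas in 2-d).
    mu, Sinv = Z.mean(axis=0), np.linalg.inv(np.cov(Z, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', Z - mu, Sinv, Z - mu)
    order = np.argsort(d2)
    n, d = Z.shape
    return np.array([ConvexHull(Z[order[:max(int(np.ceil(p * n)), d + 1)]]).volume
                     for p in ps])

def normal_volumes(ps, d):
    # V_{N_d(0,I)}(p) = pi^(d/2) / Gamma(d/2 + 1) * chi2.ppf(p, d)^(d/2).
    return np.array([pi ** (d / 2) / gamma(d / 2 + 1) * chi2.ppf(p, d) ** (d / 2)
                     for p in ps])

ps = np.arange(0.1, 0.95, 0.05)
r = np.corrcoef(normal_volumes(ps, 2), sample_volumes(X, ps))[0, 1]
# r close to 1 supports linearity of the quantile-quantile plot (hence
# normality); a formal cutoff for r would be derived from the asymptotic
# distribution of V_{F_n}(p).
```

For this normal sample the correlation is close to 1; heavy-tailed or short-tailed alternatives would pull the plot away from a straight line and reduce $r$.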
Future work entails developing a powerful test for each case, and thus finding an efficient test statistic and its asymptotic distribution.

Acknowledgments

Many thanks are due to two anonymous referees, Prof. Brent Burch, an Associate Editor, and the Editor-in-Chief, Prof. Christian Genest, FRSC, for their insightful comments and valuable suggestions, which contributed considerably to improving the paper. The responsibility for any remaining errors rests, of course, with the author.

References

[1] M.A. Arcones, H. Cui, Y. Zuo, Empirical depth processes, Test 15 (2006) 151–177.
[2] M.A. Arcones, E. Giné, Limit theorems for U-processes, Ann. Probab. 21 (1993) 1494–1542.
[3] R. Beran, Testing for ellipsoidal symmetry of a multivariate density, Ann. Statist. 7 (1979) 150–162.
[4] P.J. Bickel, E.L. Lehmann, Descriptive statistics for nonparametric models: III. Dispersion, Ann. Statist. 4 (1976) 1139–1158.
[5] A. Cuevas, M. Febrero, R. Fraiman, Robust estimation and classification for functional data via projection-based depth notions, Comput. Statist. 22 (2007) 481–496.
[6] X. Dang, R. Serfling, Nonparametric depth-based multivariate outlier identifiers, and masking robustness properties, J. Statist. Plann. Inf. 140 (2010) 198–213.
[7] D.L. Donoho, M. Gasko, Breakdown properties of location estimates based on halfspace depth and projected outlyingness, Ann. Statist. 20 (1992) 1803–1827.
[8] L. Dümbgen, Limit theorems for the simplicial depth, Statist. Probab. Lett. 14 (1992) 119–128.
[9] S. Dutta, A.K. Ghosh, On robust classification using projection depth, Ann. Inst. Statist. Math. 64 (2012) 657–676.
[10] R. Dyckerhoff, Convergence of depths and depth-trimmed regions, arXiv preprint arXiv:1611.08721, 2018.
[11] J.H.J. Einmahl, D.M. Mason, Generalized quantile processes, Ann. Statist. 20 (1992) 1062–1078.
[12] K.T. Fang, R.Z. Li, L.X. Zhu, A projection NT-type test of elliptical symmetry based on the skewness and kurtosis measures, Acta Mathematicae Applicatae Sinica 14 (1998) 314–323.
[13] M. Febrero, P. Galeano, W. González-Manteiga, Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels, Environmetrics 19 (2008) 331–345.
[14] A.K. Ghosh, P. Chaudhuri, On data depth and distribution-free discriminant analysis using separating surfaces, Bernoulli 11 (2005) 1–27.
[15] R.A. Groeneveld, G. Meeden, Measuring skewness and kurtosis, Statistician 33 (1984) 391–399.
[16] X. He, G. Wang, Convergence of depth contours for multivariate datasets, Ann. Statist. 25 (1997) 495–504.
[17] R. Hoberg, Cluster analysis based on data depth, In: H. Kiers, J.R. Rasson, P. Groenen, M. Schader (Eds.), Data Analysis, Classification, and Related Methods, pp. 17–22, Springer, Berlin, 2000.
[18] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis, Pearson Education, Upper Saddle River, NJ, 2007.
[19] R. Jörnsten, Clustering and classification based on L1 data depth, J. Multivariate Anal. 90 (2004) 67–89.
[20] T. Lange, K. Mosler, P. Mozharovskyi, Fast nonparametric classification based on data depth, Stat. Papers 55 (2014) 49–69.
[21] J. Li, J.A. Cuesta-Albertos, R.Y. Liu, DD-classifier: Nonparametric classification procedure based on DD-plot, J. Amer. Statist. Assoc. 107 (2012) 737–753.
[22] R.Z. Li, K.T. Fang, L.X. Zhu, Some Q-Q probability plots to test spherical and elliptical symmetry, J. Comput. Graph. Statist. 6 (1997) 435–450.
[23] R.Y. Liu, On a notion of data depth based on random simplices, Ann. Statist. 18 (1990) 405–414.
[24] R.Y. Liu, Data depth and multivariate rank tests, In: Y. Dodge (Ed.), L1-Statistics and Related Methods, pp. 279–294, North-Holland, Amsterdam, 1992.
[25] R.Y. Liu, J.M. Parelius, K. Singh, Multivariate analysis by data depth: Descriptive statistics, graphics and inference (with discussion), Ann. Statist. 27 (1999) 783–858.
[26] R.Y. Liu, K. Singh, A quality index based on data depth and multivariate rank tests, J. Amer. Statist. Assoc. 88 (1993) 252–260.
[27] R.Y. Liu, K. Singh, Rank tests for multivariate scale difference based on data depth, In: R.Y. Liu, R. Serfling, D.L. Souvaine (Eds.), Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications, pp. 17–35, American Mathematical Society, 2006.
[28] S. López-Pintado, J. Romo, Depth-based classification for functional data, In: R.Y. Liu, R. Serfling, D.L. Souvaine (Eds.), Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications, pp. 103–119, American Mathematical Society, 2006.
[29] S. López-Pintado, J. Romo, On the concept of depth for functional data, J. Amer. Statist. Assoc. 104 (2009) 718–734.
[30] J.-C. Massé, Asymptotics for the Tukey depth process, with an application to a multivariate trimmed mean, Bernoulli 10 (2004) 397–419.
[31] K. Mosler, R. Hoberg, Data analysis and classification with the zonoid depth, In: R.Y. Liu, R. Serfling, D.L. Souvaine (Eds.), Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications, pp. 49–59, American Mathematical Society, 2006.
[32] D. Nolan, Asymptotics for multivariate trimming, Stoch. Proc. Appl. 42 (1992) 157–169.
[33] H. Oja, On location, scale, skewness and kurtosis of univariate distributions, Scand. J. Statist. 8 (1981) 154–168.
[34] D. Pollard, Convergence of Stochastic Processes, Springer, New York, 1984.
[35] R. Ranga Rao, Relations between weak and uniform convergence of measures with applications, Ann. Math. Statist. 33 (1962) 659–680.
[36] P.J. Rousseeuw, M. Hubert, Regression depth (with discussion), J. Amer. Statist. Assoc. 94 (1999) 388–433.
[37] R. Serfling, Approximation Theorems of Mathematical Statistics, Wiley, New York, 1980.
[38] R. Serfling, Generalized quantile processes based on multivariate depth functions, with applications in nonparametric multivariate analysis, J. Multivariate Anal. 83 (2002) 232–247.
[39] R. Serfling, Depth functions in nonparametric multivariate inference, In: R.Y. Liu, R. Serfling, D.L. Souvaine (Eds.), Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications, pp. 1–16, American Mathematical Society, 2006.
[40] A.W. van der Vaart, J.A. Wellner, Weak Convergence and Empirical Processes, With Applications to Statistics, Springer, New York, 1996.
[41] J. Wang, A note on weak convergence of general halfspace depth trimmed means, Statist. Probab. Lett. 142 (2018) 50–56.
[42] J. Wang, R. Serfling, Nonparametric multivariate kurtosis and tailweight measures, J. Nonparam. Statist. 17 (2005) 441–456.
[43] J. Wang, R. Serfling, Influence functions for a general class of depth-based generalized quantile functions, J. Multivariate Anal. 97 (2006) 810–826.
[44] J. Wang, R. Serfling, On scale curves for nonparametric description of dispersion, In: R.Y. Liu, R. Serfling, D.L. Souvaine (Eds.), Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications, pp. 37–48, American Mathematical Society, 2006.
[45] J. Wang, W. Zhou, A generalized multivariate kurtosis ordering and its applications, J. Multivariate Anal. 107 (2012) 169–180.
[46] Z. Zawadzki, D. Kosiorowski, K. Slomczynski, M. Bocian, A. Wegrzynkiewicz, DepthProc: Statistical Depth Functions for Multivariate Analysis, R package version 2.0.2, https://cran.r-project.org/package=DepthProc, 2017.
[47] Y. Zuo, Multidimensional trimming based on projection depth, Ann. Statist. 34 (2006) 2211–2251.
[48] Y. Zuo, H. Cui, Depth weighted scatter estimators, Ann. Statist. 33 (2005) 381–413.
[49] Y. Zuo, R. Serfling, General notions of statistical depth function, Ann. Statist. 28 (2000) 461–482.
[50] Y. Zuo, R. Serfling, Structural properties and convergence results for contours of sample statistical depth functions, Ann. Statist. 28 (2000) 483–499.

[Figure 1: scale-curve plots (a) $(V_{X_{500}}(p), V_{Y_{600}}(p))$, (b) $(V_{X_{500}}(p), V_{Z_{600}}(p))$, (c) $(V_{X_{500}}(p), V_{U_{600}}(p))$.]

[Figure 2: scale-curve plots (a) $(V_{N_2(0,I)}(p), V_{X_{200}}(p))$, (b) $(V_{N_2(0,I)}(p), V_{Y_{200}}(p))$, (c) $(V_{N_2(0,I)}(p), V_{U_{200}}(p))$.]

[Figure 3: chi-square plots of the ordered $d^2_{(i)}$ versus $q_{c,2}\{(i-1/2)/n\}$ for (a) $X_{200}$, (b) $Y_{200}$, (c) $U_{200}$.]