Journal of Statistical Planning and Inference 97 (2001) 235–261
www.elsevier.com/locate/jspi
Asymptotic distribution of regression M-estimators
Miguel A. Arcones
Department of Mathematical Sciences, State University of New York, Binghamton, NY 13902-6000, USA
Received 30 March 2000; received in revised form 1 August 2000; accepted 22 September 2000
Abstract
We consider the following linear regression model:
$$Y_i = Z_i'\theta_0 + U_i, \qquad i = 1, \dots, n,$$
where $\{U_i\}_{i=1}^\infty$ is a sequence of $\mathbb{R}^m$-valued i.i.d. r.v.'s; $\{Z_i\}_{i=1}^\infty$ is a sequence of i.i.d. $d \times m$ random matrices; and $\theta_0$ is a $d$-dimensional parameter to be estimated. Given a function $\rho: \mathbb{R}^m \to \mathbb{R}$, we define a robust estimator $\hat\theta_n$ as a value such that
$$n^{-1}\sum_{i=1}^n \rho(Y_i - Z_i'\hat\theta_n) = \inf_{\theta \in \mathbb{R}^d} n^{-1}\sum_{i=1}^n \rho(Y_i - Z_i'\theta).$$
We study the convergence in distribution of $a_n(\hat\theta_n - \theta_0)$ in different situations, where $\{a_n\}$ is a sequence of real numbers depending on $\rho$ and on the distributions of $Z_i$ and $U_i$. As a particular case, we consider $\rho(x) = |x|^p$. In this case, we show that if $E[\|Z\|^p + \|Z\|^2] < \infty$, either $p > 1/2$ or $m \ge 2$, and some other regularity conditions hold, then $n^{1/2}(\hat\theta_n - \theta_0)$ converges in distribution to a normal limit. For $m = 1$ and $p = 1/2$, $n^{1/2}(\log n)^{-1/2}(\hat\theta_n - \theta_0)$ converges in distribution to a normal limit. For $m = 1$ and $1/2 > p > 0$, $n^{1/(3-2p)}(\hat\theta_n - \theta_0)$ converges in distribution.
© 2001 Elsevier Science B.V. All rights reserved.
MSC: primary 62E20; secondary 62F12
Keywords: Regression; Robustness; M-estimators; $L_p$ estimators
E-mail address: [email protected] (M.A. Arcones).
PII: S0378-3758(00)00224-X

1. Introduction
We consider the linear model: $Y$ is an $m$-dimensional response variable, $Z$ is a $d \times m$ matrix regressor or predictor variable, $U$ is an $m$-dimensional error independent of $Z$, and they are related by the equation
$$Y = Z'\theta_0 + U, \tag{1.1}$$
where $\theta_0 \in \mathbb{R}^d$ is a parameter to be estimated. This model describes two variables $Y$ and $Z$ that are linearly related: $\theta_0$ represents the linear relation between them and $U$ is a random error. The problem is to estimate $\theta_0$ from a sample $(Y_1, Z_1), \dots, (Y_n, Z_n)$ of i.i.d. r.v.'s with the distribution of $(Y, Z)$. The usual method to estimate $\theta_0$ is least squares (see, for example, Draper and Smith, 1981). One advantage of this method is that the estimator is easy to compute. Its disadvantage is that it is not robust against outliers in the errors. Since, with a computer program, the estimators considered here are neither difficult nor slow to compute, robust estimators are preferred to the least-squares estimator. For more on robust methods, we refer to Huber (1981) and Hampel et al. (1986). Given a continuous function $\rho: \mathbb{R}^m \to \mathbb{R}$, we define $\hat\theta_n$ as a value such that
$$n^{-1}\sum_{i=1}^n \rho(Y_i - Z_i'\hat\theta_n) = \inf_{\theta \in \mathbb{R}^d} n^{-1}\sum_{i=1}^n \rho(Y_i - Z_i'\theta). \tag{1.2}$$
A popular choice is $\rho(x) = |x|$, where $|x|$ is the Euclidean norm. Another possibility is to take $\hat\theta_n$ as a value such that
$$n^{-1}\sum_{j=1}^n |Y_j - Z_j'\hat\theta_n|^p = \inf_{\theta \in \mathbb{R}^d} n^{-1}\sum_{j=1}^n |Y_j - Z_j'\theta|^p, \tag{1.3}$$
where $p > 0$. Regression M-estimators (in different variations) have been considered by several authors; see, for example, Huber (1973), Jurečková (1977), Koul (1977), Koenker and Bassett (1978), Yohai and Maronna (1979), Ruppert and Carroll (1980), Koenker and Portnoy (1987), Bloomfield and Steiger (1983), Bai et al. (1990), Bai et al. (1992), Davis et al. (1992), and Davis and Wu (1997). Of course, for $\hat\theta_n$ to converge to $\theta_0$, we must have that
$$E[\rho(Y - Z'\theta_0)] = \inf_{\theta \in \mathbb{R}^d} E[\rho(Y - Z'\theta)] \tag{1.4}$$
and $\theta_0$ is the unique value with this property.

We can view these regression M-estimators as a particular type of general M-estimators. Next, we recall the definition of an M-estimator. Let $(S, \mathcal{S}, P)$ be a probability space and let $\{X_i\}_{i=1}^\infty$ be a sequence of i.i.d. r.v.'s with values in $S$. Let $X$ be a copy of $X_1$. Let $\Theta$ be a subset of $\mathbb{R}^d$. Let $g: S \times \Theta \to \mathbb{R}$ be a function such that $g(\cdot, \theta): S \to \mathbb{R}$ is measurable for each $\theta \in \Theta$. Huber (1964) introduced the M-estimator $\hat\theta_n$ of $\theta_0$ as a random variable $\hat\theta_n = \hat\theta_n(X_1, \dots, X_n)$ satisfying
$$n^{-1}\sum_{i=1}^n g(X_i, \hat\theta_n) = \inf_{\theta \in \Theta} n^{-1}\sum_{i=1}^n g(X_i, \theta). \tag{1.5}$$
$\hat\theta_n$ estimates the parameter $\theta_0 \in \Theta$ characterized by
$$E[g(X, \theta) - g(X, \theta_0)] > 0 \quad \text{for each } \theta \ne \theta_0. \tag{1.6}$$
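As a concrete illustration of (1.3), the following sketch computes the $L_1$ estimator in the simplest case $d = m = 1$, where minimizing $\sum_i |y_i - z_i\theta|$ reduces to a weighted median of the ratios $y_i/z_i$ with weights $|z_i|$. The function names (`lp_objective`, `l1_slope`, `ls_slope`) and the toy data are illustrative assumptions, not part of the paper.

```python
def lp_objective(theta, ys, zs, p):
    """Empirical criterion n^{-1} * sum |y_i - z_i*theta|^p for d = m = 1."""
    return sum(abs(y - z * theta) ** p for y, z in zip(ys, zs)) / len(ys)


def l1_slope(ys, zs):
    """Minimizer of sum |y_i - z_i*theta| over scalar theta.

    Since |y - z*theta| = |z| * |y/z - theta|, a minimizer is a weighted
    median of the ratios y_i/z_i with weights |z_i|.
    """
    pairs = sorted((y / z, abs(z)) for y, z in zip(ys, zs) if z != 0)
    if not pairs:
        raise ValueError("need at least one nonzero regressor")
    half = sum(w for _, w in pairs) / 2.0
    acc = 0.0
    for ratio, w in pairs:
        acc += w
        if acc >= half:
            return ratio


def ls_slope(ys, zs):
    """Least-squares slope through the origin, for comparison."""
    return sum(z * y for y, z in zip(ys, zs)) / sum(z * z for z in zs)


# Toy data (hypothetical): y = 2*z exactly, except for one gross outlier.
zs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 100.0]
theta_l1 = l1_slope(ys, zs)
theta_ls = ls_slope(ys, zs)
```

On this toy sample the $L_1$ estimate stays at the slope of the uncontaminated points, while the least-squares slope is pulled far away by the single outlier.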
We use the notation of empirical processes. For instance, we write
$$Pf = E[f(X)] \quad \text{and} \quad P_n f = n^{-1}\sum_{i=1}^n f(X_i),$$
where $f$ is a function on $S$. The heuristic idea to get the limit distribution of $a_n(\hat\theta_n - \theta_0)$ is the following: $\hat\theta_n$ is the argument which minimizes the process $\{P_n g(\cdot, \theta): \theta \in \Theta\}$. So, $a_n(\hat\theta_n - \theta_0)$ is the argument which minimizes the process
$$\{a_n^2 P_n(g(\cdot, \theta_0 + a_n^{-1}\theta) - g(\cdot, \theta_0)) : \theta_0 + a_n^{-1}\theta \in \Theta\}. \tag{1.7}$$
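The rescaling step behind (1.7) can be checked numerically in a case where everything is explicit. The sketch below assumes the squared loss $g(x, \theta) = (x - \theta)^2$ with $a_n = n^{1/2}$ (an illustrative choice, not one made in the paper), minimizes the rescaled process by ternary search, and compares the result with $a_n(\hat\theta_n - \theta_0)$, where $\hat\theta_n$ is the sample mean. The helper names and the data are hypothetical.

```python
def rescaled_criterion(t, xs, theta0, a_n):
    """a_n^2 * P_n(g(., theta0 + t/a_n) - g(., theta0)) for g(x, theta) = (x - theta)^2."""
    theta = theta0 + t / a_n
    return a_n ** 2 * sum((x - theta) ** 2 - (x - theta0) ** 2 for x in xs) / len(xs)


def argmin_convex(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


xs = [0.9, 1.4, 0.7, 1.3, 1.2]   # hypothetical sample
theta0 = 1.0                     # hypothetical true parameter
a_n = len(xs) ** 0.5
theta_hat = sum(xs) / len(xs)    # minimizer of P_n g for the squared loss
t_hat = argmin_convex(lambda t: rescaled_criterion(t, xs, theta0, a_n), -10.0, 10.0)
```

Up to the numerical tolerance of the search, `t_hat` agrees with `a_n * (theta_hat - theta0)`: minimizing the rescaled process is the same as rescaling the minimizer.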
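We expect that $a_n(\hat\theta_n - \theta_0)$ converges to the argument which minimizes the limit process. This method has been used by several authors in different situations (see, for example, Prakasa Rao, 1968; Kim and Pollard, 1990). To apply this method, we will have to prove (among other things) that
$$a_n^2 E[g(X, \theta_0 + a_n^{-1}\theta) - g(X, \theta_0)] \to w(\theta)$$
for some function $w(\theta)$, and that
$$\{a_n^2 (P_n - P)(g(\cdot, \theta_0 + a_n^{-1}\theta) - g(\cdot, \theta_0)) : |\theta| \le M\}$$
converges weakly. We will see that, under certain conditions, the function $\theta \mapsto E[\rho(Y - Z'\theta) - \rho(Y - Z'\theta_0)]$ is twice differentiable at $\theta_0$. So,
$$a_n^2 E[\rho(Y - Z'\theta_0 - a_n^{-1}Z'\theta) - \rho(Y - Z'\theta_0)] \to \theta' V \theta$$
as $n \to \infty$. Usually, the convergence of
$$a_n^2 n^{-1}\sum_{i=1}^n \big(\rho(Y_i - Z_i'\theta_0 - a_n^{-1}Z_i'\theta) - \rho(Y_i - Z_i'\theta_0) - E[\rho(Y_i - Z_i'\theta_0 - a_n^{-1}Z_i'\theta) - \rho(Y_i - Z_i'\theta_0)]\big)$$
is convergence to a normal limit with convergence of variances. So, usually, the rate of convergence of an M-estimator, i.e. $a_n$, is determined so that
$$a_n^4 n^{-1} \mathrm{Var}\big(\rho(Y - Z'\theta_0 - a_n^{-1}Z'\theta) - \rho(Y - Z'\theta_0)\big)$$
converges. Under smoothness conditions, the order of $\mathrm{Var}(\rho(Y - Z'\theta_0 - \varepsilon Z'\theta) - \rho(Y - Z'\theta_0))$, as $\varepsilon \to 0+$, is $\varepsilon^2$. In this case, $a_n = n^{1/2}$ and we have the usual limit theorem with a normal limit. If
$$|\varepsilon|^{-q}\, \mathrm{Var}\big(\rho(Y - Z'\theta_0 - \varepsilon Z'\theta) - \rho(Y - Z'\theta_0)\big)$$
converges as $\varepsilon \to 0+$, for some $0 < q < 2$, then we choose $a_n$ so that $n^{-1} a_n^4 a_n^{-q}$ converges, i.e. $a_n = n^{1/(4-q)}$. For example, if $m = 1$ and $\rho(x) = |x|^p$ for some $0 < p < 1/2$, then
$$|\varepsilon|^{-(2p+1)}\, \mathrm{Var}\big(|Y - Z'\theta_0 - \varepsilon Z'\theta|^p - |Y - Z'\theta_0|^p\big)$$
converges, which gives that $n^{1/(3-2p)}(\hat\theta_n - \theta_0)$ converges in distribution (see Theorem 2.18 below). For $m = 1$ and $\rho(x) = |x|^{1/2}$, i.e. $p = 1/2$,
$$|\varepsilon|^{-2} (\log \varepsilon^{-1})^{-1}\, \mathrm{Var}\big(|Y - Z'\theta_0 - \varepsilon Z'\theta|^p - |Y - Z'\theta_0|^p\big)$$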
converges, which gives that $n^{1/2}(\log n)^{-1/2}(\hat\theta_n - \theta_0)$ converges in distribution (see Theorem 2.19 below). Although the rate of convergence for the M-estimator with $\rho(x) = |x|^p$, $1/2 > p > 0$ and $m = 1$, is slower than the usual one, this estimator is more robust against outliers in the errors than other estimators (the least-squares regression estimator, in particular), since $|x|^p < |x|^2$ for $|x|$ large.

For the M-estimator in (1.3), we will show that if $E[\|Z\|^p + \|Z\|^2] < \infty$, either $p > 1/2$ or $m \ge 2$, and some other regularity conditions hold, then $n^{1/2}(\hat\theta_n - \theta_0)$ converges in distribution to a normal limit. Observe that $E[||Y - Z'\theta_0 - Z'\theta|^p - |Y - Z'\theta_0|^p|^2] < \infty$ for some $\theta \ne 0$ implies that $E[\|Z\|^{2p}] < \infty$. But this condition is too strong if $p > 1$. This makes it impossible (if we want the best possible conditions) to apply the results on the asymptotics of M-estimators in Pollard (1984), who assumed that $E[|g(X, \theta) - g(X, \theta_0)|^2] < \infty$ for each $\theta$ in a neighborhood of $\theta_0$.

Given a $d \times m$ matrix $z$, we define the following norm:
$$\|z\| := \max\Big( \sup_{b \in \mathbb{R}^d,\, |b| \le 1} |b'z|, \ \sup_{b \in \mathbb{R}^m,\, |b| \le 1} |zb| \Big).$$
Observe that if either $m = 1$ or $d = 1$, this norm is just the Euclidean norm. Throughout, $c$ denotes a constant which may vary from occurrence to occurrence.

2. Asymptotics for regression M-estimators
First, we present several results about the asymptotic normality of regression M-estimators for $\rho$ smooth enough. The next proposition deals with the consistency of the M-estimator.

Proposition 2.1. With the above notation, let $\rho: \mathbb{R}^m \to \mathbb{R}$ be a continuous function. Suppose that:
(i) For each $a \ne 0$, $E[\rho(U + a) - \rho(U)] > 0$.
(ii) For each $\theta \ne 0$, $\Pr\{Z'\theta = 0\} < 1$.
(iii) $\lim_{|x| \to \infty} \rho(x) = \infty$.
(iv) For each $x \in \mathbb{R}^m$, $\rho(x) \ge \rho(0)$.
(v) $E[\rho(U - a)]$ is a continuous function of $a$.
(vi) For each $\theta \in \mathbb{R}^d$,
$$\lim_{\delta \to 0} E\Big[\sup_{t:\, |t| \le \delta} |\rho(U - Z'\theta - Z't) - \rho(U - Z'\theta)|\Big] = 0.$$
Then, $\hat\theta_n \to \theta_0$ a.s., where $\hat\theta_n$ is any sequence of random variables satisfying (1.2).

Proof. Without loss of generality, we may assume that $\rho(0) = 0$. Since $\rho$ is continuous, it is possible to choose $\hat\theta_n$ satisfying (1.2). Hypothesis (ii) implies that $\Pr\{Z'\theta \ne 0\} > 0$,
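for each $\theta \ne 0$. By compactness there exists a $\delta_0 > 0$ such that $\Pr\{|Z'\theta| \ge \delta_0\} \ge \delta_0$, for each $\theta \in \mathbb{R}^d$ with $|\theta| = 1$. Take $M_0 < \infty$ such that
$$\Pr\{\|Z\| \ge M_0\} \le \delta_0/2 \quad \text{and} \quad 2^{-1}\delta_0 \inf_{|x| \ge M_0} \rho(x)\, \Pr\{|U| \le M_0\} \ge E[\rho(U)] + 2.$$
Take $t_1, \dots, t_m \in \mathbb{R}^d$ such that $|t_k| = 1$ for each $1 \le k \le m$, and for each $t \in \mathbb{R}^d$ with $|t| = 1$ there exists a $1 \le k \le m$ such that $|t - t_k| \le 2^{-1} M_0^{-1} \delta_0$. Given $|\theta| \ge 4 M_0 \delta_0^{-1}$, there exists a $1 \le k \le m$ such that $\big||\theta|^{-1}\theta - t_k\big| \le 2^{-1} M_0^{-1} \delta_0$. Moreover, if $\|z\| \le M_0$ and $|z't_k| \ge \delta_0$, then
$$|z'\theta| \ge |\theta||z't_k| - |z'(\theta - |\theta| t_k)| \ge \delta_0 |\theta| - \|z\||\theta| 2^{-1} M_0^{-1} \delta_0 \ge 2^{-1}\delta_0 |\theta| \ge 2 M_0.$$
Thus,
$$n^{-1}\sum_{j=1}^n \rho(U_j - Z_j'\theta) \ge n^{-1}\sum_{j=1}^n \rho(U_j - Z_j'\theta)\, I_{\|Z_j\| \le M_0,\ |U_j| \le M_0,\ |Z_j't_k| \ge \delta_0}$$
$$\ge n^{-1}\sum_{j=1}^n \inf_{|x| \ge M_0} \rho(x)\, I_{\|Z_j\| \le M_0,\ |U_j| \le M_0,\ |Z_j't_k| \ge \delta_0}$$
$$\to \inf_{|x| \ge M_0} \rho(x)\, \Pr\{\|Z\| \le M_0,\ |U| \le M_0,\ |Z't_k| \ge \delta_0\} \ge 2 + E[\rho(U)] \quad \text{a.s.}$$
So, eventually
$$\inf_{|\theta| \ge 4 M_0 \delta_0^{-1}} n^{-1}\sum_{j=1}^n \rho(U_j - Z_j'\theta) \ge 1 + n^{-1}\sum_{j=1}^n \rho(U_j).$$
Hence, eventually $|\hat\theta_n - \theta_0| \le 4 M_0 \delta_0^{-1}$. Given $N > 0$, if $|a| \ge 2N$ and $|U| \le N$, then $|U - a| \ge N$. So,
$$E[\rho(U - a)] \ge \inf_{|x| \ge N} \rho(x)\, \Pr\{|U| \le N\}.$$
Therefore, $\lim_{|a| \to \infty} E[\rho(U - a)] = \infty$. From this and condition (v), for each $\delta > 0$,
$$\inf_{|a| \ge \delta} E[\rho(U - a) - \rho(U)] > 0.$$
We have that
$$\inf_{|\theta| \ge \delta} E[\rho(U - Z'\theta) - \rho(U)] \ge \inf_{|\theta| \ge \delta} E[(\rho(U - Z'\theta) - \rho(U))\, I_{|Z'\theta| \ge \delta\delta_0}]$$
$$\ge \inf_{|\theta| \ge \delta} \Pr\{|Z'\theta| \ge \delta\delta_0\} \inf_{|a| \ge \delta\delta_0} E[\rho(U - a) - \rho(U)] \ge \delta_0 \inf_{|a| \ge \delta\delta_0} E[\rho(U - a) - \rho(U)] > 0.$$
We have just obtained that for each $\delta > 0$,
$$\inf_{|\theta| \ge \delta} E[\rho(U - Z'\theta) - \rho(U)] > 0. \tag{2.1}$$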
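By the law of large numbers for classes of functions satisfying a bracketing condition (Dudley, 1984, Theorem 6.1.5) and hypothesis (vi), for each $M < \infty$,
$$\sup_{|\theta| \le M} \Big| n^{-1}\sum_{j=1}^n \big(\rho(U_j - Z_j'\theta) - \rho(U_j) - E[\rho(U_j - Z_j'\theta) - \rho(U_j)]\big) \Big| \to 0 \quad \text{a.s.} \tag{2.2}$$
It follows from (2.1) and (2.2) that given $0 < \delta < M < \infty$, there exists a $\tau > 0$ such that
$$\inf_{\delta \le |\theta| \le M} n^{-1}\sum_{j=1}^n (\rho(U_j - Z_j'\theta) - \rho(U_j)) \ge \tau$$
for each $n$ large enough. So, $|\hat\theta_n - \theta_0| \le \delta$ for each $n$ large enough.

The next theorem follows directly from Proposition 2.1 above and Theorem 9 in Arcones (2000).

Theorem 2.2. With the above notation, let $\rho: \mathbb{R}^m \to \mathbb{R}$ be a continuous function and let $\psi: \mathbb{R}^m \to \mathbb{R}$. Suppose that:
(i) For each $a \ne 0$, $E[\rho(U + a) - \rho(U)] > 0$.
(ii) For each $\theta \ne 0$, $\Pr\{Z'\theta = 0\} < 1$.
(iii) $\lim_{|x| \to \infty} \rho(x) = \infty$.
(iv) For each $x \in \mathbb{R}^m$, $\rho(x) \ge \rho(0)$.
(v) $E[\rho(U - a)]$ is a continuous function of $a$.
(vi) For each $\theta \in \mathbb{R}^d$,
$$\lim_{\delta \to 0} E\Big[\sup_{t:\, |t| \le \delta} |\rho(U - Z'\theta - Z't) - \rho(U - Z'\theta)|\Big] = 0.$$
(vii) There exists a positive-definite symmetric matrix $V$ such that
$$E[\rho(U - Z'\theta) - \rho(U)] = \theta' V \theta + o(|\theta|^2)$$
as $\theta \to 0$.
(viii) $\rho$ is continuously differentiable.
(ix) $E[(\partial\rho/\partial\theta)(U)] = 0$, $E[\|Z\|^2] < \infty$ and
$$E\Big[\sup_{|t| \le \delta} \Big|\frac{\partial\rho}{\partial\theta}(U - Z't)\Big|^2 \|Z\|^2\Big] < \infty$$
for some $\delta > 0$.
Then,
$$n^{1/2}(\hat\theta_n - \theta_0) - 2^{-1} n^{-1/2} V^{-1} \sum_{i=1}^n Z_i \frac{\partial\rho}{\partial\theta}(U_i) \xrightarrow{\Pr} 0,$$
where $\hat\theta_n$ is a sequence of random variables satisfying (1.2).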
We use the notation $\partial\rho/\partial\theta$ to denote the vector of first derivatives of $\rho(\theta)$; $\partial^2\rho/\partial\theta^2$ denotes the matrix of second derivatives of $\rho(\theta)$.

The next theorem follows from Proposition 2.1 above and Theorem 11 in Arcones (2000).

Theorem 2.3. Suppose that:
(i) For each $a \ne 0$, $E[\rho(U + a) - \rho(U)] > 0$.
(ii) For each $\theta \ne 0$, $\Pr\{Z'\theta = 0\} < 1$.
(iii) $\lim_{|x| \to \infty} \rho(x) = \infty$.
(iv) For each $x \in \mathbb{R}^m$, $\rho(x) \ge \rho(0)$.
(v) $E[\rho(U - a)]$ is a continuous function of $a$.
(vi) For each $\theta \in \mathbb{R}^d$,
$$\lim_{\delta \to 0} E\Big[\sup_{t:\, |t| \le \delta} |\rho(U - Z'\theta - Z't) - \rho(U - Z'\theta)|\Big] = 0.$$
(vii) $\rho$ is twice continuously differentiable.
(viii) $E[(\partial\rho/\partial\theta)(U)] = 0$, $E[\|Z\|^2] < \infty$ and $E[|(\partial\rho/\partial\theta)(U)|^2] < \infty$.
(ix) $V := E[Z (\partial^2\rho/\partial\theta^2)(U) Z']$ is a positive-definite symmetric matrix.
(x) For some $\delta_0 > 0$,
$$E\Big[\sup_{|t| \le \delta_0} \Big|\frac{\partial^2\rho}{\partial\theta^2}(U - Z't)\Big| \|Z\|^2\Big] < \infty.$$
Then,
$$n^{1/2}(\hat\theta_n - \theta_0) - 2^{-1} n^{-1/2} V^{-1} \sum_{i=1}^n Z_i \frac{\partial\rho}{\partial\theta}(U_i) \xrightarrow{\Pr} 0,$$
where $\hat\theta_n$ is a sequence of random variables satisfying (1.2).

Many possible $\rho$'s are not smooth, for example $\rho(x) = |x|$. To consider these cases, we study the case when the class of functions $\{\rho(u - z'\theta) - \rho(u): \theta \in \Theta\}$ is a VC subgraph class. In this situation, the conditions to obtain asymptotic limit theorems simplify. Next, we recall the definition of a VC subgraph class. Let $S$ be a set and let $\mathcal{C}$ be a collection of subsets of $S$. For $A \subset S$, we define $\Delta^{\mathcal{C}}(A) = \mathrm{card}\{A \cap C: C \in \mathcal{C}\}$, $m^{\mathcal{C}}(n) = \max\{\Delta^{\mathcal{C}}(A): \mathrm{card}(A) = n\}$ and $s(\mathcal{C}) = \inf\{n: m^{\mathcal{C}}(n) < 2^n\}$. $\mathcal{C}$ is said to be a VC class of sets if $s(\mathcal{C}) < \infty$. General properties of VC classes of sets can be found in Chapters 9 and 11 in Dudley (1984). Given a function $f: S \to \mathbb{R}$, the subgraph of $f$ is the set $\{(x, t) \in S \times \mathbb{R}: 0 \le t \le f(x) \text{ or } f(x) \le t \le 0\}$. A class of functions $\mathcal{F}$ is a VC subgraph class if the collection of subgraphs of $\mathcal{F}$ is a VC class. Next, we show that some common classes of functions are VC subgraph classes.

Lemma 2.4. Consider the linear model with $m = 1$. Let $\rho: \mathbb{R} \to \mathbb{R}$ be a continuous function, non-decreasing on $[0, \infty)$ and non-increasing on $(-\infty, 0]$. Let $\theta_0 \in \mathbb{R}^d$. Then,
the class of functions $\{\rho(u - z'\theta) - \rho(u): \theta \in \mathbb{R}^d\}$, where $u \in \mathbb{R}$, $z \in \mathbb{R}^d$, is a VC subgraph class of functions.

Proof. We have to show that $\{A_\theta \cup B_\theta: \theta \in \mathbb{R}^d\}$ is a VC class of sets, where $A_\theta := \{(u, z, t): 0 \le t \le \rho(u - z'\theta) - \rho(u)\}$ and $B_\theta := \{(u, z, t): 0 \ge t \ge \rho(u - z'\theta) - \rho(u)\}$. Let $\rho_1(u) = \rho(u)$ for $u \le 0$ and let $\rho_2(u) = \rho(u)$ for $u \ge 0$. Let $\rho_1^{-1}(t) = \sup\{u \le 0: \rho(u) \ge t\}$ and let $\rho_2^{-1}(t) = \inf\{u \ge 0: \rho(u) \ge t\}$. We have that $A_\theta = A_\theta^{(1)} \cup A_\theta^{(2)}$, where
$$A_\theta^{(1)} := \{(u, z, t): 0 \le t,\ u - z'\theta \ge 0,\ u - z'\theta - \rho_2^{-1}(t + \rho(u)) \ge 0\}$$
and
$$A_\theta^{(2)} := \{(u, z, t): 0 \le t,\ z'\theta - u \ge 0,\ z'\theta - u + \rho_1^{-1}(t + \rho(u)) \ge 0\}.$$
We have that $\{C(t_1, \dots, t_m): t_1, \dots, t_m \in \mathbb{R}\}$ is a VC class, where $C(t_1, \dots, t_m) = \{x \in S: \sum_{j=1}^m t_j f_j(x) \ge 0\}$ and $f_1, \dots, f_m$ are functions on $S$ (Dudley, 1984, Theorem 9.2.1). We also have that if $\{C_t: t \in T\}$ and $\{D_t: t \in T\}$ are VC classes, then so are $\{C_t \cap D_t: t \in T\}$ and $\{C_t \cup D_t: t \in T\}$ (Dudley, 1984, Proposition 9.2.5). Hence, $\{A_\theta: \theta \in \mathbb{R}^d\}$ is a VC class. A similar argument gives that $\{B_\theta: \theta \in \mathbb{R}^d\}$ is a VC class. Therefore, the claim follows.

A similar argument gives the following:

Lemma 2.5. Let $\lambda: [0, \infty) \to [0, \infty)$ be an increasing, continuous function, with $\lambda(0) = 0$ and $\lim_{x \to \infty} \lambda(x) = \infty$. Then, the class of functions $\{\lambda(|u - z'\theta|) - \lambda(|u|): \theta \in \mathbb{R}^d\}$, where $u \in \mathbb{R}^m$ and $z$ is a $d \times m$ matrix, is a VC subgraph class of functions.

Next, we consider the consistency of the M-estimator in this VC situation.

Proposition 2.6. Suppose that:
(i) For each $a \ne 0$, $E[\rho(U + a) - \rho(U)] > 0$.
(ii) For each $\theta \ne 0$, $\Pr\{Z'\theta = 0\} < 1$.
(iii) $\lim_{|x| \to \infty} \rho(x) = \infty$.
(iv) For each $x \in \mathbb{R}^m$, $\rho(x) \ge \rho(0)$.
(v) $E[\rho(U - a)]$ is a continuous function of $a$.
(vi) The class of functions $\{\rho(u - z'\theta) - \rho(u): |\theta - \theta_0| \le \delta_0\}$ is a VC subgraph class of functions for some $\delta_0 > 0$.
(vii) For each $M < \infty$,
$$E\Big[\sup_{|\theta| \le M} |\rho(U - Z'\theta) - \rho(U)|\Big] < \infty.$$
Then, $\hat\theta_n \to \theta_0$ a.s., where $\hat\theta_n$ is any sequence of random variables satisfying (1.2).
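The combinatorial quantities in the definition of a VC class recalled above can be computed by brute force for a very simple class. The sketch below is an illustration only: it takes the class of half-lines $(-\infty, c]$ on the real line (a standard textbook example, not one used in the paper) and restricts the search to subsets of a small finite universe, which is an assumption made to keep the computation finite. All function names are hypothetical.

```python
from itertools import combinations


def traces(points, thresholds):
    """Distinct sets A ∩ C for C = (-inf, c], with c ranging over `thresholds`."""
    return {frozenset(x for x in points if x <= c) for c in thresholds}


def delta(points):
    """Number of distinct traces of half-lines (-inf, c] on the finite set `points`."""
    pts = sorted(points)
    # One threshold below all points plus one at each point realizes every trace.
    thresholds = [pts[0] - 1.0] + pts
    return len(traces(pts, thresholds))


def m(n, universe):
    """Maximum of delta(A) over subsets A of `universe` with card(A) = n."""
    return max(delta(A) for A in combinations(universe, n))


def vc_index(universe):
    """Smallest n with m(n) < 2^n, searched within the finite universe."""
    for n in range(1, len(universe) + 1):
        if m(n, universe) < 2 ** n:
            return n
    return None


universe = [0.0, 1.0, 2.0, 3.0]
s = vc_index(universe)
```

Half-lines pick out at most $n + 1$ subsets of any $n$ distinct points, so the index found here is finite and the class is a VC class in the sense of the definition.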
Proof. The proof is similar to that of Proposition 2.1. The difference is that to obtain (2.2), we use the law of large numbers for VC classes of functions (see Giné and Zinn, 1984, Theorem 8.3).

The following follows from Theorem 4 in Arcones (2000).

Theorem 2.7. With the above notation, let $\psi: \mathbb{R}^m \to \mathbb{R}$. Assume that:
(i) For each $a \ne 0$, $E[\rho(U + a) - \rho(U)] > 0$.
(ii) For each $\theta \ne 0$, $\Pr\{Z'\theta = 0\} < 1$.
(iii) $\lim_{|x| \to \infty} \rho(x) = \infty$.
(iv) For each $x \in \mathbb{R}^m$, $\rho(x) \ge \rho(0)$.
(v) $E[\rho(U - a)]$ is a continuous function of $a$.
(vi) The class of functions $\{\rho(u - z'\theta) - \rho(u): |\theta - \theta_0| \le \delta_0\}$ is a VC subgraph class of functions for some $\delta_0 > 0$.
(vii) For each $M < \infty$,
$$E\Big[\sup_{|\theta| \le M} |\rho(U - Z'\theta) - \rho(U)|\Big] < \infty.$$
(viii) There exists a positive-definite symmetric matrix $V$ such that
$$E[\rho(U - Z'\theta) - \rho(U)] = \theta' V \theta + o(|\theta|^2)$$
as $\theta \to 0$.
(ix) $E[Z\psi(U)] = 0$ and $E[|Z\psi(U)|^2] < \infty$.
(x) For each $M, \tau > 0$,
$$n \Pr\Big\{\sup_{|\theta| \le M} |\rho(U - n^{-1/2} Z'\theta) - \rho(U) + n^{-1/2}\theta' Z\psi(U)| \ge \tau\Big\} \to 0.$$
(xi) There are constants $0 < q < 1$ and $c > 0$ such that
$$E\Big[\Big(M^{-1}\sup_{|\theta| \le \delta} |\rho(U - Z'\theta) - \rho(U)|\Big) \wedge \Big(M^{-2}\sup_{|\theta| \le \delta} |\rho(U - Z'\theta) - \rho(U)|^2\Big)\Big] \le c\,\delta^2 M^{-1-q},$$
for each $\delta > 0$ small enough and each $M > 0$ large enough.
(xii) For each $\theta \in \mathbb{R}^d$,
$$n E\big[|\rho(U - n^{-1/2} Z'\theta) - \rho(U) + n^{-1/2}\theta' Z\psi(U)| \wedge |\rho(U - n^{-1/2} Z'\theta) - \rho(U) + n^{-1/2}\theta' Z\psi(U)|^2\big] \to 0.$$
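Then,
$$n^{1/2}(\hat\theta_n - \theta_0) - 2^{-1} n^{-1/2} V^{-1} \sum_{i=1}^n Z_i \psi(U_i) \xrightarrow{\Pr} 0,$$
where $\hat\theta_n$ is a sequence of random variables satisfying (1.2).

Next, we consider the convergence in distribution of the $L_p$ estimators, as defined in (1.3). We will need the following lemma. The proof is omitted, since it is a simple calculus exercise.

Lemma 2.8. There exists a constant $c$, depending only on $p$, such that:
(i) If $1 \ge p > 0$, then $||x - \theta|^p - |x|^p| \le c(|x|^{p-1}|\theta| \wedge |\theta|^p)$ for each $x, \theta \in \mathbb{R}^d$.
(ii) If $p \ge 1$, then $||x - \theta|^p - |x|^p| \le c(|x|^{p-1}|\theta| \vee |\theta|^p)$ for each $x, \theta \in \mathbb{R}^d$.
(iii) If $1 > p > 0$, then $||x - \theta|^p - |x|^p + p|x|^{p-2}\theta'x| \le c(|x|^{p-1}|\theta| \wedge |x|^{p-2}|\theta|^2)$ for each $x, \theta \in \mathbb{R}^d$.
(iv) If $p = 1$ and $d = 1$, then $||x - \theta| - |x| + |x|^{-1}\theta x| \le 2|\theta| I_{|x| \le |\theta|}$ for each $x, \theta \in \mathbb{R}^d$.
(v) If $2 > p \ge 1$, then $||x - \theta|^p - |x|^p + p|x|^{p-2}\theta'x| \le c(|x|^{p-2}|\theta|^2 \wedge |\theta|^p)$ for each $x, \theta \in \mathbb{R}^d$.
(vi) If $p \ge 2$, then $||x - \theta|^p - |x|^p + p|x|^{p-2}\theta'x| \le c(|x|^{p-2}|\theta|^2 \vee |\theta|^p)$ for each $x, \theta \in \mathbb{R}^d$.
(vii) If $2 \ge p > 0$, then
$$\big||x - \theta|^p - |x|^p + p|x|^{p-2}\theta'x - 2^{-1}p(p-2)|x|^{p-4}(\theta'x)^2 - 2^{-1}p|x|^{p-2}|\theta|^2\big| \le c(|x|^{p-3}|\theta|^3 \wedge |x|^{p-2}|\theta|^2)$$
for each $x, \theta \in \mathbb{R}^d$.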
(viii) If $3 \ge p \ge 2$, then
$$\big||x - \theta|^p - |x|^p + p|x|^{p-2}\theta'x - 2^{-1}p(p-2)|x|^{p-4}(\theta'x)^2 - 2^{-1}p|x|^{p-2}|\theta|^2\big| \le c(|x|^{p-3}|\theta|^3 \wedge |\theta|^p)$$
for each $x, \theta \in \mathbb{R}^d$.
(ix) If $p \ge 3$, then
$$\big||x - \theta|^p - |x|^p + p|x|^{p-2}\theta'x - 2^{-1}p(p-2)|x|^{p-4}(\theta'x)^2 - 2^{-1}p|x|^{p-2}|\theta|^2\big| \le c(|x|^{p-3}|\theta|^3 \vee |\theta|^p)$$
for each $x, \theta \in \mathbb{R}^d$.

The next theorem gives the asymptotic normality of $L_p$ regression estimators.

Theorem 2.9. Let $p > 0$. Suppose that:
(i) For each $a \ne 0$, $E[|U - a|^p - |U|^p] > 0$.
(ii) For each $\theta \ne 0$, $\Pr\{Z'\theta = 0\} < 1$.
(iii) $E[\|Z\|^2] < \infty$, $E[\|Z\|^p] < \infty$, $E[|U|^{2p-2}] < \infty$ and $E[|U|^{p-2}] < \infty$.
(iv) $V := 2^{-1}p(p-2)E[|U|^{p-4} Z U U' Z'] + 2^{-1}p E[|U|^{p-2} Z Z']$ is a positive-definite matrix.
Then,
$$n^{1/2}(\hat\theta_n - \theta_0) - 2^{-1} n^{-1/2} V^{-1} p \sum_{i=1}^n \big(|U_i|^{p-2} Z_i U_i - E[|U_i|^{p-2} Z_i U_i]\big) \xrightarrow{\Pr} 0,$$
where $\hat\theta_n$ is any sequence of r.v.'s satisfying (1.3).

Proof. The case $p > 2$ follows from Theorem 2.3. We have that
$$E\Big[\sup_{|t| \le \delta_0} \Big|\frac{\partial^2\rho}{\partial\theta^2}(U - Z't)\Big| \|Z\|^2\Big] \le c E\Big[\sup_{|t| \le \delta_0} |U - Z't|^{p-2} \|Z\|^2\Big] < \infty.$$
The case $p = 2$ is trivial. To get the case $2 > p > 0$, we apply Theorem 2.7. We will only consider the case $2 > p > 1$; the other cases are similar. Conditions (i) and (ii) in Theorem 2.7 are assumed. Conditions (iii)–(v) hold trivially. Condition (vi) follows from Lemma 2.5. Condition (vii) follows from Lemma 2.8. By Lemma 2.8,
$$\big|E\big[|U - Z'\theta|^p - |U|^p + p|U|^{p-2}\theta' Z U - 2^{-1}p(p-2)|U|^{p-4}(\theta' Z U)^2 - 2^{-1}p|U|^{p-2}|Z'\theta|^2\big]\big|$$
$$\le c E[(|U|^{p-3}|Z'\theta|^3 \wedge |U|^{p-2}|Z'\theta|^2)] = o(|\theta|^2).$$
This implies that $E[p|U|^{p-2}\theta' Z U] = 0$. So, conditions (viii) and (ix) hold. As to condition (x),
$$n \Pr\Big\{\sup_{|\theta| \le M} \big||U - n^{-1/2}Z'\theta|^p - |U|^p + p|U|^{p-2} n^{-1/2}\theta' Z U\big| \ge \tau\Big\} \le n \Pr\{c|U|^{p-2} n^{-1}\|Z\|^2 \ge \tau\} \to 0.$$
As to condition (xi), by Lemma 2.8,
$$E\Big[\Big(M^{-1}\sup_{|\theta| \le \delta} ||U - Z'\theta|^p - |U|^p|\Big) \wedge \Big(M^{-2}\sup_{|\theta| \le \delta} ||U - Z'\theta|^p - |U|^p|^2\Big)\Big]$$
$$\le c E\big[(M^{-1}(|U|^{p-1}\|Z\|\delta \vee \|Z\|^p\delta^p)) \wedge (M^{-2}(|U|^{p-1}\|Z\|\delta \vee \|Z\|^p\delta^p)^2)\big]$$
$$\le c E\big[(M^{-1}|U|^{p-1}\|Z\|\delta \wedge M^{-2}|U|^{2p-2}\|Z\|^2\delta^2)\, I(|U| > \|Z\|\delta)\big] + c E\big[(M^{-1}\|Z\|^p\delta^p \wedge M^{-2}\|Z\|^{2p}\delta^{2p})\, I(|U| \le \|Z\|\delta)\big]$$
$$\le c E[M^{-2}|U|^{2p-2}\|Z\|^2\delta^2] + c E[M^{-2}\|Z\|^2\delta^2] \le c M^{-2}\delta^2.$$
We have that
$$n E\big[\big||U - n^{-1/2}Z'\theta|^p - |U|^p + p|U|^{p-2}n^{-1/2}\theta' Z U\big| \wedge \big||U - n^{-1/2}Z'\theta|^p - |U|^p + p|U|^{p-2}n^{-1/2}\theta' Z U\big|^2\big]$$
$$\le c E[(|U|^{p-2}|Z'\theta|^2) \wedge (|U|^{2p-4} n^{-1}|Z'\theta|^4)] \to 0,$$
and condition (xii) follows.

Observe that the conditions in the last theorem are best possible. We need (i) and (ii) in order to have $E[|U - Z'\theta|^p - |U|^p] > 0$ for each $\theta \ne 0$. In order for the covariance of the limit to be defined, we need $E[\|Z\|^2] < \infty$, $E[|U|^{2p-2}] < \infty$ and $E[|U|^{p-2}] < \infty$. To have $|E[|U - Z'\theta|^p - |U|^p]| < \infty$, we need $E[\|Z\|^p] < \infty$. $V$ is a positive-definite matrix if either $p > 1$, or $p = 1$ and $m \ge 2$. It could also be positive definite for other values of $p$ and some distributions. For example, if $U$ has a symmetric distribution and $m \ge 2$, then
$$2^{-1}p(p-2)E[|U|^{p-4}|\theta' Z U|^2] + 2^{-1}p E[|U|^{p-2}|\theta' Z|^2] = 2^{-1}p(m^{-1}(p-2) + 1)E[|U|^{p-2}|\theta' Z|^2] > 0$$
for each $\theta \ne 0$. Observe that
$$\frac{\int_{|u|=1} (\theta' u)^2\, du}{\int_{|u|=1} du} = \frac{|\theta|^2 \int_{|u|=1} (u^{(1)})^2\, du}{\int_{|u|=1} du} = m^{-1}|\theta|^2,$$
where $u = (u^{(1)}, \dots, u^{(m)})$. It is interesting to notice that, under regularity conditions, the condition $E[|U|^{p-2}] < \infty$ is only satisfied if either $p > 1$ or $m \ge 2$. If $p \ge 2$,
the condition $E[|U|^{p-2}] < \infty$ is a moment condition. If $2 > p > 0$, then
$$E[|U|^{p-2}] = \int_0^\infty \Pr\{|U|^{p-2} \ge t\}\, dt = \int_0^\infty \Pr\{t^{-1/(2-p)} \ge |U|\}\, dt.$$
If $U$ has a positive density in a neighborhood of 0, then $\Pr\{|U| \le t\} = O(t^m)$ as $t \to 0+$. So, $E[|U|^{p-2}] < \infty$ if and only if $p + m > 2$.

The previous theorem covers the case where either $m \ge 2$ or $p > 1$. It remains to consider the case $m = 1$ and $1 \ge p > 0$. We begin with the case $m = p = 1$. This case has been considered by Bassett and Koenker (1978) and Bloomfield and Steiger (1983, p. 50). They assumed that $U$ has a density $f_U(u)$ in a neighborhood of 0. To obtain asymptotic normality of the median, a density in a neighborhood of the median is not needed. It suffices that the derivative of the distribution function at the median is positive (see, for example, Smirnov, 1949). The same happens in this regression case.

Theorem 2.10. Let $\hat\theta_n$ be a sequence of r.v.'s satisfying (1.3) with $p = 1$ and $m = 1$. Suppose that:
(i) $F_U(0) = 2^{-1}$, $F_U(u)$ is differentiable at $u = 0$ and $F_U'(0) > 0$, where $F_U(u) = \Pr\{U \le u\}$.
(ii) For each $\theta \ne 0$, $\Pr\{Z'\theta = 0\} < 1$.
(iii) $E[\|Z\|^2] < \infty$.
Then,
$$n^{1/2}(\hat\theta_n - \theta_0) - 2^{-1} n^{-1/2} V^{-1} \sum_{i=1}^n Z_i\, \mathrm{sign}(U_i) \xrightarrow{\Pr} 0,$$
where $V := F_U'(0) E[ZZ']$.
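For $d = 1$, the representation in Theorem 2.10 suggests a simple closed form for the asymptotic variance of $n^{1/2}(\hat\theta_n - \theta_0)$. The sketch below rests on a derivation that is not spelled out in the text: since $U$ and $Z$ are independent, $E[\mathrm{sign}(U)] = 0$ and $\mathrm{sign}(U)^2 = 1$ a.s. under condition (i), the variance of $2^{-1}V^{-1}Z\,\mathrm{sign}(U)$ is $1/(4 F_U'(0)^2 E[Z^2])$. The function name and the two error laws are illustrative assumptions.

```python
import math


def lad_asymptotic_variance(density_at_zero, second_moment_z):
    """Variance of the normal limit for d = m = 1, p = 1: 1 / (4 * F'(0)^2 * E[Z^2])."""
    return 1.0 / (4.0 * density_at_zero ** 2 * second_moment_z)


# Hypothetical error laws, both with median 0 and E[Z^2] = 1.
normal_f0 = 1.0 / math.sqrt(2.0 * math.pi)   # standard normal density at 0
laplace_f0 = 0.5                             # density exp(-|u|)/2 at 0

var_normal = lad_asymptotic_variance(normal_f0, 1.0)
var_laplace = lad_asymptotic_variance(laplace_f0, 1.0)
```

With standard normal errors the value is $\pi/2$, the familiar asymptotic variance of the sample median, which is consistent with reading the theorem as a regression analogue of the median result mentioned above.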
Proof. The result follows from Theorem 2.7 with $\psi(u) = \mathrm{sign}(u)$ and $V = F_U'(0)E[ZZ']$. All the conditions in Theorem 2.7 are easy to check, except condition (viii). Let $M(t) = E[|U - t| - |U| + t\, \mathrm{sign}(U)]$. We have that $M(t) = \int_0^t 2(F_U(v) - F_U(0))\, dv$ for $t > 0$, and $M(t) = \int_t^0 2(F_U(0) - F_U(v))\, dv$ for $t < 0$. This gives that $t^{-2}(M(t) - t^2 F_U'(0)) \to 0$ as $t \to 0$, and $\sup_{t \in \mathbb{R}} |t^{-2}(M(t) - t^2 F_U'(0))| < \infty$. From this and the dominated convergence theorem, we get that
$$\lim_{\theta \to 0} |\theta|^{-2} \big|E[|U - Z'\theta| - |U| + Z'\theta\, \mathrm{sign}(U) - (Z'\theta)^2 F_U'(0)]\big| = 0,$$
i.e. condition (viii) in Theorem 2.7 follows.

The next lemma deals with the second differentiability of the function $E[|Y - Z'\theta|^p - |Y - Z'\theta_0|^p]$, for $1 > p > 0$.
Lemma 2.11. Let $1 > p > 0$ and $m = 1$. Suppose that:
(i) $E[|U - a|^p - |U|^p] > 0$, for each $a \ne 0$.
(ii) For each $\theta \ne 0$, $\Pr\{Z'\theta = 0\} < 1$.
(iii) $E[\|Z\|^2] < \infty$.
(iv) There exists a $\delta > 0$ such that $U$ has a density $f_U(u)$ in $[-\delta, \delta]$.
(v) $\int_{-\delta}^{\delta} |u|^{p-2} |f_U(u) - f_U(0)|\, du < \infty$.
Then,
$$E[|U - Z'\theta|^p - |U|^p] - \theta' V \theta = o(|\theta|^2),$$
where
$$V := E[2^{-1}p(p-1)|U|^{p-2} I_{|U| \ge \delta}]\, E[ZZ'] + p\delta^{p-1} f_U(0)\, E[ZZ'] + \int_{-\delta}^{\delta} 2^{-1}p(p-1)|u|^{p-2}(f_U(u) - f_U(0))\, du\; E[ZZ']. \tag{2.3}$$
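The term $p\delta^{p-1} f_U(0) E[ZZ']$ in (2.3) reflects the second-order behaviour of $\int_{-\delta}^{\delta} (|u - a|^p - |u|^p)\, du$, which for $|a| \le \delta$ equals $(p+1)^{-1}((\delta + a)^{p+1} + (\delta - a)^{p+1} - 2\delta^{p+1})$ and is close to $p\delta^{p-1}a^2$ for small $a$. The following sketch checks both statements numerically with a plain midpoint rule; the function names, the parameter values and the tolerances are illustrative assumptions.

```python
def closed_form(a, delta, p):
    """(p+1)^{-1} * ((delta+a)^{p+1} + (delta-a)^{p+1} - 2*delta^{p+1}), valid for |a| <= delta."""
    return ((delta + a) ** (p + 1) + (delta - a) ** (p + 1) - 2.0 * delta ** (p + 1)) / (p + 1)


def midpoint_integral(a, delta, p, steps=20000):
    """Midpoint-rule approximation of the integral of |u-a|^p - |u|^p over [-delta, delta]."""
    h = 2.0 * delta / steps
    total = 0.0
    for k in range(steps):
        u = -delta + (k + 0.5) * h
        total += abs(u - a) ** p - abs(u) ** p
    return total * h


p, delta = 0.5, 1.0
numeric = midpoint_integral(0.3, delta, p)
exact = closed_form(0.3, delta, p)

# Second-order approximation p * delta^(p-1) * a^2 for a small shift a.
a_small = 0.01
ratio = closed_form(a_small, delta, p) / (p * delta ** (p - 1) * a_small ** 2)
```

The ratio is close to 1 for small shifts, which is the quadratic behaviour that produces the middle term of $V$.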
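Proof. Conditions (i) and (iv) imply that $E[\mathrm{sign}(U)|U|^{p-1}] = 0$. We have that
$$E[|U - Z'\theta|^p - |U|^p] - \theta' V \theta$$
$$= E\big[(|U - Z'\theta|^p - |U|^p + p\, \mathrm{sign}(U)|U|^{p-1} Z'\theta - 2^{-1}p(p-1)|U|^{p-2}(Z'\theta)^2)\, I_{|U| \ge \delta}\big]$$
$$+ \int_{-\delta}^{\delta} E\big[(|u - Z'\theta|^p - |u|^p - p\delta^{p-1}(Z'\theta)^2)\, I_{|\theta|\|Z\| \le \delta}\big] f_U(0)\, du$$
$$+ \int_{-\delta}^{\delta} E\big[(|u - Z'\theta|^p - |u|^p - p\delta^{p-1}(Z'\theta)^2)\, I_{|\theta|\|Z\| > \delta}\big] f_U(0)\, du$$
$$+ \int_{-\delta}^{\delta} E\big[|u - Z'\theta|^p - |u|^p + p\, \mathrm{sign}(u)|u|^{p-1} Z'\theta - 2^{-1}p(p-1)|u|^{p-2}(Z'\theta)^2\big](f_U(u) - f_U(0))\, du$$
$$=: \mathrm{I} + \mathrm{II} + \mathrm{III} + \mathrm{IV}.$$
By Lemma 2.8,
$$|\mathrm{I}| \le c E[(|U|^{p-3}|\theta|^3\|Z\|^3) \wedge (|U|^{p-2}|\theta|^2\|Z\|^2)] = o(|\theta|^2),$$
$$|\mathrm{III}| \le c\,\delta^{p-1} E[|\theta|^2\|Z\|^2 I_{|\theta|\|Z\| > \delta}] = o(|\theta|^2)$$
and
$$|\mathrm{IV}| \le c \int_{-\delta}^{\delta} E[(|u|^{p-3}|\theta|^3\|Z\|^3) \wedge (|u|^{p-2}|\theta|^2\|Z\|^2)]\, |f_U(u) - f_U(0)|\, du = o(|\theta|^2).$$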
It is easy to see that for $|a| \le \delta$,
$$\int_{-\delta}^{\delta} (|u - a|^p - |u|^p)\, du = (p+1)^{-1}((\delta + a)^{p+1} + (\delta - a)^{p+1} - 2\delta^{p+1}).$$
So,
$$|\mathrm{II}| \le c E\big[|(\delta + Z'\theta)^{p+1} + (\delta - Z'\theta)^{p+1} - 2\delta^{p+1} - p(p+1)\delta^{p-1}(Z'\theta)^2|\, I_{|\theta|\|Z\| \le \delta}\big].$$
By elementary calculus,
$$\lim_{a \to 0} a^{-2}((\delta + a)^{p+1} + (\delta - a)^{p+1} - 2\delta^{p+1} - p(p+1)\delta^{p-1}a^2) = 0.$$
From this and the dominated convergence theorem, $|\mathrm{II}| = o(|\theta|^2)$.

If $U$ has a density in $\mathbb{R}$,
$$V = E[ZZ'] \int_{-\infty}^{\infty} 2^{-1}p(1-p)|u|^{p-2}(f_U(0) - f_U(u))\, du.$$
Hence, if $E[(Z'\theta)^2] > 0$ for each $\theta \ne 0$, and $f_U(0) \ge f_U(u)$ for each $u \in \mathbb{R}$, $V$ is positive definite.

The following follows from Theorem 2.7 and Lemmas 2.8 and 2.11:

Theorem 2.12. Let $1 > p > 1/2$ and $m = 1$. Let $\hat\theta_n$ be any sequence of r.v.'s satisfying (1.3). Suppose that:
(i) For each $a \ne 0$, $E[|U + a|^p - |U|^p] > 0$.
(ii) For each $\theta \ne 0$, $\Pr\{Z'\theta = 0\} < 1$.
(iii) $E[\|Z\|^2] < \infty$.
(iv) There exists a $\delta > 0$ such that $U$ has a density $f_U(u)$ in $[-\delta, \delta]$.
(v) $\int_{-\delta}^{\delta} |u|^{p-2}|f_U(u) - f_U(0)|\, du < \infty$.
(vi) $E[2^{-1}p(p-1)|U|^{p-2} I_{|U| \ge \delta}] + p\delta^{p-1} f_U(0) + \int_{-\delta}^{\delta} 2^{-1}p(p-1)|u|^{p-2}(f_U(u) - f_U(0))\, du > 0$.
Then,
$$n^{1/2}(\hat\theta_n - \theta_0) - 2^{-1} n^{-1/2} V^{-1} \sum_{i=1}^n p\, \mathrm{sign}(U_i) Z_i |U_i|^{p-1} \xrightarrow{\Pr} 0,$$
where $V$ is as in (2.3).

In the case $0 < p < 1/2$, these estimators behave as the $L_p$ medians, which were considered in Arcones (1994). We will need the following theorem, which is an obvious variation of Theorem 1 in Arcones (1994).

Theorem 2.13. Let $\{X_i\}_{i=1}^\infty$ be a sequence of i.i.d. r.v.'s with values in $S$. Let $\Theta$ be a Borel subset of $\mathbb{R}^d$. Let $g: S \times \Theta \to \mathbb{R}$ be a function such that $g(\cdot, \theta): S \to \mathbb{R}$ is measurable for each $\theta \in \Theta$. Let $\{a_n\}$ and $\{b_n\}$ be two sequences of positive numbers converging to infinity. Let $\{\hat\theta_n = \hat\theta_n(X_1, \dots, X_n)\}$ be a sequence of random
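variables. Let $\alpha > 0$. Suppose that:
(i) $\hat\theta_n \xrightarrow{\Pr} \theta_0$ and
$$b_n n^{-1}\sum_{j=1}^n g(X_j, \hat\theta_n) \le \inf_{\theta \in \Theta} b_n n^{-1}\sum_{j=1}^n g(X_j, \theta) + o_{\Pr}(1).$$
(ii) There are $\lambda_0, \delta_0 > 0$ such that
$$\lambda_0 (a_n|\theta - \theta_0|)^\alpha \le b_n E[g(X, \theta) - g(X, \theta_0)]$$
for each $|\theta - \theta_0| \le \delta_0$.
(iii) There exists a stochastic process $\{Y(\theta): \theta \in \mathbb{R}^d\}$ such that
$$\Big\{b_n n^{-1}\sum_{j=1}^n (g(X_j, \theta_0 + a_n^{-1}\theta) - g(X_j, \theta_0)): |\theta| \le M\Big\}$$
converges weakly to $\{Y(\theta): |\theta| \le M\}$, for each $M < \infty$.
(iv) There exists a $\delta_1 > 0$ such that
$$\lim_{M \to \infty} \limsup_{n \to \infty} \Pr\Big\{\sup_{|\theta - \theta_0| \le \delta_1} \frac{|b_n n^{-1}\sum_{j=1}^n (g(X_j, \theta) - g(X_j, \theta_0) - E[g(X_j, \theta) - g(X_j, \theta_0)])|}{2^{-1}\lambda_0 |a_n(\theta - \theta_0)|^\alpha + M} \ge 1\Big\} = 0.$$
(v) With probability one, the stochastic process $\{Y(\theta): \theta \in \mathbb{R}^d\}$ has a unique minimum at $\tilde\theta$ and, for each $0 < \delta, M < \infty$ with $|\tilde\theta| \le M$,
$$\inf_{|\theta| \le M,\ |\theta - \tilde\theta| > \delta} Y(\theta) > Y(\tilde\theta).$$
Then,
$$a_n(\hat\theta_n - \theta_0) \xrightarrow{d} \tilde\theta.$$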
To check condition (iii) in Theorem 2.13, we will use different results depending on the form of the limit process. When the limit is normal, we will use the following theorem:

Theorem 2.14. Under the notation in Theorem 2.13, suppose that:
(i) For some $\delta > 0$, $\{g(x, \theta) - g(x, \theta_0): |\theta - \theta_0| \le \delta\}$ is a VC subgraph class of functions.
(ii) For each $\tau > 0$,
$$n \Pr\Big\{b_n n^{-1/2}\sup_{|t| \le M} |g(X, \theta_0 + a_n^{-1}t) - g(X, \theta_0)| \ge \tau\Big\} \to 0.$$
(iii) For each $s, t \in \mathbb{R}^d$, the following limit exists:
$$\lim_{n \to \infty} b_n^2\, \mathrm{Cov}\big((g(X, \theta_0 + a_n^{-1}s) - g(X, \theta_0))\, I_{b_n n^{-1/2}|g(X, \theta_0 + a_n^{-1}s) - g(X, \theta_0)| \le 1},\ (g(X, \theta_0 + a_n^{-1}t) - g(X, \theta_0))\, I_{b_n n^{-1/2}|g(X, \theta_0 + a_n^{-1}t) - g(X, \theta_0)| \le 1}\big).$$
(iv)
$$\limsup_{n \to \infty} b_n^2 E\Big[\sup_{|t| \le M} |g(X, \theta_0 + a_n^{-1}t) - g(X, \theta_0)|^2\, I_{b_n n^{-1/2}\sup_{|t| \le M}|g(X, \theta_0 + a_n^{-1}t) - g(X, \theta_0)| \le 1}\Big] < \infty.$$
(v)
$$\lim_{\delta \to 0} \limsup_{n \to \infty} \sup_{|s|, |t| \le M,\ |s - t| \le \delta} b_n^2 E\big[|g(X, \theta_0 + a_n^{-1}s) - g(X, \theta_0 + a_n^{-1}t)|^2\, I_{b_n n^{-1/2}|g(X, \theta_0 + a_n^{-1}s) - g(X, \theta_0 + a_n^{-1}t)| \le 1}\big] = 0.$$
(vi)
$$\sup_{|t| \le M} b_n n^{1/2} \big|E\big[(g(X, \theta_0 + t a_n^{-1}) - g(X, \theta_0))\, I_{b_n n^{-1/2}|g(X, \theta_0 + a_n^{-1}t) - g(X, \theta_0)| \ge 1}\big]\big| \to 0.$$
Then,
$$\Big\{b_n n^{-1/2}\sum_{i=1}^n \big(g(X_i, \theta_0 + t a_n^{-1}) - g(X_i, \theta_0) - E[g(X, \theta_0 + t a_n^{-1}) - g(X, \theta_0)]\big): |t| \le M\Big\}$$
converges weakly to the process $\{G(t): |t| \le M\}$ with mean zero and covariance given by
$$E[G(s)G(t)] = \lim_{n \to \infty} b_n^2\, \mathrm{Cov}\big((g(X, \theta_0 + a_n^{-1}s) - g(X, \theta_0))\, I_{b_n n^{-1/2}|g(X, \theta_0 + a_n^{-1}s) - g(X, \theta_0)| \le 1},\ (g(X, \theta_0 + a_n^{-1}t) - g(X, \theta_0))\, I_{b_n n^{-1/2}|g(X, \theta_0 + a_n^{-1}t) - g(X, \theta_0)| \le 1}\big).$$

The last theorem follows from Theorem 2.3 in Arcones (1999). Similar results, but slightly stronger, are in Alexander (1987) and Pollard (1990, Theorem 10.6). When the limit is infinitely divisible, without a Gaussian part, we will use the following:

Theorem 2.15 (Arcones, 1999, Theorem 2.5). Under the notation in Theorem 2.13, let $c_n(t)$ be a real number for each $|t| \le M$ and each $n \ge 1$. Suppose that:
(i) The finite-dimensional distributions of $\{Z_n(t): |t| \le M\}$ converge to those of an infinitely divisible process $\{Z(t): |t| \le M\}$ without Gaussian part, where
$$Z_n(t) := b_n n^{-1/2}\sum_{i=1}^n (g(X_i, \theta_0 + t a_n^{-1}) - g(X_i, \theta_0)) - c_n(t).$$
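(ii) $\{g(x, \theta) - g(x, \theta_0): |\theta - \theta_0| \le \delta\}$ is a VC subgraph class of functions for some $\delta > 0$.
(iii) For each $\tau > 0$,
$$\lim_{\delta \to 0} \limsup_{n \to \infty} n \Pr\Big\{\sup_{|s|, |t| \le M,\ |s - t| \le \delta} |g(X, \theta_0 + s a_n^{-1}) - g(X, \theta_0 + t a_n^{-1})| \ge \tau\Big\} = 0.$$
(iv)
$$\lim_{\delta \to 0} \limsup_{n \to \infty} b_n^2 E\Big[\sup_{|t| \le M} |g(X, \theta_0 + t a_n^{-1}) - g(X, \theta_0)|^2\, I_{b_n n^{-1/2}\sup_{|t| \le M}|g(X, \theta_0 + t a_n^{-1}) - g(X, \theta_0)| \le \delta}\Big] = 0.$$
(v)
$$\lim_{\delta \to 0} \limsup_{n \to \infty} \sup_{|s|, |t| \le M,\ |s - t| \le \delta} \big|b_n n^{1/2} E\big[(g(X, \theta_0 + s a_n^{-1}) - g(X, \theta_0 + t a_n^{-1}))\, I_{b_n n^{-1/2}\sup_{|t| \le M}|g(X, \theta_0 + t a_n^{-1}) - g(X, \theta_0)| \le 1}\big] - (c_n(s) - c_n(t))\big| = 0.$$
Then, $\{Z_n(t): |t| \le M\}$ converges weakly to $\{Z(t): |t| \le M\}$.

The next lemma deals with condition (iv) in Theorem 2.13. It is an obvious variation of Lemma 3 in Arcones (2000).

Lemma 2.16. With the notation of Theorem 2.13, let $\alpha > 0$. Suppose that:
(i) The class of functions $\{g(x, \theta) - g(x, \theta_0): |\theta - \theta_0| \le \delta_0\}$ is a VC subgraph class of functions for some $\delta_0 > 0$.
(ii) For each $M > 0$,
$$\sup_{|\theta| \le M} a_n^\alpha |(P_n - P)(g(\cdot, \theta_0 + a_n^{-1}\theta) - g(\cdot, \theta_0))| = O_{\Pr}(1).$$
(iii) There are constants $c, \beta, \gamma > 0$ such that $E[G_\delta^2(X)] \le c\,\delta^\beta (\log \delta^{-1})^\gamma$ for each $\delta > 0$ small enough.
(iv) $(\log a_n)^\gamma a_n^{2\alpha - \beta} = O(n)$.
Then, for each $\lambda > 0$, there exists a $\delta > 0$ such that
$$\lim_{M \to \infty} \limsup_{n \to \infty} \Pr\Big\{\sup_{|\theta - \theta_0| \le \delta} \frac{a_n^\alpha |(P_n - P)(g(\cdot, \theta) - g(\cdot, \theta_0))|}{\lambda a_n^\alpha |\theta - \theta_0|^\alpha + M} \ge 1\Big\} = 0.$$

Sometimes, we will use the following variation, which does not require a finite second moment of $G_\delta(X)$.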
Lemma 2.17. With the notation of Theorem 2.13, let $\alpha > 0$. Suppose that:
(i) The class of functions $\{g(x, \theta) - g(x, \theta_0): |\theta - \theta_0| \le \delta_0\}$ is a VC subgraph class of functions for some $\delta_0 > 0$.
(ii) For each $M > 0$,
$$\sup_{|\theta| \le M} a_n^\alpha |(P_n - P)(g(\cdot, \theta_0 + a_n^{-1}\theta) - g(\cdot, \theta_0))| = O_{\Pr}(1).$$
(iii) There are constants $0 < q < 1$ and $c > 0$ such that
$$E[(M^{-1}G_\delta(X)) \wedge (M^{-2}G_\delta^2(X))] \le c\,\delta^\alpha M^{-1-q},$$
for each $\delta > 0$ small enough and each $M > 0$ large enough.
(iv) $a_n = O(n^{1/\alpha})$.
Then, for each $\lambda > 0$, there exists a $\delta > 0$ such that
$$\lim_{M \to \infty} \limsup_{n \to \infty} \Pr\Big\{\sup_{|\theta - \theta_0| \le \delta} \frac{a_n^\alpha |(P_n - P)(g(\cdot, \theta) - g(\cdot, \theta_0))|}{\lambda a_n^\alpha |\theta - \theta_0|^\alpha + M} \ge 1\Big\} = 0.$$

Next, we consider the asymptotics of M-estimators for $0 < p < 1/2$.

Theorem 2.18. Let $1/2 > p > 0$ and $m = 1$. Let $\hat\theta_n$ be any sequence of r.v.'s satisfying (1.3). Suppose that:
(i) For each $a \ne 0$, $E[|U - a|^p - |U|^p] > 0$.
(ii) For each $\theta \ne 0$, $\Pr\{Z'\theta = 0\} < 1$.
(iii) $E[\|Z\|^2] < \infty$.
(iv) There exists a $\delta > 0$ such that $U$ has a density $f_U(u)$ in $[-\delta, \delta]$.
(v) $\int_{-\delta}^{\delta} |u|^{p-2}|f_U(u) - f_U(0)|\, du < \infty$.
(vi) $E[2^{-1}p(p-1)|U|^{p-2} I_{|U| \ge \delta}] + p\delta^{p-1} f_U(0) + \int_{-\delta}^{\delta} 2^{-1}p(p-1)|u|^{p-2}(f_U(u) - f_U(0))\, du > 0$.
Then, $n^{1/(3-2p)}(\hat\theta_n - \theta_0)$ converges in distribution to the argument that minimizes the process
$$\{Z(t) + t'Vt: t \in \mathbb{R}^d\},$$
where $V$ is as in (2.3) and $\{Z(t): t \in \mathbb{R}^d\}$ is the Gaussian process with mean zero and covariance given by
$$E[Z(t)Z(s)] := 2^{-1}E[|Z't|^{2p+1} + |Z's|^{2p+1} - |Z't - Z's|^{2p+1}]\, f_U(0) \int_{-\infty}^{\infty} (|u - 1|^p - |u|^p)^2\, du.$$
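The rate in Theorem 2.18 matches the heuristic from the introduction, $a_n = n^{1/(4-q)}$ with $q = 2p + 1$, and the covariance has the shape $2^{-1}(|x|^{2p+1} + |y|^{2p+1} - |x - y|^{2p+1})$ in the scalars $x = Z't$, $y = Z's$. This is the covariance structure of a fractional Brownian motion with Hurst index $p + 1/2$, a remark added here for orientation rather than taken from the text. The sketch below just evaluates these two formulas; the function names are hypothetical.

```python
def rate_exponent(p):
    """Exponent r with a_n = n^r for m = 1 and 0 < p < 1/2, via a_n = n^{1/(4-q)}, q = 2p+1."""
    if not 0.0 < p < 0.5:
        raise ValueError("this regime requires 0 < p < 1/2")
    q = 2.0 * p + 1.0
    return 1.0 / (4.0 - q)


def covariance_shape(x, y, p):
    """0.5 * (|x|^{2p+1} + |y|^{2p+1} - |x-y|^{2p+1}) for scalars x = Z't and y = Z's."""
    e = 2.0 * p + 1.0
    return 0.5 * (abs(x) ** e + abs(y) ** e - abs(x - y) ** e)


r = rate_exponent(0.25)               # equals 1/(3 - 2p) at p = 1/4
k_same = covariance_shape(1.0, 1.0, 0.25)
k_zero = covariance_shape(2.0, 0.0, 0.25)
```

Since $3 - 2p > 2$ for $p < 1/2$, the exponent is always below $1/2$, which is the slower-than-usual rate discussed in the introduction.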
Proof. We apply Theorem 2.13 with $a_n = n^{1/(3-2p)}$, $b_n = a_n^2$ and $\alpha = 2$. Condition (i) in Theorem 2.13 follows from Proposition 2.6. Condition (ii) follows from Lemma 2.11.
To check condition (iii), we have to prove that
\[
\Bigl\{a_n^2 n^{-1}\sum_{j=1}^{n}\bigl(|U_j-a_n^{-1}Z_j't|^p-|U_j|^p-E[|U_j-a_n^{-1}Z_j't|^p-|U_j|^p]\bigr): |t|\le M\Bigr\}
\]
converges weakly to $\{Z(t): |t|\le M\}$, for each $M<\infty$. To prove this, we apply Theorem 2.14. Condition (i) in Theorem 2.14 follows from Lemma 2.5. As to condition (ii) in Theorem 2.14,
\[
\begin{aligned}
&n\Pr\Bigl\{\sup_{|t|\le M} n^{(2p-1)/(3-2p)}\bigl||U-n^{-1/(3-2p)}Z't|^p-|U|^p\bigr|\ge\tau\Bigr\}\\
&\quad\le n\Pr\{|U|^{p-1}\|Z\|\ge c\tau n^{(2-2p)/(3-2p)},\ |U|\ge n^{-1/(3-2p)}\|Z\|\}\\
&\qquad+n\Pr\{\|Z\|^p\ge c\tau n^{(1-p)/(3-2p)},\ |U|<n^{-1/(3-2p)}\|Z\|\}.
\end{aligned}
\]
We have that
\[
\begin{aligned}
&n\Pr\{|U|^{p-1}\|Z\|\ge c\tau n^{(2-2p)/(3-2p)},\ |U|\ge n^{-1/(3-2p)}\|Z\|\}\\
&\quad\le n\Pr\{\|Z\|^p\ge c\tau n^{(1-p)/(3-2p)}\}
\le cE\bigl[\|Z\|^{p(3-2p)/(1-p)}I_{\|Z\|^p\ge c\tau n^{(1-p)/(3-2p)}}\bigr]\to 0,
\end{aligned}
\]
because $0<p(3-2p)/(1-p)<2$. Using that $\Pr\{|U|\le t\}\le ct$,
\[
\begin{aligned}
&n\Pr\{\|Z\|^p\ge c\tau n^{(1-p)/(3-2p)},\ |U|<n^{-1/(3-2p)}\|Z\|\}\\
&\quad\le n^{(2-2p)/(3-2p)}E\bigl[\|Z\|I_{\|Z\|^p\ge c\tau n^{(1-p)/(3-2p)}}\bigr]
\le cE\bigl[\|Z\|^{2p+1}I_{\|Z\|^p\ge c\tau n^{(1-p)/(3-2p)}}\bigr]\to 0.
\end{aligned}
\tag{2.4}
\]
Therefore, condition (ii) in Theorem 2.14 follows. To check condition (iii), we need that
\[
\lim_{\lambda\to 0+}\lambda^{-(2p+1)}E[(|U-\lambda Z's|^p-|U|^p)(|U-\lambda Z't|^p-|U|^p)]=E[Z(s)Z(t)]
\tag{2.5}
\]
for each $s,t\in\mathbb{R}^d$, and
\[
n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}Z't|^p-|U|^p)^2\,I_{||U-n^{-1/(3-2p)}Z't|^p-|U|^p|\ge n^{(1-2p)/(3-2p)}}\bigr]\to 0
\tag{2.6}
\]
for each $t\in\mathbb{R}^d$. We have that
\[
\begin{aligned}
&\lambda^{-(2p+1)}E[(|U-\lambda Z's|^p-|U|^p)(|U-\lambda Z't|^p-|U|^p)]\\
&\quad=\lambda^{-(2p+1)}\int_{-\delta}^{\delta}E[(|u-\lambda Z's|^p-|u|^p)(|u-\lambda Z't|^p-|u|^p)]\,f_U(0)\,du\\
&\qquad+\lambda^{-(2p+1)}\int_{-\delta}^{\delta}E[(|u-\lambda Z's|^p-|u|^p)(|u-\lambda Z't|^p-|u|^p)]\,(f_U(u)-f_U(0))\,du\\
&\qquad+\lambda^{-(2p+1)}E[(|U-\lambda Z's|^p-|U|^p)(|U-\lambda Z't|^p-|U|^p)I_{|U|\ge\delta}].
\end{aligned}
\tag{2.7}
\]
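By the change of variable $u=\lambda v$,
\[
\begin{aligned}
&\lambda^{-(2p+1)}\int_{-\delta}^{\delta}E[(|u-\lambda Z's|^p-|u|^p)(|u-\lambda Z't|^p-|u|^p)]\,f_U(0)\,du\\
&\quad=\int_{-\lambda^{-1}\delta}^{\lambda^{-1}\delta}E[(|u-Z's|^p-|u|^p)(|u-Z't|^p-|u|^p)]\,f_U(0)\,du\\
&\quad\to\int_{-\infty}^{\infty}E[(|u-Z's|^p-|u|^p)(|u-Z't|^p-|u|^p)]\,f_U(0)\,du.
\end{aligned}
\tag{2.8}
\]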
We have that
\[
\begin{aligned}
&\int_{-\infty}^{\infty}(|u-a|^p-|u|^p)(|u-b|^p-|u|^p)\,du\\
&\quad=2^{-1}\int_{-\infty}^{\infty}\bigl((|u-a|^p-|u|^p)^2+(|u-b|^p-|u|^p)^2-(|u-b|^p-|u-a|^p)^2\bigr)\,du.
\end{aligned}
\]
By the change of variable $u=a+(b-a)v$,
\[
\int_{-\infty}^{\infty}(|u-a|^p-|u-b|^p)^2\,du=|b-a|^{2p+1}\int_{-\infty}^{\infty}(|u-1|^p-|u|^p)^2\,du.
\]
So,
\[
\int_{-\infty}^{\infty}(|u-a|^p-|u|^p)(|u-b|^p-|u|^p)\,du
=2^{-1}\bigl(|a|^{2p+1}+|b|^{2p+1}-|b-a|^{2p+1}\bigr)\int_{-\infty}^{\infty}(|u-1|^p-|u|^p)^2\,du
\]
and
\[
\int_{-\infty}^{\infty}E[(|u-Z's|^p-|u|^p)(|u-Z't|^p-|u|^p)]\,f_U(0)\,du=E[Z(s)Z(t)].
\tag{2.9}
\]
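The two steps behind (2.9) are identities that already hold for the integrands, before integrating. The following sketch checks them pointwise on a grid: the polarization identity $xy = 2^{-1}(x^2+y^2-(x-y)^2)$ applied to the two increments, and the effect of the substitution $u = a + (b-a)v$, which produces the factor $|b-a|^{2p}$ (the remaining factor $|b-a|$ comes from $du = |b-a|\,dv$). The function names are hypothetical, and the code does not evaluate the improper integrals themselves.

```python
def incr(u: float, a: float, p: float) -> float:
    """Increment |u - a|^p - |u|^p appearing in the integrand."""
    return abs(u - a) ** p - abs(u) ** p


def polarization_gap(u: float, a: float, b: float, p: float) -> float:
    """Difference between x*y and 0.5*(x^2 + y^2 - (x - y)^2); should be ~0."""
    x, y = incr(u, a, p), incr(u, b, p)
    diff = abs(u - b) ** p - abs(u - a) ** p  # equals y - x
    return x * y - 0.5 * (x * x + y * y - diff * diff)


def scaling_gap(v: float, a: float, b: float, p: float) -> float:
    """Integrand before and after substituting u = a + (b - a) v; should be ~0."""
    u = a + (b - a) * v
    lhs = (abs(u - a) ** p - abs(u - b) ** p) ** 2
    rhs = abs(b - a) ** (2 * p) * (abs(v - 1) ** p - abs(v) ** p) ** 2
    return lhs - rhs


if __name__ == "__main__":
    grid = [k / 10.0 for k in range(-50, 51)]
    print(max(abs(polarization_gap(u, 0.7, -1.3, 0.25)) for u in grid))
    print(max(abs(scaling_gap(v, 0.7, -1.3, 0.25)) for v in grid))
```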
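By the change of variable $u=\lambda v$,
\[
\begin{aligned}
&\lambda^{-(2p+1)}\Bigl|\int_{-\delta}^{\delta}E[(|u-\lambda s'Z|^p-|u|^p)(|u-\lambda t'Z|^p-|u|^p)]\,(f_U(u)-f_U(0))\,du\Bigr|\\
&\quad\le\int_{-\lambda^{-1}\delta}^{\lambda^{-1}\delta}E\bigl[\bigl||u-Z's|^p-|u|^p\bigr|\,\bigl||u-Z't|^p-|u|^p\bigr|\bigr]\,|f_U(\lambda u)-f_U(0)|\,du\to 0.
\end{aligned}
\tag{2.10}
\]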
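By Lemma 2.8,
\[
\begin{aligned}
&\lambda^{-(2p+1)}\bigl|E[(|U-\lambda s'Z|^p-|U|^p)(|U-\lambda t'Z|^p-|U|^p)I_{|U|\ge\delta}]\bigr|\\
&\quad\le 2\lambda^{-(2p+1)}E[(|U-\lambda Z's|^p-|U|^p)^2I_{|U|\ge\delta}]
+2\lambda^{-(2p+1)}E[(|U-\lambda Z't|^p-|U|^p)^2I_{|U|\ge\delta}]\\
&\quad\le c\lambda^{1-2p}E[\|Z\|^2|U|^{2p-2}I_{|U|\ge\delta}]\to 0,
\end{aligned}
\tag{2.11}
\]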
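as $\lambda\to 0+$. (2.5) follows from (2.7)–(2.11). As to (2.6), by Lemma 2.8,
\[
\begin{aligned}
&n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}Z't|^p-|U|^p)^2\,I_{||U-n^{-1/(3-2p)}Z't|^p-|U|^p|\ge n^{(1-2p)/(3-2p)}}\bigr]\\
&\quad\le n^{(2p-1)/(3-2p)}E\bigl[|U|^{2p-2}\|Z\|^2I_{|U|\ge cn^{-1/(3-2p)}\|Z\|,\ c|U|^{p-1}\|Z\|\ge n^{(1-2p)/(3-2p)}}\bigr]\\
&\qquad+n^{1/(3-2p)}E\bigl[\|Z\|^{2p}I_{|U|\le cn^{-1/(3-2p)}\|Z\|,\ \|Z\|^p\ge cn^{(1-p)/(3-2p)}}\bigr].
\end{aligned}
\]
Using that $E[|U|^{2p-2}I_{|U|\ge a}]\le ca^{2p-1}$,
\[
\begin{aligned}
&n^{(2p-1)/(3-2p)}E\bigl[|U|^{2p-2}\|Z\|^2I_{|U|\ge cn^{-1/(3-2p)}\|Z\|,\ c|U|^{p-1}\|Z\|\ge n^{(1-2p)/(3-2p)}}\bigr]\\
&\quad\le E\bigl[\|Z\|^{2p+1}I_{\|Z\|\ge cn^{1/(3-2p)}}\bigr]\to 0.
\end{aligned}
\tag{2.12}
\]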
By (2.4),
\[
n^{1/(3-2p)}E\bigl[\|Z\|^{2p}I_{|U|\le cn^{-1/(3-2p)}\|Z\|,\ \|Z\|^p\ge cn^{(1-p)/(3-2p)}}\bigr]
\le cE\bigl[\|Z\|^{2p+1}I_{\|Z\|^p\ge cn^{(1-p)/(3-2p)}}\bigr]\to 0.
\]
So, (2.6) follows. As to condition (iv) in Theorem 2.14,
\[
\begin{aligned}
&n^{(2p+1)/(3-2p)}E\Bigl[\sup_{|t|\le M}(|U-n^{-1/(3-2p)}Z't|^p-|U|^p)^2\,I_{\sup_{|t|\le M}||U-n^{-1/(3-2p)}Z't|^p-|U|^p|\le n^{(1-2p)/(3-2p)}}\Bigr]\\
&\quad\le cn^{(2p+1)/(3-2p)}E\bigl[|U|^{2p-2}n^{-2/(3-2p)}\|Z\|^2I_{n^{-1/(3-2p)}\|Z\|\le|U|}\bigr]\\
&\qquad+cn^{(2p+1)/(3-2p)}E\bigl[n^{-2p/(3-2p)}\|Z\|^{2p}I_{|U|\le n^{-1/(3-2p)}\|Z\|}\bigr]
\le cE[\|Z\|^{2p+1}].
\end{aligned}
\]
As to condition (v) in Theorem 2.14, for $|s|,|t|\le M$, $|s-t|\le\delta$,
\[
\begin{aligned}
&n^{(2p+1)/(3-2p)}E[(|U-n^{-1/(3-2p)}s'Z|^p-|U-n^{-1/(3-2p)}t'Z|^p)^2]\\
&\quad\le n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}s'Z|^p-|U-n^{-1/(3-2p)}t'Z|^p)^2I_{2Mn^{-1/(3-2p)}\|Z\|\le|U|}\bigr]\\
&\qquad+n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}s'Z|^p-|U-n^{-1/(3-2p)}t'Z|^p)^2I_{2Mn^{-1/(3-2p)}\|Z\|\ge|U|}\bigr].
\end{aligned}
\]
By Lemma 2.8 and (2.12),
\[
\begin{aligned}
&n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}s'Z|^p-|U-n^{-1/(3-2p)}t'Z|^p)^2I_{2Mn^{-1/(3-2p)}\|Z\|\le|U|}\bigr]\\
&\quad\le cn^{(2p-1)/(3-2p)}E\bigl[|s-t|^2\|Z\|^2|U-n^{-1/(3-2p)}s'Z|^{2p-2}I_{2Mn^{-1/(3-2p)}\|Z\|\le|U|}\bigr]\\
&\quad\le cn^{(2p-1)/(3-2p)}E\bigl[|s-t|^2\|Z\|^2|U|^{2p-2}I_{2Mn^{-1/(3-2p)}\|Z\|\le|U|}\bigr]\\
&\quad\le cE\bigl[|s-t|^2\|Z\|^{2p+1}M^{2p-1}\bigr].
\end{aligned}
\]
By Lemma 2.8 and (2.4),
\[
\begin{aligned}
&n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}s'Z|^p-|U-n^{-1/(3-2p)}t'Z|^p)^2I_{2Mn^{-1/(3-2p)}\|Z\|\ge|U|}\bigr]\\
&\quad\le cn^{1/(3-2p)}E\bigl[|s-t|^{2p}\|Z\|^{2p}I_{2Mn^{-1/(3-2p)}\|Z\|\ge|U|}\bigr]
\le cME\bigl[|s-t|^{2p}\|Z\|^{2p+1}\bigr].
\end{aligned}
\]
So, condition (v) holds. As to condition (vi) in Theorem 2.14, by Lemma 2.8 and (2.4),
\[
\begin{aligned}
&n^{2/(3-2p)}E\Bigl[\sup_{|\theta|\le M}\bigl||U-n^{-1/(3-2p)}Z'\theta|^p-|U|^p\bigr|\,I_{n^{(2p-1)/(3-2p)}\sup_{|\theta|\le M}||U-n^{-1/(3-2p)}Z'\theta|^p-|U|^p|\ge 1}\Bigr]\\
&\quad\le cn^{(2-p)/(3-2p)}E\bigl[c\|Z\|^pI_{cn^{(p-1)/(3-2p)}\|Z\|^p\ge 1,\ n^{-1/(3-2p)}\|Z\|\ge|U|}\bigr]\\
&\qquad+cn^{1/(3-2p)}E\bigl[|U|^{p-1}\|Z\|I_{cn^{(2p-2)/(3-2p)}|U|^{p-1}\|Z\|\ge 1,\ n^{-1/(3-2p)}\|Z\|<|U|}\bigr]\\
&\quad\le cn^{(1-p)/(3-2p)}E\bigl[c\|Z\|^{p+1}I_{cn^{(p-1)/(3-2p)}\|Z\|^p\ge 1}\bigr]\to 0
\end{aligned}
\]
($E[|U|^{p-1}I_{|U|\ge a}]\le ca^p$). Therefore, the conditions in Theorem 2.14 hold. To check condition (iv) in Theorem 2.13, by Lemma 2.17, we need that
\[
E\Bigl[\sup_{|\theta|\le\delta}\bigl||U-Z'\theta|^p-|U|^p\bigr|^2\Bigr]=O(\delta^{2p+1}).
\]
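By Lemma 2.8, (2.4) and (2.11),
\[
\begin{aligned}
E\Bigl[\sup_{|\theta|\le\delta}\bigl||U-Z'\theta|^p-|U|^p\bigr|^2\Bigr]
&\le cE\bigl[|U|^{2p-2}\|Z\|^2\delta^2\wedge\|Z\|^{2p}\delta^{2p}\bigr]\\
&\le cE\bigl[|U|^{2p-2}\|Z\|^2\delta^2I_{\|Z\|\delta\le|U|}\bigr]+cE\bigl[\|Z\|^{2p}\delta^{2p}I_{\|Z\|\delta\ge|U|}\bigr]\\
&\le cE\bigl[\|Z\|^{2p+1}\delta^{2p+1}\bigr]\le c\delta^{2p+1}.
\end{aligned}
\]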
By Lemma 5 in Arcones (1994), with probability one, $\{Z(t)+t'Vt: t\in\mathbb{R}^d\}$ attains its minimum at a unique point.

In the case $p=\tfrac12$ and $m=1$, we have the following:
Theorem 2.19. Let $\hat\theta_n$ be any sequence of r.v.'s satisfying (1.3) with $m=1$. Suppose that:
(i) For each $a\ne 0$, $E[|U-a|^{1/2}-|U|^{1/2}]>0$.
(ii) For each $\theta\ne 0$, $\Pr\{Z'\theta=0\}<1$.
(iii) $E[\|Z\|^2(\log\|Z\|^{-1})]<\infty$.
(iv) There exists a $\delta>0$ such that $U$ has a density $f_U(u)$ in $[-\delta,\delta]$.
(v) $\int_{-\delta}^{\delta}|u|^{-3/2}|f_U(u)-f_U(0)|\,du<\infty$.
(vi) $f_U(0)>0$.
(vii) $-E[2^{-3}|U|^{-3/2}I_{|U|\ge\delta}]+2^{-1}\delta^{-1/2}f_U(0)-\int_{-\delta}^{\delta}2^{-3}|u|^{-3/2}(f_U(u)-f_U(0))\,du>0$.
Then, $n^{1/2}(2\log n)^{-1/2}(\hat\theta_n-\theta_0)$ converges in distribution to $2^{-1}V^{-1}\xi$, where $V$ is as in (2.3) and $\xi$ is a $d$-dimensional normal random vector with mean zero and covariance $E[\xi\xi']=2^{-2}f_U(0)E[ZZ']$.

Proof. We apply Theorem 2.13 with $a_n=(2\log n)^{-1/2}n^{1/2}$ and $b_n=(2\log n)^{-1}n$. Condition (i) in Theorem 2.13 follows from Proposition 2.6. Condition (ii) follows from Lemma 2.11. To get condition (iii), we need that
\[
\Bigl\{(2\log n)^{-1}\sum_{j=1}^{n}\bigl(|U_j-(2\log n)^{1/2}n^{-1/2}Z_j't|^{1/2}-|U_j|^{1/2}
-E[|U_j-(2\log n)^{1/2}n^{-1/2}t'Z_j|^{1/2}-|U_j|^{1/2}]\bigr): |t|\le M\Bigr\}
\]
converges weakly to $\{Z(t):=t'\xi: |t|\le M\}$, for each $M<\infty$. We apply Theorem 2.14. Condition (i) in Theorem 2.14 follows from Lemma 2.5. As to condition (ii),
\[
n\Pr\Bigl\{\sup_{|t|\le M}\bigl||U-(2\log n)^{1/2}n^{-1/2}t'Z|^{1/2}-|U|^{1/2}\bigr|\ge\tau\log n\Bigr\}
\le n\Pr\{cM\|Z\|\ge\tau^2n^{1/2}(\log n)^{3/2}\}\to 0.
\]
To check condition (iii), we need that for each $s,t$,
\[
\begin{aligned}
\lim_{n\to\infty}n(2\log n)^{-2}E\bigl[&(|U-(2\log n)^{1/2}n^{-1/2}t'Z|^{1/2}-|U|^{1/2})\,I_{(2\log n)^{-1}||U-(2\log n)^{1/2}n^{-1/2}t'Z|^{1/2}-|U|^{1/2}|\le 1}\\
&\times(|U-(2\log n)^{1/2}n^{-1/2}s'Z|^{1/2}-|U|^{1/2})\,I_{(2\log n)^{-1}||U-(2\log n)^{1/2}n^{-1/2}s'Z|^{1/2}-|U|^{1/2}|\le 1}\bigr]=E[s'\xi\xi't].
\end{aligned}
\]
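We claim that
\[
\lim_{\lambda\to 0+}\lambda^{-2}(2\log\lambda^{-1})^{-1}E[(|U-\lambda s'Z|^{1/2}-|U|^{1/2})(|U-\lambda t'Z|^{1/2}-|U|^{1/2})]=E[s'\xi\xi't]
\tag{2.13}
\]
for each $s,t\in\mathbb{R}^d$. We have that
\[
\begin{aligned}
&\lambda^{-2}(2\log\lambda^{-1})^{-1}E[(|U-\lambda s'Z|^{1/2}-|U|^{1/2})(|U-\lambda t'Z|^{1/2}-|U|^{1/2})]\\
&\quad=\lambda^{-2}(2\log\lambda^{-1})^{-1}\int_{-\delta}^{\delta}E[(|u-\lambda Z's|^{1/2}-|u|^{1/2})(|u-\lambda Z't|^{1/2}-|u|^{1/2})]\,f_U(0)\,du\\
&\qquad+\lambda^{-2}(2\log\lambda^{-1})^{-1}\int_{-\delta}^{\delta}E[(|u-\lambda Z's|^{1/2}-|u|^{1/2})(|u-\lambda Z't|^{1/2}-|u|^{1/2})]\,(f_U(u)-f_U(0))\,du\\
&\qquad+\lambda^{-2}(2\log\lambda^{-1})^{-1}E[(|U-\lambda Z's|^{1/2}-|U|^{1/2})(|U-\lambda Z't|^{1/2}-|U|^{1/2})I_{|U|\ge\delta}].
\end{aligned}
\]
Let $N$ be a constant. By a change of variables,
\[
\begin{aligned}
&\lambda^{-2}(2\log\lambda^{-1})^{-1}\int_{-\delta}^{\delta}E[(|u-\lambda Z's|^{1/2}-|u|^{1/2})(|u-\lambda t'Z|^{1/2}-|u|^{1/2})]\,f_U(0)\,du\\
&\quad=(2\log\lambda^{-1})^{-1}\int_{-\delta\lambda^{-1}}^{\delta\lambda^{-1}}E[(|u-s'Z|^{1/2}-|u|^{1/2})(|u-t'Z|^{1/2}-|u|^{1/2})]\,f_U(0)\,du\\
&\quad=(2\log\lambda^{-1})^{-1}\int_{-\delta\lambda^{-1}}^{-N}E[(|u-s'Z|^{1/2}-|u|^{1/2})(|u-t'Z|^{1/2}-|u|^{1/2})]\,f_U(0)\,du\\
&\qquad+(2\log\lambda^{-1})^{-1}\int_{-N}^{N}E[(|u-Z's|^{1/2}-|u|^{1/2})(|u-Z't|^{1/2}-|u|^{1/2})]\,f_U(0)\,du\\
&\qquad+(2\log\lambda^{-1})^{-1}\int_{N}^{\delta\lambda^{-1}}E[(|u-Z's|^{1/2}-|u|^{1/2})(|u-Z't|^{1/2}-|u|^{1/2})]\,f_U(0)\,du.
\end{aligned}
\]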
By a Taylor expansion,
\[
\begin{aligned}
&(2\log\lambda^{-1})^{-1}\int_{N}^{\delta\lambda^{-1}}E[(|u-s'Z|^{1/2}-|u|^{1/2})(|u-t'Z|^{1/2}-|u|^{1/2})]\,f_U(0)\,du\\
&\quad=(2\log\lambda^{-1})^{-1}\int_{N}^{\delta\lambda^{-1}}E[2^{-1}u^{-1/2}t'Z\,2^{-1}u^{-1/2}Z's]\,f_U(0)\,du+o(1)\\
&\quad=2^{-3}(\log\lambda^{-1})^{-1}(\log(\delta\lambda^{-1})-\log N)\,E[t'ZZ's]\,f_U(0)+o(1)\\
&\quad\to 2^{-3}E[t'ZZ's]\,f_U(0).
\end{aligned}
\]
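The logarithmic normalization comes from the integral of $u^{-1}$ over the range from $N$ to a bound of order $\lambda^{-1}$. The sketch below evaluates the leading term in closed form, with the factor $E[t'ZZ's]f_U(0)$ normalized to 1 and with illustrative values of the truncation level and the constant $N$ (both are assumptions, as is the function name `tail_term`). It shows that the limit $2^{-3}$ is approached only at a logarithmic speed.

```python
import math


def tail_term(lam: float, delta: float = 0.5, big_n: float = 10.0) -> float:
    """(2 log(1/lam))^(-1) times the integral of (4u)^(-1) from big_n to delta/lam.

    The integral equals (log(delta/lam) - log(big_n)) / 4 in closed form.
    delta and big_n are illustrative values, not taken from the paper.
    """
    log_inv_lam = -math.log(lam)
    log_upper = math.log(delta) + log_inv_lam  # log(delta / lam)
    return (log_upper - math.log(big_n)) / (8.0 * log_inv_lam)


if __name__ == "__main__":
    for lam in (1e-6, 1e-12, 1e-100, 1e-300):
        print(lam, tail_term(lam))
```

The gap to $1/8$ is of order $1/\log\lambda^{-1}$, so the constants $\delta$ and $N$ disappear in the limit but remain visible for any practical $\lambda$.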
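Similarly,
\[
(2\log\lambda^{-1})^{-1}\int_{-\delta\lambda^{-1}}^{-N}E[(|u-s'Z|^{1/2}-|u|^{1/2})(|u-t'Z|^{1/2}-|u|^{1/2})]\,f_U(0)\,du
\to 2^{-3}E[t'ZZ's]\,f_U(0).
\]
We have that
\[
\begin{aligned}
&(2\log\lambda^{-1})^{-1}\Bigl|\int_{-N}^{N}E[(|u-s'Z|^{1/2}-|u|^{1/2})(|u-t'Z|^{1/2}-|u|^{1/2})]\,f_U(0)\,du\Bigr|\\
&\quad\le(2\log\lambda^{-1})^{-1}\,2N|t|^{1/2}|s|^{1/2}E[\|Z\|]\,f_U(0)\to 0.
\end{aligned}
\]
By Lemma 2.8,
\[
\begin{aligned}
&\lambda^{-2}(2\log\lambda^{-1})^{-1}\Bigl|\int_{-\delta}^{\delta}E[(|u-\lambda s'Z|^{1/2}-|u|^{1/2})(|u-\lambda t'Z|^{1/2}-|u|^{1/2})]\,(f_U(u)-f_U(0))\,du\Bigr|\\
&\quad\le cE[|t||s|\|Z\|^2]\,(2\log\lambda^{-1})^{-1}\int_{-\delta}^{\delta}|u|^{-1}|f_U(u)-f_U(0)|\,du\to 0,
\end{aligned}
\]
as $\lambda\to 0+$. By Lemma 2.8,
\[
\begin{aligned}
&\lambda^{-2}(2\log\lambda^{-1})^{-1}\bigl|E[(|U-\lambda s'Z|^{1/2}-|U|^{1/2})(|U-\lambda t'Z|^{1/2}-|U|^{1/2})I_{|U|\ge\delta}]\bigr|\\
&\quad\le c(2\log\lambda^{-1})^{-1}E[|t||s|\|Z\|^2|U|^{-1}I_{|U|\ge\delta}]\to 0,
\end{aligned}
\]
as $\lambda\to 0+$. (2.13) follows from all these estimations. The rest of the conditions in Theorem 2.14 follow similarly to those checked in Theorem 2.18. To check condition (iv) in Theorem 2.13, by Lemma 2.16, we need that
\[
E\Bigl[\sup_{|\theta|\le\delta}\bigl||U-Z'\theta|^{1/2}-|U|^{1/2}\bigr|^2\Bigr]=O(\delta^2(\log\delta^{-1})).
\]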
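By Lemma 2.8 and (2.4),
\[
\begin{aligned}
E\Bigl[\sup_{|\theta|\le\delta}\bigl||U-Z'\theta|^{1/2}-|U|^{1/2}\bigr|^2\Bigr]
&\le cE\bigl[|U|^{-1}\|Z\|^2\delta^2\wedge\|Z\|\delta\bigr]\\
&\le cE\bigl[|U|^{-1}\|Z\|^2\delta^2I_{\|Z\|\delta\le|U|}\bigr]+cE\bigl[\|Z\|\delta I_{\|Z\|\delta\ge|U|}\bigr]\\
&\le cE\bigl[|U|^{-1}\|Z\|^2\delta^2I_{\|Z\|\delta\le|U|\le 1}\bigr]+c\delta^2\\
&\le E\bigl[\|Z\|^2\delta^2(\log\|Z\|^{-1}\delta^{-1})I_{\|Z\|\delta\le 1}\bigr]+c\delta^2\\
&\le c\delta^2(\log\delta^{-1}).
\end{aligned}
\]
As to condition (v) in Theorem 2.13, we have that $\{t'Vt+t'\xi: t\in\mathbb{R}^d\}$ attains its minimum only at $t=-2^{-1}V^{-1}\xi$.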
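The final step uses the fact that a quadratic form $t'Vt + t'\xi$ with $V$ symmetric and positive definite has the unique minimizer $t = -2^{-1}V^{-1}\xi$, obtained by setting the gradient $2Vt + \xi$ to zero. A minimal sketch for $d = 2$ follows; the matrix, the vector and the helper names are hypothetical, and symmetry and positive definiteness of $V$ are assumed rather than checked.

```python
def solve_2x2(v, rhs):
    """Solve v @ x = rhs for a 2x2 matrix v by Cramer's rule."""
    det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    x0 = (rhs[0] * v[1][1] - v[0][1] * rhs[1]) / det
    x1 = (v[0][0] * rhs[1] - v[1][0] * rhs[0]) / det
    return [x0, x1]


def objective(v, xi, t):
    """Value of t'Vt + t'xi."""
    vt = [v[0][0] * t[0] + v[0][1] * t[1], v[1][0] * t[0] + v[1][1] * t[1]]
    return t[0] * vt[0] + t[1] * vt[1] + t[0] * xi[0] + t[1] * xi[1]


def minimizer(v, xi):
    """t = -(1/2) V^{-1} xi, assuming V is symmetric positive definite."""
    x = solve_2x2(v, xi)
    return [-0.5 * x[0], -0.5 * x[1]]


if __name__ == "__main__":
    V = [[2.0, 1.0], [1.0, 3.0]]  # illustrative positive definite matrix
    xi = [1.0, -2.0]              # illustrative realization of the normal vector
    t_star = minimizer(V, xi)
    print(t_star, objective(V, xi, t_star))
```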
References

Alexander, K.S., 1987. Central limit theorems for stochastic processes under random entropy conditions. Probab. Theory Related Fields 75, 351–378.
Arcones, M.A., 1994. Distributional convergence of M-estimators under unusual rates. Statist. Probab. Lett. 21, 271–280.
Arcones, M.A., 1999. Weak convergence for the row sums of a triangular array of empirical processes indexed by a manageable triangular array of functions. Electron. J. Probab. 4 (7), 1–17.
Arcones, M.A., 2000. M-estimators converging to a stable limit. J. Multivariate Anal. 74, 193–221.
Bai, Z.D., Chen, X.R., Miao, B.Q., Rao, C.R., 1990. Asymptotic theory of least distance estimate in multivariate linear models. Statistics 4, 503–519.
Bai, Z.D., Rao, C.R., Wu, Y., 1992. M-estimation of multivariate linear regression parameters under a convex discrepancy function. Statist. Sinica 2, 237–254.
Bassett, G., Koenker, R., 1978. Asymptotic theory of least absolute error regression. J. Amer. Statist. Assoc. 73, 618–622.
Bloomfield, P., Steiger, W.L., 1983. Least Absolute Deviations. Theory, Applications, and Algorithms. Birkhäuser, Boston.
Davis, R.A., Knight, K., Liu, J., 1992. M-estimation for autoregression with infinite variance. Stochastic Process. Appl. 40, 145–180.
Davis, R.A., Wu, W., 1997. M-estimation for linear regression with infinite variance. Probab. Math. Statist. 17, 1–20.
Draper, N.R., Smith, H., 1981. Applied Regression Analysis. Wiley, New York.
Dudley, R.M., 1984. A Course on Empirical Processes. Lecture Notes in Mathematics, Vol. 1097. Springer, New York, pp. 1–142.
Giné, E., Zinn, J., 1984. Some limit theorems for empirical processes. Ann. Probab. 12, 929–989.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A., 1986. Robust Statistics, the Approach Based on Influence Functions. Wiley, New York.
Huber, P.J., 1964. Robust estimation of a location parameter. Ann. Math. Statist. 35, 73–101.
Huber, P.J., 1973. Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Statist. 1, 799–821.
Huber, P.J., 1981. Robust Statistics. Wiley, New York.
Jurečková, J., 1977. Asymptotic relations of M-estimates and R-estimates in linear regression model. Ann. Statist. 5, 464–472.
Kim, J., Pollard, D., 1990. Cube root asymptotics. Ann. Statist. 18, 191–219.
Koenker, R., Bassett Jr, G., 1978. Regression quantiles. Econometrica 46, 33–50.
Koenker, R., Portnoy, S., 1987. L-estimation for linear models. J. Amer. Statist. Assoc. 82, 851–857.
Koul, H.L., 1977. Behavior of robust estimators in the regression model with dependent errors. Ann. Statist. 15, 681–699.
Pollard, D., 1984. Convergence of Stochastic Processes. Springer, New York.
Pollard, D., 1990. Empirical Processes: Theory and Applications. NSF–CBMS Regional Conference Series in Probability and Statistics, Vol. 2. Institute of Mathematical Statistics, Hayward, CA.
Prakasa Rao, B.L.S., 1968. Estimation of the location of the cusp of a continuous density. Ann. Math. Statist. 39, 76–87.
Ruppert, D., Carroll, R.J., 1980. Trimmed least squares estimation in the linear model. J. Amer. Statist. Assoc. 75, 828–838.
Smirnov, N.V., 1949. Limit distributions for the terms of a variational series. Amer. Math. Soc. Trans. Ser. 11 (1), 82–143.
Yohai, V.J., Maronna, R.A., 1979. Asymptotic behavior of M-estimators for the linear model. Ann. Statist. 7, 258–268.