Journal of Statistical Planning and Inference 121 (2004) 231-248

M-estimation in linear models under nonstandard conditions

Faouzi El Bantli

I.S.R.O., Université Libre de Bruxelles, Campus Plaine, Boulevard du Triomphe, 1050 Brussels, Belgium

Received 29 June 2001; accepted 20 November 2002

Abstract

The limiting distribution of M-estimators of the regression parameter in linear models is derived under nonstandard conditions, allowing, e.g., for discontinuities in density functions. Unlike the usual regularity assumptions, our conditions are satisfied, for instance, in the case of regression quantiles, hence also in the context of L1 estimation; our results thus extend those of Knight (Ann. Statist. 26 (1998) 755). The resulting asymptotic distributions, in general, are not Gaussian. Therefore, the limiting bootstrap distributions of these estimators are also investigated. It is shown that bootstrap approximations are correct to the first order only when the limiting distributions are Gaussian, or along specific sequences $m_n$ of bootstrap sample sizes. Numerical examples are given to illustrate these asymptotic results.

MSC: primary 62F12; 60F05; secondary 62E20; 62G05

Keywords: Bootstrap; Limiting distribution; Linear model; M-estimators
1. Introduction

Consider the linear regression model
$$\mathbf{Y}^{(n)} = \mathbf{X}^{(n)}\beta + \mathbf{e}^{(n)}, \qquad (1)$$
where $\mathbf{Y}^{(n)} := (Y_1,\dots,Y_n)'$ is an observed $n$-dimensional vector, $\mathbf{X}^{(n)}$ an $(n\times p)$ matrix of known constants, $\beta\in\mathbb{R}^p$ an unknown regression parameter, and $\mathbf{e}^{(n)} := (e_1,\dots,e_n)'$ the unobservable error. As usual, we assume that the errors $e_i$ are i.i.d. random variables
Corresponding author: F. El Bantli ([email protected]). Research supported by the Fonds d'Encouragement à la Recherche de l'Université Libre de Bruxelles.
with a common distribution function $F$, and that the rows of $\mathbf{X}^{(n)}$ are of the form $x_i' = (x_{i1}=1,\dots,x_{ip})$, $i=1,\dots,n$, so that the first component $\beta_1$ of $\beta$ can be interpreted as an intercept. To simplify the exposition, we assume that the regressors $x_i$ are nonrandom, though our results still hold for random regressors, provided, e.g., that they are independent of the $e_i$'s.

Our objective is to study the M-estimators $\hat\beta^{(n)}$ of $\beta$, defined as solutions of the minimization problem
$$\hat\beta^{(n)} := \arg\min_{\beta\in\mathbb{R}^p}\ \sum_{i=1}^n \rho(Y_i - x_i'\beta), \qquad (2)$$
where $\rho$ is some suitably chosen function. If $\rho$ is convex and differentiable, with derivative $\psi$, then $\hat\beta^{(n)}$ can be obtained by solving the estimating equations
$$\sum_{i=1}^n x_i\,\psi(Y_i - x_i'\beta) = 0. \qquad (3)$$
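The following sketch illustrates how the minimization problem (2) can be solved numerically; the simulated data, the Huber objective with $k = 1.345$, and the choice of a derivative-free optimizer are our assumptions, made purely for illustration, and are not taken from the paper.

```python
# A minimal sketch of the minimization problem (2), here for Huber's rho.
# The data, k = 1.345 and the optimizer are assumptions made for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])  # x_{i1} = 1: intercept
beta_true = np.array([0.5, 1.0])
y = X @ beta_true + rng.standard_normal(n)

def rho(u, k=1.345):
    """Huber's convex objective function."""
    return np.where(np.abs(u) <= k, 0.5 * u**2, k * np.abs(u) - 0.5 * k**2)

# The objective is convex but only piecewise smooth, so a derivative-free
# method is a safe (if slow) choice for small p.
beta_hat = minimize(lambda b: rho(y - X @ b).sum(),
                    x0=np.zeros(p), method="Nelder-Mead").x
print(beta_hat)  # should be close to beta_true
```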
The function $\psi$ in (3) is called the score function of the M-estimator $\hat\beta^{(n)}$. If $\rho$ is not differentiable but admits a right and a left derivative, $\psi_+$ and $\psi_-$ respectively, at each point, then one can choose the score function $\psi$ such that $\psi_- \le \psi \le \psi_+$; note that this inequality is a consequence of the convexity of $\rho$. This definition of $\psi$ will be maintained throughout the paper.

A huge literature has been devoted, in this context, to the asymptotic behavior of M-estimators, under increasingly mild conditions on $\rho$ and $F$. Huber (1973) derives the asymptotic normality of $\hat\beta^{(n)}$, assuming that the score function $\psi$ is continuous, bounded and non-decreasing, with bounded second derivative. Jurečková (1977) obtains the same result under weaker conditions on $\psi$, but requiring that the distribution function $F$ have finite Fisher information. Yohai and Maronna (1979) also prove asymptotic normality, but under additional conditions on $\psi$ which exclude $L_p$-estimation ($p\ge1$). Jurečková (1989) considers a very broad class of score functions, but still assumes the existence of a second-order derivative for $F$. The most recent results can be found in Bai et al. (1992), Rao and Zhao (1992), and Bai and Wu (1997), who (among other results) show that
$$\sqrt n\,(\hat\beta^{(n)} - \beta) \xrightarrow{L} N_p\Big(0,\ \frac{\sigma^2}{g(0)^2}\,Q^{-1}\Big) \qquad (4)$$
provided that the following assumptions hold:

(A0) $G(u) := E[\psi(e_1+u)]$ exists and has a bounded derivative $g$ in a neighborhood of zero, with $g(0) > 0$;
(A1) (Fisher consistency) $E[\psi(e_1)] = 0$;
(A2) $E[\psi^2(e_1)] =: \sigma^2 < \infty$; and
(A3) $\lim_{n\to\infty} n^{-1}\sum_{i=1}^n x_i x_i' =: Q$, where $Q$ is a positive definite $(p\times p)$ matrix.

In practice, however, assumption (A0) is often too strong. When, for instance, $\psi(x) = \mathrm{sgn}(x)$, that is, if we are dealing with the $L_1$-estimate of $\beta$, the assumption
$g(0) > 0$ reduces to $F'(0) =: f(0) > 0$. But this does not hold in the nonregular case of a distribution function $F$ with zero median and either a discontinuity at zero, or distinct left and right derivatives at zero. This is the case for the so-called scale-signed perturbation models, with distribution functions of the form
$$F(x) := \begin{cases} \Phi(x/a) & \text{if } x < 0,\\ \Phi(x/b) & \text{if } x \ge 0,\end{cases}$$
where $\Phi$ stands for the standard normal distribution function, and $a > b > 0$. Clearly, the right- and left-hand derivatives of $F$ at zero exist, but they are not equal. The reader is referred to Chapter 5 of Ibragimov and Has'minskii (1981) for further examples of nonregular problems for which assumption (A0) is not satisfied.

The asymptotics of $L_1$-estimators under nonstandard conditions where (A0) does not hold have been investigated by Knight (1998) and Rogers (2001). To the best of our knowledge, Smirnov (1952) was the first to catalog the limiting distributions of sample quantiles for i.i.d. observations; he identified four possible types of limiting distributions and characterized their domains of attraction (see Knight (2000) for details). In this paper, we address the general case of M-estimators. Our results cover, for instance, the $\tau$-regression quantiles of Koenker and Bassett (1978): clearly, whenever the distribution function $F$ is discontinuous at the population quantile of order $\tau$, condition (A0) is not fulfilled, and the classical results do not apply (see example (ia) below).

In this paper, we investigate the limiting behavior of the regression M-estimators (2) when assumption (A0) does not hold. We mainly require that $\psi$ admits the following representation:

(A4) $\sqrt n\,E[\psi(e_1 + r_n^{-1}s) - \psi(e_1)] = \lambda(s) + \gamma_n(s)$, $s\in\mathbb R$, for some nondecreasing sequence $r_n\uparrow\infty$ of positive real numbers, where $\lambda$ is a strictly increasing function, and $\lim_{n\to\infty}\gamma_n(s) = 0$ for all $s$.

Note that $\lambda$ need not be linear. Such a representation exists in most cases of practical interest; examples are:

(ia) (regression quantiles) The regression quantile of order $\tau$ corresponds to the objective function $\rho(x) := x\,\psi_\tau(x)$, where $\psi_\tau(x) := \tau - I[x<0]$ and $0<\tau<1$. If $F$ has positive one-sided derivatives $f_+$ and $f_-$, respectively, at the population quantile $q(\tau) := F^{-1}(\tau)$, then (A4) holds with $r_n = \sqrt n$ and
$$\lambda(s) = \begin{cases} f_+(q(\tau))\,s & \text{if } s \ge 0,\\ f_-(q(\tau))\,s & \text{if } s < 0\end{cases}$$
(a numerical check of this representation is sketched at the end of this section).

(ib) (regression quantiles) Still in the context of regression quantiles, assume that the distribution of the errors is two-sided Gamma, i.e., admits the density
$$f(x) = \frac{|x|^{\alpha-1}\exp(-|x|)}{2\Gamma(\alpha)}, \qquad \alpha > 0.$$
Assumption (A4) then holds, with
$$\sqrt n\,\big(F(q(\tau) + r_n^{-1}s) - F(q(\tau))\big) = \lambda(s) + \gamma_n(s), \qquad (5)$$
where $q(\tau)$ denotes the $\tau$-quantile, $r_n = n^{1/(2\alpha)}$, and
$$\lambda(s) = \frac{\mathrm{sgn}(s)\,|s|^{\alpha}}{2\Gamma(\alpha+1)}.$$
The two-sided Gamma distribution is a special case of a more general situation where the distribution function is locally nonlinear in a neighborhood of the specific quantile.

(ii) ($L_p$-estimation) $L_p$-estimators correspond to the objective function $\rho(x) := |x|^p$, $p \ge 1$, with score function $\psi(x) := \mathrm{sgn}(x)|x|^{p-1}$. If we assume that
$$G(u) - G(0) = \gamma\,\mathrm{sgn}(u)\,|u|^{\delta}\,(1 + o(1)) \quad\text{as } u\to0,$$
for some $\gamma > 0$ and $\delta > 0$, assumption (A4) holds with $r_n = n^{1/(2\delta)}$ and $\lambda(s) = \gamma\,\mathrm{sgn}(s)|s|^{\delta}$.

(iii) (Huber's estimators) Assumption (A0) also may not hold in the case of Huber's traditional estimators. Such estimators are obtained from score functions of the form
$$\psi(x) := \begin{cases} x & \text{if } |x| \le k,\\ k\,\mathrm{sgn}(x) & \text{if } |x| > k,\end{cases}$$
where $k > 0$ is a fixed constant. Under the uniform distribution
$$F(x) := \begin{cases} 0 & \text{if } x < -k,\\ (x+k)/(2k) & \text{if } -k \le x < k,\\ 1 & \text{if } x \ge k,\end{cases}$$
(A0) is not satisfied, whereas (A4) holds with $r_n = \sqrt n$ and
$$\lambda(s) = \begin{cases} -\infty & \text{if } s < -2k,\\ s & \text{if } |s| \le 2k,\\ +\infty & \text{if } s > 2k.\end{cases}$$

(iv) (discontinuous score functions) Score functions with jump discontinuities, such as those of the form $\psi = \psi_1 + \psi_2$ considered in Jurečková (1983) and Jurečková and Sen (1996, p. 196), where $\psi_1$ is an absolutely continuous function and $\psi_2$ is a step function, can also be treated within the present context.

The limiting distribution of the regression M-estimator $\hat\beta^{(n)}$ based on score functions for which (A4) holds is derived in Theorem 2.1; this limiting distribution is Gaussian if and only if $\lambda$, as defined in (A4), is linear. This asymptotic result ensures the consistency of $\hat\beta^{(n)}$ at the appropriate rate. Unfortunately, in the nonstandard case, the asymptotic result of Theorem 2.1 cannot be exploited for building, e.g., asymptotic confidence regions, as it involves an unspecified transformation. The bootstrap approximation of the limiting distribution of $\hat\beta^{(n)}$ is therefore considered in Section 3. This bootstrap approximation is shown to be correct only when $\hat\beta^{(n)}$ itself is asymptotically normal, or along specific sequences $m_n$ of bootstrap sample sizes; this generalizes an earlier result obtained by Huang et al. (1996) for sample quantiles under i.i.d. models.
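As announced in example (ia), representation (A4) can be checked numerically. The sketch below does so for the median ($\tau = 1/2$) under the scale-signed perturbation model introduced above; the values $a = 2$, $b = 1$, the value of $n$, and the grid of $s$-values are assumptions made for illustration.

```python
# Sketch: numerical check of representation (A4) for the median (tau = 1/2)
# under the scale-signed perturbation model F(x) = Phi(x/a) for x < 0 and
# Phi(x/b) for x >= 0. The values a = 2, b = 1 and the s-grid are assumptions.
import numpy as np
from scipy.stats import norm

a, b, n = 2.0, 1.0, 10**8
rn = np.sqrt(n)                       # r_n = sqrt(n), as in example (ia)

def F(x):
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, norm.cdf(x / a), norm.cdf(x / b))

# For psi(x) = 1/2 - I[x < 0], E[psi(e1 + u) - psi(e1)] = F(0) - F(-u), so
# sqrt(n) * (F(0) - F(-s / r_n)) should be close to lambda(s) for large n.
s = np.array([-2.0, -1.0, 1.0, 2.0])
lam = rn * (F(0.0) - F(-s / rn))
print(lam / s)                        # ~phi(0)/b for s < 0, ~phi(0)/a for s > 0
print(norm.pdf(0) / b, norm.pdf(0) / a)
```

The two one-sided slopes differ whenever $a \ne b$, so $\lambda$ is piecewise linear but not linear, and the limit law of the median is then non-Gaussian by Theorem 2.1 below.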
2. Limiting distribution of the M-estimators

To establish the limiting distribution of the M-estimator $\hat\beta^{(n)}$, we assume that (A1)-(A4) hold; in addition, we also need the following two assumptions:
(A5) Letting $\Lambda(u) := \int_0^u \lambda(s)\,\mathrm{d}s$, $\lim_{n\to\infty} n^{-1}\sum_{i=1}^n \Lambda(x_i'b) = \Gamma(b)$, $b\in\mathbb R^p$, where $\Gamma$ is a strictly convex real-valued function.
(A6) There exist positive constants $c$ and $\delta_0$ such that, for any $\delta\in(0,\delta_0)$ and any $x$, $\psi(x+\delta) - \psi(x) \le c$.

Note that $\Gamma(b)$ may be infinite, provided that $\Gamma(b)$ is finite for $b$ in an open set. Clearly, (A5) complements (A3), since it also imposes a moment condition on the $x_i$'s. Assumption (A6) is a mild restriction on the score function $\psi$; it holds, in particular, when $\psi$ is bounded. If we take $\rho(x) = |x|^p$ for some $1 \le p \le 2$, then $\psi(x) = \mathrm{sgn}(x)|x|^{p-1}$ and (A6) is automatically satisfied. We then have the following result.

Theorem 2.1. Assume that (A1)-(A6) hold. Then
$$r_n(\hat\beta^{(n)} - \beta) \xrightarrow{L} \arg\min_{b\in\mathbb R^p} Z(b),$$
where $Z(b) := -b'Z + \Gamma(b)$, and $Z \sim N_p(0, \sigma^2 Q)$.
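As a worked illustration of Theorem 2.1 (this computation is ours, under the stated assumptions, not part of the paper), consider the simplest nonregular special case: $p = 1$, $x_i \equiv 1$ (so that $Q = 1$), and the median case of example (ia), with one-sided densities $f_- \ne f_+$ at the median $0$. Then
$$\lambda(s) = f_+\,s\,I[s\ge0] + f_-\,s\,I[s<0], \qquad \Gamma(b) = \Lambda(b) = \tfrac12 f_+ b^2\,I[b\ge0] + \tfrac12 f_- b^2\,I[b<0],$$
so that $Z(b) = -bZ + \Gamma(b)$ with $Z \sim N(0,\sigma^2)$, $\sigma^2 = \tau(1-\tau) = \tfrac14$. Minimizing over each half-line separately yields
$$\arg\min_{b\in\mathbb R} Z(b) = \frac{Z}{f_+}\,I[Z\ge0] + \frac{Z}{f_-}\,I[Z<0],$$
a two-piece (asymmetric) normal distribution, which is Gaussian if and only if $f_+ = f_-$, i.e., if and only if $\lambda$ is linear.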
The proof of Theorem 2.1 relies on convexity arguments, much in the same spirit as in Pollard (1991), Niemiro (1992), He and Shao (1996), and Knight (1998). Note indeed that
$$r_n(\hat\beta^{(n)} - \beta) = \arg\min_{b\in\mathbb R^p}\bigg\{\frac{r_n}{\sqrt n}\sum_{i=1}^n [\rho(e_i - r_n^{-1}x_i'b) - \rho(e_i)]\bigg\}.$$
Considering the sequence of random convex functions (mapping $\mathbb R^p$ to $\mathbb R$)
$$Z_n(\cdot):\ b \mapsto Z_n(b) := \frac{r_n}{\sqrt n}\sum_{i=1}^n [\rho(e_i - r_n^{-1}x_i'b) - \rho(e_i)],$$
the basic idea of the method consists in approximating $Z_n(\cdot)$ by a convex function $Z(\cdot)$ which has a unique minimum, then showing that, for any $k$-tuple $(b_1,\dots,b_k)$,
$$(Z_n(b_1),\dots,Z_n(b_k)) \xrightarrow{L} (Z(b_1),\dots,Z(b_k))$$
as $n\to\infty$. Convexity then entails (see Hjort and Pollard (1993, Lemmas 1 and 2), and Geyer (1996, Lemma 3)) that
$$r_n(\hat\beta^{(n)} - \beta) = \arg\min_{b\in\mathbb R^p} Z_n(b) \xrightarrow{L} \arg\min_{b\in\mathbb R^p} Z(b)$$
as $n\to\infty$.
Proof. It is straightforward to see that
$$Z_n(b) = -\frac{1}{\sqrt n}\sum_{i=1}^n (x_i'b)\,\psi(e_i) + \frac{r_n}{\sqrt n}\sum_{i=1}^n \int_0^{-r_n^{-1}(x_i'b)} [\psi(e_i+s) - \psi(e_i)]\,\mathrm{d}s.$$
Decompose $Z_n$ into $Z_n(b) = Z_n^{(1)}(b) + Z_n^{(2)}(b)$, where
$$Z_n^{(1)}(b) := -\frac{1}{\sqrt n}\sum_{i=1}^n (x_i'b)\,\psi(e_i)$$
and
$$Z_n^{(2)}(b) := \frac{r_n}{\sqrt n}\sum_{i=1}^n \int_0^{-r_n^{-1}(x_i'b)} [\psi(e_i+s) - \psi(e_i)]\,\mathrm{d}s =: \sum_{i=1}^n Z_{ni}^{(2)}(b).$$
By the central limit theorem, for all $b\in\mathbb R^p$, we have $Z_n^{(1)}(b) \xrightarrow{L} -b'Z$, where $Z \sim N_p(0,\sigma^2 Q)$. As for $Z_n^{(2)}(b)$, it follows from (A4) and (A5) that
$$\lim_{n\to\infty} E[Z_n^{(2)}(b)] = \lim_{n\to\infty}\sum_{i=1}^n E[Z_{ni}^{(2)}(b)] = \lim_{n\to\infty}\frac{r_n}{\sqrt n}\sum_{i=1}^n \int_0^{-r_n^{-1}(x_i'b)} E[\psi(e_i+s) - \psi(e_i)]\,\mathrm{d}s = \lim_{n\to\infty}\frac1n\sum_{i=1}^n \Lambda(x_i'b) = \Gamma(b).$$
Furthermore, decomposing $Z_n^{(2)}(b)$ into
$$\sum_{i=1}^n E[Z_{ni}^{(2)}(b)] + \sum_{i=1}^n \big(Z_{ni}^{(2)}(b) - E[Z_{ni}^{(2)}(b)]\big)$$
yields
$$\operatorname{Var}[Z_n^{(2)}(b)] \le \sum_{i=1}^n E[(Z_{ni}^{(2)}(b))^2] \le \frac{c}{\sqrt n}\max_{1\le i\le n}|x_i'b|\;E[Z_n^{(2)}(b)],$$
by (A6). If $\Gamma(b) < \infty$, since $\max_{1\le i\le n}|x_i'b| = o(\sqrt n)$, it follows that $Z_n^{(2)}(b) - E[Z_n^{(2)}(b)] \xrightarrow{P} 0$ as $n\to\infty$. In case $\Gamma(b) = \infty$, then
$$P\bigg\{\Big|\sum_{i=1}^n \big(Z_{ni}^{(2)}(b) - E[Z_{ni}^{(2)}(b)]\big)\Big| \ge \varepsilon\,E[Z_n^{(2)}(b)]\bigg\} \le \frac{\operatorname{Var}[Z_n^{(2)}(b)]}{\varepsilon^2\,(E[Z_n^{(2)}(b)])^2} \le \frac{c\,\max_{1\le i\le n}|x_i'b|}{\sqrt n\,\varepsilon^2\,E[Z_n^{(2)}(b)]} \to 0,$$
using (A6). Thus, with probability tending to one, $Z_n^{(2)}(b) \ge (1-\varepsilon)E[Z_n^{(2)}(b)] \to \infty = \Gamma(b)$. Consequently, $Z_n(b) \xrightarrow{L} Z(b)$ as $n\to\infty$. Finally, Lemma 3 in Geyer (1996) implies convergence of all finite-dimensional distributions, hence that the minimizer of $Z_n$ converges in law to the minimizer of $Z$. This completes the proof.

It follows from Theorem 2.1 that the limiting distribution of $r_n(\hat\beta^{(n)} - \beta)$, in general, is not multivariate normal, and may be quite complicated; see, for instance, the examples given by Knight (1998). However, if $Z$ is a quadratic function, that is, whenever $\Gamma$ is differentiable with a linear gradient $\nabla\Gamma$, then $r_n(\hat\beta^{(n)} - \beta)$ is asymptotically normal. More precisely, $r_n(\hat\beta^{(n)} - \beta) \xrightarrow{L} \mathbf Y$, where $\mathbf Y$ is characterized by $\nabla\Gamma(\mathbf Y) = Z \sim N_p(0, \sigma^2 Q)$.

Remark. The technique used here to derive the asymptotics of M-estimators is valid only for convex objective functions. In the case of a nonconvex objective function $\rho$, one may try other techniques, namely those based on empirical processes or Donsker classes and the related central limit theorems; see, for instance, Dudley (1978), Pollard (1985), Kim and Pollard (1990), and van de Geer (2000) for details.
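The following Monte Carlo sketch illustrates this non-Gaussian limit for the sample median (an intercept-only version of model (1)). The two-sided exponential error law with unequal scales, and all tuning constants, are assumptions chosen for illustration; they match the two-piece normal limit obtained in the worked example after Theorem 2.1.

```python
# Sketch: Monte Carlo illustration of Theorem 2.1 for the sample median
# (intercept-only model) when F has unequal one-sided derivatives at 0.
# The error law below (different exponential scales on each side) is an
# assumption chosen so that f_- = 1/(2a) and f_+ = 1/(2b) at the median 0.
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 0.5            # left / right scales; median is 0 (mass 1/2 per side)
n, reps = 2000, 5000

def sample_errors(size):
    sign = rng.choice([-1.0, 1.0], size=size)
    mag = rng.exponential(1.0, size=size)
    return np.where(sign < 0, -a * mag, b * mag)

# r_n (median_hat - 0) over many replications, with r_n = sqrt(n)
stats = np.array([np.sqrt(n) * np.median(sample_errors(n)) for _ in range(reps)])

# Two-piece normal limit: Z/f_+ on {Z >= 0}, Z/f_- on {Z < 0}, Z ~ N(0, 1/4),
# with f_- = 1/(2a) and f_+ = 1/(2b) the one-sided densities at 0.
Z = rng.normal(0.0, 0.5, size=reps)
limit = np.where(Z >= 0, Z * (2 * b), Z * (2 * a))
print(np.quantile(stats, [0.1, 0.5, 0.9]))   # should roughly match the line below
print(np.quantile(limit, [0.1, 0.5, 0.9]))
```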
3. Bootstrap approximation

3.1. Resampling residuals

The theoretical result in Theorem 2.1 can hardly be exploited for practical purposes, since the function $\Gamma$, in applications, typically remains unknown. One may be tempted, therefore, by a bootstrap approach. The use of bootstrap techniques in the context of M-estimation has been considered by Bickel and Freedman (1981), Mammen (1989), Lahiri (1992), and several others. For a general discussion of resampling methods in linear models, the reader is referred to Wu (1986).

3.1.1. Bootstrap sample size n

Letting $\hat\varepsilon_i := Y_i - x_i'\hat\beta^{(n)}$, $1\le i\le n$, denote by $\{\varepsilon_1^*,\dots,\varepsilon_n^*\}$ an i.i.d. sample of size $n$ from $\hat F_n$, the empirical distribution function of the collection $\{\hat\varepsilon_1,\dots,\hat\varepsilon_n\}$. Define the resulting bootstrap pseudo-observations
$$Y_i^* := x_i'\hat\beta^{(n)} + \varepsilon_i^*, \qquad 1\le i\le n. \qquad (6)$$
Then, the bootstrap M-estimator $\hat\beta^{*(n)}$ of $\beta$ is defined as a solution of the minimization problem
$$\hat\beta^{*(n)} := \arg\min_{\beta\in\mathbb R^p}\ \sum_{i=1}^n \rho(Y_i^* - x_i'\beta).$$
Equivalently, $\hat\beta^{*(n)}$ can be obtained as a solution of
$$\sum_{i=1}^n x_i\,\psi(Y_i^* - x_i'\beta) = 0, \qquad \beta\in\mathbb R^p.$$
Let $z = (z_1,\dots,z_p)' \in \mathbb R^p$. The bootstrap estimator of
$$G_n(z) := P\{r_n(\hat\beta_j^{(n)} - \beta_j) \le z_j,\ 1\le j\le p\}$$
is given by
$$G_n^*(z) := P\{r_n(\hat\beta_j^{*(n)} - \hat\beta_j^{(n)}) \le z_j,\ 1\le j\le p \mid Y_1,\dots,Y_n\},$$
which, using the traditional notation $P^*$ for bootstrap probabilities, we write as
$$G_n^*(z) := P^*\{r_n(\hat\beta_j^{*(n)} - \hat\beta_j^{(n)}) \le z_j,\ 1\le j\le p\}.$$
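A minimal sketch of this residual-resampling scheme follows. The Huber objective with $k = 1.345$, the optimizer, and the number of bootstrap replications $B$ are our assumptions; `m_estimate` is a hypothetical helper playing the role of the M-estimation step (2).

```python
# Sketch of the residual bootstrap of Section 3.1.1. The Huber objective,
# k = 1.345 and the optimizer are assumptions; `m_estimate` is a hypothetical
# helper standing in for the M-estimation step (2).
import numpy as np
from scipy.optimize import minimize

def m_estimate(X, y, k=1.345):
    def rho(u):
        return np.where(np.abs(u) <= k, 0.5 * u**2, k * np.abs(u) - 0.5 * k**2)
    return minimize(lambda b: rho(y - X @ b).sum(),
                    x0=np.zeros(X.shape[1]), method="Nelder-Mead").x

def residual_bootstrap(X, y, B=500, seed=2):
    rng = np.random.default_rng(seed)
    n = len(y)
    beta_hat = m_estimate(X, y)
    resid = y - X @ beta_hat                                 # hat{eps}_i
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        eps_star = rng.choice(resid, size=n, replace=True)   # i.i.d. from F_n
        y_star = X @ beta_hat + eps_star                     # pseudo-observations (6)
        draws[b] = m_estimate(X, y_star)
    # empirical law of r_n (beta*(n) - beta_hat(n)), here with r_n = sqrt(n)
    return np.sqrt(n) * (draws - beta_hat)
```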
Theorem 3.1. Assume that (A1)-(A6) hold and that the inverse $(\nabla\Gamma)^{-1}$ of the gradient $\nabla\Gamma$ exists. Then, for any $k$-tuple $(z_1,\dots,z_k)$,
$$\big(r_n(\hat\beta^{(n)} - \beta),\,G_n^*(z_1),\dots,G_n^*(z_k)\big) \xrightarrow{L} \big((\nabla\Gamma)^{-1}(Z),\,G^*(z_1),\dots,G^*(z_k)\big) \qquad (7)$$
as $n\to\infty$, where $G^*(z) := F_Z\big(\nabla\Gamma(z + (\nabla\Gamma)^{-1}(Z)) - Z\big)$ and $F_Z$ stands for the distribution function of $Z$.

Proof. Define the sequence of random convex functions (mapping $\mathbb R^p$ to $\mathbb R$)
$$Z_n^*(\cdot):\ b \mapsto Z_n^*(b) := \frac{r_n}{\sqrt n}\sum_{i=1}^n \{\rho(\varepsilon_i^* - r_n^{-1}x_i'b) - \rho(\varepsilon_i^*)\}.$$
The minimizer of $Z_n^*$ is $r_n(\hat\beta^{*(n)} - \hat\beta^{(n)})$. As in the proof of Theorem 2.1, we have $Z_n^*(b) = -b'W_n^* + D_n^*(b)$, with $W_n^* := n^{-1/2}\sum_{i=1}^n x_i\,\psi(\varepsilon_i^*)$ and
$$D_n^*(b) := \frac{r_n}{\sqrt n}\sum_{i=1}^n \int_0^{-r_n^{-1}(x_i'b)} \{\psi(\varepsilon_i^* + s) - \psi(\varepsilon_i^*)\}\,\mathrm{d}s =: \sum_{i=1}^n D_{ni}^*(b).$$
Define $K_n(s) := E\{\psi(\varepsilon_1^* + s)\mid Y_1,\dots,Y_n\}$. We then have
$$\sum_{i=1}^n E[D_{ni}^*(b)] = \frac{r_n}{\sqrt n}\sum_{i=1}^n \int_0^{-r_n^{-1}(x_i'b)} E\{\psi(\varepsilon_1^* + s) - \psi(\varepsilon_1^*)\mid Y_1,\dots,Y_n\}\,\mathrm{d}s$$
$$= \frac{r_n}{\sqrt n}\sum_{i=1}^n \int_0^{-r_n^{-1}(x_i'b)} \{K_n(s) - K_n(0)\}\,\mathrm{d}s$$
$$= \frac{r_n}{\sqrt n}\sum_{i=1}^n \int_0^{-r_n^{-1}(x_i'b)} E\big[\psi\{e_i - x_i'(\hat\beta^{(n)} - \beta) + s\} - \psi\{e_i - x_i'(\hat\beta^{(n)} - \beta)\}\big]\,\mathrm{d}s + o_P(1)$$
$$= \frac{r_n}{\sqrt n}\sum_{i=1}^n \int_{-x_i'(\hat\beta^{(n)} - \beta)}^{-r_n^{-1}x_i'\{b + r_n(\hat\beta^{(n)} - \beta)\}} E\{\psi(e_i + s) - \psi(e_i)\}\,\mathrm{d}s + o_P(1)$$
$$\xrightarrow{P} \Gamma\{b + r_n(\hat\beta^{(n)} - \beta)\} - \Gamma\{r_n(\hat\beta^{(n)} - \beta)\} \qquad (8)$$
uniformly over compact subsets of $\mathbb R^p$. As previously, one can easily show that
$$\sum_{i=1}^n E[\{D_{ni}^*(b)\}^2] \xrightarrow{P} 0$$
as $n\to\infty$. Hence,
$$Z_n^*(b) = -b'W_n^* + \Gamma\{b + r_n(\hat\beta^{(n)} - \beta)\} - \Gamma\{r_n(\hat\beta^{(n)} - \beta)\} + R_n(b),$$
where $\sup_{b\in C}|R_n(b)| \xrightarrow{P} 0$ for any compact subset $C$ of $\mathbb R^p$. Next, we shall need the following "nearness of argmins" result, due to Hjort and Pollard (1993, Lemma 2).

Lemma 3.1. Suppose that $A_n$ is a sequence of convex random functions which is approximated pointwise, in probability, by a sequence of random functions $B_n$. Let $a_n$ be the minimizer of $A_n$. If
(i) $B_n$ has a unique minimum $b_n$ with $b_n = O_P(1)$, and
(ii) for each $\varepsilon > 0$, $\{\delta_n(\varepsilon)\}^{-1} = O_P(1)$, where $\delta_n(\varepsilon) := \inf_{\|s - b_n\| = \varepsilon} |B_n(s) - B_n(b_n)|$,
then $a_n - b_n \xrightarrow{P} 0$.

Condition (ii) of Lemma 3.1 is satisfied if the finite-dimensional distributions of $B_n$ converge weakly to those of a random function $B$ which has a unique minimum. Therefore, a direct application of this lemma with
$$A_n(b) = Z_n^*(b) \quad\text{and}\quad B_n(b) = -b'W_n^* + \Gamma\{b + r_n(\hat\beta^{(n)} - \beta)\} - \Gamma\{r_n(\hat\beta^{(n)} - \beta)\}$$
implies that
$$\arg\min_{b\in\mathbb R^p} Z_n^*(b) = (\nabla\Gamma)^{-1}\big[W_n^* + \nabla\Gamma\{r_n(\hat\beta^{(n)} - \beta)\}\big] - r_n(\hat\beta^{(n)} - \beta) + o_P(1).$$
Note that $B_n$ has a unique minimum since $\Gamma$ is strictly convex. Now, let $d_2$ denote the Mallows distance on the space of all probability measures $P$ over $(\mathbb R^{p+1}, \mathcal B^{p+1})$ such that $\int |x|^2\,\mathrm{d}P < \infty$ ($|\cdot|$ here denotes the usual Euclidean norm in $\mathbb R^{p+1}$). This distance is defined as
$$d_2(P, Q) = \inf\,(E|X - Y|^2)^{1/2},$$
where the infimum is taken over all pairs $(X, Y)$ of random vectors such that $X \sim P$ and $Y \sim Q$; the reader is referred to Bickel and Freedman (1981) and the references therein for a detailed discussion of the Mallows distance. We have $d_2(P^*_{W_n^*}, P_{W_n}) = o_P(1)$ as $n\to\infty$, where $W_n := n^{-1/2}\sum_{i=1}^n x_i\,\psi(e_i) \sim P_{W_n}$, and $P^*_{W_n^*}$ stands for the bootstrap distribution of $W_n^*$.
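In one dimension, and for two samples of equal size, the empirical Mallows $d_2$ distance has a closed form, since the optimal coupling of two $n$-point empirical measures pairs order statistics. The following sketch (our illustration, not part of the paper) exploits this fact.

```python
# Sketch: empirical Mallows d_2 distance between two one-dimensional samples
# of equal size; for empirical measures with n equally weighted atoms, the
# optimal coupling pairs order statistics.
import numpy as np

def mallows_d2(x, y):
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    assert x.shape == y.shape, "equal sample sizes assumed in this sketch"
    return np.sqrt(np.mean((x - y) ** 2))

rng = np.random.default_rng(3)
print(mallows_d2(rng.normal(size=1000), rng.normal(size=1000)))  # small value
```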
Thus, since $W_n \xrightarrow{L} Z \sim N_p(0, \sigma^2 Q)$, for any sequence $b_n \to b$,
$$P^*(W_n^* \le b_n) \xrightarrow{P} F_Z(b).$$
Furthermore, $\nabla\Gamma$ is invertible, thus one-to-one. Hence,
$$G_n^*(z) = P^*\Big\{\arg\min_{b\in\mathbb R^p} Z_n^*(b) \le z\Big\} \xrightarrow{L} F_Z\big[\nabla\Gamma\{z + (\nabla\Gamma)^{-1}(Z)\} - Z\big],$$
since, from Theorem 2.1, $r_n(\hat\beta^{(n)} - \beta) \xrightarrow{L} (\nabla\Gamma)^{-1}(Z)$. The joint convergence in (7) follows immediately.

Clearly, if $\Gamma$ is quadratic, then its gradient $\nabla\Gamma$ is linear, and thus
$$G_n^*(z) \xrightarrow{P} F_Z(\nabla\Gamma(z)) = G(z),$$
so that the bootstrap approximation is asymptotically correct in that case. In general, however, the approximation is asymptotically invalid unless the limiting distribution of the M-estimator is normal; and this asymptotic normality, as shown in Theorem 2.1, in general fails in the nonregular case. Theorem 3.1 generalizes a similar result by Huang et al. (1996) about bootstrapping sample quantiles under nonregular conditions (in the i.i.d. setting).

3.1.2. Bootstrap sample size $m_n$

Since the bootstrap approximation with bootstrap sample size $n$ in general is not valid, one may try bootstrap sample sizes $m = m_n$, in the spirit of Bickel and Freedman (1981), Politis and Romano (1994), Huang et al. (1996), and Bickel et al. (1997). Bickel and Sakov (2000) have shown that, for bootstrapping the sample median, the choice of $m$ such that $m/n$ tends to zero achieves a higher-order asymptotic refinement. Depending on the asymptotic behavior of $m_n$, we obtain the following results.

Theorem 3.2. Let $B_Q := \{x\in\mathbb R^p : x'Qx \le \sigma^2\}$. Assume that (A1)-(A6) hold, and that $m_n\to\infty$ as $n\to\infty$ (all limits below are taken as $n\to\infty$):
(i) if $\lim_{n\to\infty} r_{m_n}/r_n = 0$, then $G_{m_n}^*(z) \xrightarrow{P} G(z)$;
(ii) if $\lim_{n\to\infty} r_{m_n}(\log\log n/n)^{1/2} = 0$, then $G_{m_n}^*(z) \xrightarrow{a.s.} G(z)$;
(iii) if $\lim_{n\to\infty} r_{m_n}/r_n = c > 0$, then
$$G_{m_n}^*(z) \xrightarrow{a.s.} \Big\{F_Z\big(\nabla\Gamma(z + (\nabla\Gamma)^{-1}(x)) - x\big)\ :\ x \in \tfrac1c\,\nabla\Gamma(\sqrt2\,B_Q)\Big\}.$$

Proof. In the sequel, we write $m$ instead of $m_n$. For each $b\in\mathbb R^p$, define
$$Z_m^*(\cdot):\ b \mapsto Z_m^*(b) := \frac{r_m}{\sqrt m}\sum_{i=1}^n M_{mi}^*\,[\rho(\hat\varepsilon_i - r_m^{-1}x_i'b) - \rho(\hat\varepsilon_i)],$$
where $(M_{m1}^*,\dots,M_{mn}^*)$ is a multinomial random vector with $m$ trials and uniform cell probabilities $1/n$. Clearly, $r_m(\hat\beta^{*(m)} - \hat\beta^{(n)})$ minimizes $Z_m^*$. To prove (i), just note that
$$r_m(\hat\beta^{(n)} - \beta) = \frac{r_m}{r_n}\,r_n(\hat\beta^{(n)} - \beta) \xrightarrow{P} 0.$$
Consequently, following the same lines as in the proof of Theorem 3.1, one obtains
$$G_m^*(z) \xrightarrow{P} F_Z(\nabla\Gamma(z)) = G(z).$$
To prove (ii), let $s_n := (n/\log\log n)^{1/2}$ and define
$$S_n(b) := \frac{s_n}{\sqrt{n\log\log n}}\sum_{i=1}^n [\rho(e_i - s_n^{-1}x_i'b) - \rho(e_i)].$$
Following the proof of Theorem 2.1, we have
$$S_n(b) = -\frac{1}{\sqrt{n\log\log n}}\sum_{i=1}^n (x_i'b)\,\psi(e_i) + \Gamma(b) + o_P(1).$$
By the law of the iterated logarithm (see, e.g., Berning, 1979), the set of almost sure limit points of $(2n\log\log n)^{-1/2}\sum_{i=1}^n x_i\,\psi(e_i)$ is $B_Q$. Therefore,
$$r_m(\hat\beta^{(n)} - \beta) = \frac{r_m}{s_n}\,s_n(\hat\beta^{(n)} - \beta) \xrightarrow{a.s.} 0,$$
and (ii) follows. Finally, (iii) follows from the fact that the set of almost sure limit points of $r_m(\hat\beta^{(n)} - \beta)$ is $\{c\,(\nabla\Gamma)^{-1}(\sqrt2\,x)\ :\ x\in B_Q\}$.
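A minimal sketch of the m-out-of-n scheme follows. The choice $m_n = \lceil n/\log\log n\rceil$ matches one of the sequences used in Section 4; `m_estimate` is the hypothetical fitting helper of the earlier sketches, and subsampling design rows to obtain an $m$-point design is our assumption (the paper keeps the design fixed).

```python
# Sketch: m-out-of-n residual bootstrap, with m_n = ceil(n / log log n) as in
# the simulations of Section 4. `m_estimate(X, y)` is the hypothetical fitting
# helper from the earlier sketches; subsampling the design rows is an assumption.
import numpy as np

def m_out_of_n_bootstrap(X, y, m_estimate, B=500, seed=4):
    rng = np.random.default_rng(seed)
    n = len(y)
    m = int(np.ceil(n / np.log(np.log(n))))
    beta_hat = m_estimate(X, y)
    resid = y - X @ beta_hat                                  # hat{eps}_i
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        rows = rng.integers(0, n, size=m)                     # m design rows
        eps_star = rng.choice(resid, size=m, replace=True)    # m residuals
        draws[b] = m_estimate(X[rows], X[rows] @ beta_hat + eps_star)
    # empirical law of r_m (beta*(m) - beta_hat); here r_m = sqrt(m)
    return np.sqrt(m) * (draws - beta_hat)
```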
3.2. Resampling pairs

As is well known, there is a different way of bootstrapping a regression model, namely the so-called bootstrapping of pairs. This method is less model-dependent and is suitable when heteroscedasticity is suspected. For basic comparisons between the two methods, we refer to Efron and Tibshirani (1993).

3.2.1. Bootstrap sample size n

We consider the bootstrap procedure where the data set $\mathcal Y := \{(Y_1,x_1),\dots,(Y_n,x_n)\}$ is sampled with replacement (sample size $n$). Denote by $(Y_1^*,x_1^*),\dots,(Y_n^*,x_n^*)$ the resulting bootstrap sample. The bootstrap M-estimator $\hat\beta^{*(n)}$ is defined as a solution of the minimization problem
$$\hat\beta^{*(n)} := \arg\min_{\beta\in\mathbb R^p}\sum_{i=1}^n \rho(Y_i^* - x_i^{*\prime}\beta) = \arg\min_{\beta\in\mathbb R^p}\sum_{i=1}^n M_{ni}^*\,\rho(Y_i - x_i'\beta), \qquad (9)$$
where $(M_{n1}^*,\dots,M_{nn}^*)$ is a multinomial random vector with $n$ trials and uniform cell probabilities $1/n$. Equivalently, $\hat\beta^{*(n)}$ can be obtained as a solution of
$$\sum_{i=1}^n M_{ni}^*\,x_i\,\psi(Y_i - x_i'\beta) = 0, \qquad \beta\in\mathbb R^p. \qquad (10)$$
Let $z = (z_1,\dots,z_p)' \in \mathbb R^p$. The bootstrap estimator of
$$G(z) := P[r_n(\hat\beta_j^{(n)} - \beta_j) \le z_j,\ 1\le j\le p]$$
is given by
$$G_n^*(z) := P[r_n(\hat\beta_j^{*(n)} - \hat\beta_j^{(n)}) \le z_j,\ 1\le j\le p \mid \mathcal Y],$$
where $\mathcal Y$ denotes the original data set.
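A minimal sketch of this pairs scheme follows; resampling indices with replacement and refitting is exactly the multinomial-weight representation in (9). As before, `m_estimate` is the hypothetical fitting helper of the earlier sketches, and $B$ is an assumption.

```python
# Sketch of the pairs bootstrap (9)-(10): resample (Y_i, x_i) pairs with
# replacement, which is equivalent to drawing multinomial weights M*_ni.
# `m_estimate(X, y)` is the hypothetical fitting helper of the earlier sketches.
import numpy as np

def pairs_bootstrap(X, y, m_estimate, B=500, seed=5):
    rng = np.random.default_rng(seed)
    n = len(y)
    beta_hat = m_estimate(X, y)
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)      # n draws with replacement
        draws[b] = m_estimate(X[idx], y[idx])
    return np.sqrt(n) * (draws - beta_hat)    # law of r_n (beta* - beta_hat)
```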
We then have the following result.

Theorem 3.3. Assume that (A1)-(A6) hold and that the inverse $(\nabla\Gamma)^{-1}$ of the gradient $\nabla\Gamma$ exists. Then, for all $z\in\mathbb R^p$,
$$G_n^*(z) \xrightarrow{P} F_Z\big(\nabla\Gamma(z + (\nabla\Gamma)^{-1}(Z)) - Z\big)$$
as $n\to\infty$, where $F_Z$ stands for the distribution function of $Z$.

Proof. Define the sequence of random convex functions (mapping $\mathbb R^p$ to $\mathbb R$)
$$Z_n^*(\cdot):\ b \mapsto Z_n^*(b) := \frac{r_n}{\sqrt n}\sum_{i=1}^n M_{ni}^*\,[\rho(e_i - r_n^{-1}x_i'b) - \rho(e_i)].$$
The minimizer of $Z_n^*$ is $r_n(\hat\beta^{*(n)} - \beta)$. As in the proof of Theorem 2.1, one readily obtains
$$Z_n^*(b) = -b'W_n - b'W_n^* + \Gamma(b) + o_P(1),$$
with
$$W_n := \frac{1}{\sqrt n}\sum_{i=1}^n x_i\,\psi(e_i) \quad\text{and}\quad W_n^* := \frac{1}{\sqrt n}\sum_{i=1}^n (M_{ni}^* - 1)\,x_i\,\psi(e_i).$$
Lemma 3.1 entails
$$\arg\min_{b\in\mathbb R^p} Z_n^*(b) = (\nabla\Gamma)^{-1}(W_n + W_n^*) + o_P(1).$$
Following Knight (1989), it is straightforward to see that $W_n^* \xrightarrow{L} Z$; decomposing $r_n(\hat\beta^{*(n)} - \beta)$ into $r_n(\hat\beta^{*(n)} - \hat\beta^{(n)}) + r_n(\hat\beta^{(n)} - \beta)$, we have
$$G_n^*(z) = P^*\big[\arg\min Z_n^* \le z + r_n(\hat\beta^{(n)} - \beta)\big] = P^*\big[W_n^* \le \nabla\Gamma(z + (\nabla\Gamma)^{-1}(Z)) - W_n\big] + o_P(1)$$
$$\xrightarrow{P} F_Z\big(\nabla\Gamma(z + (\nabla\Gamma)^{-1}(Z)) - Z\big).$$

Clearly, if $\Gamma$ is quadratic, then its gradient $\nabla\Gamma$ is linear, and thus
$$G_n^*(z) \xrightarrow{P} F_Z(\nabla\Gamma(z)) = G(z),$$
so that the bootstrap approximation is asymptotically correct in that case. In general, however, the approximation is asymptotically invalid unless the limiting distribution of the M-estimator is normal; and this asymptotic normality, as shown in Theorem 2.1, in general fails in the nonregular case. Theorem 3.3 generalizes a similar result by Huang et al. (1996) about bootstrapping sample quantiles under nonregular conditions (in the i.i.d. setting).
3.2.2. Bootstrap sample size $m_n$

Since, again, the bootstrap approximation with bootstrap sample size $n$ is not valid in general, we turn to bootstrap sample sizes $m = m_n$. Depending on the asymptotic behavior of $m_n$, we obtain the following results.

Theorem 3.4. Let $B_Q := \{x\in\mathbb R^p : x'Qx \le \sigma^2\}$. Assume that (A1)-(A6) hold, and that $m_n\to\infty$ as $n\to\infty$ (all limits below are taken as $n\to\infty$):
(i) if $\lim_{n\to\infty} r_{m_n}/r_n = 0$, then $G_{m_n}^*(z) \xrightarrow{P} G(z)$;
(ii) if $\lim_{n\to\infty} r_{m_n}(\log\log n/n)^{1/2} = 0$, then $G_{m_n}^*(z) \xrightarrow{a.s.} G(z)$; and
(iii) if $\lim_{n\to\infty} r_{m_n}/r_n = c > 0$, then
$$G_{m_n}^*(z) \xrightarrow{a.s.} \Big\{F_Z\big(\nabla\Gamma(z + (\nabla\Gamma)^{-1}(x)) - x\big)\ :\ x \in \tfrac1c\,\nabla\Gamma(\sqrt2\,B_Q)\Big\}.$$

Proof. The proof goes along the same lines as that of Theorem 3.2; we therefore skip the details.

The consistency (strong or weak) of the bootstrap thus is by no means guaranteed, unless $r_{m_n}$ is $o\big((n/\log\log n)^{1/2}\big)$ or $o(r_n)$.

4. Simulation results

Two simulation studies have been conducted from model (1),
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad 1\le i\le n, \qquad (11)$$
with fixed design points $x_i$ generated from a uniform distribution on $[0,1]$ and i.i.d. innovations $\varepsilon_i$.

(a) The first simulation study deals with the $L_1$-estimator of $\beta = (\beta_0, \beta_1)'$. Let the innovation density in (11) be
$$f_{(a)}(x) = \begin{cases} 3/4 & \text{if } -2/3 \le x < 0,\\ 3/16 & \text{if } 0 \le x \le 8/3,\\ 0 & \text{elsewhere}.\end{cases}$$
The standard assumption (A0) is not satisfied, since $f_{(a)}$ has a discontinuity jump at 0, but assumptions (A1)-(A4) hold, with $r_n = \sqrt n$.
Table 1
Resampling residuals; β = (0.5, 1); innovation density f_(a); L1-estimator

  n     m_n = n                  m_n = n/log n                    m_n = n/log log n
        sup|G_n − G*_{m_n}|      m_n    sup|G_n − G*_{m_n}|       m_n    sup|G_n − G*_{m_n}|
  100   0.2335 (0.130)           22     0.1135 (0.078)            65     0.0655 (0.050)
  150   0.2735 (0.187)           30     0.1085 (0.088)            93     0.0550 (0.032)
  200   0.1875 (0.145)           38     0.1125 (0.090)            120    0.0125 (0.033)
  300   0.3135 (0.161)           53     0.0785 (0.064)            172    0.0200 (0.018)
  500   0.2745 (0.152)           80     0.0335 (0.036)            274    0.0080 (0.027)

Table 2
Resampling residuals; β = (0.5, −1); innovation density f_(a); L1-estimator

  n     m_n = n                  m_n = n/log n                    m_n = n/log log n
        sup|G_n − G*_{m_n}|      m_n    sup|G_n − G*_{m_n}|       m_n    sup|G_n − G*_{m_n}|
  100   0.3435 (0.123)           22     0.1240 (0.089)            65     0.0650 (0.043)
  150   0.4235 (0.201)           30     0.1122 (0.093)            93     0.0400 (0.041)
  200   0.2275 (0.141)           38     0.1075 (0.085)            120    0.0225 (0.030)
  300   0.3055 (0.163)           53     0.0875 (0.071)            172    0.0175 (0.029)
  500   0.3045 (0.171)           80     0.0540 (0.048)            274    0.0090 (0.026)
Two parameter values ($\beta = (0.5, 1)$ and $\beta = (0.5, -1)$), five series lengths ($n = 100, 150, 200, 300$, and $500$), and three sequences of bootstrap sample sizes ($m_n = n$, $m_n = n/\log n$, and $m_n = n/\log\log n$) have been considered. For each combination of a parameter value and a series length $n$, $N = 1000$ replications have been simulated; for each of them, the $L_1$-estimator $\hat\beta^{(n)}$ and the bootstrap estimators associated with each of the three sample sizes $m_n$ have been computed. The empirical distributions of these estimators have been used in order to evaluate the "exact" distribution $G_n$ of $\sqrt n(\hat\beta^{(n)} - \beta)$ and the bootstrap distribution functions $G^*_{m_n}$ of $\sqrt{m_n}(\hat\beta^{*(m_n)} - \hat\beta^{(n)})$ over a regular grid of 500 points. The $L_\infty$ distances $\sup|G_n - G^*_{m_n}|$ have been computed for each of the three bootstrap sample sizes $m_n$. This whole process has been repeated 2000 times, and the $L_\infty$ distances have been averaged over these 2000 iterations. The results corresponding to the resampling-residuals technique are shown in Tables 1 and 2, while those corresponding to the resampling-pairs technique are shown in Tables 3 and 4.
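The evaluation criterion just described can be sketched as follows; the restriction to a single coordinate of the estimator and the construction of the grid endpoints are our assumptions.

```python
# Sketch of the evaluation criterion of Section 4: the L_infinity (sup) distance
# between the Monte Carlo ("exact") law G_n and a bootstrap law G*_{m_n}, both
# taken here for one coordinate of the estimator and evaluated on a regular
# grid of 500 points. The grid endpoints are our assumption.
import numpy as np

def sup_distance(exact_draws, boot_draws, grid_size=500):
    lo = min(exact_draws.min(), boot_draws.min())
    hi = max(exact_draws.max(), boot_draws.max())
    grid = np.linspace(lo, hi, grid_size)
    G_n = (exact_draws[:, None] <= grid).mean(axis=0)     # empirical CDF, G_n
    G_star = (boot_draws[:, None] <= grid).mean(axis=0)   # empirical CDF, G*_{m_n}
    return np.abs(G_n - G_star).max()
```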
Table 3
Resampling pairs; β = (0.5, 1); innovation density f_(a); L1-estimator

  n     m_n = n                  m_n = n/log n                    m_n = n/log log n
        sup|G_n − G*_{m_n}|      m_n    sup|G_n − G*_{m_n}|       m_n    sup|G_n − G*_{m_n}|
  100   0.3315 (0.127)           22     0.1020 (0.053)            65     0.0655 (0.050)
  150   0.1375 (0.176)           30     0.1065 (0.082)            93     0.0450 (0.036)
  200   0.1785 (0.137)           38     0.1040 (0.079)            120    0.0110 (0.027)
  300   0.2460 (0.167)           53     0.0665 (0.054)            172    0.0185 (0.013)
  500   0.3525 (0.171)           80     0.0275 (0.026)            274    0.0050 (0.019)

Table 4
Resampling pairs; β = (0.5, −1); innovation density f_(a); L1-estimator

  n     m_n = n                  m_n = n/log n                    m_n = n/log log n
        sup|G_n − G*_{m_n}|      m_n    sup|G_n − G*_{m_n}|       m_n    sup|G_n − G*_{m_n}|
  100   0.2735 (0.136)           22     0.1135 (0.077)            65     0.0555 (0.041)
  150   0.3230 (0.192)           30     0.1080 (0.087)            93     0.0450 (0.039)
  200   0.3275 (0.134)           38     0.0995 (0.056)            120    0.0195 (0.024)
  300   0.2065 (0.151)           53     0.0775 (0.062)            172    0.0160 (0.023)
  500   0.2065 (0.167)           80     0.0435 (0.046)            274    0.0085 (0.021)
(b) The second simulation study addresses the case, described in example (iii) of Section 1, of the Huber estimator of $\beta$ (with $k = 1$), under the uniform density
$$f_{(b)}(x) = \begin{cases} 1/2 & \text{if } -1 \le x \le 1,\\ 0 & \text{elsewhere}.\end{cases}$$
Here again, (A0) fails to hold, but (A1)-(A4) are satisfied, with $r_n = \sqrt n$. The same simulations have been conducted as in (a); the results corresponding to the resampling-residuals technique are shown in Tables 5 and 6, while those corresponding to the resampling-pairs technique are shown in Tables 7 and 8.

Both simulation studies corroborate the fact that $G_n - G^*_{m_n}$ does not converge to zero for $m_n = n$. Note that, in all tables, the approximation of $G_n$ by $G^*_{m_n}$ is uniformly better for $m_n = n/\log\log n$ than for $m_n = n/\log n$. Given the variability of the estimates of the mean supremum distance, we also note a slight superiority of the resampling-pairs technique.
Table 5
Resampling residuals; β = (0.5, 1); innovation density f_(b); Huber estimator

  n     m_n = n                  m_n = n/log n                    m_n = n/log log n
        sup|G_n − G*_{m_n}|      m_n    sup|G_n − G*_{m_n}|       m_n    sup|G_n − G*_{m_n}|
  100   0.3360 (0.142)           22     0.1220 (0.069)            65     0.0655 (0.050)
  150   0.2555 (0.177)           30     0.1125 (0.078)            93     0.0600 (0.031)
  200   0.1665 (0.138)           38     0.1095 (0.083)            120    0.0108 (0.027)
  300   0.3135 (0.161)           53     0.0880 (0.070)            172    0.0245 (0.021)
  500   0.2065 (0.162)           80     0.0335 (0.040)            274    0.0080 (0.033)

Table 6
Resampling residuals; β = (0.5, −1); innovation density f_(b); Huber estimator

  n     m_n = n                  m_n = n/log n                    m_n = n/log log n
        sup|G_n − G*_{m_n}|      m_n    sup|G_n − G*_{m_n}|       m_n    sup|G_n − G*_{m_n}|
  100   0.2430 (0.130)           22     0.1345 (0.091)            65     0.0745 (0.049)
  150   0.3260 (0.187)           30     0.1205 (0.103)            93     0.0550 (0.039)
  200   0.2380 (0.155)           38     0.1065 (0.088)            120    0.0320 (0.044)
  300   0.2150 (0.157)           53     0.0875 (0.080)            172    0.0180 (0.034)
  500   0.2115 (0.166)           80     0.0535 (0.049)            274    0.0095 (0.025)
Table 7
Resampling pairs; β = (0.5, 1); innovation density f_(b); Huber estimator

  n     m_n = n                  m_n = n/log n                    m_n = n/log log n
        sup|G_n − G*_{m_n}|      m_n    sup|G_n − G*_{m_n}|       m_n    sup|G_n − G*_{m_n}|
  100   0.2310 (0.132)           22     0.1015 (0.054)            65     0.0475 (0.042)
  150   0.2210 (0.155)           30     0.1034 (0.063)            93     0.0350 (0.031)
  200   0.2385 (0.143)           38     0.1020 (0.072)            120    0.0105 (0.023)
  300   0.2635 (0.172)           53     0.0450 (0.048)            172    0.0165 (0.011)
  500   0.2520 (0.157)           80     0.0230 (0.023)            274    0.0045 (0.014)
Table 8
Resampling pairs; β = (0.5, −1); innovation density f_(b); Huber estimator

  n     m_n = n                  m_n = n/log n                    m_n = n/log log n
        sup|G_n − G*_{m_n}|      m_n    sup|G_n − G*_{m_n}|       m_n    sup|G_n − G*_{m_n}|
  100   0.2565 (0.145)           22     0.1095 (0.065)            65     0.0475 (0.038)
  150   0.2235 (0.186)           30     0.1065 (0.074)            93     0.0400 (0.042)
  200   0.2270 (0.143)           38     0.0870 (0.047)            120    0.0180 (0.022)
  300   0.2205 (0.155)           53     0.0580 (0.053)            172    0.0140 (0.018)
  500   0.1660 (0.117)           80     0.0340 (0.031)            274    0.0050 (0.017)
Acknowledgements

The author thanks the Associate Editor and two anonymous referees for their careful reading of the manuscript and helpful comments and suggestions.

References

Bai, Z.D., Wu, Y., 1997. General M-estimation. J. Multivariate Anal. 63, 119-135.
Bai, Z.D., Chen, X.R., Wu, Y., 1992. M-estimation of multivariate parameters under a convex discrepancy function. Statist. Sinica 2, 237-254.
Berning Jr., J.A., 1979. On the multivariate law of the iterated logarithm. Ann. Probab. 7, 980-988.
Bickel, P., Freedman, D.A., 1981. Some asymptotic theory for the bootstrap. Ann. Statist. 9, 1196-1217.
Bickel, P.J., Götze, F., van Zwet, W.R., 1997. Resampling fewer than n observations: gains, losses, and remedies for losses. Statist. Sinica 7, 1-32.
Bickel, P., Sakov, A., 2000. An Edgeworth expansion of the m out of n bootstrapped median. Statist. Probab. Lett. 49, 217-223.
Dudley, R.M., 1978. Central limit theorems for empirical measures. Ann. Statist. 6, 899-929.
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability, Vol. 57. Chapman & Hall, London.
Geyer, C.J., 1996. On the asymptotics of convex stochastic optimization. University of Minnesota, unpublished manuscript.
He, X., Shao, Q.M., 1996. General Bahadur representation of M-estimators and its application to linear models with nonstochastic design. Ann. Statist. 24, 2608-2630.
Hjort, N.L., Pollard, D., 1993. Asymptotics for minimizers of convex processes. Statistical Research Report, University of Oslo.
Huang, J.S., Sen, P.K., Shao, J., 1996. Bootstrapping a sample quantile when the density has a jump. Statist. Sinica 6, 299-309.
Huber, P.J., 1973. Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Statist. 1, 799-821.
Ibragimov, I.A., Has'minskii, R.Z., 1981. Statistical Estimation: Asymptotic Theory. Springer, New York.
Jurečková, J., 1977. Asymptotic relations of M-estimates and R-estimates in linear model. Ann. Statist. 5, 464-472.
Jurečková, J., 1983. Asymptotic behavior of M-estimators of location in nonregular cases. Statist. Decisions 1, 323-340.
Jurečková, J., 1989. Consistency of M-estimators of vector parameters. In: Mandl, P., Hušková, M. (Eds.), Proceedings of the Fourth Symposium on Asymptotic Statistics. Charles University, Prague, pp. 305-312.
Jurečková, J., Sen, P.K., 1996. Robust Statistical Procedures: Asymptotics and Interrelations. Wiley, New York.
Kim, J., Pollard, D., 1990. Cube root asymptotics. Ann. Statist. 18, 191-219.
Knight, K., 1989. On the bootstrap of the sample mean in the infinite variance case. Ann. Statist. 17, 1168-1175.
Knight, K., 1998. Limiting distributions for L1 regression estimators under general conditions. Ann. Statist. 26, 755-770.
Knight, K., 2000. What are the limiting distributions of quantile estimators? University of Toronto, preprint.
Koenker, R., Bassett, G., 1978. Regression quantiles. Econometrica 46, 33-50.
Lahiri, S.N., 1992. Bootstrapping M-estimators of a multiple linear regression parameter. Ann. Statist. 20, 1548-1570.
Mammen, E., 1989. Asymptotics with increasing dimension for robust regression with applications to the bootstrap. Ann. Statist. 17, 382-400.
Niemiro, W., 1992. Asymptotics for M-estimators defined by convex minimization. Ann. Statist. 20, 1514-1533.
Politis, D.N., Romano, J.P., 1994. Large sample confidence regions based on subsamples under minimal assumptions. Ann. Statist. 22, 2031-2050.
Pollard, D., 1985. New ways to prove central limit theorems. Econometric Theory 1, 295-314.
Pollard, D., 1991. Asymptotics for least absolute deviation regression estimators. Econometric Theory 7, 186-199.
Rao, C.R., Zhao, L.C., 1992. Linear representation of M-estimates in linear models. Canad. J. Statist. 20, 359-368.
Rogers, A., 2001. LAD estimation under nonstandard conditions. Econometric Theory 17, 820-852.
Smirnov, N.V., 1952. Limit distributions for the terms of a variational series. Amer. Math. Soc. Transl. (1) 11, 82-143.
van de Geer, S., 2000. Empirical Processes in M-estimation. Cambridge University Press, Cambridge.
Wu, C.F.J., 1986. Jackknife, bootstrap, and other resampling methods in regression analysis (with discussion). Ann. Statist. 14, 1261-1295.
Yohai, V.J., Maronna, R.A., 1979. Asymptotic behavior of M-estimators for the linear model. Ann. Statist. 7, 248-268.