Journal of Statistical Planning and Inference 88 (2000) 189–203
www.elsevier.com/locate/jspi
Some contributions to M-estimation in linear models
Lincheng Zhao
Department of Statistics and Finance, University of Science and Technology of China, Hefei, People's Republic of China
Received 12 December 1997; received in revised form 6 May 1998
Abstract
In this paper, we briefly survey some contributions to the asymptotic theory of M-estimation in a linear model, as well as to the relevant test criteria in ANOVA. © 2000 Elsevier Science B.V. All rights reserved.
MSC: 62J05; 62F12; 62F05
Keywords: Linear models; M-estimation; Least absolute deviations (LAD) estimation; Analysis of variance; Asymptotics
1. Introduction
As a general approach to statistical data analysis, the asymptotic theory of M-estimation in regression models has received extensive attention. In recent years, the author and some of his coworkers have worked in this field and obtained some new results. In this paper we briefly introduce some of them and the related work in the literature. As a special case, minimum $L_1$-norm (ML1N) estimation, also known as least absolute deviations (LAD) estimation, plays an important role and is of special interest; we therefore pay particular attention to it as well. Consider the linear model
$$Y_i = x_i'\beta + e_i, \qquad i = 1, \dots, n, \tag{1}$$
where $x_i$ is a known $p$-vector, $\beta$ is the unknown $p$-vector of regression coefficients and $e_i$ is an error variable. Throughout this paper, unless otherwise stated, we assume that $e_1, \dots, e_n$ are i.i.d. variables with a common distribution function $F$. An M-estimate $\hat\beta_n$ of $\beta$ is defined by minimizing
$$\sum \rho(Y_i - x_i'\beta) \tag{2}$$
for a suitable function $\rho$, or by solving an estimating equation of the type
$$\sum \psi(Y_i - x_i'\beta)\, x_i = 0 \tag{3}$$
for a suitable function $\psi$. Hereafter, for simplicity, we always write $\sum$ for $\sum_{i=1}^n$. The well-known least-squares (LS) estimate and the LAD estimate of $\beta$ are obtained by taking $\rho(u) = u^2$ and $\rho(u) = |u|$, respectively. In particular, the LAD estimate of $\beta$ is defined as any value $\tilde\beta_n$ which satisfies
$$\sum |Y_i - x_i'\tilde\beta_n| = \min_{\beta} \sum |Y_i - x_i'\beta|. \tag{4}$$

E-mail address: [email protected] (L. Zhao). This work is partially supported by the National Natural Science Foundation of China (19631040), the Ph.D. Program Foundation of the National Education Committee of China and the Special Foundation of Academia Sinica.
0378-3758/00/$ - see front matter. PII: S0378-3758(00)00078-1
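For $p = 1$ the criteria (2) and (4) can be minimized exactly, which makes the contrast between the LS and LAD choices of $\rho$ concrete. The sketch below is illustrative only: the function names are hypothetical, and it relies on the identity $|Y_i - x_i b| = |x_i|\,|Y_i/x_i - b|$, which turns (4) into a weighted-median problem.

```python
from typing import Callable, Sequence


def ls_estimate(x: Sequence[float], y: Sequence[float]) -> float:
    """LS estimate for Y_i = x_i * beta + e_i (rho(u) = u**2), p = 1."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def lad_estimate(x: Sequence[float], y: Sequence[float]) -> float:
    """One LAD estimate (4) for p = 1, computed as a weighted median.

    Ratios y_i / x_i carry weights |x_i|; observations with x_i = 0 do not
    depend on beta and are skipped. The returned ratio is the first one at
    which the cumulative weight reaches half of the total weight.
    """
    pairs = sorted((yi / xi, abs(xi)) for xi, yi in zip(x, y) if xi != 0)
    if not pairs:
        raise ValueError("all x_i are zero")
    total = sum(w for _, w in pairs)
    acc = 0.0
    for ratio, w in pairs:
        acc += w
        if acc >= total / 2:
            return ratio
    return pairs[-1][0]  # only reachable through floating-point rounding


def objective(x, y, b: float, rho: Callable[[float], float]) -> float:
    """The criterion (2): sum_i rho(y_i - x_i * b)."""
    return sum(rho(yi - xi * b) for xi, yi in zip(x, y))


# Usage: one gross outlier in the last response.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 40.0]
b_ls, b_lad = ls_estimate(x, y), lad_estimate(x, y)
print(b_ls, objective(x, y, b_ls, lambda u: u * u))
print(b_lad, objective(x, y, b_lad, abs))
```

The outlier pulls the LS slope far from the bulk of the data, while the LAD slope stays near 1. In this example every slope between 3.2/3 and 1.1 attains the LAD minimum, which is why (4) speaks of "any value" $\tilde\beta_n$.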
There is considerable literature on the asymptotic theory of M-estimation, starting with the seminal work of Huber (1973) (see Huber (1981) for details and references to earlier work). References to recent work on M-estimation can be found in Bai et al. (1992), Rao and Toutenburg (1995), Chen and Zhao (1996), and Jureckova and Sen (1996). Throughout this paper, we assume that $\rho$ is a nonmonotonic convex function, that $\psi$ is a nontrivial nondecreasing function, and that $p$ is fixed as $n \to \infty$. For the case $p = p_n \to \infty$ as $n \to \infty$, refer to Portnoy (1984, 1985) and Welsh (1989); Yohai and Maronna (1979) studied both cases. Write
$$S_n = \sum x_i x_i', \qquad d_n^2 = \max_{1 \le i \le n} x_i' S_n^{-1} x_i. \tag{5}$$
Hereafter we assume that $S_{n_0} > 0$ for some integer $n_0$ and that $n \ge n_0$.
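The quantities in (5) are easy to compute. The sketch below assumes $p = 2$ and a hypothetical straight-line design $x_i = (1, i)'$; all helper names are illustrative. For this design $d_n^2 = 1/n + 3(n-1)/\{n(n+1)\}$, so $n d_n^2 \to 4$ and $d_n^2 \to 0$. Because the leverages $x_i' S_n^{-1} x_i$ sum to $\mathrm{tr}(S_n^{-1} S_n) = p$, one always has $d_n^2 \ge p/n$.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]
Matrix = List[List[float]]


def gram_matrix(xs: Sequence[Vector]) -> Matrix:
    """S_n = sum_i x_i x_i' for 2-dimensional design vectors x_i."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        for a in range(2):
            for b in range(2):
                s[a][b] += x[a] * x[b]
    return s


def inverse_2x2(s: Matrix) -> Matrix:
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det <= 0:
        raise ValueError("S_n is singular; the assumption S_n > 0 fails")
    return [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]


def leverages(xs: Sequence[Vector]) -> List[float]:
    """x_i' S_n^{-1} x_i for every design point."""
    inv = inverse_2x2(gram_matrix(xs))
    return [sum(x[a] * inv[a][b] * x[b] for a in range(2) for b in range(2)) for x in xs]


def d_n_squared(xs: Sequence[Vector]) -> float:
    """d_n^2 from (5): the largest leverage."""
    return max(leverages(xs))


def straight_line_design(n: int) -> List[Vector]:
    """Hypothetical design x_i = (1, i)', i = 1, ..., n (intercept plus trend)."""
    return [(1.0, float(i)) for i in range(1, n + 1)]


for n in (10, 100, 1000):
    xs = straight_line_design(n)
    print(n, d_n_squared(xs), n * d_n_squared(xs), sum(leverages(xs)))
```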
2. Weak consistency
Zhao et al. (1993) studied the (weak) consistency of the M-estimate $\hat\beta_n$ of $\beta$ defined by minimizing (2), and established the following result.
Theorem 1. In model (1), assume that
(A1) $\rho$ is a convex function on $\mathbb{R}^1$ with right and left derivatives $\psi_+(\cdot)$ and $\psi_-(\cdot)$. Choose $\psi$ such that $\psi_-(u) \le \psi(u) \le \psi_+(u)$ for all $u \in \mathbb{R}^1$.
(A2) $E\psi(e_1) = 0$, and there exist positive constants $c$ and $\Delta$ such that
$$|E\psi(e_1 + u)| \ge c|u| \quad \text{for } |u| \le \Delta \tag{6}$$
and
$$E\psi^2(e_1 \pm \Delta) < \infty. \tag{7}$$
(A3) $S_n^{-1} \to 0$ as $n \to \infty$.
Then $\hat\beta_n$ is consistent for $\beta$.
In the special case of LAD estimates, the following theorem follows as a corollary to Theorem 1.
Theorem 2. Let $\tilde\beta_n$ be an LAD estimate of $\beta$ defined by (4). Assume that zero is a median of $F$ and that there exist positive constants $c$ and $\Delta$ such that
$$|1 - 2F(u)| \ge c|u| \quad \text{for } |u| \le \Delta. \tag{8}$$
If (A3) holds, then $\tilde\beta_n$ is consistent.
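Theorem 2 covers heavy-tailed errors. As an illustration under assumptions of our own choosing, take the location model $x_i = 1$ (so $S_n = n$ and (A3) holds) with standard Cauchy errors. For them $F(u) = 1/2 + \arctan(u)/\pi$, so $|1 - 2F(u)| = 2|\arctan u|/\pi$ satisfies (8) near 0. The LAD estimate is then a sample median. By contrast, the sample mean (the LS estimate) is not consistent here, since the mean of $n$ i.i.d. standard Cauchy variables is again standard Cauchy. Function names are hypothetical.

```python
import math
import random
import statistics


def cauchy_sample(n: int, rng: random.Random) -> list:
    """Standard Cauchy draws via the inverse CDF tan(pi * (U - 1/2))."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def lad_location(y: list) -> float:
    """With x_i = 1, statistics.median returns one minimizer of (4)."""
    return statistics.median(y)


rng = random.Random(2000)
beta = 3.0
for n in (11, 101, 1001, 10001):
    y = [beta + e for e in cauchy_sample(n, rng)]
    lad_err = lad_location(y) - beta
    ls_err = statistics.fmean(y) - beta
    print(f"n={n:6d}  LAD error={lad_err:+.4f}  LS error={ls_err:+.4f}")
```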
It is worth pointing out that the continuity of $F(u)$ at $u = 0$ is not assumed in Theorem 2. To prove this theorem, we need only take $\psi(u) = \mathrm{sign}(u) + \theta(1 - |\mathrm{sign}(u)|)$, choosing some $\theta \in [-1, 1]$ such that $E\psi(e_1) = 0$ (see also Chen and Zhao, 1996). When studying the weak consistency of an M-estimate or an LAD estimate of $\beta$, most of the earlier papers assume that
$$d_n \to 0 \quad \text{as } n \to \infty, \tag{9}$$
which is stronger than (A3) (see Chen et al., 1990; Pollard, 1991; Bai et al., 1992). Now consider the same problem for an M-estimate $\hat\beta_n$ defined by an estimating equation such as (3). In this direction, Yohai and Maronna (1979) proved the weak consistency of $\hat\beta_n$ under condition (A3), but one of the additional conditions they imposed on $\psi$ and $F$ is somewhat severe and excludes some important cases such as the LAD estimate. Rao and Zhao (1992a) removed this condition and proved the consistency of $\hat\beta_n$ under (A3) and other mild conditions, where $\hat\beta_n$ satisfies the more general condition
$$\Big\| S_n^{-1/2} \sum \psi(Y_i - x_i'\hat\beta_n)\, x_i \Big\| = o_p(1) \quad \text{as } n \to \infty, \tag{10}$$
with $\|\cdot\|$ denoting the Euclidean norm of a vector. Note that in some circumstances a $\hat\beta_n$ satisfying (10) may exist even if Eq. (3) has no solution at all.
Theorem 3. Assume that $\psi$ is nondecreasing, that (A2) and (A3) are satisfied, and let $\hat\beta_n$ satisfy condition (10). Then $\hat\beta_n$ is a consistent estimate of $\beta$.
In the above theorems, condition (A3) plays a prominent role in the weak consistency of an M-estimate of $\beta$. A well-known result for LS estimation is that, in model (1) with $E(e_1) = 0$ and $0 < \mathrm{Var}(e_1) < \infty$,
(A3) is necessary and sufficient (n. & s.) for both the weak and the strong consistency of the LS estimate of $\beta$ (refer to Lai et al., 1979; Drygas, 1976; Chen, 1979). Motivated by this fact, we propose the following conjecture on the necessity of (A3) for the weak consistency of an M-estimate $\hat\beta_n$ defined by minimizing (2).
Conjecture. Assume that in model (1) condition (7) holds and there exist positive constants $c$ and $\Delta$ such that
$$|E\psi(e_1 + u)| \le c|u| \quad \text{for } |u| \le \Delta. \tag{11}$$
Then (A3) is necessary for the weak consistency of $\hat\beta_n$.
This conjecture is still unsolved. For the LAD estimate $\tilde\beta_n$ of $\beta$, Chen et al. (1993) and Chen et al. (1995) obtained some results in this direction.
Theorem 4. Assume that $\mathrm{med}(e_1) = 0$, that $\{x_i\}$ is bounded, and that there exists a $\delta > 0$ such that the derivative function $f$ of $F$ is positive and absolutely continuous in $[-\delta, \delta]$ and
$$\int_{-\delta}^{\delta} (f'(u))^2 \, du < \infty.$$
Then (A3) is n. & s. for the weak consistency of $\tilde\beta_n$.
Theorem 5. Assume that $p = 1$ in model (1) and that there exist positive constants $c_1$, $c_2$ and $\Delta$ such that $c_1|u| \le |\tfrac12 - F(u)| \le c_2|u|$ for $|u| \le \Delta$. Then (A3) is an n. & s. condition for the weak consistency of $\tilde\beta_n$.
Theorem 6. Assume that in model (1) $x_1, x_2, \dots$ are not all zero, $e_1, e_2, \dots$ are independent, $F_i$ is the distribution function of $e_i$, and there exist positive constants $c$ and $\Delta$ such that $|\tfrac12 - F_i(u)| \le c|u|$ for $|u| \le \Delta$, $i = 1, 2, \dots$. Then the condition $\sum_{i=1}^\infty \|x_i\|^2 = \infty$ is necessary for the weak consistency of $\tilde\beta_n$.
3. Strong consistency
When dealing with the strong consistency of an M-estimate or an LAD estimate of $\beta$, most of the earlier papers assume that
$$S_n / n \to Q, \tag{12}$$
where $Q$ is a positive-definite matrix. Wu (1988) obtained some more general results on the strong consistency of an LAD estimate $\tilde\beta_n$ of $\beta$, but still assumed
some cumbersome and unnecessary conditions. Chen et al. (1992) greatly improved these results by establishing the following general theorem.
Theorem 7. Suppose that in model (1) $e_1, e_2, \dots$ are independent, $\mathrm{med}(e_i) = 0$ for each $i$, and there exist positive constants $c$ and $\Delta$ such that $P(-h < e_i < 0) \ge ch$ and $P(0 < e_i < h) \ge ch$ for any $h \in (0, \Delta)$ and $i = 1, 2, \dots$. Then the following assertions are true.
(a) $d_n^2 = o(1/\log n) \Rightarrow \tilde\beta_n \to \beta$ a.s. as $n \to \infty$.
(b) $d_n^2 = O(1/n) \Rightarrow \tilde\beta_n$ tends to $\beta$ exponentially fast, in the sense that for any given $\varepsilon > 0$ there exists a constant $c > 0$ such that $P(\|\tilde\beta_n - \beta\| \ge \varepsilon) = O(e^{-cn})$.
This theorem relates the strong convergence of $\tilde\beta_n$ simply and exclusively to the order of $d_n^2$. An example in that paper shows that these conditions on $d_n^2$ are by no means necessary. However, as pointed out in Theorem 3.4 of Chen and Zhao (1996), if the condition $d_n^2 = o(1/\log n)$ is weakened to $d_n^2 = O(1/\log n)$, $\tilde\beta_n$ may fail to be strongly consistent. For the strong consistency of an M-estimate $\hat\beta_n$ defined by minimizing (2), Chen and Zhao (1995) obtained the following theorems.
Theorem 8. Suppose that in model (1) $e_1, e_2, \dots$ are independent, and there exist positive constants $c$ and $\Delta$ such that
$$E\{\rho(e_i + u) - \rho(e_i)\} \ge cu^2 \quad \text{for } |u| \le \Delta \text{ and } i = 1, 2, \dots. \tag{13}$$
Furthermore, assume that one of the following two conditions is satisfied:
(i) There exists a constant $M < \infty$ such that $|e_i| \le M$ for each $i$.
(ii) $\rho$ satisfies the Lipschitz condition, or equivalently, $\psi_+$ is bounded.
Then $\hat\beta_n$ is a strongly consistent estimate of $\beta$ provided $d_n^2 = o(1/\log n)$.
Theorem 9. Suppose that in model (1) $e_1, e_2, \dots$ are independent, (13) holds and
$$E|\psi_+(e_i \pm \Delta)|^m \le h_m < \infty, \qquad i, m = 1, 2, \dots,$$
where $\Delta > 0$ and $h_m > 0$, $m = 1, 2, \dots$, are constants. If $d_n^2 = O(n^{-\delta})$ for some constant $\delta > 0$, then $\hat\beta_n \to \beta$ a.s. as $n \to \infty$.
When $\rho$ is not necessarily convex, the strong consistency of an M-estimate of $\beta$ defined by minimizing (2) was discussed in Chen and Wu (1988) (also refer to Rao and Toutenburg, 1995; Chen and Zhao, 1996).
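Condition (13) can be verified by hand in simple cases. As a worked example (the error distribution is an assumption chosen for illustration, not taken from the theorems), let $\rho(u) = |u|$ and let $e_i$ be uniform on $(-1, 1)$. For $|u| \le 1$ the point $-u$ lies in $[-1, 1]$, so the integral splits at zero:

```latex
\begin{aligned}
E|e_i + u| &= \frac12 \int_{-1}^{1} |t + u|\,dt
            = \frac12 \int_{u-1}^{u+1} |s|\,ds
            = \frac12\Big[\frac{(1-u)^2}{2} + \frac{(1+u)^2}{2}\Big]
            = \frac{1 + u^2}{2},\\
E\{\rho(e_i + u) - \rho(e_i)\} &= \frac{1 + u^2}{2} - \frac12 = \frac{u^2}{2}.
\end{aligned}
```

Hence (13) holds with $c = 1/2$ and $\Delta = 1$. The same distribution also satisfies the condition of Theorem 7, since $P(-h < e_i < 0) = P(0 < e_i < h) = h/2$ for $h \in (0, 1)$.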
4. Asymptotic normality
Many authors have attempted to prove that the LAD estimate $\tilde\beta_n$ of $\beta$, when suitably standardized, tends in distribution to the normal distribution $N(0, I_p)$ as $n \to \infty$, where $I_p$ is the identity matrix of order $p$. The first such attempt was made by Bassett and Koenker (1978); see also Amemiya (1982), Bloomfield and Steiger (1983), Dupacova (1987) and others. For detailed comments on these references, see, for instance, Chen et al. (1990), Rao and Toutenburg (1995), and Chen and Zhao (1996). Chen et al. (1990) (see also Bai et al. (1987), the preprint version) gave a rigorous proof of the asymptotic normality of $\tilde\beta_n$ under very general conditions. In the special case that $e_1, e_2, \dots$ are i.i.d., their result can be formulated as follows.
Theorem 10. Suppose that in model (1) $\mathrm{med}(e_1) = 0$ and the following conditions are satisfied:
(i) $F'(u)$ exists for $u$ in some neighborhood of 0, is continuous at 0, and $f(0) = F'(0) > 0$.
(ii) $d_n \to 0$ as $n \to \infty$.
Then, as $n \to \infty$,
$$2f(0)\, S_n^{1/2} (\tilde\beta_n - \beta) \to N(0, I_p) \quad \text{in distribution}.$$
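Theorem 10 can be checked by simulation. The sketch below makes assumptions that are not in the original: the location model $x_i = 1$ (so $S_n = n$ and (ii) holds), standard normal errors (so $f(0) = 1/\sqrt{2\pi}$), and hypothetical function names. If the normal approximation of the theorem is adequate at this sample size, the replicates of $2f(0)\sqrt{n}\,(\tilde\beta_n - \beta)$ should have mean near 0 and variance near 1.

```python
import math
import random
import statistics


def standardized_lad_replicates(n: int, reps: int, seed: int = 1) -> list:
    """Replicates of 2 f(0) S_n^{1/2} (beta_tilde_n - beta) with beta = 0.

    Assumed setting: x_i = 1, e_i ~ N(0, 1), so f(0) = 1/sqrt(2*pi),
    S_n = n, and the LAD estimate is the sample median (n odd).
    """
    rng = random.Random(seed)
    f0 = 1.0 / math.sqrt(2.0 * math.pi)
    out = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        out.append(2.0 * f0 * math.sqrt(n) * statistics.median(sample))
    return out


z = standardized_lad_replicates(n=201, reps=2000)
print(statistics.fmean(z), statistics.variance(z))
```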
This result was also obtained in the later work of Pollard (1991). As pointed out in Theorem 4.3 of Chen and Zhao (1996), condition (i) can be replaced by the following weaker one:
(i') The derivative $f(0) = F'(0)$ exists and $f(0) > 0$.
To establish the asymptotic normality of M-estimates of regression coefficients, Bai et al. (1992) considered the following multivariate linear model:
$$Y_i = X_i'\beta + E_i, \qquad i = 1, \dots, n, \tag{14}$$
where the $E_i$ are i.i.d. $p$-vectors with a common distribution $F$ and the $X_i$ are given $m \times p$ matrices, so that $\beta$ is an $m$-vector. In model (14), we are interested in the M-estimate $\hat\beta_n$ of $\beta$ defined by minimizing
$$\sum \rho(Y_i - X_i'\beta) \tag{15}$$
for a given convex function $\rho$ of $p$ variables. Let $\psi(u)$ be a choice of a subgradient of $\rho$ at $u = (u_1, \dots, u_p)'$. (A $p$-vector $\psi(u)$ is said to be a subgradient of $\rho$ at $u$ if $\rho(z) \ge \rho(u) + (z - u)'\psi(u)$ for any $z \in \mathbb{R}^p$.) Note that $\rho$ is differentiable at $u$ in the usual sense if and only if it has a unique subgradient at $u$. In this case,
$$\psi(u) = \nabla\rho(u) =: \Big(\frac{\partial\rho}{\partial u_1}, \dots, \frac{\partial\rho}{\partial u_p}\Big)'.$$
Denote by $D$ the set of points where $\rho$ is not differentiable. This is, in fact, the set of points where $\psi$ is discontinuous, which is the same for every choice of $\psi$. It is well known that $D$ is topologically an $F_\sigma$ set of Lebesgue measure zero (refer to Rockafellar, 1970, Section 25, p. 218). Bai et al. (1992) made the following assumptions:
(M1) $F(D) = 0$.
(M2) $E\psi(E_1 + u) = \Gamma u + o(\|u\|)$ as $\|u\| \to 0$, where $\Gamma > 0$ is a $p \times p$ constant matrix.
(M3) $E\|\psi(E_1 + u) - \psi(E_1)\|^2$ exists for all sufficiently small $\|u\|$, and is continuous at $u = 0$.
(M4) $E\psi(E_1)\psi'(E_1) =: \Sigma > 0$.
(M5) $S_n = \sum X_i X_i' > 0$ for large $n$, and
$$d_n^2 =: \max_{1 \le i \le n} \mathrm{tr}(X_i' S_n^{-1} X_i) \to 0 \quad \text{as } n \to \infty. \tag{16}$$
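The subgradient used in (M1)–(M4) can be made concrete. The sketch below uses, purely as an example, $\rho(u) = \|u\|$ (the Euclidean norm, which leads to a spatial-median type estimate). This $\rho$ is differentiable everywhere except at $u = 0$, so $D = \{0\}$ and (M1) holds whenever $F$ puts no mass at the origin. The helper names are hypothetical, and the random check only spot-tests the defining inequality; it does not prove it.

```python
import math
import random
from typing import List, Sequence


def rho(u: Sequence[float]) -> float:
    """Illustrative convex discrepancy of p variables: the Euclidean norm."""
    return math.sqrt(sum(c * c for c in u))


def psi(u: Sequence[float]) -> List[float]:
    """One choice of subgradient of rho at u.

    rho is differentiable except at u = 0 (so D = {0}); there any vector of
    norm <= 1 is a subgradient, and this sketch picks the zero vector.
    """
    r = rho(u)
    if r == 0.0:
        return [0.0] * len(u)
    return [c / r for c in u]


def is_subgradient(u: Sequence[float], g: Sequence[float],
                   trials: int = 1000, seed: int = 0) -> bool:
    """Spot-check rho(z) >= rho(u) + (z - u)'g at random z (not a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        z = [rng.uniform(-5.0, 5.0) for _ in u]
        lower = rho(u) + sum((zi - ui) * gi for zi, ui, gi in zip(z, u, g))
        if rho(z) < lower - 1e-12:
            return False
    return True


u = [1.0, -2.0, 0.5]
print(psi(u), is_subgradient(u, psi(u)))
print(is_subgradient([1.0, 0.0, 0.0], [2.0, 0.0, 0.0]))  # g has norm 2 > 1
```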
They established the following result.
Theorem 11. Under assumptions (M1)–(M5),
$$T_n^{-1/2} K_n (\hat\beta_n - \beta) \to N(0, I_m) \quad \text{in distribution, as } n \to \infty, \tag{17}$$
where
$$T_n = \sum X_i \Sigma X_i', \qquad K_n = \sum X_i \Gamma X_i'. \tag{18}$$
For the case when $\hat\beta_n$ satisfies (10) in model (1), the asymptotic normality was discussed in Chen and Zhao (1996).
5. M-Tests
In many situations we are interested in testing the linear hypothesis
$$H_0: H'(\beta - b) = 0 \quad \text{against} \quad H_1: H'(\beta - b) \ne 0 \tag{19}$$
in model (1), where $H'$ is a known $q \times p$ matrix of rank $q$ and $b$ is a known $p$-vector ($0 < q < p$). Put
$$\inf_{\beta:\, H'(\beta - b) = 0} \sum \rho(Y_i - x_i'\beta) =: \sum \rho(Y_i - x_i'\beta_n^*) =: M_n^*, \tag{20}$$
$$\inf_{\beta \in \mathbb{R}^p} \sum \rho(Y_i - x_i'\beta) =: \sum \rho(Y_i - x_i'\hat\beta_n) =: \hat M_n. \tag{21}$$
To test hypothesis (19), Schrader and Hettmansperger (1980) studied the asymptotic distribution of $M_n^* - \hat M_n$ under $H_0$ and the three conditions of Huber (1973), where $\rho$ is assumed to be a nonmonotonic convex function possessing bounded derivatives of sufficiently high order. Based on this, they proposed some related test statistics. McKean and Schrader (1987), Koenker (1987) and Bai et al. (1990) studied the case $\rho(u) = |u|$.
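For $\rho(u) = |u|$ the statistic $M_n^* - \hat M_n$ is the difference of two LAD fits. The sketch below assumes a straight-line model $Y_i = a + b t_i + e_i$ with $H_0$: $b = 0$ (so $p = 2$, $q = 1$). The restricted fit is then the sample median, and the unrestricted fit is found by brute force over lines through pairs of observations, using the linear-programming fact that some LAD solution interpolates $p$ observations when the design has full rank. This $O(n^3)$ search is only meant for small $n$; the data-generating choices and function names are hypothetical.

```python
import itertools
import random
import statistics


def sad(y, t, a: float, b: float) -> float:
    """Sum of absolute deviations sum_i |y_i - a - b t_i|."""
    return sum(abs(yi - a - b * ti) for yi, ti in zip(y, t))


def lad_line(y, t):
    """Unrestricted LAD fit (value, a, b) of y_i = a + b t_i by brute force."""
    best = None
    for i, j in itertools.combinations(range(len(y)), 2):
        if t[i] == t[j]:
            continue  # no unique line through two points with equal t
        b = (y[j] - y[i]) / (t[j] - t[i])
        a = y[i] - b * t[i]
        value = sad(y, t, a, b)
        if best is None or value < best[0]:
            best = (value, a, b)
    if best is None:
        raise ValueError("design is not of full rank")
    return best


def lad_test_statistic(y, t) -> float:
    """M_n^* - M_hat_n for H0: b = 0 with rho(u) = |u|."""
    m_star = sad(y, t, statistics.median(y), 0.0)  # restricted minimum (20)
    m_hat = lad_line(y, t)[0]                       # unrestricted minimum (21)
    return m_star - m_hat


rng = random.Random(11)
t = [i / 10 for i in range(30)]
y_null = [1.0 + rng.uniform(-1, 1) for _ in t]
y_alt = [1.0 + 2.0 * ti + rng.uniform(-1, 1) for ti in t]
print(lad_test_statistic(y_null, t), lad_test_statistic(y_alt, t))
```

The statistic is nonnegative by construction, and it is much larger when the slope is clearly nonzero; turning it into a calibrated test still requires the scaling by the nuisance parameters discussed in this section.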
There are also other test criteria, for example, Wald's test criterion $W_n$, Rao's score-type criterion $R_n$ (refer to Rao, 1948) and a similar criterion $R_n^*$ studied by Sen (1982):
$$W_n = \hat\beta_{(n)}' H_n H_n' \hat\beta_{(n)} = (\hat\beta_n - b)' H (H' S_n^{-1} H)^{-1} H' (\hat\beta_n - b),$$
$$R_n = \xi_n(\beta_n^*)' \xi_n(\beta_n^*), \qquad R_n^* = \xi_n(\beta_n^*)' H_n H_n' \xi_n(\beta_n^*), \tag{22}$$
where $\beta_n^*$ and $\hat\beta_n$ are defined in (20) and (21), and
$$\hat\beta_{(n)} = S_n^{1/2}(\hat\beta_n - b), \qquad H_n = S_n^{-1/2} H (H' S_n^{-1} H)^{-1/2}: p \times q, \qquad \xi_n(\beta_n^*) = S_n^{-1/2} \sum x_i \psi(Y_i - x_i'\beta_n^*), \tag{23}$$
and $H_n' H_n = I_q$. The case considered by Sen (1982) is the canonical form with $H' = (0, I_q)$, and $\beta_n^*$ in the expression of $R_n^*$ is defined as a solution of the equation $K' \sum x_i \psi(Y_i - x_i'\beta) = 0$ under the restraint $H'\beta = 0$, where $K' = (I_{p-q}, 0)$ is a $(p - q) \times p$ matrix. To establish the asymptotic chi-square distribution of $R_n^*$ under $H_0$, he imposed many restrictions on $\psi$, $F$ and $\{x_i\}$. Some improvement was made in Singer and Sen (1985). Under some mild conditions, Zhao and Chen (1991) established the limiting distributions of $M_n^* - \hat M_n$ and $R_n^*$ under the following local alternatives:
$$H'(\beta - b) = H'\omega_n, \tag{24}$$
where $\omega_n: p \times 1$ satisfies
$$\|S_n^{1/2}\omega_n\| = O(1) \quad \text{as } n \to \infty. \tag{25}$$
They also studied estimates of the nuisance parameters, which are given below. Chen and Zhao (1996) revisited this problem and introduced the following assumptions in model (1):
(B1) $\rho(\cdot)$ is a convex function on $\mathbb{R}^1$ with right and left derivatives $\psi_+(\cdot)$ and $\psi_-(\cdot)$, and $\psi(\cdot)$ is a function such that $\psi_-(u) \le \psi(u) \le \psi_+(u)$ for all $u \in \mathbb{R}^1$.
(B2) $E\psi(e_1 + u) = \lambda u + o(|u|)$ as $u \to 0$, where $\lambda > 0$ is a constant.
(B3) $0 < E\psi^2(e_1) =: \sigma^2 < \infty$, and $\lim_{u \to 0} E(\psi(e_1 + u) - \psi(e_1))^2 = 0$.
(B4) $d_n^2 =: \max_{1 \le i \le n} x_i' S_n^{-1} x_i \to 0$ as $n \to \infty$.
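They proved the following result.
Theorem 12. Assume that in model (1) (B1)–(B4), (24) and (25) are satisfied. Then
$$2\lambda\sigma^{-2}(M_n^* - \hat M_n) = \lambda^2\sigma^{-2} W_n + o_p(1) = \sigma^{-2} R_n + o_p(1) = \sigma^{-2} R_n^* + o_p(1) = \Big\| \sigma^{-1} H_n' \sum x_{in}\psi(e_i) + \lambda\sigma^{-1}\eta_n \Big\|^2 + o_p(1), \tag{26}$$
where
$$x_{in} = S_n^{-1/2} x_i, \qquad \eta_n = H_n' S_n^{1/2} \omega_n: q \times 1.$$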
By the Lindeberg theorem,
$$\sigma^{-1} H_n' \sum x_{in}\psi(e_i) \to N(0, I_q) \quad \text{in distribution}.$$
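Therefore, under the conditions of Theorem 12, $2\lambda\sigma^{-2}(M_n^* - \hat M_n)$, $\lambda^2\sigma^{-2}W_n$, $\sigma^{-2}R_n$ and $\sigma^{-2}R_n^*$ are all asymptotically equivalent to a noncentral $\chi^2$ random variable with $q$ degrees of freedom and noncentrality parameter
$$\delta_n = \lambda^2\sigma^{-2}\eta_n'\eta_n = \lambda^2\sigma^{-2}\omega_n' S_n^{1/2} H_n H_n' S_n^{1/2}\omega_n,$$
and each of them converges in distribution to $\chi_q^2$ if $\|S_n^{1/2}\omega_n\| \to 0$ as $n \to \infty$. To estimate $\sigma^2$, the estimate
$$\hat\sigma_n^2 = n^{-1}\sum \psi^2(Y_i - x_i'\hat\beta_n) \tag{27}$$
was suggested. To estimate $\lambda$, take $h = h_n > 0$ such that
$$h_n/d_n \to \infty, \qquad h_n \to 0, \qquad \text{and} \qquad \liminf_{n\to\infty} n h_n^2 > 0, \tag{28}$$
and define
$$\hat\lambda_n = (2nh)^{-1}\sum \{\psi(Y_i - x_i'\hat\beta_n + h) - \psi(Y_i - x_i'\hat\beta_n - h)\}. \tag{29}$$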
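The estimates (27) and (29) are straightforward to compute. The sketch below assumes, for illustration only, the location model $x_i = 1$ with Huber's $\psi(u) = \max\{-k, \min(k, u)\}$ and $k = 1.345$, with $\hat\beta_n$ obtained by solving (3) through bisection. The bandwidth $h = n^{-1/3}$ is a hypothetical choice that satisfies (28), because $d_n = n^{-1/2}$ here. For standard normal errors, $\lambda = 2\Phi(k) - 1$ and $\sigma^2 = \lambda - 2k\phi(k) + k^2(1 - \lambda)$, which gives targets for the estimates. Function names are hypothetical.

```python
import math
import random

K = 1.345  # Huber tuning constant (illustrative choice)


def huber_psi(u: float, k: float = K) -> float:
    """Huber's psi, a nondecreasing choice satisfying (B1)."""
    return max(-k, min(k, u))


def huber_location(y, k: float = K, tol: float = 1e-10) -> float:
    """Solve sum_i psi(y_i - beta) = 0 by bisection; the sum is nonincreasing in beta."""
    lo, hi = min(y), max(y)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(huber_psi(yi - mid, k) for yi in y) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nuisance_estimates(y, beta_hat: float, k: float = K):
    """Return (sigma_hat_n^2, lambda_hat_n) as in (27) and (29) with x_i = 1."""
    n = len(y)
    h = n ** (-1.0 / 3.0)  # hypothetical bandwidth satisfying (28)
    r = [yi - beta_hat for yi in y]
    sigma2 = sum(huber_psi(ri, k) ** 2 for ri in r) / n
    lam = sum(huber_psi(ri + h, k) - huber_psi(ri - h, k) for ri in r) / (2 * n * h)
    return sigma2, lam


def normal_targets(k: float = K):
    """Population (sigma^2, lambda) for N(0, 1) errors."""
    lam = math.erf(k / math.sqrt(2.0))  # equals 2 Phi(k) - 1
    phi_k = math.exp(-0.5 * k * k) / math.sqrt(2.0 * math.pi)
    sigma2 = lam - 2.0 * k * phi_k + k * k * (1.0 - lam)
    return sigma2, lam


rng = random.Random(3)
y = [2.0 + rng.gauss(0.0, 1.0) for _ in range(10000)]
beta_hat = huber_location(y)
print(beta_hat, nuisance_estimates(y, beta_hat), normal_targets())
```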
Theorem 13. Suppose that in model (1) (B1)–(B4) and (28) are met, and $\beta$ is the true parameter. Then
$$\hat\sigma_n^2 \to \sigma^2 \ \text{in pr.}, \qquad \hat\lambda_n \to \lambda \ \text{in pr.} \qquad \text{as } n \to \infty.$$
Refer to Chen and Zhao (1996). It is easily seen that we can take $2\hat\lambda_n\hat\sigma_n^{-2}(M_n^* - \hat M_n)$, $\hat\lambda_n^2\hat\sigma_n^{-2}W_n$, $\hat\sigma_n^{-2}R_n$ and $\hat\sigma_n^{-2}R_n^*$ as test statistics for testing hypothesis (19), and that under the conditions of Theorem 13, (26) remains true when $\lambda$ and $\sigma^2$ are replaced by $\hat\lambda_n$ and $\hat\sigma_n^2$, respectively. Note that the limiting distributions of $M_n^* - \hat M_n$ and $W_n$ under $H_0$ were also obtained by Bai et al. (1992), who considered these two test criteria in the more general multivariate linear model (14). Now we consider a special case of model (14), the standard multivariate linear model
$$Y_i = B'x_i + E_i, \qquad i = 1, \dots, n, \tag{30}$$
where the $E_i$ are i.i.d. $p$-vectors with a common distribution $F$, the $x_i$ are given $m$-vectors, and $B$ is an unknown $m \times p$ matrix of regression coefficients. It is of interest to test the hypothesis
$$H_0: H'B = C_0, \tag{31}$$
where $H$ is a known $m \times q$ matrix of rank $q$ and $C_0$ is a known $q \times p$ matrix. Using the M-method, Bai et al. (1993) developed a MANOVA-type analysis leading to test criteria for $H_0$ based on the roots of a determinantal equation. Denote by $\hat B_n$ and $B_n^*$ any values of $B$ which minimize
$$\sum \rho(Y_i - B'x_i) \tag{32}$$
without any restraint and subject to restraint (31), respectively, where $\rho$ is a given convex function of $p$ variables. As before, let $\psi(u)$ be a choice of a subgradient of $\rho$ at $u \in \mathbb{R}^p$, and assume that (M1)–(M4) and the following (M5') are satisfied.
(M5') $S_n = \sum x_i x_i' > 0$ for large $n$, and $d_n^2 =: \max_{1 \le i \le n} x_i' S_n^{-1} x_i \to 0$ as $n \to \infty$.
For testing $H_0$, Bai et al. (1993) proposed two alternative test criteria. One is based on the roots of the determinantal equation
$$|W_n - \theta\, \hat\Gamma_n^{-1}\hat\Sigma_n\hat\Gamma_n^{-1}| = 0, \tag{33}$$
where
$$W_n = (H'\hat B_n - C_0)'(H'S_n^{-1}H)^{-1}(H'\hat B_n - C_0) \tag{34}$$
is the Wald-type statistic, and $(\hat\Gamma_n, \hat\Sigma_n)$ is a consistent estimate of $(\Gamma, \Sigma)$, the matrix parameters defined in (M2) and (M4), respectively. The other test is based on the roots of the determinantal equation
$$|R_n - \theta\,\hat\Sigma_n| = 0, \tag{35}$$
where $R_n = \zeta(B_n^*)' S_n^{-1} \zeta(B_n^*)$ with
$$\zeta(B) = \sum x_i \psi'(Y_i - B'x_i) \tag{36}$$
is Rao's score-type statistic, and $\hat\Sigma_n$ is a consistent estimate of $\Sigma$. The asymptotic distribution of the roots of (33) or (35) is the same as in the normal theory, and hence the tests proposed by Fisher and Hsu (see, for instance, Rao, 1973, pp. 556–560) can be used. Consider a sequence of local alternatives to the null hypothesis $H'B = C_0$, say,
$$H'B - C_0 = H'\Omega_n, \tag{37}$$
where $\Omega_n$ is a known $m \times p$ matrix such that
$$\|S_n^{1/2}\Omega_n\| = O(1). \tag{38}$$
Write
$$x_{in} = S_n^{-1/2}x_i, \qquad H_n = S_n^{-1/2}H(H'S_n^{-1}H)^{-1/2},$$
$$U_n' = \Gamma^{-1}\sum \psi(E_i)x_{in}'H_n = (u_{1n}, \dots, u_{qn}): p \times q, \qquad V_n' = \sum \psi(E_i)x_{in}'H_n = (v_{1n}, \dots, v_{qn}): p \times q,$$
$$\Lambda_n = H_n'S_n^{1/2}\Omega_n = (H'S_n^{-1}H)^{-1/2}H'\Omega_n,$$
where $\sum x_{in}x_{in}' = I_m$ and $H_n'H_n = I_q$. It is easily seen that $u_{1n}, \dots, u_{qn}$ are asymptotically independent with common limiting distribution $N_p(0, \Gamma^{-1}\Sigma\Gamma^{-1})$, so that the limiting distribution of $U_n'U_n$ is central Wishart on $q$ degrees of freedom, $W_p(q, \Gamma^{-1}\Sigma\Gamma^{-1})$. Similarly, $v_{1n}, \dots, v_{qn}$ are asymptotically independent with common limiting distribution $N_p(0, \Sigma)$, so that the limiting distribution of $V_n'V_n$ is central Wishart on $q$ degrees of freedom, $W_p(q, \Sigma)$. Bai et al. (1993) established the following theorems concerning the asymptotic distributions of $W_n$ and $R_n$ under the null hypothesis and under the sequence of alternative hypotheses (37).
Theorem 14. Assume that under model (30), (M1)–(M4), (M5'), (37) and (38) are satisfied. Then
$$W_n = (U_n + \Lambda_n)'(U_n + \Lambda_n) + o_p(1) \quad \text{as } n \to \infty.$$
In particular, if $H_0$ holds or $\|S_n^{1/2}\Omega_n\| \to 0$ as $n \to \infty$, the asymptotic distribution of $W_n$ is the central Wishart $W_p(q, \Gamma^{-1}\Sigma\Gamma^{-1})$. If $\Lambda_n$ has a limit $\Lambda \ne 0$ as $n \to \infty$, the asymptotic distribution of $W_n$ is the noncentral Wishart $W_p(q, \Gamma^{-1}\Sigma\Gamma^{-1}, \Lambda'\Lambda)$.
Theorem 15. Under the conditions of Theorem 14,
$$R_n = (V_n + \Lambda_n\Gamma)'(V_n + \Lambda_n\Gamma) + o_p(1) \quad \text{as } n \to \infty.$$
In particular, if $H_0$ holds or $\|S_n^{1/2}\Omega_n\| \to 0$ as $n \to \infty$, the asymptotic distribution of $R_n$ is the central Wishart $W_p(q, \Sigma)$. If $\Lambda_n$ has a limit $\Lambda \ne 0$ as $n \to \infty$, the asymptotic distribution of $R_n$ is the noncentral Wishart $W_p(q, \Sigma, \Gamma\Lambda'\Lambda\Gamma)$.
Note that the local power for the sequence of alternatives considered depends on the magnitude of the roots of the equation $|\Lambda'\Lambda - \theta\,\Gamma^{-1}\Sigma\Gamma^{-1}| = 0$ for the test based on $W_n$ and $|\Gamma\Lambda'\Lambda\Gamma - \theta\,\Sigma| = 0$ for the test based on $R_n$. Since the roots of these two equations are the same (multiply the matrix in the first equation by $\Gamma$ on both sides), the two alternative tests are asymptotically equally efficient. In practical applications, we need to estimate the nuisance matrix parameters $\Sigma$ and $\Gamma$. A natural estimate of $\Sigma$ is
$$\hat\Sigma_n = n^{-1}\sum \psi(Y_i - \hat B_n'x_i)\,\psi'(Y_i - \hat B_n'x_i).$$
To estimate $\Gamma$, take a $p \times p$ nonsingular matrix $Z$ with columns $z_1, \dots, z_p$, and take $h = h_n > 0$ such that
$$h_n/d_n \to \infty, \qquad h_n \to 0, \qquad \text{and} \qquad \liminf_{n\to\infty} n h_n^2 > 0. \tag{39}$$
Define
$$\gamma_{kn} = (2nh)^{-1}\sum \{\psi(Y_i - \hat B_n'x_i + hz_k) - \psi(Y_i - \hat B_n'x_i - hz_k)\}, \qquad k = 1, \dots, p,$$
$$A_n = (\gamma_{1n}, \dots, \gamma_{pn})Z^{-1},$$
and use $\hat\Gamma_n = (A_n + A_n')/2$ as an estimate of $\Gamma$. Bai et al. (1993) established the following result.
Theorem 16. Assume that (M1)–(M4) and (M5') hold in model (30), and $B$ is the true parameter. Then
$$\hat\Sigma_n \to \Sigma \quad \text{in pr. as } n \to \infty.$$
Furthermore, if (39) also holds, then
$$\hat\Gamma_n \to \Gamma \quad \text{in pr. as } n \to \infty.$$
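The finite-difference construction of $\hat\Gamma_n$ can be sketched for $p = 2$. The setting is an assumption chosen for illustration: $m = 1$ and $x_i = 1$, with the separable discrepancy $\rho(u) = \rho_H(u_1) + \rho_H(u_2)$ built from Huber's function, so that $\psi$ is the componentwise Huber $\psi$ and $\hat B_n$ is the vector of componentwise Huber location estimates. With independent standard normal coordinates, $\Gamma = \mathrm{diag}(\lambda, \lambda)$ with $\lambda = 2\Phi(k) - 1$. The bandwidth $h = n^{-1/3}$, the matrix $Z$ and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

C = 1.345  # Huber tuning constant for each coordinate (illustrative)


def psi_vec(u: Sequence[float]) -> List[float]:
    """Gradient of rho(u) = sum_j rho_H(u_j): componentwise Huber psi."""
    return [max(-C, min(C, c)) for c in u]


def huber_location_1d(values: Sequence[float], tol: float = 1e-10) -> float:
    lo, hi = min(values), max(values)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(max(-C, min(C, v - mid)) for v in values) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gamma_hat(y: List[List[float]], z: List[List[float]]) -> List[List[float]]:
    """Gamma_hat_n = (A_n + A_n')/2, A_n = (gamma_1n, gamma_2n) Z^{-1}, for p = 2.

    Assumed setting: m = 1 and x_i = 1, so B_hat_n is the vector of
    componentwise Huber location estimates; z[r][k] is entry r of column z_k.
    """
    n, p = len(y), 2
    b_hat = [huber_location_1d([row[j] for row in y]) for j in range(p)]
    h = n ** (-1.0 / 3.0)  # hypothetical bandwidth satisfying (39)
    g = [[0.0] * p for _ in range(p)]  # g[j][k]: entry j of gamma_kn
    for row in y:
        r = [row[j] - b_hat[j] for j in range(p)]
        for k in range(p):
            plus = psi_vec([r[j] + h * z[j][k] for j in range(p)])
            minus = psi_vec([r[j] - h * z[j][k] for j in range(p)])
            for j in range(p):
                g[j][k] += (plus[j] - minus[j]) / (2 * n * h)
    det = z[0][0] * z[1][1] - z[0][1] * z[1][0]
    z_inv = [[z[1][1] / det, -z[0][1] / det], [-z[1][0] / det, z[0][0] / det]]
    a = [[sum(g[i][l] * z_inv[l][j] for l in range(p)) for j in range(p)] for i in range(p)]
    return [[(a[i][j] + a[j][i]) / 2 for j in range(p)] for i in range(p)]


rng = random.Random(17)
y = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(5000)]
z = [[1.0, 1.0], [0.0, 2.0]]  # columns z_1 = (1, 0)', z_2 = (1, 2)'
print(gamma_hat(y, z), math.erf(C / math.sqrt(2.0)))
```

Each $\gamma_{kn}$ approximates $\Gamma z_k$, so multiplying by $Z^{-1}$ recovers $\Gamma$; the final symmetrization only removes sampling asymmetry.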
6. The Bahadur representation
In model (1), let $\hat\beta_n$ be an M-estimate of $\beta$ obtained by minimizing (2). An important topic concerning $\hat\beta_n$ is its Bahadur representation, in which $\hat\beta_n - \beta$ is written as a sum of $n$ independent summands plus a remainder whose order of magnitude is controlled with probability one (or in probability) as $n \to \infty$; the representation is then called strong (or weak). This representation has important applications in studying convergence properties such as the law of the iterated logarithm, the Berry–Esseen bound and so on. This direction was initiated by Bahadur (1966), who studied the strong representation of the sample quantile in the location-parameter case. This result was extended by Babu (1989) to the LAD estimation of $\beta$ in model (1). For a detailed history of this subject, the reader is referred to Serfling (1980). Rao and Zhao (1992b) studied the Bahadur representation of $\hat\beta_n$ under the following assumptions:
(I) The same as (B1).
(IIa) There exist positive constants $c$ and $h_0$ such that $\psi(u + h) - \psi(u) \le c$ for any $h \in (0, h_0)$ and any $u$.
(IIb) $\psi$ satisfies the Lipschitz condition.
(IIIa) $G(u) =: E\psi(e_1 + u)$ has a derivative function $g(u)$ in a neighborhood of zero, say $(-\Delta, \Delta)$, with $G(0) = 0$ and $g(0) > 0$, and
$$|g(u) - g(0)| \le c|u|^{1/2} \quad \text{for some } c > 0 \text{ and } |u| \le \Delta. \tag{40}$$
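(IIIb) In (IIIa), (40) is strengthened to
$$|g(u) - g(0)| \le c|u| \quad \text{for some } c > 0 \text{ and } |u| \le \Delta. \tag{41}$$
(IVa) $S_n > 0$ and $d_n(\log n)^{1/2} \to 0$ as $n \to \infty$, where $d_n$ is defined in (5).
(IVb) $d_n \le cn^{-\delta}$ for some $c > 0$ and $0 < \delta \le \tfrac12$.
Write
$$r_n = S_n^{1/2}(\hat\beta_n - \beta) - \{g(0)\}^{-1} S_n^{-1/2}\sum x_i\psi(e_i). \tag{42}$$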
Rao and Zhao (1992b) established the following theorems, which generalize the results of Bahadur and Babu.
Theorem 17. Assume that in model (1) (I) and (IVa) are met, and that the moment generating function $M(u) = E\exp\{u\psi(e_1)\}$ exists for $|u| < \Delta_0$, for some $\Delta_0 > 0$. Then the following conclusions are true:
(1) If (IIa) and (IIIa) are met,
$$\|r_n\| = O(d_n^{1/2}(\log n)^{3/4}) \quad \text{a.s.} \tag{43}$$
(2) If (IIb) and (IIIb) are met,
$$\|r_n\| = O(d_n \log n) \quad \text{a.s.} \tag{44}$$
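The remainder $r_n$ in (42) can be examined numerically in the simplest case covered by Theorem 17(1). The sketch assumes (our choice, not the original's) the location model $x_i = 1$, $\beta = 0$, $\psi(u) = \mathrm{sign}(u)$ and $e_i$ uniform on $(-1, 1)$. Then $G(u) = E\,\mathrm{sign}(e_1 + u) = u$ for $|u| \le 1$, so $g(0) = 1$; (IIa), (IIIa) and (IVa) hold, the moment generating function exists, and since $d_n = n^{-1/2}$, (43) gives $|r_n| = O(n^{-1/4}(\log n)^{3/4})$ a.s. The average $|r_n|$ over replications should therefore shrink as $n$ grows. Function names are hypothetical.

```python
import math
import random
import statistics


def bahadur_remainder(n: int, rng: random.Random) -> float:
    """r_n of (42) for the sample median: x_i = 1, beta = 0, psi = sign, g(0) = 1.

    Assumed setting (illustrative): e_i ~ Uniform(-1, 1) and n odd, so the
    LAD estimate is the unique sample median and S_n = n.
    """
    e = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    beta_hat = statistics.median(e)
    score = sum((ei > 0) - (ei < 0) for ei in e)  # sum_i sign(e_i)
    return math.sqrt(n) * beta_hat - score / math.sqrt(n)


def mean_abs_remainder(n: int, reps: int = 200, seed: int = 5) -> float:
    rng = random.Random(seed)
    return statistics.fmean(abs(bahadur_remainder(n, rng)) for _ in range(reps))


for n in (101, 1001, 4001):
    rate = n ** -0.25 * math.log(n) ** 0.75  # the order in (43) with d_n = n^(-1/2)
    print(n, mean_abs_remainder(n), rate)
```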
Theorem 18. Assume that in model (1) (I) and (IVb) are met, and $E|\psi(e_1)|^t < \infty$ for some $t \ge 2 + \delta^{-1}$, where $\delta$ is defined in (IVb). Then (IIa) and (IIIa) imply (43), and (IIb) and (IIIb) imply (44).
Denote by $\mu_n$ and $\nu_n$ the smallest and largest eigenvalues of $S_n$, respectively. If $\nu_n = O(\mu_n)$ as $n \to \infty$, the order of magnitude of $\|r_n\|$ can be improved. This case is of interest in applications, since it includes the important case $S_n/n \to Q > 0$. Recently, Chen (1994) established the following theorems (refer to Chen and Zhao, 1996).
Theorem 19. Assume that in model (1) assumptions (I), (IIa), (IIIa) and the following conditions are satisfied:
(V) $E|\psi(e_1)|^3 < \infty$.
(VI) $d_n^2 = o((\log n)^{-2}(\log\log n)^{-1})$ and $d_n^2 = O((\log n)^{-\kappa})$ for some $\kappa > 2$.
(VII) $\nu_n = O(\mu_n)$.
Then
$$\|r_n\| = O(d_n^{1/2}(\log n)^{1/2}(\log\log n)^{1/4}) \quad \text{a.s.} \tag{45}$$
Theorem 20. Assume that in model (1) assumptions (I), (IIb), (IIIb), (V) and (VII) are satisfied, and that (VI) is replaced by
(VI') $d_n^2 = o((\log n \log\log n)^{-1})$ and $d_n^2 = O((\log n)^{-\kappa})$ for some $\kappa > 2$.
Then
$$\|r_n\| = O(d_n(\log n \log\log n)^{1/2}) \quad \text{a.s.} \tag{46}$$
Chen (1994) also studied the strong representation of an M-estimate of $\beta$ when $\rho$ is not required to be convex (also refer to Chen and Zhao, 1996). For the weak representation, see Bai et al. (1991) and Chen and Zhao (1996). He and Shao (1996) also obtained some results in this direction.
References
Amemiya, T., 1982. Two stage least absolute deviations estimators. Econometrica 50, 689–711.
Babu, G.J., 1989. Strong representations for LAD estimators in linear models. Probab. Theory Related Fields 83, 547–558.
Bahadur, R.R., 1966. A note on quantiles in large samples. Ann. Math. Statist. 37, 577–580.
Bai, Z.D., Chen, X.R., Wu, Y.H., Zhao, L.C., 1987. Asymptotic normality of minimum L1-norm estimates in linear models. Technical Report 87-35, Center for Multivariate Analysis, University of Pittsburgh.
Bai, Z.D., Rao, C.R., Wu, Y., 1992. M-estimation of multivariate linear regression parameters under a convex discrepancy function. Statist. Sinica 2, 237–254.
Bai, Z.D., Rao, C.R., Yin, Y.Q., 1990. Least absolute deviations analysis of variance. Sankhya Ser. A 52, 166–177.
Bai, Z.D., Rao, C.R., Zhao, L.C., 1991. Weak representation in multivariate linear models obtained by minimizing a convex function of residuals. Technical Report 91-04, Center for Multivariate Analysis, Penn State University.
Bai, Z.D., Rao, C.R., Zhao, L.C., 1993. MANOVA type test under a convex discrepancy function for the standard multivariate linear model. J. Statist. Plann. Inference 36, 77–90.
Bassett, G., Koenker, R., 1978. Asymptotic theory of least absolute error regression. J. Amer. Statist. Assoc. 73, 618–622.
Bloomfield, P., Steiger, W.L., 1983. Least Absolute Deviations. Birkhäuser, Boston.
Chen, X.R., 1979. Consistency of least squares estimates in linear models (Special Issue). Sci. Sinica 22 (2), 162–176.
Chen, X.R., 1994. Linear representation of M-estimates of multiple regression coefficients. Sci. China Ser. A 37, 162–177.
Chen, X.R., Bai, Z.D., Zhao, L.C., Wu, Y.H., 1990. Asymptotic normality of minimum L1-norm estimates in linear models. Sci. China Ser. A (Chinese Edition) 20, 162–177 (English Edition: 33, 1311–1328).
Chen, X.R., Bai, Z.D., Zhao, L.C., Wu, Y.H., 1992. Consistency of minimum L1-norm estimates in linear models. In: Chen, X.R., Fang, K.T., Yang, C.C. (Eds.), The Development of Statistics: Recent Contributions from China, Pitman Research Notes in Mathematics Series, Vol. 258. Longman, Harlow, Essex, UK.
Chen, X.R., Wu, Y.H., 1988. Strong consistency of M-estimates in linear models. J. Multivariate Anal. 27, 116–130.
Chen, X.R., Wu, Y., Zhao, L.C., 1995. A necessary condition for the consistency of L1 estimates in linear models. Sankhya Ser. A 57, 384–392.
Chen, X.R., Zhao, L.C., Wu, Y.H., 1993. On conditions of consistency of ML1N estimates in linear models. Statist. Sinica 3, 9–18.
Chen, X.R., Zhao, L.C., 1995. Strong consistency of M-estimates of multiple regression coefficients. Systems Sci. Math. Sci. 8, 82–87.
Chen, X.R., Zhao, L.C., 1996. M-Methods in Linear Model. Shanghai Scientific & Technical Publishers, Shanghai.
Drygas, H., 1976. Weak and strong consistency of the least squares estimators in regression models. Z. Wahrsch. Verw. Gebiete 34, 119–127.
Dupacova, J., 1987. Asymptotic properties of restricted L1-estimates of regression. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods. North-Holland, Amsterdam, pp. 263–274.
He, X.M., Shao, Q.M., 1996. A general Bahadur representation of M-estimators and its application to linear regression with nonstochastic designs. Ann. Statist. 24, 2608–2631.
Huber, P.J., 1973. Robust regression. Ann. Statist. 1, 799–821.
Huber, P.J., 1981. Robust Statistics. Wiley, New York.
Jureckova, J., Sen, P.K., 1996. Robust Statistical Procedures: Asymptotics and Interrelations. Wiley, New York.
Koenker, R.W., 1987. A comparison of asymptotic testing methods for L1-regression. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods. North-Holland, Amsterdam, pp. 287–295.
Lai, T.L., Robbins, H., Wei, C.Z., 1979. Strong consistency of least squares estimates in multiple regression II. J. Multivariate Anal. 9, 343–362.
McKean, J.W., Schrader, R.M., 1987. Least absolute errors analysis of variance. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods. North-Holland, Amsterdam, pp. 297–305.
Pollard, D., 1991. Asymptotics for least absolute deviation regression estimators. Econometric Theory 7, 186–199.
Portnoy, S., 1984. Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. I. Consistency. Ann. Statist. 12, 1298–1309.
Portnoy, S., 1985. Asymptotic behaviour of M-estimators of p regression parameters when p²/n is large. II. Normal approximation. Ann. Statist. 13, 1403–1417. Correction. Ann. Statist. 19 (1991) 2282.
Rao, C.R., 1948. Tests of significance in multivariate analysis. Biometrika 35, 58–79.
Rao, C.R., 1973. Linear Statistical Inference and Its Applications, 2nd Edition. Wiley, New York.
Rao, C.R., Toutenburg, H., 1995. Linear Models, Least Squares and Alternatives. Springer, New York.
Rao, C.R., Zhao, L.C., 1992a. On the consistency of M-estimate in a linear model obtained through an estimating equation. Statist. Probab. Lett. 14, 79–84.
Rao, C.R., Zhao, L.C., 1992b. Linear representation of M-estimates in linear models. Canad. J. Statist. 20, 359–368.
Rockafellar, R.T., 1970. Convex Analysis. Princeton University Press, Princeton, NJ.
Schrader, R.M., Hettmansperger, T.P., 1980. Robust analysis of variance based upon a likelihood ratio criterion. Biometrika 67, 93–101.
Sen, P.K., 1982. On M tests in linear models. Biometrika 69, 245–248.
Serfling, R., 1980. Approximation Theorems of Mathematical Statistics. Wiley, New York.
Singer, J.M., Sen, P.K., 1985. M-methods in multivariate linear models. J. Multivariate Anal. 17, 168–184.
Welsh, A.H., 1989. On M-processes and M-estimation. Ann. Statist. 17, 337–361. Correction. Ann. Statist. 18 (1990) 1500.
Wu, Y.H., 1988. Strong consistency and exponential rate of the minimum L1-norm estimates in linear regression models. Comput. Statist. Data Anal. 6, 285–295.
Yohai, V.J., Maronna, R.A., 1979. Asymptotic behavior of M-estimators for the linear model. Ann. Statist. 7, 258–268.
Zhao, L.C., Chen, X.R., 1991. Asymptotic behavior of M-test statistics in linear models. J. Combin. Inform. System Sci. 16, 234–248.
Zhao, L.C., Rao, C.R., Chen, X.R., 1993. A note on the consistency of M-estimates in linear models. In: Cambanis, S., Ghosh, J.K., Karandikar, R.L., Sen, P.K. (Eds.), Stochastic Process, A Festschrift in Honour of Gopinath Kallianpur. Springer, New York, pp. 359–367.