Journal of Statistical Planning and Inference 122 (2004) 253 – 269
www.elsevier.com/locate/jspi
Longitudinal data analysis using t-type regression Xuming Hea , Hengjian Cuib , Douglas G. Simpsona;∗ a Department
of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA b Department of Mathematics, Beijing Normal University, Beijing, China Accepted 15 June 2003
Abstract We consider a robust estimator of linear regression for longitudinal data by maximizing marginal likelihood of a scaled t-type error distribution. The marginal likelihood can also be applied to the de-correlated response when the within-subject correlation can be consistently estimated from an initial estimate of the model based on the working assumption of independence. While the t-distributed errors can be motivated from a latent hierarchical model as an extension of Gaussian mixed models, our estimators have asymptotic normal distributions for a wider class of error distributions. The estimators have bounded in6uence functions and can achieve positive breakdown points regardless of the dimension of the covariates. c 2003 Elsevier B.V. All rights reserved. MSC: primary 62F35; secondary 62H20, 62J99 Keywords: Correlation; t-type regression; M -estimator; One-step estimator; Longitudinal data; Asymptotic normality
1. Introduction Robust estimation of regression has been quite well developed over the past three decades. A wide variety of robust estimators have been proposed in the literature motivated by a variety of considerations such as bounded in6uence functions (e.g. GM-estimator of Maronna and Yohai, 1981), high breakdown points (e.g. Rousseeuw and Yohai, 1984), local and global robustness (e.g. Simpson et al., 1992), This research was supported in part by National Science Foundation Grants DMS-0073044 and DMS-0102411 and National Security Agency Grant MDA-904-02-1-0092. ∗ Corresponding author. Tel.: +1 217 333 2167; fax: +1 217 244 7190. E-mail addresses:
[email protected] (X. He),
[email protected] (H. Cui),
[email protected] (D.G. Simpson).
c 2003 Elsevier B.V. All rights reserved. 0378-3758/$ - see front matter doi:10.1016/j.jspi.2003.06.002
254
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
bias-eHciency tradeoIs (Hampel et al., 1986), etc. Recently, He et al. (2000) proposed estimators based on t-type likelihood, which combine a bounded in6uence function with a positive breakdown point using the familiar likelihood approach. They are fully efJcient under a heteroscedastic t model rather than under a homoscedastic Gaussian model. They can be viewed as a fully iterated version of the GM-estimators deJned through optimization rather than through the estimating equation so as to avoid the problem of multiple roots to the estimating equations. The multiple roots to an estimating equation may be utilized for diagnostic purposes, see Markatou (2000) for details. This paper extends the work of He et al. (2000) to the estimation of linear models for longitudinal data. In a longitudinal study individual subjects are measured repeatedly over time. In this paper, we suppose that a random sample of n subjects are measured at q time points and the response yij (i = 1; : : : ; n; j = 1; : : : ; q) is related to covariates xij in a linear fashion. For convenience, we often use vector–matrix form of the model. Let Yi = (yi1 ; : : : ; yiq )T , Xi = (xi1 ; : : : ; xiq ) ∈ Rp×q , and Zi = (1; XiT )T with 1 = (1; : : : ; 1)T ∈ Rq . We assume that Yi = ZiT 0 + i ;
i = 1; : : : ; n;
(1)
for some 0 ∈ Rp+1 and independent error vectors i ∈ Rp . We further assume that i have the common probability density function f(u) = |0 |−1=2 f0 (u 0−1 u) for some q × q positive deJnite matrix 0 . In the rest of the paper, we consider estimating 0 using a t-type pseudo-likelihood applied to properly scaled errors. In Section 2, we motivate our working model with the t-distributed error from a Bayesian hierarchical model with scaled Gaussian random eIects. The t-type likelihood estimators, however, can be used as robust estimators for model (1). In Section 3, we start with a marginal t-likelihood estimator without regard to the within-subject correlation, which is then used to estimate 0 before obtaining a marginal t-likelihood estimator based on the transformed response. A one-step estimator derived from the corresponding estimating equation is also discussed, and a simple cross-checking method can optimize robustness and eHciency from both the initial and the one-step estimators. We derive the asymptotic normality results in Section 4 for the marginal t-likelihood estimators based on the transformed response as well as the one-step estimators. A comparison of asymptotic eHciencies with the multivariate t-likelihood of Lange et al. (1989) is also given. We illustrate these estimation methods in Section 5 through an analysis of the bid-ask price spreads for the NYSE listed stocks and the others traded on NSADAQ. Some technical details are provided in Appendix A.
2. Motivating models It is well-known that the Student t-distribution can be derived by a Gaussian random variable with a random scale. This fact was used by Dempster et al. (1977) and Little (1988) for constructing EM algorithms with t-likelihood. It was further used by Lange et al. (1989) and He et al. (2000) for constructing robust models and robust estimators.
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
255
In the regression setting, if the response y given the covariate z ∈ Rp+1 and a random scale variable s has a normal distribution N (z T ; s2 2 ), and =s2 has the chi-square distribution with degrees of freedom, then the conditional distribution of (y − z T )= given z is the t-distribution with degrees of freedom. This can be generalized to heteroscedastic models by assuming y|z; b; s2 ∼ N (z T b; s2 );
b|s2 ∼ N (; s2 );
s−2 ∼ 2 ()=(2 );
which results in (y − z T ) ∼ t (1 + z T z)1=2 for some vector and a positive-deJnite matrix . If we consider a hierarchical mixed model for longitudinal data yij = zijT + si (bi + eij );
i = 1; : : : ; n; j = 1; : : : ; q;
where bi is a Gaussian random-eIect, zij are design vectors for the Jxed eIects, eij are i.i.d. Gaussian random errors, and si are independent random scale variables with si−2 ∼ 2 ()=, we can integrate out the scale component as before to get Yi = ZiT + Pi ;
(2)
where i = (i1 ; : : : ; iq )T are i.i.d. random vectors having the standard multivariate t distribution with degrees of freedom, and P is some q by q matrix. In this paper, we take (2) as our working model for some general P matrix so that = P Cov(i )P T , the covariance matrix of Yi given Zi , is unspeciJed. If = 2 I , the model reduces to the usual case of homoscedastic, uncorrelated errors, which is often a default assumption for cross-sectional data. The common form of compound symmetry corresponds to = 12 I + 22 11T , where 1 is a vector of 1’s. The present analysis diIers from the classical mixed model analysis in that we do not assume Gaussian errors. We shall introduce marginal t-type likelihood estimators of in the next section without assuming that (2) gives the true error structure for the data. Instead, we assume that the data are from the model (1) with both 0 and f unspeciJed.
3. Marginal t regression estimators We Jrst introduce a marginal M estimator for t regression with correlated errors, and then discuss several reJnements. The idea is to derive marginal estimates by proceeding as if the errors, ij , are i.i.d. random variables, but to develop inferences without assuming the i.i.d. structure. SpeciJcally, the initial estimator, ˆ0 , is deJned by (ˆ0 ; ˆ0 ) = arg
n
∈R
q
1 [#{(yij − zijT )wij } − log()]; ;¿0 n
min p
i=1 j=1
(3)
256
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
where is a scale parameter, wij = wn (xij ) with wn (·) as a given weight function satisfying condition (B1) in Appendix A, and # is a loss function satisfying the conditions (A1) – (A3) in Appendix A. He et al. (2000) established the breakdown properties of such estimators in the case of independent errors. Here we extend the treatment to the case with correlated errors. The primary choice used in the paper is #(x) = 12 ( + 1)log(1 + x2 =), which corresponds to the log density of the t-distribution with degrees of freedom. The weight function is chosen to be wn (x) = {1 + (x − Mn )T Cn−1 (x − Mn )=p}−1=2
(4)
for some robust estimate (Mn ; Cn ) of location and scatter for the design points xij . If (Mn ; Cn ) are aHne equivalent, the weights are invariant under aHne transformations of the design. Let (·) = #(1) (·) be the derivative of # and (x) = x (x) − 1. The estimator (ˆ0 ; ˆ0 ) solves the following equations q n
{(yij − zijT )wij }wij zij = 0;
i=1 j=1 q n
{(yij − zijT )wij } = 0:
(5)
i=1 j=1
Although the marginal estimator is designed to be eHcient when there is no correlation among the observations, it belongs to the class of generalized estimating equations (Godambe, 1991) and may be used for longitudinal data as well. Liang and Zeger (1986) pioneered the use of marginally deJned estimators in the context of generalized linear models with correlated errors. Huber (1967) developed a rigorous asymptotic theory for maximum likelihood estimators with misspeciJed likelihood, i.e., the general class of M estimators. As in He et al. (2000), any consistent estimator satisfying (5) has the following representation −1 ˆ0 − = D0n
q n
{ˆ0 wij rij }wij zij + op (n−1=2 )
with ˆ0 − n = op (1) and n satisfying D0n = ˆ0
(6)
i=1 j=1
q n
(1)
n
i=1
{ˆ0 wij rij }wij2 zij zijT ;
E[
q
j=1
{n wij ij }] = 0, and
rij = yij − zijT ˆ0 :
i=1 j=1
We can use the residuals to estimate the matrix 0 in model (1) up to a multiplicative constant. Let Ri =Yi −ZiT ˆ0 and ˆ be a scatter estimate from the residuals. We transform the variables y˜ ij = ejT ˆ −1=2 Yi ;
z˜ij = Zi ˆ −1=2 ej
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
257
with ej being the unit vector whose jth element is 1. Based on the transformed variˆ ) ables, we can obtain the estimate (; ˆ by solving (3) with (zij ; yij ) replaced by (z˜ij ; y˜ ij ). This can be computed with the same EM algorithm as in He et al. (2000). For convenience we focus on balanced longitudinal data (equal numbers of observations per subject). Further extension to the unbalanced case is possible if the within subject covariance matrix i is determined by a Jnite dimensional parameter *, which can be estimated. We forego the details. One may also try to get the estimate of by solving q n
{(y˜ ij − z˜Tij ˆ0 )wij } = 0;
(7)
i=1 j=1
which has a unique root ˆ1 due to the monotonicity property of . Furthermore, ˆ can be obtained by solving q n
{ˆ1 (y˜ ij − z˜Tij )wij }wij z˜ij = 0
(8)
i=1 j=1
using an iterative method, say, Newton’s method. To alleviate the potential convergence problem, we can use a one-step estimator as in Simpson et al. (1992) and Simpson and Yohai (1998) by deJning −1 ˆ1 = ˆ0 + D1n gn ;
where gn =
n
Zi ˆ −1=2 Ain ;
i=1
D1n = ˆ1
n
T ˆ −1=2 T Zi ˆ −1=2 Bin Bin Zi ;
i=1
and Ain = ( {ˆ1 e1T ˆ −1=2 Ri wi1 }wi1 ; : : : ; {ˆ1 eqT ˆ −1=2 Ri wiq }wiq )T ; Bin = (
(1)
{ˆ1 e1T ˆ −1=2 Ri wi1 }wi1 ; : : : ;
(1)
{ˆ1 eqT ˆ −1=2 Ri wiq }wiq )T :
In the next section we shall see that the estimator ˆ and the one-step estimator ˆ1 are asymptotically equivalent under the model (1). Both are as eHcient as in the case of a known 0 . The breakdown point of ˆ is automatic by Theorem 1 of He et al. (2000). In this paper, we deJne the breakdown point of an estimator as the smallest fraction of subjects that can be added to the original sample in order to drive the estimator to unboundedness or singularity. More speciJcally, let ∗ be the breakdown point of ˆ and wij = w(xij ; X ) for some weight function w, where X denotes the set of all design points. Following He et al. (2000, Eq. 2.3), deJne the weight breakdown point, w∗ , as the smallest fraction of points that need to be added to X in order to drive w(xij )xij to arbitrarily large values. We then have
258
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
Proposition 1. Under the conditions (A1) – (A3) and (B1), the asymptotic breakdown point of ˆ satis9es ˆ ¿ min{∗ ; ∗ ; 1=(1 + .); .=(1 + .)}; ∗ () w where . is de9ned in (A.3) of Appendix A. High breakdown location and scatter estimates such as the S-estimators of Rousseeuw and Yohai (1984) can be used in constructing the weights (4) and in estimating . He et al. (2000) observed that wi ≡ 1 implies w∗ = 0, so that the unweighted t-MLE has breakdown point of zero. Similarly, the multivariate t regression estimator has a breakdown point of zero if w∗ = 0, so robust weights are necessary in (4) to obtain a nonzero breakdown point. We cannot ensure the same breakdown property for ˆ1 , but the idea of cross-checking considered by He (1991) and He and Wang (1996) can be used to deJne ˆc = ˆ1 I (det(S1 ) 6 det(S0 )) + ˆ0 I (det(S1 ) ¿ det(S0 )); where −1 S0 = D0n
and S1 =
−1 D1n
i
(9)
2
(ˆ0 wij rij )wij2 zij zijT
j
−1 D0n ;
ˆ −1=2
Zi
i
Ain ATin ˆ −1=2 ZiT
−1 D1n :
(10)
S0 and S1 are consistent variance–covariance estimates for ˆ0 and ˆ1 respectively, provided ˆ has root-n convergence to a nonsingular covariance matrix. Note that a breakdown of ˆ1 occurs only when D1n is being driven to singularity and that the selection (9) uses ˆ1 only when this does not happen, therefore the resulting estimator ˆc has the same breakdown point as ˆ0 . In addition, the asymptotic eHciency of ˆc is always the better of the two estimators ˆ0 and ˆ1 . If the model (2) holds with P = diag(wi1 ; : : : ; wiq ) and i having the multivariate t-distribution with degrees of freedom, then ˆ1 is Jrst-order equivalent to the maximum likelihood estimator and thus more eHcient than ˆ0 . In general, ˆ1 is not necessarily more eHcient.
4. Asymptotic eciency In this section we investigate the asymptotic eHciencies of the marginal estimator ˆ0 , the estimator on transformed variables ˆ and the one-step estimator ˆ1 . They are compared with the eHciency of the maximum likelihood estimator under the multivariate t-model with constant weights. The latter estimator, to be denoted by ˆmult , was considered by Lange et al. (1989) and can be computed via an EM algorithm.
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
259
If we choose constant weights wij = 1, ˆ bears a close relationship with ˆmult , except that a preliminary estimate of is used. The estimator ˆmult updates the estimate of in each iteration. The use of weights as in this paper is necessary to achieve a good breakdown property of the estimator. We assume that our estimate of is root-n consistent, that is, there exists a positive constant c0 such that ˆ − c0 0 = Op (n−1=2 ). Without loss of generality, we take c0 = 1 ˆ in all the proofs. Even though Ri = Yi − ZiT 0 + Op (n−1=2 ) are used to compute , this requirement can be shown to be met by the usual S-estimators based on the same arguments as in Davies (1987). We omit the details. We now deJne ∗n to be the solution to q n E {∗n wij ejT 0−1=2 i } = 0: (11) i=1
j=1
Also let Qn =
q n
E[
i=1 j=1
(1)
(∗n ejT 0−1=2 i wij )]wij2 Zi 0−1=2 ej ejT 0−1=2 ZiT :
Theorem 1. Under the conditions given in Appendix A, any consistent sequence satisfying (5) with (yij ; zij ) replaced by (y˜ ij ; z˜ij ) has the following representation ˆ − 0 = Qn−1
q n i=1 j=1
(∗n ejT 0−1=2 i wij )wij Zi 0−1=2 ej + op (n−1=2 ):
(12)
The one-step estimator ˆ1 has the same asymptotic representation. As a consequence, we have S1−1=2 (ˆ − 0 ) → N (0; I );
S1−1=2 (ˆ1 − 0 ) → N (0; I )
in distribution, where S1 is deJned in (10). The theorem also indicates that the estimators ˆ is as eHcient as if 0 were known. In fact, the result does not depend on the choice of ˆ as an estimate of 0 so long as it is root-n consistent. We now compare the asymptotic eHciencies of these estimators under a variety of designs and error distributions. We consider the following model Yi = ZiT + Pi ;
i = 1; : : : ; n;
where Yi ∈ R2 , ∈ R3 , xi ∈ R3 , i is a bivariate error variable and
1=2 1 # P= # 1 with # = 0:4 and 0.8. No intercept appears in this model. The weight function w(x) = [1 + x =3]−1=2 and the # function that corresponds to the log-likelihood from the t
260
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
Table 1 Asymptotic relative eHciency of ˆ and ˆ0 versus ˆmult
ˆ
ˆ0 (# = 0:4)
ˆ0 (# = 0:8)
Scenario (1) 1 2 4 10
0.880 0.983 0.945 0.917
0.738 0.738 0.715 0.681
0.585 0.586 0.565 0.537
Scenario (2) 1 2 4 10
0.968 0.955 0.919 0.855
0.697 0.676 0.629 0.561
0.515 0.500 0.459 0.405
Scenario (3) 1 2 4 10
0.210 0.224 0.233 0.251
0.138 0.138 0.132 0.120
0.106 0.103 0.097 0.086
Scenario (4) 1 2 4 10
4.235 5.425 6.606 7.830
3.944 4.555 5.172 5.880
3.139 2.269 4.202 4.694
distribution are used. The following four scenarios are considered. (1) (2) (3) (4)
x x x x
∼ N (0; I ), ∼ N (0; I ), ∼ N (0; I ), ∼ 0:9N (0; I ) + 0:1N (0; 25I ), ∼ 0:9N (0; I ) + 0:1N (0; 36I ), ∼ 0:9N (0; I ) + 0:1N (0; 25I ), ∼ N (0; I ), ∼ N (0; (1 + x )2 I ).
ˆ ˆ1 , and ˆmult are aHne equivariant so their asymptotic eHciency The estimators , comparisons do not depend on the value of #. When # = 0, ˆ0 and ˆ are equally eHcient. For our model with x symmetrically distributed, all the estimators have the asymptotic variance–covariance matrix as a constant times the 3 by 3 identity matrix so the comparison of asymptotic eHciencies are straightforward. Table 1 gives the asymptotic relative eHciencies of ˆ0 and ˆ computed as the asymptotic mean square error divided by that of the multivariate estimator ˆmult for each of the four scenarios. It is clear that ˆ (or ˆ1 ) obtained after a transformation of variables to de-correlate the response improves on the marginal estimator ˆ0 for nonzero #. In the Jrst three cases where there are no or few outliers at leverage points, ˆmult has better eHciency. The loss of eHciency for ˆ is mainly due to downweighting good leverage points.
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
261
The robustness of ˆ and ˆ0 becomes evident in the fourth case where the outliers at leverage points have a serious impact on the eHciency of ˆmult , but not on the high breakdown estimators.
5. An example A number of studies in Jnancial economics over the past several years have pointed to higher trade execution costs for NASDAQ stocks than for listed stocks on the New York Stock Exchange (NYSE), see, e.g. Huang and Stoll (1996), and Barclay (1997). NYSE is an auction market but NASDAQ trades are executed through market makers who compete for bid/ask prices. Individual investors are more concerned with the stock spread, the diIerence between bid and ask prices on the same security. The higher spread on NASDAQ stocks was sometimes attributed to the way the exchange handled its trade, but other factors such as the liquidity of the stocks were known to play an important role. Recently NASDAQ has introduced some new order-handling rule designed to lower the spreads. We collected a small amount of post reform data as a pilot study on stock spread comparisons to illustrate the performance of various estimators discussed in this paper. We obtained the spreads of 20 stocks at two diIerent times in the morning trading hours of February 20, 2001. We use the spread measured as percent of the Table 2 Spread as a percent of price and liquidity (in log(1 + dollars)) for twenty stocks at two diIerent times Symbol
Exchange
% Spread 1
Liquidity 1
% Spread 2
Liquidity 2
ADIC AOL HYSQ ICN SNDK LSI NOVL GX AMT TQNT SYMC PWAV NOK CPQ DY HTSF HEII CHU SAWS IFX
1 0 1 0 1 0 1 0 0 1 1 1 0 0 0 1 1 0 1 0
0.74 0.08 1.56 0.52 0.25 0.24 0.45 0.24 0.45 0.21 0.12 0.27 0.33 0.09 1.18 9.38 3.12 0.53 0.30 0.66
7.44 9.98 4.69 5.93 8.82 5.00 8.17 9.41 7.54 8.64 8.56 8.33 10.82 10.01 6.27 0.00 0.00 8.19 8.53 6.12
0.37 0.12 1.56 0.20 0.50 0.19 0.89 0.12 0.33 0.21 0.12 0.27 0.19 0.14 1.12 9.38 1.04 0.33 0.60 0.03
8.19 11.80 5.62 6.62 9.38 9.55 8.77 10.20 8.15 10.18 9.48 9.30 11.57 10.81 7.36 0.00 3.22 6.52 9.44 6.82
262
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
Table 3 Parameter estimates and standard errors (SE) for stock spread example Method
2
1
2
12
ˆmult (SE) ˆ0
0.7034 (0.1885) 0.8284 (0.3037) 0.9336 (0.2421) 0.6241 (0.7232)
2.0715 (0.2558) 2.0519 (2.6320) 2.0441 (0.2428) 4.3987 (0.9755)
−0:0489 (0.0199) −0:0615 (0.0283) −0:0728 (0.0262) −0:0320 (0.0729)
−0:2226 (0.0281) −0:2285 (0.3354) −0:2312 (0.0267) −0:4736 (0.1066)
(SE) ˆ (SE) Gaussian MLE (SE)
NASDAQ stocks NYSE stocks
Percent of Spread
8
6
4
2
0 0
2
4
6
8
10
12
Liquidity Fig. 1. Scatterplot of Stock Spreads.
stock price as the response variable y with x1 as exchange indicator (1 for NASDAQ and 0 for NYSE) and x2 as a measure of liquidity (amount of trading up to that time in log dollars). The data are listed in Table 2. We Jt the following
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
263
model yit = 2 + 1 x1i + 2 x2it + 12 x1i x2it + eit ;
i = 1; : : : ; 20; t = 1; 2;
where i indicates the stock and t indicates the time. Four estimates of the parameters and their standard errors are computed and given in Table 3. The t-distribution with 4 degrees of freedom is used as the working likelihood for the Jrst three estimators. In this example, both ˆ and ˆmult give similar results, but ˆ0 has higher variability and the Gaussian maximum likelihood estimate (obtained from PROC MIXED with SAS) clearly over-estimates the 1 parameter. Fig. 1 gives the scatterplot of y against x2 , which shows two clearly outlying stocks with very low liquidity. The robust estimate ˆ indicates that the relative spread is higher at NASDAQ by 2% for illiquid stocks but the diIerence shrinks with improving stock liquidity. The Gaussian estimator allows the highly illiquid stocks to have considerable in6uence on the Jt even though they would be of little interest for investment.
Acknowledgements We thank two referees for helpful suggestions and corrections. Research was partially supported by National Science Foundation (USA), National Natural Science Foundation of China and EYTP of China.
Appendix A Formal conditions used in this paper are provided as follows. (A1): #(·) is symmetric about 0 and continuously increasing from 0 to inJnity on [0; ∞). (A2): has a bounded and continuous second-order derivative (2) and supt∈R1 (i−1) i (1 + |t| ) (t) ¡ ∞ for i = 1; 2. (A3): (x) is an increasing function with −1 = (0) ¡ lim|t|→∞ (t) = . ¿ 0. (B1): supx∈Rp [w(x)(1 + x )] ¡ + ∞, and there exists a constant 4 ¿ 0 such that V 1 #{i : i 6 n; w(xij ) 6 4} ¡ .=(1 + .) lim n
n→∞
for all 1 6 j 6 q. (B2): limn→∞ Qn =n = Q for some positive deJnite matrix Q. Without loss of generality, we assume in the proofs that ˆ − 0 = Op (n−1=2 ). The Jrst two lemmas handle the scale . Recall that ∗n is given by (11).
264
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
Lemma 1. Under conditions (A1) – (A3) and (B1), there exist constants 1 ; 2 ¿ 0 such that 1 6 ∗n 6 2 for su=ciently large n. Proof. If there exists a subsequence 0 ¡ ∗n → 0 as n → ∞, then by the continuity and boundedness of (·) and condition (B1) we have limn →∞ E[∗n wij ejT 0−1=2 i ]=−1 uniformly for 1 6 i 6 n; 1 6 j 6 q. Therefore,
q n 1 lim E[∗n wij ejT 0−1=2 i ] = −q; n →∞ n i=1 j=1
which contradicts the deJnition of ∗n . If for some subsequence ∗n → ∞ as n → ∞, then limn →∞ E[∗n wij ejT 0−1=2 i ]= . ¿ 0 whenever w(xij ) ¿ 4. Thus, by condition (A3) and (B1), q n . 1 . ∗ T −1=2 + =0 lim E[n wij ej 0 i ] ¿ q − n →∞ n 1+. 1+.
i=1 j=1
which contradicts the deJnition of ∗n . The proof is complete. Lemma 2. Under conditions (A1) – (A3) and (B1), we have ˆ1 − ∗n = Op (n−1=2 ). Proof. Let n
5n () =
q
1 [wij ejT 0−1=2 i ]: n i=1 j=1
Since (·) is monotone, continuous and bounded, we obtain by the maximal inequality (7.10) of Pollard (1990) that sup¿0 |5n () − E5n ()| = Op (n−1=2 ), see He and Shao (1996). By condition (A2) on (1) , we have |(t2 ) − (t1 )| 6 L|t2 − t1 |=(1 + |t1 |) for some constant L ¿ 0. Thus, we have
n q
1
T ˆ
sup [wij (y˜ ij − z˜ij 0 )] − 5n ()
¿0 n i=1 j=1
−1=2 n L wij |(y˜ ij − z˜Tij ˆ0 ) − ejT 0 i | n 1 + |wij ejT 0−1=2 i | q
6 sup ¿0
6 sup ¿0
i=1 j=1
q n L [wij i O( ˆ − 0 ) + O( ˆ0 − 0 )] n 1 + |wij ejT 0−1=2 i | i=1 j=1
= Op (n−1=2 ):
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
265
ˆ We have shown that Here we have used (6) and our assumption on .
n q
1
T ˆ
[wij (y˜ ij − z˜ij 0 )] − E5n ()
= Op (n−1=2 ): Zn = : sup ¿0 n i=1 j=1
By condition (A2), (B1) and Lemma 1, we have q
n
1 wij E[ (∗n wij ejT 0−1=2 i )ejT 0−1=2 i ] n i=1 j=1
¿
q4 min E[ (tejT 0−1=2 i )ejT 0−1=2 i ]= : c1 ¿ 0: 1 + . t∈[41 ;42 ]
By monotonicity of (·), if ˆ1 ¿ ∗n + Kn−1=2 , we have for any K ¿ 0 q
n
√ 1 [(∗n + K= n)wij (y˜ ij − z˜Tij ˆ0 )] 0¿ n i=1 j=1 q
n
¿
√ √ 1 E[(∗n + K= n)wij ejT 0−1=2 i ] − Zn ¿ c1 K=(2 n) − Zn : n i=1 j=1
√ As a result, Zn ¿ c1 K=(2 n). Similar arguments (using −K) show that ˆ1 6 ∗n −Kn−1=2 √ implies Zn ¿ c1 K=(2 n). It then follows that ˆ1 − ∗n = Op (n−1=2 ). ˆ Observe that Proof of Theorem 1. Part 1 of the proof handles . q
n
1 [ (∗n wij ejT 0−1=2 i )]wij Zi 0−1=2 ej − n i=1 j=1 n
q
1 ˆ ˆ −1=2 ej [ (w ˆ ij ejT ˆ −1=2 (Yi − ZiT ))w = ij Zi n i=1 j=1
− (∗n wij ejT 0−1=2 (Yi − ZiT 0 ))]wij Zi 0−1=2 ej n
q
1 = n
(1)
i=1 j=1
ˆ (∗n wij ejT 0−1=2 i )[w ˆ ij ejT ˆ −1=2 (Yi − ZiT )
− ∗n wij ejT 0−1=2 (Yi − ZiT 0 )]Zi 0−1=2 ej + op (n−1=2 ) = − ∗n
n
q
1 n i=1 j=1
(1)
(∗n wij ejT 0−1=2 i )wij Zi 0−1=2 ej ejT 0−1=2 ZiT (ˆ − 0 )
+ I1n + I2n + op (n−1=2 );
266
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
where I1n = (ˆ −
∗n )ˆ −1=2
q
n
1 n
(1)
i=1 j=1
I2n =
∗n (ˆ −1=2
−
0−1=2 )
n
q
(∗n wij ejT 0−1=2 i )[i − ZiT (ˆ − 0 )]wij Zi 0−1=2 ej ;
1 n i=1 j=1
(1)
(∗n wij ejT 0−1=2 i )
× [i − ZiT (ˆ − 0 )]wij Zi 0−1=2 ej : Since (·) is an even function, it is easy to see that I1n = op (n−1=2 ) + Jn (ˆ − 0 ) with Jn = op (1) in matrix norm. Also, we have I2n = op (n−1=2 ), and hence q
n
1 [ (∗n wij ejT 0−1=2 i )]wij Zi 0−1=2 ej n i=1 j=1
1 =∗n n
q n
(1)
i=1 j=1
(∗n wij ejT 0−1=2 i )wij Zi 0−1=2 ej ejT 0−1=2 ZiT (ˆ − 0 )
+ Jn (ˆ − 0 ) + op (n−1=2 ) q n 1 (1) ∗ = ∗n (n wij ejT 0−1=2 i )wij Zi 0−1=2 ej ejT 0−1=2 ZiT + Jn (ˆ − 0 ) n i=1 j=1
+ op (n−1=2 ) D2n =: + Jn (ˆ − 0 ) + op (n−1=2 ); n where D2n has its expectation Qn . It then follows that E 1n (D2n − Qn ) 2 = O(n−1 ) and therefore n−1 (D2n − Qn ) = Op (n−1=2 ), which proves (12). Part 2 of the proof concerns ˆ1 . Let G(0 ; ; ) =
q n
(ejT −1=2 i wij )wij Zi −1=2 ej :
i=1 j=1
By Taylor expansion, we get (ˆ1 ejT ˆ −1=2 i wij ) = (ˆ1 ejT ˆ −1=2 Ri wij + ejT ˆ −1=2 ZiT (ˆ0 − 0 )wij ) = (ˆ1 ej ˆ −1=2 Ri wij ) + +
1 2
(2)
(1)
(ˆ1 ej ˆ −1=2 Ri wij )ejT ˆ −1=2 ZiT (ˆ0 − 0 )wij
[ˆ1 ejT ˆ −1=2 (Ri + ZiT (˜ − 0 ))wij ](ejT ˆ −1=2 ZiT (ˆ0 − 0 )wij )2 ;
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
267
where ˜ is on the line segment between 0 and ˆ0 . It then follows that ˆ = G(0 ; ˆ1 ; )
q n
{ˆ1 (ejT ˆ −1=2 Ri wij )wij Zi ˆ −1=2 ej
i=1 j=1 (1)
+
(ˆ1 ejT ˆ −1=2 Ri wij )wij2 Zi ˆ −1=2 ej ejT ˆ −1=2 ZiT (ˆ0 − 0 )}
+ Op (n ˆ0 − 0 2 ) = D1n (ˆ1 − 0 ) + Op (n ˆ0 − 0 2 );
(13)
−1 gn . where ˆ1 = ˆ0 + D1n On the other hand,
ˆ = G(0 ; ˆ1 ; )
q n i=1 j=1
+
(ˆ1 ejT ˆ −1=2 i wij )wij Zi 0−1=2 ej
q n i=1 j=1
(ˆ1 ejT ˆ −1=2 i wij )wij Zi (ˆ −1=2 − 0−1=2 )ej
and (ˆ1 ejT ˆ −1=2 i wij ) = (∗n ejT 0−1=2 i wij + ˆ1 ejT (ˆ −1=2 − 0−1=2 )i wij + (ˆ1 − ∗n )ejT 0−1=2 i wij ) = (∗n ejT 0−1=2 i wij ) +
(1)
(∗n ejT 0−1=2 i wij )[ˆ1 ejT (ˆ −1=2 − 0−1=2 )i wij
+ (ˆ1 − ∗n )ejT 0−1=2 i wij ] +
1 2
(2)
˜ i wij )[ˆ1 ejT (ˆ −1=2 − −1=2 )i wij (˜n ejT 0
+ (ˆ1 − ∗n )ejT 0−1=2 i wij ]2 ; where ˜ = 21 ˆ −1=2 + (1 − 21 )0−1=2 and ˜n = 22 ˆ1 + (1 − 22 )∗n with 0 6 21 ; 22 6 1. Therefore, ˆ G(0 ; ˆ1 ; ) =
q n i=1 j=1
+
n i=1
(∗n ejT 0−1=2 i wij )wij Zi 0−1=2 ej
q Zi (ˆ −1=2 − 0−1=2 ) (∗n ejT 0−1=2 i wij )wij ej
+ 0−1=2 diag(
j=1
(1)
(∗n e1T 0−1=2 i wi1 )wi1 ; : : : ;
(1) ∗ T −1=2 n eq 0 i wiq )wiq )
268
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
ˆ −1=2
=
0−1=2
+ (ˆ1 −
∗n )0−1=2 ]i
+ Op [n( ˆ − 0 2 + |ˆ1 − ∗n |2 )]
× [
−
q n
√ (∗n ejT 0−1=2 i wij )wij Zi 0−1=2 ej + Op [ n( ˆ − 0 + |ˆ1 − ∗n |)]
i=1 j=1
+ Op [n( ˆ − 0 2 + |ˆ1 − ∗n |2 )]:
(14)
Combining (13) and (14), we get q n 1 1 ˆ (∗n ejT 0−1=2 i wij )wij Zi 0−1=2 ej D1n (1 − 0 ) = n n i=1 j=1
√ + Op ( ˆ0 − 0 2 ) + Op [( ˆ − 0 + |ˆ1 − ∗n |)= n] + Op ( ˆ − 0 2 + |ˆ1 − ∗n |2 ) n
=
q
1 (∗n ejT 0−1=2 i wij )wij Zi 0−1=2 ej + op (n−1=2 ): n i=1 j=1
Finally, it is easy to show n−1 [D1n − Qn ] = Op ( ˆ0 − 0 + ˆ − 0 + |ˆ1 − ∗n |) + Op (n−1=2 ) = Op (n−1=2 ). We have now completed the proof of Theorem 1. References Barclay, M., 1997. Bid-Ask spreads and the avoidance of odd-eighth quotes on NASDAQ: an examination of exchange listings. J. Financial Economics 45, 35–38. Davies, P.L., 1987. Asymptotic behavior of S-estimates of multivariate location parameters and dispersion matrices. Ann. Statist. 15, 1269–1292. Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm (with Discussion). J. R. Statist. Soc. B 39, 1–38. Godambe, V.P., 1991. Estimating Functions. Clarendon Press, Oxford. Hampel, F.R., Ronchetti, E.M., Rousseeuw, R.J., Stahel, W.A., 1986. Robust Statistics. Wiley, New York. He, X., 1991. A local breakdown property of robust tests in linear regression. J. Multiva. Anal. 38, 294–305. He, X., Wang, G., 1996. Cross-checking using the minimum volume ellipsoid estimator. Stat. Sinica 6, 367–374. He, X., Shao, Q.M., 1996. A general Bahadur representation of M-estimators and its application to linear regression with nonstochastic designs. Ann. Statist. 24, 2608–2630. He, X., Simpson, D.G., Wang, G.Y., 2000. Breakdown points of t-type regression estimators. Biometrika 87, 675–687. Huang, R.D., Stoll, H.R., 1996. Dealer versus auction markets: a paired comparison of execution costs on NASDAQ and the NYSE. J. Finance Econ. 41, 313–357. Huber, P.J., 1967. The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 221–233. Lange, K.L., Little, R.J.A., Taylor, J.M.G., 1989. Robust statistical modeling using the t distribution. J. Amer. Statist. Assoc. 84, 881–896. Liang, K.-Y., Zeger, S.L., 1986. Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
X. He et al. / Journal of Statistical Planning and Inference 122 (2004) 253 – 269
269
Little, R.J.A., 1988. Robust estimation of the mean and covariance matrix from data with missing values. Applied Statistics 37, 23–28. Markatou, M., 2000. Mixture models, robustness, and the weighted likelihood methodology. Biometrics 56 (2), 483–486. Maronna, R.A., Yohai, V.J., 1981. Asymptotic behavior of general M -estimates for regression and scale with random carries. Z. Wahr. Ver. Geb. 58, 7–21. Pollard, D., 1990. Empirical processes: theory and applications. NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 2. Institute of Mathematical Statistics, Hayward, California. Rousseeuw, P.J., Yohai, V., 1984. Robust regression by means of S-estimators. In: Franke, J., Hardle, W., Martin, R.D. (Eds.), Robust and Nonlinear Time Series Analysis. Springer, Berlin and New York, pp. 256 –272. Simpson, D.G., Yohai, V.J., 1998. Functional stability of one-step GM-estimators in approximately linear regression. Ann. Statist. 26, 1147–1169. Simpson, D.G., Ruppert, D., Carroll, R.J., 1992. On one-step GM estimates and stability of inferences in linear regression. J. Am. Stat. Assoc. 87, 439–450.