Robust estimators in semiparametric partly linear regression models

Ana Bianco (Universidad de Buenos Aires, Argentina)
Graciela Boente (Universidad de Buenos Aires and CONICET, Argentina)

Journal of Statistical Planning and Inference 122 (2004) 229-252
www.elsevier.com/locate/jspi

Accepted 15 June 2003

Abstract

In this paper, under a semiparametric partly linear regression model, a family of robust estimates for the regression parameter and the regression function is introduced and discussed. Some of their asymptotic properties are studied. Through a Monte Carlo study, the performance of the estimates is compared with that of the classical ones. © 2003 Elsevier B.V. All rights reserved.

MSC: primary 62F35; secondary 62H25

Keywords: Partly linear models; Robust estimation; Smoothing techniques; Rate of convergence; Asymptotic properties

1. Introduction

(This research was partially supported by Grant PICT # 03-00000-006277 from ANPCYT at Buenos Aires, Argentina. The research of Graciela Boente was also partially supported by a Guggenheim fellowship. Corresponding author: G. Boente, e-mail [email protected].)

Statistical inference for multidimensional random variables commonly focuses on functionals of their distribution that are either purely parametric or purely nonparametric. A reasonable parametric model produces precise inferences, while a badly misspecified one can lead to seriously misleading conclusions. Nonparametric modeling, on the other hand, is associated with greater stability but less precision. In recent years, nonparametric regression models have received a great deal of attention as a tool to explore the nature of complex nonlinear phenomena. Let (y_i, x_i', t_i)' be independent observations such that y_i ∈ R, t_i ∈ R, x_i ∈ R^p and

    y_i = μ(x_i, t_i) + ε_i,  1 ≤ i ≤ n,                                  (1)

where the errors ε_i are independent and independent of (x_i', t_i)'. The analysis of model (1) requires multivariate smoothing, since the function μ has a multidimensional domain; in applications, this model therefore often runs into the difficulty known as the "curse of dimensionality". In recent years, several authors have dealt with the dimensionality reduction problem in nonparametric regression, and several approaches have been proposed to estimate the regression function and avoid empty neighborhoods when the covariates lie in a high-dimensional space. Hastie and Tibshirani (1990) introduced additive models so as to generalize linear regression, with the goal of overcoming the curse of dimensionality while keeping easy interpretability; this approach combines the flexibility of nonparametric models with the simple interpretation of linear ones. An intermediate strategy employs a semiparametric form that combines the advantages of both parametric and nonparametric methods, in which the regression function is μ(x, t) = β_o'x + g(t), so that (1) can be written as

    y_i = β_o'x_i + g(t_i) + ε_i,  1 ≤ i ≤ n.                             (2)

Partly linear models (2) are more flexible than standard linear models, since they have both a parametric and a nonparametric component. They can be a suitable choice when one suspects that the response y depends linearly on x but is nonlinearly related to t; the components of β_o may, for instance, have a meaningful interpretation. As is well known, model (2) is used when the researcher knows more about the dependence between y and x than about the relationship between y and the predictor t, which establishes an asymmetry in prior knowledge. This model has been studied by several authors: Ansley and Wecker (1983), Green et al. (1985), Denby (1986), Heckman (1986), Engle et al. (1986), Rice (1986), Chen (1988), Robinson (1988), Speckman (1988), Chen and Chen (1991), Chen and Shiau (1991, 1994), Gao (1992), Gao and Zhao (1993), Gao and Liang (1995), He and Shi (1996) and Yee and Wild (1996), who investigated asymptotic results using smoothing splines, kernel or nearest neighbor techniques. Heckman (1986) and Chen (1988) showed that the estimates of the regression parameter β_o can achieve a root-n rate of convergence if x and t are independent. Rice (1986) obtained the asymptotic bias of a partial smoothing estimate of β_o due to the dependence between x and t. Robinson (1988) explained why estimates of β_o based on an incorrect parametrization of the function g are generally inconsistent, and proposed a least-squares estimator of β_o that is root-n consistent, obtained by inserting nonparametric regression estimators in the nonlinear orthogonal projection on t. Estimates based on kernel weights were also considered by Severini and Wong (1992) for the independent setting, in a more general framework, and by Truong and Stone (1994) for autoregression models. Liang et al. (1999) proposed an unbiased estimate for the case of errors in variables, while He et al. (2001) considered M-type estimates for repeated measurements.

It is well known that, both in linear regression and in nonparametric regression, least-squares estimators can be seriously affected by anomalous data. The same


statement holds for partly linear models. The aim of this paper is thus to propose a class of robust procedures under the partly linear model (2) and to study some of their asymptotic properties. In Section 2, we review the classical approach and introduce the robust estimates. Consistency results are stated in Section 3, while asymptotic normality of the estimates of the regression parameter is studied in Section 4. All proofs are given in the Appendix. Finally, in Section 5, the small-sample behavior of the least-squares estimates and of different resistant estimates is compared through a Monte Carlo study under normality and contamination.

2. Estimators

2.1. Model and classical approach

Assume that (Y, X', T)' is a random vector with the same distribution as (y_i, x_i', t_i)', that is,

    Y = β_o'X + g(T) + ε,                                                 (3)

where ε is independent of (X', T)' and, we will assume, has a symmetric distribution. In the classical approach to the problem it is assumed that E(|ε|) < ∞ and E(‖X‖²) < ∞. When E(X) = 0 and X and T are independent, under regularity conditions, the least-squares regression of Y on X yields consistent and efficient estimates of β_o. As noted by Robinson (1988), the main tool to solve the problem under nonorthogonality is to insert nonparametric estimators of the nonparametric component in a standard parametric regression estimate. Denote φ_o(t) = E(Y | T = t) and φ(t) = (φ_1(t), ..., φ_p(t))', where φ_j(t) = E(X_j | T = t). Then, g(t) = φ_o(t) − β_o'φ(t) and hence Y − φ_o(T) = β_o'(X − φ(T)) + ε. This suggests that estimators φ̂_o(t) and φ̂(t) of φ_o(t) and φ(t) can be inserted prior to the estimation of the regression parameter. The classical nonparametric approach estimates the conditional expectations through

    φ̂_{o,LS}(t) = Σ_{i=1}^n w_i(t) y_i,   φ̂_{j,LS}(t) = Σ_{i=1}^n w_i(t) x_{ij},

where, for the kernel approach, the weights are given by

    w_i(t) = K((t_i − t)/h) / Σ_{j=1}^n K((t_j − t)/h),                   (4)

with K a kernel function, i.e., a nonnegative integrable function on R, and h the bandwidth parameter, while the nearest-neighbor-with-kernel approach uses the weight function

    w_i(t) = K((t_i − t)/H_n(t)) / Σ_{j=1}^n K((t_j − t)/H_n(t)),         (5)

with H_n(t) the distance between t and its k_n-nearest neighbor among t_1, ..., t_n.
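As an illustration of the two weighting schemes (4) and (5), a minimal Python sketch follows; the Gaussian kernel and all function names here are our own choices for exposition, not code from the paper.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2)

def kernel_weights(t0, t, h, K=gaussian_kernel):
    # Eq. (4): w_i(t0) = K((t_i - t0)/h) / sum_j K((t_j - t0)/h)
    k = K((t - t0) / h)
    return k / k.sum()

def knn_kernel_weights(t0, t, kn, K=gaussian_kernel):
    # Eq. (5): same form, but the local bandwidth H_n(t0) is the distance
    # from t0 to its kn-th nearest neighbor among t_1, ..., t_n
    Hn = np.sort(np.abs(t - t0))[kn - 1]
    k = K((t - t0) / Hn)
    return k / k.sum()
```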


With either of these initial estimates, the least-squares estimator β̂_LS of β_o can be obtained by minimizing

    Σ_{i=1}^n [y_i − φ̂_{o,LS}(t_i) − β'(x_i − φ̂_{LS}(t_i))]²,            (6)

leading to the final least-squares estimate of g, ĝ_LS(t) = φ̂_{o,LS}(t) − β̂_LS'φ̂_LS(t). The properties of these estimates have been widely studied in the literature, as mentioned in the Introduction.
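Concretely, minimizing (6) reduces to an ordinary least-squares fit of smoothing residuals on smoothing residuals. The sketch below (our own naming, reusing kernel_weights from the previous snippet) illustrates this.

```python
import numpy as np

def partial_ls_fit(y, X, t, h):
    # Row j of W holds the weights w_i(t_j), so W @ y smooths y on t.
    W = np.array([kernel_weights(tj, t, h) for tj in t])
    y_res = y - W @ y                     # y_i - phihat_{o,LS}(t_i)
    X_res = X - W @ X                     # x_i - phihat_{LS}(t_i)
    beta_ls, *_ = np.linalg.lstsq(X_res, y_res, rcond=None)  # minimizes (6)
    g_ls = W @ y - (W @ X) @ beta_ls      # ghat_LS at the sample points
    return beta_ls, g_ls
```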

2.2. Robust estimates

As mentioned by Chen and Shiau (1994), the procedure described above, proposed independently by Denby (1986) and Speckman (1988), can be related to the partial regression procedure in linear regression. More precisely, as in that procedure, in order to obtain the regression estimator these authors first smoothed the covariates x and the response y, and then regressed the smoothing residuals of y on the smoothing residuals of x. As in purely parametric and purely nonparametric models, the least-squares estimators used at each step can be seriously affected by a small fraction of outliers. It may therefore be preferable to estimate the conditional location functional through a robust smoother and the regression parameter by a robust regression estimator. Putting these ideas together, the procedure to obtain robust estimators in partly linear models can be described as follows (a minimal sketch of the local-median version of these steps is given below):

Step 1: Estimate φ_o(t) and φ_j(t) through a robust smoother, such as local medians or local M-type estimates. Let φ̂_o(t) and φ̂_j(t) denote the obtained estimates and φ̂(t) = (φ̂_1(t), ..., φ̂_p(t))'.

Step 2: Estimate the regression parameter by applying a robust regression estimate to the residuals y_i − φ̂_o(t_i) and x_i − φ̂(t_i). Let β̂ denote the obtained estimator.

Step 3: Define the estimate of the regression function g as ĝ(t) = φ̂_o(t) − β̂'φ̂(t).

It is worth noticing that our proposal is a robust version of the partial regression estimators introduced by Denby (1986), Robinson (1988) and Speckman (1988). In Step 3, an alternative estimate of the regression function g could be obtained by robustly smoothing the residuals y_i − β̂'x_i; however, this procedure would be computationally more expensive than the one described above.

In Step 1, we compute the local medians φ̂_{o,med}(t) and φ̂_{j,med}(t) as the medians of the empirical conditional distribution functions F̂_o(y | T = t) and F̂_j(x | T = t), respectively, which are defined as

    F̂_o(y | T = t) = Σ_{i=1}^n w_i(t) I_{(−∞, y]}(y_i),                  (7)

    F̂_j(x | T = t) = Σ_{i=1}^n w_i(t) I_{(−∞, x]}(x_{ij}),  1 ≤ j ≤ p,   (8)
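The following sketch, under our own naming, implements Steps 1-3 with local medians computed from the weighted empirical conditional distributions (7)-(8), reusing kernel_weights from the sketch above; the robust_regression argument is a placeholder for any of the robust regression estimators discussed below (M, GM, LMS or LTS).

```python
import numpy as np

def local_median(v, t, h):
    # Median of the weighted empirical conditional cdf (7)-(8): the
    # smallest value at which the cumulative kernel weights reach 1/2.
    out = np.empty(len(t))
    for i, t0 in enumerate(t):
        w = kernel_weights(t0, t, h)
        order = np.argsort(v)
        cdf = np.cumsum(w[order])
        out[i] = v[order][np.searchsorted(cdf, 0.5)]
    return out

def robust_partial_fit(y, X, t, h, robust_regression):
    phi_o = local_median(y, t, h)                       # Step 1: smooth y
    Phi = np.column_stack([local_median(X[:, j], t, h)  # Step 1: smooth each x_j
                           for j in range(X.shape[1])])
    beta = robust_regression(y - phi_o, X - Phi)        # Step 2: robust regression
    g_hat = phi_o - Phi @ beta                          # Step 3: ghat = phihat_o - beta'phihat
    return beta, g_hat
```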


with w_i(t) the kernel weights defined in (4) or the nearest-neighbor-with-kernel weights given in (5). Note that F̂_o(y | T = t) and F̂_j(x | T = t) provide estimates of the distributions of Y | T = t and X_j | T = t, which will be denoted F_o(y | T = t) and F_j(x | T = t), respectively.

On the other hand, local M-type estimates φ̂_{o,M}(t) and φ̂_{j,M}(t) are defined as the location M-estimates related to F̂_o(y | T = t) and F̂_j(x | T = t), respectively. Thus, they are the solutions of

    Σ_{i=1}^n w_i(t) ψ((y_i − φ̂_{o,M}(t))/ŝ_o(t)) = 0,   Σ_{i=1}^n w_i(t) ψ((x_{ij} − φ̂_{j,M}(t))/ŝ_j(t)) = 0,

where the w_i(t) were introduced in (4) or (5), ψ is an odd, increasing, bounded and continuous function, and ŝ_o(t) and ŝ_j(t) are local robust scale estimates. Possible choices for the score function ψ are the Huber or the bisquare ψ-function, while the scales ŝ_o(t) and ŝ_j(t) can be taken as the local median of the absolute deviations from the local median (local MAD), i.e., the MAD (Huber, 1981) with respect to the distributions F̂_o(y | T = t) and F̂_j(x | T = t) defined in (7) and (8). A minimal sketch of this local M-step is given at the end of this section.

As described in Step 2, once robust estimates φ̂_o(t) and φ̂_j(t) of φ_o(t) and φ_j(t) have been obtained, the robust estimation of the regression parameter can be performed by applying to the residuals r̂_i = y_i − φ̂_o(t_i) and ẑ_i = x_i − φ̂(t_i) any of the robust methods proposed for linear regression. Among them we have M-estimates (Huber, 1981), which fail to resist high-leverage outliers, and GM-estimates (Mallows, 1975; Krasker and Welsch, 1982), whose breakdown point decreases with the dimension of the carriers. These two estimates have root-n order of convergence and can be calibrated to reach high efficiency. On the other hand, the LMS-estimator (least median of squares) and the LTS-estimator (least trimmed squares) have a high breakdown point but low efficiency (Rousseeuw and Leroy, 1987). High breakdown point estimates with high efficiency, such as MM-estimates, τ-estimates or reweighted estimates, could also be considered (Yohai, 1987; Yohai and Zamar, 1988; Gervini and Yohai, 2000).

As in Chen (1988) and Robinson (1988), we will assume that the vector 1_n is not in the space spanned by the column vectors of x; that is, we do not allow β_o to include an intercept, so that the model is identifiable, i.e., if β_1'x_i + g_1(t_i) = β_2'x_i + g_2(t_i) for 1 ≤ i ≤ n then β_1 = β_2 and g_1 = g_2. Due to the generality of the semiparametric model (2), identifiability implies that only "slope" coefficients can be estimated. Moreover, we prevent any linear combination of the components of x from being a function of t. As Robinson (1988) noticed, this rules out the situation of an unknown regression function depending only on t, with β_o'x representing a Taylor expansion of order p and g taking care of the remainder term; such models are more nonparametric than semiparametric, and the generality of model (2) and the nature of g exclude them regardless of the estimation procedure used. A functional relationship among the elements of x, however, is not excluded.
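As announced above, here is a sketch of the local M-step. It solves the weighted ψ-equation for φ̂_{o,M}(t) (or φ̂_{j,M}(t)) by fixed-point iterations, using the Huber ψ and the local MAD as scale; the constants and names are our own choices, and kernel_weights is reused from the earlier snippet.

```python
import numpy as np

def huber_psi(u, c=1.345):
    # Huber score: identity near 0, clipped at +/- c
    return np.clip(u, -c, c)

def weighted_median(v, w):
    order = np.argsort(v)
    cdf = np.cumsum(w[order]) / np.sum(w)
    return v[order][np.searchsorted(cdf, 0.5)]

def local_m_smoother(v, t, h, n_iter=50, tol=1e-8):
    # For each t0, solves sum_i w_i(t0) psi((v_i - m)/s(t0)) = 0 in m,
    # with s(t0) the local MAD around the local median.
    out = np.empty(len(t))
    for i, t0 in enumerate(t):
        w = kernel_weights(t0, t, h)
        m = weighted_median(v, w)                          # start: local median
        s = max(weighted_median(np.abs(v - m), w), 1e-12)  # local MAD scale
        for _ in range(n_iter):
            u = (v - m) / s
            # psi(u)/u acts as a residual weight (limit 1 at u = 0)
            wt = w * np.where(u != 0, huber_psi(u) / u, 1.0)
            m_new = np.sum(wt * v) / np.sum(wt)
            if abs(m_new - m) < tol:
                break
            m = m_new
        out[i] = m
    return out
```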


3. Consistency

Let s_o(t) and s_j(t) be the MADs of the conditional distributions of Y | T = t and X_j | T = t, respectively, and define φ_j(t), 0 ≤ j ≤ p, as the solutions of

    E[ψ((Y − φ_o(t))/s_o(t)) | T = t] = 0,
    E[ψ((X_j − φ_j(t))/s_j(t)) | T = t] = 0,  1 ≤ j ≤ p.

Let φ(t) be the vector with jth component φ_j(t), 1 ≤ j ≤ p. We will derive consistency for both kernel and nearest-neighbor-with-kernel estimates; for this reason, the assumptions are split according to the weights used. We will consider the following set of assumptions.

A1. ψ: R → R is an odd, strictly increasing, bounded and continuously differentiable function satisfying u ψ'(u) ≤ ψ(u).
A2. F_o(y | T = t) and F_j(x | T = t) are symmetric around φ_o(t) and φ_j(t), for 1 ≤ j ≤ p, respectively.
A3. For any compact set C ⊂ R, the density f_T of T is bounded on C and inf_{t∈C} f_T(t) > 0.
A4. F_o(y | T = t) and F_j(x | T = t) are continuous functions of t. Furthermore, for any compact set C ⊂ R, they satisfy the following equicontinuity condition: for all ε > 0 there exists δ > 0 such that |u − v| < δ implies sup_{t∈C} max_{0≤j≤p} |F_j(u | T = t) − F_j(v | T = t)| < ε.
A5. The kernel K: R → R is a bounded nonnegative function such that ∫ K(u) du = 1, ∫ |u| K(u) du < ∞, |u| K(u) → 0 as |u| → ∞, and K satisfies a Lipschitz condition of order one.
A6. The sequence h = h_n is such that h_n → 0, n h_n → ∞ and n h_n / log n → ∞.
A7. The density f_T of T is a continuous function.
A8. The kernel K: R → R satisfies K(uz) ≥ K(z) for any u ∈ (0, 1).
A9. The sequence k = k_n is such that k_n/n → 0, k_n → ∞ and k_n / log n → ∞.

Remark 3.1. This set of assumptions can be divided into three groups. The first establishes standard conditions on the score function ψ. The second states regularity conditions on the marginal density of T and on the conditional distribution functions, which imply that, for 0 ≤ j ≤ p and any compact set C, 0 < inf_{t∈C} s_j(t) ≤ sup_{t∈C} s_j(t) < ∞. The third restricts the class of kernel functions to be chosen and establishes conditions on the rate of convergence of the smoothing parameters, which are standard in nonparametric regression.

The following result will be needed in order to ensure consistency of both the regression parameter and the function g, when the smoothing is based either on local medians or on local M-smoothers.


Proposition 1. Assume that A3-A5 hold. Moreover, assume that A6 holds for kernel weights and that A7-A9 hold for nearest-neighbor-with-kernel weights. Then, for any compact set C,

(a) under A1 and A2, we have that sup_{t∈C} |φ̂_{j,M}(t) − φ_j(t)| → 0 a.s., 0 ≤ j ≤ p;
(b) if, in addition, F_j(x | T = t) has a unique median at φ_j(t), for 0 ≤ j ≤ p, we have that

    sup_{t∈C} |φ̂_{j,med}(t) − φ_j(t)| → 0 a.s.,  0 ≤ j ≤ p.              (9)

Remark 3.2. Write R(t) = Y − φ_o(t) and Z(t) = X − φ(t); then R(t) − β_o'Z(t) = Y − β_o'X − (φ_o(t) − β_o'φ(t)) = g(t) − g̃(t) + ε, with g̃(t) = φ_o(t) − β_o'φ(t). In order to guarantee Fisher consistency, it is necessary that g̃(t) equal g(t). For instance, if Z(t) has noncollinear components, X_j | T = t is symmetric around φ_j(t) and Y | T = t is symmetric around β_o'φ(t) + g(t), i.e., if A2 holds with φ_o(t) = β_o'φ(t) + g(t), we have that g̃ = g. Therefore, the robust estimates defined in Section 2.2 will be consistent for β_o, as will be shown in Theorem 1. First, we state an auxiliary lemma.

Lemma 1. Let (r_i, z_i', t_i)' ∈ R^{p+2}, 1 ≤ i ≤ n, be i.i.d. random vectors over (Ω, A, P) such that (r_i, z_i')' have common distribution P. Let ν̂_o(t) and ν̂(t) = (ν̂_1(t), ..., ν̂_p(t))' be random functions such that for any compact set C ⊂ R

    sup_{t∈C} |ν̂_j(t)| → 0 a.s.,  0 ≤ j ≤ p.                             (10)

Define P_n and Q_n as the following empirical measures over R^{p+1}:

    P_n(A) = (1/n) Σ_{i=1}^n I_A(r_i, z_i),   Q_n(A) = (1/n) Σ_{i=1}^n I_A(r_i + ν̂_o(t_i), z_i + ν̂(t_i)),

where A ⊂ R^{p+1} is a Borel set. Then,

(a) for any bounded and continuous function f: R^{p+1} → R we have that |E_{Q_n}(f) − E_{P_n}(f)| → 0 a.s.;
(b) Π(Q_n, P) → 0 a.s., where Π stands for the Prohorov distance.

Theorem 1. Let (y_i, x_i', t_i)', 1 ≤ i ≤ n, be independent random vectors satisfying (2). Denote by P the distribution of (r_i, z_i')' = (y_i − φ_o(t_i), (x_i − φ(t_i))')', where φ_o(t) and φ(t) are defined in A2 with φ_o(t) = β_o'φ(t) + g(t). Assume that φ̂_j(t), 0 ≤ j ≤ p, are estimates of φ_j(t) such that for any compact set C ⊂ R

    sup_{t∈C} |φ̂_j(t) − φ_j(t)| → 0 a.s.,  0 ≤ j ≤ p.                    (11)


Let β(H) be a regression functional for the model u = β'v + ε, where (u, v')' ∼ H and ε and v are independent. Assume that β(H) is continuous at P and that it also provides Fisher-consistent estimates. If P̂_n(A) = (1/n) Σ_{i=1}^n I_A(r̂_i, ẑ_i) with r̂_i = y_i − φ̂_o(t_i) and ẑ_i = x_i − φ̂(t_i), where φ̂(t) = (φ̂_1(t), ..., φ̂_p(t))' and β̂_ROB = β(P̂_n), we have that β̂_ROB → β_o a.s.

Proof. From (11) and Lemma 1, we have that |E_{P̂_n}(f) − E_{P_n}(f)| → 0 a.s., where

    P_n(A) = (1/n) Σ_{i=1}^n I_A(r_i, z_i),

and so Π(P̂_n, P) → 0 a.s. The result now follows from the continuity of the functional β(H) and from the fact that r_i = β_o'z_i + ε_i.

Remark 3.3. Note that the continuity condition at P is fulfilled for most robust regression estimates if the components of Z(T) are not collinear. The implication of this condition is that no linear combination of the components of X can be a function of T, and therefore 1_n is not in the space spanned by the column vectors of X.

Corollary. Let (y_i, x_i', t_i)', 1 ≤ i ≤ n, be independent random vectors satisfying (2), and assume that φ̂_j(t), 0 ≤ j ≤ p, are estimates of φ_j(t) such that, for any compact set C ⊂ R, sup_{t∈C} |φ̂_j(t) − φ_j(t)| → 0 a.s., 0 ≤ j ≤ p. Under the conditions stated in Theorem 1, the estimates ĝ(t) = φ̂_o(t) − β̂_ROB'φ̂(t) of the regression function g are uniformly consistent over compact sets.

Remark 3.4. Using Proposition 1 and Theorem 1, we have that the proposed robust estimates, introduced in Section 2.2, are consistent if no linear combination of the components of X is a function of T.

4. Asymptotic distribution

Let ψ_1 and w_2 be a score and a weight function, respectively. In this section, we will derive the asymptotic distribution of the regression parameter estimates defined as any solution of

    Σ_{i=1}^n ψ_1((r̂_i − β'ẑ_i)/s_n) w_2(‖ẑ_i‖) ẑ_i = 0,                 (12)

with r̂_i = y_i − φ̂_o(t_i), ẑ_i = x_i − φ̂(t_i) and s_n an estimate of the residual scale. Let φ̂_o(t) and φ̂(t) denote consistent estimates of φ_o(t) and φ(t), respectively, with φ_o(t) = β_o'φ(t) + g(t). In order to derive the asymptotic distribution of the regression parameter estimates, we will require the covariates t_i to lie in a compact set; thus, without loss of generality, we will assume that t_i ∈ [0, 1]. Denote by (R(T), Z(T)')' a random vector with the same distribution as (r_i, z_i')' = (y_i − φ_o(t_i), (x_i − φ(t_i))')'. Thus, R(T) − Z(T)'β_o ∼ ε, with ε as in (2). We will need the following set of assumptions.

N1. ψ_1 is an odd, bounded and twice continuously differentiable function with bounded derivatives ψ_1' and ψ_1'', such that ϕ_1(t) = t ψ_1'(t) and ϕ_2(t) = t ψ_1''(t) are bounded.
N2. E(w_2(‖Z(T)‖) ‖Z(T)‖²) < ∞ and the matrix

    A = E[ψ_1'((R(T) − Z(T)'β_o)/σ_o) w_2(‖Z(T)‖) Z(T) Z(T)'] = E[ψ_1'(ε/σ_o)] E[w_2(‖Z(T)‖) Z(T) Z(T)']

is nonsingular.
N3. w_2(u) = ψ_2(u) u^{-1} ≥ 0 is a bounded function, Lipschitz of order 1. Moreover, ψ_2 is also a bounded and continuously differentiable function with bounded derivative ψ_2', such that λ_2(t) = t ψ_2'(t) is bounded.
N4. E(w_2(‖Z(T)‖) Z(T) | T = t) = 0 for almost all t.
N5. The functions φ_j(t), 0 ≤ j ≤ p, are continuous with first derivatives φ_j'(t) continuous in [0, 1].

As noted by Robinson (1988), condition N2 prevents any element of X from being a.s. perfectly predictable by T. The additional condition implied by N2 is the lack of multicollinearity among the columns of X − φ(T), which fails if X itself is collinear. The smoothness condition N5 is a standard requirement in classical kernel estimation in semiparametric models in order to guarantee asymptotic normality; see, for instance, Robinson (1988) and Severini and Wong (1992).

Lemma 2. Let (y_i, x_i', t_i)', 1 ≤ i ≤ n, be independent random vectors satisfying (2) with ε_i independent of (x_i', t_i)'. Assume that the t_i are random variables with distribution on [0, 1]. Denote by (R(T), Z(T)')' a random vector with the same distribution as (r_i, z_i')' = (y_i − φ_o(t_i), (x_i − φ(t_i))')'. Let φ̂_j(t), 0 ≤ j ≤ p, be estimates of φ_j(t) such that

    sup_{t∈[0,1]} |φ̂_j(t) − φ_j(t)| → 0 in probability,  0 ≤ j ≤ p,

and assume that β̃ → β_o and s_n → σ_o in probability. Then, under N1-N3, A_n → A in probability, where A is given in N2 and

    A_n = (1/n) Σ_{i=1}^n ψ_1'((r̂_i − ẑ_i'β̃)/s_n) w_2(‖ẑ_i‖) ẑ_i ẑ_i'.
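To make (12) concrete: it is a weighted M-estimating equation that can be solved numerically by iteratively reweighted least squares. The following minimal sketch is our own construction, not the authors' code; it uses the Huber function for ψ_1, while the weight function w_2 and the scale s_n are supplied by the caller.

```python
import numpy as np

def solve_estimating_eq(r, Z, s_n, w2, n_iter=50, tol=1e-8):
    # Solves sum_i psi1((r_i - beta'z_i)/s_n) w2(||z_i||) z_i = 0, Eq. (12),
    # by iteratively reweighted least squares from a plain LS start.
    beta = np.linalg.lstsq(Z, r, rcond=None)[0]
    w_z = w2(np.linalg.norm(Z, axis=1))           # carrier weights w2(||z_i||)
    for _ in range(n_iter):
        u = (r - Z @ beta) / s_n
        # psi1(u)/u as a residual weight; 0/0 limit is psi1'(0) = 1
        w_r = np.where(u != 0, huber_psi(u) / u, 1.0)
        W = w_r * w_z
        beta_new = np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (W * r))
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta
```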


Theorem 2. Let (y_i, x_i', t_i)', 1 ≤ i ≤ n, be independent random vectors satisfying (2) with ε_i independent of (x_i', t_i)' and with symmetric distribution. Assume that the t_i are random variables with distribution on [0, 1]. Denote by (R(T), Z(T)')' a random vector with the same distribution as (r_i, z_i')' = (y_i − φ_o(t_i), (x_i − φ(t_i))')', where φ_o(t) = β_o'φ(t) + g(t). Let φ̂_j(t), 0 ≤ j ≤ p, be estimates of φ_j(t) such that φ̂_j(t) has a continuous first derivative and

    n^{1/4} sup_{t∈[0,1]} |φ̂_j(t) − φ_j(t)| → 0 in probability,  0 ≤ j ≤ p,   (13)
    sup_{t∈[0,1]} |φ̂_j'(t) − φ_j'(t)| → 0 in probability,  0 ≤ j ≤ p.         (14)

Then, if s_n → σ_o in probability, under N1-N5,

    n^{1/2} (β̂ − β_o) → N(0, A^{-1} Σ (A^{-1})') in distribution,

where A is defined in N2 and

    Σ = σ_o² E[ψ_1²((R(T) − Z(T)'β_o)/σ_o) w_2²(‖Z(T)‖) Z(T) Z(T)'] = σ_o² E[ψ_1²(ε/σ_o)] E[w_2²(‖Z(T)‖) Z(T) Z(T)'].

Remark 4.1. Following arguments analogous to those used in Boente and Fraiman (1991), it can be shown that (13) holds under A5 for the optimal bandwidth of order n^{-1/5}. On the other hand, (14) can be derived similarly to Proposition 2.1 in Boente et al. (1997) under regularity conditions on the kernel K.

5. Monte Carlo study

A simulation study was carried out for a regression parameter of dimension 2. The behavior of the least-squares estimates was compared with that of the estimates obtained by smoothing with:

• a local M-estimate with bisquare score function with constant 4.685, which gives 95% efficiency;
• a local median.

After smoothing the response variable y and the regression covariates x, the following regression estimates of β_o were computed:

• M-estimates with the Huber function with constant 1.345;
• M-estimates with the bisquare (Tukey) function with constant 4.685;
• GM-estimates with Huber functions with constants 1.73 on the regression variables and 1.6 on the residuals;
• the least median of squares;
• the least trimmed squares with 33% of the observations trimmed.


In all the tables and figures, LS denotes the least-squares estimate, MH and MT the M-estimates obtained with the Huber and the Tukey function, GM the GM-estimates, while LTS and LMS denote the high breakdown point estimates obtained using the least trimmed squares and the least median of squares. When the local median is used as smoother, the estimates are indicated as MH.m, MT.m, GM.m, LTS.m and LMS.m, respectively. In the smoothing procedure, we used a bandwidth h = 0.04 and the Gaussian kernel with standard deviation 0.25/0.675 = 0.37, so that its interquartile range is 0.5. The performance of an estimate ĝ of g is measured through

    MSE(ĝ) = (1/n) Σ_{i=1}^n [ĝ(t_i) − g(t_i)]²,
    MedSE(ĝ) = median([ĝ(t_i) − g(t_i)]²).

We performed 500 replications, generating independent samples of size n = 100 following the model

    y_i = β_o'x_i + 2 sin(4π t_i) + ε_i,  1 ≤ i ≤ n,

where β_o = (β_1, β_2)' = (3, 3)', (x_i', t_i)' ∼ N(μ, Σ) with μ = (0, 0, 1/2)' and

    Σ = [ 1        0        1/(6√3)
          0        1        1/(6√3)
          1/(6√3)  1/(6√3)  1/36    ],

and ε_i ∼ N(0, σ²) with σ² = 0.25 in the uncontaminated case. Results for normal data sets are indicated by C0 in the tables, while C1 and C2 denote the following two contaminations:

• C1: ε_1, ..., ε_n are i.i.d. 0.9 N(0, σ²) + 0.1 N(0, 25σ²). This contamination inflates the errors and thus only affects the variance of the regression estimates.
• C2: ε_1, ..., ε_n are i.i.d. 0.9 N(0, σ²) + 0.1 N(0, 25σ²) and, artificially, 10 observations of the carriers, but not of the response variables, were modified to equal (20, 20)' at equally spaced values of t. This corresponds to introducing high-leverage points; the aim of this contamination is to study changes in bias in the estimation of the regression parameter.

The following tables summarize the results of the simulations. Tables 1 and 2 give means, standard deviations and mean square errors for the regression estimates of β_1 and β_2, while Fig. 1 shows the boxplots of the estimates of β_1 and β_2, respectively. Table 3 gives the mean and median of MSE(ĝ), while Table 4 only shows the median of MedSE(ĝ) over the 500 replications, since in this case the mean and the median are quite similar. A minimal sketch of the data-generating process used in the replications is given below.
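The sketch assumes our own naming throughout; in particular, the placement of the 10 leverage points at equally spaced values of t is approximated here by equally spaced sample indices.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_sample(n=100, sigma2=0.25, contamination="C0"):
    # One sample from y = 3*x1 + 3*x2 + 2*sin(4*pi*t) + eps under
    # C0 (clean), C1 (inflated errors) or C2 (C1 plus leverage points).
    s = 1.0 / (6.0 * np.sqrt(3.0))
    cov = np.array([[1.0, 0.0, s],
                    [0.0, 1.0, s],
                    [s,   s,   1.0 / 36.0]])
    xt = rng.multivariate_normal([0.0, 0.0, 0.5], cov, size=n)
    X, t = xt[:, :2], xt[:, 2]
    sd = np.sqrt(sigma2)
    if contamination == "C0":
        eps = rng.normal(0.0, sd, n)
    else:  # C1 and C2 share the 0.9 N(0, s2) + 0.1 N(0, 25 s2) errors
        bad = rng.random(n) < 0.1
        eps = rng.normal(0.0, np.where(bad, 5.0 * sd, sd))
    if contamination == "C2":
        idx = np.linspace(0, n - 1, 10).astype(int)  # approximation, see above
        X[idx] = [20.0, 20.0]                        # high-leverage carriers
    y = X @ np.array([3.0, 3.0]) + 2.0 * np.sin(4.0 * np.pi * t) + eps
    return y, X, t
```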


Table 1. Estimation of the first coordinate β_1 of β_o

         LS     MH     MT     GM     LMS    LTS    MH.m   MT.m   GM.m   LMS.m  LTS.m
C0
Mean     2.999  2.994  2.994  3.007  3.010  3.001  2.616  2.616  2.646  2.406  2.469
SD       0.078  0.080  0.081  0.083  0.200  0.165  0.178  0.186  0.168  0.615  0.476
MSE      0.006  0.006  0.007  0.007  0.040  0.027  0.179  0.181  0.154  0.729  0.509
Median   2.997  2.993  2.994  3.006  3.006  3.002  2.632  2.628  2.660  2.507  2.524
MAD      0.067  0.071  0.073  0.079  0.189  0.153  0.177  0.182  0.159  0.568  0.473
C1
Mean     3.000  2.993  2.993  3.008  2.994  2.985  2.602  2.597  2.633  2.410  2.446
SD       0.140  0.100  0.096  0.106  0.218  0.173  0.202  0.210  0.194  0.653  0.514
MSE      0.020  0.010  0.009  0.011  0.048  0.030  0.199  0.206  0.172  0.775  0.570
Median   2.993  2.995  2.997  3.006  2.990  2.988  2.611  2.599  2.646  2.550  2.516
MAD      0.139  0.093  0.089  0.105  0.214  0.162  0.189  0.196  0.167  0.619  0.473
C2
Mean     0.024  0.024  0.026  2.329  3.017  3.013  0.022  0.025  1.193  2.025  2.127
SD       0.222  0.226  0.230  0.258  0.277  0.201  0.224  0.402  0.235  0.785  0.638
MSE      8.903  8.910  8.897  0.516  0.077  0.040  8.917  8.901  3.426  1.564  1.167
Median   0.025  0.026  0.029  2.361  2.991  3.011  0.034  0.032  1.210  2.127  2.189
MAD      0.214  0.223  0.224  0.228  0.278  0.205  0.217  0.211  0.420  0.816  0.638

Table 2. Estimation of the second coordinate β_2 of β_o

         LS      MH      MT      GM     LMS    LTS    MH.m    MT.m    GM.m   LMS.m  LTS.m
C0
Mean     2.999   2.994   2.995   3.006  3.007  3.001  2.619   2.618   2.645  2.396  2.463
SD       0.082   0.085   0.086   0.086  0.208  0.168  0.196   0.204   0.184  0.610  0.489
MSE      0.006   0.007   0.007   0.007  0.043  0.028  0.183   0.187   0.159  0.737  0.527
Median   3.001   2.998   2.999   3.008  3.006  2.992  2.542   2.637   2.649  2.480  2.542
MAD      0.076   0.080   0.084   0.085  0.202  0.153  0.197   0.205   0.178  0.566  0.485
C1
Mean     2.996   2.993   2.993   3.005  2.992  2.992  2.604   2.598   2.632  2.409  2.446
SD       0.140   0.104   0.099   0.108  0.220  0.177  0.215   0.223   0.208  0.637  0.500
MSE      0.020   0.011   0.009   0.012  0.048  0.031  0.203   0.211   0.179  0.754  0.556
Median   3.000   2.995   2.991   3.008  2.979  2.990  2.626   2.628   2.639  2.533  2.534
MAD      0.137   0.098   0.100   0.099  0.222  0.170  0.475   0.588   0.199  0.207  0.202
C2
Mean     -0.017  -0.012  -0.010  2.319  3.014  3.009  -0.008  -0.003  1.176  2.028  2.135
SD       0.226   0.231   0.235   0.260  0.206  0.280  0.229   0.238   0.402  0.788  0.643
MSE      9.155   9.128   9.116   0.530  0.079  0.043  9.101   9.076   3.489  1.564  1.161
Median   -0.020  -0.014  -0.011  2.352  3.026  3.006  -0.009  -0.006  1.180  2.105  2.222
MAD      0.232   0.225   0.236   0.211  0.262  0.189  0.215   0.227   0.421  0.822  0.618

Furthermore, in Fig. 2, due to the skewness of the distribution, the density estimates of MedSE(ĝ) are plotted. The density estimates were evaluated using the normal kernel with bandwidth 0.3 in all cases, except for the least-squares and the M-estimates under C2, where, due to the different range of variation, we used a bandwidth equal to 3. The plots given in black correspond to the densities of MedSE(ĝ) evaluated over the 500 normally distributed samples, while those in red correspond to C1 and those in green to C2.

[Fig. 1. Boxplots of the estimates of the regression parameter β_o under C0, C1 and C2: regression estimates of β_1 and β_2 based on local M-estimators (LS, MH, MT, GM, LMS, LTS) and on local medians (MH.m, MT.m, GM.m, LMS.m, LTS.m).]

Table 3. Estimation of the regression function g (mean square error)

           LS     MH     MT     GM     LMS    LTS    MH.m   MT.m   GM.m   LMS.m  LTS.m
C0 Mean    0.048  0.065  0.065  0.065  0.103  0.087  1.097  1.102  1.077  1.786  1.514
   Median  0.045  0.059  0.060  0.059  0.081  0.073  1.008  1.025  1.017  1.236  1.168
C1 Mean    0.150  0.155  0.155  0.156  0.200  0.181  1.231  1.239  1.209  1.937  1.671
   Median  0.125  0.136  0.136  0.135  0.169  0.155  1.142  1.144  1.136  1.359  1.291
C2 Mean    11.08  11.12  11.11  4.007  5.603  5.707  11.99  11.97  7.689  9.310  9.040
   Median  10.86  10.91  10.91  0.729  0.499  0.476  11.76  11.76  4.953  2.476  2.302



Table 4. Estimation of the regression function g (median square error)

           LS     MH     MT     GM     LMS    LTS    MH.m   MT.m   GM.m   LMS.m  LTS.m
C0 Median  0.016  0.023  0.023  0.022  0.031  0.026  0.388  0.396  0.386  0.459  0.427
C1 Median  0.036  0.042  0.042  0.042  0.052  0.047  0.428  0.432  0.421  0.484  0.447
C2 Median  3.775  3.890  3.914  0.207  0.157  0.142  4.244  4.256  1.592  0.837  0.769

5.1. Conclusions

The simulation confirms the expected inadequate behavior of the least-squares estimates in the presence of outliers. With respect to the estimation of β_o, both bias and an increased standard deviation are observed, especially under C2. Note also that, under C2, the best performance in estimating the regression parameters is obtained by the least median of squares and the least trimmed squares estimates; see Tables 1 and 2 and Fig. 1. The M-estimates increase their mean square error under C1 and C2, in the first case due to an inflated variance and in the second due to bias. Recall that M-estimates break down when leverage points are present. Fig. 1 shows not only the poor behavior of least squares and M-estimates in the presence of outliers in the carriers, but also the moderate sensitivity of GM-estimates to C2. On the other hand, both the least median and the least trimmed of squares behave robustly, since their boxplots look quite similar for normal and for contaminated samples. This resistance to outliers is more evident for the least trimmed squares, mainly due to the numerical instability of the least median. In general, the estimates based on initial M-smoothers show less variability than those based on the local median.

With regard to the estimation of the function g, as can be seen in Tables 3 and 4, under C2 the least-squares and the M-estimators estimate the regression function inadequately. The estimates based on GM-estimators, and especially those based on the least median and least trimmed estimates, show a better performance. Notice that, even for these estimates, large values of MSE(ĝ) occur, since the means are considerably larger than the medians in Table 3. The plots in Fig. 2 show not only the poor behavior of the least-squares estimates and of the M-estimates in the presence of high-leverage outliers, but also the sensitivity of those based on GM-estimates. On the other hand, the least trimmed and least median of squares seem to be the more robust procedures. Also, all methods, including least squares, appear to be mainly unaffected by the presence of outliers in the errors (contamination C1), but the procedures based on local medians are more stable under this contamination than those based on M-smoothers. However, a bias in the estimation with local medians can be observed, since the mode of the density of MedSE(ĝ) using local medians is near 0.5, while it is closer to 0 for local M-smoothers. In consequence, the MSE(ĝ) and MedSE(ĝ) obtained from least squares and an M-smoother are on the whole smaller than those resulting from local medians under C0 and C1.

[Fig. 2. Density estimates of the median square error, MedSE(ĝ), for the estimates based on local M-estimates (panels LS, MH, MT, GM, LMS, LTS) and on local medians (panels MH.m, MT.m, GM.m, LMS.m, LTS.m). The plots given in black correspond to the 500 normally distributed samples, while those in red correspond to C1 and those in green to C2.]


Acknowledgements

The authors wish to thank an anonymous referee for valuable comments which led to an improved version of the original paper.

Appendix

Proof of Proposition 1. (a) Follows using Theorem 3.3 of Boente and Fraiman (1991).

(b) The equicontinuity condition required in A4 and the uniqueness of the conditional median imply that φ_j(t) is a continuous function of t, and thus, for any fixed a ∈ R, the function h_a(t) = F_j(a + φ_j(t) | T = t) is also continuous as a function of t. Given ε > 0, let 0 < δ < ε be such that

    |u − v| < δ ⇒ sup_{t∈C} max_{0≤j≤p} |F_j(u | T = t) − F_j(v | T = t)| < ε/2.   (A.1)

Then, from the uniqueness of the conditional median and (A.1) we get that, for 0 ≤ j ≤ p,

    1/2 < F_j(φ_j(t) + δ | T = t) < 1/2 + ε/2,                            (A.2)
    1/2 − ε/2 < F_j(φ_j(t) − δ | T = t) < 1/2.                            (A.3)

Write m_j(δ) = inf_{t∈C} F_j(φ_j(t) + δ | T = t) and M_j(δ) = sup_{t∈C} F_j(φ_j(t) − δ | T = t). The continuity of h_δ(t) and h_{−δ}(t), together with (A.2) and (A.3), entails that, for any 0 ≤ j ≤ p, M_j(δ) < 1/2 < m_j(δ), and thus

    γ = min_{0≤j≤p} min(m_j(δ) − 1/2, 1/2 − M_j(δ)) > 0.

Using Theorem 3.1 or 3.2 of Boente and Fraiman (1991), we have that

    sup_{t∈C} sup_{x∈R} |F̂_j(x | T = t) − F_j(x | T = t)| → 0 a.s.,  0 ≤ j ≤ p.

Let N be such that P(N) = 0 and, for any ω ∉ N, max_{0≤j≤p} sup_{t∈C} sup_{x∈R} |F̂_j(x | T = t) − F_j(x | T = t)| → 0. Thus, for n large enough, we have that max_{0≤j≤p} sup_{t∈C} sup_{x∈R} |F̂_j(x | T = t) − F_j(x | T = t)| < min(γ/2, ε/2) = ε_1. Therefore, for 0 ≤ j ≤ p and t ∈ C, we have that

    F_j(φ_j(t) + δ | T = t) − ε_1 < F̂_j(φ_j(t) + δ | T = t) < F_j(φ_j(t) + δ | T = t) + ε_1,
    F_j(φ_j(t) − δ | T = t) − ε_1 < F̂_j(φ_j(t) − δ | T = t) < F_j(φ_j(t) − δ | T = t) + ε_1,

which entail that 1/2 < F̂_j(φ_j(t) + δ | T = t) < 1/2 + ε and 1/2 − ε < F̂_j(φ_j(t) − δ | T = t) < 1/2, and so max_{0≤j≤p} sup_{t∈C} |φ̂_{j,med}(t) − φ_j(t)| ≤ δ < ε, which concludes the proof.


Proof of Lemma 1. (a) For any ε > 0, there exist compact sets C_1 ⊂ R^{p+1} and C_2 ⊂ R such that, if C = C_1 × C_2, P(C) > 1 − ε/(4‖f‖_∞). Note that |E_{Q_n}(f) − E_{P_n}(f)| ≤ A_{1n} + A_{2n}, where

    A_{1n} = (1/n) Σ_{i=1}^n |f(r_i + ν̂_o(t_i), z_i + ν̂(t_i)) − f(r_i, z_i)| I_C(r_i, z_i, t_i),
    A_{2n} = 2‖f‖_∞ (1/n) Σ_{i=1}^n I_{C^c}(r_i, z_i, t_i).

From (10) and the Strong Law of Large Numbers, we have that there exists a set N ⊂ Ω with P(N) = 0 such that, for any ω ∉ N,

    sup_{t∈C_2} |ν̂_o(t)| + sup_{t∈C_2} ‖ν̂(t)‖ → 0 and (1/n) Σ_{i=1}^n I_{C^c}(r_i, z_i, t_i) → P(C^c).   (A.4)

Hence, for n large enough, A_{2n} ≤ ε/2 for ω ∉ N. Denote by C̃_1 the closure of a neighborhood of radius 1 of C_1. The uniform continuity of f on C̃_1 implies that there exists δ such that max_{1≤j≤p+1} |u_j − v_j| < δ, u, v ∈ C̃_1, entails |f(u) − f(v)| < ε/2. Hence, from (A.4), we have that, for ω ∉ N and n large enough, max_{0≤j≤p} sup_{t∈C_2} |ν̂_j(t)| < δ and so, for 1 ≤ i ≤ n,

    |f(r_i + ν̂_o(t_i), z_i + ν̂(t_i)) − f(r_i, z_i)| I_C(r_i, z_i, t_i) < ε/2,

which entails that A_{1n} < ε/2. Therefore, |E_{Q_n}(f) − E_{P_n}(f)| < ε for n large enough and ω ∉ N.

(b) Follows immediately.

From now on, C_η will denote the Lipschitz constant of a Lipschitz function η.

Proof of Lemma 2. For any matrix B ∈ R^{p×p}, let |B| = max_{1≤ℓ,j≤p} |b_{ℓj}|. Denote by ξ_i intermediate points between r_i − z_i'β̃ and r̂_i − ẑ_i'β̃, and write ν̂_j(t) = φ̂_j(t) − φ_j(t) for 0 ≤ j ≤ p and ν̂(t) = (ν̂_1(t), ..., ν̂_p(t))'. A first-order Taylor expansion and some algebra lead to A_n = A_n^{(1)} + A_n^{(2)} + A_n^{(3)} + A_n^{(4)}, where

    A_n^{(1)} = (1/n) Σ_{i=1}^n ψ_1'((r_i − z_i'β̃)/s_n) w_2(‖z_i‖) z_i z_i',
    A_n^{(2)} = −(1/n) Σ_{i=1}^n ψ_1'((r̂_i − ẑ_i'β̃)/s_n) w_2(‖ẑ_i‖) [ν̂(t_i) z_i' + ẑ_i ν̂(t_i)'],
    A_n^{(3)} = −(1/n) Σ_{i=1}^n ψ_1''(ξ_i/s_n) ((ν̂_o(t_i) − ν̂(t_i)'β̃)/s_n) w_2(‖z_i‖) z_i z_i',
    A_n^{(4)} = (1/n) Σ_{i=1}^n ψ_1'((r̂_i − ẑ_i'β̃)/s_n) [w_2(‖ẑ_i‖) − w_2(‖z_i‖)] z_i z_i'.

Analogous arguments to those used in Lemma 1 of Bianco and Boente (2002) allow us to show that A_n^{(1)} → A in probability. From N3, it is easy to see that

    ‖z_i‖² |w_2(‖ẑ_i‖) − w_2(‖z_i‖)| ≤ ‖ν̂(t_i)‖ (‖ψ_2‖_∞ + ‖ν̂(t_i)‖(‖w_2‖_∞ + ‖ψ_2'‖_∞) + ‖λ_2‖_∞).

Now the result follows from N2, the consistency of s_n and β̃, the Law of Large Numbers and the fact that max_{0≤j≤p} sup_{t∈[0,1]} |ν̂_j(t)| → 0 in probability, since

    |A_n^{(2)}| ≤ 2 ‖ψ_1'‖_∞ max_{0≤j≤p} sup_{t∈[0,1]} |ν̂_j(t)| (‖ψ_2‖_∞ + ‖w_2‖_∞ max_{0≤j≤p} sup_{t∈[0,1]} |ν̂_j(t)|),
    |A_n^{(3)}| ≤ ‖ψ_1''‖_∞ max_{0≤j≤p} sup_{t∈[0,1]} |ν̂_j(t)| ((1 + p‖β̃‖)/s_n) (1/n) Σ_{i=1}^n w_2(‖z_i‖) ‖z_i‖²,
    |A_n^{(4)}| ≤ p ‖ψ_1'‖_∞ max_{1≤j≤p} sup_{t∈[0,1]} |ν̂_j(t)| (‖ψ_2‖_∞ + p max_{1≤j≤p} sup_{t∈[0,1]} |ν̂_j(t)| (‖w_2‖_∞ + ‖ψ_2'‖_∞) + ‖λ_2‖_∞).

Proof of Theorem 2. Write

    L_n(σ, β) = (σ/n) Σ_{i=1}^n ψ_1((r_i − z_i'β)/σ) w_2(‖z_i‖) z_i,
    L̂_n(σ, β) = (σ/n) Σ_{i=1}^n ψ_1((r̂_i − ẑ_i'β)/σ) w_2(‖ẑ_i‖) ẑ_i.

Using a first-order Taylor expansion around β̂, we get

    L̂_n(σ, β_o) = (σ/n) Σ_{i=1}^n ψ_1((r̂_i − ẑ_i'β̂)/σ) w_2(‖ẑ_i‖) ẑ_i + (1/n) Σ_{i=1}^n ψ_1'((r̂_i − ẑ_i'β̃)/σ) w_2(‖ẑ_i‖) ẑ_i ẑ_i' (β̂ − β_o),

with β̃ an intermediate point between β̂ and β_o. Since β̂ solves (12), this implies that

    L̂_n(s_n, β_o) = 0 + [(1/n) Σ_{i=1}^n ψ_1'((r̂_i − ẑ_i'β̃)/s_n) w_2(‖ẑ_i‖) ẑ_i ẑ_i'] (β̂ − β_o),

and so we get that β̂ − β_o = A_n^{-1} L̂_n(s_n, β_o), with

    A_n = (1/n) Σ_{i=1}^n ψ_1'((r̂_i − ẑ_i'β̃)/s_n) w_2(‖ẑ_i‖) ẑ_i ẑ_i'.

From the consistency of β̂, Lemma 2 implies that A_n → A in probability and therefore, from N2, it will be enough to show that

(a) n^{1/2} L_n(σ_o, β_o) → N(0, Σ) in distribution;
(b) n^{1/2} [L̂_n(s_n, β_o) − L_n(s_n, β_o)] → 0 in probability;
(c) n^{1/2} [L_n(s_n, β_o) − L_n(σ_o, β_o)] → 0 in probability.

(a) Follows immediately from the Central Limit Theorem, since r_i − z_i'β_o = ε_i.

(b) Denote by ξ_i intermediate points between r_i − z_i'β_o and r̂_i − ẑ_i'β_o, and write ν̂_j(t) = φ̂_j(t) − φ_j(t) for 0 ≤ j ≤ p and ν̂(t) = (ν̂_1(t), ..., ν̂_p(t))'. Using a second-order Taylor expansion, we have that L̂_n(s_n, β_o) = L_n(s_n, β_o) + L̂_{n,1} + L̂_{n,2} + L̂_{n,3} + L̂_{n,4} + L̂_{n,5} + L̂_{n,6}, where

    L̂_{n,1} = (1/n) Σ_{i=1}^n ψ_1'((r_i − z_i'β_o)/s_n) [ν̂(t_i)'β_o − ν̂_o(t_i)] w_2(‖z_i‖) z_i,
    L̂_{n,2} = (s_n/n) Σ_{i=1}^n ψ_1((r_i − z_i'β_o)/s_n) w_2(‖ẑ_i‖) (ẑ_i − z_i),
    L̂_{n,3} = (s_n/n) Σ_{i=1}^n ψ_1((r_i − z_i'β_o)/s_n) [w_2(‖ẑ_i‖) − w_2(‖z_i‖)] z_i,
    L̂_{n,4} = (1/(2 s_n)) (1/n) Σ_{i=1}^n ψ_1''(ξ_i/s_n) [ν̂(t_i)'β_o − ν̂_o(t_i)]² w_2(‖ẑ_i‖) z_i,
    L̂_{n,5} = (1/n) Σ_{i=1}^n ψ_1'((r_i − z_i'β_o)/s_n) [ν̂(t_i)'β_o − ν̂_o(t_i)] [w_2(‖ẑ_i‖) − w_2(‖z_i‖)] z_i,
    L̂_{n,6} = (s_n/n) Σ_{i=1}^n [ψ_1((r̂_i − ẑ_i'β_o)/s_n) − ψ_1((r_i − z_i'β_o)/s_n)] w_2(‖ẑ_i‖) (ẑ_i − z_i).

Since N3 entails |w_2(‖ẑ_i‖) − w_2(‖z_i‖)| ≤ C ‖ν̂(t_i)‖ / ‖z_i‖, where C = ‖w_2‖_∞ + C_{ψ_2}, and

    n^{1/2} ‖L̂_{n,3}‖ ≤ p ‖w_2‖_∞ ‖ψ_1'‖_∞ n^{1/2} (max_{1≤j≤p} sup_{t∈[0,1]} |ν̂_j(t)|)² (1 + p‖β_o‖),
    n^{1/2} ‖L̂_{n,4}‖ ≤ (1/(2 s_n)) ‖ψ_1''‖_∞ n^{1/2} (max_{1≤j≤p} sup_{t∈[0,1]} |ν̂_j(t)|)² (1 + p‖β_o‖)² (‖ψ_2‖_∞ + p ‖w_2‖_∞ max_{1≤j≤p} sup_{t∈[0,1]} |ν̂_j(t)|),
    n^{1/2} ‖L̂_{n,5}‖ ≤ p C ‖ψ_1'‖_∞ (1 + p‖β_o‖) n^{1/2} (max_{1≤j≤p} sup_{t∈[0,1]} |ν̂_j(t)|)²,

with an analogous bound for n^{1/2} ‖L̂_{n,6}‖, using (13) and the consistency of s_n we get that n^{1/2} L̂_{n,j} → 0 in probability for 3 ≤ j ≤ 6. It remains to show that n^{1/2} L̂_{n,j} → 0 in probability for j = 1, 2, that is,

    n^{1/2} (s_n/n) Σ_{i=1}^n ψ_1'((r_i − z_i'β_o)/s_n) ν̂_ℓ(t_i) w_2(‖z_i‖) z_i → 0 in probability, 0 ≤ ℓ ≤ p,   (A.5)
    n^{1/2} (s_n/n) Σ_{i=1}^n ψ_1((r_i − z_i'β_o)/s_n) [w_2(‖ẑ_i‖) ẑ_i − w_2(‖z_i‖) z_i] → 0 in probability.     (A.6)

For this purpose, we will use the maximal inequality for covering numbers, componentwise, and so we will need to define suitable classes of functions with finite uniform entropy. Fix the coordinate j, 1 ≤ j ≤ p. For any function h, any vector of functions h(t) = (h_1(t), ..., h_p(t))' and any σ ∈ (σ_o/2, 2σ_o), if z_{i,j} denotes the jth coordinate of z_i, we define

    J_{n,1}(σ, h) = n^{1/2} (σ/n) Σ_{i=1}^n ψ_1'((r_i − z_i'β_o)/σ) h(t_i) w_2(‖z_i‖) z_{i,j},
    J_{n,2}(σ, h) = n^{1/2} (σ/n) Σ_{i=1}^n ψ_1((r_i − z_i'β_o)/σ) [w_2(‖z_i + h(t_i)‖)(z_{i,j} + h_j(t_i)) − w_2(‖z_i‖) z_{i,j}],

where we have omitted the subscript j for the sake of simplicity. Let I = (σ_o/2, 2σ_o) and H = {h ∈ C¹[0,1]: ‖h‖_∞ ≤ 1, ‖h'‖_∞ ≤ 1}. Note that, for any probability measure Q, the bracketing number N_{[ ]}(ε, H, L_2(Q)), and so the covering number N(ε, H, L_2(Q)), satisfies

    log N(ε, H, L_2(Q)) ≤ log N_{[ ]}(ε, H, L_2(Q)) ≤ K ε^{-1}

for 0 < ε < 2, where the constant K is independent of the probability measure Q (see Corollary 2.7.2 in van der Vaart and Wellner, 1996).

Consider the classes of functions

    F_1 = {f_{1,σ,h}(r, z, t) = σ ψ_1'((r − z'β_o)/σ) w_2(‖z‖) z_j h(t): σ ∈ I, h ∈ H},
    F_2 = {f_{2,σ,h}(r, z, t) = σ ψ_1((r − z'β_o)/σ) [w_2(‖z + h(t)‖)(z_j + h_j(t)) − w_2(‖z‖) z_j]: σ ∈ I, h(t) = (h_1(t), ..., h_p(t))', h_ℓ ∈ H},

where again we have omitted the subscript j for the sake of simplicity. Note that F_1 and F_2 have as envelopes the constants A_1 = 2σ_o ‖ψ_1'‖_∞ ‖ψ_2‖_∞ and A_2 = 4σ_o ‖ψ_1‖_∞ ‖ψ_2‖_∞, respectively. On the other hand, the independence between the errors ε_i = r_i − z_i'β_o and the carriers (x_i', t_i)' implies that E f(r_i, z_i, t_i) = 0 for any f ∈ F_1, since N4 holds; while E f(r_i, z_i, t_i) = 0 for any f ∈ F_2, since ψ_1 is odd and the errors have a symmetric distribution.

Write ψ_{1,s}(t) = s ψ_1(t/s) and ψ'_{1,s}(t) = s ψ_1'(t/s). From N1, we have that ϕ_1 and ϕ_2 are bounded, which entails that

    |ψ'_{1,s_1}(r) − ψ'_{1,s_2}(r)| ≤ (‖ψ_1'‖_∞ + ‖ϕ_2‖_∞) |s_1 − s_2|,    (A.7)
    |ψ_{1,s_1}(r) − ψ_{1,s_2}(r)| ≤ (‖ψ_1‖_∞ + ‖ϕ_1‖_∞) |s_1 − s_2|.      (A.8)

Write ‖f‖_{Q,2} = (E_Q(f²))^{1/2}, B_1 = (‖ψ_1'‖_∞(3 + 2σ_o) + ‖ϕ_2‖_∞) ‖ψ_2‖_∞ and B_2 = 2‖ψ_2‖_∞(‖ψ_1‖_∞ + ‖ϕ_1‖_∞) + 2σ_o p ‖ψ_1‖_∞ (‖w_2‖_∞(1 + p) + p ‖ψ_2'‖_∞). It is easy to see that, given h ∈ H, σ ∈ I and 0 < ε < 2, ‖h_s − h‖_{Q,2} < ε and |σ_ℓ − σ| < ε entail ‖f_{1,σ_ℓ,h_s} − f_{1,σ,h}‖_{Q,2} ≤ B_1 ε, and so

    N(B_1 ε, F_1, L_2(Q)) ≤ N(ε, H, L_2(Q)) · N(ε, I, |·|).

In an analogous way, given h = (h_1, ..., h_p)' with h_j ∈ H, σ ∈ I and 0 < ε < 2, and h_s = (h_{s,1}, ..., h_{s,p})' with ‖h_{s,j} − h_j‖_{Q,2} < ε for 0 ≤ j ≤ p and |σ_ℓ − σ| < ε, it is easy to see that ‖f_{2,σ_ℓ,h_s} − f_{2,σ,h}‖_{Q,2} ≤ B_2 ε, which implies

    N(B_2 ε, F_2, L_2(Q)) ≤ N(ε, H, L_2(Q))^p · N(ε, I, |·|).

Therefore, these classes of functions have finite uniform entropy. For any class of functions F with envelope F, define the integral

    J(δ, F) = sup_Q ∫_0^δ (1 + log N(ε ‖F‖_{Q,2}, F, L_2(Q)))^{1/2} dε,

where the supremum is taken over all discrete probability measures Q with ‖F‖_{Q,2} > 0. The function J is increasing, J(0, F) = 0, J(1, F) < ∞, and J(δ, F) → 0 as δ → 0 for classes of functions F which satisfy the uniform-entropy condition. Moreover, if F_o ⊂ F and the envelope F is used for F_o, then J(δ, F_o) ≤ J(δ, F).

For any 0 < δ < 1, consider the subclasses

    F_{1,δ} = {f_{1,σ,h} ∈ F_1 with ‖h‖_∞ < δ} ⊂ F_1,
    F_{2,δ} = {f_{2,σ,h} ∈ F_2 with h(t) = (h_1(t), ..., h_p(t))' and ‖h_ℓ‖_∞ < δ, 1 ≤ ℓ ≤ p} ⊂ F_2.

For any ε > 0, let 0 < δ < 1. Using that s_n → σ_o in probability and since (13) and (14) entail, for 0 ≤ j ≤ p,

    sup_{t∈[0,1]} |ν̂_j(t)| = sup_{t∈[0,1]} |φ̂_j(t) − φ_j(t)| → 0,   sup_{t∈[0,1]} |ν̂_j'(t)| = sup_{t∈[0,1]} |φ̂_j'(t) − φ_j'(t)| → 0

in probability, we have that, for n large enough, P(s_n ∈ I) > 1 − δ/2 and P(ν̂_j ∈ H and ‖ν̂_j‖_∞ < δ) > 1 − δ/2, for 0 ≤ j ≤ p. Let A_3 = 2σ_o ‖ψ_1‖_∞ (‖w_2‖_∞(1 + p) + p ‖ψ_2'‖_∞). Straightforward calculations lead us to

    sup_{f∈F_{1,δ}} (1/n) Σ_{i=1}^n f²(r_i, z_i, t_i) ≤ A_1² δ,   sup_{f∈F_{2,δ}} (1/n) Σ_{i=1}^n f²(r_i, z_i, t_i) ≤ A_3² δ.

The maximal inequality for covering numbers entails that, for any 0 ≤ ℓ ≤ p,

    P(|J_{n,1}(s_n, ν̂_ℓ)| > ε) ≤ P(|J_{n,1}(s_n, ν̂_ℓ)| > ε, s_n ∈ I, ν̂_ℓ ∈ H and ‖ν̂_ℓ‖_∞ < δ) + δ
                               ≤ P(sup_{f∈F_{1,δ}} |n^{-1/2} Σ_{i=1}^n f(r_i, z_i, t_i)| > ε) + δ
                               ≤ (1/ε) E(sup_{f∈F_{1,δ}} |n^{-1/2} Σ_{i=1}^n f(r_i, z_i, t_i)|) + δ
                               ≤ (1/ε) D_1 A_1 J(δ, F_1) + δ,

where D_1 is a constant not depending on n. Similarly,

    P(|J_{n,2}(s_n, ν̂)| > ε) ≤ (1/ε) D_2 A_2 J((A_3²/A_2²) δ, F_2) + δ.

Now, (A.5) and (A.6) follow from the fact that lim_{δ→0} J(δ, F_1) = 0 and lim_{δ→0} J(δ, F_2) = 0, since the classes F_1 and F_2 satisfy the uniform-entropy condition.

(c) Since

    n^{1/2} [L_n(s_n, β_o) − L_n(σ_o, β_o)] = n^{-1/2} Σ_{i=1}^n [ψ_{1,s_n}(r_i − z_i'β_o) − ψ_{1,σ_o}(r_i − z_i'β_o)] w_2(‖z_i‖) z_i,

we get the desired result using (A.8), the boundedness of ψ_2 and the maximal inequality for covering numbers, as in (b).

References

Ansley, C., Wecker, W., 1983. Extension and examples of the signal extraction approach to regression. In: Zellner, A. (Ed.), Applied Time Series Analysis of Economic Data. Bureau of the Census, Washington, DC, pp. 181-192.
Bianco, A., Boente, G., 2002. On the asymptotic behavior of one-step estimates in heteroscedastic regression models. Statist. Probab. Lett. 60, 33-47.
Boente, G., Fraiman, R., 1991. Strong uniform convergence rates for some robust equivariant nonparametric regression estimates for mixing processes. Internat. Statist. Rev. 59, 355-372.
Boente, G., Fraiman, R., Meloche, J., 1997. Robust plug-in bandwidth estimators in nonparametric regression. J. Statist. Plann. Inference 57, 109-142.
Chen, H., 1988. Convergence rates for parametric components in a partly linear model. Ann. Statist. 16, 136-146.
Chen, H., Chen, K., 1991. Selection of the splined variables and convergence rates in a partial spline model. Canad. J. Statist. 19, 323-339.
Chen, H., Shiau, J., 1991. A two-stage spline smoothing method for partially linear models. J. Statist. Plann. Inference 25, 187-201.
Chen, H., Shiau, J., 1994. Data-driven efficient estimates for partially linear models. Ann. Statist. 22, 211-237.
Denby, L., 1986. Smooth regression functions. Statistical Research Report 26, AT&T Bell Laboratories, Murray Hill.
Engle, R., Granger, C., Rice, J., Weiss, A., 1986. Semiparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc. 81, 310-320.
Gao, J., 1992. A Large Sample Theory in Semiparametric Regression Models. Ph.D. Thesis, University of Science and Technology of China, Hefei, China.
Gao, J., Liang, H., 1995. Asymptotic normality of pseudo-LS estimator for partly linear autoregression models. Statist. Probab. Lett. 23, 27-34.
Gao, J., Zhao, L., 1993. Adaptive estimation in partly linear regression models. Sci. China Ser. A 1, 14-27.
Gervini, D., Yohai, V.J., 2000. A class of robust and fully efficient regression estimators. Technical report (December 2000), submitted.
Green, P., Jennison, C., Seheult, A., 1985. Analysis of field experiments by least squares smoothing. J. Roy. Statist. Soc. Ser. B 47, 299-315.
Hastie, T.J., Tibshirani, R.J., 1990. Generalized Additive Models. Monographs on Statistics and Applied Probability, Vol. 43. Chapman & Hall, London.
He, X., Shi, P., 1996. Bivariate tensor-product B-spline in a partly linear model. J. Multivariate Anal. 58, 162-181.
He, X., Zhu, Z., Fung, W., 2001. Estimation in a semiparametric model for longitudinal data with unspecified dependence structure. Working paper.
Heckman, N., 1986. Spline smoothing in a partly linear model. J. Roy. Statist. Soc. Ser. B 48, 244-248.
Huber, P., 1981. Robust Statistics. Wiley, New York.
Krasker, W., Welsch, R., 1982. Efficient bounded-influence regression estimation. J. Amer. Statist. Assoc. 77, 595-604.
Liang, H., Härdle, W., Carroll, R., 1999. Estimation in a semiparametric partially linear errors-in-variables model. Ann. Statist. 27, 1519-1535.
Mallows, C., 1975. On some topics in robustness. Technical Memorandum, AT&T Bell Laboratories, Murray Hill.
Rice, J., 1986. Convergence rates for partially splined models. Statist. Probab. Lett. 4, 203-208.
Robinson, P., 1988. Root-n-consistent semiparametric regression. Econometrica 56, 931-954.
Rousseeuw, P., Leroy, A., 1987. Robust Regression and Outlier Detection. Wiley, New York.
Severini, T., Wong, W., 1992. Profile likelihood and conditionally parametric models. Ann. Statist. 20, 1768-1802.
Speckman, P., 1988. Kernel smoothing in partial linear models. J. Roy. Statist. Soc. Ser. B 50, 413-436.
Truong, Y., Stone, C., 1994. Semiparametric time series regression. J. Time Ser. Anal. 15, 405-428.
van der Vaart, A., Wellner, J., 1996. Weak Convergence and Empirical Processes. With Applications to Statistics. Springer, New York.
Yee, T., Wild, C., 1996. Vector generalized additive models. J. Roy. Statist. Soc. Ser. B 58, 481-493.
Yohai, V., 1987. High breakdown-point and high efficiency robust estimates for regression. Ann. Statist. 15, 642-656.
Yohai, V., Zamar, R., 1988. High breakdown-point estimates of regression by means of the minimization of an efficient scale. J. Amer. Statist. Assoc. 83, 406-413.