Statistical tests in the partially linear additive regression models


Salim Bouzebda a, Khalid Chokri b,∗

a Laboratoire de Mathématiques Appliquées de Compiègne-L.M.A.C., Université de Technologie de Compiègne, B.P. 529, 60205 Compiègne Cedex, France

b L.S.T.A., Université Pierre et Marie Curie, 4 place Jussieu, 75252 Paris Cedex 05, France

Article history: Received 1 December 2012; Received in revised form 3 February 2014; Accepted 4 February 2014

Keywords: Additive model; Asymptotic normality; Curse of dimensionality; Kernel-type estimators; Law of iterated logarithm; Marginal integration; Partially linear models; Hypothesis testing



Abstract

In the present paper, we are mainly concerned with statistical tests in the partially linear additive model defined by

Y_i = Z_i^⊤ β + \sum_{ℓ=1}^{d} m_ℓ(X_{i,ℓ}) + ε_i,   1 ≤ i ≤ n,

where Z_i = (Z_{i,1}, . . . , Z_{i,p})^⊤ and X_i = (X_{i,1}, . . . , X_{i,d})^⊤ are vectors of explanatory variables, β = (β_1, . . . , β_p)^⊤ is a vector of unknown parameters, m_1, . . . , m_d are unknown univariate real functions, and ε_1, . . . , ε_n are independent random errors with mean zero and finite variance σ_ε². More precisely, we first consider the problem of testing the null hypothesis β = β_0. The second aim of this paper is to propose a test for the null hypothesis H_0^σ : σ_ε² = σ_0² in the partially linear additive regression models. Under the null hypotheses, the limiting distributions of the proposed test statistics are shown to be standard chi-squared distributions. Finally, simulation results are provided to illustrate the finite sample performance of the proposed statistical tests.

© 2014 Elsevier B.V. All rights reserved.

Corresponding author. Tel.: +33 609504606. E-mail addresses: [email protected] (S. Bouzebda), [email protected] (K. Chokri).

http://dx.doi.org/10.1016/j.stamet.2014.02.001 1572-3127/© 2014 Elsevier B.V. All rights reserved.


1. Introduction

Regression analysis has proved to be a flexible tool that provides a powerful statistical modeling framework in a variety of applied and theoretical contexts where one intends to model the predictive relationship between responses and predictors. It is worth noticing that parametric regression models provide useful tools for analyzing practical data when the models are correctly specified, but they may suffer from large modeling biases when the structures of the models are misspecified, which is the case in many practical problems. As an alternative, nonparametric smoothing methods ease the concerns about modeling biases. However, it is well known that unrestricted multivariate nonparametric regression models are subject to the curse of dimensionality and fail to take advantage of structure when modeling phenomena with a moderate amount of data, see [58,59,26,29] among others. To overcome that difficulty, Engle et al. [24] modeled the covariate effects via a partially linear structure, which is a special semiparametric model. Since the partially linear models contain both parametric and nonparametric components, they offer a compromise between the flexibility of a fully nonparametric regression and reasonable asymptotic behavior. For several examples of practical problems that can be solved with partially linear models, the interested reader may refer to Härdle et al. [30]. To be more precise, the partially linear regression models are defined as follows

Y = Z^⊤ β + m(X) + ε,   (1.1)

where β ∈ R^p is a vector of unknown parameters, m is the nonlinear part of the model and ε is the modeling error with

E(ε | X, Z) = 0  and  Var(ε | X, Z) = σ²(X, Z) := σ_ε².

Here and in the sequel, V^⊤ stands for the transpose of the vector V. The partially linear regression model has broad applicability in the fields of biology, economics, education and the social sciences. This model and various associated estimators, test statistics, and generalizations have generated a substantial body of literature, which includes, among many others, the works of Rice [51], Chen [11], Robinson [52], Chen and Shiau [13], Eubank and Speckman [25], Donald and Newey [21], Shi and Li [56,57], Bhattacharya and Zhao [2], Hamilton and Truong [28], Liang et al. [38], Shen et al. [54], Yu et al. [63], Bouzebda et al. [5] and the references therein. To reduce the dimension impact of the nonparametric part in the partially linear regression model (1.1), we consider the partially linear additive model that imposes an additive structure on the nonparametric function m:

Y = Z^⊤ β + \sum_{j=1}^{d} m_j(X_j) + ε,   (1.2)

where X_j is the j-th component of the vector X and m_j is a real univariate function. In the partially linear additive regression models, the functions m_j can be estimated at the one-dimensional rate. Hence the curse of dimensionality can be treated in a satisfactory manner; another advantage is that the lower-dimensional curves are easier to visualize and to interpret than a higher-dimensional function, see [46] for further discussion. We may also cite the paper of Yu et al. [63], which analyzes efficiency gains in semiparametric models from imposing additional structure on the nonparametric component. In practice, investigators often want to know the impact of the covariates Z on the response Y under the model (1.2), which requires testing the null hypothesis

H_0^β : β = β_0,  versus the alternative  H_1^β : β ≠ β_0.

In this paper we construct a statistical test for this purpose. The second aim of the present paper is to test the null hypothesis

H_0^σ : σ_ε² = σ_0²,


versus the alternative

H_1^σ : σ_ε² ≠ σ_0².

It is important to test H_0^σ before making inference, which is a strong motivation for proposing a new statistical test in the present work. It is noteworthy to point out that some related problems were studied by Dette and Munk [18], Dette [16], Liero [39], Dette and Marchlewski [17], Liu et al. [43] and Lin and Qu [40].

The remaining part of the present paper is organized as follows. In Section 2, we briefly describe the estimation procedure, which plays a central role in the construction of the statistical tests. In Section 3, we propose statistical tests for checking the null hypotheses H_0^β and H_0^σ. The asymptotic distributions of the test statistics under the null hypotheses are derived in Section 4. Section 5 provides simulation results in order to illustrate the performance of the proposed statistical tests. To avoid interrupting the flow of the presentation, all mathematical developments are relegated to the Appendix.

2. Estimation procedures

We consider a sequence {X_i, Y_i, Z_i : i ≥ 1} of independent and identically distributed random replicæ of the random vector (X, Y, Z) ∈ R^d × R × R^p. We denote the joint density function of (X, Y, Z) with respect to the Lebesgue measure by f and denote by

g(x) = \int_{R^{p+1}} f(x, y, z)\, dy\, dz

the density of X. For each n ≥ 1, and for each choice of the bandwidth h_n > 0, we define the kernel estimator of the marginal density g, for any x ∈ R^d, as

g_n(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} K\left( \frac{x − X_i}{h_n} \right),

where K is a kernel function, i.e., a non-negative function defined on R^d and integrating to 1. Notice that the model (1.1) may be written as

Y − Z^⊤ β = m(X) + ε.   (2.1)

On the basis of the model (2.1), following the usual Wand and Jones method, the regression estimator involving the nonparametric part of the model may be defined, for any x ∈ R^d, as

\widehat{m}_n^{β}(x) = \sum_{i=1}^{n} \frac{Y_i − Z_i^{⊤} β}{n\, g_n(X_i)} \prod_{ℓ=1}^{d} \frac{1}{h_n} K_ℓ\left( \frac{x_ℓ − X_{iℓ}}{h_n} \right),   (2.2)

where x_ℓ and X_{iℓ} are the ℓ-th components of x and X_i respectively, and K_ℓ (1 ≤ ℓ ≤ d) are kernel functions defined on R. It is worth noticing that the profiling in (2.2) amounts, in fact, to estimating the least favorable curve {m^*(·, β) : β ∈ R^p} in the space of additive nonparametric functions, where m^*(·, β) is defined, for fixed β_0 (the true value of β), by

m^*(·, β) = (β_0 − β)\, Π(E(Z | X = ·) | H) + m(·, β_0),

where Π(· | H) denotes the projection operator onto H and, for any g in an arbitrary class of density functions,

H(g) = \{ m ∈ L²(g) : m(x) = \sum_{ℓ=1}^{d} m_ℓ(x_ℓ) and E m_ℓ(X_ℓ) = 0 for all 1 ≤ ℓ ≤ d \},

where L²(g) denotes the space of functions m : R^d → R such that E(m(X)²) < ∞. Note that \widehat{m}_n^{β}(x) depends on the unknown parameter β, which needs to be estimated. Considering the model (2.1), the function m clearly depends on the parameter β and its additive structure may be written as

m_{add}^{β}(x) = µ + \sum_{ℓ=1}^{d} m_ℓ^{β}(x_ℓ),   (2.3)


where we recall that model identifiability considerations impose that

E m_ℓ^{β}(X_ℓ) = 0,  for 1 ≤ ℓ ≤ d.

In the early eighties, in the field of statistics, there was a development of results concerning the additive components of the regression model. The backfitting algorithm of Breiman and Friedman [6], Buja et al. [7] and Hastie and Tibshirani [31] is widely used to estimate the one-dimensional components m_ℓ and the regression function m. The backfitting idea is to project the data onto the space of additive functions. This projection is done via least squares, where the least squares problem is solved via the Gauss–Seidel algorithm. Notice that the additive model has now become a widely used multivariate smoothing technique, in large part due to the extensive discussion in [31], where the authors give a good overview and analyze estimation techniques based on backfitting, and to the availability of fitting routines in S-Plus, described in [10]. It should be remarked that important progress has also been made by Mammen et al. [45] and Opsomer and Ruppert [48] in the asymptotic theory of backfitting. Mammen et al. [46] provided a recent overview of smooth backfitting type estimators in additive models and also discussed extensions to varying coefficient models, additive models with missing observations, and the case of nonstationary covariates.

Auestad and Tjøstheim [1], Tjøstheim and Auestad [61] and Linton and Nielsen [42] proposed a method based on marginal integration of the mean function m for estimating the additive components. Their analysis is restricted to the case of dimension d = 2; Chen et al. [12] tried to extend this result to arbitrary d, and we may refer also to Newey [47]. One advantage of the integration method is that its statistical properties are easier to describe; specifically, one can easily prove central limit theorems and give explicit expressions for the asymptotic bias and variance of the estimators. A main disadvantage of the integration estimator is that it is perhaps even more time consuming to compute than the backfitting estimator. Our approach combines the marginal integration method, to estimate m_ℓ, ℓ = 1, . . . , d, with the least squares error criterion, to estimate the parameter β. Our estimate thus takes a form that depends only upon the kernels, the smoothing parameter and the marginal integration, which avoids any weight-choice procedure leading to minimization issues. Notice that estimators built with unknown weight quantities are studied in several papers; obviously, in such cases, optimization procedures are needed to obtain efficient estimates, and the resulting optimal weights are always functions depending on unknown parameters that must be estimated. This issue was investigated, for instance, in the work of Fan et al. [27].

Now, let us introduce some further notation and basic definitions which are used throughout the paper. For any 1 ≤ ℓ ≤ d, set

x_{−ℓ} = (x_1, . . . , x_{ℓ−1}, x_{ℓ+1}, . . . , x_d),  q_{−ℓ}(x_{−ℓ}) = \prod_{j=1, j≠ℓ}^{d} q_j(x_j)  and  q(x) = \prod_{ℓ=1}^{d} q_ℓ(x_ℓ),

where q_ℓ, 1 ≤ ℓ ≤ d, are known univariate density functions. Following the marginal integration method, the additive regression function estimator is given, for any x ∈ R^d, by

\widehat{m}_{add}^{β}(x) = \sum_{ℓ=1}^{d} \widehat{ξ}_ℓ^{β}(x_ℓ) + \int_{R^d} \widehat{m}_n^{β}(x)\, q(x)\, dx,   (2.4)

where

\widehat{ξ}_ℓ^{β}(x_ℓ) = \int_{R^{d−1}} \widehat{m}_n^{β}(x)\, q_{−ℓ}(x_{−ℓ})\, dx_{−ℓ} − \int_{R^d} \widehat{m}_n^{β}(x)\, q(x)\, dx.   (2.5)

Here, \widehat{ξ}_ℓ^{β} is the estimate of the ℓ-th component of the additive regression function, which still depends on the parameter β. Therefore, one has to estimate the vector parameter β to obtain ready-to-use estimates.


Remark 1. Let us recall a general kernel-type estimator of the regression function defined, for x ∈ R^d and a measurable function ψ : R → R, by

\widehat{m}_{n;h_n}(x, ψ) := \frac{\sum_{i=1}^{n} ψ(Y_i)\, K((x − X_i)/h_n)}{\sum_{i=1}^{n} K((x − X_i)/h_n)}.   (2.6)

By setting ψ(y) = y in (2.6), we get the classical Nadaraya–Watson kernel regression estimator of m(x) := E(Y | X = x), given by

\widehat{m}_{n;h_n}(x) := \frac{\sum_{i=1}^{n} Y_i\, K((x − X_i)/h_n)}{\sum_{i=1}^{n} K((x − X_i)/h_n)}.   (2.7)

We define the internal estimator at some predefined point x by

\widehat{m}_{n;h_n}^{Int}(x) := \frac{1}{n h_n^d} \sum_{i=1}^{n} K\left( \frac{x − X_i}{h_n} \right) \frac{Y_i}{g_n(X_i)}.   (2.8)

For more details on the estimators (2.7) and (2.8), the interested reader may refer, e.g., to Wand and Jones [62]. Linton and Jacho-Chávez [41] pointed out that Mack and Müller [44] were the first to propose \widehat{m}_{n;h_n}^{Int}(x), for d = 1, with a view to the estimation of derivatives by computing the derivative of the regression, which has a simpler form than the derivative of the Nadaraya–Watson smoother. The term ''internal'' stands for the fact that the factor g_n^{−1}(X_i) is internal to the summation, while the estimator \widehat{m}_{n;h_n}(x) has the factor

\widehat{g}_n^{−1}(x) = \left( \frac{1}{n h_n^d} \sum_{i=1}^{n} K\left( \frac{x − X_i}{h_n} \right) \right)^{−1}

externally to the summation. Jones et al. [34] consider various versions of kernel-type regression estimators, especially the Nadaraya–Watson estimator and the local linear estimator; they established the equivalence between the local linear estimator and the internal estimator. Linton and Jacho-Chávez [41] and Shen and Xie [55] pointed out that the internal estimators are particularly well suited to the additive nonparametric regression model, since

\int_{R^{d−1}} \widehat{m}_{n;h_n}^{Int}(x)\, q_{−ℓ}(x_{−ℓ})\, dx_{−ℓ} = \frac{1}{n h_n^d} \sum_{i=1}^{n} \frac{Y_i}{g_n(X_i)} \int_{R^{d−1}} K\left( \frac{x − X_i}{h_n} \right) q_{−ℓ}(x_{−ℓ})\, dx_{−ℓ}.

Under some smoothness conditions on the density function q_{−ℓ}(·), and with a kernel that takes advantage of these conditions, the last integral can be very closely approximated as follows

\int_{R^{d−1}} \widehat{m}_{n;h_n}^{Int}(x)\, q_{−ℓ}(x_{−ℓ})\, dx_{−ℓ} ≈ \frac{1}{n h_n} \sum_{i=1}^{n} \frac{Y_i}{g_n(X_i)} K_ℓ\left( \frac{x_ℓ − X_{i,ℓ}}{h_n} \right) q_{−ℓ}(X_{i,−ℓ}).   (2.9)

Notice that the estimator in (2.8) has also been used in [32] in the context of estimating additive models. The authors showed that it has some additional theoretical advantages over the use of Nadaraya–Watson estimators. More precisely, it is possible to obtain asymptotic normality of \widehat{ξ}_ℓ(x_ℓ) at the optimal rate for one-dimensional nonparametric regression without assuming additional smoothness. They also mention the computational attractiveness and better performance of the internal estimator (compared with its classical counterpart), in particular when the covariates are correlated and nonuniformly distributed. Indeed, the internality of the factor g_n^{−1}(X_i) in the summation plays an instrumental role in the approximation (2.9), which greatly facilitates the computation of the estimator \widehat{ξ}_ℓ(x_ℓ).


Estimation of the parameter β. We now give the estimation procedure for β. Consider the partially linear additive regression model

Y = Z^{⊤} β + m_{add}(X) + ε.   (2.10)

Making use of the statements (2.2), (2.4)–(2.10), and considering the least squares error criterion, it follows that

\widehat{β} = [\widetilde{Z}\widetilde{Z}^{⊤}]^{−1}\widetilde{Z}\widetilde{Y},   (2.11)

where

\widetilde{Y} = \left( Y_i − \sum_{j=1}^{n} W_{nj}(X_i)\, Y_j \right)_{1≤i≤n}^{⊤},   \widetilde{Z} = \left( Z_i − \sum_{j=1}^{n} W_{nj}(X_i)\, Z_j \right)_{1≤i≤n},   (2.12)

W_{nj}(X_i) = \frac{U_{nj}(X_i)}{n\, g_n(X_j)}   (2.13)

and

U_{nj}(X_i) = \sum_{ℓ=1}^{d} \frac{1}{h_n} K_ℓ\left( \frac{X_{iℓ} − X_{jℓ}}{h_n} \right) D_ℓ − (d − 1) \int_{R^d} \prod_{k=1}^{d} \frac{1}{h_n} K_k\left( \frac{x_k − X_{jk}}{h_n} \right) q(x)\, dx,

where

D_ℓ = \int_{R^{d−1}} \prod_{k=1, k≠ℓ}^{d} \frac{1}{h_n} K_k\left( \frac{x_k − X_{jk}}{h_n} \right) q_{−ℓ}(x_{−ℓ})\, dx_{−ℓ}.

Finally, the estimates of the regression function and the additive components are defined as

\widehat{m}_{add}^{\widehat{β}}(x) = \sum_{ℓ=1}^{d} \widehat{ξ}_ℓ^{\widehat{β}}(x_ℓ) + \int_{R^d} \widehat{m}_n^{\widehat{β}}(x)\, q(x)\, dx   (2.14)

and

\widehat{ξ}_ℓ^{\widehat{β}}(x_ℓ) = \int_{R^{d−1}} \widehat{m}_n^{\widehat{β}}(x)\, q_{−ℓ}(x_{−ℓ})\, dx_{−ℓ} − \int_{R^d} \widehat{m}_n^{\widehat{β}}(x)\, q(x)\, dx.   (2.15)
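Once the smoothing weights W_{nj}(X_i) of (2.13) have been computed, the profiled least squares step (2.11)–(2.12) is an ordinary regression of the partialled-out response on the partialled-out covariates. A minimal R sketch, written in the row-wise convention (Y an n-vector, Z an n × p matrix, W the n × n matrix of weights), is given below; it is an illustrative sketch of this step under those conventions, not the authors' code.

profile_beta <- function(Y, Z, W) {
  Y_tilde <- Y - W %*% Y               # Ytilde_i = Y_i - sum_j W_nj(X_i) Y_j
  Z_tilde <- Z - W %*% Z               # Ztilde_i = Z_i - sum_j W_nj(X_i) Z_j
  ## least squares solution of (2.11) in the row-wise convention
  drop(solve(t(Z_tilde) %*% Z_tilde, t(Z_tilde) %*% Y_tilde))
}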

3. Statistical tests

In order to test the null hypothesis H_0^β through a Wald-type statistic, we propose the following statistic

R_n := \frac{n(\widehat{β} − β)^{⊤} B (\widehat{β} − β)}{\widehat{σ}_n^2},   (3.1)

where

\widehat{σ}_n^2 = \frac{1}{n}\sum_{i=1}^{n} \left( \widetilde{Y}_i − \widetilde{Z}_i^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}\widetilde{Y} \right)^2.

Here, B is a p × p positive definite matrix. Under some regularity conditions, we prove in Theorem 5 that the limit law of the test statistic R_n is a χ² distribution with p degrees of freedom. An application of Theorem 5 leads to rejecting the null hypothesis H_0^β whenever the value of the statistic R_n exceeds


z_{1−α}, namely, the (1 − α)-quantile of the χ² law with p degrees of freedom. The corresponding test is then asymptotically of level α, as n → ∞. The confidence region associated with our test, for a given significance level α, is then formulated by the following ellipsoid

R = \{ b ∈ R^p : n(\widehat{σ}_n^2)^{−1} (\widehat{β} − b)^{⊤} B (\widehat{β} − b) ≤ z_{1−α} \}.

Our second test concerns the variance σ_ε^2 of the model (1.2): we test the null hypothesis

H_0^σ : σ_ε^2 = σ_0^2  versus  H_1^σ : σ_ε^2 ≠ σ_0^2.

To test the null hypothesis H_0^σ, we propose the following statistic

T_n := \frac{n(\widehat{σ}_n^2 − σ_ε^2)^2}{V_n},   (3.2)

where

V_n := \frac{1}{n}\sum_{i=1}^{n} \left( \left( \widetilde{Y}_i − \widetilde{Z}_i^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}\widetilde{Y} \right)^2 − \widehat{σ}_n^2 \right)^2.

In Corollary 7, we show that T_n follows asymptotically a χ² distribution with one degree of freedom. Therefore, the confidence interval, for a given significance level α, is given by

T = \{ σ^2 > 0 : n(\widehat{σ}_n^2 − σ^2)^2 ≤ V_n z_{1−α} \},

where z_{1−α} is the χ² quantile of order 1 − α.
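For concreteness, a minimal R sketch of how the two statistics could be evaluated once \widehat{β}, \widehat{σ}_n^2, B and the residuals are available is given below; the argument names, and the plug-in of σ_0^2 for σ_ε^2 under H_0^σ, are illustrative assumptions.

wald_Rn <- function(beta_hat, beta0, B, sigma2_hat, n) {
  ## statistic (3.1); reject H_0^beta when it exceeds qchisq(1 - alpha, df = length(beta0))
  drop(n * t(beta_hat - beta0) %*% B %*% (beta_hat - beta0)) / sigma2_hat
}

variance_Tn <- function(resid2, sigma2_hat, sigma2_0) {
  ## resid2: squared residuals (Ytilde_i - Ztilde_i' beta_hat)^2, i = 1, ..., n
  n  <- length(resid2)
  Vn <- mean((resid2 - sigma2_hat)^2)    # estimate of Var(eps_1^2)
  n * (sigma2_hat - sigma2_0)^2 / Vn     # statistic (3.2); compare with qchisq(1 - alpha, 1)
}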





4. Main results

To state our results, we consider an additional assumption on the model structure. Therefore, we suppose that

\lim_{n→∞} \frac{1}{n}\sum_{i=1}^{n} \widetilde{Z}_i \widetilde{Z}_i^{⊤} = B  a.s.,

where we recall that B is a p × p positive definite matrix. Further assumptions involving the density function of X, the regression function m, the kernels K_ℓ, 1 ≤ ℓ ≤ d, and the smoothing parameters are presented below for easy reference. The first part of these conditions is devoted to the regression function m and the density g. In the sequel, I^d denotes a compact subset of R^d. We will make use of the following conditions in our analysis.

(G.1) m is k-times continuously differentiable and there exists a constant 0 < C < ∞ such that

\sup_{x ∈ I^d} \left| \frac{∂^k m(x)}{∂ x_1^{k_1} \cdots ∂ x_d^{k_d}} \right| ≤ C,   k_1 + · · · + k_d = k;

(G.2) The marginal density g is continuous and bounded away from 0 on the support I^d of the function q;
(G.3) The marginal density g is k′-times continuously differentiable on its support and k′ > kd.

Throughout, the following hypothesis is considered upon the sequence of bandwidths (h_n)_{n≥1}.

(H.1) h_n = ϑ_1 \left( \frac{\log n}{n} \right)^{1/(2k+1)}, for 0 < ϑ_1 < ∞ and 2k + 1 ≥ d.


Set now, for any x ∈ R^d, K(x) := \prod_{ℓ=1}^{d} K_ℓ(x_ℓ). We say that a kernel L is of order s whenever the following conditions are fulfilled:

\int_{R^d} L(x)\, dx = 1,
\int_{R^d} x_1^{k_1} \cdots x_d^{k_d}\, L(x)\, dx = 0,   k_1, . . . , k_d ≥ 0,  k_1 + · · · + k_d = 1, . . . , s − 1,
\int_{R^d} |x_1^{k_1} \cdots x_d^{k_d}|\, L(x)\, dx < ∞,   k_1, . . . , k_d ≥ 0,  k_1 + · · · + k_d = s.

The kernel functions are assumed to satisfy the following assumptions.

(K.1) For any 1 ≤ ℓ ≤ d, K_ℓ is bounded, Lipschitz continuous and integrates to one;
(K.2) For any 1 ≤ ℓ ≤ d, K_ℓ(u) = 0 for u ∉ [−λ/2, λ/2], for some 0 < λ < ∞;
(K.3) The kernel K appearing in g_n and the product kernel \prod_{ℓ=1}^{d} K_ℓ are of order k′ and k, respectively.

Consider also the following assumption upon the random variables Y and Z.

(M.1) Y and Z are bounded.

The assumptions on the weight functions q_ℓ, 1 ≤ ℓ ≤ d, needed for our analysis are the following.

(Q.1) For any 1 ≤ ℓ ≤ d, q_ℓ has k + 1 continuous and bounded derivatives;
(Q.2) The support of the function q is included in the support of the density g.

Remark 2. Most of these assumptions are needed in order to establish the consistency of \widehat{β}, the estimator of the parameter β; more details can be found in [14].

In the sequel, '→_d' denotes convergence in distribution. Below, we write Z =_d N(µ, σ²) whenever the rv Z follows a normal law with expectation µ and variance σ². The main results to be proved here may now be stated precisely as follows.

Theorem 3. Assume that the assumptions (G.1-3), (H.1), (K.1-3), (M.1) and (Q.1-2) hold. In addition, we suppose that

\max_{1≤i≤n} E|ε_i|^r < ∞  for some r ≥ 4.

Then, we have, as n → ∞,

\sqrt{n}(\widehat{σ}_n^2 − σ_ε^2) →_d N(0, Var ε_1^2).

The proof of Theorem 3 is postponed to the Appendix.

Theorem 4. Assume that the assumptions (G.1-3), (H.1), (K.1-3), (M.1) and (Q.1-2) hold. In addition, suppose that

\max_{1≤i≤n} E|ε_i|^{r+1} < ∞  for some r ≥ 4.

Then, we have, as n → ∞,

\limsup_{n→∞} \left( \frac{n}{2 \log\log n} \right)^{1/2} \left| \widehat{σ}_n^2 − σ_ε^2 \right| = (Var ε_1^2)^{1/2}  a.s.

The proof of Theorem 4 is postponed to the Appendix.

Theorem 5. Assume that the assumptions (G.1-3), (H.1), (K.1-3), (M.1) and (Q.1-2) hold. In addition, suppose that

\max_{1≤i≤n} E|ε_i|^r < ∞  for some r ≥ 2.


Then, we have, under the null hypothesis H_0^β,

R_n →_d χ²(p),

where χ²(p) denotes the χ² variable with p degrees of freedom. The proof of Theorem 5 is postponed to the Appendix.

Theorem 6. Assume that the assumptions of Theorem 3 hold. Then, under the null hypothesis H_0^σ,

S_n := \frac{n(\widehat{σ}_n^2 − σ_ε^2)^2}{V} →_d χ²,

where V := Var ε_1^2 and χ² := χ²(1) denotes the χ² variable with one degree of freedom. The proof of Theorem 6 is postponed to the Appendix.

Notice that, when the quantity V is unknown, we cannot compute the statistic of the previous theorem explicitly. To overcome this problem, it suffices to replace V by its estimate. The corollary below is then a version of Theorem 6 in which V is estimated by V_n.

Corollary 7. Assume that the assumptions of Theorem 3 hold. Under the null hypothesis H_0^σ, we have

T_n →_d χ².

The proof of Corollary 7 is given in the forthcoming Appendix.

Remark 8. Note that the condition (M.1) may be replaced by more general hypotheses upon the moments of Y, as in, e.g., [22,23]. That is,

(M.1)′ \sup_{x ∈ I^d} E(|Y|^s | X = x) < ∞, for some s > 2,

or a more general form, as in [15]:

(M.1)′′ We denote by {M(x) : x ≥ 0} a non-negative continuous function, increasing on [0, ∞), and such that, for some s > 2, ultimately as x ↑ ∞,

(i) x^{−s} M(x) ↓;  (ii) x^{−1} M(x) ↑.   (4.1)

For each t ≥ M(0), we define M^{inv}(t) ≥ 0 by M(M^{inv}(t)) = t. We assume further that

\sup_{x ∈ I^d} E(M(|Y|) | X = x) < ∞.

This finite moment assumption would add much extra complexity to the proofs, and we would also need the sequence {h_n}_{n≥1} to satisfy

h_n^{−1} ≤ (n / \log(1/h_n))^{1 − 2/s}.

Remark 9. Let us recall that

\lim_{n→∞} \frac{1}{n}\sum_{i=1}^{n} \widetilde{Z}_i \widetilde{Z}_i^{⊤} = B  a.s.,

and

\widetilde{Z} = Z − \sum_{j=1}^{n} W_{nj}(X)\, Z_j.

Notice that we have

\sum_{j=1}^{n} W_{nj}(X_i)\, Z_j
 = \sum_{ℓ=1}^{d} \sum_{j=1}^{n} \frac{Z_j}{n\, g_n(X_j)}\, \frac{1}{h_n} K_ℓ\left( \frac{X_{iℓ} − X_{jℓ}}{h_n} \right) \int_{R^{d−1}} \prod_{k=1, k≠ℓ}^{d} \frac{1}{h_n} K_k\left( \frac{x_k − X_{jk}}{h_n} \right) q_{−ℓ}(x_{−ℓ})\, dx_{−ℓ}
   − (d − 1) \sum_{j=1}^{n} \frac{Z_j}{n\, g_n(X_j)} \int_{R^{d}} \prod_{k=1}^{d} \frac{1}{h_n} K_k\left( \frac{x_k − X_{jk}}{h_n} \right) q(x)\, dx
 = \sum_{ℓ=1}^{d} \left[ \int_{R^{d−1}} E[Z \mid X_{iℓ}, x_{−ℓ}]\, q_{−ℓ}(x_{−ℓ})\, dx_{−ℓ} − \frac{d − 1}{d} \int_{R^{d}} E[Z \mid X = x]\, q(x)\, dx \right] + o(1)  a.s.
 := \sum_{ℓ=1}^{d} \overline{W}_ℓ(X_{iℓ}) + o(1)  a.s.

Thus, the explicit form of the matrix B is given by the following formula

B = E\left[ \left( Z − \sum_{ℓ=1}^{d} \overline{W}_ℓ(X_ℓ) \right) \left( Z − \sum_{ℓ=1}^{d} \overline{W}_ℓ(X_ℓ) \right)^{⊤} \right].

Remark 10. The limiting behavior of g_n, for appropriate choices of the bandwidth h_n, has been studied by a large number of statisticians over many decades. For good sources of references to the research literature in this area, along with statistical applications, consult Devroye and Lugosi [20], Devroye and Györfi [19], Bosq and Lecoutre [4], Scott [53], Wand and Jones [62] and Prakasa Rao [50]. In particular, Parzen [49] has shown, under some assumptions on K, that g_n is an asymptotically unbiased and consistent estimator of g whenever h_n → 0, n h_n^d → ∞ and x is a continuity point of g. Under some additional assumptions on g and h_n, the author also obtained an asymptotic normality result. The paper by Bickel and Rosenblatt [3] is to be cited here. It is noteworthy to point out that n h_n^d → ∞ is satisfied in our framework whenever 2k + 1 ≥ d. Notice that the conditions (G.1), (G.3) and (K.1) are classical in nonparametric estimation procedures. In particular, by imposing the condition (K.1), the kernel function exploits the smoothness of the density function or the regression function. It is well known that the best obtainable rate of convergence of the kernel estimator, in the AMISE sense, is of order n^{−4/5} in the univariate case. If we drop the requirement that the kernel function K must be a density, the convergence rate can be faster; indeed, it can be made arbitrarily close to the parametric rate n^{−1} as the order of the kernel increases. In fact, Chacón et al. [9] showed that the parametric rate n^{−1} can be attained by the use of superkernels, and that superkernel density estimators automatically adapt to the unknown degree of smoothness of the density. The main drawback of higher-order kernels in this situation is that the negative contributions of the kernel may make the estimated density not a density itself. The interested reader may refer to, e.g., [35,36,33].

Remark 11. In the present paper, we are mainly concerned with performing statistical tests. Towards this aim, we have used the marginal integration technique to profile the nonparametric part and then minimized a profiled squared error criterion to estimate the parameter β. In the paper by Yu et al. [63], the authors were interested in the investigation of the efficiency gains when the nonparametric part of the model has an additive structure. To this end, they make use of a smooth backfitting technique to deal with the additive nonparametric part in order to provide semiparametric efficient estimators of β. We mention that the estimated profile likelihood based on the Gaussian error model investigated in Section 3.2 of Yu et al. [63] coincides with the least squares error estimators.

5. Simulations

In this section, a series of experiments is conducted in order to examine the performance of the proposed statistical tests defined in (3.1) and (3.2). More precisely, we have undertaken numerical illustrations regarding the power of these statistical tests in finite sample situations. The computing program codes are implemented in R. The following three models were considered in the simulation study:

Model I: Y = Z^⊤ β + X_1 + X_2,
Model II: Y = Z^⊤ β + sin(π X_1) + sin(π X_2),
Model III: Y = Z^⊤ β + exp(X_1) + exp(X_2),

where X_1, X_2 and the error ε are standard normal random variables and Z is taken as a Gaussian random vector.
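A minimal R sketch of how one sample from Model I could be generated is given below; the dimension of Z, the choice β = (1, 2)^⊤ and the function name are illustrative assumptions rather than the exact design used for the tables.

simulate_model1 <- function(n, beta = c(1, 2)) {
  Z   <- matrix(rnorm(n * length(beta)), nrow = n)   # Gaussian linear covariates
  X   <- matrix(rnorm(n * 2), nrow = n)              # X1, X2 standard normal
  eps <- rnorm(n)                                    # standard normal errors
  Y   <- drop(Z %*% beta) + X[, 1] + X[, 2] + eps    # Model I: additive part X1 + X2
  list(Y = Y, Z = Z, X = X)
}

Models II and III are obtained by replacing the additive part by sin(πX_1) + sin(πX_2) and exp(X_1) + exp(X_2), respectively.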


Table 1. Power estimate of 0.01 tests for Rn against alternatives for different values of β based on 1000 replications: Model I.

β \ n            25      50      100     200     500     1000
(1.01, 2.01)⊤    0.11    0.101   0.098   0.091   0.084   0.086
(1.1, 2.1)⊤      0.159   0.231   0.337   0.446   0.779   0.866
(1.5, 2.5)⊤      0.724   0.946   0.999   1       1       1
(2, 3)⊤          0.978   1       1       1       1       1
(3, 4)⊤          1       1       1       1       1       1

Table 2. Power estimate of 0.01 tests for Rn against alternatives for different values of β based on 1000 replications: Model II.

β \ n            25      50      100     200     500     1000
(1.01, 2.01)⊤    0.111   0.112   0.132   0.148   0.104   0.171
(1.1, 2.1)⊤      0.129   0.178   0.246   0.448   0.796   0.977
(1.5, 2.5)⊤      0.41    0.745   0.977   1       1       1
(2, 3)⊤          0.791   0.983   1       1       1       1
(3, 4)⊤          0.98    1       1       1       1       1

Table 3. Power estimate of 0.01 tests for Rn against alternatives for different values of β based on 1000 replications: Model III.

β \ n            25      50      100     200     500     1000
(1.01, 2.01)⊤    0.078   0.071   0.085   0.100   0.117   0.093
(1.1, 2.1)⊤      0.091   0.100   0.099   0.123   0.128   0.169
(1.5, 2.5)⊤      0.193   0.182   0.294   0.430   0.719   0.941
(2, 3)⊤          0.379   0.456   0.671   0.874   0.999   1
(3, 4)⊤          0.746   0.886   0.985   1       1       1

The deterministic vector β (respectively σ_ε^2) is chosen to take different values ranging from near to far from the null hypothesis. In our simulations, samples of sizes n = 25, 50, 100, 200, 500 and 1000 have been drawn following the scheme described above, and m = 1000 replications have been considered for each scenario. The first-kind error risks were α = 0.01, α = 0.05 and α = 0.10. The obtained results are displayed in Tables 1–10.

Notice that, as in any other inferential context, the greater the sample size, the better the power of the tests studied here. Simple inspection of the reported results shows that, for large values of the sample size n, the empirical powers of the considered tests are all close to 1, in particular for n = 1000. We observe that even for moderate sample sizes (n = 25, n = 50) the power of the test is close to 1 for α = 0.10 and β = (3, 4)^⊤. This can be explained naturally by the fact that this value of β is rather far from the null hypothesis. Thus a modification of the test R_n may become necessary if the sample size is small. The results reported in Tables 1–9 lead to the conclusion that the power of R_n is close to 1 for large sample sizes or when we are far from the null hypothesis. From Table 10, similar conclusions are valid for the test T_n, for which we considered only Model II, the model leading to the most satisfactory results among the three. Finally, the simulation studies show that the proposed test methods are effective in all three models for R_n and in Model II for T_n. In order to extract methodological recommendations for the use of the proposed statistics, it would be interesting to conduct extensive Monte Carlo experiments comparing our procedures with other alternatives presented in the literature, but this would go well beyond the scope of the present paper.
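The reported powers are empirical rejection frequencies over the replications. A minimal R sketch of such a Monte Carlo loop for R_n is given below; compute_Rn is a hypothetical routine that would carry out the estimation of β and \widehat{σ}_n^2 and return the statistic (3.1), and simulate_model1 is the data generator sketched above.

power_Rn <- function(beta, beta0, n, m = 1000, alpha = 0.05) {
  crit <- qchisq(1 - alpha, df = length(beta0))      # chi-squared cutoff with p degrees of freedom
  rej  <- replicate(m, {
    dat <- simulate_model1(n, beta)                  # draw one sample under the chosen beta
    compute_Rn(dat$Y, dat$Z, dat$X, beta0) > crit    # hypothetical routine returning R_n
  })
  mean(rej)                                          # empirical power at level alpha
}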


Table 4. Power estimate of 0.05 tests for Rn against alternatives for different values of β based on 1000 replications: Model I.

β \ n            25      50      100     200     500     1000
(1.01, 2.01)⊤    0.204   0.178   0.220   0.206   0.203   0.208
(1.1, 2.1)⊤      0.267   0.327   0.381   0.454   0.801   1
(1.5, 2.5)⊤      0.700   0.917   0.993   1       1       1
(2, 3)⊤          0.966   0.999   1       1       1       1
(3, 4)⊤          1       1       1       1       1       1

Table 5. Power estimate of 0.05 tests for Rn against alternatives for different values of β based on 1000 replications: Model II.

β \ n            25      50      100     200     500     1000
(1.01, 2.01)⊤    0.213   0.246   0.214   0.232   0.225   0.273
(1.1, 2.1)⊤      0.313   0.403   0.499   0.679   0.914   0.983
(1.5, 2.5)⊤      0.842   0.986   0.999   1       1       1
(2, 3)⊤          0.993   1       1       1       1       1
(3, 4)⊤          1       1       1       1       1       1

Table 6. Power estimate of 0.05 tests for Rn against alternatives for different values of β based on 1000 replications: Model III.

β \ n            25      50      100     200     500     1000
(1.01, 2.01)⊤    0.202   0.188   0.194   0.217   0.234   0.219
(1.1, 2.1)⊤      0.194   0.196   0.205   0.250   0.273   0.334
(1.5, 2.5)⊤      0.305   0.328   0.458   0.574   0.840   0.982
(2, 3)⊤          0.522   0.626   0.812   0.951   1       1
(3, 4)⊤          0.862   0.946   0.997   1       1       1

Table 7. Power estimate of 0.10 tests for Rn against alternatives for different values of β based on 1000 replications: Model I.

β \ n            25      50      100     200     500     1000
(1.01, 2.01)⊤    0.273   0.275   0.309   0.315   0.310   0.303
(1.1, 2.1)⊤      0.344   0.393   0.478   0.524   0.901   1
(1.5, 2.5)⊤      0.781   0.950   0.996   1       1       1
(2, 3)⊤          0.983   1       1       1       1       1
(3, 4)⊤          1       1       1       1       1       1




Table 8. Power estimate of 0.10 tests for Rn against alternatives for different values of β based on 1000 replications: Model II.

β \ n            25      50      100     200     500     1000
(1.01, 2.01)⊤    0.285   0.310   0.292   0.309   0.324   0.346
(1.1, 2.1)⊤      0.442   0.573   0.681   0.801   0.972   1
(1.5, 2.5)⊤      0.903   1       1       1       1       1
(2, 3)⊤          1       1       1       1       1       1
(3, 4)⊤          1       1       1       1       1       1

Table 9. Power estimate of 0.10 tests for Rn against alternatives for different values of β based on 1000 replications: Model III.

β \ n            25      50      100     200     500     1000
(1.01, 2.01)⊤    0.291   0.289   0.296   0.298   0.327   0.339
(1.1, 2.1)⊤      0.231   0.245   0.293   0.325   0.409   0.455
(1.5, 2.5)⊤      0.430   0.466   0.532   0.624   0.956   1
(2, 3)⊤          0.624   0.730   0.885   1       1       1
(3, 4)⊤          1       1       1       1       1       1

Table 10. Power estimate of the test Tn against alternatives for different values of σε² based on 1000 replications: Model II.

α = 0.01
σε² \ n    25      50      100     200     500     1000
1.01       0.104   0.197   0.314   0.347   0.398   0.339
1.1        0.121   0.188   0.257   0.316   0.582   0.659
1.5        0.159   0.161   0.304   0.431   0.778   0.891
2          0.337   0.357   0.530   0.782   0.983   1
3          0.590   0.694   0.875   0.988   1       1
4          0.746   0.833   0.961   1       1       1
5          0.851   0.909   0.986   1       1       1

α = 0.05
σε² \ n    25      50      100     200     500     1000
1.01       0.274   0.402   0.478   0.528   0.502   0.537
1.1        0.267   0.287   0.469   0.476   0.519   0.675
1.5        0.293   0.319   0.413   0.557   0.846   0.899
2          0.408   0.480   0.619   0.838   0.998   1
3          0.651   0.739   0.913   0.988   1       1
4          0.999   1       1       1       1       1
5          1       1       1       1       1       1

α = 0.10
σε² \ n    25      50      100     200     500     1000
1.01       0.375   0.530   0.590   0.571   0.570   0.580
1.1        0.348   0.461   0.504   0.506   0.534   0.874
1.5        0.364   0.449   0.542   0.601   0.789   0.968
2          0.477   0.518   0.675   0.974   1       1
3          0.843   0.950   0.998   1       1       1
4          1       1       1       1       1       1
5          1       1       1       1       1       1

6. Conclusion

Testing the hypotheses H_0^β and H_0^σ is an important step in practice. Towards this aim, the present paper gives two statistical tests in the framework of the partially linear additive model. The limiting distributions of the proposed test statistics under the null hypotheses are derived, and their properties are examined by Monte Carlo simulations. The simulation studies display that the test methods are effective, in particular for large sample sizes.


Acknowledgments

The authors would like to thank an Associate Editor and two referees for their very helpful comments, which led to a considerable improvement of the original version of the paper and a more sharply focused presentation.

Appendix A

To unburden our notation a bit and for simplicity, C denotes a generic finite positive constant which may have different values at each appearance throughout the sequel. The previously presented notation continues to be used in the following.

Proof of Theorem 3. Keeping in mind the definition of \widehat{σ}_n^2 given in Section 3, one can observe the following decomposition of \widehat{σ}_n^2:

\widehat{σ}_n^2 = \frac{1}{n} (\widetilde{Y} − \widetilde{Z}^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}\widetilde{Y})^{⊤} (\widetilde{Y} − \widetilde{Z}^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}\widetilde{Y})
           = \frac{1}{n} (\widetilde{Y} − P_{\widetilde{Z}}\widetilde{Y})^{⊤} (\widetilde{Y} − P_{\widetilde{Z}}\widetilde{Y}),

where

P_{\widetilde{Z}} = \widetilde{Z}^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}.

Notice that it is easy to show that P_{\widetilde{Z}} is an idempotent operator, i.e., P_{\widetilde{Z}} P_{\widetilde{Z}} = P_{\widetilde{Z}}, fulfilling P_{\widetilde{Z}}^{⊤} = P_{\widetilde{Z}}. It follows that

\sqrt{n}(\widehat{σ}_n^2 − σ_ε^2) = \frac{1}{\sqrt{n}} (\widetilde{Y}^{⊤}\widetilde{Y} − \widetilde{Y}^{⊤}P_{\widetilde{Z}}\widetilde{Y} − \widetilde{Y}^{⊤}P_{\widetilde{Z}}\widetilde{Y} + \widetilde{Y}^{⊤}P_{\widetilde{Z}}P_{\widetilde{Z}}\widetilde{Y}) − \sqrt{n}\,σ_ε^2
 = \frac{1}{\sqrt{n}} \widetilde{Y}^{⊤}(I_n − P_{\widetilde{Z}})\widetilde{Y} − \sqrt{n}\,σ_ε^2,

where I_n denotes the identity matrix of order n. Using the fact that

P_{\widetilde{Z}}\widetilde{Z}^{⊤} = \widetilde{Z}^{⊤}  and  \widetilde{Z}P_{\widetilde{Z}} = \widetilde{Z},

we infer that

(I_n − P_{\widetilde{Z}})\widetilde{Z}^{⊤} = 0_{n×p}  and  \widetilde{Z}(I_n − P_{\widetilde{Z}}) = 0_{p×n},

where 0_{n×p} and 0_{p×n} are the null n × p, respectively p × n, matrices. This, in turn, implies that

\sqrt{n}(\widehat{σ}_n^2 − σ_ε^2) = \frac{1}{\sqrt{n}} ε^{⊤}ε − \sqrt{n}\,σ_ε^2 − \frac{1}{\sqrt{n}} ε^{⊤}P_{\widetilde{Z}}\,ε + \frac{1}{\sqrt{n}} M_ε^{⊤}(I_n − P_{\widetilde{Z}})M_ε + \frac{2}{\sqrt{n}} M_ε^{⊤}(I_n − P_{\widetilde{Z}})\,ε
 := \sqrt{n}\,[(I_1 − σ_ε^2) − I_2 + I_3 + 2 I_4],   (A.1)

where

M_ε := (\widetilde{m}_{add}(X_i) − \widetilde{ε}_i)_{1≤i≤n},   \widetilde{m}_{add}(X_i) = m_{add}(X_i) − \sum_{j=1}^{n} W_{nj}(X_i)\, m_{add}(X_j)

and

\widetilde{ε}_i := \sum_{k=1}^{n} W_{nk}(X_i)\, ε_k.

Recall I_3 from the statement (A.1). We have the following bound

|I_3| ≤ n^{−1} \max_{1≤i≤n}\left( |\widetilde{m}_{add}(X_i)|^2 + \Big| \sum_{k=1}^{n} W_{nk}(X_i)\, ε_k \Big|^2 \right) \|I_n − P_{\widetilde{Z}}\|_{M_{n,n}(R)}
    ≤ n^{−1} C \max_{1≤i≤n}\left( |\widetilde{m}_{add}(X_i)|^2 + \Big| \sum_{k=1}^{n} W_{nk}(X_i)\, ε_k \Big|^2 \right),

where

\|I_n − P_{\widetilde{Z}}\|_{M_{n,n}(R)} ≤ C

and M_{n,n}(R) denotes the set of n × n real matrices. In view of Lemmas 13 and 14, we have, almost surely,

\max_{1≤i≤n} |\widetilde{m}_{add}(X_i)| = O\big(n^{−k/(2k+1)} \log n\big)  and  \max_{1≤i≤n} \Big| \sum_{k=1}^{n} W_{nk}(X_i)\, ε_k \Big| = O\big(n^{−k/(2k+1)} \log n\big),

from which we readily infer that

|I_3| = n^{−1} O\big(n^{−2k/(2k+1)} \log^2 n\big) = o(n^{−1/2})  a.s.   (A.2)

Now, we claim that, almost surely, I_2 = o(n^{−1/2}), where

I_2 = \frac{1}{n}\, ε^{⊤} \widetilde{Z}^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}\, ε

and \widetilde{Z} is defined in (2.12). Notice that, for any i = 1, . . . , n and 1 ≤ ℓ ≤ p, we have

E\widetilde{Z}_{iℓ}\, ε_i = E\widetilde{Z}_{iℓ}\, Eε_i = 0  and  E|\widetilde{Z}_{iℓ}\, ε_i|^r ≤ C \max_{1≤i≤n} E|ε_i|^r < ∞,

and

\liminf_{n→∞} \frac{1}{n}\sum_{i=1}^{n} E(\widetilde{Z}_{iℓ}\, ε_i)^2 = σ_ε^2\, b_{ℓℓ} > 0,

where b_{ℓℓ} is the (ℓ, ℓ)-th element of B. By letting V_i = \widetilde{Z}_{iℓ}\, ε_i and s_n^2 = n σ_ε^2 b_{ℓℓ} in Lemma 15, we conclude that

\sum_{i=1}^{n} \widetilde{Z}_{iℓ}\, ε_i = O\big((n \log\log n)^{1/2}\big)  a.s.   (A.3)

In addition, we have

\lim_{n→∞} \frac{1}{n} \widetilde{Z}\widetilde{Z}^{⊤} = B  a.s.

Making use of the last equation in connection with (A.3), we see that

I_2 = o(n^{−1/2})  a.s.   (A.4)


Recall I_4 from the statement (A.1). We have the following chain of inequalities

|I_4| ≤ \frac{C}{n} \Big| \sum_{i=1}^{n} \widetilde{m}_{add}(X_i)\, ε_i − \sum_{i=1}^{n}\sum_{k=1}^{n} W_{nk}(X_i)\, ε_k\, ε_i \Big|
     ≤ \frac{C}{n} \Big| \sum_{i=1}^{n} \widetilde{m}_{add}(X_i)\, ε_i − \sum_{i=1}^{n} W_{ni}(X_i)\, ε_i^2 − \sum_{i=1}^{n}\sum_{k≠i} W_{nk}(X_i)\, ε_k\, ε_i \Big|
     ≤ C \left( \frac{1}{n}\Big| \sum_{i=1}^{n} \widetilde{m}_{add}(X_i)\, ε_i \Big| + \frac{1}{n}\Big| \sum_{i=1}^{n} W_{ni}(X_i)\, ε_i^2 \Big| + \frac{1}{n}\Big| \sum_{i=1}^{n}\sum_{k≠i} W_{nk}(X_i)\, ε_k\, ε_i \Big| \right).   (A.5)

We evaluate the first term on the right side of the last inequality. One can see that

\sum_{i=1}^{n} \widetilde{m}_{add}(X_i)\, ε_i ≤ \max_{1≤i≤n} \widetilde{m}_{add}(X_i) \sum_{i=1}^{n} ε_i.

Making use of Lemma 14 readily implies that, almost surely,

\max_{1≤i≤n} \widetilde{m}_{add}(X_i) = O\big( n^{−k/(2k+1)} \log n \big).

This, when combined with the law of the iterated logarithm, implies that

\sum_{i=1}^{n} \widetilde{m}_{add}(X_i)\, ε_i = o(n^{1/2})  a.s.   (A.6)

Once more, observe that

\sum_{i=1}^{n} W_{ni}(X_i)\, ε_i^2 ≤ \max_{1≤i≤n} W_{ni}(X_i) \sum_{i=1}^{n} ε_i^2.

It is readily checked, by an application of Lemma 13(i), that

\max_{1≤i≤n} W_{ni}(X_i) = O\big( n^{−2k/(2k+1)} (\log n)^{−1/(2k+1)} \big)  a.s.

This, when combined with Lemma 15, in turn implies that

\sum_{i=1}^{n} W_{ni}(X_i)\, ε_i^2 = o(n^{1/2})  a.s.   (A.7)

By using very similar arguments, we obtain

\sum_{i=1}^{n}\sum_{k≠i} W_{nk}(X_i)\, ε_k\, ε_i ≤ O\big( n^{−k/(2k+1)} \log n \big) \sum_{i=1}^{n} ε_i = o(n^{1/2})  a.s.   (A.8)

Combining the statements (A.6)–(A.8), we conclude that

I_4 = o(n^{−1/2})  a.s.   (A.9)

The central limit theorem gives

\sqrt{n}(I_1 − σ_ε^2) →_d N(0, Var ε_1^2).   (A.10)

The proof of Theorem 3 is completed by combining the statements (A.1)–(A.2), (A.4), (A.9) and (A.10). □


Proof of Theorem 4. In view of the law of the iterated logarithm, we infer that

\limsup_{n→∞} \left( \frac{n}{2 \log\log n} \right)^{1/2} |I_1 − σ_ε^2| = (Var ε_1^2)^{1/2}  a.s.   (A.11)

By the same arguments as in the proof of Theorem 3, combined with (A.11), the result of Theorem 4 follows. □

Proof of Theorem 5. Since B is a positive definite matrix, from Theorem 2.1 of Chokri and Louani [14] we have

\frac{n(\widehat{β} − β)^{⊤} B (\widehat{β} − β)}{σ_ε^2} →_d χ²(p).   (A.12)

In the light of the model (2.10), we have

Y_i = Z_i^{⊤} β + m_{add}^{β}(X_i) + ε_i + (m_{add}(X_i) − m_{add}^{β}(X_i)).

It is clear that

\frac{1}{n}\sum_{i=1}^{n} ε_i^2 = \frac{1}{n}\sum_{i=1}^{n} (\widetilde{Y}_i − \widetilde{Z}_i^{⊤} β)^2 + \frac{1}{n}\sum_{i=1}^{n} (m_{add}(X_i) − \widehat{m}_{add}^{β}(X_i))^2 + \frac{2}{n}\sum_{i=1}^{n} (\widetilde{Y}_i − \widetilde{Z}_i^{⊤} β)(m_{add}(X_i) − \widehat{m}_{add}^{β}(X_i)).

The Cauchy–Schwarz inequality shows that

\frac{1}{n}\sum_{i=1}^{n} |\widetilde{Y}_i − \widetilde{Z}_i^{⊤} β|\, |m_{add}(X_i) − \widehat{m}_{add}^{β}(X_i)| ≤ \left( \frac{1}{n}\sum_{i=1}^{n} (\widetilde{Y}_i − \widetilde{Z}_i^{⊤} β)^2 \right)^{1/2} \left( \frac{1}{n}\sum_{i=1}^{n} (m_{add}(X_i) − \widehat{m}_{add}^{β}(X_i))^2 \right)^{1/2}.

Making use of Lemma 1 of Camlong-Viot [8], where all the conditions related to the mixing setting considered there are relaxed, it follows, under the hypotheses (G.1-3), (H.1), (K.1-3), (M.1) and (Q.1-2), that

\max_{1≤i≤n} |m_{add}(X_i) − \widehat{m}_{add}^{β}(X_i)| = O\left( \frac{\log n}{n h_n} \right).   (A.13)

Therefore, using the statement (A.13), it can easily be seen that

\frac{1}{n}\sum_{i=1}^{n} (\widetilde{Y}_i − \widetilde{Z}_i^{⊤} β)^2 = \frac{1}{n}\sum_{i=1}^{n} ε_i^2 + o(1) = σ_ε^2 + o(1)  a.s.   (A.14)

Moreover, it is easy to see, again by the Cauchy–Schwarz inequality, that

|\widehat{σ}_n^2 − σ_ε^2| ≤ \frac{1}{n}\sum_{i=1}^{n} \big( \widetilde{Z}_i^{⊤}(\widehat{β} − β) \big)^2 + \frac{2}{n}\sum_{i=1}^{n} |\widetilde{Y}_i − \widetilde{Z}_i^{⊤} β|\, |\widetilde{Z}_i^{⊤}(\widehat{β} − β)| + \Big| \frac{1}{n}\sum_{i=1}^{n} (\widetilde{Y}_i − \widetilde{Z}_i^{⊤} β)^2 − σ_ε^2 \Big|
 ≤ \max_{1≤i≤n}\|\widetilde{Z}_i\|^2\, \|\widehat{β} − β\|^2 + 2 \max_{1≤i≤n}\|\widetilde{Z}_i\|\, \|\widehat{β} − β\| \left( \frac{1}{n}\sum_{i=1}^{n} (\widetilde{Y}_i − \widetilde{Z}_i^{⊤} β)^2 \right)^{1/2} + \Big| \frac{1}{n}\sum_{i=1}^{n} (\widetilde{Y}_i − \widetilde{Z}_i^{⊤} β)^2 − σ_ε^2 \Big|.


Note that, since the random vector Z is bounded, making use of Lemma 13(ii) it follows that \widetilde{Z}_i is also bounded for any 1 ≤ i ≤ n. An application of Theorem 2.2 of Chokri and Louani [14] gives

\|β − \widehat{β}\| = o(1)  a.s.   (A.15)

Therefore, considering the statements (A.14) and (A.15) together, it is clear that

\widehat{σ}_n^2 − σ_ε^2 = o(1)  a.s.   (A.16)

Making use of Slutsky's theorem in combination with (A.12) and (A.16), we achieve the proof of Theorem 5. □

Proof of Theorem 6. The proof is a straightforward consequence of Theorem 3. □



Proof of Corollary 7. First note that a straightforward calculation gives

V_n − Var ε_1^2 = \frac{1}{n}\sum_{i=1}^{n} \left( (\widetilde{Y}_i − \widetilde{Z}_i^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}\widetilde{Y})^2 − \widehat{σ}_n^2 \right)^2 − E\big( ε_1^2 − Eε_1^2 \big)^2
 = \frac{1}{n}\sum_{i=1}^{n} \left( (\widetilde{Y}_i − \widetilde{Z}_i^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}\widetilde{Y})^2 − σ_ε^2 − (\widehat{σ}_n^2 − σ_ε^2) \right)^2 − E\big( ε_1^2 − σ_ε^2 \big)^2
 = \frac{1}{n}\sum_{i=1}^{n} \left( (\widetilde{Y}_i − \widetilde{Z}_i^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}\widetilde{Y})^2 − σ_ε^2 \right)^2 + (\widehat{σ}_n^2 − σ_ε^2)^2
   − \frac{2}{n}\sum_{i=1}^{n} \left( (\widetilde{Y}_i − \widetilde{Z}_i^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}\widetilde{Y})^2 − σ_ε^2 \right)(\widehat{σ}_n^2 − σ_ε^2) − E\big( ε_1^2 − σ_ε^2 \big)^2.

In view of the statement (A.16), the fact that

\frac{1}{n}\sum_{i=1}^{n} \left( (\widetilde{Y}_i − \widetilde{Z}_i^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}\widetilde{Y})^2 − σ_ε^2 \right) = o(1)  a.s.

and

(\widehat{σ}_n^2 − σ_ε^2)^2 = o(1)  a.s.,

we readily obtain that

V_n − Var ε_1^2 = \frac{1}{n}\sum_{i=1}^{n} \left( (\widetilde{Y}_i − \widetilde{Z}_i^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}\widetilde{Y})^2 − σ_ε^2 \right)^2 − E\big( ε_1^2 − σ_ε^2 \big)^2 + o(1)  a.s.

By the law of large numbers, we obtain

\frac{1}{n}\sum_{i=1}^{n} \left( (\widetilde{Y}_i − \widetilde{Z}_i^{⊤}(\widetilde{Z}\widetilde{Z}^{⊤})^{−1}\widetilde{Z}\widetilde{Y})^2 − σ_ε^2 \right)^2 = E\big( ε_1^2 − σ_ε^2 \big)^2 + o(1)  a.s.

This, in turn, implies

V_n − Var ε_1^2 = o(1)  a.s.   (A.17)

Finally, we have

\frac{n(\widehat{σ}_n^2 − σ_ε^2)^2}{V_n} = \frac{n(\widehat{σ}_n^2 − σ_ε^2)^2}{Var ε_1^2} × \frac{Var ε_1^2}{V_n}.

It is sufficient to combine Theorem 6 and (A.17) to achieve the proof of Corollary 7. □


Appendix B. Auxiliary results

For the reader's convenience and completeness, we provide some technical lemmas used to establish the main results.

Lemma 12. Let V_1, . . . , V_n be independent random variables with zero means and finite variances, i.e., for some r ≥ 2,

\sup_{1≤k≤n} E|V_k|^r ≤ C < ∞.

Assume that (a_{ki})_{1≤i,k≤n} is a sequence of positive numbers such that, for some 0 < p_1 < 1,

\max_{1≤i,k≤n} |a_{ki}| ≤ n^{−p_1}

and

\sum_{k=1}^{n} a_{ki} = O(n^{p_2})  for p_2 ≥ \max(0, 2/r − p_1).

Then, for s = (p_1 − p_2)/2,

\max_{1≤i≤n} \Big| \sum_{k=1}^{n} a_{ki} V_k \Big| = O(n^{−s} \log n)  a.s.

Proof. The proof of Lemma 12 is given in [37]. □

The proofs of the three following lemmas are given in [14].

Lemma 13. Assume that conditions (G.2-3), (H.1), (K.1-2), (Q.1-2) are satisfied. Suppose, for some r ≥ 2, that \max_{1≤i≤n} E|ε_i|^r < ∞. Then, we have

(i) \max_{1≤i≤n} \max_{1≤j≤n} W_{nj}(X_i) = O\left( \frac{1}{n h_n} \right)  a.s.;
(ii) \max_{1≤i≤n} \Big| \sum_{j=1}^{n} W_{nj}(X_i) \Big| = O(1)  a.s.;
(iii) \max_{1≤i≤n} \Big| \sum_{j=1}^{n} W_{nj}(X_i)\, ε_j \Big| = O\big( n^{−k/(2k+1)} \log n \big)  a.s.

Lemma 14. Assume that conditions (G.1-3), (H.1), (K.1-3), (Q.1-2) hold true. In addition, we suppose that, for some r ≥ 2, \max_{1≤i≤n} E|ε_i|^r < ∞. Then, we have

\max_{1≤i≤n} \big| \widetilde{m}_{add}(X_i) \big| = O\big( n^{−k/(2k+1)} \log n \big)  a.s.

Lemma 15. Let V_1, . . . , V_n be independent random variables with zero mean such that, for some α > 2,

\max_{1≤k≤n} E|V_k|^α < ∞.

Suppose that

\liminf_{n→∞} \frac{1}{n}\sum_{k=1}^{n} Var(V_k) > 0.

Then we have

\limsup_{n→∞} \frac{|S_n|}{(2 s_n^2 \log\log s_n^2)^{1/2}} = 1  a.s.,

where

S_n = \sum_{k=1}^{n} V_k  and  s_n^2 = \sum_{k=1}^{n} Var(V_k).

Proof. See, for instance, Stout [60, Corollary 5.2.3]. □

References

[1] B. Auestad, D. Tjøstheim, Functional identification in nonlinear time series, in: Nonparametric Functional Estimation and Related Topics (Spetses, 1990), in: NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci., vol. 335, Kluwer Acad. Publ., Dordrecht, 1991, pp. 493–507.
[2] P.K. Bhattacharya, P.-L. Zhao, Semiparametric inference in a partial linear model, Ann. Statist. 25 (1) (1997) 244–262.
[3] P.J. Bickel, M. Rosenblatt, On some global measures of the deviations of density function estimates, Ann. Statist. 1 (1973) 1071–1095.
[4] D. Bosq, J.-P. Lecoutre, Théorie de l'Estimation Fonctionnelle, in: Économie et Statistiques Avancées, Economica, Paris, 1987.
[5] S. Bouzebda, K. Chokri, D. Louani, Some uniform consistency results in the partially linear additive model components estimation, Comm. Statist. Theory Methods (2014) in press.
[6] L. Breiman, J.H. Friedman, Estimating optimal transformations for multiple regression and correlation, J. Amer. Statist. Assoc. 80 (391) (1985) 580–619. With discussion and with a reply by the authors.
[7] A. Buja, T. Hastie, R. Tibshirani, Linear smoothers and additive models, Ann. Statist. 17 (2) (1989) 453–555.
[8] C. Camlong-Viot, Vers un test d'additivité en régression non paramétrique sous des conditions de mélange, C. R. Acad. Sci., Paris I 333 (9) (2001) 877–880.
[9] J.E. Chacón, J. Montanero, A.G. Nogales, A note on kernel density estimation at a parametric rate, J. Nonparametr. Stat. 19 (1) (2007) 13–21.
[10] J.M. Chambers, T. Hastie (Eds.), Statistical Models in S, Chapman & Hall, London, 1991.
[11] H. Chen, Convergence rates for parametric components in a partly linear model, Ann. Statist. 16 (1) (1988) 136–146.
[12] R. Chen, W. Härdle, O.B. Linton, E. Severance-Lossin, Nonparametric estimation of additive separable regression models, in: Statistical Theory and Computational Aspects of Smoothing (Semmering, 1994), in: Contrib. Statist., Physica, Heidelberg, 1996, pp. 247–265.
[13] H. Chen, J.J.H. Shiau, A two-stage spline smoothing method for partially linear models, J. Statist. Plann. Inference 27 (2) (1991) 187–201.
[14] K. Chokri, D. Louani, Asymptotic results for the linear parameter estimate in partially linear additive regression model, C. R. Math. Acad. Sci., Paris 349 (19–20) (2011) 1105–1109.
[15] P. Deheuvels, One bootstrap suffices to generate sharp uniform bounds in functional estimation, Kybernetika (Prague) 47 (6) (2011) 855–865.
[16] H. Dette, A consistent test for heteroscedasticity in nonparametric regression based on the kernel method, J. Statist. Plann. Inference 103 (1–2) (2002) 311–329. C. R. Rao 80th birthday felicitation volume, Part I.
[17] H. Dette, M. Marchlewski, A robust test for homoscedasticity in nonparametric regression, J. Nonparametr. Stat. 22 (5–6) (2010) 723–736.
[18] H. Dette, A. Munk, Testing heteroscedasticity in nonparametric regression, J. R. Stat. Soc. Ser. B Stat. Methodol. 60 (4) (1998) 693–708.
[19] L. Devroye, L. Györfi, Nonparametric Density Estimation: The L1 View, in: Wiley Series in Probability and Mathematical Statistics: Tracts on Probability and Statistics, John Wiley & Sons Inc., New York, 1985.
[20] L. Devroye, G. Lugosi, Combinatorial Methods in Density Estimation, in: Springer Series in Statistics, Springer-Verlag, New York, 2001.
[21] S.G. Donald, W.K. Newey, Series estimation of semilinear models, J. Multivariate Anal. 50 (1) (1994) 30–40.
[22] U. Einmahl, D.M. Mason, An empirical process approach to the uniform consistency of kernel-type function estimators, J. Theoret. Probab. 13 (1) (2000) 1–37.
[23] U. Einmahl, D.M. Mason, Uniform in bandwidth consistency of kernel-type function estimators, Ann. Statist. 33 (3) (2005) 1380–1403.
[24] R.F. Engle, C.W.J. Granger, J. Rice, G.H. Weiss, Semiparametric estimates of the relation between weather and electricity sales, J. Amer. Statist. Assoc. 81 (394) (1986) 310–320.
[25] R.L. Eubank, P. Speckman, Curve fitting by polynomial-trigonometric regression, Biometrika 77 (1) (1990) 1–9.
[26] J. Fan, I. Gijbels, Local Polynomial Modelling and its Applications, in: Monographs on Statistics and Applied Probability, vol. 66, Chapman & Hall, London, 1996.


[27] J. Fan, W. Härdle, E. Mammen, Direct estimation of low-dimensional components in additive models, Ann. Statist. 26 (3) (1998) 943–971.
[28] S.A. Hamilton, Y.K. Truong, Local linear estimation in partly linear models, J. Multivariate Anal. 60 (1) (1997) 1–19.
[29] W. Härdle, Applied Nonparametric Regression, in: Econometric Society Monographs, vol. 19, Cambridge University Press, Cambridge, 1990.
[30] W. Härdle, H. Liang, J. Gao, Partially Linear Models, in: Contributions to Statistics, Physica-Verlag, Heidelberg, 2000.
[31] T.J. Hastie, R.J. Tibshirani, Generalized Additive Models, in: Monographs on Statistics and Applied Probability, vol. 43, Chapman and Hall Ltd., London, 1990.
[32] N.W. Hengartner, S. Sperlich, Rate optimal estimation with the integration method in the presence of many covariates, J. Multivariate Anal. 95 (2) (2005) 246–272.
[33] M.C. Jones, On higher order kernels, J. Nonparametr. Stat. 5 (2) (1995) 215–221.
[34] M.C. Jones, S.J. Davies, B.U. Park, Versions of kernel-type regression estimators, J. Amer. Statist. Assoc. 89 (427) (1994) 825–832.
[35] M.C. Jones, O. Linton, J.P. Nielsen, A simple bias reduction method for density estimation, Biometrika 82 (2) (1995) 327–338.
[36] M.C. Jones, D.F. Signorini, A comparison of higher-order bias kernel density estimators, J. Amer. Statist. Assoc. 92 (439) (1997) 1063–1073.
[37] H. Liang, Asymptotic normality of parametric part in partially linear models with measurement error in the nonparametric part, J. Statist. Plann. Inference 86 (1) (2000) 51–62.
[38] H. Liang, S.W. Thurston, D. Ruppert, T. Apanasovich, R. Hauser, Additive partial linear models with measurement errors, Biometrika 95 (3) (2008) 667–678.
[39] H. Liero, Testing homoscedasticity in nonparametric regression, J. Nonparametr. Stat. 15 (1) (2003) 31–51.
[40] J.-G. Lin, X.-Y. Qu, A consistent test for heteroscedasticity in semi-parametric regression with nonparametric variance function based on the kernel method, Statistics 46 (5) (2012) 565–576.
[41] O.B. Linton, D.T. Jacho-Chávez, On internally corrected and symmetrized kernel estimators for nonparametric regression, TEST 19 (1) (2010) 166–186.
[42] O. Linton, J.P. Nielsen, A kernel method of estimating structured nonparametric regression based on marginal integration, Biometrika 82 (1) (1995) 93–100.
[43] X. Liu, Z. Wang, X. Hu, Testing heteroscedasticity in partially linear models with missing covariates, J. Nonparametr. Stat. 23 (2) (2011) 321–337.
[44] Y.P. Mack, H.-G. Müller, Derivative estimation in nonparametric regression with random predictor variable, Sankhyā A 51 (1) (1989) 59–72.
[45] E. Mammen, O. Linton, J. Nielsen, The existence and asymptotic properties of a backfitting projection algorithm under weak conditions, Ann. Statist. 27 (5) (1999) 1443–1490.
[46] E. Mammen, B.U. Park, M. Schienle, Additive models: extensions and related models, SFB 649 Discussion Paper SFB649DP2012-045, Sonderforschungsbereich 649, Humboldt University, Berlin, Germany, 2012.
[47] W.K. Newey, Kernel estimation of partial means and a general variance estimator, Econom. Theory 10 (2) (1994) 233–253.
[48] J.D. Opsomer, D. Ruppert, Fitting a bivariate additive model by local polynomial regression, Ann. Statist. 25 (1) (1997) 186–211.
[49] E. Parzen, On estimation of a probability density function and mode, Ann. Math. Statist. 33 (1962) 1065–1076.
[50] B.L.S. Prakasa Rao, Nonparametric Functional Estimation, in: Probability and Mathematical Statistics, Academic Press Inc. [Harcourt Brace Jovanovich Publishers], New York, 1983.
[51] J. Rice, Convergence rates for partially splined models, Statist. Probab. Lett. 4 (4) (1986) 203–208.
[52] P.M. Robinson, Root-N-consistent semiparametric regression, Econometrica 56 (4) (1988) 931–954.
[53] D.W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, in: Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons Inc., New York, 1992.
[54] C.-W. Shen, T.-S. Tsou, N. Balakrishnan, Robust likelihood inference for regression parameters in partially linear models, Comput. Statist. Data Anal. 55 (4) (2011) 1696–1714.
[55] J. Shen, Y. Xie, Strong consistency of the internal estimator of nonparametric regression with dependent data, Statist. Probab. Lett. 83 (8) (2013) 1915–1925.
[56] P.D. Shi, G.Y. Li, Asymptotic normality of the M-estimators for parametric components in partly linear models, Northeast. Math. J. 11 (2) (1995) 127–138.
[57] P.D. Shi, G.Y. Li, A note on the convergent rates of M-estimates for a partly linear model, Statistics 26 (1) (1995) 27–47.
[58] C.J. Stone, Additive regression and other nonparametric models, Ann. Statist. 13 (2) (1985) 689–705.
[59] C.J. Stone, The dimensionality reduction principle for generalized additive models, Ann. Statist. 14 (2) (1986) 590–606.
[60] W.F. Stout, Almost Sure Convergence, in: Probability and Mathematical Statistics, vol. 24, Academic Press, New York, London, 1974.
[61] D. Tjøstheim, B.H. Auestad, Nonparametric identification of nonlinear time series: projections, J. Amer. Statist. Assoc. 89 (428) (1994) 1398–1409.
[62] M.P. Wand, M.C. Jones, Kernel Smoothing, in: Monographs on Statistics and Applied Probability, vol. 60, Chapman and Hall Ltd., London, 1995.
[63] K. Yu, E. Mammen, B.U. Park, Semi-parametric regression: efficiency gains from modeling the nonparametric part, Bernoulli 17 (2) (2011) 736–748.