Rate of uniform consistency for nonparametric estimates with functional variables

Journal of Statistical Planning and Inference 140 (2010) 335--352
Frédéric Ferraty (a), Ali Laksaci (b), Amel Tadj (c,*), Philippe Vieu (a)

(a) Université Paul Sabatier, Toulouse, France
(b) Université Djillali Liabes, Sidi Bel Abbes, Algeria
(c) Université de Sidi Bel Abbès, BP 89, Sidi Bel Abbès 22000, Algeria

ARTICLE INFO

Article history: Received 16 April 2008; received in revised form 21 July 2009; accepted 21 July 2009; available online 6 August 2009.

Keywords: Uniform almost complete convergence; Kernel estimators; Functional data; Entropy; Semi-metric space

ABSTRACT

In this paper we investigate nonparametric estimation of some functionals of the conditional distribution of a scalar response variable Y given a random variable X taking values in a semi-metric space. These functionals include the regression function, the conditional cumulative distribution, the conditional density and some others. The literature on nonparametric functional statistics has so far been concerned only with pointwise consistency results, and our main aim is to prove the uniform almost complete convergence (with rate) of the kernel estimators of these nonparametric models. Unlike in standard multivariate cases, the gap between pointwise and uniform results is not immediate, so suitable topological considerations are needed; these imply changes in the rates of convergence, which are quantified by entropy considerations. These theoretical uniform consistency results are (or will be) key tools for many further developments in functional data analysis. © 2009 Elsevier B.V. All rights reserved.

Contents

1. Introduction
2. Topological considerations
   2.1. Kolmogorov's entropy
   2.2. Some examples
3. Estimation of the regression function
4. Conditional cumulative distribution estimation
5. Conditional density estimation
6. Some direct consequences
   6.1. Estimation of the conditional hazard function
   6.2. Conditional mode estimation
7. Comments
   7.1. Impact of the results
   7.2. General comments on the hypotheses
   7.3. Comments on convergence rates
Acknowledgements
Appendix A. Proofs
References

* Corresponding author. E-mail address: [email protected] (A. Tadj).
doi:10.1016/j.jspi.2009.07.019


1. Introduction

Studying the link between a scalar response variable Y and a new value of an explanatory variable X is an important subject in nonparametric statistics, and there are several ways to describe this link: for example, the conditional expectation, the conditional distribution, the conditional density and the conditional hazard function. The purpose of this paper is to contribute to the nonparametric estimation of these various conditional quantities when the explanatory variable is functional. This investigation is motivated by the fact that there is an increasing number of examples, coming from different fields of applied science, in which the data are curves, and many nonparametric statistical problems occur in the functional setting (see Ferraty and Vieu, 2006, for an extensive discussion of nonparametric statistics for functional data). Note that the modelling of functional variables has become more and more popular since the publication of the monograph of Ramsay and Silverman (1997) on functional data analysis. The first results concerning nonparametric models (mainly the regression function) were obtained by Ferraty and Vieu (2000), who established the almost complete pointwise consistency¹ of kernel estimators of the regression function when the explanatory variable is functional and the observations are i.i.d. Their study was extended to nonstandard regression problems such as time series prediction (see Ferraty et al., 2002). Dabo-Niang and Rhomari (2003) stated the convergence in Lp norm of the kernel estimator of this model, and Delsol (2007) stated the exact asymptotic expression for Lp errors. The asymptotic normality of the same estimator in the strong mixing case was obtained by Masry (2005) and extended by Delsol (2009).
The kernel-type estimation of some characteristics of the conditional cumulative distribution function and of the successive derivatives of the conditional density was introduced by Ferraty et al. (2006), who established the almost complete consistency for i.i.d. observations. The strong mixing case was studied by Ferraty et al. (2005). Recently, Ferraty et al. (2007) gave the asymptotic expansion of the mean squared error of the kernel estimator of the regression function. Pointwise asymptotic properties of a kernel estimate of the conditional hazard function were investigated by Ferraty et al. (2008a). Among the many papers concerning nonparametric models related to the conditional distribution of a real variable given a random variable taking values in an infinite-dimensional space, we only refer to the papers by Dabo-Niang and Laksaci (2007) and Ezzahrioui and Ould-Saïd (2008). While this literature concerns only pointwise asymptotics, our interest in this paper is to establish the uniform almost complete convergence of the nonparametric estimates of the various conditional quantities mentioned above. Uniform consistency results have been successfully used in the standard nonparametric setting in order to derive asymptotic properties for data-driven bandwidth choice, additive modelling or multi-step estimation. So, it is natural in this setting of functional data analysis (FDA) to investigate uniform consistency properties in a systematic way. Indeed, because of the youth of the FDA topic, one can expect in the near future that all these results will be useful in numerous functional statistical methodologies such as data-driven procedures or additive modelling (see Section 7 for more detailed motivations and related bibliography). Moreover, this work completes the results obtained in Ferraty and Vieu (2006), where the pointwise almost complete consistency with rate of these models is given.
It is worth noting that uniform convergence is not a direct extension of the previous pointwise results. Indeed, it requires additional topological conditions, expressed here in terms of Kolmogorov's entropy. We will see that, unlike in standard nonparametric statistics, these infinite-dimensional topological considerations may lead in some cases (see, for instance, Example 3 in Section 2.2) to rates which are slower for uniform than for pointwise results. Finally, all these asymptotic results are established under conditions related to concentration properties expressed in terms of small ball probabilities of the underlying explanatory variable. We note that our hypotheses and results unify the cases of finite- and infinite-dimensional regressors, which makes it possible to overcome the curse of dimensionality.

Section 2 focuses on topological considerations via Kolmogorov's entropy, whereas Section 3 deals with a general regression model. Section 4 studies the conditional cumulative distribution, and the conditional density estimation is developed in Section 5. In Section 6, we emphasize the consequences of the previous results for the estimation of the conditional mode and conditional hazard function. Finally, in Section 7, we comment on the obtained results and their potential impact on the statistical literature as key tools for many further advances in FDA.

Throughout this paper, we consider a sample of independent pairs (Xi, Yi), 1 ≤ i ≤ n, identically distributed as (X, Y), a random pair valued in F × R, where F is a semi-metric space with semi-metric d, and we use the notation B(x, h) = {x' ∈ F : d(x', x) ≤ h}.

2. Topological considerations

2.1. Kolmogorov's entropy

The purpose of this section is to emphasize the topological components of our study. Indeed, as indicated in Ferraty and Vieu (2006), all the asymptotic results in nonparametric statistics for functional variables are closely related to the concentration properties of the probability measure of the functional variable X. Here, we have moreover to take into account the uniformity aspect. To this end, let SF be a fixed subset of F; we consider the following assumption, where φ is a function controlling the small ball probabilities:

(H1) ∀x ∈ SF, 0 < C φ(h) ≤ P(X ∈ B(x, h)) ≤ C' φ(h) < ∞.

¹ Let (zn), n ∈ N*, be a sequence of real random variables; we say that zn converges almost completely (a.co.) to zero if and only if, ∀ε > 0, Σ_{n=1}^∞ P(|zn| > ε) < ∞. Moreover, let (un), n ∈ N*, be a sequence of positive real numbers; we say that zn = O(un) a.co. if and only if ∃ε > 0 such that Σ_{n=1}^∞ P(|zn| > ε un) < ∞. This kind of convergence implies both almost sure convergence and convergence in probability.


We can say that the first contribution of the topological structure of the functional space appears through the function φ controlling the concentration of the probability measure of the functional variable on a small ball. Moreover, for uniform consistency, where the main tool is to cover a subset SF with a finite number of balls, one introduces another topological concept, defined as follows:

Definition 1. Let S be a subset of a semi-metric space F, and let ε > 0 be given. A finite set of points x1, x2, ..., xN in F is called an ε-net for S if S ⊂ ∪_{k=1}^N B(xk, ε). The quantity ψ_S(ε) = log(N_ε(S)), where N_ε(S) is the minimal number of open balls in F of radius ε needed to cover S, is called Kolmogorov's ε-entropy of the set S.

This concept was introduced by Kolmogorov in the mid-1950s (see Kolmogorov and Tikhomirov, 1959); it represents a measure of the complexity of a set, in the sense that high entropy means that much information is needed to describe an element with accuracy ε. Therefore, the choice of the topological structure (in other words, the choice of the semi-metric) plays a crucial role when one looks at uniform (over some subset SF of F) asymptotic results. More precisely, we will see below that a good semi-metric can increase the concentration of the probability measure of the functional variable X as well as minimize the ε-entropy of the subset SF. In an earlier contribution (see Ferraty et al., 2006) we highlighted the phenomenon of concentration of the probability measure of the functional variable by computing the small ball probabilities in various standard situations. Section 2.2 is devoted to discussing the behaviour of Kolmogorov's ε-entropy in these standard situations. Finally, we invite readers interested in these two concepts (entropy and small ball probabilities) and/or in the use of Kolmogorov's ε-entropy in dimensionality reduction problems to refer to Kuelbs and Li (1993) and Theodoros and Yannis (1997), respectively.

2.2. Some examples

We will start (Example 1) by recalling how this notion behaves in the non-functional case (that is, when F = R^p). Then, Examples 2 and 3 cover special cases of functional processes. More interesting (from a statistical point of view) is Example 4, since it allows one to construct, in any case, a semi-metric with reasonably "small" entropy.

Example 1 (Compact subset in a finite-dimensional space). A standard theorem of topology guarantees that for each compact subset S of R^p and each ε > 0 there is a finite ε-net, and we have, for any ε > 0,

ψ_S(ε) ≤ C_p log(1/ε).

More precisely, Chate and Courbage (1997) have shown that, for any ε > 0, the regular polyhedron in R^p with side length r can be covered by ([2r√p/ε] + 1)^p balls, where [m] denotes the largest integer less than or equal to m. Thus, Kolmogorov's ε-entropy of a polyhedron P_r in R^p with side length r is, for all ε > 0,

ψ_{P_r}(ε) ∼ p log([2r√p/ε] + 1).
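As a quick numerical illustration (our own sketch; the function names and values below are ours, not the paper's), the Chate–Courbage covering bound can be evaluated directly, showing the ε-entropy of a cube growing linearly in the dimension p and logarithmically in 1/ε:

```python
import math

def covering_number(r, p, eps):
    # Chate and Courbage (1997): a polyhedron of side length r in R^p
    # can be covered by ([2 r sqrt(p) / eps] + 1)^p balls of radius eps,
    # where [m] is the largest integer <= m.
    return (math.floor(2 * r * math.sqrt(p) / eps) + 1) ** p

def entropy(r, p, eps):
    # Kolmogorov epsilon-entropy: log of the covering number
    # (here the covering bound above is used as a proxy).
    return math.log(covering_number(r, p, eps))

# Entropy grows roughly like p * log(1/eps): linear in the dimension.
for p in (1, 2, 10):
    print(p, round(entropy(1.0, p, 0.1), 2))
```

This exponential growth of the covering number in p is one face of the curse of dimensionality that the entropy viewpoint quantifies.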

Example 2 (Closed ball in a Sobolev space). Kolmogorov and Tikhomirov (1959) obtained many upper and lower bounds for the ε-entropy of several functional subsets. A typical result is given for the class of functions f(t) on T = [0, 2π) with periodic boundary conditions and

(1/2π) ∫_0^{2π} f²(t) dt + (1/2π) ∫_0^{2π} (f^{(m)}(t))² dt ≤ r.

The ε-entropy of this class, denoted W_2^m(r), satisfies

ψ_{W_2^m(r)}(ε) ≤ C (r/ε)^{1/m}.

Example 3 (Unit ball of the Cameron–Martin space). Recently, van der Vaart and van Zanten (2007) characterized the Cameron–Martin space associated with a Gaussian process, viewed as a map in C[0, 1], with spectral measure μ satisfying

∫ exp(|λ|) μ(dλ) < ∞,

as

H = { t ↦ Re ∫ e^{-itλ} h(λ) dμ(λ) : h ∈ L²(μ) },

and they show that Kolmogorov's ε-entropy of the unit ball B_CMW of this space with respect to the supremum norm ‖·‖_∞ satisfies

ψ_{B_CMW}(ε) ∼ (log(1/ε))² as ε → 0.

Example 4 (Compact subset in a Hilbert space with a projection semi-metric). Projection-based semi-metrics are constructed in the following way. Assume that H is a separable Hilbert space, with inner product <·,·> and orthonormal basis {e1, ..., ej, ...}, and let k be a fixed integer, k > 0. As shown in Lemma 13.6 of Ferraty and Vieu (2006), a semi-metric dk on H can be defined as follows:

dk(x, x') = sqrt( Σ_{j=1}^k <x − x', ej>² ).   (1)

Let Π be the operator defined from H into R^k by

Π(x) = (<x, e1>, ..., <x, ek>),

let d_eucl be the Euclidean distance on R^k, and let us denote by B_eucl(·,·) an open ball of R^k for the associated topology. Similarly, let us denote by Bk(·,·) an open ball of H for the semi-metric dk. Because Π is a continuous map from (H, dk) into (R^k, d_eucl), for any compact subset S of (H, dk), Π(S) is a compact subset of R^k. Therefore, for each ε > 0 we can cover Π(S) with balls of radius ε and centers zi ∈ R^k:

Π(S) ⊂ ∪_{i=1}^{d_ε} B_eucl(zi, ε) with d_ε ≤ C ε^{−k} for some C > 0.   (2)

For i = 1, ..., d_ε, let xi be an element of H such that Π(xi) = zi. The solution of the equation Π(x) = zi is not unique in general, but it suffices to take xi to be one of these solutions. Because of (1), we have that

Π^{−1}(B_eucl(zi, ε)) = Bk(xi, ε).   (3)

Finally, (2) and (3) are enough to show that Kolmogorov's ε-entropy of S is

ψ_S(ε) ≈ C k log(1/ε).
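A minimal sketch of the projection semi-metric d_k (our own illustration: the grid, the cosine basis and the curves are hypothetical choices, and the basis is only approximately orthonormal on the grid):

```python
import numpy as np

def projection_semimetric(x, y, basis, dt):
    # d_k(x, y) = sqrt( sum_{j=1}^k <x - y, e_j>^2 ), the L2 inner
    # products <x - y, e_j> being approximated by Riemann sums on the grid.
    coords = basis @ (x - y) * dt
    return float(np.sqrt(np.sum(coords ** 2)))

# Hypothetical setup: curves sampled on a grid of 200 points of [0, 1],
# projected on the first k = 3 functions of a cosine basis.
n_grid, k = 200, 3
t = np.linspace(0.0, 1.0, n_grid, endpoint=False)
dt = 1.0 / n_grid
basis = np.array([np.sqrt(2.0) * np.cos(np.pi * j * t) for j in range(1, k + 1)])

x = np.sin(2.0 * np.pi * t)
y = np.cos(2.0 * np.pi * t)
print(projection_semimetric(x, x, basis, dt))  # 0.0, since d_k(x, x) = 0
print(projection_semimetric(x, y, basis, dt))
```

Note that d_k is only a semi-metric, not a metric: two curves whose difference is orthogonal to e_1, ..., e_k (for instance, a constant shift against this cosine basis) are at distance (approximately) zero.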



3. Estimation of the regression function

In this section, we consider the problem of estimating a generalized regression function defined as follows:

m_ψ(x) = E[ψ(Y) | X = x], ∀x ∈ F,   (4)

where ψ is a known real-valued Borel function. Model (4) has been widely studied when the explanatory variable X is real and ψ(Y) = Y, while Deheuvels and Mason (2004) provide recent advances for a general function ψ. This model covers and includes many important nonparametric models such as the classical regression function, the conditional distribution, etc. Following the kernel estimator of the classical regression function (see Ferraty and Vieu, 2006), we propose the estimate m̂_ψ(x) of m_ψ(x) defined as

m̂_ψ(x) = Σ_{i=1}^n K(h_K^{-1} d(x, Xi)) ψ(Yi) / Σ_{i=1}^n K(h_K^{-1} d(x, Xi)), ∀x ∈ F,

where K is a kernel function and h_K = h_{K,n} is a sequence of positive real numbers which goes to zero as n goes to infinity. Our aim is to establish the uniform almost complete convergence of m̂_ψ on some subset SF of F. To that end, we denote by C and/or C' strictly positive generic constants, and we assume that:

(H2) There exists b > 0 such that ∀x1, x2 ∈ SF, |m_ψ(x1) − m_ψ(x2)| ≤ C d^b(x1, x2).

(H3) ∀m ≥ 2, E(|ψ(Y)|^m | X = x) ≤ σ_m(x) < C < ∞, with σ_m(·) continuous on SF.

(H4) K is a bounded and Lipschitz kernel on its support [0, 1), and if K(1) = 0, the kernel K has to fulfil the additional condition −∞ < C < K'(t) < C' < 0.

(H5) The functions φ and ψ_SF are such that:

(H5a) ∃C > 0, ∃η0 > 0, ∀η < η0, φ'(η) < C, and if K(1) = 0, the function φ(·) has to fulfil the additional condition: ∃C > 0, ∃η0 > 0, ∀0 < η < η0, ∫_0^η φ(u) du > C η φ(η).


(H5b) For n large enough, (log n)² / (n φ(h_K)) < ψ_SF(log n / n) < n φ(h_K) / log n.

(H6) Kolmogorov's ε-entropy of SF satisfies

Σ_{n=1}^∞ exp{ (1 − β) ψ_SF(log n / n) } < ∞ for some β > 1.

Conditions (H2)–(H4) are very standard in the nonparametric setting. Concerning (H5a), the boundedness of the derivative of φ around zero allows one to treat φ as a Lipschitz function. In addition, from a theoretical point of view, one has to separate the case when K(·) is a continuous kernel (i.e. K(1) = 0) from the case when K(·) is not continuous (which contains, for instance, the uniform kernel). The case K(1) = 0 is more delicate, and one has to introduce an additional assumption on the behaviour of φ around zero (see the proof in the Appendix). Hypothesis (H5b) deals with topological considerations by controlling the entropy of SF: for a radius not too large, one requires that ψ_SF(log n / n) is neither too small nor too large. Moreover, (H5b) implies that ψ_SF(log n / n) / (n φ(h_K)) tends to 0 as n tends to +∞. As remarked in Section 2, in some "usual" cases one has ψ_SF(log n / n) ∼ C log n, and (H5b) is then satisfied as soon as (log n)² = O(n φ(h_K)). In a different way, Assumption (H6) acts on Kolmogorov's ε-entropy of SF; in the same particular case as previously, it is easy to see that (H6) is verified as soon as β > 2.

Theorem 2. Under hypotheses (H1)–(H6), we have

sup_{x ∈ SF} |m̂_ψ(x) − m_ψ(x)| = O(h_K^b) + O_a.co.( sqrt( ψ_SF(log n / n) / (n φ(h_K)) ) ).   (5)
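To make the estimator concrete, here is a self-contained sketch (our own illustration, not the authors' code) of m̂_ψ with ψ the identity, the quadratic kernel K(u) = 1 − u² on [0, 1), and the supremum semi-metric; the simulated curves and response are hypothetical choices:

```python
import numpy as np

def sup_semimetric(x, y):
    # Supremum distance between two curves sampled on a common grid.
    return float(np.max(np.abs(x - y)))

def m_hat(x0, X, Y, h, d=sup_semimetric):
    # Functional kernel regression estimate with psi = identity:
    #   m_hat(x0) = sum_i K(d(x0, X_i)/h) Y_i / sum_i K(d(x0, X_i)/h),
    # here with the quadratic kernel K(u) = 1 - u^2 supported on [0, 1).
    u = np.array([d(x0, xi) for xi in X]) / h
    w = np.where(u < 1.0, 1.0 - u ** 2, 0.0)
    s = w.sum()
    return float((w * np.asarray(Y)).sum() / s) if s > 0 else float("nan")

# Hypothetical data: curves a*sin(2*pi*t) + a with scalar response Y = a,
# so the regression function maps the curve with level a to a.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
a = rng.uniform(0.0, 2.0, 300)
X = [ai * np.sin(2.0 * np.pi * t) + ai for ai in a]
Y = a

x0 = 1.0 * np.sin(2.0 * np.pi * t) + 1.0  # new curve; here m(x0) = 1.0
print(m_hat(x0, X, Y, h=0.3))
```

The choice of d plays exactly the role discussed in Section 2: replacing the supremum semi-metric by a projection semi-metric (Example 4) changes which curves count as neighbours of x0.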

4. Conditional cumulative distribution estimation

In this section, we assume that a regular version of the conditional probability of Y given X exists, and we study the uniform almost complete convergence of a kernel estimator of the conditional cumulative distribution function, denoted F^x. A straightforward way to estimate the function F^x is to treat it as a particular case of m_ψ with ψ(t) = 1_{]-∞, y]}(t) (for y ∈ R). Thus, we estimate F^x by

F̂^x(y) = Σ_{i=1}^n W_ni(x) 1_{Yi ≤ y}, ∀y ∈ R, ∀x ∈ F,

where

W_ni(x) = K(h_K^{-1} d(x, Xi)) / Σ_{j=1}^n K(h_K^{-1} d(x, Xj)).

The estimation of the conditional cumulative distribution function has been investigated, in the real case, by several authors (see Roussas, 1969; Samanta, 1989, among others). In the functional case, Ferraty et al. (2006) established the almost complete convergence of a double kernel estimator of the conditional cumulative distribution function. Clearly, the result stated in Section 3 yields the almost complete convergence of F̂^x uniformly in the functional argument x. Indeed, it suffices to apply Theorem 2 to get:

Corollary 3. Under hypotheses (H1), (H2) and (H4)–(H6), we have

sup_{x ∈ SF} |F̂^x(y) − F^x(y)| = O(h_K^b) + O_a.co.( sqrt( ψ_SF(log n / n) / (n φ(h_K)) ) ).

But, in order to derive the uniform consistency in both arguments (functional and real), we fix a compact subset SR of R and we consider the following additional assumptions:

(H2') ∀(y1, y2) ∈ SR × SR, ∀(x1, x2) ∈ SF × SF, |F^{x1}(y1) − F^{x2}(y2)| ≤ C(d^{b1}(x1, x2) + |y1 − y2|^{b2}).

(H6') Kolmogorov's ε-entropy of SF satisfies

Σ_{n=1}^∞ n^{1/(2 b2)} exp{ (1 − β) ψ_SF(log n / n) } < ∞ for some β > 1.


Theorem 4. Under hypotheses (H1), (H2'), (H4), (H5) and (H6'), we have

sup_{x ∈ SF} sup_{y ∈ SR} |F̂^x(y) − F^x(y)| = O(h_K^{b1}) + O_a.co.( sqrt( ψ_SF(log n / n) / (n φ(h_K)) ) ).   (6)
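The estimator F̂^x is just a weighted empirical distribution function; a minimal sketch (our own illustration, with simulated data and kernel choices that are ours) makes the weights W_ni explicit:

```python
import numpy as np

def sup_semimetric(x, y):
    return float(np.max(np.abs(x - y)))

def F_hat(x0, y, X, Y, h, d=sup_semimetric):
    # F_hat^x(y) = sum_i W_ni(x) 1{Y_i <= y}, with weights
    # W_ni(x) = K(d(x, X_i)/h) / sum_j K(d(x, X_j)/h),
    # using the quadratic kernel K(u) = 1 - u^2 on [0, 1).
    u = np.array([d(x0, xi) for xi in X]) / h
    k = np.where(u < 1.0, 1.0 - u ** 2, 0.0)
    w = k / k.sum()  # assumes at least one curve falls in the ball B(x0, h)
    return float(w[np.asarray(Y) <= y].sum())

# Hypothetical data, as in the regression sketch, plus some noise on Y.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
a = rng.uniform(0.0, 2.0, 300)
X = [ai * np.sin(2.0 * np.pi * t) + ai for ai in a]
Y = a + rng.normal(0.0, 0.1, 300)

x0 = 1.0 * np.sin(2.0 * np.pi * t) + 1.0
print(F_hat(x0, 1.0, X, Y, h=0.3))
```

By construction F̂^x is a genuine distribution function in y: it is non-decreasing and takes values in [0, 1].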

5. Conditional density estimation

In this section, similar results are derived for the kernel estimator of the conditional density of Y given X. We assume that the conditional probability of Y given X is absolutely continuous with respect to the Lebesgue measure on R, and we denote by f^x the conditional density of Y given X = x. We define the kernel estimator f̂^x of f^x as follows:

f̂^x(y) = h_H^{-1} Σ_{i=1}^n K(h_K^{-1} d(x, Xi)) H(h_H^{-1}(y − Yi)) / Σ_{i=1}^n K(h_K^{-1} d(x, Xi)), ∀y ∈ R, ∀x ∈ F,   (7)

where the functions K and H are kernels and h_K = h_{K,n} (resp. h_H = h_{H,n}) is a sequence of positive real numbers. Note that a similar estimate was already introduced, in the special case when X is a real random variable, by Rosenblatt (1969) and by Youndjé (1996), among other authors. In order to establish the uniform almost complete convergence of this estimator, we consider the following additional assumptions:

(H2'') ∀(y1, y2) ∈ SR × SR, ∀(x1, x2) ∈ SF × SF, |f^{x1}(y1) − f^{x2}(y2)| ≤ C(d^{b1}(x1, x2) + |y1 − y2|^{b2}).

(H5b') For some α ∈ (0, 1), lim_{n→+∞} n^α h_H = ∞, and for n large enough:

(log n)² / (n^{1−α} φ(h_K)) < ψ_SF(log n / n) < n^{1−α} φ(h_K) / log n.

(H6'') Kolmogorov's ε-entropy of SF satisfies

Σ_{n=1}^∞ n^{(3α+1)/2} exp{ (1 − β) ψ_SF(log n / n) } < ∞ for some β > 1.

(H7) H is a bounded, Lipschitz continuous function such that ∫ |t|^{b2} H(t) dt < ∞ and ∫ H²(t) dt < ∞.

Theorem 5. Under hypotheses (H1), (H2''), (H4), (H5a), (H5b'), (H6'') and (H7), we have

sup_{x ∈ SF} sup_{y ∈ SR} |f̂^x(y) − f^x(y)| = O(h_K^{b1}) + O(h_H^{b2}) + O_a.co.( sqrt( ψ_SF(log n / n) / (n^{1−α} φ(h_K)) ) ).   (8)
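A hedged sketch of the double-kernel estimate (7) (our own illustration: the Gaussian choice of H, the supremum semi-metric and the simulated data are assumptions of ours, not prescriptions of the paper):

```python
import numpy as np

def sup_semimetric(x, y):
    return float(np.max(np.abs(x - y)))

def f_hat(x0, y, X, Y, hK, hH, d=sup_semimetric):
    # Double-kernel estimate (7):
    # f_hat^x(y) = hH^{-1} sum_i K(d(x, X_i)/hK) H((y - Y_i)/hH)
    #              / sum_i K(d(x, X_i)/hK),
    # with quadratic K on [0, 1) and a Gaussian kernel H.
    u = np.array([d(x0, xi) for xi in X]) / hK
    k = np.where(u < 1.0, 1.0 - u ** 2, 0.0)
    H = np.exp(-0.5 * ((y - np.asarray(Y)) / hH) ** 2) / np.sqrt(2.0 * np.pi)
    return float((k * H).sum() / (hH * k.sum()))

# Hypothetical data: given the curve at level a = 1, Y is roughly N(1, 0.1^2).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 50)
a = rng.uniform(0.0, 2.0, 500)
X = [ai * np.sin(2.0 * np.pi * t) + ai for ai in a]
Y = a + rng.normal(0.0, 0.1, 500)

x0 = 1.0 * np.sin(2.0 * np.pi * t) + 1.0
grid = np.linspace(0.0, 2.0, 201)
dens = np.array([f_hat(x0, y, X, Y, hK=0.3, hH=0.1) for y in grid])
print(grid[int(np.argmax(dens))])  # location of the estimated conditional peak
```

Because H integrates to one, f̂^x(·) integrates (numerically) to approximately one over the support of Y, as a conditional density should.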

6. Some direct consequences

6.1. Estimation of the conditional hazard function

This section is devoted to the almost complete convergence, uniformly on a fixed subset SF × SR of F × R, of the kernel estimator of the conditional hazard function of Y given X. We refer to Ferraty et al. (2008a) for the pointwise almost complete convergence of this model in the functional case. Recall that the conditional hazard function is defined by

h^x(y) = f^x(y) / (1 − F^x(y)), ∀y such that F^x(y) < 1, ∀x ∈ F.

Naturally, the conditional hazard function estimator is closely linked to the conditional survival function estimate. Consider the kernel estimates of the functions F^x and f^x defined in the previous sections, and adopt the kernel estimator ĥ^x(y) of the conditional hazard function h^x defined by

ĥ^x(y) = f̂^x(y) / (1 − F̂^x(y)).


In addition to the previous assumptions, used to establish the convergence rates (6) and (8), the following is needed:

(H8) ∃ δ1 > 0, δ2 > 0 such that ∀x ∈ SF, ∀y ∈ SR, F^x(y) ≤ 1 − δ1 < 1 and f^x(y) ≤ δ2.

Theorem 6. Under the hypotheses of Theorem 5 and if (H6') and (H8) hold, then

sup_{x ∈ SF} sup_{y ∈ SR} |ĥ^x(y) − h^x(y)| = O(h_K^{b1}) + O(h_H^{b2}) + O_a.co.( sqrt( ψ_SF(log n / n) / (n^{1−α} φ(h_K)) ) ).   (9)

6.2. Conditional mode estimation

Let us now study the almost complete convergence, uniformly on a fixed compact subset SF of F, of the kernel estimator of the conditional mode of Y given X = x, denoted θ(x). For this, we assume that θ(x) satisfies on SF the following uniform uniqueness property (see Ould-Saïd and Cai, 2005, for the multivariate case):

(H9) ∀ε0 > 0, ∃η > 0 such that, for any function r : SF → SR,

sup_{x ∈ SF} |θ(x) − r(x)| ≥ ε0 ⇒ sup_{x ∈ SF} |f^x(r(x)) − f^x(θ(x))| ≥ η.

Moreover, we suppose that there exists some integer j > 1 such that, ∀x ∈ SF, the function f^x is j times continuously differentiable with respect to y on SR, and

(H10) f^{x(l)}(θ(x)) = 0 if 1 ≤ l < j, and f^{x(j)}(·) is uniformly continuous on SR with |f^{x(j)}(θ(x))| > C > 0, where f^{x(j)} denotes the jth order derivative of the conditional density f^x.

We estimate the conditional mode θ(x) by a random variable θ̂(x) such that

θ̂(x) = arg sup_{y ∈ SR} f̂^x(y).

From Theorem 5 we derive the following corollary.

Corollary 7. Under the hypotheses of Theorem 5 and if the conditional density f^x satisfies (H9) and (H10), we have

sup_{x ∈ SF} |θ̂(x) − θ(x)|^j = O(h_K^{b1}) + O(h_H^{b2}) + O_a.co.( sqrt( ψ_SF(log n / n) / (n^{1−α} φ(h_K)) ) ).
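In practice θ̂(x) can be computed by maximizing the conditional density estimate of Section 5 over a finite grid of SR; the following is our own sketch under the same hypothetical kernel and data choices as before:

```python
import numpy as np

def sup_semimetric(x, y):
    return float(np.max(np.abs(x - y)))

def f_hat(x0, y, X, Y, hK, hH):
    # Double-kernel conditional density estimate, as in Section 5,
    # with quadratic K on [0, 1) and a Gaussian kernel H.
    u = np.array([sup_semimetric(x0, xi) for xi in X]) / hK
    k = np.where(u < 1.0, 1.0 - u ** 2, 0.0)
    H = np.exp(-0.5 * ((y - np.asarray(Y)) / hH) ** 2) / np.sqrt(2.0 * np.pi)
    return float((k * H).sum() / (hH * k.sum()))

def theta_hat(x0, X, Y, hK, hH, grid):
    # theta_hat(x) = arg sup_{y in S_R} f_hat^x(y), the supremum being
    # taken, in this sketch, over a finite grid discretizing S_R.
    vals = [f_hat(x0, y, X, Y, hK, hH) for y in grid]
    return float(grid[int(np.argmax(vals))])

# Hypothetical data: Y given the curve at level a is concentrated around a,
# so the conditional mode at the curve with level a = 1 should be near 1.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 50)
a = rng.uniform(0.0, 2.0, 500)
X = [ai * np.sin(2.0 * np.pi * t) + ai for ai in a]
Y = a + rng.normal(0.0, 0.1, 500)

x0 = 1.0 * np.sin(2.0 * np.pi * t) + 1.0
print(theta_hat(x0, X, Y, hK=0.3, hH=0.1, grid=np.linspace(0.0, 2.0, 201)))
```

The grid resolution adds its own approximation error on top of the rate in Corollary 7, so in practice it should shrink with the bandwidth h_H.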

7. Comments

7.1. Impact of the results

This paper has stated uniform consistency results in a functional setting. They are not only nice extensions of pointwise results; they also have great impact, from both theoretical and practical points of view. First of all, the natural practical interest of uniform consistency lies in prediction. Look, for instance, at Theorem 2: being able to state results on the quantity

sup_{x ∈ SF} |m̂_ψ(x) − m_ψ(x)|

directly yields results on the quantity |m̂_ψ(X) − m_ψ(X)|, where X is a new random functional element valued in SF. The same kind of remark can be made for the other problems treated in Theorems 4, 5 and 6 (i.e. conditional cumulative distribution, conditional density and conditional hazard function). More generally, as in multivariate statistics, uniform consistency can be useful for estimating the solutions of general equations, with applications to detecting peaks, valleys and change points (see, for instance, Boularan et al., 1995). Secondly, and this is maybe the main point,


uniform consistency results are indispensable tools for the study of more sophisticated models in which multi-stage procedures are involved. This occurs in a very wide scope of situation in standard multivariate nonparametric analysis. In functional setting, the needs of uniform consistency results have been recently pointed out in some multi-stage models such as additive modelling (Ferraty and Vieu, 2009), partial functional linear model (Aneiros Perez and Vieu, 2006), and single functional models (Ait Sa¨di et al., 2009). Other functional data methodologies are also using such kind of uniform tools, such as for instance data-driven bandwidth choice (Benhenni et al., 2007), or bootstrapping (Ferraty et al., 2008b). The scope of functional applications of our theoretical uniform results will increase in the near future following the progress of FDA. 7.2. General comments on the hypotheses In addition to the comments given in Section 3, we go back to complete this discussion by comparing the structural assumptions of the uniform convergence to those of the pointwise ones studied by Ferraty and Vieu (2006). On the functional variable: Unlike the pointwise case, the uniform consistency requires a concentration property of the probability measure uniformly over SF (see, (H1)). So, it is important to give here general situations when such an assumption is fulfilled. From a probabilistic point of view, this can be done by introducing the Onsager–Machlup function (see, Onsager and Machlup, 1953) defined as   P(B(x, h)) . ∀(x, z) ∈ SF , FX (x, z) = log lim h→0 P(B(z, h)) Then, (H1) is verified if the Onsager–Machlup function of the probability measure of the functional variable is such that ∀x ∈ SF ,

|FX (x, 0)| ⱕ C < ∞.

(10)

The Onsager–Machlup function has been intensively studied in the literature, as have the quantities $P(X\in B(0,h))$. Their explicit expressions for several continuous-time processes can be found in Bogachev (1999, p. 186), thereby providing examples of subsets and functional variables that satisfy (H1). This purely probabilistic point of view focuses on small ball probabilities with standard topologies. But, from a statistical point of view, the practitioner can choose the semi-metric. In particular, Example 4 in Section 2 gives an interesting family of semi-metrics allowing (H1) to be fulfilled for a large set of functional variables (see Ferraty and Vieu, 2006, Lemma 13.6). In fact, an important task for the statistician consists in building a semi-metric adapted to the functional variable. Here, the word "adapted" can mean that the semi-metric allows (H1) to be satisfied, but other properties of the semi-metric can be required, and this issue will certainly be investigated in further works.

On the regularity constraints of the model: Regularity-type conditions on the functional objects to be estimated are given via assumptions (H2), (H2′), (H2″). In comparison with the pointwise case (see Ferraty et al., 2006), we do not assume that the constants depend on the conditioning point. Moreover, these assumptions are sufficient but not necessary. For instance, one can replace (H2) by a new one based on the function $L_x(z)=E[\varphi(Y)-m_\varphi(x)\,|\,X=z]$. Consider $F$ as a semi-normed vector space and assume that there exists a linear operator $A_x$ such that $L_x(z)=A_x(z-x)+o(\|x-z\|)$, where $A_x$ is bounded uniformly over $S_F$ (which amounts to assuming that $L_x$ is differentiable, since $L_x(x)=0$). From an asymptotic point of view, the only change appears in the bias. So, by using the first-order expansion of $L_x$, we also get under this alternative assumption that
\[
\sup_{x\in S_F}|E\hat g_\varphi(x)-m_\varphi(x)|=O(h).
\]
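To make the notion of an "adapted" semi-metric concrete, here is a minimal numerical sketch of a projection-type semi-metric in the spirit of Example 4, built from the top eigenvectors of the empirical covariance of a sample of discretized curves. The function names, the toy curves and the choice $k=3$ are our own illustrative assumptions, not taken from the paper:

```python
import numpy as np

def projection_semimetric(curves, k=3):
    """Build d_k(u, v) = sqrt(sum_{j<=k} <u - v, e_j>^2), where the e_j are
    the top-k eigenvectors of the empirical covariance of `curves`
    (curves: n x p array of discretized functions)."""
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / len(curves)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    basis = eigvecs[:, -k:]                  # keep the top-k eigenvectors
    def d(u, v):
        # distance measured only through the first k projections
        return float(np.sqrt(np.sum((basis.T @ (u - v)) ** 2)))
    return d

# toy sample of curves observed on a grid of 100 points
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
X = np.array([np.sin(2 * np.pi * t * (1 + 0.3 * rng.standard_normal()))
              for _ in range(50)])
d = projection_semimetric(X, k=3)
```

Curves that differ only outside the span of the first $k$ eigenvectors are close in this semi-metric, which is precisely what makes a small-ball condition of type (H1) easier to satisfy than with the usual $L^2$ or sup-norm topology.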

As a conclusion, there are several ways to introduce regularity constraints in functional nonparametric models. The alternative assumption used here preserves the same rate of convergence. However, one has to keep in mind that other smoothness conditions on the model can lead to different convergence rates for the bias.

7.3. Comments on convergence rates

It is well known that in the finite dimensional case (that is, $F=\mathbb{R}^p$), the uniform rates of convergence (over compact sets) are the same as the pointwise ones. The main point of this paper is to show that this is not so obvious in functional settings. To fix ideas, let us just look at the regression case (the other ones could be discussed similarly). For a fixed point $x\in F$, the pointwise result (see, e.g., Ferraty and Vieu, 2006, Theorem 6.11) is stated (for $\varphi\equiv\mathrm{Identity}$) as
\[
\hat m_\varphi(x)-m_\varphi(x)=O(h_K^{b})+O\left(\sqrt{\frac{\log n}{n\,\phi(h_K)}}\right)\quad \text{a.co.},
\]
while in Theorem 2 of this paper we have stated a result of the form
\[
\sup_{x\in S_F}|\hat m_\varphi(x)-m_\varphi(x)|=O(h_K^{b})+O\left(\sqrt{\frac{\psi_{S_F}(\log n/n)}{n\,\phi(h_K)}}\right)\quad \text{a.co.}
\]
To see how the uniform point of view leads to a deterioration of the rates of convergence, one may look at the special examples discussed in Section 2.2. For instance, looking at Example 3, for a standard Gaussian process with the usual metric topology the loss is of order $\log n$, since we have
\[
\psi_{S_F}(\log n/n)=O((\log n)^2).
\]
However, if we look at Example 4, we can see the interest of semi-metric modelling. Indeed, with a suitable projection semi-metric, one arrives at an entropy function satisfying
\[
\psi_{S_F}(\log n/n)=O(\log n).
\]
So, such a topological choice (as described in Example 4) avoids the deterioration of the rates of convergence.

Acknowledgements

The authors would like to thank both referees, whose comments and suggestions have significantly improved the presentation of this work. All the participants of the working group STAPH on Functional and Operatorial Statistics in Toulouse are also thanked for their continuous support and comments.

Appendix A. Proofs

In the following we denote, for all $i=1,\ldots,n$,
\[
K_i(x)=K(h_K^{-1}d(x,X_i))\quad\text{and}\quad H_i(y)=H(h_H^{-1}(y-Y_i)).
\]

First of all, according to (H1) and (H4), it is clear that if $K(1)>C>0$, then
\[
\forall x\in S_F,\ \exists\, 0<C<C'<\infty,\quad C\,\phi(h_K)<E[K_1(x)]<C'\,\phi(h_K). \qquad (11)
\]
In the situation where $K(1)=0$, the combination of (H1) and (H5a) allows to get the same result (see Ferraty and Vieu, 2006, Lemma 4.4, p. 44). From now on, in order to simplify the notation, we set $\varepsilon=\log n/n$.

Proof of Theorem 2. We consider the decomposition
\[
\hat m_\varphi(x)-m_\varphi(x)=\frac{1}{\hat f(x)}[\hat g_\varphi(x)-E\hat g_\varphi(x)]+\frac{1}{\hat f(x)}[E\hat g_\varphi(x)-m_\varphi(x)]+\frac{m_\varphi(x)}{\hat f(x)}[1-\hat f(x)], \qquad (12)
\]

where
\[
\hat f(x)=\frac{1}{nE[K(h_K^{-1}d(x,X_1))]}\sum_{i=1}^{n}K(h_K^{-1}d(x,X_i))
\quad\text{and}\quad
\hat g_\varphi(x)=\frac{1}{nE[K(h_K^{-1}d(x,X_1))]}\sum_{i=1}^{n}K(h_K^{-1}d(x,X_i))\,\varphi(Y_i).
\]
Therefore, Theorem 2 is a consequence of the following intermediate results.

Lemma 8. Under hypotheses (H1) and (H4)–(H6), we have
\[
\sup_{x\in S_F}|\hat f(x)-1|=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\log n/n)}{n\,\phi(h_K)}}\right).
\]
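As a purely illustrative companion to these definitions, the following sketch evaluates the regression estimator $\hat m_\varphi(x)=\hat g_\varphi(x)/\hat f(x)$ for $\varphi\equiv\mathrm{Identity}$. Since the normalization $E[K_1(x)]$ is common to $\hat f$ and $\hat g_\varphi$, it cancels in the ratio, so the implementation only needs the kernel weights $K(d(x,X_i)/h_K)$. The kernel, semi-metric and toy data below are our own choices, not prescriptions from the paper:

```python
import numpy as np

def quadratic_kernel(u):
    """Asymmetrical kernel supported on [0, 1] (here K(1) = 0)."""
    return np.where((u >= 0) & (u <= 1), 1.0 - u**2, 0.0)

def functional_nw(x, X, Y, d, h, K=quadratic_kernel):
    """Functional Nadaraya-Watson estimate m_hat(x).

    The normalization E[K_1(x)] in f_hat and g_hat cancels in the ratio
    g_hat(x) / f_hat(x), so only the weights K(d(x, X_i)/h) are needed.
    d: semi-metric taking two curves; X: list of curves; Y: responses."""
    w = np.array([K(d(x, xi) / h) for xi in X])
    if w.sum() == 0:
        return np.nan  # empty ball: no local information at x
    return float(np.dot(w, Y) / w.sum())

# toy illustration with the sup-norm semi-metric on discretized curves
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)
X = [np.sin(2 * np.pi * a * t) for a in rng.uniform(0.5, 1.5, 200)]
Y = np.array([xi.max() + 0.01 * rng.standard_normal() for xi in X])
d_sup = lambda u, v: float(np.max(np.abs(u - v)))
m_hat = functional_nw(X[0], X, Y, d_sup, h=0.5)
```

In this toy model the regression function is essentially constant equal to one (the maximum of each curve), so the estimate should be close to one at any conditioning curve.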

Corollary 9. Under the hypotheses of Lemma 8, we have
\[
\sum_{n=1}^{\infty}P\left(\inf_{x\in S_F}\hat f(x)<\frac{1}{2}\right)<\infty.
\]

Lemma 10. Under hypotheses (H1), (H2) and (H4)–(H6), we have
\[
\sup_{x\in S_F}|E\hat g_\varphi(x)-m_\varphi(x)|=O(h_K^{b}).
\]

Lemma 11. Under the assumptions of Theorem 2, we have
\[
\sup_{x\in S_F}|\hat g_\varphi(x)-E\hat g_\varphi(x)|=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\log n/n)}{n\,\phi(h_K)}}\right).
\]

Proof of Lemma 8. Let $x_1,\ldots,x_{N_\varepsilon(S_F)}$ be an $\varepsilon$-net for $S_F$ (see Definition 1) and, for all $x\in S_F$, set $k(x)=\arg\min_{k\in\{1,2,\ldots,N_\varepsilon(S_F)\}}d(x,x_k)$. One considers the following decomposition:
\[
\sup_{x\in S_F}|\hat f(x)-E\hat f(x)|\leq
\underbrace{\sup_{x\in S_F}|\hat f(x)-\hat f(x_{k(x)})|}_{F_1}
+\underbrace{\sup_{x\in S_F}|\hat f(x_{k(x)})-E\hat f(x_{k(x)})|}_{F_2}
+\underbrace{\sup_{x\in S_F}|E\hat f(x_{k(x)})-E\hat f(x)|}_{F_3}.
\]
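On a finite sample, the $\varepsilon$-net $x_1,\ldots,x_{N_\varepsilon(S_F)}$ underlying this decomposition can be mimicked by the classical greedy construction; the sketch below is our own illustration (the greedy centers are pairwise more than $\varepsilon$ apart, so their number is comparable to, though not exactly equal to, $N_\varepsilon$, and its logarithm approximates the entropy $\psi_{S_F}(\varepsilon)$ of the sampled set):

```python
import numpy as np

def greedy_eps_net(points, d, eps):
    """Greedy epsilon-net: keep a point as a new center whenever it is
    farther than eps from all existing centers. By construction, every
    input point ends up within eps of some center."""
    centers = []
    for p in points:
        if all(d(p, c) > eps for c in centers):
            centers.append(p)
    return centers

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 30)
curves = [np.sin(2 * np.pi * a * t) for a in rng.uniform(0.5, 1.5, 300)]
d_sup = lambda u, v: float(np.max(np.abs(u - v)))
net = greedy_eps_net(curves, d_sup, eps=0.3)
entropy = np.log(len(net))  # crude proxy for psi(eps) = log N_eps
```

The size of the net (and hence the entropy proxy) grows as `eps` shrinks, which is exactly the trade-off quantified by $\psi_{S_F}(\log n/n)$ in the uniform rates.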

• Let us study $F_1$. By using (11) and the boundedness of $K$, one can write
\[
F_1\leq\sup_{x\in S_F}\frac{1}{n}\sum_{i=1}^{n}\left|\frac{1}{E[K_1(x)]}K_i(x)-\frac{1}{E[K_1(x_{k(x)})]}K_i(x_{k(x)})\right|
\leq\frac{C}{\phi(h_K)}\sup_{x\in S_F}\frac{1}{n}\sum_{i=1}^{n}|K_i(x)-K_i(x_{k(x)})|\,1_{B(x,h_K)\cup B(x_{k(x)},h_K)}(X_i).
\]
Let us first consider the case $K(1)=0$. Because $K$ is Lipschitz on $[0,1]$ in this case, it comes
\[
F_1\leq\sup_{x\in S_F}\frac{C}{n}\sum_{i=1}^{n}Z_i\quad\text{with } Z_i=\frac{\varepsilon}{h_K\,\phi(h_K)}1_{B(x,h_K)\cup B(x_{k(x)},h_K)}(X_i),
\]
with, uniformly on $x$,
\[
Z_1=O\left(\frac{\varepsilon}{h_K\,\phi(h_K)}\right),\quad EZ_1=O\left(\frac{\varepsilon}{h_K}\right)\quad\text{and}\quad \mathrm{Var}(Z_1)=O\left(\frac{\varepsilon^2}{h_K^2\,\phi(h_K)}\right).
\]
A standard inequality for sums of bounded random variables (see Ferraty and Vieu, 2006, Corollary A.9), together with (H5b), allows to get
\[
F_1=O\left(\frac{\varepsilon}{h_K}\right)+O_{a.co.}\left(\frac{\varepsilon}{h_K}\sqrt{\frac{\log n}{n\,\phi(h_K)}}\right),
\]
and it suffices to combine (H5a) and (H5b) to get
\[
F_1=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right).
\]
Now, let $K(1)>C>0$. In this situation $K$ is Lipschitz on $[0,1)$. One has to decompose $F_1$ into three terms as follows:
\[
F_1\leq C\sup_{x\in S_F}(F_{11}+F_{12}+F_{13}),
\]

with
\[
F_{11}=\frac{1}{n\,\phi(h_K)}\sum_{i=1}^{n}|K_i(x)-K_i(x_{k(x)})|\,1_{B(x,h_K)\cap B(x_{k(x)},h_K)}(X_i),
\]
\[
F_{12}=\frac{1}{n\,\phi(h_K)}\sum_{i=1}^{n}K_i(x)\,1_{B(x,h_K)\cap \overline{B(x_{k(x)},h_K)}}(X_i),
\qquad
F_{13}=\frac{1}{n\,\phi(h_K)}\sum_{i=1}^{n}K_i(x_{k(x)})\,1_{\overline{B(x,h_K)}\cap B(x_{k(x)},h_K)}(X_i).
\]
One can follow the same steps as in the case $K(1)=0$ for studying $F_{11}$, and one gets the same result:
\[
F_{11}=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right).
\]
Following the same ideas for studying $F_{12}$, one can write
\[
F_{12}\leq\frac{C}{n}\sum_{i=1}^{n}W_i\quad\text{with } W_i=\frac{1}{\phi(h_K)}\,1_{B(x,h_K)\cap\overline{B(x_{k(x)},h_K)}}(X_i),
\]

and, by using again (H5a) and the same inequality for sums of bounded random variables, one has
\[
F_{12}=O\left(\frac{\varepsilon}{\phi(h_K)}\right)+O_{a.co.}\left(\sqrt{\frac{\varepsilon\log n}{n\,\phi(h_K)^2}}\right).
\]
Similarly, one can state the same rate of convergence for $F_{13}$. To end the study of $F_1$, it suffices to put together all the intermediate results and to use again (H5b) to get
\[
F_1=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right).
\]
• Now, concerning $F_2$, we have, for all $\eta>0$,
\[
P\left(F_2>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right)
=P\left(\max_{k\in\{1,\ldots,N_\varepsilon(S_F)\}}|\hat f(x_k)-E\hat f(x_k)|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right)
\leq N_\varepsilon(S_F)\max_{k\in\{1,\ldots,N_\varepsilon(S_F)\}}P\left(|\hat f(x_k)-E\hat f(x_k)|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right).
\]
Let
\[
\Delta_{ki}=\frac{1}{E[K_1(x_k)]}(K_i(x_k)-E[K_i(x_k)]).
\]
We show, under (H1) and (H4), that for all $k=1,\ldots,N_\varepsilon(S_F)$ and all $i=1,\ldots,n$, $\Delta_{ki}=O(\phi(h_K)^{-1})$ and also $\mathrm{Var}(\Delta_{ki})=O(\phi(h_K)^{-1})$. So, one can apply the Bernstein-type inequality (see Ferraty and Vieu, 2006, Corollary A.9), which gives directly
\[
P\left(|\hat f(x_k)-E\hat f(x_k)|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right)
=P\left(\frac{1}{n}\left|\sum_{i=1}^{n}\Delta_{ki}\right|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right)
\leq 2\exp\{-C\eta^2\psi_{S_F}(\varepsilon)\}.
\]
Thus, by using the fact that $\psi_{S_F}(\varepsilon)=\log N_\varepsilon(S_F)$ and by choosing $\eta$ such that $C\eta^2=\beta$, we have
\[
N_\varepsilon(S_F)\max_{k\in\{1,\ldots,N_\varepsilon(S_F)\}}P\left(|\hat f(x_k)-E\hat f(x_k)|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right)\leq C'\,N_\varepsilon(S_F)^{1-\beta}.
\]
Because $\sum_{n=1}^{\infty}N_\varepsilon(S_F)^{1-\beta}<\infty$, we obtain
\[
F_2=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right). \qquad (13)
\]

• For $F_3$, it is clear that $F_3\leq E(\sup_{x\in S_F}|\hat f(x)-\hat f(x_{k(x)})|)$ and, by following a proof similar to the one used for studying $F_1$, it comes
\[
F_3=O\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right). \qquad\square
\]

Proof of Corollary 9. It is easy to see that
\[
\inf_{x\in S_F}|\hat f(x)|\leq\frac{1}{2}\ \Rightarrow\ \exists x\in S_F\ \text{such that}\ 1-\hat f(x)\geq\frac{1}{2}\ \Rightarrow\ \sup_{x\in S_F}|1-\hat f(x)|\geq\frac{1}{2}.
\]
We deduce from Lemma 8 that
\[
P\left(\inf_{x\in S_F}|\hat f(x)|\leq\frac{1}{2}\right)\leq P\left(\sup_{x\in S_F}|1-\hat f(x)|\geq\frac{1}{2}\right).
\]
Consequently,
\[
\sum_{n=1}^{\infty}P\left(\inf_{x\in S_F}|\hat f(x)|<\frac{1}{2}\right)<\infty. \qquad\square
\]

Proof of Lemma 10. One has
\[
|E\hat g_\varphi(x)-m_\varphi(x)|=\left|\frac{1}{nE[K_1(x)]}E\left[\sum_{i=1}^{n}K_i(x)\varphi(Y_i)\right]-m_\varphi(x)\right|
\leq\left|\frac{1}{E[K_1(x)]}E[K_1(x)\varphi(Y_1)]-m_\varphi(x)\right|
\leq\frac{1}{E[K_1(x)]}E[K_1(x)\,|m_\varphi(X_1)-m_\varphi(x)|].
\]
Hence, we get
\[
\forall x\in S_F,\quad |E\hat g_\varphi(x)-m_\varphi(x)|\leq\frac{1}{E[K_1(x)]}E[K_1(x)\,|m_\varphi(X_1)-m_\varphi(x)|].
\]
Thus, with hypotheses (H1), (H2) and (11), we have
\[
\forall x\in S_F,\quad |E\hat g_\varphi(x)-m_\varphi(x)|\leq C\,\frac{1}{E[K_1(x)]}E[K_1(x)1_{B(x,h_K)}(X_1)\,d^{b}(X_1,x)]\leq C h_K^{b}.
\]
This last inequality yields the proof, since $C$ does not depend on $x$. □

Proof of Lemma 11. This proof follows the same steps as the proof of Lemma 8. We keep the same notations and use the following decomposition:
\[
\sup_{x\in S_F}|\hat g_\varphi(x)-E\hat g_\varphi(x)|\leq
\underbrace{\sup_{x\in S_F}|\hat g_\varphi(x)-\hat g_\varphi(x_{k(x)})|}_{G_1}
+\underbrace{\sup_{x\in S_F}|\hat g_\varphi(x_{k(x)})-E\hat g_\varphi(x_{k(x)})|}_{G_2}
+\underbrace{\sup_{x\in S_F}|E\hat g_\varphi(x_{k(x)})-E\hat g_\varphi(x)|}_{G_3}.
\]
Condition (H1) and result (11) allow to write directly, for $G_1$ and $G_3$:
\[
G_1=\sup_{x\in S_F}\left|\frac{1}{nE[K_1(x)]}\sum_{i=1}^{n}K_i(x)\varphi(Y_i)-\frac{1}{nE[K_1(x_{k(x)})]}\sum_{i=1}^{n}K_i(x_{k(x)})\varphi(Y_i)\right|
\leq\sup_{x\in S_F}\frac{1}{n\,\phi(h_K)}\sum_{i=1}^{n}|\varphi(Y_i)|\,|K_i(x)-K_i(x_{k(x)})|\,1_{B(x,h_K)\cup B(x_{k(x)},h_K)}(X_i).
\]
Now, as for $F_1$, one considers the case $K(1)=0$ (i.e. $K$ Lipschitz on $[0,1]$) and one gets
\[
G_1\leq\frac{C}{n}\sum_{i=1}^{n}Z_i\quad\text{with } Z_i=\frac{\varepsilon}{h_K\,\phi(h_K)}\,|\varphi(Y_i)|\sup_{x\in S_F}1_{B(x,h_K)\cup B(x_{k(x)},h_K)}(X_i).
\]

The main difference with the study of $F_1$ is that one uses here an exponential inequality for unbounded variables. Note that one has
\[
E[|\varphi(Y)|^m]=E[E[|\varphi(Y)|^m\,|\,X]]=\int E[|\varphi(Y)|^m\,|\,X=x]\,dP_X(x)<C<\infty,
\]
which implies that
\[
E(|Z_1|^m)\leq\frac{C\,\varepsilon^m}{h_K^m\,\phi(h_K)^{m-1}}.
\]
So, by applying Corollary A.8 in Ferraty and Vieu (2006) with $a^2=\varepsilon^2/(h_K^2\,\phi(h_K))$, one gets
\[
G_1=O_{a.co.}\left(\frac{\varepsilon}{h_K}\sqrt{\frac{\log n}{n\,\phi(h_K)}}\right).
\]
Now, (H5b) allows to get
\[
G_1=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right). \qquad (14)
\]

If one considers the case $K(1)>C>0$ (i.e. $K$ Lipschitz on $[0,1)$), one has to split $G_1$ into three terms as for $F_1$ and, by using similar arguments, one can state the same rate of almost complete convergence. Similar steps allow to get
\[
G_3=O\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right). \qquad (15)
\]
For $G_2$, similarly to the proof of Lemma 8, we have, for all $\eta>0$,
\[
P\left(G_2>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right)
=P\left(\max_{k\in\{1,\ldots,N_\varepsilon(S_F)\}}|\hat g_\varphi(x_k)-E\hat g_\varphi(x_k)|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right)
\leq N_\varepsilon(S_F)\max_{k\in\{1,\ldots,N_\varepsilon(S_F)\}}P\left(|\hat g_\varphi(x_k)-E\hat g_\varphi(x_k)|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right).
\]
The rest of the proof is based on the exponential inequality given by Corollary A.8.ii in Ferraty and Vieu (2006). Indeed, let
\[
\Delta_{ki}=\frac{1}{E[K_1(x_k)]}[K_i(x_k)\varphi(Y_i)-E[K_i(x_k)\varphi(Y_i)]].
\]
The same arguments as those invoked for proving Lemma 6.3 in Ferraty and Vieu (2006, p. 65) can be used to show that $E|\Delta_{ki}|^m=O(\phi(h_K)^{-m+1})$, which gives, by applying the exponential inequality mentioned above, for all $\eta>0$,
\[
P\left(|\hat g_\varphi(x_k)-E\hat g_\varphi(x_k)|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right)\leq 2\,N_\varepsilon(S_F)^{-C\eta^2}.
\]
Therefore, by a suitable choice of $\eta>0$, we have
\[
N_\varepsilon(S_F)\max_{k\in\{1,\ldots,N_\varepsilon(S_F)\}}P\left(|\hat g_\varphi(x_k)-E\hat g_\varphi(x_k)|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right)\leq C'\,N_\varepsilon(S_F)^{1-\beta}.
\]
As $\sum_{n=1}^{\infty}N_\varepsilon(S_F)^{1-\beta}<\infty$, we obtain
\[
G_2=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right). \qquad (16)
\]

Now, Lemma 11 can be easily deduced from (14)–(16).



Proof of Theorem 4. Similarly to (12), we have
\[
\hat F^x(y)-F^x(y)=\frac{1}{\hat f(x)}[(\hat F_N^x(y)-E\hat F_N^x(y))-(F^x(y)-E\hat F_N^x(y))]+\frac{F^x(y)}{\hat f(x)}[E\hat f(x)-\hat f(x)], \qquad (17)
\]
where
\[
\hat F_N^x(y)=\frac{1}{nE[K(h_K^{-1}d(x,X_1))]}\sum_{i=1}^{n}K(h_K^{-1}d(x,X_i))\,1_{\{Y_i\leq y\}}.
\]
Then, Theorem 4 can be deduced from the following intermediate results, together with Lemma 8 and Corollary 9.

Lemma 12. Under hypotheses (H2′) and (H4), one has
\[
\sup_{x\in S_F}\sup_{y\in S_R}|F^x(y)-E\hat F_N^x(y)|=O(h_K^{b_1}).
\]
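For illustration, the estimator $\hat F^x(y)=\hat F_N^x(y)/\hat f(x)$ can be computed directly as a weighted empirical distribution function, again because the normalization $E[K_1(x)]$ cancels in the ratio. The resulting function of $y$ is automatically monotone and $[0,1]$-valued, which is the kind of property exploited in the proof of Lemma 13. The kernel, semi-metric and toy data are our own assumptions:

```python
import numpy as np

def cond_cdf(x, y, X, Y, d, h):
    """Kernel conditional CDF F_hat^x(y) = F_N / f_hat: the common
    normalization E[K_1(x)] cancels, leaving a weighted empirical CDF."""
    K = lambda u: np.where((u >= 0) & (u <= 1), 1.0 - u**2, 0.0)
    w = np.array([K(d(x, xi) / h) for xi in X])
    if w.sum() == 0:
        return np.nan
    return float(np.sum(w * (Y <= y)) / w.sum())

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 30)
X = [np.sin(2 * np.pi * a * t) for a in rng.uniform(0.5, 1.5, 100)]
Y = rng.uniform(0.0, 1.0, 100)
d_sup = lambda u, v: float(np.max(np.abs(u - v)))
F_lo = cond_cdf(X[0], 0.25, X, Y, d_sup, h=1.0)
F_hi = cond_cdf(X[0], 0.75, X, Y, d_sup, h=1.0)
```

Since the weights are fixed and nonnegative, `cond_cdf` is nondecreasing in `y` by construction and equals one beyond the largest observed response.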

Lemma 13. Under the assumptions of Theorem 4, we have
\[
\sup_{x\in S_F}\sup_{y\in S_R}|\hat F_N^x(y)-E\hat F_N^x(y)|=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\log n/n)}{n\,\phi(h_K)}}\right).
\]

Proof of Lemma 12. It is clear that (H4) implies
\[
\forall(x,y)\in S_F\times S_R,\quad E[\hat F_N^x(y)]-F^x(y)=\frac{1}{E[K_1(x)]}E[(K_1(x)1_{B(x,h_K)}(X_1))(F^{X_1}(y)-F^x(y))]. \qquad (18)
\]
The Lipschitz condition (H2′) allows us to write that
\[
\forall(x,y)\in S_F\times S_R,\quad 1_{B(x,h_K)}(X_1)\,|F^{X_1}(y)-F^x(y)|\leq C h_K^{b_1},
\]
and then
\[
\forall(x,y)\in S_F\times S_R,\quad |E[\hat F_N^x(y)]-F^x(y)|\leq C h_K^{b_1}. \qquad\square
\]

Proof of Lemma 13. We keep the notation of Lemma 8 and use the compactness of $S_R$: we can write that, for some $t_1,t_2,\ldots,t_{z_n}\in S_R$,
\[
S_R\subset\bigcup_{j=1}^{z_n}(t_j-l_n,\,t_j+l_n),
\]
with $l_n=n^{-1/(2b_2)}$ and $z_n\leq C n^{1/(2b_2)}$. Taking
\[
j(y)=\arg\min_{j\in\{1,2,\ldots,z_n\}}|y-t_j|,
\]
we have the following decomposition:
\[
\sup_{x\in S_F}\sup_{y\in S_R}|\hat F_N^x(y)-E\hat F_N^x(y)|\leq
\underbrace{\sup_{x\in S_F}\sup_{y\in S_R}|\hat F_N^x(y)-\hat F_N^{x_{k(x)}}(y)|}_{F_1'}
+\underbrace{\sup_{x\in S_F}\sup_{y\in S_R}|\hat F_N^{x_{k(x)}}(y)-E\hat F_N^{x_{k(x)}}(y)|}_{F_2'}
+\underbrace{\sup_{x\in S_F}\sup_{y\in S_R}|E\hat F_N^{x_{k(x)}}(y)-E\hat F_N^x(y)|}_{F_3'}.
\]
Concerning $F_1'$ and $F_3'$, by following the same lines as for the terms $F_1$ and $F_3$, it comes
\[
F_1'=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right)
\quad\text{and}\quad
F_3'=O\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right). \qquad (19)
\]

Concerning $F_2'$, the monotonicity of the functions $E[\hat F_N^x(\cdot)]$ and $\hat F_N^x(\cdot)$ permits to write, for all $j\leq z_n$ and all $x\in S_F$,
\[
E\hat F_N^{x_{k(x)}}(t_j-l_n)\leq\sup_{y\in(t_j-l_n,t_j+l_n)}E\hat F_N^{x_{k(x)}}(y)\leq E\hat F_N^{x_{k(x)}}(t_j+l_n),
\]
\[
\hat F_N^{x_{k(x)}}(t_j-l_n)\leq\sup_{y\in(t_j-l_n,t_j+l_n)}\hat F_N^{x_{k(x)}}(y)\leq\hat F_N^{x_{k(x)}}(t_j+l_n). \qquad (20)
\]
Next, we use the Hölder condition on $F^x$ and show that, for any $y_1,y_2\in S_R$ and all $x\in S_F$,
\[
|E\hat F_N^x(y_1)-E\hat F_N^x(y_2)|=\left|\frac{1}{E[K_1(x)]}E[K_1(x)F^{X_1}(y_1)]-\frac{1}{E[K_1(x)]}E[K_1(x)F^{X_1}(y_2)]\right|\leq C|y_1-y_2|^{b_2}. \qquad (21)
\]

Then, by (20) and (21) and because $l_n=n^{-1/(2b_2)}$, we get
\[
F_2'\leq F_4+O\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right),
\quad\text{where}\quad
F_4=\max_{k\in\{1,\ldots,N_\varepsilon(S_F)\}}\ \max_{1\leq j\leq z_n}\ \max_{s_j=t_j-l_n,\,t_j+l_n}|\hat F_N^{x_k}(s_j)-E\hat F_N^{x_k}(s_j)|.
\]
Thus, it remains to study $F_4$. By using similar arguments as those invoked for studying $F_2$, combined with (H6′), one has $F_4=O_{a.co.}(\sqrt{\psi_{S_F}(\varepsilon)/(n\,\phi(h_K))})$, which implies that
\[
\sup_{x\in S_F}\sup_{y\in S_R}|\hat F_N^{x_{k(x)}}(s_{j(y)})-E\hat F_N^{x_{k(x)}}(s_{j(y)})|=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n\,\phi(h_K)}}\right). \qquad (22)
\]
So, Lemma 13 can be easily deduced from (19) and (22). □



Proof of Theorem 5. The proof is based on the following decomposition:
\[
\hat f^x(y)-f^x(y)=\frac{1}{\hat f(x)}[(\hat f_N^x(y)-E\hat f_N^x(y))-(f^x(y)-E\hat f_N^x(y))]+\frac{f^x(y)}{\hat f(x)}[E\hat f(x)-\hat f(x)], \qquad (23)
\]
where
\[
\hat f_N^x(y)=\frac{1}{n h_H E[K(h_K^{-1}d(x,X_1))]}\sum_{i=1}^{n}K(h_K^{-1}d(x,X_i))\,H(h_H^{-1}(y-Y_i)).
\]
Theorem 5 can be deduced from the following intermediate results, together with Lemma 8 and Corollary 9.

Lemma 14. Under hypotheses (H2″), (H4) and (H7), we have
\[
\sup_{x\in S_F}\sup_{y\in S_R}|f^x(y)-E\hat f_N^x(y)|=O(h_K^{b_1})+O(h_H^{b_2}).
\]

Lemma 15. Under the assumptions of Theorem 5, we have
\[
\sup_{x\in S_F}\sup_{y\in S_R}|\hat f_N^x(y)-E\hat f_N^x(y)|=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\log n/n)}{n^{1-\alpha}\,\phi(h_K)}}\right).
\]
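The estimator $\hat f^x(y)=\hat f_N^x(y)/\hat f(x)$ can likewise be computed in ratio form; for each fixed $x$ it is a mixture of rescaled $H$-kernels and hence integrates to one in $y$. The following sketch, with our own choice of kernels and toy data, checks this numerically:

```python
import numpy as np

def cond_density(x, y, X, Y, d, hK, hH):
    """Kernel conditional density f_hat^x(y) = f_N / f_hat. The common
    normalization E[K_1(x)] cancels; K is a quadratic kernel on [0, 1]
    and H a symmetric (Epanechnikov) kernel on [-1, 1]."""
    K = lambda u: np.where((u >= 0) & (u <= 1), 1.0 - u**2, 0.0)
    H = lambda u: np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
    w = np.array([K(d(x, xi) / hK) for xi in X])
    if w.sum() == 0:
        return np.nan
    return float(np.sum(w * H((y - Y) / hH)) / (hH * w.sum()))

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 30)
X = [np.sin(2 * np.pi * a * t) for a in rng.uniform(0.5, 1.5, 100)]
Y = rng.uniform(0.0, 1.0, 100)
d_sup = lambda u, v: float(np.max(np.abs(u - v)))
ys = np.linspace(-1.0, 2.0, 601)
dens = np.array([cond_density(X[0], y, X, Y, d_sup, hK=1.0, hH=0.2)
                 for y in ys])
total = np.trapz(dens, ys)  # should be close to 1
```

The smoothing in the response direction is what introduces the extra bandwidth $h_H$ (and the corresponding bias term $O(h_H^{b_2})$ of Lemma 14) compared with the conditional CDF case.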

Proof of Lemma 14. One has
\[
E\hat f_N^x(y)-f^x(y)=\frac{1}{n h_H E[K_1(x)]}E\left[\sum_{i=1}^{n}K_i(x)H_i(y)\right]-f^x(y)
=\frac{1}{E[K_1(x)]}E(K_1(x)[h_H^{-1}E(H_1(y)\,|\,X_1)-f^x(y)]).
\]
Moreover, by the change of variable $t=(y-z)/h_H$,
\[
h_H^{-1}E(H_1(y)\,|\,X_1)=\frac{1}{h_H}\int_{\mathbb{R}}H\left(\frac{y-z}{h_H}\right)f^{X_1}(z)\,dz=\int_{\mathbb{R}}H(t)\,f^{X_1}(y-h_H t)\,dt,
\]
and we arrive at
\[
|h_H^{-1}E(H_1(y)\,|\,X_1)-f^x(y)|\leq\int_{\mathbb{R}}H(t)\,|f^{X_1}(y-h_H t)-f^x(y)|\,dt.
\]
Finally, the use of (H2″) implies that
\[
|h_H^{-1}E(H_1(y)\,|\,X_1)-f^x(y)|\leq C\int_{\mathbb{R}}H(t)\,(h_K^{b_1}+|t|^{b_2}h_H^{b_2})\,dt.
\]
This inequality is uniform in $(x,y)\in S_F\times S_R$, and the use of (H7) establishes Lemma 14. □

Proof of Lemma 15. Let us keep the definition of $k(x)$ (resp. $j(y)$) as in Lemma 8 (resp. Lemma 13). The compactness of $S_R$ permits to write
\[
S_R\subset\bigcup_{j=1}^{z_n}(t_j-l_n,\,t_j+l_n),
\]
with $l_n=n^{-3\alpha/2-1/2}$ and $z_n\leq C n^{3\alpha/2+1/2}$. We have the following decomposition:
\[
|\hat f_N^x(y)-E\hat f_N^x(y)|\leq
\underbrace{\sup\sup|\hat f_N^x(y)-\hat f_N^{x_{k(x)}}(y)|}_{T_1}
+\underbrace{\sup\sup|\hat f_N^{x_{k(x)}}(y)-\hat f_N^{x_{k(x)}}(t_{j(y)})|}_{T_2}
+\underbrace{\sup\sup|\hat f_N^{x_{k(x)}}(t_{j(y)})-E\hat f_N^{x_{k(x)}}(t_{j(y)})|}_{T_3}
\]
\[
+\underbrace{\sup\sup|E\hat f_N^{x_{k(x)}}(t_{j(y)})-E\hat f_N^{x_{k(x)}}(y)|}_{T_4}
+\underbrace{\sup\sup|E\hat f_N^{x_{k(x)}}(y)-E\hat f_N^x(y)|}_{T_5},
\]
where each supremum is taken over $x\in S_F$ and $y\in S_R$. Similarly to the study of the term $F_1$, replacing (H5b) with (H5b′), it comes
\[
T_1=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n^{1-\alpha}\,\phi(h_K)}}\right)
\quad\text{and}\quad
T_5=O\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n^{1-\alpha}\,\phi(h_K)}}\right). \qquad (24)
\]

Concerning the term $T_2$, by using the Lipschitz condition on the kernel $H$, one can write
\[
|\hat f_N^{x_{k(x)}}(y)-\hat f_N^{x_{k(x)}}(t_{j(y)})|\leq C\,\frac{1}{n h_H\,\phi(h_K)}\sum_{i=1}^{n}K_i(x_{k(x)})\,|H_i(y)-H_i(t_{j(y)})|
\leq\frac{C}{n}\sum_{i=1}^{n}Z_i,
\quad\text{where } Z_i=\frac{l_n\,K_i(x_{k(x)})}{h_H^2\,\phi(h_K)}.
\]
Once again, a standard exponential inequality for a sum of bounded variables allows us to write
\[
\hat f_N^{x_{k(x)}}(y)-\hat f_N^{x_{k(x)}}(t_{j(y)})=O\left(\frac{l_n}{h_H^2}\right)+O_{a.co.}\left(\frac{l_n}{h_H^2}\sqrt{\frac{\log n}{n\,\phi(h_K)}}\right).
\]
Now, the facts that $\lim_{n\to+\infty}n^{\alpha}h_H=\infty$ and $l_n=n^{-3\alpha/2-1/2}$ imply that
\[
T_2=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n^{1-\alpha}\,\phi(h_K)}}\right)
\quad\text{and}\quad
T_4=O\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n^{1-\alpha}\,\phi(h_K)}}\right). \qquad (25)
\]

By using analogous arguments as for Lemma 8, we can show, for all $\eta>0$,
\[
P\left(T_3>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n h_H\,\phi(h_K)}}\right)
=P\left(\max_{j\in\{1,\ldots,z_n\}}\max_{k\in\{1,\ldots,N_\varepsilon(S_F)\}}|\hat f_N^{x_k}(t_j)-E\hat f_N^{x_k}(t_j)|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n h_H\,\phi(h_K)}}\right)
\leq z_n N_\varepsilon(S_F)\max_{j}\max_{k}P\left(|\hat f_N^{x_k}(t_j)-E\hat f_N^{x_k}(t_j)|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n h_H\,\phi(h_K)}}\right).
\]
Let
\[
\Delta_i=\frac{1}{h_H\,\phi(h_K)}[K_i(x_k)H_i(t_j)-E(K_i(x_k)H_i(t_j))]
\]
and apply the Bernstein exponential inequality (see Ferraty and Vieu, 2006, Corollary A.9). For that, we must evaluate the asymptotic behavior of $E|\Delta_i|$ and $E\Delta_i^2$. Firstly, it follows from the boundedness of the kernels $K$ and $H$ that $E|\Delta_i|\leq C(h_H\,\phi(h_K))^{-1}$. Secondly, the use of the same analytic arguments as in Lemma 14 allows us to get
\[
\lim_{n\to\infty}\frac{1}{h_H}E[H_1^2(y)\,|\,X_1]=f^{X_1}(y)\int_{\mathbb{R}}H^2(t)\,dt,
\]
which implies that
\[
E|\Delta_i|^2\leq\frac{C}{h_H\,\phi(h_K)}.
\]
Thus, we are now in position to apply the Bernstein exponential inequality and we get, for all $j\leq z_n$,
\[
P\left(|\hat f_N^{x_k}(t_j)-E\hat f_N^{x_k}(t_j)|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n h_H\,\phi(h_K)}}\right)\leq 2\exp\{-C\eta^2\psi_{S_F}(\varepsilon)\}.
\]
Therefore, since $z_n=O(l_n^{-1})=O(n^{3\alpha/2+1/2})$, by choosing $\eta$ such that $C\eta^2=\beta$ one has
\[
z_n N_\varepsilon(S_F)\max_{j}\max_{k}P\left(|\hat f_N^{x_k}(t_j)-E\hat f_N^{x_k}(t_j)|>\eta\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n h_H\,\phi(h_K)}}\right)\leq C'\,z_n N_\varepsilon(S_F)^{1-C\eta^2}.
\]
By using the fact that $\lim_{n\to+\infty}n^{\alpha}h_H=\infty$ and (H6′), one obtains
\[
T_3=O_{a.co.}\left(\sqrt{\frac{\psi_{S_F}(\varepsilon)}{n^{1-\alpha}\,\phi(h_K)}}\right). \qquad (26)
\]
So, Lemma 15 can be easily deduced from (24)–(26). □



Proof of Theorem 6. The proof is based on the same kind of decomposition as (23):
\[
|\hat h^x(y)-h^x(y)|\leq\frac{1}{|1-\hat F^x(y)|}\left[|\hat f^x(y)-f^x(y)|+\frac{|f^x(y)|}{|1-F^x(y)|}\,|\hat F^x(y)-F^x(y)|\right].
\]
Consequently, Theorem 6 is deduced from Theorems 4 and 5, and from the next result, which is a consequence of Theorem 4.

Corollary 16. Under the conditions of Theorem 6, we have
\[
\exists\delta>0\ \text{such that}\ \sum_{n=1}^{\infty}P\left(\inf_{x\in S_F}\inf_{y\in S_R}|1-\hat F^x(y)|<\delta\right)<\infty.
\]

Proof of Corollary 16. It is clear that
\[
\inf_{x\in S_F}\inf_{y\in S_R}|1-\hat F^x(y)|\leq\Big(1-\sup_{x\in S_F}\sup_{y\in S_R}F^x(y)\Big)\Big/2
\ \Rightarrow\ \sup_{x\in S_F}\sup_{y\in S_R}|\hat F^x(y)-F^x(y)|\geq\Big(1-\sup_{x\in S_F}\sup_{y\in S_R}F^x(y)\Big)\Big/2,
\]
which implies that
\[
\sum_{n=1}^{\infty}P\left(\inf_{x\in S_F}\inf_{y\in S_R}|1-\hat F^x(y)|\leq\Big(1-\sup_{x\in S_F}\sup_{y\in S_R}F^x(y)\Big)\Big/2\right)
\leq\sum_{n=1}^{\infty}P\left(\sup_{x\in S_F}\sup_{y\in S_R}|\hat F^x(y)-F^x(y)|\geq\Big(1-\sup_{x\in S_F}\sup_{y\in S_R}F^x(y)\Big)\Big/2\right).
\]
We deduce from Theorem 4 that
\[
\sum_{n=1}^{\infty}P\left(\inf_{x\in S_F}\inf_{y\in S_R}|1-\hat F^x(y)|\leq\Big(1-\sup_{x\in S_F}\sup_{y\in S_R}F^x(y)\Big)\Big/2\right)<\infty.
\]
The proof is achieved by taking $\delta=(1-\sup_{x\in S_F}\sup_{y\in S_R}F^x(y))/2$, which is strictly positive. □

Proof of Corollary 7. By a simple manipulation, we show that
\[
|f^x(\hat\theta(x))-f^x(\theta(x))|\leq 2\sup_{y\in S_R}|\hat f^x(y)-f^x(y)|.
\]
We use the following Taylor expansion of the function $f^x$:
\[
f^x(\hat\theta(x))=f^x(\theta(x))+\frac{1}{j!}f^{x(j)}(\theta^*(x))\,(\hat\theta(x)-\theta(x))^j \qquad (27)
\]
for some $\theta^*(x)$ between $\theta(x)$ and $\hat\theta(x)$. Clearly, it follows from (H9), (27) and Theorem 5 that
\[
\sup_{x\in S_F}|\hat\theta(x)-\theta(x)|\to 0\quad \text{a.co.}
\]
Moreover, by means of (H10), we obtain that
\[
\sup_{x\in S_F}|f^{x(j)}(\theta^*(x))-f^{x(j)}(\theta(x))|\to 0\quad \text{a.co.}
\]
Hence, as for Corollary 9, we can find $\delta>0$ such that
\[
\sum_{n=1}^{\infty}P\left(\inf_{x\in S_F}|f^{x(j)}(\theta^*(x))|<\delta\right)<\infty,
\]
and we have
\[
\sup_{x\in S_F}|\hat\theta(x)-\theta(x)|^j\leq C\sup_{x\in S_F}\sup_{y\in S_R}|\hat f^x(y)-f^x(y)|\quad \text{a.co.}
\]
By combining this result with Theorem 5, we obtain the claimed result. □
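In practice, the conditional mode estimate $\hat\theta(x)$ of Corollary 7 is obtained by maximizing $y\mapsto\hat f^x(y)$, and a grid search is the simplest version. The sketch below is our own minimal implementation (kernels, bandwidths, grid and toy data are illustrative assumptions); the normalization of the density cancels in the argmax, so the unnormalized weighted density suffices:

```python
import numpy as np

def cond_mode(x, X, Y, d, hK, hH, grid):
    """Conditional mode estimate: maximize the (unnormalized) kernel
    conditional density y -> sum_i w_i H((y - Y_i)/hH) over a grid."""
    K = lambda u: np.where((u >= 0) & (u <= 1), 1.0 - u**2, 0.0)
    H = lambda u: np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
    w = np.array([K(d(x, xi) / hK) for xi in X])
    dens = np.array([np.sum(w * H((y - Y) / hH)) for y in grid])
    return float(grid[np.argmax(dens)])

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 30)
X = [np.sin(2 * np.pi * a * t) for a in rng.uniform(0.5, 1.5, 300)]
Y = 0.5 + 0.05 * rng.standard_normal(300)  # conditional mode near 0.5
d_sup = lambda u, v: float(np.max(np.abs(u - v)))
theta_hat = cond_mode(X[0], X, Y, d_sup, hK=1.0, hH=0.2,
                      grid=np.linspace(0, 1, 201))
```

The uniform (in $x$) consistency of $\hat\theta$ established above is precisely what legitimizes using such plug-in mode estimates simultaneously over a whole class of conditioning curves.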



References

Aït Saïdi, A., Ferraty, F., Kassa, R., Vieu, P., 2009. Cross-validated estimations in the single-functional index model. Statistics 42, 475–494.
Aneiros Perez, G., Vieu, P., 2006. Semi-functional partial linear regression. Statist. Probab. Lett. 76, 1102–1110.
Benhenni, K., Ferraty, F., Rachdi, M., Vieu, P., 2007. Local smoothing regression with functional data. Comput. Statist. 22, 353–370.
Bogachev, V.I., 1999. Gaussian Measures. Math. Surveys and Monographs, vol. 62. American Mathematical Society, Providence, RI.
Boularan, J., Ferré, L., Vieu, P., 1995. Location of particular points in nonparametric regression analysis. Austral. J. Statist. 37, 161–168.
Chate, H., Courbage, M., 1997. Lattice systems. Physica D 103, 1–612.
Dabo-Niang, S., Rhomari, N., 2003. Estimation non paramétrique de la régression avec variable explicative dans un espace métrique. C. R. Math. Acad. Sci. Paris 336, 75–80.
Dabo-Niang, S., Laksaci, A., 2007. Estimation non paramétrique du mode conditionnel pour variable explicative fonctionnelle. C. R. Math. Acad. Sci. Paris 344, 49–52.
Deheuvels, P., Mason, D., 2004. General asymptotic confidence bands based on kernel type function estimators. Statist. Inference Stochastic Processes 7, 225–277.
Delsol, L., 2007. Régression nonparamétrique fonctionnelle: expression asymptotique des moments. Ann. I.S.U.P. LI (3), 43–67.
Delsol, L., 2009. Advances on asymptotic normality in nonparametric functional time series analysis. Statistics 43 (1), 13–33.
Ezzahrioui, M., Ould-Saïd, E., 2008. Asymptotic normality of nonparametric estimator of the conditional mode for functional data. J. Nonparametric Statist. 20 (1), 3–18.
Ferraty, F., Vieu, P., 2000. Dimension fractale et estimation de la régression dans des espaces vectoriels semi-normés. C. R. Math. Acad. Sci. Paris 330, 139–142.
Ferraty, F., Goia, A., Vieu, P., 2002. Functional nonparametric model for time series: a fractal approach for dimension reduction. TEST 11, 317–344.
Ferraty, F., Rabhi, A., Vieu, P., 2005. Conditional quantiles for functionally dependent data with application to the climatic El Niño phenomenon. Sankhyā 67, 378–399.
Ferraty, F., Laksaci, A., Vieu, P., 2006. Estimating some characteristics of the conditional distribution in nonparametric functional models. Statist. Inference Stochastic Processes 9, 47–76.
Ferraty, F., Vieu, P., 2006. Nonparametric Functional Data Analysis. Theory and Practice. Springer, Berlin.
Ferraty, F., Mas, A., Vieu, P., 2007. Nonparametric regression on functional data: inference and practical aspects. Austral. New Zealand J. Statist. 49, 267–286.
Ferraty, F., Rabhi, A., Vieu, P., 2008a. Estimation non-paramétrique de la fonction de hasard avec variable explicative fonctionnelle. Rom. J. Pure Appl. Math. 53, 1–18.
Ferraty, F., Van Keilegom, I., Vieu, P., 2008b. On the validity of the bootstrap in nonparametric functional regression. Preprint.
Ferraty, F., Vieu, P., 2009. Additive prediction and boosting for functional data. Comput. Statist. Data Anal. 53, 1400–1413.
Kolmogorov, A.N., Tikhomirov, V.M., 1959. ε-entropy and ε-capacity. Uspekhi Mat. Nauk 14, 3–86 (Engl. Transl. Amer. Math. Soc. Transl. Ser. 2 (1961) 277–364).
Kuelbs, J., Li, W., 1993. Metric entropy and the small ball problem for Gaussian measures. J. Funct. Anal. 116, 133–157.
Masry, E., 2005. Nonparametric regression estimation for dependent functional data: asymptotic normality. Stochastic Process. Appl. 115, 155–177.
Onsager, L., Machlup, S., 1953. Fluctuations and irreversible processes, I–II. Phys. Rev. 91, 1505–1512, 1512–1515.
Ould-Saïd, E., Cai, Z., 2005. Strong uniform consistency of nonparametric estimation of the censored conditional mode function. J. Nonparametric Statist. 17, 797–806.
Ramsay, J.O., Silverman, B.W., 1997. Functional Data Analysis. Springer, New York.
Rosenblatt, M., 1969. Conditional probability density and regression estimators. In: Krishnaiah, P.R. (Ed.), Multivariate Analysis II. Academic Press, New York, London.
Roussas, G., 1969. Nonparametric estimation of the transition distribution function of a Markov process. Ann. Math. Statist. 40, 1386–1400.
Samanta, M., 1989. Nonparametric estimation of conditional quantiles. Statist. Probab. Lett. 7, 407–412.
Theodoros, N., Yannis, G.Y., 1997. Rates of convergence of estimate, Kolmogorov entropy and the dimensionality reduction principle in regression. Ann. Statist. 25 (6), 2493–2511.
van der Vaart, A.W., van Zanten, J.H., 2007. Bayesian inference with rescaled Gaussian process priors. Electron. J. Statist. 1, 433–448.
Youndjé, É., 1996. Propriétés de convergence de l'estimateur à noyau de la densité conditionnelle. Rev. Roumaine Math. Pures Appl. 41, 535–566.