Journal of the Korean Statistical Society 39 (2010) 455–469
Some asymptotic properties for a smooth kernel estimator of the conditional mode under random censorship

Salah Khardani a,b, Mohamed Lemdani a,c,∗, Elias Ould Saïd a,b

a Univ. Lille Nord de France, F-59000 Lille, France
b ULCO, LMPA, F-62228 Calais, France
c Lab. de Biomathématiques, Univ. de Lille 2, Fac. de Pharmacie, 3, rue du Pr. Laguesse, 59006 Lille, France
Article history: Received 3 February 2009; Accepted 9 October 2009; Available online 27 October 2009
AMS 2000 subject classifications: primary 62G20; secondary 62G07, 62N01, 62E20
Abstract: Let (T_i)_{1≤i≤n} be a sample of independent and identically distributed (iid) random variables (rv) of interest and (X_i)_{1≤i≤n} a corresponding sample of covariates. In censorship models the rv T is subject to random censoring by another rv C. Let θ(x) be the conditional mode function of the density of T given X = x. In this work we define a new smooth kernel estimator θ̂_n(x) of θ(x) and establish its almost sure convergence and asymptotic normality. An application to prediction and confidence bands is also given. Simulations lend further support to our theoretical results for finite sample sizes. © 2009 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.
Keywords: Asymptotic distribution; Censored data; Kernel estimate; Mode; V–C classes
1. Introduction Let (Ti )1≤i≤n be an iid sample of rv with common unknown continuous distribution function (df) F with density f . In many situations we observe only censored lifetimes of the items under study. That is, assuming that (Ci )1≤i≤n is a sample of iid censoring rv with common continuous df G, we observe only the n pairs (Yi , δi ) with Yi = Ti ∧ Ci and δi = 1{Ti ≤Ci } , 1 ≤ i ≤ n, where 1A denotes the indicator function of the set A. We suppose that (Ti )1≤i≤n and (Ci )1≤i≤n are independent which ensures the identifiability of the model. Let X be a real-valued rv. For any x denote by ζ (.|x) the conditional probability density function of T given X = x which can be written as
$$\zeta(t|x) = \frac{f_0(x,t)}{\gamma(x)} \qquad (1)$$
where f_0(·,·) is the joint probability density function of (X,T) and γ(·) is the marginal density of X with respect to the Lebesgue measure. Assuming that a sequence of covariates is given, we then observe the triplets (Y_i, δ_i, X_i)_{1≤i≤n}. Throughout the paper, for any df L, let τ_L = sup{t : L(t) < 1} be the right endpoint of its support.
∗ Corresponding author. Tel.: +33 320 964 933; fax: +33 320 964 704.
E-mail addresses: [email protected] (S. Khardani), [email protected] (M. Lemdani), [email protected] (E. Ould Saïd).
doi:10.1016/j.jkss.2009.10.001
Assuming that ζ (·|x) has a unique mode θ (x) the latter is given by
$$\theta(x) := \arg\max_{t\in\mathbb{R}} \zeta(t|x).$$
Censored data techniques are popular for analyzing medical data. For that we refer to, e.g., Wei, Lin, and Weissfeld (1989), who considered the regression method for multivariate incomplete failure time data. For their part, Perdoná and Louzada-Neto (2008) studied a modified Weibull distribution model and gave an application both to a dataset of HIV-contaminated children and to the problem of misclassification of the cause of death. Let us also quote Kim and Jhun (2008), who addressed a cure rate model with interval censored data. The problem of estimating the unconditional/conditional mode of the probability density has given rise to a large body of related statistical literature. One important incentive is that it constitutes an alternative to the conditional regression method. In Ould Saïd (1997) a balanced mixture of N(x,1) and N(−x,1) distributions is considered as the conditional distribution of T given X = x. In this case the classical regression function vanishes everywhere, which makes regression-based prediction unfit. Therefore the conditional mode constitutes a good substitute for a description of T by X. The long history of mode-related issues dates back to Parzen (1962), who established the weak consistency and asymptotic normality for the iid case, whereas the strong consistency was obtained by Nadaraya (1965) and Van Ryzin (1969). Using classical techniques of weak convergence (see Billingsley (1968)), Eddy (1980) derived the asymptotic normality under weaker conditions than those imposed by Parzen (1962) (see also Eddy (1982)). The multidimensional version of these results was obtained by Samanta (1973) and Konakov (1974). Chernoff (1964) studied the naive estimator of the mode, defined as the center of the interval which contains the greatest number of observations. The recent developments focus on nonparametric estimation of the conditional mode.
Let us quote Romano (1988) who investigated the asymptotic behavior of the kernel estimate of the conditional mode, with data-dependent bandwidths and obtained results under weaker smoothness assumptions on γ (·). Vieu (1996) obtained a rate of convergence for both local and global estimates of the mode function. For the random right-censoring, Louani (1998) studied the asymptotic normality of the kernel estimator of the mode. When conditioning by one of the coordinates of a bidimensional random vector in the iid case, Samanta and Thavaneswaran (1990) showed that, under some regularity conditions, the kernel estimator of the conditional mode function was consistent and asymptotically normally distributed. Finally, Mehra, Ramakrishnaiah, and Sashikala (2000) established the law of iterated logarithm (LIL), the uniform almost sure convergence over a compact set and the asymptotic normality of the smoothed rank nearest neighbor estimator of the conditional mode function. Recently Ould Saïd and Cai (2005) established, in the iid case, the uniform strong consistency of a nonparametric estimator of the censored conditional mode function. For the dependent case, the strong consistency of the conditional mode estimator was established under a φ -mixing condition by Collomb, Härdle, and Hassani (1987) and their results can be applied to process forecasting. In the α -mixing case, the strong consistency over a compact set and the asymptotic normality were obtained by Ould Saïd (1993) and Louani and Ould Saïd (1999), respectively. In a general ergodic framework, a process prediction via the conditional mode estimation was described by Ould Saïd (1997) and the strong consistency was obtained. To our knowledge, the problem of estimating the conditional mode function under random censorship has not been addressed in the statistics literature. This is the central object of interest of this paper. 
In this work we propose a new estimator and establish its almost sure uniform convergence and asymptotic normality. For that purpose, we consider the Vapnik–Červonenkis (V–C) classes' framework, for which uniform exponential inequalities are available. Moreover functional estimation is based on the kernel method. The paper is organized as follows. In Section 2 we define a new kernel conditional mode estimator in the censorship model with some notations. In Section 3 the assumptions and main results are given. Section 4 is devoted to simulated numerical examples. Finally, the proofs of the main results are relegated to Section 5, together with some auxiliary results and their proofs.

2. Definition of the new estimator

In this section we recall some results and then define our mode estimator. We first consider the conditional density whose estimation is based on the choice of weights. Recall that, in the case of complete data, a well-known kernel estimator of the regression function is based on the Nadaraya–Watson weights¹

$$W_{in}(x) = \frac{K\left(\frac{x-X_i}{h_n}\right)}{\sum_{j=1}^{n} K\left(\frac{x-X_j}{h_n}\right)} =: \frac{\frac{1}{nh_n} K\left(\frac{x-X_i}{h_n}\right)}{\gamma_n(x)} \qquad (2)$$

¹ Hereafter and unless otherwise specified we take by convention 0/0 = 0.
where K is a probability density function (so-called kernel function), hn is a sequence of positive real numbers (so-called bandwidth) which goes to zero as n goes to infinity and γn (·) is a kernel estimator of γ (·). Constructing an appropriate estimator for censored data is then obtained by adapting the weights (2) in order to put more emphasis on large values of the interest rv T which are more likely to be censored than small ones. Based on the same idea as in Carbonez, Györfi, and Vander Meulin (1995), Ould Saïd and Cai (2005) considered the following weights
$$\widetilde{W}_{in}(x) = \frac{1}{nh_n} K\left(\frac{x-X_i}{h_n}\right) \frac{\delta_i}{\bar{G}(Y_i)\,\gamma_n(x)},$$
where Ḡ(·) = 1 − G(·), and established the strong uniform convergence of the corresponding conditional kernel estimator by using a step kernel function for Y. However this estimator is not differentiable and we cannot establish its asymptotic normality. Let us define a smooth estimate of ζ(t|x) by replacing the step kernel by a smooth density function H^{(1)}. Consider the estimator

$$\tilde{\zeta}_n(t|x) = \frac{\tilde{f}_{0,n}(x,t)}{\gamma_n(x)} = \frac{\sum_{i=1}^{n} \delta_i \bar{G}^{-1}(Y_i)\, K\left(\frac{x-X_i}{h_n}\right) H^{(1)}\left(\frac{t-Y_i}{h_n}\right)}{h_n \sum_{i=1}^{n} K\left(\frac{x-X_i}{h_n}\right)} \qquad (3)$$

of ζ(t|x), where

$$\tilde{f}_{0,n}(x,t) := \frac{1}{nh_n^2} \sum_{i=1}^{n} \delta_i \bar{G}^{-1}(Y_i)\, K\left(\frac{x-X_i}{h_n}\right) H^{(1)}\left(\frac{t-Y_i}{h_n}\right). \qquad (4)$$

In practice G(·) is unknown, hence it is not possible to use the estimator (3). One way to overcome this difficulty is to replace Ḡ(·) by its Kaplan and Meier (1958) estimate Ḡ_n(·) given by

$$\bar{G}_n(t) = \begin{cases} \displaystyle\prod_{i=1}^{n} \left(1 - \frac{1-\delta_{(i)}}{n-i+1}\right)^{\mathbf{1}_{\{Y_{(i)} \leq t\}}} & \text{if } t < Y_{(n)}, \\ 0 & \text{otherwise,} \end{cases}$$
where Y(1) < Y(2) < · · · < Y(n) are the order statistics of (Yi )1≤i≤n and δ(i) is the concomitant of Y(i) . Therefore the feasible estimator of ζ (t |x) is given by
$$\hat{\zeta}_n(t|x) := \frac{\hat{f}_{0,n}(x,t)}{\gamma_n(x)} \qquad (5)$$

where

$$\hat{f}_{0,n}(x,t) := \frac{1}{nh_n^2} \sum_{i=1}^{n} \delta_i \bar{G}_n^{-1}(Y_i)\, K\left(\frac{x-X_i}{h_n}\right) H^{(1)}\left(\frac{t-Y_i}{h_n}\right). \qquad (6)$$

Then a natural estimator of θ(x) is

$$\hat{\theta}_n(x) = \arg\max_{t\in\mathbb{R}} \hat{\zeta}_n(t|x). \qquad (7)$$
Note that the estimate θ̂_n(x) is not necessarily unique and our results are valid for any chosen value satisfying (7). We point out that we can specify our choice by taking
$$\hat{\theta}_n(x) = \inf\left\{ t \in \mathbb{R} \text{ such that } \hat{\zeta}_n(t|x) = \max_{y\in\mathbb{R}} \hat{\zeta}_n(y|x) \right\}.$$
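As an illustration, the construction above can be sketched numerically: Ḡ_n is computed by the Kaplan–Meier product over the censored observations and θ̂_n(x) is obtained by a grid search over t. This is only a minimal sketch (Gaussian choices for both K and H^{(1)}, illustrative function and variable names), not the authors' implementation:

```python
import numpy as np

def km_censoring_survival(Y, delta):
    """Kaplan-Meier estimate of the censoring survival function G_bar_n,
    evaluated at each observation Y_i (delta_i = 0 marks a censored
    lifetime, i.e. an observed censoring time)."""
    n = len(Y)
    order = np.argsort(Y)
    d_sorted = delta[order]
    # factor attached to the i-th order statistic: 1 - (1 - delta_(i))/(n - i + 1)
    factors = 1.0 - (1.0 - d_sorted) / (n - np.arange(1, n + 1) + 1.0)
    G_bar = np.empty(n)
    G_bar[order] = np.cumprod(factors)   # product over all Y_(j) <= Y_(i)
    return np.maximum(G_bar, 1e-10)      # guard against G_bar_n = 0

def conditional_mode(x, X, Y, delta, h, t_grid):
    """theta_hat_n(x): arg max over t_grid of the smooth conditional
    density estimate (5), with Gaussian K and H^(1)."""
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    G_bar = km_censoring_survival(Y, delta)
    w = delta / G_bar * K((x - X) / h)   # censoring-corrected weights
    dens = np.array([np.sum(w * K((t - Y) / h)) for t in t_grid])
    return t_grid[np.argmax(dens)]
```

For the linear model of Section 4 (T_i = X_i + 0.2 ε_i with standard normal covariates and censoring variables), `conditional_mode(0.0, X, Y, delta, 0.3, np.linspace(-1, 1, 201))` should return a value close to θ(0) = 0 for moderate n.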
Remark 2.1. Consider the df H with derivative H^{(1)}. We easily get an estimator of the conditional df of T given X = x, that is Z(t|x) = ∫_{−∞}^{t} ζ(y|x) dy. This estimator is defined by

$$\hat{Z}_n(t|x) := \int_{-\infty}^{t} \hat{\zeta}_n(y|x)\,dy = \frac{\hat{F}_{0,n}(x,t)}{\gamma_n(x)}$$

where

$$\hat{F}_{0,n}(x,t) := \frac{1}{nh_n} \sum_{i=1}^{n} \delta_i \bar{G}_n^{-1}(Y_i)\, K\left(\frac{x-X_i}{h_n}\right) H\left(\frac{t-Y_i}{h_n}\right) \qquad (8)$$
is a smooth estimator of $F_0(x,t) := \int_{-\infty}^{t} f_0(x,y)\,dy$.
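The estimator of Remark 2.1 differs from ζ̂_n only in that the density H^{(1)} is replaced by its primitive H. A minimal sketch (H taken as the standard normal cdf, Ḡ_n passed in precomputed, illustrative names):

```python
import numpy as np
from math import erf, sqrt

def conditional_cdf(x, t, X, Y, delta, G_bar_n, h):
    """Smooth estimate Z_hat_n(t|x) = F_hat_{0,n}(x,t) / gamma_n(x),
    taking H as the standard normal cdf (so H^(1) is its density)."""
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    H = lambda u: np.array([0.5 * (1.0 + erf(ui / sqrt(2.0))) for ui in u])
    Kx = K((x - X) / h)
    num = np.sum(delta / G_bar_n * Kx * H((t - Y) / h))  # proportional to F_hat_{0,n}(x,t)
    den = np.sum(Kx)                                     # proportional to gamma_n(x)
    return num / den if den > 0 else 0.0
```

Without censoring (all δ_i = 1, Ḡ_n ≡ 1) this reduces to the classical Nadaraya–Watson estimate of the conditional df.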
Remark 2.2. The density γ(·) is not affected by the censoring and is therefore consistently estimated by γ_n(·).

3. Assumptions and main results
Throughout this paper we assume that τ_F < τ_G (which implies Ḡ(τ_F) > 0) and that (C_i)_{1≤i≤n} and (X_i, T_i)_{1≤i≤n} are independent. Define I_0 = {x ∈ ℝ : γ(x) > 0} and let I be a compact subset of I_0. Then

$$\beta_0 := \inf_{x\in I} \gamma(x) > 0. \qquad (9)$$

To formulate our results some additional notation is required. From now on, for any function φ and j ∈ ℕ, let φ^{(j)} denote the jth-order derivative of φ. Moreover, for any bivariate function ψ(·,·) and (i,j) ∈ ℕ², let

$$D^{(i,j)}\psi(x,t) = \frac{\partial^{i+j}}{\partial x^i \partial t^j}\,\psi(x,t).$$
For j ≥ 1, we have from (6)

$$\frac{\partial^j}{\partial t^j}\hat{f}_{0,n}(x,t) = D^{(0,j)}\hat{f}_{0,n}(x,t) = \frac{1}{nh_n^{2+j}} \sum_{i=1}^{n} \delta_i \bar{G}_n^{-1}(Y_i)\, K\left(\frac{x-X_i}{h_n}\right) H^{(j+1)}\left(\frac{t-Y_i}{h_n}\right).$$

The derivatives of f̃_{0,n}(x,·) are obtained analogously. As noted in the introduction, our framework is based on V–C classes, for which we here give a very short introduction. Let (E,d) be a metric space and define, for any ε > 0, the covering number N(E,d,ε) as the minimal number of balls of radius ε needed to cover E. Now consider a class F of measurable functions on some measurable space. Then F is a bounded V–C class with respect to the envelope Φ ∈ ℝ if, for any φ ∈ F, we have |φ| ≤ Φ and there exist V and ν such that

$$N(\mathcal{F}, \|\cdot\|_\infty, \varepsilon\Phi) \leq \left(\frac{\nu}{\varepsilon}\right)^V$$

for all ε ∈ (0,1) (see Van der Vaart and Wellner (1996), pp. 85–86 for more details). The V–C class property amounts to a kind of compactness in functional spaces, which eventually allows one to derive uniform exponential inequalities. In this paper we use the following result in Giné and Guillou (2001), which is adapted from Talagrand (1996).

Proposition A. Let ξ_1, …, ξ_n be iid rvs and F a measurable uniformly bounded V–C class of functions. Consider σ² and U such that σ² ≥ sup_{f∈F} Var f(ξ_1), U ≥ sup_{f∈F} ‖f‖_∞ and 0 < σ ≤ U. Then there exist constants B_1 and B_2, depending only on the V–C characteristics V and ν of the class F, such that the inequality
$$P\left( \sup_{f\in\mathcal{F}} \left| \sum_{i=1}^{n} \big(f(\xi_i) - Ef(\xi_1)\big) \right| > t \right) \leq B_2 \exp\left\{ -\frac{t}{B_2 U}\, \log\left( 1 + \frac{tU}{B_2\left(\sqrt{n}\,\sigma + U\sqrt{\log\frac{VU}{\sigma}}\right)^2} \right) \right\}$$

is valid for all

$$t \geq B_1\left( U \log\frac{VU}{\sigma} + \sqrt{n}\,\sigma\,\sqrt{\log\frac{VU}{\sigma}} \right). \qquad (10)$$
Now we give the assumptions needed to obtain our results, gathered here for easy reference.

H1: The bandwidth h_n satisfies nh_n²/log n → ∞ as n → ∞.
H2: H^{(1)} is a C²-probability density with compact support such that ∫_ℝ t H^{(1)}(t) dt = 0.
H3: The kernel K is bounded with compact support and ∫_ℝ t K(t) dt = 0.
H4: The sets

$$\Theta_{i,n} = \left\{ \Psi_{x,t}(u,v) = K\left(\frac{x-u}{h_n}\right) H^{(i)}\left(\frac{t-v}{h_n}\right),\ x,t\in\mathbb{R} \right\}, \quad i = 0,1,2,3,\ n\geq 1,$$

are bounded V–C classes of measurable functions.
H5: The joint density f_0(·,·) is bounded and differentiable up to order 3 and sup_{x,t} |D^{(i,j)} f_0(x,t)| < ∞ for i + j ≤ 3.
H6: The marginal density γ(·) has a continuous second-order derivative.

The last hypothesis intervenes in the asymptotic normality.

H7: The bandwidth h_n satisfies nh_n⁶/log n → ∞ and nh_n⁸ → 0 as n → ∞.
Remark 3.1. Assumption H4 is considered in Giné and Guillou (2002) (their assumption K1). It is needed in order to use Talagrand's inequality, which in turn ensures the uniform consistency of functional estimators (see also Theorems 4.2.1 and 4.2.4 in Dudley, 1999). The other assumptions (H1–H3, H5 and H6) are classical in nonparametric estimation. Finally note that H7 implies H1.

3.1. Consistency

In this subsection, we prove the consistency of our estimator and give a rate of convergence. Other consistency results are also derived.

Proposition 3.2. Under assumptions H1–H6, we have
$$\sup_{x\in I}\, \sup_{t\leq\tau_F} |\hat{\zeta}_n(t|x) - \zeta(t|x)| = O\left( \max\left\{ \left(\frac{\log n}{nh_n^2}\right)^{1/2},\ h_n^2 \right\} \right) \quad a.s.\ \text{as } n\to\infty.$$

In particular, for $h_n \sim (n^{-1}\log n)^{1/6}$, we get the rate

$$\sup_{x\in I}\, \sup_{t\leq\tau_F} |\hat{\zeta}_n(t|x) - \zeta(t|x)| = O\left( \left(\frac{\log n}{n}\right)^{1/3} \right) \quad a.s.\ \text{as } n\to\infty.$$
Theorem 3.3. Under the assumptions of Proposition 3.2, if the conditional density satisfies sup_{x∈I} ζ^{(2)}(θ(x)|x) < 0, we have

$$\sup_{x\in I} |\hat{\theta}_n(x) - \theta(x)| = O\left( \max\left\{ \left(\frac{\log n}{nh_n^2}\right)^{1/4},\ h_n \right\} \right) \quad a.s.\ \text{as } n\to\infty.$$

In particular, for $h_n \sim (n^{-1}\log n)^{1/6}$, we get the rate

$$\sup_{x\in I} |\hat{\theta}_n(x) - \theta(x)| = O\left( \left(\frac{\log n}{n}\right)^{1/6} \right) \quad a.s.\ \text{as } n\to\infty.$$
Remark 3.4. The uniform negativity assumption on the second derivative of the conditional density (in Theorem 3.3) implies the uniform unicity of the conditional mode, that is:

$$\forall\epsilon > 0\ \exists\alpha > 0,\ \forall\eta: I\to\mathbb{R},\quad \inf_{x\in I} |\theta(x) - \eta(x)| \geq \epsilon \Rightarrow \inf_{x\in I} \big(\zeta(\theta(x)|x) - \zeta(\eta(x)|x)\big) \geq \alpha.$$
Remark 3.5. A generalization of the result to higher-dimensional covariates, that is X ∈ ℝ^d, by adapting the assumptions (imposed on the kernel function K and the bandwidth h_n), is straightforward, and Theorem 3.3 becomes

$$\sup_{x\in I} |\hat{\theta}_n(x) - \theta(x)| = O\left( \max\left\{ \left(\frac{\log n}{nh_n^{d+1}}\right)^{1/4},\ h_n \right\} \right) \quad a.s.\ \text{as } n\to\infty.$$
Proposition 3.6. Under assumptions H1–H6, we have

$$\sup_{x\in I}\, \sup_{t\leq\tau_F} |\hat{Z}_n(t|x) - Z(t|x)| = O\left( \max\left\{ \left(\frac{\log n}{nh_n}\right)^{1/2},\ h_n^2 \right\} \right) \quad a.s.\ \text{as } n\to\infty.$$

In particular, for $h_n \sim (n^{-1}\log n)^{1/5}$, we get the rate

$$\sup_{x\in I}\, \sup_{t\leq\tau_F} |\hat{Z}_n(t|x) - Z(t|x)| = O\left( \left(\frac{\log n}{n}\right)^{2/5} \right) \quad a.s.\ \text{as } n\to\infty.$$
3.2. Asymptotic normality

Now suppose that the density function ζ(·|x) is unimodal at θ(x). Then by assumption H5 we have ζ^{(1)}(θ(x)|x) = 0, and we assume that ζ^{(2)}(θ(x)|x) < 0. Similarly we have ζ̂_n^{(1)}(θ̂_n(x)|x) = 0.
Using a Taylor expansion we get

$$\hat{\theta}_n(x) - \theta(x) = -\frac{\hat{\zeta}_n^{(1)}(\theta(x)|x)}{\hat{\zeta}_n^{(2)}(\bar{\theta}_n(x)|x)} \qquad (11)$$

where θ̄_n(x) is between θ̂_n(x) and θ(x). Using (5) we can write

$$\hat{\theta}_n(x) - \theta(x) = -\frac{D^{(0,1)}\hat{f}_{0,n}(x,\theta(x))}{D^{(0,2)}\hat{f}_{0,n}(x,\bar{\theta}_n(x))} \qquad (12)$$

if the denominator does not vanish. To state the asymptotic normality of θ̂_n(x) we establish that the numerator in (12), suitably normalized, is asymptotically normally distributed and that the denominator converges in probability to D^{(0,2)} f_0(x, θ(x)). The result is given in the following theorem.

Theorem 3.7. Let x ∈ ℝ and suppose that assumptions H2–H7 hold. We have

$$\sqrt{nh_n^4}\,\big(\hat{\theta}_n(x) - \theta(x)\big) \xrightarrow{\mathcal{D}} \mathcal{N}\big(0, \sigma^2(x)\big)$$

where $\xrightarrow{\mathcal{D}}$ denotes convergence in distribution,

$$\sigma^2(x) = \frac{f_0(x,\theta(x))}{\bar{G}(\theta(x))\, \big[D^{(0,2)} f_0(x,\theta(x))\big]^2}\, \|K\|_2^2\, \|H^{(2)}\|_2^2$$

and, for any φ, $\|\varphi\|_2 = \left(\int_{\mathbb{R}} \varphi^2(r)\,dr\right)^{1/2}$.

Now, based on f̂_{0,n}(·,·) and D^{(0,2)} f̂_{0,n}(·,·), we easily get a plug-in estimator σ̂_n²(x) of σ²(x) which, under the assumptions of Theorem 3.7, gives a confidence interval of asymptotic level 1 − α for θ(x):

$$\left[ \hat{\theta}_n(x) - \eta_{1-\alpha/2}\, \frac{\hat{\sigma}_n(x)}{\sqrt{nh_n^4}},\ \ \hat{\theta}_n(x) + \eta_{1-\alpha/2}\, \frac{\hat{\sigma}_n(x)}{\sqrt{nh_n^4}} \right]$$

where η_{1−α/2} denotes the (1 − α/2)-quantile of the standard normal distribution.

Remark 3.8. In the case where ζ^{(2)}(θ(x)|x) = 0, our results can be adapted, provided a higher (even) order derivative of ζ(·|x) is negative at θ(x). In that case, the rate of convergence is adjusted accordingly. Indeed, rewriting the expansion in (11) for a (2k)th-order derivative (with k > 1), Formula (12) becomes

$$\hat{\theta}_n(x) - \theta(x) = -\left[ \frac{D^{(0,1)}\hat{f}_{0,n}(x,\theta(x))}{D^{(0,2k)}\hat{f}_{0,n}(x,\bar{\theta}_n(x))} \right]^{1/(2k-1)}$$

and yields a $(nh_n^4)^{1/[2(2k-1)]}$-asymptotic normality rate for the estimator.

4. Simulation study

This section is divided into two parts: the first one shows the behavior of our estimate for some particular conditional mode functions, whereas the second deals with asymptotic normality.

4.1. Consistency

Linear case with normal errors

We first consider the linear model T_i = X_i + σε_i, where (X_i)_{1≤i≤n} and (ε_i)_{1≤i≤n} are two independent iid sequences distributed as N(0,1) and σ is an appropriately chosen constant (here we take σ = 0.2). Clearly we have θ(x) = x. We also simulate n iid rv C_i ∼ N(0,1) and then take Y_i = T_i ∧ C_i with indicator rv δ_i = 1_{\{T_i ≤ C_i\}}. Based on the observed data (X_i, Y_i, δ_i), i = 1, …, n, we calculate our estimator by choosing K as the standard Gaussian kernel. In nonparametric estimation, it is well known that optimality (in the MSE sense) is not seriously affected by the choice of the kernel K but can be swayed by that of the bandwidth h_n. We notice that the quality of fit increases with n (Fig. 1). In all cases we took h_n according to H1.

Linear case with lognormal errors
S. Khardani et al. / Journal of the Korean Statistical Society 39 (2010) 455–469
461
Fig. 1. Conditional mode function θ(x) (continuous curves) and estimator θ̂_n(x) (dashed curves) for n = 100, 500 and 1000, respectively.
Fig. 2. Conditional mode function θ(x) (continuous curves) and estimator θ̂_n(x) (dashed curves) for n = 100, 500 and 1000, respectively.
In order to consider a less trivial mode function, we work with the previous linear case with an error ε following a normalized lognormal distribution (recall that if λ ∼ N (0, 1) then eλ follows the standard lognormal distribution with mean e1/2 and variance e(e − 1)). In this case we have
$$\theta(x) = x + \frac{\sigma}{\sqrt{e(e-1)}}\left(\frac{1}{e} - \sqrt{e}\right).$$
In order to get a mode function which is far enough from the first bisector we chose σ = 0.5. As a consequence, the graphs in Fig. 2 exhibit a higher variability than those of Fig. 1 (taking σ = 0.2 gave better estimators but the mode function could not be distinguished from the first bisector).

Effect of censoring

Again for the linear case, we now consider different censoring distributions. The convergence rate being fast (as shown in Fig. 1), we here take a smaller sample size (n = 50) in order to assess the censoring effect on the quality of estimation (as a consequence, the graphs are not as good as those of Fig. 1). The errors are normal and we take σ = 0.2. The rate of censoring is adapted by considering shifted versions of the exponential law E(5) (with density g(x) = 5e^{−5x} 1_{\{x≥0\}}). The respective censoring rates (C.R.) are indicated in the caption of Fig. 3, where a limited censoring effect can be observed.

Nonlinear case with normal errors

To conclude this part, we consider some nonlinear models, namely:

T_i = X_i² + 1 + σε_i (parabolic case),
T_i = sin(1.5 X_i) + σε_i (sinus case),
T_i = exp(X_i − 0.2) + σε_i (exponential case).
Figs. 4 and 5 show that the quality of fit for nonlinear models is as good as in the linear case. Note that for the exponential model, the censoring variables C_i are distributed as E(5) to avoid strong censoring and therefore a bad quality of fit. For the other cases, normal C_i's are considered.

4.2. Asymptotic normality

Now we consider the asymptotic normality property. Based on an n-sample, we compare the shape of the estimated density (with normalized deviations) to that of the standard normal density in the case of the linear model

$$T_i = X_i + \sigma\varepsilon_i, \quad i = 1, \ldots, n.$$
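The data-generating scheme used throughout this section can be sketched as follows (illustrative names; σ and the censoring law are the ones described above):

```python
import numpy as np

def simulate_censored_linear(n, sigma=0.2, seed=None):
    """Draw (X_i, Y_i, delta_i) from the linear model T_i = X_i + sigma*eps_i
    with iid N(0,1) covariates, errors and censoring variables C_i."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(n)
    T = X + sigma * rng.standard_normal(n)
    C = rng.standard_normal(n)
    Y = np.minimum(T, C)             # observed (possibly censored) lifetime
    delta = (T <= C).astype(int)     # 1 if the true lifetime is observed
    return X, Y, delta
```

Since T and C are both centered and continuous, about half of the lifetimes are censored under this design; shifting the law of C_i (as with the E(5) ± c distributions of Fig. 3) tunes the censoring rate.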
Fig. 3. Conditional mode function θ(x) (continuous curves) and estimator θˆn (x) (dashed curves) for n = 50 and censoring distributions E (5) − 5 (C.R. ≈ 54%), E (5) − 2 (C.R. ≈ 28%) and E (5) + 2 (C.R. ≈ 1%), respectively.
Fig. 4. Conditional mode function θ(x) (continuous curves) and estimator θˆn (x) (dashed curves) for parabolic, sinus and exponential cases, with n = 500.
Fig. 5. Conditional mode function θ(x) (continuous curves) and estimator θˆn (x) (dashed curves) for parabolic, sinus and exponential cases, with n = 1000.
Then θ (x) = x and the data are simulated as in the previous section. The steps are as follows: we estimate the conditional mode function θ (x) by θˆn (x), and we calculate the normalized deviation between this estimate and the theoretical conditional function, for x = 0, i.e.
$$\bar{\theta} = \bar{\theta}(n) := \frac{\sqrt{nh_n^4}}{\hat{\sigma}_n(0)}\,\big(\hat{\theta}_n(0) - \theta(0)\big) = \frac{\sqrt{nh_n^4}}{\hat{\sigma}_n(0)}\,\hat{\theta}_n(0).$$
We then draw, using this scheme, B independent n-samples. The bandwidth h_n is chosen according to hypothesis H7. This generates an iid sequence θ̄_1, …, θ̄_B whose density function is estimated by the kernel method. For that we consider the bandwidth h′_B = CB^{−1/5} (see e.g. Silverman, 1986, p. 40), where the constant C is appropriately chosen. We compare the shape of the estimated density of the normalized deviations (dashed curves) to that (continuous curves) of the standard normal density (leftward diagrams in Figs. 6 and 7). We also plot the corresponding histograms against the standard normal density (middle diagrams) and the QQ-plots (rightward diagrams). We consider the cases n = 500 and n = 1000 with B = 300. All the graphs show a good quality of fit and support the theoretical asymptotic normality result. Furthermore, the Shapiro–Wilk test gave respective P-values of 0.416 and 0.774, which suggest not rejecting normality.
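The replication scheme just described can be sketched as follows. For brevity the deviations are standardized by their Monte-Carlo standard deviation rather than by the plug-in σ̂_n, and the true censoring survival function is used in place of Ḡ_n; all names are illustrative:

```python
import numpy as np
from math import erf, sqrt

def mode_at_zero(X, Y, delta, h, t_grid):
    """Grid-search estimate theta_hat_n(0) with Gaussian kernels and the
    true censoring survival G_bar(y) = P(C > y), C ~ N(0,1)."""
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    G_bar = np.array([0.5 * (1.0 - erf(y / sqrt(2.0))) for y in Y])
    w = delta / np.maximum(G_bar, 1e-10) * K(-X / h)
    dens = np.array([np.sum(w * K((t - Y) / h)) for t in t_grid])
    return t_grid[np.argmax(dens)]

def normalized_deviations(B, n, h, sigma=0.2, seed=0):
    """B independent replications of theta_hat_n(0) - theta(0), centered
    and scaled by their Monte-Carlo standard deviation."""
    rng = np.random.default_rng(seed)
    t_grid = np.linspace(-1.0, 1.0, 201)
    est = np.empty(B)
    for b in range(B):
        X = rng.standard_normal(n)
        T = X + sigma * rng.standard_normal(n)
        C = rng.standard_normal(n)
        Y, delta = np.minimum(T, C), (T <= C).astype(float)
        est[b] = mode_at_zero(X, Y, delta, h, t_grid)
    dev = est - 0.0                  # theta(0) = 0 in this model
    return (dev - dev.mean()) / dev.std()
```

A histogram or QQ-plot of the returned sequence against N(0,1), as in Figs. 6 and 7, then serves as a finite-sample check of Theorem 3.7.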
Fig. 6. Normality plots for n = 500 and B = 300.
Fig. 7. Normality plots for n = 1000 and B = 300.
5. Auxiliary results and proofs

Lemma 5.1. Under assumptions H1–H5, we have

$$\sup_{x\in I}\, \sup_{t\leq\tau_F} \left| \frac{1}{nh_n^2} \sum_{i=1}^{n} K\left(\frac{x-X_i}{h_n}\right) H^{(1)}\left(\frac{t-Y_i}{h_n}\right) - f_0(x,t) \right| = O\left( \max\left\{ \left(\frac{\log n}{nh_n^2}\right)^{1/2},\ h_n^2 \right\} \right) \quad a.s.\ \text{as } n\to\infty.$$

Proof. We have

$$\frac{1}{nh_n^2} \sum_{i=1}^{n} K\left(\frac{x-X_i}{h_n}\right) H^{(1)}\left(\frac{t-Y_i}{h_n}\right) - f_0(x,t) = \sum_{i=1}^{n} \left[ \frac{1}{nh_n^2} K\left(\frac{x-X_i}{h_n}\right) H^{(1)}\left(\frac{t-Y_i}{h_n}\right) - E\left( \frac{1}{nh_n^2} K\left(\frac{x-X_1}{h_n}\right) H^{(1)}\left(\frac{t-Y_1}{h_n}\right) \right) \right] + \left[ E\left( \frac{1}{h_n^2} K\left(\frac{x-X_1}{h_n}\right) H^{(1)}\left(\frac{t-Y_1}{h_n}\right) \right) - f_0(x,t) \right] =: S_1 + S_2. \qquad (13)$$

Under H2–H4, the sequence

$$\mathcal{F}_{1,n} = \left\{ \xi_{x,t}(u,v,w) = (nh_n^2)^{-1} K\left(\frac{x-u}{h_n}\right) H^{(1)}\left(\frac{t-v\wedge w}{h_n}\right) : x,t\in\mathbb{R} \right\}, \quad n\geq 1,$$

is made of V–C classes of measurable functions. These are uniformly bounded with respective envelopes U_n := (nh_n²)^{-1} ‖K‖_∞ ‖H^{(1)}‖_∞. Moreover, under H5,

$$E\left[\xi_{x,t}^2(X,T,C)\right] \leq \frac{\|K\|_2^2\, \|H^{(1)}\|_2^2}{|\bar{G}(\tau_F)|^2} \cdot \frac{\|f_0\|_\infty}{n^2 h_n^2} =: \frac{m_2\, \|f_0\|_\infty}{n^2 h_n^2} =: \sigma_n^2$$

with σ_n ≤ U_n for n large enough.
Applying Proposition A with $t_n = B_3 \left(\frac{\log n}{nh_n^2}\right)^{1/2}$ (which satisfies (10)) for a positive constant B_3, we get

$$P\left( \sup_{\xi_{x,t}\in\mathcal{F}_{1,n}} \left| \sum_{i=1}^{n} \big\{\xi_{x,t}(X_i,T_i,C_i) - E[\xi_{x,t}(X,T,C)]\big\} \right| \geq B_3 \left(\frac{\log n}{nh_n^2}\right)^{1/2} \right) \leq B_2 \exp\left\{ -\frac{t_n}{B_2 U_n}\, \log\left( 1 + \frac{t_n U_n}{B_2\left(\sqrt{n}\,\sigma_n + U_n\sqrt{\log\frac{VU_n}{\sigma_n}}\right)^2} \right) \right\}.$$

Under H1 and using log(1 + w) ∼ w (for w → 0), the last quantity is of order

$$B_2 \exp\left( -\frac{B_3^2}{B_2^2\, m_2\, \|f_0\|_\infty}\, \log n \right) = B_2\, n^{-B_3^2/(B_2^2 m_2 \|f_0\|_\infty)}$$

which, for n large enough and by an appropriate choice of B_3, can be made O(n^{−3/2}). The latter being the general term of a summable series, we get by the Borel–Cantelli lemma

$$S_1 = O\left( \left(\frac{\log n}{nh_n^2}\right)^{1/2} \right) \quad a.s.\ \text{as } n\to\infty. \qquad (14)$$
On the other hand, using a change of variable and a Taylor expansion, we get, under H2, H3 and H5,

$$S_2 = \int_{\mathbb{R}}\int_{\mathbb{R}} K(s)\, H^{(1)}(r)\, \big[ f_0(x - sh_n, t - rh_n) - f_0(x,t) \big]\, ds\, dr = O(h_n^2) \qquad (15)$$

which together with (13) and (14) and under H1 gives the result.
Lemma 5.2. Under assumptions H1–H5, we have

$$\sup_{x\in I}\, \sup_{t\leq\tau_F} |\hat{f}_{0,n}(x,t) - \tilde{f}_{0,n}(x,t)| = O\left( \left(\frac{\log\log n}{n}\right)^{1/2} \right) \quad a.s.\ \text{as } n\to\infty.$$

Proof. From (4) and (6) we have

$$\big|\hat{f}_{0,n}(x,t) - \tilde{f}_{0,n}(x,t)\big| \leq \frac{1}{nh_n^2} \sum_{i=1}^{n} \delta_i \left| \frac{1}{\bar{G}(Y_i)} - \frac{1}{\bar{G}_n(Y_i)} \right| K\left(\frac{x-X_i}{h_n}\right) H^{(1)}\left(\frac{t-Y_i}{h_n}\right) \leq \frac{\sup_{t\leq\tau_F} |\bar{G}_n(t) - \bar{G}(t)|}{\bar{G}_n(\tau_F)\,\bar{G}(\tau_F)} \cdot \frac{1}{nh_n^2} \sum_{i=1}^{n} K\left(\frac{x-X_i}{h_n}\right) H^{(1)}\left(\frac{t-Y_i}{h_n}\right).$$

Since Ḡ(τ_F) > 0, in conjunction with the SLLN and the LIL on the censoring law (see formula (4.28) in Deheuvels & Einmahl, 2000), the result is an immediate consequence of Lemma 5.1.

Lemma 5.3. Under assumptions H2, H3 and H5, we have

$$\sup_{x\in I}\, \sup_{t\leq\tau_F} |E\tilde{f}_{0,n}(x,t) - f_0(x,t)| = O(h_n^2) \quad \text{as } n\to\infty.$$
Proof. From (4) we have

$$E\tilde{f}_{0,n}(x,t) = \frac{1}{h_n^2}\, E\left[ \mathbf{1}_{\{T_1\leq C_1\}}\, \bar{G}^{-1}(Y_1)\, K\left(\frac{x-X_1}{h_n}\right) H^{(1)}\left(\frac{t-Y_1}{h_n}\right) \right] = \frac{1}{h_n^2}\, E\left[ K\left(\frac{x-X_1}{h_n}\right) E\left( \mathbf{1}_{\{T_1\leq C_1\}}\, \bar{G}^{-1}(Y_1)\, H^{(1)}\left(\frac{t-Y_1}{h_n}\right) \,\Big|\, X_1, T_1 \right) \right].$$

Using 1_{\{T_1≤C_1\}} φ(Y_1) = 1_{\{T_1≤C_1\}} φ(T_1) together with E[1_{\{T_1≤C_1\}} | X_1, T_1] = Ḡ(T_1), we get

$$E\tilde{f}_{0,n}(x,t) - f_0(x,t) = \frac{1}{h_n^2}\, E\left[ \bar{G}^{-1}(T_1)\, K\left(\frac{x-X_1}{h_n}\right) H^{(1)}\left(\frac{t-T_1}{h_n}\right) E\big( \mathbf{1}_{\{T_1\leq C_1\}} \,|\, X_1, T_1 \big) \right] - f_0(x,t) = S_2.$$

The result is then a consequence of (15), under H2, H3 and H5.
Proof of Proposition 3.2. Using the triangle inequality we have from (1), (5) and (9)

$$\sup_{x\in I}\sup_{t\leq\tau_F} |\hat{\zeta}_n(t|x) - \zeta(t|x)| \leq \frac{1}{\inf_{x\in I}\gamma_n(x)} \sup_{x\in I}\sup_{t\leq\tau_F} |\hat{f}_{0,n}(x,t) - \tilde{f}_{0,n}(x,t)| + \frac{1}{\inf_{x\in I}\gamma_n(x)} \sup_{x\in I}\sup_{t\leq\tau_F} |\tilde{f}_{0,n}(x,t) - E\tilde{f}_{0,n}(x,t)| + \frac{1}{\inf_{x\in I}\gamma_n(x)} \sup_{x\in I}\sup_{t\leq\tau_F} |E\tilde{f}_{0,n}(x,t) - f_0(x,t)| + \beta_0^{-1} \sup_{x\in I}\sup_{t\leq\tau_F} |f_0(x,t)|\, \sup_{x\in I} |\gamma_n(x) - \gamma(x)|. \qquad (16)$$

Using a Taylor expansion, we have under assumptions H3 and H6

$$\sup_{x\in I} |E[\gamma_n(x)] - \gamma(x)| = O(h_n^2). \qquad (17)$$

Furthermore, by H1 and H3, a proof analogous to that of Lemma 5.1 gives

$$\sup_{x\in I} |\gamma_n(x) - E[\gamma_n(x)]| = O\left( \left(\frac{\log n}{nh_n}\right)^{1/2} \right). \qquad (18)$$

On the other hand, considering the sequence of bounded V–C classes

$$\mathcal{F}_{2,n} = \left\{ \xi_{x,t}(u,v,w) = (nh_n^2)^{-1}\, \mathbf{1}_{\{v\leq w\}}\, \bar{G}^{-1}(v\wedge w)\, K\left(\frac{x-u}{h_n}\right) H^{(1)}\left(\frac{t-v\wedge w}{h_n}\right) : x,t\in\mathbb{R} \right\}, \quad n\geq 1,$$

and following the part of the proof of Lemma 5.1 concerning S_1, we get

$$\sup_{x\in I}\sup_{t\leq\tau_F} |\tilde{f}_{0,n}(x,t) - E\tilde{f}_{0,n}(x,t)| = O\left( \left(\frac{\log n}{nh_n^2}\right)^{1/2} \right) \quad a.s.\ \text{as } n\to\infty. \qquad (19)$$
Then (16)–(19) and Lemmas 5.2 and 5.3 allow us to conclude the proof. Finally note that the proof of Proposition 3.6 is similar, replacing the quantity in (6) by that in (8).

Proof of Theorem 3.3. We have

$$\sup_{x\in I} |\zeta(\hat{\theta}_n(x)|x) - \zeta(\theta(x)|x)| \leq \sup_{x\in I} |\zeta(\hat{\theta}_n(x)|x) - \hat{\zeta}_n(\hat{\theta}_n(x)|x)| + \sup_{x\in I} |\hat{\zeta}_n(\hat{\theta}_n(x)|x) - \zeta(\theta(x)|x)| \leq \sup_{x\in I}\sup_{t<\tau_F} |\hat{\zeta}_n(t|x) - \zeta(t|x)| + \sup_{x\in I} \Big| \sup_{t<\tau_F} \hat{\zeta}_n(t|x) - \sup_{t<\tau_F} \zeta(t|x) \Big| \leq 2\, \sup_{x\in I}\sup_{t<\tau_F} |\hat{\zeta}_n(t|x) - \zeta(t|x)|. \qquad (20)$$

The a.s. uniform consistency of θ̂_n(x) then follows immediately from Proposition 3.2, Remark 3.4 and the continuity of ζ(·|x). Now a Taylor expansion gives

$$\zeta(\hat{\theta}_n(x)|x) - \zeta(\theta(x)|x) = \frac{1}{2}\,\big(\hat{\theta}_n(x) - \theta(x)\big)^2\, \zeta^{(2)}(\theta_n^\star(x)|x)$$

where θ_n^⋆(x) is between θ(x) and θ̂_n(x). Then by (20) we have

$$\sup_{x\in I} \big(\hat{\theta}_n(x) - \theta(x)\big)^2 \leq 4\, \sup_{x\in I}\sup_{t<\tau_F} \frac{|\hat{\zeta}_n(t|x) - \zeta(t|x)|}{|\zeta^{(2)}(\theta_n^\star(x)|x)|}.$$

Since θ_n^⋆(x) is a.s. uniformly consistent, the uniform negativity assumption in Theorem 3.3 gives, under H5, inf_{x∈I} |ζ^{(2)}(θ_n^⋆(x)|x)| > 0. Hence by Proposition 3.2 the proof is completed.
Proof of Theorem 3.7. From (12) we have

$$\sqrt{nh_n^4}\,\big(\hat{\theta}_n(x) - \theta(x)\big) = -\frac{ \sqrt{nh_n^4}\,\big[D^{(0,1)}\hat{f}_n(x,\theta(x)) - D^{(0,1)}\tilde{f}_n(x,\theta(x))\big] + \sqrt{nh_n^4}\,\big[D^{(0,1)}\tilde{f}_n(x,\theta(x)) - E[D^{(0,1)}\tilde{f}_n(x,\theta(x))]\big] + \sqrt{nh_n^4}\, E[D^{(0,1)}\tilde{f}_n(x,\theta(x))] }{ D^{(0,2)}\hat{f}_n(x,\bar{\theta}_n(x)) } =: -\frac{J_1 + J_2 + J_3}{D^{(0,2)}\hat{f}_n(x,\bar{\theta}_n(x))}. \qquad (21)$$

First we establish that J_1 and J_3 are negligible whereas J_2 is asymptotically normal.

Lemma 5.4. Under H2–H5 and H7, we have J_1 → 0 a.s. as n → ∞.

Proof. We have

$$|J_1| \leq \sqrt{nh_n^4}\; \frac{\sup_{t\leq\tau_F} |\bar{G}_n(t) - \bar{G}(t)|}{\bar{G}_n(\tau_F)\,\bar{G}(\tau_F)} \times \frac{1}{nh_n^3} \sum_{i=1}^{n} K\left(\frac{x-X_i}{h_n}\right) H^{(2)}\left(\frac{\theta(x)-Y_i}{h_n}\right).$$

Recall that sup_t |Ḡ_n(t) − Ḡ(t)| = O((log log n/n)^{1/2}) a.s. (see the proof of Lemma 5.2). We then follow the same ideas as in Lemma 5.1 and (19) by considering the sequence

$$\mathcal{F}_{3,n} = \left\{ \xi_{x}(u,v,w) = (nh_n^3)^{-1} K\left(\frac{x-u}{h_n}\right) H^{(2)}\left(\frac{\theta(x)-v\wedge w}{h_n}\right) : x\in\mathbb{R} \right\}, \quad n\geq 1,$$

and prove that $(nh_n^3)^{-1} \sum_{i=1}^{n} K\left(\frac{x-X_i}{h_n}\right) H^{(2)}\left(\frac{\theta(x)-Y_i}{h_n}\right)$ converges to D^{(0,1)} f_0(x, θ(x)), which completes the proof.
Lemma 5.5. Under assumptions H2, H3, H5 and H7, we have J_3 → 0 as n → ∞.

Proof. From

$$D^{(0,1)}\tilde{f}_n(x,\theta(x)) = \frac{1}{nh_n^3} \sum_{i=1}^{n} \delta_i \bar{G}^{-1}(Y_i)\, K\left(\frac{x-X_i}{h_n}\right) H^{(2)}\left(\frac{\theta(x)-Y_i}{h_n}\right) \qquad (22)$$

we get

$$J_3 = \frac{\sqrt{nh_n^4}}{h_n^3}\, E\left[ K\left(\frac{x-X_1}{h_n}\right) H^{(2)}\left(\frac{\theta(x)-T_1}{h_n}\right) E\big( \mathbf{1}_{\{T_1\leq C_1\}} \bar{G}^{-1}(Y_1) \,\big|\, X_1, T_1 \big) \right] = \frac{\sqrt{nh_n^4}}{h_n^3} \int_{\mathbb{R}}\int_{\mathbb{R}} K\left(\frac{x-u}{h_n}\right) H^{(2)}\left(\frac{\theta(x)-v}{h_n}\right) f_0(u,v)\, du\, dv.$$

Integrating by parts and using H2, H3 and H5 we obtain, by a Taylor expansion and since D^{(0,1)} f_0(x, θ(x)) = γ(x) ζ^{(1)}(θ(x)|x) = 0,

$$J_3 = \sqrt{nh_n^4} \int_{\mathbb{R}}\int_{\mathbb{R}} K(r)\, H^{(1)}(s)\, D^{(0,1)} f_0(x - rh_n, \theta(x) - sh_n)\, dr\, ds = O\left(\sqrt{nh_n^8}\right)$$

which goes to zero under H7.

Lemma 5.6. Under assumptions H2, H3 and H5, we have

$$\mathrm{Var}[J_2] \longrightarrow \frac{f_0(x,\theta(x))}{\bar{G}(\theta(x))}\, \|K\|_2^2\, \|H^{(2)}\|_2^2 \quad \text{as } n\to\infty.$$
Proof. From (22) we have

$$\mathrm{Var}[J_2] = \frac{1}{h_n^2}\, \mathrm{Var}\left[ \mathbf{1}_{\{T_1\leq C_1\}}\, \bar{G}^{-1}(Y_1)\, K\left(\frac{x-X_1}{h_n}\right) H^{(2)}\left(\frac{\theta(x)-Y_1}{h_n}\right) \right] = \frac{1}{h_n^2}\, E\left[ \bar{G}^{-2}(T_1)\, K^2\left(\frac{x-X_1}{h_n}\right) (H^{(2)})^2\left(\frac{\theta(x)-T_1}{h_n}\right) E\big( \mathbf{1}_{\{T_1\leq C_1\}} | X_1, T_1 \big) \right] - \frac{1}{h_n^2} \left( E\left[ \bar{G}^{-1}(T_1)\, K\left(\frac{x-X_1}{h_n}\right) H^{(2)}\left(\frac{\theta(x)-T_1}{h_n}\right) E\big( \mathbf{1}_{\{T_1\leq C_1\}} | X_1, T_1 \big) \right] \right)^2 =: \beta_n^1 + \beta_n^2.$$

On the one hand, by Lemma 5.5,

$$\beta_n^2 = -n^{-1} J_3^2 \longrightarrow 0 \quad \text{as } n\to\infty.$$

On the other hand, using a change of variable we can write

$$\beta_n^1 = \int_{\mathbb{R}}\int_{\mathbb{R}} \frac{K^2(r)\, (H^{(2)})^2(s)}{\bar{G}(\theta(x) - sh_n)}\, f_0(x - rh_n, \theta(x) - sh_n)\, dr\, ds.$$

Then, since G(·) is continuous, we have, under H2, H3 and H5,

$$\beta_n^1 = \frac{f_0(x,\theta(x))}{\bar{G}(\theta(x))} \int_{\mathbb{R}}\int_{\mathbb{R}} K^2(r)\, (H^{(2)})^2(s)\, dr\, ds + o(1) \quad \text{as } n\to\infty,$$

which gives the result.
Now we consider the denominator in (21). As θ̄_n(x) → θ(x) a.s., its consistency will be proved if we show:

Lemma 5.7. Under assumptions H2–H5 and H7, we have

$$\sup_{t\leq\tau_F} \left| D^{(0,2)}\hat{f}_n(x,t) - D^{(0,2)} f_0(x,t) \right| \longrightarrow 0 \quad a.s.\ \text{as } n\to\infty.$$

Proof. Since

$$\left| D^{(0,2)}\hat{f}_{0,n}(x,t) - D^{(0,2)} f_0(x,t) \right| \leq \left| D^{(0,2)}\hat{f}_{0,n}(x,t) - D^{(0,2)}\tilde{f}_{0,n}(x,t) \right| + \left| D^{(0,2)}\tilde{f}_{0,n}(x,t) - D^{(0,2)} f_0(x,t) \right| =: \Sigma_{1,n}(x,t) + \Sigma_{2,n}(x,t),$$

the proof is completed through Lemmas 5.8 and 5.9.

Lemma 5.8. Under assumptions H2–H5 and H7, we have sup_{t≤τ_F} Σ_{1,n}(x,t) → 0 a.s. as n → ∞.

Proof. We have

$$\sup_{t\leq\tau_F} \Sigma_{1,n}(x,t) \leq \sup_{t\leq\tau_F} \frac{1}{nh_n^4} \sum_{i=1}^{n} \delta_i \left| \frac{1}{\bar{G}_n(Y_i)} - \frac{1}{\bar{G}(Y_i)} \right| K\left(\frac{x-X_i}{h_n}\right) H^{(3)}\left(\frac{t-Y_i}{h_n}\right) \leq \sup_{t\leq\tau_F} \frac{|\bar{G}_n(t) - \bar{G}(t)|}{\bar{G}_n(\tau_F)\,\bar{G}(\tau_F)} \times \underbrace{\frac{1}{nh_n^4} \sum_{i=1}^{n} K\left(\frac{x-X_i}{h_n}\right) H^{(3)}\left(\frac{t-Y_i}{h_n}\right)}_{\Upsilon_n(x,t)}. \qquad (23)$$

By considering the sequence

$$\mathcal{F}_{4,n} = \left\{ \xi_{x,t}(u,v,w) = (nh_n^4)^{-1} K\left(\frac{x-u}{h_n}\right) H^{(3)}\left(\frac{t-v\wedge w}{h_n}\right) : x,t\in\mathbb{R} \right\}, \quad n\geq 1,$$

analogous ideas to those of Lemma 5.1 and (19) show that Υ_n(x,t) converges uniformly in t to D^{(0,2)} f_0(x,t). Finally from (23), as in Lemma 5.2, we get the result.
Lemma 5.9. Under assumptions H2–H5 and H7, we have
\[
\sup_{t\le\tau_F}\Sigma_{2,n}(x,t) \longrightarrow 0 \quad \text{a.s. as } n \longrightarrow \infty.
\]

Proof. We have
\begin{align}
\sup_{t\le\tau_F}\Sigma_{2,n}(x,t) &\le \sup_{t\le\tau_F}\big|D^{(0,2)}\tilde f_{0,n}(x,t)-\mathbb E\big(D^{(0,2)}\tilde f_{0,n}(x,t)\big)\big| + \sup_{t\le\tau_F}\big|\mathbb E\big(D^{(0,2)}\tilde f_{0,n}(x,t)\big)-D^{(0,2)} f_0(x,t)\big|\nonumber\\
&=: I_{1,n}+I_{2,n}. \tag{24}
\end{align}
Integrating by parts twice and using a change of variable, it follows as in Lemma 5.3 that
\[
I_{2,n}=\sup_{t\le\tau_F}\left|\int_{\mathbb R}\int_{\mathbb R}K(r)\,H^{(1)}(s)\,D^{(0,2)} f_0\big(x-rh_n,t-sh_n\big)\,dr\,ds - D^{(0,2)} f_0(x,t)\right| = O\big(h_n^2\big). \tag{25}
\]
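The $O(h_n^2)$ rate in (25) is the usual second-order smoothing bias of a symmetric kernel: the bias behaves like $(h_n^2/2)\,\mu_2(K)\,g''$ for a smooth target $g$. The sketch below checks this numerically under illustrative assumptions not taken from the paper: a standard Gaussian kernel $K$ and the target $g(t)=e^{-t^2}$ standing in for $D^{(0,2)} f_0(x,\cdot)$.

```python
import numpy as np

# Numerical check of the O(h_n^2) smoothing bias in (25), under assumed
# (illustrative) ingredients: K the standard Gaussian density and
# g(t) = exp(-t^2) standing in for the smooth target.
r = np.linspace(-10.0, 10.0, 100_001)
dr = r[1] - r[0]
K = np.exp(-r**2 / 2.0) / np.sqrt(2.0 * np.pi)

def smoothing_bias(g, x, h):
    # int K(r) g(x - r h) dr - g(x)  ~  (h^2 / 2) mu_2(K) g''(x)
    return np.sum(K * g(x - r * h)) * dr - g(x)

g = lambda t: np.exp(-t**2)
b1 = smoothing_bias(g, 0.3, 0.2)   # bias at bandwidth h
b2 = smoothing_bias(g, 0.3, 0.1)   # bias at bandwidth h / 2
# Halving the bandwidth divides the bias by roughly 4, i.e. O(h^2).
```

The ratio $b_1/b_2 \approx 4$ (up to an $O(h^2)$ relative correction) is the standard empirical signature of a second-order bias.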
On the other hand, the classes
\[
\mathcal F_{5,n}=\left\{\xi_{x,t}(u,v,w)=\big(nh_n^4\big)^{-1}\bar G(w)^{-1}K\left(\frac{x-u}{h_n}\right)H^{(3)}\left(\frac{t-v\wedge w}{h_n}\right),\ x,t\in\mathbb R\right\},\quad n\ge 1,
\]
are bounded V–C classes of measurable functions and, as previously, we get under H7
\[
I_{1,n}=O\Big(\big(\log n/(nh_n^6)\big)^{1/2}\Big)=o(1) \quad \text{a.s. as } n \longrightarrow \infty. \tag{26}
\]
Hence, replacing (25) and (26) in (24), we get the result. □
Now the final step to prove Theorem 3.7 is to show the Berry–Esséen condition for $J_2$ (for details see, e.g., Chow & Teicher, 1997, p. 322). To that end, in view of (21), put
\[
J_2 =: \sum_{i=1}^{n}\Delta_{i,n}(x),
\]
where
\[
\Delta_{i,n}(x)=\big(nh_n^2\big)^{-1/2}\left\{\delta_i\,\bar G^{-1}(Y_i)\,K\left(\frac{x-X_i}{h_n}\right)H^{(2)}\left(\frac{\theta(x)-T_i}{h_n}\right) - \mathbb E\left[\delta_1\,\bar G^{-1}(Y_1)\,K\left(\frac{x-X_1}{h_n}\right)H^{(2)}\left(\frac{\theta(x)-T_1}{h_n}\right)\right]\right\}.
\]
Then:

Lemma 5.10. Under assumptions H2, H3, H5 and H7, we have
\[
\omega_n^3 := \sum_{i=1}^{n}\mathbb E\big|\Delta_{i,n}(x)\big|^3 < \infty.
\]
Proof. Applying the $C_r$-inequality (see Loève, 1963, p. 155),
\begin{align}
\mathbb E\big|\Delta_{1,n}(x)\big|^3 &\le 4\big(nh_n^2\big)^{-3/2}\Bigg\{\mathbb E\left|\mathbf 1_{\{T_1\le C_1\}}\,\bar G^{-1}(Y_1)\,K\left(\frac{x-X_1}{h_n}\right)H^{(2)}\left(\frac{\theta(x)-T_1}{h_n}\right)\right|^3\nonumber\\
&\quad+\left|\mathbb E\left[\mathbf 1_{\{T_1\le C_1\}}\,\bar G^{-1}(Y_1)\,K\left(\frac{x-X_1}{h_n}\right)H^{(2)}\left(\frac{\theta(x)-T_1}{h_n}\right)\right]\right|^3\Bigg\}. \tag{27}
\end{align}
Both expectation terms in (27) being finite under H2, H3 and H5, we get, by H7,
\[
\omega_n^3 = \sum_{i=1}^{n}\mathbb E\big|\Delta_{i,n}(x)\big|^3 = O\Big(\big(nh_n^6\big)^{-1/2}\Big)=o(1).
\]
This completes the proof of Lemma 5.10 and therefore that of Theorem 3.7.
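The weighting $\delta_i\,\bar G^{-1}(Y_i)$ inside $\Delta_{i,n}$ rests on the inverse-probability-of-censoring (IPCW) identity $\mathbb E[\delta\, g(Y)/\bar G(Y)] = \mathbb E[g(T)]$, the same conditioning argument used in the proof of Lemma 5.6. A Monte Carlo illustration under an assumed model, chosen only so that $\bar G$ is known in closed form:

```python
import numpy as np

# IPCW identity behind Delta_{i,n}: E[ delta * g(Y) / Gbar(Y) ] = E[ g(T) ].
# Assumed (illustrative) model: T ~ Exp(1), C ~ Exp(1/2) independent,
# so the censoring survival function is Gbar(t) = exp(-t / 2).
rng = np.random.default_rng(0)
n = 200_000
T = rng.exponential(scale=1.0, size=n)
C = rng.exponential(scale=2.0, size=n)   # rate 1/2 <=> scale 2
Y = np.minimum(T, C)
delta = (T <= C).astype(float)

w = delta * np.exp(Y / 2.0)   # delta / Gbar(Y), taking g = 1
print(w.mean())               # averages to about 1 by the identity
```

Taking $g \equiv 1$ makes the check transparent: the weights $\delta_i/\bar G(Y_i)$ average to one, which is why the censored sum in $J_2$ is correctly centered.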
Acknowledgements

The authors are grateful to the associate editor and two anonymous referees for their careful reading and appropriate remarks, which gave them the opportunity to improve the quality of the paper and to add the simulation part.
References

Billingsley, P. (1968). Convergence of probability measures. New York: Wiley.
Carbonez, A., Györfi, L., & van der Meulen, E. C. (1995). Partitioning estimates of a regression function under random censoring. Statistics & Decisions, 13, 21–37.
Chernoff, H. (1964). Estimation of the mode. Annals of the Institute of Statistical Mathematics, 16, 31–41.
Chow, Y. S., & Teicher, H. (1997). Probability theory. Independence, interchangeability, martingales. New York: Springer.
Collomb, G., Härdle, W., & Hassani, S. (1987). A note on prediction via estimation of the conditional mode function. Journal of Statistical Planning and Inference, 15, 227–236.
Deheuvels, P., & Einmahl, J. H. J. (2000). Functional limit laws for the increments of Kaplan–Meier product-limit processes and applications. The Annals of Probability, 28, 1301–1335.
Dudley, R. M. (1999). Uniform central limit theorems. Cambridge, UK: Cambridge University Press.
Eddy, W. F. (1980). Optimum kernel estimators of the mode. The Annals of Statistics, 8, 870–882.
Eddy, W. F. (1982). The asymptotic distributions of kernel estimators of the mode. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 59, 279–290.
Giné, E., & Guillou, A. (2001). On consistency of kernel density estimators for randomly censored data: Rates holding uniformly over adaptive intervals. Annales de l'Institut Henri Poincaré, 37, 503–522.
Giné, E., & Guillou, A. (2002). Rates of strong uniform consistency for multivariate kernel density estimators. Annales de l'Institut Henri Poincaré, 38, 907–921.
Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53, 457–481.
Kim, Y.-M., & Jhun, M. (2008). Cure rate model with interval censored data. Statistics in Medicine, 27, 3–14.
Konakov, V. D. (1974). On the asymptotic normality of the mode of multidimensional distributions. Theory of Probability and its Applications, 19, 794–799.
Loève, M. (1963). Probability theory. New York: Springer-Verlag.
Louani, D. (1998). On the asymptotic normality of the kernel estimators of the density function and its derivatives under censoring. Communications in Statistics. Theory and Methods, 27, 2909–2924.
Louani, D., & Ould Saïd, E. (1999). Asymptotic normality of kernel estimators of the conditional mode under strong mixing hypothesis. Journal of Nonparametric Statistics, 11, 413–442.
Mehra, K. L., Ramakrishnaiah, Y. S., & Sashikala, P. (2000). Laws of iterated logarithm and related asymptotics for estimators of conditional density and mode. Annals of the Institute of Statistical Mathematics, 52, 630–645.
Nadaraya, E. N. (1965). On nonparametric estimates of density functions and regression curves. Theory of Probability and its Applications, 10, 186–190.
Ould Saïd, E. (1993). Estimation non paramétrique du mode conditionnel. Application à la prévision. Comptes Rendus de l'Académie des Sciences, Paris, Série I, 316, 943–947.
Ould Saïd, E. (1997). A note on ergodic process prediction via estimation of the conditional mode function. Scandinavian Journal of Statistics, 24, 231–239.
Ould Saïd, E., & Cai, Z. (2005). Strong uniform consistency of nonparametric estimation of the censored conditional mode function. Journal of Nonparametric Statistics, 17, 797–806.
Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33, 1065–1076.
Perdoná, G. C., & Louzada-Neto, F. (2008). Interval estimation for the parameters of the modified Weibull distribution model with censored data: A simulation study. Tendências em Matemática Aplicada e Computacional, 9, 437–446.
Romano, J. P. (1988). On weak convergence and optimality of kernel density estimates of the mode. The Annals of Statistics, 16, 629–647.
Samanta, M. (1973). Nonparametric estimation of the mode of a multivariate density. South African Statistical Journal, 7, 109–117.
Samanta, M., & Thavaneswaran, A. (1990). Nonparametric estimation of the conditional mode. Communications in Statistics. Theory and Methods, 19, 4515–4524.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. Monographs on Statistics and Applied Probability. London: Chapman and Hall.
Talagrand, M. (1996). New concentration inequalities in product spaces. Inventiones Mathematicae, 126, 505–563.
Van der Vaart, A. W., & Wellner, J. A. (1996). Weak convergence and empirical processes. Berlin: Springer.
van Ryzin, J. (1969). On strong consistency of density estimates. The Annals of Mathematical Statistics, 40, 1765–1772.
Vieu, P. (1996). A note on density mode estimation. Statistics & Probability Letters, 26, 297–307.
Wei, L. J., Lin, D. Y., & Weissfeld, L. (1989). Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. Journal of the American Statistical Association, 84, 1065–1073.