Journal of the Korean Statistical Society 42 (2013) 169–176
Journal of the Korean Statistical Society journal homepage: www.elsevier.com/locate/jkss
Nonparametric estimation of quantile functions for randomly right censored data
Soonphill Hong, Jinmi Kim, Choongrak Kim ∗
Department of Statistics, Pusan National University, Pusan, 609-735, Republic of Korea
Article info
Article history: Received 5 April 2012; Accepted 7 July 2012; Available online 28 July 2012
AMS 2000 subject classifications: Primary 62N01 62N02
Abstract
In this paper we compare four nonparametric quantile function estimators for randomly right censored data: the Kaplan–Meier estimator, the linearly interpolated Kaplan–Meier estimator, the kernel-type survival function estimator, and the Bézier curve smoothing estimator. We also compare several kinds of confidence intervals for quantiles based on the four nonparametric quantile function estimators. © 2013 Published by Elsevier B.V. on behalf of The Korean Statistical Society.
Keywords: Bandwidth; Bézier curve; Kaplan–Meier estimator; Kernel smoothing; Linear smoothing; Median survival time
1. Introduction

Median survival time is often reported as a representative survival time for a disease or cancer of interest. The median survival time is a specific value of a quantile function, and statistical inference for a quantile function depends heavily on the estimation of the survival function, since the quantile function is determined by the estimator of the survival function. The most popular estimator for survival functions is the Kaplan–Meier (KM) estimator (Kaplan & Meier, 1958), which has various desirable properties such as self-consistency (Efron, 1967; Fleming & Harrington, 1991; Miller, 1981), strong consistency (Csörgo & Horváth, 1983; Peterson, 1977) and asymptotic normality (Breslow & Crowley, 1974). It is a step function, however, and quantile estimation based on the KM estimator shows poor performance. In fact, Miller (1981) showed that the estimator of the median survival time based on the KM estimator tends to be too large, i.e., it overestimates the true median survival time. This phenomenon is mainly due to the fact that the KM estimator is a step function. He argued that, as an alternative, a linearly interpolated version of the KM estimator shows better numerical performance.

Many authors suggested using kernel-type estimators to overcome the undesirable aspect of the step function. Among them, Blum and Susarla (1980), Diehl and Stute (1988) and Földes, Rejtö, and Winter (1981) studied kernel density estimators based on the KM estimator. Kim, Kim, Hong, Park, and Jeong (1999) applied another nonparametric smoothing method, Bézier smoothing (Farin, 1990), to density estimation. Bézier smoothing is a very popular technique in computational graphics but is little known in the statistical community. They showed that it has the same rate of convergence as the kernel estimator and a smaller mean squared error (MSE) than the kernel estimator.
∗ Corresponding author. E-mail address: [email protected] (C. Kim).
1226-3192/$ – see front matter © 2013 Published by Elsevier B.V. on behalf of The Korean Statistical Society. doi:10.1016/j.jkss.2012.07.002
Also, Kim, Park, Kim, and Lim (2003) suggested a smooth version of the KM estimator using the Bézier curve and compared it with the kernel-type estimator by means of the mean
integrated squared error (MISE). They showed that the Bézier smoother is much closer to the true curve than the kernel estimator. The Bézier curve has many good properties such as end-point interpolation, symmetry and linear precision. Also, derivatives up to any desired order are easy to compute, which is useful for deriving a density function estimator for censored data.

In this paper we consider statistical inference for a quantile function for randomly right censored data. Specifically, we compare four nonparametric quantile estimators: the quantile estimators based on the KM estimator, the linearly interpolated KM estimator, the kernel-type survival function estimator, and the Bézier curve smoothing estimator. Also, we compare three kinds of pointwise confidence intervals for the quantile function. Finding the confidence interval entails estimation of the variance of the quantile estimate, and we consider two kinds of variance estimators: one is the well-known Greenwood formula (Greenwood, 1926) and the other is the asymptotic variance derived from the definition of the quantile function (see Sections 3.1 and 3.3).

This paper is organized as follows. The KM estimator, a linearly interpolated KM estimator, a kernel smoothing estimator, and the Bézier curve smoothing estimator are introduced in Section 2. In Section 3, three methods of constructing confidence intervals for quantiles are described. Section 4 reports extensive numerical studies of the quantile function estimators based on the four nonparametric estimators. Finally, a discussion is given in Section 5.

2. Existing methods for the nonparametric estimation of survival functions

We review four existing methods for statistical inference for quantile functions for randomly right censored data, using nonparametric survival function estimators: the KM estimator, the linearly interpolated KM estimator, the kernel-type survival function estimator, and the Bézier curve smoothing estimator.

Let $T_1, T_2, \ldots, T_n$ be the independent and identically distributed (iid) true survival times with cumulative distribution function (cdf) $F$ and probability density function (pdf) $f$. Also, let $C_1, C_2, \ldots, C_n$ be the iid censoring random variables with cdf $G$ and pdf $g$. We assume an independent censoring scheme, i.e., $T$ and $C$ are independent. The observed right censored data are denoted by the pairs $(Y_i, \delta_i)$, $i = 1, 2, \ldots, n$, where $Y_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$ is a censoring indicator. For notational convenience, we assume that the $Y_i$ are ordered and have no ties, i.e., $Y_1 < Y_2 < \cdots < Y_n$, and let $\delta_1, \delta_2, \ldots, \delta_n$ be the corresponding indicators (all the following results still hold even when ties are present). For the survival function $S(t) = 1 - F(t)$, the quantile function is defined by

$$t_p = \sup\{t : S(t) \ge p\}, \quad 0 < p < 1,$$

and the sample quantile function by

$$\hat t_p = \sup\{t : S_n(t) \ge p\}, \quad 0 < p < 1,$$

where $S_n(t)$ is an estimator of the unknown survival function $S(t)$. Here, we consider four different kinds of nonparametric estimators $S_n(t)$ for estimating the quantile function $t_p$.

2.1. The Kaplan–Meier estimator

The KM estimator is given by
$$\hat S_{KM}(t) = \prod_{i:\,Y_i \le t} \left(\frac{n-i}{n-i+1}\right)^{\delta_i}.$$
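The product-limit formula and the sample quantile $\sup\{t : S_n(t) \ge p\}$ can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names `kaplan_meier` and `km_quantile` and the toy data are assumptions, and ties are assumed absent as in the text.

```python
import math


def kaplan_meier(y, delta):
    """Return (times, surv): ordered observed times and the KM estimate at each.

    y     : observed times Y_i = min(T_i, C_i), assumed to have no ties
    delta : censoring indicators (1 = uncensored, 0 = censored)
    """
    pairs = sorted(zip(y, delta))
    n = len(pairs)
    times, surv = [], []
    s = 1.0
    for i, (yi, di) in enumerate(pairs, start=1):
        if di == 1:
            s *= (n - i) / (n - i + 1)  # factor ((n-i)/(n-i+1))^{delta_i}
        times.append(yi)
        surv.append(s)
    return times, surv


def km_quantile(times, surv, p):
    """Sample quantile sup{t : S_KM(t) >= p} of the right-continuous step function."""
    for t, s in zip(times, surv):
        if s < p:
            return t  # S_KM >= p on [0, t) and drops below p at t
    return math.inf  # S_KM never drops below p (last observation censored)


# Toy data (hypothetical): five ordered times, the second one censored.
times, surv = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 1])
median = km_quantile(times, surv, 0.5)
print(surv, median)
```

Because the estimate only changes at uncensored times, the returned quantile is always one of the observed failure times, which is the step-function behaviour discussed in the Introduction.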
Note that if the last observation is censored, i.e., $\delta_n = 0$, then $\hat S_{KM}(t)$ does not decrease to 0 as $t \to \infty$. To avoid technical difficulties arising from this, it is usually assumed that the last observation is uncensored, i.e., $\delta_n = 1$. It is well known that quantile estimation based on $\hat S_{KM}(t)$, such as that of the median survival time, has a large bias (Kim et al., 2003; Miller, 1981; Padgett, 1986).

2.2. The interpolated Kaplan–Meier estimator
Let $d = \sum_{i=1}^{n} \delta_i$ be the number of uncensored observations, and let $(0, 1), (t_1, \hat S_{KM}(t_1)), \ldots, (t_{d-1}, \hat S_{KM}(t_{d-1})), (t_d, 0)$ be the coordinates where jumps occur in the KM estimator. Then, we can construct the linearly interpolated KM estimator as follows. For $t_j \le t \le t_{j+1}$, $j = 0, 1, \ldots, d-1$ ($t_0 = 0$), the linearly interpolated KM estimator, denoted by $\hat S_{IK}(t)$, can be written as

$$\hat S_{IK}(t) = \frac{\hat S_{KM}(t_{j+1}) - \hat S_{KM}(t_j)}{t_{j+1} - t_j}\,(t - t_j) + \hat S_{KM}(t_j).$$
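As a rough illustration of the interpolation, the sketch below evaluates the piecewise-linear curve through the jump coordinates and inverts it to obtain a quantile. The function names and the knot values (taken from a hypothetical sample with uncensored times 1, 3, 4, 5) are assumptions made for the example, and the last observation is assumed uncensored so that the final knot has height 0.

```python
def interpolated_km(knots_t, knots_s, t):
    """Evaluate the linearly interpolated KM estimator at t.

    knots_t : [0, t_1, ..., t_d]            (jump locations, increasing)
    knots_s : [1, S_KM(t_1), ..., S_KM(t_d)] (KM values at the jumps)
    """
    if t <= knots_t[0]:
        return knots_s[0]
    if t >= knots_t[-1]:
        return knots_s[-1]
    for j in range(len(knots_t) - 1):
        t0, t1 = knots_t[j], knots_t[j + 1]
        if t0 <= t <= t1:
            s0, s1 = knots_s[j], knots_s[j + 1]
            return (s1 - s0) / (t1 - t0) * (t - t0) + s0


def interpolated_km_quantile(knots_t, knots_s, p):
    """Solve S_IK(t) = p by inverting the linear segment that contains p."""
    for j in range(len(knots_t) - 1):
        s0, s1 = knots_s[j], knots_s[j + 1]
        if s1 <= p <= s0 and s1 < s0:
            t0, t1 = knots_t[j], knots_t[j + 1]
            return t0 + (s0 - p) / (s0 - s1) * (t1 - t0)
    raise ValueError("p is outside the range of the interpolated curve")


# Hypothetical knots: uncensored times 1, 3, 4, 5 with KM values 0.8, 8/15, 4/15, 0.
knots_t = [0, 1, 3, 4, 5]
knots_s = [1.0, 0.8, 8 / 15, 4 / 15, 0.0]
print(interpolated_km(knots_t, knots_s, 2.0))
print(interpolated_km_quantile(knots_t, knots_s, 0.5))
```

For these knots the interpolated median falls strictly inside the interval (3, 4), whereas the step-function median would be the right end point 4.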
Note that the quantile estimate based on $\hat S_{IK}(t)$ is always smaller than that based on $\hat S_{KM}(t)$.

2.3. Kernel smoothing

First, we consider complete data (all the observations are uncensored). Let $T_1, T_2, \ldots, T_n$ be a random sample from a distribution with an unknown density $f(\cdot)$ and distribution function $F(\cdot)$, which we wish to estimate. The kernel estimator of $F$ at $t$ is

$$\hat F_K(t) = \frac{1}{n} \sum_{i=1}^{n} W\left(\frac{t - T_i}{h}\right).$$
Here, $K$ is a kernel function satisfying $\int K(x)\,dx = 1$ and $\int K^2(x)\,dx < \infty$, $h$ is a bandwidth to be estimated, and $W(x) = \int_{-\infty}^{x} K(t)\,dt$ is the cumulative kernel function. The theoretical properties of $\hat F_K(t)$ have been investigated by Azzalini (1981)
and Reiss (1981). Bandwidth selection methods for $\hat F_K(t)$ include the leave-one-out method of Sarda (1993), the plug-in method of Altman and Leger (1995), and the cross-validation method of Bowman, Hall, and Prvan (1998).

The nonparametric estimation of a density for censored data has been studied by Blum and Susarla (1980), Földes et al. (1981), Marron and Padgett (1987), McNichols and Padgett (1986), Padgett and McNichols (1984) and Yandell (1983), among others. In particular, Földes et al. (1981) established the almost sure convergence of the estimator, which gives strong uniform consistency for nonparametric survival curve estimators, Padgett and McNichols (1984) reviewed the methods for nonparametric density estimation from censored data, and Marron and Padgett (1987) suggested asymptotically optimal bandwidth selection for kernel density estimators.

Let $s_i$ be the jump size at $Y_i$ in the KM estimator $\hat S_{KM}(t)$. Since we assumed that the last observation is uncensored, $\sum_{i=1}^{n} s_i = 1$. Therefore, the kernel weighted version of the KM estimator $\hat S_{KM}(t)$ is obtained by replacing the empirical distribution function by $\hat S_{KM}$ (see, for example, Wand & Jones, 1995), i.e.,

$$\hat S_{KE}(t) = 1 - \sum_{i=1}^{n} s_i\, W\left(\frac{t - Y_i}{h}\right).$$
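A minimal sketch of this kernel-weighted estimator is given below, assuming the Epanechnikov kernel (the kernel used later in the numerical studies) and no boundary correction. The function names, the toy jump sizes and the bandwidths are illustrative assumptions rather than values from the paper.

```python
def epanechnikov_cdf(x):
    """Cumulative kernel W(x) for K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 0.5 + 0.75 * x - 0.25 * x ** 3


def kernel_survival(times, jumps, t, h):
    """S_KE(t) = 1 - sum_i s_i W((t - Y_i) / h), without boundary correction.

    times : observation times Y_i (censored times may be omitted, their jump is 0)
    jumps : KM jump sizes s_i at those times, summing to 1
    """
    return 1.0 - sum(s * epanechnikov_cdf((t - y) / h) for y, s in zip(times, jumps))


# Hypothetical KM jumps at the uncensored times 1, 3, 4, 5.
times = [1, 3, 4, 5]
jumps = [0.2, 4 / 15, 4 / 15, 4 / 15]
for h in (1.0, 2.0):
    print(h, kernel_survival(times, jumps, 0.0, h))
```

With the wider bandwidth the kernel centred at the first failure time spills over to negative times, so the estimate at $t = 0$ drops below 1 in this toy example.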
Even though $\hat S_{KE}(t)$ is a smooth function, it suffers from the boundary problem. For example, $\hat S_{KE}(0) = 1$ is not guaranteed. To avoid this problem, a boundary correction technique is often used; see Klein and Moeschberger (2003), among others. On the other hand, by differentiating $1 - \hat S_{KE}(t)$ with respect to $t$, we obtain the kernel density estimator

$$\hat f_{KE}(t) = \frac{1}{h} \sum_{i=1}^{n} s_i\, K\left(\frac{t - Y_i}{h}\right),$$
which was studied by Földes et al. (1981) and McNichols and Padgett (1986).

2.4. Bézier curve smoothing

Bézier curve smoothing (Farin, 1990) is a very popular smoothing technique in the computational graphics area; however, it is little known among statisticians. Among the few applications of Bézier curve smoothing in statistics, those of Kim (1996) and Kim et al. (1999) showed that the density function estimator obtained via the Bézier curve has the same asymptotic order as the kernel estimator, and has a smaller mean integrated squared error, $\mathrm{MISE}(\hat\theta) = E\int (\hat\theta(x) - \theta(x))^2\,dx$, where $\hat\theta(x)$ is an estimator of $\theta(x)$ at $x$, than the kernel estimator. Also, Kim et al. (2003) suggested a smooth version of the KM estimator using a Bézier curve and showed its strong consistency.

Consider $K + 1$ points in $R^2$, denoted by $b_0 = (r_0, s_0)'$, $b_1 = (r_1, s_1)'$, $\ldots$, $b_K = (r_K, s_K)'$, where $r_0 \le r_1 \le \cdots \le r_K$. The Bézier curve based on the $K + 1$ Bézier points (control points) $b_0, b_1, \ldots, b_K$ is defined by

$$b(\omega) \equiv \begin{pmatrix} t(\omega) \\ y(\omega) \end{pmatrix} = \sum_{i=0}^{K} b_i\, B_{K,i}(\omega), \quad \omega \in [0, 1],$$

where $B_{K,i}(\omega)$ is the binomial density given by $B_{K,i}(\omega) = \binom{K}{i} \omega^i (1 - \omega)^{K-i}$, which is also called a Bernstein polynomial
or a blending function. There are many properties of note for Bézier curves, and we mention two of them which will be used in this paper. First, Bézier curves have the end-point interpolation property, i.e., $b_0$ and $b_K$ are always on the curve $b(\omega)$. In fact, $b(0) = b_0$ and $b(1) = b_K$. Second, the first derivative of a Bézier curve $b(\omega)$ with respect to $\omega$ is given by

$$\frac{d}{d\omega}\, b(\omega) = K \sum_{i=0}^{K-1} (b_{i+1} - b_i)\, B_{K-1,i}(\omega).$$

Since the Bézier curve relies heavily on the choice of the Bézier points, Kim et al. (2003) considered three kinds of Bézier points based on the KM estimator. In this paper, we adopt the method of selecting Bézier points suggested by Kim et al. (2003). Assume that the true survival times are bounded by a known constant $H$, i.e., $P(T \le H) = 1$, and let $Y_1^* < Y_2^* < \cdots < Y_d^*$ be the $d$ uncensored survival times. Usually, $H$ is unknown for real data sets, and we have to estimate $H$. Kim et al. (2003) suggested $\hat H = \left(1 + \frac{1}{d}\right) Y_d^*$, where $d$ is the number of uncensored observations and $Y_d^*$ is the largest uncensored observation. Let $d + 2$ Bézier points be given by $b_0 = (0, 1)'$; $b_i = (y_i^*, \hat S_{KM}(y_i^*))'$, $i = 1, \ldots, d$; $b_{d+1} = (H, 0)'$ under the assumption of $\delta_n = 1$. Then, the resulting Bézier curve is defined by

$$b(\omega) \equiv \begin{pmatrix} t(\omega) \\ y(\omega) \end{pmatrix} = \sum_{i=0}^{d+1} b_i\, B_{d+1,i}(\omega),$$

where $t(\omega) \equiv t_n(\omega) = \sum_{i=0}^{d+1} y_i^*\, B_{d+1,i}(\omega)$ and $y(\omega) \equiv y_n(\omega) = \sum_{i=0}^{d+1} \hat S_{KM}(y_i^*)\, B_{d+1,i}(\omega)$ with $y_0^* = 0$, $y_{d+1}^* = H$. Therefore, the Bézier smoothing estimator of $S(t)$ is defined by $\hat S_{BE}(t) = y(\omega)$, where $\omega$ is the point such that $t(\omega) = t$. There are
several desirable properties of $\hat S_{BE}(t)$. First, $\hat S_{BE}(0) = 1$ is guaranteed by the end-point interpolation property of the Bézier curve. Second, $\hat S_{BE}(t)$ is monotone, which can be easily verified by using the first derivative of the Bézier curve. Third, one does not need to choose a smoothing parameter. Fourth, a Bézier curve estimator of the density with censored data can be easily obtained by differentiating $\hat S_{BE}(t)$ with respect to $t$.

3. Interval estimation for quantiles

Here, we consider three methods of constructing the confidence interval for the quantile $t_p = \sup\{t : S(t) \ge p\}$, $0 < p < 1$. Two of them are based on using a pointwise confidence interval for an estimator $\hat S(t)$ of $S(t)$. In this case, an approximate $100(1 - \alpha)\%$ confidence interval for $t_p$ is easily obtained by inverting the relationship $S(t_p) = p$ if $S(t)$ is continuous. The third method is based on computing the variance of $\hat t_p$ directly, i.e.,

$$\hat t_p \pm z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}(\hat t_p)}.$$
3.1. The naive pivotal method

The pointwise confidence interval for $S(t)$ can be obtained if one can show that

$$\frac{\hat S(t) - S(t)}{\sqrt{\mathrm{Var}(\hat S(t))}} \xrightarrow{D} N(0, 1), \tag{3.1}$$

and, as an estimator for $\mathrm{Var}(\hat S(t))$, Greenwood's formula

$$\hat\sigma_s^2(t) \equiv \widehat{\mathrm{Var}}(\hat S(t)) = \hat S^2(t) \sum_{i:\,y_i \le t} \frac{\delta_i}{(n-i)(n-i+1)}$$

is widely used. Note that $\hat\sigma_s^2(t)$ is defined only when $t < Y_n$. By using (3.1), the pointwise $100(1 - \alpha)\%$ confidence interval (C.I.) for $S(t)$ is approximately given by

$$\left(\hat S(t) - z_{\alpha/2}\,\hat\sigma_s(t),\ \hat S(t) + z_{\alpha/2}\,\hat\sigma_s(t)\right). \tag{3.2}$$

Let $\hat S_L(t) \equiv \hat S(t) - z_{\alpha/2}\,\hat\sigma_s(t)$ and $\hat S_U(t) \equiv \hat S(t) + z_{\alpha/2}\,\hat\sigma_s(t)$ be the lower and the upper limits for $S(t)$; then $1 - \alpha \simeq P(S^{-1}(\hat S_U(t)) \le t \le S^{-1}(\hat S_L(t)))$. Therefore, the approximate $100(1 - \alpha)\%$ C.I. for $t_p$ for a given $p$ becomes

$$\left(\hat S^{-1}\big(\hat S(t_p) + z_{\alpha/2}\,\hat\sigma_s(t_p)\big),\ \hat S^{-1}\big(\hat S(t_p) - z_{\alpha/2}\,\hat\sigma_s(t_p)\big)\right).$$
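The construction can be sketched for the KM estimator as follows. This is a hedged illustration only: the function names are hypothetical, the band is evaluated at the estimate $\hat t_p$ in place of the unknown $t_p$ (an assumption of this sketch), the limits are truncated to $[0, 1]$, and $\hat S^{-1}(q)$ is taken to be $\sup\{t : \hat S(t) \ge q\}$.

```python
import math


def km_with_greenwood(y, delta):
    """KM estimate and Greenwood standard error at each ordered observation.

    Assumes no ties. The standard error is None at the largest observation,
    where (n - i) = 0 and Greenwood's formula is undefined.
    """
    pairs = sorted(zip(y, delta))
    n = len(pairs)
    times, surv, se = [], [], []
    s, acc = 1.0, 0.0
    for i, (yi, di) in enumerate(pairs, start=1):
        if di == 1:
            s *= (n - i) / (n - i + 1)
            if i < n:
                acc += 1.0 / ((n - i) * (n - i + 1))
        times.append(yi)
        surv.append(s)
        se.append(s * math.sqrt(acc) if i < n else None)
    return times, surv, se


def inverse_surv(times, surv, q):
    """sup{t : S(t) >= q} for the right-continuous KM step function."""
    for t, s in zip(times, surv):
        if s < q:
            return t
    return math.inf  # the curve never drops below q


def pivotal_quantile_ci(times, surv, se, p, z=1.96):
    """Return (lower, t_hat, upper) for t_p from the naive pivotal band."""
    k = next((j for j, s in enumerate(surv) if s < p), None)
    if k is None or se[k] is None:
        raise ValueError("quantile not estimable with a finite Greenwood variance")
    upper_level = min(1.0, surv[k] + z * se[k])  # S_U, truncated at 1
    lower_level = max(0.0, surv[k] - z * se[k])  # S_L, truncated at 0
    lower = inverse_surv(times, surv, upper_level)
    upper = inverse_surv(times, surv, lower_level)
    return lower, times[k], upper


# Hypothetical sample of size 10 with two censored observations.
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
delta = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
times, surv, se = km_with_greenwood(y, delta)
print(pivotal_quantile_ci(times, surv, se, p=0.5))
```

Because the inverse of a step function can only return observed times, the interval end points for the KM estimator are always data points; the smoothed estimators of Section 2 would be inverted in the same way but yield end points between observations.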
A crucial disadvantage of this method is that the C.I. for $S(t)$ given in (3.2) may include values outside of the interval (0, 1), and therefore the C.I. for $t_p$ also suffers from this disadvantage. When this happens, however, the lower and/or upper bounds for $S(t)$ can be modified to $\max(0, \hat S_L(t))$ and/or $\min(1, \hat S_U(t))$, respectively.

3.2. The transformation method

To overcome the disadvantage of the naive pivotal method, a transformation technique was suggested. Let $\psi(t) = g(S(t))$, where $g(\cdot)$ is a one-to-one, continuous and differentiable function. In survival analysis, the log–log transformation $g(s) = \log(-\log s)$ and the logit transformation $g(s) = \log((1 - s)/s)$ are widely used (see Lawless, 2003). Here, we consider the log–log transformation $\psi(t) = \log[-\log(S(t))]$; then

$$\frac{\hat\psi(t) - \psi(t)}{\hat\sigma_\psi(t)} \xrightarrow{D} N(0, 1),$$

where $\hat\psi(t) = \log(-\log \hat S(t))$ and

$$\hat\sigma_\psi^2(t) \equiv \widehat{\mathrm{Var}}(\hat\psi(t)) = \widehat{\mathrm{Var}}(\hat S(t)) \big/ \{\hat S(t) \log \hat S(t)\}^2$$

by the delta method. Thus, the $100(1 - \alpha)\%$ C.I. for $\psi(t)$ is $\hat\psi(t) \pm z_{\alpha/2}\,\hat\sigma_\psi(t)$, and the $100(1 - \alpha)\%$ C.I. for $S(t)$ follows from

$$1 - \alpha \simeq P\left(\exp(-\exp(\hat\psi_U(t))) \le S(t) \le \exp(-\exp(\hat\psi_L(t)))\right),$$

where $\hat\psi_U(t) = \hat\psi(t) + z_{\alpha/2}\,\hat\sigma_\psi(t)$ and $\hat\psi_L(t) = \hat\psi(t) - z_{\alpha/2}\,\hat\sigma_\psi(t)$. Let $\hat\gamma_L(t) = \exp(-\exp(\hat\psi_U(t)))$ and $\hat\gamma_U(t) = \exp(-\exp(\hat\psi_L(t)))$; then the approximate $100(1 - \alpha)\%$ C.I. for $t_p$ is given by

$$\left(\hat S^{-1}(\hat\gamma_U(t_p)),\ \hat S^{-1}(\hat\gamma_L(t_p))\right).$$
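To make the effect of the transformation concrete, the following sketch computes the limits $\hat\gamma_L$ and $\hat\gamma_U$ for a single value of $\hat S(t)$ and compares them with the untransformed limits. The function name `loglog_band` and the numbers used are illustrative assumptions; the standard error would normally come from Greenwood's formula.

```python
import math


def loglog_band(s_hat, se, z=1.96):
    """Limits (gamma_L, gamma_U) for S(t) from the log-log transformation.

    s_hat : estimate of S(t), must satisfy 0 < s_hat < 1
    se    : standard error of s_hat (e.g. from Greenwood's formula)
    """
    psi = math.log(-math.log(s_hat))
    se_psi = se / abs(s_hat * math.log(s_hat))  # delta method
    gamma_l = math.exp(-math.exp(psi + z * se_psi))
    gamma_u = math.exp(-math.exp(psi - z * se_psi))
    return gamma_l, gamma_u


# Hypothetical values near the upper boundary.
s_hat, se = 0.9, 0.1
naive = (s_hat - 1.96 * se, s_hat + 1.96 * se)
print("naive limits:  ", naive)
print("log-log limits:", loglog_band(s_hat, se))
```

In this example the naive upper limit exceeds 1, while both transformed limits stay inside (0, 1), so no truncation is needed before inverting the band to obtain the interval for $t_p$.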
3.3. The direct method

The two methods in Sections 3.1 and 3.2 are based on the pointwise confidence interval for $S(t)$. Here, we consider a confidence interval for the quantile $t_p$ directly by computing the variance of $\hat t_p$. It is easy to show that $\mathrm{Var}(\hat t_p) \simeq \mathrm{Var}(\hat S(t_p))/f^2(t_p)$, where $f(\cdot)$ is the pdf of the true survival time. We can estimate $\mathrm{Var}(\hat S(t_p))$ using Greenwood's formula, but $f(\cdot)$ is an unknown density and has to be estimated. We suggest two kinds of nonparametric estimators of $f(\cdot)$: one is the kernel density estimator and the other is obtained by taking the first derivative of the Bézier curve estimator given in Section 2.4. Therefore, the direct method gives the $100(1 - \alpha)\%$ C.I. for $t_p$ as

$$\hat t_p \pm z_{\alpha/2}\,\{\widehat{\mathrm{Var}}(\hat t_p)\}^{1/2},$$

where $\widehat{\mathrm{Var}}(\hat t_p)$ is given either by the kernel estimator or the Bézier curve estimator.

4. Numerical studies

To compare the four estimators for the quantile function, we carried out extensive simulation studies. To study the performances of the point estimators, we computed the mean squared error (MSE) of $\hat t_p$ for p = .01, .25, .50, .75 and .90. Also, to study the performances of the interval estimators, we computed the lengths of the confidence intervals and the probabilities of the true time point $t_p$ being included in the confidence intervals.

For the random numbers, we generate the true survival time $T$ from a Weibull distribution with parameters $\lambda$ and $\gamma$, denoted by Weibull($\lambda$, $\gamma$), whose pdf is $f(t) = \lambda\gamma(\lambda t)^{\gamma - 1} \exp(-(\lambda t)^{\gamma})$. In fact, we generate two different sets of random numbers, from Weibull(1, 1) and Weibull(1/2, 2). The censoring random numbers are generated from an exponential distribution with parameter $\theta$, i.e., Exp($\theta$). Here, we consider 10%, 30%, and 50% censoring by choosing $\theta$ = 1/9, 3/7, 1, respectively, in the Weibull(1, 1) case, and $\theta$ = 0.13, 0.59, 4.98, respectively, in the Weibull(1/2, 2) case. The sample sizes considered are n = 30, 50, and 100, and 2000 replications are done for each sample size. In this paper, we only report results for 10% and 30% censoring, for p = .25, .50, and .75, and for sample sizes n = 50 and 100, due to the limited space.

First, we evaluate the MSE of $\hat t_p$ using the pivotal method for four estimators denoted by $\hat t_{p,KM}$, $\hat t_{p,IK}$, $\hat t_{p,KE}$, and $\hat t_{p,BE1}$, representing quantile estimates based on the KM estimator, the interpolated KM estimator, the kernel estimator, and the first-type Bézier curve estimator, respectively. For the kernel smoothing estimator $\hat S_{KE}(t)$, we use the Epanechnikov kernel with boundary correction (see, for example, Klein & Moeschberger, 2003).
Moreover, for the choice of bandwidth, we compute the mean integrated squared error (MISE) of $\hat S_{KE}(t)$ for h = 0.1, 0.2, ..., 2.0 and then choose the one giving the minimum MISE. Tables 4.1a and 4.1b list the MSE (the sum of the variance and squared bias) of $\hat t_p$ for the four estimators evaluated at p = .25, .50, and .75. We see that the quantile estimates based on the first-type Bézier curve, $\hat t_{p,BE1}$, and on the kernel estimator, $\hat t_{p,KE}$, are quite competitive, and that $\hat t_{p,KM}$ is the poorest in most cases. If we take into account the fact that the optimal bandwidth, which is not available in practice, is used in computing $\hat t_{p,KE}$, then the behavior of $\hat t_{p,BE1}$ is quite a bit better for both the Weibull(1, 1) and Weibull(1/2, 2) cases. In most cases of censoring, the bias part was almost negligible compared to the variance part. In fact, these results coincide with those for the survival function estimators.

Next, we compute the average lengths of the confidence intervals of $t_p$ for the four estimators and the proportions of confidence intervals containing the true value $t_p$ out of 2000 iterations when $\alpha$ = 0.05. These computations are done for three different methods (pivotal method, transformation method, direct method). A good confidence interval should be short, and its proportion containing the true value should be close to $1 - \alpha$. As shown in Tables 4.2a and 4.2b, the interpolated KM estimator gives the shortest length, the Bézier curve estimator and the KM estimator are next, and the kernel estimator is the poorest. For the proportion containing the true value, the Bézier curve estimator is best. The proportions for the KM and the interpolated KM estimators usually fall below the nominal level, and those for the kernel estimator tend to exceed it. Also, the direct method is preferred to the others.
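The simulation design described in this section can be sketched as follows for the Weibull(1, 1) case with the KM-based quantile only. This is not the authors' code: the function names, the seed, the number of replications and the fallback used when the KM curve never drops below $p$ are assumptions of the sketch, and Exp($\theta$) is read as an exponential distribution with rate $\theta$, which reproduces the stated 10% censoring for $\theta$ = 1/9.

```python
import math
import random


def true_quantile(p, lam, gam):
    """t_p solving S(t) = exp(-(lam * t)^gam) = p."""
    return (-math.log(p)) ** (1.0 / gam) / lam


def simulate_sample(n, lam, gam, theta, rng):
    """Draw (Y_i, delta_i) with Weibull(lam, gam) survival and Exp(theta) censoring."""
    y, delta = [], []
    for _ in range(n):
        t = rng.weibullvariate(1.0 / lam, gam)  # scale 1/lam, shape gam
        c = rng.expovariate(theta)              # rate theta
        y.append(min(t, c))
        delta.append(1 if t <= c else 0)
    return y, delta


def km_quantile(y, delta, p):
    """sup{t : S_KM(t) >= p}; falls back to the largest observation if never reached."""
    pairs = sorted(zip(y, delta))
    n = len(pairs)
    s = 1.0
    for i, (yi, di) in enumerate(pairs, start=1):
        if di == 1:
            s *= (n - i) / (n - i + 1)
        if s < p:
            return yi
    return pairs[-1][0]


def mse_of_km_quantile(p, n, lam, gam, theta, reps, seed=0):
    """Monte Carlo MSE of the KM quantile, split into variance and squared bias."""
    rng = random.Random(seed)
    tp = true_quantile(p, lam, gam)
    est = [km_quantile(*simulate_sample(n, lam, gam, theta, rng), p) for _ in range(reps)]
    mean = sum(est) / reps
    var = sum((e - mean) ** 2 for e in est) / reps
    bias2 = (mean - tp) ** 2
    return var + bias2, var, bias2


mse, var, bias2 = mse_of_km_quantile(p=0.5, n=50, lam=1.0, gam=1.0, theta=1 / 9, reps=500)
print(100 * mse, 100 * var, 100 * bias2)
```

The same loop could be repeated for the other three estimators and for the interval methods by recording, in each replication, the interval length and whether the interval contains `true_quantile(p, lam, gam)`.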
In conclusion, we can say that as an estimator of the quantile function the first-type Bézier curve estimator is quite good among the estimators that we considered, and, further, it gives the most reliable interval estimator when the variance of $\hat t_p$ is computed directly.

Table 4.1a
MSE (var + bias²) × 100 for four quantile estimators of $t_p$ at p = .25, .50, .75 when n = 50, 100 with 10%, 30% censoring (C); Weibull(1, 1) case.

n    C%   p     $\hat t_{p,KM}$       $\hat t_{p,IK}$       $\hat t_{p,KE}$       $\hat t_{p,BE1}$
50   10   .75   0.77 (0.74+0.03)      0.66 (0.66+0.00)      0.53 (0.52+0.01)      0.56 (0.56+0.00)
          .50   2.21 (2.19+0.02)      2.00 (2.00+0.00)      1.86 (1.85+0.02)      1.78 (1.78+0.00)
          .25   6.62 (6.58+0.04)      6.03 (5.96+0.07)      5.18 (5.12+0.06)      5.42 (5.41+0.01)
     30   .75   0.81 (0.78+0.02)      0.70 (0.70+0.00)      0.69 (0.67+0.02)      0.59 (0.59+0.00)
          .50   2.70 (2.65+0.05)      2.31 (2.31+0.00)      2.27 (2.24+0.03)      2.00 (2.00+0.00)
          .25   11.16 (10.94+0.22)    7.73 (7.60+0.13)      6.97 (6.80+0.17)      6.63 (6.62+0.01)
100  10   .75   0.36 (0.36+0.01)      0.34 (0.34+0.00)      0.25 (0.25+0.00)      0.30 (0.30+0.00)
          .50   1.11 (1.10+0.01)      1.07 (1.07+0.00)      0.96 (0.95+0.00)      0.97 (0.97+0.00)
          .25   3.44 (3.43+0.02)      3.25 (3.24+0.01)      2.55 (2.54+0.01)      2.93 (2.93+0.00)
     30   .75   0.39 (0.38+0.01)      0.36 (0.36+0.00)      0.35 (0.35+0.00)      0.32 (0.31+0.00)
          .50   1.33 (1.31+0.02)      1.24 (1.24+0.00)      1.21 (1.20+0.01)      1.12 (1.11+0.00)
          .25   4.83 (4.80+0.04)      4.13 (4.09+0.04)      3.29 (3.27+0.03)      3.78 (3.78+0.00)

Table 4.1b
MSE (var + bias²) × 100 for four quantile estimators of $t_p$ at p = .25, .50, .75 when n = 50, 100 with 10%, 30% censoring (C); Weibull(1/2, 2) case.

n    C%   p     $\hat t_{p,KM}$       $\hat t_{p,IK}$       $\hat t_{p,KE}$       $\hat t_{p,BE1}$
50   10   .75   2.63 (2.61+0.02)      2.47 (2.45+0.02)      2.03 (2.03+0.00)      2.11 (2.07+0.04)
          .50   3.51 (3.50+0.01)      3.34 (3.30+0.04)      2.85 (2.83+0.02)      2.86 (2.84+0.03)
          .25   5.71 (5.71+0.00)      5.31 (5.15+0.17)      4.63 (4.57+0.06)      4.42 (4.38+0.03)
     30   .75   4.12 (4.05+0.07)      3.5 (3.44+0.05)       3.09 (3.07+0.02)      2.69 (2.55+0.14)
          .50   6.97 (6.87+0.10)      5.48 (5.32+0.16)      5.6 (5.50+0.10)       4.38 (4.21+0.17)
          .25   16.54 (16.38+0.16)    10.09 (8.62+1.47)     12.42 (12.19+0.23)    7.93 (7.35+0.59)
100  10   .75   1.33 (1.31+0.02)      1.29 (1.29+0.00)      1.05 (1.05+0.00)      1.15 (1.14+0.00)
          .50   1.82 (1.81+0.01)      1.77 (1.77+0.00)      1.52 (1.49+0.03)      1.58 (1.58+0.00)
          .25   2.75 (2.73+0.01)      2.62 (2.61+0.01)      2.35 (2.26+0.09)      2.33 (2.33+0.00)
     30   .75   2.04 (2.00+0.04)      1.8 (1.80+0.00)       1.5 (1.50+0.00)       1.51 (1.49+0.02)
          .50   3.26 (3.23+0.03)      2.88 (2.86+0.02)      2.56 (2.51+0.05)      2.36 (2.34+0.02)
          .25   7.63 (7.52+0.11)      5.25 (5.01+0.24)      5.8 (5.62+0.18)       4.26 (4.20+0.05)

5. Conclusion

In this paper, we compared four estimators (the KM estimator, interpolated KM estimator, kernel estimator, and Bézier curve estimator) for the quantile function. Also, we considered three methods of computing the confidence interval for the quantile using the four estimators. We did an extensive numerical study on the estimators considered. Among them, the Bézier curve smoothing turned out to be best in the sense of the mean squared error when one is interested in the point estimation of the quantile function. Kernel smoothing showed quite competitive results; however, it is based on the optimal smoothing parameter, which is not available for real data sets. One important and useful advantage of the Bézier curve smoothing over the kernel smoothing is that the Bézier curve smoothing does not require the process of estimation of the smoothing parameter. The Bézier curve smoothing can be easily performed if the KM estimator is available. Also, the estimate of the survival function at t = 0 based on the Bézier curve smoothing is guaranteed to be 1; this is not guaranteed for the kernel smoothing because of the boundary problem, even when a boundary-corrected kernel is used. In the interval estimation problem for the quantile function, the Bézier curve smoothing showed good results in the sense of the length of intervals and the proportion containing the true quantile. For each interval estimator, the direct method of computing the variance of $\hat t_p$ showed more consistent and reliable results than the others.
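As a concrete illustration of how the Bézier smoother of Section 2.4 can be computed once the KM estimator is available, the sketch below builds the $d + 2$ control points, evaluates the curve by de Casteljau's algorithm and inverts $t(\omega)$ or $y(\omega)$ by bisection. The evaluation and root-finding choices, the function names and the toy KM values are assumptions of this sketch, not details taken from the paper.

```python
def de_casteljau(coeffs, w):
    """Evaluate sum_i coeffs[i] * B_{K,i}(w) by repeated linear interpolation."""
    pts = list(coeffs)
    while len(pts) > 1:
        pts = [(1.0 - w) * a + w * b for a, b in zip(pts, pts[1:])]
    return pts[0]


def control_points(uncensored_times, km_values):
    """Control points b_0 = (0, 1), b_i = (y*_i, S_KM(y*_i)), b_{d+1} = (H_hat, 0)."""
    d = len(uncensored_times)
    h_hat = (1.0 + 1.0 / d) * uncensored_times[-1]  # H_hat = (1 + 1/d) Y*_d
    xs = [0.0] + list(uncensored_times) + [h_hat]
    ys = [1.0] + list(km_values) + [0.0]
    return xs, ys


def solve_omega(coeffs, target, increasing):
    """Bisection on [0, 1] for the omega at which the monotone component equals target."""
    lo, hi = 0.0, 1.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        below = de_casteljau(coeffs, mid) < target
        if below == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bezier_survival(xs, ys, t):
    """S_BE(t) = y(omega) with omega such that t(omega) = t."""
    if t <= xs[0]:
        return ys[0]
    if t >= xs[-1]:
        return ys[-1]
    return de_casteljau(ys, solve_omega(xs, t, increasing=True))


def bezier_quantile(xs, ys, p):
    """t(omega) with omega such that y(omega) = p, for 0 < p < 1."""
    return de_casteljau(xs, solve_omega(ys, p, increasing=False))


# Hypothetical KM values at the uncensored times 1, 3, 4, 5 (last one uncensored).
xs, ys = control_points([1, 3, 4, 5], [0.8, 8 / 15, 4 / 15, 0.0])
print(bezier_survival(xs, ys, 0.0), bezier_quantile(xs, ys, 0.5))
```

No bandwidth appears anywhere in the computation, and the value at $t = 0$ equals 1 by end-point interpolation, which mirrors the two advantages stated above.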
Table 4.2a
Length and proportion (×100) for the C.I. containing the true $t_p$ out of 2000 iterations when n = 100 with 10%, 30% censoring (C); Weibull(1, 1) case. Entries are length (proportion) of the C.I.

n    C%   Method    p     KM            IK            KE            BE1
100  10   Pivotal   .75   0.23(92.8)    0.22(93.9)    0.23(97.9)    0.23(95.4)
                    .50   0.40(94.3)    0.39(93.9)    0.40(95.9)    0.40(95.8)
                    .25   0.71(94.2)    0.68(93.1)    0.72(97.6)    0.71(96.2)
          Transf.   .75   0.23(95.5)    0.23(95.0)    0.23(98.1)    0.23(96.5)
                    .50   0.41(94.8)    0.40(94.5)    0.41(96.1)    0.40(96.1)
                    .25   0.75(94.9)    0.71(94.4)    0.77(97.9)    0.74(96.5)
          Direct    .75   -             -             0.23(97.3)    0.23(93.8)
                    .50   -             -             0.40(95.5)    0.40(93.6)
                    .25   -             -             0.73(97.3)    0.72(93.9)
     30   Pivotal   .75   0.24(92.8)    0.23(93.3)    0.24(95.6)    0.23(95.3)
                    .50   0.43(93.8)    0.42(93.6)    0.43(94.3)    0.43(95.6)
                    .25   0.86(94.1)    0.77(92.7)    0.97(98.1)    0.85(95.8)
          Transf.   .75   0.24(95.1)    0.23(94.7)    0.24(96.5)    0.23(96.6)
                    .50   0.44(94.9)    0.42(93.8)    0.44(95.1)    0.43(96.2)
                    .25   0.94(95.2)    0.82(94.5)    1.21(98.7)    0.99(96.7)
          Direct    .75   -             -             0.24(95.3)    0.23(93.3)
                    .50   -             -             0.43(93.6)    0.43(93.7)
                    .25   -             -             0.86(97.1)    0.83(93.7)
Table 4.2b
Length and proportion (×100) for the C.I. containing the true $t_p$ out of 2000 iterations when n = 100 with 10%, 30% censoring (C); Weibull(1/2, 2) case. Entries are length (proportion) of the C.I.

n    C%   Method    p     KM            IK            KE            BE1
100  10   Pivotal   .75   0.44(94.0)    0.43(94.5)    0.45(97.5)    0.44(96.3)
                    .50   0.50(94.1)    0.50(94.4)    0.52(96.7)    0.50(95.8)
                    .25   0.64(93.6)    0.61(92.4)    0.66(96.6)    0.63(95.8)
          Transf.   .75   0.45(95.2)    0.45(94.5)    0.46(98.1)    0.45(95.8)
                    .50   0.52(95.2)    0.51(95.2)    0.53(97.5)    0.51(96.5)
                    .25   0.66(94.2)    0.63(93.7)    0.69(96.7)    0.66(96.3)
          Direct    .75   -             -             0.45(96.8)    0.44(94.0)
                    .50   -             -             0.52(95.7)    0.51(94.2)
                    .25   -             -             0.67(95.2)    0.64(93.2)
     30   Pivotal   .75   0.53(93.2)    0.51(93.7)    0.55(97.0)    0.52(96.3)
                    .50   0.69(94.5)    0.64(93.8)    0.73(96.8)    0.67(96.0)
                    .25   1.00(91.6)    0.81(88.3)    1.06(96.9)    0.91(95.7)
          Transf.   .75   0.54(94.7)    0.52(94.2)    0.56(97.6)    0.53(96.3)
                    .50   0.71(95.1)    0.66(94.2)    0.76(97.6)    0.69(96.5)
                    .25   1.17(94.2)    0.91(92.1)    1.23(97.7)    1.03(97.3)
          Direct    .75   -             -             0.53(96.8)    0.51(93.8)
                    .50   -             -             0.70(95.3)    0.66(94.5)
                    .25   -             -             1.14(92.3)    0.94(93.3)
Acknowledgments

This work was supported by a National Research Foundation of Korea grant funded by the Korean Government (20090071660).

References

Altman, N., & Leger, C. (1995). Bandwidth selection for kernel distribution function estimation. Journal of Statistical Planning and Inference, 46, 195–214.
Azzalini, A. (1981). A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika, 68, 326–328.
Blum, J. R., & Susarla, V. (1980). Maximum deviation theory of density and failure rate function estimates based on censored data. Journal of the Multivariate Analysis, 5, 213–222.
Bowman, A., Hall, P., & Prvan, T. (1998). Bandwidth selection for the smoothing of distribution functions. Biometrika, 85, 799–808.
Breslow, N., & Crowley, J. (1974). A large sample study of the life table and product limit estimates under random censorship. The Annals of Statistics, 2, 437–453.
Csörgo, S., & Horváth, L. (1983). The rate of strong uniform consistency for the product-limit estimator. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 62, 411–426.
Diehl, S., & Stute, W. (1988). Kernel density estimation in the presence of censoring. Journal of Multivariate Analysis, 25, 299–310.
Efron, B. (1967). The two-sample problem with censored data. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability: Vol. 4 (pp. 831–853).
Farin, G. (1990). Curves and surfaces for computer aided geometric design. London: Academic Press.
Fleming, T. R., & Harrington, D. P. (1991). Counting processes and survival analysis. New York: Wiley.
Földes, A., Rejtö, L., & Winter, B. B. (1981). Strong consistency properties of nonparametric estimators from randomly censored data, II: estimation of density and failure rate. Periodica Mathematica Hungarica, 12, 15–29.
Greenwood, M. (1926). Reports on public health and medical subjects: vol. 33. The natural duration of cancer. London: H. M. Stationery Office.
Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53, 457–481.
Kim, C. (1996). Nonparametric density estimation via the Bezier curve. In Proceedings of the section on statistical graphics of the American Statistical Association (pp. 25–28).
Kim, C., Kim, W., Hong, C., Park, B-U., & Jeong, M. (1999). Smoothing techniques via the Bezier curve. Communications in Statistics. Theory and Methods, 28, 1577–1596.
Kim, C., Park, B. U., Kim, W., & Lim, C. (2003). Bézier curve smoothing of the Kaplan–Meier estimator. The Annals of the Institute of Statistical Mathematics, 55, 359–367.
Klein, J. P., & Moeschberger, M. L. (2003). Survival analysis: Techniques for censored and truncated data (2nd ed.). New York: Springer.
Lawless, J. F. (2003). Statistical models and methods for lifetime data. New York: Wiley.
Marron, J. S., & Padgett, W. J. (1987). Asymptotically optimal bandwidth selection for kernel density estimators from randomly right-censored samples. The Annals of Statistics, 15, 1520–1535.
McNichols, D. T., & Padgett, W. J. (1986). Mean and variance of a kernel density estimator under the Koziol–Green model of random censorship. Sankhyā, Series A, 48, 150–168.
Miller, R. G., Jr. (1981). Survival analysis. New York: Wiley.
Padgett, W. J. (1986). A kernel-type estimator of a quantile function from right-censored data. Journal of the American Statistical Association, 81, 215–222.
Padgett, W. J., & McNichols, D. T. (1984). Nonparametric density estimation from censored data. Communications in Statistics. Theory and Methods, 13, 1581–1611.
Peterson, A. V., Jr. (1977). Expressing the Kaplan–Meier estimator as a function of empirical subsurvival functions. Journal of the American Statistical Association, 72, 854–858.
Reiss, R.-D. (1981). Nonparametric estimation of smooth distribution functions. Scandinavian Journal of Statistics, 8, 116–119.
Sarda, P. (1993). Smoothing parameter selection for smooth distribution functions. Journal of the American Statistical Association, 4, 831–853.
Wand, M. P., & Jones, M. C. (1995). Kernel smoothing. London: Chapman and Hall.
Yandell, B. (1983). Nonparametric inference for rates with censored survival data. The Annals of Statistics, 11, 1119–1135.