Statistics and Probability Letters 134 (2018) 150–158
The nonparametric quantile estimation for length-biased and right-censored data
Jianhua Shi a,b,*, Huijuan Ma c, Yong Zhou d,e
a School of Mathematics and Statistics, Minnan Normal University, Zhangzhou, China
b Fujian Key Laboratory of Mathematical Analysis and Applications, Fujian Normal University, Fuzhou, China
c Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
d School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
e Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
Article info
Article history: Received 18 April 2017; received in revised form 21 August 2017; accepted 31 October 2017; available online 23 November 2017.
Keywords: Length-biased data; Right-censored data; Quantile estimation; Bahadur representation

Abstract
This paper studies nonparametric estimation of the quantile function under length-biased and right-censored data, using the property of length-biased sampling that the residual lifetime shares the same distribution as the truncation time. A nonparametric estimator of the quantile function is proposed based on the improved product-limit estimator of the distribution function, which takes into account the auxiliary information about the length-biased sampling. Asymptotic properties of the estimator are derived, and numerical simulation studies are conducted to assess its performance. An application to the Channing house data is also given. © 2017 Elsevier B.V. All rights reserved.
1. Introduction
In prevalent cohort studies in survival analysis, right-censored time-to-event data are often collected subject to left truncation, since individuals who have experienced the failure event before the recruitment time are not observable. When the incidence of the disease follows a stationary Poisson process, the left-truncated data are called 'length-biased data'. As a special case of left truncation, length-biased sampling means that the left-truncation variable is uniformly distributed and the sampling probability of a survival time is proportional to its length. See, for example, the dementia survey by the Canadian Study of Health and Aging (1994), in which over 10,000 elderly Canadians (65 years or older) living in institutions or at home were screened for dementia in 1991. Such survival times arising from length-biased sampling are left truncated by uniformly distributed random truncation times when the incidence of disease onset follows a stationary Poisson process (Winter and Földes, 1988; De Uña-Álvarez, 2004). It has been shown that the residual lifetime shares the same distribution as the truncation time in length-biased data (Huang and Qin, 2011). Another real length-biased example is the Channing house dataset (Hyde, 1980); see Section 5.
Because ignoring the length-biased information can lead to substantial overestimation of the survival time, classical survival analysis methods such as the Kaplan–Meier estimator fail when the sample is length-biased. In recent years, a large body of literature on statistical inference for length-biased data has been published, such as Addona and Wolfson (2006), Shen et al. (2009), Qin and Shen (2010), Carone et al. (2012), Huang and Qin (2012), Huang et al. (2012), Chen et al. (2015), Qiu et al. (2016), etc.
Various methods for estimating the quantile function have been proposed for different types of complex data.
* Correspondence to: Minnan Normal University, Zhangzhou, Fujian, 363000, China. E-mail address: [email protected] (J. Shi). https://doi.org/10.1016/j.spl.2017.10.020 0167-7152/© 2017 Elsevier B.V. All rights reserved.
Under the random right censorship model, Aly et al. (1985) studied the quantile process of the product-limit (PL) estimator via strong approximation methods. For left-truncated data, Gürler et al. (1993) derived a Bahadur type representation, together with confidence intervals and bands, for the quantile function of the corresponding product-limit estimator. Other work on quantile function estimation addresses the left-truncation and right-censoring (LTRC) model. Zhou (1997) studied the quantile function estimator in depth using empirical process techniques and obtained some precise Bahadur type representations for the estimator. Using the kernel smoothing method, Zhou et al. (2000) obtained asymptotic normality and a Berry–Esseen type bound for the kernel quantile estimator. Wang et al. (2015) proposed a nonparametric maximum likelihood estimator of the quantile residual lifetime, established its asymptotic properties, and illustrated its other good statistical properties through simulation studies and a real data analysis. Other references include Takeuchi et al. (2006), Zhao et al. (2011), Liang and de Uña-Álvarez (2011) and Zhang and Tan (2015), among others.
In this paper, a nonparametric estimator for the quantile of the distribution function (d.f.) for length-biased and right-censored (LBRC) data is proposed. The proposed quantile estimator is based on the improved product-limit estimator of the distribution function, which takes the auxiliary information about the length-biased sampling scheme into account.
The rest of the paper is organized as follows. Section 2 introduces some notation and the proposed quantile estimator. Section 3 presents some large sample properties, such as consistency, two Bahadur type representations and asymptotic normality. In Section 4, some simulation studies are conducted to evaluate the behavior of the quantile estimator. In Section 5, a real dataset is analyzed to illustrate the application of our approach.
Finally, proofs of the theoretical results are provided in Appendix A, while several related lemmas and their proofs are deferred to the Supplementary materials.

2. Nonparametric quantile estimation
Denote by $(A^0, T^0, C^0)$ a random vector, where $T^0$ is the survival time of interest with continuous distribution function $F(\cdot)$ and density function $f(\cdot)$, $A^0$ is the left truncation time with distribution function $F_A(\cdot)$, and $C^0$ is the total censoring time. Without loss of generality, it is assumed that they are all nonnegative random variables with $C^0$ and $(A^0, T^0)$ being mutually independent, but $T^0$ and $C^0$ may be dependent. Define $Y^0 = T^0 \wedge C^0 = \min(T^0, C^0)$ and the censoring indicator $\Delta^0 = I(T^0 \le C^0)$. In the LBRC setting, nothing is observed if $Y^0 < A^0$; only individuals with $Y^0 \ge A^0$ can be observed. Moreover, it is assumed that the survival time $T^0$ is independent of the calendar time of disease onset $W^0$. In this model, it is reasonable to assume that $\alpha = P(Y^0 \ge A^0) > 0$ and that the calendar time of disease onset is uniformly distributed on the interval between zero and the sampling time. For simplicity, unless stated otherwise, the superscript 0 is dropped to indicate observed quantities; for example, $T$ denotes the observed survival time, $V = T - A$ the corresponding residual time, etc. Then for each individual, the observed random vector is $(A, Y, \Delta)$, whose $n$ independent and identically distributed copies are denoted by $\{(a_i, y_i, \delta_i),\ i = 1, 2, \ldots, n\}$. Define
\[ \tilde Q(t) = \frac{1}{n}\sum_{i=1}^{n}\left[I(a_i \le t) + \delta_i I(y_i - a_i \le t)\right] \]
and
\[ \tilde K(t) = \frac{1}{n}\sum_{i=1}^{n}\left[I(a_i \ge t) + I(y_i - a_i \ge t)\right]. \]
Then the survival function of $A$ can be consistently estimated by the Kaplan–Meier type estimator
\[ \tilde S_A(t) = \prod_{u\in[0,t]}\left\{1 - \frac{d\tilde Q(u)}{\tilde K(u)}\right\}. \]
A nonparametric composite likelihood estimator (Huang and Qin, 2011) for $F(\cdot)$ that incorporates the LBRC information is given by
\[ 1 - \tilde F_n(t) = \prod_{u\in[0,t]}\left\{1 - d\tilde\Lambda(u)\right\}, \tag{2.1} \]
where
\[ \tilde\Lambda(t) = \int_0^t \frac{d\bar N(u)}{\tilde R_n(u)}, \qquad \tilde R_n(t) = n^{-1}\sum_{j=1}^{n} I(y_j \ge t) - \tilde S_A(t), \qquad \bar N(t) = n^{-1}\sum_{j=1}^{n}\delta_j I(y_j \le t). \]
Recently, Shi et al. (2015) established an almost sure representation for the estimator $\tilde F_n(t)$, which helps us to study the properties of the quantile function. As is well known, the quantile function of a d.f. $G(\cdot)$ is defined as $G^{-1}(p) = \inf\{u : G(u) \ge p\}$, $p \in (0, 1)$. This paper focuses on estimating the quantile $F^{-1}(p)$ for some constant $0 < p < 1$ based on LBRC data. A natural estimator, defined via the product-limit estimator (2.1), is
\[ \tilde F_n^{-1}(p) = \inf\{u : \tilde F_n(u) \ge p\}, \quad p \in (0, 1),\ n = 1, 2, \ldots. \tag{2.2} \]
This is in contrast to the nonparametric maximum likelihood estimator (Vardi, 1989), which has no closed form in the presence of censoring.
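The definitions (2.1) and (2.2) can be turned into a short computation. The following is only a minimal sketch, not the authors' implementation: the function names `lbrc_cdf` and `lbrc_quantile` are hypothetical, the estimator is evaluated only at the uncensored failure times, and the code assumes that the estimated risk function $\tilde R_n$ stays positive at those times, which may fail in very small samples.

```python
import numpy as np

def lbrc_cdf(a, y, delta):
    """Sketch of estimator (2.1): returns failure times and F_tilde at those times."""
    a = np.asarray(a, float)
    y = np.asarray(y, float)
    delta = np.asarray(delta, int)
    n = len(y)
    v = y - a  # observed residual times y_i - a_i

    # Kaplan-Meier type estimator S_A: Q_tilde jumps at every a_i and at
    # every uncensored residual time; K_tilde counts the pooled at-risk set.
    u = np.unique(np.concatenate([a, v[delta == 1]]))
    dQ = np.array([np.sum(a == t) + np.sum(v[delta == 1] == t) for t in u]) / n
    K = np.array([np.sum(a >= t) + np.sum(v >= t) for t in u]) / n
    SA_at_u = np.cumprod(1.0 - dQ / K)

    def S_A(t):
        k = np.searchsorted(u, t, side="right")  # number of jump points <= t
        return 1.0 if k == 0 else SA_at_u[k - 1]

    # Product-limit estimator (2.1): N_bar jumps at uncensored y_j.
    times = np.unique(y[delta == 1])
    dN = np.array([np.sum((y == t) & (delta == 1)) for t in times]) / n
    R = np.array([np.mean(y >= t) - S_A(t) for t in times])  # assumed > 0
    return times, 1.0 - np.cumprod(1.0 - dN / R)

def lbrc_quantile(a, y, delta, p):
    """Sketch of (2.2): smallest failure time u with F_tilde(u) >= p."""
    times, F = lbrc_cdf(a, y, delta)
    idx = np.nonzero(F >= p)[0]
    return times[idx[0]] if idx.size else np.inf
```

For instance, with the toy data `a = [1, 2, 0.5]`, `y = [3, 5, 4]` and no censoring, the sketch gives $\tilde F_n = (0.4, 0.7, 1.0)$ at the times $(3, 4, 5)$, so the estimated median is 4.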
3. The asymptotic properties of the quantile estimator
For any distribution function $G(\cdot)$, define the left and right endpoints of its support as $a_G = \inf\{x : G(x) > 0\}$ and $b_G = \sup\{x : G(x) < 1\}$, respectively. Under the LBRC sampling mechanism, similar to Gijbels and Wang (1993), Zhou (1996), and Zhou and Yip (1999), we always assume that $a_{F_A} \le a_H$ and $b_{F_A} \le b_H$, where $H(\cdot)$ is the d.f. of $Y$. Define $F^u(t) = P(\Delta = 1, Y \le t)$ and $R(t) = P(A \le t \le Y)$. To prove our main results, we also need the integral condition (A1), which is similar to assumption (1.3) of Zhou and Yip (1999) for LTRC data.
(A1) For $a_H < b < b_H$, $\int_{a_H}^{b} R^{-3}(u)\,dF^u(u) < \infty$.
Theorem 1. Assume that (A1) is satisfied, $0 < p_0 < p_1 < F(b)$, and, for an arbitrarily small number $\delta > 0$, the d.f. $F(\cdot)$ is differentiable on $[F^{-1}(p_0) - \delta, F^{-1}(p_1) + \delta]$ with $f(\cdot)$ positive on this interval. Then
\[ \sup_{p_0 \le p \le p_1}\left|\tilde F_n^{-1}(p) - F^{-1}(p)\right| = \begin{cases} O_p(n^{-1/2}), \\ O((n^{-1}\log\log n)^{1/2}), & \text{a.s.} \end{cases} \]
To obtain the asymptotic normality and the law of the iterated logarithm for the proposed estimator, a Bahadur type representation is needed. Otherwise, it would be a daunting task to derive the limiting distribution of the quantile estimator.
Theorem 2 (Bahadur Type Representation of the Quantile Estimator). Assume that (A1) is satisfied and that, for $0 < p < F(b)$, the d.f. $F(\cdot)$ is differentiable in a neighborhood of $F^{-1}(p)$ with $f(F^{-1}(p)) > 0$. Then
\[ \tilde F_n^{-1}(p) - F^{-1}(p) = f^{-1}(F^{-1}(p))\left[p - \tilde F_n(F^{-1}(p))\right] + \begin{cases} o_p(n^{-1/2}), \\ o((n^{-1}\log\log n)^{1/2}), & \text{a.s.} \end{cases} \]
By Theorem 2, the asymptotic normality of the quantile estimator can be derived, which is stated as Theorem 3. To state Theorem 3, let $\tilde V = \min(V, C)$, where $C$ is the time from study enrollment to censoring, and denote $Q(t) = E\{I(A \le t) + \Delta I(\tilde V \le t)\}$ and $K(t) = E\{I(A \ge t) + I(\tilde V \ge t)\}$. Introduce a mean zero process for each $i = 1, \ldots, n$,
\[ \phi_i(t) = \int_0^t K(u)^{-2}\{I(a_i \ge u) + I(\tilde v_i \ge u)\}\,dQ(u) - \frac{I(a_i \le t)}{K(a_i)} - \frac{\delta_i I(\tilde v_i \le t)}{K(\tilde v_i)}. \]
Write $\psi_i(t) = \psi_{1i}(t) + \psi_{2i}(t)$ with
\[ \psi_{1i}(t) = \int_0^t R(u)^{-2} I(a_i \le u \le y_i)\,dF^u(u) - \frac{\delta_i I(y_i \le t)}{K(y_i)} \]
and
\[ \psi_{2i}(t) = \int_0^t R(u)^{-2}\{I(a_i \ge u) - S_A(u) - \phi_i(u) S_A(u)\}\,dF^u(u), \]
where $S_A(t)$ is the survival function of $A$.
Theorem 3. If the condition of Theorem 2 is satisfied, then
\[ \sqrt{n}\left(\tilde F_n^{-1}(p) - F^{-1}(p)\right) \xrightarrow{D} N(0, \sigma^2(p)), \]
where $\xrightarrow{D}$ denotes convergence in law, and
\[ \sigma^2(p) = [f(F^{-1}(p))]^{-2}\,\Sigma^2(F^{-1}(p)), \qquad \Sigma^2(t) = S^2(t)\,E\psi_1^2(t). \]
Remark 1. As an application of Theorem 3, an approximate confidence interval for $F^{-1}(p)$ can be constructed. Let $\hat f(\cdot)$ be some nonparametric estimate of the density function $f(\cdot)$, and let $\hat\Sigma^2(\cdot)$ be some consistent estimate of $\Sigma^2(\cdot)$. Denote by $Z_\alpha = \Phi^{-1}(1 - \alpha)$ the $1 - \alpha$ quantile of the standard normal distribution, where $\alpha \in (0, 1)$. Then, under the conditions of Theorem 3, an approximate confidence interval with level $1 - \alpha$ for $F^{-1}(p)$ is
\[ \left[\tilde F_n^{-1}(p) - \frac{Z_{\alpha/2}\,\hat\Sigma(\tilde F_n^{-1}(p))}{\sqrt{n}\,\hat f(\tilde F_n^{-1}(p))},\ \tilde F_n^{-1}(p) + \frac{Z_{\alpha/2}\,\hat\Sigma(\tilde F_n^{-1}(p))}{\sqrt{n}\,\hat f(\tilde F_n^{-1}(p))}\right]. \]
A naive estimate of $f(\cdot)$ can be obtained from the order statistics of the observable survival time $T$. To this end, write $l_n = p - n^{-1/2}\hat\sigma_n Z_{\alpha_1}$ and $u_n = p + n^{-1/2}\hat\sigma_n Z_{\alpha_2}$, where $\alpha = \alpha_1 + \alpha_2$ and $\hat\sigma_n$ is a consistent estimate of $\sigma(p)$; the approximate confidence interval is then of the form $[\tilde F_n^{-1}(l_n), \tilde F_n^{-1}(u_n)]$. By Lemma 2 and the uniform convergence result of Theorem 1, as $n \to \infty$,
\[ \sqrt{n}\left[\tilde F_n^{-1}(u_n) - \tilde F_n^{-1}(l_n)\right] \to (Z_{\alpha_1} + Z_{\alpha_2})\,\sigma(F^{-1}(p))/f(F^{-1}(p)), \quad \text{a.s.} \]
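The interval $[\tilde F_n^{-1}(l_n), \tilde F_n^{-1}(u_n)]$ of Remark 1 only needs the two shifted probability levels and a quantile function. The sketch below is illustrative: `quantile_levels`, `quantile_interval`, `quantile_fn` and `sigma_hat` are hypothetical names, the estimate $\hat\sigma_n$ is assumed to be supplied by the user, and the clipping of the levels to $(0, 1)$ is an added safeguard that is not part of the text.

```python
from statistics import NormalDist

def quantile_levels(p, n, sigma_hat, alpha1, alpha2):
    """Return (l_n, u_n) with l_n = p - n^{-1/2} sigma_hat Z_{alpha1}, and similarly u_n."""
    z = NormalDist().inv_cdf  # Z_alpha = Phi^{-1}(1 - alpha) is z(1 - alpha)
    l_n = p - sigma_hat * z(1.0 - alpha1) / n ** 0.5
    u_n = p + sigma_hat * z(1.0 - alpha2) / n ** 0.5
    # Assumption: keep both levels strictly inside (0, 1) so a quantile exists.
    eps = 1e-12
    return max(l_n, eps), min(u_n, 1.0 - eps)

def quantile_interval(quantile_fn, p, n, sigma_hat, alpha1=0.025, alpha2=0.025):
    """Evaluate a user-supplied estimated quantile function at l_n and u_n."""
    l_n, u_n = quantile_levels(p, n, sigma_hat, alpha1, alpha2)
    return quantile_fn(l_n), quantile_fn(u_n)
```

Here `quantile_fn` would be the estimated quantile function $\tilde F_n^{-1}$; choosing $\alpha_1 = \alpha_2 = \alpha/2$ gives an interval that is symmetric on the probability scale.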
Similarly, by utilizing Theorem 2, we can derive a law of the iterated logarithm (LIL) for the quantile estimator.
Corollary 1 (LIL for the Quantile Estimator). Under the assumptions of Theorem 2, if for $0 < p < 1$ the d.f. $F(\cdot)$ is twice continuously differentiable in a neighborhood of $F^{-1}(p)$, then
\[ \limsup_{n\to\infty}\ n^{1/2}(\log\log n)^{-1/2}\left[\tilde F_n^{-1}(p) - F^{-1}(p)\right] = \sqrt{2\sigma^2(p)}, \quad \text{a.s.} \]
Based on the above discussion, together with Lemma 2 and Theorem 1, a uniform strong version of the Bahadur representation theorem can also be established.
Theorem 4. Suppose that condition (A1) is satisfied, $0 < p_0 \le p \le p_1 < F(b)$, and, for an arbitrarily small number $\delta > 0$, the density function $f(\cdot)$ is continuously differentiable on $[F^{-1}(p_0) - \delta, F^{-1}(p_1) + \delta]$ with $f(\cdot) > 0$. Then
\[ \sup_{p_0 \le p \le p_1}\left|\tilde F_n^{-1}(p) - F^{-1}(p) - \frac{p - \tilde F_n(F^{-1}(p))}{f(F^{-1}(p))}\right| = \begin{cases} o_p(n^{-1/2}), \\ o((n^{-1}\log\log n)^{1/2}), & \text{a.s.} \end{cases} \]
Furthermore, if $f(\cdot)$ is twice continuously differentiable on $[F^{-1}(p_0) - \delta, F^{-1}(p_1) + \delta]$, then
\[ \sup_{p_0 \le p \le p_1}\left|\tilde F_n^{-1}(p) - F^{-1}(p) - \frac{p - \tilde F_n(F^{-1}(p))}{f(F^{-1}(p))}\right| = O(n^{-3/4}(\log n)^{1/2}(\log\log n)^{1/4}), \quad \text{a.s.} \]
Corollary 2. Under the assumptions of Theorem 4,
\[ \sup_{p_0 \le p \le p_1} f(F^{-1}(p))\left|\tilde F_n^{-1}(p) - F^{-1}(p)\right| = O(n^{-1/2}(\log\log n)^{1/2}), \quad \text{a.s.} \]
4. Simulation studies for the quantile estimator
In this section, we conduct simulation studies to examine the finite sample performance of the proposed estimator and compare it with an existing nonparametric method. We first generate the length-biased and right-censored data under the mechanism proposed in Huang and Qin (2011). More specifically, to mimic the disease incidence in the prevalent population, the sampling time $\xi$ is set to 100, and $W^0$ is simulated from a uniform(0, 100) distribution. The unbiased survival time $T^0$ is independently generated from three Weibull distributions with survival functions $S(t) = \exp(-t^2/4)$, $\exp(-t^2/2)$ and $\exp(-t^2)$, respectively. An observation $(W^0, T^0)$ qualifies for the prevalent cohort when $W^0 + T^0 \ge \xi$. The generation is repeated until the sample size reaches $n$. Correspondingly, the residual censoring time is simulated under six scenarios, i.e., from the uniform distributions U(0, 10.8), U(0, 5.6), U(0, 8.0), U(0, 4.0), U(0, 5.5) and U(0, 2.8), respectively, giving censoring rates of about 10% or 20%; see Table 1.
We compare the proposed estimator $\tilde F_n^{-1}(p)$, denoted by Q_LBRC, with the well-known LTRC quantile estimator (Tsai et al., 1987), denoted by Q_LTRC, for three probability values, i.e., p = 0.25, 0.50, 0.75. For each of the sample sizes n = 200 and n = 400, we generate 1000 replicates. BIAS, SD and MSE denote the empirical bias, the empirical standard deviation and the mean square error, respectively. The results are summarized in Table 1.
It is clear from Table 1 that both estimators are approximately unbiased, with very small biases in all of the scenarios. However, the proposed estimator Q_LBRC has smaller BIAS, SD and MSE than Q_LTRC in almost all situations. Moreover, as the sample size increases, the values of BIAS, SD and MSE all decrease. All of this indicates good large sample performance of $\tilde F_n^{-1}(p)$.
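The data-generating mechanism described above can be sketched as follows. This is a rough illustration under stated assumptions rather than the authors' code: the function name `simulate_lbrc` and its parameters are hypothetical, the truncation time is taken to be $A = \xi - W^0$, and censoring is assumed to occur at $A$ plus the uniform residual censoring time.

```python
import numpy as np

def simulate_lbrc(n, scale=2.0, cens_upper=10.8, xi=100.0, seed=0):
    """Draw n LBRC observations (a_i, y_i, delta_i).

    scale=2.0 with Weibull shape 2 gives S(t) = exp(-t^2/4);
    use scale=2**0.5 for exp(-t^2/2) and scale=1.0 for exp(-t^2).
    """
    rng = np.random.default_rng(seed)
    a_parts, t_parts, kept = [], [], 0
    while kept < n:
        w0 = rng.uniform(0.0, xi, size=20000)      # onset calendar times
        t0 = scale * rng.weibull(2.0, size=20000)  # unbiased survival times
        a0 = xi - w0                               # assumed truncation time
        keep = t0 >= a0                            # still alive at sampling time
        a_parts.append(a0[keep])
        t_parts.append(t0[keep])
        kept += int(keep.sum())
    a = np.concatenate(a_parts)[:n]
    t = np.concatenate(t_parts)[:n]
    c = rng.uniform(0.0, cens_upper, size=n)       # residual censoring time
    y = np.minimum(t, a + c)                       # observed follow-up time
    delta = (t <= a + c).astype(int)               # 1 = failure observed
    return a, y, delta
```

Sampling in batches is only a convenience: with $\xi = 100$ and survival times of order one, only a small fraction of the generated pairs qualifies for the prevalent cohort.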
To further illustrate the performance of the proposed estimator, another simulation is conducted with higher censoring rates of 50% and 61%, and the results are presented in Table 2. The simulation results in Table 2 are quite similar to those in Table 1, except that the reported values are slightly larger than those obtained under the lower censoring rates. Intuitively, a higher censoring rate usually results in poorer estimates. On the other hand, even when the censoring rate reaches 61%, the MSEs in the simulation are still small. In summary, both Tables 1 and 2 show that the proposed quantile estimator performs better on LBRC data than the classic LTRC quantile estimator. This is not surprising because the proposed estimator incorporates the auxiliary information about the length-biased data.
Remark 2. When the above assumption is violated, i.e., the incidence of disease onset does not follow a stationary Poisson process and thus the truncation times are not uniformly distributed, the proposed quantile estimator performs worse than the LTRC quantile estimator because the distribution of the truncation time is misspecified. We also conduct simulation studies in which the data are sampled from the LTRC setting, and the results are summarized in Table 3.
Table 1 Simulation results for the length-biased and right censored data (I).
S(t)        n    C%   Stat  p = 0.25          p = 0.5           p = 0.75
                            Q_LBRC   Q_LTRC   Q_LBRC   Q_LTRC   Q_LBRC   Q_LTRC
exp{−t²/4}  200  10%  BIAS  −0.0068  −0.0060  −0.0027  −0.0048  −0.0036  −0.0053
                      SD     0.1390   0.1432   0.1084   0.1186   0.1035   0.1191
                      MSE    0.0194   0.0205   0.0118   0.0141   0.0107   0.0142
                 20%  BIAS   0.0028  −0.0035   0.0048   0.0001   0.0063   0.0030
                      SD     0.1338   0.1469   0.1145   0.1272   0.1101   0.1293
                      MSE    0.0179   0.0216   0.0131   0.0162   0.0122   0.0167
            400  10%  BIAS   0.0002  −0.0017  −0.0002  −0.0029  −0.0001  −0.0032
                      SD     0.0897   0.0967   0.0758   0.0829   0.0737   0.0834
                      MSE    0.0080   0.0094   0.0058   0.0069   0.0054   0.0070
                 20%  BIAS  −0.0061  −0.0092  −0.0027  −0.0061  −0.0051  −0.0077
                      SD     0.0993   0.1160   0.0801   0.0916   0.0758   0.0876
                      MSE    0.0099   0.0135   0.0064   0.0084   0.0058   0.0077
exp{−t²/2}  200  10%  BIAS   0.0011  −0.0030   0.0006  −0.0033  −0.0007  −0.0053
                      SD     0.0950   0.1040   0.0774   0.0849   0.0737   0.0834
                      MSE    0.0090   0.0108   0.0060   0.0072   0.0054   0.0070
                 20%  BIAS  −0.0047  −0.0073  −0.0062  −0.0074  −0.0051  −0.0053
                      SD     0.0965   0.1065   0.0816   0.0932   0.0767   0.0908
                      MSE    0.0093   0.0114   0.0067   0.0087   0.0059   0.0083
            400  10%  BIAS   0.0001  −0.0006   0.0004   0.0003   0.0006   0.0006
                      SD     0.0656   0.0688   0.0530   0.0586   0.0505   0.0580
                      MSE    0.0043   0.0047   0.0028   0.0034   0.0026   0.0034
                 20%  BIAS  −0.0011  −0.0023  −0.0023  −0.0035  −0.0002  −0.0024
                      SD     0.0669   0.0718   0.0563   0.0628   0.0532   0.0604
                      MSE    0.0045   0.0052   0.0032   0.0040   0.0028   0.0037
exp{−t²}    200  10%  BIAS   0.0000  −0.0018  −0.0003  −0.0024  −0.0010  −0.0030
                      SD     0.0705   0.0739   0.0588   0.0656   0.0517   0.0589
                      MSE    0.0050   0.0055   0.0035   0.0043   0.0027   0.0035
                 20%  BIAS   0.0007   0.0001   0.0027   0.0024  −0.0011  −0.0011
                      SD     0.0676   0.0715   0.0545   0.0616   0.0538   0.0642
                      MSE    0.0046   0.0051   0.0030   0.0038   0.0029   0.0041
            400  10%  BIAS  −0.0013  −0.0019  −0.0015  −0.0028  −0.0015  −0.0018
                      SD     0.0461   0.0485   0.0393   0.0436   0.0364   0.0422
                      MSE    0.0021   0.0024   0.0015   0.0019   0.0013   0.0018
                 20%  BIAS  −0.0037  −0.0039  −0.0033  −0.0045  −0.0024  −0.0033
                      SD     0.0512   0.0528   0.0462   0.0510   0.0399   0.0443
                      MSE    0.0026   0.0028   0.0021   0.0026   0.0016   0.0020
Note: S(t) stands for the survival function of $T^0$, n stands for the sample size, C% stands for the censoring rate.
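The summary measures reported in the simulation tables can be computed from the replicated estimates as in the sketch below. The helper names `weibull_quantile` and `summarize` are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not specify the divisor.

```python
import numpy as np

def weibull_quantile(p, scale):
    """True quantile when S(t) = exp(-(t/scale)^2), e.g. scale=2 for exp(-t^2/4)."""
    return scale * np.sqrt(-np.log(1.0 - p))

def summarize(estimates, truth):
    """Empirical BIAS, SD and MSE of replicated quantile estimates."""
    est = np.asarray(estimates, float)
    bias = est.mean() - truth
    sd = est.std(ddof=1)                 # assumed divisor: replicates - 1
    mse = np.mean((est - truth) ** 2)    # approximately sd^2 + bias^2
    return bias, sd, mse
```

With 1000 replicates, MSE is close to SD² + BIAS², which is a convenient consistency check on any row of the tables.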
5. Application to the Channing house data
To illustrate the practical usefulness of the proposed method, we apply it to the Channing house data (Hyde, 1980). To assess the impact of the Channing House medical program on survival time, the dataset was collected from 1964 to July 1, 1975, and includes 365 women and 97 men, whose ages on entry and on death or leaving were recorded. Only those individuals who lived longer than the recruitment time can be observed; hence, the lifetime is left truncated. During the follow-up period, 130 women and 46 men died at the Channing House, and many of the others were censored since they were still alive at the end of the study, which resulted in a censoring rate of about 62%. As Chen and Zhou (2012) pointed out, the subset of individuals whose age at recruitment was more than 786 months (65.5 years) forms a length-biased data set. The eligible sub-sample consists of 448 observations, with 173 survival times and 275 censoring times. We use the year as the time unit for simplicity and plot the quantile lifetime curves obtained with the proposed method and the LTRC method; see Fig. 1. To compare the empirical standard deviations of the two methods, a nonparametric bootstrap is adopted by sampling 448 subjects with replacement from the dataset. The resampling procedure is repeated 1000 times, with a mean censoring rate of about 61%, and the mean estimated quantile and the standard deviation over the 1000 replications are calculated at p = 0.05, 0.25, 0.5, 0.75, 0.8, respectively. The results of the real data analysis are summarized in Table 4. As expected, the proposed quantile estimator has a smaller standard deviation than the LTRC method.

Acknowledgments
Shi's work is supported by Natural Science Foundation of Fujian Province (2016J01026), China. Ma's work is partially supported by National Institutes of Health grant R01 HL113548, USA.
Zhou's work was supported by the State Key Program of National Natural Science Foundation of China (71331006), the State Key Program in the Major Research Plan of National Natural Science Foundation of China (91546202), National Center for Mathematics and Interdisciplinary Sciences (NCMIS), Key Laboratory of RCSDS, AMSS, CAS (2008DP173182) and Innovative Research Team of Shanghai University of Finance and Economics (IRTSHUFE13122402). The authors wish to thank two anonymous referees and the Associate Editor for many helpful comments.

Table 2 Simulation results for the length-biased and right censored data (II).
S(t)        n    C%   Stat  p = 0.25          p = 0.5           p = 0.75
                            Q_LBRC   Q_LTRC   Q_LBRC   Q_LTRC   Q_LBRC   Q_LTRC
exp{−t²/4}  200  50%  BIAS  −0.0009  −0.0034  −0.0002  −0.0031  −0.0027  −0.0021
                      SD     0.1355   0.1478   0.1243   0.1268   0.1257   0.1471
                      MSE    0.0184   0.0219   0.0154   0.0215   0.0158   0.0216
                 61%  BIAS   0.0010  −0.0038   0.0063   0.0019   0.0016   0.0006
                      SD     0.1450   0.1559   0.1313   0.1469   0.1413   0.1631
                      MSE    0.0210   0.0243   0.0173   0.0216   0.0200   0.0266
            400  50%  BIAS  −0.0030  −0.0039   0.0000  −0.0007   0.0029   0.0031
                      SD     0.0970   0.1041   0.0852   0.0971   0.0897   0.1074
                      MSE    0.0094   0.0109   0.0073   0.0094   0.0081   0.0115
                 61%  BIAS   0.0026   0.0033   0.0007   0.0020  −0.0001   0.0012
                      SD     0.1030   0.1093   0.0908   0.1015   0.1034   0.1193
                      MSE    0.0106   0.0120   0.0082   0.0103   0.0107   0.0142
exp{−t²/2}  200  50%  BIAS   0.0032  −0.0012  −0.0016  −0.0037  −0.0018  −0.0034
                      SD     0.0984   0.1075   0.0843   0.0943   0.0868   0.1021
                      MSE    0.0097   0.0116   0.0071   0.0089   0.0075   0.0104
                 61%  BIAS  −0.0034  −0.0031  −0.0019  −0.0029  −0.0067  −0.0053
                      SD     0.1020   0.1102   0.0897   0.1034   0.1025   0.1247
                      MSE    0.0104   0.0121   0.0081   0.0107   0.0106   0.0156
            400  50%  BIAS   0.0013   0.0005  −0.0010  −0.0012  −0.0025  −0.0017
                      SD     0.0737   0.0792   0.0688   0.0789   0.0643   0.0910
                      MSE    0.0054   0.0063   0.0047   0.0062   0.0041   0.0083
                 61%  BIAS  −0.0020  −0.0026  −0.0022  −0.0034  −0.0054  −0.0065
                      SD     0.0871   0.0808   0.0676   0.0846   0.0714   0.0863
                      MSE    0.0061   0.0065   0.0046   0.0072   0.0051   0.0075
exp{−t²}    200  50%  BIAS  −0.0015  −0.0019  −0.0019  −0.0045  −0.0046  −0.0055
                      SD     0.0745   0.0778   0.0622   0.0737   0.0651   0.0794
                      MSE    0.0055   0.0061   0.0039   0.0055   0.0043   0.0063
                 61%  BIAS  −0.0011  −0.0022  −0.0007  −0.0019  −0.0031  −0.0022
                      SD     0.0773   0.0825   0.0659   0.0748   0.0718   0.0879
                      MSE    0.0060   0.0068   0.0043   0.0056   0.0052   0.0077
            400  50%  BIAS  −0.0040  −0.0035  −0.0034  −0.0032  −0.0021  −0.0028
                      SD     0.0534   0.0554   0.0499   0.0545   0.0508   0.0547
                      MSE    0.0029   0.0031   0.0025   0.0030   0.0026   0.0030
                 61%  BIAS   0.0009  −0.0003   0.0008   0.0010  −0.0005   0.0006
                      SD     0.0525   0.0561   0.0460   0.0523   0.0511   0.0604
                      MSE    0.0028   0.0031   0.0021   0.0027   0.0026   0.0036
Note: S(t) stands for the survival function of $T^0$, n stands for the sample size, C% stands for the censoring rate.

Fig. 1. Quantile Curves for the Channing House Data.
Table 3 Simulation results when the truncation time A is not uniformly distributed.
S(t)        n    C%   Stat  p = 0.25          p = 0.5           p = 0.75
                            Q_LBRC   Q_LTRC   Q_LBRC   Q_LTRC   Q_LBRC   Q_LTRC
exp{−t²/4}  200  10%  BIAS  −0.1865   0.0065  −0.2662   0.0058  −0.3478   0.0057
                      SD     0.1078   0.1012   0.0939   0.0959   0.0940   0.1077
                      MSE    0.0464   0.0103   0.0797   0.0092   0.1298   0.0116
                 20%  BIAS  −0.1926   0.0023  −0.2779  −0.0009  −0.3696  −0.0034
                      SD     0.1085   0.1047   0.0967   0.0952   0.0920   0.1123
                      MSE    0.0489   0.0110   0.0866   0.0091   0.1451   0.0126
            400  10%  BIAS  −0.1920  −0.0014  −0.2744  −0.0025  −0.3548  −0.0035
                      SD     0.0804   0.0744   0.0684   0.0696   0.0663   0.0778
                      MSE    0.0433   0.0055   0.0800   0.0048   0.1303   0.0061
                 20%  BIAS  −0.1942  −0.0010  −0.2781   0.0003  −0.3690  −0.0029
                      SD     0.0809   0.0743   0.0692   0.0734   0.0696   0.0841
                      MSE    0.0443   0.0055   0.0821   0.0054   0.1410   0.0071
exp{−t²/2}  200  10%  BIAS  −0.1072   0.0019  −0.1547   0.0007  −0.1981  −0.0014
                      SD     0.0836   0.0799   0.0704   0.0726   0.0680   0.0800
                      MSE    0.0185   0.0064   0.0289   0.0053   0.0439   0.0064
                 20%  BIAS  −0.1079   0.0022  −0.1584  −0.0015  −0.2082  −0.0012
                      SD     0.0851   0.0821   0.0720   0.0727   0.0675   0.0782
                      MSE    0.0189   0.0067   0.0303   0.0053   0.0479   0.0061
            400  10%  BIAS  −0.1094   0.0007  −0.1541   0.0008  −0.1999  −0.0007
                      SD     0.0575   0.0539   0.0503   0.0509   0.0490   0.0553
                      MSE    0.0153   0.0029   0.0263   0.0026   0.0424   0.0031
                 20%  BIAS  −0.1131  −0.0013  −0.1600  −0.0020  −0.2100  −0.0010
                      SD     0.0615   0.0606   0.0503   0.0515   0.0478   0.0575
                      MSE    0.0166   0.0037   0.0281   0.0027   0.0464   0.0033
exp{−t²}    200  10%  BIAS  −0.0627  −0.0013  −0.0845  −0.0007  −0.1086  −0.0001
                      SD     0.0638   0.0642   0.0527   0.0644   0.0489   0.0585
                      MSE    0.0080   0.0041   0.0099   0.0041   0.0142   0.0034
                 20%  BIAS  −0.0653  −0.0040  −0.0923  −0.0031  −0.1159  −0.0022
                      SD     0.0609   0.0629   0.0496   0.0533   0.0491   0.0597
                      MSE    0.0080   0.0040   0.0110   0.0029   0.0159   0.0036
            400  10%  BIAS  −0.0624  −0.0012  −0.0860  −0.0005  −0.1099   0.0008
                      SD     0.0468   0.0464   0.0382   0.0392   0.0351   0.0400
                      MSE    0.0061   0.0022   0.0089   0.0015   0.0133   0.0016
                 20%  BIAS  −0.0604   0.0030  −0.0855   0.0027  −0.1113   0.0022
                      SD     0.0416   0.0401   0.0371   0.0382   0.0348   0.0409
                      MSE    0.0054   0.0016   0.0087   0.0015   0.0136   0.0017
Note: S(t) stands for the survival function of $T^0$, n stands for the sample size, C% stands for the censoring rate.

Table 4 Analysis results for the Channing house data (C% = 61.8%).
      Estimator  p = 0.05  p = 0.25  p = 0.50  p = 0.75  p = 0.80
EST   Q_LBRC     5.2681    13.3462   19.3871   24.9351   26.5298
      Q_LTRC     5.0273    13.4185   19.4938   25.1258   26.8682
SD    Q_LBRC     1.7810    1.5471    0.5647    0.7171    1.3752
      Q_LTRC     1.7849    1.7801    0.6752    0.8866    1.4690
Note: EST stands for the estimator for quantile, SD stands for the empirical standard deviation, Q_LBRC stands for the LBRC quantile estimator, Q_LTRC stands for the LTRC quantile estimator.
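The bootstrap summaries (EST and SD) used in the real data analysis can be sketched as follows. This is an illustrative outline only: `bootstrap_sd` and its `estimator` argument are hypothetical names, and any function taking `(a, y, delta, p)` and returning an estimated quantile can be plugged in.

```python
import numpy as np

def bootstrap_sd(a, y, delta, estimator, probs, n_boot=1000, seed=0):
    """Nonparametric bootstrap mean and SD of quantile estimates.

    estimator(a, y, delta, p) is a user-supplied quantile estimator;
    whole subjects (a_i, y_i, delta_i) are resampled with replacement.
    """
    rng = np.random.default_rng(seed)
    a, y, delta = (np.asarray(x) for x in (a, y, delta))
    n = len(y)
    reps = np.empty((n_boot, len(probs)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices of the resampled subjects
        reps[b] = [estimator(a[idx], y[idx], delta[idx], p) for p in probs]
    return reps.mean(axis=0), reps.std(axis=0, ddof=1)
```

Resampling whole subjects keeps each triple $(a_i, y_i, \delta_i)$ intact, so the truncation and censoring structure of every resample matches that of the original data.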
Appendix A
In this section, the proofs of the main results are presented. Moreover, some lemmas are stated, with their proofs given in the supplementary materials. By a decomposition together with Theorem 2.2 in Shi et al. (2015), we obtain the following strong representation for $\tilde F_n(t)$.
Lemma 1. Under condition (A1), define a process
\[ L_n(t) = \int_{a_H}^{t}\frac{d[\bar N(u) - F^u(u)]}{R(u)} - \int_{a_H}^{t}\frac{\tilde R_n(u) - R(u)}{R^2(u)}\,dF^u(u), \]
then uniformly on $a_H \le t \le b < b_H$,
\[ \tilde F_n(t) - F(t) = (1 - F(t))L_n(t) + r_n(t) \]
with the negligible remainder term
\[ \sup_{a_H \le t \le b}|r_n(t)| = O((n^{-1}\log n)^{3/4}) \quad \text{a.s.} \]
Lemma 2. Assume that $F(\cdot)$ is Lipschitz continuous of order one on $[a, b]$, where $a_H < a < b < b_F$, and $\{k_n, n = 1, 2, \ldots\}$ is a sequence of positive constants satisfying the following conditions.
(i) $k_n^{-1} = o(n)$;
(ii) $n k_n/\log k_n^{-1} \to c$;
(iii) $\log k_n^{-1}/\log n \to d$,
where $c, d$ are two constants or $\infty$. Then if $a_H < a_F$,
\[ \sup_{\substack{a \le s,t \le b \\ |s-t| \le k_n}}\left|\tilde F_n(t) - \tilde F_n(s) - (F(t) - F(s))\right| = O((n^{-1}k_n\log k_n^{-1})^{1/2}), \quad \text{a.s.} \]
Lemma 3. Suppose that condition (A1) holds and the d.f. $F(\cdot)$ is continuous. Then for $0 < p_0 \le p \le p_1 < 1$, the following uniform bound holds:
\[ \sup_{p_0 \le p \le p_1}\left|\tilde F_n(\tilde F_n^{-1}(p)) - p\right| = O(n^{-1}), \quad \text{a.s.} \]
Proof of Theorem 1. The condition of Theorem 1 together with Lemma 3 implies that
\begin{align*}
\left|\tilde F_n^{-1}(p) - F^{-1}(p)\right| &= (f(F_*^{-1}(p)))^{-1}\left|F(\tilde F_n^{-1}(p)) - F(F^{-1}(p))\right| \\
&\le (f(F_*^{-1}(p)))^{-1}\left|F(\tilde F_n^{-1}(p)) - \tilde F_n(\tilde F_n^{-1}(p))\right| + (f(F_*^{-1}(p)))^{-1}\left|\tilde F_n(\tilde F_n^{-1}(p)) - p\right| \\
&\le M \sup_{[F^{-1}(p_0)-\delta,\,F^{-1}(p_1)+\delta]}\left|\tilde F_n(u) - F(u)\right| + O(n^{-1}),
\end{align*}
for some $F_*^{-1}(p)$ between $F^{-1}(p)$ and $\tilde F_n^{-1}(p)$ with $p_0 \le p \le p_1$. Combining Lemma 1 and the properties of the empirical process, one may show that
\[ \sup_{[F^{-1}(p_0)-\delta,\,F^{-1}(p_1)+\delta]}\left|\tilde F_n(u) - F(u)\right| = \begin{cases} O_p(n^{-1/2}), \\ O((n^{-1}\log\log n)^{1/2}), & \text{a.s.} \end{cases} \]
This ends the proof of Theorem 1.
Proof of Theorem 2. It is obvious from Lemma 3 that for $p_0 \le p \le p_1$,
\[ \tilde F_n(\tilde F_n^{-1}(p)) - \tilde F_n(F^{-1}(p)) = p - \tilde F_n(F^{-1}(p)) + O(1/n), \quad \text{a.s.} \]
On the other hand, by Lemma 2, Lemma 3 and Theorem 1, with probability one,
\begin{align*}
\tilde F_n(\tilde F_n^{-1}(p)) - \tilde F_n(F^{-1}(p)) &= F(\tilde F_n^{-1}(p)) - F(F^{-1}(p)) + O(n^{-3/4}(\log n)^{1/2}(\log\log n)^{1/4}) \\
&= f(F^{-1}(p))[\tilde F_n^{-1}(p) - F^{-1}(p)] + o(\tilde F_n^{-1}(p) - F^{-1}(p)) + O(n^{-3/4}(\log n)^{1/2}(\log\log n)^{1/4}), \quad \text{a.s.}
\end{align*}
This ends the proof of Theorem 2.
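The final step of the proof of Theorem 2 is left implicit. One way to spell it out, offered here as a sketch of the presumably intended argument rather than the authors' own wording, is to equate the two expressions for $\tilde F_n(\tilde F_n^{-1}(p)) - \tilde F_n(F^{-1}(p))$ and use the almost sure rate of Theorem 1 to control the $o(\cdot)$ term:

```latex
\begin{align*}
f(F^{-1}(p))\,[\tilde F_n^{-1}(p) - F^{-1}(p)]
  &= p - \tilde F_n(F^{-1}(p)) - o\big(\tilde F_n^{-1}(p) - F^{-1}(p)\big)
     + O\big(n^{-3/4}(\log n)^{1/2}(\log\log n)^{1/4}\big) + O(n^{-1}) \\
  &= p - \tilde F_n(F^{-1}(p)) + o\big((n^{-1}\log\log n)^{1/2}\big), \quad \text{a.s.},
\end{align*}
% Theorem 1 gives \tilde F_n^{-1}(p) - F^{-1}(p) = O((n^{-1}\log\log n)^{1/2}) a.s.,
% and n^{-3/4}(\log n)^{1/2}(\log\log n)^{1/4} = o(n^{-1/2}).
```

Dividing both sides by $f(F^{-1}(p)) > 0$ gives the almost sure form of the representation; the in-probability form follows in the same way from the $O_p(n^{-1/2})$ rate in Theorem 1.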
Appendix B. Supplementary data
Supplementary material related to this article can be found online at https://doi.org/10.1016/j.spl.2017.10.020.

References
Addona, V., Wolfson, D.B., 2006. A formal test for the stationarity of the incidence rate using data from a prevalent cohort study with follow-up. Lifetime Data Anal. 12 (3), 267–284.
Aly, E.A.A., Csörgő, M., Horváth, L., 1985. Strong approximations of the quantile process of the product-limit estimator. J. Multivariate Anal. 16 (2), 185–210.
Carone, C., Asgharian, M., Wang, M.C., 2012. Nonparametric incidence estimation from prevalent cohort survival data. Biometrika 99 (3), 599–613.
Chen, X.P., Shi, J.H., Zhou, Y., 2015. Monotone rank estimation of transformation models with length-biased and right-censored data. Sci. China Math. 58 (10), 2055–2068.
Chen, X.R., Zhou, Y., 2012. Quantile regression for right-censored and length-biased data. Acta Math. Appl. Sin. Engl. Ser. 28 (3), 443–462.
De Uña-Álvarez, J., 2004. Nonparametric estimation under length-biased sampling and Type I censoring: a moment based approach. Ann. Inst. Statist. Math. 56 (4), 667–681.
Gijbels, I., Wang, J.L., 1993. Strong representations of the survival function estimator for truncated and censored data with applications. J. Multivariate Anal. 47 (47), 210–229.
Gürler, Ü., Stute, W., Wang, J.L., 1993. Weak and strong quantile representations for randomly truncated data with applications. Statist. Probab. Lett. 17 (2), 139–148.
Huang, C.Y., Qin, J., 2011. Nonparametric estimation for length-biased and right-censored data. Biometrika 98 (1), 177–186.
Huang, C.Y., Qin, J., 2012. Composite partial likelihood estimation under length-biased sampling, with application to a prevalent cohort study of dementia. J. Amer. Statist. Assoc. 107 (499), 946–957.
Huang, C.Y., Qin, J., Follmann, D.A., 2012. A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling. Biometrika 99 (1), 199–210.
Hyde, J., 1980. Testing survival with incomplete observations tasks. In: Miller, R.G., Efron, B., Brown, B.W., Moses, L.E. (Eds.), Biostatistics casebook. Wiley, New York, 31–46.
Liang, H.Y., de Uña-Álvarez, J., 2011. Conditional quantile estimation with auxiliary information for left-truncated and dependent data. J. Statist. Plann. Inference 141 (11), 3475–3488.
Qin, J., Shen, Y., 2010. Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics 66 (2), 382–392.
Qiu, Z.P., Qin, J., Zhou, Y., 2016. Composite estimating equation method for the accelerated failure time model with length-biased sampling data. Scand. J. Stat. 43, 396–415.
Shen, Y., Ning, J., Qin, J., 2009. Analyzing length-biased data with semiparametric transformation and accelerated failure time models. J. Amer. Statist. Assoc. 104 (487), 1192–1202.
Shi, J.H., Chen, X.P., Zhou, Y., 2015. The strong representation for the nonparametric estimator of length-biased and right-censored data. Statist. Probab. Lett. 104, 49–57.
Takeuchi, I., Le, Q.V., Sears, T.D., Smola, A.J., 2006. Nonparametric quantile estimation. J. Mach. Learn. Res. 7, 1231–1264.
Tsai, W.Y., Jewell, N.P., Wang, M.C., 1987. A note on the product-limit estimator under right censoring and left truncation. Biometrika 74 (4), 883–886.
Vardi, Y., 1989. Multiplicative censoring, renewal processes, deconvolution and decreasing density: Nonparametric estimation. Biometrika 76 (4), 751–761.
Wang, Y.X., Liu, P., Zhou, Y., 2015. Quantile residual lifetime for left-truncated and right-censored data. Sci. China Math. (English Ser.) 58 (6), 1217–1234.
Winter, B.B., Földes, A., 1988. A product-limit estimator for use with length-biased data. Canad. J. Statist. 16, 337–355.
Zhang, F.P., Tan, Z., 2015. A new nonparametric quantile estimate for length-biased data with competing risks. Econom. Lett. 137, 10–12.
Zhao, M., Bai, F.F., Zhou, Y., 2011. Relative deficiency of quantile estimators for left truncated and right censored data. Statist. Probab. Lett. 81 (11), 1725–1732.
Zhou, Y., 1996. A note on the TJW product-limit estimator for truncated and censored data. Statist. Probab. Lett. 26 (4), 381–387.
Zhou, Y., 1997. The product-limit quantile estimator for randomly truncated and censored data. Acta Math. Appl. Sin. 20 (3), 456–465 (in Chinese).
Zhou, X., Sun, L.Q., Ren, H.B., 2000. Quantile estimation for left truncated and right censored data. Statist. Sinica 10 (4), 1217–1229.
Zhou, Y., Yip, P., 1999. A strong representation of the product-limit estimator for left truncated and right censored data. J. Multivariate Anal. 69 (2), 261–280.