Modified self-consistent estimators of the survival function with twice censored data


Journal of Statistical Planning and Inference 142 (2012) 1549–1556

Journal homepage: www.elsevier.com/locate/jspi

Pao-sheng Shen
Department of Statistics, Tunghai University, Taichung 40704, Taiwan

Article history: Received 27 September 2010; received in revised form 16 June 2011; accepted 4 January 2012; available online 13 January 2012.

Keywords: Left-censored; Right-censored; Modified self-consistent

Abstract

Using a modified self-consistent (MSC) approach, Shen (2011) proposed an MSC estimator $\hat S_n$ of the survival function for twice censored data. In this paper, based on the modified likelihood function, an alternative MSC estimator $S_n$ is proposed. The asymptotic properties of $S_n$ are derived. Simulation results indicate that the estimator $S_n$ can outperform $\hat S_n$ when left-censoring is heavy. © 2012 Elsevier B.V. All rights reserved.

doi:10.1016/j.jspi.2012.01.007

1. Introduction

In survival studies, the observed data are often both right censored and left censored. Consider the example of a reliability system which consists of three components $E_1$, $E_2$ and $E_3$, with $E_1$ and $E_2$ in series and $E_3$ in parallel with this series system. Let $T$, $U$ and $L$ denote the lifetimes of the three components $E_1$, $E_2$ and $E_3$, respectively. Suppose that $T$, $U$ and $L$ are independent of one another and that, when the system fails, we are able to determine which component failed at the same time as the system. When the system fails, we can only observe a lifetime variable $X = \max[\min(T,U), L]$ and an indicator variable $\delta$ with $\delta = 1$ if $L < T \le U$, $\delta = 2$ if $L < U < T$ and $\delta = 3$ if $\min(T,U) \le L$. We say that $X$ is a twice censored observation of $T$.

Note that twice censoring at first glance is the same as double censoring, where we also only observe a lifetime variable $X = \max[\min(T,U), L]$. However, the two schemes turn out to be quite distinct (see the example of double censorship of Turnbull, 1974). Under the double censoring scheme, $L$ and $U$ are dependent on each other with $P(L < U) = 1$, and we observe a lifetime variable $X = \max[\min(T,U), L]$ and an indicator variable $\delta$ with $\delta = 1$ if $L < T \le U$, $\delta = 2$ if $U < T$ and $\delta = 3$ if $T \le L$.

Let $(X_1,\delta_1),\ldots,(X_n,\delta_n)$ denote the observed sample and $x_1 < x_2 < \cdots < x_J$ be the distinct values of the $X_i$'s in increasing order. For component $E_1$, let $d_j = \sum_{i=1}^n I[X_i = x_j, \delta_i = 1]$ denote the number of failure times at $x_j$. Similarly, define $c_j = \sum_{i=1}^n I[X_i = x_j, \delta_i = 2]$ and $e_j = \sum_{i=1}^n I[X_i = x_j, \delta_i = 3]$ for components $E_2$ and $E_3$, respectively. Let $F$, $Q$ and $G$ denote the distribution functions of $T$, $U$ and $L$, respectively. Under this model, Patilea and Rolin (2006) proposed a product-limit estimator $\hat S_P(t)$ of $S_F(t) = 1 - F(t) = P(T > t)$ as follows:

$$\hat S_P(t) = \prod_{x_i \le t}\left\{1 - \frac{d_i}{n\hat G_n(x_i) - R_n(x_{i-1})}\right\},$$

where

$$R_n(x) = \sum_{j=1}^{J} I[x_j \le x] \quad\text{and}\quad \hat G_n(x) = \prod_{x_j \ge x}\left\{1 - \frac{e_j}{R_n(x_j)}\right\}.$$
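The product-limit construction above can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the function names `simulate_twice_censored` and `product_limit` are hypothetical, $R_n(x_j)$ is read as the number of observations with $X_i \le x_j$ (which coincides with the displayed definition when there are no ties), $R_n(x_0)$ is taken to be 0, and the exponential lifetimes are used purely as test data.

```python
import numpy as np

def simulate_twice_censored(n, rate_q, rate_g, rng):
    """Draw (X, delta) from independent exponential T (rate 1), U and L (illustrative data)."""
    T = rng.exponential(1.0, n)
    U = rng.exponential(1.0 / rate_q, n)
    L = rng.exponential(1.0 / rate_g, n)
    m = np.minimum(T, U)
    X = np.maximum(m, L)
    # delta = 3 if min(T,U) <= L, else 1 if T <= U, else 2
    delta = np.where(m <= L, 3, np.where(T <= U, 1, 2))
    return X, delta

def product_limit(X, delta):
    """Return the distinct values x_j and the product-limit estimate of S_F at each x_j."""
    n = len(X)
    x, inv = np.unique(X, return_inverse=True)
    inv = inv.ravel()
    d = np.bincount(inv, weights=(delta == 1).astype(float), minlength=len(x))
    e = np.bincount(inv, weights=(delta == 3).astype(float), minlength=len(x))
    p = np.bincount(inv, minlength=len(x)).astype(float)  # observations at x_j
    R = np.cumsum(p)                                      # assumed reading of R_n(x_j)
    # G_hat(x_j) = prod over x_k >= x_j of (1 - e_k / R_n(x_k))
    G = np.cumprod((1.0 - e / R)[::-1])[::-1]
    R_prev = np.concatenate(([0.0], R[:-1]))              # R_n(x_{j-1}), with R_n(x_0) = 0
    denom = n * G - R_prev
    factor = np.ones(len(x))
    mask = d > 0                                          # only failure times change the product
    factor[mask] = np.clip(1.0 - d[mask] / denom[mask], 0.0, 1.0)
    return x, np.cumprod(factor)
```

To evaluate the estimate at a time point `t`, one can look up the last distinct value not exceeding `t`, for example with `np.searchsorted(x, t, side="right") - 1`.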

Note that $\hat G_n(x)$ is the product-limit estimator of $G(x) = P(L < x)$ and $n^{-1}R_n(x-)$ is the empirical function of $P(X < x) = P(L < x)P(\min(T,U) < x)$. Hence, $\hat G_n(x) - n^{-1}R_n(x-)$ is a consistent estimator of the probability $P(L < x)P(\min(U,T) \ge x)$. Let $a_F = \sup\{t : F(t) = 0\}$ and $b_F = \inf\{t : F(t) = 1\}$ denote the left and right endpoints of the support of $T$, respectively. Similarly, define $a_Q$ and $b_Q$ for $U$, and $a_G$ and $b_G$ for $L$. Let $(D, \|\cdot\|)$ be the Banach space of all real-valued functions defined on $(0,\infty)$ which are right-continuous and have left limits at $t \le \infty$, and $(D[a,b], \|\cdot\|)$ the restrictions of $h \in D$ to $[a,b] \subset (0,\infty)$. Under the condition $F(a_F) = Q(a_Q) = 0$, Patilea and Rolin (2006) established the strong convergence of $\hat S_P$, i.e. $\sup_{a_F \le t \le b_F} |\hat S_P(t) - S_F(t)| \to 0$ almost surely.

Let $\tilde F(t) = P(X \le t, \delta = 1)$, $\tilde Q(t) = P(X \le t, \delta = 2)$, $\tilde G(t) = P(X \le t, \delta = 3)$ and $\tilde H(t) = \tilde F(t) + \tilde Q(t) + \tilde G(t)$. Let $\tau > a_{\tilde F}$ be such that $\tilde F(\tau) + \tilde Q(\tau) < 1$. Under the condition

$$\int_{(a_{\tilde F},\infty)} \frac{\tilde G(du)}{[\tilde H(u)]^2} < \infty, \tag{1.1}$$

they also established the weak convergence of $\hat S_P(t)$ in $D[a_{\tilde F}, \tau]$. Note that when $F$, $Q$ and $G$ are continuous, $a_G \le \min(a_F, a_Q)$ and $b_G = b_F = b_Q$, we have $a_{\tilde F} = a_F$ and $\tau = b_F$. In this case, the strong consistency and weak convergence of $\hat S_P(t)$ hold on the set $(a_F, b_F)$.

Using an approach similar to the case of doubly censored data (see Turnbull, 1974; Tsai and Crowley, 1985; Chang and Yang, 1987), Shen (2011) proposed a modified self-consistent (MSC) estimator of $S_F$. His approach is described as follows. Consider the subdistribution function $\tilde Q(t) = P(X_i \le t, \delta_i = 2) = P(L < U < T, U \le t) = \int_{a_F}^{t} G(u)S_F(u)Q(du)$. Let $W(t) = P(X > t)$ and $S_G(t) = 1 - G(t)$. Then

$$W(t) = S_G(t) + S_F(t)S_Q(t) - S_F(t)S_G(t)S_Q(t). \tag{1.2}$$
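For readers who want the intermediate step, identity (1.2) can be checked directly from the independence of $T$, $U$ and $L$. The following worked line is a sketch that assumes $L$ is continuous, so that $P(L \le t) = G(t)$, and uses the fact that $X \le t$ holds exactly when both $L \le t$ and $\min(T,U) \le t$:

$$\begin{aligned}
W(t) &= 1 - P(L \le t)\,P(\min(T,U) \le t) \\
     &= 1 - [1 - S_G(t)]\,[1 - S_F(t)S_Q(t)] \\
     &= S_G(t) + S_F(t)S_Q(t) - S_F(t)S_G(t)S_Q(t).
\end{aligned}$$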

Note that Eq. (1.2) can be rewritten as

$$S_F(t) = W(t) - S_G(t) + S_F(t)Q(t) + S_F(t)S_G(t)S_Q(t).$$

Recall the subdistribution function $\tilde Q(t) = P(L < U < T, U \le t) = \int_{a_F}^{t} G(u)S_F(u)Q(du)$. When $G$ and $S_F$ are known, $S_Q(t)$ can be estimated by

$$n^{-1}\sum_{i=1}^{n} \frac{I[X_i > t, \delta_i = 2]}{G(X_i)S_F(X_i)}.$$

Let $\tilde W_n(t) = n^{-1}\sum_{i=1}^n I[X_i > t]$ and $\tilde Q_n(t) = n^{-1}\sum_{i=1}^n I[X_i \le t, \delta_i = 2]$ denote the empirical survival and distribution functions of $W(t)$ and $\tilde Q(t)$, respectively. Now, we require the estimators of $S_F$, $Q$ and $S_Q$ (denoted by $\hat S_n$, $\hat Q_n$ and $\hat S_{Q_n}$, respectively) to be related through $\tilde Q_n$. Imposing the condition $\hat S_{Q_n}(0) = 1$, we have $\hat Q_n(t) = \int_0^t [\hat G_n(u)\hat S_n(u)]^{-1}\tilde Q_n(du)$ and $\hat S_{Q_n}(t) = 1 - \hat Q_n(t)$. Thus, an MSC estimator $\hat S_n(t)$ can be obtained by solving the following equation:

$$\hat S_n(t) = \tilde W_n(t) - \hat S_{G_n}(t) + \hat S_n(t)\int_0^t \frac{\tilde Q_n(du)}{\hat G_n(u)\hat S_n(u)} + \hat S_n(t)\hat S_{G_n}(t)\left[1 - \int_0^t \frac{\tilde Q_n(du)}{\hat G_n(u)\hat S_n(u)}\right], \tag{1.3}$$

where $\hat S_{G_n}(t) = 1 - \hat G_n(t)$. Shen (2011) demonstrated that the MSC estimator $\hat S_n$ outperforms the product-limit estimator $\hat S_P$ and that its advantage over the product-limit estimator can be very significant when right censoring is heavy. Under the condition $G(u)S_Q(u) > 0$ for $u \in (a_F, \tau)$, Shen (2011) established the weak convergence of $\hat S_n(t)$ in $D(a_F, \tau]$. Note that for doubly censored data (i.e. $P(L < U) = 1$), Eq. (1.3) is reduced to

$$\hat S_n(t) = \tilde W_n(t) + \hat S_n(t)\int_0^t \frac{\tilde Q_n(du)}{\hat S_n(u)} - [1 - \hat S_n(t)]\hat S_{G_n}(t), \tag{1.4}$$

which differs from the self-consistent equation of Tsai and Crowley (1985) (or Chang and Yang, 1987), and from the EM algorithm of Mykland and Ren (1996).

In Section 2, based on the modified likelihood function, an alternative MSC estimator $S_n$ is proposed. The asymptotic properties of $S_n$ are derived. In Section 3, a simulation study is conducted to compare the performance of $\hat S_n$ and $S_n$.

2. The proposed estimator

First, for twice censored data, given $S_Q$, the likelihood function for $S_F$ is proportional to the function

$$L(S_F) = \prod_{i=1}^{J} \big(S_F(x_{i-1}) - S_F(x_i)\big)^{d_i}\big(S_F(x_i)\big)^{c_i}\big[1 - S_F(x_i)S_Q(x_i)\big]^{e_i}.$$


Since the likelihood function involves both $S_F$ and $S_Q$, it is difficult to derive the NPMLE of $S_F$. Similar to Theorem 1 of Mykland and Ren (1996), given $S_Q$, the log-likelihood $\log L(S_F)$ is strictly concave on the set of $P_i = S_F(x_i)$ satisfying (i) $1 \ge P_1 \ge \cdots \ge P_J \ge 0$; (ii) $P_{i-1} > P_i$ if $d_i > 0$; (iii) $P_i > 0$ if $c_i > 0$; and (iv) $P_i < 1$ if $e_i > 0$.

Given $S_Q$, the relationship between the EM algorithm and the self-consistent estimator can be derived as follows. Assume that $S_0$ and $S$ only put mass at the observed points $x_1 < x_2 < \cdots < x_J$ and the interval $(x_J, \infty)$. Define $f_0(x) = P_{S_0}(X = x)$ and $f(x) = P_S(X = x)$. Then, given $S_Q$ and $G$, we have

$$\begin{aligned}
U(S \mid S_0) &= E[\log L(S) \mid S_0] \\
&= \sum_{i=1}^{J}\left[ d_i \log f(x_i) + c_i\,\frac{\sum_{x_j > x_i} p_j f_0(x_j)\log f(x_j)}{S_0(x_i)} + e_i\,\frac{S_Q(x_i)\sum_{x_j \le x_i} p_j f_0(x_j)\log f(x_j)}{1 - S_0(x_i)S_Q(x_i)} + e_i\,\frac{Q(x_i)\sum_{j=1}^{J+1} p_j f_0(x_j)\log f(x_j)}{1 - S_0(x_i)S_Q(x_i)} \right] \\
&= \sum_{j=1}^{J+1}\left[ d_j + p_j f_0(x_j)\sum_{i=1}^{J}\frac{c_i I[x_i < x_j]}{S_0(x_i)} + p_j f_0(x_j)\sum_{i=1}^{J}\frac{e_i S_Q(x_i) I[x_i \ge x_j]}{1 - S_0(x_i)S_Q(x_i)} + p_j f_0(x_j)\sum_{i=1}^{J}\frac{e_i Q(x_i)}{1 - S_0(x_i)S_Q(x_i)} \right]\log f(x_j),
\end{aligned}$$

where $p_j = d_j + c_j + e_j$, $d_{J+1} = 0$, $p_{J+1} = 1$ and $x_{J+1} = x_J + c$ for some $c > 0$. To find the M-step $S_1$, that is, the $S_1$ which maximizes $U(S \mid S_0)$, we need to find an optimal point for the system

$$\max \sum_{j=1}^{J+1} \lambda_j \log q_j, \qquad \sum_{j=1}^{J+1} q_j = 1,$$

where $U(S \mid S_0) = \sum_{j=1}^{J+1}\lambda_j \log q_j$ with $q_j = f(x_j)$ and

$$\lambda_j = d_j + p_j f_0(x_j)\sum_{i=1}^{J}\frac{c_i I[x_i < x_j]}{S_0(x_i)} + p_j f_0(x_j)\sum_{i=1}^{J}\frac{e_i S_Q(x_i) I[x_i \ge x_j]}{1 - S_0(x_i)S_Q(x_i)} + p_j f_0(x_j)\sum_{i=1}^{J}\frac{e_i Q(x_i)}{1 - S_0(x_i)S_Q(x_i)}.$$

Similar to the proof of Theorem 6 of Mykland and Ren (1996), it follows that $U(S \mid S_0)$ has a unique maximum given by $\hat q_j = \lambda_j / \sum_{j=1}^{J+1}\lambda_j$. A direct calculation gives $\sum_{j=1}^{J+1}\lambda_j = n$, so the updated survival function is $S_1(x_k) = n^{-1}\sum_{j=1}^{J+1}\lambda_j I[x_j > x_k]$. Next, for any $k$, we have

$$\begin{aligned}
\sum_{j=1}^{J+1}\lambda_j I[x_j > x_k]
&= \sum_{j=1}^{J+1} I[x_j > x_k]\left[ d_j + p_j f_0(x_j)\sum_{i=1}^{J}\frac{c_i I[x_i < x_j]}{S_0(x_i)} + p_j f_0(x_j)\sum_{i=1}^{J}\frac{e_i S_Q(x_i) I[x_i \ge x_j]}{1 - S_0(x_i)S_Q(x_i)} + p_j f_0(x_j)\sum_{i=1}^{J}\frac{e_i Q(x_i)}{1 - S_0(x_i)S_Q(x_i)} \right] \\
&= \sum_{i=1}^{J}\sum_{j=1}^{J+1} I[x_j > x_k]\left\{ \frac{d_j}{J} + p_j f_0(x_j)\frac{c_i I[x_i < x_j]}{S_0(x_i)} + p_j f_0(x_j)\frac{e_i S_Q(x_i) I[x_i \ge x_j]}{1 - S_0(x_i)S_Q(x_i)} + p_j f_0(x_j)\frac{e_i Q(x_i)}{1 - S_0(x_i)S_Q(x_i)} \right\} \\
&= n\tilde W_{1n}(x_k) + \sum_{i=1}^{J}\left[ \frac{c_i}{S_0(x_i)}\sum_{j=1}^{J+1} I[x_j > x_k \vee x_i]\, p_j f_0(x_j) + \frac{e_i S_Q(x_i)}{1 - S_0(x_i)S_Q(x_i)}\sum_{j=1}^{J+1} I[x_k < x_j \le x_i]\, p_j f_0(x_j) + \frac{e_i Q(x_i)}{1 - S_0(x_i)S_Q(x_i)}\sum_{j=1}^{J+1} I[x_k < x_j]\, p_j f_0(x_j) \right] \\
&= n\tilde W_{1n}(x_k) + \sum_{i=1}^{J}\left\{ I[x_i > x_k]\frac{c_i}{S_0(x_i)}S_0(x_i) + I[x_i \le x_k]\frac{c_i}{S_0(x_i)}S_0(x_k) + I[x_i > x_k]\frac{e_i S_Q(x_i)}{1 - S_0(x_i)S_Q(x_i)}[S_0(x_k) - S_0(x_i)] + \frac{S_0(x_k)e_i Q(x_i)}{1 - S_0(x_i)S_Q(x_i)} \right\} \\
&= n\tilde W_{1n}(x_k) + n\tilde W_{2n}(x_k) + \sum_{i=1}^{J}\left\{ \frac{c_i I[x_i \le x_k]}{S_0(x_i)}S_0(x_k) + I[x_i > x_k]\frac{e_i S_Q(x_i)}{1 - S_0(x_i)S_Q(x_i)}[S_0(x_k) - S_0(x_i)] + \frac{S_0(x_k)e_i Q(x_i)}{1 - S_0(x_i)S_Q(x_i)} \right\}.
\end{aligned}$$



Here $\tilde W_{1n}(t) = n^{-1}\sum_{j=1}^{J} d_j I[x_j > t]$ and $\tilde W_{2n}(t) = n^{-1}\sum_{j=1}^{J} c_j I[x_j > t]$. Hence, given $S_Q$ and $G$, the following self-consistent equation is equivalent to the EM algorithm:

$$\begin{aligned}
S(t) &= \tilde W_{1n}(t) + \tilde W_{2n}(t) + S(t)\int_0^t \frac{\tilde Q_n(du)}{S(u)} + S(t)\int_0^\infty \frac{Q(u)}{1 - S(u)S_Q(u)}\tilde G_n(du) + \int_t^\infty \frac{S_Q(u)[S(t) - S(u)]}{1 - S(u)S_Q(u)}\tilde G_n(du) \\
&= \tilde W_n(t) + S(t)\int_0^t \frac{\tilde Q_n(du)}{S(u)} + S(t)\int_0^t \frac{Q(u)}{1 - S(u)S_Q(u)}\tilde G_n(du) - \int_t^\infty \frac{1 - S(t)}{1 - S(u)S_Q(u)}\tilde G_n(du),
\end{aligned} \tag{2.1}$$

where $\tilde G_n(u) = \sum_{j=1}^{J} e_j I[x_j \le u]/n$ and $\tilde Q_n(u) = \sum_{j=1}^{J} c_j I[x_j \le u]/n$ are the empirical subdistribution functions of $\tilde G(u)$ and $\tilde Q(u)$, respectively. Hence, given $S_Q$, iterating (2.1) is the EM algorithm, and the NPMLE of $S_F$ is the solution of (2.1). Notice that Eq. (2.1) can also be derived from the following equation:

$$S_F(t) = P(L < T < U, T > t) + P(L < U < T, U > t) + P(L < U < t, T > t) + P(U < L, T > t) + P(T < L < U, T > t).$$

For doubly censored data, since $P(U > L) = 1$, (2.1) is reduced to

$$\begin{aligned}
S(t) &= \tilde W_{1n}(t) + \tilde W_{2n}(t) + S(t)\int_0^t \frac{\tilde Q_n(du)}{S(u)} + \int_t^\infty \frac{S(t) - S(u)}{1 - S(u)}\tilde G_n(du) \\
&= \tilde W_n(t) + S(t)\int_0^t \frac{\tilde Q_n(du)}{S(u)} - \int_t^\infty \frac{1 - S(t)}{1 - S(u)}\tilde G_n(du),
\end{aligned} \tag{2.2}$$

where $\tilde W_n(t) = \tilde W_{1n}(t) + \tilde W_{2n}(t) + \tilde W_{3n}(t)$ and $\tilde W_{3n}(t) = n^{-1}\sum_{j=1}^{J} e_j I[x_j > t]$. Note that Eq. (2.2) is exactly the EM algorithm of Mykland and Ren (1996) for doubly censored data. As pointed out in Section 1, the self-consistent equation proposed by Shen (2011) does not reduce to (2.2). Since $S_n$ is asymptotically efficient with doubly censored data (see Yu and Li, 2001), it is expected that the estimator $S_n$ should outperform the estimator $\hat S_n$ when the sample size is large and $P(L < U) \approx 1$. Simulation results in Section 3 confirm this assertion.

Although, given $S_Q$, one can obtain the NPMLE by solving (2.1), $S_Q$ is unknown in practice. To obtain an alternative MSC estimator based on (2.1), we need to estimate $S_Q$. Similar to the product-limit estimator $\hat S_P$, we can obtain a product-limit estimator of $S_Q(t)$ by

$$\hat S_Q(t) = \prod_{x_i \le t}\left\{1 - \frac{c_i}{n\hat G_n(x_i) - R_n(x_{i-1})}\right\}.$$

Similar to $\hat S_P$, we can establish the strong convergence of $\hat S_Q$, i.e. $\sup_{a_Q \le t \le b_Q}|\hat S_Q(t) - S_Q(t)| \to 0$ almost surely. Let $\tau > a_{\tilde Q}$ be such that $\tilde F(\tau) + \tilde Q(\tau) < 1$. Under the condition $\int_{(a_{\tilde Q},\infty)} \tilde G(du)/[\tilde H(u)]^2 < \infty$, we can also establish the weak convergence of $\hat S_Q(t)$ in $D[a_{\tilde Q}, \tau]$. Replacing $S_Q(x)$ in (2.1) by $\hat S_Q$, we obtain an alternative MSC estimator $S_n$ by solving the following MSC equation:

$$S_n(t) = \tilde W_n(t) + S_n(t)\int_0^t \frac{\tilde Q_n(du)}{S_n(u)} + S_n(t)\int_0^t \frac{1 - \hat S_Q(u)}{1 - S_n(u)\hat S_Q(u)}\tilde G_n(du) - \int_t^\infty \frac{1 - S_n(t)}{1 - S_n(u)\hat S_Q(u)}\tilde G_n(du). \tag{2.3}$$
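One possible way to solve an equation of this type numerically is a fixed-point (EM-type) iteration. The Python sketch below is an illustration, not the author's implementation: the names `safe_div` and `msc_estimator` are hypothetical, the data are assumed to have no ties (so $R_n(x_k) = k$), the starting value and stopping rule are arbitrary choices, and the update uses the first form of (2.1) with $S_Q$ replaced by the product-limit estimate $\hat S_Q$, which is algebraically the same right-hand side as (2.3).

```python
import numpy as np

def safe_div(num, den):
    """Elementwise num / den, returning 0 wherever den is not positive."""
    out = np.zeros_like(num, dtype=float)
    np.divide(num, den, out=out, where=den > 0)
    return out

def tail(a):
    """For each k, the sum of a_i over i > k."""
    return a.sum() - np.cumsum(a)

def msc_estimator(X, delta, n_iter=500, tol=1e-8):
    """Iterate the self-consistency equation at the sorted observations (no ties assumed)."""
    order = np.argsort(X)
    x, dl = X[order], delta[order]
    n = len(x)
    i1 = (dl == 1).astype(float)
    i2 = (dl == 2).astype(float)
    i3 = (dl == 3).astype(float)
    rank = np.arange(1, n + 1, dtype=float)           # R_n(x_k) = k without ties
    G = np.cumprod((1.0 - i3 / rank)[::-1])[::-1]     # G_hat(x_k)
    den = n * G - (rank - 1.0)                        # n * G_hat(x_k) - R_n(x_{k-1})
    SQ = np.cumprod(np.clip(1.0 - safe_div(i2, den), 0.0, 1.0))  # S_Q_hat(x_k)
    S = 1.0 - rank / (n + 1.0)                        # strictly positive starting value
    for _ in range(n_iter):
        w = safe_div(i3, 1.0 - S * SQ)                # e_i / (1 - S(x_i) S_Q_hat(x_i))
        new = (tail(i1 + i2)                          # n * (W_1n + W_2n)
               + S * np.cumsum(safe_div(i2, S))       # right-censored terms, x_i <= x_k
               + S * np.sum(w * (1.0 - SQ))           # left-censored terms with Q_hat = 1 - S_Q_hat
               + S * tail(w * SQ) - tail(w * SQ * S)  # left-censored terms, x_i > x_k
               ) / n
        new = np.clip(new, 0.0, 1.0)
        converged = np.max(np.abs(new - S)) < tol
        S = new
        if converged:
            break
    return x, S
```

The function returns the sorted observations and the value of the iterate at each of them; the estimate at a time `t` can be read off at index `np.searchsorted(x, t, side="right") - 1`.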

Notice that the self-consistent estimate based on (2.3) is not the NPMLE of $S_F$ since it only maximizes the estimated likelihood function

$$\hat L(S_F) = \prod_{i=1}^{J} \big(S_F(x_{i-1}) - S_F(x_i)\big)^{d_i}\big(S_F(x_i)\big)^{c_i}\big[1 - S_F(x_i)\hat S_Q(x_i)\big]^{e_i}.$$

When $S_Q$ is unknown, it is difficult to derive the NPMLE of $S_F$ since we are unable to obtain the joint likelihood function of $(X_i, \delta_i)$, $i = 1,\ldots,n$, for $S_F$ and $S_Q$. Further research is required in this area.

Next, we shall derive the asymptotic properties of the MSC estimator $S_n$. The proof of the following theorem is inspired by Gu and Zhang (1993), who derived the asymptotic properties of the SCE of Eq. (2.2) for doubly censored data.

Theorem 1. Suppose that $G$ is continuous and

$$P(L < t) - P(L < U < t) - P(U < L < t) = S_Q(t)G(t) > 0 \quad \text{holds on } (a_F, b_F). \tag{2.4}$$

Then, $\sup_{a_F < t < b_F}|S_n(t) - S_F(t)| \to 0$ a.s.

Proof. Similar to Theorem 1 of Gu and Zhang (1993), we shall first prove the uniqueness of the solution of (2.3) (Step A) and then prove the uniform consistency of $S_n$ (Step B).

Step A: Uniqueness of the solution of (2.3). The proof of uniqueness is similar to that of Lemma 1 of Gu and Zhang (1993) (see Appendix, p. 619). First, since $\tilde W_n(x) \to W(x)$, $\tilde Q_n \to \tilde Q$, $\tilde G_n \to \tilde G$ and $\hat S_Q \to S_Q$ uniformly, and $S_n$ satisfies (2.3), $S_{n_k}(t) \to S(t)$ for each $t$ as $n_k \to \infty$ implies

$$S(t) = W(t) + S(t)\int_0^t \frac{G(u)S_F(u)Q(du)}{S(u)} + S(t)\int_0^t \frac{Q(u)[1 - S_F(u)S_Q(u)]}{1 - S(u)S_Q(u)}G(du) - \int_t^\infty \frac{[1 - S(t)][1 - S_F(u)S_Q(u)]}{1 - S(u)S_Q(u)}G(du). \tag{2.5}$$

Let $S$ be a $[0,1]$-valued nonincreasing function with left support $a_F$ and right support $b_F$. By (2.5), it follows that

$$\begin{aligned}
[S(t) - S_F(t)]&\left\{1 - \int_0^t G(u)Q(du) - \int_0^t Q(u)G(du) - S_G(t)\right\} \\
&= -S(t)\int_0^t \frac{G(u)[S(u) - S_F(u)]}{S(u)}Q(du) - [1 - S(t)]\int_t^\infty \frac{[S(u) - S_F(u)]S_Q(u)}{1 - S(u)S_Q(u)}G(du) + S(t)\int_0^t \frac{[S(u) - S_F(u)]Q(u)S_Q(u)}{1 - S(u)S_Q(u)}G(du).
\end{aligned}$$

Let $h$ be a function such that

$$h(t)K(t) = -S(t)\int_0^t \frac{G(u)h(u)}{S(u)}Q(du) - [1 - S(t)]\int_t^\infty \frac{h(u)S_Q(u)}{1 - S(u)S_Q(u)}G(du) + S(t)\int_0^t \frac{h(u)Q(u)S_Q(u)}{1 - S(u)S_Q(u)}G(du), \tag{2.6}$$

where $K(t) = 1 - \int_0^t G(u)Q(du) - \int_0^t Q(u)G(du) - S_G(t) = S_Q(t)G(t)$. Suppose that (2.4) holds. We shall show that $h(t) = 0$ for all $t \in (a_F, b_F)$ by setting $h(t) = S(t) - S_F(t)$. Assume $h(t_0) > 0$ and $0 < S(t_0) < 1$ at some point $t_0$. Our goal is to establish a contradiction. Define

$$g(t) = -\int_{u \le t} \frac{G(u)h(u)}{S(u)}Q(du) + \int_{u > t} \frac{h(u)S_Q(u)}{1 - S(u)S_Q(u)}G(du) + \int_{u \le t} \frac{h(u)Q(u)S_Q(u)}{1 - S(u)S_Q(u)}G(du).$$

Note that both $K(t)$ and $g(t)$ are right continuous and that their definitions differ from those of Gu and Zhang (1993), where they were defined as $K(t) = S_Q(t) - S_G(t)$ and

$$g(t) = -\int_{u \le t} \frac{h(u)}{S(u)}Q(du) + \int_{u > t} \frac{h(u)}{1 - S(u)}G(du).$$

By (2.6), we have $K(t)h(dt+) = g(t)S(dt+)$ and $K(t-)h(dt-) = g(t-)S(dt-)$, where $h(dt-) = h(t) - h(t-)$ and $h(dt+) = h(t+) - h(t)$. Define $t_1 = \sup\{a_F < t \le t_0 : h(t) \le 0\}$, $t_2 = \inf\{t_0 \le t < b_F : h(t) \le 0\}$ and $H = \{t : h(t) > 0, t_1 \le t \le t_2\}$. Then $t_0 \in H$ and $(t_1, t_2) \subseteq H \subseteq [t_1, t_2]$. First,

$$g(dt) = h(t)\left\{-\frac{G(t)}{S(t)}Q(dt) - \frac{S_Q(t)}{1 - S(t)S_Q(t)}G(dt) + \frac{Q(t)S_Q(t)}{1 - S(t)S_Q(t)}G(dt)\right\} \le 0 \quad \text{on } H. \tag{2.7}$$

To establish a contradiction, we need Step 1 as follows.

Step 1: Show that $g(t) = g(t-) = 0$ on $H$. By assumption (2.4) and the arguments of Lemma 1 of Gu and Zhang (1993) (see p. 620), it follows that $g(t) = g(t-) = 0$ on $H$.

Step 2: Find a contradiction. Since $h(t) > 0$ on $H$, by assumption (2.4), Step 1 and (2.7), we have $Q(dt) = G(dt) = 0$ and $K(t) = K(t-) = \text{constant} > 0$ on $H$. Hence, we have $h(dt-) = h(dt+) = 0$ on $H$, so that $h(t) = h(t_0) > 0$ on $H$ and $t_0 \in H = (t_1, t_2)$. Therefore, $h(t_1) \le 0$ and $h(t_1+) = h(t_0) > 0$. Since $K(t_1) > 0$, it follows that $g(t_1+) < 0$, which contradicts Step 1, i.e. $g(t_1+) = 0$. By setting $h(t) = S(t) - S_F(t)$, it follows that $h(t) = 0$ for $t \in (a_F, b_F)$. The proof of uniqueness is completed.

Step B: Uniform consistency. By (2.3), all limit points of $S_n$ must satisfy (2.5); by the Helly–Bray selection theorem we have $S_n(t) \to S_F(t)$ a.s. for $t \in (a_F, b_F)$. Note that $\tilde W_1(t) = P(X > t, \delta = 1)$ and $\tilde W_{1n}(t)$ is the empirical function of $\tilde W_1(t)$. If $S_F(dt) < 0$, then by (2.3)

$$\frac{\tilde W_{1n}(dt)}{S_n(dt)} \le 1 - \int_{u < t - \varepsilon}\frac{\tilde Q_n(du)}{S_n(u)} - \int_0^t \frac{S_n(u) - S_n(u)\hat S_Q(u)}{S_n(u)[1 - S_n(u)\hat S_Q(u)]}\tilde G_n(du) - \int_t^\infty \frac{\tilde G_n(du)}{1 - S_n(u)\hat S_Q(u)} \to \frac{\tilde W_1(dt)}{S(dt)}$$

as $n \to \infty$ and then $\varepsilon \to 0+$, which implies $|S_n(dt)| \ge (1 - o(1))|S_F(dt)|$, since $\tilde W_{1n}(dt)/\tilde W_1(dt) \to 1$. Hence, $\sup_{t \in (a_F, b_F)}|S_n(t) - S_F(t)| \to 0$ a.s. The proof is completed. ∎

In order to derive the asymptotic normality of $\sqrt{n}[S_n(t) - S_F(t)]$, similar to Theorem 2 of Gu and Zhang (1993) (see p. 613), we define four linear operators as follows. For any survival function $S$, let $A_S$, $R_S$, $K$ and $B_S$ be the linear operators defined by

$$(A_S h)(t) = -S(t)\int_0^t \frac{G(u)h(u)}{S(u)}Q(du) - [1 - S(t)]\int_t^\infty \frac{h(u)S_Q(u)}{1 - S(u)S_Q(u)}G(du) + S(t)\int_0^t \frac{h(u)Q(u)S_Q(u)}{1 - S(u)S_Q(u)}G(du), \tag{2.8}$$

$$R_S = A_S - K, \qquad (Kh)(t) = K(t)h(t) \tag{2.9}$$

and

$$B_S(h^{(1)}, h^{(2)}, h^{(3)}, h^{(4)})(t) = \Big[1 - \sum_{j=1}^{3} h^{(j)}(t)\Big] + S(t)\int_0^t \frac{h^{(2)}(du)}{S(u)} + S(t)\int_0^t \frac{S(u) - h^{(4)}(u)}{S(u)[1 - h^{(4)}(u)]}h^{(3)}(du) - \int_t^\infty \frac{1 - S(t)}{1 - h^{(4)}(u)}h^{(3)}(du). \tag{2.10}$$

By (2.8) and (2.10), we have

$$A_{S_n}(S_n - S_F) = B_{S_n}(\tilde F, \tilde Q, \tilde G, S_H)(t) - \tilde W_n(t) - S_n(t)\int_0^t G(u)Q(du) - S_n(t)\int_0^t Q(u)\frac{1 - S_n(u)S_Q(u)}{1 - S_F(u)S_Q(u)}G(du) + (1 - S_n(t))\int_t^\infty \frac{1 - S_n(u)S_Q(u)}{1 - S_F(u)S_Q(u)}G(du).$$

Since $S_n(t) = B_{S_n}(\tilde F_n, \tilde Q_n, \tilde G_n, \hat S_H)(t)$, we have $R_{S_n}\xi_n = B_{S_n}Z_n$, where $\xi_n = \sqrt{n}(S_n - S_F)$ and

$$Z_n = \big(\sqrt{n}(\tilde F_n - \tilde F),\ \sqrt{n}(\tilde Q_n - \tilde Q),\ \sqrt{n}(\tilde G_n - \tilde G),\ \sqrt{n}(\hat S_Q - S_Q)\big).$$

Let $(D(a_F, b_F), \|\cdot\|_F)$ be the Banach space of all real-valued functions defined on $(a_F, b_F)$ which are right-continuous and have left limits at $t < b_F$, where $\|h\|_F = \sup_{a_F < t < b_F}|h(t)|$. Define the Banach spaces $(D_K(a_F, b_F), \|\cdot\|_K) = \{h : Kh \in D(a_F, b_F)\}$ with $\|h\|_K = \|Kh\|_F$, and $(D_Z, \|\cdot\|_Z) = \{h \in D \times D \times D \times D : B_{S_F}(h) \in D(a_F, b_F)\}$ with $\|(h^{(1)}, h^{(2)}, h^{(3)}, h^{(4)})\|_Z = \sum_{j=1}^{4}\|h^{(j)}\|_F$. By (2.9), we have $B_{S_F}\big((\tilde F_n - \tilde F), (\tilde Q_n - \tilde Q), (\tilde G_n - \tilde G), (\hat S_Q - S_Q)\big) \in D(a_F, b_F)$ and

$$Z_n \xrightarrow{d} Z = (Z_1, Z_2, Z_3, Z_4) \quad \text{in } D_Z, \tag{2.11}$$

where $E[Z_i(t)] = 0$ $(i = 1, 2, 3, 4)$, $E[Z_1(t)Z_1(s)] = \tilde F(\min(t,s)) - \tilde F(t)\tilde F(s), \ldots, E[Z_4(t)Z_4(s)] = S_H(\max(t,s)) - S_H(t)S_H(s)$; and $E[Z_1(t)Z_2(s)] = -\tilde F(t)\tilde Q(s), \ldots, E[Z_3(t)Z_4(s)] = \tilde G(t)S_H(s)$.

Next, we derive the asymptotic normality of $\sqrt{n}(S_n - S_F)$. The proof of the following theorem is similar to that of Theorem 2 of Gu and Zhang (1993) (see p. 617). The main idea of the proof is via strong continuity of linear operators indexed by survival functions in the metric space $\mathcal{F}_S = \{S : S - S_F \in D(a_F, b_F)\}$ with the distance $\|S - S_F\|_F$.

Theorem 2. Under the assumptions of Theorem 1, and for all $0 < \tau < 1$,

$$\int_{1 - \tau < S_F(u) < 1}\frac{Q(du)}{K(u)} + \int_{0 < S_F(u) < \tau}\frac{G(du)}{K(u)} < \infty. \tag{2.12}$$

Then, $R_{S_F}^{-1}$, the inverse of $R_{S_F}$ in (2.9), exists as a bounded operator from $D(a_F, b_F)$ to $D_K(a_F, b_F)$, and

$$\sqrt{n}(S_n(t) - S_F(t)) = \xi_n \xrightarrow{d} \xi = R_{S_F}^{-1}B_{S_F}Z \quad \text{in } D_K(a_F, b_F),$$

where $Z$ is the Gaussian process in (2.11).

Proof. Let $S_{T,m}$, $G_m$, $Q_m$, $m \ge 1$, and $S$ be survival functions such that $\|S_{T,m} - S\|_F \to 0$, $\|Q_m - Q\|_F \to 0$ and $\|G_m - G\|_F \to 0$. Note that the existence of $S_{T,m}$ is guaranteed by (2.4) and (2.12). Let $h_m$, $g_m$, $m \ge 1$, and $g$ be functions in $D(a_F, b_F)$ such that $\|g_m - g\| \to 0$ and $R_m h_m = g_m$, where $R_m = A_m - K_m$, and $A_m$, $R_m$ and $K_m$ are defined as in (2.8)–(2.10) with $(S, Q, G)$ replaced by $(S_{T,m}, Q_m, G_m)$. Under assumption (2.12) we have

$$\lim_{\tau \to a_F+}\sup_m\left\{\int_{1 - \tau < S_F(u) < 1}\frac{Q_m(du)}{K_m(u)} + \int_{0 < S_F(u) < \tau}\frac{G_m(du)}{K_m(u)}\right\} = 0. \tag{2.13}$$

Define

$$v_m^- = \int_0^t \frac{S_{T,m}(t)G_m(u)h_m(u)}{S_{T,m}(u)}Q_m(du) + \int_0^t \frac{S_{T,m}(t)Q_m(u)S_{Q_m}(u)h_m(u)}{1 - S_{T,m}(u)S_{Q_m}(u)}G_m(du),$$

$$v_m^+ = \int_t^\infty \frac{[1 - S_{T,m}(t)]S_{Q_m}(u)h_m(u)}{1 - S_{T,m}(u)S_{Q_m}(u)}G_m(du),$$

where $S_{Q_m}(u) = 1 - Q_m(u)$. By the definition of $v_m^+$, $v_m^-$ and $R_m h_m = g_m$, we have $K_m h_m = g_m + v_m^+ + v_m^-$.

Step 1: Existence of $R_S^{-1}$ as a linear operator from $D(a_F, b_F)$ to $D_K(a_F, b_F)$ for all $S \in \mathcal{F}_S = \{S : S - S_F \in D(a_F, b_F)\}$. A sequence of functions is totally bounded if every subsequence contains a uniformly convergent further subsequence. By (2.12), for a fixed $0 < \tau_0 < 1$, both $v_m^+(t)I[S(t) \le \tau_0]$ and $v_m^-(t)I[S(t) > \tau_0]$ are totally bounded for the case $\|K_m h_m\|_F \le 1$. Further, since $S - S_F \in D(a_F, b_F)$, we have $\|v_m^-(t)\|_F = o_p(1)$ and $\|v_m^+(t)\|_F = o_p(1)$. Similar to the arguments of Steps 3 and 4 of Lemma 2 of Gu and Zhang (1993), under condition (2.13), it follows that there exists $h \in D_K(a_F, b_F)$ such that $\|Kh_m - Kh\|_F \to 0$, $R_S h = g$ and the solution $h$ of $R_S h = g$ is unique. Define $h = R_S^{-1}g$. This completes the proof of Step 1.

Step 2: Strong continuity of $\{R_S^{-1}, S \in \mathcal{F}_S\}$. Let $g'_m \in D(a_F, b_F)$ and $S_m \in \mathcal{F}_S$ be such that $\|g'_m - g\|_F \to 0$ and $\|S_m - S\|_F \to 0$. Similar to Step 2 of Theorem 2 of Gu and Zhang (1993), it follows that $\|K_m h_m - KR_S^{-1}g\|_F \to 0$, which implies that $\|KR_{S_{T,m}}^{-1}g'_m - KR_S^{-1}g\|_F \to 0$. Hence, we have the strong continuity. This completes the proof of Step 2.

Step 3: Strong continuity of $\{B_S, S \in \mathcal{F}_S\}$. Let $h$ be a simple function in $D_Z$. Since $S - S_F \in D(a_F, b_F)$, by (2.10), $B_S h \to B_{S_F}h$ in $D(a_F, b_F)$ as $\|S - S_F\|_F \to 0$. Since $\|B_S\|_F \le 2$ and the collection of simple functions is dense in $D_Z$, we have the strong continuity. This completes the proof of Step 3.

Similar to the arguments of Steps 4 and 5 of Theorem 2 of Gu and Zhang (1993), it follows that $\sqrt{n}(S_n(t) - S_F(t))$ converges in distribution in $D_K(a_F, b_F)$. The proof is completed. ∎

Next, we briefly discuss the conditions required for deriving the asymptotic properties of $\hat S_P$, $\hat S_n$ and $S_n$. Notice that we require condition (2.4), i.e. $K(x) = G(x)S_Q(x) > 0$ for $x \in (a_F, b_F)$, to derive the weak convergence of both $\hat S_n$ and $S_n$. To derive the weak convergence of $\hat S_P(t)$, we require condition (1.1).
Since $\tilde G(dx) = G(dx)[1 - S_Q(x)S_F(x)]$ and $\tilde H(x) = G(x)[1 - S_Q(x)S_F(x)]$, condition (1.1) is equivalent to

$$\int_{(a_F,\infty)}\frac{G(dx)}{G^2(x)[1 - S_Q(x)S_F(x)]} < \infty. \tag{2.14}$$

When $G(x) > 0$ and $0 < S_Q(x) < 1$ for $x \in (a_F, b_F)$, both (2.4) and (2.14) hold. However, neither (2.4) nor (2.14) is more restrictive than the other. In Section 3, the simulation study is set up using $a_F = a_G = a_Q = 0$ and $b_F = b_G = b_Q = \infty$. Since $T$, $L$ and $U$ are continuous and independent of one another, both conditions (2.4) and (2.14) hold.

3. Simulation study

A simulation study is conducted to compare the performance of the two estimators $S_n$ and $\hat S_n$. The $T$'s are i.i.d. exponentially distributed with rate parameter equal to 1, i.e. $F(x) = 1 - e^{-x}$ for $x > 0$. The $U$'s are i.i.d. exponentially distributed with rate parameter $\lambda_q$, i.e. $Q(x) = 1 - e^{-\lambda_q x}$ for $x > 0$. The $L$'s are i.i.d. exponentially distributed with rate parameter $\lambda_g$, i.e. $G(x) = 1 - e^{-\lambda_g x}$ for $x > 0$. The $T$, $U$ and $L$ are independent of one another. The variables $X$ and $\delta$ are then generated as described in Section 1. The goal is to estimate the survival function of $T$ at the points where $S_F(t) = p_f$, with $p_f$ chosen as 0.75, 0.5 and 0.25. The values of $(\lambda_g, \lambda_q)$ are chosen as $(0.75, 1.0)$, $(30, 4.0)$, $(15, 1.0)$, $(2.0, 0.5)$ and $(4.0, 0.1)$. The sample sizes are chosen as 100, 200 and 500, and each setting is replicated 1000 times.

Tables 1–3 show the biases and root mean squared errors (RMSE) of the two estimators for $S(0.29) = 0.75$, $S(0.69) = 0.5$ and $S(1.39) = 0.25$, respectively. For the purpose of comparison, we also present the simulated results of the product-limit estimator $\hat S_P(t)$. Tables 1–3 also show the ratios of the root mean squared errors of $S_n$ and $\hat S_n$ to that of $\hat S_P$, denoted by $r_1$ and $r_2$, respectively. Further, Tables 1–3 list the simulated proportions of $\delta = 1$, $\delta = 2$ and $\delta = 3$, denoted by $p_1$, $p_2$ and $p_3$, respectively.
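The data-generating step can be sketched as follows. This Python fragment is illustrative only: the function names `simulate` and `theoretical_proportions` are hypothetical, and $\lambda_g$, $\lambda_q$ are interpreted as exponential rates, as the displayed distribution functions suggest. The closed-form proportions follow from standard properties of independent exponential variables under that interpretation, and they need not coincide exactly with the simulated proportions reported in the tables.

```python
import numpy as np

def simulate(n, rate_g, rate_q, rng):
    """Generate twice censored data (X, delta) with T ~ Exp(1), U ~ Exp(rate_q), L ~ Exp(rate_g)."""
    T = rng.exponential(1.0, size=n)
    U = rng.exponential(1.0 / rate_q, size=n)   # NumPy parameterizes by scale = 1 / rate
    L = rng.exponential(1.0 / rate_g, size=n)
    m = np.minimum(T, U)
    X = np.maximum(m, L)
    delta = np.where(m <= L, 3, np.where(T <= U, 1, 2))
    return X, delta

def theoretical_proportions(rate_g, rate_q):
    """P(delta = 1), P(delta = 2), P(delta = 3) under the exponential-rate assumption."""
    total = 1.0 + rate_q + rate_g
    p_left_first = rate_g / total               # P(L < min(T, U))
    p1 = p_left_first / (1.0 + rate_q)          # ... and T <= U
    p2 = p_left_first * rate_q / (1.0 + rate_q) # ... and U < T
    p3 = (1.0 + rate_q) / total                 # P(min(T, U) <= L)
    return p1, p2, p3

rng = np.random.default_rng(1)
X, delta = simulate(200_000, rate_g=2.0, rate_q=0.5, rng=rng)
simulated = [float(np.mean(delta == k)) for k in (1, 2, 3)]
```

Comparing `simulated` with `theoretical_proportions(2.0, 0.5)` gives a quick sanity check that the censoring indicator is coded as described in Section 1.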


Table 1. Simulation results for biases and RMSE of $S_n$, $\hat S_n$ and $\hat S_P$, $S(0.29) = 0.75$.

| $\lambda_g$ | $\lambda_q$ | n | $p_1$ | $p_2$ | $p_3$ | $S_n$ Bias | $S_n$ RMSE | $\hat S_n$ Bias | $\hat S_n$ RMSE | $\hat S_P$ Bias | $\hat S_P$ RMSE | $r_1$ | $r_2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.75 | 1.0 | 100 | 0.14 | 0.16 | 0.70 | -0.029 | 0.106 | -0.032 | 0.112 | 0.035 | 0.135 | 0.79 | 0.83 |
| 0.75 | 1.0 | 200 | 0.14 | 0.16 | 0.70 | -0.015 | 0.074 | -0.019 | 0.077 | 0.034 | 0.091 | 0.81 | 0.85 |
| 0.75 | 1.0 | 500 | 0.14 | 0.16 | 0.70 | -0.010 | 0.053 | -0.016 | 0.058 | 0.022 | 0.068 | 0.78 | 0.85 |
| 30 | 4.0 | 100 | 0.15 | 0.71 | 0.14 | -0.009 | 0.062 | -0.003 | 0.060 | 0.019 | 0.065 | 0.95 | 0.92 |
| 30 | 4.0 | 200 | 0.15 | 0.71 | 0.14 | -0.007 | 0.046 | -0.010 | 0.044 | -0.001 | 0.049 | 0.94 | 0.90 |
| 30 | 4.0 | 500 | 0.15 | 0.71 | 0.14 | -0.004 | 0.025 | -0.007 | 0.025 | 0.001 | 0.027 | 0.93 | 0.93 |
| 15 | 1.0 | 100 | 0.42 | 0.44 | 0.14 | -0.012 | 0.053 | -0.006 | 0.051 | 0.011 | 0.053 | 1.00 | 0.96 |
| 15 | 1.0 | 200 | 0.42 | 0.44 | 0.14 | -0.010 | 0.038 | -0.009 | 0.037 | 0.008 | 0.036 | 1.06 | 1.03 |
| 15 | 1.0 | 500 | 0.42 | 0.44 | 0.14 | -0.006 | 0.022 | -0.006 | 0.022 | 0.010 | 0.024 | 0.92 | 0.92 |
| 2.0 | 0.5 | 100 | 0.43 | 0.16 | 0.41 | -0.010 | 0.073 | -0.013 | 0.075 | 0.022 | 0.082 | 0.89 | 0.87 |
| 2.0 | 0.5 | 200 | 0.43 | 0.16 | 0.41 | -0.007 | 0.053 | -0.010 | 0.055 | 0.014 | 0.061 | 0.87 | 0.90 |
| 2.0 | 0.5 | 500 | 0.43 | 0.16 | 0.41 | -0.007 | 0.032 | -0.004 | 0.034 | 0.011 | 0.038 | 0.84 | 0.89 |
| 4.0 | 0.1 | 100 | 0.72 | 0.10 | 0.18 | -0.006 | 0.051 | -0.003 | 0.051 | 0.024 | 0.065 | 0.78 | 0.78 |
| 4.0 | 0.1 | 200 | 0.72 | 0.10 | 0.18 | -0.005 | 0.035 | -0.003 | 0.037 | 0.014 | 0.043 | 0.81 | 0.86 |
| 4.0 | 0.1 | 500 | 0.72 | 0.10 | 0.18 | -0.004 | 0.020 | -0.001 | 0.025 | 0.008 | 0.029 | 0.69 | 0.86 |

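Table 2. Simulation results for biases and RMSE of $S_n$, $\hat S_n$ and $\hat S_P$, $S(0.69) = 0.50$.

| $\lambda_g$ | $\lambda_q$ | n | $p_1$ | $p_2$ | $p_3$ | $S_n$ Bias | $S_n$ RMSE | $\hat S_n$ Bias | $\hat S_n$ RMSE | $\hat S_P$ Bias | $\hat S_P$ RMSE | $r_1$ | $r_2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.75 | 1.0 | 100 | 0.14 | 0.16 | 0.70 | -0.014 | 0.122 | -0.017 | 0.128 | 0.037 | 0.142 | 0.86 | 0.90 |
| 0.75 | 1.0 | 200 | 0.14 | 0.16 | 0.70 | -0.010 | 0.086 | -0.009 | 0.090 | 0.019 | 0.095 | 0.91 | 0.95 |
| 0.75 | 1.0 | 500 | 0.14 | 0.16 | 0.70 | -0.005 | 0.054 | -0.012 | 0.060 | 0.010 | 0.063 | 0.86 | 0.95 |
| 30 | 4.0 | 100 | 0.15 | 0.71 | 0.14 | -0.017 | 0.108 | 0.021 | 0.103 | 0.015 | 0.124 | 0.87 | 0.83 |
| 30 | 4.0 | 200 | 0.15 | 0.71 | 0.14 | -0.012 | 0.082 | 0.011 | 0.078 | 0.009 | 0.103 | 0.80 | 0.76 |
| 30 | 4.0 | 500 | 0.15 | 0.71 | 0.14 | -0.008 | 0.055 | 0.006 | 0.055 | 0.002 | 0.068 | 0.81 | 0.81 |
| 15 | 1.0 | 100 | 0.42 | 0.44 | 0.14 | -0.009 | 0.069 | -0.005 | 0.067 | 0.004 | 0.066 | 1.04 | 1.02 |
| 15 | 1.0 | 200 | 0.42 | 0.44 | 0.14 | -0.004 | 0.043 | -0.002 | 0.043 | 0.005 | 0.045 | 0.96 | 0.96 |
| 15 | 1.0 | 500 | 0.42 | 0.44 | 0.14 | -0.006 | 0.032 | -0.000 | 0.031 | 0.002 | 0.032 | 1.00 | 0.97 |
| 2.0 | 0.5 | 100 | 0.43 | 0.16 | 0.41 | -0.004 | 0.074 | -0.003 | 0.076 | 0.029 | 0.082 | 0.90 | 0.93 |
| 2.0 | 0.5 | 200 | 0.43 | 0.16 | 0.41 | -0.006 | 0.053 | -0.003 | 0.054 | 0.016 | 0.054 | 0.98 | 0.98 |
| 2.0 | 0.5 | 500 | 0.43 | 0.16 | 0.41 | -0.001 | 0.036 | -0.000 | 0.037 | 0.013 | 0.041 | 0.88 | 0.90 |
| 4.0 | 0.1 | 100 | 0.72 | 0.10 | 0.18 | -0.007 | 0.053 | -0.005 | 0.052 | 0.008 | 0.056 | 0.95 | 0.93 |
| 4.0 | 0.1 | 200 | 0.72 | 0.10 | 0.18 | -0.002 | 0.039 | -0.001 | 0.039 | 0.011 | 0.043 | 0.91 | 0.91 |
| 4.0 | 0.1 | 500 | 0.72 | 0.10 | 0.18 | -0.000 | 0.019 | -0.001 | 0.023 | 0.007 | 0.025 | 0.76 | 0.92 |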

Based on the results of Tables 1–3, we conclude the following:

(i) For the estimation of $S(0.29) = 0.75$ and $S(0.69) = 0.5$, in terms of RMSE, both $\hat S_n$ and $S_n$ outperform the product-limit estimator $\hat S_P$ for most of the cases considered. The ratio of the root mean squared error of $\hat S_n$ to that of $\hat S_P$ ranges from 0.76 to 1.02. The ratio of the root mean squared error of $S_n$ to that of $\hat S_P$ ranges from 0.76 to 1.06.

(ii) For the estimation of $S(1.39) = 0.25$, the bias and standard deviation of both $\hat S_n$ and $S_n$ are smaller than those of $\hat S_P$ for all the cases considered. When right censoring is heavy (i.e. $p_2 = 0.71$), the biases and standard deviations of both $\hat S_n$ and $S_n$ are much smaller than those of $\hat S_P$. The ratio of the root mean squared error of $\hat S_n$ to that of $\hat S_P$ ranges from 0.55 to 1.00. The ratio of the root mean squared error of $S_n$ to that of $\hat S_P$ ranges from 0.58 to 0.98.

(iii) When left-censoring is heavy and right-censoring is light (i.e. $p_3 = 0.70$), the standard deviations of $S_n$ are smaller than those of $\hat S_n$. However, when right-censoring is heavy (i.e. $p_2 = 0.71$), the situation is reversed. One explanation for these results is that $\tilde Q_n$, on which the MSC estimator $\hat S_n$ is based, is a function of the data with $X_i \le t$ and $\delta_i = 2$, whereas $\tilde G_n$, on which the MSC estimator $S_n$ is mainly based, is a function of the data with $X_i \le t$ and $\delta_i = 3$. In terms of RMSE, the ratio of the root mean squared error of $S_n$ to that of $\hat S_n$ ranges from 0.83 to 1.05.


Table 3. Simulation results for biases and RMSE of $S_n$, $\hat S_n$ and $\hat S_P$, $S(1.39) = 0.25$.

| $\lambda_g$ | $\lambda_q$ | n | $p_1$ | $p_2$ | $p_3$ | $S_n$ Bias | $S_n$ RMSE | $\hat S_n$ Bias | $\hat S_n$ RMSE | $\hat S_P$ Bias | $\hat S_P$ RMSE | $r_1$ | $r_2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.75 | 1.0 | 100 | 0.14 | 0.16 | 0.70 | 0.007 | 0.093 | 0.007 | 0.097 | 0.028 | 0.105 | 0.89 | 0.92 |
| 0.75 | 1.0 | 200 | 0.14 | 0.16 | 0.70 | 0.003 | 0.065 | -0.007 | 0.069 | 0.011 | 0.075 | 0.87 | 0.92 |
| 0.75 | 1.0 | 500 | 0.14 | 0.16 | 0.70 | 0.000 | 0.034 | -0.006 | 0.037 | 0.004 | 0.039 | 0.87 | 0.95 |
| 30 | 4.0 | 100 | 0.15 | 0.71 | 0.14 | 0.110 | 0.150 | 0.113 | 0.146 | 0.165 | 0.216 | 0.69 | 0.68 |
| 30 | 4.0 | 200 | 0.15 | 0.71 | 0.14 | 0.078 | 0.099 | 0.074 | 0.095 | 0.100 | 0.171 | 0.58 | 0.55 |
| 30 | 4.0 | 500 | 0.15 | 0.71 | 0.14 | 0.043 | 0.069 | 0.037 | 0.067 | 0.045 | 0.112 | 0.62 | 0.60 |
| 15 | 1.0 | 100 | 0.42 | 0.44 | 0.14 | 0.013 | 0.071 | 0.007 | 0.072 | 0.012 | 0.074 | 0.96 | 0.97 |
| 15 | 1.0 | 200 | 0.42 | 0.44 | 0.14 | 0.008 | 0.048 | 0.003 | 0.047 | 0.006 | 0.049 | 0.98 | 0.96 |
| 15 | 1.0 | 500 | 0.42 | 0.44 | 0.14 | 0.002 | 0.030 | 0.001 | 0.029 | 0.004 | 0.031 | 0.97 | 0.94 |
| 2.0 | 0.5 | 100 | 0.43 | 0.16 | 0.41 | 0.010 | 0.056 | 0.009 | 0.058 | 0.024 | 0.062 | 0.90 | 0.94 |
| 2.0 | 0.5 | 200 | 0.43 | 0.16 | 0.41 | 0.005 | 0.041 | 0.003 | 0.043 | 0.012 | 0.047 | 0.87 | 0.91 |
| 2.0 | 0.5 | 500 | 0.43 | 0.16 | 0.41 | 0.007 | 0.024 | 0.004 | 0.026 | 0.001 | 0.028 | 0.86 | 0.93 |
| 4.0 | 0.1 | 100 | 0.72 | 0.10 | 0.18 | 0.006 | 0.046 | 0.001 | 0.045 | 0.008 | 0.050 | 0.92 | 0.90 |
| 4.0 | 0.1 | 200 | 0.72 | 0.10 | 0.18 | 0.008 | 0.033 | -0.003 | 0.035 | 0.005 | 0.035 | 0.94 | 1.00 |
| 4.0 | 0.1 | 500 | 0.72 | 0.10 | 0.18 | 0.004 | 0.018 | 0.000 | 0.020 | 0.003 | 0.022 | 0.82 | 0.91 |

(iv) When $\lambda_g$ is much larger than $\lambda_q$, i.e. $(\lambda_g, \lambda_q) = (4, 0.1)$, we have $P(L < U) \approx 1$. In this case, the estimator $S_n$ outperforms $\hat S_n$. When $n = 500$, the ratio of the root mean squared error of $S_n$ to that of $\hat S_n$ is equal to 0.80, 0.83 and 0.90 for the estimation of $S(0.29) = 0.75$, $S(0.69) = 0.50$ and $S(1.39) = 0.25$, respectively.
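The ratios quoted in (iv) can be reproduced from the RMSE columns of Tables 1–3. The short Python check below uses the tabulated values for $(\lambda_g, \lambda_q) = (4.0, 0.1)$ and $n = 500$; the dictionary names are illustrative only.

```python
# RMSE values for (lambda_g, lambda_q) = (4.0, 0.1), n = 500, keyed by the target value of S_F
rmse_new = {0.75: 0.020, 0.50: 0.019, 0.25: 0.018}    # S_n (proposed estimator)
rmse_shen = {0.75: 0.025, 0.50: 0.023, 0.25: 0.020}   # hat S_n (Shen, 2011)

# Ratio of the RMSE of S_n to that of hat S_n, rounded to two decimals
ratios = {p: round(rmse_new[p] / rmse_shen[p], 2) for p in rmse_new}
print(ratios)
```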

4. Discussion

Based on the modified likelihood function, we have proposed an alternative MSC estimator $S_n$ of the survival function with twice censored data. The asymptotic properties of $S_n$ are derived. Our simulation results indicate that both MSC estimators, $\hat S_n$ (Shen, 2011) and $S_n$, perform adequately in finite samples. When right-censoring is heavy, $\hat S_n$ tends to outperform $S_n$. However, when left-censoring is heavy, the situation can be reversed.

References

Chang, M.N., Yang, G.L., 1987. Strong consistency of a nonparametric estimator of the survival function with doubly censored data. Annals of Statistics 15, 1536–1547.
Gu, M.G., Zhang, C.H., 1993. Asymptotic properties of self-consistent estimators based on doubly censored data. Annals of Statistics 21, 611–624.
Mykland, P.A., Ren, J., 1996. Algorithms for computing self-consistent and maximum likelihood estimators with doubly censored data. Annals of Statistics 24, 1740–1764.
Patilea, V., Rolin, J.-M., 2006. Product-limit estimators of the survival function with twice censored data. Annals of Statistics 34 (2), 925–938.
Shen, P.-S., 2011. Nonparametric estimators of the survival function with twice censored data. Annals of the Institute of Statistical Mathematics 63, 1207–1219.
Tsai, W.Y., Crowley, J., 1985. A large sample study of generalized maximum likelihood estimators from incomplete data via self-consistency. Annals of Statistics 13, 1317–1334.
Turnbull, B.W., 1974. Nonparametric estimation of a survivorship function with doubly censored data. Journal of the American Statistical Association 69, 169–173.
Yu, Q.Q., Li, L.X., 2001. Asymptotic properties of self-consistent estimators with doubly censored data. Acta Mathematica Sinica 17, 581–594.