Statistics & Probability Letters 37 (1998) 381-389
Asymptotic properties of Kaplan-Meier estimator for censored dependent data
Zongwu Cai*
Department of Mathematics, Southwest Missouri State University, Springfield, MO 65804, USA
Received 1 June 1996
Abstract
In some long term studies, a series of dependent and possibly censored failure times may be observed. Suppose that the failure times have a common marginal distribution function, and inferences about it are of interest to us. The main result of this paper is that, under certain regularity conditions, the Kaplan-Meier estimator can be expressed as the mean of random variables, with a remainder of some order. In addition, the asymptotic normality of the Kaplan-Meier estimator is derived. © 1998 Elsevier Science B.V. All rights reserved.

AMS classification: Primary: 62G05; secondary: 62M09; 60G10; 62G30
Keywords: $\alpha$-mixing; Censored dependent data; Kaplan-Meier estimator; Strong representation; Asymptotic normality
* Tel.: +1 417 836 5887; fax: +1 417 888 2465; e-mail: [email protected].
0167-7152/98/$19.00. PII S0167-7152(97)00141-7

1. Introduction and preliminary results

Let $T_1,\dots,T_n$ be a sequence of the true survival times for the $n$ individuals in the life table. The random variables (r.v.s) are not assumed to be mutually independent (see assumption (K1) for the kind of dependence stipulated); it is assumed, however, that they have a common unknown continuous marginal distribution function (d.f.) $F(x) = P(T_i \le x)$ such that $F(0) = 0$. Let the r.v.s $T_i$ be censored on the right by the censoring r.v.s $Y_i$, so that one observes only $(Z_i, \delta_i)$, where

$$Z_i = T_i \wedge Y_i \quad\text{and}\quad \delta_i = I(T_i \le Y_i), \qquad i = 1,\dots,n,$$

where, here and in the sequel, $\wedge$ denotes minimum and $I(A)$ is the indicator random variable of the event $A$. In this random censorship model, the censoring times $Y_i$, $i = 1,\dots,n$, are assumed to be i.i.d. with distribution function $G(y) = P(Y_i \le y)$ such that $G(0) = 0$; they are also assumed to be independent of the r.v.s $T_i$. The problem at hand is that of drawing nonparametric inference about $F$ based on the censored observations $(Z_i, \delta_i)$, $i = 1,\dots,n$. For this purpose, define two stochastic processes on $[0,\infty)$ as follows:

$$N_n(t) = \sum_{i=1}^{n} I(Z_i \le t,\ \delta_i = 1) = \sum_{i=1}^{n} I(T_i \le t \wedge Y_i),$$

the number of uncensored observations less than or equal to $t$, and

$$Y_n(t) = \sum_{i=1}^{n} I(Z_i \ge t),$$

the number of censored or uncensored observations greater than or equal to $t$. Then, the Kaplan-Meier (K-M) estimator $\hat F_n$ is given by

$$1 - \hat F_n(t) = \prod_{s \le t}\left(1 - \frac{dN_n(s)}{Y_n(s)}\right),$$
where $dN_n(s) = N_n(s) - N_n(s-)$. As is known (see, e.g., Gill, 1980), for a d.f. $F$ on $[0,\infty)$, the cumulative hazard function $\Lambda$ is defined by

$$\Lambda(t) = \int_0^t \frac{dF(s)}{1 - F(s-)},$$

and $\Lambda(t) = -\log(1 - F(t))$ for the case that $F$ is continuous. The empirical cumulative hazard function is given by

$$\hat\Lambda_n(t) = \int_0^t \frac{dN_n(s)}{Y_n(s)},$$
which is called the Nelson estimator of $\Lambda(t)$ in the literature.

For the case that the failure time observations are mutually independent, the K-M estimator has been studied extensively by many authors during the last three decades, such as Breslow and Crowley (1974), Peterson (1977), Gill (1980, 1981, 1983), Lo and Singh (1986), Wang (1987), Stute and Wang (1993), and others. However, there are precious few results available for the case that these observations exhibit some kind of dependence. For example, Voelkel and Crowley (1984) used a semi-Markov process approach to establish a reasonable model for cancer research clinical trials, in which each patient may either remain in an initial state, or progress, or respond and then possibly relapse. Ying and Wei (1994) explored consistency and asymptotic normality of $\hat F_n(t)$ in the $\phi$-mixing context. They also gave an application of the right censoring model in the Diabetes Control and Complications Trial for a special dependent case, in which survival times are highly stratified.

Our basic aim in this article is to express the K-M estimator as the mean of bounded random variables with a remainder of order $O(n^{-1/2}(\log n)^{-\lambda})$ for some $\lambda > 0$, for the case in which the underlying failure times are assumed to be $\alpha$-mixing, whose definition is given below. In addition, asymptotic normality of the K-M estimator is presented in Section 3.

Let $\mathcal{F}_i^k(X)$ denote the $\sigma$-field of events generated by $\{X_j,\ i \le j \le k\}$. For easy reference, let us recall the following definition.

Definition. Let $\{X_i,\ i = 0, \pm 1, \pm 2, \dots\}$ denote a sequence of r.v.s. Given a positive integer $n$, set

$$\alpha(n) = \sup\left\{|P(A \cap B) - P(A)P(B)| : A \in \mathcal{F}_{-\infty}^{k}(X) \text{ and } B \in \mathcal{F}_{k+n}^{\infty}(X)\right\}.$$

The sequence is said to be $\alpha$-mixing (strongly mixing) if the mixing coefficient $\alpha(n) \to 0$ as $n \to \infty$.
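As a concrete illustration of the two estimators defined above, the following minimal Python sketch evaluates the K-M estimator and the Nelson estimator at a single time point directly from the pairs $(Z_i, \delta_i)$. The function name `km_and_nelson` and the four-point data set are hypothetical and chosen only for illustration; the loop simply applies the product and sum formulas over the uncensored observation times $s \le t$.

```python
from typing import Sequence, Tuple


def km_and_nelson(z: Sequence[float], delta: Sequence[int], t: float) -> Tuple[float, float]:
    """Return (F_n(t), Lambda_n(t)): K-M estimate of F(t) and Nelson estimate of Lambda(t)."""
    survival = 1.0      # running product of (1 - dN_n(s) / Y_n(s))
    cum_hazard = 0.0    # running sum of dN_n(s) / Y_n(s)
    # Only uncensored observation times s <= t contribute a jump dN_n(s) > 0.
    event_times = sorted({zi for zi, di in zip(z, delta) if di == 1 and zi <= t})
    for s in event_times:
        d_n = sum(1 for zi, di in zip(z, delta) if zi == s and di == 1)  # dN_n(s)
        y_n = sum(1 for zi in z if zi >= s)                              # Y_n(s)
        survival *= 1.0 - d_n / y_n
        cum_hazard += d_n / y_n
    return 1.0 - survival, cum_hazard


# Hypothetical data: four observations, the second one censored.
z = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]
f_hat, lambda_hat = km_and_nelson(z, delta, t=3.0)
```

This version recounts the risk set at every event time, which is quadratic in $n$ but mirrors the definitions of $N_n$ and $Y_n$ line by line; sorting the data once would give the same values faster.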
Among various mixing conditions used in the literature, $\alpha$-mixing is reasonably weak, and has many practical applications. Many stochastic processes and time series are known to be $\alpha$-mixing. Withers (1981) obtained various conditions for a linear process to be $\alpha$-mixing. Under certain weak assumptions, autoregressive and, more generally, bilinear time series models are strongly mixing with exponential mixing coefficients. Auestad and Tjøstheim (1990) provided illuminating discussions of the role of $\alpha$-mixing for model identification in non-linear time series analysis.

For the sake of simplicity, the assumptions used in this paper are listed below. It should be pointed out that, throughout the paper, the letter $C$ is used indiscriminately as a generic constant, and all limits are taken as $n \to \infty$ unless otherwise specified.

Assumptions.
(K1) Suppose that $\{T_j;\ j \ge 1\}$ is a sequence of stationary $\alpha$-mixing random variables with continuous distribution function $F$.
(K2) Suppose that the censoring time variables $\{Y_j;\ j \ge 1\}$ are i.i.d. with continuous distribution function $G$, and are independent of $\{T_j;\ j \ge 1\}$.
(K3) $\alpha(n) = O(n^{-\nu})$ for some $\nu > 3$.

In this section, a number of lemmas are presented for use in later parts of the paper.

Lemma 1. Let $\{X_n;\ n \ge 1\}$ be a sequence of $\alpha$-mixing r.v.s with mixing coefficient $\alpha(n)$ which are independent of an i.i.d. sequence of r.v.s $\{Y_n;\ n \ge 1\}$. Then $\{(X_n, Y_n);\ n \ge 1\}$ is a sequence of $\alpha$-mixing r.v.s with mixing coefficient $4\alpha(n)$. In particular, so is $\{X_n \wedge Y_n;\ n \ge 1\}$.

Proof. For any sets $A \in \mathcal{F}_1^k(X, Y) = \sigma(X_j, Y_j;\ 1 \le j \le k)$ and $B \in \mathcal{F}_{k+n}^{\infty}(X, Y) = \sigma(X_j, Y_j;\ j \ge k + n)$,

$$\begin{aligned}
|P(A \cap B) - P(A)P(B)| &= |E(I(A)I(B)) - EI(A)\,EI(B)| \\
&= |E[E(I(A)I(B) \mid \mathcal{F}_1^k(X, Y))] - EI(A)\,EI(B)| \\
&= |E\{I(A)[E(I(B) \mid \mathcal{F}_1^k(X, Y)) - EI(B)]\}| \\
&\le E|E(I(B) \mid \mathcal{F}_1^k(X, Y)) - EI(B)|.
\end{aligned}$$

Since

$$E(I(B) \mid \mathcal{F}_1^k(X, Y)) = E(I(B) \mid X_1,\dots,X_k, Y_1,\dots,Y_k),$$

and $Y_i$, $1 \le i \le k$, are independent of $X_i$, $1 \le i \le k$, and of $(X_j, Y_j)$, $j \ge k + n$, we have

$$E(I(B) \mid \mathcal{F}_1^k(X, Y)) = E(\xi(X_j;\ j \ge k + n) \mid X_1,\dots,X_k),$$

where $\xi(X_j;\ j \ge k + n) = E(I(B) \mid X_j;\ j \ge k + n)$. Clearly, $|\xi| \le 1$ and $E(I(B) \mid \mathcal{F}_1^k(X, Y))$ is measurable with respect to (w.r.t.) $\mathcal{F}_1^k(X)$. Let

$$\eta = \mathrm{sgn}[E(I(B) \mid \mathcal{F}_1^k(X, Y)) - EI(B)] = \mathrm{sgn}[E(\xi(X_j;\ j \ge k + n) \mid \mathcal{F}_1^k(X)) - EI(B)];$$

then $\eta$ is measurable w.r.t. $\mathcal{F}_1^k(X)$ and $|\eta| \le 1$. Therefore, by Theorem 17.2.1 in Ibragimov and Linnik (1971, p. 306) and the fact that $\xi(X_j;\ j \ge k + n)$ is measurable w.r.t. $\mathcal{F}_{k+n}^{\infty}(X)$, one has that

$$|P(A \cap B) - P(A)P(B)| \le |E(\eta\,\xi(X_j;\ j \ge k + n)) - E\eta\,E(\xi(X_j;\ j \ge k + n))| \le 4\alpha(n).$$
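This completes the proof of the lemma.

Let $H$ be the distribution of the $Z_i$'s, given by

$$\bar H = 1 - H = \bar F\,\bar G = (1 - F)(1 - G),$$

and define (possibly infinite) times $\tau_F$, $\tau_G$ and $\tau_H$ by

$$\tau_F = \inf\{y : F(y) = 1\},$$

with $\tau_G$ and $\tau_H$ defined analogously. Then, $\tau_H = \tau_F \wedge \tau_G$ (see, e.g., Stute and Wang, 1993). By setting $F_*(t) = P(Z_1 \le t,\ \delta_1 = 1)$, we have then

$$F_*(t) = \int_0^{\infty} F(t \wedge z)\,dG(z) = \int_0^{t} [1 - G(z)]\,dF(z).$$

Let

$$\beta = \beta(F, G) = P(T \le Y) = \int_0^{\infty} F(z)\,dG(z) = \int_0^{\infty} [1 - G(z)]\,dF(z)$$

and assume that $\beta > 0$. Clearly, $F_*(t)/\beta$ is the conditional distribution function of $Z$ given $\delta = 1$. Define $\tau_{F_*} = \inf\{t : F_*(t) = \beta\}$. Then it is easy to see that $\tau_{F_*} = \tau_F \wedge \tau_G$, so that $\tau_H = \tau_{F_*}$. Also, we have for continuous $F$

$$\Lambda(t) = \int_0^t \frac{dF_*(s)}{\bar H(s)}.$$

Let

$$\bar N_n(t) = N_n(t)/n \quad\text{and}\quad \bar Y_n(t) = Y_n(t)/n.$$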
Let us first consider the uniform convergence rate of the empirical cumulative hazard function $\hat\Lambda_n$. Namely, we have the following result.

Theorem 1. Under assumptions (K1)-(K3),

$$\sup_{t \ge 0} |\bar Y_n(t) - \bar H(t)| = O(a_n) \quad \text{a.s.}, \qquad (1)$$

$$\sup_{t \ge 0} |\bar N_n(t) - F_*(t)| = O(a_n) \quad \text{a.s.}, \qquad (2)$$

where

$$a_n = \left(\frac{\log\log n}{n}\right)^{1/2}. \qquad (3)$$

Consequently, we have for any $0 < \tau < \tau_H$:

$$\sup_{0 \le t \le \tau} |\hat\Lambda_n(t) - \Lambda(t)| = O(a_n) \quad \text{a.s.} \qquad (4)$$
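To make the setting of Theorem 1 concrete, here is a small simulation sketch in Python. Everything in it is an assumption made for illustration and not part of the paper: the failure times are obtained by transforming a stationary Gaussian AR(1) sequence (with hypothetical coefficient `rho = 0.5`) to an Exp(1) marginal, so that $\Lambda(t) = t$, and the censoring times are i.i.d. exponential with hypothetical rate 0.5. The sketch computes the sup-distance in (4) on $[0, \tau]$ and the rate $a_n$ from (3); a single run cannot verify an almost-sure rate, it only shows the quantities involved.

```python
import math
import random


def simulate_censored_ar1(n, rho=0.5, censor_rate=0.5, seed=0):
    """Return (z, delta): Exp(1) failure times driven by a Gaussian AR(1), i.i.d. Exp censoring."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)                  # start in the stationary N(0, 1) law
    innov_sd = math.sqrt(1.0 - rho * rho)    # keeps the marginal variance equal to 1
    z, delta = [], []
    for _ in range(n):
        u = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))   # Phi(x), uniform on (0, 1)
        u = min(max(u, 1e-12), 1.0 - 1e-12)               # guard against log(0)
        t = -math.log(1.0 - u)                            # Exp(1) marginal: Lambda(t) = t
        y = rng.expovariate(censor_rate)                  # censoring time
        z.append(min(t, y))
        delta.append(1 if t <= y else 0)
        x = rho * x + innov_sd * rng.gauss(0.0, 1.0)
    return z, delta


def sup_nelson_error(z, delta, tau):
    """Sup over [0, tau] of |Lambda_n(t) - t|, assuming no ties and true Lambda(t) = t."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    cum, sup_err = 0.0, 0.0
    for rank, i in enumerate(order):
        if z[i] > tau:
            break
        sup_err = max(sup_err, abs(cum - z[i]))   # left limit at z[i]
        if delta[i] == 1:
            cum += 1.0 / (n - rank)               # jump 1 / Y_n(z[i])
        sup_err = max(sup_err, abs(cum - z[i]))   # value at z[i]
    return max(sup_err, abs(cum - tau))           # right end point of the interval


def a_n(n):
    """Rate in (3); requires n >= 3 so that log log n > 0."""
    return math.sqrt(math.log(math.log(n)) / n)


z, delta = simulate_censored_ar1(2000)
err = sup_nelson_error(z, delta, tau=1.0)
ratio = err / a_n(2000)   # Theorem 1 says this ratio stays bounded a.s. as n grows
```

Because the Nelson estimator is a step function and $\Lambda(t) = t$ is increasing, the supremum is attained at a jump point (from the left or the right) or at $\tau$, which is why the function only inspects those points.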
In order to prove Theorem 1, we need the following lemma, which is Theorem 3.2 in Cai and Roussas (1992), stated here without proof.
Lemma 2. Let $\{X_n\}$, $n \ge 1$, be a stationary $\alpha$-mixing sequence of r.v.s with d.f. $F$ and mixing coefficient $\alpha(n) = O(n^{-\nu})$ for some $\nu > 3$, and let $F_n$ be the empirical d.f. based on the segment $X_1,\dots,X_n$. Then

$$\sup_{x \in \mathbb{R}} |F_n(x) - F(x)| = O(a_n) \quad \text{a.s.}$$
We now proceed with the proof of Theorem 1 by utilizing Lemma 2.

Proof of Theorem 1. It is easy to see from Lemma 1 that $\{Z_i;\ i \ge 1\}$ and $\{(Z_i, \delta_i);\ i \ge 1\}$ are two sequences of stationary $\alpha$-mixing r.v.s. Then (1) and (2) follow by Lemma 2 and the fact that both $1 - \bar Y_n$ and $\bar N_n$ are empirical functions. An application of Lemma 2 in Gill (1981) now yields

$$\sup_{0 \le t \le \tau} |\hat\Lambda_n(t) - \Lambda(t)| \le \frac{2\rho_\tau(\bar N_n, F_*)}{\bar Y_n(\tau)} + \frac{\rho_\tau(\bar Y_n, \bar H)\,[\bar N_n(\tau) + \rho_\tau(\bar N_n, F_*)]}{\bar Y_n(\tau)\,[\bar Y_n(\tau) - \rho_\tau(\bar Y_n, \bar H)]}, \qquad (5)$$

where $\rho_\tau$ is the supremum metric on $[0, \tau]$. Therefore, (4) holds true from (5), (1) and (2). This completes the proof of the theorem. □

Lemma 3 (Theorem 3 in Dhompongsa, 1984). Under assumptions (K1) and (K3), there exists a Kiefer process $\{K(s, t),\ s \in \mathbb{R},\ t \ge 0\}$ with covariance function

$$E[K(s, t)K(s', t')] = \Gamma(s, s') \min(t, t'),$$

where $\Gamma(s, s')$ is defined by

$$\Gamma(s, s') = \mathrm{Cov}(g_1(s), g_1(s')) + \sum_{k=2}^{\infty} [\mathrm{Cov}(g_1(s), g_k(s')) + \mathrm{Cov}(g_1(s'), g_k(s))],$$

with $g_k(s) = I(Z_k \le s) - H(s)$, such that, for some $\lambda > 0$ depending only on $\nu$, given in assumption (K3),

$$\sup_{t \in \mathbb{R}} |\bar Y_n(t) - \bar H(t) - K(t, n)/n| = O(b_n) \quad \text{a.s.},$$

where $b_n = n^{-1/2}(\log n)^{-\lambda}$.
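2. Strong representation results

Let

$$g(x) = \int_0^x (\bar H(s))^{-2}\,dF_*(s), \qquad (6)$$

and for positive reals $z$ and $x$, and $\delta = 0$ or $1$, let

$$\xi(z, \delta, x) = g(z \wedge x) - I(z \le x,\ \delta = 1)/\bar H(z).$$

Observe that

$$E(\xi(Z_i, \delta_i, x)) = 0 \quad\text{and}\quad \mathrm{Cov}(\xi(Z_i, \delta_i, s), \xi(Z_i, \delta_i, t)) = g(s \wedge t).$$

Now, let us write

$$\begin{aligned}
\hat\Lambda_n(t) - \Lambda(t) &= \int_0^t \frac{d\bar N_n(s)}{\bar Y_n(s)} - \int_0^t \frac{dF_*(s)}{\bar H(s)} \\
&= \int_0^t \left(\frac{1}{\bar Y_n(s)} - \frac{1}{\bar H(s)}\right) dF_*(s) + \int_0^t \frac{1}{\bar H(s)}\,d[\bar N_n(s) - F_*(s)] \\
&\quad + \int_0^t \left(\frac{1}{\bar Y_n(s)} - \frac{1}{\bar H(s)}\right) d[\bar N_n(s) - F_*(s)] \\
&= I_1 + I_2 + I_3 \quad \text{(say)}. \qquad (7)
\end{aligned}$$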
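It follows from (1) that the first term $I_1$ in (7) turns into

$$I_1 = \int_0^t (\bar H(s))^{-2}(\bar H(s) - \bar Y_n(s))\,dF_*(s) + O(a_n^2) \quad \text{a.s.},$$

where $a_n$ is defined by (3). Consequently,

$$\begin{aligned}
I_1 + I_2 &= \int_0^t (\bar H(s))^{-2}(\bar H(s) - \bar Y_n(s))\,dF_*(s) + \int_0^t \frac{1}{\bar H(s)}\,d[\bar N_n(s) - F_*(s)] + O(a_n^2) \\
&= \int_0^t \frac{d\bar N_n(s)}{\bar H(s)} - \int_0^t \frac{\bar Y_n(s)}{(\bar H(s))^{2}}\,dF_*(s) + O(a_n^2) \\
&= -\frac{1}{n}\sum_{i=1}^{n} \xi(Z_i, \delta_i, t) + O(a_n^2) \quad \text{a.s.} \qquad (8)
\end{aligned}$$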
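To estimate $I_3$, divide the interval $[0, \tau]$ for $\tau < \tau_H$ into subintervals $[x_i, x_{i+1}]$, $i = 1,\dots,k_n$, where $k_n = O(a_n^{-1})$, and $0 = x_1 < \cdots < x_{k_n+1} = \tau$ [...]. Then,

$$\begin{aligned}
|I_3| &\le 2 \max_{1 \le i \le k_n}\ \sup_{y \in [x_i, x_{i+1}]} |\bar Y_n^{-1}(y) - \bar Y_n^{-1}(x_i) - \bar H^{-1}(y) + \bar H^{-1}(x_i)| \\
&\quad + k_n \sup_{0 \le x \le \tau} |\bar Y_n^{-1}(x) - \bar H^{-1}(x)|\ \max_{1 \le i \le k_n} |\bar N_n(x_{i+1}) - \bar N_n(x_i) - F_*(x_{i+1}) + F_*(x_i)| \\
&\le C \max_{1 \le i \le k_n}\ \sup_{y \in [x_i, x_{i+1}]} |\bar Y_n(y) - \bar Y_n(x_i) - \bar H(y) + \bar H(x_i)| \\
&\quad + C \max_{1 \le i \le k_n} |\bar N_n(x_{i+1}) - \bar N_n(x_i) - F_*(x_{i+1}) + F_*(x_i)| + O(a_n^2) \\
&= I_{31} + I_{32} + O(a_n^2) \quad \text{(say)} \qquad (9)
\end{aligned}$$

by (1) and (2). Clearly, it follows from Lemma 3 that

$$I_{31} = C \max_{1 \le i \le k_n}\ \sup_{y \in [x_i, x_{i+1}]} |K(y, n) - K(x_i, n)|/n + O(b_n) \quad \text{a.s.}$$

By the law of the iterated logarithm for the Kiefer processes (see, e.g., Theorem 1.14.2, p. 79 in Csörgő and Révész, 1981), the Kiefer-process increment term is itself $O(b_n)$, and hence

$$I_{31} = O(b_n) \quad \text{a.s.} \qquad (10)$$

Likewise,

$$I_{32} = O(b_n) \quad \text{a.s.} \qquad (11)$$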
Therefore, by combining (7)-(11), we have established the following result.

Theorem 2. Under assumptions (K1)-(K3),

$$\hat\Lambda_n(t) - \Lambda(t) = -\frac{1}{n}\sum_{i=1}^{n} \xi(Z_i, \delta_i, t) + r_n(t),$$

where

$$\sup_{0 \le t \le \tau} |r_n(t)| = O(b_n) \quad \text{a.s.}$$

for any $\tau < \tau_H$, and $b_n = n^{-1/2}(\log n)^{-\lambda}$ is as defined in Lemma 3.
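The summands $\xi(Z_i, \delta_i, t)$ in this representation have mean zero and variance $g(t)$, as observed at the start of this section. The following Python sketch checks both facts by Monte Carlo in one special case, chosen here only because everything is available in closed form: it assumes $T \sim \mathrm{Exp}(1)$ and $Y \sim \mathrm{Exp}(c)$ with a hypothetical rate $c = 0.5$, so that $\bar H(s) = e^{-(1+c)s}$, $dF_*(s) = e^{-(1+c)s}\,ds$ and $g(x) = (e^{(1+c)x} - 1)/(1+c)$. The draws are independent, which is enough for these marginal moments; the names `g`, `xi` and `xi_moments` are illustrative.

```python
import math
import random

CENSOR_RATE = 0.5  # assumed censoring rate c: Y ~ Exp(c), while T ~ Exp(1)


def g(x):
    """g(x) = int_0^x Hbar(s)^(-2) dF_*(s) in the exponential special case."""
    a = 1.0 + CENSOR_RATE
    return (math.exp(a * x) - 1.0) / a


def xi(z, delta, x):
    """xi(z, delta, x) = g(min(z, x)) - I(z <= x, delta = 1) / Hbar(z)."""
    a = 1.0 + CENSOR_RATE
    indicator_term = math.exp(a * z) if (z <= x and delta == 1) else 0.0
    return g(min(z, x)) - indicator_term


def xi_moments(x, n=200_000, seed=1):
    """Monte Carlo estimates of the mean and variance of xi(Z, delta, x)."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        t = rng.expovariate(1.0)          # failure time
        y = rng.expovariate(CENSOR_RATE)  # censoring time
        v = xi(min(t, y), 1 if t <= y else 0, x)
        total += v
        total_sq += v * v
    mean = total / n
    return mean, total_sq / n - mean * mean


mean_xi, var_xi = xi_moments(1.0)   # compare with 0 and g(1.0)
```

In this special case $dF_*(s)/\bar H(s) = ds$, so both $E\,g(Z \wedge x)$ and $E[I(Z \le x, \delta = 1)/\bar H(Z)]$ equal $x$, which is the cancellation the simulation reproduces.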
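Since $H(Z_{n:n}) \to 1$ by Theorem 1, where $Z_{n:n}$ denotes the largest of $Z_1,\dots,Z_n$, for any $0 < \tau < \tau_H$ we have that $0 < \tau < Z_{n:n}$ for sufficiently large $n$. Therefore, Lemma 1 in Breslow and Crowley (1974) gives

$$0 < -\log(1 - \hat F_n(t)) - \hat\Lambda_n(t) < \frac{n - Y_n(t)}{n\,Y_n(t)},$$

which implies that

$$\rho_\tau(-\log(1 - \hat F_n), \hat\Lambda_n) < \frac{n - Y_n(\tau)}{n\,Y_n(\tau)} \le C(\tau)/n \qquad (12)$$

for sufficiently large $n$, where $0 < C(\tau) < \infty$ is independent of $n$. Using the Taylor expansion, we have

$$\begin{aligned}
\hat F_n(t) - F(t) &= (1 - F(t)) - (1 - \hat F_n(t)) = e^{-\Lambda(t)} - e^{\log(1 - \hat F_n(t))} \\
&= e^{-\Lambda_n^{*}(t)}[\hat\Lambda_n(t) - \Lambda(t)] + e^{-\Lambda_n^{**}(t)}[-\log(1 - \hat F_n(t)) - \hat\Lambda_n(t)] \\
&= e^{-\Lambda_n^{*}(t) + \Lambda(t)}\,e^{-\Lambda(t)}[\hat\Lambda_n(t) - \Lambda(t)] \\
&\quad + e^{-\Lambda_n^{**}(t) + \hat\Lambda_n(t)}\,e^{-\hat\Lambda_n(t) + \Lambda(t)}\,e^{-\Lambda(t)}[-\log(1 - \hat F_n(t)) - \hat\Lambda_n(t)], \qquad (13)
\end{aligned}$$

where

$$\rho_\tau(\Lambda_n^{*}, \Lambda) \le \rho_\tau(\hat\Lambda_n, \Lambda), \qquad (14)$$

and, from (12),

$$\rho_\tau(\Lambda_n^{**}, \hat\Lambda_n) \le \rho_\tau(-\log(1 - \hat F_n), \hat\Lambda_n) \le C(\tau)/n. \qquad (15)$$

Therefore, it follows from (4), (12)-(15) that

$$\hat F_n(t) - F(t) = e^{-\Lambda(t)}[\hat\Lambda_n(t) - \Lambda(t)] + O(1/n) + O(\rho_\tau^2(\hat\Lambda_n, \Lambda)) = \bar F(t)[\hat\Lambda_n(t) - \Lambda(t)] + O(a_n^2)$$

almost surely. Then, the following theorem has been established.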
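Theorem 3. Under assumptions (K1)-(K3),

$$\hat F_n(t) - F(t) = -\frac{\bar F(t)}{n}\sum_{i=1}^{n} \xi(Z_i, \delta_i, t) + r_n(t),$$

where

$$\sup_{0 \le t \le \tau} |r_n(t)| = O(b_n) \quad \text{a.s.}$$

for any $\tau < \tau_H$, where $b_n = n^{-1/2}(\log n)^{-\lambda}$ is as defined in Lemma 3.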
3. Asymptotic normality

We now present the asymptotic normality of the K-M estimator based on our strong representation result. It is easy to see from Lemma 1 that $\{\xi(Z_i, \delta_i, t)\}_{i \ge 1}$ is a sequence of stationary $\alpha$-mixing bounded random variables. In order to obtain the asymptotic normality of the K-M estimator, we just apply Theorem 18.5.4 in Ibragimov and Linnik (1971) and establish the following results.

Theorem 4. Under assumptions (K1)-(K3),

$$\sqrt{n}\,[\hat\Lambda_n(t) - \Lambda(t)] \xrightarrow{d} N(0, \sigma^2(t))$$

for $t \in [0, \tau]$ for any $\tau < \tau_H$, where

$$\sigma^2(t) = \mathrm{Var}(\xi(Z_1, \delta_1, t)) + 2\sum_{j=2}^{\infty} \mathrm{Cov}(\xi(Z_1, \delta_1, t), \xi(Z_j, \delta_j, t)).$$

Theorem 5. Under assumptions (K1)-(K3),

$$\sqrt{n}\,[\hat F_n(t) - F(t)] \xrightarrow{d} N(0, \sigma^2(t)\,\bar F^2(t))$$

for $t \in [0, \tau]$ for any $\tau < \tau_H$.
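Theorem 4 expresses $\sigma^2(t)$ as a variance plus twice an infinite sum of autocovariances. The paper does not propose an estimator of this quantity. One simple possibility, shown below as a sketch under stated assumptions and not as the author's method, is to truncate the sum at a user-chosen lag and replace each covariance by its sample version. The function name `long_run_variance` and the parameter `max_lag` are hypothetical, and the input is assumed to be a sequence of (estimated) values $\xi(Z_i, \delta_i, t)$ at a fixed $t$; in practice these depend on the unknown $\bar H$ and $F_*$, which would need plug-in estimates.

```python
from typing import Sequence


def long_run_variance(values: Sequence[float], max_lag: int) -> float:
    """Truncated estimate: gamma_0 + 2 * sum_{j=1}^{max_lag} gamma_j (sample autocovariances)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    def autocov(lag: int) -> float:
        # Sample autocovariance at the given lag, normalized by n.
        return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / n

    return autocov(0) + 2.0 * sum(autocov(j) for j in range(1, max_lag + 1))


# Toy input, only to show the call; real inputs would be xi values at a fixed t.
toy = [1.0, 2.0, 3.0, 4.0]
variance_only = long_run_variance(toy, max_lag=0)
with_first_lag = long_run_variance(toy, max_lag=1)
```

A plain truncation like this can return a negative value for some inputs, and the choice of `max_lag` trades bias against variance; weighted (kernel) versions are a common alternative.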
References

Auestad, B., Tjøstheim, D., 1990. Identification of nonlinear time series: First order characterization and order determination. Biometrika 77, 669-687.
Breslow, N., Crowley, J., 1974. A large sample study of the life table and product limit estimators under random censorship. Ann. Statist. 2, 437-453.
Cai, Z.W., Roussas, G.G., 1992. Uniform strong estimation under $\alpha$-mixing, with rates. Statist. Probab. Lett. 15, 47-55.
Csörgő, M., Révész, P., 1981. Strong Approximations in Probability and Statistics. Academic Press, New York.
Dhompongsa, S., 1984. A note on the almost sure approximation of the empirical process of weakly dependent random variables. Yokohama Math. J. 32, 113-121.
Gill, R.D., 1980. Censoring and Stochastic Integrals. Mathematical Centre Tracts No. 124, Mathematisch Centrum, Amsterdam.
Gill, R.D., 1981. Testing with replacement and the product limit estimator. Ann. Statist. 9, 853-860.
Gill, R.D., 1983. Large sample behavior of the product limit estimator on the whole line. Ann. Statist. 11, 49-56.
Ibragimov, I.A., Linnik, Yu.V., 1971. Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, Groningen, the Netherlands.
Kaplan, E.L., Meier, P., 1958. Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc. 53, 457-481.
Lo, S.H., Singh, K., 1986. The product-limit estimator and the bootstrap: some asymptotic representations. Probab. Theory Rel. Fields 71, 455-465.
Peterson, A.V., 1977. Expressing the Kaplan-Meier estimator as a function of empirical subsurvival functions. J. Amer. Statist. Assoc. 72, 854-858.
Stute, W., Wang, J.L., 1993. A strong law under random censorship. Ann. Statist. 21, 1591-1607.
Voelkel, J., Crowley, J., 1984. Nonparametric inference for a class of semi-Markov process with censored observation. Ann. Statist. 12, 142-160.
Wang, J.G., 1987. A note on the uniform consistency of the Kaplan-Meier estimator. Ann. Statist. 15, 1313-1316.
Withers, C.S., 1981. Conditions for linear processes to be strong mixing. Z. Wahrsch. verw. Gebiete 57, 477-480.
Ying, Z., Wei, L.J., 1994. The Kaplan-Meier estimate for dependent failure time observations. J. Multivariate Anal. 50, 17-29.