1999, 19(3): 251-260
STRONG REPRESENTATIONS OF THE SURVIVAL FUNCTION ESTIMATOR ON INCREASING SETS FOR TRUNCATED AND CENSORED DATA 1
Sun Liuquan
Institute of Applied Mathematics, Chinese Academy of Sciences, Beijing 100080, China
Zheng Zhongguo
Department of Probability and Statistics, Peking University, Beijing 100871, China
Abstract
In this paper, based on random left truncated and right censored data, the authors derive strong representations of the cumulative hazard function estimator and the product-limit estimator of the survival function, which are valid up to a given order statistic of the observations. A precise bound for the errors is obtained which only depends on the index of the last order statistic to be included.
Key words Truncated and censored data, cumulative hazard function, product-limit estimator, strong representations.
1991 MR Subject Classification 62G05, 62E20
1 Introduction and the Main Results
In this article we study survival data which are subject to both left truncation and right censoring (LTRC). More specifically, let $(X, T, Y)$ denote random variables where $X$ is the variable of interest, called the lifetime variable, with unknown distribution function (d.f.) $F$; $T$ is the random left truncation time with arbitrary d.f. $G$, and $Y$ is the random right censoring time with arbitrary d.f. $H$. It is assumed that $X$, $T$, $Y$ are mutually independent. In the random LTRC model one observes $(Z, T, \delta)$ if $Z \ge T$, where $Z = X \wedge Y = \min(X, Y)$ and $\delta = I(X \le Y)$ is the indicator of censoring status. When $Z < T$ nothing is observed. Let $\alpha \equiv P(T \le Z) > 0$, and let $W$ denote the d.f. of $Z$, i.e., $1 - W = (1 - F)(1 - H)$. Let $(Z_i, T_i, \delta_i)$, $i = 1, 2, \ldots, n$, be an independent and identically distributed (i.i.d.) sample of $(Z, T, \delta)$ which one observes (i.e., $Z_i \ge T_i$). A traditional approach consists in estimating the so-called cumulative hazard function
$$\Lambda(x) = \int_{-\infty}^{x} \frac{dF(u)}{1 - F(u-)}$$
first and then using a one-to-one correspondence (see (1.2)) between the survival function and its corresponding cumulative hazard function. Define
$$C(x) = P(T \le x \le Z \mid T \le Z) = \alpha^{-1} G(x)[1 - W(x-)],$$
1 Received Apr. 25, 1997; revised Sep. 17, 1997. The research is supported by the Doctoral Programme Foundation and the National Science Foundation of China.
ACTA MATHEMATICA SCIENTIA
$$W_1(x) = P(Z \le x, \delta = 1 \mid T \le Z) = \alpha^{-1} \int_{-\infty}^{x} G(u)(1 - H(u-))\, dF(u).$$
It can be shown that
$$\Lambda(x) = \int_{-\infty}^{x} \frac{dW_1(u)}{C(u)}.$$
Let $C_n(x)$ and $W_{1n}(x)$ be the empirical estimators of $C(x)$ and $W_1(x)$ respectively:
$$C_n(x) = n^{-1} \sum_{i=1}^{n} I(T_i \le x \le Z_i)$$
and
$$W_{1n}(x) = n^{-1} \sum_{i=1}^{n} I(Z_i \le x, \delta_i = 1).$$
Hence a natural estimator of $\Lambda$ is
$$\hat\Lambda_n(x) = \int_{-\infty}^{x} \frac{dW_{1n}(u)}{C_n(u)} = \sum_{i=1}^{n} \frac{I(Z_i \le x, \delta_i = 1)}{n C_n(Z_i)}, \tag{1.1}$$
which is comparable to the Nelson-Aalen estimator of the cumulative hazard function for right censored data. The above-mentioned one-to-one correspondence between $F$ and $\Lambda$ is as follows:
$$1 - F(x) = e^{-\Lambda_c(x)} \prod_{u \le x} (1 - \Delta\Lambda(u)), \tag{1.2}$$
where $\Lambda_c(x)$ is the continuous part of $\Lambda$. Since $\hat\Lambda_n$ is purely discrete and has no mass at the censored $Z$'s, the product-limit estimator (PLE) $\hat F_n$ pertaining to $\hat\Lambda_n$ is defined in Tsai et al.[1] as
$$1 - \hat F_n(x) = \prod_{i: Z_i \le x} \left(1 - [n C_n(Z_i)]^{-1}\right)^{\delta_i}. \tag{1.3}$$
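The estimators in (1.1) and (1.3) can be evaluated directly from an observed sample. The following is a minimal sketch in Python with NumPy, not code from the paper: the function name `ltrc_estimators` and its argument names are hypothetical, and the sample is assumed to satisfy $Z_i \ge T_i$ for every $i$, so that each risk set contains at least its own observation.

```python
import numpy as np

def ltrc_estimators(z, t, delta, x):
    """Evaluate C_n at the Z_i, and the estimators (1.1) and (1.3) at the points x.

    z, t, delta: observed Z_i, T_i and censoring indicators (assumed Z_i >= T_i).
    Returns (C_n(Z_i), cumulative hazard estimate at x, 1 - F_n at x).
    """
    z = np.asarray(z, dtype=float)
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=int)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    n = z.size
    # n * C_n(Z_i) = #{j : T_j <= Z_i <= Z_j}, the risk-set size at Z_i
    risk = np.array([np.sum((t <= zi) & (zi <= z)) for zi in z])
    # hazard increment 1 / (n C_n(Z_i)) at uncensored Z_i, zero at censored ones
    jump = delta / risk
    # indicator I(Z_i <= x) for every evaluation point, shape (len(x), n)
    at_or_before = z[None, :] <= x[:, None]
    cum_hazard = (at_or_before * jump[None, :]).sum(axis=1)
    # product over i with Z_i <= x of (1 - 1/(n C_n(Z_i)))^{delta_i}
    survival = np.where(at_or_before, 1.0 - jump[None, :], 1.0).prod(axis=1)
    return risk / n, cum_hazard, survival
```

For example, `ltrc_estimators([2, 3, 5], [0, 1, 4], [1, 0, 1], [2, 4, 5])` treats the second observation as censored, so it contributes to the risk sets but adds no jump to the hazard estimate.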
For any d.f. $L$ denote the left and right endpoints of its support by $a_L = \inf\{x : L(x) > 0\}$ and $b_L = \sup\{x : L(x) < 1\}$ respectively. For LTRC data, Gijbels and Wang[2] pointed out that $F$ is identifiable if
[display illegible in the source] (1.4)
If $F$ is continuous, Gijbels and Wang[2] obtained an almost sure representation of $\hat\Lambda_n$ and $\hat F_n$, uniformly over compacta, in terms of a sum of i.i.d. random processes plus a negligible remainder term of the order $O(n^{-1}\log n)$ when $a_G < a_W$; Zhou[3] established a strong approximation for $\hat\Lambda_n$ and $\hat F_n$ by a certain linear functional of an empirical process at the rate $O(n^{-1}\log^{1+\varepsilon} n)$ for any $\varepsilon > 1/2$ when $a_G = a_W$ under the integral condition that
$$\int_{a_W}^{\infty} \frac{dF(u)}{G^2(u)} < \infty. \tag{1.5}$$
Gu and Lai[4] obtained an i.i.d. representation of a properly modified PLE up to $Z_{(r_n)}$, with $r_n = n - [cn^{\beta}]$ ($1/3 < \beta < 1/2$), again under some technical assumptions on the tails, where $Z_{(1)}, Z_{(2)}, \ldots, Z_{(n)}$ are the ordered $Z$-values. Furthermore, in their (3.13) and (3.14), the error bounds increase in $\beta$ while the set $[a_W, Z_{(r_n)}]$ on which $F$ is estimated decreases.
No.3
Sun & Zheng: STRONG REPRESENTATIONS OF SURVIVAL FUNCTION ESTIMATOR
In this paper we consider the estimators $\hat F_n$ and $\hat\Lambda_n$ on increasing sets in the general case. Representations of $\hat F_n$ and $\hat\Lambda_n$ will be derived which are valid on intervals $[a_W, b_n]$, while $b_n$ may be such that $\lim_{n\to\infty} b_n = b_W$. Although we have many choices for $b_n$ (for example, we may take for $b_n$ a proper $q_n$-quantile of $W$, i.e. $W(b_n) = q_n$), there is only one way to efficiently select the $b_n$'s: let the data speak for themselves. In other words, we are tempted to put $b_n = Z_{(r_n)}$ for some $1 \le r_n \le n$. If $r_n = [cn]$, $0 < c < 1$, we are in a situation related to [2] and [3]. A more important case occurs if $r_n = n - s_n$ and $s_n = o(n)$. In such a situation $b_n \to b_W$, so that finally we may cover each $[a_W, b]$, $b < b_W$, by $[a_W, b_n]$ from some (random) $n_0$ on.
The representations which follow hold true uniformly over $[a_W, Z_{(r_n)}]$, and the bounds on the remainder term nicely reflect the loss of accuracy (in the worst-case scenario) if one chooses $r_n$ too large. As technical tools we shall apply recent bounds for so-called U-statistic processes[5] and properly adapted results for weighted empirical d.f.'s. Our first result yields a strong (i.e. almost sure) representation of $\hat\Lambda_n$. Needless to say, each of the two integrals below may be written as a mean of i.i.d. random variables.
Theorem 1
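Suppose that (1.4) and (1.5) hold. Let $1 \le r_n < n$ be such that
$$r_n/n \uparrow \quad \text{and} \quad \frac{n - r_n}{\log n} \to \infty. \tag{1.6}$$
Then
$$\hat\Lambda_n(x) - \Lambda(x) = \int_{a_W}^{x} \frac{d[W_{1n}(u) - W_1(u)]}{C(u)} - \int_{a_W}^{x} \frac{C_n(u) - C(u)}{C^2(u)}\, dW_1(u) + R_n(x),$$
where for each $\varepsilon > 0$,
$$\sup_{a_W \le x \le Z_{(r_n)}} |R_n(x)| = O\left(\frac{(\log n)^{4+\varepsilon}}{n(1 - n^{-1} r_n)^2}\right) \quad \text{a.s.} \tag{1.7}$$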
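A short worked computation may help in reading the bound in (1.7). It is only a sketch, under the assumption that the bound has the form $(\log n)^{4+\varepsilon}/(n(1 - n^{-1}r_n)^2)$; the shorthand $s_n = n - r_n$ and the three regimes for $s_n$ are chosen for illustration, and $\varepsilon$ is taken smaller than $2\beta$.

```latex
\begin{align*}
\frac{(\log n)^{4+\varepsilon}}{n\,(1 - n^{-1} r_n)^2}
  &= \frac{n\,(\log n)^{4+\varepsilon}}{s_n^{2}},
  \qquad s_n = n - r_n, \\[4pt]
s_n \approx (1-c)\,n:\quad
  & \frac{(\log n)^{4+\varepsilon}}{(1-c)^{2}\, n}
    \quad\text{(order } n^{-1} \text{ up to logarithmic factors)}, \\[4pt]
s_n = n^{3/4}(\log n)^{2+\beta}:\quad
  & n^{-1/2}(\log n)^{\varepsilon - 2\beta} = o(n^{-1/2}), \\[4pt]
s_n = n^{1/2}(\log n)^{2+\beta}:\quad
  & (\log n)^{\varepsilon - 2\beta} \to 0 .
\end{align*}
```

In each line the power of $\log n$ in $s_n$ is exactly what is needed to absorb the factor $(\log n)^{4+\varepsilon}$, which is why the exponent $2+\beta$ appears in these regimes.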
Remarks (i) If $r_n = [cn]$ with $0 < c < 1$, the remainder term then becomes (up to logarithmic factors) $O(n^{-1})$, which is the same as in [2] and [3].
(ii) If $r_n = n - s_n$ and $s_n \sim n^{3/4}(\log n)^{2+\beta}$ with $\beta > 0$, then the remainder is $o(n^{-1/2})$. In such a situation Theorem 1 could be utilized to yield an invariance principle for $n^{1/2}(\hat\Lambda_n(x) - \Lambda(x))$, $a_W \le x \le b_W$, properly standardized and stopped at $Z_{(r_n)}$.
(iii) If $r_n \sim n - n^{1/2}(\log n)^{2+\beta}$ with $\beta > 0$, the remainder tends to zero (at least) with probability one.
(iv) The condition '$r_n/n \uparrow$' may be replaced, if necessary, by '$r_n/n \sim \lambda_n \uparrow$'. The other condition in (1.6) is mild and will be needed for technical reasons. Informally, it prevents $r_n$ from being too large. Without such a condition the bound (1.7) would also be of little use.
Corollary 1
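(i) Under the assumptions of Theorem 1, for each $\varepsilon > 0$,
$$\sup_{a_W \le x \le Z_{(r_n)}} |\hat\Lambda_n(x) - \Lambda(x)| = O(\,\cdot\,) \quad \text{a.s.} \tag{1.8}$$
(ii) Under the assumptions of Theorem 1, if $F$ is continuous, then for each $\varepsilon > 0$,
$$\sup_{a_W \le x \le Z_{(r_n)}} |\hat F_n(x) - F(x)| = O(\,\cdot\,) \quad \text{a.s.} \tag{1.9}$$
[The bounds in (1.8) and (1.9) are only partly legible in the source: each has numerator $(\log n)^{4+\varepsilon}$ and a denominator built from $n - r_n$ and $1 - n^{-1} r_n$, with exponents that cannot be read reliably.]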
Remark Corollary 1 yields explicit bounds on the deviations between $\hat\Lambda_n$ and $\Lambda$, respectively $\hat F_n$ and $F$. In particular, we obtain $o(1)$-bounds for $\hat\Lambda_n - \Lambda$ and $\hat F_n - F$ up to the $r_n$-th order statistic if $r_n = O(n - n^{1/2}(\log n)^{2+\beta})$ and $O(n - n^{2/3}(\log n)^{4/3+\beta})$ ($\beta > 0$), respectively. The next result provides an almost sure representation for the PLE $\hat F_n$.
Theorem 2 Under the assumptions of Theorem 1, if $F$ is continuous, then
$$\hat F_n(x) - F(x) = (1 - F(x))[\hat\Lambda_n(x) - \Lambda(x)] + R_n'(x),$$
where for each $\varepsilon > 0$,
[bound on $R_n'$ illegible in the source] a.s. (1.10)
2 Proofs of the Main Results
For notational convenience, for any d.f. $L$, write
$$L_-(x) \equiv L(x-) = \lim_{y \uparrow x} L(y).$$
The quantile function of $L$ is given by
$$L^{-1}(u) = \inf\{x : L(x) \ge u\}, \quad 0 < u < 1.$$
In the sequel, we use the following convention: $\int_t^s = \int_{(t,s]} = 0$ if $t > s$.
Lemma 1[6] For any distribution function $L$,
[display illegible in the source] (2.1)
Let
$$W^*(x) = P(Z \le x \mid Z \ge T), \qquad G^*(x) = P(T \le x \mid Z \ge T),$$
$$W_n(x) = n^{-1} \sum_{i=1}^{n} I(Z_i \le x), \qquad G_n(x) = n^{-1} \sum_{i=1}^{n} I(T_i \le x).$$
Similarly to Lemma 2.3 of [6], we have
Lemma 2 Let $0 < \beta_n < 1$ be such that
$$\frac{n\beta_n}{\log n} \to \infty \quad \text{as } n \to \infty.$$
Then for all $n$ sufficiently large and each $0 < p < 1$,
[display illegible in the source] (2.2)
Lemma 3 For each $\varepsilon > 0$, with probability one we have
$$\sup_{0 \le W^*(x-) < 1} \frac{1 - W_n(x-)}{1 - W^*(x-)} = O\left((\log n)^{1+\varepsilon}\right), \tag{2.3}$$
$$\sup_{0 \le W^*(x-) < 1} \frac{|W_n(x-) - W^*(x-)|}{\sqrt{1 - W^*(x-)}} = O\left(n^{-1/2}(\log n)^{\frac{1+\varepsilon}{2}}\right), \tag{2.4}$$
$$\sup_{0 \le G^*(x-) < 1} \frac{|G_n(x) - G^*(x)|}{\sqrt{1 - G^*(x)}} = O\left(n^{-1/2}(\log n)^{\frac{1+\varepsilon}{2}}\right). \tag{2.5}$$
Proof The first statement follows from Corollary 10.5.2 of [7], while the second and the third are consequences of [8].
Lemma 4 For $b_n < b_W$, we have
$$\sup_{i: Z_i \le b_n} \frac{C(Z_i)}{C_n(Z_i)} = O\left(\frac{\log n}{1 - W(b_n-)}\right) \quad \text{a.s.} \tag{2.6}$$
Proof The proof is similar to that of Lemma 1.2 and Corollary 1.3 of [9], hence we omit it.
Lemma 5
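Let $a_W < b < b_W$. Then for each $\varepsilon > 0$,
$$\sup_{b \le x} \frac{|C_n(x) - C(x)|}{\sqrt{C(x)}} = O\left(n^{-1/2}(\log n)^{\frac{1+\varepsilon}{2}}\right) \quad \text{a.s.} \tag{2.7}$$
It can be checked that
$$\sup_{b \le x} \frac{1 - G^*(x)}{C(x)} < \infty \quad \text{and} \quad \sup_{b \le x} \frac{1 - W^*(x-)}{C(x)} < \infty. \tag{2.8}$$
Note that $C(x) = G^*(x) - W^*(x-)$ and $C_n(x) = G_n(x) - W_n(x-)$. Then using Lemma 3, we get
$$\sup_{b \le x} \frac{|C_n(x) - C(x)|}{\sqrt{C(x)}} \le \sup_{b \le x} \frac{|G_n(x) - G^*(x)|}{\sqrt{1 - G^*(x)}} \sqrt{\frac{1 - G^*(x)}{C(x)}} + \sup_{b \le x} \frac{|W_n(x-) - W^*(x-)|}{\sqrt{1 - W^*(x-)}} \sqrt{\frac{1 - W^*(x-)}{C(x)}} = O\left(n^{-1/2}(\log n)^{\frac{1+\varepsilon}{2}}\right) \quad \text{a.s.},$$
which completes the proof of Lemma 5.
In our next lemma we derive an almost sure uniform bound for
$$\alpha_n(x) = \int_{a_W}^{x} \frac{d[W_{1n}(u) - W_1(u)]}{C(u)}.$$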
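Lemma 6 Under the assumptions of Theorem 1, we have
$$\sup_{a_W \le x \le Z_{(r_n)}} |\alpha_n(x)| = O\left(\sqrt{\frac{\log n}{n - r_n}}\right) \quad \text{a.s.} \tag{2.9}$$
Proof Note that $Z_{(r_n)} = W_n^{-1}(n^{-1} r_n) = W_n^{-1}(1 - \beta_n)$, $\beta_n = 1 - n^{-1} r_n$. According to Lemma 2 it suffices to show that
$$\sup_{a_W \le x \le b_n} |\alpha_n(x)| = O\left(\sqrt{\frac{\log n}{n\beta_n}}\right) \quad \text{a.s.}, \tag{2.10}$$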
where $b_n = W^{*-1}(1 - \tfrac{1}{2}\beta_n)$. In view of (1.5), the process $\alpha_n(x)$ is an empirical process on VC classes of functions with square integrable envelope over $a_W \le x \le b$, where $a_W < b < b_W$, so it satisfies the LIL (see [10], [11]), i.e.
$$\sup_{a_W \le x \le b} |\alpha_n(x)| = O\left(n^{-1/2}(\log\log n)^{1/2}\right) \quad \text{a.s.} \tag{2.11}$$
Now put $V_i(x) = I(b < Z_i \le x, \delta_i = 1)/C(Z_i)$, so that
$$\int_b^x \frac{d[W_{1n}(u) - W_1(u)]}{C(u)} = n^{-1} \sum_{i=1}^{n} [V_i(x) - EV_i(x)].$$
Note that $1 - W^*_- \le \alpha^{-1}(1 - W_-)$ and $W^*(W^{*-1}(x)-) \le x$. Then we get that for $b \le x \le b_n$,
$$V_i(x) \le \frac{2}{G(b)\beta_n} \quad \text{and} \quad \mathrm{Var}(V_i(x)) \le \frac{4}{G^2(b)\beta_n}.$$
From Bennett's inequality[12], for $K > 0$ we obtain
$$P_n \equiv P\left(\int_b^x \frac{d[W_{1n}(u) - W_1(u)]}{C(u)} \ge \sqrt{\frac{K \log n}{n\beta_n}}\right) \le \exp\left[-\frac{K G^2(b) \log n}{8}\, \psi\left(\frac{G^2(b)\sqrt{K \log n}}{2\sqrt{n\beta_n}}\right)\right],$$
where $\psi(x) = 2x^{-2}[(1 + x)\log(1 + x) - x]$ for $x > 0$ and $\psi(0) = 1$. Since the $\psi$-term converges to $\psi(0) = 1$, we may choose $K > 0$, for any $m > 0$, such that $P_n = O(n^{-m})$. Moreover, all bounds do not depend on $x$. Hence by Borel-Cantelli, we have
$$\max_{x \in S_n} (\alpha_n(x) - \alpha_n(b)) = O\left(\sqrt{\frac{\log n}{n\beta_n}}\right) \quad \text{a.s.}$$
for every set $S_n$ of $x$'s with $b \le x \le b_n$ such that the cardinality of $S_n$ increases at a polynomial rate. Similarly for $-\alpha_n$. Assuming for a moment that $\Lambda$ is continuous, we may choose $S_n = \{x_{ni} : 1 \le i \le m_n\}$ such that $b = x_{n1} < \cdots < x_{nm_n} = b_n$ and
$$\Lambda(x_{n,i+1}) - \Lambda(x_{ni}) \le w_n \equiv \sqrt{\frac{\log n}{n\beta_n}}.$$
Using monotonicity of $\Lambda$ and $\int_{-\infty}^{x} \frac{dW_{1n}(u)}{C(u)}$, we get
$$\sup_{b \le x \le b_n} |\alpha_n(x) - \alpha_n(b)| \le \sup_{x \in S_n} |\alpha_n(x) - \alpha_n(b)| + w_n = O\left(\sqrt{\frac{\log n}{n\beta_n}}\right) \quad \text{a.s.} \tag{2.12}$$
The general case may be traced back to the continuous case by a quantile transformation [details illegible in the source]. Hence the proof of Lemma 6 is completed by (2.11), (2.12) and
$$\sup_{a_W \le x \le b_n} |\alpha_n(x)| \le 2 \sup_{a_W \le x \le b} |\alpha_n(x)| + \sup_{b \le x \le b_n} |\alpha_n(x) - \alpha_n(b)|.$$
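Define
$$1 - \tilde F_n(x) = \prod_{i: Z_i \le x} \left(\frac{n C_n(Z_i)}{n C_n(Z_i) + 1}\right)^{\delta_i}.$$
Lemma 7 With probability one, for each $\varepsilon > 0$,
(I) $$\sup_{a_W \le x \le Z_{(r_n)}} |\tilde F_n(x) - \hat F_n(x)| = O\left(\frac{(\log n)^{4+\varepsilon}}{n(1 - n^{-1} r_n)^2}\right),$$
(II) $$\sup_{a_W \le x \le Z_{(r_n)}} |-\log(1 - \tilde F_n(x)) - \hat\Lambda_n(x)| = O\left(\frac{(\log n)^{4+\varepsilon}}{n(1 - n^{-1} r_n)^2}\right).$$
Proof As usual, applying
$$\left|\prod_{i=1}^{n} c_i - \prod_{i=1}^{n} d_i\right| \le \sum_{i=1}^{n} |c_i - d_i|, \quad |c_i|, |d_i| \le 1,$$
and expanding the logarithm, we get respectively
$$\sup_{a_W \le x \le Z_{(r_n)}} |\tilde F_n(x) - \hat F_n(x)| \le n^{-1} \int_{a_W}^{Z_{(r_n)}} \frac{dW_{1n}(u)}{C_n^2(u)} \tag{2.13}$$
and
$$\sup_{a_W \le x \le Z_{(r_n)}} |-\log(1 - \tilde F_n(x)) - \hat\Lambda_n(x)| \le 2n^{-1} \int_{a_W}^{Z_{(r_n)}} \frac{dW_{1n}(u)}{C_n^2(u)}. \tag{2.14}$$
For $a_W < b < b_W$, by Lemma 2 of [3], we obtain
$$n^{-1} \int_{a_W}^{b} \frac{dW_{1n}(u)}{C_n^2(u)} = O(n^{-1}\log n) \quad \text{a.s.} \tag{2.15}$$
Note that
[display illegible in the source]
Then using Lemmas 1, 2, 3 and 4, we get
$$n^{-1} \int_{b}^{Z_{(r_n)}} \frac{dW_{1n}(u)}{C_n^2(u)} = O\left(\frac{(\log n)^{4+\varepsilon}}{n(1 - n^{-1} r_n)^2}\right) \quad \text{a.s.} \tag{2.16}$$
Lemma 7 is immediate from (2.13)-(2.16).
Proof of Theorem 1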
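Using
[identity illegible in the source]
we have
[display illegible in the source] (2.17)
where
$$U_n(x) = \sum_{i=1}^{n} \sum_{j \ne i} \frac{I(T_j \le Z_i \le Z_j)}{C^2(Z_i)}\, I(Z_i \le x)\delta_i,$$
$$S_n(x) = \int_{a_W}^{x} \frac{dW_{1n}(u)}{C(u)}, \qquad R_{n1}(x) = n^{-1} \int_{a_W}^{x} \frac{dW_{1n}(u)}{C^2(u)},$$
$$R_{n2}(x) = \int_{a_W}^{x} \frac{[C_n(u) - C(u)]^2}{C_n(u) C^2(u)}\, dW_{1n}(u).$$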
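Introducing the variables
$$V_i = \begin{cases} Z_i & \text{if } \delta_i = 1, \\ \infty & \text{if } \delta_i = 0, \end{cases}$$
we obtain $I(Z_i \le x)\delta_i = I(V_i \le x)$. Then $U_n$ may be written as
[display illegible in the source]
Each of these summands contributes to a U-statistic process as studied in [5]. Theorem 5 there yields a representation in terms of the pertaining Hajek projection and a remainder. In particular, this approach leads to
[display illegible in the source] (2.18)
where
[display illegible in the source] (2.19)
and
[display illegible in the source]
for each $m \ge 2$ and $a_W \le b_n \le b_W$; here $C_0$ is a constant depending only on $m$. Now put $m = 2$, $a_W < b < b_W$, and $b_n = W^{*-1}(1 - \tfrac{1}{2}\beta_n)$, where as before $\beta_n = 1 - n^{-1} r_n$. Note that
[display illegible in the source]
Then by Lemma 1, we have
[display illegible in the source]
Therefore,
[display illegible in the source] (2.20)
Using (2.20), similarly to Theorem 1.2 of [5], we obtain that for any $\varepsilon > 0$,
[display illegible in the source] (2.21)
Hence applying Lemma 2 and (2.21), we get that for each $\varepsilon > 0$,
[display illegible in the source] (2.22)
For $R_{n1}$, note that
[display illegible in the source]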
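The strong law of large numbers and (1.5) yield
$$\int_{a_W}^{b} \frac{dW_{1n}(u)}{C^2(u)} = O(1) \quad \text{a.s.} \tag{2.23}$$
By Lemmas 1 and 3, we get that for each $\varepsilon > 0$,
$$\int_{b}^{Z_{(r_n)}} \frac{dW_{1n}(u)}{C^2(u)} \le \int_{b}^{Z_{(r_n)}} \frac{(1 - W^*(u-))^2}{C^2(u)} \cdot \frac{(1 - W_n(u-))^2}{(1 - W^*(u-))^2} \cdot \frac{dW_n(u)}{(1 - W_n(u-))^2} = O\left(\beta_n^{-1}(\log n)^{2+2\varepsilon}\right) \quad \text{a.s.} \tag{2.24}$$
Hence for each $\varepsilon > 0$,
$$\sup_{a_W \le x \le Z_{(r_n)}} |R_{n1}(x)| = O\left(n^{-1}\beta_n^{-1}(\log n)^{2+2\varepsilon}\right) \quad \text{a.s.} \tag{2.25}$$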
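Now using the LIL, the strong law of large numbers, Lemma 4 and (1.5), we have
$$\int_{a_W}^{b} \frac{[C_n(u) - C(u)]^2}{C_n(u) C^2(u)}\, dW_{1n}(u) = O(n^{-1} \log n \log\log n) \quad \text{a.s.} \tag{2.26}$$
Applying Lemmas 2, 4, 5 and (2.24), we get that for each $\varepsilon > 0$,
[display illegible in the source] (2.27)
Thus for each $\varepsilon > 0$,
$$\sup_{a_W \le x \le Z_{(r_n)}} |R_{n2}(x)| = O\left(\frac{(\log n)^{4+3\varepsilon}}{n\beta_n^2}\right) \quad \text{a.s.} \tag{2.28}$$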
Since the two normalizing factors [illegible in the source] may be replaced by 1 without disturbing (1.7), putting together (2.17), (2.18), (2.22), (2.25) and (2.28), we complete the proof of Theorem 1.
Proof of Corollary 1 (i) Similarly to (2.28), we get that for each $\varepsilon > 0$,
$$\sup_{a_W \le x \le Z_{(r_n)}} \left|\int_{a_W}^{x} \frac{C_n(u) - C(u)}{C^2(u)}\, dW_1(u)\right| = O\left(\frac{(\log n)^{\frac{1+\varepsilon}{2}}}{(n - r_n)^{1/2}}\right) \quad \text{a.s.} \tag{2.29}$$
Hence Corollary 1 (i) follows from (1.7), (2.29) and Lemma 6.
Proof of Corollary 1 (ii) For continuous $F$, (1.2) becomes
$$1 - F = \exp(-\Lambda). \tag{2.30}$$
The assertion follows from Lemma 7, Corollary 1 (i), (2.30) and the inequality
$$|x - y| \le |\log x - \log y|, \quad 0 < x, y < 1.$$
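The way these ingredients combine can be written out in one chain. The following is a sketch of the intended argument rather than a display from the paper; it assumes that $\tilde F_n$ denotes the modified estimator defined before Lemma 7 and that the logarithmic inequality is applied at points where $1 - \tilde F_n(x)$ and $1 - F(x)$ are positive.

```latex
\begin{align*}
|\hat F_n(x) - F(x)|
  &\le |\hat F_n(x) - \tilde F_n(x)| + |(1 - \tilde F_n(x)) - (1 - F(x))| \\
  &\le |\hat F_n(x) - \tilde F_n(x)|
     + |\log(1 - \tilde F_n(x)) - \log(1 - F(x))| \\
  &=  |\hat F_n(x) - \tilde F_n(x)|
     + |\log(1 - \tilde F_n(x)) + \Lambda(x)| \\
  &\le |\hat F_n(x) - \tilde F_n(x)|
     + |{-\log(1 - \tilde F_n(x))} - \hat\Lambda_n(x)|
     + |\hat\Lambda_n(x) - \Lambda(x)| .
\end{align*}
```

The first two terms of the last line are controlled by Lemma 7 (I) and (II), and the third by Corollary 1 (i); the equality in the third line uses (2.30).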
In the next proof we tacitly assume that the right-hand side of (1.7) is bounded as $n \to \infty$. Otherwise the assertion of Theorem 2 becomes empty.
Proof of Theorem 2 For $a_W < x < Z_{(r_n)}$, applying Taylor's expansion (see [9]), we have
$$F(x) - \hat F_n(x) = (1 - F(x))[\Lambda(x) - \hat\Lambda_n(x)] + \frac{[\Lambda(x) - \hat\Lambda_n(x)]^2}{2} \exp(-\Lambda_n^*) + [\log(1 - \hat F_n(x)) + \hat\Lambda_n(x)] \exp(-\Lambda_n^{**}), \tag{2.31}$$
with some nonnegative $\Lambda_n^*$, $\Lambda_n^{**}$. Theorem 2 is immediate from Corollary 1, Lemma 7 and (2.31).
References
1 Tsai W Y, Jewell N P, Wang M C. A note on the product limit estimator under right censoring and left truncation. Biometrika, 1987, 74: 883-886
2 Gijbels I, Wang J L. Strong representations of the survival function estimator for truncated and censored data with applications. J Multivariate Anal, 1993, 47: 210-229
3 Zhou Y. A note on the TJW product-limit estimator for truncated and censored data. Statist Probab Lett, 1996, 26: 381-387
4 Gu M G, Lai T L. Functional laws of the iterated logarithm for the product-limit estimator of a distribution function under random censorship or truncation. Ann Probab, 1990, 18: 160-188
5 Stute W. U-statistic processes: a martingale approach. Ann Probab, 1995, 22: 1725-1744
6 Stute W. Strong and weak representations of cumulative hazard function and Kaplan-Meier estimators on increasing sets. J Statist Plan Inference, 1994, 42: 315-328
7 Shorack G R, Wellner J A. Empirical Processes with Applications to Statistics. New York: Wiley, 1986
8 Csaki E. Some notes on the law of the iterated logarithm for empirical distribution function. In: Colloq Math Soc Janos Bolyai, Limit Theorems of Probability Theory (Revesz P ed.). North-Holland, Amsterdam, 1978, 11: 45-58
9 Stute W. Almost sure representations of the product-limit estimator for truncated data. Ann Statist, 1993, 21: 146-156
10 Alexander K, Talagrand M. The law of the iterated logarithm for empirical processes on Vapnik-Cervonenkis classes. J Multivariate Anal, 1989, 30: 155-166
11 Arcones M A, Gine E. On the law of the iterated logarithm for canonical U-statistics and processes. Stoch Proc Appl, 1995, 58: 217-245
12 Bennett G. Probability inequalities for the sum of independent random variables. J Amer Statist Assoc, 1962, 57: 33-45