Journal of the Korean Statistical Society 38 (2009) 331–337
Contents lists available at ScienceDirect
Journal of the Korean Statistical Society journal homepage: www.elsevier.com/locate/jkss
A note on score statistic for grouped data Hyo-Il Park Department of Statistics, Chong-ju University, Chong-ju, Choong-book 360-764, Republic of Korea
article
info
Article history: Received 4 August 2007 Accepted 21 October 2008 Available online 22 January 2009 AMS 2000 subject classifications: primary 62N03 secondary 62G10 Keywords: Log-rank statistic Nonparametric test Proportional hazards model Two-sample problem
abstract In this paper, we consider modifying the score statistic proposed by Prentice and Gloeckler [Prentice, R. L., & Gloeckler, L. A. (1978). Regression analysis of grouped data with applications to breast cancer data. Biometrics, 34, 57–67] for the grouped data under the proportional hazards model. For this matter, we apply the likelihood method and derive the scores without re-parameterization as a discrete model. Then we illustrate the test with an example and compare the efficiency with the test of Prentice and Gloeckler’s statistic by obtaining empirical powers through simulation study. Also we discuss some possible extension and estimated variances of the score statistic as concluding remarks. © 2009 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.
1. Introduction Suppose that we have a sample,{(Ti , δi , z i ), i = 1, . . . , n}, which consists of the survival times, censoring indicators (δi = 0 or 1 according as Ti is censored or not) and p-dimensional regression vectors. Then the proportional hazards model (cf. Cox (1972)) implies that for each individual i,
λi (t ) = λ0 (t ) exp( z 0i β)
(1.1)
where λi and λ0 are the ith individual hazard and baseline hazard functions, respectively and β is the corresponding vector of regression coefficients which is our main interest. We assume that the baseline hazard function λ0 is an arbitrary nonnegative function of t. In this study, we consider the following situation during an experiment. A researcher may observe objects periodically or irregularly but with some time-schedule. Then the collected data set has only the information that the survival time for each individual belongs to one of the finite number of time sub-intervals and whether it is censored or not with an individual p-dimensional regression vector. We call those the grouped data. For the grouped survival data, Cox (1972) analyzed the data by applying the discrete logistic model whereas Prentice and Gloeckler (1978) considered inferences for the data based on the discrete proportional hazards model (cf. Kalbfleisch and Prentice (1973)) using the likelihood principle. In particular Prentice and Gloeckler (1978) derived a score statistic for testing H0 : β = 0, whose form may be considered as a generalized or weighted log-rank statistic for the grouped data. However during the process of construction of the likelihood function, all the observations in the final sub-interval have been treated as uncensored whatever their censoring status might be. This means that they ignored the information about censoring status of the observations in the last sub-interval. Therefore in this study, we consider some modification of the score statistic proposed by Prentice and Gloeckler (1978) to accommodate the information about censoring status for those in the final sub-interval. For this matter, our arguments will be rather straightforward without re-parameterization of the survival probability for each
E-mail address:
[email protected]. 1226-3192/$ – see front matter © 2009 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.jkss.2008.10.009
332
H.-I. Park / Journal of the Korean Statistical Society 38 (2009) 331–337
sub-interval. Then we illustrate the test based on the modified score statistic with an example and compare the efficiency with the test of Prentice and Gloeckler’s score statistic by obtaining empirical powers through simulation study. Also we comment briefly on another direction for the potential modification of the score statistic with choosing suitable consistent estimates for the survival function and/or using a weight function. Finally we discuss two kinds of consistent estimates for the variance of generalized log-rank statistics. 2. Score statistic Suppose that the positive half-real line [0, ∞) is partitioned into k sub-intervals such as Aj = [aj−1 , aj ),
j = 1, . . . , k,
with the notation that a0 = 0 and ak = ∞. Then we note that each observation Ti belongs to only one of sub-intervals, Aj ’s. Let Dj represent the set of labels attached to individuals falling in the jth sub-interval Aj and Cj , the set of labels attached to individuals censored in Aj . Also let Rj be the set of labels attached to individuals censored in Aj or observed to survive beyond Aj . In the following, we assume that the uncensored observations precede the censored ones if they are observed in the same sub-interval. Also we assume that the life-time and censoring random variables are independent in order to avoid the so-called identifiability problem. The precedence of uncensored observations over censored ones means that the censored observation in Aj is known to have survived the whole sub-interval Aj except for the censored observations in Ak , the final sub-interval. In order to accommodate the censored observations in Ak , we consider to partition Ak further by introducing a hypothetical point akc such that ak−1 < akc < ak = ∞. Then by the assumption that uncensored observations precede censored observations in the same sub-interval, one may consider that all the uncensored observations in Ak are contained in [ak−1 , akc ), while all the censored ones belong to [akc , ∞). Later, we will see that the choice of akc has no effect on the final result. Now let S0 be the corresponding survival function for λ0 . Finally we assume that the unknown censoring distribution does not depend on the vector of regression coefficients, β . Then under the model (1.1), it is well-known that the survival function Si corresponding to the ith individual is of the form Si (y) = S0 (y)exp[z i β] .
(2.1)
Since each observation has only the information of belonging to one of the k sub-intervals and censoring status with a regression vector, Pr{Ti ∈ Aj }, the probability that Ti belongs to, Aj , j = 1, . . . , k − 1 is proportional to Si (aj−1 ) − Si (aj )
1−δi
S i ( aj )
δi
(2.2)
with the assumption of independence between the life-time and censoring random variables. Then using the relation (2.1), we see that Pr{Ti ∈ Aj } ∝ S0 (aj−1 )exp[z i β] − S0 (aj )exp[z i β]
1−δi
S0 (aj )exp[z i β]
δi
.
For j = k, using the hypothetical point akc , we see that Pr{Ti ∈ Ak } ∝ S0 (akc )exp[z i β] − S0 (ak−1 )exp[z i β]
S0 (akc )exp[z i β]
1−δi
δi
.
Then when no censored observation exists in the final sub-interval Ak , the likelihood function becomes as follows: with all the assumptions introduced up to now L(β) ∝
k−1 Y Y
S0 (aj−1 )exp[z i β] − S0 (aj )
j =1
l∈D
Y exp[z β] i
l∈Cj
j
S0 (aj )exp[z i β]
Y l∈D
S0 (ak−1 )exp[z i β] .
k
From now on, we consider the one-dimensional regression coefficient case for the simplicity of our discussion since the extension to the p-dimensional case is only a notational matter. Then by taking the logarithm and differentiating the loglikelihood function, l(β) with respect to β , we have that
k−1 X X zl exp[zl β] S0 (aj−1 )exp[zl β] ln[S0 (aj−1 )] − S0 (aj )exp[zl β] ln[S0 (aj )] ∂ l(β) = ∂β S0 (aj−1 )exp[zl β] − S0 (aj )exp[zl β] j=1 l∈D j
+
k−1 X X j−1 l∈Cj
zl exp[zl β] ln[S0 (aj )] +
X
zl exp[zl β] ln[S0 (ak−1 )].
l∈Dk
Then for testing H0 : β = 0, a score statistic can be obtained by substituting 0 for β in ∂ l(β)/∂β as follows:
k−1 X X X X S0 (aj−1 ) ln[S0 (aj−1 )] − S0 (aj ) ln[S0 (aj )] Wn = zl + zl ln[S0 (aj )] + z ln[S0 (ak−1 )]. l∈D l∈D l S 0 ( aj − 1 ) − S 0 ( aj ) l∈C j =1 j
j
k
H.-I. Park / Journal of the Korean Statistical Society 38 (2009) 331–337
333
If the baseline hazard function λ0 or the corresponding survival function S0 were completely known, then the test procedure based on Wn might be fully efficient for testing H0 : β = 0. However since we assumed that we do not have any information about λ0 or S0 , one may try to obtain a test statistic by substituting any consistent estimate for S0 . Before carrying out such business, first of all, we look through the structure of Wn in some detail. From Park (1997), by identifying that cj =
S0 (aj−1 ) ln[S0 (aj−1 )] − S0 (aj ) ln[S0 (aj )] S0 (aj−1 ) − S0 (aj )
and Cj = ln[S0 (aj )], we have that for each j, j = 1, . . . , k − 1 cj =
= = =
S0 (aj−1 ) ln[S0 (aj−1 )] − S0 (aj ) ln[S0 (aj )] S0 (aj−1 ) − S0 (aj )
− ln[S0 (aj )]
S0 (aj−1 ) ln[S0 (aj−1 )] − S0 (aj ) ln[S0 (aj )] − S0 (aj−1 ) ln[S0 (aj )] + S0 (aj ) ln[S0 (aj )] S0 (aj−1 ) − S0 (aj ) S0 (aj−1 ) ln[S0 (aj−1 )] − S0 (aj−1 ) ln[S0 (aj )] S0 (aj−1 ) − S0 (aj ) S0 (aj−1 )
S0 (aj−1 ) − S0 (aj )
ln
S0 (aj−1 ) S0 (aj )
.
Therefore we have that Wn =
k−1 X j =1
S0 (aj−1 )
S0 (aj−1 ) X
S0 (aj−1 ) − S0 (aj )
ln
S0 (aj )
X
zl + ln[S0 (aj )]
l∈Dj
l∈Cj ∪Dj
zl
+ ln[S0 (ak−1 )]
X
zl .
l∈Dk
Furthermore we note that for each j, since ln[S0 (a0 )] = ln(1) = 0, ln[S0 (aj )] = ln[S0 (aj )] − ln[S0 (aj−1 )] + ln[S0 (aj−1 )] − ln[S0 (aj−2 )] + · · ·
+ {ln[S0 (a2 )] − ln[S0 (a1 )]} + ln[S0 (a1 )]. Thus we have that a =
k−1 X
ln[S0 (aj )]
zl + ln[S0 (ak−1 )]
k−1 X
X
zl
l∈Dk
l∈Cj ∪Dj
j =1
=
X
(ln[S0 (aj )] − ln[S0 (aj−1 )]) + · · · + ln[S0 (a1 )]
X
zl
l∈Cj ∪Dj
j =1
+ {(ln[S0 (ak−1 )] − ln[S0 (ak−2 )]) + · · · + ln[S0 (a1 )]}
X
zl
l∈Dk
=
k−1 X
X
ln[S0 (aj )] − ln[S0 (aj−1 )]
=
k−1 X
zl
l∈Cj ∪Dj
j =1
ln
S 0 ( aj ) S0 (aj−1 )
j =1
X
zl .
l∈Cj ∪Dj
Finally we obtain that Wn =
k−1 X j=1
S 0 ( aj − 1 ) S0 (aj−1 ) ln S0 (aj−1 ) − S0 (aj ) S0 (aj )
X l∈Dj
X S0 (aj ) zl + ln zl S0 (aj−1 ) l∈D ∪R
j
j
X k−1 X S0 (aj−1 ) S0 (aj−1 ) S0 (aj−1 ) − S0 (aj ) X = ln zl − S (a ) − S0 (aj ) S0 (aj ) l∈D S0 (aj−1 ) l∈D ∪R j=1 0 j−1 j
j
zl j
.
(2.3)
Therefore by substituting any consistent estimate for S0 , we can obtain a nonparametric test statistic for testing H0 : β = 0. If we consider the Kaplan–Meier estimate Sˆ0 for S0 , then Sˆ0 (ˆaj−1 ) Sˆ0 (aj−1 ) − Sˆ0 (aj )
=
1 dj /nj
=
nj dj
334
H.-I. Park / Journal of the Korean Statistical Society 38 (2009) 331–337
and
" ln
Sˆ0 (aj−1 )
#
Sˆ0 (aj )
= − ln[1 − dj /nj ],
where dj and nj are the number of deaths and the size of risk set in Aj . Therefore the resulting test statistic becomes
Wn =
k−1 X nj j =1
dj
− ln 1 −
dj
X
nj
zl −
l∈D
j
dj X
zl , nj l∈D ∪R j j
(2.4)
which was already proposed by Prentice and Gloeckler (1978). However if there exists any censored observation in Ak , with the hypothetical point akc which was introduced for further partition of Ak and the assumption of precedence of uncensored observations over censored ones, the corresponding likelihood function becomes as follows:
Y S0 (aj )exp[zi β] S0 (aj−1 )exp[zi β] − S0 (aj )exp[zi β] L(β) ∝ l∈D l∈Cj j =1 j X Y S0 (akc )exp[zl β] . S0 (ak−1 )exp[zi β] − S0 (akc )exp[zi β] × k−1 Y Y
l∈Ck
l∈Dk
Then by taking the same procedure and identifying the algebraic relations as before, we obtain the following score statistic by substituting Kaplan–Meier estimate for the baseline survival function S0
Wn =
k X nj j =1
dj
− ln 1 −
dj
X
nj
l∈D
j
zl −
dj X
zl , nj l∈D ∪R j j
(2.5)
In passing, we note that the choice of the hypothetical point akc has no effect on the form of the score statistic (2.5). The reason for this is that the jump for the Kaplan–Meier estimate of S0 can only occur at the uncensored observations. Also we may deduce that in the case that all the observations in Ak are censored, the form of score statistic becomes (2.4) for the same reason. Then with any suitable consistent estimate of the variance for Wn , one may set out for testing H0 : β = 0. Now by applying the permutation principle (cf. Good (2000)), a consistent estimate Vˆ n of variance for (2.5) can be obtained as
k X X qj zl2 − z¯j2 Vˆ n = nj l∈D ∪R j =1 j
(2.6)
j
P 1 where qj = {nj (nj − dj )dj }{ln[1 − dj /nj ]}2 and z¯j = n− j l∈Dj ∪Rj zl . Then under the two-sample problem setting which will be used in the next section for the example and simulation study, by taking 1 or 0 as the value of zl according to whether it comes from the first sample or the second, Vˆ n is of the form Vˆ n =
k X n1j n2j
qj
j =1
nj nj
(2.7)
where n1j and n2j are the sizes of risk sets for the first and second samples in Aj , respectively. In the later section, we will discuss more about the q estimation of variance in some detail. Then it is well-known that with some additional assumptions, the distribution of Wn / Vˆ n converges in distribution to the standard normal distribution under H0 : β = 0. 3. An example and simulation results In order to illustrate our test procedure, we show an example using a part of the data of time for assays to register HIV positivity considered by Makuch, Escobar, and Merill (1991). The 47 patients were allocated into two groups (23 in Group 1 and 24 in Group 2) according as taking drug 1 or drug 2. In those data, each patient was measured on four occasions at monthly intervals and observed daily whether each patient should be registered as HIV positive. Since a lot of tied-values appeared, this data set can be considered as the grouped data and should be analyzed by our proposed procedure. Originally this data set was treated as multivariate data. In this study, we considered the last occasion only and summarized them in Table 1 as follows. For more detailed discussion of the data, you may refer to Makuch et al. (1991). Then one may be interested in detecting whether there is any difference in times between the two groups due to the drug effect.
H.-I. Park / Journal of the Korean Statistical Society 38 (2009) 331–337
335
Table 1 The number of days before HIV positivity registered in the last month. Group 1
21 6
5 8
18 6
19+ 8
0+ 5
8 6
14 7
18 0+
8 9
5 6
3 12
5
Group 2
3 6
0+ 21+
5 14
18 16+
6 0+
6 18+
13 0+
7 13
8 10
22+ 21
7 12
6 9
Note: + implies censored. Table 2 Gamma distribution with sample sizes m = n = 20.
η
1
1.2
1.4
1.6
1.8
2.0
Proposed test
0.034
0.083
0.164
0.248
0.324
0.388
P–G test
0.040
0.060
0.106
0.162
0.199
0.238
Table 3 Gamma distribution with sample sizes m = n = 50.
η
1
1.2
1.4
1.6
1.8
2.0
Proposed test
0.031
0.137
0.295
0.488
0.638
0.783
P–G test
0.032
0.087
0.181
0.281
0.389
0.478
Table 4 Exponential distribution with sample sizes m = n = 20.
η
1
1.2
1.4
1.6
1.8
2.0
Proposed test
0.039
0.085
0.169
0.277
0.375
0.480
P–G test
0.036
0.078
0.150
0.223
0.283
0.357
Table 5 Exponential distribution with sample sizes m = n = 50.
η
1
1.2
1.4
1.6
1.8
2.0
Proposed test
0.033
0.146
0.353
0.568
0.734
0.857
P–G test
0.036
0.134
0.279
0.456
0.596
0.715
We obtained W47 = 4.82, Vˆ 47 = 8.72 and 0.102 as its asymptotic p-value for the two-sided test which tells of some insignificant aspect between the two groups. Also we note that the procedure based on (2.4) which was proposed by Prentice and Gloeckler (P–G) must yield the same result since there is only one censored observation (22+) in the last sub-interval. In order to compare the efficiency of our test procedure with that of P–G, we consider to obtain the empirical powers through simulation study under the two sample problem setting. For this, the covariate zi for the individual i takes on value 1 if the individual i belongs to the first sample and 0, otherwise. Then under the model (1.1), we see that with the underlying survival function S0 , S2 (t )/S1 (t ) = [S0 (t )]exp(β)−1 = [S0 (t )]η−1 , where S1 and S2 are the survival functions for the first and second groups, respectively. Then we note that when β = 0 or η = 1, the two survival functions, S1 and S2 coincide. We obtain the empirical powers by varying the value of η. Therefore in the simulation results, η = 1 means that the simulations were carried out under the null hypothesis. As for the underlying survival function S0 , we consider three types of distributions such as gamma with scale parameter λ = 1 and shape parameter α = 2, exponential with λ = 1 and chi-square with 2 degrees of freedom. Also for the censoring distribution, we only use the exponential with λ = 1/6 in order to avoid excessive censoring. For the sample size, we consider the equal sample sizes for both samples and take two cases with 20 and 50 each. The nominal significance level is 0.025 for all cases. The value of η has been varied from 1.0 to 2.0 with increments by 0.2. All the computations have been carried out through SAS/IML with 1000 simulations. The empirical powers are summarized in Tables 2–7. In the tables, the proposed and P–G tests imply that the simulations have been carried out based on the statistics (2.4) and (2.5), respectively. From the tables, we note that our proposed procedure shows better performances than the P–G test which all the observations in the last sub-interval have been treated as censored or uncensored. Therefore our procedure should be effective naturally for the case that the censored and uncensored observations co-exist in the last sub-interval. Also as we might expect, along with the increase of sample sizes, the empirical powers increase and the nominal significance level can be more closely achieved.
336
H.-I. Park / Journal of the Korean Statistical Society 38 (2009) 331–337
Table 6 Chi-square distribution with sample sizes m = n = 20.
η
1
1.2
1.4
1.6
1.8
2.0
Proposed test
0.038
0.083
0.163
0.239
0.317
0.388
P–G test
0.029
0.066
0.112
0.163
0.212
0.265
Table 7 Chi-square distribution with sample sizes m = n = 50.
η
1
1.2
1.4
1.6
1.8
2.0
Proposed test
0.032
0.140
0.298
0.488
0.647
0.763
P–G test
0.025
0.099
0.201
0.304
0.426
0.531
4. Some concluding remarks Suppose that an experimenter makes an observational schedule such as he or she will not observe experimental units beyond ak−1 , which is the end point of the sub-interval Ak−1 . Then for the observations in the last sub-interval Ak , since one has the definitive information that they all survived beyond the final observational time-point ak−1 and will not be observed any more, it would be natural that one may consider that they can all be treated as uncensored or censored. In this case, the statistic (2.4) considered by Prentice and Gloeckler (1978) should be used for testing H0 : β = 0 without any loss of efficiency for the data. However when the experimenter observes experimental units during Ak for some time, then there might co-exist the censored and uncensored observations in Ak together. Even after confirming the co-existence of the censored and uncensored observations in Ak , if one insists on using the statistic (2.4) rather than (2.5), then this approach may induce some negative effect for ignoring further information about observations in Ak as we have seen from the results of simulation study, which show some loss of the powers of the test in the previous section. We note that the score statistic (2.4) or (2.5) is a generalized log-rank statistic for the grouped data. Also we note that by choosing a suitable consistent estimate for S0 other than the Kaplan–Meier estimate in (2.3), one may obtain the other form of generalized log-rank statistic. For example, if one substitutes the Kaplan–Meier estimate for S0 (aj−1 )/ S0 (aj−1 ) − S0 (aj ) and the Nelson–Aalen estimate for ln[S0 (aj−1 )/S0 (aj )], then one may obtain
Wn =
k X X j =1
l∈D
zl −
j
dj X
zl , nj l∈D ∪R j j
(4.1)
which is a direct extension of the log-rank statistic to the grouped data. Therefore this may be an advantage of our approach over the re-parameterization one since we can choose any consistent estimate for S0 . Also at this moment, it would be worthwhile to suggest that by introducing a weight function wj to (4.1) one may construct a class of generalized log-rank statistics for the grouped data such as
Wn =
k X
wj
X l∈D
j =1
zl −
j
dj X
zl , nj l∈D ∪R j j
(4.2)
With (4.2), one can apply the test procedure to various models such as the location translation model and the accelerated life-time model. In this direction, Neuhaus (1993) considered the following statistic for testing H0 : β = 0 under the twosample problem setting
˜n = W
k X
n1j , wj d1j − dj
(4.3)
nj
j =1
where d1j and n1j are the number of deaths and the size of risk set for the first sample in Aj , respectively. We note that the coincidence between (4.2) and (4.3) is obvious for the two-sample case. Finally we comment on the estimated variances of Wn . The estimated variance Vˆ n in (2.6) and (2.7) has been derived by Kalbfleisch and Prentice (1980). They used the permutation principle. In passing we would like to note that the estimated ˜ n obtained by Neuhaus is of the form with wj = −(nj /dj ) ln[1 − dj /nj ] variance V˜ n for W V˜ n =
k X nj j =1
dj
ln 1 −
dj nj
2
n1j n2j nj nj
dj
nj − d j nj − 1
=
k X n1j n2j
qj
j =1
nj
nj nj nj − 1
.
We note that V˜ n was obtained through the counting process approach. In this case, Vˆ n can achieve somewhat smaller numerical value than V˜ n . Latta (1981) carried out an extensive comparison study between the two types of estimated
H.-I. Park / Journal of the Korean Statistical Society 38 (2009) 331–337
337
variances by obtaining empirical powers through simulation study and concluded that V˜ n is superior to Vˆ n in general. However for the case of Vˆ n , it is necessary to assume that the two censoring distribution functions, which are completely unknown and nuisance entities, should be identically distributed in order to apply the permutation principle. Acknowledgment The author would like to express his sincere appreciation to the two anonymous referees for suggesting constructive advice. References Cox, D. R. (1972). Regression models and life-time tables (with discussion). Journal of the Royal Statistical Society Series B, 34, 187–220. Good, P. (2000). Permutation test—A practical guide to resampling methods for testing hypotheses. New York: Springer. Kalbfleisch, J. D., & Prentice, R. L. (1973). Marginal likelihoods based on Cox’s regression and life model. Biometrika, 60, 267–278. Kalbfleisch, J. D., & Prentice, R. L. (1980). The statistical analysis of failure time data. New York: Wiley and Sons. Latta, R. B. (1981). A Monte Carlo study of some two sample rank tests with censored data. Journal of the American Statistical Association, 76, 713–719. Makuch, R. W., Escobar, M., & Merill, S. (1991). A two sample test for incomplete multivariate data. Applied Statistics, 40, 202–212. Neuhaus, G. (1993). Conditional rank tests for the two-sample problem under random censorship. The Annals of Statistics, 21, 1760–1779. Park, H. I. (1997). A note on the relation between two forms of linear rank statistics for right censored and grouped data. Biometrika, 84, 987–988. Prentice, R. L., & Gloeckler, L. A. (1978). Regression analysis of grouped data with applications to breast cancer data. Biometrics, 34, 57–67.