Bootstrap confidence intervals for the mode of the hazard function

Bootstrap confidence intervals for the mode of the hazard function

Computer Methods and Programs in Biomedicine (2005) 79, 39–47 Bootstrap confidence intervals for the mode of the hazard function Josmar Mazuchelia,∗, ...

394KB Sizes 2 Downloads 191 Views

Computer Methods and Programs in Biomedicine (2005) 79, 39–47

Bootstrap confidence intervals for the mode of the hazard function Josmar Mazuchelia,∗, Em´ılio Augusto Coelho Barrosb Jorge Alberto Achcarc a

Universidade Estadual de Maring´ a, Departamento de Estat´ıstica, DEs/UEM, Maring´ a, PR, Brazil Bolsista CNPq/UEM, Universidade Estadual de Maring´ a, Departamento de Estat´ıstica, DEs/UEM, Maring´ a, PR, Brazil c Universidade Federal de S˜ ao Carlos, Departamento de Estat´ıstica, DEs/UFSCar, S˜ ao Carlos, SP, Brazil b

Received 6 December 2004 ; received in revised form 25 January 2005; accepted 7 February 2005

KEYWORDS

Survival analysis; Log-logistic distribution; Hazard function; Bootstrap confidence intervals

Summary In many applications of lifetime data analysis, it is important to perform inferences about the mode of the hazard function in situations of lifetime data modeling with unimodal hazard functions. For lifetime distributions where the mode of the hazard function can be analytically calculated, its maximum likelihood estimator is easily obtained from the invariance properties of the maximum likelihood estimators. From the asymptotical normality of the maximum likelihood estimators, confidence intervals can be obtained. However, these results might not be very accurate for small sample sizes and/or large proportion of censored observations. Considering the log-logistic distribution for the lifetime data with shape parameter ˇ > 1, we present and compare the accuracy of asymptotical confidence intervals with two confidence intervals based on bootstrap simulation. The alternative methodology of confidence intervals for the mode of the log-logistic hazard function are illustrated in three numerical examples. © 2005 Elsevier Ireland Ltd. All rights reserved.

1. Introduction In lifetime data analysis, we usually have a hazard function that increases up to a maximum and then decreases after this change-point. This is common in medical studies as in heart or kidney transplantation, where the patients have an increasing hazard during an adaptation period and ∗

Corresponding author. E-mail address: [email protected] (J. Mazucheli).

a decreasing hazard after this adaptation period [1]. Therefore, we need lifetime models with this property. Some common lifetime models like the exponential or Weibull distributions are not appropriate to fit this kind of data. Many existing probability distributions used to analyze lifetime data have unimodal hazard functions: the log-logistic distribution [2]; the lognormal distribution [3]; the exponentiated-Weibull distribution [4]; the inverse-Weibull distribution [5] among many others.

0169-2607/$ — see front matter. © 2005 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.cmpb.2005.02.008

40

J. Mazucheli et al.

Usually, we have interest in the estimation of the lifetime change-point where the mode of the hazard function occurs. Considering the log-logistic distribution with shape parameter ˇ > 1, we introduce asymptotical based inferences and bootstrap based inferences for the mode or peak of the hazard function. It is important to note that usually, in the literature of lifetime data analysis, confidence intervals for the mode of the hazard functions are based on asymptotic arguments. A recent study about the log-logistic distribution, related to this work, is presented in [6]. Since we are interested in the mode of the hazard function, we do not discuss the case where the shape parameter ˇ ≤ 1. For ˇ ≤ 1 the hazard function is monotone decreasing with maximum at 0. The paper is organized as follows: in Section 2 we introduce some characteristics of the loglogistic distribution; in Section 3 we introduce the likelihood function in the presence of censored observations; in Section 4 we have some comparisons between asymptotical based inferences and bootstrap simulation based inferences for the hazard change-point; in Section 5 we give some illustrative examples with simulated and real data sets. A SAS macro to obtain asymptotical and bootstrap confidence intervals is briefly discussed in Section 6.

with  = 0, ˇ = 3.0, ˇ = 1.5 and ˇ = 1.0, respectively.

2. The log-logistic distribution Let T be a random variable representing the lifetime of a unit or patient with a log-logistic distribution with hazard function given by h(t) =

e ˇtˇ−1 , 1 + e t ˇ

(1)

where t > 0; ˇ > 0 and −∞ <  < ∞ are, respectively, the shape and scale parameters. The survival and the density functions [7] are, respectively, given by S(t) =

1 1 + e t ˇ

and

f(t) =

e ˇtˇ−1 . [1 + e tˇ ]2

Fig. 1 The log-logistic hazard function considering  = 0, (—): ˇ = 3.0, (– –): ˇ = 1.5 and (–·–): ˇ = 1.0.

(2)

We observe that the hazard function (1) decreases for ˇ ≤ 1 with maximum at t = 0; for ˇ > 1, the hazard function (1) is unimodal where h(t) increases for T ≤ and decreases for T > . The parameter = [(ˇ − 1)e− ]1/ˇ is the hazard changepoint. In Fig. 1, we have the plots of the log-logistic hazard function (1) considering some special cases

3. The likelihood function in the presence of right censored data Let T10 , . . . , Tn0 be the true lifetimes of a sample of size n, assumed to be independent identically distributed with a log-logistic distribution with hazard function (1). Assuming that the observations are subject to arbitrary right censoring, the period of follow-up for the ith individual is limited to a value Ci . Then, the observed survival time of the ith individual is given by ti = min(Ti0 , Ci ). Define ıi such that ıi = 0 if ti > Ti0 (a censored observation) and ıi = 1 if ti = Ti0 (an observed death or failure of some kind). The likelihood function for  and ˇ is given by

L(, ˇ | t) =

n  i=1



ˇ−1

e ˇti

ˇ 1 + e ti

ıi

ˇ

(1 + e ti )−1 .

(3)

Bootstrap confidence intervals for the mode of the hazard function Maximum likelihood estimators for  and ˇ are obtained by solving the equations  n n   −1 = 0 i=1 ıi S(ti ) − ˇe i=1 ti (1 + e ti )  n n 1 −1  i=1 ıi [S (ti ) + ˇ log ti ]S(ti ) − i=1 log(1 + e ti ) = 0, ˇ where S(t) is given in (2). The maximum likelihood estimator for the change-point at the mode of hazard function (1) assuming ˇ > 1 is obtained from the maximum likeˆ, that is lihood estimators  ˆ and ˇ ˆ ˆ − 1)e

= [(ˇ

ˆ −ˆ  1/ˇ

]

.

(5)

Asymptotical confidence intervals for = g(, ˇ) are obtained using the delta method [8], that a is, ˆ

∼ N( ; var(ˆ

)). Under the delta method, the ˆ) is given by asymptotical variance of ˆ

= g(ˆ ,ˇ  2 ∂ ˆ)] = ˆ) Var(ˆ Var[g(ˆ ,ˇ ) g(ˆ ,ˇ ∂  2 ∂ ˆ) Var(ˇ ˆ) + g(ˆ ,ˇ ∂ˇ    ∂ ∂ ˆ) ˆ) Cov(ˆ ˆ), +2 g(ˆ ,ˇ g(ˆ ,ˇ , ˇ (6) ∂ ∂ˇ where

 1  ∂ ˆ) = −  , g(ˆ ,ˇ ∂ ˇ =ˆ,ˇ=ˇˆ ∂ ˆ) g(ˆ ,ˇ ∂ˇ 

{[log(ˇ − 1) − ](ˇ − 1) − ˇ}  =− ,  ˇ2 (ˇ − 1) ˆ =ˆ ,ˇ=ˇ

and the asymptotical variances and covariance for ˆ are obtained from the inverse of the Fisher  ˆ and ˇ information matrix for  and ˇ. Alternatively to the asymptotical based confidence intervals for , we could use parametric bootstrap simulation methods by generating samples of the log-logistic density (2) assuming  =  ˆ (or ˆ − 1) − ˇ ˆ log ˆ the reparametrization  ˆ = log(ˇ

) and ˆ where  ˆ are the obtained maximum ˇ=ˇ ˆ and ˇ likelihood estimates for  and ˇ. In the same way, we could use nonparametric bootstrap simulation methods by resampling with replacement the available data (t1 , ı1 ), . . . , (tn , ın ) [9,10]. For the implementation of the parametric bootstrap, we generate pseudo-random values [11,12] from

1/ˇˆ U −ˆ  t= e , (7) 1−U where U has a uniform distribution in the (0, 1) interval.

41

(4)

4. Bootstrap confidence intervals for In this section we introduce the steps for the construction of bootstrap confidence intervals for , the mode of the log-logistic hazard function. The advantage of the bootstrap is that the joint distribution of the maximum likelihood estimators is not assumed to be normal, unlike in the delta method. We consider two bootstrap methods to construct the confidence intervals for : the p-Bootstrap method suggested by [13], based on the percentiles of the bootstrap distribution, and the t-Bootstrap method suggested by [14]. Other existing alternatives for the p-Bootstrap and the t-Bootstrap, not considered in this paper, also could be used to construct confidence intervals. For a complete review of available approaches to bootstrap confidence intervals see [9,15,16]. Let U = (t, ı) be the observed data where t = (t1 , . . . , tn ) is the vector of lifetime data and ı = (ı1 , . . . , ın ) is the vector of indicators of censored observations.

4.1. p-Bootstrap [a] Random select, with replacement from U, a bootstrap sample (t1∗ , ı∗1 ), . . . , (tn∗ , ı∗n ). [b] From the bootstrap sample in [a], find the maximum likelihood estimates of , denoted by ˆ

∗ . [c] Repeat steps [a] and [ b], B times. ∗ ≤ˆ ∗ ≤ ··· ≤ ˆ ∗ ) find a 100 × [d] From ˆ

∗ = (ˆ

(1)

(2)

(B) (1 − ˛)% bootstrap confidence interval given by ∗ ,ˆ ∗ ) where q = [(˛/2)B] and q = B −

(q (ˆ

(q 1 2 1) 2) q1 .

4.2. t-Bootstrap [a ] Random select, with replacement from U, a bootstrap sample (t1∗ , ı∗1 ), . . . , (tn∗ , ı∗n ).

[b ] From the bootstrap sample in [a ], find the maximum likelihood estimates of , denoted by ˆ

∗ .

[c ] Repeat steps [a ] and [b ], B times. ∗ , . . . , T ∗ ), [d ] From ˆ

∗ given in [d], find T ∗ = (T(1) (B) ∗ ∗ where T(i) ≤ T(j) for i, j = 1, . . . , B; i = j

42

J. Mazucheli et al.

Table 1

Simulated data set ( = −5.0; ˇ = 2.0)

2.09 2.64 2.81 2.98 3.62 4.08 4.32 6.12 6.25 6.52

7.06 7.26 7.30 7.89 8.30 8.30 8.53 8.55 8.93 8.98

10.14 10.16 10.22 10.91 11.57 11.58 11.67 11.99 12.40 13.04

13.12 13.74 14.27 14.84 17.40 18.51 19.45 20.26 20.42 20.46

20.67 22.58 25.55 26.49 27.12 47.01 55.62 80.35 104.09 115.36

given by Ti∗ =



i∗ − ˆ

) , ∗ ˆi

(8)

where ˆ

is the maximum likelihood estimates for and ˆi∗ is the standard error of ˆ

i∗ . Since ˆi∗ (i = 1, . . . , B) can be calculated directly by the inverse of the Fisher information matrix, it is not necessary to resample from the bootstrap sample [9,16]. [e ] From T ∗ we find a 100 × (1 − ˛)% bootstrap confidence interval for given by

− ˆ T ∗(q1 ) ), (ˆ

− ˆ T ∗(q2 ) ; ˆ

(9)

where q1 and q2 are defined in [d] and ˆ = Var(ˆ

), (ˆ

and ˆ are calculated from the original lifetime data).

5. Some illustrative examples 5.1. An example without censored lifetime observations In Table 1, we have a simulated data set from the log-logistic distribution with parameters  = −5.0 and ˇ = 2.0 where the mode of the hazard function occurs at the change-point = 12.1825 (n = 50 generated observations using the Eq. (7)). In Table 2, we have the maximum likelihood estimates for and ˇ and their asymptotical standard errors. In Table 3, we have 95% confidence intervals Table 3

MLEa 12.2750 2.1464

a

Parameter

MLE

SE

95% Confidence interval

ˇ

12.2239 2.1106

1.8575 0.2534

(8.5833; 15.8645) (1.6140; 2.6072)

for and ˇ based on B = 100,000 resamples from the original simulated data considering the pBootstrap and the t-Bootstrap methods assuming the parametrization  = log(ˇ − 1) − ˇ log in likelihood function. For illustration purposes the distribution of B = 100,000 bootstrap replications for the parameter

along with its 95% confidence interval is presented in Fig. 2(a). In Fig. 2(b) we have the plot of the estimated hazard function. We clearly observe from the results of Tables 2 and 3 that the bootstrap confidence intervals have shorter lengths than the asymptotical confidence intervals. We also could calculate asymmetry indexes for the confidence interval for given by F = (ˆ

U − ˆ

)/(ˆ

−ˆ

L ) where ˆ

U is the upper limit and ˆ

L is the lower limit for the confidence interval [9]. Observe that if F = 1 we have symmetrical confidence intervals. The bootstrap confidence intervals are in agreement with the bootstrap distribution for , that is, we get more accurate results (see, Fig. 2(a)). The asymptotical confidence intervals enforces symmetry (F = 1), a condition not verified for the empirical bootstrap distribution for . As pointed in [9], exact intervals, when they exist, are in general quite asymmetrical. In Table 4, we have the ranges (R) and the asymmetry indexes (F ) for the obtained confidence intervals.

5.2. An example with censored lifetime observations As our second example of the use of the resampling methods, let us assume that the first 40 of the n = 50 observations of Table 1 are

Bootstrap estimates, p-Bootstrap and t-Bootstrap confidence intervals

Parameter

ˇ

Table 2 Maximum likelihood estimates and asymptotical confidence intervals

Based on B = 100,000 bootstrap replications.

SEa 1.3083 0.2247

95% Confidence interval p-Bootstrap

t-Bootstrap

(9.8980; 15.0066) (1.7914; 2.6649)

(9.9241; 14.0735) (1.7624; 2.4106)

Bootstrap confidence intervals for the mode of the hazard function

43

Fig. 2 (a) Distribution of B = 100,000 bootstrap replications for the parameter along with its 95% confidence interval. (b) Estimated hazard function. (—): asymptotic confidence interval, (– –): p-Bootstrap and (· · ·): t-Bootstrap confidence intervals.

uncensored data (ıi = 1, i = 1, . . . , 40) and that the last 10 largest observations are censored data (ıi = 1, i = 41, . . . , 50). We also have B = 100,000 obtained resamples from the original data. However, in situations where we have censored lifetime data, parametric bootstrap in general gives better results than using nonparametric bootstrap [15,17]. In Tables 5 and 6, we have the asymptotical and bootstrap confidence intervals for the parameters

and ˇ. As expected [18], we observe larger lengths for the obtained confidence intervals based on the asymptotical or bootstrap methods in comparison with the obtained confidence intervals

considering uncensored observations. As in the uncensored case, bootstrap confidence intervals have shorter lengths than the asymptotical confidence intervals.

5.3. An example with a real data set Let us consider the lifetime of 96 patients with lung cancer introduced by [19]. From the 96 patients in the study, 5 were censored. In Tables 7 and 8, we have 95% asymptotical and bootstrap confidence intervals for the parameters and ˇ. The empirical bootstrap distributions are presented in Fig. 3(a) and (b), respectively.

Table 4 Range (R) and asymmetry index (F ) for the 95% confidence intervals for and ˇ Interval

Asymptotical p-Bootstrap t-Bootstrap

Table 5 Maximum likelihood estimates and asymptotical confidence intervals in the presence of censored observations

Parameter ˇ

R

F

R

F

Parameter

7.2811 5.1086 4.1493

1.0000 1.1964 0.8042

0.9932 0.8734 0.6483

1.0000 1.7362 0.8614

ˇ

MLE 10.2305 1.7353

SE

95% Confidence interval

2.4372 0.2428

(5.4537; 15.0073) (1.2728; 2.1979)

44

J. Mazucheli et al.

Fig. 3 (a and b) Distributions of B = 100,000 bootstrap replications for parameters and ˇ along with their 95% confidence intervals, (c and d) quantile-quantile plots for and ˇ. (—): asymptotic confidence interval, (– –): pBootstrap and (· · ·): t-Bootstrap confidence intervals.

Bootstrap confidence intervals for the mode of the hazard function

45

Table 6 Bootstrap estimates, p-Bootstrap and t-Bootstrap confidence intervals in the presence of censored observations MLEa

Parameter

ˇ

SEa

10.1632 1.7696 a

1.7387 0.2490

95% Confidence interval p-Bootstrap

t-Bootstrap

(6.6225; 13.4748) (1.3746; 2.3398)

(6.9704; 12.5813) (1.3635; 2.0809)

Based on B = 100,000 bootstrap replications.

Table 7 Maximum likelihood estimates and asymptotical confidence intervals

Table 9 Range (R) and asymmetry index (F ) for the 95% confidence interval for and ˇ

Parameter

Interval

ˇ

MLE 35.3470 1.3903

SE

95% Confidence interval

11.0213 0.1208

(13.7455; 56.9484) (1.1535; 1.6270)

To check if the normality of the empirical bootstrap distributions for and ˇ are appropriate, we have in Fig. 3(c) and (d), their normal quantilequantile plots. If we have normality, then the points in these plots should lie roughly on a straight line [20]. From these plots, we clearly observe that the normality assumption is not appropriate, which justifies the use of bootstrap methods to construct confidence intervals for the parameters. The analytical goodness-of-fit tests given by the Kolmogorov–Smirnov D statistic, the Anderson– Darling statistic, and the Cramer–von Mises statistic [21], also do not support the normality assumption. For both parameters, and ˇ, all tests provide pvalues less than 0.001. From the obtained results of Table 9, we observed that the obtained bootstrap confidence intervals are more accurate than the obtained asymptotical intervals, even in the situation of a large sample size (n = 96) and few censored observations. The fitted hazard function is given in Fig. 4(a). To check the adequacy of the log-logistic distribution for the data [1], we have in Fig. 4(b), the plot of log(t) versus log[(1 − ˆ S)/ˆ S], where ˆ S is the Kaplan

Table 8

MLEa 36.1959 1.4009

a

Asymptotical p-Bootstrap t-Bootstrap

ˇ

R

F

R

F

43.2029 34.3460 27.5554

1.0000 1.4570 0.8652

0.4736 0.3408 0.2620

1.0000 1.4585 0.8712

and Meier [22] estimator of the survival function. We observe a good fit of the data for the log-logistic model.

6. A SAS macro for asymptotic and bootstrap confidence intervals A SAS macro has been written to implement the asymptotic and bootstrap confidence intervals presented in numerical examples, Section 5. Usually, bootstrap simulation using SAS software resample the original data via proc multtest. In the macro code, we do the resampling using o proc surveyselect and the maximum likelihood estimates for and ˇ are obtained via the proc nlp using the trustregion method [23–25]. From the B bootstrap estimates for , ˇ, Var( ) and Var(ˇ), SAS/IML is used to provide asymptotic and bootstrap confidence intervals estimates. The SAS macro, along with instructions on its use, is available from the first author.

Bootstrap estimates,p-Bootstrap and t-Bootstrap confidence intervals

Parameter

ˇ

Parameter

Based on B = 100,000 bootstrap replications.

SEa 8.8233 0.0875

95% Confidence interval p-Bootstrap

t-Bootstrap

(21.3680; 55.7140) (1.2516; 1.5925)

(20.5732; 48.1286) (1.2502; 1.5122)

46

J. Mazucheli et al.

Fig. 4

(a) Fitted hazard function; (b) plot of log(t) versus log[(1 − ˆ S)/ˆ S].

7. Conclusions Considering the log-logistic distribution with shape parameter ˇ > 1, we presented two bootstrap based methods to construct confidence intervals for the change-point of the hazard function. We have showed in three numerical examples that the nonparametric bootstrap can be quite useful. Even for large sample sizes and small proportion of censored data, we observed better inference results considering bootstrap based methods in comparison to the usual asymptotical inference based on the normality of the maximum likelihood estimators. As pointed in [6], the difference between asymptotic confidence interval and the bootstrap confidence interval might be due to the fact that the variance obtained by the delta method depends on Taylor series approximation where the error terms are ignored. These terms are part of the variances computed by bootstrapping. The bootstrap can also be used to obtain confidence intervals for other functions of the parameters. For example, we could obtain confidence intervals for the mean or the median of the loglogistic distribution. The nonparametric bootstrap procedure was implemented in the SAS system and the macro

code can be obtained from the first author by request.

Acknowledgements The authors would like to thank the referees for their helpful comments. Emlio Augusto Coelho Barros and Jorge Alberto Achcar gratefully acknowledge partial financial support from CNPq/ Brazil.

References [1] D. Collett, Modelling Survival Data in Medical Research, Chapman & Hall, New York, 1994. [2] S. Bennett, Log-logistic regression models for survival data, Appl. Stat. 32 (1983) 165—171. [3] J.P. Klein, M.L. Moeschberger, Survival analysis: techniques for censored and truncated data, Springer-Verlag, New York, 1997. [4] G.S. Mudholkar, A.D. Hutson, The exponentiated Weibull family: some properties and a flood data application, communications in statistics, Theory Methods 25 (12) (1996) 3059—3083. [5] R. Jiang, D.N.P. Murthy, P. Ji, Models involving two inverseWeibull distributions, Reliability Eng. Syst. Saf. 73 (2001) 73—81.

Bootstrap confidence intervals for the mode of the hazard function [6] R.C. Gupta, O. Akman, S. Lvin, A study of log-logistic model in survival analysis, Biometrical J. 41 (4) (1999) 431—443. [7] J.F. Lawless, Statistical Models and Methods for Lifetime Data, John Wiley and Sons, New York, 1982. [8] C.R. Rao, H. Toutenburg, Linear models, Springer-Verlag, New York, 1999. [9] B. Efron, R.J. Tibshirani, An introduction to the bootstrap, vol. 57 of Monographs on Statistics and Applied Probability, Chapman & Hall, New York, 1993. [10] T.J. DiCiccio, B. Efron, Bootstrap confidence intervals, Stat. Sci. 11 (3) (1996) 189—228. [11] B.D. Ripley, Stochastic simulation, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons Inc., New York, 1987. [12] L. Devroye, Nonuniform random variate generation, Springer-Verlag, New York, 1986. [13] B. Efron, The jackknife, the bootstrap and other resampling plans, vol. 38 of CBMS-NSF Regional Conference Series in Applied Mathematics, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1982. [14] P. Hall, Theoretical comparison of bootstrap confidence intervals, Ann. Stat. 16 (3) (1988) 927—985. [15] A.C. Davison, D.V. Hinkley, Bootstrap methods and their application, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, 1997 with 1 IBM-PC floppy disk (3.5 inch; HD).

47

[16] J. Carpenter, J. Bithell, Bootstrap confidence intervals: when, which, what? A practical guide for medical statistician, Stat. Med. 19 (2000) 1141—1164. [17] N. Veraverbeke, Bootstrapping in survival analysis, S. Afr. Stat. J. 31 (2) (1997) 217—258. [18] S.L.J., W.Q. Meeker, Parametric simultaneous confidence bands for cumulative distributions from censored data, Technometrics 43 (4) (2001) 450—461. [19] R.L. Prentice, Exponential survivals with censoring and explanatory variables, Biometrika 60 (1973) 279—288. [20] J.M. Chambers, W.S. Cleveland, B. Kleiner, T.P.A., Graphical Methods for Data Analysis, Chapman & Hall, New York, 1983. [21] R.B. D’Agostino, M.A. Stephens, Goodness-of-fit Techniques, Marcel Dekker Inc., New York, 1986. [22] E.L. Kaplan, P. Meier, Nonparametric estimation from incomplete observations, J. Am. Stat. Assoc. 53 (1958) 457—481. [23] J. Dennis, D. Gay, R. Welsch, An adaptive nonlinear leastsquares algorithm, ACM Trans. Math. Software 7 (1981) 348—368. [24] J.J. Mor´ e, D.C. Sorensen, Computing a trust region step, SIAM J. Sci. Stat. Comput. 4 (3) (1983) 553—572. [25] T.F. Coleman, C. Hempel, Computing a trust region step for a penalty function, SIAM J. Sci. Stat. Comput. 11 (1) (1990) 180—201.