Computer Methods and Programs in Biomedicine 67 (2002) 217– 223 www.elsevier.com/locate/cmpb
SAS and SPLUS programs to perform Cox regression without convergence problems

Georg Heinze *, Meinhard Ploner

Department of Medical Computer Sciences, University of Vienna, A-1090 Vienna, Spitalgasse 23, Austria

Received 20 October 2000; received in revised form 29 December 2000; accepted 4 January 2001
Abstract

When analyzing survival data, the parameter estimates and consequently the relative risk estimates of a Cox model sometimes do not converge to finite values. This phenomenon is due to special conditions in a data set and is known as 'monotone likelihood'. Statistical software packages for Cox regression using the maximum likelihood method cannot appropriately deal with this problem. A new procedure to solve the problem has been proposed by G. Heinze, M. Schemper, A solution to the problem of monotone likelihood in Cox regression, Biometrics 57 (2001). It has been shown that, unlike the standard maximum likelihood method, this method always leads to finite parameter estimates. We developed a SAS macro and an SPLUS library to make this method available within these widely used statistical software packages. Our programs are also capable of performing interval estimation based on the profile penalized log likelihood (PPL) and of plotting the PPL function, as suggested in the same publication. © 2002 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: FORTRAN; Monotone likelihood; Nonexistence of parameter estimates; Penalized likelihood; Proportional hazards regression; Survival analysis
1. Introduction

For almost three decades, the semiparametric proportional hazards regression model of Cox [1] has been the first choice for the analysis of survival data. The straightforward interpretation of the estimated parameters as log relative risks favored its popularity in medical research. Parameters are estimated by maximizing the (log) likelihood function (maximum likelihood method). However, it is also known that there are certain situations where finite maximum likelihood parameter estimates do not exist, i.e. where the likelihood converges to a finite value while at least one parameter estimate diverges to ±∞ [2]. This phenomenon is due to special conditions in a data set and is known as 'monotone likelihood'. The probability of occurrence of monotone likelihood is too high to be negligible [3,4].

* Corresponding author. Tel.: +43-1-404006684; fax: +43-1-404006687. E-mail address: [email protected] (G. Heinze).
G. Heinze, M. Ploner / Computer Methods and Programs in Biomedicine 67 (2002) 217–223
In statistical software packages for Cox regression, convergence of the model fitting algorithm is usually judged by the log likelihood [5,6]. The resulting relative risk estimates are taken from the iteration at which the log likelihood changes by less than a very small prespecified value (e.g. 10⁻⁶). In case of monotone likelihood, the algorithm converges, but the nonexistence of finite estimates is often overlooked. The resulting estimates are completely arbitrary and thus extremely inaccurate [3] and misleading.

Fig. 1. Analysis of breast cancer data set: output of SAS macro FC.

Fig. 2. Analysis of breast cancer data set: output of PROC PHREG.

Fig. 3. Plot of the profile penalized log likelihood function for variable HG.

In this paper, we present SAS [7] and SPLUS [6] programs to solve the monotone likelihood problem. In our programs, parameter estimation is based on the penalized maximum likelihood approach originally developed by Firth [8] and adapted to the Cox model by Heinze and Schemper [3]. This approach provides an ideal solution to the problem of monotone likelihood: it has been shown that parameter estimates from this approach are always finite and have lower small-sample bias than maximum likelihood estimates. Because of the asymmetric shape of the profile penalized likelihood in case of monotone likelihood, Heinze and Schemper [3]
recommend constructing confidence intervals based on the profile penalized likelihood instead of using the simpler Wald method. Our programs can perform both types of interval estimation and compare them graphically by plotting the profile penalized log likelihood (PPL) function, as has been suggested [3]. The core routines of both programs reside in a dynamic link library (DLL) and can thus also be accessed from other software packages. In Section 2 we describe the algorithms used by the SAS macro FC.SAS and by the SPLUS library coxphf; in Section 3 we give an overview of the applications of our programs. In Section 4, we compare results obtained with the penalized maximum likelihood approach to those from a standard analysis by means of a worked example, and describe the availability of the programs.
2. Computing estimates and confidence limits

The Cox [1] regression model is given by

log{h(t)/h₀(t)} = x_i β,

where h(t)/h₀(t) is the hazard ratio or relative risk of a failure occurring at any given time t for an individual with covariate (row) vector x_i compared to an individual with a null covariate vector. The regression parameter vector β of length k (corresponding to k covariates) is estimated by maximizing the partial log likelihood

log L(β) = Σ_{j=1}^{m} [ s_j β − d_j log Σ_{h ∈ R_j} exp(x_h β) ],

where m, s_j, d_j and R_j denote the number of distinct survival times among the n survival times, the vector sum of the covariates of the individuals failing at time t_(j), the number of all individuals failing at time t_(j), and the individuals at risk (the risk set) at time t_(j), respectively. Maximum likelihood estimates of the parameter β are found by equating the first derivative of the log likelihood, the so-called 'score function', to zero. The Cox regression model is discussed in more detail in standard survival textbooks [9,10].

In general, maximum likelihood estimates are often prone to small-sample bias. To reduce this bias, Firth [8] suggested maximizing the penalized log likelihood

log L(β)* = log L(β) + 0.5 log |I(β)|,

where I(β) is the Fisher information matrix, i.e. minus the second derivative of the log likelihood. Applying this idea to Cox regression, the score function U(β) is replaced by the modified score function U(β)* = U(β) + a, where a has rth entry a_r = 0.5 tr{I(β)⁻¹ [∂I(β)/∂β_r]}, r = 1, …, k. Heinze and Schemper [3] give the explicit formulae for I(β) and ∂I(β)/∂β_r.

In our programs, estimation of β is based on a Newton–Raphson algorithm. Parameter values are initialized with 0. Breslow's approximation [11] is used for computation of the likelihood and its derivatives in the presence of tied survival times. With a starting value β^(0), the penalized maximum likelihood estimate β̂ is obtained iteratively:

β^(s+1) = β^(s) + I(β^(s))⁻¹ U(β^(s))*.    (1)
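To make the estimation procedure concrete, the following Python sketch fits a single-covariate Cox model by Firth-type penalized Newton-Raphson with step-halving. It is an illustration only, not the FORTRAN implementation behind FC.DLL: the toy data set (chosen to exhibit monotone likelihood), the finite-difference derivatives, and all tolerances are assumptions of the example.

```python
import math

# Toy data with monotone likelihood: every subject with x = 1 fails
# before any subject with x = 0 (all observations are events, no ties).
time = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x    = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

def log_lik(b):
    """Breslow partial log likelihood for a single covariate (no ties here)."""
    ll = 0.0
    for j in range(len(time)):          # every subject is an event
        risk = sum(math.exp(b * x[h])
                   for h in range(len(time)) if time[h] >= time[j])
        ll += b * x[j] - math.log(risk)
    return ll

def info(b, h=1e-3):
    """Observed information I(b) = -l''(b), by central differences."""
    return -(log_lik(b + h) - 2.0 * log_lik(b) + log_lik(b - h)) / h**2

def pen_log_lik(b):
    """Firth-penalized log likelihood l*(b) = l(b) + 0.5 log I(b)."""
    return log_lik(b) + 0.5 * math.log(info(b))

def pen_score(b, h=1e-4):
    """Modified score U*(b) = dl*(b)/db, by central differences."""
    return (pen_log_lik(b + h) - pen_log_lik(b - h)) / (2.0 * h)

def firth_cox(b=0.0, maxit=50, eps=1e-6):
    """Iteration (1) in one dimension: b <- b + U*(b)/I(b), step-halving."""
    for _ in range(maxit):
        step = pen_score(b) / info(b)
        # halve the step while it would decrease the penalized likelihood
        while pen_log_lik(b + step) < pen_log_lik(b) and abs(step) > eps:
            step *= 0.5
        b += step
        if abs(step) < eps:
            break
    return b
```

On these data the ordinary partial likelihood increases monotonically in the parameter, so the maximum likelihood estimate diverges, while the penalized estimate returned by firth_cox stays finite.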
If the PPL evaluated at β^(s+1) is less than that evaluated at β^(s), then β^(s+1) is recomputed by step-halving. For each entry r of β, r = 1, …, k, the absolute step size |β_r^(s+1) − β_r^(s)| is restricted to a maximal allowed value. These two means avoid numerical problems during estimation. The iterative process is continued until the parameter estimates converge.

Computation of profile penalized likelihood confidence intervals for the parameters follows the algorithm of Venzon and Moolgavkar [12]. For testing the hypothesis γ = γ₀, where β = (γ, δ), let the likelihood ratio statistic be

LR = 2[log L(γ̂, δ̂)* − log L(γ₀, δ̂_{γ₀})*],

where (γ̂, δ̂) is the joint penalized maximum likelihood estimate of β = (γ, δ), and δ̂_{γ₀} is the penalized maximum likelihood estimate of δ when γ = γ₀. The profile penalized likelihood confidence interval is the continuous set of values γ₀ for which LR does not exceed the (1 − α)·100th percentile of the χ² distribution with one degree of freedom. The confidence limits can therefore be found iteratively by approximating the PPL function in a neighborhood of β by the quadratic function

l̃(β + λ) = l(β) + λ′U* − (1/2)λ′Iλ,

where U* = U(β)* and I = I(β). Suppose the confidence limits for parameter β_r are to be computed. Then the increment vector λ for the next iteration is obtained by solving the likelihood equations

∂/∂λ {l̃(β + λ) + u(e_r′λ − q)} = 0,

where u is a Lagrange multiplier, e_r is the rth unit vector, and q is an unknown constant. The solution is

λ = I⁻¹(U* + u e_r).    (2)
Let l₀ = l_max − (1/2)χ²_{1,1−α}, where l_max is the maximized value of the PPL and χ²_{1,α} is the α-quantile of the χ² distribution with one degree of freedom. By substituting Eq. (2) into the equation l̃(β + λ) = l₀, u can be estimated as

u = ± [ (2{l(β) − l₀} + U*′I⁻¹U*) / (e_r′I⁻¹e_r) ]^{1/2}.
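The confidence-limit search can be sketched numerically. The Python fragment below is an assumption-laden illustration: it replaces the PPL with a simple concave stand-in function and uses plain bisection rather than the Venzon-Moolgavkar update described above, just to show that the two limits are the two crossings of the PPL with the level l₀ = l_max − (1/2)χ²_{1,1−α}.

```python
# Stand-in for a (penalized) profile log likelihood; any smooth concave
# function serves to illustrate the confidence-limit search.
def ppl(b):
    return -((b - 2.0) ** 2)        # maximum at b = 2, l_max = 0

b_hat, l_max = 2.0, ppl(2.0)
chi2_1_95 = 3.841459                # 95th percentile of chi-square, 1 df
l0 = l_max - 0.5 * chi2_1_95        # target level l0 = l_max - chi2/2

def limit(lo, hi, tol=1e-8):
    """Bisection for the point where ppl crosses l0 on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # keep the sub-interval on which ppl - l0 changes sign
        if (ppl(lo) - l0) * (ppl(mid) - l0) <= 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

lower = limit(b_hat - 10.0, b_hat)   # crossing below the maximum
upper = limit(b_hat, b_hat + 10.0)   # crossing above the maximum
```

For this symmetric quadratic stand-in the two limits are equidistant from the maximum; for an asymmetric PPL, as arises under monotone likelihood, they are not, which is exactly why the profile-based interval is preferred over the Wald interval.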
The upper confidence limit for β_r is computed by starting at the penalized maximum likelihood estimate of β and iterating with positive values of u until convergence is attained. The process is repeated for the lower confidence limit, using negative values of u. Convergence is declared at the current iteration if |l(β) − l₀| ≤ ε and (U* + u e_r)′I⁻¹(U* + u e_r) ≤ ε for a small tolerance ε.

In some situations, computation of profile penalized likelihood confidence intervals may be time consuming, since the iterative procedure outlined above has to be repeated for the lower and the upper confidence limits of each of the k parameters. In other problems, one may not be interested in interval estimation at all. In such cases, the user can request computation of Wald confidence intervals and P-values, which are based on the normal approximation of the parameter estimates and do not need any iterative estimation process. Standard errors σ̂_r, r = 1, …, k, of the parameter estimates are computed as the square roots of the diagonal elements of the variance matrix V(β̂) = I(β̂)⁻¹. A (1 − α)·100% Wald confidence interval for parameter β_r is then defined as [β̂_r + Φ⁻¹(α/2)σ̂_r, β̂_r + Φ⁻¹(1 − α/2)σ̂_r], where Φ⁻¹ denotes the quantile function of the standard normal distribution. The adequacy of Wald confidence intervals for parameter estimates should be verified by plotting the PPL function: a symmetric shape allows use of Wald intervals, while an asymmetric shape demands profile penalized likelihood intervals [3].
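Unlike the profile-based limits, a Wald interval needs no iteration at all. This minimal Python helper mirrors the Wald construction described above; the numeric estimate and standard error passed to it are made up for the example.

```python
from statistics import NormalDist

def wald_ci(beta_hat, se, alpha=0.05):
    """(1 - alpha) Wald confidence interval from estimate and standard error."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # Phi^{-1}(1 - alpha/2)
    return beta_hat - z * se, beta_hat + z * se

# hazard-ratio limits follow by exponentiating the limits for beta
lo, hi = wald_ci(2.4, 0.9)
```

By construction the interval is symmetric about the estimate, which is precisely its weakness when the PPL is asymmetric.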
3. Program description

The SAS macro FC.SAS and the SPLUS library coxphf both access the dynamic link library FC.DLL, which was written in FORTRAN using Microsoft Visual Fortran Professional Edition 5.0.A and compiled for Windows platforms. Several simple options allow the user to specify the Cox model for which parameter estimates based on penalized maximum likelihood according to Ref. [3] should be obtained. While the complete User's Guide to these two programs can be found in a Technical Report [4], here we only give an overview of the parameters that can be set by the user.
3.1. SAS macro FC.SAS

The most important macro options of FC.SAS are the following:

DATA: SAS data set which contains the survival data
TIME: survival time variable
CENS: censoring status variable
CENSVAL: value of the censoring status variable which indicates censoring of survival times; the default value is 0
VARLIST: list of covariates, delimited by blanks
RISK: specifies whether, in addition to parameter estimates, relative risk (hazard ratio) estimates and associated confidence intervals should be given in the output
PL: specifies whether confidence intervals should be based on the PPL (the default) or on the Wald method
PROFILE: requests plots of the PPL for the covariates specified in this option
TEST: requests a penalized likelihood ratio test for a set of variables (e.g. a set of k−1 dummy variables that are related to a k-level factor)

Further macro options control the Newton-Raphson algorithm used for computation of the parameter estimates (maximum number of iterations, maximum step length per iteration, maximum number of step-halvings per iteration, convergence criterion for the parameter estimates) and the names of the SAS data sets containing the output of the macro. In particular, these data sets comprise a table of parameter estimates, standard errors, confidence intervals and P-values (OUTTAB), a summary of the number of survival and censoring times, the number of iterations needed and the P-value of the global penalized likelihood ratio test (OUTMOD), as well as the parameter estimate and its variance-covariance matrix (OUTEST). Part of the printed output for a typical macro call can be seen in Fig. 1. FC.SAS can also be efficiently applied in simulation studies or bootstrap applications: separate analyses can be requested for observations in groups
defined by a grouping variable (option BY) and printed output can be suppressed (option PRINT). This is a standard feature shared by virtually all SAS procedures.
3.2. SPLUS library coxphf

The SPLUS library coxphf includes the functions coxphf, print.coxphf, summary.coxphf, coxphftest, print.coxphftest, and coxphfplot. The main function coxphf follows the implementation of the standard Cox function coxph with respect to the submission of input data and the naming of the attributes of output data objects. The associated generic functions summary.coxphf and print.coxphf show detailed and brief results of coxphf, respectively. The most important arguments of the function coxphf are the following:

formula: an SPLUS formula as used by the coxph function, with a Surv function containing the time and censoring indicator variables on the left-hand side; the general SPLUS formula syntax applies (e.g. formula = Surv(time, status) ~ X1 + X2 + X3 + X4)
data: an SPLUS data frame containing the relevant variables
pl: specifies whether confidence intervals should be based on the PPL method (pl = T, the default) or on the Wald method (pl = F)
alpha: the significance level (1 minus the confidence level; 0.05 by default)

Similarly to the SAS macro FC.SAS, the arguments maxit, maxstep, maxhs, and epsilon control the Newton-Raphson algorithm (maximum number of iterations, maximum step width per iteration, maximum number of step-halvings per iteration, and convergence criterion for the parameter estimates, respectively).

The function coxphftest performs a penalized likelihood ratio test for a set of model terms. In addition to the arguments of the function coxphf, the test function includes:

test: specifies the model terms to be tested (e.g. test = X1 + X2)
values: null hypothesis values in a column vector, default = 0 (e.g. values = c(2, 0) tests the null hypothesis H0: βX1 = 2, βX2 = 0)

To plot the PPL function, the function coxphfplot can be used. It allows specification of the range of, and the distances between, the points plotted.

4. Example and availability
Use of FC.SAS and coxphf is exemplified by means of the analysis of a breast cancer data set [13]. Survival times of 100 patients were recorded (74 of them censored), along with values of four potential risk factors: tumor stage (TS), nodal status (NS), histological grading (HG) and cathepsin D immunoreactivity (CD). For the analysis these factors were dichotomised to levels of 0 and 1 (unfavorable). The survival and censoring times were stored in the variable TIME, and the censoring indicator in the variable CENS, 0 indicating a censored survival time. If a Cox model is fitted to this data set, either by the routine coxph of SPLUS or by PROC PHREG of SAS [5], risk factor HG causes monotone likelihood, easily recognized by its extremely large though insignificant parameter and hazard ratio estimates (see the output of PROC PHREG in Fig. 2). However, penalized maximum likelihood analysis, using either SPLUS by submitting

coxphf(Surv(time, cens) ~ TS + NS + HG + CD, data = breast)

or SAS by submitting

%fc(data = breast, time = time, cens = cens, varlist = ts ns hg cd);

results in a finite and plausible relative risk (hazard ratio) estimate of 11.3 (see the SAS output in Fig. 1; the SPLUS output is similar). In accordance with Heinze and Schemper [3], we recommend confidence limits for relative risks based on the profile penalized likelihood, which is also the default setting of both programs. The adequacy of Wald confidence intervals can be judged by requesting the program to plot the profile of the penalized likelihood for an independent variable. In SAS this is achieved by submitting

%fc(data = breast, time = time, cens = cens, varlist = ts ns hg cd, profile = hg);

In SPLUS we call

coxphfplot(Surv(time, cens) ~ TS + NS + HG + CD, data = breast, which = HG)

In the plot produced by SAS or SPLUS, we see that the Wald test is not adequate for the variable HG
because of the asymmetric shape of the PPL function (see the SAS output in Fig. 3): both the lower and the upper Wald confidence limits are far too low. The complete User's Guide for the SAS macro FC.SAS and the SPLUS library coxphf, along with installation notes, can be found in a Technical Report [4], which is available together with the programs at the WWW location http://www.akhwien.ac.at/imc/biometrie/fc.
References

[1] D.R. Cox, Regression models and life-tables (with discussion), J. Roy. Stat. Soc. B 34 (1972) 187-220.
[2] M.C. Bryson, M.E. Johnson, The incidence of monotone likelihood in the Cox model, Technometrics 23 (1981) 381-383.
[3] G. Heinze, M. Schemper, A solution to the problem of monotone likelihood in Cox regression, Biometrics 57 (2001).
[4] G. Heinze, Technical Report 10: The application of Firth's procedure to Cox and logistic regression, Department of Medical Computer Sciences, Section of Clinical Biometrics, Vienna University, Vienna, 1999.
[5] SAS Institute, SAS/STAT User's Guide, Version 8, SAS Institute Inc., Cary, NC, 1999.
[6] MathSoft, SPLUS 4.0, MathSoft Inc., Cambridge, MA, 1997.
[7] SAS Institute, SAS Language Reference, Version 8, SAS Institute Inc., Cary, NC, 1999.
[8] D. Firth, Bias reduction of maximum likelihood estimates, Biometrika 80 (1993) 27-38.
[9] E. Marubini, M.G. Valsecchi, Analysing Survival Data from Clinical Trials and Observational Studies, Wiley, New York, 1995.
[10] D.R. Cox, D. Oakes, Analysis of Survival Data, Chapman and Hall, London, 1984.
[11] N.E. Breslow, Covariance analysis of censored survival data, Biometrics 30 (1974) 89-99.
[12] D.J. Venzon, S.H. Moolgavkar, A method for computing profile-likelihood based confidence intervals, Appl. Stat. 37 (1988) 87-94.
[13] A. Loesch, C. Tempfer, P. Kohlberger, E.A. Joura, M. Denk, B. Zajic, G. Breitenecker, C. Kainz, Prognostic value of cathepsin D expression and association with histomorphological subtypes in breast cancer, Brit. J. Cancer 78 (1998) 205-209.