Validation of a risk function to predict mortality in a VA population with coronary artery disease

Validation of a risk function to predict mortality in a VA population with coronary artery disease

Validation of a Risk Function to Predict Mortality in a VA Population with Coronary Artery Disease P. N. Peduzzi, K. M. Detre, Y. K. Chan, A. Oberman,...

704KB Sizes 6 Downloads 60 Views

Validation of a Risk Function to Predict Mortality in a VA Population with Coronary Artery Disease P. N. Peduzzi, K. M. Detre, Y. K. Chan, A. Oberman, and G. R. Cutter From the Cooperative Studies Program Coordinating Center, VA Medical Center, West Haven, Connecticut and the School of Medicine, The University of Alabama, Birmingham, Alabama

ABSTRACT: A multivariate risk function based on the Cox model was developed in the VA Study of Coronary Artery Bypass Surgery to predict the survival of patients with stable angina pectoris. The methods used in developing and evaluating the performance of the risk function (validation) are described. Performance was evaluated internally by the methods of resubstitution, half-sample replication, and jackknifing, and externally by use of an independent patient population. KEY WORDS: risk function, resubstitution, cross-sample validation, half-sample replication, jackknif-

ing, Cox model INTRODUCTION In medical and epidemiological studies of the natural history of disease, functions are often developed to estimate prognosis using a combination of weighted risk factors. This function is commonly called a risk function and is usually based on a statistical model such as the linear logistic regression [1] or Cox life table regression [2] procedures. Frequently, the selection of prognostic factors and their corresponding weights (coefficients) are derived from the same data set. Thus, the risk function is expected to perform well on the data set from which it was developed using simple resubstitution provided that factors with prognostic value exist and that both the factors and model are properly selected. Resubstitution, by itself, has limited value. In order to generalize the value of the risk function as a true measure of risk, its performance must be examined on independent data sets. This may be accomplished either by creating subsets of the original data so that the function can be developed on

Received June 12, 1981; revised and accepted August 27, 1981. Address reprint requests to: Peter Peduzzi, CSPCC 151A, VA Medical Center, West Spring St., West Haven, CT 06516. Controlled Clinical Trials 3:47-60 (1982) 47 © 1982 Elsevier Science Publishing Co., Inc., 52 Vanderbilt Avenue, New York, NY 10017 0197-2456/82/01004714502.75

48

P.N. Peduzzi et al. one subset of the data and tested independently on another subset, or by accessing new independent populations on which to test the function. These methods of testing the risk function on independent sources of data may be referred collectively to as validation. In this paper we describe the development of a risk function based on the Cox model to predict the natural history of patients in the VA Study of Coronary Artery Bypass Surgery and the various methodologic approaches that were used to establish validity for the risk function. The VA data are used only to illustrate the methods; practical application of the VA risk function has been reported [3].

METHODS

The background, design, methods, and baseline characteristics of the VA Study have been described in detail [4,5]. Briefly, the primary objective was to determine the efficacy of coronary artery bypass surgery in prolonging survival in patients with chronic stable angina. For this purpose, 1015 patients were recruited by 13 VA medical centers during the years 1970-1974, and were randomly assigned to either medical (N = 508) or medical plus surgical therapy (N -- 507). Entry criteria consisted of stable angina pectoris of more than 6 months' duration with at least a 3-month trial of standard medical management, electrocardiographic abnormality at rest or with exercise consistent with myocardial ischemia, and angiographic evidence of a reduction of at least 50% in the lumen diameter of one or more major coronary arteries with potentially graftable distal segments. Patients with conditions limiting life expectancy to less than 5 years were excluded. A risk function was developed on the medical cohort because this cohort more closely represents the natural history of the disease. The following considerations in the development of the function are described: selection of risk factors; selection of an appropriate model; validity of assumptions of the model; and methods of evaluating performance, i.e., the predictive power of the function. The nine entry characteristics evaluated in the current work were those found to be the best risk indicators in the Coronary Drug Project [6] and the Health Insurance Plan [7] of N e w York. Table 1 presents the prevalence of these risk indicators for the VA study, and gives the risk associated with each based on the five-year mortality experience (% dead) of the 508 medical patients. Those risk indicators whose prevalence was at least 20%, and whose values were recorded in most patients, were selected to enter the risk function. These were (1) New York Heart Association classification III or IV (NYHA), (2) history of hypertension, (3) history of myocardial infarction (MI), and (4) ST segment depression on the resting ECG. Since the objective was to develop a risk function that was both valid and applicable, we selected factors whose prevalence was not negligible, and arbitrarily chose 20%. The Cox life table regression model [2] modified by Breslow [8] was used as the basis for development of the risk function because it provided more efficient use of the data than the logistic model since each patient's survival

49

Validation of a Risk Function Table I

Baseline Distribution a n d Mortality Rates of Selected Prognostic Factors in the VA Medical C o h o r t

Variable ST segment depression" No Yes History of MI No Yes History of hypertension No Yes NYHA HI or IV No Yes EKG evidence of MI

No.

%

5-year mortality %

361 133

73 27

14 35

<0.001

200 308

39 61

12 26

<0.001

365 143

72 28

17 29

0.005

220 288

43 57

16 24

0.026

303 191

61 39

17 25

0.045

135 56

71 29

19 29

0.124

429 50

90 10

21 26

0.368

467 27

95 5

19 41

0.006

477 17

97 3

20 18

0.500

X2 p value

(Q - QS) ~

No Yes Digitalis or diuretics No Yes Cardiomegaly~ No Yes Ventricular premature beats d No Yes Ventricular conduction defects r No Yes

"YesffiMinnesota codes 4.1, 4.2, 4.3, 4.4, No=Minnesota code 4.0. ~Yes=Minnesota codes 1.1.1 to 1.4.4, No=Minnesota code 1.0. ~YesffiCardiothoracic ratio >50% on chest x ray. 'Yes ffipresence of any ventricular premature beats on resting ECG. 'Yes=Minnesota codes 7.1, 7.2, 7.4, 7.6, 7,8, No=Minnesota codes 7.0, 7.3, 7.5, 7.7.

time w a s k n o w n . This m o d e l w a s a p p l i e d to the data of the medical cohort u s i n g i n f o r m a t i o n on each p a t i e n t ' s survival time, survival status (alive/dead), a n d values for the f o u r covariates (1 = absent, 2 = present). F r o m the model, the probability of d y i n g within t years (risk function) can be e s t i m a t e d a n d e x p r e s s e d as follows: Po(t)

=

1

-

/~0 (t) ~pl~'~z - ~n

(1)

w h e r e Z ' = (Z1. . . . . Z 0 is r o w vector of covariate values, ~ = (Zi . . . . . 54) is the r o w vector of m e a n values of the covariates, ~ ' = (~ . . . . . ~ ) is the r o w vector of m a x i m u m likelihood estimates (MLEs) of the regression coefficients and/~0(t) is Breslow's estimate of the u n d e r l y i n g probability of s u r v i v i n g t y e a r s w h e n the values of the covariates are set equal

50

P.N. Peduzzi et al. to their mean values (Z1 = Z1. . . . . Z, = Z,), i.e., under an average set of conditions. In our notation Z~ -- N e w York Heart Association class III or IV, Z2 -- history of hypertension, Z3 = history of MI, and Z4 = ST segment depression. Since each of the four risk factors are dichotomous, only sixteen (24) distinct probabilities or probability groups result. In the development of the coefficients, medically randomized patients w h o crossed over to surgical treatment were counted as lost to follow-up at the time treatment change occurred. Thus, the coefficients represented the weights contributed by the natural history of the disease and were not affected by subsequent surgeryl However, similar coefficients were found even when all deaths were counted, regardless of crossover. The assumption of the Cox model is that the covariates have a multiplicative effect on the hazard function. The proportional hazards assumption was examined, for each variable included in the model by the graphical method outlined by Kalbfleish [9]. For example, for history of MI, two strata were formed--stratum I for no history of MI (Z3 = 1) and stratum 2 for the presence of any history of MI (Z3 = 2). The estimated survival functions for each strata $1 (t) = [F01 (t)]~ and $2 (t) = [/~02(t)] c with c = exp[~'l (Z1 - Z1) + ~2 (Z2 Z2) + ~fa(Z4 - Z4)] were determined from the Cox model adjusted for NYHA, history of hypertension, and ST segment depression. The logarithm of the negative value of the logarithmic survival functions, In[ - In S~ (t)] and In[ - In $2 (t)], was then plotted against time for each of the two strata given specified values of the adjustment covariates. Parallel curves implied that the porportional hazards assumption was satisfied. Interactions between factors were examined to determine whether the simple additive model was appropriate. The performance of the risk function was evaluated by the following methods: resubstitution, half-sample replication, jackknifing, and cross-sample validation. By the method of resubstitution, the risk function developed on the medical cohort was applied to the baseline characteristics of each patient in this cohort to obtain his predicted or theoretical 5-year probability of dying. This method of testing performance is biased in favor of good prediction because the function is developed on and applied to the same data. A more realistic appraisal of performance is obtained when estimattion and testing are accomplished on independent sets of data. The methods of half-sample replication, jackknifing, and cross-sample validation offer this advantage. Independent subsets of the VA data were constructed for purposes of estimation and testing by randomly dividing the medical cohort into two groups, half-sample replicates. Group 1 consisted of 242 patients including 4 with missing data on one or more of the four risk factors, while Group 2 consisted of 266 l~atients 10 of w h o m had missing data. A risk function using the four factors was developed on Group 1 and applied to the data of Group 2. Similarly, a risk function was developed on Group 2 and tested on the data of Group 1. The jackknife technique was applied to the medical cohort using the leaveone-out method [10]. Maximum likelihood estimates of the Cox-Breslow regression coefficients of the four risk factors were obtained after leaving one patient out of the data. The resulting risk function was applied to the excluded patient's data to obtain his predicted 5-year probability of dying. This process

Validation of a Risk Function

51

was repeated until probabilities of dying were estimated for all patients. Sixteen probability groups were obtained by averaging the predicted probabilities for all patients with the same combination of risk factors. Availability of an independent population with coronary artery disease provided cross-sample validation of the risk function. Data on 535 medically treated patients w h o had angiography between 1973 and 1975 were obtained from the University of Alabama Birmingham Medical Center. Male patients who had less than a 50% stenosis in a major coronary artery and all female patients were excluded in order to define a group comparable to the VA population. Since N e w York Heart Association functional class was not recorded in the Alabama patients, a similar variable that measured dyspnea on exertion was substituted. The average length of follow-up in this population was 3 years compared with an average of more than 5 years in the VA patients. The VA risk function was adjusted to predict the 3-year mortality of the Alabama population. The reverse was also accomplished--a risk function was developed from the data of the Alabama population and tested on the independent VA data. For each method of evaluation,, the agreement between observed and predicted mortality among the 16 probability groups was determined by fitting a weighted linear regression to the grouped data. The estimate of observed mortality was the familiar t-year life table cumulative mortality rate (CMR). The weights, Wi, were constructed such that their sum equaled the number of groups, i.e., X Wi = 16. The weight for each group was calculated as Wi = 1 6 n / X n , where n, was the size of the ith group [11]. This method was utilized because the number of patients in each of the sixteen probability groups varied considerably and the precision of the estimated t-year CMR was related to the size of the group, i.e., nearly a one-to-one correspondence. Because some of the CMRs were zero, using weights based on the reciprocal of the Greenwood variance was not possible. The estimate of slope was an indicator of the gradient of the linear relationship between observed and predicted mortality and the R 2 criterion provided a measure of the fit of the weighted linear regression line to the data [12]. Comparison of slope and R 2 between the different methods of evaluating performance was not directly possible because the ranges of observed and predicted mortality differed among the methods. Performance of the risk function was, however, evaluated based on whether the different methods yielded consistently high R 2 values and slopes near unity. RESULTS

The average length of follow-up was 6.5 years for the 508 medical patients in the VA study. During this period 145 deaths occurred and the survival status was known for 98.6% of the cases. In this same cohort, 109 patients (21%) crossed over to surgical treatment with a mean time to crossover of 3 years. Fourteen patients had missing data on one or more of the four risk factors and were excluded from the analysis. The maximum likelihood estimates of the regression coefficients obtained by applying the Cox-Breslow life table regression model to the data of the meaical cohort are given in Table 2. Using these estimates and the estimated

Estimates

0.1943 0.1933 0.2190 0.1864

VA population Standard error

‘NYHA = Dyspnea on exertion for Alabama population.

0.2757 0.6374 0.8318 1.0579 0.8617 0.8053

NYHA Hx of hypertension HxofMI ST depression [!I (3) F, (5)

Regression

Coefficient

Cox-Breslow

Variable

Table 2

1.42 3.30 3.80 568

T-value 1.57 1.29 1.61 1.27

Mean value 0.6405 0.4204 0.5210 0.6305 0.8208

0.2608 0.1940 0.2228 0.2011

3.10 2.07 2.34 3.14

Alabama population Standard ‘T-value Coefficient error

1.53 1.46 1.66 1.24

Mean value

Validation of a Risk Function

53

MI p r ~ t

] MI IINInI

-3

,4 0

Figure 1 Logarithm of the negative value of the logarithmic survival functions for presence of history of MI and absence of history of MI when (A) adjustment covariates assume value 1, and (B) adjustment covariates assume value 2. value 0.8053 as the underlying probability of surviving more than 5 years, the risk function was given by

Po (5) = 1 - 0.0853 c,

(1)

where c = exp[0.2757(Z~ - 1.57) + 0.6374(Z~ - 1.29) + 0.8318(Z3 - 1.61) + 1.0579(Z, - 1.27)]. The proportional hazards assumption was tested separately for each of the factors included in the model (as described in Methods). Only the results for history of MI are presented. Figure 1 (Part A) gives the plots of strata I (history of MI absent) and 2 (history of MI present) when the adjustment covariates assumed value 1 (i.e., Z~ = Z2 = Z4 = 1). Figure 1 (Part B) represents the case when the adjustment factors assumed value 2 (i.e., Z~ = Z2 = Z4 = 2). In both cases the curves are approximately parallel implying that the proportional hazards assumption is satisfied for history of MI and that this variable can be included in the model. Similar results were obtained with the other factors. The results of applying the risk function to the data of the medical cohort (resubstitution) are presented in Table 3. For each of the 16 probability groups the 5-year CMR + SE is given by increasing PD (5) along with the resulting patterns of the presence or absence of the four factors (clinical patterns). The range of Po (5) was 0.067-0.683 and of 5-year CMR was 0.043-0.600. The ratio of highest to lowest predicted probability was 10, while the corresponding ratio for observed mortality was 14. Slope and R2 were 0.897 and 0.737, respectively. The R2 in this case is not a good measure of fit because the function is tested on the same data from which it was developed. The plot of these data (Figure 2) shows a bump between predicted values 0.2 and 0.4 and suggests that the simple additive model may not be completely

54

P. N. Peduzzi et al.

Table 3

D i s t r i b u t i o n of O b s e r v e d a n d P r e d i c t e d a 5 - Y e a r M o r t a l i t y b y Clinical P a t t e r n in t h e V A M e d i c a l C o h o r t

Clinical pattern

No.

PD(5)

5 yr CMR + SE

.... N---H---MNH----S N-MN--S -HM-H-S NHM--MS NH-S N-MS -HMS NHMS

51 48 26 67 20 14 94 16 19 8 36 18 12 44 11 10

0.067 0.088 0.123 0.148 0.159 0.182 0.190 0.232 0.261 0.316 0.329 0.369 0.393 0.455 0.582 0.683

0.043 _ 0.030 0.054 +_ 0.038 0 0.153 _+ 0.047 0.181 _+ 0.097 0.447 _+ 0.179 0.201 -+ 0.044 0.333 - 0.124 0.426 _+ 0.127 0.300 _ 0.182 0.330 +_ 0.082 0.250 _+ 0.108 0.500 _+ 0.144 0.370 _+ 0.077 0.501 _+ 0.161 0.600 _+ 0.155

N = NYHA Class III or IV H = Hx of hypertension M = Hx of MI S = ST depression -= Absence of factor ~= VA 5-year risk function

a p p r o p r i a t e . A s a first a p p r o a c h , all t w o - w a y i n t e r a c t i o n s w e r e e x a m i n e d in a n a t t e m p t to s m o o t h t h e l i n e a r r e l a t i o n s h i p b e t w e e n p r e d i c t e d a n d o b s e r v e d m o r t a l i t y . T h e o n l y i n t e r a c t i o n t e r m t h a t s i g n i f i c a n t l y c o n t r i b u t e d to t h e r e g r e s s i o n w a s t h a t b e t w e e n h i s t o r y of M I a n d ST d e p r e s s i o n (MI*ST). W h e n this t e r m w a s i n c l u d e d in t h e r i s k f u n c t i o n , t h e s l o p e a n d R 2 o b t a i n e d b y r e s u b s t i t u t i o n w e r e 0.966 a n d 0.862. A l t h o u g h s o m e s m o o t h i n g w a s o b t a i n e d , t h e p l o t of t h e d a t a ( F i g u r e 3) still r e v e a l e d a b u m p in t h e s a m e r a n g e of PD (5). A d d i t i o n of t h e i n t e r a c t i o n t e r m to t h e m o d e l i n c r e a s e d t h e s l o p e a n d R 2 b u t d i d n o t r e a l l y i m p r o v e t h e fit. T h u s , t h e i n t e r a c t i o n t e r m w a s n o t a d d e d to t h e f u n c t i o n a n d t h e s i m p l e a d d i t i v e m o d e l w a s r e t a i n e d . T h e M L E s of t h e r e g r e s s i o n c o e f f i c i e n t s d e v e l o p e d o n G r o u p 1 a n d G r o u p 2, r e s p e c t i v e l y , a r e p r e s e n t e d in T a b l e 4. B e c a u s e t h e c o e f f i c i e n t s in e a c h g r o u p w e r e e s t i m a t e d o n f e w e r p a t i e n t s , t h e y w e r e less s t a b l e , e x h i b i t i n g

Table 4

C o x - B r e s l o w R e g r e s s i o n E s t i m a t e s for H a l f - S a m p l e R e p l i c a t e s

Variable

Coeff.

NYHA Hx of Hypertension Hx or MI ST Depression /~0(5)

0.4581 0.8513 0.7752 0.7481 0.7778

Group 1 (n = 242) SE T-value 0.2796 0.2614 0.2914 0.2606

1.64 3.26 2.66 2.87

Mean value 1.59 1.29 1.59 1.27

Coeff. 0.1653 0.4625 0.8786 1.3555 0.8284

Group 2 (n = 266) SE T-value 0.2784 0.2918 0.3344 0.2748

0.59 1.58 2.63 4.93

Mean value 1.55 1.29 1.62 1.27

Validation of a Risk Function

55

1.0.

0.9,

0.8

0.?

0.8

;o

-I

0.¼ ¸

o

I

0.3

0.2

0.1

0.(~ o.o

v O. I"

! 0.2

o,

J

o.;

o.s'

o6'

o;

o.e'

od

,d

eO (5)

Figure 2 Observed by predicted 5-year mortality in the VA medical cohort using the VA 5-year risk function. ~ = 5-yr CMR -+ 1 SE.

larger standard errors than those developed on all the data (Table 2). The differences between the two sets of coefficients can be explained by their variability, i.e., none is significantly different. The application of each group's risk function to the data of the other group produced a gradient between observed and predicted mortality. The Group I risk function applied to Group 2 resulted in a slope of 0.858 and an R2 of 0.505, while the reverse procedure yielded a slope of 0.666 and an R2 of 0.390. Even though the half-sample replicates produced relatively unstable coefficients, the performance of the separate risk functions was about the same and, as expected, not as good as that obtained by resubstitution. The jackknife results were similar to those for resubstitufion. The slope of the fitted regression line was 0.891 and R2 was 0.733. These results were not unexpected because the exclusion of one patient at a time from the analysis with so large a sample size would not markedly alter the estimates of the regression coefficients for the 494 separate risk functions.

56

P.N. Peduzzi et al. 1.0'

O. 9'

0.8'

~

0.5' )

I

0,~ ( Q

$

0.3'

1 ~A

O. 2 ~

0.1'

o.~ ~.o

tt

., o.,

o.2

,

o.

~

o.,

,

o.s

,

o.

~

o.,

,

0.,

,

o.

;

,.o

,

PO151

Figure 3 Observed by predicted 5-year mortality in the VA medical cohort using the VA 5-year risk function with the MI*ST interaction term included. ~ = 5-yr CMR -+ 1 SE.

The VA risk function, adjusted to predict 3-year mortality, was next applied to the Alabama population (Table 5). The slope of the fitted weighted linear regression line of observed on predicted mortality was 0.754 with an R 2 of 0.731. The risk function developed on the Alabama population to predict 3year mortality was based on the estimates shown in Table 2. Application of this function to the VA data resulted in a slope of 1.280 and an R 2 of 0.750. These cross-sample validation findings were consistent, even though the separate risk functions differed. The results of the different methods used to evaluate performance are summarized in Table 6. All methods, except for half-sample replication, yielded consistently high R 2 within a narrow range and slopes near unity. The half-sample replication findings were poorer and may be attributable to the smaller sample size.

57

Validation of a Risk Function

Table 5

Distribution of Observed and Predicted a 3-Year Mortality by C l i n i c a l P a t t e r n in t h e A l a b a m a C o h o r t

Clinical pattern

No.

VA PD(3)

.... N---H---MNH----S N-MN--S -HM-H-S NHM--MS NH-S N-MS -HMS NHMS

44 28 26 74 32 8 75 10 58 8 58 15 20 28 15 24

0.047 0.061 0.087 0.104 0.112 0.129 0.135 0.166 0.188 0.230 0.240 0.271 0.291 0.341 0.451 0.546

Alabama 3-year CMR +- SE 0.068 0.036 0.158 0.081 0.195 0.206 0.300 0.231 0.125 0.241 0.220 0.384 0.373 0.357 0.398

-+ 0.038 -+ 0.035 -+ 0.073 --- 0.032 -+ 0.072 0 -+ 0.048 --- 0.145 _+ 0.060 -+ 0.117 _+ 0.060 _+ 0.114 _+ 0.118 -+ 0.094 -+ 0.128 - 0.105

N = Dyspnea on light or negligible exertion. H = Hx of hypertension. M = Hx of MI. S = ST depression. -= Absence of factor. a= VA 3-year risk function.

DISCUSSION T h i s p a p e r c o n s i d e r s t h r e e i s s u e s r e l e v a n t to t h e d e v e l o p m e n t of a r i s k f u n c t i o n u s i n g t h e V A d a t a a s a n e x a m p l e : (1) s e l e c t i o n of r i s k factors; (2) s e l e c t i o n of a n a p p r o p r i a t e m o d e l a n d t h e v a l i d i t y of a s s u m p t i o n s of t h e m o d e l ; a n d (3) m e t h o d s o f e v a l u a t i n g p e r f o r m a n c e . T h e s e i s s u e s w e r e e x a m i n e d w i t h r e s p e c t to t h e d e v e l o p m e n t o f a f u n c t i o n t h a t m a y b e g e n e r a l i z e d to o t h e r populations. I n o r d e r to d e v e l o p a f u n c t i o n t h a t h a s g e n e r a l a p p l i c a b i l i t y , i n d e p e n d e n t s o u r c e s of d a t a s h o u l d b e u s e d for s e l e c t i n g r i s k f a c t o r s a n d for d e v e l o p i n g and testing the risk function. Often, risk factors are selected based on internal

Table 6

S u m m a r y o f S l o p e a n d R 2 b y M e t h o d of E v a l u a t i o n

Method of evaluation

Slope + SE

R2

Resubstitution

0.897 -+ 0.143

0.737

Half-sample replication Group 1 to G r o u p 2 Group 2 to G r o u p 1

0.858 - 0.227 0.666 _+ 0.223

0.505 0.390

Jackknife

0.891 + 0.144

0.733

Cross-sample validation VA to Alabama Alabama to VA

0.754 _ 0.122 1.280 -+ 0.197

0.731 0.750

58

P.N. Peduzzi et al. searches of study data without consideration of what has been demonstrated in the literature. This method is biased toward good prediction on the original data and may result in poor prediction in independent populations. Selection of an appropriate statistical model is frequently data dependent. For survival data, the Cox model has been shown to be effective when both the survival time and outcome (alive, dead) are recorded. In addition to the graphical procedure outlined by Kalbfleish [9], alternative methods for testing the proportional hazards assumption associated with this model have been proposed by Cox [1] and more recently by Schoenfeld [13]. Cox included time-by-risk factor interaction terms in the model and demonstrated that proportional hazards would be violated if significant interactions with time were present. Schoenfeld developed a procedure based on chi-squared goodness of fit tests to examine both the assumptions of proportional hazards and of log-linearity. Logqinearity can also be tested by including nonlinear terms in the model, such as interaction and higher order terms, as we have done. Evaluating the performance of a risk function should be based on methods that use independent sets of data or subsets of the original data, separately, for estimation and testing. The methods used in this paper to satisfy these criteria were half-sample replication, jackknifing, and cross-sample validation. Resubstitution alone is a weak method of validation because it is biased in the direction of good performance. However, it can be useful as anindication of model validity. Half-sample replication may give only an indication of the predictive power of a risk function, particularly when the sample size is not large. The two subsets constructed by randomly dividing the sample in half are not truly independent because they are part of a larger set of data usually collected under a protocol and can be expected to be correlated. Although separate risk functions are developed on each half of the data for testing, the function used in actual practice is usually based on all the data because the coefficients are estimated with greater precision. The jackknife procedure has been shown to be a linear approximation of the bootstrap methods outlined by Efron [14]. For regression models, bootstrapping is more suitable for determining the stability of the regression coefficients than jackknifing, which may be inefficient. The jackknife procedure can also be viewed as an extension of the method of half-sample replication where subsets are based on excluding one patient instead of half the patients. Predicted probabilities are independent in the sense that the separate risk functions are derived independently from the data of the individual to w h o m the risk function is applied. However, the subsets created by sequentially excluding one patient are correlated and with large samples may produce risk functions that~are nearly identical to the function based on all the data, as in the VA sample. This procedure may be more suitable for smaller samples. The jackknife procedure, as we have applied it, examined the performance of the risk function without an assessment of its stability. It would have been desirable to carry out a double jackknife procedure [10] by leaving out two patients at a time, one for estimation and one for testing purposes. In this case an assessment of the variability of each patient's predicted probability of dying could have been determined. For example, the first patient would be left out of the analysis and 493 separate risk functions would be obtained

Validation of a Risk Function

59

by jackknifing the remaining patients. The risk functions w o u l d be applied to the excluded patient's data and the variability of his average predicted probability of d y i n g could t h e n be estimated. This process w o u l d be repeated for every patient, or 494 times. H o w e v e r , the cost of developing 494 x 493 separate risk functions w o u l d be prohibitive. The m e t h o d s of half-sample replication and jackknifing r e p r e s e n t cross validation within a single population. Cross-sample validation involving two i n d e p e n d e n t populations is preferred because it avoids some of the problems connected with cross validation within a population, such as correlated subsamples. This p r o c e d u r e provides a first approach to testing the general applicability of a risk function. Ideally, p e r f o r m a n c e should be evaluated on several i n d e p e n d e n t populations. The main problem with this m e t h o d is the availability of data on comparable populations. Direct c o m p a r i s o n a m o n g these m e t h o d s of validation with respect to performance of a risk function is difficult. O n e m e t h o d of c o m p a r i s o n is to examine the consistency of p e r f o r m a n c e based on a c o m m o n criterion, such as R 2. In the VA example, all m e t h o d s of evaluation, except for half-sample replication, gave consistent findings. We attribute this consistency in part to the use of established risk factors.

ACKNOWLEDGMENT The authors wish to acknowledge that data collected by the participants in the VA Cooperative Study for Coronary Arterial Occlusive Disease were used in this manuscript with permission. A complete listing of the participants in this study can be found in reference [3]. The Alabama data were collected as part of Grant No: NIH-SP-50-HL17-667-05(SpecializedCenter of Research --Natural History of Ischemic Heart Disease). We also wish to thank Victor Latvis and Margaret Lee for their technical assistance.

REFERENCES 1. Cox DR: Analysis of Binary Data. London: Methuen & Co., Ltd, 1970 2. Cox DR: Regression models and life tables (with discussion). JR Statist Soc B34:187-220, 1972 3. Detre K, Peduzzi P, Murphy M, Hultgren H, Thomsen J, Oberman A, Takaro T, the VA Cooperative Study Group for Surgery for Coronary Arterial Occlusive Disease: Effect of bypass surgery on survival of patients in low and high risk subgroups delineated by the use of simple clinical variables. Circulation 63:1329-1338, 1981 4. Detre K, Hultgren H, Takaro T: VA cooperative study of surgery for coronary arterial occlusive disease III. Methods and baseline characteristics, including experience with medical treatment. Am J Cardiol 40:212-225, 1977 5. Murphy M, Hultgren H, Detre K, Thomsen J, Takaro T, Participants of the Veterans Administration Cooperative Study: Treatment of chronic stable angina; a preliminary report of survival data of the randomized Veterans Administration cooperative study. N Engl J Med 297:621-627, 1977 6. Coronary Drug Project Research Group: Factors influencing long-term prognosis after recovery from myocardial infarction--three year findings of the Coronary Drug Project. J Chron Dis 27:267-285, 1974

60

P.N. Peduzzi et al. 7. Frank CW: The course of coronary heart disease: Factors relating to prognosis. Bull New York Acad Med 2nd series 44:900, 1968 8. Breslow N: Covariance analysis of censored survival data. Biometrics 30:89-99, 1974 9. Kalbfleish J: Some extensions and applications of Cox's regression and life model. Presented at Biometric Society meeting, Tallahassee, Florida, 1974 10. Mosteller F, Tukey JW: Data Analysis and Regression. Reading, MA: AddisonWesley, 1977 11. Dunn OJ, Clark VA: Applied statistics: analysis of variance and regression. New York: John Wiley & Sons, 1974 12. Ostle B: Statistics in research. Ames, IA: The Iowa State University Press, 1963 13. Schoenfeld D: Chi-squared goodness-of-fit tests for the proportional hazards regression model. Biometrika 67:145-153, 1980 14. Efron B: Bootstrap methods: another look at the jackknife. Annals of Statistics 7:1-26, 1980