A SAS macro for stepwise correlated binary regression


Computer Methods and Programs in Biomedicine 49 (1996) 199-210

Isaac F. Nuamah (a,*,1), Yinsheng Qu (b), Saeid B. Amini (c)

(a) University of Pennsylvania Cancer Center, 528 Blockley Hall, Philadelphia, PA 19104-6021, USA
(b) Department of Biostatistics and Epidemiology, Cleveland Clinic Foundation, Cleveland, OH 44106, USA
(c) Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, USA

Received 11 August 1995; accepted 9 January 1996

Several regression methods have been proposed for the analysis of correlated binary data, but none deals with the selection of covariates when there exist a large number of potentially relevant covariates. We present a SAS macro based on a stepwise selection procedure for the analysis of correlated binary data. Using regression methods based on generalized estimating equations originally proposed by Liang and Zeger [1] and extended by Prentice [2], we describe a score test for forward selection, a Wald's test for backward elimination, and a test for model adequacy based on generalized scores. The methodology and the accompanying computer macro program written in SAS IML are illustrated with data from a prospective study of functional decline in the activities of daily living in a group of elderly patients.

Keywords: Correlated binary data; Generalized estimating equations; Longitudinal methods; Stepwise regression; Variable selection

1. Introduction

Regression analysis of correlated data arising from longitudinal designs has become increasingly popular in the literature. In most of these instances, the purpose is to describe a regression model for a function of the mean of the response variable. In this type of data, the response variables consist of either clustered data or repeated observations of some response of interest. These responses are usually correlated, and the dependence among the responses needs to be accounted for in order to make correct inferences. For correlated binary data, the generalized estimating equations (GEE) approach proposed by Liang and Zeger [1] has been used as the basis for modeling. In this paper, we present a SAS macro, called STEPGEE, whose main focus is on the selection of covariates for these models when there exist a large number of potentially relevant covariates. For independent binary outcomes, its counterpart is the stepwise logistic procedure. The stepwise covariate selection procedure for correlated binary regression is based on an extension of the original generalized estimating equations approach. We consider marginal models in which the marginal expectation of the binary responses is related to a set of potentially useful covariates by the logit link function.

In section 2, the model and estimation based on Prentice's [2] extension of Liang and Zeger's generalized estimating equations (Liang and Zeger [1], Zeger and Liang [3]) will briefly be discussed. In section 3, a stepwise covariate selection procedure will be presented using efficient score tests for forward selection and a modified Wald's test for backward elimination. A test for model adequacy using a generalized score test is also provided. In section 4, we will describe the SAS macro, STEPGEE, discussing the data input, user parameters, and its output. In section 5, we will illustrate this procedure with data on functional decline in a group of elderly subjects. We shall not concentrate on any of the potential problems associated with stepwise selection in general, and we hope that any such concerns will not distract from the main issue at hand, one of variable selection for correlated binary data.

The selection strategy implemented in STEPGEE is similar to the one used in PROC LOGISTIC [4] in that it is limited to only binary responses. We know of no other software that can implement a selection strategy using GEE, but we know of at least three other programs that use the GEE methodology for estimation of regression coefficients. Karim and Zeger [5] have a SAS macro that analyses longitudinal data using the GEE methodology, but cannot handle missing data. In appearance, our program is similar to the program of Karim and Zeger, since they are both macros and require the SAS system and its interactive matrix language (IML) [6] for implementation. Two other programs written in different computer languages are available that are modelled after the Karim and Zeger original program. The first one by Carey (available by sending a request for GEE to STATLIB.CMU.EDU) is a C version that is designed to be used as an S function. The other one, RMGEE by Davis [7], is a FORTRAN version that has the additional capability of handling missing data.

* Corresponding author.
1 This work was done while the first author was at Case Western Reserve University, Cleveland, Ohio, USA.

0169-2607/96/$15.00 © 1996 Elsevier Science Ireland Ltd. All rights reserved
PII S0169-2607(96)01718-X


2. Regression methodology

Liang and Zeger successfully applied the approach of generalized estimating equations to longitudinal data whose marginal means are related to covariates through generalized linear models [1,3,8,9]. Prentice [2] extended the Liang and Zeger approach to allow for joint inference on the regression parameters and pairwise correlations among the observations. We consider the approach taken by Lipsitz, Laird and Harrington [10], in which the odds ratio is used to model the pairwise associations, since it seems more appropriate for binary data. In implementing our version of GEE, we kept these objectives in mind: (a) that one is interested in relating a binary response, Y, to a set of covariates, X; (b) that one is interested in characterizing the degree of association between pairs of outcomes using the odds ratio; and (c) that one is interested in selecting from a given set of covariates a reduced set of significant covariates. The second objective might even include describing the dependence of the association parameters on the covariates. However, in implementing STEPGEE this objective is not considered, that is, it has been assumed that the odds ratio between any pair of outcomes does not depend on any of the covariates.

Let us assume that the i-th individual (i = 1, 2, ..., N) is observed a total of $n_i$ times, where the maximum is T times, i.e. $n_i \le T$. The responses are binary, that is, the binary random variable $Y_{it} = 1$ if subject i has response 1, the response of interest, e.g. success at time t, and 0 otherwise. Each individual has a $p \times 1$ covariate vector $x_{it}$ measured at time t, which can be either time-stationary or time-varying. Let $x_{it} = (x_{it1}, \ldots, x_{itp})^T$ and $\mu_{it}$ be, respectively, the vector of covariates and the mean corresponding to $y_{it}$, the t-th outcome for individual i. Let $y_i = (y_{i1}, y_{i2}, \ldots, y_{in_i})^T$ and let $X_i$ be the $n_i \times p$ matrix with rows equal to $x_{it}^T$, i.e. $X_i = (x_{i1}, x_{i2}, \ldots, x_{in_i})^T$. The marginal distribution of $Y_{it}$ is binary and can be represented as

$$f(y_{it}) = \mu_{it}^{\,y_{it}} (1 - \mu_{it})^{1 - y_{it}}, \qquad \operatorname{logit}(\mu_{it}) = x_{it}^T \beta$$

where $\mu_{it} = \mu_{it}(\beta) = E(Y_{it}) = \Pr(Y_{it} = 1 \mid x_{it}, \beta)$ is the probability of success at time t, and $\beta$ is a $p \times 1$ vector of parameters. Here the link function is the logit, a natural choice for binary data. The marginal probabilities can also be put together to form a vector $\mu_i(\beta) = E(Y_i) = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{in_i})^T$. To estimate $\beta$, Liang and Zeger [1] and Prentice [2] consider generalized estimating equations of the form

$$U_1(\beta) = \sum_{i=1}^{N} D_i^T V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0 \tag{1}$$

where $D_i = \partial \mu_i(\beta) / \partial \beta$, and $V_i$ is a 'working' covariance matrix of $Y_i$. This working covariance matrix has the form

$$V_i = A_i^{1/2} R_i(\alpha) A_i^{1/2} \tag{2}$$

where $A_i = \operatorname{diag}\{\operatorname{Var}(Y_{it})\} = \operatorname{diag}\{\mu_{it}(1 - \mu_{it})\}$, and $R_i(\alpha)$ is a 'working' correlation matrix. Here, $V_i$ is a function of both $\beta$ and $\alpha$, and is specified entirely by the marginal distributions, i.e. by $\beta$ and $R(\alpha) = \operatorname{corr}(Y_i)$.

Lipsitz, Laird and Harrington [10] suggested using the odds ratio, represented here as $\gamma$, as an alternative to the correlation coefficient. The pairwise association is measured in terms of the odds ratio between $Y_{is}$ and $Y_{it}$, $s, t = 1, 2, \ldots, n_i$, $s \ne t$, and defined as

$$\gamma_{ist} = \frac{\Pr(Y_{is} = 1, Y_{it} = 1)\,\Pr(Y_{is} = 0, Y_{it} = 0)}{\Pr(Y_{is} = 1, Y_{it} = 0)\,\Pr(Y_{is} = 0, Y_{it} = 1)} \tag{3}$$

which for a 2 × 2 table is given as the cross-products ratio. Let the log-odds ratio be $\log \gamma_{ist} = h_{ist}^T \alpha$, where $h_{ist}$ is a $q \times 1$ vector that specifies the form of the relation between the association parameters and covariates, and $\alpha$ is a $q \times 1$ vector of parameters where q is arbitrary. Thus, $\gamma$ is a $\{n_i(n_i - 1)/2\} \times 1$ vector of pairwise odds ratios, expressed as a function of a $q \times 1$ vector $\alpha$. In some studies it may be reasonable to assume that the $\gamma$s are identical, i.e. to assume a common odds ratio. In STEPGEE, the form of the correlation structure is chosen by the user; however, one cannot specify the form of the relation between the association parameters and the vector of covariates, that is, it is assumed that $\log(\gamma_{ist}) = \alpha$.

For each cluster, define a $\{n_i(n_i - 1)/2\} \times 1$ vector $Z_i$ such that $Z_i = (y_{i1}y_{i2}, y_{i1}y_{i3}, \ldots, y_{i,n_i-1}\,y_{in_i})^T$. Let $\eta_i = E(Z_i)$; then for $s, t = 1, 2, \ldots, n_i$, $s \ne t$, it has been shown by Mardia [11] that

$$\eta_{ist} = E(Y_{is} Y_{it}) = P(Y_{is} = Y_{it} = 1) =
\begin{cases}
\dfrac{a_{ist} - \bigl[ a_{ist}^2 - 4 \gamma_{ist} (\gamma_{ist} - 1) \mu_{is} \mu_{it} \bigr]^{1/2}}{2(\gamma_{ist} - 1)}, & \text{if } \gamma_{ist} \ne 1, \\[2ex]
\mu_{is}\,\mu_{it}, & \text{if } \gamma_{ist} = 1
\end{cases} \tag{4}$$

with $a_{ist} = 1 - (\mu_{is} + \mu_{it})(1 - \gamma_{ist})$, which depends on the parameter set $\delta = (\beta, \alpha)$ through $\mu_{is} = E(Y_{is})$, $\mu_{it} = E(Y_{it})$ and $\gamma_{ist}$, the pairwise odds ratio. Note that, if we let $\eta_{ist} = E(Y_{is} Y_{it})$, then

$$\operatorname{corr}(Y_{is}, Y_{it}) = \frac{\eta_{ist} - \mu_{is}\mu_{it}}{\bigl[ \mu_{is}(1 - \mu_{is})\,\mu_{it}(1 - \mu_{it}) \bigr]^{1/2}} \tag{5}$$

is the (s, t)-th element of $R_i(\alpha)$ in (2). Using arguments similar to Prentice [2], Lipsitz et al. showed that a second set of estimating equations of the form (1) could be obtained as

$$U_2(\alpha) = \sum_{i=1}^{N} E_i^T W_i^{-1} (Z_i - \eta_i) = 0 \tag{6}$$

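where $E_i = \partial \eta_i(\beta, \alpha) / \partial \alpha$ and $W_i$ is a 'working' covariance matrix of $Z_i$. Although $\eta_i$ is a function of both $\alpha$ and $\beta$, we assume that $\beta$ is fixed in $\eta_i$ in (6); that is why in $E_i$ we take derivatives only with respect to $\alpha$. The two sets of estimating equations (1) and (6) provide a unified approach to estimating $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$ and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_q)$. Following Prentice [2], the estimation of the parameter values of $\delta = (\beta, \alpha)$ can be obtained iteratively by using

$$\begin{pmatrix} \beta_{j+1} \\ \alpha_{j+1} \end{pmatrix} = \begin{pmatrix} \beta_j \\ \alpha_j \end{pmatrix} + \begin{bmatrix} A & 0 \\ B & C \end{bmatrix}^{-1} \begin{pmatrix} U_1(\beta_j) \\ U_2(\alpha_j) \end{pmatrix} \tag{7}$$

where $U_1(\beta)$ and $U_2(\alpha)$ have been defined in (1) and (6), and

$$A = \sum_{i=1}^{N} D_i^T V_i^{-1} D_i, \qquad B = \sum_{i=1}^{N} E_i^T W_i^{-1} \frac{\partial \eta_i}{\partial \beta}, \qquad C = \sum_{i=1}^{N} E_i^T W_i^{-1} E_i .$$

However, a computationally simpler version can be used in practice (Prentice [2]; Qu, Piedmonte and Williams [12]) by setting the matrix B to zero, resulting in p + q equations. STEPGEE prints out the values of the matrix B so that users can decide for themselves how good the resulting equations are in computing the parameter estimates. From our experience, this simpler version is very good and the values of B were always less than $9 \times 10^{-3}$. The resulting equations are

$$\beta_{j+1} = \beta_j + \Bigl( \sum_{i=1}^{N} D_i^T V_i^{-1} D_i \Bigr)^{-1} \sum_{i=1}^{N} D_i^T V_i^{-1} (y_i - \mu_i) \tag{8}$$

$$\alpha_{j+1} = \alpha_j + \Bigl( \sum_{i=1}^{N} E_i^T W_i^{-1} E_i \Bigr)^{-1} \sum_{i=1}^{N} E_i^T W_i^{-1} (Z_i - \eta_i) \tag{9}$$

Iterating between these two equations leads to consistent parameter and variance estimates. This form is similar to the alternating logistic regressions approach of Carey, Zeger and Diggle [13]. Let $\hat\beta$ be the GEE estimate obtained; if the variance of Y is correctly specified, then the variance of $\hat\beta$ is estimated by

$$\operatorname{var}(\hat\beta) = \Bigl( \sum_{i=1}^{N} D_i^T V_i^{-1} D_i \Bigr)^{-1} \tag{10}$$

The robust variance estimate, which is consistent even if the covariance structure of y is misspecified, has been given as (Liang and Zeger [1], Zeger and Liang [3]):

$$\operatorname{var}_R(\hat\beta) = \Bigl( \sum_{i=1}^{N} D_i^T V_i^{-1} D_i \Bigr)^{-1} \Bigl[ \sum_{i=1}^{N} D_i^T V_i^{-1} (y_i - \mu_i)(y_i - \mu_i)^T V_i^{-1} D_i \Bigr] \Bigl( \sum_{i=1}^{N} D_i^T V_i^{-1} D_i \Bigr)^{-1} \tag{11}$$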
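The link between the pairwise odds ratio and the working correlation described in this section can be checked numerically. The following is a minimal Python sketch, not part of STEPGEE (which is written in SAS IML); the function names are hypothetical and chosen only for illustration. It computes the joint success probability from two marginal means and an odds ratio, converts it to a correlation, and recovers the odds ratio from the implied 2 × 2 table.

```python
import math


def joint_success_prob(mu_s, mu_t, gamma):
    """P(Y_s = 1, Y_t = 1) from two marginal means and their odds ratio.

    Illustrative helper (hypothetical name); follows the Mardia formula
    quoted in Section 2, with the independence case handled separately.
    """
    if math.isclose(gamma, 1.0):
        return mu_s * mu_t
    a = 1.0 - (1.0 - gamma) * (mu_s + mu_t)
    disc = a * a - 4.0 * gamma * (gamma - 1.0) * mu_s * mu_t
    return (a - math.sqrt(disc)) / (2.0 * (gamma - 1.0))


def pairwise_correlation(mu_s, mu_t, gamma):
    """Correlation of two binary outcomes implied by the odds ratio."""
    eta = joint_success_prob(mu_s, mu_t, gamma)
    return (eta - mu_s * mu_t) / math.sqrt(
        mu_s * (1.0 - mu_s) * mu_t * (1.0 - mu_t)
    )


def odds_ratio_from_joint(mu_s, mu_t, eta):
    """Cross-products ratio of the 2 x 2 table with margins mu_s, mu_t."""
    p11 = eta
    p10 = mu_s - eta
    p01 = mu_t - eta
    p00 = 1.0 - mu_s - mu_t + eta
    return (p11 * p00) / (p10 * p01)


if __name__ == "__main__":
    eta = joint_success_prob(0.3, 0.4, 0.3)
    print(eta, pairwise_correlation(0.3, 0.4, 0.3))
    print(odds_ratio_from_joint(0.3, 0.4, eta))  # recovers the input odds ratio
```

An odds ratio below 1 yields a negative correlation and an odds ratio of 1 yields zero correlation, which is the sense in which the odds ratio can stand in for the correlation coefficient in the working correlation matrix.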

3. Theoretical considerations

3.1. Asymptotic distributions

We shall now discuss some of the properties of the score function $U(\beta)$, defined as

$$U(\beta) = \sum_{i=1}^{N} D_i^T V_i^{-1} (Y_i - \mu_i) \tag{12}$$

where $V_i$ is the variance matrix of $y_i$. This score function has the following properties:

$$E(U) = 0 \tag{13}$$

$$E(-\partial U / \partial \beta) = \sum_{i=1}^{N} D_i^T V_i^{-1} D_i = J \tag{14}$$

where J is the Fisher information, and the variance of U equals J. If the true variance of $y_i$ is not known, and $V_i$ is a 'working' covariance matrix of $Y_i$, then the properties of (13) and (14) still hold, but the variance of the score U is now given as

$$C = \sum_{i=1}^{N} D_i^T V_i^{-1} \operatorname{var}(Y_i)\, V_i^{-1} D_i \tag{15}$$

which is not equal to the information matrix J unless $V_i = \operatorname{var}(y_i)$ for all i = 1, 2, ..., N. Under mild regularity conditions (Rao [14], Liang and Zeger [1]) the distribution of the score function U is asymptotically multivariate Gaussian with mean zero and variance-covariance matrix C. Also, the distribution of $N^{1/2}(\hat\beta - \beta)$ is asymptotically multivariate Gaussian, with mean zero and nonsingular variance-covariance matrix $N J^{-1} C J^{-1}$. We note here that the method by which the asymptotic distribution of the GEE estimators is determined follows closely that of the maximum likelihood estimators, i.e. it involves the expansion of the score function by Taylor's theorem. The joint asymptotic distribution of $\hat\delta = (\hat\beta, \hat\alpha)$ follows directly from the result above [2].

3.2. Hypothesis testing

From the preceding asymptotic probability distribution based on the generalized estimating equations, Wald and score-type tests can be constructed. To test the null hypothesis that the p-variate vector $\beta = \beta_0$, let $\hat\beta$ be the GEE estimate. Then, under the regularity conditions already referred to, we can construct the Wald test statistic

$$T_w = (\hat\beta - \beta_0)^T V_{\hat\beta}^{-1} (\hat\beta - \beta_0) \tag{16}$$

where $V_{\hat\beta} = J^{-1} C J^{-1}$, which asymptotically under the null has a central chi-square distribution with p degrees of freedom. The variance term $V_{\hat\beta}$ is evaluated at the GEE estimates $\hat\beta$. If we define the null hypothesis as $H_0\colon \beta = 0$, then the test corresponds to testing that all p parameters are jointly zero. For the simple null hypothesis for any k-th element from the p parameters, the Wald test corresponds to

$$T_w = \hat\beta_k^2 / V_{kk} \tag{17}$$

where $V_{kk}$ is the (k,k)-th element of the variance-covariance matrix $V_{\hat\beta}$. An alternative method of testing the joint significance of all covariates ($\beta$s) uses the score test statistic

$$T_s = U(\beta_0)^T C^{-1} U(\beta_0) \tag{18}$$

where $\beta_0$ is a vector of GEE estimates under the null hypothesis, and C is evaluated at $\beta_0$. This statistic under the null hypothesis is distributed as a chi-square distribution with p degrees of freedom. The score statistic for testing the simple null hypothesis follows directly from (18) and we discuss this further in Section 4. In all these tests, the correlation parameter, $\alpha$, is estimated and assumed fixed at its limiting value. Hence, whenever we write the score function, $U(\beta)$, the correlation parameter is suppressed.

3.3. Generalized score tests

Suppose the parameter vector $\beta$ can be partitioned as $\beta = (\beta_1^T, \beta_2^T)^T$, where $\beta_1$ and $\beta_2$ are vectors of dimension $(p - r) \times 1$ and $r \times 1$, respectively. The score function U and its variance matrix C, as well as the information matrix J, can also be partitioned to

$$U(\beta) = \bigl( U_1^T(\beta),\, U_2^T(\beta) \bigr)^T, \qquad C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} \qquad \text{and} \qquad J = \begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix}$$

respectively. The null hypothesis $H_0$ is $\beta_2 = \beta_{20}$. Let $\tilde\beta = (\tilde\beta_1^T, \beta_{20}^T)^T$ be the vector of GEE estimates under $H_0$. The score function for $\beta_2$ evaluated at $\tilde\beta$ is

$$\tilde U_2(\tilde\beta) = \sum_{i=1}^{N} \tilde D_{i2}^T \tilde V_i^{-1} (y_i - \tilde\mu_i), \qquad D_{i2} = \partial \mu_i / \partial \beta_2 \tag{19}$$

The generalized score test for $H_0$ is given by

$$T_u = \tilde U_2^T(\tilde\beta)\, \tilde\Sigma_{U_2}^{-1}\, \tilde U_2(\tilde\beta) \tag{20}$$

where $\tilde\Sigma_{U_2}$ is the variance matrix of the subvector $\tilde U_2(\tilde\beta)$. $\tilde\Sigma_{U_2}$ can be calculated from the submatrices of C and J such that

$$\Sigma_{U_2} = C_{22} - J_{21} J_{11}^{-1} C_{12} + J_{21} J_{11}^{-1} C_{11} J_{11}^{-1} J_{12} - C_{21} J_{11}^{-1} J_{12} \tag{21}$$

(Boos [15], Breslow [16]). Under the null hypothesis, $T_u$ has a chi-square distribution with r degrees of freedom. For a model-based score function, since C = J, $\Sigma_{U_2}$ reduces to

$$\Sigma_{U_2} = J_{22} - J_{21} J_{11}^{-1} J_{12} \tag{22}$$

A test for model adequacy also based on the generalized score statistic is available in STEPGEE, by testing that the parameters not selected into the final model are jointly zero. Alternatively, one could use a Wald test to test for the significance of the selected parameters. Such a method is given in the program by Davis [7]. We note here that these assumptions have been made for the tests to be valid for the correlated binary regression model: (a) consistent variance estimates are obtained under the assumption that the weighted average of the estimated correlation matrices converges to a fixed matrix; (b) the GEE estimates of the parameters obtained under the reduced model and the correlation parameters are assumed to be fixed at their limiting value, when testing for the significance of the parameters not yet selected into the model; (c) the 'working' correlation matrix is assumed to be the true correlation structure during hypothesis testing; (d) for clusters of varying sizes, the correlation structure of a given size corresponds with the appropriate submatrix of the correlation structure of the clusters of larger size.

4. Variable selection

Variable selection strategies have received considerable attention in the literature. They include such strategies as forward selection, backward elimination, and subset selection. Although there is no 'best' strategy for variable selection, a reasonable and often-used approach is stepwise selection, and this is the selection strategy employed in STEPGEE. Stepwise selection refers to the strategy of using forward selection followed by backward elimination. While this strategy is useful in identifying significant predictive variables from among a large set of other explanatory variables, it must be used cautiously if the final model is to be interpreted correctly. Several authors have provided broad guidelines for use with any variable selection procedure. Among them are Krall [17], Harrell et al. [18], Miller [19] and Hauck and Miike [20].

To select 'significant' variables, we employ the score statistic defined in Section 3. Suppose that in the regression model, we have selected p - 1 variables and the GEE estimates of the regression coefficients are denoted by the vector $\hat\beta_1$. Now a new covariate with a regression coefficient $\beta_2$ is added to the model. We want to test $H_0\colon \beta_2 = \beta_{20} = 0$. Let $\tilde\beta = (\hat\beta_1, 0)$, and calculate the generalized score statistic

$$T_u = \tilde U_2(\tilde\beta)^2 / \tilde\Sigma_{U_2}$$

using $\tilde\Sigma_{U_2}$ from Eq. (21) with r = 1. Under the null hypothesis, $T_u$ follows the chi-square distribution with 1 degree of freedom. The new variable is selected when $H_0$ is rejected. We have assumed that the significant covariates already selected into the model and the estimates of the correlation parameters are fixed at their limiting values, and therefore do not affect the asymptotic distribution of $T_u$.

If more than one variable is outside the regression model, the forward selection procedure works by first calculating the $T_u$ statistic for each of them and selecting the one with the largest $T_u$. If it is significant at a nominally set significance level, referred to as PIN in STEPGEE, then it is added to the model and the GEE estimation procedure is carried out again for the current number of parameters in the model.

After each forward selection step, tests for backward elimination begin. Suppose that at the i-th step the model contains i + 1 variables (including the intercept term). We wish to test for each variable currently in the model (except the intercept term) which one should be eliminated. We test the hypothesis $H_0\colon \beta_k = 0$, using the Wald test which is given in Eq. (17). If any of the $T_w$'s is not significant at the chosen level of significance, POUT, the least significant one is dropped from the model; otherwise the forward selection step is repeated. We also opted to use the t-distribution for backward elimination instead of the normal distribution for the Wald ratio, since an earlier simulation study on the small sample properties of correlated binary models seems to indicate that tests based on the t-distribution result in a smaller type I error probability [12]. The degrees of freedom associated with the t-distribution are (total number of clusters - total number of parameters). The number of parameters consists of the number of correlation parameters and the number of parameters selected into the model.

STEPGEE implements this stepwise procedure based on the forward selection and backward elimination strategies just described. The forward addition is immediately followed by a test for backward elimination. If no additional variable can be added to the model at the preset PIN level, or the variable just added is the least significant one to be deleted, then the stepwise procedure stops, indicating no change in the model. The final model significance is tested by the generalized score test for model adequacy. The final decision regarding the suitability of such a model should be made considering all available a priori information regarding the variables and their effect on outcome. STEPGEE is only a guide in that direction.
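The control flow of the stepwise procedure described in this section can be summarized in a short Python sketch. It is only an illustration under stated assumptions and is not the STEPGEE code: the two test functions are placeholders supplied by the caller (in STEPGEE they would come from the generalized score test and the Wald test after refitting the GEE model), the names `stepwise_select` and `fixed_pvalues` are hypothetical, and repeating the backward test until nothing more can be dropped is one reading of the text.

```python
def stepwise_select(candidates, score_pvalue, wald_pvalue,
                    pin=0.05, pout=0.10, max_steps=100):
    """Forward selection followed by backward elimination.

    score_pvalue(model, var): P-value for adding `var` to `model`.
    wald_pvalue(model, var):  P-value for `var` already in `model`.
    Both are caller-supplied placeholders (assumptions of this sketch).
    """
    model = []
    for _ in range(max_steps):  # guard against cycling
        outside = [v for v in candidates if v not in model]
        if not outside:
            break
        # Forward step: with 1 degree of freedom each, the largest score
        # statistic is the one with the smallest P-value.
        entered = min(outside, key=lambda v: score_pvalue(model, v))
        if score_pvalue(model, entered) >= pin:
            break  # nothing can be added at the PIN level
        model.append(entered)

        # Backward step: drop the least significant variable while its
        # P-value exceeds POUT.
        removed_entered = False
        while model:
            worst = max(model, key=lambda v: wald_pvalue(model, v))
            if wald_pvalue(model, worst) <= pout:
                break
            model.remove(worst)
            if worst == entered:
                removed_entered = True
                break
        if removed_entered:
            break  # the variable just added was deleted: no change, stop
    return model


def fixed_pvalues(table):
    """Toy test function: P-values that ignore the current model."""
    return lambda model, var: table[var]


if __name__ == "__main__":
    toy = {"x1": 0.001, "x2": 0.01, "x3": 0.03, "x4": 0.60}
    print(stepwise_select(list(toy), fixed_pvalues(toy), fixed_pvalues(toy)))
```

The toy P-values in the usage lines are made up solely to exercise the loop; in a real run each call would involve refitting the model with the current set of variables.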

5. Program description

STEPGEE is a SAS macro for performing stepwise regression analysis when the outcome consists of repeated measurements of a binary response variable. Since the program was written using the IML procedure of SAS, it assumes that the user is familiar with a basic SAS DATA step setup and has access to the SAS system. An input dataset has to be created in SAS beforehand. The macro facility then allows the user to pass arguments to STEPGEE. The limitation on the size of input data depends on the version of SAS available at one's site. For example, the program yielded an out-of-memory error when run on the DOS version of SAS with the sample data set, but worked fine in Windows, UNIX and OS platform versions. The time taken to implement one run also depends on the number of input parameters and the speed of the computer's microprocessor.

The program prints out a description of the data, and some parameters that were passed to the program. It prints out, at each selection or deletion, the parameters and their estimated coefficients, robust standard errors and a P-value based on the test statistic. It also prints out the naive standard errors of the selected variables at each step. In addition, it prints out the parameters selected into the final model, together with their associated Wald's tests P-values. The correlation matrix selected is also printed.

The following parameters are required, in the order listed, when passed to the program:

DATA: the name of the SAS input dataset; it is assumed that the input dataset was created with one line per time per subject, and missing data points have been set to the SAS missing data value.
OUTV: the name of the binary response variable.
COVRS: the names of the explanatory variables, separated by spaces.
NVARS: the total number of input variables (the ID identifier, the response variable and the explanatory variables).
NCOVS: the total number of explanatory variables; this is used only as a check.
ID: the identifier field for each subject; repeated data for the same subject will have the same ID.
YVAR: the index of the response variable, Y.
XVARS: indices for the explanatory variables, X.
VVAR: the 'working' correlation structure desired. Four correlation structures are allowed: (1) independent, (2) exchangeable, (3) one-dependent structure, (4) completely unspecified structure.
ITER: the total number of iterations desired, default is 20.
CRIT: the convergence criterion, default is 0.001.
PIN: the tail probability required for forward selection into the model.
POUT: the tail probability required for backward elimination from the model.
SUBS: code for using mean imputation to replace missing elements (0 means exclude responses with missing covariates, 1 means replace missing covariate values with the mean for that field).

The program is invoked by passing the appropriate parameters in this manner after a data step has been defined to read in the input dataset DATA:

    %STEPGEE (DATA, OUTV, COVRS, NVARS, NCOVS, ID, YVAR, XVARS, VVAR, ITER, CRIT, PIN, POUT, SUBS);

6. Example

We illustrate the above selection procedure with data on the functional decline in the activities of daily living (ADL) in a group of elderly subjects. A sample of 206 adults aged 75-100, who were part of a study on dysfunctional syndrome in the elderly, were interviewed at five time points: baseline information consisting of what the patient or a reliable surrogate could recall 2 weeks before admission to a local hospital for different ailments; at admission; at discharge; at an intermediate post-discharge follow-up; and at 90 days post-discharge. The outcome variable 'Decline' was defined as a decrease in the number of independent ADLs out of five items (bathing, dressing, transfer, eating and toileting) from one time point to the next, resulting in four dependent binary responses for most subjects. Some had missing responses at certain time points, so a total of 723 individual responses were obtained. The binary response variable and the 17 covariates that the researchers considered necessary are listed in Table 1.

The following command was used to invoke STEPGEE. For example, the following data step part of SAS is also provided before the invocation of STEPGEE. A libname indicating the location of the input file is also included. A SAS dataset called elder.ssd was created from the sample data set.

    LIBNAME run 'c:sas\sasuser\isaac';
    DATA; SET run.elder; RUN;

The STEPGEE macro is then invoked as:

    %STEPGEE (run.elder, Decline, Sex Race Marital Age Apache Depress Typefro Destin Livespx Index IADL Mobile 19, 17, 1, 3, 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21, 20, 0.001, 0.05, 0.10, 1);
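The effect of the SUBS argument (the last value in the call above) on the input records can be pictured with a small Python sketch. This is a hedged illustration of the behaviour described in the parameter list, not the macro's SAS code: the function name `apply_subs`, the use of `None` for a missing value and the dictionary-per-record layout are assumptions made for this example, and the sample records are invented.

```python
def apply_subs(rows, covariates, subs):
    """Handle missing covariate values in long-format records.

    rows:       list of dicts, one per subject per time point;
                None marks a missing value (assumption of this sketch).
    covariates: names of the covariate fields to check.
    subs:       0 = drop records with any missing covariate,
                1 = replace a missing covariate by that field's mean.
    Returns a new list; the input rows are left unchanged.
    """
    if subs == 0:
        return [dict(r) for r in rows
                if all(r[c] is not None for c in covariates)]

    # Mean of the observed values of each covariate
    # (raises ZeroDivisionError if a covariate is missing everywhere).
    means = {}
    for c in covariates:
        observed = [r[c] for r in rows if r[c] is not None]
        means[c] = sum(observed) / len(observed)

    filled = []
    for r in rows:
        new_row = dict(r)
        for c in covariates:
            if new_row[c] is None:
                new_row[c] = means[c]
        filled.append(new_row)
    return filled


if __name__ == "__main__":
    records = [
        {"id": 1, "decline": 1, "age": 80.0},
        {"id": 1, "decline": 0, "age": None},
        {"id": 2, "decline": 0, "age": 90.0},
    ]
    print(len(apply_subs(records, ["age"], 0)))    # records kept when SUBS = 0
    print(apply_subs(records, ["age"], 1)[1]["age"])  # imputed value when SUBS = 1
```

With SUBS = 0 the number of usable responses shrinks, which changes cluster sizes; with SUBS = 1 every record is kept at the cost of treating the field mean as if it had been observed.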

The output results are listed in the Appendix. The stepwise selection procedure produced three significant variables at the PIN = 0.05 significance level. We assumed three correlation structures: the independent structure, the exchangeable or equal correlation structure and the completely unspecified structure. In this example, the same final model was selected. Only the results from the exchangeable correlation structure are therefore presented in the Appendix. The goodness-of-fit statistics are included in the output. They show that the final model provided an adequate fit to the data for this type of model, since the contribution of variables not selected into the model could jointly be assumed to be zero (P = 0.9983). Under all correlation structures, the order of selection of all the variables was the same. The estimate of the common odds ratio is about 0.30 (from the log-odds of about -1.2), indicating a reduction in odds of decline with time. This reflects the fact that individuals who declined from one time point to the next were less likely to decline further after that.

Table 1

Variable   Description
DECLINE    functional decline on binary scale: 0 = no decline, 1 = decline
SEX        0 = male, 1 = female
RACE       0 = black, 1 = white
MARITAL    marital status: 0 = married, 1 = not currently married
AGE        (range from 75 to 100)
ADLBASE    ADL at baseline (range from 0 to 5)
APACHE     apache score (range from 6 to 28)
DEPRESS    depression scale (range from 0 to 13)
TYPEFROM   type of residence from which subject was admitted: 0 = home, 1 = other
DESTIN     residence to which subject will go after discharge: 0 = home, 1 = other
LIVESPX    living situation before admission: 0 = other, 1 = lived with spouse
INDEX      comorbid illness index (range from 0 to 10)
LOS        length of stay (days), range from 2 to 43 days
HEALTH     global health status: 0 = good, 1 = poor
IADL       instrumental ADL of seven items (telephone, transportation, shopping, meals, housework, manage money, medications) (range from 0 to 7); a score of 7 indicates independence, i.e. do not need help on all seven items
MOBILE     mobility on four items (sit up in bed, write, walk across room, get from bed to chair), range from 0 to 4
QOL        quality of life: 0 = satisfied, 1 = dissatisfied
NEUROC     neurocognitive assessment on a 21-item mini mental state questionnaire, range from 0 to 21

7. Discussion

Many researchers have pointed out some of the inherent problems with any variable selection strategy (Harrell et al. [18], Miller [19]). Nevertheless, variable selection remains a common task in data analysis. This paper represents the first attempt to apply stepwise regression procedures to a new and growing field. We have successfully implemented a stepwise procedure for correlated binary data. The performance of the selection strategy is currently being studied and will be reported elsewhere. Preliminary results suggest that under mild to moderate associations the choice of correlation structure does not adversely affect the selection criteria, that is, the same final model is usually selected, though the estimates of the regression coefficients might differ. The appropriateness of the score tests used for selection was confirmed by low Type I errors which ranged from 3 to 9% (it was preset at 5%).

We also compared the parameter estimates and robust standard errors obtained from STEPGEE with the results of two other items of software [5,7]. Using a sample dataset, referred to as visits.d by these authors in their examples, we [...] the response variable and modelled it using these items of software. In our case, we ran STEPGEE with PIN = 0.95 and POUT = 0.96 to ensure that all variables were selected into the final model. The same regression coefficients and standard errors were obtained by all three items of software.

From a data analytic point of view, the stepwise procedure implemented in the example suggests that a patient's assessment of global health, apprehension about where she/he will be discharged to after hospitalization, e.g. going home to family or nursing home, and loss of independence on instrumental ADLs seem to be most predictive of functional decline in activities of daily living. Although targeting hospitalized older patients according to risk for functional decline is often advocated, early predictors have not been previously identified and validated. Those who have attempted this have ignored the time dependence among the responses. Such a stepwise strategy may prove useful in identifying factors that put the elderly patients at risk for functional decline, and thereby targeting them for adequate care.

Acknowledgements

The authors wish to thank Seth Landefeld, M.D., of the Department of Internal Medicine, Case Western Reserve University, for making the data available to us. The data, which came from a clinical trial on Dysfunctional Syndrome in Elderly Patients, is funded by the John A. Hartford Foundation.

A copy of the macro can be obtained by writing to Dr. Isaac F. Nuamah, University of Pennsylvania Cancer Center, 528 Blockley Hall, Philadelphia, PA 19104-6021, USA. Please include either a 3½ or a 5¼ inch diskette.

Appendix

CORRELATED BINARY REGRESSION MODEL

Outcome variable:           DECLINE
Covariates:                 SEX RACE MARITAL AGE ADLBASE APACHE DEPRESS TYPEFROM DESTIN LIVESPX INDEX LOS HEALTH IADL MOBILE QOL NEUROC
Link:                       LOGISTIC
Correlation Structure:      EXCHANGEABLE
Total Number of Records read: 723
Total Number of Clusters:   206
Maximum Cluster Size:       4
Minimum Cluster Size:       1

(Step 0)

B           0.0013229

COVARIATE   SEX RACE MARITAL AGE ADLBASE APACHE DEPRESS TYPEFROM DESTIN LIVESPX INDEX LOS HEALTH IADL MOBILE QOL NEUROC
CHI-SQUARE  0.0375 0.0082 1.8227 2.1825 3.0599 0.4606 7.5638
P-VALUE     0.8461 0.9278 0.1770 0.1396 0.0802 0.4973 0.0060

COVARIATE   SEX RACE MARITAL AGE ADLBASE APACHE DEPRESS TYPEFROM LIVESPX INDEX LOS MOBILE QOL NEUROC
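The chi-square values and P-values that survive in the Step 0 listing can be related through the 1-degree-of-freedom chi-square reference distribution used for the forward-selection score test. The Python sketch below is an illustration only; the function name is hypothetical, and the check assumes that the printed P-values are upper-tail probabilities of that distribution. For 1 degree of freedom the tail probability reduces to the complementary error function, so no statistics library is needed.

```python
import math


def chi2_1df_pvalue(x):
    """Upper-tail probability P(chi-square with 1 df > x).

    Uses P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for a
    standard normal Z. Hypothetical helper, not part of STEPGEE.
    """
    if x < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    # Statistics as printed in the Step 0 listing.
    for stat in (0.0375, 0.0082, 1.8227, 2.1825, 3.0599, 0.4606, 7.5638):
        print(f"{stat:8.4f}  {chi2_1df_pvalue(stat):.4f}")
```

The computed tail probabilities agree with the printed P-values to about three decimal places; small differences in the last digit are to be expected because the printed statistics are themselves rounded.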

References

[1] K.Y. Liang and S.L. Zeger, Longitudinal data analysis using generalized linear models, Biometrika 73 (1986) 13-22.
[2] R.L. Prentice, Correlated binary regression with covariates specific to each binary observation, Biometrics 44 (1988) 1033-1048.
[3] S.L. Zeger and K.Y. Liang, Longitudinal data analysis for discrete and continuous outcomes, Biometrics 42 (1986) 121-130.
[4] SAS Institute Inc., SAS/STAT User's Guide, Version 6, 4th edn., Vol. 2 (SAS Institute Inc., Cary, NC, 1989).
[5] M.R. Karim and S.L. Zeger, GEE: a SAS macro for longitudinal data analysis, Technical Report (Department of Biostatistics, The Johns Hopkins University, 1988).
[6] SAS Institute Inc., SAS/IML Software: Usage and Reference, Version 6 (SAS Institute Inc., Cary, NC, 1989).
[7] C.S. Davis, A computer program for regression analysis of repeated measures using generalized estimating equations, Comp. Methods Prog. Biomed. 40 (1993) 15-31.
[8] S.L. Zeger, K.Y. Liang and P. Albert, Models for longitudinal data: A generalized estimating equation approach, Biometrics 44 (1988) 1049-1060.
[9] K.Y. Liang, S.L. Zeger and B. Qaqish, Multivariate regression models for correlated categorical data (with discussion), J. R. Stat. Soc. B 54 (1992) 3-40.
[10] S.R. Lipsitz, N.M. Laird and D.P. Harrington, Generalized estimating equations for correlated binary data: Use of the odds ratio as a measure of association, Biometrika 78 (1991) 153-160.
[11] K.V. Mardia, Some contributions to contingency-type bivariate distributions, Biometrika 54 (1967) 235-249.
[12] Y. Qu, G.W. Williams, G.J. Beck and S.V. Medendorp, Latent variables models for clustered dichotomous data with multiple subclasses, Biometrics 48 (1992) 1095-1102.
[13] V. Carey, S.L. Zeger and P. Diggle, Modelling multivariate binary data with alternating regression models, Biometrika 80 (1993) 517-526.
[14] C.R. Rao, Linear Statistical Inference and Its Applications, 2nd edn. (John Wiley & Sons, New York, 1973).
[15] D.D. Boos, On generalized score tests, Am. Stat. 46 (1992) 327-333.
[16] N. Breslow, Tests of hypothesis in overdispersed Poisson regression and other quasi-likelihood models, J. Am. Stat. Assoc. 85 (1990) 565-571.
[17] J.M. Krall, V.A. Uthoff and J.B. Harley, A step-up procedure for selecting variables associated with survival, Biometrics 31 (1975) 49-57.
[18] F.E. Harrell, K.L. Lee, R.M. Califf, D.B. Pryor and R.A. Rosati, Regression modelling strategies for improved prognostic prediction, Stat. Med. 3 (1984) 143-152.
[19] A.J. Miller, Subset Selection in Regression (Chapman and Hall, London, 1990).
[20] W.W. Hauck and R. Miike, A proposal for examining and reporting stepwise regressions, Stat. Med. 10 (1991) 711-715.