Design aspects for clinical trials with repeated measures in the presence of informative censoring


Journal of Statistical Planning and Inference 42 (1994) 109-122
North-Holland

Margaret C. Wu
Biostatistics Research Branch, National Heart, Lung, and Blood Institute, Bethesda, MD 20892, USA

Received 8 September 1992; revised manuscript received 24 September 1993

Abstract

Likelihood-based procedures derived under the linear random effects model and distribution-free tests obtained by combining marginal U statistics as well as by ranking the individual effects are reviewed for clinical trials with repeated measures. Simulation studies are presented to evaluate the effects of informative censoring and model misspecifications on statistical power and sample size. The results indicate that in some situations with informative censoring the combined marginal U statistic could suffer severe loss of power. Furthermore, in the presence of informative censoring with some model misspecifications all other procedures could also suffer some power losses, although not as severe. Therefore, in planning clinical trials in which informative censoring is likely to occur, it is important to conduct some simulation studies based on realistic parameter estimates so that a proper method of analysis can be selected and appropriate sample size adjustments can be made.

AMS Subject Classification: 62H15.

Key words: Informative censoring, longitudinal approach, marginal approach.

1. Introduction

In clinical trials in which the primary response is a continuous variable, participants are usually followed at prespecified intervals to obtain serial measurements of the response. The primary objective is to compare the response curves between treatment groups. Missed visits, participants' death and withdrawal often cause the serial measurements to be incomplete for some participants. Design considerations such as sample size, study duration and statistical power usually depend on the type of alternative hypothesis to be tested, the type of statistical analysis to be performed and the nonresponse mechanism.

Correspondence to: Dr. M.C. Wu, National Heart, Lung, and Blood Institute, Federal Building, Room 2A11, 7550 Wisconsin Avenue, Bethesda, MD 20892, USA.

0378-3758/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved. SSDI 0378-3758(93)E0109-T

Schlesselman (1975) discussed frequency of measurements and study duration for longitudinal studies in which all participants have complete observations with linear response curves. Under the linear random effects model Wu (1988) further described sample size derivations allowing for randomly missing and censored observations. Wu and Carroll (1988) presented an example in which the probability of censoring could be dependent on the response parameters. The term informative censoring was used to describe this type of censoring process. It has been shown that in the presence of informative censoring, the optimum statistical procedures derived under the assumption of random censoring could give biased parameter estimates and suffer power losses in hypothesis testing.

Likelihood-based procedures have also been developed to account for informative censoring. These procedures were all derived under the general framework of random effects models described by Laird and Ware (1982). This type of likelihood-based approach summarizes the longitudinal changes within subjects. The expected group response curves are then obtained and compared. These methods also require the assumption that the individual measurements are normally distributed. A nonparametric test derived from ranking individual summary statistics was suggested by Wu et al. (1991). Since all of these procedures summarize the individual effects longitudinally, they will be referred to as the longitudinal approach.

An alternative approach is the marginal approach. Under this approach, the response data are compared marginally between groups at each follow-up time. These marginal measures are then combined to obtain a summary statistic. Under the assumption of a completely random nonresponse process, distribution-free tests for repeated measures have been developed by many authors (see, for instance, Koziol et al. (1981) and Wei and Johnson (1985)). These distribution-free tests rely only on large sample theory to derive asymptotic normality of the summary statistics.

Clearly, if the random effects model were true, the distribution-free tests could suffer some power losses and hence require larger sample sizes than the model-based approaches. The presence of informative censoring could further affect the statistical power and sample size requirement of the distribution-free tests. On the other hand, if there were model misspecifications then the model-based approaches could suffer losses of power. Hence, larger sample sizes may be required than in situations in which the assumed model is true. In this paper, the use of simulation studies to address these design considerations is illustrated.

2. Models for the response and nonresponse process

2.1. Linear random effects model for the response

Let the participants of a clinical trial be assigned to two treatment groups of sample sizes $n_k$ for $k = 1$ and 2. The combined sample size of the entire study is $n = n_1 + n_2$. A total of $J$ measurements of the response variable are planned for each participant at time points $t_1 = 0, t_2, \ldots, t_J$. Let $j_i \ge 2$ be the number of measurements actually made for the ith participant of the combined sample, and $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{ij_i})'$ be the corresponding vector of serial measurements. The notation $i \in k$ is used to denote that the ith participant in the combined sample belongs to the kth treatment group. We will first restrict our presentation to the case of linear response curves. The cases of more general response curves will be discussed in Sections 4 and 5.

We assume that the response curves are linear with respect to measurement time in both groups. Let $\beta_i = (\beta_{i1}, \beta_{i2})'$ be the unobserved vector representing the initial value and slope of the response variable. We have for $i \in k$ ($k = 1$ or 2)

$$Y_i = X_i \beta_i + e_i, \qquad (1)$$

where $\beta_i \sim N(B_k, \Sigma_\beta)$ with $B_k = (B_{k1}, B_{k2})'$,

$$\Sigma_\beta = \begin{pmatrix} \sigma_{\beta_1}^2 & \sigma_{\beta_1\beta_2} \\ \sigma_{\beta_1\beta_2} & \sigma_{\beta_2}^2 \end{pmatrix}, \qquad X_i' = \begin{pmatrix} 1 & \cdots & 1 \\ t_1 & \cdots & t_{j_i} \end{pmatrix}, \qquad e_i \sim N(0, \sigma_e^2 I),$$

and $I$ is the $(j_i \times j_i)$ identity matrix. The objective is to test the null hypothesis of equal expected slopes, $H_0: B_{12} = B_{22}$, against the alternative hypothesis, $H_1: B_{12} < B_{22}$. The test statistics used are usually in the form of

$$Z = (\hat B_{12} - \hat B_{22}) / \hat\sigma, \qquad (2)$$

where $\hat B_{k2}$ for $k = 1$ and 2 are estimates of the parameters $B_{k2}$ as defined in equation (1) and $\hat\sigma^2$ is the estimated variance of $(\hat B_{12} - \hat B_{22})$.

In most clinical trials, a participant's death or withdrawal often causes some of the response vectors to be incomplete. Hence, the model of the nonresponse process as well as the model for the response could affect the choice of method for parameter estimation.

2.2. Models for nonresponse

Let $Z_i$ and $\gamma$ be vectors of fixed covariates and parameters for the nonresponse process. Let $R_i = (R_{i1}, \ldots, R_{iJ})'$ be the nonresponse function for the ith participant such that $R_{ij} = 1$ if the jth response was observed for the ith participant and $R_{ij} = 0$ otherwise, and let $f(R_i \mid Y_i, X_i, Z_i, \gamma)$ denote the density of the nonresponse function $R_i$ given $Y_i$, $X_i$, $Z_i$ and $\gamma$. The hierarchy of models for nonresponse introduced by Rubin (1976) and further discussed by Little (1982) is as follows.

(A) Missing completely at random (MCAR): Under this process $f(R_i \mid Y_i, X_i, Z_i, \gamma) = f(R_i \mid X_i, Z_i, \gamma)$, i.e. the nonresponse process is not dependent on the response variable. Most standard approaches to analysis are valid under this process.

(B) Ignorable nonresponse process: Let $Y_i = (Y_{io}, Y_{iu})'$ where $Y_{io}$ and $Y_{iu}$ correspond, respectively, to the observed and unobserved components of responses for the ith participant. Under this process $f(R_i \mid Y_i, X_i, Z_i, \gamma) = f(R_i \mid Y_{io}, X_i, Z_i, \gamma)$, i.e. the nonresponse process is dependent on the observed response only. It can be shown that this type of nonresponse process can be ignored under the likelihood-based approach to inference (Laird, 1988).

(C) Non-ignorable nonresponse process: Under this process, $f(R_i \mid Y_i, X_i, Z_i, \gamma)$ is dependent on the observed response $Y_{io}$ as well as the unobserved response $Y_{iu}$. In this case, valid likelihood-based inferences can only be made by specifying the density for the nonresponse.

The term 'informative censoring' was used by Wu and Carroll (1988) to represent the censoring process under which the probability of censoring is dependent upon the response parameters $\beta_i$ as defined in (1), i.e. $f(R_i \mid Y_i, X_i, Z_i, \gamma) = f(R_i \mid \beta_i, X_i, Z_i, \gamma)$. For instance, under a probit informative censoring model, the probability of the ith participant being censored prior to follow-up time $t_j$, for $j = 2, \ldots, J$, is given by

$$P(t_{ij_i} < t_j) = \Phi(\gamma'\beta_i + \gamma_{0j}), \qquad (3)$$

where $\Phi$ is the standard normal distribution function. Under this censoring process, for two participants with the same observed response at baseline the one with a large unobserved slope could be censored prior to the first follow-up. On the other hand, the participant with a much smaller slope may have complete observations during the entire study. Hence, the probability of nonresponse is dependent upon the unobserved as well as the observed responses. Therefore, informative censoring can be viewed as a type of nonignorable nonresponse process. Similar models have been used by econometricians, although not in a repeated measures setting. The sample selection models described by Heckman (1979) and Maddala (1983) are examples of such models.

2.3. Parameter estimation

An approximate conditional linear (CL) model using censoring time as a covariate in a random effects model was proposed by Wu and Bailey (1989) to account for informative censoring. They showed that this approach could provide good approximations to many commonly used censoring functions such as the probit function and the proportional hazard model. Let

$$(b_i \mid t_{ij_i} = t_j) = [(b_{i1} \mid t_{ij_i} = t_j), (b_{i2} \mid t_{ij_i} = t_j)]' = (X_i'X_i)^{-1} X_i' Y_i$$

be the ordinary least-squares estimate (OLSE) of $\beta_i$. Under the conditional linear model of Wu and Bailey (1989), the estimated individual mean or slope given censoring time is assumed to be a linear function of censoring time, i.e.

$$(b_{i2} \mid t_{ij_i} = t_j) = \alpha_{0k} + \alpha_1 t_j + \varepsilon_{ij}, \qquad (4)$$

where $\alpha_1$ and $\alpha_{0k}$, for $k = 1$ and 2, are unknown coefficients and $\varepsilon_{ij}$ is a normally distributed random error with mean 0 and variance $\sigma_j^2$. To derive the test statistic of equation (2), the expected group slopes, $B_{k2}$, have to be estimated. From (4) we have

$$B_{k2} = E_{i \in k}(b_{i2} \mid t_{ij_i}) = \alpha_{0k} + \alpha_1 E_{i \in k}(t_{ij_i}),$$

where $E_{i \in k}(t_{ij_i})$ is the expectation of censoring time for individuals belonging to the kth treatment group. The conditional linear (CL) model estimates for the expected group slopes are given by

$$\hat B_{k2}(\mathrm{CL}) = \hat\alpha_{0k} + \hat\alpha_1 \bar t_k, \qquad (5)$$

where $\bar t_k = \sum_{i \in k} t_{ij_i} / n_k$, and $\hat\alpha_{0k}$ and $\hat\alpha_1$ are weighted least-squares estimates of $\alpha_{0k}$ and $\alpha_1$. When $\alpha_1 = 0$, (5) becomes the usual weighted least squares estimate (WLE), which is the maximum likelihood estimate of the expected group slope derived under noninformative censoring:

$$\hat B_{k2}(\mathrm{WLE}) = \hat\alpha_{0k}. \qquad (6)$$

Note that the weight used in obtaining the WLE (6) is proportional to the inverse of the variance of $(b_{i2} \mid t_{ij_i})$, which is a function of the number and the time of measurements made for the participant. If censoring is informative then an individual with a larger decline (or increase) in the response tends to be censored early in the study. Thus, a smaller weight is given to a larger decline (or increase) in obtaining the WLE. Hence, the WLE could give biased estimates of the expected group slopes in situations with informative censoring.
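The contrast between the estimates (5) and (6) can be sketched numerically. The following is a minimal illustration, not the procedure of Wu and Bailey (1989): the function name `wle_and_cl`, the toy slopes, the censoring times and the weights are all hypothetical, and the weights simply stand in for the inverse variances of the individual OLS slopes. The common coefficient $\alpha_1$ is fitted by a pooled within-group weighted regression, which is one plausible reading of "weighted least-squares estimates" with group-specific $\alpha_{0k}$.

```python
def weighted_mean(values, weights):
    """Weighted average of `values` with non-negative `weights`."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def wle_and_cl(groups):
    """Return one dict per group with the WLE (6) and CL (5) slope estimates.

    `groups` is a list of (slopes, censor_times, weights) triples, one per
    treatment group. The weights are hypothetical inputs standing in for the
    inverse variances of the individual OLS slopes.
    """
    # Weighted group means of the slopes and of the censoring times.
    b_w = [weighted_mean(b, w) for b, _, w in groups]
    t_w = [weighted_mean(t, w) for _, t, w in groups]

    # Common coefficient alpha_1: pooled within-group weighted regression.
    sxy = sum(wi * (ti - t_w[k]) * (bi - b_w[k])
              for k, (b, t, w) in enumerate(groups)
              for bi, ti, wi in zip(b, t, w))
    sxx = sum(wi * (ti - t_w[k]) ** 2
              for k, (_, t, w) in enumerate(groups)
              for ti, wi in zip(t, w))
    alpha1 = sxy / sxx

    out = []
    for k, (_, t, _) in enumerate(groups):
        alpha0 = b_w[k] - alpha1 * t_w[k]   # group-specific intercept
        t_bar = sum(t) / len(t)             # unweighted mean censoring time
        out.append({"wle": b_w[k], "cl": alpha0 + alpha1 * t_bar})
    return out


# Toy data (hypothetical): steeper declines are censored earlier and
# therefore receive smaller weights.
times = [0.5, 1.5, 2.5, 3.0]
weights = [1.0, 2.0, 3.0, 4.0]
control = ([-0.20 + 0.04 * t for t in times], times, weights)
treated = ([-0.10 + 0.04 * t for t in times], times, weights)
for est in wle_and_cl([control, treated]):
    print(est)
```

In this toy example the WLE for the first group is about -0.108 while the CL estimate is about -0.125: the WLE is pulled toward the participants who were followed longest, which is the bias described in the paragraph above.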

3. Distribution-free tests

3.1. Combining marginal tests

Most distribution-free tests developed for analyzing repeated measurements with incomplete data compare the responses between groups marginally at each follow-up measure. These marginal statistics are then combined to form a summary statistic. Censoring completely at random was assumed in the development of all the distribution-free tests of this subsection. The optimum choice of weights for combining the marginal statistics depends upon the type of alternative hypothesis to be tested.

3.1.1. The alternative of stochastic ordering

Koziol et al. (1981) proposed a conditionally distribution-free test for the equality of several growth curves with incomplete observations. Their method is based on the rank permutation principle. We will consider the test statistic suggested by these authors which is appropriate for the detection of alternatives involving stochastic ordering of distributions. That is, $H_1: F_{2j}(Y) \ge F_{1j}(Y)$ for $j = 2, \ldots, J$ with at least one strict inequality, where $F_{kj}(Y)$ is the distribution function of the response in the kth group at the jth time point. Let $N_{kj}$ be the number of participants in the kth group yielding observations of the response variable at the jth measurement time. Let

$$S_{kj} = \sum_{i \in k} a_j(R_{ij}) / N_{kj}, \qquad (7)$$

where for $j > 1$, $R_{ij}$ is the rank of $(Y_{ij} - Y_{i1})$ among the $N_{\cdot j} = \sum_{k=1}^{2} N_{kj}$ values available at the jth measurement time for both groups and

$$a_j(R) = R - (N_{\cdot j} + 1)/2,$$

for any rank $R$. The proposed test statistic, to be referred to as the unweighted sum of rank (USR) statistic, is

$$Z(\mathrm{USR}) = S_{1\cdot} / (V_{1\cdot})^{1/2}, \qquad (8)$$

where $S_{k\cdot} = \sum_{j=2}^{J} S_{kj}$ and $V_{k\cdot}$ is the sample variance of $S_{k\cdot}$.

3.1.2. The alternative of a constant location shift

To test against an alternative hypothesis in the form of a constant location shift,

$$H_1: F_{2j}(Y) = F_{1j}(Y + \theta) \quad \text{for } j = 2, \ldots, J,$$

where $\theta$ is the location shift in group 2 from group 1, Wei and Johnson (1985) showed that the optimum test is a weighted sum of the marginal tests (U statistics) obtained for each of the follow-ups. The proposed weights were proportional to the inverse of the observed covariance matrix of the marginal U statistics.

3.1.3. The alternative of linear location shift

For situations in which testing for equality in linear slopes is the primary objective, the location shift of the response variable in group 2 from group 1 is linearly increasing with follow-up time. Hence, the alternative of a linear location shift

$$H_1: F_{2j}(Y) = F_{1j}(Y + t_j\theta) = F(Y + t_j\theta) \quad \text{for } j = 2, \ldots, J \qquad (9)$$

is more appropriate. Using the principle of generalized least squares and the method described by O'Brien (1984), it can be shown that the optimum test statistic for this alternative is again a weighted sum of the marginal U statistics. This statistic, to be referred to as the weighted sum of U statistics, is

$$Z(\mathrm{WSU}) = \Big(\sum_{j} W_j U_j\Big) \Big/ (W'\hat\Lambda W)^{1/2},$$

with $W = (W_2, \ldots, W_J)'$ a vector of weights that depend on the follow-up times $t_j$, on $\hat\Lambda$, the observed covariance matrix of $U = (U_2, \ldots, U_J)'$, and on $\phi_{1j}$ and $\phi_{2j}$, the proportions of observed responses at the jth measurement for groups 1 and 2, respectively. Using the marginal rank statistic, $S_{kj}$, proposed by Koziol et al. (1981) for $U_j$, the optimum test statistic for this alternative is in the form of a weighted sum of the marginal rank statistics

$$Z(\mathrm{WSR}) = S_{w1\cdot} / (V_{w1\cdot})^{1/2}, \qquad (10)$$

where $S_{wk\cdot} = T'\hat\Lambda^{-1} S_k$, $S_k = (S_{k2}, \ldots, S_{kJ})'$, $T = (t_2, \ldots, t_J)'$, $\hat\Lambda$ is the observed covariance matrix of $S_{k2}, \ldots, S_{kJ}$ and $V_{w1\cdot} = T'\hat\Lambda^{-1} T$ is the estimated variance of $S_{w1\cdot}$. This test statistic will be referred to as the weighted sum of rank statistics (WSR).

3.2. Linear rank statistic for individual effects

A linear rank statistic obtained by ranking the OLSE of the individual slopes for all participants in both treatment groups was proposed by Wu et al. (1991). Let $R_i$ be the rank obtained for the ith participant based on the OLSE of the individual slope $b_{i2}$. The linear rank statistic based on ranking of the slopes (RS) is

$$Z(\mathrm{RS}) = \sum_{i \in 1} [R_i - (n + 1)/2] \Big/ V_1^{1/2}, \qquad (11)$$

where $V_1$ is the estimated variance of the rank sum among individuals in group 1. It should be noted that in the presence of censoring the $b_{i2}$'s may have unequal variances caused by the varying number of observed responses among participants. Although asymptotic normality of $Z(\mathrm{RS})$ has been proved under the assumption that the $b_{i2}$'s are independently and identically distributed, the property of normality has also been shown to be robust to deviations from the identical distributions of the $b_{i2}$'s (Lee and DeMets, 1992).
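As a rough illustration of the rank statistic of individual slopes, the sketch below computes a centred rank sum for group 1 and standardizes it. The source only says that $V_1$ is the estimated variance of the rank sum; here the usual no-ties null variance of the Wilcoxon rank sum, $n_1 n_2 (n+1)/12$, is used as an assumption, and the function names and slope values are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks of `values`, averaging ranks within ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        avg = (pos + end) / 2 + 1          # mean of ranks pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def z_rank_slopes(slopes_group1, slopes_group2):
    """Standardized centred rank sum of the group 1 slopes.

    Assumption: V_1 is taken to be the no-ties null variance of the rank
    sum, n1 * n2 * (n + 1) / 12; the paper's own estimator may differ.
    """
    n1, n2 = len(slopes_group1), len(slopes_group2)
    n = n1 + n2
    ranks = midranks(list(slopes_group1) + list(slopes_group2))
    centred_sum = sum(r - (n + 1) / 2 for r in ranks[:n1])
    v1 = n1 * n2 * (n + 1) / 12
    return centred_sum / math.sqrt(v1)


# Hypothetical OLS slopes (litres/yr): group 1 declines faster than group 2.
z = z_rank_slopes([-0.12, -0.10, -0.09], [-0.05, -0.04, -0.02])
print(round(z, 3))
```

A value of `z` below -1.645 would lead to rejection under the one-sided rule used later in the simulations; with only three slopes per group the normal approximation is of course not reliable, so the numbers serve only to show the mechanics.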

4. Examples and illustrations

Wu and Carroll (1988) presented an example of the feasibility study of an antiproteolytic replacement therapy trial among individuals with PiZ phenotype. Individuals with PiZ phenotype tend to develop severe alpha-1-antitrypsin deficiency leading to pulmonary emphysema and accelerated decline in lung function. The trial was designed to detect differences in rates of decline in 1-second forced expiratory volume (FEV1) between a control group and a therapeutic group. Linear response curves were assumed for both treatment groups. Based on data retrospectively collected from ten US institutions, Wu and Carroll (1988) found that the probability of a participant being censored by death could depend upon his (or her) underlying FEV1 intercept and slope. They showed that a probit censoring model in the form of (3) provided a good fit to the data. Based on the PiZ data, the censoring parameters were estimated as $\gamma_1 = -4.6$ and $\gamma_2 = -13.8$. Furthermore, it is anticipated that participants in the control group who are not doing well are more likely to drop out of the study to seek advice from their own physicians. Therefore, a stronger dependency of the censoring probability on FEV1 slope in the control group than in the therapeutic group is expected.

In conducting the feasibility study clinical investigators were interested in questions such as the following:

(1) Which method of analysis should the design of the proposed study be based upon?
(2) What are the effects of informative censoring and some deviations from the underlying assumptions of the linear random effects model?
(3) How does one make sample size adjustments to account for informative censoring and some model deviations?

The simulation studies of this section are designed to answer these questions.

Design parameters based on the linear random effects model of (1) were estimated from the PiZ data as follows: the expected FEV1 intercept $B_{11} = 966$ ml, the slope for the control group $B_{12} = -90$ ml/yr, the within-individual FEV1 standard deviation $\sigma_e = 155$ ml, the between-individual FEV1 initial value standard deviation $\sigma_{\beta_1} = 390$ ml and the FEV1 slope standard deviation for the control group $\sigma_{\beta_2} = 91$ ml/yr with $\sigma_{\beta_1\beta_2} = 0$.

Simulated clinical trials were used to compare the two likelihood-based procedures discussed in Section 2 (the weighted least squares estimate (WLE) and the conditional linear model procedure (CL) to adjust for informative censoring) to the distribution-free tests of Section 3 (the unweighted sum of rank statistic (USR), the weighted sum of rank statistic (WSR) for the alternative of linear location shift, and the rank statistic of individual slopes (RS)).

4.1. Linear random effects model

By analogy to the Intermittent Positive Pressure Breathing Trial (1983), the proposed study duration was assumed to be 3 yr with 4 FEV1 measurements each year in addition to a baseline measurement. The FEV1 measurements were generated for each participant of the study according to equation (1). Estimates obtained from the PiZ data were used as parameter values in generating the control group measurements. The therapeutic group FEV1 values were generated similarly except that under the alternative hypothesis FEV1 slopes were assumed to follow a normal distribution with its mean value shifted from -90 ml/yr to -45 ml/yr. If complete observations are obtained for all participants then equal sample sizes of 100 participants in each group will provide a 0.91 power to detect this difference in FEV1 slopes (based on Schlesselman's (1975) formula) at a 0.05 (one-sided) significance level.

Three simulated clinical trials were generated, corresponding to: (A) noninformative completely at random censoring with $\gamma_1 = \gamma_2 = 0$ in (3), (B) probit informative censoring with $\gamma_1 = -4.6$ and $\gamma_2 = -13.8$ in both groups and (C) probit informative censoring similar to (B) except that $\gamma_2 = -6.0$ in the therapeutic group, indicating a stronger dependency in the control group of the censoring probabilities on the underlying FEV1 slope. Censoring was assumed to occur only in the middle of each year of follow-up after the 2nd follow-up measurement of the year had been made. Thus, all individuals have at least 2 follow-up measurements plus a baseline measurement. The censoring parameter values $\gamma_{0j}$ for each of the 3 years of follow-up were set at $\gamma_{01} = -1.2$, $\gamma_{02} = -0.70$ and $\gamma_{03} = -0.35$ for noninformative censoring, corresponding to approximately 15% censoring in the first year and 10% censoring in each of the remaining 2 yr, and $\gamma_{01} = 1.12$, $\gamma_{02} = 1.60$ and $\gamma_{03} = 2.27$ for probit informative censoring. Sample sizes of 100 were generated for each of the two groups.
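The data-generating step described above can be sketched as follows. This is an illustration under several assumptions that the source does not state: the response parameters are expressed in litres and litres/yr when they enter the probit model (3); a single uniform draw per participant is compared with the cumulative censoring probabilities for years 1 to 3; and Python's `random` and `statistics` modules replace the IMSL routines. The names `simulate_participant` and `LAST_INDEX_IF_CENSORED` are hypothetical.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf                      # standard normal distribution function
TIMES = [q / 4 for q in range(13)]          # baseline plus 4 visits/yr for 3 yr
LAST_INDEX_IF_CENSORED = [2, 6, 10]         # last visit at 0.5, 1.5 or 2.5 yr


def simulate_participant(rng, mean_slope, gamma, gamma0):
    """Generate (times, FEV1 values in litres) for one participant.

    gamma  = (gamma_1, gamma_2) multiplies the individual intercept and slope;
    gamma0 = (gamma_01, gamma_02, gamma_03), assumed increasing, so that the
    censoring probabilities are cumulative over the 3 years.
    """
    b1 = rng.gauss(0.966, 0.390)            # initial value (assumed litres)
    b2 = rng.gauss(mean_slope, 0.091)       # slope (assumed litres/yr)
    u = rng.random()                        # one uniform draw per participant
    last = len(TIMES) - 1                   # complete follow-up by default
    for year, g0 in enumerate(gamma0):
        if u < PHI(gamma[0] * b1 + gamma[1] * b2 + g0):
            last = LAST_INDEX_IF_CENSORED[year]
            break
    times = TIMES[:last + 1]
    values = [b1 + b2 * t + rng.gauss(0.0, 0.155) for t in times]
    return times, values


# Mechanism (B): same probit coefficients in both groups.
rng = random.Random(1994)
control = [simulate_participant(rng, -0.090, (-4.6, -13.8), (1.12, 1.60, 2.27))
           for _ in range(100)]
treated = [simulate_participant(rng, -0.045, (-4.6, -13.8), (1.12, 1.60, 2.27))
           for _ in range(100)]
print(sum(len(t) < 13 for t, _ in control), "control participants censored")
```

Each simulated participant has 3, 7, 11 or 13 measurements, matching the rule that censoring occurs only in mid-year after the second follow-up visit of that year. Individual OLS slopes computed from these series could then be passed to whichever test statistic is under study.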
For simplicity we also assume that there were no missed visits other than censoring caused by death or dropout. Normal random numbers were generated by the IMSL routine GGNPM on an IBM mainframe computer. The IMSL subroutine GGUBS was used to generate probabilities from the uniform distribution, which were used in deriving the probit censoring process. The experiments were repeated 600 times in each of the 3 simulations. For a 1-sided test with significance level of 0.05 the critical value of -1.645 was used for all test procedures. For instance, the decision rule was

$$(\hat B_{12} - \hat B_{22}) / \hat\sigma < -1.645$$

for the likelihood approaches, where $\hat B_{12}$ and $\hat B_{22}$ represent the estimated group slope means derived under each of the 2 parametric methods and $\hat\sigma^2$ is the variance of their difference. The performances of the 5 procedures in hypothesis testing are compared in Table 1 for each of the 3 censoring mechanisms.

Table 1
Comparison of statistical power and significance level under a linear random effects model with noninformative censoring and probit informative censoring with parameter values estimated from the PiZ emphysema data (a)

| Method | Noninformative censoring: Power (b) | Noninformative censoring: Achieved significance | Probit informative, same coefficients: Power | Probit informative, same coefficients: Achieved significance | Probit informative, different coefficients: Power | Probit informative, different coefficients: Achieved significance |
|---|---|---|---|---|---|---|
| WLE | 0.84 | 0.06 | 0.76 | 0.07 | 0.59 | 0.02 |
| CL | 0.84 | 0.06 | 0.84 | 0.08 | 0.77 | 0.05 |
| USR | 0.58 | 0.05 | 0.55 | 0.05 | 0.38 | 0.01 |
| WSR | 0.66 | 0.05 | 0.65 | 0.05 | 0.36 | 0.01 |
| RS | 0.74 | 0.05 | 0.70 | 0.05 | 0.62 | 0.02 |

Same censoring coefficients: $\gamma_1 = -4.6$, $\gamma_2 = -13.8$ in both groups. Different censoring coefficients: $\gamma_1 = -4.6$ in both groups, $\gamma_2 = -13.8$ in the control group and $\gamma_2 = -6.0$ in the treatment group.

(a) FEV1 measurement error standard deviation $\sigma_e = 155$ ml, initial value and slope standard deviations $\sigma_{\beta_1} = 390$ ml and $\sigma_{\beta_2} = 91$ ml/yr, $\sigma_{\beta_1\beta_2} = 0$. The expected values for FEV1 initial value and slope were $B_{11} = B_{21} = 960$ ml, $B_{12} = -90$ ml/yr and $B_{22} = -45$ ml/yr, with equal sample sizes of 100 in each group.
(b) Simulated power for WLE = 0.91 when all participants have complete observations.

The results in Table 1 indicate that when the linear random effects model is true, the RS statistic, obtained by ranking the individual OLSE slopes, is more efficient in terms of statistical power than the other 2 nonparametric statistics, obtained by combining the marginal rank statistics derived at the follow-up visits. The gain in power of the RS statistic over the other nonparametric tests is most pronounced under informative censoring with a stronger dependency in the control group of the censoring probabilities on the FEV1 slope. Column 5 of Table 1 gives the simulated power for this situation. The power of the RS was 62% compared to 38% and 36% for the USR and the WSR statistics, respectively.

Both parametric tests achieved much higher power than the nonparametric tests under the first 2 censoring mechanisms. In the situation of informative censoring with stronger dependency in the control group of the censoring probabilities on FEV1 slope the CL procedure obtained a much higher statistical power than all other procedures. From column 5 of Table 1 the power of the CL was 77% compared to 59%, 62% and 36% for the WLE, the RS and the WSR, respectively.

4.2. Deviations from the linear random effects model

To answer questions 2 and 3, we evaluate the effects of deviations from the linear random effects model (1), such as (a) the expected response curves are quadratic in both groups. That is, in equation (1), $\beta_i = (\beta_{i1}, \beta_{i2}, \beta_{i3})'$ corresponds to the unobservable vector of the initial value, the linear and quadratic coefficients of the response variable, and

$$X_i' = \begin{pmatrix} 1 & \cdots & 1 \\ t_{i1} & \cdots & t_{ij_i} \\ t_{i1}^2 & \cdots & t_{ij_i}^2 \end{pmatrix}.$$

The parameter values of $B_{13} = -0.8$ ml/yr^2 and $B_{23} = -1.7$ ml/yr^2 were used with $\sigma_{\beta_3} = 0.0015$. The same parameter values as those used in the previous simulations for the initial value and linear slope were assumed. The notation of $Q$ and $Q_0$ will be used to represent the quadratic and the linear response models, respectively. We also investigate the effects of deviation (b) that the underlying distribution of the individual linear slope $\beta_{i2}$ is contaminated, i.e. the linear slopes were generated from a normal distribution with a standard deviation of $4\sigma_{\beta_2}$ for 10% of the combined sample. The remaining 90% of the individual linear slopes were generated, as in the previous simulations, from a normal distribution with standard deviation $\sigma_{\beta_2}$. The notation of $C$ and $C_0$ will be used to represent experiments generated from normal distributions with and without contamination, respectively.

Since informative censoring with a stronger dependency in the control group on FEV1 slope is of primary interest, different combinations of deviations (a) and (b) from the linear random effects model with this censoring mechanism (to be referred to as $I_d$) were generated. For instance, the combination $I_dQ_0C$ denotes that the experiment was generated under a model with linear response curves in both groups and an informative censoring process with different censoring coefficients in the two groups. Furthermore, the individual linear slopes were generated from contaminated normal distributions. Each experiment was again repeated 600 times. The simulated power and significance levels for all 5 procedures under the 3 model combinations are presented in the top 6 rows of Table 2.

Table 2
Simulated powers and significance levels for the models (a), statistical procedures and sample sizes as specified

| | Model | n (b) | WLE | CL | RS | USR | WSR |
|---|---|---|---|---|---|---|---|
| Power | $I_dQ_0C$ | 100 | 0.340 | 0.060 | 0.530 | 0.290 | 0.260 |
| | $I_dQC$ | 100 | 0.370 | 0.500 | 0.500 | 0.210 | 0.200 |
| | $I_dQC_0$ | 100 | 0.600 | 0.740 | 0.620 | 0.400 | 0.310 |
| Significance level | $I_dQ_0C$ | 100 | 0.005 | 0.038 | 0.018 | 0.010 | 0.008 |
| | $I_dQC$ | 100 | 0.008 | 0.045 | 0.015 | 0.005 | 0.005 |
| | $I_dQC_0$ | 100 | 0.010 | 0.045 | 0.023 | 0.023 | 0.008 |
| Power | $I_dQ_0C_0$ | 115 | 0.650 | 0.850 | 0.680 | | |
| | $I_dQ_0C$ | 160 | 0.400 | 0.770 | 0.700 | | |

(a) Under quadratic response curves, $B_{13} = -0.8$ ml/yr^2, $B_{23} = -1.7$ ml/yr^2 and $\sigma_{\beta_3} = 1.5$ ml/yr^2. Probit informative censoring coefficients: $\gamma_1 = -4.6$ for both groups, $\gamma_2 = -13.8$ in the control group and $\gamma_2 = -6.0$ in the therapeutic group. Other parameters are the same as in Table 1.
(b) n = sample size in each of the 2 equal-sized groups.

The results in Table 2 (columns 5 and 6) further indicate that the marginal approaches USR and WSR could suffer severe loss of power for all 3 models considered. The CL procedure obtained the highest power under all 3 models. In situations in which the underlying normality assumption was contaminated (rows 1 and 2), however, the nonparametric method, RS, derived from ranking individual slopes was quite competitive when compared with the CL procedure. From column 5 of Table 1 and columns 1-3 of Table 2 it is clear that in the presence of informative censoring with different coefficients in the 2 groups and some model deviations, the powers of all procedures are lower than those obtained under completely random censoring.

To evaluate the effects of sample size modifications two more simulation studies were conducted. Simulated clinical trials with equal sample sizes of 115 and 160 in each group were generated under the models $I_dQ_0C_0$ and $I_dQ_0C$, respectively. The experiment was again repeated 600 times under each model. The simulated powers for the 3 procedures derived under the individual effects approach (WLE, CL and RS) are presented in the last 2 rows of Table 2. For instance, row 7 of Table 2 and column 5 of Table 1 indicate that if the model $I_dQ_0C_0$ were true and if the CL procedure were used, a 15% increase in sample size from 100 to 115 in each group could improve the statistical power from 77% to 85%. Similarly, from rows 1 and 8 of Table 2, if the model $I_dQ_0C$ were true, a 60% increase in sample size (from 100 to 160) is needed to increase the power from 50% to 77%.
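When reading simulated powers such as these, it can help to keep the Monte Carlo error in view. The sketch below applies the standard binomial standard error to a power estimated from 600 replications (the number reported above); this calculation is not part of the paper, and the function name is hypothetical.

```python
import math


def mc_standard_error(power, n_reps=600):
    """Binomial standard error of a power estimated from n_reps replications."""
    return math.sqrt(power * (1.0 - power) / n_reps)


# Simulated CL powers quoted in the text for n = 100 and n = 115 per group.
for p in (0.77, 0.85):
    print(p, round(mc_standard_error(p), 4))
```

With 600 replications the standard error is roughly 0.017 for a power of 0.77 and roughly 0.015 for a power of 0.85, so differences of a few percentage points between procedures or sample sizes are within about two standard errors of simulation noise.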

5. Discussion

The simulation results of Section 4 indicate that the distribution-free procedures derived under the marginal approach could suffer severe loss of power in some situations with informative censoring. These marginal approaches were derived under the assumption of censoring completely at random. Therefore, they should not be used in designing clinical trials in which informative censoring is likely to occur. In the presence of informative censoring and model misspecifications, procedures derived under the longitudinal approach could also suffer some loss of power, although not as severe as the marginal approach. Therefore, in designing clinical trials with repeated measures, it is important to conduct simulation studies based on realistic parameter estimates derived from historical and concurrent data to evaluate the effects of informative censoring and model deviations on statistical power and sample size.

In situations in which quadratic response curves or higher-order polynomials are required to describe the response variable, the CL procedure derived from the appropriate response curves as described by Wu and Bailey (1989) should be used. Wu and Lan (1992) proposed comparing areas under the expected response curves or the expected responses at the end of the study for these situations.

It should be noted that in equation (4) the dependency of the individual FEV1 slopes on censoring time was assumed to be the same for both groups. This model was used in obtaining the CL statistics for all three censoring situations of Section 4. The performance of the CL procedure may be improved by assuming different censoring coefficients in (4) for the case of informative censoring with a stronger dependency in the control group on FEV1 slope. On the other hand, the estimation of an additional parameter could also increase the variance of the test statistic. Furthermore, the relatively large significance levels obtained for the CL procedure in some situations (Table 1, column 4) indicate that the more robust bootstrap (Efron, 1979) variance estimation method should be used in conjunction with the CL procedure.
The weighted sums of U statistics, WSU and WSR, are optimum combinations of the marginal U statistics for the alternative hypothesis of linear location shift of equation (9). This type of alternative assumes the same shape for the distribution functions $F_{kj}(\cdot)$ of all the serial measurements for the response variable, with only a linear shift in the means between the 2 treatment groups. Under the linear random effects model (1), the variance of the response for the ith participant, $Y_{ij}$, is $\sigma_{\beta_1}^2 + t_j^2\sigma_{\beta_2}^2 + \sigma_e^2$ for $j = 2, \ldots, J$. Therefore, the variances of $F_{kj}(\cdot)$ are not constant for all $j$. Nevertheless, since the varying part of the variances is dependent on $\sigma_{\beta_2}^2$ and since $\sigma_{\beta_2}^2$ is much smaller than $\sigma_{\beta_1}^2$ and $\sigma_e^2$ in the examples presented in Section 4, the test statistics WSU and WSR are near optimum in these situations under completely random censoring.

It should also be noted that the distribution-free tests of Section 3 are not designed to work under informative censoring. Therefore, it is natural for some problems to occur if there is informative censoring. Columns 5 and 6 of Tables 1 and 2 indicate that in these situations, both the significance level and the power achieved by these tests were much smaller than those of the other procedures. Perhaps some adjustments of these test statistics could be made to account for informative censoring. Further research is needed in this area.

References
Anthonisen, N.R. (1989). Lung health study. American Review of Respiratory Diseases 140, 871-872.
Efron, B. (1979). Bootstrap methods. Another look at the jackknife. Ann. Statist. 7, 1-26.
Heckman, J.J. (1979). Sample selection bias as a specification error. Econometrica 47, 153-161.
Intermittent Positive Pressure Breathing Trial Group (1983). Intermittent positive pressure breathing therapy of chronic obstructive pulmonary disease. Ann. Int. Med. 99, 615-620.
Koziol, J.A., D.A. Maxwell, M. Fukushima, M.E.M. Colmerauer and Y.H. Pilch (1981). A distribution-free test for tumor-growth curve analysis with application to an animal tumor immunotherapy experiment. Biometrics 37, 383-390.
Laird, N.M. (1988). Missing data in longitudinal studies. Statistics in Medicine 7, 305-316.
Laird, N.M. and J.H. Ware (1982). Random effect models for longitudinal data. Biometrics 38, 963-974.
Lee, J.W. and D.L. DeMets (1992). Sequential rank tests with repeated measurements in clinical trials. J. Amer. Statist. Assoc. 87, 136-142.
Little, R.J.A. (1982). Models for nonresponse in sample surveys. J. Amer. Statist. Assoc. 77, 237-250.
Maddala, G.S. (1983). Limited-dependent and qualitative variables in econometrics. Econometric Society Monograph 3, Cambridge, Cambridge University Press.
O'Brien, P.C. (1984). Procedures for comparing samples with multiple endpoints. Biometrics 40, 1079-1087.
Rubin, D.B. (1976). Inference and missing data. Biometrika 63, 581-592.
Schlesselman, J.J. (1975). Planning a longitudinal study II. Frequency of measurements and study duration. J. Chronic Dis. 26, 561-570.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
The Multiple Risk Factor Intervention Trial (MRFIT) Research Group (1982). Multiple risk factor intervention trial. Risk factor changes and mortality results. J. Amer. Med. Assoc. 248, 1465-1477.
Vonesh, E.F. and R.L. Carter (1987). Efficient inference for random coefficient growth curve models with unbalanced data. Biometrics 43, 617-628.
Wei, L.J. and W.E. Johnson (1985). Combining dependent tests with incomplete repeated measurements. Biometrika 72, 359-364.
Wu, M.C. (1988). Sample size for comparison of changes in the presence of right censoring caused by death, withdrawal and staggered entry. Cont. Clin. Trials 9, 32-46.
Wu, M.C. and K.R. Bailey (1989). Estimation and comparison of changes in the presence of informative right censoring: Conditional linear model. Biometrics 45, 939-955.
Wu, M.C. and R.J. Carroll (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44, 175-188.
Wu, M.C., S. Hunsberger and D. Zucker (1991). Comparison of changes in the presence of censoring: parametric and nonparametric methods. Proc. of the American Statistical Association Biopharmaceutical Section, 291-299.
Wu, M.C. and K.K.G. Lan (1992). Sequential monitoring for comparison of changes in a response variable in clinical studies. Biometrics 48, 765-779.