Statistical Design of the Child and Adolescent Trial for Cardiovascular Health (CATCH): Implications of Cluster Randomization

David M. Zucker, PhD, Edward Lakatos, PhD, Larry S. Webber, PhD, David M. Murray, PhD, Sonja M. McKinlay, PhD, Henry A. Feldman, PhD, Steve H. Kelder, PhD, MPH, Philip R. Nader, MD, for the CATCH Study Group

Biostatistics Research Branch, National Heart, Lung, and Blood Institute, Bethesda, Maryland, and Department of Statistics, Hebrew University, Jerusalem, Israel (D.M.Z.); Searle Pharmaceuticals, Skokie, Illinois (E.L.); Department of Biostatistics and Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, Louisiana (L.S.W.); Division of Epidemiology, School of Public Health, University of Minnesota, Minneapolis, Minnesota (D.M.M.); New England Research Institute, Watertown, Massachusetts (S.M.M., H.A.F.); Center for Health Promotion, Research, and Development, University of Texas Health Science Center, Houston, Texas (S.H.K.); Department of Pediatrics, University of California at San Diego School of Medicine, San Diego, California (P.R.N.)
ABSTRACT: This paper describes some statistical considerations for the Child and Adolescent Trial for Cardiovascular Health (CATCH), a large-scale community health trial sponsored by the National Heart, Lung, and Blood Institute. The trial involves randomization of entire schools, rather than individual students, to the experimental arms. The paper discusses the implications of this form of randomization for the design and analysis of the trial. The power calculations and analysis plan for the trial are presented in detail. The handling of outmigrating and inmigrating students is also discussed.
KEY WORDS: Cluster randomization, group randomization, community studies, multilevel modeling, intention-to-treat principle
This work received support from the National Heart, Lung, and Blood Institute (U01 HL 39927, U01 HL 39852, U01 HL 39870, U01 HL 33906, U01 HL 39880). Address reprint requests to: Sonja McKinlay, PhD, New England Research Institute, 9 Galen Street, Watertown MA 02172. Received September 22, 1993; revised June 16, 1994.
0197-2456/95/$9.50 SSDI 0197-2456(94)00026-Y
Controlled Clinical Trials 16:96-118 (1995) © Elsevier Science Inc. 1995, 655 Avenue of the Americas, New York, New York 10010
CATCH Statistical Design

1. INTRODUCTION

The Child and Adolescent Trial for Cardiovascular Health (CATCH) is a randomized trial investigating whether a behaviorally oriented cardiovascular health education program can produce positive changes in the cardiovascular health habits and risk factor profile of elementary schoolchildren. This paper discusses special statistical issues presented by the design of the CATCH trial. The CATCH study rationale and design are presented in detail elsewhere [1]; here a brief overview is given.

Behavioral risk factors such as high fat consumption, high sodium consumption, sedentariness, and smoking are well recognized to play a major role in the development of cardiovascular disease (CVD). Because habits in the areas of diet, exercise, and smoking develop and become entrenched during the childhood and adolescent years [1,2], an obviously appealing strategy for primary CVD prevention is to provide a school-based educational program designed to promote healthy habits in these areas during these formative years.

Recently, prevailing notions of how to teach youth about health-enhancing lifestyles have evolved, shifting from programs designed merely to instill knowledge of health facts to programs designed to instill the social and behavioral skills necessary to translate that knowledge into action. In addition, there has been a growing recognition that classroom programs are more effective when the home and other environmental settings are also addressed [3,4]. During the past decade in particular, significant advances have been made in the development of behaviorally oriented classroom curricula and innovative family-oriented and environmental interventions aimed at promoting a lifestyle conducive to cardiovascular health. These advances are extensively reviewed in Ref. 5. Work in the area to date, however, has been largely limited to relatively small, single-site studies that investigate short-term interventions targeted at only one of the risk areas of interest.
CATCH is designed as a multisite investigation of a multiyear program that combines curriculum, environmental, and family components and that addresses diet and exercise (and, to a limited extent, smoking) simultaneously. Specifically, the CATCH trial involves 96 elementary schools, 24 at each of four sites, that have been randomized on a site-stratified basis across three experimental arms:

1. A control arm (C), including 40 schools, in which the usual curriculum, food service, and physical education program are provided in the school;
2. A school-based intervention arm (S), including 28 schools, in which a behavioral cardiovascular health education program is added to the curriculum, the cafeteria food service is modified to provide lunches with less fat and salt, the physical education program is modified to promote more vigorous physical activity, and a nonsmoking policy is introduced into the school; and
3. A combined school-based and family-based intervention arm (S+F), including 28 schools, in which the intervention comprises all of the elements of intervention S plus a program of family involvement in cardiovascular health education activities that complement and are conducted concurrently with the classroom curriculum.

The number of schools was determined on the basis of formal statistical power calculations, as discussed in Section 4. The number of sites was determined on the basis of administrative considerations. The sites are located in California, Louisiana, Minnesota, and Texas. The schools recruited from these sites represent a diversity of socioeconomic, ethnic, and cultural groups.

The CATCH intervention began in January 1992, when the students were in the middle of third grade, and continued through the spring of fifth grade in 1994. A baseline measurement phase was conducted in the fall of 1991. Over the course of the trial, a wide range of knowledge, psychosocial, behavioral, and process measures were taken on the students in the participating schools or on the schools themselves. In addition, the protocol called for a battery of physiological risk factor measurements, including serum cholesterol, blood pressure, height, weight, and skinfold measures, to be taken at baseline and at the end of the intervention period.

The primary study comparison is intervention (S+F and S arms combined) vs. control with respect to change in serum cholesterol from baseline to the end of the intervention period. It is hypothesized that at the end of fifth grade the intervention will lead to a mean serum cholesterol about 5 mg/dl lower than the control group level. The effect of the intervention as compared to control also will be evaluated in terms of various secondary outcome measures, including dietary measures, physical exercise measures, and physiological measures such as blood pressure and skinfold thicknesses. In addition, the combination of the school-based and family-based programs will be compared with the school-based program alone with regard to a range of outcome measures, principally dietary fat and sodium intake as measured by 24-hr recall. Specifically, it is hypothesized that the combined program, in comparison with the school-based program alone, will produce roughly a 10% reduction in dietary fat intake (30% of calories vs. 33% of calories) and roughly a 15% reduction in dietary sodium intake (3.0 g/day vs. 3.5 g/day).
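The site-stratified allocation described above can be illustrated with a short Python sketch: within each site, the 24 schools are shuffled into 10 control, 7 S, and 7 S+F slots, which reproduces the overall 40:28:28 split. The function name `randomize_by_site` and the school identifiers are hypothetical, and the sketch shows stratified randomization in general rather than the procedure actually used to randomize the CATCH schools.

```python
import random
from collections import Counter

# Per-site quota implied by 40 C, 28 S, and 28 S+F schools spread evenly over four sites.
PER_SITE = {"C": 10, "S": 7, "S+F": 7}


def randomize_by_site(schools_by_site, per_site=PER_SITE, seed=None):
    """Assign schools to arms by a separate random permutation within each site.

    schools_by_site maps a site name to a list of (hypothetical) school identifiers.
    Returns a dict mapping each school identifier to its arm label.
    """
    rng = random.Random(seed)
    assignment = {}
    expected = sum(per_site.values())
    for site, schools in schools_by_site.items():
        if len(schools) != expected:
            raise ValueError(f"site {site!r} has {len(schools)} schools, expected {expected}")
        # Lay out this site's arm labels, then shuffle them so that every
        # arrangement of labels over the site's schools is equally likely.
        labels = [arm for arm, n in per_site.items() for _ in range(n)]
        rng.shuffle(labels)
        assignment.update(zip(schools, labels))
    return assignment


sites = ["California", "Louisiana", "Minnesota", "Texas"]
schools_by_site = {s: [f"{s[:2].upper()}-{k:02d}" for k in range(1, 25)] for s in sites}
assignment = randomize_by_site(schools_by_site, seed=1)
print(Counter(assignment.values()))  # 40 C, 28 S, 28 S+F in total, whatever the seed
```

Because every site contributes the same arm mix, site effects cannot be confounded with intervention, and site can then enter the analysis as a fixed factor.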
The special design feature of CATCH that sets it apart from the typical experimental trial is the nature of the randomization. In a typical trial, each participant is randomized individually to one of the experimental arms. In CATCH, however, although the student is the main target of intervention and measurement, the unit of randomization is the school rather than the individual student. Randomization by school is necessary in CATCH because the CATCH interventions are by nature designed to be implemented on a schoolwide basis, en bloc to all students in a given school. The experimental design in CATCH thus leads to a trial that may be described as a “community intervention trial,” in that the interventions are assigned and applied to intact social units (i.e., schools) rather than to individual participants. Randomization of this kind has been referred to in the literature as “cluster randomization” (or “group randomization”): the unit of randomization is not the individual participant but rather a cluster of participants. The design and analysis problems associated with cluster randomization were raised in the epidemiological literature by Jerome Cornfield in a 1978 note [6] and have received increasing attention since then. Recent discussions of the relevant issues include Dwyer et al.’s [7] discussion of drug use prevention studies, Mickey et al.’s [8] discussion of breast cancer mortality studies, Donner’s [9] review of nontherapeutic intervention trials, Koepsell et al.’s [10] discussion of analysis issues in community health trials, review papers by Murray and coworkers [11,12] on school-based health promotion studies, and Feldman and McKinlay’s [13] recent attempt to formulate a unifying model for the design and analysis of cluster randomization trials.

This paper focuses on the implications that this special type of randomization has had for the development of the statistical analysis plan and the statistical power calculations for the CATCH trial. Section 2 discusses general statistical considerations, with special emphasis on the choice of the unit of analysis and the handling of outmigrating and inmigrating students. Section 3 describes in technical terms a strategy for the statistical analysis of the CATCH trial. Section 4 describes the statistical power calculations for CATCH in detail. Finally, Section 5 presents a brief discussion.
2. GENERAL DESIGN CONSIDERATIONS
2.1 Unit of Randomization and Analysis
Randomization of experimental units is designed to provide two key benefits: (1) experimental groups that are appropriately balanced with respect to both known and unknown factors that may affect response, and (2) a basis for analyzing the study results without resorting to statistical modeling assumptions. The analytical benefit arises because the randomization itself provides the statistical structure whereby the study results may be judged [14,15]: in principle through the use of a randomization test, and in common practice through the use of a normal theory test that approximates the randomization test. It is a basic tenet of the statistical theory of experiments that, to preserve these benefits, the statistical analysis must be directed by the form of randomization. The admonition to “analyze as you randomize” is perhaps most familiar in the clinical trials field, but the tenet applies with equal force to all forms of experimental endeavor.

The imperative to “analyze as you randomize” is particularly important in the context of a cluster randomization design such as that employed in the CATCH trial. With cluster randomization, the mean response under each experimental condition is subject to two sources of variation: variation from cluster to cluster and variation across individuals within a cluster. Donner et al. [16] described the increase in the variance of the condition means that results from between-cluster variation in terms of a variance inflation factor (IF), which is expressed as a function of the intracluster correlation (ICC) (see Section 4.1 below for relevant formulas). An analysis in which the unit of analysis is the individual rather than the cluster fails to account properly for the between-cluster variation and is therefore liable to produce misleading results.
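For intuition about the variance inflation factor, the following Python sketch uses the familiar form IF = 1 + (m - 1)ρ for clusters of equal size m with intracluster correlation ρ; the formulas actually used for CATCH are those of Section 4.1, which may differ in detail (for example, in handling unequal school sizes). The second function approximates the true type I error of an individual-level z-test that ignores clustering, whose standard error is too small by a factor of √IF. The cluster size of 50 and the ICC values are hypothetical.

```python
from statistics import NormalDist


def design_effect(m, icc):
    """Variance inflation factor 1 + (m - 1) * icc for clusters of equal size m."""
    return 1.0 + (m - 1) * icc


def naive_type_i_error(m, icc, alpha=0.05):
    """Approximate actual two-sided type I error of an individual-level z-test
    that ignores clustering (large-sample normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Under the null, the naive statistic is roughly N(0, IF) rather than N(0, 1),
    # so it exceeds the critical value z more often than alpha.
    return 2 * (1 - NormalDist().cdf(z / design_effect(m, icc) ** 0.5))


for icc in (0.0, 0.01, 0.02, 0.05):
    print(icc, round(design_effect(50, icc), 2), round(naive_type_i_error(50, icc), 3))
```

Under these assumptions, an ICC of only 0.02 with 50 students per school nearly doubles the variance and raises the actual level of a nominal 5% test to about 16%.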
In effect, as indicated by Zucker [17] (see also Ref. 18), the intervention effects become confounded with the natural cluster-to-cluster variability, and serious inflation of the type I error level may result. To avoid this predicament, the unit of analysis must be the cluster.

It should be noted that in a trial where the unit of randomization is a cluster of individuals such as a school, the trial must include an adequate number of clusters to provide statistically rigorous results. For example, a randomized trial with only two units per arm cannot provide statistically rigorous results, because it is impossible for a randomization-based analysis of such a trial to yield a statistically significant result: four units can be split two and two in only six ways, so the smallest attainable one-sided p-value is 1/6. Attempts to apply a normal theory analysis to such data are ill-founded: for a trial of this small size, the use of normal theory methods rests merely on bald assumptions and cannot be supported by the usual central limit theorem argument that justifies a normal theory analysis as an approximation to a randomization-based analysis. A trial of such small size often will be a very useful means of assessing the feasibility of implementing the intervention and obtaining preliminary indications of the intervention effect, but it cannot provide a definitive basis for evaluating the benefit of the intervention. By contrast, the CATCH trial (96 schools) and the Community Intervention Trial for Smoking Cessation (COMMIT) [19,20] (11 matched pairs of communities) include an adequate number of units to permit a meaningful statistical analysis. Note that in COMMIT, in view of the relatively small number of units, the planned statistical procedure for evaluating intervention effect is a randomization-based permutation test rather than a normal theory test.
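The small-trial limitation can be checked directly by enumerating a randomization test. The Python sketch below computes an exact one-sided permutation p-value for a difference in cluster means under complete randomization of clusters to two arms; the cluster means in the usage lines are made-up numbers chosen to be as extreme as possible. It is a generic illustration, not the COMMIT analysis, which randomized within matched pairs (the second function only gives the smallest p-value such a design can reach).

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, control):
    """Exact one-sided randomization p-value for mean(treated) - mean(control).

    Every way of splitting the pooled cluster means into groups of the original
    sizes is an equally likely allocation under the null hypothesis.
    """
    pooled = list(treated) + list(control)
    observed = mean(treated) - mean(control)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(treated)):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mean(t) - mean(c) >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
    return extreme / total


def min_matched_pairs_p_value(pairs):
    """Smallest one-sided p-value attainable when treatment is randomized within each pair."""
    return 1 / 2 ** pairs


# Two clusters per arm, with the most favorable data imaginable:
print(permutation_p_value([10, 11], [1, 2]))  # 1/6, about 0.167
# Eleven matched pairs, as in COMMIT:
print(min_matched_pairs_p_value(11))  # 1/2048, about 0.0005
```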
2.2 Sampling Methodology: Handling of Participant Migration
A further statistical consideration that arises in the analysis of long-term school-based or community-based trials is the question of how to identify individuals from the school or community for measurement and statistical analysis. This question has two aspects: (1) determining how to sample individuals for measurement and (2) defining the set of individuals who are considered members of the primary study population for the purpose of statistical analysis.

There are two main approaches to sampling individuals for measurement. In the cohort approach, a sample of the individuals initially entered into the study (possibly all such individuals) is selected to be measured throughout the duration of the study. In the cross-sectional approach, separate samples of individuals are taken at each measurement time point. Hybrid schemes combining these two sampling approaches also can be considered. The main advantage of the cohort approach is that within-individual correlation can be exploited to enhance precision. A major advantage of the cross-sectional approach is that the measurement load is more evenly distributed across individuals. This feature can be important from a logistical standpoint when the design calls for multiple repeated measurements, and it is particularly beneficial where there is serious concern that the act of measurement itself can influence participants’ subsequent behavior. When every individual in a predefined study population is to be measured, the cohort and cross-sectional approaches obviously coalesce. For the primary outcome measure of cholesterol change and for most other outcome measures, the CATCH design calls for measurement of all students initially entered.
For certain special studies, including HDL cholesterol, 24-hr diet recall, and overnight urine analysis, a random cohort approach is taken, with random samples of the set of initially entered students selected for both preintervention and postintervention measurement.
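The precision advantage of the cohort approach can be quantified for a simple two-time-point comparison. In this Python sketch, assuming a common standard deviation σ at both times and a within-person correlation ρ, the variance of an arm's estimated mean change is 2σ²(1 - ρ)/n for a cohort of n students measured twice, versus 2σ²/n for independent cross-sectional samples of n students at each time. Clustering by school is deliberately ignored, and the numbers in the usage lines are hypothetical.

```python
def var_mean_change(sigma, n, rho, design):
    """Variance of one arm's estimated mean change between two time points.

    cohort: the same n individuals measured twice, within-person correlation rho.
    cross-sectional: independent samples of n individuals at each time point.
    Both assume a common SD sigma at the two times and ignore school clustering.
    """
    if design == "cohort":
        return 2 * sigma**2 * (1 - rho) / n
    if design == "cross-sectional":
        return 2 * sigma**2 / n
    raise ValueError(f"unknown design: {design!r}")


# Hypothetical values: SD 30 mg/dl, 100 students, tracking correlation 0.7.
cohort = var_mean_change(30, 100, 0.7, "cohort")
cross = var_mean_change(30, 100, 0.7, "cross-sectional")
print(cohort, cross, cohort / cross)  # the ratio equals 1 - rho
```

The stronger the tracking of a measure within individuals (large ρ), the more the cohort design gains.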
The main issue in defining the primary study population for the purpose of analysis is how to handle outmigrating and inmigrating individuals. The issue is closely related to the question that arises in clinical trials of how to handle patients who switch from the assigned intervention to another therapy during the course of the trial, i.e., dropouts and drop-ins. In the clinical trials field, there is a generally well-accepted “intention-to-treat” principle, which states that patients should be handled in the analysis according to how they were originally assigned, regardless of the therapy subsequently received [21,22]. In school-based or community-based intervention research, the following are three possible options for defining the primary study population:

1. Include individuals who were in the school or community and measured at the beginning of the trial, regardless of what happened afterward, with suitable tracking of outmigrating individuals;
2. Include individuals who were in the school or community both at the beginning of the trial and at the measurement point in question; and
3. Include individuals who happen to be present in the school or community at the specific measurement time in question.

The first approach is analogous to the intention-to-treat approach in clinical trials and involves a fixed, predefined study population. In the second approach, an initial study population is identified, but only those individuals in the initial population who remain in the school or community up through the measurement point in question are considered in the analysis. The third approach may be described as a “dynamic population” approach. A common variation of the third approach imposes the additional restriction that an individual must have received some minimum degree of exposure to the intervention to be included in the analysis.
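The three population definitions above can be made concrete with a small Python sketch over hypothetical student records. The `Student` record, its field names, and the function names are illustrative assumptions; the sketch also simplifies by recording only the baseline school (where the student was present and measured) and the follow-up school, ignoring moves and returns in between and the CATCH requirement of a successful baseline blood draw.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Student:
    """Hypothetical record: school attended (and measured in) at baseline,
    and school attended at the follow-up measurement, if any."""
    student_id: str
    school_at_baseline: Optional[str]
    school_at_followup: Optional[str]


def option1_intention_to_treat(students: List[Student], school: str) -> List[Student]:
    """Everyone in the school at baseline, regardless of later moves (outmigrants are tracked)."""
    return [s for s in students if s.school_at_baseline == school]


def option2_stayers(students: List[Student], school: str) -> List[Student]:
    """Only those in the school both at baseline and at the follow-up point."""
    return [s for s in students
            if s.school_at_baseline == school and s.school_at_followup == school]


def option3_dynamic(students: List[Student], school: str) -> List[Student]:
    """Whoever is in the school at the follow-up point (the dynamic population)."""
    return [s for s in students if s.school_at_followup == school]


roster = [
    Student("A", "X", "X"),   # stayer
    Student("B", "X", "Y"),   # outmigrant from school X to school Y
    Student("C", None, "X"),  # inmigrant to school X
]
for rule in (option1_intention_to_treat, option2_stayers, option3_dynamic):
    print(rule.__name__, [s.student_id for s in rule(roster, "X")])
```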
The first two approaches can be implemented via either cohort or cross-sectional sampling (under the second approach with cohort sampling, outmigrators from the cohort purposely are not taken into account in the analysis). By contrast, the third approach clearly makes sense only within the context of cross-sectional sampling. When the second approach is taken and only a subset of the individuals in the school or community is to be measured, a cross-sectional sampling scheme (or a hybrid scheme) seems natural, because a cohort sample will be subject to attrition from outmigration.

None of the foregoing three approaches provides an entirely satisfactory solution to the migration problem: each approach has its own strengths and limitations. The second and third approaches have obvious intuitive appeal. The appeal of the second approach is its focus on only those individuals who actually received the intended intervention for the entire follow-up period up to the measurement point in question. The appeal of the third approach is its focus on what is actually happening in the school or community at each measurement time point. The first approach, i.e., the intention-to-treat approach, has the advantage of defining the study population in a manner that is independent of postrandomization developments, thereby avoiding bias that could arise from subtle influences that the presence or absence of intervention in a particular school or community may have on the migration process.

The bias threat posed by migration is analogous to the bias threat posed by missing data in sample surveys [23, chapter 10]. Where parents can choose which school in a given location their child will attend, the presence or absence of a certain intervention program in the school could significantly influence the choice. In addition, families contemplating a move often are influenced by the character of the school that the children will attend, and in certain cases the desirability or undesirability of a given intervention program may be a factor in the decision. Even a small bias of this kind can be important when the observed intervention effect is modest. The intention-to-treat approach avoids this bias threat entirely and therefore avoids the attendant increased risk of making a false claim of effectiveness (i.e., a type I error). On the other hand, when there is substantial migration that is completely unrelated to intervention, the estimate of intervention effect produced by the intention-to-treat approach will be a substantially poorer reflection of the true treatment effect than an estimate produced by one of the other approaches. In this sense the intention-to-treat approach also may be said to be subject to a type of bias. Often the intention-to-treat approach will lead to a dilution of the intervention effect and a corresponding loss of statistical power, or a need to increase the sample size to maintain power. Investigators must weigh the protection against type I error that the intention-to-treat approach provides against the risk of type II error, or the need for increased sample size, that the approach may create. In certain circumstances, the investigators may consider the likelihood of serious noncompliance bias to be small and therefore may find one of the other analytical approaches more plausible.
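The dilution described above can be put in rough numbers. A common back-of-the-envelope assumption, borrowed from dropout adjustments in clinical trials, is that students who leave an intervention school retain only a fraction of the intervention benefit. The intention-to-treat effect then shrinks proportionally, and since the required number of units scales with the inverse square of the effect (variance held fixed), the sample size must grow by the inverse square of the dilution factor. The Python sketch below encodes that assumption; the 20% outmigration rate is hypothetical, and 5 mg/dl is the hypothesized CATCH cholesterol difference.

```python
def itt_effect(delta, outmigration_fraction, residual_effect_fraction=0.0):
    """Expected intention-to-treat effect when a fraction of the cohort leaves the
    intervention school and (by assumption) keeps only residual_effect_fraction of delta."""
    f = outmigration_fraction
    return delta * ((1 - f) + f * residual_effect_fraction)


def sample_size_inflation(outmigration_fraction, residual_effect_fraction=0.0):
    """Factor by which the number of units must grow to keep power fixed, assuming the
    outcome variance is unchanged, since required sample size scales as 1 / effect**2."""
    dilution = itt_effect(1.0, outmigration_fraction, residual_effect_fraction)
    return 1 / dilution ** 2


print(itt_effect(5.0, 0.20))        # diluted effect in mg/dl
print(sample_size_inflation(0.20))  # about 1.56 times as many schools
```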
Some measure of assurance against the possibility of noncompliance bias may be obtained by comparing relevant characteristics across intervention arms among the individuals in whom postintervention measurements were obtained. This checking process is not foolproof, however, as pertinent discussions indicate [21,22]. The effect of subtle influences is extremely difficult to assess, particularly where important mediating variables are unrecognized or unmeasured, and can be significantly greater than might be anticipated in advance. Thus, such checking may lessen concern about noncompliance bias to some extent but cannot eliminate it entirely.

In many leading National Institutes of Health (NIH) studies, and in a broad segment of biomedical trials in other research sectors, primary emphasis is placed on having a high degree of protection against a false claim of effectiveness. Accordingly, an intention-to-treat type of approach has become the approach generally emphasized in such studies. Given the limitations of this approach, however, a reasonable strategy often will be to report analyses under a variety of approaches, recognizing that the true answer probably lies somewhere between those produced by the different approaches. When the approaches converge to a common result, the investigators can be more confident in the statistical validity of the study outcome. When the approaches yield conflicting results, i.e., some approaches indicate effectiveness whereas others are equivocal or unfavorable to the intervention, the interpretation of the study results should be suitably cautious. In particular, the study conclusion section should acknowledge appropriately that, from the standpoint of a fully exacting standard of statistical validity, effectiveness has not been clearly demonstrated.

The CATCH protocol specifies that the primary analysis of the study results will follow the intention-to-treat approach. This plan implies that students who outmigrate will not automatically be considered lost to follow-up. Rather, outmigrating students will be tracked by means of procedures that are set forth in detail in the protocol. The current plan is to track all outmigrating students within a defined radius of the study site and a random sample of the more remote outmigrators (see Ref. 23 for a discussion of sampling initial nonresponders). The alternative analytical approaches also will be fully explored in additional analyses. Persistent tracking efforts should limit missing data due to outmigration, but even with the best efforts, some outmigrating students will truly be lost to follow-up. In addition, further missing data will arise from other problems, such as refusal to submit to the postintervention blood draw. Section 3 discusses the analytic strategy for handling missing data. It is important to note in this context that the primary CATCH study cohort is defined not as the set of all students in the participating schools at baseline, but rather as the set of those students in whom the baseline blood draw was successfully accomplished.
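One way to combine fully tracked students with a random sample of remote outmigrators, in the spirit of subsampling initial nonrespondents, is to weight each traced remote student by the inverse of the remote sampling fraction. The Python sketch below, including the function name `two_phase_mean` and the cholesterol values, is a hypothetical illustration rather than the estimation procedure specified in the CATCH protocol, and it assumes that every traced remote student is actually measured.

```python
def two_phase_mean(fully_followed, remote_sampled, n_remote_total):
    """Estimate a school cohort's mean outcome when every stayer and nearby outmigrant
    is measured, but only a random sample of the n_remote_total remote outmigrants is.

    Each traced remote student stands in for n_remote_total / len(remote_sampled)
    remote students (the inverse of the remote sampling fraction).
    """
    if n_remote_total == 0:
        return sum(fully_followed) / len(fully_followed)
    if not remote_sampled:
        raise ValueError("at least one remote outmigrant must be traced")
    if n_remote_total < len(remote_sampled):
        raise ValueError("sample cannot exceed the number of remote outmigrants")
    weight = n_remote_total / len(remote_sampled)
    total = sum(fully_followed) + weight * sum(remote_sampled)
    return total / (len(fully_followed) + n_remote_total)


# Hypothetical follow-up cholesterol values (mg/dl): 3 fully followed students,
# 2 of 4 remote outmigrants traced (sampling fraction 1/2).
print(two_phase_mean([180, 170, 160], [150, 190], n_remote_total=4))  # 170.0
```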
2.3 Rationale for Assignment Ratios
The CATCH design calls for 28 schools on S+F, 28 schools on S, and 40 schools on control (28:28:40 assignment). As stated previously, the primary comparison in CATCH is the comparison of the average of the S+F and S arms with the control arm. From this standpoint, the optimal assignment of schools to arms would be 24 on S+F, 24 on S, and 48 on control. Under this scheme, the 96 schools would be equally divided between intervention (S+F and S combined) and control, optimizing the power for intervention vs. control. Nevertheless, there is substantial interest in the comparison of the S+F arm with the S arm. Therefore, the trial designers decided to consider alternative assignment schemes that would place as many schools as possible in the S+F and S arms, to maximize the power for the S+F vs. S comparison, without unduly compromising the power for the primary study comparison. To maintain the balance of the design over study sites, any alternative scheme needed the sample size (i.e., number of schools) in each arm to be divisible by 4. Relevant alternative assignment schemes thus included 28:28:40 assignment and 32:32:32 assignment. The projected power for the primary comparison under the 28:28:40 assignment scheme was 92% assuming a standard deviation of 28 mg/dl and 84% assuming a standard deviation of 30 mg/dl (see Section 4). By similar calculations, the projected power under the 32:32:32 assignment scheme was 89% assuming a standard deviation of 28 mg/dl and 81% assuming a standard deviation of 30 mg/dl. For the final design, the investigators decided to proceed with the 28:28:40 assignment scheme.
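The power figures quoted for the two allocation schemes can be cross-checked with a short Python sketch. Because the hypothesized effect and the school-level variance are the same under both schemes, only the standard error of the contrast changes, and that standard error is proportional to √(1/(4n_SF) + 1/(4n_S) + 1/n_C). Assuming a two-sided normal-theory test at α = 0.05 (an assumption here; the actual test specification is given in Section 4), rescaling the 92% and 84% figures for 28:28:40 gives roughly 90% and 81% for 32:32:32, within about a percentage point of the reported 89% and 81%. The function names are hypothetical.

```python
from statistics import NormalDist

N = NormalDist()


def contrast_se_factor(n_sf, n_s, n_c):
    """SE of (mean(S+F) + mean(S)) / 2 - mean(C), in units of the school-level SD,
    assuming school means in all arms share one variance."""
    return (0.25 * (1 / n_sf + 1 / n_s) + 1 / n_c) ** 0.5


def rescale_power(power, from_alloc, to_alloc, alpha=0.05):
    """Convert the power of a two-sided normal-theory test under one allocation into
    the power under another, holding the effect and school-level variance fixed."""
    z_a = N.inv_cdf(1 - alpha / 2)
    # Standardized effect delta / SE implied by the reference power (far tail ignored).
    signal = z_a + N.inv_cdf(power)
    ratio = contrast_se_factor(*from_alloc) / contrast_se_factor(*to_alloc)
    return N.cdf(signal * ratio - z_a)


for ref in (0.92, 0.84):
    print(ref, round(rescale_power(ref, (28, 28, 40), (32, 32, 32)), 3))
```

The 24:24:48 scheme minimizes this standard error factor, which is why it is optimal for the primary comparison, whereas the S+F vs. S contrast has a standard error proportional to √(1/n_SF + 1/n_S) and so favors larger intervention arms.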
2.4 Stratification
As is typical in randomized trials, randomization in CATCH is stratified by site. The investigators also contemplated stratification by the socioeconomic status (SES) of the school but decided against it. There was wide site-to-site variation in SES, so a suitable uniform SES cutpoint for within-site stratification was difficult to define. Site-specific cutpoints could have been used, but the investigators felt that this approach would be somewhat confusing. Similarly, SES matching could have been implemented, but this approach would have put excessive emphasis on SES relative to other important background variables (such as geographic location and ethnicity). Additionally, there was serious doubt as to whether SES stratification actually would provide any further useful reduction in residual variation. In the statistical analysis, SES as a school-level covariate will be included in exploratory analyses, but the current consensus of the investigators is that SES is unlikely to provide any useful reduction in residual variance beyond that provided by the stratification by site, with which SES is highly correlated.
2.5 Some Logistical Considerations
We review here some relevant logistical considerations, although they are not strictly statistical. Recruitment of schools was carried out independently at each site. Contact with school officials generally was made at both the district and the individual school level. To reduce the problem of students moving on to a new school in the middle of the study, the end of the study period was defined as the end of fifth grade rather than the end of sixth grade, and only schools containing all three of grades 3, 4, and 5 were considered. Recruitment also was restricted to schools whose officials were willing to sign a commitment to have the school randomized and to participate in CATCH for the entire study period. Schools serving as “magnet” schools for children with special interests or handicaps were excluded.

Informed consent at the individual student level was handled according to the requirements of the site-specific human subjects committees. For baseline data collection, all sites required signed parental consent for physiological risk factor screening and for substudies involving a physical activity interview, a 24-hr diet recall, or an overnight urine specimen. Parental consent was not required for the health behavior questionnaire or for observation of children during physical education class. For interim height and weight measurements, passive consent was approved at all sites. For the final risk factor screening, consent requirements varied across sites: Texas was permitted to rely on passive consent, whereas the three remaining sites were required to obtain repeat parental consent. Minnesota required consent from the child as well, via a separate consent form written in simplified language.

3. STATISTICAL ANALYSIS STRATEGIES
As discussed in the preceding section, the randomization scheme for the CATCH trial implies that the primary unit of statistical analysis must be the school rather than the student. At the same time, it is desirable to be able to incorporate relevant student-specific variables into the analysis. For example, in the analysis of the effect of intervention on serum cholesterol, the plan is to adjust for each student's baseline cholesterol level. This section outlines analytic strategies that allow student-specific variables to be incorporated while maintaining the school as the primary unit of analysis. In addition, a strategy for handling missing data is described.
3.1 Analytic Strategies
3.1.1. Two-stage analysis scheme. Presented here is a two-stage analysis scheme whose general form is as follows. In the first stage, an individual-level analysis involving relevant individual-level covariates is conducted to derive school-specific means that are adjusted for those covariates. In the second stage, the school-specific means are analyzed, with adjustment as appropriate for school-specific covariates, to evaluate the intervention effect. This approach is similar in spirit, though not identical in form, to the method of Gail et al. [24].

As a preliminary, some notation is needed. Let s, i, j, and k index study site, intervention, school, and student, respectively. Next, let $Y_{k(sij)}$ denote the response outcome for student k in school j(si). Finally, let $X_{k(sij)}$ denote a vector of student-specific covariates for student k in school j(si). The two-stage scheme may now be described. The statistical model in the first stage of the analysis is given by

$$Y_{k(sij)} = m_{j(si)} + b^{T} X_{k(sij)} + \text{error} \qquad (1)$$

for a continuous response or

$$\Pr\left(Y_{k(sij)} = 1\right) = F^{-1}\left(m_{j(si)} + b^{T} X_{k(sij)}\right) \qquad (2)$$
for a binary response, where $m_{j(si)}$ denotes a school-specific intercept parameter. In Eq. (2), $F(\cdot)$ represents a suitable transformation such as the logit transformation $F(p) = \log[p/(1-p)]$. For a continuous response, the model is fitted using ordinary linear regression; for a binary response, the model is fitted using maximum likelihood. In the second stage of the analysis, differences between the estimated $m_{j(si)}$'s from either (1) or (2) are assessed for statistical significance by subjecting the estimated $m_{j(si)}$'s to two-way ANOVA or ANCOVA with fixed effects for intervention group, for study site, and for relevant school-specific characteristics, if any (e.g., the geographic setting of the school: urban vs. suburban vs. rural). In these analyses, the school effects are weighted according to an appropriate estimate of the inverse variance of the estimated $m_{j(si)}$ for the school or, as an approximate weighting, the number of students in the school. In principle, instead of applying ANOVA or ANCOVA, one could subject the estimated $m_{j(si)}$'s to the exact permutation test procedure corresponding to the randomization scheme [15], as is planned in the COMMIT trial. In CATCH, however, the number of schools is large enough to rely on the central limit theorem to justify the use of ANOVA or ANCOVA as a reasonable approximation to the permutation procedure. As part of the ANOVA/ANCOVA analyses, appropriate confidence intervals for the differences between pairs of intervention arms will be constructed. Interactions between intervention and school-level covariates may be explored using the usual linear model approaches at the second stage of analysis. Interactions between intervention and student-specific covariates may be assessed for significance by replacing b with separate $b_{j(si)}$ in the first stage of the analysis and subjecting the estimated $b_{j(si)}$ to two-way ANOVA.

3.1.2. Mixed-model strategy. An alternative analysis strategy is to collapse the two stages into a single stage using a mixed model approach. The mixed model may be defined by (1) or (2) with $m_{j(si)}$ given by

$$m_{j(si)} = m + a_s + c_i + d_{j(si)} \qquad (3)$$
where $m$ is an overall intercept term, $a_s$ is a site effect term, $c_i$ is an intervention effect term, and $d_{j(si)}$ is a mean-zero random effect term. For a continuous response, the standard approach would be to use a normal model for the $d_{j(si)}$ and fit by maximum likelihood or restricted maximum likelihood, as in Ref. 25. For a binary response, there is as yet no firmly established approach; Breslow and Clayton [26] describe various methods assuming normally distributed $d_{j(si)}$ (see also Refs. 27-29), while Follmann and Lambert [30] present a method in which the $d_{j(si)}$ are handled nonparametrically. This approach can also be extended to incorporate school level baseline covariates by including a regression term involving such covariates in (3).

GEE strategy. A different type of single-step strategy would be to apply the method of generalized estimating equations (GEEs), as developed by Liang and Zeger [31, 32] (cf. Ref. 33). Conventional statistical wisdom holds that the mixed model approach is more relevant when interest focuses on the effect of intervention at the unit level (in CATCH, the school level), whereas the GEE approach is more relevant when interest focuses on the intervention effect at a general population level, averaging across units [32]. In CATCH, both levels of analysis could be of interest. The distinction between these two levels of analysis is immaterial for the linear models typically used for a continuous response, but relevant for the nonlinear models typically used for a binary response.

Discussion of analysis strategies. If the response variable is continuous and the number of randomized units is large, as in CATCH, the three analytical approaches discussed above are broadly equivalent. For a binary response, the two-stage approach is preferable from a hypothesis testing standpoint because it provides a valid test for intervention effect without relying on the correctness of the model assumptions (cf. Refs. 20 and 24). The two other approaches have the drawback of relying on the model assumptions. The mixed model strategy relies on the correctness of (2) and (3). The GEE strategy in principle relies on the correctness of the marginal model in the GEE framework, though the GEE-based test for intervention effect can be expected to be robust, with respect to type I error, to many types of model misspecification,
such as omission of relevant covariates (see Ref. 34 and the general discussion of model misspecification in Section 1 of Ref. 35). Alternatively, the above-mentioned drawback could be dealt with in the manner indicated by Ref. 24. In that procedure, generalized residuals are defined based on the score test for intervention effect derived from the model, using null hypothesis estimates of the remaining model parameters. A test for intervention effect is then formed using the permutation variance of these residuals. In effect, the two-stage strategy of Section 3.1.1 is a somewhat simplified version of this procedure applied to the mixed model. In fact, the two-stage strategy may be somewhat more efficient because it avoids the need to assume no intervention effect while estimating the covariate effects. On the other hand, when the number of randomized units is large, the two-stage scheme involves estimating a large number of parameters in the first stage of the analysis. This feature may be problematic in exploratory analyses involving many student-specific variables, especially with a binary response. It may be noted, though, that certain procedures that have been proposed for generalized linear mixed models, such as the PQL method of Ref. 26, involve a computation that appears very similar to estimating unit-specific intercept parameters. For either a continuous or a binary response, when the number of randomized units is small, the analytical strategy of choice is to implement either the two-stage approach of Section 3.1.1 or the method of Ref. 24 (applied either to the mixed model or to the GEE model) in conjunction with an exact permutation test. This strategy provides a test with a guaranteed type I error rate.
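The two-stage scheme, combined with a permutation test on the school-level estimates, can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the CATCH analysis code: the data are simulated; only two arms (intervention vs. control) are compared; the response is continuous, so stage 1 is ordinary least squares with one intercept per school plus a common covariate slope $b$; stage 2 weights schools by their number of students (the approximate weighting mentioned above); and schools are assumed to have been randomized within site, so the permutation test reshuffles arm labels within each site. All function names and simulation settings are hypothetical.

```python
import numpy as np

def stage1_school_means(y, x, school, n_schools):
    """Stage 1: OLS of y on one intercept per school plus a common slope b.

    Returns the covariate-adjusted school intercepts m_j. Treatment labels
    are not used here, so the m_j can be held fixed under re-randomization.
    """
    D = np.zeros((len(y), n_schools))
    D[np.arange(len(y)), school] = 1.0              # school indicator columns
    coef, *_ = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)
    return coef[:n_schools]

def stage2_effect(m, site, treat, weights):
    """Stage 2: weighted least squares of m_j on site effects and intervention."""
    X = np.column_stack([np.eye(site.max() + 1)[site], treat])
    w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * w[:, None], m * w, rcond=None)
    return coef[-1]                                 # intervention coefficient

def permutation_pvalue(m, site, treat, weights, n_perm=999, seed=1):
    """Monte Carlo permutation test: reshuffle arm labels within each site."""
    rng = np.random.default_rng(seed)
    observed = stage2_effect(m, site, treat, weights)
    count = 0
    for _ in range(n_perm):
        perm = treat.copy()
        for s in np.unique(site):
            idx = np.flatnonzero(site == s)
            perm[idx] = rng.permutation(treat[idx])
        if abs(stage2_effect(m, site, perm, weights)) >= abs(observed):
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Simulated trial (hypothetical): 4 sites x 24 schools x 50 students.
rng = np.random.default_rng(0)
n_sites, per_site, n_per = 4, 24, 50
n_schools = n_sites * per_site
site = np.repeat(np.arange(n_sites), per_site)
treat = np.concatenate([rng.permutation([1] * (per_site // 2) + [0] * (per_site // 2))
                        for _ in range(n_sites)])   # balanced within each site
school = np.repeat(np.arange(n_schools), n_per)
x = rng.normal(size=school.size)                     # student-level covariate
d = rng.normal(0.0, 2.0, n_schools)                  # random school deviations
y = -5.0 * treat[school] + d[school] + 3.0 * x + rng.normal(0.0, 25.0, school.size)

m_hat = stage1_school_means(y, x, school, n_schools)
weights = np.bincount(school, minlength=n_schools)   # students per school
effect, p = permutation_pvalue(m_hat, site, treat, weights)
print(f"estimated intervention effect {effect:.2f}, permutation p = {p:.4f}")
```

Because stage 1 never sees the arm labels, the school means can be computed once and only the cheap stage-2 fit is repeated for each re-randomization; with many schools, the weighted ANOVA/ANCOVA p-value would be expected to be close to the permutation p-value.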
3.2. Handling of Missing Data
As discussed in Section 2, persistent efforts will be made to obtain the highest possible degree of data completeness, including careful tracking and follow-up of outmigrating students. Nevertheless, despite best efforts, there will be a certain fraction of students for whom tracking is unsuccessful and the fifth grade risk factor assessment will be missing. Given the anticipated missing data problem, a plan for sensitivity analysis is needed to assess how different assumptions about the missing data might affect the study conclusions. A variety of approaches to such sensitivity analysis are possible. The CATCH protocol presents one possible approach applicable to a continuous outcome variable. Here this approach is described for the comparison of the combined intervention arms vs. the control arm; a similar development can be given for the comparison of the two intervention arms. The idea of the approach is to postulate a fairly conservative set of assumptions concerning the missing data. If intervention is found to be statistically significantly superior to control under such conservative assumptions, then one can infer with reasonable confidence that the intervention is beneficial despite the presence of possible missing data bias. To simplify the presentation, the approach is described here within the framework of the initially planned CATCH tracking scheme, under which tracking was to be attempted in all students but the assumption was made that tracking would be unsuccessful in a proportion $\omega$ of the students (the
CATCH investigators projected that $\omega$ would be equal to 15%). This setup is what was assumed in the CATCH power calculations. Subsequently, the tracking plan in the CATCH protocol was revised to call for tracking efforts on all students within a given radius and on a random sample of more remote outmigrators. The approach to handling missing data can be adapted straightforwardly to the revised tracking plan. The assumptions made are as follows:

1. It is assumed that a certain proportion $\nu$ of the missing data in each intervention school arises from the individuals at the unfavorable extreme of the outcome distribution (e.g., the highest cholesterol levels) within the school. The remaining missing data in the intervention schools are assumed to occur at random.
2. The missing data in the control schools are all assumed to occur at random.

This set of assumptions is based on the notion that outmigration from intervention schools is likely to be influenced to some degree by dissatisfaction with the intervention, whereas outmigration from control schools is unlikely to be influenced substantially by either positive or negative feelings about the control condition. The CATCH power calculations incorporate an adjustment reflecting this approach to handling missing data; details are given in Section 4.1. The foregoing approach represents only one possible way of handling missing data. For the actual analysis of the CATCH results, a range of sensitivity analyses will be formulated and explored.
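To see what assumption 1 does to an intervention school's observed mean, the following Python sketch simulates it. It is an illustration under simplifying assumptions, not part of the CATCH protocol: one large school is modeled as draws from a normal distribution with standard deviation sigma, a fraction $\omega$ of outcomes is missing, the largest $\nu\omega$ share of the cohort is removed as "extreme" missing data, and the remaining missing values are dropped at random. The helper names are hypothetical. The simulated downward shift is compared with the truncated-normal shift $\phi(Z_\eta)\sigma/(1-\eta)$, $\eta = \nu\omega$, on which the Section 4.1 correction is based.

```python
import random
from statistics import NormalDist

def observed_mean_shift(n=200_000, omega=0.15, nu=0.25, sigma=1.0, seed=1):
    """Hypothetical helper: mean of all n outcomes minus mean of the observed ones.

    A fraction omega is missing; a fraction nu of the missing values are the
    largest (most unfavorable) outcomes, the rest are dropped at random.
    """
    rng = random.Random(seed)
    y = sorted(rng.gauss(0.0, sigma) for _ in range(n))
    n_miss = round(omega * n)
    n_extreme = round(nu * n_miss)
    kept = y[: n - n_extreme]                   # assumption 1: lose the largest values
    rng.shuffle(kept)
    n_random = n_miss - n_extreme
    observed = kept[: len(kept) - n_random]     # the rest are lost at random
    return sum(y) / n - sum(observed) / len(observed)

def truncated_normal_shift(omega=0.15, nu=0.25, sigma=1.0):
    """Closed form phi(Z_eta) / (1 - eta) * sigma, with eta = nu * omega."""
    eta = nu * omega
    z = NormalDist().inv_cdf(1.0 - eta)         # Z value with right-tail area eta
    return NormalDist().pdf(z) / (1.0 - eta) * sigma

sim = observed_mean_shift()
formula = truncated_normal_shift()
print(f"simulated shift {sim:.4f} SD, truncated-normal formula {formula:.4f} SD")
```

With $\omega = 0.15$ and $\nu = 0.25$, both numbers come out near 0.085 within-school standard deviations: under assumption 1 the observed intervention mean is biased in the favorable direction by about that amount, which is why the conservative scheme requires a correspondingly larger true effect.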
4. STATISTICAL POWER CONSIDERATIONS
This section outlines the statistical power calculations for the CATCH trial. The primary CATCH endpoint is serum cholesterol change. Power calculations for this endpoint formed the basis for determining the number of schools to be entered into the CATCH trial.
4.1. Methodology for Computing Statistical Power
The approach to computing statistical power for the CATCH trial is patterned after the methodology described in Ref. 16 for computing statistical power for a continuous outcome variable in a trial in which within-cluster correlation must be taken into account. Two types of comparison are of interest: (1) comparison between intervention (defined as the average of the S and S+F arms) and control, and (2) comparison between the S+F arm and the S arm. The power computation incorporates an adjustment for missing data, based on the approach to handling missing data described in Section 3.2. The parameter specifications required in the power computation are as follows:

(1) Structural aspects of the design:
$N$ = total number of schools in the study
$N_1$ = number of schools on intervention (S+F and S combined)
$N_2$ = number of schools on control
$C$ = number of classrooms per school
$S$ = number of students per class
$\lambda$ = proportion of students excluded from study at baseline
$\omega$ = proportion of data that is missing
$\nu$ = proportion of students with missing data in the intervention group who are assumed to have an extreme value

(2) Parameters relating to the outcome measure:
$\delta$ = mean difference that is desired to be detected
$\sigma^2_T$ = total between-student variance
$\rho_1$ = correlation between two students in the same classroom
$\rho_2$ = correlation between two students in different classrooms in the same school
The power for a given pairwise comparison takes the form $1 - t(t_c; d, \psi)$, where $t(x; d, \psi)$ is the noncentral t distribution function with $d = N - 3$ degrees of freedom and noncentrality parameter $\psi$, and $t_c$ is the critical t value corresponding to the desired type I error level. Define $S' = (1 - \lambda)(1 - \omega)S$; this $S'$ is the number of students observed per class, after reduction for an exclusion rate of $\lambda$ and a missing data rate of $\omega$. Then the noncentrality parameter is given by

$$\psi = \left[\frac{A C S'}{1 + \rho_1 (S' - 1) + \rho_2 (C - 1) S'}\right]^{1/2} \frac{\delta - \Gamma}{\sigma_T}$$

where $A = N_1 N_2 / N$ for the comparison of the average of S+F and S combined vs. control and $A = N_1/4$ for the comparison of S+F vs. S. The quantity $\Gamma$ in the above formula represents a shift correction to account for the conservative missing data adjustment scheme described in Section 3.2. The shift correction is computed under the assumption that the missing data actually occur at random but are handled in the statistical analysis according to the conservative scheme. Assuming that the outcome variable follows a normal distribution, the shift correction may be derived from theoretical results concerning the truncated normal distribution [36, Section 13.7]. The shift correction thus obtained is

$$\Gamma = \frac{\phi(Z_\eta)}{1 - \eta}\,\sigma_a,$$

where $\eta = \nu\omega$, $Z_\eta$ is the Z value associated with a standard normal right tail area of $\eta$, $\phi(\cdot)$ is the standard normal density function, and $\sigma_a$ is essentially the standard deviation due to student-to-student variation within classroom ($\sigma_{\mathrm{st}}$ below), but with a slight adjustment to remove the component of variance due to pure measurement error. The expression $1 + \rho_1(S' - 1) + \rho_2(C - 1)S'$ in the formula for $\psi$ represents the variance inflation factor (IF) resulting from within-school correlation. The formulation in the CATCH power calculations is slightly more general than that of Ref. 16 in that a distinction is made between students in the same classroom and students in different classrooms within the same school. This distinction is a relatively minor refinement, however; the key requirement is to incorporate some sort of variance inflation factor to take account of the
correlation between students within a school. When $\rho_1$ and $\rho_2$ are assumed equal to a common intracluster correlation (ICC) $\rho$, the above expression for the IF reduces to that given in Ref. 16: $\mathrm{IF} = 1 + (S^{+} - 1)\rho$, where $S^{+} = CS'$ is the total number of students measured per school. Under the assumed values of the design parameters, the variance inflation factor in the CATCH power calculations works out to be about 1.50 (1.47 with three classrooms per school or 1.52 with four classrooms per school). In certain contexts, such as a study of an infectious disease, the within-cluster correlation could be much higher than in CATCH, resulting in a much larger inflation factor. A larger cluster size would also lead to a larger inflation factor. The formula for $\psi$ can also be expressed in terms of the components of variance associated with the different sources of variation in the experimental design. The relevant variance components are (1) $\sigma^2_{\mathrm{sch}}$, the variance in mean response from school to school; (2) $\sigma^2_{\mathrm{cl}}$, the variance in mean response from classroom to classroom within a school; and (3) $\sigma^2_{\mathrm{st}}$, the variance in response level from student to student within a classroom. With these definitions, $\sigma^2_T = \sigma^2_{\mathrm{sch}} + \sigma^2_{\mathrm{cl}} + \sigma^2_{\mathrm{st}}$, $\rho_1 = (\sigma^2_{\mathrm{sch}} + \sigma^2_{\mathrm{cl}})/\sigma^2_T$, and $\rho_2 = \sigma^2_{\mathrm{sch}}/\sigma^2_T$. The foregoing mechanism for power calculation pertains to the basic setting involving a continuous outcome variable without covariate adjustment and an equal number of students in each school. In this setting, the three analytical strategies of Section 3 coincide, and the power calculation methodology described here conforms with the analytical plan (even though classroom effects are not explicitly modeled in the analytical plan).
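The power formula can be sketched in Python using SciPy's noncentral t distribution (`scipy.stats.nct`). This is a minimal sketch under assumptions: the function names are hypothetical; the split of the 96 schools into 56 intervention and 40 control schools is an assumption not stated in this section (it is used because it appears to reproduce Tables 1 and 2 closely); and the shift correction $\Gamma$ is passed in directly as a number rather than computed from $\sigma_a$.

```python
from math import sqrt
from scipy.stats import nct, t

def inflation_factor(rho1, rho2, C, S_obs):
    """Variance inflation factor IF = 1 + rho1 (S' - 1) + rho2 (C - 1) S'."""
    return 1.0 + rho1 * (S_obs - 1) + rho2 * (C - 1) * S_obs

def catch_power(delta, sd_total, gamma=0.0, *, N1=56, N2=40, C=4, S=25,
                lam=0.20, omega=0.15, rho1=0.023, rho2=0.003,
                alpha=0.025, comparison="int_vs_ctl"):
    """Power 1 - t(t_c; d, psi) for one pairwise comparison (sketch).

    comparison="int_vs_ctl" uses A = N1*N2/N; any other value uses
    A = N1/4 (S+F vs. S, assuming the N1 schools split evenly).
    N1 = 56 and N2 = 40 are assumed values.
    """
    N = N1 + N2
    S_obs = (1 - lam) * (1 - omega) * S                 # S', observed per class
    A = N1 * N2 / N if comparison == "int_vs_ctl" else N1 / 4
    IF = inflation_factor(rho1, rho2, C, S_obs)
    psi = sqrt(A * C * S_obs / IF) * (delta - gamma) / sd_total
    df = N - 3
    t_crit = t.ppf(1 - alpha, df)                       # one-sided critical value
    return float(nct.sf(t_crit, df, psi))               # P(noncentral t > t_c)

print(round(catch_power(5.1, 28, gamma=2.2), 2))   # compare with 92% in Table 1
print(round(catch_power(2.8, 28), 2))              # compare with 89% in Table 2
```

Discrepancies of a point or so from the tables are to be expected, since $\Gamma$ is only quoted as approximately 2.2 mg/dl. Setting `rho1 == rho2` reproduces the single-ICC form $1 + (S^{+} - 1)\rho$ of Ref. 16.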
4.2. Background Assumptions
In designing the CATCH trial, the following background assumptions were made. It was assumed that each school would include three to four classes with 25 students per class. It was further assumed that 20% of the students would be excluded from the primary study cohort because of refusal to consent to measurement or for medical reasons. In addition, as indicated in Section 3.2, the investigators projected that the missing data rate $\omega$ at the end of fifth grade would be 15%. Power calculations were performed for a one-sided type I error rate of 0.025. In the missing data adjustment scheme, the assumption was made that 25% of the students with missing data would come from the extreme part of the response distribution (i.e., $\nu = 0.25$), resulting in a shift correction $\Gamma$ of approximately 2.2 mg/dl. This highly conservative assumption was adopted to provide essentially "worst case" protection against missing data bias.
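The arithmetic behind these figures can be laid out as a short worked example. Everything below follows from the assumptions just listed and the formulas of Section 4.1, except $\sigma_a$, whose value is not reported here; the last line only shows what $\sigma_a$ the quoted 2.2 mg/dl implies.

$$S' = (1 - 0.20)(1 - 0.15)(25) = 17, \qquad \eta = \nu\omega = (0.25)(0.15) = 0.0375$$

$$Z_{\eta} \approx 1.78, \qquad \phi(Z_{\eta}) \approx 0.0818, \qquad \Gamma = \frac{0.0818}{1 - 0.0375}\,\sigma_a \approx 0.085\,\sigma_a$$

$$\Gamma \approx 2.2\ \text{mg/dl} \;\Longrightarrow\; \sigma_a \approx \frac{2.2}{0.085} \approx 26\ \text{mg/dl}$$

An implied $\sigma_a$ of about 26 mg/dl sits a little below the 28 mg/dl total standard deviation assumed in Section 4.4, which is what one would expect if $\sigma_a$ excludes the between-school, between-classroom, and pure measurement-error components.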
4.3. Specification of the Detectable Difference for Cholesterol
Keys et al. [37] developed equations relating serum total cholesterol to dietary intake. These equations were used to develop a realistic projection of the mean cholesterol difference in CATCH between intervention and
control. This calculation relied on an extrapolation of the Keys et al. equations from adults to children, which is admittedly uncertain. It is common in clinical trials, however, to have to plan a trial on the basis of uncertain data, and here the Keys et al. data were the best data available for the required purpose. The calculations yielded a projected mean difference of about 5 mg/dl. The investigators were satisfied that a difference of this magnitude would be of sufficient public health significance to justify basing the design of the trial on it. The derivation of the projected mean difference is described below using the notation of Ref. 37. Let S, P, and C denote, respectively, the percentage of total calories provided by saturated fat, the percentage of total calories provided by polyunsaturated fat, and dietary cholesterol intake in mg/1000 kcal. Equation III.2 of Keys et al. gives the following formula for an "average man's" serum total cholesterol in mg/dl ($\overline{SC}$) as a function of S, P, and C:

$$\overline{SC} = 164 + 1.35(2S - P) + 1.5\sqrt{C} \qquad (4)$$

This formula is based on data from physically healthy adult males in two mental patient populations in Minnesota. Keys et al. give a further formula for the difference $\Delta SC = SC_A - SC_B$ in serum cholesterol between two diets A and B for a specific individual, as a function of the corresponding difference $\Delta\overline{SC} = \overline{SC}_A - \overline{SC}_B$ for the average man and the ratio $SC_A/\overline{SC}_A$:

$$\Delta SC = \left[-0.84 + 1.84\left(SC_A/\overline{SC}_A\right)\right]\Delta\overline{SC} \qquad (5)$$
This formula was developed to extrapolate from the average man to another man. In the calculations for CATCH, the assumption was made that the formula may be used to extrapolate from the average man to the average CATCH child. The dietary intake projections for the CATCH trial were as follows. The baseline diet of the CATCH population was assumed to have S = 13.5%, P = 7.0%, and C = 138 mg/1000 kcal. The projection for the CATCH control group (group C) at the end of the trial was S = 12.5%, P = 7.0%, and C = 138 mg/1000 kcal. The projection for the CATCH intervention group (S+F and S arms combined) at the end of the trial was S = 10.0%, P = 8.0%, and C = 138 mg/1000 kcal. The foregoing estimates for S and P were based on input from the investigators. In particular, the end-of-trial values for the control group were based on the assumption that secular trends would produce a slight reduction in saturated fat consumption but relatively little change in polyunsaturated fat consumption. Estimates for C were based on projections derived for the Dietary Intervention Study in Children (DISC) [38], a related NHLBI trial in a similar population, and on the assumption that there will be no change in C during the course of the trial in either the intervention or the control students. Taking A to be the baseline diet, Eq. (4) yielded $\overline{SC}_A$ = 209 mg/dl. The mean cholesterol for children in the CATCH trial was projected to be 170 mg/dl. Thus, in using Eq. (5) to extrapolate to the average CATCH child, $SC_A$ was taken to be 170 mg/dl. Taking B to be the CATCH control diet and applying Eqs. (4) and (5) in succession yielded $\Delta\overline{SC}$(control) = 2.7 mg/dl and $\Delta SC$(control) = (0.66)(2.7 mg/dl) = 1.8 mg/dl. Similarly, $\Delta SC$(intervention)
was estimated to be 7.1 mg/dl. Thus, the estimated between-group mean cholesterol difference in CATCH was 7.1 mg/dl - 1.8 mg/dl = 5.3 mg/dl. Power calculations focused on a projected difference of 5.1 mg/dl, representing 3% of the baseline mean cholesterol level.
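The arithmetic of Eqs. (4) and (5) is easy to check. The short Python sketch below (hypothetical function names; the diet values are those listed above) recomputes the projections without intermediate rounding, so $\overline{SC}_A$ comes out as 208.6 rather than the rounded 209; the downstream figures round to the same 1.8, 7.1, and 5.3 mg/dl.

```python
from math import sqrt

def keys_average_man(S, P, C):
    """Eq. (4): average man's serum cholesterol (mg/dl) from diet composition."""
    return 164 + 1.35 * (2 * S - P) + 1.5 * sqrt(C)

def keys_individual_change(delta_avg, sc_individual, sc_avg):
    """Eq. (5): rescale the average man's change to an individual at sc_individual."""
    return (-0.84 + 1.84 * sc_individual / sc_avg) * delta_avg

baseline = dict(S=13.5, P=7.0, C=138)
control = dict(S=12.5, P=7.0, C=138)
intervention = dict(S=10.0, P=8.0, C=138)

sc_avg_A = keys_average_man(**baseline)        # diet A = baseline diet
sc_child = 170.0                               # projected mean for CATCH children

def child_reduction(diet_B):
    """Delta SC for the average CATCH child when moving from diet A to diet_B."""
    delta_avg = sc_avg_A - keys_average_man(**diet_B)
    return keys_individual_change(delta_avg, sc_child, sc_avg_A)

ctrl = child_reduction(control)
intv = child_reduction(intervention)
print(round(sc_avg_A, 1), round(ctrl, 1), round(intv, 1), round(intv - ctrl, 1))
# prints: 208.6 1.8 7.1 5.3
```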
4.4. Specification of Standard Deviations and Correlations
Based on data from the Bogalusa Heart Study [39] in the pediatric epidemiology field and the Know-Your-Body Study [40] in the health education field, it was estimated that the overall between-student standard deviation for change in cholesterol would be 28 mg/dl. Power calculations were performed for a range of standard deviations centered at 28 mg/dl. Variance component analyses on data from the Bogalusa Heart Study were used to develop estimates of the correlation between two students in the same class (estimated $\rho_1$ value of 0.023) and between two students in different classes of the same school (estimated $\rho_2$ value of 0.003).
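Inverting the variance-component relations of Section 4.1 shows how these estimates partition the assumed total variance. This is a worked illustration at the central value $\sigma_T = 28$ mg/dl, not a figure reported by the investigators (all components in $(\text{mg/dl})^2$):

$$\sigma^2_T = 28^2 = 784, \qquad \sigma^2_{\mathrm{sch}} = \rho_2\,\sigma^2_T = 0.003 \times 784 \approx 2.4$$

$$\sigma^2_{\mathrm{cl}} = (\rho_1 - \rho_2)\,\sigma^2_T = 0.020 \times 784 \approx 15.7, \qquad \sigma^2_{\mathrm{st}} = (1 - \rho_1)\,\sigma^2_T = 0.977 \times 784 \approx 766.0$$

Nearly all of the variance (about 97.7%) lies between students within a classroom. The small school and classroom components still produce an inflation factor of roughly 1.5, because in the IF they are multiplied by the number of students measured per classroom and per school.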
4.6. Power Estimates for Total Cholesterol
Table 1 presents the results of the power calculations for the primary endpoint of total cholesterol. In particular, assuming four classes per school and a standard deviation of 28 mg/dl, the power to detect the projected mean difference of 5.1 mg/dl is 92%. The sample size of 96 schools affords a substantial degree of robustness in the study design, which the CATCH investigators considered to be an important aspect of the design. In particular, if the mean difference is less than projected, which is a real possibility, the CATCH study will still have good statistical power to detect an intervention effect provided the assumption concerning missing data is relaxed. To illustrate this point, Table 2 shows power calculations similar to those presented in Table 1, but with the missing data shift correction $\Gamma$ omitted. Assuming four classes per school and a standard deviation of 28 mg/dl, a mean difference of 2.8 mg/dl can be detected with 89% power. Tables 3 and 4 present power calculations, with and without the missing data correction, respectively, for the comparison of S+F to S. The trial was not designed to detect a difference between S+F and S with respect to cholesterol, and so the power figures shown in Table 3 are low. When the missing data correction is omitted, though, there is moderately good power to detect relevant differences in the comparison of S+F to S.

4.7. Power Calculations for Other Outcome Measures
Table 5 presents power calculations for various secondary outcome measures, for the comparison of intervention (S+F and S combined) vs. control and for the comparison of the S+F arm to the S arm. Specifically, the table shows the mean difference that is detectable with 80% power under the CATCH design. These calculations assume four classes per school and omit the missing data adjustment. Estimated standard deviations were obtained from data collected in the CATCH pilot study. Percent calories from fat is determined on the basis of 24-hr diet recall data. The sodium/creatinine ratio measure is based on analysis of an overnight urine sample. The HDL cholesterol, 24-hr diet recall, and overnight urine assessments are each being made only on a sample of students within each school, with the sampling designed to yield complete data on approximately 15 students per school.

Table 1
Power (%) for Cholesterol Change at 5th Grade for Intervention (S+F and S Combined) vs. Control; Missing Data Adjustment Incorporated

                  Difference between means (mg/dl)
SD (mg/dl)        3.4      5.1      6.8
(1) 4 classes per school, 25 students per class
24                54       99       99
26                40       97       99
28                29       92       99
30                21       84       99
32                15       75       99
(2) 3 classes per school, 25 students per class
24                44       97       99
26                33       92       99
28                24       84       99
30                17       74       99
32                13       64       97

For reference, the mean total cholesterol among children in this age range is about 170 mg/dl; the listed effect sizes are 2%, 3%, and 4% of this value, respectively.

Table 2
Power (%) for Cholesterol Change at 5th Grade for Intervention (S+F and S Combined) vs. Control; Missing Data Adjustment Omitted

                  Difference between means (mg/dl)
SD (mg/dl)        2.2      2.8      3.4
(1) 4 classes per school, 25 students per class
24                83       96       99
26                77       93       99
28                71       89       97
30                65       85       95
32                59       80       92
(2) 3 classes per school, 25 students per class
24                73       91       98
26                66       86       96
28                60       80       93
30                54       75       89
32                49       69       85

Table 3
Power (%) for Cholesterol Change at 5th Grade for School Plus Family (S+F) vs. School Alone (S); Missing Data Adjustment Incorporated

                  Difference between means (mg/dl)
SD (mg/dl)        3.4      5.1      6.8
(1) 4 classes per school, 25 students per class
24                36       92       99
26                26       84       99
28                19       74       98
30                14       63       96
32                11       53       92
(2) 3 classes per school, 25 students per class
24                29       84       99
26                21       74       98
28                16       63       95
30                12       53       90
32                 9       43       84

Table 4
Power (%) for Cholesterol Change at 5th Grade for School Plus Family (S+F) vs. School Alone (S); Missing Data Adjustment Omitted

                  Difference between means (mg/dl)
SD (mg/dl)        2.2      2.8      3.4
(1) 4 classes per school, 25 students per class
24                62       82       94
26                55       76       90
28                49       70       85
30                44       64       80
32                40       58       75
(2) 3 classes per school, 25 students per class
24                52       72       87
26                45       65       81
28                40       59       75
30                36       53       70
32                32       48       64

Table 5
Differences Detectable with 80% Power on Secondary Outcome Measures

                                                    Detectable difference
Outcome measure                      Estimated SD   Int vs. Ctl   S+F vs. S
HDL cholesterol (mg/dl)              13.6           2.1           2.7
Systolic BP (mm Hg)                  49.4           0.9           1.2
Diastolic BP (mm Hg)                 51.5           0.7           1.0
Heart rate (bpm)                     58.4           1.2           1.5
Triceps skinfold (mm)                14.6           0.6           0.8
Subscapular skinfold (mm)            12.9           0.7           0.9
Body mass index (kg/m²)              0.5            0.6           0.7
Percent calories from fat (%)        37.0           1.8           2.4
Sodium/creatinine ratio (μeq/mg)     70.0           12.0          15.0
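Detectable differences such as those in Table 5 come from inverting the power calculation for $\delta$. The Python sketch below does this for total cholesterol, where all inputs are stated above, using a normal approximation $\delta \approx (z_{1-\alpha} + z_{\text{power}})\,\sigma_T\sqrt{\mathrm{IF}/(A\,C S')}$ in place of the noncentral t; this swap is our simplification and slightly understates the t-based value. The function name and the 56/40 intervention/control split of the 96 schools are assumptions. The secondary outcomes are not reproduced, since their correlations and per-school sample sizes are not given here.

```python
from math import sqrt
from statistics import NormalDist

def detectable_difference(sd_total, A, students_per_school, inflation,
                          alpha=0.025, power=0.80):
    """Normal approximation: (z_{1-alpha} + z_power) * sd * sqrt(IF / (A * n))."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha) + z(power)) * sd_total * sqrt(inflation / (A * students_per_school))

A_int_ctl = 56 * 40 / 96      # assumed 56 intervention / 40 control schools
A_sf_s = 56 / 4               # S+F vs. S with 28 schools per arm
n = 4 * 17                    # C * S' students measured per school (four classes)
IF = 1.521                    # inflation factor with four classes per school

d1 = detectable_difference(28, A_int_ctl, n, IF)
d2 = detectable_difference(28, A_sf_s, n, IF)
print(f"{d1:.2f} mg/dl (Int vs. Ctl), {d2:.2f} mg/dl (S+F vs. S)")
```

The results, roughly 2.4 and 3.1 mg/dl at SD 28 without the missing data adjustment, fall between the columns that bracket 80% power in Tables 2 and 4, respectively.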
5. SUMMARY
The CATCH trial is a community trial involving randomization of entire schools rather than individual participants to the experimental arms. This cluster randomization scheme imposes special requirements on the design and analysis. To ensure valid statistical results, the primary unit of analysis must be the cluster rather than the individual participant. In addition, to ensure an adequate number of clusters to achieve the desired statistical power, the power calculations must account for within-cluster correlation, as discussed in Ref. 16. For the CATCH trial, an analysis plan has been constructed that maintains the school as the primary unit of analysis while allowing incorporation of individual and school level covariates. The point of main importance here is not the specific form of the analysis but rather the general idea of taking the school as the primary analysis unit. The CATCH power calculations have been performed according to the formulation in Ref. 16. The cluster randomization scheme also casts in atypical form the familiar issue of dropout and drop-in that commonly arises in clinical trials. In the cluster randomization context, the main problem is not dropout or drop-in of entire experimental units but rather dropout or drop-in of individual participants within an experimental unit. Nonetheless, the basic concern about dropout/drop-in bias remains. The CATCH design calls for a conservative intention-to-treat approach in dealing with outmigration and inmigration, including a vigorous tracking plan and a sample size determination that maintains statistical power under conservative assumptions about the remaining missing data. The foregoing general strategies for design and analysis are generically of key importance in any cluster-randomized experiment to ensure the ability to draw statistically firm conclusions about the effectiveness of the intervention.

The authors thank the reviewers and the editor for helpful comments on the initial draft of this paper.
REFERENCES
1. Perry CL, Parcel GS, Stone E, Nader P, McKinlay SM, Luepker RV, Webber LS: The Child and Adolescent Trial for Cardiovascular Health (CATCH): overview of the intervention program and evaluation methods. J Cardiovasc Risk Factors 2:36-44, 1992
2. Nicklas TA, Forcier JE, Farris RP, Hunter SM, Webber LS, Berenson GS: Heart Smart school lunch program: a vehicle for cardiovascular health promotion. J Health Prom 4:91-100, 1989
3. Perry CL, Luepker RV, Murray DM, Hearn MD, Halper A, Dudovitz B, Maile MC, Smyth M: Parent involvement with children's health promotion: a one-year follow-up of the Minnesota Home Team. Health Educ Q 16:171-180, 1989
4. Nader PR, Sallis JF, Patterson TL, Abramson IA, Rupp JW, Senn KL, Atkins CJ, Roppe BE, Morris JA, Wallace JP, Vega WA: A family approach to cardiovascular risk reduction: results from the San Diego Family Health Project. Health Educ Q 16:229-244, 1989
5. Stone EJ, Perry CL, Luepker RV: Synthesis of cardiovascular behavioral research for youth health promotion. Health Educ Q 16:155-169, 1989
6. Cornfield J: Randomization by cluster: a formal analysis. Am J Epidemiol 108:100-102, 1978
7. Dwyer JH, MacKinnon DP, Pentz MA, Flay BR, Hansen WB, Wang EYI, Johnson CA: Estimating intervention effects in longitudinal studies. Am J Epidemiol 130:781-795, 1989
8. Mickey RM, Goodwin GD, Costanza MC: Estimation of the design effect in community intervention studies. Stat Med 10:53-64, 1991
9. Donner A, Brown KS, Brasher P: A methodologic review of non-therapeutic intervention trials employing cluster randomization, 1979-1989. Int J Epidemiol 19:795-800, 1990
10. Koepsell TD, Martin DC, Diehr PH, Psaty BM, Wagner EH, Perrin EB, Cheadle A: Data analysis and sample size issues in evaluations of community based health promotion and disease prevention programs: a mixed model analysis of variance approach. J Clin Epidemiol 44:701-713, 1991
11. Murray DM, Hannan PJ: Planning for the appropriate analysis in school-based drug-use prevention studies. J Consult Clin Psychol 58:458-468, 1990
12. Murray DM, Hannan PJ, Zucker DM: Analysis issues in school-based health promotion studies. Health Educ Q 16:315-320, 1989
13. Feldman HA, McKinlay SM: Cohort versus cross-sectional design in large field trials: precision, sample size, and a unifying model. Stat Med 13:61-78, 1994
14. Fisher RA: The Design of Experiments. Edinburgh, Oliver and Boyd, 1935 (8th Edition, New York, Hafner, 1966)
15. Kempthorne O: The Design and Analysis of Experiments. New York, John Wiley and Sons, 1952
16. Donner A, Birkett N, Buck C: Randomization by cluster: sample size requirements and analysis. Am J Epidemiol 114:906-914, 1981
17. Zucker DM: An analysis of variance pitfall: the fixed effects analysis in a nested design. Educ Psychol Meas 50:731-738, 1990
18. Glass GV, Stanley JC: Statistical Methods in Education and Psychology. Englewood Cliffs, NJ, Prentice-Hall, 1970
19. Lichtenstein E, Wallack L, Pechacek TF: Introduction to the Community Health Trial for Smoking Cessation (COMMIT). Int Quart Commun Health Educ 11:223-237, 1990-1991
20. Gail MH, Byar DP, Pechacek TF, Corle DK: Aspects of the statistical design for the Community Health Trial for Smoking Cessation (COMMIT). Controlled Clin Trials 13:6-21, 1992
21. Friedman LM, Furberg CD, DeMets DL: Fundamentals of Clinical Trials, 2nd Edition. Littleton, MA, PSG, 1985
22. Pocock SJ: Clinical Trials: A Practical Approach. New York, John Wiley and Sons, 1983
23. Cochran WG: Sampling Techniques, 3rd Edition. New York, John Wiley and Sons, 1977
24. Gail MH, Tan WY, Piantadosi S: Tests for no treatment effect in randomized clinical trials. Biometrika 75:57-64, 1988
25. Laird NM, Ware JH: Random-effects models for longitudinal data. Biometrics 38:963-974, 1982
26. Breslow NE, Clayton DG: Approximate inference in generalized linear mixed models. J Am Stat Assoc 88:9-25, 1993
27. Zeger SL, Karim MR: Generalized linear models with random effects: a Gibbs sampling approach. J Am Stat Assoc 86:79-86, 1991
28. Stiratelli R, Laird N, Ware J: Random effects models for serial observations with binary responses. Biometrics 40:961-971, 1984
29. Statistics and Epidemiology Research Corporation: EGRET User's Manual. Seattle, WA, Statistics and Epidemiology Research Corporation, 1989
30. Follmann DA, Lambert D: Generalized logistic regression by nonparametric mixing. J Am Stat Assoc 84:295-300, 1989
31. Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika 73:13-22, 1986
32. Zeger SL, Liang KY, Albert PS: Models for longitudinal data: a generalized estimating equations approach. Biometrics 44:1049-1060, 1988
33. Prentice R: Correlated binary regression with covariates specific to each binary observation. Biometrics 44:1033-1048, 1988
34. Gail MH, Wieand S, Piantadosi S: Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika 71:431-444, 1984
35. Lin DY, Wei LJ: The robust inference for the Cox proportional hazards model. J Am Stat Assoc 84:1074-1078, 1989
36. Johnson NL, Kotz S: Distributions in Statistics: Continuous Univariate Distributions 1. New York, John Wiley and Sons, 1970
37. Keys A, Anderson JT, Grande F: Serum cholesterol response to changes in the diet, III: differences among individuals. Metabolism 14:766-775, 1965
38. DISC Collaborative Research Group: Dietary intervention study in children with elevated LDL-cholesterol: design and baseline characteristics. Ann Epidemiol 3:393-402, 1993
39. Freedman DS, Shear C, Srinivasan SR, Webber LS, Berenson GS: Tracking of serum lipids and lipoproteins in children over an eight-year period: the Bogalusa Heart Study. Prev Med 14:203-216, 1985
40. Walter HJ, Hofman A, Vaughan RD, Wynder EL: Modification of risk factors for coronary heart disease: five-year results of a school-based intervention trial. N Engl J Med 318:1093-1100, 1988