Multicenter Randomized Controlled Trials in Transfusion Medicine
Eleftherios C. Vamvakas
RANDOMIZED controlled trials (RCTs) are prospective clinical experiments in which patients who have agreed to participate in a study are randomly allocated by the investigators to treatment or control groups, and then receive either the intervention under study or standard therapy, respectively. 1 RCTs that have been designed as double-blind studies are considered the most powerful method of investigation in clinical research, 2 because, if they are correctly conducted, they can eliminate bias 3 and they also can considerably reduce (or eliminate) confounding 4,5 as alternative explanations for any observed difference in outcome between the treatment and the control group of patients. A double-blind RCT can thus establish a causal relationship between the treatment and the outcome under study. 1,2,6 More specifically, if randomization is used correctly, neither the investigator nor the participant can influence the allocation of a patient to the treatment or the control group of the study, because neither one knows what the assignment will be before the patient's decision to enter the study. Patients with more severe illness cannot be channeled preferentially to the treatment or the control group, and selection bias is eliminated as a determinant of the outcome of interest in both groups. Furthermore, thanks to the play of chance, randomization should produce comparison groups that are balanced with regard to the levels of all known and unknown confounders of the association under study, so that differences in outcome between the treatment and the control group cannot be traced to differences in the levels of known and unknown confounding factors between the arms of the trial. 1,2,6
Randomization, however, does not guard against the effect of preconceived notions regarding benefit from the treatment on the part of either patients or investigators, and preformed opinions about benefit from the treatment can influence the reporting of symptoms by the patients or the assessment of disease activity by the investigators. 7 Blinding of the investigators or patients is not part of the definition of an RCT, and unblinded or single-blind RCTs can be compromised by observation bias as much as the prospective observational studies of
Transfusion Medicine Reviews, Vol 14, No 2 (April), 2000: pp 137-150
the same hypothesis. Because there is no difference between the unblinded RCTs and the prospective observational studies in the way the outcome of interest is ascertained in the treated patients and the controls, observation bias constitutes an alternative explanation for the findings of an RCT that has not been designed and implemented as a double-blind study. 8 For this reason, epidemiologic tenet holds that only double-blind RCTs can establish causal relationships, because only double-blind RCTs can eliminate all 3 alternative explanations for the findings of a study: selection bias, confounding factors, and observation bias. 6,9 In practice, a double-blind RCT may (or may not) eliminate these alternative explanations for the observed results. Randomization eliminates the effect of selection bias by not only generating an unpredictable assignment sequence but also assuring concealment of this sequence until each enrolled patient is allocated to either the treatment or the control group of an RCT. 4,10 If there are errors in the implementation of the procedure, or if the randomization scheme is tampered with and compromised, enrolled patients and investigators may become aware of the next treatment assignment. 11 If those responsible for admitting patients into an RCT have foreknowledge of the treatment allocations, they may channel study candidates preferentially to the treatment or the control group for various reasons. 12,13 This could easily be accomplished by delaying a patient's entry into the study until the next desired allocation appears, or by excluding eligible participants from the trial or by encouraging them to refuse entry. Furthermore, although probability theory dictates that randomization should produce balanced comparison groups in RCTs that enroll infinitely large patient populations, the random assignment
From the Blood Bank and Transfusion Service, Department of Pathology, New York University School of Medicine, New York, NY. Address reprint requests to Eleftherios C. Vamvakas, MD, PhD, Blood Bank, RRG-17, New York University Medical Center, 400 East 34th St, New York, NY 10016. Copyright © 2000 by W.B. Saunders Company 0887-7963/00/1402-0004$3.00/0
of subjects to groups does not guarantee balance in the levels of confounding factors between the arms of an ordinary trial enrolling a finite number of subjects. 1,6,14 Failure of randomization to equally distribute all confounding factors between the arms of an RCT is a major concern in small studies, 14 but it also can occur in very large investigations as a result of chance. Provided that the randomization scheme is implemented correctly, the larger the sample, the greater the likelihood that no serious imbalance will result between the arms of the trial in the levels of variables that determine the outcome of interest. Multicenter RCTs were introduced to meet the sample size requirements for a successful trial. An RCT should not be undertaken if it is not expected to enroll a sufficient number of patients to have adequate statistical power to answer the question under study. 15,16 In addition, an RCT should not be undertaken if it is not expected to enroll a sufficient number of subjects for there to be a high likelihood of balance in the levels of confounding factors between the arms of the trial, thanks to the process of allocating patients randomly to treatment and control groups (ie, thanks to the play of chance). To recruit an adequate number of patients within a reasonable period, it is often necessary to enroll participants simultaneously at several medical centers. 1 In multicenter RCTs, patients are randomized separately at each participating medical center, and the analysis of the study also needs to be stratified by participating hospital. 17-19 A coordinating center is responsible for implementing the randomization scheme and for collecting, monitoring, editing, and analyzing the data. The data monitoring committee, which ought to be independent of the investigators and any sponsor of the trial, is charged with periodically reviewing baseline, toxicity, and outcome data and with evaluating center performance.
2~ This committee has the responsibility to recommend early termination of the trial in the event of unanticipated toxicity of the intervention, greater-than-expected benefit permitting a reduction in the required sample size, or high likelihood of inconclusive results with the planned or feasible sample size. Furthermore, in double-blind multicenter studies, the responsibility for participant safety also rests to a large extent with the data-monitoring committee, because the individual investigators are unaware of the treatment assignments.1
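The separate randomization at each participating center, described above, can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not say which allocation method a coordinating center would use, so permuted blocks with 1:1 allocation are assumed here, and the function names, center labels, block size, and seed are all hypothetical.

```python
import random


def permuted_block_schedule(n_patients, block_size=4, seed=None):
    """Build a 1:1 allocation list from shuffled blocks (half treatment, half control)."""
    if block_size <= 0 or block_size % 2:
        raise ValueError("block_size must be a positive even number")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_patients:
        block = ["treatment", "control"] * (block_size // 2)
        rng.shuffle(block)  # order within each block is unpredictable
        schedule.extend(block)
    return schedule[:n_patients]


def schedules_by_center(center_sizes, block_size=4, seed=2000):
    """Generate an independent schedule for every center (randomization stratified by site)."""
    return {
        center: permuted_block_schedule(n, block_size, seed=f"{seed}:{center}")
        for center, n in center_sizes.items()
    }


if __name__ == "__main__":
    # Hypothetical enrollment targets; not taken from any trial discussed in this review.
    for center, arms in schedules_by_center({"center_A": 8, "center_B": 12}).items():
        print(center, arms.count("treatment"), arms.count("control"))
```

Because each center gets its own sequence, the two arms stay balanced within every site. In practice the schedule would be held by the coordinating center so that the next assignment remains concealed from those enrolling patients.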
Advantages and disadvantages of multicenter (versus single-center) RCTs have been debated. 23-30 If commercially sponsored studies of new blood products and blood substitutes are excluded, the number of multicenter RCTs published in the field of transfusion medicine remains small, and the reported multicenter studies have often produced findings that contradicted the results of single-center investigations of the same hypothesis. This review considers 3 areas of investigation from the field of transfusion medicine in which one (or more) multicenter RCTs failed to detect an association reported earlier from single-center studies, and discusses some methodologic reasons that might account for the differences in the findings between multicenter and single-center studies of the same hypothesis.
PROPHYLACTIC GRANULOCYTE TRANSFUSIONS
Eight RCTs published between 1978 and 1984 evaluated the efficacy of granulocyte transfusions in preventing bacterial or fungal infection, or death from bacterial or fungal infection, during periods of severe neutropenia after initial induction chemotherapy for acute nonlymphocytic leukemia or bone marrow transplantation. 31-38 There were differences among the studies in the dose of granulocytes transfused, the assessment of leukocyte compatibility before the transfusion, the duration of neutropenia in the enrolled patients, as well as in the infection and survival rates of the patients from the control group who did not receive granulocyte transfusions (Table 1). Only 3 of the 8 RCTs 31-38 demonstrated a statistically significant (P < .05) benefit from granulocyte transfusions, 31,33,38 that is, a reduction in the risk of developing bacterial or fungal infection, but not in the risk of dying of bacterial or fungal infection, in the treatment (compared with the control) group (Table 2). However, although the differences did not attain statistical significance in most cases, the risk of developing bacterial or fungal infection and the risk of dying of bacterial or fungal infection were both lower in the treatment group of patients receiving granulocyte transfusions than in the control group in 7 of the 8 studies. 31-33,35-38 These 7 studies 31-33,35-38 were all single-center RCTs. The only study that observed a statistically insignificant, higher risk of bacterial or fungal infection and death from bacterial or fungal infection
Table 1. Randomized Controlled Trials of the Efficacy of Prophylactic Granulocyte Transfusions: Study Descriptors

| Trial | Sample Size | Dose of Granulocytes* | Leukocyte Compatibility† | Duration of Neutropenia‡ (days) | Infection Rate in Controls (%)§ | Mortality Rate in Controls (%)‖ |
| --- | --- | --- | --- | --- | --- | --- |
| Clift et al 31 | 69 | >1.5 × 10^10 daily | Yes | 12.4 | 42.5 | 2.5 |
| Schiffer et al 32 | 18 | 1.15 × 10^10 4 times weekly | No | 21.5 | 33 | 22.2 |
| Mannoni et al 33 | 50 | 2.1 × 10^10 daily | Yes | 12 | 39.2 | 7.1 |
| Strauss et al 34 | 102 | 0.7 × 10^10 daily | No | 18.3 | 41.7 | 8.3 |
| Sutton et al 35 | 67 | 0.9 × 10^10 daily | No | NR | 12.5 | NR |
| Ford et al 36 | 24 | 1.45 × 10^10 on alternate days | No | 14 | 63.6 | 9.1 |
| Winston et al 37 | 38 | 1.2 × 10^10 daily | No | 23.4 | 47.4 | 11.1 |
| Gomez-Villagran et al 38 | 35 | 1.24 × 10^10 | Yes | 10.6 | 53.3 | 26.7 |

Abbreviation: NR, not reported.
*Mean number of granulocytes per transfused concentrate.
†Pretransfusion assessment of leukocyte compatibility. Leukocytes were "compatible" with the patient's serum if they produced a negative result on a microlymphocytotoxic crossmatch or if there was a statement in the report of the study that the recipient's serum contained no antibodies directed against antigens present on the transfused cells.
‡Defined variably as white blood cell count <200/µL or <500/µL.
§Clinical (as opposed to culture-proven) bacterial or fungal infection in the control arm of each study, after exclusion of cases of fever of unknown origin and of local infections designated as "mild" or "minor" by the authors.
‖From bacterial or fungal infection developing during the study period.
in the recipients of granulocyte transfusions had been conducted at 4 medical centers. 34 In this multicenter study, 34 the odds ratio (OR) of bacterial or fungal infection in the treatment (compared with the control) group was 1.21 (P = .6922); the OR of death from bacterial or fungal infection was 1.91 (P = .3686). Because the OR of bacterial or fungal infection in the treatment (compared with the control) group varied from 0.05 38 to 1.21 34 among the 8 RCTs, the

Table 2. Randomized Controlled Trials of the Efficacy of Prophylactic Granulocyte Transfusions: Reported Results

| Trial | Risk of Bacterial or Fungal Infection: OR* | Risk of Bacterial or Fungal Infection: 95% CI for OR | Risk of Death From Bacterial or Fungal Infection: OR* | Risk of Death From Bacterial or Fungal Infection: 95% CI for OR |
| --- | --- | --- | --- | --- |
| Clift et al 31 | 0.10 | 0.02-0.48† | 0.68 | 0.02-21.11 |
| Schiffer et al 32 | 0.12 | 0.005-2.80 | 0.21 | 0.008-5.34 |
| Mannoni et al 33 | 0.07 | 0.009-0.63† | 0.30 | 0.01-7.06 |
| Strauss et al 34 | 1.21 | 0.55-2.64 | 1.91 | 0.54-6.81 |
| Sutton et al 35 | 0.92 | 0.19-4.50 | NR | NR |
| Ford et al 36 | 0.25 | 0.05-1.39 | 0.83 | 0.05-15.09 |
| Winston et al 37 | 0.65 | 0.18-2.37 | 0.44 | 0.04-5.38 |
| Gomez-Villagran et al 38 | 0.05 | 0.005-0.46† | 0.07 | 0.004-1.54 |

Abbreviations: OR, odds ratio; CI, confidence interval; NR, not reported.
*In the treatment (granulocyte transfusion) as compared with the control group.
†Statistically significant (P < .05) difference, because the 95% CI for the OR does not include the null value of 1.
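The quantities tabulated above (an OR, its 95% CI, and the rule that a CI excluding 1 indicates P < .05) can be illustrated with a short calculation. The sketch below assumes the common log-odds (Woolf) approximation and a 0.5 continuity correction for empty cells. The reviewed trials may have used a different interval method, the function names are hypothetical, and the counts in the usage note are invented for illustration.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """
    a, b: treated patients with / without the outcome
    c, d: control patients with / without the outcome
    Returns (OR, lower, upper) from the log-odds (Woolf) approximation.
    """
    if min(a, b, c, d) == 0:
        # Assumed continuity correction so the OR stays finite when a cell is empty.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def ci_excludes_null(lower, upper):
    """True when the interval lies entirely below or entirely above OR = 1."""
    return upper < 1.0 or lower > 1.0
```

For example, with hypothetical counts of 2 infections among 20 treated patients and 8 among 20 controls, `odds_ratio_with_ci(2, 18, 8, 12)` gives an OR of about 0.17 whose interval lies below 1, whereas an interval such as 0.55 to 2.64 includes the null value.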
findings of all 8 studies 31-38 could not be combined by the techniques of meta-analysis (P < .0001 for the Q test statistic). 39 The results of the available studies can be integrated in a meta-analysis only if the variation in the reported results is sufficiently modest to be attributed to chance, that is, only if there is a greater than 5% probability that any disagreements among the studies could have arisen by chance. The Q test statistic quantifies the probability that the variation in the reported results might have arisen by chance, indicating that there is sufficient agreement among the studies to permit the undertaking of a meta-analysis if P > .05 for the Q test statistic. 39 As reported previously, 39 the assessment of leukocyte compatibility before each granulocyte transfusion was the only study descriptor (Table 1) that could alone explain the disagreements among the studies. The 3 studies that detected a statistically significant (P < .05) benefit from granulocyte transfusions 31,33,38 were the only studies that had performed pretransfusion assessment of leukocyte compatibility, assuring that the recipient's serum contained no antibodies directed against antigens present on the transfused cells. There was also variation in the results of the studies 31-38 when mortality from bacterial or fungal infection during the study period was used as the outcome measure (P < .05 for the Q test statistic). As reported previously, 39 the dose of granulocytes
transfused, the assessment of leukocyte compatibility before each transfusion, and the duration of neutropenia could explain the variation in the reported findings (Tables 1 and 2). Granulocyte transfusions reduced (P < .05) the risk of death from bacterial or fungal infection if more than 1.0 × 10^10 granulocytes had been transfused daily; if compatible leukocytes had been administered; or if the average period of neutropenia in the enrolled patients had lasted for 10.6 to 14 days. (The summary OR of mortality across the combined RCTs with each aforementioned study characteristic was 0.21, 0.17, and 0.23, respectively.) Granulocyte transfusions did not confer a statistically significant benefit across the RCTs that had transfused a lower dose of granulocytes, had administered granulocytes without prior assessment of leukocyte compatibility, or had enrolled patients in whom the average period of neutropenia had lasted longer than 2 weeks. 39 When mortality from bacterial or fungal infection is used as the outcome measure, it is possible to also explain the variation in the results of the 8 RCTs by stratifying the available studies according to multicenter versus single-center design. If the multicenter RCT of Strauss et al 34 is excluded from the analysis, the disagreements among the remaining 7 single-center studies 31-33,35-38 are sufficiently modest to be attributed to chance (P > .25 for the Q test statistic). The summary OR of mortality
from bacterial or fungal infection across the 7 studies, which is calculated by the random-effects method of DerSimonian-Laird, 40 is 0.21 (95% confidence interval [CI], 0.09 to 0.50; P < .05).
PERIOPERATIVE ALLOGENEIC BLOOD TRANSFUSION AND BACTERIAL INFECTION
Seven RCTs 41-47 investigated the association of perioperative allogeneic blood transfusion with postoperative bacterial infection, which is attributed to a purported immunomodulatory effect of allogeneic blood transfusion. 48 According to the analyses reported by the authors of these studies, the allogeneic blood transfusion effect varied from an 11-fold increase (P < .01) 41 to a 10% reduction (P > .25) 43,44 in the risk of postoperative infection in the treatment (compared with the control) group of patients. Subjects in the former group received buffy-coat-reduced 42-46 or standard 41,47 allogeneic whole blood 41 or red blood cells (RBCs) 42-47; patients in the latter group were transfused with autologous or white blood cell (WBC)-reduced allogeneic whole blood or RBCs (Table 3). The intention-to-treat analyses shown in Table 3 similarly indicate a more than 7-fold variation among the findings of these studies. The magnitude of the disagreements among the 7 RCTs precludes any attempt at integration of the results by the techniques of meta-analysis (P < .0001 for the Q test statistic). 49,50 There were
Table 3. Randomized Controlled Trials Investigating the Association Between Perioperative Allogeneic Blood Transfusion and Postoperative Bacterial Infection

| Study | Sample Size | RBC Product Given to the Control Group | Odds Ratio of Postoperative Infection*: OR | Odds Ratio of Postoperative Infection*: 95% CI | P‡ | Overall Percent Developing Postoperative Infection† |
| --- | --- | --- | --- | --- | --- | --- |
| Jensen et al 41 | 197 | WBC-reduced allogeneic whole blood (filtered after storage) | 7.32 | 1.62-33.12 | .0033 | 8.1 |
| Heiss et al 42 | 120 | Autologous RBCs | 2.75 | 1.05-7.24 | .0419 | 20.0 |
| Busch et al 43 | 470 | Autologous whole blood | 0.89 | 0.59-1.34 | .60 | 26.0 |
| Houbiers et al 44 | 697 | WBC-reduced, BC-reduced allogeneic RBCs (filtered before storage) | 0.87 | 0.64-1.19 | .42 | 33.4 |
| Jensen et al 45 | 589 | WBC-reduced, BC-reduced allogeneic RBCs (filtered after storage) | 3.49 | 2.16-5.20 | <.0001 | 17.5 |
| Van de Watering et al 46 | 909 | WBC-reduced, BC-reduced allogeneic RBCs (filtered before [n = 302] or after [n = 302] storage) | 1.42 | 1.00-1.99 | .0547 | 19.3 |
| Tartter et al 47 | 221 | WBC-reduced allogeneic RBCs | 1.85 | 0.89-3.85 | .1076 | 16.7 |

Abbreviations: RBC(s), red blood cell(s); OR, odds ratio; CI, confidence interval; WBC-reduced, white blood cell-reduced; BC-reduced, buffy-coat-reduced.
*In the treatment (compared with the control) group; calculated according to an intention-to-treat analysis.
†In the entire study population.
‡Calculated by a 2-tailed Fisher's exact test.
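The P values in the table above are stated to come from a 2-tailed Fisher's exact test. A minimal sketch of that test for a single 2 × 2 table follows. It assumes the usual two-sided convention of summing the probabilities of all tables that are no more likely than the observed one; the function name and the example counts are hypothetical and are not drawn from the trials.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """
    2 x 2 table: a, b = treated with / without infection;
                 c, d = controls with / without infection.
    Returns the two-sided P value with all margins held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the treated group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Small tolerance so that ties with the observed table are included.
    total = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)
```

As a hand-checkable case, the table (3, 1, 1, 3) has possible tables with probabilities 1/70, 16/70, 36/70, 16/70, and 1/70, so the two-sided P value is 34/70.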
differences among the studies in the surgical setting where each trial was conducted (ie, gastrointestinal 41-45,47 versus open-heart 46 surgery), the homogeneity of the enrolled patient population (ie, patients undergoing elective surgery for any gastrointestinal disease 47 or any disease of the colorectum 41,45 versus patients undergoing elective colorectal cancer resection 42-44), the RBC product transfused to the treatment group (ie, allogeneic whole blood 41 or RBCs 47 versus buffy-coat-reduced allogeneic RBCs 42-46) or the control group (Table 3), the proportion of transfused patients (which ranged from 26.7% 47 to 94.8% 46), and the proportion of patients who developed postoperative infection according to the criteria used by the investigators (Table 3). The medical sources of variation in the findings of these RCTs 41-47 were discussed previously. 51,52 Blajchman 51 and Vamvakas and Blajchman 52 argued that differences in the RBC products transfused to each study arm, differences in the criteria used for diagnosing postoperative infections, and differences in the distribution of risk factors for postoperative infection among the enrolled patients may have been responsible, at least in part, for the disagreements among the studies. In addition, these authors 51-53 speculated that the multicenter design used in the RCTs of Busch et al 43 and Houbiers et al 44 may have biased the results of those 2 investigations toward the null. McAlister et al 54 observed that, for the most part, the disagreements among the 7 RCTs 41-47 were due to the results reported in 2 articles 41,45 by the same group of investigators. This group found an implausibly large allogeneic blood transfusion effect, because they observed an extremely low postoperative infection rate among the patients who received WBC-reduced RBCs.
In their 1996 study, Jensen et al 45 detected 0 postoperative wound infections and intraabdominal abscesses among 118 patients sick enough to need perioperative transfusion with WBC-reduced RBCs; if this finding were not due to the effect(s) of bias and confounding, it would implicate the immunomodulatory effects of allogeneic blood transfusion as the sole cause of postoperative wound infections, because the incidence of postoperative wound infections and intraabdominal abscesses among the 142 patients who were transfused with buffy-coat-reduced RBCs in that trial 45 was 12%. Accordingly, McAlister et al 54 combined the findings of the available RCTs 41-45 before and after
excluding the 2 trials of Jensen et al 41,45 from the analysis. It is now possible to update the results of that meta-analysis 54 by including the recent RCTs by van de Watering et al 46 and Tartter et al. 47 If the 2 RCTs by Jensen et al 41,45 are excluded, the disagreements among the remaining 5 studies are sufficiently modest to be attributed to chance (P > .05 for the Q test statistic). The summary OR of postoperative infection in the treatment (compared with the control) group across the 5 studies 42-44,46,47 is 1.17 (95% CI, 0.84 to 1.62; P >> .05). 40 The 2 RCTs by Jensen et al 41,45 are highly homogeneous (P = .50 for the Q test statistic), and, if the findings of these 2 studies 41,45 are integrated, 40 the summary OR is 3.68 (95% CI, 2.11 to 6.43; P < .001). It is also possible to explain the disagreements among the 7 RCTs 41-47 by excluding the 2 multicenter studies 43,44 from the analysis. When the 5 single-center RCTs 41,42,45-47 are integrated, 40 the disagreements among the studies are again sufficiently modest to be attributed to chance (P > .05 for the Q test statistic), and the summary OR across the 5 studies is 2.29 (95% CI, 1.35 to 3.87; P < .05). The 2 multicenter RCTs 43,44 are highly homogeneous (P = .95 for the Q test statistic), and if the findings of these 2 studies are integrated, 40 the summary OR is 0.88 (95% CI, 0.69 to 1.12; P >> .05). Either approach explains the initial heterogeneity among the 7 studies 41-47 equally well. When the 2 trials by Jensen et al 41,45 are excluded, the calculated Q test statistic across the remaining 5 studies 42-44,46,47 is 8.93 (4 degrees of freedom; P > .05). When the 2 multicenter RCTs 43,44 are excluded, the calculated Q test statistic across the remaining 5 studies 41,42,45-47 is 9.04 (4 degrees of freedom; P > .05).
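The Q test statistic and the DerSimonian-Laird summary OR used in these comparisons can be sketched as follows. This is an illustrative implementation under stated assumptions, not a reproduction of the reported analyses: it takes each study's log OR and variance as given, uses the standard inverse-variance formulas, and obtains the P value for Q from a chi-square distribution with k - 1 degrees of freedom. The function names are hypothetical, and results computed from rounded published ORs and CIs need not match the values quoted in the text.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability via a series for the incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 100000):
        term *= z / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def dersimonian_laird(log_ors, variances, z=1.96):
    """Return Q, its P value, tau^2, and the random-effects summary OR with 95% CI."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "q": q,
        "df": df,
        "p_heterogeneity": chi2_sf(q, df),
        "tau2": tau2,
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - z * se), math.exp(pooled + z * se)),
    }
```

Under the rule described in the text, a set of studies would be pooled only when `p_heterogeneity` exceeds .05. When Q does not exceed its degrees of freedom, tau^2 is set to zero and the random-effects estimate coincides with the fixed-effect one.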
The former approach attributes the initial disagreements among the 7 RCTs 41-47 to the possible effects of bias and residual, uncontrolled confounding factors in the 2 studies by Jensen et al, 41,45 as discussed below in the section on Systematic Error. The latter approach ascribes the noted discrepancies 41-47 to the reduced statistical power of multicenter (compared with single-center) RCTs, as discussed in the sections on the Center Effect and Random Error. Examining which of these 2 methodologic sources of variation is the most plausible in the case of the 7 RCTs of allogeneic transfusion and postoperative bacterial infection 41-47 is beyond the scope of this review. It
should be noted, however, that the 2 multicenter RCTs 43,44 were the only trials to report a postoperative infection rate for the entire study population that exceeded 20.0% (Table 3). Therefore, when the initial discrepancies among the available RCTs are explained by the exclusion of these 2 studies, 43,44 the variation among the findings 41-47 can be attributed to either the use of a single-center versus multicenter design, or the use of different criteria for the diagnosis of postoperative infection by the various groups of investigators. Table 4 summarizes the data on these 2 possible sources of variation in the findings of the 7 RCTs. 41-47
THERAPEUTIC PLASMA EXCHANGE IN CHRONIC PROGRESSIVE MULTIPLE SCLEROSIS
Four RCTs 55-58 examined the hypothesis that the addition of therapeutic plasma exchange (TPE) to an immunosuppressive drug regimen increases that regimen's efficacy in slowing the progression of disability in patients with chronic progressive multiple sclerosis (CPMS) (Table 5). In these studies, disability was measured before the intervention (ie, immunosuppression and TPE), as well as at various follow-up intervals (eg, at 6, 12, 18, 24, and 36 months of follow-up). The early trials of TPE in CPMS 55,56,59 produced mixed results. In 1985, the single-center, double-blind RCT of Khatri et al 57 reported impressive benefit from the addition of TPE to an immunosuppressive drug regimen. Patients from the control group received sham apheresis, and 2 neurologists blinded to the treatment allocations who were not involved in the care of the patients evaluated each patient. There was a large and statistically significant benefit from TPE at 5 and 11 months of follow-up (P = .007 and P = .017, respectively). The trial was interrupted for ethical reasons at 11
months, so that all enrolled patients could receive TPE. 57 The question arose as to why the results of Khatri et al 57 differed so substantially from the findings of the earlier studies. 55,56,59 However, a TPE protocol similar to that of Khatri et al had not been used in any of the earlier, single-center investigations (Table 5). The Canadian Cooperative Study 58 was undertaken to test the efficacy of the TPE protocol of Khatri et al 57 at 9 participating medical centers. In addition to a control group of subjects receiving immunosuppression alone, this trial also included a control group of patients who received no treatment. Enrolled patients were aware of the treatment allocations, and sham apheresis was not used. Each patient was followed by a (blinded) evaluating neurologist who produced neurological assessments every 6 months for the purposes of the trial, and by an (unblinded) monitoring neurologist who was primarily responsible for the care of the patient and could administer co-intervention (with adrenocorticotropic hormone or steroids) if he or she observed neurological deterioration. When the Canadian Cooperative Study was analyzed as a single-blind study, using the assessments submitted by the (blinded) evaluating neurologists, there was a slight, marginally significant trend favoring the TPE group at 12 (P = .086) and 18 (P = .106) months of follow-up, which was not sustained at 24 (P = .201) and 36 (P = .990) months; also, there was no benefit from TPE at 6 months (P = .246). However, TPE did produce a statistically significant benefit at 6, 12, 18, and 24 months of follow-up when the Canadian Cooperative Study was analyzed as an "open" trial, using the evaluations made by the (unblinded) monitoring neurologists (P = .047, P = .004, P = .072, and P = .031, respectively). In addition, fewer patients receiving TPE needed co-intervention, and
Table 4. Two Possible Sources of Variation in the Findings of Randomized Controlled Trials Investigating the Association Between Perioperative Allogeneic Blood Transfusion and Postoperative Bacterial Infection

| Grouping of Studies | Studies Included in the Meta-analysis | Q Test Statistic (P) | Summary Odds Ratio*: Point Estimate | Summary Odds Ratio*: 95% Confidence Interval |
| --- | --- | --- | --- | --- |
| Studies by Jensen et al 41,45 versus studies by other groups of investigators | RCTs by Jensen et al 41,45 | 0.464† (P = .50) | 3.68 | 2.11-6.43 |
| | Other RCTs 42-44,46,47 | 8.928‡ (P > .05) | 1.17 | 0.84-1.62 |
| Multicenter 43,44 versus single-center studies | Multicenter RCTs 43,44 | 0.004† (P = .95) | 0.88 | 0.69-1.12 |
| | Single-center RCTs 41,42,45-47 | 9.036‡ (P > .05) | 2.29 | 1.35-3.87 |

*Calculated by the random-effects method of DerSimonian-Laird. 40
†One degree of freedom (2 studies).
‡Four degrees of freedom (5 studies).
Table 5. Randomized Controlled Trials Evaluating the Efficacy of Therapeutic Plasma Exchange in Chronic Progressive Multiple Sclerosis

| Trial | Sample Size | Immunosuppressive Regimen: Treatment Group | Immunosuppressive Regimen: Control Group | Therapeutic Plasma Exchange Protocol (Treatment Group) |
| --- | --- | --- | --- | --- |
| Hauser et al 55 | 30 | IV ACTH plus oral cyclophosphamide | IV ACTH | 1-1.5 plasma volume 2-3 times/week for 2 weeks (total of 4-5 sessions) |
| Gordon et al 56 | 20 | Oral azathioprine plus oral prednisone | Oral azathioprine plus oral prednisone | 1-1.5 plasma volume 2-3 times/week for 3 weeks (total of 8 sessions) |
| Khatri et al 57 | 59 | Oral cyclophosphamide plus oral prednisone | Oral cyclophosphamide plus oral prednisone | 1-1.5 plasma volume 1 time/week for 20 weeks (total of 20 sessions) |
| Canadian Cooperative Study 58 | 112* | Oral cyclophosphamide plus oral prednisone | IV cyclophosphamide plus oral prednisone | 1 plasma volume 1 time/week for 20 weeks (total of 20 sessions) |

Abbreviations: IV, intravenous; ACTH, adrenocorticotropic hormone.
*Excluding a second control group of patients receiving no treatment.
the average time to the first co-intervention was longer in patients receiving TPE compared with controls. 60 At 36 months of follow-up, the monitoring neurologists concurred with the evaluating neurologists in recording no benefit from TPE (P = .590 and P = .990, respectively). The authors of the Canadian Cooperative Study concluded that the combination of a placebo with the occasional use of co-intervention, as needed, was as effective an approach for slowing the progression of disability in CPMS as was either one of the immunomodulating regimens used in that trial. In a subsequent report, Noseworthy et al 60 contrasted the blinded and unblinded neurologists' judgments of the patients' responses to TPE and emphasized that there would have been an important systematic error if the Canadian Cooperative Study had relied on the clinical assessments submitted by the (unblinded) monitoring neurologists.
THE CENTER EFFECT
Compared with single-center studies of the same association, multicenter RCTs may produce smaller (or statistically insignificant) estimates of the effect of an intervention, because of the impact of the "center effect" and the impact of random error. Both of these sources of variation bias the calculated estimate of a treatment effect toward the null. When a multicenter RCT is undertaken for the study of a particular association, the hospitals that participate in the trial usually differ in numerous and complex ways that can influence the resulting estimate of the treatment effect but are not captured (or standardized) by the research protocol. For example, in multicenter RCTs investigating the relationship between perioperative allogeneic blood transfusion and postoperative bacterial infection, 41-47 the research protocol adopted by all participating hospitals specified the extent of the indicated workup for establishing a diagnosis of infection and the criteria to be used for making that diagnosis. However, the rates of detection of postoperative infection could differ among hospitals, because of differences in patient-specific risk factors (eg, age, comorbidities, socioeconomic status, etc.), as well as differences in procedural risk factors that are difficult to standardize among participating institutions. The latter include the number and timing of preoperative enemas, the use of a gastric prophylaxis policy, the use of open drains and urinary catheters, the threshold for reoperations during the same admission, or for operating in the presence of a concurrent infection at another site, etc. 61
In clinical research, this constellation of institutional characteristics is usually referred to as the "center effect." Because of the "center effect," there was a statistically significant difference among the overall infection rates reported from the 16 hospitals participating in the RCT of Houbiers et al. 44 The results of the RCT of Busch et al 43 were reported only in summary form with regard to postoperative infections, 62 and it was not stated whether the infection rates differed among the 15 hospitals that were included in that study. Because participating hospitals usually differ in terms of both patient-specific and procedural risk
factors, patients must be randomized separately at each hospital where a multicenter RCT is undertaken. If randomization is performed separately at each site, and provided that each site enrolls an adequate number of subjects, the levels of patient-specific and procedural risk factors that constitute the "center effect" should be distributed equally between the treatment and the control group of patients from each participating center. However, the distribution of these patient-specific and procedural risk factors will differ among the treatment groups of the various sites, as well as among the control groups of the various participating centers. Therefore, patients from the treatment group of one hospital should not be compared directly with subjects from the control group of another medical center (or vice versa); patients who received the intervention under study at one hospital should be compared directly only with subjects who received standard therapy at that same site. This is especially important when the overall frequency of the outcome under study (eg, the overall postoperative infection rate) differs among the participating hospitals, because this indicates that various differences probably exist among the centers in patient-specific or procedural risk factors. When data from multicenter RCTs are analyzed, it is also necessary to stratify the analysis by participating hospital. For example, in multicenter RCTs investigating the relationship between perioperative allogeneic blood transfusion and postoperative bacterial infection, 43,44 an OR of postoperative infection in the treatment (compared with the control) group should be calculated separately for each hospital. If the hospital-specific ORs differ to an extent consistent with random sampling variation, 63 it is legitimate to combine the hospital-specific ORs into a common OR, using the method of Mantel-Haenszel 64 or some other equivalent method.
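The contrast between a stratified and a pooled analysis can be illustrated with a minimal sketch. The counts below are hypothetical and were invented for illustration only; they are not data from any of the cited trials. The function names are likewise assumptions of this sketch. Two hospitals are given the same within-hospital OR (about 0.44) but very different overall infection rates. The Mantel-Haenszel estimator recovers the common OR of about 0.44, whereas the single 2 x 2 table that ignores the hospital of origin yields an OR of 0.50, which lies closer to the null value of 1.

```python
# Hypothetical counts, invented for illustration (not data from any cited trial).
# Each tuple is one hospital's 2 x 2 table:
# (treated_infected, treated_not_infected, control_infected, control_not_infected)
hospitals = [
    (10, 90, 20, 80),   # hospital with a low overall infection rate
    (40, 60, 60, 40),   # hospital with a high overall infection rate
]


def odds_ratio(a, b, c, d):
    """OR of the outcome in the treatment versus the control group."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Common OR built only from within-hospital comparisons."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


def pooled_or(tables):
    """OR from a single 2 x 2 table that ignores the hospital of origin."""
    a, b, c, d = (sum(cells) for cells in zip(*tables))
    return odds_ratio(a, b, c, d)


hospital_ors = [odds_ratio(*table) for table in hospitals]
common_or = mantel_haenszel_or(hospitals)
crude_or = pooled_or(hospitals)

print("hospital-specific ORs:", [round(x, 3) for x in hospital_ors])
print("Mantel-Haenszel common OR:", round(common_or, 3))
print("pooled (single-table) OR:", round(crude_or, 3))
```

In this sketch each hospital is weighted by its own sample size, so a larger center would contribute proportionally more to the common OR.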
In the calculation of this common OR, 64 patients from the treatment group of each hospital are compared directly only with subjects from the control group of that same center, and the participating medical centers contribute to the size and the significance of the resulting common OR to an extent commensurate with the number of patients that they have enrolled. 17-19 Therefore, the common OR should be free of the "center effect," because (i) it is based on internal comparisons from within each medical center, and (ii) the treatment and the control group of patients from each center are
equivalent in terms of the levels of the factors that constitute the "center effect," given that randomization was performed separately at each center. 17-19 If, in lieu of calculating a common OR from a stratified analysis, 63,64 the data from all hospitals are pooled, and the reported OR is based on an analysis of a single 2 × 2 contingency table, the "center effect" biases the calculated estimate of the treatment effect toward the null. 17-19 In the example of the multicenter RCTs investigating the association of perioperative allogeneic blood transfusion with postoperative bacterial infection, 43,44 the presented pooled analysis may have failed to detect a true deleterious allogeneic blood transfusion effect, especially because the overall infection rate differed among the hospitals. 44 The importance of the "center effect" in abdominal surgery 41-45,47 was shown in a study that compared the observed postoperative infection rates among 20 surgical departments across Israel. 61 The infection rates varied from 0% to 65%, and the marked interdepartmental differences could not be accounted for by differences in patient case-mix. Procedural risk factors were of paramount importance in explaining the observed variation among those 20 surgical departments. 61 The data of Tabar et al 65 illustrate how much the "center effect" can bias the results of a multicenter RCT toward the null if a pooled analysis is used. These investigators studied the efficacy of mammography in reducing mortality from breast cancer in women from 2 Swedish counties, Kopparberg County and Ostergotland County. Baseline breast cancer mortality was almost twice as high in women serving as controls in Kopparberg County, as compared with Ostergotland County (207.0 and 123.9 deaths per 100,000 women, respectively).
The corresponding figures for women having mammography were 130.6 and 92.2 deaths per 100,000 women, respectively, indicating that women who had mammography in Kopparberg County fared worse than women who did not have mammography in Ostergotland County (130.6 and 123.9 deaths per 100,000 women, respectively). Mammography reduced breast cancer mortality by 37% in Kopparberg County, and by 26% in Ostergotland County. If a common OR based on internal comparisons from within each county 64 were to be calculated from these data, it would indicate a statistically significant (P < .05), 31% reduction in breast cancer mortality attributable to mammography in
the entire study population of the 2 Swedish counties. However, Tabar et al 65 presented a pooled analysis, using a single 2 × 2 contingency table for both counties, and computed a benefit from mammography in the entire study population that was smaller than the benefit from the procedure calculated separately from each county. When the data were pooled, there were 114.4 deaths per 100,000 women undergoing mammography and 151.5 deaths per 100,000 women not having the procedure; mammography reduced breast cancer mortality by 24% (P = .05) in both counties considered together in the pooled analysis, as compared with 37% in Kopparberg County considered alone and 26% in Ostergotland County considered alone.
RANDOM ERROR
In addition to differences in patient-specific and procedural risk factors that are not captured by the research protocol of a multicenter RCT, there may be differences among the hospitals that participate in a multicenter study in the implementation of some aspects of the research protocol per se. For example, in the multicenter RCTs that investigated the association of perioperative allogeneic blood transfusion with postoperative bacterial infection, 43,44 there could be differences among the hospitals in the interpretation of the uniform set of diagnostic criteria for postoperative infection. The medical staff of some centers could be more lenient (than the medical staff of other centers) in interpreting these criteria. In the absence of any bias on the part of the clinical investigators, differences in the manner in which diagnostic criteria are interpreted would be expected to occur randomly among the participating centers. To some extent, such differences may be unavoidable. However, these differences can be minimized by proper training and retraining of staff, certification procedures, duplicate testing for key variables, use of a centralized laboratory, etc. Also, staff at all centers should understand the research protocol definitions of baseline and outcome variables and should know how to complete forms and make diagnoses or perform tests. 1 Poorly standardized procedures or ambiguous definitions of baseline and outcome variables may result in missing data, incorrect data, or excess variability. Variability can be intrinsic to the characteristics being measured, the instrument(s) used for the measurement, or the observer responsible for obtaining the data. 66,67
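The consequence of random errors in applying diagnostic criteria can be sketched numerically. The following is only an illustration under stated assumptions: the true infection rates, the sensitivity, and the specificity are hypothetical values chosen for this sketch, and the same error rates are assumed to apply to both arms of the trial (that is, the errors are unrelated to treatment assignment). Under these assumptions the observed OR moves from the true value toward the null value of 1.

```python
def observed_rate(true_rate, sensitivity, specificity):
    """Outcome rate that is recorded when diagnoses are imperfect.

    True cases are detected with probability `sensitivity`; patients without
    the outcome are wrongly labeled as cases with probability 1 - specificity.
    """
    return sensitivity * true_rate + (1.0 - specificity) * (1.0 - true_rate)


def odds_ratio_from_rates(rate_treatment, rate_control):
    """OR of the outcome in the treatment versus the control group."""
    odds_treatment = rate_treatment / (1.0 - rate_treatment)
    odds_control = rate_control / (1.0 - rate_control)
    return odds_treatment / odds_control


# Hypothetical values, assumed for illustration only.
true_treatment, true_control = 0.20, 0.30
sensitivity, specificity = 0.80, 0.90   # identical in both arms

true_or = odds_ratio_from_rates(true_treatment, true_control)

obs_treatment = observed_rate(true_treatment, sensitivity, specificity)
obs_control = observed_rate(true_control, sensitivity, specificity)
observed_or = odds_ratio_from_rates(obs_treatment, obs_control)

print("true OR:", round(true_or, 3))
print("observed OR with random diagnostic error:", round(observed_or, 3))
```

The attenuation shown here depends on the errors being the same in both arms; errors that differ systematically between arms would constitute bias rather than random error.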
People perform tasks differently, and they may vary in their level of knowledge and experience; these factors often result in interobserver variability. Provided that all these errors occur in a truly random manner, they can only produce an increase in the inherent variation of the measured baseline and outcome variables, thus reducing the power of the study to detect a treatment effect. Quality control procedures pertaining to the various phases and aspects of data collection improve the accuracy of the collected information and enhance the power of an RCT to detect a treatment effect. Various procedures for reducing the magnitude of these random errors in clinical trials have been proposed. 1 Random error in RCTs is similar to random misclassification in observational studies, an issue that has been discussed extensively in the epidemiologic literature. 68-70
SYSTEMATIC ERROR
Random error and the "center effect" often reduce the power of a multicenter RCT to detect a treatment effect, but they do not affect the internal validity of the results of the study as reported. The main reason for undertaking an RCT is to establish a causal relationship, that is, to show a statistically significant treatment effect that cannot be ascribed to the effects of selection bias, observation bias, or confounding factors, m,6,9 Bias is any systematic error in the design of a study (or execution of the study protocol) that may result in a systematic deviation from the truth when the data are analyzed and reported. 3 (A deviation from the truth is considered to be "systematic" when it is not random, that is, when it always occurs in the same direction.) Selection (allocation) bias refers to systematic differences between patients who received the treatment under study and those who did not. 3 Observation (ascertainment) bias refers to systematic differences in the investigators' diagnoses of subsequent disease in patients who received the treatment under study and those who did not. 3 Confounding refers to the situation in which an unrecognized variable is associated with both the outcome of interest and its putative cause, producing a spurious association (or concealing a true relationship) between the treatment and the outcome of interest when the data are analyzed and reported. 4,5 To this author's knowledge, there have been no empirical comparisons of the extent of residual bias
or confounding between single-center and multicenter RCTs investigating the same hypothesis. However, although no empirical data are provided in support of this contention, a case can be made here that multicenter RCTs may be less susceptible to the effects of bias and confounding compared with single-center studies. Bias is introduced into RCTs when the investigators become aware of the next treatment assignment ~~ and channel participants preferentially to one or the other group, and when patients and investigators are unblinded and are influenced by notions about benefit from the treatment in reporting signs and symptoms or assessing disease activity. 8 In other words, bias is introduced into an RCT when those responsible for admitting patients into the study decipher the treatment allocation sequence, or when those assessing the occurrence of the outcome of interest become aware of the treatment allocations. Whether the issue is deciphering the treatment allocation sequence or breaking the blind, both situations can occur either inadvertently or deliberately, and they do not appear to represent rare occurrences. 7,s,1~ In multicenter RCTs, both the recruitment of patients for the study and the assessment of the outcome of interest are delegated to clinical (field) investigators. Compared with the principal investigator(s) and any sponsor(s) of the trial, the clinical investigators have much less at stake, because their career and professional reputation are much less dependent on the results of the study. Clinical investigators may thus not have preconceived notions about benefit from the intervention, and this reduces the potential for introducing observation or selection bias into a multicenter trial. 
In addition, in multicenter RCTs, the responsibilities for the operation of the trial are usually divided between the clinical centers and the assembly of the investigators on the one hand and the coordinating center, the central pharmacy, and the data-monitoring committee on the other hand. Although randomization is performed separately for each participating medical center, the treatment allocations are usually made centrally by the coordinating center, allowing field investigators no control over the allocation of local participants to treatment and control groups. A central pharmacy often labels and distributes the study medications, again assuring blinding of the field investigators. Moreover, responsibility for patient safety rests with the data-
monitoring committee, a feature that may make clinical investigators less inclined to break the blind to secure the safety of their patients. All of these factors combine to assure that those who have a stake in the results of a trial for professional or financial reasons have only limited control over the operation of a multicenter RCT. With regard to the potential for residual confounding, multicenter RCTs generally have a larger sample than single-center studies. Thanks to the enrollment of more patients, the play of chance is more likely to distribute the levels of all confounding factors equally between the arms of a multicenter RCT than among the groups of a single-center study. Also, the committee structure and the higher level of funding usually obtained for a multicenter RCT make available the expertise that is needed for the design, operation, and analysis of the trial. It can thus be expected that the effects of any residual confounding factors may be addressed more appropriately in the design and analysis of a multicenter RCT than in a single-center study, because more people are involved and more statistical expertise is available.
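The argument that a larger sample makes chance imbalance in a confounder less likely can be made concrete with a short calculation. This is a sketch under simplifying assumptions that are not taken from the text: a single binary confounder, simple 1:1 randomization, equal arm sizes, and a hypothetical prevalence of 30%. Under these assumptions, the standard error of the difference in confounder prevalence between the two arms is sqrt(2p(1 - p)/n), which is roughly 9 percentage points with 50 patients per arm and roughly 1 percentage point with 5,000 patients per arm.

```python
from math import sqrt


def se_of_imbalance(prevalence, n_per_arm):
    """Standard error of the between-arm difference in the proportion of
    patients carrying a binary confounder, under simple 1:1 randomization."""
    return sqrt(2.0 * prevalence * (1.0 - prevalence) / n_per_arm)


# Hypothetical prevalence and arm sizes, assumed for illustration only.
prevalence = 0.30
arm_sizes = (50, 500, 5000)

imbalance = {n: se_of_imbalance(prevalence, n) for n in arm_sizes}

for n, se in imbalance.items():
    print(f"{n:>5} patients per arm: typical chance imbalance about {se:.3f}")
```

A tenfold increase in sample size shrinks the typical chance imbalance by a factor of about 3.2 (the square root of 10), which is why very small trials cannot rely on randomization alone to balance confounders.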
CONCLUSIONS
In theory, random error, systematic error, and the "center effect" all work in the same direction, contributing to the detection of a smaller treatment effect by multicenter RCTs compared with single-center studies. This theoretical prediction was corroborated in all 3 areas of investigation discussed here. In the example of prophylactic granulocyte transfusions, the multicenter RCT of Strauss et al 34 was the only study in which patients receiving such transfusions had a statistically insignificant, higher frequency of bacterial or fungal infection (or of death from bacterial or fungal infection) during periods of severe neutropenia. In the example of the association of perioperative allogeneic blood transfusion with postoperative bacterial infection, the 2 multicenter RCTs of Busch et al 43 and Houbiers et al 44 were the only studies that did not detect a beneficial effect from the use of WBC-reduced or autologous RBCs in reducing the incidence of postoperative infection. Finally, in the example of the use of TPE in CPMS, there was a marked difference between the findings of the single-center trial of Khatri et al, 57 which detected an impressive benefit from TPE, and the results of the Canadian
Cooperative Study, 58 which found TPE to be equivalent to placebo. In addition to the use of a multicenter versus single-center design, however, there were other differences between the reviewed multicenter trials 34,43,44,58 that detected no treatment effect, and the corresponding single-center studies that reported benefit from the treatment or observed a statistically insignificant association suggesting a trend toward benefit (Tables 1, 3, and 5). As already discussed, in the multicenter trial of Strauss et al, 34 leukocyte compatibility had not been assessed before the transfusion, the daily dose of granulocytes transfused (0.7 × 10^10) had been less than 1 × 10^10, and the period of neutropenia in the enrolled patients (18.3 days) had lasted longer than in subjects included in studies reporting a beneficial effect (Tables 1 and 2). In the multicenter trials of Busch et al 43 and Houbiers et al, 44 the overall postoperative infection rate (26% and 33%, respectively) was higher than the rate reported from the single-center studies, suggesting that the criteria used in the multicenter trials 43,44 for the diagnosis of postoperative infection may have been more lenient, or that the multicenter studies may have included more patients with risk factors for postoperative infection than the single-center trials. Finally, the authors of the Canadian Cooperative Study 58 observed a difference between the estimates of the treatment effect calculated from the evaluations submitted by the blinded and the unblinded neurologists. 60 When the effect of observation bias was minimized by relying on the evaluations of the blinded neurologists, no benefit from TPE could be detected; when the Canadian Cooperative Study was analyzed as an "open" trial, using the evaluations of the unblinded neurologists, there was agreement between the results of this "open," multicenter trial and the findings of the double-blind, single-center RCT of Khatri et al. 
57 Despite the fact that the RCT of Khatri et al 57 was reported as a double-blind study, its findings were approached with circumspection at the time of its publication, 71 and the extent of neurological improvement recorded for 4 of the 30 patients from the treatment group was considered too good to be true. As already discussed, similar reservations 53,54 were expressed about the findings of the 2 RCTs of Jensen et al, 41,45 which were reported as single-blind studies. The difference between the results of the reviewed multicenter and single-center studies thus may be due either to the multicenter versus single-center design or to other factors, such as the effect of observation bias or the effects of differences in the employed treatment protocol, the enrolled patient populations, or the criteria used for assessing the occurrence of the outcome of interest. 39,51,52,71 Other possible reasons for disagreements may have been the effects of selection bias or the effects of residual, uncontrolled confounding factors. 53,54 There is no methodological reason that multicenter RCTs should be inherently superior or inferior to single-center studies in establishing a causal relationship. The reasons for the lower statistical power of multicenter RCTs compared with single-center studies, that is, the "center effect" and the greater likelihood of random error, can be minimized by means of an analysis that is stratified by participating hospital and by meticulous attention to the quality control of collected data and the standardization of procedures among participating centers. Similarly, the potentially greater susceptibility of single-center trials to the effects of selection bias, observation bias, and residual, uncontrolled confounding factors can usually be reduced by assuring correct implementation of an appropriate randomization procedure and blinding technique, and by increasing the sample size of a single-center trial to make it possible for the play of chance to equally distribute all confounding factors between the treatment and the control group of the study. In practice, single-center trials often cannot enroll a large enough study sample, and the undertaking of a multicenter trial becomes necessary. In addition to a larger study population, multicenter trials may assure a more representative sample of the target population. 
1 Geography, race, socioeconomic status, and lifestyles of participants may be more representative of the general patient population if participants are enrolled by many centers. 1 Also, the differences among the participating medical centers better reflect the conditions under which the treatment will be implemented if found effective than does the style of medical practice at the hospital where the principal investigators are located. Furthermore, if all hospital-specific ORs calculated from the centers participating in a multicenter study point in the same direction, the findings of the multicenter RCT provide some measure of internal corroboration that patients
from the treatment group do indeed derive benefit from the intervention under study. In other words, the multicenter organization of the study provides some assurance that the improved clinical outcome observed in the treatment group is probably attributable to the treatment per se, and not to some undetected error or bias that may have compromised the implementation of the treatment at a particular medical center. For these reasons, the results of a multicenter study are both more generalizable and more useful for policy making, compared with the findings of a single-center trial. 1,72 Although the results of any RCT need to be corroborated independently by other studies before a new intervention becomes standard therapy, before such corroboration becomes available one may be more inclined to believe the results of a multicenter (as compared with single-center) study. In summary, both multicenter and single-center RCTs can serve the purpose for which a prospective, randomized trial of a new intervention is undertaken: that is, they can establish a causal relationship between a treatment and an outcome, after eliminating the effects of selection bias, observation bias, and confounding factors. However, in practice, there may be differences between the results reported from single-center and multicenter RCTs, because the former may be more susceptible to the effects of bias and confounding, whereas the latter may have reduced statistical power for detecting a treatment effect. When the results of multicenter trials differ from the findings of single-center studies of the same hypothesis, the multicenter versus single-center design should be considered as a possible reason for the variation in the reported results, and both types of RCTs should be subjected to methodologic scrutiny. 73-76 The large, multicenter RCTs should be examined for the reporting of adequate quality control measures to assure consistency among hospitals in data collection; adequate standardization of all employed procedures among the participating medical centers; and a statistical analysis that is stratified by participating hospital. The small, single-center RCTs should be examined for the reporting of an appropriate, and adequately concealed, randomization procedure; an appropriate, and adequately concealed, blinding technique; as well as a detailed comparison of the levels of all potentially confounding factors between the treatment and the control arm of a trial. Guidelines for the reporting of RCTs have been developed, and they can assist both editors and readers in evaluating the methodologic quality of published studies. 77-80
REFERENCES
1. Friedman LM, Furberg CD, DeMets DL: Fundamentals of Clinical Trials (ed 3). St Louis, MO, Mosby, 1995
2. Passamani E: Clinical trials: Are they ethical? N Engl J Med 324:1589-1591, 1991
3. Sackett DL: Bias in analytic research. J Chron Dis 32:51-63, 1979
4. Miettinen OS: Components of the crude risk ratio. Am J Epidemiol 96:168-172, 1972
5. Miettinen OS, Cook EF: Confounding: Essence and detection. Am J Epidemiol 114:593-603, 1981
6. Hennekens CH, Buring JE: Epidemiology in Medicine. Boston, MA, Little, Brown, 1987, pp 30-53
7. Kleinbaum DG, Kupper LL, Morgenstern H: Epidemiologic research: Principles and quantitative methods. New York, NY, van Nostrand Reinhold Company, 1982, pp 220-240
8. Huskisson EC, Scott J: How blind is double-blind? And does it matter? Br J Clin Pharmacol 3:331-332, 1976
9. Elwood P: Causal Relationships in Medicine. New York, NY, Oxford University Press, 1988
10. Schultz KF, Chalmers I, Hayes RJ, et al: Empirical evidence of bias: Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 273:408-412, 1995
11. Schultz KF: Subverting randomization in controlled trials. JAMA 274:1456-1458, 1995
12. Pocock SJ, Lagakos SW: Practical experience of randomization in cancer trials: An international survey. Br J Cancer 46:368-375, 1982
13. Chalmers TC, Celano P, Sacks HS, et al: Bias in treatment assignment in controlled clinical trials. N Engl J Med 309:1358-1361, 1983
14. Lachin JM: Statistical properties of randomization in clinical trials. Control Clin Trials 9:289-311, 1988
15. Church TR, Ederer F, Mandel JS, et al: Estimating the duration of ongoing prevention trials. Am J Epidemiol 137:797-810, 1993
16. Whitehead J: Sample sizes for phase II and phase III clinical trials: An integrated approach. Stat Med 5:459-464, 1986
17. O'Gorman TW, Woolson RF, Jones MP: A comparison of two methods of estimating a common risk difference in a stratified analysis of a multicenter clinical trial. Control Clin Trials 15:135-153, 1994
18. Howard VJ, Gizzle J, Diener HC, et al: Comparison of multicenter study designs for investigation of carotid endarterectomy efficacy. Stroke 23:583-593, 1992
19. Fleiss JL: Multicenter clinical trials: Bradford Hill's contributions and some subsequent developments. Stat Med 1:353-359, 1982
20. Friedman L, DeMets D: The data monitoring committee: How it operates and why. IRB 3:6-8, 1981
21. Fleming TR, DeMets DL: Monitoring of clinical trials: Issues and recommendations. Control Clin Trials 14:183-197, 1993
22. Cohen J: Clinical trial monitoring: Hit or miss? Science 264:1354-1357, 1994
23. Meinert CL: Organization of multicenter clinical trials. Control Clin Trials 1:305-312, 1981
24. Meinert CL, Heinz EC, Forman SA: Role and methods of the coordinating center. Control Clin Trials 4:355-375, 1983
25. Gluud C, Sorensen TIA: New developments in the conduct and management of multicenter trials: An international review of clinical trial units. Fundam Clin Pharmacol 9:284-289, 1995
26. Meinert CL: In defense of the corporate author for multicenter trials. Control Clin Trials 14:255-260, 1993
27. Meinert CL: NIH multicenter investigator initiated trials: An endangered species? Control Clin Trials 9:97-120, 1988
28. Kamb ML, Dillon BA, Fishbein M, et al: Quality assurance of HIV prevention counseling in a multicenter randomized controlled trial. Public Health Rep 111:99-107, 1996 (suppl 1)
29. Cooperative Studies Program: Guidelines for the planning and conduct of cooperative studies. Washington, DC, Department of Veterans Affairs Office of Research and Development, 1997
30. Lock S, Wells F (eds): Fraud and Misconduct in Medical Research. London, UK, British Medical Journal Publishing Group, 1993
31. Clift RA, Sanders JE, Thomas ED, et al: Granulocyte transfusions for the prevention of infection in patients receiving bone marrow transplants. N Engl J Med 298:1052-1057, 1978
32. Schiffer CA, Aisner J, Daly PA, et al: Alloimmunization following prophylactic granulocyte transfusion. Blood 54:766-774, 1979
33. Mannoni P, Rodet M, Vernant JP, et al: Efficiency of prophylactic granulocyte transfusion in preventing infections in acute leukemia. Blood Transfus Immunohematol 22:503-518, 1979
34. Strauss RG, Connett JE, Gale RP, et al: A controlled trial of prophylactic granulocyte transfusions during initial induction chemotherapy for acute myelogenous leukemia. N Engl J Med 305:597-603, 1981
35. Sutton DMC, Shumak KH, Baker MA: Prophylactic granulocyte transfusions in acute leukemia. Plasma Ther Transfus Technol 3:45-50, 1982
36. Ford JM, Cullen MH, Roberts MM, Brown LM, Oliver RTD, Lister TA: Prophylactic granulocyte transfusions: Results of a randomized controlled trial in patients with acute myelogenous leukemia. Transfusion 22:311-316, 1982
37. Winston DJ, Ho WG, Young LS, Gale RP: Prophylactic granulocyte transfusions during human bone marrow transplantation. Am J Med 68:893-897, 1982
38. Gomez-Villagran JL, Torres-Gomez A, Gomez-Garcia P, et al: A controlled trial of prophylactic granulocyte transfusions during induction chemotherapy for acute nonlymphoblastic leukemia. Cancer 54:734-738, 1984
39. Vamvakas EC, Pineda AA: Determinants of the efficacy of prophylactic granulocyte transfusions: A meta-analysis. J Clin Apheresis 12:74-81, 1997
40. DerSimonian R, Laird N: Meta-analysis in clinical trials. Control Clin Trials 7:177-188, 1986
41. Jensen LS, Andersen AJ, Christiansen PM, et al: Postoperative infection and natural killer cell function following blood transfusion in patients undergoing elective colorectal surgery. Br J Surg 79:513-516, 1992
42. Heiss MM, Mempel W, Jausch K-W, et al: Beneficial effect of autologous blood transfusion on infectious complications after colorectal cancer surgery. Lancet 342:1328-1333, 1993
43. Busch ORC, Hop WCJ, van Papendrecht MAWH, et al: Blood transfusions and prognosis in colorectal cancer. N Engl J Med 328:1372-1376, 1993
44. Houbiers JGA, Brand A, van de Watering LMG, et al: Randomized controlled trial comparing transfusion of leukocyte-depleted or buffy-coat-depleted blood in surgery for colorectal cancer. Lancet 344:573-578, 1994
45. Jensen LS, Kissmeyer-Nielsen P, Wolff B, et al: Randomized comparison of leukocyte-depleted versus buffy-coat-poor blood transfusion and complications after colorectal surgery. Lancet 348:841-845, 1996
46. van de Watering LMG, Hermans J, Houbiers JGA, et al: Beneficial effect of leukocyte depletion of transfused blood on post-operative complications in patients undergoing cardiac surgery: A randomized clinical trial. Circulation 97:562-568, 1998
47. Tartter PI, Mohandas K, Azar P, et al: Randomized trial comparing packed red blood cell transfusion with and without leukocyte depletion for gastrointestinal surgery. Am J Surg 176:462-466, 1998
48. Vamvakas EC, Blajchman MA (eds): Immunomodulatory effects of blood transfusion. Bethesda, MD, American Association of Blood Banks Press, 1999
49. L'Abbe KA, Detsky AS, O'Rourke K: Meta-analysis in clinical research. Ann Intern Med 107:224-233, 1987
50. Cooper H, Hedges LV (eds): The Handbook of Research Synthesis. New York, NY, Russell Sage Foundation, 1994
51. Blajchman MA: Allogeneic blood transfusions, immunomodulation and postoperative bacterial infection: Do we have the answers yet? Transfusion 37:121-125, 1997
52. Vamvakas EC, Blajchman MA: A proposal for an individual patient data-based meta-analysis of randomized controlled trials of allogeneic transfusion and postoperative bacterial infection. Transfus Med Rev 11:180-194, 1997
53. Vamvakas E: Transfusion-associated cancer recurrence and infection: Meta-analysis of the randomized controlled clinical trials. Transfusion 36:175-186, 1996
54. McAlister FA, Clark HD, Wells PS, et al: Perioperative allogeneic blood transfusion does not cause adverse sequelae in patients with cancer: A meta-analysis of unconfounded studies. Br J Surg 85:171-178, 1998
55. Hauser SL, Dawson DM, Lehrich JR, et al: Intensive immunosuppression in progressive multiple sclerosis: A randomized three-arm study of high dose intravenous cyclophosphamide, plasma exchange and ACTH. N Engl J Med 308:173-180, 1983
56. Gordon PA, Carroll DJ, Etches WS, et al: A double-blind controlled pilot study of plasma exchange versus sham apheresis in chronic progressive multiple sclerosis. Can J Neurol Sci 12:39-44, 1985
57. Khatri BO, McQuillen MP, Harrington GJ, et al: Chronic progressive multiple sclerosis: Double-blind controlled trial of plasmapheresis in patients taking immunosuppressive drugs. Neurology 35:312-319, 1985
58. The Canadian Cooperative Multiple Sclerosis Study Group: The Canadian cooperative trial of cyclophosphamide and plasma exchange in progressive multiple sclerosis. Lancet 337:441-446, 1991
59. Tindall RSA, Walker JE, Ehle AL, et al: Plasmapheresis in multiple sclerosis: Prospective trial of pheresis and immunosuppression versus immunosuppression alone. Neurology (NY) 32:739-743, 1982
60. Noseworthy JH, Ebers GC, Vandervoort MK, et al: The impact of blinding on the results of a randomized, placebo-controlled multiple sclerosis clinical trial. Neurology 44:16-20, 1994
61. Simchen E, Zucker D, Siegman IY, et al: Method for separating patient and procedural factors while analyzing interdepartmental differences in rates of surgical infections: The Israeli study of surgical infection in abdominal operations. J Clin Epidemiol 49:1003-1007, 1996
62. Busch ORC, Hop WCJ, Marquet RL, et al: Autologous blood and infections after colorectal surgery. Lancet 343:668-669, 1994 (letter)
63. Woolf B: On estimating the relation between blood group and disease. Ann Hum Genet 19:251-253, 1955
64. Mantel N, Haenszel W: Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22:719-748, 1959
65. Tabar L, Fagerberg CJG, Gad A, et al: Reduction in mortality from breast cancer after mass screening with mammography. Randomized trial from the Breast Cancer Screening Working Group of the Swedish National Board of Health and Welfare. Lancet i:829-833, 1985
66. Koran LM: The reliability of clinical methods, data and judgments. Part 1. N Engl J Med 293:642-646, 1975
67. Koran LM: The reliability of clinical methods, data, and judgments. Part 2. N Engl J Med 293:695-701, 1975
68. Copeland KT, Checkoway H, Holbrook RH, et al: Bias due to misclassification in the estimate of relative risk. Am J Epidemiol 105:488-495, 1977
69. Greenland S: The effect of misclassification in the presence of co-variates. Am J Epidemiol 112:564-569, 1980
70. Gullen WH, Bearman JE, Johnson EA: Effects of misclassification in epidemiologic studies. Public Health Rep 53:1956-1965, 1968
71. Weiner HL: An assessment of plasma exchange in progressive multiple sclerosis. Neurology 35:320-322, 1985 (editorial)
72. Mant D: Can randomized trials inform clinical decisions about individual patients? Lancet 353:743-746, 1999
73. Jadad AR, Moore RA, Carroll D, et al: Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Control Clin Trials 17:1-12, 1996
74. Moher D, Jadad AR, Nichol G, et al: Assessing the quality of randomized controlled trials: An annotated bibliography of scales and checklists. Control Clin Trials 16:62-73, 1995
75. Moher D, Jadad AR, Tugwell P: Assessing the quality of randomized controlled trials. Int J Technol Assess Health Care 12:195-208, 1996
76. Chalmers TC, Smith H Jr, Blackburn B, et al: A method for assessing the quality of a randomized controlled trial. Control Clin Trials 2:31-49, 1981
77. The Standards of Reporting Trials Group: A proposal for structured reporting of randomized controlled trials. JAMA 272:1926-1931, 1994
78. The Asilomar Working Group on Recommendations for Reporting of Clinical Trials in the Biomedical Literature: Checklist of information for inclusion in reports of clinical trials. Ann Intern Med 124:741-743, 1996
79. Begg CB, Cho MK, Eastwood S, et al: Improving the quality of reporting of randomized controlled trials: The CONSORT statement. JAMA 276:637-639, 1996
80. Rennie D: How to report randomized controlled trials: The CONSORT statement. JAMA 276:649, 1996