Gynecology
Methodology citations and the quality of randomized controlled trials in obstetrics and gynecology
David A. Grimes, MD,a and Kenneth F. Schulz, PhD, MBAb
San Francisco, California, and Atlanta, Georgia
OBJECTIVES: Randomized controlled trials offer the best chance for valid treatment comparisons, yet most trials are of poor quality. This may reflect a lack of awareness of the requirements for conducting and reporting this type of research. If so, then citation of methodology references might indicate knowledge of how to conduct these studies and vice versa. Our study tests the hypothesis that the methodologic quality of published trials is related to citation of methodology references.
STUDY DESIGN: We performed a hand search of the American Journal of Obstetrics and Gynecology, the British Journal of Obstetrics and Gynaecology, the Journal of Obstetrics and Gynaecology, and Obstetrics and Gynecology to identify all randomized controlled trials published in 1990 and 1991 (N = 206). We reviewed the reference lists of all reports of randomized controlled trials and evaluated the adequacy of randomization methods by accepted criteria.
RESULTS: Most reports (81.6%) cited no methodology text or article. Although lack of any methodology reference was not significantly related to failure to report an adequate random method of sequence generation, this was highly related (p < 0.001) to failure to report adequate allocation concealment. Scanning the reference list of reports took a mean of 16 seconds and identified most poorly done trials.
CONCLUSIONS: Investigators who conduct randomized controlled trials should be thoroughly familiar with this type of research or should get expert help. Poorly done trials are wasteful and often misleading. (Am J Obstet Gynecol 1996;174:1312-5.)
Key words: Randomized controlled trials, methodology, citations, bias, trial quality
From the Department of Obstetrics, Gynecology and Reproductive Sciences and the Department of Epidemiology and Biostatistics, University of California, San Francisco,a and the Division of Sexually Transmitted Disease Prevention, National Center for HIV, STD, and TB Prevention, Centers for Disease Control and Prevention.b
Received for publication June 29, 1995; revised September 18, 1995; accepted September 28, 1995.
Reprint requests: David A. Grimes, MD, Department of Obstetrics, Gynecology and Reproductive Sciences, Ward 6D14, San Francisco General Hospital, San Francisco, CA 94110.
6/1/69633
Randomized controlled trials are the only known way to avoid selection biases that handicap observational studies.1 Despite their scientific merit, randomized controlled trials are uncommon in most medical disciplines, including obstetrics and gynecology.2 Most of those that are done in obstetrics and gynecology violate the rules of conduct.3 Poorly done randomized controlled trials may be subject to bias and thus more closely resemble cohort studies than randomized trials. Hence their results may be incorrect or misleading.2 Indeed, those that inadequately conceal the allocation process are associated with exaggerated estimates of treatment effects (evidence of bias).5 Why are most randomized controlled trials done poorly? While examining the scientific quality of randomized controlled trials in four journals of obstetrics and gynecology,3 we observed an association that may partly explain this deficiency. We noted that poorly done trials infrequently cited epidemiologic or biostatistical references relevant to trials. Hence poor trials likely result from investigators' lack of expertise with this type of research. We hypothesized that citation of methodology references might indicate better research and vice versa.
Material and methods
We hand-searched and reviewed all reports of parallel (uncrossed) randomized controlled trials published in the American Journal of Obstetrics and Gynecology, the British Journal of Obstetrics and Gynaecology, the Journal of Obstetrics and Gynaecology (from the United Kingdom), and Obstetrics and Gynecology in the 1990 and 1991 volumes (N = 206). We then supplemented the hand search with a review of the Oxford Database of Perinatal Trials volume 86 and MEDLINE. We focused on two critical indicators of trial quality: generation of a random sequence of assignments and subsequent concealment of the assignments up to the point of treatment.
Volume 174, Number 4 Am J Obstet Gynecol
Table I. Proportion of reports of randomized controlled trials with one or more methodology references, by journal, 1990 and 1991
Journal | No. with ≥1 methodology reference | Total No. of randomized controlled trials | % with ≥1 methodology reference
Journal of Obstetrics and Gynaecology | 0 | 20 | 0.0
American Journal of Obstetrics and Gynecology | 8 | 64 | 12.5
Obstetrics and Gynecology | 15 | 74 | 20.3
British Journal of Obstetrics and Gynaecology | 15 | 48 | 31.3
Total | 38 | 206 | 18.4
Development of a random sequence of treatment allocation avoids selection bias in controlled trials. We considered the following techniques of sequence generation to be adequate: computer, random number table, shuffled cards or tossed coins, and minimization.2,8 Inadequate methods included assignment by odd-even numbers (birth date or hospital number) or alternate assignment. These latter approaches are nonrandom.4,8,9
Reduction of bias in trials depends on avoiding foreknowledge of upcoming treatment assignments: persons enrolling participants must be unaware of the next assignment. Sometimes called "randomization blinding," this process is better termed "allocation concealment." This distinguishes the process from "blinding" or "masking" the treatment arms. Allocation concealment, which primarily addresses selection bias, is always possible in a trial, whereas blinding the treatment, which primarily addresses ascertainment bias, may not always be possible.
In 1983 Chalmers et al.7 showed that randomized trials in which the allocation sequence had not been adequately concealed before allocation found larger estimates of treatment effects than did trials adequately concealed; this likely reflected bias. Recent empiric evidence from trials in obstetrics and gynecology5 confirmed the importance of allocation concealment: inadequately concealed trials exaggerated odds ratios by 41%, on average, compared with adequately concealed trials (p < 0.001). Those with unclear concealment exaggerated odds ratios by 30%, after adjustment for other aspects of quality. Inadequate sequence generation did not have a similar effect5; hence we judged trials as deficient if they did not ensure concealment of the allocation schedule, since this failure distorted the treatment effect by 30% to 41% on average.
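The contrast drawn earlier in this section between adequate (computer-generated) and inadequate (alternate) assignment can be illustrated with a minimal sketch. The function names, the two-arm labels, and the seed are hypothetical choices made for illustration; the reviewed trials did not necessarily generate their sequences this way, and the sketch covers sequence generation only, not allocation concealment.

```python
import random


def simple_random_sequence(n, arms=("A", "B"), seed=None):
    """Simple (unrestricted) randomization: each assignment is drawn
    independently, so the next one cannot be deduced from earlier ones."""
    rng = random.Random(seed)  # a seed is used here only for reproducibility
    return [rng.choice(arms) for _ in range(n)]


def alternate_assignment(n, arms=("A", "B")):
    """Alternation: a nonrandom method. Anyone who knows the previous
    assignment knows the next one."""
    return [arms[i % len(arms)] for i in range(n)]


if __name__ == "__main__":
    print(simple_random_sequence(10, seed=1996))
    print(alternate_assignment(10))
```

Even a properly generated sequence protects against selection bias only if it stays hidden from the persons enrolling participants, which is why the two indicators are assessed separately.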
We dichotomized the trials into those with adequate allocation concealment and those without. Adequate approaches to allocation concealment included central randomization (e.g., telephone calls to a trials office); randomization by a pharmacy; numbered or coded containers; sequentially numbered, opaque, sealed envelopes containing method indicator cards; or other methods that provided convincing evidence of concealment. Trials without evidence of adequate concealment included those using alternation or allocation that was based on birth dates or chart numbers, as well as those with unclear concealment.
To assess whether scanning reference lists might identify deficient trials, we calculated its validity as a test for improperly done trials. We suspected that failure to cite at least one methodology reference might signal articles with deficient methods.
The first author (D.A.G.) reviewed all 206 reports again. He read the reference list of each and noted whether it included any references concerning epidemiology, biostatistics, or randomized controlled trial methods. He timed each review with a stopwatch. The other author (K.F.S.) independently read the reference lists from the articles published in the British Journal of Obstetrics and Gynaecology (n = 48). There were no discrepancies between the two assessments for this subsample; both determined that the same 15 of 48 trials cited one or more methodology references.
We used Epi Info version 6 (reference 11) to determine confidence intervals for the indexes of validity and to calculate Cohen's κ coefficients. The κ coefficient measures the association between nominal data in a 2 × 2 table by comparing the observed agreement with the agreement expected by chance alone.12
Using standard criteria for evaluating randomization techniques,7 we found that only 32% of reports described an adequate random method of sequence generation.3 Concealment of the sequence of assigned treatments until allocation is an even more critical element of proper randomization. Only 23% documented concealment of the treatment assignment until the point of allocation.3 A mere 9% described an adequate method for both sequence generation and allocation concealment.
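The κ coefficient described in this section can be sketched in a few lines. This is an illustrative calculation, not the Epi Info routine the authors used; the function name is hypothetical, and the cell counts are taken from Tables II and III, with "no methodology reference and inadequate method" and "one or more references and adequate method" treated as the agreement cells.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Cohen's kappa for a 2 x 2 table [[a, b], [c, d]],
    where a and d are the agreement cells."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the products of the matching marginal totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


# Rows: 0 references, >=1 reference. Columns: inadequate, adequate.
kappa_sequence = cohen_kappa_2x2(117, 51, 23, 15)     # Table II counts
kappa_concealment = cohen_kappa_2x2(139, 29, 19, 19)  # Table III counts

if __name__ == "__main__":
    print(round(kappa_sequence, 2), round(kappa_concealment, 2))
```

With these counts the sketch gives values that round to 0.07 and 0.30, which agree with the coefficients reported in the Results.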
Results
Most reports (81.6%) of randomized controlled trials published in these four journals had no references to methodology articles or texts (Table I). The likelihood of a report citing one or more such references varied significantly among journals (p = 0.01), with the Journal of Obstetrics and Gynaecology having the lowest and the British Journal of Obstetrics and Gynaecology the highest.
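The report does not name the test behind the comparison among journals. As an assumption made only for illustration, the sketch below applies a Pearson chi-square test to the Table I counts (four journals by cited or did not cite, 3 degrees of freedom); the function names are hypothetical, and the tail probability uses the closed form that holds for 3 degrees of freedom.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic for a table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Table I counts per journal: [with >=1 methodology reference, with none]
table_i = [
    [0, 20],   # Journal of Obstetrics and Gynaecology
    [8, 56],   # American Journal of Obstetrics and Gynecology
    [15, 59],  # Obstetrics and Gynecology
    [15, 33],  # British Journal of Obstetrics and Gynaecology
]

statistic = pearson_chi_square(table_i)
p_value = chi_square_sf_df3(statistic)

if __name__ == "__main__":
    print(round(statistic, 2), round(p_value, 3))
```

Under that assumption the statistic is about 11.4 and the tail probability is close to the reported p = 0.01. One expected count falls below 5, so an exact test might be preferred in practice.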
April 1996 Am J Obstet Gynecol
Table II. Number of methodology references and adequacy of random sequence generation in 206 randomized controlled trials
No. of methodology references | Sequence generation inadequate: No. | Sequence generation inadequate: % | Sequence generation adequate: No. | Sequence generation adequate: % | Total: No. | Total: %
0 | 117 | 83.6 | 51 | 77.3 | 168 | 81.6
≥1 | 23 | 16.4 | 15 | 22.7 | 38 | 18.4
Total | 140 | 100.0 | 66 | 100.0 | 206 | 100.0
Table III. Number of methodology references and adequacy of allocation concealment in 206 randomized controlled trials
No. of methodology references | Allocation concealment inadequate: No. | Allocation concealment inadequate: % | Allocation concealment adequate: No. | Allocation concealment adequate: % | Total: No. | Total: %
0 | 139 | 88.0 | 29 | 60.4 | 168 | 81.6
≥1 | 19 | 12.0 | 19 | 39.6 | 38 | 18.4
Total | 158 | 100.0 | 48 | 100.0 | 206 | 100.0
Citation of methodology references was not significantly related (κ coefficient 0.07, one-tailed p = 0.14) to the likelihood of reporting an adequate random method of sequence generation (Table II). Failure to cite any such references had a sensitivity of 0.84 (95% confidence interval 0.76 to 0.89) in identifying trials with inadequate description of sequence generation. The specificity was 0.23 (95% confidence interval 0.14 to 0.35), the positive predictive value 0.70 (95% confidence interval 0.62 to 0.76), and the negative predictive value 0.40 (95% confidence interval 0.25 to 0.57).
However, citation of methodology references was strongly related (κ coefficient 0.30, one-tailed p = 0.000008) to the probability of reporting adequate allocation concealment (Table III). Of those trials that adequately described concealment of the allocation sequence, 40% had one or more such references. Only 12% of those with inadequate descriptions did. Failure to cite any such references had a sensitivity in identifying trials with inadequate description of concealment of 0.88 (95% confidence interval 0.82 to 0.92). The specificity was 0.40 (95% confidence interval 0.26 to 0.55), the positive predictive value 0.83 (95% confidence interval 0.76 to 0.88), and the negative predictive value 0.50 (95% confidence interval 0.34 to 0.66).
Scanning the reference lists of the articles took little time. For all 206 reports the mean time was 15.5 seconds (SD 7.4). This varied with the number and complexity of the references from a mean of 9.6 seconds (SD 5.8) for the Journal of Obstetrics and Gynaecology to 18.1 seconds (SD 7.0) for the British Journal of Obstetrics and Gynaecology.
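The indexes of validity reported above for allocation concealment can be recomputed from the Table III counts with a short sketch. Here a "positive test" means no methodology reference and the "condition" is an inadequate description of concealment. The function names are hypothetical, and the confidence intervals (obtained by the authors from Epi Info) are not reproduced.

```python
def validity_indexes(tp, fp, fn, tn):
    """Sensitivity, specificity, and predictive values from 2 x 2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value at a given prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Table III: 139 uncited and inadequate, 29 uncited but adequate,
# 19 cited but inadequate, 19 cited and adequate.
concealment = validity_indexes(tp=139, fp=29, fn=19, tn=19)
ppv_half = ppv_at_prevalence(
    concealment["sensitivity"], concealment["specificity"], prevalence=0.50
)

if __name__ == "__main__":
    print({name: round(value, 2) for name, value in concealment.items()})
    print(round(ppv_half, 2))
```

The second function shows why a predictive value depends on how common poorly reported trials are: holding sensitivity and specificity fixed, lowering the prevalence from the observed 77% to 50% lowers the positive predictive value from 0.83 to about 0.59.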
Comment
The requirements for the conduct and reporting of trials are well established, yet most investigators do not heed them. We suspect that naiveté rather than negligence is responsible; our analysis provides indirect support for the former explanation. Randomized controlled trials are an important but eclectic area of medicine. By analogy, surgeons do not attempt to perform complex operations without proper training or supervision; both the patient and the physician would likely suffer long-term consequences. Nevertheless, physicians and other investigators routinely carry out randomized controlled trials without acknowledging in print that they have either read how to do the study or gotten help with it. We believe this is inappropriate.
A flawed report does not necessarily mean that the underlying study was flawed. Some authors may have used proper methods of random sequence generation and allocation concealment but inadvertently omitted the required description of methods from their reports. At best, this reflects a lack of understanding of the reporting requirements for such studies.14,15 However, inadequate reporting usually reflects inadequate methods.
Of the four journals, the British Journal of Obstetrics and Gynaecology had the highest proportion of well-designed trials. It also had the highest proportion of reference lists with one or more methodology citations. We do not believe this is coincidence. The British Journal of Obstetrics and Gynaecology has taken deliberate steps to improve the quality of science that it publishes.18 Nevertheless, much work remains to be done. Some journals now refuse to publish reports that violate the rules of conduct. Until the quality of published trials improves, readers need to be wary of what appears in print.4
If a report had no methodology citation, the probability that the report had an inadequate description of the randomization process was 70%. Similarly, the likelihood of an inadequate description of allocation concealment was 83%. This simple procedure can identify most poor-quality reports of randomized controlled trials in about 16 seconds without the methods section of the article even being read. Regrettably, the negative predictive values were poor, and we do not recommend this as a test of trial quality.
The positive predictive value of this test of trial quality would decline in conditions of lower prevalence (i.e., better studies). For example, if the prevalence of poor-quality trials as judged by allocation concealment were 50% instead of 77% (Table III), the predictive value of failure to cite one or more methodology references would be lower than 83%. In a sample of 200 trials with the same sensitivity and specificity as in Table III, the positive predictive value would fall to 59%.
Flawed trials are the norm in obstetrics and gynecology and in other fields as well.14,15 Regardless of the source of this problem, the net effect is the same: poor science, which can lead to poor clinical practice.4 Because of this, one critic22 argues that performing poor-quality research is unethical. Clearly, investigators who conduct randomized controlled trials should be thoroughly familiar with their methods or should get help from experts. Dabbling in randomized controlled trials may be hazardous to our patients' health.
REFERENCES
1. Peto R, Pike MC, Armitage P, et al. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design. Br J Cancer 1976;34:585-612.
2. Tulandi T, Cherry N. Clinical trials in reproductive surgery: randomization and life-table analysis. Fertil Steril 1989;52:12-4.
3. Schulz KF, Chalmers I, Grimes DA, Altman DG. Assessing the quality of randomization from reports of controlled trials published in obstetrics and gynecology journals. JAMA 1994;272:125-8.
4. Grimes DA. Randomized controlled trials: "it ain't necessarily so." Obstet Gynecol 1991;78:703-4.
5. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995;273:408-12.
6. Chalmers I, Hetherington J, Newdick M, et al. The Oxford Database of Perinatal Trials: developing a register of published reports of controlled trials. Control Clin Trials 1986;7:306-24.
7. Chalmers TC, Celano P, Sacks HS, Smith H Jr. Bias in treatment assignment in controlled clinical trials. N Engl J Med 1983;309:1358-61.
8. Pocock SJ. Clinical trials: a practical approach. Chichester, United Kingdom: John Wiley, 1983.
9. Meinert CL. Clinical trials: design, conduct, and analysis. New York: Oxford University Press, 1986.
10. Altman DG, Dore CJ. Randomisation and baseline comparisons in clinical trials. Lancet 1990;335:149-53.
11. Dean AG, Dean JA, Coulombier D, et al. Epi Info, version 6: a word processing, database, and statistics program for epidemiology on microcomputers. Atlanta: Centers for Disease Control and Prevention, 1994.
12. Siegel S. Non-parametric statistics for the behavioral sciences. New York: McGraw-Hill, 1956.
13. Chalmers TC, Smith H Jr, Blackburn B, et al. A method for assessing the quality of a randomized control trial. Control Clin Trials 1981;2:31-49.
14. Mosteller F, Gilbert JP, McPeek B. Reporting standards and research strategies for controlled trials: agenda for the editor. Control Clin Trials 1980;1:37-58.
15. DerSimonian R, Charette LJ, McPeek B, Mosteller F. Reporting methods in clinical trials. N Engl J Med 1982;306:1332-7.
16. Altman DG. Randomisation: essential for reducing bias. BMJ 1991;302:1481-2.
17. Meinert CL, Tonascia S, Higgins K. Content of reports on clinical trials: a critical review. Control Clin Trials 1984;5:328-47.
18. Grant A. Reporting controlled trials. Br J Obstet Gynaecol 1989;96:397-400.
19. Freiman JA, Chalmers TC, Smith H Jr, Kuebler RR. The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial: survey of 71 "negative" trials. N Engl J Med 1978;299:690-4.
20. Liberati A, Himel HN, Chalmers TC. A quality assessment of randomized control trials of primary treatment of breast cancer. J Clin Oncol 1986;4:942-51.
21. McDonough PG. "Leaky randomization": standard practice-but is it correct? Fertil Steril 1995;64:216-7.
22. Altman DG. The scandal of poor medical research. BMJ 1994;308:283-4.