Reproductive BioMedicine Online (2010) 21, 440–443
COMMENTARY
Misplaced obsession with prospectively randomized studies

Norbert Gleicher a,b,*, David H Barad a,c

a Center for Human Reproduction (CHR) – New York and Foundation for Reproductive Medicine, NY, USA; b Department of Obstetrics, Gynecology and Reproductive Sciences, Yale University School of Medicine, New Haven, CT, USA; c Departments of Epidemiology and Social Medicine and Obstetrics, Gynecology and Women’s Health, Albert Einstein College of Medicine, Bronx, NY, USA

* Corresponding author. E-mail address: [email protected] (N Gleicher).
Abstract Medical care should be based on best available evidence. While randomized controlled trials (RCT) are currently considered a gold standard of study design, they are not always available and do not represent the only study format that can lead to best available evidence. This communication argues that overvaluation of RCT and undervaluation of other study formats in establishing best available evidence hurts progress in reproductive medicine and in medicine in general. RBMOnline © 2010, Reproductive Healthcare Ltd. Published by Elsevier Ltd. All rights reserved.

KEYWORDS: best available evidence, randomized controlled trial, study design
Introduction

‘The pure and simple truth is rarely pure and never simple.’
Oscar Wilde

At a recent congress in Asia, we observed a young investigator summarizing her oral presentation with the now almost perfunctory conclusion that ‘there was no evidence in support of a given intervention’ and that ‘only future randomized controlled trials could determine whether the intervention, indeed, was useful’. A prominent colleague, commenting from the floor, then ‘congratulated her on this conclusion’. (We paraphrase both colleagues from memory, but the interpretations are accurate.)

These colleagues, of course, do not stand alone in their unshakable belief in randomized controlled trials (RCT). Recent trends towards evidence-based medicine have widely popularized meta-analyses and their fodder, RCT. The popularity of RCT is understandable, since they do represent a gold standard of study design (Scott, 2009a). They should not, however, be considered the only acceptable study format for establishing best available evidence. This point is now also being made from within the obstetrics and gynaecology community (Scott, 2009a,b; Vintzileos, 2009) and the general medical community (Chambers et al., 2009; Lewin et al., 2009; Smith and Pell, 2003). Arguing in defence of case reports, Scott (2009a) recently noted that RCT rank as the highest level of evidence only thanks to empirical decisions of expert committees, which themselves reflect only the lowest level of evidence. In another editorial, he pointed out that his long experience as editor-in-chief of a leading specialty journal has convinced him that there are no perfect studies (Scott, 2009b). We concur!

Criticism of too much reliance on RCT therefore appears timely. Particularly in our specialty, their overvaluation threatens medical progress by undervaluing other study formats. The clinical valuation of IVF, for example, did not wait for the conduct of RCT. Vintzileos (2009) perceptively differentiates in obstetrics between evidence- and reality-based medicine.
As the development of IVF has well demonstrated, similar choices exist in reproductive endocrinology and infertility. Scott (2009a) also recently noted, correctly, that the lack of supportive study evidence appears to represent a rather general phenomenon in medicine, since much of what physicians deal with in daily medical practice is based neither on data from RCT nor even on large observational studies. This fact alone reemphasizes Vintzileos’ call for reality-based medicine, as so well illustrated previously by Smith and Pell (2003) in their by now famous article on parachute use to prevent trauma.

Diomidus (2002) notes that every study design has advantages and disadvantages. RCT have the most powerful design. Well-designed observational studies can provide useful insights into causation. Cohort studies can be useful in studying the natural progression of conditions and the risk factors of diseases. Case–control studies are quicker and less costly than RCT, and even simple case series can be used to raise research questions, a notion also supported by Scott (2009a).
Best available evidence

Since the practice of medicine should always rely on best available evidence, arguments against overvaluing RCT should not be misconstrued as arguments against RCT. Best evidence can be gathered in various ways (Collins et al., 2009; Diomidus, 2002; Hornberger and Wrone, 1997; Horn and Gassaway, 2007; Horn et al., 2005; Loewy, 2007), and RCT, which are themselves not always perfect, do not always provide it. Like any other study format, a RCT must, to provide useful information, be carefully designed, be properly powered and have a clear-cut testable hypothesis. In analogy to a telescope lens capturing a single face in a large crowd, RCT can be very powerful; their scope is, however, usually narrow. Shortcomings, therefore, abound: if data collected for a RCT are used post factum to address a newly developed hypothesis, such a new analysis may turn out to be underpowered (see the sketch at the end of this section). If the same database is used to address an issue not dependent on randomization, the results are no longer those of a randomized trial. Such secondary analyses are, therefore, hypothesis generating; evidence can be inferred from them, but findings are never conclusive.

Post-menopausal hormone therapy (HRT) is a case in point. In the 1980s, best evidence from a preponderance of observational studies (such as the Framingham Study, the Walnut Creek Study and the Nurses’ Health Study) suggested that HRT would relieve menopausal hot flashes, prevent osteoporosis and prevent heart disease. Studies were, however, not uniform in their conclusions. For example, in one 1985 issue of the New England Journal of Medicine, back-to-back manuscripts reported no (Wilson et al., 1985) and positive (Stampfer et al., 1985) effects of post-menopausal oestrogen therapy on cardiovascular disease. Subsequent RCT of the Women’s Health Initiative (WHI) Hormone Trials (Prentice et al., 2009; Rossouw et al., 2002) were then designed to test the hypothesis that HRT was an effective treatment in preventing heart disease and stroke, as had been suggested by earlier observational studies.

The WHI proved that HRT at ages 50–79 years did not provide cardiovascular benefits, while exposing women to significant risks of other morbidities. Since these studies were designed to address only a specific set of conditions within a defined patient population, legitimate questions remain, such as whether different hormones, different routes of administration and younger patient ages would have yielded different study conclusions. The two hormone trials of the WHI were not designed to answer these questions and therefore clearly demonstrate the previously noted limitation of RCT: they can address narrowly posed questions with high accuracy, while well-designed observational studies allow for a wider focus and thus shed light on broader data sets. Observational studies are, however, subject to biases and require careful adjustment for confounding factors. Once that is done, they offer useful, though not necessarily definitive, information, as recently demonstrated when a single WHI RCT trumped many earlier observational studies and radically affected the practice of medicine worldwide (Rossouw et al., 2002).

All of this means that best available evidence is exactly what the term implies: a summary conclusion, based on the best information available at any given moment in time. In many cases, best available evidence will, indeed, rely on RCT. But other study formats have to be utilized when RCT are unavailable or of poor quality. Over 3 million IVF offspring worldwide are witnesses to this fact.
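To make the powering argument above concrete, the following is a minimal sketch, in Python and using only the standard library, of the kind of back-of-the-envelope calculation involved. All of the numbers – the 25% versus 35% outcome rates, the quarter-sized post-factum subset and the helper functions themselves – are hypothetical illustrations of our own, not data from any actual trial.

from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate patients per arm to compare two proportions
    (normal approximation, equal allocation, two-sided alpha)."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return (z_a + z_b) ** 2 * variance / (p_control - p_treatment) ** 2

def approximate_power(p_control, p_treatment, n_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion test with n_arm per arm."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    se = ((p_control * (1 - p_control) + p_treatment * (1 - p_treatment)) / n_arm) ** 0.5
    return 1 - Z.cdf(z_a - abs(p_treatment - p_control) / se)

# Hypothetical primary hypothesis: raise an outcome rate from 25% to 35%.
n = n_per_arm(0.25, 0.35)
print(f"about {n:.0f} patients per arm for 80% power")  # about 326 per arm

# A post-factum analysis confined to a quarter of the same database
# addresses its new hypothesis with only a fraction of that sample:
print(f"power in the quarter-sized subset: {approximate_power(0.25, 0.35, n / 4):.0%}")  # about 29%

The particular figures matter less than the direction of the effect: a database that comfortably answers the question it was designed for can be far too small for questions grafted onto it afterwards.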
Inability to perform RCT

IVF is, indeed, a good example of why RCT may at times be difficult, or even impossible, to conduct. In the USA, the initial survival of IVF was entirely dependent on its success in the marketplace, since the procedure was developed without federal funding (Gleicher, 2003). Patients paying for cycles out of their own pockets rightly demanded the best currently available treatments and rightly refused to fund RCT.

IVF, by now a mature clinical procedure (Gleicher, 2009), still faces similar problems. In recent years, for example, we have had to abandon two registered RCT in the USA and Europe involving the utilization of dehydroepiandrosterone (DHEA) in women with diminished ovarian reserve, because patients simply refused recruitment into randomization (Barad et al., 2007). Considering the limited conception time left to women with severely diminished ovarian reserve, they cannot be blamed for refusing a 50% chance/risk that could affect their last opportunity to conceive with their own oocytes, particularly if they, in addition, carry the costs of treatment.

In theory – that is, in evidence-based infertility practice – a RCT can be designed and conducted for every clinical indication. In reality-based practice this is, however, not really the case: advanced female age may hamper randomization, and limited opportunity to conceive (for example, the little conception time left with severely diminished ovarian reserve) may make patients resistant to randomization. Many authors have pointed out the cost disadvantages of RCT relative to other study formats (Diomidus, 2002; Hornberger and Wrone, 1997). Unavailability of funding sources for IVF research has likely been a major reason why, at least in the USA, RCT have remained the exception (Gleicher, 2003).
Under such circumstances, ignoring best available evidence generated by other study formats would appear irresponsible and would, of course, potentially prevent progress. The Practice Committee of the American Society for Reproductive Medicine recently reemphasized that all interventions should be considered experimental until authoritative bodies have declared them established practice (Practice Committee of the ASRM, 2009). Any declaration of standard practice should, however, be based on quality evidence from all available sources. The recent experience with preimplantation genetic screening (PGS), an offshoot of preimplantation genetic diagnosis, offers a good example of how carefully best available evidence needs to be assembled (Gleicher et al., 2008). Here, PGS benefited from preimplantation genetic diagnosis having been declared established practice and was prematurely integrated into routine IVF practice, as is further discussed below.
RCT are not perfect either

Despite all the love for RCT, it is well accepted that, like all other study formats, they can have shortcomings (Collins et al., 2009; Devereaux et al., 2004; Diomidus, 2002; Haahr and Hróbjartsson, 2006; Hornberger and Wrone, 1997; Horn and Gassaway, 2007; Horn et al., 2005; Loewy, 2007; Rubinstein et al., 2009; Scott, 2009a,b; Vintzileos, 2009). PGS represents yet another telling example: a small number of underpowered RCT failed to demonstrate its expected benefits (Gleicher et al., 2008). Their significant limitations allowed proponents of PGS, despite the absence of contradictory data of any kind, to dismiss their IVF outcome results. As a consequence, PGS not only survived clinically but prospered. Paradoxically, it was another quite flawed RCT (see Cohen et al., 2007) that ultimately ended the PGS run, demonstrating the impact a single RCT can have on worldwide practice (Mastenbroek et al., 2007). Even before Mastenbroek’s RCT, similar or identical conclusions could, however, have been reached had the available data from the quite limited earlier RCT been more carefully analysed (Gleicher et al., 2008). Recalculating early Belgian PGS data (Staessen et al., 2004) already allowed for the conclusion that, especially in older patients, PGS may actually reduce, rather than improve, pregnancy rates (Gleicher et al., 2008), an observation that later became the centrepiece conclusion of the Dutch RCT (Mastenbroek et al., 2007).

Like other study formats, underpowered or otherwise poor-quality RCT can thus offer more or less ‘the truth’, be misleading or, at times, even be outright biased. Chalmers (2009) summarized well how treatment comparison groups can be created in unbiased ways, but the often exorbitant costs of recruitment can create temptations to enrol insufficient patient numbers or to cut corners in other ways. Even small errors in original study assumptions may then leave a RCT underpowered (a point illustrated numerically at the end of this section); under such circumstances, a case–control study can, in contrast, be very cost-effective. RCT by themselves are thus no panacea (Haahr and Hróbjartsson, 2006; Rubinstein et al., 2009; Scott, 2009b). While well-designed RCT in most circumstances represent the highest standard of investigation (Scott, 2009a), it is important to acknowledge that they have to address realistic problems in a realistic way (Smith and Pell, 2003). That this is not always the case is well accepted (Diomidus,
2002; Hornberger and Wrone, 1997; Scott, 2009a,b). Well-designed observational studies may, therefore, on occasion offer better evidence than poorly designed RCT. For further information on how evidence can and should be graded, the reader is referred to recent publications from the GRADE Working Group (Brozek et al., 2009).
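The underpowering point above can be illustrated numerically as well. The brief sketch below is again purely hypothetical – the 25%, 32% and 35% outcome rates are invented for illustration and the statsmodels package is assumed to be available – and shows how a trial sized at 80% power for an assumed ten-point improvement retains only roughly half that power if the true improvement is three points smaller.

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

analysis = NormalIndPower()

# Trial sized for 80% power under the assumed effect (25% -> 35%):
assumed = proportion_effectsize(0.35, 0.25)
n_arm = analysis.solve_power(effect_size=assumed, alpha=0.05, power=0.80)

# Power actually achieved if the true effect is smaller (25% -> 32%):
true_effect = proportion_effectsize(0.32, 0.25)
achieved = analysis.solve_power(effect_size=true_effect, nobs1=n_arm, alpha=0.05)

print(f"about {n_arm:.0f} patients per arm; power under the true effect: {achieved:.0%}")
# prints roughly: about 328 patients per arm; power under the true effect: 51%

In other words, a modestly optimistic planning assumption is enough to turn a nominally well-powered RCT into an underpowered one.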
References

Barad, D.H., Brill, H., Gleicher, N., 2007. Update on the use of dehydroepiandrosterone supplementation among women with diminished ovarian function. J. Assist. Reprod. Genet. 24, 629–634.
Brozek, J.L., Akl, E.A., Alonso-Coello, P., et al., 2009. Grading quality of evidence and strength of recommendations in clinical practice guidelines. Part 1 of 3. An overview of the GRADE approach and grading quality of evidence about interventions. Allergy 64, 669–677.
Chalmers, I., 2009. Explaining the unbiased creation of treatment comparison groups. Lancet 374, 1670–1671.
Chambers, D., Fayter, D., Paton, F., Woolacott, N., 2009. Use of non-randomised evidence alongside randomized trials in a systematic review of endovascular aneurysm repair: strengths and limitations. Eur. J. Vasc. Endovasc. Surg. (Epub ahead of print).
Cohen, J., Wells, D., Munné, S., 2007. Removal of 2 cells from cleavage stage embryos is likely to reduce efficacy of chromosomal tests that are used to enhance implantation rates. Fertil. Steril. 87, 496–503.
Collins, L.M., Chakraborty, B., Murphy, S.A., Strecher, V., 2009. Comparison of a phased experimental approach and a single randomized clinical trial for developing multicomponent behavioral interventions. Clin. Trials 6, 5–15.
Devereaux, P.J., Choi, P.T., El-Dika, S., et al., 2004. An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods. J. Clin. Epidemiol. 57, 1232–1236.
Diomidus, M., 2002. Epidemiological study designs. Stud. Health Technol. Inform. 65, 126–351.
Gleicher, N., 2003. Safety in assisted reproduction technology. A rebuttal. Hum. Reprod. 18, 1765–1766.
Gleicher, N., 2009. Patients are entitled to maximal IVF pregnancy rates. Reprod. Biomed. Online 18, 599–602.
Gleicher, N., Weghofer, A., Barad, D., 2008. Preimplantation genetic screening: ‘established’ and ready for prime time? Fertil. Steril. 89, 780–788.
Haahr, M.T., Hróbjartsson, A., 2006. Who is blinded in randomized clinical trials? A study of 200 trials and surveys of authors. Clin. Trials 3, 360–365.
Horn, S.D., Gassaway, J., 2007. Practice-based evidence study design for comparative effectiveness research. Med. Care 45 (10 Suppl. 2), S50–S57.
Horn, S.D., DeJong, G., Ryser, D.K., Veazie, P.J., Teraoka, J., 2005. Another look at observational studies in rehabilitation research: going beyond the holy grail of the randomized controlled trial. Arch. Phys. Med. Rehab. 86 (12 Suppl. 2), S8–S15.
Hornberger, J., Wrone, E., 1997. When to base clinical policies on observational versus randomized trial data. Ann. Intern. Med. 127, 697–703.
Lewin, S., Glenton, C., Oxman, A.D., 2009. Use of qualitative methods alongside randomized controlled trials of complex health care interventions: methodological study. BMJ 339, b3496.
Loewy, E., 2007. Ethics and evidence-based medicine: is there a conflict? Med. Gen. Med. 9, 30.
Mastenbroek, S., Twisk, M., van Echten-Arends, J., et al., 2007. In vitro fertilization with preimplantation genetic screening. N. Engl. J. Med. 357, 9–17.
Prentice, R.L., Manson, J.E., Langer, R.D., et al., 2009. Benefits and risks of postmenopausal hormone therapy when it is initiated soon after menopause. Am. J. Epidemiol. 170, 12–23.
Rossouw, J.E., Anderson, G.L., Prentice, R.L., et al., Writing Group for the Women’s Health Initiative Investigators, 2002. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women’s Health Initiative randomized controlled trial. JAMA 288, 321–333.
Rubinstein, L., Crowley, J., Ivy, P., LeBlanc, M., Sargent, D., 2009. Randomized phase II designs. Clin. Cancer Res. 15, 1883–1890.
Scott, J.R., 2009a. In defense of case reports. Obstet. Gynecol. 114, 413–414.
Scott, J.R., 2009b. Evidence-based medicine under attack. Obstet. Gynecol. 113, 1202–1203.
Smith, G.C.S., Pell, J.P., 2003. Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomized controlled trials. BMJ 327, 1459–1461.
Staessen, C., Platteau, P., van Assche, E., et al., 2004. Comparison of blastocyst transfer with or without preimplantation genetic diagnosis for aneuploidy screening in couples with advanced maternal age: a prospective randomized controlled trial. Hum. Reprod. 19, 2849–2858.
Stampfer, M.J., Willett, W.C., Colditz, G.A., Rosner, B., Speizer, F.E., Hennekens, C.H., 1985. A prospective study of postmenopausal estrogen therapy and coronary heart disease. N. Engl. J. Med. 313, 1044–1049.
The Practice Committee of the American Society for Reproductive Medicine, 2009. Definition of ‘experimental procedure’. Fertil. Steril. 92, 1517.
Vintzileos, A.M., 2009. Evidence-based compared with reality-based medicine in obstetrics. Obstet. Gynecol. 113, 1335–1340.
Wilson, P.W., Garrison, R.J., Castelli, W.P., 1985. Postmenopausal estrogen use, cigarette smoking, and cardiovascular morbidity in women over 50. The Framingham study. N. Engl. J. Med. 313, 1038–1043.

Declaration: NG and DHB are co-inventors of a recently granted US patent, which claims therapeutic benefits from dehydroepiandrosterone (DHEA) supplementation in women with diminished ovarian reserve. Both authors have additional pending patent applications regarding DHEA and the FMR1 (fragile X) gene. NG is the owner of CHR. This work was supported by the Foundation for Reproductive Medicine and intramural research funds from CHR.

Received 27 January 2010; refereed 3 May 2010; accepted 24 June 2010.