Journal of Clinical Epidemiology 65 (2012) 236–238
Classification systems to improve assessment of risk of bias
Isabelle Boutron a,b,*, Philippe Ravaud a,b
a French Cochrane Centre, Paris, France
b Inserm, U738, Paris, France; AP-HP (Assistance Publique des Hôpitaux de Paris), Hôpital Hôtel Dieu, Centre d'Epidémiologie Clinique, Paris, France; Univ. Paris Descartes, Sorbonne Paris Cité, Faculté de Médecine, Paris, France
Accepted 28 September 2011
* Corresponding author. Centre d'Epidémiologie Clinique, French Cochrane Centre, 1, place du Parvis Notre Dame, 75181 Paris, Cedex 4, France. E-mail address: [email protected] (I. Boutron).
0895-4356/$ - see front matter © 2012 Elsevier Inc. All rights reserved. doi: 10.1016/j.jclinepi.2011.09.006
Systematic reviews and meta-analyses are a cornerstone of comparative effectiveness research [1]. Clinical decision makers often rely on the results of systematic reviews to develop guidelines. The strength of these methods lies in a scientifically rigorous approach that identifies, selects, and appraises all studies performed in a specific research area. Conducting a systematic review or meta-analysis involves a number of steps: developing a protocol and formulating a research question, defining study selection criteria, retrieving potentially relevant studies, selecting the studies to be included, and evaluating the risk of bias in each included primary study [2]. Careful appraisal of the methodological characteristics of the primary studies included in systematic reviews and meta-analyses is an essential step, because the conclusions cannot be trusted if the data are flawed.
The assessment of the risk of bias of studies included in systematic reviews has evolved considerably over time. Initially, reviewers evaluated not the risk of bias but rather the quality of the included trials, using quality scales that combined information on several criteria into a single numerical value. The Cochrane Collaboration provided guidance only for the assessment of allocation concealment, with no guidance for other domains; consequently, each review group used its own quality scale or checklist. The work of Jüni et al [3,4] on the use of scales to assess the quality of primary studies called these approaches into question. The team used 25 different scales to assess 17 trials comparing low-molecular-weight heparin with standard heparin for thromboprophylaxis. Depending on the scale used, the results and conclusions differed: with some scales, high-quality trials showed that low-molecular-weight heparin was not superior to standard heparin; with other scales, high-quality trials showed that low-molecular-weight heparin was superior to standard heparin [3].
Following this publication, the Cochrane Collaboration, through an extensive process, developed a new tool for assessing included studies: the "Risk of Bias" (RoB) tool. This tool provides a useful framework for two reasons. First, it aims to assess the risk of bias, not the quality of trials. Quality is a multidimensional concept that can encompass the risk of bias, quality of reporting, ethical approval, sample size calculation, external validity of published trials, and so forth. In systematic reviews and meta-analyses, however, the main issue is whether the treatment effect estimated in the included primary studies reflects the true effect, or whether a study overestimates or underestimates it. Second, the tool aims to be completely transparent, separating the facts from the reviewers' judgments. This aim is particularly important because reviewers, editors, and readers can then challenge the review authors on their judgments.
The RoB tool was implemented by the Cochrane Collaboration in 2008 and was recently modified following an extensive evaluation [2]. The current tool contains seven items: two dedicated to selection bias (random sequence generation and allocation concealment), one dedicated to performance bias (blinding of participants and personnel), one dedicated to detection bias (blinding of outcome assessors), one dedicated to attrition bias (incomplete outcome data), one dedicated to reporting bias (selective outcome reporting), and one addressing other sources of bias. These domains were chosen on the basis of empirical evidence demonstrating potential for bias or exaggeration of treatment effects [5–9]. Furthermore, the tool is intended to assess the risk of bias related to the design, conduct, and analysis of the trial, not the quality of reporting.
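The problem Jüni et al identified can be illustrated with a toy calculation. The Python sketch below assumes two hypothetical quality scales that score the same five checklist items with different weights and then dichotomize the summary score at 0.5; the items, weights, threshold, and trial are all invented for illustration and are not taken from the 25 scales studied in [3]. Because each scale collapses several criteria into one number, the same trial can be "high quality" under one scale and "low quality" under the other, so an analysis restricted to high-quality trials would include different trials depending on the scale chosen.

```python
# Toy illustration with HYPOTHETICAL data: two quality scales that weight
# the same checklist items differently can disagree about the same trial.

# Hypothetical checklist for one trial: True means the criterion is met.
TRIAL = {
    "randomization described": True,
    "double blind": False,
    "withdrawals described": True,
    "sample size calculation": False,
    "ethical approval": True,
}

# Hypothetical weights. Some items (sample size, ethics) reflect aspects
# of "quality" that are not sources of bias.
SCALE_A = {
    "randomization described": 2,
    "double blind": 1,
    "withdrawals described": 1,
    "sample size calculation": 1,
    "ethical approval": 1,
}
SCALE_B = {
    "randomization described": 1,
    "double blind": 3,
    "withdrawals described": 1,
    "sample size calculation": 2,
    "ethical approval": 1,
}

HIGH_QUALITY_THRESHOLD = 0.5  # hypothetical cut-off on a 0-1 score


def quality_score(trial: dict[str, bool], weights: dict[str, int]) -> float:
    """Collapse several criteria into one number: weighted share of items met."""
    met = sum(w for item, w in weights.items() if trial[item])
    return met / sum(weights.values())


def is_high_quality(trial: dict[str, bool], weights: dict[str, int]) -> bool:
    """Dichotomize the single score, as scale-based sensitivity analyses do."""
    return quality_score(trial, weights) >= HIGH_QUALITY_THRESHOLD


if __name__ == "__main__":
    for name, scale in [("Scale A", SCALE_A), ("Scale B", SCALE_B)]:
        score = quality_score(TRIAL, scale)
        label = "high" if is_high_quality(TRIAL, scale) else "low"
        print(f"{name}: score = {score:.2f}, {label} quality")
```

Scale A credits the trial with 4 of 6 weighted points, Scale B with 3 of 8, so the verdict flips purely because of the weighting.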
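To make the structure of the tool concrete, the following Python sketch records one assessment per RoB item and stores the supporting facts (for example, a quotation from the trial report) separately from the reviewer's judgment, in line with the transparency aim described above. The class and function names are hypothetical, and the three judgment levels follow the Cochrane Handbook's low/high/unclear categories; this is an illustration, not an official Cochrane data format.

```python
from dataclasses import dataclass
from enum import Enum


class Judgment(Enum):
    LOW = "low risk"
    HIGH = "high risk"
    UNCLEAR = "unclear risk"


# The seven RoB items, each mapped to the type of bias it addresses.
ROB_ITEMS: dict[str, str] = {
    "random sequence generation": "selection bias",
    "allocation concealment": "selection bias",
    "blinding of participants and personnel": "performance bias",
    "blinding of outcome assessors": "detection bias",
    "incomplete outcome data": "attrition bias",
    "selective outcome reporting": "reporting bias",
    "other sources of bias": "other bias",
}


@dataclass(frozen=True)
class ItemAssessment:
    """One RoB item for one trial, with facts and judgment kept apart."""
    item: str
    support: str        # the facts: quotation or summary from the report
    judgment: Judgment  # the reviewer's judgment based on those facts

    def __post_init__(self) -> None:
        if self.item not in ROB_ITEMS:
            raise ValueError(f"unknown RoB item: {self.item!r}")
        if not self.support.strip():
            # Without its supporting facts, a judgment cannot be challenged.
            raise ValueError("each judgment needs supporting text")


def by_bias_type(assessments: list[ItemAssessment]) -> dict[str, list[ItemAssessment]]:
    """Group item assessments under the type of bias each item addresses."""
    grouped: dict[str, list[ItemAssessment]] = {}
    for a in assessments:
        grouped.setdefault(ROB_ITEMS[a.item], []).append(a)
    return grouped


if __name__ == "__main__":
    # Hypothetical support text, for illustration only.
    assessments = [
        ItemAssessment("allocation concealment",
                       "Report states: sequentially numbered sealed envelopes",
                       Judgment.LOW),
        ItemAssessment("blinding of outcome assessors",
                       "Report states only 'double blind'",
                       Judgment.UNCLEAR),
    ]
    for bias, items in by_bias_type(assessments).items():
        for a in items:
            print(f"{bias}: {a.item}: {a.judgment.value} ({a.support})")
```

Rejecting a judgment that has no supporting text is one simple way to enforce the separation of facts and judgments; a reader can then check each judgment against the quoted evidence.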
This tool has been an important step forward in the assessment of the risk of bias in systematic reviews and meta-analyses. However, assessing the risk of bias with the RoB tool, like evaluating the quality of studies included in systematic reviews, raises challenges, mainly related to reliability. The assessment of the risk of bias of a trial is intertwined with the quality of reporting, that is, the extent to which a report provides information about the design, conduct, and analysis of the trial. Reports often omit important methodological details. The CONSORT initiative has helped improve the quality of reporting [10–12]. However, despite these guidelines, reporting can still be confusing, which limits the reproducibility of this assessment. Hartling et al [13] evaluated the interrater agreement between reviewers using the RoB tool. Interrater agreement ranged from slight (κ = 0.13) to substantial (κ = 0.74). Agreement was particularly problematic for blinding (κ = 0.35), incomplete outcome data (κ = 0.32), and selective outcome reporting (κ = 0.13). Recent research showed that reliability could be enhanced by appropriate training, pilot testing, and context-specific decision rules [14]. To enhance reliability in assessing the risk of selective outcome reporting, the Outcome Reporting Bias In Trials (ORBIT) classification system was recently developed to classify missing or incomplete outcome reporting in reports of randomized trials [15]. The system identifies whether there is evidence that the outcome was measured and analyzed but only partially reported; whether the outcome was measured but not necessarily analyzed; whether it is unclear if the outcome was measured; or whether the outcome was clearly not measured. This classification system is particularly useful for enhancing both reliability and transparency when assessing this criterion. In this issue of the Journal of Clinical Epidemiology, Akl et al [16] report on their development and validation of specific instructions for evaluating the status of blinding when it is unclearly reported in published reports of randomized controlled trials.
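The κ values quoted above measure agreement beyond chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each reviewer's marginal frequencies. The Python sketch below assumes two reviewers and unweighted Cohen's kappa (the text does not state which kappa statistic Hartling et al used) and labels values with the Landis and Koch benchmarks, whose wording ("slight," "substantial," "almost perfect") matches the terms used here. The example ratings are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters judging the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters put in the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over every category either rater used.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal label for a kappa value using the Landis and Koch benchmarks."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical RoB judgments by two reviewers on the same 10 trials.
    reviewer_1 = ["low", "low", "high", "unclear", "low",
                  "high", "unclear", "low", "high", "low"]
    reviewer_2 = ["low", "unclear", "high", "unclear", "low",
                  "low", "unclear", "low", "high", "high"]
    k = cohens_kappa(reviewer_1, reviewer_2)
    print(f"kappa = {k:.2f} ({landis_koch(k)})")
```

In this hypothetical example the reviewers agree on 7 of 10 trials (p_o = 0.70), chance agreement is 0.35, and κ ≈ 0.54, which is "moderate." Under the same benchmarks, the values reported by Hartling et al fall into the slight (0.13), fair (0.32 and 0.35), and substantial (0.74) bands.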
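The four situations distinguished by the ORBIT classification, as described above, can be written as a small decision function. This Python sketch is a simplification and not the published ORBIT scheme itself: it assumes the outcome has already been flagged as missing or incompletely reported, and it encodes the available evidence as two hypothetical three-valued inputs (True for evidence that something was done, False for clearly not done, None for no clear evidence).

```python
from enum import Enum
from typing import Optional


class OutcomeStatus(Enum):
    PARTIALLY_REPORTED = "measured and analyzed, but only partially reported"
    MEASURED_NOT_NECESSARILY_ANALYZED = "measured, but not necessarily analyzed"
    UNCLEAR_IF_MEASURED = "unclear whether the outcome was measured"
    CLEARLY_NOT_MEASURED = "clearly not measured"


def classify_missing_outcome(measured: Optional[bool],
                             analyzed: Optional[bool]) -> OutcomeStatus:
    """Classify an outcome already flagged as missing or incompletely reported.

    Each argument is True (evidence that it happened), False (it clearly
    did not happen), or None (no clear evidence either way).
    """
    if measured is False and analyzed:
        raise ValueError("an outcome cannot be analyzed if it was not measured")
    if measured is None:
        return OutcomeStatus.UNCLEAR_IF_MEASURED
    if measured is False:
        return OutcomeStatus.CLEARLY_NOT_MEASURED
    if analyzed:
        return OutcomeStatus.PARTIALLY_REPORTED
    # Measured, but no evidence that it was analyzed.
    return OutcomeStatus.MEASURED_NOT_NECESSARILY_ANALYZED


if __name__ == "__main__":
    # Hypothetical case: the report says an outcome "did not differ
    # significantly" between groups but gives no numerical results.
    print(classify_missing_outcome(measured=True, analyzed=True).value)
```

Making the evidence for each step explicit, rather than recording only a final verdict, is what supports the transparency gain the paragraph describes: two reviewers who disagree can see whether they differ on "measured" or on "analyzed."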
The instructions developed by Akl et al allow reviewers to determine whether a blinding status previously classified as "unclear" is in fact "probably yes" or "probably no." To assess the risk of performance bias and detection bias, the Cochrane Handbook recommends that reviewers evaluate whether participants and personnel, as well as outcome assessors, were effectively blinded in a study and, if they were not, evaluate the resulting risk of bias. This recommendation presupposes first determining who was blinded, which can be challenging. A study of the quality of reporting of randomized trials indexed in PubMed in 2006 showed that 41% of publications did not report any details of blinding; 26% provided specific details of who was blinded; and the remaining 33% simply used the terms "blinded," "single blind," or "double blind," without further details [17]. Indeed, the terminology frequently used to report blinding in published reports of randomized controlled trials (i.e., "single blind," "double blind," and "triple blind") is confusing [18]. Devereaux et al [19] showed that both physicians and textbooks vary greatly in their interpretations and definitions of these terms. Physicians identified 10, 17, and 15 unique interpretations of single, double, and triple blinding, respectively, and textbooks provided 5, 9, and 7 different definitions [19]. In their study, Akl et al showed that the reliability of blinding assessment using their specific and detailed instructions was almost perfect for patients, data collectors, and health care providers. Consequently, except for the blinding of data analysts, these specific instructions for reclassifying "unclear" blinding status as "probably yes" or "probably no" should be an important tool for enhancing the assessment of blinding in trials.
The updated CONSORT 2010 Statement now recommends reporting "who was blinded after assignment to interventions (e.g., participants, care providers, those assessing outcomes) and how" [10–12]. However, the "single/double/triple blind" terminology is so rooted in usual reporting practice that adequate reporting of blinding remains uncommon. Akl et al provide a useful tool to improve the assessment of blinding status, which they validated by contacting the authors of the published trials. This is, of course, only a preliminary step in evaluating the risk of bias: it cannot determine whether blinding was effective, nor can it evaluate the risk of bias when blinding was not implemented. In addition, the validity of the specific instructions proposed by Akl et al needs to be assessed in other samples, because the authors assessed reproducibility and validity in a sample of randomized controlled trial reports published in high-impact-factor journals, and their results may not hold in a more representative sample. Furthermore, validation that relies on contacting authors is open to debate [20]. Nevertheless, the authors' approach is interesting and could be applied to other items of the RoB tool. Such an approach could help improve interrater agreement and transparency when assessing the risk of bias with the RoB tool.
References
[1] Goodman S, Dickersin K. Metabias: a challenge for comparative effectiveness research. Ann Intern Med 2011;155:61–2.
[2] Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Version 5.1.0 [updated March 2011]. The Cochrane Collaboration, 2011. Available at: www.cochrane-handbook.org. Accessed October 30, 2011.
[3] Jüni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA 1999;282:1054–60.
[4] Jüni P, Altman DG, Egger M. Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ 2001;323:42–6.
[5] Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995;273:408–12.
[6] Gluud LL. Bias in clinical intervention research. Am J Epidemiol 2006;163:493–501.
[7] Chan AW, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA 2004;291:2457–65.
[8] Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 1998;352:609–13.
[9] Wood L, Egger M, Gluud LL, Schulz KF, Jüni P, Altman DG, et al. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ 2008;336:601–5.
[10] Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c869.
[11] Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. PLoS Med 2010;7(3):e1000251.
[12] Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. J Clin Epidemiol 2010;63:e1–37.
[13] Hartling L, Ospina M, Liang Y, Dryden DM, Hooton N, Krebs Seida J, et al. Risk of bias versus quality assessment of randomised controlled trials: cross sectional study. BMJ 2009;339:b4012.
[14] Hartling L, Bond K, Vandermeer B, Seida J, Dryden DM, Rowe BH. Applying the risk of bias tool in a systematic review of combination long-acting beta-agonists and inhaled corticosteroids for persistent asthma. PLoS ONE 2011;6(2):e17242.
[15] Kirkham JJ, Dwan KM, Altman DG, Gamble C, Dodd S, Smyth R, et al. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ 2010;340:c365.
[16] Akl EA, Sun X, Busse JW, Johnston BC, Briel M, Mulla S, et al. Specific instructions for estimating unclearly reported blinding status in randomized trials were reliable and valid. J Clin Epidemiol 2012;65:262–7 [in this issue].
[17] Hopewell S, Dutton S, Yu LM, Chan AW, Altman DG. The quality of reports of randomised trials in 2000 and 2006: comparative study of articles indexed in PubMed. BMJ 2010;340:c723.
[18] Montori V, Bhandari M, Devereaux P, Manns B, Ghali W, Guyatt G. In the dark: the reporting of blinding status in randomized controlled trials. J Clin Epidemiol 2002;55:787–90.
[19] Devereaux PJ, Manns BJ, Ghali WA, Quan H, Lacchetti C, Montori VM, et al. Physician interpretations and textbook definitions of blinding terminology in randomized controlled trials. JAMA 2001;285:2000–3.
[20] Haahr MT, Hróbjartsson A. Who is blinded in randomized clinical trials? A study of 200 trials and a survey of authors. Clin Trials 2006;3:360–5.