Urolithiasis/Endourology
Low Methodological and Reporting Quality of Randomized, Controlled Trials of Devices to Treat Urolithiasis
Peter J. Zavitsanos, Vincent G. Bird,* Kathryn A. Mince, Molly M. Neuberger and Philipp Dahm†
From the Department of Urology, University of Florida and Malcom Randall Veterans Affairs Medical Center, Gainesville (PD), Florida
Abbreviations and Acronyms
CENTRAL = Cochrane Central Register of Controlled Trials
MD = mean difference
PCNL = percutaneous nephrolithotomy
RCT = randomized, controlled trial
SWL = shock wave lithotripsy
Accepted for publication October 17, 2013.
* Financial interest and/or other relationship with Boston Scientific.
† Correspondence: Department of Urology, University of Florida College of Medicine, Health Science Center, Box 100247, Room N2-15, Gainesville, Florida 32610-0247 (telephone: 352-273-8634; FAX: 352-273-7515; e-mail: p.dahm@urology.ufl.edu).
Purpose: We assessed the methodological and reporting quality of randomized, controlled trials of stone disease management and determined whether the reporting quality of randomized, controlled trials improved with time.
Materials and Methods: We systematically searched the literature for randomized, controlled trials of urolithiasis treatment. We developed and pilot tested a data extraction checklist based on CONSORT (Consolidated Standards of Reporting Trials) criteria as well as a clinical checklist relevant to urolithiasis, each scored as 0 to 25. Our primary outcome measures were the mean differences in CONSORT and clinical summary scores with time. We performed statistical hypothesis testing using the Student t-test with 2-sided α = 0.05 to compare scores between 2002 to 2006 and 2007 to 2011.
Results: A total of 104 randomized, controlled trials met study inclusion criteria. The most common procedure types studied were percutaneous nephrolithotomy (41.3%), ureteral stenting (28.8%) and shock wave lithotripsy (25.0%). Mean ± SE CONSORT summary scores were 11.4 ± 0.4 and 12.1 ± 0.3 in 2002 to 2006 and 2007 to 2011, respectively, with a mean difference of 0.7 (95% CI −0.3 to 1.6, p = 0.167). Mean ± SE clinical summary scores were 7.4 ± 0.5 and 9.3 ± 0.4 in 2002 to 2006 and 2007 to 2011, respectively, with a mean difference of 1.8 (95% CI 0.6 to 3.1, p = 0.004).
Conclusions: While the number of randomized, controlled trials of urological devices used to treat stone disease substantially increased with time, methodological and clinical reporting quality remains suboptimal. This compromises their credibility and warrants efforts to promote appropriate performance of future endourological studies.
Key Words: kidney, ureter, urolithiasis, equipment and supplies, randomized controlled trials as topic
Well designed RCTs have the potential to provide the highest quality evidence for questions of therapeutic effectiveness, assuming that they are appropriately performed and analyzed, and transparently reported. Urologists use RCT results to guide clinical decision making in individuals and they are also being increasingly
used in systematic reviews and clinical practice guidelines that define standards of care and shape health policy.1 In the widely endorsed paradigm of evidence-based clinical practice high quality evidence supporting a given intervention provides a strong impetus for its clinical application while low quality evidence is the
0022-5347/14/1914-0988/0 THE JOURNAL OF UROLOGY® © 2014 by AMERICAN UROLOGICAL ASSOCIATION EDUCATION AND RESEARCH, INC.
http://dx.doi.org/10.1016/j.juro.2013.10.067 Vol. 191, 988-993, April 2014 Printed in U.S.A.
LOW QUALITY OF RANDOMIZED TRIALS OF DEVICES TO TREAT UROLITHIASIS
source of uncertainty and undesirable practice variation. Treatment of urinary stones is central to the practice of urology and also notable for its heavy reliance on surgical devices. For various historical and practical reasons these devices have not been assessed by the same evidentiary standards as drugs before regulatory approval and widespread implementation. Although unsystematic observations suggest that the number of RCTs of surgical treatment of urolithiasis has increased, little is known about the quality of these studies. Therefore, we formally assessed trends in the methodological and reporting quality of RCTs of surgical devices used to treat patients with stone disease over time. We sought to better define evidence gaps and suggest strategies for improvement.
MATERIALS AND METHODS We defined a surgical device RCT as a prospective study comparing surgical interventions that included a device in at least 1 arm of the trial and had therapeutic intent in human participants randomly allocated to study groups.2 Surgical procedures and devices included SWL, ureteroscopy, PCNL, lithotripters, lithotrites, endoscopes, ureteral stents and antiretropulsion devices.
We systematically searched the literature using a defined search strategy in 2 databases (MEDLINE® and CENTRAL) with date restrictions (2002 to 2011) and publication type (RCT) to identify RCTs potentially eligible for study inclusion. We also assessed individual studies for eligibility that were referenced in systematic reviews and meta-analyses identified in the MEDLINE search. Two investigators (PJZ and KAM) independently screened all search results for eligibility. Consensus was achieved through discussion between the 2 investigators with arbitration by a third investigator (PD).
Two independent investigators with formal methodology training reviewed and scored each included article using a standardized, pilot tested data extraction form incorporating the 2010 CONSORT statement criteria and an evidence-based checklist used to standardize RCT reporting and prevent the introduction of bias into studies (supplementary file 1, http://jurology.com/).3 We also included 25 clinical variables on the same checklist to evaluate baseline data and end points relevant to the treatment of patients with urinary stones (supplementary file 1, http://jurology.com/). Items were scored as met, not met or nonapplicable. Discrepancies were settled by discussion among the reviewers and in select cases by the third party arbiter. Our primary end points were the MDs in CONSORT and clinical criteria summary scores, each on a scale of 0 to 25.
As an a priori null hypothesis, we considered that the reporting quality of urolithiasis surgical device RCTs published within the 10-year study period in 2007 to 2011 was no different than in 2002 to 2006. Consistent with prior studies we assigned quarter, third and half points for multicomponent criteria to maintain weighting. Thus, if a study only mentioned 2 of 4 subcriteria for 1 of the 25 CONSORT criteria, the study received a half point for that criterion.4 Clinical scores were calculated based on the reporting of 25 predefined baseline and end point criteria. We assessed interobserver agreement beyond chance using the κ statistic.5
Descriptive summary statistics for individual CONSORT and clinical criteria are shown as proportions and summary scores are shown as the mean ± SE and median. We calculated MDs and the 95% CI of summary scores between periods. We performed predefined subgroup analysis by continent of origin and post hoc subgroup analysis by journal of publication. Statistical hypothesis testing was done with SPSS® 20.0 using the chi-square and Student t-tests with 2-sided α = 0.05. We did not adjust for multiple comparisons.
RESULTS
A total of 104 RCTs were included in our study (supplementary file 2, http://jurology.com/ and fig. 1). The MEDLINE search identified 209 records, of which 99 ultimately met inclusion criteria. Another 1 and 4 studies were identified for inclusion through CENTRAL and reference lists of systematic reviews, respectively. Figure 1 shows the reasons for study exclusion. Table 1 lists the characteristics of included trials. The number of trials meeting inclusion criteria that were published in each 5-year period increased from 39 (2002 to 2006) to 65 (2007 to 2011). The most common types of procedures studied were PCNL, ureteral stenting and SWL. There was a marked increase from 17.9% (2002 to 2006) to 55.4% (2007 to 2011) in the proportion of studies of PCNL. Of the series 86.5% were parallel 2-arm studies. Mean sample size was 106.8 patients (range 18 to 903), including 82.2 in 2002 to 2006 and 122.6 in 2007 to 2011 (p = 0.545). The percent of studies with a sample size of greater than 100 patients increased from 28.2% to 41.5% between the periods. Although more than half of the trials did not mention the number of study sites, the proportion of multicenter trials increased between the periods (2.6% to 10.8%). In regard to study origin the absolute number of studies from North America remained the same but the relative percent of North American studies decreased since more studies originated from other continents, most notably Asia. When comparing the 2 intervals, mean ± SE CONSORT summary scores were 11.4 ± 0.4 (2002 to 2006) vs 12.1 ± 0.3 (2007 to 2011) with a MD of 0.7 (95% CI −0.3 to 1.6, p = 0.167). Mean ± SE clinical summary scores were 7.4 ± 0.5 (2002 to 2006) and 9.3 ± 0.4 (2007 to 2011) with a MD of 1.8 (95% CI 0.6 to 3.1, p = 0.004). Median CONSORT summary scores were 12.0 (2002 to 2006) vs 12.1 (2007 to 2011) and median clinical summary scores were 8.0 (2002 to 2006) vs 10.0 (2007 to 2011).
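The comparison of summary scores between periods can be approximately reconstructed from the published summary statistics alone. The Python sketch below is illustrative and is not the authors' SPSS analysis. It recovers standard deviations from the reported means, standard errors and trial counts (39 and 65), pools the variances as in an equal-variance Student t-test and, as a simplifying assumption, uses the standard normal distribution in place of the t distribution, which is reasonable with 102 degrees of freedom. The function name compare_periods is hypothetical. Because the inputs are rounded to 1 decimal place, the output can only approximate the published MD, CI and p values.

```python
from math import sqrt
from statistics import NormalDist


def compare_periods(mean1, se1, n1, mean2, se2, n2, alpha=0.05):
    """Pooled-variance comparison of two group means from summary statistics.

    Returns (mean difference, CI lower, CI upper, two-sided p value).
    Assumption: the normal distribution stands in for Student's t.
    """
    # Recover each group's standard deviation from its standard error.
    sd1 = se1 * sqrt(n1)
    sd2 = se2 * sqrt(n2)
    # Pooled variance, as in the equal-variance Student t-test.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean2 - mean1
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    p = 2 * (1 - z.cdf(abs(diff) / se_diff))
    return diff, diff - crit * se_diff, diff + crit * se_diff, p


# CONSORT summary scores: 2002 to 2006 (39 trials) vs 2007 to 2011 (65 trials).
consort = compare_periods(11.4, 0.4, 39, 12.1, 0.3, 65)
# Clinical summary scores for the same two periods.
clinical = compare_periods(7.4, 0.5, 39, 9.3, 0.4, 65)
```

With these rounded inputs the CONSORT interval spans zero while the clinical interval does not, in line with the reported significance pattern.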
Figure 1. Search strategy
Figure 2, A shows an analysis of the reporting of the 5 most important methodological safeguards against bias included among the CONSORT criteria. For these 5 criteria κ ranged from 0.56 (intent to treat) to 0.96 (allocation concealment).
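For readers unfamiliar with the κ statistic, the following Python sketch shows how chance-corrected agreement between 2 reviewers can be computed for a single criterion. It is a minimal illustration of Cohen's κ, not the authors' code. The function name cohen_kappa and the ratings are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same rating by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of one criterion across 8 trials.
reviewer_1 = ["met", "met", "not met", "not met", "met", "not met", "not met", "not met"]
reviewer_2 = ["met", "not met", "not met", "not met", "met", "not met", "not met", "met"]
kappa = cohen_kappa(reviewer_1, reviewer_2)
```

In this made-up example the reviewers agree on 6 of 8 trials, but because a good part of that agreement would be expected by chance, κ is well below the raw agreement of 0.75.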
Table 1. Characteristics of 104 published RCTs of surgical devices to treat urolithiasis

                              No. 2002–2006 (%)   No. 2007–2011 (%)   Total No. (%)
Totals                        39 (37.5)           65 (62.5)           104 (100)
Included as study arm:
  PCNL                        7 (17.9)            36 (55.4)           43 (41.3)
  SWL                         13 (33.3)           13 (20)             26 (25)
  Ureteral stenting           15 (38.5)           15 (23.1)           30 (28.8)
  Ureteroscopy                6 (15.4)            5 (7.7)             11 (10.6)
No. study arms:
  2                           31 (79.5)           59 (90.8)           90 (86.5)
  Greater than 2              8 (20.5)            6 (9.2)             14 (13.5)
Sample size range:
  0–25                        4 (10.3)            2 (3.1)             6 (5.8)
  26–50                       11 (28.2)           15 (23.1)           26 (25.0)
  51–100                      13 (33.3)           21 (32.3)           34 (32.7)
  Greater than 100            11 (28.2)           27 (41.5)           38 (36.5)
No. study sites:
  Not reported                25 (64.1)           31 (47.7)           56 (53.8)
  1                           13 (33.3)           27 (41.5)           40 (38.5)
  2 or Greater                1 (2.6)             7 (10.8)            8 (7.7)
Continent of origin:
  North America               10 (25.6)           10 (15.4)           20 (19.2)
  South America               0                   2 (3.1)             2 (1.9)
  Europe                      16 (41)             18 (27.7)           34 (32.7)
  Asia                        10 (25.6)           29 (44.6)           39 (37.5)
  Africa                      3 (7.7)             6 (9.2)             9 (8.7)
Publication journal:
  Urology                     6 (15.4)            6 (9.2)             12 (11.5)
  The Journal of Urology®     15 (38.5)           12 (18.5)           27 (26.0)
  Urological Research*        4 (10.3)            8 (12.3)            12 (11.5)
  Journal of Endourology      8 (20.5)            20 (30.8)           28 (26.9)
  European Urology            3 (7.7)             2 (3.1)             5 (4.8)
  Other                       3 (7.7)             17 (26.2)           20 (19.2)

* Current title is Urolithiasis.
Reporting rates were lowest for blinding and highest for followup completeness. However, none of them showed a percent of reporting compliance higher than 34% and no observed increase was statistically significant. Figure 2, B shows a similar analysis of the 5 most important clinical criteria. Reporting rates were highest for preoperative and postoperative imaging. A large, statistically significant improvement from 2.6% (2002 to 2006) to 21.5% (2007 to 2011) was noted for reporting ethics board approval and research consent. On predefined subgroup analysis we compared reporting quality by continent of origin (table 2). The only statistically significant improvement was seen for the CONSORT summary score of studies from North America. There was also a trend toward improved clinical reporting for North American and Asian studies that did not attain statistical significance. On post hoc subgroup analysis based on publication journal European Urology and Urology® had the highest CONSORT and clinical summary scores, respectively, in 2007 to 2011. Only Journal of Endourology experienced a statistically significant increase in clinical summary score (from 5.4 to 8.9, p = 0.002).
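As an illustration of how a change in a reporting proportion between periods can be tested, the Python sketch below applies a Pearson chi-square test without continuity correction to a 2 x 2 table. The counts are an assumption: 1 of 39 trials and 14 of 65 trials are inferred from the reported 2.6% and 21.5% for ethics board approval and research consent, and the article does not state the underlying counts or which test was applied to this particular comparison. The function name chi_square_2x2 is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = (a, b, c, d)
    # Expected count per cell under independence: row total * column total / n.
    expected = (row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


# Assumed counts: 1 of 39 trials (2.6%) and 14 of 65 trials (21.5%)
# reported the criterion; the remainder in each period did not.
stat, p_value = chi_square_2x2(1, 38, 14, 51)
```

Under these assumed counts the p value falls below the 2-sided α = 0.05 threshold used in the study, consistent with the improvement being described as statistically significant.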
DISCUSSION A key finding of this study was the low overall reporting quality of RCTs of device related interventions to treat patients with urinary stones. Although the number of published RCTs increased approximately 1.67 times from the first to second 5-year intervals, we found no significant improvement in the overall methodological quality of these
Figure 2. A, select CONSORT criteria reporting. B, select clinical criteria reporting. SFS, stone-free status. IRB, institutional review board. Orange bars represent studies published in 2002 to 2006. Blue bars represent studies published in 2007 to 2011.
studies nor did reporting of important methodological safeguards against bias improve. In regard to reporting key clinical characteristics, such as prior treatment rates and definitions of stone-free status, there was improvement but the degree of improvement was modest. The major contributor to this improvement was the number of studies that reported research specific ethics board approval and/or consent. Our findings are important to the field of endourology since it is increasingly recognized that
the quality of evidence provided by a study relies on additional dimensions beyond the study design alone. Under the GRADE (Grading of Recommendations Assessment, Development and Evaluation) framework for rating quality of evidence, which has been endorsed by more than 70 organizations worldwide, including The Cochrane Collaboration®, RCTs are initially considered to provide high quality evidence but they may be downgraded for various factors, including the study limitations (eg lack of blinding) that we assessed.6
Table 2. Subgroup analysis of difference in CONSORT and clinical scores between periods by continent of RCT origin

                 Mean CONSORT Score                               Mean Clinical Score
Continent        2002–2006  2007–2011  MD (95% CI)                2002–2006  2007–2011  MD (95% CI)
Africa           12.17      12.58      0.41 (−1.52 to 2.35)       9.33       10.67      1.34 (−3.56 to 6.23)
Asia             11.63      11.56      −0.07 (−1.71 to 1.57)      7.70       9.79       2.09 (−0.11 to 4.30)
Europe           11.78      11.48      −0.30 (−1.67 to 1.07)      7.38       7.78       0.40 (−1.63 to 2.43)
North America    10.47      14.38      3.91 (1.22 to 6.62)        6.60       9.50       2.90 (−0.38 to 6.18)
South America    –          12.13      –                          –          9.00       –
As a result of methodological issues, most RCTs of device related procedures to treat patients with stone disease would likely be downgraded as providing only moderate or low quality evidence. This may result in weaker clinical practice guideline recommendations, since strong recommendations rely on the availability of high quality RCTs. Weak recommendations may have secondary health policy implications in the form of nonpayment or lower reimbursement. It may also be considered wasteful and unethical to perform an RCT that relies on volunteer participants unless seeking to optimize the quality of evidence that is generated.
An interesting secondary finding was the geographic shift in the origin of these studies with an increasing proportion from Asia rather than Europe or North America. When analyzing the association between quality and continent of origin, we found that the mean methodological reporting quality of studies from North America improved by approximately 4 points but it remained essentially unchanged for all others. The underlying reasons are unclear. Framed positively, the improvement may reflect an increased awareness of study groups from North America about critical factors that determine RCT quality and clinical usefulness. However, it may also represent a spurious finding. Given the importance to clinical decision making, future studies should continue to track the quality of RCTs from different parts of the world to confirm whether there is indeed divergence in quality with potential implications for editorial policy.
To our knowledge no previous group has specifically addressed the reporting quality of RCTs of surgical treatment of patients with urinary stones. In a related study Scales et al used an earlier version of CONSORT criteria on a 0 to 22-point scale to assess the methodological quality of RCTs in the general urological literature.4 They compared the reporting of RCTs in 4 major urological journals in 1996 to those in 2004.
Overall reporting quality was low with only a modest improvement in the mean CONSORT summary score with time from 10.2 points in 1996 to 12.0 points in 2004 for a MD of 1.8 (95% CI 1.0 to 2.6). Agha et al also used a 0 to 22-point scale to evaluate RCTs in the urological literature between 2000 and 2003, and found a similar average CONSORT score (11.1).7 However, each series is dated and not specific to the device related treatment of stone disease. Trials of surgical procedures and devices pose particular methodological challenges, including the inability to blind the urologists who perform the procedure, the issue of performance bias and the challenge of sham controlled trials to blind patients and avoid the placebo effect.8 It can also be a particular challenge to recruit patients to
RCTs of surgical procedures and devices, and identify funding sources. However, none of these challenges should negate the need for transparent reporting of important methodological details. Aside from the possibility of a type I statistical error, several other limitations deserve consideration. 1) For logistical reasons we did not blind members of the investigative team that performed the study assessment to the year of publication. To provide the most objective study assessment possible 2 investigators with formal evidence-based medicine training performed all data abstraction steps independently in duplicate with consistently high κ values for key criteria, which represents a strength of this study. 2) This analysis focused on the reporting quality of methodological details and clinical characteristics, which may differ from actual study quality.9 We could have conceivably achieved better insight into study performance and analysis if we had contacted the authors. However, we chose not to do that because the results of such an analysis would not reflect the reality of clinical urologists or guideline developers, who typically do not have the time or resources to reach out to study investigators for additional information. 3) The list of clinical criteria applied in this study is not validated. However, we would argue that most if not all criteria have recognized importance for managing stone disease (supplementary file 1, http://jurology.com/). We hope that future investigators will build on these criteria to validate and disseminate a clinical checklist of critical clinical items to include in urolithiasis studies. In contrast, CONSORT criteria are well established and have been broadly endorsed by many journals. 4) We are aware of the limitations of summary scores to characterize methodological and reporting quality since these calculations imply that different criteria given the same weight have similar importance.
To avoid obscuring potential improvement for the most important CONSORT criteria, which are also included in The Cochrane Collaboration risk of bias tool,10 we reported these results separately. Our study has important implications for clinical urologists, journal editors and researchers. Urologists who treat stone disease should be aware that despite the increased number of studies that might be considered level 1 evidence based on the hierarchy of evidence system by the Centre for Evidence Based Medicine, many RCTs have shortcomings involving the reporting of important methodological and clinical details.11 Therefore, study results should not be taken at face value but critically appraised for validity and applicability using established frameworks such as A Users’ Guide to the Urological Literature.12 Journal editors and
reviewers should recognize these limitations. They should encourage and train reviewers to apply a checklist for key CONSORT and clinical criteria to promote more transparent reporting and higher study quality. The research community should strive toward standardizing the key clinical characteristics to be included in any study related to the treatment of stone disease and include all important information in the published reports.13 Concerted efforts such as these will hopefully result in higher quality evidence to guide the endourological treatment of patients with stone disease.
CONCLUSIONS While the number of RCTs of urological devices used by surgeons to treat stone disease has increased with time, methodological and clinical reporting quality remains suboptimal. There is an urgent need to improve the reporting quality of RCTs to effect a meaningful change in reporting practices. There exists the clear desire to improve the treatment of patients with urolithiasis based on the number of RCTs published. Future improved reporting quality would enhance this effort and add further credibility to surgical device trials.
REFERENCES
1. Dahm P, Chapple CR, Konety BR et al: The future of clinical practice guidelines in urology. Eur Urol 2011; 60: 72.
2. Chan AW and Altman DG: Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors. BMJ 2005; 330: 753.
3. Schulz KF, Altman DG, Moher D et al: CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. Ann Intern Med 2010; 152: 726.
4. Scales CD Jr, Norris RD, Keitz SA et al: A critical assessment of the quality of reporting of randomized, controlled trials in the urology literature. J Urol 2007; 177: 1090.
5. McGinn T, Wyer PC, Newman TB et al: Tips for learners of evidence-based medicine: 3. Measures of observer variability (kappa statistic). CMAJ 2004; 171: 1369.
6. Guyatt GH, Oxman AD, Vist GE et al: GRADE: what is “quality of evidence” and why is it important to clinicians? BMJ 2008; 336: 995.
7. Agha R, Cooper D and Muir G: The reporting quality of randomised controlled trials in surgery: a systematic review. Int J Surg 2007; 5: 413.
8. McCulloch P, Taylor I, Sasako M et al: Randomised trials in surgery: problems and possible solutions. BMJ 2002; 324: 1448.
9. Devereaux PJ, Choi PT, El-Dika S et al: An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods. J Clin Epidemiol 2004; 57: 1232.
10. Cochrane Handbook for Systematic Reviews of Interventions, version 5.1.0, updated March 2011. Edited by JPT Higgins and S Green. The Cochrane Collaboration® 2011. Available at www.cochrane-handbook.org. Accessed August 12, 2013.
11. Singh JC and Dahm P: Evidence-based urology in practice: what are levels of evidence? BJU Int 2009; 103: 860.
12. Dahm P and Preminger GM: A Users’ Guide to the Urological Literature: introducing a series of evidence based medicine review articles. J Urol 2007; 178: 1149.
13. Bird VG and Bird VY: Need for standardization in defining parameters and success in clinical trials involving surgical treatment of urinary lithiasis. Curr Urol Rep 2011; 12: 87.