EUROPEAN UROLOGY 62 (2012) 791–796
available at www.sciencedirect.com journal homepage: www.europeanurology.com
Platinum Priority – Editorial and Reply from Authors Referring to the article published on pp. 779–790 of this issue
Posterior Reconstruction: Weighing the Evidence
Khurshid R. Ghani *, Mani Menon
Vattikuti Urology Institute, Henry Ford Health System, Detroit, MI, USA
The reapproximation of Denonvilliers’ fascia and the posterior periurethral tissue (rhabdosphincter), often described as posterior reconstruction (PR) or the Rocco stitch, may be performed at the time of the vesicourethral anastomosis at radical prostatectomy (RP) with the aim of improving early return to continence [1]. In fact, the method of incorporating the fascia posterior to the urethra for promotion of early continence was initially reported by Klein in 1992 [2]. Despite studies reporting differing results of its benefit, PR has become widely adopted at open RP (ORP), laparoscopic RP (LRP), and robot-assisted RP (RARP). The systematic review by Rocco and colleagues provides a welcome opportunity to critically appraise and qualify the evidence base surrounding the therapeutic benefit of this intervention [3]. The review concentrates on data from 11 comparative studies over a 6-yr period, divided among the three approaches to RP (two ORP, one LRP, and eight RARP studies). There were six retrospective and five prospective studies, two of which were randomized controlled trials (RCTs). Differences in functional outcomes (continence, anastomotic leak), oncologic outcomes (positive surgical margins), and safety outcomes (urinary retention, bladder neck stricture) were analyzed for patients undergoing RP with or without PR of the rhabdosphincter. As is the norm for meta-analyses, the strength of the intervention was assessed using a forest plot, which is a quantitative estimate of the net benefit aggregated over all the included studies. Depending on the outcome measure, the total number of studies aggregated ranged from two to seven. In summary, Rocco et al. found that PR improved return to continence at 3–7 d and 30–45 d with risk ratios of 1.79 (95% confidence interval [CI], 1.06–3.03) and 1.57 (95% CI, 1.15–2.14), respectively [3]. All other end points, including continence at 90 d, were not significant.
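As a minimal sketch of how a risk ratio and its 95% CI of the kind quoted above are usually obtained, the following assumes a Wald-type interval on the log scale. The function names and the example counts are hypothetical and are not taken from any study in the review. The second helper checks only that the reported pooled estimates sit at the log-scale midpoint of their intervals, as expected if the intervals were built this way.

```python
import math

def risk_ratio_ci(events_pr, n_pr, events_ctrl, n_ctrl, z=1.96):
    """Risk ratio with a Wald-type 95% CI computed on the log scale."""
    rr = (events_pr / n_pr) / (events_ctrl / n_ctrl)
    # Standard error of log(RR) for two independent binomial samples
    se_log_rr = math.sqrt(1 / events_pr - 1 / n_pr + 1 / events_ctrl - 1 / n_ctrl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

def log_scale_midpoint(lower, upper):
    """Geometric mean of the CI limits; equals the point estimate when the
    interval is symmetric on the log scale."""
    return math.sqrt(lower * upper)

# Hypothetical counts (NOT from any included study):
# 30 of 50 patients continent with PR, 20 of 50 without.
rr, lower, upper = risk_ratio_ci(30, 50, 20, 50)
print(f"RR = {rr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")

# Consistency check on the intervals reported in the review
print(round(log_scale_midpoint(1.06, 3.03), 2))  # compare with the reported 1.79
print(round(log_scale_midpoint(1.15, 2.14), 2))  # compare with the reported 1.57
```

An interval whose lower limit exceeds 1 is read as a statistically significant benefit, as with both pooled estimates reported above.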
DOI of original article: http://dx.doi.org/10.1016/j.eururo.2012.05.041
* Corresponding author. Vattikuti Urology Institute, Henry Ford Health System, 2799 W. Grand Boulevard, Detroit, MI 48202, USA. Tel. +1 248 219 6307; Fax: +1 313 916 4352. E-mail address: [email protected] (K.R. Ghani).
0302-2838/$ – see back matter © 2012 European Association of Urology. Published by Elsevier B.V. All rights reserved.
How was the evidence assessed? The authors used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework for literature reviews evaluating health care interventions [4]. The aim of PRISMA is to enable transparency so readers can assess the strengths and weaknesses of the intervention in question. Explicit systematic methods are selected to minimize bias and provide reliable findings from which conclusions can be drawn. A review should address the 27 items of the PRISMA checklist. An important item the authors fail to address in this review, however, is the risk of bias across studies, including the possibility of publication bias or selective reporting. The authors note that 4 of the 11 studies assessing continence within 90 d were negative studies, yet they exclude the negative study by Krane et al. [5], which was 1 of the 11 studies shortlisted after using the PRISMA methodology. Because this particular study assessed continence at 45–75 d, it was also excluded from the forest plot assessing benefit at 90 d. In this scenario, a sensitivity analysis would have been helpful to determine the effect of this study on the net benefit, thus minimizing any bias.
One of the simplest ways to assess the risk of bias within systematic reviews is to examine the funnel plot [4]. Funnel plots display each study's effect estimate against its sample size; an asymmetric plot suggests that trials showing no effect have been excluded. Some experts advocate caution in the assessment of meta-analyses where all the trials consist of small sample sizes, as is the case in all but one of the studies in this review. The effect of publication bias may be more pronounced in these situations. In addition, uninformative and potentially misleading combined estimates of the net benefit may be derived from analyses of observational studies, where confounding and selection bias can often distort the findings. The combination of both observational studies and RCTs (as was attempted in this review) may produce even more spurious results and usually is not encouraged [6].
Another important matter to consider when assessing the validity of a systematic review is the degree of heterogeneity between studies. The extent to which different kinds of studies are mixed into the “melting pot” of the systematic review is justifiably a cause for concern. For example, are RARP studies assessing PR and continence equivalent to ORP studies? What advantage does magnified three-dimensional vision confer on the performance of the anastomosis when compared with open or laparoscopic methods? Can continence outcomes derived from pad weights [7] be compared with self-reported pad use [8], where some patients may have minor incontinence and change pads for social reasons while others may not? Statistical tests such as the I² test of heterogeneity attempt to establish whether studies within meta-analyses are consistent [9]. Closer examination of the statistics in this review reveals I² values between 82% and 87% for continence outcomes at 3–7 d, 30–45 d, and 90 d. While a value of 0% indicates no observed heterogeneity, larger values show increasing heterogeneity, with moderate and high values acknowledged to be above 50% and 75%, respectively. Unless we know how consistent the results of the studies are, we cannot determine the generalizability of the findings.
Rocco and colleagues use the levels of evidence (LoEs) system from the Oxford Centre for Evidence-Based Medicine (CEBM) for grading the quality of evidence. This well-known hierarchical model considers data from RCTs and meta-analyses as the highest category of evidence, with less importance placed on observational studies. This position, however, can be dangerously dogmatic and ignore the sometimes larger amount of evidence accumulated through nonrandomized studies.
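To make the I² statistic discussed earlier concrete, the sketch below computes Cochran's Q and I² from per-study effect estimates under a fixed-effect, inverse-variance weighting, following the definition in Higgins et al. [9]. The input values are hypothetical log risk ratios, not data from the review, and the labelling thresholds (above 50% moderate, above 75% high) are simply those quoted in this editorial.

```python
def cochran_q_and_i2(effects, std_errors):
    """Return (Q, I2 in percent) for study effects with known standard errors."""
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 = (Q - df) / Q, truncated at zero and expressed as a percentage
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

def heterogeneity_label(i2_percent):
    """Label I2 using the thresholds quoted in the text."""
    if i2_percent > 75:
        return "high"
    if i2_percent > 50:
        return "moderate"
    return "low or none"

# Hypothetical log risk ratios and standard errors (illustration only)
q, i2 = cochran_q_and_i2([0.0, 1.0], [0.1, 0.1])
print(f"Q = {q:.1f}, I2 = {i2:.0f}% ({heterogeneity_label(i2)})")

# The I2 range reported for the continence outcomes in the review
print(heterogeneity_label(82), heterogeneity_label(87))
```

By these thresholds, the 82–87% values reported for the continence outcomes fall in the high-heterogeneity category.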
Recent evaluations have suggested that for selected medical topics, both randomized and nonrandomized studies may yield similar results [10]. Some treatments have such dramatic effects that biases can be ruled out without randomized trials. In such a scenario, the ratio of the treatment effect (signal) relative to the expected prognosis (noise) is large enough (often above a factor of 10) to likely represent a real treatment effect [11]. The updated version (2011) of the LoE model now acknowledges this shift in thinking by recategorizing observational studies with dramatic effect as level 2 evidence, the same as a high-quality RCT [12].
But what do we mean by quality? Probably the most thoroughly thought-out approach to defining quality of evidence in the context of a systematic review is provided by the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) Working Group: “Quality of evidence is a judgment about the extent to which one can be confident the estimates of effect are correct” [13]. In the GRADE system, judgments refer not to individual studies but to the body of evidence for each outcome, which is rated as high, moderate, low, or very low. Five categories are recognized to affect the quality of evidence: risk of bias, precision of the overall estimate across studies, consistency of results across studies, indirectness, and publication bias. Three further criteria are used to upgrade the quality (Table 1). These factors, along with details such as the total number of studies and participants, study design, and estimates for the relative and absolute effects of the intervention, are summarized to provide an evidence profile (EP). The EP serves as an explicit judgment of each factor that determines the quality of evidence for each outcome. With the increasing adoption of the GRADE system by many organizations, it is likely to become more popular in the field of urology worldwide [14].
What may not be so evident from this review is that the studies with the highest quality of evidence using objective assessments of continence (ie, pad weights) showed no benefit of PR [7,15].
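The GRADE rating logic described above (start from the study design, lower for the five downgrading categories, raise for the three upgrading criteria) can be sketched as simple arithmetic. This is a deliberate simplification under stated assumptions: GRADE ratings are structured judgments rather than a mechanical score, and the function name, the dictionary keys, and the clamping to the four levels are hypothetical choices made for illustration only.

```python
LEVELS = {4: "High", 3: "Moderate", 2: "Low", 1: "Very low"}
START = {"randomized": 4, "observational": 2}  # initial quality by study design

DOWNGRADE_FACTORS = {"risk_of_bias", "inconsistency", "indirectness",
                     "imprecision", "publication_bias"}
UPGRADE_FACTORS = {"large_effect", "dose_response", "residual_confounding"}

def grade_quality(study_design, downgrades=None, upgrades=None):
    """Illustrative GRADE-style rating for one outcome's body of evidence.

    downgrades: factor -> 1 (serious) or 2 (very serious)
    upgrades:   factor -> 1, or 2 for a very large effect
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    if set(downgrades) - DOWNGRADE_FACTORS or set(upgrades) - UPGRADE_FACTORS:
        raise ValueError("unknown GRADE factor")
    score = START[study_design] - sum(downgrades.values()) + sum(upgrades.values())
    return LEVELS[max(1, min(4, score))]  # keep within Very low .. High

# Hypothetical judgments, for illustration only
print(grade_quality("randomized", downgrades={"inconsistency": 2}))
print(grade_quality("observational", upgrades={"large_effect": 2}))
```

Writing the factors out explicitly mirrors the purpose of an evidence profile: each judgment that moves the rating is visible and open to challenge.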
What is clear is that surgeons achieving low continence rates without PR can improve continence outcomes with PR (Fig. 1). This is amply demonstrated in the study by Nguyen et al., where continence rates at 3–7 d improved from 3% to 34% with PR [8]. The signal-to-noise ratio in this study is obvious (roughly an 11-fold increase). However, if one is already reaching 74% continence at 30 d, reconstruction might not achieve much better results [7]. There might be other reasons for undertaking PR, such as improved hemostasis, greater support for a delicate anastomosis, or helping trainees to perform the anastomosis successfully. These factors were not assessed by Rocco et al. [3] and may warrant consideration in future studies.
Finally, there is a danger that systematic reviews may confuse rather than inform practitioners. Statistically significant tiny effects for interventions of clinical importance have become more common in the literature. Cautious interpretation is warranted, since most of these effects could be eliminated with even minimal biases. Furthermore, it may be argued that hierarchical models of grading evidence have no place in evidence-based medicine. Those who assess evidence should be prepared to base their conclusions on studies that are fit-for-purpose rather than those that conform to rigid methodology. Some would argue that hierarchies attempt to replace judgment with an oversimplistic pseudoquantitative assessment of the available evidence [16]. Newer methods such as the GRADE system provide a framework for assessing quality that promotes transparency and an explicit accounting of the judgments made, thereby allowing scrutiny and encouraging debate.
Conflicts of interest: The authors have nothing to disclose.
Table 1 – A summary of the Grading of Recommendations Assessment, Development, and Evaluation approach to rating quality of evidence
Study design and initial quality of body of evidence:
  Randomized trials: High
  Observational studies: Low
Lower quality if:
  Risk of bias: −1 Serious; −2 Very serious
  Inconsistency: −1 Serious; −2 Very serious
  Indirectness: −1 Serious; −2 Very serious
  Imprecision: −1 Serious; −2 Very serious
  Publication bias: −1 Likely; −2 Very likely
Higher quality if:
  Large effect: +1 Large; +2 Very large
  Dose response: +1 Evidence of a gradient
  All plausible residual confounding: +1 Would reduce a demonstrated effect; +1 Would suggest a spurious effect if no effect were observed
Quality of body of evidence:
  High (4+: ⊕⊕⊕⊕)
  Moderate (3+: ⊕⊕⊕○)
  Low (2+: ⊕⊕○○)
  Very low (1+: ⊕○○○)
Adapted from Balshem et al. [13].
[Fig. 1: plot of continence (%), 0–100, on the horizontal axis for each study on the vertical axis: *Sutherland 2011, Menon 2008, *Joshi 2010, *Brien 2011, *Krane 2009, *Kim 2010, Coelho 2010, Nguyen 2008, Rocco 2007, Rocco 2007, Rocco 2006.]
Fig. 1 – Absolute changes in continence after adopting posterior reconstruction at radical prostatectomy. Continence outcome determined at a minimum of 30 d or at 45–90 d if not assessed earlier. For each individual study, the baseline continence without reconstruction is the beginning of the line, and continence after implementation of reconstruction is the end of the line. Forward arrowhead indicates improvement; backward arrowhead indicates reduction. Thick bars indicate no statistically significant difference; thin bars are statistically significant. Studies with blue lines are randomized controlled trials. All studies are cited in Rocco et al. [3]. * Continence outcome determined at 45–90 d.
References
[1] Rocco F, Carmignani L, Acquati P, et al. Restoration of posterior aspect of rhabdosphincter shortens continence time after radical retropubic prostatectomy. J Urol 2006;175:2201–6.
[2] Klein EA. Early continence after radical prostatectomy. J Urol 1992;148:92–5.
[3] Rocco B, Cozzi G, Spinelli MG, et al. Posterior musculofascial reconstruction after radical prostatectomy: a systematic review of the literature. Eur Urol 2012;62:779–90.
[4] Liberati A, Altman DG, Tetzlaff J, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. Ann Intern Med 2009;151:W65–94.
[5] Krane LS, Wambi C, Bhandari A, Stricker HJ. Posterior support for urethrovesical anastomosis in robotic radical prostatectomy: single surgeon analysis. Can J Urol 2009;16:4836–40.
[6] Egger M, Schneider M, Davey Smith G. Spurious precision? Meta-analysis of observational studies. BMJ 1998;316:140–4.
[7] Menon M, Muhletaler F, Campos M, Peabody JO. Assessment of early continence after reconstruction of the periprostatic tissues in patients undergoing computer-assisted (robotic) prostatectomy: results of a 2-group parallel randomized controlled trial. J Urol 2008;180:1018–23.
[8] Nguyen MM, Kamoi K, Stein RJ, et al. Early continence outcomes of posterior musculofascial plate reconstruction during robotic and laparoscopic prostatectomy. BJU Int 2008;101:1135–9.
[9] Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003;327:557–60.
[10] Ioannidis JP, Haidich AB, Pappa M, et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. JAMA 2001;286:821–30.
[11] Glasziou P, Chalmers I, Rawlins M, McCulloch P. When are randomised trials unnecessary? Picking signal from noise. BMJ 2007;334:349–51.
[12] Centre for Evidence Based Medicine. CEBM (Centre for Evidence-Based Medicine) levels of evidence [2011]: introduction. Oxford Centre for Evidence-Based Medicine Web site. http://www.cebm.net/index.aspx?o=5653. Updated February 6, 2012.
[13] Balshem H, Helfand M, Schunemann HJ, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol 2011;64:401–6.
[14] Canfield SE, Dahm P. Rating the quality of evidence and the strength of recommendations using GRADE. World J Urol 2011;29:311–7.
[15] Sutherland DE, Linder B, Guzman AM, et al. Posterior rhabdosphincter reconstruction during robotic-assisted radical prostatectomy: results from a phase II randomized clinical trial. J Urol 2011;185:1262–7.
[16] Rawlins M. De testimonio: on the evidence for decisions about the use of therapeutic interventions. Lancet 2008;372:2152–61.
http://dx.doi.org/10.1016/j.eururo.2012.06.006