EUROPEAN UROLOGY 60 (2011) 931–934
Platinum Priority – Editorial and Reply from Authors
Referring to the article published on pp. 920–930 of this issue
Propensity Score Matching, Competing Risk Analysis, and a Competing Risk Nomogram: Some Guidance for Urologists May Be in Place

Monique J. Roobol a,*, Eveline A.M. Heijnsdijk b

a Department of Urology, Erasmus University Medical Centre, Rotterdam, The Netherlands; b Department of Public Health, Erasmus University Medical Centre, Rotterdam, The Netherlands
DOI of original article: 10.1016/j.eururo.2011.06.039
* Corresponding author. Department of Urology, Erasmus University Medical Centre, Room Nh-224, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands. Tel. +31 10 703 2240; Fax: +31 10 703 5315. E-mail address: [email protected] (M.J. Roobol).
0302-2838/$ – see back matter © 2011 European Association of Urology. Published by Elsevier B.V. All rights reserved.

Urologists are increasingly confronted with very sophisticated prediction tools intended to aid clinical decision making. Numerous publications are available, each claiming that its newly developed prediction tool performs well and outperforms the routine clinical approach.

Prostate-specific antigen (PSA)–based screening is one of the main reasons for the more frequent detection of small, well-differentiated, localised prostate cancer (PCa) at younger ages [1,2]. Screening advances the time of diagnosis by approximately 4–13 yr, depending on stage and Gleason grade at the time of detection [3]. Some early-detected PCa would never surface clinically during the patient's lifetime, which raises the question of whether low-risk PCa should be treated radically at all. Active surveillance has emerged as an alternative strategy: radical treatment is initially withheld and the disease is monitored. The switch to radical treatment, if it is needed at all, should take place while the cancer can still be treated with curative intent. This approach has the advantage that side effects, an inextricable aspect of radical treatment, can be postponed or avoided. Choosing a specific treatment for PCa is accompanied by decision-related stress [4], so counselling and decision support are welcome for both patient and physician.

It is of crucial importance that the potential user of a prediction tool is aware of its capabilities and performance in his or her specific situation. Therefore, it is almost mandatory that a newly developed tool be validated in a setting different from the one in which it was derived. Unfortunately, this is not always the case [5], potentially resulting in wrong decisions with far-reaching consequences. Another potential problem is that models are built on assumptions and/or techniques that undoubtedly have their flaws but are not immediately understood and appreciated by the user.

Abdollah et al. [6] have developed a very sophisticated tool (the competing risk nomogram) intended to quantify the individual protective effect of radical prostatectomy (RP) relative to an observational treatment strategy. As noted, such a tool would be very helpful. The nomogram is based on propensity score matching and competing risk analysis using the Surveillance Epidemiology and End Results (SEER)–Medicare database, from which Abdollah et al. identified 141 155 men aged 65 yr with nonmetastatic PCa. After exclusion of T3/T4 tumours and of patients with missing data, 44 694 PCa patients were included in the study.

Why did the authors use propensity score matching, and what is it? Random assignment to treatment tends to balance covariates, such as the age distribution, between the groups, which makes it the best design for comparing treatment results. However, randomising men with localised PCa to a treatment option has proven difficult [7], and achieving it requires a multidisciplinary approach, long-term commitment, and sufficient funding [8]. An alternative to random assignment is a matched-pairs design, in which each member of the first treatment group is matched with a member of the second treatment group on all factors that the researcher considers relevant and feasible. Propensity score matching is a refined approach to this design: the covariates are combined into a single propensity score, that is, the estimated probability of receiving a given treatment conditional on the observed covariates, and men in the two treatment groups are matched on this score. The propensity score method can therefore correct for bias only if all pertinent variables are observed and measured without error; when a confounder is unobserved or measured with error, the method cannot fully correct for the resulting hidden bias. This can be examined in two ways: with a sensitivity analysis that measures the effect of an unknown confounder, or by using a pretreatment variable known to be unaffected by the treatment, in which case applying the propensity score method to this variable should show no treatment effect [9]. The authors chose the first option and assessed the effect of an unknown confounder: the reported protective effect of RP would disappear if the prevalence of that confounder were at least 30% higher in one group, which is very unlikely.

In the current paper, propensity score matching resulted in a cohort of 11 669 matched pairs. This cohort was subsequently split at random into a development cohort and a validation cohort, a procedure that should yield comparable numbers of outcome events in both. However, the number of PCa deaths in the RP group of the development cohort (n = 94) is considerably lower than in the validation cohort (n = 130), which raises questions about the random subdivision. In addition, the discrimination of the nomogram, especially for the prediction of other-cause mortality, was not perfect.

The newly developed nomogram is based on a so-called competing risk model. What is this exactly, and why is it important for PCa? Survival analyses investigate time-to-event data. The widely used Kaplan-Meier method assumes that censoring is noninformative.
This assumption implies that the reason for censoring is unrelated to the probability of the event of interest. By definition, this does not hold for competing events: if a subject dies from a cause that is not of interest to the research question, that person can no longer die from the cause of interest. Competing risks should therefore be considered in any analysis of cause-specific mortality. In the current study, a man may die from a cause other than PCa; he can then no longer die of PCa, but neither does he survive to the end of the study period. Estimating the cumulative incidence of the event of interest with the Kaplan-Meier method (as one minus the Kaplan-Meier estimate, with competing deaths treated as censored) generally yields larger values than methods that account for competing risks. This is especially true for a disease that causes death in the elderly, in whom competing risks are abundant.

The current paper used both propensity score matching and competing risk analysis and produced a nomogram intended to aid clinical practice; however, a few additional critical remarks are warranted. The nomogram is based on SEER registry data, and such administrative databases have weaknesses in the availability of relevant data, as pointed out in the paper. The authors acknowledge that PSA is missing and state that this is not crucial because PSA has been shown to have an intermediate effect on survival; however, this lack
might limit the predictive capability of the nomogram. A case-control study has shown that PSA at age 60 is an extremely strong predictor of the risk of PCa metastasis (area under the curve [AUC]: 0.86) and death (AUC: 0.90) by age 85 [10]. PSA also contributes significantly to a nomogram developed to predict PCa-specific mortality after RP [11].

The outcome in the model is the potential benefit of RP versus an observational approach for low-risk clinically localised PCa. Consequently, the quality and definition of these treatment data in the registry are critical. The nomogram was developed on the basis of data from numerous hospitals in different regions. This may benefit the generalisability of the model but raises questions about its applicability in a particular hospital setting: high-volume centres are known to have significantly better outcomes after RP than low-volume centres [12]. Whether the pooled SEER data, despite the large sample size, are representative of one's personal experience remains questionable.

Observation as a treatment option is defined as the absence of active treatment codes during the 6 mo following PCa diagnosis. The possibility that active treatment was initiated after 6 mo cannot be excluded, and it is also unknown whether the men treated with RP received additional treatment after 6 mo. Furthermore, the term observation is vague: active surveillance for low-risk clinically localised PCa within a strict protocol, with scheduled visits and defined triggers for a switch to active therapy [13], may well not be comparable with what the SEER data record as observation.

Another remarkable characteristic of the nomogram, a direct consequence of the underlying data, is that Gleason 8–10 has more impact on overall mortality than Gleason 6–7 PCa.
The same holds true for clinical stage: T2C PCa has relatively more impact on overall mortality than T2A/B PCa, although one would expect the opposite. The nomogram is based on hazard ratios calculated using multivariable competing risk analysis. The confidence intervals around these hazard ratios are still rather wide, which explains these counterintuitive effects on overall and cancer-specific mortality.

The authors have developed a very sophisticated tool that is correctly analysed and constructed from a statistical point of view. However, using the nomogram requires two separate score assessments and calculations. Moreover, the underlying data may not be representative of a contemporary clinical setting, so the practical applicability of the proposed nomogram is most likely limited.

Conflicts of interest: The authors have nothing to disclose.
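To make the propensity score matching discussed in this editorial more concrete, the following Python sketch estimates each man's probability of receiving RP from observed covariates with a logistic model and then forms 1:1 matched pairs on that score. It is an illustration under stated assumptions, not the procedure used by Abdollah et al.: the simulated covariates (standardised age and a comorbidity flag), the plain gradient-ascent logistic fit standing in for a statistical package, the greedy nearest-neighbour matching without replacement, and the caliper of 0.2 standard deviations of the logit of the score are all hypothetical choices made here.

```python
import math
import random


def fit_logistic(X, t, lr=1.0, epochs=500):
    """Estimate P(RP = 1 | covariates) by batch gradient ascent on the
    logistic log-likelihood; returns [intercept, coef_1, ..., coef_k]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, t):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            resid = ti - 1.0 / (1.0 + math.exp(-z))  # observed minus predicted
            grad[0] += resid
            for j, xj in enumerate(xi):
                grad[j + 1] += resid * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def propensity_scores(X, w):
    """Predicted probability of receiving RP for every man."""
    return [1.0 / (1.0 + math.exp(-(w[0] + sum(a * b for a, b in zip(w[1:], xi)))))
            for xi in X]


def match_pairs(ps, t, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement on the logit
    of the propensity score. Treated men without an untreated man inside the
    caliper (caliper_sd * SD of the logit) remain unmatched."""
    logit = [math.log(p / (1.0 - p)) for p in ps]
    mean = sum(logit) / len(logit)
    sd = math.sqrt(sum((v - mean) ** 2 for v in logit) / (len(logit) - 1))
    caliper = caliper_sd * sd
    treated_idx = [i for i, ti in enumerate(t) if ti == 1]
    controls = {i for i, ti in enumerate(t) if ti == 0}
    pairs = []
    for i in treated_idx:
        if not controls:
            break
        j = min(controls, key=lambda c: abs(logit[c] - logit[i]))
        if abs(logit[j] - logit[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


def mean_difference(values, group_a, group_b):
    """Mean of values in group_a minus mean in group_b (index lists)."""
    return (sum(values[i] for i in group_a) / len(group_a)
            - sum(values[i] for i in group_b) / len(group_b))


# Hypothetical cohort: younger (lower standardised age) men without
# comorbidity are more likely to receive RP, so the raw groups are imbalanced.
random.seed(1)
X, t = [], []
for _ in range(300):
    age = random.gauss(0.0, 1.0)
    comorbid = 1.0 if random.random() < 0.3 else 0.0
    p_rp = 1.0 / (1.0 + math.exp(-(0.2 - 1.0 * age - 0.8 * comorbid)))
    X.append([age, comorbid])
    t.append(1 if random.random() < p_rp else 0)

w = fit_logistic(X, t)
pairs = match_pairs(propensity_scores(X, w), t)

ages = [x[0] for x in X]
treated = [i for i, ti in enumerate(t) if ti == 1]
untreated = [i for i, ti in enumerate(t) if ti == 0]
print("age difference, all men:    ", round(mean_difference(ages, treated, untreated), 3))
print("age difference, matched men:",
      round(mean_difference(ages, [i for i, _ in pairs], [j for _, j in pairs]), 3))
print("matched pairs:", len(pairs))
```

Only covariates entered into the logistic model are balanced in this way; an unmeasured confounder, the central caveat raised above, is left untouched by the matching.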
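The difference between the Kaplan-Meier approach and a competing risk analysis described in this editorial can be shown with a toy data set. The Python sketch below assumes a hypothetical coding of 0 for censored, 1 for PCa death, and 2 for other-cause death, and computes (a) one minus the Kaplan-Meier estimate with other-cause deaths treated as censored and (b) the Aalen-Johansen cumulative incidence function, which accounts for the competing event. It illustrates these two textbook estimators only; it is not the competing-risks regression used to build the nomogram.

```python
def cumulative_incidence(times, causes, cause_of_interest=1):
    """Return (event_times, one_minus_km, aalen_johansen) for one cause.

    causes: 0 = censored; any other integer = a distinct cause of death.
    one_minus_km treats deaths from other causes as censored (the naive
    approach); aalen_johansen accounts for them as competing events.
    """
    event_times = sorted({tm for tm, c in zip(times, causes) if c != 0})
    km_cause = 1.0   # KM "survival" with competing deaths treated as censored
    km_any = 1.0     # KM survival from death of any cause, S(t-)
    cif = 0.0
    out_times, naive, aj = [], [], []
    for tk in event_times:
        at_risk = sum(1 for tm in times if tm >= tk)
        d_cause = sum(1 for tm, c in zip(times, causes)
                      if tm == tk and c == cause_of_interest)
        d_any = sum(1 for tm, c in zip(times, causes) if tm == tk and c != 0)
        cif += km_any * d_cause / at_risk   # uses S(t-) before the update
        km_cause *= 1.0 - d_cause / at_risk
        km_any *= 1.0 - d_any / at_risk
        out_times.append(tk)
        naive.append(1.0 - km_cause)
        aj.append(cif)
    return out_times, naive, aj


# Six hypothetical men: follow-up time (yr) and cause code.
times = [1, 2, 3, 4, 5, 6]
causes = [2, 1, 2, 1, 0, 1]
event_times, naive, cif_pca = cumulative_incidence(times, causes, cause_of_interest=1)
_, _, cif_other = cumulative_incidence(times, causes, cause_of_interest=2)
for tk, n_est, c_est in zip(event_times, naive, cif_pca):
    print(f"t={tk}: 1-KM={n_est:.3f}  CIF={c_est:.3f}")
```

For these six hypothetical men, the naive estimate for PCa death reaches 1.0 at the last event time, whereas the Aalen-Johansen cumulative incidence is 2/3; the remaining 1/3 is the cumulative incidence of other-cause death, so the two cause-specific incidences add up to the overall probability of death.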
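The concern about the random split into development and validation cohorts can be quantified with simple arithmetic. Assuming, hypothetically, that the matched pairs were divided into two equally sized halves, each of the 224 PCa deaths in the RP group (94 + 130) would be equally likely to fall into either cohort, so the count in the development cohort would follow a binomial distribution with n = 224 and p = 0.5. The Python sketch below computes an exact two-sided binomial probability of a split at least as uneven as 94 versus 130; if the actual split proportion differed from one half, p would have to be changed accordingly.

```python
from math import comb


def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of all outcomes that
    are no more likely than the observed count k."""
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)   # small tolerance for floating-point ties
    return sum(q for q in pmf if q <= threshold)


dev_deaths, val_deaths = 94, 130   # PCa deaths in the RP group, as reported above
p_value = two_sided_binomial_p(dev_deaths, dev_deaths + val_deaths)
print(f"two-sided P for a split at least this uneven: {p_value:.3f}")
```

Under the equal-split assumption, this probability comes out at roughly 0.02.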
References
[1] Catalona WJ, Smith DS, Ratliff TL, et al. Measurement of prostate-specific antigen in serum as a screening test for prostate cancer. N Engl J Med 1991;324:1156–61.
[2] Schröder FH, Hugosson J, Roobol MJ, et al. Screening and prostate-cancer mortality in a randomized European study. N Engl J Med 2009;360:1320–8.
[3] Draisma G, Boer R, Otto SJ, et al. Lead times and overdetection due to prostate-specific antigen screening: estimates from the European Randomized Study of Screening for Prostate Cancer. J Natl Cancer Inst 2003;95:868–78.
[4] Steginga SK, Turner E, Donovan J. The decision-related psychosocial concerns of men with localised prostate cancer: targets for intervention and research. World J Urol 2008;26:469–74.
[5] Schröder F, Kattan MW. The comparability of models for predicting the risk of a positive prostate biopsy with prostate-specific antigen alone: a systematic review. Eur Urol 2008;54:274–90.
[6] Abdollah F, Sun M, Schmitges J, et al. Cancer-specific and other-cause mortality after radical prostatectomy versus observation in patients with prostate cancer: competing-risks analysis of a large North American population-based cohort. Eur Urol 2011;60:920–30.
[7] PR06 Collaborators. Early closure of a randomized controlled trial of three treatment approaches to early localised prostate cancer: the MRC PR06 trial. BJU Int 2004;94:1400–1.
[8] Lane JA, Hamdy FC, Martin RM, Turner EL, Neal DE, Donovan JL. Latest results from the UK trials evaluating prostate cancer screening and treatment: the CAP and ProtecT studies. Eur J Cancer 2010;46:3095–101.
[9] Luo Z, Gardiner JC, Bradley CJ. Applying propensity score methods in medical research: pitfalls and prospects. Med Care Res Rev 2010;67:528–53.
[10] Vickers AJ, Cronin AM, Björk T, et al. Prostate specific antigen concentration at age 60 and death or metastasis from prostate cancer: case-control study. BMJ 2010;341:c4521.
[11] Stephenson AJ, Kattan MW, Eastham JA, et al. Prostate cancer-specific mortality after radical prostatectomy for patients treated in the prostate-specific antigen era. J Clin Oncol 2009;27:4300–5.
[12] Vickers AJ, Savage CJ, Bianco FJ, et al. Surgery confounds biology: the predictive value of stage-, grade- and prostate-specific antigen for recurrence after radical prostatectomy as a function of surgeon experience. Int J Cancer 2011;128:1697–702.
[13] van den Bergh RCN, Roemeling S, Roobol MJ, Roobol W, Schröder FH, Bangma CH. Prospective validation of active surveillance in prostate cancer: the PRIAS study. Eur Urol 2007;52:1560–3.
doi:10.1016/j.eururo.2011.07.039
Platinum Priority
Reply from authors re: Monique J. Roobol, Eveline A.M. Heijnsdijk. Propensity Score Matching, Competing Risk Analysis, and a Competing Risk Nomogram: Some Guidance for Urologists May Be in Place. Eur Urol 2011;60:931–3

Maxine Sun a,*, Firas Abdollah b, Pierre I. Karakiewicz a,c

a Cancer Prognostics and Health Outcomes Unit, University of Montreal Health Center, Montreal, Canada; b Department of Urology, Vita Salute San Raffaele University, Milan, Italy; c Department of Urology, University of Montreal Health Center, Montreal, Canada
DOIs of original articles: 10.1016/j.eururo.2011.06.039, 10.1016/j.eururo.2011.07.039
* Corresponding author. Cancer Prognostics and Health Outcomes Unit, University of Montreal Health Center, 1058, rue St-Denis, Montreal, Quebec, Canada H2X 3J4. Tel. +1 514 890 8000 ext 35335; Fax: +1 514 227 5103. E-mail address: [email protected] (M. Sun).

We read with interest the comment by Roobol et al [1] regarding our recent publication [2]. The authors [1] systematically summarized the statistical methods used and restated the limitations that pertained to our study [2]. First, as previously indicated, we agree that a propensity-based matched analysis cannot be considered equivalent to a randomized trial. However, it serves as a fitting alternative and reduces to a minimum the treatment selection bias inherent in retrospective data. Second, although the lack of prostate-specific antigen (PSA) may have undermined the discrimination of our model, its prognostic ability remains secondary to that of tumor stage and grade [3], for which we adjusted. Third, the authors [1] proposed that the model may not be applicable to patients operated on at high-volume hospitals. Although the relevance of hospital volume is undeniable, no study to date has demonstrated that hospital volume increases the
prognostic ability of already-established factors (eg, tumor stage, tumor grade, age, comorbidity). Consequently, the role of hospital volume in predicting prognosis remains inconclusive. Fourth, the authors stated that there may have been some residual differences between the development and validation cohorts. However, we do not view this as a limitation. In fact, most validation cohorts differ from the cohort in which the model was developed. Accordingly, this can only be considered an additional strength of our study [2]. Finally, with regard to the model, Gleason 8–10 was not statistically significantly associated with other-cause mortality (hazard ratio: 0.92; confidence interval: 0.80–1.06). Nonetheless, a trend toward a protective effect on this end point is intuitive, because these patients are more likely to die of prostate cancer and are therefore removed from the risk of other-cause death. The same observation may be made with regard to clinical stage.

We acknowledge the limitations that the authors have noted (eg, treatment selection bias, lack of PSA, lack of data on hospital volume, unknown confounder). Nonetheless, we relied on stringent and statistically sound methods that have been used in several established publications [4–6]: (1) propensity-based adjustment; (2) division of the population into two cohorts, namely, the development and validation cohorts; (3) competing-risks regression analyses to account for other-cause mortality; (4) development of a nomogram; (5) estimation of the predictive accuracy using discrimination and calibration; and (6) sensitivity analyses. The end result was a user-friendly prediction tool aimed at helping clinicians as well as patients in routine clinical counseling and follow-up decision making. Many novel prediction tools have been proposed to assess the prognosis of patients diagnosed with localized prostate cancer.
To the best of our knowledge, our study presents the first competing-risk nomogram, which can estimate the risk of cancer-specific and other-cause