ORIGINAL ARTICLE
Independent Validation of Six Melanoma Risk Prediction Models
Catherine M. Olsen1, Rachel E. Neale1, Adèle C. Green1,2, Penelope M. Webb1, the QSkin Study1, the Epigene Study1 and David C. Whiteman1

Identifying people at high risk of melanoma is important for targeted prevention activities and surveillance. Several tools have been developed to classify melanoma risk, but few have been independently validated. We assessed the discriminatory performance of six melanoma prediction tools by applying them to individuals from two independent data sets, one comprising 762 melanoma cases and the second a population-based sample of 42,116 people without melanoma. We compared the model predictions with actual melanoma status to measure sensitivity and specificity. The performance of the models was variable with sensitivity ranging from 97.7 to 10.5% and specificity from 99.6 to 1.3%. The ability of all the models to discriminate between cases and controls, however, was generally high. The model developed by MacKie et al. (1989) had higher sensitivity and specificity for men (0.89 and 0.88) than women (0.79 and 0.72). The tool developed by Cho et al. (2005) was highly specific (men, 0.92; women, 0.99) but considerably less sensitive (men, 0.64; women, 0.37). Other models were either highly specific but lacked sensitivity or had low to very low specificity and higher sensitivity. Poor performance was partly attributable to the use of non-standardized assessment items and various differing interpretations of what constitutes “high risk”.

Journal of Investigative Dermatology (2015) 135, 1377–1384; doi:10.1038/jid.2014.533; published online 29 January 2015
1Cancer Control Group, Population Health Department, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia and 2Cancer Research UK Manchester Institute and Institute of Inflammation and Repair, University of Manchester, Manchester, UK
Correspondence: David C. Whiteman, Cancer Control Group, Population Health Department, QIMR Berghofer Medical Research Institute, Locked Bag 2000, RBWH, Brisbane, Queensland 4029, Australia. E-mail: [email protected]
Abbreviations: AUC, area under the curve; ROC, receiver operating characteristic curve
Received 18 September 2014; revised 18 November 2014; accepted 9 December 2014; accepted article preview online 30 December 2014; published online 29 January 2015
© 2015 The Society for Investigative Dermatology

INTRODUCTION
Accurate identification of people at high risk of melanoma is important for both prevention and early detection. In terms of medical surveillance, although the efficacy of routine skin screening in people of average risk remains unproven (Geller, 2002; Wolff et al., 2009), there is some evidence that screening those at high risk of melanoma is cost-effective (Freedberg et al., 1999) and may improve survival (MacKie et al., 1993). Medical organizations in North America (Feightner, 1994; Wolff et al., 2009), New Zealand, and Australia (ACNMGRW, 2008) recommend screening subsets of the population determined to be at high risk. A number of tools have been developed to assist this targeted screening, but recent reviews (Usher-Smith et al., 2014; Vuong et al., 2014) reported that only one (Fortes et al., 2010) had been externally validated in a new population, an important step to assess the performance of a model (Altman et al., 2009).
Models tend to perform better when the distributions of melanoma risk factors in the new population are within the ranges seen in the development population (Moons et al., 2009). To assess the validity of a risk model in an independent population, one first calculates a risk score for the individuals in the new population using the parameters from the specified risk model. These risk scores are then compared with the actual status of each individual to assess how well the model discriminates those with the condition from those without. There are three different elements of validity: (i) agreement between the observed and predicted probabilities of outcomes (calibration), (ii) ability to distinguish subjects with different outcomes (discrimination), and (iii) ability to improve the decision-making process (clinical usefulness). We assessed the performance of six melanoma risk prediction tools, applying the risk metrics defined by the model developers, using two independent Australian data sets: a series of melanoma patients (the Epigene Study (Kvaskoff et al., 2013)) and a large population-based sample of people without melanoma (controls; the QSkin Study (Olsen et al., 2012)).

RESULTS
Table 1 shows a summary of the risk prediction models. The number of variables included in the models ranged from 4 to 10. Only one model developed in an Italian population had been externally validated (in a Brazilian population; Fortes et al., 2010). Two other models were developed in US populations (Cho et al., 2005; Fears et al., 2006) and one each in Australian (Mar et al., 2011), Scottish (MacKie et al., 1989), and French (Quereux et al., 2011) populations.

Table 1. Characteristics of six melanoma risk prediction models

| Author (year of publication) | Geographic location | Study design | Model predictors | Risk metrics |
|---|---|---|---|---|
| Mar et al., 2011 | Australia | Published meta-analysis and registry data | Age; Sex; State; Total common nevi; Atypical nevi; Freckles; Hair color; Family history of melanoma; Personal history of melanoma; Personal history of keratinocyte cancer | 5-year risk (%): Very high, >3%; High, >0.15%; Moderate, >0.045%; Low |
| Quereux et al., 2011 | France | Case-control | Age; Sex; Nevi on arms; Freckles; Family history of melanoma; Severe sunburn in childhood; Skin phototype; Life in a country at low latitude | High risk defined as having three or more risk factors or one major risk factor. |
| Fortes et al., 2010 | Italy | Case-control | Common nevi; Skin color; Hair color; Freckles; Sunburns in childhood | Score of 3 or more. High risk defined by using cut-off points on the receiver operating characteristic curve to maximize sensitivity and specificity |
| Fears et al., 2006 | USA | Case-control | Age; Sex; Region; Small nevi; Large nevi(1); Freckling; Light complexion; Skin phototype; Light or no tan; Severe solar damage(1) | 5-year risk (%): Very high, >3%; High, >0.15%; Moderate, >0.045%; Low |
| Cho et al., 2005 | USA | Prospective cohort | Age; Sex; Nevi >3 mm on arms/legs; Hair color; Family history of melanoma; History of severe sunburn | 10-year risk (%). Risk cut-points not defined |
| MacKie et al., 1989 | Scotland | Case-control | Sex; Total nevi; Atypical nevi; Freckling tendency | Relative risk: <2 = low or average risk; 2–5 = increased risk; 5–10 = greatly increased risk; >10 = very greatly increased risk |

(1) Men only.

Study design
Of the six tools, four were developed from case-control studies (MacKie et al., 1989; Fears et al., 2006; Fortes et al., 2010; Quereux et al., 2011), one from a prospective cohort study (Cho et al., 2005), and one using relative risks from meta-analyses of published data and attributable risk estimates from a clinical series of melanoma patients (Mar et al., 2011).
Definition of risk metrics
Two risk prediction models used arbitrary absolute risk cutpoints, defining “high risk” individuals as those with a 5-year absolute risk greater than 0.15% (Fears et al., 2006; Mar et al., 2011). One model (Fortes et al., 2010) defined “high risk” individuals using cut-off points on the receiver operating characteristic (ROC) curve to maximize sensitivity and specificity. For another model (Quereux et al., 2011), a “high risk” individual was defined as having three or more risk factors present or one major risk factor (“presence of more than 20 nevi on the arms” for people aged below 60 years and “presence of freckles” for patients aged 60 years and above). The fifth model defined “very high risk” individuals as those with a relative risk over 10 (MacKie et al., 1989). The sixth model by Cho et al. (2005) did not define risk cut-points.

Characteristics of the case (Epigene) and control (QSkin) populations are provided in Supplementary Tables 1 and 2 online, respectively. The mean age of cases was 58 years and controls 56 years.

Table 2. Sensitivity, specificity, and receiver operating characteristic curve areas (AUC) of six melanoma risk prediction models tested in two independent data sets

| Risk prediction model | Sensitivity, males | Sensitivity, females | Specificity, males | Specificity, females | AUC ROC (95% CI), males | AUC ROC (95% CI), females | AUC ROC (95% CI), persons |
|---|---|---|---|---|---|---|---|
| Mar et al., 2011 | 0.36 | 0.11 | 0.99 | 0.99 | 0.96 (0.95–0.97) | 0.89 (0.86–0.91) | 0.93 (0.92–0.95) |
| Quereux et al., 2011 | 0.62 | 0.84 | 0.60 | 0.47 | 0.81 (0.79–0.83) | 0.82 (0.79–0.85) | 0.80 (0.78–0.81) |
| Fortes et al., 2010 | 1 | 1 | 0.01 | 0.01 | 0.91 (0.89–0.92) | 0.88 (0.86–0.90) | 0.89 (0.88–0.90) |
| Fears et al., 2006 | 0.98 | 0.79 | 0.12 | 0.21 | 0.82 (0.80–0.84) | 0.47 (0.43–0.50) | 0.73 (0.71–0.75) |
| Cho et al., 2005(1) | 0.64 | 0.37 | 0.92 | 0.99 | 0.93 (0.92–0.94) | 0.90 (0.88–0.91) | 0.92 (0.91–0.93) |
| MacKie et al., 1989(2) | 0.89 | 0.79 | 0.88 | 0.72 | 0.92 (0.89–0.92) | 0.79 (0.77–0.81) | 0.85 (0.84–0.86) |

Abbreviations: AUC, area under the curve; CI, confidence interval; ROC, receiver operating characteristic curve.
(1) Calculated 5-year risk and using the same cut-point (0.15%) as the Mar and Fears models.
(2) Very greatly increased risk (RR >10).

Sensitivity, specificity, and area under the ROC curve
The performance of the models in the validation data sets using the thresholds defined by the original investigators was variable. Sensitivity of the models ranged from 0.98 to 0.11 and specificity from 1 to 0.01. The model developed by MacKie et al. (1989) had higher sensitivity for men than women (0.89 and 0.79, respectively) and also higher specificity (0.88 and 0.72; Table 2). The tool developed by Cho et al. (2005) was highly specific (0.92 for men; 0.99 for women) but less sensitive (0.64 for men; 0.37 for women). Other models were either highly specific (Mar et al., 2011) but lacked sensitivity or had low (Quereux et al., 2011) to very low specificity (Fears et al., 2006; Fortes et al., 2010) and higher sensitivity.

It is notable that two models categorized a large proportion of both cases and controls as high risk (Fears et al., 2006; Fortes et al., 2010), whereas two others had high specificity but lower sensitivity, correctly identifying a large proportion of controls as low risk (Cho et al., 2005; Mar et al., 2011). Factors contributing to these outcomes included the risk factors included in the models (Table 1), the magnitude of the risks ascribed to those factors (Supplementary Tables 3–8 online), and the prevalence of those risk factors in the two independent populations (Supplementary Tables 1–2 online). For the model by Fortes et al. (2010), the median risk scores in the case and control data sets were 218 and 42, respectively, but using the authors’ defined cut-point of 3 placed a large proportion of both cases and controls in the high risk category. Similarly, for the model by Fears et al. (2006), the mean estimate of 5-year melanoma risk was 0.49 and 0.71 in the control and case data sets, respectively, with a cut-point for high risk of 0.15. Conversely, the model by Mar et al. (2011) correctly classified 99% of controls as low risk but showed lower sensitivity; the median 5-year melanoma risk was 0.006 in the control data set and 0.08 in the case data set (again with a cut-point for high risk of 0.15).

Stratification by age showed that the models by Quereux et al. (2011) and MacKie et al. (1989) were more sensitive for people aged <60, whereas the model by Cho et al. (2005) was more sensitive among those aged ≥60 years (Supplementary Table 9 online). Model specificity did not vary widely between the two age-groups. There were no appreciable differences in the model performance for men and women separately (data not shown).

Using a common threshold of specificity, the model with the highest sensitivity was the model by Mar et al. (2011; Table 3), followed by the models of Cho et al. (2005), MacKie et al. (1989), and Fortes et al. (2010). The model by Fears et al. (2006) had the lowest sensitivity at thresholds of 80 and 90% specificity; this model included a measure of “severe solar damage” to the skin (for men only), which was not available in the Epigene or QSkin data sets.

Overall, the discriminatory performance of the models was high, with area under the curve (AUC) values ranging from 0.73 (95% confidence interval (CI) 0.71–0.75) for the model by Fears et al. (2006) to 0.93 (95% CI 0.92–0.95) for the model by Mar et al. (2011; Table 2; Figure 1). For the two models that had presented AUC values for their model derivation data sets, the AUC values were slightly higher in our validation data set (0.80 vs. 0.71 for the model by Quereux et al. (2011) and 0.89 vs. 0.79 for the model by Fortes et al. (2010)).

Calibration
When the Hosmer–Lemeshow goodness-of-fit test was used to quantify the overall fit of the models to the data, a high P-value for the model developed by MacKie et al. (1989; P = 0.99) indicated an excellent agreement between the observed and the expected number of cases; for all other models, the P-value was <0.001. The calibration curves are presented as Figure 2.
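The comparison of observed and expected cases that underlies the Hosmer–Lemeshow test can be illustrated with a minimal Python sketch. This is not the authors' SAS code. The function names `hosmer_lemeshow` and `chi2_sf_even_df`, the use of ten equal-sized risk groups, the choice of (groups − 2) degrees of freedom, and the example data are all assumptions made for illustration.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only).

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which holds when df is a positive even integer.
    """
    if df <= 0 or df % 2:
        raise ValueError("this helper only supports positive even df")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(outcomes, predicted, groups=10):
    """Return (statistic, p_value); assumes an even number of groups.

    outcomes: 0/1 disease status; predicted: model probabilities.
    People are sorted by predicted risk and cut into equal-sized groups;
    observed and expected case counts are then compared group by group.
    """
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        if size == 0:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        if 0.0 < mean_p < 1.0:
            # Combined contribution of cases and non-cases in this group.
            stat += (observed - expected) ** 2 / (expected * (1.0 - mean_p))
    # Assumption: groups - 2 degrees of freedom.
    return stat, chi2_sf_even_df(stat, groups - 2)


# Hypothetical data: 10 risk groups of 20 people; in group g the predicted
# risk is (g + 1) / 20 and exactly g + 1 people have the outcome.
predicted, outcomes = [], []
for g in range(10):
    predicted += [(g + 1) / 20] * 20
    outcomes += [1] * (g + 1) + [0] * (19 - g)

stat, p_value = hosmer_lemeshow(outcomes, predicted)
print(stat, p_value)
```

In this sketch a large P-value corresponds to close agreement between observed and expected counts, and a small P-value to a departure from goodness-of-fit, matching the interpretation used in the text.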
Table 3. Sensitivity of six melanoma risk prediction models tested in two independent data sets using two different thresholds of specificity

| Risk prediction model | Sensitivity at 80% specificity | Sensitivity at 90% specificity |
|---|---|---|
| Mar et al., 2011 | 0.89 | 0.81 |
| Quereux et al., 2011 | 0.64 | 0.51 |
| Fortes et al., 2010 | 0.83 | 0.66 |
| Fears et al., 2006 | 0.54 | 0.41 |
| Cho et al., 2005 | 0.86 | 0.76 |
| MacKie et al., 1989 | 0.81 | 0.70 |

Figure 2. Calibration plots for six melanoma risk prediction models in the validation cohort. (Axes: predicted probability vs. observed proportion; curves for the Fears, Cho, Mar, Fortes, MacKie, and Quereux models.)

Figure 1. Receiver operating characteristic (ROC) curve and corresponding area under the curve (AUC) statistics for the risk score of six melanoma risk prediction models in the validation cohort. (Axes: 1 − specificity vs. sensitivity. AUC by model: Cho, 0.92; Fears, 0.73; Mar, 0.93; Fortes, 0.89; Quereux, 0.80; MacKie, 0.85.)
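The two discrimination summaries reported here, the AUC and the sensitivity at a fixed specificity, can be sketched in a few lines of Python. The AUC equals the probability that a randomly chosen case has a higher risk score than a randomly chosen control (ties counting one half), which is what `auc` computes. The function names, the rule of flagging scores strictly above the threshold as high risk, and the example scores are assumptions for illustration, not the authors' SAS procedure.

```python
import math
from bisect import bisect_left, bisect_right


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control."""
    controls = sorted(control_scores)
    total = 0.0
    for s in case_scores:
        below = bisect_left(controls, s)          # controls scoring lower
        ties = bisect_right(controls, s) - below  # controls scoring equal
        total += below + 0.5 * ties
    return total / (len(case_scores) * len(controls))


def sensitivity_at_specificity(case_scores, control_scores, target_spec):
    """Return (sensitivity, threshold) at a specificity of at least target_spec.

    Assumption: a person is 'high risk' when the score is strictly above
    the threshold, so specificity is the share of controls at or below it.
    """
    controls = sorted(control_scores)
    # Smallest control score that leaves >= target_spec of controls at or
    # below it; the small epsilon guards against floating-point round-up.
    idx = max(math.ceil(target_spec * len(controls) - 1e-9) - 1, 0)
    threshold = controls[idx]
    sens = sum(s > threshold for s in case_scores) / len(case_scores)
    return sens, threshold


# Hypothetical risk scores, for illustration only.
example_controls = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
example_cases = [5, 9, 10, 12]

example_auc = auc(example_cases, example_controls)
example_sens, example_cut = sensitivity_at_specificity(
    example_cases, example_controls, 0.80
)
print(example_auc, example_sens, example_cut)
```

Fixing specificity at a common value, as in Table 3, removes the dependence on each developer's own cut-point, so the models are compared at the same false-positive rate.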
Classification accuracy
When the accuracy of each model was assessed for classifying participants’ risks of melanoma according to categories as defined by the authors (Table 4), results differed substantially. Although the models by MacKie et al. (1989), Fears et al. (2006), and Fortes et al. (2010) correctly assigned high or very high risk classifications to more than 90% of melanoma cases, the latter two also assigned similarly high classifications to more than 80% of population controls.

Table 4. Classification accuracy of six melanoma risk prediction models tested in two independent data sets using cut-points as defined by the model developers

| Risk prediction model | Risk category | Melanoma cases, males (%) | Melanoma cases, females (%) | Population controls, males (%) | Population controls, females (%) |
|---|---|---|---|---|---|
| Mar et al., 2011 | Low risk | 28.0 | 55.7 | 96.3 | 96.9 |
| | Moderate risk | 36.0 | 33.9 | 3.3 | 2.7 |
| | High risk | 36.0 | 10.5 | 0.4 | 0.4 |
| | Very high risk | 0 | 0 | 0 | 0 |
| Quereux et al., 2011 | Not at high risk | 38.1 | 15.9 | 60.5 | 46.7 |
| | High risk | 61.9 | 84.1 | 39.5 | 53.3 |
| Fortes et al., 2010 | Not at high risk | 0 | 0 | 1.3 | 1.3 |
| | High risk | 100 | 100 | 98.7 | 98.7 |
| Fears et al., 2006 | Low risk | 0 | 2.7 | 0 | 1.5 |
| | Moderate risk | 2.3 | 18.7 | 13.2 | 19.1 |
| | High risk | 89.4 | 78.7 | 85.1 | 79.1 |
| | Very high risk | 8.4 | 0 | 1.7 | 0.3 |
| Cho et al., 2005 | Low risk | 0.8 | 7.0 | 46.7 | 66.4 |
| | Moderate risk | 35.5 | 55.7 | 45.5 | 32.2 |
| | High risk | 63.7 | 37.3 | 7.8 | 1.4 |
| | Very high risk | 0 | 0 | 0 | 0 |
| MacKie et al., 1989 | Low or average risk | 4.5 | 8.2 | 39.8 | 31.8 |
| | Increased risk | 4.0 | 7.8 | 17.4 | 37.4 |
| | Greatly increased risk | 2.8 | 5.0 | 30.8 | 3.0 |
| | Very greatly increased risk | 88.8 | 79.0 | 12.1 | 27.8 |

Sensitivity analyses
We repeated the analyses replacing “facial freckling” with “shoulder freckling” in the evaluation of the models by MacKie et al. (1989), Fears et al. (2006), Quereux et al. (2011), and Fortes et al. (2010); these models all included “presence of freckling” with the exception of Fears et al. (2006) who included “freckling on the back”. This substitution decreased the sensitivity of the model by MacKie et al. (1989; from 0.79 to 0.58 for women and from 0.89 to 0.82 for men), increased the sensitivity of the model by Fears et al. (2006) for women only (from 0.79 to 0.86) and the model by Quereux et al. (2011; from 0.84 to 0.96 for women and from 0.62 to 0.85 for men), and made no change to the sensitivity of the model by Fortes et al. (2010). Using alternative skin phototype variables (burning/tanning) in lieu of skin color made no appreciable difference to the sensitivity or specificity of the models by Fears et al. (2006) and Fortes et al. (2010; data not presented). Substituting a tanning variable in place of a burning variable for skin type increased the sensitivity and specificity of the model by Quereux et al. (2011; sensitivity 0.95 for women and 0.83 for men; specificity 0.56 for women and 0.68 for men). Substituting these alternate variables in the ROC analyses did not materially change the AUC for any of the risk prediction models.

DISCUSSION
There has been increasing interest in developing risk prediction tools to assess disease risk and prognosis. Models predicting risks of common cancers have been developed (Freedman et al., 2005), including for cancers of the breast (Gail et al., 1989), colorectum (Imperiale et al., 2003), prostate (Eastham et al., 1999), and lung (Spitz et al., 2007). They have been reasonably successful at identifying groups at higher and lower risks of developing cancer but have, in general, shown a limited ability to accurately discriminate between those who do and do not develop cancer. Our analyses suggest that, when using the cut-points defined by the model developers, the sensitivity and specificity of published melanoma risk prediction models are relatively low when tested in independent samples. This indicates that the models were poorly calibrated for predicting melanoma risk in the Australian population; however, the ability of the models to discriminate between cases and controls was generally high in our combined data sets.

Published melanoma risk prediction models are heterogeneous in terms of the design of the studies used to develop the models, how predictor variables were defined and included, and how patients at “high risk” were categorized and defined. Our sensitivity analyses demonstrate the impact of using slightly different predictor variables on the sensitivity and specificity of some models. The variation in model performance seen after apparently subtle changes in terms underscores the need to establish standardized definitions of these characteristics across studies.

In evaluating the performance of the models, we considered the trade-off between sensitivity and specificity. As we were interested in each model’s capacity to stratify people into clinically relevant risk categories, models that placed more people at the extremes of the risk distribution would have the advantage of facilitating decision-making around targeted surveillance (with fewer people in the “middle” categories where there may be uncertainty about the appropriate advice).

The model with the highest sensitivity and specificity when applied to our data sets using the cut-points defined by the model developers was the model by MacKie et al. (1989), developed in a case-control study and incorporating only four variables (sex, total nevi, atypical nevi, and freckling tendency). The next best performing models were those developed by Cho et al. (2005) and Quereux et al. (2011), although the model by Cho et al. (2005) was more sensitive among men than women. The model by Cho et al. (2005) was developed in a large prospective study and included ten variables, whereas the model by Quereux et al. (2011), with eight predictor variables, was developed in a case-control study. In evaluating the differential performance of these models in our data sets, it was difficult to disentangle the independent effects of the inclusion/exclusion of risk factors in the model and their effect sizes. In general, the highest relative risks reported were for high nevus counts, both common and atypical (Supplementary Tables 3–8 online).

The discriminatory ability of the melanoma risk prediction models in our combined data set was generally high with AUC between 0.73 and 0.93. To put these findings in the context of models for other cancers, a recent meta-analysis reported a summary C statistic (equivalent to the AUC in ROC analyses) for the most well-known model for predicting risk of breast cancer, the model by Gail et al. (1989). Pooling five independent validations for the Gail-2 model resulted in a summary C statistic of 0.63 (95% CI 0.59–0.67; Meads et al., 2012). Values between 0.6 and 0.8 are also standard for risk prediction tools in chronic disease (Freedman et al., 2005).
Notable findings of our evaluation of melanoma risk prediction tools were the marked between-study heterogeneity in the sources and collection of data, the factors considered for inclusion in the models, the categorization of these risk factors, and the choice of analytic model. Strengths of our analyses include the large sample size of our case and control groups. All cases were histologically confirmed. Limitations include possible misclassification of some exposures as a result of the harmonization of risk factor data. Our melanoma case group included only melanomas of the trunk, head, and neck and did not include cases with melanoma on the limbs. If the risk profile of people with limb melanomas differed greatly from that of other body sites our evaluation of the discriminatory accuracy of the prediction models may be inaccurate, but evidence suggests that this is not the case (Green and Siskind, 2012). We did not evaluate risk prediction models that incorporated genetic predictors as such data were not available for our validation data sets. Potentially, incorporating genetic information into risk prediction models could identify people as high risk who might not be so identified on the basis of their phenotypes or exposures alone, although evidence for this is conflicting. Several studies have reported that including the terms for genotype in addition to the core predictive factors gave no useful improvement in prediction (Bishop et al., 2009; Stefanaki et al., 2013), whereas another reported that including MC1R genotype information did convey information about melanoma risk beyond phenotype (Kanetsky et al., 2010). A recent study found that including information about MC1R and previously unreported common genomic variants increased the predictive capacity of the model by 17% over the nongenetic model (Cust et al., 2014).

In summary, a large number of prediction models have been developed for melanoma, ostensibly to aid clinicians (and their patients) to quantify individual risk and develop appropriate management plans. To be useful, a prediction tool must be accurate, generalizable, and clinically effective (Reilly and Evans, 2006). We found that most existing melanoma prediction models were poorly calibrated, at least when the models were applied to an independent Australian data set using the originally published cut-off scores to estimate risk. However, most models identified higher proportions of melanoma cases compared with disease-free controls across the range of all possible cut-off scores and thus had reasonable discriminatory accuracy. These findings suggest that models need to be calibrated specifically for target populations, using population-specific cut-off scores, before they can be used in clinical practice. We therefore echo the plea for more efforts to validate risk prediction tools and encourage the adoption of reproducible, standardized measures of predictor variables that can be easily recorded in the clinical setting (Usher-Smith et al., 2014).

MATERIALS AND METHODS
Melanoma risk prediction tools
We included six melanoma risk prediction models (MacKie et al., 1989; Cho et al., 2005; Fears et al., 2006; Fortes et al., 2010; Mar et al., 2011; Quereux et al., 2011) identified from the recent systematic review (Vuong et al., 2014) that incorporated variables for which we had equivalents in our data sets. We did not include models using genetic information (Smith et al., 2012; Stefanaki et al., 2013) or those that provided insufficient information to undertake external validation (English and Armstrong, 1988; Marrett et al., 1992; Garbe et al., 1994; Barbini et al., 1998; Landi et al., 2001; Harbauer et al., 2003; Fargnoli et al., 2004; Whiteman and Green, 2005; Goldberg et al., 2007; Williams et al., 2011; Guther et al., 2012).
Validation populations
Epigene. The Epigene data set included information for 762 incident cases of first primary melanoma diagnosed between 1 April 2007 and 30 September 2010. The methods of this case–case study have been described previously (Kvaskoff et al., 2013). All cases were aged 18–79 years and were residents of the greater Brisbane region, Queensland, Australia. Cases were ascertained prospectively through pathology companies servicing the region. The study did not recruit patients with melanomas on the limbs. Participants completed a detailed questionnaire and underwent a clinical examination by a dermatologist. The questionnaire captured information about demographic, phenotypic, medical, and other risk factors including sun exposure history. The clinical examination recorded hair and eye color and numbers of solar keratoses and melanocytic nevi present across all body sites.
QSkin. The QSkin Study comprises a cohort of 43,781 men and women aged 40–69 years randomly sampled from the population of Queensland, Australia, in 2011 (Olsen et al., 2012; overall participation fraction 23%). At baseline, information about demographic items, general medical history, standard pigmentary characteristics (including hair and eye color, freckling tendency, tanning ability and propensity to sunburn), past and recent history of sun exposure and sunburns, sun protection behaviors, use of tanning beds, and history of skin cancer was collected by self-completed questionnaire. Participants gave their consent for data linkage to cancer registries ensuring complete ascertainment of all melanoma occurrences (notification of cancer to Australian Cancer Registries has been legally mandated since 1982). For these analyses, we restricted the data set to 42,116 participants with no prior history of melanoma.

Data harmonization. We matched variables from the Epigene and QSkin data sets (the “validation data sets”) as closely as possible to those described in each model (“derivation data set”, Supplementary Tables 3–8 online), irrespective of their later performance in the model. Where more than one variable in the validation data sets was a possible match for the corresponding variable from the derivation data set, we performed sensitivity analyses using the alternative variables. For example, in the Epigene data set, we had two freckling variables. In our primary analyses, we used the categorical measures of freckling on the face but performed sensitivity analyses using freckling on the shoulders. For skin phototype, both the Epigene and QSkin data sets included separate variables for tanning and burning responses. Where skin phototype was included in the model, we used burning response for our primary analyses, but we performed sensitivity analyses using the variable for tanning response.
The Epigene data set did not include a variable for skin color, and we instead used burning response, performing sensitivity analyses using tanning response.

Statistical analyses
For each risk prediction tool, we applied the fully specified model and calculated a risk score for each person in the two validation data sets, applying the risk metrics (as defined by the model developers) to classify people into risk categories (Supplementary Appendix 1 online). We then compared these predictions with actual outcomes, examining sensitivity in the melanoma cases (Epigene data set) and specificity in the disease-free controls (QSkin data set). For each analysis, we excluded individuals with missing data for one or more predictor variables. The number excluded varied depending upon the variables specified in each model and ranged from 18% (model by Quereux et al. (2011)) to 34% (model by Mar et al. (2011)). We conducted these analyses for all study participants and also stratified by age (<60/≥60 years); 48% of cases and 65% of controls were aged <60 years. We examined the classification accuracy of each model using categories as defined by the authors of each model. For the single model that did not define a high risk cut-point (Cho et al., 2005), we defined a 5-year absolute risk greater than 0.15% as high risk to be consistent with two other models (Fears et al., 2006; Mar et al., 2011). We investigated reasons for high and low sensitivity and specificity by examining factors included in the models, their effect sizes, and the prevalence of those risk factors in the two independent populations.
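The classification step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of `None` to mark a person with missing predictor data, and the example risk values are hypothetical. Only the 0.15% cut-point for 5-year absolute risk is taken from the text.

```python
# 5-year absolute risk (%) above which a person is classed as high risk.
HIGH_RISK_CUTPOINT = 0.15


def is_high_risk(risk_pct, cutpoint=HIGH_RISK_CUTPOINT):
    """Classify one person: high risk if predicted risk exceeds the cut-point."""
    return risk_pct > cutpoint


def sensitivity_specificity(case_risks, control_risks,
                            cutpoint=HIGH_RISK_CUTPOINT):
    """Sensitivity among cases and specificity among controls.

    Assumption: a risk of None means the score could not be computed
    because a predictor was missing; such people are excluded.
    """
    cases = [r for r in case_risks if r is not None]
    controls = [r for r in control_risks if r is not None]
    # Sensitivity: share of melanoma cases flagged as high risk.
    sens = sum(is_high_risk(r, cutpoint) for r in cases) / len(cases)
    # Specificity: share of disease-free controls not flagged as high risk.
    spec = sum(not is_high_risk(r, cutpoint) for r in controls) / len(controls)
    return sens, spec


# Hypothetical predicted 5-year risks (%), for illustration only.
example_case_risks = [0.30, 0.08, None, 0.20, 0.16]
example_control_risks = [0.01, 0.02, 0.20, None, 0.05]

sens, spec = sensitivity_specificity(example_case_risks, example_control_risks)
print(sens, spec)
```

Because both measures depend entirely on the chosen cut-point, the same risk scores can yield very different sensitivity and specificity under a different threshold.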
Our calculations of sensitivity and specificity were dependent upon the choice of cut-point used to define “high risk” by each model. We also calculated the sensitivity of each model using a common threshold of specificity (80 and 90%). The ROC curve is a plot of the sensitivity of a test against (1 − specificity) for all possible cut-off points (Zweig and Campbell, 1993); the area under this curve summarizes discrimination across those cut-off points. We therefore combined the two data sets and evaluated the predictive discrimination of each model using the area under the ROC curves. We tested the calibration of each model using the Hosmer–Lemeshow goodness-of-fit test (Hosmer et al., 1997), where a small P-value indicates departure from goodness-of-fit. We used SAS PROC LOGISTIC (V.8.02; SAS Institute, Cary, NC) to calculate the area under the ROC curves.

CONFLICT OF INTEREST
The authors state no conflict of interest.
ACKNOWLEDGMENTS
This work was supported by Program Grant 552429 and Project Grant 442960 from the National Health and Medical Research Council of Australia (NHMRC). DCW, REN, and PMW are supported by Research Fellowships from the NHMRC. These analyses used epidemiologic data from the Epigene Study and the QSkin Study, for which DCW was the Principal Investigator. Chief Investigators of the Epigene Study are DCW, ACG, Richard Williamson, Dominic Woods, Joe Triscott, and Rohan Mortimore. Marina Kvaskoff and Nirmala Pandeya cleaned the Epigene data and derived summary variables. Chief Investigators of the QSkin Study are DCW, CMO, ACG, REN, and PMW. We are also grateful to the Queenslanders who have willingly given their time to take part in the QSkin and Epigene Studies.

SUPPLEMENTARY MATERIAL
Supplementary material is linked to the online version of the paper at http://www.nature.com/jid

REFERENCES
Altman DG, Vergouwe Y, Royston P et al. (2009) Prognosis and prognostic research: validating a prognostic model. BMJ 338:b605
Australian Cancer Network Melanoma Guidelines Revision Working Party (2008) Clinical Practice Guidelines for the Management of Melanoma in Australia and New Zealand. Cancer Council Australia and Australian Cancer Network, Sydney and New Zealand Guidelines Group: Wellington
Barbini P, Cevenini G, Rubegni P et al. (1998) Instrumental measurement of skin colour and skin type as risk factors for melanoma: a statistical classification procedure. Melanoma Res 8:439–47
Bishop DT, Demenais F, Iles MM et al. (2009) Genome-wide association study identifies three loci associated with melanoma risk. Nat Genet 41:920–5
Cho E, Rosner BA, Feskanich D et al. (2005) Risk factors and individual probabilities of melanoma for whites. J Clin Oncol 23:2669–75
Cust AE, Bui M, Goumas C et al. (2014) Contribution of MC1R genotype and novel common genomic variants to melanoma risk prediction. Cancer Epidemiol Biomarkers Prev 23:566–7
Eastham JA, May R, Robertson JL et al. (1999) Development of a nomogram that predicts the probability of a positive prostate biopsy in men with an abnormal digital rectal examination and a prostate-specific antigen between 0 and 4 ng/mL. Urology 54:709–13
English DR, Armstrong BK (1988) Identifying people at high risk of cutaneous malignant melanoma: results from a case-control study in Western Australia. Br Med J 296:1285–8
Feightner JW (1994) Prevention of Skin Cancer. Canadian Task Force on the Periodic Health Examination: Ottawa
Fortes C, Mastroeni S, Bakos L et al. (2010) Identifying individuals at high risk of melanoma: a simple tool. Eur J Cancer Prev 19:393–400
Freedberg KA, Geller AC, Miller DR et al. (1999) Screening for malignant melanoma: a cost-effectiveness analysis. J Am Acad Dermatol 41:738–45
Freedman AN, Seminara D, Gail MH et al. (2005) Cancer risk prediction models: a workshop on development, evaluation, and application. J Natl Cancer Inst 97:715–23
Gail MH, Brinton LA, Byar DP et al. (1989) Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 81:1879–86
Garbe C, Buttner P, Weiss J et al. (1994) Risk factors for developing cutaneous melanoma and criteria for identifying persons at risk: multicenter case-control study of the central malignant melanoma registry of the German Dermatological Society. J Invest Dermatol 102:695–9
Geller AC (2002) Screening for melanoma. Dermatol Clin 20:629–40. viii
Goldberg MS, Doucette JT, Lim HW et al. (2007) Risk factors for presumptive melanoma in skin cancer screening: American Academy of Dermatology National Melanoma/Skin Cancer Screening Program experience 2001–2005. J Am Acad Dermatol 57:60–6
Green AC, Siskind V (2012) Risk factors for limb melanomas compared with trunk melanomas in Queensland. Melanoma Res 22:86–91
Guther S, Ramrath K, Dyall-Smith D et al. (2012) Development of a targeted risk-group model for skin cancer screening based on more than 100,000 total skin examinations. J Eur Acad Dermatol Venereol 26:86–94
Harbauer A, Binder M, Pehamberger H et al. (2003) Validity of an unsupervised self-administered questionnaire for self-assessment of melanoma risk. Melanoma Res 13:537–42
Hosmer DW, Hosmer T, Le Cessie S et al. (1997) A comparison of goodness-of-fit tests for the logistic regression model. Stat Med 16:965–80
Imperiale TF, Wagner DR, Lin CY et al. (2003) Using risk for advanced proximal colonic neoplasia to tailor endoscopic screening for colorectal cancer. Ann Intern Med 139:959–65
Kanetsky PA, Panossian S, Elder DE et al. (2010) Does MC1R genotype convey information about melanoma risk beyond risk phenotypes? Cancer 116:2416–28
Kvaskoff M, Pandeya N, Green AC et al. (2013) Site-specific determinants of cutaneous melanoma: a case-case comparison of patients with tumors arising on the head or trunk. Cancer Epidemiol Biomarkers Prev 22:2222–31
Landi MT, Baccarelli A, Calista D et al. (2001) Combined risk factors for melanoma in a Mediterranean population. Br J Cancer 85:1304–10
MacKie RM, Freudenberger T, Aitchison TC (1989) Personal risk-factor chart for cutaneous melanoma. Lancet 2:487–90 MacKie RM, McHenry P, Hole D (1993) Accelerated detection with prospective surveillance for cutaneous malignant melanoma in high-risk groups. Lancet 341:1618–20 Mar V, Wolfe R, Kelly JW (2011) Predicting melanoma risk for the Australian population. Australas J Dermatol 52:109–16 Marrett LD, King WD, Walter SD et al. (1992) Use of host factors to identify people at high risk for cutaneous malignant melanoma. Cmaj 147:445–53 Meads C, Ahmed I, Riley RD (2012) A systematic review of breast cancer incidence risk prediction models with meta-analysis of their performance. Breast Cancer Res Treat 132:365–77 Moons KG, Altman DG, Vergouwe Y et al. (2009) Prognosis and prognostic research: application and impact of prognostic models in clinical practice. BMJ 338:b606 Olsen CM, Green AC, Neale RE et al. (2012) Cohort profile: the QSkin Sun and Health Study. Int J Epidemiol 41:929–92
Fargnoli MC, Piccolo D, Altobelli E et al. (2004) Constitutional and environmental risk factors for cutaneous melanoma in an Italian population. A case-control study. Melanoma Res 14:151–7
Quereux G, Moyse D, Lequeux Y et al. (2011) Development of an individual score for melanoma risk. Eur J Cancer Prev 20:217–24
Fears TR, Dt Guerry, Pfeiffer RM et al. (2006) Identifying individuals at high risk of melanoma: a practical predictor of absolute risk. J Clin Oncol 24:3590–6
Reilly BM, Evans AT (2006) Translating clinical research into clinical practice: impact of using prediction rules to make decisions. Ann Intern Med 144:201–9
Smith LA, Qian M, Ng E (2012) Development of a melanoma risk prediction model incorporating MC1R genotype and indoor tanning exposure. Paper presented at: American Society of Clinical Oncology Annual Meeting: June 1-5, 2012; Chicago, IL
Spitz MR, Hong WK, Amos CI et al. (2007) A risk model for prediction of lung cancer. J Natl Cancer Inst 99:715–26
Stefanaki I, Panagiotou OA, Kodela E et al. (2013) Replication and predictive value of SNPs associated with melanoma and pigmentation traits in a Southern European case-control study. PLoS One 8:e55712
Usher-Smith JA, Emery J, Kassianos AP et al. (2014) Risk prediction models for melanoma: a systematic review. Cancer Epidemiol Biomarkers Prev 23:1450–63
Vuong K, McGeechan K, Armstrong BK et al. (2014) Risk prediction models for incident primary cutaneous melanoma: a systematic review. JAMA Dermatol 150:434–44
Whiteman DC, Green AC (2005) A risk prediction tool for melanoma? Cancer Epidemiol Biomarkers Prev 14:761–3
Williams LH, Shors AR, Barlow WE et al. (2011) Identifying persons at highest risk of melanoma using self-assessed risk factors. J Clin Exp Dermatol Res 2(6). pii: 1000129
Wolff T, Tai E, Miller T (2009) Screening for skin cancer: an update of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med 150:194–8
Zweig MH, Campbell G (1993) Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 39:561–77