Accuracy of the Vancouver Lung Cancer Risk Prediction Model Compared With That of Radiologists


Original Research Lung Cancer

Heber MacMahon, MB, BCh; Feng Li, MD, PhD; Yulei Jiang, PhD; and Samuel G. Armato III, PhD

BACKGROUND: Risk models have been developed that include the subject’s pretest risk profile and imaging findings to predict the risk of cancer in an objective way. We assessed the accuracy of the Vancouver Lung Cancer Risk Prediction Model compared with that of trainee and experienced radiologists using a subset of size-matched nodules from the National Lung Screening Trial (NLST).

METHODS: One hundred cases from the NLST database were selected (size range, 4-20 mm), including 20 proven cancers and 80 size-matched benign nodules. Three experienced thoracic radiologists and three trainee radiologists were asked to estimate the likelihood of cancer in each case, first independently, and then with knowledge of the model’s risk prediction. The results generated by the model alone also were estimated using receiver operating characteristic (ROC) analysis. The area under the ROC curve (AUC) for each viewing condition was calculated, and statistical significance in their differences was tested by using the Dorfman-Berbaum-Metz method.

RESULTS: Human observers were more accurate (AUC value of 0.85 ± 0.05 [SD]) than was the model (0.77 ± 0.06) in estimating the risk of malignancy (P = .0010), and use of the model did not improve their accuracy (0.84 ± 0.06). Experienced radiologists performed better than did trainees. Human observers could distinguish benign from malignant nodule morphology more accurately than could the model, which relies mainly on nodule size for risk estimation.

CONCLUSIONS: Experienced and trainee radiologists had superior ability to predict the risk of cancer in size-matched nodules from a screening trial compared with that of the Vancouver model, and use of the model did not improve their accuracy.

CHEST 2019; 156(1):112-119

KEY WORDS: imaging; lung cancer; oncology

ABBREVIATIONS: AUC = area under the ROC curve; BCCA = British Columbia Cancer Agency; NLST = National Lung Screening Trial; PanCan = Pan-Canadian Early Detection of Lung Cancer Study; ROC = receiver operating characteristic

AFFILIATIONS: From the Department of Radiology, The University of Chicago, Chicago, IL.

Part of this article has been presented at the 102nd Scientific Assembly and Annual Meeting of the Radiological Society of North America, November 27 to December 2, 2016, Chicago, IL.


FUNDING/SUPPORT: This study was funded by Philips Healthcare.

CORRESPONDENCE TO: Heber MacMahon, MB, BCh, Department of Radiology, MC-2026, The University of Chicago, 5841 S Maryland Ave, Chicago, Illinois 60637; e-mail: [email protected]

Copyright © 2019 American College of Chest Physicians. Published by Elsevier Inc. All rights reserved.

DOI: https://doi.org/10.1016/j.chest.2019.04.002

[

156#1 CHEST JULY 2019

]

chestjournal.org

Lung cancer screening with low-dose CT scanning has been implemented widely in the United States and is under investigation in other parts of the world. It was effective in reducing disease-specific mortality in one large trial, although concerns remain regarding the large numbers of benign nodules encountered in screening scans and possible harm due to resulting excessive interval scans and unnecessary surgery.1-4 To avoid unnecessary interventions, strict guidelines have been developed that mandate a management strategy for screen-detected nodules, mainly according to nodule size and nodule type.5-8 As experience has accumulated, the threshold nodule size for a positive scan (ie, one that would require either CT scanning sooner than the routine 12-month interval or another procedure) has been raised from 4 to 6 mm, which has substantially reduced the false-positive rate, with only a minimal effect on sensitivity.9 Nonetheless, management decisions based on the likelihood of malignancy must sometimes be made in individual cases, taking into account numerous additional factors, such as the subject’s risk profile, nodule morphology, and nodule location. Although experienced radiologists and physicians can achieve reasonable accuracy by using such estimates, this approach involves an element of subjectivity. Therefore, risk models6,10-12 have been developed that include multiple parameters, including the subject’s pretest risk profile and imaging findings, to predict the risk of cancer in a more objective way.

One such model, which has been variously called the “Vancouver,” “Brock,” or “PanCan” model, was developed and validated by McWilliams et al6 using subjects from the Pan-Canadian Early Detection of Lung Cancer Study (PanCan) and the British Columbia Cancer Agency (BCCA) chemoprevention trial. The model achieved a high accuracy in predicting the likelihood of malignancy, with an area under the receiver operating characteristic (ROC) curve (AUC) >0.90 in the validation set and with impressive results in independent evaluations.13,14 Use of this model has been recommended for estimating risk in screen-detected lung nodules,6-8,13 although its accuracy has not proven superior to that of radiologists or other physicians. Therefore, we performed an observer performance test to assess the accuracy of the Vancouver Lung Cancer Risk Prediction Model compared with trainee and experienced radiologists using a subset of size-matched nodules from the National Lung Screening Trial (NLST).

Materials and Methods

Permission was granted for use of the NLST database, and The University of Chicago Institutional Review Board approval was obtained (IRB15-0187). The study was supported by a research contract from Philips Healthcare with The University of Chicago, and Philips personnel participated in the initial case selection and analysis. However, final case selection, study design, results, manuscript editing, ROC calculations, and statistical analysis were entirely controlled by the authors.

Selection Criteria

The NLST database includes 26,715 low-dose CT scans of subjects from a randomized screening trial performed in the United States, and further details of that trial have been reported elsewhere.2 We included only nodules that were identified in the initial screening scan because the risk model is intended for situations in which no previous scans are available for comparison. The size range for included nodules was limited to 4 to 20 mm. A final selection of 80 benign nodules and 20 malignant nodules was made based on the requirements to have reliable proof of malignancy or benignancy, to match the average sizes of benign and malignant nodules, and to have adequate image quality. The sex ratios, age ranges, and nodule sizes were closely matched in the benign and malignant nodule groups.

Risk Calculator

The risk calculator used in this study has been described in detail previously by McWilliams et al.6 It was developed using images and data from the PanCan trial and was validated using a separate set of images and data from the BCCA trial. There were some differences in the patient populations, prevalence of lung cancer, and CT scanning technique among the PanCan trial, BCCA trial, and NLST, of which differences in the latter two are most pertinent to this study. The mean ± SD age was 61.6 ± 6.5 years in the BCCA trial and 61.4 ± 5.0 years in the NLST. In the BCCA trial, 54.1% of participants were male, vs 59% in the NLST.

Both full and parsimonious risk models were described and tested by McWilliams et al,6 the difference being the inclusion of nodule spiculation as an additional parameter in the full model. The full model, which was used in the present study, involves nine parameters: age, sex, and family history of lung cancer, as well as the image-derived features of nodule size, nodule type (solid, part solid, ground glass), upper lobe location, spiculation, number of nodules, and presence of emphysema.

The experiment was initially conceived as a pilot test for a proprietary implementation of the Vancouver Lung Cancer Risk Prediction Model, including semiautomated nodule measurement and morphologic analysis. Thus, nodule size, nodule type (solid, part solid, ground glass), upper lobe location, spiculation, number of nodules, and presence of emphysema were estimated initially by the proprietary system. Because we intended to evaluate the risk model independent of the proprietary system, we required the observers to evaluate the automatically identified image parameters, change any with which they disagreed, and consider their risk estimate accordingly.

Because the risk estimates provided by the Vancouver model are based on an actual screening population, in which statistics strongly favor a benign diagnosis, it was necessary to rescale the risk estimate for the selected group of cases used in the observer test, in which one in five nodules was malignant. Therefore, the model’s risk estimation was

displayed on a colored scale, from green (benign) to red (malignant), with most of the more suspicious nodules being correctly rated as such (Fig 1). The transformation does not affect the AUC, which indicates the stand-alone accuracy of the model; however, it has the potential to affect how the observers were influenced by the model and, thereby, the ROC curve of the observers using the model.

Figure 1 – Image display and scoring interface used for observer test showing a benign right upper lobe nodule (blue arrow). A, Whole screen. B, Enlarged view of scoring interface from (A), showing (1) initial observer risk assessment (white arrowhead), (2) model’s risk assessment (blue bar), and (3) observer’s modified risk assessment after considering model’s estimate (brown bar). GGO = ground-glass opacity.

Observer Test

Three attending radiologists and three junior residents participated. Anonymized images were presented on a clinical diagnostic workstation with use of an interface that had been designed for this experiment (Fig 1). Prior to starting the formal test, six training cases were shown. Thereafter, a set of CT scans, including transverse, coronal, and sagittal series, were displayed, centered on the nodule in question, which was indicated by an arrow (Fig 1). The observers had access to the customary workstation tools, including scroll, zoom, and roam, as well as measurement tools and window adjustments. The entire data set was available for review, and observers could switch between imaging planes as desired. Demographic patient data and other information used by the model for risk estimation were also displayed. After reviewing the images and relevant patient data, the observers were required to place a check mark on a quasi-continuous scale of 1 to 100 to indicate their estimate of the likelihood of malignancy for the nodule in question.


The observers then were asked if they concurred with the automatically extracted image parameters. If they disagreed, they were asked to change the parameters in question (eg, ground-glass vs part-solid vs solid morphology, spiculation, nodule measurements, presence of emphysema), whereupon the model’s risk estimate was instantly calculated and displayed. On the basis of this information, the observers were invited to change their original risk estimates. Thus, the following risk estimation conditions were evaluated: (1) the unassisted performance of the observers, (2) the stand-alone performance of the risk model, and (3) the observers’ performance when using the risk model.

Statistical Analysis

The results were calculated using ROC analysis, and the Dorfman-Berbaum-Metz multireader multicase analysis of jackknifing with analysis of variance was used to test for statistical significance in the differences of AUC.15 Smooth ROC curves were estimated by means of maximum likelihood estimation based on the proper binormal model16 for each reader and for each of the three risk estimation conditions, from their estimate of the likelihood of malignancy. The analysis of variance test compared the mean AUC values of the three risk estimation conditions followed by three post hoc pairwise comparisons of the same. This analysis evaluated whether the overall diagnostic accuracy differed among the three risk estimation conditions.
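As an illustration of two points made in the Methods, the sketch below computes an empirical (Mann-Whitney) AUC and checks that a monotone rescaling of risk leaves it unchanged. It is not the analysis used in the study: the reported AUCs came from proper binormal fits, and the form of the rescaling is not stated in the text, so the odds-based prevalence shift in `rescale_to_prevalence`, the 0.05 source prevalence, and the toy scores are all assumptions made for the example.

```python
from typing import Sequence


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical AUC: probability that a malignant case outscores a benign one.

    Ties count as one half. This is the Mann-Whitney estimate, not the
    proper binormal maximum likelihood fit cited in the paper.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rescale_to_prevalence(risk: float, source_prev: float, target_prev: float) -> float:
    """Move a probability from one prevalence to another by scaling its odds.

    Assumed form of the rescaling; the paper does not give the formula.
    """
    odds = risk / (1.0 - risk)
    shift = (target_prev / (1.0 - target_prev)) / (source_prev / (1.0 - source_prev))
    new_odds = odds * shift
    return new_odds / (1.0 + new_odds)


# Hypothetical toy scores (not study data): 3 malignant and 5 benign nodules.
toy_risk = [0.30, 0.08, 0.02, 0.05, 0.01, 0.03, 0.10, 0.004]
toy_label = [1, 1, 1, 0, 0, 0, 0, 0]

# 0.20 is the one-in-five prevalence of the test set described in the text;
# 0.05 is a placeholder for the prevalence in a screening population.
rescaled = [rescale_to_prevalence(r, source_prev=0.05, target_prev=0.20) for r in toy_risk]

auc_before = empirical_auc(toy_risk, toy_label)
auc_after = empirical_auc(rescaled, toy_label)
print(auc_before, auc_after)
```

Because the rescaling preserves the ordering of the cases, `auc_before` and `auc_after` are identical, which is consistent with the statement that the transformation does not affect the stand-alone AUC of the model.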


[Figure 2 plot: ROC curves in three panels, A (All radiologists), B (Experienced radiologists), and C (Trainee radiologists), each showing Radiologists alone, Radiologists using model, and Model alone; x-axis FPF and y-axis TPF, both from 0.0 to 1.0.]

Figure 2 – Averaged receiver operating characteristic (ROC) curves for the radiologists alone, the model alone, and the radiologists using the model. All six radiologists (A), three experienced radiologists (B), and three trainees (C) were more accurate than was the model in estimating the risk of malignancy in these size-matched nodules. There was no significant difference between the readers’ average areas under the ROC curves when unassisted compared with when they were exposed to the model predictions. Experienced radiologists (B) performed better than did trainees (C). FPF = false positive fraction; TPF = true positive fraction.

Results

Experienced radiologists and trainees were more accurate than the model in estimating the risk of malignancy in these size-matched nodules (P = .0016) (Fig 2, Table 1). The mean ± SD AUCs for the six readers were 0.85 ± 0.052 when they were unassisted and 0.84 ± 0.055 when they were exposed to the risk model predictions. The mean AUC for the model alone was 0.77 ± 0.056. The radiologists’ average AUCs were significantly better than those of the model, both when unassisted (P = .0010) and when exposed to the model predictions (P = .0038). However, the difference between the readers’ average AUCs when unassisted and when exposed to the model predictions was not significant (P = .68). Experienced radiologists (Fig 2B) performed better than did trainees (Fig 2C), although the difference fell short of significance. The accuracy of five of the six readers was slightly, but not significantly, reduced when exposed to the model’s risk prediction (Table 1).

Using greater or less than 50% risk of cancer as the criterion for correct vs incorrect decisions in malignant and benign nodules, and considering averaged observer results (Table 2), both observers and model correctly predicted benignity in 63 of 80 benign cases and incorrectly predicted malignancy in six of 80 benign cases. Observers were more accurate than the model in nine of 80 benign cases and less accurate in only two benign cases. For malignant nodules, observers and model were correct in 10 of 20 and incorrect in five of 20. However, the radiologists were correct in five malignant cases for which the model was incorrect, but there were no malignant cases for which the model was correct and the radiologists were incorrect. The stand-alone accuracy of the model, as measured by means of the AUC, was 0.77 for the cases used in the observer test, which is substantially less than that reported for the model for the entire NLST or BCCA databases.6,13

TABLE 1 – ROC Performance (AUC ± SD) of Model and Radiologists for Nodule Malignancy Assessment

Observer       Model Alone    Radiologists Alone    Radiologists With Model
Attending A    0.78 ± 0.06    0.87 ± 0.05           0.84 ± 0.05
Attending B    0.78 ± 0.06    0.86 ± 0.05           0.84 ± 0.06
Attending C    0.80 ± 0.05    0.89 ± 0.04           0.88 ± 0.05
Trainee A      0.76 ± 0.06    0.83 ± 0.05           0.82 ± 0.05
Trainee B      0.77 ± 0.06    0.85 ± 0.05           0.87 ± 0.04
Trainee C      0.76 ± 0.06    0.82 ± 0.06           0.81 ± 0.06
Mean           0.77 ± 0.06    0.85 ± 0.05           0.84 ± 0.06

The Dorfman-Berbaum-Metz multireader multicase analysis of jackknifing with analysis of variance was used to compare the three conditions. There were statistically significant differences among these AUC values (P = .0016) when both readers and cases were considered as random samples. The AUC value of the model was significantly lower than that of both radiologists alone (P = .0010) and radiologists with the aid of the model (P = .0038). The AUC values of readers without aid and readers with the aid of the model were not significantly different (P = .68). AUC = area under the ROC curve; ROC = receiver operating characteristic.

TABLE 2 – Summary of Scores of Model and Radiologists for Nodule Malignancy Assessment

Group                              Malignant Nodules (n = 20)    Benign Nodules (n = 80)
Both observer and model wrong      5                             6
Both observer and model correct    10                            63
Observer better than model         5                             9
Model better than observer         0                             2

Data are based on averaged results using a scoring system of 0 to 100 in which ≤50 indicated more likely benign and >50 indicated more likely malignant.
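The summary figures quoted in the Results can be rechecked from the two tables. The sketch below recomputes the mean reader AUCs from Table 1 and derives, from the Table 2 counts, the fraction of malignant and benign nodules placed on the correct side of the 50% cutoff. These fractions are not reported in the text; they rest on the assumption that "better than" in Table 2 means correct where the other was incorrect, and all variable and function names are illustrative.

```python
# AUC values transcribed from Table 1:
# (model alone, radiologist alone, radiologist with model).
table1 = {
    "Attending A": (0.78, 0.87, 0.84),
    "Attending B": (0.78, 0.86, 0.84),
    "Attending C": (0.80, 0.89, 0.88),
    "Trainee A": (0.76, 0.83, 0.82),
    "Trainee B": (0.77, 0.85, 0.87),
    "Trainee C": (0.76, 0.82, 0.81),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


mean_alone = mean(v[1] for v in table1.values())
mean_with_model = mean(v[2] for v in table1.values())
readers_lower_with_model = sum(1 for v in table1.values() if v[2] < v[1])

# Case counts transcribed from Table 2 (averaged observer scores, 50% cutoff).
table2 = {
    "malignant": {"both_wrong": 5, "both_correct": 10, "observer_better": 5, "model_better": 0},
    "benign": {"both_wrong": 6, "both_correct": 63, "observer_better": 9, "model_better": 2},
}


def fraction_correct(row, who):
    """Fraction of cases that `who` ('observer' or 'model') classified correctly.

    Assumes 'X better than Y' means X was correct and Y was incorrect.
    """
    extra = row["observer_better"] if who == "observer" else row["model_better"]
    return (row["both_correct"] + extra) / sum(row.values())


# Derived values, not figures reported in the paper.
observer_sensitivity = fraction_correct(table2["malignant"], "observer")
model_sensitivity = fraction_correct(table2["malignant"], "model")
observer_specificity = fraction_correct(table2["benign"], "observer")
model_specificity = fraction_correct(table2["benign"], "model")

print(round(mean_alone, 2), round(mean_with_model, 2), readers_lower_with_model)
print(observer_sensitivity, model_sensitivity, observer_specificity, model_specificity)
```

Under these assumptions the averaged observers classify 15 of 20 malignant and 72 of 80 benign nodules correctly at the 50% cutoff, against 10 of 20 and 65 of 80 for the model, which matches the direction of the AUC comparison.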

Figures 3 and 4 show a cancer and a benign scar, respectively. In both cases, the radiologists predicted the diagnosis more accurately than did the model, likely because of more complete assessment of nodule morphology.

Figure 3 – Example of a cancer for which the radiologist observers predicted the diagnosis more accurately than did the model. The radiologists (on average) estimated a risk of cancer of 83% (likely based on marked nodule spiculation), whereas the model estimated a risk of only 36%. In this instance, observers did not alter their estimate after being shown the model’s assessment. Blue arrow indicates the nodule. The nodule is shown on axial, coronal, sagittal, and magnified axial views, with computer output shown below. Dx = diagnosis; GGO = ground-glass opacity.

Figure 4 – Example of a benign scar (designated a nodule in the National Lung Screening Trial), for which the radiologists predicted the diagnosis more accurately than did the model, likely on the basis of nodule morphology. Blue arrow indicates the nodule. The finding is shown on axial, coronal, sagittal, and magnified axial views, with computer output shown below. The radiologists (on average) estimated a risk of cancer of only 5%, whereas the model predicted a risk of 60%. Again, the radiologists did not alter their initial estimate after they were shown the model’s assessment. Dx = diagnosis; GGO = ground-glass opacity.

Discussion

Risk prediction models such as the Vancouver model have been developed to allow a more accurate prediction, compared with a subjective estimate, of the risk of cancer in a screen-detected nodule. The Vancouver model has been validated, to the extent that it has produced >90% overall risk prediction accuracy, when applied to two large lung cancer screening databases, and on this basis it has been proposed for use by physicians in clinical practice.7,8,13,14 However, these validation studies have not included a comparison with human observers. The results from the ROC analysis show that the overall diagnostic accuracy of radiologist observers was superior to that of the model with a high degree of statistical

significance. Both attending and trainee radiologists were significantly more accurate than was the model.

Nodule size is a well-established criterion for estimating the risk of malignancy in a nodule and is one of the main parameters used by radiologists.3,5 By using size-matched benign and malignant nodules, we eliminated size as a criterion from the risk estimation to determine whether greater accuracy could be provided by the model, compared with radiologists using size and morphology, as they would do in clinical practice.

Our conclusions were reaffirmed by those of a larger independent study, performed at another institution, using different observers and a different database, which produced results very similar to ours.17 That study’s investigators confirmed that the model relies mainly on nodule size to predict risk. In that experiment, only the independent performance of observers was compared with the model. However, in clinical practice, physicians are free to agree or disagree with the model’s prediction in selected cases and to modify their risk estimation accordingly, which potentially could result in substantially higher accuracy than that of the model alone or the observer alone. A similar effect has been demonstrated in observer tests that evaluated the impact of computer-assisted nodule detection and classification on radiologists’ accuracy.18,19 In those experiments, the accuracy of observers who were exposed to the computed result was significantly superior to that of either the computer alone or the unassisted observers. Thus, human skills can be complementary to a computed result. Therefore, in the current experiment, we allowed observers to consider the model’s prediction and to modify their final decision accordingly. In this respect, our study more closely simulates clinical practice by addressing the issue of whether radiologists can benefit by considering the model’s results when making management decisions.

Review of individual cases from the observer test suggested that observers could distinguish benign from malignant nodule morphology better than could the


model. Because the enriched database included size-matched nodules, with a relatively high proportion of cancers, the average size of the nodules was larger than in a screening population. Larger nodules tend to have more distinctive morphology, which can be helpful in predicting their cause.20,21 Whereas the model parameters include nodule size, solid vs subsolid texture, and spiculation, they do not include consideration of overall nodule shape, internal air bronchograms, bronchiectasis, pleural-based configuration, or adjacent architectural distortion, all of which are taken into consideration by radiologists.22 The model factors in multiple additional variables, such as age, sex, family history, and nodule location, and likely does so in a more precise way than do radiologists. However, van Riel et al17 revealed that these additional parameters have a relatively minor effect on the model’s accuracy. Therefore, it appears that the high accuracy of the model achieved in the analysis of large lung cancer screening databases is largely based on the well-established relationship between nodule size and risk of malignancy. Because the great majority of nodules encountered are very small and without distinctive morphology, size-based estimates produce impressive results in large populations.

Our study has certain limitations, including the small number of cases and radiologists. However, the consistent results among the observers, as well as concordant results from a similar independent study, indicate that the main conclusion is likely reliable.18

Conclusions

We believe that the negative outcome of this experiment has important implications, not only for the use of this specific model but also for the validation of other forms of software designed to assist physicians in their daily tasks. Although high stand-alone accuracy is a reasonable initial yardstick, direct comparison with the performance of physicians under simulated clinical conditions is critically important prior to clinical implementation. The results shown here suggest that, even with high average accuracy, a computer model may produce misleading results in an important subset of cases, potentially leading to adverse outcomes. Nonetheless, the concept of an objective risk model is logical, and incorporation of additional features into the model in the future, such as using advanced models based on machine learning and texture analysis, will likely provide more nuanced weighting of morphologic nodule features to enable a more robust and clinically useful decision support tool.

Acknowledgments

Author contributions: H. M. was responsible for the design and supervision of the project, including case selection and observer test supervision. F. L. was responsible for the monitoring of observer tests, analysis of results, and manuscript editing. Y. J. was responsible for statistical analysis and manuscript editing. S. G. A. was responsible for the manuscript review and editing.

Financial/nonfinancial disclosures: The authors have reported to CHEST the following: H. M. is an advisory board member for Riverain Technologies; a minor stockholder in Hologic, Inc.; and a consultant for GE Healthcare and has received research support from Philips Healthcare; royalty and licensing fees from The University of Chicago, an honorarium from Konica Minolta, and a research contract with BioClinica. F. L. has received royalties and licensing fees for computer-aided diagnosis technology through The University of Chicago. Y. J. has received research grants from Delphinus Medical Technologies, Inc., and QView Medical, Inc., and is a research consultant for QView Medical, Inc.; Quantitative Insights, Inc.; and RadOnc eLearning Center, LLC. S. G. A. has received royalties and licensing fees through The University of Chicago for computer-aided diagnosis technology.

Role of sponsors: The sponsor had no role in the design of the study, the collection and analysis of the data, or the preparation of the manuscript.

Other contributions: The authors are grateful to Jonathan Chung, MD; Brittany Dashevsky, MD; Comeron Ghobadi, MD; Stephanie Jo, MD; Steven Montner, MD; and Steven Zangan, MD, for participating as observers.

References

1. National Lung Screening Trial Research Team, Aberle DR, Adams AM, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. 2011;365(5):395-409.
2. National Lung Screening Trial Research Team, Church TR, Black WC. Results of initial low-dose computed tomographic screening for lung cancer. N Engl J Med. 2013;368(21):1980-1991.
3. Aberle DR, DeMello S, Berg CD, et al; National Lung Screening Trial Research Team. Results of the two incidence screenings in the National Lung Screening Trial. N Engl J Med. 2013;369(10):920-931.
4. Detterbeck FC, Mazzone PJ, Naidich DP, Bach PB. Screening for lung cancer: diagnosis and management of lung cancer: American College of Chest Physicians evidence-based clinical practice guidelines. Chest. 2013;143(5 suppl):e78S-e92S.
5. Horeweg N, van Rosmalen J, Heuvelmans MA, et al. Lung cancer probability in patients with CT-detected pulmonary nodules: a prespecified analysis of data from the NELSON trial of low-dose CT screening. Lancet Oncol. 2014;15:1332-1341.
6. McWilliams A, Tammemagi MC, Mayo JR, et al. Probability of cancer in pulmonary nodules detected on first screening CT. N Engl J Med. 2013;369:910-919.
7. Pinsky PF, Gierada DS, Black W, et al. Performance of lung-RADS in the National Lung Screening Trial. Ann Intern Med. 2015;162(7):485-491.
8. American College of Radiology. Lung CT screening reporting & data system. http://www.acr.org/Quality-Safety/Resources/LungRADS. Accessed April 19, 2019.
9. Yip R, Henschke CI, Yankelevitz DF, et al. CT screening for lung cancer: alternative definitions of positive test result based on the National Lung Screening Trial and International Early Lung Cancer Action Program databases. Radiology. 2014;273:591-596.
10. Perandini S, Soardi GA, Motton M, Rossi A, Signorini M, Montemezzi S. Solid pulmonary nodule risk assessment and decision analysis: comparison of four prediction models in 285 cases. Eur Radiol. 2016;26(9):3071-3076.
11. Soardi GA, Perandini S, Motton M, Montemezzi S. Assessing probability of malignancy in solid solitary pulmonary nodules with a new Bayesian calculator: improving diagnostic accuracy by means of expanded and updated features. Eur Radiol. 2015;25:155-162.
12. Cassidy A, Myles JP, van Tongeren M, et al. The LLP risk model: an individual risk prediction model for lung cancer. Br J Cancer. 2008;98(2):270-276.
13. White CS, Dharaiya E, Campbell E, Boroczky. The Vancouver Lung Cancer Risk Prediction Model: assessment by using a subset of the National Lung Screening Trial cohort. Radiology. 2017;283:264-272.
14. Wille MMW, van Riel SJ, Saghir Z, et al. Predictive accuracy of the PanCan lung cancer risk prediction model: external validation based on CT from the Danish Lung Cancer Screening Trial. Eur Radiol. 2015;25:3093-3099.
15. Hillis SL, Berbaum KS, Metz CE. Recent development in the Dorfman-Berbaum-Metz procedure for multireader ROC study analysis. Acad Radiol. 2008;15:647-661.
16. Metz CE, Pan X. “Proper” binormal ROC curves: theory and maximum-likelihood estimation. J Math Psychol. 1999;43(1):1-33.
17. van Riel SJ, Ciompi F, Winkler Wille MM, et al. Malignancy risk estimation of pulmonary nodules in screening CTs: comparison between a computer model and human observers. PLoS One. 2017;12(11):e0185032.
18. Kobayashi T, Xu XW, MacMahon H, Metz CE, Doi K. Effect of a computer-aided diagnosis scheme on radiologists’ performance in detection of lung nodules on radiographs. Radiology. 1996;199:843-848.
19. Li F, Aoyama M, Shiraishi J, et al. Radiologists’ performance for differentiating benign from malignant nodules on high-resolution CT using computer-estimated likelihood of malignancy. AJR Am J Roentgenol. 2004;183(5):1209-1215.
20. Erasmus JJ, Connolly JE, McAdams HP, Roggli VL. Solitary pulmonary nodules: part 1: morphologic evaluation for differentiation of benign and malignant lesions. RadioGraphics. 2000;20(1):43-58.
21. Truong MT, Sabloff BS, Ko JP. Multidetector CT of solitary pulmonary nodules. Radiol Clin North Am. 2010;48(1):141-155.
22. Chung K, Jacobs C, Scholten ET, et al. Lung-RADS category 4X: does it improve prediction of malignancy in subsolid nodules? Radiology. 2017;284:264-271.
