Identifying pulmonary nodules or masses on chest radiography using deep learning: external validation and strategies to improve clinical practice



Clinical Radiology xxx (xxxx) xxx

Contents lists available at ScienceDirect

Clinical Radiology journal homepage: www.clinicalradiologyonline.net

C.-H. Liang a,b,c, Y.-C. Liu d, M.-T. Wu b,c,e, F. Garcia-Castro f,g, A. Alberich-Bayarri f,g, F.-Z. Wu b,c,e,*

a Department of Biomedical Imaging and Radiological Sciences, National Yang-Ming University, Taipei, Taiwan
b Faculty of Medicine, School of Medicine, National Yang Ming University, Taipei, Taiwan
c Institute of Clinical Medicine, National Yang Ming University, Taipei, Taiwan
d Department of Diagnostic Radiology, Xiamen Chang Gung Hospital, China
e Department of Radiology, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan
f Radiology Department, Hospital Universitario y Politécnico La Fe and Biomedical Imaging Research Group (GIBI230), Valencia, Spain
g QUIBIM SL, Valencia, Spain

Article information
Article history: Received 1 March 2019; Accepted 14 August 2019

AIM: To test the diagnostic performance of a deep learning-based system for the detection of clinically significant pulmonary nodules/masses on chest radiographs.

MATERIALS AND METHODS: In a retrospective study of 100 patients (47 with clinically significant pulmonary nodules/masses and 53 control subjects without pulmonary nodules), two radiologists verified clinically significant pulmonary nodules/masses according to chest computed tomography (CT) findings. Computer-aided diagnosis (CAD) software using a deep-learning approach was used to detect pulmonary nodules/masses, and its diagnostic performance was determined for four algorithms (heat map, abnormal probability, nodule probability, and mass probability).

RESULTS: A total of 100 cases were included in the analysis. Among the four algorithms, the mass probability algorithm achieved 76.6% sensitivity (36/47, 11 false negatives) and 88.68% specificity (47/53, six false positives) in the detection of pulmonary nodules/masses at the optimal probability score cut-off of 0.2884. Compared to the other three algorithms, the mass probability algorithm had the best predictive ability for pulmonary nodule/mass detection (AUC: mass 0.916 versus heat map 0.682, p<0.001; mass 0.916 versus abnormal 0.810, p=0.002; mass 0.916 versus nodule 0.813, p=0.014).

CONCLUSION: Deep-learning based computer-aided diagnosis systems will likely play a vital role in the early detection and diagnosis of pulmonary nodules/masses on chest radiographs. In future applications, these algorithms could support triage workflow via double reading to improve sensitivity and specificity during the diagnostic process.

© 2019 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.

* Guarantor and correspondence: F.-Z. Wu, Department of Radiology, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan. Tel.: +886 985 330160. E-mail address: [email protected] (F.-Z. Wu).
https://doi.org/10.1016/j.crad.2019.08.005
0009-9260/© 2019 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.

Please cite this article as: Liang C-H et al., Identifying pulmonary nodules or masses on chest radiography using deep learning: external validation and strategies to improve clinical practice, Clinical Radiology, https://doi.org/10.1016/j.crad.2019.08.005


C.-H. Liang et al. / Clinical Radiology xxx (xxxx) xxx

Introduction

Chest radiography is the most commonly performed diagnostic examination in daily medical practice because of its easy accessibility, relatively low cost, and wide availability in outpatient centres.[1,2] Chest radiography interpretation guides subsequent investigations, and can help to determine whether further laboratory analyses and additional imaging studies are needed. Recent evidence has demonstrated that low-dose computed tomography (CT) screening can identify small subsolid nodules and reduce lung cancer mortality[3–6]; however, extensive screening may be difficult because of cost-effectiveness considerations and national health policies. Chest radiography is usually the initial examination in patients with a clinical suspicion of pulmonary nodules or masses. Errors in pulmonary nodule/mass detection at chest radiography can result in delayed diagnosis and management of both benign and malignant conditions.[2] Moreover, as a result of the tremendous increase in radiologists' workloads over the past decade, overworked radiologists could miss important diagnoses, leading to medical malpractice.[7–10]

Recent advances in deep-learning techniques have enabled outstanding performance in a wide variety of robotic tasks in the areas of perception, planning, localisation, and classification in radiology.[11,12] In recent years, applying deep learning with convolutional neural networks in radiology has shown promising results in various clinical situations, such as pulmonary tuberculosis, pneumonia, and other abnormalities detected at chest radiography, lung nodule detection at CT, image segmentation, and tumour texture analysis.[13–19] QUIBIM (Valencia, Spain) has developed a chest radiography classification tool that uses an algorithmic approach to detect pulmonary nodules, which can help radiology departments become more efficient.

The present study evaluates the diagnostic performance and efficacy of the commercially available QUIBIM Chest X-ray Classifier software for automatic detection of pulmonary nodules or masses on chest radiographs using four different deep-learning algorithms (heat map, nodule, mass, and abnormal probability algorithms).

Materials and methods

Patient and image selection

The institutional review board approved this retrospective study, and the requirement for informed consent was waived. For this external validation study, a dataset of 100 patients was retrieved from the database of Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan: 47 with clinically significant pulmonary nodules or masses and 53 control subjects without pulmonary nodules, validated through CT images according to the British Thoracic Society guidelines for the investigation and management of pulmonary nodules.[20] A retrospectively obtained independent set of de-identified chest radiographs (postero-anterior view) from 100 patients who had undergone chest CT within 2 weeks of chest radiography was used to evaluate the performance of the QUIBIM Chest X-ray Classifier. The CT images were used as the reference standard to evaluate the accuracy of the original chest radiography report and the deep-learning algorithms. To establish the reference standard from the CT images, the clinical significance of pulmonary nodules/masses detected at CT was defined according to the British Thoracic Society guidelines for the investigation and management of pulmonary nodules, which indicate the need for further investigation and work-up.

Deep learning and imaging processing

The chest radiographic images in digital imaging and communications in medicine (DICOM) format were loaded onto a computer installed with the QUIBIM Chest X-ray Classifier. One hundred de-identified frontal chest radiographs of adult patients were processed with the QUIBIM artificial intelligence (AI) algorithm. The AI module is an ensemble of 14 pathology-specific 19-layer convolutional neural networks, followed by a fully connected layer, that imports a chest radiograph and outputs the probability of disease along with heat maps localising the areas of the image most indicative of chest disease. The training data came from the recently released ChestX-ray14 database, which contains 112,120 chest radiography images labelled with up to 14 different thoracic diseases, including nodule and mass,[20,21] that were chosen based on frequency of observation and diagnosis in clinical practice. Dense connections and batch normalisation were also implemented to optimise deep network training.[21,22] The algorithms were modified and adopted by QUIBIM Precision, and the software has been trained with ChestX-ray14 to estimate the probability of the presence of the 14 chest diseases on chest radiographs: atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax, consolidation, oedema, emphysema, fibrosis, pleural thickening, and hernia.

Four different deep-learning algorithms for pulmonary nodule or mass detection were evaluated in this study: the heat map algorithm, the abnormal probability algorithm, the nodule probability algorithm, and the mass probability algorithm. The heat map algorithm was assessed on its ability to highlight the most abnormal region correctly on the heat map. Each probability is an index value between 0 and 1. The parameters of the chest radiographic imaging datasets were also recorded, including the imaging device (computed radiography [CR] or digital radiography [DR] unit) and the processing time.

Statistical analysis

All statistical analyses were performed with SPSS 17.0 for Windows (SPSS, Chicago, IL, USA) and MedCalc 13.2.2.0 (MedCalc Software, Ostend, Belgium). Continuous variables are presented as mean ± standard deviation (SD). Processing times for CR and DR were compared using the independent Student t-test. On the external validation test dataset, with CT images as the reference standard, the predictive ability and cut-off value of each algorithm in the prediction of pulmonary nodules/masses were assessed using the area under the receiver operating characteristic curve (AUROC). An AUROC between 0.7 and 0.9 was regarded as moderate accuracy according to Greiner et al.[23] For each of the four algorithms, the optimal cut-off value for diagnosing pulmonary nodules/masses was taken as the point on the receiver operating characteristic (ROC) curve that maximises the Youden index. Cross-tables, sensitivity, specificity, positive likelihood ratio (positive LR), negative likelihood ratio (negative LR), positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy were then determined at that cut-off for each algorithm model. The ROC curves were compared using the method described by DeLong and colleagues.[24] A p-value of <0.05 was considered significant.
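As an illustration of the cut-off selection described above, the following minimal sketch computes the Youden index (sensitivity + specificity − 1) at every candidate threshold and also estimates the AUROC with the rank-based (Mann-Whitney) formulation. It is not the SPSS/MedCalc procedure used in the study: the function names, the toy scores, and the convention that a score greater than or equal to the threshold counts as positive are all assumptions made for this example.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, specificity) for each distinct score.

    Assumption: a case is called positive when score >= threshold.
    labels use 1 for nodule/mass present (reference standard) and 0 for absent.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(scores: List[float], labels: List[int]) -> Tuple[float, float, float, float]:
    """Return (threshold, sensitivity, specificity, J) maximising J = sens + spec - 1."""
    t, sens, spec = max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)
    return t, sens, spec, sens + spec - 1


def auc_mann_whitney(scores: List[float], labels: List[int]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical probability scores for eight cases (NOT study data).
toy_scores = [0.05, 0.10, 0.20, 0.25, 0.35, 0.40, 0.60, 0.80]
toy_labels = [0, 0, 0, 1, 1, 0, 1, 1]

print(youden_cutoff(toy_scores, toy_labels))
print(auc_mann_whitney(toy_scores, toy_labels))
```

On real data, the threshold returned by `youden_cutoff` plays the role of the optimal probability score cut-off reported for each algorithm; comparing two AUROCs still requires a dedicated test such as the DeLong method, which this sketch does not implement.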

Results

Demographics and clinical characteristics

A total of 100 patients with 100 chest radiographs were enrolled; their characteristics are summarised in Table 1. There were 47 patients with clinically significant pulmonary nodules/masses and 53 patients with negative findings. The mean age was 55.07 ± 13.80 years and 54 (54%) patients were men. Of the 100 chest radiographs, 72% were produced using DR and the rest using CR. Average processing time per case was 94.07 ± 16.54 seconds, with a maximum of 133 seconds. The mean AI processing time was significantly longer for CR than for DR (116.85 ± 12.27 versus 85.2 ± 6.33 seconds). Of the 47 pulmonary nodules or masses, 39 (82.97%) were solid nodules and eight (17.02%) were part-solid nodules. The mean nodule size was 4.37 ± 0.41 cm (range 0.7–13.5 cm).

The cross-tables for the best-performing models for pulmonary nodule/mass detection, including the heat map algorithm, abnormal probability algorithm, nodule probability algorithm, and mass probability algorithm, are provided in Fig 1. Table 2 shows the sensitivity, specificity, diagnostic accuracy, negative predictive value (NPV), positive predictive value (PPV), likelihood ratio (LR) (+), and LR (−) values of the four algorithms of the QUIBIM Chest X-ray Classifier at the optimal probability score threshold for pulmonary nodule/mass detection. The sensitivity of the heat map algorithm was 38.3% and the specificity was 98.11% for identifying the most abnormal region. The sensitivity of the abnormal probability algorithm was 74.47% and the specificity was 81.13% at the optimal probability score cut-off of 0.4116. The sensitivity of the nodule probability algorithm was 85.11% and the specificity was 64.15% at the optimal probability score cut-off of 0.2879. The sensitivity of the mass probability algorithm was 76.6% and the specificity was 88.68% at the optimal probability score cut-off of 0.2884. Among the four algorithms, the nodule probability algorithm was the most sensitive whereas the heat map algorithm was the most specific.

The areas under the ROC curves for pulmonary nodule detection were 0.682 (95% confidence interval [CI] 0.581–0.772) for the heat map algorithm, 0.810 (95% CI 0.719–0.882) for the abnormal probability algorithm, 0.813 (95% CI 0.723–0.884) for the nodule probability algorithm, and 0.916 (95% CI 0.844–0.962) for the mass probability algorithm (Fig 2). Compared to the other three algorithms, the mass probability algorithm had the best predictive ability for pulmonary nodule detection (AUC: mass 0.916 versus heat map 0.682, p<0.001; mass 0.916 versus abnormal 0.810, p=0.002; mass 0.916 versus nodule 0.813, p=0.014).

Table 1 Baseline characteristics of 100 study subjects.

Characteristic                       Value                     p-Value
Age (years)                          55.07 ± 13.80
Gender (male)                        54 (54%)
Chest radiographs DICOM modality
  CR                                 28
  DR                                 72
Processing time (seconds)                                      <0.001 a
  CR                                 116.85 ± 12.27
  DR                                 85.2 ± 6.33
Positive pulmonary nodule/mass       47 (47%)
Nodular size (cm)                    4.37 ± 0.41 (0.7–13.5)
Nodular type
  Solid nodule                       39
  Part-solid nodule                  8

DICOM, digital imaging and communications in medicine; CR, computed radiography; DR, digital radiography.
a For the image processing time per case, the mean processing time of the CR modality was significantly longer than that of the DR modality (116.85 ± 12.27 versus 85.2 ± 6.33 seconds).

Subgroup analysis of detected findings using the four algorithms

The detailed distribution of detected nodule diameter and type according to the four algorithms is displayed in Table 3. The mass algorithm detected pulmonary nodules/masses with an average diameter of 4.872 ± 2.894 cm. The heat map algorithm detected and correctly localised pulmonary nodules with an average diameter of 6.067 ± 3.029 cm. For all of these algorithms, however, the ability to detect part-solid nodules was relatively weak compared with solid nodules. Nodules detected by these algorithms were usually larger than undetected nodules.


Figure 1 Flowchart of the 100 consecutive patients and retrospective assessment using the deep-learning Chest X-ray Classifier. Cross-tables for the best-performing models for pulmonary nodule/mass detection, including the heat map algorithm, abnormal probability algorithm, nodule probability algorithm, and mass probability algorithm.

Table 2 ROC analysis results at the threshold to maximise sensitivity and specificity in pulmonary nodule detection across different algorithm models.

Algorithm model        Cut-off          ROC    Sensitivity  Specificity  Positive LR  Negative LR  PPV %    NPV %    Accuracy %
Heat map               Identify lesion  0.682  38.30        98.11        20.30        0.63         94.73%   64.19%   70%
Abnormal probability   0.4116           0.810  74.47        81.13        3.95         0.31         77.78%   78.20%   78%
Nodule probability     0.2879           0.813  85.11        64.15        2.37         0.23         67.80%   82.90%   74%
Mass probability       0.2884           0.916  76.60        88.68        6.77         0.26         85.70%   81.00%   83%

ROC, receiver operating characteristic; LR, likelihood ratio; PPV, positive predictive value; NPV, negative predictive value.
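The Table 2 measures all follow from a 2 × 2 cross-table. As a worked check, the sketch below recomputes them for the mass probability algorithm from the counts given in the abstract (36 true positives, 11 false negatives, 47 true negatives, six false positives). The function name and the dictionary layout are illustrative choices, not part of the study software.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Derive standard diagnostic accuracy measures from a 2x2 cross-table.

    Note: positive_lr is undefined when specificity is exactly 1 and
    negative_lr when specificity is exactly 0 (division by zero); that
    edge case is not handled in this sketch.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_lr": sensitivity / (1 - specificity),
        "negative_lr": (1 - sensitivity) / specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Mass probability algorithm at the 0.2884 cut-off (counts from the abstract).
mass = diagnostic_metrics(tp=36, fn=11, tn=47, fp=6)
for name, value in mass.items():
    print(f"{name}: {value:.4f}")
```

Rounded to two decimal places, these values reproduce the mass probability row of Table 2: sensitivity 76.60%, specificity 88.68%, positive LR 6.77, negative LR 0.26, and accuracy 83%; the PPV and NPV agree with the tabulated 85.70% and 81.00% to within rounding.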

Figure 2 Comparison of ROC curves for the four algorithms. Compared to the other three algorithms, the mass probability algorithm had the best predictive ability for pulmonary nodule/mass detection at the optimal probability score cut-off of 0.2884 (AUC: mass 0.916 versus heat map 0.682, p<0.001; mass 0.916 versus abnormal 0.810, p=0.002; mass 0.916 versus nodule 0.813, p=0.014).

Diagnostic performance according to nodule size across the different algorithm models is summarised in Electronic Supplementary Material Table S1. For the four algorithm models, the Chest X-ray Classifier software appears to have superior diagnostic accuracy for pulmonary nodules ≥3 cm than for pulmonary nodules <3 cm. The comparison of diagnostic performance between CR and DR is summarised in Electronic Supplementary Material Table S2. DR appears to have superior diagnostic performance to CR for the four algorithms.

Table 3 Detailed distribution of detected nodular diameter (cm) and type according to the four algorithms.

                                 Correct in detection   Failure in detection   p-Value
Heat map algorithm               n=18                   n=29
  Nodule size                    6.067 ± 3.029          3.324 ± 2.029          0.001
  Nodule type (solid nodule %)   100%                   72.4%                  0.017
Abnormal algorithm               n=35                   n=12
  Nodule size                    4.854 ± 2.876          2.975 ± 1.955          0.018
  Nodule type (solid nodule %)   79.50%                 20.50%                 0.176
Nodule algorithm                 n=40                   n=7
  Nodule size                    4.630 ± 2.892          2.914 ± 1.360          0.023
  Nodule type (solid nodule %)   87.50%                 57.10%                 0.084
Mass algorithm                   n=36                   n=11
  Nodule size                    4.872 ± 2.894          2.745 ± 1.535          0.003
  Nodule type (solid nodule %)   88.90%                 63.6%                  0.073

Discussion

To the authors' knowledge, this is the first study to externally validate the diagnostic performance of AI deep-learning algorithms for the detection of clinically significant pulmonary nodules/masses, which were validated by chest CT. The study reveals three main findings. First, among the four different algorithms, the mass algorithm had the best diagnostic accuracy, with an AUC of 0.916. Second, a rapid image processing time of 94.07 ± 16.54 seconds per case could help make workflow more efficient. Therefore, the QUIBIM Chest X-ray Classifier could be used to evaluate all chest radiographs automatically and efficiently despite the large volume of chest radiographs requested in the outpatient setting. Third, implementation of deep learning with different algorithms could help radiologists improve medical imaging care and reduce error by initiating selective double reading.

The present results show that the mass algorithm demonstrates the best diagnostic performance in the detection of pulmonary nodules/masses, at a probability score cut-off of 0.2884. This algorithm can detect pulmonary nodules/masses with an average diameter of 4.872 cm, but its ability to detect smaller or part-solid lesions is still limited. Among the algorithms for pulmonary nodule detection, the heat map algorithm demonstrated the highest specificity, with a very high positive LR of 20.30. The heat map algorithm can automatically identify and localise the lesion correctly. Fig 3 shows a solid nodule of 2.3 cm, diagnosed as lung cancer, in the left middle lung field, which was detected by the heat map algorithm, with an abnormality score of 0.67. This model, however, has a high false-negative rate, with an average detectable nodule size of approximately 6.067 cm. The nodule probability algorithm demonstrated the highest sensitivity.

By combining these different algorithm strategies, deep learning could potentially assist radiologists in detecting pulmonary nodules on chest radiographs while minimising both false positives and false negatives. These algorithms could support radiology workflow by flagging suspicious cases and prioritising them on the radiology worklist so that they can be reviewed first.[25,26] For an instant medical alert system, the heat map algorithm (which has the highest specificity and therefore supports ruling in pulmonary nodules/masses) could be used to assist radiographers or radiologists in flagging suspicious cases in a timely manner via integration with the picture archiving and communication system (PACS). To streamline the reporting process assisted by the AI Chest X-ray Classifier, an early warning score-based alert could be integrated into the reporting system. Radiologists could then pay more attention to plain films with a probability score above the cut-off value of the mass probability score (0.2884). For plain chest films with the lowest probability scores, radiologists can exclude lesions with greater confidence. To exclude lesions, a safer threshold should be set, that is, a value lower than the cut-off of the nodule probability score (0.2879), which increases sensitivity and helps to ensure that lesions are not missed. Further prospective study is warranted to validate these findings.

Regarding the false-positive and false-negative results, there is still room for improvement. For example, Fig 4 shows a faint nodule that was missed by the heat map algorithm (a false negative) and was diagnosed as pulmonary adenocarcinoma manifesting as a part-solid nodule of 2.9 cm in the right upper lobe, as demonstrated on CT; however, it was correctly detected with an abnormality probability score of 0.48.
Another example of a false-negative finding, missed by both the heat map and abnormal probability algorithms, is shown in Fig 5. In this case, the part-solid nodular opacity in the right lower lobe is subtle. One example of a false positive by both AI algorithm classifiers (nodule and abnormality probability algorithms) is shown in Fig 6. The chest radiograph is of a 73-year-old woman with increased lung markings in both lower lung fields, confirmed as normal at chest CT. Strategies such as standard double reading with AI can help to reduce false alarms. At the same time, standard double reading with AI interpretation can reduce the workload of radiologists and reduce misdiagnosis caused by excessive work.[27–29]

Figure 3 A solid nodule of 2.3 cm, diagnosed as lung cancer, in the left middle lung field, properly detected by the heat map algorithm, with an abnormality score of 0.67.

Figure 4 A faint nodule with false-negative findings missed by the heat map algorithm, which was diagnosed as pulmonary adenocarcinoma manifesting as a part-solid right upper lobe nodule of 2.9 cm as demonstrated at CT; however, it was properly detected with an abnormality score of 0.48.

Figure 5 A faint nodule with false-negative findings missed by both classifiers (heat map and abnormal probability score), which was diagnosed as pulmonary adenocarcinoma manifesting as a right lower lobe part-solid nodule of 1.6 cm as demonstrated at CT.

Figure 6 False-positive nodule by both AI algorithm classifiers (nodule and abnormality score). The chest radiograph of this 73-year-old woman showed increased lung markings in both lower lung fields, which were diagnosed as a normal chest finding at CT.

This study has several important limitations. First, although the mass algorithm had the best diagnostic accuracy for pulmonary nodule/mass detection among the algorithms, it detects pulmonary nodules/masses with an average diameter of 4.872 cm, and its ability to detect smaller or part-solid lesions is still limited. This phenomenon also occurs when experienced thoracic radiologists view the same lesions.[30] Future work will evaluate the value of interactive database enhancement and continuous training in optimising the algorithms for pulmonary subsolid nodule(s) <3 cm. Second, the present study evaluated deep-learning performance in a retrospective setting. Future work will aim to drive implementation of deep learning to aid radiologists in detecting lung nodules in real time. Third, previous studies have demonstrated blind spots in chest radiographs, which have been shown to contribute to detection and interpretation errors.[2] Further work investigating the diagnostic performance of deep learning for blind spots in chest radiography is warranted. Fourth, the heat map algorithm, which focuses on the automatic identification and localisation of pulmonary nodules/masses, has a good PPV (94.73%) and is highly specific (98.11%), but has poor sensitivity (38.3%). This algorithm can detect pulmonary nodules with an average diameter of 6.067 ± 3.029 cm. Therefore, reliable identification and localisation of smaller pulmonary nodules with deep learning is critical to clinical implementation in real-world practice. Finally, the present study included chest radiographs generated by both DR and CR technologies, and the processing time for CR was found to be much longer than for DR. This may be attributed to differences in the principle of image processing between CR and DR. In the future, the diagnostic accuracy of convolutional neural networks on CR versus DR images should be investigated.

In conclusion, deep-learning based CAD systems will likely play a vital role in the early detection and diagnosis of pulmonary nodules/masses on chest radiographs. In future applications, these algorithms could support triage workflow with double reading to improve sensitivity and specificity during the diagnostic process.
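The triage workflow proposed in the Discussion can be sketched as a simple rule set: a heat map localisation triggers an instant alert, a mass probability at or above 0.2884 prioritises the case on the worklist, and a nodule probability below a rule-out threshold marks the case as low probability. This is only an illustration under stated assumptions: the function names, the category labels, the dictionary keys, and the use of "greater than or equal to" at the cut-off are hypothetical, and the rule-out threshold defaults to the 0.2879 cut-off although the Discussion recommends choosing a lower, safer value.

```python
MASS_CUTOFF = 0.2884    # optimal mass probability cut-off reported in the study
NODULE_CUTOFF = 0.2879  # optimal nodule probability cut-off reported in the study


def triage(mass_prob: float, nodule_prob: float, heat_map_lesion: bool,
           rule_out_cutoff: float = NODULE_CUTOFF) -> str:
    """Assign a hypothetical worklist category to one chest radiograph.

    heat_map_lesion: True when the heat map algorithm localises a lesion
    (highest specificity, so it is used here to rule in).
    rule_out_cutoff: nodule probability below which the case is treated as
    low probability; a value lower than 0.2879 gives a safer rule-out.
    """
    if heat_map_lesion:
        return "alert"
    if mass_prob >= MASS_CUTOFF:
        return "priority"
    if nodule_prob < rule_out_cutoff:
        return "low"
    return "standard"


def sort_worklist(cases: list) -> list:
    """Order cases so that alerts come first and low-probability cases last."""
    rank = {"alert": 0, "priority": 1, "standard": 2, "low": 3}
    return sorted(
        cases,
        key=lambda c: rank[triage(c["mass"], c["nodule"], c["heat_map"])],
    )


# Hypothetical cases (NOT study data).
worklist = [
    {"id": "A", "mass": 0.10, "nodule": 0.10, "heat_map": False},
    {"id": "B", "mass": 0.50, "nodule": 0.40, "heat_map": False},
    {"id": "C", "mass": 0.60, "nodule": 0.70, "heat_map": True},
]
print([case["id"] for case in sort_worklist(worklist)])
```

Every category still ends with a radiologist's read; the sketch only changes the order and the level of attention, in line with the double-reading role described above.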

Conflicts of interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Fabio Garcia-Castro and Angel Alberich-Bayarri are founders of the spin-off company QUIBIM SL. The other authors declare that they have no competing interests.

Acknowledgements

This study was supported by grants from Kaohsiung Veterans General Hospital, Taiwan, R.O.C. (nos. VGHKS103-015, VGHKS104-048, VGHKS105-064, VGHKS108-159, MOST108-2314-B-075B-008-).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.crad.2019.08.005.

References

1. Brogdon BG, Kelsey CA, Moseley Jr RD. Factors affecting perception of pulmonary lesions. Radiol Clin N Am 1983;21(4):633–54.
2. de Groot PM, Carter BW, Abbott GF, et al. Pitfalls in chest radiographic interpretation: blind spots. Sem Roentgenol 2015;50(3):197–209.
3. Aberle DR, Adams AM, Berg CD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365(5):395–409.
4. Wu FZ, Chen PA, Wu CC, et al. Semiquantative visual assessment of subsolid pulmonary nodules ≤3 cm in differentiation of lung adenocarcinoma spectrum. Sci Rep 2017;7(1):15790.
5. Hsu HT, Tang EK, Wu MT, et al. Modified Lung-RADS improves performance of screening LDCT in a population with high prevalence of nonsmoking-related lung cancer. Acad Radiol 2018;25(10):1240–51.
6. Wu FZ, Huang YL, Wu CC, et al. Assessment of selection criteria for low-dose lung screening CT among Asian ethnic groups in Taiwan: from mass screening to specific risk-based screening for non-smoker lung cancer. Clin Lung Cancer 2016;17(5):e45–56.
7. Bhargavan M, Kaye AH, Forman HP, et al. Workload of radiologists in United States in 2006–2007 and trends since 1991–1992. Radiology 2009;252(2):458–67.
8. Levin DC, Rao VM, Parker L, et al. Analysis of radiologists' imaging workload trends by place of service. J Am Coll Radiol 2013;10(10):760–3.
9. Brady AP. Error and discrepancy in radiology: inevitable or avoidable? Insights Imaging 2017;8(1):171–82.
10. Forrest JV, Friedman PJ. Radiologic errors in patients with lung cancer. West J Med 1981;134(6):485–90.
11. Shin HC, Roth HR, Gao M, et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging 2016;35(5):1285–98.
12. Novikov AA, Lenis D, Major D, et al. Fully convolutional architectures for multiclass segmentation in chest radiographs. IEEE Trans Med Imaging 2018;37(8):1865–76.
13. Cicero M, Bilbily A, Colak E, et al. Training and validating a deep convolutional neural network for computer-aided detection and classification of abnormalities on frontal chest radiographs. Invest Radiol 2017;52(5):281–7.
14. Lakhani P, Sundaram B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 2017;284(2):574–82.
15. Liu V, Clark MP, Mendoza M, et al. Automated identification of pneumonia in chest radiograph reports in critically ill patients. BMC Med Inform Decis Making 2013;13(1):90.
16. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. OncoTargets Ther 2015;8:2015–22.
17. Zhang W, Li R, Deng H, et al. Deep convolutional neural networks for multi-modality isointense infant brain image segmentation. NeuroImage 2015;108:214–24.
18. Cheng J-Z, Chen C-M, Shen D. Chapter 9: deep learning techniques on texture analysis of chest and breast images. In: Depeursinge A, Al-Kadi OS, Mitchell JR, editors. Biomedical texture analysis. London: Academic Press; 2017. p. 247–79.
19. Nam JG, Park S, Hwang EJ, et al. Development and validation of deep learning-based automatic detection algorithm for malignant pulmonary nodules on chest radiographs. Radiology 2019;290(1):218–28.
20. Baldwin DR, Callister MEJ. The British Thoracic Society guidelines on the investigation and management of pulmonary nodules. Thorax 2015;70(8):794.
21. Rajpurkar P, Irvin J, Ball RL, et al. Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med 2018;15(11):e1002686.
22. Wang X, Peng Y, Lu L, et al. ChestX-Ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI; 2017. p. 3462–71. https://doi.org/10.1109/CVPR.2017.369.
23. Greiner M, Pfeiffer D, Smith RD. Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests. Prev Vet Med 2000;45(1–2):23–41.
24. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44(3):837–45.
25. Lee J-G, Jun S, Cho Y-W, et al. Deep learning in medical imaging: general overview. Korea J Radiol 2017;18(4):570–84.
26. Choy G, Khalilzadeh O, Michalski M, et al. Current applications and future impact of machine learning in radiology. Radiology 2018;288(2):318–28.
27. Ribli D, Horváth A, Unger Z, et al. Detecting and classifying lesions in mammograms with deep learning. Sci Rep 2018;8(1):4165.
28. Geijer H, Geijer M. Added value of double reading in diagnostic radiology, a systematic review. Insights Imaging 2018;9(3):287–301.
29. Ciatto S, Del Turco MR, Burke P, et al. Comparison of standard and double reading and computer-aided detection (CAD) of interval cancers at prior negative screening mammograms: blind review. Br J Cancer 2003;89(9):1645–9.
30. del Ciello A, Franchi P, Contegiacomo A, et al. Missed lung cancer: when, where, and why? Diagn Interv Radiol 2017;23(2):118–26.
