Real-World Performance of Computer-Aided Diagnosis System for Thyroid Nodules Using Ultrasonography


ARTICLE IN PRESS

Ultrasound in Med. & Biol., Vol. 00, No. 00, pp. 1–7, 2019. Copyright © 2019 World Federation for Ultrasound in Medicine & Biology. All rights reserved. Printed in the USA. 0301-5629/$ - see front matter

https://doi.org/10.1016/j.ultrasmedbio.2019.05.032

Original Contribution

REAL-WORLD PERFORMANCE OF COMPUTER-AIDED DIAGNOSIS SYSTEM FOR THYROID NODULES USING ULTRASONOGRAPHY

HYE LIN KIM, EUN JU HA, and MIRAN HAN

Department of Radiology, Ajou University School of Medicine, Suwon, South Korea

(Received 26 February 2019; revised 29 May 2019; in final form 30 May 2019)

Abstract—This study evaluated the diagnostic performance of a commercially available computer-aided diagnosis (CAD) system (S-Detect 1 and S-Detect 2 for thyroid) for detecting thyroid cancers. Among 218 thyroid nodules in 106 patients, the sensitivity, specificity, positive predictive value, negative predictive value and accuracy of the CAD systems were 80.2%, 82.6%, 75.0%, 86.3% and 81.7%, respectively, for the S-Detect 1 and 81.4%, 68.2%, 62.5%, 84.9% and 73.4%, respectively, for the S-Detect 2. The inter-observer agreement between the CAD system and radiologist for the description of calcifications was fair (kappa = 0.336), while the final diagnosis and each ultrasonographic descriptor showed moderate to substantial agreement for the S-Detect 2. To conclude, the current CAD systems had limited specificity in the diagnosis of thyroid cancer. One of the main limitations of the S-Detect 2 was its inaccuracy in recognizing calcifications, which meant that differentiation had to be undertaken by the radiologist. (E-mail: [email protected]) © 2019 World Federation for Ultrasound in Medicine & Biology. All rights reserved. Key Words: Artificial intelligence, Computer-aided diagnosis, Thyroid nodule, Thyroid cancer, Ultrasonography.

INTRODUCTION

The use of artificial intelligence (AI) in medicine is currently of great interest, especially with regard to the diagnostic analysis of radiologic images (Choy et al. 2018; Hosny et al. 2018; Lee et al. 2017; Park and Han 2018; Park and Kressel 2018). Current research is focused on the application of AI in radiology, and most of the results are promising, highlighting new and exciting opportunities. However, concerns about the adoption of AI tools in clinical practice are also increasing, since these tools are in their infancy and not yet ready to be used in a clinical setting (Park and Han 2018; Park and Kressel 2018). Careful and meticulous confirmation of their clinical performance and use is needed before their adoption.

In thyroid imaging, several computer-aided diagnosis (CAD) systems have been used for thyroid cancer diagnosis in ultrasonography (US), based on texture analysis, machine learning (ML) and deep learning (DL) techniques (Sollini et al. 2018). To date, most studies have examined the potential role of CAD in thyroid cancer diagnosis and have reported diagnostic performances comparable to or higher than those of experienced radiologists, suggesting a tremendous future potential (Chang et al. 2016; Choi et al. 2017; Jeong et al. 2018; Li et al. 2019; Yoo et al. 2018). Although most are based on algorithms developed by the researchers (Chang et al. 2016; Li et al. 2019), a few are based on a commercial system already available in practice (Choi et al. 2017; Yoo et al. 2018).

The S-Detect series (Samsung Medison Co. Ltd., Seoul, Korea) are CAD systems that are already integrated into a commercially available US platform for use on the thyroid. S-Detect 1 for thyroid is based on support vector machine models, an ML technique, and S-Detect 2 for thyroid is based on convolutional neural networks, a DL technique. Several studies have reported on the diagnostic performance of S-Detect 1, but until now no data have existed for S-Detect 2 in practice (Choi et al. 2017; Yoo et al. 2018). The aim of this study was to retrospectively evaluate the real-world diagnostic performance of the newly commercially available CAD systems, the S-Detect series for thyroid, in thyroid cancer diagnosis and to assess their future developmental directions.

Address correspondence to: Eun Ju Ha, MD, PhD, Department of Radiology, Ajou University School of Medicine, Wonchon-Dong, Yeongtong-Gu, Suwon 443-380, South Korea. E-mail: [email protected]


MATERIALS AND METHODS

Patients

This retrospective study was approved by our institutional review board, and written informed consent was obtained from all patients before they underwent US. Between June and September 2016, a total of 106 consecutive patients with 218 thyroid nodules (≥5 mm in diameter) who underwent US-guided fine-needle aspiration (FNA) or a US examination before scheduled surgery were enrolled (29 men and 77 women; mean age, 48.0 y; range: 22–81 y). A malignant nodule was diagnosed when malignancy was evident in the surgical specimen. All malignant cases underwent thyroidectomy. A benign nodule was diagnosed when any one of the following criteria was met: (i) confirmation of benign status in a surgical specimen; (ii) benign histology of a core needle biopsy or benign cytology of an FNA. Nodules with inconclusive FNA results, or with benign cytologic results but a previous history of atypia of undetermined significance, suspicion for follicular neoplasm or suspicion for malignancy, were excluded from this study (Cibas and Ali 2009).

US image acquisition and analysis

All US examinations were performed using a 3–12 MHz linear probe and a real-time US system (RS80A; Samsung Medison Co. Ltd.). An experienced radiologist (E.J.H.) specializing in thyroid imaging (with 11 y of clinical experience in the evaluation of thyroid US data) performed all the US examinations. S-Detect 1 and S-Detect 2 for thyroid (Samsung Medison Co. Ltd.), which are CAD systems integrated into a commercially available US system (RS80A and RS85A; Samsung Medison Co. Ltd.), were used in this study. Since this study was designed retrospectively, S-Detect evaluations were performed offline. The CAD data were obtained by the same radiologist using the same representative images in both S-Detect 1 and S-Detect 2. The CAD data were obtained from transverse planes by manually setting a region of interest around the lesion.
The software automatically calculated the mass contours and evaluated the US features of the mass, including the composition (solid, partially cystic or cystic), shape (oval-to-round or irregular), orientation (parallel or non-parallel), margins (well-defined, ill-defined or spiculated), echogenicity (hyperechoic/isoechoic or hypoechoic) and spongiform status. Calcification was classified as none, microcalcification, macrocalcification or rim calcification by S-Detect 2 only; S-Detect 1 was not able to detect any calcifications (Fig. 1). For the margins, the operator chose one of four options suggested by the software. The operator did not repeat the process or make any corrections. The CAD systems were set up to provide a dichotomous outcome, and the nodule was finally diagnosed as benign or malignant.

Grayscale US images were evaluated by the radiologist (E.J.H.) for size, internal content, echogenicity, shape, orientation, margin and the presence or absence of calcification (Shin et al. 2016). The nodule contents were categorized as solid (no obvious cystic content), partially cystic or cystic (pure cyst or almost entirely cystic content). The predominant echogenicity was categorized as hypoechogenicity (marked or mild) or hyperechogenicity/isoechogenicity by reference to that of the normal portion of the thyroid gland and the anterior neck muscle. Anechoic echogenicity was applied to cystic or almost completely cystic nodules. Shape was categorized as ovoid-to-round or irregular, and orientation was categorized as parallel (when the anteroposterior diameter of the nodule was equal to or less than the transverse or longitudinal diameter) or non-parallel (when the anteroposterior diameter of the nodule was longer than the transverse or longitudinal diameter in the transverse or longitudinal plane, respectively). The margins were categorized as smooth, spiculated/microlobulated or ill-defined. Calcification was classified as none, microcalcification (tiny, punctate echogenic foci, 1 mm or less in diameter, with or without posterior shadowing), macrocalcification (echogenic foci larger than 1 mm in diameter) or rim calcification (peripheral curvilinear or eggshell-like calcification).

Data and statistical analyses

Differences in patient demographic data, grayscale US features and CAD diagnoses (benign and malignant) were evaluated using the χ² or Fisher exact test. Student's t-test was employed to compare quantitative variables.
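As a rough illustration of the χ² comparison described above, the following sketch computes a Pearson χ² statistic and its p value for a 2 × 2 table using only the Python standard library. The function name `chi_square_2x2` is hypothetical, and the choice of the uncorrected Pearson statistic is an assumption: the study used SPSS and MedCalc, and it does not state which rows were tested with the Fisher exact test or whether a continuity correction was applied. The example counts are the orientation row of Table 1 (benign: 122 parallel, 10 non-parallel; malignant: 50 parallel, 36 non-parallel).

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Orientation row of Table 1: benign (parallel, non-parallel),
    # then malignant (parallel, non-parallel).
    stat, p = chi_square_2x2(122, 10, 50, 36)
    print(f"chi-square = {stat:.2f}, p = {p:.2g}")
```

For this row the p value is far below 0.001, in line with the significance level reported in Table 1. For rows with very small cell counts, a Fisher exact test is generally the more appropriate choice.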
The diagnostic performances of the CAD systems, the radiologist and the radiologist assisted by the CAD system for thyroid cancer were evaluated by calculating the sensitivities, specificities, positive predictive values (PPVs), negative predictive values (NPVs) and accuracy rates, and were compared using a McNemar test. The areas under the receiver operating characteristic curves (AUROCs) were calculated with 95% confidence intervals (CIs). For the radiologist assisted by the CAD system, the test was considered positive when either the radiologist or the CAD system defined a case as "positive." The extent of inter-observer agreement (the kappa value) between the CAD system and the radiologist in terms of the descriptions of the US characteristics was determined. The level of agreement for Cohen's kappa was defined as follows: < 0.20 = poor agreement, 0.21–0.40 = fair agreement, 0.41–0.60 = moderate agreement, 0.61–0.80 = substantial agreement and > 0.80 = good agreement.
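The measures defined in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SPSS/MedCalc workflow; all names (`Confusion`, `performance`, `single_point_auroc`, `either_positive`, `kappa_band`) are hypothetical. Two assumptions are made. First, because each reader gives a dichotomous call, the empirical ROC curve has a single operating point, so the trapezoidal area reduces to (sensitivity + specificity) / 2. Second, kappa values that fall on a boundary or in the gaps between the published bands are assigned as noted in the code comments. The example counts are the radiologist's results from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2 x 2 counts against the reference standard."""
    tp: int  # malignant nodules called malignant
    fn: int  # malignant nodules called benign
    tn: int  # benign nodules called benign
    fp: int  # benign nodules called malignant


def performance(c: Confusion) -> dict[str, float]:
    """Return the five measures named in the text as fractions in [0, 1]."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


def single_point_auroc(c: Confusion) -> float:
    """Trapezoidal ROC area for a dichotomous test (one operating point)."""
    m = performance(c)
    return (m["sensitivity"] + m["specificity"]) / 2


def either_positive(reader: list[bool], cad: list[bool]) -> list[bool]:
    """CAD-assisted call: positive when the reader or the CAD system is positive."""
    if len(reader) != len(cad):
        raise ValueError("both readers must rate the same nodules")
    return [r or s for r, s in zip(reader, cad)]


def kappa_band(kappa: float) -> str:
    """Map Cohen's kappa to the agreement bands listed in the text.

    Assumption: a value exactly on an upper bound (e.g. 0.20) goes to the
    lower band, and values in the gaps between bands (e.g. 0.205) go to
    the higher band.
    """
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "good"


if __name__ == "__main__":
    radiologist = Confusion(tp=73, fn=13, tn=127, fp=5)  # counts from Table 2
    for name, value in performance(radiologist).items():
        print(f"{name}: {value * 100:.1f}%")
    print(f"AUROC (single point): {single_point_auroc(radiologist):.3f}")
    print(kappa_band(0.336))  # kappa reported for calcifications, S-Detect 2
```

With the radiologist's counts, the sketch gives a sensitivity of 84.9%, a specificity of 96.2% and a single-point AUROC of 0.905, which agree with the reported values.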


Fig. 1. An ultrasonography (US) image of a thyroid nodule acquired with the S-Detect 1 and S-Detect 2. (a) A solid hypoechoic nodule with suspicious US features is evident in the left thyroid gland. (b) A region of interest is manually drawn around the lesion. (c, d) The CAD software automatically calculates the mass contours (green contour) and presents the US features on the right of the screen and a possible diagnosis as a malignant nodule on the bottom (Left: S-Detect 1, Right: S-Detect 2). CAD = computer-aided diagnosis; US = ultrasonography.

All statistical analyses were performed using SPSS for Windows (ver. 23.0; IBM Corp., Armonk, NY, USA) and MedCalc for Windows (ver. 15.0; MedCalc, Ostend, Belgium). A significant difference was defined as a p value < 0.05.

RESULTS

Clinical and sonographic features of benign and malignant thyroid nodules

The mean nodule diameter was 1.2 ± 0.8 cm (range: 0.5–5.7 cm). The final diagnoses of the 218 nodules were 132 (60.6%) benign and 86 (39.4%) malignant. All malignant diagnoses were made after surgical resection and included 79 classic papillary thyroid carcinomas (PTCs) and 7 follicular variant PTCs. The US features of the benign and malignant nodules are summarized in Table 1. The mean diameter of the benign nodules was 1.2 ± 0.9 cm, which was not statistically different from that of the malignant nodules (1.2 ± 0.7 cm; p = 0.291). Several US features, including solid component, hypoechogenicity, a non-parallel orientation, spiculated/microlobulated margins and microcalcification, were significantly associated with thyroid cancer. The diagnosis of "possibly malignant" presented by the CAD systems was also a significant factor in the detection of thyroid cancers (both p < 0.001).

Table 1. Clinical and sonographic features of benign and malignant thyroid nodules

| Characteristic | Benign Nodules (n = 132) | Malignant Nodules (n = 86) | p Value |
|---|---|---|---|
| Diameter (cm) | | | 0.291 |
| Mean ± SD | 1.2 ± 0.9 | 1.2 ± 0.7 | |
| Range | 0.5–5.7 | 0.5–3.9 | |
| Internal content | | | < 0.001 |
| Solid | 74 | 79 | |
| Partially cystic | 50 | 7 | |
| Cystic | 8 | 0 | |
| Echogenicity | | | < 0.001 |
| Hypoechogenicity | 40 | 74 | |
| Iso-/hyperechogenicity | 84 | 12 | |
| Anechoic | 8 | 0 | |
| Shape | | | 0.167 |
| Round-to-oval | 130 | 82 | |
| Irregular | 2 | 4 | |
| Orientation | | | < 0.001 |
| Parallel | 122 | 50 | |
| Non-parallel | 10 | 36 | |
| Margin | | | < 0.001 |
| Smooth | 104 | 15 | |
| Spiculated/microlobulated | 1 | 57 | |
| Ill-defined | 27 | 14 | |
| Calcification | | | < 0.001 |
| None | 116 | 16 | |
| Microcalcification | 3 | 62 | |
| Macrocalcification | 8 | 7 | |
| Rim calcification | 5 | 1 | |
| Spongiform | | | 0.520 |
| Absence | 130 | 86 | |
| Presence | 2 | 0 | |
| CAD system diagnosis (S-Detect 1) | | | < 0.001 |
| Possibly benign | 109 | 17 | |
| Possibly malignant | 23 | 69 | |
| CAD system diagnosis (S-Detect 2) | | | < 0.001 |
| Possibly benign | 90 | 16 | |
| Possibly malignant | 42 | 70 | |
| Radiologist's diagnosis | | | < 0.001 |
| Possibly benign | 127 | 13 | |
| Possibly malignant | 5 | 73 | |

CAD = computer-aided diagnosis; SD = standard deviation.

Diagnostic performance of the CAD systems (S-Detect 1 and 2), the radiologist and the radiologist assisted by the CAD system

Table 2 summarizes the diagnostic performances of the CAD systems (S-Detect 1 and S-Detect 2), the radiologist and the radiologist assisted by the CAD system for detecting thyroid cancer. Both CAD systems exhibited no statistically significant differences in terms of sensitivity compared with the radiologist (80.2% vs. 84.9%, p = 0.454; 81.4% vs. 84.9%, p = 0.629, respectively), while the specificities and accuracies were significantly lower in the CAD systems than in the radiologist (82.6% vs. 96.2%; 68.2% vs. 96.2%, all p < 0.001, respectively, for specificity) (81.7% vs. 91.7%; 73.4% vs. 91.7%, all p < 0.001, respectively, for accuracy). The sensitivity was not significantly different (p > 0.999) between the CAD systems; however, the specificity and accuracy were significantly lower in S-Detect 2 than in S-Detect 1 (p = 0.004 and p = 0.025, respectively). When the CAD systems were used to assist the radiologist, the diagnostic sensitivity significantly improved (84.9% vs. 91.9%, p = 0.031; 84.9% vs. 93.0%, p = 0.016), but both the specificity and the accuracy decreased (96.2% vs. 81.1%; 96.2% vs. 67.4%, all p < 0.001, for specificity) (91.7% vs. 85.3%, p = 0.023; 91.7% vs. 77.5%, p < 0.001, for accuracy) (Table 3).

Figure 2 shows the AUROCs for the CAD systems, the radiologist and the radiologist assisted by the CAD system, in terms of differentiation of benign from malignant nodules. The AUROCs were 0.905 (95% CI, 0.859–0.941) for the radiologist, followed by 0.865 (0.812–0.907) for the S-Detect 1–assisted radiologist, 0.814 (0.756–0.863) for the S-Detect 1, 0.802 (0.743–0.853) for the S-Detect 2–assisted radiologist and 0.748 (0.685–0.804) for the S-Detect 2.

Extent of inter-observer agreement between the CAD systems and the radiologist

The extent of the agreement for the final diagnosis between the CAD system and the radiologist was 82.6% (180/218) for S-Detect 1 and 74.3% (162/218) for S-Detect 2. For S-Detect 2, the inter-observer agreement for the description of calcifications was the lowest and remained at fair (kappa = 0.336), while each US descriptor showed moderate to substantial agreement. The extent of inter-observer agreement was moderate (kappa = 0.471) for the final diagnosis (Table 4).

DISCUSSION

The findings of this retrospective study show that the new, commercially available CAD systems, the S-Detect series, could achieve sensitivities comparable to that of the radiologist in the diagnosis of thyroid cancer; however, they had significantly lower accuracy and specificity than the radiologist in a real-world setting. One of the main limitations of S-Detect 2 was its inaccuracy and poor detection rate of microcalcifications. Therefore, the radiologist has to compensate for this limitation, which should eventually increase the accuracy of the CAD system.

Table 2. Diagnostic performance of the computer-aided diagnosis system (S-Detect 1 vs. S-Detect 2) and the radiologist

| Diagnostic Measure | Radiologist | S-Detect 1 | S-Detect 2 | p Value* | p Value† | p Value‡ |
|---|---|---|---|---|---|---|
| Sensitivity (%) | 84.9 (73/86) | 80.2 (69/86) | 81.4 (70/86) | 0.454 | 0.629 | > 0.999 |
| Specificity (%) | 96.2 (127/132) | 82.6 (109/132) | 68.2 (90/132) | < 0.001 | < 0.001 | 0.004 |
| PPV (%) | 93.6 (73/78) | 75.0 (69/92) | 62.5 (70/112) | | | |
| NPV (%) | 90.7 (127/140) | 86.3 (109/126) | 84.9 (90/106) | | | |
| Accuracy (%) | 91.7 (200/218) | 81.7 (178/218) | 73.4 (160/218) | < 0.001 | < 0.001 | 0.025 |

CAD = computer-aided diagnosis; NPV = negative predictive value; PPV = positive predictive value.
* p Value is that of the radiologist vs. the CAD system (S-Detect 1) comparison.
† p Value is that of the radiologist vs. the CAD system (S-Detect 2) comparison.
‡ p Value is that of the two different CAD systems.

Table 3. Diagnostic performance of the computer-aided diagnosis system–assisted radiologist and the radiologist

| Diagnostic Measure | Radiologist | S-Detect 1–Assisted Radiologist | S-Detect 2–Assisted Radiologist | p Value* | p Value† |
|---|---|---|---|---|---|
| Sensitivity (%) | 84.9 (73/86) | 91.9 (79/86) | 93.0 (80/86) | 0.031 | 0.016 |
| Specificity (%) | 96.2 (127/132) | 81.1 (107/132) | 67.4 (89/132) | < 0.001 | < 0.001 |
| PPV (%) | 93.6 (73/78) | 76.0 (79/104) | 65.0 (80/123) | | |
| NPV (%) | 90.7 (127/140) | 92.2 (107/116) | 93.7 (89/95) | | |
| Accuracy (%) | 91.7 (200/218) | 85.3 (186/218) | 77.5 (169/218) | 0.023 | < 0.001 |

CAD = computer-aided diagnosis; NPV = negative predictive value; PPV = positive predictive value.
* p Value is that of the radiologist vs. the CAD system–assisted radiologist (S-Detect 1) comparison.
† p Value is that of the radiologist vs. the CAD system–assisted radiologist (S-Detect 2) comparison.

Fig. 2. Comparison of receiver operating characteristic curves for the CAD systems (S-Detect 1 and S-Detect 2), radiologist and CAD-assisted radiologist in thyroid cancer diagnosis. CAD = computer-aided diagnosis.

The widespread use of high-resolution US, combined with increased medical surveillance and access to health care services, has markedly increased the detection of thyroid nodules and increased the number of FNAs (Haugen 2017; Shin et al. 2016). However, since thyroid cancers are slow growing and less aggressive than other malignancies, there are concerns regarding over-diagnosis and over-treatment (Ahn et al. 2014). Although the delayed diagnosis of thyroid cancers may influence the prognosis, unnecessary FNAs place substantial burdens on healthcare systems and cause considerable anxiety to patients. Thus, the current issue in thyroid cancer diagnosis is how to increase specificity and accuracy while maintaining sensitivity at a certain level, based on a combination of various US features and size cut-offs for FNA (Ha et al. 2018a; Ha et al. 2018b; Haugen 2017; Shin et al. 2016). Therefore, CAD systems based on a precise AI algorithm are expected to achieve high specificity and accuracy in identifying individuals at low risk of thyroid cancer, which would avoid unnecessary FNAs. However, in this study, although the diagnostic sensitivity of the S-Detect 2 was comparable to that of the radiologist, the specificity and AUROCs were lower in a real-world setting. Similarly, Choi et al. (2017) reported a sensitivity of the S-Detect 1 comparable to that of the radiologist (88.4% vs. 90.7%, p > 0.99), but a lower specificity and AUROC (specificity: 74.6% vs. 94.9%, p = 0.002; AUROCs: 0.83 vs. 0.92, p = 0.021). We believe that the diagnostic ability of the S-Detect series should be improved in this respect in the future.

Table 4. Inter-observer variation between the CAD system and the radiologist in terms of the description of the US features of the thyroid nodules

| Characteristic | S-Detect 1 | S-Detect 2 |
|---|---|---|
| Composition | 0.796 | 0.653 |
| Shape | Not available | Not available |
| Orientation | 0.659 | 0.580 |
| Margin | 0.360 | 0.489 |
| Echogenicity | 0.642 | 0.618 |
| Calcification | | 0.336 |
| Final diagnosis | 0.621 | 0.471 |

The extent of inter-observer agreement between the CAD system and the radiologist was calculated using the Cohen kappa value. CAD = computer-aided diagnosis.

Thyroid cancer diagnosis requires accurate recognition of various US features to be able to differentiate between benign and malignant nodules. Since accurate recognition and consistent interpretation of US features is not easy for less-experienced operators, inter-observer variability is substantial, which often leads to unnecessary FNAs in practice (Choi et al. 2010). In developing a CAD system, AI is expected to help with the accurate and consistent interpretation of US features by less-experienced operators (Jeong et al. 2018). This is why the current CAD systems provide descriptions of the US features along with classification models of the final diagnosis. In addition, it is generally difficult to explain the internal relationship between input data and the output predicted by the AI model; however, it is possible to infer which characteristics of the input image affect the result, based on the US features provided.

In this study, inter-observer agreement for the final diagnosis and US descriptors was moderate to substantial between the radiologist and the CAD systems. However, for S-Detect 2, the inter-observer agreement for the description of calcifications was the lowest and remained at fair (kappa = 0.336), which lowered the accuracy of this CAD system. Description of calcifications into one of four categories (none, micro-, macro- and rim calcifications) is a newly assigned task in the S-Detect 2. Microcalcifications are defined as tiny, hyperechoic foci (< 1 mm) within the solid portion of the nodule. The presence of microcalcifications increases the risk of malignancy in thyroid nodules and is one of the most specific US findings. However, since defining their presence by US in a thyroid nodule is challenging and their presence could change the patient management regime, S-Detect 1 did not provide any information on calcifications. Echogenic foci with reverberation artifacts in partially cystic nodules were frequently classified as microcalcifications by the S-Detect 2. We believe that one of the main limitations of the S-Detect 2 is its inaccuracy with regard to recognizing calcifications; therefore, differentiation of calcifications has to be undertaken by the radiologist to increase the accuracy of the CAD system.

A recent study by Li et al. (2019) reported promising results in the development of a classification model with a cohort of more than 300,000 images, and the model was validated in three validation data sets. In their study, a classification model was developed without the provision of any information on US features, which is the main difference from the CAD systems used in this study. The results showed that the newly developed CAD system had a sensitivity similar to that of a group of skilled radiologists (84.3%–93.4% vs. 89.0%–96.9%, respectively) and an even higher specificity (86.1%–87.8% vs. 57.1%–68.6%, respectively), although the reported specificity of the skilled radiologists in their study was relatively low compared with previous studies (86.4%–95.5%) (Choi et al. 2017; Yoo et al. 2018). The technical performance reported in that study should be thoroughly validated in a different geographic setting and in a real-world setting.

Our study had several limitations. First, we included nodules that had been subjected to US-guided FNA or US examination before scheduled surgery.


Therefore, the proportion of malignancies was rather high, which may have influenced the diagnostic performance of the CAD systems. Second, we defined the diagnosis of the radiologist assisted by the CAD system as positive when either the radiologist or the CAD system rated the nodule as positive. The actual impact of the CAD system should be further validated in the future. Third, the choice of margin may be operator dependent, although we followed one of the four options suggested by the CAD systems. Fourth, the radiologist's final diagnosis was based on personal experience, which may affect the generalizability of this study. Further validation pertaining to this issue and an assessment based on a larger study are required in the future. Fifth, the CAD systems do not provide an option for "anechoic echogenicity." Further improvement is needed in the future regarding these issues.

To conclude, the current CAD systems had limited specificity and accuracy as support for decision-making alongside radiologists in the diagnosis of thyroid cancer. One of their main limitations was inaccuracy in recognizing calcifications and margins, which meant that differentiation of calcifications has to be undertaken by the radiologist.

Acknowledgment: This work was supported by the National Research Foundation of Korea, South Korea (# 2017 R1 C1 B5016217).

REFERENCES

Ahn HS, Kim HJ, Welch HG. Korea’s thyroid-cancer “epidemic” screening and overdiagnosis. N Engl J Med 2014;371:1765–1767.

Chang Y, Paul AK, Kim N, Baek JH, Choi YJ, Ha EJ, Lee KD, Lee HS, Shin D, Kim N. Computer-aided diagnosis for classifying benign versus malignant thyroid nodules based on ultrasound images: A comparison with radiologist-based assessments. Med Phys 2016;43:554.

Choi SH, Kim EK, Kwak JY, Kim MJ, Son EJ. Interobserver and intraobserver variations in ultrasound assessment of thyroid nodules. Thyroid 2010;20:167–172.

Choi YJ, Baek JH, Park HS, Shim WH, Kim TY, Shong YK, Lee JH. A computer-aided diagnosis system using artificial intelligence for the diagnosis and characterization of thyroid nodules on ultrasound: Initial clinical assessment. Thyroid 2017;27:546–552.

Choy G, Khalilzadeh O, Michalski M, Do S, Samir AE, Pianykh OS, Geis JR, Pandharipande PV, Brink JA, Dreyer KJ. Current applications and future impact of machine learning in radiology. Radiology 2018;288:318–328.

Cibas ES, Ali SZ. The Bethesda system for reporting thyroid cytopathology. Thyroid 2009;19:1159–1165.

Ha EJ, Na DG, Moon WJ, Lee YH, Choi N. Diagnostic performance of ultrasound-based risk-stratification systems for thyroid nodules: Comparison of the 2015 American Thyroid Association Guidelines with the 2016 Korean Thyroid Association/Korean Society of Thyroid Radiology and 2017 American Congress of Radiology Guidelines. Thyroid 2018a;28:1532–1537.

Ha EJ, Na DG, Baek JH, Sung JY, Kim JH, Kang SY. US fine-needle aspiration biopsy for thyroid malignancy: Diagnostic performance of seven society guidelines applied to 2000 thyroid nodules. Radiology 2018b;287:893–900.

Haugen BR. 2015 American Thyroid Association Management Guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: What is new and what has changed? Cancer 2017;123:372–381.

Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts H. Artificial intelligence in radiology. Nat Rev Cancer 2018;18:500–510.

Jeong EY, Kim HL, Ha EJ, Park SY, Cho YJ, Han M. Computer-aided diagnosis system for thyroid nodules on ultrasonography: Diagnostic performance and reproducibility based on the experience level of operators. Eur Radiol 2018;29:1978–1985.

Lee JG, Jun S, Cho YW, Lee H, Kim GB, Seo JB, Kim N. Deep learning in medical imaging: General overview. Korean J Radiol 2017;18:570–584.

Li X, Zhang S, Zhang Q, Wei X, Pan Y, Zhao J, Xin X, Qin C, Wang X, Li J, Yang F, Zhao Y, Yang M, Wang Q, Zheng Z, Zheng X, Yang X, Whitlow CT, Gurcan MN, Zhang L, Wang X, Pasche BC, Gao M, Zhang W, Chen K. Diagnosis of thyroid cancer using deep convolutional neural network models applied to sonographic images: A retrospective, multicohort, diagnostic study. Lancet Oncol 2019;20:193–201.

Park SH, Han K. Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction. Radiology 2018;286:800–809.

Park SH, Kressel HY. Connecting technological innovation in artificial intelligence to real-world medical practice through rigorous clinical validation: What peer-reviewed medical journals could do. J Korean Med Sci 2018;33:e152.

Shin JH, Baek JH, Chung J, Ha EJ, Kim JH, Lee YH, Lim HK, Moon WJ, Na DG, Park JS, Choi YJ, Hahn SY, Jeon SJ, Jung SL, Kim DW, Kim EK, Kwak JY, Lee CY, Lee HJ, Lee JH, Lee JH, Lee KH, Park SW, Sung JY. Korean Society of Thyroid Radiology. Korean Society of Radiology. Ultrasonography diagnosis and imaging-based management of thyroid nodules: Revised Korean Society of Thyroid Radiology consensus statement and recommendations. Korean J Radiol 2016;17:370–395.

Sollini M, Cozzi L, Chiti A, Kirienko M. Texture analysis and machine learning to characterize suspected thyroid nodules and differentiated thyroid cancer: Where do we stand? Eur J Radiol 2018;99:1–8.

Yoo YJ, Ha EJ, Cho YJ, Kim HL, Han M, Kang SY. A computer-aided diagnosis system for thyroid nodules on ultrasonography: Initial clinical experience. Korean J Radiol 2018;19:665–672.