Machine-learning-based computed tomography radiomic analysis for histologic subtype classification of thymic epithelial tumours


European Journal of Radiology 126 (2020) 108929

Contents lists available at ScienceDirect

European Journal of Radiology journal homepage: www.elsevier.com/locate/ejrad


Jianping Hu, Yijing Zhao, Mengcheng Li, Yin Liu, Feng Wang, Qiang Weng, Ruixiong You, Dairong Cao*

Department of Radiology, The First Affiliated Hospital of Fujian Medical University, 20 ChaZhong Rd, Fuzhou, Fujian, 350005, PR China

ARTICLE INFO

Keywords: Radiomics; Machine learning; Thymic epithelial tumour; Computed tomography; WHO classification

ABSTRACT

Purpose: To evaluate the performance of machine-learning-based computed tomography (CT) radiomic analysis to differentiate high-risk thymic epithelial tumours (TETs) from low-risk TETs according to the WHO classification.

Method: This retrospective study included 155 patients with a histologic diagnosis of high-risk TET (n = 72) and low-risk TET (n = 83) who underwent unenhanced CT (UECT) and contrast-enhanced CT (CECT). The radiomic features were extracted from the UECT and CECT of each patient at the largest cross-section of the lesion. The classification performance was evaluated with a nested leave-one-out cross-validation approach combining the least absolute shrinkage and selection operator feature selection and four classifiers: generalised linear model (GLM), k-nearest neighbour (KNN), support vector machine (SVM) and random forest (RF). The receiver-operating characteristic curve (ROC) and the area under the curve (AUC) were used to evaluate the performance of the classifiers.

Results: The combination of UECT and CECT radiomic features demonstrated the best performance to differentiate high-risk TETs from low-risk TETs for all four classifiers. Among these classifiers, the RF had the highest AUC of 0.87, followed by GLM (AUC = 0.86), KNN (AUC = 0.86) and SVM (AUC = 0.84).

Conclusions: Machine learning-based CT radiomic analysis allows for the differentiation of high-risk TETs and low-risk TETs with excellent performance, representing a promising tool to assist clinical decision making in patients with TETs.

* Corresponding author. E-mail address: [email protected] (D. Cao).

https://doi.org/10.1016/j.ejrad.2020.108929 Received 26 April 2019; Received in revised form 1 February 2020; Accepted 26 February 2020 0720-048X/ © 2020 Elsevier B.V. All rights reserved.

1. Introduction

Thymic epithelial tumours (TETs) are relatively rare neoplasms, but are the most common anterior mediastinal tumours in adults [1,2]. The histological classification introduced by the World Health Organization (WHO) has been reported to be an independent prognostic factor [3,4]. In the WHO classification, TETs are classified into six subtypes (types A, AB, B1, B2, B3, and thymic carcinomas), in order of increasing malignancy [5]. A simplified risk classification, which defines types A, AB, and B1 as low-risk TETs and types B2, B3, and thymic carcinomas as high-risk TETs, has also been recommended as an alternative to further characterise the invasive behaviour and tumour recurrence probability [4]. Compared with low-risk TETs, high-risk TETs have a much poorer prognosis [4,6].

Computed tomography (CT) has primarily been used for the evaluation and staging of TETs. Several studies have reported that some imaging features of the tumours were useful for differentiating the subtypes of TETs. In these studies, a smooth contour and round shape were related to type A thymoma, whereas irregular contours, a necrotic component, and invasion of the great vessels were related to thymic carcinomas [7–9]. However, these features were of limited value for the prediction of histologic subtypes of TETs because of the significant overlap of features among the subgroups.

Tumour heterogeneity is an essential feature of malignant tumours that can be assessed using histologic or imaging data [10]. Radiomics analysis of large imaging datasets has also illustrated the associations between tumour heterogeneity and radiomic features, which may be used for tumour detection, subtype classification, and therapeutic response assessment [11–14]. In recent studies, quantitative texture analysis of 18F-FDG positron-emission tomography (PET)/CT and CT images has demonstrated the potential to differentiate the tumour grades of TETs [15,16].

Different radiomic features can be combined into a robust predictive model to obtain a reliable diagnostic tool. As a technique for recognising patterns, machine learning can be applied to determine the optimal combination of feature selection and classifier methods to achieve the best performance [17,18]. Moreover, some studies have shown that classifiers from different classifier families show different performance for different types of tumours [19–21]. However, there have been no reports of radiomic studies on the risk classification of TETs using machine-learning-based CT radiomic analysis with larger data and independent testing.

The purpose of this study was to explore the potential use of CT-based radiomic features and machine-learning techniques to differentiate high-risk TETs from low-risk TETs. We also investigated the diagnostic performance of the different classifiers and of radiomic features extracted from different imaging techniques, including unenhanced CT (UECT), contrast-enhanced CT (CECT), and the combination of UECT and CECT.

2. Materials and methods

This retrospective study was approved by our institutional ethics committee, which waived the need for informed consent.

2.1. Patients

A total of 201 patients with a diagnosis of thymic epithelial tumour were retrieved by searching the pathology database at our institution from January 2009 to December 2018. Six patients were excluded because their histological diagnoses were based on biopsy; the other 195 patients had histological diagnoses based on surgical resection. Then, after searching our picture archiving and communication system (PACS), we excluded 40 patients for the following reasons: (1) patients without preoperative CT examination (n = 18); (2) patients with UECT (n = 9) or CECT (n = 8) only; (3) patients with lesions smaller than 1 cm (n = 2, to ensure enough area for drawing the region of interest (ROI) and to minimise confounding factors for radiomic feature results); (4) poor CT image quality (n = 3). Finally, 155 patients were selected for this study. The study workflow diagram for patient selection is shown in Fig. 1. TETs were classified into two subgroups according to the simplified WHO classification system: low-risk group (A, AB, and B1) and high-risk group (B2, B3, and TC) [4].

2.2. CT image acquisition

The CT images were acquired with three Toshiba Medical Systems CT scanners: 22 patients with a 16-MDCT scanner (Aquilion), 33 patients with an 80-MDCT scanner (Aquilion PRIME), and 100 patients with a 320-MDCT scanner (Aquilion ONE). The scanning parameters were as follows: 120 kV; 130–200 mAs; detector collimation, 0.625 mm; field of view, 35 cm; matrix size, 512 × 512; and reconstruction kernel, standard (FC10). The slice thicknesses for the CT scans were 5 mm (115 patients) and 7 mm (40 patients). Following an unenhanced CT, a contrast medium of 1.5 mL/kg body weight (Omnipaque 350, GE Healthcare) was administered intravenously at a rate of 3.0 mL/s followed by a 20-mL saline flush. Contrast-enhanced CT images were obtained 90 s after contrast agent administration.

2.3. Radiomic feature extraction and evaluation

Radiomic feature extraction was performed using the open-source Imaging Biomarker Explorer (IBEX) software, which is built on a commercial software package (Matlab, version 8.1.0; MathWorks, Natick, Mass) [22]. The radiomics study workflow diagram is shown in Fig. 2. The ROI of the tumour was manually segmented with the ROI editor tools in IBEX on unenhanced CT and enhanced CT images. A polygonal ROI was manually delineated to include the whole lesion on the single axial image that best represented the largest cross-sectional area (Fig. 2). This section was determined jointly by two radiologists with 18 years (reader 1) and 9 years (reader 2) of experience in chest CT interpretation. Apparent calcifications and cystic portions were carefully excluded when drawing the ROI.

Before feature extraction, image preprocessing with an edge-preserving smoothing (EPS) filter (a preprocessing tool implemented in the IBEX software) was applied to the lesion volume to reduce image noise. This smoothing step has been shown to be an effective way to reduce the impact of imaging noise in lung CT scans [23]. After preprocessing, the radiomic features for two-dimensional (2D) image slices were extracted from the UECT and CECT within the ROI. The extracted features included shape [excluding the three-dimensional (3D) shape features], intensity direct, intensity histogram, grey level co-occurrence matrix (GLCM25), neighbour intensity difference (NID25), and grey level run length matrix (GLRM25). The co-occurrence matrix features (a subcategory of texture features) were calculated in four directions (0, 45, 90, and 135 degrees) with three different offsets (1, 4, and 7), and the run length matrix features (another subcategory of texture features) were calculated in two directions (0 and 90 degrees). The average value over these directions was used as the final value to avoid directional bias [24,25].

Moreover, some radiomic features from Intensity Direct (Kurtosis, Skewness, Percentile, Quantile, Range, InterQuartileRange, MeanAbsoluteDeviation, and MedianAbsoluteDeviation) that duplicated entries in the Intensity Histogram set were removed. In the end, 172 radiomic features were extracted for each image. The detailed features and abbreviations are listed in Table 1.

Recently, several studies have demonstrated the influence of different acquisition protocols and different scanners on radiomics analysis [26,27]. After feature extraction, all radiomic features were harmonised to remove the scanner-specific effect using the ComBat harmonisation method in nonparametric mode. ComBat harmonisation, which was initially described for genomic data analysis, has been shown to be useful for correcting the variations in radiomic feature values caused by different scanners and different acquisition protocols [28,29]. The Matlab and R function codes for ComBat harmonisation are available at https://github.com/Jfortin1/ComBatHarmonization.

We randomly chose the CT images of 30 patients (20 % of all data) for ROI delineation and feature extraction. The ROIs were independently drawn by two radiologists (reader 1 and reader 2) who

Fig. 1. The patient selection workflow.
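The patient counts behind the selection workflow can be tallied directly from the numbers reported in Section 2.1. The short Python sketch below only re-adds those reported counts; the variable names are illustrative and are not taken from the paper.

```python
# Counts as reported in Section 2.1 (patient selection).
retrieved = 201          # patients found in the pathology database
biopsy_only = 6          # diagnosis based on biopsy rather than resection

# Exclusions after the PACS search, keyed by the reason given in the text.
pacs_exclusions = {
    "no preoperative CT": 18,
    "UECT only": 9,
    "CECT only": 8,
    "lesion smaller than 1 cm": 2,
    "poor CT image quality": 3,
}

resected = retrieved - biopsy_only
excluded = sum(pacs_exclusions.values())
enrolled = resected - excluded

print(resected, excluded, enrolled)  # 195 40 155
```

The totals agree with the 195 resected patients, the 40 exclusions and the final cohort of 155 stated in the text.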


Fig. 2. The radiomics study workflow. A. The ROIs of the lesion and the normal liver tissue were created, and radiomics features were extracted. B. The reproducibility and stability of the features were evaluated by the lesion ROI and liver ROI, respectively, and the scanner-specific effect was subsequently removed. C. The classifiers were trained and evaluated with a nested leave-one-out cross-validation (LOOCV).
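For the texture part of the feature extraction step, the Methods state that co-occurrence features were computed in four directions and then averaged. The Python sketch below illustrates that idea for a single feature (contrast) on a tiny image. It is only an illustration under stated assumptions: it uses a symmetric, normalised co-occurrence matrix and a hypothetical 2 × 2 image, and it is not the IBEX implementation, whose details may differ.

```python
from statistics import mean

# Unit pixel displacements (dx, dy) for the four directions named in the text.
DIRECTIONS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}

def glcm(image, dx, dy, levels):
    """Symmetric, normalised grey-level co-occurrence matrix for one displacement."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both orders (symmetric)
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset is larger than the image")
    return [[v / total for v in row] for row in counts]

def contrast(p):
    """GLCM contrast: probability-weighted squared grey-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

def direction_averaged_contrast(image, offset, levels):
    """Average the contrast over the four directions at a given offset."""
    values = [
        contrast(glcm(image, dx * offset, dy * offset, levels))
        for dx, dy in DIRECTIONS.values()
    ]
    return mean(values)

# Hypothetical 2 x 2 image with two grey levels.
toy = [[0, 1], [1, 0]]
averaged = direction_averaged_contrast(toy, offset=1, levels=2)
print(averaged)  # 0.5
```

In this toy image the horizontal and vertical neighbours always differ (contrast 1) while the diagonal neighbours are always equal (contrast 0), so a single direction would give a misleading value; averaging yields 0.5.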

Table 1 List of radiomic features used in this study (N = 172).

Category (number of features): features in IBEX software

Shape (16): Compactness1, Compactness2, Convex, ConvexHullVolume, Mass, Max3DDiameter, MeanBreadth, SphericalDisproportion, NumberOfVoxel, Orientation, Roundness, Sphericity, SphericalDisproportion, SurfaceArea, SurfaceAreaDensity, Volume

Intensity Direct (25): Energy, GlobalEntropy, GlobalMax, GlobalMean, GlobalMedian, GlobalMin, GlobalStd, GlobalUniformity, LocalEntropyMax, LocalRangeMax, LocalEntropyMean, LocalEntropyMedian, LocalRangeStd, LocalEntropyMin, LocalEntropyStd, LocalRangeMean, LocalRangeMedian, LocalRangeMin, LocalStdMax, LocalStdMin, LocalStdMean, LocalStdMedian, LocalStdStd, RootMeanSquare, Variance, Kurtosis★, Skewness★, Range★, Percentile★, Quantile★, InterQuartileRange★, MeanAbsoluteDeviation★, MedianAbsoluteDeviation★

Intensity Histogram (49): Kurtosis, Skewness, Range, Percentile*, PercentileArea*, Quantile*, InterQuartileRange, MeanAbsoluteDeviation, MedianAbsoluteDeviation

GrayLevelCooccurenceMatrix25 (Offset = 1, 4, 7) (66): AutoCorrelation, ClusterProminence, ClusterShade, ClusterTendency, Contrast, Correlation, DifferenceEntropy, Dissimilarity, Energy, Entropy, Homogeneity, Homogeneity2, InformationMeasureCorr1, InformationMeasureCorr2, InverseDiffMomentNorm, InverseDiffNorm, InverseVariance, MaxProbability, SumAverage, SumEntropy, SumVariance, Variance

GrayLevelRunLengthMatrix25 (11): GrayLevelNonuniformity, HighGrayLevelRunEmpha, LongRunEmphasis, LongRunHighGrayLevelEmpha, RunPercentage, RunLengthNonuniformity, LongRunLowGrayLevelEmpha, LowGrayLevelRunEmpha, ShortRunEmphasis, ShortRunHighGrayLevelEmpha, ShortRunLowGrayLevelEmpha

NeighborIntensityDifference25 (5): Busyness, Coarseness, Complexity, Contrast, TextureStrength

Notes: N, the total number of features (n = 172); ★, features in the Intensity Direct set that had the same values as features in the Intensity Histogram set and were removed; *, Percentile and PercentileArea were calculated from the 5th to the 95th percentile at intervals of 5; Quantile was calculated at 0.025, 0.25, 0.5, 0.75, and 0.975.
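The per-category counts in Table 1 can be cross-checked against the stated total of 172. The Python sketch below does this under one assumption that is not spelled out in the paper: each percentile point, percentile-area point and quantile point is counted as a separate feature, and each co-occurrence feature is counted once per offset.

```python
# Percentile and PercentileArea are evaluated from the 5th to the 95th in steps of 5.
percentile_points = list(range(5, 100, 5))
quantile_points = [0.025, 0.25, 0.5, 0.75, 0.975]

# Six single-valued histogram features: Kurtosis, Skewness, Range,
# InterQuartileRange, MeanAbsoluteDeviation, MedianAbsoluteDeviation.
histogram_count = 6 + 2 * len(percentile_points) + len(quantile_points)

# 22 co-occurrence features, each computed at three offsets (1, 4, 7).
glcm_count = 22 * 3

feature_counts = {
    "Shape": 16,
    "Intensity Direct": 25,
    "Intensity Histogram": histogram_count,
    "GrayLevelCooccurenceMatrix25": glcm_count,
    "GrayLevelRunLengthMatrix25": 11,
    "NeighborIntensityDifference25": 5,
}
total_features = sum(feature_counts.values())

print(histogram_count, glcm_count, total_features)  # 49 66 172
```

Under that counting convention the category sizes reproduce the 49 histogram features, the 66 co-occurrence features and the total of 172 given in the table.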

were blinded to clinical and pathological results. The process was repeated 4 weeks later. The remaining ROIs were segmented by reader 2. The intra-class correlation coefficient (ICC) was used to assess the intra- and inter-observer reproducibility of radiomic feature extraction. Features with an ICC greater than 0.75 in both the intra- and inter-observer reproducibility analyses were retained for further analysis. All the ROIs drawn by reader 2 were used for the final feature analysis. We created a circular ROI with a diameter of 3.0 cm in normal liver tissue (excluding the liver vessels) on the enhanced image. Radiomic features from the liver ROI were harmonised across the different scanners using the ComBat harmonisation method. The harmonised features of the liver ROIs were then compared among the different scanners using the Kruskal-Wallis rank sum test. Features with a P-value > 0.1 were considered reliable with respect to removal of the scanner-specific effect (Fig. 2).
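The two reliability criteria described above amount to a simple filter over per-feature statistics. The Python sketch below shows that filtering step only; the ICC and P-values are hypothetical inputs (the paper does not list them per feature), and how they are computed is outside the scope of the sketch.

```python
ICC_THRESHOLD = 0.75   # reproducibility criterion from the text
P_THRESHOLD = 0.1      # stability criterion (Kruskal-Wallis test on liver ROIs)

def reliable_features(intra_icc, inter_icc, liver_p):
    """Keep features that are reproducible (both ICCs) and stable across scanners."""
    kept = []
    for name in intra_icc:
        reproducible = (
            intra_icc[name] > ICC_THRESHOLD and inter_icc[name] > ICC_THRESHOLD
        )
        stable = liver_p[name] > P_THRESHOLD
        if reproducible and stable:
            kept.append(name)
    return kept

# Hypothetical values for three features, for illustration only.
intra = {"featureA": 0.92, "featureB": 0.81, "featureC": 0.60}
inter = {"featureA": 0.88, "featureB": 0.79, "featureC": 0.85}
p_values = {"featureA": 0.45, "featureB": 0.03, "featureC": 0.70}

kept_features = reliable_features(intra, inter, p_values)
print(kept_features)  # ['featureA']
```

Here featureB fails the stability criterion and featureC fails the intra-observer criterion, so only featureA would be carried forward.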

Table 2 Demographic characteristics of the 155 enrolled patients.

Group              Number   M/F ratio   Age (years)
Low-risk group     83       41/42       53 (23–79)
  Type A           22       12/10       49 (23–78)
  Type AB          47       24/23       56 (33–71)
  Type B1          14       5/9         48 (28–79)
High-risk group    72       40/32       52 (25–78)
  Type B2          31       19/12       49 (25–69)
  Type B3          35       19/16       54 (34–78)
  TC               6        2/4         55 (42–71)

Notes: Low-risk group, low-risk thymic epithelial tumour, including WHO histological subgroups Type A, AB and B1; High-risk group, high-risk thymic epithelial tumour, including WHO histological subgroups Type B2, Type B3 and thymic carcinoma (TC); Number, the number of patients; M/F ratio, male/female ratio; Age, median with range in brackets.


Table 3 The features selected by LASSO regression over all the LOOCV loops.

Mode/Features                                 LASSO coeff              Number

UECT + CECT
CE_GLCM25-7Homogeneity2                       −0.56 (−0.25∼−0.95)      155/155
CE_GLCM25-7InformationMeasureCorr1            0.46 (0.12∼0.53)         155/155
CE_GLCM25-4AutoCorrelation                    −0.69 (−0.98∼−0.01)      155/155
UE_GLCM25-4InverseVariance                    0.38 (0.26∼0.49)         155/155
UE_IntensityDirectLocalEntropyMedian          0.23 (0.06∼0.39)         155/155
UE_IntensityHistogram10PercentileArea         −0.02 (−0.04∼−0.00)      93/155
CE_IntensityHistogram95PercentileArea         −0.19 (−0.47∼0.00)       83/155
CE_GLCM25-ShortRunHighGrayLevelEmpha          −0.03 (−0.18∼0.00)       53/155
CE_GLCM25-7Contrast                           0.10 (0.00∼0.45)         9/155
CE_IntensityHistogram10Percentile             −0.20 (−0.47∼−0.01)      6/155

CECT
GLCM25-7InformationMeasureCorr1               0.77 (0.05∼1.43)         153/155
IntensityHistogram0.975Quantile               0.15 (0.00∼0.39)         149/155
IntensityHistogramGaussFit1GaussAmplitude     −0.12 (−0.4∼0.00)        140/155
GLCM25-4AutoCorrelation                       −0.92 (−1.35∼−0.02)      131/155
IntensityHistogram5PercentileArea             0.09 (0.02∼0.30)         126/155
IntensityHistogramInterQuartileRange          −0.06 (−0.29∼−0.01)      125/155
IntensityHistogram95PercentileArea            −0.19 (−0.62∼−0.01)      123/155
GLCM25-7Contrast                              0.25 (0.00∼1.15)         113/155
GLCM25-7Homogeneity2                          −0.09 (−0.23∼−0.01)      113/155
GLCM25-7ClusterProminence                     −0.21 (−0.36∼−0.03)      27/155

UECT
GLCM25-4InverseVariance                       0.56 (0.46∼0.65)         155/155
GLCM25-1InformationMeasureCorr2               0.08 (0.00∼0.19)         154/155
IntensityHistogram10PercentileArea            −0.12 (−0.17∼−0.07)      154/155
GLCM25-7Homogeneity2                          0.05 (0.00∼0.14)         150/155
GLCM25-1SumVariance                           −0.11 (−0.21∼0.00)       146/155
GLCM25-1ClusterProminence                     −0.02 (−0.06∼0.00)       85/155
GLCM25-7AutoCorrelation                       −0.03 (−0.14∼0.00)       67/155
IntensityHistogram30Percentile                −0.05 (−0.14∼0.00)       41/155
IntensityDirectLocalEntropyMedian             0.01 (0.00∼0.04)         23/155

Notes: UECT, unenhanced computed tomography; CECT, contrast-enhanced computed tomography; GLCM, grey level co-occurrence matrix; LASSO coeff, least absolute shrinkage and selection operator (LASSO) coefficients, reported as median with range in brackets; LOOCV, leave-one-out cross-validation; Number, the frequency of feature selection.
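The "LASSO coeff" and "Number" columns of Table 3 summarise one LASSO fit per LOOCV loop. The Python sketch below shows one way such a summary could be produced from per-loop coefficient dictionaries. It is an assumption-laden illustration: the input values are hypothetical, and the sketch summarises non-zero coefficients only, which may not match how the ranges in the table were computed.

```python
from collections import defaultdict
from statistics import median

def summarise_selection(loop_coefficients):
    """Summarise per-loop LASSO coefficients as median, range and selection frequency."""
    per_feature = defaultdict(list)
    for coefficients in loop_coefficients:
        for name, value in coefficients.items():
            if value != 0.0:           # a feature is "selected" when its coefficient is non-zero
                per_feature[name].append(value)
    n_loops = len(loop_coefficients)
    summary = {}
    for name, values in per_feature.items():
        summary[name] = {
            "median": median(values),
            "min": min(values),
            "max": max(values),
            "frequency": f"{len(values)}/{n_loops}",
        }
    return summary

# Hypothetical coefficients from three loops (the study had 155 loops).
loops = [
    {"featureA": 0.5, "featureB": 0.0},
    {"featureA": 0.4, "featureB": -0.1},
    {"featureA": 0.6, "featureB": 0.0},
]
selection_summary = summarise_selection(loops)
print(selection_summary["featureA"]["frequency"])  # 3/3
print(selection_summary["featureB"]["frequency"])  # 1/3
```

Ranking features by this frequency gives the ordering used within each mode of the table.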

2.4. Feature selection and machine-learning classification

To avoid overfitting of the classifiers, a nested leave-one-out cross-validation (LOOCV) was employed to evaluate the performance of the different machine-learning models. The detailed workflow is presented in Fig. 2. Specifically, at each step of the LOOCV process, one sample was taken as the test sample, and all the remaining samples were used as the training set. The procedure was repeated until every sample in the dataset had been used as the test sample.

In each loop of the LOOCV, the feature values were standardised to a mean of zero and a standard deviation of one, using the “scale” function in R software, to eliminate the possible influence of different dimensions. The least absolute shrinkage and selection operator (LASSO) algorithm was applied to choose the most valuable features. Five-fold cross-validation was performed to select the best λ, the tuning parameter in LASSO, using the 1-SE criterion. The LASSO analysis was performed using the “glmnet” package (version 0.84) in R software. The features with non-zero coefficients were selected from the candidate features and formed a radiomic signature for machine-learning classification analysis.

Finally, four well-known machine-learning classifiers were used: generalised linear models (GLM), k-nearest neighbour (KNN), support vector machines with a radial basis function kernel (SVM-radial), and random forest (RF). All classifiers were implemented using the CARET package (version 6.0–47) in R software, which provides a unified and user-friendly interface to many machine-learning algorithms. The following packages were used for the classification methods: ‘glm’ (GLM), ‘knn’ (KNN), ‘kernlab’ (SVM-radial), and ‘randomForest’ (RF). The classifiers were trained or tuned in the training set to determine the best parameter configuration for the classifier model, using an inner 10-fold cross-validation repeated five times. The performance of the predictive model was then evaluated in the testing set. The training and tuning parameters of the classifiers were as follows: GLM, family = “binomial”; KNN, the tuning parameter “k” was chosen from 1 to 11 in steps of 2; SVM-radial, the tuning parameters “C” and “sigma” were chosen from the sets $\{2^{-2}, 2^{-1}, \ldots, 2^{3}, 2^{4}\}$ and $\{10^{-2}, 10^{-1}, 1, 10^{1}, 10^{2}\}$, respectively; RF, the tuning parameter “mtry” ranged from 2 to 10 in increments of 1. In the nested LOOCV, these classifiers were first trained or tuned in the training set and then validated on the one independent left-out testing sample, separately for each radiomic mode (UECT mode, CECT mode, and the combined mode).
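The nested scheme in Section 2.4 can be sketched in a few lines of Python. The study itself used R (glmnet and CARET), so this is only an illustration of the control flow under several simplifying assumptions: a single KNN classifier with the k grid from the text, a plain five-fold inner loop instead of 10-fold cross-validation repeated five times, no LASSO step, and hypothetical toy data. All function names are illustrative.

```python
import math
from statistics import mean, stdev

def standardise(train_rows, test_row):
    """z-score both inputs using the mean and SD of the training rows only."""
    columns = list(zip(*train_rows))
    mu = [mean(col) for col in columns]
    sd = [stdev(col) or 1.0 for col in columns]  # guard against constant features

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, mu, sd)]

    return [scale(row) for row in train_rows], scale(test_row)

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training samples (labels are 0 or 1)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    nearest = order[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def inner_cv_accuracy(x, y, k, n_folds=5):
    """Accuracy of KNN with a given k, estimated by n-fold CV on the training set."""
    correct = 0
    for fold in range(n_folds):
        train_idx = [i for i in range(len(x)) if i % n_folds != fold]
        val_idx = [i for i in range(len(x)) if i % n_folds == fold]
        fold_x = [x[i] for i in train_idx]
        fold_y = [y[i] for i in train_idx]
        for i in val_idx:
            correct += knn_predict(fold_x, fold_y, x[i], k) == y[i]
    return correct / len(x)

def nested_loocv(x, y, k_grid=(1, 3, 5, 7, 9, 11)):
    """Return one out-of-sample prediction per sample."""
    predictions = []
    for held_out in range(len(x)):
        train_rows = [row for i, row in enumerate(x) if i != held_out]
        train_y = [label for i, label in enumerate(y) if i != held_out]
        # Scaling (and, in the paper, LASSO selection) is refit inside each loop,
        # so the held-out sample never influences preprocessing or tuning.
        train_x, test_x = standardise(train_rows, x[held_out])
        best_k = max(k_grid, key=lambda k: inner_cv_accuracy(train_x, train_y, k))
        predictions.append(knn_predict(train_x, train_y, test_x, best_k))
    return predictions

# Hypothetical two-feature toy data: two well-separated groups of six samples.
toy_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1],
         [5.0, 5.1], [5.2, 5.0], [5.1, 5.3], [5.3, 5.2], [5.0, 5.4], [5.4, 5.1]]
toy_y = [0] * 6 + [1] * 6
toy_predictions = nested_loocv(toy_x, toy_y)
print(toy_predictions == toy_y)  # True
```

The key design point is that every data-dependent step sits inside the outer loop; pooling the left-out predictions then gives performance estimates that are not contaminated by the tuning process.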

2.5. Model evaluation

The importance of the predictive radiomic features in the classifiers was evaluated by the selection frequencies of the features obtained in LASSO regression over all of the LOOCV loops. The area under the curve (AUC) of the receiver operating characteristic (ROC), sensitivity (Sens), specificity (Spec), positive predictive value (PPV), negative predictive value (NPV), and accuracy (ACC) of the models were computed based on the results of all independent left-out testing samples.

2.6. Statistical analysis

The statistical analysis was conducted with R software (version 3.3.2; http://www.R-project.org). All continuous variables were described as mean and 95 % confidence interval, or median and range. Wilcoxon rank-sum tests and two-sample t tests were used to compare groups regarding continuous variables, whereas the Pearson chi-square test or Fisher exact test was used to compare groups regarding categorical variables. Differences between the AUCs of the different classifiers and models were tested with the DeLong method using the MedCalc software package. All statistical analyses were two-sided, and p-values of < 0.05 were considered statistically significant.

3. Results

3.1. Demographic characteristics of the patients

The clinicopathological characteristics of patients in our study cohort are shown in Table 2. The numbers of patients in the low-risk group and the high-risk group were 83 and 72, respectively. There was no significant difference in age or sex either among the WHO histological subgroups or between the low-risk group and the high-risk group.

3.2. Feature selection and acquisition of radiomic signatures

A total of 344 radiomic features for each patient were extracted from UECT (172 features) and CECT images (172 features). Features with low reproducibility (the intra- or inter-observer ICC < 0.75) were excluded, so that the number of features was reduced to 230 (119 features for UECT images, and 111 features for CECT images). Features with low stability (Kruskal-Wallis rank sum test in the liver ROI, p < 0.1) were further removed, and the final number of features was 210 (117 features for UECT images and 109 features for CECT images). The features obtained in the process of LASSO regression were ranked by their selection frequencies over all of the LOOCV loops (Table 3). Most of these features were derived from the co-occurrence matrix and the intensity histogram. Heat maps of the selected features are presented in Fig. 3, and show the distribution and differences of normalised texture feature values for the different radiomic modes.

Fig. 3. Heat maps of the selected features after least absolute shrinkage and selection operator (LASSO) regression for different radiomic modes. Upper row: The combined mode of unenhanced computed tomography (UECT) and contrast enhanced computed tomography (CECT); Middle row: The CECT mode; Bottom row: The UECT mode.


3.3. Performance of the classifiers and radiomic modes

Table 4 Classification performance of the classifiers for three radiomic modes.

Classifiers/Modes   AUC                  Sens   Spec   PPV    NPV    ACC

GLM
  UECT + CECT       0.86 (0.79–0.91)     0.75   0.84   0.81   0.80   0.80
  CECT              0.80 (0.72–0.86)     0.68   0.80   0.74   0.74   0.74
  UECT              0.62 (0.54–0.70)     0.53   0.60   0.54   0.60   0.57

KNN
  UECT + CECT       0.86 (0.79–0.91)     0.75   0.81   0.77   0.79   0.78
  CECT              0.79 (0.72–0.86)     0.64   0.83   0.77   0.73   0.74
  UECT              0.65 (0.57–0.72)     0.58   0.64   0.58   0.64   0.62

SVM
  UECT + CECT       0.85 (0.79–0.90)     0.74   0.82   0.78   0.78   0.78
  CECT              0.83 (0.76–0.89)     0.67   0.84   0.79   0.75   0.76
  UECT              0.60 (0.52–0.68)     0.50   0.63   0.54   0.59   0.57

RF
  UECT + CECT       0.87 (0.80–0.92)     0.71   0.89   0.85   0.78   0.81
  CECT              0.81 (0.73–0.86)     0.70   0.82   0.77   0.76   0.76
  UECT              0.61 (0.53–0.69)     0.47   0.62   0.52   0.57   0.55
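The threshold-based measures in Table 4 (Sens, Spec, PPV, NPV and ACC) all derive from a single 2 × 2 confusion matrix of pooled left-out predictions, with high-risk TET as the positive class. The Python sketch below shows those definitions. The counts are hypothetical: the paper does not report confusion matrices, and these were chosen only so that the class totals match the cohort (72 high-risk, 83 low-risk).

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic measures from the cells of a 2 x 2 confusion matrix."""
    return {
        "Sens": tp / (tp + fn),                    # high-risk cases correctly identified
        "Spec": tn / (tn + fp),                    # low-risk cases correctly identified
        "PPV": tp / (tp + fp),                     # predicted high-risk that are high-risk
        "NPV": tn / (tn + fn),                     # predicted low-risk that are low-risk
        "ACC": (tp + tn) / (tp + fn + tn + fp),    # overall proportion correct
    }

# Hypothetical counts (not from the paper): 51 + 21 = 72 high-risk, 74 + 9 = 83 low-risk.
metrics = diagnostic_metrics(tp=51, fn=21, tn=74, fp=9)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# Sens: 0.71
# Spec: 0.89
# PPV: 0.85
# NPV: 0.78
# ACC: 0.81
```

Because the two groups are of similar size, accuracy lies between sensitivity and specificity; with a more imbalanced cohort, PPV and NPV would shift even if sensitivity and specificity stayed the same.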

The average performance (AUC, Sens, Spec, NPV, PPV, and ACC) of classifiers for different radiomic modes are presented in Table 4, and average ROC curves are shown in Fig. 4. The pairwise comparison of ROC curves indicated that the combination of UECT and CECT demonstrated the best performance for these machine-learning methods, followed by the CECT and UECT. There was no difference between CECT and the combination of UECT and CECT for the SVM classifier. There was no significant difference among the ROC curves of four classification methods for each radiomic mode. The RF classifier in the combination of UECT and CECT exhibited the best performance (0.87, 95 %CI: 0.80–0.92).

4. Discussion Medical images, such as CT or MRI, are traditionally “viewed” and “interpreted” by visual observation. Correct interpretation of these images depends on the observer's experience, knowledge, and the quality of the equipment. However, tumour heterogeneity, which may be associated with biologic aggressiveness, is challenging to capture and quantify with the visual assessment of images. Radiomic analysis, which can provide objective, quantitative evaluation of tumour heterogeneity from a large number of quantitative features of medical images, offers the potential to overcome the limitations of a subjective visual image interpretation. In the present study, we attempt to develop and validate a machine-learning approach combined with CT radiomic analysis to differentiate the risk classification of TETs. Among the three image modes, all the four classifiers provided excellent performance

Notes: AUC, area under the curve; UECT, unenhanced computed tomography; CECT, contrast enhanced computed tomography; Sens, Sensitivity; Spec, Specificity; NPV, negative predictive value; PPV, positive predictive value; ACC, accuracy; GLM, Generalised linear models; SVM, support vector machines; kNN, k-nearest neighbor; RF, Random forest. The value in the bracket is 95 % confidence interval.

Fig. 4. The ROC curves of the different radiomic modes (UECT, CECT and the combination of UECT and CECT) for the different classifiers. A. GLM Generalised linear models; B. KNN k-nearest neighbour; C. SVM, support vector machines; D. RF. random forest. 6

European Journal of Radiology 126 (2020) 108929

J. Hu, et al.

machine-learning methods should be evaluated in diff ;erent types of tumours concerning diff ;erent radiomic cohorts [20,21]. In the present study, we investigated the performance of four classifiers (GLM, KNN, SVM, and RF) that achieved higher performance in previous studies [40–42]. The RF classifier in the combined mode of UECT and CECT exhibited the best performance. Consistent with previous studies, the random forest classifier was reported to have the best predictive performance in lung cancers and head and neck cancers when compared with other classifier [20,21]. In general, our results demonstrated that all four classifiers showed excellent performance with relatively high AUC and ACC for the CECT mode and the combined mode of UECT and CECT. In addition, as the fact of competition between CT and MRI in the assessment of mediastinal tumours, using machine-learning methods, a recent study also shown that radiomics features from conventional MRI can also be used to establish robust prediction models for risk classification of TETs [43]. These encouraging results indicated that machinelearning-based radiomic methods might be a robust risk stratification approach for TETs that could aid clinical decision support. Our study has several limitations. First, a retrospective study performed at a single centre may have introduced a selection bias; a largescale, multi-centre, prospective study is needed to be performed for validation. Second, the ROIs were manually drawn at the largest crosssection of the lesion. A 3D volumetric analysis was not conducted because of the lack of thin-slice image data due to the long inclusion period of subjects and variable CT protocols. Besides, texture features may be affected by calcification, haemorrhage, and cysts in TETs, which are difficult to eliminate at the volumetric analysis. Third, CT image data were collected from three different scanners with different parameters, which can affect the extracted features [26,27]. 
Although a ComBat harmonisation method was used to remove scanner-specific effects from features, its effectiveness needs to be proved in further studies. Finally, due to from the pathologic database, the diagnosis of the TETs was made by different pathologists, which might result in inter-observer variability.

(AUC: 0.79-0.87) in the CECT mode and the combined mode of UECT and CECT. Although the RF classifier in the combination of UECT and CECT exhibited the best performance (AUC = 0.87), there is no significant difference among the four classifiers for each radiomic mode. Our results demonstrated that machine-learning analysis based on CT radiomic features has the potential to correctly differentiate high-risk TETs from low-risk TETs. The most important features after selection were the histogram and GLCM texture features, which were from either CECT or UECT. Among these features, as first order feature, an intensity histogram evaluates the frequency distribution features of pixel intensity within a given area of interest. As second order features, according to their formulas and definitions, GLCM-InformationMeasureCorr1 quantifies the degree of randomness, GLCM-7 InverseVariance indicates the similarity of voxel values along that direction, homogeneity is a measure of local grey level uniformity, and correlation reflects the consistency of image texture [22]. In previous studies, these features provide measures of tumour heterogeneity that have been reported to be related to histopathological features and prognosis in a variety of tumours such as oesophageal cancer, renal cancer, ovarian cancer, and non-small-cell lung cancer [30–33]. Another main finding of this study is that, when compared with the UECT radiomic features, an appropriate subset of features from the CECT or the combination of CECT and UECT can significantly improve the risk classification performance of TETs. Previous studies have pointed out that UECT and CECT may reflect differences in tumour biology. 
Radiomic features from UECT images may be associated with the spatial heterogeneity of histopathological characteristics in tumours such as cellular density, focal haemorrhage, and necrosis; whereas radiomic features from CECT images may indicate the heterogeneity of the tumour blood supply and the contrast distribution between intra-, extravascular, and extracellular space [34–36]. Therefore, radiomic features from CECT have been linked to the tumour microvascular architecture such as microvascular density and permeability. Our results appear to suggest that features from CECT may be more valuable than the ones from UECT, and a combination of features on UECT and CECT demonstrated the best performance of risk classification for TETs. Similar to our results, a recent study also showed that radiomics features from UECT and CECT could be used as noninvasive biomarkers for the differentiation of high-risk TETs and low-risk TETs [37]. However, classification performance was compared only between UECT and CECT, and no significant difference was shown in their study, although the performance of CECT demonstrated slightly better than that of UECT. In recent years, the application of machine-learning-based radiomic analysis to medical imaging has drawn increased attention. The radiomic study is usually accompanied by the extraction of a large number of imaging features, among which some are redundant and unstable. Therefore, a key challenge for machine-learning-based radiomic analysis is the extraction of stable and significant features for further assessment. In the present study, an EPS filter and a ComBat harmonisation method were first used to remove the possible effects caused by scan parameters and different scanners. The repeatability and stability of features were further assessed to ensure the reliability of selected features. Finally, LASSO regression was employed for feature selection to eliminate the redundant features as much as possible. 
In previous studies, LASSO regression has proved to be a powerful feature selection method that can find important features and filter out the unimportant or unnecessary ones to achieve robust classification performance [38,39]. When using machine-learning algorithms, one major problem is the risk of overfitting. In our study, a nested LOOCV method was used to reduce the bias in performance estimation and to minimise the risk of overfitting. In machine-learning analysis, choosing a robust classifier is also crucial to the stability and classification performance of the radiomic model. Some studies have shown that different

5. Conclusion

In conclusion, our study results suggest that machine-learning analysis based on CT radiomic features can be applied as a prediction method for risk classification of TETs. All classifiers presented a high diagnostic performance using a combination of features on UECT and CECT. As machine-learning research in radiology is still evolving, further work with larger sample sizes will be useful to further validate the performance of the classifier and to make it more reliable in clinical practice.

Funding

This work was supported by the Grant of Science and Technology Commission of Fujian Province (Grant number: 2019J01435).

Declaration of Competing Interest

We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work, there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.

References

[1] R.F. Riedel, W.R. Burfeind, Thymoma: benign appearance, malignant potential, Oncologist 11 (8) (2006) 887–894.
[2] F.C. Detterbeck, A. Zeeshan, Thymoma: current diagnosis and treatment, Chin. Med. J. 126 (11) (2013) 2186–2191.


[3] M. Tsuyuguchi, S. Kimura, M. Sumitomo, WHO histologic classification is a prognostic indicator in thymoma, Ann. Thorac. Surg. 77 (4) (2004) 1183–1188.
[4] G. Chen, A. Marx, H.C. Wen, J. Yong, B. Puppe, P. Stroebel, H.K. Mueller Hermelink, New WHO histologic classification predicts prognosis of thymic epithelial tumors: a clinicopathologic study of 200 thymoma cases from China, Cancer 95 (2) (2002) 420–429.
[5] M.D. Juan Rosai, L.H. Sobin, Histological Typing of Tumours of the Thymus, (1999).
[6] K. Beom Kyung, C. Byoung Chul, C. Hye Jin, S. Joo Hyuk, P. Moo Suk, C. Joon, K. Se Kyu, K. Dae Joon, C. Kyung Young, L.C. Geol, A single institutional experience of surgically resected thymic epithelial tumors over 10 years: clinical outcomes and clinicopathologic features, Oncol. Rep. 19 (6) (2008) 1525–1531.
[7] Y.J. Jeong, K.S. Lee, J. Kim, Y.M. Shim, J. Han, O.J. Kwon, Does CT of thymic epithelial tumors enable us to differentiate histologic subtypes and predict prognosis? AJR Am. J. Roentgenol. 183 (2) (2004) 283.
[8] N. Tomiyama, T. Johkoh, N. Mihara, O. Honda, T. Kozuka, M. Koyama, S. Hamada, M. Okumura, M. Ohta, T. Eimoto, M. Miyagawa, N.L. Muller, J. Ikezoe, H. Nakamura, Using the World Health Organization Classification of thymic epithelial neoplasms to describe CT findings, AJR Am. J. Roentgenol. 179 (4) (2002) 881–886.
[9] J. Sadohara, K. Fujimoto, N.L. Muller, S. Kato, S. Takamori, K. Ohkuma, H. Terasaki, N. Hayabuchi, Thymic epithelial tumors: comparison of CT and MR imaging findings of low-risk thymomas, high-risk thymomas, and thymic carcinomas, Eur. J. Radiol. 60 (1) (2006) 70–79.
[10] B. Ganeshan, K.A. Miles, Quantifying tumour heterogeneity with CT, Cancer Imaging 13 (2013) 140–149.
[11] J.P. O’Connor, E.O. Aboagye, J.E. Adams, H.J. Aerts, S.F. Barrington, A.J. Beer, R. Boellaard, S.E. Bohndiek, M. Brady, G. Brown, D.L. Buckley, T.L. Chenevert, L.P. Clarke, S. Collette, G.J. Cook, N.M. deSouza, J.C. Dickson, C. Dive, J.L. Evelhoch, C. Faivre-Finn, F.A. Gallagher, F.J. Gilbert, R.J. Gillies, V. Goh, J.R. Griffiths, A.M. Groves, S. Halligan, A.L. Harris, D.J. Hawkes, O.S. Hoekstra, E.P. Huang, B.F. Hutton, E.F. Jackson, G.C. Jayson, A. Jones, D.M. Koh, D. Lacombe, P. Lambin, N. Lassau, M.O. Leach, T.Y. Lee, E.L. Leen, J.S. Lewis, Y. Liu, M.F. Lythgoe, P. Manoharan, R.J. Maxwell, K.A. Miles, B. Morgan, S. Morris, T. Ng, A.R. Padhani, G.J. Parker, M. Partridge, A.P. Pathak, A.C. Peet, S. Punwani, A.R. Reynolds, S.P. Robinson, L.K. Shankar, R.A. Sharma, D. Soloviev, S. Stroobants, D.C. Sullivan, S.A. Taylor, P.S. Tofts, G.M. Tozer, M. van Herk, S. Walker-Samuel, J. Wason, K.J. Williams, P. Workman, T.E. Yankeelov, K.M. Brindle, L.M. McShane, A. Jackson, J.C. Waterton, Imaging biomarker roadmap for cancer studies, Nat. Rev. Clin. Oncol. 14 (3) (2017) 169–186.
[12] P. Lambin, E. Rios-Velazquez, R. Leijenaar, S. Carvalho, R.G. van Stiphout, P. Granton, C.M. Zegers, R. Gillies, R. Boellard, A. Dekker, H.J. Aerts, Radiomics: extracting more information from medical images using advanced feature analysis, Eur. J. Cancer 48 (4) (2012) 441–446.
[13] E.J. Limkin, R. Sun, L. Dercle, E.I. Zacharaki, C. Robert, S. Reuze, A. Schernberg, N. Paragios, E. Deutsch, C. Ferte, Promises and challenges for the implementation of computational medical imaging (radiomics) in oncology, Ann. Oncol. 28 (6) (2017) 1191–1206.
[14] R.J. Gillies, P.E. Kinahan, H. Hricak, Radiomics: Images Are More than Pictures, They Are Data, Radiology 278 (2) (2016) 563–577.
[15] K. Yasaka, H. Akai, M. Nojima, A. Shinozaki-Ushiku, M. Fukayama, J. Nakajima, K. Ohtomo, S. Kiryu, Quantitative computed tomography texture analysis for estimating histological subtypes of thymic epithelial tumors, Eur. J. Radiol. 92 (2017) 84–92.
[16] H.S. Lee, J.S. Oh, Y.S. Park, S.J. Jang, I.S. Choi, J.S. Ryu, Differentiating the grades of thymic epithelial tumor malignancy using textural features of intratumoral heterogeneity via (18)F-FDG PET/CT, Ann. Nucl. Med. 30 (4) (2016) 309–319.
[17] B. Zhang, X. He, F. Ouyang, D. Gu, Y. Dong, L. Zhang, X. Mo, W. Huang, J. Tian, S. Zhang, Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma, Cancer Lett. 403 (2017) 21–27.
[18] C. Parmar, P. Grossmann, J. Bussink, P. Lambin, H. Aerts, Machine learning methods for quantitative radiomic biomarkers, Sci. Rep. 5 (2015) 13087.
[19] S.E. Viswanath, P.V. Chirra, M.C. Yim, N.M. Rofsky, A.S. Purysko, M.A. Rosen, B.N. Bloch, A. Madabhushi, Comparing radiomic classifiers and classifier ensembles for detection of peripheral zone prostate tumors on T2-weighted MRI: a multi-site study, BMC Med. Imaging 19 (1) (2019) 22.
[20] C. Parmar, P. Grossmann, D. Rietveld, M.M. Rietbergen, P. Lambin, H.J.W.L. Aerts, Radiomic machine-learning classifiers for prognostic biomarkers of head and neck cancer, Front. Oncol. 5 (4) (2015).
[21] C. Parmar, P. Grossmann, J. Bussink, P. Lambin, H.J.W.L. Aerts, Machine learning methods for quantitative radiomic biomarkers, Sci. Rep. 5 (2015) 13087.
[22] L. Zhang, D.V. Fried, X.J. Fave, L.A. Hunter, J. Yang, L.E. Court, IBEX: an open infrastructure software platform to facilitate collaborative work in radiomics, Med. Phys. 42 (3) (2015) 1341–1353.
[23] J. Yang, L. Zhang, X.J. Fave, D.V. Fried, F.C. Stingo, C.S. Ng, L.E. Court, Uncertainty analysis of quantitative imaging features extracted from contrast-enhanced CT in lung tumors, Computerized Medical Imaging & Graphics the Official Journal of the Computerized Medical Imaging Society 48 (January 11) (2016) 1–8.
[24] C.A. Owens, C.B. Peterson, C. Tang, E.J. Koay, W. Yu, D.S. Mackin, J. Li, M.R. Salehpour, D.T. Fuentes, L.E. Court, J. Yang, Lung tumor segmentation methods: impact on the uncertainty of radiomics features for non-small cell lung cancer, PLoS One 13 (10) (2018) e0205003.
[25] X. Fave, M. Cook, A. Frederick, L. Zhang, J. Yang, D. Fried, F. Stingo, L. Court, Preliminary investigation into sources of uncertainty in quantitative imaging features, Comput. Med. Imaging Graph. 44 (2015) 54–61.
[26] L. He, Y. Huang, Z. Ma, C. Liang, C. Liang, Z. Liu, Effects of contrast-enhancement, reconstruction slice thickness and convolution kernel on the diagnostic performance of radiomics signature in solitary pulmonary nodule, Sci. Rep. 6 (2016) 34921.
[27] L. Lu, R.C. Ehmke, L.H. Schwartz, B. Zhao, Assessing agreement between radiomic features computed for multiple CT imaging settings, PLoS One 11 (12) (2016) e0166550.
[28] F. Orlhac, S. Boughdad, C. Philippe, H. Stalla-Bourdillon, C. Nioche, L. Champion, M. Soussan, F. Frouin, V. Frouin, I. Buvat, A postreconstruction harmonization method for multicenter radiomic studies in PET, J. Nucl. Med. 59 (8) (2018) jnumed.117.199935.
[29] F. Orlhac, F. Frouin, C. Nioche, N. Ayache, I. Buvat, Validation of a method to compensate multicenter effects affecting CT radiomics, Radiology 291 (1) (2019) 53–59.
[30] S. Rizzo, F. Botta, S. Raimondi, D. Origgi, V. Buscarino, A. Colarieti, F. Tomao, G. Aletti, V. Zanagnolo, M. Del Grande, N. Colombo, M. Bellomi, Radiomics of high-grade serous ovarian cancer: association between quantitative CT features, residual tumour and disease progression within 12 months, Eur. Radiol. 28 (11) (2018) 4849–4859.
[31] D.V. Fried, O. Mawlawi, L. Zhang, X. Fave, S. Zhou, G. Ibbott, Z. Liao, L.E. Court, Stage III non-small cell lung cancer: prognostic value of FDG PET quantitative imaging features combined with clinical prognostic factors, Radiology 278 (1) (2016) 214–222.
[32] Z. Feng, P. Rong, P. Cao, Q. Zhou, W. Zhu, Z. Yan, Q. Liu, W. Wang, Machine learning-based quantitative texture analysis of CT images of small renal masses: differentiation of angiomyolipoma without visible fat from renal cell carcinoma, Eur. Radiol. 28 (4) (2018) 1625–1633.
[33] S. Liu, H. Zheng, X. Pan, L. Chen, M. Shi, Y. Guan, Y. Ge, J. He, Z. Zhou, Texture analysis of CT imaging for assessment of esophageal squamous cancer aggressiveness, J. Thorac. Dis. 9 (11) (2017) 4724–4732.
[34] Z. Haowei, C.M. Graham, E. Okan, M.E. Griswold, Z. Xu, M.A. Khan, P. Karen, J.J. Caudell, R.D. Hamilton, G. Balaji, Locally advanced squamous cell carcinoma of the head and neck: CT texture and histogram analysis allow independent prediction of overall survival in patients treated with induction chemotherapy, Radiology 269 (3) (2013) 801–809.
[35] F. Ng, B. Ganeshan, R. Kozarski, K.A. Miles, V. Goh, Assessment of primary colorectal cancer heterogeneity by using whole-tumor texture analysis: contrast-enhanced CT texture as a biomarker of 5-year survival, Radiology 266 (1) (2013) 177–184.
[36] H. Li, Y. Zhu, E.S. Burnside, K. Drukker, K.A. Hoadley, C. Fan, S.D. Conzen, G.J. Whitman, E.J. Sutton, J.M. Net, M. Ganott, E. Huang, E.A. Morris, C.M. Perou, Y. Ji, M.L. Giger, MR imaging radiomics signatures for predicting the risk of breast cancer recurrence as given by research versions of MammaPrint, oncotype DX, and PAM50 gene assays, Radiology 281 (2) (2016) 382–391.
[37] X. Wang, W. Sun, H. Liang, X. Mao, Z. Lu, Radiomics signatures of computed tomography imaging for predicting risk categorization and clinical stage of thymomas, Biomed Res. Int. (2019) 3616852.
[38] P. Yin, N. Mao, C. Zhao, J. Wu, C. Sun, L. Chen, N. Hong, Comparison of radiomics machine-learning classifiers and feature selection for differentiation of sacral chordoma and sacral giant cell tumour based on 3D computed tomography features, Eur. Radiol. 29 (4) (2018) 1841–1847.
[39] S. Wu, J. Zheng, Y. Li, H. Yu, S. Shi, W. Xie, H. Liu, Y. Su, J. Huang, T. Lin, A radiomics nomogram for the preoperative prediction of lymph node metastasis in bladder cancer, Clin. Cancer Res. 23 (22) (2017) 6904–6911.
[40] C. Parmar, P. Grossmann, D. Rietveld, M.M. Rietbergen, P. Lambin, H.J. Aerts, Radiomic machine-learning classifiers for prognostic biomarkers of head and neck cancer, Front. Oncol. 5 (2015) 272.
[41] W. Wu, C. Parmar, P. Grossmann, J. Quackenbush, P. Lambin, J. Bussink, R. Mak, H.J. Aerts, Exploratory study to identify radiomics classifiers for lung cancer histology, Front. Oncol. 6 (2016) 71.
[42] X. Meng, W. Xia, P. Xie, R. Zhang, W. Li, M. Wang, F. Xiong, Y. Liu, X. Fan, Y. Xie, X. Wan, K. Zhu, H. Shan, L. Wang, X. Gao, Preoperative radiomic signature based on multiparametric magnetic resonance imaging for noninvasive evaluation of biological characteristics in rectal cancer, Eur. Radiol. (2018).
[43] G. Xiao, W.-C. Rong, Y.-C. Hu, Z.-Q. Shi, Y. Yang, J.-L. Ren, G.-B. Cui, MRI radiomics analysis for predicting the pathologic classification and TNM staging of thymic epithelial tumors: a pilot study, Am. J. Roentgenol. (2019) 1–13.