International Congress Series 1281 (2005) 1109 – 1114
www.ics-elsevier.com
False positive reduction for lung nodule CAD using support vector machines and genetic algorithms

Luyin Zhao*, Lilla Boroczky, K.P. Lee

Philips Research USA, Briarcliff Manor, NY, USA
Abstract. In this paper, we propose a machine learning approach to reduce false positive lung nodules identified in multi-slice CT scans by CAD algorithms. From a pool of features computed from the thin-slice scans, a genetic algorithm is used to determine an optimal feature subset for training a classifier that eliminates as many of the false positives as possible while retaining the true nodules. We use support vector machines as the classifier because of their superior performance. The experiment was conducted on a database of 66 true nodules and 123 false ones. From 15 features calculated for each nodule, our approach selected 9 as the optimal feature subset size, and the classifier trained with the selected 9 features achieved 98.5% sensitivity and 82.9% specificity using leave-one-out cross validation. © 2005 Published by Elsevier B.V.

Keywords: Computer-aided detection; Computer-aided diagnosis; Lung nodule; CT; Multi-slice CT; False positive reduction; Feature extraction; Feature selection; Classification; Machine learning; Support vector machine; Genetic algorithm
1. Introduction

Lung cancer is one of the most common fatal diseases, with a 5-year survival rate of only 10–15% [1]. Early detection and diagnosis of suspicious lesions is the most effective way to improve the survival rate. With the growing acceptance of multi-slice computed tomography (MSCT) equipment, the radiologist has a much improved view of the thorax and can potentially diagnose lung cancer at an earlier stage than before. Unfortunately, the advent of these high-resolution MSCT scans also means that the radiologist has to deal with hundreds of images for even a single study. The potential for missing important information is high and necessitates computerized decision support tools.

* Corresponding author. E-mail address: [email protected] (L. Zhao).
0531-5131/ © 2005 Published by Elsevier B.V. doi:10.1016/j.ics.2005.03.061
L. Zhao et al. / International Congress Series 1281 (2005) 1109–1114
Fig. 1. (a) True nodule; (c) false positive nodule; (b and d) corresponding segmentation masks.
Computer-aided detection (CAD) algorithms have been developed to identify suspicious or candidate lesions from MSCT scans [2–4]. Typically, features are computed by image processing algorithms and used to determine whether a region of interest in a scan contains a candidate nodule. While these algorithms have high sensitivity, they also flag many structures (for example, blood vessels) that merely resemble nodules (Fig. 1). Since the radiologist needs to assess each identified structure, it is important to reduce such false positive nodules as much as possible while retaining the true nodules. This is known as the false positive reduction (FPR) problem and is typically addressed by machine learning (classification) techniques. Unlike other classification tasks that aim to reduce the total number of misclassified cases, the objective here is to eliminate as many false positives as possible under the constraint that all true positives be retained.

A number of papers have been published on FPR for lung nodule CAD, e.g., [5–8]. While these solutions typically use a supervised machine learning approach, the question of which set of features to use is rarely addressed. Choosing too many features induces high computation cost as well as the potential for over-fitting, while choosing too few features produces an inadequate classifier. In this paper we propose an FPR system for lung nodule CAD using a genetic algorithm (GA) and support vector machines (SVMs). A GA is applied to determine an optimal feature subset, which is used to train an SVM that classifies true and false nodules.

2. False positive reduction system using SVMs and GAs

The proposed false positive reduction system is depicted in Fig. 2.

Fig. 2. A false positive reduction system using SVMs and GAs. (Diagram blocks: CT scans; CAD: computer-aided detection of lung nodule, segmentation (delineation); VOI and label VOI; FP reduction: feature extraction, candidate feature set, SVM-driven genetic algorithm for feature selection, size and list of optimal feature set, optimal feature set, support vector machines; output: nodule or non-nodule.)

The CAD unit, which precedes the proposed FP reduction system, extracts the lung area, detects the volume of interest (VOI) surrounding a nodule and delineates the nodule
by image processing algorithms [2–4]. The segmentation unit also generates a label VOI image, where nodule, background and lung-wall (or "cut-out") regions are labelled. The machine learning-based FPR unit consists of three major processing steps: image feature extraction, feature subset selection using an SVM-driven GA, and a classifier based on SVMs. We have chosen SVMs [9] as our classifier because of their superior performance on various classification tasks [10].

2.1. Feature extraction

The aim of this step is to determine a pool of features, computed from the images, that can reliably discriminate true nodules from false ones. We extract fifteen 2D and 3D features (Table 1) from the VOI, exploiting the opportunities offered by thin-slice CT scans. The first four features in Table 1 describe the 3D gray level distribution inside the delineated objects. Lung nodules often have higher gray values than parts of vessels misidentified as nodules. Features 5–9 attempt to distinguish true nodules from false ones by characterizing the 3D shape of the delineated objects. True nodules tend to be more spherical than misidentified structures, e.g., parts of vessels, which tend to be more elongated. Features 6–8 are calculated as ratios between different eigenvalues of the inertia matrix computed from the delineated 3D objects. Feature 10 is based on the observation that true nodules and false ones tend to have different 3D contrast between the interior and the surroundings of the delineated objects. As shape-based features are often difficult to calculate for small nodules, we have developed histogram-based features (11–15) that do not require a priori segmentation of the objects. They are calculated as central moments of a cube centered in the volume of interest but excluding the wall pixels attached to the structures.

2.2. Feature subset selection using GA-driven SVMs

One approach to selecting a subset of "good" or most relevant features is through the use of GAs.
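Selection operates on the per-candidate feature vector described in Section 2.1. As a rough illustration of how the simpler entries of that vector might be computed, the sketch below derives the histogram moments (features 11–14) and the contrast (feature 10) from flat lists of gray values. It assumes population moments with standardized skewness and kurtosis, which the text does not specify; the function names and the sample gray values are hypothetical.

```python
import math


def central_moment_features(values):
    """Return (mean, std, skewness, kurtosis) of a flat list of gray values.

    Assumption: population moments (divide by n) and standardized
    skewness/kurtosis; the paper does not state its normalization.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    skewness = m3 / std ** 3 if std > 0 else 0.0
    kurtosis = m4 / m2 ** 2 if m2 > 0 else 0.0
    return mean, std, skewness, kurtosis


def contrast(inside_values, shell_values):
    """Mean gray value inside the structure minus the mean in the surrounding shell."""
    inside_mean = sum(inside_values) / len(inside_values)
    shell_mean = sum(shell_values) / len(shell_values)
    return inside_mean - shell_mean


# Hypothetical gray values (arbitrary units), for illustration only.
inside = [120, 130, 125, 135, 140]
shell = [40, 35, 50, 45, 30]
print(central_moment_features(inside))
print(contrast(inside, shell))
```

In practice the value lists would come from the voxels selected by the label VOI (structure, shell, or cube without lung-wall pixels); that masking step is omitted here.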
Table 1
Feature set used for false positive reduction

Number  Feature                     Definition
1–4     gray_level_max,             maximum, minimum, mean, standard deviation of
        gray_level_min,             gray level inside the delineated structure
        gray_level_mean,
        gray_level_std
5       compactness                 ratio of volume equivalent radius and spatial
                                    deviation equivalent radius
6–8     flat_shape,                 ratios of eigenvalues of the inertia matrix of
        spheric_shape,              the structure
        elongated_shape
9       sphericity                  ratio of the volume of a sphere having same
                                    volume as the structure and the actual volume
                                    of the structure
10      contrast                    difference between the mean gray values inside
                                    and in a shell surrounding the structure
11–14   histogram_mean,             mean, standard deviation, skewness, kurtosis of
        histogram_std,              gray values in a cube centered in the VOI
        histogram_skewness,         excluding lung-wall pixels
        histogram_kurtosis
15      histogram_high_value_count  ratio of number of pixels having high gray value
                                    in the cube centered in the VOI and the volume
                                    of the structure

The essential idea behind using a GA for the subset selection problem is to
create a number of "chromosomes" that consist of multiple "genes", with each gene representing a selected feature [11]. The set of features represented by a chromosome is used to train an SVM on the training data. The fitness of the chromosome is evaluated as how well the resulting SVM performs (in the present case, the measure is how many true nodules are retained by the SVM's classification, rather than overall classification accuracy). At the start of this process, a population of chromosomes is generated by randomly selecting features to form chromosomes. The algorithm then iteratively searches for chromosomes that perform well (high fitness). At each generation, the GA evaluates the fitness of each chromosome in the population and, through two main evolutionary methods, mutation and crossover, creates new chromosomes from the current ones. Genes within good chromosomes are retained for the next generation and those with poor performance are discarded. Eventually, an optimal solution (i.e., a collection of features) is found through this process of survival of the fittest.

Overall we need to determine two variables: (1) how many of the 15 features are appropriate for the data set size; (2) which features those are. The specific GA we use, CHC [12], is able to represent both types of variables in the chromosome and operates on them in the evolutionary process. In other words, the objective of the GA is to find the smallest number of features that performs best in classification. This approach becomes much more important when considering a bigger feature set (such as hundreds of features).

2.3. Validation and testing using SVMs

The purpose of validation is to test in a relatively independent way whether the optimal feature subset is able to differentiate true nodules from false ones while retaining as many true nodules as possible. To do this, we used a standard statistical method for estimating generalization error: cross validation.
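The selection loop of Section 2.2, scored by cross validation, can be sketched roughly as follows. This is a simplified illustration rather than the method used here: it uses a plain generational GA over bit-mask chromosomes instead of CHC [12], a nearest-centroid rule as a stand-in for the SVM, leave-one-out scoring inside the fitness function, and an assumed fitness ordering (sensitivity, then specificity, then fewer features). All function names and the synthetic data are hypothetical.

```python
import random


def predict_nearest_centroid(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): return the label of the closest class mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_sens_spec(X, y, mask):
    """Leave-one-out sensitivity and specificity using the features where mask == 1."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:                      # an empty subset cannot classify anything
        return 0.0, 0.0
    Xs = [[row[i] for i in cols] for row in X]
    tp = tn = 0
    for k in range(len(Xs)):          # hold out case k, train on the rest
        pred = predict_nearest_centroid(Xs[:k] + Xs[k + 1:], y[:k] + y[k + 1:], Xs[k])
        if pred == 1 and y[k] == 1:
            tp += 1
        elif pred == 0 and y[k] == 0:
            tn += 1
    return tp / y.count(1), tn / y.count(0)


def fitness(X, y, mask):
    """Assumed ordering: sensitivity first, then specificity, then fewer features."""
    sens, spec = loo_sens_spec(X, y, mask)
    return (sens, spec, -sum(mask))


def select_features(X, y, pop_size=20, generations=15, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    # Random bit-mask chromosomes, plus the all-features mask as a baseline.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(X, y, m), reverse=True)
        parents = pop[: pop_size // 2]                 # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]               # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(X, y, m))


# Synthetic data: feature 0 separates the classes, features 1-4 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(12)]        # 1 = true nodule, 0 = false positive
X = [[10 * label + data_rng.uniform(-0.5, 0.5)] + [data_rng.uniform(-1, 1) for _ in range(4)]
     for label in y]
best = select_features(X, y)
print(best, loo_sens_spec(X, y, best))
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. Replacing predict_nearest_centroid with a trained SVM, and the leave-one-out score with a fixed training/testing split, would bring the sketch closer to the system described in this paper.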
Specifically, we used two forms of cross validation:

(1) Leave-one-out method, which runs n (the data set size) times, each time using only one case for testing and all other cases for training.
(2) k-fold cross validation, which divides all cases into k sets of equal size and uses each set in turn for testing and all other sets for training.

The classifier generated by training with all true nodules and large false nodules is used to test on smaller false nodules.

3. Results

The image database used in this study consists of 66 true nodules (volume-equivalent radius > 2 mm) and 123 false nodules (volume-equivalent radius > 4 mm) obtained from thin-slice CT scans (slice thickness = 1 mm). These candidates were identified by a CAD algorithm [4] and read by a panel of four radiologists, who classified them into true and false nodules.

To conduct the feature subset selection process, at each generation the data set is split according to the following pattern: a training set of size 110 is selected by picking randomly from the pool of true and false nodules to determine the performance of a chromosome. The remaining ones form the testing set of size 79. Table 2 describes the data splitting pattern.

Table 2
Data splitting pattern for feature selection by GA

                Training   Testing   Total
True nodules    40         26        66
False nodules   70         53        123

The results of this step are the optimal feature subset size as well as a set of features that has demonstrated the best average performance during the evolution. The optimal feature set obtained consists of nine features: gray_level_mean, gray_level_std, sphericity, compactness, elongated_shape, spheric_shape, flat_shape, contrast and histogram_mean. This optimal feature set is used to train an SVM classifier for FPR.

To validate the optimal feature set, we applied the leave-one-out method, where the training/testing is run 189 times with each nodule left out in turn for testing and all other cases used for training. Leave-one-out validation resulted in sensitivity = 65/66 = 98.5% and specificity = 102/123 = 82.9%. The Az value of the ROC curve is 0.96 (Fig. 3).

Fig. 3. ROC for leave-one-out cross validation. (Axes: sensitivity versus 1 - specificity, each from 0 to 1.)

Then, an 11-fold cross validation was done. This was run 10 times with a subset of 17 nodules (6 true nodules + 11 false nodules) and 1 time with a subset of 19 nodules (6 true nodules + 13 false nodules) left out in turn for testing and all other cases used for training. The 11-fold cross validation gave 93.2% sensitivity and 86.7% specificity.

We applied the classifier generated by all the true nodules and large false nodules (volume-equivalent radius > 4 mm) to classify smaller false nodules. For 3–4-mm nodules, a classification accuracy of 47.6% was obtained.

4. Conclusion

In this study, we proposed a machine learning approach using a GA and SVMs to reduce false positive lung nodules identified by CAD algorithms. From the candidate nodules, 2D and 3D features were computed and used to discriminate true nodules from false ones.
An optimal set of features was determined by an evolutionary computing process. The performance of this feature subset was verified by different validation methods, demonstrating the high discriminatory power of the chosen set in terms of retaining the majority of true nodules and eliminating the majority of false nodules. Compared with other studies reported in the literature [5–8], our results show similar or better performance in retaining true nodules and removing false nodules, although we are reluctant to draw any strong conclusions from such comparisons because of the different data sets and validation methods used. Further improvement in the performance of the proposed FPR system is expected from new features that improve the specificity of the algorithm for small structures.

References

[1] B.B. Tan, et al., The solitary pulmonary nodule, Chest 123 (2003) 89S–95S.
[2] R. Wiemker, et al., Computer aided lung nodule detection on high resolution CT data, SPIE Medical Imaging, 2002, pp. 677–688.
[3] S.G. Armato III, et al., Computerized detection of pulmonary nodules on CT scans, RadioGraphics 19 (1999) 1303–1311.
[4] L. Fan, et al., Automatic detection of lung nodules from multi-slice low-dose CT images, SPIE Medical Imaging, 2001, pp. 1828–1835.
[5] Y.C. Wu, et al., Reduction of false positives in computerized detection of lung nodules in chest radiographs using artificial neural networks, discriminant analysis, and a rule-based scheme, Journal of Digital Imaging 7 (4) (1994) 196–207.
[6] W.A.H. Mousa, M.A.U. Khan, Lung nodule classification utilizing support vector machines, International Conference on Image Processing, 2002, pp. 153–156.
[7] H. Yoshida, Local contralateral subtraction based on bilateral symmetry of lung for reduction of false positives in computerized detection of pulmonary nodules, IEEE Transactions on Biomedical Engineering 51 (5) (2004) 778–789.
[8] K. Suzuki, et al., Massive training artificial neural network (MTANN) for reduction of false positives in computerized detection of lung nodules in low-dose computed tomography, Med. Phys. 30 (7) (2003) 1602–1617.
[9] B.E. Boser, I. Guyon, V. Vapnik, A training algorithm for optimal margin classifiers, in: D. Haussler (Ed.), 5th Annual ACM Workshop on Computational Learning Theory, ACM Press, Pittsburgh, PA, 1992, pp. 144–152.
[10] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
[11] K.E. Mathias, et al., Code compaction using genetic algorithms, Genetic and Evolutionary Computation Conference, Las Vegas, NV, 2000, pp. 710–717.
[12] L.J. Eshelman, The CHC adaptive search algorithm: how to have safe search when engaging in nontraditional genetic recombination, in: G.J.E. Rawlins (Ed.), Foundations of Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, 1991, pp. 265–283.