Pulmonary CT image classification with evolutionary programming


Mark T. Madsen, PhD, Renuka Uppaluri, PhD, Eric A. Hoffman, PhD, Geoffrey McLennan, MD

Rationale and Objectives. It is often difficult to classify information in medical images from derived features. The purpose of this research was to investigate the use of evolutionary programming as a tool for selecting important features and generating algorithms to classify computed tomographic (CT) images of the lung.

Materials and Methods. Training and test sets consisting of 17 features derived from multiple lung CT images were generated, along with an indicator of the target area from which features originated. The features included five parameters based on histogram analysis, 11 parameters based on run length and co-occurrence matrix measures, and the fractal dimension. Two classification experiments were performed. In the first, the classification task was to distinguish between the subtle but known differences between anterior and posterior portions of transverse lung CT sections. The second classification task was to distinguish normal lung CT images from emphysematous images. The performance of the evolutionary programming approach was compared with that of three statistical classifiers that used the same training and test sets.

Results. Evolutionary programming produced solutions that compared favorably with those of the statistical classifiers. In separating the anterior from the posterior lung sections, the evolutionary programming results were better than two of the three statistical approaches. The evolutionary programming approach correctly identified all the normal and abnormal lung images and accomplished this by using fewer features than the best statistical method.

Conclusion. The results of this study demonstrate the utility of evolutionary programming as a tool for developing classification algorithms.

Key Words. Computers, diagnostic aid; images, analysis; lung, CT.

Emphysema is a debilitating and sometimes fatal disease characterized by the abnormal enlargement of the pulmonary air space and destruction of the alveolar walls. In some individuals it has a genetic origin, but in others it is strongly associated with smoking. The early diagnosis of emphysema is extremely important, as the cessation of smoking can limit the progression of the disease. Normal clinical findings, chest radiographs, and pulmonary function test results, however, have limited sensitivity for detecting early to moderate emphysema. In recent years, the introduction of spiral computed tomography (CT) has permitted a more precise view of the tissue density of the thorax in a reasonable scanning time (10 seconds). While CT is much more sensitive to changes in lung parenchyma than chest radiography, there is a fair amount of interobserver variability in the interpretation of the CT images of the lung (1,2). To reduce this variability, quantitative tissue classification schemes have been applied to these images (3-7). Tissue classification is based on the premise that there are distinct features within the images that can be used as visual cues in identifying specific conditions. The relevant features for the (computer) identification task are often either unknown or cannot be easily codified as an algorithm. In this situation, a set of features is generated from known distributions by using a variety of statistical and texture measures. Extracting the relevant features from this set is a difficult optimization problem. The parameter space spanned by the feature set is multidimensional and

Acad Radiol 1999; 6:736-741. From the Department of Radiology, University of Iowa, 200 Hawkins Dr, Iowa City, IA 52242. Received October 26, 1998; revision requested March 36, 1999; revision received June 16; accepted June 22. Address reprint requests to M.T.M. © AUR, 1999


Figure 1. The evolutionary programming classification method was compared with statistical classifiers in two different experiments. (a) In the first experiment, the classifiers sought to distinguish between the anterior (A) and the posterior (P) portions of the lung. Normal lung CT images were divided into thirds, and features were extracted from the anterior and posterior sections. (b) In the second experiment, the classifiers sought to distinguish between normal and emphysematous lungs based on the features extracted from the CT images.

nonorthogonal and has a large number of local minima. Traditional methods of feature extraction have used statistical techniques to identify the strongly correlated features and reduce the feature set. While such methods yield results, they are not necessarily optimal. The purpose of this study was to evaluate evolutionary programming for selecting relevant features and building a useful tissue classifier. Evolutionary programming is a general-purpose tool that uses a genetic algorithm to sample large parameter spaces efficiently and to yield an executable program as a final result (8,9). Algorithmic solutions to the problem at hand are encoded as a list of variables referred to as chromosomes. Initially a set of chromosomes is randomly generated. The fitness of the chromosomes is evaluated with a training set presented as an array of observed input parameters (features) and a single value result (truth) that indicates the condition associated with the parameters. The fittest chromosomes are allowed to produce offspring through crossover operations in which portions of two chromosomes are interchanged. Offspring from this operation that show an improved fitness gradually replace less fit members of previous generations. Occasionally, chromosomes are randomly selected for mutation operations in which at least one portion of the chromosomes is changed. These mutations survive if the chromosome fitness is improved. After many generations, a solution evolves analogous to the evolution of an organism operating under natural selection. Although there is no guarantee that the final solution is optimal, evolutionary

programming often provides a superior solution. Many generations (>100,000) are usually required to reach a solution. This is time consuming, and the evolutionary programming routine can run for days. The result produced by evolutionary programming, however, is an executable algorithm that "solves" the optimization problem in the training set. This solution is available to be applied to the problem under consideration.
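The evolutionary loop described above (random initial chromosomes, fitness evaluated on a training set, crossover, occasional mutation, and replacement of less fit members) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used in the study: the function `evolve`, the toy training set, and the `error_rate` fitness are hypothetical, the chromosome here is a plain list of numbers rather than an executable program, and only a single ecosystem is evolved. The default population size, crossover probability, mutation rate, and stopping threshold are taken from the parameter values reported in Figure 3.

```python
import random


def evolve(fitness, n_genes, pop_size=20, p_crossover=0.9, p_mutation=0.03,
           stop_below=0.001, max_generations=2000, seed=0):
    """Evolve a list of numbers that minimizes `fitness` (lower is better)."""
    rng = random.Random(seed)
    # Start from randomly generated chromosomes.
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]

    def tournament():
        # The fitter of two randomly drawn members becomes a parent.
        a, b = rng.sample(population, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(max_generations):
        if min(fitness(c) for c in population) < stop_below:
            break
        mother, father = tournament(), tournament()
        child = list(mother)
        if n_genes > 1 and rng.random() < p_crossover:
            cut = rng.randrange(1, n_genes)  # single-point crossover
            child = mother[:cut] + father[cut:]
        # Each gene is perturbed with a small probability (mutation).
        child = [g + rng.gauss(0.0, 0.5) if rng.random() < p_mutation else g
                 for g in child]
        # An offspring replaces the least fit member only if it is fitter.
        worst = max(range(pop_size), key=lambda i: fitness(population[i]))
        if fitness(child) < fitness(population[worst]):
            population[worst] = child
    return min(population, key=fitness)


# Hypothetical training set: one feature value and a Boolean truth per sample.
TRAINING = [(0.1, False), (0.2, False), (0.3, False),
            (0.7, True), (0.8, True), (0.9, True)]


def error_rate(chromosome):
    """Fraction of training samples misclassified by the test a*x + b > 0."""
    a, b = chromosome
    wrong = sum((a * x + b > 0) != truth for x, truth in TRAINING)
    return wrong / len(TRAINING)


best = evolve(error_rate, n_genes=2)
print(best, error_rate(best))
```

Because an offspring only ever replaces the least fit member, the best fitness in the population cannot get worse from one generation to the next, which mirrors the "improved fitness gradually replaces less fit members" behavior described in the text.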

Nine nonsmoking volunteers with no history of pulmonary disease were used to provide the normal lung images, and 10 patients with severe emphysema were used to obtain the abnormal lung images. The diagnosis of emphysema in the 10 patients was based on clinical symptoms and results of pulmonary function tests. CT images of the thorax were acquired with an electron-beam CT scanner (Fastrac C-150-XL; Imatron, South San Francisco, Calif) with 3-mm collimation. The field of view was 300-400 mm, and the reconstruction matrix was 512 × 512 with 11-bit gray-level resolution. All the subjects were scanned during breath holding at maximum inspiration. Four sections from each subject were used in the experiments described below (6). With these data, two classification problems were considered. The first was that of distinguishing between the anterior and the posterior thirds of the normal lung (Fig 1a). The second classifier was developed to distinguish between normal and emphysematous lungs (Fig 1b).


Figure 2. Image feature set.

First-order measures: mean gray level, gray-level variance, gray-level skewness, gray-level kurtosis, gray-level entropy.

Second-order measures:
Co-occurrence matrix measures: entropy, angular second moment, inertia, contrast, correlation, inverse difference moment.
Run length statistics: short run emphasis, long run emphasis, gray-level nonuniformity, run length nonuniformity, run percentage.

Fractal dimension: fractal dimension.

Figure 3. Evolutionary programming parameters.
5 ecosystems (5 independent evolving environments)
Mutation rate = 0.03
Stop when fitness < 0.001
Tournament fitness evaluation

The CT images were preprocessed to remove nonlung structures, and a region-growing technique was used to produce relatively homogeneous regions. The texture of the CT images was evaluated with a combination of statistical and fractal features derived from both the original CT images and the processed homogeneous regions. Figure 2 lists the 17 features that were generated in this study. The first-order features were obtained from the gray-level histogram. The second-order features were derived from run length and co-occurrence matrices. The run length matrix consists of elements that specify the number of runs in an image that fall in a certain range of gray level at a specified length. The co-occurrence matrix describes the overall spatial relationship between gray levels in the image. Complete descriptions of the features may be found elsewhere (6,7). The CT images from which the features were extracted were categorized as normal or emphysematous based on the scan interpretation. This information was used to generate the training and test sets for two types of lung image evaluation. The first experiment sought to derive a classifier to distinguish the anterior third from the posterior third of normal lung (Fig 1a). This test was devised to test the detectability of subtle but known differences in these two lung regions. For the second experiment, a classifier was sought to distinguish between normal and emphysematous lung with the entire section used as the region from which the features were extracted (Fig 1b).
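To make the first-order and co-occurrence measures more concrete, the following Python sketch computes a few of them for a tiny image. It is a minimal illustration using common textbook definitions, which are an assumption here: the exact definitions, pixel offsets, gray-level binning, and normalization used in the study are given in references 6 and 7 and may differ. The function names and the 4 × 4 example image are hypothetical.

```python
import math
from collections import Counter


def first_order_features(pixels):
    """Histogram-based statistics of a flat list of gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var)
    # Standardized third and fourth moments (0.0 for a constant region).
    skew = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3) if std > 0 else 0.0
    kurt = sum((p - mean) ** 4 for p in pixels) / (n * var ** 2) if var > 0 else 0.0
    probs = [count / n for count in Counter(pixels).values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "entropy": entropy}


def cooccurrence(image, dx=1, dy=0):
    """Probability of each gray-level pair (i, j) at pixel offset (dx, dy)."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    return {pair: k / total for pair, k in counts.items()}


def angular_second_moment(p):
    """Sum of squared pair probabilities; larger for more uniform texture."""
    return sum(v * v for v in p.values())


def inertia(p):
    """Pair probabilities weighted by squared gray-level difference."""
    return sum((i - j) ** 2 * v for (i, j), v in p.items())


# Hypothetical 4 x 4 image with four gray levels.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
flat = [p for row in image for p in row]
print(first_order_features(flat))
pairs = cooccurrence(image)
print(angular_second_moment(pairs), inertia(pairs))
```

Each sample in a training or test set would then be a vector of such values computed for one region, together with its truth label.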


Figure 3 (continued). Ecosystem size = 20; probability of crossover = 0.9; maximum no. of generations = 10 million.

Statistical methods were first used to find classifiers for the two cases described above. The divergence measure, along with correlation analysis, was used to eliminate the redundant and highly dependent features. When a useful feature vector was found, the training set was used to generate a Bayesian classifier. This method is referred to as the adaptive multiple feature method (AMFM); details can be found elsewhere (6,7). Two other commonly used lung classifiers were also evaluated: the mean lung density (MLD) (10) and the lowest fifth percentile of the histogram (11,12). These classifiers were applied to the test data sets, and their performance was evaluated from the resulting sensitivity, specificity, and accuracy. The same training and test sets were used in generating a classifier with the evolutionary programming approach. There were 30 samples in the training set from 15 normal and 15 emphysematous lung CT sections. Each sample in the training set had the calculated values of the 17 features listed in Figure 2, along with a Boolean label indicating the true identity of the sample. The training set was presented to a commercially available evolutionary programming software package ("e"; Systems Dynamic International, Fenton, Mo). Since the result of the evolutionary program is an algorithm optimized for the training set, this result is the classifier. The control parameters used for the evolutionary programming application are listed in Figure 3. These include the number of independent

Table 1. Identification of the Anterior and Posterior Regions of Lung

Correct Classification (%)

Method                                  Posterior      Anterior       Accuracy
Test set data
  Evolutionary programming              85.7           80.6           83.0
  AMFM                                  82.1           96.8           89.8
  Evolutionary programming and AMFM     82.1           96.8           89.8
  MLD                                   96.4           54.8           74.6
  Histogram*                            82.1           48.4           64.4
Training data
  AMFM                                  100 (27/27)    84.4 (27/32)   90.6
  Evolutionary programming and AMFM     100 (27/27)    91.5 (29/32)   94.9

Note. Numbers in parentheses are correct calls/total images.
*Lowest fifth percentile of the histogram.

evolving environments referred to as ecosystems and the number of starting members in each ecosystem. The probability of mutation, mating (chromosome exchange through crossover operations), and the type of fitness selection are also specified along with the stopping criteria. The resulting evolutionary programming classifier was applied to the test data sets for the anterior-posterior and normal-versus-emphysema problems, and its performance was evaluated from the calculated sensitivity, specificity, and accuracy. An additional training set was generated for the anterior-posterior identification problem in which the results from the AMFM classifier were included. The AMFM classifier was included simply by putting its classification result as an additional column in the feature list for each of the 30 samples in the training set. An evolutionary programming solution for this new training set was made and evaluated on the test set. It should be noted that the training and test sets were not totally independent, since (different) sections from the same individual were used in both the training and test sets.
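As a small illustration of how sensitivity, specificity, and accuracy follow from a classifier's calls, the sketch below computes all three from paired truth and prediction labels. It assumes that the positive class is emphysema (or whichever class is labeled `True`); the function name and the example labels are hypothetical and are not data from the study.

```python
def classification_metrics(truth, predicted):
    """Return sensitivity, specificity, and accuracy as percentages.

    Both arguments are equal-length sequences of Booleans, with True
    marking the positive class. Each class must occur at least once in
    `truth`, otherwise a division by zero occurs.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / len(pairs),
    }


# Hypothetical test set: 4 positive and 4 negative samples, one miss in each.
truth = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, True]
print(classification_metrics(truth, predicted))
```

Accuracy here is 100 times the ratio of correct calls to total samples, the same definition used for the tabulated results.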

The results of the classifiers in separating the anterior from the posterior region of normal lung are summarized in Table 1. The overall accuracy of each method was calculated as 100 times the ratio of correct calls to total images. None of the methods was 100% successful in accurately identifying all of the regions. This is not surprising, because the differences between these two regions are subtle. The MLD classifier performed very well in identifying the posterior segment. This is also not surprising, as the gravitational pooling of the blood and reduction of the alveolar dimensions alter the density enough to set a reliable density threshold for this region. A simple density threshold, however, is not sufficient to classify the anterior segment. The evolutionary programming classifier overall performed better than the MLD and the histogram methods and nearly as well as the AMFM method. When the evolutionary programming routine was run again with the results of the AMFM classifier included as part of the training set, the new evolutionary programming classifier produced the same results as the AMFM classifier (accuracy, 89.8%). The performance of the new evolutionary programming classifier on the training set, however, was significantly better than that achieved by the AMFM classifier. Although the improved performance with the training data did not carry over to the test set, this result illustrates another important feature of the evolutionary programming approach: the ability to incorporate other approaches.

The results for separating the normal lung from the emphysematous lung are given in Table 2. The MLD and histogram classifiers had difficulty identifying several of the sections, while both the evolutionary programming and the AMFM methods developed classifiers that were 100% accurate for the training and test sets. The AMFM classifier used three of the features, whereas the evolutionary programming classifier required only two. The latter was very simple:

IF (5.65/GLN - MLD) < 0 THEN
    Normal Lung
ELSE
    Emphysema
END.


The features used in this solution were the gray-level nonuniformity (GLN) and the MLD. The evolutionary programming method arrived at this solution after 18,000 generations, which took about 20 minutes on a computer with a 75-MHz Pentium central processing unit.
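The evolved rule, which labels a section as normal lung when 5.65/GLN - MLD is negative and as emphysema otherwise, translates directly into a short function. The sketch below is only a transcription of that rule: the function name and the example inputs are hypothetical, and the units and scaling of GLN and MLD are not stated in the text, so the numbers shown are not realistic feature values.

```python
def classify_section(gln, mld):
    """Apply the evolved two-feature rule to one lung section.

    gln: gray-level nonuniformity; mld: mean lung density. Both must be
    on the same scale as the features used to train the rule, which is
    not specified here (an assumption the caller has to satisfy).
    """
    if gln == 0:
        raise ValueError("GLN must be nonzero for the ratio 5.65/GLN")
    return "normal" if 5.65 / gln - mld < 0 else "emphysema"


# Hypothetical feature values chosen only to exercise both branches.
print(classify_section(gln=1.0, mld=10.0))  # 5.65 - 10.0 is negative
print(classify_section(gln=1.0, mld=1.0))   # 5.65 - 1.0 is positive
```

The whole classifier is a single comparison on two features, which is what makes the evolved solution easy to inspect and to apply to new sections.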

The development of useful image classifiers has many important applications in medicine. Finding useful classifiers can be very challenging, however, because no general analytic solutions exist for selecting and combining relevant features, and the search space is too large to allow exhaustive sampling of every possible combination of features. This is precisely the type of problem that is well suited for evolutionary programming. Although there is no general solution for finding the optimal result in large parameter spaces, genetic algorithms have demonstrated that they can efficiently discover very good, if not optimal, solutions (8,9). Genetic algorithms have been previously used in image classification problems to obtain a relevant feature vector (13-16). A genetic algorithm, however, yields a parametric solution, while the end result of evolutionary programming is an algorithm that can be compiled and executed. In the study presented herein, the evolutionary programming classifiers performed comparably to classifiers that resorted to sophisticated statistical manipulations. Since evolutionary programming does not require any problem-specific modeling for its solution search, it is very easy to implement. The evolutionary programming parameters were set according to the values recommended by the software vendor. Although the evolutionary programming parameters can be varied over any desired range, most investigators set the crossover probability relatively high (>0.8) and the mutation rate low (<0.01). The performance of the evolutionary programming classifier depends primarily on two factors: the feature vector and the suitability of the training set. Within the feature vector, there has to be one or more components that are either separately or in some combination a solution to the classification problem. A good solution will not be forthcoming, however, unless the training set spans the range of the classifier. Both the feature vector and the training set are strong determinants in the final solution. In this study, we compared the performance of an evolutionary programming classifier with that of a Bayesian approach that has been previously validated. By using the same feature vector and training set, one can make direct comparisons. In cases where a near-optimal technique


Table 2. Identification of Normal and Emphysematous Lung Sections

Method                      Sensitivity (%)   Specificity (%)   Accuracy (%)
Evolutionary programming    100               100               100
AMFM                        100               100               100
MLD                         95.0              94.4              94.7
Histogram*                  95.0              100               97.4

*Lowest fifth percentile of the histogram.

already exists, evolutionary programming may not offer much of an advantage. For example, evolutionary programming did not improve the performance of the AMFM method on the test data set. The potential of the method, however, is illustrated by the improved result obtained on the training set data for the anterior-posterior classification problem when the results of the AMFM classifier were included in the training set. This combination increased the accuracy of classification in the training set by nearly 5%. This is typical of the type of performance enhancement that is often achieved when genetic algorithms are applied to difficult optimization problems (9). Improvements of this magnitude are often important, especially in general screening applications with a large number of samples. In conclusion, evolutionary programming appears to be an important and useful tool for finding algorithmic solutions to complicated problems where no optimal solution is known. Although it can be used in a wide range of applications, the development of image classifiers demonstrates its ability to derive a useful solution from a complex data set.

References

1. Zompatori M, Fasano L, Fabbri M, et al. Assessment of the severity of pulmonary emphysema by computed tomography. Monaldi Arch Chest Dis 1997; 52:147-154.
2. Bergin C, Muller N, Nichols DM, et al. The diagnosis of emphysema: a computed tomographic-pathologic correlation. Am Rev Respir Dis 1986; 133:541-546.
3. Crausman RS, Ferguson G, Irvin CG, Make B, Newell JD Jr. Quantitative chest computed tomography as a means of predicting exercise performance in severe emphysema [published erratum appears in Acad Radiol 1995; 2:870]. Acad Radiol 1995; 2:463-469.
4. Kinsella M, Muller NL, Abboud RT, Morrison NJ, DyBuncio A. Quantitation of emphysema by computed tomography using a "density mask" program and correlation with pulmonary function tests. Chest 1990; 97:315-321.
5. Rodriguez LH, Vargas PF, Raff U, et al. Automated discrimination and quantification of idiopathic pulmonary fibrosis from normal lung parenchyma using generalized fractal dimensions in high-resolution computed tomography images. Acad Radiol 1995; 2:10-18.
6. Uppaluri R, Mitsa T, Sonka M, Hoffman EA, McLennan G. Quantification of pulmonary emphysema from lung computed tomography images. Am J Respir Crit Care Med 1997; 156:248-254.
7. Uppaluri R, Hoffman EA, Sonka M, Hunninghake GW, McLennan G. Interstitial lung disease: a quantitative study using the adaptive multiple feature method. Am J Respir Crit Care Med 1999; 159:519-525.
8. Fogel D. Evolutionary computation: toward a new philosophy of machine intelligence. Piscataway, NJ: Institute of Electrical and Electronics Engineers, 1995.
9. Mitchell M. An introduction to genetic algorithms. Cambridge, Mass: MIT Press, 1996.
10. Goddard P, Nicholson E, Laszlo G, Watt J. Computed tomography in pulmonary emphysema. Clin Radiol 1982; 33:379-387.
11. Gould GA, MacNee W, McLean A, et al. CT measurements of lung density in life can quantitate distal airspace enlargement: an essential defining feature of human emphysema. Am Rev Respir Dis 1988; 137:380-392.
12. Muller NL, Staples CA, Miller RR, Abboud RT. "Density mask": an objective method to quantitate emphysema using computed tomography. Chest 1988; 94:782-787.
13. Fogel DB, Wasson EC III, Boughton EM, Porto VW. Evolving artificial neural networks for screening features from mammograms. Artif Intell Med 1998; 14:317-326.
14. Fogel DB, Wasson EC III, Boughton EM. Evolving neural networks for detecting breast cancer. Cancer Lett 1995; 96:49-53.
15. Sahiner B, Chan HP, Wei D, et al. Image feature selection by a genetic algorithm: application to classification of mass and normal breast tissue. Med Phys 1996; 23:1671-1684.
16. Sahiner B, Chan HP, Petrick N, Helvie MA, Goodsitt MM. Design of a high-sensitivity classifier based on a genetic algorithm: application to computer-aided diagnosis. Phys Med Biol 1998; 43:2853-2871.