Computerized Medical Imaging and Graphics 29 (2005) 419–429 www.elsevier.com/locate/compmedimag
Image segmentation feature selection and pattern classification for mammographic microcalcifications J.C. Fua,*, S.K. Leeb, S.T.C. Wongc, J.Y. Yeha, A.H. Wanga, H.K. Wud a
Automated Measurement and Diagnostic Systems Laboratory, Department of Industrial Engineering and Technology Management, Da-Yeh University, 112 Shan-Jeau Rd, Da-Tsuen 515, Chang-Hwa, Taiwan, ROC b Su-Ao Veterans General Hospital, Taiwan, ROC c Department of Radiology, Brigham & Women’s Hospital, Harvard Medical School, USA d Department of Radiology, Chang-Hua Christian Hospital, Taiwan, ROC Received 16 April 2004; revised 26 January 2005; accepted 17 March 2005
Since microcalcifications in X-ray mammograms are the primary indicator of breast cancer, detection of microcalcifications is central to the development of an effective diagnostic system. This paper proposes a two-stage detection procedure. In the first stage, a data driven, closed form mathematical model is used to calculate the location and shape of suspected microcalcifications. When tested on the Nijmegen University Hospital (Netherlands) database, data analysis shows that the proposed model can effectively detect the occurrence of microcalcifications. The proposed mathematical model not only eliminates the need for system training, but also provides information on the borders of suspected microcalcifications for further feature extraction. In the second stage, 61 features are extracted for each suspected microcalcification, representing texture, the spatial domain and the spectral domain. From these features, a sequential forward search (SFS) algorithm selects the classification input vector, which consists of features sensitive only to microcalcifications. Two types of classifiers—a general regression neural network (GRNN) and a support vector machine (SVM)—are applied, and their classification performance is compared using the Az value of the Receiver Operating Characteristic curve. For all 61 features used as input vectors, the test data set yielded Az values of 97.01% for the SVM and 96.00% for the GRNN. With input features selected by SFS, the corresponding Az values were 98.00% for the SVM and 97.80% for the GRNN. The SVM outperformed the GRNN, whether or not the input vectors first underwent SFS feature selection. In both cases, feature selection dramatically reduced the dimension of the input vectors (82% for the SVM and 59% for the GRNN). Moreover, SFS feature selection improved the classification performance, increasing the Az value from 97.01 to 98.00% for the SVM and from 96.00 to 97.80% for the GRNN. q 2005 Elsevier Ltd. All rights reserved. Keywords: Microcalcification; Mammogram; Feature selection; General regression neural network; Support vector machine
1. Introduction Microcalcifications (mCaCCs) in mammographic images are major indicator in early detection of breast cancer [1,2]. Because of the small difference in image density between the mCaCCs and the surrounding breast tissues, it is not easy for the radiologists to manually locate microcalcification clusters. An effective computer-aided diagnostic (CAD) system is needed to help radiologists * Corresponding author. Tel.: C886 4 852 8469e2235; fax: C886 4 853 0591. E-mail address:
[email protected] (J.C. Fu).
0895-6111/$ - see front matter q 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.compmedimag.2005.03.002
interpret digital X-ray images and to improve diagnostic accuracy. Much has been published on the detection of mCaCC clusters. The algorithms in the published literature proceed in three phases: (1) pre-processing to reduce the noise and/or enhance the microcalcification clusters, (2) feature extraction to extract mCaCC related features, and (3) classification to detect the occurrence of mCaCCs [1,3–7]. The CAD systems are based on either a one-stage model or a two-stage model. In a one-stage model, after the first two phases, the entire mammographic image undergoes phase 3 classification. In their one-stage model, Zheng et al. [8] compute four features from the spatial and spectral domains and use a feed-forward neural network to classify the input features. Strickland and Hahn [9] decompose the X-ray
420
J.C. Fu et al. / Computerized Medical Imaging and Graphics 29 (2005) 419–429
image into four levels of spatial wavelet coefficients, which are input to a threshold model for mCaCC feature extraction and classification. Kim and Park [10] scan the image using a 128!128 mask, then extract texture-related features for input to a feed-forward neural classification network. One major disadvantage of one-stage models is that they classify each block of the scanned image, which greatly increases classification complexity. To solve this problem, two stage models [1,11] process the whole image for detection of suspected mCaCCs (first stage) and perform classification (second stage) on the suspected mCaCCs only. Each stage individually adopts features and classifiers suitable for its own purpose. Yu and Guan [11] developed a two-stage microcalcification detection system. In the first stage, suspected microcalcifications are detected by a backpropagation neural network based on four input features extracted from among wavelet coefficients and spatial domain statistics. In the second stage, a data mining algorithm selects the top 15 features out of 31 textural and spatial domain features and feeds them into a general regression neural network for classification. Results from FROC curves show that Yu’s two-stage model outperform one-stage models such as [9,12]. One major disadvantage of Yu’s model is that the first and second-stage classifiers have to be individually trained, which could consume significant time in system development and maintenance. Feature extraction and classification are two major functions of computerized mCaCC classification systems. Most systems extract features from the texture, spatial and spectral domains. Textural features represent properties of the object’s surface. Haralick et al. [7] developed a graytone spatial-dependence matrix method (known as the co-occurrence matrix method) to describe patterns of graylevel repetition. Statistics derived from the co-occurrence matrix are used to characterize the textural pattern. The cooccurrence matrix method is widely used in biomedical image processing [3,11]. Spatial domain features encode shape, pixel intensity, and other object characteristics. Zheng et al. [8] calculate the pixel intensity variance and energy variance of each image block. Yu and Guan (2000) segment the borders of suspected microcalcifications and extract features representing their compactness, elongation and invariant moment. In the spectral domain, Zheng et al. (1996) calculated the discrete cosine transform (DCT) of X-ray images and derived the block activity and spectral entropy from the DCT coefficients [11]. Back-propagation neural networks (BPNN) were once commonly used for mCaCC classification [8,10]. However, a major disadvantage of BPNNs is that they lack a scheme for choosing training parameters such as the transfer function, number hidden layer nodes, and convergence algorithm. General regression neural networks (GRNNs) were later introduced to replace the BPNN [11]. Unlike BPNNs, the GRNN is based on non-linear regression theory for function estimation. One major advantage of a GRNN is
the simplicity of the network training, which does not require time-consuming procedures based on trial-anderror. The drawback of GRNN’s is that large data sets increase network size and computational load. In addition to the problem of dimensionality, the GRNN does not generate the optimal boundary for problems such as mCaCC classification, which have two classes (microcalcification present and microcalcification absent). For two-class problems, an optimal classifier should in principle seek the hyperplane that produces the largest margin of separation (structural risk minimization) instead of minimizing learning error alone [5,13]. Support Vector Machines (SVMs) minimize the structural risk by using non-linear mapping to transform the input space to a highdimensional feature space, where a hyperplane is constructed to maximize the separation margin. SVM has recently been employed in mammographic mCaCC classification [1,5]. Though experimental results show that SVM outperforms other types of neural networks, the impact of relevant feature selection on classification performance has not been addressed. This paper proposes a two-stage computer aided diagnostic system for mammographic mCaCC classification. The output of the first stage is the suspected mCaCCs, an approach that allows false positives but ensures against false negative. During the first stage, closed form mathematic formulae are used to find suspected microcalcifications. During the second stage, the suspected microcalcifications are classified based on the optimal classification feature vector, which is selected by SFS from among 61 features in the textural, spatial, and spectral domains. The performance of two types of classifiers, GRNN and SVM, is investigated with and without prior SFS input feature vectors selection. Az values of the Receiver Operating Characteristic (ROC) curves are used to evaluate the classification performance.
2. Methods and techniques Fig. 1 shows the flow of the proposed system. In the first stage, spot-like characteristics in the original X-ray image are enhanced before undergoing border detection and filling to distinguish them from the background. Closed-form mathematical formulae are employed for initial First Stage X-ray Mammograms
Microcalcification Enhancement
Border Detection & Filling
Suspected Microcalcifications
Second Stage Feature Extraction
Two-class Pattern Classification
SFS Feature Selection
Microcalcification Present
Microcalcification Absent
Fig. 1. Diagram of the proposed system.
J.C. Fu et al. / Computerized Medical Imaging and Graphics 29 (2005) 419–429
Fig. 2. Illustration of top-hat morphological processing.
classification of these spots (suspected microcalcification or not). In the second stage, for each suspected microcalcification, 61 features are extracted from the textural, spatial, and spectral domains. The microcalcification classification is performed twice. The first time, all 61 features are input to the GRNN or SVM. The second time, the features first undergo SFS feature selection to select the optimal combination of features for each type of classifier (GRNN and SVM). For each run of the system (GRNN, SVM, SFSCGRNN, and SFSCSVM), the ROC curve is derived as a measure of classification performance. 2.1. Segmentation of suspected microcalcifications The first stage determines the location and border of the suspected mCaCCs. This process begins with enhancement of the spot-like characteristics via top hat morphological processing. The top-hat transform is used to segment objects that differ in brightness from the surrounding background in images with uneven background intensity [14]. The formula is: B Z A K ½ðA1SEÞ4SE where
A B SE 1 4 ,
input image (the X-ray mammogram) processed image structure element morphological erosion operation morphological dilation operation image subtraction operation
[(A1SE)4SE] is also known as the morphological opening operation. After the image subtraction operation,
421
the peaks in the original image A stand out for detection. Fig. 2 shows a 1D profile of an image that has undergone the top-hat transform [15]. A more detailed description of the top-hat transform can be found in [2]. Fig. 3 shows an example X-ray mammogram processed by top-hat transform. Fig. 3a is the original X-ray mammographic image. Fig. 3(b) is the original image processed by the top-hat transform, with linear gray-level scaling applied to for better visual effect. Fig. 3(c) and (d) are 3D displays of the images in Fig. 3(a) and (b). In Fig. 3(c) and (d), the X and Y axes correspond to the X and Y axes in Fig. 3(a) and (b), and the Z values correspond to the gray-levels in Fig. 3(a) and (b). Fig. 4 shows a block diagram of the border detection of filling operation for suspected mCaCCs. After top-hat enhancement, the image undergoes Sobel and Canny edge detection to segment the enhanced borders from the background image. The Sobel edge detector applies Sobel approximation to the derivative of the image and detects edges whenever the gradient of reconstructed input image is at its maximum [16]. The Canny edge detector finds edges by looking for local maximum of the gradient of unprocessed input image. In each edge detection algorithm, the gradient is calculated using the derivative of a Gaussian filter [16], and the output is a binary image, where 1 represents edges and 0 represents background. The Sobel and Canny binary output images are postprocessed by the flood-filling operation to fill all objects with closed borders. They are then logically ORed together to produce a new image that possibly includes false positives but discourages false negatives. This new image undergoes morphological closing to eliminate noise. Fig. 5 illustrates first stage image processing, including a block from the X-ray mammogram image in original form (Fig. 5(a)), top-hat enhanced (Fig. 5(b)), and in binary form after morphological closing (Fig. 5(c)). The white dots in Fig. 5(c) signal suspected mCaCCs. The first stage adopts a closed form, data-driven model to detect suspected mCaCCs. Data-driven algorithms, which rely solely on input data, have the advantage that they are not subject to parameter selection and system training such as ad hoc parameter-based algorithms. In addition to location information, the proposed model characterizes
Fig. 3. X-ray microcalcification enhancement by morphological processing.
422
J.C. Fu et al. / Computerized Medical Imaging and Graphics 29 (2005) 419–429 Table 1 Features from texture analysis (qZq8)
Border Detection & Filling
Top-hat Output
Sobel Edge Detector
Flood-filling Operation OR
Morphological Closing Operation
Suspected Microcalcifications
No.
Name
No.
Name
No.
Name
1
5
9
Correlation
10
Contrast
11
Angular second moment
8
Sum variance Sum average Inverse difference moment Variance
Fig. 4. Flow chart for detection of suspected microcalcifications.
3
Difference entropy Difference variance Entropy
the shape of the suspected mCaCCs, providing essential information for further extraction of shape related features.
4
Sum entropy
Canny Edge Detector
Flood-filling Operation
2.2. Feature extraction and selection for microcalcification classification The second stage begins with extraction of 61 features for each block from the textural, spatial and spectral domains. Further feature selection is required, since not all these features are relevant to mCaCC classification. The goal of feature selection is to choose the optimal feature vector, consisting of the features that minimize the classification error. The feature extraction and selection processes are described below. Of the original 61 features, 44 come from the textural domain, 15 from the spatial domain, and 2 from the spectral domain. In its current form, the system detects the occurrence of mCACCs, but does not classify them as benign or malignant. Therefore, all 61 features describe individual suspected mCACCs [5,8,11]. In the future, the system will be equipped with a post-processor that will perform benign/malignant classification based on additional features that characterize clustering among the microcalcifications. 2.2.1. Textural information This paper, like most published work on mCaCC classification, uses co-occurrence matrices to describe textural properties. If the input block B measures M!N pixels, the gray-level configuration is characterized by a set of matrices Pq,d(a, b), each of which describes the relative frequently of pixels with gray-levels a and b in the M!N block Bq,d located a distance d and angle q relative to B [15]. In this paper, each block measures 16!16 pixels, and the center of the block coincides with the center of mass of the suspected mCaCC. The parameter values are dZ1 and qZ0, 45, 90 and 1358. Table 1 lists eleven features extracted
2
7
for each (d,q) pair (A detailed formula can be found in [7].). Thus, for each suspected microcalcification, five (d,q) pairs yield 44 textural features. Features 1–11 correspond to qZ08, features 12–22 to qZ458 features 23–33 to qZ908 and features 34–44 to qZ1358. 2.2.2. Spatial domain Spatial domain features include both shape-related features and window-based features. Shape-related features require information on the foreground and background of each suspected mCaCC. The foreground is the binary output of the first stage, and the background information is the foreground, image subtracted from the morphologically dilated foreground. Fig. 6 illustrates calculation of the foreground and background of a suspected mCaCC. Table 2 lists the 13 shape-related features (features 45–57) extracted from each suspected mCaCC. Further details can be found in Reference [11]. In contrast to shape-based features, window-based features are extracted from the image within a rectangular window. In this paper, two window-based features are extracted for each suspected mCaCC within the 16!16 pixel block window. They are the pixel intensity variance (i var, feature 58) and energy variance (e var, feature 59), given by [8]: i var Z
16 X 16 X
½Iðm; nÞ K avg2
mZ1 nZ1
e var Z
16 X 16 X ½Iðm; nÞ2 K eavg2 mZ1 nZ1
where avg Z
16 X 16 X 1 Iðm; nÞ 16 !16 mZ1 nZ1
eavg Z
Fig. 5. Input/output images in the first stage.
6
16 X 16 X 1 I 2 ðm; nÞ 16 !16 mZ1 nZ1
and I(m, n) is the gray-level of the pixel located in row n and column m of the window.
J.C. Fu et al. / Computerized Medical Imaging and Graphics 29 (2005) 419–429
423
Fig. 6. Foreground and background of a suspected microcalcification.
2.2.3. Spectral domain Spectral domain features describe the frequency characteristics of the input image. In this paper, two spectral domain features are extracted from each 16!16 suspected mCaCC. They are the block activity (A, feature 60) and the spectral entropy (E, feature 61), given by [8]: AZ
16 X 16 X
among these is the set that minimizes MSE. SBS, the backward counterpart of SFS, begins with a full set of features, from which it and removes features one at a time [11]. The major advantage of SFS and SBS is that they reduce the solution space from 2NK1 to N
jXði; jÞj
1
!
N C
2
!
N C. C
!
N
mZ1 nZ1
E ZK
16 X 16 X
jÞln½Xði; jÞ Xði;
mZ1 nZ1
where X(i, j) is the discrete cosine transform (DCT) coefficient in row i and column j of the DCT matrix jÞ Z jXði; jÞj : Xði; A The advantage of extracting features from the textural, spatial and spectral domains is that features from all three domains can be used for mCACC classification. However, one practical problem-feature selection-emerges. The purpose of feature selection is to choose the subset of features that yield optimal mCACC classification performance (minimum error) while reducing the computational burden. For a problem with N features, an exhaustive search would require evaluation of 2NK1 subsets (excluding the empty set), clearly impractical in problems with large data sets [17]. Yu and Guan [11] apply the sequential forward search (SFS) and sequential backward search (SBS) to select features for mCACC classification. Experimental results show that SFS and SBS reduction of the feature vector improves classification accuracy. SFS begins with an empty feature set S and all N features marked as unchosen. At each iteration, one feature from among the unchosen features—the one that minimizes the mean square error (MSE)—is added to S. After N iterations, S includes all the features, and the selection stops. The result is N non-empty feature sets S13S23/3SN. The best
without missing the optimal subset [18]. In this paper, we used SFS feature selection because it lists the features relevant to classification in descending order—i.e. the first chosen feature is the most discriminatory, the second is the second most discriminatory, and so on. This ranking may be useful for future system expansion such as content-based image retrieval [5]. There are two schemes for training classifiers. The first is to divide the data into two sets: training and test, with the training set consisting of k subsets of equal size. For each combination of features, the classifier is trained k times, each time reserving one subset for later estimation of the generalization error. A major disadvantage of this approach is that the k-fold cross-validation training scheme is so time consuming [18]. The second training scheme divides the data into three sets: training, test, and validation. Because the validation data set is used to estimate the generalization error, the whole test set is applied for its original purpose, double checking the performance of the trained classifiers. Table 2 Shape related features No.
Name
No.
Name
45 46 47
Mean Standard deviation Foreground background ratio Foreground background difference Difference ratio
50 51 52
Area Compactness Elongation
53–56
Shape moment I–IV
57
Invariant moment I
48 49
424
J.C. Fu et al. / Computerized Medical Imaging and Graphics 29 (2005) 419–429 (X)
Validation Data Set Training Data Set Optimal Feature Set
MSE
Y1
Yn
1
No. of Feature Included
Summation Layer
B
A
0
Output Layer
A/B
1
…
Full Set
Hidden Layer
Fig. 7. Feature selection by SFS.
Optimal performance is achieved when the validation error is minimized [19]. The validation data set approach takes significantly less training time because it requires only one run, instead of k runs, per feature combination. Since the initial feature set is so large (61 in this paper), we implement the validation data set approach during SFS feature selection. Fig. 7 illustrates the feature selection process. 2.3. Microcalcification classification This paper investigates the performance of two classifiers. One is the GRNN, a neural network that employs a base of radial functions for functional approximation. The other is the SVM, which seeks the optimal boundary between two classes. 2.3.1. General regression neural networks It has been shown that a continuous function can be approximated to arbitrary accuracy by a linear combination of a sufficient number of radial basis functions (Gaussian functions) [20]. Spetch [17] developed a normalized radial basis function network, named the general regression neural network (GRNN), for estimation of continuous variables. Let X denote a particular measured value of the random variable x, y a scalar random variable, and f(X, y) the joint probability density function of X and y. Then the regression of y on X is given by ðN ðN E½yjX Z yf ðX; yÞdy = f ðX; yÞdy KN
KN
Using a non-parametric density-estimation method developed by Parzen (1962), the desired conditional mean of y given X is n n X X ^ YðXÞ Z Yi expðKD2i =2s2 Þ= expðKD2i =2s2 Þ where ^ YðXÞ n Xi yi
iZ1
iZ1
desired condition mean size of the training data set ith input vector in the training set ith output value of the training set
D2i Z ðX K XiÞT ðX K XiÞ s width of the estimating kernel
Input Layer
… X= [x1
x2
…
x d]
Fig. 8. Topology of GRNN.
Fig. 8 shows the network topology of a GRNN, where d is the dimension of the input vector X [11]. Nodes in the input layer simply pass the values in the input vector X to all the nodes in the hidden layer. The number of nodes in the hidden layer is equal to the size of the training set. The transfer function (kernel) of the ith hidden layer node is expðKD2i =2s2 Þ. In the summation layer, node A sums expðKD2i =2s2 Þ times Yi, and node B sums expðKD2i =2s2 Þ. The output node generates the predictor ^ YðXÞZ A=B. The advantage of a GRNN are its simplicity in network structure and network training: when s is known, only a single pass of learning is required to decide all the network weights. The drawback is the large network size and computational cost that result from a large training set. 2.3.2. Support vector machine The SVM approach—also known as structural risk minimization—is based on intuitive geometric principles. Given a collection of vectors in Rd, labeled C1 or K1, a SVM finds the hyperplane with the maximal separation margin, meaning that it maximizes the distance between the hyperplane and the nearest labeled vectors. Since most problems in Rd are not linearly separable, SVM uses the soft margin technique to maximize the margin in the feature space. To simplify the computation, the feature space is taken to be the non-linear projection of the input data to a higher dimensional space by a kernel function. As the kernel function, we select the Gaussian (the GRNN basis function) for purposes of performance comparison. The hyperplane optimization process is as follows [13,21]: (a) A linear classifier is defined by a hyperplane’s normal vector W and an offset b. The decision boundary is (W$X)CbZ0, where X is an input vector. The halfspaces defined by this hyperplane correspond to two classes, which we denote sign[(W$Xi)Cb]. Assuming that the training data (Xi, yi), iZ1, 2,.,n can be separated by this hyperplane, the labels are yi((W$Xi)C b)R1, where yiZG1. The margin is the distance between the hyperplane and the support vectors, which
J.C. Fu et al. / Computerized Medical Imaging and Graphics 29 (2005) 419–429
425
can be represented by 2/kWk, the length of the weight vector W [21]. The problem of determining the hyperplane with the maximum separation margin is represented as follows: minimizejjWjj2 =2
subject to yiððW$XiÞ C bÞR 1;
i Z 1; 2; .; n where n Xi Yi (b)
size of the training data set ith input vector in the training set ith output value (K1 or C1) in the training set. Solutions to the above minimization problem exist if the two classes are linearly separable. To allow for classification errors, the hard margin is replaced by a soft margin. The objective function and the constraints are modified as follows:
minimize jjWjj2 =2 C C !
n X iZ1
xiR 0 where a positive value of xi signifies a misclassification of the ith training vector Xi. This quadric optimization problem can be simplified to its dual formulation,
subject to
n X iZ1 n X
ai
X K
but not the support vector, the optimization mechanism will set the corresponding ai to 0. This mechanism will reduce the computational time.
3. Results and discussion
xi
subject to yiððW$XiÞ C bÞR 1 K xi; i Z 1; 2; .; n
maximize
Fig. 9. Topology of SVM.
n X n 1X a a ðXi$XjÞyi yj 2 iZ1 jZ1 i j
ai yi Z 0; 0% ai % C:
iZ1
A theorem by Mercer gives an easy way to compute the dot product in feature space [21]. The ith input vector Xi is mapped to a higher dimension feature space by a non-linear function F(Xi). Provided that the kernel function K(X, Z) is continuous symmetric and positive definite, F(X)$F(Z)ZK(X, Z). For a Gaussian kernel, KðX; ZÞZ expðKjjXK Zjj2 =2s2 Þ, and therefore we do not need to calculate each singular mapping. The classifier and the optimization problem can now be restated as follows: ! n X Classifier : sign ai yi KðXi; XÞ C b
The two-stage model was tested on the Nijmegen database [9,11,22,23]. The original bit depth is 12 and the resolution is 2048!2048. Strickland and Hahn [9] reported that many mCaCCs fall out of the circles confirmed by biopsy in images C12o and C12c. To avoid biased evaluation, we excluded these two images from the data analysis. This paper used a non-linear mapping technique provided by the Nijmegen database that maps from 212-level to 28-level gray scales to reduce computational time [23]. The mapped images were input to the first stage algorithms, which calculated the suspected mCACCs. Visual inspection of the output images and their corresponding X-ray originals confirmed that the all the mCACCs circled in the original X-ray images had been detected. Moreover, the mCACC enhancement with border detection and filling detected 100% of the mCACC clusters. Fig. 10 shows the first stage input/output image for image number C16o. In Fig. 10a, the original X-ray image, the mCACC cluster in the lower left corner is circled. In Fig. 10b, the output image, the white dots indicate suspected mCACCs.
iZ1
where X is an input vector. Optimization : maximize
n X iZ1
ðXi; XjÞyi yj
subject to
n X
ai
X K
n X n 1X aaK 2 iZ1 jZ1 i j
ai yi Z 0; 0% ai % C:
iZ1
Fig. 9 shows the topology of a SVM with Gaussian kernel [14]. When the input vector Xi is correctly classified
Fig. 10. Location of suspected mCACCs.
426
J.C. Fu et al. / Computerized Medical Imaging and Graphics 29 (2005) 419–429
Table 3 Distribution of selected data Data set Image block mCACCs (a) Normal tissue (b) Total (a)C(b)
Training data set (1)
Validation data set (2)
Test data set (3)
Total (1)C (2)C(3)
558 3116 3674
279 1559 1838
280 1559 1839
1117 6234 7531
The data in the training, test, and validation sets were randomly selected from the pool of suspected mCACCs. Each selected block was covered by a 16!16 window whose center coincided with the center of mass of the suspected mCACC. The blocks included 1117 with true mCACCs and 6234 with normal tissue. Table 3 gives a breakdown of the selected blocks among the three data sets. In all, 50% of the blocks were assigned to the training set, 25% to the test set, and 25% to the validation set. Only one parameter, s (the width of the Gaussian function), is decided based on GRNN and SVM training. The 3674 feature vectors from the training set were used sto determine the s that minimizes the mean square classification error. GRNN training yielded an optimal s of 1.1 in the range [0.1,200] for resolution 0. Since the SVM classification error was 0 for s in [0.1, 8], SVM training was restricted to that interval, with a resulting value at sZ6.2 vilified by the validation data set. Therefore, sZ1.1 and sZ6.2 were the parameter values used to evaluate
the performance of the GRNN and SVM classifiers and SFS feature selection. Fig. 11 shows the discriminatory power of the 61 features as ranked by SFS. The GRNN’s target output value is C1 for mCACCs, 0 for normal tissues. Since the output of a SVM is either C1 or K1, a post-processor convert K1 to 0 for purpose of GRNN comparison. Fig. 11 and Table 4 shows the results of the SFS feature selection. As shown in Fig. 11a, the minimum mean square (MSE) error occurred when the top 25 GRNN features were chosen. These features are listed in Table 4a, where the order number corresponds to the abscissa in Fig. 11a. As shown in Fig. 11b, the minimum MSE error for the SVM validation set (0.0283) was reached when the top 11 features (Table 4b) were selected. Fig. 11b shows the results of the feature selection for SVM’s. Fig. 4b clearly shows that the overfitting occurs after the training data MSE reaches to 0. The foreground background difference (No. 48), a shapebased feature, is the top ranked feature for both GRNN and SVM. Table 5 gives a further breakdown of the GRNN and SVM feature selection results by feature type. The underlined numbers identify features selected for both the GRNN and the SVM. A sub-table shows the textural features arranged by type and angle (11 types!4 anglesZ44 features), where dark highlighting indicates selected features. Data analysis shows that all three types of features are relevant to mCACCs. Of the 11 SVM-relevant features,
Fig. 11. Results of SFS feature selection.
Table 4a SFS Ranking of features by discriminatory power Order
Feature no. and name
Order
Feature no. and name
Order
Feature no. and name
1 2 3 4 5 6 7 8 9
48, Foreground background difference 11, Difference entropy (08) 44, Difference entropy (1358) 22, Difference entropy (458) 49, Difference ratio 43, Difference variance (1358) 28, Sum average (908) 35, Contrast (1358) 21, Difference variance (458)
10 11 12 13 14 15 16 17 18
32, Difference variance (908) 50, Area 46, Standard deviation 45, Mean 10, Difference variance (08) 47, Foreground background ratio 13, Contrast (458) 25, Correlation (908) 2, Contrast (08)
19 20 21 22 23 24 25
36, Correlation (1358) 41, Sum entropy (1358) 14, Correlation (458) 60, Block Activity 19, Sum entropy (458) 24, Contrast (908) 3, Correlation (08)
J.C. Fu et al. / Computerized Medical Imaging and Graphics 29 (2005) 419–429
427
Table 4b Relevant features for SVM Order
Feature no. and name
Order
Feature no. and name
Order
Feature no. and name
1
5
56, Shape moment IV
9
43, Difference Variance (1358)
2
48, Foreground background difference 21, Difference variance (458)
6
10
33, Difference Entropy (908)
3 4
17, Sum average (458) 57, Invariant moment I
7 8
34, Angular second moment (1358) 46, Standard deviation 61, Spectral entropy
11
3, Correlation (08)
Table 5 Relevant features selected by SFS
Relevant Features for GRNN
Type
Relevant Features for SVM
Amount
Feature No.
18
2, 3, 10, 11, 13, 14, 19, 21, 22, 24, 25, 28, 32, 35, 36, 41, 43, 44
Textural
θ
Information
0˚
Amount
Feature No 1
2 3 456
78
3, 17, 21, 33, 34, 43
6
6
Spectral Domain
1
θ 0˚
9 10 11
Feature No 1
2 3 4 5 6 7 8 9 10 11
45˚ 12 13 14 15 16 17 18 19 20 21 22
45˚ 12 13 14 15 16 17 18 19 20 21 22
Spatial Domain
Feature No.
90˚ 23 24 25 26 27 28 29 30 31 32 33
90˚ 23 24 25 26 27 28 29 30 31 32 33
135˚ 34 35 36 37 38 39 40 41 42 43 44
135˚ 34 35 36 37 38 39 40 41 42 43 44
45,
46, 47, 48, 49,50 60
5 are also relevant to the GRNN, but more than 50% are considered as ‘noise’. Clearly, arbitrary application of the optimal feature set for one type of classifier to another type of classifier does not guarantee the best optimal classification performance. On the contrary, optimal performance requires separate feature selection for each type of classifier. Receiver operating characteristic (ROC) analysis is a commonly used approach for classification performance evaluation [10]. An ROC curve was created by plotting the sensitivity against 1-specificity for different cutoff values of the classifiers. Sensitivity is the ability to correctly detect disease, and specificity is the ability to avoid incorrectly classifying normal tissue as diseased. The area under the ROC curve (Az) is the average sensitivity over all possible specificities. Az values characterize the tradeoff between the sensitivity and specificity of the diagnostic test. Table 6 shows the ROC curves associated with Az values of
4
46, 48, 56, 57
1
61
the training and test data for GRNN and SVM, with and without prior SFS feature selection. To calculate the ROC curves, the SVM’s binary output (C1 or K1) generated by the thresholding of the hyperplane has to be released. Therefore, pre-thresholding values are used for ROC calculation. Though the training Az values obtained by using all 61 features to the GRNN and SVM are both 1, the test Az values show that a better performance results from SFS feature selection. The Az values of ROC curves in Table 6 confirm the ‘overfitting’ phenomenon shown in Fig. 11. Based on Table 6, we offer the following conclusions: (a) Under either feature selection criterion (no selection/SFS selection), the SVM outperforms the GRNN. For all 61 features used as input vectors, the test data set yielded Az values of 97.01% for the SVM and 96.00%
Table 6 Az values of ROC curves before and after SFS feature selection Input Vector
Classifier GRNN No. of features
All Features Selected by SFS
61 25
SVM Az
No. of features
Training
Test
1.0000 0.9945
0.9600 0.9780
61 11
Az Training
Test
1.0000 0.9978
0.9701 0.9800
428
J.C. Fu et al. / Computerized Medical Imaging and Graphics 29 (2005) 419–429
for the GRNN. With input features selected by SFS, the corresponding Az values were 98.00% for the SVM and 97.80% for the GRNN. The SVM outperformed the GRNN, whether or not the input vectors first underwent SFS feature selection. (b) For either classifier (GRNN or SVM), SFS feature selection improves classification performance, increasing the Az value from 97.01 to 98.00% for the SVM and from 96.00 to 97.80% for the GRNN. (c) Feature selection can drastically reduce the dimension of the input vector (in this case, 82% for the SVM and 59% for the GRNN).
the future feature-related system expansion, such as content-based retrieval of mammogram images.
4. Conclusion
References
This paper proposes a two-stage model for mCACC classification in X-ray mammograms. In the first stage, a closed form mathematical model based on the top-hat transform and a border detection algorithm calculates the location and shape of suspected mCACCs. When tested on the Nijmegen University Hospital (Netherlands) database, data analysis shows that the proposed model effectively detects the occurrence of mCACCs. The data driven structure of the mathematical model eliminates the need for parameter setup and system training. Two additional advantages of the model are its structural simplicity and its efficient reuse of information on the shape of suspected mCACCs to extract shapebased features in the spatial domain during the second stage. In the second stage, 61 features are extracted for each suspected mCACC, representing the texture, spatial and spectral domains. A sequential forward search algorithm selects the classification input vector, which consists of features that are relevant only to mCACCs. Two types of classifiers, general regression neural network (GRNN) and support vector machine (SVM), are applied and the resulting classification performance compared. A total of 7531 images comprise the training data set (3674), validation data set (1838) and test data set (1839). The training and validation sets are used to search for the optimal classification input features. Data analysis shows that feature selection can significantly reduce the dimension of the input vectors (82% for the SVM and 59% for the GRNN). Az values of the Receiver Operating Characteristic curves are used to evaluate the classification performance. Data analysis shows that the SVM outperforms the GRNN, whether or not the feature set first undergoes SFS selection. SFS increased classification performance for both the GRNN and the SVM, with the best performance resulting from SVM classification of the SFS reduced input vector. The SFS simultaneously increases performance and reduces system complexity. An additional advantage is that the SFS can be reused for
[1] Bazzani A, Bevilacqua A, Bollini D, Brancaccio R, Campanini R, Lanconelli N, et al. Automatic detection of clustered microcalcifications in digital mammograms using an SVM classifier. Eur Symp Artif Neural Netw 2000;195–200. [2] Bovik A. Handbook of image and video processing. New York: Academic Press; 2000. [3] Christoyianni I, Koutras A, Dermatas E, Kokkinakis G. Computer aided diagnosis of breast cancer in digitized mammograms. Comput Med Imaging Graph 2002;26:309–19. [4] El Naqa I, Yang Y, Galatsanos NP, Wernick MN. Content-based image retrieval for digital mammography. IEEE Int Conf Image Processing 2002;3:24–8. [5] El-Naqa I, Yang N, Wernick MN, Galatsanos NP, Nishikawa R. Support vector machine learning for detection of microcalcifications in mammograms. IEEE Int Symp Biomed Imaging 2002. [6] Fogel DB, Wasson III EC, Boughton EM, Porto VW. Evolving artificial neural networks for screening features from mammograms. Artif Intell Med 1998;14(3):317–26. [7] Haralick RM, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Trans Syst Man Cybern 1973;SMC-3(6):610–21. [8] Zheng B, Qian W, Clarke LP. Digital mammography: mixed feature neural network with spectral entropy decision for detection of microcalcifications. IEEE Trans Med Imaging 1996;15(5):589–97. [9] Strickland RN, Hahn HI. Wavelet transforms for detecting microcalcifications in mammograms. IEEE Trans Med Imaging 1996;15(2): 218–29. [10] Kim JK, Park HW. Statistical textural features for detection of microcalcifications in digitized mammograms. IEEE Trans Med Imaging 1999;18(3):231–8. [11] Yu S, Guan L. A CAD system for the automatic detection of clustered microcalcifications in digitized mammogram films. IEEE Trans Med Imaging 2000;19(2):115–26. [12] Karssemeijer N. A stochastic model for automated detection of calcifications in digital mammograms. Proceedings of 12th international conference of information processing in medical imaging 1991 p. 227–38. [13] Vapnik V. Statistical learning theory. New York: Wiley; 1998. [14] Parzen E. On estimation of a probability density function and mode. Ann Math Stat 1962;33:1065–76. [15] Sonka M, Halavac V, Boyle R. Image processing, analysis, and machine vision.: PWS Publishing; 1999. [16] The MathWorks, Inc. Image Processing Toolbox User’s Guide, The MathWorks, Inc.; 2001. [17] Spetch DF. A general regression neural network. IEEE Trans Neural Netw 1991;2(6):568–95. [18] Liu H, Motoda H. Feature selection for knowledge discovery and data mining. Dordecht: Kluwer; 1998.
Acknowledgements This research was conducted with the support of Taiwan’s National Science Council (NSC 91-2213-E212-026) and VTY Joint Research Program, Tsou’s Foundation (VTY 91-P2-17). Mr J. H. Chio and Mr M. L. Tsai, two Da-Yeh University graduate students, helped with the data analysis.
J.C. Fu et al. / Computerized Medical Imaging and Graphics 29 (2005) 419–429 [19] Suykens JAK, Gestel TV, Brabanter JD, Moor BE, Vandewalle J. Least squares support vector machines. New York: World Scientific Publishing; 2002. [20] Principe JC, Euliano NR, Lefebvre WC. Neural and adaptive systems. New York: Wiley; 2000. [21] Muller KR, Mika S, Ratsch G, Tsuda K, Scholkopf B. An introduction to kernel-based learning algorithms. IEEE Trans Neural Netw 2001; 12(2):181–201. [22] Netsch T, Peitgen TN. Scale-space signatures for the detection of clustered microcalcifications in digital mammograms. IEEE Trans Med Imaging 1999;18(9):774–81. [23] Karssemeijer N. Adaptive noise equalization and image analysis in mammography. Proc Inf Processing Med Imaging 1993;472–86.
Jachih (J.C.) Fu, PhD is both a professor in the Department of Industrial Engineering and Technology Management and the Director of Library at Da-Yeh University, located in Central Taiwan. He currently heads the DYU Automated Measurement and Diagnostic Systems Laboratory, where he uses medical imaging technologies for computer-aided diagnosis and conducts monitoring applications for the electronic industry. His medical imaging research interests include feature extraction and classification in X-ray mammograms, border detection in cardiovascular MR imaging, and segmentation and image fusion in brain MR imaging.
San-Kan Lee, MD is the professor of Radiology of National Defense Medical Center and Chung Shan Medical University. His current position is the President of Suao Veterans Hospital. Dr Lee serviced at Department of Diagnostic Radiology, Tri-Service General Hospital, Taiwan, from 1976 to 1989. In 1982, he received a year of clinical fellowship training at Division of Diagnostic Ultrasound of Thomas Jefferson University Hospital in Philadelphia. From 1989 to 2003, he was Chairman of Department of Radiology at Taichung Veterans General Hospital. His research interest includes computed aided diagnosis (CAD) in X-ray mammograms, ultrasound screening of breast cancer, picture archiving and communicating system, and radiology information system. Dr Lee has published over 200 peerreviewed papers, and presented more than 50 abstracted in the international conferences. The study of CAD has earned Dr Lee one international and one domestic patent each.
429
Stephen TC Wong, PhD, PE is the founding Director of HCNR Center for Bioinformatics, Harvard Medical School, the founding Director of Functional and Molecular Imaging, Brigham and Women’s Hospital, and an Associate Professor of Radiology, Harvard Medical School. He has over 20 year global R&D and management experience. His current research interests include systems bioinformatics and multi-scale biomedical imaging as well as their applications in the delivery of personalized medicine and understanding the mechanisms of diseases, notably neurodegeneration and cancers.
Jinn-Yi Yeh, PhD received the BS degree from Chung-Yuan Christian University, Taiwan, in 1987 and the MS and PhD degrees from the University of Texas at Arlington in 1993 and 1997, respectively. He is presently an Assistant Professor in the Department of Industrial Engineering and Technology Management, Da-Yeh University, Taiwan. His research interests focus on bioinformatics, medical image processing, and heuristic algorithms.
An-Hsiang Wang, PhD received the PhD degrees from the National Tsing-Hua University in Taiwan. He is presently a Professor in the Department of Industrial Engineering and Technology Management, Da-Yeh University, Taiwan. His research interests focus on ergonomics, medical image processing, and classification algorithms.
Wu H.K., MD, specialized in chest and breast imaging, is a radiologist in the Department of Medical Imaging at Chang-Hua Christian Hospital. He teaches medical imaging interpretation at Chung-Tai Institute of Health Science and Technology. He received his MD from China Medical College of Taiwan in 1983. After taking 4 years of residency training program at MacKay Memorial Hospital in Taiwan, he went to Harbor-UCLA Diagnostic Imaging Center in California as a research fellow. His research interests include computer-aided diagnosis in breast ultrasonography, temper detection and recovery based on medical images, and estimation of breast parenchymal patterns via handheld ultrasonography.