Expert Systems with Applications 36 (2009) 8124–8133
Multi-class support vector machine for classification of the ultrasonic images of supraspinatus

Ming-Huwi Horng

National PingTung Institute of Commerce, Department of Information Technology, No. 51, MinSheng E. Rd., Pingtung 900, Taiwan
Keywords: Ultrasonic supraspinatus images; Mutual information criterion; Multi-class support vector machine; Feature selection
Abstract

This article applies multi-class support vector machine classifiers to classify ultrasonic supraspinatus images into four disease groups: normal, tendon inflammation, calcific tendonitis and supraspinatus tear. The supraspinatus tendon is often involved in these diseases. Four texture analysis methods, the texture feature coding method, the gray-level co-occurrence matrix, fractal dimension evaluation and the texture spectrum, are used to extract tissue-characteristic features from the ultrasonic supraspinatus images. In the training stage, the mutual information criterion is adopted to select powerful features from those generated by the four texture analysis methods; in the classification stage, five implementations of multi-class support vector machine classifiers are designed to assign each image to one of the four disease groups. In the experiments, the most commonly used performance measures, sensitivity, specificity, classification accuracy and false-negative rate, are applied to evaluate the five implementations, and receiver operating characteristic analysis is used to analyze their classification capability. The results demonstrate that the multi-class fuzzy support vector machine implementation achieves 90% classification accuracy, with performance measures significantly superior to the others. © 2008 Elsevier Ltd. All rights reserved.
1. Introduction

Shoulder pain is a common orthopedic problem. Clinical history and examination are essential in diagnosing the origin of shoulder pain. The most common causes of shoulder pain are rotator cuff diseases, which can be due to impingement syndrome or tearing of the rotator cuff tendons. The rotator cuff is made up of four muscles and their associated tendons, which surround the humeral head of the shoulder and originate at the scapula. The four muscles are the supraspinatus, infraspinatus, subscapularis and teres minor. Among them, the supraspinatus is the most prone to injury, and the most likely to tear, during strenuous exercise. Accordingly, clinical physicians routinely examine the supraspinatus for signs of injury. Ultrasonography is the chief imaging modality for assessing the supraspinatus. Several gray-scale and texture features have been identified in the past for differentiating diseases of the supraspinatus. Arslan, Apaydin, Kabaalioglu, Sindel, and Luleci (1999) used a 7.5-MHz linear-array transducer to acquire longitudinal-view images of patients with rotator cuff injury. Their study demonstrated that bursa fluid and biceps effusion were highly correlated with symptoms of rotator cuff injury. Chiou et al. (1999)
E-mail address: [email protected]. doi:10.1016/j.eswa.2008.10.030
proposed diagnostic criteria for classifying full and partial supraspinatus tears in patients with shoulder pain; the sensitivity and specificity obtained with these criteria were 0.98 and 0.87. Marcello, Jose, Jorge, and Gerson (2002) diagnosed rotator cuff tears using the criteria that (a) one or more cuff tendons were not visible, (b) one of the tendons was focally non-visible, or (c) the tendons showed a well-defined discontinuity defect. However, all of these studies focused on the diagnosis of rotator cuff tears; the classification of other impingement syndromes of the rotator cuff, such as inflammation and tendon calcification, remains largely unaddressed. The popular classification system of Neer (1972) is widely used in clinical diagnosis to divide impingement syndrome diseases of the supraspinatus into three stages. Inflammation manifestations such as edema or hemorrhage usually appear in the supraspinatus in Stage 1; all ultrasonic images of this stage are therefore collected into the Inflammation Group for later classification. The supraspinatus in Stage 2 is more seriously affected and considered irreversible; fibrosis and calcification typically appear in the supraspinatus, so the images of this stage are regarded as the Calcific Group. Stage 3 generally occurs in patients over 50 years old and frequently involves a tendon rupture or tear; a rupture of the supraspinatus tendon may require surgical repair. The images of Stage 3 are referred to as the Tear Group.
Texture-based measures have been applied to ultrasound images for over a decade. Chen, Chang, Kuo, Chen, and Huang (2002) utilized the wavelet transform and neural networks to classify breast tumors. Horng (2007) applied the texture feature coding method and the gray-level co-occurrence matrix to extract correlated features for the classification of ultrasonic liver images. Kakkos et al. (2007) used spatial gray-level dependence matrices and gray-level run-length statistics to discriminate diseases of symptomatic carotid plaques. Zhang, Wang, Dong, and Wang (2007) applied sonographic features such as size, shape and echogenicity to classify cervical lymph nodes with a support vector machine. This article presents a new attempt to integrate different texture analysis methods to analyze the characteristics of ultrasonic supraspinatus images for classification. Numerous features can be extracted by these texture analysis methods to measure the different patterns of supraspinatus images, but not all of them are significant for this application; most may be redundant or even irrelevant. As the number of features grows, the number of training samples needed to estimate the model parameters usually grows exponentially. Therefore, the classification of supraspinatus images requires a feature selection algorithm to reduce the dimensionality of the feature space. In general, only an exhaustive search can guarantee an optimal solution to the feature selection problem, but it is time-consuming. Search techniques and feature saliency measures are two alternative approaches for obtaining a suboptimal solution. Many search techniques have been proposed, including sequential forward selection and backward elimination (Mao, 2004), forward floating search (Oh, Lee, & Moon, 2004), genetic algorithms (Zhang & Sun, 2002), and branch-and-bound search (Chen, 2003).
As for feature saliency measures, several popular methods have been developed, for example the maximization of the mutual information between the feature space and the classes (Chang & Partridge, 2003), the Fisher index (Kudo & Sklansky, 2000) and the F-score ranking method (Chen & Lin, 2005). However, these methods usually need to compare a large number of candidate feature subsets to find the best one. In this article, the mutual information criterion is adopted to select powerful features for classification. Multi-class classification has found practical use in many pattern recognition problems. There are two ways to construct a multi-class classifier. One is to use an inherently multi-class algorithm such as a neural network; the other decomposes the multi-class problem into several two-class problems, as in multi-class support vector machines (abbreviated MCSVM in the rest of the paper). The support vector machine (SVM) was originally designed for binary classification, and a number of decompositions have been proposed to extend it to multi-class problems (Hsu & Lin, 2002; Wu, Lin, & Weng, 2004). Implementations of MCSVM generally follow one of two approaches: the one-against-all method and the one-against-one method. The one-against-all method (Wu et al., 2004) compares a given class with all the others put together, so it constructs k SVMs for a k-class problem. The one-against-one approach (Abe, 2005) constructs binary SVMs between all possible pairs of classes, hence k(k − 1)/2 SVMs for a k-class problem. Hsu and Lin (2002) showed that the one-against-one method is more suitable for practical use.
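The one-against-one voting scheme just described can be sketched as follows. This is a minimal illustration, not the paper's implementation; `decide(i, j, x)` stands in for a hypothetical trained pairwise binary classifier that returns the winning label of the pair.

```python
from itertools import combinations

def one_against_one_predict(x, classes, decide):
    """Predict a label by majority vote over the k(k-1)/2 pairwise classifiers.

    `decide(i, j, x)` is assumed to be a trained binary SVM for classes i and j
    that returns whichever of i or j wins on sample x (hypothetical callable).
    """
    votes = {c: 0 for c in classes}
    for i, j in combinations(classes, 2):  # all k(k-1)/2 class pairs
        votes[decide(i, j, x)] += 1
    return max(votes, key=votes.get)       # class with the most votes
```

For k = 4 disease groups this requires only six pairwise classifiers, each trained on the data of two classes, whereas one-against-all trains four classifiers that each see all of the data.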
In this article, we compare the performance of five different implementations of MCSVM, the original one-against-all SVM, the one-against-all fuzzy SVM, the one-against-all decision-tree-based SVM, the original one-against-one SVM and the one-against-one directed acyclic graph method, in the classification of supraspinatus images. A new attempt to integrate different texture analysis methods to classify the supraspinatus conditions of normal, inflammation, calcific tendonitis and tear is proposed. The adopted method consists of three stages: texture feature extraction, the selection
of texture features, and disease classification. In the texture feature extraction stage, four texture analysis methods, the gray-level co-occurrence matrix (Haralick, Shanmugam, & Dinstein, 1973), the texture spectrum (He & Wang, 1991), the fractal dimension method (Dubuc, Zucker, Teicott, Quiniou, & Wehbi, 1989; Keller, Chen, & Crownover, 1989) and the texture feature coding method (Horng, Sun, & Lin, 2002), are used to generate numerous texture features. In the feature selection stage, the mutual information criterion (Battiti, 1994; Kwak & Choi, 2002) is applied to find the powerful features among those generated by the four texture analysis methods. Finally, in the disease classification stage, the five implementations of MCSVM are designed to classify all of the ultrasonic supraspinatus images under five-fold cross-validation.

2. Materials and methods

In this section, we briefly describe the methods used in this article for texture analysis, feature selection and classification.

2.1. Texture analysis methods

2.1.1. Gray-level co-occurrence matrix (GLCM)

A co-occurrence matrix is generally referred to as a gray-level co-occurrence matrix whose entries count the transitions between all pairs of gray levels (not necessarily distinct). The gray-level transitions are calculated based on two parameters, the displacement d and the angular orientation θ. More precisely, let i and j be two gray levels and let N_{d,θ}(i, j) denote the number of transitions between two pixels whose gray levels are i and j, d pixels apart along orientation θ. In other words, N_{d,θ}(i, j) is the number of pixel pairs at locations (x, y) and (w, z) satisfying G(x, y) = i, G(w, z) = j and |(x, y) − (w, z)|_dm = (d, θ), where |(x, y) − (w, z)|_dm is a distance measure expressing that the pixels at (x, y) and (w, z) are d pixels apart along angular orientation θ. Normalizing N_{d,θ}(i, j) yields the probability, or relative frequency, of gray-level transitions:

p(i, j | d, θ) = N_{d,θ}(i, j) / N   (1)
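As an illustration of Eq. (1), a minimal sketch of building the normalized co-occurrence matrix for one displacement vector, together with one of the Haralick measures (the angular second moment). The function names and the small quantized image are illustrative, not from the paper:

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Normalized GLCM p(i, j | d, theta) of Eq. (1); the displacement vector
    (dx, dy) encodes both the distance d and the orientation theta
    (e.g. (1, 0) is d = 1 at theta = 0 degrees)."""
    img = np.asarray(img)
    counts = np.zeros((levels, levels), dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                counts[img[y, x], img[y2, x2]] += 1
    return counts / counts.sum()  # divide by N, the total number of transitions

def angular_second_moment(p):
    """Angular second moment, one of the 12 Haralick feature measures."""
    return float((p ** 2).sum())
```

Symmetric variants also count each transition in the reverse direction; the sketch uses the one-directional count, since the paper does not spell out which convention is used.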
where N is the total number of gray-level transitions in the co-occurrence matrix. Based on the gray-level co-occurrence matrix, Haralick et al. (1973) proposed 12 feature measures for texture analysis at a given displacement d and angular orientation θ: angular second moment, correlation, gray variance, inverse difference moment, entropy, sum entropy, difference entropy, information measure of correlation, sum average, contrast, sum variance and difference variance. These feature measures were used in the experiments for comparison, with d = 1, 2, 3 and 4 pixels under all angular orientations.

2.1.2. Texture spectrum (TS)

He and Wang (1991) first proposed the texture spectrum. The idea is to consider a so-called texture unit, illustrated in Fig. 1, with the central pixel V0 designating the pixel currently examined and V_i, i = 1, ..., 8, its eight neighboring pixels. One of the three values {0, 1, 2} is assigned to each E_i according to Eq. (2):
E_i = 0 if V_i − V_0 < −Δ;   E_i = 1 if |V_i − V_0| ≤ Δ;   E_i = 2 if V_i − V_0 > Δ,   for i = 1, 2, ..., 8;   and   N_E = Σ_{i=1}^{8} E_i · 3^{i−1}   (2)
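Eq. (2) can be sketched as follows. The clockwise neighbor ordering is an assumption for illustration (the paper's Fig. 1 fixes its own ordering); any fixed ordering yields a valid spectrum:

```python
import numpy as np

def texture_unit_number(window, delta=3):
    """Texture unit number N_E of Eq. (2) for one 3x3 window; the histogram of
    N_E over all windows of an image is the texture spectrum."""
    w = np.asarray(window, dtype=int)
    v0 = w[1, 1]
    # V1..V8: neighbors read clockwise from the top-left corner (ordering assumed).
    neighbors = [w[0, 0], w[0, 1], w[0, 2], w[1, 2],
                 w[2, 2], w[2, 1], w[2, 0], w[1, 0]]
    n = 0
    for i, vi in enumerate(neighbors):
        if vi - v0 < -delta:
            e = 0
        elif vi - v0 > delta:
            e = 2
        else:              # |vi - v0| <= delta
            e = 1
        n += e * 3 ** i    # E_i * 3^(i-1), with i starting at 1 in Eq. (2)
    return n
```

A flat window gives every E_i = 1 and hence the central value N_E = (3^8 − 1)/2 = 3280 of the 6561 possible texture unit numbers.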
Fig. 1. The 3 × 3 texture unit of the texture spectrum: the central pixel V0 with its eight neighbors V1–V8.
In Eq. (2), Δ denotes the tolerance of variation; in the experiments Δ is set to 3. There are 3^8 = 6561 possible combinations of the E_i in Eq. (2). The distribution of occurrence of all texture unit numbers N_E generated by Eq. (2) is called the texture spectrum. As the texture unit represents the local texture information of a given pixel and its neighborhood, the statistics of all the texture units in an image reveal its global texture aspects. Several features derived from the texture spectrum are useful for classification: black–white symmetry (BWS), geometric symmetry (GS), degree of direction (DD), the orientation features (MHS, MVS, MDS1 and MDS2) and central symmetry (CS).

2.1.3. Fractal dimension (FD)

The concept of FD, introduced by Mandelbrot (1977), provides a good representation of the roughness of natural surfaces. Dubuc et al. (1989) proposed evaluating the FD by computing the maximum variation V(ε) of an image intensity surface f(x, y) within a square of side ε. The maximum variation is defined as
V(ε) = ∫_0^1 ∫_0^1 v(x, y, ε) dx dy   (3)

where v(x, y, ε) = sup |f(x1, y1) − f(x2, y2)|, the supremum being taken over all pairs (x1, y1), (x2, y2) for which max{|x − x1|, |x − x2|, |y − y1|, |y − y2|} ≤ ε. The FD is computed from

FD = lim_{ε→0} [3 − ln V(ε) / ln ε]   (4)

In other words, the FD can be estimated as three minus the slope of the least-squares fit of the data {ln ε, ln V(ε)} as ε varies. In the box-counting method, the rate of growth of the number of boxes needed to cover the volume of the gray-level surface gives the FD. Let N(L) denote the number of boxes of size L needed to cover the gray-level volume. Keller et al. (1989) estimated the FD from

D = ln N(L) / ln L   (5)

That is, the FD can be estimated by a least-squares linear fit of ln N(L) against ln L.

2.1.4. Texture feature coding method (TFCM)

The texture feature coding method is a rotation-invariant coding scheme that transforms an original gray-scale image into a texture feature image whose pixels are represented by texture feature numbers. The texture feature number of each pixel X is generated from the gray-level changes of its eight surrounding pixels in the 3 × 3 texture unit (TU) of Fig. 2. The method considers the connectivity of the texture unit: the eight neighboring pixels in Fig. 3 constitute the 8-connectivity of the texture unit, which can be divided into first-order and second-order 4-connectivity pixels. The four pixels labeled 1 satisfy the first-order 4-connectivity of the texture unit because they are immediately adjacent to the pixel X; they are called first-order connectivity pixels. The other four pixels, labeled 2, satisfy the second-order 4-connectivity; they are diagonally adjacent to X and are called second-order connectivity pixels.

Fig. 2. (a) Neighborhood of the 3 × 3 pixels (V0 with neighbors V1–V8); (b) texture unit (TU) of the texture spectrum (E0 with E1–E8).

Fig. 3. The 3 × 3 texture unit of TFCM: central pixel X, first-order pixels labeled 1, second-order pixels labeled 2.

To code pixel X in Fig. 3, TFCM produces a pair of integers (a, b), where a and b represent the gray-level variations of the first-order and second-order connectivity, respectively. As shown in Fig. 4, two scan lines along the 0°–180° and 90°–270° directions produce two sets of three successive first-order connectivity pixels, with X in the middle of the horizontal and vertical lines forming a "+". Similarly, two scan lines along the diagonal 45°–225° and 135°–315° directions, forming an "×", produce two sets of three successive second-order connectivity pixels, as shown in Fig. 5. Assuming the scan direction is top-to-bottom and left-to-right, the three successive pixels can be arranged as (a, b, c) in scanning order; for example, if (a, b, c) represents the pixels scanned by the vertical line of the first-order connectivity in Fig. 4, the top pixel is denoted by a, pixel X by b and the bottom pixel by c. Suppose (Ga, Gb, Gc) are the gray levels of the three pixels (a, b, c). If we consider the two successive gray-level changes between the pairs (Ga, Gb) and (Gb, Gc), there are four different types of variation. The four types of gray-level variation defined by Eq. (6) can be graphed by the gray-level graphical structures shown in Fig. 6.

Fig. 4. First-order 4-connectivity: the 0°–180° and 90°–270° scan lines ("+").
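The coding just described can be sketched as follows, using the variation types of Eq. (6) and the pair table (Table 1) given below. The tolerance value and the exact scan-line geometry are assumptions for illustration:

```python
import numpy as np

# Table 1: code for an (unordered) pair of variation types (i)-(iv), here 0-3.
PAIR_VALUE = {(0, 0): 1, (0, 1): 2, (0, 2): 3, (0, 3): 5,
              (1, 1): 7, (1, 2): 11, (1, 3): 13,
              (2, 2): 17, (2, 3): 19, (3, 3): 23}

def variation_type(ga, gb, gc, delta):
    """Type (i)-(iv) of Eq. (6) for a scanned triple (Ga, Gb, Gc), as 0-3."""
    d1, d2 = gb - ga, gc - gb
    if abs(d1) <= delta and abs(d2) <= delta:
        return 0   # (i): both changes within the tolerance
    if abs(d1) <= delta or abs(d2) <= delta:
        return 1   # (ii): exactly one change within the tolerance
    if (d1 > delta and d2 > delta) or (d1 < -delta and d2 < -delta):
        return 2   # (iii): monotone increase or decrease beyond the tolerance
    return 3       # (iv): large changes of opposite sign

def tfn(window, delta=2):
    """TFN of the central pixel: a from the two first-order ('+') scan lines,
    b from the two second-order ('x') scan lines, TFN = a * b (Eq. (7))."""
    w = np.asarray(window, dtype=int)
    first = [(w[1, 0], w[1, 1], w[1, 2]),   # 0-180 degree scan line
             (w[0, 1], w[1, 1], w[2, 1])]   # 90-270 degree scan line
    second = [(w[0, 0], w[1, 1], w[2, 2]),  # 135-315 degree scan line
              (w[0, 2], w[1, 1], w[2, 0])]  # 45-225 degree scan line
    a = PAIR_VALUE[tuple(sorted(variation_type(*t, delta) for t in first))]
    b = PAIR_VALUE[tuple(sorted(variation_type(*t, delta) for t in second))]
    return a * b
```

A flat window yields the pair ((i), (i)) on both connectivities, i.e. a = b = 1 and TFN = 1, the smallest of the 55 used texture feature numbers.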
Fig. 5. Second-order 4-connectivity: the 45°–225° and 135°–315° scan lines ("×").

Fig. 6. Types (i)–(iv) of gray-level graphical structure variations.

In Eq. (6), Δ denotes the tolerance of variation. Type (i) describes the case where the gray levels of a, b and c are very close, within the tolerance Δ. Type (ii) is the case where the gray-level variation of one pair is within Δ while that of the other pair exceeds Δ. Type (iii) is the case where the gray levels of a, b and c are continuously decreasing or increasing, with gray-level differences larger than Δ. In type (iv), the gray level is either first decreasing then increasing or first increasing then decreasing, with increments and decrements exceeding Δ:

(i) if (|Ga − Gb| ≤ Δ) ∩ (|Gb − Gc| ≤ Δ)
(ii) if [(|Ga − Gb| ≤ Δ) ∩ (|Gb − Gc| > Δ)] ∪ [(|Ga − Gb| > Δ) ∩ (|Gb − Gc| ≤ Δ)]
(iii) if [(Ga − Gb > Δ) ∩ (Gb − Gc > Δ)] ∪ [(Gb − Ga > Δ) ∩ (Gc − Gb > Δ)]
(iv) if [(Ga − Gb > Δ) ∩ (Gc − Gb > Δ)] ∪ [(Gb − Ga > Δ) ∩ (Gb − Gc > Δ)]   (6)

According to this definition of gray-level graphical structure variation, the higher the type number, the greater the gray-level variation. Since both a and b, corresponding to first-order and second-order connectivity respectively, have two scan lines, each of which produces one type of gray-level variation, a and b can each be assigned a pair of gray-level graphical structure variations. The total number of combinations of two variation types (including self-combinations) is 4(4 + 1)/2 = 10; as a result, each a and b can be represented by an entry of Table 1. For example, a = 11 or b = 11 corresponds to a combination of types (ii) and (iii). Finally, the texture feature number of each pixel is generated by taking the product of a and b. Let the gray level of the pixel at spatial location (x, y) be denoted by G(x, y) and the corresponding texture feature number by TFN(x, y). Then

TFN(x, y) = a(x, y) · b(x, y)   (7)

where a(x, y) and b(x, y) are the values obtained from Table 1 for the pixel at spatial location (x, y).

Table 1
The variation generation of first-order (a) or second-order (b) 4-connectivity.

Scan line 1 \ Scan line 2:   (i)   (ii)   (iii)   (iv)
(i)                            1     2      3      5
(ii)                           2     7     11     13
(iii)                          3    11     17     19
(iv)                           5    13     19     23

According to Eq. (7), only 55 texture feature numbers among {1, 2, ..., 529} are actually used; the 529 possible values can therefore be compressed to 55 by removing the unused texture feature numbers and re-labeling the remaining ones as the integers 0 to 54. The texture feature number histogram is then defined by

RP_Δ(n) = RN_Δ(n) / N,   n ∈ {0, 1, 2, ..., 54}   (8)

where Δ is the gray-level variation tolerance given in Eq. (6), RN_Δ(n) is the frequency of occurrence of the texture feature number n, and N is the total number of pixels in the feature image. Furthermore, the texture feature number co-occurrence matrix is generated according to Eq. (9):

P_Δ(i, j | d, θ) = RN_{Δ,d,θ}(i, j) / N_t,   i, j ∈ {0, 1, 2, ..., 54}   (9)

where RN_{Δ,d,θ}(i, j) is defined analogously to the counts in Eq. (8) except that i and j are TFNs rather than gray levels, and N_t is the total number of TFN transitions. In the experiments, the displacement d is chosen to be one and two pixels, and the angular orientation θ covers all eight orientations 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°. Seven texture features are extracted from the texture feature number histogram and the texture feature number co-occurrence matrix under each specific pair of displacement and angular orientation: coarseness (Coarse), homogeneity (Hom), mean convergence (MC), variance (Var), code entropy (CE), code similarity (CS) and resolution similarity (RS).

In summary, each region R of the image yields 72 texture features generated by the four texture analysis methods: 48 features from GLCM, eight from TS, two from FD and the remaining 14 from TFCM.

2.2. Powerful feature selection
Input feature selection is a very important step in supervised or unsupervised learning, especially when the number of observations is relatively small compared to the number of features. Many irrelevant or redundant features may lead to over-fitting and to poor accuracy of the classification system. However, optimal feature selection is a combinatorial problem, since it would require evaluating all combinations of the available features. In this section we introduce the mutual information criterion (Kwak & Choi, 2002) for feature selection. Given a set of n training samples (x_i, c_i) of a d-dimensional feature vector X, with x_i = (f_i1, f_i2, ..., f_id) ∈ R^d, the class labels c_i are treated as samples of a discrete-valued random variable C. In the experiments the number of classes N is 4: c1 is the normal group, c2 tendon inflammation, c3 calcific tendonitis and c4 supraspinatus tear. The objective of the mutual information (MI) criterion is to find the subset S (S ⊂ X) of k features (k < d) that maximizes the mutual information I(S; C). The details of the computation are as follows. In general, the class label takes discrete values while the input features are continuous variables. The mutual information I(S; C) between the selected features S and the class C can be expressed as follows:
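The greedy forward selection used in this section can be sketched as follows. This is a simplified variant: it estimates I(S; C) with a histogram-based discrete estimate rather than the Parzen-window estimate of Kwak and Choi (2002), and the synthetic data in the test are illustrative only:

```python
import numpy as np

def mutual_information(feats, labels, bins=8):
    """Histogram-based estimate of I(S; C) for a feature block `feats`
    (n_samples x n_selected) against discrete class labels."""
    n = len(labels)
    sym = np.zeros(n, dtype=np.int64)
    for j in range(feats.shape[1]):   # combine discretized features into one symbol
        edges = np.histogram_bin_edges(feats[:, j], bins=bins)[1:-1]
        sym = sym * bins + np.digitize(feats[:, j], edges)
    mi = 0.0
    for s in np.unique(sym):
        p_s = np.mean(sym == s)
        for c in np.unique(labels):
            p_sc = np.mean((sym == s) & (labels == c))
            if p_sc > 0:
                mi += p_sc * np.log(p_sc / (p_s * np.mean(labels == c)))
    return mi

def greedy_mi_selection(X, labels, k):
    """Forward greedy search: repeatedly add the feature maximizing I(S; C)."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: mutual_information(X[:, selected + [j]], labels))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With 120 samples of five features where only one feature tracks the four class labels, the search picks that feature first; the joint-histogram estimate degrades as the selected subset grows, which is why the Parzen-window estimate is preferred in the paper.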
I(S; C) = H(C) − H(C | S)   (10)

H(C) = −Σ_{c_i ∈ C} P(c_i) log P(c_i)   (11)

H(C | S) = −∫_s p(s) Σ_{c_i ∈ C} p(c_i | s) log p(c_i | s) ds   (12)

where s denotes instances of the selected features. The entropy H(C) is easily calculated using Eq. (11), but the computation of the conditional entropy H(C | S) is complicated by the difficulty of estimating the probabilities p(s) and p(c | s). Kwak and Choi (2002) proposed a Parzen-window method to calculate p(s) and p(c | s). The nucleus of Parzen-window estimation is that the probability is approximated by a superposition of normalized Gaussian window functions φ(s, h) centered on a set of random samples. Given a set of m k-dimensional selected feature vectors S = {s1, s2, ..., sm}, the estimate of p(s) is given by

p̂(s) = (1/m) Σ_{i=1}^m φ(s − s_i, h) = (1/m) Σ_{i=1}^m [1 / ((2π)^{k/2} h^k |Σ|^{1/2})] exp(−(s − s_i)^T Σ^{−1}(s − s_i) / (2h^2))   (13)

where h is the window-width parameter and Σ is the covariance matrix. For the estimation of p(c | s), the Bayes rule transcribes it into Eq. (14):

P(c | s) = p(s | c) p(c) / p(s)   (14)

Hence, the probability p(s | c) can be expressed as

p(s | c_j) = (1/n_{c_j}) Σ_{s_i ∈ I_{c_j}} φ(s − s_i, h_j)   (15)

where j = 1, 2, ..., N; n_{c_j} is the number of training samples belonging to class c_j; and I_{c_j} is the part of the training feature set belonging to class c_j. Using Eqs. (14) and (15), the conditional probability p(c | s) becomes

P(c | s) = Σ_{s_i ∈ I_c} φ(s − s_i, h_j) / Σ_{i=1}^n φ(s − s_i, h)   (16)

where h_j is the window-width parameter specific to class c_j; in the experiments, h and the h_j are set equal for simplicity. Battiti (1994) applied a greedy forward sequential search to find the most relevant features of an input feature set. In this scheme, starting from an empty set of selected features, the best feature of the current feature set X is added to the selected set S and removed from X, until the mutual information between the selected features and their corresponding class labels is maximized. The complete MI-based algorithm is realized as follows.

Step 1. Let S be empty.
Step 2. Compute H(C) according to Eq. (11) over all training samples.
Step 3. Select the first feature f(1) from the original feature set X that maximizes the MI between S = {f(1)} and the corresponding class labels; set X = X − {f(1)}.
Step 4.
4.1. Tentatively add each feature of X to the selected subset S and compute I(S, C).
4.2. Choose the feature f_j ∈ X that maximizes I(S, C); set X = X − {f_j} and add f_j to S.
4.3. Repeat 4.1 and 4.2 until the mutual information between the selected feature set S and the class labels C is maximized.
Step 5. Output the set S of selected features for later classification.

2.3. Multi-class support vector machines

Multi-class support vector machines (MCSVM) are constructed from many binary support vector machines. We first summarize the binary SVM for classification (Allwein, Schapire, & Singer, 2000); the following subsections describe the two families of MCSVM classifiers, the one-against-all and the one-against-one approaches.

2.3.1. Binary support vector machine

The binary support vector machine was first proposed by Vapnik (1995) and is widely used in pattern recognition applications including credit scoring (Baesens et al., 2003; Huang, Chen, & Wang, 2007), spam categorization (Drucker, Wu, & Vapnik, 1999), bioinformatics (Yu, Ostrouchov, Geist, & Samatova, 2003) and brain tumor classification (Lu et al., 1999). Details of the SVM can be found in Abe (2005) and Hsu, Chang, and Lin (2003). Given a training set of N points {(x_i, y_i)}_{i=1}^N, with data x_i ∈ R^n and corresponding class labels y_i ∈ {+1, −1}, the SVM computes the optimal separating hyperplane with the maximum margin by solving the following optimization problem:

min_{w,b} (1/2) w^T w   subject to   y_i(w^T x_i + b) ≥ 1,   for i = 1, ..., N   (17)

Quadratic optimization is used to find the saddle point of the Lagrange function with Lagrange multipliers a_i:

L_P(w, b, a) = (1/2) w^T w − Σ_{i=1}^N a_i (y_i(w^T x_i + b) − 1),   a_i ≥ 0   (18)

The optimal solution of L_P is the saddle point that minimizes L_P with respect to w and b and maximizes it with respect to the a_i:

∂L_P(w, b, a)/∂w = 0   (19)

∂L_P(w, b, a)/∂b = 0   (20)

a_i {y_i(w^T x_i + b) − 1} = 0,   for i = 1, ..., N   (21)

a_i ≥ 0,   for i = 1, ..., N   (22)

From Eq. (21), either a_i = 0, or a_i ≠ 0 and y_i(w^T x_i + b) = 1 must be satisfied. Eqs. (19) and (20) reduce to the following equations:

w = Σ_{i=1}^N a_i y_i x_i   (23)

Σ_{i=1}^N a_i y_i = 0   (24)

A datum x_i is called a support vector when its corresponding a_i is positive; geometrically, the support vectors are the points closest to the optimal hyperplane. Eqs. (19)–(22) satisfy the Karush–Kuhn–Tucker (KKT) conditions, and substituting Eqs. (23) and (24) into Eq. (18) transforms the problem into the dual Lagrangian L_D(a):

max_a L_D(a) = Σ_{i=1}^N a_i − (1/2) Σ_{i,j=1}^N a_i a_j y_i y_j ⟨x_i · x_j⟩   (25)

subject to a_i ≥ 0 and Σ_{i=1}^N a_i y_i = 0, for i = 1, ..., N. The solution a_i of Eq. (25) determines the parameters w and b of the optimal hyperplane. Thus, the decision function f(x) can be expressed in the following formula:
f(x) = sgn(Σ_{i=1}^N y_i a_i ⟨x_i · x⟩ + b)   (26)

This formulation is called the hard-margin support vector machine. However, the hard-margin SVM does not work well when the data x_i are not linearly separable. One improvement is the nonlinear SVM, which maps the training data from the input space into a higher-dimensional dot-product feature space via a mapping function φ(x) satisfying Mercer's theorem. For simplicity, a kernel function k(x_i, x_j) = φ(x_i) · φ(x_j) is used to replace the inner products. The nonlinear SVM dual Lagrangian L_D(a) can be expressed as

max_a L_D(a) = Σ_{i=1}^N a_i − (1/2) Σ_{i,j=1}^N a_i a_j y_i y_j k(x_i, x_j)   (27)

subject to 0 ≤ a_i ≤ C, i = 1, ..., N, and Σ_{i=1}^N a_i y_i = 0.

C is the margin parameter that determines the trade-off between maximizing the margin and minimizing the classification error. The decision function f(x) is obtained by following the same steps as for the hard-margin support vector machine:

f(x) = sgn(Σ_{i=1}^N y_i a_i k(x, x_i) + b)   (28)

The radial basis function (RBF) of Eq. (29) is used as the kernel function, with the support vectors x_i as the centers of the radial basis functions:

k(x_i, x_j) = exp{−γ ‖x_i − x_j‖^2}   (29)

2.3.2. Multi-class support vector machines (MCSVM)

This section briefly explains the five multi-class support vector machines, the original one-against-all SVM, the one-against-all fuzzy SVM, the one-against-all decision-tree-based SVM, the original one-against-one SVM and the one-against-one directed acyclic graph SVM, applied to classify the ultrasonic supraspinatus images for comparison.

(a) Original one-against-all (OAA) SVM. The earliest implementation of the MCSVM classifier is the one-against-all method. The problem of the OAA SVM with discrete decision functions is the existence of unclassifiable regions, which lead to large numbers of erroneous classifications. This implementation converts a k-class MCSVM into k binary SVM problems: the ith SVM is trained with all examples of the ith class given a positive label and all other examples a negative label. Suppose there is a training set D of N training examples {(x_i, y_i)}_{i=1}^N, where x_i ∈ R^n has a class label y_i ∈ {1, ..., k}. The SVM of the ith class solves the following problem:

min_{w^i, b^i, ξ^i}   (1/2)(w^i)^T w^i + C Σ_{j=1}^N ξ^i_j
subject to   (w^i)^T φ(x_j) + b^i ≥ 1 − ξ^i_j   if y_j = i,
(w^i)^T φ(x_j) + b^i ≤ −1 + ξ^i_j   if y_j ≠ i,
ξ^i_j ≥ 0,   j = 1, ..., N   (30)

A datum x_j is assigned to the class c_i with the maximal value of the decision function, based on Eq. (31):

c_i = arg max_{i=1,...,k} ((w^i)^T φ(x_j) + b^i)   (31)

(b) One-against-all fuzzy SVM (OAA-FSVM). Two approaches, the fuzzy support vector machine (FSVM) (Wang & Chiang, 2007) and the decision-tree-based (DTB) method (Polat & Gunes, 2008; Pontil & Verri, 1998; Takahashi & Abe, 2002), have been proposed to improve classification with the discrete decision functions of one-against-all support vector machines. The main difference between the original OAA SVM and the FSVM is that the cost C of the FSVM is weighted by fuzzy membership values s_ij (0 ≤ s_ij ≤ 1). More precisely, Eq. (30) is transformed into Eq. (32):

min_{w^i, b^i, ξ^i}   (1/2)(w^i)^T w^i + C Σ_{j=1}^N s_ij ξ^i_j
subject to   (w^i)^T φ(x_j) + b^i ≥ 1 − s_ij ξ^i_j   if y_j = i,
(w^i)^T φ(x_j) + b^i ≤ −1 + s_ij ξ^i_j   if y_j ≠ i,
ξ^i_j ≥ 0,   j = 1, ..., N,   0 ≤ s_ij ≤ 1   (32)

In the classification of the 120 supraspinatus images into the four disease groups, we first use four bipolar target variables v_ij to indicate the target value of training image i in class j, by Eq. (33); the union of the v_ij is denoted by the matrix V, whose columns contain the positive (v_ij = 1) and negative (v_ij = −1) examples:

v_ij = 1 if the ith image belongs to the jth class, and −1 otherwise,   for i = 1, ..., 120; j = 1, ..., 4   (33)

Secondly, we define a fuzzy membership matrix S for the implementation of the OAA-FSVM classifier. Generally, a higher weight s⁺_ij is assigned to each positive example and a lower weight s⁻_ij to each negative example. The membership s_ij can therefore be defined as

s_ij = s⁺_ij if v_ij = 1;   s_ij = s⁻_ij if v_ij = −1   (34)

As far as the fuzzy memberships s_ij of the training data are concerned in the experiments, s_ij is set to 1 if v_ij = 1 and to h if v_ij = −1.

(c) One-against-all decision-tree-based (OAA-DTB) SVM. Takahashi and Abe (2002) published four different decision-tree constructions for improving the performance of the original OAA support vector machine. In the experiments we adopt the type-2 method to construct the decision tree, as follows.

Step 1. Initially, assume that all the classes belong to different clusters and compute the cluster centers c_i and the distance measures d_ij between clusters i and j (i, j = 1, ..., k) by Eqs. (35) and (36):

c_i = (1/|X_i|) Σ_{x ∈ X_i} x,   where X_i is the set of training data belonging to class i   (35)

d_ij = ‖c_i − c_j‖   (36)

Step 2. Select the two clusters with the smallest distance measure and merge them into one new cluster.
8130
M.-H. Horng / Expert Systems with Applications 36 (2009) 8124–8133
Step 3. Repeat step 2 until all clusters are merged into two clusters. Step 4. Calculate the optimal hyperplane that separates the clusters generated in step 3. 0 Step 5. If the separated cluster in Step 4 has k (>2) classes, we regard the classes as belonging to different clusters and repeat 0 0 Step 2 with k 2 time and go to Step 4. If k ¼ 2, compute the optimal hyperplane that separates the two classes. If no cluster has more than one class, terminate the algorithm. (d) One-against-one voting based (OAO-VB) SVM. Another approach is called the one-against-one MCSVM (Krebel, 1999). Although this approach can effectively decrease the unclassifiable regions that occur in the one-against-all SVMs, the unclassifiable regions still binary exists. Unlike the OAA MCSVM, this method constructs kðk1Þ 2 support vector machines. All training data xt belong to one of the ith and jth classes are used to construct a binary SVM to obtain the optimal solution of the wij, bij and nijt
Min
ij
wji ;bji ;nt
N X 1 ij T ij nijt ðw Þ w þ C 2 t¼1 ij
subject to ðwij ÞT /ðxt Þ þ b P 1 nijt ij
ij T
ðw Þ /ðxt Þ þ b 6 1 þ
nijt
if yt ¼ i
ð37Þ
if yt ¼ j
nijt P 0
ij
The class of data xt ¼ arg maxððwij ÞT /ðxt Þ þ b Þ i;j
ð38Þ
where wij is the N-dimensional vector, u(x) is a mapping function that maps data x into the N-dimensional feature space, bij is the bias term. In practice, a voting scheme associated with the one-againstone approach is used to decide the class of data x. The decision region Ri of ith class is defined for explicit explanation of the voting scheme as follows: ij
Ri ¼ fxjDij ðxÞ > 0; j ¼ 1; . . . ; k; j–ig
ð39Þ
If test data x is in the Ri, we classify x into ith class. Otherwise, we classify x by a voting scheme according to the results of calculating Eq. (39), and further assign the test data x into class j* satisfies the Eqs. (40) and (41).
Di ðxÞ ¼
k X
signðDij ðxÞÞ
ð40Þ
j–i;j¼1
j ¼ arg max Dj ðxÞ j¼1;...;k
For simplification, let the method 1 be the original one-againstall (OAA) SVM, the method 2 be the OAA-FSVM, the method 3 be the OAA-DTB. The three methods belong to the one-against-all approach. Others include two methods of one-against-one approach that are the voting-based (OAO-VB) and directed acyclic graph (OAO-DAG) approaches. In experiments, we will discuss the performance comparisons applying the five methods to disease classification of the ultrasonic supraspinatus images. 3. The experimental results and performance evaluation In this section, many different performance indices including total classification accuracy, sensitivity, specificity and false-negative rate based on the five-fold cross-validation methods are used to evaluate the classification of supraspinatus images by using the five MCSVM implementations. Furthermore, the multi-class receiver operating characteristic (ROC) (Fawcett, 2006) analysis is also applied to analyze the discriminative capability of the five MCSVM methods. 3.1. Data acquisition
The decision function for ith class against jth class is defined with the maximum margin as following equation, that is:
Dij ðxÞ ¼ ðwij ÞT /ðxÞ þ b
Step 3. Eliminate the head or the tail classes from the list depending on the decision of the SVM. Step 4. Repeat steps 2 and 3 until the only one class remains in the list.
ð41Þ
(e) One-against-one directed acyclic graph (OAO-DAG) SVM. Recently, a directed acyclic graph (DAG) support vector machine (Maruyama, Maruyama, Miyao, & Nakano, 2004) is proposed to speed up the evaluation time keeping good performance and fast learning time of on-against-one MCSVM approaches. The DAG nodes method is a decision graph in essential. It contains kðk1Þ 2 and each node corresponds to a binary SVM. Starting from the root node, the graph is traversed down to leaf by evaluating a SVM at each node. Platt, Cristianini, and Shawe-Taylor (2000) and Maruyama et al. (2004) proposed a list manipulation algorithm to traverse the directed acyclic graph. The procedure of list manipulation is described as follows: Step 1. Make the list of all possible classes. The order of the classes is arbitrary. Step 2. Compare the head and the tail classes in the list.
The ultrasonic image database was recorded from 2004 to 2007, and the ages of the patients ranged from 30 to 65 years. The 120 acquired images, divided equally into the four classes, were captured at National Cheng Kung University Hospital with an HDI Ultramark 5000 ultrasound system (ATL Ultrasound, CA, USA) and an ATL linear array probe (dynamic focusing). The captured images were digitized into 256 × 256 pixels with 256 gray levels via a frame grabber. A region of interest of 30 × 60 pixels, at a depth of about 1.5–2.5 cm below the body surface, was selected from each supraspinatus image for texture feature extraction. Fig. 7a–d shows images of the normal, tendon inflammation, calcific tendonitis and supraspinatus tear cases. In the experiments, all programs were implemented in Visual C++ together with the LIBSVM toolbox (Chih-Jen Lin's Homepage, 2008) for Matlab.

3.2. Powerful feature selection

Compared with the number of training data, a large number of features is likely to bring about poor classification. Consequently, the mutual information (MI) criterion is used to reduce the dimension of the texture features. The powerful features selected from the 72 candidate features by the MI criterion are listed in Table 2: Sum Average (GLCM), Sum Variance (GLCM), Mean Convergence (TFCM), Contrast (GLCM) and Difference Variance (GLCM). It is worth noting that all of the selected powerful features are generated by GLCM and TFCM.

3.3. Classification accuracy, sensitivity and specificity analysis

The k-fold cross-validation method is a popular way to analyze classification performance. In the experiments, all supraspinatus images are divided into five subsets with equal numbers of images from each disease group. One of the five subsets is selected as the test set and the other four subsets are put together to form a training set. More precisely, every supraspinatus image appears in a test set exactly once and in a training set four times. The average accuracy across all five trials is computed.
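The five-fold protocol just described can be sketched as follows. This is a minimal illustration under stated assumptions: the labels (four disease groups of 30 images each) are hypothetical placeholders for the paper's data, and the per-class round-robin split is one simple way to realize the equal-per-class folds:

```python
# Five-fold cross-validation with an equal share of every class per fold,
# so each image appears in exactly one test set and four training sets.

def stratified_five_folds(labels):
    """Split sample indices into 5 folds, each holding an equal share of every class."""
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(5)]
    for indices in by_class.values():
        # deal this class's samples round-robin across the five folds
        for pos, idx in enumerate(indices):
            folds[pos % 5].append(idx)
    return folds

labels = [c for c in range(4) for _ in range(30)]  # hypothetical: 4 groups x 30 images
folds = stratified_five_folds(labels)

for test in folds:
    train = [i for f in folds if f is not test for i in f]
    # train the MCSVM on `train`, evaluate on `test`; the five accuracies are averaged
```

Each fold ends up with 24 images, six from each disease group, mirroring the equal-sized subsets described above.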
Fig. 7. Samples of ultrasonic supraspinatus images: (a) a normal case; (b) tendon inflammation; (c) calcific tendonitis; (d) supraspinatus tear.
Table 2. Powerful texture features selected from those generated by the GLCM, TS, FD and TFCM methods.

Feature selection method: Mutual information criterion
Features selected: Sum average (GLCM, d = 2); Sum variance (GLCM, d = 2); Mean convergence (TFCM, d = 2); Contrast (GLCM, d = 3); Difference variance (GLCM, d = 2)
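The mutual-information ranking behind Table 2 can be sketched as follows. This is a toy illustration, not the paper's data: it assumes the texture features have already been discretized into bins, and uses two hypothetical features, one informative and one not:

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Empirical mutual information I(F;Y) between a discretized feature and the class label."""
    n = len(labels)
    pf, py = Counter(feature), Counter(labels)
    pfy = Counter(zip(feature, labels))
    return sum((c / n) * math.log((c / n) / ((pf[f] / n) * (py[y] / n)))
               for (f, y), c in pfy.items())

# Hypothetical discretized features: A tracks the class label, B is independent of it.
labels = [0, 0, 1, 1, 2, 2, 3, 3] * 4
feature_a = list(labels)                         # perfectly informative
feature_b = [i % 2 for i in range(len(labels))]  # carries no class information

features = {"A": feature_a, "B": feature_b}
ranked = sorted(features, key=lambda name: -mutual_information(features[name], labels))
```

Ranking features by this score and keeping the top few is the essence of the MI criterion; feature A ranks first here because its mutual information with the label equals the label entropy, while feature B's is zero.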
Sensitivity: number of true positive decisions / number of actually positive cases.
Specificity: number of true negative decisions / number of actually negative cases.
Total classification accuracy: number of correct decisions / total number of cases.

Among the three indices, sensitivity and specificity are the references of greatest concern to clinical doctors. To investigate thoroughly the classification performance obtained with the previously selected feature set, the total classification accuracy, sensitivity and specificity of the five MCSVM classifiers are computed under five-fold cross-validation. In addition, an effective classification method should lower the possibility of misclassification, especially the false-negative rate: the probability that a patient is classified with a less severe disease than the actual diagnosis. A high false-negative rate represents the danger of underestimating the disease severity of a patient when clinical doctors rely on the classification. The false-negative rate is therefore also considered as an index for evaluating the performance of the five MCSVM classifiers. The confusion matrices of the five MCSVM methods are listed in Table 3, and Table 4 shows the four classification performance indices of the five methods.

Table 3. The confusion matrices using the features listed in Table 2 under the five different MCSVM methods, based on five-fold cross-validation. Rows give the actual class; columns give the classified class (Tear, Calcific, Inflammation, Normal).

Method 1 – OAA
Tear           26   2   2   0
Calcific        1  24   4   1
Inflammation    0   3  23   4
Normal          0   2   5  23

Method 2 – OAA-FSVM (h = 0.5)
Tear           30   0   0   0
Calcific        2  26   2   0
Inflammation    0   1  27   2
Normal          0   2   3  25

Method 3 – OAA-DTB
Tear           29   1   0   0
Calcific        1  25   3   1
Inflammation    0   2  25   3
Normal          0   2   2  26

Method 4 – OAO-VB
Tear           26   3   0   1
Calcific        1  24   2   3
Inflammation    0   3  25   2
Normal          0   2   4  24

Method 5 – OAO-DAG
Tear           29   1   0   0
Calcific        3  25   1   1
Inflammation    1   2  24   3
Normal          0   3   3  24
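The per-class indices reported below can be derived directly from such a confusion matrix. A minimal sketch, with the OAA-FSVM (h = 0.5) matrix from Table 3 hard-coded (rows are the actual classes, columns the classified classes):

```python
# Per-class sensitivity and specificity, plus total accuracy, from a confusion matrix.
classes = ["tear", "calcific", "inflammation", "normal"]
cm = [
    [30,  0,  0,  0],   # actual tear
    [ 2, 26,  2,  0],   # actual calcific
    [ 0,  1, 27,  2],   # actual inflammation
    [ 0,  2,  3, 25],   # actual normal
]
total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(classes))) / total

sensitivity, specificity = {}, {}
for i, name in enumerate(classes):
    tp = cm[i][i]
    fn = sum(cm[i]) - tp                              # actual i, classified elsewhere
    fp = sum(cm[r][i] for r in range(len(cm))) - tp   # others classified as i
    tn = total - tp - fn - fp
    sensitivity[name] = tp / (tp + fn)
    specificity[name] = tn / (tn + fp)
```

For this matrix the total accuracy is 108/120 = 0.90 and the tear sensitivity is 1.0, in agreement with the method 2 figures reported in Table 4.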
Table 4. The performance indices of the five MCSVM methods in supraspinatus image classification.

Method 1 – OAA
  Supraspinatus tear: sensitivity 86.67%, specificity 95.56%
  Calcific tendonitis: sensitivity 80.00%, specificity 93.33%
  Tendon inflammation: sensitivity 76.67%, specificity 92.22%
  Normal: sensitivity 76.67%, specificity 92.22%
  False-negative rate 10.83%; total classification accuracy 80.00%

Method 2 – OAA-FSVM (h = 0.5)
  Supraspinatus tear: sensitivity 100%, specificity 100%
  Calcific tendonitis: sensitivity 86.67%, specificity 95.56%
  Tendon inflammation: sensitivity 90.00%, specificity 96.67%
  Normal: sensitivity 83.33%, specificity 94.45%
  False-negative rate 3.33%; total classification accuracy 90.00%

Method 3 – OAA-DTB
  Supraspinatus tear: sensitivity 96.67%, specificity 98.89%
  Calcific tendonitis: sensitivity 83.33%, specificity 94.45%
  Tendon inflammation: sensitivity 83.33%, specificity 94.45%
  Normal: sensitivity 86.67%, specificity 95.56%
  False-negative rate 6.66%; total classification accuracy 87.50%

Method 4 – OAO-VB
  Supraspinatus tear: sensitivity 86.67%, specificity 95.56%
  Calcific tendonitis: sensitivity 80.00%, specificity 93.33%
  Tendon inflammation: sensitivity 83.33%, specificity 94.45%
  Normal: sensitivity 80.00%, specificity 93.33%
  False-negative rate 9.17%; total classification accuracy 82.50%

Method 5 – OAO-DAG
  Supraspinatus tear: sensitivity 96.67%, specificity 98.89%
  Calcific tendonitis: sensitivity 83.33%, specificity 94.45%
  Tendon inflammation: sensitivity 80.00%, specificity 93.33%
  Normal: sensitivity 80.00%, specificity 92.22%
  False-negative rate 5.00%; total classification accuracy 85.00%

3.4. Multi-class receiver operating characteristics (ROC) analysis

The multi-class ROC curve (Diri & Albayrak, 2008) manifests the relationship between the true-positive fraction (TPF) and the false-positive fraction (FPF) as the parameters (C, \gamma) vary. In general, the area under the ROC curve (AUC), Az, is a powerful index for assessing the classification performance of a classifier. Hand and Till (2001) proposed computing the AUC of a k-class classification by Eq. (42); nevertheless, not all of the five methods contain k(k-1)/2 binary SVMs, so Eq. (42) is modified into Eq. (43):

AUC_{average} = \frac{2}{k(k-1)} \sum_{c_i, c_j \in C} AUC(c_i, c_j)    (42)

where k is the number of classes and c_i is the ith disease group of the supraspinatus.

AUC_{average} = \frac{1}{\text{number of binary SVMs}} \sum_{c_i, c_j \in C} AUC(c_i, c_j)    (43)

where c_i and c_j are the two classes separated by any binary SVM. Fig. 8 shows the ROC curves of the five MCSVM methods, and the corresponding AUC values are listed in Table 5.

Fig. 8. ROC curves of the five MCSVM classifiers.

Table 5. The Az of the ROC curves of the five different 4-class support vector machines with variation of (C, \gamma).

Method:    OAA     OAA-FSVM (h = 0.5)   OAA-DTB   OAO-VB   OAO-DAG
Az value:  0.876   0.943                0.921     0.901    0.889

4. Discussion and conclusion

Based on the present experiments on the classification of ultrasonic supraspinatus images, the following points can be emphasized:

1. All of the selected texture features are generated by the GLCM and TFCM methods. This result reveals that the discriminative capability of the features generated by GLCM and TFCM is superior to that of the other methods. A similar result was found in the classification of ultrasonic liver images in our past work (Horng et al., 2002).
2. The five multi-class support vector machines, OAA, OAA-FSVM, OAA-DTB, OAO-VB and OAO-DAG, are used to classify the ultrasonic supraspinatus images for comparison. The total classification accuracy of the OAA-FSVM method, 90%, is significantly higher than that of the other four methods; in particular, the classification of supraspinatus tear is completely correct. Moreover, the false-negative rate of OAA-FSVM is only 3.33%, which suggests that the method is relatively safe for use in disease diagnosis. The average sensitivity and specificity of OAA-FSVM, 0.90 and 0.967, are likewise superior to those of the other four methods. In summary, the OAA-FSVM classification system shows great potential for diagnosing supraspinatus diseases.
3. The AUCs of the ROC curves of OAA, OAA-FSVM, OAA-DTB, OAO-VB and OAO-DAG are 0.876, 0.943, 0.921, 0.901 and 0.889, respectively. The AUC of OAA-FSVM is the largest, while that of OAA is the smallest. This result shows that OAA-FSVM succeeds in the classification of ultrasonic supraspinatus images largely because it decreases the problem of unclassifiable regions.
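The averaging in Eq. (43) can be sketched with a rank-based pairwise AUC in the spirit of Hand and Till (2001). The decision values below are hypothetical placeholders, and only two of the binary machines are shown; the point is the pairwise AUC and its macro-average over the binary SVMs actually built:

```python
# Macro-average of pairwise AUCs over the binary SVMs, as in Eq. (43).

def auc_binary(scores_pos, scores_neg):
    """AUC as the Wilcoxon rank statistic: P(score_pos > score_neg), ties counting 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def average_auc(pairwise_scores):
    """pairwise_scores maps each class pair (ci, cj) to the decision values of its binary SVM."""
    aucs = [auc_binary(pos, neg) for pos, neg in pairwise_scores.values()]
    return sum(aucs) / len(aucs)

# Hypothetical decision values for two of the binary machines:
pairs = {
    ("tear", "calcific"): ([0.9, 0.8, 0.7], [0.4, 0.6, 0.2]),
    ("tear", "normal"):   ([0.9, 0.5, 0.7], [0.5, 0.1, 0.3]),
}
overall = average_auc(pairs)
```

Dividing by the number of binary SVMs rather than by k(k-1)/2 is exactly the adjustment that Eq. (43) makes for methods that do not build every pairwise machine.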
Finally, some conclusions can be drawn. This article compares the performance of five different MCSVM implementations for the disease classification of ultrasonic supraspinatus images. The experimental results show that OAA-FSVM is the most effective implementation because of its use of fuzzy membership values; it significantly improves on the original OAA MCSVM method for classifying the supraspinatus images. As for the OAO-VB and OAO-DAG methods, the experimental results verify that they do not work as well for this classification task; how to make the original one-against-one support vector machine more effective therefore remains an interesting subject for future work. Although the OAA-FSVM method combined with the GLCM and TFCM features has been demonstrated to be effective for the classification of ultrasonic supraspinatus images, how to integrate the merits of other texture analysis methods, such as the wavelet transform (Arivazhagan, Ganesan, & Priyal, 2006; Sengur, 2008) and the Gabor transform (Al-Rawi & Yang, 2007), is also an interesting topic.

Acknowledgement

The author would like to thank the National Science Council, ROC, for supporting this work under Grant No. NSC 94-2213-E-251-005.

References

Abe, S. (2005). Support vector machines for pattern classification. Springer-Verlag.
Allwein, E., Schapire, R., & Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. AT&T Corp. Press.
Al-Rawi, M., & Yang, J. (2007). Using Gabor filter for the illumination invariant recognition of color texture. Mathematics and Computers in Simulation (available online).
Arivazhagan, S., Ganesan, L., & Priyal, S. P. (2006). Texture classification using Gabor wavelets based rotation invariant features. Pattern Recognition Letters, 27, 1976–1982.
Arslan, G., Apaydin, A., Kabaalioglu, A., Sindel, T., & Luleci, E. (1999). Sonographically detected subacromial/subdeltoid bursal effusion and biceps tendon sheath fluid: Reliable signs of rotator cuff tear? Journal of Clinical Ultrasound, 27, 335–339.
Baesens, B., Van Gestel, T., Viaene, S., Stepanova, M., Suykens, J., & Vanthienen, J. (2003). Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the Operational Research Society, 54, 627–635.
Battiti, R. (1994). Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5, 537–550.
Chang, S., & Partridge, D. (2003). Feature ranking and best feature subsets using mutual information. Neural Computation, 13, 1925–1933.
Chen, X. W. (2003). An improved branch and bound algorithm for feature selection. Pattern Recognition Letters, 24, 1925–1933.
Chen, D. R., Chang, R. F., Kuo, W. J., Chen, M. C., & Huang, Y. L. (2002). Diagnosis of breast tumors with sonographic texture analysis using wavelet transform and neural networks. Ultrasound in Medicine & Biology, 28, 1301–1310.
Chen, S. Y., & Lin, C. J. (2005). Combining SVMs with various feature selection strategies. Available from .
Chih-Jen Lin's Homepage. Accessed: February, 2008.
Chiou, H. J., Chou, Y. H., Wu, J. J., Hsu, C. C., Tiu, C. M., & Chang, C. Y. (1999). High-resolution ultrasonography of the musculoskeletal system: Analysis of 369 cases. Journal of Ultrasound in Medicine, 7, 212–218.
Diri, B., & Albayrak, S. (2008). Visualization and analysis of classifiers performance in multi-class medical data. Expert Systems with Applications, 34, 628–634.
Drucker, H., Wu, D., & Vapnik, V. N. (1999). Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 10, 1048–1054.
Dubuc, D., Zucker, S. W., Teicott, C., Quiniou, J. F., & Wehbi, D. (1989). Evaluating the fractal dimension of surfaces. Journal of the Optical Society of America, A7, 1124–1130.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874.
Hand, D. J., & Till, R. J. (2001). A simple generalization of the area under the ROC curve for multiple class classification. Machine Learning, 45, 171–186.
Haralick, R. M., Shanmugam, K., & Dinstein, I. (1973). Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, 3, 610–621.
He, D. C., & Wang, L. (1991). Texture features based on texture spectrum. Pattern Recognition, 24, 391–399.
Horng, M. H. (2007). An ultrasonic image evaluation system for assessing the severity of chronic liver disease. Computerized Medical Imaging and Graphics, 31, 485–491.
Horng, M. H., Sun, Y. N., & Lin, X. Z. (2002). Texture feature coding method for classification of liver sonography. Computerized Medical Imaging and Graphics, 26, 33–42.
Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector classification. Available from .
Hsu, C. W., & Lin, C. J. (2002). A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13, 415–425.
Huang, C. L., Chen, M. C., & Wang, C. J. (2007). Credit scoring with a data mining approach based on support vector machines. Expert Systems with Applications, 33, 847–856.
Kakkos, S. K., Stevens, J. M., Nicolaides, A. N., Kyriacou, E., Pattichis, C. S., Geroulakos, G., et al. (2007). Texture analysis of ultrasonic images of symptomatic carotid plaques can identify those plaques associated with ipsilateral embolic brain infarction. European Journal of Vascular and Endovascular Surgery, 33, 422–429.
Keller, J. M., Chen, S., & Crownover, R. M. (1989). Texture description and segmentation through fractal geometry. Computer Vision, Graphics, and Image Processing, 45, 150–166.
Krebel, U. (1999). Pairwise classification and support vector machines. In B. Schölkopf, C. J. C. Burges, & A. J. Smola (Eds.), Advances in kernel methods: Support vector learning. Cambridge, MA: MIT Press.
Kudo, K., & Sklansky, J. (2000). Comparison of algorithms that select features for pattern classifiers. Pattern Recognition, 33, 25–41.
Kwak, N., & Choi, C. H. (2002). Input feature selection by mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 1667–1671.
Lu, C., Van Gestel, T., Suykens, J. A. K., Van Huffel, S., Vergote, I., & Timmerman, D. (1999). Preoperative prediction of malignancy of ovarian tumor using least squares support vector machines. Artificial Intelligence in Medicine, 28, 281–286.
Mandelbrot, B. B. (1977). The fractal geometry of nature. San Francisco, CA: Freeman.
Mao, K. Z. (2004). Orthogonal feature selection and back elimination algorithms for feature subset selection. IEEE Transactions on Systems, Man, and Cybernetics, 34, 629–634.
Marcello, H. N.-B., Jose, B. V., Jorge, E. J., & Gerson, M. (2002). Diagnostic imaging of shoulder rotator cuff lesions. Acta Ortopédica Brasileira, 10, 31–39.
Maruyama, K., Maruyama, M., Miyao, H., & Nakano, Y. (2004). A method to make multiple hypotheses with high cumulative recognition rate using SVMs. Pattern Recognition, 37, 241–251.
Neer, C. S. II (1972). Anterior acromioplasty for the chronic impingement syndrome in the shoulder: A preliminary report. The Journal of Bone and Joint Surgery, 54, 41–50.
Oh, I. S., Lee, J. S., & Moon, B. R. (2004). Hybrid genetic algorithms for feature selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 1424–1437.
Platt, J. C., Cristianini, N., & Shawe-Taylor, J. (2000). Large margin DAGs for multiclass classification. Advances in Neural Information Processing Systems, 12, 547–553.
Polat, K., & Gunes, S. (2008). A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems. Expert Systems with Applications (available online December 2007).
Pontil, M., & Verri, A. (1998). Support vector machines for 3-D object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 637–646.
Sengur, A. (2008). Wavelet transform and adaptive neuro-fuzzy inference system for color texture classification. Expert Systems with Applications, 34, 2120–2128.
Takahashi, F., & Abe, S. (2002). Decision-tree-based multiclass support vector machines. In Proceedings of the ninth international conference on neural information processing (ICONIP'02).
Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer.
Wang, T. Y., & Chiang, H. M. (2007). Fuzzy support vector machine for multi-class text categorization. Information Processing and Management, 43, 914–929.
Wu, T. F., Lin, C. J., & Weng, R. C. (2004). Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5, 975–1005.
Yu, G. X., Ostrouchov, G., Geist, A., & Samatova, N. F. (2003). An SVM-based algorithm for identification of photosynthesis-specific genome features. In Second IEEE computer society bioinformatics conference (pp. 253–243).
Zhang, H., & Sun, G. (2002). Feature selection using tabu search method. Pattern Recognition, 35, 701–711.
Zhang, J. H., Wang, Y. Y., Dong, Y., & Wang, Y. (2007). Ultrasonographic feature selection and pattern classification for cervical lymph nodes using support vector machines. Computer Methods and Programs in Biomedicine, 88, 75–84.