Expert Systems with Applications 36 (2009) 8124–8133
Multi-class support vector machine for classification of the ultrasonic images of supraspinatus

Ming-Huwi Horng

National PingTung Institute of Commerce, Department of Information Technology, No. 51, MinSheng E. Rd., Pingtung 900, Taiwan
Keywords: Ultrasonic supraspinatus images; Mutual information criterion; Multi-class support vector machine; Feature selection
Abstract

This article applies multi-class support vector machine classifiers to classify ultrasonic supraspinatus images into four disease groups: normal, tendon inflammation, calcific tendonitis and supraspinatus tear. The supraspinatus tendon is often involved in these diseases. Four texture analysis methods, the texture feature coding method, the gray-level co-occurrence matrix, fractal dimension evaluation and the texture spectrum, are used to extract tissue-characteristic features from the ultrasonic supraspinatus images. In the training stage, the mutual information criterion is adopted to select powerful features from those generated by the four texture analysis methods; in the classification stage, five implementations of multi-class support vector machine classifiers are designed to assign each image to one of the four disease groups. In the experiments, the most commonly used performance measures, sensitivity, specificity, classification accuracy and false-negative rate, are applied to evaluate the five implementations, and receiver operating characteristic analysis is used to analyze their classification capability. The results demonstrate that the multi-class fuzzy support vector machine implementation achieves 90% classification accuracy, with performance measures significantly superior to the others. © 2008 Elsevier Ltd. All rights reserved.
1. Introduction

Shoulder pain is a common orthopedic problem. Clinical history and examination are essential in diagnosing the origin of shoulder pain. The most common causes of shoulder pain are rotator cuff diseases, which can be due to impingement syndrome or tearing of the rotator cuff tendons. The rotator cuff is made up of four muscles and their associated tendons, which surround the humeral head of the shoulder and originate at the scapula. The four muscles are the supraspinatus, infraspinatus, subscapularis and teres minor. Among them, the supraspinatus is the most prone to injury, and the most likely to tear, during strenuous exercise. Accordingly, clinical physicians routinely examine the supraspinatus for signs of injury. Ultrasonography is the chief imaging modality for assessing the supraspinatus. Several gray-scale and texture features have been identified in the past for differentiating diseases of the supraspinatus. Arslan, Apaydin, Kabaalioglu, Sindel, and Luleci (1999) used a 7.5-MHz linear-array transducer to acquire longitudinal-view images of patients with rotator cuff injury. Their study demonstrated that bursa fluid and biceps effusion were highly correlated with symptoms of rotator cuff injury. Chiou et al. (1999)
E-mail address: [email protected]. doi:10.1016/j.eswa.2008.10.030
proposed diagnostic criteria for classifying full and partial supraspinatus tears in patients with shoulder pain; the sensitivity and specificity obtained with these criteria were 0.98 and 0.87. Marcello, Jose, Jorge, and Gerson (2002) diagnosed rotator cuff tears using the criteria that (a) one or more cuff tendons were not visible, (b) one of the tendons was focally non-visible, or (c) the tendons showed a well-defined discontinuity defect. However, all of these studies focused on the diagnosis of rotator cuff tears; the classification of other impingement syndromes of the rotator cuff, such as inflammation and tendon calcification, remains largely unaddressed. The popular classification system of Neer (1972) is widely used in clinical diagnosis to divide impingement syndrome diseases of the supraspinatus into three stages. Inflammation manifestations such as edema or hemorrhage usually appear in the supraspinatus in Stage 1; all ultrasonic images of this stage are therefore collected into the Inflammation Group for later classification. The supraspinatus in Stage 2 is more seriously affected and considered irreversible; fibrosis and calcification typically appear in the supraspinatus, so the images of this stage are regarded as the Calcific Group. Stage 3 generally occurs in patients over 50 years old and frequently involves a tendon rupture or tear; a rupture of the supraspinatus tendon may require surgical repair. The images of Stage 3 are referred to as the Tear Group.
Texture-based measures have been applied to ultrasound images for over a decade. Chen, Chang, Kuo, Chen, and Huang (2002) utilized the wavelet transform and neural networks to classify breast tumors. Horng (2007) applied the texture feature coding method and the gray-level co-occurrence matrix to extract correlated features for the classification of ultrasonic liver images. Kakkos et al. (2007) used spatial gray-level dependence matrices and gray-level run-length statistics to discriminate diseases of symptomatic carotid plaques. Zhang, Wang, Dong, and Wang (2007) applied sonographic features such as size, shape and echogenicity to classify cervical lymph nodes with a support vector machine. This article presents a new attempt to integrate different texture analysis methods to analyze the characteristics of ultrasonic supraspinatus images for classification. Numerous features can be extracted by these texture analysis methods to measure the different patterns of supraspinatus images, but not all of them are significant for this application; most may be redundant or even irrelevant. As the number of features grows, the number of training samples needed to estimate the model parameters usually grows exponentially. Therefore, the classification of supraspinatus images requires a feature selection algorithm to reduce the dimensionality of the feature space. In general, only an exhaustive search can guarantee an optimal solution to the feature selection problem, but it is time-consuming. Search techniques and feature saliency measures are two alternative approaches for obtaining a suboptimal solution. Many search techniques have been proposed, including sequential forward selection and backward elimination (Mao, 2004), forward floating search (Oh, Lee, & Moon, 2004), genetic algorithms (Zhang & Sun, 2002), and branch-and-bound search (Chen, 2003).
As for feature saliency measures, several popular methods have been developed, for example the maximization of the mutual information between the feature space and the classes (Chang & Partridge, 2003), the Fisher index (Kudo & Sklansky, 2000) and the F-score ranking method (Chen & Lin, 2005). However, these methods usually need to compare a large number of candidate feature subsets to find the best one. In this article, the mutual information criterion is adopted to select powerful features for classification. Multi-class classification has found practical use in many pattern recognition problems. There are two ways to construct a multi-class classifier. One is to use an inherently multi-class algorithm such as a neural network; the other decomposes the multi-class problem into several two-class problems, as in multi-class support vector machines (abbreviated MCSVM in the rest of the paper). The support vector machine (SVM) was originally designed for binary classification, and a number of decompositions have been proposed to extend it to multi-class problems (Hsu & Lin, 2002; Wu, Lin, & Weng, 2004). Implementations of MCSVM generally follow one of two approaches: the one-against-all method and the one-against-one method. The one-against-all method (Wu et al., 2004) compares a given class with all the others put together, so it constructs k SVMs for a k-class problem. The one-against-one approach (Abe, 2005) constructs binary SVMs between all possible pairs of classes, hence k(k − 1)/2 SVMs for a k-class problem. Hsu and Lin (2002) showed that the one-against-one method is more suitable for practical use.
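The one-against-one voting scheme just described can be sketched as follows. This is a minimal illustration, not the paper's implementation; `decide(i, j, x)` stands in for a hypothetical trained pairwise binary classifier that returns the winning label of the pair.

```python
from itertools import combinations

def one_against_one_predict(x, classes, decide):
    """Predict a label by majority vote over the k(k-1)/2 pairwise classifiers.

    `decide(i, j, x)` is assumed to be a trained binary SVM for classes i and j
    that returns whichever of i or j wins on sample x (hypothetical callable).
    """
    votes = {c: 0 for c in classes}
    for i, j in combinations(classes, 2):  # all k(k-1)/2 class pairs
        votes[decide(i, j, x)] += 1
    return max(votes, key=votes.get)       # class with the most votes
```

For k = 4 disease groups this requires only six pairwise classifiers, each trained on the data of two classes, whereas one-against-all trains four classifiers that each see all of the data.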
In this article, we compare the performance of five different implementations of MCSVM, the original one-against-all SVM, the one-against-all fuzzy SVM, the one-against-all decision-tree-based SVM, the original one-against-one SVM and the one-against-one directed acyclic graph method, in the classification of supraspinatus images. A new attempt to integrate different texture analysis methods to classify the supraspinatus conditions of normal, inflammation, calcific tendonitis and tear is proposed. The adopted method consists of three stages: texture feature extraction, the selection
of texture features, and disease classification. In the texture feature extraction stage, four texture analysis methods, the gray-level co-occurrence matrix (Haralick, Shanmugam, & Dinstein, 1973), the texture spectrum (He & Wang, 1991), the fractal dimension method (Dubuc, Zucker, Teicott, Quiniou, & Wehbi, 1989; Keller, Chen, & Crownover, 1989) and the texture feature coding method (Horng, Sun, & Lin, 2002), are used to generate numerous texture features. In the feature selection stage, the mutual information criterion (Battiti, 1994; Kwak & Choi, 2002) is applied to find the powerful features among those generated by the four texture analysis methods. Finally, in the disease classification stage, the five implementations of MCSVM are designed to classify all of the ultrasonic supraspinatus images under five-fold cross-validation.

2. Materials and methods

In this section, we briefly describe the methods used in this article for texture analysis, feature selection and classification.

2.1. Texture analysis methods

2.1.1. Gray-level co-occurrence matrix (GLCM)

A co-occurrence matrix is generally referred to as a gray-level co-occurrence matrix whose entries count the transitions between all pairs of gray levels (not necessarily distinct). The gray-level transitions are calculated based on two parameters, the displacement d and the angular orientation θ. More precisely, let i and j be two gray levels and let N_{d,θ}(i, j) denote the number of transitions between two pixels whose gray levels are i and j, d pixels apart along orientation θ. In other words, N_{d,θ}(i, j) is the number of pixel pairs at locations (x, y) and (w, z) satisfying G(x, y) = i, G(w, z) = j and |(x, y) − (w, z)|_dm = (d, θ), where |(x, y) − (w, z)|_dm is a distance measure expressing that the pixels at (x, y) and (w, z) are d pixels apart along angular orientation θ. Normalizing N_{d,θ}(i, j) yields the probability, or relative frequency, of gray-level transitions:

p(i, j | d, θ) = N_{d,θ}(i, j) / N   (1)
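As an illustration of Eq. (1), a minimal sketch of building the normalized co-occurrence matrix for one displacement vector, together with one of the Haralick measures (the angular second moment). The function names and the small quantized image are illustrative, not from the paper:

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Normalized GLCM p(i, j | d, theta) of Eq. (1); the displacement vector
    (dx, dy) encodes both the distance d and the orientation theta
    (e.g. (1, 0) is d = 1 at theta = 0 degrees)."""
    img = np.asarray(img)
    counts = np.zeros((levels, levels), dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                counts[img[y, x], img[y2, x2]] += 1
    return counts / counts.sum()  # divide by N, the total number of transitions

def angular_second_moment(p):
    """Angular second moment, one of the 12 Haralick feature measures."""
    return float((p ** 2).sum())
```

Symmetric variants also count each transition in the reverse direction; the sketch uses the one-directional count, since the paper does not spell out which convention is used.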
where N is the total number of gray-level transitions in the co-occurrence matrix. Based on the gray-level co-occurrence matrix, Haralick et al. (1973) proposed 12 feature measures for texture analysis at a given displacement d and angular orientation θ: angular second moment, correlation, gray variance, inverse difference moment, entropy, sum entropy, difference entropy, information measure of correlation, sum average, contrast, sum variance and difference variance. These feature measures were used in the experiments for comparison, with d = 1, 2, 3 and 4 pixels under all angular orientations.

2.1.2. Texture spectrum (TS)

He and Wang (1991) first proposed the texture spectrum. The idea is to consider a so-called texture unit, illustrated in Fig. 1, with the central pixel V0 designating the pixel currently examined and V_i, i = 1, ..., 8, its eight neighboring pixels. One of the three values {0, 1, 2} is assigned to each E_i according to Eq. (2):
E_i = 0 if V_i − V_0 < −Δ;   E_i = 1 if |V_i − V_0| ≤ Δ;   E_i = 2 if V_i − V_0 > Δ,   for i = 1, 2, ..., 8;   and   N_E = Σ_{i=1}^{8} E_i · 3^{i−1}   (2)
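Eq. (2) can be sketched as follows. The clockwise neighbor ordering is an assumption for illustration (the paper's Fig. 1 fixes its own ordering); any fixed ordering yields a valid spectrum:

```python
import numpy as np

def texture_unit_number(window, delta=3):
    """Texture unit number N_E of Eq. (2) for one 3x3 window; the histogram of
    N_E over all windows of an image is the texture spectrum."""
    w = np.asarray(window, dtype=int)
    v0 = w[1, 1]
    # V1..V8: neighbors read clockwise from the top-left corner (ordering assumed).
    neighbors = [w[0, 0], w[0, 1], w[0, 2], w[1, 2],
                 w[2, 2], w[2, 1], w[2, 0], w[1, 0]]
    n = 0
    for i, vi in enumerate(neighbors):
        if vi - v0 < -delta:
            e = 0
        elif vi - v0 > delta:
            e = 2
        else:              # |vi - v0| <= delta
            e = 1
        n += e * 3 ** i    # E_i * 3^(i-1), with i starting at 1 in Eq. (2)
    return n
```

A flat window gives every E_i = 1 and hence the central value N_E = (3^8 − 1)/2 = 3280 of the 6561 possible texture unit numbers.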
Fig. 1. The 3 × 3 texture unit of the texture spectrum: the central pixel V0 with its eight neighbors V1–V8.
In Eq. (2), Δ denotes the tolerance of variation; in the experiments Δ is set to 3. There are 3^8 = 6561 possible combinations of the E_i in Eq. (2). The distribution of occurrence of all texture unit numbers N_E generated by Eq. (2) is called the texture spectrum. As the texture unit represents the local texture information of a given pixel and its neighborhood, the statistics of all the texture units in an image reveal its global texture aspects. Several features derived from the texture spectrum are useful for classification: black–white symmetry (BWS), geometric symmetry (GS), degree of direction (DD), the orientation features (MHS, MVS, MDS1 and MDS2) and central symmetry (CS).

2.1.3. Fractal dimension (FD)

The concept of FD, introduced by Mandelbrot (1977), provides a good representation of the roughness of natural surfaces. Dubuc et al. (1989) proposed evaluating the FD by computing the maximum variation V(ε) of an image intensity surface f(x, y) within a square of side ε. The maximum variation is defined as
V(ε) = ∫_0^1 ∫_0^1 v(x, y, ε) dx dy   (3)

where v(x, y, ε) = sup |f(x1, y1) − f(x2, y2)|, the supremum being taken over all pairs (x1, y1), (x2, y2) for which max{|x − x1|, |x − x2|, |y − y1|, |y − y2|} ≤ ε. The FD is computed from

FD = lim_{ε→0} [3 − ln V(ε) / ln ε]   (4)

In other words, the FD can be estimated as three minus the slope of the least-squares fit of the data {ln ε, ln V(ε)} as ε varies. In the box-counting method, the rate of growth of the number of boxes needed to cover the volume of the gray-level surface gives the FD. Let N(L) denote the number of boxes of size L needed to cover the gray-level volume. Keller et al. (1989) estimated the FD from

D = ln N(L) / ln L   (5)

That is, the FD can be estimated by a least-squares linear fit of ln N(L) against ln L.

2.1.4. Texture feature coding method (TFCM)

The texture feature coding method is a rotation-invariant coding scheme that transforms an original gray-scale image into a texture feature image whose pixels are represented by texture feature numbers. The texture feature number of each pixel X is generated from the gray-level changes of its eight surrounding pixels in the 3 × 3 texture unit (TU) of Fig. 2. The method considers the connectivity of the texture unit: the eight neighboring pixels in Fig. 3 constitute the 8-connectivity of the texture unit, which can be divided into first-order and second-order 4-connectivity pixels. The four pixels labeled 1 satisfy the first-order 4-connectivity of the texture unit because they are immediately adjacent to the pixel X; they are called first-order connectivity pixels. The other four pixels, labeled 2, satisfy the second-order 4-connectivity; they are diagonally adjacent to X and are called second-order connectivity pixels.

Fig. 2. (a) Neighborhood of the 3 × 3 pixels (V0 with neighbors V1–V8); (b) texture unit (TU) of the texture spectrum (E0 with E1–E8).

Fig. 3. The 3 × 3 texture unit of TFCM: central pixel X, first-order pixels labeled 1, second-order pixels labeled 2.

To code pixel X in Fig. 3, TFCM produces a pair of integers (a, b), where a and b represent the gray-level variations of the first-order and second-order connectivity, respectively. As shown in Fig. 4, two scan lines along the 0°–180° and 90°–270° directions produce two sets of three successive first-order connectivity pixels, with X in the middle of the horizontal and vertical lines forming a "+". Similarly, two scan lines along the diagonal 45°–225° and 135°–315° directions, forming an "×", produce two sets of three successive second-order connectivity pixels, as shown in Fig. 5. Assuming the scan direction is top-to-bottom and left-to-right, the three successive pixels can be arranged as (a, b, c) in scanning order; for example, if (a, b, c) represents the pixels scanned by the vertical line of the first-order connectivity in Fig. 4, the top pixel is denoted by a, pixel X by b and the bottom pixel by c. Suppose (Ga, Gb, Gc) are the gray levels of the three pixels (a, b, c). If we consider the two successive gray-level changes between the pairs (Ga, Gb) and (Gb, Gc), there are four different types of variation. The four types of gray-level variation defined by Eq. (6) can be graphed by the gray-level graphical structures shown in Fig. 6.

Fig. 4. First-order 4-connectivity: the 0°–180° and 90°–270° scan lines ("+").
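The coding just described can be sketched as follows, using the variation types of Eq. (6) and the pair table (Table 1) given below. The tolerance value and the exact scan-line geometry are assumptions for illustration:

```python
import numpy as np

# Table 1: code for an (unordered) pair of variation types (i)-(iv), here 0-3.
PAIR_VALUE = {(0, 0): 1, (0, 1): 2, (0, 2): 3, (0, 3): 5,
              (1, 1): 7, (1, 2): 11, (1, 3): 13,
              (2, 2): 17, (2, 3): 19, (3, 3): 23}

def variation_type(ga, gb, gc, delta):
    """Type (i)-(iv) of Eq. (6) for a scanned triple (Ga, Gb, Gc), as 0-3."""
    d1, d2 = gb - ga, gc - gb
    if abs(d1) <= delta and abs(d2) <= delta:
        return 0   # (i): both changes within the tolerance
    if abs(d1) <= delta or abs(d2) <= delta:
        return 1   # (ii): exactly one change within the tolerance
    if (d1 > delta and d2 > delta) or (d1 < -delta and d2 < -delta):
        return 2   # (iii): monotone increase or decrease beyond the tolerance
    return 3       # (iv): large changes of opposite sign

def tfn(window, delta=2):
    """TFN of the central pixel: a from the two first-order ('+') scan lines,
    b from the two second-order ('x') scan lines, TFN = a * b (Eq. (7))."""
    w = np.asarray(window, dtype=int)
    first = [(w[1, 0], w[1, 1], w[1, 2]),   # 0-180 degree scan line
             (w[0, 1], w[1, 1], w[2, 1])]   # 90-270 degree scan line
    second = [(w[0, 0], w[1, 1], w[2, 2]),  # 135-315 degree scan line
              (w[0, 2], w[1, 1], w[2, 0])]  # 45-225 degree scan line
    a = PAIR_VALUE[tuple(sorted(variation_type(*t, delta) for t in first))]
    b = PAIR_VALUE[tuple(sorted(variation_type(*t, delta) for t in second))]
    return a * b
```

A flat window yields the pair ((i), (i)) on both connectivities, i.e. a = b = 1 and TFN = 1, the smallest of the 55 used texture feature numbers.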
Fig. 5. Second-order 4-connectivity: the 45°–225° and 135°–315° scan lines ("×").

Fig. 6. Types (i)–(iv) of gray-level graphical structure variations.

In Eq. (6), Δ denotes the tolerance of variation. Type (i) describes the case where the gray levels of a, b and c are very close, within the tolerance Δ. Type (ii) is the case where the gray-level variation of one pair is within Δ while that of the other pair exceeds Δ. Type (iii) is the case where the gray levels of a, b and c are continuously decreasing or increasing, with gray-level differences larger than Δ. In type (iv), the gray level is either first decreasing then increasing or first increasing then decreasing, with increments and decrements exceeding Δ:

(i) if (|Ga − Gb| ≤ Δ) ∩ (|Gb − Gc| ≤ Δ)
(ii) if [(|Ga − Gb| ≤ Δ) ∩ (|Gb − Gc| > Δ)] ∪ [(|Ga − Gb| > Δ) ∩ (|Gb − Gc| ≤ Δ)]
(iii) if [(Ga − Gb > Δ) ∩ (Gb − Gc > Δ)] ∪ [(Gb − Ga > Δ) ∩ (Gc − Gb > Δ)]
(iv) if [(Ga − Gb > Δ) ∩ (Gc − Gb > Δ)] ∪ [(Gb − Ga > Δ) ∩ (Gb − Gc > Δ)]   (6)

According to this definition of gray-level graphical structure variation, the higher the type number, the greater the gray-level variation. Since both a and b, corresponding to first-order and second-order connectivity respectively, have two scan lines, each of which produces one type of gray-level variation, a and b can each be assigned a pair of gray-level graphical structure variations. The total number of combinations of two variation types (including self-combinations) is 4(4 + 1)/2 = 10; as a result, each a and b can be represented by an entry of Table 1. For example, a = 11 or b = 11 corresponds to a combination of types (ii) and (iii). Finally, the texture feature number of each pixel is generated by taking the product of a and b. Let the gray level of the pixel at spatial location (x, y) be denoted by G(x, y) and the corresponding texture feature number by TFN(x, y). Then

TFN(x, y) = a(x, y) · b(x, y)   (7)

where a(x, y) and b(x, y) are the values obtained from Table 1 for the pixel at spatial location (x, y).

Table 1
The variation generation of first-order (a) or second-order (b) 4-connectivity.

Scan line 1 \ Scan line 2:   (i)   (ii)   (iii)   (iv)
(i)                            1     2      3      5
(ii)                           2     7     11     13
(iii)                          3    11     17     19
(iv)                           5    13     19     23

According to Eq. (7), only 55 texture feature numbers among {1, 2, ..., 529} are actually used; the 529 possible values can therefore be compressed to 55 by removing the unused texture feature numbers and re-labeling the remaining ones as the integers 0 to 54. The texture feature number histogram is then defined by

RP_Δ(n) = RN_Δ(n) / N,   n ∈ {0, 1, 2, ..., 54}   (8)

where Δ is the gray-level variation tolerance given in Eq. (6), RN_Δ(n) is the frequency of occurrence of the texture feature number n, and N is the total number of pixels in the feature image. Furthermore, the texture feature number co-occurrence matrix is generated according to Eq. (9):

P_Δ(i, j | d, θ) = RN_{Δ,d,θ}(i, j) / N_t,   i, j ∈ {0, 1, 2, ..., 54}   (9)

where RN_{Δ,d,θ}(i, j) is defined analogously to the counts in Eq. (8) except that i and j are TFNs rather than gray levels, and N_t is the total number of TFN transitions. In the experiments, the displacement d is chosen to be one and two pixels, and the angular orientation θ covers all eight orientations 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°. Seven texture features are extracted from the texture feature number histogram and the texture feature number co-occurrence matrix under each specific pair of displacement and angular orientation: coarseness (Coarse), homogeneity (Hom), mean convergence (MC), variance (Var), code entropy (CE), code similarity (CS) and resolution similarity (RS).

In summary, each region R of the image yields 72 texture features generated by the four texture analysis methods: 48 features from GLCM, eight from TS, two from FD and the remaining 14 from TFCM.

2.2. Powerful feature selection
Input feature selection is a very important step in supervised or unsupervised learning, especially when the number of observations is relatively small compared to the number of features. Many irrelevant or redundant features may lead to over-fitting and to poor accuracy of the classification system. However, optimal feature selection is a combinatorial problem, since it would require evaluating all combinations of the available features. In this section we introduce the mutual information criterion (Kwak & Choi, 2002) for feature selection. Given a set of n training samples (x_i, c_i) of a d-dimensional feature vector X, with x_i = (f_i1, f_i2, ..., f_id) ∈ R^d, the class labels c_i are treated as samples of a discrete-valued random variable C. In the experiments the number of classes N is 4: c1 is the normal group, c2 tendon inflammation, c3 calcific tendonitis and c4 supraspinatus tear. The objective of the mutual information (MI) criterion is to find the subset S (S ⊂ X) of k features (k < d) that maximizes the mutual information I(S; C). The details of the computation are as follows. In general, the class label takes discrete values while the input features are continuous variables. The mutual information I(S; C) between the selected features S and the class C can be expressed as follows:
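The greedy forward selection used in this section can be sketched as follows. This is a simplified variant: it estimates I(S; C) with a histogram-based discrete estimate rather than the Parzen-window estimate of Kwak and Choi (2002), and the synthetic data in the test are illustrative only:

```python
import numpy as np

def mutual_information(feats, labels, bins=8):
    """Histogram-based estimate of I(S; C) for a feature block `feats`
    (n_samples x n_selected) against discrete class labels."""
    n = len(labels)
    sym = np.zeros(n, dtype=np.int64)
    for j in range(feats.shape[1]):   # combine discretized features into one symbol
        edges = np.histogram_bin_edges(feats[:, j], bins=bins)[1:-1]
        sym = sym * bins + np.digitize(feats[:, j], edges)
    mi = 0.0
    for s in np.unique(sym):
        p_s = np.mean(sym == s)
        for c in np.unique(labels):
            p_sc = np.mean((sym == s) & (labels == c))
            if p_sc > 0:
                mi += p_sc * np.log(p_sc / (p_s * np.mean(labels == c)))
    return mi

def greedy_mi_selection(X, labels, k):
    """Forward greedy search: repeatedly add the feature maximizing I(S; C)."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: mutual_information(X[:, selected + [j]], labels))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With 120 samples of five features where only one feature tracks the four class labels, the search picks that feature first; the joint-histogram estimate degrades as the selected subset grows, which is why the Parzen-window estimate is preferred in the paper.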
I(S; C) = H(C) − H(C | S)   (10)

H(C) = −Σ_{c_i ∈ C} P(c_i) log P(c_i)   (11)

H(C | S) = −∫_s p(s) Σ_{c_i ∈ C} p(c_i | s) log p(c_i | s) ds   (12)

where s denotes instances of the selected features. The entropy H(C) is easily calculated using Eq. (11), but the computation of the conditional entropy H(C | S) is complicated by the difficulty of estimating the probabilities p(s) and p(c | s). Kwak and Choi (2002) proposed a Parzen-window method to calculate p(s) and p(c | s). The nucleus of Parzen-window estimation is that the probability is approximated by a superposition of normalized Gaussian window functions φ(s, h) centered on a set of random samples. Given a set of m k-dimensional selected feature vectors S = {s1, s2, ..., sm}, the estimate of p(s) is given by

p̂(s) = (1/m) Σ_{i=1}^m φ(s − s_i, h) = (1/m) Σ_{i=1}^m [1 / ((2π)^{k/2} h^k |Σ|^{1/2})] exp(−(s − s_i)^T Σ^{−1}(s − s_i) / (2h^2))   (13)

where h is the window-width parameter and Σ is the covariance matrix. For the estimation of p(c | s), the Bayes rule transcribes it into Eq. (14):

P(c | s) = p(s | c) p(c) / p(s)   (14)

Hence, the probability p(s | c) can be expressed as

p(s | c_j) = (1/n_{c_j}) Σ_{s_i ∈ I_{c_j}} φ(s − s_i, h_j)   (15)

where j = 1, 2, ..., N; n_{c_j} is the number of training samples belonging to class c_j; and I_{c_j} is the part of the training feature set belonging to class c_j. Using Eqs. (14) and (15), the conditional probability p(c | s) becomes

P(c | s) = Σ_{s_i ∈ I_c} φ(s − s_i, h_j) / Σ_{i=1}^n φ(s − s_i, h)   (16)

where h_j is the window-width parameter specific to class c_j; in the experiments, h and the h_j are set equal for simplicity. Battiti (1994) applied a greedy forward sequential search to find the most relevant features of an input feature set. In this scheme, starting from an empty set of selected features, the best feature of the current feature set X is added to the selected set S and removed from X, until the mutual information between the selected features and their corresponding class labels is maximized. The complete MI-based algorithm is realized as follows.

Step 1. Let S be empty.
Step 2. Compute H(C) according to Eq. (11) over all training samples.
Step 3. Select the first feature f(1) from the original feature set X that maximizes the MI between S = {f(1)} and the corresponding class labels; set X = X − {f(1)}.
Step 4.
4.1. Tentatively add each feature of X to the selected subset S and compute I(S, C).
4.2. Choose the feature f_j ∈ X that maximizes I(S, C); set X = X − {f_j} and add f_j to S.
4.3. Repeat 4.1 and 4.2 until the mutual information between the selected feature set S and the class labels C is maximized.
Step 5. Output the set S of selected features for later classification.

2.3. Multi-class support vector machines

Multi-class support vector machines (MCSVM) are constructed from many binary support vector machines. We first summarize the binary SVM for classification (Allwein, Schapire, & Singer, 2000); the following subsections describe the two families of MCSVM classifiers, the one-against-all and the one-against-one approaches.

2.3.1. Binary support vector machine

The binary support vector machine was first proposed by Vapnik (1995) and is widely used in pattern recognition applications including credit scoring (Baesens et al., 2003; Huang, Chen, & Wang, 2007), spam categorization (Drucker, Wu, & Vapnik, 1999), bioinformatics (Yu, Ostrouchov, Geist, & Samatova, 2003) and brain tumor classification (Lu et al., 1999). Details of the SVM can be found in Abe (2005) and Hsu, Chang, and Lin (2003). Given a training set of N points {(x_i, y_i)}_{i=1}^N, with data x_i ∈ R^n and corresponding class labels y_i ∈ {+1, −1}, the SVM computes the optimal separating hyperplane with the maximum margin by solving the following optimization problem:

min_{w,b} (1/2) w^T w   subject to   y_i(w^T x_i + b) ≥ 1,   for i = 1, ..., N   (17)

Quadratic optimization is used to find the saddle point of the Lagrange function with Lagrange multipliers a_i:

L_P(w, b, a) = (1/2) w^T w − Σ_{i=1}^N a_i (y_i(w^T x_i + b) − 1),   a_i ≥ 0   (18)

The optimal solution of L_P is the saddle point that minimizes L_P with respect to w and b and maximizes it with respect to the a_i:

∂L_P(w, b, a)/∂w = 0   (19)

∂L_P(w, b, a)/∂b = 0   (20)

a_i {y_i(w^T x_i + b) − 1} = 0,   for i = 1, ..., N   (21)

a_i ≥ 0,   for i = 1, ..., N   (22)

From Eq. (21), either a_i = 0, or a_i ≠ 0 and y_i(w^T x_i + b) = 1 must be satisfied. Eqs. (19) and (20) reduce to the following equations:

w = Σ_{i=1}^N a_i y_i x_i   (23)

Σ_{i=1}^N a_i y_i = 0   (24)

A datum x_i is called a support vector when its corresponding a_i is positive; geometrically, the support vectors are the points closest to the optimal hyperplane. Eqs. (19)–(22) satisfy the Karush–Kuhn–Tucker (KKT) conditions, and substituting Eqs. (23) and (24) into Eq. (18) transforms the problem into the dual Lagrangian L_D(a):

max_a L_D(a) = Σ_{i=1}^N a_i − (1/2) Σ_{i,j=1}^N a_i a_j y_i y_j ⟨x_i · x_j⟩   (25)

subject to a_i ≥ 0 and Σ_{i=1}^N a_i y_i = 0, for i = 1, ..., N. The solution a_i of Eq. (25) determines the parameters w and b of the optimal hyperplane. Thus, the decision function f(x) can be expressed in the following formula:
f(x) = sgn(Σ_{i=1}^N y_i a_i ⟨x_i · x⟩ + b)   (26)

This formulation is called the hard-margin support vector machine. However, the hard-margin SVM does not work well when the data x_i are not linearly separable. One improvement is the nonlinear SVM, which maps the training data from the input space into a higher-dimensional dot-product feature space via a mapping function φ(x) satisfying Mercer's theorem. For simplicity, a kernel function k(x_i, x_j) = φ(x_i) · φ(x_j) is used to replace the inner products. The nonlinear SVM dual Lagrangian L_D(a) can be expressed as

max_a L_D(a) = Σ_{i=1}^N a_i − (1/2) Σ_{i,j=1}^N a_i a_j y_i y_j k(x_i, x_j)   (27)

subject to 0 ≤ a_i ≤ C, i = 1, ..., N, and Σ_{i=1}^N a_i y_i = 0.

C is the margin parameter that determines the trade-off between maximizing the margin and minimizing the classification error. The decision function f(x) is obtained by following the same steps as for the hard-margin support vector machine:

f(x) = sgn(Σ_{i=1}^N y_i a_i k(x, x_i) + b)   (28)

The radial basis function (RBF) of Eq. (29) is used as the kernel function, with the support vectors x_i as the centers of the radial basis functions:

k(x_i, x_j) = exp{−γ ‖x_i − x_j‖^2}   (29)

2.3.2. Multi-class support vector machines (MCSVM)

This section briefly explains the five multi-class support vector machines, the original one-against-all SVM, the one-against-all fuzzy SVM, the one-against-all decision-tree-based SVM, the original one-against-one SVM and the one-against-one directed acyclic graph SVM, applied to classify the ultrasonic supraspinatus images for comparison.

(a) Original one-against-all (OAA) SVM. The earliest implementation of the MCSVM classifier is the one-against-all method. The problem of the OAA SVM with discrete decision functions is the existence of unclassifiable regions, which lead to large numbers of erroneous classifications. This implementation converts a k-class MCSVM into k binary SVM problems: the ith SVM is trained with all examples of the ith class given a positive label and all other examples a negative label. Suppose there is a training set D of N training examples {(x_i, y_i)}_{i=1}^N, where x_i ∈ R^n has a class label y_i ∈ {1, ..., k}. The SVM of the ith class solves the following problem:

min_{w^i, b^i, ξ^i}   (1/2)(w^i)^T w^i + C Σ_{j=1}^N ξ^i_j
subject to   (w^i)^T φ(x_j) + b^i ≥ 1 − ξ^i_j   if y_j = i,
(w^i)^T φ(x_j) + b^i ≤ −1 + ξ^i_j   if y_j ≠ i,
ξ^i_j ≥ 0,   j = 1, ..., N   (30)

A datum x_j is assigned to the class c_i with the maximal value of the decision function, based on Eq. (31):

c_i = arg max_{i=1,...,k} ((w^i)^T φ(x_j) + b^i)   (31)

(b) One-against-all fuzzy SVM (OAA-FSVM). Two approaches, the fuzzy support vector machine (FSVM) (Wang & Chiang, 2007) and the decision-tree-based (DTB) method (Polat & Gunes, 2008; Pontil & Verri, 1998; Takahashi & Abe, 2002), have been proposed to improve classification with the discrete decision functions of one-against-all support vector machines. The main difference between the original OAA SVM and the FSVM is that the cost C of the FSVM is weighted by fuzzy membership values s_ij (0 ≤ s_ij ≤ 1). More precisely, Eq. (30) is transformed into Eq. (32):

min_{w^i, b^i, ξ^i}   (1/2)(w^i)^T w^i + C Σ_{j=1}^N s_ij ξ^i_j
subject to   (w^i)^T φ(x_j) + b^i ≥ 1 − s_ij ξ^i_j   if y_j = i,
(w^i)^T φ(x_j) + b^i ≤ −1 + s_ij ξ^i_j   if y_j ≠ i,
ξ^i_j ≥ 0,   j = 1, ..., N,   0 ≤ s_ij ≤ 1   (32)

In the classification of the 120 supraspinatus images into the four disease groups, we first use four bipolar target variables v_ij to indicate the target value of training image i in class j, by Eq. (33); the union of the v_ij is denoted by the matrix V, whose columns contain the positive (v_ij = 1) and negative (v_ij = −1) examples:

v_ij = 1 if the ith image belongs to the jth class, and −1 otherwise,   for i = 1, ..., 120; j = 1, ..., 4   (33)

Secondly, we define a fuzzy membership matrix S for the implementation of the OAA-FSVM classifier. Generally, a higher weight s⁺_ij is assigned to each positive example and a lower weight s⁻_ij to each negative example. The membership s_ij can therefore be defined as

s_ij = s⁺_ij if v_ij = 1;   s_ij = s⁻_ij if v_ij = −1   (34)

As far as the fuzzy memberships s_ij of the training data are concerned in the experiments, s_ij is set to 1 if v_ij = 1 and to h if v_ij = −1.

(c) One-against-all decision-tree-based (OAA-DTB) SVM. Takahashi and Abe (2002) published four different decision-tree constructions for improving the performance of the original OAA support vector machine. In the experiments we adopt the type-2 method to construct the decision tree, as follows.

Step 1. Initially, assume that all the classes belong to different clusters and compute the cluster centers c_i and the distance measures d_ij between clusters i and j (i, j = 1, ..., k) by Eqs. (35) and (36):

c_i = (1/|X_i|) Σ_{x ∈ X_i} x,   where X_i is the set of training data belonging to class i   (35)

d_ij = ‖c_i − c_j‖   (36)

Step 2. Select the two clusters with the smallest distance measure and merge them into one new cluster.
8130
M.-H. Horng / Expert Systems with Applications 36 (2009) 8124–8133
Step 3. Repeat step 2 until all clusters are merged into two clusters. Step 4. Calculate the optimal hyperplane that separates the clusters generated in step 3. 0 Step 5. If the separated cluster in Step 4 has k (>2) classes, we regard the classes as belonging to different clusters and repeat 0 0 Step 2 with k 2 time and go to Step 4. If k ¼ 2, compute the optimal hyperplane that separates the two classes. If no cluster has more than one class, terminate the algorithm. (d) One-against-one voting based (OAO-VB) SVM. Another approach is called the one-against-one MCSVM (Krebel, 1999). Although this approach can effectively decrease the unclassifiable regions that occur in the one-against-all SVMs, the unclassifiable regions still binary exists. Unlike the OAA MCSVM, this method constructs kðk1Þ 2 support vector machines. All training data xt belong to one of the ith and jth classes are used to construct a binary SVM to obtain the optimal solution of the wij, bij and nijt
Min
ij
wji ;bji ;nt
N X 1 ij T ij nijt ðw Þ w þ C 2 t¼1 ij
subject to ðwij ÞT /ðxt Þ þ b P 1 nijt ij
ij T
ðw Þ /ðxt Þ þ b 6 1 þ
nijt
if yt ¼ i
ð37Þ
if yt ¼ j
nijt P 0
ij
The class of data xt ¼ arg maxððwij ÞT /ðxt Þ þ b Þ i;j
ð38Þ
where wij is the N-dimensional vector, u(x) is a mapping function that maps data x into the N-dimensional feature space, bij is the bias term. In practice, a voting scheme associated with the one-againstone approach is used to decide the class of data x. The decision region Ri of ith class is defined for explicit explanation of the voting scheme as follows: ij
Ri ¼ fxjDij ðxÞ > 0; j ¼ 1; . . . ; k; j–ig
ð39Þ
If test data x is in the Ri, we classify x into ith class. Otherwise, we classify x by a voting scheme according to the results of calculating Eq. (39), and further assign the test data x into class j* satisfies the Eqs. (40) and (41).
Di ðxÞ ¼
k X
signðDij ðxÞÞ
ð40Þ
j–i;j¼1
j ¼ arg max Dj ðxÞ j¼1;...;k
For simplification, let the method 1 be the original one-againstall (OAA) SVM, the method 2 be the OAA-FSVM, the method 3 be the OAA-DTB. The three methods belong to the one-against-all approach. Others include two methods of one-against-one approach that are the voting-based (OAO-VB) and directed acyclic graph (OAO-DAG) approaches. In experiments, we will discuss the performance comparisons applying the five methods to disease classification of the ultrasonic supraspinatus images. 3. The experimental results and performance evaluation In this section, many different performance indices including total classification accuracy, sensitivity, specificity and false-negative rate based on the five-fold cross-validation methods are used to evaluate the classification of supraspinatus images by using the five MCSVM implementations. Furthermore, the multi-class receiver operating characteristic (ROC) (Fawcett, 2006) analysis is also applied to analyze the discriminative capability of the five MCSVM methods. 3.1. Data acquisition
The decision function for ith class against jth class is defined with the maximum margin as following equation, that is:
Dij ðxÞ ¼ ðwij ÞT /ðxÞ þ b
Step 3. Eliminate the head or the tail classes from the list depending on the decision of the SVM. Step 4. Repeat steps 2 and 3 until the only one class remains in the list.
ð41Þ
(e) One-against-one directed acyclic graph (OAO-DAG) SVM. Recently, a directed acyclic graph (DAG) support vector machine (Maruyama, Maruyama, Miyao, & Nakano, 2004) is proposed to speed up the evaluation time keeping good performance and fast learning time of on-against-one MCSVM approaches. The DAG nodes method is a decision graph in essential. It contains kðk1Þ 2 and each node corresponds to a binary SVM. Starting from the root node, the graph is traversed down to leaf by evaluating a SVM at each node. Platt, Cristianini, and Shawe-Taylor (2000) and Maruyama et al. (2004) proposed a list manipulation algorithm to traverse the directed acyclic graph. The procedure of list manipulation is described as follows: Step 1. Make the list of all possible classes. The order of the classes is arbitrary. Step 2. Compare the head and the tail classes in the list.
The ultrasonic image database was recorded from 2004 to 2007, and the ages of the patients ranged from 30 to 65 years. The 120 acquired images, divided equally into the four classes, were captured at National Cheng Kung University Hospital with an HDI Ultramark 5000 ultrasound system (ATL Ultrasound, CA, USA) and an ATL linear array probe (dynamic focusing). The captured images were digitized into 256 × 256 pixels with 256 gray levels via a frame grabber. A region of interest of 30 × 60 pixels, at a depth of about 1.5–2.5 cm below the body surface, was selected from each supraspinatus image for texture feature extraction. Fig. 7a–d shows images of the normal, tendon inflammation, calcific tendonitis and supraspinatus tear cases. In the experiments, all programs were implemented in Visual C++ together with the LIBSVM toolbox (Chih-Jen Lin's Homepage, 2008) for Matlab.

3.2. Powerful feature selection

Compared with the number of training data, a large number of features is likely to bring about poor classification. Consequently, the mutual information (MI) criterion is used to reduce the dimension of the texture features. The powerful features selected from the 72 candidate features by the MI criterion are listed in Table 2: Sum Average (GLCM), Sum Variance (GLCM), Mean Convergence (TFCM), Contrast (GLCM) and Difference Variance (GLCM). It is worth noting that all of the selected powerful features are generated by GLCM and TFCM.

3.3. Classification accuracy, sensitivity and specificity analysis

The k-fold cross-validation method is a popular way to analyze classification performance. In the experiments, all supraspinatus images are divided into five subsets with equal numbers of images from each disease group. One of the five subsets is selected as the test set and the other four subsets are put together to form a training set. More precisely, every supraspinatus image appears in a test set exactly once and in a training set four times. The average accuracy across all five trials is computed.
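The five-fold protocol just described can be sketched as follows. This is a minimal illustration under stated assumptions: the labels (four disease groups of 30 images each) are hypothetical placeholders for the paper's data, and the per-class round-robin split is one simple way to realize the equal-per-class folds:

```python
# Five-fold cross-validation with an equal share of every class per fold,
# so each image appears in exactly one test set and four training sets.

def stratified_five_folds(labels):
    """Split sample indices into 5 folds, each holding an equal share of every class."""
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(5)]
    for indices in by_class.values():
        # deal this class's samples round-robin across the five folds
        for pos, idx in enumerate(indices):
            folds[pos % 5].append(idx)
    return folds

labels = [c for c in range(4) for _ in range(30)]  # hypothetical: 4 groups x 30 images
folds = stratified_five_folds(labels)

for test in folds:
    train = [i for f in folds if f is not test for i in f]
    # train the MCSVM on `train`, evaluate on `test`; the five accuracies are averaged
```

Each fold ends up with 24 images, six from each disease group, mirroring the equal-sized subsets described above.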
Fig. 7. Samples of ultrasonic supraspinatus images: (a) a normal case; (b) tendon inflammation; (c) calcific tendonitis; (d) supraspinatus tear.
Table 2. Powerful texture features selected from those generated by the GLCM, TS, FD and TFCM methods.

Feature selection method: Mutual information criterion
Features selected: Sum average (GLCM, d = 2); Sum variance (GLCM, d = 2); Mean convergence (TFCM, d = 2); Contrast (GLCM, d = 3); Difference variance (GLCM, d = 2)
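The mutual-information ranking behind Table 2 can be sketched as follows. This is a toy illustration, not the paper's data: it assumes the texture features have already been discretized into bins, and uses two hypothetical features, one informative and one not:

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Empirical mutual information I(F;Y) between a discretized feature and the class label."""
    n = len(labels)
    pf, py = Counter(feature), Counter(labels)
    pfy = Counter(zip(feature, labels))
    return sum((c / n) * math.log((c / n) / ((pf[f] / n) * (py[y] / n)))
               for (f, y), c in pfy.items())

# Hypothetical discretized features: A tracks the class label, B is independent of it.
labels = [0, 0, 1, 1, 2, 2, 3, 3] * 4
feature_a = list(labels)                         # perfectly informative
feature_b = [i % 2 for i in range(len(labels))]  # carries no class information

features = {"A": feature_a, "B": feature_b}
ranked = sorted(features, key=lambda name: -mutual_information(features[name], labels))
```

Ranking features by this score and keeping the top few is the essence of the MI criterion; feature A ranks first here because its mutual information with the label equals the label entropy, while feature B's is zero.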
Sensitivity: number of true positive decisions / number of actually positive cases.
Specificity: number of true negative decisions / number of actually negative cases.
Total classification accuracy: number of correct decisions / total number of cases.

Among the three indices, sensitivity and specificity are the references of greatest concern to clinical doctors. To investigate thoroughly the classification performance obtained with the previously selected feature set, the total classification accuracy, sensitivity and specificity of the five MCSVM classifiers are computed under five-fold cross-validation. In addition, an effective classification method should lower the possibility of misclassification, especially the false-negative rate: the probability that a patient is classified with a less severe disease than the actual diagnosis. A high false-negative rate represents the danger of underestimating the disease severity of a patient when clinical doctors rely on the classification. The false-negative rate is therefore also considered as an index for evaluating the performance of the five MCSVM classifiers. The confusion matrices of the five MCSVM methods are listed in Table 3, and Table 4 shows the four classification performance indices of the five methods.

Table 3. The confusion matrices using the features listed in Table 2 under the five different MCSVM methods, based on five-fold cross-validation. Rows give the actual class; columns give the classified class (Tear, Calcific, Inflammation, Normal).

Method 1 – OAA
Tear           26   2   2   0
Calcific        1  24   4   1
Inflammation    0   3  23   4
Normal          0   2   5  23

Method 2 – OAA-FSVM (h = 0.5)
Tear           30   0   0   0
Calcific        2  26   2   0
Inflammation    0   1  27   2
Normal          0   2   3  25

Method 3 – OAA-DTB
Tear           29   1   0   0
Calcific        1  25   3   1
Inflammation    0   2  25   3
Normal          0   2   2  26

Method 4 – OAO-VB
Tear           26   3   0   1
Calcific        1  24   2   3
Inflammation    0   3  25   2
Normal          0   2   4  24

Method 5 – OAO-DAG
Tear           29   1   0   0
Calcific        3  25   1   1
Inflammation    1   2  24   3
Normal          0   3   3  24
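The per-class indices reported below can be derived directly from such a confusion matrix. A minimal sketch, with the OAA-FSVM (h = 0.5) matrix from Table 3 hard-coded (rows are the actual classes, columns the classified classes):

```python
# Per-class sensitivity and specificity, plus total accuracy, from a confusion matrix.
classes = ["tear", "calcific", "inflammation", "normal"]
cm = [
    [30,  0,  0,  0],   # actual tear
    [ 2, 26,  2,  0],   # actual calcific
    [ 0,  1, 27,  2],   # actual inflammation
    [ 0,  2,  3, 25],   # actual normal
]
total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(classes))) / total

sensitivity, specificity = {}, {}
for i, name in enumerate(classes):
    tp = cm[i][i]
    fn = sum(cm[i]) - tp                              # actual i, classified elsewhere
    fp = sum(cm[r][i] for r in range(len(cm))) - tp   # others classified as i
    tn = total - tp - fn - fp
    sensitivity[name] = tp / (tp + fn)
    specificity[name] = tn / (tn + fp)
```

For this matrix the total accuracy is 108/120 = 0.90 and the tear sensitivity is 1.0, in agreement with the method 2 figures reported in Table 4.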
Table 4. The performance indices of the five MCSVM methods in supraspinatus image classification.

Method 1 – OAA
  Supraspinatus tear: sensitivity 86.67%, specificity 95.56%
  Calcific tendonitis: sensitivity 80.00%, specificity 93.33%
  Tendon inflammation: sensitivity 76.67%, specificity 92.22%
  Normal: sensitivity 76.67%, specificity 92.22%
  False-negative rate 10.83%; total classification accuracy 80.00%

Method 2 – OAA-FSVM (h = 0.5)
  Supraspinatus tear: sensitivity 100%, specificity 100%
  Calcific tendonitis: sensitivity 86.67%, specificity 95.56%
  Tendon inflammation: sensitivity 90.00%, specificity 96.67%
  Normal: sensitivity 83.33%, specificity 94.45%
  False-negative rate 3.33%; total classification accuracy 90.00%

Method 3 – OAA-DTB
  Supraspinatus tear: sensitivity 96.67%, specificity 98.89%
  Calcific tendonitis: sensitivity 83.33%, specificity 94.45%
  Tendon inflammation: sensitivity 83.33%, specificity 94.45%
  Normal: sensitivity 86.67%, specificity 95.56%
  False-negative rate 6.66%; total classification accuracy 87.50%

Method 4 – OAO-VB
  Supraspinatus tear: sensitivity 86.67%, specificity 95.56%
  Calcific tendonitis: sensitivity 80.00%, specificity 93.33%
  Tendon inflammation: sensitivity 83.33%, specificity 94.45%
  Normal: sensitivity 80.00%, specificity 93.33%
  False-negative rate 9.17%; total classification accuracy 82.50%

Method 5 – OAO-DAG
  Supraspinatus tear: sensitivity 96.67%, specificity 98.89%
  Calcific tendonitis: sensitivity 83.33%, specificity 94.45%
  Tendon inflammation: sensitivity 80.00%, specificity 93.33%
  Normal: sensitivity 80.00%, specificity 92.22%
  False-negative rate 5.00%; total classification accuracy 85.00%

3.4. Multi-class receiver operating characteristics (ROC) analysis

The multi-class ROC curve (Diri & Albayrak, 2008) manifests the relationship between the true-positive fraction (TPF) and the false-positive fraction (FPF) as the parameters (C, \gamma) vary. In general, the area under the ROC curve (AUC), Az, is a powerful index for assessing the classification performance of a classifier. Hand and Till (2001) proposed computing the AUC of a k-class classification by Eq. (42); nevertheless, not all of the five methods contain k(k-1)/2 binary SVMs, so Eq. (42) is modified into Eq. (43):

AUC_{average} = \frac{2}{k(k-1)} \sum_{c_i, c_j \in C} AUC(c_i, c_j)    (42)

where k is the number of classes and c_i is the ith disease group of the supraspinatus.

AUC_{average} = \frac{1}{\text{number of binary SVMs}} \sum_{c_i, c_j \in C} AUC(c_i, c_j)    (43)

where c_i and c_j are the two classes separated by any binary SVM. Fig. 8 shows the ROC curves of the five MCSVM methods, and the corresponding AUC values are listed in Table 5.

Fig. 8. ROC curves of the five MCSVM classifiers.

Table 5. The Az of the ROC curves of the five different 4-class support vector machines with variation of (C, \gamma).

Method:    OAA     OAA-FSVM (h = 0.5)   OAA-DTB   OAO-VB   OAO-DAG
Az value:  0.876   0.943                0.921     0.901    0.889

4. Discussion and conclusion

Based on the present experiments on the classification of ultrasonic supraspinatus images, the following points can be emphasized:

1. All of the selected texture features are generated by the GLCM and TFCM methods. This result reveals that the discriminative capability of the features generated by GLCM and TFCM is superior to that of the other methods. A similar result was found in the classification of ultrasonic liver images in our past work (Horng et al., 2002).
2. The five multi-class support vector machines, OAA, OAA-FSVM, OAA-DTB, OAO-VB and OAO-DAG, are used to classify the ultrasonic supraspinatus images for comparison. The total classification accuracy of the OAA-FSVM method, 90%, is significantly higher than that of the other four methods; in particular, the classification of supraspinatus tear is completely correct. Moreover, the false-negative rate of OAA-FSVM is only 3.33%, which suggests that the method is relatively safe for use in disease diagnosis. The average sensitivity and specificity of OAA-FSVM, 0.90 and 0.967, are likewise superior to those of the other four methods. In summary, the OAA-FSVM classification system shows great potential for diagnosing supraspinatus diseases.
3. The AUCs of the ROC curves of OAA, OAA-FSVM, OAA-DTB, OAO-VB and OAO-DAG are 0.876, 0.943, 0.921, 0.901 and 0.889, respectively. The AUC of OAA-FSVM is the largest, while that of OAA is the smallest. This result shows that OAA-FSVM succeeds in the classification of ultrasonic supraspinatus images largely because it decreases the problem of unclassifiable regions.
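The averaging in Eq. (43) can be sketched with a rank-based pairwise AUC in the spirit of Hand and Till (2001). The decision values below are hypothetical placeholders, and only two of the binary machines are shown; the point is the pairwise AUC and its macro-average over the binary SVMs actually built:

```python
# Macro-average of pairwise AUCs over the binary SVMs, as in Eq. (43).

def auc_binary(scores_pos, scores_neg):
    """AUC as the Wilcoxon rank statistic: P(score_pos > score_neg), ties counting 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def average_auc(pairwise_scores):
    """pairwise_scores maps each class pair (ci, cj) to the decision values of its binary SVM."""
    aucs = [auc_binary(pos, neg) for pos, neg in pairwise_scores.values()]
    return sum(aucs) / len(aucs)

# Hypothetical decision values for two of the binary machines:
pairs = {
    ("tear", "calcific"): ([0.9, 0.8, 0.7], [0.4, 0.6, 0.2]),
    ("tear", "normal"):   ([0.9, 0.5, 0.7], [0.5, 0.1, 0.3]),
}
overall = average_auc(pairs)
```

Dividing by the number of binary SVMs rather than by k(k-1)/2 is exactly the adjustment that Eq. (43) makes for methods that do not build every pairwise machine.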
Finally, some conclusions can be drawn. This article compares the performance of five different MCSVM implementations for the disease classification of ultrasonic supraspinatus images. The experimental results show that OAA-FSVM is the most effective implementation because of its use of fuzzy membership values; it significantly improves on the original OAA MCSVM method for classifying the supraspinatus images. As for the OAO-VB and OAO-DAG methods, the experimental results verify that they do not work as well for this classification task; how to make the original one-against-one support vector machine more effective therefore remains an interesting subject for future work. Although the OAA-FSVM method combined with the GLCM and TFCM features has been demonstrated to be effective for the classification of ultrasonic supraspinatus images, how to integrate the merits of other texture analysis methods, such as the wavelet transform (Arivazhagan, Ganesan, & Priyal, 2006; Sengur, 2008) and the Gabor transform (Al-Rawi & Yang, 2007), is also an interesting topic.

Acknowledgement

The author would like to thank the National Science Council, ROC, for supporting this work under Grant No. NSC 94-2213-E-251-005.

References

Abe, S. (2005). Support vector machines for pattern classification. Springer-Verlag.
Allwein, E., Schapire, R., & Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. AT&T Corp. Press.
Al-Rawi, M., & Yang, J. (2007). Using Gabor filter for the illumination invariant recognition of color texture. Mathematics and Computers in Simulation (available online).
Arivazhagan, S., Ganesan, L., & Priyal, S. P. (2006). Texture classification using Gabor wavelets based rotation invariant features. Pattern Recognition Letters, 27, 1976–1982.
Arslan, G., Apaydin, A., Kabaalioglu, A., Sindel, T., & Luleci, E. (1999). Sonographically detected subacromial/subdeltoid bursal effusion and biceps tendon sheath fluid: Reliable signs of rotator cuff tear? Journal of Clinical Ultrasound, 27, 335–339.
Baesens, B., Van Gestel, T., Viaene, S., Stepanova, M., Suykens, J., & Vanthienen, J. (2003). Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the Operational Research Society, 54, 627–635.
Battiti, R. (1994). Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5, 537–550.
Chang, S., & Partridge, D. (2003). Feature ranking and best feature subsets using mutual information. Neural Computation, 13, 1925–1933.
Chen, X. W. (2003). An improved branch and bound algorithm for feature selection. Pattern Recognition Letters, 24, 1925–1933.
Chen, D. R., Chang, R. F., Kuo, W. J., Chen, M. C., & Huang, Y. L. (2002). Diagnosis of breast tumors with sonographic texture analysis using wavelet transform and neural networks. Ultrasound in Medicine & Biology, 28, 1301–1310.
Chen, S. Y., & Lin, C. J. (2005). Combining SVMs with various feature selection strategies. Available from .
Chih-Jen Lin's Homepage. Accessed: February, 2008.
Chiou, H. J., Chou, Y. H., Wu, J. J., Hsu, C. C., Tiu, C. M., & Chang, C. Y. (1999). High-resolution ultrasonography of the musculoskeletal system: Analysis of 369 cases. Journal of Ultrasound in Medicine, 7, 212–218.
Diri, B., & Albayrak, S. (2008). Visualization and analysis of classifiers performance in multi-class medical data. Expert Systems with Applications, 34, 628–634.
Drucker, H., Wu, D., & Vapnik, V. N. (1999). Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 10, 1048–1054.
Dubuc, D., Zucker, S. W., Teicott, C., Quiniou, J. F., & Wehbi, D. (1989). Evaluating the fractal dimension of surfaces. Journal of the Optical Society of America, A7, 1124–1130.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874.
Hand, D. J., & Till, R. J. (2001). A simple generalization of the area under the ROC curve for multiple class classification. Machine Learning, 45, 171–186.
Haralick, R. M., Shanmugam, K., & Dinstein, I. (1973). Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, 3, 610–621.
He, D. C., & Wang, L. (1991). Texture features based on texture spectrum. Pattern Recognition, 24, 391–399.
Horng, M. H. (2007). An ultrasonic image evaluation system for assessing the severity of chronic liver disease. Computerized Medical Imaging and Graphics, 31, 485–491.
Horng, M. H., Sun, Y. N., & Lin, X. Z. (2002). Texture feature coding method for classification of liver sonography. Computerized Medical Imaging and Graphics, 26, 33–42.
Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector classification. Available from .
Hsu, C. W., & Lin, C. J. (2002). A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13, 415–425.
Huang, C. L., Chen, M. C., & Wang, C. J. (2007). Credit scoring with a data mining approach based on support vector machines. Expert Systems with Applications, 33, 847–856.
Kakkos, S. K., Stevens, J. M., Nicolaides, A. N., Kyriacou, E., Pattichis, C. S., Geroulakos, G., et al. (2007). Texture analysis of ultrasonic images of symptomatic carotid plaques can identify those plaques associated with ipsilateral embolic brain infarction. European Journal of Vascular and Endovascular Surgery, 33, 422–429.
Keller, J. M., Chen, S., & Crownover, R. M. (1989). Texture description and segmentation through fractal geometry. Computer Vision, Graphics, and Image Processing, 45, 150–166.
Krebel, U. (1999). Pairwise classification and support vector machines. In B. Schölkopf, C. J. C. Burges, & A. J. Smola (Eds.), Advances in kernel methods: Support vector learning. Cambridge, MA: MIT Press.
Kudo, K., & Sklansky, J. (2000). Comparison of algorithms that select features for pattern classifiers. Pattern Recognition, 33, 25–41.
Kwak, N., & Choi, C. H. (2002). Input feature selection by mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 1667–1671.
Lu, C., Van Gestel, T., Suykens, J. A. K., Van Huffel, S., Vergote, I., & Timmerman, D. (1999). Preoperative prediction of malignancy of ovarian tumor using least squares support vector machines. Artificial Intelligence in Medicine, 28, 281–286.
Mandelbrot, B. B. (1977). The fractal geometry of nature. San Francisco, CA: Freeman.
Mao, K. Z. (2004). Orthogonal feature selection and back elimination algorithms for feature subset selection. IEEE Transactions on Systems, Man, and Cybernetics, 34, 629–634.
Marcello, H. N.-B., Jose, B. V., Jorge, E. J., & Gerson, M. (2002). Diagnostic imaging of shoulder rotator cuff lesions. Acta Ortopédica Brasileira, 10, 31–39.
Maruyama, K., Maruyama, M., Miyao, H., & Nakano, Y. (2004). A method to make multiple hypotheses with high cumulative recognition rate using SVMs. Pattern Recognition, 37, 241–251.
Neer, C. S. II (1972). Anterior acromioplasty for the chronic impingement syndrome in the shoulder: A preliminary report. The Journal of Bone and Joint Surgery, 54, 41–50.
Oh, I. S., Lee, J. S., & Moon, B. R. (2004). Hybrid genetic algorithms for feature selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 1424–1437.
Platt, J. C., Cristianini, N., & Shawe-Taylor, J. (2000). Large margin DAGs for multiclass classification. Advances in Neural Information Processing Systems, 12, 547–553.
Polat, K., & Gunes, S. (2008). A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems. Expert Systems with Applications (available online December 2007).
Pontil, M., & Verri, A. (1998). Support vector machines for 3-D object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 637–646.
Sengur, A. (2008). Wavelet transform and adaptive neuro-fuzzy inference system for color texture classification. Expert Systems with Applications, 34, 2120–2128.
Takahashi, F., & Abe, S. (2002). Decision-tree-based multiclass support vector machines. In Proceedings of the ninth international conference on neural information processing (ICONIP'02).
Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer.
Wang, T. Y., & Chiang, H. M. (2007). Fuzzy support vector machine for multi-class text categorization. Information Processing and Management, 43, 914–929.
Wu, T. F., Lin, C. J., & Weng, R. C. (2004). Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5, 975–1005.
Yu, G. X., Ostrouchov, G., Geist, A., & Samatova, N. F. (2003). An SVM-based algorithm for identification of photosynthesis-specific genome features. In Second IEEE computer society bioinformatics conference (pp. 253–243).
Zhang, H., & Sun, G. (2002). Feature selection using tabu search method. Pattern Recognition, 35, 701–711.
Zhang, J. H., Wang, Y. Y., Dong, Y., & Wang, Y. (2007). Ultrasonographic feature selection and pattern classification for cervical lymph nodes using support vector machines. Computer Methods and Programs in Biomedicine, 88, 75–84.