Improving the accuracy of computer-aided radiographic weld inspection by feature selection

T. Warren Liao
Industrial Engineering Department, Louisiana State University, 3128 CEBA, Baton Rouge, LA 70803, USA

NDT&E International 42 (2009) 229–239. doi:10.1016/j.ndteint.2008.11.002

Article history: Received 20 April 2007; received in revised form 23 September 2008; accepted 4 November 2008; available online 21 November 2008.

Keywords: Ant colony optimization; Feature selection; Sequential forward selection; Sequential forward floating selection; Weld flaw; Weld flaw types; Classification; Weld inspection; Metaheuristic

Abstract

This paper presents new results of our continuing effort to develop a computer-aided radiographic weld inspection system. The focus of this study is on improving accuracy by feature selection. To this end, we propose two versions of ant colony optimization (ACO)-based algorithms for feature selection and show their effectiveness in improving the accuracy of detecting weld flaws and of classifying weld flaw types. The performances of the ACO-based methods are compared with that of no feature selection and that of sequential forward floating selection, a known good feature selection method. Four different classifiers, including nearest mean, k-nearest neighbor, fuzzy k-nearest neighbor, and center-based nearest neighbor, are employed to carry out the tasks of weld flaw identification and weld flaw type classification.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Welding is a major joining process used to fabricate many engineered artifacts and structures such as cars, ships, space shuttles, off-shore drilling platforms, and pipelines. Flaws resulting from welding operations are detrimental to the integrity of the fabricated artifacts and structures. Commonly seen weld flaws include lack of fusion, lack of penetration, gas holes, porosities, cracks, inclusions, etc. Of course, some flaw types might appear more often than others for a particular welding process. To maintain the desirable level of structural integrity, welds must be inspected according to the established standard. The results of weld inspection also provide useful information for identifying potential problems in the fabrication process, which is necessary for improving the welding operations. In current industrial practice, weld inspection is often carried out by certified inspectors, and radiographic testing is one of the most commonly used NDT techniques.

Efforts have been made in the past to develop computer-aided weld inspection systems based on radiographic images in order to improve the objectivity and productivity of weld inspection operations. Liao [1] decomposed the development of such a system into three stages and grouped past work into three categories accordingly, as follows:

- Segmentation of welds from the background: Felisberto et al. [2], Liao and Ni [3], Liao and Tang [4], Liao et al. [5], and Liao [6].
- Detection of flaws in welds: Carrasco and Mery [7], Daum et al. [8], Gayer et al. [9], Hyatt et al. [10], Kaftandjian et al. [11], Liao and Li [12], Liao et al. [13], Murakami [14], and Wang and Wong [15].
- Classification of different types of weld flaws: Aoki and Suga [16], Kato et al. [17], Liao [1], Murakami [14], Silva et al. [18], and Wang and Liao [19].

To the best of our knowledge, the feature selection issue has been addressed in only one previous study, i.e., Ref. [20], which showed the effectiveness of feature selection in improving classification accuracy. In that study, two feature selection methods, the context merit algorithm and the sequential forward floating search algorithm, were used together with a decision tree classifier. Silva et al. [18] measured the relevance of individual features and evaluated the performances of some selected feature subsets; however, no feature selection method was employed in their study. Since they considered only four features, feature selection is actually unnecessary in their case, because an exhaustive search could be applied to find the best feature subset for their data.


Feature selection refers to a process that selects an optimal subset of the original features based on an evaluation criterion. For a data set with N features, there exist 2^N candidate subsets. Even for a moderate N, the search space is prohibitively large for exhaustive search. The search strategies that have been employed in feature selection can be roughly grouped into three categories:

(1) Complete search, such as branch and bound, which guarantees finding the optimal result.
(2) Sequential search, which adds or removes features one (or some) at a time. Three cases can be distinguished: sequential forward selection, if starting with an empty set and adding one feature (or some number of features) at a time; sequential backward elimination, if starting with the entire set of features and removing one feature (or some features) at a time; and bidirectional search, if the two are combined.
(3) Random search, such as a genetic algorithm, which starts with a randomly selected subset and then generates the next subset either in a completely random manner or by sequential search.

Depending upon how the evaluation is carried out, a feature selection method can be characterized as a filter approach, a wrapper approach, or a hybrid. The filter approach evaluates and selects feature subsets based on the general characteristics of the data, without involving any learning model. The wrapper approach, on the other hand, employs a learning model and uses its performance as the evaluation criterion. Compared with the filter approach, the wrapper approach is known to be more accurate but computationally more expensive. The hybrid approach is designed to trade accuracy for computational speed by applying the wrapper approach only to the set of features pre-selected by the filter approach.

The ant colony optimization (ACO) metaheuristic was presented in Dorigo et al. [21], along with an overview of recent work on ant algorithms for discrete optimization. Note that their review did not cover feature selection at all. The first use of ACO for feature selection appears to have been reported in Ref. [22]. In that work, principal component analysis (PCA) was used to extract eigenfaces from images at the preprocessing stage, and ACO was then applied to select the optimal feature subset using cross-validation with a support vector machine as the classifier. Shen et al. [23] modified ACO to select variables in quantitative structure–activity relationship (QSAR) modeling and to predict the inhibiting action of some diarylimidazole derivatives on the cyclooxygenase enzyme. In another work, ACO was tailored to perform feature selection for prostate tissue characterization from trans-rectal ultrasound images, with a support vector machine as the classifier [24]; it was shown that ACO outperformed genetic algorithms in obtaining a feature subset that leads to higher classification accuracy. Zhang and Hu [25] modified ACO by using mutual information, rather than a random choice, as the problem-specific local heuristic to select feature subsets for support vector machine modeling of temperature data. Bello et al. [26] developed a two-step ant colony system for feature selection that splits the heuristic search into two stages, generating candidate feature subsets in the first stage and then randomly selecting some of them as the initial states for the ants in the second stage. In Ref. [27], a modified discrete binary ACO algorithm was shown to greatly improve the fault diagnosis performance of a support vector machine for the Tennessee Eastman process. In a most recent study, the performance of backpropagation feedforward networks in medical diagnosis was shown to improve greatly when they were used together with ACO for feature selection [28]. Most of the ACO-based feature selection methods reviewed above implement the random search strategy, with feature subset generation guided by ACO. The only exception is Ref. [28], in which a strategy similar to sequential forward selection is followed.
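To make the wrapper idea and the sequential search strategy concrete, the sketch below shows a plain wrapper-based sequential forward selection loop in Python. It is an illustration only: it is not the SFFS variant compared against later (which additionally allows conditional backward steps), and the `eval_error` callable is a hypothetical placeholder for a cross-validated classifier error such as the one described in Section 2.3.

```python
import numpy as np

def sequential_forward_selection(n_features, eval_error, max_size):
    """Greedy wrapper-based sequential forward selection (a sketch).

    eval_error: callable mapping a list of feature indices to an error
    estimate (the wrapper criterion, e.g. a cross-validated error).
    Returns the best subset found and its error.
    """
    remaining = list(range(n_features))
    selected = []
    best_subset, best_err = [], float("inf")
    while remaining and len(selected) < max_size:
        # Tentatively add each remaining feature and keep the one
        # that yields the lowest wrapper error.
        errs = [eval_error(selected + [f]) for f in remaining]
        i = int(np.argmin(errs))
        selected.append(remaining.pop(i))
        if errs[i] < best_err:
            best_err, best_subset = errs[i], list(selected)
    return best_subset, best_err
```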

This paper proposes two different ACO-based feature selection methods, each following a different search strategy, and applies them to two different stages of computer-aided weld inspection, i.e., weld flaw identification and weld flaw type classification. The novelty of this study lies in three areas: (i) the first application of ACO-based feature selection in weld inspection, (ii) the comparison of several different classifiers used together with ACO-based feature selection, and (iii) the comparison of the above-mentioned two different ACO-based search approaches (random vs. sequential) for finding optimal feature subsets. More details of the proposed methods are described in the next section. Section 3 demonstrates the working of the proposed methods with a small example. Section 4 gives the test results on the identification of weld flaws and the classification of weld flaw types to show the effectiveness of the proposed methods in comparison with the sequential forward floating search method. A discussion then follows. The last section concludes the paper and identifies topics for future study.

2. Ant colony optimization-based feature selection methods

This section describes the two proposed ACO-based feature selection methods, one following the sequential forward search strategy and the other following the random search strategy. The two methods are designed to include some basic traits commonly seen in nearly all ACO metaheuristics as well as some traits unique to the specific task of feature selection. These traits are elaborated in Section 2.1. The algorithm of each method is given in Section 2.2. The wrapper approaches for performance evaluation are described in Section 2.3.

2.1. Traits

2.1.1. It employs a colony of artificial ants that work cooperatively to find good solutions
Each artificial ant is a simple agent that possesses the path-finding behavior of a real ant observed in real ant colonies. Although each artificial ant can build a feasible solution, high-quality solutions are the result of cooperation among the individuals of the whole colony. In the context of feature selection, each artificial ant is assigned the task of finding feasible feature subsets, and the whole colony works together to find the optimal feature subsets. The colony size, i.e., the number of artificial ants, is a parameter that needs to be specified, similar to the population size, or number of chromosomes, in a genetic algorithm.

2.1.2. Each artificial ant builds its solution by applying a stochastic local search policy
The stochastic local search policy is guided primarily by the ants' private information, which contains the memory of the ants' past actions, and by publicly available pheromone trails and a priori problem-specific local information. In the context of feature selection, an ant's private information contains the features that have already been selected in building a feature subset. The stochastic local search policy used in the sequential search strategy is modified from the one proposed by Yan and Yuan [22], which was originally used in a random search strategy. Specifically, an ant chooses a feature u* from the unselected feature set U at iteration t according to the following probability:

$$
p_{u^*}(t)=\begin{cases}
1 & \text{if } q<q_0 \text{ and } u^*=\arg\max_{u\in U}\Delta P_u(t)\\[2pt]
0 & \text{if } q<q_0 \text{ and } u^*\neq\arg\max_{u\in U}\Delta P_u(t)\\[2pt]
\dfrac{\Delta P_{u^*}(t)}{\sum_{u\in U}\Delta P_u(t)} & \text{if } q\geq q_0
\end{cases}
\tag{1}
$$

In the above equation,

$$
\Delta P_u(t)=\begin{cases}
P_{u,1}(t)-P_{u,0}(t), & \text{if } \min\big(\Delta P_u(t)\big)>0\\[2pt]
P_{u,1}(t)-P_{u,0}(t)+2\,\big|\min\big(\Delta P_u(t)\big)\big|, & \text{otherwise}
\end{cases}
$$

with $P_{u,1}(t)=\tau_{u,1}(t)/2$ and $P_{u,0}(t)=\tau_{u,0}(t)$. The parameter q is a random variable uniformly distributed over [0, 1]. The parameter q0 is called the exploitation probability factor; it determines the relative importance of exploitation (the case q < q0) versus exploration (the case q ≥ q0). Note that two different pheromone levels, τ_{u,1} and τ_{u,0}, are used to represent the pheromone levels of selected and not-selected features, respectively.

The stochastic local search policy used in the random search strategy follows that of Yan and Yuan [22]. Specifically, an ant builds a feature subset by choosing each feature u from the entire feature set at iteration t according to the following probability:

$$
P_u(t)=\begin{cases}
1 & \text{if } \big(q_1<q_0 \text{ and } P_{u,1}(t)>P_{u,0}(t)\big) \text{ or } \big(q_1\geq q_0 \text{ and } q_2>0.5\big)\\[2pt]
0 & \text{if } \big(q_1<q_0 \text{ and } P_{u,1}(t)\leq P_{u,0}(t)\big) \text{ or } \big(q_1\geq q_0 \text{ and } q_2\leq 0.5\big)
\end{cases}
\tag{2}
$$

In Eq. (2), q1 and q2 are both random variables uniformly distributed over [0, 1].

2.1.3. To cooperate, artificial ants communicate by stigmergy, similar to real ants, by depositing pheromone on the ground while walking
The shorter the path between the food source and the nest, the sooner pheromone is deposited by the real ants, which in turn leads to more ants taking the shorter path. Meanwhile, the pheromone on the longer path receives few or no new deposits and evaporates over time. Artificial ants deposit an amount of pheromone that is a function of the quality of the solution found. Unlike real ants, the timing for updating pheromone trails is often problem dependent. Our methods update pheromone trails only after a solution is generated. Our pheromone updating policy follows that used by Yan and Yuan [22]; all ants are allowed to update the pheromone trails. The pheromone levels of selected and not-selected features, τ_{u,1} and τ_{u,0}, respectively, are updated as follows:

$$
\begin{aligned}
\tau_{u,1}(t+1) &= (1-e)\,\tau_{u,1}(t) + \sum_{\forall a}\Delta\tau^{a}_{u,1},\\
\tau_{u,0}(t+1) &= (1-e)\,\tau_{u,0}(t) + \sum_{\forall a}\Delta\tau^{a}_{u,0}, \qquad \forall u
\end{aligned}
\tag{3}
$$

where

$$
\Delta\tau^{a}_{u,1}(t)=\begin{cases}
\big(1-E_a(t)\big)/Q, & \text{if ant } a \text{ chooses feature } u\\
0, & \text{otherwise}
\end{cases}
\qquad
\Delta\tau^{a}_{u,0}(t)=\begin{cases}
\big(1-E_a(t)\big)/Q, & \text{if ant } a \text{ does not choose feature } u\\
0, & \text{otherwise}
\end{cases}
$$

In Eq. (3), e is the pheromone evaporation parameter; E_a(t) denotes the classification error of the feature subset found by ant a at iteration t; and Q is a constant (set at 10 in this study).

2.2. Algorithms

The two ACO-based feature selection methods are named ACO-S and ACO-R, with S and R denoting sequential forward search and random search, respectively.

2.2.1. ACO-S
The ACO-S algorithm has the following steps:

ACO-S algorithm
1. Load the data and normalize it by feature to remove the magnitude effect.
2. Set parameters, including the maximum number of runs, maxR; the maximum size of feature subsets, maxS; the maximum number of ants, maxA; the maximum number of iterations, maxT; the exploitation probability factor, q0; the pheromone evaporation parameter, e; and a constant, Q (= 10 in this study).
3. For r = 1:maxR,
   3.1. Initialize the pheromone level associated with each feature (= 1 in this study).
   3.2. For s = 1:maxS,
        3.2.1. For t = 1:maxT,
               3.2.1.1. Initialize the feature subset found by each ant to be null and its value, in terms of classification error, to be a large number.
               3.2.1.2. For a = 1:maxA,
                        3.2.1.2.1. While the feature subset size, s, is not yet reached:
                                   select a feature from the unselected list according to Eq. (1).
                                   End while
                        3.2.1.2.2. Evaluate the classification error of the feature subset found by each ant with a selected classifier.
                        End for
               3.2.1.3. Update the pheromone level of each feature according to Eq. (3), depending upon whether it is selected or not.
               3.2.1.4. Record the best feature subset in each iteration, and other interesting information such as the pheromone update profiles associated with each feature.
               End for
        End for
   3.3. Record the best feature subset in each run.
   End for
4. Write output to the file.

2.2.2. ACO-R
The ACO-R algorithm has the following steps:

ACO-R algorithm
1. Load the data and normalize it by feature to remove the magnitude effect.
2. Set parameters, including the maximum number of runs, maxR; the maximum number of ants, maxA; the maximum number of iterations, maxT; the exploitation probability factor, q0; the pheromone evaporation parameter, e; and a constant, Q (= 10 in this study).
3. For r = 1:maxR,
   3.1. Initialize the pheromone level associated with each feature (= 1 in this study).
   3.2. For t = 1:maxT,
        3.2.1. Initialize the feature subset found by each ant to be null and its value, in terms of classification error, to be a large number.
        3.2.2. For a = 1:maxA,
               3.2.2.1. While the feature subset is null:
                        build a feature subset from the entire feature set according to Eq. (2).
                        End while
               3.2.2.2. Evaluate the classification error of the feature subset found by each ant with a selected classifier.
               End for
        3.2.3. Update the pheromone level of each feature according to Eq. (3), depending upon whether a feature is selected or not.
        3.2.4. Record the best feature subset in each iteration, and other interesting information such as the pheromone update profiles associated with each feature.
        End for
   3.3. Record the best feature subset in each run.
   End for
4. Write output to the file.
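As a concrete illustration of the two outlines above, the following Python sketch implements a single run of the ACO-S loop with the Eq. (1) selection rule and the Eq. (3) pheromone update. It is a re-expression for illustration, not the author's Matlab code; `eval_error` stands for the wrapper criterion of Section 2.3, and, for simplicity, the raw pheromone difference is used in place of the exact P_{u,1}/P_{u,0} scaling defined under Eq. (1). The ACO-R variant would differ only in how an ant builds its subset: every feature is accepted or rejected in one pass over the whole feature set according to Eq. (2), instead of growing a subset of a prescribed size.

```python
import numpy as np

rng = np.random.default_rng(0)

def delta_P(tau1, tau0):
    # Pheromone preference per feature: difference of the two trails,
    # shifted to be positive when any difference is non-positive
    # (simplified stand-in for the Delta P_u(t) definition in the text).
    d = tau1 - tau0
    m = d.min()
    if m <= 0:
        d = d + 2.0 * abs(m)
    if d.sum() == 0:               # degenerate case, e.g. all trails equal
        d = np.ones_like(d)
    return d

def pick_feature(unselected, tau1, tau0, q0):
    # Eq. (1): exploit the best feature with probability q0, otherwise
    # sample among the unselected features in proportion to Delta P.
    dp = delta_P(tau1[unselected], tau0[unselected])
    if rng.random() < q0:
        return int(unselected[int(np.argmax(dp))])
    return int(rng.choice(unselected, p=dp / dp.sum()))

def aco_s(n_features, eval_error, max_size, max_ants=5, max_iter=5,
          q0=0.5, evap=0.5, Q=10.0):
    """Minimal sketch of one ACO-S run (not the author's implementation).
    eval_error maps a feature subset to a classification error in [0, 1]."""
    tau1 = np.ones(n_features)     # pheromone of "selected"
    tau0 = np.ones(n_features)     # pheromone of "not selected"
    best_subset, best_err = None, np.inf
    for size in range(1, max_size + 1):
        for _ in range(max_iter):
            ants = []
            for _ in range(max_ants):
                subset = []
                while len(subset) < size:
                    unsel = np.array([f for f in range(n_features)
                                      if f not in subset])
                    subset.append(pick_feature(unsel, tau1, tau0, q0))
                err = eval_error(subset)
                ants.append((subset, err))
                if err < best_err:
                    best_err, best_subset = err, list(subset)
            # Eq. (3): every ant deposits (1 - E_a)/Q on the "selected"
            # trail of its chosen features and on the "not selected"
            # trail of the remaining features; both trails evaporate.
            dep1, dep0 = np.zeros(n_features), np.zeros(n_features)
            for subset, err in ants:
                amount = (1.0 - err) / Q
                chosen = np.zeros(n_features, dtype=bool)
                chosen[subset] = True
                dep1[chosen] += amount
                dep0[~chosen] += amount
            tau1 = (1.0 - evap) * tau1 + dep1
            tau0 = (1.0 - evap) * tau0 + dep0
    return best_subset, best_err
```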

2.3. Wrapper approaches

The wrapper approach is employed to evaluate the goodness of a feature subset found by an artificial ant. To compare their performances, four learning models are used in turn. The performance is measured in terms of the average test error of stratified 5-fold cross-validation. The four learning models are the nearest mean (NM), k-nearest neighbor (KNN), fuzzy k-nearest neighbor (FKNN), and center-based nearest neighbor (CBNN) classifiers. Though most of these learners are well known, for the sake of completeness a brief description of each is given below.
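As a minimal sketch of this wrapper criterion, assuming a Python/scikit-learn environment rather than the Matlab implementation actually used, and assuming the features have already been normalized as in step 1 of the algorithms, the error of a candidate subset can be estimated as follows; `NearestCentroid` plays the role of the nearest mean classifier described next.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import NearestCentroid

def wrapper_error(X, y, subset, clf=None, n_splits=5, seed=0):
    """Average stratified k-fold test error of a classifier restricted to
    the candidate feature subset (the wrapper evaluation criterion)."""
    if clf is None:
        clf = NearestCentroid()   # nearest mean classifier
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True,
                            random_state=seed)
    acc = cross_val_score(clf, X[:, subset], y, cv=folds,
                          scoring="accuracy")
    return 1.0 - acc.mean()
```

A function of this form can be passed as the `eval_error` criterion in the sketches above.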

2.3.1. Nearest mean classifier
The nearest mean classifier is a single-prototype classifier, i.e., it finds the centroid of each class and stores it for later classification of unseen data. Given a set of n training samples in a c-class problem with each class C_i having n_i elements, the centroid of class i is computed as

$$
\omega_i=\frac{1}{n_i}\sum_{x_j\in C_i} x_j,\qquad i=1,2,\ldots,c.
\tag{4}
$$

In the classification phase, an unseen test datum x is assigned the class label of the stored centroid for which the Euclidean distance is the smallest, that is, i*(x) = arg min_i ||x − ω_i||.

2.3.2. K-nearest neighbor classifier
The k-nearest neighbor classifier assigns an unseen test datum to the class represented by the majority of its k nearest neighbors. Given a set of n training samples in a c-class problem, the KNN algorithm has the following steps:

KNN algorithm
1. Load the test data.
2. Set k, 1 ≤ k ≤ n.
3. For each test datum x,
   3.1. Find its k nearest neighbors using a distance measure such as the Euclidean distance.
   3.2. Determine the majority class represented in the set of k nearest neighbors; break ties if necessary.
   End for

2.3.3. Fuzzy k-nearest neighbor classifier
The fuzzy k-nearest neighbor classifier [29] assigns membership values to an unseen test datum, indicating its belongingness to all classes, rather than assigning it to one particular class only as in KNN. Given a set of n training samples in a c-class problem, the FKNN algorithm has the following steps:

FKNN algorithm
1. Load the test data.
2. Set k, 1 ≤ k ≤ n, and m (> 1), which is used in Eq. (5) below to determine how heavily the distance is weighted when computing each neighbor's contribution to the membership value.
3. For each test datum x,
   3.1. Find its k nearest neighbors, x_j, j = 1, ..., k, using a distance measure such as the Euclidean distance.
   3.2. Assign memberships in all classes to x according to Eq. (5):

$$
u_i(x)=\frac{\displaystyle\sum_{j=1}^{k} u_{ij}\left(\frac{1}{\|x-x_j\|^{2/(m-1)}}\right)}{\displaystyle\sum_{j=1}^{k}\left(\frac{1}{\|x-x_j\|^{2/(m-1)}}\right)},\qquad i=1,\ldots,c,
\tag{5}
$$

   where u_{ij} denotes the membership of the jth nearest neighbor in the ith class, which is usually known or can be determined beforehand.
   3.3. Assign x to the class with the highest membership value.
   End for

2.3.4. Center-based nearest neighbor classifier
The center-based nearest neighbor classifier [30] finds the centroid of each class as in Eq. (4) and defines a center-based line (CL) that connects a training example with its class centroid. In the classification phase, an unseen test datum x is assigned the class label of the training example used to define the CL that is closest to x. The distance from x to a CL is computed as the Euclidean distance between x and the projection of x onto the CL. Let x_j be a training example in class i with centroid ω_i. The projection point of x onto the CL connecting x_j and ω_i is calculated as

$$
p_{i,j}=x_j+\frac{(x-x_j)^{T}(\omega_i-x_j)}{(\omega_i-x_j)^{T}(\omega_i-x_j)}\,(\omega_i-x_j).
\tag{6}
$$
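The two less common formulas above can be written compactly; the sketch below is an illustration, not the paper's code, and adds a small constant as a safeguard against a zero distance that is not part of the original formulas.

```python
import numpy as np

def fknn_memberships(x, neighbors, u_ij, m=2.0):
    """Eq. (5): class memberships of test point x given its k nearest
    neighbors (k x d array) and their known class memberships u_ij
    (c x k array). Returns a length-c membership vector."""
    d = np.linalg.norm(neighbors - x, axis=1)
    w = 1.0 / np.maximum(d, 1e-12) ** (2.0 / (m - 1.0))  # distance weights
    return (u_ij * w).sum(axis=1) / w.sum()

def cbnn_line_distance(x, x_j, centroid):
    """Eq. (6): Euclidean distance from x to the center-based line (CL)
    through training example x_j and its class centroid, computed via the
    projection point p_ij of x onto that line."""
    v = centroid - x_j
    p = x_j + (np.dot(x - x_j, v) / np.dot(v, v)) * v
    return np.linalg.norm(x - p)
```

An FKNN prediction then assigns x to the class with the largest membership, while a CBNN prediction assigns x the label of the training example whose CL gives the smallest such distance.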

3. An illustrated example

A small 2-class data set of 33 records with 25 features each, sampled from the weld flaw detection dataset to be tested in Section 4, is used as the example for two purposes: (i) to show the working of each ACO-based feature selection method, and (ii) to illustrate the effect of data set size when compared with the results given in Section 4. For comparison, the example is also processed by the sequential forward floating search algorithm [31]. All programs were coded in Matlab and executed on a Dell Latitude C640 laptop computer with a Mobile Intel 2 GHz Pentium 4-M CPU. Tables 1–3 summarize the stratified 5-fold cross-validation classification error rate, the CPU time, and the best feature subsets found for each combination of feature selection method and classifier, respectively. These results were obtained with 5 ants and 5 iterations for the ACO-S method, and with 30 ants and 30 iterations for the ACO-R method, regardless of the classifier used. In the tables, they are denoted as ACO-S (5, 5) and ACO-R (30, 30), respectively. Because of the stochastic nature of the ACO-based methods, five runs were made for each, with both q0 and e arbitrarily set to 0.5. The effects of these two parameters are further investigated in the Discussion section.


Table 1. Summary of classification error rates (%) for the small dataset.

| Method | NM | 1NN | F1NN | CBNN |
|---|---|---|---|---|
| No selection | 26.67 | 16.67 | 16.67 | 16.67 |
| SFFS | 6.67 | 0 | 0 | 3.33 |
| ACO-S (5, 5) | 10.00 ± 4.71 (best = 6.67) | 2.00 ± 1.83 (best = 0) | 4.00 ± 2.79 (best = 0) | 0.67 ± 1.49 (best = 0) |
| ACO-R (30, 30) | 14.00 ± 2.79 (a) (best = 10.00) | 2.67 ± 1.49 (best = 0) | 3.33 ± 2.36 (best = 0) | 0.67 ± 1.49 (best = 0) |

(a) When both maxA and maxT were increased to 50, this value becomes 10 ± 3.33%, with a lowest error rate of 6.67%.

Table 2. Summary of CPU times (in seconds) for the small dataset.

| Method | NM | 1NN | F1NN | CBNN |
|---|---|---|---|---|
| SFFS | 14.4 | 15.1 | 14.7 | 47.3 |
| ACO-S (5, 5) | 35.5 | 38.9 | 39.2 | 67.1 |
| ACO-R (30, 30) | 17.2 (a) | 18.9 | 19.0 | 58.8 |

(a) When both maxA and maxT were increased to 50, this value becomes 44.9.

Table 3. Summary of best feature subsets found for the small dataset.

| Method | NM | 1NN | F1NN | CBNN |
|---|---|---|---|---|
| SFFS | {6} | {7,8,10} (1) | {7,8,10} (2) | {2,5,10,21} |
| ACO-S (5, 5) | {6} 3/5 | {6,9,10,22,23} (3) 1/5 | {6,10,15,17,25} (4) 1/5 | {4,8,10,18–21,25} (5a) 1/5; {5,9,10,18,21,24,25} 1/5; {2,10,18} (5b) 1/5; {1,2,11,15,17,19,21} (5c) 1/5 |
| ACO-R (30, 30) | – (6) | {6,8,9,10,17,20,21} 1/5; {2,7,8,10,11,18,21,22,25} (7a) 1/5; {7,8,10,18,19,22,23} 1/5; {8,10,12,17,19,21–23,25} (7b) 1/5; {1,8,9,10,17–19,21–23} (7c) 1/5 | {7,8,10,19,22,23} 1/5 | {2,7,10,17,23,24} 1/5 |

Notes: (1) 15 larger feature subsets with zero classification error were also found. (2) 15 larger feature subsets with zero classification error were also found. (3) 2 larger feature subsets with zero classification error were also found. (4) 2 larger feature subsets with zero classification error were also found. (5) 1, 11, and 3 more feature subsets with zero classification error were also found for 5a, 5b, and 5c, respectively. (6) When both maxA and maxT were increased to 50, the best feature subset is {6,10,23} or {2,6,7,9,15,17,24}, each obtained in one out of a total of 5 runs. (7) One more feature subset with zero classification error was also found for each of 7a, 7b, and 7c.

Their classification errors are given as "average ± standard deviation" in the tables, and the best performance is also provided. The results of using the full set of features are given in the first row of Table 1. In Table 3, the fraction after a best feature subset indicates the number of runs, out of a total of five, in which it was found. The results indicate that:

(1) Feature selection improves the classification accuracy, regardless of the method used.
(2) The best feature subset found is classifier dependent. Note that three classifiers, namely 1NN, fuzzy 1NN, and CBNN, were able to achieve zero classification error with either of the two proposed ACO-based feature selection methods. On the other hand, the SFFS method failed to attain perfect classification when CBNN was the classifier.
(3) The performance of the nearest mean classifier is relatively poor compared with the other three classifiers on this particular dataset. Note that this small data set was arbitrarily sampled from the weld flaw identification data set and is therefore not representative of the full set. It will be shown in the next section that the nearest mean classifier is actually the best among all four classifiers for the full weld flaw identification dataset.
(4) As the number of ants and iterations increases for the ACO-based feature selection methods, the chance of finding a better feature subset that yields a lower error rate generally increases, but at the cost of longer computational time.
(5) The parameters associated with the two ACO-based methods give them the ability to find better feature subsets than those obtained by the SFFS method, but the specification of these parameters is an issue that needs to be contended with. More discussion on this is given later.
(6) ACO-R is generally faster than the sequential forward floating selection method as well as the ACO-S method. However, ACO-R tends to produce best feature subsets of larger size than those produced by the sequential forward selection strategies. In addition, the variation of ACO-R results over multiple runs is generally lower than that of ACO-S results, which seems to reduce its ability to find a better feature subset than ACO-S.

Figs. 1 and 2 are plotted to show how the pheromone levels of features change with the search process in the first run of each ACO-based feature selection method when nearest mean is used as the classifier, and both q0 and e are set to 0.5. The number of updates is determined by both the number of iterations and the


[Figure 1 appears here. Fig. 1. Pheromone level change profile of an ACO-S run: (a) for selected features and (b) for not selected features. Panel (a), "ACO-S (5, 5) with Nearest Mean – Selected Features", and panel (b), "ACO-S (5, 5) with Nearest Mean – Not Selected Features", plot the pheromone level of features x1–x25 against the update sequence.]

number of features for the ACO-S method (125 = 5 × 25 in Fig. 1), and by the number of iterations specified for the ACO-R method (50 in Fig. 2). These profiles are determined by the updating rule given in Eq. (3). From the profiles, one can tell which features are selected during different stages of the search process. For example, in Fig. 1, x6 is the first feature selected to increase its pheromone level, and it remains at the highest pheromone level over the subsequent iterations. Comparing Fig. 1(a) with (b), one can easily observe that the order of the profiles is reversed, which is expected because a particular feature can only be updated either as a selected feature or as a not-selected feature in each step. These pheromone level profiles, however, do not always reveal which features form a good subset. For example, the best feature subset for the run shown in Fig. 1 is {6}, which happens to contain the first feature whose pheromone level was raised to the highest. However, the best feature subset for the run shown in Fig. 2 is {6, 10, 23}, in which none of the features has the highest pheromone level throughout the 50 iterations.

4. Test results

Both proposed ACO-based feature selection methods were applied to the identification of weld flaws and the classification of weld flaw types. Table 4 summarizes the major characteristics of the data sets tested. Detailed descriptions of how the data sets were originally obtained can be found in Refs. [13] and [19]. Each data set was also processed by the sequential forward floating selection method for comparison. The test results were obtained with 10 ants and 10 iterations for the ACO-S method and with 30 ants and 30 iterations for the ACO-R method for both data sets. Because of the relatively poor performance of the latter on the weld flaw identification data, 50 ants and 50 iterations were also tried to see whether improvement could be made. Because of the stochastic nature of the ACO-based methods, five runs were made for each, with both q0 and e arbitrarily set to 0.5, identical to the settings used in the illustrated example. The test results for each data set are presented in the following subsections.

4.1. Weld flaw identification

Tables 5–7 summarize the classification error rates, CPU times, and locally best feature subsets found, respectively. Based on these results, the following observations can be made:

(1) The nearest mean classifier is the best among all four tested, regardless of whether feature selection is applied or which feature selection method is applied.
(2) Regardless of the classifier, feature selection reduces the classification error. The lowest classification error of 16.2% is


[Figure 2 appears here. Fig. 2. Pheromone level change profile of an ACO-R run: (a) for selected features and (b) for not selected features. Panel (a), "ACO-R (50, 50) with Nearest Mean – Selected Features", and panel (b), "ACO-R (50, 50) with Nearest Mean – Not Selected Features", plot the pheromone level of features x1–x25 against the update sequence.]

Table 4. Summary of datasets used in testing.

| Dataset | Number of features | Number of records | Number of classes | Reference |
|---|---|---|---|---|
| Weld flaw identification | 25 numeric | 399 | 2 | Liao et al. [13] |
| Weld flaw type classification | 12 numeric | 147 | 6 | Wang and Liao [19] |

Table 5. Summary of classification error rates (%) for the weld flaw identification data.

| Method | NM | 1NN | F1NN | CBNN |
|---|---|---|---|---|
| No selection | 27.09 | 35.44 | 35.44 | 35.95 |
| SFFS | 16.2 | 21.77 | 21.01 | 20.76 |
| ACO-S (10, 10) | 19.8 ± 1.75 (best = 16.71) | 23.29 ± 0.96 (best = 21.77) | 24.76 ± 1.89 (best = 22.03) | 25.27 ± 2.45 (best = 21.52) |
| ACO-R (30, 30) | 21.92 ± 0.53 (best = 21.27) | 25.87 ± 0.49 (best = 25.06) | 24.76 ± 0.73 (best = 23.8) | 24.15 ± 2.27 (best = 20.76) |
| ACO-R (50, 50) | 21.11 ± 1.22 (a) (best = 18.99) | 23.44 ± 0.29 (best = 23.04) | 23.75 ± 0.73 (best = 22.79) | 24.66 ± 1.07 (best = 22.79) |

(a) When maxA and maxT were increased to 100, the error rate becomes 20.2 ± 1.05, with best = 18.48.

achieved when the SFFS feature selection is used with the nearest mean classifier. The associated best feature subset contains only three features, i.e., 2, 6, and 9. In the Discussion section, it will be shown that using other parameters the

ACO-S method can also achieve the same lowest classification error.
(3) As the number of ants and iterations increases, the classification error rates generally decrease, as shown by our experiments


with the ACO-R method. For this dataset, the ACO-R method did not work as well as the ACO-S method, because it could not find a feature subset as good as that found by the SFFS and ACO-S methods even after maxA and maxT were increased to 100. Since the computational time is already very high, we decided not to increase the maxA and maxT values further, because doing so would require even more time, to the point of being unbearable even if the best feature subset were eventually found.
(4) The variation of ACO-R results over multiple runs is again relatively lower than that of ACO-S results, as was seen with the small data set.

4.2. Weld flaw type classification

Tables 8–10 summarize the classification error rates, CPU times, and locally best feature subsets found, respectively. Based on these results, the following observations can be made:

(1) Both the 1NN and fuzzy 1NN classifiers are the best for this data set when feature selection is applied. The lowest classification error of 9.03% was achieved regardless of which feature selection method is used, although the best feature subsets associated with the lowest classification error may differ between methods.
(2) If CPU time is taken into consideration, then the fuzzy 1NN classifier is better than 1NN because it requires less computational time to obtain the same result.
(3) The ACO-R method needs to employ more ants and iterations than the ACO-S method in order to find better feature subsets. To find the best feature subset in at least one run, 10 ants and 10 iterations are sufficient for the sequential forward selection method, but 30 ants and 30 iterations are required for the random search method.
(4) Just like the small example, the ACO-based methods are better than the SFFS method when the CBNN classifier is used.

5. Discussion

The above test results have clearly shown that the ACO-based feature selection methods are capable of finding the best feature subsets, compared with the SFFS method, though at the cost of slightly longer search times. These results were obtained by fixing most of the parameters except two, i.e., the maximum number of ants, maxA, and the maximum number of iterations, maxT. In this section, the effects of two more parameters, namely the exploitation probability factor, q0, and the pheromone evaporation parameter, e, are investigated. With all other parameters fixed, these two parameters are varied at three levels each: 0.1, 0.5, and 0.9. Hence, a total of nine combinations were tested, including the one used in the previous runs, i.e., the 0.5–0.5 combination. The nearest mean classifier is chosen over the other classifiers for this investigation for two reasons: (1) it requires the least CPU time, and (2) it is the best classifier for the weld flaw identification data set. All three datasets were tested. Table 11 summarizes the effects of the ACO-S parameters on the classification error, CPU time, and best feature subsets found for the small dataset. Two levels (5 and 10) of both maxA and maxT were tested. The combinations (0.1, 0.1), short for q0 = 0.1 and e = 0.1, and (0.1, 0.9) are better than the others because they attain the lowest classification error even when maxA and maxT are set at the lower value, i.e., 5. The worst combination is (0.9, 0.1), which has the highest classification error rate at both levels of maxA and maxT. This implies that too much exploitation and too little exploration leads to higher error. Table 12 summarizes the effects of the ACO-R parameters for the

Table 6. Summary of CPU times (in seconds) for the weld flaw identification data.

| Method | NM | 1NN | F1NN | CBNN |
|---|---|---|---|---|
| SFFS | 62.5 | 4263.0 | 2052.8 | 23200.9 |
| ACO-S (10, 10) | 222.5 | 6982.6 | 3172.1 | 27653.5 |
| ACO-R (30, 30) | 42.5 | 2194.8 | 906.6 | 11975.8 |
| ACO-R (50, 50) | 102.5 (a) | 6866.7 | 3134.3 | 33953.6 |

(a) When maxA and maxT were increased to 100, the CPU time becomes 506.5.

Table 9. Summary of CPU times (in seconds) for the weld flaw type classification data.

| Method | NM | 1NN | F1NN | CBNN |
|---|---|---|---|---|
| SFFS | 6.3 | 109.0 | 46.4 | 558.9 |
| ACO-S (10, 10) | 62.7 | 489.2 | 195.4 | 1710.1 |
| ACO-R (30, 30) | 30.5 | 360.5 | 118.8 | 1163.2 |

Table 7. Summary of best feature subsets found for the weld flaw identification data.

| Method | NM | 1NN | F1NN | CBNN |
|---|---|---|---|---|
| SFFS | {2,6,9} | {2–4,6,9,10,12,20,22,23} | {2,4–10,15–17,23,25} | {1,2,4–7,9,10,12,17,20,22,23} |
| ACO-S (10, 10) | – | {2,4,6} | – | – |
| ACO-R (30, 30) | – | – | – | {1,2,4–7,13,16,22} |
| ACO-R (50, 50) | – | – | – | – |

Table 8. Summary of classification error rates (%) for the weld flaw type classification data.

| Method | NM | 1NN | F1NN | CBNN |
|---|---|---|---|---|
| None | 16.67 | 11.81 | 11.81 | 10.42 |
| SFFS | 10.42 | 9.03 | 9.03 | 10.42 |
| ACO-S (10, 10) | 11.11 ± 0.0 (best = 11.11) | 9.31 ± 0.62 (best = 9.03) | 9.44 ± 0.62 (best = 9.03) | 10.14 ± 0.38 (best = 9.72) |
| ACO-R (30, 30) | 10.83 ± 0.38 (best = 10.42) | 9.03 ± 0.0 (best = 9.03) | 9.03 ± 0.0 (best = 9.03) | 9.86 ± 0.31 (best = 9.72) |


Table 10. Summary of best feature subsets found for the weld flaw type classification data.

| Method | NM | 1NN | F1NN | CBNN |
|---|---|---|---|---|
| SFFS | {5,7,9,11} | {2,5,6,8,9,11,12} | {2,5,6,8,9,11,12} | – |
| ACO-S (10, 10) | – | {2,5,6,9,11} 1/5; {2,3,4,5,6,9,11} 1/5; {2,3,4,5,6,9,11,12} 1/5; {2,4,5,6,8,9,11,12} 1/5 | {2,3,5,6,9,11,12} 1/5; {2,5,6,8,9,11,12} 1/5; {2,3,4,5,6,8,9,11} 1/5 | {2,5,7,9,10,12} 1/5; {2,3,5,7,9,10,12} 1/5 |
| ACO-R (30, 30) | {5,7,9,11} 1/5; {5,7,9,11,12} 1/5 | {2,5,6,9,11,12} 1/5; {2,3,5,6,8,9,11,12} 1/5; {2,4,5,6,8,9,11} 1/5; {2–6,8,9,11} 1/5; {2–6,8,9,11,12} 1/5 | {2,5,6,8,9,11} 1/5; {2,5,6,8,9,11,12} 1/5; {2–6,8,9,11} 1/5; {2–6,8,9,11,12} 1/5; {2,4,5,6,9,11,12} 1/5 | {2,3,5,7,9,10,12} 3/5; {5,7,8,9,10,12} 1/5 |

Table 11. Effects of ACO-S parameters on the small dataset with the nearest mean classifier.

| Method | q0 | e | Classification error (%) | CPU time (s) | Best feature subsets |
|---|---|---|---|---|---|
| ACO-S (10, 10) | 0.1 | 0.1 | 6.67 ± 0 | 126.8 | {6} 5/5 |
| | 0.1 | 0.5 | 6.67 ± 0 | 144.5 | {6} 5/5 |
| | 0.1 | 0.9 | 6.67 ± 0 | 144.5 | {6} 5/5 |
| | 0.5 | 0.1 | 6.67 ± 0 | 169.4 | {6} 4/5, {2,6} 1/5 |
| | 0.5 | 0.5 | 6.67 ± 0 | 144.3 | {6} 5/5 |
| | 0.5 | 0.9 | 7.33 ± 1.49 (best = 6.67) | 144.4 | {6} 3/5, {6,10,18} 1/5 |
| | 0.9 | 0.1 | 10 ± 4.71 (best = 6.67) | 144.2 | {6} 3/5 |
| | 0.9 | 0.5 | 6.67 ± 0 | 144.0 | {6} 4/5, {6,18} 1/5 |
| | 0.9 | 0.9 | 6.67 ± 0 | 169.2 | {6} 3/5, {2,6} 1/5, {1,6,17} 1/5 |
| ACO-S (5, 5) | 0.1 | 0.1 | 6.67 ± 0 | 35.6 | {6} 4/5, {6,10} 1/5 |
| | 0.1 | 0.5 | 7.33 ± 1.49 (best = 6.67) | 12.9 | {6} 3/5, {6,14} 1/5 |
| | 0.1 | 0.9 | 6.67 ± 0 | 36.2 | {6} 3/5, {6,15} 1/5, {6,18} 1/5 |
| | 0.5 | 0.1 | 8.0 ± 1.83 (best = 6.67) | 36.3 | {6} 2/5, {1,6,17} 1/5 |
| | 0.5 | 0.5 | 10.0 ± 4.71 (best = 6.67) | 35.5 | {6} 3/5 |
| | 0.5 | 0.9 | 10.67 ± 2.79 (best = 6.67) | 36.5 | {6} 1/5 |
| | 0.9 | 0.1 | 17.33 ± 4.35 (best = 10) | 33.3 | – |
| | 0.9 | 0.5 | 14.67 ± 4.47 (best = 6.67) | 36.3 | {6} 1/5 |
| | 0.9 | 0.9 | 12.0 ± 7.3 (best = 6.67) | 36.2 | {6} 3/5 |

Table 12. Effects of ACO-R parameters on the small dataset with the nearest mean classifier.

| Method | q0 | e | Classification error (%) | CPU time (s) | Best feature subsets |
|---|---|---|---|---|---|
| ACO-R (50, 50) | 0.1 | 0.1 | 9.33 ± 1.49 (best = 6.67) | 47.8 | {6,9,10,15,17,20,23–25} 1/5 |
| | 0.1 | 0.5 | 8.0 ± 1.83 (best = 6.67) | 48.3 | {4,6,10,14,15,17,18,22,24} 1/5; {2,6,9,10,18,19,24,25} 1/5; {1,6,9,10,14,17,18,20,22} 1/5 |
| | 0.1 | 0.9 | 10.67 ± 1.49 (best = 10) | 47.9 | – |
| | 0.5 | 0.1 | 10.0 ± 3.33 (best = 6.67) | 45.5 | {2,6,7,9,10,15,24} 1/5; {2,6,10,14,17–20,24} 1/5 |
| | 0.5 | 0.5 | 10.0 ± 3.33 (best = 6.67) | 44.9 | {6,15,17,18} 1/5; {2,6,9,10,15,18} 1/5 |
| | 0.5 | 0.9 | 10.67 ± 4.35 (best = 6.67) | 47.8 | {6,7,9,15,17,18,24} 1/5; {2,6,14,15,17,19,24} 1/5 |
| | 0.9 | 0.1 | 10.67 ± 1.49 (best = 10) | 51.0 | – |
| | 0.9 | 0.5 | 12.0 ± 2.98 (best = 10) | 50.7 | – |
| | 0.9 | 0.9 | 12.0 ± 5.06 (best = 10) | 48.8 | – |
| ACO-R (30, 30) | 0.1 | 0.1 | 14.0 ± 2.79 (best = 10) | 17.5 | – |
| | 0.1 | 0.5 | 14.0 ± 2.79 (best = 10) | 17.5 | – |
| | 0.1 | 0.9 | 10.0 ± 2.36 (best = 6.67) | 17.4 | {4,6,10,14,17,18,19,23} 1/5 |
| | 0.5 | 0.1 | 12.0 ± 5.06 (best = 3.33) | 17.4 | {6,10,18,23,24,25} 1/5 |
| | 0.5 | 0.5 | 14.0 ± 2.79 (best = 10) | 17.2 | – |
| | 0.5 | 0.9 | 14.67 ± 2.78 (best = 10) | 17.4 | – |
| | 0.9 | 0.1 | 13.33 ± 2.36 (best = 10) | 17.9 | – |
| | 0.9 | 0.5 | 13.33 ± 2.36 (best = 10) | 17.8 | – |
| | 0.9 | 0.9 | 15.33 ± 4.47 (best = 10) | 17.6 | – |


small dataset. Two levels (30 and 50) of both maxA and maxT were also tested. Two combinations behaved somewhat unusually. The (0.1, 0.9) combination has a higher classification error rate at the higher level of maxA and maxT, which is not desirable. The (0.5, 0.1) combination found the lowest classification error rate of 3.3% at the lower level of maxA and maxT, which is desirable. Both situations are rare but can happen because of the stochastic nature of the ACO-based search methods. Generally, a lower value of q0, which favors more exploration, is better.

Table 13 summarizes the effects of both ACO-based feature selection methods on the classification error, CPU time, and best feature subsets found for the weld flaw identification dataset. None of the nine combinations produced the lowest classification error rate of 16.2% when the ACO-R method was used. On the other hand, four out of nine combinations did so when the ACO-S method was used: the (0.1, 0.1), (0.1, 0.5), (0.1, 0.9), and (0.5, 0.9) combinations. Again, a lower value of q0 is better, especially for the ACO-S method. Likewise, Table 14 summarizes the effects of both ACO-based feature selection methods for the weld flaw type classification dataset. For this dataset, the ACO-R method is better than the ACO-S method because it finds the best feature subsets for more combinations (7 out of 9 vs. 5 out of 9) in shorter search time. This is exactly the opposite of the weld flaw identification dataset. It is thus clear that the performance of a feature selection method is very much data dependent. Three feature subsets were found by both methods

Table 13. Effects of parameters on the weld flaw identification dataset with the nearest mean classifier.

| Method | q0 | e | Classification error (%) | Minimum error (%) | CPU time (s) | Best feature subsets |
|---|---|---|---|---|---|---|
| ACO-S (10, 10) | 0.1 | 0.1 | 17.62 ± 0.81 | 16.2 | 271.0 | {2,6,9} 1/5 |
| | 0.1 | 0.5 | 17.77 ± 1.57 | 16.2 | 243.1 | {2,6,9} 1/5 |
| | 0.1 | 0.9 | 19.54 ± 2.41 | 16.2 | 221.8 | {2,6,9} 1/5 |
| | 0.5 | 0.1 | 20.05 ± 1.53 | 17.98 | 248.5 | – |
| | 0.5 | 0.5 | 19.8 ± 1.75 | 16.71 | 222.5 | – |
| | 0.5 | 0.9 | 19.14 ± 2.43 | 16.2 | 222.6 | {2,6,9} 1/5 |
| | 0.9 | 0.1 | 19.6 ± 2.86 | 16.71 | 273.3 | – |
| | 0.9 | 0.5 | 21.82 ± 1.21 | 20.00 | 217.2 | – |
| | 0.9 | 0.9 | 20.76 ± 1.8 | 18.23 | 222.5 | – |
| ACO-R (50, 50) | 0.1 | 0.1 | 21.06 ± 0.66 | 20.25 | 110.7 | – |
| | 0.1 | 0.5 | 21.17 ± 0.89 | 20.25 | 118.7 | – |
| | 0.1 | 0.9 | 20.61 ± 1.5 | 17.98 | 115.0 | – |
| | 0.5 | 0.1 | 20.71 ± 1.5 | 18.23 | 122.5 | – |
| | 0.5 | 0.5 | 21.11 ± 1.22 | 18.99 | 102.5 | – |
| | 0.5 | 0.9 | 21.06 ± 0.58 | 20.25 | 128.8 | – |
| | 0.9 | 0.1 | 21.52 ± 0.76 | 20.25 | 118.8 | – |
| | 0.9 | 0.5 | 20.81 ± 1.36 | 18.48 | 120.4 | – |
| | 0.9 | 0.9 | 21.06 ± 0.83 | 19.75 | 118.5 | – |

Table 14. Effects of parameters on the weld flaw type classification dataset with the nearest mean classifier.

| Method | q0 | e | Classification error (%) | Minimum error (%) | CPU time (s) | Best feature subsets |
|---|---|---|---|---|---|---|
| ACO-S (10, 10) | 0.1 | 0.1 | 10.69 ± 0.38 | 10.42 | 61.7 | {5,7,9,11} 3/5 |
| | 0.1 | 0.5 | 10.97 ± 0.31 | 10.42 | 62.8 | {5,7,9,11} 1/5 |
| | 0.1 | 0.9 | 10.83 ± 0.38 | 10.42 | 65.3 | {5,7,9,11} 1/5; {5,7,9,11,12} 1/5 |
| | 0.5 | 0.1 | 11.67 ± 1.73 | 10.42 | 65.3 | {4,5,7,9,11} 2/5 |
| | 0.5 | 0.5 | 11.11 ± 0 | 11.11 | 62.7 | – |
| | 0.5 | 0.9 | 11.39 ± 0.79 | 10.42 | 64.9 | {4,5,7,9,11} 1/5 |
| | 0.9 | 0.1 | 12.78 ± 1.27 | 11.11 | 79.2 | – |
| | 0.9 | 0.5 | 14.17 ± 1.44 | 11.81 | 65.1 | – |
| | 0.9 | 0.9 | 13.61 ± 1.81 | 11.11 | 65.3 | – |
| ACO-R (30, 30) | 0.1 | 0.1 | 10.56 ± 0.31 | 10.42 | 30.7 | {5,7,9,11} 3/5; {4,5,7,9,11} 1/5 |
| | 0.1 | 0.5 | 10.56 ± 0.31 | 10.42 | 30.8 | {5,7,9,11} 1/5; {4,5,7,9,11} 2/5; {5,7,9,11,12} 1/5 |
| | 0.1 | 0.9 | 10.69 ± 0.38 | 10.42 | 30.8 | {5,7,9,11} 2/5; {5,7,9,11,12} 1/5 |
| | 0.5 | 0.1 | 11.11 ± 0 | 11.11 | 30.3 | – |
| | 0.5 | 0.5 | 10.83 ± 0.38 | 10.42 | 30.5 | {5,7,9,11} 1/5; {5,7,9,11,12} 1/5 |
| | 0.5 | 0.9 | 10.83 ± 0.38 | 10.42 | 30.5 | {5,7,9,11} 1/5; {5,7,9,11,12} 1/5 |
| | 0.9 | 0.1 | 10.56 ± 0.31 | 10.42 | 31.1 | {5,7,9,11,12} 3/5; {4,5,7,9,11} 1/5 |
| | 0.9 | 0.5 | 11.11 ± 0 | 11.11 | 29.9 | – |
| | 0.9 | 0.9 | 10.97 ± 0.31 | 10.42 | 29.7 | {5,7,9,11} 1/5 |


that are able to produce the lowest classification error rate of 10.42%. Once again, a lower value of q0 seems to be better for both methods.

Many metaheuristic methods exist, and some, such as simulated annealing and tabu search, have also been applied to feature selection. ACO was chosen for this study because it is the newest metaheuristic. We are not aware of any comparative study of different metaheuristic methods for feature selection. A fair comparison would require non-trivial effort to tune the parameters associated with each metaheuristic method; any results shown without such a tuning effort would be biased. We hope to carry out such comparative studies and share the results with interested readers in the near future.

6. Conclusions

To improve the accuracy of a computer-aided radiographic weld inspection system, the effectiveness of feature selection was investigated. Two ant colony optimization-based feature selection methods, based on different search strategies, were presented. The two methods were first tested with a small 25-dimensional data set and then applied to two major stages of computer-aided radiographic weld inspection, i.e., identification of weld flaws and classification of weld flaw types. The testing method employed is stratified k-fold cross-validation. Performance was measured in terms of classification error rate and CPU time. For comparison, all data sets were also processed by the sequential forward floating selection method, which has been reported to perform well in the past. Based on the test results, the following conclusions can be drawn:

(1) Feature selection improves the classification accuracy in both the identification of weld flaws and the classification of weld flaw types, compared with using no feature selection at all.
(2) Among all four classifiers tested, nearest mean produces the lowest classification error of 16.2% for the weld flaw identification dataset with feature selection by either the SFFS or the ACO-S method. The ACO-R method does not work well on this particular dataset.
(3) Among all four classifiers tested, both 1NN and fuzzy 1NN produce the lowest classification error of 9.03% for the weld flaw type classification dataset with feature selection by any one of the three methods considered. For this dataset, ACO-R seems to work better than the ACO-S method.
(4) The ACO-based feature selection methods, with proper selection of the associated parameters, work comparably to or better than the SFFS method, depending upon the classifier used.

The best combination of feature selection method and classifier seems to be data dependent. It would be worthwhile to test more datasets to determine whether one particular combination works better than others on a particular type of data. It would also be interesting to understand why the ACO-R method works well in the classification of weld flaw types but not in the identification of weld flaws; the major difference between the two datasets is the number of features, but is that the only contributing factor? Another possible future research topic is to develop better ACO-based feature selection methods, by devising a different stochastic local search policy and/or pheromone update rule. Lastly, a comparison between ant colony optimization and other metaheuristic methods for feature selection would also be useful.


References

[1] Liao TW. Classification of welding flaw types with fuzzy expert systems. Expert Systems with Applications 2003;25:101–11.
[2] Felisberto MK, Lopes HS, Centeno TM, Arruda LVA. An object detection and recognition system for weld bead extraction from digital radiographs. Computer Vision and Image Understanding 2006;102:238–49.
[3] Liao TW, Ni J. An automated radiographic NDT system for weld inspection: Part I. Weld extraction. NDT&E International 1996;29(3):157–62.
[4] Liao TW, Tang K. Automated extraction of welds from digitized radiographic images based on MLP neural networks. Applied Artificial Intelligence 1997;11:197–218.
[5] Liao TW, Li DM, Li YM. Extraction of welds from radiographic images using fuzzy classifiers. Information Sciences 2000;126:21–42.
[6] Liao TW. Fuzzy reasoning based automatic inspection of radiographic welds: weld recognition. Journal of Intelligent Manufacturing 2004;15:69–85.
[7] Carrasco M, Mery D. Segmentation of welding discontinuities using a robust algorithm. Materials Evaluation 2004;(November):1142–7.
[8] Daum W, Rose P, Heidt H, Builtjes JH. Automatic recognition of weld defects in X-ray inspection. British Journal of NDT 1987;29(2):79–82.
[9] Gayer A, Saya A, Shiloh A. Automatic recognition of welding defects in real-time radiography. NDT International 1990;23(3):131–6.
[10] Hyatt R, Kechter GE, Nagashima S. A method for defect segmentation in digital radiographs of pipeline girth welds. Materials Evaluation 1996;(November):925–8.
[11] Kaftandjian V, Dupuis O, Babot D, Zhu YM. Uncertainty modeling using Dempster–Shafer theory for improving detection of weld defects. Pattern Recognition Letters 2003;24:547–64.
[12] Liao TW, Li YM. An automated radiographic NDT system for weld inspection: Part II. Flaw detection. NDT&E International 1998;31(3):183–92.
[13] Liao TW, Li DM, Li YM. Detection of welding flaws from radiographic images with fuzzy clustering methods. Fuzzy Sets and Systems 1999;108(2):145–58.
[14] Murakami K. Image processing for non-destructive testing. Welding International 1990;4(2):144–9.
[15] Wang X, Wong S. Radiographic image segmentation for weld inspection using a robust algorithm. Research in Nondestructive Evaluation 2005;16:131–42.
[16] Aoki L, Suga Y. Application of artificial neural network to discrimination of defect type in automatic radiographic testing of welds. ISIJ International 1999;39(10):1081–7.
[17] Kato Y, Okumura T, Matsui S, Itoga K, Harada T, Sugimoto K, et al. Development of an automatic weld defect identification system for radiographic testing. Welding in the World 1992;30(7/8):182–8.
[18] da Silva RR, Calôba LP, Siqueira MHS, Rebello JMA. Pattern recognition of weld defects detected by radiographic test. NDT&E International 2004;37:461–70.
[19] Wang G, Liao TW. Automated identification of different types of welding defects in radiographic images. NDT&E International 2002;35:519–28.
[20] Perner P. Improving the accuracy of decision tree induction by feature pre-selection. Applied Artificial Intelligence 2001;15:747–60.
[21] Dorigo M, Di Caro G, Gambardella LM. Ant algorithms for discrete optimization. Artificial Life 1999;5:137–72.
[22] Yan Z, Yuan C. Ant colony optimization for feature selection in face recognition. In: Zhang D, Jain AK, editors. ICBA 2004, LNCS 3072. Berlin: Springer; 2004. p. 221–6.
[23] Shen Q, Jiang JH, Tao JC, Shen GL, Yu RQ. Modified ant colony optimization algorithm for variable selection in QSAR modeling: QSAR studies of cyclooxygenase inhibitors. Journal of Chemical Information and Modeling 2005;45:1024–9.
[24] Mohamed SS, Youssef AM, El-Saadany EF, Salama MMA. Artificial life feature selection techniques for prostrate cancer diagnosis using TRUS images. In: Kamel M, Campilho A, editors. ICIAR 2005, LNCS 3656. Berlin: Springer; 2005. p. 903–13.
[25] Zhang C, Hu H. Ant colony optimization combining with mutual information for feature selection in support vector machines. In: Zhang S, Jarvis R, editors. AI 2005, LNAI 3809. Berlin: Springer; 2005. p. 918–21.
[26] Bello R, Puris A, Nowe A, Martínez Y, García MM. Two-step ant colony system to solve the feature selection problem. In: Martínez-Trinidad JF, et al., editors. CIARP 2006, LNCS 4225. Berlin: Springer; 2006. p. 588–96.
[27] Wang L, Yu J. A modified discrete binary ant colony optimization and its application in chemical process fault diagnosis. In: Wang TD, et al., editors. SEAL 2006, LNCS 4247. Berlin: Springer; 2006. p. 889–96.
[28] Sivagaminathan RK, Ramakrishnan S. A hybrid approach for feature subset selection using neural networks and ant colony optimization. Expert Systems with Applications 2007;33:49–60.
[29] Keller JM, Gray MR, Givens JA Jr. A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems, Man, and Cybernetics 1985;15(4):580–5.
[30] Gao QB, Wang ZZ. Center-based nearest neighbor classifier. Pattern Recognition 2007;40:346–9.
[31] Pudil P, Novovičová J, Kittler J. Floating search methods in feature selection. Pattern Recognition Letters 1994;15:1119–25.