Applied Soft Computing 51 (2017) 39–48
Kernel-based learning and feature selection analysis for cancer diagnosis

Seyyid Ahmed Medjahed a,*, Tamazouzt Ait Saadi b, Abdelkader Benyettou a, Mohammed Ouali c,d

a Université des Sciences et de la Technologie d'Oran Mohamed Boudiaf, USTO-MB, BP 1505, El M'naouer, 31000 Oran, Algeria
b University of Le Havre, France, and University of Abdelhamid Ibn Badis Mostaganem, Algeria
c Computer Science Department, University of Sherbrooke, J1K 2R1, Canada
d College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
Article info

Article history: Received 2 June 2016; Received in revised form 5 November 2016; Accepted 5 December 2016; Available online 9 December 2016.

Keywords: Classification; Feature selection; Kernel-based learning; Support vector machines recursive feature elimination; Binary dragonfly
Abstract

DNA microarray is a very active area of research in the molecular diagnosis of cancer. Microarray data consist of many thousands of features and only tens to hundreds of instances, which makes the analysis and diagnosis of cancer very complex. In this case, gene/feature selection becomes an elemental and essential task in data classification. In this paper, we propose a complete cancer diagnostic process through kernel-based learning and feature selection. First, support vector machines recursive feature elimination (SVM-RFE) is used to prefilter the genes. Second, SVM-RFE is enhanced by using the binary dragonfly algorithm (BDF), a recently developed metaheuristic that has never been benchmarked in the context of feature selection. The objective function is the average of the classification accuracy rates generated by three kernel-based learning methods. We conducted a series of experiments on six microarray datasets often used in the literature. Experiment results demonstrate that this approach is efficient and provides a higher classification accuracy rate using a reduced number of genes.
1. Introduction

Cancer is a dangerous disease characterized by abnormally large cell proliferation in normal tissue. It is the most lethal disease in the world, and the diagnosis of cancer is a very difficult and complex task. Early detection of cancerous cells can increase survival rates for patients by more than 97% (American Cancer Society, http://www.cancer.org) [1]. To aid in this, classifier systems are commonly used in cancer classification to help experts make a well-informed diagnosis. Recently, DNA microarray technology for cancer diagnosis has become a very popular topic of research [2-4]. It measures the expression of large numbers of genes simultaneously and results in high-quality tumor identification. However, the number of genes is between 20,000 and 30,000, and the number of samples is often
less than 100, which results in the Hughes phenomenon (curse of dimensionality) and, consequently, an incorrect diagnosis. The term "curse of dimensionality" was coined by Richard E. Bellman in 1961 and refers to phenomena that arise when analyzing data in high-dimensional spaces. To resolve this, a gene/feature selection step is necessary. Feature selection is an essential step not only in gene selection but also in many other applications, such as pattern recognition, hyperspectral imagery, and bioinformatics [5-8]. In supervised classification, feature selection is a major step in providing a high accuracy rate, improving the quality of classification, and reducing the computational complexity of the classification algorithm. The principal aim is to reduce the dimensionality by finding the smallest subset of the original features that can achieve maximum classification accuracy. This task aids classification performance by eliminating the irrelevant and redundant features. Feature selection approaches are characterized by their objective function, search strategy, subset generator and stop criterion, and they can be classified into three classes: filter, wrapper and embedded [2,3,9]. In the filter approach, the optimal feature subset is obtained by using a measure of relevance independent from the classifier system [2]. In the wrapper approach, the objective function is computed by using
a classifier: for example, the classification accuracy rate, i.e., the percentage of examples that are correctly classified [5]. In the embedded approach, the feature selection process is integrated into the classifier system, so the search is guided by the classifier. Although the filter approach is much faster, the selected features are independent of the classification model. The wrapper approach is not as fast but provides very good results [3]. In this paper, we propose a complete cancer diagnostic process through kernel-based learning and feature selection. In addition, we propose a new feature selection approach composed of two steps. The first step consists of selecting a subset of candidate genes using SVM-RFE [10]. Since SVM-RFE does not take into account the redundancy of genes and becomes unstable at some values [11], we add a second step of gene selection based on the BDF optimizer to locate the optimal subset of genes. BDF is a new swarm intelligence optimization method inspired by the static and dynamic swarming behaviors of dragonflies in nature, developed by Seyedali Mirjalili in 2015. The second step is applied to the subset of candidate genes selected by SVM-RFE and selects the most important and informative genes from the candidate subset by minimizing the classification error rate. The problem of gene selection is a binary optimization problem, which is solved using the BDF algorithm. The proposed approach is benchmarked on six microarray datasets widely used in the literature. Experiment results show that our approach outperforms several alternative methods and demonstrate its feasibility and effectiveness. The rest of the paper is organized as follows: Section 2 details the proposed approach; Section 3 describes and analyzes the experiment results; and Section 4 concludes this study and offers some overall perspective.
2. The proposed approach

The proposed approach uses a combination of SVM-RFE and the BDF algorithm, since SVM-RFE becomes unstable at some values of the genes and does not take the redundancy of the genes into account [11]. The proposed approach seeks to improve the results obtained by SVM-RFE alone. The subset of genes generated by SVM-RFE is considered the candidate subset of genes. The second step selects the optimal, smallest subset of genes from the candidate subset by minimizing the objective function, which represents the classification error rate (classification error rate = 1 - classification accuracy rate). In other terms, we attempt to select the smallest subset of features that minimizes the classification error rate. The objective function is optimized by using BDF. Fig. 1 illustrates the proposed approach.

Fig. 1. General schema of the proposed approach.

2.1. First phase

In this pretreatment phase, SVM-RFE selects a subset of k candidate genes from the initial number of genes N [10]. The support vector machine (SVM) is a learning algorithm based on the principle of structural risk minimization [12]; it analyzes data and is used for classification and regression analysis. The SVM predicts the class of each given input. To classify the data, the SVM constructs a hyperplane in a high-dimensional space that provides good separation: the hyperplane with the largest distance to the nearest training point of any class (i.e., the one that maximizes the margin between the classes). Clearly, there are several valid hyperplanes, but a remarkable property of the SVM is that the chosen hyperplane is optimal. The points that lie closest to the maximum-margin hyperplane are called the support vectors [12]. Consider a training set of N points (x_i, y_i) with input data x_i, i = 1, ..., N and output labels y_i ∈ {-1, 1} given by an expert. The margin is the distance of the closest examples to the decision hyperplane. The primal problem of the optimal hyperplane is given as follows:
\min_{w,b} \frac{1}{2}\|w\|^2
subject to  y_i(\langle w, x_i \rangle + b) \ge 1, \quad i \in \{1, \dots, N\}   (1)
The optimal hyperplane problem can be translated into a quadratic optimization problem, defined as follows in dual form by introducing the Lagrange multipliers \alpha_i:

\min_{\alpha} \; -\sum_{i=1}^{N} \alpha_i + \frac{1}{2} \sum_{i,j} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle   (2)
subject to  \sum_{i=1}^{N} \alpha_i y_i = 0, \quad \alpha_i \ge 0 \;\; \forall i \in \{1, \dots, N\}

The weight vector of the linear SVM is given as follows:

w = \sum_{i=1}^{N} \alpha_i y_i x_i   (3)
where \alpha_i are the Lagrange multipliers, x_i is the gene expression vector, and y_i is the class of x_i. SVM-RFE is an embedded approach developed by Guyon et al. [10]; it performs feature selection by generating a ranking of the features through backward feature elimination. The ranking score is calculated from the components of the weight vector w of the SVM defined in Eq. (3). The SVM-RFE procedure repeats three steps (train the SVM, compute w, remove the lowest-weighted genes) until all genes are ranked, as shown in Algorithm 1.
Algorithm 1. SVM-RFE Algorithm
1: repeat
2:    Train SVM on the training set
3:    Compute w
4:    Remove the genes with the smallest weight
5: until all genes are ranked
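As an illustration (the paper itself gives no implementation), this ranking loop can be reproduced with scikit-learn's RFE wrapper around a linear SVM; the data shapes, C value and elimination step below are placeholders, not values from the study.

# Minimal SVM-RFE sketch with scikit-learn; data and parameters are placeholders.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.random((62, 2000))              # e.g., 62 samples x 2000 genes (colon-sized)
y = rng.choice([-1, 1], size=62)        # binary class labels

# RFE repeatedly trains the linear SVM, scores genes by the components of w,
# and drops the lowest-weighted 10% per round until 60% of the genes remain.
estimator = LinearSVC(C=1.0, max_iter=10000)
rfe = RFE(estimator=estimator, n_features_to_select=1200, step=0.1)
rfe.fit(X, y)

candidate_genes = np.flatnonzero(rfe.support_)   # indices of the retained genes
print(len(candidate_genes), "candidate genes selected")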
SVM-RFE was first developed to solve the problem of gene selection for cancer classification [10]. The algorithm is iterative: in each iteration, the SVM classifier is trained to compute the vector w, and the genes with the lowest weights are removed. Finally, the genes are sorted according to their weights, and the k genes with the highest weights are selected. In this paper, this phase is used as a preprocessing phase to reduce the number of features: it selects k ≪ N features and passes them on to the second phase.

2.2. Second phase

The wrapper approach selects features using a classifier algorithm as an evaluation function. In this phase, we use BDF as the search strategy and kernel-based learning methods to compute the objective function, which is the classification error rate. Our wrapper approach is defined as follows. Suppose D = {f_1, ..., f_k} is the feature set obtained after the first phase. The principle is to decide, for each feature f_i, whether it will be used in the classification model. Therefore, we define binary variables B = {B_1, ..., B_k} such that if B_i = 1 the feature f_i is selected; otherwise the feature f_i is not selected. The objective function is the classification error rate, computed as the average of the classification error rates obtained by ν-SVM, C-SVM and LS-SVM; taking the average of three classifiers avoids generating a subset of features that depends on a single classifier. This formulation can be seen as a combinatorial optimization problem and can be solved by a stochastic approach. We propose to use the dragonfly algorithm, a new swarm intelligence optimization method inspired by the static and dynamic swarming behaviors of dragonflies in nature, developed by Seyedali Mirjalili in 2015 [13]. We chose the dragonfly algorithm for two major reasons:
• The first is that the dragonfly algorithm has never been applied or benchmarked in the context of gene selection and cancer classification.
• The second is that the dragonfly algorithm is a new optimization algorithm that is more efficient than Particle Swarm Optimization and the Genetic Algorithm [13].

Dragonflies are insects of the order Odonata and are considered small predators that hunt almost all other small insects. The dragonfly algorithm is composed of two major phases: exploration and exploitation [13]. Reynolds [13,14] defined the behaviors of swarms as follows.

Separation of the ith dragonfly, which is the avoidance of other individuals in the neighborhood, is defined as:

S_i = -\sum_{j=1}^{n} (X_i - X_j)   (4)
where X_i is the position of the current dragonfly, X_j is the position of the jth neighboring dragonfly, and n is the total number of neighbors [13].

Alignment of the ith dragonfly, which indicates the velocity matching with the other dragonflies in the neighborhood, can be computed as follows:

A_i = \frac{\sum_{j=1}^{n} X_j}{n}   (5)
where X_j here denotes the velocity of the jth neighboring dragonfly [13].

Cohesion of the ith dragonfly, which is the tendency of the swarm towards the center of mass of the neighborhood, is given as follows:

C_i = \frac{\sum_{j=1}^{n} X_j}{n} - X_i   (6)
Attraction of the ith dragonfly, which is the attraction towards the food source, can be described as:

F_i = X^+ - X_i   (7)

where X^+ is the position of the food source.

Distraction of the ith dragonfly, which is the movement outwards from an enemy, can be computed as:

E_i = X^- + X_i   (8)

where X^- is the position of the enemy.

In this stage, we use S_i, A_i, C_i, F_i and E_i to update the position of each dragonfly. The step and position updates are defined as follows:

\Delta X_{t+1} = (s S_i + a A_i + c C_i + d F_i + e E_i) + w \Delta X_t   (9)

X_{i,t+1} = X_{i,t} + \Delta X_{i,t+1}   (10)

where s, a, c, d and e are the weights of the separation, alignment, cohesion, food source and enemy position, respectively; \Delta X is the step, w is the inertia weight and t is the iteration [13].
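As a sketch (not the authors' code), Eqs. (4)-(10) for a single dragonfly can be written in a few lines of numpy. The behavior weights below are placeholder values, and the step array dX plays the role of the velocity in Eq. (5).

# One continuous dragonfly update following Eqs. (4)-(10); weights are placeholders.
import numpy as np

def dragonfly_step(X, dX, i, neighbors, X_food, X_enemy,
                   s=0.1, a=0.1, c=0.7, d=1.0, e=1.0, w=0.9):
    S = -np.sum(X[i] - X[neighbors], axis=0)       # separation, Eq. (4)
    A = np.mean(dX[neighbors], axis=0)             # alignment (velocities), Eq. (5)
    C = np.mean(X[neighbors], axis=0) - X[i]       # cohesion, Eq. (6)
    F = X_food - X[i]                              # attraction to food, Eq. (7)
    E = X_enemy + X[i]                             # distraction from enemy, Eq. (8)
    dX[i] = s * S + a * A + c * C + d * F + e * E + w * dX[i]   # step, Eq. (9)
    X[i] = X[i] + dX[i]                            # position, Eq. (10)
    return X, dX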
This algorithm is formulated for continuous variables. For binary variables, a transfer function T converts the step into a flip probability, and the position is updated stochastically [13]:

T(\Delta X_i) = \left| \frac{\Delta X_i}{\sqrt{\Delta X_i^2 + 1}} \right|   (11)

X_{i,t+1} =
\begin{cases}
1, & r < T(\Delta X_{i,t+1}) \\
0, & r \ge T(\Delta X_{i,t+1})
\end{cases}   (12)

where r is a random variable between 0 and 1.
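The binarization of Eqs. (11) and (12) can be sketched in a few lines of numpy; this illustrates the rule as stated above and is not the authors' code.

# Map continuous steps to binary gene-selection bits via Eqs. (11)-(12).
import numpy as np

def binarize_step(dX, rng=None):
    rng = rng or np.random.default_rng()
    T = np.abs(dX / np.sqrt(dX**2 + 1.0))   # transfer function, Eq. (11)
    r = rng.random(dX.shape)                # r ~ U(0, 1), one draw per gene
    return (r < T).astype(int)              # 1 = gene selected, 0 = gene dropped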
The schema of the BDA algorithm for gene selection is given as follows:

Algorithm 2. BDA Algorithm for Gene Selection
1:  Split the microarray dataset into Z1 (training set), Z2 (validation set) and Z3 (testing set)
2:  k is the number of genes after the first phase
3:  i indexes the ith dragonfly and g the gth gene
4:  Initialize the dragonfly population X_{i,g} (i = 1, ..., n)
5:  Initialize the step vectors ΔX_{i,g} (i = 1, ..., n)
6:  Initialize max_iteration
7:  Initialize Err_Optimal = ∞
8:  for t = 1 to max_iteration do
9:     for i = 1 to n do
10:       Randomly generate Z1′ and Z2′ from Z1 and Z2
11:       Z1′ = Z1′ − {genes with X_{i,g} = 0}
12:       Z2′ = Z2′ − {genes with X_{i,g} = 0}
13:       Train C-SVM using Z1′ and evaluate it using Z2′
14:       Err_C-SVM = classification error rate
15:       Train ν-SVM using Z1′ and evaluate it using Z2′
16:       Err_ν-SVM = classification error rate
17:       Train LS-SVM using Z1′ and evaluate it using Z2′
18:       Err_LS-SVM = classification error rate
19:       Err = (Err_C-SVM + Err_ν-SVM + Err_LS-SVM) / 3
20:       if Err < Err_Optimal then
21:          Err_Optimal = Err
22:          X_Optimal = X_{i,g}
23:       end if
24:       Compute S, A and C using Eqs. (4), (5) and (6)
25:       Compute the food source F_i and the enemy E_i using Eqs. (7) and (8)
26:       Update w, s, a, c, d and e
27:       Compute the step vector ΔX_{t+1} using Eq. (9)
28:       Compute the probabilities T(ΔX) using Eq. (11)
29:       Compute the position vector X_{t+1} using Eq. (12)
30:    end for
31: end for
32: Output: X_Optimal
Algorithm 2 describes the gene selection algorithm based on the BDF optimization algorithm, which is the second phase of the approach. The algorithm takes as input the dataset containing the k genes selected in the first phase, along with the parameters of the support vector machine and the Gaussian kernel. Lines 1-7 perform the data initialization. The dataset is divided into three sets: training set (Z1), validation set (Z2) and testing set (Z3). The training and validation sets are used for gene selection, and the testing set is used to compute the classification accuracy rate on the final subset of genes. Line 8 is the main loop of the algorithm, and line 9 is the loop over the dragonfly population (the search agents). Line 10 randomly generates a training set Z1′ and a validation set Z2′ from the initial training set Z1 and validation set Z2. In lines 11 and 12, the algorithm removes the genes with X_{i,g} = 0 from these sets. In lines 13 to 18, the classification error rate is computed using the three kernel-based methods, and in line 19 the algorithm computes the average classification error rate provided by the three classifiers. In lines 20-23, the objective function is evaluated, and in lines 24-29, the positions of the dragonfly population are updated. The output of the algorithm is the subset of genes (i.e., those with X_{i,g} = 1) that produced the lowest average classification error rate. The selected genes are then evaluated on the testing set.
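To make lines 10-19 concrete, the sketch below scores one dragonfly's binary gene mask by averaging the validation error of the SVM variants. It is illustrative only: scikit-learn's SVC and NuSVC stand in for C-SVM and ν-SVM, the LS-SVM term (e.g., solved via Eq. (17) below) is omitted because scikit-learn has no LS-SVM, and γ = 1/(2σ²) with σ = 2 follows the parameter setting of Section 3.2.

# Fitness of one binary gene mask (lines 10-19 of Algorithm 2); LS-SVM term omitted.
import numpy as np
from sklearn.svm import SVC, NuSVC

def fitness(mask, X_train, y_train, X_val, y_val, gamma=1.0 / (2 * 2.0**2)):
    cols = mask.astype(bool)
    if not cols.any():                        # an empty gene subset is invalid
        return 1.0
    Xtr, Xva = X_train[:, cols], X_val[:, cols]
    errors = []
    for clf in (SVC(C=1.0, kernel="rbf", gamma=gamma),        # C-SVM
                NuSVC(nu=0.5, kernel="rbf", gamma=gamma)):    # nu-SVM
        clf.fit(Xtr, y_train)
        errors.append(1.0 - clf.score(Xva, y_val))            # error = 1 - accuracy
    return float(np.mean(errors))             # average classification error rate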
This study's major contribution is achieving a high classification accuracy rate by solving the problem of gene selection. Our approach combines the advantages of two different approaches, the embedded and wrapper approaches. The embedded approach, SVM-RFE, is used to reduce the number of genes, selecting k genes from the N total genes. The second phase uses a wrapper approach based on BDF. Generally, the standard wrapper approach uses the classification error rate produced by a single classifier, which gives rise to three disadvantages: the subset of selected genes is specific to this single classifier; overfitting increases when the number of samples is insufficient; and the computational time is too large when the number of genes is large. The proposed approach overcomes these disadvantages as follows:

• SVM-RFE reduces the number of genes and, consequently, the computational time of the second step.
• Randomly choosing new training and validation sets in each iteration of the algorithm prevents overfitting.
• Using the average of the classification error rates generated by three kernel-based classifiers ensures that the subset of selected genes is not specific to, nor dependent on, a single classifier.

2.3. The objective function

In this study, we propose an objective function that represents the average of the classification error rates computed by three classifiers: C-SVM, ν-SVM and LS-SVM. We have chosen three different variants of SVM. LS-SVM differs from C-SVM and ν-SVM in the resolution of the quadratic problem: LS-SVM transforms the dual problem into a set of linear equations, while C-SVM and ν-SVM solve the dual problem using SMO, active set or other methods. ν-SVM, compared to C-SVM, has the advantage of controlling the number of support vectors used. Note that, while other classifiers can be used with this approach, we focused our study on kernel-based classifiers.
2.3.1. C-SVM

C-SVM was developed to classify data that are not linearly separable by introducing the slack variables \xi_i, which represent the distance between wrongly classified points and the optimal hyperplane. Another parameter, called C, controls the trade-off between the slack variables and the margin size [12]. The quadratic optimization problem is given as follows:

\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i   (13)
subject to  y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \;\; i \in \{1, \dots, N\}

The dual problem of (13) has the following form:

\min_{\alpha} \; -\sum_{i=1}^{N} \alpha_i + \frac{1}{2} \sum_{i,j} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle   (14)
subject to  \sum_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C \;\; \forall i \in \{1, \dots, N\}
2.3.2. ν-SVM

ν-SVM controls the complexity (the number of support vectors) of the SVM by introducing a constant ν and a margin variable ρ [15]. The primal form of the quadratic optimization problem is as follows:

\min_{w,b,\xi,\rho} \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{N} \sum_{i=1}^{N} \xi_i   (15)
subject to  y_i(\langle w, x_i \rangle + b) \ge \rho - \xi_i, \quad \xi_i \ge 0 \text{ and } \rho \ge 0, \;\; i \in \{1, \dots, N\}

The dual problem is given as follows:

\min_{\alpha} \frac{1}{2} \sum_{i,j} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle   (16)
subject to  \sum_{i=1}^{N} \alpha_i y_i = 0, \quad \sum_{i=1}^{N} \alpha_i \ge \nu, \quad 0 \le \alpha_i \le \frac{1}{N} \;\; \forall i \in \{1, \dots, N\}
Problems (14) and (16) can be solved using Sequential Minimal Optimization, active set or interior point methods, among others.

2.3.3. LS-SVM

LS-SVM is a least-squares version of the SVM. Its dual problem is defined as a system of linear equations; in other terms, LS-SVM finds the solution by solving a linear system instead of a quadratic program [16]. The dual problem (14) is transformed into the following linear system:

\begin{bmatrix} 0 & 1_N^T \\ 1_N & \Omega + \gamma^{-1} I_N \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ Y \end{bmatrix}   (17)

where Y = [y_1, ..., y_N]^T, 1_N = [1, ..., 1]^T, \alpha = [\alpha_1, ..., \alpha_N], I_N is the identity matrix, \Omega is the kernel matrix, and \gamma is the regularization parameter.
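Eq. (17) can be solved directly with a dense linear solver. The sketch below is an illustrative numpy implementation of Eq. (17) as written, with Ω the Gaussian kernel matrix and with placeholder values for the regularization γ and the kernel width σ; prediction then uses f(x) = sign(Σ_i α_i K(x, x_i) + b).

# Illustrative LS-SVM solver for Eq. (17); gamma and sigma are placeholder values.
import numpy as np

def rbf_kernel(A, B, sigma=2.0):
    sq = np.sum((A[:, None, :] - B[None, :, :])**2, axis=2)
    return np.exp(-sq / (2 * sigma**2))

def lssvm_train(X, y, gamma=1.0, sigma=2.0):
    N = X.shape[0]
    Omega = rbf_kernel(X, X, sigma)            # kernel matrix
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                             # 1_N^T
    A[1:, 0] = 1.0                             # 1_N
    A[1:, 1:] = Omega + np.eye(N) / gamma      # Omega + gamma^{-1} I_N
    rhs = np.concatenate(([0.0], y.astype(float)))
    sol = np.linalg.solve(A, rhs)              # one dense solve, no QP needed
    return sol[0], sol[1:]                     # b, alpha

def lssvm_predict(Xq, X, b, alpha, sigma=2.0):
    return np.sign(rbf_kernel(Xq, X, sigma) @ alpha + b)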
3. Experimental results

In this section, we present the results obtained by the experiments. The performance evaluation is conducted in terms of classification accuracy rate, number of selected genes and computational time. We conducted two experiments.

The first experiment was performed to demonstrate the relevance of the proposed approach. We ran it on five synthetic (artificial benchmark) datasets widely used for testing the performance of feature selection approaches, called CorrAL, m-of-n-3-7-10, Monk1, Monk2 and Monk3 [17]. These datasets can be obtained from https://www.sgi.com/tech/mlc/db/. The first dataset, CorrAL, is composed of 6 features (a0, a1, b0, b1, Irr, Corr) and represents the target concept (a0 ∩ a1) ∪ (b0 ∩ b1); Irr is an irrelevant feature and Corr is a feature highly correlated with the class label. The second dataset, m-of-n-3-7-10, is composed of 10 features, including 3 irrelevant ones; features 2, 3, 4, 5, 6, 7 and 8 are relevant to the class label.
The datasets Monk1, Monk2 and Monk3 are composed of 6 features (a1, ..., a6). The target concept of each dataset is defined as follows:

• Monk1: (a1 = a2) ∨ (a5 = 1). Relevant features: a1, a2, a5.
• Monk2: (an = 1) for exactly two choices of n. Relevant features: a1, a2, a3, a4, a5, a6.
• Monk3: (a5 = 3 ∧ a4 = 1) ∨ (a5 ≠ 1 ∧ a2 ≠ 3). Relevant features: a2, a4, a5.

Table 1 shows the selected features and the classification accuracy rate obtained by the proposed approach for each artificial dataset.

Table 1. Classification accuracy rate (%) and the selected features obtained by the proposed approach for each artificial dataset.

Datasets          Selected features        Classification accuracy rate
CorrAL            1, 2, 3, 4, 6            90.62
m-of-n-3-7-10     3, 4, 5, 6, 7, 8, 9      100
Monk1             1, 2, 5                  97.22
Monk2             1, 2, 3, 6               68.00
Monk3             2, 4, 5                  80.56

As observed in Table 1, for the CorrAL dataset, the proposed algorithm selected a0, a1, b0, b1 and Corr as the optimal feature subset, with a classification accuracy rate of 90.62%. The results obtained for m-of-n-3-7-10 are very satisfactory: the classification accuracy rate reached 100%, with 7 selected features (3, 4, 5, 6, 7, 8, 9). Since the relevant features are 2, 3, 4, 5, 6, 7 and 8, the proposed approach selected 6 relevant features among the 7. For the Monk1 dataset, our approach selected features 1, 2 and 5, which correspond to the target concept, with an accuracy rate of 97.22%. For the Monk2 dataset, the approach selected features 1, 2, 3 and 6 and reached an accuracy rate of 68%. The results obtained for Monk3 are satisfactory: the approach selected features 2, 4 and 5 as the optimal feature subset, which matches the target concept.

The second experiment was conducted on real datasets, namely the DNA microarray datasets.

3.1. Datasets

The microarray datasets used in our study are common in the literature. We chose six DNA microarray datasets for this research: breast cancer, colon cancer, DLBCL (diffuse large B-cell lymphoma), leukemia, lung cancer, and ovarian cancer. Table 2 presents information about the microarray datasets: the name of each dataset, the number of genes, the number of samples, and the number of classes.

Table 2. Information about the microarray datasets.

Dataset name          Number of genes   Number of samples   Number of classes
Breast cancer [18]    24,481            97                  2
Colon cancer [19]     2000              62                  2
DLBC [20]             4026              47                  2
Leukemia [21]         5147              72                  2
Lung cancer [22]      12,533            181                 2
Ovarian cancer [23]   15,154            253                 2

The datasets are taken from the Kent Ridge (KR) Bio-Medical Data
Set Repository and can be found at http://datam.i2r.a-star.edu.sg/datasets/krbd/.

3.2. Parameter settings

To assess the performance of the approach, we used an explicit investigative setting. Each microarray dataset was split into three subsets: 50% of the instances were used for the training phase, 30% for the validation phase, and the remaining 20% for the testing phase. To overcome the problem of overfitting, we randomly regenerated the training and validation sets in each iteration of the algorithm.

In the first phase, SVM-RFE selected 60% of the initial genes (k = 60%). The value of k was determined empirically: we tested 30%, 40%, 50%, 60%, 70% and 80%, and the value that gave good results is 60%. A small value of k (20%, 40%) selects a small number of genes in the first phase, leaving the second phase with little to contribute; a large value of k (70%, 80%) makes the second step very computationally intensive, and the processing time can be prohibitive. Since the second phase is a wrapper approach and is very computationally intensive, the idea is to balance the number of selected genes between the two phases. In addition, the value of 60% produces good classification accuracy. Fig. 2 shows the classification accuracy rate obtained for different percentages of selected genes in the first step (SVM-RFE); as seen in the figure, the highest classification accuracy rate for each dataset is achieved by using 60% of the genes.

Fig. 2. Classification accuracy rate for each percentage of selected genes obtained by the first step (SVM-RFE).

The SVM classifier was used with the Gaussian kernel, and the parameter σ was initialized to 2. Tuning the hyper-parameters of the SVM and the kernel function is a crucial task; many optimization algorithms, such as PSO, SA-SVM and DIRECT, can be used to optimize the hyper-parameters automatically [24,25]. The parameters of the BDF algorithm were as follows: the dragonfly population size was set to 60, and the algorithm stops when the objective function is equal to 0 or when the maximum number of iterations, fixed to 500, is reached. All of these parameters were chosen by experimentation and have proven their performance.
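The 50/30/20 protocol above can be expressed compactly with scikit-learn's train_test_split; this sketch is an illustration of the stated protocol (the stratification is an added assumption, not stated in the paper).

# 50% train / 30% validation / 20% test split, regenerated with a new seed each run.
from sklearn.model_selection import train_test_split

def split_50_30_20(X, y, seed=None):
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, train_size=0.5, stratify=y, random_state=seed)
    # 30/20 of the whole dataset = 60/40 of the remaining half
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, train_size=0.6, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)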
3.3. Results and discussion

The experimentation is conducted in terms of classification accuracy rate, computational time and number of selected genes. The classification accuracy rate was computed under different training and testing sets: we ran 100 executions of the proposed approach and, at each execution, we randomly split the dataset into training, validation and testing sets, computed the classification accuracy rate, and retained the best, the worst, and the mean values. Tables 3 and 4 report the classification accuracy rate, the number of selected genes and the computational time obtained by the proposed approach for each microarray dataset under the testing set. Table 3 presents the best, mean and worst classification accuracy rates for each microarray dataset at each step: the second column gives the rates for the first step (SVM-RFE) and the third column the rates for the second step (BDF). Table 4 describes the number of selected genes and the computational time obtained in each step: the initial number of genes before treatment, the number of genes selected by SVM-RFE (the 60% of genes with the highest SVM weights), the number of genes after applying the BDF algorithm, and the computational time.
Table 3. Classification accuracy rate (%) obtained by the proposed approach in the first and second step.

Dataset name     Step 1: SVM-RFE              Step 2: BDF
                 Best    Mean    Worst        Best    Mean    Worst
Breast cancer    78.94   64.71   47.36        89.47   86.22   84.11
Colon cancer     95.65   81.08   60.86        100     97.46   95.04
DLBC             96.87   80.62   59.37        100     89.44   86.67
Leukemia         96.20   81.48   61.55        100     95.81   88.89
Lung cancer      95.34   80.73   60.44        100     99.14   97.18
Ovarian cancer   99.00   93.80   85.07        100     98.19   94.00
Table 4. Number of selected genes and computational time obtained by the proposed approach in the first and second step.

Dataset name     Initial number of genes   Step 1: SVM-RFE   Step 2: BDF   Computational time (s)
Breast cancer    24,481                    14,688            7237          1149.19
Colon cancer     2000                      1200              510           287.99
DLBC             4026                      2415              1210          381.02
Leukemia         5147                      3088              1522          498.49
Lung cancer      12,533                    7519              3737          831.33
Ovarian cancer   15,154                    9092              4573          939.80
From Table 3, we can observe that the classification accuracy rate is significantly improved in the second step compared to the first: in the first step, SVM-RFE selects the top 60% of the genes, and in the second step the BDF algorithm selects the smallest subset among that 60% that achieves a high classification accuracy rate. Analysis of the results presented in Tables 3 and 4 demonstrates the performance of the proposed approach. The classification accuracy rate obtained by our approach reached 100% for five microarray datasets: colon, DLBCL, leukemia, lung and ovarian. For the breast cancer dataset, the classification accuracy rate achieved was 89.47%. The number of genes was significantly reduced: SVM-RFE removes the 40% of genes that are considered irrelevant, and the BDF algorithm finds the optimal subset of genes to improve the classification accuracy rate. The performance on the breast cancer data was the least satisfying: 7237 genes were selected from 24,481 (29.56% of the genes), and the mean classification accuracy rate was 86.22%, so the results for this dataset were not as good as for the other datasets. As seen in Tables 3 and 4, the performance on the colon cancer data was very satisfying, as a maximum accuracy of 100% was reached with 510 of the 2000 genes (25%). The results obtained for the DLBCL dataset were also good, as 100% accuracy was achieved using 30% of the genes (1210 of 4026). For the leukemia dataset, 1522 genes were selected from 5147 (29.58%), attaining a 100% classification accuracy rate. The analysis of the lung cancer results shows that the proposed approach reached maximum accuracy with 3737 of 12,533 genes (29.81%). The last dataset is ovarian cancer, for which the proposed approach produced 100% accuracy with 4573 of 15,154 genes (30.17%). In terms of computational time, wrapper approaches are demanding because they make repetitive calls to the classifier. Our approach is not overwhelming in this respect because the first phase eliminates 40% of the genes using SVM-RFE; we use SVM-RFE in the first step as a pre-processing step precisely because the BDF algorithm of the second step becomes very computationally expensive when the number of features is large. We also tested another filtering technique, mRMR, instead of SVM-RFE for selecting the top k = 60% of genes from the initial gene set. The results are described in Table 5, which shows the best, average and worst classification accuracy rates obtained by the proposed approach at each step on the six datasets. We observe from Table 5 that the results obtained using mRMR as the pre-processing stage are quite similar to those obtained with SVM-RFE. As a pre-processing stage, one can therefore use either SVM-RFE or mRMR: both methods produce a decent seed point from which the BDF method takes over, and the choice of initialization method matters little since much of the work is done by BDF in the second stage. Nevertheless, mRMR has a slight advantage over SVM-RFE in terms of accuracy and is slightly faster; hence, we propose to use mRMR instead of SVM-RFE in the pre-filtering stage. A comparison with other approaches is given in Table 6; the results are reported from [26].
Table 6 compares the classification accuracy rate obtained by our approach with other approaches suggested in the literature. We note that the methods described in the table are reported from [26] and use cross validation to compute the classification accuracy rate; these papers do not detail how the experiments were conducted [26]. In addition, we compare the proposed approach with seven filter approaches: Max-Relevance Min-Redundancy (mRMR) [9,17], Mutual Information Maximization (MIM) [45], Mutual Information Feature Selection (MIFS) [45], Conditional Mutual Information Maximization (CMIM) [45], Joint Mutual Information (JMI) [9,17], Gini Index and Relief. Four wrapper approaches are also used for comparison: Particle Swarm Optimization (PSO) [46], Binary Bat Algorithm (BBA) [47], Binary Gravitational Search Algorithm (BGSA) [48] and Genetic Algorithm (GA) [37]. The proposed approach is also compared to two classifiers, SVM and KNN, using all the features. Table 7 presents the resulting classification accuracy rates. The comparison between the proposed approach and the other feature selection techniques was conducted with the same training and testing samples, and SVM with a Gaussian kernel (σ = 2) was used to report the final performance after the selection of the optimal subset of features. For the filter methods, we tested different percentages of selected features and retained the subset that provided the best classification accuracy rate. As seen in Table 7, our proposed approach achieved a high classification accuracy rate while selecting the fewest genes; most remarkably, on nearly all of the datasets we obtained the maximum classification accuracy rate (100%). To validate the performance of our approach, we tested its stability by executing the algorithm 100 times with randomly generated training and testing sets, using the holdout method. Stability can be defined as the sensitivity of the feature preference (i.e., how different training sets affect the feature preference) [49]. The results are illustrated in Fig. 3. From the experiment results, we can conclude that our approach produced satisfying results compared to other approaches. This paper can be summarized with the following points:
Table 5. Classification accuracy rate (%) obtained by the proposed approach with mRMR in the first step and BDF in the second step.

Dataset name     Step 1: mRMR                 Step 2: BDF
                 Best    Mean    Worst        Best    Mean    Worst
Breast cancer    77.29   71.31   65.12        88.31   83.78   80.01
Colon cancer     90.61   81.65   70.00        100     96.66   93.42
DLBC             95.83   89.70   83.19        100     93.75   88.02
Leukemia         97.07   92.99   89.10        100     95.53   90.99
Lung cancer      95.62   90.05   84.65        100     96.98   94.33
Ovarian cancer   91.50   85.21   79.88        100     97.16   94.00
Table 6. Comparison between the proposed approach and some previous work.

Authors                     Colon   DLBC    Leukemia   Lung    Ovarian
Muchenxuan et al. [27]      90.30   96.10   97.20      –       –
Tan et al. [28]             80.70   80.50   73.60      –       –
Ye et al. [29,26]           85.00   –       97.50      –       –
Liu et al. [30,26]          91.90   98.00   100        100     99.20
Tan and Gilbert [31,26]     95.10   –       91.10      93.20   –
Ding and Peng [32,26]       93.50   –       100        97.20   –
Cho and Won [33,26]         87.70   93.00   95.90      –       –
Yang et al. [34,26]         84.80   –       73.20      –       –
Peng et al. [35,26]         87.00   –       98.60      100     –
Wang et al. [36,26]         100     95.60   95.80      –       –
Huerta et al. [37,26]       91.40   –       100        –       –
Pang et al. [38,26]         83.80   –       94.10      91.20   98.80
Li et al. [39,26]           83.50   93.00   97.10      –       99.90
Zhang et al. [40,26]        90.30   92.20   100        100     –
Yue et al. [41,26]          85.40   –       91.50      –       –
Hernandez et al. [42,26]    84.60   –       100        –       –
Li et al. [43,26]           93.60   –       100        –       –
Wang et al. [44,26]         93.50   –       100        –       –
Edmundo et al. [26]         93.50   100     100        98.30   98.80
This study (average)        97.46   89.44   95.81      99.14   98.19
This study (best)           100     100     100        100     100
Table 7. Comparison between the best classification accuracy rate obtained by the proposed approach and several feature selection approaches.

Datasets                Colon   Leukemia   Ovarian   Breast   DLBC    Average performance   Standard deviation

Filter approaches
MIM                     82.61   70.37      89.47     74.03    95.67   82.43                 10.49
mRMR                    90.61   97.07      91.50     77.29    95.83   90.46                 7.85
MIFS                    82.61   77.78      87.20     66.16    87.50   80.25                 8.81
JMI                     73.91   66.67      91.81     69.89    86.11   77.67                 10.80
CMIM                    56.62   62.69      91.25     65.16    85.67   72.27                 15.22
Gini Index              80.50   94.00      99.40     72.95    94.44   88.25                 11.06
Relief                  81.46   91.61      95.82     47.37    91.00   81.45                 19.76

Wrapper approaches
PSO                     99.83   100        100       76.32    99.44   95.11                 10.51
BGSA                    67.50   85.00      92.60     48.15    73.33   73.31                 17.14
BBA                     77.08   90.35      69.00     57.10    77.22   74.15                 12.22
GA                      98.30   97.20      98.83     95.86    98.27   97.69                 1.18

Using all features
KNN                     91.67   96.43      93.00     63.16    72.22   83.29                 14.70
SVM                     95.33   100        100       72.42    94.44   92.43                 11.48

This study
Best values             100     100        100       89.47    100     97.89                 4.70
Average values          97.46   95.81      98.19     86.22    89.44   93.42                 5.30
Standard deviation      1.31    3.43       1.50      2.52     8.18    –                     –
1. The proposed approach is an amalgamation of embedded and wrapper approaches (SVM-RFE and BDF).
2. The embedded approach is SVM-RFE, which is used to reduce the initial number of genes (60% of genes selected).
3. The wrapper approach is based on the BDF algorithm, a recent metaheuristic never before used in the context of gene selection.
4. The objective function is the average classification error rate obtained by three variants of SVM (C-SVM, ν-SVM and LS-SVM).
5. The proposed approach is not time-consuming compared to other wrapper approaches defined in the literature.
6. Compared to other methods, the proposed approach is powerful and efficient in selecting a subset of genes that achieves a high classification accuracy rate.
Fig. 3. Classification accuracy rate obtained by different training and test sets.
4. Conclusion

In this paper, we present a cancer diagnostic process based on feature selection. The proposed approach incorporates two phases: the first selects 60% of the genes as candidates using SVM-RFE, and the second selects the optimal subset of genes from the candidates using the BDF algorithm. In the second phase, we optimize the objective function, which is the mean of the three classification error rates obtained by three kernel-based learning methods. The performance of our approach was demonstrated on six microarray datasets: breast cancer, colon cancer, ovarian cancer, DLBCL, leukemia and lung cancer. Additionally, our approach was compared with several approaches from the literature. Compared to previous work, our approach provides a high classification accuracy rate with a much smaller gene subset. According to our experimentation, our approach can successfully address the problem of gene selection and the diagnosis of cancer. We also tested the stability of our approach by running the algorithm 100 times, and the results show that it is stable. Gene selection is an important issue in medical research: it improves the quality and reduces the complexity of the classifier model and, consequently, significantly improves the classification accuracy rate. For future work, instead of minimizing the classification error rate, another objective function could be designed from class separability distances or the correlation between genes.
References

[1] K.D. Miller, R.L. Siegel, C.C. Lin, A.B. Mariotto, J.L. Kramer, J.H. Rowland, K.D. Stein, R. Alteri, A. Jemal, Cancer treatment and survivorship statistics, 2016, CA Cancer J. Clin. 66 (4) (2016) 271-289.
[2] R. Sheikhpour, M.A. Sarram, R. Sheikhpour, Particle swarm optimization for bandwidth determination and feature selection of kernel density estimation based classifiers in diagnosis of breast cancer, Appl. Soft Comput. 40 (2016) 113-131.
[3] J. Apolloni, G. Leguizamón, E. Alba, Two hybrid wrapper-filter feature selection algorithms applied to high-dimensional microarray experiments, Appl. Soft Comput. 38 (2016) 922-932.
[4] B.A. Garro, K. Rodríguez, R.A. Vázquez, Classification of DNA microarrays using artificial neural networks and ABC algorithm, Appl. Soft Comput. 38 (2016) 548-560.
[5] A.A. Abdoos, P.K. Mianaei, M.R. Ghadikolaei, Combined VMD-SVM based feature selection method for classification of power quality events, Appl. Soft Comput. 38 (2016) 637-646.
[6] Y. Lin, Q. Hu, J. Liu, J. Chen, J. Duan, Multi-label feature selection based on neighborhood mutual information, Appl. Soft Comput. 38 (2016) 244-256.
[7] J. Pérez-Rodríguez, A.G. Arroyo-Peña, N. García-Pedrajas, Simultaneous instance and feature selection and weighting using evolutionary computation: proposal and study, Appl. Soft Comput. 37 (2015) 416-443.
[8] S.A. Medjahed, T.A. Saadi, A. Benyettou, M. Ouali, Gray Wolf Optimizer for hyperspectral band selection, Appl. Soft Comput. 40 (2016) 178-186.
[9] O.M. Soufan, D. Kleftogiannis, P. Kalnis, V.B. Bajic, DWFS: a wrapper feature selection tool based on a parallel genetic algorithm, PLOS ONE 10 (2) (2015) 1-23.
[10] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer classification using support vector machines, Mach. Learn. 46 (1) (2002) 389-422.
[11] P.A. Mundra, J.C. Rajapakse, SVM-RFE with MRMR filter for gene selection, IEEE Trans. Nanobiosci. 9 (1) (2010) 31-37.
[12] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (3) (1995) 273-297.
[13] S. Mirjalili, Dragonfly algorithm: a new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems, Neural Comput. Appl. (2015) 1-21.
[14] C. Reynolds, Flocks, herds and schools: a distributed behavioral model, SIGGRAPH Comput. Graph. 21 (4) (1987) 25-34.
[15] B. Schölkopf, A.J. Smola, R.C. Williamson, P.L. Bartlett, New support vector algorithms, Neural Comput. 12 (5) (2000) 1207-1245.
[16] J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific, Singapore, 2002.
[17] O.M. Soufan, An Empirical Study of Wrappers for Feature Subset Selection Based on a Parallel Genetic Algorithm: The Multi-Wrapper Model, Master's thesis, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia, 2012.
[18] L.J. van 't Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A.M. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, S.H. Friend, Gene expression profiling predicts clinical outcome of breast cancer, Nature 415 (6871) (2002) 530-536.
[19] U. Alon, N. Barkai, D. Notterman, K. Gish, S. Ybarra, D. Mack, A. Levine, Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proc. Natl. Acad. Sci. 96 (12) (1999) 6745-6750.
[20] A.A. Alizadeh, M.B. Eisen, R.E. Davis, C. Ma, I.S. Lossos, A. Rosenwald, J.C. Boldrick, H. Sabet, T. Tran, X. Yu, J.I. Powell, L. Yang, G.E. Marti, T. Moore, J.J. Hudson, L. Lu, D.B. Lewis, R. Tibshirani, G. Sherlock, W.C. Chan, T.C. Greiner, D.D. Weisenburger, J.O. Armitage, R. Warnke, R. Levy, W. Wilson, M.R. Grever, J.C. Byrd, D. Botstein, P.O. Brown, L.M. Staudt, Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature 403 (6769) (2000) 503-511.
[21] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science 286 (1999) 531-537.
[22] G.J. Gordon, R.V. Jensen, L.-L. Hsiao, S.R. Gullans, J.E. Blumenstock, S. Ramaswamy, W.G. Richards, D.J. Sugarbaker, R. Bueno, Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma, Cancer Res. 62 (17) (2002) 4963-4967.
[23] E.F. Petricoin, A.M. Ardekani, B.A. Hitt, P.J. Levine, V.A. Fusaro, S.M. Steinberg, G.B. Mills, C. Simone, D.A. Fishman, E.C. Kohn, L.A. Liotta, Use of proteomic patterns in serum to identify ovarian cancer, Lancet 359 (9306) (2002) 572-577.
[24] S.-W. Lin, K.-C. Ying, S.-C. Chen, Z.-J. Lee, Particle swarm optimization for parameter determination and feature selection of support vector machines, Expert Syst. Appl. 35 (4) (2008) 1817-1824.
[25] S.-W. Lin, Z.-J. Lee, S.-C. Chen, T.-Y. Tseng, Parameter determination of support vector machine and feature selection using simulated annealing approach, Appl. Soft Comput. 8 (4) (2008) 1505-1512.
[26] E.B. Huerta, B. Duval, J.-K. Hao, Gene selection for microarray data by a LDA-based genetic algorithm, in: Pattern Recognition in Bioinformatics: Third IAPR International Conference, PRIB 2008, Melbourne, Australia, October 15-17, 2008, Proceedings, Springer Berlin Heidelberg, 2008, pp. 250-261.
[27] M. Tong, K.-H. Liu, C. Xu, W. Ju, An ensemble of SVM classifiers based on gene pairs, Comput. Biol. Med. 43 (2013) 729-737.
[28] A.C. Tan, D.Q. Naiman, L. Xu, R.L. Winslow, D. Geman, Simple decision rules for classifying human cancers from gene expression profiles, Bioinformatics 21 (2005) 3896-3904.
[29] J. Ye, T. Li, T. Xiong, R. Janardan, Using uncorrelated discriminant analysis for tissue classification with gene expression data, IEEE Trans. Comput. Biol. Bioinform. 1 (4) (2004) 181-190.
[30] B. Liu, Q. Cui, T. Jiang, S. Ma, A combinational feature selection and ensemble neural network method for classification of gene expression data, BMC Bioinform. 5 (136) (2004) 1-12.
[31] A.C. Tan, D. Gilbert, Ensemble machine learning on gene expression data for cancer classification, Appl. Bioinform. 2 (3) (2003) 75-83.
[32] C. Ding, H. Peng, Minimum redundancy feature selection from microarray gene expression data, J. Bioinform. Comput. Biol. 3 (2) (2005) 185-206.
[33] S.B. Cho, H.-H. Won, Cancer classification using ensemble of neural networks with multiple significant gene subsets, Appl. Intell. 26 (3) (2007) 243-250.
[34] W.-H. Yang, D.-Q. Dai, H. Yan, Generalized discriminant analysis for tumor classification with gene expression data, Mach. Learn. Cybern. (2006) 4322-4327.
[35] Y. Peng, W. Li, Y. Liu, A hybrid approach for biomarker discovery from microarray gene expression data, Cancer Inform. (2006) 301-311.
[36] Z. Wang, V. Palade, Y. Xu, Neuro-fuzzy ensemble approach for microarray cancer gene expression data analysis, in: Proc. International Symposium on Evolving Fuzzy Systems, 2006.
[37] E.B. Huerta, B. Duval, J.-K. Hao, A hybrid GA/SVM approach for gene selection and classification of microarray data, in: EvoWorkshops 2006, LNCS, 2006, pp. 34-44.
[38] S. Pang, I. Havukkala, Y. Hu, N. Kasabov, Classification consistency analysis for bootstrapping gene selection, Neural Comput. Appl. (2007) 527-539.
[39] G.-Z. Li, X.-Q. Zeng, J.Y. Yang, M.Q. Yang, Partial least squares based dimension reduction with gene selection for tumor classification, in: Proc. of 7th IEEE Intl. Symposium on Bioinformatics and Bioengineering, 2007, pp. 1439-1444.
[40] L.-J. Zhang, Z.-J. Li, H.-W. Chen, An effective gene selection method based on relevance analysis and discernibility matrix, in: PAKDD 2007, LNCS 4426, 2007, pp. 1088-1095.
[41] F. Yue, K. Wang, W. Zuo, Informative gene selection and tumor classification by null space LDA for microarray data, in: ESCAPE 2007, LNCS 4614, 2007, pp. 435-446.
[42] J.C.H. Hernandez, B. Duval, J.-K. Hao, A genetic embedded approach for gene selection and classification of microarray data, in: EvoBIO 2007, LNCS 4447, 2007, pp. 90-101.
[43] S. Li, X. Wu, X. Hu, Gene selection using genetic algorithm and support vector machines, Soft Comput. 12 (7) (2008) 693-698.
[44] S. Wang, H. Chen, S. Li, D. Zhang, Feature extraction from tumor gene expression profiles using DCT and DFT, in: EPIA 2007, LNCS 4874, 2007, pp. 485-496.
[45] G. Brown, A. Pocock, M.-J. Zhao, M. Luján, Conditional likelihood maximisation: a unifying framework for information theoretic feature selection, J. Mach. Learn. Res. 13 (2012) 27-66.
[46] J. Kennedy, R. Eberhart, Particle swarm optimization, in: Proceedings of IEEE International Conference on Neural Networks, vol. 4, 1995, pp. 1942-1948.
[47] S. Mirjalili, S.M. Mirjalili, X.-S. Yang, Binary bat algorithm, Neural Comput. Appl. 25 (3-4) (2013) 663-681.
[48] E. Rashedi, H. Nezamabadi-pour, S. Saryazdi, GSA: a gravitational search algorithm, Inf. Sci. 179 (13) (2009) 2232-2248.
[49] A. Kalousis, Stability of feature selection algorithms, in: Fifth IEEE International Conference on Data Mining, 2005.