Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 162 (2019) 307–315

7th International Conference on Information Technology and Quantitative Management (ITQM 2019)

Classification and Feature Selection Method for Medical Datasets by Brain Storm Optimization Algorithm and Support Vector Machine

Eva Tuba, Ivana Strumberger, Timea Bezdan, Nebojsa Bacanin, Milan Tuba

Singidunum University, Danijelova 29, 11000 Belgrade, Serbia

Abstract

Medicine is one of the sciences where the development of computer science enables many improvements. The use of computers in medicine increases accuracy and speeds up the processes of data analysis and setting diagnoses. Nowadays, numerous computer-aided diagnostic systems exist, and machine learning algorithms have a significant role in them. Faster and more accurate systems are necessary. A common machine learning task that is part of computer-aided diagnostic systems and of various medical data analytics software packages is classification. In order to obtain better classification accuracy, it is important to choose a good feature set and proper parameters for the classification model. Medical datasets often have large feature sets in which many features are correlated with others, so it is important to reduce the feature set. In this paper we propose an adjusted brain storm optimization algorithm for feature selection in medical datasets. Classification was done by a support vector machine whose parameters are also optimized by the brain storm optimization algorithm. The proposed method is tested on standard publicly available medical datasets and compared to other state-of-the-art methods. The obtained results show that the proposed method achieves higher accuracy while reducing the number of features needed.

© 2020 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 7th International Conference on Information Technology and Quantitative Management (ITQM 2019)

Keywords: medical datasets, classification, feature selection, support vector machine, optimization, swarm intelligence, brain storm optimization algorithm

* Corresponding author: Milan Tuba, supported by the Ministry of Education, Science and Technological Development of Republic of Serbia, Grant No. III-44006.
E-mail address: [email protected].

1877-0509 © 2020 The Authors. Published by Elsevier B.V. 10.1016/j.procs.2019.11.289

1. Introduction

Progress in computer science and technology enables improvement in numerous fields, and one of them is medicine. In medicine, a fast and accurate diagnosis can save a patient's life, so it is crucial to have good computer-aided diagnostic (CAD) systems that can help physicians. Usually, the most essential part of CAD systems is classification.

Classification is one of the machine learning tasks, while machine learning algorithms are used to gather knowledge from a set of given data, search for certain patterns and after that make their own decisions based
on the learned facts. Nowadays, machine learning algorithms are widespread and well studied, since they are used in various fields such as medicine, bioinformatics [1], economy [2], agriculture [3], robotics [4], etc.

Classification is a supervised learning task where the output is categorical: the class to which a certain instance belongs. Supervised learning is a process whose goal is to generate a decision model that will correctly classify unknown instances, based on a model built on a training set where the classes of the instances are known. The decision model searches for patterns in the training set data that enable the classification of new, unknown instances. The classification of medical datasets is a challenging problem since they usually contain many features and instances. The search for more accurate and faster classification methods in CAD systems is led by the importance of an early and precise diagnosis for patient recovery.

Each instance used in a classification task is represented by several features that can be numerical or categorical. The classification accuracy, besides the classification method, highly depends on the chosen feature set, which should enable the classification method to separate instances from different classes and to find similarities between instances of the same class. It is hard to determine in advance which features will describe an instance in this way, so the usual strategy while collecting data is to describe instances with as many features as possible and then decide which are important. Too many features can lead to the problem where the impact of the main differences and similarities in the decision model decreases, since the model will try to include all possible information. This is why the feature selection problem attracts scientists and has become an active research topic. The goal of feature selection methods is to determine the minimal subset of features that provides the best classification.
The feature selection problem is exponential: for a set with n features there are 2^n subsets, which means that exhaustive search is not possible (in reasonable time) even for rather small values of n. For solving problems like this, metaheuristics such as swarm intelligence algorithms can be used.

In this paper we propose the brain storm optimization algorithm (BSO) for solving the feature selection problem in the classification of medical datasets. For classification, we propose the support vector machine (SVM), whose parameters are also tuned by the BSO algorithm. In order to test the quality of the proposed BSO-SVM classification method for medical datasets, the results are compared to methods from the literature.

The structure of the paper is as follows. Section 2 is dedicated to the literature review. Section 3 describes the feature selection problem and gives a short description of the support vector machine. The brain storm optimization algorithm adjusted for solving the feature selection problem and for finding the optimal SVM parameters is presented in Section 4. Simulation results are given and discussed in Section 5. The paper is concluded and plans for future work are given in Section 6.

2. Literature Review

Finding the optimal subset of features for classification and the optimization of the SVM are hard optimization problems, thus deterministic approaches are unsuitable for solving them. Instead, nature-inspired population-based methods such as the genetic algorithm (GA), artificial bee colony (ABC), particle swarm optimization (PSO) and others can be applied. In the past, these algorithms were commonly used for the classification of medical datasets. Classification is a rather common problem in different computer-aided diagnostic (CAD) systems where feature selection is crucial. Medical datasets are usually rather large in terms of the number of features, so feature selection has a significant impact on classification accuracy.
In [5] a method for diabetes diagnosis based on k-means and the genetic algorithm was proposed. In order to clean the data and remove the noise, the k-means algorithm was used. The optimal feature subset was searched for by the genetic algorithm, and the classification was performed by SVM. Compared to other existing methods, the method presented in [5] achieved higher classification accuracy.

Two methods that use particle swarm optimization for solving the feature selection problem while analyzing medical data were proposed in [6]. They combined PSO and rough set theory. The two ideas are to start with an empty feature set and add features one by one, and to start with a random subset of features that is further improved by PSO. In [7] another PSO based feature selection method was proposed, where classification was done by support vector machine and the final goal was to predict the mortality of septic patients. The authors proposed a modification of binary PSO for feature selection, and the standard PSO was adjusted for tuning the SVM. The proposed method was
compared to the genetic algorithm and the original PSO, and the results showed that the accuracy was improved while simultaneously lowering the size of the feature subset.

In [8], an artificial bee colony based feature selection method for liver disease, hepatitis and diabetes diagnostics was proposed. Support vector machine was used for classification. A clustering algorithm was used for feature selection, and the artificial bee colony searched for the cluster centers. The support vector machine was not optimized. Each dataset was divided into two parts: one used for training the SVM and the second used to test the quality of the obtained model. The achieved results were compared to numerous methods from the literature and improved upon them. Another artificial bee colony based method, for finding the optimal feature subset in medical databases, was proposed in [9]. The goal was to correctly diagnose cardiovascular disease. The artificial bee colony algorithm was adjusted to search for the optimal feature subset, while classification was done by SVM. The ABC method achieved better results compared to feature selection with reverse ranking.

A method for peak detection in EEG signals based on PSO was presented in [10]. The PSO was used for feature selection as well as for tuning the SVM parameters. In [10], methods with the original and a modified PSO were proposed. The highest classification accuracy reported in [10] was 98.59%.

3. Feature Selection Problem and Support Vector Machine

Classification represents a rather important problem in various fields, and one of them is medicine. Since classification affects a large number of applications, different classification methods can be found in the literature, such as decision trees, random forests, artificial neural networks, support vector machines, the Naive Bayes classifier, etc. In order to achieve high classification accuracy with any of these classifiers, it is necessary to choose a good feature set that describes each instance.
If the feature set is too small, it will not contain enough information to describe the similarities and differences between instances, while a too large feature set will decrease the impact of the features that truly separate instances from different classes and show the similarities between instances of the same class. It is obvious that feature selection is one of the most important steps for achieving good classification accuracy.

One of the well-known methods for lowering the number of features used for classification is principal component analysis (PCA). The PCA method transforms the original feature set into a new space where the obtained features are orthogonal and named principal components. The maximal variation in the original dataset is represented by the first principal component. The problem with the PCA method is that it does not preserve the original features, which is sometimes necessary. In order to preserve features, numerous methods based on swarm intelligence algorithms were proposed, as described in the previous section.

Besides selecting the appropriate feature set, classification accuracy depends on the classifier. Each classification method has some parameters that affect the accuracy of the created model. In this research, the support vector machine was used for classification. The SVM parameters and their role in classification are explained in the following part of the paper.

The support vector machine (SVM) is one of the most used classification methods. It is commonly used in medical applications such as classification of anomalies in the brain [11], recognition of bloody areas in wireless endoscopic images [12], lung disease detection [13], Alzheimer's disease diagnosis [14], retinal vein detection [15], etc. SVM was presented more than twenty years ago by Vapnik [16]. SVM is actually a binary classifier, but nowadays there are different techniques that enable a binary classifier to perform multi-class classification.
Two well-known and widely used methods are one-against-one, which builds a model for each pair of classes and assigns the class by majority vote, and one-against-all, which classifies each class against all others. In the best-case scenario, each instance will be classified as part of one class in one of the models and to the other classes in all other cases. Naturally, this will not always be the case, so there are also various methods to deal with other scenarios. In this research, the SVM from the LibSVM toolbox [17] was used, where multi-class classification is implemented by the one-against-one technique; when the same number of votes from different classes is assigned to one instance, the final class is chosen randomly.

The support vector machine separates instances from different classes by a hyperplane, which exists if the dataset contains linearly separable instances. The optimal hyperplane divides the dataset so that all instances from the same class are on one side of the hyperplane, and it is as far as possible from all instances.
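The one-against-one voting logic described above can be sketched as follows. This is a toy illustration of the voting and random tie-breaking, not the LibSVM implementation; the pairwise decision rule passed in at the bottom is made up:

```python
from collections import Counter
from itertools import combinations
import random

def ovo_predict(pairwise_winner, classes, x, rng=random.Random(0)):
    """One-against-one: query one binary model per pair of classes,
    assign the class with the most votes; ties are broken randomly."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b, x)] += 1
    top = max(votes.values())
    winners = [c for c, v in votes.items() if v == top]
    return rng.choice(winners)

# Toy pairwise rule (hypothetical): every binary model votes for the larger label.
predicted = ovo_predict(lambda a, b, x: max(a, b), classes=[0, 1, 2], x=None)
print(predicted)  # -> 2
```

With three classes there are three pairwise models; here class 2 wins the (0,2) and (1,2) votes and is returned.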
If we represent the training set of n instances by vectors xi ∈ R^d (d is equal to the number of features, i = 1, 2, ..., n) and the corresponding class of each instance is marked yi ∈ {−1, 1}, then the hyperplanes that lie on the edge of each class are determined as:

(w · xi + b) = 1,
(w · xi + b) = −1    (1)

In order to obtain the optimal hyperplane, it is necessary to maximize the distance 2/||w|| between these two hyperplanes, i.e. minimization of ||w|| is needed. Since each instance has to lie outside the space between the two hyperplanes, the constraints defined by the support vectors are:
yi (w · xi + b) ≥ 1 for 1 ≤ i ≤ n.
(2)
The described model does not allow instances inside the margin or wrongly classified instances, which makes it useless for real-life datasets that contain outliers and/or mistakes in labels. In order to adjust this theoretical model to real-life applications, the idea of imperfect classification is introduced: instead of the previous so-called hard margin, a soft margin is used, defined as follows:

yi (w · xi + b) ≥ 1 − ϵi ,
ϵi ≥ 0,
1≤i≤n
(3)
where wrongly classified instances are allowed by the slack variables ϵi. When the soft margin is used, the support vector machine model is created by finding the solution of the following quadratic programming problem:

min (1/2)||w||^2 + C ∑(i=1..n) ϵi    (4)

where C is the soft margin parameter used for controlling the influence of wrongly classified instances, i.e. instances that are not on the right side of the hyperplane. Large values of this parameter result in the hard margin, i.e. misclassified instances are not permitted. On the other hand, too low values of the parameter C lead to a classification that is basically equal to setting a random hyperplane. Needless to say, a proper value of the soft margin parameter C is crucial for a good classification model.

Since both models, hard and soft margin, assume a certain amount of linear separability of the instances, the usage of SVM would otherwise be rather limited. The power of SVM lies in the possibility to classify even more complex datasets by introducing the kernel trick. The idea is that even if data are not linearly separable as they are, they can be in some higher-dimensional space. In order to map data into a higher-dimensional space, the dot product is replaced by a kernel function, i.e. a function that has to satisfy certain conditions (more precisely, Mercer's condition). There are several functions commonly applied, and one of them is the radial basis function (RBF):

K(xi, xj) = exp(−γ||xi − xj||^2).    (5)
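As a concrete illustration, the RBF kernel of Eq. (5) can be evaluated directly; this is a minimal sketch, and the vectors and the γ value are arbitrary:

```python
import math

def rbf_kernel(xi, xj, gamma):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 0.0], [0.0, 1.0], gamma=0.5))  # exp(-1.0), about 0.3679
```

Note that the kernel of an instance with itself is always 1, and larger γ makes the similarity decay faster with distance, which is why γ so strongly shapes the model.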
In the RBF function, γ represents a free parameter that has a high influence on the quality of the classification. It defines the influence of each instance on the model being created. In this paper, we consider tuning two parameters of the SVM, C and γ. It is not enough to search for the parameter values separately; the optimal pair of values should be found for each considered dataset. Since the search space for both parameters is infinite and the fitness landscape is rather complicated, there is no deterministic method that can be used for setting the values of the pair (C, γ).

One of the simplest and most basic algorithms for setting the SVM parameters is grid search: an SVM model is generated for different pairs (C, γ), where C and γ take predefined possible values, and the model that obtains the highest accuracy is chosen. This can be the starting point for a search for the optimal values of (C, γ), but since classification accuracy represents a multimodal function, it is necessary to use a more sophisticated search such as swarm intelligence algorithms. In this paper we propose one of the swarm intelligence algorithms, an adjusted brain storm optimization algorithm, for feature selection and SVM parameter tuning.
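For reference, the baseline grid search over (C, γ) can be sketched as follows. The scoring function below is a made-up stand-in for 10-fold cross-validation accuracy, and the grid ranges are illustrative, not taken from the paper:

```python
import math

def grid_search(cv_score, log2C_values, log2gamma_values):
    """Exhaustive search over a (C, gamma) grid in log2-space;
    `cv_score` stands in for cross-validation accuracy (higher is better)."""
    best_params, best_acc = None, float("-inf")
    for lc in log2C_values:
        for lg in log2gamma_values:
            C, gamma = 2.0 ** lc, 2.0 ** lg
            acc = cv_score(C, gamma)
            if acc > best_acc:
                best_params, best_acc = (C, gamma), acc
    return best_params, best_acc

# Toy score surface (hypothetical) peaking at C = 2^5, gamma = 2^-5.
def toy_score(C, gamma):
    return -(abs(math.log2(C) - 5) + abs(math.log2(gamma) + 5))

params, _ = grid_search(toy_score, range(-5, 16, 2), range(-15, 6, 2))
print(params)  # -> (32.0, 0.03125)
```

Because real accuracy surfaces are multimodal, such a coarse grid can only locate a promising region, which motivates the population-based search proposed here.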
4. Brain Storm Optimization Algorithm for Feature Selection and SVM Parameters Tuning

The brain storm optimization algorithm (BSO) is one of the well-known nature-inspired population-based algorithms that belong to the category of swarm intelligence. The human idea generation process, also known as brainstorming, was used as the inspiration for this algorithm. The BSO algorithm was presented by Yuhui Shi in 2011 [18]. The algorithm has been applied to various hard optimization problems such as path planning [19], optimal satellite configuration [20], clustering optimization [21], energy optimization in grid systems [22], drone placement for optimal coverage [23], etc.

In the BSO algorithm, ideas are the simple agents: potential solutions represented as d-dimensional vectors, where d is the problem dimension. In the beginning, n random solutions are generated inside the search space and arranged into m clusters, usually by using the well-known k-means clustering algorithm. In each cluster, the fittest solution is set as the cluster center. In each iteration, new solutions are generated by modifying a single solution or by combining two solutions from different clusters. The BSO algorithm has several parameters that represent the probabilities of how a new solution will be generated. The pseudocode of the algorithm is presented in Algorithm 1.

Algorithm 1 Pseudo-code of the brain storm optimization algorithm
1: Initialization phase
2: Generate initial population, i.e. n random solutions.
3: repeat
4:     Divide population into m clusters by k-means algorithm.
5:     Evaluate all solutions and use the best one in each cluster as the center.
6:     r = rand(0,1)
7:     if r < p5a then
8:         Select a random cluster center and replace it with a new random solution.
9:     end if
10:    repeat
11:        Generate new solutions.
12:        r = rand(0,1)
13:        if r < p6b then
14:            Choose a cluster cx with probability p6bi.
15:            r1 = rand(0,1)
16:            if r1 < p6bii then
17:                Potentially change the cx cluster center by adding a random value.
18:            else
19:                Potentially change a random solution from cx by adding a random value.
20:            end if
21:        else
22:            Select two random clusters.
23:            r2 = rand(0,1)
24:            if r2 < p6c then
25:                New solution is a combination of the two cluster centers.
26:            else
27:                New solution is a combination of two random solutions from the chosen clusters.
28:            end if
29:        end if
30:        Keep the better solution between the old and the new one.
31:    until n new solutions are generated.
32: until iteration number equals maximal iteration number
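The loop of Algorithm 1 can be condensed into a short runnable sketch. This is a simplified illustration, not the authors' implementation: a fitness-sorted split stands in for k-means clustering, the step-size coefficient ζ of the original algorithm is replaced by a fixed constant, and all parameter values are made up:

```python
import random

def bso_minimize(f, dim, n=20, m=3, iters=50,
                 p5a=0.2, p6b=0.8, p6bii=0.4, p6c=0.5, seed=0):
    """Compact brain storm optimization sketch on [0, 1]^dim (minimizes f)."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(n)]
    for _ in range(iters):
        pop.sort(key=f)                               # best solutions first
        clusters = [pop[i::m] for i in range(m)]      # centers = clusters[i][0]
        if rng.random() < p5a:                        # replace a random center
            clusters[rng.randrange(m)][0] = [rng.random() for _ in range(dim)]
        new_pop = []
        for _ in range(n):
            if rng.random() < p6b:                    # one-cluster generation
                c = rng.choice(clusters)
                base = c[0] if rng.random() < p6bii else rng.choice(c)
            else:                                     # two-cluster combination
                c1, c2 = rng.sample(clusters, 2)
                a = c1[0] if rng.random() < p6c else rng.choice(c1)
                b = c2[0] if rng.random() < p6c else rng.choice(c2)
                base = [(u + v) / 2 for u, v in zip(a, b)]
            # mutate by a Gaussian step, clipped to the search space
            cand = [min(1.0, max(0.0, x + 0.1 * rng.gauss(0, 1))) for x in base]
            new_pop.append(min(base, cand, key=f))    # keep the better one
        pop = new_pop
    return min(pop, key=f)

best = bso_minimize(lambda s: sum((x - 0.5) ** 2 for x in s), dim=4)
```

On this toy sphere-like objective centred at 0.5 the sketch drives every coordinate toward 0.5, mirroring how the adjusted BSO below steers the feature-mask and SVM-parameter dimensions.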
The probability p6b represents the probability that one solution, xselected, will be chosen to potentially be replaced by a new one (the better solution is always kept). The new solution xnew is generated as:

xnew = xselected + ζ · n(µ, σ)
(6)
where n(µ, σ) is a value drawn from the Gaussian distribution with mean µ and variance σ, and ζ is the coefficient that controls the impact of the Gaussian random value, obtained as:

ζ = logsig((0.5 · maxIteration − currentIteration)/k) · rand()    (7)
where parameter k changes the slope of logsig() (the logarithmic sigmoid transfer function) and rand() is a random value. The probability of selecting a certain cluster from which the solution will be chosen is given by the algorithm's parameter p6bi and is proportional to the number of solutions inside the cluster. The algorithm's parameters p6bii and p6c represent the probabilities of selecting cluster centers or random solutions from a certain cluster. If it is decided that two solutions will be combined (with probability 1 − p6b), the new solution is the average of the selected solutions. In the BSO algorithm, exploration is performed by generating a new random solution that replaces a cluster center with probability p5a.

In this paper we present the brain storm optimization algorithm adjusted and used for the feature selection problem and for setting the metaparameters of the SVM. In order to use the BSO algorithm for the feature selection problem, it was necessary to modify it to have a binary solution vector. The lower and upper bounds for this problem were set to 0 and 1, respectively. The original BSO algorithm was used with one additional step performed before the solution evaluation. The BSO algorithm generates continuous solutions in the range [0, 1], and in order to obtain discrete values we used the threshold method. Each dimension of the solution vector s_i is set to 0 or 1 by the following function:

s_ij = { 1, if s_ij > th
       { 0, otherwise    (8)
where s_ij is the value of dimension j in solution i, j = 1, 2, ..., d, i = 1, 2, ..., n, and th is a constant threshold value determined at the beginning. If dimension j has the value 0, feature j will not be used for classification; if the value of dimension j is equal to one, that feature is included in the chosen feature subset. This simple modification is commonly used for the feature selection problem [24, 25]. The problem dimension is equal to the number of features in a dataset increased by two, which is the number of SVM parameters that need to be tuned. The soft margin parameter C and the kernel function parameter γ are real values, so no additional step is needed. The only adaptation is that C and γ are searched in a logarithmic solution space. Each of these two parameters has its own lower and upper bounds. The binarization process is applied only to the feature dimensions, which have the range [0, 1], while the values for C and γ are searched in log2-space, where the limits were log2 C ∈ [−5, 15] and log2 γ ∈ [−15, 5]. The objective function was the classification accuracy measured by 10-fold cross-validation combined with the number of chosen features. We used the same objective function that was proposed in [9]:

f(x) = (100 − acc) + λ · noFeatures / totalNoFeatures,    λ = 10^(log(totalNoFeatures)+1)    (9)

where acc represents the classification accuracy, noFeatures is the number of selected features, and totalNoFeatures is the total number of features in the dataset. The factor λ, which depends on the number of features, normalizes the second term and controls the tradeoff between accuracy and the number of features.
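The binarization step of Eq. (8) and the objective of Eq. (9) can be sketched as follows. The threshold value th = 0.5 and the base-10 reading of the logarithm in λ are assumptions (the paper leaves th unspecified and the λ expression ambiguous); everything else follows the formulas above.

```python
import numpy as np

def binarize(s, th=0.5):
    """Eq. (8): threshold the feature part of a continuous solution in [0, 1]
    into a 0/1 feature mask. th = 0.5 is an assumed threshold value."""
    return (np.asarray(s, dtype=float) > th).astype(int)

def objective(acc, no_features, total_no_features):
    """Eq. (9): f(x) = (100 - acc) + lambda * noFeatures / totalNoFeatures.
    lambda = 10^(log10(totalNoFeatures) + 1) assumes a base-10 logarithm."""
    lam = 10.0 ** (np.log10(total_no_features) + 1)
    return (100.0 - acc) + lam * no_features / total_no_features
```

Lower objective values are better: the first term penalizes classification error and the second, scaled by λ, penalizes larger feature subsets.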
5. Simulation Results

The proposed BSO algorithm, used for finding the optimal feature subset and for tuning the parameters C and γ, was coded in Matlab R2016a, where the LibSVM toolbox [17] was used for the SVM. All tests were performed on a platform with an Intel® Core™ i7-3770K CPU at 4 GHz, 8 GB of RAM and the Windows 10 Professional OS.

The proposed BSO method was applied to three medical datasets available in the UCI machine learning repository [26]. We used the same datasets as in [8]: the Hepatitis, Liver disorder and Diabetes datasets. These datasets are commonly used for testing classification methods. The details of the used datasets are presented in Table 1. The missing values in the datasets were replaced by the median of the corresponding feature. Each dataset was divided into a training and a test set by randomly choosing 75% of the instances for training; the rest formed the test set.
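The evaluation of a single candidate solution can be sketched as below. This is a hedged stand-in, not the paper's pipeline: scikit-learn's SVC (an interface to LIBSVM) replaces the Matlab LibSVM toolbox, the breast-cancer dataset bundled with scikit-learn replaces the UCI medical datasets, and the feature scaling step is an added assumption; only the log2-space decoding of C and γ follows the limits given above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def cv_accuracy(svm_dims, X, y, folds=10):
    """svm_dims holds the last two dimensions of a BSO solution, both in
    [0, 1]; map them into the log2 search ranges and return the 10-fold
    cross-validation accuracy in percent."""
    log2_c = -5.0 + svm_dims[0] * 20.0    # log2 C in [-5, 15]
    log2_g = -15.0 + svm_dims[1] * 20.0   # log2 gamma in [-15, 5]
    clf = make_pipeline(StandardScaler(),
                        SVC(C=2.0 ** log2_c, gamma=2.0 ** log2_g, kernel="rbf"))
    return cross_val_score(clf, X, y, cv=folds).mean() * 100.0

X, y = load_breast_cancer(return_X_y=True)
acc = cv_accuracy(np.array([0.5, 0.25]), X, y)  # decodes to C = 32, gamma = 2**-10
```

In the full method this accuracy would be combined with the selected-feature count via Eq. (9) to give the fitness of the solution.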
Table 1: Medical datasets from UCI machine learning repository

No  Dataset    No. of classes  No. of instances  No. of features
1   Hepatitis  2               155               19
2   Liver      2               345               6
3   Diabetes   2               768               8
The parameters of the proposed BSO method were set as follows. The population size n was 40, while the number of iterations was limited to 150, which gives 6000 fitness function evaluations (the same as in [8]). The other parameters of the BSO algorithm were: p5a = 0.2, p6b = 0.8, p6bii = p6c = 0.4, and p6bi = n_ci / n, i.e. the number of individuals in cluster i divided by the population size.

In order to determine the actual power of the proposed method, we present a comparison with the results reported in [8]. For each dataset, they include results from different papers (details in [8]); we also used some of these results in this paper, while some of the worst cases were excluded due to space limitations. The classification accuracies obtained by the proposed BSO-SVM algorithm and by the methods from [8] are presented in Table 2. Better results are printed in bold.

Table 2: Comparison of classification accuracies obtained by the proposed method and methods from the literature ([8])

Hepatitis                            Liver                        Diabetes
Method                   Accuracy    Method          Accuracy     Method            Accuracy
FS-AIRS with fuzzy res.  92.59       SSVM            70.33        AWAIS             75.87
FS-Fuzzy-AIRS            94.12       HNFB-1          73.33        PCA + ANFIS       89.47
AIRS                     76.00       AWAIS           70.17        LS-SVM            78.21
PCA-AIRS                 94.12       Fuzzy SVM       70.85        GDA-LS-SVM        82.05
ANN and AIS              96.80       CBP + PSO       76.81        ANN + FNN         84.20
LDA-ANFIS                94.16       SVM with GP     69.70        Clustered-HMLP    80.59
MLNN (MLP) + LM          91.87       PSO + 1-NN      68.99        ARI + NN          81.28
ABCFS + SVM              94.92       ABCFS + SVM     82.55        ABCFS + SVM       89.97
BSO + SVM                97.16       BSO + SVM       84.31        BSO + SVM         91.46

As can be seen from Table 2, our BSO-based method achieved better classification accuracy in comparison with the other methods presented in [8]. The method proposed in [8] was the second best for all considered datasets. The ABCFS-SVM method achieves its accuracy by using 11 features out of 19 for the Hepatitis dataset, 5 features out of 6 for the Liver disorder dataset and 6 out of 8 features for the Diabetes dataset.
Our proposed BSO-SVM method used 10 features in most runs for the Hepatitis dataset, which is one less than the ABCFS-SVM method, and the accuracy was improved from 94.92% to 97.16%. For the second dataset, Liver disorder, the same number of features was used (5), but due to the SVM parameter optimization a better classification accuracy was again obtained: 84.31% compared to 82.55%. For the last dataset, the proposed BSO-SVM method again used the same number of features as the ABCFS-SVM method.

6. Conclusion

In this research, we proposed the brain storm optimization algorithm for performing feature selection and for tuning the parameters of the SVM used for medical dataset classification. The BSO algorithm was adjusted to a binary solution in order to perform feature selection. The fitness function combines the classification accuracy with the number of selected features. The proposed method was tested on three well-known medical datasets:
Hepatitis, Liver disorder and Diabetes. The obtained results were compared to other state-of-the-art methods, and the proposed method achieved better classification accuracies for all three considered datasets while reducing or keeping the same number of features used. In future work, larger datasets can be tested, as well as different fitness functions. Another direction of future research is different methods of adapting swarm intelligence algorithms to discrete problems.

Acknowledgment

Research has been supported by the Ministry of Education, Science and Technological Development of Republic of Serbia, Grant No. III-44006.

References

[1] J. Rahman, N. I. Mondal, K. B. Islam, et al., Feature fusion based SVM classifier for protein subcellular localization prediction, Journal of Integrative Bioinformatics 13 (1) (2016) 23–33.
[2] T. H. Nguyen, K. Shirai, J. Velcin, Sentiment analysis on social media for stock movement prediction, Expert Systems with Applications 42 (24) (2015) 9603–9611.
[3] P. Velmurugan, M. Renukadevi, Detection of unhealthy region of plant leaves and classification of plant leaf diseases using texture based clustering features, Artificial Intelligent Systems and Machine Learning 9 (1) (2017) 8–10.
[4] C. Premebida, D. R. Faria, U. Nunes, Dynamic Bayesian network for semantic place classification in mobile robotics, Autonomous Robots 41 (5) (2017) 1161–1172.
[5] T. Santhanam, M. Padmavathi, Application of k-means and genetic algorithms for dimension reduction by integrating SVM for diabetes diagnosis, Procedia Computer Science 47 (2015) 76–83.
[6] H. H. Inbarani, A. T. Azar, G. Jothi, Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis, Computer Methods and Programs in Biomedicine 113 (1) (2014) 175–185.
[7] S. M. Vieira, L. F. Mendonça, G. J. Farinha, J. M. Sousa, Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients, Applied Soft Computing 13 (8) (2013) 3494–3504.
[8] M. S. Uzer, N. Yilmaz, O. Inan, Feature selection method based on artificial bee colony algorithm and support vector machines for medical datasets classification, The Scientific World Journal, Article ID 419187 (2013) 1–10.
[9] B. Subanya, R. Rajalaxmi, Feature selection using artificial bee colony for cardiovascular disease classification, in: 2014 International Conference on Electronics and Communication Systems (ICECS), IEEE, 2014, pp. 1–6.
[10] A. Adam, M. I. Shapiai, M. Tumari, M. Zaidi, M. S. Mohamad, M. Mubin, Feature selection and classifier parameters estimation for EEG signals peak detection using particle swarm optimization, The Scientific World Journal, Article ID 973063 (2014) 1–13.
[11] A. Larroza, D. Moratal, A. Paredes-Sanchez, E. Soria-Olivas, M. L. Chust, L. A. Arribas, E. Arana, Support vector machine classification of brain metastasis and radiation necrosis based on texture analysis in MRI, Journal of Magnetic Resonance Imaging 42 (5) (2015) 1362–1368. doi:10.1002/jmri.24913.
[12] E. Tuba, M. Tuba, R. Jovanovic, An algorithm for automated segmentation for bleeding detection in endoscopic images, in: International Joint Conference on Neural Networks (IJCNN), IEEE, 2017, pp. 4579–4586.
[13] M. Keshani, Z. Azimifar, F. Tajeripour, R. Boostani, Lung nodule segmentation and recognition using SVM classifier and active contour modeling: A complete intelligent system, Computers in Biology and Medicine 43 (4) (2013) 287–300.
[14] N. Zeng, H. Qiu, Z. Wang, W. Liu, H. Zhang, Y. Li, A new switching-delayed-PSO-based optimized SVM algorithm for diagnosis of Alzheimer's disease, Neurocomputing 320 (2018) 195–202.
[15] E. Tuba, L. Mrkela, M. Tuba, Retinal blood vessel segmentation by support vector machine classification, in: 27th International Conference Radioelektronika, IEEE, 2017, pp. 1–6.
[16] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (3) (1995) 273–297. doi:10.1007/BF00994018.
[17] C.-C. Chang, C.-J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (3) (2011) 27:1–27:27. doi:10.1145/1961189.1961199.
[18] Y. Shi, Brain storm optimization algorithm, in: Y. Tan, Y. Shi, Y. Chai, G. Wang (Eds.), Advances in Swarm Intelligence, LNCS, Vol. 6728, Springer Berlin Heidelberg, 2011, pp. 303–309.
[19] E. Dolicanin, I. Fetahovic, E. Tuba, R. Capor-Hrosik, M. Tuba, Unmanned combat aerial vehicle path planning by brain storm optimization algorithm, Studies in Informatics and Control 27 (1) (2018) 15–24.
[20] C. Sun, H. Duan, Y. Shi, Optimal satellite formation reconfiguration based on closed-loop brain storm optimization, IEEE Computational Intelligence Magazine 8 (4) (2013) 39–51.
[21] E. Tuba, I. Strumberger, N. Bacanin, D. Zivkovic, M. Tuba, Cooperative clustering algorithm based on brain storm optimization and k-means, in: 2018 28th International Conference Radioelektronika, IEEE, 2018, pp. 1–5.
[22] M. Arsuaga-Ríos, M. A. Vega-Rodríguez, Multi-objective energy optimization in grid systems from a brain storming strategy, Soft Computing 19 (11) (2015) 3159–3172.
[23] E. Tuba, R. Capor-Hrosik, A. Alihodzic, M. Tuba, Drone placement for optimal coverage by brain storm optimization algorithm, in: International Conference on Health Information Science, Springer, 2017, pp. 167–176.
[24] E. Emary, H. M. Zawbaa, C. Grosan, A. E. Hassenian, Feature subset selection approach by gray-wolf optimization, in: Afro-European Conference for Industrial Advancement, Springer, 2015, pp. 1–13.
[25] E. Emary, H. M. Zawbaa, A. E. Hassanien, Binary grey wolf optimization approaches for feature selection, Neurocomputing 172 (2016) 371–381.
[26] M. Lichman, UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences, URL http://archive.ics.uci.edu/ml (2013).