
Boosting Salp Swarm Algorithm by Sine Cosine Algorithm and Disrupt Operator for Feature Selection

Nabil Neggaz, Ahmed A. Ewees, Mohamed Abd Elaziz, Majdi Mafarja

Journal: Expert Systems With Applications
PII: S0957-4174(19)30820-6
DOI: https://doi.org/10.1016/j.eswa.2019.113103
Reference: ESWA 113103
Received: 2 May 2019; Revised: 25 November 2019; Accepted: 25 November 2019


Highlights

• Propose a novel FS method, called ISSAFD, which improves the Salp Swarm Algorithm (SSA).
• ISSAFD enhances the followers in SSA using the SCA and the Disruption operator (Dop).
• Evaluate the influence of the SCA operators on the behavior of the leaders in SSA.
• Compare the performance of ISSAFD with swarm intelligence (SI) algorithms.
• The proposed ISSAFD provides better results in terms of the performance measures.


Boosting Salp Swarm Algorithm by Sine Cosine Algorithm and Disrupt Operator for Feature Selection

Nabil Neggaz (a), Ahmed A. Ewees (b), Mohamed Abd Elaziz (c,e,*), Majdi Mafarja (d)

(a) Université des Sciences et de la Technologie d'Oran Mohamed Boudiaf, USTO-MB, BP 1505, EL M'naouer, 31000 Oran, Algérie - Laboratoire Signal Image Parole (SIMPA), Département d'Informatique, Faculté des Mathématiques et Informatique. [email protected] / [email protected]
(b) Department of Computer, Damietta University, Egypt. [email protected], [email protected]
(c) Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt. [email protected]
(d) Department of Computer Science, Birzeit University, Birzeit, Palestine. [email protected]
(e) School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
(*) Corresponding author: Mohamed Abd Elaziz ([email protected])

Abstract

Feature Selection (FS) plays an important role in enhancing the performance of machine learning techniques in terms of accuracy and response time. As FS is known to be an NP-hard problem, the aim of this paper is to introduce a new variant of the Salp Swarm Algorithm (SSA) for FS, called ISSAFD (Improved Followers of Salp swarm Algorithm using sine cosine algorithm and Disruption operator), which updates the positions of the followers (F) in SSA using the sinusoidal mathematical functions of the Sine Cosine Algorithm (SCA). This enhancement improves the exploration phase and helps to avoid stagnation in a local area. Moreover, the Disruption Operator (Dop) is applied to all solutions in order to enhance the population diversity and to maintain the balance between the exploration and exploitation processes. Two other variants of SSA based on SCA are developed, called ISSALD (Improved Leaders of Salp swarm Algorithm using sine cosine algorithm and Disruption operator) and ISSAF (Improved Followers of Salp swarm Algorithm using sine cosine algorithm). In ISSALD, the updating process modifies the positions of the leaders (L) by SCA and applies the Dop, whereas in ISSAF, the Dop is omitted and the positions of the followers are updated by SCA. Experimental results are evaluated on twenty datasets, four of which are high-dimensional with a small number of instances. The obtained results show the good performance of ISSAFD in terms of accuracy, sensitivity, specificity, and number of selected features in comparison with other metaheuristics (MH).

Keywords: Feature Selection (FS); Disruption Operator (Dop); Salp Swarm Algorithm (SSA); Sine Cosine Algorithm (SCA); Metaheuristics (MH).


1. Introduction

Feature Selection (FS) is a preprocessing step that has proved its efficiency in improving the performance of different learning techniques, both by enhancing their quality and by reducing the required learning time (Liu & Motoda, 2012). The importance of FS methods stems from the presence of redundant and/or irrelevant features in datasets, which negatively influences the performance of learning algorithms. The FS problem can be classified as a search problem, since it aims to find the minimum number of features that represent the original feature set without loss of information (Liu & Motoda, 2012). With the advancement of data collection tools, a huge number of features has become available in the datasets of most real-world fields, such as medicine, biology, and the telecommunications industry, and analyzing such amounts of data has become impractical. Consequently, searching for the representative features is a time-consuming and complicated process, since an exhaustive search strategy would have to generate all possible feature subsets in order to select a single one (Talbi, 2009).

In recent years, metaheuristic (MH) algorithms have been widely used to tackle different optimization problems, including the FS problem (Silva et al., 2018; Guyon & Elisseeff, 2003). According to (Yang, 2013), metaheuristic algorithms, especially nature-inspired ones, have proved their ability to outperform traditional and deterministic methods on different optimization problems in science, engineering, and industry. Good examples are the algorithms categorized as Swarm Intelligence (SI), such as Particle Swarm Optimization (PSO) (Eberhart & Kennedy, 1995), Ant Colony Optimization


(ACO) (Dorigo et al., 1996), the Salp Swarm Algorithm (SSA) (Mirjalili, 2015), and the Sine Cosine Algorithm (SCA) (Mirjalili, 2016).

Besides the search strategies that can be employed to determine the representative features, evaluating the selected feature subsets is another aspect of the FS process. Filter, wrapper, and embedded methods are the three categories of FS methods based on the subset evaluation criterion; more details about the three models can be found in (Liu & Motoda, 2012). Recently, wrapper approaches have attracted the attention of many researchers, due to the involvement of the learning algorithm in the selection process: the selection of a feature is based on the resulting performance of the learning algorithm (e.g., the classification accuracy of a specific classifier) (Kohavi & John, 1997). Different classification techniques have been widely used in FS methods, for example K-Nearest Neighbor (KNN), Decision Trees (DT), and Artificial Neural Networks (ANN).

In this paper, an alternative FS approach, called ISSAFD, is proposed, which improves the performance of the salp swarm algorithm using the sine cosine algorithm and the disruption operator (Dop). In ISSAFD, the population update incorporates two strategies. The first improves the leaders by employing the standard operator of the SSA, while the second updates the positions of the followers using the sine/cosine operators of the SCA. This cooperation enhances the convergence behavior, while the Dop produces diversified solutions in the search space. The best solution, i.e., the one with the smallest fitness value, is then retained, and this process is repeated for a given number of iterations in order to reach the global optimum or a solution near it. The major contributions of this paper are as follows:

• Introducing a novel algorithm for FS, called ISSAFD, that combines the merits of SSA, SCA, and the Dop, with the purpose of enhancing the behavior of the followers in SSA.

• Evaluating the influence of the operators of SCA on the behavior of the leaders in SSA.

• Comparing the performance of ISSAFD with swarm intelligence (SI) algorithms such as SCA, SSA, the Grey Wolf Optimizer (GWO), Ant Lion Optimization (ALO), Particle Swarm Optimization (PSO), and a bio-inspired method, the Genetic Algorithm (GA). Furthermore, a fair comparison is carried out with some works from the literature in terms of accuracy and the number of selected features.

The rest of the paper is organized as follows: recent FS approaches from the literature are presented in Section 2, followed by a description of the employed algorithms in Section 3. Section 4 describes the details of the proposed approach (ISSAFD). The setup of all experiments, along with the obtained results, is discussed in Section 5. Finally, conclusions and future directions are drawn in Section 6.


2. Related Works


Recently, various SI algorithms have been utilized as search strategies in wrapper FS methods (Faris et al., 2018; Hafez et al., 2016; Mafarja & Mirjalili, 2018; Emary et al., 2016a; Ibrahim et al., 2017; Zhang et al., 2018; Ghimatgar et al., 2018). PSO, as a primary SI algorithm, has been widely used in FS methods; (Mafarja et al.) and (Mafarja & Sabar) are two recent approaches that employ variants of the PSO algorithm as search strategies in wrapper FS approaches. Recently, a hybrid approach between PSO and the Shuffled Frog Leaping Algorithm (SFLA) was proposed in (Rajamohana & Umamaheswari, 2018) to improve the accuracy of fake review identification. Moreover, different evolutionary algorithms (EAs), e.g., Genetic Algorithms (GA) and Differential Evolution (DE), have been widely employed as search strategies in FS approaches in order to determine the best subset of features (Dong et al., 2018; Hancer et al., 2018; Lensen et al., 2018; Elaziz et al., 2019).

In recent years, interest in cooperative metaheuristics (CMH) has risen and many approaches have been proposed in the literature, with competitive results on many optimization problems including FS (Silva et al., 2018; Elaziz et al., 2017b). (Tawhid & Dsouza, 2018) proposed a synergy between the Bat Algorithm (BA) and PSO, called the Hybrid Binary Bat Enhanced Particle Swarm Optimization Algorithm (HBBEPSO), for FS. In HBBEPSO, the exploration capabilities of BA are combined with those of PSO, producing an approach that is able to converge to the best global solution in the search space. In (Chen et al., 2018), an enhanced PSO approach with two crossover operators was proposed to tackle the FS problem. The ACO algorithm has also been applied in many FS methods; for instance, (Shunmugapriya & Kanmani, 2017) proposed an FS approach that combines the characteristics of ACO with the Artificial Bee Colony (ABC), called AC-ABC, to enhance the search process. A Binary Butterfly Optimization Algorithm (BOA) based FS approach was proposed by (Arora & Anand, 2019). In (Mafarja et al., 2019), three variants of the Binary Grasshopper Optimization Algorithm (BGOA) were proposed: BGOA using a sigmoid function (BGOA-S), BGOA using a V-shaped function (BGOA-V), and BGOA based on mutation (BGOA-M). In these approaches, the crossover and mutation operators of the GA were employed to enhance the performance of the GOA, and the results were promising and better than those of the basic approaches. Another GOA-based FS approach was proposed in (Zakeri & Hokmabadi, 2019).

The SSA is a recent metaheuristic algorithm that mimics the behavior of salps in the deep oceans. It has been used as a search strategy in many FS approaches (Aljarah et al., 2018; Faris et al., 2018), and the experimental results in both works proved the ability of the SSA to outperform other optimizers. Another SSA-based FS method was proposed in (Ahmed et al., 2018), where a set of chaotic maps was used to control the balance between exploration and exploitation in the SSA. (Sayed et al., 2018) proposed a chaotic SSA for global optimization and feature selection. A wrapper FS method based on a mutation operator and SSA, called SSAMUT, was created in (Khamees et al., 2018). In addition, (Ibrahim et al., 2019) proposed a hybridization between SSA and PSO, called SSAPSO, which enhances the efficiency of the exploration and exploitation steps. Furthermore, (Baliarsingh et al., 2019) proposed a weighted chaotic SSA, named WCSSA, for high-dimensional genomic data; this algorithm simultaneously seeks the optimal gene selection and the kernel parameters of an extreme learning machine (ELM).

Other FS methods inspired by the sine and cosine functions have also been developed. For example, (Sindhu et al., 2017) proposed an FS method based on an improved SCA variant called ICSA, in which an elitism strategy is used to select the global solution and a new updating mechanism for the new solution was proposed. Like other global optimization algorithms, SCA suffers from stagnation in local optima. To overcome this drawback, (Elaziz et al., 2017b) proposed a hybrid model between the SCA and differential evolution operators that serve as local search methods, helping the SCA to avoid local optima. As can be concluded from the previous studies, both the SSA and SCA algorithms suffer from stagnation in local optima and low convergence rates.


3. Background

In this section, the general concepts of the sine cosine algorithm (SCA), the salp swarm algorithm (SSA), and the Dop are described.

3.1. Sine cosine algorithm

The SCA is a population-based optimization technique introduced by (Mirjalili, 2016). Its particularity lies in the movement of the search agents, which uses two mathematical operators based on the sine and cosine functions, as in Eqs. (1) and (2), respectively:

$$X_{ij}^{t+1} = X_{ij}^{t} + r_1 \times \sin(r_2) \times \left| r_3\,XBest_j^{t} - X_{ij}^{t} \right| \quad \text{if } r_4 < 0.5 \qquad (1)$$

$$X_{ij}^{t+1} = X_{ij}^{t} + r_1 \times \cos(r_2) \times \left| r_3\,XBest_j^{t} - X_{ij}^{t} \right| \quad \text{if } r_4 \ge 0.5 \qquad (2)$$

where $XBest_j^{t}$ is the target (best) solution in the j-th dimension at iteration t, $X_{ij}^{t}$ is the current solution in the j-th dimension at iteration t, and $|\cdot|$ denotes the absolute value. $r_1$, $r_2$, $r_3$, and $r_4$ are random numbers. The parameter $r_1$ controls the balance between exploration and exploitation and is updated over the iterations using the following formula:

$$r_1 = a - t\,\frac{a}{T} \qquad (3)$$

where t is the current iteration, T is the maximum number of iterations, and a is a constant. $r_2$ determines the direction of the movement of the next solution, towards or away from the target. $r_3$ is a weight for the best solution that stochastically emphasizes ($r_3 > 1$) or de-emphasizes ($r_3 < 1$) the effect of the destination in defining the distance (Elaziz et al., 2017a). The parameter $r_4$ switches between the sine and cosine operators of Eqs. (1) and (2). The general framework of the SCA is depicted in Algorithm 1.

Algorithm 1: Sine cosine algorithm (SCA)
1: Initialize N solutions.
2: Set the initial iteration number t := 0.
3: repeat
4:    Evaluate each solution and determine the best solution.
5:    Update the random parameters r1, r2, r3, and r4.
6:    Update the position of the search agents using Eqs. (1) and (2).
7:    Set t = t + 1.
8: until (t > T)
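For concreteness, the position update of Eqs. (1)-(3) can be sketched in a few lines of Python. This is a minimal illustration of Algorithm 1 under common conventions (the sampling ranges of r2 and r3 and the bound clipping are assumptions, not taken from the paper), with a generic objective function fobj:

```python
import numpy as np

def sca(fobj, dim, n=10, T=100, a=2.0, lb=0.0, ub=1.0):
    # Initialize N solutions uniformly in [lb, ub] and record the best one.
    X = lb + np.random.rand(n, dim) * (ub - lb)
    fit = np.apply_along_axis(fobj, 1, X)
    best, best_fit = X[fit.argmin()].copy(), fit.min()
    for t in range(T):
        r1 = a - t * (a / T)                      # Eq. (3): linear decay of r1
        for i in range(n):
            r2 = 2 * np.pi * np.random.rand(dim)  # movement direction
            r3 = 2 * np.random.rand(dim)          # weight of the destination
            r4 = np.random.rand(dim)              # sine/cosine switch
            step = np.abs(r3 * best - X[i])
            X[i] = np.where(r4 < 0.5,
                            X[i] + r1 * np.sin(r2) * step,   # Eq. (1)
                            X[i] + r1 * np.cos(r2) * step)   # Eq. (2)
        X = np.clip(X, lb, ub)
        fit = np.apply_along_axis(fobj, 1, X)     # re-evaluate, track the best
        if fit.min() < best_fit:
            best, best_fit = X[fit.argmin()].copy(), fit.min()
    return best, best_fit
```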
3.2. Salp Swarm Algorithm

The SSA is a swarm intelligence (SI) algorithm that was developed recently by (Mirjalili et al., 2017). The principal idea behind its operators is to imitate the swarming behavior of salps in deep oceans. Salps belong to the family Salpidae and have a transparent, barrel-shaped body. They are similar to jellyfishes in their tissues and movement, and they move forward by pumping water through the body as propulsion (Anderson & Bone, 1980). Together, the salps form a swarm known as a salp chain when navigating in the oceans, as shown in Figure 1 [1].

Figure 1: Individual salp.

The salp chain behavior has been modeled mathematically by dividing the population into two groups: the leader and the followers. The front of the chain is considered the leader, while the remainder of the salps are known as followers, as shown in Figure 2.

[1] www.alimirjalili.com/SSA.html


Figure 2: The behavior of natural salps swarm.


The role of the leader is to guide the swarm of salps, and each follower follows the preceding one. Similar to other SI algorithms, the SSA starts by initializing a random population of salps and evaluating the fitness of each salp. The salp with the best fitness value is denoted the leader, while the other salps are followers; the best position found so far is also denoted the food source, to be chased by the salp chain. To update the positions of the salp chain, two main phases are distinguished: the leader phase and the followers phase.

3.2.1. Leader phase

The position of the leader is updated using Eq. (4):

$$X_j^{1} = \begin{cases} XBest_j + c_1\left((ub_j - lb_j)\,c_2 + lb_j\right) & \text{if } c_3 \ge 0.5 \\ XBest_j - c_1\left((ub_j - lb_j)\,c_2 + lb_j\right) & \text{otherwise} \end{cases} \qquad (4)$$

where $X_j^{1}$ and $XBest_j$ represent the new position of the leader and the food source in the j-th dimension, and $ub_j$ and $lb_j$ represent the upper and lower bounds of the j-th dimension, respectively. $c_2$ and $c_3$ are randomly generated numbers in the interval [0, 1]. The parameter $c_1$ is a significant factor in SSA, since it controls the balance between exploration and exploitation; it decreases gradually over the course of the iterations, as shown in Eq. (5):

$$c_1 = 2\,e^{-\left(\frac{4t}{T}\right)^{2}} \qquad (5)$$

where t indicates the current iteration and T is the maximum number of iterations.

3.2.2. Followers phase

To update the positions of the followers, a new concept based on Newton's law of motion is introduced, as in Eq. (6):

$$X_j^{i} = \frac{1}{2}\,g\,t^{2} + \omega_0\,t, \quad i \ge 2 \qquad (6)$$

where $X_j^{i}$ represents the position of the i-th follower salp in the j-th dimension. In the optimization process, the time t corresponds to the current iteration, and g and $\omega_0$ indicate the acceleration and the initial velocity, respectively. Setting the initial speed to $\omega_0 = 0$ and the time increment between iterations to one ($\Delta t = 1$), the updating process of the followers can be expressed as in Eq. (7):

$$X_j^{i} = \frac{1}{2}\left(X_j^{i} + X_j^{i-1}\right) \qquad (7)$$

The pseudo-code of the SSA is shown in Algorithm 2.


Algorithm 2: Salp swarm algorithm (SSA)
1: Initialize the population size N and the maximum number of iterations T.
2: Set the initial iteration number t := 0.
3: Generate the initial population X, which contains N solutions.
4: Evaluate the fitness function of all individuals in X.
5: Denote the best solution in the population as XBest.
6: repeat
7:    Update c1 according to Eq. (5).
8:    for i = 1 to N do
9:       if (Xi is the leader) then
10:         Update the position of the leader salp as in Eq. (4).
11:      else
12:         Update the position of the follower salp as in Eq. (7).
13:      end if
14:   end for
15:   Set t = t + 1.
16: until (t > T)
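The two phases of Algorithm 2 translate directly into code. The sketch below performs one SSA iteration with the first salp as the leader, per the salp-chain description above; it is an illustrative sketch (lb and ub are assumed to be per-dimension arrays), not the authors' implementation:

```python
import numpy as np

def ssa_step(X, food, t, T, lb, ub):
    # One SSA iteration: X is the (n, dim) population, food is XBest.
    n, dim = X.shape
    c1 = 2 * np.exp(-(4.0 * t / T) ** 2)                     # Eq. (5)
    c2 = np.random.rand(dim)
    c3 = np.random.rand(dim)
    delta = c1 * ((ub - lb) * c2 + lb)
    X[0] = np.where(c3 >= 0.5, food + delta, food - delta)   # leader, Eq. (4)
    for i in range(1, n):                                    # followers, Eq. (7)
        X[i] = 0.5 * (X[i] + X[i - 1])
    return np.clip(X, lb, ub)
```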


4. Proposed ISSAFD method

The framework of the proposed feature selection method is given in Figure 3. In general, the proposed ISSAFD method improves the behavior of the SSA by using the operators of the SCA and the Dop operator, where each operator has its own task. The SCA is used to improve the exploration ability of the followers beyond the updating mechanism of the traditional SSA, since the followers have the largest effect on the convergence of solutions toward the global solution. The Dop is used to enhance the diversification of the whole population after updating it with the SSA or SCA operators, which improves the exploration ability of ISSAFD and its convergence to the optimal solution. The ISSAFD consists of four stages, which are detailed in the following sections.

Figure 3: The Framework of the proposed ISSAFD method.


4.1. Initial stage

The ISSAFD begins by generating a population that contains a set of N individuals, where each individual represents a solution to the given optimization problem. The population X is generated using the following equation:

$$X_i = lb_i + \alpha_i \times (ub_i - lb_i), \quad i = 1, 2, \ldots, N \qquad (9)$$


where $X_i$ is the i-th solution belonging to X and $\alpha_i \in [0, 1]$ is a random number. $lb_i$ and $ub_i$ represent the lower and upper boundaries of the given problem; in this study, $lb_i = 0$ and $ub_i = 1$. In addition, each solution $X_i$ must be converted into a binary solution $BX_i$ using Eq. (10):

$$BX_{ij} = \begin{cases} 1 & \text{if } X_{ij} > 0.5 \\ 0 & \text{otherwise} \end{cases} \qquad (10)$$

4.2. Evaluating stage This stage starts by evaluate the quality of each solution Xi by computing the objective function. To compute the objective value for the ith solution, Eq.(11). |BXi | ) (11) Dim where λ ∈ [0, 1] and (µ = 1−λ). λ represents an equalization factor that used to balance between the classification error rate γi and the number of selected features |BXi |. In this study, the KNN classifier is used as an evaluator during the FS process. The KNN classifier was selected to be used in this work due to its simplicity, easy to implement, and since no parameters are required. In addition, the γi represents the error rate of testing set, in which the dataset is divided randomly into two parts, the first part is the training set which has a size equals to 80% from the total size of the dataset. Meanwhile, the second part is the testing set which has 20% from the dataset. F iti = λ × γi + µ × (

238 239 240 241 242 243 244 245 246

(9)

12

247 248 249 250 251 252 253 254 255 256

257 258

259 260 261 262 263 264

265

4.3. Updating stage In this stage, the solution with the highest objective value among all solutions is denoted as the best solution XBest . Then the population X is split into two populations, using the traditional SSA. The first half of the population represent the leaders and they are updated using the operators of SSA as mentioned in Eq.(4). Whereas, the second half of the population, which represents the followers, is updated using the operators of the SCA as defined in Eqs. (1) and (2). Thereafter, the Dop is used to update the whole population X in order to maintain its diversity. However, in order to decrease the computational time at this stage, the Dop is used as in Eq.(12).  X × Dop if αo > 0.5 X= (12) X otherwise

Eq.(12) refers to the Dop is applied to X only when the random number αo ∈ [0, 1] is greater than 0.5 otherwise it is not used.

4.4. Terminal stage The steps of the evaluating and updating stages are repeated until the termination condition is met. In this study, this condition is the maximum number of iteration which used to assess the quality of the proposed method to find the optimal subset of features during the specified number of iterations. 4.5. Computational complexity The computational complexity of ISSAFD is depends on the complexity of the SSA, SCA and disrupt operator (DO). Therefore, the complexity of the proposed method is given as: O (ISSAF D) = Ks O (SSA) + (N − Ks )O (SCA) + O (DO) where

266 267 268 269

O (SSA) = O (t (Dim × N + C × N + N log N )) O (SCA) = O (t (Dim × N + C × N )) O (DO) = O (t × N )

(13)

where t represents the number of iterations, Dim indicates the number of variables. C is the cost of objective function and N represents the number of solutions. Ks represents the number of solution which updated using the SSA. 13

270 271 272 273 274 275 276 277

278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300

301 302 303 304 305

5. Experimental evaluation and discussion

In this section, the performance of the proposed approach is compared with other feature selection methods. In addition, the proposed ISSAFD is compared with the two other SSA variants based on SCA and the Disruption Operator: the first, called ISSALD, uses the SCA to update the leaders instead of the followers and applies the Dop, whereas the second, called ISSAF, uses the SCA to update the followers without applying the Dop.

5.1. Datasets and parameter setup

In order to validate the efficiency of the proposed ISSAFD algorithm, twenty datasets of varying dimensionality were used, divided into two categories: low and high dimensionality. The first category is available online at the UCI repository (Frank, 2010), whereas the second is taken from (Mafarja & Mirjalili, 2018). Table 1 describes the datasets in terms of the number of features, number of instances, and number of classes. These datasets belong to several fields (e.g., biology, games, physics, and biomedicine) and cover different sizes and dimensions. As classification strategy, we use a hold-out scheme that randomly divides each dataset into two parts: 80% for the training set and 20% for the testing set. All experiments were repeated 30 independent times to obtain statistically meaningful results. Moreover, we use KNN as classifier with the Euclidean distance metric (K = 5). For a fair comparison, the other evolutionary feature selection algorithms, namely GA, PSO, ALO, SCA, SSA, and GWO, were tested with the same settings: for all algorithms the population size is set to 10, the maximum number of iterations is fixed to 100, and the dimension equals the number of features of the original dataset. Table 2 describes the parameter settings of all algorithms.

5.2. Performance measures

In order to evaluate the performance of the proposed method (ISSAFD), several measures are defined. Table 3 shows the confusion matrix (CM), from which classifier performance measures such as accuracy, sensitivity, and specificity are computed.

Table 1: Low and high dimensionality datasets description

| Dataset | Number of features | Number of instances | Number of classes | Data category |
|---|---|---|---|---|
| Exactly | 13 | 1000 | 2 | Low dimensionality |
| Exactly2 | 13 | 1000 | 2 | Low dimensionality |
| HeartEW | 13 | 270 | 2 | Low dimensionality |
| Lymphography | 18 | 148 | 2 | Low dimensionality |
| M-of-n | 13 | 1000 | 2 | Low dimensionality |
| PenglungEW | 325 | 73 | 2 | Low dimensionality |
| SonarEW | 60 | 208 | 2 | Low dimensionality |
| SpectEW | 22 | 267 | 2 | Low dimensionality |
| CongressEW | 16 | 435 | 2 | Low dimensionality |
| IonosphereEW | 34 | 351 | 2 | Low dimensionality |
| KrvskpEW | 36 | 3196 | 2 | Low dimensionality |
| Vote | 16 | 300 | 2 | Low dimensionality |
| WaveformEW | 40 | 5000 | 3 | Low dimensionality |
| WineEW | 13 | 178 | 3 | Low dimensionality |
| Zoo | 16 | 101 | 6 | Low dimensionality |
| BreastEW | 30 | 569 | 2 | Low dimensionality |
| Brain_Tumors 2 | 10367 | 50 | 4 | High dimensionality |
| 9_Tumors | 5726 | 60 | 9 | High dimensionality |
| Leukemia 2 | 11225 | 72 | 3 | High dimensionality |
| Prostrate Tumors | 10509 | 102 | 2 | High dimensionality |

Table 2: Parameter settings

| Parameter | Value |
|---|---|
| Size of the population | N = 10 |
| Maximum number of iterations | T = 100 |
| Dimension | Number of features |
| Number of runs | Nr = 30 |
| λ in fitness function | 0.99 |
| µ in fitness function | 0.01 |
| a in GWO and SCA | decreases linearly from 2 to 0 |
| c1 and c2 in PSO | c1 = c2 = 2 |
| wmax and wmin in PSO | wmax = 0.9 and wmin = 0.2 |
| Crossover probability in GA | Pc = 0.7 |
| Mutation probability in GA | Pm = 0.2 |
| Elitism selection in GA | Rate = 0.8 |


Table 3: Confusion matrix

| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | True Positive (TP) | False Negative (FN) |
| Actual negative | False Positive (FP) | True Negative (TN) |

• Average accuracy ($AVG_{Acc}$): the accuracy metric represents the rate of correctly classified data (see Eq. (14)):

$$Accuracy = \frac{TP + TN}{TP + FN + FP + TN} \qquad (14)$$

In this study, the algorithms are executed 30 times ($N_r = 30$), so the $AVG_{Acc}$ metric is calculated as in Eq. (15):

$$AVG_{Acc} = \frac{1}{N_r} \sum_{k=1}^{N_r} Acc_{Best}^{k} \qquad (15)$$

• Average sensitivity ($AVG_{Sens}$): the sensitivity, also called the true positive rate (TPR), indicates the percentage of correctly predicted positive patterns (see Eq. (16)):

$$Sensitivity = \frac{TP}{TP + FN} \qquad (16)$$

The $AVG_{Sens}$ metric is computed from the selected features of the best solution, as in Eq. (17):

$$AVG_{Sens} = \frac{1}{N_r} \sum_{k=1}^{N_r} Sens_{Best}^{k} \qquad (17)$$

• Average specificity ($AVG_{Spec}$): the specificity, also known as the true negative rate (TNR), represents the percentage of correctly predicted negative patterns. It is computed by Eq. (18):

$$Specificity = \frac{TN}{FP + TN} \qquad (18)$$

The $AVG_{Spec}$ measure is computed as:

$$AVG_{Spec} = \frac{1}{N_r} \sum_{k=1}^{N_r} Spec_{Best}^{k} \qquad (19)$$

• Average fitness value ($AVG_{Fit}$): the fitness value relates the classification error rate to the selection ratio, as in Eq. (11); its average is expressed as in Eq. (20):

$$AVG_{Fit} = \frac{1}{N_r} \sum_{k=1}^{N_r} Fit_{Best}^{k} \qquad (20)$$

• Average number of selected features ($AVG_{|BX_{Best}|}$): this measure captures the ability of an algorithm to reduce the number of features of a given dataset over all independent runs. It is calculated as in Eq. (21):

$$AVG_{|BX_{Best}|} = \frac{1}{N_r} \sum_{k=1}^{N_r} \left|BX_{Best}^{k}\right| \qquad (21)$$

where $|BX_{Best}^{k}|$ is the cardinality of the selected features of the best solution of the k-th run.

• Average computation time ($AVG_{Time}$): the average CPU time (in seconds) of each algorithm, as in the following equation:

$$AVG_{Time} = \frac{1}{N_r} \sum_{k=1}^{N_r} Time_{Best}^{k} \qquad (22)$$

• Standard deviation (STD): used for evaluating the stability of each algorithm and analysing the obtained results over different executions, as in Eq. (23):

$$STD_Y = \sqrt{\frac{1}{N_r} \sum_{k=1}^{N_r} \left(Y_{Best}^{k} - AVG_Y\right)^{2}} \qquad (23)$$
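The per-run measures of Eqs. (14), (16), and (18) follow directly from the confusion-matrix counts of Table 3; a minimal sketch:

```python
def classifier_measures(tp, fn, fp, tn):
    # Accuracy (Eq. (14)), sensitivity/TPR (Eq. (16)), specificity/TNR (Eq. (18)).
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return accuracy, sensitivity, specificity
```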


(Note: $STD_Y$ is calculated for all measures: accuracy, fitness, time, number of selected features, sensitivity, and specificity.)

5.3. The effect of λ and µ on the fitness function

The main objective in the FS problem is to find a trade-off between minimizing the selection ratio and maximizing the classification accuracy, so a good balance between these two contradictory objectives should be maintained. Here we study the influence of the λ and µ parameters of Eq. (11), which control this balance. The Leukemia2 dataset was used since it represents a real challenge in the field of feature selection (Moayedikia et al., 2017): it contains a small number of instances with a large dimensionality (72 instances with 11225 attributes), and it is more sensitive than the other datasets, which is why it is used in all the experiments of this subsection. Different values of λ and µ were used to measure the classification accuracy, the number of selected features, and the fitness value; Table 4 presents the corresponding averages. As the value of λ increases, the accuracy increases while the fitness and the number of selected features decrease, which is the purpose of this study. Based on Table 4, the best values of λ and µ are set to 0.99 and 0.01, respectively, which confirms the settings used in (Mafarja et al., 2018; Faris et al., 2018; Emary et al., 2018).

Table 4: The influence of λ and µ for the Leukemia2 dataset

| λ | µ | AVG_Acc | AVG_|BXBest| | AVG_Fit |
|---|---|---|---|---|
| 0.5 | 0.5 | 0.95 | 9462.675 | 0.254 |
| 0.7 | 0.3 | 0.97 | 5343.1 | 0.1709 |
| 0.9 | 0.1 | 0.99 | 5331.875 | 0.0048 |
| 0.99 | 0.01 | 1 | 5320.65 | 0.0047 |


5.4. Comparison of ISSAFD with ISSAF and ISSALD

This subsection compares the performance of the proposed method ISSAFD with the two other variants, ISSAF and ISSALD. ISSALD modifies the positions of the leaders in SSA using the sine/cosine equations, whereas ISSAF employs the same process as ISSAFD but without applying the disruption operator.

• In terms of accuracy: according to Table 5, the ISSAFD method achieved the highest accuracy in 15 datasets out of 20, and it performed identically to ISSAF and ISSALD in 2 and 3 datasets, respectively. ISSAF achieved the best values in 2 datasets and ranked second among the proposed approaches, whereas ISSALD took the last rank. Moreover, ISSAFD also achieved the smallest STD value in 14 datasets, which proves the robustness of the algorithm and its ability to search the promising regions of the search space.

• In terms of fitness: as can be seen in Table 6, the ISSAFD method reached the minimum fitness values in 13 datasets out of 20, followed by ISSAF with 4 datasets, whereas ISSALD obtained the best fitness value in only 3 datasets. The ISSAFD was also the most stable method in terms of STD.

• In terms of the number of selected attributes: inspecting the results in Table 7, the ISSAFD method selected the most significant features in 11 datasets out of 20 and tied with ISSALD in one dataset (Exactly2). The ISSALD method came second, achieving the best results in 4 datasets and tying with ISSAF in one dataset (KrvskpEW), whereas ISSAF came last, achieving the best results in 3 datasets.

• In terms of sensitivity and specificity: the ISSAFD method ranked first with 7 datasets, and it tied with the ISSAF and ISSALD methods in 2 and 4 datasets, respectively. ISSAF ranked second with 6 datasets, followed by ISSALD with 5 datasets. From Table 8, the ISSAFD was the most stable method in terms of STD. Table 9 shows that the ISSAFD method achieved the best specificity values in 13 datasets, reaching 100% specificity on the Leukemia2 dataset like the ISSAF and ISSALD methods; it is followed by ISSAF with 4 datasets, while ISSALD ranked last with only two datasets.

Table 5: Comparison of ISSAFD with the other variants (ISSAF and ISSALD) in terms of accuracy

| Datasets | ISSAFD AVG | ISSAFD STD | ISSAF AVG | ISSAF STD | ISSALD AVG | ISSALD STD |
|---|---|---|---|---|---|---|
| Exactly | 0.9803 | 0.0440 | 0.9932 | 0.0182 | 0.7977 | 0.1180 |
| Exactly2 | 0.8100 | 0.0000 | 0.7553 | 0.0079 | 0.7800 | 0.0000 |
| HeartEW | 0.9056 | 0.0164 | 0.8667 | 0.0171 | 0.8272 | 0.0190 |
| Lymphography | 0.9717 | 0.0156 | 0.8870 | 0.0241 | 0.8778 | 0.0331 |
| M-of-n | 0.9875 | 0.0277 | 0.9980 | 0.0048 | 0.8973 | 0.0786 |
| PenglungEW | 1.0000 | 0.0000 | 0.9356 | 0.0122 | 1.0000 | 0.0000 |
| SonarEW | 0.9968 | 0.0082 | 0.9540 | 0.0187 | 0.9675 | 0.0171 |
| SpectEW | 0.9389 | 0.0121 | 0.8586 | 0.0150 | 0.8815 | 0.0173 |
| CongressEW | 1.0000 | 0.0000 | 0.9870 | 0.0078 | 0.9628 | 0.0120 |
| IonosphereEW | 0.9850 | 0.0073 | 0.9540 | 0.0148 | 0.9620 | 0.0149 |
| KrvskpEW | 0.9742 | 0.0057 | 0.9730 | 0.0115 | 0.9442 | 0.0147 |
| Vote | 0.9806 | 0.0063 | 0.9694 | 0.0088 | 0.9561 | 0.0148 |
| WaveformEW | 0.7636 | 0.0132 | 0.7505 | 0.0118 | 0.7249 | 0.0130 |
| WineEW | 1.0000 | 0.0000 | 0.9944 | 0.0113 | 0.9806 | 0.0208 |
| Zoo | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 |
| BreastEW | 0.9851 | 0.0070 | 0.9675 | 0.0124 | 0.9675 | 0.0066 |
| Brain_Tumor2 | 1.0000 | 0.0000 | 0.5933 | 0.0583 | 0.9900 | 0.0305 |
| 9_Tumors | 1.0000 | 0.0000 | 0.8495 | 0.0636 | 0.8883 | 0.0826 |
| Leukemia2 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 |
| Prostrate Tumors | 0.9857 | 0.0222 | 0.9805 | 0.0194 | 0.9635 | 0.0271 |

Ranking (W/T/L): ISSAFD 18/2/2, ISSAF 4/2/18, ISSALD 3/3/17.

Table 6: Comparison of ISSAFD with the other variants (ISSAF and ISSALD) in terms of fitness

| Datasets | ISSAFD AVG | ISSAFD STD | ISSAF AVG | ISSAF STD | ISSALD AVG | ISSALD STD |
|---|---|---|---|---|---|---|
| Exactly | 0.0251 | 0.0444 | 0.0120 | 0.0185 | 0.2036 | 0.1153 |
| Exactly2 | 0.1889 | 0.0000 | 0.2452 | 0.0078 | 0.2186 | 0.0000 |
| HeartEW | 0.0983 | 0.0157 | 0.1369 | 0.0170 | 0.1738 | 0.0185 |
| Lymphography | 0.0324 | 0.0154 | 0.1158 | 0.0238 | 0.1227 | 0.0326 |
| M-of-n | 0.0181 | 0.0281 | 0.0073 | 0.0052 | 0.1064 | 0.0773 |
| PenglungEW | 0.0036 | 0.0002 | 0.0675 | 0.0119 | 0.0002 | 0.0001 |
| SonarEW | 0.0071 | 0.0081 | 0.0498 | 0.0185 | 0.0344 | 0.0167 |
| SpectEW | 0.0644 | 0.0117 | 0.1450 | 0.0145 | 0.1203 | 0.0169 |
| CongressEW | 0.0022 | 0.0004 | 0.0163 | 0.0080 | 0.0387 | 0.0115 |
| IonosphereEW | 0.0188 | 0.0072 | 0.0491 | 0.0149 | 0.0395 | 0.0144 |
| KrvskpEW | 0.0316 | 0.0055 | 0.0263 | 0.0114 | 0.0581 | 0.0141 |
| Vote | 0.0234 | 0.0057 | 0.0332 | 0.0080 | 0.0460 | 0.0144 |
| WaveformEW | 0.2394 | 0.0133 | 0.2524 | 0.0117 | 0.2752 | 0.0126 |
| WineEW | 0.0033 | 0.0007 | 0.0105 | 0.0106 | 0.0216 | 0.0204 |
| Zoo | 0.0025 | 0.0005 | 0.0030 | 0.0004 | 0.0019 | 0.0002 |
| BreastEW | 0.0193 | 0.0071 | 0.0363 | 0.0125 | 0.0339 | 0.0060 |
| Brain_Tumor2 | 0.0047 | 0.0000 | 0.4074 | 0.0577 | 0.0100 | 0.0302 |
| 9_Tumors | 0.0047 | 0.0001 | 0.1538 | 0.0629 | 0.1114 | 0.0818 |
| Leukemia2 | 0.0047 | 0.0000 | 0.0047 | 0.0000 | 0.0002 | 0.0000 |
| Prostrate Tumors | 0.0190 | 0.0220 | 0.0143 | 0.0192 | 0.0364 | 0.0269 |

Ranking (W/T/L): ISSAFD 13/0/7, ISSAF 4/0/16, ISSALD 3/0/17.

Table 7: Comparison of ISSAFD with the other variants (ISSAF and ISSALD) in terms of selected attributes

| Datasets | ISSAFD AVG | ISSAFD STD | ISSAF AVG | ISSAF STD | ISSALD AVG | ISSALD STD |
|---|---|---|---|---|---|---|
| Exactly | 5.3333 | 1.2685 | 4.3333 | 0.9248 | 6.8000 | 2.7834 |
| Exactly2 | 1.0000 | 0.0000 | 3.8667 | 3.2561 | 1.0000 | 0.0000 |
| HeartEW | 6.2000 | 1.4479 | 6.3333 | 1.3476 | 6.4667 | 0.8193 |
| Lymphography | 7.8333 | 1.7036 | 8.0000 | 1.4856 | 8.1333 | 1.3060 |
| M-of-n | 7.4333 | 1.1943 | 6.8667 | 0.7303 | 8.2000 | 1.2429 |
| PenglungEW | 118.1667 | 7.2829 | 129.5333 | 9.8216 | 108.0667 | 2.7283 |
| SonarEW | 23.7667 | 3.3701 | 28.2333 | 2.7628 | 27.3333 | 5.9558 |
| SpectEW | 8.6667 | 2.5235 | 11.0667 | 2.1324 | 6.4667 | 1.6761 |
| CongressEW | 3.2333 | 0.6814 | 5.4000 | 1.4527 | 3.2667 | 0.9948 |
| IonosphereEW | 13.4667 | 3.5790 | 13.5333 | 2.1129 | 13.5000 | 2.2834 |
| KrvskpEW | 21.7000 | 1.9325 | 20.0333 | 1.8659 | 22.3000 | 4.9211 |
| Vote | 4.2333 | 1.8998 | 4.6667 | 2.4117 | 4.2667 | 0.9444 |
| WaveformEW | 12.2667 | 2.5452 | 12.5667 | 3.3598 | 12.6000 | 4.0735 |
| WineEW | 2.1213 | 0.9444 | 3.1000 | 1.2521 | 6.5333 | 0.7589 |
| Zoo | 4.3333 | 0.7303 | 4.7333 | 0.6915 | 3.0667 | 0.2537 |
| BreastEW | 5.4000 | 2.3413 | 12.4667 | 2.3154 | 11.4000 | 3.2013 |
| Brain_Tumor2 | 4913.5000 | 35.1399 | 5014.5667 | 60.2842 | 4979.6000 | 85.1581 |
| 9_Tumors | 2681.5000 | 32.3758 | 2773.3000 | 36.9642 | 2794.1000 | 326.2576 |
| Leukemia2 | 5323.0333 | 42.5258 | 5326.1333 | 48.6803 | 102.6333 | 18.4979 |
| Prostrate Tumors | 5085.8667 | 58.8187 | 5063.2333 | 46.1838 | 5168.3333 | 420.6817 |

Ranking (W/T/L): ISSAFD 12/1/8, ISSAF 4/0/16, ISSALD 6/1/14.

Table 8: Comparison of ISSAFD with the other variants (ISSAF and ISSALD) in terms of sensitivity

| Datasets | ISSAFD AVG | ISSAFD STD | ISSAF AVG | ISSAF STD | ISSALD AVG | ISSALD STD |
|---|---|---|---|---|---|---|
| Exactly | 0.9880 | 0.0266 | 0.9869 | 0.0109 | 0.9514 | 0.0615 |
| Exactly2 | 1.0000 | 0.0000 | 0.9658 | 0.0479 | 1.0000 | 0.0000 |
| HeartEW | 0.9237 | 0.0354 | 0.9449 | 0.0512 | 0.8903 | 0.0553 |
| Lymphography | 0.9278 | 0.0362 | 0.9222 | 0.0417 | 1.0000 | 0.0000 |
| M-of-n | 0.9815 | 0.0420 | 0.9986 | 0.0054 | 0.8544 | 0.1274 |
| PenglungEW | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 |
| SonarEW | 0.9961 | 0.0149 | 0.9379 | 0.0348 | 0.9519 | 0.0431 |
| SpectEW | 0.7667 | 0.0595 | 0.7615 | 0.0310 | 0.7056 | 0.0811 |
| CongressEW | 1.0000 | 0.0000 | 0.9914 | 0.0094 | 0.9509 | 0.0117 |
| IonosphereEW | 0.9986 | 0.0053 | 0.9928 | 0.0109 | 0.9935 | 0.0130 |
| KrvskpEW | 0.9703 | 0.0076 | 0.9702 | 0.0099 | 0.9472 | 0.0195 |
| Vote | 0.9970 | 0.0115 | 1.0000 | 0.0000 | 0.9737 | 0.0359 |
| WaveformEW | 0.7370 | 0.0205 | 0.7369 | 0.0221 | 0.6997 | 0.0234 |
| WineEW | 1.0000 | 0.0000 | 0.9958 | 0.0228 | 1.0000 | 0.0000 |
| Zoo | 0.9722 | 0.0987 | 1.0000 | 0.0000 | 0.9667 | 0.0562 |
| BreastEW | 0.9818 | 0.0128 | 0.9815 | 0.0140 | 0.9958 | 0.0074 |
| Brain_Tumor2 | 0.5750 | 0.2470 | 0.1667 | 0.3790 | 0.9917 | 0.0456 |
| 9_Tumors | 0.5667 | 0.3051 | 0.7000 | 0.4661 | 0.4222 | 0.2466 |
| Leukemia2 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 |
| Prostrate Tumors | 0.9667 | 0.0615 | 0.9889 | 0.0288 | 0.9455 | 0.0453 |

Ranking (W/T/L): ISSAFD 11/4/9, ISSAF 8/2/12, ISSALD 7/4/13.

5.5. Comparison of ISSAFD with standard metaheuristics

In this section, the results of ISSAFD and of the other methods, namely SSA, SCA, GWO, ALO, GA, and PSO, are analyzed. They are presented in Tables 10-16 and Figures 4-7 in terms of the performance measures described in Section 5.2.

Table 9: Comparison of ISSAFD with the other variants (ISSAF and ISSALD) in terms of specificity

| Datasets | ISSAFD AVG | ISSAFD STD | ISSAF AVG | ISSAF STD | ISSALD AVG | ISSALD STD |
|---|---|---|---|---|---|---|
| Exactly | 0.9641 | 0.0872 | 0.9839 | 0.0384 | 0.4213 | 0.3889 |
| Exactly2 | 0.3333 | 0.0000 | 0.1240 | 0.1468 | 0.2200 | 0.0000 |
| HeartEW | 0.8812 | 0.0497 | 0.7940 | 0.0321 | 0.7420 | 0.0948 |
| Lymphography | 0.9833 | 0.0339 | 0.8133 | 0.0367 | 1.0000 | 0.0000 |
| M-of-n | 0.9909 | 0.0206 | 0.9906 | 0.0056 | 0.9253 | 0.0619 |
| PenglungEW | 0.9030 | 0.0231 | 1.0000 | 0.0000 | 0.8310 | 0.1036 |
| SonarEW | 0.9973 | 0.0101 | 0.9717 | 0.0284 | 0.9792 | 0.0212 |
| SpectEW | 0.9881 | 0.0150 | 0.8894 | 0.0229 | 0.9317 | 0.0291 |
| CongressEW | 1.0000 | 0.0000 | 0.9798 | 0.0145 | 0.9856 | 0.0168 |
| IonosphereEW | 0.9565 | 0.0255 | 0.8550 | 0.0547 | 0.9040 | 0.0387 |
| KrvskpEW | 0.9785 | 0.0129 | 0.9754 | 0.0159 | 0.9411 | 0.0347 |
| Vote | 0.9711 | 0.0080 | 0.9563 | 0.0126 | 0.9480 | 0.0210 |
| WaveformEW | 0.8801 | 0.0102 | 0.8591 | 0.0119 | 0.8467 | 0.0118 |
| WineEW | 1.0000 | 0.0000 | 0.9964 | 0.0109 | 0.9760 | 0.0290 |
| Zoo | 0.8556 | 0.0974 | 1.0000 | 0.0000 | 0.9872 | 0.0574 |
| BreastEW | 0.9896 | 0.0106 | 0.9417 | 0.0211 | 0.9190 | 0.0194 |
| Brain_Tumor2 | 0.5944 | 0.1431 | 0.8889 | 0.0000 | 0.9944 | 0.0304 |
| 9_Tumors | 0.6444 | 0.1182 | 0.9424 | 0.0506 | 0.6481 | 0.1432 |
| Leukemia2 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 |
| Prostrate Tumors | 0.9952 | 0.0181 | 0.9926 | 0.0282 | 0.9833 | 0.0379 |

Ranking (W/T/L): ISSAFD 14/1/6, ISSAF 5/1/15, ISSALD 3/1/17.


• In terms of accuracy: observing the results in Table 10, it can be clearly seen that ISSAFD achieved the highest accuracy in 90% of the datasets, followed by ALO and PSO, which obtained the same results in two datasets. The remaining methods rank as follows: SSA, then SCA, GA, and GWO. The power of the proposed ISSAFD shows clearly on the large datasets of Table 10, where it ranks first with an average accuracy of 99.6%, whereas SCA ranks second with only 76%, followed by PSO and ALO with 70.8% and 70.3%, respectively. In addition, the proposed method showed stable behavior across all experiments, which can be confirmed from the STD values: it achieved the smallest average STD among all methods (0.0093), followed by ALO and PSO with 0.036 and 0.037, respectively. Figures 4-7 show the boxplots of the classification accuracy of ISSAFD compared with the other optimizers (GWO, SCA, SSA, ALO, GA, and PSO), all implemented and tested on the twenty datasets in the same environment.

• In terms of sensitivity and specificity: according to Table 11, the proposed ISSAFD achieved the highest sensitivity value in 15 datasets out of 20, followed by PSO and ALO, which obtained the same results in three datasets, whereas GWO obtained the best results in two datasets. Although SSA did not achieve the best sensitivity on any dataset, it ranks fourth based on the average over all datasets, followed by GA, GWO, and SCA. Moreover, the proposed approach showed stable behavior in terms of the STD values, obtaining the smallest average STD among all methods (0.049), followed by ALO, SSA, PSO, and GA with 0.075, 0.090, 0.093, and 0.099, respectively. Inspecting the results in Table 12, ISSAFD reached the best specificity value in 11 datasets out of 20, followed by PSO, ALO, GWO, and SSA with the best values in 7, 6, 3, and 2 datasets, respectively (some of these methods obtained their best specificity values on the same datasets). Over all datasets, ISSAFD ranks first, followed by ALO, SSA, SCA, PSO, GA, and GWO. On the large datasets, ISSAFD also achieved the best value, followed by GA and ALO.

Table 10: Comparison of ISSAFD versus other optimizers in terms of accuracy

| Datasets | Metric | ISSAFD | SCA | SSA | GWO | ALO | GA | PSO |
|---|---|---|---|---|---|---|---|---|
| Exactly | AVG | 0.9803 | 0.8510 | 0.9815 | 0.7860 | 1.0000 | 0.8052 | 1.0000 |
| | STD | 0.0440 | 0.1531 | 0.0647 | 0.1240 | 0.0000 | 0.1261 | 0.0000 |
| Exactly2 | AVG | 0.8100 | 0.4842 | 0.6403 | 0.6290 | 0.6555 | 0.6513 | 0.6718 |
| | STD | 0.0000 | 0.1584 | 0.0625 | 0.0978 | 0.0409 | 0.0510 | 0.0242 |
| HeartEW | AVG | 0.9056 | 0.8068 | 0.8000 | 0.7802 | 0.7957 | 0.7926 | 0.7932 |
| | STD | 0.0164 | 0.0488 | 0.0349 | 0.0550 | 0.0286 | 0.0520 | 0.0223 |
| Lymphography | AVG | 0.9717 | 0.8122 | 0.8078 | 0.8044 | 0.8411 | 0.8211 | 0.8222 |
| | STD | 0.0156 | 0.0536 | 0.0623 | 0.1113 | 0.0572 | 0.0719 | 0.0576 |
| M-of-n | AVG | 0.9875 | 0.9563 | 0.9895 | 0.9203 | 1.0000 | 0.9110 | 1.0000 |
| | STD | 0.0277 | 0.0799 | 0.0347 | 0.0667 | 0.0000 | 0.0778 | 0.0000 |
| PenglungEW | AVG | 1.0000 | 0.7378 | 0.7089 | 0.6644 | 0.7200 | 0.6600 | 0.7133 |
| | STD | 0.0000 | 0.0801 | 0.0792 | 0.0811 | 0.0537 | 0.0590 | 0.0704 |
| SonarEW | AVG | 0.9968 | 0.8405 | 0.8683 | 0.8381 | 0.8595 | 0.8571 | 0.8802 |
| | STD | 0.0082 | 0.0473 | 0.0389 | 0.0421 | 0.0469 | 0.0447 | 0.0417 |
| SpectEW | AVG | 0.9389 | 0.7617 | 0.7395 | 0.7525 | 0.7512 | 0.7488 | 0.7395 |
| | STD | 0.0121 | 0.0418 | 0.0404 | 0.0375 | 0.0448 | 0.0417 | 0.0476 |
| CongressEW | AVG | 1.0000 | 0.9456 | 0.9307 | 0.9180 | 0.9291 | 0.9261 | 0.9291 |
| | STD | 0.0000 | 0.0145 | 0.0199 | 0.0270 | 0.0174 | 0.0254 | 0.0189 |
| IonosphereEW | AVG | 0.9850 | 0.9160 | 0.9047 | 0.8920 | 0.9258 | 0.9047 | 0.9150 |
| | STD | 0.0073 | 0.0380 | 0.0342 | 0.0493 | 0.0261 | 0.0268 | 0.0290 |
| KrvskpEW | AVG | 0.9742 | 0.9364 | 0.9565 | 0.9279 | 0.9656 | 0.9377 | 0.9616 |
| | STD | 0.0057 | 0.0200 | 0.0100 | 0.0298 | 0.0073 | 0.0170 | 0.0093 |
| Vote | AVG | 0.9806 | 0.9494 | 0.9489 | 0.9489 | 0.9511 | 0.9389 | 0.9494 |
| | STD | 0.0063 | 0.0102 | 0.0190 | 0.0169 | 0.0138 | 0.0216 | 0.0208 |
| WaveformEW | AVG | 0.7636 | 0.7258 | 0.7374 | 0.7208 | 0.7491 | 0.7279 | 0.7486 |
| | STD | 0.0132 | 0.0151 | 0.0166 | 0.0143 | 0.0142 | 0.0185 | 0.0116 |
| WineEW | AVG | 1.0000 | 0.9731 | 0.9639 | 0.9676 | 0.9954 | 0.9657 | 0.9806 |
| | STD | 0.0000 | 0.0523 | 0.0414 | 0.0438 | 0.0105 | 0.0315 | 0.0522 |
| Zoo | AVG | 1.0000 | 0.9714 | 0.9825 | 0.9222 | 0.9937 | 0.9460 | 0.9937 |
| | STD | 0.0000 | 0.0826 | 0.0318 | 0.0880 | 0.0207 | 0.0496 | 0.0165 |
| BreastEW | AVG | 0.9851 | 0.9491 | 0.9433 | 0.9415 | 0.9468 | 0.9433 | 0.9447 |
| | STD | 0.0070 | 0.0287 | 0.0161 | 0.0137 | 0.0172 | 0.0191 | 0.0153 |
| Brain_Tumor2 | AVG | 1.0000 | 0.7197 | 0.6796 | 0.6681 | 0.6644 | 0.6933 | 0.6907 |
| | STD | 0.0000 | 0.1554 | 0.0494 | 0.0824 | 0.0797 | 0.0563 | 0.0778 |
| 9_Tumors | AVG | 1.0000 | 0.6257 | 0.6468 | 0.6346 | 0.6236 | 0.6218 | 0.6355 |
| | STD | 0.0000 | 0.2572 | 0.1195 | 0.1306 | 0.1095 | 0.1307 | 0.1148 |
| Leukemia2 | AVG | 1.0000 | 0.9044 | 0.8667 | 0.8489 | 0.8800 | 0.8711 | 0.8689 |
| | STD | 0.0000 | 0.0736 | 0.0581 | 0.0630 | 0.0565 | 0.0605 | 0.0593 |
| Prostrate Tumors | AVG | 0.9857 | 0.8016 | 0.6063 | 0.6111 | 0.6444 | 0.6111 | 0.6381 |
| | STD | 0.0222 | 0.1091 | 0.0718 | 0.0993 | 0.0827 | 0.0730 | 0.0657 |

Ranking (W/T/L): ISSAFD 18/0/2, SCA 0/0/20, SSA 0/0/20, GWO 0/0/20, ALO 2/2/18, GA 0/0/20, PSO 2/2/18.


Moreover, the proposed approach also showed the smallest average STD among all methods (0.034), followed by ALO and SSA with 0.037 and 0.050, respectively.

• In terms of the average number of selected features: this measure shows how well the methods reduce the number of features of the given datasets; the results for all methods are given in Table 13. The SCA showed the greatest ability to reduce the number of features across all datasets, with an average of 79%, whereas the proposed ISSAFD ranked second with 63%, followed by ALO, PSO, SSA, GWO, and GA with 59%, 58%, 55%, 53%, and 51%, respectively. We note that the SCA reduced the large feature sets the most among all methods, removing 98% of the original features. However, although the SCA showed a high ability to reduce large numbers of features, the proposed method obtained the highest accuracy on these datasets (0.99), whereas SCA achieved 0.76 and ranked second. Figure 8 illustrates the relationship between the average classification accuracy and the number of selected features: SCA ranks first in reducing the number of features, whereas its accuracy comes only fifth. So, although ISSAFD ranks second in reducing the number of features, it shows the best accuracy over all datasets and can therefore be considered more accurate than SCA. Based on these observations, the overall ranking is: ISSAFD, followed by SCA, ALO, PSO, SSA, and GWO, while GA comes last.

• In terms of the average fitness value: this measure shows the ability of the methods to minimize the fitness value. Based on the results of Table 14, the PSO ranked first, obtaining the minimum values in 10 datasets out of 20, followed by ISSAFD with 8 datasets. The remaining methods rank as follows: ALO third, followed by SCA, SSA, GA, and GWO. The ISSAFD achieved the best (minimum) STD value (0.009), followed by ALO and SSA with 0.0120 and 0.0172, respectively. Moreover, to evaluate the convergence of the proposed method, it is compared with the traditional SCA and SSA, as depicted in Figure 9. It can be seen that

Table 11: Comparison of ISSAFD versus other optimizers in terms of sensitivity

| Datasets | Metric | ISSAFD | SCA | SSA | GWO | ALO | GA | PSO |
|---|---|---|---|---|---|---|---|---|
| Exactly | AVG | 0.9880 | 0.9148 | 0.9883 | 0.9124 | 1.0000 | 0.8744 | 1.0000 |
| | STD | 0.0266 | 0.0909 | 0.0471 | 0.0903 | 0.0000 | 0.0864 | 0.0000 |
| Exactly2 | AVG | 1.0000 | 0.4625 | 0.7327 | 0.7033 | 0.7566 | 0.7497 | 0.7793 |
| | STD | 0.0000 | 0.2853 | 0.1021 | 0.1676 | 0.0620 | 0.0632 | 0.0247 |
| HeartEW | AVG | 0.9237 | 0.8460 | 0.8080 | 0.7828 | 0.8253 | 0.8057 | 0.8287 |
| | STD | 0.0354 | 0.0809 | 0.0541 | 0.0811 | 0.0434 | 0.0681 | 0.0389 |
| Lymphography | AVG | 0.9278 | 0.0667 | 0.0667 | 0.1000 | 0.0333 | 0.1667 | 0.7333 |
| | STD | 0.0362 | 0.2537 | 0.2537 | 0.3051 | 0.1826 | 0.3790 | 0.4498 |
| M-of-n | AVG | 0.9815 | 0.9510 | 0.9863 | 0.8932 | 1.0000 | 0.8815 | 1.0000 |
| | STD | 0.0420 | 0.0919 | 0.0463 | 0.0900 | 0.0000 | 0.1039 | 0.0000 |
| PenglungEW | AVG | 1.0000 | 0.7667 | 0.9333 | 0.8333 | 0.9333 | 0.9667 | 0.8667 |
| | STD | 0.0000 | 0.4302 | 0.2537 | 0.3790 | 0.2537 | 0.1826 | 0.3457 |
| SonarEW | AVG | 0.9961 | 0.8479 | 0.8750 | 0.8792 | 0.8646 | 0.8625 | 0.9021 |
| | STD | 0.0149 | 0.0815 | 0.0544 | 0.0613 | 0.0736 | 0.0601 | 0.0629 |
| SpectEW | AVG | 0.7667 | 0.0000 | 0.0533 | 0.0567 | 0.0133 | 0.0633 | 0.0633 |
| | STD | 0.0595 | 0.0000 | 0.0629 | 0.0626 | 0.0346 | 0.0718 | 0.0556 |
| CongressEW | AVG | 1.0000 | 0.9553 | 0.9520 | 0.9433 | 0.9567 | 0.9393 | 0.9533 |
| | STD | 0.0000 | 0.0086 | 0.0214 | 0.0263 | 0.0183 | 0.0213 | 0.0177 |
| IonosphereEW | AVG | 0.9986 | 0.9404 | 0.9695 | 0.9518 | 0.9667 | 0.9759 | 0.9688 |
| | STD | 0.0053 | 0.0528 | 0.0260 | 0.0739 | 0.0278 | 0.0207 | 0.0260 |
| KrvskpEW | AVG | 0.9703 | 0.9438 | 0.9579 | 0.9419 | 0.9638 | 0.9443 | 0.9628 |
| | STD | 0.0076 | 0.0171 | 0.0114 | 0.0208 | 0.0124 | 0.0213 | 0.0102 |
| Vote | AVG | 0.9970 | 0.9449 | 0.9474 | 0.9564 | 0.9641 | 0.9410 | 0.9641 |
| | STD | 0.0115 | 0.0330 | 0.0311 | 0.0331 | 0.0200 | 0.0388 | 0.0363 |
| WaveformEW | AVG | 0.7370 | 0.6936 | 0.7127 | 0.6867 | 0.7208 | 0.6970 | 0.7183 |
| | STD | 0.0205 | 0.0273 | 0.0253 | 0.0241 | 0.0192 | 0.0230 | 0.0162 |
| WineEW | AVG | 1.0000 | 0.9923 | 0.9846 | 0.9821 | 0.9949 | 0.9974 | 0.9974 |
| | STD | 0.0000 | 0.0235 | 0.0313 | 0.0482 | 0.0195 | 0.0140 | 0.0140 |
| Zoo | AVG | 0.9722 | 0.9375 | 0.9792 | 0.9000 | 0.9875 | 0.9292 | 1.0000 |
| | STD | 0.0987 | 0.1760 | 0.0576 | 0.1687 | 0.0381 | 0.1169 | 0.0000 |
| BreastEW | AVG | 0.9818 | 0.9481 | 0.9472 | 0.9435 | 0.9509 | 0.9495 | 0.9472 |
| | STD | 0.0128 | 0.0377 | 0.0238 | 0.0239 | 0.0265 | 0.0264 | 0.0254 |
| Brain_Tumor2 | AVG | 0.5750 | 0.6167 | 0.9500 | 0.9667 | 0.9667 | 0.9500 | 0.9333 |
| | STD | 0.2470 | 0.3869 | 0.1526 | 0.1269 | 0.1269 | 0.1526 | 0.1729 |
| 9_Tumors | AVG | 0.5667 | 0.3333 | 0.8000 | 0.8667 | 0.8333 | 0.8333 | 0.7667 |
| | STD | 0.3051 | 0.4795 | 0.4068 | 0.3457 | 0.3790 | 0.3790 | 0.4302 |
| Leukemia2 | AVG | 1.0000 | 0.9542 | 0.8500 | 0.8542 | 0.8500 | 0.8583 | 0.8542 |
| | STD | 0.0000 | 0.0695 | 0.0509 | 0.0474 | 0.0509 | 0.0432 | 0.0474 |
| Prostrate Tumors | AVG | 0.9667 | 0.7861 | 0.6056 | 0.5778 | 0.6278 | 0.5972 | 0.6444 |
| | STD | 0.0615 | 0.1325 | 0.0927 | 0.1217 | 0.1132 | 0.1160 | 0.0901 |

Ranking (W/T/L): ISSAFD 15/0/5, SCA 0/0/20, SSA 1/0/19, GWO 2/1/18, ALO 3/3/17, GA 0/0/20, PSO 3/2/17.

Table 12: Comparison of ISSAFD versus other optimizers in terms of specificity. Values are AVG (STD).

Datasets | ISSAFD | SCA | SSA | GWO | ALO | GA | PSO
Exactly | 0.9641 (0.0872) | 0.6948 (0.3163) | 0.9649 (0.1096) | 0.4764 (0.3638) | 1.0000 (0.0000) | 0.6356 (0.2578) | 1.0000 (0.0000)
Exactly2 | 0.3333 (0.0000) | 0.5546 (0.2676) | 0.3397 (0.1098) | 0.3872 (0.1455) | 0.3262 (0.0485) | 0.3312 (0.0811) | 0.7793 (0.0247)
HeartEW | 0.8812 (0.0497) | 0.7613 (0.0925) | 0.7907 (0.0784) | 0.7773 (0.0710) | 0.7613 (0.0782) | 0.7773 (0.0805) | 0.8287 (0.0389)
Lymphography | 0.9833 (0.0339) | 1.0000 (0.0000) | 0.9989 (0.0063) | 1.0000 (0.0000) | 1.0000 (0.0000) | 0.9989 (0.0063) | 1.0000 (0.0000)
M-of-n | 0.9909 (0.0206) | 0.9601 (0.0779) | 0.9917 (0.0274) | 0.9396 (0.0559) | 1.0000 (0.0000) | 0.9319 (0.0639) | 1.0000 (0.0000)
PenglungEW | 0.9030 (0.0231) | 0.9810 (0.0321) | 0.9929 (0.0218) | 1.0000 (0.0000) | 0.9952 (0.0181) | 0.9976 (0.0130) | 0.8667 (0.3457)
SonarEW | 0.9973 (0.0101) | 0.8359 (0.0571) | 0.8641 (0.0532) | 0.8128 (0.0569) | 0.8564 (0.0562) | 0.8538 (0.0548) | 0.9021 (0.0629)
SpectEW | 0.9881 (0.0150) | 0.9348 (0.0513) | 0.8955 (0.0476) | 0.9106 (0.0442) | 0.9189 (0.0581) | 0.9045 (0.0522) | 0.0633 (0.0556)
CongressEW | 1.0000 (0.0000) | 0.9324 (0.0323) | 0.9018 (0.0446) | 0.8838 (0.0555) | 0.8919 (0.0487) | 0.9081 (0.0548) | 0.9533 (0.0177)
IonosphereEW | 0.9565 (0.0255) | 0.8681 (0.0701) | 0.7778 (0.0884) | 0.7750 (0.0615) | 0.8458 (0.0560) | 0.7653 (0.0722) | 0.9688 (0.0260)
KrvskpEW | 0.9785 (0.0129) | 0.9294 (0.0465) | 0.9550 (0.0163) | 0.9146 (0.0415) | 0.9674 (0.0129) | 0.9313 (0.0243) | 0.9628 (0.0102)
Vote | 0.9711 (0.0080) | 0.9529 (0.0166) | 0.9500 (0.0234) | 0.9431 (0.0203) | 0.9412 (0.0204) | 0.9373 (0.0296) | 0.9641 (0.0363)
WaveformEW | 0.8801 (0.0102) | 0.8583 (0.0123) | 0.8592 (0.0150) | 0.8497 (0.0124) | 0.8656 (0.0129) | 0.8555 (0.0138) | 0.7183 (0.0162)
WineEW | 1.0000 (0.0000) | 0.9841 (0.0333) | 0.9696 (0.0414) | 0.9826 (0.0335) | 1.0000 (0.0000) | 0.9812 (0.0295) | 0.9974 (0.0140)
Zoo | 0.8556 (0.0974) | 0.9949 (0.0281) | 1.0000 (0.0000) | 0.9923 (0.0310) | 1.0000 (0.0000) | 0.9949 (0.0195) | 1.0000 (0.0000)
BreastEW | 0.9896 (0.0106) | 0.9508 (0.0293) | 0.9365 (0.0201) | 0.9381 (0.0262) | 0.9397 (0.0278) | 0.9325 (0.0235) | 0.9472 (0.0254)
Brain_Tumor2 | 0.5944 (0.1431) | 0.9958 (0.0228) | 1.0000 (0.0000) | 1.0000 (0.0000) | 1.0000 (0.0000) | 1.0000 (0.0000) | 0.9333 (0.1729)
9_Tumors | 0.6444 (0.1182) | 0.3152 (0.1447) | 0.5273 (0.1420) | 0.4848 (0.1459) | 0.5576 (0.1579) | 0.6273 (0.1293) | 0.7667 (0.4302)
Leukemia2 | 1.0000 (0.0000) | 0.9571 (0.0764) | 0.9667 (0.0615) | 0.9524 (0.0685) | 0.9857 (0.0436) | 0.9619 (0.0643) | 0.8542 (0.0474)
Prostrate Tumors | 0.9952 (0.0181) | 0.8222 (0.1258) | 0.6074 (0.1000) | 0.6556 (0.1498) | 0.6667 (0.1092) | 0.6296 (0.1143) | 0.6444 (0.0901)
Ranking (W|T|L): ISSAFD 11|0|9, SCA 1|3|19, SSA 1|2|19, GWO 3|5|17, ALO 6|9|14, GA 1|2|19, PSO 7|7|13.
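The specificity reported in Table 12 is the true-negative rate of the wrapped KNN classifier on the test set. As a hedged illustration (the variable names, the toy labels, and the binary-class assumption are ours, not from the paper), such a value can be computed from a confusion matrix as follows:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def specificity(y_true, y_pred):
    """True-negative rate TN / (TN + FP) for a binary problem."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tn / (tn + fp)

# Toy example: test labels vs. predictions from a wrapper's KNN classifier.
y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])
y_pred = np.array([0, 0, 1, 1, 1, 0, 0, 1])
print(specificity(y_true, y_pred))  # 3 TN out of 4 negatives -> 0.75
```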

[Figure 4 panels: (a) Exactly, (b) Exactly2, (c) HeartEW, (d) Lymphography, (e) M-of-n, (f) PenglungEW. Each panel is a boxplot of classification accuracy (y-axis) per algorithm (x-axis: GA, GWO, ALO, PSO, SCA, SSA, ISSAFD).]

Figure 4: Boxplots of ISSAFD versus other optimizers based on the accuracy metric: Exactly to SpectEW.

[Figure 5 panels: (a) SonarEW, (b) SpectEW, (c) CongressEW, (d) IonosphereEW, (e) KrvskpEW, (f) Vote. Each panel is a boxplot of classification accuracy (y-axis) per algorithm (x-axis: GA, GWO, ALO, PSO, SCA, SSA, ISSAFD).]

Figure 5: Boxplots of ISSAFD versus other optimizers based on the accuracy metric: SonarEW to Vote.

[Figure 6 panels: (a) WaveformEW, (b) WineEW, (c) Zoo, (d) BreastEW, (e) Brain_Tumor2, (f) 9_Tumors. Each panel is a boxplot of classification accuracy (y-axis) per algorithm (x-axis: GA, GWO, ALO, PSO, SCA, SSA, ISSAFD).]

Figure 6: Boxplots of ISSAFD versus other optimizers based on the accuracy metric: WaveformEW to 9_Tumors.

[Figure 7 panels: (a) Leukemia2, (b) Prostate Tumors. Each panel is a boxplot of classification accuracy (y-axis) per algorithm (x-axis: GA, GWO, ALO, PSO, SCA, SSA, ISSAFD).]

Figure 7: Boxplots of ISSAFD versus other optimizers based on the accuracy metric for Leukemia2 and Prostate Tumors.

Figure 8: The average of the accuracy and the feature selection ratio of all methods.

Table 13: Comparison between ISSAFD and other metaheuristics based on the minimum number of selected features. Values are AVG (STD).

Datasets | ISSAFD | SCA | SSA | GWO | ALO | GA | PSO
Exactly | 5.3333 (1.2685) | 5.7667 (0.8172) | 6.9333 (1.0483) | 5.7667 (3.1588) | 6.0000 (0.0000) | 8.2000 (1.8828) | 6.0000 (0.0000)
Exactly2 | 1.0000 (0.0000) | 2.9667 (1.9205) | 5.9667 (1.6709) | 6.0333 (2.3265) | 6.5000 (1.1371) | 6.9333 (1.4840) | 6.3333 (0.8023)
HeartEW | 6.2000 (1.4479) | 4.9333 (1.0807) | 6.7333 (1.3374) | 7.8000 (2.0069) | 6.1000 (1.2134) | 7.5333 (1.7953) | 5.9667 (1.0300)
Lymphography | 7.8333 (1.7036) | 4.6000 (1.1919) | 6.8667 (1.1366) | 6.9000 (2.1066) | 6.3333 (1.3730) | 7.6000 (1.5222) | 7.0667 (1.2576)
M-of-n | 7.4333 (1.1943) | 6.0333 (0.8899) | 6.7667 (1.0726) | 9.1000 (1.7090) | 6.0000 (0.0000) | 8.5000 (1.3326) | 6.0000 (0.0000)
PenglungEW | 118.1667 (7.2829) | 13.7000 (4.9421) | 143.5333 (10.4806) | 110.1667 (38.9085) | 128.6667 (9.8132) | 144.8333 (11.0018) | 141.0333 (10.4436)
SonarEW | 23.7667 (3.3701) | 11.9667 (4.0725) | 27.6667 (2.6042) | 30.0000 (8.1875) | 23.5333 (3.8393) | 27.0000 (3.0626) | 26.6333 (3.2956)
SpectEW | 8.6667 (2.5235) | 5.1000 (0.3051) | 7.7667 (1.7157) | 10.0667 (2.1961) | 5.8667 (1.5025) | 9.0000 (1.6400) | 8.3667 (2.4280)
CongressEW | 3.2333 (0.6814) | 3.2667 (1.1725) | 6.0000 (1.9827) | 7.0333 (3.0792) | 5.5000 (1.5920) | 7.3000 (1.8965) | 5.1667 (1.2617)
IonosphereEW | 13.4667 (3.5790) | 4.3000 (1.0875) | 12.2333 (1.9241) | 10.7667 (3.2237) | 9.7333 (1.9989) | 14.1667 (2.3057) | 10.9667 (1.9384)
KrvskpEW | 21.7000 (1.9325) | 9.6667 (2.6566) | 19.5667 (2.4870) | 27.4667 (4.6441) | 19.4000 (3.5681) | 19.9000 (2.9402) | 20.6667 (2.8567)
Vote | 4.2333 (1.8998) | 4.2667 (1.9640) | 6.5000 (2.3305) | 6.8333 (3.0181) | 6.6000 (1.4288) | 6.5333 (2.1292) | 6.5667 (1.3817)
WaveformEW | 12.2667 (2.5452) | 12.7000 (3.3646) | 22.7667 (3.1479) | 27.5667 (6.0211) | 22.7000 (3.0643) | 22.8000 (3.4180) | 21.5333 (2.6876)
WineEW | 2.1213 (0.9444) | 2.6667 (0.8841) | 3.6333 (1.1592) | 5.4667 (2.0126) | 2.4667 (0.7303) | 6.0333 (1.4000) | 2.3333 (0.7100)
Zoo | 4.3333 (0.7303) | 4.5333 (0.6288) | 5.8667 (1.1958) | 7.9333 (1.9640) | 4.6000 (0.8944) | 6.5000 (1.7370) | 4.6333 (0.6149)
BreastEW | 5.4000 (2.3413) | 5.6000 (2.2682) | 12.7333 (2.4766) | 15.3000 (3.3441) | 9.1333 (1.9954) | 12.5000 (3.1596) | 11.8333 (2.5742)
Brain_Tumors2 | 4913.5000 (35.1399) | 144.8333 (53.8991) | 4999.2333 (51.1675) | 3678.0667 (1080.4006) | 4880.8000 (35.3040) | 5068.4333 (64.0499) | 4987.4667 (18.5728)
9_Tumors | 2681.5000 (32.3758) | 144.5667 (205.0720) | 2798.0333 (38.2834) | 2353.3667 (792.8315) | 2771.2000 (51.0472) | 2821.6000 (47.2883) | 2821.5667 (33.9103)
Leukemia2 | 5323.0333 (42.5258) | 182.5333 (67.1040) | 5439.0000 (42.0689) | 3988.6667 (1123.2541) | 5304.7000 (39.5406) | 5459.9333 (43.8917) | 5401.0333 (28.5868)
Prostrate Tumors | 5085.8667 (58.8187) | 362.6333 (423.2828) | 5141.7333 (52.0404) | 3327.9000 (903.9182) | 5101.7667 (46.7522) | 5163.3000 (47.7213) | 5182.2000 (43.2079)
Ranking (W|T|L): ISSAFD 8|0|12, SCA 11|0|9, SSA 0|0|20, GWO 0|0|20, ALO 1|1|19, GA 0|0|20, PSO 1|1|19.
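The Ranking (W|T|L) rows in these tables count, for each optimizer, the datasets on which it is the sole best, ties for the best, or loses. A minimal sketch of this bookkeeping is shown below; it is our own helper for illustration, not code from the paper:

```python
import numpy as np

def wtl_ranking(avg, minimize=True):
    """avg: (n_datasets, n_algorithms) array of mean scores.
    Returns win/tie/loss counts per algorithm against the per-dataset best."""
    best = avg.min(axis=1, keepdims=True) if minimize else avg.max(axis=1, keepdims=True)
    is_best = np.isclose(avg, best)
    n_best = is_best.sum(axis=1, keepdims=True)     # how many share the best value
    wins = (is_best & (n_best == 1)).sum(axis=0)    # sole best on a dataset
    ties = (is_best & (n_best > 1)).sum(axis=0)     # shared best
    losses = avg.shape[0] - wins - ties
    return wins, ties, losses

# Example: 3 datasets, 2 algorithms (number of selected features, lower is better).
scores = np.array([[5.3, 5.8], [1.0, 3.0], [6.2, 4.9]])
print(wtl_ranking(scores))  # (array([2, 1]), array([0, 0]), array([1, 2]))
```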

[Figure 9 panels: (a) Exactly2, (b) CongressEW, (c) Zoo, (d) IonosphereEW, (e) Sonar, (f) Leukemia 2. Each panel plots the average fitness (y-axis) against the iterations (x-axis, 0 to 100) for ISSAFD, SCA, and SSA.]

Figure 9: Convergence curves of SSA, SCA, and ISSAFD on six datasets.


the proposed method ISSAFD converges quickly on most datasets, while SCA converges slowly; SSA shows an intermediate convergence behavior on most datasets.

• In terms of average computation time: This measure indicates the speed of an algorithm in selecting features from a given dataset. According to the results in Table 15, the average time over all algorithms is 6.1 seconds. On this measure, GA is the fastest algorithm, obtaining the lowest time on 9 out of 20 datasets with an average of 4.9 seconds, followed by ISSAFD and SSA with 7 and 4 datasets, respectively. In addition, the STD values confirm the stability of ISSAFD, which ranked third with 0.15, after PSO and ALO with 0.0752 and 0.0753, respectively.

5.6. Wilcoxon's rank test

In this subsection, the Wilcoxon test is applied to check whether there is a significant difference between the proposed approach and the other methods. It is applied to the accuracy measure at a significance level of 0.05; a p-value < 0.05 indicates that the proposed approach differs significantly. Table 16 shows that ISSAFD differs significantly from SCA and ALO on all datasets except M-of-n and Zoo, respectively, and that it also outperformed SSA on all datasets except Exactly and M-of-n; these cases are the only p-values in Table 16 that exceed 0.05. ISSAFD shows significant differences on all datasets versus PSO, GWO, and GA. In general, the proposed approach shows a positive improvement against all methods, as it inherits the strengths of both the SSA and SCA algorithms.
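As a hedged illustration of this test (the per-run accuracy vectors below are synthetic, not the paper's data), the paired Wilcoxon signed-rank test is available in SciPy:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Hypothetical accuracies of two optimizers over 30 independent runs on one dataset.
acc_issafd = 0.95 + 0.01 * rng.standard_normal(30)
acc_sca = 0.90 + 0.02 * rng.standard_normal(30)

stat, p = wilcoxon(acc_issafd, acc_sca)  # paired signed-rank test
print(p < 0.05)  # True -> significant difference at the 0.05 level
```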

Table 14: Comparison of ISSAFD versus other optimizers based on the best fitness values. Values are AVG (STD).

Datasets | ISSAFD | SCA | SSA | GWO | ALO | GA | PSO
Exactly | 0.0251 (0.0444) | 0.1222 (0.1205) | 0.0278 (0.0576) | 0.2229 (0.0799) | 0.0046 (0.0046) | 0.2066 (0.0769) | 0.0046 (0.0000)
Exactly2 | 0.1889 (0.0000) | 0.2325 (0.0025) | 0.2310 (0.0086) | 0.2442 (0.0122) | 0.2213 (0.0069) | 0.2457 (0.0150) | 0.2133 (0.2308)
HeartEW | 0.0983 (0.0157) | 0.1077 (0.0214) | 0.1017 (0.0152) | 0.1239 (0.0173) | 0.0915 (0.0129) | 0.1219 (0.0196) | 0.0772 (0.0978)
Lymphography | 0.0324 (0.0154) | 0.0807 (0.0180) | 0.0654 (0.0207) | 0.1149 (0.0377) | 0.0519 (0.0162) | 0.0999 (0.0292) | 0.0363 (0.0688)
M-of-n | 0.0181 (0.0281) | 0.0357 (0.0532) | 0.0126 (0.0242) | 0.0869 (0.0420) | 0.0046 (0.0046) | 0.0861 (0.0467) | 0.0046 (0.0000)
PenglungEW | 0.0036 (0.0002) | 0.1258 (0.0400) | 0.1914 (0.0249) | 0.2498 (0.0418) | 0.1690 (0.0376) | 0.2355 (0.0414) | 0.1360 (0.2024)
SonarEW | 0.0071 (0.0081) | 0.0554 (0.0221) | 0.0541 (0.0168) | 0.0946 (0.0224) | 0.0456 (0.0172) | 0.0634 (0.0237) | 0.0277 (0.0749)
SpectEW | 0.0644 (0.0117) | 0.0836 (0.0115) | 0.0750 (0.0098) | 0.0981 (0.0147) | 0.0729 (0.0064) | 0.0878 (0.0105) | 0.0586 (0.0761)
CongressEW | 0.0022 (0.0004) | 0.0294 (0.0053) | 0.0318 (0.0070) | 0.0431 (0.0086) | 0.0262 (0.0040) | 0.0391 (0.0077) | 0.0145 (0.0278)
IonosphereEW | 0.0188 (0.0072) | 0.0389 (0.0121) | 0.0557 (0.0121) | 0.0757 (0.0146) | 0.0396 (0.0079) | 0.0706 (0.0116) | 0.0308 (0.0593)
KrvskpEW | 0.0316 (0.0055) | 0.0504 (0.0112) | 0.0368 (0.0065) | 0.0482 (0.0075) | 0.0255 (0.0039) | 0.0510 (0.0117) | 0.0228 (0.0367)
Vote | 0.0234 (0.0057) | 0.0313 (0.0065) | 0.0283 (0.0072) | 0.0373 (0.0077) | 0.0217 (0.0037) | 0.0360 (0.0070) | 0.0196 (0.0349)
WaveformEW | 0.2394 (0.0133) | 0.2071 (0.0097) | 0.1946 (0.0078) | 0.2075 (0.0105) | 0.1837 (0.0068) | 0.2038 (0.0125) | 0.1748 (0.1948)
WineEW | 0.0033 (0.0133) | 0.0021 (0.0097) | 0.0028 (0.0078) | 0.0088 (0.0105) | 0.0019 (0.0068) | 0.0056 (0.0125) | 0.0015 (0.1948)
Zoo | 0.0025 (0.0005) | 0.0107 (0.0178) | 0.0037 (0.0007) | 0.0238 (0.0381) | 0.0029 (0.0006) | 0.0339 (0.0499) | 0.0025 (0.0038)
BreastEW | 0.0193 (0.0071) | 0.0172 (0.0051) | 0.0202 (0.0046) | 0.0277 (0.0066) | 0.0149 (0.0042) | 0.0256 (0.0068) | 0.0110 (0.0217)
Brain_Tumor2 | 0.0047 (0.0000) | 0.0001 (0.0001) | 0.0048 (0.0000) | 0.0464 (0.0498) | 0.0047 (0.0000) | 0.0214 (0.0375) | 0.0048 (0.0048)
9_Tumors | 0.0047 (0.0001) | 0.0039 (0.0204) | 0.1206 (0.0727) | 0.2288 (0.1014) | 0.0654 (0.0681) | 0.2022 (0.0843) | 0.0048 (0.1463)
Leukemia2 | 0.0047 (0.0000) | 0.0002 (0.0001) | 0.0708 (0.0000) | 0.0784 (0.0224) | 0.0707 (0.0000) | 0.0709 (0.0000) | 0.0048 (0.0709)
Prostrate Tumors | 0.0190 (0.0220) | 0.0208 (0.0269) | 0.1620 (0.0469) | 0.2232 (0.0451) | 0.0850 (0.0331) | 0.1841 (0.0624) | 0.0521 (0.1464)
Ranking (W|T|L): ISSAFD 8|1|12, SCA 3|0|17, SSA 0|0|20, GWO 0|0|20, ALO 2|2|18, GA 0|0|20, PSO 10|2|10.

Table 15: Comparison between ISSAFD and other metaheuristics based on computation time (in seconds). Values are AVG (STD).

Datasets | ISSAFD | SCA | SSA | GWO | ALO | GA | PSO
Exactly | 5.2474 (0.2345) | 5.2675 (0.2401) | 6.1712 (0.3838) | 5.8158 (0.7531) | 6.0738 (0.0411) | 5.3047 (0.2111) | 6.0253 (0.0604)
Exactly2 | 4.6817 (0.3949) | 4.6848 (0.5941) | 6.0642 (0.4893) | 6.4241 (0.6478) | 6.2144 (0.0563) | 5.0456 (0.1888) | 6.1010 (0.0899)
HeartEW | 3.0765 (0.1641) | 3.6705 (0.0646) | 3.8701 (0.0490) | 3.8027 (0.0872) | 3.8757 (0.0332) | 3.0971 (0.0220) | 3.8228 (0.0207)
Lymphography | 4.0305 (0.0218) | 3.5469 (0.0523) | 3.6485 (0.0428) | 3.6406 (0.0407) | 3.6802 (0.0346) | 2.9251 (0.0326) | 3.6211 (0.0297)
M-of-n | 5.0924 (0.2566) | 5.0928 (0.1319) | 5.7137 (0.3158) | 5.5713 (1.1317) | 5.7925 (0.0344) | 5.1035 (0.1202) | 5.7482 (0.0319)
PenglungEW | 8.9222 (0.0981) | 3.8012 (0.0414) | 3.9324 (0.0325) | 4.0957 (0.0361) | 3.9515 (0.0347) | 3.1252 (0.0319) | 3.9185 (0.0283)
SonarEW | 8.7573 (0.0130) | 3.6515 (0.0517) | 3.6813 (0.0403) | 3.7135 (0.0388) | 3.7133 (0.0328) | 2.9429 (0.0202) | 3.6493 (0.0284)
SpectEW | 3.8668 (0.0194) | 3.6496 (0.0410) | 3.7769 (0.0603) | 3.6278 (0.0518) | 3.7698 (0.0345) | 3.0199 (0.0640) | 3.7236 (0.0489)
CongressEW | 4.9958 (0.0801) | 3.7073 (0.1621) | 4.0702 (0.1368) | 4.0139 (0.1735) | 4.1382 (0.0378) | 3.2910 (0.0463) | 4.0713 (0.0343)
IonosphereEW | 4.9477 (0.0349) | 3.7844 (0.1256) | 3.8080 (0.2128) | 3.7675 (0.1250) | 3.8042 (0.0316) | 2.9807 (0.0572) | 3.7562 (0.0270)
KrvskpEW | 13.3236 (0.2122) | 16.9805 (2.7574) | 8.6549 (0.4485) | 10.2884 (0.9030) | 8.6481 (0.1821) | 7.0108 (0.5941) | 8.5729 (0.1403)
Vote | 3.4026 (0.0360) | 3.4110 (0.1210) | 3.6087 (0.1058) | 3.5897 (0.0965) | 3.6620 (0.0378) | 3.8761 (0.0524) | 3.5907 (0.0341)
WaveformEW | 20.3592 (0.5800) | 36.2139 (9.3794) | 18.8454 (1.5606) | 23.8825 (2.6485) | 18.3341 (0.4581) | 15.0997 (1.5280) | 18.3388 (0.5309)
WineEW | 2.6622 (0.1533) | 3.1859 (0.1481) | 3.4195 (0.0485) | 3.4294 (0.0395) | 3.4390 (0.0258) | 2.7383 (0.0277) | 3.3916 (0.0276)
Zoo | 3.4129 (0.0845) | 3.4212 (0.0563) | 3.4761 (0.0464) | 3.5188 (0.0569) | 3.4999 (0.0386) | 3.7786 (0.0267) | 3.4567 (0.0391)
BreastEW | 5.2273 (0.0633) | 4.1948 (0.1808) | 3.8757 (0.2941) | 3.7588 (0.1129) | 3.9326 (0.0340) | 3.1492 (0.1048) | 3.8247 (0.0279)
Brain_Tumor2 | 9.9118 (0.0679) | 5.2046 (0.0715) | 6.9822 (0.0954) | 13.2539 (0.3628) | 7.9923 (0.0645) | 5.4345 (0.1127) | 7.5435 (0.0742)
9_Tumors | 8.7458 (0.0965) | 4.9592 (0.0865) | 5.8747 (0.1431) | 9.3682 (0.5131) | 6.3548 (0.0864) | 4.9919 (0.1852) | 6.1487 (0.0517)
Leukemia2 | 10.3236 (0.0894) | 5.8585 (0.0850) | 8.9088 (0.1204) | 14.7234 (0.5314) | 10.0450 (0.1089) | 6.9539 (0.1723) | 9.5933 (0.0911)
Prostrate Tumors | 11.8059 (0.2989) | 5.9487 (0.2993) | 11.1463 (0.1576) | 14.8553 (1.1016) | 12.2234 (0.0979) | 8.6859 (0.3981) | 11.7038 (0.0871)
Ranking (W|T|L): ISSAFD 7|0|13, SCA 4|0|16, SSA 0|0|20, GWO 0|0|20, ALO 0|0|20, GA 9|0|11, PSO 0|0|20.

Table 16: p-values of the Wilcoxon test for the classification accuracy results of ISSAFD versus the other algorithms.

ISSAFD vs. | SCA | GWO | ALO | GA | PSO | SSA
Exactly | 1.35E-02 | 5.08E-09 | 6.61E-04 | 4.18E-08 | 6.61E-04 | 2.39E-01
Exactly2 | 1.08E-12 | 1.20E-12 | 9.91E-13 | 1.20E-12 | 8.31E-13 | 1.16E-12
HeartEW | 3.86E-09 | 2.08E-11 | 4.77E-11 | 1.63E-11 | 1.14E-11 | 1.59E-11
Lymphography | 1.80E-11 | 1.19E-10 | 2.23E-11 | 2.25E-11 | 1.80E-11 | 2.12E-11
M-of-n | 7.45E-01 | 5.91E-06 | 1.37E-03 | 2.53E-06 | 1.37E-03 | 9.45E-02
PenglungEW | 9.53E-13 | 9.11E-13 | 7.58E-13 | 7.88E-13 | 9.23E-13 | 9.78E-13
SonarEW | 3.73E-12 | 3.66E-12 | 3.61E-12 | 3.75E-12 | 3.40E-12 | 3.57E-12
SpectEW | 1.28E-11 | 1.23E-11 | 1.21E-11 | 1.16E-11 | 1.28E-11 | 1.28E-11
CongressEW | 4.49E-13 | 1.07E-12 | 1.03E-12 | 1.06E-12 | 8.19E-13 | 1.04E-12
IonosphereEW | 4.96E-11 | 8.54E-12 | 8.32E-12 | 8.13E-12 | 8.30E-12 | 1.12E-11
KrvskpEW | 7.65E-11 | 3.93E-11 | 1.35E-05 | 2.76E-10 | 4.18E-07 | 6.32E-10
Vote | 7.57E-12 | 2.86E-10 | 2.58E-11 | 4.35E-11 | 7.56E-11 | 6.75E-10
WaveformEW | 3.13E-10 | 6.31E-11 | 4.29E-04 | 1.52E-09 | 3.55E-05 | 2.09E-07
WineEW | 6.55E-04 | 1.25E-05 | 2.14E-02 | 1.38E-08 | 4.19E-02 | 8.22E-07
Zoo | 5.56E-03 | 1.24E-07 | 8.15E-02 | 1.14E-07 | 4.18E-02 | 1.32E-03
BreastEW | 2.40E-08 | 1.64E-11 | 2.26E-10 | 2.17E-11 | 2.02E-11 | 1.75E-11
Brain_Tumor2 | 4.45E-12 | 7.21E-13 | 7.20E-13 | 5.20E-13 | 4.53E-13 | 3.17E-13
9_Tumors | 6.08E-10 | 4.14E-12 | 1.15E-12 | 1.17E-12 | 1.13E-12 | 1.16E-12
Leukemia2 | 4.99E-09 | 7.77E-13 | 5.68E-13 | 7.11E-13 | 7.67E-13 | 8.22E-13
Prostrate Tumors | 9.03E-11 | 9.34E-12 | 8.86E-12 | 8.78E-12 | 8.39E-12 | 8.90E-12

5.7. Comparison with the state-of-the-art FS methods

The performance of the proposed approach in classifying the twenty benchmark datasets is compared with thirteen state-of-the-art methods, namely the binary dragonfly algorithm (BDA) (Mafarja et al., 2018), the hybrid whale optimization algorithm with simulated annealing using a tangent transfer function (WOASAT) (Mafarja & Mirjalili, 2017), the binary salp swarm algorithm (BSSA) (Faris et al., 2018), binary grey wolf optimization Approach 2 (bGWO2) (Emary et al., 2016b), binary grey wolf optimization Approach 1 (bGWO1) (Emary et al., 2016b), SSAPSO (Ibrahim et al., 2019), ISCA (Sindhu et al., 2017), the improved salp swarm algorithm (ISSA) (Hegazy et al., 2018), the grasshopper optimization algorithm for feature selection (GOFS) (Zakeri & Hokmabadi, 2019), the chaotic salp swarm algorithm (CSSA) (Sayed et al., 2018), the sigmoid binary butterfly optimization algorithm (S-bBOA) (Arora & Anand, 2019), the multi-strategy ensemble grey wolf optimizer (MEGWO) (Tu et al., 2019), and the binary grasshopper optimization algorithm based on mutation (BGOA-M) (Mafarja et al., 2019). Not all of the previous studies cover the same datasets; therefore, missing results are marked with "-".

• In terms of accuracy

Table 17 reports the classification accuracy of ISSAFD on all datasets along with the different methods. From this table we can conclude that, for the small datasets (i.e., numbers 1 to 16 in Table 17), ISSAFD outperformed all methods on 6 out of 16 datasets, which equals about 38% of them. ISSAFD and BDA both achieved 100% accuracy on two datasets, namely PenglungEW and WineEW. In addition, ISSAFD obtained 100% accuracy on the Zoo dataset, equal with BDA and BSSA. BDA comes second because it obtained the best accuracy on 3 out of 16 datasets and achieved 100% accuracy on the M-of-n and Exactly datasets, equal with the third-ranked method (i.e., WOASAT (Mafarja & Mirjalili, 2017)). BGOA-M is ranked fourth with 3 datasets; it behaves similarly to WOASAT, although in general WOASAT obtains better results than BGOA-M. The BSSA method is ranked fifth, followed by bGWO2. Although the results of the remaining methods are good, they are lower than those of the other methods on all matching datasets. For the large datasets, ISSAFD outperformed all the other methods that report results on them, namely BDA, BSSA, ISCA, and GOFS. These results indicate that the proposed approach achieves more promising results than the other methods on both small and large datasets.


Table 17: Comparison of ISSAFD versus the state of the art in terms of classification accuracy. Dataset order: Exactly, Exactly2, HeartEW, Lymphography, M-of-n, PenglungEW, SonarEW, SpectEW, CongressEW, IonosphereEW, KrvskpEW, Vote, WaveformEW, WineEW, Zoo, BreastEW, Brain_Tumors2, 9_Tumor, Leukemia2, Prostrate Tumors. Each method's accuracies are listed in dataset order for the datasets it reports; the final entry of each row is the Ranking (W|T|L).

ISSAFD: 0.980, 0.810, 0.906, 0.972, 0.988, 1.000, 0.997, 0.939, 1.000, 0.985, 0.974, 0.981, 0.764, 1.000, 1.000, 0.985, 1.000, 1.000, 1.000, 0.986 — 13|3|7
BDA: 1.000, 0.773, 0.876, 0.992, 1.000, 1.000, 0.980, 0.850, 0.987, 0.991, 0.979, 0.989, 0.758, 1.000, 1.000, 0.979, 0.710, 0.561 — 8|5|10
WOASAT: 1.000, 0.750, 0.850, 0.890, 1.000, 0.940, 0.970, 0.880, 0.980, 0.960, 0.980, 0.970, 0.760, 0.990, 0.970, 0.980 — 3|3|13
BSSA: 0.980, 0.758, 0.860, 0.890, 0.991, 0.877, 0.937, 0.836, 0.963, 0.918, 0.964, 0.951, 0.733, 0.993, 1.000, 0.948, 0.989 — 1|1|15
bGWO2: 0.776, 0.750, 0.776, 0.700, 0.963, 0.584, 0.729, 0.822, 0.938, 0.834, 0.956, 0.920, 0.789, 0.920, 0.879, 0.935 — 1|0|15
bGWO1: 0.708, 0.745, 0.776, 0.744, 0.908, 0.600, 0.731, 0.820, 0.935, 0.807, 0.944, 0.912, 0.786, 0.930, 0.879, 0.924 — 0|0|16
SSAPSO: 0.823, 0.847, 0.944, 0.939, 0.785 — 0|0|5
ISCA: 0.842, 0.837, 0.913, 0.987, 0.966 — 0|0|5
ISSA: 0.734, 0.853, 0.766, 0.978, 0.872, 0.957 — 0|0|6
GOFS: 0.881, 0.986, 0.942, 0.949, 0.987, 0.991 — 0|0|6
CSSA: 0.989, 0.928, 0.846, 0.766, 0.882 — 0|0|5
S-bBOA: 0.972, 0.760, 0.824, 0.868, 0.972, 0.878, 0.936, 0.846, 0.959, 0.907, 0.966, 0.965, 0.743, 0.984, 0.978, 0.971 — 0|0|16
MEGWO: 0.853, 0.948, 0.959, 0.825, 0.788, 0.984, 0.992 — 0|0|7
BGOA-M: 1.000, 0.736, 0.837, 0.919, 1.000, 0.973, 0.933, 0.843, 0.977, 0.966, 0.980, 0.967, 0.760, 0.989, 0.961, 0.970 — 3|3|13


• In terms of the selected number of features

Table 18 shows a comparison between the proposed approach and ten previous studies. As can be seen in this table, ISSAFD obtained the smallest number of features on 8 out of 20 datasets, whereas BGOA-M came second with 6 datasets. Although BGOA-M showed a good ability to reduce the number of features, its accuracy is, in general, lower than that of ISSAFD. BDA is ranked third, followed by bGWO2 and WOASAT, respectively. The remaining studies show similar results on most datasets, except for bGWO1, which produced the worst feature counts on all datasets. Overall, ISSAFD yields small feature subsets together with good accuracy on most datasets.

From all the previous results, it can be noticed that the performance of ISSAFD is better than that of the other algorithms in most comparisons. The superiority of ISSAFD can be attributed to two factors. The first is the use of SCA to enhance the behavior of the SSA followers, which helps maintain the population and improves its ability to search for the best solution. The second is the DO operator, which helps avoid getting trapped in local optima, improves the exploration ability, and guarantees the diversity of the population. These factors clearly improve the behavior of SSA, and the advantages of both SCA and Dop are transferred to SSA. Moreover, the simplicity, the few predefined parameters, and the low computational requirements of SSA help produce stable, good results after combining it with the SCA algorithm and the Dop operator. In addition, the DO has a large effect on the performance of the proposed method: ISSAFD outperformed ISSAF on 70% of the datasets in terms of accuracy. Moreover, ISSAFD achieved the highest accuracy on 65% of all datasets when compared with the state of the art, followed by BDA with 40%. In general, the proposed ISSAFD is easy to implement and shows good performance; however, it still needs further enhancement, especially regarding time complexity. This could be achieved by applying the DO to only a small part of the population, or based on a specific condition, as sketched below.
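The following sketch illustrates the idea of restricting a disruption-style perturbation to a fraction of the population in order to cut the extra runtime of a full pass. The perturbation itself is a simplified stand-in (uniform rescaling in [-0.5, 0.5], loosely following Liu et al., 2014), not the paper's exact Dop, and all names are our own:

```python
import numpy as np

def disrupt_subset(pop, frac=0.2, rng=None):
    """Apply a simplified disruption-style perturbation to a random
    fraction of the population instead of to every solution."""
    rng = rng or np.random.default_rng()
    n = len(pop)
    k = max(1, int(frac * n))
    idx = rng.choice(n, size=k, replace=False)
    # Simplified stand-in for Dop: rescale selected solutions by U(-0.5, 0.5).
    pop[idx] *= rng.uniform(-0.5, 0.5, size=(k, pop.shape[1]))
    return pop

pop = np.random.rand(30, 10)         # 30 salps, 10 dimensions
pop = disrupt_subset(pop, frac=0.2)  # disrupt only 6 of the 30 solutions
```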


Table 18: Comparison of ISSAFD versus the state of the art in terms of the number of selected features. Dataset order as in Table 17; each method's values are listed in dataset order for the datasets it reports; the final entry of each row is the Ranking (W|T|L).

ISSAFD: 5.3, 1.0, 6.2, 7.8, 7.4, 118.2, 23.8, 8.7, 3.2, 13.5, 21.7, 4.2, 12.3, 2.1, 4.3, 5.4, 4913.5, 2681.5, 5323.0, 5085.9 — 9|2|11
BGOA-M: 6.0, 7.0, 5.0, 7.0, 6.0, 36.0, 16.0, 7.0, 3.0, 7.0, 11.0, 3.0, 14.0, 3.0, 5.0, 10.0 — 5|2|11
S-bBOA: 7.6, 4.8, 5.8, 8.4, 6.8, 172.0, 32.8, 10.8, 6.4, 16.2, 17.6, 5.2, 25.0, 6.2, 5.2, 16.8 — 0|0|16
bGWO1: 8.6, 8.4, 9.2, 9.6, 10.6, 160.6, 37.2, 12.8, 7.0, 19.6, 26.6, 8.6, 30.0, 8.8, 10.6, 21.0 — 0|0|16
BSSA: 7.4, 2.0, 8.0, 10.2, 7.5, 172.7, 32.4, 13.3, 5.4, 17.3, 21.9, 7.1, 23.3, 8.0, 7.6, 16.0 — 0|0|16
bGWO2: 5.3, 4.4, 6.2, 8.8, 6.0, 124.5, 16.0, 8.7, 6.6, 9.2, 12.8, 4.8, 14.6, 6.6, 7.4, 15.3 — 2|2|14
WOASAT: 6.0, 1.0, 6.2, 6.8, 6.0, 138.0, 26.60, 9.60, 4.40, 11.4, 19.4, 3.8, 21.6, 6.8, 5.8, 13.6 — 3|2|13
CSSA: 13.2, 12.6, 19.0, 7.5 — 1|0|3
ISSA: 6.2, 18.9, 11.1, 5.4, 19.8, 4.5, 12.9 — 0|0|7
MEGWO: 4.0, 25.6, 10.6, 15.2, 16.0, 4.0, 5.4 — 1|0|6
BDA: 6.0, 7.1, 5.8, 8.2, 6.1, 121.2, 25.6, 6.8, 5.5, 11.5, 20.7, 3.4, 23.0, 3.6, 4.4, 11.5, 5121.4, 2758.0, 5287.0 — 2|0|17
QCSI-FS: 6.0, 10.0, 7.0, 8.0, 5.0 — 0|0|5


We can observe that ISSAFD provides high accuracy; however, the number of selected features remains large for 60% of the datasets in our study. Moreover, it is important to emphasize that ISSAFD requires more computational time than some other algorithms such as GA. Also, when handling high-dimensional datasets with a large number of attributes, ISSAFD needs more time due to the integration of the Disrupt operator. Another limitation is the lack of repeatability of the results: the final subset of features changes from one execution to another because the exploration/exploitation process is based on random rules, and this can be confusing for the user. In addition, FS based on ISSAFD integrates a KNN classifier, a simple classifier chosen to accelerate execution, whereas the machine learning community often prefers more robust classifiers such as SVM, RF, or MLP, which provide better performance. Applying the disrupt operator to the whole population diversifies the solutions but increases the computational time; we will address all of these points in future work.
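For concreteness, a wrapper fitness of the kind evaluated in Table 14 typically weighs the KNN classification error against the fraction of selected features. The sketch below assumes the common form fitness = alpha * error + (1 - alpha) * |S|/|D| with alpha = 0.99; the weight, the cross-validation setup, and k = 5 are our assumptions for illustration, not values quoted from the paper:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.99):
    """Smaller is better: weighted KNN error plus selected-feature ratio."""
    if mask.sum() == 0:                 # an empty subset is invalid
        return 1.0
    Xs = X[:, mask.astype(bool)]
    knn = KNeighborsClassifier(n_neighbors=5)
    acc = cross_val_score(knn, Xs, y, cv=5).mean()
    ratio = mask.sum() / mask.size
    return alpha * (1.0 - acc) + (1.0 - alpha) * ratio

X = np.random.rand(100, 20)
y = np.random.randint(0, 2, 100)
mask = np.random.rand(20) > 0.5         # a candidate binary feature subset
print(fitness(mask, X, y))
```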


6. Conclusion and future works


This paper introduced an alternative feature selection approach based on improving the performance of the salp swarm algorithm (SSA), which emulates the behavior of salps. The modified version of SSA is called ISSAFD, since it depends on the sine cosine algorithm (SCA) to enhance the behavior of the followers. In addition, the disruption operator (DO) is used to improve the exploration ability and ensure the diversity of the population. To assess the quality of the proposed ISSAFD, a series of experiments was performed on twenty datasets, four of which are high-dimensional with a small number of instances. The performance of ISSAFD was compared with numerous methods, including SSA, SCA, the binary grey wolf optimizer (bGWO), particle swarm optimization (PSO), ant lion optimization (ALO), and the genetic algorithm (GA), as well as with the state of the art. The obtained results showed that ISSAFD performs better than the other FS methods in terms of the performance measures, which include accuracy, sensitivity, specificity, and the number of selected features. Based on the encouraging results of the proposed ISSAFD, in the future it can be applied in different domains such as image segmentation, task scheduling, and cloud computing. Moreover, the FS problem can be formulated as a multi-objective optimization problem (MOP) based on ISSAFD.


The process aims to find a compromise between two objectives: maximizing the accuracy and reducing the number of features. The advantage of a multi-objective approach is that it generates a set of trade-off solutions rather than a single one, which makes it more effective at determining the best subset.
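A minimal sketch of the dominance test that such a multi-objective formulation relies on is shown below; both objectives are minimized (classification error and number of features), and the candidate values are our own illustration:

```python
def dominates(a, b):
    """a and b are (error, n_features) tuples; both objectives minimized.
    a dominates b if it is no worse on both and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

candidates = [(0.02, 5), (0.02, 8), (0.05, 3)]
# Keep the non-dominated (Pareto) set instead of a single solution.
pareto = [c for c in candidates
          if not any(dominates(o, c) for o in candidates if o != c)]
print(pareto)  # [(0.02, 5), (0.05, 3)]
```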


Acknowledgments


This work is supported by the China Postdoctoral Science Foundation under Grant No. 2019M652647.

References


Ahmed, S., Mafarja, M., Faris, H., & Aljarah, I. (2018). Feature selection using salp swarm algorithm with chaos. In Proceedings of the 2nd International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence (pp. 65–69).

Aljarah, I., Mafarja, M., Heidari, A. A., Faris, H., Zhang, Y., & Mirjalili, S. (2018). Asynchronous accelerating multi-leader salp chains for feature selection. Applied Soft Computing, 71, 964–979.

Anderson, P. A., & Bone, Q. (1980). Communication between individuals in salp chains. II. Physiology. Proc. R. Soc. Lond. B, 210, 559–574.

Arora, S., & Anand, P. (2019). Binary butterfly optimization approaches for feature selection. Expert Systems with Applications, 116, 147–160.

Baliarsingh, S. K., Vipsita, S., Muhammad, K., Dash, B., & Bakshi, S. (2019). Analysis of high-dimensional genomic data employing a novel bio-inspired algorithm. Applied Soft Computing, 77, 520–532.

Chen, Y., Li, L., Xiao, J., Yang, Y., Liang, J., & Li, T. (2018). Particle swarm optimizer with crossover operation. Engineering Applications of Artificial Intelligence, 70, 159–169.

Dong, H., Li, T., Ding, R., & Sun, J. (2018). A novel hybrid genetic algorithm with granular information for feature selection and optimization. Applied Soft Computing, 65, 33–46.


Dorigo, M., Maniezzo, V., & Colorni, A. (1996). Ant system: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 26, 29–41.

Eberhart, R., & Kennedy, J. (1995). A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science (MHS'95) (pp. 39–43). IEEE.

Elaziz, M. A., Ewees, A. A., Ibrahim, R. A., & Lu, S. (2019). Opposition-based moth-flame optimization improved by differential evolution for feature selection. Mathematics and Computers in Simulation.

Elaziz, M. A., Oliva, D., & Xiong, S. (2017a). An improved opposition-based sine cosine algorithm for global optimization. Expert Systems with Applications, 90, 484–500.

Elaziz, M. E. A., Ewees, A. A., Oliva, D., Duan, P., & Xiong, S. (2017b). A hybrid method of sine cosine algorithm and differential evolution for feature selection. In International Conference on Neural Information Processing (pp. 145–155). Springer.

Emary, E., Zawbaa, H. M., & Grosan, C. (2018). Experienced gray wolf optimization through reinforcement learning and neural networks. IEEE Transactions on Neural Networks and Learning Systems, 29, 681–694.

Emary, E., Zawbaa, H. M., & Hassanien, A. E. (2016a). Binary ant lion approaches for feature selection. Neurocomputing, 213, 54–65.

Emary, E., Zawbaa, H. M., & Hassanien, A. E. (2016b). Binary grey wolf optimization approaches for feature selection. Neurocomputing, 172, 371–381.

Faris, H., Mafarja, M. M., Heidari, A. A., Aljarah, I., Ala'M, A.-Z., Mirjalili, S., & Fujita, H. (2018). An efficient binary salp swarm algorithm with crossover scheme for feature selection problems. Knowledge-Based Systems, 154, 43–67.

Frank, A. (2010). UCI machine learning repository. http://archive.ics.uci.edu/ml.


Ghimatgar, H., Kazemi, K., Helfroush, M. S., & Aarabi, A. (2018). An improved feature selection algorithm based on graph clustering and ant colony optimization. Knowledge-Based Systems, 159, 270–285.

Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.

Hafez, A. I., Zawbaa, H. M., Emary, E., & Hassanien, A. E. (2016). Sine cosine optimization algorithm for feature selection. In 2016 INISTA (pp. 1–5). IEEE.

Hancer, E., Xue, B., & Zhang, M. (2018). Differential evolution for filter feature selection based on information theory and feature ranking. Knowledge-Based Systems, 140, 103–119.

Harwit, M. (2006). Astrophysical Concepts. Springer Science & Business Media.

Hegazy, A. E., Makhlouf, M., & El-Tawel, G. S. (2018). Improved salp swarm algorithm for feature selection. Journal of King Saud University - Computer and Information Sciences.

Ibrahim, R. A., Ewees, A. A., Oliva, D., Elaziz, M. A., & Lu, S. (2019). Improved salp swarm algorithm based on particle swarm optimization for feature selection. Journal of Ambient Intelligence and Humanized Computing, 10, 3155–3169.

Ibrahim, R. A., Oliva, D., Ewees, A. A., & Lu, S. (2017). Feature selection based on improved runner-root algorithm using chaotic singer map and opposition-based learning. In International Conference on Neural Information Processing (pp. 156–166). Springer.

Khamees, M., Albakr, A. Y., & Shakher, K. (2018). A new approach for features selection based on binary slap swarm algorithm. Journal of Theoretical & Applied Information Technology, 96.

Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97, 273–324.


Lensen, A., Xue, B., & Zhang, M. (2018). Automatically evolving difficult benchmark feature selection datasets with genetic programming. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 458–465). ACM.

Liu, H., Ding, G., & Wang, B. (2014). Bare-bones particle swarm optimization with disruption operator. Applied Mathematics and Computation, 238, 106–122.

Liu, H., & Motoda, H. (2012). Feature Selection for Knowledge Discovery and Data Mining (Vol. 454). Springer Science & Business Media.

Mafarja, M., Aljarah, I., Faris, H., Hammouri, A. I., Ala'M, A.-Z., & Mirjalili, S. (2019). Binary grasshopper optimisation algorithm approaches for feature selection problems. Expert Systems with Applications, 117, 267–286.

Mafarja, M., Aljarah, I., Heidari, A. A., Faris, H., Fournier-Viger, P., Li, X., & Mirjalili, S. (2018). Binary dragonfly optimization for feature selection using time-varying transfer functions. Knowledge-Based Systems, 161, 185–204.

Mafarja, M., Jarrar, R., Ahmad, S., & Abusnaina, A. A. (). Feature selection using binary particle swarm optimization with time varying inertia weight strategies.

Mafarja, M., & Mirjalili, S. (2018). Whale optimization approaches for wrapper feature selection. Applied Soft Computing, 62, 441–453.

Mafarja, M., & Sabar, N. R. (). Rank based binary particle swarm optimisation for feature selection in classification.

Mafarja, M. M., & Mirjalili, S. (2017). Hybrid whale optimization algorithm with simulated annealing for feature selection. Neurocomputing, 260, 302–312.

Mirjalili, S. (2015). The ant lion optimizer. Advances in Engineering Software, 83, 80–98.

Mirjalili, S. (2016). SCA: A sine cosine algorithm for solving optimization problems. Knowledge-Based Systems, 96, 120–133.


Mirjalili, S., Gandomi, A. H., Mirjalili, S. Z., Saremi, S., Faris, H., & Mirjalili, S. M. (2017). Salp swarm algorithm: A bio-inspired optimizer for engineering design problems. Advances in Engineering Software, 114, 163–191.

Moayedikia, A., Ong, K.-L., Boo, Y. L., Yeoh, W. G., & Jensen, R. (2017). Feature selection for high dimensional imbalanced class data using harmony search. Engineering Applications of Artificial Intelligence, 57, 38–49.

Rajamohana, S., & Umamaheswari, K. (2018). Hybrid approach of improved binary particle swarm optimization and shuffled frog leaping for feature selection. Computers & Electrical Engineering, 67, 497–508.

Sayed, G. I., Khoriba, G., & Haggag, M. H. (2018). A novel chaotic salp swarm algorithm for global optimization and feature selection. Applied Intelligence, 1–20.

Shunmugapriya, P., & Kanmani, S. (2017). A hybrid algorithm using ant and bee colony optimization for feature selection and classification (AC-ABC hybrid). Swarm and Evolutionary Computation, 36, 27–36.

Silva, M. A. L., de Souza, S. R., Souza, M. J. F., & de Franca Filho, M. F. (2018). Hybrid metaheuristics and multi-agent systems for solving optimization problems: A review of frameworks and a comparative analysis. Applied Soft Computing, 71, 433–459.

Sindhu, R., Ngadiran, R., Yacob, Y. M., Zahri, & Hanin, N. A. (2017). Sine–cosine algorithm for feature selection with elitism strategy and new updating mechanism. Neural Computing and Applications, 28, 2947–2958.

Talbi, E.-G. (2009). Metaheuristics: From Design to Implementation (Vol. 74). John Wiley & Sons.

Tawhid, M. A., & Dsouza, K. B. (2018). Hybrid binary bat enhanced particle swarm optimization algorithm for solving feature selection problems. Applied Computing and Informatics.

Tu, Q., Chen, X., & Liu, X. (2019). Multi-strategy ensemble grey wolf optimizer and its application to feature selection. Applied Soft Computing, 76, 16–30.


Yang, X.-S. (2013). Metaheuristic optimization: Nature-inspired algorithms and applications. In Artificial Intelligence, Evolutionary Computing and Metaheuristics (pp. 405–420). Springer.

Zakeri, A., & Hokmabadi, A. (2019). Efficient feature selection method using real-valued grasshopper optimization algorithm. Expert Systems with Applications, 119, 61–72.

Zhang, L., Mistry, K., Lim, C. P., & Neoh, S. C. (2018). Feature selection using firefly optimization for classification and regression models. Decision Support Systems, 106, 64–85.


Credit Author Statement

Author Contributions: All authors contributed equally to this work. Nabil Neggaz proposed the idea of solving the feature selection problem; he developed the code of the objective function, collected the datasets, and wrote the background. Majdi Mafarja wrote the introduction and related work. Mohamed Abd Elaziz developed part of the experiments and described the proposed approach and part of the implementation. Ahmed A. Ewees wrote the analysis of the results.

Declaration of Competing Interest

The authors declare no competing financial interests.
