
A Subregion Division Based Multi-Objective Evolutionary Algorithm for SVM Training Set Selection

Fan Cheng a,b, Jiabin Chen b, Jianfeng Qiu a,b, Lei Zhang a,b,∗

a Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, China
b School of Computer Science and Technology, Anhui University, China
∗ Corresponding author.

Abstract

Support vector machine (SVM) is a popular machine learning method with a solid theoretical foundation, and has shown promising performance on different classification problems. However, it suffers from an expensive training cost, which makes it less suitable for applications with a large-scale training set. To this end, as a data pre-processing technique, training set selection (TSS) for SVM has received much attention recently, since it can reduce the size of the SVM training set without degrading performance. In this paper, a subregion division based multi-objective evolutionary algorithm termed SDMOEA-TSS is proposed for SVM training set selection, in which the objective space is divided into several subregions for effectively searching for good solutions. Specifically, in SDMOEA-TSS, a division-based initialization strategy is first suggested to initialize the population so that it is located in different regions of the objective space. Then a subregion-based evolutionary (including crossover, mutation and update) strategy is developed, which not only makes full use of the individuals in each subregion for local search but also maintains the whole population's global search ability. Empirical studies on 21 public data sets demonstrate the superiority of the proposed algorithm over the state of the art in terms of both quality and diversity of the selected SVM training subset.

Keywords: Training Set Selection, Multi-objective Optimization, Evolutionary Algorithm, SVM, Instance Selection

1. Introduction

SVM (Support Vector Machine), as a popular and powerful supervised classifier in machine learning, has been successfully used in a wide variety of applications, ranging from pattern mining [1] and computer vision [2] to medical diagnosis [3] and information retrieval [4]. Despite its strong theoretical foundations and good generalization performance, SVM also has some disadvantages, one of which is that training an SVM requires solving a constrained quadratic programming optimization problem, whose computational complexity is O(n²) or even O(n³) [5] (n is the number of instances in the training set). This issue is especially challenging nowadays, since in many real applications of SVM the number of training instances is very large [6, 7]. To tackle this disadvantage, different techniques have been suggested.

Among them, as a data pre-processing technique, training set selection (TSS) has attracted much attention, as it can not only decrease the number of training instances but also keep (or even improve) the performance of SVM [8]. Essentially, TSS for SVM is a kind of instance selection problem, whose goal is to select only the relevant instances in the training set before performing the SVM training task [9]. Due to its importance, plenty of TSS algorithms with different optimization techniques have been developed in the last decade [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]. Among them, evolutionary algorithm (EA) based TSS methods have received much attention, as they do not make any assumptions about training set properties, and they yield more effective solutions than non-evolutionary methods [10]. For example, Kawulok et al. proposed three different genetic algorithm (GA) based instance selection methods for SVM, and experimental results justified their superiority to classical non-EA-based TSS algorithms [15, 16, 17]. Nalepa et al. suggested several memetic algorithm (MA) based methods to perform instance selection for SVM; empirical studies on public classification data sets have demonstrated the effectiveness of the suggested algorithms [18, 19]. More EA-based TSS methods for SVM can be found in [9].

Although the EA-based training set selection methods mentioned above have shown their competitiveness in obtaining high-quality instance subsets, most of them utilize single-objective optimization techniques. In fact, as pointed out in [9], TSS for SVM is a combinatorial optimization problem characterized by two aspects: on one hand, the accuracy of the SVM trained on the selected training subset should be maximized; on the other hand, the number of instances in the selected training subset should be minimized. To this end, single-objective EAs often need to introduce a trade-off parameter to balance the accuracy and the size of the selected instance subset. Nevertheless, setting a suitable value for this trade-off parameter is itself a difficult problem, especially when no prior knowledge is available in real applications.

A natural approach to tackle this problem is to develop a multi-objective evolutionary algorithm (MOEA) for SVM training set selection. Somewhat surprisingly, although there are many works on designing MOEAs related to SVM optimization [20, 21, 22], only a few works focus on developing MOEAs for TSS for SVM. For example, in [23], Pighetti et al. applied the NSGA-II algorithm to produce good training subsets, which were used to train an SVM. In [24], Rosales-Perez et al. proposed a MOEA/D-based TSS algorithm, which can simultaneously perform instance selection and hyper-parameter selection for SVM. Lately, in [25], Acampora et al. suggested a multi-objective training set selection algorithm under the framework of PESA-II; with the suggested algorithm, the number of selected training instances was greatly reduced and the performance of SVM was further improved. The empirical results of these three algorithms have justified the superiority of MOEAs over single-objective EAs and non-EA methods for solving TSS for SVM. In this paper, we continue this research line by proposing a novel subregion division based MOEA, termed SDMOEA-TSS, for SVM training set selection, with which the quality and diversity of the selected instance subset can be further improved. Specifically, the main contributions of this paper are summarized as follows.
• A subregion division based multi-objective evolutionary algorithm, named SDMOEA-TSS, is proposed for SVM training set selection, where the objective space is divided into several subregions for searching for better solutions. With the proposed subregion division search strategy, SDMOEA-TSS is capable of obtaining training subsets with both high quality and good diversity.

• In SDMOEA-TSS, a division-based initialization scheme is first suggested, which divides the objective space into different subregions and initializes the population in each subregion effectively. Then a subregion-based evolutionary strategy is designed, which consists of subregion-based crossover, mutation and update operations. With this strategy, SDMOEA-TSS can utilize the individuals in the subregions for local search as well as the whole population for global search.

• The effectiveness of the proposed SDMOEA-TSS is verified by comparing it with several state-of-the-art methods on 21 SVM classification data sets with different characteristics. Experimental results demonstrate the superiority of the proposed method over the comparison methods in terms of both quality and diversity of the solutions.

The remainder of the paper is organized as follows. In Section 2, related work on training set selection for SVM is presented. Section 3 gives the details of the proposed algorithm, and empirical results comparing our algorithm with several state-of-the-art methods are reported in Section 4. Section 5 concludes the paper and discusses future work.

2. Related Work

In this section, we review the related work on training set selection (TSS) for SVM. To be specific, we first discuss preliminaries about instance selection (IS) and the related work on IS, since training set selection is a specific form of instance selection¹. Then we focus on the work on TSS for SVM. Finally, we review in detail multi-objective evolutionary algorithms (MOEAs) and MOEAs for SVM training set selection, which are closely related to our work.

¹ It should be noted that in some papers, instance selection (IS), training set selection (TSS) and prototype selection (PS) are viewed as similar problems, and the terms are used interchangeably. However, in this paper, we adopt the distinction in [26, 27], where TSS and PS are two specific kinds of IS.

2.1. Preliminaries about Instance Selection and Its Related Work

Instance selection (IS), as an important data reduction technique in data mining, has been widely used in many applications [28, 29, 30]. The goal of IS is to obtain a subset of the original instance set with the same (in certain cases even higher) predictive capabilities as (or than) the original set [31]. Formally, the task of instance selection can be described as follows. Given an instance data set D = {(x1, y1), ..., (xn, yn)}, where xi ∈ R^d denotes the i-th instance and yi ∈ {c1, ..., cm} is the corresponding class label (m is the number of classes), the problem of instance selection is to find an instance subset S ⊆ D of size n′ (n′ < n) which has the same (or higher) predictive capabilities as (or than) the original set.

Due to its importance, a number of effective algorithms have recently been proposed for instance selection, which can be roughly divided into the following two categories [27]: prototype selection (PS) and training set selection (TSS). Before we further discuss these two categories, it should be noted that there also exists other interesting work that can be applied to solve the instance selection problem effectively, e.g., the Pareto optimization for subset selection algorithm (POSS) [32] and the evolutionary algorithm for large-scale sparse multi-objective optimization problems (SparseEA) [33].

The first group of instance selection algorithms is prototype selection, whose aim is to obtain an instance subset that allows the KNN classifier to achieve the maximum classification rate [11]. Representative algorithms of this group include Edited Nearest Neighbor (ENN) [34], Reduced Nearest Neighbor (RNN) [34], the Decremental Reduction Optimization Procedure (DROP) [35],
Multi-Class Instance Selection (MCIS) [36], the Generational Genetic Algorithm (GGA) [37], the Steady-State Genetic Algorithm (SSGA) [38] and so on. These prototype selection algorithms have shown their effectiveness in selecting high-quality instance subsets for KNN, and in recognition of their competitiveness, some researchers have considered whether PS techniques can be extended to improve the performance of other classification methods, which yields the second group of instance selection: training set selection (TSS). Training set selection aims to obtain a training subset for a classification learning method (such as an SVM, a decision tree or a neural network), and is often used when the training set contains too much data, including noisy and redundant instances [26]. By using TSS techniques, not only is the number of training instances reduced, but the performance of the classification algorithm is also enhanced. In this paper, we are interested in TSS; specifically, we focus on TSS for SVM, which has become a hot research topic in data mining in the past few years.

2.2. Training Set Selection for SVM

SVM is a well-known supervised classifier in data mining and has been widely used in many real applications [1, 2, 3, 4]. However, it suffers from the important shortcoming of high training time and memory costs, which depend greatly on the size of the training set [5]. To tackle this issue, different techniques have been suggested, either at the algorithm level or at the data level [39]. Recently, as a data pre-processing technique, instance selection has been introduced to select the training subset of an SVM, which is called TSS for SVM. In the past decade, a large number of training set selection methods for SVM have been developed, e.g., clustering-based methods [11], neighborhood analysis methods [12], sampling-based methods [13] and active learning methods [14]. These methods have shown promising performance in selecting high-quality SVM instance subsets; however, most of them solve the TSS problem with only traditional optimization techniques. Compared with traditional optimization techniques, evolutionary algorithm (EA) based optimization does not assume any structure of the data or any behavior of the classifier, and is thus very suitable for training set selection [9]. In the following, we focus on some representative EA-based TSS algorithms for SVM, which are closely related to the work we propose.

Before reviewing EA-based TSS for SVM in detail, we note that there also exists some valuable work adapting EA-based prototype selection approaches originally developed for KNN to training set selection for SVM. One representative work is [10], in which Verbiest et al. extended three EA-based PS algorithms (GGA, CHC and SSGA) and two non-EA-based PS algorithms (ENN and MCIS) to SVM training set selection. Experimental results have shown the effectiveness of these extended algorithms. In addition, the authors also compared the EA-based algorithms with the non-EA ones, and the comparison revealed that the EA-based algorithms performed better than the non-EA algorithms; in particular, the GGA algorithm is the most suitable TSS method for SVM. Other adaptations of EA-based PS for KNN to TSS for SVM can be found in [9].

In the following, we concentrate on the EA-based training set selection algorithms suggested specifically for SVM. For example, Kawulok et al. proposed a GA-based instance selection algorithm for SVM, namely GASVM [15].
In GASVM, an individual in the population represents a candidate SVM training subset of a given size, and the standard genetic operators, crossover and mutation, are adopted to create the offspring individuals. Experimental results on real-world and artificial data sets have shown that GASVM outperformed several non-EA-based algorithms, such as sampling-based TSS algorithms. In recognition of the superiority of GASVM, more EA-based TSS algorithms for SVM have been developed. In [16, 17], two adaptive GA-based algorithms, termed AGA and DAGA, were suggested, in which the sizes of the selected instance subsets can be determined adaptively; they yield better solutions than the GASVM algorithm. In [18, 19], Nalepa et al. developed several memetic algorithm (MA) based algorithms to perform training set selection for SVM, where different selection schemes that balance the exploration and exploitation of the solution space were suggested. With these schemes, the developed algorithms can extract SVM training subsets with high quality.

The EA-based TSS algorithms mentioned above have shown their competitiveness in achieving good SVM training subsets; however, a closer look at these algorithms reveals that, to obtain good performance, they often require a trade-off parameter that balances the accuracy and the number of selected training instances. Nevertheless, it is difficult to set a suitable value for this trade-off parameter, especially when no prior knowledge of the application is available. A natural way to overcome this challenge is to design a multi-objective evolutionary algorithm (MOEA) for TSS, where the accuracy on the selected training subset and the number of selected instances are regarded as two objectives that can be optimized simultaneously without a trade-off parameter.

2.3. MOEAs and MOEAs for SVM Training Set Selection

In the real world, a variety of complex problems can be formulated as multi-objective optimization problems (MOPs), which are characterized by multiple objectives that conflict with each other. The multi-objective optimization problem can be defined as follows (taking a minimization problem as an example):

Definition 1. A multi-objective optimization problem is defined as:

min F(X) = (F1(X), F2(X), ..., Fm(X))^T    (1)

where X = (X1, X2, ..., Xn) is the n-dimensional decision vector and m is the number of objective functions. Given two decision vectors X1 and X2, if Fi(X1) ≤ Fi(X2) for all i = 1, 2, ..., m and F(X1) ≠ F(X2), then X1 dominates X2, or X2 is dominated by X1 (denoted as X1 ≺ X2). If a decision vector X is not dominated by any other decision vector, X is a Pareto optimal solution (i.e., a non-dominated solution). The Pareto set, denoted as PS = {X ∈ Ω | ¬∃X* ∈ Ω, X* ≺ X}, is the set of all Pareto optimal solutions. The Pareto front, denoted as PF = {F(X) | X ∈ PS}, is the projection of the Pareto set into the objective space.
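To make the dominance relation of Definition 1 concrete, the following minimal Python sketch (the function name is ours, not from the paper) checks whether one objective vector dominates another under minimization:

```python
def dominates(f1, f2):
    """Return True if objective vector f1 dominates f2 (minimization).

    f1 dominates f2 when it is no worse in every objective and strictly
    better in at least one, exactly as in Definition 1.
    """
    no_worse = all(a <= b for a, b in zip(f1, f2))
    strictly_better = any(a < b for a, b in zip(f1, f2))
    return no_worse and strictly_better

# (0.2, 0.5) dominates (0.3, 0.5); (0.2, 0.7) and (0.3, 0.5) are incomparable.
assert dominates((0.2, 0.5), (0.3, 0.5))
assert not dominates((0.2, 0.7), (0.3, 0.5))
```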

Multi-objective evolutionary algorithms (MOEAs) aim to find a set of non-dominated solutions approximating the true Pareto front as closely as possible. Typical representatives of MOEAs include NSGA-II [40], SPEA-II [41] and MOEA/D [42]. Recently, several novel MOEAs with promising performance have been suggested, including AR-MOEA [43], SparseEA [33] and MOEO [44]. With the emergence of more MOEAs, they have been successfully used to solve complex problems in different fields, such as complex networks [45], pattern recommendation [46], industrial automation [47] and economic emission dispatch [48]. Although MOEAs have also been widely applied to different SVM optimization areas, such as parameter selection for SVM [20, 21] and the reduction of the support vectors of an SVM [22], to the best of our knowledge only three works focus on designing MOEAs for SVM training set selection, all published in the last four years.

In [23], a multi-objective genetic algorithm (MOGA) based training set selection algorithm, named MOGA-LSH, was proposed to tackle the SVM multi-classification instance selection problem. In MOGA-LSH, NSGA-II was adopted as the framework, and a GA was used to search for the optimal training subsets for SVM. To further improve the performance of MOGA-LSH, a locality-sensitive hashing (LSH) strategy was also suggested. Content-based image classification tasks for SVM have shown that MOGA-LSH outperformed several state-of-the-art TSS methods for SVM. In [24], an evolutionary multi-objective model and instance selection algorithm for SVM with Pareto-based ensembles (EMOMIS-PbE) was proposed, where MOEA/D was used as the framework. In EMOMIS-PbE, each individual in the population consists of two parts: the first part is used for instance selection and the second for model selection, by which training set selection and hyper-parameter selection for SVM are conducted simultaneously. Moreover, five different Pareto-based ensemble strategies were also adopted in EMOMIS-PbE, which further enhanced its performance. Experimental results on a suite of 43 well-known classification problems have demonstrated the superiority of EMOMIS-PbE over traditional instance selection methods for SVM, such as FCNN and DROP3. Lately, in [25], a Pareto-based multi-objective evolutionary approach, named ParetoTSS, was suggested for SVM training set selection. ParetoTSS adopts the Pareto Envelope-based Selection Algorithm II (PESA-II) as the framework, whose grid-based fitness assignment mechanism maintains diversity in both environmental selection and mating selection. To choose a solution from the final non-dominated set, a decision-making mechanism based on a sum model was also designed. Empirical studies on UCI data sets have shown that ParetoTSS not only performed better than a single-objective EA-based algorithm (GGA) but also outperformed a non-EA-based TSS algorithm (the shell extraction algorithm).

The empirical results of MOGA-LSH, EMOMIS-PbE and ParetoTSS justify the effectiveness of MOEAs for solving TSS for SVM, and in this paper we continue this research line by proposing a novel MOEA (named SDMOEA-TSS) to further improve the quality of the selected SVM instance subset. Different from the existing MOEAs for the SVM TSS problem, which all view the objective space as one region, the proposed SDMOEA-TSS divides the objective space into several subregions, and each subregion has its own evolutionary strategy, which utilizes the individuals in the subregions for local search and the whole population for global search. By using the subregion division based evolutionary strategy, the proposed SDMOEA-TSS achieves SVM training subsets with higher quality. The details of the proposed method are presented in the next section.

3. The Proposed Algorithm

In this section, we first present the framework of the proposed algorithm, and then give the two important components of SDMOEA-TSS: the division-based initialization strategy and the subregion-based evolutionary (including crossover, mutation and update) strategy.

3.1. The General Framework of SDMOEA-TSS

The proposed algorithm adopts a framework similar to that of NSGA-II [40], which is one of the most popular multi-objective algorithms in evolutionary computation. Specifically, the general

Algorithm 1: The General Framework of SDMOEA-TSS
Input: maxgen: maximum number of generations; N: population size; D: original training set
Output: set S of SVM training subsets
1  /* The first phase */
2  [P, θ, R] ← Division based Initialize Population(N, D); // P is the initialized population, θ is the set of angles that divides the objective space into different subregions, R denotes the subregions in the objective space
3  /* The second phase */
4  while the number of generations does not reach maxgen do
5      Q′ ← Subregion based Crossover(P);
6      Q ← Subregion based Mutation(Q′);
7      P ← Subregion based Update(P, Q, R, θ, N);
8  /* The third phase */
9  S ← Non-Dominated sort(P);
10 return S;

framework of SDMOEA-TSS consists of the following three phases. In the first phase, a division-based initialization strategy is suggested to initialize the population P as several subpopulations, each located in a particular region of the objective space. Each individual pi in the population P (i ∈ {1, ..., N}, where N is the population size) represents a training subset for SVM, and the length of pi equals the number of instances in the whole training set. The individuals adopt a binary encoding scheme: if the j-th bit of pi is set to 1, the j-th instance is included in the training subset represented by pi. In the second phase, a subregion-based evolutionary strategy (including crossover, mutation and update) is designed. To be specific, the designed crossover strategy utilizes the individuals in the same subregion for local search and the individuals in neighboring subregions for global search. The basic idea of the suggested mutation strategy is to define different asymmetric mutation probabilities for the individuals in different subregions. By using the suggested crossover and mutation strategies, we create offspring individuals with high quality. Then a subregion-based population update strategy is developed, where the parent and offspring individuals in the same subregion are evaluated for the next generation. To this end, the SVM accuracy (Acc) on the individual pi and the reduction rate (Red) of pi are adopted as the two optimized objectives, which are used to measure the quality of parent and offspring individuals. The suggested evolutionary (crossover, mutation and update) operations are repeated in the second phase until a termination criterion is satisfied. In the last phase, a set of non-dominated solutions is obtained and returned as the final output. The general framework of SDMOEA-TSS is presented in Algorithm 1.

From the above explanation of Algorithm 1, we can see that there are two important components in the proposed SDMOEA-TSS: the division-based initialization strategy (Line 2) and the subregion-based evolutionary scheme (Lines 5-7). In the following, we elaborate on them in detail.
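To illustrate this encoding and the two objectives, the following sketch (ours, not the authors' code) decodes a binary individual into a training subset and evaluates Acc and Red; it uses scikit-learn's libsvm-backed SVC as a stand-in for the libSVM classifier used in the experiments, and the helper name is hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

def evaluate_individual(ind, X_train, y_train, X_val, y_val):
    """Decode a binary individual and compute its two objectives (Acc, Red).

    ind[j] = 1 means the j-th training instance is selected.
    """
    mask = np.asarray(ind, dtype=bool)
    red = 1.0 - mask.sum() / mask.size  # reduction rate: fraction discarded
    # Train an RBF SVM on the selected subset only; gamma='auto' is 1/d,
    # matching the libSVM defaults of Section 4. We assume the mask keeps
    # at least two classes; real code would guard against degenerate masks.
    clf = SVC(kernel="rbf", C=1.0, gamma="auto")
    clf.fit(X_train[mask], y_train[mask])
    acc = clf.score(X_val, y_val)
    return acc, red
```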

3.2. The Division-based Initialization Strategy

As mentioned in Section 1, the aim of the proposed algorithm is to obtain SVM training subsets with both high quality and good diversity. Thus, in the initialization phase, a division-based initialization strategy is designed to initialize the whole population P so that it is located in three different regions of the objective space (P = {P1, P2, P3}). Assuming the objective space shown in Figure 1, the first subpopulation P1 is located in the upper-left region of the objective space, which prefers solutions with a high reduction rate. The third subpopulation P3 is initialized in the lower-right region of the objective space, which prefers training subsets with high SVM accuracy. The second subpopulation P2 is intended to achieve trade-off solutions between reduction rate and SVM accuracy. To be specific, three different probabilities q1, q2, q3 (0 < q1 < q2 < q3 < 1) are first used to generate the three subpopulations, each of size ⌈N/3⌉. Then we find the center point of each subpopulation in the objective space, denoted c1, c2 and c3. The middle points between c1, c2 and between c2, c3 are denoted a and b, respectively. Thus, by connecting the point (0, 0) with point a, and the point (0, 0) with point b, we can divide the objective space into three subregions, named subregions A, B and C. Figure 1 presents an illustrative example, and the whole procedure of the division-based initialization is described in Algorithm 2.

[Figure 1 (axes: Accuracy on the x-axis, Reduction rate on the y-axis): subpopulations P1, P2, P3 with centers c1, c2, c3; middle points a, b; subregions A, B, C with angles θ1, θ2, θ3.]

Figure 1: An example to illustrate the division-based initialization strategy.

3.3. The Subregion-based Evolutionary Strategy

In the second phase of SDMOEA-TSS, a subregion-based evolutionary strategy is suggested to produce the individuals for the next generation, which includes the crossover, mutation and population update strategies. In the following, we illustrate them in detail.

In the suggested subregion-based crossover strategy, an individual pi in the population P has a probability qin (0 ≤ qin ≤ 1) of performing the crossover operation with individuals in the same subregion, and a probability 1 − qin of crossing over with individuals in its neighboring subregion. By using this strategy, we can not only make full use of the population in each subregion for local search, but also maintain the global search ability of the whole population. To be specific, as shown in Figure 1, an individual in subregion A (subregion C) has a probability qin of performing crossover with individuals in subregion A (subregion C), and a probability 1 − qin of crossing over with individuals in subregion B. An individual in subregion B crosses over with individuals in subregion B with probability qin, and with individuals in subregions A and C each with probability (1 − qin)/2. The whole procedure of the subregion-based crossover strategy is described in Algorithm 3.

After the crossover operation, a subregion-based mutation operation is carried out on the individuals in the population P.

Algorithm 2: Division based Initialize Population
Input: N: population size; (q1, q2, q3): initialization probabilities; D: original training set
Output: P: the initialized population; θ: the angles that divide the objective space into different subregions; R: the subregions in the objective space
1  for i = 1 to 3 do
2      Pi ← Initialize Population(⌈N/3⌉, D, qi);
3      ci ← Calculate center point of Pi;
4  P ← {P1, P2, P3};
5  a ← (c1 + c2)/2; // compute the middle point between c1 and c2
6  b ← (c2 + c3)/2; // compute the middle point between c2 and c3
7  θ1 ← 90° − arctan(0, a); // compute the angle of subregion A
8  θ3 ← arctan(0, b); // compute the angle of subregion C
9  θ2 ← 90° − θ1 − θ3; // compute the angle of subregion B
10 θ ← {θ1, θ2, θ3};
11 R ← {A, B, C}; // the objective space is divided into subregions A, B and C according to the angles θ1, θ2, θ3
12 return P, θ, R;
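A minimal Python sketch of the subpopulation generation in Algorithm 2 (our illustration, with hypothetical names; the boundary angles would subsequently be derived from the objective-space centers of the three subpopulations):

```python
import numpy as np

def division_based_initialization(N, n_instances, probs=(0.1, 0.3, 0.5), rng=None):
    """Generate three subpopulations with increasing selection probability.

    A small probability q yields sparse individuals (few instances selected,
    hence a high reduction rate), so the three subpopulations start in
    different regions of the objective space, as intended by Algorithm 2.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Each bit is set to 1 (instance selected) with probability q.
    return [(rng.random((N // 3, n_instances)) < q).astype(int) for q in probs]
```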

Algorithm 3: Subregion based Crossover
Input: P = {P1, P2, P3}: the parent population; qin: crossover probability within the same subregion
Output: Q′: the population after crossover
1  Q′1 ← ∅; Q′2 ← ∅; Q′3 ← ∅;
2  for i = 1 to 3 do
3      Q′in ← crossover in subregion(Pi, qin);
4      if i = 1 or i = 3 then
5          Q′bt ← crossover between subregions(Pi, P2, 1 − qin);
6          Q′i ← Q′in ∪ Q′bt;
7      else
8          Q′bt1 ← crossover between subregions(P1, P2, (1 − qin)/2);
9          Q′bt2 ← crossover between subregions(P2, P3, (1 − qin)/2);
10         Q′i ← Q′in ∪ Q′bt1 ∪ Q′bt2;
11 Q′ ← {Q′1, Q′2, Q′3};
12 return Q′;
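The partner-selection rule at the heart of Algorithm 3 can be summarized in a few lines; the sketch below is ours and only illustrates from which subregion a crossover partner is drawn:

```python
import numpy as np

def mate_subregion(own_region, q_in=0.8, rng=None):
    """Choose the subregion a crossover partner is drawn from (Algorithm 3).

    With probability q_in the partner comes from the individual's own
    subregion (local search); otherwise it comes from a neighboring
    subregion (global search): A and C neighbor only B, while B neighbors
    A and C, each chosen with probability (1 - q_in) / 2.
    """
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < q_in:
        return own_region
    return "B" if own_region in ("A", "C") else rng.choice(["A", "C"])
```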

It should be noted that in the traditional binary-encoding mutation strategy of NSGA-II, each mutated bit has the same probability of flipping from 0 to 1 as from 1 to 0, which may deteriorate the reduction rate of the individuals. To this end, an asymmetric mutation strategy is adopted, based on the intuition that each mutated bit of an individual should have a higher probability of switching from 1 to 0 than from 0 to 1, and that individuals located in the upper-left part of the objective space (such as subregion A) should have a more asymmetric mutation probability than those located in the lower-right region (such as subregion C). Specifically, for any individual in subregion A, B or C, its mutation probability pm = [pm0, pm1] is set to [KA/|D|, 1/|D|], [KB/|D|, 1/|D|] or [KC/|D|, 1/|D|], respectively, where |D| is the number of instances in the whole training set D, and |D| > KA > KB > KC > 1. Algorithm 4 presents the detailed procedure of the subregion-based mutation strategy in the proposed algorithm.

Algorithm 4: Subregion based Mutation
Input: Q′ = {Q′1, Q′2, Q′3}: the population after crossover; KA, KB and KC: |D| > KA > KB > KC > 1
Output: offspring Q
1  Q1 ← ∅; Q2 ← ∅; Q3 ← ∅;
2  for i = 1 to 3 do
3      if i = 1 then
4          [pm0, pm1] ← [KA/|D|, 1/|D|];
5          Q1 ← mutation(Q′1, pm0, pm1);
6      else if i = 2 then
7          [pm0, pm1] ← [KB/|D|, 1/|D|];
8          Q2 ← mutation(Q′2, pm0, pm1);
9      else if i = 3 then
10         [pm0, pm1] ← [KC/|D|, 1/|D|];
11         Q3 ← mutation(Q′3, pm0, pm1);
12 Q ← {Q1, Q2, Q3};
13 return Q;
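A sketch of the asymmetric bit-flip mutation used inside Algorithm 4 (our illustration; names are hypothetical):

```python
import numpy as np

def asymmetric_mutation(ind, K, n_total, rng=None):
    """Mutate a binary individual with asymmetric bit-flip probabilities.

    A selected instance (bit 1) is dropped with probability K / |D| and an
    unselected instance (bit 0) is added with probability 1 / |D|, where
    K is K_A, K_B or K_C depending on the individual's subregion.
    """
    if rng is None:
        rng = np.random.default_rng()
    ind = np.asarray(ind)
    flip_prob = np.where(ind == 1, K / n_total, 1.0 / n_total)
    flips = rng.random(ind.size) < flip_prob
    return np.where(flips, 1 - ind, ind)
```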

When the crossover and mutation operations are finished, we update the population by combining the parent and offspring individuals. To this end, a subregion-based population update strategy is suggested, where the parent and offspring individuals in the same subregion are evaluated and then updated. Specifically, for each individual pi in the offspring population Q, we compute its two objective values (accuracy and reduction rate), which gives a point ppi in the objective space. By calculating the angle between the line (0, ppi) and the x-axis (accuracy axis), we partition the offspring individual pi into a subregion (A, B or C). When all the individuals in the offspring population Q have been partitioned into their corresponding subregions, we update the subpopulation in each subregion by evaluating the parent and offspring individuals in the same subregion. The whole procedure of the subregion-based update strategy is described in Algorithm 5.

3.4. Time Complexity Analysis

In this section, we discuss the time complexity of the proposed SDMOEA-TSS. Specifically, the time complexity of SDMOEA-TSS is mainly dominated by evaluating the fitness functions and selecting the non-dominated set of solutions. For the former, in each generation we need to evaluate the two objectives (reduction rate and accuracy) of each individual (SVM training subset). The reduction rate can be evaluated in O(|D|), where |D| is the number of instances in the original training set. The cost of computing the accuracy depends on the particular SVM implementation; here we adopt the common time complexity of SVM, which is O(|D|²). Thus, in each generation, the computational complexity of evaluating the fitness functions for the whole population is O(N × |D|²), where N is the number of individuals in the population.

Algorithm 5: Subregion based Update
Input: N: population size; P = {P1, P2, P3}: the parent population; Q = {Q1, Q2, Q3}: the offspring population; R = {A, B, C}: the subregions A, B and C; θ = {θ1, θ2, θ3}: the angles that divide the objective space into three subregions
Output: P: the population for the next generation
1  Q11 ← ∅; Q22 ← ∅; Q33 ← ∅;
2  for i = 1 to N do
3      compute the accuracy (f1) and reduction rate (f2) of individual pi;
4      θpi ← arctan(f2/f1);
5      if θpi > (θ2 + θ3) then
6          Q11 ← Q11 ∪ pi; // partition offspring individual pi into subregion A
7      else if θ3 < θpi ≤ (θ2 + θ3) then
8          Q22 ← Q22 ∪ pi; // partition offspring individual pi into subregion B
9      else if θpi ≤ θ3 then
10         Q33 ← Q33 ∪ pi; // partition offspring individual pi into subregion C
11 Q1 ← Q11; Q2 ← Q22; Q3 ← Q33;
12 for j = 1 to 3 do
13     Pj ← update(Pj ∪ Qj);
14 P ← {P1, P2, P3};
15 return P;
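The angle-based partition in Lines 3-10 of Algorithm 5 amounts to a simple threshold test; a minimal sketch, assuming the angles are expressed in degrees (the function name is ours):

```python
import math

def assign_subregion(acc, red, theta2, theta3):
    """Assign an offspring to subregion A, B or C (Algorithm 5).

    The angle between the accuracy axis and the line from the origin to
    the point (accuracy, reduction rate) determines the subregion.
    """
    theta = math.degrees(math.atan2(red, acc))
    if theta > theta2 + theta3:
        return "A"
    if theta > theta3:
        return "B"
    return "C"
```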

For the latter, in each generation, the proposed method adopts the non-dominated sorting of NSGA-II to select solutions, whose computational complexity is O(N²). Therefore, the time complexity of SDMOEA-TSS in one generation is O(N × |D|² + N²). Since the proposed SDMOEA-TSS runs for maxgen generations, the total time complexity of SDMOEA-TSS is O(maxgen × (N × |D|² + N²)). Considering that maxgen is often set as a constant, the final time complexity of SDMOEA-TSS is O(N × |D|² + N²).

4. Experimental Results and Analysis

In this section, we empirically verify the performance of the proposed SDMOEA-TSS by comparing it with several representative algorithms for SVM training set selection. Specifically, the experiments are designed as follows.

4.1. Experimental Design

4.1.1. Comparison Algorithms

The performance of SDMOEA-TSS is compared with five representative algorithms for SVM training set selection: EMOMIS-PbE [24], ParetoTSS [25], POSS [32], MOEA/D-TSS [42] and MOEO-TSS [44]. Among them, the first two are the most recently suggested MOEAs for SVM training set selection, and have shown better performance than many other state-of-the-art methods, such as FCNN, GGA and SE. EMOMIS-PbE is an evolutionary multi-objective model and instance selection algorithm for SVM, where training set selection and hyper-parameter selection for SVM are optimized simultaneously.

In addition, five different Pareto-based ensemble strategies were adopted to further improve the solutions. ParetoTSS is another effective TSS algorithm for SVM, where a Pareto-based multi-objective evolutionary approach was designed for the first time to obtain high-quality SVM training subsets. In ParetoTSS, SVM training set selection was first regarded as a Pareto optimization problem; then PESA-II was adopted as the framework, in which a modified version of Heuristic Uniform Crossover (HUX) and a bit-flip mutation were designed to achieve high-quality SVM training subsets. The third comparison algorithm, POSS, is a general subset selection method that can be applied to problems such as instance selection and feature selection. In POSS, the evolutionary Pareto optimization technique is utilized to find a small subset with good performance, and in the following experiments we apply POSS to the SVM training set selection problem. In addition, to make our experiments more convincing, we also use two existing MOEAs (MOEA/D [42] and MOEO [44]) to solve the SVM TSS problem; in the following experiments they are named MOEA/D-TSS and MOEO-TSS, respectively.

For fair comparisons, we adopt the recommended parameter values for all the comparison algorithms, as suggested in their original papers. For all the algorithms (including ours), we use the same number of evaluations, which is a common fair setting when comparing different MOEAs [49, 50]. To be specific, we fix the number of evaluations at 20,000 and the population size at N = 100. libSVM² with a Radial Basis Function (RBF) kernel is used to construct the SVM classifier for all the algorithms. The kernel parameter γ and the cost parameter C of libSVM are optimized within EMOMIS-PbE, and are set to γ = 0.01 and C = 1 for ParetoTSS, as suggested in the original paper. For the other four algorithms (POSS, MOEA/D-TSS, MOEO-TSS and the proposed SDMOEA-TSS), these two parameters are set to the default values of libSVM, namely γ = 1/d and C = 1, where d is the number of dimensions of the training set. Moreover, for the proposed SDMOEA-TSS, the initialization probabilities are (q1, q2, q3) = (0.1, 0.3, 0.5), the probability of performing crossover within the same subregion is qin = 0.8, and the asymmetric mutation probabilities pm = [pm0, pm1] for subregions A, B and C are set to [(50/|D|, 10/|D|, 5/|D|), 1/|D|], where |D| is the number of instances in the training set. The detailed parameters of all the compared algorithms are listed in Table 1.

² https://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html

Table 1: Parameter settings of all the compared algorithms

Algorithms | EvaNum | N | p | pm0 | pm1 | qin | Reference
SDMOEA-TSS | 20,000 | 100 | (0.1, 0.3, 0.5) | (50/|D|, 10/|D|, 5/|D|) | 1/|D| | 0.8 | [ours]
ParetoTSS | 20,000 | 100 | 0.5 | 0.05 | 0.001 | – | [25]
EMOMIS-PbE | 20,000 | 100 | 0.5 | 0.1 | 0.1 | – | [24]
POSS | 20,000 | 100 | – | 1/|D| | 1/|D| | – | [32]
MOEA/D-TSS | 20,000 | 100 | 0.5 | 0.1 | 0.1 | – | [42]
MOEO-TSS | 20,000 | 100 | 0.5 | – | – | – | [44]

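For reference, the default SVM configuration of Table 1 used by POSS, MOEA/D-TSS, MOEO-TSS and SDMOEA-TSS (RBF kernel, γ = 1/d, C = 1) can be reproduced with scikit-learn's libsvm-backed SVC; this is a sketch under our assumptions, not the authors' exact experimental harness:

```python
from sklearn.svm import SVC

# RBF-kernel SVM with libSVM's defaults as described in Section 4.1.1:
# gamma = 1/d (d = number of features; 'auto' in scikit-learn) and C = 1.
clf = SVC(kernel="rbf", gamma="auto", C=1.0)
```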

4.1.2. Experimental data sets

The six algorithms are tested on 21 public data sets, which are popular data sets for SVM classification and are available from the UCI machine learning repository (https://archive.ics.uci.edu/ml/). The detailed characteristics of these data sets are given in Table 2.

Table 2: The detailed characteristics of the 21 data sets

Data set | #Instances | #Features | #Classes
Heart | 270 | 13 | 2
Cleveland | 297 | 13 | 5
Bupa | 345 | 6 | 2
Bands | 365 | 19 | 2
Saheart | 462 | 9 | 2
Australian | 690 | 14 | 2
Pima | 768 | 8 | 2
Geman | 1,000 | 20 | 2
Yeast | 1,484 | 8 | 10
Phoneme | 5,404 | 5 | 2
Tae | 151 | 5 | 3
Hayes-roth | 160 | 4 | 3
Haberman | 306 | 3 | 2
Wdbc | 569 | 30 | 2
DrivFace | 606 | 6,400 | 3
Pd speech features | 756 | 754 | 2
Winequality-red | 1,599 | 11 | 11
Titanic | 2,201 | 3 | 2
Segment | 2,310 | 19 | 7
Winequality-white | 4,898 | 11 | 11
Penbased | 10,992 | 16 | 10

In Table 2, "#Instances", "#Features" and "#Classes" denote the number of instances, features and classes in the experimental data sets, respectively. Each data set is randomly divided into 10 folds, of which nine folds are used as the training set and the remaining fold as the test set. To further reduce variance, three independent 10-fold partitions are generated for each data set, and we report the average values over the 30 runs as the final results. All experiments reported in this paper are conducted on a PC with an Intel Core i7-6700 CPU at 3.4 GHz, 8 GB of memory, and the Windows 7 64-bit operating system. The source code of the proposed SDMOEA-TSS is available from the authors by email.

4.1.3. Evaluation metrics

The aim of this paper is to propose an effective MOEA that obtains SVM training subsets with both good quality and good diversity. Therefore, we adopt the SVM accuracy and the reduction rate as two metrics to measure the quality of the solutions of the different algorithms.

13

To evaluate the diversity of these algorithms, we use the Hypervolume (HV) [51] as the metric, which provides combined information about the convergence and diversity of a solution set. The calculation of the HV metric requires a reference point r = (r1, ..., rm) in the objective space, where m is the number of optimized objectives; the reference point is generally suggested to be slightly worse than the nadir point, to emphasize the balance between convergence and diversity of the obtained solutions. The HV value of an obtained solution set S (with regard to the reference point r) is the volume of the region that is dominated by S and dominates r. It can be calculated by the following formula:

HV(S, r) = volume( ⋃_{f ∈ S} ([f1, r1] × ... × [fm, rm]) )    (2)

where f = (f1, ..., fm), and fi (i ∈ {1, ..., m}) is the value of the i-th objective function. The larger the HV value, the better the convergence and diversity of the algorithm.
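As a concrete illustration (our sketch, not the authors' code), the bi-objective case of Eq. (2) can be computed with a simple sweep; here we assume minimization as in Definition 1 — for the maximized accuracy and reduction rate used in this paper, the objectives would first be negated:

```python
def hv_2d(points, ref):
    """Hypervolume of a 2-D solution set w.r.t. reference point `ref`.

    Assumes minimization, i.e. only points p with p <= ref componentwise
    contribute; this is Eq. (2) specialised to m = 2.
    """
    # Keep only points that dominate the reference point, sorted by f1.
    pts = sorted(p for p in points if p[0] <= ref[0] and p[1] <= ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:  # skip points dominated by an earlier one
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

# Two nondominated points against reference (1, 1): HV = 0.36.
print(hv_2d([(0.2, 0.8), (0.6, 0.3)], (1.0, 1.0)))
```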

4.2. Experimental Results

On the 21 data sets above, we compare the proposed SDMOEA-TSS with the five comparison algorithms, and the experimental results are reported from the following aspects.

4.2.1. The Experimental Results Evaluated by HV

In the first part of the experiments, we are interested in the convergence and diversity of the six MOEAs on the test data sets. To this end, we adopt HV as the evaluation metric, and present the HV values of the six SVM training set selection algorithms in Table 3. Note that for each data set, the highest HV value is marked in boldface, and the number in parentheses denotes the rank of the algorithm on that data set. From Table 3, we can see that, compared with the other MOEA-based SVM training set selection algorithms, the proposed SDMOEA-TSS achieves the highest HV values on all the test data sets. The highest HV values of SDMOEA-TSS are attributed to the suggested subregion division based evolutionary strategy: by dividing the objective space into different subregions, the proposed SDMOEA-TSS not only makes full use of the individuals in each subregion for local search but also maintains the whole population's global search ability, which guarantees both good convergence and good diversity.

4.2.2. The Experimental Results Evaluated by Accuracy and Reduction Rate

In the second part of the experiments, we focus on how SDMOEA-TSS performs when measured by the accuracy and reduction rate metrics. To this end, we compare SDMOEA-TSS with the other five MOEA-based SVM training set selection algorithms on the 21 data sets, and plot their final non-dominated solutions in Figure 2. It can be observed from Figure 2 that, on most of the data sets, EMOMIS-PbE, ParetoTSS, POSS and the proposed SDMOEA-TSS achieve better non-dominated solutions than MOEA/D-TSS and MOEO-TSS. The reason mainly lies in the fact that the former four MOEAs are specifically designed for the SVM training set selection problem, while the latter two simply apply existing MOEAs to this bi-objective optimization problem and do not develop any strategy oriented to the SVM TSS problem. Moreover, taking a closer look at the four specifically designed MOEAs, we can see that their non-dominated solutions also vary across data sets. For example, the non-dominated solutions of POSS and ParetoTSS perform well in terms of reduction rate, while their accuracy is not very competitive. Similarly, EMOMIS-PbE achieves non-dominated solutions with competitive accuracy on many data sets, but its reduction rate is not high. Different from these three comparison algorithms, the results in Figure 2 also show that the proposed SDMOEA-TSS has better Pareto fronts on most of the data sets. The above results demonstrate the competitiveness of the proposed method; the superior performance of SDMOEA-TSS is attributed to the fact that its non-dominated solutions are obtained from different subregions: some subregions achieve solutions with a high reduction rate, while others achieve solutions with good accuracy (in the following section, we further verify this fact). By combining the Pareto solutions of these different subregions, SDMOEA-TSS obtains SVM training subsets with higher quality.

Table 3: The HV values of the six compared algorithms on different data sets.

Data set | SDMOEA-TSS | ParetoTSS | EMOMIS-PbE | POSS | MOEA/D-TSS | MOEO-TSS
Heart | 0.9053(1) | 0.8891(2) | 0.8145(4) | 0.8793(3) | 0.7456(5) | 0.5716(6)
Cleveland | 0.7299(1) | 0.6928(2) | 0.6308(4) | 0.6711(3) | 0.5825(5) | 0.4486(6)
Bupa | 0.7368(1) | 0.7179(2) | 0.6322(4) | 0.7178(3) | 0.5878(5) | 0.4431(6)
Bands | 0.7779(1) | 0.7531(2) | 0.6695(4) | 0.7520(3) | 0.6082(5) | 0.4673(6)
Saheart | 0.7958(1) | 0.7785(2) | 0.6709(4) | 0.7739(3) | 0.6078(5) | 0.4874(6)
Australian | 0.9019(1) | 0.8950(2) | 0.7297(4) | 0.8913(3) | 0.6579(5) | 0.5425(6)
Pima | 0.8204(1) | 0.8061(2) | 0.6551(4) | 0.8026(3) | 0.5893(5) | 0.4908(6)
Geman | 0.8412(1) | 0.8082(2) | 0.6445(4) | 0.7890(3) | 0.5842(5) | 0.4888(6)
Yeast | 0.6397(1) | 0.5963(3) | 0.4802(4) | 0.6032(2) | 0.4293(5) | 0.3636(6)
Phoneme | 0.8162(1) | 0.8064(2) | 0.6037(4) | 0.8030(3) | 0.4936(5) | 0.3388(6)
Tae | 0.6673(1) | 0.6470(2) | 0.5817(5) | 0.6292(3) | 0.5925(4) | 0.4427(6)
Hayes-roth | 0.8177(1) | 0.7732(2) | 0.7085(5) | 0.7216(4) | 0.7287(3) | 0.5426(6)
Haberman | 0.7952(1) | 0.7896(3) | 0.6391(5) | 0.7915(2) | 0.6439(4) | 0.5042(6)
Wdbc | 0.9812(1) | 0.9772(3) | 0.7308(5) | 0.9789(2) | 0.7365(4) | 0.6001(6)
DrivFace | 0.9752(1) | 0.9513(2) | 0.7159(5) | 0.9433(3) | 0.7202(4) | 0.6017(6)
Pd speech features | 0.8816(1) | 0.8545(3) | 0.6226(5) | 0.8582(2) | 0.6246(4) | 0.5116(6)
Winequality-red | 0.6439(1) | 0.6316(2) | 0.4243(5) | 0.6295(3) | 0.4313(4) | 0.3713(6)
Titanic | 0.8059(1) | 0.8048(2) | 0.5287(4) | 0.8028(3) | 0.5268(5) | 0.4697(6)
Segment | 0.9281(1) | 0.9007(2) | 0.5994(5) | 0.8383(3) | 0.6082(4) | 0.5363(6)
Winequality-white | 0.5885(1) | 0.5697(3) | 0.5824(2) | 0.5685(4) | 0.5053(5) | 0.4667(6)
Penbased | 0.9880(1) | 0.9790(2) | 0.6534(4) | 0.7922(3) | 0.6234(5) | 0.5030(6)
AverageRank | 1 | 2.24 | 4.29 | 2.90 | 4.57 | 6

[Figure 2: each panel plots Reduction rate (y-axis) against Accuracy (x-axis) for SDMOEA-TSS, ParetoTSS, EMOMIS-PbE, POSS, MOEA/D-TSS and MOEO-TSS. Panels: (a) Heart, (b) Cleveland, (c) Bupa, (d) Bands, (e) Saheart, (f) Australian, (g) Pima, (h) Geman, (i) Yeast, (j) Phoneme, (k) Tae, (l) Hayes-roth, (m) Haberman, (n) Wdbc, (o) DrivFace, (p) Pd speech features, (q) Winequality-red, (r) Titanic, (s) Segment, (t) Winequality-white, (u) Penbased.]

Figure 2: The Pareto fronts of six compared algorithms on 21 test data sets.

4.2.3. A Further Experimental Comparison between Different Subregions of SDMOEA-TSS and the State of the Art

The experimental results above have demonstrated the superiority of the proposed method; in this section, we further validate the effectiveness of the solutions in different subregions of SDMOEA-TSS by comparing them with the state of the art. To this end, two groups of experiments are designed. In the first group, we compare the solutions obtained in subregion A (lying in the upper-left part) of SDMOEA-TSS with those of POSS and ParetoTSS, which achieve solutions with a good reduction rate. In the second group, we compare the solutions obtained in subregion C (lying in the lower-right part) of SDMOEA-TSS with those of EMOMIS-PbE, which generates solutions with good accuracy. It should be noted that, since all the algorithms are MOEAs, which return a set of non-dominated solutions, one single solution must be selected from the non-dominated set; we adopt the decision-making scheme used in ParetoTSS [25], where the solution with the highest sum of reduction rate and accuracy is chosen as the output. The results of these two groups of experiments are reported in Tables 4 and 5, respectively.
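This sum-model decision-making scheme reduces to a one-line selection over the final non-dominated set; a minimal sketch (names ours), assuming each solution is stored as an (accuracy, reduction rate, individual) tuple:

```python
def pick_solution(front):
    """Select one solution from a non-dominated set `front`.

    Returns the solution with the highest accuracy + reduction rate sum,
    as in the sum-model decision scheme of ParetoTSS [25].
    """
    return max(front, key=lambda s: s[0] + s[1])
```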

Table 4: The comparison results of SDMOEA-TSS (solutions in subregion A), ParetoTSS and POSS (Acc = accuracy, Red = reduction rate).

Datasets | SDMOEA-TSS (Acc) | ParetoTSS (Acc) | POSS (Acc) | SDMOEA-TSS (Red) | ParetoTSS (Red) | POSS (Red)
Heart | 0.8327(1) | 0.8167(3) | 0.8222(2) | 0.9869(1) | 0.9793(2) | 0.9739(3)
Cleveland | 0.5838(1) | 0.5628(3) | 0.5690(2) | 0.9860(1) | 0.9743(3) | 0.9771(2)
Bupa | 0.6097(3) | 0.6367(2) | 0.6511(1) | 0.9841(1) | 0.9767(2) | 0.9714(3)
Bands | 0.6682(2) | 0.6617(3) | 0.6701(1) | 0.9787(1) | 0.9775(2) | 0.9685(3)
Saheart | 0.7085(1) | 0.7070(2) | 0.7007(3) | 0.9847(1) | 0.9835(2) | 0.9819(3)
Australian | 0.8643(1) | 0.8575(3) | 0.8604(2) | 0.9904(1) | 0.9893(2) | 0.9872(3)
Pima | 0.7608(1) | 0.7487(3) | 0.7570(2) | 0.9878(2) | 0.9888(1) | 0.9865(3)
Geman | 0.7300(2) | 0.7320(1) | 0.7243(3) | 0.9824(2) | 0.9806(3) | 0.9833(1)
Yeast | 0.5310(3) | 0.5405(2) | 0.5439(1) | 0.9956(1) | 0.9878(3) | 0.9940(2)
Phoneme | 0.7761(3) | 0.7762(2) | 0.7767(1) | 0.9980(2) | 0.9967(3) | 0.9981(1)
Tae | 0.5083(3) | 0.5367(2) | 0.5408(1) | 0.9779(1) | 0.9573(2) | 0.9524(3)
Hayes-roth | 0.5875(3) | 0.6688(1) | 0.6021(2) | 0.9771(1) | 0.9236(3) | 0.9391(2)
Haberman | 0.7361(3) | 0.7420(2) | 0.7462(1) | 0.9913(1) | 0.9877(2) | 0.9835(3)
Wdbc | 0.9684(2) | 0.9737(1) | 0.9613(3) | 0.9910(2) | 0.9918(1) | 0.9900(3)
DrivFace | 0.9833(1) | 0.8833(3) | 0.9016(2) | 0.9670(3) | 0.9963(1) | 0.9927(2)
Pd speech features | 0.8400(2) | 0.8533(1) | 0.7763(3) | 0.9855(2) | 0.9941(1) | 0.9853(3)
Winequality-red | 0.5866(1) | 0.5860(2) | 0.5768(3) | 0.9958(1) | 0.9923(3) | 0.9926(2)
Titanic | 0.7833(1) | 0.7828(2) | 0.7746(3) | 0.9985(1) | 0.9979(3) | 0.9982(2)
Segment | 0.8900(1) | 0.8723(2) | 0.8205(3) | 0.9268(3) | 0.9484(2) | 0.9966(1)
Winequality-white | 0.5205(1) | 0.5196(3) | 0.5201(2) | 0.9916(3) | 0.9938(2) | 0.9977(1)
Penbased | 0.9601(1) | 0.9570(2) | 0.7753(3) | 0.9890(2) | 0.9830(3) | 0.9990(1)
AverageRank | 1.76 | 2.14 | 2.10 | 1.57 | 2.19 | 2.24

It can be observed from Table 4 that the solutions obtained in subregion A of SDMOEA-TSS have better accuracy and reduction rate than those of POSS and ParetoTSS. Meanwhile, the experimental results in Table 5 show that, compared with EMOMIS-PbE, the solutions provided by subregion C of SDMOEA-TSS are also better or comparable. From the comparison results in Tables 4 and 5, we can conclude that the solutions achieved by the different subregions of SDMOEA-TSS are also of high quality.

Table 5: The comparison results between SDMOEA-TSS (solutions in subregion C) and EMOMIS-PbE (Acc = accuracy, Red = reduction rate).

Datasets | SDMOEA-TSS (Acc) | EMOMIS-PbE (Acc) | SDMOEA-TSS (Red) | EMOMIS-PbE (Red)
Heart | 0.8330(2) | 0.8340(1) | 0.6503(2) | 0.7341(1)
Cleveland | 0.5569(1) | 0.5510(2) | 0.7091(1) | 0.7049(2)
Bupa | 0.6501(1) | 0.6396(2) | 0.7264(1) | 0.6925(2)
Bands | 0.7030(1) | 0.6913(2) | 0.7207(1) | 0.7006(2)
Saheart | 0.7020(2) | 0.7098(1) | 0.6538(2) | 0.6863(1)
Australian | 0.8580(2) | 0.8618(1) | 0.6332(2) | 0.6569(1)
Pima | 0.7712(1) | 0.7639(2) | 0.6358(2) | 0.6477(1)
Geman | 0.7597(2) | 0.7607(1) | 0.6631(1) | 0.6109(2)
Yeast | 0.5777(1) | 0.5687(2) | 0.7065(1) | 0.5944(2)
Phoneme | 0.7911(1) | 0.7905(2) | 0.6228(1) | 0.5517(2)
Tae | 0.5488(1) | 0.5336(2) | 0.7579(1) | 0.7093(2)
Hayes-roth | 0.7375(2) | 0.7417(1) | 0.7958(1) | 0.7143(2)
Haberman | 0.7363(1) | 0.7361(2) | 0.6369(2) | 0.7328(1)
Wdbc | 0.9701(1) | 0.9666(2) | 0.6235(2) | 0.6794(1)
DrivFace | 1.0000(1) | 0.9000(2) | 0.6495(1) | 0.6484(2)
Pd speech features | 0.8933(1) | 0.8400(2) | 0.6843(1) | 0.5991(2)
Winequality-red | 0.5854(1) | 0.5850(2) | 0.6397(1) | 0.5804(2)
Titanic | 0.7806(2) | 0.7866(1) | 0.6099(1) | 0.6039(2)
Segment | 0.9208(2) | 0.9219(1) | 0.6307(1) | 0.5702(2)
Winequality-white | 0.5306(1) | 0.5203(2) | 0.6332(1) | 0.5429(2)
Penbased | 0.9850(2) | 0.9860(1) | 0.6546(1) | 0.5840(2)
AverageRank | 1.38 | 1.62 | 1.29 | 1.71

4.2.4. The Experimental Results Evaluated by Running Time

In this part of the experiments, we concentrate on the computational costs of the different algorithms. Therefore, we report the running times of the six algorithms on the experimental data sets; the detailed comparison results are given in Table 6. From the table, we can see that the POSS algorithm has the shortest running time on all the data sets, and ParetoTSS performs second best. The proposed SDMOEA-TSS is the third best, faster than EMOMIS-PbE, MOEA/D-TSS and MOEO-TSS.

The running time results of the different algorithms are mainly attributed to the following two facts. The first fact is that all the comparison algorithms evaluate an individual (representing an SVM training subset) by training an SVM classifier, and the more bits of the individual that are set to 1 (i.e., the more instances selected in the training subset), the more time is needed to evaluate that individual.

ual. Another fact is that POSS and ParetoTSS have designed special strategies to ensure their individuals in the population have more 0 bits than those of 1. To be specific, In POSS, a zero initialization individual strategy is suggested, and in each evolution only one bit of the individual is flipped to 1, which guarantees the individuals in POSS have much more 0 than 1. Similarly, a modified bit flip mutation strategy is developed in ParetoTSS, which ensures most individuals in ParetoTSS have less instances. On the contrary, other three comparison algorithms (EMOMISPbE, MOEA/D-TSS and MOEO-TSS) do not design special strategies to reduce the number of 0 bits in individuals, thus, their running times are quite long. It should be noted that in the proposed SDMOEA-TSS, some individuals (such as the individuals in subregions A and B) have higher mutation probability of flipping from 1 to 0, while other individuals (such as the ones in subregion C) have the same mutation probability of flipping from 1 to 0 as 0 to 1. Therefore, the running time of the SDMOEA-TSS is worse than POSS and ParetoTSS, but better than EMOMIS-PbE, MOEA/D-TSS and MOEO-TSS. Table 6: The running time of six compared algorithms on different data sets (seconds). Data set
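As an illustration of this asymmetry, the following sketch shows one way such a biased bit-flip mutation can be implemented. This is our own sketch rather than the authors' code, and the probability values in the example call are placeholders.

import random

def asymmetric_bitflip(individual, p10, p01):
    # individual: binary-encoded training subset (1 = instance selected).
    # A 1 bit is flipped to 0 with probability p10; a 0 bit is flipped to 1
    # with probability p01.  Choosing p10 > p01 biases offspring toward
    # smaller subsets (cheaper to evaluate, as in subregions A and B),
    # while p10 == p01 recovers the symmetric mutation used in subregion C.
    return [1 - bit if random.random() < (p10 if bit else p01) else bit
            for bit in individual]

# Placeholder probabilities with a strong bias toward deselecting instances.
child = asymmetric_bitflip([1, 0, 1, 1, 0, 1], p10=0.2, p01=0.05)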

Table 6: The running time of six compared algorithms on different data sets (seconds).

Data set             SDMOEA-TSS     ParetoTSS     EMOMIS-PbE     POSS         MOEA/D-TSS     MOEO-TSS
Heart                208.2(3)       189.4(2)      254.0(4)       142.1(1)     499.1(6)       365.9(5)
Cleveland            300.4(3)       209.0(2)      494.3(4)       189.9(1)     767.6(6)       611.1(5)
Bupa                 224.2(3)       202.9(2)      341.3(4)       151.5(1)     602.7(6)       489.8(5)
Bands                333.8(3)       285.0(2)      601.2(4)       187.9(1)     859.6(6)       786.8(5)
Saheart              345.0(3)       282.1(2)      499.2(4)       188.5(1)     764.8(6)       720.0(5)
Australian           565.4(3)       428.0(2)      958.5(4)       294.3(1)     1151.7(5)      1175.8(6)
Pima                 715.0(3)       395.5(2)      1206.2(4)      278.9(1)     1359.3(5)      1429.2(6)
Geman                2196.8(3)      1001.3(2)     4242.9(4)      485.3(1)     4468.9(5)      4884.1(6)
Yeast                2737.4(3)      1135.8(2)     8118.2(4)      590.3(1)     8693.3(5)      9223.0(6)
Tae                  117.9(2)       126.0(3)      126.2(4)       81.6(1)      299.6(6)       168.7(5)
Hayes-roth           114.7(2)       138.9(3)      147.5(4)       105.1(1)     313.6(6)       154.2(5)
Haberman             175.9(2)       178.5(3)      195.3(4)       129.9(1)     463.8(6)       309.4(5)
Wdbc                 566.3(3)       330.9(2)      852.8(4)       298.3(1)     1011.9(5)      1053.0(6)
DrivFace             102591.5(3)    39785.2(2)    172489.1(6)    39643.0(1)   158342.2(4)    164376.6(5)
Pd speech features   18039.1(3)     5700.3(2)     35546.2(4)     5698.1(1)    36371.6(5)     37352.9(6)
Winequality-red      4183.3(3)      1297.1(2)     9115.6(4)      561.9(1)     9683.3(5)      10805.0(6)
Titanic              2540.7(3)      797.0(2)      5235.3(4)      632.8(1)     5703.1(5)      6544.2(6)
Segment              7684.9(3)      3620.8(2)     15269.3(5)     991.4(1)     15172.8(4)     16548.0(6)
Winequality-white    16989.4(3)     3494.7(2)     48176.6(5)     1569.2(1)    43462.1(4)     110812.5(6)
Phoneme              30992.1(3)     5029.3(2)     98624.4(4)     1574.3(1)    101367.0(5)    108267.4(6)
Penbased             91898.5(3)     49845.4(2)    162840.6(5)    5230.2(1)    159584.2(4)    164201.2(6)
AverageRank          13500.8(2.85)  5469.3(2.13)  26920.6(4.23)  2810.7(1)    26235.1(5.18)  21937(5.61)
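For reference, the AverageRank rows in Tables 4-8 follow the usual convention: the compared algorithms are ranked on each data set (rank 1 is best, with ties receiving the mean of the positions they occupy, e.g. the 1.5 ranks for exact ties), and the ranks are then averaged over all data sets. The following sketch is our own illustration of this computation, not the authors' evaluation code.

def average_ranks(results, lower_is_better=False):
    # results: {dataset: [score of each algorithm]}.
    # Returns the mean rank of each algorithm over all data sets.
    n_alg = len(next(iter(results.values())))
    totals = [0.0] * n_alg
    for scores in results.values():
        order = sorted(range(n_alg), key=lambda i: scores[i],
                       reverse=not lower_is_better)
        i = 0
        while i < n_alg:
            j = i
            while j + 1 < n_alg and scores[order[j + 1]] == scores[order[i]]:
                j += 1  # extend the block of tied scores
            mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i..j
            for k in range(i, j + 1):
                totals[order[k]] += mean_rank
            i = j + 1
    return [t / len(results) for t in totals]

# Running time: lower is better, so the fastest algorithm gets rank 1.
ranks = average_ranks({"Heart": [208.2, 189.4, 254.0, 142.1, 499.1, 365.9]},
                      lower_is_better=True)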


4.3. The Effectiveness of the Suggested Strategies in SDMOEA-TSS
As mentioned in Section 3, there are two main strategies suggested in the proposed SDMOEA-TSS: the divided based initialization strategy and the subregion based evolutionary strategy. In this section, we investigate the effectiveness of these two strategies on the performance of SDMOEA-TSS.

4.3.1. The Effectiveness of the Divided based Initialization Strategy
In the proposed SDMOEA-TSS, a divided based initialization strategy is suggested to initialize the population P to locate in three different regions of objective space. Here, we investigate the effectiveness of this strategy by comparing SDMOEA-TSS with SDMOEA-TSS-Ran, which is the same algorithm as ours except that it adopts random initialization and its three subregions are divided with the same angle (30°). The comparison results between SDMOEA-TSS and SDMOEA-TSS-Ran in terms of HV, accuracy and reduction rate are depicted in Table 7. It can be observed from the table that the proposed SDMOEA-TSS has better performance than SDMOEA-TSS-Ran when measured by HV, accuracy and reduction rate, which demonstrates the effectiveness of the suggested divided based initialization strategy.
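The exact division procedure is given in Section 3; purely to illustrate the intent, the sketch below seeds individuals at several different densities of 1 bits, so that the initial population is spread over different regions of the objective space (in particular along the reduction-rate axis) instead of clustering in one region. The three density bands are our own placeholders, not the authors' settings.

import random

def divided_initialization(pop_size, n_instances,
                           bands=((0.05, 0.25), (0.25, 0.55), (0.55, 0.85))):
    # Each band is a range of selection ratios (fraction of 1 bits), i.e. a
    # range of reduction rates; individuals are assigned to the bands in
    # round-robin fashion so that every region of objective space receives
    # roughly pop_size / len(bands) members.
    population = []
    for i in range(pop_size):
        low, high = bands[i % len(bands)]
        ratio = random.uniform(low, high)
        population.append([1 if random.random() < ratio else 0
                           for _ in range(n_instances)])
    return population

pop = divided_initialization(pop_size=30, n_instances=200)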

Table 7: The comparison results between SDMOEA-TSS and SDMOEA-TSS-Ran

                     HV                             Accuracy                       Reduction rate
Datasets             SDMOEA-TSS  SDMOEA-TSS-Ran     SDMOEA-TSS  SDMOEA-TSS-Ran     SDMOEA-TSS  SDMOEA-TSS-Ran
Heart                0.9053(1)   0.9038(2)          0.8377(2)   0.8413(1)          0.9879(1)   0.9774(2)
Cleveland            0.7299(1)   0.7022(2)          0.5896(1)   0.5756(2)          0.9848(1)   0.9676(2)
Bupa                 0.7368(1)   0.7149(2)          0.5845(1)   0.5788(2)          0.9856(1)   0.9732(2)
Bands                0.7779(1)   0.7631(2)          0.6657(1)   0.6538(2)          0.9807(1)   0.9725(2)
Saheart              0.7958(1)   0.7832(2)          0.7142(1)   0.7129(2)          0.9841(1)   0.9825(2)
Australian           0.9019(1)   0.8999(2)          0.8560(2)   0.8589(1)          0.9897(1)   0.9877(2)
Pima                 0.8204(1)   0.8125(2)          0.7591(2)   0.7631(1)          0.9903(1)   0.9856(2)
Geman                0.8412(1)   0.8224(2)          0.7313(1)   0.7300(2)          0.9832(1)   0.9802(2)
Yeast                0.6397(1)   0.6172(2)          0.5308(2)   0.5453(1)          0.9834(1)   0.8937(2)
Phoneme              0.8162(1)   0.8121(2)          0.7779(1)   0.7776(2)          0.9977(1.5) 0.9977(1.5)
Tae                  0.6678(1)   0.6487(2)          0.5442(1)   0.5296(2)          0.9713(1)   0.9397(2)
Hayes-roth           0.8135(1)   0.8070(2)          0.7125(1)   0.6750(2)          0.9000(1)   0.8785(2)
Haberman             0.7975(1)   0.7921(2)          0.7525(1)   0.7415(2)          0.9906(1)   0.9855(2)
Wdbc                 0.9823(1)   0.9746(2)          0.9631(1)   0.9368(2)          0.9898(2)   0.9938(1)
DrivFace             0.9688(1)   0.9640(2)          0.9200(1)   0.9180(2)          0.9689(2)   0.9780(1)
Pd speech features   0.8797(1)   0.8651(2)          0.8667(1)   0.8267(2)          0.9765(2)   0.9912(1)
Winequality-red      0.6476(1)   0.6384(2)          0.5753(1)   0.5497(2)          0.9924(1)   0.9715(2)
Titanic              0.8068(1)   0.8043(2)          0.7769(2)   0.7783(1)          0.9973(2)   0.9978(1)
Segment              0.9287(1)   0.8942(2)          0.8896(2)   0.9026(1)          0.9292(1)   0.8826(2)
Winequality-white    0.5858(1)   0.5751(2)          0.5139(1)   0.4710(2)          0.9931(2)   0.9995(1)
Penbased             0.9876(1)   0.9098(2)          0.9636(2)   0.9745(1)          0.9573(1)   0.8574(2)
AverageRank          1           2                  1.33        1.67               1.26        1.74


4.3.2. The Effectiveness of the Subregion based Evolutionary Strategy
In the proposed SDMOEA-TSS, a subregion based evolutionary strategy is suggested, which not only makes full use of the individuals in each subregion for local search but also maintains the whole population's global search ability. To validate the effectiveness of this strategy, we compare the proposed SDMOEA-TSS with SDMOEA-TSS-NS, which is the same algorithm as ours except that its crossover operation is performed only within the same subregion and its mutation operation adopts a symmetric mutation probability. The experimental results of SDMOEA-TSS and SDMOEA-TSS-NS in terms of HV, accuracy and reduction rate are depicted in Table 8. From the table we can find that the proposed SDMOEA-TSS achieves a much higher HV and reduction rate than SDMOEA-TSS-NS, and has comparable accuracy. This verifies the effectiveness of the suggested subregion based evolutionary strategy.
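To make the difference concrete, the sketch below shows a mate selection in which the second parent is drawn from a different subregion with some probability, whereas SDMOEA-TSS-NS always mates within a single subregion. This is our own illustration; the angle boundaries and the mixing probability are placeholders, not the authors' settings.

import math
import random

def subregion(objs, boundaries=(30.0, 60.0)):
    # Assign a solution to subregion "A", "B" or "C" by the angle (in
    # degrees) of its objective vector; the boundary angles are placeholders.
    angle = math.degrees(math.atan2(objs[1], objs[0]))
    if angle < boundaries[0]:
        return "A"
    return "B" if angle < boundaries[1] else "C"

def select_parents(population, objectives, p_cross_region=0.3):
    # The mate is taken from the same subregion with probability
    # 1 - p_cross_region (local search) and from a different subregion
    # otherwise (global search).
    i1 = random.randrange(len(population))
    r1 = subregion(objectives[i1])
    same = random.random() >= p_cross_region
    pool = [i for i in range(len(population))
            if i != i1 and (subregion(objectives[i]) == r1) == same]
    i2 = random.choice(pool) if pool else i1
    return population[i1], population[i2]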

Table 8: The comparison results between SDMOEA-TSS and SDMOEA-TSS-NS

                     HV                            Accuracy                      Reduction rate
Datasets             SDMOEA-TSS  SDMOEA-TSS-NS     SDMOEA-TSS  SDMOEA-TSS-NS     SDMOEA-TSS  SDMOEA-TSS-NS
Heart                0.9053(1)   0.9049(2)         0.8377(1)   0.8201(2)         0.9879(1)   0.9686(2)
Cleveland            0.7299(1)   0.7292(2)         0.5896(1)   0.5691(2)         0.9848(1)   0.9711(2)
Bupa                 0.7368(1)   0.7353(2)         0.5845(2)   0.6076(1)         0.9856(1)   0.9537(2)
Bands                0.7779(1)   0.7692(2)         0.6657(2)   0.6785(1)         0.9807(1)   0.9509(2)
Saheart              0.7958(1)   0.7788(2)         0.7142(1)   0.7078(2)         0.9841(1)   0.9611(2)
Australian           0.9019(1)   0.8674(2)         0.8560(2)   0.8565(1)         0.9897(1)   0.9490(2)
Pima                 0.8204(1)   0.7869(2)         0.7591(1)   0.7575(2)         0.9903(1)   0.9420(2)
Geman                0.8412(1)   0.7917(2)         0.7313(1)   0.7290(2)         0.9832(1)   0.9249(2)
Yeast                0.6397(1)   0.5992(2)         0.5308(2)   0.5389(1)         0.9834(1)   0.9051(2)
Phoneme              0.8162(1)   0.7490(2)         0.7779(1)   0.7774(2)         0.9977(1)   0.9060(2)
Tae                  0.6678(1)   0.6640(2)         0.5442(2)   0.5550(1)         0.9713(1)   0.9617(2)
Hayes-roth           0.8135(1)   0.8100(2)         0.7125(1)   0.6938(2)         0.9000(2)   0.9049(1)
Haberman             0.7975(1)   0.7935(2)         0.7525(1)   0.7157(2)         0.9906(1)   0.9782(2)
Wdbc                 0.9823(1)   0.9529(2)         0.9631(2)   0.9683(1)         0.9898(1)   0.9588(2)
DrivFace             0.9688(1)   0.9234(2)         0.9000(2)   0.9333(1)         0.9689(1)   0.9304(2)
Pd speech features   0.8797(1)   0.8265(2)         0.8667(1)   0.8421(2)         0.9765(1)   0.9309(2)
Winequality-red      0.6476(1)   0.6004(2)         0.5753(2)   0.5829(1)         0.9924(1)   0.9156(2)
Titanic              0.8068(1)   0.7521(2)         0.7769(1)   0.7765(2)         0.9973(1)   0.9215(2)
Segment              0.9287(1)   0.8870(2)         0.8896(2)   0.8916(1)         0.9292(1)   0.8643(2)
Winequality-white    0.5858(1)   0.5401(2)         0.5215(1)   0.5200(2)         0.9931(1)   0.8977(2)
Penbased             0.9876(1)   0.9074(2)         0.9636(2)   0.9680(1)         0.9573(1)   0.9001(2)
AverageRank          1           2                 1.48        1.52              1.05        1.95

5. Conclusion and Future Work
In this paper, we proposed a subregion division based multi-objective evolutionary method, termed SDMOEA-TSS, for SVM training set selection. The proposed SDMOEA-TSS adopted

a similar framework to NSGA-II and simultaneously optimized two conflicting objectives, accuracy and reduction rate. To achieve training sets with good quality, a divided based initialization strategy was firstly suggested, and then a subregion based evolutionary (including crossover, mutation and update) strategy was further designed. By using these two strategies, SDMOEA-TSS could not only make full use of the individuals in each subregion for local search but also maintain the whole population's global search ability. Experimental results on benchmark data sets have demonstrated the superiority of the proposed method over the state-of-the-arts.

Due to the broad application scenarios of TSS for SVM, there still exist some interesting topics to be further investigated. For example, all the existing MOEAs for TSS (including the proposed method) adopt a binary encoding scheme, which means the length of an individual equals the size of the original training set. When the training set is very large, the search space of these MOEAs increases greatly, which makes them inefficient for large-scale data sets. Thus, one interesting future work is to design efficient search strategies for these MOEAs, so as to solve the SVM TSS problem in large-scale situations. In addition, the experimental results in Section 4.2.4 have shown that the evaluation cost of the proposed SDMOEA-TSS is still high, which results in its long running time. In the future, we plan to develop MOEAs with low evaluation cost by considering surrogate models [52].

Credit author statement
This is a manuscript submitted to Neurocomputing for consideration of publication. The paper has not been published, nor has it been submitted elsewhere for publication. All authors are aware of this submission.

Declaration of Competing Interest
The authors declare that there are no conflicts of interest regarding this paper.

Acknowledgments
This work is supported by the Natural Science Foundation of China (Grant Nos. U1804262, 61976001 and 61876184), the Humanities and Social Sciences Project of the Chinese Ministry of Education (Grant No. 18YJC870004), the Natural Science Foundation of Anhui Province (Grant Nos. 1708085MF166 and 1908085MF219), and the Key Program of the Natural Science Project of the Educational Commission of Anhui Province (Grant No. KJ2017A013).

References
[1] D. Fradkin, F. Mörchen, Mining sequential patterns for classification, Knowledge and Information Systems 45 (3) (2015) 731–749.
[2] L. Yang, K. Wen, Q. Gao, X. Gao, F. Nie, SVM based multi-label learning with missing labels for image annotation, Pattern Recognition 78 (2018) 307–317.
[3] F. Ye, Evolving the SVM model based on a hybrid method using swarm optimization techniques in combination with a genetic algorithm for medical diagnosis, Multimedia Tools and Applications (3) (2018) 3889–3918.
[4] U. Khan, L. Schmidt-Thieme, A. Nanopoulos, Collaborative SVM classification in scale-free peer-to-peer networks, Expert Systems with Applications 69 (2017) 74–86.
[5] C. C. Chang, C. J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (3) (2011) 27.


[6] L. L. Wang, H. Y. Ngan, N. H. Yung, Automatic incident classification for large-scale traffic data by adaptive boosting SVM, Information Sciences 467 (2018) 59–73.
[7] D. Lee, LS-GKM: a new gkm-SVM for large-scale datasets, Bioinformatics 32 (14) (2016) 2196–2198.
[8] J. R. Cano, F. Herrera, M. Lozano, Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study, IEEE Transactions on Evolutionary Computation 7 (6) (2003) 561–575.
[9] J. Nalepa, M. Kawulok, Selecting training sets for support vector machines: a review, Artificial Intelligence Review (6) (2018) 1–44.
[10] N. Verbiest, J. Derrac, C. Cornelis, S. García, F. Herrera, Evolutionary wrapper approaches for training set selection as preprocessing mechanism for support vector machines: Experimental evaluation and support vector analysis, Applied Soft Computing 38 (2016) 10–22.
[11] I. Czarnowski, Cluster-based instance selection for machine classification, Knowledge and Information Systems 30 (1) (2012) 113–133.
[12] J. Cervantes, F. G. Lamont, A. López-Chau, L. R. Mazahua, J. S. Ruíz, Data selection based on decision tree for SVM classification on large data sets, Applied Soft Computing 37 (2015) 787–798.
[13] S. Tong, E. Chang, Support vector machine active learning for image retrieval, in: Proceedings of the Ninth ACM International Conference on Multimedia, ACM, 2001, pp. 107–118.
[14] Y. Fu, X. Zhu, B. Li, A survey on instance selection for active learning, Knowledge and Information Systems 35 (2) (2013) 249–283.
[15] M. Kawulok, J. Nalepa, Support vector machines training data selection using a genetic algorithm, in: Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition and Structural and Syntactic Pattern Recognition, Springer, 2012, pp. 557–565.
[16] J. Nalepa, M. Kawulok, Adaptive genetic algorithm to select training data for support vector machines, in: European Conference on the Applications of Evolutionary Computation, Springer, 2014, pp. 514–525.
[17] M. Kawulok, J. Nalepa, W. Dudzik, An alternating genetic algorithm for selecting SVM model and training set, in: Mexican Conference on Pattern Recognition, Springer, 2017, pp. 94–104.
[18] J. Nalepa, M. Kawulok, The smaller, the better: Selecting refined SVM training sets using adaptive memetic algorithm, in: Proceedings of the 2016 Genetic and Evolutionary Computation Conference Companion, ACM, 2016, pp. 165–166.
[19] J. Nalepa, M. Kawulok, Adaptive memetic algorithm enhanced with data geometry analysis to select training data for SVMs, Neurocomputing 185 (2016) 113–132.
[20] P. B. Miranda, R. B. Prudêncio, A. C. P. de Carvalho, C. Soares, Multi-objective optimization and meta-learning for SVM parameter selection, in: The 2012 International Joint Conference on Neural Networks, IEEE, 2012, pp. 1–8.
[21] A. Rosales-Pérez, J. A. Gonzalez, C. A. C. Coello, H. J. Escalante, C. A. Reyes-Garcia, Surrogate-assisted multi-objective model selection for support vector machines, Neurocomputing 150 (2015) 163–172.
[22] H. G. Jung, G. Kim, Support vector number reduction: Survey and experimental evaluations, IEEE Transactions on Intelligent Transportation Systems 15 (2) (2013) 463–476.
[23] R. Pighetti, D. Pallez, F. Precioso, Improving SVM training sample selection using multi-objective evolutionary algorithm and LSH, in: 2015 IEEE Symposium Series on Computational Intelligence, IEEE, 2015, pp. 1383–1390.
[24] A. Rosales-Pérez, S. García, J. A. Gonzalez, C. A. C. Coello, F. Herrera, An evolutionary multiobjective model and instance selection for support vector machines with Pareto-based ensembles, IEEE Transactions on Evolutionary Computation 21 (6) (2017) 863–877.
[25] G. Acampora, F. Herrera, G. Tortora, A. Vitiello, A multi-objective evolutionary approach to training set selection for support vector machine, Knowledge-Based Systems 147 (2018) 94–108.
[26] G. Cerruela García, A. de Haro García, J. P. P. Toledano, N. García Pedrajas, Improving the combination of results in the ensembles of prototype selectors, Neural Networks 118 (2019) 175–191.
[27] E. Leyva, A. González, R. Pérez, Three new instance selection methods based on local sets: A comparative study with several approaches from a bi-objective perspective, Pattern Recognition 48 (4) (2015) 1523–1537.
[28] L. I. Kuncheva, Á. Arnaiz-González, J. F. Díez-Pastor, I. A. Gunn, Instance selection improves geometric mean accuracy: a study on imbalanced data classification, Progress in Artificial Intelligence 8 (2) (2019) 215–228.
[29] E. Giasson, A. Ten Caten, T. Bagatini, B. Bonfatti, Instance selection in digital soil mapping: a study case in Rio Grande do Sul, Brazil, Ciência Rural 45 (9) (2015) 1592–1598.
[30] C. Davatz, C. Inzinger, J. Scheuner, P. Leitner, An approach and case study of cloud instance type selection for multi-tier web applications, in: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, IEEE Press, 2017, pp. 534–543.
[31] J. A. Olvera-López, J. A. Carrasco-Ochoa, J. F. Martínez-Trinidad, J. Kittler, A review of instance selection methods, Artificial Intelligence Review 34 (2) (2010) 133–143.
[32] C. Qian, Y. Yu, Z. H. Zhou, Subset selection by Pareto optimization, in: Advances in Neural Information Processing Systems, 2015, pp. 1774–1782.


[33] Y. Tian, X. Zhang, C. Wang, Y. Jin, An evolutionary algorithm for large-scale sparse multi-objective optimization problems, IEEE Transactions on Evolutionary Computation, 2019.
[34] D. L. Wilson, Asymptotic properties of nearest neighbor rules using edited data, IEEE Transactions on Systems, Man, and Cybernetics (3) (1972) 408–421.
[35] D. R. Wilson, T. R. Martinez, Reduction techniques for instance-based learning algorithms, Machine Learning 38 (3) (2000) 257–286.
[36] J. Chen, C. Zhang, X. Xue, C. L. Liu, Fast instance selection for speeding up support vector machines, Knowledge-Based Systems 45 (2013) 1–7.
[37] D. E. Goldberg, J. H. Holland, Genetic algorithms and machine learning, Machine Learning 3 (2) (1988) 95–99.
[38] T. R. Babu, M. N. Murty, Comparison of genetic algorithm based prototype selection schemes, Pattern Recognition 34 (2) (2001) 523–525.
[39] A. Pradhan, Support vector machine: A survey, International Journal of Emerging Technology and Advanced Engineering 2 (8) (2012) 82–85.
[40] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation 6 (2) (2002) 182–197.
[41] E. Zitzler, M. Laumanns, L. Thiele, SPEA2: Improving the strength Pareto evolutionary algorithm, TIK-Report 103.
[42] Q. F. Zhang, H. Li, MOEA/D: A multiobjective evolutionary algorithm based on decomposition, IEEE Transactions on Evolutionary Computation 11 (6) (2007) 712–731.
[43] Y. Tian, R. Cheng, X. Y. Zhang, F. Cheng, Y. C. Jin, An indicator-based multiobjective evolutionary algorithm with reference point adaptation for better versatility, IEEE Transactions on Evolutionary Computation 22 (4) (2017) 609–622.
[44] G. Q. Zeng, J. Chen, L. M. Li, M. R. Chen, L. Wu, Y. X. Dai, C. W. Zheng, An improved multi-objective population-based extremal optimization algorithm with polynomial mutation, Information Sciences 330 (2016) 49–73.
[45] L. Zhang, H. B. Pan, Y. S. Su, X. Y. Zhang, Y. Y. Niu, A mixed representation-based multiobjective evolutionary algorithm for overlapping community detection, IEEE Transactions on Cybernetics 47 (9) (2017) 2703–2716.
[46] X. Y. Zhang, F. C. Duan, L. Zhang, F. Cheng, Y. C. Jin, K. Tang, Pattern recommendation in task-oriented applications: A multi-objective perspective, IEEE Computational Intelligence Magazine 12 (3) (2017) 43–53.
[47] Q. G. Zeng, J. Chen, X. Yu, L. M. Li, C. R. Zheng, M. R. Chen, Design of fractional order PID controller for automatic regulator voltage system based on multi-objective extremal optimization, Neurocomputing 160 (2015) 173–184.
[48] M. R. Chen, G. Q. Zeng, K. D. Lu, Constrained multi-objective population extremal optimization based economic-emission dispatch incorporating renewable energy resources, Renewable Energy 143 (2019) 277–294.
[49] E. Zitzler, L. Thiele, Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach, IEEE Transactions on Evolutionary Computation 3 (4) (1999) 257–271.
[50] S. Y. Jiang, S. X. Yang, X. Sheng, L. Wang, X. B. Liu, Scalarizing functions in decomposition-based multiobjective evolutionary algorithms, IEEE Transactions on Evolutionary Computation 22 (2) (2018) 296–313.
[51] C. F. R. Lacour, K. Klamroth, A box decomposition algorithm to compute the hypervolume indicator, Computers & Operations Research 79 (2017) 347–360.
[52] W. Y. Gong, A. M. Zhou, Z. H. Cai, A multioperator search strategy based on cheap surrogate models for evolutionary optimization, IEEE Transactions on Evolutionary Computation 19 (5) (2015) 746–758.


Fan Cheng received the B.Sc. in 2000 and the M.Sc. in 2003, both from Hefei University of Technology, China, and the Ph.D. in 2012 from the University of Science and Technology of China. He is now an Associate Professor in the School of Computer Science and Technology at Anhui University, China. His main research interests include machine learning, imbalanced classification, multi-objective optimization, and complex networks.

Jiabin Chen is a Master's student in the School of Computer Science and Technology at Anhui University, China. He received the B.Sc. in 2017 from Anhui Jianzhu University, China. His research interests are multi-objective optimization and instance selection.

Jianfeng Qiu received the B.Sc. from Anqing Normal University in 2003, and the M.Sc. in 2006 and the Ph.D. in 2014 from Anhui University, China. Currently, he is a lecturer in the School of Computer Science and Technology, Anhui University, China. His main research interests include machine learning, imbalanced classification, multi-objective optimization, and complex networks.


Lei Zhang received the B.Sc. from Anhui Agriculture University in 2007, and the Ph.D. in 2014 from the University of Science and Technology of China. Currently, he is an Associate Professor in the School of Computer Science and Technology, Anhui University, China. His main research interests include multi-objective optimization and applications, data mining, social network analysis and pattern recommendation. He has published more than 40 papers in refereed conferences and journals, such as ACM SIGKDD, ACM CIKM, IEEE ICDM, IEEE TCYB, ACM TKDD, IEEE CIM and Information Sciences. He is the recipient of the ACM CIKM'12 Best Student Paper Award.
