Accepted Manuscript

Adaptive recommendation model using meta-learning for population-based algorithms

Xianghua Chu, Fulin Cai, Can Cui, Mengqi Hu, Li Li, Quande Qin

PII: S0020-0255(18)30810-7
DOI: https://doi.org/10.1016/j.ins.2018.10.013
Reference: INS 13994

To appear in: Information Sciences

Received date: 20 October 2017
Revised date: 11 October 2018
Accepted date: 12 October 2018

Please cite this article as: Xianghua Chu, Fulin Cai, Can Cui, Mengqi Hu, Li Li, Quande Qin, Adaptive recommendation model using meta-learning for population-based algorithms, Information Sciences (2018), doi: https://doi.org/10.1016/j.ins.2018.10.013

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Adaptive recommendation model using meta-learning for population-based algorithms

Xianghua Chu 1,2, Fulin Cai 1, Can Cui 3, Mengqi Hu 4, Li Li 1, Quande Qin 1,*

1. College of Management, Shenzhen University, China
2. Institute of Big Data Intelligent Management and Decision, Shenzhen University, China
3. School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, USA
4. Department of Mechanical and Industrial Engineering, University of Illinois at Chicago, USA
* Corresponding author, email: [email protected]

Abstract: To efficiently solve complex optimization problems, numerous population-based meta-heuristics and extensions have been developed. However, the performance of these algorithms varies with the problems. In this research, we propose an Adaptive Recommendation Model (ARM) that uses meta-learning to identify the appropriate problem-dependent population-based algorithm. In ARM, algorithms are adaptively selected by mapping the problem characteristics to the algorithm performance. Since the extracted meta-features and the adopted meta-learner significantly affect system performance, 18 meta-features covering statistical, geometrical and landscape properties are extracted to characterize optimization problem spaces, and both instance-based and model-based learners are investigated. Two performance metrics, Spearman's rank correlation coefficient and the success rate, are used to evaluate the accuracy of the predicted optimizer rankings and the precision of the best-optimizer recommendation. The proposed ARM is compared against population-based algorithms with distinct search capabilities, including PSO variants, non-PSO population-based optimizers, hyper-heuristics and ensemble methods. Benchmark functions and real-world problems with various properties are adopted in the experiments. Experimental results reveal the extendibility and effectiveness of ARM on the diverse tested problems in terms of solution accuracy, ranking and success rate.

Keywords: global optimization, algorithm selection, meta-learning, population-based algorithm, recommendation system

1. Introduction


The increasing complexity of real-world optimization problems urges researchers to develop efficient meta-heuristic algorithms. Consequently, numerous population-based algorithms have been proposed, e.g., particle swarm optimization (PSO) [23], differential evolution (DE) [48] and the artificial bee colony algorithm (ABC) [22]. Extensive research has investigated the applications of these algorithms to diverse global optimization problems, and the efficacy of population-based algorithms has been demonstrated [6, 7, 26, 39, 42]. However, it is known that the performances of these algorithms and their variants are problem-dependent. The immediate challenge is that, for a new problem without prior knowledge, it is difficult to identify the appropriate algorithm. The simplest solution is a trial-and-error approach, but it is computationally expensive, which limits its application to real-time, dynamic and large-scale problems. Researchers have explored strategies to enhance algorithms' general search capability, such as hybrid meta-heuristics [2] and parameter control methods [13]. Again, these methods are tailored to specific problems, and the performance of the algorithms on new problems is questionable. Recently, hyper-heuristics [4], algorithm ensembles [29] and algorithm portfolios [38] have been proposed to alleviate the dilemma of algorithm generalization. A hyper-heuristic adopts a high-level algorithm to guide low-level algorithms in tackling the problem. An algorithm portfolio focuses on allocating computation time among algorithms and fully utilizes their complementary advantages to maximize the expected utility of a problem-solving episode. While promising, these methods disregard the similarities between the current problem and previously observed problems [5]. In addition, assessing the algorithms' performances requires prior information and expert knowledge of search algorithms.


In this paper, we propose a generalized data-driven recommendation system that takes advantage of the meta-learning concept and can adaptively select the best optimizer for global optimization problems based on the problem characteristics. Meta-learning is an approach previously applied in the machine learning field. The principle of meta-learning is to investigate learning instances and refine the learning mechanism for future applications to other instances [3, 40]. Since its inception, meta-learning has been applied to different areas [43], e.g., the selection of the kernel width for support vector regression [47], clustering algorithm selection [14], energy consumption forecasting [11], meta-modeling process selection [10], and scheduling optimization [44], to name a few.

To the best of our knowledge, limited research has empirically investigated the relationship between the performance of population-based algorithms and the feature landscapes of optimization problems [34]. We contend that discovering the algorithm-problem relationship is important, as it provides general guidelines for identifying the appropriate algorithms for a given problem. To fill this gap, we develop a self-adaptive automatic learning mechanism in which the connection between the meta-features and the algorithm performance is bridged. As a result, given a new problem, the landscape features are recognized by the meta-learner to adaptively select the best optimizer for global optimization. Specifically, four challenges need to be addressed:

(1) In meta-learning, the selection of meta-features that characterize optimization problems plays a critical role in the selection performance, and it is challenging to devise a set of meta-features that is generalized and efficient enough to characterize diverse problem spaces. Therefore, what meta-features are crucial to depict the landscape of global optimization?

(2) The optimization problem is in general complex and black-box, and meta-features are generated from sampled data to represent the problem space. Hence, to characterize the problem landscape as accurately as possible, how should data be sampled efficiently, and what sample size is appropriate?

(3) Due to the large number of population-based algorithms, it is impossible to enumerate all of them in the algorithm repository. How should an algorithm repository of representative algorithms be composed for recommendation, and how can the proposed meta-learning mechanism be further validated?

(4) In addition to the benchmark functions, how well does ARM perform on more complex test functions and real-world problems?

To address the above questions, a population-based algorithm recommendation model, termed the Adaptive Recommendation Model (ARM), is proposed. There are four components in ARM: the meta-feature module, the algorithm performance module, the learning module and the recommending module. The meta-feature module consists of a set of meta-level features that capture the problem properties based on representative data sampled from the problem; the algorithm performance module contains the population-based optimizers and the metrics to evaluate their performance; the learning module obtains a mature model that maps the meta-features to the algorithm performance through a training series; and the recommending module selects an appropriate optimizer for a new problem using the trained model. The main goal is to achieve an optimal recommendation model by building a knowledge repository that maps meta-features to algorithm performance.


To validate the efficacy of the proposed ARM, an eleven-step procedure is launched: (1) A pool of 33 optimization problems collected from [7] is first constructed as the problem repository. (2) A set of population-based algorithms, i.e., six PSO variants with distinct characteristics and properties, is initially employed to build the algorithm repository. The PSOs include global topology PSO (GPSO) [42], local topology PSO (LPSO) [24], local topology PSO with constriction factor (LCPSO) [24, 26], comprehensive learning PSO (CLPSO) [26], cooperative PSO (CPSO) [15], and unified PSO (UPSO) [37]. (3) The rankings of algorithm performances on each benchmark function are captured in terms of the average best fitness value and a t-test comparison. (4) The Latin hypercube sampling (LHS) technique [17] is employed to gather a representative dataset for each benchmark function. (5) Eighteen computationally efficient meta-features (statistical, geometrical and landscape) are derived from the sampled dataset to depict the problem landscape. (6) Two types of meta-learners, instance-based (k-nearest-neighbors, k-NN) vs. model-based (artificial neural network, ANN), are implemented to study the meta-learning process under different problem dimensions and varied algorithm performance. (7) Spearman's rank correlation coefficient and the success rate are used to assess the performance of ARM. (8) The performances of the two meta-learners are validated and compared under different problem dimensions and population sizes. (9) Five population-based algorithms (DE, ABC, the backtracking search algorithm (BSA) [8], the differential search algorithm (DSA) [9] and the firefly algorithm (FA) [50]) are employed to validate the effectiveness and extendibility of ARM. (10) Three ensemble algorithms and one hyper-heuristic (ensemble of mutation strategies and parameters in DE (EPSDE) [30], ensemble particle swarm optimizer (EPSO) [29], multi-population ensemble DE (MPEDE) [49] and drone squadron optimization (DSO) [12]) are introduced into the algorithm pool to demonstrate the generalization and performance of ARM. (11) Fifteen benchmark functions from CEC 2015 and two complex real-world problems are adopted to validate the performance of ARM. Experimental results reveal that the proposed framework achieves a reliable correlation on algorithm rankings and, on average, over an 88% success rate in best-performer recommendation for the benchmark functions and real-world problems. In summary, there are three contributions from the proposed ARM:

(1) A practical set of meta-features is proposed to describe the problem space of global optimization.

(2) To the best of our knowledge, this is the first adaptive recommendation model that uses meta-learning from the machine learning domain for population-based algorithms addressing global optimization.

(3) The knowledge-based learning mechanism of ARM does not require extensive expertise, and it has the flexibility and extendibility to include more efficient optimizers for complex optimization problems.

The rest of the paper is organized as follows: related work on population-based algorithms is reviewed in Section 2. Section 3 presents a detailed description of ARM. Experiments and results are discussed in Section 4. Section 5 provides the conclusions and future work.

2. Related work

According to the "No-Free-Lunch" theorem, no single population-based algorithm performs well on all problems. Rather than improving a single population-based algorithm, researchers have attempted to integrate different algorithms to enhance adaptability and performance. Specifically, research attempts to integrate algorithms with complementary characteristics into a hybrid version, termed hybrid metaheuristics [1]. The algorithm portfolio mechanism has been proposed to avoid betting the entire computational budget on a single algorithm [38]. Another notable effort is adaptively selecting the appropriate algorithms for the problems [36]. Hyper-heuristics [4] design high-level heuristics to guide the low-level heuristics during the search process. One critique is that these methods ignore the similarities between the current problem and previously observed problems [5], and assessing the algorithms' performances requires prior information and expert knowledge of search algorithms. In addition, none of the methods reviewed above harvests the algorithm-problem relationship, which may be instrumental in selecting the appropriate algorithms for given problems. Moreover, involving more algorithms entails a greater waste of computational resources, which are usually limited in practice. In fact, the relationship between problems and algorithm performance should be considered.

Meta-learning has been widely applied in the machine learning field for selecting the appropriate parameter set or model for a complex task [3, 43]. Previous research mainly lies in two directions: parameter tuning [27, 33, 45, 47] and algorithm selection [11, 14, 20, 44]. In parameter tuning, Soares et al. adopt meta-learning to set the kernel width for support vector regression [47]; the regression error is remarkably reduced


compared with traditional methods (fixed default ranking, cross-validation and a heuristic method). In [46], meta-learning is used to recommend an initialization located in a good region, which facilitates multi-objective optimization algorithms in searching for a promising Pareto front. Parameter settings affect the performance of population-based algorithms. Smith and Fogarty theoretically summarize adaptive parameter control methods, including meta-learning, for the genetic algorithm [45]. To select the optimal parameter configuration for the CMA-ES algorithm, Muñoz et al. implement a neural network regression model to predict the expected running time on black-box problems [33]; in that study, prior expert knowledge is required to predetermine a set of candidate parameter settings as the model input. Although parameters can be appropriately set following such recommendations, it is interesting to observe that algorithm performance depends more on the solving skills than on the tuned parameters [27].

In the area of algorithm selection using meta-learning, Cui et al. [11] develop a framework that recommends an appropriate surrogate model to forecast short-term building energy consumption, where six statistical and machine learning models are studied. To help practitioners select the best clustering algorithm, high-quality rankings of clustering algorithms are predicted by the k-nearest-neighbors algorithm using the meta-learning concept [14]. Muñoz et al. [34] focus on algorithm selection for black-box continuous optimization problems and review the current landscape analysis methods. Researchers have also investigated the application of meta-learning to the scheduling problem [44] and the traveling salesman problem (TSP) [20]. In [44], Smith-Miles et al. solve the selection problem between two heuristics (earliest due date and shortest processing time) for scheduling problems, and investigate meta-learning's performance by adopting a neural network and a decision tree as performance prediction models. A meta-learning model with a specific meta-feature set for TSP instances is proposed to predict the rankings of meta-heuristics in [20].

The application of meta-learning is an emerging field, yet the applicability of meta-learning to population-based optimizers has not been fully investigated. Meta-learning, which learns from previous experience, can operate at a more generalized level. Specifically, it has the potential to study different datasets, ranging from different problem domains to different features [36]. This research develops a data-driven framework that can adaptively identify the appropriate population-based optimizer for global optimization problems. The knowledge discovered from the relationship between the meta-features (problem landscape features) and algorithm performance (optimizer performance metrics) will help the user identify the appropriate model for a new global optimization problem.

3. ARM for population-based algorithms

In this research, derived from Rice's work [40], a population-based algorithm recommendation model is proposed to adaptively select the appropriate method for numerical optimization problems, as illustrated in Fig. 1. The recommendation system can be viewed as a meta-learning based framework that models the relation between algorithm performance and optimization problem characteristics. The proposed ARM has four components: the meta-feature module, the algorithm performance module, the learning module and the recommending module. The details are described in the following subsections.

Fig. 1 Framework of ARM. (The figure shows the four modules: the problem repository feeds the meta-feature module (data sampling and meta-feature calculation) and the algorithm repository feeds the algorithm performance module (performance measurement); the learning module trains the meta-learner on the meta-features and algorithm performance; for a new problem, the recommending module samples data, calculates meta-features, selects an optimizer and solves the problem.)

3.1. Meta-feature module

Identifying the appropriate set of meta-features to characterize a problem landscape is a crucial step for meta-level induction learning [43]. Meta-feature selection and extraction greatly affect the subsequent learning performance, as the meta-learning algorithm (meta-learner) is sensitive to the set of meta-features.

To comprehensively capture the problem characteristics, in this research we consider three types of meta-features: statistical features, geometrical measurement features and landscape features. The statistical features are statistical information about the problem, including the mean, standard deviation, skewness, kurtosis, altitude of the search space, first quartile, median and third quartile. The geometrical measurement features describe the complexity of the problem surface and include the gradient-based features 1)-4), the outlier ratio and the ratio of local extrema. The basic statistical and geometrical measurement features are empirically selected from more than 40 features applied in previous meta-learning studies [10]. Landscape features with computationally efficient characteristics are chosen to represent dispersion [28] and fitness distance correlation [19]. The definitions of the 18 derived meta-features are as follows. With a sample of N data points, the gradient value $G_i$ of the ith data point is calculated as:

$$G_i = f(x_i) - f(x_i + \mathrm{step}), \quad i = 1, \ldots, N \qquad (1)$$

where $x_i = (x_i^1, x_i^2, \ldots, x_i^D)$ is the position of point i, $f(x_i)$ refers to the fitness value of point i, and the step is defined as 1% of the function range.

1) $\bar{G}$: Mean of the gradient of the fitness surface, which evaluates how steep and rugged the fitness surface is according to its rate of change around the sampled data points.

$$\bar{G} = \frac{1}{N}\sum_{i=1}^{N} G_i \qquad (2)$$

2) $M(G)$: Median of the gradient of the fitness surface.

3) $SD(G)$: Standard deviation of the gradient of the fitness surface, which evaluates the variation of the gradient on the surface.

$$SD(G) = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(G_i - \bar{G}\right)^2} \qquad (3)$$

4) $G_{\max}$: Maximum of the gradient of the fitness surface, which illustrates the maximal degree of sudden change on the surface.

5) $\bar{f}$: Mean of the fitness values, which evaluates the general fitness value of the surface.

$$\bar{f} = \frac{1}{N}\sum_{i=1}^{N} f_i \qquad (4)$$

6) $SD(f)$: Standard deviation of the fitness values, which evaluates the bumpiness of the surface by measuring each value's deviation.

$$SD(f) = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(f_i - \bar{f}\right)^2} \qquad (5)$$

7) $\gamma_1(f)$: Skewness of the fitness values, which measures the lack of symmetry on the surface.

$$\gamma_1(f) = E\left[\left(\frac{f_i - \bar{f}}{SD(f)}\right)^3\right], \quad i = 1, \ldots, N \qquad (6)$$

8) $\gamma_2(f)$: Kurtosis of the fitness values, which measures the surface relative to the normal distribution.

$$\gamma_2(f) = \frac{E\left[(f_i - \bar{f})^4\right]}{\left(E\left[(f_i - \bar{f})^2\right]\right)^2}, \quad i = 1, \ldots, N \qquad (7)$$

9) $\tilde{f}$: Altitude of the search space, which evaluates the degree of the search space altitude on the basis of the difference between the upper and lower bounds of the fitness values.

$$\tilde{f} = \lg\left(\left|\max(f_i) - \min(f_i)\right|\right), \quad i = 1, \ldots, N \qquad (8)$$

10) Q1 of fitness value: 25% quartile of the response values, which represents the lower quartile of the fitness values.

11) Q2 of fitness value: 50% quartile of the response values, which represents the median of the fitness values.

12) Q3 of fitness value: 75% quartile of the response values, which represents the upper quartile of the fitness values.

13) Outlier ratio: Ratio of outliers among the fitness values, which evaluates the percentage of extreme values among all values through the Grubbs test [16].

14) & 15) Ratio of local extrema: Ratio of local minima and maxima with 4 nearby points in Euclidean space, which distinguishes fitness surfaces as bumpy or flat.

16) Depth of local extrema (DLE): Averaged depth of local minima and maxima with 4 neighborhoods in Euclidean space, which describes how vigorously swarms must jump out of local optima, by looking at the average maximum fitness-value distance among neighborhoods.

17) Dispersion (DISP): The pairwise distance between the q best points, which identifies the global structure. In this study, q is equal to 20% of the sample size.

$$DISP = \frac{1}{q(q-1)}\sum_{i=1}^{q}\sum_{j=1, j \neq i}^{q} d(x_i, x_j) \qquad (9)$$

18) Fitness distance correlation (FDC): The relationship between position and fitness value, which indicates the capability to identify deceptiveness in the landscape.

$$FDC = \frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{y_i - \bar{y}}{\sigma_y}\right)\left(\frac{d_i - \bar{d}}{\sigma_d}\right) \qquad (10)$$

where $d_i$ is the distance between point i and the best position; $\bar{y}$ and $\bar{d}$ are the average fitness value and distance; and $\sigma_y$ and $\sigma_d$ are the standard deviations of the fitness value and the distance.
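To make these definitions concrete, the sketch below computes a representative subset of the meta-features (Eqs. (1)-(5) and (8)-(10)) from a set of sampled points with NumPy/SciPy. It is an illustrative reconstruction under the conventions stated above (step = 1% of the range, q = 20% of the sample), not the authors' implementation.

```python
# Illustrative reconstruction of a subset of the 18 meta-features (not the authors' code).
import numpy as np
from scipy.spatial.distance import pdist, cdist
from scipy.stats import skew, kurtosis

def meta_features(X, f, objective, lower, upper):
    f = np.asarray(f, float)
    N = len(f)
    step = 0.01 * (np.asarray(upper, float) - np.asarray(lower, float))  # 1% of range, Eq. (1)
    G = f - np.apply_along_axis(objective, 1, X + step)                  # gradient values G_i
    feats = {
        "mean_grad": G.mean(),                                           # Eq. (2)
        "median_grad": np.median(G),
        "sd_grad": G.std(ddof=1),                                        # Eq. (3)
        "max_grad": G.max(),
        "mean_fit": f.mean(),                                            # Eq. (4)
        "sd_fit": f.std(ddof=1),                                         # Eq. (5)
        "skew_fit": skew(f),                                             # Eq. (6)
        "kurt_fit": kurtosis(f, fisher=False),                           # Eq. (7)
        "altitude": np.log10(abs(f.max() - f.min())),                    # Eq. (8)
        "q1": np.percentile(f, 25), "q2": np.percentile(f, 50), "q3": np.percentile(f, 75),
    }
    # Dispersion over the best q = 20% of the samples, Eq. (9)
    best = X[np.argsort(f)[: max(2, int(0.2 * N))]]
    feats["disp"] = pdist(best).mean()
    # Fitness distance correlation, Eq. (10): distances to the sampled best point
    d = cdist(X, X[[np.argmin(f)]]).ravel()
    feats["fdc"] = np.sum((f - f.mean()) * (d - d.mean())) / ((N - 1) * f.std() * d.std())
    return feats
```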

To develop the repository of global optimization problems, 33 problems with various properties are initially selected for knowledge training. The problems include unimodal functions without rotation, rotated unimodal functions, multimodal functions without rotation, rotated multimodal functions, and noisy and mis-scaled functions (detailed properties are summarized in Appendix A). Since including every property of optimization problems is impractical, we aim to extend the knowledge gained from these cases to address new problems. The problem set of CEC 2015 benchmark functions is then employed to validate the effectiveness of the obtained representative knowledge. Please note that more functions can be added to the problem repository for knowledge learning at a later stage, which can improve the recommendation accuracy of the proposed model.

Given the problem repository, the LHS technique is employed to sample raw data from each optimization function for landscape characterization [17]. LHS provides high uniformity and coverage of a multidimensional distribution while requiring less computational effort; it is therefore a popular statistical sampling method for propagating uncertainty in analyses of complex systems. In addition, when the sample size is small, LHS makes simulations converge faster than traditional random sampling techniques such as Monte Carlo sampling [31].
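As an illustration of this sampling step, the sketch below draws a Latin hypercube sample over a problem's search box and evaluates the objective at the sampled points. It assumes SciPy's qmc module (SciPy >= 1.7); the bounds, problem and sample size in the usage line are placeholders rather than values fixed by the paper.

```python
# A minimal sketch of the LHS step, assuming scipy.stats.qmc is available (SciPy >= 1.7).
import numpy as np
from scipy.stats import qmc

def lhs_sample(objective, lower, upper, n_samples, seed=0):
    """Draw a Latin hypercube sample of the search box and evaluate the objective."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    sampler = qmc.LatinHypercube(d=lower.size, seed=seed)
    X = qmc.scale(sampler.random(n_samples), lower, upper)  # map [0,1)^D onto the box
    f = np.apply_along_axis(objective, 1, X)
    return X, f

# Example: 2500 samples of a 30D sphere function (placeholder problem).
X, f = lhs_sample(lambda x: np.sum(x**2), [-100] * 30, [100] * 30, 2500)
```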

3.2. Algorithm performance module

Since population-based algorithms conduct stochastic search, different results can be obtained in each run. Thus, to measure algorithm performance, every algorithm is run 30 times on each benchmark function, and the maximum number of function evaluations is set to Dimension (D) × 10000. The algorithms are tested on the 33 problem functions to capture the average solution accuracy. The mean of the best fitness values (ABF) over the 30 runs, a common metric for comparing algorithm performance [7], is adopted for performance measurement:

$$ABF = \frac{1}{30}\sum_{n=1}^{30} \text{(best fitness value of run } n\text{)} \qquad (11)$$

The performance of each algorithm is sorted by the ABF value; in addition, a significance test (t-test) is conducted to validate the significance of the performance differences. A two-tailed t-test is employed for statistical comparison at a significance level of 0.05. Because the population-based algorithms are adaptively selected for a new problem, the selection is based on the rankings derived from the ABF values. This mitigates the high variance of optimization difficulty across the benchmark functions; ranking makes the recommendation process scale-free and case-wise independent [10].
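A compact sketch of this measurement step is given below: ABF per Eq. (11), followed by rank assignment. The pairwise t-test adjustment shown here, which shares ranks between pairs whose difference is not significant at the 0.05 level, is a deliberately simplified assumption, since the paper does not spell out its exact tie-handling rule.

```python
# Sketch of the performance module: ABF per Eq. (11) plus rank assignment.
import numpy as np
from scipy.stats import rankdata, ttest_ind

def abf(best_fitness_runs):
    """best_fitness_runs: array of shape (n_algorithms, 30), best fitness per run."""
    return np.mean(best_fitness_runs, axis=1)             # Eq. (11), one ABF per algorithm

def rank_algorithms(best_fitness_runs, alpha=0.05):
    """Rank by ABF; statistically tied pairs (two-tailed t-test) share the better rank."""
    scores = abf(best_fitness_runs)
    ranks = rankdata(scores, method="min").astype(float)  # smaller ABF = better rank
    n = len(scores)
    for i in range(n):
        for j in range(i + 1, n):
            _, p = ttest_ind(best_fitness_runs[i], best_fitness_runs[j])
            if p > alpha:                                 # no significant difference: tie
                ranks[[i, j]] = ranks[[i, j]].min()
    return ranks
```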


Candidate algorithms in the repository have different strengths and diverse characteristics, because optimization problems possess diverse properties. Proposing a variant of an algorithm is a common way to enhance performance; however, a modified algorithm that performs well on all problems remains elusive. Here, six canonical PSO variants with different strengths are initially selected to build the algorithm repository: GPSO [42], LPSO [24], LCPSO [24], CLPSO [26], CPSO [15] and UPSO [37]. The PSOs were separately improved in three directions: topology structure, learning strategy and parameter setting, as shown in Table 1. As a result, the PSOs display various search properties. GPSO, with its global topology, converges quickly on unimodal problems but easily gets trapped in local optima; LPSO, with its local topology, performs a niching search that is effective for multimodal problems; and CLPSO performs well on complex problems while converging slowly on unimodal problems.

Table 1 Categorization of PSO variants

Topology structure | Learning strategy | Parameter setting
GPSO (global topology) | CLPSO (comprehensive learning strategy) | LCPSO (constriction factor)
LPSO (local topology) | CPSO (cooperative learning strategy) | GPSO (weighted)
UPSO (integration of global and neighbor topology) | |

Please note that integrating all distinct population-based algorithms is infeasible due to the large number of optimizers. Here, our objective is to validate ARM on representative algorithms, including PSO variants, non-PSO population-based optimizers, hyper-heuristics and ensemble methods. The five other state-of-the-art population-based algorithms are DE [48], ABC [22], BSA [8], DSA [9] and FA [50]. These widely used algorithms are motivated by various sources of inspiration: ABC, FA and the PSOs are inspired by collective biological behaviors in nature, whereas DE, BSA and DSA use genetic and mathematical types of population search. It is empirically known that these algorithms display diverse search behaviors on global optimization [8, 9, 21, 50]. Moreover, ensemble algorithms [29, 30, 49] and one hyper-heuristic [12] are included to further justify the effectiveness of ARM.

3.3. Learning module

Generally, meta-learning algorithms are divided into two categories: instance-based learners and model-based learners [43]. Instance-based learners predict the rankings using problem similarity, which is measured by a distance metric; they are based on the hypothesis that an algorithm performs similarly on similar problems. Model-based learners predict the rankings using an underlying model generated by a learning process that maps the meta-features to the algorithm performance.

3.3.1. Instance-based meta-learner

The k-NN ranking approach is chosen as the instance-based learner due to its wide use in other studies and its effectiveness and efficiency [3, 14, 20, 47]. The k nearest neighbors are selected by measuring the cosine similarity between the new problem and the meta-examples, based on the meta-features. Then, the rankings of performance are calculated to make a recommendation. The cosine similarity is calculated as follows:

$$\mathrm{similarity}(i, j) = \cos(x_i, x_j) = \frac{x_i \cdot x_j}{\|x_i\|_2 \, \|x_j\|_2}, \quad i, j = 1, \ldots, m \qquad (12)$$

The ranking of algorithm a on the new problem i is predicted as

$$rr_{i,a} = \frac{\sum_{j \in N(i)} \mathrm{similarity}(i, j) \, ir_{j,a}}{\sum_{j \in N(i)} \mathrm{similarity}(i, j)} \qquad (13)$$

where $rr_{i,a}$ is the recommended rank of algorithm a on problem i, $ir_{j,a}$ is the ideal rank of algorithm a on problem j, and $N(i)$ represents the set of k nearest neighbors of problem i.
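The sketch below renders Eqs. (12)-(13) directly: cosine similarity selects the k most similar meta-examples, and their ideal ranks are combined with similarity weights. It is a minimal reconstruction that assumes meta-features and ideal ranks are stored row-wise per problem.

```python
# Minimal instance-based learner per Eqs. (12)-(13) (illustrative, not the authors' code).
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))   # Eq. (12)

def knn_recommend_ranks(x_new, meta_features, ideal_ranks, k=2):
    """meta_features: (m, 18) meta-examples; ideal_ranks: (m, n_algorithms)."""
    sims = np.array([cosine_similarity(x_new, x) for x in meta_features])
    nn = np.argsort(sims)[-k:]                                      # k most similar problems
    w = sims[nn]
    return (w @ ideal_ranks[nn]) / w.sum()                          # Eq. (13)
```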

3.3.2. Model-based meta-learner

In this study, a regression-based learner is trained on the meta-example datasets. Given the complexity of and heterogeneity among the various meta-features and the ranking position values of each algorithm, an ANN [41] is employed due to its superiority in non-linear function modeling and its robustness to noisy and redundant features [20, 44].
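As one concrete, hypothetical realization of such a model-based learner, the snippet below fits a small multi-output neural-network regressor from meta-features to ranking values with scikit-learn. The layer size and activation are placeholders drawn from the tuning grid reported in Section 4, and the data arrays are random stand-ins.

```python
# Sketch of the model-based meta-learner: an ANN regressor from meta-features to ranks.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_meta = rng.normal(size=(33, 18))          # placeholder: 33 problems x 18 meta-features
Y_ranks = rng.uniform(1, 6, size=(33, 6))   # placeholder: ideal ranks of 6 optimizers

model = make_pipeline(
    StandardScaler(),                       # meta-features live on very different scales
    MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                 max_iter=5000, random_state=0),
)
model.fit(X_meta, Y_ranks)                  # multi-output regression, one output per optimizer
predicted_ranks = model.predict(X_meta[:1])  # rank predictions for a (placeholder) new problem
```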

3.4. Recommending module

After the meta-learning process is completed, the knowledge repository mapping the meta-features to the algorithms' performance is generated. To address a new optimization problem, the following steps are taken (a code sketch is given after the list):

Step 1: Sample data from the new problem space.
Step 2: Calculate the meta-features that characterize the new problem space using the proposed meta-feature set.
Step 3: Input the meta-features into the recommendation system.
Step 4: Output the recommended optimizer from the candidate algorithms.
Step 5: Solve the new problem using the selected optimizer.
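Put together, the five steps amount to the short routine below. The helpers lhs_sample and meta_features refer to the illustrative sketches given earlier in this section, and optimizers is assumed to be a list of callables; none of these names come from the paper.

```python
# Sketch of the recommending module: Steps 1-5 for a new problem.
import numpy as np

def recommend_and_solve(objective, lower, upper, model, optimizers, n_samples=2500):
    X, f = lhs_sample(objective, lower, upper, n_samples)      # Step 1: sample the problem space
    feats = meta_features(X, f, objective, lower, upper)       # Step 2: characterize it
    ranks = model.predict(np.array([list(feats.values())]))    # Step 3: query the meta-learner
    best = optimizers[int(np.argmin(ranks))]                   # Step 4: smallest predicted rank wins
    return best(objective, lower, upper)                       # Step 5: solve with the recommendation
```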

Once the recommended rankings of the algorithms are calculated, two common evaluation metrics from the meta-learning field are adopted to measure the output performance: Spearman's rank correlation coefficient (SRCC) and the success rate (SR).

The SRCC [35] is employed to evaluate the agreement between the recommended rankings and the true (ideal) rankings:

$$\rho = 1 - \frac{6\sum_{i=1}^{N} d_i^2}{N(N^2 - 1)} \qquad (14)$$

where $d_i$ is the distance between the recommended ranking and the ideal ranking of algorithm i, and N is the number of algorithms. A value of 1 means full agreement, while -1 refers to full disagreement; a value of 0 represents no relationship between the two rankings, which is the expected value of a random ranking method.

The SR is measured by the percentage of exact matches between the ideal best performer and the recommended best performer over the tested problems, i.e., the ratio of the number of matches achieved by the predicted best performers to the number of tested problems. It evaluates the precision of the recommendation system: in the recommendation of optimization algorithms, users are primarily concerned with whether the selected best performer equals the ideal one. Hence, both the SRCC and the SR are employed to comprehensively compare the performance of the different population-based algorithms.
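The two metrics are straightforward to compute; the sketch below implements SRCC per Eq. (14) together with SR as the exact-match ratio described above.

```python
# Sketch of the two evaluation metrics: SRCC per Eq. (14) and the success rate (SR).
import numpy as np

def srcc(recommended, ideal):
    recommended, ideal = np.asarray(recommended, float), np.asarray(ideal, float)
    d = recommended - ideal
    n = len(d)
    return 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))        # Eq. (14)

def success_rate(recommended_best, ideal_best):
    """Fraction of problems whose recommended best performer matches the ideal one."""
    return np.mean(np.asarray(recommended_best) == np.asarray(ideal_best))

# Example: full agreement over 4 algorithms yields SRCC = 1.
print(srcc([1, 2, 3, 4], [1, 2, 3, 4]))
```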

3.5. ARM implementation

The implementation procedure of ARM is as follows:

Step 1: Collect candidate algorithms and optimization problems to construct the algorithm repository and the problem repository.
Step 2: Test and obtain the result of each algorithm on each problem in the repositories. In this study, the performance is measured by ABF and adjusted by the t-test. Rank the performances of the algorithms on each problem and store the rankings as the meta-knowledge of algorithm performance.
Step 3: Randomly sample raw data from each problem space in the repository using LHS, and calculate the corresponding 18 features from the sampled raw data. Store these as the meta-features.
Step 4: Cross-validate the meta-learning model on the meta-features (features) and algorithm performance (labels), and retrain the best model using all data.

For a new optimization problem, the ARM solution procedure is as follows:

Step 1: Extract the meta-features from the new problem and input them into the trained model. The rankings of the optimization algorithms are output as recommendations.
Step 2: Use the best recommended optimization algorithm to solve the new problem.
Step 3: Evaluate the recommendation results using SRCC and SR.

4. Experimental validation

In this section, the numerical experiments are grouped into five categories. In the first experiment, different sample sizes for the data generated from the benchmark functions are tested separately in three dimensions (10D, 30D and 50D). The performance of the meta-learner is sensitive to the sample size on which meta-feature generation depends, which can affect the recommendation results of the proposed meta-learning model; thus, the appropriate sample size must be selected at the initial stage. Once the sample size is determined, the second experiment is conducted on the PSOs to determine the appropriate meta-learner by exploring the performance of the meta-learners on different learning instance repositories; two types of meta-learner models (ANN and k-NN) are investigated. The first two experiments use the benchmark functions listed in Appendix A in three dimensions (10D, 30D and 50D) separately [7]. The third experiment validates the effectiveness and extendibility of ARM on other population-based optimizers: six optimizers and a pool of CEC 2015 benchmark problems are employed for investigation. In the fourth experiment, three ensemble algorithms and one hyper-heuristic are adopted to further verify the proposed model. In the last experiment, two real-world problems are adopted to validate the practical performance of ARM.

Regarding the general parameter setting for the optimizers, the maximum number of function evaluations (MAX_FES) is set to 10000 × D over 30 independent runs. The specific parameter settings are adopted from the corresponding original references. The rankings of each algorithm based on the ABF values over the 33 benchmark functions are obtained to represent the performances, as discussed in the algorithm performance module. Moreover, both the instance-based meta-learner and the model-based meta-learner (ANN) are implemented to study the proposed ARM in the first two experiments. Based on a preliminary study, we found that the k-NN method with two neighbors performs most efficiently (termed 2NN). For the model-based learning method, ANN, the hidden layer size is tuned within the set [5, 10, 15], and the transfer function is selected from radial basis, logistic sigmoid and tan sigmoid. Besides, to select the appropriate model parameters and avoid over-fitting, a 10-fold cross-validation process with 70% of the data used for training and 30% used for validation is applied. After training, ANN regression models are constructed to predict the ranking values of the candidate algorithms.
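A sketch of this tuning protocol with scikit-learn is shown below. Note that MLPRegressor does not offer a radial-basis transfer function, so the grid here covers only the logistic and tanh options and is an approximation; the data arrays are again random stand-ins.

```python
# Approximate sketch of the ANN tuning step with 10-fold cross-validation.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_meta = rng.normal(size=(33, 18))               # placeholder meta-features
Y_ranks = rng.uniform(1, 7, size=(33, 6))        # placeholder rank labels

grid = {
    "hidden_layer_sizes": [(5,), (10,), (15,)],  # sizes tuned in the paper
    "activation": ["logistic", "tanh"],          # radial basis is unavailable in MLPRegressor
}
search = GridSearchCV(MLPRegressor(max_iter=5000, random_state=0), grid,
                      cv=10, scoring="neg_mean_squared_error")
search.fit(X_meta, Y_ranks)                      # refit=True retrains the best model on all data
best_model = search.best_estimator_
```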

4.1. Experiment on sample size adjustment

The objective of this experiment is to identify the appropriate sample size for the data drawn from the benchmark functions. Six PSO variants separately optimize each benchmark function in three dimensions (10D, 30D and 50D), and the results are used to build the learning instance repository. In this experiment, the parameter settings of the PSO variants are the same as in their original references [15, 24, 26, 37, 42].

Fig. 2 Performance curves of the two meta-learners (ANN and KNN) in three dimensions: (a) 10D, (b) 30D, (c) 50D. The x-axis is the sample size (500-3500) and the y-axis is the average SRCC.

In this experiment, we employ the representative functions f1, f5, f6, f17, f30 and f31 as the test set and the remaining functions as the training set. Four sample sizes, 500, 1500, 2500 and 3500, are implemented by the two meta-learners separately with the proposed meta-feature set. The SRCC value denotes the performance of the recommendation result on the test problems. Each experiment is conducted 10 times; the average SRCC values of the ten repetitions are computed for a multiple comparison test, and the results are presented in Fig. 2. It is observed that, as the sample size grows, the curve of ANN is smooth on the 30D and 50D problems but increases on the 10D problems. For k-NN with two neighbors, the performance improves as the sample size increases from 500 to 2500. Moreover, the speed of convergence slows down as the dimension grows. In the following experiments, the sample sizes are set around 2500: specifically, 2000, 2500 and 3000 for 10D, 30D and 50D, respectively.

4.2. Validation of ARM on PSO variants

The objective of this experiment is to evaluate the accuracy and robustness of ARM derived from the PSO variants. A leave-one-out cross-validation strategy is adopted to validate the prediction performance: 32 of the 33 functions are treated as the training set and the remaining one as the test set, and the experiment is repeated 33 times so that every function is iteratively trained on and tested. For each repetition, the recommendation performance on each problem is measured by the SRCC and SR; once all values are generated, the average result over the 33 problems is obtained. To evaluate the stability of the proposed model, the process on the 33 problems is conducted 30 times to gather the standard deviation of the recommendation results. Apart from the population size of 40, a population size of 60 is adopted in the three dimensions (10D, 30D and 50D) to study the effectiveness of ARM. Please note that this experiment aims to investigate the performance of the derived model under population size variation; the other parameters of the algorithms remain the same as in their original references [15, 24, 26, 37, 42].

The average performance statistics of the meta-learners are summarized in Table 2 and Table 3. "Baseline" refers to the mean of the target values across the 33 problems, which commonly serves as a reference for recommendation quality [3]; a value greater than the Baseline indicates better quality. As can be seen from Table 2 and Table 3, the recommendation rankings given by both learners greatly surpass the Baseline ranking. Both learners have a high average SR and a low average SRCC in the low dimension. Besides, the standard deviations are less than 0.04, which indicates the promising convergence and stability of ARM. For relatively simple problems, many optimizers easily find optimal solutions, resulting in similar ranking values; as a result, recommending an appropriate algorithm is easier than predicting the rankings between optimizers on the 10D problems. For the 30D and 50D problems, both the SRCC and SR results improve as the performance differences between the optimizers become more significant due to the increasing complexity.


When the population size is 40, the performances of the two learners are similar in the low dimension (10D problems) but differ in the relatively high dimensions. Specifically, ANN performs better than 2NN on the 30D and 50D problems in terms of SRCC and SR. When the population size is 60, ANN outperforms 2NN in all three dimensions, which indicates the efficacy of ANN in the proposed model. This may be attributed to the fact that the instance-based meta-learner predicts the rankings solely from the features that characterize the problems. If the features are insufficient to fully depict the problem properties, finding the true similarity among problems becomes difficult, which leads to ineffective algorithm selection. Besides, the diversity of the training set may not be sufficient to allow 2NN to locate a more similar problem. The 2NN learner makes recommendations based on the overall rankings on similar problems; in some cases, although the problems are highly similar, the same rankings of algorithm performance are still unobtainable due to the randomness of population-based algorithms and the particular distinctions between problems. The model-based meta-learner, as a supervised learning approach, instead derives a model relating the meta-features to the optimizers' rankings; as a result, it is more tolerant of noise in the meta-features as well as of changes in the rankings. Therefore, both meta-learners are appropriate for ARM, though the model-based meta-learner is more effective and adaptable than the instance-based one. In the following experiments, ANN is selected as the meta-learner.

Table 2 Average performance statistics (mean±std) of the meta-learners when the population size is 40

Meta-learner | 10D SRCC | 10D SR | 30D SRCC | 30D SR | 50D SRCC | 50D SR
ARM-2NN | 0.64±0.02 | 0.87±0.03 | 0.74±0.02 | 0.80±0.02 | 0.68±0.01 | 0.82±0.02
ARM-ANN | 0.62±0.02 | 0.87±0.05 | 0.75±0.03 | 0.84±0.03 | 0.70±0.02 | 0.84±0.02
Baseline | 0.55 | 0.55 | 0.57 | 0.61 | 0.57 | 0.67

Table 3 Average performance statistics (mean±std) of the meta-learners when the population size is 60

Meta-learner | 10D SRCC | 10D SR | 30D SRCC | 30D SR | 50D SRCC | 50D SR
ARM-2NN | 0.64±0.02 | 0.87±0.03 | 0.74±0.03 | 0.86±0.02 | 0.69±0.01 | 0.83±0.04
ARM-ANN | 0.65±0.02 | 0.87±0.04 | 0.77±0.01 | 0.88±0.04 | 0.71±0.01 | 0.85±0.04
Baseline | 0.57 | 0.61 | 0.72 | 0.61 | 0.49 | 0.58

The optimization results of ARM are also compared with each standalone optimizer. The statistics of the best performer on the benchmark problems are presented in Table 4 and Table 5. When the population size is 40, the proposed models outperform the other optimizers in all experiments in terms of the number of problems on which they perform best, followed by CLPSO, which achieves the best results on eighteen 10D problems, twenty 30D problems and twenty-three 50D problems. Similar results are observed for the algorithms with a population size of 60. Overall, ARM obtains the best solutions on average on 28 of the 33 problems over the three dimensional settings. Thus, ARM is not only a recommendation system that prevents the high computational cost of the traditional trial-and-error method, but also an efficient optimizer for a diverse set of problems. Besides, ARM can significantly improve computational efficiency: in this experiment, the traditional trial-and-error approach, which runs all the candidate algorithms, takes almost one hour (50.67 minutes) on average to identify the best result for a global optimization problem, whereas ARM costs less than 10 minutes on average. All reported times are based on experiments run on a computer with an AMD A8-7650K 3.3 GHz CPU, 8 GB RAM and Microsoft Windows 7.


Table 4 Statistics of the best performer when the population size is 40

Algorithm | 10D | 30D | 50D
GPSO | 15% (5/33) | 0% (0/33) | 0% (0/33)
LPSO | 21% (7/33) | 9% (3/33) | 0% (0/33)
LCPSO | 52% (17/33) | 33% (11/33) | 24% (8/33)
CLPSO | 55% (18/33) | 61% (20/33) | 70% (23/33)
CPSO | 3% (1/33) | 0% (0/33) | 0% (0/33)
UPSO | 39% (13/33) | 24% (8/33) | 18% (6/33)
ARM-2NN | 88% (29/33) | 82% (27/33) | 82% (27/33)
ARM-ANN | 88% (29/33) | 85% (28/33) | 85% (28/33)

Table 5 Statistics of the best performer when the population size is 60

Algorithm | 10D | 30D | 50D
GPSO | 21% (7/33) | 3% (1/33) | 12% (4/33)
LPSO | 9% (3/33) | 36% (12/33) | 0% (0/33)
LCPSO | 61% (20/33) | 61% (20/33) | 33% (11/33)
CLPSO | 39% (13/33) | 61% (20/33) | 48% (16/33)
CPSO | 3% (1/33) | 6% (2/33) | 9% (3/33)
UPSO | 42% (14/33) | 39% (13/33) | 18% (13/33)
ARM-2NN | 88% (29/33) | 85% (28/33) | 82% (27/33)
ARM-ANN | 88% (29/33) | 88% (29/33) | 85% (28/33)

4.3. Validation of ARM on various population-based optimizers

In this experiment, six population-based optimizers with different sources of inspiration are included in the algorithm repository to validate ARM's extendibility and effectiveness: GPSO [42], DE [48], ABC [22], BSA [8], DSA [9] and FA [50]. ABC, FA and PSO were inspired by collective biological behaviors in nature, whereas DE, BSA and DSA use genetic and mathematical types of population search. The algorithms are widely used and known to display diverse search behaviors on global optimization problems [8, 9, 21, 50]. The parameters of the candidate algorithms are set to the values recommended in the corresponding original references [8, 9, 22, 42, 48, 50]. The problem repository is constructed from the 33 benchmark functions and, for generalizability, the problem dimension is set to 30. ANN is adopted as the meta-learner. The first three modules of ARM are conducted as previously described, and six meta-learners are trained and prepared to make recommendations for new problems.

To fully test the effectiveness of ARM, the pool of learning-based real-parameter optimization problems from CEC 2015 is added in this experiment, including unimodal, multimodal, hybrid and composition functions; for the detailed properties of the problems, please refer to [25]. Each problem is treated as a new problem to be solved by ARM, and the true best algorithm among the candidates is then verified using the trial-and-error method. The results are gathered in Table 6 and Table 7. It can be observed that ARM obtains the best performance on 14 of the 15 functions. The high SRCC value on each problem indicates high agreement between the recommended rankings and the true rankings. Moreover, the distribution of the best performers shows that ARM can take great advantage of the diversity of the component algorithms in solving the CEC 2015 benchmark functions.

Please note that although ARM is validated using the PSOs and tested on the other population-based algorithms, the flexibility and knowledge-learning mechanism of the model enable its scalability to more population-based optimizers. Thanks to the adaptive recommendation strategy, the performance of ARM will scale up with the integration of efficient algorithms with diverse search properties.

Table 6 Ranking values on the CEC 2015 benchmark functions

Algorithm | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 | f11 | f12 | f13 | f14 | f15 | Accumulation
GPSO | 4 | 5 | 6 | 3 | 5 | 4 | 3 | 4 | 6 | 4 | 6 | 5 | 6 | 6 | 1 | 1/15
DE | 6 | 6 | 5 | 5 | 6 | 6 | 6 | 6 | 3 | 6 | 5 | 6 | 4 | 5 | 1 | 1/15
ABC | 5 | 2 | 2 | 6 | 2 | 5 | 1 | 5 | 4 | 5 | 1 | 2 | 3 | 1 | 1 | 4/15
BSA | 1 | 1 | 3 | 2 | 3 | 2 | 2 | 1 | 2 | 1 | 2 | 3 | 2 | 3 | 1 | 5/15
DSA | 3 | 3 | 4 | 4 | 4 | 3 | 5 | 2 | 1 | 3 | 3 | 4 | 1 | 4 | 1 | 3/15
FA | 2 | 4 | 1 | 1 | 1 | 1 | 4 | 3 | 5 | 2 | 4 | 1 | 5 | 2 | 6 | 5/15
ARM | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 14/15

Table 7 SRCC values on the CEC 2015 benchmark functions

f | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 | f11 | f12 | f13 | f14 | f15 | Average
ARM | 0.99 | 0.99 | 0.99 | 0.90 | 0.97 | 1 | 1 | 1 | 1 | 1 | 1 | 0.99 | 1 | 0.99 | 0.73 | 0.97

4.4. Experiment of ARM on ensemble algorithms and hyper-heuristics

In this section, three ensemble algorithms and one hyper-heuristic are included to validate the performance of ARM: DSO [12], EPSDE [30], EPSO [29] and MPEDE [49] are employed to establish the algorithm repository. The parameter settings of these algorithms are adopted from the corresponding original references [12, 29, 30, 49]. The problem repository, problem dimensions, meta-learner and test problems are consistent with the previous section. The ranking values are shown in Table 8: ARM obtains the best performance on 14 of the 15 functions. The SRCC values are presented in Table 9: ARM obtains an average value of 0.93 in this experiment, which shows that ARM makes highly accurate recommendations. The experimental results indicate that ARM outperforms the compared hyper-heuristic and ensemble algorithms in terms of obtaining the best results.

Table 8 Ranking values on the CEC 2015 benchmark functions

Algorithm | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 | f11 | f12 | f13 | f14 | f15 | Accumulation
DSO | 4 | 4 | 1 | 4 | 3 | 3 | 3 | 3 | 4 | 4 | 3 | 4 | 3 | 4 | 4 | 1/15
EPSDE | 2 | 2 | 4 | 2 | 4 | 4 | 4 | 2 | 2 | 3 | 4 | 2 | 4 | 1 | 1 | 2/15
EPSO | 3 | 3 | 2 | 3 | 1 | 2 | 2 | 4 | 1 | 2 | 1 | 3 | 2 | 2 | 1 | 4/15
MPEDE | 1 | 1 | 3 | 1 | 2 | 1 | 1 | 1 | 3 | 1 | 2 | 1 | 1 | 3 | 1 | 10/15
ARM | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 14/15


Table 9 SRCC values on the CEC 2015 benchmark functions

f | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 | f11 | f12 | f13 | f14 | f15 | Average
ARM | 0.95 | 1 | 0.95 | 1 | 0.65 | 1 | 0.95 | 0.8 | 0.95 | 1 | 0.95 | 1 | 1 | 1 | 0.8 | 0.93

4.5. Experiment on real-world problems

To validate the practical applicability of ARM, two real-world cases are studied [18, 32]: the spread spectrum radar polyphase code design problem and the parameter estimation for frequency modulation problem. Since ensemble algorithms and hyper-heuristics are relatively computationally expensive, the trained models and the repository from Section 4.3 are adopted for ARM to address the real-world problems in this test.

4.5.1. Spread spectrum radar polyphase code design problem

The selection of an appropriate waveform is the main factor in radar systems with pulse compression. To accomplish this, a method was designed based on the properties of the aperiodic autocorrelation function and the hypothesis of coherent radar pulse processing [32]. The spread spectrum radar polyphase code design problem (SSRP) can be modeled as a min-max nonlinear non-convex optimization problem with continuous variables; the problem surface is rough and has numerous local optima. The model is presented as below:

$$\min_{x \in X} f(x) = \max\{\phi_1(x), \ldots, \phi_{2m}(x)\} \qquad (15)$$

$$X = \{(x_1, \ldots, x_n) \in \mathbb{R}^n \mid 0 \le x_j \le 2\pi,\ j = 1, \ldots, n\} \qquad (16)$$

where $m = 2n - 1$ and

$$\phi_{2i-1}(x) = \sum_{j=i}^{n} \cos\Bigg(\sum_{k=|2i-j-1|+1}^{j} x_k\Bigg), \quad i = 1, \ldots, n \qquad (17)$$

$$\phi_{2i}(x) = 0.5 + \sum_{j=i+1}^{n} \cos\Bigg(\sum_{k=|2i-j|+1}^{j} x_k\Bigg), \quad i = 1, \ldots, n-1 \qquad (18)$$

$$\phi_{m+i}(x) = -\phi_i(x), \quad i = 1, \ldots, m \qquad (19)$$
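The nested index bounds of Eqs. (17)-(19) are easy to mis-handle, so the sketch below spells out a direct 0-based translation ($x_k$ for $k = a+1, \ldots, j$ becomes the slice x[a:j]). It is an illustrative reconstruction of the objective, not code from the paper.

```python
# Sketch of the SSRP objective, Eqs. (15)-(19) (illustrative reconstruction).
import numpy as np

def ssrp(x):
    x = np.asarray(x, float)
    n = x.size
    phi = []
    for i in range(1, n + 1):                                  # phi_{2i-1}(x), Eq. (17)
        phi.append(sum(np.cos(x[abs(2 * i - j - 1): j].sum())
                       for j in range(i, n + 1)))
    for i in range(1, n):                                      # phi_{2i}(x), Eq. (18)
        phi.append(0.5 + sum(np.cos(x[abs(2 * i - j): j].sum())
                             for j in range(i + 1, n + 1)))
    phi = np.array(phi)
    # Eqs. (15) and (19): phi_{m+i} = -phi_i, so the max over all 2m terms is max |phi_i|.
    return np.max(np.abs(phi))

# Example: evaluate a random phase vector in [0, 2*pi]^7.
x = np.random.default_rng(1).uniform(0.0, 2 * np.pi, size=7)
print(ssrp(x))
```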

The goal of the model is to minimize the largest module of the so-called autocorrelation function in the pulse-compressed radar complex envelope at the output of the optimal receiver, with the variables representing the symmetrical phase differences. This continuous min-max global optimization problem is NP-hard [32] and is characterized by a piecewise smooth objective function. For this case, the meta-features of the problem are automatically generated and input into the meta-learners. The solution results are presented in Table 10: ARM obtains the best solution and the best ranking on the spread spectrum radar polyphase code design problem.

Table 10 Comparison on the spread spectrum radar polyphase code design problem (SRCC = 0.98)

Algorithm | GPSO | DE | ABC | BSA | DSA | FA | ARM
Result | 1.07E+00 | 1.32E+00 | 1.22E+00 | 1.22E+00 | 1.31E+00 | 7.72E-01 | 7.72E-01
Ranking | 2 | 6 | 3 | 3 | 5 | 1 | 1

4.5.2. Parameter estimation for frequency modulation

In telecommunications and signal processing, frequency modulation (FM) is the encoding of information in a carrier wave by varying the instantaneous frequency of the wave, and the FM synthesizer is a popular technique [18]. To make the generated sound wave match the target sound as closely as possible, six parameters of the sound wave for an FM synthesizer need to be optimized, forming the vector $X = (a_1, w_1, a_2, w_2, a_3, w_3)$. The models of the target sound wave and the estimated sound wave are presented in Eqs. (20) and (21), respectively:

$$y_0(t) = \sin\big(5t\theta + 1.5\sin(4.8t\theta + 2\sin(4.9t\theta))\big) \qquad (20)$$

$$y(t) = a_1\sin\big(w_1 t\theta + a_2\sin(w_2 t\theta + a_3\sin(w_3 t\theta))\big) \qquad (21)$$

where $\theta = 2\pi/100$ and the parameters are defined in the range [-6.4, 6.35]. The objective is to minimize the sum of squared errors between the target wave, whose minimum value is $f(X_{sol}) = 0$, and the estimated wave:

$$f(X) = \sum_{t=0}^{100} \big(y(t) - y_0(t)\big)^2 \qquad (22)$$

Table 11 Comparison on parameter estimation for frequency modulation GPSO

DE

ABC

BSA

DSA

Result

5.25E+00

1.40E+01

7.21E+00

5.23E+00

6.89E+00

Ranking

2

5

4

1

3

FA

ARM

1.67E+01

5.23E+00

6

1

SRCC 0.90

AN US

Algorithm

CR IP T

frequency modulation in terms of solution accuracy.

In conclusion, ARM can recommend the correct best performers and highly similar rankings to the true values for the two real-world problems, which indicates ARM‟s practical effectiveness for real-world problems.

5. Conclusion

Numerous population-based algorithms have been proposed due to their efficiency for global optimization. Although these algorithms have demonstrated promising performance on some problems, their overall effectiveness across a variety of problems is limited. Instead of the traditional trial-and-error method, an algorithm recommendation model based on prior knowledge is valuable for optimizing a new problem. We therefore contend that the characteristics of a given optimization problem may shed light on choosing the appropriate algorithm via the meta-learning concept. In this paper, a generalized meta-learning based ARM is proposed for adaptively selecting population-based optimizers by mapping problem characteristics to algorithm performance. The gap between algorithm performance and problem characteristics is bridged using the four proposed components: the meta-feature module, the algorithm performance module, the learning module and the recommending module. First, 18 computationally efficient meta-features, including statistical, geometrical measurement and landscape features, are empirically derived to characterize problem spaces. Second, algorithm performance is represented by rankings to normalize the computational differences across problems. Third, a meta-learner in the learning module learns the underlying mapping between the problem properties and the algorithm performance according to the knowledge gathered by the first two modules. Finally, given a new problem, its meta-features are analyzed and the most appropriate optimizer is selected to address it. In this study, benchmark functions with various characteristics and two real-world problems are collected as the problem repository, and 15 algorithms are included to validate the effectiveness of ARM. The experimental analysis comprises four parts: (1) parameter tuning of the sample size for generating meta-features; (2) ARM is initially tested using the PSOs and comprehensively evaluated on benchmark functions with various properties in different dimensions; (3) the effectiveness of ARM is then tested on six other population-based algorithms using the optimization problems from CEC 2015 and further validated with three ensemble algorithms and one hyper-heuristic; and (4) two real-world problems are adopted to validate ARM's practical performance. The experimental results demonstrate the effectiveness and efficiency of the proposed model for global optimization.


In summary, the contributions of the developed model are threefold: (1) a practical set of meta-features is proposed to depict the problem space of global optimization; (2) to the best of our knowledge, this is the first adaptive recommendation model using meta-learning from the machine learning domain for population-based algorithms, primarily addressing global optimization; (3) the knowledge-based learning mechanism of ARM, which requires little expert experience, enables its flexibility and extendibility to include more efficient optimizers for complex problems. Please note that even though the model is derived using the PSOs and tested on other population-based algorithms, the performance of ARM can be scaled up with the integration of more efficient algorithms because of its adaptive recommendation strategy. In addition, ARM can serve as an alternative to traditional trial-and-error optimization tasks, especially when the number of candidate optimizers is large and little prior knowledge of the problems is available. This study provides practical guidelines for the design, implementation and testing of a recommendation model for various global optimization problems. Specifically, it can facilitate non-experts with optimization algorithm selection by reducing the computational cost and improving optimization efficiency.
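Regarding contribution (1), a small illustration of how sample-based meta-features can be extracted from a problem space is given below. It computes only three illustrative features (fitness-distance correlation [19] and the skewness and kurtosis of sampled fitness values), not the paper's full 18-feature set, and it uses plain uniform sampling, whereas a space-filling design such as Latin hypercube sampling [17] may be preferable; both choices are assumptions of this sketch.

import numpy as np

def sample_meta_features(f, lower, upper, n=500, seed=1):
    """A few illustrative statistical/landscape meta-features computed from
    random samples of the problem space."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lower, upper, size=(n, len(lower)))
    y = np.array([f(x) for x in X])
    x_best = X[np.argmin(y)]                      # best sampled point
    dist = np.linalg.norm(X - x_best, axis=1)
    fdc = float(np.corrcoef(y, dist)[0, 1])       # fitness-distance correlation [19]
    skew = float(((y - y.mean()) ** 3).mean() / y.std() ** 3)
    kurt = float(((y - y.mean()) ** 4).mean() / y.std() ** 4)
    return {"fdc": fdc, "skewness": skew, "kurtosis": kurt}

print(sample_meta_features(lambda x: float((x ** 2).sum()),
                           np.full(5, -100.0), np.full(5, 100.0)))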

Despite ARM's promise, a considerable number of improvements could be undertaken in future research. (1) Appropriate feature characterization for global optimization problems could be explored further: there are still many candidate features in the landscape analysis methods that describe the characteristics of local areas [34]. In addition, the performance of the meta-learner requires investigation using other supervised learning algorithms, such as random forest, support vector regression and deep learning. (2) Multi-criteria metrics, e.g. precision and computational cost, will be studied to enrich the ranking values for recommendation in addition to the ABF metric. (3) More numerical comparisons between ARM and other approaches, such as ensemble methods, hyper-heuristics, and hybrid methods, are of great interest for further study. In addition, the impact of algorithm diversity in ARM for global optimization problems will also be studied in our future work.

Acknowledgements

This work is partially supported by the Major Project for National Natural Science Foundation of China (Grant No. 71790615, the Design for Decision-making System of National Security Management), the Key Project of National Natural Science Foundation of China (Grant No. 71431006, Decision Support Theory and Platform of the Embedded Service for Environmental Management), National Natural Science Foundation of China (Grant Nos. 71501132, 71701079, 71402103 and 71371127), Natural Science Foundation of Guangdong Province (Grant Nos. 2016A030310067 and 2015A030313556), and the 2016 Tencent "Rhinoceros Birds" Scientific Research Foundation for Young Teachers of Shenzhen University. The authors would like to thank Dr. Teresa Wu for her valuable advice in improving the quality of the manuscript.

References

[1] M.A. Awadallah, M.A. Al-Betar, A.T. Khader, A.L.A. Bolaji, M. Alkoffash, Hybridization of harmony search with hill climbing for highly constrained nurse rostering problem, Neural Computing and Applications, 28 (3) (2017) 463-482.
[2] C. Blum, J. Puchinger, G.R. Raidl, A. Roli, Hybrid metaheuristics in combinatorial optimization: A survey, Applied Soft Computing, 11 (6) (2011) 4135-4151.
[3] P. Brazdil, C.G. Carrier, C. Soares, R. Vilalta, Metalearning: Applications to data mining, Springer Science & Business Media, 2008.
[4] E.K. Burke, M. Gendreau, M. Hyde, G. Kendall, G. Ochoa, E. Özcan, R. Qu, Hyper-heuristics: A survey of the state of the art, Journal of the Operational Research Society, 64 (12) (2013) 1695-1724.
[5] X. Chen, Y.S. Ong, M.H. Lim, K.C. Tan, A multi-facet survey on memetic computation, IEEE Transactions on Evolutionary Computation, 15 (5) (2011) 591-607.
[6] X. Chu, J. Chen, F. Cai, L. Li, Q. Qin, Adaptive brainstorm optimisation with multiple strategies, Memetic Computing, (2018) 1-14.
[7] X. Chu, T. Wu, J.D. Weir, Y. Shi, B. Niu, L. Li, Learning-interaction-diversification framework for swarm intelligence optimizers: A unified perspective, Neural Computing and Applications, (2018) 1-21.
[8] P. Civicioglu, Backtracking search optimization algorithm for numerical optimization problems, Applied Mathematics & Computation, 219 (15) (2013) 8121-8144.
[9] P. Civicioglu, Transforming geocentric cartesian coordinates to geodetic coordinates by using differential search algorithm, Computers & Geosciences, 46 (3) (2012) 229-247.
[10] C. Cui, M. Hu, J.D. Weir, T. Wu, A recommendation system for meta-modeling: A meta-learning based approach, Expert Systems with Applications, 46 (2015) 33-44.
[11] C. Cui, T. Wu, M. Hu, J.D. Weir, X. Li, Short-term building energy model recommendation system: A meta-learning approach, Applied Energy, 172 (2016) 251-263.
[12] V.V. de Melo, W. Banzhaf, Drone squadron optimization: A novel self-adaptive algorithm for global numerical optimization, Neural Computing and Applications, (2017) 1-28.
[13] A.E. Eiben, S.K. Smit, Parameter tuning for configuring and analyzing evolutionary algorithms, Swarm & Evolutionary Computation, 1 (1) (2011) 19-31.
[14] D.G. Ferrari, L.N.D. Castro, Clustering algorithm selection by meta-learning systems: A new distance-based problem characterization and ranking combination methods, Information Sciences, 301 (2015) 181-194.
[15] V.D.B. Frans, A.P. Engelbrecht, A cooperative approach to particle swarm optimization, IEEE Transactions on Evolutionary Computation, 8 (3) (2004) 225-239.
[16] F.E. Grubbs, Sample criteria for testing outlying observations, Annals of Mathematical Statistics, 21 (1) (1950) 27-58.
[17] J.C. Helton, F.J. Davis, Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems, Reliability Engineering & System Safety, 81 (1) (2003) 23-69.
[18] A. Horner, J. Beauchamp, L. Haken, Machine tongues XVI: Genetic algorithms and their application to FM matching synthesis, Computer Music Journal, 17 (4) (1993) 17-29.
[19] T. Jones, S. Forrest, Fitness distance correlation as a measure of problem difficulty for genetic algorithms, in: International Conference on Genetic Algorithms, 1995, pp. 184-192.
[20] J.Y. Kanda, A.C.P.L.F.d. Carvalho, E.R. Hruschka, C. Soares, Using meta-learning to recommend meta-heuristics for the traveling salesman problem, in: 10th International Conference on Machine Learning and Applications and Workshops, 2011, pp. 346-351.
[21] D. Karaboga, B. Basturk, On the performance of artificial bee colony (ABC) algorithm, Applied Soft Computing, 8 (1) (2008) 687-697.
[22] D. Karaboga, B. Basturk, A powerful and efficient algorithm for numerical function optimization: Artificial bee colony (ABC) algorithm, Journal of Global Optimization, 39 (3) (2007) 459-471.
[23] J. Kennedy, R. Eberhart, Particle swarm optimization, in: IEEE International Conference on Neural Networks, 1995, pp. 1942-1948.
[24] J. Kennedy, R. Mendes, Population structure and particle swarm performance, in: IEEE Congress on Evolutionary Computation, 2002, pp. 1671-1676.
[25] J. Liang, B. Qu, P. Suganthan, Q. Chen, Problem definitions and evaluation criteria for the CEC 2015 competition on learning-based real-parameter single objective optimization, Technical Report 201411A, Computational Intelligence Laboratory, Zhengzhou University, Zhengzhou, China, and Technical Report, Nanyang Technological University, Singapore, (2014).
[26] J.J. Liang, A.K. Qin, P.N. Suganthan, S. Baskar, Comprehensive learning particle swarm optimizer for global optimization of multimodal functions, IEEE Transactions on Evolutionary Computation, 10 (3) (2006) 281-295.
[27] T. Liao, D. Molina, T. Stützle, Performance evaluation of automatically tuned continuous optimizers on different benchmark sets, Applied Soft Computing, 27 (2014) 490-503.
[28] M. Lunacek, D. Whitley, The dispersion metric and the CMA evolution strategy, in: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, ACM, Seattle, Washington, USA, 2006, pp. 477-484.
[29] N. Lynn, P.N. Suganthan, Ensemble particle swarm optimizer, Applied Soft Computing, 55 (2017) 533-548.
[30] R. Mallipeddi, P.N. Suganthan, Q.K. Pan, M.F. Tasgetiren, Differential evolution algorithm with ensemble of parameters and mutation strategies, Applied Soft Computing, 11 (2) (2011) 1679-1696.
[31] A. Matala, Sample size requirement for Monte Carlo simulations using Latin hypercube sampling, Helsinki University of Technology, Department of Engineering Physics and Mathematics, Systems Analysis Laboratory, (2008).
[32] N. Mladenović, J. Petrović, V. Kovačević-Vujčić, M. Čangalović, Solving spread spectrum radar polyphase code design problem by tabu search and variable neighbourhood search, European Journal of Operational Research, 151 (2) (2003) 389-399.
[33] M.A. Muñoz, M. Kirley, S.K. Halgamuge, A meta-learning prediction model of algorithm performance for continuous optimization problems, in: International Conference on Parallel Problem Solving from Nature, Springer, 2012, pp. 226-235.
[34] M.A. Muñoz, Y. Sun, M. Kirley, S.K. Halgamuge, Algorithm selection for black-box continuous optimization problems: A survey on methods and challenges, Information Sciences, 317 (2015) 224-245.
[35] H.R. Neave, P.L. Worthington, Distribution-free tests, Contemporary Sociology, 19 (3) (1990) 488.
[36] G.L. Pappa, G. Ochoa, M.R. Hyde, A.A. Freitas, J. Woodward, J. Swan, Contrasting meta-learning and hyper-heuristic research: The role of evolutionary algorithms, Genetic Programming and Evolvable Machines, 15 (1) (2014) 3-35.
[37] K.E. Parsopoulos, M.N. Vrahatis, Unified particle swarm optimization for solving constrained engineering optimization problems, in: International Conference on Natural Computation, Springer, 2004, pp. 582-591.
[38] F. Peng, K. Tang, G. Chen, X. Yao, Population-based algorithm portfolios for numerical optimization, IEEE Transactions on Evolutionary Computation, 14 (5) (2010) 782-800.
[39] Q. Qin, S. Cheng, Q. Zhang, L. Li, Y. Shi, Particle swarm optimization with interswarm interactive learning strategy, IEEE Transactions on Cybernetics, 46 (10) (2016) 2238-2251.
[40] J.R. Rice, The algorithm selection problem, Advances in Computers, 15 (1976) 65-118.
[41] F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, 65 (6) (1958) 386.
[42] Y. Shi, R. Eberhart, A modified particle swarm optimizer, in: IEEE World Congress on Computational Intelligence, 1998, pp. 69-73.
[43] K.A. Smith-Miles, Cross-disciplinary perspectives on meta-learning for algorithm selection, ACM Computing Surveys, 41 (1) (2008) 137-153.
[44] K.A. Smith-Miles, R.J. James, J.W. Giffin, Y. Tu, A knowledge discovery approach to understanding relationships between scheduling problem structure and heuristic performance, in: International Conference on Learning and Intelligent Optimization, Springer-Verlag, 2009, pp. 89-103.
[45] J.E. Smith, T.C. Fogarty, Operator and parameter adaptation in genetic algorithms, Soft Computing, 1 (2) (1997) 81-87.
[46] C. Soares, A hybrid meta-learning architecture for multi-objective optimization of SVM parameters, Neurocomputing, 143 (2014) 27-43.
[47] C. Soares, P.B. Brazdil, P. Kuba, A meta-learning method to select the kernel width in support vector regression, Machine Learning, 54 (3) (2004) 195-209.
[48] R. Storn, K. Price, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization, 11 (4) (1997) 341-359.
[49] G. Wu, R. Mallipeddi, P.N. Suganthan, R. Wang, H. Chen, Differential evolution with multi-population based ensemble of mutation strategies, Information Sciences, 329 (2016) 329-345.
[50] X.S. Yang, Firefly algorithm, stochastic test functions and design optimisation, International Journal of Bio-Inspired Computation, 2 (2) (2010) 78-84.


Appendix A. Benchmark functions

Table 12 Properties of benchmark functions.

#     Basic function              Search range
f1    Sphere                      [-100,100]^D
f2    Schwefel P2.2               [-10,10]^D
f3    Schwefel P1.2               [-100,100]^D
f4    Schwefel P2.21              [-100,100]^D
f5    Rosenbrock                  [-100,100]^D
f6    Schwefel P2.21              [-100,100]^D
f7    Rosenbrock                  [-100,100]^D
f8    Diff Power                  [-100,100]^D
f9    2^D minima                  [-5,5]^D
f10   Rastrigin                   [-5,5]^D
f11   Non-Rastrigin               [-5,5]^D
f12   Ackley                      [-32,32]^D
f13   Griewank                    [-600,600]^D
f14   Weierstrass                 [-0.5,0.5]^D
f15   Salomon                     [-100,100]^D
f16   Penalized 2                 [-50,50]^D
f17   Schwefel P2.13              [-π,π]^D
f18   2^D minima                  [-5,5]^D
f19   Griewank                    [-600,600]^D
f20   Salomon                     [-100,100]^D
f21   Weierstrass                 [-0.5,0.5]^D
f22   Rastrigin                   [-5,5]^D
f23   Non-Rastrigin               [-5,5]^D
f24   Ackley                      [-32,32]^D
f25   Schwefel P2.13              [-π,π]^D
f26   Penalized 2                 [-50,50]^D
f27   Schwefel P1.2               [-100,100]^D
f28   Quadric                     [-100,100]^D
f29   Mis-scaled Rastrigin 10     [-5,5]^D
f30   Mis-scaled Rastrigin 100    [-5,5]^D
f31   Schwefel P1.2               [-100,100]^D
f32   Mis-scaled Rastrigin 10     [-5,5]^D
f33   Mis-scaled Rastrigin 100    [-5,5]^D

Each function is additionally characterized by six properties: multimodal ("MM"), separable ("Se"), shifted ("Sf"), rotated ("Rt"), noisy ("Ns") and mis-scaled ("MS"). The value of the corresponding column is "Y" if the function has the specific property and "N" otherwise; for the Rt column, the value "Single" is also used. [The per-function Y/N property entries are not legible in this version of the manuscript.]
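Two of the basic functions listed above are written out for reference. These are the standard definitions of the Sphere and Rastrigin functions; the shifted, rotated and mis-scaled variants in the table are assumed to be built from such bases in the usual way.

import numpy as np

def sphere(x):
    """f1: Sphere, unimodal and separable; domain [-100, 100]^D."""
    return float(np.sum(x ** 2))

def rastrigin(x):
    """f10: Rastrigin, highly multimodal and separable; domain [-5, 5]^D."""
    return float(np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x) + 10))

x = np.zeros(10)
print(sphere(x), rastrigin(x))  # both 0 at the global optimum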


Table 13 Comparison among algorithms (PSO variants and meta-learner) in terms of the probability of being the best performer (when population size is 40 and on 10D).

Algorithms  f1    f2    f3    f4    f5    f6    f7    f8    f9    f10   f11
GPSO        1     1     0     0     0     0     0     0     0     0     0
LPSO        1     1     0     0     0     0     0     0     1     0     0
LCPSO       1     1     1     1     0     1     0     1     0     0     0
CLPSO       1     1     0     0     0     0     1     0     1     1     1
CPSO        1     0     0     0     0     0     0     0     0     0     0
UPSO        1     1     1     1     1     1     0     1     1     0     0
ARM-2NN     1     1     1     1     0.9   0.6   0.1   1     1     1     1
ARM-ANN     1     1     1     1     0.5   1     0.3   0.8   1     1     1

Algorithms  f12   f13   f14   f15   f16   f17   f18   f19   f20   f21   f22
GPSO        0     0     1     0     0     1     0     0     0     0     0
LPSO        0     0     1     0     0     1     0     0     1     0     0
LCPSO       1     0     1     1     0     1     0     0     0     1     0
CLPSO       0     1     1     0     0     1     1     1     0     0     1
CPSO        0     0     0     0     0     0     0     0     0     0     0
UPSO        1     0     0     0     1     1     0     0     0     0     0
ARM-2NN     1     0.7   1     1     0     1     1     1     0.5   1     1
ARM-ANN     1     0.8   1     1     0.2   1     1     0.4   0.3   0.8   1

Algorithms  f23   f24   f25   f26   f27   f28   f29   f30   f31   f32   f33
GPSO        0     0     0     1     0     0     0     0     0     0     0
LPSO        0     0     0     1     0     0     0     0     0     0     0
[The rows for LCPSO (f23 = 0), CLPSO (f23 = 1), CPSO (f23 = 0), UPSO, ARM-2NN and ARM-ANN in this block are not legible in the source extraction.]
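The entries of Tables 13-18 are probabilities of being the best performer, where several algorithms can score 1 on the same function if they tie for the best result. A sketch of this bookkeeping over repeated independent runs is given below; the step-0.1 values in the tables are consistent with ten runs, but the exact run count is not restated here, so the count in the example is an assumption.

from collections import Counter

def best_performer_probability(results_per_run):
    """For each independent run, find which algorithm(s) achieved the best
    (lowest) final error; ties count for every tied algorithm. Return each
    algorithm's share of wins across runs."""
    wins = Counter()
    for run in results_per_run:
        best = min(run.values())
        for name, err in run.items():
            if err == best:
                wins[name] += 1
    n = len(results_per_run)
    return {name: wins[name] / n for name in results_per_run[0]}

# Illustrative: 10 runs of two optimizers on one function.
runs = [{"GPSO": 1e-8, "CLPSO": 1e-6}] * 9 + [{"GPSO": 1e-3, "CLPSO": 1e-7}]
print(best_performer_probability(runs))  # {'GPSO': 0.9, 'CLPSO': 0.1}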


Table 14 Comparison among algorithms (PSO variants and meta-learner) in terms of the probability of being the best performer (when population size is 40 and on 30D).

Algorithms  f1    f2    f3    f4    f5    f6    f7    f8    f9    f10   f11
GPSO        0     0     0     0     0     0     0     0     0     0     0
LPSO        0     0     0     0     0     0     0     0     1     0     0
LCPSO       0     1     0     1     0     1     0     0     0     0     0
CLPSO       1     1     0     0     1     0     1     0     1     1     1
CPSO        0     0     0     0     0     0     0     0     0     0     0
UPSO        1     1     1     1     0     0     0     1     0     0     0
ARM-2NN     0.8   1     0     1     1     0.5   1     0     1     1     1
ARM-ANN     1     0.    0     1     0.9   0.7   1     0.5   1     1     1

Algorithms  f12   f13   f14   f15   f16   f17   f18   f19   f20   f21   f22
GPSO        0     0     0     0     0     0     0     0     0     0     0
LPSO        0     0     1     0     0     0     0     0     1     0     0
LCPSO       0     0     0     1     1     1     0     0     0     1     0
CLPSO       0     1     1     0     0     1     1     1     0     0     1
CPSO        0     0     0     0     0     0     0     0     0     0     0
UPSO        1     0     0     0     0     1     0     0     0     0     0
ARM-2NN     1     0.8   1     1     1     1     1     1     0.3   1     1
ARM-ANN     0.2   1     1     0.3   1     1     1     1     0     0.6   1

Algorithms  f23   f24   f25   f26   f27   f28   f29   f30   f31   f32   f33
GPSO        0     0     0     0     0     0     0     0     0     0     0
LPSO        0     0     0     0     0     0     0     0     0     0     0
[The rows for LCPSO (f23 = 0), CLPSO (f23 = 1), CPSO (f23 = 0), UPSO, ARM-2NN and ARM-ANN in this block are not legible in the source extraction.]


Table 15 Comparison among algorithms (PSO variants and meta-learner) in terms of the probability of being the best performer (when population size is 40 and on 50D).

Algorithms  f1    f2    f3    f4    f5    f6    f7    f8    f9    f10   f11
GPSO        0     0     0     0     0     0     0     0     0     0     0
LPSO        0     0     0     0     0     0     0     0     0     0     0
LCPSO       0     0     0     0     0     1     0     0     0     0     0
CLPSO       1     1     0     0     1     0     1     0     1     1     1
CPSO        0     0     0     0     0     0     0     0     0     0     0
UPSO        1     1     1     1     0     1     0     1     0     0     0
ARM-2NN     1     1     0     0     1     0.1   1     0     1     1     1
ARM-ANN     1     0.9   0     0.2   1     0.6   1     0     1     1     1

Algorithms  f12   f13   f14   f15   f16   f17   f18   f19   f20   f21   f22
GPSO        0     0     0     0     0     0     0     0     0     0     0
LPSO        0     0     0     0     0     0     0     0     0     0     0
LCPSO       0     0     0     1     1     0     0     0     0     1     0
CLPSO       1     1     1     1     0     1     1     1     1     0     1
CPSO        0     0     0     0     0     0     0     0     0     0     0
UPSO        0     0     0     0     0     0     0     0     0     0     0
ARM-2NN     1     1     1     1     1     1     1     1     1     0     1
ARM-ANN     1     1     1     1     0.9   1     1     1     1     0     1

Algorithms  f23   f24   f25   f26   f27   f28   f29   f30   f31   f32   f33
GPSO        0     0     0     0     0     0     0     0     0     0     0
LPSO        0     0     0     0     0     0     0     0     0     0     0
[The rows for LCPSO (f23 = 0), CLPSO (f23 = 1), CPSO (f23 = 0), UPSO, ARM-2NN and ARM-ANN in this block are not legible in the source extraction.]


Table 16 Comparison among algorithms (PSO variants and meta-learner) in terms of the probability of being the best performer (when population size is 60 and on 10D).

Algorithms  f1    f2    f3    f4    f5    f6    f7    f8    f9    f10   f11
GPSO        1     1     0     0     0     0     0     0     0     0     0
LPSO        0     0     0     0     0     0     0     0     1     0     0
LCPSO       1     1     0     1     1     1     0     1     1     0     0
CLPSO       0     0     0     0     0     0     0     0     1     1     1
CPSO        0     0     0     0     0     0     0     0     0     0     0
UPSO        1     1     1     1     0     1     1     1     1     0     0
ARM-2NN     0.2   1     0.4   1     1     1     …
ARM-ANN     0.9   1     0.3   1     1     1     …
[The ARM-2NN and ARM-ANN entries beyond f6, and the f12–f22 and f23–f33 blocks, are too scrambled in the source extraction to be reconstructed.]


Table 17 Comparison among algorithms (PSO variants and meta-learner) in terms of the probability of being the best performer (when population size is 60 and on 30D).

Algorithms  f1    f2    f3    f4    f5    f6    f7    f8    f9    f10   f11
GPSO        0     0     0     0     0     0     0     0     0     0     0
LPSO        1     1     0     0     0     1     0     0     1     0     0
LCPSO       1     1     1     1     0     1     0     1     1     0     1
CLPSO       1     1     0     0     1     0     0     0     1     1     1
CPSO        0     0     0     0     0     0     0     0     0     1     1
UPSO        1     1     1     1     0     1     1     1     1     0     0
ARM-2NN     1     1     1     1     0.8   1     …
ARM-ANN     1     1     1     0.8   1     …
[The remaining ARM-2NN and ARM-ANN entries, and the f12–f22 and f23–f33 blocks, are too scrambled in the source extraction to be reconstructed.]


Table 18 Comparison among algorithms (PSO variants and meta-learner) in terms of the probability of being the best performer (when population size is 60 and on 50D).

Algorithms  f1    f2    f3    f4    f5    f6    f7    f8    f9    f10   f11
GPSO        0     1     0     0     0     0     0     0     0     0     0
LPSO        0     0     0     0     0     0     0     0     0     0     0
LCPSO       1     1     0     0     0     0     0     0     0     0     0
CLPSO       0     0     0     0     1     0     0     0     1     1     1
CPSO        0     0     0     0     0     0     0     0     0     1     1
UPSO        1     1     1     1     1     1     1     1     0     0     0
ARM-2NN     1     0     0.7   1     0.8   0.9   …
ARM-ANN     0.9   0.8   0.1   0.4   1     …
[The remaining ARM-2NN and ARM-ANN entries, and the f12–f22 and f23–f33 blocks, are too scrambled in the source extraction to be reconstructed.]