Adaptive recommendation model using meta-learning for population-based algorithms

Xianghua Chu1,2, Fulin Cai1, Can Cui3, Mengqi Hu4, Li Li1, Quande Qin1,*

1. College of Management, Shenzhen University, China
2. Institute of Big Data Intelligent Management and Decision, Shenzhen University, China
3. School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, USA
4. Department of Mechanical and Industrial Engineering, University of Illinois at Chicago, USA
*Corresponding author, email: [email protected]
Abstract: To efficiently solve complex optimization problems, numerous population-based meta-heuristics and their extensions have been developed. However, the performance of these algorithms varies across problems. In this research, we propose an Adaptive Recommendation Model (ARM) that uses meta-learning to identify an appropriate, problem-dependent population-based algorithm. In ARM, algorithms are adaptively selected by mapping the problem characteristics to the algorithm performance. Since the extracted meta-features and the adopted meta-learner significantly affect system performance, 18 meta-features covering statistical, geometrical and landscape properties are extracted to characterize optimization problem spaces, and both instance-based and model-based learners are investigated. Two performance metrics, Spearman's rank correlation coefficient and the success rate, are used to evaluate the accuracy of the optimizer ranking prediction and the precision of the best-optimizer recommendation. The proposed ARM is compared against population-based algorithms with distinct search capabilities, including PSO variants, non-PSO population-based optimizers, hyper-heuristics and ensemble methods. Benchmark functions and real-world problems with various properties are adopted in the experiments. Experimental results reveal the extendibility and effectiveness of ARM on the diverse tested problems in terms of solution accuracy, ranking and success rate.

Keywords: global optimization, algorithm selection, meta-learning, population-based algorithm, recommendation system
1. Introduction
The increasing complexity of real-world optimization problems urges researchers to develop efficient meta-heuristic algorithms. Consequently, numerous population-based algorithms have been proposed, e.g., particle swarm optimization (PSO) [23], differential evolution (DE) [48] and the artificial bee colony algorithm (ABC) [22]. Extensive research has investigated the application of these algorithms to diverse global optimization problems, and the efficacy of population-based algorithms has been demonstrated [6, 7, 26, 39, 42]. However, it is known that the performance of these algorithms and their variants is problem-dependent. The immediate challenge is that, for a new problem without prior knowledge, it is difficult to identify the appropriate algorithm. The simplest solution is a trial-and-error approach, but it is computationally expensive, which limits its application to real-time, dynamic and large-scale problems. Researchers have explored strategies to enhance algorithms' general search capability, such as hybrid meta-heuristics [2] and parameter control methods [13]. Again, these methods are tailored to specific problems, and the performance of the algorithms on new problems is
questionable. Recently, hyper-heuristics [4], algorithm ensembles [29] and algorithm portfolios [38] have been proposed to alleviate the dilemma of algorithm generalization. A hyper-heuristic adopts a high-level algorithm to guide low-level algorithms in tackling the problem, while an algorithm portfolio focuses on allocating computation time among algorithms and fully utilizing their complementary advantages to maximize the expected utility of a problem-solving episode. While promising, these methods disregard the similarities between the current problem and previously observed problems [5]. In addition, assessing the algorithms' performance requires prior information and expert knowledge of the search algorithms.
In this paper, we propose a generalized, data-driven recommendation system based on the meta-learning concept that can adaptively select the best optimizer for global optimization problems according to the problem characteristics. Meta-learning is an approach that originated in the machine learning field; its principle is to investigate a learning instance and refine the learning mechanism for future application to other instances [3, 40]. Since its inception, meta-learning has been applied to different areas [43], e.g., the selection of the kernel width for support vector regression [47], clustering algorithm selection [14], energy consumption forecasting [11], the meta-modeling process [10], and scheduling optimization [44], to name a few.
To the best of our knowledge, limited research has empirically investigated the relationship between the performance of population-based algorithms and the landscape features of optimization problems [34]. We contend that discovering the algorithm-problem relationship is important, as it provides general guidelines for identifying appropriate algorithms for a given problem. To fill this gap, we develop a self-adaptive automatic learning mechanism in which the connection between the meta-features and the algorithm performance is bridged. As a result, given a new problem, its landscape features are recognized by the meta-learner to adaptively select the best optimizer for global optimization. Specifically, four challenges need to be addressed:
(1) In the meta-learning area, the selection of meta-features that characterize optimization problems plays a critical role in the selection performance, and it is challenging to devise a set of meta-features that is general and efficient enough to characterize diverse problem spaces. Which meta-features are crucial to depict the landscape of a global optimization problem?
(2) Optimization problems are in general complex and black-box, so meta-features are generated from sampled data that represent the problem space. To characterize the problem landscape as accurately as possible, how should data be sampled efficiently, and what sample size is appropriate?
(3) Due to the large number of population-based algorithms, it is impossible to enumerate all of them in the algorithm repository. How should an algorithm repository of representative algorithms be composed for recommendation, and how can the proposed meta-learning mechanism be validated?
(4) In addition to the benchmark functions, how well does ARM perform on more complex test functions and real-world problems?
To address the above questions, a population-based algorithm recommendation model, termed the Adaptive
Recommendation Model (ARM), is proposed. There are four components in the ARM: the meta-feature module, the algorithm performance module, the learning module and the recommending module. The meta-feature module consists of a set of meta-level features that describe the problem properties based on representative data sampled from the problem; the algorithm performance module contains the population-based optimizers and the metrics used to evaluate their performance; the learning module obtains a mature model that maps the meta-features to the algorithm performance through a training process; and the recommending module selects an appropriate optimizer for a new problem using the trained model. The main goal is to achieve an optimal recommendation model by building a knowledge repository that maps meta-features to algorithm performance.
To validate the efficacy of the proposed ARM, an eleven-step procedure is carried out: (1) A pool of 33 optimization problems collected from [7] is first constructed as the problem repository. (2) A set of population-based algorithms, i.e., six PSO variants with distinct characteristics and properties, is initially employed to build the algorithm repository. The PSOs include global topology PSO (GPSO) [42], local topology PSO (LPSO) [24], local topology PSO with constriction factor (LCPSO) [24, 26], comprehensive learning PSO (CLPSO) [26], cooperative PSO (CPSO) [15], and unified PSO (UPSO) [37]. (3) The rankings of algorithm performance on each benchmark function are captured in terms of the average best fitness value and t-test comparison. (4) The Latin hypercube sampling (LHS) technique [17] is employed to gather a representative dataset for each benchmark function. (5) Eighteen computationally efficient meta-features (statistical, geometrical and landscape) are derived from the sampled dataset to depict the problem landscape. (6) Two types of meta-learners, instance-based (k-nearest neighbors, k-NN) and model-based (artificial neural network, ANN), are implemented to study the meta-learning process under different problem dimensions and varied algorithm performance. (7) Spearman's rank correlation coefficient and the success rate are used to assess the performance of ARM. (8) The performance of the two meta-learners is validated and compared for different problem dimensions and population sizes. (9) Five population-based algorithms (DE, ABC, the backtracking search algorithm (BSA) [8], the differential search algorithm (DSA) [9] and the firefly algorithm (FA) [50]) are employed to validate the effectiveness and extendibility of ARM. (10) Three ensemble algorithms and one hyper-heuristic (ensemble of mutation strategies and parameters in DE (EPSDE) [30], ensemble particle swarm optimizer (EPSO) [29], multi-population ensemble DE (MPEDE) [49] and drone squadron optimization (DSO) [12]) are introduced into the algorithm pool to demonstrate the generalization and performance of ARM. (11) Fifteen benchmark functions from CEC 2015 and two complex real-world problems are adopted to validate the performance of ARM. Experimental results reveal that the proposed framework achieves a reliable correlation on algorithm rankings and, on average, over an 88% success rate in the best performer recommendation for the benchmark functions and real-world problems. In summary, there are three contributions from the proposed ARM:
(1) A practical set of meta-features is proposed to describe the problem space of global optimization. (2) To the best of our knowledge, this is the first adaptive recommendation model using meta-learning from the machine learning domain for population-based algorithms addressing global optimization.
(3) The knowledge-based learning mechanism of ARM does not require extensive expertise, and it has the flexibility and extendibility to include more efficient optimizers for complex optimization problems.
The rest of the paper is organized as follows: related works about population-based algorithms are reviewed in Section 2. Section 3 presents a detailed description of ARM. Experiments and results are discussed in Section 4.
Section 5 provides a discussion of the conclusions and future work.
2. Related work

According to the "No Free Lunch" theorem, no single population-based algorithm performs well on all
problems. Rather than improving a single population-based algorithm, researchers have attempted to integrate different algorithms to enhance adaptability and performance. Specifically, algorithms with complementary characteristics are integrated into hybrid versions, termed hybrid metaheuristics [1]. The algorithm portfolio mechanism has been proposed to avoid betting the entire computational budget on a single algorithm [38]. Another notable effort is to adaptively select the appropriate algorithms for the problems [36]. Hyper-heuristics [4] design high-level heuristics to guide low-level heuristics during the search process. One critique is that these methods ignore the similarities between the current problem and previously observed problems [5]. Moreover, assessing the algorithms' performance requires prior information and expert knowledge of search algorithms. In addition, the methods reviewed above do not exploit the algorithm-problem relationship, which may be instrumental in selecting the appropriate algorithms for given problems, and involving more algorithms entails a greater waste of computational resources, which are usually limited in practice. In fact, the relationship between problems and algorithm performance should be considered.
Meta-learning has been widely applied in the machine learning field for selecting an appropriate parameter set or model for a complex task [3, 43]. Previous research mainly lies in two directions: parameter tuning [27, 33, 45, 47] and algorithm selection [11, 14, 20, 44]. In parameter tuning, Soares et al. adopt meta-learning to set the kernel-width parameter of support vector regression [47]; the regression error is remarkably reduced compared with traditional methods (a fixed default ranking, cross-validation and a heuristic method). In [46], meta-learning was used to recommend an initialization located in a promising region, which facilitates multi-objective optimization algorithms in searching for the Pareto front. Parameter settings also affect the performance of population-based algorithms; Smith and Fogarty theoretically summarize adaptive parameter control methods, including meta-learning, for genetic algorithms [45]. To select the optimal parameter configuration for the CMA-ES algorithm, Munoz et al. implement a neural network regression model to predict the expected running time on black-box problems [33]; in that study, prior expert knowledge is required to predetermine a set of candidate parameter settings as model input. Although the parameters are appropriately set following such recommendations, it is interesting to observe that algorithm performance depends more on the solving strategy than on the tuned parameters [27].
In the area of algorithm selection using meta-learning, Cui et al. [11] develop a framework that recommends an appropriate surrogate model for forecasting short-term building energy consumption, where six statistical and machine learning models are studied. To help practitioners select the best clustering algorithm, high-quality rankings of clustering algorithms are predicted by the k-nearest neighbors algorithm using the meta-learning concept [14]. Munoz et al. [34] focus on algorithm selection for black-box continuous optimization problems and survey the current landscape analysis methods. Researchers have also investigated the application of meta-learning to scheduling problems [44] and the traveling salesman problem (TSP) [20]. In [44], Smith-Miles et al. address the selection between two heuristics (earliest due date and shortest processing time) for scheduling problems, investigating meta-learning's performance with a neural network and a decision tree as performance prediction models. A meta-learning model with a specific meta-feature set for TSP instances is proposed to predict the rankings of meta-heuristics in [20].
The application of meta-learning is an emerging field, yet its applicability to population-based optimizers has not been fully investigated. Meta-learning, which learns from previous experience, can operate at a more general level; specifically, it has the potential to study different datasets, ranging across different problem domains and different features [36]. This research develops a data-driven framework that can adaptively identify the appropriate population-based optimizer for global optimization problems. The knowledge discovered from the relationship between the meta-features (problem landscape features) and algorithm performance (optimizer performance metrics) will help users identify the appropriate optimizer for a new global optimization problem.
3. ARM for population-based algorithms

In this research, derived from Rice's work in machine learning [40], a population-based algorithm recommendation model is proposed to adaptively select the appropriate method for numerical optimization problems, as illustrated in Fig. 1. The population-based algorithm recommendation system can be viewed as a meta-learning-based framework that models the relations between algorithm performance and optimization problem characteristics. The proposed ARM has four components: the meta-feature module, the algorithm performance module, the learning module and the recommending module. The details are described in the following subsections.
[Fig. 1 Framework of ARM. The meta-feature module samples data from the repository of problems and calculates meta-features; the algorithm performance module measures the performance of the algorithms in the repository of algorithms; the learning module trains the meta-learner in the meta-learning process; and the recommending module samples a new problem, calculates its meta-features, selects an optimizer through the recommendation system, and solves the problem.]

3.1. Meta-feature module
Identifying the appropriate set of meta-features to characterize a problem landscape is a crucial step for
meta-level induction learning [43]. Meta-feature selection and extraction greatly affect the subsequent learning performance as the meta-learning algorithm (meta-learner) is sensitive to the set of meta-features.
To comprehensively capture the problem characteristics, in this research we consider three types of meta-features: statistical features, geometrical measurement features and landscape features. The statistical features summarize statistical information about the problem, including the mean, standard deviation, skewness, kurtosis, altitude of the search space, first quartile, median and third quartile of the fitness values. The geometrical measurement features describe the complexity of the problem surface and include the gradient-based features 1)-4), the outlier ratio and the ratio of local extrema. The basic statistical and geometrical measurement features are empirically selected from more than 40 features applied in previous meta-learning studies [10]; landscape features with computationally efficient characteristics are chosen to represent dispersion [28] and fitness distance correlation [19]. The definitions of the 18 derived meta-features are as follows. With a sample of N data points, the gradient value $G_i$ of the ith data point is calculated as:
$$G_i = f(x_i) - f(x_i + \mathrm{step}), \quad i = 1, \ldots, N \qquad (1)$$

where $x_i = (x_i^1, x_i^2, \ldots, x_i^D)$ is the position of point i, $f(x_i)$ refers to the fitness value of point i, and the step is defined as 1% of the function range.

1) $\overline{|G|}$: Mean of the gradient of the fitness surface, which evaluates how steep and rugged the fitness surface is according to its rate of change around the sampled data points.

$$\overline{|G|} = \frac{1}{N} \sum_{i=1}^{N} |G_i| \qquad (2)$$

2) $M(|G|)$: Median of the gradient of the fitness surface.

3) $SD(|G|)$: Standard deviation of the gradient of the fitness surface, which evaluates the variation of the gradient on the surface.

$$SD(|G|) = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \big(|G_i| - \overline{|G|}\big)^2} \qquad (3)$$

4) $|G|_{\max}$: Maximum of the gradient of the fitness surface, which illustrates the maximal degree of sudden change on the surface.

5) $\bar{f}$: Mean of the fitness values, which evaluates the general fitness level of the surface.

$$\bar{f} = \frac{1}{N} \sum_{i=1}^{N} f_i \qquad (4)$$

6) $SD(f)$: Standard deviation of the fitness values, which evaluates the bumpiness of the surface by measuring each value's deviation from the mean.

$$SD(f) = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (f_i - \bar{f})^2} \qquad (5)$$

7) $\gamma_1(f)$: Skewness of the fitness values, which measures the lack of symmetry of the surface.

$$\gamma_1(f) = E\big[(f_i - \bar{f})^3\big] \,/\, SD(f)^3, \quad i = 1, \ldots, N \qquad (6)$$

8) $\gamma_2(f)$: Kurtosis of the fitness values, which measures the surface relative to the normal distribution.

$$\gamma_2(f) = E\big[(f_i - \bar{f})^4\big] \,/\, \big(E\big[(f_i - \bar{f})^2\big]\big)^2, \quad i = 1, \ldots, N \qquad (7)$$

9) $\Delta f$: Altitude of the search space, which evaluates the magnitude of the search-space altitude on the basis of the difference between the upper and lower bounds of the fitness values.

$$\Delta f = \lg\big(|\max(f_i) - \min(f_i)|\big), \quad i = 1, \ldots, N \qquad (8)$$

10) Q1 of the fitness values: 25% quartile of the response values, which represents the lower quartile of the fitness values.
11) Q2 of the fitness values: 50% quartile of the response values, which represents the median of the fitness values.
12) Q3 of the fitness values: 75% quartile of the response values, which represents the upper quartile of the fitness values.
13) Outlier ratio: Ratio of outliers among the fitness values, which evaluates the percentage of extreme values among all values through the Grubbs test [16].
14) & 15) Ratio of local extrema: Ratios of local minima and maxima with 4 nearby points in Euclidean space, which distinguish bumpy fitness surfaces from flat ones.
16) Depth of local extrema (DLE): Averaged depth of local minima and maxima with 4 neighborhoods in Euclidean space, which describes how vigorously swarms must jump out of local optima by looking at the average maximum fitness distance among neighborhoods.
17) Dispersion (DISP): The average pairwise distance between the q best points, which identifies the global structure. In this study, q is equal to 20% of the sample size.

$$\mathrm{DISP} = \frac{1}{q(q-1)} \sum_{i=1}^{q} \sum_{j=1, j \ne i}^{q} d(x_i, x_j) \qquad (9)$$

18) Fitness distance correlation (FDC): The relationship between position and fitness value, which indicates the capability to identify deceptiveness in the landscape.

$$\mathrm{FDC} = \frac{1}{N-1} \sum_{i=1}^{N} \Big(\frac{y_i - \bar{y}}{\sigma_y}\Big)\Big(\frac{d_i - \bar{d}}{\sigma_d}\Big) \qquad (10)$$

where $d_i$ is the distance between point i and the best position; $\bar{y}$ and $\bar{d}$ are the average fitness value and distance; and $\sigma_y$ and $\sigma_d$ are the standard deviations of the fitness values and the distances.
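For illustration only, the sketch below shows how three of these meta-features (the gradient statistics, DISP and FDC) could be computed in Python from a sampled dataset; it assumes a minimization problem and a scalar objective f, and the helper names and simplified normalizations are our own, not the authors' implementation.

```python
import numpy as np

def gradient_features(f, X, lower, upper):
    """Gradient-based features (Eqs. 1-3): the step is 1% of each variable's range."""
    step = 0.01 * (np.asarray(upper) - np.asarray(lower))
    f_here = np.apply_along_axis(f, 1, X)
    f_step = np.apply_along_axis(f, 1, X + step)
    g = np.abs(f_here - f_step)                        # |G_i| at every sampled point
    return g.mean(), np.median(g), g.std(ddof=1), g.max()

def dispersion(X, y, fraction=0.2):
    """DISP (Eq. 9): mean pairwise distance among the best `fraction` of sampled points."""
    q = max(2, int(fraction * len(y)))
    best = X[np.argsort(y)[:q]]
    d = np.linalg.norm(best[:, None, :] - best[None, :, :], axis=-1)
    return d.sum() / (q * (q - 1))                     # diagonal terms are zero

def fitness_distance_correlation(X, y):
    """FDC (Eq. 10): correlation between fitness and distance to the best sample."""
    d = np.linalg.norm(X - X[np.argmin(y)], axis=1)
    return np.mean((y - y.mean()) * (d - d.mean())) / (y.std() * d.std() + 1e-12)
```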
To develop the repository of global optimization problems, 33 problems with various properties are initially selected for knowledge training. The problems include unimodal functions without rotation, rotated unimodal functions, multimodal functions without rotation, rotated multimodal functions, and noisy and mis-scaled functions (detailed properties are summarized in Appendix A). Since it is challenging to include all possible properties of optimization problems, we aim to extend the knowledge gained from these cases to address new problems. The CEC 2015 benchmark functions are then employed to validate the effectiveness of the obtained representative knowledge. Please note that more functions can be added to the problem repository for knowledge learning at a later stage, which can improve the recommendation accuracy of the proposed model.
Given the problem repository, the LHS technique is employed to sample raw data from each optimization function for landscape characterization [17]. LHS offers high uniformity and coverage of a multidimensional distribution while requiring little computational effort; it is therefore a popular statistical sampling method for propagating uncertainty in analyses of complex systems. In addition, when the sample size is small, LHS makes simulations converge faster than traditional random sampling techniques such as Monte Carlo sampling [31].
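As a rough sketch of this sampling step, and assuming SciPy's qmc module is available, the snippet below draws a Latin hypercube design inside the box constraints of a problem and evaluates the objective on it; the sample size shown is illustrative.

```python
import numpy as np
from scipy.stats import qmc

def lhs_sample(objective, lower, upper, n_samples=2500, seed=0):
    """Draw an LHS design in [lower, upper]^D and evaluate the objective on it."""
    dim = len(lower)
    sampler = qmc.LatinHypercube(d=dim, seed=seed)
    unit = sampler.random(n=n_samples)          # points in [0, 1]^D
    X = qmc.scale(unit, lower, upper)           # rescale to the search box
    y = np.array([objective(x) for x in X])     # raw fitness values
    return X, y
```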
3.2. Algorithm performance module

Since the population-based algorithms conduct stochastic search, different results can be obtained in each run. Thus, to measure algorithm performance, every algorithm is run 30 times on each benchmark function, with the maximum number of function evaluations set to Dimension (D)*10000. The algorithms are tested on the 33 problem functions to capture the average solution accuracy. The mean of the best fitness values (ABF) over the 30 runs, a common metric for comparing algorithm performance [7], is adopted for performance measurement. The ABF is calculated as:

$$\mathrm{ABF} = \sum_{n=1}^{30} (\text{best fitness value})_n \,\big/\, \text{total number of runs} \qquad (11)$$
The performance of each algorithm is sorted by the average best fitness value; in addition, a significance test (a two-tailed t-test at the 0.05 level) is conducted to validate the significance of the performance differences. Because the population-based algorithms are adaptively selected for a new problem, the selection is based on the rankings derived from the ABF values, which mitigates the high variance of optimization difficulty across benchmark functions. Rankings facilitate the recommendation process, being scale-free and case-wise independent [10].
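A minimal sketch of this scoring step is given below, assuming each algorithm's 30 best fitness values are already collected; the t-test-based adjustment of near-identical ABF values is omitted and ties are simply given equal ranks.

```python
import numpy as np
from scipy.stats import rankdata

def rank_algorithms(best_fitness_runs):
    """best_fitness_runs: dict {algorithm_name: array of 30 best fitness values}.
    Returns the ABF per algorithm and the ranks (1 = best, i.e. lowest ABF)."""
    names = list(best_fitness_runs)
    abf = np.array([np.mean(best_fitness_runs[a]) for a in names])   # Eq. (11)
    ranks = rankdata(abf, method="min")   # t-test based tie adjustment omitted here
    return dict(zip(names, abf)), dict(zip(names, ranks))
```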
Candidate algorithms in the repository have different strengths and diverse characteristics because optimization problems always possess diverse properties. Proposing a variant of an algorithm is a common way to enhance performance; however, a modified algorithm that performs well on all problems remains challenging. Here, six canonical PSO variants with different strengths are initially selected to build the algorithm repository. These PSOs include GPSO [42], LPSO [24], LCPSO [24], CLPSO [26], CPSO [15] and UPSO [37]. The PSOs were separately improved in three directions: topology structure, learning strategy and parameter setting, as shown in Table 1. As a result, the PSOs display various search properties. GPSO, with its global topology, converges quickly on unimodal problems but easily gets trapped in local optima; LPSO, with a local topology, performs niching search that is effective for multimodal problems; and CLPSO performs well on complex problems while converging slowly on unimodal problems.

Table 1 Categorization of PSO variants
Topology structure: GPSO (global topology), LPSO (local topology), UPSO (integration of global and neighbor topology)
Learning strategy: CLPSO (comprehensive learning strategy), CPSO (cooperative learning strategy)
Parameter setting: LCPSO (constriction factor), GPSO (weighted)
Please note that integrating all distinct population-based algorithms is challenging due to the large number of optimizers. Here, our objective is to validate ARM on representative algorithms, including PSOs, PSO variants, non-PSO population-based optimizers, hyper-heuristics and ensemble methods. The five other state-of-the-art population-based algorithms are DE [48], ABC [22], BSA [8], DSA [9] and FA [50]. These widely used algorithms are motivated by various sources of inspiration: ABC, FA and the PSOs are inspired by collective biological behaviors in nature, while DE, BSA and DSA are based on genetic and mathematical types of population search. It is empirically known that these algorithms display diverse search behaviors on global optimization problems [8, 9, 21, 50]. Moreover, ensemble algorithms [29, 30, 49] and one hyper-heuristic [12] are included to justify the effectiveness of ARM.
3.3. Learning module
Generally, meta-learning algorithms are divided into two categories: instance-based learners and model-based learners [43]. Instance-based learners predict the rankings using problem similarity, which is measured by a distance metric; this is based on the hypothesis that an algorithm performs similarly on similar problems. Model-based learners predict the rankings using an underlying model generated by a learning process that maps the meta-features to the algorithm performance.
3.3.1. Instance-based meta-learner
The k-NN ranking approach is chosen as an instance-based learner, due to its wide use in other studies and its effectiveness and efficiency [3, 14, 20, 47]. The k nearest neighbors are selected by measuring the cosine similarity between the new problem and the meta-examples, based on the meta-features. Then, the rankings of performance are calculated to make a recommendation. The cosine similarity is calculated as follows:
$$\mathrm{similarity}(i,j) = \cos(x_i, x_j) = \frac{x_i \cdot x_j}{\|x_i\|_2 \, \|x_j\|_2}, \quad i, j = 1, \ldots, m \qquad (12)$$

where $x_i$ and $x_j$ denote the meta-feature vectors of problems i and j, and m is the number of problems.
The ranking of algorithm a on the new problem i is predicted as

$$rr_{i,a} = \frac{\sum_{j \in N(i)} \mathrm{similarity}(i,j)\, ir_{j,a}}{\sum_{j \in N(i)} \mathrm{similarity}(i,j)} \qquad (13)$$

where $rr_{i,a}$ is the recommended rank of algorithm a on problem i, $ir_{j,a}$ is the ideal rank of algorithm a on neighboring problem j, and N(i) represents the set of k nearest neighbors of problem i.
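A compact sketch of Eqs. (12)-(13) is shown below: cosine similarity over meta-feature vectors followed by a similarity-weighted average of the neighbours' ideal ranks. The k = 2 default mirrors the 2NN setting used in Section 4; the variable names are otherwise illustrative.

```python
import numpy as np

def knn_predict_ranks(new_features, train_features, train_ranks, k=2):
    """Predict ranking values for a new problem from its k most similar meta-examples.

    train_features: (m, n_features) meta-feature matrix of the training problems.
    train_ranks:    (m, n_algorithms) ideal ranks of every algorithm on those problems.
    """
    x = np.asarray(new_features, dtype=float)
    F = np.asarray(train_features, dtype=float)
    R = np.asarray(train_ranks, dtype=float)
    # Eq. (12): cosine similarity between the new problem and every meta-example
    sim = F @ x / (np.linalg.norm(F, axis=1) * np.linalg.norm(x) + 1e-12)
    nearest = np.argsort(sim)[-k:]          # indices of the k most similar problems
    w = sim[nearest]
    # Eq. (13): similarity-weighted average of the neighbours' ideal ranks
    return (w[:, None] * R[nearest]).sum(axis=0) / w.sum()
```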
3.3.2. Model-based meta-learner
In this study, a regression-based learner is trained on the meta-example datasets. Given the complexity and heterogeneity among the various meta-features and the ranking position values of each algorithm, an ANN [41] is employed due to its superiority in non-linear function modeling and its robustness to noisy and redundant features [20, 44].
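As an illustration of such a model-based learner, scikit-learn's MLPRegressor can stand in for the ANN described here, mapping meta-feature vectors to the ranking value of each algorithm; the hidden-layer size and activation below echo the tuning grid mentioned in Section 4, while the remaining settings are assumptions.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_ann_meta_learner(hidden_units=10):
    """Regression model mapping meta-feature vectors to the ranking value of each algorithm."""
    return make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(hidden_units,),
                     activation="logistic",   # one of the transfer functions tried in Section 4
                     max_iter=2000,
                     random_state=0),
    )

# usage sketch: model = build_ann_meta_learner(); model.fit(meta_features, ideal_ranks)
```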
3.4. Recommending module

After the meta-learning process is completed, the knowledge repository mapping the meta-features to the algorithms' performance is generated. To address a new optimization problem, the following steps are performed:
Step 1: Sample data from the new problem space.
Step 2: Calculate the meta-features that characterize the new problem space using the proposed meta-feature set.
Step 3: Input the meta-features into the recommendation system.
Step 4: Output the recommended optimizer from the candidate algorithms.
Step 5: Solve the new problem using the selected optimizer.
Once the recommended rankings of the algorithms are calculated, two common evaluation metrics from the meta-learning field are adopted to measure the output performance: the Spearman's rank correlation coefficient (SRCC) and the success rate (SR).
The SRCC [35] is employed to evaluate the agreement between the recommended rankings and the true (ideal) rankings:

$$\rho = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N(N^2 - 1)} \qquad (14)$$

where $d_i$ is the distance between the recommended ranking and the ideal ranking of algorithm i, and N is the number of algorithms. A value of 1 means full agreement, -1 means full disagreement, and 0 represents no relationship between the two rankings, which would be the expected value of a random ranking method.
The SR is the percentage of exact matches between the ideal best performer and the recommended best performer over the tested problems, i.e., the ratio of the number of problems on which the predicted best performer matches the ideal one to the total number of tested problems. It evaluates the precision of the recommendation system; in the recommendation of optimization algorithms, users are primarily concerned with whether the selected best performer equals the ideal one. Hence, both the SRCC and the SR are employed to comprehensively compare the performance of the different population-based algorithms.
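Both metrics follow directly from Eq. (14) and the match-counting definition above; a minimal sketch, assuming the recommended and ideal rank vectors (and best performers) are available per test problem, is:

```python
import numpy as np

def srcc(recommended, ideal):
    """Spearman's rank correlation coefficient, Eq. (14)."""
    r = np.asarray(recommended, dtype=float)
    s = np.asarray(ideal, dtype=float)
    n = len(r)
    return 1.0 - 6.0 * np.sum((r - s) ** 2) / (n * (n ** 2 - 1))

def success_rate(recommended_best, ideal_best):
    """SR: fraction of problems where the recommended best performer is the ideal one."""
    hits = sum(r == i for r, i in zip(recommended_best, ideal_best))
    return hits / len(ideal_best)
```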
3.5. ARM implementation

The implementation procedure of ARM is as follows:
Step 1: Collect candidate algorithms and optimization problems to construct the algorithm repository and the problem repository.
Step 2: Test and obtain the result of each algorithm on each problem in the repositories. In this study, the performance is measured by ABF and adjusted by t-test. Rank the performances of the algorithms on each problem and store the rankings as the meta-knowledge of algorithm performance.
Step 3: Randomly sample raw data from each problem space in the repository using LHS, and calculate the corresponding 18 features from the sampled raw data. Store these as the meta-features.
Step 4: Cross-validate the meta-learning model on the meta-features (features) and algorithm performance (labels), and retrain the best model using all data.
For a new optimization problem, the ARM solution procedure is given as follows:
Step 1: Extract meta-features from the new problem and input them into the trained model. The rankings of the optimization algorithms are output as recommendations.
Step 2: Use the best recommended optimization algorithm to solve the new problem.
Step 3: Evaluate the recommendation results using SRCC and SR.
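Putting the pieces together, a minimal end-to-end sketch of the solution procedure might look as follows; lhs_sample refers to the earlier sampling sketch, while extract_meta_features, meta_learner and candidate_optimizers are hypothetical placeholders for the modules described above.

```python
import numpy as np

def recommend_and_solve(problem, lower, upper, meta_learner, candidate_optimizers,
                        n_samples=2500):
    """ARM solution procedure for a new problem (Section 3.5, Steps 1-3)."""
    # Step 1: sample the problem space and extract the 18 meta-features
    X, y = lhs_sample(problem, lower, upper, n_samples=n_samples)   # Section 3.1 sketch
    features = extract_meta_features(X, y)                          # hypothetical helper
    # Step 2: predict ranking values and pick the best-ranked optimizer
    predicted_ranks = meta_learner.predict([features])[0]
    best = candidate_optimizers[int(np.argmin(predicted_ranks))]
    # Step 3: solve the new problem with the recommended optimizer
    return best(problem, lower, upper)
```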
4. Experimental validation
In this section, the numerical experiments are grouped into five categories. In the first experiment, different sample sizes for the data generated from the benchmark functions are tested separately in three dimensions (10D, 30D and 50D); the performance of the meta-learner is sensitive to the sample size, which drives meta-feature generation and can affect the recommendation results of the proposed meta-learning model, so an appropriate sample size must be selected at the initial stage. Once the sample size is determined, the second experiment is conducted on the PSOs to determine the appropriate meta-learner by exploring the performance of the meta-learners on different learning instance repositories; two types of meta-learner models (ANN and k-NN) are investigated. The first two experiments use the benchmark functions shown in Appendix A in three dimensions (10D, 30D and 50D) separately [7]. The third experiment validates the effectiveness and extendibility of ARM on other population-based optimizers; six optimizers and a pool of CEC 2015 benchmark problems are employed for this investigation. In the fourth experiment, three ensemble algorithms and one hyper-heuristic are adopted to further verify the proposed model. In the last experiment, two real-world problems are adopted to validate the practical performance of ARM.
Regarding the general parameter settings for the optimizers, the maximum number of function evaluations (MAX_FES) is set to 10000*D, with 30 independent runs. The specific parameter settings are adopted from the corresponding original references. The rankings of each algorithm based on the ABF values over the 33 benchmark functions are obtained to represent the performances, as discussed in the algorithm performance module. Moreover, both the instance-based meta-learner and the model-based meta-learner (ANN) are implemented to study the proposed ARM in the first two experiments. Based on a preliminary study, we found that the k-NN method with two neighbors performs most efficiently (termed 2NN). For the model-based learning method, ANN, the hidden layer size is tuned within the set {5, 10, 15}, and the transfer function is selected from radial basis, logistic sigmoid and tan sigmoid. Besides, to select the appropriate model parameters and avoid over-fitting, a 10-fold cross-validation process with 70% of the data used for training and 30% for validation is applied. After training, ANN regression models are constructed for predicting the ranking values of the candidate algorithms.
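As an illustration of this tuning step, the grid below uses the hidden-layer sizes mentioned above together with scikit-learn activations standing in for the sigmoid transfer functions (a radial-basis activation is not available in MLPRegressor); the 10-fold split follows the text, and everything else is an assumption.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

param_grid = {
    "hidden_layer_sizes": [(5,), (10,), (15,)],
    "activation": ["logistic", "tanh"],   # stand-ins for the sigmoid transfer functions
}
search = GridSearchCV(MLPRegressor(max_iter=2000, random_state=0),
                      param_grid, cv=10, scoring="neg_mean_squared_error")
# search.fit(meta_features_train, rank_labels_train); best model: search.best_estimator_
```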
4.1. Experiment on sample size adjustment

The objective of this experiment is to identify the appropriate sample size for the data sampled from the benchmark functions. Six PSO variants separately optimize each benchmark function in three dimensions (10D, 30D and 50D), and the results are used to build the learning instance repository. In this experiment, the parameter settings of the PSO variants are the same as in their original references [15, 24, 26, 37, 42].
[Fig. 2 The performance curves of the two meta-learners (ANN and k-NN): average SRCC versus sample size (500, 1500, 2500, 3500) for (a) 10-dimension, (b) 30-dimension and (c) 50-dimension problems.]
In this experiment, we employ the representative functions f1, f5, f6, f17, f30 and f31 as the test set and the remaining functions as the training set. Four sample sizes, 500, 1500, 2500 and 3500, are tested with the two meta-learners separately, using the proposed meta-feature set. The SRCC value denotes the performance of the recommendation result on the test problems. Each experiment is conducted 10 times, and the average SRCC values of the ten repetitions are computed for a multiple comparison test; the results are presented in Fig. 2. It is observed that, as the sample size grows, the ANN curve stays flat on the 30D and 50D problems but increases on the 10D problems. For the k-NN with two neighbors, the performance improves as the sample size increases from 500 to 2500; moreover, the speed of this improvement slows down as the dimension grows. In the following experiments, the sample sizes are therefore set around 2500, specifically 2000, 2500 and 3000 for 10D, 30D and 50D, respectively.
4.2. Validation of ARM on PSO variants

The objective of this experiment is to evaluate the accuracy and robustness of ARM derived from the PSO variants. The leave-one-out cross-validation strategy is adopted to validate the prediction performance: 32 of the 33 functions are treated as the training set and the remaining one as the test set, so the experiment is repeated 33 times to ensure that every function is iteratively trained and tested. For each repetition, the recommendation performance on each problem is measured by the SRCC and SR, and once all values are generated, the average over the 33 problems is obtained. To evaluate the stability of the proposed model, the process over the 33 problems is conducted 30 times to gather the standard deviation of the recommendation results. In addition to the population size of 40, a population size of 60 is adopted on the three dimensions (10D, 30D and 50D) to study the effectiveness of ARM. Please note that this experiment aims to investigate the performance of the derived model under population-size variation; the other parameters of the algorithms remain the same as in their original references [15, 24, 26, 37, 42].
The average performance statistics of the meta-learners are summarized in Table 2 and Table 3. 'Baseline' refers to the mean of the target values across the 33 problems; the mean commonly serves as a reference for the quality of a recommendation [3], and a value greater than the Baseline indicates better quality. As can be seen from Table 2 and Table 3, the recommendation rankings given by both learners greatly surpass the Baseline ranking. Both learners have a high average SR value and a low average SRCC value for the low dimension. Besides, the standard deviations are less than 0.04, which indicates the promising convergence and stability of ARM. For relatively simple problems, many optimizers easily find optimal solutions, resulting in similar ranking values; as a result, recommending an appropriate algorithm is easier than predicting the rankings between optimizers on the 10D problems. For the 30D and 50D problems, both the SRCC and SR results improve because the performance differences between the optimizers become more significant with the increasing complexity.
When the population size is 40, the performance of the two learners is similar in the low dimension (10D problems) but differs for relatively high dimensions. Specifically, ANN performs better than 2NN on the 30D and 50D problems in terms of SRCC and SR. When the population size is 60, ANN outperforms 2NN on all three dimensions, which indicates the efficacy of ANN in the proposed model. This may be attributed to the fact that the instance-based meta-learner predicts the rankings solely from the features that characterize the problems: if the features are insufficient to fully depict the problem properties, finding the true similarity among the problems becomes difficult, which leads to ineffective algorithm selection. Besides, the diversity of the training set may not be sufficient to allow 2NN to locate a more similar problem. The 2NN learner makes recommendations based on the overall rankings on similar problems; in some cases, although the problems have high similarity, the same rankings of algorithm performance are still unobtainable due to the randomness of population-based algorithms and the particular distinctions between problems. Although the model-based meta-learner is also a supervised learning approach, it derives a model relating the meta-features to the optimizers' rankings; as a result, it is more tolerant not only of noise in the meta-features but also of changes in the rankings. Therefore, both meta-learners are appropriate for ARM, though the model-based meta-learner is more effective and adaptable than the instance-based one. In the following experiments, ANN is selected as the meta-learner.

Table 2 Average performance statistics (mean±std) of the meta-learners when the population size is 40
Meta-learner | 10D SRCC  | 10D SR    | 30D SRCC  | 30D SR    | 50D SRCC  | 50D SR
ARM-2NN      | 0.64±0.02 | 0.87±0.03 | 0.74±0.02 | 0.80±0.02 | 0.68±0.01 | 0.82±0.02
ARM-ANN      | 0.62±0.02 | 0.87±0.05 | 0.75±0.03 | 0.84±0.03 | 0.70±0.02 | 0.84±0.02
Baseline     | 0.55      | 0.55      | 0.57      | 0.61      | 0.57      | 0.67

Table 3 Average performance statistics (mean±std) of the meta-learners when the population size is 60
Meta-learner | 10D SRCC  | 10D SR    | 30D SRCC  | 30D SR    | 50D SRCC  | 50D SR
ARM-2NN      | 0.64±0.02 | 0.87±0.03 | 0.74±0.03 | 0.86±0.02 | 0.69±0.01 | 0.83±0.04
ARM-ANN      | 0.65±0.02 | 0.87±0.04 | 0.77±0.01 | 0.88±0.04 | 0.71±0.01 | 0.85±0.04
Baseline     | 0.57      | 0.61      | 0.72      | 0.61      | 0.49      | 0.58
The optimization results of ARM are also compared with each standalone optimizer. The statistics of the best performer on the benchmark problems are presented in Table 4 and Table 5. When the population size is 40, the proposed models outperform the other optimizers in all the experiments in terms of the number of times they perform best, followed by CLPSO, which achieves the best results on eighteen 10D problems, twenty 30D problems and twenty-three 50D problems. Similar results can be observed for the algorithms with a population size of 60. Overall, ARM obtains the best solutions on an average of 28 out of 33 problems over the three dimensional settings. Thus, ARM is not only a recommendation system that avoids the high computational cost of the traditional trial-and-error method, but also an efficient optimizer for a diverse set of problems. Besides, ARM can significantly improve computational efficiency: in this experiment, the traditional trial-and-error approach, which runs all the candidate algorithms, takes almost one hour (50.67 minutes) on average to identify the best result for a global optimization problem, whereas ARM costs less than 10 minutes on average. All reported times are based on experiments run on a computer with an AMD A8-7650K 3.3 GHz CPU, 8 GB RAM and Microsoft Windows 7.

Table 4 Statistics of the best performer when the population size is 40.
Algorithm | 10D         | 30D         | 50D
GPSO      | 15% (5/33)  | 0% (0/33)   | 0% (0/33)
LPSO      | 21% (7/33)  | 9% (3/33)   | 0% (0/33)
LCPSO     | 52% (17/33) | 33% (11/33) | 24% (8/33)
CLPSO     | 55% (18/33) | 61% (20/33) | 70% (23/33)
CPSO      | 3% (1/33)   | 0% (0/33)   | 0% (0/33)
UPSO      | 39% (13/33) | 24% (8/33)  | 18% (6/33)
ARM-2NN   | 88% (29/33) | 82% (27/33) | 82% (27/33)
ARM-ANN   | 88% (29/33) | 85% (28/33) | 85% (28/33)

Table 5 Statistics of the best performer when the population size is 60.
Algorithm | 10D         | 30D         | 50D
GPSO      | 21% (7/33)  | 3% (1/33)   | 12% (4/33)
LPSO      | 9% (3/33)   | 36% (12/33) | 0% (0/33)
LCPSO     | 61% (20/33) | 61% (20/33) | 33% (11/33)
CLPSO     | 39% (13/33) | 61% (20/33) | 48% (16/33)
CPSO      | 3% (1/33)   | 6% (2/33)   | 9% (3/33)
UPSO      | 42% (14/33) | 39% (13/33) | 18% (13/33)
ARM-2NN   | 88% (29/33) | 85% (28/33) | 82% (27/33)
ARM-ANN   | 88% (29/33) | 88% (29/33) | 85% (28/33)

4.3. Validation of ARM on various population-based optimizers
In this experiment, six population-based optimizers with different sources of inspiration are included in the algorithm repository to validate ARM's extendibility and effectiveness: GPSO [42], DE [48], ABC [22], BSA [8], DSA [9] and FA [50]. ABC, FA and PSO were inspired by collective biological behaviors in nature, whereas DE, BSA and DSA were proposed using genetic and mathematical types of population search. These widely used algorithms are known to display diverse search behaviors for global optimization problems [8, 9, 21, 50]. The parameters of the candidate algorithms are set to the recommended values in the corresponding original references [8, 9, 22, 42, 48, 50]. The repository of problems is constructed from the 33 benchmark functions. For generalizability, the problem dimension is set to 30, and ANN is adopted as the meta-learner. The first three modules of ARM are conducted as previously described, and six meta-learners are trained and prepared to make recommendations for new problems.
To fully test the effectiveness of ARM, the pool of learning-based real-parameter optimization problems from CEC 2015 is added in this experiment, including unimodal, multimodal, hybrid and composition functions. For detailed properties of these problems, please refer to [25]. Each problem is treated as a new problem to be solved by ARM, and the true best algorithm among the candidates is then verified using the trial-and-error method. The results are gathered and shown in Table 6. It can be observed that ARM obtains the best performance on 14 out of 15 functions. The high SRCC value on each problem indicates high agreement between the recommended rankings and the true rankings. Moreover, the distribution of the best performers shows that ARM takes great advantage of the diversity of the component algorithms in solving the CEC 2015 benchmark functions.
Please note that although ARM is validated using the PSOs and tested on the other population-based algorithms, the flexibility and knowledge-learning mechanism of the model enable its scalability to more population-based optimizers. Owing to the adaptive recommendation strategy, the performance of ARM will scale up with the integration of efficient algorithms with diverse search properties.
Table 6 Ranking values on CEC 2015 benchmark functions.
Algorithm | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 | f11 | f12 | f13 | f14 | f15 | Accumulation
GPSO      | 4  | 5  | 6  | 3  | 5  | 4  | 3  | 4  | 6  | 4   | 6   | 5   | 6   | 6   | 1   | 1/15
DE        | 6  | 6  | 5  | 5  | 6  | 6  | 6  | 6  | 3  | 6   | 5   | 6   | 4   | 5   | 1   | 1/15
ABC       | 5  | 2  | 2  | 6  | 2  | 5  | 1  | 5  | 4  | 5   | 1   | 2   | 3   | 1   | 1   | 4/15
BSA       | 1  | 1  | 3  | 2  | 3  | 2  | 2  | 1  | 2  | 1   | 2   | 3   | 2   | 3   | 1   | 5/15
DSA       | 3  | 3  | 4  | 4  | 4  | 3  | 5  | 2  | 1  | 3   | 3   | 4   | 1   | 4   | 1   | 3/15
FA        | 2  | 4  | 1  | 1  | 1  | 1  | 4  | 3  | 5  | 2   | 4   | 1   | 5   | 2   | 6   | 5/15
ARM       | 1  | 1  | 1  | 1  | 1  | 1  | 1  | 1  | 1  | 1   | 1   | 1   | 1   | 2   | 1   | 14/15

Table 7 SRCC value of ARM on each CEC 2015 benchmark function.
f    | f1   | f2   | f3   | f4   | f5   | f6   | f7   | f8
ARM  | 0.99 | 0.99 | 0.99 | 0.90 | 0.97 | 1    | 1    | 1
f    | f9   | f10  | f11  | f12  | f13  | f14  | f15  | Average
ARM  | 1    | 1    | 1    | 0.99 | 1    | 0.99 | 0.73 | 0.97

4.4. Experiment of ARM on ensemble algorithms and hyper-heuristics
In this section, three ensemble algorithms and one hyper-heuristic are included to validate the performance of ARM. Specifically, DSO [12], EPSDE [30], EPSO [29], and MPEDE [49] are employed to establish the algorithm repository. The parameter settings of these algorithms are adopted from the corresponding original references [12, 29, 30, 49]. The problem repository, problem dimensions, meta-learner and test problems are consistent with the previous section. The ranking values are shown in Table 8. As seen, ARM obtains the best performance on 14 out of 15 functions. The SRCC values are presented in Table 9; ARM obtains an average value of 0.93 in this experiment, which shows that ARM can make highly accurate recommendations. The experimental results indicate that ARM outperforms the compared hyper-heuristic and ensemble algorithms in terms of obtaining the best results.
Table 8 Ranking values on CEC 2015 benchmark functions.
Algorithm | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 | f11 | f12 | f13 | f14 | f15 | Accumulation
DSO       | 4  | 4  | 1  | 4  | 3  | 3  | 3  | 3  | 4  | 4   | 3   | 4   | 3   | 4   | 4   | 1/15
EPSDE     | 2  | 2  | 4  | 2  | 4  | 4  | 4  | 2  | 2  | 3   | 4   | 2   | 4   | 1   | 1   | 2/15
EPSO      | 3  | 3  | 2  | 3  | 1  | 2  | 2  | 4  | 1  | 2   | 1   | 3   | 2   | 2   | 1   | 4/15
MPEDE     | 1  | 1  | 3  | 1  | 2  | 1  | 1  | 1  | 3  | 1   | 2   | 1   | 1   | 3   | 1   | 10/15
ARM       | 1  | 1  | 1  | 1  | 1  | 1  | 1  | 2  | 1  | 1   | 1   | 1   | 1   | 1   | 1   | 14/15

Table 9 SRCC value of ARM on each CEC 2015 benchmark function.
f    | f1   | f2   | f3   | f4   | f5   | f6   | f7   | f8
ARM  | 0.95 | 1    | 0.95 | 1    | 0.65 | 1    | 0.95 | 0.8
f    | f9   | f10  | f11  | f12  | f13  | f14  | f15  | Average
ARM  | 0.95 | 1    | 0.95 | 1    | 1    | 1    | 0.8  | 0.93

4.5. Experiment on real-world problems
To validate the practical applicability of ARM, two real-world cases are studied [18, 32]: the spread spectrum radar polyphase code design problem and the parameter estimation for frequency modulation problem. Since ensemble algorithms and hyper-heuristics are relatively computationally expensive, the trained models and the repository from Section 4.3 are adopted for ARM to address the real-world problems in this test.
4.5.1. Spread spectrum radar polyphase code design problem
The selection of an appropriate waveform is the main factor in radar systems with pulse compression. To accomplish this, a method was designed based on the properties of the aperiodic autocorrelation function and the hypothesis of coherent radar pulse processing [32]. The spread spectrum radar polyphase code design problem (SSRP) can be modeled as a min-max nonlinear non-convex optimization problem with continuous variables; the problem surface is rough and has numerous local optima. The model is presented below:

$$\min_{x \in X} f(x) = \max\{\phi_1(x), \ldots, \phi_{2m}(x)\} \qquad (15)$$

$$X = \{(x_1, \ldots, x_n) \in R^n \mid 0 \le x_j \le 2\pi,\ j = 1, \ldots, n\} \qquad (16)$$

where $m = 2n - 1$ and

$$\phi_{2i-1}(x) = \sum_{j=i}^{n} \cos\Big( \sum_{k=|2i-j-1|+1}^{j} x_k \Big), \quad i = 1, \ldots, n \qquad (17)$$

$$\phi_{2i}(x) = 0.5 + \sum_{j=i+1}^{n} \cos\Big( \sum_{k=|2i-j|+1}^{j} x_k \Big), \quad i = 1, \ldots, n-1 \qquad (18)$$

$$\phi_{m+i}(x) = -\phi_i(x), \quad i = 1, \ldots, m \qquad (19)$$
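For reference, Eqs. (15)-(19) translate into the following sketch of the SSRP objective; it is written for clarity rather than speed, and the internal ordering of the phi terms differs from the indexing above without affecting the maximum.

```python
import numpy as np

def ssrp_objective(x):
    """Spread spectrum radar polyphase code design, Eqs. (15)-(19): f(x) = max_i phi_i(x)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    phi = []
    for i in range(1, n + 1):                  # phi_{2i-1}, Eq. (17)
        phi.append(sum(np.cos(np.sum(x[abs(2 * i - j - 1):j]))   # k = |2i-j-1|+1 .. j
                       for j in range(i, n + 1)))
    for i in range(1, n):                      # phi_{2i}, Eq. (18)
        phi.append(0.5 + sum(np.cos(np.sum(x[abs(2 * i - j):j]))
                             for j in range(i + 1, n + 1)))
    phi = np.array(phi)                        # the m = 2n - 1 base terms
    phi = np.concatenate([phi, -phi])          # phi_{m+i} = -phi_i, Eq. (19)
    return phi.max()                           # Eq. (15)
```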
The goal of the model is to minimize the largest module of the so-called autocorrelation function in the complex envelope of the compressed radar pulse at the optimal receiver output, and the variables represent symmetrical phase differences. This continuous min-max global optimization problem is NP-hard [32] and is characterized by a piecewise smooth objective function. For this case, the meta-features of the problem are automatically generated and input into the meta-learners. The solution results are presented in Table 10. It can be observed that ARM obtains the best solution and the best ranking on the spread spectrum radar polyphase code design problem.
Table 10 Comparison on the spread spectrum radar polyphase code design problem
Algorithm | GPSO     | DE       | ABC      | BSA      | DSA      | FA       | ARM
Result    | 1.07E+00 | 1.32E+00 | 1.22E+00 | 1.22E+00 | 1.31E+00 | 7.72E-01 | 7.72E-01
Ranking   | 2        | 6        | 3        | 3        | 5        | 1        | 1
SRCC of ARM: 0.98

4.5.2. Parameter estimation for frequency modulation
In telecommunications and signal processing, frequency modulation (FM) is the encoding of information in a carrier wave by varying the instantaneous frequency of the wave, and the FM synthesizer is a popular technique [18]. To make the generated sound wave match the target sound as closely as possible, six parameters of the sound wave for an FM synthesizer need to be optimized, forming the vector $X = (a_1, w_1, a_2, w_2, a_3, w_3)$. The models of the target sound wave and the estimated sound wave are presented in Eqs. (20) and (21), respectively:

$$y_0(t) = \sin\big(5 t \theta + 1.5 \sin(4.8 t \theta + 2 \sin(4.9 t \theta))\big) \qquad (20)$$

$$y(t) = a_1 \sin\big(w_1 t \theta + a_2 \sin(w_2 t \theta + a_3 \sin(w_3 t \theta))\big) \qquad (21)$$

where $\theta = 2\pi/100$ and the parameter t is defined in the range [-6.4, 6.35]. The objective is to minimize the sum of squared errors between the target wave (with the minimum value $f(X_{sol}) = 0$) and the estimated wave, given below:

$$f(X) = \sum_{t=0}^{100} \big(y(t) - y_0(t)\big)^2 \qquad (22)$$
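Eqs. (20)-(22) translate directly into the following sketch of the objective, with theta = 2*pi/100 and t sampled at the integers 0 to 100 as in the summation above.

```python
import numpy as np

def fm_objective(params):
    """Sum of squared errors between estimated and target FM sound waves, Eq. (22)."""
    a1, w1, a2, w2, a3, w3 = params
    theta = 2.0 * np.pi / 100.0
    t = np.arange(0, 101)                                  # t = 0, 1, ..., 100
    y_target = np.sin(5.0 * t * theta
                      + 1.5 * np.sin(4.8 * t * theta
                                     + 2.0 * np.sin(4.9 * t * theta)))   # Eq. (20)
    y_est = a1 * np.sin(w1 * t * theta
                        + a2 * np.sin(w2 * t * theta
                                      + a3 * np.sin(w3 * t * theta)))    # Eq. (21)
    return np.sum((y_est - y_target) ** 2)
```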
This problem is a highly complex multimodal model with strong epistasis. The recommended result is given in Table 11, from which it is seen that ARM achieves the best performance on the parameter estimation for frequency modulation problem in terms of solution accuracy.
Table 11 Comparison on the parameter estimation for frequency modulation problem
Algorithm | GPSO     | DE       | ABC      | BSA      | DSA      | FA       | ARM
Result    | 5.25E+00 | 1.40E+01 | 7.21E+00 | 5.23E+00 | 6.89E+00 | 1.67E+01 | 5.23E+00
Ranking   | 2        | 5        | 4        | 1        | 3        | 6        | 1
SRCC of ARM: 0.90
In conclusion, ARM can recommend the correct best performers and rankings highly similar to the true values for the two real-world problems, which indicates ARM's practical effectiveness on real-world problems.
5. Conclusion
Numerous population-based algorithms have been proposed owing to their efficiency in global optimization. Although these algorithms have demonstrated promising performance on some problems, their overall effectiveness across a variety of problems is limited. Instead of using the traditional trial-and-error method, an algorithm recommendation model based on prior knowledge is valuable for optimizing a new problem. We therefore contend that the characteristics of a given optimization problem may shed light on choosing the appropriate algorithm through the meta-learning concept. In this paper, a generalized meta-learning-based ARM is proposed for adaptively selecting population-based optimizers by mapping problem characteristics to algorithm performance. The gap between algorithm performance and problem characteristics is bridged using the proposed four components: the meta-feature module, the algorithm performance module, the learning module and the recommending module. First, 18 computationally efficient meta-features, including statistical, geometrical measurement and landscape features, are empirically derived to characterize problem spaces. Second, algorithm performance is represented by rankings to normalize the computational differences across problems. Third, a meta-learner in the learning module learns the underlying mapping between problem properties and algorithm performance from the knowledge gathered by the first two modules. Finally, given a new problem, its meta-features are analyzed and the most appropriate optimizer is selected to address it. In this study, benchmark functions with various characteristics and two real-world problems are collected as the problem repository, and 15 algorithms are included to validate the effectiveness of ARM. The experimental analysis is conducted through four sets of experiments: (1) parameter tuning of the sample size for generating meta-features; (2) an initial test of ARM using the PSOs, comprehensively evaluated on benchmark functions with various properties in different dimensions; (3) a test of ARM's effectiveness on six other population-based algorithms using the CEC 2015 optimization problems, further validated with three ensemble algorithms and one hyper-heuristic; and (4) two real-world problems adopted to validate ARM's practical performance. The experimental results demonstrate the effectiveness and efficiency of the proposed model for global optimization.
In summary, the contributions of the developed model are threefold: (1) a practical set of meta-features is proposed to depict the problem space of global optimization; (2) to the best of our knowledge, this is the first adaptive recommendation model that uses meta-learning from the machine learning domain for population-based algorithms, primarily addressing global optimization; and (3) the knowledge-based learning mechanism of ARM, which requires little expert experience, enables its flexibility and extendibility to include more efficient optimizers for complex problems. Please note that even though the model is derived using the PSOs and tested on other population-based algorithms, the performance of ARM can be scaled up with the integration of more efficient algorithms because of its adaptive recommendation strategy.
In addition, ARM can serve as an alternative to traditional trial-and-error optimization, especially when the number of candidate optimizers is large and little prior knowledge of the problems is available. This study provides practical guidelines for the design, implementation and testing of a recommendation model for various global optimization problems. Specifically, it can assist non-experts with optimization algorithm selection by reducing the computational cost and improving optimization efficiency.
Despite ARM's promise, a considerable number of improvements could be undertaken in future research. (1) Appropriate feature characterization for global optimization problems could be explored further; there are still many candidate features in landscape analysis methods that describe the characteristics of local areas [34] (one such feature is sketched after this list). In addition, the performance of the meta-learner requires investigation with other supervised learning algorithms, such as random forest, support vector regression and deep learning. (2) Multi-criteria metrics, e.g., precision and computational cost, will be studied to enrich the ranking values used for recommendation in addition to the ABF metric. (3) More numerical comparisons between ARM and other approaches, such as ensemble methods, hyper-heuristics and hybrid methods, are of great interest for further study. In addition, the impact of algorithm diversity in ARM on global optimization problems will also be studied in our future work.
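As one concrete example of a landscape feature, the fitness distance correlation of Jones and Forrest [19] can be estimated from a Latin hypercube sample of the search space [17]. The sketch below is a minimal illustration under the usual approximation that the best sampled point stands in for the global optimum; it is not the exact feature set derived in this work.

```python
import numpy as np
from scipy.stats import qmc  # Latin hypercube sampling

def fitness_distance_correlation(f, lower, upper, n_samples=500, seed=0):
    """Estimate FDC from a Latin hypercube sample of the search space.

    f            : objective function mapping an (n_dim,) array to a float
    lower, upper : per-dimension bounds of the search space
    Returns a value in [-1, 1]; values close to 1 indicate that fitness grows
    with the distance to the (sampled) optimum, i.e. an easier landscape for
    minimization.
    """
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    sampler = qmc.LatinHypercube(d=lower.size, seed=seed)
    X = qmc.scale(sampler.random(n_samples), lower, upper)
    y = np.array([f(x) for x in X])

    # Distance of every sample to the best sampled point (optimum surrogate).
    x_best = X[np.argmin(y)]
    d = np.linalg.norm(X - x_best, axis=1)

    # Pearson correlation between fitness and distance-to-best.
    return float(np.corrcoef(y, d)[0, 1])
```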
Acknowledgements
This work is partially supported by the Major Project of the National Natural Science Foundation of China (Grant No. 71790615, the Design for Decision-making System of National Security Management), the Key Project of the National Natural Science Foundation of China (Grant No. 71431006, Decision Support Theory and Platform of the Embedded Service for Environmental Management), the National Natural Science Foundation of China (Grant Nos. 71501132, 71701079, 71402103 and 71371127), the Natural Science Foundation of Guangdong Province (Grant Nos. 2016A030310067 and 2015A030313556), and the 2016 Tencent "Rhinoceros Birds" Scientific Research Foundation for Young Teachers of Shenzhen University. The authors would like to thank Dr. Teresa Wu for her valuable advice in improving the quality of the manuscript.
References
[1] M.A. Awadallah, M.A. Al-Betar, A.T. Khader, A.L.A. Bolaji, M. Alkoffash, Hybridization of harmony search with hill climbing for highly constrained nurse rostering problem, Neural Computing and Applications, 28 (3) (2017) 463-482.
[2] C. Blum, J. Puchinger, G.R. Raidl, A. Roli, Hybrid metaheuristics in combinatorial optimization: A survey, Applied Soft Computing, 11 (6) (2011) 4135-4151.
[3] P. Brazdil, C.G. Carrier, C. Soares, R. Vilalta, Metalearning: Applications to data mining, Springer Science & Business Media, 2008.
[4] E.K. Burke, M. Gendreau, M. Hyde, G. Kendall, G. Ochoa, E. Özcan, R. Qu, Hyper-heuristics: A survey of the state of the art, Journal of the Operational Research Society, 64 (12) (2013) 1695-1724.
[5] X. Chen, Y.S. Ong, M.H. Lim, K.C. Tan, A multi-facet survey on memetic computation, IEEE Transactions on Evolutionary Computation, 15 (5) (2011) 591-607.
[6] X. Chu, J. Chen, F. Cai, L. Li, Q. Qin, Adaptive brainstorm optimisation with multiple strategies, Memetic Computing, (2018) 1-14.
[7] X. Chu, T. Wu, J.D. Weir, Y. Shi, B. Niu, L. Li, Learning-interaction-diversification framework for swarm intelligence optimizers: A unified perspective, Neural Computing and Applications, (2018) 1-21.
[8] P. Civicioglu, Backtracking search optimization algorithm for numerical optimization problems, Applied Mathematics & Computation, 219 (15) (2013) 8121-8144.
[9] P. Civicioglu, Transforming geocentric cartesian coordinates to geodetic coordinates by using differential search algorithm, Computers & Geosciences, 46 (3) (2012) 229-247.
[10] C. Cui, M. Hu, J.D. Weir, T. Wu, A recommendation system for meta-modeling: A meta-learning based approach, Expert Systems with Applications, 46 (2015) 33-44.
[11] C. Cui, T. Wu, M. Hu, J.D. Weir, X. Li, Short-term building energy model recommendation system: A meta-learning approach, Applied Energy, 172 (2016) 251-263.
[12] V.V. de Melo, W. Banzhaf, Drone squadron optimization: A novel self-adaptive algorithm for global numerical optimization, Neural Computing and Applications, (2017) 1-28.
[13] A.E. Eiben, S.K. Smit, Parameter tuning for configuring and analyzing evolutionary algorithms, Swarm & Evolutionary Computation, 1 (1) (2011) 19-31.
[14] D.G. Ferrari, L.N.D. Castro, Clustering algorithm selection by meta-learning systems: A new distance-based problem characterization and ranking combination methods, Information Sciences, 301 (2015) 181-194.
[15] V.D.B. Frans, A.P. Engelbrecht, A cooperative approach to particle swarm optimization, IEEE Transactions on Evolutionary Computation, 8 (3) (2004) 225-239.
[16] F.E. Grubbs, Sample criteria for testing outlying observations, Annals of Mathematical Statistics, 21 (1) (1950) 27-58.
[17] J.C. Helton, F.J. Davis, Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems, Reliability Engineering & System Safety, 81 (1) (2003) 23-69.
[18] A. Horner, J. Beauchamp, L. Haken, Machine tongues XVI: Genetic algorithms and their application to FM matching synthesis, Computer Music Journal, 17 (4) (1993) 17-29.
[19] T. Jones, S. Forrest, Fitness distance correlation as a measure of problem difficulty for genetic algorithms, in: International Conference on Genetic Algorithms, 1995, pp. 184-192.
[20] J.Y. Kanda, A.C.P.L.F.d. Carvalho, E.R. Hruschka, C. Soares, Using meta-learning to recommend meta-heuristics for the traveling salesman problem, in: 10th International Conference on Machine Learning and Applications and Workshops, 2011, pp. 346-351.
[21] D. Karaboga, B. Basturk, On the performance of artificial bee colony (ABC) algorithm, Applied Soft Computing, 8 (1) (2008) 687-697.
[22] D. Karaboga, B. Basturk, A powerful and efficient algorithm for numerical function optimization: Artificial bee colony (ABC) algorithm, Journal of Global Optimization, 39 (3) (2007) 459-471.
[23] J. Kennedy, R. Eberhart, Particle swarm optimization, in: IEEE International Conference on Neural Networks, 1995, pp. 1942-1948.
[24] J. Kennedy, R. Mendes, Population structure and particle swarm performance, in: IEEE Congress on Evolutionary Computation, 2002, pp. 1671-1676.
[25] J. Liang, B. Qu, P. Suganthan, Q. Chen, Problem definitions and evaluation criteria for the CEC 2015 competition on learning-based real-parameter single objective optimization, Technical Report 201411A, Computational Intelligence Laboratory, Zhengzhou University, Zhengzhou, China and Technical Report, Nanyang Technological University, Singapore, 2014.
[26] J.J. Liang, A.K. Qin, P.N. Suganthan, S. Baskar, Comprehensive learning particle swarm optimizer for global optimization of multimodal functions, IEEE Transactions on Evolutionary Computation, 10 (3) (2006) 281-295.
[27] T. Liao, D. Molina, T. Stützle, Performance evaluation of automatically tuned continuous optimizers on different benchmark sets, Applied Soft Computing, 27 (2014) 490-503.
[28] M. Lunacek, D. Whitley, The dispersion metric and the CMA evolution strategy, in: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, ACM, Seattle, Washington, USA, 2006, pp. 477-484.
[29] N. Lynn, P.N. Suganthan, Ensemble particle swarm optimizer, Applied Soft Computing, 55 (2017) 533-548.
[30] R. Mallipeddi, P.N. Suganthan, Q.K. Pan, M.F. Tasgetiren, Differential evolution algorithm with ensemble of parameters and mutation strategies, Applied Soft Computing, 11 (2) (2011) 1679-1696.
[31] A. Matala, Sample size requirement for Monte Carlo simulations using Latin hypercube sampling, Helsinki University of Technology, Department of Engineering Physics and Mathematics, Systems Analysis Laboratory, 2008.
[32] N. Mladenović, J. Petrović, V. Kovačević-Vujčić, M. Čangalović, Solving spread spectrum radar polyphase code design problem by tabu search and variable neighbourhood search, European Journal of Operational Research, 151 (2) (2003) 389-399.
[33] M.A. Muñoz, M. Kirley, S.K. Halgamuge, A meta-learning prediction model of algorithm performance for continuous optimization problems, in: International Conference on Parallel Problem Solving from Nature, Springer, 2012, pp. 226-235.
[34] M.A. Muñoz, Y. Sun, M. Kirley, S.K. Halgamuge, Algorithm selection for black-box continuous optimization problems: A survey on methods and challenges, Information Sciences, 317 (2015) 224-245.
[35] H.R. Neave, P.L. Worthington, Distribution-free tests, Contemporary Sociology, 19 (3) (1990) 488.
[36] G.L. Pappa, G. Ochoa, M.R. Hyde, A.A. Freitas, J. Woodward, J. Swan, Contrasting meta-learning and hyper-heuristic research: The role of evolutionary algorithms, Genetic Programming and Evolvable Machines, 15 (1) (2014) 3-35.
[37] K.E. Parsopoulos, M.N. Vrahatis, Unified particle swarm optimization for solving constrained engineering optimization problems, in: International Conference on Natural Computation, Springer, 2004, pp. 582-591.
[38] F. Peng, K. Tang, G. Chen, X. Yao, Population-based algorithm portfolios for numerical optimization, IEEE Transactions on Evolutionary Computation, 14 (5) (2010) 782-800.
[39] Q. Qin, S. Cheng, Q. Zhang, L. Li, Y. Shi, Particle swarm optimization with interswarm interactive learning strategy, IEEE Transactions on Cybernetics, 46 (10) (2016) 2238-2251.
[40] J.R. Rice, The algorithm selection problem, Advances in Computers, 15 (1976) 65-118.
[41] F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, 65 (6) (1958) 386.
[42] Y. Shi, R. Eberhart, Modified particle swarm optimizer, in: IEEE World Congress on Computational Intelligence, 1998, pp. 69-73.
[43] K.A. Smith-Miles, Cross-disciplinary perspectives on meta-learning for algorithm selection, ACM Computing Surveys, 41 (1) (2008) 137-153.
[44] K.A. Smith-Miles, R.J. James, J.W. Giffin, Y. Tu, A knowledge discovery approach to understanding relationships between scheduling problem structure and heuristic performance, in: International Conference on Learning and Intelligent Optimization, Springer-Verlag, 2009, pp. 89-103.
[45] J.E. Smith, T.C. Fogarty, Operator and parameter adaptation in genetic algorithms, Soft Computing, 1 (2) (1997) 81-87.
[46] C. Soares, A hybrid meta-learning architecture for multi-objective optimization of SVM parameters, Neurocomputing, 143 (2014) 27-43.
[47] C. Soares, P.B. Brazdil, P. Kuba, A meta-learning method to select the kernel width in support vector regression, Machine Learning, 54 (3) (2004) 195-209.
[48] R. Storn, K. Price, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization, 11 (4) (1997) 341-359.
[49] G. Wu, R. Mallipeddi, P.N. Suganthan, R. Wang, H. Chen, Differential evolution with multi-population based ensemble of mutation strategies, Information Sciences, 329 (2016) 329-345.
[50] X.S. Yang, Firefly algorithm, stochastic test functions and design optimisation, International Journal of Bio-Inspired Computation, 2 (2) (2010) 78-84.
Appendix A. Benchmark functions
Table 12 Properties of benchmark functions.
[The table lists the 33 benchmark functions f1–f33 with their basic function, search range and property flags (MM, Se, Sf, Rt, Ns, MS). The basic functions are Sphere, Schwefel P1.2, Schwefel P2.2, Schwefel P2.13, Schwefel P2.21, Rosenbrock, Diff Power, 2D minima, Rastrigin, Non-Rastrigin, Mis-scaled Rastrigin (10 and 100), Ackley, Griewank, Weierstrass, Salomon, Penalized 2 and Quadric, with search ranges [-100,100]^D, [-50,50]^D, [-32,32]^D, [-10,10]^D, [-5,5]^D, [-600,600]^D, [-0.5,0.5]^D and [-π,π]^D.]
where "MM" denotes multimodal, "Se" denotes separable, "Sf" denotes shifted operation, "Rt" denotes rotated operation, "Ns" denotes noisy and "MS" denotes mis-scaled. The value of the corresponding column is "Y" if the function has the specific property and "N" otherwise; for the Rt column, the value "Single" is also used in addition to "Y" and "N".
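The Sf and Rt flags above refer to the usual way such benchmark instances are constructed: the argument of a base function is first shifted by an offset vector and then multiplied by an orthogonal rotation matrix. The sketch below illustrates this generic construction with Rastrigin as the base; the dimension, shift vector and rotation matrix are illustrative choices, not the exact instances used in the experiments.

```python
import numpy as np

def rastrigin(z):
    """Base Rastrigin function (multimodal, separable)."""
    return np.sum(z * z - 10.0 * np.cos(2.0 * np.pi * z) + 10.0)

def shifted_rotated(base, shift, rotation):
    """Build a shifted (Sf) and rotated (Rt) variant of a base function."""
    def f(x):
        return base(rotation @ (np.asarray(x) - shift))
    return f

# Example: a 10-D shifted, rotated Rastrigin instance on [-5, 5]^D.
dim = 10
rng = np.random.default_rng(0)
shift = rng.uniform(-5, 5, dim)
rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))  # random orthogonal matrix
f = shifted_rotated(rastrigin, shift, rotation)
print(f(shift))  # the shifted optimum evaluates to 0
```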
Table 13 Comparison among algorithms (PSO variants and meta-learner) in terms of the probability of being the best performer (population size 40, 10D).
[Rows: GPSO, LPSO, LCPSO, CLPSO, CPSO, UPSO, ARM-2NN, ARM-ANN; columns: f1–f33; each entry is the probability (between 0 and 1) that the algorithm is the best performer on that function.]
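One plausible way to tabulate such probability-of-best-performer entries, assuming that the mean final fitness of every candidate optimizer and the optimizer recommended by ARM in each repetition are recorded, is sketched below; the function and variable names are illustrative only and do not reproduce the paper's exact procedure.

```python
import numpy as np

def best_performer_probabilities(mean_fitness, arm_recommendations, tol=1e-8):
    """Hypothetical tabulation of a Table 13-18 style column for one function.

    mean_fitness        : dict {algorithm name: mean final fitness} on the
                          function (minimization assumed).
    arm_recommendations : list of algorithm names recommended by ARM over
                          repeated train/test repetitions for this function.
    Returns (base, arm): base[name] is 1.0 if the algorithm ties the best mean
    fitness (within tol) and 0.0 otherwise; arm is the fraction of repetitions
    in which the recommended optimizer is one of those best performers.
    """
    best = min(mean_fitness.values())
    base = {name: float(v <= best + tol) for name, v in mean_fitness.items()}
    arm = float(np.mean([base[name] for name in arm_recommendations])) if arm_recommendations else 0.0
    return base, arm

# Example with illustrative numbers (not taken from the paper):
means = {"GPSO": 3.2e-4, "CLPSO": 1.1e-9, "UPSO": 1.1e-9}
print(best_performer_probabilities(means, ["CLPSO", "UPSO", "GPSO", "CLPSO"]))
# -> GPSO 0.0, CLPSO 1.0, UPSO 1.0, and an ARM probability of 0.75
```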
Table 14 Comparison among algorithms (PSO variants and meta-learner) in terms of the probability of being the best performer (population size 40, 30D).
[Rows and columns as in Table 13; entries are probabilities of being the best performer.]
Table 15 Comparison among algorithms (PSO variants and meta-learner) in terms of the probability of being the best performer (population size 40, 50D).
[Rows and columns as in Table 13; entries are probabilities of being the best performer.]
Table 16 Comparison among algorithms (PSO variants and meta-learner) in terms of the probability of being the best performer (population size 60, 10D).
[Rows and columns as in Table 13; entries are probabilities of being the best performer.]
Table 17 Comparison among algorithms (PSO variants and meta-learner) in terms of the probability of being the best performer (population size 60, 30D).
[Rows and columns as in Table 13; entries are probabilities of being the best performer.]
Table 18 Comparison among algorithms (PSO variants and meta-learner) in terms of the probability of being the best performer (population size 60, 50D).
[Rows and columns as in Table 13; entries are probabilities of being the best performer.]