Parallel transfer evolution algorithm


Accepted Manuscript

Yuanjun Laili, Lin Zhang, Yun Li

PII: S1568-4946(18)30676-8
DOI: https://doi.org/10.1016/j.asoc.2018.11.044
Reference: ASOC 5220

To appear in: Applied Soft Computing Journal

Received date: 14 April 2018
Revised date: 21 October 2018
Accepted date: 28 November 2018

Please cite this article as: Y. Laili, L. Zhang and Y. Li, Parallel transfer evolution algorithm, Applied Soft Computing Journal (2018), https://doi.org/10.1016/j.asoc.2018.11.044. This is a PDF file of an unedited manuscript that has been accepted for publication; the manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form.

Highlights

The highlights of the paper entitled "Parallel transfer evolution algorithm" are as follows.

(1) Topological transfer and algorithmic transfer are proposed based on a single connection to enable a fast and flexible parallel search.
(2) A group of evolutionary states is presented to guide the external communication and internal evolution of multiple sub-populations.
(3) A local search configuration scheme is introduced to enhance the search capability.
(4) Experimental results on two groups of benchmarks and a practical problem show that, without any modification, the proposed algorithm outperforms other ad-hoc evolutionary algorithms and parallel evolutionary algorithms.


Parallel transfer evolution algorithm

Yuanjun Laili, Lin Zhang, Senior Member, IEEE, and Yun Li, Senior Member, IEEE

Abstract—Parallelization of an evolutionary algorithm takes advantage of modular population division and information exchange among multiple processors. However, existing parallel evolutionary algorithms (PEAs) are rather ad hoc and lack the capability of adapting to diverse problems. To accommodate a wider range of problems and to reduce algorithm design costs, this paper develops a parallel transfer evolution algorithm. It is based on the island model of parallel evolutionary algorithms and, to improve performance, adaptively transfers both the connections and the evolutionary operators from one sub-population pair to another. Needing no extra upper-level selection strategy, each sub-population autonomously selects evolutionary operators and local search operators as subroutines according to both its own ranking board and that of its connected neighbor. The parallel transfer evolution is tested on two typical combinatorial optimization problems in comparison with six existing ad-hoc evolutionary algorithms, and is also applied to a real-world case study in comparison with five typical parallel evolutionary algorithms. The tests show that the proposed scheme and the resultant PEA offer high flexibility in dealing with a wide range of combinatorial optimization problems without algorithmic modification or redesign. Both the topological transfer and the algorithmic transfer are applicable not only to combinatorial optimization problems, but also to non-permutation-based complex problems.

Index Terms—Evolutionary computation, parallel algorithm, topological design, algorithmic adaptation

I. INTRODUCTION

REAL-world non-deterministic polynomial-time hard (NP-hard) optimization problems are becoming more complex to solve and are presenting more challenges to evolutionary algorithms (EAs). An EA mimics natural evolution with a population in generational iterations to search for feasible and optimal solutions to NP-hard problems [1]. In dealing with these problems, parallel evolutionary algorithms (PEAs) have become increasingly popular [2]. Intuitive parallelism divides the population of an EA into a number of sub-populations and maps them onto multiple processors that work concurrently. It partitions the potential solution space, enhances global search for multi-peak problems, and gives more room to maneuver for algorithm hybridization. So far, PEAs have seen many successes in solving complex optimization problems [3,4].

In recent years, three main models of PEA have been reported as design bases: the master-slave model, the island model, and the diffusion model [5]. Meanwhile, hierarchical hybrid models combining one or more of the basic models have also been reported for certain cases. Owing to the widespread use of multi-core computers and clusters, the island model [6–8] has become the most common, in which each sub-population evolves on an independent processor as an "island". The "islanders" interact periodically via individual migration, in accordance with a pre-defined topology. The resultant communication overheads are generally lower than those in the master-slave and diffusion models [9,10].

Owing to the structure of the island model, the individual migration policy and the island topology are the most critical elements in determining the efficiency of a PEA. The migration policy controls the migration frequency, the number of migrating individuals, the individual replacement rule, and the synchronization of sub-populations [1,11].
Much research has been reported on designing a migration policy for various scenarios, where certain offline schemes [3,12,13] and online strategies [14–16] have been established not only to set the migration policy, but also to adaptively adjust key algorithmic parameters of sub-populations at runtime.

(Author affiliations: Yuanjun Laili and Lin Zhang are with the School of Automation Science and Electrical Engineering, Beihang University, Beijing, 100191, China (e-mail: [email protected], [email protected]). Yun Li is with the School of Computer Science and Network Security, Dongguan University of Technology, Dongguan, 523828, China, and with the Faculty of Engineering, University of Strathclyde, Glasgow, G1 1XJ, U.K. (e-mail: [email protected]). Lin Zhang and Yun Li are the corresponding authors.)

The island topology is also an important factor of a PEA in determining the neighbors of each sub-population

for individual exchanges [17]. The most commonly used topologies are the ring [18], mesh [19], full-mesh [43], and star [20] topologies. Generally, the island topology of a PEA is not easy to determine optimally, as the communication objects of each sub-population are difficult to determine at runtime. There are two major reasons for this. First, the correlation between the state of evolution and the topology is difficult to evaluate quantitatively. Second, the implementation of a specific topology in a PEA is normally fixed. To deal with these problems, studies on random topologies [21,22] and graph-based dynamic topologies [23,24] have been carried out. These topologies are changed randomly during the iterations and thereby adapted to the problem structure [25]. However, the neighbors of each island need to be recalculated and broadcast according to the new structure in every iteration. This takes a long time, degrading the performance of the parallel evolution.

Today, the design of an efficient PEA with a low communication overhead remains a challenge. One attempt to address this issue has been to tailor a PEA to the characteristics of the problem being tackled [26,27]. Another has been to assign multiple problem-dependent heuristics to a sub-population. A third approach has been to adapt the size of sub-populations. Nevertheless, the migration policy and topology are generally kept unchanged. The deficiency of these problem-specific PEAs is that once the problem objectives or constraints change, the algorithm can hardly cope or adapt. This issue is partially addressed by memetic algorithms and hyper-heuristics with multiple EAs [28]. The most common design is to allocate a group of operators or memes (i.e., local search strategies) directly to different islands and let them interact with one another via individual migration [26,29–32].
However, this requires an extra algorithm selection process to update individuals inside each sub-population, which also largely degrades the parallel efficiency. For a multiple-EA-based PEA, as its operators and upper-layer adaptation rules inside each sub-population are uniform, the search performance can be lower than those of the underlying EAs if the islands are not well balanced. Therefore, research on the self-adaptation of island topology and the dynamic selection of multiple EAs is imperative for PEAs.

To extend the current research, this paper develops a parallel transfer evolution (PTE) scheme to structure a flexible PEA based on our previous work [77]. At the topological level, the partial connection between sub-population pairs is transferred adaptively in each generation. At the algorithmic level, a flexible approach is proposed to transfer superior operators through the transferred connection and to enhance the search capability of the sub-populations. The communication mode between two sub-populations is adaptively controlled by a new evolutionary metric. The total communication overhead, with only one single connection, is kept to a minimum, while the diversity of both the sub-populations and the evolutionary operations is preserved. For applications, this paper focuses on permutation-based combinatorial optimization, where multiple variables of the problem form a permutation, as in scheduling, assignment, or routing problems.

The remainder of this paper is structured as follows. Section 2 gives a brief review of the state-of-the-art PEAs. In Section 3, an overview of PTE and the details of the topological and algorithmic transfers are provided.
The PTE is then fully tested with combinations of several classical evolutionary operators on various benchmarks and on a real-world virtual channel scheduling problem found in communication systems, in Sections 4 and 5, respectively, with comparisons against both traditional EAs and PEAs. Conclusions are drawn and potential future work is highlighted in Section 6.

II. STATE OF THE ART OF PEAS

In recent years, EAs have gradually been introduced to optimize parallel computing tasks and accelerate their efficiency on different multicore platforms [78][79]. However, owing to the iterative nature of EAs, a PEA itself is hard to optimize. According to the number of EAs adopted, existing research has focused mainly on two aspects in constructing efficient PEAs: the design of a PEA with a single EA and the adaptation of a PEA with multiple EAs.

A. The design of PEA with a single EA

The island model reveals that the migration policy and the cooperative topology are two crucial factors in the design of a PEA [2,33].

(1) Migration policy

For relatively simple problems, a linear or near-linear speed-up can be achieved, owing to even divisions of the population and solution space [34]. For example, Alba [4] summarized and classified performance evaluations on parallelization, and gave instances to show that a linear speed-up is possible in a PEA, although the population division reduces the search capability of each sub-population. Considering the diversity-collapse phenomenon that results from high-fitness immigrants [35], Alba and Troya [36] studied the influence of random emigration on population diversity and suggested when to use fitness-based or random emigration at different states of evolution. Qian et al. [37] further introduced parallel processors to generate new individuals for multi-objective optimization and adopted a merge strategy to accelerate the comparisons in updating a Pareto archive. With low communication overhead between processors, this method was shown to be approximately linear both in theory and in practice.

(2) Cooperative topology

To obtain higher collaborative capability and search quality during parallel search, Cantú-Paz [11] introduced the concepts of selection pressure and takeover time to evaluate the diversity and convergence of the entire population. Given the PEA topologies reported in [18–20], Matsumura et al. [17] compared them and concluded that the ring topology simultaneously guarantees high population diversity and information diffusion with a single migration policy. However, Hijaze and Corne [38] and Wang et al. [39] applied these topologies to distinct cases and showed that the influence of each topology varies with the context. In view of the performance limitations of a fixed topology, Giacobini et al. [40] investigated small-world graphs and scale-free graphs as new candidate topologies for the construction of sub-populations. In addition, Li et al.
[41] introduced β-graph-based network topologies and discussed their construction, complexity, and diversity. Liu et al. [42] established an optimal r-regular graph topology for particle swarm optimization (PSO) and proved its efficiency both theoretically and practically.

(3) Adaptation in migration policy and cooperative topology

It has become clear that a uniform migration policy and topology cannot usually offer efficient collaboration among islands, as the population states at different search stages differ. Therefore, adaptive strategies for both the migration policy and the island topology are desirable. For the migration policy, Noda et al. [14] provided a series of knowledge-based rules to guide the selection and replacement of migrants. Lardeux and Goëffon [15] proposed a dynamic strategy to control the migration probability based on a complete graph. Following these efforts, Yang and Tinos [43] provided an elite set, instead of a random or high-fitness-based migration strategy, to determine which individuals to exchange. Further, Araujo and Merelo [16] applied entropy as a representation of diversity and tested various adaptive migration policies in accordance with the distance between the migrant and the target island. In addition, Zhan et al. [44] proposed a mean-fitness-rank-based approach to migrate individuals from poor-performing populations to better-performing ones, so as to maintain a balanced search pace in the entire population. Their results provide comprehensive insight into the setting of a migration policy. Whitacre et al. [45] attempted to make the topology co-evolve with the population through locality and interaction epistasis. Arnaldo et al. [25] emphasized the importance of topology and attempted, for the first time, to match various island topologies to the problem structure by varying the topology at runtime.
However, a drawback of this adaptation is that the topology-generating and co-evolving process takes a long time and a large amount of memory in a typical parallel programming environment, such as the message passing interface (MPI). Specifically, when the topology has changed, the algorithm needs to recalculate, store, and broadcast the communicating neighbors of each sub-population in every iteration. Hence, this method is inefficient and seldom applied in practice. To reduce the communication overhead and improve the exchange dynamics between islands, Tao et al. [46] developed an adaptive pre-detection mechanism based on a full-mesh topology. This efficiently reduced the communication overhead in each iteration and simultaneously enhanced the search capability of the PEA developed therein.

B. The adaptation of PEA with multiple EAs

As a growing number of EAs have been developed in recent years, researchers have integrated multiple EAs in concurrent islands to realize parallel hybridization. The earliest memetic algorithms were developed based on this idea [47,48]. Subsequent representative parallelization work follows this

approach, and divides a population into islands in order to apply adaptive selection of memes to fine-grained individuals [29,49,50]. However, the migration policy and topology are both static. Although multiple EAs are collected and the algorithm selection strategy for individuals is pre-designed [51–55], most of these schemes are inefficient and inflexible for two reasons. First, with population divisions, the search capability of a general PEA cannot be well maintained compared to its serial counterpart. Second, the strategies for adjusting both the action scope of algorithm candidates and the size of sub-populations result in a large computational overhead on different processors. With additional topological adjustment, such as the complex graphs in [40–42], the speed-up is largely degraded.

Studies so far on how to adapt a topology dynamically to implement flexible and high-speed parallel search are very limited. Without a suitable algorithm adaptation mechanism for a PEA with multiple EAs, algorithms applied to specific problems will produce uneven and repeated search in multiple sub-populations. No matter how far the migration policy is explored, the search scope and diversity of such a PEA are restricted, as its efficiency and flexibility with multiple under-layer EAs are far from fully exploited.

III. THE PARALLEL TRANSFER EVOLUTION SCHEME

In this section, the framework of the proposed PTE is illustrated, dynamic topological transfer and algorithmic transfer are elaborated within this framework, and the evolutionary states used in PTE are analyzed.

A. Main structure of the PTE

The basic structure of PTE is established as shown in Fig. 1. The execution process consists of three main steps: (1) sub-population evolution, (2) topological transfer, and (3) algorithmic transfer.

[Fig. 1 The schematic of PTE: each sub-population runs initialization, evolutionary operations, local search operations, and an evolutionary state update in parallel; between generations, the sub-populations are linked by group communication, algorithmic transfer, and topological transfer, and the loop repeats until the stopping criteria are satisfied.]

(1) Sub-population evolution: This refers to the evolutionary operations combined with certain local search heuristics for producing a new sub-population in each generation. Inspired by memetic computing [49][50], the evolutionary operators are applied primarily for exploration, while the local search heuristics are adopted to exploit better solutions in a randomly located local area and thereby enhance the search capability.

(2) Topological transfer: To minimize the communication overhead in each exchange period, the number of connections among the sub-populations is restricted to one. Topological transfer then means deleting the existing connection and creating a new one between another sub-population pair according to the updated evolutionary state.

(3) Algorithmic transfer: Instead of using an upper-layer algorithm selection mechanism for each sub-population, algorithmic transfer is designed to immigrate a superior evolutionary operator from the dynamically connected neighbor along with the individual to be migrated.

B. Evolutionary states and the process of PTE

Beyond parameter tuning in a single EA or algorithm adjustment among multiple EAs, evolutionary states are of significant importance in both performance and evaluation control. The most commonly used states for a population include the best fitness ever found in the evolution process (BF), the number of generations with unchanged best fitness (UN), the variance of fitness values (VF), the convergence degree (CD), and the distance between two individuals (D). Assuming the number of decision variables of a combinatorial optimization problem is n, calculating the distance requires O(n^2) operations. Therefore, only BF, UN, and CD are applied in the parallel evolution to alleviate the computing overhead. Assume that the best, the average, and the worst fitness values of a sub-population for a minimization problem in generation t are f_min(t), f_avg(t), and f_max(t), respectively, and that F_min and F_max are the best and the worst fitness values ever found by the specific sub-population. The above states can then be calculated as follows:

BF = min_t f_min(t),    (1)

VF(t) = [(f_max(t) - f_avg(t))^2 + (f_avg(t) - f_min(t))^2] / (F_max - F_min + ε),    (2)

CD = VF(t - 1) / VF(t).    (3)

In Eq. (2), ε is a small number used to avoid a division-by-zero error when F_max = F_min. Among the states, VF reflects the diversity of the current population, while UN and CD measure the convergence of the search process. As calculating the diversity of genotypes in a population is time-consuming, CD involves only the diversity of phenotypes. When the fitness values of the individuals are getting closer, CD will be larger than 1; conversely, when the converged individuals are led to new fitness values, CD will be smaller than 1. Usually, only a slight variation occurs in CD, and CD reflects only the states of the two most recent generations. UN offers another perspective on the convergence of the entire evolutionary process; it is used as the main observer of whether the population keeps evolving. However, UN alone does not reflect the diversity of a population: even if the global best fitness value does not change for a few generations, the fitness values of other individuals may still change dynamically. As a supplement, these changes are recorded by CD. Therefore, an accumulated convergence measure, GM, is defined as the control state for the following step:

GM = (1 + UN) · CD.    (4)

When BF is updated, UN = 0, and GM represents only the diversity of the current generation. Conversely, if UN > 0, then GM reflects the convergence degree of the iterative process. When the phenotypes of the population become more distributed or new phenotypes are found, GM becomes smaller. If GM > UN, it can be concluded that both the best fitness value and the other phenotypes have converged to a local point. It should be noted that many other metrics can be used to evaluate the diversity of a population; Eq. (2) can thus be replaced by another diversity formula to guide the subsequent evolution.
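The state computations of Eqs. (1)–(4) can be sketched as follows for a minimization problem; the function and variable names are illustrative choices, not taken from the paper.

```python
def evolution_states(fitness, prev_vf, f_best_ever, f_worst_ever, unchanged_gens, eps=1e-12):
    """Compute VF, CD and GM for one sub-population (minimization).

    fitness: fitness values of the current generation; prev_vf: VF(t-1);
    f_best_ever / f_worst_ever: F_min / F_max of this sub-population;
    unchanged_gens: UN, generations with an unchanged best fitness.
    """
    f_min, f_max = min(fitness), max(fitness)
    f_avg = sum(fitness) / len(fitness)
    # Eq. (2): phenotype diversity, normalized by the historical fitness range
    vf = ((f_max - f_avg) ** 2 + (f_avg - f_min) ** 2) / (f_worst_ever - f_best_ever + eps)
    # Eq. (3): convergence degree over the two most recent generations
    cd = prev_vf / (vf + eps)
    # Eq. (4): accumulated convergence measure
    gm = (1 + unchanged_gens) * cd
    return vf, cd, gm
```

The `eps` guard plays the role of ε in Eq. (2) and is reused in Eq. (3) to keep the ratio finite when the current VF is zero.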
Based on the above metrics, the pseudocode of PTE is presented in Algorithm 1.

Algorithm 1: The process of PTE (t: the sum of function evaluations)
Step 1   For each sub-population i
Step 2     Initialize the sub-population
Step 3     Randomly assign an evolutionary operator and a local search heuristic to the sub-population
Step 4     Calculate VF_i(t) and BF_i
Step 5   End for
Step 6   While t does not reach the threshold
Step 7     For each sub-population i
Step 8       Apply the selected evolutionary operator to generate offspring
Step 9       Apply the selected local search heuristic to update the offspring
Step 10      Count the number of function evaluations in evolutionary operation and local search
Step 11      Calculate VF_i(t), BF_i and GM_i
Step 12    End for
Step 13    Collect the sum of function evaluations in this generation, termed t'
Step 14    t ← t + t'
Step 15    Perform rank record according to Algorithm 4
Step 16    Replace all of the old individuals by the new offspring
Step 17    Perform communication preset according to Algorithm 2
Step 18    Perform topological transfer according to Algorithm 3
Step 19    Perform evolutionary operator configuration according to Algorithm 5
Step 20    Perform local search heuristic configuration according to Algorithm 6
Step 21    If the optimal solution has been found
Step 22      Terminate and output the solution
Step 23    End if
Step 24  End while
Step 25  Output the final solution

To simplify the evolutionary process and reduce the communication time, only one migrant is allowed in each communication period. The above states are then applied to determine whether the best individual or a random one is to be sent out, as illustrated in Algorithm 2. The migration policy is that the immigrant always replaces the worst individual in the target sub-population.

Algorithm 2: Communication preset
Step 1  If rand() < 2 / (1 + e^(-GM)) - 1
Step 2    Set the best individual as the migrant
Step 3  Else
Step 4    Randomly select an individual as the migrant
Step 5  End if

In the communication preset step, GM is saturation-scaled by a modified sigmoid function into the interval [0, 1).
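Algorithm 2 can be sketched as below for a minimization problem; the function name is an illustrative assumption, and the threshold follows the saturation scaling just described.

```python
import math
import random

def choose_migrant(population, fitness, gm):
    """Communication preset (Algorithm 2), fitness to be minimized.

    GM is squashed into [0, 1) by the modified sigmoid 2/(1+e^-GM) - 1:
    a converged sub-population (large GM) tends to export its best
    individual, while a diverse one exports a random individual.
    """
    threshold = 2.0 / (1.0 + math.exp(-gm)) - 1.0
    if random.random() < threshold:
        # low phenotype diversity: send the best individual
        best = min(range(len(population)), key=lambda i: fitness[i])
        return population[best]
    # high phenotype diversity: send a random individual
    return random.choice(population)
```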
When the current sub-population has high phenotype diversity, it prefers to migrate a random individual to increase the diversity of another sub-population. When the current sub-population has low phenotype diversity, the individual with the best fitness value is sent to the others to maintain the solution quality.

C. Topological transfer

Among the typical topologies of PEAs, the ring topology has been seen as the most efficient, as it simultaneously guarantees high population diversity and information diffusion with the same migration policy [17]. However, it appears that only the predominant migrant produces a useful impact on a specific sub-population; other, less competitive migrants introduced during periodic communication are quickly replaced by locally generated new individuals. Therefore, certain connections are unnecessary. As only the best migrant has a major impact on the search, removing the other connections has almost no negative impact on the solution quality or convergence, and still has a positive impact on acceleration owing to the decreased communication load. This means that only the predominant migrant needs to be sent, in each communication period, to one of the other groups, and the migration destination can be picked randomly. Based on this analysis, a connection transfer mechanism is developed, as illustrated in Fig. 2.


Fig. 2 Illustration of the connection transfer between sub-population pairs

Assume there are m sub-populations in total and that the c-th sub-population holds the best individual obtained up to the current generation. Borrowing from the ring topology, the c-th sub-population is designated as the sender. The receiver is selected in a random ergodic manner by simply incrementing the serial number over the remaining sub-populations. A single directed connection from the sender to the receiver is then generated. In the next communication period, a new sender c', which holds the global best individual, and a new receiver, whose serial number follows the last one, are selected. A new directed connection is created between the two nodes, while the original one is deleted simultaneously. This can be seen as the transfer of a single communication connection between two pairs. No matter how the position of the predominant sender changes, each sub-population has the opportunity to communicate with the global best sequentially over a certain period of time. The information propagation speed is exactly the same as in the traditional ring topology, requiring at most m communication periods. The additional computational load of finding the sender that holds the most prominent individual is borne by the source processor, i.e., the first processor. Finding the global best fitness value among the m values collected from all sub-populations adds a computational complexity of only O(m). More importantly, the communication load in each period is reduced to O(m+n); that is, only m fitness values and one migrant with n variables are passed over the transferred connection in each communication period. This complexity is much lower than those of conventional and other dynamic topologies. To better illustrate the topological transfer process, its pseudo-code is shown in Algorithm 3.
Algorithm 3: Topological transfer
Step 1  Collect the fitness value of the best individual f_x of each sub-population x at the old sender c
Step 2  Find and broadcast the serial number of the new sender c' = arg min f_x, x ∈ [1, m]
Step 3  Set the serial number of the receiver as d' = (d + 1) mod m and broadcast it
Step 4  Send the global best individual and the operator indicator with the highest rank from the sender c' to the receiver d'
Step 5  Replace the worst individual in d' with the migrant

Here, d and d' represent the receivers of the last and the current communication period, respectively. Initially, the receiver can be set to a randomly selected processor. In each communication period, the receiver changes one by one, as shown by the dashed lines in Fig. 2. Algorithm 2 is executed only at the sender and the receiver; the other sub-populations simply hand over their best individuals and start the next generation independently.
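Under the assumptions that fitness is minimized and that the receiver index simply advances around the ring, skipping the sender to avoid a self-connection (a detail Algorithm 3 does not spell out), the connection transfer can be sketched as:

```python
def topological_transfer(sub_best_fitness, last_receiver):
    """Pick the single directed connection for this communication period.

    sub_best_fitness[x] is the best fitness of sub-population x (to be
    minimized); last_receiver is the receiver d of the previous period.
    Returns the pair (sender c', receiver d').
    """
    m = len(sub_best_fitness)
    # Step 2: the new sender holds the global best individual
    sender = min(range(m), key=lambda x: sub_best_fitness[x])
    # Step 3: the receiver rotates one step around the ring
    receiver = (last_receiver + 1) % m
    if receiver == sender:          # assumed: skip a self-connection
        receiver = (receiver + 1) % m
    return sender, receiver
```

Only the sender/receiver indices and one migrant need to be broadcast per period, matching the O(m+n) communication load discussed above.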

D. Algorithmic transfer

Algorithmic transfer is established on top of the above topological transfer. To record the performance of the under-layer evolutionary operators and find the superior one to be transferred, the tabu strategy presented by Burke et al. [54] is introduced for each sub-population. Assume that there are N_E evolutionary operators in PTE and that the population is divided into N sub-populations. In sub-population i, a rank vector R_i = {R_ik | i ∈ [1, N], k ∈ [1, N_E]} and a tabu vector L_i = {L_ik | i ∈ [1, N], k ∈ [1, N_E]} are defined to record the ranks and states of the operators in the evolutionary state update step, as shown in Algorithm 4. R_ik represents the rank of the k-th evolutionary operator in the i-th sub-population. L_ik defines whether the k-th evolutionary operator is tabooed in the i-th sub-population: L_ik = 0 means the operator is not tabooed; otherwise, it is tabooed. Note that only one operator is selected for a sub-population in each generation, and the rank update in Algorithm 4 applies to the operator currently in use. The operator with the highest rank in the sender is passed to the receiver along with the emigrant individual. The receiver can decide autonomously whether or not to apply the immigrant operator and individual, as demonstrated in Algorithm 5.

Algorithm 4: Rank record
Step 1   For each sub-population i
Step 2     If BF_i is updated
Step 3       R_ik ← R_ik + 1
Step 4     Else
Step 5       R_ik ← R_ik − 1 and L_ik ← 1
Step 6     End if
Step 7     If all operators are tabooed
Step 8       For k = 1 to N_E
Step 9         L_ik ← 0
Step 10      End for
Step 11      Randomly assign an operator as E_i
Step 12    Else
Step 13      Set E_i to the non-tabooed operator with the highest rank, i.e., max R_ik, k ∈ [1, N_E], L_ik = 0
Step 14    End if
Step 15  End for

Algorithm 5: Evolutionary operator configuration
Step 1  For each sub-population i
Step 2    If the fitness value of I_i is better than BF_i
Step 3      Adopt O_i for the next generation
Step 4    Else
Step 5      Adopt E_i for the next generation
Step 6    End if
Step 7  End for

In the pseudo-code of Algorithms 4 and 5, BF_i represents the best fitness value of sub-population i. For
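A sketch of the rank record and operator configuration (Algorithms 4 and 5) for a single sub-population follows; the function signatures are illustrative assumptions, and fitness is assumed to be minimized.

```python
import random

def rank_record(ranks, tabu, current_op, improved):
    """Algorithm 4 for one sub-population.

    ranks[k] and tabu[k] are R_ik and L_ik for operator k; current_op is
    the operator used this generation; improved says whether BF_i was
    updated.  Returns the locally preferred operator E_i.
    """
    if improved:
        ranks[current_op] += 1
    else:
        ranks[current_op] -= 1
        tabu[current_op] = 1
    if all(tabu):                              # every operator tabooed
        for k in range(len(tabu)):
            tabu[k] = 0                        # reset the tabu list
        return random.randrange(len(ranks))    # random restart
    # highest-ranked operator that is not tabooed
    candidates = [k for k in range(len(ranks)) if tabu[k] == 0]
    return max(candidates, key=lambda k: ranks[k])

def operator_configuration(immigrant_fitness, bf_i, o_i, e_i):
    """Algorithm 5: adopt the sender's operator O_i if the immigrant
    beats the local best fitness BF_i, otherwise keep the local E_i."""
    return o_i if immigrant_fitness < bf_i else e_i
```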

a minimization problem, BF_i is F_min,i; for a maximization problem, it becomes F_max,i. I_i represents the immigrant of sub-population i; here, I_i is an n-dimensional vector representing a solution. O_i represents the operator with the highest rank in the sub-population that provided the immigrant I_i, and E_i is the operator that holds the highest rank in the receiver. Different from the strategy in [54], E_i is not used directly in the next generation, but is sent to the neighboring group in the communication step as O_i. With this mechanism, the sub-populations are capable of exchanging good operators in each period and quickly eliminating weak operators for different sorts of problems. The selection of evolutionary operators in this way is included in the communication; transferring only one index number does not increase the communication complexity, but simplifies the selection process and enhances the search efficiency. Following the information exchange, each sub-population will find an operator according to both the

local performance records and the incoming algorithm indices for the next generation. The pseudo-code is illustrated in Algorithm 5. The time complexity of the rank record in steps 1–6 of Algorithm 4 is O(1). From step 7 to step 11 of Algorithm 4, the computational complexity is O( N E ) due to the parallel nature of sub-population. Additionally, Algorithm 3 uses two extra lists with length NE to support the rank. Hence, both the computational complexity and the space complexity of algorithmic transfer including Algorithms 4 and 5 are O( N E ) . To further improve the search efficiency of sub-population, local search heuristics are introduced as well. Local search is often used as a complementary component to enhance the exploitation of EA. It is able to bring more neighborhood information to accelerate the evolutionary pace of sub-populations. Without loss of generality, it assumes that NLS local search heuristics are collected after the evolutionary operation. Then, a random permutation-based mechanism as displayed in Algorithm 6 is brought to adjust several local heuristics for each individual in a sub-population. Algorithm 6: Local search heuristic configuration: Step 1 For each sub-population i Step 2 For each individual j Step 3 Get a random permutation Rperm from 1 to NLS Step 4 T  T0 , k  1 Step 5 While T  Tend Step 6

LSij  Rperm [k ]

Step 7 Step 8

Apply the No. LSij operator to individual j   fold (i, j )  f new (i, j )

Step 9

If   0 || rand ()  exp( / T )

Step 10

I ij,old  I ij,new

Step 11

T  T  Tdecay

Step 12 Step 13

End if If k  N LS

Step 14 Regenerate Rperm and set k  1 Step 15 End if k  k 1 Step 16 Step 17 End while Step 18 End for Step 19 End for In the above pseudo-code, Rperm represents a randomly generated permutation. T, T0, Tend and Tdecay represent the current annealing temperature, the initial temperature, the end temperature, and the decay rate, respectively. Further, f old (i, j ) and f new (i, j ) represent the old and new fitness values, respectively, of individual j in sub-population i before and after the local search. I ij,old and I ij,new represent the old individual j and the new individual j in sub-population i before and after the local search. Observably, this is a classical random permutation selection strategy combined with an annealing rule to control step length. There are two reasons for applying such a random strategy. For combinatorial optimization, the shape of solution space is usually irregular and even unknown. When it reaches a point in the solution space, which kinds of local search heuristics should be used to search the near range is unknown without problem-dependent information. More importantly, a local search heuristic that is suitable for one point in its neighborhood may not adaptable for another because they are located in entirely different landscapes. Therefore, passing the local search heuristic that performs well in a sub-population to its neighbor seems meaningless. Following the two-layer operations, i.e., evolutionary operation and local search operation, the stopping criterion is set as that the theoretical optimum is reached or as the maximum number of function evaluations is reached.
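The tabu-based rank update of Algorithm 4 and the receiver-side choice of Algorithm 5 can be sketched for a single sub-population as follows. This is a minimal illustration only; the class and argument names (`OperatorSelector`, `improved`, `immigrant_better`) are ours, not the paper's.

```python
import random

class OperatorSelector:
    """Tabu-based rank record for one sub-population (Algorithm 4 sketch)."""

    def __init__(self, n_ops):
        self.rank = [0] * n_ops      # R_ik for each operator k
        self.tabu = [False] * n_ops  # L_ik: True means tabooed

    def update(self, k, improved):
        """Reward or punish the operator k used this generation, depending on
        whether the sub-population's best fitness BF_i was updated; return the
        operator E_i to advertise to the neighbor."""
        if improved:
            self.rank[k] += 1
        else:
            self.rank[k] -= 1
            self.tabu[k] = True
        if all(self.tabu):           # every operator tabooed: reset the list
            self.tabu = [False] * len(self.tabu)
            return random.randrange(len(self.rank))
        free = [j for j, t in enumerate(self.tabu) if not t]
        return max(free, key=lambda j: self.rank[j])  # best non-tabooed rank

def configure(immigrant_better, o_i, e_i):
    """Algorithm 5 sketch: adopt the sender's operator O_i only when the
    immigrant individual beat the local best BF_i."""
    return o_i if immigrant_better else e_i
```

Only the integer index of the advertised operator needs to travel with the emigrant, which is why the exchange adds essentially no communication cost.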

As the unilateral communication is driven in each generation, there is no control parameter in the upper layer of PTE. The three parameters for local search, i.e., T0, Tend, and Tdecay, and the parameters of the evolutionary operators are highly dependent on the problem to be addressed. Configuring each candidate with tuners such as F-Race [80], Revac [81], and CRS-Tuning [82] for every problem is obviously a non-trivial task. To study the performance of topological transfer and algorithmic transfer in PTE, these underlying parameters are fixed for simplicity. This ensures a steady performance for each candidate. It should be noted that local search is not a necessary part of the PTE framework if the candidate evolutionary operators are capable of maintaining a balanced exploration and exploitation. Likewise, the local search heuristics can also be replaced by a group of problem-related rules. To keep the local search time short, the initial temperature and the end temperature are set as T0 = 10 and Tend = 0.1 with a fast decay rate Tdecay = 0.9 in this paper. Such a setting cannot guarantee the best local search performance for all problems, but it is a feasible scheme to enhance the quality of parallel evolution. In general, the time complexity of an evolutionary operator varies dynamically with different problems. Let g_Ei and g_LSj be the complexities of the ith evolutionary operator and the jth local search heuristic, respectively; the complexity of the evolutionary operation is then max g_Ei, and that of the local search operation is max g_LSj. Furthermore, the size of each sub-population is set uniformly to Nsub. The complexities of the topological communication and the evolution process are O(m · n) and O(NE · max g_Ei) + O(Nsub · NLS · max g_LSj), respectively.

In short, PTE is defined as a parallel scheme that integrates multiple evolutionary operators and local search heuristics in a collaborative form and uses them adaptively to solve different problems. There is no specific communication topology, only a unidirectional transmission between two sub-populations. Hence it is easy to implement in different platform environments and adaptable to diverse applications. Users only need to link the fitness evaluation function related to a particular problem to the program and specify the number of processors to launch it.

IV. EXPERIMENTAL TESTS ON TWO COMBINATORIAL OPTIMIZATION PROBLEMS

In this section, the performance of PTE is comprehensively tested on a generic combinatorial optimization problem, the job-shop scheduling problem (JSP), which is often seen in the manufacturing industry [66]. It is also tested on a second generic combinatorial optimization problem, the quadratic assignment problem (QAP) [72]. The JSP is the problem of searching for an effective dispatch sequence with a minimal machining makespan. Given n jobs J1, J2, …, Jn of varying sizes, each job consists of a certain number of operations, which are performed by m identical machines. Assume that O(i, j) is the operation of job j processed by machine i, p_ij is the processing time of O(i, j), C_ij is the completion time of O(i, j), and M_j is the set of machines by which job j is processed. The objective is

Min C_max                                                        (5)

s.t.  C_max ≥ C_ij,   C_ij − p_ij ≥ C_kl,   C_ij − p_ij ≥ 0,
      C_ij − p_ij ≥ C_kj or C_kj − p_kj ≥ C_ij,  ∀ i, k ∈ M_j, i ≠ k,
      C_ij − p_ij ≥ C_il or C_il − p_il ≥ C_ij,  ∀ j ≠ l.

The QAP is a combinatorial optimization problem in which n facilities need to be duly located among n locations. Given a set of facilities P and a set of locations L, c(p1, p2) represents the commodity flow between facilities p1 and p2, and d(l1, l2) represents the distance between locations l1 and l2. The main target is to minimize the sum of the distances multiplied by the corresponding flows, i.e., the total cost between different locations. With f denoting the assignment of facilities to locations, the objective is

Min  Σ_{p1, p2 ∈ P}  c(p1, p2) · d(f(p1), f(p2)).                (6)
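Equation (6) translates directly into code once the assignment f is encoded as a permutation. The 3×3 flow and distance matrices below are illustrative only.

```python
from itertools import permutations

def qap_cost(f, flow, dist):
    """Total cost of assignment f (facility p -> location f[p]) per Eq. (6)."""
    n = len(f)
    return sum(flow[p1][p2] * dist[f[p1]][f[p2]]
               for p1 in range(n) for p2 in range(n))

flow = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]   # c(p1, p2)
dist = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]   # d(l1, l2)

# brute force over the 3! assignments of this toy instance
best = min(permutations(range(3)), key=lambda f: qap_cost(f, flow, dist))
```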

A. Experimental settings

To solve a generic, permutation-based combinatorial optimization problem, an integer coding scheme is adopted to represent solution phenotypes in evolution. PTE is capable of being configured with existing evolutionary operators. Therefore, the 12 EAs listed in Table 1 are used as the candidates in the scheme, with the acronyms used hereafter. The learning operators of PSO, CMPSO, and the five types of DE algorithms are replaced with the "swap operator" and "swap sequence" recommended in [64] to ensure that a new individual remains a complete permutation sequence. For the same reason, a two-point swapping mechanism is applied as the basic operation in HS, ILS, and VNS.

TABLE 1 Evolutionary algorithm examples used in the PTE

Evolutionary algorithm                                        Abbreviation
Genetic algorithm [56] with swap sequence and swap operator   GA
Genetic algorithm with niched strategy [57]                   NGA
Particle swarm optimization [58]                              PSO
Particle swarm optimization with Cauchy mutation [59]         CMPSO
Differential evolution with rand/1 mutation [60]              DE1
Differential evolution with best/1 mutation [60]              DE2
Differential evolution with rand/2 mutation [60]              DE3
Differential evolution with best/2 mutation [60]              DE4
Differential evolution with target-to-best/1 mutation [60]    DE5
Harmony search [61]                                           HS
Iterative local search [62]                                   ILS
Variable neighborhood search [63]                             VNS

PTE is designed to be able to utilize the nine local search heuristics reported in [65], which are listed in Table 2. We assume that the length of a block in the local search heuristics is no more than n and is randomly generated within the interval [2, n] on every call. The maximum number of function evaluations is set to 10^5 · n.

TABLE 2 Local search heuristics [65] used in the PTE

Strategy                                                                                   Abbreviation
Select two points and swap them                                                            Swap
Select a point and insert it at the head of the sequence                                   Pre-insert
Select a point and insert it at the tail of the sequence                                   Pos-insert
Exchange two selected blocks in a sequence                                                 Swap-block
Insert a selected block at the head of the sequence                                        Pre-block-insert
Insert a selected block at the tail of the sequence                                        Pos-block-insert
Swap a point with the maximal or the minimal value                                         Swap-max/min
Exchange two adjacent points with a given probability from beginning to end successively   Rand-circle
Select a piece from the sequence and inverse it                                            Inverse
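Three of the moves in Table 2 (Swap, Pre-insert, and Inverse) can be sketched on a list-encoded permutation as follows; the function bodies are minimal illustrations of the strategies, not the paper's implementation.

```python
def swap(seq, i, j):
    """Swap: select two points and swap them."""
    s = list(seq)
    s[i], s[j] = s[j], s[i]
    return s

def pre_insert(seq, i):
    """Pre-insert: select a point and insert it at the head of the sequence."""
    s = list(seq)
    return [s.pop(i)] + s

def inverse(seq, i, j):
    """Inverse: select the piece [i, j] of the sequence and reverse it."""
    s = list(seq)
    return s[:i] + s[i:j + 1][::-1] + s[j + 1:]
```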

All the simulations are coded in C++ and MPI and tested on a small cluster of four servers. The hardware configuration of each server is an Intel® Xeon® E3-1220 v3 3.1-GHz CPU with 32 GB of 1.6-GHz DDR3 RAM; there are 16 cores in total. The proposed algorithm is tested under two population sizes, 32 and 128. All the tests are run 20 times on two, four, eight, and 16 processors, respectively, to examine the speed-up and solution quality of these parallel schemes when the number of processors is doubled. The best fitness values and the CPU times for each problem are recorded and compared with the best-known solutions (BKSs).

TABLE 3 COMPARISON OF RING-PEA, PTE/AT, AND PTE IN JSP WHEN THE POPULATION SIZE IS 32 (20 RUNS)

Best          Ring-PEA                     | PTE/AT                       | PTE
              2      4      8      16     | 2      4      8      16     | 2      4      8      16
LA21          1072   1061   1055   1073   | 1068   1046   1046   1055   | 1046   1046   1046   1046
LA22          935    935    930    932    | 935    932    930    932    | 932    927    927    930
LA23          1032   1032   1032   1032   | 1032   1032   1032   1032   | 1032   1032   1032   1032
LA24          961    961    945    941    | 954    941    940    940    | 939    935    935    940
LA25          994    986    992    992    | 992    986    986    990    | 984    977    977    986
LA26          1218   1218   1218   1218   | 1218   1218   1218   1218   | 1218   1218   1218   1218
LA27          1269   1262   1257   1259   | 1262   1256   1256   1263   | 1257   1249   1249   1256
LA28          1234   1234   1226   1216   | 1232   1226   1216   1226   | 1222   1221   1216   1221
LA29          1211   1210   1210   1206   | 1211   1210   1205   1213   | 1206   1196   1192   1196
LA30          1355   1355   1355   1355   | 1355   1355   1355   1355   | 1355   1355   1355   1355
LA31          1784   1784   1784   1784   | 1784   1784   1784   1784   | 1784   1784   1784   1784
LA32          1850   1850   1850   1850   | 1850   1850   1850   1850   | 1850   1850   1850   1850
LA33          1719   1719   1719   1719   | 1719   1719   1719   1719   | 1719   1719   1719   1719
LA34          1721   1721   1721   1721   | 1721   1721   1721   1721   | 1721   1721   1721   1721
LA35          1888   1888   1888   1888   | 1888   1888   1888   1888   | 1888   1888   1888   1888
LA36          1296   1291   1287   1296   | 1296   1291   1287   1296   | 1291   1268   1268   1287
LA37          1435   1432   1424   1435   | 1435   1426   1422   1424   | 1422   1418   1418   1422
LA38          1254   1241   1237   1253   | 1231   1230   1222   1231   | 1237   1222   1215   1222
LA39          1259   1256   1256   1259   | 1259   1254   1243   1256   | 1252   1248   1248   1252
LA40          1250   1246   1243   1250   | 1250   1249   1239   1246   | 1244   1233   1233   1244
P of W-test   0.002  0.002  0.002  0.006  | 0.008  0.003  0.014  0.003  | -      -      -      -

TABLE 4 COMPARISON OF RING-PEA, PTE/AT, AND PTE IN JSP WHEN THE POPULATION SIZE IS 128 (20 RUNS)

Best          Ring-PEA                     | PTE/AT                       | PTE
              2      4      8      16     | 2      4      8      16     | 2      4      8      16
LA21          1077   1067   1061   1061   | 1072   1066   1063   1055   | 1055   1063   1055   1046
LA22          935    935    932    932    | 935    932    932    932    | 935    930    927    927
LA23          1032   1032   1032   1032   | 1032   1032   1032   1032   | 1032   1032   1032   1032
LA24          963    956    954    945    | 951    951    947    941    | 947    940    935    935
LA25          994    992    992    991    | 991    989    986    968    | 992    990    986    977
LA26          1218   1218   1218   1218   | 1218   1218   1218   1218   | 1218   1218   1218   1218
LA27          1269   1269   1259   1259   | 1271   1269   1263   1256   | 1269   1269   1259   1256
LA28          1238   1238   1234   1234   | 1242   1238   1234   1227   | 1238   1227   1227   1221
LA29          1216   1212   1212   1206   | 1214   1212   1210   1206   | 1206   1206   1205   1205
LA30          1355   1355   1355   1355   | 1355   1355   1355   1355   | 1355   1355   1355   1355
LA31          1784   1784   1784   1784   | 1784   1784   1784   1784   | 1784   1784   1784   1784
LA32          1850   1850   1850   1850   | 1850   1850   1850   1850   | 1850   1850   1850   1850
LA33          1719   1719   1719   1719   | 1719   1719   1719   1719   | 1719   1719   1719   1719
LA34          1721   1721   1721   1721   | 1721   1721   1721   1721   | 1721   1721   1721   1721
LA35          1888   1888   1888   1888   | 1888   1888   1888   1888   | 1888   1888   1888   1888
LA36          1301   1297   1296   1296   | 1297   1297   1296   1291   | 1296   1287   1287   1268
LA37          1437   1433   1433   1427   | 1435   1428   1428   1422   | 1435   1422   1422   1418
LA38          1258   1251   1241   1241   | 1250   1241   1237   1237   | 1237   1231   1222   1222
LA39          1262   1259   1259   1256   | 1262   1254   1254   1254   | 1259   1254   1252   1252
LA40          1259   1259   1250   1250   | 1252   1252   1250   1249   | 1257   1252   1250   1243
P of W-test   0.007  0.003  0.005  0.002  | 0.052  0.011  0.005  0.029  | -      -      -      -

B. Results and discussions in solving JSP and QAP

The best fitness values of PTE on 20 JSP benchmark instances (i.e., LA21–LA40) are summarized in Tables 3 and 4. To compare the efficiency of topological transfer and algorithmic transfer, the PEA with a ring topology and a random algorithm selection mechanism in each sub-population is also tested, termed Ring-PEA in this paper. Ring-PEA is a typical parallel scheme with a fixed topology and fixed operators for the sub-populations. Similarly, the PEA with only topological transfer and random selection of the evolutionary operators in each sub-population is tested as well, termed PTE/AT. In addition, the local search heuristics with a random selection strategy (shown in Algorithm 6) are applied in both Ring-PEA and PTE/AT to make sure that they are tested under the same conditions as PTE. The best results and average CPU times of Ring-PEA, PTE/AT, and PTE on the 20 JSP instances (i.e., LA21–LA40) are recorded. Boldface in the tables indicates that the best-known solution is found within a specific time. When the population size is 32, the performance of Ring-PEA is the worst: only the eight simplest instances (LA23, LA26, and LA30–LA35) are solved to the best-known solutions (i.e., BKSs). When topological transfer is implemented, PTE/AT performs much better than the original Ring-PEA. As observed in Table 3, nine instances are solved to BKSs by PTE/AT. When algorithmic transfer is implemented, the performance of PTE is further enhanced: 14 instances are solved to BKSs by PTE. To examine the differences between the other PEAs and PTE, pairwise Wilcoxon tests (abbreviated as W-tests) are carried out at a significance level of α = 0.05. With the Bonferroni correction, a pair is significantly different only if the p-value is less than 0.05/8 = 0.0063.
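The Bonferroni-corrected decision applied here reduces to a one-line threshold check; the p-values below are the Ring-PEA and PTE/AT columns of Table 3.

```python
def significant(p_values, alpha=0.05):
    """Flag each of m p-values as significant at the alpha/m threshold."""
    threshold = alpha / len(p_values)   # 0.05 / 8 = 0.00625 here
    return [p < threshold for p in p_values]

# Ring-PEA (2, 4, 8, 16) and PTE/AT (2, 4, 8, 16) p-values from Table 3
p_values = [0.002, 0.002, 0.002, 0.006, 0.008, 0.003, 0.014, 0.003]
flags = significant(p_values)
```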
The statistical test results of each kind of PEA are compared pairwise with those obtained by PTE with the same processor number and are listed in the last row of Table 3. With 95% confidence, PTE performs significantly better than Ring-PEA. When the number of processors is 8, the solutions of PTE are close to those of PTE/AT; but when the number of processors is 2, 4, or 16, PTE performs significantly better than PTE/AT. When two processors are used, the two sub-populations exchange individuals and operators with each other in every generation. The population diversity is then reduced by frequent individual replacement, so the parallel performance can be poorer than the serial evolution of the corresponding algorithm, and PTE does not perform very well. However, as the number of processors increases, the population diversity is gradually enhanced. On the one hand, fewer individuals join another group; on the other hand, more sub-populations allow more operations to be executed simultaneously, bringing about a dynamic search in which excellent operators find it easier to survive. As a result, the solution quality is substantially enhanced. When the number of processors increases, the solution quality in small-scale cases is fully maintained with a reduced CPU time. For large-scale cases, the solution quality first rises and then falls. In particular, a sub-population with fewer individuals does not necessarily lead to an accuracy loss while improving the search performance. For all instances, when the processor number is increased from 2 to 8, the search precision of PTE improves constantly. However, with a fixed population size (i.e., 32), only two individuals are left in each sub-population when the processor number increases to 16, and hence the search capability of each single sub-population is reduced. From Fig. 3(a) it can be seen that the CPU times of PTE/AT and PTE are significantly shorter than those of Ring-PEA as the processor number increases. For the remaining instances, shown in Fig. 3(b), the speed-up of PTE is near-linear and close to that of PTE/AT, while Ring-PEA requires more CPU time as the number of parallel processors increases. Specifically, the average CPU time of PTE/AT with two parallel processors is similar to that of Ring-PEA. As the number of processors continues to increase, it is further reduced to about 61% to 80% of that of Ring-PEA. The overall execution time of PTE is maintained at the same level as that of PTE/AT.
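The speed-up quoted in this comparison is the ratio of CPU times as processors are added; a minimal sketch, using PTE's average LA21–25 CPU times from Table 7:

```python
def speedup(times, base=0):
    """Speed-up of each run relative to the run at index `base`."""
    return [times[base] / t for t in times]

# PTE average CPU times (s) on LA21-25 with 2, 4, 8, and 16 processors (Table 7)
times = [141.002, 117.102, 93.3873, 60.4407]
ratios = speedup(times)   # roughly 1.0, 1.20, 1.51, 2.33
```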

Fig. 3 Boxplots corresponding to the average CPU times (s) of PTE on two groups of JSP instances when the population size is 32, (a) PTE on LA23, LA31–LA33, and LA35, and (b) PTE on LA21, LA22, LA24–LA30, LA34, and LA36–LA40.

When the population size is 128, the performance of the three parallel algorithms is slightly reduced, as can be seen in Table 4. With a fixed number of function evaluations, both the number of generations and the communication frequency are reduced. Repeated search is more likely to happen when the number of individuals in each sub-population increases, and the exploitation in evolution is reduced as well. Although the overall performance is degraded, the solution quality of each parallel algorithm does not fall back when the number of processors reaches 16. PTE/AT and PTE still provide significantly better solutions than Ring-PEA, but the performance of PTE becomes closer to that of PTE/AT according to the pairwise Wilcoxon tests with Bonferroni correction shown in the last row of Table 4. In a word, increasing only the population size does not significantly enhance the performance of a PEA. More individuals in each sub-population may enhance the exploration capability, but they introduce more function evaluations, thus lowering the search efficiency. To summarize the speed-up of the three parallel schemes with the increase of population size, their CPU times with 128 individuals are illustrated as boxplots in Fig. 4. For the five simple instances (LA23, LA31–LA33, and LA35) shown in Fig. 4(a), the average CPU times of the three parallel algorithms are almost doubled with the growing number of individuals in each sub-population. For the remaining instances, the speed-up of PTE is greater than that of Ring-PEA and similar to that of PTE/AT. According to the median lines of the boxplots, the overall execution times of the three parallel algorithms are maintained at the same level with the increase of population size. Algorithmic transfer does not bring much extra time, but it significantly improves the solution quality.


Fig. 4 Boxplots corresponding to the average CPU times (s) of PTE on two groups of JSP instances when the population size is 128, (a) PTE on LA23, LA31–LA33, and LA35, and (b) PTE on LA21, LA22, LA24–LA30, LA34, and LA36–LA40.

Moreover, the best solutions found by PTE with 32 and 128 individuals are combined and compared with those found by six EAs developed elsewhere specifically for the JSP. The experimental results, the pairwise Wilcoxon tests carried out on the six ad hoc EAs and PTE, and the CPU times are shown in Tables 5–7, respectively.

TABLE 5 SOLUTION CONSISTENCY OF PTE WITH PARALLEL PROCESSORS COMPARED TO SIX AD HOC EAS IN SOLVING JSP (20 RUNS)
(Columns: BKS; Nowicki et al. (1996) [66]; Goncalves et al. (2005) [67]; Aiex et al. (2003) [68]; Binato et al. (2002) [69]; Sha et al. (2006) [70]; Zhang et al. (2013) [71]; PTE with 2, 4, 8, and 16 processors.)
1046 1047 1057 1091 1049 LA21 1046 1046 1046 1046 1046 1046 927 935 960 932 LA22 927 927 927 927 927 927 1032 LA23 1032 1032 1032 1032 1032 1032 1032 1032 1032 935 939 953 954 978 940 939 LA24 935 935 935 935 977 977 986 984 1028 982 984 LA25 977 977 977 977 1218 1271 LA26 1218 1218 1218 1218 1218 1218 1218 1218 1235 1236 1256 1269 1320 1243 1256 1249 1249 1256 LA27 1235 1216 1232 1225 1293 1216 1222 1221 1221 LA28 1216 1216 1157 1160 1196 1203 1293 1163 1180 1206 1204 1192 1205 LA29 1355 1368 LA30 1355 1355 1355 1355 1355 1355 1355 1355 1784 LA31 1784 1784 1784 1784 1784 1784 1784 1784 1784 1850 LA32 1850 1850 1850 1850 1850 1850 1850 1850 1850 1719 LA33 1719 1719 1719 1719 1719 1719 1719 1719 1719 1721 LA34 1721 1721 1721 1721 1721 1721 1721 1721 1721 1888 LA35 1888 1888 1888 1888 1888 1888 1888 1888 1888 1268 1279 1287 1334 1274 1291 LA36 1268 1268 1268 1268 1268 1397 1407 1408 1410 1457 1408 1422 1418 1418 1418 LA37 1397 1196 1196 1219 1218 1267 1237 1222 1215 1222 LA38 1196 1196 1233 1246 1248 1290 1238 1252 1252 1248 1252 LA39 1233 1233 1222 1241 1244 1259 1224 1233 1244 1233 1233 1243 LA40 1229

When the number of processors is 4 to 16, PTE is better than the EAs proposed by Nowicki et al. [66] and by Binato et al. [69] with 95% confidence (according to the Bonferroni correction, the p-value should satisfy p < α/m = 0.05/8 ≈ 0.0063), similar to the EAs proposed by Goncalves et al. [67] and by Aiex et al. [68], although not as good as the EA proposed by Sha et al. [70]. With only a group of basic operators, the average CPU times of PTE are much shorter than those of the others, as shown in Table 7.

TABLE 6 COMPARISON OF WILCOXON-TEST RESULTS OF PTE RELATIVE TO SIX AD HOC EAS IN SOLVING JSP

PTE vs   Nowicki et al.   Goncalves et al.   Aiex et al.   Binato et al.   Sha et al.
         (1996) [66]      (2005) [67]        (2003) [68]   (2002) [69]     (2006) [70]
2        0.004            0.307              0.759         0.001           0.003
4        0.042            0.130              0.050         0.001           0.028
8        0.024            0.033              0.016         0.001           0.018
16       0.021            0.358              0.154         0.001           0.018

TABLE 7 TIMES (IN S) TAKEN BY PTE COMPARED TO SIX AD HOC EAS IN SOLVING JSP

Problem    Nowicki   Goncalves   Aiex     Binato   Sha      Zhang    | PTE
           et al.    et al.      et al.   et al.   et al.   et al.   |
           (1996)    (2005)      (2003)   (2002)   (2006)   (2013)   | 2         4         8         16
           [66]      [67]        [68]     [69]     [70]     [71]     |
LA21-25    -         602         -        -        295      -        | 141.002   117.102   93.3873   60.4407
LA26-30    -         1303        -        -        579      -        | 209.912   164.365   142.717   110.540
LA31-35    -         3691        -        -        1462     -        | 22.2784   18.3549   16.8293   12.9383
LA36-40    -         1920        -        -        471      -        | 339.452   300.569   240.349   167.337
In summary, PTE performs the best when the processor number is 4 or 8 on a quad-core PC. As the number of islands increases, the transfer scheme automatically comes into effect. The more processors are allowed, the more dynamic the search procedure becomes.

TABLE 8 COMPARISON OF RING-PEA, PTE/AT, AND PTE IN QAP WHEN THE POPULATION SIZE IS 32 (20 RUNS)

Best      Ring-PEA                                 | PTE/AT                                   | PTE
          2         4         8         16        | 2         4         8         16        | 2         4         8         16
tai30a    1852117   1847259   1832576   1847259   | 1839817   1828299   1828112   1832751   | 1823135   1821056   1823521   1825967
tai35a    2509778   2509615   2509023   2491852   | 2451839   2439526   2439526   2461532   | 2438329   2434551   2434664   2439526
tai40a    3263527   3259393   3247320   3217531   | 3235219   3234812   3226731   3221289   | 3238560   3206459   3218664   3229871
tai50a    5103274   5152883   5071399   5141750   | 5109128   5143075   5010049   5109539   | 5147015   5043459   5043166   5046897
tai60a    7473599   7458896   7450171   7462543   | 7395215   7389817   7394016   7407223   | 7391345   7387519   7387182   7390278
tai64c    1856413   1857652   1859484   1856453   | 1855928   1855928   1855928   1855928   | 1855928   1855928   1855928   1855928
tai80a    13752853  13601697  13623954  13681244  | 13659838  13629978  13627368  13654931  | 13636377  13614458  13612845  13636457
tai100a   21738625  21735758  21664337  21745582  | 21665199  21545726  21531437  21618125  | 21635769  21497643  21497329  21495571
wil50     49044     49000     49002     49002     | 49042     48892     49002     48956     | 48897     48865     48873     48856
wil100    278338    278109    278032    278144    | 278164    277662    277500    277356    | 276659    275531    275763    276612
sko90     117032    116774    117180    117224    | 117190    116826    116672    116750    | 116948    116442    116058    116768
sko100a   153938    153792    153478    153654    | 153832    153210    153626    153762    | 153841    152984    152633    153430
W-test    0.009     0.000     0.000     0.000     | 0.212     0.000     0.001     0.004     | -         -         -         -
TABLE 9 COMPARISON OF RING-PEA, PTE/AT, AND PTE IN QAP WHEN THE POPULATION SIZE IS 128 (20 RUNS)

QAP       BKS        Ring-PEA                                 | PTE/AT                                   | PTE
                     2         4         8         16        | 2         4         8         16        | 2         4         8         16
tai30a    1818146    1869580   1867485   1858689   1865473   | 1859817   1848299   1848112   1832751   | 1858829   1839007   1839217   1825229
tai35a    2422002    2507502   2506050   2505502   2500375   | 2506315   2500375   2496890   2490217   | 2499592   2496868   2487162   2461829
tai40a    3139370    3269172   3264240   3258175   3250593   | 3257578   3255985   3251214   3248942   | 3251511   3245240   3234773   3220842
tai50a    4938796    5150597   5135374   5125335   5118452   | 5135345   5134538   5123218   5126173   | 5118563   5114564   5112051   5104091
tai60a    7205962    7521129   7516322   7455946   7457432   | 7500316   7466379   7442153   7433497   | 7464948   7469693   7406161   7418776
tai64c    1855928    1859481   1857651   1855928   1855928   | 1855928   1855928   1855928   1855928   | 1855928   1855928   1855928   1855928
tai80a    13499184   13855911  13799138  13728850  13660865  | 13785951  13734164  13664612  13623658  | 13655911  13599138  13598850  13560865
tai100a   21044752   21764654  21753917  21735411  21670976  | 21767515  21747693  21677536  21659254  | 21762149  21732196  21674314  21631986
wil50     48816      49088     49062     49002     49002     | 49062     49062     49020     48892     | 49088     49054     49002     48873
wil100    273038     278942    278469    278353    277056    | 278455    278208    278005    277553    | 277940    277812    277751    276905
sko90     115534     117368    117364    117260    117254    | 117188    117088    116884    116868    | 117014    116778    116694    116597
sko100a   152002     153938    154390    153982    153836    | 154454    154108    154116    153614    | 154174    153520    153350    153342
W-test               0.004     0.002     0.005     0.003     | 0.004     0.013     0.003     0.003     | -         -         -         -
TABLE 10 AVERAGE PERCENT DEVIATIONS OF QAP BY PTE

QAP        PTE (population size is 32)       | PTE (population size is 128)
           2        4        8        16     | 2        4        8        16
tai30a     0.0027   0.0016   0.0029   0.0043 | 0.0224   0.0115   0.0116   0.0039
tai35a     0.0067   0.0052   0.0052   0.0072 | 0.0320   0.0309   0.0269   0.0164
tai40a     0.0306   0.0209   0.0246   0.0280 | 0.0357   0.0337   0.0304   0.0260
tai50a     0.0405   0.0208   0.0207   0.0214 | 0.0364   0.0356   0.0351   0.0335
tai60a     0.0251   0.0246   0.0245   0.0249 | 0.0359   0.0366   0.0278   0.0295
tai64c     0.0000   0.0000   0.0000   0.0000 | 0.0000   0.0000   0.0000   0.0000
tai80a     0.0101   0.0085   0.0084   0.0101 | 0.0116   0.0074   0.0074   0.0046
tai100a    0.0228   0.0211   0.0211   0.0210 | 0.0341   0.0327   0.0299   0.0279
wil50      0.0017   0.0010   0.0012   0.0008 | 0.0056   0.0049   0.0038   0.0012
wil100     0.0131   0.0090   0.0099   0.0129 | 0.0180   0.0174   0.0173   0.0142
sko90      0.0121   0.0078   0.0045   0.0106 | 0.0128   0.0108   0.0100   0.0092
sko100a    0.0120   0.0064   0.0041   0.0093 | 0.0143   0.0100   0.0089   0.0088

The experimental results of Ring-PEA, PTE/AT, and PTE on 12 hard instances of the QAP are shown in Tables 8 and 9. Table 8 gives the results when the population size is 32, while Table 9 presents the results when the population size is 128. The best solutions obtained in the 12 experiments and the p-values that are less than 0.05/8 = 0.0063 according to the Bonferroni correction are shown in bold. With only topological transfer, PTE/AT performs much better than Ring-PEA in both solution quality and CPU time. After implementing algorithmic transfer, the search capability of PTE is further enhanced. This is mainly because suitable operators can be broadcast to all sub-populations faster. Better operators are able to make the evolution more efficient, accelerate the convergence process, and shorten the CPU time. Further, pairwise W-tests are also carried out and listed in the last rows of Tables 8 and 9. The statistical differences between the two PEAs and PTE are significant, especially when the number of processors grows to 8 or 16, with 95% confidence (α = 0.05). Similarly, the increase of population size does maintain the solution quality when the number of processors is 16, but the overall performance of the three parallel algorithms is reduced owing to fewer iterations.

Because most of the algorithms in the literature for the QAP are evaluated using the average percent deviation (APD) as a metric [72–75], the APD results obtained by PTE are presented in Table 10. If two sub-populations are employed, the maximum deviations from the BKSs are 0.0405 and 0.0364 when the population size is 32 and 128, respectively. If the number of processors increases to 4 and 8, the overall APD is significantly reduced with the growing search dynamics; the maximum deviations with the two population sizes are 0.0246 and 0.0366, respectively. When the number of processors rises to 16, they become 0.0280 and 0.0335. Hence, the increase of population size actually leads to a slight performance degradation. Figs. 5 and 6 also provide a summarized view of the CPU times of the three parallel schemes with different population sizes on two groups of QAP instances. The optimal solutions of the small-scale instances, tai30a, tai35a, tai40a, tai50a, tai60a, tai64c, and wil50, are obtained within 80 s by PTE. For the remaining larger-scale instances, sub-optimal solutions are obtained in no more than 250 s when the processor number is eight, and in no more than 180 s when the processor number reaches 16. The speed-up of PTE is similar to that of PTE/AT and significantly greater than that of Ring-PEA. When the population size increases from 32 to 128, the CPU times of Ring-PEA are significantly reduced, while the CPU times of PTE/AT and PTE are maintained at the same level. Viewing the experimental results for both JSP and QAP, it can be seen that PTE maintains a good performance across different problems. It requires neither algorithm reconfiguration and parallel connections, nor prior information and domain knowledge about the problem. In summary, PTE offers a high speed-up, good precision, and great robustness of optimization in solving the above two different complex problems.
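The deviation underlying the APD metric of Table 10 is the relative gap to the best-known solution, averaged over the 20 runs. The tai30a numbers below (BKS 1818146, best found 1823135 with two processors) come from the QAP tables above; the averaging helper is our sketch.

```python
def deviation(found, bks):
    """Relative gap of one solution value to the best-known solution."""
    return (found - bks) / bks

def apd(found_values, bks):
    """Average percent deviation over a set of runs (expressed as a fraction)."""
    return sum(deviation(f, bks) for f in found_values) / len(found_values)

gap = deviation(1823135, 1818146)   # about 0.0027 for tai30a
```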

Fig. 5 Boxplots corresponding to the CPU times (s) of PTE on two groups of QAP instances when the population size is 32, (a) tai30a, tai35a, tai40a, tai50a, tai60a, tai64c, and wil50, and (b) tai80a, tai100a, wil100, sko90, and sko100a.

Fig. 6 Boxplots corresponding to the CPU times (s) of PTE on two groups of QAP instances when the population size is 128, (a) tai30a, tai35a, tai40a, tai50a, tai60a, tai64c, and wil50, and (b) tai80a, tai100a, wil100, sko90, and sko100a.

V. VIRTUAL CHANNEL SCHEDULING CASE STUDY

In this section, PTE is applied to a practical engineering problem in communication systems, the virtual channel scheduling (VCS) problem, as a case study. According to the discussion in the above section, the population size is fixed to 32. Without loss of generality, PTE is compared with five classical parallel schemes in solving the VCS problem.

A. Virtual channel scheduling problem

The VCS problem, detailed and modeled in [76], is a complex NP-hard problem. It refers to scheduling various sorts of virtual channel (VC) services in different time slots. The target of VCS is to maintain stable and fast transmission by maximizing throughput and minimizing delay time, jitter, and packet loss rate. The different characteristics of VC services and the multiple quality of service (QoS) requirements make it much more complex than a generic scheduling problem. Assume that n_i^(k) is the decision variable denoting whether VC_i is scheduled in the kth time slot, M is the number of time slots, l is the number of VCs, and C is the data transmission rate of a downlink. The objective function and constraints of VCS can be represented as follows.

Max  Σ_{i=1}^{l−1} w_1i · Throughput_i(n_i)  −  Σ_{i=0}^{l−1} w_2i · Loss_i(n_i)        (7)

s.t.  Σ_{i=1}^{l} n_i^(k) = 1,
      C · Σ_{k=1}^{M} n_i^(k) / M ≥ B_0,
      Jitter_i(n_i) ≤ Jit_i, if i = 2, 3,
      Delay_i(n_i) ≤ Del_i, if i = 0, 2, 3,
      B_i ≤ C · Σ_{k=1}^{M} n_i^(k) / M ≤ 1.4·B_i, if i = 1, 2.
In the above formulation, $Del_i$, $Jit_i$, and $B_i$ are the maximum delay time, maximum jitter, and maximum bandwidth of VC$_i$, respectively. The delay ($Delay_i(n_i)$), jitter ($Jitter_i(n_i)$), throughput ($Throughput_i(n_i)$), packet loss rate ($Loss_i(n_i)$), and weights ($w_{1i}$ and $w_{2i}$) are calculated as in [76]. The variables $n_i^{(k)}$ can be mapped to a permutation over the interval [0, l], in which each number denotes the virtual channel to be scheduled in the corresponding time slot. All the initial settings of VCS in our experiments are the same as in [76]. To test the problem at different scales, three cases with 500, 1000, and 1500 time slots are set up. The more time slots that are used, the shorter each slot becomes, and hence the higher the scheduling precision that can be expected.

B. Algorithmic settings

In this application, PTE is compared with five classical parallel schemes, i.e., the single-sided ring, double-sided ring, single-sided mesh, double-sided mesh, and full-mesh topologies. For uniformity, the local search operators in accordance with Algorithm 5 are applied in all of the PEAs. Without loss of generality, the population size is set to 32 and the maximum number of function evaluations is set to 1000n. A group of tests on the three scales of VCS is then carried out to compare the different parallel methods with PTE.
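Under the permutation encoding just described, a candidate schedule is simply a sequence naming the VC that occupies each time slot. A minimal sketch of how such a sequence maps back to the decision variables $n_i^{(k)}$ of the model (the helper names and the sample numbers are ours, not from [76]):

```python
def decode_schedule(schedule, l):
    """Expand a slot-wise schedule (the VC index chosen for each time
    slot) into the binary decision variables n_i^(k) of the model."""
    M = len(schedule)
    n = [[0] * M for _ in range(l)]
    for k, vc in enumerate(schedule):
        n[vc][k] = 1              # VC `vc` occupies time slot k
    return n

def mean_rate(n_i, C, M):
    """Average rate granted to one VC: C * sum_k n_i^(k) / M."""
    return C * sum(n_i) / M

# Illustrative only: four VCs, six time slots, downlink rate C = 12.
n = decode_schedule([0, 1, 2, 0, 3, 1], l=4)
rate_vc0 = mean_rate(n[0], C=12, M=6)   # VC0 holds 2 of the 6 slots
```

Because each slot names exactly one VC, the per-slot constraint $\sum_i n_i^{(k)} \le 1$ holds by construction; the bandwidth, delay, and jitter bounds of Eq. (7) still have to be checked against the decoded variables.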

TABLE 9 COMPARISON BETWEEN VARIOUS IMPLEMENTATIONS OF THE PEAS IN THE VCS APPLICATION

500 time slots
Proc.  Metric    RingPEA   DRingPEA  MeshPEA   DMeshPEA  FullmeshPEA  PTE/AT    PTE
2      Best      115.014   116.378   115.176   116.217   115.687      116.818   117.378
       Avg       114.249   114.359   114.185   114.342   114.273      116.094   117.036
       Time (s)  74.8597   77.3414   75.8578   77.6211   78.4921      75.9021   75.1138
4      Best      116.896   117.213   116.343   117.231   117.054      117.196   117.436
       Avg       115.910   116.606   114.256   115.355   116.963      115.989   117.125
       Time (s)  64.7997   69.5446   70.6761   74.0612   74.9523      59.6404   59.7849
8      Best      116.613   117.114   117.324   116.770   117.214      117.437   117.538
       Avg       115.891   115.571   115.695   114.070   116.533      116.354   117.214
       Time (s)  57.1365   61.9562   62.7025   64.9946   66.7738      48.6408   48.1401
16     Best      116.594   117.331   116.452   117.381   116.731      116.798   117.413
       Avg       115.552   115.769   114.997   115.762   115.449      115.636   116.984
       Time (s)  49.3078   54.7191   55.9711   61.0030   64.4077      40.3337   39.8212

1000 time slots
Proc.  Metric    RingPEA   DRingPEA  MeshPEA   DMeshPEA  FullmeshPEA  PTE/AT    PTE
2      Best      117.634   116.870   117.285   117.296   117.422      118.577   120.276
       Avg       115.391   115.064   116.531   115.319   116.592      117.148   119.388
       Time (s)  314.757   311.103   314.837   316.148   314.714      313.318   314.757
4      Best      118.525   119.017   118.611   118.123   119.481      118.651   120.657
       Avg       117.003   117.005   117.439   117.579   118.032      117.712   119.773
       Time (s)  278.849   278.125   294.168   295.372   298.415      251.116   248.849
8      Best      118.108   118.593   119.113   118.498   119.220      119.073   120.702
       Avg       116.937   115.921   117.052   116.876   117.758      117.745   120.591
       Time (s)  218.657   235.069   237.177   265.583   261.573      193.216   197.689
16     Best      117.531   117.887   117.305   117.844   118.301      118.690   120.494
       Avg       115.825   115.634   116.187   115.212   116.949      116.851   119.253
       Time (s)  180.627   199.346   203.699   236.287   233.367      152.366   155.806

1500 time slots
Proc.  Metric    RingPEA   DRingPEA  MeshPEA   DMeshPEA  FullmeshPEA  PTE/AT    PTE
2      Best      116.082   115.506   114.182   117.535   118.859      118.641   119.869
       Avg       113.538   112.951   112.463   114.087   115.691      116.472   118.194
       Time (s)  635.534   626.727   631.187   636.212   635.403      627.793   628.035
4      Best      119.765   118.078   117.695   119.789   119.157      118.366   119.754
       Avg       118.637   117.625   116.383   117.361   118.405      117.538   119.236
       Time (s)  524.266   549.636   559.423   576.304   569.939      485.082   483.527
8      Best      118.067   116.384   118.517   118.837   119.297      118.882   120.972
       Avg       116.455   115.405   116.691   116.701   117.174      117.910   119.361
       Time (s)  445.690   470.396   471.344   497.846   497.708      394.068   397.242
16     Best      117.253   116.276   116.931   117.292   118.769      118.495   119.724
       Avg       116.781   115.110   115.445   115.538   116.473      116.904   119.116
       Time (s)  376.367   391.897   395.359   428.882   424.716      326.593   332.469
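The five classical topologies used as baselines can be written down as directed neighbour lists over p sub-populations. The sketch below covers the ring, double-sided ring, double-sided grid (mesh), and full-mesh variants; a single-sided mesh simply keeps one direction per grid axis. Function names are ours, for illustration only.

```python
def ring(p):
    """Single-sided ring: island i sends migrants to (i+1) mod p."""
    return {i: [(i + 1) % p] for i in range(p)}

def double_ring(p):
    """Double-sided ring: both clockwise and counter-clockwise links."""
    return {i: [(i - 1) % p, (i + 1) % p] for i in range(p)}

def mesh(p, cols):
    """Double-sided 2-D grid (mesh): up/down/left/right neighbours."""
    rows = p // cols
    nb = {}
    for i in range(p):
        r, c = divmod(i, cols)
        nb[i] = [j for j in (i - cols if r > 0 else None,
                             i + cols if r < rows - 1 else None,
                             i - 1 if c > 0 else None,
                             i + 1 if c < cols - 1 else None)
                 if j is not None]
    return nb

def full_mesh(p):
    """Fully connected: every island exchanges with every other."""
    return {i: [j for j in range(p) if j != i] for i in range(p)}
```

In contrast to these fixed neighbour lists, PTE keeps a single connection per sub-population and transfers it adaptively during the run.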

C. PTE performance and topology comparison

The results of the five classical parallel topologies compared with PTE/AT and PTE are shown in Tables 9 and 10. As the problem scale and characteristics vary, the performance of these classical topologies changes as well, and the changes are usually inconsistent. PTE always finds better solutions and takes much shorter CPU time. According to the Bonferroni correction, i.e., with the p-value required to satisfy p < α/m = 0.05/6 ≈ 0.0083, the CPU time of PTE is significantly better than those of the five classical topologies, and is very similar to that of PTE/AT. PTE offers a significant speed-up without precision loss as the processor number increases, as shown in Fig. 5. When the number of processors increases from 2 to 16, the execution time of PTE is always reduced significantly and nearly linearly, as it is for the JSP and the QAP.
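The Bonferroni-corrected decision applied above is easy to reproduce: with m = 6 pairwise comparisons, each W-test p-value is held against the threshold α/m. A sketch using the CPU-time p-values reported in Table 10 (the helper name is ours):

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Compare each pairwise p-value with the Bonferroni-corrected
    threshold alpha / m, where m is the number of comparisons."""
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}, threshold

# CPU-time p-values of PTE versus its six rivals (from Table 10).
p_time = {"RingPEA": 0.004, "DRingPEA": 0.006, "MeshPEA": 0.002,
          "DMeshPEA": 0.002, "FullmeshPEA": 0.003, "PTE/AT": 0.347}
flags, thr = bonferroni_flags(p_time)   # thr = 0.05 / 6 ≈ 0.0083
```

With this threshold, PTE's CPU time is flagged as significantly better than the five classical topologies but not than PTE/AT, matching the discussion above.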

TABLE 10 COMPARISON OF THE W-TEST AMONG VARIOUS IMPLEMENTATIONS OF PEAS FOR VCS

PTE vs     RingPEA  DRingPEA  MeshPEA  DMeshPEA  FullmeshPEA  PTE/AT
Solution   0        0         0        0         0            0
Time (s)   0.004    0.006     0.002    0.002     0.003        0.347

The best average times for the cases of 500, 1000, and 1500 time slots are 39.8212s, 155.806s, and 332.469s, respectively. Compared to the other five classical parallel topologies under the same conditions, PTE is much faster as the number of processors increases.
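The speed-ups plotted in Fig. 5 can be computed directly from the CPU times in Table 9. We assume here that the two-processor run serves as the baseline, i.e. S_p = T_2 / T_p (the baseline choice and function name are ours):

```python
def speedups(time_by_procs, base=2):
    """Relative speed-up S_p = T_base / T_p for each processor count.
    Taking the 2-processor run as the baseline is our assumption."""
    t_base = time_by_procs[base]
    return {p: t_base / t for p, t in time_by_procs.items()}

# PTE average CPU times (s) on the 500-slot VCS case, from Table 9.
pte_500 = {2: 75.1138, 4: 59.7849, 8: 48.1401, 16: 39.8212}
s = speedups(pte_500)   # e.g. s[16] is roughly 1.9x the 2-processor run
```

Speed-up well below the processor ratio is expected here, since communication and the serial portions of each generation do not parallelize.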

Fig. 5 The speed-ups of the PEAs in solving the VCS at three scales: (a) small, (b) medium, and (c) large.

Without changing any part of the algorithm, PTE is seen to perform well not only in the benchmark tests, but also in solving a practical problem under diverse circumstances. While shortening the CPU time on both small- and large-scale problems, PTE maintains its search capability at a high level. In short, PTE is designed as a scheme that extends existing evolutionary operators in a highly flexible parallel way, making them faster and more adaptable in solving different sorts of combinatorial optimization problems.

VI. CONCLUSIONS

The focus of this paper has been to establish a parallel transfer scheme that structures parallel evolutionary algorithms flexibly to handle a wider range of real-world optimization problems. Using a group of classical evolutionary operators and local search heuristics, it has been demonstrated that both the communication connections and the evolutionary operators in PTE can be transferred between sub-population pairs so as to improve PEA performance. PTE enables efficient collaboration among sub-populations with minimal communication. To test the performance of PTE in solving combinatorial optimization problems, comprehensive experiments have been carried out on the generic JSP and QAP benchmarks, as well as on a practical VCS problem. In most cases, PTE has outperformed the other EAs and PEAs, especially in search speed and quality. Furthermore, the speed-up under parallelism is approximately linear, while degradation of solution accuracy is avoided. Both topological transfer and algorithmic transfer are applicable not only to combinatorial optimization problems, but also to continuous or non-permutated complex problems. The classical evolutionary operators and local search heuristics can be extensively adjusted by tuners or replaced by other subroutines, so as to configure and form a new algorithm.
Future work is therefore motivated to take the parameter configuration of the underlying candidates into consideration and to introduce typical algorithm tuners that adjust the candidates before and during the search. In addition, applying the PTE scheme to more practical domains, together with extended comparisons against parallel hyper-heuristic-based evolutionary algorithms, remains to be carried out. It is expected that the scheme is capable of solving not only single-objective but also multi-objective optimization problems, for engineering practice and in changing environments.

ACKNOWLEDGEMENT

This work is supported by the Natural Science Foundation of China under Grant No. 61703015.

REFERENCES

[1]

E Alba, M Tomassini. Parallelism and evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 2002, 6(5): 443-462. [2] E Alba. Parallel metaheuristics: a new class of algorithms. John Wiley & Sons, 2005. [3] A Lopez Jaimes, C A Coello Coello. MRMOGA: a new parallel multi-objective evolutionary algorithm based on the use of multiple resolutions. Concurrency and Computation-Practice & Experience, 2007, 19(4): 397-441. [4] E Alba. Parallel evolutionary algorithms can achieve super-linear performance. Information Processing Letters, 2002, 82(1): 7-13. [5] D A Van Veldhuizen, J B Zydallis, G B Lamont. Considerations in engineering parallel multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computations, 2003, 7(2): 144-173. [6] H R Cheshmehgaz, M I Desa, A Wibowo. Effective local evolutionary searches distributed on an island model solving bi-objective optimization problems. Applied Intelligence, 2013, 38(3): 331-356. [7] L delaOssa, J A Gamez, J A Puerta. Initial approaches to the application of islands-based parallel EDAs in continuous domains. Journal of Parallel and Distributed Computing, 2006, 66(8): 991-1001. [8] S C Lin. Coarse-grain parallel genetic algorithms: categorization and new approach. The 6th IEEE Symposium on Parallel and Distributed Processing, 1994: 28-37. [9] X Y Zhang, J Zhang, Y J Gong, Z H Zhan, W N Chen, Y Li. Kuhn-Munkres parallel genetic algorithm for the set cover problem and its application to large-scale wireless sensor networks. IEEE Transactions on Evolutionary Computation, 2016, 20(5): 695-710. [10] K E Parsopoulos. Parallel cooperative micro-particle swarm optimization: a master-slave model. Applied Soft Computing, 2012, 12(11): 3552-3579. [11] E Cantú-Paz. Migration policies, selection pressure, and parallel evolutionary algorithms. Journal of heuristics, 2001, 7(4): 311-334.

[12] S Skolicki, K De Jong. The influence of migration sizes and intervals on island models. Proceedings of the 2005 conference on Genetic and evolutionary computation. ACM, 2005: 1295-1302. [13] J Lassig, D Sudholt. Design and analysis of migration in parallel evolutionary algorithm. Soft Computing, 2013, 17(7): 1121-1144. [14] E Noda, A L V Coelho, I L M Ricarte, A Yamakami, A A Freitas. Devising adaptive migration policies for cooperative distributed genetic algorithms. IEEE International Conference on Systems, Man and Cybernetics, 2002, 6: 6-pp. [15] F Lardeux, A Goëffon. A dynamic island-based genetic algorithms framework. Simulated Evolution and Learning. Springer Berlin Heidelberg, 2010: 156-165. [16] L Araujo, J Julian Merelo. Diversity through multiculturality: assessing migrant choice policies in an island model. IEEE Transactions on Evolutionary Computation, 2011, 15(4): 456-469. [17] T Matsumura, M Nakamura, J Okech, K Onaga. A parallel and distributed genetic algorithm on loosely-coupled multiprocessor system. IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 1998, 81(4): 540-546. [18] M L M Beckers, E P P A Derks, W J Melssen, L M C Buydens. Using genetic algorithms for conformational analysis of biomacromolecules. Computers & Chemistry, 1996, 20(4): 449-457. [19] Y Fukuyama, H D Chiang. A parallel genetic algorithm for generation expansion planning. IEEE Transactions on Power Systems, 1996, 11(2): 955-961. [20] H Miyagi, T Tengan, S Mohanmed, M Nakamura. Migration effects on tree topology of parallel evolutionary computation. IEEE Region 10 Conference on TENCON, 2010: 1601-1606. [21] F M Defersha, M Chen. A parallel genetic algorithm for dynamic cell formation in cellular manufacturing systems. International Journal of Production Research, 2008, 46(22): 6389-6413. [22] F M Defersha, M Chen. A parallel genetic algorithm for a flexible job-shop scheduling problem with sequence dependent setups.
International Journal of Advanced Manufacturing Technology, 2010, 49(1-4): 263-279. [23] L Li, J M Garibaldi, N Krasnogor. Automated self-assembly programming paradigm: The impact of network topology. International Journal of Intelligent Systems, 2009, 24(7): 793-817. [24] J M Whitacre, R A Sarker, Q T Pham. The self-organization of interaction networks for nature-inspired optimization. IEEE Transactions on Evolutionary Computation, 2008, 12(2): 220-230. [25] I Arnaldo, I Contreras, D Millán-Ruiz, J I Hidalgo, N Krasnogor. Matching island topologies to problem structure in parallel evolutionary algorithms. Soft Computing, 2013, 17(7): 1209-1225. [26] C Segura, E Segredo, C Leon. Scalability and robustness of parallel hyperheuristics applied to a multiobjectivised frequency assignment problem. Soft Computing, 2013, 17: 1077-1093. [27] J Jin, T G Crainic, A A Løkketangen. A cooperative parallel metaheuristic for the capacitated vehicle routing problem. Computers & Operations Research, 2014, 44: 33-41. [28] E G Talbi. A taxonomy of hybrid metaheuristics. Journal of Heuristics, 2002, 8: 541-564. [29] J Tang, M H Lim, Y S Ong. Diversity-adaptive parallel memetic algorithm for solving large scale combinatorial optimization problems. Soft Computing, 2007, 11: 873-888. [30] W Deng, R Chen, J Gao, Y Song, J Xu. A novel parallel hybrid intelligence optimization algorithm for a function approximation problem. Computers & Mathematics with Applications, 2012, 63(1): 325-336. [31] F Tao, Y J Laili, Y Liu, Y Feng, Q Wang, L Zhang, L Xu. Concept, principle and application of dynamic configuration for intelligent algorithms. IEEE Systems Journal, 2014, 8(1): 28-42. [32] F Tao, L Zhang, Y J Laili. Configurable Intelligent Optimization Algorithms: Theory and Practice in Manufacturing. Springer, 2014. [33] E Alba, A J Nebro, J M Troya. Heterogeneous computing and parallel genetic algorithms. Journal of Parallel and Distributed Computing, 2002, 62(9): 1362-1385. [34] T C Belding. 
The distributed genetic algorithm revisited. Proceedings of the 6th International Conference on Genetic Algorithms, 1995: 113-121. [35] J Denzinger, J Kidney. Improving migration by diversity. Proceedings of the Congress on Evolutionary Computation, 2003, 1: 700-707. [36] E Alba, J M Troya. Influence of the migration policy in parallel distributed GAs with structured and panmictic populations. Applied Intelligence, 2000, 12(3): 163-181. [37] C Qian, J C Shi, Y Yu, K Tang, Z H Zhou. Parallel pareto optimization for subset selection. Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI'16), New York, 2016: 1939-1945. [38] M Hijaze, D Corne. An investigation of topologies and migration schemes for asynchronous distributed evolutionary algorithms. Proceedings of the World Congress on Nature and Biologically Inspired Computing, IEEE, 2009. [39] G Wang, D Wu, K Szeto. Quasi-parallel genetic algorithms with different communication topologies. The IEEE Congress on Evolutionary Computation (CEC), 2007: 721-727. [40] M Giacobini, M Preuss, M Tomassini. Effects of scale-free and small-world topologies on binary coded self-adaptive cea. Lecture Notes in Computer Science, 2006, 3906: 86-98. [41] L Li, J M Garibaldi, N Krasnogor. Automated self-assembly programming paradigm: the impact of network topology. International Journal of Intelligent Systems, 2009, 24(7): 793-817. [42] Q Liu, W Wei, H Yuan, Z H Zhan, Y Li. Topology selection for particle swarm optimization. Information Sciences, 2016, 363: 154-173. [43] S Yang, R Tinos. A hybrid immigrants scheme for genetic algorithms in dynamic environments. International Journal of Automation & Computing, 2007, 4(3): 243-254. [44] Z H Zhan, Y Lis, J Zhang. Cloudde: a heterogeneous differential evolution algorithm and its cloud version. IEEE Transactions on Parallel and Distributed Systems, 2016. [45] J Whitacre, R Sarker, Q Pham. 
The self-organization of interaction networks for nature-inspired optimization. IEEE Transactions on Evolutionary Computation, 2008, 12(2): 220-230. [46] F Tao, Y Laili, L Xu, L Zhang. FC-PACO-RM: a parallel method for service composition optimal-selection in cloud manufacturing system. IEEE Transactions on Industrial Informatics, 2013, 9(4): 2023-2033.

[47] H Muhlenbein. Evolutionary in time and space: the parallel genetic algorithm. Foundations of Genetic Algorithms, 1991: 316-337. [48] P Moscato, M G Norman. A memetic approach for the traveling salesman problem implementation of a computational ecology for combinatorial optimization on message-passing systems. Parallel Computing and Transputer Applications, 1992: 177-186. [49] J Tang, M H Lim, Y S Ong. Parallel memetic algorithm with selective local search for large scale quadratic assignment problems. International Journal of Innovative Computing Information and Control, 2006, 2(6): 1399-1416. [50] J Tang, M H Lim, Y S Ong. Adaptation for parallel memetic algorithm based on population entropy. The 8th Annual Genetic and Evolutionary Computation Conference (GECCO 2006), ACM SIGEVO, 2006, 1-2: 575-582. [51] J A Vrugt, B A Robinson, J M Hyman. Self-adaptive multimethod search for global optimization in real-parameter spaces. IEEE Transactions on Evolutionary Computation, 2009, 13(2): 243-259. [52] D Hadka, P Reed. Borg: an auto-adaptive many-objective evolutionary computing framework. Evolutionary Computation, 2013, 21(2): 231-259. [53] J Grobloer, A P Engelbrecht, G Kendall, V S S Yadavalli. Alternative hyper-heuristic strategies for multi-method global optimization. IEEE Congress on Evolutionary Computation, 2010: 1-8. [54] E K Burke, G Kendall, E Soubeiga. A tabu-search hyperheuristic for timetabling and rostering. Journal of Heuristics, 2003, 9(6): 451-470. [55] C Qian, K Tang, Z H Zhou. Selection hyper-heuristics can provably be helpful in evolutionary multi-objective optimization. In: Proceedings of the 14th International Conference on Parallel Problem Solving from Nature (PPSN'16), Edinburgh, Scotland, 2016: 835-846. [56] J Holland. Adaptation in natural and artificial systems. The University of Michigan Press, 1975. [57] K De Jong, An analysis of the behavior of a class of genetic adaptive systems. PhD thesis, University of Michigan, Ann Arbor, 1975. 
[58] J Kennedy, R C Eberhart. Particle swarm optimization. IEEE International Conference on Neural Networks, 1995. [59] H Wang, Y Liu, S Zeng. A hybrid particle swarm algorithm with Cauchy mutation. IEEE Swarm Intelligence Symposium (SIS 2007), 2007: 356-360. [60] W Gong, Z Cai, C X Ling, H Li. Enhanced differential evolution with adaptive strategies for numerical optimization. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, 2011, 41(2): 397-413. [61] Z W Geem, J H Kim, G V Loganathan. A new heuristic optimization algorithm: harmony search. Simulation, 2001, 76(2): 60-68. [62] H R Lourenco, O C Martin, T Stutzle. Iterated local search. arXiv preprint math/0102188, 2001. [63] N Mladenovic, P Hansen. Variable neighborhood search. Computers & Operations Research, 1997, 24(11): 1097-1100. [64] K P Wang, L Huang, C G Zhou, W Pang. Particle swarm optimization for travelling salesman problem. Proceedings of The 2nd International Conference on Machine Learning and Cybernetics, 2003. [65] P C Ma, F Tao, Y L Liu, L Zhang, H X Lu, Z Ding. A hybrid particle swarm optimization and simulated annealing algorithm for job shop scheduling. IEEE International Conference on Automation Science and Engineering (CASE), 2014. [66] E Nowicki, C Smutnicki. A fast taboo search algorithm for the job-shop problem. Management Science, 1996, 42(6): 797-813. [67] J F Goncalves, J J de Magalhaes, M G C Resende. A hybrid genetic algorithm for the job shop scheduling problem. European Journal of Operational Research, 2005, 167: 77-95. [68] R M Aiex, S Binato, M G C Resende. Parallel GRASP with path-relinking for job shop scheduling. Parallel Computing, 2003, 29: 393-430. [69] S Binato, W J Hery, D M Loewenstern, M G C Resende. A GRASP for job shop scheduling. Essays and Surveys in Metaheuristics, Springer, 2002: 59-79. [70] D Y Sha, C Y Hsu. A hybrid particle swarm optimization for job shop scheduling problem. Computers & Industrial Engineering, 2006, 51: 791-808. 
[71] R Zhang, S Song, C Wu. A hybrid artificial bee colony algorithm for the job shop scheduling problem. International Journal of Production Economics, 2013, 141: 167-178. [72] T James, C Rego, F Glover. Multistart tabu search and diversification strategies for the quadratic assignment problem. IEEE Transactions on Systems, Man and Cybernetics – Part A: Systems and Humans, 2009, 39(3): 579-596. [73] Y Marinakis, A Migdalas. A hybrid genetic-GRASP algorithm using lagrangean relaxation for the traveling salesman problem. Journal of Combinatorial Optimization, 2005, 10: 311-326. [74] J Yang, C Wu, H P Lee, Y Liang. Solving traveling salesman problems using generalized chromosome genetic algorithm. Progress in Natural Science, 2008, 18: 887-892. [75] T James, C Rego, F Glover. A cooperative parallel tabu search algorithm for the quadratic assignment problem. European Journal of Operational Research, 2009, 195: 810-826. [76] Y Zhu, P Wan, Y Chen, F Tao, L Zhang. Modeling and Solution for Virtual Channel Scheduling for Downlink Business. Asia Simulation Conference (AsiaSim 2014), Japan, 2014. [77] Y Laili, F Tao, L Zhang. Multi operators-based partial connected parallel evolutionary algorithm. Evolutionary Computation (CEC), 2016 IEEE Congress on. IEEE, 2016: 4289-4296. [78] Papenhausen E, Mueller K. Coding Ants–Optimization of GPU Code Using Ant Colony Optimization. Computer Languages, Systems & Structures, 2018. [79] Farhad S M, Nayeem M A, Rahman M K, Rahman M S. Mapping stream programs onto multicore platforms by local search and genetic algorithm. Computer Languages, Systems & Structures, 2016, 46: 182-205. [80] Birattari M, Stützle T, Paquete L, Varrentrapp K. A racing algorithm for configuring metaheuristics. Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, 2002: 11-18. [81] Nannen V, Eiben A E. Relevance Estimation and Value Calibration of Evolutionary Algorithm Parameters.
International Joint Conferences on Artificial Intelligence (IJCAI), 2007, 7: 975-980. [82] Veček N, Mernik M, Filipič B, Črepinšek M. Parameter tuning with Chess Rating System (CRS-Tuning) for meta-heuristic algorithms. Information Sciences, 2016, 372: 446-469.