Cooperative two-engine multi-objective bee foraging algorithm with reinforcement learning


ARTICLE IN PRESS

JID: KNOSYS

[m5G;July 23, 2017;6:2]

Knowledge-Based Systems 000 (2017) 1–16

Contents lists available at ScienceDirect

Knowledge-Based Systems journal homepage: www.elsevier.com/locate/knosys

Lianbo Ma a, Shi Cheng b, Xingwei Wang a, Min Huang c,∗, Hai Shen d, Xiaoxian He e, Yuhui Shi f

a College of Software, Northeastern University, Shenyang, China
b School of Computer Science, Shaanxi Normal University, Xi’an, China
c College of Information Science and Engineering, Northeastern University, Shenyang, China
d College of Information Science & Engineering, Central South University, Changsha, China
e College of Physics Science and Technology, Shenyang Normal University, Shenyang, 110034, China
f Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China

Article info

Article history: Received 31 August 2016; Revised 19 June 2017; Accepted 19 July 2017; Available online xxx

Keywords: Bee foraging; Multi-objective optimization; Indicator; Pareto

Abstract

This paper proposes a novel multi-objective bee foraging algorithm (MOBFA) based on a two-engine co-evolution mechanism for solving multi-objective optimization problems. The proposed MOBFA handles convergence and diversity separately by evolving two cooperative search engines with different evolution rules. Specifically, in the colony-level interaction, the primary concept is to first assign two different performance evaluation principles (i.e., a Pareto-based measure and an indicator-based measure) to the two engines for evolving their respective archives, and then to use a comprehensive learning mechanism over the two archives to boost population diversity. In the individual-level foraging, neighbor-discount-information (NDI) learning based on reinforcement learning (RL) is integrated into the single-objective search to adjust the flight trajectories of the foraging bees. Tested on a suite of benchmark functions, the proposed MOBFA is shown experimentally to be superior or at least comparable to its competitors in terms of two commonly used metrics, IGD and SPREAD. © 2017 Elsevier B.V. All rights reserved.

1. Introduction

In many real-world industrial applications, decision makers (DMs) have to tackle complex problems that require the simultaneous satisfaction of multiple objectives, which are non-commensurable and normally in conflict with each other. These problems are referred to as multi-objective optimization (MO) problems. Generally, the solutions to MO problems, known as the Pareto-optimal solutions (PS), represent the best possible trade-offs among all conflicting objectives [1,2].

Owing to the promising properties of population-based stochastic search, evolutionary algorithms (EAs) have been widely used to solve MO problems. The primary goal of these multi-objective EAs (MOEAs) is to find a set of non-dominated solutions that forms an accurate and well-distributed approximation of the true Pareto-optimal front (PF). One representative paradigm for achieving this goal is the Pareto-optimality-based algorithm, such as NSGAII [3] and SPEA2 [4]. Pareto-based MOEAs have been extensively developed to deal with MO problems and have received a surge of attention. However, these algorithms can become ineffective at improving convergence on complex MO problems, because estimating similarity in an MO space is difficult, which in turn makes it hard to identify the differences between solutions. Several attempts at diversity improvement have been made, but they do not solve this issue completely [5–7].

In response, a considerable number of algorithms have been proposed recently. The first type is the decomposition-based approach, which uses aggregation functions to decompose an MO problem into a set of single-objective optimization problems. The representative algorithm is MOEA/D [8], in which each solution vector is assigned a specific weight vector for its sub-problem in a cooperative manner. Several contemporary variants have been proposed [9–12]. For instance, Li and Silva [9] develop a greedy randomized adaptive search procedure (GRASP) to optimize each sub-problem; Chen et al. [10] propose a framework of double-level archives for MOEA/D to retain fast convergence and an even distribution along the PF; and in [11] a new variant, MOEA/D-EGO, is developed for expensive MO problems, which can yield a set of reasonable solutions within a given computational budget.

∗ Corresponding author. E-mail address: [email protected] (M. Huang).

http://dx.doi.org/10.1016/j.knosys.2017.07.024 0950-7051/© 2017 Elsevier B.V. All rights reserved.

Please cite this article as: L. Ma et al., Cooperative two-engine multi-objective bee foraging algorithm with reinforcement learning, Knowledge-Based Systems (2017), http://dx.doi.org/10.1016/j.knosys.2017.07.024


Another alternative to the Pareto-based approach is the quality-indicator-based algorithm, in which the fitness of each solution is evaluated by the value of a quality indicator computed between individuals. The first such algorithm is the indicator-based evolutionary algorithm (IBEA) proposed in [13], whose main merit is that it boosts convergence. However, this indicator-based approach has been shown to suffer from a lack of diversity maintenance in its indicator. To address this issue, several improved variants have been proposed. In [14], the S-metric selection strategy is incorporated into the MOEA to enhance convergence and diversity simultaneously. Likewise, Gomez et al. propose an R2-indicator-based evolutionary algorithm called MOMBI [15]. Another interesting improvement is the incorporation of a hypervolume-based metric into MOEAs, which can evaluate convergence and diversity simultaneously [16]. Recently, several novel bio-inspired MO algorithms, i.e., MOGWO, MODA and MOALO, have been developed by Mirjalili based on the social behaviors of the grey wolf, dragonfly and ant lion, and have received a surge of attention [17,18,19].

The third type of improvement augments the selection pressure in the MO space, for example through ε-dominance [20], control of the dominance area [21], and α-dominance [22]. These methods have been shown experimentally to be more effective than the original Pareto dominance. In addition, several other strategies have been developed to improve dominance definitions, such as fuzzy-Pareto dominance [23], L-optimality [24], and the ranking method [25]. In particular, in [26] the crowding distance mechanism of NSGAII is replaced with the farthest-candidate method, which is more conducive to diversity preservation. In [27], a new grid dominance method is proposed to retain an appropriate balance of convergence and distribution.

Besides the above categories, there are other emerging approaches, including NSGAIII [28] and the two-archive MOEA [29]. NSGAIII uses a set of evenly distributed reference points to refine diversity maintenance, and it has been validated experimentally [28]. In [29], an improved two-archive algorithm is developed by assigning indicator and Pareto metrics to the two archives respectively, which can enhance convergence and diversity separately. Other approaches based on reference information or objective space reduction have also been proposed [30–35].

Inspired by the pioneering efforts above, especially that in [29], and by leveraging the specific advantages of the Pareto-based and indicator-based measures, this paper aims to design a two-engine multi-objective bee foraging algorithm (MOBFA) that achieves excellent convergence and diversity without requiring any reference information in advance.

As mentioned previously, managing convergence and diversity is an essential task in the design of MO algorithms. That is, two common but conflicting goals, i.e., minimizing the distance to the true PS and maximizing the diversity within the approximation of the PS, have to be appropriately balanced [36]. Employing only the Pareto dominance approach is insufficient for this purpose. It has been shown that the quality indicator measure proposed in IBEA [13] has great potential to boost convergence, and that Pareto dominance is able to improve diversity [29]. Using these features, we follow the idea of cooperative evolution by combining these two dominance relations. One of our main ideas is to cooperatively evolve two heterogeneous search engines (i.e., evolutionary populations) to promote convergence and diversity separately.
Specifically, in the colony-level interaction, the Pareto-based engine (PE), which maintains diversity, and the indicator-based engine (IE), which boosts convergence, are evolved in parallel, and they update their archives independently, namely the Pareto-based archive (PA) and the indicator-based archive (IA). At each evolutionary step, each agent (or individual) of an engine is updated by indirectly learning from a random individual selected from its counterpart’s archive. In the individual-level searching, the artificial bee colony (ABC) paradigm [37] is selected as the basic search rule. Reinforcement learning (RL) is then integrated into the ABC to adjust the flight trajectories of the foraging bees. By incorporating these mechanisms, the single-population ABC is extended to an interacting multi-hive, multi-objective model.

The novelties and characteristics of the proposed MOBFA for MO problems can be summarized as follows.

(1) MOBFA uses two heterogeneous engines (or demes) with different evolutionary rules to deal with convergence and diversity separately. This cooperative multi-engine framework is open-ended and extensible: various MO strategies can be easily incorporated, and new search engines, including evolutionary algorithm (EA) and swarm intelligence (SI) paradigms, can be selected for the corresponding engine. This also means that the framework can achieve an appropriate search trade-off in MO optimization.

(2) Based on the proposed framework, MOBFA provides a cooperative evolution mechanism for diversity preservation and distribution control, and a mechanism for communication between the two demes.

• In the colony-level interaction, the Pareto-based engine (PE) and the indicator-based engine (IE) are evolved through information exchange to maintain a global archive. That is, each agent in an engine (PE or IE) is updated by indirectly learning from a random individual selected from its counterpart’s archive (IA or PA) throughout the search process. In this way, the explicit amalgamation of the competitive advantages of Pareto dominance and the indicator-based measure shows great potential for balancing convergence and diversity.

• In the individual-level searching, reinforcement learning (RL) based on the Q function is incorporated into the single-objective ABC engine to determine the flight trajectories of the foraging bees. Each individual can learn from its neighbors through the neighbor-discount-information (NDI) mechanism and use this information to update its position, which enhances the efficiency of information exchange.

The rest of this paper is organized as follows. Section 2 presents the literature review. Section 3 presents the proposed algorithm in detail. Section 4 gives the experimental study on a set of benchmarks. Finally, Section 5 outlines the conclusions.

2. Literature review

This section presents the principles of multi-objective optimization and a brief review of multi-objective ABC algorithms.

2.1. Multi-objective optimization

A multi-objective optimization problem can be stated as follows:

$$
\begin{aligned}
\min_{x}\quad & F(x) = (f_1(x), \ldots, f_m(x)) \\
\text{s.t.}\quad & g_i(x) \ge 0, \quad i = 1, 2, \ldots, k \\
& h_j(x) = 0, \quad j = 1, 2, \ldots, q
\end{aligned}
\tag{1}
$$

In this definition, $x = (x_1, \ldots, x_n) \in R^n$ is the decision vector consisting of n variables, $R^n$ is the decision space, $F: x \to R^m$ consists of m real-valued objective functions that should be optimized simultaneously, $R^m$ is the objective space, and k and q are the numbers of inequality and equality constraints, respectively. Essentially, the goal is to find a set of solutions that yields the best trade-off among all m objective functions. The formal definition of Pareto dominance is given as follows.


(1) Given two decision vectors $x_a, x_b \in R^n$, $x_a$ is said to dominate $x_b$, denoted as $x_a \prec x_b$, if $\forall i \in \{1, 2, \ldots, m\}$, $f_i(x_a) \le f_i(x_b)$ and $\exists i \in \{1, 2, \ldots, m\}$, $f_i(x_a) < f_i(x_b)$.
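The dominance relation above translates directly into a short check. The following is a minimal sketch, assuming minimization and objective vectors given as plain sequences; the helper names `dominates` and `non_dominated` are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    """Return True if objective vector fa Pareto-dominates fb (minimization)."""
    if len(fa) != len(fb):
        raise ValueError("objective vectors must have the same length")
    # fa must be no worse in every objective ...
    no_worse = all(a <= b for a, b in zip(fa, fb))
    # ... and strictly better in at least one.
    strictly_better = any(a < b for a, b in zip(fa, fb))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point in the list dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```

For example, `(3, 3)` is dominated by `(2, 2)`, whereas `(1, 3)` and `(2, 2)` are mutually non-dominated, so only the latter two would survive the filter.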

which is the major challenging task in the MO optimization, is still not addressed in the MOABC algorithms.

3. Proposed algorithm

This section presents the basic idea of the two-engine cooperation mechanism and describes the MOBFA algorithm in detail.

3.1. Basic idea

This approach uses two types of search engines (i.e., colonies) that are evolved independently under different dominance relations, in order to handle convergence and diversity separately, as shown in Fig. 1. The indicator-based engine IE aims to guide the population to converge quickly to the true PF, while the Pareto-based engine PE strives to add more diversity to the population in the multi-dimensional objective space. In the proposed approach, one engine represents a set of colonies with the same evolution rules. For simplicity, we employ only one colony in each engine. Accordingly, IE and PE have different fitness-evaluation rules and archive-updating mechanisms. As shown in Fig. 1, the main components of MOBFA are the multi-archive management mechanism, comprehensive learning, reinforcement learning, and the basic foraging operators, which consist of sending employed bees, sending onlooker bees and sending scout bees. The following subsections discuss them in detail.

3.2. Multi-archive maintenance

The hybrid archive management scheme is shown in Fig. 2. In this scheme, two heterogeneous but cooperative archives (i.e., PA and IA) are updated independently based on different dominance relations, and are then merged into the global archive EA by a specific selection methodology. The detailed procedures are given below.

3.2.1. Archive for indicator-based engine

In order to boost convergence, the engine IE uses the quality indicator $I_{\varepsilon+}$ proposed in [13] as the fitness assignment and the selection principle for its archive.
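As a rough preview of how this indicator-based selection behaves, the sketch below computes the pairwise additive ε-indicator, the resulting fitness of Eq. (3), and the truncation loop of Algorithm 1. It is an illustration under stated assumptions rather than the authors' implementation: minimization is assumed, the denominator is fixed to 0.05 as in Algorithm 1, and the function names `eps_indicator`, `indicator_fitness` and `truncate_archive` are hypothetical.

```python
import math


def eps_indicator(f1, f2):
    """Additive epsilon indicator for two single objective vectors (minimization):
    the smallest eps such that f1[i] - eps <= f2[i] holds for every objective i."""
    return max(a - b for a, b in zip(f1, f2))


def indicator_fitness(objs, kappa=0.05):
    """Fit(x1) = sum over x2 != x1 of -exp(-I(x2, x1) / kappa)."""
    n = len(objs)
    return [sum(-math.exp(-eps_indicator(objs[j], objs[i]) / kappa)
                for j in range(n) if j != i)
            for i in range(n)]


def truncate_archive(objs, size, kappa=0.05):
    """Remove the lowest-fitness member until `size` members remain."""
    objs = list(objs)
    fit = indicator_fitness(objs, kappa)
    while len(objs) > size:
        worst = min(range(len(objs)), key=fit.__getitem__)
        removed = objs.pop(worst)
        fit.pop(worst)
        # Cancel the removed member's contribution in every survivor's fitness.
        fit = [f + math.exp(-eps_indicator(removed, o) / kappa)
               for f, o in zip(fit, objs)]
    return objs, fit
```

With the four points `(0, 1)`, `(1, 0)`, `(0.5, 0.5)` and `(0.9, 0.9)` and a target size of three, the dominated point `(0.9, 0.9)` receives by far the lowest fitness and is the one removed; the incrementally updated fitness values then agree with a full recomputation on the survivors.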
As described in [13], the binary additive ε-indicator is a performance metric that maps Pareto set approximations to a real number, and it can be used to calculate the quality of two Pareto set approximations relative to each other. Mathematically, this indicator can be defined as

$$
I_{\varepsilon+}(A, B) = \min_{\varepsilon} \{\, \forall x_2 \in B\ \exists x_1 \in A : f_i(x_1) - \varepsilon \le f_i(x_2) \ \text{for } i \in \{1, \ldots, m\} \,\}
\tag{2}
$$

where $x_1$, $x_2$ are solution vectors and m is the number of objectives. This indicator $I_{\varepsilon+}$ is an extension of the Pareto dominance relation: it denotes the minimum distance by which a Pareto set approximation needs to be translated in each dimension of the objective space such that another approximation is weakly dominated. The fitness assignment of the engine IE is then defined by $I_{\varepsilon+}$ as below:

$$
Fit_{I_{\varepsilon+}}(x_1) = \sum_{x_2 \in P \setminus \{x_1\}} -e^{-I_{\varepsilon+}(x_2, x_1)/(c \cdot s)}
\tag{3}
$$

where P is the population, $c = \max_{i,j \in P} |I_{\varepsilon+}(i, j)|$, and s is a zoom factor; $c \cdot s$ is usually set to 0.05. Essentially, $Fit_{I_{\varepsilon+}}(x)$ measures the loss in quality if x is deleted, and it can be used directly to determine the dominance relation in MO optimization.

In the process of updating IA, the newly obtained individuals in IE that are better than their corresponding parents in terms of indicator-based fitness are first added to IA. Then, IA deletes redundant individuals on the basis of quality loss ($I_{\varepsilon+}$) to prevent the quality of the archive from deteriorating. In each iteration, the individual with the smallest fitness is eliminated from IA, and the fitness values of the remaining individuals in IA are recalculated, as illustrated in Algorithm 1.

Algorithm 1. Maintaining archive IA.
1. Parameters: size_IA is the fixed size of IA.
While (|IA| > size_IA)
    2. $x^* = \arg\min_{x \in P_{IA}} Fit_{I_{\varepsilon+}}(x)$, where $P_{IA}$ is the population of archive IA.
    3. Remove $x^*$ from IA.
    4. Recalculate the fitness of the remaining individuals in IA: $Fit_{I_{\varepsilon+}}(x) = Fit_{I_{\varepsilon+}}(x) + e^{-I_{\varepsilon+}(x^*, x)/0.05}$ for each $x \in P_{IA} \setminus \{x^*\}$.
End While

Fig. 1. The basic flowchart of the proposed algorithm.

3.2.2. Archive for Pareto-based engine

The main task of the Pareto-based engine PE is to maintain an appropriate diversity over a set of non-dominated solutions. Accordingly, PE updates its archive PA based on Pareto dominance, so that only non-dominated solutions can be put into PA. When PA overflows, redundant individuals are pruned based on the improved crowding distance mechanism. The conventional crowding distance [3] is a one-time pruning strategy that measures the density of solutions in the external archive. Formally, it is based on Euclidean distance, as defined below:

$$
C_i = \sum_{j=1}^{m} |f_{i+1,j} - f_{i-1,j}|
\tag{4}
$$

where $C_i$ is the crowding distance of individual i, m is the number of objective functions, and $f_{i,j}$ is the jth objective function value of individual i. The detailed procedure can be found in [3]. However, as noted in [46], when many individuals are densely packed around a particular individual, this conventional method may delete significant particles in dense regions all at once, which damages the distribution of the Pareto front. Thus, an improved selection method, namely

the dynamical crowded distance estimation, is given in the procedure of PA maintenance (i.e., Algorithm 2).

Algorithm 2. Maintaining archive PA.
Parameters: size_PA is the fixed size of PA.
1. Set PA empty.
While (|PA| > size_PA)
    2. Initialize the crowding distance of the individuals in PA, i.e., $C_i = 0$.
    3. Sort the individuals by each objective function; the boundary individuals are pre-set to an infinite value to ensure their availability in the next selection process.
    4. Compute the crowding distance of the individuals in PA by Eq. (4).
    5. Determine the individual with the minimum crowding distance, called ID, then remove it.
    6. Re-compute the crowding distance of the individuals ID+1 and ID−1 by Eqs. (5) and (6):

$$
C_{ID+1} = \sum_{j=1}^{M} |f_{ID+2,j} - f_{ID-1,j}|
\tag{5}
$$

$$
C_{ID-1} = \sum_{j=1}^{M} |f_{ID+1,j} - f_{ID-2,j}|
\tag{6}
$$

End While

3.2.3. Global archive

As shown in Fig. 2, the newly obtained non-dominated solutions and the previous solutions in the archives PA and IA are merged and denoted as EA in each generation, to prevent the quality of the archive from deteriorating. As the final output, the archive EA is managed and maintained as shown in Algorithm 3. In this approach, EA explicitly integrates the competitive advantages of PA and IA by selecting the elites with the smallest quality loss. That is, the elites obtained by indicator-based selection from IA exert positive selection pressure to converge towards the true PF, and those derived from the Pareto-based PA improve the diversity of solutions. When the size of EA exceeds the pre-set size_EA, the individual x with the smallest quality loss identified by the indicator I is deleted, and the indicator-based fitness values of the remaining individuals in the archive are recalculated. The computational complexity of this fitness calculation is not high, because the fitness value of each individual can be obtained directly by deducting the indicator value incurred by x∗ (see Algorithm 3, line 6). If solutions from PA are deleted persistently, the distribution of the obtained non-dominated solution set is damaged, while excessive deletion of individuals from IA causes a lack of convergence.

Algorithm 3. Maintaining archive EA.
Parameters: size_EA is the fixed size of EA.
1. Select all individuals in IA and PA as A.
2. Recalculate the fitness of each member x of A based on the quality indicator: $Fit_{I_{\varepsilon+}}(x) = \sum_{x' \in A \setminus \{x\}} -e^{-I_{\varepsilon+}(x', x)/0.05}$.
3. Sort all individuals according to the fitness values in descending order.
While (|A| > size_EA)
    4. $x^* = \arg\min_{x \in A} Fit_{I_{\varepsilon+}}(x)$.
    5. Remove $x^*$ from A.
    6. Recalculate the fitness of the remaining individuals in A: $Fit_{I_{\varepsilon+}}(x) = Fit_{I_{\varepsilon+}}(x) + e^{-I_{\varepsilon+}(x^*, x)/0.05}$ for each $x \in A \setminus \{x^*\}$.
End While
7. EA = A. Return EA.

3.3. Information exchange

In the MOBFA algorithm, two colonies search in parallel in the MO decision space, and they exchange information at two evolutionary phases in each iteration, as shown in Fig. 2. Specifically, at each updating step of each individual, the individual from PE communicates with a member of IE through the comprehensive learning mechanism. That is, in the ABC algorithm, the bee $x_i$ in the PE uses one randomly selected neighbour $x_k$ in the IE to generate a new position. This approach allows a Pareto-based individual to incorporate the useful information of the IA to compensate for convergence. After the global archive EA is updated, the two best non-dominated solutions x∗ with the smallest quality loss replace the worst individuals in PA and IA. Notably, the worst individual in PA is the solution with maximum crowding-distance value, and the worst solution in IA is the one with the smallest fitness value. As a result, the population diversity is expected to be improved significantly.

Fig. 2. The schematic diagram of information exchange and archive management in MOBFA.

3.4. Reinforcement learning

This subsection presents the neighbor-discount-information (NDI) learning mechanism.

3.4.1. Q-learning algorithm

Reinforcement learning (RL) [47] derives from machine learning, and its main components are the agent, environment, states, actions, and rewards. The Q-learning algorithm [47] is incorporated to implement RL in the bee foraging; its main principle is to determine, step by step, an optimal strategy that maximizes the total discounted expected future reward. Let S = [s1, s2, …, sn] be a series of available states for each agent, A = [a1, a2, …, an] be a set of available actions, and $r_t$ be the immediate reward for implementing action $a_t$. The transition rule of Q-learning can then be defined as

$$
Q(s_t, a_t) = r_t + \gamma \max_{a \in A} Q(s_{t+1}, a)
\tag{7}
$$
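To see what the transition rule in Eq. (7) converges to, it can be applied repeatedly on a toy problem. The sketch below is only an illustration: the four-state chain, its two actions (left, right), and the reward of 1 for reaching the last state are assumptions made for this example and are not part of the paper.

```python
def q_iteration(n_states=4, gamma=0.9, sweeps=50):
    """Repeatedly apply Q(s, a) = r + gamma * max_a' Q(s', a') on a toy chain."""
    goal = n_states - 1
    # q[s][a] with actions 0 = move left, 1 = move right.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(sweeps):
        for s in range(goal):  # the goal state is terminal, so its Q values stay 0
            for a in (0, 1):
                s_next = max(s - 1, 0) if a == 0 else s + 1
                reward = 1.0 if s_next == goal else 0.0
                q[s][a] = reward + gamma * max(q[s_next])
    return q
```

After enough sweeps, the value of moving right decays geometrically with the distance to the goal (1, γ, γ², ...), so the greedy strategy in every non-terminal state is to move right, which is the "optimal strategy maximizing the discounted reward" that the text refers to.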

Each learning agent selects an action by observing the state vector and then enters the next state. In order to obtain the maximum reward at the next state, the main procedure of Q-learning is as follows:

Step 1: Observe the current state $s_t$.
Step 2: Select and implement an action $a_t$.
Step 3: Receive an immediate reward $r_t$.
Step 4: Adjust the Q value as

$$
Q_t(s_t, a_t) = (1 - \alpha_t) Q_{t-1}(s_t, a_t) + \alpha_t [r_t + \gamma \max_{a \in A} Q(s_{t+1}, a)]
\tag{8}
$$

where $\alpha_t$ is the learning rate that controls the learning speed, and the discount factor γ is set within [0,1] to ensure the convergence of the Q function. Watkins [47] has proven theoretically that Q-learning converges in a Markov environment if the condition on α is met, and the generalized proof is given in [47].

3.4.2. Neighbor-discount-information mechanism

In order to enhance the efficiency of information exchange and the emergent intelligence of the system, the neighbor-discount-information (NDI) learning mechanism is developed based on RL.

Definition 1. Given an individual $Ind_i$, the individuals that can communicate with $Ind_i$ directly are called its 1-interval neighbors, the individuals that can only communicate directly with the 1-interval neighbors are called its 2-interval neighbors, and so on.

Without loss of generality, the neighborhood interaction relationships in EA can be depicted as shown in Fig. 3.

Fig. 3. Interaction relationships of individuals in neighborhood.

It is desirable that an evolving individual can adopt the state of an elite neighbor as its next state. According to Q-learning, the main idea of NDI learning is to access the Q values of each individual's neighbors. That is, if a 1-interval neighbor y of the individual x attains the maximum Q value, then x moves to the state of y at the next evolutionary time step. Specifically, the individual at state $s_t$ receives a reward $r_{s_t}$ immediately and then moves to the state $s_{t+1}$ of one of its 1-interval neighbors according to an optimal strategy. Under this optimal strategy, in order to maximize the total discounted expected reward, the Q-learning function can be formulated as

$$
Q_t(s_t) = (1 - \alpha_t) Q_{t-1}(s_t) + \alpha_t [r_t + \gamma \max_{s_{1\text{-int}}} Q(s_{1\text{-int}})]
\tag{9}
$$

where 1-int represents one member of $R_1$, the set of 1-interval neighbors of this individual. Note that, because each individual has its own 1-interval neighbors, the computation imposed by Eq. (9) could recurse in an infinite loop; we therefore pre-set a threshold, the R-interval, to control this process. That is, when an R-interval neighbor is involved, its immediate reward is used instead of the discounted expected reward. Finally, the NDI learning mechanism is given in Algorithm 4.

Algorithm 4. The NDI learning.
1. Initialize all Q(s) objects randomly.
Repeat (for each round)
    2. Determine the initial state of each individual.
    Repeat (for each individual)
        3. Randomly select an initial state.
        4. i = 1.
        Repeat (each step)
            5. Get the immediate reward $r_t$, and observe the state $s_{1\text{-int}}$.
            6. Update the Q function by Eq. (9).
            7. s = $s_{1\text{-int}}$ and Ind = Ind($s_{1\text{-int}}$).
            8. i = i + 1.
        Until i = R + 1
    Until all individuals are handled.
Until all rounds are completed.

3.5. Bee foraging operators

Based on the two-engine cooperation mechanism and the NDI strategy, this subsection presents the detailed procedures of the MOBFA algorithm.
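Since the operators in this subsection build on the NDI strategy, a compact sketch of the update in Eq. (9) may help fix ideas. It is one possible reading of the R-interval cut-off, not the authors' code: the value of a neighbor is computed recursively, and once depth R is reached only the immediate reward is used. The adjacency-list representation and the names `neighbor_value` and `ndi_update` are assumptions introduced here.

```python
def neighbor_value(i, depth, rewards, neighbors, gamma, max_depth):
    """Discounted value of individual i when seen as a `depth`-interval neighbor."""
    if depth >= max_depth:
        # At the R-interval cut-off only the immediate reward is used.
        return rewards[i]
    best = max(neighbor_value(k, depth + 1, rewards, neighbors, gamma, max_depth)
               for k in neighbors[i])
    return rewards[i] + gamma * best


def ndi_update(q, rewards, neighbors, alpha=0.5, gamma=0.9, max_depth=2):
    """One synchronous sweep of Q(s) <- (1 - alpha) Q(s) + alpha [r + gamma * best neighbor]."""
    new_q = []
    for i, q_old in enumerate(q):
        best = max(neighbor_value(k, 1, rewards, neighbors, gamma, max_depth)
                   for k in neighbors[i])
        new_q.append((1 - alpha) * q_old + alpha * (rewards[i] + gamma * best))
    return new_q
```

As a usage example, four individuals on a ring can be described by `neighbors = [[1, 3], [0, 2], [1, 3], [2, 0]]`; with `max_depth=1` each individual only sees the immediate rewards of its two direct neighbors, while `max_depth=2` also lets the rewards of 2-interval neighbors flow in, discounted by γ.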

3.5.1. Population initialization

In this phase, given a food source $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, its position in each engine is initialized in the search space as follows:

$$
x_{id} = Lb_d + r (Ub_d - Lb_d), \quad d = 1, 2, \ldots, D
\tag{10}
$$

where r is a random coefficient uniformly distributed within (0,1), $Lb_d$ and $Ub_d$ are the lower and upper bounds of the dth variable respectively, and D is the dimension of the problem.

3.5.2. Sending employed bees

In this phase, the employed bee associated with each food source $x_i$ explores a new temporary food source around $x_i$ by generating a random positional change. In the original ABC, the foraging rule by which an employed bee explores a new food source is based on a randomly selected neighbor k. Specifically, given a food source $x_i$, its temporary position is computed as

$$
v_{i,j} = x_{i,j} + \varphi (x_{k,j} - x_{i,j})
\tag{11}
$$

where $v_i$ is the new position produced by individual i, k is the index of a randomly selected neighbor that is not equal to i, j is a randomly selected dimension, and φ is a random coefficient within the range [0, 1]. After a new temporary food source is produced, its fitness is compared with that of the old one using a greedy approach. The fitness evaluation measure is determined by the type of search engine (PE or IE), namely the Pareto-based fitness for PE and the indicator-based fitness for IE, as described in the sections above.

3.5.3. Sending onlooker bees

After finishing their exploration, the employed bees share the food information regarding nectar amount and position with the onlooker bees by dancing. Each onlooker bee then chooses a food source to exploit according to a selection probability based on the indicator-based fitness. The selection probability $prob_i$ for the ith individual is defined as

$$
prob_i = 1 - fit(x_i) \Big/ \sum_{j=1}^{N} fit(x_j)
\tag{12}
$$

where $fit(x_i)$ represents the indicator-based fitness value of $x_i$ and N is the population size. As this formula shows, a better food source (i.e., one with a lower fitness value) has a larger probability of being chosen by an onlooker bee. The mutation operation and fitness evaluation of the onlooker bee then follow those of the employed bee.

3.5.4. Sending scout bees

Once a food source is exhausted or cannot be improved within a limited number of cycles, the corresponding employed bee becomes a scout bee, and its food source is re-initialized in the same manner as in the initialization phase of the original ABC [37].
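The three formulas of this subsection, Eqs. (10) to (12), can be written down almost literally. The sketch below makes a few assumptions that the text leaves open: a fresh random coefficient r is drawn for every dimension in Eq. (10), Python's `random()` samples the half-open interval [0, 1), and the function names are illustrative. Greedy replacement and the engine-specific fitness are deliberately left out.

```python
import random


def init_source(lb, ub, rng):
    """Eq. (10): x_d = Lb_d + r * (Ub_d - Lb_d), with r drawn per dimension."""
    return [low + rng.random() * (high - low) for low, high in zip(lb, ub)]


def employed_move(pop, i, rng):
    """Eq. (11): perturb one random dimension j of x_i towards a random neighbor x_k."""
    k = rng.choice([idx for idx in range(len(pop)) if idx != i])
    j = rng.randrange(len(pop[i]))
    phi = rng.random()  # the text draws phi from [0, 1]
    candidate = list(pop[i])
    candidate[j] = pop[i][j] + phi * (pop[k][j] - pop[i][j])
    return candidate


def onlooker_probabilities(fitness):
    """Eq. (12): prob_i = 1 - fit_i / sum_j fit_j, so lower fitness gives a larger value."""
    total = sum(fitness)
    return [1.0 - f / total for f in fitness]
```

A typical call sequence would seed `rng = random.Random(0)`, create the population with `init_source`, and generate one candidate per food source with `employed_move`. Note that the values returned by `onlooker_probabilities` do not sum to one, so a roulette-wheel implementation would need to normalize them first.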


3.6. Computational complexity of MOBFA

Given an MO problem with M objectives and a population size of N, the computational complexity of each generation of MOBFA is analyzed by considering the main steps in one loop of the algorithm. Apart from the bee foraging operators (sending employed bees, sending onlooker bees and sending scout bees), the main computational cost comes from the multi-archive maintenance procedures, i.e., maintaining EA, IA and PA. Compared with EA, maintaining the sub-archives IA and PA is the more time-consuming step. As shown in Algorithm 1, maintaining IA consists of removing the individual with the smallest fitness and recalculating the fitness values of the remaining individuals in IA. The time complexity of removing the individual with the smallest fitness is O(N), and that of recalculating the fitness values of the remaining individuals is O(MN²). In Algorithm 2, the main computational cost is caused by computing the crowding distance of the individuals in PA; hence the time required for maintaining archive PA is O(MN²). In summary, the worst-case overall computational complexity of MOBFA within one loop is O(MN²).

4. Experimental study

This section presents the experimental configuration, gives the results, and provides discussion.

4.1. Test problems and performance metric

A set of MO benchmarks is used to assess the performance of the proposed algorithm. The first six bi-objective instances are composed of ZDT benchmarks [48] and CEC 2009 benchmarks [49]. The next tri-objective instances include three DTLZ instances [50] and three CEC 2009 benchmarks. Due to the page limitation, the mathematical formulas of these instances are not provided in detail; they can be found in [48,49,50].

In order to quantitatively evaluate the MO performance of the proposed algorithm, two performance metrics are adopted: 1) the convergence metric IGD [49]; 2) the spread metric Δ [3]. Further information regarding these two performance metrics can be found in [49,3].

The Wilcoxon test is applied to the obtained results with a significance level of α = 0.05 [51-53]. In the test, a two-sample test value satisfying z > 1.640 indicates that the proposed algorithm is significantly superior to its counterpart. Conversely, z < −1.640 implies that it is significantly worse. A detailed description of the Wilcoxon test can be found in [51].

4.2. Experimental setup

MOBFA is evaluated against NSGAII [3], MOEA/D [11], SPEA2 [4], MOGWO [17], MODA [18], and MOALO [19]. All the test algorithms have two common parameters: the maximum number of function evaluations FEs and the population size N. Here FEs is 50,000. For the CEC 2009 instances (U1, U2, U8, U9 and U10), N is set to 120; for the ZDT and DTLZ instances, N is 100. The other parameters of MOEA/D, SPEA2, NSGAII, MOGWO, MODA and MOALO remain the same as in their original references [3,11,4,17,18,19]. For MOBFA, the empirical parameter values are as follows: the population size of both PE and IE is N/2; size_PA, size_IA, and size_EA are equal to N/2; R = 2; and Limit = 200. All algorithms are run 20 times on the test problems, and the related statistical results are provided in Tables 1–7.

Note that our experiments mainly focus on an empirical parameter configuration, while the best parameter configuration will be investigated in future work.

4.3. Benchmark results and analysis

4.3.1. Comparison with classical algorithms

7

Tables 1 and 2 report the IGD and Δ results obtained by MOBFA, NSGAII, MOEA/D and SPEA2 on the bi-objective benchmarks, together with the statistical difference (Z-value) between MOBFA and each counterpart. For illustration, Fig. 4 shows both the true PF and the final non-dominated solutions with the best IGD values from MOBFA, NSGAII, SPEA2 and MOEA/D. Fig. 5 illustrates the evolution of the IGD values of one random run against the number of function evaluations for the involved algorithms on these test instances.

Table 1 shows that MOBFA is superior to its counterparts on ZDT1, ZDT3, U1 and U2 in terms of the mean, best, and SD of the IGD results, which is also supported by Fig. 4. This indicates that MOBFA closely approximates the true Pareto-optimal solutions on the PF, whereas the compared algorithms do not perform as well on these test problems. On ZDT1, all four algorithms perform well except MOEA/D. On ZDT2, the best IGD value is obtained by MOEA/D; MOBFA performs a little worse, although the differences to MOEA/D and SPEA2 are not statistically significant (|z| < 1.640 in Table 1), and it still outperforms NSGAII. On ZDT6, NSGAII and SPEA2 are easily trapped in different local Pareto-optimal sets, and MOBFA also suffers from this problem. This is mainly because the PF of ZDT6 is sparse and unevenly distributed.

Fig. 4 shows that the contour profile of the non-dominated solutions obtained by MOBFA is closer to the true PF than those obtained by its counterparts, which is supported by the Δ results in Table 2. As mentioned in [6], the Δ metric measures the extent of spread among the final solutions. From Table 2, we can see that MOBFA exhibits more competitive results than the other algorithms on ZDT1, ZDT3, U1 and U2. Fig. 5 also shows the good IGD convergence of MOBFA on ZDT1, ZDT2, ZDT6, DTLZ2, DTLZ6, and U10. This performance improvement of MOBFA can be ascribed to the multi-deme cooperative mechanism, which boosts both the convergence and the diversity of solutions.
Tables 3 and 4 give the results of the tested algorithms on the tri-objective benchmarks DTLZ1, DTLZ2, DTLZ6, U8, U9 and U10. Fig. 6 plots the final non-dominated solutions with the best IGD values obtained by the algorithms. Fig. 7 illustrates the evolution of the IGD values of each algorithm during the search process. From Table 3, it can be seen that MOBFA gets the best average IGD ranking on DTLZ1, DTLZ6, U9 and U10, which is also confirmed by the Wilcoxon test. Generally, these tri-objective instances are more difficult for the algorithms to handle than the bi-objective problems, yet MOBFA still performs stably and maintains satisfactory approximation and uniformity. For the spread metric, Table 4 shows that MOBFA performs satisfactorily on most of the test functions except DTLZ2. From Fig. 6, we can observe that the best PFs obtained by MOBFA are comparable to those of NSGAII and MOEA/D on DTLZ2 and U8 in terms of both convergence and diversity. However, MOBFA does not obtain as well converged and evenly distributed a PF on U8, so its performance there is slightly worse than that of NSGAII. On U10, NSGAII and MOBFA perform a little better than MOEA/D and SPEA2 in uniformity. On the more complex U8, U9 and U10 with discontinuous Pareto fronts, MOBFA shows a significantly better approximation than SPEA2 and MOEA/D. Fig. 7 also demonstrates that the IGD convergence of MOBFA is superior to that of the other algorithms. These results support the hybrid quality evaluation rule, which combines the indicator measure with Pareto dominance, and the RL method.
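The Z-values reported in Tables 1–4 can be related to the test described in Section 4.1 by the following sketch. It assumes the normal approximation of the Wilcoxon rank-sum statistic and a sign convention in which a positive z means that the counterpart's metric values tend to be larger (i.e., worse for IGD and Δ) than those of MOBFA. The exact variant and tie handling used in the experiments are not stated in this section, so the function name rank_sum_z and the sample data are illustrative assumptions only.

```python
import math


def rank_sum_z(counterpart, proposed):
    """Normal-approximation z of the Wilcoxon rank-sum statistic.

    Positive z means the counterpart's values tend to be larger, i.e. the
    proposed algorithm is better when smaller metric values are better.
    Ties receive average ranks; no tie correction is applied to the variance
    (a simplifying assumption of this sketch).
    """
    counterpart = list(counterpart)
    proposed = list(proposed)
    pooled = sorted(counterpart + proposed)

    # Average 1-based rank for every distinct value in the pooled sample.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        i = j

    n1, n2 = len(counterpart), len(proposed)
    w = sum(rank_of[v] for v in counterpart)          # rank sum of counterpart
    mean = n1 * (n1 + n2 + 1) / 2.0                   # expected rank sum
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)   # its standard deviation
    return (w - mean) / std


# Hypothetical IGD samples (not taken from the paper).
z = rank_sum_z(counterpart=[4.0, 5.0, 6.0], proposed=[1.0, 2.0, 3.0])
print(z > 1.640)
```

Under this convention, z > 1.640 corresponds to the "significantly superior" case of Section 4.1 and z < −1.640 to the "significantly worse" case.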

Please cite this article as: L. Ma et al., Cooperative two-engine multi-objective bee foraging algorithm with reinforcement learning, Knowledge-Based Systems (2017), http://dx.doi.org/10.1016/j.knosys.2017.07.024


Fig. 4. Non-dominated fronts with lowest IGD values obtained by each algorithm on bi-objective test instances.


Table 1 IGD results on 30-D ZDT1, 30-D ZDT2, 30-D ZDT3, 10-D ZDT6, 30-D U1 and 30-D U2 by all algorithms.

Algorithm  Statistic  ZDT1         ZDT2         ZDT3         ZDT6         U1           U2
MOBFA      Mean       6.4802E−04   6.9623E−03   7.9488E−03   3.9832E−03   1.8532E−02   9.7232E−03
MOBFA      Z-value    –            –            –            –            –            –
MOBFA      Best       2.1326E−04   9.8693E−04   1.2822E−03   7.8023E−04   5.9523E−03   6.1711E−03
MOBFA      Std        1.3032E−03   2.2573E−03   2.8464E−03   1.9354E−03   1.3764E−02   2.8664E−02
NSGAII     Mean       5.5822E−03   5.9912E−01   5.1777E−01   5.4422E−01   7.1777E−01   7.0733E−01
NSGAII     Z-value    1.1953E+01   1.6852E+01   6.6532E+01   1.6353E+01   3.5923E+00   2.4263E+00
NSGAII     Best       7.6042E−04   7.7743E−02   7.7255E−02   7.6665E−02   1.2911E−01   7.9323E−02
NSGAII     Std        4.2353E−02   3.8021E−01   1.9912E−01   5.7023E−01   5.5653E−01   4.7722E−01
MOEA/D     Mean       8.9412E−02   3.6494E−03   9.8367E−03   1.7132E−03   2.9622E−01   1.1743E+00
MOEA/D     Z-value    3.2942E+00   −1.3093E+00  2.4333E+01   −1.4773E+00  2.3175E+00   6.6822E+00
MOEA/D     Best       1.0922E−02   9.1940E−04   5.6012E−03   3.7715E−04   6.1132E−02   5.7733E−01
MOEA/D     Std        4.7977E−02   5.8154E−02   4.8474E−04   2.9543E−03   2.3178E−01   5.5088E−01
SPEA2      Mean       3.4765E−03   3.7411E−03   9.2744E−03   2.4912E−03   9.0283E−02   4.16433E−02
SPEA2      Z-value    2.8433E+00   −1.4653E+00  9.1178E+00   −1.4233E+00  1.8421E+00   1.5922E+00
SPEA2      Best       1.1076E−03   9.3322E−04   1.40212E−03  4.7275E−04   1.0612E−02   7.0864E−03
SPEA2      Std        5.6832E−03   5.4944E−02   4.8454E−03   2.9522E−03   5.1743E−02   3.3876E−02
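For readers who wish to reproduce IGD values such as those in Table 1, the metric can be sketched as follows. This is a minimal illustration, assuming the common definition of IGD as the mean Euclidean distance from each sampled point of the true PF to its nearest obtained solution; the function name igd and the toy fronts are hypothetical and are not taken from the paper.

```python
import math


def igd(reference_front, obtained_front):
    """Mean distance from each reference point to its nearest obtained point."""
    total = 0.0
    for ref in reference_front:
        total += min(math.dist(ref, sol) for sol in obtained_front)
    return total / len(reference_front)


# Toy bi-objective fronts (hypothetical data).
true_pf = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
exact = list(true_pf)                              # perfect approximation
shifted = [(f1 + 0.1, f2) for f1, f2 in true_pf]   # every point moved by 0.1 in f1

print(igd(true_pf, exact))
print(igd(true_pf, shifted))
```

A smaller IGD means that the obtained solutions lie closer to, and cover more of, the sampled true PF; an obtained set that contains every reference point gives an IGD of zero.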

Table 2 Spread (Δ) results on 30-D ZDT1, 30-D ZDT2, 30-D ZDT3, 10-D ZDT6, 30-D U1 and 30-D U2 by all algorithms.

Algorithm  Statistic  ZDT1         ZDT2         ZDT3         ZDT6         U1           U2
MOBFA      Mean       7.5165E−04   7.1727E−04   4.0954E−03   8.2943E−05   6.1743E−04   1.0545E−03
MOBFA      Z-value    –            –            –            –            –            –
MOBFA      Best       1.29322E−04  2.4040E−04   5.4432E−04   1.1633E−05   9.1223E−05   6.9244E−04
MOBFA      Std        1.4811E−04   1.5154E−04   2.5764E−03   2.7462E−04   5.3211E−04   6.1823E−03
NSGAII     Mean       1.7965E−03   6.7384E−02   5.4624E−03   3.0167E−03   3.7045E+00   3.6245E+00
NSGAII     Z-value    5.5823E+00   2.0089E+00   1.8873E+00   3.6777E+00   7.0124E+01   7.4154E+00
NSGAII     Best       5.7032E−04   8.4691E−03   6.3721E−04   6.4646E−04   8.4874E−01   6.0821E−01
NSGAII     Std        8.0623E−04   5.8772E−02   5.1665E−03   5.6371E−02   2.5566E+00   6.1879E+00
MOEA/D     Mean       1.6542E−01   1.7006E−01   1.5937E+00   1.6589E+00   1.4616E+00   1.3523E+00
MOEA/D     Z-value    5.9364E+01   6.2146E+01   7.7577E+01   3.6232E+01   4.2116E+01   6.3953E+00
MOEA/D     Best       7.2123E−02   6.7118E−02   7.4523E−01   6.5323E−01   9.5515E−01   5.3367E−01
MOEA/D     Std        2.3264E−01   4.7554E−01   1.0133E+00   1.9154E−01   4.7721E+00   6.2985E+00
SPEA2      Mean       7.5722E−01   7.7162E−01   3.9276E−01   6.8623E−01   4.1118E−01   2.3408E−01
SPEA2      Z-value    8.0366E+01   8.1157E+01   1.2411E+01   8.4484E+00   6.1478E+01   2.0753E+00
SPEA2      Best       2.2075E−01   9.8490E−02   6.5264E−02   9.9042E−02   7.1833E−02   7.2223E−02
SPEA2      Std        2.3243E−01   9.5182E−01   6.6123E−01   1.3423E−01   5.2721E−01   8.1856E−01
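As a reading aid for the spread values in Table 2, the bi-objective spread metric of [3] can be sketched as Δ = (d_f + d_l + Σ|d_i − d̄|) / (d_f + d_l + (N − 1) d̄), where d_i are the distances between consecutive solutions, d̄ is their mean, and d_f and d_l are the distances from the extreme points of the true PF to the boundary solutions. The code below is a minimal sketch under the assumption that this is the variant used for the bi-objective instances; the function name spread_delta and the toy fronts are hypothetical.

```python
import math


def spread_delta(front, extreme_first, extreme_last):
    """Bi-objective spread metric; smaller values mean a more uniform spread."""
    pts = sorted(front)  # order the solutions along the first objective
    gaps = [math.dist(a, b) for a, b in zip(pts, pts[1:])]
    mean_gap = sum(gaps) / len(gaps)
    d_f = math.dist(extreme_first, pts[0])   # gap to the first true extreme
    d_l = math.dist(extreme_last, pts[-1])   # gap to the last true extreme
    numerator = d_f + d_l + sum(abs(g - mean_gap) for g in gaps)
    denominator = d_f + d_l + len(gaps) * mean_gap
    return numerator / denominator


# Toy fronts on the line f1 + f2 = 1 (hypothetical data).
even = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
uneven = [(0.0, 1.0), (0.1, 0.9), (1.0, 0.0)]
print(spread_delta(even, (0.0, 1.0), (1.0, 0.0)))
print(spread_delta(uneven, (0.0, 1.0), (1.0, 0.0)))
```

An evenly spaced front that reaches both extremes yields a value of zero, and clustering of the solutions increases the value.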

Fig. 5. Convergence performance comparisons of all algorithms on bi-objective benchmarks.


Fig. 6. Non-dominated fronts with lowest IGD values obtained by each algorithm on tri-objective test instances.


Table 3 IGD results on 7-D DTLZ1, 12-D DTLZ2, 22-D DTLZ6, 30-D U8, 30-D U9 and 30-D U10 by all algorithms.

Algorithm  Statistic  DTLZ1        DTLZ2        DTLZ6        U8           U9           U10
MOBFA      Mean       5.5842E−03   3.4284E−02   5.9736E−03   2.1543E−03   3.6376E−03   3.7243E−03
MOBFA      Z-value    –            –            –            –            –            –
MOBFA      Best       7.4912E−04   7.8824E−03   7.7442E−04   7.0078E−04   7.9488E−04   9.1851E−04
MOBFA      Std        2.6918E−03   4.6695E−02   3.4193E−04   3.3565E−03   2.4623E−03   3.8523E−03
NSGAII     Mean       5.5567E−01   6.4033E−02   8.3915E−02   9.5123E−02   6.7689E−01   3.1712E−03
NSGAII     Z-value    3.1323E+00   1.6498E+00   5.3941E+00   7.9422E+00   7.7075E+00   −1.3767E+00
NSGAII     Best       8.7733E−02   8.5318E−03   9.8786E−03   4.2815E−02   9.8595E−02   7.8899E−04
NSGAII     Std        3.7367E−01   3.2810E−02   3.5557E−02   3.3556E−01   7.2964E−01   2.0142E−03
MOEA/D     Mean       1.2612E−01   8.2314E−02   1.0478E−01   3.7773E−01   6.4823E−01   5.7183E−01
MOEA/D     Z-value    2.8243E+00   1.5864E+00   7.2846E+00   8.3322E+00   6.1554E+00   7.2478E+00
MOEA/D     Best       4.0756E−02   1.1618E−02   5.2705E−02   8.0634E−02   8.5322E−02   6.6187E−02
MOEA/D     Std        3.6489E−01   6.1159E−02   6.3113E−01   7.5384E−01   4.3713E−01   4.4023E−01
SPEA2      Mean       1.7765E−02   2.9506E−02   7.6949E−01   2.1223E−03   7.1423E−01   1.8047E−02
SPEA2      Z-value    1.8044E+00   −1.2523E+00  8.0809E+00   −1.1733E+00  8.8222E+00   2.5422E+00
SPEA2      Best       3.6772E−03   7.6733E−03   9.5719E−02   6.72212E−04  1.0933E−01   6.2383E−03
SPEA2      Std        5.6567E−03   5.9918E−02   6.2484E−01   4.7554E−03   2.2265E−01   3.1144E−03

Table 4 Spread (Δ) results on 7-D DTLZ1, 12-D DTLZ2, 22-D DTLZ6, 30-D U8, 30-D U9 and 30-D U10 by all algorithms.

Algorithm  Statistic  DTLZ1        DTLZ2        DTLZ6        U8           U9           U10
MOBFA      Mean       1.8056E−03   6.5253E−01   1.7732E−03   2.9232E−01   2.9889E−01   2.5634E−01
MOBFA      Z-value    –            –            –            –            –            –
MOBFA      Best       7.0223E−04   8.2519E−02   5.1822E−04   7.9612E−02   7.2044E−02   7.6812E−02
MOBFA      Std        4.8876E−03   3.2944E−01   3.6156E−03   5.0656E−01   2.4495E−01   2.7163E−01
NSGAII     Mean       1.0532E+00   2.5442E−03   7.8122E−02   1.5897E−02   5.9961E−01   5.2712E−01
NSGAII     Z-value    9.0514E+00   −5.9451E+00  3.1076E+00   −1.5956E+00  1.7462E+00   1.6842E+00
NSGAII     Best       6.6043E−01   7.7390E−04   1.6030E−02   7.0078E−03   9.3986E−02   8.6633E−02
NSGAII     Std        5.8776E−01   6.1067E−03   2.8612E−02   3.8103E−02   5.5437E−01   5.4178E−01
MOEA/D     Mean       1.9402E+00   1.5902E−02   3.0799E−01   1.2123E+00   3.1909E−01   2.0562E+00
MOEA/D     Z-value    1.0721E+01   −1.7033E+00  5.3334E+00   9.2490E+00   1.6046E+00   3.5811E+00
MOEA/D     Best       6.8539E−01   7.4120E−03   6.6317E−02   8.1623E−01   8.3982E−02   7.8747E−01
MOEA/D     Std        1.9469E+00   8.0249E−02   6.7256E−02   7.1615E−01   2.0812E−01   2.1477E+00
SPEA2      Mean       4.3286E−01   3.1982E−01   2.2087E−01   6.6439E−01   6.3905E−01   5.5328E−01
SPEA2      Z-value    3.1527E+00   1.5444E+00   8.1712E+00   1.6535E+00   1.6114E+00   2.2932E+00
SPEA2      Best       6.1759E−01   9.2339E−02   6.2343E−02   6.2746E−01   5.5843E−01   7.8788E−02
SPEA2      Std        2.9159E−01   7.2718E−01   7.2240E−01   3.3344E−01   2.3557E−01   2.0472E−01

Fig. 7. Convergence performance comparisons of all algorithms on tri-objective benchmarks.


Table 5 IGD results on U1, U2, U3, U8, U9 and U10 by MOBFA and MOGWO.

Algorithm  Statistic  U1          U2          U3           U8            U9           U10
MOBFA      Mean       1.86E−02    9.75E−03    2.68E−01     2.112         1.910E−01    4.363
MOBFA      Z-value    –           –           –            –             –            –
MOBFA      Best       5.94E−03    6.18E−03    9.56E−02     1.181         7.352E−01    3.634
MOBFA      Std        1.35E−02    2.88E−02    2.74E−01     7.92E−01      8.341E−02    6.752E−01
MOGWO      Mean       1.14E−01    5.82E−02    2.556E−01    2.0577        1.917E−01    3.594
MOGWO      Z-value    2.1800      1.7093      7.5320E−01   −4.0511E−01   2.4734E−01   −2.6243E−01
MOGWO      Best       8.02E−02    4.98E−02    1.295E−01    1.1455        9.25E−02     3.488
MOGWO      Std        1.95E−02    7.39E−03    8.070E−02    4.611E−01     1.291E−01    1.0431

Table 6 IGD results on ZDT1, ZDT2, ZDT3, ZDT1 with linear front and tri-objective ZDT2 by MOBFA and MODA.

Algorithm  Statistic  ZDT1        ZDT2          ZDT3        ZDT1 with linear front  Tri-objective ZDT2
MOBFA      Mean       6.50E−04    6.96E−03      7.95E−03    6.782E−04               9.161E−03
MOBFA      Z-value    –           –             –           –                       –
MOBFA      Best       2.22E−04    9.90E−04      1.30E−03    3.482E−04               4.792E−03
MOBFA      Std        1.41E−03    2.26E−03      2.90E−03    5.322E−03               2.535E−02
MODA       Mean       6.12E−03    3.98E−03      2.79E−02    6.160E−03               9.160E−03
MODA       Z-value    2.1004      8.7403E−01    1.8234      1.9833                  −1.9834E−03
MODA       Best       2.40E−03    2.30E−03      2.00E−02    2.200E−03               4.800E−03
MODA       Std        2.86E−03    1.60E−03      4.02E−02    5.186E−03               5.372E−03

Table 7 IGD results on ZDT1, ZDT2, ZDT3, ZDT1 with linear front and tri-objective ZDT2 by MOBFA and MOALO.

Algorithm  Statistic  ZDT1        ZDT2        ZDT3        ZDT1 with linear front  Tri-objective ZDT2
MOBFA      Mean       6.50E−04    6.96E−03    7.95E−03    6.78E−04                9.16E−03
MOBFA      Z-value    –           –           –           –                       –
MOBFA      Best       2.22E−04    9.90E−04    1.30E−03    3.48E−04                4.79E−03
MOBFA      Std        1.41E−03    2.26E−03    2.90E−03    5.32E−03                2.54E−02
MOALO      Mean       1.52E−02    1.75E−02    3.03E−02    1.982E−02               2.629E−02
MOALO      Z-value    8.5507      2.6686      2.8883      7.6451                  2.9103
MOALO      Best       6.10E−03    5.00E−03    3.03E−02    1.06E−02                1.91E−02
MOALO      Std        5.02E−03    1.09E−02    9.69E−04    7.545E−03               4.451E−03

4.3.2. Comparison with recent algorithms

In this section, MOBFA is compared with MOGWO, MODA and MOALO on a set of bi-objective and tri-objective instances. MOGWO, MODA and MOALO are three recent bio-inspired MO algorithms proposed by Mirjalili et al. [17,18,19]. Accordingly, their experimental results are taken directly from the original publications [17,18,19]. The Wilcoxon test is used to identify significant differences between the results of the 20 runs of MOBFA and those of each compared algorithm. Note that the mean value reported for each compared algorithm in [17,18,19] is taken as its result in every run of the Wilcoxon test.

Table 5 shows the results of MOBFA and MOGWO on bi-objective U1, U2 and U3 and on tri-objective U8, U9 and U10. From Table 5, it can be observed that MOBFA and MOGWO obtain similar results on half of the test functions, namely U3, U8, and U9. On U1, MOBFA outperforms its counterpart in the mean, best and standard deviation of the IGD results. On U2, MOBFA obtains the best mean and best values, while MOGWO achieves the smallest standard deviation. On U10, MOBFA lags somewhat in the mean and best values. However, it still obtains the smallest standard deviation, which indicates stable performance.

Table 6 reports the results of MOBFA and compares them with the published IGD results of MODA on bi-objective ZDT1, bi-objective ZDT2, bi-objective ZDT3, bi-objective ZDT1 with linear front, and tri-objective ZDT2. Note that the bi-objective ZDT1 with linear front and the tri-objective ZDT2 are modified versions of ZDT1 and ZDT2, respectively; their detailed formulations can be found in [18]. From Table 6, we can see that MOBFA tends to outperform MODA on ZDT1 and ZDT3 and performs worse than its counterpart on ZDT2. On tri-objective ZDT2, MOBFA performs very close to MODA in terms of the mean, best and standard deviation values. On ZDT1 with linear front, MOBFA obtains the smallest mean and best values, while MODA obtains the smallest standard deviation. This means that MOBFA has the advantage in solution precision, whereas MODA has the advantage in solution stability.

Table 7 shows the comparative results of MOBFA and MOALO on bi-objective ZDT1, bi-objective ZDT2, bi-objective ZDT3, bi-objective ZDT1 with linear front, and tri-objective ZDT2. From Table 7, we can clearly observe that MOBFA obtains the smallest mean and best values on most of the test functions, which reflects its search efficiency. Note that MOALO obtains the smallest standard deviation on most of the test functions, which indicates stable performance on complex problems.

4.3.3. Discussion

From the experimental results in Sections 4.3.1 and 4.3.2, it can be observed that on the bi-objective test functions, MOBFA, MOEA/D, MOGWO, MODA and SPEA2 obtain satisfactory solutions within the limited number of function evaluations, and they perform similarly on these instances. Although MOBFA performs slightly worse than its counterparts on ZDT2 and ZDT6, it does much better than the other algorithms on the more challenging tri-objective test functions. These tri-objective functions are concave or linear multimodal problems, with different properties in different areas. Pareto-based MOEAs such as NSGAII and SPEA2 are relatively ineffective on them because similarity in a higher-dimensional objective space is more difficult to estimate on the basis of Pareto optimality. That is, compared with the bi-objective instances, the density of non-dominated solutions in the tri-objective search space tends to reach saturation, which makes it more difficult to identify the differences between solutions. MOEA/D uses aggregation functions to decompose a multi-objective problem into a series of single-objective optimization problems. However, it is more difficult to assign its weight vectors in the tri-objective space than in the bi-objective space. Therefore, NSGAII, SPEA2 and MOEA/D obtain better solutions on the bi-objective instances but perform worse on the tri-objective instances.

The advantage of MOBFA comes mainly from the two-engine cooperation mechanism. The two heterogeneous engines aim to boost convergence and diversity separately. That is, the indicator-based engine encourages good convergence, and the Pareto-based engine promotes good diversity. In addition, MOBFA makes use of the neighbor information offered by reinforcement learning to guide the flight trajectories of the foraging bees towards the optimum.

4.4. Real-world application

In this section, a real-world problem, copper strip burdening optimization (CSBO), is investigated to demonstrate the performance of the proposed method. CSBO is an important multi-objective task in the non-ferrous metals industry. It aims to find the optimal formula of raw materials that minimizes or maximizes the selected objective functions while satisfying various physical and operational constraints imposed by the limitations of the copper smelting process. It can be formulated as follows.

4.4.1. Problem formulation

Suppose that the weights of the raw materials in the burdening process are defined as a vector W, including the new copper (i.e., raw materials) W_new, the remaining copper of the same grade W_resg, the remaining copper of a different grade W_redg, and the chemical waste W_cw. Apart from the new copper W_new, the remaining components of W are regarded as the old raw materials, W_old = W_resg ∪ W_redg ∪ W_cw. The charging time (or feeding sequence) is defined as T, and the melting time is defined as t_m. The decision variables are then given as W = (w_1, w_2, ..., w_i, ..., w_d), where w_i is the charging weight of the ith raw material and d is the number of raw materials, and T = (t_1, t_2, ..., t_i, ..., t_d), where t_i is the charging time of the ith raw material.

Table 8 Representation of the vector x for each individual.

Quantity  Weight of raw materials (W)  Weight of original fused mass (w_Q)  Charging time (T)                      Melting time (t_m)
Variable  w_1   w_2   …   w_d          w_Q                                  t_1       t_2       …   t_d            t_m
Position  x_1   x_2   …   x_d          x_{d+1}                              x_{d+2}   x_{d+3}   …   x_{2d+1}       x_{2d+2}

Table 9 Component content of various elements in Bc_4 (unit: %). Columns 1 to 6 are the copper materials.

Element   1 (W_new)  2 (W_new)  3 (W_new)  4 (W_new)  5 (W_resg)  6 (W_cw)  λ_j      u_j      l_j
e1        97.3312    13.1546    92.5237    0.00205    93.3672     94.0153   96.1023  97.5323  97.8934
e2        91.5712    0          0          0.00415    3.6224      1.5094    3.3315   2.2863   1.2934
e3        0          1.2827     25.1783    0          0.1526      1.0019    0.0336   0.04136  0.02012
e4        1.4055     0          2.23       95.34      0.03389     0         0.0838   0.1192   0.0803
Impurity  0.0920     0.312      0.442      0.0932     0.03712     4.0442    0.0421   0.03984  0

For the constraints on the copper strip elements, we define the set of copper strip elements, the set of main elements, and the set of impurities as E, E_1, and E_2, respectively, with E = E_1 ∪ E_2. As depicted in [54], the CSBO problem mainly considers two objectives: to minimize f_1, which denotes the total cost of raw materials, and to maximize f_2, which represents the amount of waste and old copper materials charged into the melting furnace. The CSBO problem can then be formulated mathematically as:

 





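\[
\begin{aligned}
&\min \left\{ f_1(W, T, t_m),\; -f_2(W_{old}) \right\} \\
&f_1(W, T, t_m) = F_1(W) + F_2(W, T, t_m) = \sum_{i=1}^{d} c_i w_i \left[ 1 + \eta_i(t_m - t_i) \right] + C\, w_Q\, \eta_m(t_m) \\
&f_2(W_{old}) = f_2(w_{k+1}, w_{k+2}, \cdots, w_n) = \sum_{i=k+1}^{m} \alpha_i w_i + \sum_{i=m+1}^{p} \beta_i w_i + \sum_{i=p+1}^{d} \delta_i w_i \\
&\text{s.t.}\quad
\begin{cases}
L_i \le w_i \le U_i^{m} \\
0 < T_{li} \le t_i \le T_{ui} \\
T_{lm} \le t_m \le T_{um} \\
l_j \le \sum_{i=1}^{d} \lambda_{ij} w_i + \lambda_j w_Q \le u_j, & j \in E_1 \\
0 \le \sum_{i=1}^{d} \lambda_{ij} w_i + \lambda_j w_Q \le u_j, & j \in E_2 \\
\sum_{i=1}^{d} w_i + w_Q = G
\end{cases}
\end{aligned}
\tag{13}
\]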

where c_i is the cost coefficient of the ith raw material, t_m is the melting time, C is the cost of the original fused mass, and w_Q is the weight of the original fused mass. η_i(t_m − t_i) = g_i ln(h_i t_i + 1) is the burning loss function of the ith raw material, and η_m(t_m) = g_m ln(h_m t_m + 1) is the burning loss function of the original fused mass. α, β, and δ denote the penalty factors of the remainder of the same grade, the remainder of a different grade, and the chemical waste, respectively. u_j and l_j are the upper and lower bounds of the jth element, respectively. λ_j denotes the proportion of the jth element in the original fused mass, G is the total weight of copper materials, D_i is the safety stock of the ith copper material, and U_i^m = min{U_i, D_i}. U_i and L_i are the upper and lower bounds of w_i, respectively. T_ui and T_li are the upper and lower bounds of the charging time of the ith copper material, respectively, and T_um and T_lm denote the upper and lower bounds of the melting time, respectively.
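To make the encoding of Table 8 and the two objectives of Eq. (13) concrete, the following minimal Python sketch decodes an individual and evaluates f_1 and f_2. It is an illustration under stated assumptions rather than the authors' implementation: the burning loss is taken literally as g ln(h t + 1), the penalty factors α, β and δ are passed as one per-material list with zeros for the new copper, the constraints of Eq. (13) are not checked, and all names and numerical values in the example are hypothetical.

```python
import math


def decode(x, d):
    """Split a flat individual into (W, w_Q, T, t_m) following Table 8."""
    if len(x) != 2 * d + 2:
        raise ValueError("an individual must have 2d + 2 components")
    return x[:d], x[d], x[d + 1:2 * d + 1], x[2 * d + 1]


def burn_loss(g, h, t):
    """Burning loss g * ln(h * t + 1), as written below Eq. (13)."""
    return g * math.log(h * t + 1.0)


def objectives(x, d, c, g, h, fused_cost, g_m, h_m, factor):
    """Return (f1, f2); f1 is to be minimized and f2 maximized.

    factor[i] holds the penalty factor of material i (alpha, beta or delta),
    and 0 for new copper, which does not contribute to f2.
    """
    weights, w_q, times, t_m = decode(x, d)
    f1 = sum(c[i] * weights[i] * (1.0 + burn_loss(g[i], h[i], times[i]))
             for i in range(d))
    f1 += fused_cost * w_q * burn_loss(g_m, h_m, t_m)
    f2 = sum(factor[i] * weights[i] for i in range(d))
    return f1, f2


# Hypothetical two-material instance: x = (w_1, w_2, w_Q, t_1, t_2, t_m).
x = [1.0, 2.0, 3.0, 10.0, 20.0, 0.0]
f1, f2 = objectives(x, d=2, c=[100.0, 50.0], g=[0.0, 0.0], h=[0.02, 0.02],
                    fused_cost=1000.0, g_m=0.1, h_m=0.035, factor=[0.0, 6.0])
print(f1, f2)
```

With d = 6, as in the Bc_4 instance, the same layout gives a 14-dimensional individual in which x_7 is w_Q, x_8 to x_13 are the charging times, and x_14 is the melting time.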


Table 10 Element parameter limits for Bc_4 (unit: %). Columns 1 to 6 are the copper materials.

Parameter    1 (W_new)  2 (W_new)  3 (W_new)  4 (W_new)  5 (W_resg)  6 (W_cw)  w_Q
L_i/t        3.020      0.160      0.0320     0.125      1.325       0.230     /
U_i/t        3.240      0.540      0.0410     0.355      11.13       0.850     /
D_i/t        4.350      3.240      0.540      0.745      4.33        1.440     /
T_li/min     60         35         35         40         40          45        80
T_ui/min     85         55         55         60         50          65        90
c_i/(¥/t)    57,500     5750       65,200     15,100     28,500      14,600    /
h_i (h_q)    0.023      0.022      0.020      0.015      0.025       0.022     0.035

Table 11 A set of optimal solutions found by MOBFA on BC_4.

Solution  x1      x2      x3      x4      x5      x6      x7     x8  x9  x10  x11  x12  x13  x14  f1         f2         μk
X1        3.1500  0.1680  0.0320  0.1190  1.7330  0.8220  3.100  75  45  55   50   45   50   0    3.086E+05  5.736E+01  4.606E−02
X2        3.1500  0.1680  0.0320  0.1190  1.6520  0.7250  3.100  65  45  55   45   45   50   0    3.864E+05  6.648E+01  4.764E−02
X3        3.1500  0.1680  0.0320  0.1190  1.4220  0.7190  3.100  75  45  55   50   45   50   0    5.931E+00  4.558E+01  5.348E−02
X4        3.1500  0.1680  0.0320  0.1190  1.9010  0.8220  3.100  75  45  55   50   45   50   0    3.501E+05  5.298E+01  4.205E−02
X5        3.1500  0.1710  0.0310  0.1190  2.3380  0.8220  3.100  75  45  55   50   45   50   0    4.045E+05  6.860E+01  4.072E−02
X6        3.1500  0.1710  0.0320  0.1190  4.5500  0.8120  3.100  75  45  55   50   45   50   35   6.402E+05  7.264E+01  3.598E−02
X7        3.1500  0.1700  0.0320  0.1190  3.5490  0.8220  3.100  75  45  55   50   45   50   0    5.724E+05  6.413E+01  3.845E−02
X8        3.1500  0.1680  0.0320  0.1190  3.9400  0.8220  3.100  75  40  55   50   45   50   0    5.095E+05  6.593E+01  3.755E−02
X9        3.1500  0.1700  0.0320  0.1190  4.2310  0.8120  3.100  75  55  55   50   45   50   0    5.992E+05  7.125E+01  3.678E−02
X10       3.1500  0.1680  0.0320  0.1190  3.0290  0.8220  3.100  75  55  55   50   45   50   0    4.141E+05  6.447E+01  3.867E−02
X11       3.1510  0.1710  0.0320  0.1190  4.2100  0.8120  3.100  75  55  55   50   45   50   0    6.191E+05  5.936E+01  3.574E−02
X12       3.1520  0.1710  0.0320  0.1190  4.0090  0.8130  3.100  75  40  55   50   45   50   0    6.712E+05  6.372E+01  3.493E−02
X13       3.1500  0.1700  0.0320  0.1190  4.0320  0.8220  3.100  80  40  55   50   45   50   0    6.523E+05  5.582E+01  3.309E−02
X14       3.1520  0.1710  0.0320  0.1190  4.0090  0.8220  3.100  80  45  55   50   45   55   0    6.621E+05  5.916E+01  3.187E−02
X15       3.1500  0.1700  0.0320  0.1190  4.0320  0.8220  3.100  80  50  55   55   45   50   0    6.113E+05  6.681E+01  3.074E−02

Table 12 Results found by MOBFA, NSGA-II and MOEA/D on BC_4.

          MOBFA                             NSGA-II                           MOEA/D
Solution  f1         f2         μk          f1         f2         μk          f1         f2         μk
X1        3.08E+05   5.83E+01   4.662E−02   7.360E+05  7.63E+00   4.83E−02    5.36       4.50E+01   5.273E−02
X2        3.85E+05   6.76E+01   4.821E−02   7.590E+05  6.65E+00   5.18E−02    6.59E+05   5.03E+01   5.351E−02
X3        5.92       4.63E+01   5.413E−02   7.480E+05  6.81E+01   5.97E−02    6.64E+05   4.80E+01   5.964E−02
X4        3.50E+05   5.38E+01   4.261E−02   7.161E+05  7.72E+00   4.63E−02    6.92E+05   4.04E+01   4.732E−02
X5        4.04E+05   6.97E+01   4.120E−02   7.252E+05  6.03E+00   4.33E−02    6.49E+05   4.22E+01   4.595E−02
X6        5.98E+05   7.24E+01   3.720E−02   6.301E+05  7.86E+00   3.69E−02    7.12E+05   5.74E+01   3.946E−02
X7        5.71E+05   6.52E+01   3.890E−02   6.731E+05  6.76E+00   3.71E−02    6.77E+05   4.73E+01   4.053E−02
X8        5.09E+05   6.70E+01   3.80E−02    7.493E+05  7.78E+00   3.71E−02    7.09E+05   5.30E+01   3.992E−02
X9        4.14E+05   6.55E+01   3.910E−02   7.611E+05  6.94E+01   3.72E−02    6.83E+05   5.64E+01   4.260E−02
X10       6.39E+05   7.38E+01   3.640E−02   6.381E+05  6.13E+00   3.55E−02    6.67E+05   5.00E+01   3.910E−02
Average   4.17E+05   6.40E+01   4.220E−02   7.141E+05  1.94E+01   4.33E−02    4.14E+05   4.90E+01   4.601E−02

4.4.2. Results and analysis

NSGA-II and MOEA/D are selected as reference algorithms. The parameter settings of MOBFA, NSGA-II and MOEA/D are the same as in Section 4.2. In each algorithm, Eq. (13) is employed as the fitness function, and each individual in the population is encoded as shown in Table 8. The self-adaptive penalty function depicted in [55] is used to handle the equality and inequality constraints of the proposed model. A CSBO instance called Bc_4 is instantiated from actual data of an automatic copper strip production line. In Bc_4, four elements (i.e., e1, e2, e3, and e4) are involved, as shown in Tables 9 and 10, where 1 to 4 represent the new raw materials W_new, 5 denotes the remainder of the same grade W_resg, and 6 is the chemical waste W_cw. Following the principle of the actual melting process, W_redg (i.e., the remaining materials of a different grade) is not used in the burdening. The penalty factors in f_2 are therefore determined empirically as α = 6, β = 0 and δ = 45, and the other parameters are set as G = 12.5 t, w_Q = 3.5 t and C = 85,000 ¥/t.

The fifteen best candidate solutions found by MOBFA are reported in Table 11, where their objective function values and the best μk are also given. As we can see in Table 11, most of the optimal solutions found by MOBFA exhibit excellent convergence, because their charging times (i.e., x8 to x13) are practically consistent and only a small proportion of them differ in feeding amount. From another point of view, this diversity in feeding amount offers more burdening choices to decision makers. Here μk is the quality measure of a solution: the greater the value of μk, the higher the priority of the solution. The objective function values and the μk values of the ten best solutions found by MOBFA, NSGA-II and MOEA/D are reported in Table 12. It can be observed from Table 12 that, in most cases, MOBFA obtains a lower cost f_1 and a larger f_2 than NSGA-II and MOEA/D, whereas the μk values of the three algorithms are of similar magnitude. Fig. 8 plots the final non-dominated solutions found by all algorithms on Bc_4, where the xy coordinates are determined by Eq. (13) and differ from the objective function values given in Tables 11 and 12. As we can see from Fig. 8, MOBFA and MOEA/D both approximate a properly distributed set of optimal solutions on Bc_4. However, the solutions found by MOBFA dominate those found by MOEA/D, which indicates that MOBFA finds better solutions for the decision maker.
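The dominance relation referred to above can be checked with a few lines of code. The sketch below assumes the bi-objective CSBO setting of Eq. (13), in which f_1 is minimized and f_2 is maximized; the function names are hypothetical, and the three sample points are the (f_1, f_2) pairs of row X2 in Table 12, used here only as an illustration.

```python
def dominates(a, b):
    """True if a = (f1, f2) dominates b when f1 is minimized and f2 maximized."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def non_dominated(points):
    """Keep only the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# (f1, f2) pairs of row X2 in Table 12.
mobfa = (3.85e5, 6.76e1)
nsga2 = (7.590e5, 6.65e0)
moead = (6.59e5, 5.03e1)

print(dominates(mobfa, nsga2), dominates(mobfa, moead))
print(non_dominated([mobfa, nsga2, moead]))
```

A solution that is cheaper but uses less old material than another neither dominates it nor is dominated by it, which is why a set of trade-off solutions, rather than a single optimum, is presented to the decision maker.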


Fig. 8. Solutions found by each algorithm on the BC_4 instance.

5. Conclusions

This paper has developed a new algorithmic framework based on the ABC paradigm, called MOBFA, to solve MO problems. MOBFA aims to approximate a well converged and properly distributed PF by evolving two cooperative bee colonies with different evolution rules. Specifically, in the colony-to-colony interaction, the Pareto-based engine communicates with the indicator-based engine through the comprehensive learning mechanism, which boosts the population diversity. In the intra-colony evolution, the neighbor-discount-information (NDI) learning based on RL is incorporated into the single-objective ABC paradigm to enhance its local search ability. The proposed MOBFA has been experimentally compared with several state-of-the-art multi-objective algorithms, including NSGAII, MOEA/D and SPEA2, on a set of typical test benchmarks and a real-world problem. The results show that MOBFA is significantly superior, or at least comparable, to its competitors in terms of two commonly used metrics, IGD and SPREAD. Note that MOBFA is sometimes easily stuck at local MO optima on some test problems. Comparisons between MOBFA and several recent MO algorithms, including MOGWO, MODA and MOALO, also indicate that MOBFA obtains satisfactory solution accuracy. Certainly, we cannot assert that MOBFA is always superior to other MO algorithms. The strengths and weaknesses of MOBFA need to be investigated specifically on the basis of the characteristics of the test problems. A comprehensive sensitivity analysis of the parameters of the algorithm and a theoretical analysis of its complexity will be addressed in our future work.

Acknowledgements

The authors would like to thank the editors and anonymous reviewers for their helpful comments and suggestions on improving the quality of this paper. This work is supported by the National Science Foundation for Distinguished Young Scholars of China under Grant No. 71325002; the National Natural Science Foundation of China under Grant No. 61503373 and No. 61572123; the Natural Science Foundation of Liaoning Province under Grant No. 2015020002; and the Fundamental Research Funds for the Central Universities No. N161705001.

References

[1] K. Miettinen, Nonlinear Multiobjective Optimization, Kluwer, Norwell, MA, USA, 1999 chapter 1.
[2] J.D. Knowles, D.W. Corne, K. Deb, Multi-Objective Problem Solving from Nature, Springer, New York, NY, 2008.
[3] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2) (Apr. 2002) 182–197.

Please cite this article as: L. Ma et al., Cooperative two-engine multi-objective bee foraging algorithm with reinforcement learning, Knowledge-Based Systems (2017), http://dx.doi.org/10.1016/j.knosys.2017.07.024
