Engineering Applications of Artificial Intelligence 17 (2004) 529–542
Training recurrent neural networks by using parallel tabu search algorithm based on crossover operation

Adem Kalinli*, Dervis Karaboga

Department of Computer Engineering, Engineering Faculty, Erciyes University, 38039 Kayseri, Turkey

Received 22 September 2002; accepted 8 April 2004

doi:10.1016/j.engappai.2004.04.003

*Corresponding author. Tel.: +90-352-4374901/40706; fax: +90-352-4375267. E-mail addresses: [email protected] (A. Kalinli), [email protected] (D. Karaboga).
Abstract

There are several heuristic optimisation techniques used for numeric optimisation problems, such as genetic algorithms, neural networks, simulated annealing, ant colony and tabu search algorithms. Tabu search is a quite promising search technique for non-linear numeric problems, especially those where an optimal solution must be determined on-line. However, because the basic tabu search is a form of iterative search, the speed of its convergence to the global optimum depends on the initial solution. In order to overcome this drawback of the basic tabu search, this paper proposes a new parallel model for tabu search based on the crossover operator of genetic algorithms. After the performance of the proposed model is evaluated on well-known numeric test problems, it is applied to training recurrent neural networks to identify linear and non-linear dynamic plants, and the results are discussed.
© 2004 Elsevier Ltd. All rights reserved.

Keywords: Tabu search; Parallel tabu search; Continuous optimisation; Elman networks; System identification
1. Introduction

The great difficulty of optimisation problems encountered in practical areas such as production, control, communication and transportation has motivated researchers to develop new powerful algorithms. Popular examples of these new algorithms include genetic algorithms (GAs), simulated annealing (SA), ant colony optimisation (ACO), tabu search (TS) and artificial neural networks (ANNs) (Reeves, 1995; Corne et al., 1999; Pham and Karaboga, 2000). Although all of these algorithms have the property of convergence to the global optimum, they cannot always guarantee the optimum solution for a problem. Therefore, they are called approximate or heuristic algorithms.

GAs, initially developed by Holland (1975), are a form of directed random search based on natural selection and genetics. Each cycle of a GA
is called a generation, which includes the evaluation of solutions and the application of selection and genetic operators such as crossover and mutation. The GAs used for numeric function optimisation can be classified into two main groups: real-coded and binary-coded GAs. Although real-coded GAs have been successfully applied to numeric function optimisation (Michalewicz, 1996; Oliveira and Lorena, 2002), the operators used with binary-coded GAs are simpler than those employed with real-coded GAs. A standard GA can find the promising regions of the search space very quickly due to its parallel nature. However, it might have a problem with local convergence since the operations employed are probabilistic rather than deterministic. Although GAs implicitly store cumulative information about the past steps of the search, they might have to evaluate similar or identical solutions several times, since they have no mechanism for avoiding already evaluated solutions. Therefore, a standard GA can require a high number of evaluations, and for problems where evaluations are expensive it becomes impractical.
The principles of the SA algorithm originally come from the Metropolis algorithm (Metropolis et al., 1953). Kirkpatrick et al. (1983) suggested that the Metropolis algorithm could be used to search the feasible solutions of an optimisation problem. The algorithm is a variant of the heuristic technique of local search. It simulates the cooling process of a material in a heat bath, which is known as the annealing process. A standard SA algorithm is good at local convergence since it works in an iterative way. However, since it does not keep any history of the search and also uses a probabilistic rule to select the next solution when no improvement occurs, similar solutions might have to be evaluated repeatedly. In addition, the convergence speed of the search depends on the initial solution: if the initial solution is too far from the region of the global optimum, reaching that region might take a long time.

ANNs simulate the structure and learning system of biological neural networks (Rumelhart and McClelland, 1986). When an ANN is applied to an optimisation problem, the problem must first be mapped onto an ANN by a suitable coding of the possible solutions. Then an appropriate energy function corresponding to the optimisation objective must be chosen. Next, the mean field equations are defined and solved iteratively to find the lowest energy. A robust method is required for solving the mean field equations since the energy surface might be non-convex. Consequently, the ANN approach to optimisation problems differs from the other heuristic methods: optimisation with an ANN is not straightforward, and it does not use a trial-and-error process.

The ACO algorithm, proposed by Dorigo et al. (1991), simulates the behaviour of real ant colonies. The main features of the algorithm are distributed computation, positive feedback and constructive greedy search. Since 1991, several works have been carried out on new models of the ACO algorithm and their applications to optimisation problems. However, most of these works have addressed combinatorial-type problems (Gambardella and Dorigo, 1997; Di Caro and Dorigo, 1998; Stützle and Dorigo, 1999; Gambardella et al., 1999). In the literature, there are just a few works on models proposed for continuous optimisation and their engineering applications (Wodrich, 1996; Corne et al., 1999; Hiroyasu et al., 2000; Kalinli et al., 2001). The touring ant colony optimisation (TACO) algorithm was proposed by Hiroyasu et al. (2000) for numeric optimisation problems, and Kalinli et al. (2001) presented a modified version of the TACO algorithm.

The TS algorithm is a form of iterative search based on intelligent problem-solving principles. The algorithm was developed by Glover (1986). TS has a flexible memory to keep information about the past steps of the search and uses it to create and exploit new
solutions in the search space. A basic TS algorithm starts searching with a present solution and, using a history record of the search, constructs a set of feasible solutions from the present one based on its neighbourhood. It evaluates the solutions constructed and selects the one with the highest evaluation value as the next solution. Having selected the next solution, the history record is modified. This process is repeated until a predefined stopping criterion is satisfied. In the use of TS, the possibility of repeatedly evaluating similar solutions decreases since TS records a history of the search.

Of the five algorithms stated previously, although GAs, SA and ANNs have found many applications in both combinatorial and numeric optimisation (Karaboga, 1994; Davis, 1991; Goldberg, 1989; Van Laarhoven and Aarts, 1988; Abramson, 1991), the ACO and TS algorithms have generally been applied to combinatorial-type optimisation problems rather than numeric ones (Costa, 1994; Bland, 1994a; Beyer and Ogier, 1991; Laguna et al., 1991; Hopfield and Tank, 1985; Corne et al., 1999). However, TS seems quite promising for non-linear numeric optimisation problems, especially for on-line optimisation, since it is based on neighbourhood search and works iteratively (good for local convergence) and can escape a local minimum in a multi-modal search space by using the history record of the search (good for global optimisation) (Karaboga and Kaplan, 1995; Karaboga and Kalinli, 1996a, b). Therefore, some researchers have attempted to develop modified versions of TS for problems with continuous variables (Hu, 1992; Bland, 1994b; Cvijovic and Klinowski, 1995; Battiti and Tecchiolli, 1996; Karaboga and Kalinli, 1996a; Siarry and Berthiau, 1997; Ni and Yan, 1998; Fanni et al., 1999; Chelouah and Siarry, 2000; Traferro and Uncini, 2000; Franzè and Speciale, 2001; Machado et al., 2001) and have applied them to problems from different fields. The algorithms described in these works are of a conventional type, i.e. they have a serial structure. However, a conventional TS might have a problem reaching the global optimum in a reasonable computation time when the initial solution is far away from the region where the optimum solution exists.

Two of the latest works on continuous versions of TS belong to Chelouah and Siarry (2000) and Franzè and Speciale (2001). Chelouah and Siarry proposed a TS algorithm called the enhanced continuous TS (ECTS), which is directly inspired by the simple combinatorial TS algorithm. ECTS first locates the most promising areas and then continues the search by intensification within one promising area of the solution space to reach the global minimum. In their work, Chelouah and Siarry compared the performance of their algorithm with that of some other continuous versions of TS using the results obtained for a set of well-known test functions. They reported that ECTS can produce similar or better
results than the other versions, and that ECTS is also simpler than the other versions since it is directly inspired by the simple combinatorial TS algorithm. Franzè and Speciale described a TS algorithm called DOPE, which is based on pattern search and tabu search, and compared the performance of DOPE with that of other continuous versions of TS, including ECTS. From the results obtained, they concluded that the performance of DOPE is strongly dependent on the initial solution.

The dependence of the convergence speed of TS on the initial solution can be weakened by introducing a parallel structure into the algorithm (Malek et al., 1989; Crainic and Toulouse, 1997; Aiex et al., 1998). The parallelism helps the TS find the promising regions of the search space very quickly. In this work, a parallel model for the continuous version of TS, based on the crossover operator of GAs, is described. Although it has a parallel structure, the proposed parallel TS is a very simple algorithm, since it employs a basic TS derived from the combinatorial TS. The performance of the proposed TS is compared to that of the basic TS, GA and TACO algorithms for several numeric test problems. Then, it is employed to train a recurrent neural network to identify linear and non-linear dynamic plants.

Section 2 describes the basic principles of the TS algorithm. Section 3 presents the proposed model. The simulation results obtained from the optimisation of the test functions and from the application of TS to training a recurrent neural network are given in Section 4. Section 5 presents the discussion.
2. Basic principles of tabu search

Tabu search is a general heuristic search procedure devised for finding a global minimum of a function f(x). The problem of searching for the value of x which minimises f(x) is called the optimisation problem of f(x) and can be expressed mathematically as

\min f(x), \quad \text{subject to } x \in X.

The function f(x) may be linear or non-linear, and the condition x ∈ X describes the constraints on the solution x.

A step of the TS starts with a present solution x_now. x_now ∈ X has an associated set of feasible solutions Q which can be obtained by applying a simple modification to x_now. This modification is called a move. In order to be able to escape a local minimum, a move to a neighbour x is made even if x is worse than x_now. However, this could cause the search to cycle. In order to avoid the cycling problem, a tabu list T is introduced. The tabu list stores all the tabu moves that cannot be applied to the present solution x_now. The moves stored in the tabu list are those carried out most frequently and recently, according to some criteria called tabu restrictions. The use of the tabu list decreases the possibility of cycling because it prevents the search from returning, within a certain number of iterations, to a solution visited recently. After a subset of the feasible solutions Q is produced according to the tabu list and evaluated for f(x), the next solution is selected from it: the solution with the highest evaluation is selected as the next solution x_next. This loop is repeated until a stopping criterion is satisfied.

A tabu restriction is activated when the reverse of a move recorded in the tabu list occurs within a predetermined number of iterations or with a certain frequency over a longer range of iterations. The former produces a recency-based restriction and the latter a frequency-based restriction. Tabu restrictions might sometimes prevent the search from finding solutions which have not yet been visited, or even cause all available moves to be classified as tabu. Therefore, it should be possible to override the tabu constraints when such freedom is required by the search. A criterion called the aspiration criterion is employed to determine which moves should be freed in such cases.

The flowchart of a basic TS is presented in Fig. 1. In the initialisation unit, a random feasible solution x_initial ∈ X for the problem is generated, and the tabu list and other parameters are initialised. In the neighbour production unit, a feasible set of solutions is produced from the present solution according to the tabu list and the aspiration criterion. The evaluation unit evaluates each solution x produced from the present solution x_now. After the next solution x_next is determined by the selection unit, the history record of the search is modified in the last unit. If the next solution is better than the best solution found so far, x_best, it replaces the present best solution.

Fig. 1. Flowchart of a basic tabu search.

2.1. Basic tabu search used

In this work, a string of binary numbers is used to represent a possible solution, and TS employs the following two constraints, based on the recency and frequency memories, as the tabu constraints:

recency(x*) ≥ restriction_period,
frequency_ratio(x*) ≤ frequency_limit.

The recency of a move is the difference between the current iteration count and the last iteration count at which that move was made. The frequency measure is the frequency ratio, whose numerator is the number of occurrences of a specific move and whose denominator is the average numerator value over all possible moves. In the algorithm used, the highest evaluation move is selected as the next solution. Aspiration by default is employed as the aspiration criterion. According to this criterion, the least tabu solution is selected as the next solution, i.e. the solution which loses its tabu classification by the least increase in the value of the present iteration number.
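To make the description above concrete, the following minimal Python sketch implements a basic TS of this kind for a minimisation problem, assuming a binary-string representation with a bit-flip move. The function names, the parameter values and the simplification of the aspiration rule (aspiration by objective instead of the paper's aspiration by default) are illustrative assumptions, not details taken from the paper.

    import random

    def basic_tabu_search(evaluate, n_bits, x_init=None, max_iters=600,
                          restriction_period=10, frequency_limit=2.0):
        x_now = x_init[:] if x_init is not None else \
            [random.randint(0, 1) for _ in range(n_bits)]
        x_best, f_best = x_now[:], evaluate(x_now)
        last_used = [-restriction_period] * n_bits   # recency memory
        counts = [0] * n_bits                        # frequency memory

        for it in range(1, max_iters + 1):
            avg = (sum(counts) / n_bits) or 1.0      # average move count
            candidates = []
            for bit in range(n_bits):
                recency = it - last_used[bit]
                tabu = (recency < restriction_period
                        or counts[bit] / avg > frequency_limit)
                x_new = x_now[:]
                x_new[bit] ^= 1                      # the move: flip one bit
                f_new = evaluate(x_new)
                # simplified aspiration: a tabu move is freed if it beats the best
                if not tabu or f_new < f_best:
                    candidates.append((f_new, bit, x_new))
            if not candidates:                       # all moves tabu; a fuller version
                continue                             # would free the least tabu move
            f_next, bit, x_now = min(candidates)     # highest-evaluation (lowest f) move
            last_used[bit], counts[bit] = it, counts[bit] + 1
            if f_next < f_best:
                x_best, f_best = x_now[:], f_next
        return x_best, f_best

The recency and frequency memories here are kept per bit position, which matches the bit-flip neighbourhood; a different move definition would require a different memory layout.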
3. Proposed parallel model for tabu search

There are four main sources of parallelism in a TS algorithm (Crainic and Toulouse, 1997): parallelism (a) in the cost function evaluation; (b) in the neighbourhood examination; (c) in the problem decomposition; and (d) in the exploration of the solution domain by carrying out different searches. In the last type of parallelism, different independent TS algorithms are executed from different initial solutions. The TS algorithms being executed in parallel have the ability to exchange information. If the communication between the algorithms is carried out at predetermined moments, the parallel algorithms are called synchronous. If this information exchange is realised at different moments, the algorithms are called asynchronous. The parallelism used in this work is of the last type. The information exchange between the basic TS algorithms executed in parallel is based on the crossover operation used in GAs.

3.1. Crossover operator used

The crossover operator employed by GAs is used to create two new solutions (children) from two existing solutions (parents) in the population formed by the selection operation. A crossover operator appropriate to the string representation of the problem must be chosen (Michalewicz, 1996; Goldberg, 1989). When the problem is represented in binary string form, the simplest crossover operation can be applied as follows: two solutions are randomly selected as parent solutions from the population and cut at a randomly selected point. The tails, which are the parts after the cutting point, are swapped, and two new solutions are produced. A crossover operation can thus yield better solutions by combining the good features of the parent solutions. An example of this simple crossover operator is given below:

PresentSolution1   111100|001111 -Tail
PresentSolution2   110001|111111 -Tail
NewSolution1       111100|111111
NewSolution2       110001|001111
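The tail swap can be stated in a few lines; the sketch below is a generic single-point crossover, with the cut point chosen at random as described above.

    import random

    def single_point_crossover(parent1, parent2):
        point = random.randint(1, len(parent1) - 1)   # random cut point
        child1 = parent1[:point] + parent2[point:]    # head of p1 + tail of p2
        child2 = parent2[:point] + parent1[point:]    # head of p2 + tail of p1
        return child1, child2

    # With the cut after the sixth bit (point == 6), this reproduces the
    # example in the text:
    # single_point_crossover("111100001111", "110001111111")
    # -> ("111100111111", "110001001111")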
3.2. Structure of the proposed model
The flowchart of the proposed model is depicted in Fig. 2. Blocks 11 to h1 are each identical to the flowchart shown in Fig. 1. These blocks represent h TSs executing independently of one another. The initial solutions for the TSs at the first level are created using a random number generator with different seeds. After a given number of iterations (maxit1), the execution of the h TSs is stopped. maxit1 is normally chosen to be sufficiently large to allow the search to complete its local searching. The h solutions found by these TSs represent the 'preliminary' solutions. An operation is then applied to the solutions found by the h parallel TSs at the first level to create the initial solutions for the TSs at the second level. Various methods could be used for this purpose. For example, the best solution found by the TSs at the first level can always be selected as one of the initial solutions for the second level, and the others can be produced from the preliminary solutions by using a suitable crossover operator; alternatively, all the initial solutions can be created using the crossover operator.
Fig. 2. Flowchart of the proposed tabu search.
Another procedure to create the initial solutions could be the following: the best preliminary solution is selected directly, some initial solutions are created by applying the crossover operator to the preliminary solutions, and the rest are generated by a random number generator. In this work, the best preliminary solution is selected directly and the rest of the initial solutions are created by applying the crossover operation to the preliminary solutions. In Fig. 2, block h2 is the same as block h1, except that it does not use a randomly generated initial solution. The TSs of these blocks are then executed for maxit1 iterations, and the solutions found by the TSs at the second level are again used to form the initial solutions of the TSs at the third level. The TSs at level l are executed in the same way. The best solution found through the whole search process is taken as the optimal solution for the problem. Each basic TS can have different control parameter values; however, in this work, for the purpose of comparison, the parameter values of all TSs are assumed to be the same.
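The level structure described above can be summarised by the sketch below, which reuses the two routines sketched earlier. How crossover partners are paired is an assumption made for illustration; the paper only fixes that the best preliminary solution is kept and the remaining seeds come from crossover.

    import random

    def parallel_tabu_search(evaluate, n_bits, h=4, levels=7, maxit1=600):
        # level 1: h independent searches started from random solutions
        solutions = [basic_tabu_search(evaluate, n_bits, max_iters=maxit1)[0]
                     for _ in range(h)]
        for _ in range(levels - 1):
            solutions.sort(key=evaluate)
            seeds = [solutions[0]]                 # best preliminary solution kept
            while len(seeds) < h:                  # the rest come from crossover
                p1, p2 = random.sample(solutions, 2)
                seeds.append(random.choice(single_point_crossover(p1, p2)))
            # next level: restart each basic TS from its seed
            solutions = [basic_tabu_search(evaluate, n_bits, x_init=s,
                                           max_iters=maxit1)[0]
                         for s in seeds]
        return min(solutions, key=evaluate)

With h = 4, maxit1 = 600 and levels = 7, this structure matches the evaluation budget reported later in the paper (4 × 600 × 7 = 16,800 evaluations).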
4. Simulation results

The simulation work consists of two parts: numeric function optimisation, and training an Elman network to identify linear and non-linear dynamic systems.

4.1. Numeric function optimisation

Seven well-known minimisation test functions were employed to determine the performance of the proposed TS algorithm. These test functions are given in Table 1. The first four test functions were proposed by De Jong (1975). The test functions reflect different degrees of complexity. Sphere (F1) is smooth, unimodal, strongly convex and symmetric. Rosenbrock (F2) is considered difficult because it has a very narrow ridge; the tip of the ridge is very sharp, and it runs around a parabola. Step (F3) is representative of the problem of flat surfaces. Flat surfaces are obstacles for optimisation algorithms because they do not give any information as to which direction is favourable; the idea behind the step function is to make the search more difficult by introducing small plateaus into the topology of an underlying continuous function. Foxholes (F4) is an example of a function with many local optima; many standard optimisation algorithms get stuck in the first peak they find. Function F5 has 40,000 local minimum points in the region where x1 and x2 are within [-10, 10]. Griewangk (F6) is also a non-linear, multi-modal function; the terms of the summation produce a parabola, while the local optima lie above parabola level, and the dimensions of the search range increase on the basis of the product term. Rastrigin's function (F7) is a fairly difficult problem due to its large search space and large number of local minima; it contains millions of local optima in the interval of consideration.

The solutions were represented as binary strings. Therefore, a new solution which is not in the tabu list was created from the present solution by changing the value of one bit, from 1 to 0 or from 0 to 1. The solutions of the functions, the parameter bounds, the resolutions and the length of a solution for each test function are given in Table 2.

Simulations to evaluate the proposed TS were conducted in two stages. First, the basic TS described in Section 2.1 was executed 30 times with different initial solutions for each problem. After several trials, it was observed that 16,800 evaluations were sufficient to compare the performance of the algorithms; therefore, the algorithms were stopped after 16,800 evaluations in each run. All the parameter values of the TSs were kept constant through all levels.
Table 1
Numeric test functions used in the simulations

Notation  Name        Function
F1        Sphere      $f_1 = \sum_{i=1}^{3} x_i^2$
F2        Rosenbrock  $f_2 = 100(x_1^2 - x_2)^2 + (1 - x_1)^2$
F3        Step        $f_3 = \sum_{i=1}^{5} [x_i]$, where $[x_i]$ represents the greatest integer less than or equal to $x_i$
F4        Foxholes    $f_4 = \left( 0.002 + \sum_{j=1}^{25} \frac{1}{j + \sum_{i=1}^{2} (x_i - a_{ij})^6} \right)^{-1}$, where $\{(a_{1j}, a_{2j})\}_{j=1}^{25}$ = {(-32,-32), (-16,-32), (0,-32), (16,-32), (32,-32), (-32,-16), (-16,-16), (0,-16), (16,-16), (32,-16), ..., (-32,32), (-16,32), (0,32), (16,32), (32,32)}
F5        —           $f_5 = (x_1^2 + x_2^2)/2 - \cos(20\pi x_1)\cos(20\pi x_2) + 2$
F6        Griewangk   $f_6 = 1 + \sum_{i=1}^{10} \frac{x_i^2}{4000} - \prod_{i=1}^{10} \cos\!\left(\frac{x_i}{\sqrt{i}}\right)$
F7        Rastrigin   $f_7 = 20A + \sum_{i=1}^{20} \left( x_i^2 - 10\cos(2\pi x_i) \right)$, $A = 10$
Table 2
Number of parameters, solutions, parameter bounds and length of solution for the test functions

Function  Number of parameters  Solution x_i  Solution f(x)  Lower bound  Upper bound  Length of a solution
F1        4                     0.0           0.0            -5.12        5.12         40
F2        2                     1.0           0.0            -2.048       2.048        32
F3        5                     -5.12         -30.0          -5.12        5.12         50
F4        2                     -32.0         1.0            -65.536      65.536       40
F5        2                     0.0           1.0            -10          10           36
F6        10                    0.0           0.0            -600         600          200
F7        20                    0.0           0.0            -5.12        5.12         400

In the second stage, the simulation results were obtained for the proposed model. The proposed TS was executed 30 times with different initial solutions. The number of basic TSs running in parallel was four (h = 4). Each basic TS at the first level was run for 600 (maxit1) evaluations. The best of the solutions found by the TSs at the first level was taken directly as the initial solution for a TS at the second level. The other three initial solutions required by the second-level TSs were created by applying the crossover operator described in Section 3.1 to the preliminary solutions found by the TSs at the first level. At each level, three crossover operations were carried out. After each crossover operation, two new solutions were produced, and one of them was randomly chosen to replace a present solution. The total number of evaluations made at the first level was 2400 (h × maxit1). This process was repeated through seven levels (l = 7); hence, the total number of evaluations carried out throughout the whole search was 16,800 (h × maxit1 × l).
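The paper does not spell out how a binary string is mapped to real parameter values. A plausible decoding, assuming plain (non-Gray) binary coding and a uniform linear mapping onto the bounds of Table 2, is sketched below; both the coding and the equal split of bits across parameters are assumptions.

    def decode(bits, n_params, lower, upper):
        per = len(bits) // n_params           # e.g. 40 bits / 4 parameters for F1
        params = []
        for i in range(n_params):
            chunk = bits[i * per:(i + 1) * per]
            value = int("".join(map(str, chunk)), 2)
            frac = value / (2 ** per - 1)     # normalise to [0, 1]
            params.append(lower + frac * (upper - lower))
        return params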
In order to show the robustness of the proposed model, frequency histograms of the results obtained using the GA, TACO, the basic TS and the proposed TS were drawn. Figs. 3–9 show the histograms for test functions 1–7, respectively. For all algorithms, the total number of evaluations was 16,800. In order to evaluate the convergence speed of the algorithms, the average improvement of the best solutions over 30 runs is shown for functions F6 and F7 in Fig. 10.

Fig. 3. Histograms drawn from the results obtained for the function F1 by (a) the basic and the proposed TSs and the TACO algorithm, (b) GA.
Fig. 4. Histograms drawn from the results obtained for the function F2 by (a) the basic TS, (b) the proposed TS, (c) GA and (d) TACO.
Fig. 5. Histograms drawn from the results obtained for the function F3 by (a) the basic and the proposed TSs, and GA, (b) TACO.
Fig. 6. Histograms drawn from the results obtained for the function F4 by (a) the basic TS, (b) the proposed TS, (c) GA and (d) TACO.
Fig. 7. Histograms drawn from the results obtained for the function F5 by (a) the basic TS, (b) the proposed TS, (c) GA and (d) TACO.
Fig. 8. Histograms drawn from the results obtained for the function F6 by (a) the basic TS, (b) the proposed TS, (c) GA and (d) TACO.
Fig. 9. Histograms drawn from the results obtained for the function F7 by (a) the basic TS, (b) the proposed TS, (c) GA and (d) TACO.
Fig. 10. The average progress of the best solutions over 30 runs for (a) the function F6 and (b) the function F7.

4.2. Training a recurrent neural network by using the TS algorithm

From a structural point of view, there are two main types of neural networks: feedforward neural networks (FNNs) and recurrent neural networks (RNNs) (Pham and Karaboga, 1999). Connections that allow information to loop back to the same processing element are called recursive, and NNs having this type of connection are named RNNs. RNNs are more suitable than FNNs
for representing a dynamic system, since they provide a dynamic mapping between their outputs and inputs. Although gradient-based search techniques such as back-propagation (BP) are currently the most widely used optimisation techniques for training neural networks, it has been shown that these techniques are severely limited in their ability to find global solutions. Global search techniques such as GA, SA and TS have been identified as a potential solution to this problem. Although the use of GAs for ANN training has mainly focused on FNNs (Arifovic and Gencay, 2001; Sexton and Gupta, 2000; Castillo et al., 2000), there are several works in the literature on training RNNs using GAs (Ku et al., 1999; Blanco et al., 2000, 2001). However, SA and TS have not found as many applications in the training of ANNs (Castillo et al., 1999; Sexton et al., 1998; Battiti and Tecchiolli, 1995).

Fig. 11. Structure of the Elman network.

A special type of RNN is the Elman network (see Fig. 11) (Elman, 1990; Liu, 1993). The Elman network has feedforward and feedback connections. However, so
that it can be trained essentially as a feedforward network by means of the simple BP algorithm, the feedback connection weights have to be kept constant. For the training to converge, it is important to select the correct values for the feedback connection weights. However, finding these values manually can be a lengthy trial-and-error process.

Fig. 12. Scheme for training a network to identify a plant using the TS algorithm.

In this part of the work, the performance of the proposed TS is tested on training the Elman network to identify dynamic plants. The use of TS to train the Elman network to identify a dynamic plant is illustrated in Fig. 12. Here, ym(k) and yp(k) are the outputs of the network and the plant at time k, respectively. Training of the network can be considered as a minimisation problem defined by

\min J(w), \quad w \in W, \qquad (1)

where $w = [w_1\, w_2\, w_3 \ldots w_v]^T$ is the weight vector of the network. The time-averaged cost function J(w) to be minimised by adaptively adjusting w can be expressed as

J(w) = \left( \frac{1}{N} \sum_{k=1}^{N} \left( y_p(k) - y_m(k) \right)^2 \right)^{1/2}, \qquad (2)

where N is the number of samples used for the calculation of the cost function.
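Eq. (2) translates directly into code; the sketch below assumes the plant and network outputs are available as equal-length sequences.

    import math

    def rms_cost(y_plant, y_model):
        # root-mean-square error between plant and network outputs, Eq. (2)
        n = len(y_plant)
        return math.sqrt(sum((yp - ym) ** 2
                             for yp, ym in zip(y_plant, y_model)) / n)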
A solution to the problem is a string of trainable connection weights representing a possible network (Fig. 13). The TS algorithm searches for the best weight set by means of the cost function values calculated for solutions in string form.

Fig. 13. Representation of the trainable weights of a network in string form.

The structure of the network employed in this work was selected as in Liu (1993) in order to compare the results. Since the plants to be identified are single-input single-output (SISO), the numbers of external input neurones and output neurones are both equal to one. The number of neurones in the hidden layer is equal to six. Therefore, the total number of connections is 54, of which six are feedback connections. In the case where only the feedforward connections are trainable, a solution is represented as a string of 48 weights. When all connections have trainable weights, the string consists of 54 weights, of which six are feedback connection weights. In both cases, each weight is represented with 16 binary bits. A new solution is produced from the present solution by changing the bit value of a binary number which is not in the tabu list. The feedback connections have weight values ranging from 0.0 to 1.0, while the feedforward connections can have positive or negative weights between -1.0 and 1.0.
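A minimal sketch of a forward pass through an Elman network of the size quoted above (one input, six hidden neurones, one output; 48 feedforward plus 6 feedback weights) is given below. The exact layout of the weight string and the form of the context update are assumptions made for illustration, and the linear activations correspond to the linear-plant experiments described later.

    import numpy as np

    class ElmanNetwork:
        # 6 input->hidden + 36 context->hidden + 6 hidden->output = 48
        # feedforward weights, plus 6 feedback weights (the alpha_i of
        # Fig. 13): 54 weights in total, as in the text.
        def __init__(self, weights):              # 54 decoded weight values
            w = np.asarray(weights, dtype=float)
            self.W_in = w[0:6]                    # input -> hidden
            self.W_ctx = w[6:42].reshape(6, 6)    # context -> hidden
            self.W_out = w[42:48]                 # hidden -> output
            self.alpha = w[48:54]                 # feedback: hidden -> context
            self.context = np.zeros(6)

        def step(self, u):
            h = self.W_in * u + self.W_ctx @ self.context  # linear neurones
            self.context = self.alpha * h                  # context copies hidden
            return float(self.W_out @ h)

For the non-linear plant, the hidden activations would pass through a hyperbolic tangent (np.tanh(h)), as stated in the experiments below.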
Note that, from the point of view of the TS algorithm, there is no difference between the feedback and feedforward connections: training one type of connection is carried out identically to training the other, unlike in the case of the commonly used BP training algorithm.

In the training stage, first a sequence of input signals u(k) (k = 0, 1, ...) is fed to both the plant and the recurrent network designed with the weights obtained from a non-tabu solution. Second, the rms error value between the plant and recurrent network outputs is computed by means of Eq. (2). Next, the rms error values computed for the solutions are used to select the highest evaluation weight set: the weight set with which the minimum rms error is obtained is selected as the highest evaluation weight set. From the point of view of optimisation, this is again a minimisation problem.

Simulations were conducted to study the ability of an RNN trained by the basic and the proposed TSs to model a linear and a non-linear plant. A sampling period of 0.01 s was assumed in all cases.

Linear plant: This is a third-order linear system described by the following discrete-time equation:

y(k) = A_1 y(k-1) + A_2 y(k-2) + A_3 y(k-3) + B_1 u(k-1) + B_2 u(k-2) + B_3 u(k-3), \qquad (3)

where A_1 = 2.627771, A_2 = -2.333261, A_3 = 0.697676, B_1 = 0.017203, B_2 = -0.030862 and B_3 = 0.014086. The Elman network with all linear neurones was tested. The training input signal u(k), k = 0, 1, ..., 199, was randomly produced and varied between -2.0 and 2.0. First, the results were obtained by assuming that only the feedforward connection weights were trainable. Second, the results were obtained by considering all connection weights of the Elman network trainable. For each case, the experiments were repeated six times with different initial solutions. The results obtained by using the BP algorithm and the basic and proposed TSs are given in Fig. 14. As an example, the responses of the plant and the network designed by the proposed TS are presented in Fig. 15. The average rms error values and the improvement percentages for the linear plant obtained by using the BP algorithm and the basic and proposed TS algorithms are presented in Table 3.

Fig. 14. rms error values obtained for the linear plant for six runs with different initial solutions.

Fig. 15. Responses of the plant and the network trained by the proposed TS (third-order linear plant, rms error = 6.481271E-03).

Table 3
Comparison of the results for the linear plant

Model                                  Average rms error   Improvement (%)
Back propagation (BP)                  7.67536E-02         —
Basic TS (α = 1)                       9.60774E-03         87.48
Basic TS (all weights trainable)       8.16635E-03         89.36
Proposed TS (all weights trainable)    6.46933E-03         91.57

Non-linear plant: The second plant model adopted for the simulations was that of a simple pendulum swinging through small angles (Liu, 1993). The discrete-time description of the plant is

y(k) = \left( 2 - \frac{\lambda T}{ML^2} \right) y(k-1) + \left( \frac{\lambda T}{ML^2} - 1 - \frac{gT^2}{L} \right) y(k-2) + \frac{gT^2}{6L}\, y^3(k-2) + \frac{T^2}{ML^2}\, u(k-2), \qquad (4)

where M stands for the mass of the pendulum, L its length, g the acceleration due to gravity, λ the friction coefficient, y the angle of deviation from the vertical position and u the external force exerted on the pendulum. The parameters used in this model were as follows:

T = 0.2 s, g = 9.8 m/s², λ = 1.2 kg m²/s, M = 1.0 kg, L = 0.5 m.

Replacing the parameters with their values in Eq. (4) gives

y(k) = A_1 y(k-1) + A_2 y(k-2) + A_3 y^3(k-2) + B_1 u(k-2), \qquad (5)
where A_1 = 1.04, A_2 = -0.824, A_3 = 0.130667 and B_1 = 0.16. The Elman network with non-linear neurones in the hidden layer was employed. The hyperbolic tangent function was adopted as the activation function of the non-linear neurones. The neural networks were trained using the same sequence of random input signals as mentioned above. As in the case of the linear plant, the results were obtained for six different runs with different initial solutions. The rms error values obtained by the BP algorithm and the basic and proposed TSs are presented in Fig. 16. As an example, the responses of the non-linear plant and the recurrent network with the weights obtained by the proposed TS are shown in Fig. 17. The average rms error values and the improvement percentages for the non-linear plant obtained by using the BP algorithm and the basic and proposed TSs are presented in Table 4. The proposed TS was the same as described in Section 3.2, and all parameters were kept constant except the evaluation numbers: the evaluation number for each level was 24,000, and as a result 168,000 evaluations were carried out in total.
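For reference, the non-linear plant of Eq. (5) can be simulated as below to generate training data. The input signal mirrors the uniformly random signal in [-2, 2] described for the experiments, and the sign of A_2 is the one recovered by substituting the parameter values into Eq. (4).

    import random

    def simulate_pendulum(u, y0=0.0, y1=0.0):
        # difference equation of Eq. (5); A2 = 0.96 - 1 - 0.784 = -0.824
        A1, A2, A3, B1 = 1.04, -0.824, 0.130667, 0.16
        y = [y0, y1]
        for k in range(2, len(u)):
            y.append(A1 * y[k-1] + A2 * y[k-2]
                     + A3 * y[k-2] ** 3 + B1 * u[k-2])
        return y

    u = [random.uniform(-2.0, 2.0) for _ in range(200)]  # training input
    y = simulate_pendulum(u)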
Fig. 16. rms error values obtained for non-linear plant for six different runs with different initial solutions.
Fig. 17. Responses of the plant and the network trained by the proposed TS (non-linear plant, rms error = 2.714256E-02).
Table 4
Comparison of the results for the non-linear plant

Model                                  Average rms error   Improvement (%)
Back propagation                       0.26182E00          —
Basic TS (α = 1)                       0.19589E00          25.18
Basic TS (all weights trainable)       0.11990E00          54.21
Proposed TS (all weights trainable)    2.83189E-02         89.18
5. Discussion

From the histograms obtained for the numeric test functions 1 and 3, it is seen that both the basic and the proposed TSs are able to find the optimum solution in all runs (see Figs. 3 and 5). The reason is that the first problem is a convex and continuous function, and the third is a convex and discrete function. For the rest of the numeric test problems, the basic TS, GA and TACO cannot reach the optimal solution in all runs. However, the proposed TS can find the optimum solution, or a solution very near to the optimum, in each run. From the histograms presented in Figs. 4 and 6–9, it can easily be concluded that, for all functions except F1 and F3, the proposed TS is able to find better solutions than the basic TS, GA and TACO. Moreover, Fig. 10 shows that the convergence speed of the proposed TS is significantly better than that of the other algorithms.

In this work, the original Elman network could identify the third-order linear plant successfully. Note that an original Elman network with an identical structure to that adopted here, but trained using the standard BP algorithm, had failed to identify even a second-order linear plant (Liu, 1993). Moreover, when the original Elman network had been trained by the standard GA, the third-order plant could not be identified, although the second-order plant had been identified successfully (Pham and Karaboga, 1999).

It can clearly be seen that, for both plants, the training was significantly faster when all connection weights of the network were trainable than when only the feedforward connection weights could be changed. Thus, by using the TS algorithm, not only was it possible and simple to train the feedback connection weights, but the training time required was also lower than when training the feedforward connection weights alone. It is clearly seen from Figs. 14–17 and Tables 3 and 4 that, for both network structures (with all connection weights trainable and with only the feedforward connection weights trainable), the proposed TS trained the networks better than the basic TS. The superior performance of the proposed TS is more evident in the case of the non-linear plant, although the performance of both TSs is similar for the linear plant.

From the point of view of GA, the proposed TS model has all the features of a GA, since it employs basic
operations of GA such as selection, crossover and mutation, and it also has a parallel structure. The selection operation is applied in two different ways: the first is the local elite selection applied during the TS phase to determine the next solution; the second is the global elite selection procedure applied to select the initial solutions of the TSs at the next level. After each level, a crossover operation is applied to create some of the initial solutions of the TSs at the next level. The operation used to create neighbours during the TS phase can be regarded as a mutation operation, given that a binary representation is employed. The selection of the bits to which mutation will be applied is carried out by the tabu list, and the frequency of the mutation operation is governed by the tabu constraints: if the tabu constraints are too strict, the frequency of the mutation operation will be low, and vice versa. Since the proposed TS combines the features of both GA and TS, it has the advantages of both algorithms.
6. Conclusion

In this study, a parallel TS algorithm based on the crossover operator of GAs was proposed. The performance of the proposed TS was compared to that of the basic TS, GA and TACO algorithms on numeric test problems. It was also applied to training a recurrent neural network to identify linear and non-linear plants, and the results obtained were compared to those produced by the basic TS and BP algorithms. From the simulation results, it was concluded that the proposed TS can search multi-modal search spaces successfully and can be applied efficiently to training recurrent neural networks to identify dynamic plants accurately, especially non-linear plants.
References

Abramson, D., 1991. Constructing school timetables using simulated annealing: sequential and parallel algorithms. Management Science 37, 98–113.
Aiex, M.R., Martins, S.L., Ribeiro, C.C., Rodriguez, N.R., 1998. Cooperative multi-thread parallel tabu search with an application to circuit partitioning. Lecture Notes in Computer Science, vol. 1457. Springer, Berlin, pp. 310–331.
Arifovic, J., Gencay, R., 2001. Using genetic algorithms to select architecture of a feedforward artificial neural network. Physica A 289, 574–594.
Battiti, R., Tecchiolli, G., 1995. Training neural nets with the reactive tabu search. IEEE Transactions on Neural Networks 6 (5), 1185–1200.
Battiti, R., Tecchiolli, G., 1996. The continuous reactive tabu search: blending combinatorial optimization and stochastic search for global optimization. Annals of Operations Research 63, 153–188.
Beyer, D.A., Ogier, R.G., 1991. Tabu learning: a neural network search method for solving nonconvex optimisation problems. Proceedings of the International Joint Conference on Neural Networks, IEEE and INNS, vol. 2, Singapore, pp. 953–961.
Blanco, A., Delgado, M., Pegalajar, M.C., 2000. A genetic algorithm to obtain the optimal recurrent neural network. International Journal of Approximate Reasoning 23, 67–83.
Blanco, A., Delgado, M., Pegalajar, M.C., 2001. A real-coded genetic algorithm for training recurrent neural networks. Neural Networks 14, 93–105.
Bland, J.A., 1994a. Tabu search approach to engineering optimisation. Proceedings of the Ninth International Conference on Applications of Artificial Intelligence in Engineering. Computational Mechanics Publications, Southampton, UK, pp. 423–430.
Bland, J.A., 1994b. A derivative-free exploratory tool for function minimization based on tabu search. Advances in Engineering Software 19, 91–96.
Castillo, P.A., González Peñalver, J., Merelo, J.J., Prieto, A., Rivas, V., Romero, G., 1999. SA-Prop: optimisation of multilayer perceptron parameters using simulated annealing. In: Mira, J., Sánchez-Andrés, J.V. (Eds.), Lecture Notes in Computer Science, vol. 1606. Springer, Berlin, pp. 661–670.
Castillo, P.A., Merelo, J.J., Prieto, A., Rivas, V., Romero, G., 2000. G-Prop: global optimisation of multilayer perceptrons using GAs. Neurocomputing 35, 149–163.
Chelouah, R., Siarry, P., 2000. Tabu search applied to global optimisation. European Journal of Operational Research 123, 256–270.
Corne, D., Dorigo, M., Glover, F. (Eds.), 1999. New Ideas in Optimization. McGraw-Hill, UK.
Costa, D., 1994. A tabu search algorithm for computing an operational timetable. European Journal of Operational Research 76, 98–110.
Crainic, T.G., Toulouse, M., 1997. Parallel metaheuristics. Research Report, Centre de Recherche sur les Transports, Université de Montréal, Canada.
Cvijovic, D., Klinowski, J., 1995. Taboo search: an approach to the multiple minima problem. Science 267, 664–666.
Davis, L. (Ed.), 1991. Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York.
De Jong, K.A., 1975. An analysis of the behaviour of a class of genetic adaptive systems. Ph.D. Thesis, University of Michigan.
Di Caro, G., Dorigo, M., 1998. Mobile agents for adaptive routing. In: Proceedings of the 31st Hawaii International Conference on System Sciences (HICSS-31), Hawaii, pp. 74–83.
Dorigo, M., Maniezzo, V., Colorni, A., 1991. Positive feedback as a search strategy. Technical Report No. 91-016, Politecnico di Milano.
Elman, J.L., 1990. Finding structure in time. Cognitive Science 14, 179–211.
Fanni, A., Manunza, A., Marchesi, M., Pilo, F., 1999. Tabu search metaheuristics for electromagnetic problems optimization in continuous domain. IEEE Transactions on Magnetics 34 (3), 1694–1697.
Franzè, F., Speciale, N., 2001. A tabu-search-based algorithm for continuous multiminima problems. International Journal for Numerical Methods in Engineering 50, 665–680.
Gambardella, L.M., Dorigo, M., 1997. HAS-SOP: hybrid ant system for the sequential ordering problem. Technical Report No. IDSIA-97-11, IDSIA, Lugano, Switzerland.
Gambardella, L.M., Taillard, E., Agazzi, G., 1999. MACS-VRPTW: a multiple ant colony system for vehicle routing problems with time windows. Technical Report IDSIA-06, Switzerland.
Glover, F., 1986. Future paths for integer programming and links to artificial intelligence. Computers and Operations Research 13 (5), 533–549.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimisation, and Machine Learning. Addison-Wesley, Reading, MA.
Hiroyasu, T., Miki, M., Ono, Y., Minami, Y., 2000. Ant colony for continuous functions. The Science and Engineering, Doshisha University, Japan.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.
Hopfield, J.J., Tank, D.W., 1985. Neural computation of decisions in optimisation problems. Biological Cybernetics 52, 141–152.
Hu, N., 1992. Tabu search method with random moves for globally optimal design. International Journal for Numerical Methods in Engineering 35, 1055–1070.
Kalinli, A., Karaboga, N., Karaboga, D., 2001. A modified touring ant colony optimisation algorithm for continuous functions. The 16th International Symposium on Computer and Information Sciences (ISCIS XVI), Işık University, Turkey, November 5–7, pp. 437–444.
Karaboga, D., 1994. Design of fuzzy logic controllers using genetic algorithms. Ph.D. Thesis, University of Wales College of Cardiff, Cardiff, UK.
Karaboga, D., Kalinli, A., 1996a. Tuning PID controller parameters using tabu search algorithm. IEEE International Conference on SMC, Beijing, China.
Karaboga, D., Kalinli, A., 1996b. A new model for tabu search algorithm. Proceedings of the First Turkish Symposium on Intelligent Manufacturing Systems, Sakarya University, Turkey, pp. 168–175.
Karaboga, D., Kaplan, A., 1995. Optimising multivariable functions using tabu search algorithms. The 10th International Symposium on Computer and Information Sciences, vol. 2, Turkey, pp. 793–799.
Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P., 1983. Optimisation by simulated annealing. Science 220, 671–680.
Ku, K.W., Mak, M.W., Siu, W.C., 1999. Adding learning to cellular genetic algorithms for training recurrent neural networks. IEEE Transactions on Neural Networks 10 (2), 239–252.
Laguna, M., Barnes, J.W., Glover, F., 1991. Tabu search methods for a single machine scheduling problem. Journal of Intelligent Manufacturing 2, 63–74.
Liu, X., 1993. Modelling and prediction using neural networks. Ph.D. Thesis, University of Wales College of Cardiff, Cardiff, UK.
Machado, J.M., Shiyou, Y., Ho, S.L., Peihong, N., 2001. A common tabu search algorithm for the global optimisation of engineering problems. Computer Methods in Applied Mechanics and Engineering 190, 3501–3510.
Malek, M., Guruswamy, M., Pandya, P., Owens, H., 1989. Serial and parallel simulated annealing and tabu search algorithms for the travelling salesman problem. Annals of Operations Research 21, 59–84.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of state calculation by fast computing machines. Journal of Chemical Physics 21, 1087–1091.
Michalewicz, Z., 1996. Genetic Algorithms + Data Structures = Evolution Programs. Springer, New York.
Ni, G., Yan, L., 1998. An universal tabu search algorithm for global optimization of multimodal functions with continuous variables in electromagnetics. IEEE Transactions on Magnetics 34 (5), 2901–2904.
Oliveira, A.C.M., Lorena, L.A.N., 2002. Real-coded evolutionary approaches to unconstrained numerical optimization. Terceiro Congresso de Lógica Aplicada à Tecnologia, São Paulo, 11–13 November.
Pham, D.T., Karaboga, D., 1999. Training Elman and Jordan networks for system identification using genetic algorithms. Journal of Artificial Intelligence in Engineering 13, 107–117.
Pham, D.T., Karaboga, D., 2000. Intelligent Optimisation Techniques: Genetic Algorithms, Tabu Search, Simulated Annealing and Neural Networks. Advanced Manufacturing Series, Springer, London.
Reeves, C.R. (Ed.), 1995. Modern Heuristic Techniques for Combinatorial Optimisation. McGraw-Hill, UK.
Rumelhart, D.E., McClelland, J.L. (Eds.), 1986. Parallel Distributed Processing: Exploration in the Microstructure of Cognition, vol. 1. MIT Press, Cambridge, MA.
Sexton, R.S., Gupta, J.N.D., 2000. Comparative evaluation of genetic algorithm and backpropagation for training neural networks. Information Sciences 129, 45–59.
Sexton, R.S., Alidaee, B., Dorsey, R.E., Johnson, J.D., 1998. Global optimisation for artificial neural networks: a tabu search application. European Journal of Operational Research 106, 570–584.
Siarry, P., Berthiau, G., 1997. Fitting of tabu search to optimize functions of continuous variables. International Journal for Numerical Methods in Engineering 40, 2449–2457.
Stützle, T., Dorigo, M., 1999. ACO algorithms for the quadratic assignment problem. In: Corne, D., Dorigo, M., Glover, F. (Eds.), New Ideas in Optimization. McGraw-Hill, New York, pp. 33–50.
Traferro, S., Uncini, A., 2000. Power-of-two adaptive filters using tabu search. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 47 (6), 566–569.
Van Laarhoven, P.J.M., Aarts, E.H.L., 1988. Simulated Annealing: Theory and Applications. Kluwer Academic Publishers, Dordrecht.
Wodrich, M., 1996. Ant colony optimization. Undergraduate thesis, Department of Electrical and Electronic Engineering, University of Cape Town, South Africa.