Optimization Via Evolutionary Processes

SRILATA RAMAN
Unified Design Systems Laboratory, Motorola Inc., Austin, Texas, USA

AND

L. M. PATNAIK
Microprocessor Applications Laboratory, Indian Institute of Science, Bangalore, India
Abstract

Evolutionary processes have attracted considerable interest in recent years for solving a variety of optimization problems. This article presents a synthesizing overview of the underlying concepts behind evolutionary algorithms, a brief review of genetic algorithms, and motivation for hybridizing genetic algorithms with other methods. Operating concepts governing evolutionary strategies and differences between such strategies and genetic algorithms are highlighted. Genetic programming techniques and their application are discussed briefly. To demonstrate the applicability of these principles, representative examples are drawn from different disciplines.
1. Introduction
   1.1 Simulated Annealing
   1.2 Evolutionary Algorithms
2. Evolutionary Strategies (ESs) and Evolutionary Programming (EP)
   2.1 Shortcomings of the (m + n)-ES
   2.2 Methods for Acceleration of Convergence
3. Genetic Algorithms (GAs)
   3.1 Selection Strategies Used in GAs
   3.2 Fitness Representation
   3.3 Parameters of a Genetic Algorithm
   3.4 Classification of Genetic Algorithms
   3.5 Implicit Parallelism in Genetic Algorithms
   3.6 GAs in Constrained Optimization
4. Extensions to Genetic Algorithms
   4.1 Generating the Initial Population
   4.2 Use of Subpopulations in Place of a Single Population
   4.3 Parallelism in Genetic Algorithms
   4.4 Hybrid Genetic Algorithms (HGAs)
   4.5 Use of Intelligent Operators
   4.6 Avoidance of Premature Convergence
   4.7 Messy Genetic Algorithms
   4.8 Parameterized Uniform Crossover
   4.9 Scaling of Fitness Values
   4.10 Adaptive Mutation
   4.11 GAs in Multimodal Function Optimization
   4.12 Coevolution, Parasites and Symbiosis
   4.13 Differences Between Genetic Algorithms and Evolution Strategies
   4.14 Reasons for Failure of Genetic Algorithms
5. Other Popular Search Techniques
   5.1 Population-based Incremental Learning (PBIL)
   5.2 Genetic Programming (GP)
   5.3 The Ant System
6. Some Optimization Problems
   6.1 Partitioning Problems
   6.2 The Traveling Salesman Problem
   6.3 VLSI Design and Testing Problems
   6.4 Neural Network Weight Optimization
   6.5 The Quadratic Assignment Problem
   6.6 The Job Shop Scheduling Problem (JSP)
7. Comparison of Search Algorithms
8. Techniques to Speed up the Genetic Algorithm
9. Conclusions
References
1. Introduction
In the fundamental approach to finding an optimal solution, a cost function is used to represent the quality of the solution. The objective function to be optimized can be viewed as a multidimensional surface where the height of a point on the surface gives the value of the function at that point. In the case of a minimization problem, the wells represent high-quality solutions while the peaks represent low-quality solutions. In the case of a maximization problem, the higher the point in the topography, the better the solution. Search techniques can be classified into three basic categories.

(1) Classical or calculus-based. These methods use a deterministic approach to find the best solution and require knowledge of the gradient or higher-order derivatives. They can be applied only to well-behaved problems.
(2) Enumerative. In these methods, all possible solutions are generated and tested to find the optimal solution. This requires excessive computation for problems involving a large number of variables.

(3) Random. Guided random search methods are enumerative in nature; however, they use additional information to guide the search process. Simulated annealing and evolutionary algorithms are typical examples of this class of search methods.

Evolutionary methods have gained considerable popularity as general-purpose robust optimization and search techniques. The failure of traditional optimization techniques in searching complex, uncharted and vast payoff landscapes riddled with multimodality and complex constraints has generated interest in alternative approaches. Interest in heuristic search algorithms with underpinnings in natural and physical processes began as early as the 1970s. Simulated annealing is based on thermodynamic considerations, with annealing interpreted as an optimization procedure. Evolutionary methods draw inspiration from the natural search and selection processes leading to the survival of the fittest. Both simulated annealing and evolutionary methods use a probabilistic search mechanism to locate the global optimum solution in a multimodal landscape. After we discuss the principles underlying Simulated Annealing (SA) and Evolutionary Algorithms, we present a brief survey of Evolutionary Strategies (ESs) and Evolutionary Programming (EP). This is followed by a brief review of Genetic Algorithms (GAs). We then discuss various extensions to GAs such as parallel GAs, hybrid GAs, adaptive GAs, and deceptive GAs. We also highlight other popular search techniques such as Genetic Programming (GP) and the ant system, and demonstrate the applicability of these methods. Diverse applications such as those encountered in partitioning, the traveling salesman problem, VLSI design and testing, neural network weight optimization, the quadratic assignment problem, and the job shop scheduling problem are explained. Prior to concluding this chapter with our brief observations on challenges and future prospects of this exciting area, we present a comparison of the various search algorithms and methods to speed up GAs.
1.1 Simulated Annealing
Annealing is the process of cooling a molten substance with the objective of condensing matter into a crystalline solid. Annealing can be regarded as an optimization process. The configuration of the system during annealing is defined by the set of atomic positions {r_i}. A configuration of the system is weighted by its Boltzmann probability factor, exp(-E(r_i)/kT), where E(r_i) is the energy of the configuration, k is the Boltzmann constant, and T is the temperature [45]. When a substance is subjected to annealing, it is maintained at each temperature for a time long enough to reach thermal equilibrium. The iterative improvement technique for combinatorial optimization has been compared to rapid quenching of molten metals. During rapid quenching of a molten substance, energy is rapidly extracted from the system by contact with a massive cold substrate. Rapid cooling results in metastable system states; in metallurgy, a glassy substance rather than a crystalline solid is obtained as a result of rapid cooling. The analogy between iterative improvement and rapid cooling of metals stems from the fact that iterative improvement accepts only those system configurations which decrease the cost function. In an annealing (slow cooling) process, a new system configuration that does not improve the cost function is accepted based on the Boltzmann probability factor of the configuration. This criterion for accepting a new system state is called the Metropolis criterion. The process of allowing a fluid to attain thermal equilibrium at a temperature is also known as the Metropolis process.
1.1.1 The Metropolis Procedure
The Metropolis procedure for a temperature T and starting state S is given below.

    Procedure METROPOLIS(S, T)
    begin
      repeat M times
      begin
        NewS := Perturb(S);
        delta_cost := C(NewS) - C(S);
        if (delta_cost < 0) or (random < exp(-delta_cost / T)) then
          S := NewS;
      end
    end
The terminology of the Metropolis procedure is as follows:

• M. A large positive integer constant which represents the number of times the system configuration is modified in an attempt to improve the cost function.
• Perturb. Given a system configuration S, the function "Perturb" returns a modified system configuration NewS.
• C. A function which returns the cost of the given system configuration S.
• delta_cost. The change in the cost function when the system configuration changes from S to NewS.
• random. A function which returns a random number in the range 0 to 1.
1.1.2 The Simulated Annealing (SA) Algorithm

The SA procedure is presented below. Simulated annealing essentially consists of repeating the Metropolis procedure at different temperatures. The temperature is gradually decreased at each iteration of the SA algorithm. Step S2 of the procedure SA is the "cooling step". The constant "alpha" used in the cooling step is less than unity; it is normally selected to be close to 1 so as to achieve the "slow cooling" effect.

    Procedure SA
    begin
      T := InitialTemperature;
      S := Initial system configuration;
      while (T > FinalTemperature) do
      begin
        S1: METROPOLIS(S, T);  /* thermal equilibrium at T */
        S2: T := T * alpha;    /* cool */
      endwhile
    endprocedure
The various parameters, such as the integer constant M, the initial temperature, the final temperature, and the value of the real constant "alpha", are selected based on some rule of thumb, experimental studies, or a theoretical basis. For practical implementations, the termination condition is modified as follows: the procedure SA terminates if b successive calls to METROPOLIS fail to modify the cost function.
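To make the procedure concrete, a minimal sketch of SA in Python follows. The one-dimensional cost function, the neighborhood move, and the values of M, alpha and b are illustrative assumptions, not values prescribed by the formulation above.

    import math
    import random

    def cost(s):
        return s * s + 10.0 * math.sin(s)       # illustrative multimodal cost function

    def perturb(s):
        return s + random.uniform(-1.0, 1.0)    # hypothetical neighborhood move

    def metropolis(s, t, m=100):
        changed = False
        for _ in range(m):
            new_s = perturb(s)
            delta_cost = cost(new_s) - cost(s)
            # Metropolis criterion: accept improvements outright, and worse
            # configurations with probability exp(-delta_cost / T).
            if delta_cost < 0 or random.random() < math.exp(-delta_cost / t):
                s, changed = new_s, True
        return s, changed

    def sa(t=100.0, alpha=0.9, b=5):
        s = random.uniform(-10.0, 10.0)
        failures = 0
        while failures < b:                     # practical termination condition
            s, changed = metropolis(s, t)
            failures = 0 if changed else failures + 1
            t = t * alpha                       # cooling step
        return s

    print(sa())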
1.1.3 Problems in the Original Formulation of SA

If the initial temperature is too low, the process gets quenched very soon and only a local optimum is found. If the initial temperature is too high, the process is very slow. Only a single solution is used for the search, and this increases the chance of the solution getting stuck at a local optimum. The changing of the temperature is based on an external procedure which is unrelated to the current quality of the solution; that is, the rate of change of temperature is independent of the solution quality. These problems can be rectified by using a population instead of a single solution. The annealing mechanism can also be coupled with the quality of the current solution by making the rate of change of temperature sensitive to the solution quality.
1.2 Evolutionary Algorithms
Many researchers have been inspired by nature's way of optimization through evolution. In their quest, they have devised three broadly similar methods: genetic algorithms (GAs), evolutionary strategies (ESs), and evolutionary programming (EP). All these methods are similar in the sense that they operate on a population of solutions, a population being a set of solutions. New solutions are created by randomly altering the existing solutions. A measure of performance is used to assess the "fitness" of each solution, and a selection mechanism is used to determine which solutions can be used as parents for the subsequent generation of solutions. The methods differ in how they model evolution and in the search operators they use. Darwinian evolution is intrinsically a robust search and optimization mechanism. The biological systems that it has evolved demonstrate complex behavior at every level (the cell, the organ, the individual, and the population). The evolutionary approach can be applied to problems where heuristic methods are not available or where heuristic methods generally lead to unsatisfactory results. The most widely accepted evolutionary theories are based on the neo-Darwinian paradigm. These theories assert that the history of life can be fully accounted for by the physical processes operating on and within populations and species. The methods differ in their emphasis on various types of search operators. Genetic algorithms emphasize models of genetic operators as observed in nature, such as crossover, inversion and mutation, and apply evolution at the level of chromosomes. Evolutionary strategies and evolutionary programming emphasize mutational transformations that maintain a behavioral linkage between each parent and its offspring, at the level of the individual or the species. Evolutionary strategies rely on deterministic selection, while evolutionary programming emphasizes the stochastic nature of selection by conducting a stochastic tournament among the parents and offspring; the probability that a particular trial solution will survive depends on the score it obtains in the competition.
2. Evolutionary Strategies (ESs) and Evolutionary Programming (EP)
The evolutionary algorithm, as applied to function optimization problems and discussed in [9], is as follows:

(1) Find the real-valued n-dimensional vector associated with the extremum of the function to be optimized.
(2) An initial population of P parent vectors is selected at random. The distribution of the initial trials is typically uniform due to the nature of the randomization function.
(3) An offspring vector is created from a parent by adding a Gaussian
random variable with zero mean and a predetermined standard deviation. This is done for all the P parent vectors.
(4) Selection then determines which of these solutions are to be maintained by calculating the errors of all the vectors. The P vectors that possess the least error become the new parents.
(5) This is repeated until a stopping criterion is satisfied.

Each component is viewed as a behavioral trait, not as a gene. It is assumed that whatever genetic transformations occur are the result of changes in the behavior of the individual. Evolutionary strategies rely on strict deterministic selection, whereas evolutionary programming uses a probabilistic selection mechanism, conducting a stochastic tournament to determine the population for the subsequent generations. The probability of survival of a particular solution depends on its rank in the population; thus, the selection in EP emphasizes global exploration. ES abstracts coding structures as analogues of individuals, while EP abstracts the structures as analogues of distinct species. Thus ES may use recombination operators to obtain new individuals, but these are not used in EP, as there is no sexual communication between species. In ES, most often, multiple offspring are generated from each parent, as opposed to a single offspring in EP. In the basic EP algorithm, the mutation operator is applied to each parent to get one offspring. The parents and offspring compete for selection. For every individual j, some c solutions are selected randomly from the mating pool (which includes both the parents and the offspring). Let q be the number of these solutions whose fitness is worse than that of j; q is recorded as the number of wins for j. This number is computed for all the solutions in the population, and solutions characterized by high win counts are selected as the new parents for the next generation. This is stochastic tournament selection [15, 46]. Let m parents generate n offspring during each generation. In the (m + n)-ES, both the m parents and the n offspring compete for survival, and only the best m (the population size) survive. The parents are allowed to exist until better children supersede them, which may cause some super-fit individuals to survive forever. In the (m, n)-ES, only the n offspring compete, and the best m survive to the next generation.
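The following Python sketch illustrates the stochastic tournament just described; the pool size, the value of c, the number of survivors, and the squared-error fitness are illustrative assumptions.

    import random

    def ep_tournament(pool, fitness, c=10, survivors=20):
        # The pool holds both parents and offspring; each solution meets c
        # randomly chosen opponents and scores a "win" for every opponent
        # with worse (here: larger) error.
        scored = []
        for x in pool:
            opponents = random.sample(pool, c)
            wins = sum(1 for y in opponents if fitness(x) < fitness(y))
            scored.append((wins, x))
        # Solutions with the highest win counts become the new parents.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [x for _, x in scored[:survivors]]

    pool = [random.uniform(-5, 5) for _ in range(40)]
    parents = ep_tournament(pool, fitness=lambda x: x * x)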
2.1 Shortcomings of the (m + n)-ES

(1) For the class of problems where the optimum value changes with time, the algorithm tends to get stuck at an outdated good solution if the parameters cannot help the algorithm jump to the new area [12].
(2) The same problem can be seen if the measurement of fitness or the adjustment of object variables is prone to noise.
(3) With m/n > P (the probability of a successful mutation), there is a deterministic selection advantage for those offspring which reduce some of their variances.

A formal description of the evolutionary process is given in [1]. We can immediately recognize two state spaces: a genotypic (or coding) state space G and a phenotypic (or behavioral) state space P. Two alphabets are defined: an input alphabet I of environmental symbols and an output alphabet Z of behavioral responses. Evolution within a single generation can be explained as follows. Consider a population of genotypes G_i. Genetics plays a major role in the development of complex phenotypes. Cell development dependent on the local environment is called epigenesis. The process can be explained by the use of four mappings. The first mapping, epigenesis, incorporates the rules of growth under local conditions; it is represented by f1: I x G -> P. The second mapping, selection, describes the process of selection, emigration and immigration within the populations; it is represented by f2: P -> P. The third mapping, representation, describes the genotypic representation within the population; it is represented by f3: P -> G. The fourth mapping, mutation, describes the random changes that occur in the genetic material of the population; it is represented by f4: G -> G.
2.2 Methods for Acceleration of Convergence

One way to achieve quick convergence is to decrease the variance of the Gaussian mutation, especially as optimality is approached. In the initial stages, gross optimization occurs very fast, the rate being proportional to the slope of the objective function. As optimality is approached, the surface begins to flatten. The search must then be confined to a small area of the surface around the optimal region, and large variations in the population must be avoided. Evolutionary algorithms have also been applied to problems with many constraints [34]. In such problems, penalties may be used to penalize those solutions which do not satisfy certain given constraints. In addition, the number of constraints that are violated can be taken as an additional quantity to be minimized.
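The variance-reduction idea can be sketched in a few lines of Python; the decay factor, the sphere fitness function, and the simple survival rule are illustrative assumptions.

    import random

    def fitness(v):
        return sum(x * x for x in v)    # illustrative: minimize the sphere function

    parent = [random.uniform(-5.0, 5.0) for _ in range(10)]
    sigma = 1.0                         # initial (large) mutation spread
    for generation in range(500):
        child = [x + random.gauss(0.0, sigma) for x in parent]
        if fitness(child) < fitness(parent):
            parent = child              # keep the better of parent and child
        sigma *= 0.995                  # shrink the search neighborhood as optimality nears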
3. Genetic Algorithms (GAs)

Holland [31] designed a new class of search algorithms, called genetic algorithms, in which some of the principles of the natural evolution process
were incorporated. Genetic algorithms (GAs) are stochastic search algorithms based on the principles of biological evolution. Even though the mechanisms of the evolution process are not fully understood, some of its features have been observed. Evolution takes place on thread-like objects called chromosomes. All living beings are created through a process of evolution on these chromosomes. The traits and features of creatures are embedded in the chromosomes and are passed down to posterity. Natural selection is the link between chromosomes and the performance of the decoded structures, and the process of reproduction is the basis for evolution. Reproduction takes place through recombination and mutation. Recombination creates different chromosomes by combining the chromosome material of the two parents, and mutation causes the generated offspring to be different from the parents. Genetic algorithms have been shown, theoretically and empirically, to perform robust search in complex spaces. Many research papers and dissertations have established the applicability of GAs in function optimization, control systems, combinatorial optimization, neural networks, and a class of engineering applications. GAs are not limited by restrictive assumptions (such as continuity and the existence of derivatives) about the search space. GAs differ from other optimization and search procedures in four ways:

(1) GAs work with a coding of the parameter set, not with the parameters themselves.
(2) GAs search using a set of points (called a population).
(3) GAs use a pay-off (objective) function.
(4) GAs use probabilistic transition rules.
In order to solve a problem using a GA, the following issues have to be considered:

• encoding or representation of a solution;
• generation of initial solutions;
• an evaluation function;
• a set of genetic operators;
• selection of GA parameters;
• termination conditions.
In the simple genetic algorithm (SGA), the main loop in the algorithm is executed once for each generation. In each generation, the algorithm calculates the fitness value of each individual in the population, selects fitter individuals for reproduction, and produces offspring using crossover and mutation operators. Selection, crossover, and mutation are the basic search
operators of a GA. The time steps (iterations) for evolution in a GA are called generations. The genetic algorithm, or the simple genetic algorithm (SGA), proposed by Holland [31] is as follows:

    Initialize population
    Evaluate population
    While termination condition is not reached
      Select solutions for the next generation
      Perform crossover and mutation
      Evaluate population
    EndWhile
    Stop
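As an illustration, a minimal Python sketch of the SGA loop is given below. The bit-string encoding, the ones-counting fitness function, and all parameter values are illustrative assumptions.

    import random

    L, N, PC, PM, GENERATIONS = 20, 50, 0.6, 0.01, 100

    def fitness(ind):
        return sum(ind)                      # illustrative: maximize the number of ones

    def select(pop):
        # Roulette wheel selection: probability proportional to fitness.
        total = sum(fitness(i) for i in pop)
        weights = [fitness(i) / total for i in pop]
        return random.choices(pop, weights=weights, k=1)[0]

    def crossover(a, b):
        if random.random() < PC:
            p = random.randint(1, L - 1)     # one-point crossover site
            return a[:p] + b[p:], b[:p] + a[p:]
        return a[:], b[:]

    def mutate(ind):
        return [1 - g if random.random() < PM else g for g in ind]

    pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(N)]
    for _ in range(GENERATIONS):
        nxt = []
        while len(nxt) < N:
            c1, c2 = crossover(select(pop), select(pop))
            nxt += [mutate(c1), mutate(c2)]
        pop = nxt[:N]
    print(max(pop, key=fitness))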
3.1 Selection Strategies Used in GAs
The various selection strategies commonly used in GAs, as described in [46], are as follows:

(1) Roulette wheel selection. A biased roulette wheel is used where the size of each slot is proportional to the percentage of the total fitness assigned to a particular string. The wheel is spun a number of times equal to the population size, and the string pointed to by the roulette marker is selected each time.

(2) Stochastic remainder selection. In the above selection process, highly fit structures may not get selected due to the probabilistic nature of the selection process. In this method, such strings always get selected. The expected number of copies for an individual is E, where

    E = (F_i / F̄) x Pop. Size

where F_i is the fitness of the individual, F̄ is the average fitness, and Pop. Size is the size of the population. Each string is allocated ⌊E⌋ copies. The remainder of the mating pool is selected by either of the two methods described below.

(a) Stochastic remainder with replacement selection. The rest of the pool is selected using roulette wheel selection, with the fractional parts of the expected counts as the weights for the wheel.

(b) Stochastic remainder without replacement selection. The fractional parts are treated as probabilities, and weighted coin tosses are performed to complete the pool.
(3) Stochastic universal selection. A weighted roulette wheel is used in this case as well. Along the boundary of the wheel, markers are placed at regular intervals corresponding to the average fitness of the population. Each time the wheel is spun, the number of markers within the slot of an individual determines the number of copies given to that individual.

(4) Tournament selection. A random group of size G is selected, and the best individual in this group is selected for the pool. This is repeated until the pool is full. A group size of G = 2 has been found to give good results.

(5) Rank-based selection. All individuals are ranked according to their fitness, with the best (highest ranked) individual first. A non-increasing function is used to assign copies to the individuals. Proportionate selection is then performed to fill the mating pool.
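Of these strategies, stochastic universal selection is perhaps the least obvious, so a short Python sketch follows: a single spin places equally spaced markers around the weighted wheel. The fitness values here are illustrative.

    import random

    def stochastic_universal_selection(pop, fitnesses, n):
        total = sum(fitnesses)
        step = total / n                     # marker spacing (average fitness when n = len(pop))
        start = random.uniform(0, step)      # one spin of the wheel
        markers = [start + i * step for i in range(n)]
        chosen, cumulative, idx = [], 0.0, 0
        for ind, f in zip(pop, fitnesses):
            cumulative += f                  # each individual's slot is [prev, cumulative)
            while idx < n and markers[idx] <= cumulative:
                chosen.append(ind)           # one copy per marker inside the slot
                idx += 1
        return chosen

    pop = list(range(10))
    fits = [random.uniform(1, 10) for _ in pop]
    mating_pool = stochastic_universal_selection(pop, fits, n=10)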
3.2 Fitness Representation
Assignment of fitness values to individuals can be done in many ways, depending on the problem. Methods of assigning fitness values, in the context of genetic programming, are discussed in [15]. These methods are also applicable to genetic algorithms and are discussed below.

(1) Raw fitness (R_f). This is the fitness value stated in the natural terminology of the problem itself. If the raw fitness value is specified in terms of the error in the solution, raw fitness is to be minimized; otherwise it is to be maximized.

(2) Standardized fitness (S_f). This is defined such that it is always minimized. If R_f is the error, S_f = R_f; else S_f = R_max - R_f. Here, R_max is chosen such that the best value of S_f is zero.

(3) Adjusted fitness (A_f). This is computed as

    A_f = 1 / (1 + S_f).

It always lies between 0 and 1, and it is a non-linear increasing function which is maximized.

(4) Normalized fitness (N_f). This is equal to

    N_f = A_f / T

where T = Σ A_f over the population. It also lies between 0 and 1 and is an increasing function. The total of all the normalized fitness values is 1.
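The following short Python sketch traces these transformations for the case where raw fitness is an error to be minimized (so standardized fitness equals raw fitness); the raw values are illustrative.

    def adjusted(s):
        return 1.0 / (1.0 + s)              # A_f: lies in (0, 1], maximized

    def normalized(adjusted_values):
        t = sum(adjusted_values)
        return [a / t for a in adjusted_values]   # N_f: sums to 1

    raw = [4.0, 1.0, 0.0, 9.0]              # errors: smaller is better
    standardized = raw                      # best standardized value is zero
    adj = [adjusted(s) for s in standardized]
    norm = normalized(adj)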
3.3 Parameters of a Genetic Algorithm

A GA can be formally described by six parameters [11].

(1) N, the population size.
(2) C, the crossover rate. In each population, N x C structures undergo crossover.
(3) M, the mutation rate. M x N x L mutations occur in every generation. Here, L is the length of the coding structure.
(4) G, the generation gap. This controls the percentage of the population replaced in every generation. N x (1 - G) structures survive intact across generations.
(5) W, the scaling window. The objective function is specified by U(x) = F(x) - F_min, where F_min is the minimum value that the fitness function F(x) can take. W determines how F_min is updated: as generations proceed, F_min is updated to the minimum fitness value obtained in the last W generations.
(6) S, the selection strategy used to determine the mating pool.
Numerous books and articles have been published on genetic algorithms. The interested reader can refer to [24,31,36,43,46,47,48] for a further understanding of the underlying principles of genetic algorithms.
3.4 Classification of Genetic Algorithms

Genetic algorithms can be classified based on the following characteristics [3]:

(1) Selection method used.
(2) Dynamic vs static. In dynamic GAs, the actual fitness values are used for selection; in static GAs, only the rank of the individuals is used.
(3) Extinctive vs preservative. The preservative approach guarantees non-zero probabilities of selection to every individual, while the extinctive approach ensures that some individuals are definitely not allowed to generate offspring.
(4) Left vs right extinctive. In the right extinctive method, the worst performing individuals are not allowed to live; in the left extinctive approach, some of the best performing individuals are prevented from reproducing in order to prevent premature convergence.
(5) Elitist vs pure. In pure GAs, members have a lifetime of one generation. Elitist GAs provide unlimited lifespans to very good individuals.
(6) Generational vs steady state (on the fly). In generational GAs, the set of parents is fixed until all the offspring have been generated,
unlike in steady state GAs, where the offspring immediately replace the parent if the offspring perform better than the parent.
3.5 Implicit Parallelism in Genetic Algorithms
GAs work on the basis of the schema theorem [31]. According to this theorem, each member of the population is simultaneously an instance of many schemata. The schemata are the building blocks which are tested implicitly for their fitness values. As generations proceed, high-fitness low-order schemata are combined to form high-order schemata. The property of implicitly searching all the schemata to which an individual belongs is called implicit parallelism in GAs. The number of schemata processed implicitly has been shown by Holland [31] to be of the order of k^3, where k is the population size. The population size is taken to be equal to c x 2^l, where l is the length of the encoding and c is a small integer. The implication of this result is that by having a population of only n strings, the algorithm is able to search of the order of n^3 schemata. The result has been extended and generalized in [4] to a population of size k = 2^(βl), where β is a positive parameter (Holland's result is a special case when β = 1). It is shown there that the lower bound on the number of schemata tested is a monotonically decreasing function of the population size and β; by assigning values to β, it is found that increasing β drastically reduces the order of the lower bound.

An analysis of genetic algorithms with respect to their convergence is given in [MI. Finite Markov chains are used to prove whether or not canonical GAs converge. Since the state of the GA depends only on the genes of the individuals, it is represented as a probability matrix as required by the analysis. It is proved that a GA with crossover and mutation probabilities in the range [0,1] and using proportional selection does not converge to the global optimum. It is also proved that if the same algorithm maintains the best solution over generations (an elitist strategy), it is guaranteed to converge.
3.6 GAs in Constrained Optimization
When GAs are applied to constrained optimization problems, it is seen that simple crossover and mutation often produce individuals that are invalid. To overcome this problem, three methods are often used.

(1) The GA operators are modified so that they produce only valid solutions. These modified operators must ensure exponentially increasing copies of good solutions for the GA to work according to the schema theorem. One drawback of this approach is that the operators must be redefined for each new problem.
(2) The normal GA operators are retained in the second method. Any invalid solutions produced are penalized using penalty functions. These penalty functions must ensure that the GA does not converge to invalid solutions.

(3) In the third method, too, the normal GA operators are retained. Invalid solutions are repaired and converted to valid solutions before being evaluated in the algorithm.
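A minimal sketch of the second (penalty-function) method is given below; the objective, the constraint, and the penalty weight are illustrative assumptions.

    def objective(x, y):
        return x * x + y * y                      # illustrative objective to minimize

    def constraint_violation(x, y):
        # Hypothetical constraint x + y >= 1; the violation is its shortfall.
        return max(0.0, 1.0 - (x + y))

    def penalized_fitness(x, y, weight=1000.0):
        # Invalid solutions keep a fitness value but are penalized heavily
        # enough that the GA does not converge to them.
        return objective(x, y) + weight * constraint_violation(x, y)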
4. Extensions to Genetic Algorithms

4.1 Generating the Initial Population [5]
Instead of the initial population being generated once randomly, each member of the population is taken as the best of n randomly generated individuals, where n is a parameter defined by the user. This can be seen as a generalization of the usual method, which corresponds to n = 1. When a GA was used to optimize a 10-dimensional function with this scheme, it was found that 14% of the function evaluations were spent in determining the initial population.
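A sketch of this initialization scheme in Python follows; the bit-string representation, the ones-counting fitness, and the values of n and the population size are illustrative.

    import random

    def best_of_n(n, length, fitness):
        # Generate n random candidates and keep only the fittest one.
        candidates = [[random.randint(0, 1) for _ in range(length)]
                      for _ in range(n)]
        return max(candidates, key=fitness)

    population = [best_of_n(5, 20, fitness=sum) for _ in range(50)]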
4.2 Use of Subpopulations in Place of a Single Population [2]
Instead of having a single population that proceeds through generations, the initial population is divided into a number of subpopulations. Each of these subpopulations proceeds independently for some generations until some criterion is met. This duration is called an epoch. At the end of every epoch, some individuals, normally the best ones, are exchanged with the neighboring subpopulations. This is continued for some epochs, at the end of which the subpopulations are merged into a single population and the GA proceeds with a single population. This scheme lends itself to many variations (a sketch of the basic scheme appears at the end of this subsection).

(1) The criterion for the duration of an epoch can be just a fixed number of generations, or an epoch can end when the subpopulations tend to saturate.
(2) The number of individuals exchanged can be made a parameter of the algorithm.
(3) All the subpopulations can be combined together in one generation, or they can be merged gradually over many generations.
The main motivation for the use of subpopulations can be explained using a biological metaphor. Isolated populations tend to evolve their own distinctive characteristics depending on the local environments. It has been seen that when these individuals are put in a new environment, there is rapid evolution of the species: changes due to the new environment cause improvements in an individual's traits, which are rewarded. This is called punctuated equilibria and is explained in [28].
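The following Python sketch outlines the epoch-and-migration scheme referred to above; the epoch length, the number of migrants, the ring-shaped neighborhood, and the placeholder evolve step are illustrative assumptions.

    import random

    def evolve(subpop):
        # Placeholder for one generation of selection/crossover/mutation.
        return subpop

    def migrate(subpops, n_migrants=2):
        # Send each subpopulation's best members to its ring neighbor,
        # where they replace the neighbor's worst members.
        for i, sp in enumerate(subpops):
            neighbor = subpops[(i + 1) % len(subpops)]
            best = sorted(sp, key=sum, reverse=True)[:n_migrants]
            worst = sorted(range(len(neighbor)), key=lambda j: sum(neighbor[j]))
            for j, m in zip(worst, best):
                neighbor[j] = m[:]

    subpops = [[[random.randint(0, 1) for _ in range(20)] for _ in range(25)]
               for _ in range(4)]
    for epoch in range(10):
        for _ in range(50):                      # one epoch of independent evolution
            subpops = [evolve(sp) for sp in subpops]
        migrate(subpops)
    merged = [ind for sp in subpops for ind in sp]   # final single population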
4.3 Parallelism in Genetic Algorithms
From the above, it is evident that the subpopulations can evolve independently and in parallel. The evaluation of the subpopulations can go on simultaneously, interaction being needed only between epochs. On the other hand, even with a single population, evaluation of the different individuals can take place simultaneously and independently of one another. After the complete population has been evaluated, the results are collected and the genetic operators are applied to obtain the next generation. Another method to implement parallelism that has been proposed [19] is that of a master-slave parallel algorithm. Here, subpopulations evolve independently, and at the same time a complete population (of size N) also evolves. If there are M subpopulations, then each will have a size of N/M. At the end of a specific number of generations, all the subpopulations are put together. N/2 individuals are chosen from the complete population and N/2 members are selected from the subpopulations for the next generation. The different methods of parallelizing GAs are explained in [40].

• In the synchronous master-slave model, the master processor controls the mating and selection processes, while the slave processors perform the evaluation in parallel. One of the main disadvantages of this method is that it depends heavily on the master processor to generate the new population. If the master processor fails for some reason, even though all the other processors are functioning, the algorithm cannot proceed. The efficiency of this method decreases if the evaluation time varies from individual to individual.
• In the semi-master-slave model, the synchronization constraints are reduced. This model is able to overcome the disadvantages of the previous model.
• In the distributed asynchronous concurrent model, each processor performs mating and selection independently. The population information is stored in a common memory shared among all the processors.
• In the network model, each processor runs a GA independent of the others. Each processor has its own memory. The best members are occasionally broadcast to the neighboring populations.
4.4 Hybrid Genetic Algorithms (HGAs)
In HGAs, the genetic algorithm is supplemented with a local search mechanism such as simulated annealing [45] or tabu search [14]. The main motivation for such hybridization is that GAs tend to improve the average fitness of the population rather than finding the global optimum. Thus GAs are able to find regions of good solutions very quickly but take a relatively long time to find the global optimum. It has been found fruitful to allow local searches to find the global optimum within good solution areas [14].

4.4.1 Tabu Search
Tabu search [14] is a local search mechanism which is often used to try to improve the solution obtained by the genetic algorithm. The tabu search algorithm is explained as follows. The search starts with a random solution string. Then successive 2-opt exchanges are made to improve the solution. During a 2-opt exchange, two randomly chosen bits in the individual are exchanged. This is repeated to get many such randomly mutated copies of the original individual. All these copies are examined to determine their fitness, and the copy which improves the solution the most is selected to replace the original solution. This is continued until some stopping criterion is reached; the criterion may be based on the number of copies generated or on finding a satisfactory result. To escape from local minima, exchanges which result in a deterioration of the quality of the solution are also allowed. To prevent being pulled back into a local minimum by successive improvements in its direction, a list of exchanges that must be disallowed is maintained as a tabu list. In one implementation, the GA and the local search are run alternately: the results of one generation of the GA are improved and handed back to the GA for one more generation. The complete population can be improved by the local search algorithm, or only the best members of the GA can be given to the local search for optimization. Alternatively, the GA is run until it can proceed no further, after which the local search completely takes over and tries to improve the result. Instead of using only one local search algorithm, many algorithms can be used in tandem to prevent bias towards any one algorithm, as each will have its own effective search area.
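A sketch of this tabu search scheme in Python is given below; the move budget, the tabu list length (tenure), the number of copies per step, and the ones-counting fitness are illustrative assumptions.

    import random
    from collections import deque

    def two_opt_exchange(s):
        # Exchange two randomly chosen bit positions; return the new
        # string and the move that produced it.
        i, j = random.sample(range(len(s)), 2)
        t = s[:]
        t[i], t[j] = t[j], t[i]
        return t, (min(i, j), max(i, j))

    def tabu_search(s, fitness, iterations=200, copies=20, tenure=10):
        best = s[:]
        tabu = deque(maxlen=tenure)          # recently used exchanges are disallowed
        for _ in range(iterations):
            candidates = [two_opt_exchange(s) for _ in range(copies)]
            candidates = [c for c in candidates if c[1] not in tabu]
            if not candidates:
                continue
            # Accept the best non-tabu neighbor, even if it is worse than
            # the current solution: this is how the search escapes local minima.
            s, move = max(candidates, key=lambda c: fitness(c[0]))
            tabu.append(move)
            if fitness(s) > fitness(best):
                best = s[:]
        return best

    start = [random.randint(0, 1) for _ in range(20)]
    result = tabu_search(start, fitness=sum)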
4.5 Use of Intelligent Operators
Rather than using the standard genetic operators, some problem-specific information can be incorporated into them to improve the hill-climbing nature
of the genetic algorithm. In some problems, the individuals in the population must satisfy some conditions to be accepted as valid. The standard operators may not be directly applicable because random crossover and mutation points may render the string invalid. Thus, the operators must maintain valid members while preserving the building blocks. In the case of the traveling salesman problem, for an individual to be considered legal, it must contain all the cities once and only once. If standard crossover and mutation operators are used, such conditions might not always hold. The operators must thus preserve such a criterion across generations.
4.6 Avoidance of Premature Convergence
This has been a problem, as early convergence of the population often results in the GA getting stuck in a local optimum. To prevent this, many options have been proposed [7]. Some of these approaches are given below. A preselection mechanism suggested by Cavicchio replaces the parents with their offspring. In De Jong's crowding scheme, the offspring replace the closest matching string from a randomly drawn subpopulation, the size of which is specified as a parameter called the crowding factor. A uniqueness operator was developed by Mauldin; this uses a censorship operator which allows an offspring to survive only if it is different from all the other members of the current population. A sharing function was used by Goldberg, where selection probabilities are determined taking into account the average fitness of similar individuals.
4.7 Messy Genetic Algorithms
In order to obtain better convergence properties, and to ensure good results even if the linkages between the building blocks are weak, Goldberg et al. [10] proposed the messy GA. Messy GAs (MGAs) use variable-length coding which may have too many or too few bits with respect to the problem being solved. In traditional GAs, strings have all the genes, and thus each string is a solution to the problem. In messy GAs, all the genes need not be present in a string. In some cases, due to the usage of too many bits to represent the string (overspecification), many unwanted bits may also be present in the individual. Thus, in MGAs, two structures need not have any genes in common, and variable-length operators are used, unlike the fixed-length crossover in standard GAs. The cut operator cuts a string with a probability equal to (l - 1) x P_c, where P_c is the probability of the cut operator and l is the length of the string. The splice operator joins strings with a probability P_s.
The evolution of the population takes place in two stages. The first, called the primordial phase, consists of a number of generations where fixed-length individuals are improved through reproduction without any other operators. The population is halved periodically. The second, called the juxtapositional phase, uses cut, splice and other genetic operators on a fixed population size. In cases of underspecification, a method of competitive templates is used to fill the remainder of the structure with locally optimal structures. To remove any inconsistencies due to the variable-length coding, two mechanisms have been proposed.
(1) Genetic thresholding allows two structures to compete only if they have more than some threshold of common genes.
(2) Tie breaking is used to prevent parasitic bits from tagging along with low-order building blocks and obstructing the formation of higher-order blocks.
4.8 Parameterized Uniform Crossover
Simple uniform crossover has been parameterized by Spears and De Jong [20]. Though uniform crossover produces offspring which differ considerably from either parent (due to the large disruptive power of the operator), its advantages are numerous. It is simple, having only one crossover form; more importantly, its disruptive power is independent of the length of the coding used for the individuals. Its explorative power is also very useful in the initial stages of the search, when the operator distributes the members over a large area of the state space. In simple uniform crossover, the probability of swapping bits is fixed at 0.5. This probability is parameterized by a variable P0 to obtain an unbiased crossover operator which can easily be controlled by a single parameter. The importance of coding-length independence is highlighted when many additional fake bits are added to the encoding of a simple optimization problem [20]: the results showed that though the performance of two-point crossover worsened (due to the fake bits), there was no change in the performance of uniform crossover.
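The operator itself is compact, as the following Python sketch shows; the value of P0 is illustrative (P0 = 0.5 recovers simple uniform crossover, and smaller values make the operator less disruptive).

    import random

    def parameterized_uniform_crossover(a, b, p0=0.2):
        # Swap each bit position independently with probability p0.
        c1, c2 = a[:], b[:]
        for i in range(len(a)):
            if random.random() < p0:
                c1[i], c2[i] = c2[i], c1[i]
        return c1, c2

    x = [0] * 10
    y = [1] * 10
    child1, child2 = parameterized_uniform_crossover(x, y)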
4.9 Scaling of Fitness Values
Sometimes, a few individuals are generated whose fitness values are much higher than those of the rest of the population. These super-fit individuals are given many more copies in the next generation when compared to the rest of the population. This results in their dominating the population and forcing
premature convergence. Scaling also becomes necessary if the objective function has negative values and proportional selection is used.
4.10 Adaptive Mutation
The application of the mutation operator as used in evolutionary strategies (ESs) and genetic algorithms (GAs) is discussed in [13]. In ESs, the probability of mutation (P_m) is encoded as part of the individual's string. During the process of generating offspring, P_m is affected by the recombination and mutation operators. The mutation scheme first mutates the mutation rates; then these rates are used to mutate the members of the population. In traditional GAs, P_m is usually specified as a parameter. If adaptive mutation is used, P_m is also encoded as part of the member's string, as is done in ESs.
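A sketch of this self-adaptive scheme follows; the lognormal meta-mutation of P_m and its bounds are illustrative assumptions in the ES spirit, not the exact rule of [13].

    import random

    def adaptive_mutate(genes, pm):
        # First mutate the mutation rate itself, then apply the new rate
        # to the object variables (here, bits).
        new_pm = min(1.0, max(0.001, pm * random.lognormvariate(0.0, 0.2)))
        new_genes = [1 - g if random.random() < new_pm else g for g in genes]
        return new_genes, new_pm

    individual = ([random.randint(0, 1) for _ in range(20)], 0.05)
    individual = adaptive_mutate(*individual)   # genes and P_m evolve together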
4.11 GAs in Multimodal Function Optimization
It is seen that GAs always converge to a single solution irrespective of the number of good solutions present in the function. This is termed genetic drift [36]. Obtaining many solutions, instead of allowing the GA to converge to a single solution, is discussed in [35]. Many schemes have been proposed to maintain the diversity of solutions so as to find more than one solution in a multimodal function optimization problem. The concept of a niche is often used: each peak in the fitness landscape is thought of as a subspace, or niche. The biological metaphor for a niche is explained as follows. Each niche is an independent environment which can sustain some individuals depending on the fertility of the environment. The number of individuals that a niche can support is called its carrying capacity. If there are too many individuals in a niche, the weaker among them tend to die; if there are too few individuals, they are capable of exploiting the resources in the niche. The carrying capacity of a particular niche, or peak, depends on its fitness relative to the other peaks. The idea is to populate each niche with a small number of individuals so that they can find the best solution in that niche. In this way, many solutions can be obtained rather than a single solution. This entails maintaining the diversity of the population as generations proceed. One method to preserve diversity is crowding, proposed by De Jong [36]. Here, premature convergence is reduced by minimizing the change in the population between generations. Once offspring have been generated, a certain number of individuals are selected randomly from the population, and the offspring replace the most similar individuals in the selected group. Similarity of
individuals is decided based on some similarity metric, which may be domain independent (such as the Hamming distance between the strings) or problem specific. The disadvantage of this method is that not much exploration is done by the members of the population. Another method, similar to De Jong's, is proposed in [37]. In this method, domain-specific similarity metrics are used, and instead of the offspring replacing an individual from a randomly chosen group, replacement is limited only to the parents. The child replaces the parent if it has a higher fitness than that of the parent. This method is termed deterministic crowding. Another method, called sharing, is discussed in [31]. In this method, the fitness of similar individuals is reduced in order to give preference to a solution that explores some new region of the space. A new method, called dynamic niching, is introduced in [35]; its main advantage is reduced computation time as compared to sharing. In addition to the methods stated above, some restrictions are also imposed during reproduction in order to maintain diversity, and these are as follows. In one case, once an individual has been selected for crossover, the other one is selected from only those members in the same niche as the first member. In another method, called line breeding, a good solution is repeatedly mated with others in the same niche. In inbreeding, members mate with other members in the same niche; if, after many generations, the average fitness of the niche does not increase, cross-breeding with members of other niches is allowed. These mating restrictions, coupled with the methods to maintain diversity, have been used for optimization of the problems in [37] and are described in [35]. The concept of a gene-invariant genetic algorithm (GIGA) is presented in [38]. GIGAs are a subclass of GAs in which the genetic structure of the population does not vary. If the population is represented by a two-dimensional array, where each row corresponds to one member, genetic invariance means that the multiset of values in any column does not change with time. Thus, in any column, though the genes may be exchanged within the column as generations proceed, no new gene is introduced into the column. This invariance is maintained by ensuring that the children replace only their parents. The concept of a family is also introduced in [38]. A family is the set of offspring produced by a set of crossover operations on a pair of parents. The number of sets of offspring is the family size, and the best offspring replace the parents.
4.12 Coevolution, Parasites and Symbiosis
The concept of coevolution is explored in [42]. In a scenario where coevolution and symbiosis are used, two gene pools which evolve separately
are maintained. This is analogous to the biological situation where a host and a parasite evolve together. The population representing the solutions to the problem forms the hosts, while the fitness cases used to evaluate the solutions form the parasites. The populations are assumed to be distributed on a two-dimensional toroidal grid. Both the host and parasite populations interact with each other through the fitness function. The hosts are given fitness values depending on the number of test cases which they are able to satisfy. On the other hand, the parasites are scored according to the number of solutions that are not able to satisfy the particular fitness case. This method has some inherent advantages. If a part of the population of hosts gets stuck in a local optimum, the members of the parasitic population evolve towards it, thus reducing the fitness of that part of the population. This moves the population out of the local optimum. It is also seen that after many generations, due to the evolution of the parasites, only those test cases that are not satisfied by many solutions remain in the parasitic pool. This effectively reduces the number of test cases that must be applied in order to find the fitness of the solutions.
4.12.1 Symbiosis in GAs
Similar to the above model, a model based on symbiosis is developed in [41]. In this model, a number of species cooperate in ways that are beneficial to all the species. In such an environment, a complex optimization problem is split into many subproblems, and parallel GAs then simultaneously solve the subproblems. Each population represents the solution to one subproblem. The fitness of the individuals in each population is based on their interactions with members of the other populations.
4.13 Differences between Genetic Algorithms and Evolution Strategies
The following are some significant differences between GAs and ESs [13].

(1) GAs operate on fixed bit strings, later mapping them onto object values. GAs work on encoded representations of the actual problem and use functions to map them onto actual points in order to obtain their fitness values. ESs work directly on real-valued vectors.
(2) Since ESs work completely in the phenotypic domain, they utilize much more problem-specific knowledge than GAs.
(3) ESs can only be applied to function optimization problems, whereas GAs cover a much wider range of applications.
(4) In GAs with proportional selection, reproduction rates are assigned dynamically to each individual based on the respective fitness values; even the worst individual may have some chance of reproducing. In ESs, reproduction rates are assigned statically, without any regard to the fitness values.
(5) In GAs, mutation is mainly used as a secondary operator whose main purpose is to regenerate lost genetic material, but in ESs, mutation is the main operator and is implemented as a self-adapting hill-climbing operator.
(6) In ESs, the rate of mutation is controlled by a Gaussian random variable. This rate is adjusted depending on the distribution of the fitness values of the individuals. This is known as collective self-learning of parameters, which is present in ESs but not found in GAs.
These differences are mainly due to the difference in the representation schemes used for the two methods.

4.14 Reasons for Failure of Genetic Algorithms
The failure of GAs has been attributed to three main reasons [23].

(1) Deceptive problems. The GA's search mechanism is based on the schema theorem. The GA finds solutions by combining several high-fitness low-order schemata to get higher-order schemata. In some problems, however, the optimal solution does not contain the high-fitness low-order schemata. In such state spaces, the GA is led away from the global optimum and gets stuck in a local optimum. Such problems are called deceptive problems.

(2) Sampling error. In some cases, even though a particular member may have a good fitness value, it may not be high compared to those of the other members. This may cause the member to die out due to selective pressure, even though it has above-average fitness; that is, because the other members have fitness values greater than that of this member, no copies may be given to it during the selection process.

(3) Disruption of the schema. This happens if the crossover operator has not been properly designed. The operator quickly disrupts good low-order schemata and prevents the formation of good solutions. In such cases, crossover is not able to guide the search to form high-order schemata even though the problem is not deceptive.
5. Other Popular Search Techniques
5.1 Population-based Incremental Learning (PBIL)

In GAs, the population can be represented by a probability vector. This vector has the same length as each of the members of the population. In a fully generational GA with fitness-proportional selection and a general pairwise recombination operator, the probability vector gives the probability of value j appearing in position i in a solution vector. As is obvious from the above, the same vector can represent radically different populations. Another property of the probability vector is that the population is unlikely to improve over the generations, as the bits of an individual solution are considered independent of one another. The probability vector has been made use of in [2] to create a population-based search technique called PBIL. In PBIL, a probability vector is used to describe the population. For a binary-encoded string, the vector specifies the probability of each bit taking the value 1. In such a representation, maximum diversity is found when the probabilities are all 0.5. Unlike GAs, operations are defined on the vector rather than on the population. Similar to a competitive learning network, the values in the vector are gradually shifted towards the vector representing a high-fitness solution. The algorithm works as follows:

(1) Initially, every entry of the probability vector is set to 0.5.
(2) From the vector, a population of representative solutions is generated.
(3) The solutions are evaluated using a fitness function as required by the problem.
(4) Mutation is performed to see if the solutions can be improved.
(5) The vector is then moved towards the best of these solutions, by changing the probabilities of the probability vector to resemble the highest-evaluating solution.
(6) The above process is repeated until a satisfactory solution is obtained or some stopping criterion is met.
The formula used to change the probabilities is

    PROB_i = (PROB_i x (1.0 - LR)) + (LR x VECTOR_i)

where PROB_i is the probability of generating a 1 at position i, VECTOR_i is the ith position in the solution vector towards which the probability vector is moved, and LR is the learning rate. The PBIL algorithm requires four parameters: the population size, the learning rate, the mutation probability, and the mutation shift (the magnitude of the effect of mutation on the vector).
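Steps (1)-(6) and the update formula combine into the following minimal Python sketch; the learning rate, population size, and ones-counting fitness are illustrative choices, and the mutation step of (4) is omitted for brevity.

    import random

    L, N, LR, GENERATIONS = 20, 50, 0.1, 100
    prob = [0.5] * L                     # maximum-diversity starting vector

    for _ in range(GENERATIONS):
        population = [[1 if random.random() < p else 0 for p in prob]
                      for _ in range(N)]
        best = max(population, key=sum)  # highest-evaluating solution
        # Move the vector toward the best solution:
        # PROB_i = PROB_i * (1 - LR) + LR * VECTOR_i
        prob = [p * (1.0 - LR) + LR * v for p, v in zip(prob, best)]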
Some extensions to the basic PBIL algorithm are also discussed in [2]. In one case, the vector is moved in the direction of the best M solutions, where M < N, the population size. This can be realized in many ways: the vector can be moved equally towards all of them, or the vector can be moved only in those positions where all or most of the best solution instances agree. In another case, the vector is moved based on the relative evaluations of the best M solutions; the solutions are ranked based on solution quality, as only the rank of the best solutions is needed. The probability vector can also be moved away from the lowest-evaluating solutions; in this case, the probabilities in the vector are moved in the direction opposite to the probability vector that is representative of poor solutions.
5.2 Genetic Programming (GP)
In genetic programming [15], the genetic algorithm is applied to a population of programs in order to find the one that gives the best solution to the given problem. The fitness value associated with the programs may either be a measure of how well they solve the problem (a maximization function) or the error in the solution produced by the program (a minimization function). The programs are represented as hierarchical trees to which the genetic operators are applied. A program may be boolean-, integer-, real-, complex-, vector- or symbolic-valued, as required by the problem. The operators are applied to parts of the program (subroutines or subprograms), as in normal GAs. When this process is repeated over many generations, the programs produced are of increasing fitness, due to the very nature of neo-Darwinian evolution.

The members in the population are organized as hierarchical symbolic expressions (S-expressions). The nodes in the trees are obtained from a function set and from a terminal set of symbols which form the arguments to the functions. The function set includes boolean, arithmetic and conditional operators, iteration and recursion functions, etc. When applying the operators, care is taken so that all the trees produced are valid S-expressions. Invalid trees result when there is a mismatch in the number or the type of operands to a function. For example, the square-root function must take a real- or integer-valued variable, not a boolean one, and the function to find x^y must have two arguments. It can easily be seen that nodes from the function set always form the internal nodes in the tree (as they always have operands) and nodes from the terminal set always form the leaf nodes. Each S-expression tree is, therefore, a collection of functions along with their arguments; that is, each tree represents a program which, when executed, solves a given problem.
FIG. 1. Tree representation of the Exclusive-OR function.
Example. A simple example of a program to compute the Exclusive-OR function, represented in the form of a tree, is as follows. In Fig. 1, the function set is the set {AND, NOT, OR} and the argument or terminal set is {X, Y}. The same tree is expressed in the form of a LISP program as

(or (and x (not y)) (and (not x) y)).
Thus a program is a set of functions along with a set of arguments. Since these functions include comparison operations and iterations, any program can be represented as a tree for use in the GP algorithm.

Many methods have been suggested for the generation of the initial population of trees. In one method, the trees are constructed so that all the leaves are found at the same specified level. In another method, the only restriction on the trees is that they must not exceed a maximum specified depth. Sometimes a combination of these methods is used to obtain the initial population.

The main operators used in GP are selection and crossover. In crossover, the crossover points are selected randomly in two selected parents, and the subtrees at these points are exchanged (a sketch of this exchange follows the operator lists below). The implications of using such an operator are as follows:

(1) If leaf nodes are selected, the process is equivalent to one-point mutation.
(2) If the root of one parent is selected, this parent becomes a subtree of the second parent to form one of the offspring. The part of the second parent removed by the crossover operator becomes the other offspring.
(3) If the roots of both the parents are selected as the crossover points, no crossover takes place.
(4) Importantly, unlike the crossover operator in GAs, even if both parents are identical, the offspring produced may be very different from both parents. This helps to a great extent in the prevention of premature convergence due to superfit individuals.

Some secondary operators used in genetic programming are as follows:

(1) Mutation. A randomly generated subtree is inserted at a randomly selected point.
(2) Permutation. This operator is a generalization of the inversion operator used in GAs: it randomly permutes the arguments of a randomly chosen point, unlike the inversion operator, where a gene sequence is inverted.
(3) Editing. This operator evaluates subtrees and replaces them by their results. Example: X AND X is replaced by X.
(4) Encapsulation. Potential subtrees are identified and named so that they can be referenced and used later.
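A minimal sketch of the subtree exchange itself, reusing the nested-tuple representation above (again our illustration, not code from [15]):

    import random

    def nodes(tree, path=()):
        """Enumerate (path, subtree) pairs; a path is a tuple of child indices."""
        yield path, tree
        if isinstance(tree, tuple):
            for i, child in enumerate(tree[1:], start=1):
                yield from nodes(child, path + (i,))

    def replace(tree, path, new):
        """Copy of 'tree' with the subtree at 'path' replaced by 'new'."""
        if not path:
            return new
        i = path[0]
        return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

    def crossover(a, b):
        """GP crossover: exchange randomly chosen subtrees of two parents.
        Selecting a root reproduces implication (2) above."""
        pa, sa = random.choice(list(nodes(a)))
        pb, sb = random.choice(list(nodes(b)))
        return replace(a, pa, sb), replace(b, pb, sa)

    # Even two identical parents can produce offspring unlike either parent,
    # because the two crossover paths will usually differ (property (4)).
    P = ('or', ('and', 'x', ('not', 'y')), ('and', ('not', 'x'), 'y'))
    child1, child2 = crossover(P, P)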
The main parameters that must be chosen in genetic programming before starting the algorithm are as follows:

(1) The terminal set. The set from which arguments are given to the function nodes in the tree.
(2) The function set. The set of functions which are used to determine the internal nodes of the tree.
(3) The fitness evaluation technique. In some cases, where a continuous function is not available to compute the fitness, fitness cases are used. Fitness cases represent the value of the function at specific points, and are used if the function values are not available at arbitrary points. Unless these points are taken from the entire range of the function, they will not be representative of the function and may result in poor-quality solutions.
(4) The numeric and qualitative parameters. These include the parameters used in GAs. The numeric parameters include the population size, the maximum number of generations allowed before the algorithm is terminated, the probability of selection and the probabilities of the secondary operators used. The qualitative parameters include the selection method, the type of fitness function and the use of the elitist strategy. Other parameters not present in GAs but used in GP include the maximum depth allowed in the trees obtained after applying the crossover operator, the maximum allowed depth of the initially generated random trees, and the method used to generate the initial population.
(5) The termination criteria. As in GAs, the termination criteria include achieving a sufficiently good result, reaching some maximum number of generations, and reaching a stage where the algorithm is not able to further improve the solution (saturation).
GP as applied to three classes of problems is discussed in [15].

• Optimal control problems. In this class, the problem is represented by a set of state variables. Control variables have to be chosen such that the system goes to a specific state with an optimal cost. In one example, the algorithm finds a program to bring a cart to rest at a target spot in the least time. The cart is modeled as moving on a one-dimensional frictionless track. The program must give the optimum acceleration in the correct direction to stop the cart.

• Robotic planning. The GP algorithm has to find a program which correctly moves an artificial ant in a two-dimensional toroidal grid to find all the food pieces scattered in the grid. The Santa Fe Trail problem is used as an example to test the algorithm. In this problem, 89 food pieces are scattered in the grid. The GP algorithm has to generate a program which, when used by the ant, leads it to all the pieces of food. The food pieces are not always available on adjacent squares, and there are many breaks in the trail which the ant must successfully cross. The permitted operations of the ant include moving forward, turning left or right, and sensing food. The fitness value given to a program is the number of food pieces it successfully finds. GP was also run on the Los Altos Trail, which contains more than a hundred food pieces distributed on a larger grid and includes many more irregularities than the Santa Fe Trail. The GP algorithm was able to find a program that successfully solves this trail; compared with the program that solves the Santa Fe Trail, this program is more complex.

• Symbolic regression. The algorithm must find a program that represents an expression that correctly fits the sample data. The algorithm not only has to find the constants in the expression, but also the expression itself. The difference between the sampled values and the generated expression's values is taken as the measure of the fitness of the program. The test function chosen is a polynomial one. The function set includes trigonometric and logarithmic functions which are not necessary for the particular problem. After finding many closely, but not exactly, fitting functions, the algorithm finds the correct function.

Some examples of simple regression problems to which GP has been applied are given below:
1. Trigonometric identities. GP has to obtain trigonometric identities by finding an expression equivalent to the one given. Example: consider the identity

sin(a + b) = sin(a)cos(b) + cos(a)sin(b).

Given sin(a + b), GP finds a program that evaluates to the right-hand side of the above identity. The test expression used is cos 2x. The GP algorithm finds two programs which evaluate to 1 − 2 sin²x and sin(π/2 − 2x) respectively, both being equal to cos 2x. The algorithm thus finds two trigonometric identities. The fitness function used in this problem consists of fitness cases. The main issue in the problem is the use of correct representative points for the fitness cases. The points used for the fitness cases must be distributed uniformly in the range [0, π] for the fitness function to properly represent the objective function cos 2x.

2. Symbolic integration. Given an expression, the algorithm has to find a program (a set of functions) that evaluates to the integral of the given expression.

3. Symbolic differentiation. The algorithm has to find a program which is the derivative of the given expression.

4. Solution of differential equations. Given a differential equation whose solution is in the form of a function, GP has to find a program that represents the solution.
In all the problems considered, LISP has been used for generating the programs in the population.

Genetic programming is also applied to a class of problems where a complex behavior develops as time progresses. An example of this is an ant colony [8]. The majority of the ants spend their time collecting food for the colony. As more and more food is collected, the ants are able to distinguish between those places where abundant food is available and those places where there is no food. This collective behavior, explained in [8], is as follows. Ants, which are nearly blind, are able to find the shortest route between two places. It has been found that frequently used paths are established as pheromone trails. The ant lays varying quantities of a substance called pheromone along the path it travels. If a randomly moving ant encounters a pheromone trail, it follows the trail with a high probability and in the process reinforces the trail with its own pheromone. This leads to more and more ants using the trail; this has been termed autocatalytic behavior. The probability of choosing a path increases with the number of ants that have already used the path. Since shorter paths mean less time for
the ant to go to the destination and return, all the ants will eventually choose the shortest path available.

GP has been used to model a colony of ants [15]. The behavior of the ants is represented in the form of a program. GP has to find a program such that, by following it, the ants successfully find all the food. The fitness used is a measure of the distribution of food: the correct program for the ants is the one which collects all the food at one place by the time the algorithm terminates. Thus, at the completion of a program, the more the food is scattered, the lower the fitness value of the program.

GP has also been tried in the area of game theory. The algorithm finds a program to play a game using the minimax strategy. A two-player zero-sum game is used to test the program. The fitness evaluation consists of adding the gains of the moves generated by the program for all possible moves of the opponent.

In traditional GP, random nodes are selected as crossover points. Though this maintains diversity, it is seen that building blocks become distributed over the entire tree due to the repeated crossover operations. Two new operators which preserve the context in which subtrees are located, by restricting the crossover points, are discussed in [39]. The context of a subtree is defined by its position in the tree; the position of a subtree is the unique path from the root to the subtree. One new crossover type introduced is strong context preserving crossover (SCPC). Here, the crossover points are selected such that the subtrees chosen have exactly the same position in both trees. This type of crossover is found to be too restrictive and does not allow exploration of the entire state space. Another disadvantage is that good building blocks are not spread to other parts of the tree. SCPC is useful in those problems for which the solution trees contain some repeated code. Another crossover, weak context preserving crossover (WCPC), is also discussed in [39]. In this type, once two crossover points (nodes in the tree) have been selected, the subtrees to be exchanged are determined as follows. In one parent, the subtree at the crossover node becomes the subtree to be exchanged; in the second parent, however, a random subtree of the crossover node is selected to be exchanged. This results in an asymmetric exchange, as opposed to the symmetric one in SCPC.

One of the problems tested using these crossover operators [39] is the food foraging problem described in [15]. It is seen that SCPC, along with regular crossover, produces better results than standard GP. The results also show that the solution trees obtained by the new method are much smaller when compared with those of the standard GP algorithm. It is also seen that a mix of SCPC and regular crossover outperforms the case when WCPC alone is used.
5.3 The Ant System

This is an optimization technique taken from nature which follows the behavior of ants described earlier. The system is based on the way ants cooperate to find food. The algorithm consists of a number of ants (agents) which form the population; this is similar to the strings that form the population of potential solutions in GAs. The problem is represented in the form of a complete graph, and the goal is to find a route that satisfies some criteria and minimizes some objective function. Initially, all the agents complete some tour that satisfies all the required criteria. Once the agents have completed the tour, the relative merits of the tours, which reflect the quality of the solutions, determine how much pheromone is laid on the paths of the tour. Once this is over, the agents again try to find a tour starting from their current position. This process of finding tours is repeated until some stopping criterion is met. As time progresses, knowledge about good routes is accumulated in the form of large quantities of pheromone on these routes. This knowledge is made use of by the agents by making the probability of selecting the next move a function of the quantity of pheromone found on the paths originating from the current position; the greater the quantity of pheromone, the higher the probability of selecting the particular path. In this way, pheromone builds up on those paths that form good solutions, thus leading to the optimization of the function.
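A minimal sketch of the two halves of this loop, pheromone-biased choice and quality-proportional reinforcement (our illustration; the function and parameter names are assumptions, not those of [8]):

    import random

    def choose_next(current, unvisited, pheromone):
        """Pick the next node with probability proportional to the pheromone
        on the edges leaving the current position."""
        weights = [pheromone[(current, j)] for j in unvisited]
        return random.choices(unvisited, weights=weights, k=1)[0]

    def reinforce(pheromone, tour, quality, evaporation=0.5):
        """Evaporate all trails, then deposit pheromone on the tour's edges
        in proportion to the relative merit of the tour."""
        for edge in pheromone:
            pheromone[edge] *= 1.0 - evaporation
        for a, b in zip(tour, tour[1:]):
            pheromone[(a, b)] += quality
            pheromone[(b, a)] += quality

    # Toy usage on a 3-node graph with uniform initial trails:
    pher = {(a, b): 1.0 for a in range(3) for b in range(3) if a != b}
    first = choose_next(0, [1, 2], pher)
    second = ({1, 2} - {first}).pop()
    reinforce(pher, [0, first, second, 0], quality=1.0)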
6. Some Optimization Problems

6.1 Partitioning Problems

The problem of optimizing the allocation of tasks in multicomputers using hybrid genetic algorithms has been discussed in [17]. Here, the given problem is partitioned into subproblems which are allocated to different processors in a multiprocessor system in order to minimize an objective function. The allocation is based on a loosely coupled synchronous parallel model where the computation and the communication phases do not overlap. Instead of allowing the search to proceed blindly, some problem-specific knowledge is incorporated into the search algorithm in the form of hill-climbing operators. The algorithm divides the search process into three stages: a clustering stage, which forms the basic pattern of the division of tasks based on interprocessor communication; a calculation balancing stage, where the emphasis is on the computational load to increase the fitness; and finally a boundary adjustment stage, where hill climbing is performed. At the end of the first two stages, a nearly optimal solution is obtained in which each cluster
represents the tasks allocated to a single processor. In the third stage, since the population is near convergence, the power of crossover diminishes due to the similarity of the individuals. Mutation is then used to try to improve the solutions by swapping some small tasks between the processors. This essentially amounts to fine-tuning the solution and is accomplished with the help of hill climbing by the individuals.

Elitist ranking followed by random selection has been used as the selection strategy. The individuals are ranked between 1.2 (best) and 0.8 (worst), with the others in between. Each individual with a rank above 1.0 is given a single copy in the mating pool. The fractional parts of the ranks of all the individuals are then used as probabilities to fill the remainder of the mating pool. Two-point crossover, the standard mutation and the standard inversion operators are used to obtain the offspring. In the hill-climbing operation, an element at the boundary of a cluster in an overloaded processor is moved into another processor in the system, provided this causes an improvement in the objective function. The experiments lead to the conclusion that without the hill-climbing stage, the quality of the result deteriorates and the algorithm becomes almost a hundred times slower.

A similar method of solving the K-partition problem is discussed in [6]. A parallel GA has been used on a hypercube model, where the subpopulations proceed in isolation, occasionally exchanging their solutions with their neighbors. The objective is to partition a graph of elements, each having some area, into K partitions such that the total area of each partition is below a certain value and the number of interconnections among the partitions is minimized. A fixed number of generations is used to mark an epoch. At the end of every epoch, each processor copies a subset of its n individuals to its neighbors. This results in each processor having more individuals than the subpopulation size. The processors then select the n members required for the next epoch from this pool (a sketch of this exchange follows below). The number of individuals exchanged is defined as a parameter, as is the number of generations in every epoch. One-point crossover and the standard mutation operators have been used as the genetic operators.
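A minimal sketch of the end-of-epoch exchange just described (our illustration; [6] leaves the post-exchange selection method open, so keeping the n fittest is an assumption):

    import random

    def epoch_exchange(subpops, neighbors, fitness, emigrants=2):
        """One end-of-epoch exchange on the hypercube model: each processor
        copies a few individuals to its neighbors, then keeps the n fittest
        of the enlarged pool for the next epoch."""
        n = len(next(iter(subpops.values())))          # subpopulation size
        pools = {p: list(pop) for p, pop in subpops.items()}
        for p, pop in subpops.items():
            migrants = random.sample(pop, emigrants)
            for q in neighbors[p]:                     # hypercube neighbors of p
                pools[q].extend(migrants)
        return {p: sorted(pool, key=fitness, reverse=True)[:n]
                for p, pool in pools.items()}

    # Toy usage: two processors exchanging 20-bit strings scored by sum().
    pops = {0: [[random.randint(0, 1) for _ in range(20)] for _ in range(8)],
            1: [[random.randint(0, 1) for _ in range(20)] for _ in range(8)]}
    pops = epoch_exchange(pops, neighbors={0: [1], 1: [0]}, fitness=sum)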
6.2 The Traveling Salesman Problem

The advantages of hybrid GAs over standard GAs are examined in [14], where the algorithms are tested on the traveling salesman problem using two local search techniques, simulated annealing (SA) and tabu search, along with the standard GA. The basic algorithm consists of the following steps:

(1) Get the initial population of N different tours.
(2) Run SA on the population to get some local solutions (solutions which are the best among all their neighboring solutions).
(3) Run tabu search on the population to get some local solutions.
(4) Run the GA for one generation.
(5) Repeat the above steps until the termination criteria are met.

To prevent bias towards a single local search, two local search techniques are used together in the algorithm. The members of the population are represented by an array of names of cities, such that each city is connected to its neighboring ones. A heuristic greedy crossover operator is used: to generate the offspring, a randomly chosen city forms the starting point of the offspring's tour. The distances of the cities connected to this one, in each of the parents, are examined, and the offspring's tour is extended by taking the shorter of these distances. If this creates a cycle, the next edge in the offspring's tour is chosen randomly (a sketch of this operator appears below).

The main issues in this hybrid algorithm are the tabu list conditions and the tabu list size. Several tabu conditions, like imposing a restriction that one city must be visited before another, or a fixed position for a city in the tour, have been proposed. It has been found that the tabu list has to be small in the case of highly restrictive tabu conditions. If its size is too small, cycling would result in the solutions being pulled back into the local optima; too large a size would move the solutions away from the global optimum during the later stages of the search.

The experiments also demonstrate that the quality of the solution obtained by simulated annealing depends greatly on the cooling schedule used. If the schedule is carefully designed, SA finds better solutions than the GA or tabu search, but finding the optimum schedule is computationally expensive. Tabu search has been found to converge to solutions, though suboptimal ones, faster than both the GA and SA. Much of the effectiveness of tabu search depends on the heuristically determined tabu conditions. The hybrid GA outperforms all the other methods used individually; when both local search techniques are used, the performance improves considerably. The experiments have shown that GA + tabu + SA found the optimum route in the 100-city TSP every time it was run [14]. When run alone, none of the algorithms was able to find the global optimum even once. The known optimum of this TSP is 21 247 miles [30]. The programs were executed on a Sun4/75 computer using the C programming language. Though GA + SA + tabu finds the result in fewer generations, more time is spent in each generation refining the solution, thus increasing the time taken by the algorithm to converge to the optimum solution.
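A minimal sketch of this greedy crossover (our illustration of the operator described above, not the code of [14]; dist is an assumed distance callback):

    import random

    def greedy_crossover(p1, p2, dist):
        """Extend the child with the shorter of the two parental edges
        leaving the current city; break cycles with a random city."""
        def next_city(parent, city):
            i = parent.index(city)
            return parent[(i + 1) % len(parent)]
        start = random.choice(p1)
        tour, remaining = [start], set(p1) - {start}
        while remaining:
            a, b = next_city(p1, tour[-1]), next_city(p2, tour[-1])
            candidates = [c for c in (a, b) if c in remaining]
            if candidates:        # shorter parental edge that avoids a cycle
                tour.append(min(candidates, key=lambda c: dist(tour[-1], c)))
            else:                 # both parental edges would close a cycle
                tour.append(random.choice(sorted(remaining)))
            remaining.discard(tour[-1])
        return tour

    # Toy usage on four cities placed on a line:
    child = greedy_crossover([0, 1, 2, 3], [3, 1, 0, 2],
                             dist=lambda a, b: abs(a - b))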
The TSP has also been solved using the ant system, and the results have been compared with those of other heuristic techniques like tabu search and simulated annealing [8]. The TSP is represented as a complete graph. In the algorithm used, called ant cycle, at any given town the next town is chosen depending on the distance between the two towns and the amount of pheromone trail on the edge connecting them. Once the tour is completed, a substance called trail is laid on all the edges visited by the ant. The ant is forced to make only legal tours with the help of a tabu list associated with each ant; this list contains all the cities visited by the ant so far, preventing the ant from visiting them again in the current tour. The number of ants used is equal to the number of cities in the tour, each ant initially being placed in one city. Once the tour is complete, the tabu list is emptied and the process is repeated. A balance is achieved between a greedy heuristic, which says that close towns must be visited with a high probability, and the quantity of pheromone found on the edges of the graph connecting the cities. In two variants of the algorithm, ant density and ant quantity, the pheromone trail is laid as soon as the ant makes a transition; in ant density the amount of trail is a fixed quantity, while in ant quantity it is inversely proportional to the distance between the two cities. The algorithm is controlled by four parameters:

(1) the importance given to the pheromone trail while selecting the next city in the tour;
(2) the importance given to the heuristic when deciding the next city in the tour;
(3) the persistence of the trail (how long the trail lasts);
(4) the quantity of pheromone deposited on the edges of the graph.

All four parameters enter the calculation of the probability with which the ant selects its next city.

The algorithms were tested on the Oliver30 problem [26]. Ant cycle has been found to give better results than ant quantity and ant density. This is explained by the fact that, since in both ant quantity and ant density pheromone is laid as soon as a transition is made, both algorithms use local information to determine the quantity of pheromone, as opposed to the global information used by ant cycle, where pheromone is laid only when the tour has been completed. The performance of the algorithm has been compared with special-purpose heuristics designed for the TSP and also with general-purpose ones like tabu search (TS) and simulated annealing (SA). The results show that the ant system performs better than the other algorithms.
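For reference, the ant-cycle selection rule of [8] combines the first two parameters as exponents on the trail and on the greedy visibility. A reconstruction in the usual notation, where τ_ij is the trail on edge (i, j), η_ij = 1/d_ij the visibility, α and β the two importance parameters, and allowed_k the cities not yet on ant k's tabu list:

    p_{ij}^{k} = \frac{[\tau_{ij}]^{\alpha}\,[\eta_{ij}]^{\beta}}
                      {\sum_{l \in \mathrm{allowed}_k} [\tau_{il}]^{\alpha}\,[\eta_{il}]^{\beta}},
    \qquad \eta_{ij} = \frac{1}{d_{ij}}

The persistence parameter ρ and the deposit quantity Q enter the end-of-cycle update, in which each edge's trail decays to ρτ_ij and each ant k whose tour used the edge adds Q/L_k, L_k being the length of that tour.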
6.3 VLSI Design and Testing Problems
The genetic algorithm is used in VLSI test case generation [21]. The GA finds the optimum set of test vectors that locate all the faults in the circuit.
Faulty circuits are separated from fault-free ones by the different responses they produce to the same inputs. The input to the circuit, in the form of 0s and 1s, is directly used as the coding for the population; that is, each bit in an individual's string represents the value of one of the inputs to the circuit. Since the inputs can take only two values, zero or one, the individual's string is defined over a binary alphabet. Faults are detected by simulating the correct and faulty responses of the circuit to a random input test vector. Once a fault is detected, it is removed from the list. This process of detecting and subsequently removing faults is repeated until a sufficient percentage of the faults has been detected. The simple genetic algorithm (SGA) and the adaptive genetic algorithm (AGA) were used to solve the problem and compare the results. Scaling of fitness values, proportional selection and parameterized uniform crossover (uniform crossover probability 0.05) were used in both algorithms. It was observed that the AGA outperformed the SGA on all circuits, on some large circuits requiring only half the generations to find the result. The AGA's performance is also compared with Lisanke's approach [16], which generates pseudo-random vectors without any correlation between successive vectors. The results clearly show the better performance of the AGA compared with Lisanke's method.

The problem of GA-based routing has been addressed in [19]. Different models of GAs are suggested to solve the problem. The main idea stressed is the use of intelligent, problem-specific genetic operators. The solutions are represented as graphs, and operators that take advantage of this representation are developed. Different mutation and crossover schemes are proposed to solve the problem; among the different schemes, one is selected probabilistically at runtime. A deterministic solution-refinement scheme is also used after the termination of the GA to try to improve the result.

GAs have been applied to the design of multi-chip modules (MCMs) [22]. The entire design process is split into three stages and at each stage GAs are used to find the optimum solution. During the partitioning stage, the components must be assigned to various chips, with all the chips being finally placed on the same MCM; each chip has its own constraints which must be satisfied. During placement, the chips must be allocated to slots on the chip-layer substrate of the MCM so as to reduce the wiring length and obtain even heat dissipation. In layer assignment, the connections between the components must be optimally distributed over a minimal number of layers in the MCM. A standard GA with a non-linear increasing objective function is used in the design process.
The objective function involves two constants A and B and a function f(x) of the variables to be optimized; the lower the value of f(x), the better is the result and consequently the higher is the value of the objective function. The algorithm is tested on some benchmark circuits and the results have been compared with those of simulated annealing (SA).

Genetic algorithms have demonstrated their superiority in solving partitioning problems [49]. A novel adaptive genetic-algorithm-based partitioning scheme for MCMs integrates four performance constraints simultaneously: pin count, area, heat dissipation and timing [50,51]. A similar partitioning algorithm based on evolutionary programming has also been proposed [50]. Experimental studies demonstrate the superiority of these methods over the deterministic Fiduccia-Mattheyses (FM) algorithm and the simulated annealing technique; the adaptive algorithms yield improved convergence properties.

The placement results of SA and the GA are found to be comparable in all cases [22]. In layer assignment, the results of the genetic algorithm are compared with those of SA and a deterministic algorithm [22]. It is found that for large circuits SA performs poorly when compared with the other algorithms; the results are roughly identical for small circuits.
6.4 Neural Network Weight Optimization
Training of large neural networks using GAs has been reported in [27]. Three major implementation differences exist between GAs that can optimize the weights in large artificial neural networks (requiring codings of more than 300 bits) and those that cannot. In those GAs that can optimize the weights:

(1) the encoding is real-valued rather than binary;
(2) a much higher level of mutation is used;
(3) a smaller population is present, the implication of which is reduced exploration of multiple solutions for the same network.
The GA used in [27] is a variant of GENITOR [25] which uses one-at-a-time recombination and ranking. Hill climbing is also used in the algorithm. The algorithm is tested on two examples and the results are compared with those of the back-propagation algorithm. Back propagation is a gradient-descent method that uses the derivative of the objective function to descend along the steepest slope of the error surface to reach the minimum. For a neural network that adds two-bit numbers, the genetic hill-climber converges to a solution in 90% of the runs. Search times are roughly comparable with, but not superior to, back propagation with momentum.
Another example used to test the GA is a large signal-detection network. The network identifies a signal pulse in one of several channels that span a frequency range. The problem is complicated by the following facts:

(1) A valid signal causes fake signals to appear in surrounding channels.
(2) More than one valid signal may exist simultaneously across multiple channels.

Three hundred training examples and several thousand test examples were used. The results are comparable to those of back propagation. Mutation is used as a hill-climbing operator: if, after mutation, a better solution is obtained, the change is recorded; else mutation is continued (a sketch of this appears at the end of this subsection). As generations proceed, the population is shrunk until only one member is left. After this, mutation remains the only operator, since crossover cannot be used on a single solution. It is seen that though this method consistently improves the speed of the algorithm, the rate of successful convergence decreases.

Training of neural networks by GAs is also reported in [21]. Here three examples are used to test the algorithm:
(1) A neural network to realize the exclusive-OR function. It has 5 neurons and 9 weights.
(2) A neural network to output 4-bit parity. It has 4 inputs, 1 output, 9 neurons, 25 weights and 16 input patterns; the output is 1 for an odd number of ones in the input.
(3) A neural network for encoding and decoding. It has 10 inputs, 10 outputs, 25 neurons, 115 weights and 10 input patterns. The input is encoded and decoded such that the output of the network is the same as the input.

The results show that the better performance of the adaptive GA (AGA) becomes more noticeable as the problem size increases. It is also seen that the AGA does not get stuck even once in a local optimum.

Training of neural networks using genetic programming (GP) is explained in [15]. This class of problems differs from other problems solved by GP in that the solution trees generated have to possess a certain structure corresponding to a neural network. Since not every tree can be classified as a neural network, the operators have to maintain the legality of the programs that are generated. The GP algorithm not only optimizes the weights but also finds an optimal architecture for the network. The first step in finding the solution is to model the network as a tree. Some simple rules are described which, when applied recursively, can be used to construct a tree that represents a neural network. The operators are designed to preserve the characteristics of the generated trees.
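Returning to the mutation-driven hill climbing of [27], a minimal sketch on a real-valued weight vector (our illustration; the names, the Gaussian step and the single-weight perturbation are assumptions):

    import random

    def mutate_hillclimb(weights, error, step=1.0, trials=1000):
        """Hill climbing via mutation: perturb one weight with Gaussian noise
        and record the change only if the network error decreases."""
        best = error(weights)
        for _ in range(trials):
            i = random.randrange(len(weights))
            candidate = list(weights)
            candidate[i] += random.gauss(0.0, step)
            e = error(candidate)
            if e < best:          # keep the mutation only on improvement
                weights, best = candidate, e
        return weights

    # Toy usage: pull a weight vector towards a fixed target.
    target = [0.5, -1.0, 2.0]
    tuned = mutate_hillclimb([0.0, 0.0, 0.0],
                             error=lambda w: sum((a - b) ** 2
                                                 for a, b in zip(w, target)))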
6.5 The Quadratic Assignment Problem

The quadratic assignment problem (QAP) has been solved using evolutionary strategies [18]. A QAP of order n is the problem that arises when trying to assign n facilities to n locations. The QAP is modeled as an integer programming problem and has been shown to be NP-hard. The problem is represented using two matrices:
• D specifies the distance between the locations.
• F specifies the flow of material, information, etc. between the facilities.
The principal diagonal of both matrices is 0. The method used here is a (1, n)-ES: n children are created by copying the parent and then randomly swapping integer values on the string via mutation; recombination is not employed. The parent is not allowed to compete for the next generation, which is reported to work better than when the parent also competes for survival. The number of swaps during mutation is randomly chosen to be one or two. The best child obtained becomes the parent for the next generation. If the child is not better than its parent, a counter is incremented; otherwise it is reset to 0. When the counter reaches a predetermined value, some non-standard operator is applied in order to shift the focus of the search to a new region of the state space and so escape from the local minimum (a sketch of this strategy appears at the end of this subsection).

The ant system applied to the QAP is discussed in [8]. The algorithm is run on some standard problems described in [29] and the results are compared with those of other algorithms. It is seen that the ant system, along with non-deterministic hill climbing, is able to find the best solution to all the test problems. The QAP has also been solved using an evolutionary approach in [32,33]; in both approaches, local search methods are used to try to improve the results after every generation, and the algorithms are implemented on parallel systems.
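The sketch promised above: a minimal (1, n)-ES loop for a permutation problem in the spirit of [18] (our illustration; the full shuffle stands in for the unspecified non-standard operator):

    import random

    def one_comma_n_es(parent, cost, n=10, limit=20, generations=200):
        """(1, n)-ES sketch for permutations: each child is a copy of the
        parent with one or two random swaps; the parent never survives."""
        stagnation = 0
        for _ in range(generations):
            children = []
            for _ in range(n):
                child = list(parent)
                for _ in range(random.choice((1, 2))):   # one or two swaps
                    i, j = random.sample(range(len(child)), 2)
                    child[i], child[j] = child[j], child[i]
                children.append(child)
            best = min(children, key=cost)
            stagnation = stagnation + 1 if cost(best) >= cost(parent) else 0
            parent = best                       # parent does not compete
            if stagnation >= limit:             # escape the local minimum
                random.shuffle(parent)
                stagnation = 0
        return parent

    # Toy usage: sort a permutation by minimizing out-of-place positions.
    result = one_comma_n_es(list(range(8))[::-1],
                            cost=lambda p: sum(i != v for i, v in enumerate(p)))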
6.6 The Job Shop Scheduling Problem (JSP)

In the JSP, n jobs must be processed by m machines. Each job must be processed on a specific set of machines in a particular order, and each machine takes a given amount of processing time. The JSP is to find a sequence of jobs on each machine so as to minimize an objective function. The objective function takes into account the total elapsed time to complete the job, the total idle time of the machines and the specified due date of completion of the job. Scheduling in a production plant is essentially the job of finding an optimal way of interleaving a set of process plans (a process plan consists of a set of instructions to process a job) so as to share the resources. Given a job, there may be a large number of valid process plans. Thus, the optimizing algorithm must not find optimal process plans and optimal schedules in isolation from each other, as some optimal process plans may cause bottlenecks in a schedule, leading to a sub-optimal schedule.
The coevolution model is used to solve the JSP [41]. In this model, each population represents a feasible process plan for a particular job. The fitness of the individuals is calculated based on the resources shared between them. Thus, an optimal schedule is also found in this process, without a separate stage to find the optimal schedule. Another population, of arbitrators, is maintained, whose main job is to resolve conflicts among population members with respect to the resources shared among them. The more conflicts an arbitrator resolves, the higher is its fitness. Each arbitrator consists of a table which specifies which population must be given precedence if a conflict occurs. The members of each population are spread over a two-dimensional toroidal grid, and a local selection method allows competition only among members in the neighborhood. Good results are reported for problems with up to ten jobs [41].
7. Comparison of Search Algorithms

The salient features of the different algorithms mentioned in this chapter for optimization are presented in Tables I and II.

TABLE I
STRUCTURES USED IN THE SEARCH PROCESS

Algorithm        Structure
GA               Population of fixed-length strings
GP               Population of a hierarchical composition of functions
Hill-climbing    A single point in the state space
ES               A real-valued vector
NN               A vector of weights
SA               A domain-specific structure

TABLE II
OPERATORS THAT MODIFY THE STRUCTURE

Algorithm        Operations
GA               Selection, crossover, mutation
GP               Selection, crossover
Hill-climbing    Gradient information
ES               Gaussian mutation
NN               Error measure or delta rule
SA               Domain-specific method
8. Techniques to Speed up the Genetic Algorithm

Since GAs are computationally intensive, even small changes in the algorithm leading to substantial savings in computation time are desirable. Some of the more common methods for speed-up are listed below (a memoization sketch for item (1) follows the list):

(1) Recalculation of the fitness of individuals not affected by mutation or crossover can be avoided.
(2) If the evaluation function has trigonometric or logarithmic functions, a look-up table can be used to get the values rather than using a generating series (such as the Taylor series).
(3) For small state spaces, evaluation can be a look-up process.
(4) Complex calculations can be simplified and approximated if very accurate answers are not required.
(5) If the algorithm is being timed, unnecessary output (graphics or printer output) can be removed from the program.
(6) Programs can be compiled and optimized for speed.
(7) Choosing the correct selection procedure saves time (rank-based procedures require sorting of the individuals in the population).
(8) Since in GA problems most of the computation time is spent in evaluating the individuals, even small improvements in the evaluation function greatly speed up the algorithm.
(9) Repeated access of secondary storage in every generation must be avoided, especially in a multiprogrammed environment.
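For instance, item (1) can be had almost for free by memoizing the evaluation function; a minimal sketch (assuming genomes are hashable, e.g. tuples of bits):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def fitness(genome):
        """Expensive evaluation, computed once per distinct genome;
        individuals untouched by crossover or mutation hit the cache."""
        return sum(genome)        # placeholder objective

    print(fitness((1, 0, 1, 1)))  # computed
    print(fitness((1, 0, 1, 1)))  # served from the cache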
9. Conclusions
Though evolutionary concepts have yielded attractive results in solving several optimization problems, many open issues remain to be addressed. Notable among them are: (i) the choice of control parameters; (ii) the characterization of the search landscapes amenable to optimization; (iii) the exact roles of crossover and mutation; and (iv) convergence properties. In recent years, such computing paradigms have been emerging as independent disciplines, but they demand considerable work in the practical and theoretical domains before they are accepted as effective alternatives to several other optimization techniques. This article is aimed at providing a synthesizing overview of the several issues involved in the design of efficient algorithms based on evolutionary principles. The examples discussed in the chapter unfold the promise such techniques offer. It is hoped that the number and diversity of the applications will expand in the future.
Future developments in this significant area will, among other things, be directed more towards the design of hybrid systems which combine evolutionary techniques with other optimization algorithms; a typical example is a combination of genetic algorithms and neural networks or expert systems. The underlying principles behind such hybrid algorithms have been highlighted in this chapter. "Best things come from others"; this optimism lies behind the further success of this significant area.

REFERENCES

1. Atmar, W. (1994). Notes on the simulation of evolution. IEEE Transactions on Neural Networks, 5(1), 130-147.
2. Baluja, S. (1994). Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning. Technical Report CMU-CS-94-163, Carnegie Mellon University, Pittsburgh, June.
3. Back, T., and Hoffmeister, F. (1991). Extended selection mechanisms in genetic algorithms. Proceedings of the 4th International Conference on GAs, Morgan Kaufmann, San Mateo, CA, pp. 92-99.
4. Bertoni, A., and Dorigo, M. (1993). Implicit parallelism in genetic algorithms. Artificial Intelligence, 61, 307-314.
5. Bramlette, M. F. (1991). Initialization, mutation and selection methods in GAs for function optimization. Proceedings of the 4th International Conference on GAs, Morgan Kaufmann, San Mateo, CA, pp. 100-107.
6. Cohoon, J. P., Martin, W. N., and Richards, D. S. (1991). A multipopulation genetic algorithm for solving the K-partition problem on hypercubes. Proceedings of the 4th International Conference on GAs, Morgan Kaufmann, San Mateo, CA, pp. 244-248.
7. Davidor, Y. (1991). A naturally occurring niche and species phenomenon: the model and first results. Proceedings of the 4th International Conference on GAs, Morgan Kaufmann, San Mateo, CA, pp. 257-263.
8. Dorigo, M., Maniezzo, V., and Colorni, A. (1996). Ant system: optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man and Cybernetics, 26(1), 29-41.
9. Fogel, D. B. (1994). An introduction to simulated evolutionary optimization. IEEE Transactions on Neural Networks, 5(1), 3-14.
10. Goldberg, D. E., Deb, K., and Korb, B. (1991). Don't worry, be messy. Proceedings of the 4th International Conference on GAs, Morgan Kaufmann, San Mateo, CA, pp. 24-30.
11. Grefenstette, J. J. (1986). Optimization of control parameters for GAs. IEEE Transactions on Systems, Man and Cybernetics, 16(1), 122-128.
12. Back, T., Hoffmeister, F., and Schwefel, H.-P. (1991). A survey of evolution strategies. Proceedings of the 4th International Conference on GAs, Morgan Kaufmann, San Mateo, CA, pp. 2-9.
13. Hoffmeister, F., and Back, T. (1992). Genetic Algorithms and Evolution Strategies: Similarities and Differences. Technical Report SYS-1/92, University of Dortmund, February.
14. Kido, T., Takagi, K., and Nakanani, M. (1994). Analysis and comparisons of GA, SA, TABU search and evolutionary combination algorithms. Informatica, 18(4), 399-410.
15. Koza, J. R. (1993). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA.
16. Lisanke, B. F., De Gaus, A., and Gregory, D. (1987). Testability-driven random test-pattern generator. IEEE Transactions on CAD, CAD-6, 1082-1087.
17. Mansour, N., and Fox, G. C. (1991). A hybrid genetic algorithm for task allocation in multicomputers. Proceedings of the 4th International Conference on GAs, Morgan Kaufmann, San Mateo, CA, pp. 466-473.
18. Nissen, V. (1994). Solving the quadratic assignment problem with clues from nature. IEEE Transactions on Neural Networks, 5(1), 66-72.
19. Prahalada Rao, B. P. (1994). Evolutionary Approaches to VLSI Channel Routing. Ph.D. Dissertation, Indian Institute of Science, Bangalore.
20. Spears, W. M., and De Jong, K. A. (1991). On the virtues of parameterized uniform crossover. Proceedings of the 4th International Conference on GAs, Morgan Kaufmann, San Mateo, CA, pp. 230-236.
21. Srinivas, M. (1993). Genetic Algorithms: Novel Models and Fitness Based Adaptive Disruption Strategies. Ph.D. Dissertation, Indian Institute of Science, Bangalore.
22. Vemuri, R. (1994). Genetic Algorithms for Partitioning, Placement and Layer Assignment for Multi-Chip Modules. Ph.D. Dissertation, University of Cincinnati.
23. Vose, M. D., and Liepins, G. E. (1991). Schema disruption. Proceedings of the 4th International Conference on GAs, Morgan Kaufmann, San Mateo, CA, pp. 237-242.
24. Ribeiro Filho, J. L., Treleaven, P. C., and Alippi, C. (1994). Genetic algorithm programming environments. IEEE Computer, June, 28-43.
25. Whitley, D., and Kauth, J. (1988). GENITOR: a different genetic algorithm. Proceedings of the 1988 Rocky Mountain Conference on Artificial Intelligence, pp. 118-130.
26. Whitley, D., Starkweather, T., and Fuquay, D. (1989). Scheduling problems and traveling salesmen: the genetic edge recombination operator. Proceedings of the 3rd International Conference on GAs, Morgan Kaufmann, pp. 133-140.
27. Whitley, D., Dominic, S., and Das, R. (1991). Genetic reinforcement learning with multilayer neural networks. Proceedings of the 4th International Conference on GAs, Morgan Kaufmann, San Mateo, CA, pp. 562-569.
28. Cohoon, J. P., Hedge, S. U., Martin, W. N., and Richards, D. (1988). Distributed Genetic Algorithms for the Floor Plan Design Problem. Technical Report TR-88-12, School of Engineering and Applied Science, Computer Science Department, University of Virginia.
29. Nugent, C. E., Vollmann, T. E., and Ruml, J. (1968). An experimental comparison of techniques for the assignment of facilities to locations. Operations Research, 16, 150-173.
30. Smith, J. M. (1982). Evolution and the Theory of Games. Cambridge University Press, Cambridge.
31. Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI.
32. Brown, D. E., Huntley, C. L., and Spillane, R. (1989). A parallel genetic heuristic for the quadratic assignment problem. Proceedings of the 3rd International Conference on Genetic Algorithms, Morgan Kaufmann, pp. 406-415.
33. Muhlenbein, H. (1989). Parallel genetic algorithms, population genetics and combinatorial optimization. Proceedings of the 3rd International Conference on Genetic Algorithms, Morgan Kaufmann, pp. 416-421.
34. Kursawe, F. (1991). A variant of evolution strategies for vector optimization. In Parallel Problem Solving from Nature (H.-P. Schwefel and R. Männer, Eds), pp. 193-197.
35. Miller, B. L., and Shaw, M. J. (1995). Genetic algorithms with dynamic niche sharing for multimodal function optimization. IlliGAL Report No. 95010, University of Illinois, December.
36. De Jong, K. A. (1975). An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Ph.D. Dissertation, University of Michigan, Ann Arbor, MI.
37. Mahfoud, S. W. (1992). Crowding and preselection revisited. In Parallel Problem Solving from Nature 2 (R. Männer and B. Manderick, Eds), Elsevier, Amsterdam, pp. 27-36.
38. Culberson, J. (1992). Genetic Invariance: A New Paradigm for Genetic Algorithm Design. Technical Report TR92-02, University of Alberta, Canada, June 1992.
39. D'haeseleer, P. (1994). Context preserving crossover in genetic programming. Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE Press, pp. 256-261.
40. Grefenstette, J. (1981). Parallel Adaptive Algorithms for Function Optimization. Technical Report CS-81-19, Vanderbilt University, Computer Science Department.
41. Husbands, P., and Mill, F. (1991). Simulated coevolution as the mechanism for emergent planning and scheduling. Proceedings of the 4th International Conference on Genetic Algorithms (R. Belew and L. Booker, Eds), Morgan Kaufmann, San Mateo, CA, pp. 264-270.
42. Hillis, W. D. (1990). Co-evolving parasites improve simulated evolution as an optimization procedure. Physica D, 42, 228-234.
43. Davis, L. (1991). Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York.
44. Rudolph, G. (1994). Convergence analysis of canonical genetic algorithms. IEEE Transactions on Neural Networks, 5(1), 96-101.
45. Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220, 671-680.
46. Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA.
47. Srinivas, M., and Patnaik, L. M. (1994). Genetic algorithms: a survey. IEEE Computer, June, 17-26.
48. Grefenstette, J. J. (1984). GENESIS: a system for using genetic search procedures. Proceedings of the Conference on Intelligent Systems and Machines, pp. 161-165.
49. Raman, S., and Patnaik, L. M. (1995). An overview of techniques for partitioning multichip modules. International Journal of High Speed Electronics and Systems, 6(4), 539-553.
50. Raman, S., and Patnaik, L. M. (1996). Performance-driven MCM partitioning through an adaptive genetic algorithm. IEEE Transactions on VLSI Systems, 4(4), 434-444.
51. Majhi, A. K., Patnaik, L. M., and Raman, S. (1995). A genetic algorithm-based circuit partitioner for MCMs. Microprocessing and Microprogramming, The Euromicro Journal, 41, 83-96.