Physica D 109 (1997) 343-350
Chaotic behavior in evolution strategies

G.W. Greenwood 1

Department of Electrical & Computer Engineering, Western Michigan University, Kalamazoo, MI 49008, USA
Received 25 June 1996; received in revised form 3 February 1997; accepted 22 April 1997 Communicated by L. Kramer
Abstract

Evolution strategies are heuristic optimization techniques which emulate biological systems. This paper first shows that they can be formulated as one-dimensional maps and that they can be extremely sensitive to initial conditions. It then shows how this chaotic behavior can be exploited to conduct efficient searches for a globally optimal solution.

Keywords: Evolution strategies; Fractal basin boundaries
1. Introduction

In recent years there has been an enormous amount of research activity using evolution strategies to find good solutions to NP-hard optimization problems. Evolution strategies use the principles of reproduction and selection found in nature to explore exponentially large problem spaces for acceptable solutions. Evolution strategies are (in the limit) guaranteed to asymptotically converge to the globally optimal solution.

Chaotic systems produce unpredictable results. Although chaotic behavior has been observed for a long time, only in recent years have techniques been developed to identify and analyze its effects. Chaotic behavior has been observed in a variety of surprisingly simple physical systems. Since evolution strategies emulate biological systems, it is natural to ask if they too are susceptible to chaotic behavior. My results indicate that, under

1 Fax: +1 616 387-4024; e-mail: [email protected].
certain circumstances, this is indeed the case. The discussion begins with a brief introduction to evolution strategies. More detailed information can be found elsewhere [1].
2. Evolution strategies Evolution strategies are based upon the principles of adaptive selection found in the natural world. Each generation (iteration of the algorithm) takes a population of individuals (potential solutions) and stochastically modifies the genetic material (problem parameters) to produce new offspring. The basic algorithm is shown in Fig. 1. The initial population of individuals is randomly generated but, ideally, should be uniformly distributed throughout the search space so that all regions may be explored. Each individual in the population represents a potential solution to the optimization problem and its fitness reflects the quality of that solution. Individuals with a higher level of fitness should have a high probability for survival so
0167-2789/97/$17.00 Copyright © 1997 Elsevier Science B.V. All rights reserved. PII S0167-2789(97)00072-9
Procedure ES
  Randomly generate initial population with μ individuals
  Evaluate population
  While stopping criteria not met do
    Apply genetic operators to produce λ offspring
    Evaluate offspring
    Select μ best fit individuals to survive
  End while
End ES

Fig. 1. A simple evolution strategy.
that they can reproduce in future generations. For a real-valued parameter optimization problem (which is the type of problem discussed here) each individual contains a vector $X \in \mathbb{R}^n$. An objective function $f(X)$ is used as a measure of the individual's fitness.

There are several different versions of an evolution strategy, with the $(\mu + \lambda)$-ES and $(\mu, \lambda)$-ES being the most common. In the former, $\mu$ parents produce $\lambda$ offspring; parents and offspring compete equally for survival. In the latter, $\mu$ parents produce $\lambda > \mu$ offspring but only the best $\mu$ offspring survive. Thus the lifespan of any solution is only a single generation. It has been shown that a $(\mu + \lambda)$-ES will asymptotically converge to the globally optimal solution [1]. This capability is not guaranteed with gradient, branch-and-bound or other heuristic techniques.

It must be stressed that evolution strategies are not genetic algorithms. These two paradigms were independently developed (in Germany and the US, respectively) and are philosophically quite different. Fogel [2] points out that genetic algorithms emphasize models of genetic operators as observed in nature (e.g., one-point crossover) while evolution strategies emphasize mutational transformations that maintain phenotypical behavior between parent and offspring. In addition, survival in genetic algorithms is probabilistic while it is strictly deterministic in evolution strategies. Both Fogel [2] and Bäck [3] describe in detail the significant differences between these two paradigms and provide some comparisons of their performance. Evolution strategies outperform genetic algorithms on the functions tested in [2,3].
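The loop of Fig. 1, together with the selection difference between the two variants, can be sketched in Python. This is an illustrative sketch, not code from the paper; the fitness function, mutation scheme, and parameter values used below are my own placeholder assumptions.

```python
import random

def evolution_strategy(fitness, mu, lam, bounds, sigma=0.1,
                       generations=100, plus=True):
    """Minimal ES maximizing `fitness` on a box-bounded 2-D space.

    plus=True  -> (mu + lam)-ES: parents and offspring compete equally.
    plus=False -> (mu, lam)-ES: only offspring survive, so every
                  solution lives exactly one generation (needs lam > mu).
    """
    lo, hi = bounds
    # Randomly generate the initial population of mu individuals.
    pop = [[random.uniform(lo, hi), random.uniform(lo, hi)]
           for _ in range(mu)]
    for _ in range(generations):
        # Gaussian mutation produces lam offspring.
        offspring = []
        for _ in range(lam):
            parent = random.choice(pop)
            offspring.append([min(hi, max(lo, x + sigma * random.gauss(0, 1)))
                              for x in parent])
        # Deterministic survival: keep the mu best from the chosen pool.
        pool = pop + offspring if plus else offspring
        pop = sorted(pool, key=fitness, reverse=True)[:mu]
    return max(pop, key=fitness)
```

For example, maximizing a simple concave function such as `lambda v: -(v[0]**2 + v[1]**2)` drives the surviving population toward the origin.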
3. Example problem

Let $X = [x_1, x_2]^T$, where $x_1, x_2 \in [-2.048, 2.048]$, and assign

$$ f(X) = 100(x_1^2 - x_2)^2 + (1 - x_1)^2. \qquad (1) $$
Eq. (1) is the well-known Rosenbrock function [4]. The largest values of this function occur as $x_1 \to \pm 2.048$ and/or $x_2 \to \pm 2.048$, with the global maximum at $x_1 = x_2 = -2.048$. This function has been used to evaluate the performance of different types of evolutionary algorithms [5]. If $f(X)$ represents a fitness function (which is to be maximized 2), a (1 + 1)-ES can be formulated with the following pair of one-dimensional maps: 3
$$
x_1^{(n+1)} = M_1(x_1^{(n)}) =
\begin{cases}
x_1^{(n)}, & f(X) \ge f(\tilde{X}), \\
\tilde{x}_1^{(n)}, & f(X) < f(\tilde{X}),
\end{cases}
$$
$$
x_2^{(n+1)} = M_2(x_2^{(n)}) =
\begin{cases}
x_2^{(n)}, & f(X) \ge f(\tilde{X}), \\
\tilde{x}_2^{(n)}, & f(X) < f(\tilde{X}),
\end{cases}
\qquad (2)
$$
where
$$
X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad
\tilde{X} = \begin{bmatrix} x_1 + \sigma_1 \mathcal{N}_1 \\ x_2 + \sigma_2 \mathcal{N}_2 \end{bmatrix},
$$
where $M_i : [-2.048, 2.048] \to [-2.048, 2.048]$. Each iterate of these maps is equivalent to producing and evaluating one generation in the (1 + 1)-ES. $\mathcal{N}$ denotes a normally distributed random variable with zero mean and unit variance. Multiplying $\mathcal{N}$ by $\sigma$ changes the standard deviation to $\sigma$. The term $\mathcal{N}_i$ means that a new random variable is selected for each $i$. Thus, adding $\sigma_i \mathcal{N}_i$ to $x_i$ is the mutation mechanism used by the (1 + 1)-ES, and $\tilde{x}_i$ is the offspring of the parent $x_i$. Reproduction operators of this type are extensively used in evolution strategies [1,4]. Notice that $M_i(x_i^{(n)})$ is noninvertible since $x_i^{(n+1)}$ can come from either $x_i^{(n)}$ or $\tilde{x}_i^{(n)}$.

2 Usually one is interested in finding the minimal value of the Rosenbrock function. In this analysis the maximum value is sought.
3 If X is n-dimensional, there are n such one-dimensional maps.

Individuals in the population are composed of both object parameters (the $x_i$'s) and strategy parameters (the $\sigma_i$'s). Offspring are produced by perturbing these parameters, though the process differs depending on the parameter type. Object parameters are usually mutated by adding Gaussian noise. The strategy parameters affect the magnitude of that noise, but no single constant value produces optimal convergence to a highly fit fixed point. Rechenberg [6] proposed a 1/5-success rule which deterministically adjusts $\sigma$ in a (1 + 1)-ES. This rule says that the ratio of successful mutations to all mutations should be 1/5; $\sigma$ should be increased (decreased) if the ratio is greater (smaller) than 1/5. In the multimember variants a log-normal method of self-adaptation is used to alter $\sigma$. In its simplest form adaptation is done by

$$ \sigma' = \sigma \exp(\tau \mathcal{N}), \qquad (3) $$

where $\tau$ is an exogenous control parameter and a single $\sigma$ value is used with all object parameters. This self-adaptation is done in each generation and is normally performed prior to mutating the object parameters. 4

The adjustable parameter for the above maps is $\sigma$. It is interesting to determine just how sensitive an evolution strategy is to this parameter. While it is true that evolution strategies always adapt these parameters, initially I will assume a fixed value so that a sensitivity analysis can be performed. This will be followed by an analysis of the self-adapting case.
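One generation of the (1 + 1)-ES, viewed as the pair of maps in Eq. (2), can be sketched directly. This is a hedged reconstruction: the clipping of offspring to [-2.048, 2.048] and the fixed sigma values are my assumptions for illustration; the paper does not give an implementation.

```python
import random

def rosenbrock(x1, x2):
    # Eq. (1): the fitness function to be maximized.
    return 100.0 * (x1**2 - x2)**2 + (1.0 - x1)**2

def iterate_maps(x, sigmas, lo=-2.048, hi=2.048):
    """One iterate of M1, M2 from Eq. (2): mutate both coordinates with
    fresh Gaussian noise, then keep the parent iff f(X) >= f(X_tilde)."""
    trial = [min(hi, max(lo, xi + s * random.gauss(0.0, 1.0)))
             for xi, s in zip(x, sigmas)]
    return list(x) if rosenbrock(*x) >= rosenbrock(*trial) else trial
```

Iterating from (0.06, 0.0025) with sigma_1 = sigma_2 = 0.45 drives the x2 orbit toward one of the fixed points at +/-2.048, consistent with the behavior reported in Section 3.1.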
4 Recombination is not precluded in evolution strategies though mutation is the primary search mechanism (see [3, p. 132]).
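The two adaptation schemes just described are easy to state in code. A hedged sketch follows; the adjustment factor of 1.22 used in the 1/5-success rule is an illustrative choice, not a value taken from this paper.

```python
import math
import random

def one_fifth_rule(sigma, success_ratio, factor=1.22):
    """Rechenberg's 1/5-success rule: grow sigma when more than 1/5 of
    recent mutations succeeded, shrink it when fewer did."""
    if success_ratio > 0.2:
        return sigma * factor
    if success_ratio < 0.2:
        return sigma / factor
    return sigma

def lognormal_update(sigma, tau):
    """Eq. (3): sigma' = sigma * exp(tau * N), applied each generation
    before the object parameters are mutated."""
    return sigma * math.exp(tau * random.gauss(0.0, 1.0))
```

Note that the log-normal update can never drive sigma negative, which is one reason it is preferred for self-adaptation in the multimember variants.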
3.1. Analysis without self-adaptation

A number of trial runs were conducted for randomly selected initial values of $x_1$ and $x_2$ and for different values of $\sigma$. In all cases the maps reach a fixed-point attractor ($x_* = -2.048$ or $2.048$) within a few hundred iterations. A bifurcation diagram [7] indicated that neither of the maps contains periodic orbits with period $p \ge 2$. In the analysis that follows only the $x_2$ map is considered.

Fig. 2(a) shows the attractor for the $x_2$ map. This figure was constructed by using a 256 × 256 grid of initial conditions. (In all cases $x_1$ was initialized to 0.06.) For each initial condition the map was iterated several thousand times. If an initial condition produced an orbit which reached the fixed point $x_* = 2.048$, a black dot was plotted at the initial condition's coordinates. Conversely, if the orbit reached the fixed point $x_* = -2.048$, a blank was printed. Thus, the black and white regions essentially define the basins of attraction (at least within the finite resolution of the printer). Fig. 2(b) shows a magnified portion of the basin boundary. The fine-scale structure is still prevalent, and it can be shown that it exists at arbitrary scaling.

It is interesting to investigate the sensitivity of the evolution strategy to $\sigma$. This sensitivity analysis was conducted using the technique described by McDonald et al. [8]. A single initial condition $(x_1, x_2) = (0.06, 0.0025)$ was chosen, as was a random value of $\sigma$ in the interval [0, 1]. The maps were then iterated a few thousand times to determine which of the two attractors the $x_2$ orbit approaches. $\sigma$ was then perturbed by $\pm\epsilon$ and the same initial condition was iterated for both perturbed maps. If either of the perturbed orbits approached a different attractor, that particular $\sigma$ value was marked uncertain under the error $\epsilon$. This experiment was repeated several thousand times (using the same initial condition) for $\epsilon < 10^{-1}$.
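The uncertainty experiment can be reconstructed as follows. This is a sketch under stated assumptions: the trial counts and iteration budgets are scaled down from the paper's "several thousand", and the function names are mine; the acceptance test and the replay of recorded noise follow the description in the text.

```python
import random

def rosenbrock(x1, x2):
    return 100.0 * (x1**2 - x2)**2 + (1.0 - x1)**2

def final_attractor(x, sigma, noise, lo=-2.048, hi=2.048):
    """Iterate the (1+1)-ES maps with a prerecorded noise sequence and
    report the sign of the x2 attractor the orbit approaches."""
    for n1, n2 in noise:
        trial = [min(hi, max(lo, x[0] + sigma * n1)),
                 min(hi, max(lo, x[1] + sigma * n2))]
        if rosenbrock(*trial) > rosenbrock(*x):
            x = trial
    return 1 if x[1] > 0 else -1

def uncertain_fraction(eps, trials=200, iters=2000, x0=(0.06, 0.0025)):
    """Fraction of random sigma in [0, 1] whose outcome flips under a
    +/- eps perturbation; the perturbed orbits reuse the recorded N_i
    of the unperturbed orbit, as described in the text."""
    uncertain = 0
    for _ in range(trials):
        sigma = random.uniform(0.0, 1.0)
        noise = [(random.gauss(0, 1), random.gauss(0, 1))
                 for _ in range(iters)]
        base = final_attractor(list(x0), sigma, noise)
        if any(final_attractor(list(x0), s, noise) != base
               for s in (sigma - eps, sigma + eps)):
            uncertain += 1
    return uncertain / trials
```

Because each trial shares one recorded noise sequence across the unperturbed and both perturbed orbits, any flip of the final attractor is attributable solely to the change in sigma.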
Recall that offspring are produced by adding $\sigma_i \mathcal{N}_i$ to the parent. It was important to ensure that the arrival at a different attractor could be attributed solely to a perturbed $\sigma$ value. This was achieved by recording the $\mathcal{N}_i$ terms from the unperturbed orbit. These recorded
Fig. 2. (a) Basin of attraction for the $x_2$ map with $x_1 = 0.06$. (b) Magnification of the region $0.800 \le \sigma \le 0.805$ and $0.0 \le x_2 \le 0.005$. Black (blank) regions indicate that the $x_2$ orbit terminates at the fixed point 2.048 ($-2.048$). Evolution strategy did not use self-adaptation.

values were then used for each of the corresponding perturbed orbits.

The fraction $f$ of $\sigma$ values marked uncertain is plotted as a function of $\epsilon$ in Fig. 3. This log-log plot indicates that a clear power-law relationship exists between the uncertain fraction of $\sigma$ values and the error $\epsilon$. More specifically, $f \sim \epsilon^{\alpha}$, where $\alpha$ is the slope of the line in Fig. 3. Let $D$ denote the problem space dimension and $d$ the dimension of the basin boundary. The uncertainty exponent $\alpha$ is related to the basin boundary dimension by [8]
$$ \alpha = D - d. \qquad (4) $$

The dimension of the problem space is $D = 2$ and $\alpha \approx 0.52$. Substituting yields $d = 1.48$, which indicates that the basin boundary has a fractal dimension. One practical consequence of this is that one cannot say with any degree of certainty which attractor will eventually be reached by the evolution strategy.

3.2. Analysis with self-adaptation

I now assume that $\sigma$ is self-adapted with each generation using Eq. (3) with $\tau > 0$. The maps from Eq. (2) now have the form

$$ x_i^{(n+1)} = M_i(x_i^{(n)}, \sigma^{(n)}) \qquad (5) $$
Fig. 3. Log-log plot of $f$ versus $\epsilon$ for the evolution strategy without self-adaptation. Slope of the line is $\alpha \approx 0.52$.
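The slope $\alpha$ of such a log-log plot is an ordinary least-squares fit; a small sketch (the data in the test are synthetic placeholders, not the paper's measurements):

```python
import math

def fit_uncertainty_exponent(eps_values, fractions):
    """Least-squares slope of log f versus log eps,
    i.e. the exponent alpha in f ~ eps**alpha."""
    xs = [math.log(e) for e in eps_values]
    ys = [math.log(f) for f in fractions]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# With Eq. (4), alpha = D - d: for D = 2 and a fitted alpha of about
# 0.52, the basin boundary dimension is d = D - alpha = 1.48.
```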
to emphasize the fact that the strategy parameters vary over time. Surprisingly, the basin of attraction was found to be remarkably similar to the no-self-adaptation case even when reasonable values of $\tau$ were used. Fig. 4 shows a magnified portion of the basin boundary for a self-adapted evolution strategy with $\tau = 0.25$. (This is the same region that was magnified in the no-self-adaptation case. The reader should compare this figure with Fig. 2(b).) The greater density of black dots in Fig. 4 indicates that self-adaptation has a greater tendency to drive the orbit towards the attractor at 2.048. Nevertheless, a fine-grained structure is still apparent, which means that the final destination of an orbit which begins in this region cannot be reliably predicted.

Fig. 4. Basin of attraction for the $x_2$ map with $x_1 = 0.06$, magnified in the region $0.800 \le \sigma \le 0.805$ and $0.0 \le x_2 \le 0.005$. Evolution strategy was self-adapted with $\tau = 0.25$.

Conducting the same type of sensitivity analysis as in the previous section would be of little value because $\sigma$ is modified anyway by the self-adaptation process. The following type of analysis was used instead. Two $x_2$ orbits which started in the fractal basin boundary were constructed. For both orbits the initial value of $x_1$ was identical, but in the second orbit the initial value of $x_2$ was perturbed by $10^{-4}$. The evolution strategies were self-adapted with $\tau = 0.25$ and $\sigma^{(0)} = 0.45$. The same randomness was used in the evolution of both orbits so that any deviation could be attributed to a sensitivity to initial conditions. Fig. 5 shows a representative case generated in this manner. Clearly the distance between these two orbits is exponentially increasing. Extensive testing indicated that this exponential divergence does not occur for every pair of orbits. Nevertheless, it did occur in
Fig. 5. Distance between two $x_2$ orbits, with the position of the second orbit perturbed by $10^{-4}$. Evolution strategy was self-adapted with $\sigma^{(0)} = 0.45$ and $\tau = 0.25$. The same randomness was used in both orbits.
a large number of cases. This behavior is completely consistent with the fine structure depicted in Fig. 4. Sensitivity to initial conditions is a defining attribute of chaotic systems.

It is interesting to see how $\sigma^{(n)}$ varied over time. A large number of runs were conducted with all orbits beginning near the origin (a relatively flat region of the Rosenbrock surface). Without loss of generality, only results from runs in which both the $x_1$ and $x_2$ maps reached the fixed point $x_* = -2.048$ were recorded. In a majority of these runs there was an initial increase in $\sigma^{(n)}$, but over time it decreased, indicating convergence. Fig. 6 shows a representative case. It is important to note that the orbits reached $x_*$ in far fewer iterations than did the runs without self-adaptation.

Fig. 6. Self-adaptation of $\sigma$ over time.
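The paired-orbit experiment behind Figs. 5 and 6 can be sketched as below. This is a reconstruction under assumptions: the coupling of both orbits to identical random draws follows the text, while the clipping bounds are carried over from the earlier maps and the helper name is mine.

```python
import math
import random

def rosenbrock(x1, x2):
    return 100.0 * (x1**2 - x2)**2 + (1.0 - x1)**2

def evolve_pair(x2_perturbation=1e-4, tau=0.25, sigma0=0.45, iters=100,
                x0=(0.06, 0.0025), lo=-2.048, hi=2.048):
    """Run two self-adapted (1+1)-ES orbits that share every random draw;
    the second orbit's initial x2 is shifted by `x2_perturbation`.
    Returns the per-iteration distance between the two x2 orbits."""
    a = [list(x0), sigma0]
    b = [[x0[0], x0[1] + x2_perturbation], sigma0]
    distances = []
    for _ in range(iters):
        # Shared randomness: one log-normal draw and one N_i per coordinate.
        z = random.gauss(0.0, 1.0)
        n = (random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))
        for state in (a, b):
            x, sigma = state
            sigma *= math.exp(tau * z)          # Eq. (3), before mutation
            trial = [min(hi, max(lo, xi + sigma * ni))
                     for xi, ni in zip(x, n)]
            if rosenbrock(*trial) > rosenbrock(*x):
                x = trial
            state[0], state[1] = x, sigma
        distances.append(abs(a[0][1] - b[0][1]))
    return distances
```

Plotting the returned distances reproduces the flavor of Fig. 5 for some seeds; as the text notes, not every pair of orbits diverges.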
4. Discussion
I did not investigate the case where each one-dimensional map had its own strategy parameter. The self-adaptation process is affected by the number of strategy parameters. 5 No map works in isolation; each is coupled to all other maps via the fitness function. Therefore making any generalizations about the long-term evolution of a strategy parameter for a particular map would be difficult, especially if the number of maps is large. Moreover, it is unlikely that these properties will hold with a different fitness function. Ultimately the selection process of the evolution strategy will determine whether the mutated strategy parameter is suitable.

The above examples do not guarantee that chaotic behavior is always present in an evolution strategy, but they do indicate that it is a possibility. While chaotic behavior can be undesirable in some systems, it can prove advantageous in evolution strategies. Indeed, chaotic behavior can even be exploited to conduct a more thorough search of the problem space in order to find the global optimum.

5 See [1, Section 4] for a discussion of this issue.
Fig. 7. Two orbits for the Rosenbrock function with $\sigma = 0.65$. Orbits terminate in the lower left and lower right quadrants. Evolution strategy did not use self-adaptation.
Conventional wisdom says that the initial population in the evolution strategy should be uniformly distributed throughout the problem space so that all regions may be investigated. This may be difficult to achieve, as the initial population is randomly generated. Yet it may still be possible to explore all regions of the problem space by taking advantage of the extreme sensitivity to initial conditions that exists in chaotic systems.

Suppose a search for the global optimum of the above example problem is to be conducted using a (50 + 50)-ES. Instead of trying to create an initial population of 50 uniformly distributed individuals, create only one individual. The initial conditions of the $x_i$'s and $\sigma$ should be chosen so that this single individual resides on the fractal basin boundary. The remaining 49 individuals are created by perturbing each $x_i$ of the original individual by some small random variable $\delta x_i$. It can easily be verified that all these individuals are distinct and reside on the fractal basin boundary. Each individual then becomes the parent of a (1 + 1)-ES. Thus, instead of running a single (50 + 50)-ES, one runs 50 independent (1 + 1)-ES. The orbits of these (1 + 1)-ES can be expected to exponentially diverge. Hence each (1 + 1)-ES will explore a different region
of the problem space. After $F$ generations, the final results can be compared to find the overall best solution. It is important to stress that these (1 + 1)-ES orbits are computed independently. Hence this process is ideally suited for running on a parallel processing system.

The concept of exploring with diverging orbits is shown in Fig. 7. The first orbit had $x_1^{(0)} = 0.0$ and $x_2^{(0)} = 0.0$, while the second orbit incremented these by some small $\delta x$. Notice that the orbits traverse completely different regions of the problem space and converge to different attractors. Admittedly, in practice this process of finding a basin boundary and determining its characteristics can be quite time consuming (though it could be run efficiently on a parallel processing system). While this approach does not guarantee that a better solution will be found, it does produce a thorough search which increases the likelihood of finding that better solution.
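The boundary-seeded search just described can be sketched as follows. This is an illustrative sketch, not the paper's code: the function names are mine, delta = 1e-4 is an assumed perturbation size, and sigma = 0.65 is carried over from the Fig. 7 runs.

```python
import random

def rosenbrock(x1, x2):
    return 100.0 * (x1**2 - x2)**2 + (1.0 - x1)**2

def one_plus_one_es(x, sigma, iters, lo=-2.048, hi=2.048):
    """A single (1+1)-ES run with fixed sigma (no self-adaptation)."""
    for _ in range(iters):
        trial = [min(hi, max(lo, xi + sigma * random.gauss(0.0, 1.0)))
                 for xi in x]
        if rosenbrock(*trial) > rosenbrock(*x):
            x = trial
    return x

def boundary_seeded_search(seed_point, pop_size=50, delta=1e-4,
                           sigma=0.65, iters=500):
    """Perturb one boundary-straddling individual into `pop_size` distinct
    parents, run that many independent (1+1)-ES searches, and keep the
    best final solution."""
    parents = [[xi + random.uniform(-delta, delta) for xi in seed_point]
               for _ in range(pop_size)]
    finals = [one_plus_one_es(p, sigma, iters) for p in parents]
    return max(finals, key=lambda v: rosenbrock(*v))
```

Because the runs never communicate, the loop over `parents` could be farmed out verbatim to a parallel processing system, with only the final comparison done centrally.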
5. Final remarks

As $\sigma \to 0$ any offspring produced will be in the vicinity of its parent within the phase space. It is possible for the strategy parameters to become
arbitrarily small under mutation. Note that as $\sigma \to 0$ the corresponding map transforms into an identity map $M_i(x_i) \equiv x_i$, which fixes all points, thus halting the search process. To preclude this from happening a lower bound is chosen so that $\sigma^{(n)} \ge \sigma_0 > 0$. For this investigation $\sigma_0 = 0.05$ was used, though $\sigma^{(n)}$ values this small were rarely observed.

The rate of progress in an evolution strategy is strongly influenced by the mutation step size. Indeed, only a small range of step sizes seems to produce reasonable progression rates. Rechenberg [9] refers to this as an "evolution window". The $\sigma$ values influence the mutation step size since they determine the standard deviation of the $\mathcal{N}_i$ terms. Too small a mutation step size leads to search stagnation, while too large a value reduces the search to a random one. What this paper shows is that while evolution may progress over only a small range of mutation step sizes, it may not be possible to predict to what point this progression is occurring.

It is natural to ask whether chaotic behavior is inherent in evolution strategies or whether the results presented in this paper are just an isolated case. Although no extensive testing with other fitness functions has yet been conducted, I conjecture that this type of behavior may occur on any number of occasions. The (1 + 1)-ES can be formulated as a one-dimensional map. This map is noninvertible because there are two possible sources for a parent (the parent or offspring from the previous generation). Ott [7] points out that chaos is possible in one-dimensional maps if they are noninvertible. Consequently, chaotic behavior cannot be ruled out in any particular instance of an evolution strategy. This is not necessarily a bad thing, as it can be exploited as a search mechanism.
References

[1] T. Bäck and F. Hoffmeister, Basic aspects of evolution strategies, Statistics and Computing 4 (1994) 51-63.
[2] D. Fogel, Evolutionary Computation (IEEE Press, New York, 1995).
[3] T. Bäck, Evolutionary Algorithms in Theory and Practice (Oxford University Press, New York, 1996).
[4] H. Rosenbrock, An automatic method for finding the greatest or least value of a function, Comput. J. 3 (1960) 175-183.
[5] N. Schraudolph and R. Belew, Dynamic parameter encoding for genetic algorithms, Machine Learning 9 (1992) 9-21.
[6] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution (Frommann-Holzboog Verlag, Stuttgart, 1973).
[7] E. Ott, Chaos in Dynamical Systems (Cambridge University Press, New York, 1993).
[8] S. McDonald, C. Grebogi, E. Ott and J. Yorke, Fractal basin boundaries, Physica D 17 (1985) 125-153.
[9] I. Rechenberg, Evolution strategy, in: Computational Intelligence: Imitating Life, eds. J. Zurada, R. Marks and C. Robinson (IEEE Press, New York, 1994).