Fuzzy Sets and Systems 95 (1998) 67-76
ELSEVIER
Short Communication
Applying evolutionary programming to selected set partitioning problems

Paul Gomme a,b, Paul G. Harrald c,*

a Simon Fraser University, Burnaby, B.C., V5A 1S6, Canada
b CREFE/UQAM, Case postale 8888, succursale Centre-Ville, Montréal, Québec, H3C 3P8, Canada
c Manchester School of Management, UMIST, P.O. Box 88, Manchester, M60 1QD, UK
Abstract

Evolutionary programming is applied to several instances of the set partitioning problem. Comparisons are made between the distribution of best-evolved solutions arising from implementations of the EP and the empirical distribution of randomly selected trial solutions. © 1998 Elsevier Science B.V.
1. Introduction
There has been a recent significant revival of interest in the evolutionary optimization technique of evolutionary programming, developed by Fogel and colleagues in the early 1960s [7], and exemplified in several recent works such as Fogel [4-6]. Along with genetic algorithms [3, 8, 10], evolution strategies [1, 13, 14], and simulated annealing [2, 11], evolutionary programming (EP) completes the basic categorization of stochastic search routines based on broadly evolutionary principles. This paper contributes to this line of research by developing an EP for the set partitioning problem.
* Corresponding author.

2. The set partitioning problem

Consider a finite set of real numbers 𝒮 = {N_i}_{i=1}^n. Now consider a partition of 𝒮, {S^j}_{j=1}^m, that is,

    S^j ≠ ∅,                                      (1)
    S^i ∩ S^j = ∅  for i ≠ j,                     (2)
    ⋃_j S^j = 𝒮.                                  (3)

Define the following:

    δ_j = Σ_{N_i ∈ S^j} N_i,                      (4)

and

    Δ = Σ_{i=1}^{m-1} Σ_{j=i+1}^{m} |δ_i − δ_j|;  (5)

then the set partitioning problem is to specify each S^j, subject to the restrictions above, in order to minimize Δ. This abstract problem has several real-world counterparts. The best-known example is that of crew scheduling for flight legs [9]. Our
characterization can be interpreted broadly as a task allocation problem in which the subsets forming the partition are individual working units, and the integers assigned are tasks (e.g., flight legs) assigned to each unit (e.g., crews). Written in this form (taken from [12, p. 258]) the objective is to minimize differences in workloads among the crews. Generically, this SPP represents a class of problems in which tasks of varying resource requirements are to be allocated to productive units as "evenly" as possible, which applies to all problems in which the cost of using the productive units is convex in the rates of their use.
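As a concrete illustration, the objective (5) is inexpensive to evaluate under the vector encoding of Section 4, in which a_i gives the subset of element N_i. The following Python sketch uses our own function names; it is not code from the paper.

```python
def subset_sums(elements, assignment, m):
    """delta_j of Eq. (4): the sum of the elements assigned to each
    of the m subsets, under the encoding a_i = subset of element N_i."""
    sums = [0.0] * m
    for value, j in zip(elements, assignment):
        sums[j] += value
    return sums


def imbalance(elements, assignment, m):
    """Delta of Eq. (5): the sum of pairwise absolute differences of
    the subset sums; zero exactly when all workloads are equal."""
    d = subset_sums(elements, assignment, m)
    return sum(abs(d[i] - d[j])
               for i in range(m - 1)
               for j in range(i + 1, m))
```

For instance, imbalance([1, 2, 3, 4, 5, 6], [0, 1, 2, 2, 1, 0], 3) is 0, since each of the three subsets sums to 7.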
3. Evolutionary programming

Consider the generic problem of minimizing some objective function F(x) for x ∈ 𝒳. A classic EP implements a stochastic search of 𝒳 by operating on a population of trial solutions from 𝒳 in the following manner:
(1) Generate an initial "population" of random elements of 𝒳.
(2) Remove the worst-performing half of the population (those with the greatest associated values of F(x)).
(3) "Mutate" each remaining member of the population to replace those solutions just deleted, and go back to (2).
This procedure selects "fitter" members of the current population to act as parents for the next. The definition of fitness will vary from context to context (but it should, of course, reflect the optimization problem at hand), and the meaning of "mutation" will depend upon the context and the chosen representation of a trial solution. For example, if x is real-valued, it is typical to add a sample from a Normal distribution, perhaps parameterized dynamically. In other applications, such as order-based problems (e.g., the traveling salesman problem; see [4]), mutation remains well-defined as a mapping from 𝒳 into 𝒳.
An alternative routine for selection would have each member of the current population compared to a random subset of other solutions. For each favorable comparison, the solution receives a "win", and then, based on wins, a chosen selection of the population can be removed and replaced. For some researchers a probabilistic selection routine is considered important, but it remains a topic for further study in this application.¹
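The three steps above can be sketched in a few lines. This is a generic minimizer, not the authors' implementation, shown here with the Gaussian mutation mentioned for real-valued x; the function and parameter names are our own.

```python
import random


def evolve(f, init, mutate, pop_size=20, generations=1000):
    """Classic EP skeleton: rank by F, keep the better half as parents,
    and refill the population with one mutant per parent."""
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=f)                 # best (lowest F) first
        parents = population[: pop_size // 2]  # drop the worst half
        population = parents + [mutate(x) for x in parents]
    return min(population, key=f)


# Toy usage: minimize F(x) = x^2 over the reals with Normal mutation.
best = evolve(f=lambda x: x * x,
              init=lambda: random.uniform(-10.0, 10.0),
              mutate=lambda x: x + random.gauss(0.0, 0.1))
```

Because the parents survive each generation, the best solution found so far is never lost; mutation alone drives exploration.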
4. The evolutionary program

To encode any given solution to the SPP we use a vector of n integers in the range 1, ..., m, given by (a_1, ..., a_n), indicating the subsets to which (N_1, ..., N_n) are assigned. We generate an initial population of size p by randomly assigning the elements of 𝒮 to subsets S^j. Any initial solution with an S^j = ∅ is discarded and replaced with a new solution. Each solution is evaluated according to the criterion (5). After evaluation, the worse half of the population (i.e., those with high function evaluations) is discarded and replaced with a mutation of each member in the top half of the population.² Specifically, we mutate a solution by randomly choosing q elements from 𝒮, deleting each element from its subset S^j and reassigning it to a randomly chosen subset S^j'. An element is not deleted if doing so would produce an invalid solution (i.e., S^j = ∅). Execution of the EP continues until an optimum solution is found, or until q has been reduced to a value of 1, i.e., one element is reassigned from one subset to another. We begin with q = 10 and reduce it by one after each one-tenth of the maximum number of function evaluations. This approach is similar in spirit to the annealing schedule in simulated annealing.³
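The mutation operator and the q schedule just described can be sketched as follows; variable names are ours, and the element counts guard against emptying a subset, as the validity rule requires.

```python
import random


def mutate(assignment, m, q):
    """Reassign q randomly chosen elements to randomly chosen subsets,
    skipping any reassignment that would leave a subset empty."""
    child = list(assignment)
    counts = [0] * m
    for j in child:
        counts[j] += 1
    for i in random.sample(range(len(child)), q):
        if counts[child[i]] == 1:  # moving element i would empty its subset
            continue
        counts[child[i]] -= 1
        child[i] = random.randrange(m)
        counts[child[i]] += 1
    return child


def q_schedule(evals_done, max_evals, q0=10):
    """q falls from q0 to 1, dropping by one per tenth of the budget."""
    return max(1, q0 - (q0 * evals_done) // max_evals)
```

With a budget of 100 000 evaluations, q_schedule returns 10 at the start of the run and 1 in the final tenth, mirroring the annealing-style reduction in the text.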
¹ It has been pointed out that for many researchers the definition of an EP requires a probabilistic selection mechanism, which we do not employ. Early studies of EP did not use a probabilistic selection mechanism, whereas modern versions do.
² Note that we use the word "population" to refer to the total number of solutions maintained, which is twice the number of parents.
³ In fact, we might view our approach as a form of parallel simulated annealing.
5. Data
Table 1 Constructed data sets
Real-world data sets are large, heavily constrained, and inappropriate for an exploratory study such as our own, in which we seek to deliberately alter qualitative features of our data to assess the effects on performance. Instead, we evaluate the performance of our EP versus random search (RS) by generating data sets with known properties. From (5), it should be clear that the minimum possible Δ is zero, which will occur if each δ_j has the same value. Fixing the number of elements in S^j, we randomly choose positive real numbers such that δ_j equals some fixed number. We repeat this for each subset S^j. Thus, for the constructed data sets, there exists at least one solution where Δ = 0. Table 1 describes the collection of data sets constructed. We varied the number of elements in each subset, the number of subsets, and the variance in the number of elements in each subset. Results for a relatively small number of these data sets are actually presented below (those indicated in the table); a full set of results is available from the authors upon request.
Data set   Number of elements, by subset

Low variance
 1         5, 5, 5
 2         10, 10, 10
 3         20, 20, 20
 4         40, 40, 40
 5         80, 80, 80
 6         5, 5, 5, 5, 5, 5, 5, 5, 5
 7         10, 10, 10, 10, 10, 10, 10, 10, 10
 8         20, 20, 20, 20, 20, 20, 20, 20, 20
 9         40, 40, 40, 40, 40, 40, 40, 40, 40
10         80, 80, 80, 80, 80, 80, 80, 80, 80

Medium variance
11         4, 5, 6
12         8, 10, 12
13         16, 20, 24
14         32, 40, 48
15         64, 80, 96
16         4, 4, 4, 5, 5, 5, 6, 6, 6
17         8, 8, 8, 10, 10, 10, 12, 12, 12
18         16, 16, 16, 20, 20, 20, 24, 24, 24
19         32, 32, 32, 40, 40, 40, 48, 48, 48
20         64, 64, 64, 80, 80, 80, 96, 96, 96

High variance
21         2, 5, 8
22         4, 10, 16
23         8, 20, 32
24         16, 40, 64
25         32, 80, 128
26         1, 2, 3, 4, 5, 6, 7, 8, 9
27         2, 4, 6, 8, 10, 12, 14, 16, 18
28         4, 8, 12, 16, 20, 24, 28, 32, 36
29         8, 16, 24, 32, 40, 48, 56, 64, 72
30         16, 32, 48, 64, 80, 96, 112, 128, 144
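The construction described in Section 5 can be sketched as follows. Rescaling each subset's draws to a common target sum is one way to realize "δ_j equals some fixed number"; the target value and function names here are our own choices, not the authors'.

```python
import random


def make_dataset(sizes, target=100.0):
    """Build a data set with a known Delta = 0 partition: for each entry
    of `sizes` (a row of Table 1), draw that many positive reals and
    rescale them so the subset sums to `target`."""
    elements, known_solution = [], []
    for j, n in enumerate(sizes):
        draws = [random.uniform(0.1, 1.0) for _ in range(n)]
        scale = target / sum(draws)
        elements.extend(x * scale for x in draws)
        known_solution.extend([j] * n)
    return elements, known_solution
```

For example, make_dataset([2, 5, 8]) reproduces the shape of data set 21: three subsets of unequal size whose known solution has Δ = 0.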
6. Results

For each data set, the EP and RS were each applied 100 times (with new seeds to the random number generator). For each such application, the total number of function evaluations was 100 000 plus the size of the initial population. To make RS comparable with all population sizes, for each run a total of 100 400 function evaluations were performed (100 400 corresponds to the number of function evaluations for an EP with a population of 400). Results averaged over the 100 replications are summarized in Fig. 1. The only true regularity across the data sets is that the mean solution for the EP is always lower than that of RS. The gain from using an EP can be considerable; in some cases the mean solution for RS is over 100 times larger than that for an EP. The mean solutions tend to fall with the number of elements in the set 𝒮. The size of the population is of greatest importance for a small number of elements in 𝒮, with larger populations recording a lower mean solution.
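The RS baseline we compare against can be sketched as independent uniform draws under the same evaluation budget. The discarding of draws that leave a subset empty mirrors the rule used for the EP's initial population; whether the authors applied it to RS as well is our assumption, as are the function names.

```python
import random


def random_search(elements, m, evals):
    """Draw `evals` uniform assignments, discard any that leave a
    subset empty, and keep the best under the objective (5)."""
    n = len(elements)
    best, best_val = None, float("inf")
    for _ in range(evals):
        a = [random.randrange(m) for _ in range(n)]
        if len(set(a)) < m:  # some subset empty: invalid solution
            continue
        sums = [0.0] * m
        for x, j in zip(elements, a):
            sums[j] += x
        val = sum(abs(sums[i] - sums[j])
                  for i in range(m - 1) for j in range(i + 1, m))
        if val < best_val:
            best, best_val = a, val
    return best, best_val
```

Each draw is independent of the last, so RS spends its budget on breadth alone, with no mechanism for refining a promising solution.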
Note: In Table 1, variance refers to the variance in the number of elements in the subsets of the known (generated) solution. Datasets 1-5 have the same number of elements in each subset in the known solution and a small number of subsets; datasets 6-10 increase the number of subsets. Datasets 11-15 and 21-25 are directly comparable with 1-5, with the variance in the number of elements in each subset in the known solution increasing from datasets 1-5 to 11-15 to 21-25. Likewise, the variance in the number of elements in each subset increases from datasets 6-10 to 16-20 to 26-30.

The remaining graphs in Fig. 1 give some idea as to the dispersion of solutions for both RS and EP. Both RS and EP are able to find the known minimum of zero when the number of elements in the
Fig. 1. (a) Mean solution; (b) minimum solution; (c) maximum solution; (d) variance of solutions. Note: The key in each diagram indicates the solution method. "RS" refers to "random search" while the numbered entries all refer to evolutionary programs, where the number gives the size of the population. The title "low variance" refers to datasets 1-5, the "mean number of elements in each subset" indexing the datasets (see Table 1); the title "medium variance" refers to datasets 11-15 and "high variance" to datasets 21-25.
set 𝒮 is small, as seen in Table 2. In general, the best solutions found by the EP are better than those found by RS.
We also examined the worst solution found by each algorithm (over the 100 replicates). With one exception, the maximum (worst) RS solution is worse than that found using an EP. For a given algorithm, the worst solution is generally decreasing in the number of elements in the set 𝒮. When the number of elements in the set 𝒮 is large, the
Fig. 1. (Continued).
worst performance of the EPs is considerably better than that of RS. This indicates that, at least for these SPPs, the mutation operator successfully avoids any local minima in the objective function.
A final measure of dispersion is the variance of the solutions (again, over the 100 replicates). The EPs uniformly produce a tighter distribution of solutions than RS. This is particularly true when there is a large variance in the number of elements in each subset of 𝒮 for the known solutions, and especially when there are a small number of subsets.
Fig. 2 gives some idea as to the difficulty of optimizing the function Δ. The known solution for each data set was altered as follows:
Fig. 1. (Continued).
(1) For each element N_i ∈ 𝒮, place the element in subset S^j for j ∈ {1, ..., m}.
(2) For each pair of elements N_i and N_i', allocate the elements to subsets S^j and S^j', respectively, for all possible values of i, i', j and j'.
The first alteration amounts to all one-step deviations (q = 1 mutations) from the known solution, while the second constitutes all two-step deviations.⁴ Densities were computed using a Normal kernel estimator with a window width of 500. Thus, Fig. 2 gives the distribution of solutions which are 0, 1 and 2 mutations away from the known

⁴ Notice that this actually includes the known solution several times. Accounting for the known solutions was a nuisance in the computer code.
Fig. 1. (Continued).
minimum, with the probability along the vertical axis and the value of the function Δ along the horizontal axis. While there is typically a large probability mass around zero, the bulk of the probability mass is typically far from zero, where "far" is taken to mean "much larger than the worst solution obtained by either RS or the EPs". In this sense, the set partitioning problem is a "difficult" problem.
CPU time increases with the total number of elements.⁵ For a small number of elements (data

⁵ All runs were conducted on a Sun SPARCstation 20 model 621, using only one of the CPUs.
Fig. 2. (a) 3 subsets: high, medium and low variance and (b) 9 subsets: high, medium and low variance. Note: These figures give the distribution of solutions 0, 1 and 2 mutations away from the known minimum, computed with a Normal kernel estimator. The numbers in the keys refer to the mean number of elements in each subset of each dataset. As in Table 1, "variance" refers to the variability in the number of elements in each subset of the known solution. "Low variance, 3 subsets" refers to datasets 1-5; "medium variance, 3 subsets" to datasets 11-15; "high variance, 3 subsets" to datasets 21-25; "low variance, 9 subsets" to datasets 6-10; "medium variance, 9 subsets" to datasets 16-20; and "high variance, 9 subsets" to datasets 26-30.
sets 1, 11 and 21), RS and EPs of population 20 and 50 take roughly the same amount of time (12-14 min), while an EP with population 400 takes almost 3 times longer (38 min). For a large data set (data sets 10, 20 and 30), by contrast, RS takes about half the time of an EP with population 20 (211 versus 440 min), while an EP with population 400 takes 1811 min. In the light of the above comparisons,
Fig. 2. (Continued).
it is easy to recommend the use of an EP with a relatively small population: the EPs tend to outperform RS, and by a sufficient margin for large data sets to compensate for the extra computational burden; the performance of an EP with a small population is favorable relative to the one with a large population, with the benefit of substantially smaller computational requirements.
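The Normal kernel estimates reported in Fig. 2 can be reproduced in outline as follows; the fixed window width of 500 comes from the text, but the exact estimator the authors used may differ in detail, and the function name is ours.

```python
import math


def normal_kde(samples, width, xs):
    """Gaussian kernel density estimate with a fixed window width:
    each sample contributes a Normal bump with standard deviation
    `width`, and the bumps are averaged over the sample."""
    norm = 1.0 / (len(samples) * width * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - s) / width) ** 2)
                       for s in samples)
            for x in xs]
```

Evaluating, say, normal_kde(solutions, 500.0, range(0, 5001, 50)) over the pooled one- and two-step deviation values traces a curve like those in Fig. 2(a).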
7. Conclusions Our results demonstrate beyond question the utility of EP in the particular incarnations of the SPP we have used. While our data is fabricated, it should be noted that real-world data is limited, and consists of very large data sets, the properties of which cannot be varied systematically in the
Table 2
Number of times minimum achieved

Data set   Population size   Number of times minimum achieved

 1         RS                 7
           20                 5
           50                 4
           100                6
           200               13
           400               14

11         RS                 3
           20                43
           50                26
           100               34
           200               24
           400               22

21         RS                 2
           20                20
           50                13
           100               22
           200               26
           400               20

Note: These are the only three datasets for which the known minimum of zero was achieved. "RS" denotes "random search" while the remaining entries under "population size" refer to results for the evolutionary program.
manner of our experiments. We have chosen to evaluate our EP in a natural manner: by examining the results of such a search. Further research will make a comparison between the EP we have described and more sophisticated EPs in which selection routines and parameters such as population size, the proportion of the population that is replaced, and mutation rates are allowed to vary throughout an individual experiment. Also, it seems natural now to compare the performance of the EP against that of simulated annealing and genetic algorithms. Again, careful control of the properties of the data sets will lend some credibility to any claims of superior performance, but it is of the nature of the problems within the domain of evolutionary optimization that such claims are always conjectures, and that dismissal of one method or another must be undertaken with great trepidation.

References

[1] T. Bäck, G. Rudolph and H.-P. Schwefel, Evolutionary programming and evolution strategies: similarities and differences, in: D.B. Fogel and W. Atmar, Eds., Proc. 2nd Ann. Conf. on Evolutionary Programming (Evolutionary Programming Society, La Jolla, CA, 1993).
[2] I.O. Bohachevsky, M.E. Johnson and M.L. Stein, Generalized simulated annealing for function optimization, Technometrics 28(3) (1986) 209-218.
[3] L. Davis, Ed., Handbook of Genetic Algorithms (Van Nostrand Reinhold, New York, 1991).
[4] D.B. Fogel, Evolving artificial intelligence, Doctoral Dissertation, University of California, San Diego (1992).
[5] D.B. Fogel, On the philosophical differences between evolutionary algorithms and evolutionary programming, in: D.B. Fogel and W. Atmar, Eds., Proc. 2nd Ann. Conf. on Evolutionary Programming (Evolutionary Programming Society, La Jolla, CA, 1993).
[6] D.B. Fogel, An introduction to simulated evolutionary optimization, in: D.B. Fogel and W. Atmar, Eds., Proc. 2nd Ann. Conf. on Evolutionary Programming (Evolutionary Programming Society, La Jolla, CA, 1993).
[7] L.J. Fogel, A.J. Owens and M.J. Walsh, Artificial Intelligence Through Simulated Evolution (Wiley, New York, 1966).
[8] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, Reading, MA, 1989).
[9] K.L. Hoffman and M. Padberg, Solving airline crew scheduling problems by branch and cut, Management Sci. 39(6) (1993) 657-682.
[10] J.H. Holland, Adaptation in Natural and Artificial Systems (University of Michigan Press, Ann Arbor, MI, 1975).
[11] S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi, Optimization by simulated annealing, Science 220 (1983) 671-680.
[12] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 2nd ed. (Springer, New York, 1994).
[13] I. Rechenberg, Evolutionsstrategie: Optimierung Technischer Systeme nach Prinzipien der Biologischen Evolution (Frommann-Holzboog Verlag, Stuttgart, 1973).
[14] H.-P. Schwefel, Kybernetische Evolution als Strategie der Experimentellen Forschung in der Strömungstechnik, Diploma Thesis, Technical University of Berlin (1965).