Applying evolutionary programming to selected set partitioning problems

Fuzzy Sets and Systems 95 (1998) 67-76

Short Communication

Paul Gomme a,b, Paul G. Harrald c,*

a Simon Fraser University, Burnaby, B.C., V5A 1S6, Canada
b CREFE/UQAM, Case postale 8888, succursale Centre-Ville, Montréal, Québec, H3C 3P8, Canada
c Manchester School of Management, UMIST, P.O. Box 88, Manchester, M60 1QD, UK

Abstract

Evolutionary programming is applied to several instances of the set partitioning problem. Comparisons are made between the distribution of best-evolved solutions arising from implementations of the EP and the empirical distribution of a randomly selected trial solution. © 1998 Elsevier Science B.V.

1. Introduction

There has been a recent significant revival of interest in the evolutionary optimization technique of evolutionary programming, developed by Fogel and colleagues in the early 1960s [7] and exemplified in several recent works such as Fogel [4-6]. Along with genetic algorithms [3, 8, 10], evolution strategies [1, 13, 14], and simulated annealing [2, 11], evolutionary programming (EP) completes the basic categorization of stochastic search routines based on broadly evolutionary principles. This paper contributes to this line of research by developing an EP for the set partitioning problem.

* Corresponding author.

2. The set partitioning problem

Consider a finite set of real numbers 𝒮 = {N_i}, i = 1, ..., n. Now consider a partition of 𝒮,

{S^j}, j = 1, ..., m,   (1)

that is,

S^j ≠ ∅,  S^i ∩ S^j = ∅  for i ≠ j,   (2)

∪_j S^j = 𝒮.   (3)

Define the following:

δ_j = Σ_{N_i ∈ S^j} N_i,  j = 1, ..., m,   (4)

and

Δ = Σ_{i=1}^{m-1} Σ_{j=i+1}^{m} |δ_i − δ_j|;   (5)

then the set partitioning problem is to specify each S^j, subject to the restrictions above, in order to minimize Δ. This abstract problem has several real-world counterparts. The best-known example is that of crew scheduling for flight legs [9]. Our


characterization can be interpreted broadly as a task allocation problem in which the subsets forming the partition are individual working units, and the integers assigned are tasks (e.g., flight legs) assigned to each unit (e.g., crews). Written in this form (taken from [12, p. 258]) the objective is to minimize differences in workloads among the crews. Generically, this SPP represents a class of problems in which tasks of varying resource requirements are to be allocated to productive units as "evenly" as possible, which applies to all problems in which the cost of using the productive units is convex in the rates of their use.
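To make the objective concrete, here is a small sketch (our own Python, not from the paper) that evaluates Δ for an assignment of elements to subsets:

```python
from itertools import combinations

def delta(values, assign, m):
    """Workload imbalance, eq. (5): sum of |delta_i - delta_j| over subset pairs.

    values : the real numbers N_1, ..., N_n
    assign : subset label in 0..m-1 for each element
    m      : number of subsets
    """
    sums = [0.0] * m                       # delta_j: sum of N_i assigned to subset j
    for v, j in zip(values, assign):
        sums[j] += v
    return sum(abs(a - b) for a, b in combinations(sums, 2))

# A perfectly balanced partition of {1,...,6} into three subsets scores zero:
print(delta([1, 2, 3, 4, 5, 6], [0, 1, 2, 2, 1, 0], 3))   # 0.0
```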

3. Evolutionary programming

Consider the generic problem of minimizing some objective function F(x) for x ∈ 𝒟. A classic EP implements a stochastic search of 𝒟 by operating on a population of trial solutions from 𝒟 in the following manner: (1) Generate an initial "population" of random elements of 𝒟. (2) Remove the worst-performing half of the population (those with the greatest associated values of F(x)). (3) "Mutate" each remaining member of the population to replace those solutions just deleted, and go back to (2). This procedure selects "fitter" members of the current population to act as parents for the next. The definition of fitness will vary from context to context (but it should, of course, reflect the optimization problem at hand), and the meaning of "mutation" will depend upon the context and the chosen representation of a trial solution. For example, if x is real-valued, it is typical to add a sample from a Normal distribution, perhaps parameterized dynamically. In other applications, such as order-based problems (e.g., the traveling salesman problem; see [4]), mutation remains well-defined as a mapping from 𝒟 into 𝒟. An alternative selection routine would have each member of the current population compared to a random subset of other solutions. For each favorable comparison, the solution receives a

"win", and then based on wins a chosen selection of the population can be removed and replaced. For some researchers a probabilistic selection routine is considered important, but remains a topic for further study in this application.1

4. The evolutionary program

To encode any given solution to the SPP we use a vector of n integers in the range 1, ..., m, given by (a_1, ..., a_n), indicating the subset to which each of (N_1, ..., N_n) is assigned. We generate an initial population of size p by randomly assigning the elements of 𝒮 to subsets S^j. Any initial solution with an S^j = ∅ is discarded and replaced with a new solution. Each solution is evaluated according to the criterion (5). After evaluation, the worse half of the population (i.e., those with high function evaluations) is discarded and replaced with a mutation of each member in the top half of the population.2 Specifically, we mutate a solution by randomly choosing q elements from 𝒮, deleting each element from its subset S^j and reassigning it to a randomly chosen subset S^j'. An element is not deleted if doing so would produce an invalid solution (i.e., S^j = ∅). Execution of the EP continues until an optimal solution is found or the maximum number of function evaluations is exhausted. We begin with q = 10 and reduce it by one after each one-tenth of the maximum number of function evaluations, until q reaches a value of 1, i.e., a single element is reassigned from one subset to another. This approach is similar in spirit to the annealing schedule in simulated annealing.3

1 It has been pointed out that for many researchers the definition of an EP requires a probabilistic selection mechanism, which we do not employ. Early studies of EP did not use a probabilistic selection mechanism, whereas modern versions do.
2 Note that we use the word "population" to refer to the total number of solutions maintained, which is twice the number of parents.
3 In fact, we might view our approach as a form of parallel simulated annealing.
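As an illustration of the reassignment mutation (a Python sketch of our own; the paper gives no code), note the guard that refuses any move that would empty a subset:

```python
import random
from collections import Counter

def mutate(assign, m, q):
    """Move q randomly chosen elements to randomly chosen subsets,
    skipping any move that would leave a subset empty (an invalid solution)."""
    child = list(assign)
    counts = Counter(child)                 # how many elements each subset holds
    for i in random.sample(range(len(child)), q):
        if counts[child[i]] == 1:           # its subset would become empty: skip
            continue
        new_j = random.randrange(m)
        counts[child[i]] -= 1
        counts[new_j] += 1
        child[i] = new_j
    return child

child = mutate([0, 0, 1, 1, 2, 2], m=3, q=2)
print(sorted(set(child)))   # [0, 1, 2]: every subset is still occupied
```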


5. Data

Real-world data sets are large, heavily constrained, and inappropriate for an exploratory study such as ours, in which we seek to deliberately alter qualitative features of the data to assess the effects on performance. Instead, we evaluate the performance of our EP versus random search (RS) by generating data sets with known properties. From (5), it should be clear that the minimum possible Δ is zero, which occurs if each δ_j has the same value. Fixing the number of elements in S^1, we randomly choose positive real numbers such that δ_1 equals some fixed number. We repeat this for each subset S^j. Thus, for the constructed data sets, there exists at least one solution where Δ = 0.

Table 1 describes the collection of data sets constructed. We varied the number of elements in each subset, the number of subsets, and the variance in the number of elements in each subset. Results for a relatively small number of these data sets are actually presented below (those indicated in the table); a full set of results is available from the authors upon request.

Table 1. Constructed data sets

Data set   Number of elements, by subset

Low variance
 1   5, 5, 5
 2   10, 10, 10
 3   20, 20, 20
 4   40, 40, 40
 5   80, 80, 80
 6   5, 5, 5, 5, 5, 5, 5, 5, 5
 7   10, 10, 10, 10, 10, 10, 10, 10, 10
 8   20, 20, 20, 20, 20, 20, 20, 20, 20
 9   40, 40, 40, 40, 40, 40, 40, 40, 40
10   80, 80, 80, 80, 80, 80, 80, 80, 80

Medium variance
11   4, 5, 6
12   8, 10, 12
13   16, 20, 24
14   32, 40, 48
15   64, 80, 96
16   4, 4, 4, 5, 5, 5, 6, 6, 6
17   8, 8, 8, 10, 10, 10, 12, 12, 12
18   16, 16, 16, 20, 20, 20, 24, 24, 24
19   32, 32, 32, 40, 40, 40, 48, 48, 48
20   64, 64, 64, 80, 80, 80, 96, 96, 96

High variance
21   2, 5, 8
22   4, 10, 16
23   8, 20, 32
24   16, 40, 64
25   32, 80, 128
26   1, 2, 3, 4, 5, 6, 7, 8, 9
27   2, 4, 6, 8, 10, 12, 14, 16, 18
28   4, 8, 12, 16, 20, 24, 28, 32, 36
29   8, 16, 24, 32, 40, 48, 56, 64, 72
30   16, 32, 48, 64, 80, 96, 112, 128, 144
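Data sets of this kind can be generated along the following lines (our sketch; the per-subset target sum is an arbitrary choice, the paper does not report one):

```python
import random

def make_dataset(subset_sizes, target=1000.0, seed=0):
    """Positive reals with a known perfect partition: each subset of the
    generated solution sums to `target`, so the known minimum is Delta = 0."""
    rng = random.Random(seed)
    values, solution = [], []
    for j, size in enumerate(subset_sizes):
        draws = [rng.uniform(0.1, 1.0) for _ in range(size)]
        scale = target / sum(draws)         # rescale the subset to hit the target
        values += [d * scale for d in draws]
        solution += [j] * size
    return values, solution

values, solution = make_dataset([4, 5, 6])  # the shape of dataset 11 in Table 1
sums = [sum(v for v, j in zip(values, solution) if j == k) for k in range(3)]
print([round(s, 6) for s in sums])   # [1000.0, 1000.0, 1000.0]
```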

6. Results

For each data set, the EP and RS were each applied 100 times (with new seeds for the random number generator). For each such application, the total number of function evaluations was 100,000 plus the size of the initial population. To make RS comparable across all population sizes, a total of 100,400 function evaluations was performed for each RS run (100,400 corresponds to the number of function evaluations for an EP with a population of 400). Results averaged over the 100 replications are summarized in Fig. 1. The only true regularity across the data sets is that the mean solution for the EP is always lower than that for RS. The gain from using an EP can be considerable; in some cases the mean solution for RS is over 100 times larger than that for an EP. The mean solutions tend to fall with the number of elements in the set 𝒮. The size of the population is of greatest importance for a small number of elements in 𝒮, with larger populations recording a lower mean solution.


Note: In the table above, variance refers to the variance in the number of elements in the subsets of the known (generated) solution. Datasets 1-5 have the same number of elements in each subset in the known solution and a small number of subsets; datasets 6-10 increase the number of subsets. Datasets 11-15 and 21-25 are directly comparable with 1-5, with the variance in the number of elements in each subset in the known solution increasing from datasets 1-5 to 11-15 to 21-25. Likewise, the variance increases from datasets 6-10 to 16-20 to 26-30.

Fig. 1. (a) Mean solution; (b) minimum solution; (c) maximum solution; (d) variance of solutions. Note: The key in each diagram indicates the solution method. "RS" refers to "random search", while the numbered entries all refer to evolutionary programs, the number giving the size of the population. The title "low variance" refers to datasets 1-5, the mean number of elements in each subset indexing the datasets (see Table 1); the title "medium variance" refers to datasets 11-15 and "high variance" to datasets 21-25.

The remaining graphs in Fig. 1 give some idea as to the dispersion of solutions for both RS and EP. Both RS and EP are able to find the known minimum of zero when the number of elements in the set 𝒮 is small, as seen in Table 2. In general, the best solutions found by the EP are better than those found by RS.

We also examined the worst solution found by each algorithm (over the 100 replicates). With one exception, the maximum (worst) RS solution is worse than that found using an EP. For a given algorithm, the worst solution generally decreases with the number of elements in the set 𝒮. When the number of elements in the set 𝒮 is large, the worst performance of the EPs is considerably better than that of RS. This indicates that, at least for these SPPs, the mutation operator successfully avoids local minima in the objective function.

A final measure of dispersion is the variance of the solutions (again, over the 100 replicates). The EPs uniformly produce a tighter distribution of solutions than RS. This is particularly true when there is a large variance in the number of elements in each subset of 𝒮 for the known solutions, and especially when there are a small number of subsets.

Fig. 2 gives some idea as to the difficulty of optimizing the function Δ. The known solution for each data set was altered as follows:


(1) For each element N_i ∈ 𝒮, place the element in subset S^j, for j ∈ {1, ..., m}. (2) For each pair of elements N_i and N_i', allocate the elements to subsets S^j and S^j', respectively, for all possible values of i, i', j and j'. The first alteration amounts to all one-step deviations (q = 1 mutations) from the known solution, while the second constitutes all two-step deviations.4

4 Notice that this actually includes the known solution several times. Accounting for the known solutions was a nuisance in the computer code.

Densities were computed using a Normal kernel estimator with a window width of 500. Thus, Fig. 2 gives the distribution of solutions that are 0, 1 and 2 mutations away from the known minimum, with the probability along the vertical axis and the value of the function Δ along the horizontal axis. While there is typically a large probability mass around zero, the bulk of the probability mass is typically far from zero, where "far" is taken to mean "much larger than the worst solution obtained by either RS or the EPs". In this

sense, the set partitioning problem is a "difficult" problem.

Fig. 2. (a) 3 subsets: high, medium and low variance; (b) 9 subsets: high, medium and low variance. Note: These figures give the distribution of solutions 0, 1 and 2 mutations away from the known minimum, computed with a Normal kernel estimator. The numbers in the keys refer to the mean number of elements in each subset. As in Table 1, "variance" refers to the variability in the number of elements in each subset of the known solution. "Low variance, 3 subsets" refers to datasets 1-5; "medium variance, 3 subsets" to datasets 11-15; "high variance, 3 subsets" to datasets 21-25; "low variance, 9 subsets" to datasets 6-10; "medium variance, 9 subsets" to datasets 16-20; and "high variance, 9 subsets" to datasets 26-30.

CPU time increases with the total number of elements.5 For a small number of elements (data sets 1, 11 and 21), RS and EPs of population 20 and 50 take roughly the same amount of time (12-14 min), while an EP with population 400 takes almost 3 times longer (38 min). For a large data set (data sets 10, 20 and 30), RS takes about half the time of an EP with population 20 (211 versus 440 min), while an EP with population 400 takes 1811 min.

5 All runs were conducted on a Sun SPARCstation 20, model 621, using only one of the CPUs.

In the light of the above comparisons, it is easy to recommend the use of an EP with a relatively small population: the EPs tend to outperform RS, and by a sufficient margin for large data sets to compensate for the extra computational burden; and the performance of an EP with a small population is favorable relative to one with a large population, with the benefit of substantially smaller computational requirements.
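The construction behind Fig. 2 (enumerate all q = 1 deviations from the known solution, then smooth their Δ values with a Normal kernel of window width 500) can be sketched as follows; all names are ours and the tiny data set is purely illustrative:

```python
import math
from itertools import combinations

def delta(values, assign, m):
    """Objective (5): sum of |delta_i - delta_j| over all subset pairs."""
    sums = [0.0] * m
    for v, j in zip(values, assign):
        sums[j] += v
    return sum(abs(a - b) for a, b in combinations(sums, 2))

def one_step_deltas(values, solution, m):
    """Delta of every assignment reachable by moving one element to one subset.
    The known solution itself is included several times (cf. footnote 4)."""
    out = []
    for i in range(len(solution)):
        for j in range(m):
            neighbour = list(solution)
            neighbour[i] = j
            out.append(delta(values, neighbour, m))
    return out

def kde(samples, width=500.0):
    """Normal-kernel density estimate with the given window width."""
    c = 1.0 / (len(samples) * width * math.sqrt(2.0 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((x - s) / width) ** 2) for s in samples)

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # known solution: three subsets, each summing to 7
solution = [0, 1, 2, 2, 1, 0]
devs = one_step_deltas(values, solution, 3)
print(len(devs), min(devs))                 # 18 deviations; the minimum is the known optimum 0.0
density = kde(devs)                         # probing density(x) on a grid yields a Fig. 2-style curve
```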

Table 2. Number of times minimum achieved

Data set   Population size   Number of times minimum achieved
 1         RS                 7
           20                 5
           50                 4
           100                6
           200               13
           400               14
11         RS                43
           20                26
           50                34
           100               24
           200               22
           400                3
21         RS                20
           20                13
           50                22
           100               26
           200               20
           400                2

Note: These are the only three datasets for which the known minimum of zero was achieved. "RS" denotes "random search", while the remaining entries under "population size" refer to results for the evolutionary program.

7. Conclusions

Our results demonstrate beyond question the utility of EP in the particular incarnations of the SPP we have used. While our data are fabricated, it should be noted that real-world data are limited and consist of very large data sets, the properties of which cannot be varied systematically in the manner of our experiments. We have chosen to evaluate our EP in a natural manner: by examining the results of such a search. Further research will compare the EP we have described with more sophisticated EPs in which selection routines and parameters such as population size, the proportion of the population that is replaced, and mutation rates are allowed to vary throughout an individual experiment. It also seems natural now to compare the performance of the EP against that of simulated annealing and genetic algorithms. Again, careful control of the properties of the data sets will lend some credibility to any claims of superior performance, but it is in the nature of the problems within the domain of evolutionary optimization that such claims are always conjectures, and that dismissal of one method or another must be undertaken with great trepidation.

References

[1] T. Bäck, G. Rudolph and H.-P. Schwefel, Evolutionary programming and evolution strategies: similarities and differences, in: D.B. Fogel and W. Atmar, Eds., Proc. 2nd Ann. Conf. on Evolutionary Programming (Evolutionary Programming Society, La Jolla, CA, 1993).
[2] I.O. Bohachevsky, M.E. Johnson and M.L. Stein, Generalized simulated annealing for function optimization, Technometrics 28(3) (1986) 209-218.
[3] L. Davis, Ed., Handbook of Genetic Algorithms (Van Nostrand Reinhold, New York, 1991).
[4] D.B. Fogel, Evolving artificial intelligence, Doctoral Dissertation, University of California, San Diego (1992).
[5] D.B. Fogel, On the philosophical differences between evolutionary algorithms and evolutionary programming, in: D.B. Fogel and W. Atmar, Eds., Proc. 2nd Ann. Conf. on Evolutionary Programming (Evolutionary Programming Society, La Jolla, CA, 1993).
[6] D.B. Fogel, An introduction to simulated evolutionary optimization, in: D.B. Fogel and W. Atmar, Eds., Proc. 2nd Ann. Conf. on Evolutionary Programming (Evolutionary Programming Society, La Jolla, CA, 1993).
[7] L.J. Fogel, A.J. Owens and M.J. Walsh, Artificial Intelligence Through Simulated Evolution (Wiley, New York, 1966).
[8] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, Reading, MA, 1989).
[9] K.L. Hoffman and M. Padberg, Solving airline crew scheduling problems by branch and cut, Management Sci. 39(6) (1993) 657-682.
[10] J.H. Holland, Adaptation in Natural and Artificial Systems (University of Michigan Press, Ann Arbor, MI, 1975).
[11] S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi, Optimization by simulated annealing, Science 220 (1983) 671-680.
[12] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 2nd ed. (Springer, New York, 1994).
[13] I. Rechenberg, Evolutionsstrategie: Optimierung Technischer Systeme nach Prinzipien der Biologischen Evolution (Frommann-Holzboog Verlag, Stuttgart, 1973).
[14] H.-P. Schwefel, Kybernetische Evolution als Strategie der experimentellen Forschung in der Strömungstechnik, Diploma Thesis, Technical University of Berlin (1965).