Genetic algorithms for generating minimum path configurations


Peter K Sharpe*, Alan G Chalmers† and Adam Greenwood*

Minimum path configurations have been shown to be very useful in solving complex problems on large multiprocessor systems. However, minimum path configurations are irregular and thus their construction by more traditional methods for even modest numbers of processors may not be feasible in reasonable times. This paper describes how genetic algorithms may be used successfully to generate these configurations and compares the solutions obtained with those obtained by a heuristic depth-first strategy.

Keywords: genetic algorithms, minimum path configurations, parallel processing

*Bristol Transputer Centre, University of the West of England, Coldharbour Lane, Bristol BS16 1QY, UK. E-mail: [email protected]
†Department of Computer Science, University of Bristol, University Walk, Bristol BS8 1TR, UK. E-mail: [email protected]
‡Developed by the Swedish Institute of Computer Science
Paper received: 11 October 199:;. Revised: 13 July 1994

'Searching the full search space for the 32-processor AMP will take a billion billion years - but if we use Sicstus‡ Prolog it will only take half a billion billion years' [1].

Solving a problem on a distributed memory multiprocessor system requires the individual processors to co-operate when necessary, by exchanging messages. These messages are transferred between processors across links which interconnect the processors. If the number of links available at each processor is not restricted then the system may be fully interconnected, having a maximum distance between any two processors of one for any number of processors. In practice, however, there are limitations to this maximum number of links per processor, which implies that processors will no longer be able to communicate directly with each other, but must do so via intermediate processors. Thus as the number of processors in the multiprocessor system increases, so too does the number of intermediate processors through which a message must pass. In this

paper we restrict attention to systems that support a maximum of four links per processor, but the principles described are equally applicable to any number of links per processor.

The distance between any two processors may be measured as the number of links a message has to traverse between its source processor and its destination processor. Minimum path (AMP) configurations are irregular configurations, constructed to minimize this number of links, that is, to minimize the diameter, dmax, of the interconnection network [2]. A configuration of five processors with a diameter of 1 may be constructed as shown in Figure 1a. However, to provide a useful parallel processing platform, a multiprocessor system must have access to input/output facilities. Most systems achieve this by designating one processor as the system controller (SC), while the other processors perform the actual processing of the problem. If no additional links are available for including the system controller, then some existing links must be used, as shown in Figure 1b. In this case the number of processors within the configuration must be reduced to four if the diameter of 1 is to be maintained.

Owing to their irregular nature, finding optimum AMP configurations for larger numbers of processors is a formidable task even for traditional heuristic depth-first search strategies [3], prompting Gregory's quote above. For example, from the initial spanning tree of a 32-processor AMP (see below), there are 66 links that need to

be connected, giving 65!/66 ≈ 1.25 × 10^89 possible link combinations. However, it is this type of problem which has been usefully addressed in other fields by genetic algorithm techniques, and so these techniques offer a means of constructing the AMP configurations in acceptable times.

The genetic algorithm, developed by Holland [4], is a powerful directed search technique which is loosely based on biological models of evolution. With its inherent parallelism it effectively pursues multiple paths towards an optimum solution. It is also extremely robust in the face of

0141-9331/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved. Microprocessors and Microsystems Volume 19 Number 1 February 1995


Figure 1 Processor systems with a diameter of 1: a, five-processor system without SC; b, inclusion of SC using existing links

conflicting information. This paper presents findings on the suitability of genetic algorithms for the generation of AMP configurations.
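The size of the search space quoted in the introduction (65!/66 link combinations for the 32-processor AMP) can be checked with a few lines of Python. This is only an arithmetic check of the figure as stated in the text; the variable name is illustrative.

```python
import math

# Number of link combinations quoted in the text: 65! / 66.
# 66 = 2 * 3 * 11 divides 65!, so integer division is exact.
combinations = math.factorial(65) // 66

# Print in scientific notation to compare with the quoted order of magnitude.
print(f"{combinations:.2e}")
```

The value is of the order of 10^89, which is why an exhaustive search is not feasible and a directed search is needed.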

MINIMUM PATH CONFIGURATIONS

Figure 2 The 53-processor AMP configuration

A minimum path (AMP) configuration is constructed so that the diameter, dmax, of the interconnection network is minimized. This principle is maintained even at the expense of the loss of regularity in a system. Other features of the AMP configurations include:

• every link of a processor within the configuration must be connected to a different processor; that is, no two processors are connected by more than one link;
• as AMP configurations are used for connecting multiprocessor systems, a system controller processor, for providing the necessary input/output interface, must be included without the provision of additional links.

If every processor in the configuration has the same number of links, Δ, and the diameter of the configuration is at most dmax, then the upper bound on the maximum number of processors in the configuration, usually called the Moore bound, can be found by counting the number of processors which are at a distance 0, 1, 2, ..., dmax from a given processor. In extremal graph theory, graphs achieving this bound are called Moore graphs [5]. It has been shown by Biggs [6] that (except for dmax = 1 or Δ < 3) Moore graphs can exist only for the cases dmax = 2 and Δ = 3, 7 or 57. In other cases, graphs can be found that approach these bounds [7].
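The counting argument behind the Moore bound can be sketched as follows. The function name and signature are illustrative, not taken from the paper; the count assumes that a given processor has at most Δ neighbours and that every further processor contributes at most Δ - 1 new neighbours.

```python
def moore_bound(degree: int, diameter: int) -> int:
    """Upper bound on the number of processors for a given number of
    links per processor (degree) and a given diameter."""
    total = 1              # the processor itself, at distance 0
    at_distance = degree   # at most `degree` processors at distance 1
    for _ in range(diameter):
        total += at_distance
        # each processor on the current level can reach at most
        # (degree - 1) processors that have not been counted yet
        at_distance *= degree - 1
    return total


# Four links per processor, as assumed throughout the paper.
for d in (1, 2, 3):
    print(d, moore_bound(4, d))
```

For four links per processor this gives 5, 17 and 53 processors for diameters 1, 2 and 3. These are upper bounds only: they take no account of the links given up to attach the system controller.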

Spanning trees

Within a non-disjoint interconnection network routes exist from every processor to all other processors. These routes initially traverse one of the source processor's links and from there travel across other processors' links until reaching their destination. More than one route may exist between any processor pair and these routes may differ in length. The set of routes which gives the shortest distance between a processor and all other processors may be termed its spanning tree [8]. Figure 2 shows the 53-processor AMP and Figure 3 shows the spanning tree for the processor labelled 9 in the 53-processor AMP. The maximum length of any route contained in the spanning tree will be less than or equal to the diameter of the configuration, and thus the maximum depth of a spanning tree is also less than or equal to the diameter.

Figure 3 The spanning tree for processor labelled 9 in the 53-processor AMP configuration

AMP configurations have been shown to be particularly suitable for solving a class of problem which requires global communication [9, 10]. Communication overheads will have a significant influence on system performance and so, when comparing AMP configurations with other interconnection networks, we will use the properties of diameter and the average interprocessor distance between any two processors. The diameter of a multiprocessor configuration, dmax, has already been defined as the maximum direct distance between any two processors in the system. The number of intermediate processors that have to forward a message is thus at most (dmax - 1). The average interprocessor distance, davg, is the number of links on average


that a message has to traverse between a source processor and a destination processor. As we are dealing with irregular configurations, this average interprocessor distance will depend on the source processor's position within a configuration and thus may not necessarily be the same for all processors in the configuration. Therefore, we shall use the average of the interprocessor distances over all the processors in the configuration in this study. The average interprocessor distance for the N processors of a configuration with a diameter of dmax is given by:

$$d_{avg} = \frac{1}{N^2} \sum_{p=1}^{N} \sum_{d=1}^{d_{max}} d \, N_{pd} \qquad (1)$$

where $N_{pd}$ is the number of processors at distance d from processor p.
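A minimal sketch of how the diameter and the average interprocessor distance of equation (1) might be computed for a given network, using a breadth-first search from every processor. The adjacency-dictionary representation and all function names are assumptions made for illustration; the ring and hypercube builders serve only as small test networks.

```python
from collections import deque


def distances_from(adj, source):
    """Breadth-first search: number of links from `source` to every processor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_average(adj):
    """Return (dmax, davg), with davg normalized by N^2 as in equation (1)."""
    n = len(adj)
    total = 0
    diameter = 0
    for p in adj:
        dist = distances_from(adj, p)
        if len(dist) != n:
            raise ValueError("network is disconnected")
        total += sum(dist.values())
        diameter = max(diameter, max(dist.values()))
    return diameter, total / n ** 2


def ring(n):
    """Ring of n processors, two links each."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def hypercube(dim):
    """Hypercube of 2**dim processors, dim links each."""
    return {i: [i ^ (1 << b) for b in range(dim)] for i in range(1 << dim)}


print(diameter_and_average(ring(8)))       # (4, 2.0)
print(diameter_and_average(hypercube(3)))  # (3, 1.5)
```

With the N^2 normalization, the 8-processor ring and hypercube give average distances of 2.00 and 1.50, which agree with the corresponding entries reported later in the paper.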

THE GENETIC ALGORITHM

The genetic algorithm (GA) is used to optimize members of a population of 'structures' where the structures are encoded in a 'gene string'. The population of structures is evolved in a manner which is analogous to a naive view of biological evolution. In this case the determinant of evolution is the 'fitness' of each structure as measured by an appropriate fitness function. Continuing the analogy of biological evolution, the structures are called phenotypes and their representations as strings are called chromosomes or genotypes. In many cases, these representations are fixed length strings in which each component, or gene, may take only a restricted set of values, or alleles. In this work the set of values is restricted to two, so that the genotypes consist of binary strings.

The fitness function provides a measure of the fitness of a particular phenotype which, since there is a mapping between genotype and phenotype, gives a measure of the fitness of the genotype. In practice, the mapping between genotype and fitness is seldom perfect. However, the genetic algorithm seems able to overcome this difficulty. During the evolution of the population, strings which do well in terms of the fitness function are more likely to be selected for the 'breeding' process which creates new, more 'fit' populations. In summary, the process is:

1. Randomly generate a population of strings.
2. For each string, calculate its utility in terms of the fitness function.
3. For each string, calculate a selection probability based on its fitness.
4. Create a new population by selecting strings, based on the selection probability, and applying genetic operators.
5. Repeat from (2) until specified stop condition.
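The five steps above can be sketched as a short loop. This is a generic illustration under stated assumptions, not the authors' program: the fitness function is assumed to be non-negative and maximized (the paper minimizes a distance, so some transformation would be needed), the stop condition is a fixed number of generations, and `demo_vary` and the bit-counting fitness are hypothetical stand-ins for the real operators and evaluation.

```python
import random


def run_ga(fitness, vary, string_len, pop_size, generations, rng):
    # 1. Randomly generate a population of binary strings.
    population = [[rng.randint(0, 1) for _ in range(string_len)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):  # 5. Repeat until the stop condition.
        # 2. Evaluate every string.
        scores = [fitness(s) for s in population]
        # 3. Selection weights proportional to fitness (uniform if all zero).
        weights = scores if sum(scores) > 0 else None
        # 4. Select parents and apply the genetic operators.
        parents = rng.choices(population, weights=weights, k=2 * pop_size)
        population = [vary(parents[2 * i], parents[2 * i + 1], rng)
                      for i in range(pop_size)]
        best = max(population + [best], key=fitness)
    return best


def demo_vary(a, b, rng):
    """Hypothetical operator: one-point crossover plus an occasional bit flip."""
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    if rng.random() < 0.05:
        child[rng.randrange(len(child))] ^= 1
    return child


# Toy fitness: the number of ones in the string.
best = run_ga(sum, demo_vary, string_len=20, pop_size=30,
              generations=40, rng=random.Random(1))
print(sum(best), best)
```

Keeping the best string seen so far is a convenience of this sketch; the text does not say whether the original program did so.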

In the problem areas in which the genetic algorithm is likely to be applied a simple search, for example a random search, would last an inordinate length of time. The advantage of a genetic search is that the time to obtain a reasonable, not necessarily optimal, solution can be drastically reduced. This is explained by the notion of 'intrinsic parallelism'. The genetic search may be seen as a search for co-adapted alleles. Each individual string exemplifies a large number of possible 'patterns of co-adapted alleles', or schemata, as they are called by Holland [4]. Assessing the fitness of this string assesses all the schemata of which the string is an instantiation. If the genetic operators which are applied to the population generate new instances of above average schemata then an extremely effective intrinsically parallel search method is created.

The operation of the genetic algorithm is an iterative process of evaluation, selection and the application of genetic operators. The genetic operators used in this work are crossover and mutation. The selection procedure is a randomized process which ensures that the number of times a string is chosen for inclusion in the new population is proportional to the structure's fitness, relative to the rest of the population. The new population is then created by applying crossover and mutation. The crossover operation involves taking a pair of strings and exchanging a randomly selected segment of each string. One of the two new offspring is placed in the new population.

Crossover
parent 1      100  110101  101011010
parent 2      101  010010  111010101
offspring 1   100  010010  101011010
offspring 2   101  110101  111010101
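The crossover example above can be reproduced with a short sketch in which the exchanged segment is given by two cut positions. The function names are illustrative; the random variant simply draws the two cut positions, which is one plausible reading of 'a randomly selected segment'.

```python
import random


def segment_crossover(parent1, parent2, start, end):
    """Exchange the segment [start, end) between two equal-length strings."""
    child1 = parent1[:start] + parent2[start:end] + parent1[end:]
    child2 = parent2[:start] + parent1[start:end] + parent2[end:]
    return child1, child2


def random_segment_crossover(parent1, parent2, rng):
    """Pick a non-empty segment at random and exchange it."""
    start = rng.randrange(len(parent1))
    end = rng.randrange(start + 1, len(parent1) + 1)
    return segment_crossover(parent1, parent2, start, end)


# The strings from the example, with the segment at positions 3 to 8.
p1 = "100" + "110101" + "101011010"
p2 = "101" + "010010" + "111010101"
o1, o2 = segment_crossover(p1, p2, 3, 9)
print(o1)  # 100010010101011010
print(o2)  # 101110101111010101
```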

The proportion of the new population to which crossover is applied is governed by the crossover rate parameter. Returning to the idea of schemata, it can be seen that each crossover introduces new schemata for trial as well as assessing current schemata in new situations. Crossover, therefore, serves two complementary functions in the genetic search: to explore further the areas which are inherent in the population, and to introduce new areas of exploration.

Mutation is applied to each string in the new population. Each allele in the string is given a chance (given by the mutation rate parameter) of undergoing mutation. In the type of GA used here, the mutation rate specifies the probability of not copying an allele at each point in the string. If mutation is to be applied, the allele is chosen at random from the set of possible values. A mutation rate of 1.0 would give random strings. However, the mutation rate is normally set at a low level so that mutation functions as a background operator which ensures that the crossover has a full range of alleles on which to work.

The newly created strings are evaluated and the sequence of selection, crossover, mutation and evaluation is repeated for a specified number of times or until a required performance level is achieved. The process ensures that 'good' segments of the strings are retained and become more prevalent in the population as a whole, while mutation introduces variety and may lead the algorithm into new, and more profitable, areas of search.

In this work the population consists of strings which represent a series of numbers. These code for the connectivity of the spare links in a processor configuration after the minimum spanning tree for one processor has been constructed. The evaluation function consists of constructing a processor connection matrix from the information encoded in a string and calculating the total interprocessor distance of the resultant configuration. The objective is to minimize the total interprocessor distance of the processor network.
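The mutation operator described above, in which the rate is the probability of not copying an allele and the replacement is drawn at random from the allowed values, might look like the following sketch. The function name and the string representation are assumptions.

```python
import random


def mutate(string, rate, rng, alleles="01"):
    """Copy `string`, replacing each allele with probability `rate`
    by a value drawn at random from `alleles`."""
    out = []
    for allele in string:
        if rng.random() < rate:
            out.append(rng.choice(alleles))  # may redraw the same value
        else:
            out.append(allele)
    return "".join(out)


rng = random.Random(0)
original = "100110101101011010"
print(mutate(original, 0.001, rng))  # almost always an exact copy
print(mutate(original, 1.0, rng))    # a fully random string
```

Because the replacement may equal the original value, a binary allele actually changes with probability rate/2 under this reading, which is consistent with the remark that a rate of 1.0 gives random strings rather than inverted ones.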

String encoding

At the outset a decision was taken to keep the form of representation as simple as possible. Each free link on a processor is given a number. Each segment of the binary string codes for such a number. Thus, the first segment codes for the free link to which free link 0 connects, the second segment for the connection to free link 1, and so on. The segment size was chosen to be big enough to represent the largest possible number for the configuration under investigation.
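A sketch of how such a string might be decoded into a raw link list. The segment size is taken here to be the smallest number of bits that can hold the largest free-link number; the function names and this exact sizing rule are assumptions, since the text only says the segment is 'big enough'.

```python
def segment_size(num_free_links):
    """Bits needed to represent the largest link number, num_free_links - 1."""
    return max(1, (num_free_links - 1).bit_length())


def decode(bits, num_free_links):
    """Split a binary string into one segment per free link and return the
    link number each segment codes for (segment k is the target of link k)."""
    size = segment_size(num_free_links)
    if len(bits) < size * num_free_links:
        raise ValueError("string too short for the number of free links")
    return [int(bits[k * size:(k + 1) * size], 2)
            for k in range(num_free_links)]


# Four free links need two bits per segment.
print(decode("01001110", 4))  # [1, 0, 3, 2]
print(segment_size(66))       # 7
```

When the number of free links is not a power of two, some segments decode to numbers that do not correspond to any link, which is one source of the redundant information dealt with in the next section.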

Connection matrix construction

The decoding process results in a raw link list. This clearly contains some redundant or conflicting information. The list is parsed from beginning to end to remove this, a process which results in some free links re-appearing. These are re-connected using a simple join-furthest-apart heuristic. The final process is the breaking of a link between processors in order to provide for the system controller connection. This is also placed under the control of the GA. An extra segment is added to the string which codes for the connection to be broken. The resulting refined link list is then translated into a processor connection matrix. From this matrix the sum of the minimum paths between all processors is calculated and returned to the GA as the evaluation for the string in question. If, for any pair of processors, the diameter is exceeded, a small penalty is added to the sum of the minimum paths. This results in the GA minimizing the total path length in the configuration while attempting to stay within the set diameter. By careful choice of penalty, the GA is able to search configurations outside the set diameter and is thus not restricted to a particular sub-optimum set of solutions.

This refinement process results in a mismatch between the actual information encoded in the initial string and the resultant processor connection matrix. The result is that the evaluation process adds noise to the information acted on by the genetic algorithm. With a less robust technique this might lead to failure; however, one of the characteristics of GAs is their resilience in the face of noisy data.

In this work various approaches were tried before settling on the simple method used above. These included having the string code for half the connections (thereby effectively specifying the other half), parsing the string randomly and re-injecting strings accurately reflecting the connection matrix back into the population. It did not seem that these refinements led to any better results.
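The parsing and penalty steps described in this section could be sketched as below. Everything here is a hedged illustration: the conflict rules (a link joined to itself, to a non-existent link, to an already used link, to a link on the same processor, or duplicating an existing processor pair) are inferred from the constraints stated earlier, the `owner` list mapping each free link to its processor is hypothetical, and the penalty is taken to be proportional to the excess over the set diameter, as the conclusions suggest. The join-furthest-apart reconnection and the system controller link are not shown.

```python
def parse_link_list(targets, owner, existing=()):
    """Scan the raw link list from beginning to end, keeping only
    connections that do not conflict with earlier ones.

    targets[k]  free link that free link k asks to join
    owner[k]    processor owning free link k (hypothetical input)
    existing    processor pairs already joined, e.g. by the spanning tree
    """
    used = set()
    joined = {tuple(sorted(pair)) for pair in existing}
    links = []
    for link, target in enumerate(targets):
        if link in used:
            continue
        if target >= len(targets) or target == link or target in used:
            continue
        a, b = owner[link], owner[target]
        pair = (min(a, b), max(a, b))
        if a == b or pair in joined:
            continue
        links.append((link, target))
        used.update((link, target))
        joined.add(pair)
    leftover = [k for k in range(len(targets)) if k not in used]
    return links, leftover


def penalised_total(dist, diameter, penalty):
    """Sum of minimum path lengths over all processor pairs, plus a
    penalty for every pair whose distance exceeds the set diameter."""
    total = 0.0
    for row in dist:
        for d in row:
            total += d
            if d > diameter:
                total += penalty * (d - diameter)
    return total


links, leftover = parse_link_list([1, 0, 3, 3, 2, 9], [0, 1, 2, 3, 0, 1])
print(links, leftover)  # [(0, 1), (2, 3)] [4, 5]
print(penalised_total([[0, 1, 2], [1, 0, 1], [2, 1, 0]], 1, 0.5))  # 9.0
```

The leftover links are the ones that 're-appear' after parsing and would then be reconnected by the heuristic.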


Running the GA

The initial spanning tree of the AMP is first generated by a program which also specifies all the parameters required by the GA, in particular, the number of processors and the number of free links. It is these free links which the GA manipulates, leaving the initial spanning tree unchanged. Using this input, a crossover rate of 0.6 and a mutation rate of 0.001, experiments were carried out with varying population size, number of trials and initial random seed.

COMPARISON WITH OTHER CONFIGURATIONS

In this section we compare the AMP configurations generated by the genetic algorithm (labelled AMPGA) with AMP configurations which have been obtained using a heuristic depth-first search strategy implemented in Prolog [3] (which we will label AMPDFS) and with some other configurations which are often cited in the literature. The AMP configurations make use of existing links to include the system controller, while the other configurations use additional links. Table 1 shows the diameters, while Table 2 shows the average interprocessor distances for each of the configurations. As can be seen, the diameters of the AMP configurations are lower than those of any of the other configurations, and the diameters of the 64 and 128-processor configurations are lower for the AMPDFS configurations than for those generated by the genetic algorithm, AMPGA. However, the average interprocessor distance for the AMPGA configurations is lower than any of the others, including that for the 128-processor hypercube which requires seven links per processor rather than the four used for the AMPs.
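As a rough cross-check on the regular topologies in Tables 1 and 2, the diameter and the average distance of equation (1) have simple closed forms for a plain hypercube and a plain ring. The derivation below assumes no allowance for the system controller and uses the N^2 normalization of equation (1); the superscript labels are introduced here for illustration only.

```latex
% Hypercube of dimension n (N = 2^n): \binom{n}{k} processors lie at distance k
d_{avg}^{cube} = \frac{1}{2^{n}} \sum_{k=1}^{n} k \binom{n}{k}
               = \frac{n \, 2^{n-1}}{2^{n}} = \frac{n}{2},
\qquad d_{max}^{cube} = n

% Ring of N processors: two processors at each distance below N/2,
% and one at distance N/2 when N is even
d_{avg}^{ring} = \frac{N}{4} \quad (N \text{ even}), \qquad
d_{avg}^{ring} = \frac{N^{2} - 1}{4N} \quad (N \text{ odd}), \qquad
d_{max}^{ring} = \left\lfloor \frac{N}{2} \right\rfloor
```

For example, the 128-processor hypercube (n = 7) gives an average of 3.50 and a diameter of 7, and the 13-processor ring gives 168/52 ≈ 3.23 and a diameter of 6, in agreement with the tabulated values.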

Table 1 Comparison of diameters

Number of processors    8   13   16   23   32   40   53   64  128
AMPGA                   2    2    3    3    4    4    4    5    6
AMPDFS                  2    2    3    3    3    4    4    4    5
Hypercube               3    -    4    -    5    -    -    6    7
Torus                   3    -    4    -    6    7    -    7   12
Ternary tree            4    4    5    6    6    6    8    8   10
Ring                    4    6    8   11   16   20   26   32   64

Table 2 Comparison of average interprocessor distances

Number of processors      8     13     16     23     32     40     53     64    128
AMPGA                  1.28   1.55   1.66   2.02   2.29   2.46   2.58   2.79   3.41
AMPDFS                 1.28   1.55   1.73   2.05   2.31   2.53   2.76   2.92   3.58
Hypercube              1.50      -   2.00      -   2.50      -      -   3.00   3.50
Torus                  1.50      -   2.00      -   3.00   3.50      -   4.00   5.64
Ternary tree           1.97   2.56   2.91   3.39   3.93   4.25   4.77   5.01   6.25
Ring                   2.00   3.23   4.00   5.74   8.00  10.00  13.25  16.00  32.00

PARALLEL PANEL METHODS USING AMP CONFIGURATIONS

Panel methods have found a wide range of applications in aerospace and other industries, for example, the design of Formula 1 racing cars, sailing vessels, locomotives etc., where a need arises to predict the low speed flow past complex configurations. For example, according to Hess [11], a company such as Douglas Aircraft performs flow calculations about a complete aircraft approximately ten times per day. Panel methods are applicable to any problem that is governed by Laplace's equation. These methods were originally known as surface singularity methods; however, the technique of covering the domain of the problem with small quadrilaterals led to the name 'panel methods'.

To calculate the flow past an object, its surface is first discretized using a mesh of m × n points. This produces (m - 1) × (n - 1) panels. The influence of each panel on every other panel is computed to produce a matrix of (m - 1)^2 × (n - 1)^2 influence coefficients. The solution of the matrix is used to determine the velocities on the object's surface. The calculation of flow past an object by means of a panel method thus comprises four stages:

1. Calculation of panel geometries
2. Influence function calculation and assembly of the matrix
3. Solution of the influence matrix
4. Calculation of flow quantities on or off the surface of the body.

The global communication requirements and variations in computational complexity make the parallelization of these methods on large numbers of MIMD processors a formidable task, unless effective techniques can be found to minimize the inherent message densities. The parallel implementation of the panel method used to obtain the following results for the first three stages is described fully in Reference 9.

Table 3 AMPGA results for a 918 panel problem

Processors    Geom    Set up     Solve     Total
16            2.05     44.75    247.77    294.60
23            2.02     31.9     198.68    232.69
32            2.04     24.32    175.36    201.77

Table 4 AMPDFS results for a 918 panel problem

Processors    Geom    Set up     Solve     Total
16            2.02     44.89    248.58    295.53
23            2.03     32.08    200.24    234.40
32            2.03     24.14    173.18    199.39

Table 3 shows the results for the three stages of the panel method when implemented on the minimum path configurations generated by the genetic algorithms. Table 4 shows similar results for the panel method when implemented on AMP configurations generated by the directed search method. These tables show that the lower average interprocessor distance of the AMPGA configurations results in lower computation times compared with those AMPDFS configurations with the same diameter. However, although the AMPGA for 32 processors has the same average interprocessor distance as the 32-processor AMPDFS, the diameter of 4 for the AMPGA does make a difference, resulting in higher computation times than those of the 32-processor AMPDFS.

CONCLUSIONS

Genetic algorithms have been successfully used to obtain minimum path configurations with low average interprocessor distances. These average distances are as good as or better than those of the AMP configurations generated by a heuristic depth-first search, and significantly better than the average distances of the torus, ternary tree, ring and hypercube configurations. The genetic algorithm obtained the configurations with a low average interprocessor distance by disadvantaging members of the population with an amount proportional to how much they were over the expected diameter. This strategy has, however, still resulted in configurations with higher than expected diameters. Despite the low average interprocessor distances, this increased diameter does affect the communication overheads prevalent in the configuration when solving complex problems involving global communication, resulting in greater times to solve the problems.

Future work will examine a two-stream parallel genetic algorithm. One stream will select members of its population based solely on the diameter of the configuration, while the other stream will select on the average interprocessor distance. These streams will exchange a proportion of their subpopulations at frequent intervals. In this way it is hoped to obtain minimum path configurations with the lowest possible average interprocessor distances within the diameter suggested by extremal graph theory.


REFERENCES

1 Gregory, S Private communication
2 Chalmers, A G 'A minimum path system: a communication-efficient system for distributed-memory multiprocessors' S. Afr. J. Sci. Vol 89 (April 1993) pp 175-181
3 Chalmers, A G and Gregory, S 'Constructing minimum path configurations for multiprocessor systems' Parall. Comput. Vol 19 (April 1993) pp 343-355
4 Holland, J H Adaptation in Natural and Artificial Systems University of Michigan Press, Ann Arbor (1975)
5 Bollobás, B Extremal Graph Theory Academic Press, London (1978)
6 Biggs, N Algebraic Graph Theory Cambridge Tracts in Math. No. 67, Cambridge University Press, London (1974)
7 Bermond, J C, Delorme, C and Quisquater, J J 'Strategies for interconnection networks: some methods from graph theory' J. Parall. Distrib. Comput. Vol 3 (1986) pp 433-449
8 Chalmers, A G 'A minimum path system for parallel processing' PhD thesis, University of Bristol, Department of Computer Science (August 1991)
9 Chalmers, A G, Fiddes, S and Paddon, D J 'Parallel panel methods' In H S M Zedan (Ed) 13th Occam Users Group Conf. IOS Press, York (1990) pp 313-321
10 Chalmers, A G and Paddon, D J 'Parallel radiosity methods' In D L Fielding (Ed) 4th North American Transputer Users Group IOS Press, Ithaca, NY (October 1990) pp 183-193
11 Hess, J L 'Panel methods in computational fluid dynamics' Ann. Rev. Fluid Mech. Vol 22 (1990)

Peter Sharpe is a senior lecturer at the Bristol Transputer Centre. He received a PhD in surface chemistry from UMIST in 1976 and an MSc in parallel computer systems from the University of the West of England in 1989. Currently he is investigating adaptive algorithms applied to fundamental issues in parallel processing and data exploration.


Alan Chalmers is a lecturer in the Department of Computer Science at the University of Bristol. He obtained his MSc in 1984 in South Africa and his PhD from the University of Bristol in 1991. His current research interest is the development of advanced parallel processing techniques in order to solve complex science and engineering applications in reasonable times.

Adam Greenwood has a degree in computer science and an MSc in parallel computer systems from the University of the West of England awarded in 1992. Since his graduation he has moved into the field of computer publication, but still maintains his research interest in the application of parallel genetic algorithms.