Intelligent Systems
NORTH-HOLLAND
A Fuzzy Approach to Mapping Problems
V. CATANIA, A. PULIAFITO, and L. VITA
Istituto di Informatica e Telecomunicazioni, Viale A. Doria, 6, 95125 Catania, Italy
ABSTRACT This paper explores the possibility of using fuzzy logic to solve the mapping problem in some significant cases. The basic idea is to exploit the ability of fuzzy logic to manage accurately values of which only a qualitative estimate is known, so as to generate fuzzy inference rules for the heuristic solution of the mapping problem. The paper first deals with a simple case of mapping so as to describe the method used to deduce the fuzzy inference rules, considering both topological and cardinal variation. Then the method is applied to a toroidally connected network of processors. The proposed method is then implemented by means of a tool for the development of fuzzy systems through which it is possible to run the rule-based mapping system. © Elsevier Science Inc. 1996
1. INTRODUCTION
The performance that can be obtained by a parallel system is strictly linked to the use of available processors by the processes that have to be executed. The distribution of processes over the processors, commonly known as the mapping problem [3, 13], is meant to minimize the total length of time for execution of the processes. In order to reach this objective, mapping operations aim at optimizing two critical factors. The first factor is that the number of processes (M) is often greater than the number of processors (N) available on the system (cardinal variation). This means that more than one process will have to be
INFORMATION SCIENCES 95, 191-217 (1996) © Elsevier Science Inc. 1996 655 Avenue of the Americas, New York, NY 10010
0020-0255/96/$15.00 PII S0020-0255(96)00137-5
mapped on a processor, and attention will therefore have to be paid to avoid overloading the processors so as not to create blocks or delays in the parallel processing. The second factor consists of the difference between the structure connecting the processes which are to be allocated and the topology of the parallel architecture (topological variation). This means that some processes which need to communicate with each other may find themselves on processors which are not directly connected, and communications will not only take up more of the network, but will also be subject to delays due to passage through intermediate nodes. An efficient mapping strategy therefore aims at reducing the communication cost on the interconnection network and ensuring balancing of the load between the processors (CPU cost). A mapping strategy can be applied statically or dynamically. In the static approach [1], the mapping strategy is activated only once, at the start. The main limitation of this approach is its inability to manage nondeterministic events such as the creation and destruction of processes, with a consequent degradation in performance. Dynamic management of the mapping process [17], on the other hand, is based on the signaling of any variations in traffic parameters between the processes in the system. This information should allow the machine's operating system to modify the mapping at run-time so as to reduce the communication overhead. The risks of this approach are, briefly, the overhead introduced by moving processes to follow the run-time variations and the difficulty in establishing when to perform a new mapping. In this paper, we are interested in a mapping strategy called Cost Optimization Mapping.
This strategy is based on minimization of a cost function which can be considered as an index of the efficiency of a certain mapping; the lower the value assumed by the cost function, the better the mapping it is associated with. Appendix 1 summarizes the criterion used for the cost optimization map, through which two cost functions are defined: Fc and Fv. The first expresses the communication cost, and the second the unbalanced CPU cost, both relating to a particular mapping. Cost Optimization Mapping has been shown to be an NP-hard problem [5], so an exact solution is not practicable in most cases. Different heuristic approaches have been proposed which promise an acceptable complexity while producing a suboptimal mapping [4, 11, 12, 1]. There are, however, other problems which are difficult to solve, among which is the estimation of the parameters to which the heuristics are to be applied. To be more specific, it is quite difficult to estimate with any accuracy the volume of communication between processes or the CPU load of each process, and this might make application of the mapping method useless. In this paper, we explore the possibility of using an approach to the mapping problem based on fuzzy logic [18, 14, 6]. The proposed strategy, directly usable in solving the static mapping problem, can also be adopted in the dynamic approach by providing the operating system with suitable service primitives. The basic idea is to exploit the capacity fuzzy logic provides for accurately representing values of which only a qualitative estimate is known, to generate fuzzy rules of inference in a heuristic approach to the mapping problem. The heuristic according to which we produce the fuzzy rules was inspired by a proposal contained in [2], according to which it is possible to obtain a suboptimal mapping by appropriately performing progressive allocation of processes onto processors, thereby reducing the computational complexity. In the paper, we first deal with a simple case of mapping to describe the method used to deduce the fuzzy inference rules, considering both topological and cardinal variation. Then the method is applied to a toroidally connected network of processors. The efficiency of the mappings produced by the proposed method is assessed through a program which computes a chosen cost function, while the rule-based mapping system itself is run by means of a tool for the development of fuzzy systems. Section 2 of the paper presents a brief overview of fuzzy logic; Section 3 defines the framework of the paper and presents a simple application; Section 4 formalizes the proposed method for a class of systems made up of toroidally connected networks of computers, initially examining topological variation alone, and then proposing an extension which includes cardinal variation as well. Section 5 illustrates an application of the mapping strategy with both topological and cardinal variation.
Section 6, the last one, presents some remarks about the proposed approach and the authors' conclusions.
2. AN OVERVIEW OF FUZZY LOGIC
Nowadays, fuzzy logic [18] is of increasing interest in various fields of application, such as industrial process control [15], medical diagnosis [7], security trading [10], robot control [9], and in the management of complex decision-making or diagnosis systems in general [21, 8]. Fuzzy logic is based on the concepts of linguistic variables and fuzzy sets. A fuzzy set in a universe of discourse U is characterized by a membership function $\mu_F$ which assumes values in the interval [0,1]. A fuzzy set F is represented as
a set of ordered pairs, each made up of a generic element $u \in U$ and its degree of membership $\mu_F(u)$. A linguistic variable x in a universe of discourse U is characterized by a set $W(x) = (W_x^1, \ldots, W_x^n)$ and a set $M(x) = (M_x^1, \ldots, M_x^n)$, where W(x) is the set of names the linguistic variable x can assume, and $W_x^i$ is a fuzzy set whose membership function is $M_x^i$. If, for instance, x indicates a temperature, W(x) could be the set W(x) = (Low, Medium, High), each element of which is associated with a membership function. The fuzzy set associated with High, for example, could be the one shown in Figure 1. If x and y are taken to be two linguistic variables, fuzzy logic allows these variables to be related by means of production rules of the following type:
"if x is A then y is B"
where x is A is the premise of the rule, while y is B is the conclusion. This rule makes it possible to deduce, using specific inferential methodologies, a fuzzy set for y for each input value of x, whether it is associated with a fuzzy set or assumes a numerical (crisp) value. The premise defines the conditions in which the conclusions are to be applied; the conclusions
Fig. 1. Example of fuzzy set "HIGH" (membership degree from 0.00 to 1.00 over a universe of discourse from 0 to 100).
define the actions to be taken when the conditions of the premise are satisfied. More specifically, the degree of membership of the premise is calculated and, through application of a fuzzy logic inference method (typically MAX-DOT or MIN-MAX) to the conclusion, it allows the output y to be determined. In the rule described above, the degree of membership of the premise is calculated by assessing the degree of membership of a generic value of x in the fuzzy set A. If x is itself a fuzzy set, its degree of membership is determined by intersecting the fuzzy value of x with the fuzzy set A and choosing the maximum value of membership; if x is a crisp value, its degree of membership in the fuzzy set A is the value the membership function of A assumes at x. In general, the premise of a fuzzy rule is a statement in which fuzzy logic expressions are combined by means of operators such as the fuzzy operators and and or. The fuzzy logic operator and, when applied to two operands a and b (a and b), returns the minimum of the degrees of membership of a and b; the fuzzy logic operator or, when applied to the operands a and b (a or b), returns the maximum of the degrees of membership of a and b. Figure 2 gives an example of the whole inferential procedure. We assume we have the following two rules which relate the two input variables x and y and the output variable z:
if x is LOW or y is LOW then z is MEDIUM
if x is LOW and y is HIGH then z is LOW.
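As an illustration, the min/max semantics of the and and or operators can be sketched in Python on the two rules above. The ramp-shaped membership functions on a 0 to 100 universe and the names `mu_low`, `mu_high`, and `premise_degrees` are assumptions made only for this sketch.

```python
def mu_low(v):
    """Assumed LOW set on [0, 100]: membership 1 at 0, falling linearly to 0 at 100."""
    return max(0.0, min(1.0, (100.0 - v) / 100.0))


def mu_high(v):
    """Assumed HIGH set on [0, 100]: membership 0 at 0, rising linearly to 1 at 100."""
    return max(0.0, min(1.0, v / 100.0))


def fuzzy_and(a, b):
    """Fuzzy 'and': minimum of the two degrees of membership."""
    return min(a, b)


def fuzzy_or(a, b):
    """Fuzzy 'or': maximum of the two degrees of membership."""
    return max(a, b)


def premise_degrees(x, y):
    """Degree of membership of the premise of each of the two example rules."""
    rule_medium = fuzzy_or(mu_low(x), mu_low(y))   # if x is LOW or y is LOW then z is MEDIUM
    rule_low = fuzzy_and(mu_low(x), mu_high(y))    # if x is LOW and y is HIGH then z is LOW
    return rule_medium, rule_low


print(premise_degrees(20.0, 70.0))
```

With crisp inputs x = 20 and y = 70, the first premise takes the larger of the two LOW memberships and the second takes the smaller of LOW(x) and HIGH(y).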
The method illustrated is the MAX-DOT inference method, according to which the final output membership function for each output is the union of the fuzzy sets assigned to that output in a conclusion, after scaling their membership values so that they peak at the degree of membership of the corresponding premise. As shown in Figure 2, the degree of membership of a premise is determined as the maximum (or connective in the premise) or the minimum (and connective in the premise) of the degree of membership values for the intersection of the membership functions for the input fuzzy data value and the fuzzy set used. Having obtained the final output membership function, Z, it is possible to obtain a crisp value by adopting one of several defuzzification techniques, of which the Center of Area Method seems to give the best results [8]. In this method, the output value coincides with the abscissa of the barycenter of the output membership function Z; assuming U as the union of the inference made by each of
the individual rules, this value can be determined by
$$z_0 = \frac{\int z\,\mu_U(z)\,dz}{\int \mu_U(z)\,dz}$$
which is the weighted average over all elements in Z.
3. FUZZY MAPPING
The basic idea of this work is to explore the possibility of applying fuzzy logic to solve the mapping problem in certain significant cases. We started from the consideration that the values to be attributed to the communication patterns between processes and CPU loads for each process are not always exactly predictable. Preliminary assessment of the effort induced by a process in a network of processors is, in fact, based on criteria which could hardly determine it with any accuracy. It is, however, possible to estimate this effort in qualitative terms, and fuzzy sets are an effective method of representation of this assessment. For our purposes, each process to be mapped can be characterized by a CPU load and a set of communication patterns with other processes. The communication patterns are indicated with the symbol uij, where "i" and "j" are the indexes which indicate the process the communication originates from and the one to which it is destined, respectively. If we define Cij = Cji as the overall communication patterns between processes Pi and Pj, it follows that Cij = Cji = uij + uji. These communication patterns can thus be represented by means of a matrix |C| which is symmetrical with respect to the main diagonal. Of course, the elements of the main diagonal are null. |K| is an array, the generic element of which, Ki, expresses the CPU load induced by the process i. A conceptual model of the fuzzy system we use for the mapping problem is shown in Figure 3. The input variables are made up of two kinds of linguistic variables, Cij and Ki, the universe of discourse for which is assumed as being the set of integers [0...255] (Figure 4a) and the interval [0,1], respectively. For each kind of variable, we define the sets W and M, respectively, of the names and membership functions associated with them.
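The relation Cij = Cji = uij + uji can be made concrete with a short sketch. The function name `overall_patterns` and the sample volumes in `u` are hypothetical, chosen only to show that the resulting matrix is symmetric with a null diagonal.

```python
def overall_patterns(u):
    """Build the symmetric matrix C from directed volumes u[i][j] (i sends to j)."""
    n = len(u)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Cij = Cji = uij + uji; the main diagonal stays null
            c[i][j] = c[j][i] = u[i][j] + u[j][i]
    return c


# Hypothetical directed volumes for three processes
u = [[0, 100, 10],
     [120, 0, 0],
     [20, 5, 0]]
C = overall_patterns(u)
for row in C:
    print(row)
```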
The output linguistic variables are made up of the array |a|, the generic element of which, aij, called the assignment gain, provides a measure of the effectiveness of the assignment of the process i to the processor j. The universe of discourse chosen is the interval of real values [0,1], and the set
Fig. 3. Fuzzy system model (inputs |C| and |K|, inference engine with fuzzy rule base, defuzzifier, output |a|, tuning).
Fig. 4. (a) Membership function of input variables (Cij), with sets LOW and HIGH over the universe of discourse 0 to 255. (b) Membership function of output variables (aij), with sets LOW and HIGH.
of names W = [low, high] to which the membership functions shown in Figure 4b are associated. Through a process of defuzzification, the crisp values Aij, used to perform the mapping, are obtained. The higher the crisp value Aij obtained, the greater the possibility that the assignment in question is suitable to obtain a good mapping. The fundamental blocks making up the system are (Figure 3):
• The fuzzy inference engine: Its role is to match the preconditions of rules in the fuzzy rule base with the input state linguistic terms and perform implications. The result is a membership function curve for each output variable aij.
• The defuzzifier: Through the center of area method, it produces a crisp value Aij for each output variable aij.
• Tuning: It modifies the fuzzy rule base depending on the outputs |Aij| and the experience of the designer.
For the development of the fuzzy system described, we used the TILShell software package by Togai InfraLogic [16]. This software enables the user to define (and subsequently simulate) a fuzzy system by using a simple high-level language called the Fuzzy Programming Language (FPL), by means of which fuzzy constructs can be described and C source code can be generated for the routines and data necessary to implement the expert system. To describe the fuzzy expert system in FPL, the following steps have to be followed:
1. identifying the inputs and outputs
2. choosing their kinds of data
3. defining the membership functions
4. determining the set of rules which govern the system.
As regards the choice of the kinds of data which are to be used to represent and store input and output data, it is necessary to consider the range of values the data in question can assume (universe of discourse) and its resolution. Once the inputs, outputs, and membership functions have been created, all through a high-level graphic interface, it is necessary to specify the IF-THEN fuzzy rules which relate the inputs and the outputs. Compilation of the FPL code resulting from this initial phase produces C source code which can be used to define a complete application or be integrated into an already existing one.
3.1. FOUR PROCESSORS/FOUR PROCESSES MAPPING
In this section, we describe a simple example in order to show how fuzzy logic can be used to solve the problem of mapping. Our objective is to map four processes onto a network made up of four processors connected as shown in Figure 5. The lack of cardinal variation allows us to neglect the CPU costs of the processes to be mapped. So the only input variables to solve the problem are the elements Cij of the array |C| of communication patterns, each of which is associated with a set of names made up of High and Low. The output variables aij can also assume the names High and Low, to which, of course, different membership functions are associated. Because of the symmetry of the interconnection structure being considered, it is possible to make the arbitrary decision to map the process P1 on the processor μ1. The really interesting assignment gains are thus those in which the second index is 3 (i.e., those referring to processor 3). If, in fact, a second process is assigned to processor 3, it is clear that assignment of the other two processes can be done arbitrarily. Having excluded a13 from the output variables since, by hypothesis, the process P1 has already been assigned to the processor μ1, the significant output variables which remain are only a23, a33, and a43. Below, we give the fuzzy rules used, written on the principle of making the possibility of assigning processes with strong communication requirements to directly communicating processors as great as possible:
R1: IF C12 is LOW and C34 is LOW, then a23 is HIGH
R2: IF C12 is HIGH and C34 is HIGH, then a23 is LOW
R3: IF C13 is LOW and C24 is LOW, then a33 is HIGH
R4: IF C13 is HIGH and C24 is HIGH, then a33 is LOW
R5: IF C14 is LOW and C23 is LOW, then a43 is HIGH
R6: IF C14 is HIGH and C23 is HIGH, then a43 is LOW.
As can be observed, each assignment gain is deduced from a pair of rules.
Fig. 5. Hardware architecture used to connect four processors.
The logic governing each pair of rules can be understood through the following reasoning. Let us consider as an example the first pair of rules, R1 and R2, to evaluate the assignment gain a23. We assume P2 is mapped onto μ3 and, consequently, that the pair of processes P3 and P4 are mapped onto the pair of processors which have not yet been used, μ2 and μ4 (in any order). The efficiency of this mapping (measured through a23) will obviously depend on the communication patterns among the four processes. To be more specific, if we consider the communication volumes C12 and C34 between the processes P1 and P2 and P3 and P4, which are distant in the mapping (i.e., not directly communicating), it can be stated that:
• the lower (LOW) C12 and C34 are, the higher (HIGH) a23 is (rule R1);
• the higher (HIGH) C12 and C34 are, the lower (LOW) a23 is (rule R2).
Since, on any variation of C12 and C34 in the universe of discourse, each of the two propositions will be activated with a different degree of truth, correct assessment of a23 can only be made by combining (in the fuzzy logic sense) the two rules, R1 and R2. The remaining two pairs of rules, R3-R4 and R5-R6, can be explained in the same way. More specifically:
• the rules R3-R4 refer to the case in which P3 is mapped onto μ3 and P2 and P4 onto the processors μ2 and μ4;
• the rules R5-R6 refer to the case in which P4 is mapped onto μ3 and P2 and P3 onto the processors μ2 and μ4.
Once the values of a23, a33, and a43, resulting from the input communication pattern values, have been calculated, assignment of a process to processor 3 is immediate: the one having the maximum possibility of assignment Aij is chosen. At this point, it would seem appropriate to make some considerations on the rules and fuzzy sets used. By choosing a larger set W(x) of names the linguistic variable Cij can assume (e.g., introducing Very Low, Very High, ...), it would be possible to obtain greater precision in the output variables: the fuzzy rules would be better able to cover the spectrum of possible values. We chose to include only two names in the set W(x) (High and Low) as the relatively simple topological structure does not justify greater complexity in the definition of the rules and the greater computational cost this entails. In addition, the structure of the rules shown above is the final result of a tuning process in which the rules have been gradually modified so as to obtain an acceptable compromise
between the computational complexity of the rules and the accuracy of the final results.
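The rule pairs R1-R6 can be exercised end to end with a minimal Python sketch. It assumes linear LOW and HIGH membership functions (the exact shapes of Figure 4 are not reproduced here), MAX-DOT inference, and a sampled center of area; the function names, the neutral fallback of 0.5, and the sample matrix `c` are hypothetical.

```python
def low(v, top):
    """Assumed LOW set: membership 1 at 0, falling linearly to 0 at `top`."""
    return 1.0 - v / top


def high(v, top):
    """Assumed HIGH set: membership 0 at 0, rising linearly to 1 at `top`."""
    return v / top


def gain(c_a, c_b, samples=1001):
    """Crisp assignment gain from one rule pair.

    c_a and c_b are the volumes (0..255) of the two process pairs that end up
    on processors which are not directly connected.
    """
    w_high = min(low(c_a, 255.0), low(c_b, 255.0))    # both LOW  -> gain is HIGH
    w_low = min(high(c_a, 255.0), high(c_b, 255.0))   # both HIGH -> gain is LOW
    num = den = 0.0
    for k in range(samples):
        z = k / (samples - 1)
        # MAX-DOT: scale each output set by its premise degree, then take the union
        mu = max(w_high * high(z, 1.0), w_low * low(z, 1.0))
        num += z * mu
        den += mu
    # Center of area; 0.5 is a neutral fallback when no rule fires (assumption)
    return num / den if den > 0.0 else 0.5


def process_for_processor_3(c):
    """c[i][j] holds C(i+1)(j+1) for four processes; P1 is taken as already on processor 1."""
    gains = {
        2: gain(c[0][1], c[2][3]),  # R1-R2: a23 from C12 and C34
        3: gain(c[0][2], c[1][3]),  # R3-R4: a33 from C13 and C24
        4: gain(c[0][3], c[1][2]),  # R5-R6: a43 from C14 and C23
    }
    return max(gains, key=gains.get), gains


# Hypothetical communication patterns: P1-P2 and P3-P4 communicate heavily
c = [[0, 200, 20, 100],
     [200, 0, 120, 30],
     [20, 120, 0, 220],
     [100, 30, 220, 0]]
best, gains = process_for_processor_3(c)
print(best, gains)
```

In this example the heavily communicating pairs are P1-P2 and P3-P4, so placing P3 on processor 3 keeps both pairs on directly connected processors and a33 comes out as the largest gain.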
3.2. EIGHT PROCESSES/FOUR PROCESSORS MAPPING
In the presence of cardinal variation, care must be taken not to overload one processor more than the others, thus taking into account the CPU loads of the single processes in addition to the communication patterns. When there are eight processes to be mapped onto a system made up of four processors connected as shown in Figure 5, we will therefore have eight input variables of the type Ki (i = 1, ..., 8) besides the 28 of the type Cij. The approach used in this case orders the set of processes to be mapped in a nonincreasing manner with respect to the CPU load, and subdivides it into subsets with a cardinality equal to the number of processors. These subsets are then mapped in subsequent steps so as to map the processes with a higher CPU load onto different processors. In this approach, we privilege balancing of the CPU load rather than the communication cost, as our main aim is to show the applicability of fuzzy logic to the mapping problem, even at the expense of simplifying hypotheses. The method used can, however, be adapted to mapping heuristics which attribute a greater weight to communication cost or to both. In the case examined, we assume that the first subset to be mapped is made up of the processes P1, P2, P3, and P4, and that the fuzzy algorithm described in Section 3.1 determines the following partial mapping:
Processor   Process
μ1          P1
μ2          P2
μ3          P3
μ4          P4
To map the second subset of processes, we arbitrarily choose the processor μ1, onto which we map an appropriate process (one criterion could be that of choosing one which has strong communication patterns with the processes already mapped onto the processors adjacent to μ1). If we suppose that this process is P5, it is possible to obtain the fuzzy rules to map the remaining three processes (see Appendix 2). Examining one of
these rules, we can deduce that the various factors making up the premise include:
• the communication patterns between the processes to be allocated and all the processes previously allocated
• the communication patterns between the processes to be mapped onto different processors than the one being considered
• the sums of the CPU loads of the processes to be mapped and those already mapped onto the processor being considered.
As can be seen, in all we used 18 additional rules which, together with the six described in Section 3.1, give a total of 24. The use of nine output variables (three for each processor) allows us in the second phase to assess the suitability of mapping more than one process on a processor, according to the assignment gain values. In other words, in an optimal mapping there could be processors occupied by more than two processes or, vice versa, processors occupied by a single process. This kind of assignment would allow us to reserve one or more processors for one or more processes with much higher CPU loads than the others.
4. A TOROIDALLY CONNECTED NETWORK
Having verified the effectiveness of fuzzy mapping in the simple case of a four-processor architecture, we then turned to the study of mapping in the case of a toroid-type architecture. A toroid is a two-dimensional array of processors with the sides wrapped around. Each processor has four others connected to it, one each on the left, right, top, and bottom. A network like this is characterized by two parameters, X and Y, which represent the number of rows and columns of which it is constituted. Figure 6 shows a toroidally connected network of nine processors.
Fig. 6. A toroidally connected network.
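The wrap-around connectivity of a toroid can be sketched as follows. Processors are identified by (row, column) pairs starting from zero, which is a convention assumed here; the function names are hypothetical. The sketch counts the number of jumps between two processors, which is the quantity that distinguishes directly connected processors from those reachable only through intermediate nodes.

```python
def torus_distance(p, q, rows, cols):
    """Minimum number of jumps between processors p and q on a rows x cols toroid."""
    dr = abs(p[0] - q[0])
    dc = abs(p[1] - q[1])
    # Each dimension wraps around, so the shorter way round is taken
    return min(dr, rows - dr) + min(dc, cols - dc)


def processors_at(p, jumps, rows, cols):
    """All processors exactly `jumps` jumps away from processor p."""
    return [(r, c) for r in range(rows) for c in range(cols)
            if torus_distance(p, (r, c), rows, cols) == jumps]


# Nine-processor toroid: four processors at one jump, four at two jumps
print(processors_at((0, 0), 1, 3, 3))
print(processors_at((0, 0), 2, 3, 3))
```

On the nine-processor toroid every processor has four others at one jump and the remaining four at two jumps.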
The fuzzy mapping method proposed when there is only topological variation is based on a recursive algorithm, so the solution of the simplest toroid structure, the one made up of nine processors, is of fundamental importance for the solution of any structure of this kind. We will see later how this method can be exploited to solve the more general case, i.e., when there is also cardinal variation.
4.1. TOROIDALLY CONNECTED NETWORK WITH NINE PROCESSORS
In this section, we describe the basic steps for the mapping of nine processes onto nine processors connected as shown in Figure 6.
1. The set of names of the input variables {LOW and HIGH} and the related membership functions are defined.
2. The set of processes to be mapped is ordered in a nonincreasing manner with respect to the local average weight (see Appendix 1) and subdivided into subsets with a cardinality equal to the number of processors.
3. The first processor is chosen arbitrarily and is assigned the first process in the first subset.
4. The processes having higher communication patterns with the process already assigned are mapped onto the four processors directly communicating with the first one used.
5. Supposing that the method used to assign the first five processes, although heuristic, is capable of obtaining a satisfactory partial mapping, the fuzzy rules are written. For each of the four free processors and for each of the four processes Pi which still have to be allocated, i.e., for each of the pairs (μj, Pi) where
μj ∈ A = {set of free processors}
Pi ∈ B = {set of processes to be assigned},
a pair of (dual) rules is defined of the following kind:
if &k [Cik is HIGH (LOW)] and &h [Cih is LOW (HIGH)] then aij is HIGH (LOW)
where
&i [expri]: indicates a logical product of fuzzy expressions
k: Pk ∈ C = {processes already mapped onto processors 1 jump from μj}
h: Ph ∈ D = {processes already mapped onto processors 2 jumps from μj}.
According to the assignment gains, the processes affected by the rules are then assigned. Each available processor μj is assigned the process Pi with a greater aij than the others. If, on the basis of this criterion, a process is assigned to more than one processor, definitive assignment is obviously based on the greater assignment gain.
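One possible reading of this assignment criterion is a greedy selection over the crisp gains, sketched below. The function name `assign_by_gain`, the processor and process labels, and the gain values are hypothetical; the way conflicts are resolved (the pair with the greater gain wins, the losing processor falls back to its next best free process) is an assumption of the sketch.

```python
def assign_by_gain(gains):
    """Assign one process to each processor, highest crisp gain first.

    gains maps (processor, process) pairs to the crisp assignment gain.
    """
    assignment = {}
    used = set()
    # Visit the pairs from the greatest gain to the smallest
    for (processor, process), _ in sorted(gains.items(),
                                          key=lambda kv: kv[1], reverse=True):
        if processor not in assignment and process not in used:
            assignment[processor] = process
            used.add(process)
    return assignment


# Hypothetical gains: both free processors prefer P6
gains = {("mu1", "P6"): 0.9, ("mu1", "P7"): 0.4,
         ("mu2", "P6"): 0.8, ("mu2", "P7"): 0.3}
print(assign_by_gain(gains))
```

Here P6 is contended by both processors; it goes to the one with the greater gain, and the other processor receives P7.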
4.2. A TOROIDALLY CONNECTED NETWORK WITH MORE THAN NINE PROCESSORS
The general method proposed for topological variation alone entails a recursive procedure based on the solution of the simplest toroid architecture: one with nine processors. The steps to follow are summed up as follows:
1. Choice of the set of names for the input variables.
2. If the dimensions X and Y of the network are such that X is different from Y, then it is necessary to examine the subnetwork S of dimensions Z × Z, where Z = min{X, Y}.
3. Nine processors, organized into a 3 × 3 matrix and positioned by convention on the upper left-hand side of the subnetwork S, are chosen and assigned nine processes, using points 2, 3, and 4 of Section 4.1.
4. The remaining processes are assigned to the adjacent processors, i.e., to the processors which are immediately below and to the right of the matrix of processors which have already been used, with rules written according to the following criterion (we hypothesize having chosen the following four names for the input variables: Very LOW, LOW, HIGH, Very HIGH): for all the pairs (μj, Pi) of processors belonging to A and processes belonging to B, a pair of (dual) rules is defined as follows:
if &k [Cik is VERY HIGH (VERY LOW)] and &h [Cih is HIGH (LOW)] and &l [Cil is LOW (HIGH)] and &m [Cim is VERY LOW (VERY HIGH)], then aij is HIGH (LOW)
where
&i [expri]: indicates a logical product of fuzzy expressions
k: Pk ∈ C = {processes already mapped onto processors 1 jump from μj}
h: Ph ∈ D = {processes already mapped onto processors 2 jumps from μj}
l: Pl ∈ E = {processes already mapped onto processors 3 jumps from μj}
m: Pm ∈ F = {processes already mapped onto processors 4 jumps from μj}.
5. Point 4 is repeated until all the processors making up the subnetwork S are used.
6. The remaining processes are mapped onto the rows or columns of processors which are still free, always proceeding towards the outside of the basic architecture and writing the rules with the same criterion as in point 4.
The fuzzy mapping algorithm can be formalized by means of the following recursive procedure:

Procedure mapping_fuzzy(X, Y, processes)
  if X ≠ Y
    consider the network with dimension Z = min{X, Y}
    begin
      mapping_dim_equal(Z, processes);
      proceed towards the outside, mapping the processes on the rows and columns of free processors: point 6
    end
  else
    begin
      Z = X;
      mapping_dim_equal(Z, processes)
    end

Procedure mapping_dim_equal(X, processes)
  if X = 3, then
    map on toroidally connected network with nine processors
  else
    begin
      mapping_dim_equal(X-1, processes);
      mapping adjacent processes: point 4
    end
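The order in which the recursive procedure visits the processors can be sketched in Python. The sketch only enumerates the groups of processors filled at each phase, not the fuzzy rules themselves; the zero-based (row, column) coordinates, the function names, and the choice of one phase per extra row or column are assumptions, and both dimensions are taken to be at least three.

```python
def mapping_dim_equal(z):
    """Phases for a z x z subnetwork: the 3 x 3 core, then one border per recursion level."""
    if z == 3:
        # Base case: the nine-processor toroid of Section 4.1
        return [[(r, c) for r in range(3) for c in range(3)]]
    phases = mapping_dim_equal(z - 1)
    # Point 4: processors immediately below and to the right of those already used
    border = [(z - 1, c) for c in range(z)] + [(r, z - 1) for r in range(z - 1)]
    phases.append(border)
    return phases


def mapping_fuzzy(x, y):
    """Phases for an x-row by y-column network."""
    z = min(x, y)
    phases = mapping_dim_equal(z)
    # Point 6: proceed outwards over the rows or columns which are still free
    for r in range(z, x):
        phases.append([(r, c) for c in range(y)])
    for c in range(z, y):
        phases.append([(r, c) for r in range(x)])
    return phases


for phase in mapping_fuzzy(4, 5):
    print(phase)
```

For a network with four rows and five columns this yields three phases: the 3 × 3 core, the border that completes the 4 × 4 subnetwork, and the remaining column.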
In order to assess the method proposed, we developed a program capable of calculating the cost functions, as defined in Appendix 1, associated with all the possible mappings of a set of N processes on a toroidally connected network made up of N processors (N = 1, ..., 18), and thus assessed the optimal mapping. The results were then compared with those obtained in the simulation performed with the TILShell software package by Togai InfraLogic. The comparison confirmed that the solution found with the algorithm based on fuzzy logic corresponds to a mapping which really minimizes the cost function.
4.3. A TOROIDALLY CONNECTED NETWORK WITH CARDINAL VARIATION
The approach used in this case, like the one described in Section 3.2, orders the set of processes to be mapped in a nonincreasing manner with respect to the CPU cost and subdivides it into subsets with a cardinality equal to the number of processors. These subsets are then mapped in subsequent steps according to mapping procedures that can be considered as an extension of the one described for cases in which there is only topological variation (previous section). This procedure ensures that processes with a higher CPU cost will be mapped onto different processors. If, for example, we have nine processors and 27 processes, the procedure will have to be applied three times. As in the case of eight processes on four processors, it is obviously necessary to insert into the rules some terms which take account of the overall CPU cost deriving from the simultaneous presence of more than one process on a processor. The other difference as compared with cases in which there is only topological variation is the fact that, in the premises of the rules concerning the communication patterns, it will be necessary to consider the terms which take account of the processes already assigned in the previous steps. More formally, this means that the sets C, D, E, and F considered in steps 4 and 6 in Section 4.2 are made up not only of the processes already mapped in the current step, but also of those belonging to any subsets previously mapped. Steps 1, 2, 3, and 5 of the general case of topological variation alone can be repeated for each of the subsets considered (of course, step 1 is only performed once at the beginning). An evident advantage of this procedure is the fact that the additional rules are proportional to the number of processes in excess with respect to the availability of processors.
The computational complication, on the other hand, can be attributed to the fact that the rules written for the subsets after the first assume considerable dimensions.
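The ordering and subdivision step described above can be sketched as follows. This is only an illustrative sketch: the function name `split_by_cpu_cost` is hypothetical, and the CPU loads are the K values used in the example of Section 5.

```python
def split_by_cpu_cost(cpu_cost, n_processors):
    """Order processes by nonincreasing CPU cost and cut the ordered
    list into consecutive subsets of at most n_processors processes."""
    ordered = sorted(cpu_cost, key=lambda p: cpu_cost[p], reverse=True)
    return [ordered[i:i + n_processors]
            for i in range(0, len(ordered), n_processors)]


# CPU loads K1..K18 taken from the example in Section 5.
K = {1: 125, 2: 110, 3: 110, 4: 70, 5: 90, 6: 60, 7: 65, 8: 95, 9: 80,
     10: 10, 11: 50, 12: 30, 13: 55, 14: 35, 15: 40, 16: 38, 17: 47, 18: 58}

subsets = split_by_cpu_cost(K, n_processors=9)
print(sorted(subsets[0]))  # processes mapped in the first step
print(sorted(subsets[1]))  # processes mapped in the second step
```

With nine processors, each subset is mapped in its own step, so the nine heaviest processes can never share a processor with one another.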
5.
APPLICATION
Let us suppose we wish to map 18 processes onto a toroidal architecture with nine processors following the method described in Section 4.3. We hypothesize the following communication pattern matrix [C]:
$$
[C] = \left[\begin{array}{*{18}{r}}
0 & 220 & 210 & 220 & 230 & 250 & 30 & 50 & 40 & 15 & 150 & 140 & 40 & 35 & 0 & 160 & 50 & 180 \\
220 & 0 & 60 & 160 & 90 & 145 & 195 & 90 & 60 & 174 & 35 & 0 & 212 & 190 & 45 & 20 & 205 & 200 \\
210 & 60 & 0 & 45 & 155 & 160 & 105 & 165 & 0 & 220 & 0 & 195 & 35 & 200 & 0 & 0 & 187 & 0 \\
220 & 160 & 45 & 0 & 55 & 65 & 25 & 175 & 170 & 200 & 47 & 23 & 0 & 96 & 210 & 220 & 50 & 0 \\
230 & 90 & 155 & 55 & 0 & 0 & 150 & 20 & 155 & 215 & 198 & 0 & 240 & 55 & 230 & 0 & 15 & 20 \\
250 & 145 & 160 & 65 & 0 & 0 & 160 & 170 & 0 & 0 & 190 & 30 & 230 & 240 & 33 & 210 & 0 & 15 \\
30 & 195 & 105 & 25 & 150 & 160 & 0 & 55 & 148 & 15 & 0 & 180 & 0 & 210 & 230 & 250 & 210 & 50 \\
50 & 90 & 165 & 175 & 20 & 170 & 55 & 0 & 163 & 30 & 210 & 40 & 0 & 170 & 200 & 0 & 230 & 210 \\
40 & 60 & 0 & 170 & 155 & 0 & 148 & 163 & 0 & 10 & 20 & 190 & 200 & 20 & 40 & 0 & 0 & 200 \\
15 & 174 & 220 & 200 & 215 & 0 & 15 & 30 & 10 & 0 & 195 & 200 & 30 & 10 & 0 & 200 & 35 & 190 \\
150 & 35 & 0 & 47 & 198 & 190 & 0 & 210 & 20 & 195 & 0 & 210 & 20 & 215 & 0 & 0 & 193 & 10 \\
140 & 0 & 195 & 23 & 0 & 30 & 180 & 40 & 190 & 200 & 210 & 0 & 230 & 25 & 200 & 33 & 21 & 0 \\
40 & 212 & 35 & 0 & 240 & 230 & 0 & 0 & 200 & 30 & 20 & 230 & 0 & 192 & 193 & 200 & 210 & 22 \\
35 & 190 & 200 & 96 & 55 & 240 & 210 & 170 & 20 & 10 & 215 & 25 & 192 & 0 & 80 & 181 & 200 & 21 \\
0 & 45 & 0 & 210 & 230 & 33 & 230 & 200 & 40 & 0 & 0 & 200 & 193 & 80 & 0 & 0 & 0 & 200 \\
160 & 20 & 0 & 220 & 0 & 210 & 250 & 0 & 0 & 200 & 0 & 33 & 200 & 181 & 0 & 0 & 210 & 220 \\
50 & 205 & 187 & 50 & 15 & 0 & 210 & 230 & 0 & 35 & 193 & 21 & 210 & 200 & 0 & 210 & 0 & 10 \\
180 & 200 & 0 & 0 & 20 & 15 & 50 & 210 & 200 & 190 & 10 & 0 & 22 & 21 & 200 & 220 & 10 & 0
\end{array}\right]
$$
Let us also suppose we have the following CPU load values:
K1 = 125, K2 = 110, K3 = 110, K4 = 70, K5 = 90, K6 = 60, K7 = 65, K8 = 95, K9 = 80,
K10 = 10, K11 = 50, K12 = 30, K13 = 55, K14 = 35, K15 = 40, K16 = 38, K17 = 47, K18 = 58
According to the CPU cost values, the first subset of processes to be mapped is the one comprising processes 1 to 9. Once steps 1, 2, and 3 of Section 4.1 have been performed, assuming we have chosen (μ5, P1) as the first processor/process pair, we can, for example, obtain the following partial mapping:
Processor   Processes
μ2          P2
μ4          P5
μ5          P1
μ6          P3
μ8          P4
Applying the method described in step 4, we write a total of 32 rules (see Appendix 2) which allow us to obtain the assignment gains for allocating the remaining processes onto the free processors. The following are the Tilshell results [A] and the associated mapping:
       6       7       8       9
1   0.4381  0.6208  0.3671  0.5573
3   0.6246  0.5867  0.5681  0.3688
7   0.3753  0.4131  0.3671  0.6311
9   0.5618  0.3791  0.6328  0.4426
Processor   Processes
μ1          P7
μ2          P2
μ3          P6
μ4          P5
μ5          P1
μ6          P3
μ7          P9
μ8          P4
μ9          P8
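One way to read an assignment off the gain table is sketched below. The text does not state how the mapping is derived from the gains, so the selection criterion used here (the one-to-one pairing with the largest total gain, found by exhaustive search) is an assumption, as is the reading of the row labels 1, 3, 7, 9 as the free processors and the column labels 6 to 9 as the remaining processes. The name `best_pairing` is hypothetical.

```python
from itertools import permutations

# Assumption: rows are the free processors, columns the remaining processes.
row_labels = [1, 3, 7, 9]
col_labels = [6, 7, 8, 9]
gains = [
    [0.4381, 0.6208, 0.3671, 0.5573],
    [0.6246, 0.5867, 0.5681, 0.3688],
    [0.3753, 0.4131, 0.3671, 0.6311],
    [0.5618, 0.3791, 0.6328, 0.4426],
]


def best_pairing(gains):
    """Return, for each row, the column chosen so that every column is
    used exactly once and the summed gain is maximal."""
    n = len(gains)
    return max(permutations(range(n)),
               key=lambda cols: sum(gains[r][c] for r, c in enumerate(cols)))


cols = best_pairing(gains)
pairing = {row_labels[r]: col_labels[c] for r, c in enumerate(cols)}
print(pairing)  # processor label -> process label
```

Under this reading the pairing agrees with the mapping table above (P7 on μ1, P6 on μ3, P9 on μ7, P8 on μ9). Exhaustive search is only practical for small tables such as this one.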
To map the processes in the second group, we began by allocating a process from the second group onto processor μ5. At this point, step 3 is performed, assigning the processes to the processors directly connected with μ5. The criterion followed to assign processes in this phase takes account of the communication patterns with the processes allocated on processors directly communicating with the processors in question. The last four processes are assigned with fuzzy rules which take account of the allocations already made in the previous steps, as shown in the table:
Processor   Processes
μ1          P7
μ2          P2, P16
μ3          P6
μ4          P5, P12
μ5          P1, P10
μ6          P3, P11
μ7          P9
μ8          P4, P18
μ9          P8
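The CPU load accumulated on each processor at this point, and the processes still to be placed by the fuzzy rules, can be checked with a few lines of code. This is only a bookkeeping sketch. The dictionary layout and the name `processor_loads` are illustrative choices, and the values are the K loads and the partial mapping given in this section.

```python
K = {1: 125, 2: 110, 3: 110, 4: 70, 5: 90, 6: 60, 7: 65, 8: 95, 9: 80,
     10: 10, 11: 50, 12: 30, 13: 55, 14: 35, 15: 40, 16: 38, 17: 47, 18: 58}

# Partial mapping from the table: processor -> processes placed so far.
partial = {1: [7], 2: [2, 16], 3: [6], 4: [5, 12], 5: [1, 10],
           6: [3, 11], 7: [9], 8: [4, 18], 9: [8]}


def processor_loads(mapping, cpu_cost):
    """Sum the CPU loads of the processes placed on each processor."""
    return {proc: sum(cpu_cost[p] for p in procs)
            for proc, procs in mapping.items()}


loads = processor_loads(partial, K)
placed = {p for procs in partial.values() for p in procs}
remaining = sorted(set(K) - placed)
single = [proc for proc, procs in partial.items() if len(procs) == 1]

print(loads)      # accumulated load per processor
print(remaining)  # processes still to be assigned
print(single)     # processors that host only one process so far
```

The sums computed here are the quantities that appear in the rule premises as terms of the form (Ki + Kj).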
Let us see one of the rules used:
If (K6 + K15) is LOW and C15,1 is LOW and C15,2 is HIGH and C15,3 is HIGH and C15,4 is LOW and C15,5 is LOW and C15,6 is LOW and C15,7 is HIGH and C15,8 is HIGH and C15,9 is LOW and C15,10 is LOW and C15,11 is HIGH and C15,12 is LOW and C15,16 is HIGH and C15,18 is LOW, then a15,3 is HIGH.
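To make the structure of such a rule concrete, the sketch below evaluates the firing strength of its premise for the data of this example. Several things are assumptions and are not taken from the paper: the linear LOW/HIGH membership functions, the scales `C_MAX` and `K_MAX`, and the use of the minimum as the AND operator. The numbers in `C15` are row 15 of the matrix [C].

```python
def high(x, x_max):
    """Assumed linear membership in HIGH, clipped to [0, 1]."""
    return max(0.0, min(1.0, x / x_max))


def low(x, x_max):
    """Assumed linear membership in LOW, the complement of HIGH."""
    return 1.0 - high(x, x_max)


# Row 15 of [C]: C15[j] is the communication between P15 and Pj.
C15 = dict(zip(range(1, 19),
               [0, 45, 0, 210, 230, 33, 230, 200, 40,
                0, 0, 200, 193, 80, 0, 0, 0, 200]))
K6, K15 = 60, 40

HIGH_TERMS = (2, 3, 7, 8, 11, 16)          # premises "C15,j is HIGH"
LOW_TERMS = (1, 4, 5, 6, 9, 10, 12, 18)    # premises "C15,j is LOW"

C_MAX = 250.0  # assumed scale: largest entry of [C]
K_MAX = 250.0  # assumed scale: twice the largest CPU load

degrees = [low(K6 + K15, K_MAX)]
degrees += [high(C15[j], C_MAX) for j in HIGH_TERMS]
degrees += [low(C15[j], C_MAX) for j in LOW_TERMS]

strength = min(degrees)  # AND taken as the minimum (an assumption)
print(strength)
```

A full inference step would combine the strengths of all 32 rules and their duals and defuzzify the result, which is what the fuzzy development tool does. This sketch covers a single premise only.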
Of course, we will have in all 32 rules. The effectiveness of the mapping is guaranteed by the fact that the processes with a higher CPU cost were not assigned to the same processor (according to the way in which the sets of processes to be subsequently assigned were chosen) and, in the second phase (assignment of the second group of nine processes), the fact that both cardinal and topological
variation are taken into account. The following are the Tilshell simulation results and the related mapping:
       13      14      15      17
1   0.6391  0.4825  0.5432  0.4529
3   0.4353  0.5932  0.4163  0.6433
7   0.5670  0.3201  0.6525  0.4133
9   0.4835  0.4207  0.5033  0.5327
Processor   Processes
μ1          P7, P13
μ2          P2, P16
μ3          P6, P17, P14
μ4          P5, P12
μ5          P1, P10
μ6          P3, P11
μ7          P9, P15
μ8          P4, P18
μ9          P8
From careful examination of the communication pattern and CPU load values, it can be seen that the mapping obtained gives both a balanced distribution of the overall CPU loads and allocation of heavily communicating processes onto directly communicating processors.
6.
REMARKS ON OUR APPROACH
The aim of this paper is not to provide the solution to the problem of mapping as much as a possible solution using the fuzzy approach. The reason why we feel this approach to be significant is as follows. The validity of most of the algorithms proposed in the literature for the solution of mapping problems is compromised by the uncertainty characterizing estimation of the parameters these algorithms operate on (e.g., communication patterns and CPU load). It is therefore of interest to explore alternative approaches--in this specific case, the fuzzy approach--which, although apparently imprecise, allows the uncertainty
characterizing a procedure or the data to which it is applied to be expressed. The fuzzy approach proposed in the paper allows the mapping problem in toroidal topologies to be solved even in the presence of topological and cardinal variation, always obtaining results which are very close to an optimal solution, as confirmed through comparison with some mapping methods proposed in the literature [3].
Some simplifying assumptions were made in the paper to make the logic governing the construction of fuzzy rules more direct. Once this logic is understood, we feel that it should not be too complicated to extend the fuzzy approach to cases in which the simplifying hypotheses we adopted are removed. We consider the advantage of the solution proposed to lie in the great simplicity of the formal definition of the problem, as well as the speed with which it is possible to obtain results. Once the fuzzy rules are written, in fact, processing times are negligible compared with those of classical approaches based on cost function minimization techniques.
There are, of course, several open issues in the approach proposed which require further study. One regards the techniques to be used to represent, through fuzzy sets, the communication pattern and CPU load values estimated, for example, by an expert in the problem. Another concerns the problem of optimizing the rules produced to reduce the complexity of the system and increase its scalability. When the number of processors increases, the solution proposed requires new fuzzy rules to be written. Figure 7 shows how the number of fuzzy rules increases as the number of processors and processes grows. This operation, which is extremely onerous if performed manually, can, however, be automated quite easily thanks to the elementary logic with which the rules are defined. In this sense, we are currently investigating the inclusion of the fuzzy approach proposed in the paper in a more complex strategy for dynamic management of the mapping process in a distributed system. More specifically, the fuzzy approach could be used whenever the operating system has to perform run-time computation of a new mapping.
Fig. 7. Number of rules versus number of processors and processes.
Finally, we think it is relevant to point out that the approach proposed could prove to be very interesting if it is considered that applications in the field of parallel systems often exhibit regions in which communications are prevalently local. In this case, it would be necessary to perform optimal mapping for each region, which would require a limited number of fuzzy rules.
7.
CONCLUSIONS
This paper has described an approach to the mapping problem based on fuzzy logic. The cases examined show that it is possible, with relative simplicity, to produce fuzzy rules derived from a qualitative description of the mapping strategy one wishes to adopt. Along with the possibility of using fuzzy sets to represent the typical mapping parameters, we feel that this makes a useful contribution towards exploring alternative strategies, based on the inexact description of human experience, to solve the mapping problem. The results obtained in the cases dealt with were validated through comparison with some mapping methods proposed in the literature [3]. In all the cases examined, the fuzzy approach adopted gave mappings which were very close to optimum, obtaining a considerable reduction in computational complexity as compared with other methods. The proposed solution, directly usable in the static approach, can also be adapted to a dynamic one, assuming that the operating system has the mapping directives needed to invoke the fuzzy mapping procedures at run time [17]. There are, of course, several open issues in the approach proposed which require further study. One regards the techniques to be used to represent, through fuzzy sets, the communication pattern and CPU load values estimated, for example, by an expert in the problem. Another concerns the problem of optimizing the rules to reduce the complexity of the system and increase its scalability. Finally, we think it is relevant to point out that the approach proposed could prove to be very interesting if it is considered that applications in the field of parallel systems often exhibit regions in which communications are prevalently local. In this case, it would be necessary to perform optimal mapping for each region, which would require a limited number of fuzzy rules.
APPENDIX 1
The network of processors and the logical connections between the processes in an application are commonly represented by means of graphs. Therefore, if we use the following symbols:
$P_i$: generic process to be mapped
$\mu_j$: generic processor
$E_p$: set of connecting arcs between processes
$E_\mu$: set of connecting arcs between processors
$V_p$: set of processes
$V_\mu$: set of processors
it is possible to define the following graphs:
$G_p = (V_p, E_p)$, in which the nodes represent the processes and the arcs the logical connecting links between the processes
$G_\mu = (V_\mu, E_\mu)$, where the nodes correspond to the processors and the arcs to the physical connections.
Each arc in the graph of processes is associated with a weight which represents the volume of communication between the connected processes. The function $W: E_p \to \mathbb{N}$ is thus defined, which associates a communication volume to each arc in $E_p$. $W$ returns zero if its argument is NULL. Similarly, for each node in the graph of processes, it is possible to define the function $R: V_p \to \mathbb{N}$, which gives the processing overhead that the process places on the processor to which it is allocated. Then, having introduced the following notation:
$G: V_p \times V_p \to E_p$: the function which returns the connecting arc between two processes, NULL otherwise
$d: V_\mu \times V_\mu \to \mathbb{N}$: the minimum (integer) number of physical links that have to be crossed by a message traveling from processor i to processor j (nominal distance)
$m: V_p \to V_\mu$: the function which returns the processor on which a certain process is mapped
it is possible to define the two following cost functions:
• communication cost ($F_c$) due to topological variation
• unbalanced distribution cost ($F_l$) due to cardinality variation
For a certain mapping ($m$), $F_c$ can be represented as
$$F_c(m) = \sum_{P_i, P_j \in V_p} W(G(P_i, P_j)) \, d(m(P_i), m(P_j))$$
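The communication cost can be sketched in code for the nine-processor torus of Section 5. Two points are assumptions and are not stated in the text: the processors are taken to form a 3 x 3 torus numbered 1 to 9 row by row, and the sum is taken once per unordered pair of processes. The names `nominal_distance` and `communication_cost` are hypothetical.

```python
ROWS, COLS = 3, 3  # assumed layout of the nine-processor torus


def nominal_distance(a, b):
    """Minimum number of links between processors a and b (numbered
    1..9 row by row) on a ROWS x COLS torus with wrap-around links."""
    ra, ca = divmod(a - 1, COLS)
    rb, cb = divmod(b - 1, COLS)
    dr, dc = abs(ra - rb), abs(ca - cb)
    return min(dr, ROWS - dr) + min(dc, COLS - dc)


def communication_cost(weights, mapping):
    """F_c as a sum of weight times distance over unordered process
    pairs. weights maps (i, j) with i < j to the communication volume,
    and mapping maps each process to its processor."""
    return sum(w * nominal_distance(mapping[i], mapping[j])
               for (i, j), w in weights.items())


# Communication volumes among P1..P5, read from the matrix [C] of Section 5.
weights = {(1, 2): 220, (1, 3): 210, (1, 4): 220, (1, 5): 230,
           (2, 3): 60, (2, 4): 160, (2, 5): 90,
           (3, 4): 45, (3, 5): 155, (4, 5): 55}

# Partial mapping of Section 5: P1 on processor 5, its neighbours around it.
mapping = {1: 5, 2: 2, 3: 6, 4: 8, 5: 4}

print(communication_cost(weights, mapping))
```

On a torus of this size every pair of processors is at most two links apart, so the cost is driven by which heavily communicating pairs end up on directly connected processors.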
The $F_l$ function can be expressed as
$$F_l(m) = \frac{1}{|V_\mu|} \sqrt{\sum_{\mu_j \in V_\mu} (H - L_j)^2}$$
where
$$H = \text{[illegible in the source]}, \qquad L_j = \sum_{k \in V_p \,\wedge\, m(k) = \mu_j} R(k).$$
The global cost function we consider is expressed through a linear combination of the two cost functions, $F_c$ and $F_l$:
$$x(m) = \delta \alpha F_c(m) + (1 - \alpha) F_l(m), \qquad 0 \le \alpha \le 1$$
where
$\delta$: represents a normalization factor
$\alpha$: weights the two cost functions according to requirements.
Finally, given a generic node in the graph of processes, $n \in V_p$, we define as the local average weight
$$\sigma(n) = \frac{1}{|E_n|} \sum_{i \in E_n} W(i)$$
where
$$E_n = \{\, i \in E_p \mid i \in \{G(n, j)\} \text{ and } j \in V_p \,\}$$
is the set of arcs to node n.
APPENDIX 2
For each rule reported in this Appendix, there exists a dual rule which can be obtained as described in Section 3.1.
Rules for Eight Processes/Four Processors Mapping:
• If C16 is LOW and C26 is HIGH and C36 is LOW and (K3 + K6) is LOW and C46 is HIGH and C56 is LOW and C78 is LOW, then a63 = HIGH.
• If C17 is LOW and C27 is HIGH and (K3 + K7) is LOW and C47 is HIGH and C57 is LOW and C37 is LOW and C68 is LOW, then a73 = HIGH.
• If C18 is LOW and C28 is HIGH and C38 is LOW and C48 is HIGH and C58 is LOW and C67 is LOW and (K3 + K8) is LOW, then a83 = HIGH.
• If C16 is HIGH and C26 is LOW and (K2 + K6) is LOW and C36 is HIGH and C46 is LOW and C56 is HIGH and C78 is HIGH, then a62 = HIGH.
• If C17 is HIGH and C27 is LOW and C37 is HIGH and C47 is LOW and C57 is HIGH and C68 is HIGH and (K2 + K7) is LOW, then a72 = HIGH.
• If C18 is HIGH and C28 is LOW and C38 is HIGH and C48 is LOW and C58 is HIGH and C67 is HIGH and (K2 + K8) is LOW, then a82 = HIGH.
• If C16 is HIGH and C26 is LOW and C36 is HIGH and C46 is LOW and C56 is HIGH and C78 is HIGH and (K4 + K6) is LOW, then a64 = HIGH.
• If C17 is HIGH and C27 is LOW and C37 is HIGH and C47 is LOW and C57 is HIGH and C68 is HIGH and (K4 + K7) is LOW, then a74 = HIGH.
• If C18 is HIGH and C28 is LOW and C38 is HIGH and C48 is LOW and C58 is HIGH and C67 is HIGH and (K4 + K8) is LOW, then a84 = HIGH.
Rules for Nine Processes/Nine Processors Mapping:
First Group of Rules (Processor 3):
• If Ci2 is HIGH and Ci3 is HIGH and Ci5 is LOW and Ci4 is LOW, then ai3 is HIGH, i = 6, ..., 9.
Second Group of Rules (Processor 9):
• If Ci3 is HIGH and Ci4 is HIGH and Ci5 is LOW and Ci2 is LOW, then ai9 is HIGH, i = 6, ..., 9.
Third Group of Rules (Processor 7):
• If Ci5 is HIGH and Ci4 is HIGH and Ci2 is LOW and Ci3 is LOW, then ai7 is HIGH, i = 6, ..., 9.
Fourth Group of Rules (Processor 1):
• If Ci5 is HIGH and Ci2 is HIGH and Ci3 is LOW and Ci4 is LOW, then ai1 is HIGH, i = 6, ..., 9.
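The four rule groups for the nine-processor case can be tried out on the data of Section 5 with the sketch below. It is not the inference performed by the fuzzy development tool. The linear LOW/HIGH memberships on a 0 to 250 scale, the minimum as AND, the omission of the dual rules, and the choice of the processor with the strongest rule for each process are all assumptions made for illustration. The entries of `C` are read from rows 6 to 9 of the matrix [C].

```python
def high(x, x_max=250.0):
    """Assumed linear membership in HIGH, clipped to [0, 1]."""
    return max(0.0, min(1.0, x / x_max))


def low(x, x_max=250.0):
    """Assumed linear membership in LOW, the complement of HIGH."""
    return 1.0 - high(x, x_max)


# C[i][j]: communication between free process i (6..9) and placed process j.
C = {6: {2: 145, 3: 160, 4: 65, 5: 0},
     7: {2: 195, 3: 105, 4: 25, 5: 150},
     8: {2: 90, 3: 165, 4: 175, 5: 20},
     9: {2: 60, 3: 0, 4: 170, 5: 155}}

# processor -> (indices j required HIGH, indices j required LOW)
RULES = {3: ((2, 3), (5, 4)),
         9: ((3, 4), (5, 2)),
         7: ((5, 4), (2, 3)),
         1: ((5, 2), (3, 4))}


def firing_strength(i, processor):
    """Degree to which the premise for (process i, processor) holds."""
    highs, lows = RULES[processor]
    return min([high(C[i][j]) for j in highs] +
               [low(C[i][j]) for j in lows])


best = {i: max(RULES, key=lambda p: firing_strength(i, p)) for i in C}
print(best)  # process -> processor with the strongest rule
```

Under these assumptions the strongest rule for each process points to the same processor as the mapping reported in Section 5 for processes 6 to 9.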
REFERENCES
1. J. K. Aggarwal and S.-Y. Lee, A mapping strategy for parallel processing, IEEE Trans. Comput. C-36(4) (Apr. 1987).
2. C. Anglano, G. Balbo, S. Donatelli, G. Franceschinis, and M. Sereno, The mapping problem: Combinatorial optimization approaches and performance evaluation, Tech. Report, Dipartimento di Informatica, Università di Torino, Italy, 1989.
3. S. H. Bokhari, On the mapping problem, IEEE Trans. Comput. C-30(3) (Mar. 1981).
4. M. Davoren, A structural mapping for parallel digital logic simulation, Proc. Distributed Simulation (1989).
5. L. Kronsjo, Complessita' computazionale degli algoritmi sequenziali e paralleli, Techniche Nuove 81-92 (1987).
6. R. C. T. Lee, Fuzzy logic and the resolution principle, J. Assoc. Comput. Mach. 19:109-119 (1972).
7. C. C. Lee, Fuzzy logic in control systems: Fuzzy logic controller, Parts 1 & 2, IEEE Trans. Syst., Man, Cybern. 20(2):404-435 (1990).
8. C.-T. Lin and C. S. G. Lee, Neural-network-based fuzzy logic control and decision system, IEEE Trans. Comput. 40(12) (Dec. 1991).
9. E. M. Scharf and N. J. Mandic, The application of a fuzzy controller to the control of a multi-degree-freedom robot arm, in Industrial Application of Fuzzy Control, North-Holland, Amsterdam, 1985, pp. 41-62.
10. K. L. Self, Fuzzy logic design, IEEE Spectrum 27:42-44, 105 (Nov. 1990).
11. K. Sharma, G. Haring and W. Mullner, The application of simulated annealing for optimal module placement in a multiprocessor system, Tech. Report, Institute for Statistics and Computer Science, University of Vienna, Austria.
12. H. Shen, Self-adjusting mapping: A heuristic mapping algorithm for mapping parallel programs onto transputer networks, in Proc. 1st OUG Developing Transputer Application, Edinburgh, Scotland, Sept. 1989.
13. L. Snyder and F. Berman, On mapping parallel algorithms into parallel architectures, J. Parallel and Distributed Computing 4 (1987).
14. M. Sugeno, Fuzzy decision-making problems, Trans. SICE 11:709-714 (1975).
15. M. Sugeno, Industrial Applications of Fuzzy Control, North-Holland, Amsterdam, 1985.
16. TILshell User's Manual, Togai Infralogic, Inc., ver. L2, 1990.
17. Y. Wolfstahl, Mapping parallel programs to multiprocessors: A dynamic approach, Parallel Computing 10 (1989).
18. L. A. Zadeh, Fuzzy sets, Inform. Contr. 8:338-353.
19. L. A. Zadeh, Fuzzy languages and their relation to human and machine intelligence, in Proc. Conf. Man and Comput., 1970.
20. L. A. Zadeh, Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. Syst., Man, Cybern. SMC-3(1) (Jan. 1973).
21. L. A. Zadeh, Fuzzy logic, IEEE Comput. Mag. 83-93 (Apr. 1988).
Received 1 September 1995; revised 1 November 1995, 3 March 1996