North-Holland Microprocessing and Microprogramming 17 (1986) 29-40
Analysis of a few List Scheduling Algorithms for Compaction of Horizontal Microprograms

S. Upendra Rao
Systems Dept., Tata Iron and Steel Co., Bearings Division, Kharagpur 721301, India

and

A.K. Majumdar
Dept. of Computer Science & Engg., I.I.T., Kharagpur 721302, India

Microcode compaction is an essential tool for the efficient translation of microprograms in computers. This paper describes and compares the performance of a few list scheduling algorithms for local microcode compaction. A new approach to utilizing resource conflict information in microcode compaction has been developed, and a dynamic modification of the microoperation weights is suggested. The dynamic modification approach has been found to be efficient (worst case computational complexity of order N³) and is capable of handling long microprogram graphs. Simulation results comparing the performance of the list scheduling algorithms are also reported.

Keywords: Horizontal microprogramming, Code optimization, Parallel processing, List scheduling algorithms, Data dependency, Microcode compaction, Resource conflict.
1. Introduction
Microprogramming is a commonly used tool in the design of computers to achieve flexibility and to improve performance. Optimization of microprograms improves both the execution characteristics of object programs and their memory requirements. Most microprogrammable processors allow simultaneous control of several hardware resources, so by distributing the parallel activities in a microprogram over different resources we can reduce the microprogram execution length. In fact, compacting a microprogram involves choosing, from the possible arrangements of concurrent activities, one that minimizes the execution time of the microprogram and its size [3]. This is a difficult task because even when two microoperations are found not to be data dependent, it may not be possible to execute them concurrently due to resource conflict. Hence the identification of concurrently executable microoperations, which requires both parallelism detection and resource allocation, has to be properly incorporated in optimization algorithms.

Efficient microcode for small microprogram segments can be achieved by hand coding. But for large microprogram segments, and for the automated translation of higher level microcode into code whose efficiency approaches that of hand coding, compaction techniques are needed. The problem of microprogram optimization has received considerable attention over the last decade and a half. Most microcode compaction algorithms first partition the given microprogram into straight-line microcode (SLM) segments (basic blocks) and then perform local microcode reduction; data flow analysis techniques are then generally used for global optimization. However, even the problem of optimizing basic blocks of microcode has been found to be NP-complete [13]. In view of this result many researchers have proposed suboptimal algorithms with polynomial time complexity that achieve efficient code compaction [1-6, 8, 11-13]. In fact, recent studies in local microcode compaction indicate that some list scheduling methods compact microprograms so well that their lack of guaranteed optimality may be ignored in most practical situations [3, 5, 6]. In list scheduling algorithms, the microoperations are assigned priority values according to some evaluation weight function, and microinstructions are then formed by repeated scans of ordered lists of microoperations. Fisher [5] has carried out an extensive simulation of different list scheduling algorithms for local microcode compaction and has classified them into groups according to their performance. In most of the list scheduling algorithms resource conflict does not contribute significantly to the microoperation weight function [5]. But in
many situations it may be desirable to execute a microoperation, say MOi, earlier than a microoperation MOj if MOi has resource conflict with a larger number of microoperations in the given microprogram than MOj does. This may enable the microoperations which are data dependent on MOi to be freed, so that they may execute concurrently with the remaining microoperations. Fisher considered a few list scheduling algorithms, such as 'Dense neighbourhood plus highest levels', 'Dense neighbourhood times highest levels' and 'Resource bottleneck compensation', where the weight value of a microoperation also depends on resource conflict information. In the first two methods, the resource conflict contribution to a microoperation weight is defined as the number of microoperations which do not conflict with it, and in the third method it is computed by a modified version of the Fernandez and Bussell lower bound (see Fisher [5] for further details). But the resource conflict contribution values determined in this way are pessimistic and contain redundant conflict information. For instance, suppose the microoperation MOi does not have resource conflict with the microoperation MOj, and let MOi be data dependent on MOj. Then, while computing the contribution of resource conflict to the weight function of MOj as in the above methods, we may neglect the contribution of MOi, since it can never be a candidate for parallel execution with MOj. In this connection it may be mentioned that, unlike data dependency, the resource conflict of any microoperation, say MOi, is not static. As other microoperations are executed, the number of resource conflicts of MOi with the remaining microoperations also reduces. However, in Fisher's algorithms the contribution of resource conflict to the weight function has been assumed to be static.
In view of this we propose a few algorithms where the weight function depends on both the data dependency and the resource conflict of a microoperation, and where the contribution of resource conflict to the weight function is dynamically modified during compaction. Following Fisher's experiments, we have compared the proposed algorithms with some of the existing list scheduling algorithms on random microoperation sets to measure their effectiveness.
2. Basic Definitions
Some of the definitions given here are common in the microprogramming literature and hence are only briefly described, with appropriate references indicated. We elaborate only on the definition of the potential set, which we have introduced and which is crucial to the design of our algorithms.

A functional unit is a hardware device which transforms input data or control signals into output data or control signals [13]. A microinstruction, denoted by MI, is an ordered set of control signals in a machine at a given time [6]. A microoperation, denoted by MO, is a separate machine activity specified in a microinstruction [6]. For two microoperations in a program segment, one microoperation is directly data dependent on the other if it either relies on the data produced by the other microoperation or destroys the data needed by it [13]. Resource conflict is a relationship between two microoperations, or between a microoperation and a group of microoperations, whereby both contend for the same processor device, such as a register, bus, ALU or shifter, or for the same fields of the microinstruction [13]. A data dependency graph, denoted by DDG, is an acyclic digraph which represents the data dependency information for all microoperations of a microprogram; each node of the graph represents a microoperation [6]. A data available microoperation is one for which all microoperations on which it is data dependent have already been assigned to some microinstructions [10]. A data available set, denoted by D-set, is a set of data available microoperations [10]. A complete-instruction, denoted by CI, is a microinstruction which can contain no other microoperation from the D-set because of resource conflicts with the microoperations already present in the complete-instruction [6].
The potential set of the i-th microoperation, denoted by P(i), is the set of microoperations which do not depend on the i-th microoperation and on which the i-th microoperation does not depend, directly or indirectly.

[Fig. 1: example DDG with eight nodes; nodes 2, 4, 5 and 6 are data dependent on node 1, node 7 is directly data dependent on node 3, and node 8 on node 7.]

These definitions are illustrated with the DDG shown in Fig. 1. The nodes 2, 4, 5 and 6 are data dependent on node 1, so the potential set of node 1, P(1), is (3, 7, 8); the potential set of node 2, P(2), is (3, 4, 5, 7, 8); and similarly P(3) is (1, 2, 4, 5, 6). Consider node 7, which is directly data dependent on node 3 and on which node 8 is dependent. Therefore P(7) is also (1, 2, 4, 5, 6). In the case of node 8, it is directly data dependent on node 7 and indirectly data dependent on node 3; therefore P(8) = P(7). The basic idea of the potential set is the grouping of the potential microoperations which can be executed in parallel with a given microoperation. The potential set of a node does not, however, remain static throughout the compaction process: once a microoperation is selected for compaction, we need not include it any more in the potential sets of the remaining microoperations. This idea is utilized in the dynamic microcode compaction methods proposed in section 4. It may also be noted that if MOi is a member of the potential set P(j), then MOj is a member of the potential set P(i); hence a symmetric property holds between any two potential sets.
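The potential-set computation can be sketched as follows (a minimal Python illustration, not the authors' implementation; the exact edge set used for Fig. 1 is an assumption consistent with the potential sets quoted above):

```python
def descendants(dag, n):
    """All microoperations reachable from n, i.e. data dependent on n."""
    seen, stack = set(), list(dag.get(n, []))
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(dag.get(m, []))
    return seen

def potential_sets(dag, nodes):
    """P(i): nodes that neither depend on i nor are depended on by i."""
    desc = {n: descendants(dag, n) for n in nodes}
    return {i: {j for j in nodes
                if j != i and j not in desc[i] and i not in desc[j]}
            for i in nodes}

# A DDG consistent with Fig. 1 (edges assumed for illustration):
# 2, 4, 5 depend directly on 1, 6 on 2, 7 on 3, and 8 on 7.
fig1 = {1: [2, 4, 5], 2: [6], 3: [7], 7: [8]}
P = potential_sets(fig1, range(1, 9))
assert P[1] == {3, 7, 8} and P[7] == P[8] == {1, 2, 4, 5, 6}
```

Note that the symmetry property mentioned above holds by construction: j is placed in P(i) exactly when i is placed in P(j).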
3. Existing Algorithms for Local Optimization

The objective of compaction algorithms is to discover concurrently executable microoperations and to compose them into an optimum or near optimum sequence of microinstructions. In this section we first briefly discuss some of the existing list scheduling algorithms which produce efficient microcode compaction [5, 12]. Later the performance of these algorithms is compared with the algorithms
we propose in section 4. The basic procedure in all list scheduling algorithms is the construction of the D-set; the best complete-instruction is then determined from the D-set using some metric, and this process is repeated until all the microoperations are included in microinstructions. Let W(i) be the weight assigned to a microoperation MOi using some evaluation function. The selection of microoperations from the D-set for inclusion into a microinstruction (MI) is done as follows. If W(i) = max [W(j)] over all MOj in the D-set, then select MOi first for compaction. If there exists another microoperation, say MOk, such that W(i) = W(k) = max [W(j)] over all MOj in the D-set, select the one that is earliest in the original straight-line microprogram ordering. The selected MO is tested for resource conflict with the MOs already present in the current MI; if it has no conflict with them it is added to the MI, else it remains in the current D-set. This process is repeated until a complete-instruction (CI) is formed, after all microoperations in the D-set have been examined. This CI is then added to the microinstruction list. Next, a new D-set is constructed and the above process is repeated until all the microoperations in the SLM have been placed.
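The selection loop just described can be sketched as follows (a hypothetical helper, not the authors' code; conflicts are assumed to be given as a set of unordered pairs, and the SLM ordering is taken to be the node label):

```python
def list_schedule(dag, nodes, weight, conflict_pairs):
    """Generic list scheduling: build the D-set of data-available MOs,
    pack a complete-instruction greedily by descending weight (ties
    broken by SLM ordering), and repeat until all MOs are placed."""
    preds = {n: set() for n in nodes}
    for u, vs in dag.items():
        for v in vs:
            preds[v].add(u)
    placed, schedule = set(), []
    while len(placed) < len(nodes):
        d_set = sorted((n for n in nodes
                        if n not in placed and preds[n] <= placed),
                       key=lambda n: (-weight[n], n))
        ci = []
        for n in d_set:  # add n unless it conflicts with the current CI
            if all(frozenset((n, m)) not in conflict_pairs for m in ci):
                ci.append(n)
        schedule.append(ci)
        placed.update(ci)
    return schedule
```

Each of the strategies below then differs only in how the `weight` table is computed.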
Scheduling Algorithm-1: Highest Levels First (HLF) [5]
Each microoperation in the data dependency graph is assigned a priority value equal to the height of that microoperation in the graph. The height H(i) of a microoperation MOi is defined as the length of the longest chain from the i-th microoperation to the exit [5]. For this method W(i) = H(i). Fig. 2 shows the microoperations of the example SLM. Applying the HLF method we arrive at the microinstructions shown in Table 1. Note that the optimum solution, as indicated in Table 1, requires a smaller number of microinstructions.
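The heights can be computed by a memoized longest-path search (a sketch; the convention that an exit node has height 1 is an assumption):

```python
def heights(dag, nodes):
    """H(i): length of the longest chain from node i to an exit node
    of the DDG (exit nodes get height 1; this convention is assumed)."""
    memo = {}
    def h(n):
        if n not in memo:
            memo[n] = 1 + max((h(s) for s in dag.get(n, [])), default=0)
        return memo[n]
    return {n: h(n) for n in nodes}
```

With W(i) = H(i), this weight table can be plugged directly into any list scheduling loop.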
Scheduling Algorithm-2: Wood's Heuristic Method [5, 12]

The weight of a microoperation i is the number of
[Fig. 2: (a) DDG of the example SLM (18 microoperations); (b) resource conflict information for its microoperations.]

[Fig. 3: DDG of the 8-node example used to illustrate Wood's method and the proposed algorithms.]
descendants [d(i)] of that MO, direct or indirect, in the DDG, i.e. W(i) = d(i). Consider the data dependency graph shown in Fig. 3; the corresponding weight values are shown in Table 2. For this case Wood's method produces the seven microinstructions shown in Table 3, whereas the optimal microinstruction sequence for this example consists of only six microinstructions.

Table 1
D-Set          Complete-Instruction   Optimum Solution
1,4            1                      1
2,3,4          2,3,4                  2,3,4
5,6,7,11       5,6,7,11               5,6,7,11
8,9,10,12      8                      9,10,12
9,10,12,15     9,10,12                8,14
13,14,15,17    13,15,17               13,15,17
14,16,18       14,18                  16,18
16             16                     -

Table 2
Microoperation (i)   1   2   3   4   5   6   7   8
Weight d(i)          1   3   4   0   3   2   0   0

Table 3
D-Set     Complete-Instruction   Optimum Solution
1,2,3     3,2                    1,2
1,5       5                      3,4
1,6       6                      5
1,7,8     1                      6
4,7,8     4                      7
7,8       7                      8
8         8                      -

It may be noted that in Wood's method the MOs having the largest data dependency weights are executed first, independently of resource conflict. For example, in Fig. 3 node 1 has data dependency weight value 1. The execution of this node is delayed because of its smaller weight, even though it has the highest resource conflict. Since node 1 is delayed, node 4, which depends on it, is also delayed. Had we executed node 1 earlier, we could have
combined MO4 with MO3, thereby reducing the number of microinstructions. In a recent analysis Davidson et al. [3] have shown that the heuristic approach proposed by Wood [12] has many desirable properties, and Fisher [5] has shown that it provides better code compaction for wide microprogram graphs.
Scheduling Algorithm-3: Dense Neighbourhood Plus Highest Levels [5]

The dense neighbourhood of a microoperation i, denoted by NBH(i), in a given DDG is defined as the number of microoperations which do not have resource conflict with MOi. In this strategy the weight of a microoperation i is defined as W(i) = NBH(i) + H(i), where H(i) is as given in the HLF method.

Scheduling Algorithm-4: Dense Neighbourhood Times Highest Levels [5]

The weight of a microoperation i is defined as W(i) = NBH(i) · H(i).

Algorithms 1-4 are known to have computational complexity of order N², where N is the number of microoperations in the data dependency graph [3, 5, 6].
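The dense neighbourhood count can be sketched as follows (a hypothetical helper; conflicts are assumed to be supplied as a set of unordered pairs):

```python
def nbh(nodes, conflict_pairs):
    """NBH(i): number of microoperations having no resource conflict
    with MOi (a static quantity, as used in strategies 3 and 4)."""
    return {i: sum(1 for j in nodes if j != i
                   and frozenset((i, j)) not in conflict_pairs)
            for i in nodes}

# Strategy 3 would then use W(i) = NBH(i) + H(i),
# and strategy 4 would use  W(i) = NBH(i) * H(i).
```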
4. Proposed Algorithms for Local Optimization

In this section we present a few list scheduling algorithms, all of which depend on both the data dependency weight and the resource conflict weight of the microoperations. Thus the weight functions we propose here are sensitive to both the data dependency graph and the resource constraints. In order to find these components of the weight function we need to construct the DDG for a given microprogram.

Data dependency weight. The weight of each node is the number of descendants of that node. This is the same as Wood's weight function d(i) described above. We preferred d(i) to H(i) as the first component of our weight function because it provides complete data dependency information for a node, and Davidson et al. [3, 6] have shown that this data dependency weight function provides better code compaction in many situations.

Construction of resource conflict weight. Let there be N nodes in the DDG. First construct an N × N Boolean resource conflict matrix M(i,j). In order to assign values to M(i,j), we construct a potential set P(i) for each node i of the graph. Then we assign values to M(i,j) as follows:

M(i,j) = 1 if j is in P(i) and node j has a resource conflict with node i,
       = 0 otherwise.

It may be recalled that the potential sets have a symmetric property, and this symmetry is also reflected in the matrix M(i,j). The resource conflict weight c(i) of a node i is now defined as

c(i) = Σ_{j=1}^{N} M(i,j).

In the dynamic modification methods these c(i) values are modified during the compaction process. We next propose four list scheduling algorithms: the first two work with a static assignment of weights, and in the last two the contribution of the resource conflict weight is dynamically modified. Although there are several ways of combining the data dependency weight and the resource conflict weight, we have selected the simple functional forms reported in strategies 5 and 6 for this purpose.

Scheduling Algorithm-5: Sum of Weight Components Method

The weight of a microoperation i is defined as W(i) = d(i) + c(i).
Table 4
Node i   Potential Set P(i)
1        2,3,5,6,7,8
2        1,3,4,5
3        1,2,4
4        2,3,5,6,7,8
5        1,2,4
6        1,4
7        1,4,8
8        1,4,7

Table 5
Node i   Data Dep. Wt. d(i)   Conflict Wt. c(i)   Wt. of Node W(i) = d(i) + c(i)
1        1                    5                   6
2        3                    0                   3
3        4                    1                   5
4        0                    2                   2
5        3                    1                   4
6        2                    1                   3
7        0                    3                   3
8        0                    3                   3
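Given the potential sets of Table 4 and the conflicting pairs of the example, the c(i) values of Table 5 are just the row sums of M. A sketch (the underlying conflict pairs used below are inferred from Tables 4 and 5, not listed explicitly in the text):

```python
def conflict_weights(potential, conflict_pairs):
    """c(i) = number of j in P(i) having a resource conflict with i,
    i.e. the row sum of the Boolean matrix M(i, j)."""
    return {i: sum(1 for j in p if frozenset((i, j)) in conflict_pairs)
            for i, p in potential.items()}

# Table 4 potential sets and the inferred conflict pairs of Fig. 3
P = {1: {2, 3, 5, 6, 7, 8}, 2: {1, 3, 4, 5}, 3: {1, 2, 4},
     4: {2, 3, 5, 6, 7, 8}, 5: {1, 2, 4}, 6: {1, 4},
     7: {1, 4, 8}, 8: {1, 4, 7}}
pairs = {frozenset(p) for p in
         [(1, 3), (1, 5), (1, 6), (1, 7), (1, 8), (4, 7), (4, 8), (7, 8)]}
assert conflict_weights(P, pairs) == {1: 5, 2: 0, 3: 1, 4: 2,
                                      5: 1, 6: 1, 7: 3, 8: 3}
```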
Fig. 4. Resource conflict matrix M(i,j) for the DDG of Fig. 3 (reconstructed from the potential sets of Table 4 and the weights of Table 5); only non-zero elements are shown:

       j:  1  2  3  4  5  6  7  8
i = 1      .  .  1  .  1  1  1  1
i = 2      .  .  .  .  .  .  .  .
i = 3      1  .  .  .  .  .  .  .
i = 4      .  .  .  .  .  .  1  1
i = 5      1  .  .  .  .  .  .  .
i = 6      1  .  .  .  .  .  .  .
i = 7      1  .  .  1  .  .  .  1
i = 8      1  .  .  1  .  .  1  .
These weight function values remain static throughout the execution of the algorithm. After assigning the weight function values to each node in the DDG, the compaction procedure is the same as in the existing list scheduling methods.

This method is illustrated with the DDG shown in Fig. 3. The potential set of each node and the resource conflict matrix are given in Table 4 and Fig. 4 respectively. The static weight function determined for each node is given in Table 5. Applying the sum of weight components algorithm we obtain the microinstruction sequence shown in Table 6; in this case the number of microinstructions obtained is indeed the minimum. The static weight function causes node 1 to be executed first, thereby allowing node 4 to be executed in parallel with node 3. As mentioned earlier, Wood's algorithm delays the execution of node 1 because of its low data dependency weight, even though it has the highest resource conflict.

Table 6
D-Set No.   D-Set   Complete-Instruction
1           1,2,3   1,2
2           3,4     3,4
3           5       5
4           6       6
5           7,8     7
6           8       8

The resource conflict of a microoperation is, however, not static: as a microoperation, say MOi, is compacted into a microinstruction, the resource conflicts of the remaining microoperations with MOi may thereafter be ignored. For this reason the static algorithm may fail to produce sufficient code compaction in many situations, as is evident from the microprogram graph given in Fig. 2 and the corresponding microinstruction list given in Table 8.

In this algorithm the additional computation involves construction of the potential sets and the conflict matrix; moreover, the conflict matrix has to be scanned for each node to find the resource conflict weight. The construction of the potential set for each node requires scanning the DDG and has worst case complexity O(N²). Similarly, construction and scanning of the conflict matrix also have worst case complexity O(N²). Therefore the overall computational complexity of the static algorithm remains O(N²).

Scheduling Algorithm-6: Maximum of Weight Components Method

In this method the two components are treated separately and the compaction process differs slightly from that of the above methods. The cumulative data dependency weight, say D, is determined by adding d(i) over all MOs present in the current D-set, and the cumulative conflict weight, say C, by adding c(i) over the same D-set. The algorithm can be described as:

1. If the D-set consists of more than one MO, then D = Σ d(i) and C = Σ c(i) over all MOi in the D-set; otherwise take this single MO as a MI.
2. If D ≥ C, W(i) = d(i).
3. If D < C:
   a) select MOi such that c(i) = max [c(j)] over all MOj in the D-set;
   b) if there exists another microoperation, say MOk, such that c(i) = c(k) = max [c(j)] over all MOj in the D-set, select MOi if d(i) > d(k) and MOk if d(k) > d(i); if d(k) = d(i), ties are broken according to SLM ordering.
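The D-versus-C decision step can be sketched as follows (a hypothetical helper; resolving the boundary case D = C in favour of the data dependency weight is an assumption consistent with Table 7):

```python
def order_d_set(d_set, d, c):
    """Order a D-set per Algorithm-6: compare the cumulative weights
    D = sum of d(i) and C = sum of c(i) over the D-set, then sort by
    the winning criterion; remaining ties fall back to SLM ordering."""
    D = sum(d[i] for i in d_set)
    C = sum(c[i] for i in d_set)
    if D >= C:
        key = lambda i: (-d[i], i)           # data dependency first
    else:
        key = lambda i: (-c[i], -d[i], i)    # conflict weight first
    return sorted(d_set, key=key)
```

The complete-instruction is then packed greedily from this ordering exactly as in the other strategies.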
As an example, consider the DDG given in Fig. 3. If we take the D-set (1, 2, 3), the contributions of the data dependency weight and the conflict weight are

D = d(1) + d(2) + d(3) = 8,
C = c(1) + c(2) + c(3) = 6.

Since D > C, the selection of MOs from this D-set depends on the data dependency weight of each microoperation. Among the MOs 1, 2 and 3, MOs 3 and 2 have the largest weights and they do not have resource conflict with each other; therefore the complete-instruction (3, 2) is formed. The final complete-instruction list obtained for this graph is given in Table 7.
Table 7
D-set   D   C   Selection Criterion   CI
1,2,3   8   6   Data dependency       3,2
1,5     4   6   Conflict wt.          1
4,5     3   3   Data dependency       5,4
6       -   -   (single MO)           6
7,8     0   6   SLM ordering          7
8       -   -   (single MO)           8
Scheduling Algorithm-7: Sum of Weight Components with Dynamic Modification

In algorithm 5 the resource conflict weight is computed once, after the construction of the initial resource conflict matrix, where the potential set of each node contains the maximum possible entries. But, as pointed out in section 2, once a microinstruction has been formed, the microoperations belonging to this MI need no longer be considered as potential candidates for the remaining microoperations. Hence the resource conflict weight of each node can be dynamically modified after the formation of each MI, and this algorithm undertakes that task. Here, as in the static algorithm, we start with potential sets having the maximum possible entries and construct the initial conflict matrix M(i,j) accordingly. After a complete-instruction (say CIj) is formed, we modify the conflict matrix by deleting all non-zero entries in the rows and columns identified by the microoperations belonging to CIj; thus for all MOk in CIj we set M(i,k) = 0 for all i, and likewise M(k,i) = 0 for all i. This procedure eliminates the contribution to the conflict weight of the uncompacted MOs due to MOs which have already been compacted. Modified values of the resource conflict weight for the remaining microoperations are then obtained by summing the non-zero entries in the corresponding rows of the modified conflict matrix. The proposed dynamic modification algorithm is given below.

Step 1. Generate a DDG of N nodes for the given microprogram.
Step 2. Determine the data dependency weight d(i) and the potential set P(i) for each microoperation MOi.
Step 3. Generate an initial resource conflict matrix M(i,j).
Step 4. Assign values to the matrix M as follows: M(i,j) = 1 if MOj is in P(i) and MOj has a resource conflict with MOi; M(i,j) = 0 otherwise.
Step 5. Determine the resource conflict weight c(i) for each MOi as c(i) = Σ_{j=1}^{N} M(i,j).
Step 6. Determine the values of the weight function as W(i) = d(i) + c(i).
Step 7. Generate a D-set; if it is empty the compaction is complete, STOP; otherwise go to Step 8.
Step 8. Create an empty CI.
Step 9. Find the MOi in the D-set with maximum W(i); if more than one MO has the maximum weight, select MOi based on SLM ordering. If MOi has resource conflict with any MO already present in the CI, skip MOi and go to Step 10; else add MOi to the CI.
Step 10. If all the entries of the D-set have been examined, then add the CI to the MI list, set M(i,j) = 0 for j = 1, ..., N and M(k,i) = 0 for k = 1, ..., N for every MOi in the CI, and go to Step 5; else go to Step 9.
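Steps 1-10 can be sketched end to end in Python (a minimal rendering under the definitions above, not the authors' implementation; the conflict matrix is kept as per-row adjacency sets rather than a Boolean array):

```python
def compact_dynamic(dag, nodes, conflict_pairs):
    """Algorithm-7: W(i) = d(i) + c(i), with the rows and columns of
    the conflict matrix zeroed after every complete-instruction."""
    preds = {n: set() for n in nodes}
    for u, vs in dag.items():
        for v in vs:
            preds[v].add(u)

    def desc(n):                        # descendants, for d(i) and P(i)
        seen, stack = set(), list(dag.get(n, []))
        while stack:
            m = stack.pop()
            if m not in seen:
                seen.add(m)
                stack.extend(dag.get(m, []))
        return seen

    down = {n: desc(n) for n in nodes}
    d = {n: len(down[n]) for n in nodes}                     # Step 2
    P = {i: {j for j in nodes if j != i
             and j not in down[i] and i not in down[j]} for i in nodes}
    # Steps 3-4: rows of M, stored as sets of conflicting partners
    M = {i: {j for j in P[i] if frozenset((i, j)) in conflict_pairs}
         for i in nodes}

    placed, schedule = set(), []
    while len(placed) < len(nodes):
        W = {i: d[i] + len(M[i]) for i in nodes}             # Steps 5-6
        d_set = sorted((n for n in nodes                     # Step 7
                        if n not in placed and preds[n] <= placed),
                       key=lambda n: (-W[n], n))             # ties: SLM order
        ci = []                                              # Step 8
        for n in d_set:                                      # Step 9
            if all(frozenset((n, m)) not in conflict_pairs for m in ci):
                ci.append(n)
        for n in ci:                                         # Step 10
            M[n] = set()
            for k in nodes:
                M[k].discard(n)
        schedule.append(ci)
        placed.update(ci)
    return schedule
```

On the small Fig. 3 example (edges and conflict pairs inferred from Tables 2, 4 and 5) this yields the same six microinstructions as the static method of Table 6.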
[Fig. 5: the data dependency weight d(i), conflict weight c(i), and the weight values W, W1, ..., W7 at each iteration for the 18 microoperations of the SLM of Fig. 2.]

This procedure is illustrated with the DDG and microoperation conflict information given in Fig. 2. The modified weights after each selection of a CI are shown in Fig. 5; the weight values at the k-th iteration are indicated by Wk, and the MOs selected during the k-th iteration are indicated by a * symbol. In Table 8 we show the microinstruction sequences generated for this microprogram using list scheduling algorithms two, five and the present one. It may be noted that in this example the list scheduling algorithm-7 has produced the optimum solution, whereas both Wood's method (i.e. strategy 2) and the list scheduling algorithm-5 have failed to do so.

Scheduling Algorithm-8: Maximum of Weight Components Method with Dynamic Modification
Here the dynamic modification of the resource conflict weight is used in the decision procedure of algorithm 6. After every microinstruction selection the resource conflict weight of each microoperation is modified as in algorithm 7, and the new C values are computed.

Algorithms 1 to 6 discussed so far have computational complexity of order N², whereas in algorithms 7 and 8 the additional computations involve modification of the resource conflict weights and recomputation of the weight function during each iteration. The worst case computational complexity of the algorithms with dynamic modification of the weight function has been shown to be O(N³) [10]. It may be mentioned that in most practical situations the number of microoperations per basic block is not large, as reported by Fisher [5]. Therefore the additional computational cost involved in the modification of the resource conflict weight is tolerable.
Table 8

Strategy 2:                        Strategy 5:                        Strategy 7:
D-set          CI                  D-set          CI                  D-set          CI
1,4            1                   1,4            1                   1,4            1
2,3,4          2,3,4               2,3,4          2,3,4               2,3,4          2,3,4
5,6,7,11       5,6,7,11            5,6,7,11       5,6,7,11            5,6,7,11       5,6,7,11
8,9,10,12      10,9,12             8,9,10,12      8                   8,9,10,12      10,9,12
8,14,17        8,17                9,10,12,15     10,9,12             8,14           8,14
13,14,15,18    13,15               13,14,15,17    14,15               13,15,17       13,15,17
14,16,18       14,18               13,17          13,17               16,18          16,18
16             16                  16,18          16,18

5. Simulation Results
In this section we describe simulation experiments, performed on a PRIME-550, to measure the effectiveness of the eight strategies. To simulate a microprogram graph with N nodes, we construct an N × N Boolean matrix and fill its upper triangular entries using a uniformly distributed random number generator. Only the upper triangular entries need to be filled, since we are dealing with directed acyclic graphs and it is known that a digraph is acyclic if and only if its vertices can be ordered such that its adjacency matrix is upper triangular [7]. We have similarly generated the resource conflict information among the microoperations by filling another N × N Boolean matrix with a uniform random number generator. Our experiments are similar to Fisher's [5] simulation experiments, and for each strategy tested we varied the number of microoperations per set from 5 to 120. For each set size, 200 microoperation sets were generated, and the average microinstruction length obtained by applying the list scheduling algorithms discussed in the previous sections is reported in Table 9. The column headings (numbered 1 to 8) indicate the list scheduling algorithm used for code compaction. It is evident from Table 9 that, for any size of microoperation set, strategy 7 is consistently better than every other strategy. It may also be observed from the simulation results that the average microinstruction length for any compaction algorithm increases almost linearly as the number of microoperations increases; a similar observation can be drawn from the simulation results reported by Fisher [5] (see Fig. 6), in which Fisher compared the performance of a few list scheduling algorithms with the theoretical lower bound computed by the modified version of Fernandez and Bussell. In our simulation, since strategy 7 consistently produces the minimum values, the relative performance of the strategies is plotted in Fig. 7 taking strategy 7 as a base. In this figure, the X-axis represents the number of microoperations and the Y-axis represents the percentage deviation of the average microinstruction length of each strategy from that of strategy 7. Thus, let Li(n) denote the average microinstruction length of strategy i for microprogram segments having n microoperations. Then the relative performance of strategy i with respect to strategy 7 can be defined as
Pi(n) = [Li(n) - L7(n)] / L7(n).
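The random microprogram graphs and conflict relations described above can be generated as follows (a sketch; the seeds and probabilities are illustrative, and the exact distribution used in the original experiments is not specified):

```python
import random

def random_dag(n, p, seed=None):
    """Fill only the upper triangular entries of an N x N Boolean
    matrix: an edge i -> j exists for i < j with probability p, which
    guarantees acyclicity (the vertex order is already topological)."""
    rng = random.Random(seed)
    return {i: [j for j in range(i + 1, n) if rng.random() < p]
            for i in range(n)}

def random_conflicts(n, p, seed=None):
    """Symmetric resource conflict relation as a set of unordered pairs."""
    rng = random.Random(seed)
    return {frozenset((i, j)) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p}
```

Averaging the schedule length produced by each strategy over many such graph/conflict samples then yields figures comparable to Table 9.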
Table 9
No. MOs   (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)
5         4.325    4.305    4.360    4.345    4.315    4.350    4.300    4.345
10        8.355    8.330    8.520    8.430    8.370    8.420    8.260    8.420
15        12.400   12.340   12.630   12.500   12.485   12.455   12.305   12.450
20        16.225   16.185   16.455   16.350   16.395   16.285   16.045   16.185
25        20.290   20.200   20.745   20.510   20.450   20.330   20.050   20.325
30        24.435   24.355   25.010   24.720   24.715   24.460   24.175   24.455
35        28.095   27.935   28.615   28.340   28.420   27.975   27.760   27.920
40        32.370   32.255   32.970   32.660   32.660   32.360   32.010   32.345
45        36.560   36.385   37.265   36.935   36.970   36.375   36.195   36.335
50        40.865   40.615   41.490   41.165   41.330   40.635   40.275   40.605
55        44.510   44.215   45.380   44.820   44.960   44.310   43.940   44.215
60        48.450   48.205   49.440   48.845   49.100   48.275   47.830   48.200
65        52.625   52.295   53.600   53.070   53.005   52.350   51.950   52.280
70        56.585   56.240   57.575   57.060   57.275   56.375   55.915   56.235
75        60.195   59.870   61.405   60.860   61.270   60.350   59.430   59.865
80        64.535   64.230   65.920   65.255   65.290   65.145   63.780   64.230
90        73.910   72.980   74.615   74.405   74.335   73.050   72.760   72.880
100       79.810   80.215   83.825   80.800   81.905   80.115   79.235   79.535
110       89.105   88.675   90.970   90.515   90.295   88.505   87.505   88.115
120       95.005   94.720   97.265   95.645   95.330   94.530   93.885   94.175

[Fig. 6: average microinstruction length versus the number of microoperations, from Fisher's [5] simulation comparing several list scheduling algorithms with the theoretical lower bound.]

Fig. 7 indicates distinct clustering of the strategies, and on this basis we may partition them into the following groups of progressively deteriorating performance:

GROUP 1: [7]
GROUP 2: [8]
GROUP 3: [1, 2, 6]
GROUP 4: [4, 5]
GROUP 5: [3]

We have kept strategy 8 in a separate group because in most cases its performance is better than that of strategies 1, 2 and 6. However, strategies 1 to 6 have lower computational complexity [O(N²)] than strategies 7 and 8 [O(N³)]. It may be noted that among the strategies of the same order of complexity, strategies 1, 2 and 6 are found to be better, with only small differences among them. Fisher [5] observed that Wood's method, i.e. strategy 2, gives better code compaction for wide graphs. Since the strategies 5 to 8 proposed here are modifications of Wood's method, it is expected that these strategies would also provide better code compaction for wide graphs; this was indeed found to be true in simulation experiments similar to Fisher's. Lastly, we have also tried other functional forms for combining the data dependency weight and the resource conflict weight, and a few produced comparable code compaction results. In all these cases the performance of the algorithms with dynamic modification of the resource conflict weight values was consistently better than that of their static counterparts. From these results we can conclude that with
slight increase in time complexity we could achieve better code compaction.
[Fig. 7: relative performance Pi(n) of each strategy with respect to strategy 7, plotted against the number of microoperations.]

6. Conclusion

In this paper we have examined a few list scheduling algorithms for local microcode compaction. Several evaluation functions are shown to perform very well on large samples of random data dependency graphs. The evaluation function we propose is sensitive to both the data dependency graph and the resource constraints, and is found to be the best of those tested. Simulation experiments similar to Fisher's have shown that better basic block optimization can be obtained, with a worst case computational complexity of order N³, where N is the number of microoperations per block. In most practical situations the number of microoperations in a basic block is not large [5]; therefore the extra computation involved in the dynamic modification of the resource conflict weight is tolerable. It may also be mentioned that during global code optimization, maximum parallelism among the microoperations is mostly confined to the adjacent basic blocks of a microprogram [9]. The dynamic modification techniques can thus also be applied to global compaction problems without significantly increasing the computational complexity. In a subsequent paper we shall extend the concept of dynamic code compaction to global optimization problems.
References

[1] Astopas F. and Plukas K.I.: Method of Minimizing Computer Microprograms. Automat. Contr., vol. 5, pp. 10-16, 1971.
[2] Dasgupta S. and Tartar J.: The Identification of Maximal Parallelism in Straight-line Microprograms. IEEE Trans. Comput., vol. C-25, no. 10, pp. 986-992, Oct. 1976.
[3] Davidson S., Landskov D., Shriver B.D. and Mallett P.W.: Some Experiments on Local Microcode Compaction for Horizontal Machines. IEEE Trans. Comput., vol. C-30, no. 7, pp. 460-477, July 1981.
[4] DeWitt D.J.: A Machine Independent Approach to the Production of Optimal Horizontal Microcode. Ph.D. dissertation, Univ. of Michigan, Ann Arbor, June 1976.
[5] Fisher J.A.: The Optimization of Horizontal Microcode within and Beyond Basic Blocks. Courant Math. and Comput. Lab., New York Univ., Tech. Rep. COO-3077-161, Oct. 1979.
[6] Landskov D., Davidson S., Shriver B. and Mallett P.W.: Local Microcode Compaction Techniques. ACM Computing Surveys, vol. 12, no. 3, pp. 261-294, Sep. 1980.
[7] Deo N.: Graph Theory with Applications to Engineering and Computer Science. Prentice-Hall, Englewood Cliffs, NJ, 1974.
[8] Ramamoorthy C.V. and Tsuchiya M.: A High-Level Language for Horizontal Microprogramming. IEEE Trans. Comput., vol. C-23, no. 8, pp. 791-801, Aug. 1974.
[9] Isoda S., Kobayashi Y. and Ishida T.: Global Compaction of Horizontal Microprograms Based on the Generalized Data Dependency Graph. IEEE Trans. Comput., vol. C-32, no. 10, pp. 922-933, Oct. 1983.
[10] Rao S.U. and Majumdar A.K.: An Algorithm for Local Compaction of Horizontal Microprograms. Information Processing Letters, vol. 20, no. 1, pp. 29-33, Jan. 1985.
[11] Tsuchiya M. and Gonzalez M.J.: Toward Optimization of Horizontal Microprograms. IEEE Trans. Comput., vol. C-25, no. 10, pp. 992-999, Oct. 1976.
[12] Wood G.: On the Packing of Microoperations into Microinstruction Words. In: Proc. 11th Annual Workshop on Microprogramming, SIGMICRO Newsletter, vol. 9, no. 4, pp. 51-55, Dec. 1978.
[13] Yau S.S., Schowe A.C. and Tsuchiya M.: On Storage Optimization of Horizontal Microprograms. In: Proc. 7th Annual Workshop on Microprogramming (ACM), pp. 98-106, 1974.

S. Upendra Rao was born in Andhra Pradesh, India, in 1959. He received the B.Sc. degree from Andhra University in 1979 and the M.Sc. degree in Mathematics from I.I.T., Kharagpur, India, in 1981. From 1982 to 1985 he was on the staff of Metal Box India Ltd., Kharagpur. He is currently with the Tata Iron & Steel Company Limited, Bearings Division, Kharagpur, India, where he works in the area of computer application software. His interests include computer architecture, optimization, programming and microprogramming languages.

A.K. Majumdar was born in Calcutta, India, in 1948. He received the M.Tech. and Ph.D. degrees in Applied Physics from the University of Calcutta, India, in 1968 and 1973 respectively. He also obtained a Ph.D. degree in Electrical Engineering from the University of Florida, Gainesville, Florida, U.S.A., in 1976. From 1976 to 1977 he was associated with the Electronics & Communication Sciences Unit, Indian Statistical Institute, Calcutta. He served as Associate Professor in the School of Computer & Systems Sciences, Jawaharlal Nehru University, New Delhi, from 1977 to 1980. Since 1980 he has been associated with the Indian Institute of Technology, Kharagpur, where he is presently a Professor in the Computer Science & Engineering Department. His research interests are computer architecture, design and analysis of algorithms, database management systems, image processing and artificial intelligence.