Microelectronics Journal, 26 (1995) 351-359
Hardware area estimator based on the SQP algorithm
S.S. Dlay and P.D. Neighbour
University of Newcastle, Department of Electrical and Electronic Engineering, Merz Court, Newcastle upon Tyne, NE1 7RU, UK. Tel: +44 (0)191 222 8356. Fax: +44 (0)191 222 8186. E-mail:
[email protected]
This paper shows the SQP algorithm enhanced in a number of ways to achieve optimum results. The initial placement used by our algorithm (ESQP) is based on 'attraction and repulsion' (AR) force-directed graphs, and provides a more realistic platform from which to work. The ESQP algorithm can be used to generate candidate floorplans, which then allow the overall area to be estimated. In addition, by the judicious use of constraints, a designer is able to experiment, evaluate and compare alternative approaches to the design. A number of bench tests have been performed and comparisons made with the SQP algorithm. The results illustrate the efficiency of the new algorithm in terms of area.

1. Introduction
A pragmatic approach to the synthesis of digital signal processing (DSP) architectures has been developed that supports the early stages of the design process [1]. This is needed since it is often necessary to investigate several architectures to test ideas and estimate their likely performance before committing human cost and effort to a detailed analysis. This paper concentrates on the floorplanner, which is an essential part of the estimation system. It allows the overall area to be obtained and the designer to experiment, evaluate and compare alternative approaches by generating candidate floorplans.
0026-2692/95/$7.00 © 1995 Elsevier Science Ltd
Strategies for optimizing the layout of a set of components are an active research area and a number of useful algorithms have been cited in the literature. These range from constructive layout, where an adequate solution is found by combining partial results, to iterative algorithms, which attempt to perturb some existing optimal solution in the direction of a better low-cost solution. Examples of the latter are simulated annealing methods [2, 3] and genetic algorithms [4], and examples of the former are cluster growth [5] and partitioning-based placement [6].

One promising algorithm is simultaneous quadratic partitioning (SQP) [7], which is based on the Min-cut algorithm [6, 8]. The recursive partitioning approach has been extended to the level of four sections partitioned simultaneously, and thus aims to optimize the global Min-cut partitioning. It achieves this by making the cuts between two subpartitions smaller if the number of cuts in the first bipartition is not a minimum. Minimizing the number of cuts of a particular cut line can minimize the overall net length, as shown in Fig. 1. This figure shows that there are four cut lines, and because of the placement in Fig. 1(a) the number of cuts required is five. In Fig. 1(b) an alternative placement has resulted in
S.S. Dlay and P.D. Neighbour/Estimator based on SQP
the number of cuts being reduced to four; therefore the overall length of interconnect has been minimized. In traditional placements the objective is to minimize the overall net length rather than the cuts of a particular cut line.

This paper shows that the SQP algorithm can be enhanced in a number of ways, such as aspect ratios, constraint handling and initial placement. Aspect ratios depict the sizes of the hardware cells and hence result in a better estimate of the overall area that is required. By introducing constraints, a realistic environment is set up in which a designer would work. The constraints include priority modes [9] in which certain blocks or nodes need to be placed closer together or in certain areas irrespective of the connection list. The initial placement used by SQP is random. In ESQP, the above-mentioned constraints and the interconnection list are instead passed to an 'attraction and repulsion' (AR) force-directed graph algorithm, which gives a better initial placement and a more realistic platform from which to work.

This enhanced SQP algorithm (ESQP), together with the performance estimation tool (PEST) [1], enables a designer to experiment with ideas and to obtain a good area-time performance estimate before financial and human effort are committed.

Fig. 1. Min-cut partitioning and placement: (a) placement a; (b) placement b.

2. AR algorithm
The AR algorithm is based on gravitational techniques [10] which classify hardware cells as 'friendly' or 'unfriendly', and specify that each module should be positioned near its friendly neighbour and away from its unfriendly one. The algorithm takes into consideration attractive and repulsive forces and the sizes of the hardware cells.
The AR algorithm represents the hardware cells as nodes and the interconnections between them as weighted edges in a force-directed graph (Fig. 2). The connections are given by the following equations:

$$\mathrm{Con}_{ij} = n_{ij} \quad \text{for } i \neq j$$
$$\mathrm{Con}_{ij} = 0 \quad \text{for } i = j$$

where $n_{ij}$ is the number of connections between node i and node j. A connection matrix can therefore be formed which gives the connection values for cells that are directly or indirectly connected to each other. This also allows the forces to be calculated for the hardware cells. The attractive forces are assumed to act between strongly connected pairs of nodes and the repulsive forces between weakly connected pairs of nodes. The forces on every pair are calculated through several cycles until a balanced position is obtained. In this calculation, the blocks that are connected through other blocks are considered to be loosely coupled. The output of the AR algorithm for the example in Fig. 2 is given in Table 1. The negative values show the strongly connected blocks.
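As a minimal Python sketch of the connection matrix defined above, the block below builds Con from a weighted interconnection list. The edge weights are those of the net list printed for Example 1 (Table 5); the function name `connection_matrix` and the list-of-tuples layout are illustrative assumptions, not the authors' PASCAL implementation. Only direct connections are counted, and the force calculation itself is not reproduced.

```python
from typing import List, Tuple


def connection_matrix(nodes: List[str],
                      edges: List[Tuple[str, str, int]]) -> List[List[int]]:
    """Return Con, where Con[i][j] = n_ij for i != j and 0 on the diagonal."""
    index = {name: k for k, name in enumerate(nodes)}
    con = [[0] * len(nodes) for _ in nodes]
    for a, b, weight in edges:
        if a == b:
            continue  # Con_ii is defined as 0
        i, j = index[a], index[b]
        con[i][j] += weight
        con[j][i] += weight  # connections are treated as undirected
    return con


nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B", 1), ("B", "C", 2), ("C", "D", 3), ("C", "E", 4), ("D", "E", 2)]
con = connection_matrix(nodes, edges)
for name, row in zip(nodes, con):
    print(name, row)
```

The non-zero entries of this matrix fall on the same node pairs that carry negative (attractive) values in Table 1.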
Microelectronics Journal, Vol. 26, No. 4
[Fig. 3 block labels: input description; convert input file; ESQP algorithm; change network or constraint; determination of node dimensions; floorplan generation]

Fig. 2. Force-directed graph.

TABLE 1 Output of the AR algorithm. The matrix shows the forces exerted on the blocks

        A       B       C       D       E
A       0      -0.38    1.37    1.79    1.78
B      -0.38    0      -1.38    1.45    1.44
C       1.37   -1.38    0      -2.01   -2.97
D       1.79    1.45   -2.01    0      -1.09
E       1.78    1.44   -2.97   -1.09    0
The list of final coordinates produced by the AR process is as follows:

Node A   x: 24.55   y: 16.44
Node B   x: 21.83   y: 19.16
Node C   x: 20.11   y: 20.88
Node D   x: 19.64   y: 21.35
Node E   x: 19.72   y: 21.26

The area is then divided into four sections, and this is then used as the initial placement in the ESQP algorithm. The resultant placement then has the aspect ratios of the cells added, and any overlaps are eliminated by shifting the hardware cells in the x and y directions.
3. ESQP algorithm
The ESQP algorithm is based on the simultaneous quadratic partitioning (SQP) algorithm [7], which seeks the global minimum of the sum of the cuts over all the cut lines in order to reduce the final net length. The ESQP algorithm uses a heuristic to minimize the number of cuts, and in principle works for an arbitrary number of partitions. However, it has been developed with four concatenated sections in mind; the gain function then simply considers the number of crossing cut lines for each net. The various stages of the ESQP algorithm are given in Fig. 3 and the pseudo-code for the algorithm is given in Table 2.

Fig. 3. Block diagram of the ESQP algorithm.

TABLE 2 Pseudo-code for the ESQP algorithm

BEGIN
  Read in maximum and minimum number of nodes in each partition
  Read in list of all nodes in the circuit
  IF priority mode is 'ON' THEN
    assign the list of given node(s) to the required partition(s)
  Add the list of parent nodes to the graph structure
  REPEAT {read the netlist}
    Read parent node
    REPEAT
      Read sibling node
      Read pin weight
    UNTIL {entire net has been read}
    Construct the interconnections between the given nodes
  UNTIL {the end of the net list is reached}
  Read the initial placement of the nodes
END
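The paper does not give a formula for the cut number, so the following Python sketch is only one plausible reading of 'the number of crossing cut lines for each net' with four concatenated sections. It assumes that the partitions lie in a row (1 to 4) with a cut line between neighbours, and that a net contributes the number of cut lines between its leftmost and rightmost node. The function name `cut_number` and the four-node circuit are hypothetical and used purely for illustration.

```python
from typing import Dict, List


def cut_number(nets: List[List[str]], partition_of: Dict[str, int]) -> int:
    """Sum, over all nets, of the cut lines crossed by each net.

    Assumption: partitions 1..4 are concatenated in a row, so a net whose
    nodes span partitions lo..hi crosses (hi - lo) cut lines.
    """
    total = 0
    for net in nets:
        parts = [partition_of[node] for node in net]
        total += max(parts) - min(parts)
    return total


# Hypothetical circuit; each net lists a parent node followed by its siblings.
nets = [["A", "B"], ["B", "A", "C"], ["C", "B", "D"], ["D", "C"]]
spread = {"A": 1, "B": 2, "C": 3, "D": 4}  # one node per partition
packed = {"A": 1, "B": 1, "C": 2, "D": 2}  # connected nodes grouped together
print(cut_number(nets, spread), cut_number(nets, packed))
```

Grouping connected nodes into the same partition lowers the value, which is the quantity the gain function is meant to track under this assumed definition.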
3.1 Input description
The interconnection list file aims to provide a flexible environment to detail the network which is to be entered into the floorplanner. The interconnection net list file contains the following elements.
(1) The maximum and minimum number of nodes in each partition.
(2) A flag to switch the priority mode 'ON' or 'OFF'.
(3) Allocation of node(s) to certain partition(s) if priority is 'ON'.
(4) A list of all the nodes in the circuit; these are referred to as the 'parent' nodes.
(5) The net list itself, which can be viewed as a set of N nodes connected by a set of n nets, where it is assumed that a net is defined as a set consisting of at least one parent node, p, and a sibling node, s, and that each node is contained in at least one net. All nodes which are connected to the parent node are called 'sibling' nodes. The number of pin connections (the pin weight, w) between the parent node and each sibling node is also given. Thus

$$n \subseteq N, \quad p \in N, \quad s \in N, \quad w \in \mathbb{Z}$$

and the number of nets = N. In addition, each net is translated one at a time, and each one is given in its entirety before the next one is translated.
(6) The initial placement which was generated by the initial placement algorithm.

3.2 ESQP floorplanner constraints
The performance of the floorplanner is governed by implicit circuit constraints. These constraints may generally be split into two main groups:
(i) mandatory;
(ii) user-specific.

The mandatory constraints are that each partition is assigned a maximum and minimum number of nodes, all nodes must be assigned a partition, all nodes are rectangular in shape (plausible, as in the early stages of design it is difficult for designers to be particular about the final shape and size of the modules), and each node is connected to at least one other node. The user-specific constraints, i.e. nodal orientation, priority modes, wiring lengths, aspect ratios, etc., change according to the desired result and whether any constraints interfere with one another. The ESQP algorithm uses the priority node restriction as its user-specific constraint and will allow for aspect ratios.

During the reading of the net list the adjacency list is formed, and the cut number of the initial placement and the gain table are formed from this structure. The algorithm is then entered with a pre-defined number of passes. The number of passes through the placement stage acts as a safeguard that an optimal placement has been found, i.e. a few passes through the algorithm will almost certainly result in the production of the optimal layout, but a larger number might be tried for more complex circuits to ensure that all possible paths, and hence placements, have been taken into account. Then the core of the original SQP procedure is entered to determine which node is to be moved; the data structure 'partition moves' is used. If a new move is made, the base node is moved to its destination partition, d, and deleted from the source partition, s. The circuit now has a new cut number. A new gain table is then formed to reflect the new state of the circuit, and is then sorted to ascertain the best node to move again.

In the original SQP heuristic, each placement produced is evaluated to see whether it is better than a 'global' best cut number. In the ESQP, an initial placement, and hence a realistic cut number to optimize on, is provided. The final placement from each pass is compared with that of the initial placement. If it is better than this placement, it is kept, and hence takes over from the initial placement circuit. All other placements produced are thus compared with this placement, with each better one replacing the current best cut number. The pseudo-code for this is given in Table 3.

TABLE 3 Placement code

BEGIN
  Ascertain cut number of initial placement
  Calculate gain table
  Sort gain table
  Best-cut = initial-cut
  FOR counter = 1 TO number-of-passes DO
  BEGIN
    Enter main ESQP algorithm
    Calculate gain table
    Sort gain table
    IF best-ESQP-cut < best-cut THEN
      Save best-ESQP-placement as best-cut-placement
      Save best-ESQP-cut as best-cut
  END
END

3.3 Data structures
The three main data structures used within the algorithm are detailed more fully below.

3.3.1 Gain table
One of the new data structures is a gain table. This is a table of movement data and allows selection of the base node for the purpose of the ESQP program. The designation of the columns is shown by the typical format of the gain table (Table 4), in which:
• gain is the amount by which the overall cut number would be reduced when a node is moved from its source partition to the destination partition;
• move-code represents the move that is to be made, i.e. 12 implies a move from partition 1 to partition 2;
• nodes in source is the number of nodes left in the source partition if the move were to be carried out;
• nodes in destination is the number of nodes in the destination partition if the move were to be carried out;
• priority factor is set to '1' (true) when a move does not move a node from its priority position, and to '0' (false) when the move breaks the priority position.

The gain table is sorted into order of descending gain value; that is to say, the higher the gain value, the lower the overall cut number of the circuit will be, and hence that will be the first choice of node to move. The table is traversed by first trying the highest gain for a possible move. A move is only possible if
(i) the constraints on the maximum and minimum number of nodes in the source and destination partitions have not been broken;
(ii) the node has not been moved more than a pre-defined number of times (this prevents the algorithm from getting stuck in an infinite loop);
(iii) the priority condition is still true ('1').
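The selection rule described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the `GainEntry` fields mirror the columns of Table 4, but the field names, the added `node` and `times_moved` fields, and the parameters `min_nodes`, `max_nodes` and `max_moves` are assumptions introduced here. The sample rows reuse the values of Table 4 with hypothetical node names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GainEntry:
    node: str                  # candidate base node (assumed field)
    gain: int                  # reduction in overall cut number if moved
    move_code: int             # e.g. 12 = move from partition 1 to partition 2
    nodes_in_source: int       # source size if the move were carried out
    nodes_in_destination: int  # destination size if the move were carried out
    priority_ok: bool          # True if the move keeps the priority position
    times_moved: int = 0       # moves already made by this node (assumed field)


def pick_move(table: List[GainEntry], min_nodes: int, max_nodes: int,
              max_moves: int) -> Optional[GainEntry]:
    """Return the highest-gain entry that satisfies conditions (i)-(iii)."""
    for entry in sorted(table, key=lambda e: e.gain, reverse=True):
        size_ok = (entry.nodes_in_source >= min_nodes
                   and entry.nodes_in_destination <= max_nodes)
        # Assumed reading of (ii): one more move must not exceed the limit.
        if size_ok and entry.times_moved < max_moves and entry.priority_ok:
            return entry
    return None  # no feasible move: the placement cannot be improved further


table = [
    GainEntry("X", -3, 21, 1, 2, True),
    GainEntry("Y", 1, 14, 2, 10, False),
    GainEntry("Z", 2, 12, 1, 2, True),
]
best = pick_move(table, min_nodes=0, max_nodes=3, max_moves=2)
print(best.node if best else "no move possible")
```

Sorting a copy of the table on each call keeps the sketch short; an implementation that re-forms the gain table after every move, as the text describes, would sort once per update instead.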
TABLE 4 Gain table

Gain   Move-code   Nodes in source   Nodes in destination   Priority factor
  2    12          1                 2                      1
  1    14          2                 10                     0
 -3    21          1                 2                      1
If the first node in the table cannot be moved the heuristic moves on to the next node, and so on until a move is found. If a move is not found, then no move can be made and the circuit cannot be improved any further.

3.3.2 Partition moves
For each pair of partitions $P_s$, $P_d$ ($s \neq d$) there is a linked list data structure which contains the nodes in that partition, and the gain in the overall cut number that would result from each node being moved from its current partition, s, to its eventual destination partition, d. A situation may arise where a move-code points to a partition which contains two or more nodes that would reduce the cut number of the circuit by the same amount. At this stage no move by any of these nodes is better than a move by another node with the same gain value. In this case a node is picked at random, and it is likely that the next pass through the algorithm will result in a different node being picked and hence a different placement path being taken. Therefore one is justified in assuming, with a high probability, that the minimal placement for a given input circuit will be found.

3.3.3 The adjacency list
This is defined by $G = (N, n)$, where
N = set of nodes in the initial placement, i.e. N = {node 1, node 2, node 3, ..., node n};
n = set of nets.
For each net: $n \subseteq N$, $n \neq \emptyset$.

G is stored as a multi-linked adjacency list, whereby the nodes of the graph are represented by a linked list. Each node in this list is the parent node to another list of sibling nodes which constitute each individual net of the graph. Thereby, a structure exists where all the parent nodes are entered into this list, with interconnections stored between each node in the circuit and its sibling nodes.
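As a rough Python analogue of the multi-linked adjacency list, the sketch below maps each parent node to its sibling nodes and pin weights. A dictionary of dictionaries stands in for the linked lists of the PASCAL program; the `build_adjacency` name and the tuple-based net format are assumptions made here. The sample nets follow the net list printed for Example 1 (Table 5).

```python
from typing import Dict, List, Tuple

# (parent, [(sibling, pin weight), ...]); this layout is an assumption.
Net = Tuple[str, List[Tuple[str, int]]]


def build_adjacency(nodes: List[str], nets: List[Net]) -> Dict[str, Dict[str, int]]:
    """Map every parent node to {sibling: pin weight}."""
    graph: Dict[str, Dict[str, int]] = {name: {} for name in nodes}
    for parent, siblings in nets:
        for sibling, weight in siblings:
            if sibling not in graph:
                raise ValueError(f"unknown sibling node {sibling!r}")
            graph[parent][sibling] = weight
    # Mandatory constraint: each node is connected to at least one other node.
    isolated = [name for name, sibs in graph.items() if not sibs]
    if isolated:
        raise ValueError(f"unconnected node(s): {isolated}")
    return graph


nodes = ["A", "B", "C", "D", "E"]
nets = [
    ("A", [("B", 1)]),
    ("B", [("A", 1), ("C", 2)]),
    ("C", [("B", 2), ("D", 3), ("E", 4)]),
    ("D", [("C", 3), ("E", 2)]),
    ("E", [("C", 4), ("D", 2)]),
]
graph = build_adjacency(nodes, nets)
print(graph["C"])
```

Looking up `graph[parent]` plays the role of following the parent's sibling list, which is what the cut number and gain table calculations need when they walk each net.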
4. Simple examples
On entry to the ESQP stage, pre-defined constraints are entered, and the user can give priority to certain nodes according to demand; i.e. two nodes may have to be as close together as possible to minimise the signal delay time between them and thus they must be in the same section of the layout. Below are two examples which demonstrate the ability of the ESQP algorithm to optimize on a pre-defined initial layout circuit.

The initial placement, from the AR algorithm, which is given in Table 1, has a cut number of four. The user constraints with regard to the maximum and minimum number of nodes per partition are formulated from the initial placement. For the above example, the maximum number of nodes per partition is three and the minimum is zero. The output from the ESQP is shown in Table 5. It can be seen that the cut number has been reduced to two, and this illustrates that the ESQP has achieved an optimum result. Therefore, the final circuit layout presents an overall saving on the silicon area before a detailed analysis is undertaken.

Another example, with an initial placement taken as the final result from [7], and an initial cut number of eight, is shown in Table 6. It can be seen that the cut number has been reduced to four.

A final stage of the floorplanner is to add the aspect ratios of the hardware cells and then use the algorithm for the initial placement to eliminate the overlaps. The results can then be used in the PEST system [1] to produce an estimate of the silicon area used.

TABLE 5 Output from the ESQP algorithm - Example 1

Floorplan generator - SQP method
eSQP - Version II
eSQP - Initial placement
Negative gains working
Paul Neighbour 1993

3, 0:3, 0:3, 0:3, 0
*C, 1
A, B, C, D, E.
A, B, 1
B, A, 1, C, 2
C, B, 2, D, 3, E, 4
D, C, 3, E, 2
E, C, 4, D, 2
1:E 2:D 3:B 4:A

Optimising circuit layout... Please wait
The optimum found for the input circuit is:
Partition 1 contains the node(s): D E C    3 node(s) in it
Partition 2 contains the node(s): B A    2 node(s) in it
Partition 3 contains the node(s):    0 node(s) in it
Partition 4 contains the node(s):    0 node(s) in it
Priority mode ON
It has a cut number of two

5. Results
A number of benchmark tests have been performed with the ESQP and the SQP [7] algorithms and the results are given in Table 7. They have been obtained by using the algorithms on the same set of examples. The four examples that have been chosen are custom chips which have been designed at Newcastle University, and these range from a cryptology chip VISD [11] to an n-bit recursive adder CADD [12]. Therefore, the problems chosen are practical and ones that a chip designer would encounter. The algorithms have all been written in the high-level language PASCAL and the comparisons have taken place on a VAX 3100 workstation. In Table 7, the average number of pins per net is given in Column 3, the average number of pins per node is given in Column 4 and the time shown is the CPU time in seconds. It can be seen that the ESQP algorithm completes the placement for each example with fewer cuts than the SQP algorithm, and therefore the interconnection length needed is less.
There is a penalty to pay for this efficiency as the ESQP algorithm takes longer to execute. Although a designer is interested in how long it will take to generate a design, one parameter that is very high on the priority list is the amount of silicon area that will be saved. If the placement time is put into context, then it can be seen that the design that took the longest to execute, for the AR algorithm, took 63 min of CPU time. If this is compared with the time taken to complete a design, which is measured in man-years, it can be seen that this is a small penalty to pay and can be tolerated.
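To put the trade-off in numbers, the short Python sketch below computes the relative cut reduction and the run-time ratio for each benchmark. It assumes the column reading of Table 7 in which the first cuts/time pair belongs to SQP [7] and the second to ESQP; the variable names are illustrative.

```python
# (name, SQP cuts, SQP time in s, ESQP cuts, ESQP time in s), as read from Table 7
results = [
    ("VISD", 55440, 572.4, 36172, 3784.2),
    ("CADD", 10366, 132.7, 9917, 484.3),
    ("MEM", 6653, 308.8, 6468, 968.2),
    ("NMULT", 3850, 176.7, 3491, 748.9),
]

summary = {}
for name, sqp_cuts, sqp_time, esqp_cuts, esqp_time in results:
    cut_saving = 100.0 * (sqp_cuts - esqp_cuts) / sqp_cuts  # percent fewer cuts
    slowdown = esqp_time / sqp_time                         # CPU-time ratio
    summary[name] = (cut_saving, slowdown)
    print(f"{name:6s} cuts reduced by {cut_saving:4.1f}%  time x{slowdown:.1f}")

longest_minutes = max(r[4] for r in results) / 60.0  # longest ESQP run, in minutes
```

Under that reading, the longest run (VISD, 3784.2 s) corresponds to roughly 63 min, in line with the figure quoted in the text.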
TABLE 6 Output from the ESQP algorithm - Example 2

Floorplan generator - SQP method
eSQP - Version II
eSQP - Initial placement
Negative gains working
Paul Neighbour 1993

2, 1:3, 2:2, 1:2, 1
#
A, B, C, D, E, F, G, H.
A, B, 1
B, A, 1, C, 1
C, B, 1, D, 1, E, 1
D, C, 1, E, 1, F, 1
E, C, 1, D, 1, F, 1
F, D, 1, E, 1, G, 1
G, F, 1, H, 1
G, F, 1, H, 1
1:G, H 2:F, E 3:C, D 4:A, B

Optimising circuit layout... Please wait
The optimum found for the input circuit is:
Partition 1 contains the node(s): H    1 node(s) in it
Partition 2 contains the node(s): G E F    3 node(s) in it
Partition 3 contains the node(s): D C    2 node(s) in it
Partition 4 contains the node(s): B A    2 node(s) in it
Priority mode OFF
It has a cut number of four

TABLE 7 Comparison table for the SQP [7] and ESQP algorithms

                                              SQP [7]              ESQP
Name         Nodes   Pins/net   Pins/node     Cuts     Time (s)    Cuts     Time (s)
VISD [11]    6256    37.3       92.2          55 440   572.4       36 172   3784.2
CADD         2944    13.2       36.3          10 366   132.7       9917     484.3
MEM          4784    11.8       30.9          6653     308.8       6468     968.2
NMULT [12]   3312    7.2        16            3850     176.7       3491     748.9

6. Conclusions
An automatic floorplanner is an essential part of any performance estimation system since it allows the overall area to be obtained, and enables the designer to experiment, evaluate and compare alternative approaches by generating candidate floorplans. In this way the cost of the overall design can quickly be estimated before a detailed analysis is performed.

This paper shows the SQP algorithm [7] enhanced in a number of ways, such as aspect ratios, constraint handling and initial placement. The initial placement used by most SQP-type algorithms is random, but here, by using a placement based on 'attraction and repulsion' force-directed graphs, the algorithm is given a better initial placement and a more realistic platform from which to work.

Two important metrics that have to be considered in a design are 'design time' and silicon area, and these contribute to the cost of the design. It has been demonstrated that the examples to which the ESQP algorithm has been applied have shown a saving in silicon area, but have taken more time to execute. If the results are examined in depth, it can be seen that this stage of the design is taking, at the longest, a matter of hours to execute, whereas the overall design time of a custom chip is measured in man-years. Therefore, a few hours is a small price to pay for better utilization of silicon area.

The algorithm is being incorporated into the design tools being developed at the University of Newcastle upon Tyne, which are being used on a regular basis by research staff and students. A measure of the success of the above algorithm is that the overall cost of the design can be estimated relatively quickly, so that more time can now be spent on the actual design of the circuit and management of the complex data.
References
[1] S.S. Dlay, O.R. Hinton, M.R. McLauchlan and J.R. Holbrooke, Use of performance estimation in top-down design of VLSI architectures for DSP, JFIT Technical Conference at the University of Keele, UK, 23 and 24 March 1993, pp. 197-204.
[2] S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi, Optimization by simulated annealing, Science, 220(4598) (1983) 671-680.
[3] R.A. Rutenbar, Simulated annealing algorithms: an overview, IEEE Circuits Devices Mag., 5 (1989) 19-26.
[4] J.J. Grefenstette, R. Gopal, B.J. Rosmaita and D. Van Gucht, Genetic algorithms for the travelling salesperson problem, Proc. Int. Conf. Genetic Algorithms and their Applications, Pittsburgh, PA, 1985, pp. 154-159.
[5] M. Hanan and J.M. Kurtzberg, Placement techniques, in M.A. Breuer (ed.), Design Automation of Digital Systems: Theory and Techniques, pp. 213-282, Prentice Hall, Englewood Cliffs, NJ, 1972.
[6] M.A. Breuer, Min-cut placement, Design Automation Fault Tolerant Computing, 1(4) (1977) 343-362.
[7] A.G. Hoffman, Towards optimizing global min-cut partitioning, Proc. European Conf. Design Automation (EDAC), Amsterdam, 25-28 February 1991, pp. 167-171.
[8] G. Vijayan, Min-cost partitioning on a tree structure and applications, Proc. 26th Design Automation Conference (DAC), Las Vegas, NV, 25-29 June 1989, pp. 771-774.
[9] P.D. Neighbour, Automatic floorplan generator for custom VLSI, Undergraduate Thesis No. 1599, University of Newcastle upon Tyne, 1992.
[10] K. Ueda, H. Kitazawa and I. Harada, CHAMP: Chip floor plan for hierarchical VLSI layout design, IEEE Trans. Computer-Aided Design, 4(1) (1985).
[11] V. Iliev, S.S. Dlay, M.R. McLaughlan, A.M. Koelmans and D.J. Kinniment, Advanced VLSI validated input security device employing data and hardware verification features, IEE Proc., 136(E6) (1989) 471-477.
[12] H.C. Yung and C.R. Allen, Part 1: VLSI implementation of a hierarchical multiplier, IEE Proc., 131(G2) (1984) 61-66.