Microelectronics Journal, 24 (1993) 513-532

Global optimization of multiplexers and buses in interconnect synthesis

T. C. Wilson, B. Halley and D. K. Banerji
Department of Computing and Information Science, University of Guelph, Guelph, Ontario, Canada N1G 2W1

M. K. Garg
IBM Laboratories, Toronto, Ontario, Canada M3C 1H7

R. Deadman
Mitel Corporation, 350 Legget Drive, Kanata, Ontario, Canada K2K 1X3

In this paper, we consider the problem of optimizing interconnection complexity in behavioral level synthesis of digital systems. We assume that, as a result of other steps in synthesis, logical connection requirements have already been determined, with a corresponding level of multiplexing implied. We further reduce the total amount of multiplexing by combining logical connections into shared (physical) connections wherever possible. Our measure of interconnection complexity is the number of equivalent 2 × 1 multiplexers. Using this criterion, we can guarantee an optimal solution to this problem. The problem is modelled with a graph, which is then pruned extensively before being used as input to either of two alternative solution techniques. The primary (and optimum) technique employs integer linear programming. We also provide a faster heuristic solution that yields near-optimal results. Since the actual implementation of shared connections may vary widely, we also show how the linear program can be modified to optimize and count the total number of n × 1 multiplexers (n ≥ 2) or the total number of tri-state buffers in a bus-based interconnect implementation.

1. Introduction

Interconnect elements such as multiplexers, buses and wires consume significant area on a VLSI chip. As a result, much effort has gone into minimizing interconnect costs [1]. In this paper, we explore and solve an important subproblem in interconnect optimization.

0026-2692/93/$6.00 © 1993, Elsevier Science Publishers Ltd.


T. C. Wilson et al./Optimization of multiplexers and buses

Specifically, we assume that a high-level synthesis system has already allocated a 'minimal' set of major components: functional units and registers. We also assume that the system has judiciously mapped operations to functional units and variables to registers in order to minimize the number of connections among these major components. However, these connections are all presumably point-to-point, without any significant sharing of wires by different connections. The point-to-point connection scheme imposes an initial native multiplexing requirement at any terminal that serves as a destination for more than one connection. Our goal is to further reduce the amount of native multiplexing and interconnection complexity, by aggregating certain point-to-point connections into shared connections. We refer to this process as connection coalescing. We refrain from using the term 'merging', in case this suggests pair-wise combining of connections. Our formulation and solution are global and optimum, in that the sets of connections which are coalesced achieve the maximum possible reduction in overall interconnect cost.

We measure this cost reduction in two different ways, depending on the interconnect style preferred by the designer. In one implementation style, all the sources that share a connection employ multiplexing to access their shared connection. The objective is to replace some of the native multiplexing entering destinations by a smaller amount of multiplexing entering the shared connections. In this case, we use the number of equivalent 2 × 1 multiplexers as the measure of cost. (We realize that n × 1 multiplexers may be used in practice, and that these have areas which are not linear functions of n. Later we explain how to adapt our formulation for 2 × 1 muxes, so that n × 1 mux costs can be evaluated exactly.) The second implementation style employs buses for the shared connections. In this case, we count the number of tri-state buffers needed to access the buses. For either implementation style, we count the number and sizes of data steering elements, but not path lengths or widths.

Figure 1 shows the use of both the mux and the bus implementations for improving an initial point-to-point interconnection pattern. Figure 1a displays three sources accessing three destinations via six point-to-point connections. We assume that all six connections are active at different times, and therefore capable of sharing a common connection. The two 2 × 1 muxes shown entering the middle destination might be realized by a single 3 × 1 mux in practice. Figure 1b shows connections 3, 4, 5 and 6 all sharing a common connection. This shared connection is entered by a single 2 × 1 mux that replaces the two shaded 2 × 1 muxes in the original scheme. This is a minimum cost configuration when using a mux implementation. Figure 1c shows the same four original connections combined, but this time they share a bus, accessed via two tri-state buffers. A less expensive bus implementation appears in Fig. 1d. We do not permit mixing the two implementations of shared connections, but we do provide both a precise formulation and an efficient optimal solution technique for whichever implementation is chosen. Where there is no need to distinguish between the two implementations, we sometimes refer to the coalesced (wire or bus) connection as simply a coalesced connection (CC).

This aggregation of connections occurs before component placement and connection routing and does not explicitly account for connection length. However, by reducing the overall number of connections, total wire length should be reduced as a byproduct. Furthermore, if a preliminary floor plan is available, our algorithms can avoid impractical connection arrangements by preventing physically remote connections from being combined.

After describing related work in section 2,


[Figure 1 shows: (a) sources, destinations and native multiplexing in the original point-to-point scheme; (b) the optimal mux implementation, with a source mux entering the shared connection path; (c) a bus implementation using the same links as (b); (d) an optimal bus implementation.]

Fig. 1. Mapping connections on to a coalesced connection.

section 3 provides the basic concepts and an overview of the optimization process. Section 4 describes exact criteria for evaluating the effect of coalescing any feasible set of connections. Subsequent sections describe how these criteria are employed in discovering optimum sets of connections to coalesce. Both heuristic and exact linear programming solutions are presented. This is done for both the mux and bus implementations of shared connections. Finally, we show some experimental results and comment on the use of this optimization in the context of a large synthesis system.

2. Background

2.1 Preliminaries

Before reviewing existing solution methods, a few definitions would be helpful. A link is a unique, one-way logical connection from a source to a destination at the register transfer level. Sources and destinations are typically individual registers or terminals on functional units. We assume that all data travelling between a particular source-destination pair traverse the same link. Links are inherently point-to-point and are the objects of aggregation into shared connections.

Two links are compatible if they are never active on the same control step. Compatibility is a necessary condition for sharing a connection. It can be determined easily by an inspection of the scheduled dataflow graph. This relationship is depicted by the compatibility graph, H. Each of its nodes represents a link, and two nodes are joined by an edge if and only if the corresponding links are compatible. A clique is any completely connected subgraph (of H), not necessarily a maximal one. In fact, each individual node constitutes a trivial clique. Every non-trivial clique of H identifies a set of links that could feasibly be merged on to the same coalesced connection. When H is partitioned into disjoint



cliques (called a clique cover), the set of cliques corresponds to the set of connections in the design. Non-trivial cliques correspond to shared connections. Note that any feasible solution to our optimization problem is equivalent to some clique cover.

We use the term link set to describe the set of links whose corresponding nodes in H constitute a particular clique Ci. When examining the implications of adopting clique Ci as part of the final clique cover, we often make use of a related connection graph Gi. The edges of Gi are precisely the links whose nodes formed clique Ci in H. The nodes of Gi represent the registers and terminals of the design that are connected by members of that link set. Thus Gi is an abstraction of the original connection pattern and native multiplexing affected by Ci (but without explicitly representing the implied multiplexers). Figure 2 shows some examples that relate to Fig. 1, and will be explained later.

Native multiplexing, because it occurs near the destinations of links, is often called destination multiplexing. Muxes that allow links to access a shared connection are called source multiplexers.

2.2 Related work

The published approaches to the problem can be distinguished by the style of algorithms employed. The majority are constructive and iterative. When a new operator binding, register binding, or datapath reduction is being considered, successive decisions depend on the incremental impact on interconnections. Many of these approaches employ a greedy pairwise merging of links, for example Midwinter [3], early HAL [4], LYRA and ARYL [5], and Woo [1]. The pairwise merging may be accompanied by other optimizations. ADPS [6] includes flipping operands for commutative operators. MABAL [7] considers many interconnect implementations, including possible introduction of an additional module. The search for a shareable path extends even to paths through idle functional units in the Elf system [8]. SPLICER [9] considers a large number of possibilities via branch-and-bound search.

Finally, throughout this paper we assume that a tri-state buffer consumes approximately one-half the area required by a 2 × 1 multiplexer; ref. [2] suggests that this is a reasonable estimate. (A different cost assumption would affect the graph pruning and heuristic search methods for the bus-oriented mode of implementation. However, the linear programming formulation would not be affected, nor would any methods applied to mux-oriented implementation.)

[Figure 2 shows three connection graphs: (a) original Gi; (b) reduced Gi for mux implementation; (c) reduced Gi for bus implementation.]

Fig. 2. Connection graphs.


The problem with any step-by-step constructive approach is the possible loss of a global optimum. The alternative is to consider the problem in its global dimensions right from the start. Some recent work [10] assumes that scheduling and allocation have taken place, and then constructs a tree of all possible link combinations. After searching this tree with a branch-and-bound method, a global solution is found. No preliminary reduction is attempted, and for large trees heuristic searching replaces the exhaustive method.

Most global approaches, including ours, look for cliques in the link compatibility graph H. Facet [11] is an early example that sought a minimum number of cliques; no consideration was given to which cliques would best group links that could actually share sources and destinations. Latter-day HAL [12] improved upon this approach by weighting the edges of H to indicate the interconnect advantage of associating each pair of links. Edges having too low a weight are suppressed during clique finding. Although this does take a global view, the weights still focus on pairwise merging, and their meaning within a clique of three or more links (nodes) is not clear. Any minimum clique cover of the graph (with suppressed edges) is considered acceptable.

We also take a global approach, based on finding disjoint cliques in H, but our technique differs in the following ways: we severely prune H to eliminate any links and cliques having no potential for reducing the number of multiplexers. We do not consider pairwise costs at all, and therefore do not weight the edges. We do not consider all minimum clique covers as equivalent, nor do we even seek a minimum number of cliques to cover H. We have developed a precise criterion to assess the interconnect effects of any individual clique, based entirely on structural properties of a related graph. As a result, we can evaluate exactly the effect of any combination of disjoint cliques and can discover a globally optimum set of cliques.

3. Essence of the solution

3.1 Characterizing an optimal solution

Any final solution to the connection sharing problem is represented by a set of mutually disjoint cliques that cover every node of the link compatibility graph H. Each clique in the final solution will represent a distinct coalesced connection, and all links in any clique will share that connection. Our starting point is the native multiplexing situation where no connection sharing has been introduced. Each individual node constitutes a clique of its own, and represents a direct connection from a source to a destination. The native clique cover consists entirely of single-link (one-node) cliques. Our objective is to discover a clique cover that provides maximum improvement over this baseline situation. Contrary to popular wisdom, a clique cover containing the fewest number of cliques is not necessarily the best choice.

Although the final set of selected cliques must be disjoint, it is certainly possible for two or more of these cliques to have links that terminate at a common destination. This, of course, implies multiplexing at such destinations to select input from more than one coalesced connection. Destination multiplexing need not be explicitly counted, however, because it is already part of the original native multiplexing requirement. It may be considered as residual native multiplexing that has not been removed by connection sharing. The mux entering the middle destination in Fig. 1b is an example.

The contribution to the final solution from any candidate link set can be evaluated by considering properties of this link set independently of any other. One reason is that the link sets will be disjoint in any final solution. The other reason depends on using equivalent 2 × 1 muxes as the basis of measurement. This makes the multiplexing cost at destination terminals a linear



function of the number of incoming links. Any decrease in the number of incoming links has the same cost advantage at the destination terminal, regardless of the combination of factors that cause the number of links to decrease. Each affected destination will now see a single 'representative' wire from the shared connection, rather than separate wires from all pertinent links in the set. The resulting impact in multiplexing cost at each affected destination depends only on the number of pertinent links being coalesced by this link set. Of course, each candidate link set will imply a certain amount of internal cost to provide access from its sources to its own shared connection. Thus, the net effect of this link set can be determined entirely from its own internal structure.

In summary, we have developed a simple and exact measure to evaluate the individual contribution of any clique Ci to a final solution. This contribution is based entirely on consideration of the associated connection graph Gi. Our quest is to find a clique cover of H, whose individual clique contributions have the largest possible sum.

3.2 A three-step solution strategy

Obviously, all possible candidate cliques are subsets of the maximal cliques of H. Not all subcliques, however, are useful candidates. Before searching for a final set of cliques, we can eliminate any candidate cliques that offer no advantage in terms of mux reduction. In addition, we can often reduce the size of cliques that do have potential benefit by removing links that happen to be compatible with the others but do not themselves contribute to the improvement. The identification, elimination and reduction of candidate cliques are valuable (but not mandatory) preliminaries to the actual selection of a maximum-benefit clique cover. Our solution strategy involves three basic steps:


(1) Identify the maximal cliques of H that are large enough to offer potential advantage.

(2) By inspecting the connection pattern within each clique, identify its minimal subcliques that offer maximum advantage, thus reducing the effective size of some candidate cliques and eliminating some others.

(3) Actually search for a set of disjoint cliques that together provide maximum interconnect cost reduction.

The final step is the most important. We propose two alternative solution approaches for this selection process. One is guaranteed to be optimal and employs an integer linear program (ILP). The other is heuristic and involves analysis of graphs. Both solution techniques benefit from prior analysis and reduction of H and G. Hence, the first two steps construct and manipulate the appropriate graphs in a way that reduces the problem space to its minimum size without risking the loss of a globally optimum solution. Connection sharing problems are usually capable of extensive prior reduction. Although not strictly necessary, this reduction enables the final selection of cliques to proceed very quickly. Our ILP and heuristic solutions usually have comparable times, typically 1 or 2 seconds.

In general, the candidate cliques of H overlap to a considerable extent, whereas the final solution requires each link to belong to a single selected clique. The ILP and heuristic solutions differ essentially in the way they handle links that lie in the intersection of several candidate cliques. The heuristic assigns all the links in an intersection to the same candidate clique, whereas the ILP assigns each link individually and more carefully. The actual selection process (Step 3) will be described in sections 5 and 6. Section 4 explains the graph reduction process of Steps 1 and 2.
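The intent of Step 3 can be illustrated without the ILP machinery: for a handful of candidate cliques, the maximum-advantage set of mutually disjoint cliques can be found by exhaustive search. The sketch below uses invented clique contents and advantage values (the two overlapping candidates mirror the bus example discussed later with Fig. 3); it is only an illustration of the selection objective, not the paper's ILP or heuristic.

```python
from itertools import combinations

# Candidate cliques surviving Steps 1-2, mapped to their (assumed) advantages.
# The two cliques below overlap on links 1 and 3, so at most one can be chosen.
candidates = {
    frozenset({1, 2, 3}): 1.0,
    frozenset({1, 3, 5}): 1.5,
}

def best_disjoint_cover(cands):
    """Exhaustively pick the mutually disjoint subset of candidate cliques
    with maximum total advantage. Exponential in the number of candidates."""
    best, best_adv = [], 0.0
    items = list(cands.items())
    for r in range(1, len(items) + 1):
        for subset in combinations(items, r):
            cliques = [c for c, _ in subset]
            if all(a.isdisjoint(b) for a, b in combinations(cliques, 2)):
                adv = sum(v for _, v in subset)
                if adv > best_adv:
                    best, best_adv = cliques, adv
    return best, best_adv

chosen, advantage = best_disjoint_cover(candidates)
# The higher-advantage clique {1, 3, 5} wins; {1, 2, 3} cannot join it.
```

Links covered by no chosen clique simply remain direct connections, exactly as the text describes.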


4. Graph reduction

4.1 The overall reduction process

Step 1 is the construction of the link compatibility graph, H, and recognition of its potentially useful maximal cliques. As we will show later, a minimum number of links must share a connection before this sharing can possibly reduce the native interconnection cost. (This number happens to be four links in the mux implementation and three in the bus implementation.) Any clique containing too few nodes can be safely eliminated. The maximal cliques from H that have sufficient size certainly contain all sets of links that could be aggregated, but these cliques are not necessarily good choices themselves. Step 2 examines the connection pattern of the links that constitute each such maximal clique of H. Such patterns are revealed by the corresponding clique connection graph. Step 2 has three major functions:

• A clique is decomposed into separate subcliques whenever the clique's connection graph is not itself connected; the subcliques correspond to the connected components of the clique connection graph. The rationale is that link sets which do not share any common sources or destinations should not be forced to share a common connection. This decomposition always lowers the cost of a mux implementation but has no effect on the cost of bus implementation, yet is done anyway to simplify the solution process.

• Links are removed from any subclique when their inclusion within the clique cannot be advantageous. Including some of these links would be positively costly, while including others simply would not contribute to cost reduction.

• Any remaining subclique must meet certain necessary conditions to have an impact on cost reduction. Any subclique that does not satisfy these conditions is eliminated from further consideration. If all the cliques to which a link belongs are eliminated, the link itself can be excluded from further consideration.

For example, we assumed that the six links in Fig. 1 were mutually compatible. Thus, they would constitute a subgraph of six completely connected nodes in the link compatibility graph, H. Since this clique contains sufficient members, it would survive the first step and be considered in the second. Figure 2a shows the connection graph for our example, and corresponds to the native connection of Fig. 1a. The connection graph in Fig. 2b and its realization in Fig. 1b represent an optimum interconnect reduction for the mux implementation: two destination muxes replaced by one source mux entering the shared connection. Removing any link from the aggregate will introduce another 2 × 1 mux at a destination. Including either or both of the other links (1 or 2) will not improve the situation; an additional 2 × 1 mux would be required to allow the first source access to the bus. On the other hand, the implementation shown in Fig. 1c is not the best when using buses. It also corresponds to the connection graph in Fig. 2b, is certainly feasible, and is better than the native multiplexing pattern. However, for a bus implementation, the optimum connection (sub)graph is the one shown in Fig. 2c, whose circuit realization is shown in Fig. 1d. Inclusion of the remaining link (1) does not reduce interconnections further. The bus and mux implementations exhibit different necessary conditions for suitable cliques; this accounts for the differences in this example.
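The decomposition in Step 2's first function — splitting a clique whose connection graph is not connected — is a standard connected-components computation on the bipartite graph Gi. A minimal sketch with invented source/destination labels, using a union-find over the terminals:

```python
def connected_components(links):
    """links: iterable of (source, destination) pairs forming Gi.
    Returns a list of link sets, one per connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    links = list(links)
    for s, d in links:
        # Tag the two sides so a name can serve as both a source and a destination.
        union(("S", s), ("D", d))
    comps = {}
    for s, d in links:
        comps.setdefault(find(("S", s)), []).append((s, d))
    return list(comps.values())

# Two sources feeding two destinations, plus an unrelated third pair:
gi = [("s1", "d1"), ("s1", "d2"), ("s2", "d1"), ("s2", "d2"), ("s3", "d3")]
parts = connected_components(gi)
# One four-link component and one single-link component, to be handled separately.
```

Each resulting component becomes its own subclique, exactly as the first bullet prescribes.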



Since our algorithms ignore links and cliques that cannot contribute to cost reduction, the cliques selected in Step 3 may not cover every link of the original graph H. This does not indicate an infeasible or inadequate solution. Any link which is not explicitly covered by our algorithm can remain as a direct connection without any effect on the interconnect cost (in terms of our measure). These are links which cannot possibly decrease connection costs by sharing connections.

Notice that throughout the examples, we identified the minimum number of links whose aggregation would produce the maximum improvement. There are two important advantages in doing this. The first is to reduce the search space to the minimum size without sacrificing optimality. The second is to retain flexibility for later, when a floor plan is finally available. Links that are not already aggregated with others can often still utilize shared connections, without altering the number of muxes or tri-state buffers. By deferring the path sharing decision for these links, physical considerations can be employed later, without affecting the cost of data steering elements. For example, without affecting our cost measure, link 1 in Fig. 1d could be routed to its destination using the bus, if this would simplify the wiring.

As a final example of our strategy, consider the link compatibility graph in Fig. 3. Step 1 would identify five maximal cliques: {1,2,3}, {1,3,4,5}, {4,5,6}, {5,7} and {7,8}.

Fig. 3. Link compatibility graph, H.

If a mux implementation is being considered, only the clique containing four links is large enough to be considered for a shared connection. If Step 2 does not reduce it and evaluates it positively, links {1,3,4,5} will share a connection, and links 2, 6, 7 and 8 will remain direct connections. However, if a bus implementation is being considered, only the last two maximal cliques will be eliminated by Step 1. Suppose that Step 2 decides that clique {4,5,6} has no advantage, and that removing link 4 from {1,3,4,5} does no harm. Then Step 3 will be given only two candidate cliques: {1,2,3} and {1,3,5}. These cliques obviously overlap, and happen to be of minimum size for useful (bus) cliques. Therefore, the selection process will choose the one with the greatest advantage, say {1,3,5}. This implies that the final clique cover will contain five single-node cliques besides {1,3,5}. This six-clique cover is in marked contrast to the 'minimum' 3-clique cover suggested by traditional wisdom. Of course, physical considerations can later cause link 4 to share a connection with either link 6 or with {1,3,5}. No extra data steering elements would be introduced if {4,6} has zero advantage, or if {1,3,4,5} has the same advantage as {1,3,5}, respectively. Similarly, links 7 and 8 might eventually share a connection.

4.2 Evaluating an individual clique

We now consider the net reduction in (equivalent) 2 × 1 muxes if all the links in some clique Ci of H are merged onto the same CC. The interconnect measure is called the advantage of including clique Ci in the final solution. This measure will enable H to be significantly reduced to include only those links that could lead to mux count reduction through sharing. It also governs the heuristic solution technique.

Consider any clique Ci. For the corresponding connection graph Gi we define:


Si = number of source nodes
Di = number of destination nodes
Ni = Si + Di (number of nodes)
Li = number of links
Mi = number of 2 × 1 native muxes required to connect the Si sources to Di destinations with Li links
Ai = number of 2 × 1 source muxes joining all sources of Gi to the CC implemented using muxes
Bi = number of tri-state buffers joining all sources of Gi to the CC implemented as a bus

The connection graph represents the native point-to-point multiplexer configuration, the situation before our optimization. The number of 2 × 1 muxes required at a destination is one per incoming line excluding the first. Thus, if all destinations have at least one incoming line:

Mi = Li − Di    (1)

If this entire set of links in Gi is coalesced using muxes, the result is a cluster of 2 × 1 muxes entering the single CC. The number of 2 × 1 source muxes is:

Ai = Si − 1    (2)

These source muxes replace all the Mi destination muxes pertaining to these links. We define the advantage, A(Gi), of coalescing all links of clique Ci using 2 × 1 muxes as the resulting net reduction in 2 × 1 muxes. Using relations (1) and (2):

A(Gi) = Mi − Ai = Li − Ni + 1 = (Li − Di) − (Si − 1)    (3)

If a bus implementation is being considered, all Si sources require a tri-state buffer to access the bus, each such buffer being half the cost of a 2 × 1 mux [2]. Thus, we define the advantage of coalescing all links of Ci onto a bus as:

B(Gi) = Mi − 0.5Bi = Li − Di − 0.5Si    (4)
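Relations (3) and (4) make clique evaluation a matter of counting nodes and edges in Gi. A short sketch; the terminal labels below are invented stand-ins for the link sets of Figs 2a and 2b:

```python
def advantages(links):
    """links: set of (source, destination) pairs in Gi.
    Returns (A, B): the net 2x1-mux reduction, relation (3), and the
    net bus-style reduction, relation (4)."""
    S = len({s for s, _ in links})   # distinct source nodes
    D = len({d for _, d in links})   # distinct destination nodes
    L = len(links)                   # links (edges of Gi)
    A = L - D - (S - 1)              # destination muxes saved minus source muxes added
    B = L - D - 0.5 * S              # each source pays a half-mux-cost tri-state buffer
    return A, B

# Assumed labelling for Fig. 2a (six links, S=3, D=3):
g2a = {("s1", "d1"), ("s1", "d2"), ("s2", "d2"),
       ("s2", "d3"), ("s3", "d2"), ("s3", "d3")}
# Assumed labelling for Fig. 2b (four links, S=2, D=2):
g2b = {("s2", "d2"), ("s2", "d3"), ("s3", "d2"), ("s3", "d3")}

# advantages(g2a) -> (1, 1.5) and advantages(g2b) -> (1, 1.0),
# matching the 2a and 2b rows of Table 1.
```

The evaluation is purely local to Gi, which is what lets disjoint cliques be scored independently.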

Using relation (3) or (4), a simple comparison between the numbers of nodes (Si and Di) and edges (Li) in a connection graph (Gi) precisely computes the relative 'usefulness' of the corresponding clique (Ci) in H. Table 1 shows the advantage measures for the connection graphs of Fig. 2. Note that the smallest connection graph having greatest advantage identifies the clique chosen to be a candidate in the final solution (Step 3). Thus, for the mux implementation, the configuration of Fig. 2b is chosen over Fig. 2a. For the bus implementation Fig. 2c is preferred.

4.3 Conditions for a useful clique (mux implementation)

This section develops criteria which a clique must satisfy in order to contribute to mux reduction when a mux implementation is being considered. These criteria often allow early removal of parts of the link compatibility graph, thereby simplifying the search for useful cliques. No clique should be considered unless the merging of its links onto a CC produces an actual reduction in the number of equivalent 2 × 1 muxes, i.e.

A(Gi) ≥ 1  ⇒  Li − (Si + Di) + 1 ≥ 1  ⇒  Si + Di ≤ Li    (5)

TABLE 1. Evaluation of graphs from Fig. 2

Graph in Fig. 2   Li   Di   Si   A(Gi)   B(Gi)
2a                 6    3    3     1      1.5
2b                 4    2    2     1      1.0
2c                 5    2    3     0      1.5

The fact that at most one link may connect any source-destination pair leads to the relation



Li ≤ Si × Di    (6)

Limiting attention to strictly positive values, relations (5) and (6) are both satisfied only when

(a) Si ≥ 2;  (b) Di ≥ 2;  (c) Li ≥ 4    (7)

All the conditions in relation (7) must be satisfied if coalescing all Li links is to produce any reduction in the number of 2 × 1 muxes. For example, cliques containing fewer than four links (4 nodes in H) can be ignored!

4.4 Graph reduction algorithm (mux implementation)

Steps 1 and 2 from section 3 can now be expanded:

Step 1: Identify sufficiently large cliques of H

(1) Construct the link compatibility graph H.

(2) Remove from H any node that is met by fewer than three edges. Since there must be at least four compatible links to derive any benefit from merging links onto a CC, a link must be compatible with at least three other links.

(3) In what remains of H, find maximal cliques C1, C2, ..., Cn (not necessarily disjoint).

(4) Ignore any clique containing fewer than four nodes, and discard any of its nodes that belong to no other remaining clique.

Step 2: Reduction of individual cliques

For each potentially useful clique Ci obtained in Step 1, remove from Ci any links whose merging will not contribute to mux reduction:

(1) Construct the corresponding connection graph Gi of clique Ci.

(2) Disjoint components of Gi, if any, should be handled separately. If Gi has m ≥ 2 disjoint connected components Gi = Gi1 ∪ Gi2 ∪ ... ∪ Gim, then

Σk A(Gik) = A(Gi) + (m − 1) > A(Gi)

because (m − 1) muxes would be required just to force the m components onto a common CC.

(3) For each disjoint component Gik of Gi:

(a) Discard component Gik if Sik = 1, Dik = 1, Lik ≤ 3, or Nik > Lik.

(b) Repeat the following until it does not apply or until Gik can be discarded by rule 3a: remove any node in Gik which is met by a single edge. These represent benefit-preserving, neutral reductions. Lowering Lik by 1 and Nik by 1 does not affect the advantage of the clique, but it does reduce the complexity of future computations by reducing the number of links for consideration.

(4) Discard from H any node which belongs to no remaining clique, other than Ci, and whose corresponding edge in Gi was discarded. Discard Gi (and hence Ci) if all its components are discarded.

At this point, all nodes remaining in Gi are met by at least two edges. Because each edge has one end in each (bipartite) node set,

Li ≥ 2Si  and  Li ≥ 2Di  ⇒  Li ≥ Si + Di  ⇒  A(Gi) ≥ 1

Therefore, any links remaining in Gi would benefit from merging onto a coalesced connection. Links removed from Gi (edges in Gi) are also removed from Ci (nodes in Ci), thus reducing Ci from a 'maximal' clique to an 'essentially useful' one, which now becomes a candidate for coalescing.
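Rule 3b is a fixed-point iteration: repeatedly delete any link whose source or destination is met by only that one edge (equivalently, remove the degree-1 node together with its edge). A sketch with invented terminal labels:

```python
def prune_pendant_nodes(links):
    """Repeatedly remove links incident to a degree-1 source or destination.
    Each removal lowers L and N together, so the advantage (3) is unchanged."""
    links = set(links)
    changed = True
    while changed:
        changed = False
        src_deg, dst_deg = {}, {}
        for s, d in links:
            src_deg[s] = src_deg.get(s, 0) + 1
            dst_deg[d] = dst_deg.get(d, 0) + 1
        for s, d in list(links):
            if src_deg[s] == 1 or dst_deg[d] == 1:
                links.discard((s, d))
                changed = True
                break   # recompute degrees after each single removal
    return links

# A 'bow tie' core plus one pendant source: pruning drops the pendant link
# and leaves the four-link core, whose advantage is 4 - 2 - (2 - 1) = 1.
gi = {("s1", "d1"), ("s1", "d2"), ("s2", "d1"), ("s2", "d2"), ("s3", "d1")}
core = prune_pendant_nodes(gi)
```

Removing one pendant link can expose another, which is why the loop runs to a fixed point, mirroring the "repeat until it does not apply" wording of rule 3b.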


Figure 4 gives an example of reducing an individual clique that originally contained nine links. After discarding the smaller component (having S = 1, L = 2 and N > L), the larger component remains with L = 7, N = 7 and advantage 1. However, that component can have three unprofitable links (and three nodes) removed by rule 3b, leaving only four links connecting four nodes. This subgraph also has advantage 1, because it contains the very links that offered the advantage to the original component. Notice in Fig. 4 how two shaded destination muxes are replaced by a single shaded source mux. One of the original destination muxes (unshaded) remains, with the coalesced connection becoming one of its inputs. The 'bow tie' figure that remains in Gi after reduction (in the middle of Fig. 4) is a familiar sight in cliques with advantage ≥ 1. Every useful clique must have a cycle in its bipartite connection graph (when edges are considered undirected).

4.5 Graph reduction for bus implementation

When considering a bus implementation for CCs, the analysis is similar, but the values are somewhat different. In a bus implementation, a useful clique must satisfy the relation:

B(Gi) > 0 ⇒ 0.5Si + Di < Li    (8)

Since relation (7) still holds, and all components must be positive integers, we can derive some necessary bounds on the values of the relevant variables in a useful clique:

(1) If Si = 1, then Li = Di, but these conditions are incompatible with the requirement that B(Gi) > 0. Thus Si ≥ 2.

(2) If Di = 1, then Li = Si, and in order to have B(Gi) > 0 we must have Si ≥ 3. Therefore: Di = 1 ⇒ Si ≥ 3 and Li ≥ 3.

(3) For reasons similar to the mux implementation case: Si ≥ 2 and Di ≥ 2 ⇒ Li ≥ 4.

The preceding items result in the general requirements for a useful clique:

Si ≥ 2;  Si + Di = Ni ≥ 4;  and  Li ≥ 3    (9)
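Relation (8) translates directly into a usefulness test for a candidate component. A minimal sketch, again assuming the illustrative representation of a component as a set of (source, destination) link pairs:

```python
def bus_useful(edges):
    """Bus-implementation usefulness test from relation (8):
    B(G) > 0  iff  0.5*S + D < L, where S and D count distinct
    sources and destinations and L counts links."""
    S = len({s for s, _ in edges})
    D = len({d for _, d in edges})
    L = len(edges)
    return 0.5 * S + D < L
```

For example, a component with one destination and three sources passes the test (1.5 + 1 < 3), even though it would be rejected under the mux-implementation bounds, which require at least two destinations.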

[Figure 4 shows the clique reduction example: before reduction, nine mutually compatible links (edges); after discarding the smaller component, the larger component has L = 7, N = 7, A(G) = 1; after rule 3b, L = 4, N = 4, A(G) = 1.]

Fig. 4. Clique reduction.


Two important differences between this set of requirements and those for the mux implementation are that: (1) here a clique is useful even if it has only one destination, provided it has three or more sources; and (2) considering separate components is a convenience rather than a necessity, because:

B(Gi1) + ... + B(Gin) = B(Gi1 ∪ ... ∪ Gin)

The graph reduction algorithm for the case of bus implementation is:

Step 1: Identify sufficiently large cliques of H

(1) Construct the link compatibility graph, H.
(2) Remove from H any node that is met by 0 or 1 edges.
(3) In what remains of H, find maximal cliques Ci.
(4) If Ci has fewer than three nodes, discard Ci and discard any of its nodes if they belong to no other remaining cliques.

Step 2: Reduction of individual cliques

For each Ci derived in Step 1:

(1) Construct the corresponding Gi.
(2) Consider its disjoint components separately.
(3) For each disjoint component Gik, continue the following actions until they no longer apply:
• Discard Gik if Sik = 1, Nik < 4, Lik < 3, or B(Gik) ≤ 0;
• Remove any destination node met by a single edge.
(4) Discard from H any node which belongs only to Ci, and whose corresponding edge in Gi was discarded.

5. Linear programming solution

The final and essential third step of our solution process actually selects the final set of cliques that (together with certain single-node cliques) will form a clique cover for H. Although a heuristic solution is described later, our preferred solution method uses integer linear programming (ILP). Not only does an ILP formulation precisely characterize the problem, it also provides a means for obtaining an optimum solution, and in most instances of connection coalescing it does so as quickly as the heuristic. This section first describes the basic ILP formulation, in which only 2 × 1 muxes are counted. We then adapt this formulation to handle the bus implementation, and show how to extend either of these formulations to consider arbitrary n × 1 muxes.

5.1 ILP formulation (basic 2 × 1 mux version)

Figure 5 shows an integer linear programming formulation of the (2 × 1) multiplexer minimization problem. The ILP manipulates the following objects, having indices and index sets as shown in Table 2.

TABLE 2

Object               Indices   Index set
Links                i and j   L
Shared connections   k         P
Link sources         s(i)      S
Link destinations    d(i)      D

Link i has a particular source s(i) and a destination d(i). The 0-1 matrix Q (the adjacency matrix of H) indicates pairwise link compatibility: Qij = 0 iff links i and j are active during the same time slot (control step), or are otherwise incompatible. P, the set of potential coalesced connections, will not be larger than L, and must be as large as the largest number of mutually incompatible links. Input to the ILP consists of what remains of H and the set of all reduced Gi's. This latter set is

Microelectronics Journal Vol. 24

Fig. 5. ILP formulation (basic 2 × 1 mux version):

Constants
  Qij = 0, 1;  Qij = 0 ⇒ link i and link j are incompatible (active concurrently)

Solution variables
  xik = 0, 1;     xik = 1 ⇒ link i is assigned to CC k
  as(i)k = 0, 1;  as(i)k = 1 ⇒ source of link i has a physical connection to CC k
  bk = 0, 1;      bk = 1 ⇒ CC k is included in the design
  ckd(i) = 0, 1;  ckd(i) = 1 ⇒ CC k has a connection to destination of link i

Objective function
  min [ Σ_{s(i)∈S} Σ_{k∈P} as(i)k − Σ_{k∈P} bk + Σ_{d(i)∈D} Σ_{k∈P} ckd(i) ]

Constraints
  (1) ∀i ∈ L:  Σ_{k∈P} xik = 1
  (2) ∀i, j ∈ L, k ∈ P where Qij = 0:  xik + xjk ≤ 1
  (3) ∀i ∈ L, k ∈ P:  3xik ≤ as(i)k + bk + ckd(i)
  (4) ∀k ∈ P:  bk ≤ Σ_{i∈L} xik
denoted by G, where G = G1 ∪ ... ∪ Gn; it specifies the known connection requirements of all interesting links. Since G is already pruned, the edges (links) that remain in G are good candidates for coalescing.

The ILP assigns links in G to coalesced connections. Assigning a link to a CC requires the existence of three physical components in the resulting design:

(a) an access path leading from the link's source to the CC;
(b) the coalesced connection itself; and
(c) an access path from the CC to the link's destination.

These components, of course, constitute the actual connection and account for the interconnect cost. The ILP must guarantee that each link is supported by an appropriate set of


physical connections. It is useful here to extend our notion of CC to include the trivial case of a dedicated connection from one source to one destination. Although such a path is not truly 'shared', it conforms to a zero-advantage single-edge connection graph (because A(Gi) = Li − Ni + 1 = 1 − 2 + 1 = 0), and can be viewed as a combination of initial, intermediate, and final portions. If we interpret such degenerate 'single-link' connections as possible CCs, the ILP assigns every link in G to some CC. Inclusion of trivial CCs does not affect the optimality of the results.

The objective function minimizes the total number of 2 × 1 muxes, and is explained as follows. The number of CCs actually needed is Σ bk. The number of connections leading from sources to shared (and single-link) connections is Σ as(i)k. From relation (2), the number of 2 × 1 source multiplexers is Σ as(i)k − Σ bk. Σ ckd(i) represents the total number of connections from shared (and single-link) CCs to destinations. From relation (1), the number of 2 × 1 multiplexers entering the |D| destinations is Σ ckd(i) − |D|. The total is simply the sum of the source and destination multiplexers, which we hope to minimize. Because |D| is constant, it does not appear in the objective function.
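To make the formulation concrete, here is a brute-force sketch (not the ILP itself, and only usable for toy instances) that enumerates the assignment variables xik, enforces constraints (1) and (2), derives the cheapest physical connections a, b, c satisfying constraints (3) and (4), and minimizes the objective:

```python
from itertools import product

def min_muxes(links, compat, num_ccs):
    """Exhaustive search over the assignments x_ik.  links[i] = (s(i),
    d(i)); compat[i][j] = 0 means links i and j are incompatible.
    Constraint (1): each link gets exactly one CC.  Constraint (2):
    incompatible links never share a CC.  a, b, c are then the
    cheapest connections satisfying constraint (3); constraint (4)
    holds because only used CCs are counted.  Returns the minimum of
    the objective  sum(a) - sum(b) + sum(c)."""
    n = len(links)
    best = None
    for assign in product(range(num_ccs), repeat=n):
        if any(assign[i] == assign[j]
               for i in range(n) for j in range(i + 1, n)
               if not compat[i][j]):
            continue  # violates constraint (2)
        a = {(links[i][0], k) for i, k in enumerate(assign)}  # source-to-CC
        b = set(assign)                                       # CCs in use
        c = {(k, links[i][1]) for i, k in enumerate(assign)}  # CC-to-dest
        cost = len(a) - len(b) + len(c)
        if best is None or cost < best:
            best = cost
    return best
```

For four mutually compatible links {A, B} × {x, y}, the minimum objective is 3, achieved by assigning all four links to a single CC; subtracting the constant |D| = 2 leaves one 2 × 1 source mux, matching A(G) = 4 − 4 + 1 = 1.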

Each variable in the ILP must acquire a 0 or 1 value to indicate the absence or presence of the corresponding object or mapping. The primary variable is xik, which indicates the assignment of link i to shared path segment k. Constraint (1) requires each link to be assigned to some CC. Constraint (2) prevents links that are incompatible from being assigned to the same CC.

When a particular assignment is chosen, the appropriate path components must exist. Constraint (3) enforces this. If xik = 1, then the solution variables as(i)k, bk and ckd(i) are also forced to 1. Note that separate path components are NOT necessarily created for each link assigned to a CC. Constraint (3) merely makes sure that each required component exists, perhaps shared by multiple links. Constraint (4) prevents a CC from being introduced unless some link is actually assigned to it. This counteracts the tendency of the objective function to encourage an excess number of CCs.

In the worst case, the ILP may have O(L²) variables and O(L³) constraints. The examples we have encountered so far allow extensive graph pruning, leaving only a modest number of constraints and variables for the ILP. Consequently, the linear programs run very fast, typically in the order of seconds.

5.2 ILP formulation for bus implementation

The preceding solutions to the multiplexer minimization problem could be used unchanged to approximately model a bus implementation. However, the ILP can be extended in a straightforward way to provide an exact solution when the coalesced connections are implemented as buses. Again, the cost to be minimized is a combination of destination multiplexer area and tri-state buffer area; bus length and layout considerations are not included. The extension hinges on two points:

(1) The area of a tri-state buffer, Ab, differs from the area of a 2 × 1 multiplexer, Am; Ab ≈ 0.5Am is a fairly good estimate [2].

(2) An n-input bus (where n ≥ 2) requires n tri-state buffers, whereas the number of 2 × 1 muxes needed to combine n inputs is only n − 1.

Here, we will call a shared path segment with n ≥ 2 inputs a non-trivial bus. The second point suggests that the 2 × 1 mux count be increased by one for each non-trivial bus, in order to correctly count tri-state buffers. Trivial buses having n = 1 input still require n − 1 = 0 tri-state buffers.
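The two points can be condensed into a small cost model. The function below and its default unit areas (Am = 1, Ab = 0.5Am, per the estimate cited above) are our own illustrative choices, not values from the paper:

```python
def steering_area(n_inputs, A_m=1.0, A_b=0.5):
    """Area to combine n inputs at one shared path segment:
    (n - 1) 2x1 muxes, versus n tri-state buffers on a bus
    (a trivial, single-input bus needs no buffers).
    Returns (mux_area, bus_area)."""
    mux_area = (n_inputs - 1) * A_m
    bus_area = n_inputs * A_b if n_inputs >= 2 else 0.0
    return mux_area, bus_area
```

With these defaults the two implementations tie at n = 2 (1.0 vs 1.0), and the bus wins for n ≥ 3 (e.g. 4 inputs: 3.0 vs 2.0).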


Define variable gk = 0, 1, where gk = 1 means that CC k is a non-trivial bus. In other words, Σ_{i∈L} as(i)k ≥ 2 if and only if gk = 1. Two additional constraints guarantee this:

∀k ∈ P:  2gk − Σ_{i∈L} as(i)k ≤ 0    (5)

∀k ∈ P:  Σ_{i∈L} as(i)k − gk·N ≤ 1    (6)

Constraint (5) says that if gk = 1, the number of physical inputs must be ≥ 2. Constraint (6) says that if the number of physical inputs exceeds 1, then gk must be 1. N is a large constant that makes constraint (6) true whenever gk = 1.

Having included gk and the associated constraints, the number of non-trivial buses in the design will be Σ_{k∈P} gk. Every source implies a tri-state buffer, except those that terminate at a trivial bus, whose number is given by Σ_{k∈P} bk − Σ_{k∈P} gk. The number of tri-state buffers will then be:

Σ_{s(i)∈S} Σ_{k∈P} as(i)k − (Σ_{k∈P} bk − Σ_{k∈P} gk)

The number of destination 2 × 1 muxes remains Σ_{d(i)∈D} Σ_{k∈P} ckd(i) − |D|. Therefore, the objective function becomes:

min [ Ab (Σ_{s(i)∈S} Σ_{k∈P} as(i)k − Σ_{k∈P} bk + Σ_{k∈P} gk) + Am Σ_{d(i)∈D} Σ_{k∈P} ckd(i) ]

This minimizes the total area required by tri-state buffers and 2 × 1 muxes, where all CCs are implemented as buses (unless they are trivial buses), but muxes may remain at the destination terminals.

5.3 Accommodating general multiplexer implementations

Although the number of 2 × 1 muxes is a useful measure of interconnect complexity, a design may employ n × 1 muxes in the actual implementation. In this section, we indicate how either of the preceding linear programs can be extended to correctly account for n × 1 muxes. To simply count (and minimize) the number of n × 1 muxes, we need to identify, with a solution variable, just which input terminals and registers have two or more incoming segments. Define mp = 0, 1, where mp = 1 iff destination (register or input terminal) p has two or more incoming segments. Define ysp = 0, 1, where ysp = 1 iff a segment is required between source s and destination p, s ∈ S. Two constraints are required in this context:

∀p:  Σ_{s∈S} ysp − N·mp < 2   and   2mp ≤ Σ_{s∈S} ysp    (10)

The first constraint forces mp to become 1 whenever two or more ysp are 1; N is a 'large' value that satisfies the constraint if mp = 1. The second constraint forces the sum of the ysp to be ≥ 2 whenever mp = 1. The number of general muxes can then be included in the objective function as Σp mp.

Instead of merely counting the total number of muxes, we could separately enumerate the 2 × 1, 3 × 1, 4 × 1 muxes, and so on. This would allow a more precise cost (area) coefficient to be included in the objective function. We extend our previous definition, to provide a number of solution variables for each potential multiplexing point:


mp(n) = 0, 1;  mp(n) = 1 iff destination p has precisely n incoming segments; n > 0.

ap(n) = 0, 1;  ap(n) = 1 iff destination p has n or fewer incoming segments; n > 0.

Variables ap(n) indicate whether an n × 1 mux is adequate to handle the multiplexing at the input point. The a variables are constrained in the ILP as follows:

∀n, p:  Σ_{s∈S} ysp + N·ap(n) > n   and   Σ_{s∈S} ysp ≤ n + (1 − ap(n))·N    (11)

For any particular p, increasing values of n have non-decreasing values of ap(n). In other words, the first n for which an n × 1 mux is adequate corresponds to the first non-zero value of ap(n) in the sequence; successive values of ap(n) are all 1. The unique transition from 0 to 1 identifies the simplest type of mux that could be used at this input point. Thus, after defining ap(0) = 0,

∀n ≥ 1, p:  mp(n) = ap(n) − ap(n−1)    (12)

If the cost of an n × 1 mux is denoted Wn, the objective function can now include the term Σp Σn Wn·mp(n) to evaluate exactly the space required by each multiplexer type.

Note that the preliminary graph pruning assumes that 2 × 1 muxes will actually be used. The pruning may, in some cases, eliminate certain nodes and edges in a way that precludes finding the true optimum solution using n × 1 muxes. If a true optimum is required, the approach is to start directly with the LP solution; the pruning is only a speedup technique, and is not strictly necessary.

6. A heuristic solution

Although the ILP formulation usually provides a rapid solution, an ill-conditioned problem may require more time. Therefore, we have developed a heuristic solution as well. Again, there are two versions, one for each implementation of coalesced connections. We begin with the (2 × 1) mux implementation.

The reduced link compatibility (H) and connection (G) graphs contain only those links whose coalescing might contribute to mux reduction. Any disjoint clique in H is instantly chosen as identifying a set of links to merge onto the same CC. However, for overlapping cliques, the choice is not so clear. The links in a clique intersection must each be mapped to only one of the intersecting cliques. Without explicitly considering cliques, the ILP approach, in effect, considers each link individually and finds the globally optimal assignment to cliques. The heuristic approach takes all the links in an intersection as a group and assigns all of them to one of the containing cliques.

This technique can be illustrated with an example. Figure 6 shows two connection graphs, G1 and G2, each representing a clique of mutually compatible links. Their intersection, denoted X, which is also shown, will play an essential role. The edges of X represent the links in question, and all these edges will, in the end, remain in either G1 or in G2 and be removed from the other. When these edges are removed from a connection graph, say Gi, some of Gi's nodes may become isolated. We define Ḡi to be Gi without the edges from X and with isolated nodes removed. Ḡ1 and Ḡ2 are both shown in Fig. 6. Using this example, deciding which Gi should retain the links from the intersection X depends on evaluating the advantage for each alternative:


[Figure 6 shows the connection graphs G1 (A(G1) = 2) and G2 (A(G2) = 2), their intersection X = G1 ∩ G2, the union G1 ∪ G2, and the reduced graphs Ḡ1 (A(Ḡ1) = 0) and Ḡ2 (A(Ḡ2) = 1), with a node isolated by the removal of X deleted.]

Fig. 6. Intersecting cliques.

(1) Associate X with G1:  A(G1) + A(Ḡ2) = (6 − 5 + 1) + (5 − 5 + 1) = 3

(2) Associate X with G2:  A(Ḡ1) + A(G2) = (4 − 5 + 1) + (7 − 6 + 1) = 2

Thus, we associate X with G1. The original point-to-point interconnection, corresponding directly to G1 ∪ G2, uses seven 2 × 1 (equivalent) muxes and is shown in Fig. 7a. The result of merging both G1 and G2 onto CCs uses four muxes and appears in Fig. 7b. The reduction of three muxes is predicted by A(G1) + A(Ḡ2).

The intersection set X should always be associated with that connection graph Gi for which the vertex sets of Ḡi and Gi differ by the fewest nodes. For example, in Figs. 6 and 7, G1 and Ḡ1 have the same five nodes in common, whereas G2 has one node that is not contained in Ḡ2. Therefore, we associate X with G1.

Let Xn represent the intersection of n cliques in H, and assume that all n cliques are mutually disjoint after removal of Xn. This assumption is always true for the case of n = 2 cliques, as shown in Fig. 7, but is not always true for larger n. The assumption guarantees that the corresponding connection graphs Ḡi are all edge-disjoint. Their combined advantage function is the sum of their individual values, and represents the baseline value which is to be increased by incorporating the edges of Xn into one of the Ḡi. Thus, we seek that i which maximizes this increase, i.e. which maximizes the value of A(Gi) − A(Ḡi). Define L̄i and N̄i to be the


[Figure 7 shows (a) the 'natural', point-to-point configuration (7 muxes) and (b) the two-level configuration with two CCs (4 muxes); note how source 6 accesses destinations 2 and 3 via a CC.]

Fig. 7. Implementation using merged cliques.

edges and nodes, respectively, that belong to Ḡi. Then

A(Gi) − A(Ḡi) = (Li − Ni + 1)

− (L̄i − N̄i + 1) = (Li − L̄i) − (Ni − N̄i)

However, (Li − L̄i) is the same for all values of i; it is simply the number of edges in Xn. Hence, maximum advantage is gained by associating Xn with that i for which (Ni − N̄i) is minimized. A similar argument for a bus implementation suggests associating Xn with that i which minimizes the function:

2(Di − D̄i) + (Si − S̄i)

In general, several cliques/connection graphs often intersect. We begin at the innermost intersection and work outward, level-by-level, until all the links involved have been partitioned into disjoint cliques. At each step, we assign the innermost intersection to one of its containing cliques which provides maximum advantage; this advantage is computed for each surrounding clique independently, as if it did not overlap any of the other candidate cliques. Although the disjoint clique assumption is not true in general, the heuristic rule is simple to apply and gives very good results in practice.
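The assignment rule can be sketched as follows. The edge sets in the example below are invented for illustration (they are not the graphs of Fig. 6), and the (source, destination) pair representation is again our own choice:

```python
def advantage(edges):
    """A(G) = L - N + 1 over (source, destination) edge pairs."""
    nodes = {('s', s) for s, _ in edges} | {('d', d) for _, d in edges}
    return len(edges) - len(nodes) + 1

def assign_intersection(x, cliques):
    """Give the edges of X to the clique G_i whose reduced graph loses
    the fewest nodes, i.e. minimize N_i - Nbar_i; this maximizes
    A(G_i) - A(Gbar_i), since L_i - Lbar_i = |X| for every i."""
    def node_loss(g):
        full = {('s', s) for s, _ in g} | {('d', d) for _, d in g}
        rest = set(g) - set(x)  # Gbar_i: isolated nodes vanish because
                                # nodes are derived from the edge set
        kept = {('s', s) for s, _ in rest} | {('d', d) for _, d in rest}
        return len(full) - len(kept)
    return min(range(len(cliques)), key=lambda i: node_loss(cliques[i]))
```

In a toy instance where removing X isolates a node of the second clique but none of the first, the rule hands X to the first clique.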


7. Results

The ILP and the heuristic solution have been implemented in UNIX/C as alternative final stages in a more comprehensive behavioral synthesis system developed by the authors. This optimization package is labelled MinMux (for minimum multiplexers, although bus optimization is also performed). Table 3 shows the effect of these two solution techniques. The '5-filter' is the well-known fifth-order elliptical wave filter benchmark [13]. The version used is the 19-step schedule having two adders and a pipelined multiplier that appeared in HAL [13]. The 'Bandpass' filter example appeared in ref. [6]; it uses 8 control steps, 12 registers and 5 functional units. 'Filter-X' is a practical digital filter example supplied by an industrial source; it has 34 control steps, 11 registers and 8 functional units. 'Poisson' is our own design example that generates an arbitrary value from the Poisson distribution; it has 18 control steps, 16 registers and 3 functional units. The column labelled '2 × 1 muxes before MinMux' in Table 3 represents the result of considerable interconnect optimization already, specifically, commutative operand flipping and very careful binding of variables to registers. The indicated numbers of 2 × 1 muxes are probably the minimum numbers attainable with the given schedules and operator bindings, when limited


TABLE 3  Effect of link merging

                          After MinMux: ILP                  After MinMux: Heuristics
            2 × 1 muxes   2 × 1   Percent                    2 × 1   Percent
Example     before MinMux muxes   reduction   CPU (s)*       muxes   reduction   CPU (s)*
5-filter    20            18      10          0.34           18      10          0.16
Bandpass    20            19      5           0.22           19      5           0.10
Filter-X    22            17      23          4.23           18      18          0.35
Poisson     25            19      24          1.73           20      20          0.56

* On a SUN SPARCstation 2.

to native destination multiplexing. The other columns in Table 3 illustrate the additional reduction in 2 × 1 muxes made possible by including source multiplexing: the improvement due to the methods presented in this paper. Note that the heuristic technique does almost as well as the ILP, and that the ILP running times are not excessive.

Table 4 compares our final results with those of other systems for the two benchmark examples from the literature. For the fifth-order elliptical wave filter (5-filter) example, our 'before MinMux' solution, using a single level of multiplexing, is as good as HAL's solution [12], which also uses source multiplexing. Both our ILP and the heuristic approach were able to remove two more multiplexers. The result from ELF [8] is the best previously known. For the bandpass filter example, MinMux results in 19 2 × 1 muxes, compared to 21 from ADPS [6].

TABLE 4  Comparison of link merging

Example          System   Reference   Equivalent 2 × 1 muxes
5-filter         MABAL    [7]         22*
                 HAL      [12]        20
                 ELF      [8]         19
                 MinMux   -           18
Bandpass filter  ADPS     [6]         21
                 MinMux   -           19

* Using 10 registers and no pipelined units.

8. Conclusion

The problem considered in this paper is the coalescing of links having known connection requirements onto a number of shared path segments in such a way as to minimize the number of equivalent 2 × 1 muxes, or the number of tri-state buffers if using a bus implementation. An extension is outlined for minimizing and evaluating the cost of general n × 1 multiplexers. The problem is considered in its global form, as finding disjoint cliques of the link compatibility graph, each of whose members will all be coalesced onto a shared path segment associated with that clique. We have developed an exact 'advantage function' to predict the effect on 2 × 1 mux reduction of any candidate clique. When the cliques are disjoint, their individual advantage functions sum to the overall reduction in muxes that may be achieved with that collection of cliques. This enables identification of a globally optimum solution. Note, however, that our solution considers just one level of connection sharing, e.g. no sharing of connections that lead to a bus. Neither does it seek paths that involve use of idle functional units.

The primary solution technique uses integer linear programming. The ILP expresses the


optimization problem precisely and directly, without explicit reference to cliques. A faster, alternative heuristic procedure explicitly constructs a set of disjoint cliques, basing its decisions on our advantage function. Both methods benefit from a comprehensive preliminary reduction of the link compatibility graph, in which links with no potential for contribution to mux reduction are removed. This graph pruning is based on several criteria that a 'useful', contributing clique must satisfy (e.g. having a minimum number of compatible links). The pruning is usually so extensive that even the ILP solution executes rapidly.

If a shared path segment carries more than one logical link, it could also be implemented by an n × 1 multiplexer or by a multiple-input bus. We provide an extended ILP formulation to minimize area from path-merging components when buses are used to implement the shared path segments.

This work is a global solution approach that is not based on pairwise merging of paths or reusing portions of existing paths. By taking a fresh and abstract view of the problem, it provides both a theory of multiplexer minimization and two practical techniques, one of which guarantees the minimum area for data steering components that can be achieved by replacing (some) destination muxes with source muxes, or alternatively with tri-state buffers and buses. Whatever implementation may be used, this technique can significantly reduce the interconnection complexity of a design.

References

[1] N.-S. Woo, A global, dynamic register allocation and binding for a data path synthesis system, Proc. 27th ACM/IEEE Design Automation Conf., 1990, 505-510.
[2] A. Mukherjee, Introduction to nMOS & CMOS VLSI Systems Design, Prentice Hall, NJ, 1986.
[3] J. Midwinter, Improving Interconnect for the Behavioral Synthesis of ASICs, Master's thesis, Carleton University, Ontario, Canada, 1988.
[4] P. G. Paulin, J. P. Knight and E. F. Girczyc, HAL: A multi-paradigm approach to automatic data path synthesis, Proc. 23rd ACM/IEEE Design Automation Conf., 1986, 263-270.
[5] C.-Y. Huang, Y.-S. Chen, Y.-L. Lin and Y.-C. Hsu, Data path allocation based on bipartite weighted matching, Proc. 27th ACM/IEEE Design Automation Conf., 1990, 499-504.
[6] C. A. Papachristou and H. Konuk, A linear program driven scheduling and allocation method followed by an interconnect optimization algorithm, Proc. 27th ACM/IEEE Design Automation Conf., 1990, 77-83.
[7] K. Küçükçakar and A. C. Parker, MABAL: A software package for module and bus allocation, Intl. Journal of Computer Aided VLSI Design, 2 (1990) 419-436.
[8] T. A. Ly, W. L. Elwood and E. F. Girczyc, A generalized interconnect model for data path synthesis, Proc. 27th ACM/IEEE Design Automation Conf., 1990, 168-173.
[9] B. M. Pangrle, SPLICER: A heuristic approach to connectivity binding, Proc. 25th ACM/IEEE Design Automation Conf., 1988, 536-541.
[10] H. Sekigawa, Y. Nakamura, K. Oguri, A. Nagoya and M. Yukishita, Multiplexor assignment after scheduling and allocation steps, Proc. 6th Intl. Workshop on High-Level Synthesis, 1992, 410-417.
[11] C.-J. Tseng, Automated Synthesis of Data Paths in Digital Systems, PhD thesis, Carnegie-Mellon University, 1984.
[12] P. G. Paulin, Scheduling and binding algorithms for high-level synthesis, Proc. 26th ACM/IEEE Design Automation Conf., 1989, 1-6.
[13] P. G. Paulin, High-Level Synthesis of Digital Circuits Using Global Scheduling and Binding Algorithms, PhD thesis, Carleton University, Ontario, Canada, 1988.