Neural Networks, Vol. 4, pp. 431-442, 1991
Printed in the USA. All rights reserved.
0893-6080/91 $3.00 + .00
Copyright © 1991 Pergamon Press plc

INVITED ARTICLE

Competitive Neural Architecture for Hardware Solution to the Assignment Problem

S. P. Eberhardt, T. Daud, D. A. Kerns, T. X. Brown, and A. P. Thakoor

California Institute of Technology

(Received 15 March 1991)

Abstract: A neural network architecture for competitive assignment is presented, with details of a very large scale integration (VLSI) design and characterization of critical circuits fabricated in complementary metal-oxide semiconductor (CMOS). The assignment problem requires that elements of two sets (e.g., resources and consumers) be associated with each other such as to minimize the total cost of the associations. Unlike previous neural implementations, association costs are applied locally to processing units (PUs, i.e., neurons), reducing connectivity to VLSI-compatible O(number of PUs). Also, each element in either set may be independently programmed to associate with one, several, or a range of elements of the other set. A novel method of "hysteretic annealing," effected by gradually increasing positive feedback within each PU, was developed and compared in simulations to mean-field annealing implemented by increasing PU gain over time. The simulations (to size 64 x 64) consistently found optimal or near-optimal solutions, with settling times of about 150 microseconds, except for a few variable-gain annealing trials that exhibited oscillation.

Keywords: Assignment problem, Neural network, Competition, Neuroprocessor, Hysteresis, Annealing, VLSI.

1. INTRODUCTION

Assignment is a class of global optimization problems that poses a computational bottleneck in a number of computation-intensive real-time systems. In the assignment paradigm, members of one set {R_i : 1 <= i <= M} (e.g., resources) are to be assigned to members of another set {C_j : 1 <= j <= N} (e.g., consumers). An assignment is denoted by a binary matrix X = {x_ij}, where x_ij = 1 if R_i is associated with C_j and x_ij = 0 otherwise. The total cost of a selected assignment is given by

    COST_TOTAL(X) = Σ_i Σ_j W_ij x_ij    (1)

Acknowledgements: We thank Fernando Pineda for many useful discussions, and Raoul Tawel for technical assistance. D. A. Kerns' current address is Computation and Neural Systems, California Institute of Technology, Pasadena CA 91125. The research described in this paper was performed by the Center for Space Microelectronics Technology, Jet Propulsion Laboratory, California Institute of Technology, and was jointly sponsored by the Joint Tactical Fusion Program Office, the Defense Advanced Research Projects Agency, and the Strategic Defense Initiative Organization/Innovative Science and Technology Office, through an agreement with the National Aeronautics and Space Administration. Requests for reprints should be sent to T. Daud, Center for Space Microelectronics Technology, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109.


where {W_ij} is the pairwise assignment cost matrix. Assignments are not arbitrary; "blocking" constraints, specifying the number of elements of {C_j} that may be associated with each element R_i, and the number of elements of {R_i} that may be associated with each C_j, are given as well. While one-to-one blocking constraints are most commonly encountered, some applications require one-to-several or several-to-several associations. Finally, the assignment problem: choose an assignment X that minimizes COST_TOTAL(X) and satisfies the blocking constraints. The complexity of the assignment problem may be understood by considering that for sets of N and M elements, the number of possible solutions satisfying one-to-one (and in some cases one-to-zero) blocking constraints is N!/(N - M)!, assuming N >= M. Because the problem requires a global solution, it cannot be decomposed without risk of losing the optimal solution. Nevertheless, when fast solutions are required, the problem is generally solved either by decomposition or limited best-first search. Search algorithms that find optimal solutions include Munkres' method (Munkres, 1957; Blackman, 1986), which has a computational complexity of O(MN²). While more efficient than exhaustive enumeration, such algorithms remain computation-intensive, requiring seconds or minutes of processing on powerful

computers. These methods are therefore not suitable for many real-time applications. Furthermore, for problems with one-to-several and several-to-several blocking constraints, the number of possible solutions can be considerably larger, and to our knowledge no efficient search algorithms exist. A number of application areas that fit into this assignment paradigm are detailed in the next few paragraphs. In the military domain, an obvious and time-honored application is weapon-target association. Of particular importance are intelligent missiles, which have attained a position of primary importance among munitions in today’s battlefield arenas. In many cases, the time between threat detection and countermeasure deployment is on the order of seconds to minutes, so the automated detection and decision-making components of a defense system must be highly optimized for speed, especially if humans are in the loop as well. In an idealized situation, the various threats are identified, and the most appropriate countermeasures applied. Indeed, it may be necessary to allocate several disparate countermeasures to the more serious threats, or perhaps in some cases several threats may be dealt with by one defensive system. Consequently, the traditional one-to-one assignment paradigm is no longer adequate. Another application is radar tracking of aircraft (e.g., at large airports) or missiles (e.g., intercontinental ballistic missile tracking as a component of Strategic Defense Initiative systems). In order to track individual objects from a swarm of tens or hundreds, “blips” from successive radar sweeps must be matched. An alternative approach is to use Kalman filters to estimate the bearing and velocity of the individual objects, and to assign the instantaneous estimated positions to the actual positions determined by radar (Blackman, 1986). 
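The combinatorial core of the problem can be made concrete with a tiny brute-force solver (a sketch of ours, not from the paper): it enumerates all N!/(N-M)! one-to-one assignments, which is feasible only for very small problems and motivates the faster methods discussed here.

```python
from itertools import permutations

def optimal_assignment(cost):
    """Exhaustively search all one-to-one assignments of M resources
    to N consumers (N >= M) and return (minimum total cost, columns),
    where columns[i] is the consumer assigned to resource i.
    The search space has size N!/(N-M)!."""
    M, N = len(cost), len(cost[0])
    best, best_assign = float("inf"), None
    for cols in permutations(range(N), M):
        total = sum(cost[i][j] for i, j in enumerate(cols))
        if total < best:
            best, best_assign = total, cols
    return best, best_assign
```

For example, `optimal_assignment([[4, 1, 3], [2, 0, 5], [3, 2, 2]])` returns a total cost of 5 with assignment (1, 0, 2).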
Finally, an application that is becoming increasingly important as computer and national telecommunication networks convert to the transmission of information (e.g., audio, fax, video) by discrete units of digital data, is high-speed network packet switching. Within a communications network, the function of a switching node can be viewed as an assignment of node inputs to node outputs. The blocking constraints require that at any one time each input is assigned to at most one output and vice versa. The switch may have additional constraints if there is no direct path from each input to each output. The cost of assigning an input to an output depends on whether or not the input has data waiting for that output. Cumulative latency time and packet priority may need to be taken into account as well. The assignment paradigm developed in this paper is compatible with a switching network having maximal throughput and near-optimal delay (Brown & Liu, 1990).
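As a toy illustration of the switching application (our construction, not the paper's or Brown and Liu's formulation; it uses queue occupancy only, ignoring the latency and priority terms mentioned above), queue state can be turned into an assignment cost matrix that a one-to-one solver then minimizes:

```python
def switch_cost_matrix(queues, base=10.0):
    """Build an assignment cost matrix for an N x N packet switch.
    queues[i][j] is the number of packets waiting at input i for
    output j. Inputs with traffic for an output receive a low cost,
    empty queues a high one, so a minimum-cost one-to-one assignment
    of inputs to outputs favors busy input/output pairs."""
    return [[base - min(q, base - 1) for q in row] for row in queues]
```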

From these applications, the need for a fast and versatile method for solving assignment problems is obvious. One means for overcoming the limitations in computation speed imposed by serial digital computers is parallel processing, whether digital or analog. In this paper, we present a method for solving assignment problems based on massively parallel analog computation. In the following sections, we review work on implementing assignment problems on neural network simulations, then present a novel architecture derived from neural models that is compatible with hardware implementation of large assignment problems. To ensure controlled settling of the architecture, a novel method for annealing was developed that involves increasing over time the level of positive feedback within processing units (PUs) (i.e., neurons). Finally, details of a VLSI design and characterization of critical circuits are presented.

2. NEURAL NETWORK APPROACHES TO ASSIGNMENT

To date, the fully-connected feedback (additive) neural network architecture (Cohen & Grossberg, 1983; Hopfield, 1984) has been the primary vehicle for neural implementations of global optimization problems such as the travelling salesman problem (Hopfield & Tank, 1985), scheduling (Foo & Takefuji, 1988), generalized assignment (Fang & Li, 1990b), quadratic assignment (Li & Fang, 1990), bipartite matching (Goldstein, Toomarian & Barhen, 1988), and load balancing on parallel computers (Barhen, Toomarian & Protopopescu, 1987). With few exceptions, applications have been mapped onto the additive networks by first formulating a Lyapunov function that has energy minima which correspond to the desired solutions. Typically, this "energy equation" is composed of simple quadratic terms, each of which is minimized when a particular application constraint is satisfied. From this equation, the interconnection weight matrix is then derived by a simple transformation. An alternative approach is to use heuristics to implement required constraints directly (Fang & Li, 1990a). The classical one-to-one assignment problem may be mapped onto an additive architecture (Figure 1A) by allocating a matrix of processing units, such that each set element R_i corresponds to one row in the array, and each C_j to one column (Figure 1B). Every PU thus represents a unique pairing of one element of {R_i} with one element of {C_j}, or equivalently, one entry of the assignment matrix X. The system must be configured such that only as many PUs can be active in each row or column as allowed by the blocking constraints, and association costs must be incorporated such that as the system settles, or relaxes, the better solutions have a competitive advantage over the poorer.

FIGURE 1. (A) Schematic representation of a fully-connected additive network. The connection weight values W_ij determine the function of the network. (B) Implementation of an assignment problem using the additive network. Each row corresponds to a member of the one set, and each column a member of the other set. Note that the connection matrix (not shown) is a three-dimensional structure from a circuit implementation standpoint.

When the PUs, which apply a nonlinear saturating function to the sum of input signals, are allowed to take on an analog range of values, as would generally be the case in hardware implementations, the solution space takes the form of an MN-dimensional hypercube, the corners of which correspond to the assignment decisions. To form complete, unambiguous solutions X, all PUs must relax to saturated "on" or "off" states. An additive network was used by Tagliarini and Page (1988) for solving the concentrator assignment problem, a one-to-several assignment problem in which multiple elements of the one set (named "sites") are matched with each element of the other set (named "concentrators"). One application of this problem is selecting how goods should be distributed from warehouses to a chain of retail outlets, such that transport distance is minimized. Capacities of the individual concentrators were programmed by allocating k_j additional "slack" neuron rows to the basic assignment matrix. The energy function used was a quadratic Lyapunov function (eqn (2)), in which V_ij gives PU activation, and k_j gives the maximum number of elements supported by concentrator j. From this Lyapunov function, the required weights for satisfying the blocking and PU on/off constraints were derived. The association costs were heuristically added to the resulting interconnection matrix. In a series of 100 trials of a small 5 x 12 problem,

the network never found the optimal solution, but all solutions were within 0.3% of optimal, and most within 0.01%. The authors concluded that the network effectively rejected poor solutions. While solutions of this quality are quite satisfactory for most applications, the problem size was very small. Of concern is whether more realistic problem sizes would result in solutions as good. Also, consider the connectivity complexity for this architecture. If k_max is the largest expected k_j, then the number of connections is O(M²(M + k_max)²). Such extensive connectivity would pose serious difficulties for hardware implementations. The method of "simulated annealing" (Kirkpatrick, Gelatt, & Vecchi, 1983) is one means to improve the quality of solutions found by such competitive optimization architectures. Traditionally, annealing involves injecting uncorrelated noise into each PU and gradually reducing the level of the noise according to an annealing schedule. This has the effect of exciting the system out of local minima in the energy surface and into the deeper minima corresponding to better solutions. An alternative method, mean-field annealing (Bilbro, Mann, Miller, Snyder, Van den Bout, & White, 1989), is especially appealing for hardware implementations. It can be implemented by gradually increasing the slope of the saturating PU function in the presence of a small level of fixed noise. That annealing can significantly improve the solutions found by competitive optimization architectures was noted by Peng and Reggia (1989), who developed a network for diagnostic problem solving. Initially, a method for "resettling" the network was used, in which multiple production operations were performed in order to reduce the probability of obtaining a poor solution. With the incorporation of simulated annealing, the network, without resettling, produced quite good solutions to small-scale problems.

Annealing was incorporated into two hardware implementations of assignment problems. A 7 x 7 one-to-one assignment problem (Moopenn, Duong, & Thakoor, 1990) was solved using cascadable neural network "building-block" synaptic (interconnection) chips. Four 32 x 32 crossbar matrix synapse chips were combined with discrete neuron circuits to construct a 64 x 64 fully-connected additive network. When the one-to-one assignment hardware was applied to 100 randomly generated cost matrices, the system in all cases found solutions that were within 1% of optimal, and optimal solutions were found in 40% of the cases. Annealing times of about 10 neuron time constants, or 500 microseconds, were used. The hardware was also used to implement a 3 x 9 concentrator assignment problem (Moopenn & Thakoor, 1989). Unpublished results indicated that simulated annealing was necessary in order to obtain a good solution.

3. NEUROPROCESSOR ARCHITECTURE

From a circuit implementation standpoint, the additive network for assignment requires a two-dimensional matrix of PUs (see Figure 1B), and therefore the connection matrix requires a third dimension. As current VLSI technology is essentially two-dimensional, only small additive networks may be implemented on standard die sizes, and even nonstandard technologies such as wafer-scale implementations and chip stacking may limit N and M to a hundred or so. Larger problem sizes require a reformulation of the network model to reduce connectivity. The modified architecture presented in this section has only a few times more connections than PUs, and is compatible with VLSI implementation. While the same general model was used, with one PU representing each possible assignment pairing (C_j, R_i), alternative means were heuristically developed for applying association costs, satisfying blocking constraints, forcing PUs to the hypercube corners, and implementing annealing. A constraint on the number of elements of {R_i} that may be associated with C_j is equivalent to a constraint on the number of activated PUs in column j of the PU matrix. Similarly, constraints on the number of elements of {C_j} associated with R_i specify how many PUs may be on in row i. Focusing on a single row, a circuit that forces only one neuron to turn on, thus enforcing one-to-one assignment, is the winner-takes-all (WTA) (Carpenter & Grossberg, 1987). The WTA, via mutual inhibition, allows only the PU with the maximum input excitation to become activated; all other PUs are turned off. The WTA may be implemented with O(N) connectivity (Lazzaro, Ryckebusch, Mahowald, & Mead, 1989). The WTA model has been extended to allow multiple

winners in the k-WTA formulation of Majani, Erlanson, and Abu-Mostafa (1989). This model also can be implemented using O(N) connections. While the k-WTA would allow each R_i and C_j to be associated with a programmable number k or k' of elements of the other set, we have generalized the method one step further to allow elements to be independently associated with a range, or window, of elements of the other set. The resulting function, the k,m-WTA, uses the variable pair (k, m) to specify the minimum and maximum number of associated elements. By setting k = m, the k-WTA one-to-several association function is obtained, and by setting k = m = 1, one-to-one association may be specified. Each row and each column is connected using the k,m-WTA. As will be shown, this also can be constructed with O(N) connectivity. Using O(N) connections for each of the M rows and O(M) connections for each of the N columns, all of the blocking constraints can be implemented with just O(NM) connections. The question of whether these multiply overlapping WTA circuits can still function correctly was addressed by Brown (Brown & Liu, 1990; Brown, 1991), who found that WTA and k-WTA circuits still function correctly when overlapped arbitrarily. Note, however, that it is possible to specify inconsistent blocking constraints which cannot all be satisfied simultaneously. Letting (k_i^r, m_i^r) and (k_j^c, m_j^c) be the row and column constraints respectively, a necessary condition for consistency is

    Σ_i k_i^r <= Σ_j m_j^c  and  Σ_j k_j^c <= Σ_i m_i^r.
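In words, a necessary (though not sufficient) condition is that the total minimum demanded by the rows not exceed the total maximum permitted by the columns, and vice versa. A minimal check (function name and data layout are our own):

```python
def blocking_constraints_consistent(row_windows, col_windows):
    """row_windows and col_windows are lists of (k, m) pairs giving the
    minimum and maximum number of associations for each row element and
    each column element. Returns True if the necessary consistency
    condition holds: total row minima fit within total column maxima,
    and total column minima fit within total row maxima."""
    row_min = sum(k for k, m in row_windows)
    row_max = sum(m for k, m in row_windows)
    col_min = sum(k for k, m in col_windows)
    col_max = sum(m for k, m in col_windows)
    return row_min <= col_max and col_min <= row_max
```

For example, one-to-one windows (1, 1) on three rows and three columns are consistent, while demanding exactly two associations per row against one-per-column limits is not.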

In an additive network, the association costs are distributed throughout the interconnection matrix. A key to the present assignment architecture is that association costs are subtracted locally from the inputs of the PUs representing the assignments (Moopenn, Thakoor, & Duong, 1988). This tends to inhibit PUs with a high association cost while pushing PUs with low energy costs towards activation. Consequently, low-cost assignment configurations are favored by the k,m-WTA circuits. By storing the costs in this manner, the need for the fully-connected feedback matrix is obviated, and a processor can be designed with connections only along rows and columns (to implement the k,m-WTA) and a few global connections to all PUs. A block diagram of such a low-connectivity 4 x 4 assignment processor architecture is given in Figure 2. Each PU is paired with an association-cost memory (designated by "W") that provides inhibition proportional to stored cost. Along each row and column is a k,m-WTA that compares the sum of PU activations (measured via the "sum" line) to the programmed min and max (i.e., (k, m)) window limits.
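The row and column circuits just described sense activations over a single shared "sum" line. A discrete-time Python sketch of a plain winner-takes-all built the same way (one global inhibition signal, hence O(N) wiring; parameter values are illustrative, not the paper's circuit values):

```python
import math

def squash(x):
    # Saturating activation on (0, 1), cf. eqn (4) with T = 1.
    return 0.5 * (1.0 + math.tanh(x))

def winner_take_all(inputs, steps=400, gain=5.0, dt=0.05):
    """Each unit is excited by its own input and inhibited by one
    shared signal (the sum of all activations), so wiring grows as
    O(N) instead of the O(N^2) of pairwise mutual inhibition."""
    v = [0.0] * len(inputs)
    for _ in range(steps):
        inhibition = sum(v)  # the single shared "sum" line
        v = [vi + dt * (-vi + squash(gain * (xi - inhibition)))
             for vi, xi in zip(v, inputs)]
    return v
```

After settling, the unit with the largest input dominates while the others are suppressed, e.g. `winner_take_all([0.2, 0.9, 0.5])` leaves unit 1 well above the rest.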

FIGURE 2. Block diagram of the two-dimensional assignment processor. Association costs W are applied directly to PU inputs, as thresholds, and k,m-winner-take-all (k,m-WTA) circuits are used to oversee each row and column in order to implement blocking constraints. Blocking constraints, which can be independently programmed for each element, may be unit value (for one-to-one element association), multiple, or a range of values (so that a row element may be matched to 3-5 column elements, for example).

If the PU activation sum is outside of this blocking constraint window, the excitation level applied in tandem to all PUs in the group is increased or decreased as required to bring activations into compliance. Note that as long as the blocking constraints are satisfied, the excitation level returned to the PUs remains constant. Both the PU and k,m-WTA may be modeled as integrators, since analog circuitry implementing these functions is limited in bandwidth. However, while PU bandwidth should be large, to maximize computation speed, the k,m-WTA integration time constant must be explicitly limited to avoid the possibility of sustained oscillation. The PU may be modeled by

    τ_n dV_ij/dt = -V_ij + f(-α W_ij + β R_i + γ C_j + η S_ij)    (3)

where V_ij is the PU output, W_ij is the association cost, R_i and C_j are the excitation signals returned from the row and column k,m-WTAs, S_ij is a time-varying random noise value, τ_n is the time constant of the PU, and α, β, γ, and η are fixed gain factors. The nonlinear activation function f() is given by

    f(x) = 1/2 (1 + tanh(x/T))    (4)

where T is the temperature, or gain, of the activation function. The function f() is easily implemented in CMOS VLSI; differential metal-oxide semiconductor (MOS) transistor pairs naturally provide a very close approximation. A windowing function must be selected for the k,m-WTA. One simple form is

    g(sum, min, max) =
        max - sum,   if sum > max
        0,           if min <= sum <= max    (5)
        min - sum,   if min > sum

This function returns a negative value when the max window limit is exceeded, and a positive value when sum is less than the lower window bound. This function can be used as a basis for the k,m-WTA, which can be formulated as

    τ_w dR_i/dt = g(Σ_j V_ij, k_i, m_i)    (6)

for rows and

    τ_w dC_j/dt = g(Σ_i V_ij, k_j, m_j)    (7)

for columns, where τ_w is the time constant of the k,m-WTA. At this point, the association cost and blocking constraint requirements have been addressed, but a means must still be specified for performing annealing and forcing PUs into fully on or off configurations. While the WTA function is generally

designed to perform the latter function, our k,m-WTA implementation does not, since the PU activations are sensed over a single connection. Consequently, the k,m-WTA circuit cannot differentiate between a few fully-active or many partially-active PUs. Instead, the on/off constraint requirement can be satisfied by the two mean-field annealing mechanisms presented below. In hardware implementations, annealing is required not only to avoid local minima, but also to provide a mechanism for orderly settling of the network. That is, we avoid the need for switching circuitry to restrain the network from settling before all the costs and constraints have been programmed. Additionally, annealing allows explicit control over the settling schedule, thus making the settling process much less sensitive to variations in the time constants of different PU and k,m-WTA circuits. Smith and Portmann (1989) have shown that variations in delays of additive networks may render the networks unstable and oscillatory. As mentioned previously, mean-field annealing can be implemented by gradually decreasing PU temperature T (i.e., increasing gain or activation function slope; see eqn (4)). The assignment processor would operate as follows: Initially, with a small gain, most PUs would be partially on, with activation levels reflecting association costs and row and column k,m-WTA excitations. As gains are increased, PUs will increasingly tend to turn on or off, thus in a sense pruning the solution space, until only PUs competing for the best solutions remain partially active. As the gain is increased to very high levels, even these competing PUs will essentially turn on or off, resulting in a final solution decision. Also, as PUs increasingly activate or deactivate, the k,m-WTA blocking decisions become increasingly more precise, until at maximum gain the decision error is minimal.
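A discrete-time sketch of this relaxation (a forward-Euler step of eqn (3) with the activation of eqn (4); the paper's simulations actually use fourth-order Runge-Kutta, and all parameter values here are illustrative):

```python
import math

def pu_step(V, W, R, C, S, alpha, beta, gamma, eta, tau, dt, T):
    """One Euler step of tau dV/dt = -V + f(-alpha*W + beta*R
    + gamma*C + eta*S), with f(x) = (1 + tanh(x/T)) / 2.
    Lowering the temperature T over successive calls implements
    the variable-gain (mean-field) annealing described above."""
    drive = -alpha * W + beta * R + gamma * C + eta * S
    f = 0.5 * (1.0 + math.tanh(drive / T))
    return V + (dt / tau) * (-V + f)
```

At zero net drive the PU relaxes toward f(0) = 0.5, the center of the hypercube; a high association cost W pulls it toward the "off" state.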
The major disadvantage of this mean-field annealing implementation is that PUs partaking in competing near-optimal solution configurations will have their activation levels near or on the high-gain region of the PU threshold function, so that small levels of noise in the system may cause these PUs to change state. What is missing is a robust mechanism to ensure that state configurations remain locked in. Consequently, a modified form of mean-field annealing was developed for this application, which we call hysteretic annealing. Investigators have observed that incorporating fixed hysteresis into the PUs of an additive network can improve the settling dynamics. Energy function troughs corresponding to solutions are widened (Yanai & Sawada, 1990), and settling dynamics are less sensitive to hardware-imposed variations in PU time constants, which could otherwise lead to effects such as sustained oscillation (Smith & Portmann, 1989).

Hysteretic annealing is implemented by applying an increasing level of positive feedback from each PU's output back into its input. The feedback results in a hysteretic effect that pushes the PU towards the "on" state if its output is above a fixed threshold level, and towards the "off" state otherwise. This establishes an energy barrier that must be overcome if the PU is to change state. Once the barrier exceeds the maximum excitation possible at the PU input, no combination of input values can cause a PU state change. Thus, the relaxed state is guaranteed to be completely stable. If the PU gain is set to a fixed, low value, then the initial network state is near the center of the hypercube, and increasing the positive feedback forces each PU towards a hypercube corner.
Important additional advantages of hysteretic annealing are that by controlling the annealing schedule, a maximum settling time can be guaranteed, and that the tradeoff between settling time and goodness of solution can be explicitly controlled. These factors are important in many applications where a solution is required in a fixed time interval. As the settling time is forced to ever shorter intervals, however, it is expected that obtained solutions will be increasingly suboptimal.
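The lock-in effect can be illustrated with a toy model (our construction, not the paper's circuit): a single PU whose input includes a positive self-feedback term proportional to its distance from a 0.5 threshold. With zero feedback the unit relaxes to the center of its range; with strong feedback it latches to whichever side of the threshold it started on:

```python
import math

def relax(drive, feedback_gain, v0, steps=2000, dt=0.01, T=0.2):
    """Relax a single PU toward a fixed point. The term
    feedback_gain * (v - 0.5) is a toy stand-in for the hysteretic
    positive feedback; T is a fixed, low gain (temperature)."""
    v = v0
    for _ in range(steps):
        f = 0.5 * (1.0 + math.tanh((drive + feedback_gain * (v - 0.5)) / T))
        v += dt * (-v + f)
    return v
```

With `feedback_gain = 0` the unit drifts back to 0.5 regardless of its start, whereas with `feedback_gain = 2.0` it settles near 1 from a high start and near 0 from a low start: the bistability that locks solutions in.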

4. SIMULATION

To evaluate the performance of the proposed architecture as a function of problem size, a series of simulations were carried out with problem sizes ranging from 4 x 4 to 64 x 64. The processor was implemented by integrating the equations of motion for the PUs and k,m-WTA circuits (eqns (3)-(8)) using the fourth-order Runge-Kutta method (Press, Flannery, Teukolsky, & Vetterling, 1988). Both the variable-gain and hysteretic annealing techniques were simulated. One-to-one blocking constraints were used to allow comparison with the optimal solution, as determined by Munkres' algorithm (Munkres, 1957; Blackman, 1986).
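The integrator named here is the classical fourth-order Runge-Kutta method; a generic single-step sketch (standard textbook form, scalar state for brevity):

```python
def rk4_step(f, y, t, dt):
    """Classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + dt / 2.0, y + dt / 2.0 * k1)
    k3 = f(t + dt / 2.0, y + dt / 2.0 * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

Applied to the test equation dy/dt = -y with dt = 0.1, ten steps reproduce exp(-1) to within about 1e-6, which is why a relatively coarse time increment suffices for the PU dynamics.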

C‘ot~~petitive Ncuml

4.37

Architecture

was used for the vari-

An exponential schedule able-gain annealing: 7‘(t) = 7‘exp where r,, controlled quadratic schedule nealing:

(-t/T<,),

(9)

the speed of annealing. and a was used for the hysteretic an-

m(t)

=

(t/T<,)‘.

(I(J)
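The two schedules of eqns (9) and (10) in code (schedule parameters such as τ_a are supplied by the caller; see Table 1 for the values used in the simulations):

```python
import math

def variable_gain_temperature(t, T0, tau_a):
    """Exponential gain-annealing schedule, eqn (9): T(t) = T0 exp(-t/tau_a).
    The PU gain is 1/T, so the gain grows exponentially over time."""
    return T0 * math.exp(-t / tau_a)

def hysteretic_feedback_level(t, tau_a):
    """Quadratic hysteretic-annealing schedule, eqn (10): the feedback
    level, and hence the energy barrier, builds up slowly at first."""
    return (t / tau_a) ** 2
```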

The quadratic schedule was chosen so that the hysteretic energy barrier would build up slowly in initial annealing stages. In order to limit the bandwidth of the annealing noise, and to increase computation speed, noise values S_ij were only updated every 10 simulation passes. Ten trials were run for each of 16 problem sizes ranging from 4 x 4 to 64 x 64. Association cost matrices were generated randomly, with costs distributed uniformly over the interval [0.00, 10.00]. The same cost matrices were used to evaluate both annealing methods. Values of the multiplicative parameters, gains, and time constants used are given in Table 1. For hysteretic annealing, one set of parameters served for all problem sizes, whereas for variable-gain annealing, parameters were much more difficult to select, and two sets were required. Network settling times for variable-gain annealing trials are shown in Figure 3a, and for hysteretic annealing in Figure 3b. In both cases, the maximum observed settling time was about 175 μs, with much shorter times for most of the smaller problems. The differences in settling times and large variance of the variable-gain results reflect the differences in annealing mechanisms and schedules. With an exponential schedule, the variable-gain system quickly

evolved to relatively high gains, allowing the system in some cases to settle to a solution much more quickly than possible with the quadratic hysteretic schedule and mechanism. On the other hand, in cases where multiple near-optimal solutions existed, the variable-gain system tended to oscillate between these solutions, extending settling times. In contrast, the hysteretic annealing results are much more consistent, because once the hysteretic energy barrier overcame the capability of the k,m-WTAs to change PU states, convergence was forced. In a number of cases, the variable-gain simulations did not settle. The simulation was halted whenever any PU changed state 200 times, with the assumption that the system was oscillating in a sustained manner. It was found that in each of these oscillatory trials, the system would alternate between the optimal solution and one or more additional solutions with costs very close to optimal. When the annealing schedule was extended, however, many of the oscillatory trials converged. For the 64 x 64 problem, for example, extending the annealing time constant from 40 to 100 μs reduced the number of oscillatory trials from seven to three. The quality of the solutions is given in Table 2 as a percentage of the optimal solution. In all variable-gain trials that converged, the simulation found the optimal solution. For hysteretic annealing, a few suboptimal solutions were found, mostly for the larger problems. However, the worst solution was a mere 0.51% from optimal. These remarkable results suggest not only that this architecture can be expected to produce near-optimal results for problems to 64 x 64, but that good solutions might be expected from architectures significantly larger than 64 x 64.

TABLE 1
Circuit Parameters Used for Variable-Gain and Hysteretic Annealing Simulation Runs. For Variable Gain, Two Sets Were Used, One for Smaller Matrix Sizes and One for the Larger Sizes

                                           Variable-Gain Annealing
Parameter                                  (<=16 x 16)   (larger)     Hysteretic Annealing
Neuron time constant (τ_n)                 1             1            1
k,m-WTA time constant (τ_w)                5             10           10
Annealing time constant (τ_a)              10            40           40
Cost gain (α)                              20            40           40
Row gain (β)                               35            85           85
Column gain (γ)                            35.2          85.2         85.2
Noise level (η)                            0.2           0.2          0.2
Initial temperature (T_0)                  6             3            5
Time increment (μs)                        0.025         0.2          0.2
Maximum neuron reversals for oscillation   200           200          200
Neuron settling criteria: minimum          0.1           0.2          0.2
Neuron settling criteria: maximum          0.9           0.8          0.8

TABLE 2
Goodness of Solutions From Simulations (Results are in Percent Optimal; "n osc." marks trials halted as oscillatory)

              Mean-Field Annealing          Hysteretic Annealing
Size          Mean              Worst       Mean      Worst
4 x 4         100.00            100.00      100.00    100.00
8 x 8         100.00            100.00      100.00    100.00
12 x 12       100.00            100.00      100.00    100.00
16 x 16       100.00            100.00      100.00    100.00
20 x 20       100.00            100.00      100.00    100.00
24 x 24       100.00            100.00      100.00    100.00
28 x 28       100.00            100.00      100.00    100.00
32 x 32       100.00            100.00      100.00    100.00
36 x 36       100.00 (1 osc.)   100.00      100.00    100.00
40 x 40       100.00 (1 osc.)   100.00      100.00    99.93
44 x 44       100.00 (3 osc.)   100.00      100.00    100.00
48 x 48       100.00 (1 osc.)   100.00      99.99     100.00
52 x 52       100.00 (2 osc.)   100.00      100.00    100.00
56 x 56       100.00 (3 osc.)   100.00      99.89     99.49
60 x 60       100.00 (3 osc.)   100.00      99.92     99.53
64 x 64       100.00 (7 osc.)   100.00      99.92     99.56
Overall Mean  100.00            100.00      99.98     99.90

FIGURE 3. Simulated settling times for (A) variable-gain annealing, and (B) hysteretic annealing. Ten trials were run for each problem size, with uniformly random cost matrices. The three curves are minimum, average, and maximum settling times, based on a 1.0 μs PU time constant.

5. HARDWARE IMPLEMENTATION

In order to implement the assignment processor in analog CMOS VLSI technology, the assignment costs, as well as the blocking-constraint window values, must be programmed explicitly as analog values. Thus, one analog memory circuit is required for each PU, and two per k,m-WTA. Only a few techniques have been developed for storing analog weights using standard CMOS VLSI technology, including floating-gate devices (Kerns, Tanner, Sivilotti, & Luo, 1991), digital memory with digital-to-analog conversion (Moopenn et al., 1990), and on-chip capacitors that are charged to a potential corresponding to the desired cost (Eberhardt, Duong, & Thakoor, 1989). The last approach gives the best tradeoff between memory size, programming time, and compatibility with standard CMOS technology.

At room temperature, CMOS capacitor discharge rates are on the order of tens of millivolts per second (Kub, Moon, & Modolo, 1991; Eberhardt, Duong, & Thakoor, 1989b) for capacitor values under a picofarad. Assuming maximum annealing schedules of hundreds of microseconds, it is anticipated that such memory storage capacitors will retain sufficient charge to carry out the complete assignment problem without refreshing the charges.

A complete circuit diagram of the PU is given in Figure 4. Block 1 comprises an extension of the dual difference amplifier (Säckinger & Guggenbühl, 1987) for scaling and summing the four input terms:

V_sum = -θ W_ij + β R_i + γ C_j + α V_o,

where W_ij is the potential stored on the capacitors of the sample-and-hold in Block 2, R_i and C_j are the excitation voltages returned from the row and column k,m-WTA circuits, V_o is the output voltage from the PU, α is varied over time to perform annealing, and θ, β, and γ are fixed gain terms. Scaling is achieved by adjusting the bias currents through the differential pairs ROW, COL, HYST, and COST. Source degeneration transistors Q1-Q10 are used to extend the dynamic range of their associated inputs by reducing the transconductance of the differential pairs. To maximize capacitance, the association-cost storage capacitor in Block 2 is implemented as a three-layer device, with a double-polysilicon capacitor overlying diffusion to form a transistor. Also in Block 2 is a NOR select gate operating from row (ROW) and column (COL) select lines. Each of the PUs may be individually (and serially) accessed to
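In software, Block 1 and Block 3 together amount to a scaled sum pushed through a variable-gain sigmoid. A behavioral sketch (the tanh nonlinearity and the sign conventions, cost inhibitory, excitations and feedback excitatory, are assumptions about the circuit, not measured characteristics):

```python
import math

def pu_output(W_ij, R_i, C_j, V_prev,
              theta=1.0, beta=1.0, gamma=1.0, alpha=0.0, gain=5.0):
    # Block 1: scale and sum the four inputs (cost W_ij, row and column
    # excitations R_i and C_j, and the hysteretic feedback term alpha*V_prev).
    v_sum = -theta * W_ij + beta * R_i + gamma * C_j + alpha * V_prev
    # Block 3: sigmoidal limiting with an externally controlled gain
    # (GAIN CONTROL); raising `gain` over time gives mean-field annealing.
    return math.tanh(gain * v_sum)
```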


FIGURE 4. CMOS analog VLSI circuit schematic of the PU. Inputs from the k,m-WTA circuits (ROW EXCIT, COL EXCIT) and the cost memory sample-and-hold (in Block 2) are scaled and summed (Block 1). A variable-gain sigmoidal characteristic is achieved by attenuating the summer output voltage using a follower circuit (Block 3), with follower drive capability controlling the gain (and variable-gain annealing). The output circuit (Block 4) converts the result into currents that are applied to row and column sum lines, as well as tristate global output line CHIP OUT.

the association cost memory and to read out the activation state of the PU over the global line CHIP OUT. The output current from the adder circuit is applied to a transconductance circuit in Block 3. By adjusting the quiescent bias current set by GAIN CONTROL, the effective resistance seen by the adder can be varied over a range of about 30:1. The output of this stage is the voltage at the summation node SUM, which is further shaped into a sigmoidal form by the limiting transistors Q11 and Q12. The GAIN CONTROL thus sets the PU gain, and may be used for mean-field annealing. Finally, the SUM voltage is converted into a current by the differential pair Q19 and Q20 in Block 4. Mirrored copies of the resulting single-ended current are applied to the row and column sense lines connected to the k,m-WTA circuits, and switched onto global output line CHIP OUT whenever the cell is selected. An early version of the PU cell without the hysteretic feedback loop was fabricated through the Metal Oxide Semiconductor Implementation Service (MOSIS) using a double-poly 2-micron n-well process. Figure 5 gives the measured output currents when the COST input was swept through a 4 V range

and the GAIN CONTROL potential was stepped such that the current supplied by Q21 varied from 1.0 to 80 microamperes. The gain is indeed variable over a 30:1 range, and the sigmoidal activation function is monotonic, with some gain increase towards the rails. Also, at low gains the current level

FIGURE 5. Measured input-output characteristic of the PU circuit. Output current is displayed as a function of input voltage, for different gain settings. A gain variation of 30:1 was achieved.

at the rails is somewhat reduced; this should not affect the settling dynamics with variable-gain annealing, because only as the PUs approach the hypercube corners, at higher gains, is high precision required (for correct blocking enforcement). With hysteretic annealing, where a fixed, low gain setting is required, the k,m-WTA window limits need to be adjusted to compensate for the lowered rails (corresponding to hypercube "shrinkage").

The k,m-WTA circuit schematic is shown in Figure 6. As in the PU circuit, Block 2 comprises cell select and sample-and-hold and capacitor memory circuits, in this case with a pair of memories for storing the constraint window limits. The pair will be programmed in parallel from off-chip digital-to-analog converters. For simplicity, row and column cell select address lines can be interleaved with those accessing the PU cells. The window limits are compared with the summed PU activations by the circuit in Block 1. The PU activation sum is mirrored by Q1 and Q2 and compared with currents derived from the window limit memory potentials by differential pairs D1 and D2. The high-gain current comparison at I_min and I_max results in digital control voltages that specify whether the input sum is above, within, or below the constraint window limits. Finally, in Block 3, the excitation level at the output ROW/COL EXCIT is increased or decreased by the charge/discharge circuit Q3-Q6 when the PU sum is below or above the window limits. The charge and discharge currents are approximately equal, and are fully programmable by external bias line CHARGE RATE, which sets τ_k. Note that this circuit does not implement the shape of the windowing function given in eq. (5). Instead of returning the difference between the window limit and sum, the new function in effect returns a programmable constant whenever the k,m constraints are violated, with sign depending on the direction of violation. Consequently, the rate of change in k,m-WTA excitation dE/dt (eqns. (6) and (7)) will be constant whenever the PU sum is outside the k,m limits. This deviation from the simulated model was undertaken because the modified model could be implemented with a cleaner and simpler design. Operation of the k,m-WTA circuit was characterized by applying the output to a virtual-ground op-amp circuit so as to measure the charge and discharge currents. From the graph in Figure 7, it can be seen that the window limits were programmed to correspond to input summation currents of about 12 and 23 microamperes.
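The modified windowing behavior, a constant-rate charge or discharge whenever the PU sum violates the window, can be sketched as a single explicit-Euler step (the names and the discrete-time form are illustrative):

```python
def wta_excitation_step(E, pu_sum, lo, hi, rate, dt):
    # Bang-bang update: instead of eq. (5)'s difference term, the circuit
    # drives the excitation at a constant CHARGE RATE whenever the summed
    # PU activation is outside the [lo, hi] constraint window.
    if pu_sum < lo:
        return E + rate * dt   # too little activation: raise excitation
    if pu_sum > hi:
        return E - rate * dt   # too much activation: lower excitation
    return E                   # within the window: hold
```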

FIGURE 6. CMOS circuit schematic of the k,m-WTA: cell select, sample-and-hold, and window-limit capacitor memories (Block 2); voltage-to-current conversion and comparator (Block 1); digital logic and charge/discharge circuit (Block 3).

FIGURE 7. Characterization of the k,m-WTA circuit. The capacitor C_exc charge/discharge current is shown as a function of input ROW/COL SUM current, for different CHARGE RATE currents. The constraint window is clearly evident.

The three plots correspond to CHARGE RATE currents of about 10, 20, and 30 microamperes. The nonzero offset currents within the constraint window were an artifact of the measurement.

6. CONCLUSIONS

In this paper, we presented simulation results and VLSI implementation details of a novel neural-based competitive architecture for solving assignment problems. Instead of programming the association costs, row and column blocking constraints, and PU activation requirements into an additive network interconnection matrix, these constraints were independently implemented with limited connectivity. Association costs were summed directly into PU inputs, the blocking constraints were programmed using k,m-WTA "overseer" circuits (one for each row and each column), and the requirement that each PU turn fully on or off was combined with a mechanism for simulated annealing, again in a manner local to each PU. The purpose of these deviations from the additive model was to reduce connectivity from O(M²N²) to O(MN). In circuit implementation terms, the architecture was reduced from three dimensions to two, making possible an implementation in the two-dimensional VLSI medium.

Two methods of mean-field simulated annealing were considered. The first requires that the gain, or activation function slope, of the PUs is gradually increased to force the system to come to a solution in which all PUs are fully on or off, and to regulate the settling process so as to avoid poor solutions. In simulations, this method resulted in sustained oscillations in about 15% of the simulation trials. In these cases, the system could not discriminate between optimal and multiple near-optimal solutions. By increasing the annealing time, however, the number of oscillatory trials could be reduced. In all cases where the network settled, the optimal solution was found. Settling times varied considerably, and were longer whenever multiple very good solutions existed. Based on a conservative PU time constant of 1 microsecond, maximum settling times for nonoscillatory trials were 175 microseconds.

The other annealing method was developed to overcome the possibility of sustained oscillation, and to guarantee a maximum settling time. This method, hysteretic annealing, used positive feedback from each PU's output to its input to implement an increasing hysteretic energy barrier that must be overcome to change the state of the PUs. If the barrier is made sufficiently large, the state of each PU is locked, and oscillation is impossible. Furthermore, because the barrier is externally controlled via the annealing schedule, the tradeoff between goodness of solution and settling time could be explicitly adjusted. This annealing method was extremely successful, with the optimal solution found in almost all simulation trials. The worst solution found was 99.49% of optimal. As expected, better solutions were obtained with longer annealing schedules.

In comparing both annealing methods, it was found that the selection of system parameters (i.e., time constants and gain values) was quite critical for the variable-gain annealing, while the hysteretic annealing network was insensitive to the values selected. This implies that much larger network sizes could be implemented using hysteretic annealing, before the resolving limit of the network is reached. It should also be noted that a combination of both annealing mechanisms might result in better solutions than either method alone.

Of primary importance in developing this architecture was the requirement that it be readily implementable in silicon. As an initial proof of concept, individual cells implementing the PU and k,m-WTA circuits were designed and fabricated in CMOS VLSI. Characterization of the circuits showed that they functioned as specified.
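The locking effect of the hysteretic barrier can be seen in a one-PU toy model: iterate the feedback map v ← tanh(gain·(x + αv)) to a fixed point. With strong positive feedback, the settled state depends on the initial state even for zero input, i.e., the PU's state is locked (a sketch; parameter values are illustrative):

```python
import math

def settle(x, alpha, v0, gain=2.0, steps=200):
    # Fixed-point iteration of a single PU with hysteretic feedback
    # strength alpha; x is the net external input.
    v = v0
    for _ in range(steps):
        v = math.tanh(gain * (x + alpha * v))
    return v
```

With alpha = 0 the PU relaxes to v = 0 for x = 0; with alpha = 1 it stays near whichever rail it started closest to, which is the oscillation-suppressing barrier described above.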
Initial circuit cell sizes were 150 microns square, allowing systems at least 30 × 40 in size to be implemented using standard MOSIS die sizes. With wafer-scale integration, it should be possible to fabricate much larger systems, since the processor is intrinsically fault-tolerant in that nonfunctional rows and columns may simply be omitted.

REFERENCES

Barhen, J., Toomarian, N., & Protopopescu, V. (1987). Optimization of the computational load of a hypercube supercomputer onboard a mobile robot. Applied Optics, 5007-5011.

Bilbro, G., Mann, R., Miller, T. K., Snyder, W. E., Van den Bout, D. E., & White, M. (1989). Optimization by mean field annealing. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems 1 (pp. 91-98). Palo Alto, CA: Morgan Kaufmann.

Blackman, S. S. (1986). Multiple-target tracking with radar applications. Norwood, MA: Artech House.

Brown, T. X, & Liu, K. H. (1990). Neural network design of a banyan network controller. IEEE Journal on Selected Areas in Communications, 8(8), 1428-1438.

Brown, T. X (1991). Neural network design for switching network control. Ph.D. thesis, Department of Electrical Engineering, California Institute of Technology, Pasadena, CA.

Carpenter, G., & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54-115.

Cohen, M. A., & Grossberg, S. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 815-826.

Eberhardt, S. P., Duong, T., & Thakoor, A. P. (1989). Design of parallel hardware neural network systems from custom analog VLSI "building-block" chips. Proceedings of the International Joint Conference on Neural Networks, June 18-22, 1989, Washington, D.C. (Vol. II, pp. 183-190). Piscataway, NJ: IEEE.

Eberhardt, S. P., Duong, T., & Thakoor, A. P. (1989b). A VLSI synapse "building-block" chip for hardware neural network implementations. Proceedings of the Third Annual Parallel Processing Symposium, March 29-31, Fullerton, CA (Vol. I, pp. 257-267). Fullerton, CA: IEEE Orange County Computer Society.

Fang, L., & Li, T. (1990). Design of competition-based neural networks for combinatorial optimization. International Journal of Neural Systems, 1(3), 221-235.

Fang, L., & Li, T. (1990b). Neural networks for generalized assignment. Proceedings of the Second IASTED International Symposium on Expert Systems and Neural Networks, August 15-17, Hawaii (pp. 78-80). Anaheim, CA: ACTA Press.

Foo, Y. S., & Takefuji, Y. (1988). Stochastic neural networks for solving job shop scheduling: Part 1. Problem representation & Part 2. Architecture and simulations. Proceedings of the IEEE International Conference on Neural Networks, July 24-27, 1988, San Diego (Vol. II, pp. 275-290). Piscataway, NJ: IEEE.

Goldstein, M., Toomarian, N., & Barhen, J. (1988). A comparison study of optimization methods for the bipartite matching problem (BMP). Proceedings of the International Conference on Neural Networks, July 24-27, San Diego (Vol. II, pp. 267-274). Piscataway, NJ: IEEE.

Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences USA, 81, 3088-3092.

Hopfield, J. J., & Tank, D. W. (1985). "Neural" computation of decisions in optimization problems. Biological Cybernetics, 52, 141-152.

Kerns, D. A., Tanner, J. E., Sivilotti, M. A., & Luo, J. (1991). CMOS UV-writable non-volatile analog storage. Proceedings of Advanced Research in VLSI: International Conference 1991, Santa Cruz, CA. Cambridge, MA: MIT Press.

Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220, 671-680.

Kub, F. J., Moon, K. K., & Modolo, J. A. (1991).

Lazzaro, J., Ryckebusch, S., Mahowald, M. A., & Mead, C. A. (1989). Winner-take-all networks of O(N) complexity. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems 1 (pp. 703-711). Palo Alto, CA: Morgan Kaufmann.

Li, T., & Fang, L. (1990). A comparative study of competition-based and mean field networks using quadratic assignment. Proceedings of the Second IASTED International Symposium on Expert Systems and Neural Networks, August 15-17, Hawaii (pp. 23-26). Anaheim, CA: ACTA Press.

Majani, E., Erlanson, R., & Abu-Mostafa, Y. (1989). On the k-winners-take-all network. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems 1 (pp. 634-642). Palo Alto, CA: Morgan Kaufmann.

Moopenn, A., Duong, T., & Thakoor, A. P. (1990). Digital-analog hybrid synapse chips for electronic neural networks. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems 2 (pp. 769-776). Palo Alto, CA: Morgan Kaufmann.

Moopenn, A., & Thakoor, A. (1989). Programmable synaptic devices for electronic neural nets. Proceedings of the Fifth IASTED International Symposium on Expert Systems and Neural Networks: Theory and Applications, August 16-18, Honolulu (pp. 36-40). Anaheim, CA: ACTA Press.

Moopenn, A., Thakoor, A. P., & Duong, T. (1988). A neural network for euclidean distance minimization. Proceedings of the IEEE International Conference on Neural Networks, July 24-27, San Diego (Vol. II). Piscataway, NJ: IEEE.

Munkres, J. (1957). Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 5(1), 32-38.

Peng, Y., & Reggia, J. A. (1989). A connectionist model for diagnostic problem solving. IEEE Transactions on Systems, Man, and Cybernetics, 19(2), 285-298.

Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. (1988). Numerical recipes in C. Cambridge: Cambridge University Press.

Säckinger, E., & Guggenbühl, W. (1987). A versatile building block: The CMOS differential difference amplifier. IEEE Journal of Solid-State Circuits, SC-22, 287-294.

Smith, M. J. S., & Portmann, C. L. (1989). Practical design and analysis of a simple "neural" optimization circuit. IEEE Transactions on Circuits and Systems, 36(1), 42-50.

Tagliarini, G., & Page, E. (1988). A neural-network solution to the concentrator assignment problem. In D. Anderson (Ed.), Neural Information Processing Systems. New York: American Institute of Physics.

Yanai, H., & Sawada, Y. (1990). Associative memory network composed of neurons with hysteretic property. Neural Networks, 3(2), 223-228.