Asymmetric Mean-Field Neural Networks for Multiprocessor Scheduling

Neural Networks, Vol. 5, pp. 671-686, 1992. Printed in the USA. All rights reserved. 0893-6080/92 $5.00 + .00. Copyright © 1992 Pergamon Press Ltd.

ORIGINAL CONTRIBUTION

Asymmetric Mean-Field Neural Networks for Multiprocessor Scheduling

BENJAMIN J. HELLSTROM AND LAVEEN N. KANAL
University of Maryland

(Received 3 July 1990; revised and accepted 16 January 1992)

Abstract—Hopfield and Tank's proposed technique for embedding optimization problems, such as the travelling salesman, in mean-field thermodynamic networks suffers from several restrictions. In particular, each discrete optimization problem must be reduced to the minimization of a 0-1 Hamiltonian. Hopfield and Tank's technique yields fully-connected networks of functionally homogeneous visible units with low-order symmetric connections. We present a program-constructive approach to embedding difficult problems in neural networks. Our derivation method overcomes the Hamiltonian reducibility requirement and promotes networks with functionally heterogeneous hidden units and asymmetric connections of both low and high order. The underlying mechanism involves the decomposition of arbitrary problem energy gradients into piecewise linear functions which can be modeled as the outputs of small groups of hidden units. To illustrate our method, we derive thermodynamic mean-field neural networks for multiprocessor scheduling. Simulation of tuned networks of up to 2,400 units yields very good, and often, exact solutions.

Keywords—Multiprocessor scheduling, Mean-field theory networks, Asymmetric neural networks, Energy-gradient decomposition, Combinatorial optimization.

1. INTRODUCTION

Combinatorial optimization ranks among the first applications of modern neural networks. Hopfield and Tank (1985) embedded the objective function of the travelling salesman optimization problem (TSP) in a mean-field thermodynamic neural network. Although the importance of Hopfield and Tank's work cannot be disputed, their method of embedding combinatorial optimization problems in analog neural networks suffers from several shortcomings. A major restriction on the applicability of the technique is that each discrete optimization problem must be reduced to the minimization of a 0-1 Hamiltonian of the form

E(s) = −(1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} s_i s_j w_ij + Σ_{i=1}^{n} s_i θ_i,    (1)

where s = (s₁, s₂, …, s_n) are state variables which are represented as the activation values of a collection of n neuron-like units, and w_ij and θ_i are constants which are referred to as the connection weight between units i and j and the threshold of unit i, respectively. A sufficient but not necessary condition for network stability is that w be symmetric (i.e., w_ij = w_ji). The technique yields fully connected networks of functionally homogeneous visible¹ units with low-order symmetric connections.

¹ The terms visible and hidden units arise in the literature on pattern recognition with neural networks. The states of visible units are externally measured in order to ascertain the "network solution." Hidden units serve to support visible units by detecting useful features. A set of units is functionally heterogeneous if qualitatively different features are detected by members of the set.

Theoretically, all NP-hard optimization problems can be reduced to the travelling salesman problem (and consequently, to 0-1 Hamiltonian minimization problems). However, direct reductions have been found for only structurally simple problems. These include map and graph coloring (Dahl, 1988), graph partitioning, graph K-partitioning, vertex cover, maximum independent set, maximum clique (Ramanujam & Sadayappan, 1988), clique partition, and max cut (Aarts & Korst, 1989). With the exception of the TSP, which has been extensively studied (Aarts & Korst, 1989; Aarts & Korst, 1987; Gutzmann, 1988; Hedge, Sweet, & Levy, 1988; Szu, 1988; Wilson & Pawley, 1988), and the map and graph coloring networks (Dahl, 1988), simulation results are generally not cited. Typically, the particular optimization problem is deemed "solved" solely by reduction to 0-1 Hamiltonian minimization. Recently, we showed (Hellstrom & Kanal, 1990) that embedding the NP-hard knapsack optimization problem in neural networks required the use of functionally heterogeneous units. Even under primitive fixed exponential annealing schedules, knapsack-packing networks were found to yield results that were no worse than those of a well-known greedy heuristic.

In this paper, we present a program-constructive approach to embedding difficult discrete optimization problems in analog neural networks. A simple program is defined for gradient-directed descent of a non-Hamiltonian energy which embeds the objective and constraints of a multiprocessor scheduling problem. This program describes the non-neuron-like behavior of a collection of binary-state processing elements. A series of stability-preserving program transformations is applied (hence the program-constructive approach). The resulting transformed programs correspond to non-neuron-like analog networks, and analog neural networks. Our approach has several advantages over the existing method. In particular, the requirements for functionally homogeneous visible units and 0-1 Hamiltonian reducibility of the objective function are circumvented. The process of creating a neural network from an arbitrary discrete optimization problem is broken into a sequence of manageable steps. The results of extensive simulation under a variety of commonly-used operational environments are presented.

L. N. Kanal would like to thank NSF for Grants DCR-8504011 and IRI-8802419. Requests for reprints should be sent to Benjamin Hellstrom, 2929 Pinewick Road, Ellicott City, MD 21042.
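For concreteness, the 0-1 Hamiltonian of eqn (1) can be evaluated directly at a corner of the state space. A minimal sketch (the function name and the tiny weight matrix are ours, chosen purely for illustration):

```python
def hamiltonian(s, w, theta):
    """E(s) = -1/2 sum_ij s_i s_j w_ij + sum_i s_i theta_i  (eqn 1)."""
    n = len(s)
    quad = sum(s[i] * s[j] * w[i][j] for i in range(n) for j in range(n))
    return -0.5 * quad + sum(s[i] * theta[i] for i in range(n))

w = [[0, 2], [2, 0]]      # symmetric weights: w_ij = w_ji
theta = [1, 1]
print(hamiltonian([1, 1], w, theta))   # -0.5*4 + 2 = 0.0
```

Minimizing this quantity over the 0-1 states s is exactly the reduction target that the technique requires of every discrete optimization problem.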

2. MULTIPROCESSOR SCHEDULING

The multiprocessor scheduling decision problem is given by (Garey & Johnson, 1979):

INSTANCE: A finite set T of "tasks," a "length" l(t) ∈ Z⁺ for each t ∈ T, a number m ∈ Z⁺ of "processors," and a "deadline" D ∈ Z⁺.

QUESTION: Is there a partition, T = T₁ ∪ T₂ ∪ … ∪ T_m, of T into m disjoint sets such that max_{1≤j≤m} Σ_{t∈T_j} l(t) ≤ D?

The problem of packing task execution times into a bounded time period for a single processor is similar to the knapsack problem. Indeed, the knapsack problem is a restriction of the multiprocessor scheduling problem. For the large, difficult problems that we intend to address, the "density" of exact solutions in the problem variable space is vanishingly small. Consequently, it is unlikely that our network simulation algorithms will reliably find exact solutions to these decision problems in polynomial time, and any performance measure based on the frequency of exact solutions will lack descriptive range. This motivates us to seek a richer performance measure. We use average and maximum processor overload.²

² "Processor overload," or simply "overload," is defined as the sum tardiness of all tasks scheduled on a particular processor.

3. NETWORK DERIVATION

3.1. Representation and State-Space Mapping

We define T = {t₁, t₂, …, t_r}, r = |T|, the number of tasks, and n = r·m. The processing time required by task t_i is given by l_i, for t_i ∈ T, and the global deadline is D. We choose a mapping, M, from the space of all m-partitions, (T₁, T₂, …, T_m), of T to the corners of an n-dimensional unit hypercube, Sⁿ:

M(T₁, T₂, …, T_m) = [s_ij]_{r×m}

where

s_ij = { 1 if t_i ∈ T_j;  0 if t_i ∉ T_j },

for 1 ≤ i ≤ r, 1 ≤ j ≤ m. We first derive an r × m rectangular network, MS1, of binary-state processing units with non-neural behaviors. Unit s_ij,³ by adopting a state value of 1, represents the allocation of task t_i to processor j. It remains for us to derive the function of each unit, the connectivity of the network, and the transformation to a larger network of neuron-like units. Once we have completed the derivation, problem instances can then be mapped to networks and relaxed under some, yet to be defined, operational environment. We define the inverse mapping M⁻¹(s) = (T₁, T₂, …, T_m), where s = [s_ij]_{r×m}, T_j = {t_i : s_ij > χ}, and χ is a constant threshold, with 1 ≤ i ≤ r, 1 ≤ j ≤ m. M⁻¹ gives a multiprocessor allocation that is a vector of m sets of tasks comprising a partition of T.

³ In future discussions, we will often use the same variable to represent both the name of a unit as well as its activation value.

3.2. Energy Function

Let

E = E_A + E_C,    (2)

where

E_A = A Σ_{j=1}^{m} max{ (Σ_{i=1}^{r} s_ij l_i) − D, 0 }    (3)

and

E_C = C Σ_{i=1}^{r} l_i ( (Σ_{j=1}^{m} s_ij) − 1 )².    (4)

E_A represents the total overloading of processors. This can easily be seen, as Σ_{i=1}^{r} s_ij l_i is the total processing time for processor j, and max{(Σ_{i=1}^{r} s_ij l_i) − D, 0} is the overload of processor j. Since the network seeks to minimize E, configurations that represent allocations that exceed the capacity of any processor are avoided. E_C represents a syntax constraint. If only E_A is present, an energy minimum of 0 is achieved when s_ij = 0 for all 1 ≤ i ≤ r and 1 ≤ j ≤ m. E_C is minimized when each task is allocated to exactly one processor, that is, only one unit in each row of the network has an activation value of 1 and the rest are 0. This is similar to an energy term proposed by Szu (1988) for constraining network configurations for the TSP. An important difference between our syntax term of eqn (4) and that of Szu is the inclusion of the factor l_i. Consider large and small tasks, t_L and t_S (i.e., l_{t_L} ≫ l_{t_S}). In the absence of the l_i factor, the cost of failing to allocate task t_L is C. The cost of failing to allocate t_S is also C. However, the benefit, in terms of E_A, of failing to allocate t_L is potentially −A·l_{t_L}, which is much better than −A·l_{t_S}. By discarding the larger task, the load on all processors can be substantially reduced, thereby allowing unallocated tasks to be packed with ease.⁴

The inclusion of l_i in (4) has several consequences. During annealing, gross features of the energy landscape are a primary consideration at higher temperatures. If A ≫ C then the satisfaction of the overloading restraint embedded by (3) yields large improvements (i.e., smaller values) of E. Violation of the overloading restraint yields large energy penalties. Since large improvements (or penalties) correspond to gross features of the energy landscape, and these gross features affect the configuration trajectory at higher temperatures (i.e., early in annealing), the selection of A ≫ C gives the satisfaction of the overloading restraint temporal precedence over the satisfaction of the syntax constraint. By weighting the syntax term of each task by the task's size, we give temporal precedence to the allocation of larger tasks over smaller ones. This results in a natural "fit-decreasing" packing of the tasks. More importantly, the energy penalty due to violating the syntax constraint becomes critical (or "competitive") at the same temperature as the energy penalty due to an overload of equal size.

The gradient of E is defined at the corners of the hypercube by

ΔE/Δs_xy = ΔE_A/Δs_xy + ΔE_C/Δs_xy,    (5)

where

ΔE_A/Δs_xy =
    { 0                                  if Σ_{i≠x} s_iy l_i + l_x ≤ D
    { A·l_x                              if Σ_{i≠x} s_iy l_i > D            (6)
    { A·( l_x − D + Σ_{i≠x} s_iy l_i )   otherwise

and

ΔE_C/Δs_xy = C l_x ( 2 Σ_{j≠y} s_xj − 1 ),    (7)

for all 1 ≤ x ≤ r and 1 ≤ y ≤ m.
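As a numerical check on the energy function of Section 3.2, the sketch below evaluates E = E_A + E_C at corners of the hypercube. It assumes our reading of eqn (4), with the size factor l_i included in the syntax term; the variable names and the toy instance are ours:

```python
import numpy as np

def energy(s, l, D, A=1.0, C=1.0):
    """E = E_A + E_C for a 0/1 allocation matrix s of shape (r, m).

    E_A (eqn 3): A * sum_j max(sum_i s_ij * l_i - D, 0).
    E_C (eqn 4, our reading): C * sum_i l_i * (sum_j s_ij - 1)^2.
    """
    load = s.T @ l                                  # per-processor load, shape (m,)
    E_A = A * np.maximum(load - D, 0.0).sum()
    E_C = C * (l * (s.sum(axis=1) - 1.0) ** 2).sum()
    return E_A + E_C

# Two tasks of sizes 3 and 4, two processors, deadline D = 5.
l = np.array([3.0, 4.0])
s = np.array([[1, 0],
              [0, 1]])                              # each task on its own processor
print(energy(s, l, D=5))                            # feasible allocation: energy 0.0
s_bad = np.array([[1, 0],
                  [1, 0]])                          # both tasks on processor 1
print(energy(s_bad, l, D=5))                        # overload of 2 -> E_A = 2.0
```

Flipping a single s_xy in this function and differencing reproduces the corner gradient of eqns (5)-(7).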
3.3. Binary-State, Non-Neural Computational Network

The discrete gradient is used to define the functions of the non-neural, binary-state units of network MS1 as follows:

1.1   q_xy ← Σ_{j≠y} s_xj;   p_xy ← Σ_{i≠x} s_iy l_i;
2.1   ΔE ← C(2q_xy − 1);
3.1   if p_xy > D then
4.1       ΔE ← ΔE + A l_x
5.1   else if p_xy + l_x > D then
6.1       ΔE ← ΔE + A·(p_xy + l_x − D);
7.1   if ΔE < 0 then
8.1       s_xy ← 1
9.1   else
10.1      s_xy ← 0;

We use T to denote the computational temperature. By replacing line 7.1 with

7.2   if U(0, 1) < 1/(1 + e^{ΔE/T}) then

where U(x₁, x₂) is a random number drawn from a continuous uniform (x₁, x₂) distribution, we employ stochastic smoothing in order to avoid shallow local minima. It is important to note that in the low-temperature limit, as T → 0, and for ΔE ≠ 0, lines 7.1 and 7.2 are equivalent.
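The MS1 unit program (lines 1.1-10.1, with the smoothed acceptance rule 7.2) can be sketched serially as follows. The two-task instance, the parameter values, and the sweep schedule are illustrative assumptions, not values taken from the paper:

```python
import math
import random

def ms1_update(s, x, y, l, D, A, C, T):
    """One stochastic update of binary unit (x, y) of network MS1
    (program lines 1.1-6.1 with acceptance rule 7.2)."""
    r, m = len(s), len(s[0])
    q = sum(s[x][j] for j in range(m) if j != y)          # line 1.1
    p = sum(s[i][y] * l[i] for i in range(r) if i != x)
    dE = C * (2 * q - 1)                                  # line 2.1
    if p > D:                                             # line 3.1: already overloaded
        dE += A * l[x]
    elif p + l[x] > D:                                    # line 5.1: would overload
        dE += A * (p + l[x] - D)
    # line 7.2: adopt state 1 with probability 1/(1 + exp(dE/T))
    s[x][y] = 1 if random.random() < 1.0 / (1.0 + math.exp(dE / T)) else 0

# Tiny instance: two tasks (sizes 3 and 4), two processors, deadline D = 5.
random.seed(0)
l, D = [3, 4], 5
s = [[0, 0], [0, 0]]
for _ in range(200):                                      # repeated sweeps at low T
    for x in range(2):
        for y in range(2):
            ms1_update(s, x, y, l, D, A=2.0, C=1.0, T=0.05)
print(s)
```

At this low temperature the rule is nearly deterministic, and the sweeps settle into a state where each task occupies exactly one processor without exceeding the deadline.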

⁴ This behavior is easily empirically verified.

3.4. Analog, Non-Neural Computational Network

We now transform program MS1 into program MS2, which represents an analog, non-neural network. Instead of probabilistically adopting a state of 0 or 1, the units of MS2 adopt states that model the expected values of the analogous units in MS1. Thus, MS2 is a mean-field model of MS1. Lines 7.2 and 8.1…10.1 are replaced with

9.3   s_xy ← f_T(−ΔE);

where

f_T(x) = 1 / (1 + e^{−x/T})    (8)

is a graded threshold function. Since units adopt continuous states on [0, 1], we now have an analog network.


The graph of network MS2 is isomorphic to that of MS1. We replace lines 3.1…6.1 with

3.2   if p_xy − D > 0 then
4.2       ΔE ← ΔE + A l_x;
5.2   if (D − p_xy + ε > 0) and (p_xy + l_x − D > 0) then
6.2       ΔE ← ΔE + A·(p_xy + l_x − D);

which, for suitably chosen ε (0 < ε ≪ 1), are operationally equivalent. The "if" statements (lines 3.2 and 5.2) reflect discontinuities in the derivative of the gradient ΔE. To facilitate parallel gradient search, it is in our interest to smooth the energy landscape and thereby aid the avoidance of local minima. In general, we therefore replace statements of the form

    if z > 0 then ΔE ← ΔE + w;

with statements of the form

    ΔE ← ΔE + w·f_T(z).

Again, it should be noted that in the low-temperature limit, as T → 0, the two forms are equivalent. This transform is specifically applied to lines 3.2…4.2, giving lines 2.3 and 7.3. Conjunctive conditional statements of the form

    if (z₁ > 0) and (z₂ > 0) then ΔE ← ΔE + w;

are replaced with the statements

    c₁ ← f_T(z₁);
    c₂ ← f_T(z₂);
    ΔE ← ΔE + w·f_T(Kc₁ + Kc₂ − (3/2)K);

where c₁ and c₂ can be interpreted as auxiliary variables and K is a network parameter like A and C. This transform is applied to lines 5.2…6.2, giving lines 3.3…5.3 and 8.3. After application of these rewrite rules, and renumbering line 1.1 as 1.3 and line 2.1 as 6.3, the function of unit (x, y) in network MS2 is given by

1.3   q_xy ← Σ_{j≠y} s_xj;   p_xy ← Σ_{i≠x} s_iy l_i;
2.3   c₁ ← f_T(p_xy − D);
3.3   c₂ ← f_T(D − p_xy + ε);
4.3   c₃ ← f_T(p_xy + l_x − D);
5.3   c₂₃ ← f_T(Kc₂ + Kc₃ − (3/2)K);
6.3   ΔE ← 2Cq_xy − C
7.3        + A l_x·c₁
8.3        + A·(p_xy + l_x − D)·c₂₃;
9.3   s_xy ← f_T(−ΔE).

Let us examine the dynamics of this algorithm. As T → 0, c₁ "detects" the condition of the first "if" statement (line 3.1), since c₁ → 1 whenever "p_xy > D" holds and c₁ → 0 otherwise. c₂₃ detects the complement of "p_xy > D" in conjunction with the condition of the "else if" statement (line 5.1), "p_xy + l_x > D." Finally, we incorporate a minor improvement. Since c₁ and c₂ detect complementary conditions, we could replace

3.3   c₂ ← f_T(D − p_xy + ε)

with

3.3   c₂ ← f_T(F − 2Fc₁),

where F is a network parameter (F > 0). Alternatively, we can replace

2.3   c₁ ← f_T(p_xy − D)

with

2.3   c₁ ← f_T(F − 2Fc₂).

Instead, we correlate the activity of c₁ and c₂ by combining their respective ΔE values as follows:

2.3   c₁ ← f_T(p_xy − D + F − 2Fc₂)

and

3.3   c₂ ← f_T(D − p_xy + ε + F − 2Fc₁).

The resulting connections implement another winner-take-all subnetwork comprised of c₁ and c₂. Although we have described these unit functions by discrete-time algorithms, with local memory for the storage of c₁, c₂, c₃, c₂₃, and s_xy, with appropriate relative timing constraints they can be computed continuously by each unit.

3.5. Analog Neural Network

c₁, c₂, c₃, c₂₃, and s_xy can be computed in parallel. Furthermore, in the parallel decomposition, each resulting process attains neuron-like behavior, that is, continuous integration and graded thresholding. Each unit (x, y) in MS2 is replaced by a cluster of four hidden units and one visible unit, giving network MS3. The unit functions of network MS3 follow.

1.4   c₁ₓᵧ ← f_T( F − 2Fc₂ₓᵧ − D + Σ_{i≠x} s_iy l_i );
2.4   c₂ₓᵧ ← f_T( D + ε + F − 2Fc₁ₓᵧ − Σ_{i≠x} s_iy l_i );
3.4   c₃ₓᵧ ← f_T( l_x − D + Σ_{i≠x} s_iy l_i );
4.4   c₂₃ₓᵧ ← f_T( Kc₂ₓᵧ + Kc₃ₓᵧ − (3/2)K );
5.4   s_xy ← f_T( C − 2C Σ_{j≠y} s_xj − A l_x c₁ₓᵧ − A c₂₃ₓᵧ·( l_x − D + Σ_{i≠x} s_iy l_i ) ).

We have subscripted the c₁, c₂, c₃, and c₂₃ units to emphasize their duplication over all task/processor pairs. In this expanded form it is easy to identify the connection weights and the unit biases, which are illustrated in Figure 1. Network MS3 contains both normal and conjunctive synapses. Conjunctive synapses allow c₂₃-units to moderate the flow of activation from s-units to other s-units in the same column. In theory, these special synapses can be approximated with conjunctive subnetworks, although we have not assessed the impact of the resulting approximation error on network solutions.

4. PARALLEL SIMULATION

Simulations of networks MS3 were conducted on a high-performance MIMD machine consisting of eight processors arranged in a cube topology. Each processor has a 10 MIPS, 32-bit integer processor that executes concurrently with a 1.5 MFLOPS, 64-bit floating-point processor. The simulators for our multiprocessor scheduling networks were written in the "C" programming language.
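A serial, synchronous sweep of the MS3 cluster equations (lines 1.4-5.4) can be sketched in NumPy as follows. The array layout, the toy instance, the parameter values, and the cooling schedule are our assumptions for illustration:

```python
import numpy as np

def f(x, T):
    """Graded threshold f_T(x) = 1/(1 + exp(-x/T)); the argument is
    clipped so very low temperatures do not overflow."""
    return 1.0 / (1.0 + np.exp(-np.clip(x / T, -50.0, 50.0)))

def ms3_sweep(s, c1, c2, c3, c23, l, D, T, A, C, F, K, eps=0.5):
    """One synchronous sweep of the MS3 cluster units (lines 1.4-5.4).
    All state arrays are r x m with analog values on [0, 1]."""
    load = s.T @ l                         # per-processor load, shape (m,)
    p = load[None, :] - s * l[:, None]     # p[x, y] = sum_{i != x} s_iy * l_i
    c1n  = f(F - 2 * F * c2 - D + p, T)                     # line 1.4
    c2n  = f(D + eps + F - 2 * F * c1 - p, T)               # line 2.4
    c3n  = f(l[:, None] - D + p, T)                         # line 3.4
    c23n = f(K * c2 + K * c3 - 1.5 * K, T)                  # line 4.4
    q = s.sum(axis=1, keepdims=True) - s   # row sum excluding own column
    sn = f(C - 2 * C * q - A * l[:, None] * c1
           - A * c23 * (l[:, None] - D + p), T)             # line 5.4
    return sn, c1n, c2n, c3n, c23n

# Anneal a toy instance: four tasks on two processors, deadline D = 5.
rng = np.random.default_rng(0)
l, D = np.array([3.0, 4.0, 2.0, 1.0]), 5.0
s, c1, c2, c3, c23 = (rng.uniform(0.4, 0.6, (4, 2)) for _ in range(5))
for T in 20.0 * 0.9 ** np.arange(60):      # geometric cooling, gamma = 0.9
    for _ in range(10):                    # several sweeps per temperature
        s, c1, c2, c3, c23 = ms3_sweep(s, c1, c2, c3, c23, l, D, T,
                                       A=2.0, C=1.0, F=4.0, K=4.0)
print(np.round(s, 2))
```

A true MS3 simulation updates the cluster units concurrently; the synchronous sweep above is only the simplest serial approximation of those dynamics.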

FIGURE 1. Multiprocessor scheduling neural network MS3 with detailed cluster-internal and entering connections.

4.1. Benchmark Generation

Before testing the performance of network MS3 under a variety of operational environments, we first define a collection of "meaningful" problem instances to serve as benchmarks. Intuitively, there are several criteria that characterize "easy" multiprocessor scheduling problems:
1. Σ_{t∈T} l_t ≪ mD, or
2. r ≤ m, or


3. the variance of the l_i is small (the restricted problem in which l_i = 1, for all i, is known to be in P), or
4. there is an obvious clustering of the l_i.

In order to avoid satisfying the first criterion, we generated random multiprocessor scheduling problem instances with the saturated processor property (SPP) (Garey & Johnson, 1975):

Σ_{t∈T} l_t = mD.

Two classes of problems were generated:

Class 1: m = 6,  r = 20, D = 10
Class 2: m = 10, r = 48, D = 100

with MS3 networks of 600 and 2,400 units, respectively. For each fixed tuple, (m, r, D), many problem instances were generated. Initially, m tasks, each of length D, are defined, and the SPP holds. A task t with l_t > 1 is randomly selected and randomly split into two smaller tasks, t₁ and t₂, where l_t = l_{t₁} + l_{t₂}. Random task selection and splitting is repeated until there are r tasks. After each random split the SPP is preserved, so that we are guaranteed that each final problem instance has at least one exact solution. In order to reduce the tendency to generate excessively many small tasks, an additional criterion, U(0, 1) ≥ l_t/(m·D), must be met by each randomly selected task t. A histogram of task sizes for 10 class-2 problems is shown in Figure 2. In order to avoid any unforeseen clustering, the task sizes are randomly reassigned prior to use.

FIGURE 2. Histogram of task sizes.

4.2. Quantification of Solution Quality

For each operational environment, a sample of problem instances was generated and solutions sought on the network simulators. For each instance, after relaxation was terminated, a binary threshold was applied to all s-units. The modified activation values were then examined to ascertain whether the proposed solution was syntactically correct. The proposed solution is determined to be syntactically correct if

(∀x)[ 1 ≤ x ≤ r → Σ_{j=1}^{m} b(s_xj − χ) = 1 ],    (8)

where

b(x) = { 1 if x > 0;  0 otherwise }.

If any task is allocated to more than one processor, or is unallocated, the syntax constraint represented by E_C is violated. We refer to this condition as a C-violation. If a C-violation is detected then the solution is discarded and the next problem instance is treated. If the proposed solution is syntactically correct, the schedule length for each processor y,

Λ_y = Σ_{i} l_i · b(s_iy − χ),    (9)

is computed and compared with the global deadline, D. If

(∃y)[ 1 ≤ y ≤ m ∧ Λ_y > D ],    (10)

then a processor overload exists and the restraint represented by E_A is violated. We refer to this condition as an A-violation. Let

O = max{ max_{1≤y≤m} { Λ_y − D }, 0 },    (11)

so that O is the maximum overload of any single processor. Let N be the number of simulations executed and N_A be the number of resulting A-violations. For the i-th simulation, we refer to the maximum overload by O_i. We define the average maximum processor overload, amo, by

amo = ( Σ_{i=1}^{N_A} O_i ) / N_A,    (12)

and propose to use it as a measure of the extent of the A-violation. In a sense, amo is a pessimistic measure, since a schedule in which one processor is overloaded by O_i time units is considered to be as bad as a schedule in which every processor is overloaded by O_i time units. Let

ρ = ( Σ_{y=1}^{m} max{ Λ_y − D, 0 } ) / m,    (13)

so that ρ is the average overload taken over all m processors. Let ρ_i be the average overload, ρ, for the i-th syntactically correct simulation, 1 ≤ i ≤ N. The total average overload, tao, is given by

tao = ( Σ_{i=1}^{N_A} ρ_i ) / N_A.    (14)


Unlike amo, tao reflects the number of overloaded processors as well as the extent of overloading.
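The SPP-preserving benchmark generator of Section 4.1 and the per-run overload statistics of eqns (11) and (13) can be sketched as follows. The direction of the split-acceptance inequality is our reading of the garbled criterion (larger tasks are split preferentially), and all names are ours:

```python
import random

def make_spp_instance(m, r, D, seed=0):
    """Split m tasks of length D until r tasks remain; sum(l) = m*D throughout."""
    rng = random.Random(seed)
    l = [D] * m                              # m tasks of length D: SPP holds
    while len(l) < r:
        t = rng.randrange(len(l))
        # reduce the tendency to produce tiny tasks: accept a split of task t
        # with probability l_t / (m*D)  -- our reading of the U(0,1) criterion
        if l[t] <= 1 or rng.random() > l[t] / (m * D):
            continue
        cut = rng.randint(1, l[t] - 1)       # integer split: l_t = l_t1 + l_t2
        l[t] -= cut
        l.append(cut)
    rng.shuffle(l)                           # avoid unforeseen clustering
    return l

def overload_stats(assign, l, m, D):
    """O = max single-processor overload (eqn 11); rho = mean overload (eqn 13)."""
    loads = [0] * m
    for task, proc in enumerate(assign):
        loads[proc] += l[task]
    over = [max(v - D, 0) for v in loads]
    return max(over), sum(over) / m

sizes = make_spp_instance(m=6, r=20, D=10)
print(sum(sizes), len(sizes))                # 60 20: SPP preserved by every split
O, rho = overload_stats([i % 6 for i in range(20)], sizes, m=6, D=10)
print(O >= rho)                              # the amo-style measure dominates rho
```

Because every split conserves total length, each generated instance provably admits an exact schedule, which is what makes the overload statistics meaningful.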

4.3. Simulator Organization

The overall flowchart for the simulator is shown in Figure 3. The starting temperature, T₀, is computed by warming the network while checking for complete melting. In order to guarantee that our system is completely melted, we must have T₀ = ∞. At this temperature, units of a stochastically-smoothed binary-state network accept all transitions; hence, the probability of any unit being active is 0.5. The mean-field analog model is consistent, since f_T(x) = 0.5 for x ≠ 0. If T₀ is selected so that s_xy = 0.5, c₁ₓᵧ = 0.5, c₂ₓᵧ = 0.5, c₃ₓᵧ = 0.5, and c₂₃ₓᵧ = 0.5 at thermodynamic equilibrium, for all 1 ≤ x ≤ r and 1 ≤ y ≤ m, then we will have achieved the same effect as if T₀ = ∞.

An initial value for T₀ is arbitrarily selected. The activation values of all units are initialized to U(0.4, 0.6) random numbers and the network is relaxed at temperature T₀. Relaxation consists of iterative application of an update rule to each unit. In separate simulations, parallel, partially-synchronous update (Cernuschi-Frias, 1989) and "pure" synchronous update rules are iteratively applied until the activation values of all s-units cease to change, that is, until a stabilization criterion

(∀i)[ 0 < i ≤ n → |s_i^{τ+1} − s_i^{τ}| < ε₀ ]    (15)

is met. For simulation, a value of ε₀ = 10⁻⁴ was used.

FIGURE 3. Simulator organization.

With synchronous update, limit cycles are not dampened by the noise that is induced by signal propagation delays. Several approaches can be used to avoid oscillation with synchronous update. Wilson and Pawley (1988) suggest computing the next state, ω_i, of unit i by the formula

ω_i ← (1 − μ)·ω_i + μ·f_T(u_i),

where u_i is the net input to unit i, for very small μ (e.g., 10⁻⁵). Unfortunately, this long integrative time delay results in extremely long relaxation times. Alternatively, we can dampen oscillation with noise by computing the next state of unit i as

μ ← U(0.0, τ);
ω_i ← (1 − μ)·ω_i + μ·f_T(u_i),

where 1 − τ is defined differently from, but with a role that is (inversely) similar to, the integrative time delay given by Hopfield. This method of adding noise has several attractive properties. It allows very fast relaxation with the synchronous model, serves as a generation mechanism for counter-gradient steps in the energy function, and diminishes as the system stabilizes. We used this "noisy" synchronous update rule with τ = 0.5, a considerably larger value than the μ prescribed by Wilson and Pawley.

After network stabilization is detected, the maximum unit polarization, v, is computed by v = max_{x,y} { |s_xy − ½| }. When, at equilibrium, the maximum polarization falls below a constant threshold, v < ε_v, all s-units behave as if T₀ = ∞. Higher temperatures serve no purpose and increase the effort required to anneal to low temperature. If, at equilibrium, v ≥ ε_v, the system is not stable at the center of the hypercube at the current temperature, and T₀ is incremented by the rule T₀ ← κT₀, where κ > 1. For our simulations, we selected κ ≈ 1.43 and found starting temperatures near 400° and 5,000° for class-1 and class-2 problems, respectively. When multiple simulations of networks embedded with problem instances with similar characteristics (ranges of task sizes, numbers of tasks and processors, and deadlines) are executed, the time required to locate T₀ for the second and subsequent problems can be decreased by utilizing the starting temperature returned from the previous search as the starting point for the next search.

After T₀ is computed, unit activations are again initialized to within close proximity of the center of the hypercube with U(0.4, 0.6) random numbers. The system is relaxed until stable. If the energy landscape is sufficiently smooth then deeper basins of attraction attract the global state early in the relaxation process and there are no sudden changes in the energy gradient as the global state approaches a corner.
It is therefore reasonable to assume that once the global state reaches a corner, it will not reverse direction or veer toward a different corner. The termination criterion that we use requires the application of a threshold, ε_m, to the maximum depolarization, v′ = max_{x,y} { ½ − |s_xy − ½| }, of all s-units. As the global state of the s-units approaches a corner, the maximum depolarization approaches 0. During our experiments, we selected ε_m = 10⁻². If, after network relaxation, the network configuration is insufficiently close to a corner of the hypercube, the temperature, T, is exponentially decremented by the rule T ← γT, where 0 < γ < 1, and the next relaxation cycle is initiated. Otherwise, the simulation is completed and the activation values of the s-units are thresholded and examined to determine whether the constraint represented by E_C has been violated, and if not, the extent to which the restraint represented by E_A has been satisfied. For our simulations we found γ = 0.90 to be adequate.

4.4. Network Tuning

The quality of solutions obtained from the simulated network depends upon the selection of the network parameters, A, C, F, and K, and the annealing schedule. These factors are related in that a poor choice of network parameters can, to some extent, be compensated for by a longer, more finely-tuned annealing schedule. Unfortunately, this results in a significant increase in the execution time of the simulation. Network solutions resulting from poor annealing schedules cannot be improved upon by varying the network parameters. We started by empirically locating a set of approximate network and annealing parameters. The network parameters F and K are intrinsically different from A and C, since they do not bias network solutions in favor of either constraint. F and K affect the fluidity of certain network configurations, while A and C affect the level of network frustration. By varying F and K we control the temperature at which the c₁, c₂, and c₂₃ units exhibit decisive output. Values of F = 4.0 and K = 4.0 were found to be adequate. Network parameters A and C are strongly related. When A is increased relative to C, the network gives higher weight to minimizing overload at the expense of producing syntactically invalid solutions. The typical manifestation is that some set of tasks is not allocated and has rows of s-unit activations that are all 0. If C is increased relative to A, then the network gives higher weight to syntactic correctness at the expense of processor overloading. There is an inverse relationship between network parameters and the operational temperature range. This can be seen in the graded threshold function that is applied to each unit. Network parameters are often factors in the numerator of the exponent and T is a factor of the denominator. In general, increasing the

B.J. Hellstrom and L. N. Kanal

magnitudes of the network parameters has an effect similar to annealing over a lower operational temperature range. Increasing a single network parameter (e.g., C) causes those units with activation functions dependent upon C to behave as if they are cooler than units that are independent of C. The dependent units "freeze" at higher temperatures, thereby giving the treatment of the associated constraint (syntax in the case of C) a temporal precedence with respect to other constraints. Restated, by increasing a single network parameter, we encourage the network to satisfy the associated constraint earlier in the relaxation process.
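The control flow of Section 4.3 — the melting search for T₀ followed by geometric cooling — can be sketched independently of the network details. Here `relax` is a toy stand-in whose melted/polarized responses are fabricated purely to exercise the loops; the growth and cooling factors and thresholds follow the values quoted above:

```python
def find_T0(relax, T0=1.0, grow=1.43, eps_v=0.05):
    """Warm until, at equilibrium, every unit sits near 0.5 (complete melting).
    relax(T) -> equilibrium s-unit activations at temperature T."""
    while True:
        v = max(abs(x - 0.5) for x in relax(T0))    # maximum polarization
        if v < eps_v:
            return T0                               # melted: same effect as T0 = inf
        T0 *= grow                                  # T0 <- grow * T0, grow > 1

def anneal(relax, T0, gamma=0.90, eps_m=1e-2):
    """Cool geometrically until the configuration reaches a hypercube corner."""
    T = T0
    while True:
        s = relax(T)
        depol = max(0.5 - abs(x - 0.5) for x in s)  # maximum depolarization v'
        if depol < eps_m:
            return s                                # at a corner: stop cooling
        T *= gamma                                  # T <- gamma * T, 0 < gamma < 1

# Toy stand-in network: melted above T = 1, fully polarized below it.
relax = lambda T: [0.5] * 4 if T > 1.0 else [0.999, 0.001, 0.998, 0.002]
T0 = find_T0(relax, T0=0.25)
print(T0 > 1.0, [round(x) for x in anneal(relax, T0)])   # True [1, 0, 1, 0]
```

The reuse of a previously found T₀ as the starting point of the next search corresponds to simply passing it back in as the `T0` argument.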

4.5. Example

We observed the configurations of s-units, for a problem instance with m = 6 processors, deadline D = 10, and r = 20 tasks of sizes

(l₁, …, l₂₀) = (1, 7, 3, 2, 3, 2, 10, 2, 2, 2, 2, 3, 5, 1, 4, 7, 1, 1, 1, 1),

after 3 phase transitions (at 8.1°, 6.8°, and 4.9°) and at the completion of relaxation. These "snapshots" are illustrated in Figures 4 and 5, respectively. The activation of each s-unit is represented by the size and shade of the small blocks. Let us interpret these figures. In Figure 4, at T = 8.1°, we see that the allocation of the largest task, t₇ (l₇ = 10), has stabilized with processor 2, as indicated by the large white block at s₇₂. The remaining s-units in column 2 are strongly inhibited in conformance with E_C. As the temperature is lowered to T = 6.8°, the allocations of the mid-sized tasks, t₂, t₁₃, t₁₅, and t₁₆, with sizes 7, 5, 4, and 7, respectively, freeze. Of particular interest is task t₁₅, of size l₁₅ = 4, which is allocated to processor 6. The remaining s-units in column 6 show varying degrees of inhibition. s-units associated with tasks whose allocation to processor 6 is incompatible with that of t₁₅ (i.e., t₂, t₃, t₅, t₇, t₁₂, t₁₃, and t₁₆) are inhibited. The s-units associated with tasks that can be allocated to processor 6 together with task t₁₅ are somewhat excited. At T = 4.9°, all but the smallest tasks (those of size 1 or 2) have been allocated. The final configuration, at T = 0.51°, which represents an exact solution, is shown in Figure 5. This sequence of configurations serves to illustrate the evolution to a low-energy state that occurs during network relaxation. Given the apparently strict order of packing, in order of decreasing task size, we might


FIGURE 4. Intermediate configurations of s-units after phase transitions.

Given the apparently strict order of packing, in order of decreasing task size, we might consider abandoning the neural network approach altogether for some simple, deterministic, fit-decreasing algorithm. However, it is important to note that as the allocation of large tasks progresses, the small tasks have significant influence upon the computation. Large tasks are allocated with the final placement of small tasks under consideration.

It is interesting to note that if we are willing to introduce additional conjunctive synapses, we can improve our model and dispense with the c23-units altogether. We refer to the resulting network as MS4. In theory, c23-units behave properly only in the low-temperature limit (i.e., as T ↓ 0). If we interpret the activation values of the c2- and c3-units as representative of the probabilities that

1. processor y is not overloaded, and
2. the allocation of task x to processor y introduces an overload,

respectively, then the joint probability of events 1 and 2 is given by the product of their individual probabilities. In network model MS3, at moderate temperatures, this is clearly not the case. We simply need to replace the value of c_23xy with c_2y · c_3xy. We define the activation of s_xy in network MS4 by

$$s_{xy} \leftarrow f\!\left(\frac{1}{\mathcal{T}}\left(C - 2C \sum_{g \neq y} s_{xg} - A\, l_x\, c_{2y}\, c_{3xy}\right)\right),$$

where the c_3xy unit receives the input

$$l_x - D + \sum_{\substack{i=1 \\ i \neq x}}^{r} s_{iy}\, l_i.$$

Since at high temperatures c_23xy is a poor approximation of c_2y · c_3xy, we expect network MS4 to have improved performance. Network MS4 is shown in Figure 6.
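A minimal numerical sketch of this product gating follows. The update form, the inputs to the c2- and c3-units, and the parameter values are our reconstruction from the damaged equation above, not verbatim from the paper:

```python
import math

def f(x, T):
    """Mean-field logistic activation at temperature T."""
    return 1.0 / (1.0 + math.exp(-x / T))

def s_update(x, y, s, sizes, D, A, C, T):
    """One MS4-style update of s[x][y] with product gating (reconstruction).

    c2 estimates "processor y is not already overloaded"; c3 estimates
    "adding task x would overload processor y"; their product gates the
    overload penalty A * l_x, replacing the conjunctive c23-unit.
    """
    load_others = sum(sizes[i] * s[i][y] for i in range(len(sizes)) if i != x)
    c2 = f(D - load_others, T)
    c3 = f(load_others + sizes[x] - D, T)
    syntax = C - 2 * C * sum(s[x][g] for g in range(len(s[x])) if g != y)
    return f(syntax - A * sizes[x] * c2 * c3, T)

# Toy instance: 3 tasks (sizes 7, 2, 4), 2 processors, deadline D = 10.
# Tasks 0 and 1 lean toward processor 0, which then holds ~8.1 of its 10.
sizes = [7, 2, 4]
s = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5]]
blocked = s_update(2, 0, s, sizes, D=10, A=4.0, C=4.0, T=0.5)  # would overload
free    = s_update(2, 1, s, sizes, D=10, A=4.0, C=4.0, T=0.5)  # fits easily
assert 0.0 < blocked < free < 1.0
```

The gating keeps the overload penalty near zero both when processor y has ample room (c3 small) and when it is already hopelessly overloaded (c2 small), which is the joint-probability reading described above.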

4.6. Results

In order to compare results for the different update rules and network types, we simulated 20 "hard" randomly generated problems from each of the two problem classes and for each of the two parallel update methods (partially and purely synchronous). We selected the parameters

ζ = 5 × 10³, ω = 10⁻⁴, γ = 0.90, τ = 0.5, t_c = 0.15, ε_m = 10⁻², A = 4.0, C = 4.0, F = 4.0, η = 1.43, K = 4.0.

In the case of the partially synchronous algorithm, 8 and 30 units were updated by each real processor prior to each synchronization point for class-1 and class-2 problems, respectively. These values were selected so that during each iteration of the relaxation update loop roughly 10% of all units were updated before activation values were distributed. For each sample of 20 problems, we calculated the percentage of exact solutions and A-violations (there were no C-violations). In addition, we recorded the average execution time of each simulation. The summarized results are shown in Table 1.
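The partially synchronous schedule (not the network dynamics) can be sketched as follows; the unit count, the local update rule, and the worker loop are illustrative placeholders, not the paper's implementation:

```python
import random

def relax_partially_synchronous(n_units, n_procs, k, step, sweeps, seed=0):
    """Schedule sketch: each of n_procs workers updates k units against a
    shared snapshot, then all workers synchronize and the next sweep begins."""
    rng = random.Random(seed)
    act = [0.5] * n_units
    for _ in range(sweeps):
        snapshot = act[:]              # activations as of the last sync point
        for _proc in range(n_procs):
            for _ in range(k):         # k updates per worker between syncs
                u = rng.randrange(n_units)
                # placeholder local rule; a real network would move act[u]
                # toward f(-dE/ds_u / T) computed from the snapshot
                act[u] = 0.5 * (snapshot[u] + step)
        # synchronization point: updated activations become visible to all
    return act

# 8 workers x 30 units each = 240 updates between sync points, roughly 10%
# of a 2,400-unit class-2 network, as in the simulations above.
final = relax_partially_synchronous(n_units=2400, n_procs=8, k=30,
                                    step=0.2, sweeps=5)
assert len(final) == 2400 and all(0.0 <= a <= 1.0 for a in final)
```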

680

B. J. Hellstrom and L. N. Kanal

[Figure 5: the final s-unit configuration at T = 0.51° over processors 1-6 (columns) and tasks 1-20 (rows).]

FIGURE 5. Final configuration of s-units.

In this and subsequent tables, we refer to the percentage of A-violations by "% V_A." Since, for class-2 problems, networks with no c23-units relaxed under the parallel asynchronous update rule showed the best performance, we attempted to improve upon the results of these networks by decreasing the temperature decay rate, γ, to 0.99. The result is shown in Table 2. In summary, processor schedules for class-2 problems had total average overloads of 0.44 and 0.48 for synchronous and parallel asynchronous algorithms, respectively. For class-2 problems, the expected task size is 20; hence, the average overload amounts to less than 3% of the average size of a single task. Model MS4, with two additional conjunctive synapses and no conjunctive c23-units, yielded improved results for class-2 problems, with tao = 0.23, roughly 1% of the average size of a task.

One cannot help but notice the discrepancy in performance between networks embedded with class-1 and class-2 problems. Although our final simulations with class-2 problems produced very few exact solutions, the total average overloads were quite small, much less than the average size of a single task. In general, synchronous algorithms performed no better than parallel asynchronous algorithms in terms of solution quality. However, parallel asynchronous algorithms tended to stabilize faster. This phenomenon is also described by Peterson and Hartman (1989). The stabilization rate of synchronously updated networks can be adjusted, to some extent, by varying τ. Small values of τ result in small, accurate steps through the configuration space. Large values of τ result in larger steps, but with increased error in the direction of each step. The optimal value of τ therefore depends on the contour of the energy landscape. Smooth, nearly planar surfaces call for large τ values and coarse surfaces for small values.

4.7. Scalability Issues

We have frequently observed that the final configurations of columns of s-units that represent overloaded processors indicate the allocation of a small number of relatively large tasks. For example, for D = 100, sets of tasks with sizes {59, 45} or {32, 78} might be allocated to single processors. We never observed overloaded processors scheduled with small tasks (e.g., {38, 54, 4, 6}). The explanation for this behavior is straightforward. Large tasks are packed at high temperatures, and at high temperatures small overloads (4 and 10 in the first two sets above) are ignored. After sufficient cooling, even small overloads are unacceptable. However, at the low temperatures at which the system attempts to correct small overloads, the larger tasks are already "locked" into their allocations. If we assume that the distribution of task sizes includes midsized tasks, then by the time low temperatures are reached, the states of s-units that represent both large and midsized tasks will be strongly polarized. The reallocation of any large task will therefore necessarily involve large, if only temporary, overloads. The inability of the network to resolve small energy differences at high temperatures is the basis for the precipitous decline in the frequency of exact solutions as the "sizes" of embedded problems increase. If problem size is defined strictly in terms of m and r, with the set of allowable task sizes and deadline D fixed, then network performance degrades much more gracefully. For example, multiprocessor scheduling networks that are embedded with randomly generated problem instances from the class:


FIGURE 6. Multiprocessor scheduling neural network MS4.

m = 10, r = 48, D = 10,

despite the fact that they are as large as class-2 problems, yield solutions that are comparable with those of networks for class-1 problems: 20 simulations of problems in this class on the parallel asynchronous MS3 network

TABLE 1
Summary of Simulation Results for Multiprocessor-Scheduling Networks MS3 and MS4

Problem Type | Update Rule                        | % Exact | % V_A | amo  | tao    | Execution Time
Class-1      | Asynchronous update                | 95%     | 5%    | 2.0  | 0.0167 | 1 m 11 s
Class-1      | Asynchronous update, no c23-units  | 100%    | 0%    | N.A. | 0.0    | 0 m 40 s
Class-1      | Synchronous update (a)             | 100%    | 0%    | N.A. | 0.0    | 1 m 30 s
Class-1      | Synchronous update, no c23-units   | 100%    | 0%    | N.A. | 0.0    | 1 m 48 s
Class-2      | Asynchronous update                | 0%      | 100%  | 2.85 | 0.48   | 4 m 39 s
Class-2      | Asynchronous update, no c23-units  | 15%     | 85%   | 2.00 | 0.23   | 2 m 50 s
Class-2      | Synchronous update                 | 0%      | 100%  | 2.90 | 0.41   | 5 m 3 s
Class-2      | Synchronous update, no c23-units   | 0%      | 100%  | 3.70 | 0.52   | 4 m 16 s

(a) Our simplified implementation of the synchronous update rule required that the number of tasks, r, be a multiple of the number of real processors (eight in our case). Consequently, the "class-1" problems that we simulated under synchronous update actually used r = 24 instead of r = 20. Because of the necessity for synchronization of real processors after each update, the difference in execution time attributed to the extra 200 units is negligible.


TABLE 2
Summary of Simulation Results for 20 Class-2 Problems Using Network MS4 Relaxed Under an Extended Annealing Schedule with γ = 0.99 and ω = 5 × 10⁻⁴

Problem Type | Update Rule                        | % Exact | % V_A | amo  | tao   | Execution Time
Class-2      | Asynchronous update, no c23-units  | 5%      | 95%   | 1.42 | 0.285 | 13 m 35 s

(without c23-units) yielded 80% exact solutions and 20% A-violations.

5. APPROXIMATE NETWORKS

Our approach of reconstructing energy gradients with linear segments can be extended by using graded thresholds as bias functions. In our multiprocessor scheduling network constructions, we have justified the correctness of each transformation, ΔE', of ΔE by showing that ΔE' and ΔE are, in the low-temperature limit, equivalent. What if we relax this standard and require only that ΔE' and ΔE be approximately equivalent? We refer to the resulting networks as approximate networks. Depending on the form of ΔE, approximate networks may have important advantages in size, degree of connectivity, and ease of implementation in electronic hardware. We do not intend to provide a thorough analysis of error and the effects of approximation in these networks but, because of their astounding simplicity, are compelled to offer a brief presentation.

The subnetworks that are given by embedding E_C in networks MS3 and MS4 are very efficient at minimizing E_C. These winner-take-all subnetworks have been thoroughly studied and utilized in connectionist models for many years. E_C can be described in the form of a 0-1 Hamiltonian in s_xy and, hence, does not require augmenting the network with specialized units. Conversely, E_A is difficult to embed and requires the addition of three or four specialized feature detectors for each s-unit. Let

$$p_{xy} = \sum_{\substack{i=1 \\ i \neq x}}^{r} s_{iy}\, l_i. \tag{16}$$

The normalized overflow penalty is

$$g_x(z) = \begin{cases} 0 & \text{if } z \le -l_x/2, \\[4pt] \dfrac{z}{l_x} + \dfrac{1}{2} & \text{if } -l_x/2 < z < l_x/2, \\[4pt] 1 & \text{if } z \ge l_x/2. \end{cases} \tag{17, 18}$$

Function g_x is shown in Figure 7.

FIGURE 7. Normalized overflow penalty.

We then have

$$\Delta E_{xy} = A\, l_x\, g_x\!\left( p_{xy} - D + \frac{l_x}{2} \right). \tag{19}$$

Consider the function

$$f_{\mathcal{T}}(z) = \frac{1}{1 + e^{-z/\mathcal{T}}}, \tag{20}$$

shown in Figure 8. For an appropriately selected temperature, this function approximates g_x. If the error of approximation is "small enough" to be ignored, then we can replace all of the c-units in each processor/task cluster with a single unit that approximates g_x, as shown in Figure 9. We refer to this unit as a g-unit and the resulting network as MS5. Network MS5 has 2mr units and mr² + rm² − rm connections.

FIGURE 8. Graded threshold approximation of overflow penalty.

Unlike its predecessors, the units of network MS5 do not all operate at the same temperature, T. Furthermore, the temperature of each g-unit does not change during relaxation. Let T̃_x be the temperature of unit g_x. T̃_x is chosen so that

$$\int_{-\infty}^{\infty} \left| f_{\tilde{\mathcal{T}}_x}(z) - g_x(z) \right| dz$$

is minimized. Since both g_x and f_{T̃_x} are symmetric about z = 0, it is sufficient to minimize

$$\int_{-\infty}^{0} \left| f_{\tilde{\mathcal{T}}_x}(z) - g_x(z) \right| dz.$$

FIGURE 9. Multiprocessor scheduling approximate network MS5 with detailed cluster-internal and entering connections.

Depending on T̃_x, the approximation error function (the integrand) has two possible forms. The derivative of f_T is

$$\frac{d f_{\mathcal{T}}(z)}{dz} = \frac{1}{\mathcal{T}}\, f_{\mathcal{T}}(z) \left( 1 - f_{\mathcal{T}}(z) \right), \tag{21}$$

so that the slope of f_T at z = 0 is 1/(4T). The derivative of g_x(z) at z = 0 is 1/l_x. If 1/l_x ≥ 1/(4T), then g_x(z) ≤ f_T(z) for −l_x/2 ≤ z ≤ 0. This form is illustrated in Figure 10. The area bounded by f_T(z) and g_x(z), A_1 + A_2, is given by

$$A_1 + A_2 = \int_{-\infty}^{-l_x/2} f_{\mathcal{T}}(z)\, dz + \int_{-l_x/2}^{0} \left( f_{\mathcal{T}}(z) - g_x(z) \right) dz \tag{22}$$

$$= \mathcal{T} \ln 2 - \frac{l_x}{8}. \tag{23}$$

FIGURE 10. Approximation error with lower temperature graded threshold.

If 1/l_x < 1/(4T), then f_T and g_x have an intersection, z_0, on (−l_x/2, 0), as shown in Figure 11. In this case, A_1 + A_2 is given by

$$A_1 + A_2 = \int_{-\infty}^{z_0} f_{\mathcal{T}}(z)\, dz - \int_{-l_x/2}^{z_0} g_x(z)\, dz + \int_{z_0}^{0} \left( g_x(z) - f_{\mathcal{T}}(z) \right) dz. \tag{24}$$

We know that

$$\int_{-\infty}^{z_0} f_{\mathcal{T}}(z)\, dz = \mathcal{T} \ln\!\left( 1 + e^{z_0/\mathcal{T}} \right) \tag{25}$$

and

$$\int_{-l_x/2}^{z_0} g_x(z)\, dz = \frac{\left( z_0 + l_x/2 \right)^2}{2 l_x}, \tag{26}$$

so that

$$A_1 = \mathcal{T} \ln\!\left( 1 + e^{z_0/\mathcal{T}} \right) - \frac{\left( z_0 + l_x/2 \right)^2}{2 l_x}. \tag{27, 28}$$

We now treat A_2:

$$A_2 = \int_{z_0}^{0} g_x(z)\, dz - \int_{z_0}^{0} f_{\mathcal{T}}(z)\, dz \tag{29, 30}$$

$$= -\frac{z_0 \left( z_0 + l_x \right)}{2 l_x} - \mathcal{T} \left( \ln 2 - \ln\!\left( 1 + e^{z_0/\mathcal{T}} \right) \right). \tag{31-33}$$

Combining both A_1 and A_2 gives

$$A_1 + A_2 = 2 \mathcal{T} \ln\!\left( 1 + e^{z_0/\mathcal{T}} \right) - \mathcal{T} \ln 2 - \frac{z_0^2 + l_x z_0}{l_x} - \frac{l_x}{8}. \tag{34}$$

In summary, we have

$$A_1 + A_2 = \begin{cases} \mathcal{T} \ln 2 - \dfrac{l_x}{8} & \text{if } \dfrac{1}{l_x} \ge \dfrac{1}{4 \mathcal{T}}, \\[10pt] 2 \mathcal{T} \ln\!\left( 1 + e^{z_0/\mathcal{T}} \right) - \mathcal{T} \ln 2 - \dfrac{z_0^2 + l_x z_0}{l_x} - \dfrac{l_x}{8} & \text{if } \dfrac{1}{l_x} < \dfrac{1}{4 \mathcal{T}}. \end{cases} \tag{35}$$

All that remains is to derive the intersection, z_0, which satisfies

$$f_{\mathcal{T}}(z_0) = g_x(z_0), \tag{36}$$

that is,

$$\frac{1}{1 + e^{-z_0/\mathcal{T}}} = \frac{z_0}{l_x} + \frac{1}{2}, \tag{37}$$

or

$$\frac{2 z_0 + l_x}{2 l_x} - \frac{1}{1 + e^{-z_0/\mathcal{T}}} = 0. \tag{38}$$
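The first (closed-form) case of the summary can be checked numerically, assuming the logistic f_T and piecewise-linear g_x as reconstructed here: the truncated integral of |f_T − g_x| over (−∞, 0] should match T ln 2 − l_x/8 whenever T ≥ l_x/4.

```python
import math

def f(z, T):
    """Logistic graded threshold at temperature T."""
    return 1.0 / (1.0 + math.exp(-z / T))

def g(z, l):
    """Piecewise-linear normalized overflow penalty for task size l."""
    if z <= -l / 2: return 0.0
    if z >=  l / 2: return 1.0
    return z / l + 0.5

def error_area(l, T, lo=-60.0, n=100000):
    """Midpoint-rule integral of |f - g| over (-inf, 0], truncated at lo."""
    h = -lo / n
    return sum(abs(f(lo + (i + 0.5) * h, T) - g(lo + (i + 0.5) * h, l)) * h
               for i in range(n))

# Case 1 (T >= l/4): the area is exactly T*ln(2) - l/8.
l, T = 1.0, 0.5
assert math.isclose(error_area(l, T), T * math.log(2) - l / 8, rel_tol=1e-3)
```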

The zero of f_T(z_0) − g_x(z_0) does not have an analytic form. However, we can still compute a numerical approximation of A_1 + A_2 whenever 1/l_x < 1/(4T). The minimum value of A_1 + A_2 is also numerically approximated. This function is concave on (−l_x/2, 0), so that finding a minimum is straightforward. For locating z_0 as a function of T, we used routine ZEROIN (Forsythe, Malcolm, & Moler, 1977). For minimization of A_1 + A_2, we used routine FMIN (same reference). The specific choice of these routines is unimportant; any zero-search or convex function minimization routine would serve as well. l_x was varied from 1 to 100 in steps of 1, and for each value of l_x, the value of T minimizing A_1 + A_2 was found numerically. This relationship is nearly linear and is illustrated in Figure 12. The linear regressor of the value of T that minimizes A_1 + A_2, as a function of l_x, is used to approximate T̃_x and is given by

$$\tilde{\mathcal{T}}_x \approx 0.23917126\, l_x + 0.00022136. \tag{39}$$

FIGURE 11. Approximation error with higher temperature graded threshold.

[Figure 12: the optimal T plotted against l_x for 1 ≤ l_x ≤ 100.]

FIGURE 12. T which minimizes A_1 + A_2 as a function of l_x.
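Since ZEROIN and FMIN are just a bracketing root-finder and a one-dimensional minimizer, the construction behind Figure 12 can be sketched with stdlib stand-ins. This is a reconstruction: because equations (35) and (38) above are themselves recovered from a damaged copy, the fitted slope need not reproduce (39) to the printed digits, but the near-perfect linearity follows from scaling.

```python
import math

def f(z, T):
    return 1.0 / (1.0 + math.exp(-z / T))

def area(l, T):
    """A1 + A2 as in eqn (35); z0 located by bisection on eqn (38)."""
    if T >= l / 4:                                   # case 1: no crossing below 0
        return T * math.log(2) - l / 8
    err = lambda z: (2 * z + l) / (2 * l) - f(z, T)  # eqn (38)
    lo, hi = -l / 2 + 1e-12, -1e-9                   # z0 lies in (-l/2, 0)
    for _ in range(100):                             # bisection (ZEROIN stand-in)
        mid = 0.5 * (lo + hi)
        if err(lo) * err(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    z0 = 0.5 * (lo + hi)
    return (2 * T * math.log(1 + math.exp(z0 / T)) - T * math.log(2)
            - (z0 * z0 + l * z0) / l - l / 8)

def best_T(l):
    """Golden-section search for the T minimizing area(l, T) (FMIN stand-in)."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = 1e-3 * l, 0.5 * l
    for _ in range(100):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if area(l, c) < area(l, d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Substituting z = l*u and T = l*tau makes the problem scale-free, so the
# optimal T is exactly linear in l (cf. Figure 12 and eqn (39)).
assert abs(best_T(50.0) / best_T(10.0) - 5.0) < 0.01
```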


TABLE 3
Summary of Simulation Results for Multiprocessor-Scheduling Network MS5

Problem Type | Update Rule         | % Exact | % V_A | amo  | tao    | Execution Time
Class-1      | Asynchronous update | 95%     | 5%    | 1.0  | 0.0083 | 35 s
Class-2      | Asynchronous update | 10%     | 90%   | 3.61 | 0.48   | 2 m 12 s

The slope of f . t , ( : ) at : = 0 is I

h('
1

C(,,,),I

d:

dg,.(:)]

::o

> =

d:

(41)

J:-.o"

and .~1~ + A_, has the second form, as shown in Figure 11. We reiterate that for network simulation, the temperature of unit g,. is fixed at ~ while the c o m m o n temperature, 7", of all s-units is gradually lowered under the annealing procedure. Three sets of simulations of network MS5 were pertbrmed with parallel asynchronous update and the parameters given in Section 4.6. In the case of the partially synchronous algorithm, 3 and 12 units were updated by each real processor prior to each synchronization point for class- I and class-2 problems, respectively. O u r simulation results are shown in Table 3 and are comparable to the best results obtained from network MS4 with the exception that the simulation time of MS5 is roughly 13% and 37% less than that o f MS4 on classI and class-2 problems, respectively. This is directly attributable to the reduced n u m b e r of nodes in network MS5. While we do not intend to delve further into approximate networks here, it is interesting to note that by approximating any gradient, AE, using ./'r as a basis function, that is, by finding a sequence of temperatures 'T a and coefficients a#, box, and cij for 1 < i, g <_ n, g v~i, 1 _
b,j~.s'x - c,

ao'.[r,, -

,

(42)

1:1 g÷i

we can build a network of n . ?f units for minimizing E. The form o f A E / A s , determines how large 7f must be in order to achieve a close approximation. More complex gradients require larger ~ values and yield larger approximate networks. In the case of network MS5, s-units are named by an index pair so that i is s y n o n y m o u s with (x, y) and 1 < i < n with 1 _< x_< r, l _<), < m. We then have 7f=l,

a(.~...y), I

=

.'|ltd.,

~.,.,.~., = ~,.,

= ~

ify v~ g,.], and ify = k'i,.J

-- D,

at z = 0, is 1//~ so that for l, >_ 1,

d/.,, (:)]

A.~'i

0 /x',

(40)

4 ' i , ~ 0.95668504. i', + 0.0008852 The slope o f g , ( z ) ,

=
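As a quick check of how much accuracy a single g-unit gives up, one can compare f at the fitted temperature of eqn (39) with the overflow penalty g_x directly (using the logistic f and piecewise-linear g_x as reconstructed here; the 0.2 bound below is deliberately loose):

```python
import math

def f(z, T):
    return 1.0 / (1.0 + math.exp(-z / T))

def g(z, l):
    return 0.0 if z <= -l / 2 else 1.0 if z >= l / 2 else z / l + 0.5

l = 10.0
T = 0.23917126 * l + 0.00022136       # fitted g-unit temperature, eqn (39)
worst = max(abs(f(z, T) - g(z, l))
            for z in (x / 100.0 for x in range(-2000, 2001)))
assert worst < 0.2                    # worst-case error sits near z = +/- l/2
```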

6. CONCLUSION

We have presented a program-constructive approach to deriving asymmetric mean-field networks for multiprocessor scheduling. Our approach avoids the problem of 0-1 Hamiltonian reducibility by defining hidden units which reconstruct arbitrary energy gradients either by piecewise composition or, in the case of network MS5, by approximation with a series of graded thresholds. Under simulation, we have seen that as the problem size, in particular, the range of deadline D and the variability of task sizes, l_x, increases, the incidence of exact solutions becomes vanishingly small. There is little that can be done to reverse this unfortunate trend. Nevertheless, despite the scarcity of exact solutions, we have been pleased to find the frequency of syntactically invalid solutions to be consistently negligible and the average schedule tardiness to be small. In future research, we hope to apply our method to scheduling and sequencing problems as well as increasingly constrained packing problems such as 1- and 2-dimensional bin-packing.

REFERENCES

Aarts, E. H. L., & Korst, J. H. M. (1987). Boltzmann machines and their applications. Proceedings of the Conference on Parallel Architectures and Languages Europe, Eindhoven, Springer Lecture Notes. Berlin: Springer-Verlag, 34-50.
Aarts, E., & Korst, J. (1989). Simulated annealing and Boltzmann machines. New York: John Wiley & Sons.
Cernuschi-Frias, B. (1989). Partial simultaneous updating in Hopfield memories. IEEE Transactions on Systems, Man, and Cybernetics, 14(4), 887-888.
Dahl, E. D. (1988). Neural network algorithm for an NP-complete problem: Map and graph coloring. IEEE ICNN Conference Proceedings, 3, 113-120.
Forsythe, G. E., Malcolm, M. A., & Moler, C. B. (1977). Computer methods for mathematical computations. Englewood Cliffs, NJ: Prentice-Hall.
Garey, M., & Johnson, D. (1975). Complexity results for multiprocessor scheduling under resource constraints. SIAM J. Comput., 4(4), 397-411.
Garey, M., & Johnson, D. (1979). Computers and intractability: A guide to the theory of NP-completeness. New York: Freeman.
Gutzmann, K. M. (1988). Combinatorial optimization using a continuous state Boltzmann machine. IEEE ICNN Conference Proceedings, 3, 721-734.
Hedge, S. U., Sweet, J. L., & Levy, W. B. (1988). Determination of parameters in a Hopfield/Tank computational network. IEEE ICNN Conference Proceedings, 2, 291-298.
Hellstrom, B., & Kanal, L. (1990). The definition of necessary hidden units in neural networks for combinatorial optimization (Technical Report UMIACS-CSD-CS-TR-2388). College Park, MD: University of Maryland.
Hopfield, J. J., & Tank, D. W. (1985). "Neural" computation of decisions in optimization problems. Biological Cybernetics, 52, 141-152.
Peterson, C., & Hartman, E. (1989). Explorations of the mean field theory learning algorithm. Neural Networks, 2(6), 475-494.
Ramanujam, J., & Sadayappan, P. (1988). Optimization by neural networks. IEEE ICNN Conference Proceedings, 2, 325-332.
Szu, H. (1988). Fast TSP algorithm based on binary neuron output and analog neuron input using the zero-diagonal interconnect matrix and necessary and sufficient constraints of the permutation matrix. IEEE ICNN Conference Proceedings, 2, 259-266.
Wilson, G. W., & Pawley, G. V. (1988). On the stability of the traveling salesman problem algorithm of Hopfield and Tank. Biological Cybernetics, 58, 63-70.