Parallel Computing 6 (1988) 45-56 North-Holland
A distributed algorithm for convex network optimization problems *

Stavros A. ZENIOS
Department of Decision Sciences, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, U.S.A.

John M. MULVEY
Department of Civil Engineering and Operations Research, Princeton University, Princeton, NJ 08544, U.S.A.

Received April 1986
Abstract. Gauss-Seidel type relaxation techniques are applied in the context of strictly convex network optimization problems. The algorithm lends itself to processing in a massively distributed environment. A synchronous relaxation method (SRM) is proposed, based on the k-coloring properties of the network graph. The method is tested in a simulated distributed environment on a sequential machine. Lower bounds for the expected efficiency of SRM are developed and compared with its performance as obtained through computational experiments.

Keywords. Networks, optimization, nonlinear programming, distributed systems.
1. Introduction

Nonlinear optimization problems with network constraints arise in a wide variety of real-world contexts. Water distribution and electrical networks, financial models, air-traffic control, and hydroelectric power generation are a few examples of physical systems that can be modeled as nonlinear network optimization problems (NLN). Lasdon and Warren [15] review some of these areas and provide a bibliography. Developing algorithms and software for NLN problems is an area of active research, judging from the literature that constantly appears; see for example [1,4,9,10,14] and their references. In the past, progressively larger problems were solved by algorithms that were designed to capitalize on the problem's intrinsic structure. Refer for example to the computational work by Dembo [8], Ahlfeld et al. [1] or Hearn et al. [14]. Until recently the problem's characteristics were the sole criterion for algorithmic design and implementation.

Recent developments in computer architecture add a new dimension to the problem of algorithmic design. The (re)introduction of parallelism in the design of computer systems requires the development of algorithms that may be processed efficiently on distributed or parallel machines. In this paper we consider a relaxation algorithm for convex network optimization problems that lends itself to processing in a distributed environment. A Synchronous Relaxation Method (SRM) is proposed. Lower bounds for its performance when implemented in a distributed environment are developed, and the method is evaluated empirically using simulation on a sequential machine.

* Research partially supported by NSF grant No. DCR-8401098 and a University of Pennsylvania Research Foundation grant.

0167-8191/88/$3.50 © 1988, Elsevier Science Publishers B.V. (North-Holland)
Relaxation techniques for network problems were originally used by Birkhoff and Diaz [6]. Bertsekas and Tseng [5] applied relaxation techniques to pure linear networks, and more recently Zenios and Mulvey [21] studied relaxation algorithms for strictly convex pure networks and extended the algorithm to the class of generalized network problems (with arc gains/losses). The first to realize the potential of relaxation algorithms for distributed processing were Bertsekas and El Baz [3]. They proposed an Asynchronous Relaxation Method (ARM) and studied its convergence properties.

The rest of this paper is organized as follows: Section 2 presents the relaxation algorithm. The implementation of the algorithm in a distributed environment is discussed in Section 3. Section 4 describes the simulation of the algorithm on a sequential machine and gives some computational results. Conclusions and general discussion are the subject of Section 5.
2. The relaxation algorithm

In this section we define the convex network optimization problem and give the necessary and sufficient optimality conditions. The basic steps of the relaxation algorithm are also presented. The interested reader should refer to [20,21] for a more detailed description of the algorithm, discussion of implementation issues and some computational results.

Given a directed network $G = (N, E)$, where $N = \{i \mid i = 1, \ldots, n\}$ is the set of nodes and $E = \{(i, j) \mid i, j \in N\}$ the set of arcs (edges), we define

$$\delta^+(i) = \{(i, j) \text{ for all } (i, j) \in E\}: \text{ the set of all arcs leaving node } i,$$
$$\delta^-(i) = \{(j, i) \text{ for all } (j, i) \in E\}: \text{ the set of all arcs entering node } i.$$
The primal separable convex network problem can be defined as

[NLN1]
$$\text{minimize} \quad \sum_{(i,j) \in E} f_{ij}(x_{ij}),$$
$$\text{subject to} \quad A x = b, \qquad l \le x \le u,$$

where

$f_{ij}(x_{ij})$: convex function of one variable,
$A$: node-arc incidence matrix with at most two nonzero entries in every column,
$x$: vector of decision variables,
$b$: vector of supplies and demands,
$l, u$: vectors of lower and upper bounds.

Refer to Fig. 1 for a typical network and the underlying node-arc incidence matrix. The problem can be transformed into one of unbounded variables if we define

$$f_{ij}(x_{ij}) = \begin{cases} w_{ij}(x_{ij}) & \text{for } l_{ij} \le x_{ij} \le u_{ij}, \\ +\infty & \text{otherwise}, \end{cases} \tag{2.1}$$

where $w_{ij}(x_{ij})$ is a strictly convex function of one variable. (The restriction to strict convexity of $w$ will be justified when the algorithm is presented.) The circulation subspace of the network $G$ is defined as

$$C = \Bigl\{ x \;\Big|\; \sum_{(i,j) \in \delta^+(i)} x_{ij} - \sum_{(j,i) \in \delta^-(i)} x_{ji} = b_i \text{ for all } i \in N \Bigr\}. \tag{2.2}$$
Graph representation: $N = \{1, 2, 3, 4\}$, $E = \{(1,2), (1,3), (1,4), (1,4), (2,3), (2,4), (3,4)\}$, $G = (N, E)$.

Fig. 1. A typical network problem and node-arc incidence matrix.
Then we may rewrite [NLN1] in the form

[NLN2]
$$\text{minimize} \quad \sum_{(i,j) \in E} f_{ij}(x_{ij}),$$
$$\text{subject to} \quad x \in C.$$

The dual problem can be defined [18] in terms of the convex conjugate function $f^*$ to $f$ and the orthogonal complement $C^\perp$ to $C$ as

[DNLN]
$$\text{minimize} \quad \sum_{(i,j) \in E} f_{ij}^*(t_{ij}),$$
$$\text{subject to} \quad t \in C^\perp,$$

where

$$f_{ij}^*(t_{ij}) = \sup_{x_{ij}} \bigl( t_{ij}\, x_{ij} - f_{ij}(x_{ij}) \bigr), \quad (i, j) \in E, \tag{2.3}$$
$$C^\perp = \bigl\{ t \mid \text{there exists a price vector } \pi: \; t_{ij} = \pi_i - \pi_j, \; (i, j) \in E \bigr\}. \tag{2.4}$$

The vector $t$ is known as the tension vector. The necessary and sufficient optimality conditions for [NLN1] are
(I) Primal feasibility: $x \in C$;
(II) Dual feasibility: $t \in C^\perp$;
(III) Complementary slackness:

$$x_{ij} = u_{ij} \quad \text{iff} \quad t_{ij} > \frac{\partial w_{ij}(x_{ij})}{\partial x_{ij}}\Big|_{x_{ij} = u_{ij}}, \tag{2.5}$$
$$x_{ij} = l_{ij} \quad \text{iff} \quad t_{ij} < \frac{\partial w_{ij}(x_{ij})}{\partial x_{ij}}\Big|_{x_{ij} = l_{ij}}, \tag{2.6}$$
$$x_{ij} = \bar{x}_{ij} \quad \text{iff} \quad t_{ij} = \frac{\partial w_{ij}(x_{ij})}{\partial x_{ij}}\Big|_{x_{ij} = \bar{x}_{ij}} \quad \text{for } l_{ij} \le \bar{x}_{ij} \le u_{ij}. \tag{2.7}$$
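As a worked illustration of (2.3)-(2.7) (our example, not from the paper), take quadratic arc costs $w_{ij}(x) = \tfrac{a_{ij}}{2} x^2$ with $a_{ij} > 0$, so that $\partial w_{ij}/\partial x_{ij} = a_{ij} x_{ij}$. The supremum in (2.3) is then attained at the projected point

$$\hat{x}_{ij}(t_{ij}) = \max\bigl(l_{ij}, \min(u_{ij},\, t_{ij}/a_{ij})\bigr), \qquad f_{ij}^*(t_{ij}) = t_{ij}\,\hat{x}_{ij} - \tfrac{a_{ij}}{2}\hat{x}_{ij}^2,$$

and conditions (2.5)-(2.7) read: $x_{ij} = u_{ij}$ when $t_{ij} > a_{ij} u_{ij}$, $x_{ij} = l_{ij}$ when $t_{ij} < a_{ij} l_{ij}$, and $x_{ij} = t_{ij}/a_{ij}$ in between.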
It is convenient to characterize an arc $(i, j) \in E$ as follows:
(i) Active if $t_{ij} > (\partial w_{ij}(x_{ij})/\partial x_{ij})|_{x_{ij} = u_{ij}}$;
(ii) Balanced if $t_{ij} = (\partial w_{ij}(x_{ij})/\partial x_{ij})|_{x_{ij} = \bar{x}_{ij}}$ for $l_{ij} \le \bar{x}_{ij} \le u_{ij}$;
(iii) Inactive if $t_{ij} < (\partial w_{ij}(x_{ij})/\partial x_{ij})|_{x_{ij} = l_{ij}}$.

We state now a proposition that motivates the relaxation algorithm and provides the necessary theoretical foundation. This proposition follows as a corollary from Luenberger [16, Lemma 1, p. 398] and is also proved by Bertsekas and Tseng [5].

Proposition 2.1. Let $F(\pi) = \sum_{(i,j) \in E} f_{ij}^*(t_{ij}) = \sum_{(i,j) \in E} f_{ij}^*(\pi_i - \pi_j)$. Then $\partial F(\pi)/\partial \pi_i = d_i$ for every $i \in N$, where we define the deficit of node $i$ by

$$d_i = \sum_{(i,j) \in \delta^+(i)} x_{ij} - \sum_{(j,i) \in \delta^-(i)} x_{ji}. \tag{2.8}$$

We may interpret this proposition as follows: along the $i$th direction of minimal support in the dual space, defined by $(0, 0, \ldots, 0, \pi_i, 0, \ldots, 0)$, the dual function attains its minimum when the deficit $d_i$ becomes zero. The basic relaxation algorithm is presented in Fig. 2.

Comments. (1) At Step 1 a tolerance is set for the allowable deficit of all nodes. This tolerance is decreased until it reaches some final value. (2) Step 2 is a bracketing procedure that determines the range of values of $\alpha$ for the linesearch phase of Step 3.
Step 0 (Initialization): Set $k \leftarrow 0$. Assign prices $\pi_i^0$ to all nodes $i \in N$. Compute $x^0$ to satisfy conditions (2.5)-(2.7).

Step 1: Set $k \leftarrow k + 1$. Find a node $n$ with $|d_n| > \text{TOLD}$, where TOLD is some tolerance. If no such node exists, decrease TOLD or STOP.

Step 2 (Locate dual break points): If $d_n < 0$ solve for $\bar{\pi}$:
$$\pi_{n_1}: \quad t_{nj} = \frac{\partial w_{nj}(x_{nj})}{\partial x_{nj}}\Big|_{x_{nj} = u_{nj}}, \quad (n, j) \in \delta^+(n),$$
$$\pi_{n_2}: \quad t_{in} = \frac{\partial w_{in}(x_{in})}{\partial x_{in}}\Big|_{x_{in} = l_{in}}, \quad (i, n) \in \delta^-(n),$$
$$\bar{\pi} = \max(\pi_{n_1}, \pi_{n_2}).$$
If $d_n > 0$ solve for $\underline{\pi}$:
$$\pi_{n_1}: \quad t_{nj} = \frac{\partial w_{nj}(x_{nj})}{\partial x_{nj}}\Big|_{x_{nj} = l_{nj}}, \quad (n, j) \in \delta^+(n),$$
$$\pi_{n_2}: \quad t_{in} = \frac{\partial w_{in}(x_{in})}{\partial x_{in}}\Big|_{x_{in} = u_{in}}, \quad (i, n) \in \delta^-(n),$$
$$\underline{\pi} = \min(\pi_{n_1}, \pi_{n_2}).$$

Step 3 (One-dimensional search): If $d_n < 0$ solve
$$\min_{\pi_n \le \alpha \le \bar{\pi}} F\bigl(\pi + (\alpha - \pi_n)\, e_n\bigr),$$
where $x$ satisfies the complementary slackness conditions. ($e_n$ is a vector of all zeroes, except a $+1$ at the $n$th position.) If $d_n > 0$ solve
$$\min_{\underline{\pi} \le \alpha \le \pi_n} F\bigl(\pi + (\alpha - \pi_n)\, e_n\bigr),$$
where $x$ satisfies the complementary slackness conditions. Set $\pi_n = \alpha^*$, where $\alpha^*$ is the solution to the one-dimensional search, and go to Step 1.

Fig. 2. The relaxation algorithm.
(3) By Proposition 2.1 the one-dimensional search converges to the price $\alpha^*$ that makes the deficit of node $n$ equal to zero; in effect, until $|d_n| < \text{TOLD}$. Essentially, if the deficit is negative at the $k$th iteration we increase the price until the deficit, corresponding to a flow vector $x$ that satisfies complementary slackness, becomes zero. For a positive deficit the price decreases. (4) The strict convexity of $w_{ij}(x_{ij})$ ensures that the convex conjugate $f_{ij}^*(t_{ij})$ is continuously differentiable. In addition, for a given price vector $\pi$ the value of $x$ that satisfies the complementary slackness conditions is uniquely specified through equations (2.5)-(2.7). Hence the linesearching procedure of Step 3 has a unique solution. (5) Conceptually the algorithm is similar to coordinate-wise minimization in the dual space, since steps are always taken along the directions of the dual coordinate axes.
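To make Steps 1-3 concrete, here is a minimal sketch of a single-node relaxation for the quadratic costs of the example above. It replaces the breakpoint and linesearch machinery of Steps 2-3 with a bisection on $\pi_n$, exploiting the fact that $d_n$ is nondecreasing in $\pi_n$; the data layout (the `arcs` dictionary and its fields) is our own illustration, not the authors' implementation.

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def flow(t, arc):
    # Complementary slackness (2.5)-(2.7) for w(x) = (a/2)x^2, w'(x) = a*x:
    # the balanced flow is t/a, projected onto the bounds [l, u].
    return clamp(t / arc['a'], arc['l'], arc['u'])

def deficit(n, pi, arcs):
    # d_n = flow leaving node n minus flow entering node n, as in (2.8)
    d = 0.0
    for (i, j), arc in arcs.items():
        x = flow(pi[i] - pi[j], arc)      # tension t_ij = pi_i - pi_j
        if i == n:
            d += x                        # arc in delta+(n)
        if j == n:
            d -= x                        # arc in delta-(n)
    return d

def relax_node(n, pi, arcs, lo=-1e6, hi=1e6, tol=1e-9):
    # Negative deficit: raise pi[n]; positive deficit: lower it (comment (3)).
    for _ in range(200):
        pi[n] = 0.5 * (lo + hi)
        d = deficit(n, pi, arcs)
        if abs(d) < tol:
            break
        if d < 0:
            lo = pi[n]
        else:
            hi = pi[n]
    return pi[n]
```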
3. Distributed synchronous relaxation

The careful reader must have noticed that the algorithm iterates on the price of one node at a time, given information only from adjacent nodes. We may exploit this fact in implementing the relaxation algorithm in a distributed environment. We allow the prices of more than one node to vary simultaneously, in a synchronized fashion that preserves the global convergence properties of the algorithm: the Synchronous Relaxation Method (SRM). An Asynchronous Relaxation Method (ARM) has been proposed by Bertsekas and El Baz [3]. In their scheme relaxations are carried out in parallel by several processors in arbitrary order and with arbitrarily large interprocessor communication delays. In Section 5 we discuss the relative advantages and disadvantages of ARM and SRM.

In specializing an algorithm for a distributed environment we have to consider the following three issues [12]:
(1) Partitioning problem: Partition the algorithm into tasks that can be executed by independent processing units.
(2) Scheduling problem: Assign each task to one or more processors for execution.
(3) Synchronization problem: Specify an order of execution that guarantees correct results.

Before considering these issues it is important to fix a target system configuration. We consider a design common in many supercomputers like the CRAY X-MP/48, Alliant FX/8, CRAY-2 or HEP, namely shared memory multiprocessor systems. Figure 3 outlines the basic components of a prototype distributed system [19]. We consider systems that in Flynn's taxonomy [11] may be classified as Multiple Instruction Multiple Data (MIMD) machines, with $P$ independent processors.

3.1. Algorithm partitioning

To achieve a distributed algorithm we have to identify parts of the problem that can be processed by many independent processors. Since the relaxation algorithm iterates on one node at a time given information at adjacent nodes, we may relax in parallel the prices of nodes that are not directly connected to each other. Therefore, in partitioning the relaxation algorithm for distributed processing we have to identify disconnected subsets of nodes. We define the partitioning problem as follows: Given $G = (N, E)$, find a set of subsets $\{N_i\}$, $i = 1, \ldots, K$, such that

$$N = \bigcup_{i=1}^{K} N_i, \qquad N_i \cap N_j = \emptyset \text{ for } i \neq j,$$

and for every $k, l \in N_i$ there does not exist $(k, l) \in E$, for all $N_i$.
Fig. 3. A typical distributed processing system: several (vector) CPUs sharing a main memory of billions of 64-bit words, with secondary mass storage.
The smallest number of disjoint subsets $K$ is the chromatic number of the network [13]. Solving for the chromatic number of a network is an NP-complete problem for which several heuristics are available. However, we are not particularly interested in the smallest possible $K$. Instead we want to determine sets $N_i$ whose cardinality is approximately equal to the number of processors $P$ in our system, in order to achieve load balancing among the processors. This problem is solved using a modification of a graph coloring heuristic [7]. Refer to Fig. 4 for the modified graph coloring algorithm. Once we have solved the partitioning problem we may process simultaneously all the nodes in the same subset.

Step 1: Sort all nodes in $N$ by their degree in descending order. Set $K = 0$ for the first color.
Step 2: Go down the list of ordered nodes, coloring with $K$ the first node $i$ which is not colored and is not connected to a node that is already colored with $K$.
Step 3: Drop $i$ from the list of uncolored nodes. If the number of nodes with color $K$ is equal to the number of available processors $P$, or if the list of nodes has been exhausted, go to Step 4. Otherwise repeat Step 2.
Step 4: If all nodes in $N$ are colored, terminate. Otherwise set $K \leftarrow K + 1$ and go to Step 2.

Fig. 4. Modified graph coloring algorithm.
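The heuristic of Fig. 4 translates almost line for line into code. The sketch below is ours (the function and variable names are not from the paper); it builds an undirected adjacency structure from the arc list and closes a color class as soon as it holds $P$ nodes.

```python
from collections import defaultdict

def modified_coloring(nodes, edges, P):
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    order = sorted(nodes, key=lambda n: len(adj[n]), reverse=True)  # Step 1
    color = {}
    k = 0
    while len(color) < len(nodes):                                  # Step 4
        members = []                                                # class k
        for n in order:                                             # Step 2
            if n in color or any(m in adj[n] for m in members):
                continue
            color[n] = k                                            # Step 3
            members.append(n)
            if len(members) == P:
                break
        k += 1
    return color   # maps node -> color class; classes have size <= P
```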
3.2. Task scheduling

We consider now the problem of scheduling the different processors to the independent tasks obtained by solving the partitioning problem. We consider two modes of task scheduling: a static mode and a dynamic mode. In the static mode every processor will process exactly the same node in every set. This scheduling mechanism has an advantage for computer systems that have some form of local memory associated with every processor: the only data that have to be fetched from global memory are the prices of all adjacent nodes. Information related to the node processed may be kept in local memory. The major disadvantage of this mechanism is that it makes the distributed relaxation 0-resilient, i.e. the algorithm will not converge unless all $P$ processors are operational. This is due to the fact that we have only one degree of freedom in our system, and even if only one processor fails we will have $K$ nodes whose prices will remain constant.

We consider instead a dynamic scheduling mechanism that makes distributed relaxation $(P - 1)$-resilient: every processor is assigned, in a cyclic fashion, to a different node of a set during every iteration. This scheme has the disadvantage that whenever a new node is processed all relevant data have to be fetched from global memory. This will not cause memory access conflicts, since every processor iterates on a different node. The major advantage of dynamic scheduling is that even if all processors but one fail the method is still convergent: the remaining processor will iterate over all nodes in some order and we have a uniprocessor system.
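A minimal sketch of the cyclic assignment (our illustration, not the paper's scheduler): on sweep $s$, the live processors take distinct nodes of the current color class, rotated by the sweep counter, so any surviving subset of processors still visits every node over repeated sweeps.

```python
def dynamic_assignment(color_class, live_procs, sweep):
    # Cyclic, (P-1)-resilient schedule: distinct nodes per live processor,
    # rotated by the sweep count; at most |class| processors stay busy.
    m = len(color_class)
    return {p: color_class[(idx + sweep) % m]
            for idx, p in enumerate(sorted(live_procs)[:m])}
```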
3.3. Task synchronization

Synchronization of the relaxation algorithm raises several practical issues. We may consider a situation where all processors terminate on the current set before processing a different set. This will result in an algorithm whose time behavior for every set is equal to the time behavior of the slowest processor. At the other extreme we may terminate all processors when at least one processor has converged. In general there is a tradeoff between the number of processors that remain idle during part of the time and the amount of work performed by the remaining active processors. The extent to which synchronization is achieved depends on the target configuration, the overhead involved in assigning the processors to a new set, the characteristics of the network problem and so on. Between iterations on different sets the prices of all nodes processed and the flows of all arcs have to be saved in global memory. This procedure is not a bottleneck for distributed relaxation: prices and flows are updated in parallel by their respective processors. Memory access conflicts are avoided since no two processors will attempt to write to the same price or flow location.
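The first of these policies, with a barrier closing every color class, might look as follows with modern thread primitives (a sketch under our own interfaces; the original experiments used FORTRAN on simulated processors):

```python
import threading

def srm_worker(pid, barrier, color_classes, relax_one):
    # No processor starts class k+1 before every price and flow of class k
    # has been written back to shared memory.
    for cls in color_classes:
        for idx in range(pid, len(cls), barrier.parties):
            relax_one(cls[idx])   # this processor's nodes in the class
        barrier.wait()            # global synchronization point
```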
3.4. Discussion

We have described how the relaxation algorithm can be implemented in a distributed environment. Before presenting some computational results we develop a lower bound for the performance of the distributed algorithm and discuss the issue of message exchange between processors.

Let $T(P)$ denote the time taken to solve a problem with $P$ processors. An absolute lower bound on $T(P)$ is given by $T_\infty(P) = T(1)/P$. A tighter lower bound is obtained by making use of Amdahl's law [2]:

$$T(P) = T(1) \left[ \sum_{i=1}^{P-1} \frac{f_i}{i} + \frac{1}{P}\Bigl( 1 - \sum_{i=1}^{P-1} f_i \Bigr) \right],$$

where $f_i$ is the percentage of the program executed at level of parallelism $i = 1, \ldots, P - 1$. (Note that $T_\infty(P) = \lim_{f_i \to 0} T(P)$.) We may compute $f_i$ based on the coloring of the graph for a particular problem and a fixed number of processors, using the fact that for every node $n$ the amount of work done is $O(d(n))$, where $d(n)$ is the degree of the node. The lower bound for the expected performance of the distributed relaxation was used as a benchmark against which the computational results have been compared. It can also be used to predict the behavior of the algorithm when implemented on different computer systems.
Example 3.1. Consider a problem with 400 nodes and 1000 arcs. The modified coloring algorithm, applied to this problem with $P = 10$ processors, resulted in 39 sets of ten nodes each, one set of nine nodes and a set with a single node. The average degree of a node is 2.5, and hence we have

$$T(10) = T(1)\left[ \frac{390}{1000} \times \frac{1}{10} \times O(2.5) + \frac{9}{1000} \times \frac{1}{9} \times O(2.5) + \frac{1}{1000} \times O(2.5) \right] = 0.041 \times O(2.5) \times T(1).$$

(Setting $O(2.5) \approx 2.5$ gives $T(10) \approx 0.1025\, T(1)$, just above the absolute bound $T_\infty(10) = 0.1\, T(1)$.)
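The bound is easy to evaluate from a coloring. The short check below (our code) reproduces the figure in Example 3.1, assuming every node costs the same $O(\bar{d}) = O(2.5)$ units of work, so that each set finishes in the time of a single node:

```python
def srm_bound(set_sizes, avg_degree):
    total_work = sum(set_sizes) * avg_degree     # T(1) in work units
    parallel_time = len(set_sizes) * avg_degree  # one node's work per set
    return parallel_time / total_work            # T(P) / T(1)

sizes = [10] * 39 + [9, 1]     # the coloring of Example 3.1, P = 10
print(srm_bound(sizes, 2.5))   # 0.1025 = 0.041 x 2.5; T_inf(10) = 0.1
```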
The number of messages exchanged between processors is an important issue in multiprocessing systems, since an excessive amount of message exchange will result in bottlenecks. Every time the processors are assigned to a new set of nodes the following data have to be updated: save the current values of prices and arc flows, and fetch the prices for all nodes connected to the nodes in the new set. If we have $P$ processors the amount of work required for data storage is
(1) $O(P)$ for prices,
(2) $O\bigl(\sum_{i=1}^{P} d(i)\bigr)$ for arc flows, where $d(i)$ is the degree of the node processed by the $i$th processor.
If $\bar{d}(n)$ denotes the average degree over all nodes, then the amount of information $S$ requiring memory access for storage is

$$S = O(P) + O\bigl(P \times \bar{d}(n)\bigr).$$

In retrieving the data required to process the new set we have

$$R = O\bigl(P \times \bar{d}(n)\bigr),$$

since for every node to be processed we need the prices of all nodes connected to it. Hence, if we let $a$ denote the number of times required to process all $K$ sets of nodes, the amount of messages exchanged is given by

$$a \times K \times \bigl[ O(P) + 2 \times O\bigl(P \times \bar{d}(n)\bigr) \bigr].$$
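The count is straightforward to tabulate; a small sketch (ours), treating each stored or fetched word as one message:

```python
def message_volume(P, dbar, K, a):
    store = P + P * dbar    # S: prices O(P) plus arc flows O(P * dbar)
    fetch = P * dbar        # R: prices of all adjacent nodes
    return a * K * (store + fetch)   # a*K*[O(P) + 2*O(P*dbar)]

# e.g. P = 10 processors, average degree 2.5, K = 41 sets, one pass:
print(message_volume(10, 2.5, 41, 1))   # (10 + 25 + 25) * 41 = 2460 words
```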
4. Simulation results

The SRM algorithm was implemented in a simulated distributed environment on a sequential machine. The algorithm was essentially run by a single processor. Tasks of the algorithm that could be run in parallel were timed together and results were stored in temporary arrays. The results of all independent tasks were updated into global arrays when all tasks were completed, and the execution time of the slowest task was recorded. The updating procedure and the modified graph coloring algorithm were executed sequentially and were timed separately. The simulation considers the following issues:
(1) Time the SRM when $P$ processors are available, $P > 0$;
(2) Time anticipated bottlenecks of SRM, specifically the chromatic number algorithm and the updating procedure;
(3) Compare the Synchronous Relaxation Method with the Asynchronous Relaxation Method [3].
The experiments were carried out on a VAX 11/750 under UNIX 4.2BSD at Princeton University. The software system NDUAL [21], coded in FORTRAN 77, performed the relaxation steps. Three problems ranging in size from 50 nodes/125 arcs to 400 nodes/1000
arcs were solved. (Refer to [21] for more details of the problem characteristics; problems GROUP IAA, GROUP IAC and GROUP IAD.)

Fig. 5. Simulation results for test problems (simulated time versus number of processors).

The results of the simulation for the three test problems are shown in Fig. 5. The values of $T(P)$ for different values of $P$ are also shown, together with the curve for $T_\infty(P)$. We observe a good degree of agreement between the shapes of the three curves. Our computational results differ from $T(P)$ by a constant factor, which may be attributed to the fact that in plotting $T(P)$ we set $O(d(n)) = \bar{d}(n)$, as a rough approximation. Overhead involved in the experiment and fluctuations in the timing routines also contribute to the discrepancy between the theoretical and empirical results.
Fig. 6. Timing results for modified chromatic number algorithm (problems GROUP IAA, GROUP IAC and GROUP IAD; time versus number of processors).
Fig. 7. Timing results for updating procedure (problems GROUP IAA and GROUP IAD; time versus number of processors).
The discrepancy between $T(P)$ and $T_\infty(P)$ is insignificant: it arises when SRM cannot efficiently utilize processors beyond a maximum number, which is determined by the coloring of the particular graph.

The anticipated bottlenecks of SRM are the execution of the chromatic number algorithm and the updating of prices and flows into global memory. These parts of the algorithm were timed separately and the results are shown in Figs. 6 and 7. We observe that these parts take two orders of magnitude less time than the relaxation steps, even for a large number of processors.

Finally we consider the maximum number of processors that can be used efficiently by SRM. As we keep adding processors to our system a different coloring of the network will be generated, and a smaller number of bigger sets will be processed independently. At some point, though, the system will become saturated and additional processors will remain idle. The maximum number of processors that can be utilized efficiently for a given problem depends on the sparsity of the underlying graph. Figure 8 illustrates the situation for our three test problems and two large-scale applications, STICK4 and BIGBANK [1].

Fig. 8. Maximum number of processors utilized by SRM.

5. Conclusions

We discussed in this paper the potential implementation of a network relaxation algorithm in a distributed processing environment. The performance of the distributed algorithm was investigated using both theoretical lower bounds on its behavior and empirical results from a computational simulation. We observe that the distributed algorithm improves performance by an order of magnitude over a sequential version of the algorithm. The expected speedup justifies the study of this mathematical algorithm in the context of distributed processing, if large problems are to be solved efficiently on distributed computer systems. The relaxation algorithm has been studied extensively elsewhere [20,21], and was shown to be a poor match for currently used algorithms for nonlinear network optimization like truncated Newton [1] or simplicial decomposition [17]. Our study indicates that this assessment may change as distributed machines become more widely available to the scientific community. It is, however, a topic of current research how commonly used algorithms can be streamlined for vector or distributed machines.

On a more microscopic level we observe a good degree of agreement between the results of the simulation and the lower bounds obtained by a theoretical analysis of the algorithm. In addition the lower bound of SRM is almost identical to the 'best case' lower bound $T_\infty(P)$. We notice from our study that SRM cannot utilize an infinite number of processors. This is observed when the simulation time and $T(P)$ remain unchanged as we add more processors.

Our Synchronous Relaxation Method (SRM) may be compared with the asynchronous algorithm [3] before actually being implemented in a distributed environment. Table 1 summarizes their main features.

Table 1. Comparing SRM with ARM

Characteristics | SRM | ARM
Lower bound estimate for solution time with $P$ processors | $T(1)\bigl[\sum_{i=1}^{P-1} f_i/i + (1 - \sum_{i=1}^{P-1} f_i)/P\bigr]$ | $T(1)/P$
Resilience | $P - 1$ | $P - 1$
Global convergence properties | Convergent | Convergent subject to some restrictions on the starting point [3]

While ARM has the potential to achieve higher speedup in a distributed environment, SRM utilizes more effectively a limited number of processors. The algorithm is a promising method that can be used in monitoring the behavior of nonlinear networks in equilibrium. When some of the dual prices change from an optimum setting, relaxation can be used to determine the effect on the remaining prices. Physical systems like electric power generation or communication networks, and economic systems, can be monitored in this manner.
References

[1] D.P. Ahlfeld, R.S. Dembo, J.M. Mulvey and S.A. Zenios, Nonlinear programming on generalized networks, ACM Trans. Math. Software, to appear.
[2] G.M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, AFIPS Conf. Proc. 30 (1967) 483-485.
[3] D.P. Bertsekas and D. El Baz, Distributed asynchronous relaxation methods for convex network flow problems, MIT Report LIDS-P-1417, 1984.
[4] D.P. Bertsekas and M.E. Gafni, Projected Newton methods and optimization of multicommodity flows, IEEE Trans. Automat. Control 28 (12) (1983).
[5] D.P. Bertsekas and P. Tseng, Relaxation methods for minimum cost network flow problems, MIT Report LIDS-P-1339, 1983.
[6] G. Birkhoff and J.B. Diaz, Nonlinear network problems, Quart. Appl. Math. 13 (1956) 431-443.
[7] N. Christofides, An algorithm for the chromatic number of a graph, Comput. J. 14 (1) (1971) 38-39.
[8] R.S. Dembo, A primal truncated Newton algorithm for large-scale nonlinear network optimization, School of Organization and Management Working Paper, Series B #72, Yale University, 1983.
[9] R.S. Dembo and J.G. Klincewicz, A scaled reduced gradient algorithm for network flow problems with convex separable costs, Math. Programming Studies 15 (1981) 125-147.
[10] R.S. Dembo and T. Steihaug, Truncated-Newton algorithms for large-scale unconstrained optimization, Math. Programming 26 (1983) 190-212.
[11] M.J. Flynn, Some computer organizations and their effectiveness, IEEE Trans. Comput. 21 (9) (1972) 948-960.
[12] D.D. Gajski and J.K. Peir, Essential issues in multiprocessor systems, Computer (June 1985) 9-26.
[13] M.R. Garey and D.S. Johnson, Computers and Intractability (Freeman, San Francisco, 1979).
[14] D.W. Hearn, S. Lawphongpanich and J.A. Ventura, Restricted simplicial decomposition: Computation and extensions, Research Report No. 84-38, University of Florida, Gainesville, 1984.
[15] L.S. Lasdon and A.D. Warren, A survey of nonlinear programming applications, Operations Res. 28 (1980) 34-50.
[16] D.G. Luenberger, Linear and Nonlinear Programming (Addison-Wesley, Reading, MA, 1984).
[17] J.M. Mulvey, S.A. Zenios and D.P. Ahlfeld, Simplicial decomposition for convex generalized networks, Report EES-85-8, Princeton University, 1985.
[18] R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, NJ, 1970).
[19] J. Warlton, Supercomputers: Past-present-future, Proc. BCS Supercomputer Summer Institute, University of Washington, Seattle (1985).
[20] S.A. Zenios, Sequential and parallel algorithms for convex generalized network problems and related applications, PhD Dissertation, Engineering-Management Systems, Civil Engineering Department, Princeton University, Princeton, NJ, 1986.
[21] S.A. Zenios and J.M. Mulvey, Relaxation techniques for strictly convex network problems, in: Annals of Operations Research, Vol. 5 (Scientific Publishing Company, Switzerland, 1986); special volume on Algorithms and Software for Optimization.