Parallel Computing 18 (1992) 103-107 North-Holland
Short communication
A synchronous algorithm for shortest paths on a tree machine
El-Sayed M. El-Horbaty Maths. Dept., Faculty of Science, Ain Shams University, Egypt
Alaa El-Din H. Mohamed Maths. Dept., Faculty of Science, Cairo University, Egypt
Received March 1991 Revised May 1991
Abstract
El-Sayed M. El-Horbaty and Alaa El-Din H. Mohamed, A synchronous algorithm for shortest paths on a tree machine, Parallel Computing 18 (1992) 103-107.
This paper presents a synchronous (SIMD) algorithm for solving the single source shortest paths problem in a network on a tree machine model. The algorithm runs in O(N log_2 N) time on a tree machine with N leaf processing elements.
Keywords. Shortest paths problems; parallel algorithms; tree machines; time complexity.
1. Introduction
The advance of parallel architectures has prompted the design of numerous parallel algorithms using various models of computation. With recent advances in VLSI technology, it has become technically possible to build parallel machines with thousands of processing elements (PEs). The interconnection between PEs is one of the most critical issues in the architecture of parallel machines. Many interconnection network structures have been proposed and used in various parallel machines, such as the shared memory, mesh connected, cube connected, and perfect shuffle models [7], and the tree model [2]. Recently, many parallel algorithms have been developed for tree machines to solve various problems, such as those by Bentley and Kung [2] for searching, Browning [3] for sorting and matrix multiplication, and Siva Ram Murthy [8] for evaluating polynomials.
1 Present address: Dept. of Maths. and Comp. Science, Emirates University, Al-Ain, U.A.E.
0167-8191/92/$05.00 © 1992 - Elsevier Science Publishers B.V. All rights reserved
The shortest paths problem is considered one of the most fundamental problems in combinatorial optimisation. Consequently, a large number of sequential algorithms have been developed for finding shortest paths in a network, such as those by Bellman [1], Floyd [6], and Deo and Pang [4]. Dijkstra [5] proposed an efficient algorithm for the single source shortest paths problem; it requires O(N^2) time on a serial machine. Recently, Yadegar et al. [9] developed parallel algorithms for shortest paths problems (single source and all pairs) on a 2-dimensional mesh connected model (the distributed array processor (DAP)) with time complexity O(N) using N^2 processing elements. Our motivation in this research has been to develop a synchronous (single instruction stream-multiple data stream (SIMD)) algorithm for the single source shortest paths problem on a tree machine model. The algorithm requires O(N log_2 N) time using N leaf processing elements. The rest of this paper is organized as follows. In Section 2, we briefly outline the tree machine model. Section 3 contains the relevant definitions and notations. In Section 4, we describe the parallel algorithm and its complexity. Section 5 contains an implementation of the algorithm on the tree machine. We conclude the paper in Section 6.
2. The architecture of the tree machine model
A tree machine model is illustrated in Fig. 1. It is a full binary tree. The node with no parent is called the root. The level of a node is defined as the number of arcs in the path from the root to that node. Thus the leaf nodes are at level log_2 N, where N is the number of leaf nodes. The tree contains two types of nodes: circles and squares (PEs). The primary functions of the circles are to broadcast data and to combine their inputs. This means that the branches of the tree represent two-way communication links. The squares (PEs) are used for storing data and computing. Each PE has its own local memory. The PEs receive a single instruction transmitted by the circles and execute it simultaneously on multiple data stored in their memories. Thus, the tree machine is an SIMD model (for further details, see for example Bentley and Kung [2]).
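As a rough illustration of the level definition above, the following Python sketch numbers the nodes of a full binary tree in heap order (root = 1, children of node i are 2i and 2i + 1). The heap numbering and the function names `node_level` and `leaf_indices` are assumptions made for this example; the paper does not prescribe any node numbering.

```python
def node_level(i: int) -> int:
    """Number of arcs on the path from the root (heap index 1) to node i."""
    if i < 1:
        raise ValueError("heap indices start at 1")
    return i.bit_length() - 1


def leaf_indices(n_leaves: int) -> range:
    """Heap indices of the leaves of a full binary tree with n_leaves leaves.

    Assumes n_leaves is a power of two, as in the tree machine of Fig. 1.
    """
    if n_leaves < 1 or n_leaves & (n_leaves - 1):
        raise ValueError("n_leaves must be a power of two")
    return range(n_leaves, 2 * n_leaves)


if __name__ == "__main__":
    N = 8
    # Every leaf sits at the same level, log_2 N.
    print({node_level(i) for i in leaf_indices(N)})
    # A full binary tree with N leaves has N - 1 internal (circle) nodes.
    print(N - 1)
```

Under this numbering, every index from N to 2N - 1 is a leaf, so all leaves share level log_2 N, in agreement with the text.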
3. Definitions and notations
Let G = (V, A) be a network consisting of a finite set V of nodes and a finite set A of ordered pairs of nodes called arcs, with |V| = N. Each arc (i, j), i ≠ j, has a real number l_ij, called the arc-length of the arc (i, j); l_ij = ∞ if there is no such arc, and l_ii = 0 for every i ∈ V. Let (u, v) be an arc in G; then v is called an immediate successor of u, and u is called an immediate predecessor of v. A path P from node s to node t is a finite sequence of arcs (s, u_1), (u_1, u_2), ..., (u_k, t) in which all nodes are distinct. The length of a path P is the sum of the lengths of all arcs in P. The path from s to t with minimum length is called the shortest path from s to t, and the length of a shortest path is called the shortest length.
Fig. 1. Tree machine, with levels 0 to 3 and the leaves at the bottom.
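To make the definitions concrete, here is a small Python sketch of an arc-length matrix and the path-length computation. The 4-node network `L` is a hypothetical example, not taken from the paper, and nodes are numbered from 0 rather than from 1 for convenience.

```python
import math

INF = math.inf

# Hypothetical 4-node network: L[i][j] is the arc-length l_ij of arc (i, j),
# INF where no arc exists, and 0 on the diagonal (l_ii = 0).
L = [
    [0,   4,   1,   INF],
    [INF, 0,   INF, 2],
    [INF, 2,   0,   7],
    [INF, INF, INF, 0],
]


def path_length(lengths, nodes):
    """Sum of the arc-lengths along the path that visits `nodes` in order."""
    if len(set(nodes)) != len(nodes):
        raise ValueError("all nodes on a path must be distinct")
    return sum(lengths[u][v] for u, v in zip(nodes, nodes[1:]))


if __name__ == "__main__":
    print(path_length(L, [0, 1, 3]))     # path 0 -> 1 -> 3
    print(path_length(L, [0, 2, 1, 3]))  # path 0 -> 2 -> 1 -> 3
```

In this example the path through nodes 0, 2, 1, 3 has length 1 + 2 + 2 = 5, which is shorter than the path through 0, 1, 3 of length 4 + 2 = 6. A sequence that uses a missing arc has length ∞.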
4. Algorithm SPT
Given a network G = (V, A) with nonnegative arc-lengths, Algorithm SPT works by partitioning the network G into two disjoint subnetworks G1 = (V1, E1) and G2 = (V2, E2). The subnetwork G1 contains the set of nodes whose shortest lengths have already been determined, and the shortest path from the source to any node in this set lies wholly in G1. The subnetwork G2 has the complementary set of nodes, which are in G and not in G1, and all arcs among these nodes. Each node j ∈ V is represented as a record with three fields:
- NODE(j) indicates the node number, i.e. NODE(j) = j for all j ∈ V.
- PREDECESSOR(j) indicates the nearest immediate predecessor known so far on the shortest path from the source to j. The actual shortest paths can be obtained by tracing the final PREDECESSOR values.
- SL(j) indicates the shortest length known so far from the source to j.
The algorithm can be described as follows:
    begin
      {Initialisation}
      for j := 1 to N do
      begin
        NODE(j) := j;
        PREDECESSOR(j) := 1;
        SL(j) := l_1j
      end;
      V2 := V - [1];
      repeat {Repetition}
        Determine r where SL(r) = min over j ∈ V2 of SL(j);
        V2 := V2 - [r];
        for each k in V2 do in parallel
          if SL(k) > SL(r) + l_rk then
          begin
            SL(k) := SL(r) + l_rk;
            PREDECESSOR(k) := r
          end
      until V2 = [ ]
    end
Initially, V1 contains only the source node (say node 1), and both its associated shortest length and its predecessor become permanent. At each repetition step, we look simultaneously at all nodes in V2. The node with the minimum shortest length, say node r, is determined; node r is then transferred to V1, and both its associated shortest length, SL(r), and its predecessor, PREDECESSOR(r), become permanent. For each node k ∈ V2, we calculate, in parallel, the length of the path from the source to k via the node r, say P_rk (P_rk = SL(r) + l_rk). If P_rk is smaller than SL(k), then SL(k) is updated to P_rk and PREDECESSOR(k) to r.
We will show that SL(r) is the shortest length from the source node to node r. Suppose SL(r) is not the shortest length of r; then there exists a shortest path φ to r containing some nodes other than r which are not in V1. Let x be the first such node in φ. Because G contains only nonnegative arc-lengths, the length from the source to x along φ is shorter than SL(r), and since all nodes preceding x on φ are in V1, SL(x) is at most that length. This contradicts the fact that r was selected as the node with the minimum shortest length in V2. We conclude that the path φ does not exist and SL(r) is the length of the shortest path from the source node to node r. Clearly, the algorithm terminates after all nodes are transferred to V1. Therefore, we have the following
Theorem. Let G = (V, A) be a network with nonnegative arc-lengths. The algorithm SPT determines shortest paths from the source node to all other nodes in G after a finite number of iterations.
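The pseudocode of Algorithm SPT can be simulated sequentially as in the Python sketch below. This is an illustration under stated assumptions, not the authors' implementation: nodes are numbered from 0, the "do in parallel" loop is run as an ordinary loop, and the names `spt` and `trace_path` are introduced here for the example only.

```python
import math

INF = math.inf


def spt(lengths, source=0):
    """Sequential simulation of Algorithm SPT.

    lengths[i][j] is the nonnegative arc-length l_ij (INF if no arc).
    Returns (sl, pred), where sl[j] is the shortest length from source to j
    and pred[j] is the immediate predecessor of j on a shortest path.
    """
    n = len(lengths)
    sl = [lengths[source][j] for j in range(n)]  # SL(j) := l_1j
    pred = [source] * n                          # PREDECESSOR(j) := 1
    v2 = set(range(n)) - {source}                # V2 := V - [1]
    while v2:
        r = min(v2, key=lambda j: sl[j])         # node with minimum SL in V2
        v2.remove(r)                             # transfer r to V1
        for k in v2:                             # parallel on the tree machine
            candidate = sl[r] + lengths[r][k]
            if sl[k] > candidate:
                sl[k] = candidate
                pred[k] = r
    return sl, pred


def trace_path(sl, pred, source, target):
    """Recover a shortest path by tracing PREDECESSOR back from target."""
    if sl[target] == INF:
        return None  # target is not reachable from the source
    path = [target]
    while path[-1] != source:
        path.append(pred[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Hypothetical 4-node network used only for this demonstration.
    demo = [
        [0,   4,   1,   INF],
        [INF, 0,   INF, 2],
        [INF, 2,   0,   7],
        [INF, INF, INF, 0],
    ]
    sl, pred = spt(demo)
    print(sl, pred, trace_path(sl, pred, 0, 3))
```

The selection of r costs O(N) in this sequential simulation, which gives the O(N^2) serial bound; the tree machine replaces that scan with a single sweep up the tree.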
5. Algorithm SPT on the tree machine
On the tree machine our algorithm can be implemented as follows: each leaf processing element PE(j) holds the record of node j, j ∈ V2. The PEs in the machine operate synchronously; i.e. at any given time the PEs execute the same instruction, each on a different data set. The initialization step stores a record in each processing element. This requires linear time on the tree machine. In the repetition step, finding the minimum of N numbers on the tree machine takes exactly one sweep up the tree; i.e. O(log_2 N) time. For the inner loop (the for loop), we employ the massive parallelism of the tree machine. That is, we simultaneously compute SL(r) + l_rk for every k ∈ V2. We (temporarily) make the PEs with SL(k) > SL(r) + l_rk the active PEs; i.e. we turn on only these PEs. We update the SL and PREDECESSOR fields for all active PEs. We then turn back on all PEs currently representing elements of V2 and execute the repeat loop until V2 becomes empty. Updating the appropriate fields requires constant time. Thus each repeat step requires O(log_2 N) time on the tree machine, and since this step is executed N - 1 times, the total time of the repeat loop is O(N log_2 N). That cost dominates the cost of the initial step. Thus the running time of algorithm SPT is O(N log_2 N) using N leaf PEs.
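The step count argued above can be checked with a small simulation. The Python sketch below is an assumption-laden model rather than the authors' implementation: it combines leaf values pairwise to mimic one sweep up the tree, charges one sweep up to find r, one sweep down to broadcast r and SL(r) (the downward sweep is our assumption about how the leaves learn r), and one unit for the parallel update. The names `tree_min` and `spt_on_tree` are hypothetical, nodes are numbered from 0, and N is assumed to be a power of two.

```python
import math

INF = math.inf


def tree_min(items):
    """Pairwise-combine leaf items level by level, as the circle nodes would.

    Returns (minimum item, number of levels swept). len(items) must be a
    power of two.
    """
    n = len(items)
    if n < 1 or n & (n - 1):
        raise ValueError("number of leaves must be a power of two")
    level = list(items)
    sweeps = 0
    while len(level) > 1:
        level = [min(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        sweeps += 1
    return level[0], sweeps


def spt_on_tree(lengths, source=0):
    """Simulate Algorithm SPT with one leaf PE per node; count time steps."""
    n = len(lengths)
    sl = list(lengths[source])
    pred = [source] * n
    in_v2 = [j != source for j in range(n)]
    steps = 0
    for _ in range(n - 1):
        # PEs outside V2 contribute (INF, 1, j) so that they are never chosen
        # while a node of V2 remains, even if its SL is also INF.
        leaves = [(sl[j], 0, j) if in_v2[j] else (INF, 1, j) for j in range(n)]
        (_, _, r), sweeps = tree_min(leaves)
        in_v2[r] = False
        for k in range(n):  # one synchronous step on the real machine
            if in_v2[k] and sl[k] > sl[r] + lengths[r][k]:
                sl[k] = sl[r] + lengths[r][k]
                pred[k] = r
        steps += 2 * sweeps + 1  # sweep up, broadcast down, constant update
    return sl, pred, steps


if __name__ == "__main__":
    demo = [
        [0,   4,   1,   INF],
        [INF, 0,   INF, 2],
        [INF, 2,   0,   7],
        [INF, INF, INF, 0],
    ]
    print(spt_on_tree(demo))
```

Under this cost model the total is (N - 1)(2 log_2 N + 1) steps, which is O(N log_2 N), in line with the bound stated in the text.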
6. Conclusions
A synchronous (SIMD) algorithm for finding shortest paths as well as shortest lengths from a source node to all other nodes in a network on a tree machine with N leaf PEs has been presented. The time complexity of the algorithm is O(N log_2 N) using N leaf PEs. Our algorithm performs N iterations of logarithmic cost. In the worst case, it performs on the order of N^2 calculations, which is optimal. One can also see that for any function ψ(N) growing at least as fast as log N, the algorithm can be implemented in N ψ(N) time on an N/ψ(N)-processor machine, giving an optimal processor-time product of O(N^2). The idea of developing parallel algorithms on tree machines to solve combinatorial optimisation problems is promising, deserves much more attention, and will likely lead to significant performance improvements.
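The processor-time claim can be unpacked with a short derivation. The sketch below assumes one plausible way of distributing the work, which the paper does not spell out: each of the p = N/ψ(N) leaf PEs stores ψ(N) node records, scans them locally to find its own minimum, and updates them locally after r is broadcast.

```latex
\begin{align*}
p &= \frac{N}{\psi(N)} \quad \text{(leaf PEs, each holding } \psi(N) \text{ node records)},\\
T_{\text{iter}} &= \underbrace{O(\psi(N))}_{\text{local minimum}}
  + \underbrace{O(\log_2 p)}_{\text{tree sweep}}
  + \underbrace{O(\psi(N))}_{\text{local updates}}
  = O(\psi(N)),
  \quad \text{since } \log_2 p \le \log_2 N = O(\psi(N)),\\
T &= (N - 1)\, T_{\text{iter}} = O(N\,\psi(N)),\\
p \cdot T &= \frac{N}{\psi(N)} \cdot O(N\,\psi(N)) = O(N^2).
\end{align*}
```

The product matches the O(N^2) serial time of Dijkstra's algorithm quoted in the Introduction, which is the sense in which it is optimal here. Taking ψ(N) = log_2 N gives O(N log_2 N) time on N/log_2 N processors.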
References
[1] R. Bellman, On a routing problem, Quarterly Appl. Math. 16 (1958) 87-90.
[2] J.L. Bentley and H.T. Kung, A tree machine for searching problems, in: Proc. IEEE Intern. Conf. on Parallel Processing (1979) pp. 257-266.
[3] S.A. Browning, Computation on a tree of processors, in: Proc. Caltech Conf. on VLSI (1979).
[4] N. Deo and C.Y. Pang, Shortest path algorithms: taxonomy and notation, Technical Report No. CS-80-057, Computer Science Department, Washington State University, Pullman, Washington, USA (1980).
[5] E.W. Dijkstra, A note on two problems in connexion with graphs, Numer. Math. 1 (1959) 269-271.
[6] R.W. Floyd, Algorithm 97, shortest path, Commun. ACM 9 (1962) 345.
[7] D. Nassimi and S. Sahni, Parallel permutation and sorting algorithms and a new generalized connection network, J. ACM 29 (3) (1982) 642-667.
[8] C. Siva Ram Murthy, Synchronous and asynchronous algorithms for evaluating polynomials on a tree machine, in: Proc. 4th Intern. Conf. on Supercomputing, Santa Clara, CA, 1 (1989) 177-179.
[9] J. Yadegar, D. Parkinson, S. El-Horbaty and A.M. Frieze, Algorithms for shortest paths problems on an array processor, in: Proc. 4th Intern. Conf. on Supercomputing, Santa Clara, CA, 1 (1989) 167-176.