Mesh permutation routing with locality


24 August 1992

Information Processing Letters, Volume 43, Number 2 (1992) 101-105, North-Holland

Steven Cheung and Francis C.M. Lau

Department of Computer Science, The University of Hong Kong, Hong Kong

Communicated by K. Ikeda. Received 7 March 1991; revised 31 January 1992

Abstract

Cheung, S. and F.C.M. Lau, Mesh permutation routing with locality, Information Processing Letters 43 (1992) 101-105.

Given the permutation routing problem on mesh-connected arrays with a known maximum distance, d, between any source-destination pair, we show how sorting and the greedy algorithm can be combined to yield a deterministic, asymptotically optimal algorithm for solving the problem. This simple algorithm runs in d + O(d/f(d)) time and requires an O(f(d)) buffer size (or O(d) time and constant buffer size if we choose f(d) to be a constant). It also gives efficient solutions to the k-k routing problem with locality.

Keywords: Parallel algorithms, mesh-connected computers, permutation routing, k-k routing, routing with locality, MIMD

1. Introduction

There has been a great deal of study on the permutation routing problem in two-dimensional n × n grids, which is a basic subproblem of the packet routing problem in these structures [1,3,4,7,11,15]. In the permutation routing problem, every processor has exactly one packet to send, and every processor is the destination of exactly one packet. In the model we use, all the processors are assumed to run in the synchronous MIMD mode. At any time step, each processor can communicate with all of its grid neighbours; it can both send and receive one packet along each edge. In addition, processors can store packets in their own queues. This model is the same as the one used in [1,3-5,7,11].

Leighton et al. [7] gave an optimal algorithm for the permutation routing problem which required 2n - 2 time steps and constant buffer size. These bounds are the best possible if we consider that the maximum distance between a source and its destination is 2n - 2. When the packets only need to travel a relatively small distance, however, directly applying Leighton et al.'s algorithm becomes less favourable. This problem of restricted distance routing arises, for example, when other types of processor or process networks are embedded onto mesh-connected computers with a small dilation [2,8-10,12,14].

Let d_x and d_y be the maximum among all origin-destination distances along the horizontal and vertical axes respectively, and let d_max = max(d_x, d_y). Krizanc et al. [3] proposed a randomized algorithm which ran in 3d_max + o(d_max) time and required O(log n) buffers, both with high probability. In addition, they described a randomized O(d_max) algorithm for the case in which d_x, d_y and d_max are all unknown. Leighton [6] analyzed some greedy routing algorithms¹ and showed that the basic greedy algorithm (simply called the greedy algorithm hereafter) could route packets to their destinations in d + O(log n) time steps using a very small constant buffer size, both with high probability, where d is the maximum distance between any source-destination node pair. In addition, the expected time complexity is only d + c, where c is a small constant. Kunde [4] gave a deterministic algorithm which needed 2d + O(n/f(n)) time steps and 2f(n) buffer size, [...] and O(f(n)) [...], by using blocks of size n/f(n) × n/f(n), when n ≤ d ≤ 2n - 2. In the next section, we show that we can do better than this by combining the sorting phase in [4] with the greedy algorithm, resulting in time and buffer size complexities of d + O(d/f(d)) and O(f(d)) respectively. In Section 3, we discuss the results of applying the proposed algorithm to the k-k routing problem.

¹ In greedy routing algorithms, packets always try to follow the shortest routes to their destinations.

Correspondence to: Dr. F.C.M. Lau, Department of Computer Science, The University of Hong Kong, Hong Kong.

0020-0190/92/$05.00 © 1992 Elsevier Science Publishers B.V. All rights reserved

2. Sorting + greedy algorithm

In this section, we first analyze the performance of the greedy algorithm [3,6,7] in the context of restricted distance routing, and then show how it can be combined with sorting to give a d + O(d/f(d)) time, O(f(d)) buffer size algorithm. As the distance bound is d, the resulting algorithm is asymptotically optimal.

Lemma 1. Given the restricted distance permutation routing problem with maximum distance d, if all the packets are first routed to the correct row, and then routed to the correct column using the "farthest destination first" strategy, then d time steps are sufficient and each processor needs O(d) buffers.

Proof. In the model we use, only one packet can be transmitted along each edge in each direction in each time step. If more than one packet competes for a particular edge, only one of them is selected according to some contention resolution strategy. In the farthest destination first strategy, a packet with a farther destination in the direction of the edge has higher priority [3,6,11]. We claim that at the end of the ith time step, all the remaining packets have distance < d - i + 1, where 1 ≤ i ≤ d. Consider the ith step. During this step, no packet routed along the vertical edges is blocked: initially each processor holds one packet, and the packets moving in the same vertical direction advance in unison. For packets moving toward their destination columns, those with distance d - i + 1 are always selected to proceed, because of the farthest destination first strategy and the fact that no two packets have the same destination. Therefore after d time steps, all the packets have reached their destinations. Note that in each time step at most two new packets are added to the queues of any processor, so O(d) buffers suffice. Moreover, it is easy to construct a scenario in which two additional packets are queued up per step for about d/3 steps. Hence, Ω(d) buffers are required. □

The time complexity of the greedy algorithm is very desirable, but its buffer size requirement is not. To reduce the buffer size requirement, sorting can be used initially to redistribute the packets, thus avoiding an excessive number of packets going to the same processor.
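The row-then-column greedy routing of Lemma 1 is easy to check on small instances. The following is a minimal sequential simulation under stated assumptions, not the authors' implementation: the helper names `local_permutation` and `greedy_route` are hypothetical, the test input shuffles destinations inside s × s blocks with s = ⌊d/2⌋ + 1 (so every source-destination distance is at most d), and each directed edge carries one packet per step, with conflicts resolved by largest remaining distance.

```python
import random
from collections import defaultdict


def local_permutation(n, d, rng):
    """Hypothetical test input: shuffle destinations inside s x s blocks,
    s = d // 2 + 1, so no source-destination distance exceeds d."""
    s = d // 2 + 1
    dest = {}
    for br in range(0, n, s):
        for bc in range(0, n, s):
            cells = [(r, c) for r in range(br, min(br + s, n))
                     for c in range(bc, min(bc + s, n))]
            targets = cells[:]
            rng.shuffle(targets)
            dest.update(zip(cells, targets))
    return dest


def greedy_route(packets):
    """Route (source, destination) pairs: fix the row first, then the column.

    Each directed edge carries at most one packet per step; among packets that
    want the same edge, the one with the farthest remaining distance wins.
    Returns (number of steps, largest number of undelivered packets at a node).
    """
    pos = [src for src, _ in packets]
    dst = [dest for _, dest in packets]
    steps, max_load = 0, 1
    while any(p != q for p, q in zip(pos, dst)):
        steps += 1
        requests = defaultdict(list)  # (node, move) -> indices of packets
        for i, ((r, c), (tr, tc)) in enumerate(zip(pos, dst)):
            if r != tr:
                move = (1 if tr > r else -1, 0)   # vertical phase
            elif c != tc:
                move = (0, 1 if tc > c else -1)   # horizontal phase
            else:
                continue                           # already delivered
            requests[((r, c), move)].append(i)
        for ((r, c), (dr, dc)), idxs in requests.items():
            # Farthest destination first: largest remaining Manhattan distance.
            win = max(idxs, key=lambda i: abs(dst[i][0] - r) + abs(dst[i][1] - c))
            pos[win] = (r + dr, c + dc)
        load = defaultdict(int)
        for p, q in zip(pos, dst):
            if p != q:
                load[p] += 1
        max_load = max([max_load, *load.values()])
    return steps, max_load


rng = random.Random(1992)
n, d = 12, 6
perm = local_permutation(n, d, rng)
steps, max_load = greedy_route(list(perm.items()))
print(f"d = {d}, steps = {steps}, max queue = {max_load}")
```

By Lemma 1 the reported step count never exceeds d. The queue figure is the quantity that Lemma 1 only bounds by O(d), which is what the sorting phase below is meant to reduce.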
Therefore, we refine Kunde’s algorithm as follows:


Algorithm 1 (Sorting + Greedy Algorithm for Permutation Routing)
1. Sort the packets in each block, which is of size d/f(d) × d/f(d), in row-major order (i.e. phase 1 of Kunde's algorithm).
2. Move all the packets to their destination rows, and then, using the farthest destination first strategy, move them to their destination columns.

Let (r, c) denote the address of a packet, where r and c are respectively the row and column address of the destination processor; then the row-major ordering of two packets with destination addresses (r1, c1) and (r2, c2) is defined as (r1, c1) < (r2, c2) iff r1 < r2. The processors are indexed by a one-one mapping onto {1, ..., n²}, and the sorting problem with respect to this mapping is to move the ith smallest packet to the ith indexed processor. See Fig. 1 for row-major and snake-like row-major indexing of the processors.

Theorem 2. Algorithm 1 can be used to solve the permutation routing problem on a mesh with maximum source-destination distance d using d + O(d/f(d)) time steps and O(f(d)) buffers per node.

Fig. 1. Row-major indexing (left) and snake-like row-major indexing (right).

Fig. 2. Labeling of blocks.

Proof. The proof of the buffer requirement is similar to that in [4]. Without loss of generality, consider an arbitrary row r and an arbitrary column of blocks, each of size d/f(d) × d/f(d) (see Fig. 2). Label the block in this column intersecting row r as block 0. The blocks above block 0 are indexed 1, 2, ..., where blocks nearer to block 0 have smaller indexes. Similarly for blocks below block 0, except that they are indexed by negative numbers. It is important to observe that only blocks with indexes in the range [-f(d), ..., -1, 0, 1, ..., f(d)] can have packets with row address r, since a packet travels at most d rows and each block spans d/f(d) rows. After the sorting phase², packets in every block are sorted in row-major order with respect to row-major indexing of the processors. Let b(i) be the number of packets in block i with row address r. After sorting, these packets occupy consecutive positions in the row-major indexing of block i, so at most ⌈b(i)/(d/f(d))⌉ of them lie in any one column. Hence the number L of packets in column c going to processor (r, c) according to the greedy algorithm satisfies

$$L \le \sum_{i=-f(d)}^{f(d)} \left\lceil \frac{b(i)}{d/f(d)} \right\rceil \le \sum_{i=-f(d)}^{f(d)} \left( \frac{b(i)\,f(d)}{d} + 1 \right).$$

Moreover, the total number of packets in this column of blocks with row address r is at most 2d + d/f(d), because their destinations lie in row r within horizontal distance d of the d/f(d) columns of the block column, and each processor receives exactly one packet. Hence $\sum_{i=-f(d)}^{f(d)} b(i) \le 2d + d/f(d)$, and so L ≤ (f(d)/d)(2d + d/f(d)) + 2f(d) + 1 = 4f(d) + 2.

For the time complexity, sorting the packets with respect to row-major indexing can be done in 4d/f(d) + o(d/f(d)) time: first sort them with respect to snake-like row-major indexing in 3d/f(d) + o(d/f(d)) time [13], and then use d/f(d) more steps to change to row-major indexing³. Now let the source-destination distances of a packet along the horizontal and vertical axes be d_x and d_y respectively. Clearly, after the sorting step they become at most d_x + d/f(d) and d_y + d/f(d) respectively. Therefore, by Lemma 1, the whole routing phase can be completed in at most d_x + d_y + 2d/f(d) time steps, which is bounded by d + 2d/f(d); so the whole algorithm finishes in d + 6d/f(d) + o(d/f(d)) = d + O(d/f(d)) time steps. □

² The 3n sorting algorithm in [13] is used, which does not require any extra storage.
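The buffer argument in the proof of Theorem 2 can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' method: it replaces the parallel mesh sort of [13] by an ordinary sequential sort that yields the same final placement, uses a hypothetical block-shuffle test input (`local_permutation`), and assumes f(d) divides d and d/f(d) divides n. It reports the largest column count L, which the proof bounds by 4f(d) + 2, and the largest source-destination distance after sorting, which grows by at most 2(d/f(d) - 1).

```python
import random
from collections import defaultdict


def local_permutation(n, d, rng):
    """Hypothetical test input: shuffle destinations inside s x s blocks,
    s = d // 2 + 1, so no source-destination distance exceeds d."""
    s = d // 2 + 1
    dest = {}
    for br in range(0, n, s):
        for bc in range(0, n, s):
            cells = [(r, c) for r in range(br, min(br + s, n))
                     for c in range(bc, min(bc + s, n))]
            targets = cells[:]
            rng.shuffle(targets)
            dest.update(zip(cells, targets))
    return dest


def block_sort_row_major(dest, n, w):
    """Sequential stand-in for phase 1: inside every w x w block, place the
    packets in row-major order of their destination rows (a stable sort on the
    row only). Models the final placement, not the mesh sorting time."""
    placed = {}
    for br in range(0, n, w):
        for bc in range(0, n, w):
            # Processors of the block, listed in row-major indexing order.
            cells = [(r, c) for r in range(br, br + w) for c in range(bc, bc + w)]
            pkts = sorted((dest[x] for x in cells), key=lambda t: t[0])
            placed.update(zip(cells, pkts))
    return placed


def column_load(placed):
    """Largest number of packets in one column c whose destination row is r,
    i.e. the quantity L bounded by 4 f(d) + 2 in the proof."""
    count = defaultdict(int)
    for (_, c), (tr, _) in placed.items():
        count[(tr, c)] += 1
    return max(count.values())


n, d, f = 24, 8, 2            # assumes f divides d and w = d/f divides n
w = d // f
perm = local_permutation(n, d, random.Random(7))
placed = block_sort_row_major(perm, n, w)
dist = max(abs(a - x) + abs(b - y) for (a, b), (x, y) in placed.items())
print(f"L = {column_load(placed)} <= {4 * f + 2}; max distance {dist} <= {d + 2 * w}")
```

A sequential sort is used only because it reaches the same placement as the mesh sort; the 4d/f(d) + o(d/f(d)) sorting time in the proof is not modelled here.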

³ If we stopped at snake-like row-major indexing, then the bound on the number of packets in column c going to processor (r, c) contributed by block i would be ⌈b(i)/(d/f(d))⌉ + 1 instead.

3. k-k routing

A more general version of the permutation routing problem is called the k-k routing problem, in which each processor has exactly k packets to send, and exactly k packets to receive at the end. If the maximum source-destination distance of all the packets is d, a time lower bound for this problem on an r-dimensional mesh with side length n is kd, for d ≤ ⌊n/2⌋. The proof can be derived using the cut argument in [5]: If each of the processors with address (p_1, ..., p_r), 1 [...]

[...] with respect to layer-major, row-major indexing⁴ (i.e. phase 1 of Kunde's k-k routing algorithm [5]).
2. Move all the packets to their destination rows, using the FIFO strategy to resolve conflicts, and then move them to their destination columns using the farthest destination first strategy.

Theorem 3. Algorithm 2 can be used to solve the k-k routing problem on a mesh with maximum source-destination distance d using kd + O(kd/f(d)) time steps and O(kf(d)) buffers per node.

Proof. (By induction on the number of bulk steps.) After the sorting phase, the source-destination distance of any packet is increased by at most 2d/f(d). For convenience, we define a bulk step to be k time steps, and let D = d + 2d/f(d). Induction hypothesis: at the end of the ith bulk step, all the remaining packets have distance < D - i + 1. During the (i + 1)th bulk step, all packets moving towards their destination rows are able to move one step forward, because there are at most k packets in any processor competing for a vertical communication link at any moment. Among packets moving along the horizontal links to their destination columns, those with remaining distance D - (i + 1) + 1 advance because of the farthest destination first strategy and the induction hypothesis. In other words, every packet reaches its destination after kD time steps. As sorting only requires O(kd/f(d)) time steps [5], the total number of steps required is kd + O(kd/f(d)). The proof of the O(kf(d)) buffer size bound is omitted here, because it is similar to that in the permutation case (Section 2). □

⁴ Packets in each processor are divided into several layers, with packets having smaller destination row addresses placed in layers with smaller indices; then row-major indexing is used within each layer.
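The routing phase of Algorithm 2 (step 2) can likewise be simulated. This is a sketch under assumptions, not the authors' code: the names `local_kk_instance` and `kk_route` are hypothetical, the sorting phase is skipped (so the initial maximum distance is d rather than d + 2d/f(d)), FIFO order on a vertical link is taken to mean earliest arrival at the current processor with ties broken by packet index, and one packet crosses each directed edge per step. Under these assumptions the bulk-step argument in the proof of Theorem 3 predicts at most kd steps.

```python
import random
from collections import defaultdict


def local_kk_instance(n, d, k, rng):
    """Hypothetical k-k input: every processor sends and receives exactly k
    packets; destinations are shuffled inside s x s blocks, s = d // 2 + 1,
    so no source-destination distance exceeds d."""
    s = d // 2 + 1
    packets = []
    for br in range(0, n, s):
        for bc in range(0, n, s):
            cells = [(r, c) for r in range(br, min(br + s, n))
                     for c in range(bc, min(bc + s, n))]
            targets = cells * k
            rng.shuffle(targets)
            packets += list(zip(cells * k, targets))
    return packets


def kk_route(packets):
    """Rows first (FIFO on vertical links), then columns (farthest destination
    first on horizontal links); one packet per directed edge per step.
    Returns the number of steps until every packet is delivered."""
    pos = [src for src, _ in packets]
    dst = [dest for _, dest in packets]
    arrived = [0] * len(packets)   # step at which a packet reached its node
    steps = 0
    while any(p != q for p, q in zip(pos, dst)):
        steps += 1
        requests = defaultdict(list)
        for i, ((r, c), (tr, tc)) in enumerate(zip(pos, dst)):
            if r != tr:
                move = (1 if tr > r else -1, 0)
            elif c != tc:
                move = (0, 1 if tc > c else -1)
            else:
                continue
            requests[((r, c), move)].append(i)
        for ((r, c), (dr, dc)), idxs in requests.items():
            if dr != 0:   # vertical link: first in, first out
                win = min(idxs, key=lambda i: (arrived[i], i))
            else:         # horizontal link: farthest destination first
                win = max(idxs, key=lambda i: abs(dst[i][1] - c))
            pos[win] = (r + dr, c + dc)
            arrived[win] = steps
    return steps


n, d, k = 12, 6, 3
pkts = local_kk_instance(n, d, k, random.Random(5))
print(f"k*d = {k * d}, routing steps = {kk_route(pkts)}")
```

The FIFO rule matters on vertical links: each processor never holds more than k packets heading up (or down), so every packet waiting there at the start of a bulk step leaves within k steps.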

4. Concluding remarks

We have shown a d + O(d/f(d)) step, O(f(d)) buffer size algorithm for the restricted distance permutation routing problem. By choosing f(d) to be a constant, we achieve our objective of finding an O(d) time deterministic algorithm that uses constant buffer size in the worst case. As the distance bound is d, our algorithm is asymptotically optimal. We also showed how the sorting + greedy algorithm can be used to tackle the k-k routing problem on a two-dimensional mesh with time and buffer size complexities of kd + O(kd/f(d)) and O(kf(d)) respectively. In fact, it can also be used to improve the time bound of the algorithm in [3] for the individual locality routing problem, in which any packet with source-destination distance d_i is to be routed to its destination within O(d_i) steps. Since our algorithm treats all processors of the mesh alike and does not depend on any global property of the mesh, our results also hold for rectangular meshes and wrap-around meshes. Leighton et al. [7] have made a very important contribution to the special case d = 2n - 2, but it is still open whether one can find a deterministic or randomized algorithm that solves this problem of restricted distance permutation routing using d time steps and constant buffer size.

Acknowledgment

We are grateful to Professor Francis Chin for introducing us to this problem of routing with locality and for his helpful comments, and to the anonymous referees for their useful suggestions.

References

[1] T. Han and D.F. Stanat, Move and smooth routing algorithms on mesh-connected computers, in: Proc. 28th Ann. Allerton Conf. on Communication, Control, and Computing (1990) 236-245.

[2] D. Krizanc and L. Narayanan, Off-line routing with small queues on a mesh-connected processor array (Extended abstract), in: Proc. 3rd IEEE Symp. on Parallel and Distributed Processing (1991) 301-304.

[3] D. Krizanc, S. Rajasekaran and T. Tsantilas, Optimal routing algorithms for mesh-connected processor arrays, in: Proc. Third Aegean Workshop on Computing: VLSI Algorithms and Architectures, Lecture Notes in Computer Science 319 (Springer, New York, 1988) 411-422.

[4] M. Kunde, Routing and sorting on mesh-connected arrays, in: Proc. Third Aegean Workshop on Computing: VLSI Algorithms and Architectures, Lecture Notes in Computer Science 319 (Springer, New York, 1988) 423-433.

[5] M. Kunde and T. Tensi, (k-k) Routing on multidimensional mesh-connected arrays, J. Parallel Distributed Comput. 11 (1991) 146-155.

[6] T. Leighton, Average case analysis of greedy routing algorithms on arrays, in: Proc. 1990 ACM Symp. on Parallel Algorithms and Architectures, pp. 2-10.

[7] T. Leighton, F. Makedon and I.G. Tollis, A 2n - 2 step algorithm for routing in an n × n array with constant size queues, in: Proc. 1989 ACM Symp. on Parallel Algorithms and Architectures, pp. 328-335.

[8] S. Matić, Emulation of hypercube architecture on nearest-neighbor mesh-connected processing elements, IEEE Trans. Comput. 39 (5) (1990) 698-700.

[9] R.G. Melhem and G.Y. Hwang, Embedding rectangular grids into square grids with dilation two, IEEE Trans. Comput. 39 (12) (1990) 1446-1455.

[10] B. Monien and H. Sudborough, Embedding one interconnection network in another, in: G. Tinhofer et al., eds., Computational Graph Theory (Springer, New York, 1990) 257-282.

[11] S. Rajasekaran and T. Tsantilas, An optimal randomized algorithm for the mesh and a class of efficient mesh-like routing networks, in: Proc. 7th Conf. on Foundations of Software Technology and Theoretical Computer Science, Lecture Notes in Computer Science 287 (Springer, Berlin, 1987) 226-241.

[12] D.A. Reed and D.C. Grunwald, The performance of multicomputer interconnection networks, IEEE Comput. 20 (1987) 63-73.

[13] C.P. Schnorr and A. Shamir, An optimal sorting algorithm for mesh connected computers, in: Proc. 18th Ann. ACM Symp. on Theory of Computing (1986) 255-263.

[14] L. Tao and E. Ma, Simulating parallel neighboring communications among square meshes and square toruses, in: Proc. 2nd IEEE Symp. on Parallel and Distributed Processing (1990) 850-857.

[15] L.G. Valiant and G.J. Brebner, Universal schemes for parallel communication, in: Proc. 13th Ann. ACM Symp. on Theory of Computing (1981) 263-277.