JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 35, 57–66 (1996)
ARTICLE NO. 0068
Dimension Ordering and Broadcast Algorithms in Faulty SIMD Hypercubes¹
C. S. RAGHAVENDRA* AND M. A. SRIDHAR†
*School of Electrical Engineering and Computer Science, Washington State University, Pullman, Washington 99164; and †Department of Computer Science, University of South Carolina, Columbia, South Carolina 29208
In this paper, the problem of broadcasting in an n-dimensional SIMD hypercube, Q_n, with up to n - 1 node faults is studied. In an SIMD hypercube, during a communication step, nodes can exchange information with their neighbors only across a specific dimension. The broadcasting algorithms must work independently of the location of the source node and of the faulty nodes. In a fault-free hypercube, any source node can broadcast a message to all nodes in n steps, by successive communication along an arbitrary ordering of the n dimensions. Given a set of at most n - 1 faults, an ordering d_1, d_2, ..., d_n of the n dimensions is developed, depending on where the faults are located. An important and useful property of this dimension ordering is the following: if the n-cube is partitioned into k-subcubes using the first k dimensions of this ordering, namely d_1, d_2, ..., d_k, for any 2 ≤ k ≤ n, then each k-subcube contains at most k - 1 faults. This result is then used to develop several new algorithms for broadcasting. These algorithms use n + 3 log n, n + 2 log n + 2, n + log n + O(log log n), n + log n + 5, and n + 12 time steps, respectively, and thus improve upon the best known algorithms for this problem. This ordering of dimensions is also demonstrated in the presence of node as well as link faults. It is also shown how to extend the dimension ordering theorem to handle up to (n choose 2) faults. Using this result, it seems possible to obtain even more fault-tolerant algorithms for the broadcasting problem. © 1996 Academic Press, Inc.

¹ This research is supported in part by NSF Grants MIP-9296043 and MIP-9103086.
0743-7315/96 $18.00 Copyright 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

1. INTRODUCTION

For massively parallel computers, several processor network architectures have been proposed. Among these, meshes and hypercubes are used in many experimental and commercial machines, including Intel machines [13], Ncube, and the Connection Machine [9]. The hypercube is a highly regular, symmetric, and recursive structure, and its properties have been studied extensively [7, 20]. Parallel algorithms for many problems, including linear algebra [10], image processing [18], and communication problems [3, 11], have been developed for hypercube machines. Fault tolerance in machines such as hypercubes is important in order to achieve sustained high performance computing. Therefore, it is necessary to compute important primitive functions even in the presence of faults. The hypercube network is quite robust [2, 20]; in fact, at least n faults are needed to disconnect Q_n into two components. The symmetry and robustness of the hypercube can be exploited to compute many functions efficiently even when there are about n faults in the system. Several researchers have developed algorithms for solving a wide range of different problems in the presence of processor and/or communication link faults [2, 8, 17, 4].

In this paper, we develop algorithms for an important global operation, namely, broadcasting from a specific node in the n-dimensional hypercube to all other nonfaulty processors, in the presence of up to n - 1 faulty processors. We are particularly interested in the SIMD mode of operation, in which processors operate in lock-step and every data transfer that occurs at a given time step is made along the same dimension. Further, the broadcasting algorithms presented in this paper are source independent, meaning that the sequence of dimensions used for broadcasting is the same regardless of which source is broadcasting. SIMD operation has the merit that node programs implementing the algorithms are independent of the processor ID, and therefore very simple to develop; thus, our results have direct practical application.

It is important to point out here that the broadcasting problem in general has been addressed by many researchers in the past [6, 5, 12, 1, 15]. Some of these techniques utilize only local information about faulty nodes and/or the MIMD mode of operation. Others use a table-based approach and require a large amount of preprocessing. However, very few papers have addressed fault-tolerant broadcasting in the SIMD mode of operation. A recent paper studies broadcasting with only link faults and presents an optimal broadcasting algorithm with up to n - 1 link faults [14]. This algorithm works in the SIMD mode of operation; however, it is not source independent, which means that the successive dimensions used depend on the source node address.

We also make two further simplifying assumptions: (1) that there are no more than n - 1 faulty node processors, and no faulty links, in the network and (2) that we know a priori the set of faulty nodes in the network. Our primary intent in making these assumptions is to gain insight into
the broadcasting problem in this restricted model of parallel computation, and to lay the groundwork for handling more complex cases. The preclusion of link faults is not a severe restriction, and can easily be relaxed. The assumption of a priori knowledge of faulty nodes, on the other hand, allows us to improve upon previously known algorithms for this problem [21, 17], which require 2n time steps. Moreover, since processor faults occur infrequently in practice, it is reasonable to assume that fault information is broadcast to all nodes (using, for example, the algorithms of [17]) so that the more efficient algorithms proposed here can be used for subsequent application execution that may require frequent broadcast operations.

In the context of these assumptions, we describe a series of new algorithms for broadcasting. The first two are simple and practical. The first uses n + 3 log n time steps, and the second uses n + 2 log n + 2 time steps. The third algorithm is intended to exhibit an asymptotic speedup using recursive application of earlier algorithms, and uses n + log n + O(log log n) time steps. By further analyzing fault patterns, two additional algorithms are developed which take n + log n + 5 steps and n + 12 steps, respectively. We emphasize that we are measuring the actual number of steps needed for broadcasting; hence, there is a significant reduction in the number of extra steps beyond the n steps that are always required. This last algorithm is nearly optimal, since it is known that even with MIMD operation it takes n + 1 steps for broadcasting in the presence of up to n - 1 faults [16]. Our algorithms are all based upon a certain property of the dimensions of the hypercube in the presence of faults; this property is interesting in its own right and is likely to have other applications. These algorithms perform better than previously known algorithms for this SIMD fault-tolerant broadcasting problem.
However, if one relaxes the requirement of SIMD mode and allows MIMD or synchronous MIMD mode of operation, then it is possible to perform broadcasting in n + 1 time steps [16]. We also sketch a way of handling link faults for these algorithms. In Section 6, we will show how the main theorem can be extended to handle up to (n choose 2) node faults. This extended theorem can be used for obtaining more fault-tolerant algorithms; the construction of such algorithms, however, is omitted from the present paper.

2. BACKGROUND AND NOTATION
We will use the term k-cube to mean a k-dimensional hypercube, and the term k-subcube to mean a k-dimensional subcube of some larger cube. Every subset S ⊆ {1, 2, ..., n} of the dimensions of Q_n induces a collection of 2^(n - |S|) |S|-subcubes, each obtained by varying the bits of the dimensions of S over all possible values while fixing the rest of the bits at an arbitrary value. However, when given a particular node x ∈ Q_n and a particular subset S ⊆ {1, 2, ..., n} of dimensions, there is a unique |S|-subcube induced by S containing x. We will denote this subcube by x⟨S⟩. For example, given a node x ∈ Q_5, we denote by x⟨2, 4, 5⟩ the 3-subcube induced by the dimensions 2, 4, and 5 containing x.

It is well known that the hypercube Q_n can be viewed as the Cartesian product of the single edge (Q_1) with itself, taken n times. We will occasionally find it convenient to take a related viewpoint, namely that Q_n can be thought of as the Cartesian product Q_k × Q_(n-k), for any k in the range 1 ≤ k ≤ n - 1. This essentially means viewing Q_n as an (n - k)-dimensional cube, each of whose nodes is really a "supernode" that contains a k-dimensional cube. Given a specific set D of dimensions of Q_n, i.e., D ⊆ {1, 2, ..., n}, we denote by Q_n/D the (n - |D|)-dimensional cube that uses the dimensions not in D, whose supernodes use the dimensions in D. We will say that the dimensions in D are the internal dimensions (used within the supernodes), and those not in D are the external ones.

Throughout the rest of this paper, we will address the situation in which we are given a particular n-cube Q_n and a particular set F of faulty (or otherwise unavailable) nodes in Q_n such that |F| ≤ n - 1. A doubly faulty edge is defined to be an edge with both end nodes faulty.

3. ALGORITHMS FOR BROADCASTING
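Before turning to the ordering theorem, the subcube notation x⟨S⟩ from Section 2 can be made concrete with a minimal Python sketch. The bit-string encoding of nodes (dimension 1 is the leftmost bit), the function name `subcube`, and the sample node are assumptions made here for illustration; they are not part of the paper.

```python
from itertools import product

def subcube(x, dims):
    """Return the nodes of x<S>: vary the bits of x in the dimensions `dims`.

    x    : node label as a bit string; dimension 1 is the leftmost bit (assumed convention)
    dims : iterable of 1-based dimension numbers (the set S)
    """
    dims = sorted(dims)
    nodes = []
    for bits in product("01", repeat=len(dims)):
        label = list(x)
        for d, b in zip(dims, bits):
            label[d - 1] = b          # overwrite the bit in dimension d
        nodes.append("".join(label))
    return nodes

# x<2, 4, 5> in Q_5: a 3-subcube with 2^3 nodes that contains x itself.
example = subcube("10110", [2, 4, 5])
```

Only the bits in dimensions 2, 4, and 5 vary across `example`; the bits in dimensions 1 and 3 stay fixed at the values they have in x.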
The following theorem asserts the existence of a particular ordering of the dimensions of Q_n and is central to our subsequent constructions.

THEOREM 1. The dimensions of Q_n can be ordered as (d_1, d_2, ..., d_n) such that for every k, 2 ≤ k ≤ n, every subcube induced by the dimensions (d_1, d_2, ..., d_k) contains at most k - 1 faulty nodes.

Proof. Assume that the fault set F is ordered arbitrarily, as F = (f_1, f_2, ..., f_(n-1)). We will first prove another fact using induction: that there is an ordering (i_1, i_2, ..., i_(n-2)) of some n - 2 dimensions such that, for every j, 1 ≤ j ≤ n - 2, no two faulty nodes among f_1, ..., f_(j+1) agree in their values of the j-bit vector in positions i_1, ..., i_j. Define the jth dimension i_j as follows:
(a) Choose i_1 to be any dimension in which f_1 disagrees with f_2.
(b) For j ≥ 2, if f_(j+1) agrees with some earlier faulty node f_l (l ≤ j) in all the dimensions i_1, i_2, ..., i_(j-1), then choose any dimension t in which f_(j+1) disagrees with f_l, and let i_j = t.
(c) If there is no such f_l, then choose for i_j any dimension that is not already chosen.
The base case, of choosing the first dimension i_1, is easy to verify. To show the inductive step, note that when choosing i_j, if f_(j+1) disagrees with all of f_1, f_2, ..., f_j in the (j - 1)-bit vectors induced by bit positions i_1, ..., i_(j-1), then we can choose any dimension for i_j that is not already chosen, and our inductive hypothesis will be preserved. On the other hand, if there is some earlier f_l with which f_(j+1) agrees on
these j - 1 bits, then there is at most one such f_l, by the inductive hypothesis; and we only need to choose some dimension for i_j on which f_(j+1) and f_l disagree.

We now construct the ordering required by the theorem by essentially using the reverse ordering of the i_j's. We choose d_1 and d_2 to be the two dimensions that were not chosen in the above procedure, and for j ≥ 3, d_j = i_(n-j+1). To see that this choice of the d's does indeed meet the conditions of the theorem, consider the dimensions d_1, d_2, ..., d_j. If we vary these bit positions over all possible values, while fixing the remaining bits arbitrarily, we get a j-subcube that contains at most j - 1 faults. This is because, from the proof of the inductive hypothesis above, no two faulty nodes among f_1, f_2, ..., f_(n-j+1) agree on their values of the (n - j)-bit vector in positions d_(j+1), ..., d_n; in other words, every j-cube formed by varying the bits in dimensions d_1, d_2, ..., d_j contains at most j - 1 faults, i.e., the faulty nodes f_(n-j+2), ..., f_(n-1) together with one of f_1, f_2, ..., f_(n-j+1). For example, consider the case j = 2. From the proof of the inductive hypothesis above, no two faulty nodes agree on their values of the (n - 2)-bit vector in positions d_3, d_4, ..., d_n; i.e., every 2-cube formed with edges in dimensions d_1 and d_2 contains at most one fault. ∎

To illustrate the above procedure for constructing the dimension ordering, consider the following example. Suppose that n = 6, and that we have five faulty nodes f_1 = 011101, f_2 = 000101, f_3 = 110101, f_4 = 111000 and f_5 = 101000, with dimensions numbered from the leftmost bit. For convenience we can represent these faulty nodes in the form of Fig. 1. We can choose i_1 = 2, since f_1 and f_2 disagree in dimension 2. Now since f_3 agrees with f_1 in dimension 2, we have to pick a dimension in which f_3 and f_1 disagree; we can choose i_2 = 1. Similarly, f_4 agrees with the earlier node f_3 in both the previously chosen dimensions 2 and 1, so we choose i_3 = 3, since f_4 and f_3 disagree in dimension 3.
Finally, since none among f_1, ..., f_4 agree with f_5 in all three dimensions 2, 1 and 3, we choose i_4 = 4 (some dimension not already chosen). One resulting dimension ordering is (5, 6, 4, 3, 1, 2), since the dimension ordering is the reverse of the order of choices we made. It is easy to see that the claims of the theorem are met by this example. For instance, for the three dimensions (d_1, d_2, d_3) = (5, 6, 4), we can see that there is no more than one fault with any given bit pattern in the remaining three dimensions 1, 2, and 3.

FIG. 1. An example set of faults.

When given the fault set F, it is not hard to construct an O(n^2) sequential algorithm that outputs the dimension ordering for the given fault set. Such an algorithm would make use of some of the bit-twiddling tricks described in [19]. Again, we would like to point out that faults occur very infrequently, so the set of faulty nodes changes rarely. Thus, even though this ordering computation takes O(n^2) time, it helps in speeding up the broadcasting operation. Broadcasting is a common operation in parallel computations and is used quite often in application programs. Therefore, the performance of application programs can be improved with more efficient broadcasting by using the algorithms described in this paper.

An interesting special case of this ordering, one that will be useful subsequently, is the following corollary.

COROLLARY 2. In the ordering of the above theorem, let d_j be any dimension such that j ≥ 3. Then every 3-subcube induced by the dimensions {d_1, d_2, d_j} contains at most two faulty nodes.

Proof. Let d_j be any dimension among d_3, d_4, ..., d_n. Let C be any 2-subcube that uses dimensions d_1 and d_2. By the above theorem, C contains at most one faulty node; again by the above theorem, the 2-subcube that is the "image" of C along dimension d_j contains at most one faulty node. Thus the 3-subcube formed by d_1, d_2, d_j contains at most two faults. ∎

3.1. Broadcast Algorithms

The above theorem can now be applied to obtain efficient algorithms for SIMD broadcasting. In general, it is possible to completely specify an SIMD algorithm for broadcasting by simply describing the sequence of (not necessarily distinct) dimensions along which the algorithm requires its nodes to transmit data during each time step. To illustrate this point, here is a simple broadcasting algorithm in a fault-free cube:

ALGORITHM A. Execute n steps, such that during the ith step, every node sends its message along dimension i. This algorithm is described by the sequence (1, 2, ..., n).

This algorithm does not work when one or more faulty nodes exist. However, a straightforward modification of this algorithm tolerates f ≤ n - 1 faulty nodes and uses n + f + 1 time steps:

ALGORITHM B.
Execute 2n steps, such that the sequence of dimensions along which the nonfaulty nodes send messages is 1, 2, ..., n, 1, 2, ..., n.

Algorithm B correctly broadcasts messages from an arbitrary source node to every other node in the presence of up to n - 1 faulty nodes [22]. That this 2n-step algorithm correctly broadcasts in a hypercube with n - 1 faults was shown in [17], where the first n steps are used to communicate to all neighbors of the source node and the next n steps are used for recursive doubling. Here we give a simple proof that Algorithm B works correctly in SIMD fashion, by showing that a message from the source exercises all n node-disjoint paths to any destination node. Let S be the source node and consider some arbitrary nonfaulty destination node D. Let S and D differ in bits
(j_1, j_2, ..., j_m) and be the same at bit positions k_1, k_2, ..., k_(n-m). It is known that there are n node-disjoint paths between S and D [20]. Within the m-subcube containing S and D there are m node-disjoint paths. Their dimension sequences are (j_1, j_2, ..., j_m), (j_2, j_3, ..., j_m, j_1), ..., (j_m, j_1, ..., j_(m-1)). These paths are obtained by rotating the first string, as given in [20]. There are n - m node-disjoint paths outside the subcube containing nodes S and D; each of these paths has a dimension sequence which starts and ends with some k_i, 1 ≤ i ≤ n - m, and has j_1, j_2, ..., j_m in between, i.e., the sequence k_i (j_1, j_2, ..., j_m) k_i. Note that the dimensions inside the parentheses can be traversed in any order. All these paths are node-disjoint, as they are paths in independent subcubes.

When executing Algorithm B, the message from source S travels through all n node-disjoint paths to any nonfaulty destination node D. Consider a path (j_l, j_(l+1), ..., j_m, j_1, j_2, ..., j_(l-1)), which is a path in the subcube containing S and D. Clearly, the message from S travels along (j_l, j_(l+1), ..., j_m) in the first n steps of Algorithm B, and it travels through the remaining dimensions in the second n steps of the algorithm. Consider a path that is outside of the subcube containing S and D, which starts and ends with dimension k_i. If k_i < j_1 or k_i > j_m, then consider the path k_i (j_1, j_2, ..., j_m) k_i; otherwise, consider the path k_i (j_l, j_(l+1), ..., j_m, j_1, j_2, ..., j_(l-1)) k_i, where j_(l-1) < k_i < j_l. Clearly, in the 2n steps of Algorithm B, the message from node S reaches destination D through one of these two paths. Since the message from S exercises all n node-disjoint paths to any arbitrary nonfaulty destination node D, Algorithm B correctly broadcasts in 2n steps.

Figure 2 shows broadcasting in a 5-dimensional hypercube with four faults, drawn according to the dimension ordering. The source node is marked S and faulty nodes are marked by X.
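The correctness argument for Algorithm B can be spot-checked by brute force on a small cube. The following Python sketch simulates the SIMD model described above (every nonfaulty holder sends across the same dimension in each step) and compares the sequences of Algorithms A and B on Q_4. The integer node encoding, the function names, and the choice of n = 4 are assumptions of this sketch, not part of the paper.

```python
from itertools import combinations

def simd_broadcast(n, source, faults, seq):
    """Simulate an SIMD broadcast in Q_n; return the set of nodes holding the message.

    Nodes are integers 0 .. 2^n - 1; dimension d (1-based) is taken to flip bit d - 1.
    """
    faults = set(faults)
    have = {source}
    for d in seq:
        mask = 1 << (d - 1)
        arrivals = {x ^ mask for x in have}
        have |= arrivals - faults      # faulty nodes neither receive nor forward
    return have

def broadcast_ok(n, source, faults, seq):
    """True if every nonfaulty node ends up holding the message."""
    good = set(range(2 ** n)) - set(faults)
    return simd_broadcast(n, source, faults, seq) == good

n = 4
algorithm_a = list(range(1, n + 1))    # (1, 2, ..., n)
algorithm_b = algorithm_a * 2          # (1, ..., n, 1, ..., n)

# Algorithm A can fail with a single fault: with node 1 faulty, source 0
# never gets a message across dimension 1, so node 3 is never reached.
a_fails = not broadcast_ok(n, 0, {1}, algorithm_a)

# Algorithm B: exhaustive check over every set of n - 1 faults
# and every nonfaulty source in Q_4.
b_always_ok = all(
    broadcast_ok(n, s, f, algorithm_b)
    for f in combinations(range(2 ** n), n - 1)
    for s in range(2 ** n) if s not in f
)
```

An exhaustive check of this kind is only feasible for small n; it illustrates the claim rather than replacing the path argument given above.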
Figure 2 is drawn using the dimension ordering obtained as above; thus, each subcube formed with d_1, d_2 will have at most one fault, and each subcube with d_1, d_2, d_3 will have at most two faults. For clarity, not all links between corresponding nodes are shown. Broadcasting Algorithm B will use the dimension sequence d_1, d_2, d_3, d_4, d_5, d_1, d_2, d_3, d_4, d_5. Consider a nonfaulty node D which differs from S in dimensions d_1, d_3, d_4, and d_5. The node-disjoint paths between S and D exercised in the above algorithm are (d_1, d_3, d_4, d_5), (d_3, d_4, d_5, d_1), (d_4, d_5, d_1, d_3), (d_5, d_1, d_3, d_4) and (d_2, d_3, d_4, d_5, d_1, d_2). The first two paths are fault-free and the last three paths will be blocked by faults.

FIG. 2. Broadcasting in a 5-dimensional hypercube.

In light of the above discussion, the approach we take to constructing SIMD broadcast algorithms is to simply specify the sequence of dimensions to be used by the algorithm. We will assume a priori knowledge of which nodes are faulty, and then derive the dimension sequence using this knowledge. The algorithms we describe here closely follow Algorithm A above, in that they attempt to execute data transfers along the same sequence of dimensions. However, since we consider the scenario in which up to n - 1 nodes are faulty, our algorithms will need to interrupt this sequence with periodic "fix-up" phases. Thus our algorithms execute in a series of "phases"; each phase is designated as either a transmission phase (corresponding to the subsequence obtained from Algorithm A) or a flooding phase (in which fix-up occurs). The flooding phase is intended to allow nonfaulty nodes that would ordinarily have received the message, but did not do so because of the presence of other faulty nodes, to catch up. Typically, our algorithms will consist of several long transmission phases, each of which is followed by a short flooding phase that compensates for faulty processors.

Consider broadcasting from node S in the 5-dimensional faulty hypercube shown in Fig. 2. Our approach now is to first broadcast within a small cube of dimension k spanned by dimensions d_1, d_2, ..., d_k.
This subcube has at most k - 1 faults, and so the number of nonfaulty nodes with the message after broadcasting in the k-subcube will be at least 2^k - k + 1. Next, we transmit along d_(k+1), d_(k+2), ..., d_(2^k - k). The observation to be made here is that after each transmission at most one other nonfaulty node within the k-subcube may be blocked by a fault. We can continue the transmissions along successive dimensions until just one nonfaulty node has the message within a k-subcube. Therefore, we transmit along dimensions d_(k+1) through d_(2^k - k), at which time there can be a k-subcube with just one nonfaulty node with the message. Now, we need to perform local copying (broadcasting) to all nonfaulty nodes within k-subcubes.

Referring to Fig. 2, let us say we broadcast within 3-cubes first, then transmit along the other dimensions, and then copy to all nodes within each 3-cube that did not receive the message. So, the sequence used for the example in Fig. 2 will be d_1, d_2, d_3, d_1, d_2, d_3 followed by d_4, d_5, (d_1, d_2, d_3, d_1, d_2, d_3). The second part of the sequence, shown in parentheses, consists of the local copying steps, so that all nodes within each 3-subcube get the message. This is a small example, but with a larger n this algorithm will have fewer steps than Algorithm B. Now, we present this Algorithm C for general values of k and n.
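Both the walk-through above and the algorithms that follow start from the dimension ordering of Theorem 1. A minimal Python sketch of the selection rule (a)-(c) from the proof is given here, run on the six-dimensional example of Section 3. The function names, the bit-string encoding (dimension 1 is the leftmost bit), and the tie-breaking rule of always taking the lowest-numbered eligible dimension are assumptions of this sketch; the paper leaves those choices open, and the sketch favors clarity over the O(n^2) bound mentioned in the text.

```python
from collections import Counter

def dimension_ordering(faults, n):
    """Return (d_1, ..., d_n) built by rules (a)-(c) in the proof of Theorem 1.

    faults : list of distinct bit strings of length n, in the order f_1, f_2, ...
    """
    chosen = []                                   # i_1, i_2, ..., i_(n-2)
    for j in range(1, n - 1):                     # j = 1 .. n-2
        pick = None
        if j < len(faults):                       # f_(j+1) exists
            new = faults[j]
            for old in faults[:j]:
                # Rules (a)/(b): an earlier fault agreeing on all chosen dimensions
                if all(old[d - 1] == new[d - 1] for d in chosen):
                    pick = next(d for d in range(1, n + 1)
                                if old[d - 1] != new[d - 1])
                    break
        if pick is None:                          # rule (c): any unused dimension
            pick = next(d for d in range(1, n + 1) if d not in chosen)
        chosen.append(pick)
    unused = [d for d in range(1, n + 1) if d not in chosen]   # d_1, d_2
    return unused + chosen[::-1]                  # reverse order of the i_j's

def max_faults_per_subcube(faults, order, k):
    """Largest number of faults in any subcube induced by d_1, ..., d_k."""
    outside = order[k:]                           # bits that identify the subcube
    counts = Counter(tuple(f[d - 1] for d in outside) for f in faults)
    return max(counts.values(), default=0)

faults = ["011101", "000101", "110101", "111000", "101000"]
order = dimension_ordering(faults, 6)             # matches the text: (5, 6, 4, 3, 1, 2)
```

With this tie-breaking rule the sketch reproduces the choices i_1 = 2, i_2 = 1, i_3 = 3, i_4 = 4 made in the example, and `max_faults_per_subcube` can be used to check the bound k - 1 of Theorem 1 for each prefix length k.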
ALGORITHM C.
Input. A cube Q_n, the source node S for the broadcast, a fault set F containing at most n - 1 faults, and a parameter k such that 3 ≤ k ≤ n - 1.
Output. A dimension sequence for SIMD broadcasting from S to all nonfaulty nodes.
Method.
(1) Construct the dimension ordering (d_1, ..., d_n) of Theorem 1.
(2) Use Algorithm B to obtain the dimension sequence Seq for broadcast from S to the k-subcube induced by d_1, d_2, ..., d_k containing S. Note that Seq is now of length 2k.
(3) Set i ← k.
repeat
(3a) Let m = min(i + 2^k - k - 1, n). Append to Seq the sequence (d_i, d_(i+1), ..., d_m), and make each node transmit its data along the dimensions in this sequence. (This is the transmission phase.)
(3b) Append to Seq the sequence (d_1, ..., d_k, d_1, ..., d_k), and make each node transmit its data along the dimensions in this sequence. (This is the flooding phase.)
(3c) If m ≥ n, then terminate this loop.
(3d) Set i ← i + 2^k - k.
forever.

Analysis of Algorithm C. The sequence output by this algorithm correctly broadcasts from S because it maintains the following invariant:

Invariant for Algorithm C. Before step 3a begins, all nonfaulty nodes in the subcube S⟨d_1, ..., d_(i-1)⟩ have received S's message (where i is as described in step 3).
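The Method steps above can be sketched in Python as follows. The sketch takes the ordering of Theorem 1 as an input rather than computing it, and it starts the first transmission phase at d_(k+1), following the worked example for Fig. 2; this reading of the index i, the function names, and the integer node encoding are assumptions of the sketch, not statements from the paper. The demonstration uses the six-dimensional fault set and the ordering (5, 6, 4, 3, 1, 2) from the example following Theorem 1.

```python
def algorithm_c_sequence(order, k):
    """Dimension sequence of Algorithm C for the ordering (d_1, ..., d_n) and parameter k."""
    n = len(order)
    flood = order[:k] * 2                 # Algorithm B inside a k-subcube: d_1..d_k twice
    seq = list(flood)                     # step (2): broadcast in the base k-subcube
    chunk = 2 ** k - k                    # transmissions between two flooding phases
    i = k                                 # 0-based index of d_(k+1) (assumed starting point)
    while i < n:
        seq += order[i:i + chunk]         # step (3a): transmission phase
        seq += flood                      # step (3b): flooding phase
        i += chunk                        # step (3d)
    return seq

def simd_broadcast(n, source, faults, seq):
    """Nodes are n-bit integers; dimension d (1-based) is the d-th bit from the left."""
    have = {source}
    for d in seq:
        mask = 1 << (n - d)
        have |= {x ^ mask for x in have} - faults   # faulty nodes never hold the message
    return have

n, k = 6, 3
faults = {int(b, 2) for b in ["011101", "000101", "110101", "111000", "101000"]}
order = [5, 6, 4, 3, 1, 2]                # ordering from the example after Theorem 1
seq = algorithm_c_sequence(order, k)
reached = simd_broadcast(n, 0, faults, seq)
all_good = set(range(2 ** n)) - faults
```

For this instance the sequence has 15 steps (one base flood, three transmissions, one further flood), compared with 2n = 12 for Algorithm B; as the text notes, the advantage of Algorithm C only appears for larger n.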
It is clear from Algorithm B that this invariant holds before the loop of step 3 is entered, i.e., for i = k. We can show, using an inductive argument, that the invariant holds for subsequent executions of the loop as well. Consider an execution of step 3a. This step involves a total of 2^k - k transmission steps. Before step 3a begins, we know (by the inductive hypothesis) that at least 2^k - k + 1 nonfaulty nodes in any k-subcube of S⟨d_1, ..., d_(i-1)⟩ induced by dimensions d_1, ..., d_k have received S's message. After each transmission of step 3a, at most one of these nodes will lose its message to a faulty neighbor. Therefore, it is possible that after 2^k - k + 1 steps, an entire k-subcube of S⟨d_1, ..., d_(i-1)⟩ (induced by edges in dimensions d_1, ..., d_k) will be blocked from receiving the message because of faulty processors encountered. We remedy this possibility by halting after 2^k - k steps, so that after the execution of step 3a, in any k-subcube induced by the dimensions d_1, ..., d_k, at least one nonfaulty processor has received the message. This processor then transmits the message to its entire subcube using the flooding phase, which is essentially Algorithm B executed inside a k-subcube. Note that here we made full use of the fact that Algorithm B
functions correctly in SIMD fashion regardless of fault structure. It is also easy to see, from the description of the algorithm, that the total number of transmissions used by all of the transmission phases is n. Also, there are a total of (n - k)/(2^k - k) flooding phases, each of which uses 2k transmission steps. Thus the algorithm uses a total of

\[
\le 2k + (n - k) + 2k \left\lceil \frac{n-k}{2^k-k} \right\rceil
= n + 2k \left( \left\lceil \frac{n-k}{2^k-k} \right\rceil + \frac{1}{2} \right)
\]

transmission steps; this improves upon the 2n steps used by Algorithm B.

ALGORITHM C′. Note that the special case k = log n yields an algorithm that uses n + 3 log n transmission steps. We will refer to this special case as Algorithm C′.

There is another way to exploit the result of Theorem 1. We can choose any constant k, such that 1 ≤ k ≤ log n - 1, and obtain an algorithm whose time bound depends on k. The idea is as follows. View Q_n as Q′_n = Q_n/{d_(k+2), ..., d_n}, which contains 2^(k+1) supernodes, each of which is an (n - k - 1)-dimensional cube. Consider the effect of transmissions from S along d_1, d_2, ..., d_(k+1), in that order. After the transmission along d_i, the message will have reached at least 2^(i-1) + 1 nodes, because only one node's message can be blocked by a fault after each transmission. Thus after the sequence of transmissions, there are at least 2^k + 1 supernodes of Q′_n, each of which contains a node with the message. Among these supernodes, there exists at least one supernode H containing a number of faults f that is no more than (n - 1)/(2^k + 1). (We do not necessarily need to identify this supernode; its existence suffices for our purposes.) Thus, if we now use dimensions d_(k+2), ..., d_n, d_(k+2), ..., d_(k+f+1), we have effectively applied Algorithm B inside H, and thus all of the nonfaulty nodes of H receive the message. Finally, we use Algorithm B along the external dimensions of Q′_n to complete the broadcast in Q_n.

ALGORITHM D.
Input. A cube Q_n, the source node S for the broadcast, a fault set F containing at most n - 1 faults, and a value k ∈ {1, 2, ..., log n - 1}.
Output. A dimension sequence for SIMD broadcasting from S to all nonfaulty nodes.
Method.
1. [Send to a destination subcube H.] Transmit along d_1, d_2, ..., d_(k+1).
2. [Broadcast within H.] Use Algorithm B to broadcast from the target node to all nodes within H. This uses n - k + (n - 1)/(2^k + 1) steps.
3. [Broadcast to nodes outside H.] Every one of the
nonfaulty nodes y ∈ H now has S's message. Each node y ∈ H now uses Algorithm B to broadcast in the subcube y⟨d_1, d_2, ..., d_k, d_(k+1)⟩.

The total number of steps used by Algorithm D is
\[
\le (k + 1) + \left( n - k + \left\lfloor \frac{n-1}{2^k+1} \right\rfloor \right) + (2k + 2)
= n + 3 + \left\lfloor \frac{n-1}{2^k+1} \right\rfloor + 2k.
\]
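As a quick sanity check of this bound, the following worked evaluation plugs in the illustrative values n = 16 and k = 3 (chosen here only as an example; they do not appear in the paper). It reads the bracketed term as a floor, on the assumption that it bounds the smallest fault count among the 2^k + 1 supernodes reached in step 1:

```latex
\[
(k+1) + \left( n - k + \left\lfloor \frac{n-1}{2^k+1} \right\rfloor \right) + (2k+2)
= 4 + \left( 13 + \left\lfloor \tfrac{15}{9} \right\rfloor \right) + 8
= 4 + 14 + 8 = 26 ,
\]
```

which agrees with n + 3 + 1 + 2k = 26 and is below the 2n = 32 steps of Algorithm B on the same cube.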
For the special case k = log n, this algorithm uses n + 2 log n + 3 time steps. A slight improvement to this bound can be obtained in the same case (so that step 1 uses the log n + 1 dimensions d_1, d_2, ..., d_(k+1)), by observing that the subcube H reached in step 1 is guaranteed to be completely fault-free, so that broadcasting in H takes n - log n - 1 steps. This results in an algorithm that takes n + 2 log n + 2 steps.

3.2. An Asymptotic Improvement Using Recursive Application

It is possible to generalize the ideas of the previous subsection to obtain asymptotically faster algorithms. One such possibility is now shown. The approach is to use an "exponentiation" strategy that makes better use of the idea of Algorithm C. The central idea of this extension is the observation that after one execution of step 3 of Algorithm C, we have successfully completed a broadcast operation in the subcube S⟨d_1, d_2, ..., d_16⟩. This cube contains at least 2^16 - 15 nonfaulty nodes, because of the ordering of Theorem 1. We can now use this cube as the base cube for broadcast in the next step; thus we can run the transmission phase for the next 2^16 - 14 time steps before flooding is required in the base 16-cube. This flooding requires an additional 32 steps. This process of increasing the size of the base cube can be repeated until all n dimensions are covered. In fact, we can use Algorithm C′ itself for this flooding phase, so that we get better asymptotic behavior.

Referring to Fig. 2, we can broadcast first within 2-subcubes spanned by d_1, d_2, then increase the size to 4-cubes, etc. The dimension sequence used for the example in Fig. 2 with this recursive algorithm will be d_1, d_2, d_1, d_2, then d_3, d_4, (d_1, d_2, d_1, d_2) and finally, d_5, (d_1, d_2, d_1, d_2). Again, dimensions in parentheses correspond to local copying steps. The number of steps needed by this algorithm will be smaller than for the previous algorithms when n is large. We now describe this Algorithm E for general n.

ALGORITHM E.
1. Set i ← 1, m ← 2. Apply Algorithm B to broadcast from S to all nonfaulty nodes within the subcube S⟨d_i, ..., d_(i+m-1)⟩.
2. while m ≤ n do
3. [Transmission phase.] Execute 2^m - m - 1 transmission steps along the dimensions d_(i+m), ..., d_(i+2^m-1).
4. [Flooding phase.] At every nonfaulty node y reached by step 3, apply Algorithm C′ to broadcast from y within the subcube y⟨d_i, d_(i+1), ..., d_(i+m-1)⟩.
5. Set i ← m, m ← 2^m.
end while.

As with Algorithm C, the transmission phases of this algorithm use a total of n time steps, since every execution of step 3 above uses a different set of dimensions. To account for the times used by the flooding phases, note that each execution of step 4 uses m + 3 log m time steps. Since we initially set m = k and then set m to 2^m every time, the sequence of values that m takes on is k, 2^k, 2^(2^k), ..., log log n, log n. The corresponding costs of step 4, written in the opposite order, are log n + 3 log log n, log log n + 3 log log log n, .... Thus this algorithm uses n + log n + 4 log log n + 4 log log log n + ... + 4 log* n time steps. It is possible to get a similarly improved algorithm based on Algorithm D; however, it does not seem possible to do better than improving the constants multiplying the logarithmic factors in the above expression for the time bound.

ALGORITHM F. We now show how to use Theorem 1 in yet another way to construct an efficient broadcast algorithm.

[Stage 1.] Consider the effect of a sequence of transmissions from the source S along dimensions d_1, d_2, ..., d_(log n + 1). It is easy to construct an inductive argument using Theorem 1 to show that after the i-th step in this sequence, at least 2^(i-1) + 1 nonfaulty nodes will have received S's message. Thus at the end of the sequence, at least n + 1 nonfaulty nodes in the subcube H = S⟨d_1, d_2, ..., d_(log n + 1)⟩ will have received the message.

[Stage 2.] Next, we continue with transmissions along the dimensions d_(log n + 2), ..., d_n. During this process, for every (log n + 1)-dimensional subcube J induced by dimensions d_1, ..., d_(log n + 1), each node in H that was reached at the end of stage 1 will attempt to send its message to its image in J. Note that the paths from nodes in H to their images in J are node-disjoint.
Since H contains at least n + 1 nodes with messages, and there are at most n - 1 faults, every J must contain at least one good node that receives the message at the end of stage 2.

[Stage 3.] Transmit along dimensions d_1, d_2, ..., d_(log n + 1). At the end of stage 3, all nodes in any fault-free supernode J in Q_n/H (where H is as in stage 2) will have received the data, i.e., will have been "filled." Therefore, let us think of any "unfilled" supernode that contains at least one faulty node as faulty. We see that Q_n/H is an (n - log n - 1)-dimensional cube with at most n - 1 faulty supernodes. There is at least one dimension d_j (j ≥ log n + 2) of Q_n/H along which there is no more than one doubly faulty edge; this is because at least 3(n - log n - 1)/2 faults in Q_n/H are needed to produce two or more doubly faulty edges in every dimension [23]. Thus we can consider the family of 2-subcubes induced by d_j and d_(j+1) in Q_n/H. At least one supernode in every such subcube is fault-free,
for otherwise there would be two doubly faulty $d_j$-edges in $Q_n/H$. By transmitting on $d_j, d_{j+1}$, all nodes in every supernode in $Q_n/H$ receive the message, except those individual nonfaulty nodes, say $x$, whose neighbors along $d_j$ and $d_{j+1}$ are faulty. Now, consider the 2-subcube induced by $d_1$ and $d_2$ that includes such a node $x$. First, there can be at most one faulty node among these four nodes; next, the other two good nodes will have received the data, since only one node among these four can have a faulty neighbor across $d_j$ or $d_{j+1}$. All such nodes $x$ can be made to receive data by transmitting again on dimensions $d_1$ and $d_2$. Thus we have an algorithm that uses $n + \log n + 5$ transmission steps.

4. A NEARLY OPTIMAL ALGORITHM
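The step counts in this paper, including the $n + c$ bound developed in this section, count lock-step transmissions in which every informed node sends along the same dimension. A minimal sketch of that execution model follows; the function names, the encoding of nodes as integers, and the 0-indexed dimensions are illustrative assumptions and do not come from the paper.

```python
def simd_broadcast(n, faults, source, dims):
    """Return the set of nonfaulty nodes holding the message after `dims`.

    Assumed encoding: nodes are the integers 0 .. 2**n - 1 and dimension d
    is bit d. In each step every informed node sends across the same
    dimension; a faulty node neither receives nor forwards the message.
    """
    if source in faults:
        raise ValueError("source must be nonfaulty")
    informed = {source}
    for d in dims:
        if not 0 <= d < n:
            raise ValueError("dimension out of range")
        neighbors = {x ^ (1 << d) for x in informed}
        informed |= neighbors - set(faults)
    return informed


def reaches_all(n, faults, source, dims):
    """True if every nonfaulty node of Q_n is informed after `dims`."""
    good = set(range(2 ** n)) - set(faults)
    return simd_broadcast(n, faults, source, dims) == good


# Q_3 with the single faulty node 001 and source 000 (hypothetical example).
print(reaches_all(3, {0b001}, 0b000, [0, 1, 2]))     # n steps in this order
print(reaches_all(3, {0b001}, 0b000, [0, 1, 2, 0]))  # one extra step
```

In this small example the three-step sequence leaves nodes 011, 101, and 111 uninformed, while repeating dimension 0 reaches every nonfaulty node. The sketch only checks one source; it does not test source independence.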
Finally, we show how to make better use of the results in this paper, as well as other results, to derive an algorithm that uses $n + c$ transmission steps, where $c$ is a small constant. This algorithm is almost optimal, since it is known [21] that the diameter of $Q_n$ with $n - 1$ faults is at least $n$ and at most $n + 1$. To this end, we will first establish a few preliminary results.

LEMMA 3. In any $n$-dimensional hypercube $Q_n$ (with $n \ge 6$) containing at most $2n$ faulty nodes, there exist three dimensions $J = \{j_1, j_2, j_3\}$ such that for any $j \in J$, there are at most three doubly faulty $j$-edges.

Proof. The proof is essentially a counting argument that counts the number of node faults needed to render our hypothesis false. For notational convenience, let us say that a dimension $j$ is unusable if there are four or more doubly faulty $j$-edges. Thus suppose that the hypothesis is false, i.e., that there are as many as $n - 3$ unusable dimensions. To facilitate counting, suppose that there are no faulty nodes to begin with, and we throw in faulty nodes one by one, to cause this many unusable dimensions. To create three unusable dimensions, we must use up eight node faults (i.e., make an entire 3-subcube faulty). Subsequent to this, we can "reuse" existing node faults, but it is easy to verify that we must add at least seven more node faults to obtain three more unusable dimensions. Thus, to obtain $n - 3$ unusable dimensions we must have used up $8 + 7(n - 3)/3 \ge 7n/3 + 1$ faulty nodes; however, this number is greater than the $2n$ faults we assumed. ∎

The above result does not hold when $n \le 5$; for example, a straightforward construction shows that in $Q_5$ with ten faulty nodes, we cannot find three such dimensions.

LEMMA 4. In any $n$-dimensional hypercube $Q_n$ containing at most $2n$ faulty nodes, there exist three dimensions
$j_1, j_2, j_3$ such that every 3-subcube induced by these three dimensions contains at least one nonfaulty node.

Proof. Consider the set of three dimensions constructed in the proof of Lemma 3. Every 3-subcube induced by these three dimensions must contain at least two fault-free nodes; otherwise, it is easily verified that at least one of these dimensions would be unusable. ∎

Another way to look at this result is that each of those three dimensions has three or fewer doubly faulty edges. Hence, if we pick one of $j_1, j_2, j_3$, pick two other dimensions, and partition $Q_n$ into 3-subcubes, then every 3-subcube has at least one nonfaulty node. Otherwise, the dimension in $J$ that we picked would have four doubly faulty edges.

The first step in this algorithm is to have the broadcast message traverse a short path to a large fault-free subcube. In the context of our problem (where we have at most $n - 1$ faults), let $D = \{d_1, d_2, \ldots, d_{n/2}\}$, and let $Q'_{n/2} = Q_n/D$. In $Q'_{n/2}$, since there are at most $n - 1$ faults and $2^{n/2}$ supernodes, at least one of the supernodes must be entirely fault-free. In fact, we can prove a stronger result: given any source node $S$, there is a node $x$ a short distance from $S$ that is part of such a fault-free supernode, as follows.

LEMMA 5. Given any source node $S$, there exists a node $x$ at distance at most 6 from $S$ that is part of a completely fault-free supernode of $Q'_{n/2}$.

Proof. First, for every faulty node $y$, color all the nodes in $y$'s supernode red. Since there are only $n - 1$ faults in $Q_n$, there can be at most $n - 1$ red supernodes in $Q'_{n/2}$. Therefore, using Lemma 3, there are three external dimensions $\{j_1, j_2, j_3\}$ of $Q'_{n/2}$ such that every 3-subcube induced by these dimensions contains at least one nonred supernode. Consider the 2-subcube containing $S$ induced by the dimensions $d_1, d_2$; this subcube is part of the supernode containing $S$. First, route along dimensions $d_1$, $d_2$, and $d_1$ again.
Now all good nodes in this 2-subcube have the message, and there are at least three such nodes. Next, route along $j_1$, performing parallel data transfers from this 2-subcube to another along $j_1$. By Theorem 1, at most one of these three transfers can be blocked by a fault, and so two good nodes will get the message. Next, we route along $j_2$, and again with parallel data transfers, at least one good node will receive data (see Fig. 3). Finally, we route along dimension $j_3$. Since at least one of these 2-subcubes (shown in Fig. 3) belongs to a completely fault-free supernode, at least one good node in each 2-subcube (including our node $x$) has data. This data movement is shown in Fig. 3, where the number inside each 2-cube indicates the least number of good nodes with data after each of the last three steps. Since the 2-subcube where $x$ is located is completely fault-free, we can reach $x$ from $S$ in six steps. ∎

Thus our algorithm is as follows.

ALGORITHM G.
(1) Identify the node $x$ as in Lemma 5, and send the message from $S$ to $x$. This takes six steps.
(2) Using the internal dimensions $d_1, d_2, \ldots, d_{n/2}$, send the message to all the nodes in the fault-free subcube containing $x$. This takes $n/2$ transmissions. Let $C_{n/2}(x)$ denote this $(n/2)$-subcube containing $x$.
(3) Now view $Q_n$ as $Q''_{n/2} = Q_n/\{d_{n/2+1}, d_{n/2+2}, \ldots, d_n\}$; i.e., the roles of the internal and external dimensions are now reversed from $Q'_{n/2}$ to $Q''_{n/2}$. Every supernode of $Q''_{n/2}$ contains one node from $C_{n/2}(x)$. Therefore, we transmit along dimensions $d_{n/2+1}, d_{n/2+2}, \ldots, d_n$.
(4) Finally, we invoke a cleanup step. For every faulty node $y$, color all the nodes in the supernode it belongs to (in $Q''_{n/2}$) red. Then, as done in Lemma 4, we can identify three dimensions $k_1, k_2, k_3$ from among the external dimensions of $Q''_{n/2}$ (i.e., among $d_1, d_2, \ldots, d_{n/2}$) such that every 3-subcube induced by these three dimensions contains at least one noncolored node. In a nonred supernode all nodes have received the broadcast message. Again, these three dimensions $k_1, k_2, k_3$ do not have four doubly faulty edges, and one of them is different from $d_1$ and $d_2$. Let $k$ be such a dimension, which has three or fewer doubly faulty edges. As explained earlier, every 3-subcube of supernodes spanned by $d_1, d_2, k$ has at least one nonred node. Now, in each of the 3-subcubes of individual nodes spanned by $d_1, d_2, k$ there is one representative node from the nonred supernode. What we have now is a partitioning of $Q_n$ into 3-subcubes (of individual nodes) such that each subcube has at least one node with the message and contains at most 2 faulty nodes by Theorem 1. Therefore, by transmitting along $d_1, d_2, k$ and again along $d_1, d_2, k$, all good nodes can be reached. Thus by executing six additional steps, all nonfaulty nodes in $Q_n$ receive the broadcast message. The entire algorithm takes $n + 12$ steps.

FIG. 3. Routing from S to node x.

5. HANDLING LINK FAULTS

Here we will sketch an extension of the ideas presented above to handle link faults. In particular, we show that there is an ordering of dimensions similar to that of Theorem 1 in the presence of both node and link faults, so long as the total number of faults does not exceed $n - 1$.

THEOREM 6. Given a cube $Q_n$ with up to $n - 1$ node and/or link faults, there is an ordering $d_1, d_2, \ldots, d_n$ of the dimensions of $Q_n$ such that for every $k$, each subcube induced by the dimensions $d_1, d_2, \ldots, d_k$ contains at most $k - 1$ node/link faults.

Proof. First suppose that there are no node faults, and let $l_i$ denote the number of dimension-$i$ links that are faulty. We can simply order the dimensions in ascending order of their $l_i$ values; it is easy to verify that this ordering satisfies the conditions of the theorem. Suppose now that $Q_n$ contains $t$ link faults and up to $n - 1 - t$ node faults, where $t \le n - 2$. To handle this case, we will construct the ordering as the concatenation of two subsequences $N$ and $L$. Construct $N$ to be the $n - 2 - t$ dimensions $i_1, \ldots, i_{n-2-t}$, which are the dimensions constructed as in the proof of Theorem 1. At this point, every subcube induced by dimensions not included among $i_1, i_2, \ldots, i_{n-2-t}$ contains at most one node fault. Next, we construct $L$ by ordering the remaining dimensions (the ones not used by $N$) in ascending order of $l_i$ values. (Note that some dimensions might not have any faulty links; their $l_i$ values will be zero, so that they will appear first in $L$.) The final ordering is the reverse of the concatenation of $N$ and $L$. It is straightforward to verify that the resulting composite sequence satisfies the conditions of the theorem. ∎

Given the above ordering, the algorithms described above for broadcasting can be modified with little effort to handle link faults.

6. THE CASE OF MANY FAULTS

In this section, we consider the situation where there are more than $n$ node faults in the hypercube.

LEMMA 7. Let $k$ and $n$ be positive integers, and let $F$ be a collection of distinct $n$-bit vectors, of largest possible cardinality, such that for any given coordinate position $i$, the number of vectors in $F$ with a 1 in position $i$ is at most $k$. Then $F$ satisfies the following property: if $F$ contains a vector of weight $i$, then for every $j < i$, $F$ contains every bit vector of weight $j$. Also, if $k < n$, then $|F| \le (k + 1)n/2 + 1$.

Proof. Let $x_i$ denote the number of vectors in $F$ of weight $i$, and let $t$ be the weight of a greatest-weight vector in $F$. Then $F$ maximizes the number of vectors in it, i.e., the quantity

$$\sum_{0 \le i \le t} x_i$$

subject to the conditions that the sum of all of the weights is at most $nk$, i.e.,
$$\sum_{0 \le i \le t} i\,x_i \le nk,$$
and that $0 \le x_i \le \binom{n}{i}$ for every $i$. Under these conditions, it is straightforward to show that the greedy strategy that maximizes the values of the $x_i$'s, in increasing order of $i$, provides the maximal size of $F$. That is, we must have $x_0 = 1$, $x_1 = n$, $x_2 = \binom{n}{2}$, and so forth. This implies the claimed result. Consequently, if $k = 1$, then $F$ contains at most $n + 1$ vectors. For $2 \le k \le n$, we can choose the zero vector and the $n$ vectors of weight 1 to cover every position once; after this, every choice of a weight-2 vector covers two positions. Therefore, the most vectors that $F$ can contain is $1 + n + (n/2)(k - 1) = (k + 1)n/2 + 1$. ∎

COROLLARY 8. Let $k$ and $n$ be positive integers, with $k \le n$. Given a collection $F$ of distinct $n$-bit vectors such that $|F| \ge kn/2 + 2$, there exists a coordinate $i$ such that there are at least $k$ vectors in $F$ containing a 1 and at least $k$ containing a 0 in position $i$.

Proof. Suppose to the contrary that for every coordinate $i$, there are either at most $k - 1$ 1's or at most $k - 1$ 0's in position $i$. Construct a new family $F'$ from $F$ by complementing all those coordinate positions in which at most $k - 1$ vectors in $F$ have 0's. Then $F'$ has at most $k - 1$ 1's in every coordinate position. Now the largest that $F'$ can be, by Lemma 7, is $kn/2 + 1$. ∎

Using this corollary, we can show that there is an ordering of dimensions similar to that of Theorem 1 when there are many faults.

DEFINITION. Given a cube $Q_n$ containing $f$ node faults, where $f \le 2^n$, the badness of $Q_n$, denoted $b(Q_n)$, is the greatest value of $k$ such that $kn/2 + 2 \le f \le (k + 1)n/2 + 1$, which reduces to

$$b(Q_n) = \left\lfloor \frac{2f - 3}{n} \right\rfloor.$$
The surplus of $Q_n$, denoted $s(Q_n)$, is the number $f - (kn/2 + 2)$, where $k$ is the badness of $Q_n$. Thus the surplus of $Q_n$ is the number of faults it contains in addition to those it needs to make its badness equal $k$. Note that $1 \le b(Q_n) \le n - 1$ and $0 \le s(Q_n) \le n/2 - 1$.

LEMMA 9. Given a cube $Q_n$ with badness $b(Q_n) = k$, the cube contains $f \le 2^n$ faults, and there is a dimension $d$ of $Q_n$ such that when $Q_n$ is split along $d$ into two subcubes $Q^0_{n-1}$ and $Q^1_{n-1}$, we have for $j \in \{0, 1\}$: either (a) $b(Q^j_{n-1}) = b(Q_n)$ and $s(Q^j_{n-1}) \le s(Q_n) - k$, or (b) $b(Q^j_{n-1}) \le b(Q_n) - 1$; also, $b(Q^j_{n-1}) \le n - 1$.

Proof. Since there are $f \ge kn/2 + 2$ $n$-bit vectors in the fault set of $Q_n$, the above corollary applies, and there
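is at least one dimension $d$ along which at least $k$ faulty nodes have 1's and at least $k$ have 0's. So we can split $Q_n$ along dimension $d$ and obtain two subcubes $Q^0_{n-1}$ and $Q^1_{n-1}$ such that each of $Q^0_{n-1}$ and $Q^1_{n-1}$ contains at least $k$ of the faults. Also, since $Q^j_{n-1}$ contains at least $k$ faults, $Q^{1-j}_{n-1}$ contains the remaining faults, which number at most

$$f - k = \left\lfloor \frac{kn}{2} \right\rfloor + 1 + s(Q_n) - k.$$

Therefore

$$\begin{aligned}
b(Q^j_{n-1}) &\le \left\lfloor \frac{2(f - k) - 3}{n - 1} \right\rfloor \\
&= \left\lfloor \frac{2\left(\left\lfloor kn/2 \right\rfloor + 1 + s(Q_n) - k\right) - 3}{n - 1} \right\rfloor \\
&= \left\lfloor \frac{kn - (kn \bmod 2) + 2s(Q_n) - 2k - 1}{n - 1} \right\rfloor \\
&= \left\lfloor \frac{k(n - 2) + 2s(Q_n) - (1 + (kn \bmod 2))}{n - 1} \right\rfloor.
\end{aligned}$$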
Now since $0 \le s(Q_n) \le n/2 - 1$, and since $2(n/2) - 2 < n - 1$ whenever $n \ge 2$, we note that if $s(Q_n) \ge (k + 1 + (kn \bmod 2))/2$, then $b(Q^j_{n-1}) \le k$; otherwise, $b(Q^j_{n-1}) \le k - 1$. To show that $b(Q^j_{n-1}) \le n - 1$, we only need to substitute $k \le n - 1$ and $s(Q_n) \le n/2 - 1$ into the above equation. ∎

THEOREM 10. Given a cube $Q_n$ with badness $b(Q_n) = k$, there is an ordering $d_1, d_2, \ldots, d_n$ of the dimensions of $Q_n$ such that, for every $p \in \{1, \ldots, n\}$, every subcube induced by the dimensions $d_1, d_2, \ldots, d_p$ contains at most $(\min(p, k) + 1)p/2 + 1$ faults.

Proof. Suppose we apply the idea of Lemma 9 repeatedly. In other words, we find a dimension satisfying Corollary 8 in $Q^0_{n-1}$, split along this dimension, and repeat the process. Notice that after each split, the badness of the subcubes obtained progressively diminishes, because of Lemma 9. At the last step of the splitting process, we will attempt to split a 1-cube, and we must necessarily have a badness of 1 in this step. This means that the badness values diminish from $k$ down to 1 during the splitting process. Now consider the sequence of dimensions $i_1, i_2, \ldots, i_n$ used for the splits in the above process. Given some $p$, suppose we fix the values of the bit positions $i_1, \ldots, i_{n-p}$ arbitrarily and allow the remaining bit positions to vary over all possible values. This results in a $p$-subcube $Q_p$ that has badness $b(Q_p) = \min(p, k)$, so that it contains no more than $(\min(p, k) + 1)p/2 + 1$ faults. ∎
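The quantities used in Lemma 9 and Theorem 10 are easy to compute directly. The following is a minimal sketch, assuming faulty nodes are encoded as integers whose bit $d$ is the coordinate in dimension $d$; the function names are hypothetical and not from the paper, and $kn/2$ is taken as $\lfloor kn/2 \rfloor$, which agrees with the closed form for $b(Q_n)$ given in the definition.

```python
from itertools import combinations


def badness(f, n):
    """b(Q_n) = floor((2f - 3) / n) for f faults in Q_n."""
    return (2 * f - 3) // n


def surplus(f, n):
    """s(Q_n) = f - (floor(k * n / 2) + 2), where k = badness(f, n)."""
    k = badness(f, n)
    return f - (k * n // 2 + 2)


def balanced_dimension(faults, n, k):
    """Return a bit position with >= k ones and >= k zeros among `faults`.

    Returns None if no such position exists. Corollary 8 asserts that one
    exists whenever len(faults) >= k * n / 2 + 2.
    """
    for d in range(n):
        ones = sum((v >> d) & 1 for v in faults)
        if ones >= k and len(faults) - ones >= k:
            return d
    return None


def corollary8_holds(n, k):
    """Brute-force check of Corollary 8 over all fault sets of minimum size.

    Larger sets need no separate check: adding vectors can only increase
    the number of ones and zeros in each position.
    """
    size = k * n // 2 + 2
    return all(
        balanced_dimension(F, n, k) is not None
        for F in combinations(range(2 ** n), size)
    )


# Six faults in Q_4; the fault addresses are chosen arbitrarily.
faults = {0b0000, 0b0001, 0b0011, 0b0111, 0b1111, 0b1010}
k = badness(len(faults), 4)
print(k, surplus(len(faults), 4), balanced_dimension(faults, 4, k))
print(corollary8_holds(4, 2))
```

Repeatedly calling `balanced_dimension` on one of the two halves and recording the returned positions would give the split sequence used in the proof of Theorem 10; the sketch stops short of that and only illustrates a single split.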
Application of this theorem to developing algorithms must account for the fact that with a number of faults larger than $n - 1$, the cube might not be connected. We must therefore operate within the connected components obtained because of the faults. However, this does not seem to be too hard to handle.

7. CONCLUSIONS
The fundamental question we have tried to address in this paper is the following: In an SIMD hypercube with up to $n - 1$ faulty nodes, the faults being known a priori, what is the least number of time steps needed to accomplish broadcasting from a given processor node to all others? We have described five new algorithms for this problem, each of which does better than the previous one. Each is applicable for a particular range of dimensions. For example, Algorithm C′ performs better than the simple Algorithm B when $n \ge 8$, and Algorithm D does even better when $n \ge 5$. Algorithm F does better when $n > 8$, but Algorithm G is best when $n \ge 128$. Our results have been based on an ordering on the dimensions of the cube, and we have shown how to extend this ordering to account for link faults and for a larger number (up to $2^n$) of node faults.

REFERENCES
1. Al-Dhelaan and B. Bose, Efficient fault-tolerant broadcasting on the hypercube. Proc. 4th Conf. on Hypercube Concurrent Computers and Applications, 1989.
2. B. Becker and H. U. Simon, How robust is the n-cube? Inform. and Comput. 77 (1988), 162–178.
3. R. Boppana and C. S. Raghavendra, Optimal self-routing of linear-complement permutations in hypercubes. Proceedings Distributed Memory Computing Conference, 1990.
4. J. Bruck, R. Cypher, and D. Soroker, Running algorithms efficiently on faulty hypercubes. Computer Architecture News 19, 1 (1991), 89–96.
5. M. S. Chen and K. G. Shin, Depth-first approach for fault-tolerant routing in hypercube multicomputers. IEEE Trans. Parallel Distribut. Systems (Apr. 1990).
6. M. S. Chen and K. G. Shin, Message routing in an injured hypercube. Proc. Third Conf. on Hypercube Concurrent Computers and Applications, 1988.
7. F. Harary, J. P. Hayes, and H.-J. Wu, Survey of the theory of hypercube graphs. Comput. Math. Appl. 15 (1988), 277–289.
8. S. M. Hedetniemi, S. T. Hedetniemi, and A. L. Liestman, A survey of gossipping and broadcasting in communication networks. Networks 18 (1988), 319–349.
9. W. D. Hillis, The Connection Machine, MIT Press, New Haven, CT, 1985.
10. S. L. Johnsson, Communication efficient basic linear algebra computations on hypercube architectures. J. Parallel Distrib. Comput. 4 (1987), 133–172.
11. S. L. Johnsson and C. T. Ho, Optimal broadcasting and personalized communication in hypercubes. IEEE Trans. Comput. (Sept. 1989).
12. T. C. Lee and J. P. Hayes, Routing and broadcasting in faulty hypercube computers. Proc. Third Conf. on Hypercube Concurrent Computers and Applications, 1988.
13. S. F. Nugent, The iPSC/2 direct-connect communications technology. Hypercube Conference, 1988.
14. S. Park and B. Bose, Broadcasting in hypercubes with faulty links. Proc. Frontiers of Massively Parallel Computers, Oct. 1992, pp. 286–290.
15. M. Peercy and P. Banerjee, Distributed algorithms for shortest path and deadlock-free routing and broadcasting in arbitrarily faulty hypercubes. Proc. Int. Symp. on Fault-Tolerant Computing, 1990.
16. C. S. Raghavendra, P. J. Yang, and S. B. Tien, Free dimensions: An effective approach to achieving fault tolerance in hypercubes. Proc. 22nd International Symposium on Fault Tolerant Computing, 1992.
17. P. Ramanathan and K. G. Shin, Reliable broadcasting in hypercube multicomputers. IEEE Trans. Comput. 32, 12 (Dec. 1988), 1654–1657.
18. S. Ranka and S. Sahni, Image template matching on SIMD hypercube multicomputers. Proc. Int. Conf. on Parallel Processing, 1988, Vol. 3, pp. 84–91.
19. E. M. Reingold, J. Nievergelt, and N. Deo, Combinatorial Algorithms. Prentice–Hall, Englewood Cliffs, NJ, 1977.
20. Y. Saad and M. H. Schultz, Topological properties of hypercubes. Research Report 389, Dept. of Computer Science, Yale Univ., June 1985.
21. S. B. Tien and C. S. Raghavendra, Algorithms and bounds for shortest path and diameter problems in faulty hypercubes. IEEE Trans. Parallel Distrib. Systems (June 1993), 713–718.
22. S. B. Tien and C. S. Raghavendra, Optimal global operations and broadcasting on faulty SIMD hypercubes. Technical Report, Dept. of EE–Systems, USC, 1992.
23. P. J. Yang, S. B. Tien, and C. S. Raghavendra, Embedding of rings and meshes on to faulty hypercubes. Technical Report, Dept. of EE–Systems, USC, 1991.

Received May 10, 1994; revised May 22, 1995; accepted November 13, 1995
C. S. RAGHAVENDRA was born in India. He received the B.Sc (Hons) physics degree from Bangalore University in 1973 and the B.E and M.E degrees in electronics and communication from Indian Institute of Science, Bangalore in 1976 and 1978, respectively. He received the Ph.D degree in computer science from the University of California at Los Angeles in 1982. From September 1982 to December 1991, he was on the faculty of the Electrical Engineering–Systems Department at the University of Southern California, Los Angeles. Currently, he is the Boeing Centennial Chair Professor of Computer Engineering in the School of Electrical Engineering and Computer Science at Washington State University in Pullman. His research interests are high-speed networks, parallel processing, fault-tolerant computing, and distributed systems. Dr. Raghavendra was a recipient of the Presidential Young Investigator Award for 1985. M. A. SRIDHAR received the Master’s degree from the School of Automation at the Indian Institute of Science, Bangalore in 1976 and the Ph.D. degree in computer science from the University of Wisconsin, Madison in 1986. Since then, he has been on the faculty of the Computer Science Department at the University of South Carolina in Columbia, SC. His current research interests are in algorithm design, programming languages, and object-oriented systems.