Microelectron. Reliab., Vol. 34, No. 8, pp. 1301-1310, 1994 Copyright © 1994 Elsevier Science Ltd Printed in Great Britain. All rights reserved 0026-2714/94 $7.00+ .00
Pergamon
FINDING NONFAULTY SUBTREES IN FAULTY BINARY TREE ARCHITECTURES
RAVI MITTAL, Department of Computer Science and Engineering, Indian Institute of Technology, Madras 600 036, India; email: [email protected]
BIJENDRA N. JAIN, Department of Computer Science and Engineering, Indian Institute of Technology, Delhi 110 016, India; email: [email protected]
and RAKESH K. PATNEY, Department of Electrical Engineering, Indian Institute of Technology, Delhi 110 016, India; email: [email protected]
(Received for publication 5 November 1993)
Abstract: In this paper we study the fault tolerance of a full binary tree in terms of the availability of non-faulty (full) subtrees. When an unaugmented full binary tree is faulty, computation can be carried out on the largest available non-faulty (full) binary subtree. It is shown that the minimum number of faulty nodes required to destroy all subtrees of height h in a full binary tree of height n is fbt(n, h) = $\lfloor (2^n - 1)/(2^h - 1) \rfloor$. It follows that a non-faulty subtree of height h = n - w is guaranteed to be available in an n-level full binary tree containing u faulty nodes, where w is the smallest integer such that $u < 2^w$. An algorithm is given which evaluates whether a given set of faulty nodes destroys all subtrees of a specified height. This algorithm can also find the largest available non-faulty subtree in a faulty full binary tree. We also study the availability of a non-faulty subtree in some augmented binary tree architectures.
I. INTRODUCTION
A full binary tree is an efficient communication network in which any two nodes can exchange information in O(log N) time, where N is the total number of nodes. The binary tree is expandable, and is thus considered to be an attractive and useful interconnection network [1-5]. However, its fault tolerance is low, since there exists only one path between any pair of nodes. Thus, a single fault may disconnect the tree. To improve the fault tolerance of binary tree architectures, various networks based on augmenting a full binary tree with redundant nodes (processors) and links have been proposed [4-7]. Some of the commonly known augmented tree networks are Hayes's tree [4], RAE-tree [5], and AB-tree [6]. In an augmented tree architecture, the rigid full binary tree structure is maintained even in the presence of node faults. The redundant links and nodes are used to replace faulty nodes and links in the tree so as to maintain its binary tree topology.
(The preliminary version of this paper appeared in the Proceedings of the International Conference on Computing and Information (ICCI), Ottawa, Canada, May 1991.)
The augmented binary tree architectures can be reconfigured only when the number of faults is limited. If the number of faulty nodes in an augmented binary tree is large, then it will not be possible to reconfigure the faulty tree into a non-faulty full binary tree (of the same height). In that case, a largest non-faulty (full) binary subtree may be identified and used for computation. In this paper, we study the availability of non-faulty binary subtrees in a full binary tree or an augmented tree architecture.
It is known that many of the basic algorithms for universal networks (such as the n-cube, shuffle-exchange, binary tree, etc.) can be formulated with the size of the network as a parameter of the algorithm. Thus, such algorithms can be executed on a subnetwork of the same topology with some slow-down factor. For example, many fundamental algorithms designed for execution on a full binary tree of height n can be modified to run on a non-faulty subtree of height h, where h < n.
A node of the tree architecture is a Processing Element (PE), which consists of a processor, a memory module and a communication processor. Each PE is connected to a host computer which works as the central controller of the multiprocessor. The host computer broadcasts the same instruction to all nodes when the multiprocessor functions in SIMD mode. Testing (fault diagnosis) is performed by the host computer by applying test stimuli to all PEs and then analyzing their responses. In this way, the host computer can identify all faulty PEs present in the system.
Having obtained the indices of all the faulty nodes in the network, the host computer can determine the largest non-faulty full subtree (and other non-faulty full subtrees) available in the network. These non-faulty full subtrees can be used for parallel execution of programs. The size of the largest possible non-faulty full binary subtree (henceforth, a full binary subtree is simply called a subtree) depends on the number of faulty nodes and their locations in the network. In this paper we find the largest available non-faulty subtree when a full binary tree contains a given combination of faulty nodes. We also evaluate the minimum number of faulty nodes required to make all subtrees of a given height faulty. The binary tree considered here is rigid, i.e., the interconnections between nodes are fixed. Also, the fault model assumes only permanent faults in nodes. Since the probability of a node failure is much higher than that of a link failure, only node failures are considered in this study. A subtree of height h which originates from node j is referred to as $ST_h(j)$. The height of
an n-level full (sub)tree is taken as n. A subtree is considered to be faulty if it contains at least one faulty node. A fault set is defined as a set whose elements are the indices of the faulty nodes in the tree. If the nodes of a fault set of a full binary tree of height n destroy all subtrees of height h, then the fault set is known as a critical fault set, CFS(n, h). For given values of n and h, a minimum critical fault set MCFS(n, h) is defined as a critical fault set CFS(n, h) with minimum cardinality. Thus, |MCFS(n, h)| = fbt(n, h), where fbt(n, h) is defined as the minimum number of faulty nodes that destroy all subtrees of height h. Clearly, there may exist a number of minimum critical fault sets. Note that a critical fault set for subtrees of height h is also critical for subtrees of height greater than h, but it may or may not be critical for subtrees of height less than h. However, a minimum critical fault set for subtrees of height h is never a minimum critical fault set for subtrees of height larger than h. Such a minimum critical fault set will not be a critical fault set for subtrees of height smaller than h.
As an example, consider the five-level full binary tree shown in Figure 1. The fault set {3, 4, 6, 7, 11} is critical for subtrees of height three. However, the fault set {3, 4, 10, 12, 20} is not critical, because the subtree $ST_3(7)$ is non-faulty. Both fault sets are critical for subtrees of height 4. Further, one MCFS(5, 3) is {4, 5, 6, 7}; another MCFS(5, 3) is {4, 20, 12, 31}.
The rest of this paper is organized as follows. In Section 2, we show that fbt(n, h) is computable and is equal to $\lfloor (2^n - 1)/(2^h - 1) \rfloor$. An algorithm which finds out whether a given fault set is critical for subtrees of some height h is given in Section 3. In Section 4, we show that the cardinality of a minimum critical fault set can be increased by augmenting a full binary tree with redundant nodes and links. The cardinalities of minimum critical fault sets for Hayes's binary tree [4] and the RAE-tree [5] are also evaluated in that section.
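The definitions above can be checked directly on the Figure 1 examples. The following is a minimal brute-force sketch, not the paper's algorithm; it assumes the usual heap numbering (root 1, children of node j are 2j and 2j+1), and the function names `subtree_nodes` and `is_critical` are illustrative only.

```python
def subtree_nodes(j, h):
    """Indices of ST_h(j), the full subtree of height h rooted at node j.

    Assumes heap numbering: root is 1, children of j are 2j and 2j+1.
    """
    nodes = []
    for depth in range(h):
        first = j << depth  # leftmost descendant of j, `depth` levels down
        nodes.extend(range(first, first + (1 << depth)))
    return nodes


def is_critical(fault_set, n, h):
    """True if every subtree of height h in an n-level tree holds a faulty node."""
    faulty = set(fault_set)
    last_root = (1 << (n - h + 1)) - 1  # roots lie at levels 1 .. n-h+1
    return all(
        any(v in faulty for v in subtree_nodes(j, h))
        for j in range(1, last_root + 1)
    )


print(subtree_nodes(7, 3))                    # [7, 14, 15, 28, 29, 30, 31]
print(is_critical({3, 4, 6, 7, 11}, 5, 3))    # True
print(is_critical({3, 4, 10, 12, 20}, 5, 3))  # False: ST_3(7) is fault-free
print(is_critical({3, 4, 10, 12, 20}, 5, 4))  # True
```

This enumeration visits every node of every candidate subtree, so it is only meant for small trees such as the five-level example.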
II. MINIMUM CRITICAL FAULT SET
Let the number of faulty processors in a full binary tree of height n be u. The numbers of nodes in a full binary tree of height n and in a subtree of height h are denoted N and H, respectively. Obviously, $N = 2^n - 1$ and $H = 2^h - 1$. By the definition of fbt(n, h), if u < fbt(n, h) then there exists at least one non-faulty subtree of height h. On the other hand, if u > N - H then there does not exist any non-faulty subtree of height h. The following theorem determines the value of fbt(n, h).
Theorem 1: For a full binary tree of height n, the minimum number of faulty nodes required to destroy all (full) subtrees of height h is given by fbt(n, h) = $\lfloor (2^n - 1)/(2^h - 1) \rfloor$.
Proof: The proof is in two parts. The first part shows that there exists a set of $\lfloor (2^n - 1)/(2^h - 1) \rfloor$ nodes which, when faulty, destroy all subtrees of height h. In the second part, we show that $\lfloor (2^n - 1)/(2^h - 1) \rfloor$ is the minimum number of faulty nodes required. Let $n = mh + k$, where $m = \lfloor n/h \rfloor$ and k is an integer.
a). Consider the set of nodes {j : j at level i, where i = n-h+1, n-2h+1, ..., n-mh+1}. We show that if all of these nodes are faulty then there does not exist any non-faulty binary subtree of height h. This set of faulty nodes is then a CFS(n, h). To show this, consider a subtree of height h rooted at any level b, $1 \le b \le n-h+1$. Clearly, such a subtree must include one or more nodes at level $n - xh + 1$, where x is the largest integer such that $n - xh + 1 \ge b$. But these nodes are assumed to be faulty. Thus, no non-faulty subtree of height h is available. Now, the number of nodes in CFS(n, h) can be found by adding the nodes at each level i, and is given as
$$|CFS(n, h)| = 2^{n-h} + 2^{n-2h} + \cdots + 2^{n-mh}. \qquad (1)$$
The right side of relationship (1) can be simplified in the following steps:
$$|CFS(n, h)| = 2^{n-h}\left\{1 + 2^{-h} + 2^{-2h} + \cdots + 2^{-(m-1)h}\right\}$$
$$= \left(2^n - 1 - (2^k - 1)\right)/(2^h - 1), \quad \text{where } k = n - mh,$$
$$= \lfloor (2^n - 1)/(2^h - 1) \rfloor.$$
The last equality follows from the fact that $(2^n - 1 - (2^k - 1))/(2^h - 1)$ is an integer, $0 \le (2^k - 1)/(2^h - 1) < 1$, and the number of nodes is always an integer. Therefore, fbt(n, h) $\le \lfloor (2^n - 1)/(2^h - 1) \rfloor$.
Fig. 1. A five level full binary tree.
b). The number of nodes in a subtree of height h is $2^h - 1$. It can be trivially shown that the maximum number of node-disjoint subtrees of height h in an n-level full tree is $\lfloor (2^n - 1)/(2^h - 1) \rfloor$. Since each of these subtrees is destroyed only if it contains at least one faulty node, the number of faulty nodes required to destroy all subtrees of height h is at least $\lfloor (2^n - 1)/(2^h - 1) \rfloor$. Therefore, fbt(n, h) $\ge \lfloor (2^n - 1)/(2^h - 1) \rfloor$. From parts (a) and (b), fbt(n, h) = $\lfloor (2^n - 1)/(2^h - 1) \rfloor$.
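The closed form and the level-based construction of part (a) can be compared numerically. The sketch below assumes heap numbering, so that level l holds nodes $2^{l-1}$ through $2^l - 1$; the helper name `level_cfs` is hypothetical and not taken from the paper.

```python
def fbt(n, h):
    """Closed form of Theorem 1: floor((2^n - 1) / (2^h - 1))."""
    return (2**n - 1) // (2**h - 1)


def level_cfs(n, h):
    """All nodes at levels n-h+1, n-2h+1, ..., n-m*h+1, as in part (a)."""
    m = n // h
    nodes = []
    for i in range(1, m + 1):
        level = n - i * h + 1
        # Heap numbering: level `level` holds nodes 2^(level-1) .. 2^level - 1.
        nodes.extend(range(2 ** (level - 1), 2**level))
    return nodes


for n, h in [(5, 3), (6, 3), (20, 16)]:
    print(n, h, fbt(n, h), len(level_cfs(n, h)))
# 5 3 4 4
# 6 3 9 9
# 20 16 16 16
```

For n = 5 and h = 3 the construction returns {4, 5, 6, 7}, which is the first MCFS(5, 3) quoted in the Introduction.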
As an example, let the values of n and h be 6 and 3, respectively. Then a minimum critical fault set is {1, 8, 14, 18, 20, 25, 44, 52, 61}, as illustrated in Figure 2. The following is a direct consequence of Theorem 1 and the fact that if the cardinality of a fault set is less than fbt(n, h), then there exists a non-faulty subtree of height h.
Corollary 1: Let the number of faulty nodes in an n-level full binary tree be u. Then there exists a non-faulty subtree of height h = n - w, where w is the smallest integer such that $u < 2^w$. For example, if n = 20 and u = 13, then w = 4. Hence, there exists at least one non-faulty subtree of height 16.
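A short sketch of the corollary follows, reading its condition as $u < 2^w$, which is what the requirement u < fbt(n, n - w) of Theorem 1 needs. The function name `guaranteed_height` is illustrative only, and the use of Python's `int.bit_length` to obtain w is a convenience of this sketch.

```python
def fbt(n, h):
    """Closed form of Theorem 1: floor((2^n - 1) / (2^h - 1))."""
    return (2**n - 1) // (2**h - 1)


def guaranteed_height(n, u):
    """Height n - w of a subtree guaranteed to survive any u node faults."""
    w = u.bit_length()  # smallest non-negative w with u < 2**w
    return n - w


n, u = 20, 13
h = guaranteed_height(n, u)
print(h)          # 16
print(fbt(n, h))  # 16, so 13 faults cannot destroy every height-16 subtree
```

With u = 16 the same function returns 15 rather than 16, since 16 faults can equal fbt(20, 16) and may therefore form a critical fault set for height 16.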
Fig. 2. An MCFS(6, 3) = {1, 8, 14, 18, 20, 25, 44, 52, 61}; faulty nodes are marked in the figure.
III. OBTAINING NON-FAULTY SUBTREES
The available non-faulty subtrees in a faulty tree can be identified by the host computer. The host computer performs the testing and detects all faulty nodes (the fault set) in the network. It then runs the algorithm CHECK to evaluate whether the fault set is critical or not. By repeatedly executing this algorithm for different values of h, it can determine the largest non-faulty subtree. A fault set, for given values of n and h, is critical if it destroys all subtrees of height h. If the cardinality, u, of a fault set, FS, is less than fbt(n, h), then the fault set is not critical. However, if fbt(n, h) $\le u \le$ N - H, then the fault set may be critical, depending on the locations of its faulty nodes. For a given subtree height, it can be evaluated whether or not a fault set is critical. In the algorithm CHECK, a node j is said to be marked if $ST_h(j)$ contains at least one faulty node. Otherwise, it is unmarked. After executing this algorithm, each unmarked node at levels 1 through n-h+1 corresponds to a non-faulty subtree of height h.
Algorithm CHECK /* This algorithm finds out whether a given fault set is critical */
begin
    for each node x such that x in FS do
        mark node x;
        for i := 1 to h-1 do
            if (floor(x/2^i) >= 1) then mark node floor(x/2^i);
    end;
    if (any node at level i, 1 <= i <= n-h+1, is unmarked) then FS is non-critical
    else fault set FS is critical;
end
The time required to mark the ancestors of the nodes of fault set FS is $h|FS|$. The time taken to find unmarked nodes is $2^{n-h+1} - 1$. Hence, the total time required to execute the above algorithm is $h|FS| + 2^{n-h+1} - 1$. A non-faulty subtree of the largest size can be obtained by executing the algorithm CHECK starting with h = n-1. If no node is found unmarked, then the algorithm is executed for successively lower values of h, until an unmarked node is found.
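The marking procedure of CHECK and the downward search over h can be sketched as follows. This is one possible rendering under the assumption of heap numbering (the parent of node x is floor(x/2)); the names `check` and `largest_nonfaulty_subtree` are illustrative. Unlike the text, the search here starts at h = n so that a fault-free tree is also handled.

```python
def check(fault_set, n, h):
    """Return the unmarked roots j, i.e. those whose subtree ST_h(j) is fault-free.

    The fault set is critical for height h exactly when the result is empty.
    """
    marked = set()
    for x in fault_set:
        for i in range(h):     # x itself (i = 0) and its h-1 nearest ancestors
            ancestor = x >> i  # floor(x / 2^i)
            if ancestor >= 1:
                marked.add(ancestor)
    last_root = 2 ** (n - h + 1) - 1  # candidate roots occupy levels 1 .. n-h+1
    return [j for j in range(1, last_root + 1) if j not in marked]


def largest_nonfaulty_subtree(fault_set, n):
    """Run CHECK for decreasing h; return (height, roots) of the largest survivors."""
    for h in range(n, 0, -1):
        roots = check(fault_set, n, h)
        if roots:
            return h, roots
    return 0, []


print(check({3, 4, 10, 12, 20}, 5, 3))                  # [7]
print(check({3, 4, 6, 7, 11}, 5, 3))                    # []
print(largest_nonfaulty_subtree({3, 4, 6, 7, 11}, 5))   # (2, [8, 9, 10, 12, 13, 14, 15])
```

Marking costs at most h set insertions per faulty node and the scan touches $2^{n-h+1} - 1$ candidate roots, which mirrors the running time stated above.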
IV. IMPROVEMENTS IN SUBTREE AVAILABILITY
The availability of a non-faulty subtree of a given height is increased significantly if the cardinality of a minimum critical fault set is increased. The latter can be achieved by introducing some redundancy in the tree. For given values of n and h, there exist many MCFS(n, h). Consider an augmentation of a full binary tree with redundant nodes such that w nodes ($w \ge 1$) from each minimum critical fault set are replaced by redundant non-faulty nodes. Then, the minimum number of faulty nodes required to make all subtrees of height h faulty becomes f'(n, h) = fbt(n, h) + w. The value of w depends upon the architecture of the augmented tree. This ensures availability of at least one non-faulty subtree of height h if the number of faulty nodes is less than f'(n, h). Further, this technique increases the possibility of finding a non-faulty subtree even when the tree contains f'(n, h) or more faulty nodes, unless the faults belong to a critical fault set.
subtrees faulty is given as
$$F = \lfloor (2^n - 2^{n-ih})/(2^h - 1) \rfloor + \lfloor (2^{n+1-ih} - 1)/(2^h - 1) \rfloor$$
$$= \lfloor (2^n - 1)/(2^h - 1) \rfloor - \lfloor (2^{n-ih} - 1)/(2^h - 1) \rfloor + \lfloor (2^{n+1-ih} - 1)/(2^h - 1) \rfloor$$
$$\ge \mathrm{fbt}(n, h) + k,$$
where $k = \lfloor (2^{n+1-ih} - 1)/(2^h - 1) \rfloor - \lfloor (2^{n-ih} - 1)/(2^h - 1) \rfloor$. Since $k \ge 1$ for all i, $1 \le i \le m-1$, the above contradicts Theorem 1. □
Lemma 3: Each minimum critical fault set contains at least one node from a level b, $CL_m \le b \le h$, where $m = \lfloor n/h \rfloor$.
Proof: Consider the following two cases:
a). n = mh: In this case $CL_m = 1$. The subtree of height h rooted at node 1 can become faulty only if there exists at least one faulty node at a level b, $1 \le b \le h$.
b). n = mh + k, where $1 \le k < h$: Let an MCFS(n, h) not contain a faulty node at any level b, $CL_m \le b \le h$. Then the number of node-disjoint subtrees which can be constructed from nodes at level $CL_m$ and those below is
$$2^{n-mh} \lfloor (2^{mh} - 1)/(2^h - 1) \rfloor = \lfloor (2^n - 2^{n-mh})/(2^h - 1) \rfloor.$$
A subtree of height h rooted at node 1 can be faulty only if it contains a faulty node at some level 1 through h. Because of the assumption above, that node must be at a level 1 through $CL_m - 1$. Hence, the total number of faulty nodes required is
$$\lfloor (2^n - 2^{n-mh})/(2^h - 1) \rfloor + 1 > \mathrm{fbt}(n, h).$$
The latter is true since $\lfloor (2^{n-mh} - 1)/(2^h - 1) \rfloor = 0$. This contradicts Theorem 1. Therefore, each minimum critical fault set contains at least one node from a level b, $CL_m \le b \le h$.
The above results lead to the following theorem.
Theorem 2: For a full binary tree of height n and subtrees of height h, n > 2h-1, each minimum critical fault set contains
i) at least one faulty node at each level $CL_i$, $1 \le i \le m-1$, where $m = \lfloor n/h \rfloor$, and
ii) at least one node that lies at a level b, $CL_m \le b \le h$.
For a full binary tree of height n and subtrees of height h, the cardinality of a minimum critical fault set can be increased by augmenting the full binary tree with redundant nodes which can be used to replace some of the faulty nodes from each MCFS(n, h). We now evaluate the cardinality of a minimum critical fault set (from the viewpoint of availability of non-faulty subtrees) for the RAE-tree [5], AB-tree [6] and Hayes's binary tree [4].
B. RAE-tree and AB-tree
The RAE-tree [5] and AB-tree [6] are functionally the same but topologically different. In the presence of one faulty node at each level, an n-level AB-tree (or RAE-tree) can be configured into an n-level non-faulty full binary tree. A minimum critical fault set for an n-level AB-tree corresponds to the minimum critical fault set for an n-level full binary tree where the faulty nodes are restricted to the fewest number of levels. The following theorem evaluates the minimum number of faulty nodes, fAB(n, h), required in an AB-tree to make all subtrees (of height h) faulty.
Theorem 3: Consider an AB-tree of height n and subtrees of height h. The minimum number of faulty nodes, fAB(n, h), required to destroy all subtrees of height h is given as fAB(n, h) = fbt(n, h) + m, where $m = \lfloor n/h \rfloor$.
Proof: The proof consists of two parts. The first part shows that there exists a set of fbt(n, h) + m faulty nodes which destroys all subtrees of height h. In the second part, we show that at least fbt(n, h) + m faulty nodes are needed to destroy all subtrees of height h.
a). Consider the set of nodes {j : j at level $CL_i$, for i = 1, 2, ..., m} ∪ {i' : i' at level $CL_i$, for i = 1, 2, ..., m}. It can be observed that if all these nodes are faulty then all subtrees of height h become faulty (see Figure 3 for n = 5 and h = 2). Hence, fAB(n, h) $\le$ fbt(n, h) + m.
b). From Theorem 2, each MCFS(n, h) for a full binary tree contains at least one node at each level $CL_i$, $1 \le i \le m-1$, and a node at a level b, $CL_m \le b \le h$. Hence, one faulty node at each of these levels can be replaced by a non-faulty node at the same level. Thus, fAB(n, h) $\ge$ fbt(n, h) + m.
Hence, fAB(n, h) = fbt(n, h) + m.
C. Hayes's binary tree architecture
Hayes's binary tree [4] of height n consists of an n-level full binary tree augmented with n redundant nodes and 2m+l+2m-3 redundant links. In the presence of a single faulty node, an n-level Hayes's tree can be reconfigured into a tree which is isomorphic to an n-level full binary tree. In this tree, there exists one redundant node at each level of the tree. Note that the redundant node at a level can replace one node at that level. However, because it has fewer redundant links than the AB-tree, Hayes's tree cannot be reconfigured into a full binary tree of the same height if there is one faulty node at each level.
Theorem 4: The cardinality of a minimum critical fault set, fht(n, h), for Hayes's binary tree is given by fht(n, h) = fbt(n, h) + m, where $m = \lfloor n/h \rfloor$.
Proof: The proof is similar to that of Theorem 3.
For example, consider a Hayes's tree of height four and subtrees of height two. Then, a minimum critical fault set is {1, 2', 3', 4, 5, 6, 7}, as shown in Figure 4.
Fig. 3. A five level AB-tree with fault set {2', 2, 3, 4', 8-15}.
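The counts in Theorems 3 and 4 share the form fbt(n, h) + m, and they can be compared with the sizes of the fault sets quoted in the captions of Figures 3 and 4. The sketch below only evaluates the formulas; the function name `f_augmented` is hypothetical.

```python
def fbt(n, h):
    """Closed form of Theorem 1: floor((2^n - 1) / (2^h - 1))."""
    return (2**n - 1) // (2**h - 1)


def f_augmented(n, h):
    """fbt(n, h) + m with m = floor(n / h), the value in Theorems 3 and 4."""
    return fbt(n, h) + n // h


# Figure 3 fault set {2', 2, 3, 4', 8..15} has 1 + 2 + 1 + 8 = 12 nodes.
print(f_augmented(5, 2))  # 12
# Figure 4 fault set {1, 2', 3', 4, 5, 6, 7} has 7 nodes.
print(f_augmented(4, 2))  # 7
```

Both caption sizes agree with the formula for subtrees of height two.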
Before determining the increase in the cardinality of a minimum critical fault set for some known augmented binary trees, the characteristics of minimum critical fault sets are discussed in the following subsection.
A. Characteristics of Minimum Critical Fault Sets
We also define critical levels. For given values of n and h, the i-th critical level is defined as the level $CL_i$, where $CL_i = n + 1 - ih$, $1 \le i \le m$, and $m = \lfloor n/h \rfloor$. The value of m is the number of critical levels for a full binary tree of height n and subtrees of height h. The only critical level in the tree of Example 1 is $CL_1 = 3$.
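The definition of critical levels translates into a one-line computation. The following sketch uses the illustrative name `critical_levels` and lists the levels in the order $CL_1, CL_2, \ldots, CL_m$.

```python
def critical_levels(n, h):
    """Return [CL_1, ..., CL_m] with CL_i = n + 1 - i*h and m = floor(n / h)."""
    m = n // h
    return [n + 1 - i * h for i in range(1, m + 1)]


print(critical_levels(5, 3))  # [3]
print(critical_levels(6, 3))  # [4, 1]
print(critical_levels(5, 2))  # [4, 2]
```

For n = 6 and h = 3 the critical levels are 4 and 1, which are the levels used by the construction in part (a) of the proof of Theorem 1.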
Lemma 1: Consider a full binary tree of height n and subtrees of height h, such that $n \ge 2h-1$ and $h \ge 2$. Each minimum critical fault set contains at least one node at level $CL_1 = n-h+1$.
Proof (by contradiction): Assume that an MCFS(n, h) does not contain a faulty node at level $CL_1$. The number of node-disjoint subtrees of height h starting from $CL_1$ is $2^{n-h}$. To make each of these subtrees faulty, at least $2^{n-h}$ faulty nodes are required. Since the faulty nodes are assumed not to be at level $CL_1$, these nodes must lie at levels below $CL_1$, i.e. at levels $CL_1+1$, $CL_1+2$, ..., $CL_1+h-1$. It can be shown that the number of node-disjoint subtrees which can be constructed from nodes at levels 1 through $CL_1$ is $\lfloor (2^{n-h+1} - 1)/(2^h - 1) \rfloor$. All of these subtrees can be faulty if each of them contains at least one faulty node. Hence, the minimum number of faulty nodes F required to destroy all subtrees in the entire tree is given as
$$F = 2^{n-h} + \lfloor (2^{n-h+1} - 1)/(2^h - 1) \rfloor.$$
If n = 2h-1, then $F = 2^{n-h} + 1 >$ fbt(n, h), since fbt(2h-1, h) = $2^{n-h}$, which contradicts Theorem 1. But if n > 2h-1, then F can be rewritten as (since $2^{n-h}$ is an integer):
$$F = \lfloor 2^{n-h} + (2^{n-h+1} - 1)/(2^h - 1) \rfloor$$
$$= \lfloor (2^n - 1)/(2^h - 1) + 2^{n-h}/(2^h - 1) \rfloor$$
$$\ge \lfloor (2^n - 1)/(2^h - 1) \rfloor + \lfloor 2^{n-h}/(2^h - 1) \rfloor > \mathrm{fbt}(n, h),$$
which again contradicts Theorem 1. Hence, each minimum critical fault set contains at least one faulty node at $CL_1$.
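The inequality F > fbt(n, h) used in the proof can be spot-checked numerically. This is only a sanity check over a small, arbitrarily chosen range of n and h, not a substitute for the argument; the name `lemma1_bound` is hypothetical.

```python
def fbt(n, h):
    """Closed form of Theorem 1: floor((2^n - 1) / (2^h - 1))."""
    return (2**n - 1) // (2**h - 1)


def lemma1_bound(n, h):
    """F = 2^(n-h) + floor((2^(n-h+1) - 1) / (2^h - 1)) from the proof of Lemma 1."""
    return 2 ** (n - h) + (2 ** (n - h + 1) - 1) // (2**h - 1)


# Check F > fbt(n, h) wherever the lemma applies: h >= 2 and n >= 2h - 1.
for h in range(2, 8):
    for n in range(2 * h - 1, 2 * h + 8):
        assert lemma1_bound(n, h) > fbt(n, h)

print(lemma1_bound(5, 3), fbt(5, 3))  # 5 4
```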
Lemma 2: If $n \ge 2h$, then each minimum critical fault set MCFS(n, h) contains at least one faulty node from each critical level $CL_i$, $1 \le i \le m-1$, $m = \lfloor n/h \rfloor$.
Proof (by contradiction): Let $CL_i$ not contain any faulty node. Then the number of node-disjoint subtrees which can be constructed from nodes at level $CL_i$ and from nodes at levels below $CL_i$ is
$$2^{n-ih} \lfloor (2^{ih} - 1)/(2^h - 1) \rfloor = \lfloor (2^n - 2^{n-ih})/(2^h - 1) \rfloor.$$
Similarly, the number of node-disjoint subtrees which can be constructed from nodes at $CL_i$ and levels above $CL_i$ is
$$\lfloor (2^{n+1-ih} - 1)/(2^h - 1) \rfloor.$$
Since there exists no faulty node at $CL_i$, all (node-disjoint) subtrees can be faulty if each subtree contains one faulty node. Hence, the minimum number of faulty nodes, F, required to make all
Fig. 4. A four level Hayes's tree with fault set {1, 2', 3', 4, 5, 6, 7}; faulty nodes are marked in the figure.
V. CONCLUSION
In this paper, we have studied the fault tolerance of binary tree architectures in terms of the availability of non-faulty (full) binary subtrees. It is shown that the minimum number of faulty nodes that destroy all subtrees of height h in a full binary tree of height n is equal to fbt(n, h) = $\lfloor (2^n - 1)/(2^h - 1) \rfloor$. Further, an algorithm, CHECK, is developed which determines whether or not a given fault set destroys all subtrees of a given height. This algorithm can be used to compute the largest non-faulty subtree in a binary tree containing faults. We have also shown that the cardinality of a minimum critical fault set can be increased by adding redundant nodes and links. Using this, we have studied the availability of non-faulty subtrees in some of the augmented binary tree architectures.
REFERENCES
[1] G. J. Lipovski and M. Malek, Parallel Computing: Theory and Comparisons, John Wiley & Sons, 1987.
[2] E. Horowitz and A. Zorat, "The Binary Tree as an Interconnection Network: Applications to Multiprocessor Systems and VLSI", IEEE Transactions on Computers, Vol. C-30, No. 4, pp. 247-253, April 1981.
[3] S. W. Song, "A Highly Concurrent Tree Machine for Data Base Applications", Proceedings of the 1980 International Conference on Parallel Processing, pp. 259-268, 1980.
[4] J. P. Hayes, "A Graph Model for Fault-tolerant Computing Systems", IEEE Transactions on Computers, Vol. C-25, No. 9, pp. 875-884, September 1976.
[5] C. S. Raghavendra, A. Avizienis and M. D. Ercegovac, "Fault Tolerance in Binary Tree Architectures", IEEE Transactions on Computers, Vol. C-33, No. 6, pp. 568-572, June 1984.
[6] B. N. Jain, R. Mittal, and R. K. Patney, "Fault Tolerant Analysis and Algorithms for a proposed Augmented Binary Tree", Proceedings of the 9th International Conference on Distributed Computing Systems, Newport Beach, pp. 524-529, June 1989.
[7] A. S. M. Hassan and V. K. Agarwal, "A Modular Approach to Fault-Tolerant Binary Tree Architectures", Digest of Papers of the Fifteenth Symposium on Fault Tolerant Computing, pp. 344-349, June 1985.
[8] B. Becker and H. U. Simon, "How Robust is the n-Cube?", Proceedings of the 27th Annual IEEE Symposium on Foundations of Computer Science, pp. 274-282, October 1986.
[9] R. Mittal, Augmented binary tree architectures and their fault tolerance, Ph.D. Thesis, Department of Electrical Engineering, Indian Institute of Technology, New Delhi, August 1990.