Microelectron. Reliab., Vol. 37, No. 6, pp. 885-891, 1997
© 1997 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0026-2714/97 $17.00 + .00
Pergamon PII: S0026-2714(96)00128-X
RELIABILITY ANALYSIS OF HYPERCUBE MULTICOMPUTERS
C. R. TRIPATHY,†‡ R. N. MAHAPATRA§ and R. B. MISRA†
† Department of Electrical Engineering; § Department of E & EC Engineering, Indian Institute of Technology, Kharagpur 721302, India
(Received for publication 18 June 1996)
Abstract--This paper proposes a topology-based general algorithm to generate all Markov states of an n-dimensional hypercube. Considering the system to be degradable, we define task-based reliability (TBR) measures for the hypercubes. Using the proposed algorithm, Markov models for two and three-dimensional hypercubes are developed and solved to obtain various task-based reliability measures. Markov states for hypercubes of higher dimensions are generated using the proposed method. Results are validated through extensive simulation. © 1997 Elsevier Science Ltd. All rights reserved.
1. INTRODUCTION
With continuing advances in VLSI technology, research on parallel computers is drawing more and more attention. Based on their interconnections, parallel and distributed computer architectures are broadly classified into two categories: tightly coupled and loosely coupled machines. In a tightly coupled system, all processors, in addition to their own local memory, have access to a global memory, and interprocessor communication is achieved through this shared memory. In a loosely coupled system, on the other hand, each processor has its own private memory, and message/packet switching is used for communication among processors. Among the loosely coupled parallel processors, the hypercube (binary n-cube) has enjoyed the greatest popularity owing to many attractive properties such as regularity, symmetry, reconfigurability, scalability and partitionability [1, 2]. A large number of machines based on the hypercube topology have been tried experimentally and employed in several research and commercial sectors.

As the size and complexity of a system increase, its performance and reliability analysis becomes extremely important. Reliability issues of multiprocessor interconnection networks were presented in Refs [3-5]. In Ref. [6], reliability measures of the hypercube were discussed using a graph-theoretic approach. A combinatorial model for reliability evaluation of hypercube multicomputers was proposed by Kim et al. in Ref. [7]. Their model is based on recursive decomposition of higher-dimensional hypercubes into smaller cubes. However, the models proposed so far do not consider the complete on-line system behaviour of the hypercube. For handling dynamic situations, the Markov modelling approach [8, 9] is considered to be more appropriate than the existing techniques.

There are two aspects to the problem of dealing with a Markov model with a large number of states: construction and analysis. Markov models can be solved by well-known software tools such as HARP and SHARPE [10]. Although these are quite helpful for analysing the reliability of simple systems under rigid assumptions, they need either a fault tree or a Markov chain as input, and are not general enough to handle complex systems. Unfortunately, there exists no tool to generate a fault tree (or Markov chain) for complicated networks like n-dimensional hypercubes. For these reasons, researchers have resorted to developing approximate models. In the present work, our aim is to develop an efficient technique to evaluate the exact reliability of an n-dimensional hypercube (hereafter denoted as $HC_n$) under various task requirements.

The paper is organised as follows. Section 2 discusses the topological features of hypercube multicomputers. Section 3 introduces the concept of task-based reliability and presents a general algorithm for generating all Markov states of an n-dimensional hypercube, taking topology into consideration. In Section 4 we obtain the Markov states for two, three and four-dimensional hypercubes using the proposed algorithm, and compare reliability results for two and three-dimensional hypercubes under various task requirements. The plotted analytical results are validated through extensive simulation. The paper concludes with Section 5.

‡ C. R. Tripathy is on study leave from the Department of Electronics & Telecommunication Engineering, University College of Engineering, Sambalpur University, Burla 768018, India.
2. HYPERCUBE TOPOLOGY
In this section we review the basic topological properties of hypercube multicomputers [1]. The hypercube belongs to a class of MIMD (multiple instruction multiple data) parallel processors in which a large number of identical processors are connected to each other by a definite rule. In this type of machine, interconnection is achieved by message passing and computation is data-driven. We model hypercubes as undirected graphs, where nodes represent stand-alone processing modules and edges correspond to interprocessor communication links.

Definition 2.1: A graph G = (V, E) is a hypercube of dimension n if and only if
(i) V has $2^n$ vertices and E has $n \cdot 2^{n-1}$ edges.
(ii) Every vertex has degree n.
(iii) G is regular, symmetric, finite and connected.
(iv) There exists an edge between any two vertices whose binary addresses differ in one and only one bit.

Definition 2.2: An n-cube can be partitioned into two (n - 1)-cubes in n different ways.

Definition 2.3: There are $n! \, 2^n$ different ways in which the $2^n$ nodes of a hypercube can be numbered.

Definition 2.4: The minimum distance between two arbitrary nodes A and B is equal to the number of bits that differ in the labels of A and B, i.e. to the Hamming distance H(A, B).

Definition 2.5: If A and B are two nodes of an n-cube with Hamming distance k = H(A, B), then there exist k node-disjoint parallel paths of length k between A and B. In addition, there are (n - k) parallel paths of length (k + 2) between A and B.
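The properties in Definitions 2.1 and 2.4 can be checked numerically. The following is a minimal sketch, assuming nodes are labelled by the integers 0 to $2^n - 1$ so that the binary address of a node is the binary form of its label; the function names `hamming`, `hypercube` and `bfs_distances` are illustrative and do not come from the paper.

```python
from collections import deque


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which the labels a and b differ."""
    return bin(a ^ b).count("1")


def hypercube(n: int):
    """Return (nodes, edges) of the n-cube with integer node labels 0 .. 2^n - 1."""
    nodes = list(range(2 ** n))
    # Flipping one bit of a label gives a neighbour; keep each edge once (u < v).
    edges = {(u, u ^ (1 << bit)) for u in nodes for bit in range(n) if u < (u ^ (1 << bit))}
    return nodes, edges


def bfs_distances(n: int, source: int) -> dict:
    """Shortest path length from source to every node, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for bit in range(n):
            v = u ^ (1 << bit)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


if __name__ == "__main__":
    n = 4
    nodes, edges = hypercube(n)
    degree = {u: sum(u in e for e in edges) for u in nodes}
    # Definition 2.1: 2^n vertices, n * 2^(n-1) edges, every vertex of degree n.
    print(len(nodes), len(edges), set(degree.values()))  # 16 32 {4}
    # Definition 2.4: shortest path length equals the Hamming distance.
    dist = bfs_distances(n, source=0b0101)
    print(all(dist[v] == hamming(0b0101, v) for v in nodes))  # True
```

Integer labels keep the adjacency test to a single XOR, which is why this representation is used here rather than explicit bit strings.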
3. PROPOSED METHOD
The hypercube architecture enjoys the greatest popularity among recent parallel computers owing to its topological advantages. Because of their high connectivity and regularity, hypercube multicomputers are well suited to multi-tasking environments. Task-based reliability (TBR) is therefore an important measure for a degradable computer system. We define TBR as follows.

Definition 3.1: If a task needs K good processors for its execution, then the system remains operational as long as these minimum resources are available on the system. Otherwise, the system enters a failed state. The TBR is represented by R(t) and is given by:

R(t) = Pr{some specified minimum number of processors are operating to execute a given task at a specified time}

$$R(t) = \sum_{i=0}^{N-K} P_i(t), \qquad \mu_i = 0,$$

where N = total number of processors in the network, K = number of good processors required for the task execution, $P_i(t)$ = probability of state i at time t and $\mu_i$ = repair rate in state i.

In this case, failures are tolerated as long as some minimum number of nodes is available for execution of a task. In a real-life situation, some tasks may require at least K processors out of the total N to work as a connected group in a prescribed configuration, whereas other tasks may need only K working processors to be available, irrespective of their topological configuration. We discuss this aspect in detail in Section 4.

Further, there may be situations where the same number of good processors is available, but in different topological configurations. As an example, consider the hypercube of Fig. 1(a). A situation may arise in which only processors 1, 3, 4, 5 and 7 are working as a connected group. This state is denoted by (5, 5), where the first entry (5) denotes the total number of working processors and the second entry (5) the total number of effective links [Fig. 1(b)]. From this state, if processor 1 or 7 fails, the system enters a state (4, 3) [Fig. 1(c)], but if processor 3 fails, the system enters a different (4, 3) state [Fig. 1(d)]. Although both of these (4, 3) states contain the same number of processors and effective links, their topological configurations are entirely different from one another. Therefore, they will fulfil two entirely distinct tasks. However, this type of problem has not been analysed in the available literature.

In this section we present a general algorithm called HMSG (Hypercube Markov State Generator), which generates all possible Markov states of an n-dimensional hypercube taking the above topological aspect into consideration. An n-dimensional hypercube for n = 4 is shown in Fig. 2.
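The (5, 5) and (4, 3) states of the Fig. 1 example can be reproduced with a few lines of code. This is a minimal sketch, assuming integer node labels whose binary form is the node address; the helper `state` and its degree-sequence output are illustrative additions. Equal degree sequences are only a necessary condition for isomorphism in general, but differing ones are enough to show that two configurations are distinct.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two node labels."""
    return bin(a ^ b).count("1")


def state(working):
    """Return (x, y, degrees): working processors, effective links, sorted node degrees."""
    # A link is effective only if both of its end nodes are working.
    links = [(a, b) for a, b in combinations(sorted(working), 2) if hamming(a, b) == 1]
    degrees = sorted(sum(node in link for link in links) for node in working)
    return len(working), len(links), degrees


if __name__ == "__main__":
    fig_1b = {1, 3, 4, 5, 7}
    print(state(fig_1b))        # (5, 5, [1, 2, 2, 2, 3])
    print(state(fig_1b - {1}))  # (4, 3, [1, 1, 2, 2])  a path of four nodes
    print(state(fig_1b - {7}))  # (4, 3, [1, 1, 2, 2])  the same shape
    print(state(fig_1b - {3}))  # (4, 3, [1, 1, 1, 3])  a star centred on node 5
```

The last two lines show why the pair (x, y) alone cannot identify a Markov state: both configurations are (4, 3), yet one is a path and the other a star.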
It is observed that any two arbitrary nodes of a hypercube are at some Hamming distance k from each other, where the Hamming distance between two binary sequences $u = u_1 u_2 u_3 \ldots u_n$ and $v = v_1 v_2 v_3 \ldots v_n$ is given as:

$$H(u, v) = \sum_{i=1}^{n} k(u_i, v_i), \qquad k(u_i, v_i) = 1 \ \text{if } u_i \neq v_i, \qquad k(u_i, v_i) = 0 \ \text{if } u_i = v_i.$$

The proposed algorithm exploits the above symmetric topology of the hypercube and uses a recursive depth-first expansion approach, in which a state, once created, is expanded to its maximum depth. From a parent state we enter into a state containing all nodes which are at k Hamming distance away from it. Configurations containing the same number of processors and links are compared among themselves to test for isomorphism. Non-isomorphic configurations are assigned to different states. The algorithm does not require a fault tree as input and considers all possible dynamic configurations on occurrence of processor faults so as to fulfil various task requirements. The proposed algorithm is straightforward and can handle an n-dimensional hypercube with desired task requirements (degradation). The following assumptions and notation are used throughout this paper for reliability analysis.

Fig. 1. Different configurations of a three-dimensional hypercube; (a) three-dimensional hypercube, (b) (5, 5) state configuration, (c) (4, 3) state configuration, (d) (4, 3) state configuration. [Drawing not reproduced: nodes are labelled by number and binary address, e.g. 0 (000), 6 (110) and 7 (111).]

Fig. 2. Hypercube topology for n = 4.

3.1. Assumptions and notation

Assumptions.
(1) The system consists of interconnected binary-state components: each component is either in an operational state or in a failed state.
(2) All links are perfect, whereas processors (nodes) are imperfect, identical and homogeneous with a constant exponential failure rate $\lambda$.
(3) A link is ineffective when it is connected to fewer than two nodes.
(4) The system is degradable.
(5) A processor, once failed, cannot be repaired.
(6) All failures are statistically independent.

Notation.
n: dimension of the hypercube (i.e. $\log_2 N$)
N: number of processors (i.e. $2^n$)
$\lambda$: processor failure rate
$(x_i, y_i)$: Markov state with $x_i$ working processors and $y_i$ effective links
$S_d$: set of all nodes with degree d; $0 \le d \le n$
H.D.: Hamming distance
card(S): total number of elements in the set S
R(t): task-based reliability.

3.2. Generation of Markov states for hypercubes
The proposed algorithm, hereafter referred to as HMSG, is presented below.
Algorithm (HMSG)
Initialise
    Create a hypercube of dimension n.
    Generate the state $(x_1, y_1)$, where $x_1 = N$ and $y_1 = n \cdot 2^{n-1}$.
    With rate $N\lambda$ go to state $(x_2, y_2)$, where $x_2 = N - 1$ and $y_2 = n \cdot (2^{n-1} - 1)$.
    call Expand $(x_2, y_2)$
end;

Procedure {Expand}:
input: state (x, y)
output: Markov states
    if x = 0; exit;
    else
        Let d be the minimum degree of nodes in the state (x, y).
        With rate $\lambda \cdot \mathrm{card}(S_d)$ go to the new state $(x_i, y_i)$ [if not isomorphic to any one of the previous states] with one of the nodes in $S_d$ failing, where $x_i = x - 1$ and $y_i = y - d$.
        if [$(x_i, y_i)$ is a new state] call Expand $(x_i, y_i)$
        k = 1
        while {not all nodes have been considered in state (x, y)} do
            Let $G = \{v \mid \mathrm{H.D.}(v, S_d) = k\}$, where $\mathrm{H.D.}(v, S_d) = k$ implies that there exists a node in the set $S_d$ which is at Hamming distance k from a node v (not considered so far).
            With rate $\lambda \cdot \mathrm{card}(G)$ go to the new state $(x_j, y_j)$ [if not isomorphic to any one of the previous states] with one of the nodes in G failing, where $x_j = x - 1$ and $y_j = y - (\text{degree of the failing node})$.
            if [$(x_j, y_j)$ is a new state] call Expand $(x_j, y_j)$
            k = k + 1
        end (while do)
end (Procedure {Expand}).

The HMSG algorithm has been used for generating the Markov states and for evaluating the task-based reliability of the hypercube architecture. In the following section, some of the analytical results are presented and discussed.
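For very small cubes, the set of distinct configurations can be cross-checked by exhaustive search. The sketch below is not the HMSG procedure: it simply enumerates every subset of working nodes and groups the subsets by graph isomorphism of the working nodes and their effective links, using a brute-force canonical form over all relabellings. It assumes that two configurations count as the same state exactly when these graphs are isomorphic, and it is practical only for n = 2 and n = 3. All function names are illustrative.

```python
from itertools import combinations, permutations


def hypercube_links(n):
    """Links of the n-cube: pairs of integer labels that differ in exactly one bit."""
    return [(u, u ^ (1 << b)) for u in range(2 ** n) for b in range(n) if u < (u ^ (1 << b))]


def canonical_form(working, links):
    """Return (x, smallest relabelled link list) over all orderings of the working nodes."""
    working = sorted(working)
    induced = [(u, v) for u, v in links if u in working and v in working]
    best = None
    for order in permutations(range(len(working))):
        label = dict(zip(working, order))
        key = sorted(tuple(sorted((label[u], label[v]))) for u, v in induced)
        if best is None or key < best:
            best = key
    return len(working), tuple(best)


def enumerate_states(n):
    """Set of isomorphism classes of working-node configurations of the n-cube."""
    links = hypercube_links(n)
    classes = set()
    for x in range(2 ** n + 1):
        for working in combinations(range(2 ** n), x):
            classes.add(canonical_form(working, links))
    return classes


def count_states(n, max_failed):
    """Number of classes in which at most max_failed processors have failed."""
    return sum(1 for x, _ in enumerate_states(n) if x >= 2 ** n - max_failed)


if __name__ == "__main__":
    print(len(enumerate_states(2)))  # 6
    print(len(enumerate_states(3)))  # 21
    # Distinct (x, y) pairs for n = 3; several pairs cover more than one class.
    print(sorted({(x, len(edges)) for x, edges in enumerate_states(3)}, reverse=True))
```

The factorial cost of the canonical form is the reason this check does not scale; the paper's algorithm avoids it by comparing only configurations with the same number of processors and links.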
4. RESULTS AND DISCUSSIONS
In this section we obtain Markov states on a DEC Alpha machine for two, three and four-dimensional hypercubes using the HMSG algorithm for 25%, 50%, 75% and 100% degradations (see Table 1). The algorithm can also be applied to higher-dimensional hypercubes, given better computational facilities. For the purpose of illustration we construct Markov state diagrams for two and three-dimensional hypercube networks (Figs 3 and 4). The models are solved for various task requirements: (i) with 25% degradation (that is, the task requires at least 75% of the processors to be in a working state); (ii) with 50% degradation, i.e. the task requires 50% of the processors to work perfectly; (iii) without any degradation (here the task requires all 100% of the processors to work perfectly).

There are clearly two possibilities with K working nodes: Case I, when the task requires K-out-of-N connected working processors; and Case II, when the task needs K-out-of-N working processors. State nos 6 and 8, denoted by entries (0, 0) in Figs 3 and 4, respectively, are the failed states. These models are solved for Case I with no degradation and plotted in Fig. 5 for the purpose of analysis. It is observed that the reliability of the two-dimensional hypercube is in general higher for a rigid task as above. The difference is quite remarkable and is almost 25% during the initial 50-250 h for $\lambda$ = 0.001.

We compare the reliability under Case I with that of Case II in Fig. 6 for a three-dimensional hypercube with 50% degradation. The result implies that under flexible task requirements the reliability of the hypercube increases appreciably. At 600 h, R(t) rises from 61% in Case I to 72% in Case II with 50% degradation, whereas it is only 21.8% under 25% degradation for Case I (Table 2).

Figure 7 exhibits the effect of processor failure rate on the hypercube reliability for Case II under 50% degradation and compares the analytical results with those obtained through Monte Carlo simulation [10]. In our approach, two events are modelled, i.e. the waiting time in a state and the departure time from a state. The results presented in Fig. 7 are based on different failure rates; the solid lines represent analytical results and the dotted lines simulated results. It is seen that increasing the failure rate from 0.001 to 0.005 reduces the reliability by 80% at a mission time of 500 h. Conversely, the reliability improves by only 20% when the processor failure rate is decreased from 0.001 to 0.0002.
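For Case II, where connectivity is not required, the Markov model collapses to a standard result: under assumptions (2), (5) and (6) each processor survives to time t independently with probability $e^{-\lambda t}$, so R(t) is the binomial probability that at least K of the N processors are still working. The sketch below evaluates that closed form and compares it with a simple Monte Carlo estimate. It is an illustration, not the authors' simulator: it samples processor failure times directly instead of modelling waiting and departure events, and the function names, trial count and seed are arbitrary choices.

```python
import math
import random


def k_out_of_n_reliability(n_proc: int, k: int, lam: float, t: float) -> float:
    """Probability that at least k of n_proc independent processors survive to time t."""
    p = math.exp(-lam * t)  # survival probability of one processor
    return sum(
        math.comb(n_proc, j) * p ** j * (1 - p) ** (n_proc - j)
        for j in range(k, n_proc + 1)
    )


def simulate(n_proc: int, k: int, lam: float, t: float, trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity from sampled failure times."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        failure_times = sorted(rng.expovariate(lam) for _ in range(n_proc))
        # The task fails at the (n_proc - k + 1)-th processor failure,
        # which is index n_proc - k of the sorted failure times.
        if failure_times[n_proc - k] > t:
            survived += 1
    return survived / trials


if __name__ == "__main__":
    # Three-dimensional hypercube, 25% degradation: K = 6 of N = 8 processors.
    print(round(k_out_of_n_reliability(8, 6, 0.001, 50), 6))
    print(round(k_out_of_n_reliability(8, 6, 0.001, 100), 6))
    # 50% degradation, Case II: closed form against simulation.
    print(k_out_of_n_reliability(8, 4, 0.001, 500), simulate(8, 4, 0.001, 500))
```

For N = 8 and K = 6 the closed form agrees with the 25% degradation column of Table 2 at t = 50 h and t = 100 h. This is expected only because a three-dimensional hypercube stays connected after the loss of any two nodes, so Cases I and II coincide for K = 6; for K = 4 the connected case has to be obtained from the full Markov model.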
Table 1. Markov states generated by the HMSG algorithm

          Hypercube size      Number of generated states
Sl. no.   dim     Proc.       25% degdn   50% degdn   75% degdn   100% degdn
1         2       4           2           4           5           6
2         3       8           5           14          19          21
3         4       16          24          152         182         228
4         5       32          990         2700†

Fig. 3. State diagram for a two-dimensional hypercube.
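A state diagram of this kind can be solved numerically with a few lines of code. The sketch below does not transcribe Fig. 3; it assumes a six-state chain derived directly from the topology of a two-dimensional hypercube (a cycle of four nodes) under assumptions (2), (5) and (6), with states written as (working processors, effective links). The transition rates, the fixed-step Runge-Kutta integrator and all names are illustrative choices.

```python
import math

# (working processors, effective links) for a cycle of four nodes.
STATES = [(4, 4), (3, 2), (2, 1), (2, 0), (1, 0), (0, 0)]


def transition_rates(lam):
    """Assumed transition rates {(source, target): rate} for failure rate lam."""
    return {
        ((4, 4), (3, 2)): 4 * lam,  # any one of the four nodes fails
        ((3, 2), (2, 1)): 2 * lam,  # an end node of the three-node path fails
        ((3, 2), (2, 0)): 1 * lam,  # the middle node fails, leaving two isolated nodes
        ((2, 1), (1, 0)): 2 * lam,
        ((2, 0), (1, 0)): 2 * lam,
        ((1, 0), (0, 0)): 1 * lam,
    }


def derivatives(prob, rates):
    """Right-hand side of the Kolmogorov forward equations."""
    d = {s: 0.0 for s in STATES}
    for (src, dst), rate in rates.items():
        flow = rate * prob[src]
        d[src] -= flow
        d[dst] += flow
    return d


def solve(t_end, lam, steps=1000):
    """State probabilities at time t_end, by fixed-step fourth-order Runge-Kutta."""
    rates = transition_rates(lam)
    p = {s: 0.0 for s in STATES}
    p[(4, 4)] = 1.0  # all processors working at t = 0
    h = t_end / steps
    for _ in range(steps):
        k1 = derivatives(p, rates)
        k2 = derivatives({s: p[s] + 0.5 * h * k1[s] for s in STATES}, rates)
        k3 = derivatives({s: p[s] + 0.5 * h * k2[s] for s in STATES}, rates)
        k4 = derivatives({s: p[s] + h * k3[s] for s in STATES}, rates)
        p = {s: p[s] + h / 6 * (k1[s] + 2 * k2[s] + 2 * k3[s] + k4[s]) for s in STATES}
    return p


if __name__ == "__main__":
    p = solve(200.0, 0.001)
    print("no degradation:", p[(4, 4)], "closed form:", math.exp(-0.8))
    print("Case II, K = 2:", sum(p[s] for s in STATES if s[0] >= 2))
    print("Case I,  K = 2:", p[(4, 4)] + p[(3, 2)] + p[(2, 1)])
```

The two K = 2 results differ by the probability of the (2, 0) state, in which the two surviving processors are not linked; this is the distinction between Case I and Case II drawn in the text.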
Fig. 4. State diagram for a three-dimensional hypercube.
Fig. 5. Reliability of a two and three-dimensional hypercube. [Plot not reproduced: reliability against time in hours (0 to 1000) for the 2-dim. and 3-dim. cubes, λ = 0.001.]
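The size of the gap between the two curves can be checked by hand. The following worked step is not part of the original derivation. It assumes that, with no degradation, the only operational state is the fault-free one, so that under assumptions (2), (5) and (6) the reliability of an n-cube is the probability that all $2^n$ processors survive; the symbols $R_2$, $R_3$, $\Delta$ and $u$ are introduced here for illustration.

```latex
\begin{align*}
R_2(t) &= e^{-4\lambda t}, \qquad R_3(t) = e^{-8\lambda t}, \\
\Delta(t) &= R_2(t) - R_3(t) = u - u^2, \qquad u = e^{-4\lambda t}, \\
\frac{d\Delta}{du} &= 1 - 2u = 0 \;\Rightarrow\; u = \tfrac{1}{2}, \qquad \Delta_{\max} = \tfrac{1}{4}, \\
t^{*} &= \frac{\ln 2}{4\lambda} \approx 173\ \text{h} \quad \text{for } \lambda = 0.001 .
\end{align*}
```

Under these assumptions the difference peaks at 25% near 173 h, inside the 50-250 h window; at the ends of that window it is about 0.15 (t = 50 h) and 0.23 (t = 250 h).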
Fig. 6. Task-based reliability of a three-dimensional hypercube (under 50% degradation). [Plot not reproduced: reliability against time in hours (0 to 1200), λ = 0.001, with and without the connectivity requirement.]
Fig. 7. Analytical/simulation rel. plot for various failure rates. [Plot not reproduced: reliability of the three-dimensional hypercube with 50% degradation against time in hours (0 to 1000), for λ = 0.0002, 0.001 and 0.005.]
5. CONCLUSION
In this paper we have proposed a new and efficient algorithm for generating the Markov states of n-dimensional hypercubes, taking network topology into account. We have defined task-based reliability (TBR) and obtained solutions for various task requirements using the Markov approach. The processor failure rate has a marked effect on system reliability. The analytical solutions are in close agreement with the results obtained through simulation, with a maximum variation of about 0.4%. This validates the usefulness of the proposed method. With minor modifications, our method can be extended to evaluate the reliability of other multicomputer networks, such as Crosscube, Hierarchical hypercubes and Folded hypercubes.
Table 2. Reliability of a three-dimensional hypercube (considering conn.)

             Reliability of hypercube, λ = 0.001
Time (h)     25% degrdn        50% degrdn
0            1.0000000000      1.0000000000
50           0.9946028393      0.9998288651
100          0.9665394751      0.9977896188
150          0.9120169655      0.9909446317
200          0.8366105422      0.9767813738
250          0.7485361465      0.9538905701
300          0.6555373331      0.9220167158
350          0.5636991542      0.8818362121
400          0.4772259696      0.8346548739
450          0.3986504384      0.7821198603
500          0.3291919266      0.7259859503
550          0.2691254761      0.6679463027
600          0.2181012291      0.6095231113
650          0.1753955628      0.5520074556
700          0.1400953016      0.4964362217
750          0.1112248156      0.4435948734
800          0.0870201617      0.3940367248
850          0.0690178370      0.3481114762
900          0.0539999390      0.3059977355
950          0.0420834518      0.2677359046
1000         0.0326794417      0.2332591249
Acknowledgements--The authors wish to thank Mr C. K. Agrawal for his helpful discussion during the simulation. They gratefully acknowledge the facilities provided by the Reliability Engineering Centre, Indian Institute of Technology, Kharagpur (India).
REFERENCES
1. Saad, Y. and Schultz, M. H., Topological properties of hypercubes. IEEE Trans. Computers, 1988, 37, 86-88.
2. Johnson, S. L. and Ho, C. T., Optimum broadcasting and personalized communication in hypercubes. IEEE Trans. Computers, 1989, 38, 1249-1268.
3. Tripathy, C. R., Patra, S., Misra, R. B. and Mahapatra, R. N., Reliability evaluation of multistage interconnection networks with multi-state elements. Microelectron. Reliab., 1996, 36, 423-428.
4. Tripathy, C. R., Misra, R. B., Patra, S. and Mahapatra, R. N., Reliability modeling and analysis of multiprocessor systems. Proc. International Conf. on Stochastic Models, Optimization Techniques and Computer Applications. Wiley Eastern, New Delhi, 1994, pp. 77-91.
5. Tripathy, C. R., Mahapatra, R. N. and Misra, R. B., Reliability evaluation of multistage interconnection networks by network decomposition. Proc. 1st IEEE International Workshop on Parallel Processing. Tata McGraw Hill, New Delhi, India, 1994, pp. 228-236.
6. Soh, S., Rai, S. and Trahan, J. L., Improved lower bounds on the reliability of the hypercube architectures. IEEE Trans. Parallel and Distributed Systems, 1994, 5, 364-378.
7. Kim, J., Das, C. R., Lin, W. and Feng, T. Y., Reliability evaluation of hypercube multicomputers. IEEE Trans. Reliability, 1988, 38, 121-129.
8. Misra, K. B., Reliability Analysis and Prediction: A Methodology Oriented Treatment. Elsevier Science, Amsterdam, 1992.
9. Misra, K. B. (Ed.), New Trends in System Reliability Evaluation. Elsevier Science, Amsterdam, 1993.
10. Sahner, S. A. and Trivedi, K. S., Reliability modeling using SHARPE. IEEE Trans. Reliability, 1987, R-36, 186-193.
11. Kreutzer, W., System Simulation Programming Styles and Languages. Addison-Wesley, Sydney, 1986.