Stochastic analysis of a fault tolerant network system


Microelectron. Reliab., Vol. 33, No. 3, pp. 303-306, 1993. Printed in Great Britain.

0026-2714/93 $6.00 + .00
© 1993 Pergamon Press Ltd

L. R. GOEL and RAKESH GUPTA
Department of Statistics, Institute of Advanced Studies, Meerut University, Meerut 250005, India

and

V. S. RANA
DIC (NIC), DM Compound, District Guna (M.P.), India

(Received for publication 19 October 1991)

Abstract. This paper deals with a stochastic model of a converter-based communication network system. In this model, two converters are connected in parallel with a network system. Each converter has two failure modes: transient failure and latent failure. Transient failures occur due to minor faults in the converters. Further, a converter may enter the latent failure mode if major hardware problems arise while it is in the transient failure mode. A converter may recover automatically when it is in the transient failure mode, but it needs repair when it is in the latent failure mode. The system also enters preventive maintenance at random epochs when both converters are normal. System failure occurs when both converters are in the latent failure mode. Using the regenerative point technique, various reliability measures are obtained.

INTRODUCTION

Hoang Pham and S. J. Upadhyaya [1] have analysed a class of fault tolerant systems using converters. They assumed that, as soon as the output from a converter becomes abnormal, the next converter is immediately switched on, thereby stopping the operation of the earlier converter. However, many systems exist in which a converter can fail in different modes and the repair facility attends to both preventive maintenance and repair of the failed converter. In the present paper, we analyse a system model having two converters in parallel with preventive maintenance (PM). PM is carried out only when both converters are working in parallel without any failure. If a converter enters the transient failure state, its recovery starts automatically. From transient failure, a converter may go into its latent failure state, where the repairman begins its repair immediately. The system breaks down when both converters have failed due to latent faults or when the system is under PM. A single repairman is provided for both repair and preventive maintenance of the system. The times to transient and latent failure are taken to be negative exponentially distributed, while the recovery, repair and preventive maintenance times are assumed to follow general distributions. By identifying suitable regenerative points, the following characteristics of interest are obtained:

(1) reliability of the system; (2) mean time to system failure (MTSF); (3) pointwise and steady-state availabilities of the system; and (4) expected busy period of the repairman.

NOTATION AND STATES OF THE SYSTEM

\(\theta\): rate of entrance of the system into preventive maintenance
\(\alpha/\beta\): rate of transient/latent failure
\(H(\cdot)\): c.d.f. of time to recovery of a transient-failed converter
\(G(\cdot)\): c.d.f. of time to repair of a latent-failed converter
\(K(\cdot)\): c.d.f. of time to preventive maintenance of the system
\(R_i(t)\): reliability of the system when \(E_0 = S_i \in E\)
\(A_i(t)\): \(P[\text{system is up at time } t \mid E_0 = S_i \in E]\)

For a c.d.f. \(F\), \(\bar F(t) = 1 - F(t)\) and \(\tilde F(s) = \int_0^\infty e^{-st}\,dF(t)\); © denotes Laplace convolution and an asterisk denotes a Laplace transform. Other notation not defined here is as in Ref. [2]. For defining the various states of the system, we introduce the following symbols:

OC: converter is operative in the normal mode
TF: converter is in the transient failure mode
LF: converter is in the latent failure mode
PM: system is under preventive maintenance

Using these symbols, the states of the system are:

Up states: \(S_0 = (\mathrm{OC}, \mathrm{OC})\), \(S_1 = (\mathrm{TF}, \mathrm{OC})\), \(S_2 = (\mathrm{TF}, \mathrm{TF})\), \(S_3 = (\mathrm{LF}, \mathrm{OC})\), \(S_4 = (\mathrm{TF}, \mathrm{LF})\)
Down states: \(S_5 = (\mathrm{LF}, \mathrm{LF})\), \(S_6 = (\mathrm{PM})\)

A transition diagram is shown in Fig. 1.

TRANSITION PROBABILITIES AND MEAN SOJOURN TIMES

The non-zero elements of the transition probability matrix (t.p.m.) are:

\[ p_{01} = \frac{2\alpha}{2\alpha+\theta}, \qquad p_{06} = \frac{\theta}{2\alpha+\theta}, \qquad p_{10} = \tilde H(\alpha+\beta), \qquad p_{12} = \frac{\alpha[1-\tilde H(\alpha+\beta)]}{\alpha+\beta}, \]

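\[ p_{13} = \frac{\beta[1-\tilde H(\alpha+\beta)]}{\alpha+\beta}, \qquad p_{21} = \tilde H(2\beta), \qquad p_{24} = 1-\tilde H(2\beta), \qquad p_{30} = \tilde G(\alpha), \qquad p_{34} = 1-\tilde G(\alpha), \]
\[ p_{41} = \int_0^\infty e^{-\beta t}\bar H(t)\,dG(t), \qquad p_{43} = \int_0^\infty e^{-\beta t}\bar G(t)\,dH(t), \qquad p_{45} = \int_0^\infty \beta e^{-\beta t}\bar G(t)\bar H(t)\,dt, \]
\[ p_{53} = \int_0^\infty dG(t) = 1, \qquad p_{60} = \int_0^\infty dK(t) = 1. \]

It can easily be verified that

\[ p_{01}+p_{06} = p_{10}+p_{12}+p_{13} = p_{21}+p_{24} = p_{30}+p_{34} = p_{41}+p_{43}+p_{45} = p_{53} = p_{60} = 1. \]

The mean sojourn time \(\mu_i\) in state \(S_i\) is \(\mu_i = E(T_i) = \int_0^\infty P(T_i > t)\,dt\), where \(T_i\) is the sojourn time in state \(S_i\). Thus, using probability arguments, we have

\[ \mu_0 = \frac{1}{2\alpha+\theta}, \qquad \mu_1 = \frac{1-\tilde H(\alpha+\beta)}{\alpha+\beta}, \qquad \mu_2 = \frac{1-\tilde H(2\beta)}{2\beta}, \qquad \mu_3 = \frac{1-\tilde G(\alpha)}{\alpha}, \]
\[ \mu_4 = \int_0^\infty e^{-\beta t}\bar G(t)\bar H(t)\,dt, \qquad \mu_5 = \int_0^\infty \bar G(t)\,dt \qquad\text{and}\qquad \mu_6 = \int_0^\infty \bar K(t)\,dt. \]

Fig. 1. Transition diagram (○ up state, □ down state).

RELIABILITY AND MEAN TIME TO SYSTEM FAILURE

Let \(U_i\) be the random variable denoting the time to system failure when the system starts from \(S_i\) (\(i = 0, 1, 2, 3, 4\)). Then the reliability of the system is given by \(R_i(t) = P[U_i > t]\). Taking the failed states \(S_5\) and \(S_6\) as absorbing, and using simple probabilistic arguments, we have the following recurrence relations:

\[ R_0(t) = Z_0(t) + q_{01}(t) © R_1(t), \]
\[ R_1(t) = Z_1(t) + q_{10}(t) © R_0(t) + q_{12}(t) © R_2(t) + q_{13}(t) © R_3(t), \]
\[ R_2(t) = Z_2(t) + q_{21}(t) © R_1(t) + q_{24}(t) © R_4(t), \]
\[ R_3(t) = Z_3(t) + q_{30}(t) © R_0(t) + q_{34}(t) © R_4(t), \]
\[ R_4(t) = Z_4(t) + q_{41}(t) © R_1(t) + q_{43}(t) © R_3(t), \tag{1-5} \]

where

\[ Z_0(t) = e^{-(2\alpha+\theta)t}, \quad Z_1(t) = e^{-(\alpha+\beta)t}\bar H(t), \quad Z_2(t) = e^{-2\beta t}\bar H(t), \quad Z_3(t) = e^{-\alpha t}\bar G(t) \quad\text{and}\quad Z_4(t) = e^{-\beta t}\bar H(t)\bar G(t). \]

Taking the L.T. of Eqns (1-5) and solving for \(R_0^*(s)\), we have

\[ R_0^*(s) = N_1(s)/D_1(s), \tag{6} \]

where (omitting the argument \(s\) on the right-hand sides for brevity)

\[ N_1(s) = Z_0^*\big[(1-q_{12}^*q_{21}^*)(1-q_{34}^*q_{43}^*) - (q_{12}^*q_{24}^*+q_{13}^*q_{34}^*)q_{41}^*\big] + q_{01}^*\big[(1-q_{34}^*q_{43}^*)(Z_1^*+q_{12}^*Z_2^*) + (q_{13}^*+q_{12}^*q_{24}^*q_{43}^*)Z_3^* + (q_{12}^*q_{24}^*+q_{13}^*q_{34}^*)Z_4^*\big] \]

and

\[ D_1(s) = (1-q_{01}^*q_{10}^*-q_{12}^*q_{21}^*)(1-q_{34}^*q_{43}^*) - (q_{12}^*q_{24}^*+q_{13}^*q_{34}^*)q_{41}^* - q_{01}^*q_{30}^*(q_{13}^*+q_{12}^*q_{24}^*q_{43}^*). \]

Taking an inverse Laplace transform of Eqn (6), we get the reliability of the system when it initially starts from \(S_0\). Using the formula

\[ E(U_0) = \int_0^\infty R_0(t)\,dt = \lim_{s\to 0} R_0^*(s), \]

the expression for the MTSF is

\[ E(U_0) = N_1/D_1, \tag{7} \]

where \(N_1\) and \(D_1\) are \(N_1(s)\) and \(D_1(s)\) evaluated at \(s = 0\), i.e. with \(q_{ij}^*(0) = p_{ij}\) and \(Z_i^*(0) = \mu_i\). Explicitly,

\[ N_1 = \mu_0\big[(1-p_{12}p_{21})(1-p_{34}p_{43}) - (p_{12}p_{24}+p_{13}p_{34})p_{41}\big] + p_{01}\big[(1-p_{34}p_{43})(\mu_1+p_{12}\mu_2) + (p_{13}+p_{12}p_{24}p_{43})\mu_3 + (p_{12}p_{24}+p_{13}p_{34})\mu_4\big]. \]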
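As a numerical check of Eq. (7), the Python sketch below assumes, for illustration only, that the recovery, repair and PM times are exponential with rates eta, gamma and kappa (hypothetical parameter names; the paper allows general H, G and K). Under that assumption every \(p_{ij}\) and \(\mu_i\) has a closed form, for example \(\tilde H(s) = \eta/(\eta+s)\). The sketch evaluates \(N_1/D_1\) directly and compares it with a numerical solution of the first-step equations \(E(U_i) = \mu_i + \sum_j p_{ij}E(U_j)\) over the up states \(S_0\) to \(S_4\), with \(S_5\) and \(S_6\) absorbing.

```python
# Sketch under an ASSUMPTION: recovery (H), repair (G) and PM (K) times are
# exponential with hypothetical rates eta, gamma, kappa. The paper allows
# general H, G and K; only model_params depends on this assumption.


def model_params(alpha, beta, theta, eta, gamma, kappa):
    """Return (p, mu): t.p.m. p[i][j] and mean sojourn times mu[i], S0..S6."""
    p = [[0.0] * 7 for _ in range(7)]
    p[0][1] = 2 * alpha / (2 * alpha + theta)
    p[0][6] = theta / (2 * alpha + theta)
    r1 = eta + alpha + beta          # S1 = (TF, OC): exit rate
    p[1][0], p[1][2], p[1][3] = eta / r1, alpha / r1, beta / r1
    r2 = eta + 2 * beta              # S2 = (TF, TF)
    p[2][1], p[2][4] = eta / r2, 2 * beta / r2
    r3 = gamma + alpha               # S3 = (LF, OC)
    p[3][0], p[3][4] = gamma / r3, alpha / r3
    r4 = gamma + eta + beta          # S4 = (TF, LF)
    p[4][1], p[4][3], p[4][5] = gamma / r4, eta / r4, beta / r4
    p[5][3] = 1.0                    # S5 = (LF, LF): repair ends -> S3
    p[6][0] = 1.0                    # S6 = PM: maintenance ends -> S0
    mu = [1 / (2 * alpha + theta), 1 / r1, 1 / r2, 1 / r3, 1 / r4,
          1 / gamma, 1 / kappa]
    return p, mu


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def mtsf_closed_form(p, mu):
    """E(U0) = N1/D1 from Eq. (7)."""
    p01, p10, p12, p13 = p[0][1], p[1][0], p[1][2], p[1][3]
    p21, p24, p30, p34 = p[2][1], p[2][4], p[3][0], p[3][4]
    p41, p43 = p[4][1], p[4][3]
    a = p12 * p24 + p13 * p34
    n1 = (mu[0] * ((1 - p12 * p21) * (1 - p34 * p43) - a * p41)
          + p01 * ((1 - p34 * p43) * (mu[1] + p12 * mu[2])
                   + (p13 + p12 * p24 * p43) * mu[3] + a * mu[4]))
    d1 = ((1 - p01 * p10 - p12 * p21) * (1 - p34 * p43) - a * p41
          - p01 * p30 * (p13 + p12 * p24 * p43))
    return n1 / d1


def mtsf_linear(p, mu):
    """Solve E(U_i) = mu_i + sum_j p_ij E(U_j) over the up states S0..S4."""
    up = range(5)                    # S5 and S6 are absorbing
    a = [[(1.0 if i == j else 0.0) - p[i][j] for j in up] for i in up]
    return solve(a, [mu[i] for i in up])[0]


# Illustrative parameter values only (not taken from the paper).
p, mu = model_params(alpha=0.05, beta=0.02, theta=0.01,
                     eta=1.0, gamma=0.5, kappa=2.0)
print(mtsf_closed_form(p, mu), mtsf_linear(p, mu))
```

Agreement of the two numbers only checks the algebra of (7) for these assumed inputs. Both MTSF functions accept any `p` and `mu`, so for non-exponential H, G and K they can be reused once the integrals above are evaluated.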

The corresponding denominator in (7) is

\[ D_1 = (1-p_{01}p_{10}-p_{12}p_{21})(1-p_{34}p_{43}) - (p_{12}p_{24}+p_{13}p_{34})p_{41} - p_{01}p_{30}(p_{13}+p_{12}p_{24}p_{43}). \]

AVAILABILITY ANALYSIS

According to the definition of \(A_i(t)\), we have the following recursive relations:

\[ A_0(t) = Z_0(t) + q_{01}(t) © A_1(t) + q_{06}(t) © A_6(t), \]
\[ A_1(t) = Z_1(t) + q_{10}(t) © A_0(t) + q_{12}(t) © A_2(t) + q_{13}(t) © A_3(t), \]
\[ A_2(t) = Z_2(t) + q_{21}(t) © A_1(t) + q_{24}(t) © A_4(t), \]
\[ A_3(t) = Z_3(t) + q_{30}(t) © A_0(t) + q_{34}(t) © A_4(t), \]
\[ A_4(t) = Z_4(t) + q_{41}(t) © A_1(t) + q_{43}(t) © A_3(t) + q_{45}(t) © A_5(t), \]
\[ A_5(t) = q_{53}(t) © A_3(t), \]
\[ A_6(t) = q_{60}(t) © A_0(t). \tag{8-14} \]

Taking the L.T. of the above relations and solving the resulting equations for \(A_0^*(s)\), we get

\[ A_0^*(s) = N_2(s)/D_2(s), \tag{15} \]

where

\[ N_2(s) = Z_0^*\big[(1-q_{12}^*q_{21}^*) - (q_{12}^*q_{24}^*+q_{13}^*q_{34}^*)q_{41}^*\big] + q_{34}^*(q_{43}^*+q_{45}^*q_{53}^*)\big[Z_0^*(q_{12}^*q_{21}^*-1) - q_{01}^*(Z_1^*+q_{12}^*Z_2^*)\big] + q_{01}^*\big[Z_1^* + q_{12}^*\{Z_2^* + q_{24}^*((q_{43}^*+q_{45}^*q_{53}^*)Z_3^* + Z_4^*)\}\big] + q_{01}^*q_{13}^*(Z_3^*+q_{34}^*Z_4^*) \]

and

\[ D_2(s) = (1-q_{06}^*q_{60}^*)\big[1 - q_{12}^*(q_{21}^*+q_{24}^*q_{41}^*) - q_{34}^*(q_{13}^*q_{41}^*+q_{43}^*+q_{45}^*q_{53}^*)\big] + q_{34}^*(q_{43}^*+q_{45}^*q_{53}^*)\big[q_{01}^*q_{10}^* + q_{12}^*q_{21}^*(1-q_{06}^*q_{60}^*)\big] - q_{01}^*\big[q_{12}^*q_{24}^*(q_{43}^*+q_{45}^*q_{53}^*) + q_{13}^*\big]q_{30}^* - q_{01}^*q_{10}^*. \]

The steady-state availability of the system is

\[ A_0(\infty) = \lim_{s\to 0} sA_0^*(s) = N_2/D_2, \tag{16} \]

where, in terms of \(p_{ij}\) and \(\mu_i\),

\[ N_2 = \mu_0\big[(1-p_{12}p_{21}) - (p_{12}p_{24}+p_{13}p_{34})p_{41}\big] + p_{34}(p_{43}+p_{45})\big[\mu_0(p_{12}p_{21}-1) - p_{01}(\mu_1+p_{12}\mu_2)\big] + p_{01}\big[\mu_1 + p_{12}\{\mu_2 + p_{24}((p_{43}+p_{45})\mu_3+\mu_4)\}\big] + p_{01}p_{13}(\mu_3+p_{34}\mu_4) \]

and

\[ D_2 = \big[(1-p_{34}p_{43}-p_{34}p_{45})(1-p_{12}p_{21}) - (p_{12}p_{24}+p_{13}p_{34})p_{41}\big](\mu_0+p_{06}\mu_6) + p_{01}(p_{30}+p_{34}p_{41})\mu_1 + p_{01}p_{12}(1-p_{34}p_{43}-p_{34}p_{45})\mu_2 + p_{01}\big[p_{13}+p_{12}p_{24}(p_{43}+p_{45})\big]\mu_3 + p_{01}(p_{12}p_{24}+p_{13}p_{34})\mu_4 + p_{01}(p_{12}p_{24}+p_{13}p_{34})p_{45}\mu_5. \]

BUSY PERIOD ANALYSIS

Using the definition of \(B_i(t)\), we have the following relations:

\[ B_0(t) = q_{01}(t) © B_1(t) + q_{06}(t) © B_6(t), \]
\[ B_1(t) = q_{10}(t) © B_0(t) + q_{12}(t) © B_2(t) + q_{13}(t) © B_3(t), \]
\[ B_2(t) = q_{21}(t) © B_1(t) + q_{24}(t) © B_4(t), \]
\[ B_3(t) = W_3(t) + q_{30}(t) © B_0(t) + q_{34}(t) © B_4(t), \]
\[ B_4(t) = W_4(t) + q_{41}(t) © B_1(t) + q_{43}(t) © B_3(t) + q_{45}(t) © B_5(t), \]
\[ B_5(t) = W_5(t) + q_{53}(t) © B_3(t), \]
\[ B_6(t) = W_6(t) + q_{60}(t) © B_0(t), \tag{17-23} \]

where

\[ W_3(t) = W_5(t) = \bar G(t), \qquad W_4(t) = e^{-\beta t}\bar H(t)\bar G(t) \qquad\text{and}\qquad W_6(t) = \bar K(t). \]

Taking the L.T. of (17-23) and solving for \(B_0^*(s)\), we get

\[ B_0^*(s) = N_3(s)/D_2(s), \tag{24} \]

where \(D_2(s)\) is as in (15) and

\[ N_3(s) = q_{01}^*q_{13}^*\big[W_3^* + q_{34}^*(W_4^*+q_{45}^*W_5^*)\big] + q_{01}^*q_{12}^*q_{24}^*\big[(q_{43}^*+q_{45}^*q_{53}^*)W_3^* + W_4^* + q_{45}^*W_5^*\big] + q_{06}^*W_6^*\big[(1-q_{12}^*q_{21}^*)(1-q_{34}^*q_{43}^*-q_{34}^*q_{45}^*q_{53}^*) - (q_{12}^*q_{24}^*+q_{13}^*q_{34}^*)q_{41}^*\big]. \]

In the long run, the fraction of time for which the repairman is busy (with repair or preventive maintenance) is

\[ B_0 = \lim_{s\to 0} sB_0^*(s) = N_3/D_2, \tag{25} \]

where \(W_3^*(0) = W_5^*(0) = \int_0^\infty \bar G(t)\,dt = \mu_5\), \(W_4^*(0) = \int_0^\infty e^{-\beta t}\bar H(t)\bar G(t)\,dt = \mu_4\) and \(W_6^*(0) = \int_0^\infty \bar K(t)\,dt = \mu_6\).
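The steady-state ratios (16) and (25) are renewal-reward averages over the embedded Markov chain of \(S_0, \dots, S_6\). If \(\pi\) is the stationary distribution of the t.p.m., then \(A_0(\infty) = \sum_{i=0}^{4}\pi_i\mu_i / \sum_{i=0}^{6}\pi_i\mu_i\) and the busy fraction is \(\sum_i \pi_i W_i^*(0) / \sum_i \pi_i\mu_i\). The Python sketch below again assumes exponential H, G and K with hypothetical rates eta, gamma and kappa. It computes both the closed forms \(N_2/D_2\) and \(N_3/D_2\) and these \(\pi\)-weighted ratios so they can be compared. The busy-time weights are taken exactly as the \(W_i^*(0)\) values stated above.

```python
# Sketch under an ASSUMPTION: exponential H, G, K with hypothetical rates
# eta, gamma, kappa. Checks (16) and (25) against pi-weighted ratios.


def model_params(alpha, beta, theta, eta, gamma, kappa):
    """Return (p, mu): t.p.m. p[i][j] and mean sojourn times mu[i], S0..S6."""
    p = [[0.0] * 7 for _ in range(7)]
    p[0][1] = 2 * alpha / (2 * alpha + theta)
    p[0][6] = theta / (2 * alpha + theta)
    r1, r2, r3, r4 = eta + alpha + beta, eta + 2 * beta, gamma + alpha, gamma + eta + beta
    p[1][0], p[1][2], p[1][3] = eta / r1, alpha / r1, beta / r1
    p[2][1], p[2][4] = eta / r2, 2 * beta / r2
    p[3][0], p[3][4] = gamma / r3, alpha / r3
    p[4][1], p[4][3], p[4][5] = gamma / r4, eta / r4, beta / r4
    p[5][3] = p[6][0] = 1.0          # S5 -> S3 and S6 -> S0 with certainty
    mu = [1 / (2 * alpha + theta), 1 / r1, 1 / r2, 1 / r3, 1 / r4,
          1 / gamma, 1 / kappa]
    return p, mu


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def stationary(p):
    """Embedded-chain distribution: pi = pi P with sum(pi) = 1."""
    n = len(p)
    a = [[p[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n                # replace one balance equation by normalisation
    return solve(a, [0.0] * (n - 1) + [1.0])


def closed_forms(p, mu):
    """A0(inf) = N2/D2 and B0 = N3/D2, with W3*(0) = W5*(0) = mu5 as stated."""
    p01, p06, p12, p13 = p[0][1], p[0][6], p[1][2], p[1][3]
    p21, p24, p30, p34 = p[2][1], p[2][4], p[3][0], p[3][4]
    p41, p43, p45 = p[4][1], p[4][3], p[4][5]
    a = p12 * p24 + p13 * p34
    c = 1 - p34 * p43 - p34 * p45
    base = c * (1 - p12 * p21) - a * p41
    n2 = (mu[0] * ((1 - p12 * p21) - a * p41)
          + p34 * (p43 + p45) * (mu[0] * (p12 * p21 - 1) - p01 * (mu[1] + p12 * mu[2]))
          + p01 * (mu[1] + p12 * (mu[2] + p24 * ((p43 + p45) * mu[3] + mu[4])))
          + p01 * p13 * (mu[3] + p34 * mu[4]))
    d2 = (base * (mu[0] + p06 * mu[6]) + p01 * (p30 + p34 * p41) * mu[1]
          + p01 * p12 * c * mu[2] + p01 * (p13 + p12 * p24 * (p43 + p45)) * mu[3]
          + p01 * a * mu[4] + p01 * a * p45 * mu[5])
    n3 = (p01 * p13 * (mu[5] + p34 * (mu[4] + p45 * mu[5]))
          + p01 * p12 * p24 * ((p43 + 2 * p45) * mu[5] + mu[4])
          + p06 * mu[6] * base)
    return n2 / d2, n3 / d2


def renewal_reward(p, mu):
    """Same ratios from pi: sum(pi_i * reward_i) / sum(pi_i * mu_i)."""
    pi = stationary(p)
    busy_w = [0.0, 0.0, 0.0, mu[5], mu[4], mu[5], mu[6]]   # W_i*(0) as stated
    cycle = sum(x * m for x, m in zip(pi, mu))
    avail = sum(pi[i] * mu[i] for i in range(5)) / cycle   # up states S0..S4
    busy = sum(x * w for x, w in zip(pi, busy_w)) / cycle
    return avail, busy


# Illustrative parameter values only (not taken from the paper).
p, mu = model_params(alpha=0.05, beta=0.02, theta=0.01, eta=1.0, gamma=0.5, kappa=2.0)
print(closed_forms(p, mu), renewal_reward(p, mu))
```

Replacing one balance equation of \(\pi = \pi P\) by the normalisation \(\sum_i \pi_i = 1\) is a standard way to obtain the stationary distribution of a small irreducible chain. Plain Gaussian elimination keeps the sketch free of third-party dependencies.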

Substituting these values, we get

\[ N_3 = p_{01}p_{13}\big[\mu_5 + p_{34}(\mu_4+p_{45}\mu_5)\big] + p_{01}p_{12}p_{24}\big[(p_{43}+2p_{45})\mu_5 + \mu_4\big] + p_{06}\mu_6\big[(1-p_{12}p_{21})(1-p_{34}p_{43}-p_{34}p_{45}) - (p_{12}p_{24}+p_{13}p_{34})p_{41}\big]. \]

REFERENCES

1. Hoang Pham and Shambhu J. Upadhyaya, Reliability analysis of a class of fault tolerant system, IEEE Trans. Reliab. 38, 333-337 (1989).
2. Rakesh Gupta and L. R. Goel, Profit analysis of a two-unit stand-by system with administrative delay in repair, Int. J. Systems Sci. 20, 1703-1712 (1989).
3. Zoram Paulvik and Predreg Rokie, An approach of fault tolerant system reliability modelling, Microelectron. Reliab. 29, 343-348 (1989).