Reliability evaluation of systems with critical human error


Microelectron. Reliab., Vol. 24, No. 4, pp. 743-759, 1984. Printed in Great Britain.

0026-2714/84 $3.00 + .00 © 1984 Pergamon Press Ltd.

BALBIR S. DHILLON and R. B. MISRA

Department of Mechanical Engineering, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada

(Received for publication 24th April 1984)

ABSTRACT

This paper presents four mathematical models to evaluate reliability of redundant systems with critical human error. Equations for system reliability, state probabilities and mean time to failure are developed. System reliability and mean time to failure plots are shown.

INTRODUCTION

Humans interact with engineering systems in many ways. Examples of interactions may be seen each day in nuclear power plants, cockpits of aeroplanes, computer operation rooms, and so on. In earlier reliability evaluation, attention was given only to the system, and the reliability of the human component was overlooked. This fact was realized by H.L. Williams [1] in 1958. He pointed out that realistic system reliability analysis must take into consideration the reliability of the human element as well. According to D. Meister [2], about 20-30 percent of failures are, directly or indirectly, due to human actions. Some examples of human errors are as follows:

i) Maintenance errors

ii) Misinterpretation of instruments

iii) Wrong actions

A list of publications on human reliability is given in reference [3]. Similar models can be found in references [4-6]. The purpose of this paper is to present a methodology and procedure for evaluating reliability of redundant systems with critical human error. Therefore, this paper presents four Markov models of well known redundant configurations [7].


Model I represents a non-repairable parallel system of two identical units. The system transition diagram is shown in Figure 1. The system fails due to critical human errors or hardware failures. This model separates only the critical human errors from the hardware failures: a critical human error is one due to which both units fail (when both units were operating normally) or would have failed (when only one unit was operating normally). More clearly, due to such a human action the system fails when both units are functioning normally. Furthermore, when only one unit is operating normally, the same human action (that is, the action which would cause both units to fail simultaneously) fails the operating unit. In other words, if both units had been operating normally instead of only one, the entire system would have failed due to the same action. For example, if a human error causes a fire in the room where both units are located, the entire two unit system fails irrespective of whether one or two units are operating successfully. Critical human errors of this type are considered in Models I-IV. Non-critical human errors are added into hardware unit failures in all four models.

The only real difference between Models I and II is that Model I represents a non-repairable system whereas Model II represents the same system with one exception: the system is repaired from its one unit failed state to the both units operating state. The analyses of the two models differ quite significantly. The state space diagram of Model II is shown in Figure 2.

Models III and IV represent a standby system of two identical units. In both cases one unit is active and the remaining one is on standby. As soon as the active unit fails due to hardware failure or non-critical human error, it is immediately replaced with the standby unit. The operating and the standby unit may fail simultaneously due to critical human errors as outlined for Models I and II. Furthermore, the standby system with one unit failed and the standby operating may also fail due to the same critical human error which would have caused the operating and standby units to fail simultaneously. A fire, caused by human error, in the room where the two unit standby system is located is an example of such a critical human error. The state-space diagrams associated with Models III and IV are shown in Figures 3 and 4. The only difference between Models III and IV is that in Model III the failed unit is repaired whereas in Model IV it is not repaired. However, in both cases the failed system is never repaired.

The state probability, system reliability and mean time to failure expressions are developed for all four models. The following assumptions and notation are common to all four models:

ASSUMPTIONS

1. Failure, repair and human error rates are constant.

2. The repaired system is as good as new.

3. System failures are statistically independent.

4. The entire system can fail due to critical human errors.

5. The critical human errors may occur when either both system units are good or when one system unit is good.

6. System units are identical.

NOTATION

0 in circles of Figures 1 and 2 denotes that both units in the system are in the operating state.

0 in circles of Figures 3 and 4 denotes that one unit is operating and the remaining one is on standby.

1 in circles of Figures 1, 2, 3 and 4 denotes that only one of the units is operating.

2 in circles of Figures 1, 2, 3 and 4 denotes that the system is in the failed state due to critical human error.

3 in circles of Figures 1, 2, 3 and 4 denotes that the system is in the failed state due to hardware failures plus non-critical human errors.

$\lambda$  Constant unit failure rate (this also includes non-critical human errors).

$\lambda_{h1}$  Constant critical human error rate when two units are in the operating state.

$\lambda_{h2}$  Constant critical human error rate when only one unit is in the operating state. (Note: in the standby system either one unit is operating and the other one is on standby, or one unit has failed and the standby is operating.)

$\mu$  Constant unit repair rate.

$P_i(t)$  Probability that the system is in the ith state at time $t$; for $i = 0, 1, 2, 3$.

$R(t)$  System reliability at time $t$.

MTTF  System mean time to failure.

$s$  Laplace transform variable.


ANALYSIS

In this section equations for Markov Models I, II, III and IV are developed.

Model I

Fig. 1. System transition diagram for two unit parallel system.

The system of first order differential equations associated with Fig. 1 is

$$P_0'(t) = -(2\lambda + \lambda_{h1})\,P_0(t) \tag{1}$$

$$P_1'(t) = -(\lambda + \lambda_{h2})\,P_1(t) + 2\lambda\,P_0(t) \tag{2}$$

$$P_2'(t) = \lambda_{h1}\,P_0(t) + \lambda_{h2}\,P_1(t) \tag{3}$$

$$P_3'(t) = \lambda\,P_1(t) \tag{4}$$

where the prime denotes differentiation with respect to time $t$. At $t = 0$, $P_0(t) = 1$ and all other initial condition probabilities are equal to zero.

Solving the set of equations (1)-(4) yields the state probabilities in terms of Laplace transforms as follows:

$$P_0(s) = \frac{1}{s + a_1} \tag{5}$$

$$P_1(s) = \frac{2\lambda}{(s + a_1)(s + a_2)} \tag{6}$$

$$P_2(s) = \frac{2\lambda\lambda_{h2} + \lambda_{h1}(s + a_2)}{s\,(s + a_1)(s + a_2)} \tag{7}$$

$$P_3(s) = \frac{2\lambda^2}{s\,(s + a_1)(s + a_2)} \tag{8}$$

where

$$a_1 = 2\lambda + \lambda_{h1}, \qquad a_2 = \lambda + \lambda_{h2}$$
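As a worked step (a sketch of the standard Laplace-transform argument, added for illustration rather than taken from the original derivation), the first two transforms follow from equations (1) and (2) with the initial conditions $P_0(0) = 1$, $P_1(0) = 0$ and the abbreviations $a_1 = 2\lambda + \lambda_{h1}$, $a_2 = \lambda + \lambda_{h2}$:

$$s\,P_0(s) - 1 = -a_1\,P_0(s) \;\Longrightarrow\; P_0(s) = \frac{1}{s + a_1}$$

$$s\,P_1(s) = -a_2\,P_1(s) + 2\lambda\,P_0(s) \;\Longrightarrow\; P_1(s) = \frac{2\lambda}{(s + a_1)(s + a_2)}$$

The transforms of the two failed states follow in the same way by dividing the right-hand sides of the third and fourth differential equations by $s$.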

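The state probabilities are obtained by taking the inverse Laplace transform of equations (5)-(8) and are given by

$$P_0(t) = e^{-a_1 t} \tag{9}$$

$$P_1(t) = A_1\left(e^{-a_1 t} - e^{-a_2 t}\right) \tag{10}$$

$$P_2(t) = A_2 - A_3 e^{-a_1 t} - A_4 e^{-a_2 t} \tag{11}$$

$$P_3(t) = A_5 - A_6 e^{-a_1 t} - A_7 e^{-a_2 t} \tag{12}$$

where

$$A_1 = \frac{2\lambda}{a_2 - a_1}, \qquad A_2 = \frac{2\lambda\lambda_{h2} + \lambda_{h1} a_2}{a_1 a_2}$$

$$A_3 = \frac{2\lambda\lambda_{h2} + \lambda_{h1}(a_2 - a_1)}{a_1 (a_2 - a_1)}, \qquad A_4 = \frac{2\lambda\lambda_{h2}}{a_2 (a_1 - a_2)}$$

$$A_5 = \frac{2\lambda^2}{a_1 a_2}, \qquad A_6 = \frac{2\lambda^2}{a_1 (a_2 - a_1)}, \qquad A_7 = \frac{2\lambda^2}{a_2 (a_1 - a_2)}$$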

The system reliability expression in terms of Laplace transform is obtained by adding equations (5) and (6):

$$R(s) = P_0(s) + P_1(s)$$

or

$$R(s) = \frac{s + a_3}{(s + a_1)(s + a_2)} \tag{13}$$

where

$$a_3 = a_2 + 2\lambda$$

The system MTTF is given by

$$\mathrm{MTTF} = \lim_{s \to 0} R(s) = \lim_{s \to 0} \frac{s + a_3}{(s + a_1)(s + a_2)}$$

or

$$\mathrm{MTTF} = \frac{a_3}{a_1 a_2} = \frac{3\lambda + \lambda_{h2}}{(2\lambda + \lambda_{h1})(\lambda + \lambda_{h2})} \tag{14}$$

From equation (13) the system reliability is given by

$$R(t) = \mathcal{L}^{-1}\left[\frac{s + a_3}{(s + a_1)(s + a_2)}\right]$$

or

$$R(t) = (1 + A_1)\,e^{-a_1 t} - A_1\,e^{-a_2 t} \tag{15}$$

where $A_1$ is the constant defined for equation (10). Plots of system reliability and MTTF are shown in Figures 5, 6 and 7 for different values of $\lambda$, $t$ and critical human error rate.
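The Model I results can be cross-checked numerically. The following is a minimal sketch, not the authors' procedure: it integrates the four state equations of Model I with a classical fourth-order Runge-Kutta scheme and compares $P_0 + P_1$ with the closed-form reliability of equation (15). The function names and the step count are illustrative assumptions; the rates $\lambda = 0.01$, $\lambda_{h1} = 0.005$, $\lambda_{h2} = 0.002$ are the values quoted in the Results section.

```python
import math


def model1_rhs(p, lam, lam_h1, lam_h2):
    """Right-hand sides of equations (1)-(4) for p = [P0, P1, P2, P3]."""
    p0, p1 = p[0], p[1]
    return [
        -(2 * lam + lam_h1) * p0,
        -(lam + lam_h2) * p1 + 2 * lam * p0,
        lam_h1 * p0 + lam_h2 * p1,
        lam * p1,
    ]


def integrate_model1(t_end, lam, lam_h1, lam_h2, steps=2000):
    """Runge-Kutta (RK4) integration from t = 0 with P0(0) = 1."""
    p = [1.0, 0.0, 0.0, 0.0]
    h = t_end / steps
    args = (lam, lam_h1, lam_h2)
    for _ in range(steps):
        k1 = model1_rhs(p, *args)
        k2 = model1_rhs([x + 0.5 * h * k for x, k in zip(p, k1)], *args)
        k3 = model1_rhs([x + 0.5 * h * k for x, k in zip(p, k2)], *args)
        k4 = model1_rhs([x + h * k for x, k in zip(p, k3)], *args)
        p = [x + h * (a + 2 * b + 2 * c + d) / 6.0
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


def model1_reliability(t, lam, lam_h1, lam_h2):
    """Closed form of equation (15); assumes a1 != a2."""
    a1 = 2 * lam + lam_h1
    a2 = lam + lam_h2
    big_a1 = 2 * lam / (a2 - a1)
    return (1 + big_a1) * math.exp(-a1 * t) - big_a1 * math.exp(-a2 * t)


def model1_mttf(lam, lam_h1, lam_h2):
    """Equation (14)."""
    return (3 * lam + lam_h2) / ((2 * lam + lam_h1) * (lam + lam_h2))


if __name__ == "__main__":
    rates = (0.01, 0.005, 0.002)  # lambda, lambda_h1, lambda_h2
    p = integrate_model1(20.0, *rates)
    print("numerical R(20):  ", p[0] + p[1])
    print("closed-form R(20):", model1_reliability(20.0, *rates))
    print("MTTF:             ", model1_mttf(*rates))
```

With these rates both routes give a reliability of about 0.8836 at $t = 20$, and the four state probabilities sum to one at every step, which is a convenient sanity check on the transition rates.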

Model II

Fig. 2. System diagram for two unit parallel system with repair.

The system of first order differential equations associated with Figure 2 is

$$P_0'(t) = -(2\lambda + \lambda_{h1})\,P_0(t) + \mu\,P_1(t) \tag{16}$$

$$P_1'(t) = -(\lambda + \lambda_{h2} + \mu)\,P_1(t) + 2\lambda\,P_0(t) \tag{17}$$

$$P_2'(t) = \lambda_{h1}\,P_0(t) + \lambda_{h2}\,P_1(t) \tag{18}$$

$$P_3'(t) = \lambda\,P_1(t) \tag{19}$$

At $t = 0$, $P_0(t) = 1.0$ and all other initial condition probabilities are equal to zero. Solving the set of equations (16)-(19) yields the state probabilities in terms of Laplace transform as follows:

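$$P_0(s) = \frac{s^2 (s + b_2)}{A} \tag{20}$$

$$P_1(s) = \frac{2\lambda s^2}{A} \tag{21}$$

$$P_2(s) = \frac{s\left(2\lambda\lambda_{h2} + \lambda_{h1}(s + b_2)\right)}{A} \tag{22}$$

$$P_3(s) = \frac{2\lambda^2 s}{A} \tag{23}$$

where

$$b_1 = 2\lambda + \lambda_{h1}, \qquad b_2 = \lambda + \lambda_{h2} + \mu$$

$$K_1 = 3\lambda + \lambda_{h1} + \lambda_{h2} + \mu$$

$$K_2 = 2\lambda^2 + 2\lambda\lambda_{h2} + \lambda\lambda_{h1} + \lambda_{h1}\lambda_{h2} + \mu\lambda_{h1}$$

$$b_3 = \frac{K_1 + \sqrt{K_1^2 - 4K_2}}{2}, \qquad b_4 = \frac{K_1 - \sqrt{K_1^2 - 4K_2}}{2}$$

$$A = s^2 (s + b_3)(s + b_4)$$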

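The system state probabilities in the time domain are obtained from equations (20)-(23):

$$P_0(t) = B_1 e^{-b_3 t} + B_2 e^{-b_4 t} \tag{24}$$

$$P_1(t) = B_3\left(e^{-b_3 t} - e^{-b_4 t}\right) \tag{25}$$

$$P_2(t) = B_5 e^{-b_3 t} + B_6 e^{-b_4 t} + B_7 \tag{26}$$

$$P_3(t) = B_8 e^{-b_3 t} + B_9 e^{-b_4 t} + B_{10} \tag{27}$$

where

$$B_1 = \frac{b_3 - b_2}{b_3 - b_4}, \qquad B_2 = \frac{b_2 - b_4}{b_3 - b_4}, \qquad B_3 = \frac{2\lambda}{b_4 - b_3}$$

$$B_4 = 2\lambda\lambda_{h2} + b_2\lambda_{h1}$$

$$B_5 = \frac{B_4 - b_3\lambda_{h1}}{b_3 (b_3 - b_4)}, \qquad B_6 = \frac{b_4\lambda_{h1} - B_4}{b_4 (b_3 - b_4)}, \qquad B_7 = \frac{B_4}{b_3 b_4}$$

$$B_8 = \frac{2\lambda^2}{b_3 (b_3 - b_4)}, \qquad B_9 = \frac{2\lambda^2}{b_4 (b_4 - b_3)}, \qquad B_{10} = \frac{2\lambda^2}{b_3 b_4}$$

The system reliability expression in terms of Laplace transform is obtained by adding equations (20) and (21):

$$R(s) = \sum_{i=0}^{1} P_i(s) = \frac{s + b_5}{(s + b_3)(s + b_4)} \tag{28}$$

where

$$b_5 = b_2 + 2\lambda$$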

The inverse Laplace transform of equation (28) gives the reliability expression in the time domain:

$$R(t) = (B_1 + B_3)\,e^{-b_3 t} + (B_2 - B_3)\,e^{-b_4 t} \tag{29}$$

The system MTTF may be obtained from equation (28) by taking the limit $s \to 0$:

$$\mathrm{MTTF} = \lim_{s \to 0} R(s) = \frac{3\lambda + \lambda_{h2} + \mu}{2\lambda^2 + 2\lambda\lambda_{h2} + \lambda\lambda_{h1} + \lambda_{h1}\lambda_{h2} + \mu\lambda_{h1}} \tag{30}$$

The plots of equations (29) and (30) are shown in Figures 5, 6 and 7 for different values of $\lambda$, $t$ and critical human error rate.
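The Model II expressions translate directly into a few lines of code. The sketch below is illustrative only (the function names are assumptions, not part of the paper): it evaluates the roots $b_3$, $b_4$, the reliability of equation (29) and the MTTF of equation (30), using $B_1 = (b_3 - b_2)/(b_3 - b_4)$, $B_2 = (b_2 - b_4)/(b_3 - b_4)$ and $B_3 = 2\lambda/(b_4 - b_3)$. Setting $\mu = 0$ in the MTTF reduces it to the non-repairable Model I value $(3\lambda + \lambda_{h2})/((2\lambda + \lambda_{h1})(\lambda + \lambda_{h2}))$, which gives a quick consistency check.

```python
import math


def model2_roots(lam, lam_h1, lam_h2, mu):
    """Return (b2, b3, b4) for the two unit parallel system with repair."""
    b2 = lam + lam_h2 + mu
    k1 = 3 * lam + lam_h1 + lam_h2 + mu
    k2 = (2 * lam ** 2 + 2 * lam * lam_h2 + lam * lam_h1
          + lam_h1 * lam_h2 + mu * lam_h1)
    # K1^2 - 4*K2 equals (b1 - b2)^2 + 8*lam*mu, so it is never negative.
    root = math.sqrt(k1 ** 2 - 4 * k2)
    return b2, (k1 + root) / 2, (k1 - root) / 2


def model2_reliability(t, lam, lam_h1, lam_h2, mu):
    """Equation (29); assumes b3 != b4."""
    b2, b3, b4 = model2_roots(lam, lam_h1, lam_h2, mu)
    big_b1 = (b3 - b2) / (b3 - b4)
    big_b2 = (b2 - b4) / (b3 - b4)
    big_b3 = 2 * lam / (b4 - b3)
    return ((big_b1 + big_b3) * math.exp(-b3 * t)
            + (big_b2 - big_b3) * math.exp(-b4 * t))


def model2_mttf(lam, lam_h1, lam_h2, mu):
    """Equation (30)."""
    numerator = 3 * lam + lam_h2 + mu
    denominator = (2 * lam ** 2 + 2 * lam * lam_h2 + lam * lam_h1
                   + lam_h1 * lam_h2 + mu * lam_h1)
    return numerator / denominator


if __name__ == "__main__":
    rates = (0.01, 0.005, 0.002, 0.02)  # lambda, lambda_h1, lambda_h2, mu
    print("R(20):", model2_reliability(20.0, *rates))
    print("MTTF with repair:   ", model2_mttf(*rates))
    print("MTTF without repair:", model2_mttf(0.01, 0.005, 0.002, 0.0))
```

With the rates quoted in the Results section, repair raises the MTTF from about 106.7 to 130 time units.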

Model III

Fig. 3. System transition diagram for two unit standby system.

This model represents a two unit standby system. Repair when both units have failed is not considered. The system of differential equations associated with this model, in terms of Laplace transform, is as follows:

$$(s + \lambda + \lambda_{h2})\,P_0(s) - \mu\,P_1(s) = 1.0 \tag{31}$$

$$(s + \lambda + \lambda_{h2} + \mu)\,P_1(s) - \lambda\,P_0(s) = 0 \tag{32}$$

$$s\,P_2(s) - \lambda_{h2}\,P_0(s) - \lambda_{h2}\,P_1(s) = 0 \tag{33}$$

$$s\,P_3(s) - \lambda\,P_1(s) = 0 \tag{34}$$

Solving equations (31)-(34) yields the state probability expressions in Laplace transform:

$$P_0(s) = \frac{s + c}{(s + c_1)(s + c_2)} \tag{35}$$

$$P_1(s) = \frac{\lambda}{(s + c_1)(s + c_2)} \tag{36}$$

$$P_2(s) = \frac{\lambda_{h2}\,(s + c_3)}{s\,(s + c_1)(s + c_2)} \tag{37}$$

$$P_3(s) = \frac{\lambda^2}{s\,(s + c_1)(s + c_2)} \tag{38}$$

where

$$c = \lambda + \lambda_{h2} + \mu, \qquad c_3 = c + \lambda, \qquad c_4 = c + \lambda + \lambda_{h2}$$

$$c_5 = c\,(\lambda + \lambda_{h2}) - \mu\lambda$$

$$c_1 = \frac{c_4 + \sqrt{c_4^2 - 4c_5}}{2}, \qquad c_2 = \frac{c_4 - \sqrt{c_4^2 - 4c_5}}{2}$$

The system state probabilities in the time domain are evaluated from equations (35)-(38):

$$P_0(t) = C_1 e^{-c_1 t} + C_2 e^{-c_2 t} \tag{39}$$

$$P_1(t) = C_3 e^{-c_1 t} - C_3 e^{-c_2 t} \tag{40}$$

$$P_2(t) = C_4 e^{-c_1 t} + C_5 e^{-c_2 t} + C_6 \tag{41}$$

$$P_3(t) = C_7 e^{-c_1 t} + C_8 e^{-c_2 t} + C_9 \tag{42}$$

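where

$$C_1 = \frac{c_1 - c}{c_1 - c_2}, \qquad C_2 = \frac{c - c_2}{c_1 - c_2}, \qquad C_3 = \frac{\lambda}{c_2 - c_1}$$

$$C_4 = \frac{-c_1\lambda_{h2} + c_3\lambda_{h2}}{c_1 (c_1 - c_2)}, \qquad C_5 = \frac{c_2\lambda_{h2} - c_3\lambda_{h2}}{c_2 (c_1 - c_2)}, \qquad C_6 = \frac{c_3\lambda_{h2}}{c_1 c_2}$$

$$C_7 = \frac{\lambda^2}{c_1 (c_1 - c_2)}, \qquad C_8 = \frac{\lambda^2}{c_2 (c_2 - c_1)}, \qquad C_9 = \frac{\lambda^2}{c_1 c_2}$$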

The system reliability expression in terms of Laplace transform is obtained by adding equations (35) and (36):

$$R(s) = \sum_{i=0}^{1} P_i(s) = \frac{s + c_3}{(s + c_1)(s + c_2)} \tag{43}$$

The system reliability expression in the time domain is obtained by taking the inverse Laplace transform of equation (43):

$$R(t) = (C_1 + C_3)\,e^{-c_1 t} + (C_2 - C_3)\,e^{-c_2 t} \tag{44}$$

where $C_1$, $C_2$ and $C_3$ have already been defined for this model. The system MTTF is given by

$$\mathrm{MTTF} = \lim_{s \to 0} R(s) = \int_0^\infty R(t)\,dt = \frac{2\lambda + \lambda_{h2} + \mu}{\lambda^2 + 2\lambda\lambda_{h2} + \lambda_{h2}^2 + \mu\lambda_{h2}} \tag{45}$$

The plots of equations (44) and (45) are shown in Figures 5, 6 and 7 for different values of $\lambda$, $t$ and $\lambda_{h2}$.
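The Model III state probabilities can be evaluated and checked in code. The following is a hedged sketch (function names are illustrative assumptions): it computes equations (39)-(42) from the constants $c$, $c_1, \dots, c_5$ and the partial-fraction coefficients, and verifies that the four probabilities sum to one. It requires $\mu > 0$; for $\mu = 0$ the roots $c_1$ and $c_2$ coincide and the repeated-root expressions of Model IV apply instead.

```python
import math


def model3_state_probabilities(t, lam, lam_h2, mu):
    """Return (P0, P1, P2, P3) from equations (39)-(42); requires mu > 0."""
    c = lam + lam_h2 + mu
    c3 = c + lam
    c4 = c + lam + lam_h2
    c5 = c * (lam + lam_h2) - mu * lam
    # c4^2 - 4*c5 simplifies to mu^2 + 4*mu*lam, positive for mu > 0.
    root = math.sqrt(c4 ** 2 - 4 * c5)
    c1, c2 = (c4 + root) / 2, (c4 - root) / 2
    e1, e2 = math.exp(-c1 * t), math.exp(-c2 * t)

    k1 = (c1 - c) / (c1 - c2)                     # C1
    k2 = (c - c2) / (c1 - c2)                     # C2
    k3 = lam / (c2 - c1)                          # C3
    k4 = lam_h2 * (c3 - c1) / (c1 * (c1 - c2))    # C4
    k5 = lam_h2 * (c2 - c3) / (c2 * (c1 - c2))    # C5
    k6 = c3 * lam_h2 / (c1 * c2)                  # C6
    k7 = lam ** 2 / (c1 * (c1 - c2))              # C7
    k8 = lam ** 2 / (c2 * (c2 - c1))              # C8
    k9 = lam ** 2 / (c1 * c2)                     # C9

    p0 = k1 * e1 + k2 * e2
    p1 = k3 * e1 - k3 * e2
    p2 = k4 * e1 + k5 * e2 + k6
    p3 = k7 * e1 + k8 * e2 + k9
    return p0, p1, p2, p3


def model3_mttf(lam, lam_h2, mu):
    """Equation (45)."""
    return (2 * lam + lam_h2 + mu) / (
        lam ** 2 + 2 * lam * lam_h2 + lam_h2 ** 2 + mu * lam_h2)


if __name__ == "__main__":
    rates = (0.01, 0.002, 0.02)  # lambda, lambda_h2, mu
    for t in (0.0, 20.0, 100.0):
        p = model3_state_probabilities(t, *rates)
        print(t, "R =", p[0] + p[1], "sum =", sum(p))
    print("MTTF:", model3_mttf(*rates))
```

The system reliability is the sum of the first two entries, so the same function also evaluates equation (44).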

Model IV

Fig. 4. Two unit standby system.

For this model the system of equations is obtained directly from equations (31)-(34) by setting $\mu = 0$. Solving the resulting equations leads to the following state probability, system reliability and mean time to failure expressions in Laplace transform, respectively:

$$P_0(s) = \frac{1}{s + d} \tag{46}$$

$$P_1(s) = \frac{\lambda}{(s + d)^2} \tag{47}$$

$$P_2(s) = \frac{\lambda_{h2}\,(s + d_1)}{s\,(s + d)^2} \tag{48}$$

$$P_3(s) = \frac{\lambda^2}{s\,(s + d)^2} \tag{49}$$

where

$$d = \lambda + \lambda_{h2}, \qquad d_1 = 2\lambda + \lambda_{h2}$$

Thus the system reliability in Laplace transform is

$$R(s) = \sum_{i=0}^{1} P_i(s) = \frac{s + d_1}{(s + d)^2} \tag{50}$$

By utilizing equation (50), the system mean time to failure is

$$\mathrm{MTTF} = \lim_{s \to 0} R(s) = \frac{2\lambda + \lambda_{h2}}{(\lambda + \lambda_{h2})^2} \tag{51}$$

The system state probabilities in the time domain are given by:

$$P_0(t) = e^{-dt} \tag{52}$$

$$P_1(t) = \lambda t\,e^{-dt} \tag{53}$$

$$P_2(t) = D_1 - D_1 e^{-dt} + D_2\,t\,e^{-dt} \tag{54}$$

$$P_3(t) = D_3 - D_3 e^{-dt} - D_4\,t\,e^{-dt} \tag{55}$$

where

$$D_1 = \frac{d_1\lambda_{h2}}{d^2}, \qquad D_2 = \frac{\lambda_{h2}\,(d - d_1)}{d}, \qquad D_3 = \frac{\lambda^2}{d^2}, \qquad D_4 = d\,D_3$$

The system reliability in the time domain is

$$R(t) = \sum_{i=0}^{1} P_i(t) = (1 + \lambda t)\,e^{-dt} \tag{56}$$

The plots of system reliability and MTTF for different values of $\lambda$, $\lambda_h$ and time are shown in Figures 5, 6 and 7.
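As a consistency check (a worked step added for illustration, not part of the original derivation), integrating the Model IV reliability $R(t) = (1 + \lambda t)\,e^{-dt}$ over time, with $d = \lambda + \lambda_{h2}$, reproduces the mean time to failure obtained from the Laplace-domain limit:

$$\mathrm{MTTF} = \int_0^\infty (1 + \lambda t)\,e^{-dt}\,dt = \frac{1}{d} + \frac{\lambda}{d^2} = \frac{d + \lambda}{d^2} = \frac{2\lambda + \lambda_{h2}}{(\lambda + \lambda_{h2})^2}$$

For example, with $\lambda = 0.01$ and $\lambda_{h2} = 0.002$ (the values used in the Results section) this gives $0.022 / 0.000144 \approx 152.8$ time units.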

Fig. 5. System reliability plot (reliability versus time in hours; Models 1, 2, 3 and 4).

Fig. 6. Mean-time-to-failure plot (MTTF versus human failure rate $\lambda_{h1}$ in percent; Models 1, 2, 3 and 4).

Fig. 7. System mean-time-to-failure plot (MTTF versus human failure rate $\lambda_{h2}$ in percent; Models 1, 2, 3 and 4).

RESULTS AND DISCUSSION

Numerical values of system reliability and MTTF were obtained using the equations developed above with different values of $\lambda$, $\lambda_{h1}$, $\lambda_{h2}$ and $\mu$. Figure 5 shows the plots of reliability versus time for all four models. These plots are obtained for $\lambda = 0.01$, $\mu = 0.02$, $\lambda_{h1} = 0.005$ and $\lambda_{h2} = 0.002$, using equations (15), (29), (44) and (56). Figure 5 shows that the system reliability decreases as time increases. Table 1 gives the values of system reliability for all four models for the same values of $\lambda$ and $\mu$, considering the above values of critical human error rates. System reliability values under column A correspond to $\lambda = 0.01$, $\lambda_{h1} = 0.0$, $\lambda_{h2} = 0.0$ and $\mu = 0.02$, whereas system reliability values under column B are the same as plotted in Figure 5.

Table 1. System reliability

Time    Model 1           Model 2           Model 4           Model 3
        A       B         A       B         A       B         B
0       1       1         1       1         1       1         1
20      .9671   .8836     .9707   .8859     .9825   .9439     .9458
40      .8913   .7539     .9112   .7660     .9384   .8663     .8774
60      .7964   .6287     .8435   .6555     .8781   .7788     .8063
80      .6968   .5162     .7763   .5583     .8088   .6892     .7376
100     .6004   .4192     .7125   .4747     .7357   .6023     .6732
120     .5117   .3377     .6533   .4032     .6628   .5212     .6137
140     .4324   .2705     .5986   .3423     .5918   .4473     .5592
160     .3630   .2157     .5485   .2906     .5249   .3812     .5094
180     .3033   .1714     .5025   .2467     .4628   .3229     .4640
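The effect of critical human errors in Table 1 can be quantified as a percentage reduction in reliability. The short sketch below is illustrative only: it takes the first two reliability columns of Table 1 as the Model 1 values without (A) and with (B) critical human errors, which is an assumption about the column layout that agrees with evaluating the Model I reliability expression at the stated rates, and computes the reduction $(A - B)/A$ at each tabulated time. The variable and function names are hypothetical.

```python
# Model 1 reliabilities read from Table 1 (columns A and B), t = 20 ... 180.
times = [20, 40, 60, 80, 100, 120, 140, 160, 180]
col_a = [.9671, .8913, .7964, .6968, .6004, .5117, .4324, .3630, .3033]
col_b = [.8836, .7539, .6287, .5162, .4192, .3377, .2705, .2157, .1714]


def percentage_reductions(without_error, with_error):
    """Relative loss of reliability caused by critical human errors, in percent."""
    return [100.0 * (a - b) / a for a, b in zip(without_error, with_error)]


reductions = percentage_reductions(col_a, col_b)

if __name__ == "__main__":
    for t, r in zip(times, reductions):
        print(f"t = {t:3d}   reduction = {r:5.2f} %")
```

For these values the reduction grows steadily with time, from under 9 percent at $t = 20$ to over 43 percent at $t = 180$.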

Note that the percentage reduction in system reliability increases with time when critical human error rates are considered. Figure 6 shows the plots of MTTF versus critical human error rate $\lambda_{h1}$, whereas Figure 7 shows the plots of MTTF for different values of $\lambda_{h2}$. The values of $\lambda$ and $\mu$ for the plots of Figures 6 and 7 are 0.01 and 0.02 respectively. The values $\lambda_{h2} = 0.002$ and $\lambda_{h1} = 0.005$ are used for Figures 6 and 7 respectively. The MTTF decreases as the critical human error rate increases.

CONCLUSION

The models presented in this paper are typical examples of man-machine systems. The analysis presented explains the effect of critical human error rate on system reliability. The analysis should be useful to design engineers in optimizing their designs to achieve reliability goals. The analysis presented can be extended to general systems.

ACKNOWLEDGEMENT

The financial assistance of the Natural Sciences and Engineering Research Council of Canada is gratefully appreciated.

REFERENCES

1. H.L. Williams, Reliability evaluation of the human component in man-machine systems, Electrical Manufacturing, April 1958.

2. D. Meister, The problem of human-initiated failures, Eighth National Symposium on Reliability and Quality Control, 1962.

3. B.S. Dhillon, On human reliability-bibliography, Microelectronics and Reliability, Vol. 20, 1980, pp. 371-373.

4. B.S. Dhillon, RAM analysis of vehicles in changing weather, Proceedings of the Annual Reliability and Maintainability Symposium, 1984, pp. 48-53.

5. B.S. Dhillon, Stochastic models for predicting human reliability, Microelectronics and Reliability, Vol. 22, 1982, pp. 491-496.

6. B.S. Dhillon, Reliability Engineering in Systems Design and Operation, Van Nostrand Reinhold Company, New York, 1982.

7. B.S. Dhillon, Systems Reliability, Maintainability and Management, Petrocelli Books, Inc., New York, 1983. Distributed by Van Nostrand Reinhold Company, New York.