Microelectron. Reliab., Vol. 24, No. 4, pp. 743-759, 1984. Printed in Great Britain.
0026-2714/84 $3.00 + .00 © 1984 Pergamon Press Ltd.

RELIABILITY EVALUATION OF SYSTEMS WITH CRITICAL HUMAN ERROR

BALBIR S. DHILLON and R. B. MISRA
Department of Mechanical Engineering, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada

(Received for publication 24th April 1984)
ABSTRACT
This paper presents four mathematical models to evaluate reliability of redundant systems with critical human error. Equations for system reliability, state probabilities and mean time to failure are developed. System reliability and mean time to failure plots are shown.
INTRODUCTION
Humans interact with engineering systems in many ways. Examples of interactions may be seen each day in nuclear power plants, computer operation rooms, cockpits of aeroplanes and so on. In the earlier system reliability evaluation, attention was given only to the system, and reliability of the human component was overlooked. This fact was realized by H.L. Williams [1] in 1958. He pointed out that realistic system reliability analysis must take into consideration the reliability of the human element as well. According to D. Meister [2], about 20-30 percent of failures, directly or indirectly, are due to human actions. Some of the examples of the human errors are as follows:

i) Maintenance errors
ii) Misinterpretation of instruments
iii) Wrong actions

A list of publications on human reliability is given in reference [3]. Similar models can be found in references [4-6]. The purpose of this paper is to present a methodology and procedure for evaluating reliability of redundant systems with critical human error. Therefore, this paper presents four Markov models of well known redundant configurations [7].
B.S. DHILLON and R. B. MISRA
Model I represents a non-repairable two identical unit parallel system. The system transition diagram is shown in Figure 1. The system fails due to critical human errors or hardware failures. This model separates only the critical human errors from the hardware failures. In other words, only those human errors are treated as critical due to which both units fail (i.e. when both units were operating normally) or would have failed (i.e. when only one unit was operating normally). More clearly, due to a human action the system fails when both units are functioning normally. Furthermore, when only one unit is operating normally, the operating unit fails due to the same human action (in other words, that action which caused both units to fail simultaneously). In other words, if both the units had been operating normally instead of only one, the entire system would have failed due to the same action. For example, if there had been a fire, due to a human error, in a room where both units are located, the entire system (i.e. two unit system) would fail irrespective of whether one or two units are operating successfully. Therefore, this type of critical human error is considered in Models I-IV. Non-critical human errors are added into hardware unit failures in all four models.

The only real difference between models I and II is that model I represents a non-repairable system whereas model II essentially represents the same system as model I but with one exception (i.e. the system is repaired from its one unit failed state to both unit operating state). The analyses of the two models differ quite significantly. The state space diagram of the system is shown in Figure 2.

Models III and IV represent a two identical unit standby system. In both cases one unit is active and the remaining one is on standby. As soon as the active unit fails due to hardware failure or non-critical human error, it is immediately replaced with the standby unit. The operating and the standby unit may fail simultaneously due to critical human errors as outlined for Models I and II. Furthermore, the standby system (i.e. when one unit had failed, the standby is operating) may also fail due to the same critical human error which would have caused the operating and standby units to fail simultaneously. Fire in a room due to human error where the two unit standby system is located represents an example of the critical human error. The state-space diagrams associated with models III and IV are shown in Figures 3 and 4. The only difference between models III and IV is that in model III the failed unit is repaired whereas in model IV it is not repaired. However, in both cases the failed system is never repaired.

The state probability, system reliability and mean time to failure expressions are developed for all four models. The following assumptions and notations are common to all the four models:
ASSUMPTIONS
1. Failure, repair and human error rates are constant.
2. The repaired system is as good as new.
3. System failures are statistically independent.
4. Entire system can fail due to critical human errors.
5. The critical human errors may occur when either both system units are good or when one system unit is good.
6. System units are identical.
NOTATION
0  in circles of Figures 1 and 2 denotes that both the units in the system are in operating state.
0  in circles of Figures 3 and 4 denotes that one unit is operating and the remaining one is on standby.
1  in circles of Figures 1, 2, 3 and 4 denotes that only one of the units is operating.
2  in circles of Figures 1, 2, 3 and 4 denotes that the system is in failed state due to critical human error.
3  in circles of Figures 1, 2, 3 and 4 denotes that the system is in failed state due to hardware failures plus non-critical human errors.
$\lambda$  Unit constant failure rate (this also includes non-critical human errors).
$\lambda_{h1}$  Constant critical human error rate when two units are in operating state.
$\lambda_{h2}$  Constant critical human error rate when only one unit is in operating state (note that in the standby system either one unit is operating and the other one is on standby, or one unit has failed and the standby is operating).
$\mu$  Constant unit repair rate.
$P_i(t)$  Probability that the system is in the ith state at time $t$; for $i = 0, 1, 2, 3$.
$R(t)$  System reliability at time $t$.
MTTF  System mean time to failure.
$s$  Laplace transform variable.
ANALYSIS
In this section equations for Markov models I, II, III and IV are developed.

Model I

Fig. 1. System transition diagram for 2 unit parallel system.
The system of first order differential equations associated with Fig. 1 is

$$P_0'(t) = -(2\lambda + \lambda_{h1})\,P_0(t) \tag{1}$$
$$P_1'(t) = -(\lambda + \lambda_{h2})\,P_1(t) + 2\lambda\,P_0(t) \tag{2}$$
$$P_2'(t) = \lambda_{h1}\,P_0(t) + \lambda_{h2}\,P_1(t) \tag{3}$$
$$P_3'(t) = \lambda\,P_1(t) \tag{4}$$

where the prime denotes differentiation with respect to time $t$. At $t = 0$, $P_0(t) = 1$ and the other initial condition probabilities are equal to zero.

Solving the set of equations (1) - (4) yields the resulting state probabilities in terms of Laplace transforms as follows:

$$P_0(s) = \frac{1}{s + a_1} \tag{5}$$
$$P_1(s) = \frac{2\lambda}{(s + a_1)(s + a_2)} \tag{6}$$
$$P_2(s) = \frac{2\lambda\lambda_{h2} + \lambda_{h1}(s + a_2)}{s\,(s + a_1)(s + a_2)} \tag{7}$$
$$P_3(s) = \frac{2\lambda^2}{s\,(s + a_1)(s + a_2)} \tag{8}$$

where

$$a_1 = 2\lambda + \lambda_{h1}, \qquad a_2 = \lambda + \lambda_{h2}$$
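As a numerical cross-check on equations (1) - (4), the sketch below integrates the Model I state equations with a classical fourth-order Runge-Kutta scheme in plain Python. The function names, the step count and the choice of integrator are illustrative assumptions and are not part of the paper; the parameter values are those quoted later for Figure 5.

```python
import math


def model1_derivatives(p, lam, lam_h1, lam_h2):
    """Right-hand sides of equations (1)-(4) for p = [P0, P1, P2, P3]."""
    p0, p1, _, _ = p
    return [
        -(2 * lam + lam_h1) * p0,
        -(lam + lam_h2) * p1 + 2 * lam * p0,
        lam_h1 * p0 + lam_h2 * p1,
        lam * p1,
    ]


def integrate_rk4(lam, lam_h1, lam_h2, t_end, steps=2000):
    """Integrate from P(0) = [1, 0, 0, 0] to t_end with fixed-step RK4."""
    h = t_end / steps
    p = [1.0, 0.0, 0.0, 0.0]

    def f(q):
        return model1_derivatives(q, lam, lam_h1, lam_h2)

    for _ in range(steps):
        k1 = f(p)
        k2 = f([x + 0.5 * h * k for x, k in zip(p, k1)])
        k3 = f([x + 0.5 * h * k for x, k in zip(p, k2)])
        k4 = f([x + h * k for x, k in zip(p, k3)])
        p = [x + h / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


# Assumed example values: lambda = 0.01, lambda_h1 = 0.005, lambda_h2 = 0.002
p = integrate_rk4(lam=0.01, lam_h1=0.005, lam_h2=0.002, t_end=20.0)
print("state probabilities at t = 20:", p)
print("sum of probabilities:", sum(p))
print("P0 from equation (1) solved directly:", math.exp(-(2 * 0.01 + 0.005) * 20.0))
```

The four probabilities should sum to one at every step, since the right-hand sides of (1) - (4) add up to zero, and $P_0(t) + P_1(t)$ gives the system reliability.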
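The state probabilities are obtained by taking inverse Laplace transforms of equations (5) - (8) and are given by

$$P_0(t) = e^{-a_1 t} \tag{9}$$
$$P_1(t) = A_1\left(e^{-a_1 t} - e^{-a_2 t}\right) \tag{10}$$
$$P_2(t) = A_2 - A_3 e^{-a_1 t} - A_4 e^{-a_2 t} \tag{11}$$
$$P_3(t) = A_5 - A_6 e^{-a_1 t} - A_7 e^{-a_2 t} \tag{12}$$

where

$$A_1 = \frac{2\lambda}{a_2 - a_1}$$
$$A_2 = \frac{2\lambda\lambda_{h2} + \lambda_{h1} a_2}{a_1 a_2}$$
$$A_3 = \frac{2\lambda\lambda_{h2} + \lambda_{h1}(a_2 - a_1)}{a_1 (a_2 - a_1)}$$
$$A_4 = \frac{2\lambda\lambda_{h2}}{a_2 (a_1 - a_2)}$$
$$A_5 = \frac{2\lambda^2}{a_1 a_2}$$
$$A_6 = \frac{2\lambda^2}{a_1 (a_2 - a_1)}$$
$$A_7 = \frac{2\lambda^2}{a_2 (a_1 - a_2)}$$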
The system reliability expression in terms of Laplace transform is obtained by adding equations (5) and (6):

$$R(s) = P_0(s) + P_1(s)$$

or

$$R(s) = \frac{s + a_3}{(s + a_1)(s + a_2)} \tag{13}$$

where

$$a_3 = a_2 + 2\lambda$$
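The system MTTF is given by

$$\text{MTTF} = \lim_{s \to 0} R(s) = \lim_{s \to 0} \frac{s + a_3}{(s + a_1)(s + a_2)}$$

or

$$\text{MTTF} = \frac{a_3}{a_1 a_2} = \frac{3\lambda + \lambda_{h2}}{(2\lambda + \lambda_{h1})(\lambda + \lambda_{h2})} \tag{14}$$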
From equation (13) the system reliability is given by

$$R(t) = \mathcal{L}^{-1}\left[\frac{s + a_3}{(s + a_1)(s + a_2)}\right]$$

or

$$R(t) = (1 + A_1)\,e^{-a_1 t} - A_1 e^{-a_2 t} \tag{15}$$
Plots are shown for system reliability and MTTF in Figures 5, 6 and 7 for different values of $\lambda$, $t$ and critical human error rate.
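Equations (14) and (15) can be evaluated directly. The following is a minimal sketch, with function and argument names chosen here for illustration; it assumes $a_1 \neq a_2$, since $A_1$ is undefined otherwise.

```python
import math


def model1_reliability(t, lam, lam_h1, lam_h2):
    """Model I reliability, equation (15). Assumes a1 != a2."""
    a1 = 2 * lam + lam_h1
    a2 = lam + lam_h2
    A1 = 2 * lam / (a2 - a1)
    return (1 + A1) * math.exp(-a1 * t) - A1 * math.exp(-a2 * t)


def model1_mttf(lam, lam_h1, lam_h2):
    """Model I mean time to failure, equation (14)."""
    return (3 * lam + lam_h2) / ((2 * lam + lam_h1) * (lam + lam_h2))


# lambda = 0.01 without and with critical human errors
for lam_h1, lam_h2 in [(0.0, 0.0), (0.005, 0.002)]:
    r20 = model1_reliability(20.0, 0.01, lam_h1, lam_h2)
    mttf = model1_mttf(0.01, lam_h1, lam_h2)
    print(f"lam_h1={lam_h1}, lam_h2={lam_h2}: R(20)={r20:.4f}, MTTF={mttf:.1f}")
```

With $\lambda = 0.01$ the sketch gives $R(20) \approx 0.9671$ when both critical human error rates are zero and $R(20) \approx 0.8836$ for $\lambda_{h1} = 0.005$, $\lambda_{h2} = 0.002$.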
Model II

Fig. 2. System diagram for two unit parallel system with repair.
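The system of first order differential equations associated with Figure 2 is

$$P_0'(t) = -(2\lambda + \lambda_{h1})\,P_0(t) + \mu\,P_1(t) \tag{16}$$
$$P_1'(t) = -(\lambda + \lambda_{h2} + \mu)\,P_1(t) + 2\lambda\,P_0(t) \tag{17}$$
$$P_2'(t) = \lambda_{h1}\,P_0(t) + \lambda_{h2}\,P_1(t) \tag{18}$$
$$P_3'(t) = \lambda\,P_1(t) \tag{19}$$

At $t = 0$, $P_0(t) = 1.0$ and the other initial condition probabilities are equal to zero.

Solving the set of equations (16) - (19) yields the resulting state probabilities in terms of Laplace transform as follows:

$$P_0(s) = \frac{s^2 (s + b_2)}{\Delta} \tag{20}$$
$$P_1(s) = \frac{2\lambda s^2}{\Delta} \tag{21}$$
$$P_2(s) = \frac{s\left(2\lambda\lambda_{h2} + \lambda_{h1}(s + b_2)\right)}{\Delta} \tag{22}$$
$$P_3(s) = \frac{2\lambda^2 s}{\Delta} \tag{23}$$

where

$$b_1 = 2\lambda + \lambda_{h1}$$
$$b_2 = \lambda + \lambda_{h2} + \mu$$
$$K_1 = 3\lambda + \lambda_{h1} + \lambda_{h2} + \mu$$
$$K_2 = 2\lambda^2 + 2\lambda\lambda_{h2} + \lambda\lambda_{h1} + \lambda_{h1}\lambda_{h2} + \mu\lambda_{h1}$$
$$b_3 = \frac{K_1 + \sqrt{K_1^2 - 4K_2}}{2}, \qquad b_4 = \frac{K_1 - \sqrt{K_1^2 - 4K_2}}{2}$$
$$\Delta = s^2 (s + b_3)(s + b_4)$$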
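The system state probabilities in the time domain are obtained from equations (20) - (23):

$$P_0(t) = B_1 e^{-b_3 t} + B_2 e^{-b_4 t} \tag{24}$$
$$P_1(t) = B_3\left(e^{-b_3 t} - e^{-b_4 t}\right) \tag{25}$$
$$P_2(t) = B_5 e^{-b_3 t} + B_6 e^{-b_4 t} + B_7 \tag{26}$$
$$P_3(t) = B_8 e^{-b_3 t} + B_9 e^{-b_4 t} + B_{10} \tag{27}$$

where

$$B_1 = \frac{b_3 - b_2}{b_3 - b_4}$$
$$B_2 = \frac{b_2 - b_4}{b_3 - b_4}$$
$$B_3 = \frac{2\lambda}{b_4 - b_3}$$
$$B_4 = 2\lambda\lambda_{h2} + b_2\lambda_{h1}$$
$$B_5 = \frac{B_4 - b_3\lambda_{h1}}{b_3 (b_3 - b_4)}$$
$$B_6 = \frac{b_4\lambda_{h1} - B_4}{b_4 (b_3 - b_4)}$$
$$B_7 = \frac{B_4}{b_3 b_4}$$
$$B_8 = \frac{2\lambda^2}{b_3 (b_3 - b_4)}$$
$$B_9 = \frac{2\lambda^2}{b_4 (b_4 - b_3)}$$
$$B_{10} = \frac{2\lambda^2}{b_3 b_4}$$

The system reliability expression in terms of Laplace transform is obtained by adding equations (20) and (21):

$$R(s) = \sum_{i=0}^{1} P_i(s) = \frac{s + b_5}{(s + b_3)(s + b_4)} \tag{28}$$

where $b_5 = b_2 + 2\lambda$.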
Inverse Laplace transform of equation (28) gives the reliability expression in time domain:

$$R(t) = (B_1 + B_3)\,e^{-b_3 t} + (B_2 - B_3)\,e^{-b_4 t} \tag{29}$$
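The system MTTF may be obtained from equation (28) using final value theorem:

$$\text{MTTF} = \lim_{s \to 0} R(s) = \frac{3\lambda + \lambda_{h2} + \mu}{2\lambda^2 + 2\lambda\lambda_{h2} + \lambda\lambda_{h1} + \lambda_{h1}\lambda_{h2} + \mu\lambda_{h1}} \tag{30}$$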
The plots of equations (29) and (30) are shown in Figures 5, 6 and 7 for different values of $\lambda$, $t$ and critical human error rate.
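The Model II expressions (29) and (30) can be evaluated in the same way. The sketch below is illustrative only: the function names are assumptions, and it presumes $K_1^2 - 4K_2 > 0$ so that $b_3$ and $b_4$ are real and distinct, which holds whenever $\lambda$ and $\mu$ are both positive.

```python
import math


def model2_constants(lam, mu, lam_h1, lam_h2):
    """Return b2, b3, b4 and K2 as defined for equations (20)-(23)."""
    b2 = lam + lam_h2 + mu
    K1 = 3 * lam + lam_h1 + lam_h2 + mu
    K2 = (2 * lam**2 + 2 * lam * lam_h2 + lam * lam_h1
          + lam_h1 * lam_h2 + mu * lam_h1)
    root = math.sqrt(K1**2 - 4 * K2)  # assumed strictly positive
    return b2, (K1 + root) / 2, (K1 - root) / 2, K2


def model2_reliability(t, lam, mu, lam_h1, lam_h2):
    """Model II reliability, equation (29)."""
    b2, b3, b4, _ = model2_constants(lam, mu, lam_h1, lam_h2)
    B1 = (b3 - b2) / (b3 - b4)
    B2 = (b2 - b4) / (b3 - b4)
    B3 = 2 * lam / (b4 - b3)
    return (B1 + B3) * math.exp(-b3 * t) + (B2 - B3) * math.exp(-b4 * t)


def model2_mttf(lam, mu, lam_h1, lam_h2):
    """Model II mean time to failure, equation (30)."""
    _, _, _, K2 = model2_constants(lam, mu, lam_h1, lam_h2)
    return (3 * lam + lam_h2 + mu) / K2


print(model2_reliability(20.0, 0.01, 0.02, 0.005, 0.002))
print(model2_mttf(0.01, 0.02, 0.005, 0.002))
```

Setting $\mu = 0$ in this sketch reduces $b_3$ and $b_4$ to the Model I rates $a_1$ and $a_2$ (in some order), which gives a quick consistency check between the two models when $a_1 \neq a_2$.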
Model III

Fig. 3. System transition diagram for two unit standby system.
This model represents a 2 unit standby system. The repair when both units fail is not considered. The system of differential equations associated with this model in terms of Laplace transform is as follows:

$$(s + \lambda + \lambda_{h2})\,P_0(s) - \mu\,P_1(s) = 1.0 \tag{31}$$
$$(s + \lambda + \lambda_{h2} + \mu)\,P_1(s) - \lambda\,P_0(s) = 0 \tag{32}$$
$$s\,P_2(s) - \lambda_{h2}\,P_0(s) - \lambda_{h2}\,P_1(s) = 0 \tag{33}$$
$$s\,P_3(s) - \lambda\,P_1(s) = 0 \tag{34}$$
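Solving the above equations (31) - (34) yields the resulting state probability expressions in Laplace transform:

$$P_0(s) = \frac{s + c}{(s + c_1)(s + c_2)} \tag{35}$$
$$P_1(s) = \frac{\lambda}{(s + c_1)(s + c_2)} \tag{36}$$
$$P_2(s) = \frac{\lambda_{h2}(s + c_3)}{s\,(s + c_1)(s + c_2)} \tag{37}$$
$$P_3(s) = \frac{\lambda^2}{s\,(s + c_1)(s + c_2)} \tag{38}$$

where

$$c = \lambda + \lambda_{h2} + \mu$$
$$c_3 = c + \lambda$$
$$c_4 = c + \lambda + \lambda_{h2}$$
$$c_5 = c(\lambda + \lambda_{h2}) - \mu\lambda$$
$$c_1 = \frac{c_4 + \sqrt{c_4^2 - 4c_5}}{2}, \qquad c_2 = \frac{c_4 - \sqrt{c_4^2 - 4c_5}}{2}$$

The system state probabilities in the time domain are evaluated from equations (35) - (38):

$$P_0(t) = C_1 e^{-c_1 t} + C_2 e^{-c_2 t} \tag{39}$$
$$P_1(t) = C_3 e^{-c_1 t} - C_3 e^{-c_2 t} \tag{40}$$
$$P_2(t) = C_4 e^{-c_1 t} + C_5 e^{-c_2 t} + C_6 \tag{41}$$
$$P_3(t) = C_7 e^{-c_1 t} + C_8 e^{-c_2 t} + C_9 \tag{42}$$

where

$$C_1 = \frac{c_1 - c}{c_1 - c_2}$$
$$C_2 = \frac{c - c_2}{c_1 - c_2}$$
$$C_3 = \frac{\lambda}{c_2 - c_1}$$
$$C_4 = \frac{-c_1\lambda_{h2} + c_3\lambda_{h2}}{c_1 (c_1 - c_2)}$$
$$C_5 = \frac{c_2\lambda_{h2} - c_3\lambda_{h2}}{c_2 (c_1 - c_2)}$$
$$C_6 = \frac{c_3\lambda_{h2}}{c_1 c_2}$$
$$C_7 = \frac{\lambda^2}{c_1 (c_1 - c_2)}$$
$$C_8 = \frac{\lambda^2}{c_2 (c_2 - c_1)}$$
$$C_9 = \frac{\lambda^2}{c_1 c_2}$$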
The system reliability expression in terms of Laplace transform is obtained by adding equations (35) and (36):

$$R(s) = \sum_{i=0}^{1} P_i(s) = \frac{s + c_3}{(s + c_1)(s + c_2)} \tag{43}$$
The system reliability expression in time domain is obtained by taking inverse Laplace transform of equation (43):

$$R(t) = (C_1 + C_3)\,e^{-c_1 t} + (C_2 - C_3)\,e^{-c_2 t} \tag{44}$$

where $C_1$, $C_2$ and $C_3$ have already been defined for this model. The system MTTF is given by

$$\text{MTTF} = \lim_{s \to 0} R(s) = \int_0^\infty R(t)\,dt = \frac{2\lambda + \lambda_{h2} + \mu}{\lambda^2 + 2\lambda\lambda_{h2} + \lambda_{h2}^2 + \mu\lambda_{h2}} \tag{45}$$
The plots of equations (44) and (45) are shown in Figures 5, 6 and 7 for different values of $\lambda$, $t$ and $\lambda_{h2}$.
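A minimal sketch for evaluating the Model III expressions (44) and (45) follows. The function names are illustrative assumptions, and the code presumes $c_4^2 - 4c_5 > 0$ so that $c_1 \neq c_2$.

```python
import math


def model3_reliability(t, lam, mu, lam_h2):
    """Model III (standby with repair) reliability, equation (44)."""
    c = lam + lam_h2 + mu
    c4 = c + lam + lam_h2
    c5 = c * (lam + lam_h2) - mu * lam
    root = math.sqrt(c4**2 - 4 * c5)  # assumed strictly positive
    c1 = (c4 + root) / 2
    c2 = (c4 - root) / 2
    C1 = (c1 - c) / (c1 - c2)
    C2 = (c - c2) / (c1 - c2)
    C3 = lam / (c2 - c1)
    return (C1 + C3) * math.exp(-c1 * t) + (C2 - C3) * math.exp(-c2 * t)


def model3_mttf(lam, mu, lam_h2):
    """Model III mean time to failure, equation (45)."""
    return (2 * lam + lam_h2 + mu) / (
        lam**2 + 2 * lam * lam_h2 + lam_h2**2 + mu * lam_h2)


print(model3_reliability(20.0, 0.01, 0.02, 0.002))
print(model3_mttf(0.01, 0.02, 0.002))
```

For $\lambda = 0.01$, $\mu = 0.02$ and $\lambda_{h2} = 0.002$ the sketch gives $R(20) \approx 0.946$. Note that $\lambda_{h1}$ does not enter this model, because only one unit is active at any time.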
Model IV

Fig. 4. Two unit standby system.
For this model the system of equations was obtained directly from equations (31) - (34) by setting $\mu = 0$. Thus solving the resulting equations leads to the following state probability, system reliability and mean time to failure expressions in Laplace transform, respectively.

$$P_0(s) = \frac{1}{s + d} \tag{46}$$
$$P_1(s) = \frac{\lambda}{(s + d)^2} \tag{47}$$
$$P_2(s) = \frac{\lambda_{h2}(s + d_1)}{s\,(s + d)^2} \tag{48}$$
$$P_3(s) = \frac{\lambda^2}{s\,(s + d)^2} \tag{49}$$

where

$$d = \lambda + \lambda_{h2}, \qquad d_1 = 2\lambda + \lambda_{h2}$$

Thus the system reliability in Laplace transform is

$$R(s) = \sum_{i=0}^{1} P_i(s) = \frac{s + d_1}{(s + d)^2} \tag{50}$$

By utilizing equation (50) the system mean time to failure is:

$$\text{MTTF} = \lim_{s \to 0} R(s) = \frac{2\lambda + \lambda_{h2}}{(\lambda + \lambda_{h2})^2} \tag{51}$$
The system state probabilities in the time domain are given by:

$$P_0(t) = e^{-dt} \tag{52}$$
$$P_1(t) = \lambda t\,e^{-dt} \tag{53}$$
$$P_2(t) = D_1 - D_1 e^{-dt} + D_2 t\,e^{-dt} \tag{54}$$
$$P_3(t) = D_3 - D_3 e^{-dt} - D_4 t\,e^{-dt} \tag{55}$$

where

$$D_1 = \frac{d_1\lambda_{h2}}{d^2}$$
$$D_2 = \frac{\lambda_{h2}(d - d_1)}{d}$$
$$D_3 = \frac{\lambda^2}{d^2}$$
$$D_4 = d\,D_3$$
The system reliability in time domain is

$$R(t) = \sum_{i=0}^{1} P_i(t) = (1 + \lambda t)\,e^{-dt} \tag{56}$$

The plots of system reliability and MTTF for different values of $\lambda$, $\lambda_{h2}$ and time are shown in Figures 5, 6 and 7.
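The Model IV results are simple enough to check in a few lines. The sketch below evaluates the state probabilities (52) - (55), the reliability (56) and the MTTF (51); the function names are assumptions made here for illustration.

```python
import math


def model4_state_probabilities(t, lam, lam_h2):
    """State probabilities [P0, P1, P2, P3] from equations (52)-(55)."""
    d = lam + lam_h2
    d1 = 2 * lam + lam_h2
    D1 = d1 * lam_h2 / d**2
    D2 = lam_h2 * (d - d1) / d
    D3 = lam**2 / d**2
    D4 = d * D3
    e = math.exp(-d * t)
    return [
        e,
        lam * t * e,
        D1 - D1 * e + D2 * t * e,
        D3 - D3 * e - D4 * t * e,
    ]


def model4_reliability(t, lam, lam_h2):
    """Model IV reliability, equation (56)."""
    return (1 + lam * t) * math.exp(-(lam + lam_h2) * t)


def model4_mttf(lam, lam_h2):
    """Model IV mean time to failure, equation (51)."""
    return (2 * lam + lam_h2) / (lam + lam_h2) ** 2


probs = model4_state_probabilities(20.0, 0.01, 0.002)
print(probs, sum(probs))
print(model4_reliability(20.0, 0.01, 0.002), model4_mttf(0.01, 0.002))
```

The four state probabilities sum to one for any $t$, because $D_1 + D_3 = 1$ and the coefficients of $t\,e^{-dt}$ cancel. For $\lambda = 0.01$ the sketch gives $R(20) \approx 0.9825$ with $\lambda_{h2} = 0$ and $R(20) \approx 0.944$ with $\lambda_{h2} = 0.002$.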
[Figure 5: System Reliability Plot. Horizontal axis: time in hours. Curves for Model 1, Model 2, Model 3 and Model 4.]

[Figure 6: Mean-time-to-failure Plot. Horizontal axis: human failure rate ($\lambda_{h1}$) in percent. Curves for Model 1, Model 2, Model 3 and Model 4.]

[Fig. 7. System mean-time-to-failure Plot. Horizontal axis: human failure rate ($\lambda_{h2}$) in percent. Curves for Model 1, Model 2, Model 3 and Model 4.]
RESULTS AND DISCUSSION
Numerical values of system reliability and MTTF were obtained using the above developed equations with different values of $\lambda$, $\lambda_{h1}$, $\lambda_{h2}$ and $\mu$. Figure 5 represents the plots of reliability vs time for all the four models. These plots are obtained for $\lambda = 0.01$, $\mu = 0.02$, $\lambda_{h1} = 0.005$ and $\lambda_{h2} = 0.002$, using equations (15), (29), (44) and (56). Figure 5 shows that the system reliability decreases as the time increases. Table 1 gives the values of system reliability for all the four models for the same values of $\lambda$ and $\mu$, considering the above values of critical human error rates. System reliability values under column A correspond to $\lambda = 0.01$, $\lambda_{h1} = 0.0$, $\lambda_{h2} = 0.0$ and $\mu = 0.02$, whereas system reliability values under column B are the same as plotted in Figure 5.
Table 1. System Reliability

TIME    Model 1           Model 2           Models 3 and 4
        A       B         A       B
0       1       1         1       1         1       1       1
20      .9671   .8836     .9707   .8859     .9825   .9439   .9458
40      .8913   .7539     .9112   .7660     .9384   .8663   .8774
60      .7964   .6287     .8435   .6555     .8781   .7788   .8063
80      .6968   .5162     .7763   .5583     .8088   .6892   .7376
100     .6004   .4192     .7125   .4747     .7357   .6023   .6732
120     .5117   .3377     .6533   .4032     .6628   .5212   .6137
140     .4324   .2705     .5986   .3423     .5918   .4473   .5592
160     .3630   .2157     .5485   .2906     .5249   .3812   .5094
180     .3033   .1714     .5025   .2467     .4628   .3229   .4640
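The effect of the critical human error rates on the tabulated values can be quantified as the percentage reduction from column A to column B. The sketch below does this for the Model 1 entries of Table 1; the variable names are illustrative, and the data are copied from the first two value columns of the table.

```python
# Model 1 entries of Table 1 for t = 20, 40, ..., 180 hours
times = [20, 40, 60, 80, 100, 120, 140, 160, 180]
col_a = [.9671, .8913, .7964, .6968, .6004, .5117, .4324, .3630, .3033]
col_b = [.8836, .7539, .6287, .5162, .4192, .3377, .2705, .2157, .1714]

# Percentage reduction in reliability caused by critical human errors
reductions = [100.0 * (a - b) / a for a, b in zip(col_a, col_b)]

for t, r in zip(times, reductions):
    print(f"t = {t:3d} h: reduction = {r:.1f} %")
```

For these entries the reduction grows from roughly 9 percent at 20 hours to roughly 43 percent at 180 hours.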
Note that the percentage reduction in system reliability increases with time when critical human error rates are considered. Figure 6 represents the plots of MTTF vs critical human error rate ($\lambda_{h1}$) whereas in Figure 7 the plots of MTTF for different values of $\lambda_{h2}$ are shown. Values of $\lambda$ and $\mu$ for the plots of Figures 6 and 7 are equal to 0.01 and 0.02 respectively. The values $\lambda_{h2} = 0.002$ and $\lambda_{h1} = 0.005$ are considered for Figures 6 and 7 respectively. The MTTF decreases with increase in critical human error rate.

CONCLUSION
The models presented in this paper are typical examples of man machine system.
The analysis presented explains the effect of critical
human error rate on system reliability.
The analysis will be very useful
to the design engineers to optimize their designs to achieve reliability goals.
The analysis presented can easily be extended to general systems.
ACKNOWLEDGEMENT
The financial assistance of the Natural Sciences and Engineering Research Council of Canada is gratefully appreciated.

REFERENCES
1. H.L. Williams, Reliability evaluation of the human component in man-machine systems, Electrical Manufacturing, April 1958.
2. D. Meister, The problem of human-initiated failures, Eighth National Symposium on Reliability and Quality Control, 1962.
3. B.S. Dhillon, On human reliability-bibliography, Microelectronics and Reliability, Vol. 20, 1980, pp. 371-373.
4. B.S. Dhillon, RAM analysis of vehicles in changing weather, Proceedings of the Annual Reliability and Maintainability Symposium, 1984, pp. 48-53.
5. B.S. Dhillon, Stochastic models for predicting human reliability, Microelectronics and Reliability, Vol. 22, 1982, pp. 491-496.
6. B.S. Dhillon, Reliability Engineering in Systems Design and Operation, Van Nostrand Reinhold Company, New York, 1982.
7. B.S. Dhillon, Systems Reliability, Maintainability and Management, Petrocelli Books, Inc., New York, 1983. Distributed by Van Nostrand Reinhold Company, New York.