Reliability Engineering and System Safety 68 (2000) 121–133 www.elsevier.com/locate/ress
Semi-Markovian reliability models for systems with testable components and general test/outage times I.A. Papazoglou* Radiation Protection, System Reliability and Industrial Safety Laboratory, Institute of Nuclear Technology, ‘Demokritos’, National Center for Scientific Research, P.O. Box 60228, 153 10, Athens, Greece Received 2 August 1999; accepted 5 January 2000
Abstract Semi-Markov models for systems undergoing periodic test and maintenance are developed. In particular, systems undergoing specific changes of state at predetermined instances of time and transiting to states with generally distributed sojourn times are considered. Problems addressed by the models are those concerned with optimum assessment of test intervals, and allowable outage times. Equivalent Markovian models allowing for the decomposition of a system of dimensionality N + M into two smaller problems of dimensionality N and M, respectively are developed. The general model is also specialized to systems with instantaneously testable components, unmonitored components undergoing tests (repair, maintenance) of fixed duration, and systems containing components characterized by limited allowable outage time (under test, or repair). Approximate equivalent Markov models are derived in these cases. Simple numerical examples are also presented. © 2000 Elsevier Science Ltd. All rights reserved.
Keywords: Semi-Markov reliability models; Limiting conditions of operation; Allowable outage times; Generally tested components; Surveillance test intervals
* Tel.: +30-1-6548-415; fax: +30-1-6540-926. E-mail address: [email protected] (I.A. Papazoglou).
1. Introduction
Safety or, in general, protection systems are systems designed to perform a particular function should the need arise. They constitute a subclass of systems whose mission is to monitor a certain process and undertake a particular action if certain conditions are met. These systems consist of components that should be available all the time but whose state is usually not known. Their availability or unavailability is revealed only when a demand to operate arrives. To improve the availability of such components, tests are performed in which the components are asked to operate and, if found unavailable, are put under repair or replaced. In many cases tests render the components unavailable (for the duration of the test) and contribute to the unavailability of the components through possible errors in the 'handing back' procedure. It follows that there is a tradeoff between the frequency and duration of tests of unmonitored components and their unavailability. Informed decision making on the frequency and other constraints of tests of unmonitored components requires detailed reliability models incorporating all the special characteristics of the testing procedure. This is particularly important for the Risk Informed, Performance-Based Regulatory approach that the US-NRC is considering for Nuclear Power Plant regulation in the US. As part of this approach, technical specifications and limiting conditions of operation (LCO), such as surveillance test intervals (STI) and allowed outage times (AOT), are to be established (changed from present values) on the basis of probabilistic risk assessments [1–4]. The latter require appropriate reliability models to properly evaluate the impact of the various LCOs. The objective of this paper is to present models suitable for a broad category of non-monitored components, namely those that are characterized by random and generally distributed sojourn times in test-related states.
Approximate models for systems under periodic test and maintenance for use in Fault Tree and Event Tree analyses have been developed before [5,6]. Furthermore, Markovian reliability models capable of incorporating a number of special characteristics of the stochastic behavior of a system [7] have also been widely used in reliability analysis. These models are also capable of incorporating periodic test and maintenance [8]. Markovian models, however, require exponentially distributed residence times in their states or, equivalently, that the transition probability between two states depends only on these two states [9]. This is a
0951-8320/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0951-8320(00)00003-X
I.A. Papazoglou / Reliability Engineering and System Safety 68 (2000) 121–133
Nomenclature
Z: Set of all possible states of system
N: Subset of Z containing states characterized by exponentially distributed residence times
M: Complementary subset to N ($Z = N \cup M$), containing states that can be reached from states in N through 'tests' at time $t_k$. States in M are characterized by generally distributed residence times
N: Dimension of subset N
M: Dimension of subset M
$a_{ij}$: Conditional transition rate from state i to state j; $i, j \in N$
A: N × N transition rate matrix with elements $a_{ij}$
$q_{im}$: Transition probability from state i to state m, following the kth test; $i \in N$, $m \in M$
$\delta(t - t_k)$: Delta function
$Q\,\delta(t - t_k)$: N × M test conditional transition rate matrix with elements $q_{im}\,\delta(t - t_k)$
$c_{mi}(x)$: Joint probability that system will remain in state m for x units of time and then it will transit to state i; $m \in M$, $i \in N$
C(x): M × N matrix with elements $c_{mi}(x)$
$p_m(0,t)$: Entrance probability rate: probability that system enters state m at the instant of time t
$p_m(x,t)$: Occupancy probability: probability that the system occupies state m at time t, and has spent x units of time in m
p(0,t): Entrance probability vector with elements $p_m(0,t)$, 1 × M
p(x,t): Occupancy probability vector with elements $p_m(x,t)$, 1 × M
$\pi_i(t)$: State probability: probability the system occupies state i at time t
$\pi(t)$: State probability vector, 1 × N
$Q'$: N × N diagonal matrix with elements $q'_{ii} = \sum_{m \in M} q_{im}$
$g_{mj}(t)$: Pdf of time for a transition out of state m owing to process j
$h_{mj}(t)$: Conditional probability rate for a transition out of state m owing to process j
$g_m(x) = \sum_{i \in N} c_{mi}(x)$: Probability system will occupy state m exactly x units of time
$1 - G_m(t) = \int_0^t g_m(x)\,dx$: Probability that system will occupy state m for less than t units of time
$G_m(t) = \int_t^\infty g_m(x)\,dx$: Probability that system will occupy state m for more than t units of time
G(t): M × M diagonal matrix with diagonal elements $G_m(t)$
$w_m(t)$: State probability for a state m in M
w(t): State probability vector, 1 × M

property that is not present in most processes associated with testing. In several instances the probability of transition from one state to another depends not only on the initial and final states but also on the time spent in the initial state. This type of stochastic behavior is modeled by Semi-Markov models (see Howard, Volume II [9]), which have been used in reliability modeling on several occasions in the last 30 years (e.g. Refs. [10–12]). van Dijkhuizen and van der Heijden [12], in particular, offer an alternative presentation of a three-state Semi-Markov system and derive closed formulae for a number of reliability measures. Closed formulae, however, are only possible in very special problems or in problems of small dimensionality. Realistic systems (e.g. with a substantial number of components), on the other hand, imply an extremely large number of states with associated large computer storage and time requirements. Semi-Markov models face even more severe problems than simple Markov models since they require storage of a transition matrix for each moment of process-time, and partly for this reason their use has been rather limited. This paper develops Semi-Markovian models for systems undergoing periodic tests and subsequent repair when the duration of testing is random and generally distributed.
Equivalent Markovian models are then developed for a number of special but frequently met cases substantially reducing both the computer storage and time requirements. The paper is organized as follows: Section 2 presents the general Semi-Markov model for systems undergoing periodic tests. Section 3 elaborates on the probabilistic characteristics of residence and transition times in states with general stochastic behavior. Section 4, based on the results of Section 2 specializes the results of Section 3 into three particular types of non-monitored and testable components and presents simple numerical examples. Finally, Section 5 offers a summary and the conclusions.
2. Model for periodic testing and non-exponential test/outage times
In several instances there is an interest in analyzing systems comprising components undergoing tests at predetermined instants of time and whose test and/or repair times are not exponentially distributed. The tests bring the tested component(s) into one of a number of test/outage states that are characterized by non-exponential residence times. This makes the transition rate from such a state m to another state i dependent both on the pair (m, i) and on the time spent in state m.
Let the set Z of all possible states of such a system be partitioned into two sets N and M. Set M contains all states that are characterized by generally distributed residence times and set N contains the rest of the states. System transitions can be divided into transitions within subset N, transitions from N to M and back, and transitions within M. This analysis, however, will consider only systems for which transitions within the set M are not possible.
Transitions between states in subset N occur randomly at times that are exponentially distributed and are characterized by constant transition rates $a_{ij}$ and the corresponding transition rate matrix A. Transitions from subset N to subset M and back are special in nature. Transitions from N to M occur only when a test is initiated, and this happens at predetermined instants of time. This implies transition rates of the form $q_{im}\,\delta(t - t_k)$ and the corresponding test-transition rate matrix $Q\,\delta(t - t_k)$. Transitions from M to N occur randomly at times that are characterized by a pdf that can be either exponential or non-exponential. These transitions are described by the joint transition probability that the system will transit to state $i \in N$ after spending x units of time in state m. This joint transition probability is denoted by $c_{mi}(x)$ and the corresponding M × N matrix by C(x). Transitions within set M are not possible. This is equivalent to assuming that transitions out of set M are much faster than transitions within set M. The state equations for the system can be derived as follows:
2.1. Occupancy probability for a state in N
Consider a state i belonging to subset N. The rate of change in the state probability $\pi_i(t)$ is due to transitions within subset N, to transitions from M to N, or to transitions from N to M:
\[ \dot{\pi}_i(t) = \underbrace{\{\text{Transitions from and to states in } N\}}_{(\mathrm{I})} + \underbrace{\{\text{Transitions from states in } M\}}_{(\mathrm{II})} - \underbrace{\{\text{Transitions to states in } M\}}_{(\mathrm{III})} \tag{1} \]
Term (I) in the equation above consists of transitions occurring within subset N and owing to random changes of states, and it can be written as:
\[ (\mathrm{I}) = \sum_{j \in N,\; j \neq i} a_{ji}\,\pi_j(t) - a_{ii}\,\pi_i(t) \tag{2} \]
Term (II) in Eq. (1) consists of transitions from states in M back to N. These transitions are in general characterized by non-exponential transition times. The probability that a system will transit from a state m ($m \in M$) to a state i ($i \in N$) after spending exactly x units of time in m is given by $\{p_m(0, t-x)\,c_{mi}(x)\}$. Hence the unconditional (on the time spent in m) transition probability rate is:
\[ \Pr\{m \to i \text{ at } t\} = \int_0^t p_m(0, t-x)\,c_{mi}(x)\,dx \]
Hence,
\[ (\mathrm{II}) = \sum_{m \in M} \int_0^t p_m(0, t-x)\,c_{mi}(x)\,dx \tag{3} \]
Term (III) in Eq. (1) consists of transitions to subset M which are due to testing at times $t_k$, hence its contribution to Eq. (1) is
\[ -(\mathrm{III}) = -\pi_i(t)\,q'_{ii}\,\delta(t - t_k) \quad \text{where} \quad q'_{ii} = \sum_{m \in M} q_{im} \tag{4} \]
with $q_{im}$ being the probability of transition from state i to state m. Hence, $q'_{ii}$ gives the probability that the system will leave state i following a test. Eqs. (1)–(4) can be set in matrix form as follows:
\[ \dot{\pi}(t) = \pi(t)A + \int_0^t p(0, t-x)\,C(x)\,dx - \pi(t)\,Q'\,\delta(t - t_k) \tag{5} \]
Since the system cannot transit within M, the only way it can enter a state in M is through a transition from subset N. Hence
\[ p(0,t) = \pi(t)\,Q\,\delta(t - t_k) \tag{6} \]
The system of equations (5) and (6) can be solved with the help of Laplace transform as follows:
\[ \left. \begin{aligned} s\pi(s) - \pi(0) &= \pi(s)A + p(0,s)\,C(s) - \sum_{k=0}^{\infty} \pi(t_k)\,Q'\,e^{-st_k} \\ p(0,s) &= \sum_{k=0}^{\infty} \pi(t_k)\,Q\,e^{-st_k} \end{aligned} \right\} \tag{7} \]
or
\[ \pi(s) = \pi(0)\,[sI - A]^{-1} + \sum_{k=0}^{\infty} \pi(t_k)\,\{Q\,C(s) - Q'\}\,e^{-st_k}\,[sI - A]^{-1} \tag{8} \]
Eq. (8) after inversion and rearrangement yields
\[ \pi(t) = \pi(0)\,e^{At} + \sum_{k=0}^{\infty} \pi(t_k) \left\{ Q \int_0^{t-t_k} C(t - t_k - x)\,e^{Ax}\,dx - Q'\,e^{A(t-t_k)} \right\} u(t - t_k) \tag{9} \]
where
\[ u(t - t_k) = \begin{cases} 1 & \text{if } t > t_k \\ 0 & \text{if } t < t_k \end{cases} \]
In deriving Eq. (9) use has been made of the following Laplace transform properties:
\[ L^{-1}\{f_1(s)\,f_2(s)\} = \int_0^t f_1(x)\,f_2(t - x)\,dx \]
and
\[ L^{-1}\{f(s)\,e^{-sa}\} = f(t - a)\,u(t - a) \]
The term
\[ \pi(t_k)\,Q \int_0^{t-t_k} C(t - t_k - x)\,e^{Ax}\,dx \]
describes the transitions associated with tests at times $t_0, t_1, \ldots$ as follows:
$\pi(t_k)\,Q$ describes transitions from states in N (immediately before the kth test) into states in M through the test transition rate matrix Q.
$\int_0^{t-t_k} C(t - t_k - x)\,e^{Ax}\,dx$ describes transitions originating from states in M and ending in states in N after $(t - t_k)$ units of time. Such transitions can happen if a transition from M to N takes place after spending $(t - t_k - x)$ units of time in M, with the remaining x units of time spent in N. This can happen for any time x in $[0, t - t_k]$.
Eq. (9) is significant because it decreases the dimensionality of the problem from N + M down to N, since matrix
\[ Q \int_0^{t-t_k} C(t - t_k - x)\,e^{Ax}\,dx \]
is an N × N matrix. Eq. (9) can be put in a recursive form suitable for further simplification in special cases, and more useful operationally. For $t_m < t < t_{m+1}$ Eq. (9) can be written as:
\[ \begin{aligned} \pi(t) &= \pi(0)\,e^{At} + \sum_{k=0}^{m} \pi(t_k) \left[ Q \int_0^{t-t_k} C(t - t_k - x)\,e^{Ax}\,dx - Q'\,e^{A(t-t_k)} \right] \\ &= \left\{ \pi(0)\,e^{At_m} + \sum_{k=0}^{m} \pi(t_k) \left[ Q \int_0^{t-t_k} C(t - t_k - x)\,e^{-A(t-t_m-x)}\,dx - Q'\,e^{A(t_m-t_k)} \right] \right\} e^{A(t-t_m)} \\ &= \Bigg\{ \left\{ \pi(0)\,e^{At_{m-1}} + \sum_{k=0}^{m-1} \pi(t_k) \left[ Q \int_0^{t-t_k} C(t - t_k - x)\,e^{-A(t-t_{m-1}-x)}\,dx - Q'\,e^{A(t_{m-1}-t_k)} \right] \right\} e^{A(t_m-t_{m-1})} \\ &\qquad + \pi(t_{m-1})\,e^{A(t_m-t_{m-1})} \left[ Q \int_0^{t-t_m} C(t - t_m - x)\,e^{-A(t-t_m-x)}\,dx - Q' \right] \Bigg\}\, e^{A(t-t_m)} \end{aligned} \tag{10} \]
Eq. (10) reduces to
\[ \pi(t) = \pi(t_{m-1})\,e^{A(t_m-t_{m-1})} \left\{ I + Q \int_0^{t-t_m} C(y)\,e^{-Ay}\,dy - Q' \right\} e^{A(t-t_m)} \]
where $\pi(t_k)$ denotes the state probability vector immediately after the kth test. By setting
\[ Q^*(t - t_m) = I + Q \int_0^{t-t_m} C(y)\,e^{-Ay}\,dy - Q' \tag{11} \]
it follows that
\[ \pi(t) = \pi(t_{m-1})\,e^{A(t_m-t_{m-1})}\,Q^*(t - t_m)\,e^{A(t-t_m)} \tag{12} \]
Eq. (12) has the form of the state equation of a system undergoing tests at times $t_m$. Vector $\pi(t_{m-1})$, immediately after test $(m - 1)$, is operated on by matrix $e^{A(t_m-t_{m-1})}$, corresponding to transitions within set N. At time $t_m$ a test is performed, transforming states inside subset N according to the test matrix $Q^*(t - t_m)$ as if they were instantaneous transitions, and then the system continues for the remaining time $(t - t_m)$ transiting among states of subset N. The equivalent test matrix $Q^*(t - t_m)$ must be calculated and stored for as many time instants in the interval between tests as the number of desired values of $\pi(t)$ in the same interval. It is noteworthy, however, that these calculations must be performed only once, since $Q^*$ depends only on the time elapsed after a test and not on the absolute time. If tests are equally spaced in time with period T, then Eqs. (11) and (12) become, respectively,
\[ Q^*(T) = I + Q \int_0^{T} C(y)\,e^{-Ay}\,dy - Q' \tag{13} \]
and
\[ \pi(t_m) = \pi(t_{m-1})\,e^{AT}\,Q^*(T) \quad \text{or} \quad \pi(t_m) = \pi(0)\,[e^{AT}\,Q^*(T)]^m \tag{14} \]
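The recursion of Eq. (14) lends itself to a short numerical sketch. The Python fragment below is illustrative only and is not taken from the paper: the three-state generator `A`, the rates `LAM` and `MU`, the test period `T` and the equivalent test matrix `Q_STAR` are all hypothetical values, and `expm` is a plain scaling-and-squaring helper written here to keep the example free of external libraries.

```python
import math

def mat_mul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm(a, t, terms=30):
    """exp(a*t) by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    eye = [[float(i == j) for j in range(n)] for i in range(n)]
    norm = max(sum(abs(x * t) for x in row) for row in a)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    b = [[x * t / 2 ** squarings for x in row] for row in a]
    result, term = eye, eye
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, b)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Hypothetical data (assumptions, not values from the paper).
LAM, MU, T = 1e-3, 0.1, 100.0
A = [[-LAM, LAM, 0.0],      # state 1 -> state 2 at rate LAM
     [0.0, 0.0, 0.0],       # state 2 is left only through a test
     [MU, 0.0, -MU]]        # state 3 -> state 1 at rate MU
Q_STAR = [[0.99, 0.01, 0.0],
          [0.0, 0.05, 0.95],
          [0.0, 0.0, 1.0]]

# One-period operator e^{AT} Q*(T) of Eq. (14).
STEP = mat_mul(expm(A, T), Q_STAR)

def after_tests(pi0, m):
    """State probability vector immediately after the m-th test."""
    pi = [list(pi0)]
    for _ in range(m):
        pi = mat_mul(pi, STEP)
    return pi[0]
```

Calling `after_tests([1.0, 0.0, 0.0], m)` for increasing `m` shows how quickly the post-test vector settles; because `STEP` is computed once, each additional test period costs a single vector-matrix product.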
Steady state properties can be explored in Eq. (14).
2.2. Occupancy probability for a state in M
For unavailability calculations it might be of interest to calculate the probability of occupancy of state m in M. The basic equations governing transitions from states in M are:
\[ \dot{w}_m(t) = \{\text{Transitions from } N \text{ to } M\} - \{\text{Transitions from } M \text{ to } N\} \]
\[ \dot{w}_m(t) = \sum_{i \in N} \pi_i(t)\,q_{im}\,\delta(t - t_k) - \int_0^t p_m(0, t-x) \sum_{i \in N} c_{mi}(x)\,dx \]
or in the matrix form
\[ \dot{w}(t) = \pi(t)\,Q\,\delta(t - t_k) - \int_0^t p(0, t-x)\,C'(x)\,dx \]
where $C'$ is a diagonal matrix with elements
\[ c'_{mm}(x) = \sum_{i=1}^{N} c_{mi}(x) \]
Laplace transform of the last equation yields
\[ s\,w(s) - w(0) = \sum_{k=0}^{\infty} \pi(t_k)\,Q\,e^{-st_k} - p(0,s)\,C'(s) \]
By virtue of Eq. (6),
\[ s\,w(s) - w(0) = \sum_{k=0}^{\infty} \pi(t_k)\,Q\,e^{-st_k} - \sum_{k=0}^{\infty} \pi(t_k)\,Q\,e^{-st_k}\,C'(s) \]
\[ s\,w(s) - w(0) = \sum_{k=0}^{\infty} \pi(t_k)\,Q\,[I - C'(s)]\,e^{-st_k} \]
\[ \Rightarrow\; w(s) = \frac{1}{s}\,w(0) + \sum_{k=0}^{\infty} \pi(t_k)\,Q\,[I - C'(s)]\,\frac{e^{-st_k}}{s} \tag{15} \]
Inverting the Laplace transform and assuming that $w(0) = 0$ yields
\[ w(t) = \sum_{k=0}^{\infty} \pi(t_k)\,Q\,u(t - t_k) - \sum_{k=0}^{\infty} \pi(t_k)\,Q\,L^{-1}\!\left\{ C'(s)\,\frac{e^{-st_k}}{s} \right\} \]
But
\[ L^{-1}\!\left\{ \frac{C'(s)}{s} \right\} = \int_0^t C'(x)\,dx \]
and since
\[ \int_0^t c'_{mm}(x)\,dx = \int_0^t g_m(x)\,dx = 1 - G_m(t) \]
it follows that Eq. (15) takes the form,
\[ w(t) = \sum_{k=0}^{\infty} \pi(t_k)\,Q\,G(t - t_k)\,u(t - t_k) \tag{16} \]
This equation can be solved if $\pi(t_k)$ is known, or equivalently if Eq. (12) or (14) has been solved. A substantial reduction in computational effort is thus achieved, since a problem of dimensionality (N + M) has been reduced to two problems of dimensionality N and M, respectively. It is also noteworthy that the two problems need not be solved in parallel but can be solved sequentially. The second problem requires only storing of $\pi(t_k)$, $k = 0, 1, 2, \ldots$

3. Probabilistic characteristics of residence and transition times in states with general stochastic behavior
States with general stochastic behavior with respect to residence time are considered. In the general case, a system leaves its state if and when one out of a number of competing random processes is completed (see Ref. [9]). In reliability analyses these competing random processes consist of the failure or repair of components or other similar processes. For a single component under test, competing processes may be: (a) completion of test; (b) a demand to operate; (c) completion of allowable outage time, etc. Let j, $j = 1, 2, \ldots, N$, be an index over the independent competing random processes that will take a system out of a state m. The pdf $g_{mj}(t)$, the conditional rate $h_{mj}(t)$, and the complementary cumulative probability function $G_{mj}(t)$ are related as follows:
\[ g_{mj}(t) = h_{mj}(t) \exp\!\left[ -\int_0^t h_{mj}(x)\,dx \right] \tag{17} \]
\[ G_{mj}(t) = \exp\!\left[ -\int_0^t h_{mj}(x)\,dx \right] \tag{18} \]
\[ g_{mj}(t) = h_{mj}(t)\,G_{mj}(t) \tag{19} \]
The probability that the system will spend more than t units of time in state m is equal to the joint probability that the times-to-completion of all N independent competing processes are greater than t, or that
\[ G_m(t) = \prod_{j=1}^{N} G_{mj}(t) \tag{20} \]
or because of Eq. (18),
\[ G_m(t) = \exp\!\left[ -\sum_{j=1}^{N} \int_0^t h_{mj}(x)\,dx \right] \tag{21} \]
From which it follows that
\[ h_m(t) = \sum_{j=1}^{N} h_{mj}(t) \tag{22} \]
\[ g_m(t) = h_m(t)\,G_m(t) \tag{23} \]
These general relationships can be specialized in the case of the systems considered here as follows. States with
generalized residence times are states belonging to subset M. The system leaves a state m in M when one of the competing random processes is completed and it transits to a state i in N. For simplicity of notation, it has been assumed that there are at most as many random processes as states in N (that is, N). This assumption does not affect the generality of the results. Of interest is the joint pdf $c_{mi}(x)$ of spending exactly x units of time in state m and then transiting to state i. Since there are N competing processes, for this to occur it must be that the time to complete process i is exactly x while the times to completion of the remaining processes are all greater than x. That is,
\[ c_{mi}(x) = g_{mi}(x) \prod_{j=1,\; j \neq i}^{N} G_{mj}(x) \tag{24} \]
or because of Eqs. (19) and (20),
\[ c_{mi}(x) = h_{mi}(x)\,G_{mi}(x) \prod_{j=1,\; j \neq i}^{N} G_{mj}(x) \;\Rightarrow\; c_{mi}(x) = h_{mi}(x)\,G_m(x) \tag{25} \]
Relationships (17)–(25) are helpful in solving the model developed in Sections 2 and 3. In a specific situation either the $h_{mi}(x)$ or the $g_{mi}(x)$ are known for all special states in M and all competing processes. Then C(y) in Eq. (11) can be computed in terms of Eq. (25). Also G(t) in Eq. (16) can be computed in terms of Eq. (21).

4. Special cases
4.1. Instantaneous testing
A special case of the general model derived in Section 2 is that of instantaneous testing. According to this model, tests are performed at predetermined instants of time ($t_k$) and instantaneously bring the system into another state characterized by exponentially distributed residence times. For example, from an operating state a test might bring the system back to this state or bring it to a failed state owing to an error. From a failed state the system might transit, because of a test, to a state where it can undergo repair, or back to the original or another failed state owing to an error. Although this model can be derived directly, it is noteworthy that it can be considered as a special case of the general model.
To apply the general model to this case, a special subset M of dimension N can be defined, containing mirror states of subset N. To each state $i \in N$ there corresponds one state m in M. This is the most general case and enables the modeling of transitions (and hence human errors) that depend on the initial state i. Next, transition probabilities $p'_{mj}$ are defined, giving the probability that the system leaving state m will return to state j of N. Thus the assumption is that a test will take the system from its original state i to its mirror state m and then instantaneously back to some state j of N with probability $p'_{mj}$. In a more general case the test can take a system state i to any state in M with probability $q_{im}$ and then back to any state j in N with probability $p_{mj}$. Given that the transitions out of the states m of M are instantaneous, the N competing processes out of m have completion times distributed according to
\[ g_{mi}(x) = p'_{mi}\,\delta(x) \tag{26} \]
with
\[ \sum_{i} p'_{mi} = 1 \quad \text{and} \quad g_m(x) = \delta(x) \tag{27} \]
Then
\[ G(x) = 1 - u(x) \]
and Eq. (25) yields
\[ c_{mi}(x) = p'_{mi}\,\delta(x)\,[1 - u(x)] \tag{28} \]
By virtue of Eqs. (28) and (11) it follows that
\[ \int_0^{t-t_m} C(y)\,e^{-Ay}\,dy = P' \]
and that
\[ Q^* = I + Q\,P' - Q' \tag{29} \]
Matrix $Q^*$ is time invariant and it might only depend on the type of test performed at $t_k$, to the extent that Q and/or $P'$ depend on it. $Q^*$ operates on the state probability vector at each $t_k$ according to Eq. (12). Since $\sum_{j \in N} p'_{mj} = 1$, Eq. (29) guarantees that $\sum_{j \in N} q^*_{ij} = 1$. Unavailability calculations are not affected by this model since they are always calculated on the basis of unavailable states of N.

4.1.1. Numerical example
Consider a component that is on standby and can fail in such a way that only a test can reveal the failure and start a repair. The state transition diagram is given in Fig. 1. Set N comprises three states: State 1, component available to perform its function; State 2, component down, failure undetectable; and State 3, component unavailable, under repair. Transitions possible among these states are as follows: from state 1 to state 2 with failure rate $\lambda$, and from state 3 to state 1 with repair rate $\mu$. Set M comprises two states: State 1', component under test originating from state 1; and State 2', component under test originating from state 2. If at the time of test the system is in state 3, a test is not performed since its state is known. From test state 1' the system returns back to available state 1 or, owing to an error, to the unavailable state 2. From test state 2' the system returns back to the repair state 3 if the failure is detected, or back to state 2 if the failure is not detected.

Fig. 1. State transition diagram for a component undergoing instantaneous tests: (a) complete state-space; (b) reduced state space.

In this case matrices Q and $P'$ in Eq. (29) have the form:
\[ Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad P' = \begin{bmatrix} p_{1'1} & p_{1'2} & 0 \\ 0 & p_{2'2} & p_{2'3} \end{bmatrix} \tag{30} \]
\[ Q' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \quad (\text{see Eq. (4)}) \]
and Eq. (29) yields
\[ Q^* = \begin{bmatrix} p_{1'1} & p_{1'2} & 0 \\ 0 & p_{2'2} & p_{2'3} \\ 0 & 0 & 1 \end{bmatrix} \tag{31} \]
Solution of the Markovian model simulating transitions within N yields (see also Fig. 1)
\[ e^{At} = \begin{bmatrix} e^{-\lambda t} & 1 - e^{-\lambda t} & 0 \\ 0 & 1 & 0 \\ \dfrac{\mu}{\mu - \lambda}\left( e^{-\lambda t} - e^{-\mu t} \right) & 1 - \dfrac{\mu}{\mu - \lambda}\,e^{-\lambda t} + \dfrac{\lambda}{\mu - \lambda}\,e^{-\mu t} & e^{-\mu t} \end{bmatrix} \tag{32} \]
and
\[ e^{AT} Q^* = \begin{bmatrix} p_{1'1}\,e^{-\lambda T} & p_{1'2}\,e^{-\lambda T} + p_{2'2}\left(1 - e^{-\lambda T}\right) & p_{2'3}\left(1 - e^{-\lambda T}\right) \\ 0 & p_{2'2} & p_{2'3} \\ \dfrac{p_{1'1}\,\mu}{\mu - \lambda}\left( e^{-\lambda T} - e^{-\mu T} \right) & p_{2'2} + \dfrac{(p_{1'2} - p_{2'2})\,\mu}{\mu - \lambda}\,e^{-\lambda T} + \dfrac{p_{2'2}\,\lambda - p_{1'2}\,\mu}{\mu - \lambda}\,e^{-\mu T} & p_{2'3} - \dfrac{p_{2'3}\,\mu\,e^{-\lambda T}}{\mu - \lambda} + \dfrac{p_{2'3}\,\lambda\,e^{-\mu T}}{\mu - \lambda} + e^{-\mu T} \end{bmatrix} \tag{33} \]
Solving Eq. (14) using Eq. (33) is equivalent to considering a three-state system with transition diagram as shown in Fig. 1(b). The unavailability of this system is given in Fig. 2.

Fig. 2. Unavailability of unmonitored component with instantaneous testing.

4.2. Testing with deterministic duration
Another special case of the general model is that of a system that undergoes tests at predetermined instants of time and where the durations of the tests are deterministically known. Let $\tau_k$ be the fixed duration of the test to be performed at time $t_k$. Given that the transition out of a state m of M is instantaneous after spending exactly $\tau_k$ units of time, the N competing processes out of m have completion times distributed according to:
\[ g_{mi}(x) = p'_{mi}\,\delta(x - \tau_k) \quad \text{with} \quad \sum_{i=1}^{N} p'_{mi} = 1 \quad \text{and} \quad g_m(x) = \delta(x - \tau_k) \tag{34} \]
Then $G(x) = u(x) - u(x - \tau_k)$ and Eq. (25) yields
\[ c_{mi}(x) = p'_{mi}\,\delta(x - \tau_k)\,[u(x) - u(x - \tau_k)] \tag{35} \]
Given Eq. (35), Eq. (11) yields
\[ Q^*(t - t_k) = [I - Q'] + Q \int_0^{t-t_k} P\,\delta(y - \tau_k)\,e^{-Ay}\,dy \;\Rightarrow\; Q^*(t - t_k) = [I - Q'] + Q\,P\,e^{-A\tau_k}\,u(t - t_k - \tau_k) \tag{36} \]
where $u(t - t_k - \tau_k)$ is the step function.
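The effect of the fixed-duration test matrix of Eq. (36) can be checked numerically: multiplying it on the right by $e^{A(t-t_k)}$, as Eq. (12) prescribes, should give the untested part propagated over $t - t_k$ plus the tested part propagated over $t - t_k - \tau_k$ only. The sketch below does this for the three-state model using the closed form of Eq. (32); the rates, the test duration `TAU` and the entries of `P` are hypothetical, and treating the step function as switched on at exactly $t - t_k = \tau_k$ is an assumption of this example.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def exp_at(t, lam, mu):
    """Closed form of e^{At} from Eq. (32); also valid for negative t."""
    a = mu / (mu - lam) * (math.exp(-lam * t) - math.exp(-mu * t))
    c = math.exp(-mu * t)
    return [[math.exp(-lam * t), 1.0 - math.exp(-lam * t), 0.0],
            [0.0, 1.0, 0.0],
            [a, 1.0 - a - c, c]]

# Hypothetical values (assumptions for illustration only).
LAM, MU, TAU = 1e-3, 0.1, 2.0
Q = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
P = [[0.99, 0.01, 0.0], [0.0, 0.05, 0.95]]
QP = mat_mul(Q, P)
I_MINUS_QPRIME = [[0.0, 0.0, 0.0],      # states 1 and 2 are tested,
                  [0.0, 0.0, 0.0],      # so their diagonal entries vanish
                  [0.0, 0.0, 1.0]]      # state 3 is not tested

def q_star(s):
    """Eq. (36) evaluated s = t - t_k time units after the test starts."""
    if s < TAU:                          # test still in progress
        return [row[:] for row in I_MINUS_QPRIME]
    return mat_add(I_MINUS_QPRIME, mat_mul(QP, exp_at(-TAU, LAM, MU)))

def propagated(s):
    """Q*(s) e^{As}: the operator applied to the pre-test vector."""
    return mat_mul(q_star(s), exp_at(s, LAM, MU))

def split_form(s):
    """Untested part over s plus tested part over s - TAU (s >= TAU)."""
    return mat_add(mat_mul(I_MINUS_QPRIME, exp_at(s, LAM, MU)),
                   mat_mul(QP, exp_at(s - TAU, LAM, MU)))
```

For any `s` larger than `TAU`, `propagated(s)` and `split_form(s)` agree to rounding error, while for `s` smaller than `TAU` the rows of `q_star(s)` belonging to tested states are zero, reflecting that those state probabilities are held in the test states for the duration of the test.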
By virtue of Eqs. (12) and (36) it follows that
\[ \pi(t) = \pi(t_{k-1})\,e^{A(t_k-t_{k-1})} \left\{ [I - Q'] + Q\,P\,e^{-A\tau_k}\,u(t - t_k - \tau_k) \right\} e^{A(t-t_k)} \]
or by setting $\pi(t_{k-1})\,e^{A(t_k-t_{k-1})} = \pi(t_k^-)$,
\[ \pi(t) = \pi(t_k^-)\,[I - Q']\,e^{A(t-t_k)} + \pi(t_k^-)\,Q\,P\,e^{A(t-t_k-\tau_k)}\,u(t - t_k - \tau_k) \tag{37} \]
The term $\pi(t_k^-)\,[I - Q']\,e^{A(t-t_k)}$ refers to those states of the system that do not undergo test. For states i that do undergo a test it is $\sum_{m \in M} q_{im} = 1$. Hence, $[I - Q']_{ii} = 0$ if i is a state that undergoes a test. The term $\pi(t_k^-)\,Q\,P\,e^{A(t-t_k-\tau_k)}\,u(t - t_k - \tau_k)$ refers to those states of the system that do undergo testing. The corresponding state probabilities become equal to zero for the period $[t_k, t_k + \tau_k]$ and reappear, after rearrangement through QP, at $t_k + \tau_k$. For unavailability calculations, the first term in Eq. (37) can be used for those states that are unavailable and do not undergo test (e.g. system under repair), along with the second term. This latter term, and particularly vector $\pi(t_k^-)\,Q$, provides the state probabilities of the test states, which remain constant for the period $[t_k, t_k + \tau_k]$. Those states in M that are unavailable contribute to the unavailability in the period $[t_k, t_k + \tau_k]$.

4.2.1. Numerical example: a single component system
Consider again the system in Section 4.1.1, where now the test is not instantaneous but rather has a fixed duration, r. The only difference between this case and the case of instantaneous testing (Section 4.1.1) is that the test matrix $Q^*$ has the following form:
\[ Q^* = \begin{bmatrix} p_{1'1}\,u(t - r) & p_{1'2}\,u(t - r) & 0 \\ 0 & p_{2'2}\,u(t - r) & p_{2'3}\,u(t - r) \\ 0 & 0 & 1 \end{bmatrix} \tag{38} \]
Matrix $e^{At}$ is as in the example of Section 4.1.1. States 1' and 2' (see Fig. 1) have non-zero occupancy probability only during tests. The unavailability of this system is given by:
\[ U(t) = \pi_2(t) + \pi_3(t) \quad \text{if } t < t_k \text{ or } t > t_k + \tau \tag{39} \]
and it becomes equal to unity for $t_k < t < t_k + \tau$. This time behavior is depicted in Fig. 3, where the unavailability is plotted as a function of time.

4.2.2. Numerical example: a 2-out-of-4 (success) system with simultaneous or staggered testing
Consider a system of four identical components with the characteristics given in Section 4.2.1. The components are connected in 2-out-of-4 success logic, that is, it takes three component failures to have a system failure. A complete system-state development would require $5^4 = 625$ states. Given the approximation of Section 4.2, only three states are necessary and sufficient to simulate each component's evolution. This means that only $3^4 = 81$ system states are required. Knowledge that a component is unavailable during test allows for the calculation of the unavailability of the system. It suffices to flag critical states, that is, states for which an additional component failure or unavailability will render the system unavailable. Fig. 4 depicts staggered and simultaneous testing [3]. Steady-state values are obtained rather fast and average unavailability values can be obtained from one cycle only.
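The state-count argument and the flagging of critical states can be illustrated by direct enumeration. The sketch below is an assumption-laden illustration, not the paper's procedure: it labels the three reduced component states 1, 2, 3 as in Section 4.1.1, treats states 2 and 3 as unavailable, and classifies each system state of a 2-out-of-4 success system purely from those labels (the extra unavailability of a component that is under test would have to be handled separately).

```python
from itertools import product

COMPONENT_STATES = (1, 2, 3)   # 1 available, 2 failed undetected, 3 in repair
UNAVAILABLE = {2, 3}
NEEDED = 2                     # components required for system success

def classify(system_state):
    """Label a 4-component state as 'failed', 'critical' or 'ok'."""
    up = sum(1 for s in system_state if s not in UNAVAILABLE)
    if up < NEEDED:
        return "failed"        # three or more components unavailable
    if up == NEEDED:
        return "critical"      # one more loss fails the system
    return "ok"

STATES = list(product(COMPONENT_STATES, repeat=4))
COUNTS = {"failed": 0, "critical": 0, "ok": 0}
for state in STATES:
    COUNTS[classify(state)] += 1
```

Under these assumptions the enumeration covers all 81 reduced system states, and a component entering test can be accounted for by checking whether the current state is flagged as critical.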
4.3. Testing and repair with maximum allowable outage time
An interesting special case of the general model arises when a system can spend only a predetermined and deterministically known amount of time in certain states. This is usually the case when a component is off-line for test and/or repair and unavailable. In several instances it is desirable to limit the amount of time the system spends in those states, with the rationale that in doing so the risk is reduced. An analog channel in a protection system, for example, can be put into a "trip state", giving a positive signal, if the testing exceeds a certain period. Similarly, a nuclear power reactor might be shut down if one or more of its components (e.g. emergency diesel generators) remain unavailable for more than a certain period of time. Such situations can be modeled as follows. All system states where the residence time is restricted by the AOT limitation form subset M. The general model developed in Section 3 takes in this case a special form.
Let m be a state in M for which the residence time is restricted to $\tau_m$ units of time. Once this time is completed the system transits to a state j in N. This means that the pdf of the time to complete this process has the form of Eq. (34), or
\[ g_{mj}(x) = \delta(x - \tau_m) \quad \text{and} \quad G_{mj}(x) = u(x) - u(x - \tau_m) \tag{40} \]
Additional competing processes are usually in the form of a test or a repair with random duration. Let g(x), h(x), G(x) be the characteristics of this process. Eqs. (17) and (18) then take the form:
\[ g_{mi}(t) = p_{mi}\,g(t), \qquad G_{mi}(t) = \exp\!\left[ -\int_0^t p_{mi}\,h(x)\,dx \right] \tag{41} \]
with $\sum_{i=1,\; i \neq j} p_{mi} = 1$. By virtue of Eqs. (40), (41) and (20), it follows that
\[ G_m(x) = G(x)\,[u(x) - u(x - \tau_m)] \]
and hence Eq. (25) yields
\[ c_{mi}(x) = \begin{cases} p_{mi}\,g(x)\,[u(x) - u(x - \tau_m)] & \text{if } i \neq j \\ \delta(x - \tau_m)\,G(x) & \text{if } i = j \end{cases} \tag{42} \]
In view of the special nature of $c_{mj}(x)$ in Eq. (42), Eq. (11)
becomes
\[ Q^*(\tau) = I - Q' + Q \int_0^{\tau} C(y)\,e^{-Ay}\,dy \tag{43} \]
where $\tau$ denotes in summary form the fact that the upper limit of the integration of each element $c_{mi}(y)$ of C(y) extends up to the allowable residence time $\tau_m$. Matrix $Q^*(\tau)$ could be calculated from Eq. (43) and used in Eq. (12) to calculate $\pi(t)$ for times $t > t_m + \tau^*$, where $\tau^* = \max_{m \in M}\{\tau_m\}$.
Operationally this means that $Q^*(\tau)$ can be calculated relatively easily numerically. For example, discretization of Eq. (12) yields
\[ \pi(nK + k) = \pi(nK) \left\{ [I - Q']\,S^k + Q \sum_{r=1}^{k} C(r)\,S^{k-r} \right\} \tag{44} \]
where n is the number of tests performed; $K = T/\Delta t$ is the number of time steps corresponding to test period T; $S \equiv I + A\,\Delta t \approx e^{A\Delta t}$; $S^k \equiv e^{A(k\Delta t)}$; and $k_0 = \tau/\Delta t$ is the number of time steps corresponding to AOT $\tau$. The term in brackets in Eq. (44) describes transitions during the AOT period.
$[I - Q']\,S^{k_0}$: corresponds to those states in N that do not communicate with M at the initiation of the test.
$Q \sum_{r=1}^{k_0} C(r)\,S^{k_0-r}$: corresponds to transitions from N to M at the time of test, followed by r units of time spent in M before transiting to N, where the remaining $k_0 - r$ time steps are spent. This can happen for $r = 1, 2, \ldots, k_0$.
Matrix
\[ Q^*(k_0) = [I - Q']\,S^{k_0} + Q \sum_{r=1}^{k_0} C(r)\,S^{k_0-r} \tag{45} \]
can be calculated once and it can be used in the form of 'instantaneous' transitions owing to tests.

Fig. 3. Unavailability of unmonitored component with constant duration testing.

Fig. 4. Unavailability of a 2-out-of-4 system consisting of components of example in Section 4.2.1.

4.3.1. Allowable outage times relatively shorter than mean times of other transitions
If the allowable outage times $\tau_m$ ($m = 1, 2, \ldots$) are relatively short in comparison to the mean times for transitions among states within set N (mainly failures and repairs), then a further simplification of Eqs. (43) and (44) is possible. Owing to the small values of the $\tau_m$, the correction factor $e^{-Ay}$ can be ignored. In other words, the probability of observing transitions within subset N in the period $\tau_m$ is negligible. Then, $e^{-Ay} \approx I$ and by virtue of Eq. (42) it
I.A. Papazoglou / Reliability Engineering and System Safety 68 (2000) 121–133
131
follows that cmi
tm cmi
Ztm 0
( cmi
y dy
pmi 1 ⫺ G
tm
if i 苷 jⴱ
G
tm
if i jⴱ
46
This means that Q ⴱ(t) is easily calculated on the basis of Eq. (46), it is time invariant and the system instead of being of the order
N ⫹ M it becomes a system of order N, with “instaneously” testable components.
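Written out, setting the correction factor in Eq. (43) equal to the identity matrix gives the form below. The symbol $\bar{C}$ for the matrix of integrated kernel elements is introduced here only for illustration and is not notation taken from the paper.

```latex
Q^*(\tau) \;\approx\; (I - Q') + Q\,\bar{C},
\qquad
\bar{C} = \bigl[\, c_{mi}(\tau_m) \,\bigr],
\qquad
c_{mi}(\tau_m) = \int_0^{\tau_m} c_{mi}(y)\, dy .
```

Under this approximation no matrix exponential and no time stepping are needed during the AOT period: only the integrated elements of Eq. (46) enter the test matrix.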
4.3.2. Numerical example
Consider a one-component non-monitored system with the state transition diagram shown in Fig. 5. The system can be in the following states.
• State 1: system is up and available to perform its function.
• State 2: system is down, and its unavailability is undetected.
• State 3: system is down, its unavailability is detected, and it is under repair. This state is considered fail-safe since appropriate actions are taken to offset the known unavailability of the system.

From state 1, the system can fail in an undetectable mode to state 2 with transition rate $\lambda$, and in a detectable mode to state 3 with transition rate $\lambda_s$. From state 3, the system can transit back to state 1 when its repair is completed, with transition rate $\rho$. Since the system is unmonitored, its true state is known only if it is in state 3. Only a demand (a challenge) to the operability of the system can reveal whether it is in state 1 or 2.

To improve the availability, a test is performed every $t_k$ units of time. The duration of the test is not constant; it is random and exponentially distributed with mean time $1/\mu$. During the test, the component is unavailable. To limit the vulnerability of the system, the time allowed in this unavailable mode is limited to $\tau$ units of time. If this time is exceeded, the component is set in the 'fail-safe' state 3, from where it can transit back to state 1. Should the test be completed before the expiration of the allowable outage time, the true state of the component is known, provided no error is committed during the test. A test takes the system into the following states.
• State 1′: test state if the state before the test was 1. Completion of the test will bring the system back to state 1 with probability $p_{1'1}$ and leave it down (state 2) with probability $p_{1'2}$ ($p_{1'1} + p_{1'2} = 1$). If the test is not completed by $\tau$, the system transits to state 3.
• State 2′: test state if the state before the test was 2. Completion of the test will bring the system to state 3 (under repair) with probability $p_{2'3}$, or back to the undetected state 2, if an error is committed, with probability $p_{2'2}$ ($p_{2'2} + p_{2'3} = 1$).
Fig. 5. State transition diagram for a component undergoing tests and with maximum allowable time in test mode $\tau$.
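As a hedged illustration, the transitions among states 1, 2 and 3 described above can be collected in a rate matrix for use in a one-step matrix of the form $S \equiv I + A\,\Delta t$. The row-vector convention (entry (i, j) is the rate from state i to state j, so rows sum to zero), the function names and all numerical values below are assumptions made for this sketch; they are not taken from the paper.

```python
def rate_matrix(fail_undetected, fail_detected, repair):
    """Rates among states 1 (up), 2 (down, undetected), 3 (down, under repair)."""
    return [
        [-(fail_undetected + fail_detected), fail_undetected, fail_detected],
        [0.0, 0.0, 0.0],            # state 2 is left only through a test
        [repair, 0.0, -repair],     # state 3 returns to state 1 after repair
    ]


def step_matrix(A, dt):
    """One-step matrix S = I + A*dt, a first-order approximation of exp(A*dt)."""
    n = len(A)
    return [[(1.0 if i == j else 0.0) + A[i][j] * dt for j in range(n)]
            for i in range(n)]


def advance(p, S, steps):
    """Propagate the row vector of state probabilities p through `steps` steps."""
    n = len(p)
    for _ in range(steps):
        p = [sum(p[i] * S[i][j] for i in range(n)) for j in range(n)]
    return p


# Hypothetical rates (per hour), chosen only to make the sketch runnable.
A = rate_matrix(fail_undetected=1e-4, fail_detected=1e-5, repair=0.1)
S = step_matrix(A, dt=1.0)
p = advance([1.0, 0.0, 0.0], S, steps=720)   # 720 hourly steps with no test
undetected = p[1]                            # probability of being in state 2
```

Because state 2 has no outgoing rate, the probability of an undetected failure can only grow between tests, which is what makes periodic testing necessary in this model.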
Given these assumptions, Eq. (42) becomes

$$c_{1'i}(x) = \begin{cases} p_{1'i}\,\mu e^{-\mu x}\,[u(x) - u(x - \tau)], & i = 1, 2 \\ \delta(x - \tau)\, e^{-\mu x}, & i = 3 \end{cases}$$

$$c_{2'1}(x) = 0$$

$$c_{2'2}(x) = p_{2'2}\,\mu e^{-\mu x}\,[u(x) - u(x - \tau)]$$

$$c_{2'3}(x) = p_{2'3}\,\mu e^{-\mu x}\,[u(x) - u(x - \tau)] + \delta(x - \tau)\, e^{-\mu x}$$

Substitution of these equations in Eq. (45), with $Q$ and $Q'$ as in the example in Section 4.1.1, gives $Q^*(k)$ for $k = 1, 2, \ldots, k_0$. Matrix $Q^*(k_0)$, operating directly on the state probability vector $\pi(nK)$ (see Eq. (44)), provides the state probability vector $\pi(nK + k_0)$ immediately after the expiration of the AOT. Given the possible transitions of the system and the resulting structure of the various matrices, vector $\pi(nK)$ has a steady-state value. Its periodic behavior is depicted in Fig. 6.

If the AOT is treated as a single transition step, then the unavailability increases to 1 just after the initiation of the test and remains 1 during the AOT, after which it drops to the exact value once $\tau$ units of time have elapsed (Line 1 in Fig. 6). This, however, is an approximation, and its validity depends on the numerical values of the parameters of the problem. Alternatively, the unavailability can be calculated at various instants in the interval $[0, \tau]$ to obtain the exact behavior and hence the exact average value ($U_\tau$) over the same period (Line 2 in Fig. 6). Then the correct unavailability of the component can be calculated by assuming a test of constant duration $\tau$, test matrix $Q^*(\tau)$ and test unavailability $U_\tau$ (i.e. if a challenge occurs during the test, the system can respond successfully with probability $1 - U_\tau$).

This allows a very significant reduction in the number of states of the system. The electrical part of the protection system of a nuclear power reactor, for example, can be modeled in terms of six components like the one
Fig. 6. Unavailability of component with allowable outage time when transitions during test are ignored (Line 1) and when they are taken into account (Line 2).
considered here. Four of them are connected in a 2-out-of-4 logic. Testing of these components is not allowed to exceed a particular time period. If this happens to be in the 'trip' state, then this means a reactor shutdown. Spurious shutdowns are undesired both because of their economic impact and because they can have detrimental safety implications. To calculate the unavailability of such a system taking into account the AOT restriction would require a model with $5^6 = 15625$ states. Instead, the model presented here would give the same results with only $3^6 = 729$ states, a reduction by a factor of over 20. Details of this analysis are given in Ref. [13].
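Two pieces of arithmetic in this example can be checked with a short sketch: the normalization of the kernels for test states 1′ and 2′ under the AOT limit, and the state counts quoted above. The function name, the per-step integration of the kernel densities and every numerical value are assumptions made for illustration. The count of five states per component (1, 2, 3, 1′, 2′) in the full model and three (1, 2, 3) in the reduced one is a reading of the example, not a statement from the paper.

```python
import math


def discretized_kernels(mu, tau, k0, p11, p22):
    """Per-step exit probabilities C(r), r = 1..k0, for test states 1' and 2'.

    Each C(r) is a 2 x 3 matrix: rows are (1', 2'), columns are states (1, 2, 3).
    The exponential kernel is integrated over each step of length tau / k0; the
    probability that the test is still running at tau (the delta term) is added
    to the last step as a forced transition to the fail-safe state 3.
    """
    dt = tau / k0
    branch = [
        [p11, 1.0 - p11, 0.0],   # from 1': back to state 1, or left down in 2
        [0.0, p22, 1.0 - p22],   # from 2': failure missed (2), or detected (3)
    ]
    C = []
    for r in range(1, k0 + 1):
        done_in_step = math.exp(-mu * (r - 1) * dt) - math.exp(-mu * r * dt)
        C.append([[b * done_in_step for b in row] for row in branch])
    timeout = math.exp(-mu * tau)    # AOT expires before the test is completed
    C[-1][0][2] += timeout
    C[-1][1][2] += timeout
    return C


# Hypothetical values: mean test duration 2 time units, AOT of 2 time units.
C = discretized_kernels(mu=0.5, tau=2.0, k0=20, p11=0.99, p22=0.01)
# Total probability of ending in each state, summed over all steps.
totals = [[sum(step[m][i] for step in C) for i in range(3)] for m in range(2)]

full_states = 5 ** 6       # six components, five states each
reduced_states = 3 ** 6    # six components, three states each
reduction = full_states / reduced_states
```

Each row of `totals` sums to one, and the forced transition from 1′ to state 3 carries probability $e^{-\mu\tau}$, consistent with the two branches of Eq. (46). The ratio `reduction` is about 21.4, in line with the factor of over 20 quoted in the text.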
5. Summary and conclusions
Equivalent Markovian models of reduced size have been derived for systems undergoing tests and with generally distributed residence times. The general model has been specialized for three particular cases:
• instantaneous tests;
• tests of fixed (deterministically known) duration;
• tests of random, generally distributed duration that are associated with a maximum duration.

The equivalent models imply a system with almost half the number of states of the complete model. Tests are simulated with one transition taking place at the instants of tests. A detailed time-dependent solution of the reduced model is, in the general case, required only once, to derive the equivalent test matrix. A substantial reduction in the number of states is thus achieved through the use of the reduced component-state model in multi-component systems. Furthermore, a substantial reduction in computation time is achieved, since Markovian instead of semi-Markovian models have to be solved.
References
[1] Golay M. Improved nuclear power plant operations and safety through performance-based safety regulation. Risk analysis. Papazoglou IA, editor. Journal of Hazardous Materials 2000;71(1–3):219–37.
[2] Caruso MA, Cheok MC, Cunningham MA, Holahan GM, King TL, Parry GW, Ramey-Smith AM, Rubin MP, Thadani AC. An approach for using risk assessment in risk-informed decisions on plant-specific changes to the licensing basis. Reliability Engineering and System Safety 1999;63:3.
[3] Atefi B, Gallagher DW. Feasibility assessment of a risk-based approach to technical specifications. NUREG/CR-5742, 1991.
[4] Samanta PK, Wong SM, Carbonaro J. Evaluation of risks associated with AOT and STI requirements at the ANO-1 nuclear power plant. NUREG/CR-5200, 1988.
[5] Apostolakis G, Chu TL. The unavailability of systems under periodic test and maintenance. Nuclear Technology 1980;50:5.
[6] Vesely VE et al. Frantic II: a computer code for time dependent unavailability analysis. Brookhaven National Laboratory, NUREG/CR-1294, 1981.
[7] Papazoglou IA. On the need of Markovian reliability analysis. In: Apostolakis G, editor. Probabilistic safety assessment and management, vol. 2. Amsterdam: Elsevier, 1991. p. 1413–8.
[8] Papazoglou IA, Gyftopoulos EP. Markovian reliability analysis under uncertainty with an application on the shutdown system of the Clinch River Breeder Reactor. Nuclear Science and Engineering 1980;73:1.
[9] Howard R. Dynamic probabilistic systems, vols. I and II. New York: Wiley, 1971.
[10] Branson MH, Shah B. Reliability analysis of systems comprised of units with arbitrary repair-time distributions. IEEE Transactions on Reliability 1971;R20(4):217–23.
[11] Werely NM, Walker BK. Approximate evaluation of Semi-Markov chain reliability models. Reliability Engineering and System Safety 1990;28(2):133–64.
[12] Van Dijkhuizen G, van der Hejiden M. Preventive maintenance and the interval availability distribution of an unreliable production system. Reliability Engineering and System Safety 1999;66(1):13–27.
[13] Papazoglou IA. Reliability based surveillance testing intervals for a reactor protection system. Nuclear Technology 2000;130:1–22.