Reliability Engineering and System Safety 96 (2011) 230–235
Short communication
Importance measures for multi-phase missions
J.K. Vaurio*, Prometh Solutions, Hiihtäjänkuja 3K, 06100 Porvoo, Finland
Article history: Received 2 May 2010; received in revised form 7 July 2010; accepted 9 July 2010; available online 13 July 2010

Abstract
Phased missions consist of consecutive operational phases where the system logic and failure parameters can change between phases. A component can have different roles in different phases, and the reliability function may have discontinuities at phase boundaries. An earlier method required NOT-gates and negations of events when calculating importance measures for such missions with non-repairable components. This paper suggests an exact method that uses standard fault tree techniques and Boolean algebra without any NOT-gates or negations. The criticalities and other importance measures can be obtained for events and components relevant to a single phase, to a transition between phases, or over the whole mission. The method and importance measures are extended to phased missions with repairable components. Quantification of the reliability, the availability, the failure intensity and the total number of failures is described. New importance indicators defined for repairable systems measure component contributions to the total integrated unavailability, to the mission failure intensity and to the total number of mission failures. © 2010 Elsevier Ltd. All rights reserved.
Keywords: Birnbaum; Criticality; Importance; Failure intensity; Phased mission; Risk; Repairable
1. Introduction

Reliability importance measures for components have traditionally been defined and applied for coherent systems with mutually statistically independent components, without negations or mutually exclusive events. The most commonly used definitions were introduced by Birnbaum, Fussell and Vesely for components or basic events that are statistically independent of other basic events. The definitions are well documented in several textbooks, e.g. [1]. They are valid as such whether the basic event probabilities, i.e. the probabilities of failed states, are unreliabilities of non-repairable components or unavailabilities of repairable components. Importance measures have not been extensively studied for dependent events or for phased missions, in which the system logic may change between phases and the same component can have different roles in different phases. The reliability function can then have discontinuities at phase boundaries. Practical examples of missions with multiple phases include space missions and emergency cooling when heat generation varies with time or heat sinks are depleted. Early reliability analysis methods for such missions considered only non-repairable components [2,3], but fault tree techniques for phased missions were later developed also for systems with repairable components [4]. The approach also allows grace periods for failed components [5]. A formalism was recently developed to calculate importance measures for phased missions when all components are
* Tel.: +358 40 4139949. E-mail address: jussi.vaurio@sulo.fi
0951-8320/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.ress.2010.07.002
non-repairable. In such cases a system failure in any phase fails the mission, and component failures in disjoint phases are mutually exclusive. The suggested method [6] used special Boolean rules and negations of events, which most computer codes for fault tree analysis do not handle, or do not handle accurately. Importances were originally not even defined for models with NOT-gates or negations. This paper develops a method to calculate importance measures for phased missions without the need to use any negations of events. The viewpoint is fault tree analysis of very large systems, such that analytic equations and derivatives may not be practical, but Boolean manipulations and numerical probability calculations are routine or automatic in computer codes. With proper modeling, the contributions to mission failure (criticality), Birnbaum and other interesting measures can be obtained for events relevant to a single phase and for a component over the whole mission. Once a failure event of a non-repairable component occurs, that condition remains present through the remainder of the mission. The causes of a phase failure can occur in the phase itself or in an earlier phase. When phase failure conditions exist on entry to a phase, the system failure, and therefore mission failure, occurs on the phase transition. Even if this sounds complex compared to traditional system analysis, it turns out to pose no major challenge to traditional fault tree analysis methods. The main idea is to make the input events to each phase-specific fault tree equal to the logic sums (OR) of the same failure events in preceding phases, and then the mission TOP-event equal to the union (OR) of all phase-specific fault trees. The only minor deviation from traditional fault tree cases is that the basic failure events of a component in different phases are mutually exclusive
(m.e.) rather than mutually statistically independent (m.s.i.). This has no impact on the so-called rare-event approximation (r.e.a.) because no two such events appear in the same minimal cut set (m.c.s.) anyway. The evaluation of mission reliability enables decisions to be made regarding the acceptability of the system performance. Criticality importance measures of components can be used to identify possible weaknesses and to prioritize preventive improvements. Risk increase factors are useful in making decisions about acceptable configurations and repair times [7]. The reliability and the importances are functions of time. The main interest is in the whole mission, but the same methodology applies at any moment, especially on both sides of any phase boundary. Mutual exclusivity of events changes some familiar relationships between certain importance measures. The importance methodology is also extended to phased missions where repairs are possible. The reliability, the availability, the failure intensity and the number of mission failures are of interest in such systems. New importance measures are suggested with respect to these quantities, and for measuring the importance of a component for a particular phase and for the whole mission. This paper is structured as follows. The basic importance measures are defined in Section 2 in such a way that only one of them, the criticality, needs to be solved using a complete system model run. All the others can then be obtained with simple arithmetic. Section 3 illustrates with an example how phased mission reliability characteristics and importance measures are solved with coherent fault tree methodology in the case of non-repairable components. Section 4 extends the methods to systems with repairable components, requiring integrations over multiple phases, and introduces new importance measures for unavailabilities, failure intensities and the number of mission failures.
Failures at phase boundaries call for special treatment. Common cause failures are briefly commented on in Section 5, and Section 6 is a summary of the work. This paper is an extension of [8] and an update of ideas and results earlier presented in a conference paper [9].
2. Basic importance measures

Definitions, interpretations and mutual relationships of the most common importance measures of single-phase systems are presented in this section for independent events and for events that are mutually exclusive with a subset of basic events. This is a prerequisite for the measures to be defined for multi-phase systems. Consider a coherent system or fault tree TOP-event as a Boolean function of K binary basic events Zk. The probability Pr(TOP) = Q(z1, z2, …, zK) is a multilinear function of the basic event probabilities zk = Pr(Zk). This sum of alternating products results from the well-established inclusion–exclusion principle of quantification based on minimal cut sets (m.c.s.). Traditionally, all basic events are assumed mutually statistically independent (m.s.i.), i.e. Pr(Zk ∩ Zj) = zkzj for all k ≠ j. The form of TOP is similar and without negations also if a group of basic events is mutually exclusive (m.e.), i.e. Pr(Zk ∩ Zj) = 0 for a certain subset Y of indices, k ≠ j, k ∈ Y, j ∈ Y, while these events are independent of all basic events outside Y. Examples of this kind are different failure modes of a non-repairable component, and failures of a non-repairable component in different phases of a phased mission. Correct modeling does not allow two events of Y to appear in the same minimal cut set. The usual importances are now presented in an order that minimizes calculations. Only one importance measure per basic event needs the fault tree model; the other commonly used importance measures can then be obtained by simple arithmetic.
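As a minimal illustration of this quantification (not from the paper; the event names, cut sets and probabilities below are invented), the following Python sketch evaluates Pr(TOP) by inclusion–exclusion over minimal cut sets, zeroing any product that contains two mutually exclusive events:

```python
from itertools import combinations

def top_probability(cut_sets, probs, exclusive_groups=()):
    """Pr(TOP) by inclusion-exclusion over minimal cut sets.

    cut_sets: list of frozensets of basic-event names.
    probs: dict mapping event name -> probability.
    exclusive_groups: iterable of sets of mutually exclusive events;
        any product containing two events of one group has probability 0.
    """
    def product_prob(events):
        for g in exclusive_groups:
            if len(events & g) > 1:
                return 0.0              # Pr(Zk ∩ Zj) = 0 for m.e. events
        p = 1.0
        for e in events:
            p *= probs[e]
        return p

    q = 0.0
    for r in range(1, len(cut_sets) + 1):
        sign = (-1) ** (r + 1)
        for combo in combinations(cut_sets, r):
            q += sign * product_prob(frozenset().union(*combo))
    return q

probs = {"Z1": 0.1, "Z2": 0.2, "Z3": 0.3}
cs = [frozenset({"Z1"}), frozenset({"Z2", "Z3"})]
print(top_probability(cs, probs))                       # independent events
print(top_probability(cs, probs, [{"Z1", "Z2"}]))       # Z1, Z2 mutually exclusive
```

With independent events the cross-product term is subtracted; with Z1 and Z2 mutually exclusive it vanishes, so Q is slightly larger, as the text explains.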
The criticality importance CIk is the relative contribution to Q that could be eliminated if event Zk could be eliminated, i.e. if zk could be made zero:

CIk = [Q − Q(zk = 0)]/Q = 1 − Q(zk = 0)/Q    (1)
where Q is the value obtained using the nominal values of all zk. Based on the expression TOP = ZkGk + Hk, where ZkGk = the minimal cut sets containing Zk and Hk = the minimal cut sets not containing Zk, one can write Q = Pr(TOP) = Pr(ZkGk) − Pr(ZkGkHk) + Pr(Hk). For an independent Zk one can conclude that CIk = zk[Pr(Gk) − Pr(GkHk)]/Q. The criticality is automatically calculated by many computerized fault tree codes, although not necessarily using this last expression exactly. The traditional Fussell–Vesely importance was defined in principle as FVk = Pr(ZkGk)/Q, often a close approximation of CIk. Due to the linearity of Q in the independent event probabilities there is another important interpretation of criticality: CIk is the relative increase in Q if zk is doubled. Thus, criticality is informative in both directions, and tells immediately what happens if a test interval is doubled, for example. A third interpretation is diagnostic: CIk is the conditional probability that event Zk is failed when the system (TOP) is failed AND repair of Zk alone would repair the system. The risk reduction worth RRWk gives the same ranking as CIk because it is defined as

RRWk = Q/Q(zk = 0) = (1 − CIk)^−1    (2)
Birnbaum's measure of importance BIk for an independent event Zk is usefully defined as ∂Q/∂zk, and then

BIk = Q·CIk/zk    (3)
This is exactly the same as BIk = ∂Q/∂zk; however, it avoids analytical derivatives, which are not feasible for large models with fault tree codes. (For small system models one may start with BIk and solve CIk = BIk·zk/Q.) The risk increase factors are defined as RIFk = Pr(TOP|Zk)/Q, a factor indicating how much the risk increases from the nominal value if event Zk is in a failed state. These are useful when one has to make decisions about plant configuration knowing some component to be failed, or when planning to take a component out of service. However, using the definition directly would require K additional calculations of Pr(TOP|zk = 1) for k = 1, 2, …, K through the complete fault tree model. It is much easier to calculate RIF from criticality as follows.
When Zk is m.s.i. with all other basic events, it can be shown that RIFk = Q(zk = 1)/Q is

RIFk = 1 + CIk/zk − CIk    (4)

When Zk is mutually exclusive with a set of basic events {Zj}, j ∈ Jk (Jk is the set of indices of basic events mutually exclusive with event Zk), but statistically independent of all other basic events, one has to take into account that Zk = TRUE implies Zj = FALSE for all j ∈ Jk, so setting zk = 1 requires setting simultaneously zj = 0 for j ∈ Jk. An example of this situation is multiple failure modes of a component: when one failure mode occurs, the others cannot take place. But if one mode is eliminated (zk = 0), the probabilities of the other modes need not change. Then RIFk = Q(zk = 1, zj = 0 for j ∈ Jk)/Q leads to [8]

RIFk = 1 + CIk/zk − CIk − Σ_{j∈Jk} CIj    (5)
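To make the arithmetic of Eqs. (1)–(5) concrete, here is a small Python sketch (illustrative only; the two-component series system and its numbers are invented) that derives RRW, BI and RIF from a single criticality evaluation, as the text recommends:

```python
def importance_set(Q_func, probs, k, exclusive_with=()):
    """Derive CI, RRW, BI and RIF for event k from one model run each.

    Q_func: maps a dict of basic-event probabilities to Q = Pr(TOP).
    exclusive_with: names j mutually exclusive with k (the set J_k);
        their criticalities enter RIF via Eq. (5).
    """
    Q = Q_func(probs)
    def ci(i):
        z0 = dict(probs)
        z0[i] = 0.0
        return 1.0 - Q_func(z0) / Q                # Eq. (1)
    CI = ci(k)
    RRW = 1.0 / (1.0 - CI)                         # Eq. (2)
    BI = Q * CI / probs[k]                         # Eq. (3)
    if not exclusive_with:
        RIF = 1.0 + CI / probs[k] - CI             # Eq. (4), m.s.i. case
    else:
        RIF = 1.0 + CI / probs[k] - CI - sum(ci(j) for j in exclusive_with)  # Eq. (5)
    return CI, RRW, BI, RIF

# Hypothetical system: TOP = Z1 + Z2 with independent events
Q = lambda z: z["Z1"] + z["Z2"] - z["Z1"] * z["Z2"]
probs = {"Z1": 0.1, "Z2": 0.2}
CI, RRW, BI, RIF = importance_set(Q, probs, "Z1")
```

For this system BI reproduces 1 − z2 and RIF reproduces Q(z1 = 1)/Q, confirming that only the criticality needs the full model.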
By the way, in this case BIk is not equal to Q(zk = 1) − Q(zk = 0), the original definition of Birnbaum, as it is when Zk is independent of all other basic events. This and relationship (5) can easily be proved starting with the expression

TOP = ZkGk + Σ_{j∈Jk} ZjGj + Hk

and using the definitions, e.g. ZkZj = ∅. For a general configuration X, a combination of simultaneous failures and possibly successes, one can define a risk gain RG(X) = Pr(TOP|X)/Pr(TOP) [7]. This is not generally a simple function of individual RIF values, but can be obtained using the system fault tree as Pr(TOP·X)/[Pr(TOP)·Pr(X)].
3. Phased mission example

Solving phased mission reliability and importance measures in the case of non-repairable components is now illustrated without using any NOT-gates. There are three phases i, i = 1, 2, 3. The first phase lasts from t0 to t1, phase 2 from t1 to t2 and the final phase from t2 to t3. There are three components A, B and C. The mutually exclusive failure events of A in consecutive phases are A1, A2 and A3, and similarly for B and C. The failure probabilities of these events in the three phases are a1, a2 and a3, and similarly bi = Pr(Bi), ci = Pr(Ci), i = 1, 2, 3. Because of the mutual exclusivity these satisfy a1 + a2 + a3 ≤ 1, b1 + b2 + b3 ≤ 1 and c1 + c2 + c3 ≤ 1. A component can be in a failed state at the beginning with a certain probability (qA, qB or qC), or it can fail later according to some failure time distribution [fA(t), fB(t), fC(t)]. The phase-specific probabilities can have several forms, typically

a1 = qA + (1 − qA) ∫_{t0}^{t1} fA(t) dt    (6)

ai = (1 − qA) ∫_{t_{i−1}}^{t_i} fA(t) dt,  for i > 1    (7)

where qA is the component failure probability at t0 and fA is the density function of the time to failure of component A. Similar forms can apply to bi and ci (i = 1, 2, 3). The phase-specific TOP-events for the consecutive phases in this example are assumed to be

TOP1(A, B, C) = A + B    (8)

TOP2(A, B, C) = AB    (9)

TOP3(A, B, C) = AB + C    (10)

An OR-gate (union) is indicated by + and an AND-gate (intersection) by multiplication. The procedure to obtain a proper fault tree for the mission is as follows:

1) Consider the status of the system at the end of each phase as if that phase stands alone. It does not matter whether a component failed before or during that phase. To account for this we form new phase-specific TOP-events so that each basic event in TOPi is replaced by the union of the basic events before and up to the phase, i.e.

TOP*1 = TOP1(A1, B1, C1) = A1 + B1    (11)

TOP*2 = TOP2(A1 + A2, B1 + B2, C1 + C2) = (A1 + A2)(B1 + B2)    (12)

TOP*3 = TOP3(A1 + A2 + A3, B1 + B2 + B3, C1 + C2 + C3) = (A1 + A2 + A3)(B1 + B2 + B3) + C1 + C2 + C3    (13)

2) Because the mission fails if any of the above TOP-events occurs, the total mission TOP-event is

TOPmiss = TOP*1 + TOP*2 + TOP*3    (14)

3) Boolean simplification, solving the minimal cut sets, and quantification by the inclusion–exclusion principle can take place with normal routines. Only in higher-order terms does one have to take into account the mutual exclusivities and delete the products AiAj, BiBj and CiCj for any i ≠ j. This procedure yields the mission failure probability at the end, Pr(TOPmiss) = Q(t3). If the rare-event approximation is used, the mutual exclusivity does not need attention because two events of the same component do not appear in the same minimal cut set.

In the example these steps yield the m.c.s. equation TOPmiss = A1 + B1 + A2B2 + A2B3 + A3B2 + A3B3 + C1 + C2 + C3. In r.e.a. this gives Q(t3) when the capital event names are replaced by the lower-case probabilities. The exact result is

Q(t3) = c1 + c2 + c3 + (1 − c1 − c2 − c3)(a1 + b1 + a2b2 + a2b3 + a3b2 + a3b3 − a1b1)    (15)

The criticalities of the individual basic events are

CIA1 = (1 − c1 − c2 − c3)(1 − b1)a1/Q(t3)    (16)

CIA2 = (1 − c1 − c2 − c3)(b2 + b3)a2/Q(t3)    (17)

CIA3 = (1 − c1 − c2 − c3)(b2 + b3)a3/Q(t3)    (18)

CIB1 = (1 − c1 − c2 − c3)(1 − a1)b1/Q(t3)    (19)

CIB2 = (1 − c1 − c2 − c3)(a2 + a3)b2/Q(t3)    (20)

CIB3 = (1 − c1 − c2 − c3)(a2 + a3)b3/Q(t3)    (21)

CIC1 = (1 − a1 − b1 − a2b2 − a2b3 − a3b2 − a3b3 + a1b1)c1/Q(t3)    (22)

CIC2 = (1 − a1 − b1 − a2b2 − a2b3 − a3b2 − a3b3 + a1b1)c2/Q(t3)    (23)

CIC3 = (1 − a1 − b1 − a2b2 − a2b3 − a3b2 − a3b3 + a1b1)c3/Q(t3)    (24)
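As a numerical cross-check of Eqs. (15)–(16) and of the additivity property discussed next (the phase probabilities below are invented for illustration), one can evaluate Q(t3) and the phase criticalities directly in Python:

```python
def Q_t3(a, b, c):
    """Exact mission failure probability, Eq. (15).
    a, b, c are the phase failure probability triples of A, B, C."""
    cs = c[0] + c[1] + c[2]
    return cs + (1 - cs) * (a[0] + b[0] + a[1]*b[1] + a[1]*b[2]
                            + a[2]*b[1] + a[2]*b[2] - a[0]*b[0])

a = [0.02, 0.03, 0.04]     # assumed a1, a2, a3
b = [0.01, 0.02, 0.03]     # assumed b1, b2, b3
c = [0.005, 0.01, 0.015]   # assumed c1, c2, c3
Q = Q_t3(a, b, c)

def criticality(which, i):
    """Phase-event criticality via Eq. (1): eliminate one phase event."""
    aa, bb, cc = list(a), list(b), list(c)
    {"a": aa, "b": bb, "c": cc}[which][i] = 0.0
    return 1.0 - Q_t3(aa, bb, cc) / Q

# Component criticality two ways: sum of phase criticalities, and
# eliminating all phase failures of A at once
CI_A = sum(criticality("a", i) for i in range(3))
CI_A_direct = 1.0 - Q_t3([0.0, 0.0, 0.0], b, c) / Q
```

Computed either way, CI_A agrees, because Q is multilinear and no two phase events of the same component share a term.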
The total criticality importance of component A over the mission can be defined as the relative reduction in Q(t3) when all phase failures of A are eliminated, i.e. CIA = 1 − Q(t3 | a1 = a2 = a3 = 0)/Q(t3). This can easily be obtained from Eq. (15). More generally, the phase-criticalities of a component are always additive over the phases, because of the linearity of Q and because different phase events of a component never appear as a product in the same term of Q. Thus,

CIA = CIA1 + CIA2 + CIA3    (25)

CIB = CIB1 + CIB2 + CIB3    (26)

CIC = CIC1 + CIC2 + CIC3    (27)
RRW for a component is not additive over phases but can be meaningfully defined as RRWk = (1 − CIk)^−1 when CIk is the total component criticality, i.e. the sum of the phase-event criticalities. The Birnbaum and RIF importances for phase-specific events can be obtained using Eqs. (3) and (5), but they are not automatically extendable to component importances over the mission. If one is interested in the risk increase when component A is down during the whole mission, one can conclude that with a non-repairable component this is obtained simply by setting A1 failed (i.e. a1 = 1, a2 = a3 = 0) in the mission model. In our example Q(t3 | a1 = 1) = 1, while Q(t3 | a2 = 1) = Q(t3 | a3 = 1) = c1 + c2 + c3 + (1 − c1 − c2 − c3)(b1 + b2 + b3). Thus, RIFA1 is a proper component RIFA in such applications.
On the other hand, one may not know when A fails and may be interested in the average impact of A if it fails during the mission. A meaningful component-RIF over a mission is then the weighted average of the phase-specific RIF values. The weights are the phase-specific probabilities of the component in question, e.g. a1, a2 and a3. Then RIFA = (a1RIFA1 + a2RIFA2 + a3RIFA3)/(a1 + a2 + a3).

3.1. Mission failures at phase boundaries

If the reliabilities and importances as functions of time are of interest, they can be obtained in the same way, defining the time point of interest as the end of the mission. Because of possible discontinuities at phase boundaries, one makes separate calculations at the end of a phase i, Q(ti−), and at the beginning of the next phase i + 1, Q(ti+). In calculating Q(ti−) one proceeds as if phase i were the last phase of the mission. In calculating Q(ti+) one proceeds as if phase i + 1 were the last phase of the mission, but sets the basic event probabilities ai+1 = bi+1 = ci+1 = 0, because only the logic transition (and not component failures during phase i + 1) can have caused system failure up to time ti+. In this way Q(t2+) can be obtained from Eq. (15) by setting a3 = b3 = c3 = 0, i.e. Q(t2+) = c1 + c2 + (1 − c1 − c2)(a1 + b1 + a2b2 − a1b1). At the end of phase 2, Q(t2−) = Pr(TOP*1 + TOP*2) = Pr(A1 + B1 + A2B2) = a1 + b1 + a2b2 − a1b1. The mission failure probability at the transfer from phase 2 to phase 3 at t2 is Q(t2+) − Q(t2−). Numerical examples solved with exact and approximate methods are available [4]. We have seen that no negations or NOT-gates are needed when solving phased mission reliability and importance measures with non-repairable components. This is accomplished through formation of the mission TOP-event (14) rather than treating individual phases as in [6]. Intermediate results at phase boundaries t1 and t2 can be obtained through TOPmiss = TOP*1 and TOPmiss = TOP*1 + TOP*2, respectively.
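The boundary calculation can be sketched numerically as follows (the phase probabilities are invented for illustration):

```python
# Hypothetical phase-failure probabilities of A, B, C in phases 1-3
a1, a2, a3 = 0.02, 0.03, 0.04
b1, b2, b3 = 0.01, 0.02, 0.03
c1, c2, c3 = 0.005, 0.01, 0.015

# End of phase 2: mission failed if TOP*1 + TOP*2 = A1 + B1 + A2B2 occurred.
# Cross terms with two events of one component vanish (mutual exclusivity).
Q_t2_minus = a1 + b1 + a2*b2 - a1*b1

# Start of phase 3: phase-3 logic applied, but no phase-3 failures yet
# (a3 = b3 = c3 = 0 in Eq. (15))
Q_t2_plus = c1 + c2 + (1 - c1 - c2) * (a1 + b1 + a2*b2 - a1*b1)

jump = Q_t2_plus - Q_t2_minus    # mission failure probability at the t2 transition
```

Algebraically the jump reduces to (c1 + c2)(1 − Q(t2−)): the mission fails at the transition exactly when C has already failed but the phase-2 logic had not yet failed.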
4. Phased missions with repairable components

If all or some components are repairable during some or all phases of a mission, there are several quantities of interest: the system unavailability (= probability of a failed state) Us(t) as a function of time t, the system failure intensity ws(t), the expected number of system failures Ns(t), and the system unreliability Fs(t), i.e. the probability that the first system failure occurs before a certain time t. The last two are most interesting at the end of the mission, t = tm. In this section several importance measures are defined and applied to a mission, taking into account continuous functions during phases and the effects of phase boundaries. With repairable components the failures of a component in different phases are no longer mutually exclusive. Failures and repairs can take place several times even in a single phase. The momentary system logic and the momentary component states determine the quantities of interest. We do not need to define separate basic events for each phase, but the numerical probabilities (unavailabilities) vary in time and from phase to phase. The system unavailability at any time t is determined by the system logic at that moment, the appropriate TOPi(Z1, …, ZK) for t in the interval [ti−1, ti], and Us(t) = Pr(TOPi) = Ui(z1, …, zK) obtained from the fault tree with each zk = uk(t), the unavailability of component k at instant t. A typical form of uk is the asymptotic value λk/(λk + μk), where λk is the failure rate (inverse of the mean time to failure) and μk is the repair rate (inverse of the mean repair time), valid when the phase duration is clearly longer than the mean repair time. For a non-repairable individual component the unavailability at t equals the failure probability counted from the beginning, or from the time when it was last known to be unfailed.
When the components are mutually independent, standard methods apply and the momentary importance measures can be obtained using Eqs. (1)–(4). The total unavailability contribution of component k over a mission can be measured by the averaged criticality

CIk = 1 − [Σ_{i=1}^{m} ∫_{t_{i−1}}^{t_i} Ui(t | uk(t) ≡ 0) dt] / [Σ_{i=1}^{m} ∫_{t_{i−1}}^{t_i} Ui(t) dt]    (28)

where Ui(t) is the system unavailability in phase i. The practical meaning of CIk is the fractional reduction of lost production if the component is made perfect (unfailed). When component unavailabilities are roughly constant within a phase (close to the asymptotic value), one unavailability calculation per phase may be sufficient. Similarly, a component RIFk can be defined as the weighted average unavailability increase, i.e.

RIFk = [Σ_{i=1}^{m} ∫_{t_{i−1}}^{t_i} Ui(t) RIFi,k(t) dt] / [Σ_{i=1}^{m} ∫_{t_{i−1}}^{t_i} Ui(t) dt]    (29)

with RIFi,k(t) of event k in phase i obtained from the phase-specific criticality CIi,k(t) by Eq. (4). Because unavailability usually means loss of production, Eq. (28) yields a relevant measure of the relative total economic loss associated with component k. In a safety system the unavailability relates to the accident probability; then Eq. (28) is the contribution of Zk to the total accident probability, and Eq. (29) is the risk increase factor in case the component is down during all phases.

Example. Consider the previous 3-phase mission with the TOP-events (8)–(10), but now the components are repairable and the probabilities ai, bi and ci represent the unavailabilities in phase i (i = 1, 2, 3). The phase unavailabilities are then

U1 = uA + uB − uAuB,  U2 = uAuB,  U3 = uAuB + uC − uAuBuC.

Each Ui is valid for t in the interval (ti−1, ti). (But if, for example, C is non-repairable, then uC also includes the failure probabilities c1 and c2.)
The phase-criticalities for component A are

CI1,A = (1 − b1)a1/U1,  CI2,A = a2b2/U2,  CI3,A = b3(1 − c3)a3/U3.

These yield each RIFi,A through Eq. (4):

RIF1,A = 1/U1,  RIF2,A = b2/U2,  RIF3,A = (b3 + c3 − b3c3)/U3.

These yield the mission-level RIFA by means of Eq. (29).
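With piecewise-constant unavailabilities the averaged measures (28)–(29) reduce to weighted sums. A sketch of this simplification (the phase durations and unavailabilities below are assumed values, not from the paper):

```python
# Piecewise-constant (asymptotic) component unavailabilities per phase
uA = [0.01, 0.02, 0.02]
uB = [0.015, 0.01, 0.02]
uC = [0.0, 0.0, 0.005]
T = [10.0, 5.0, 8.0]          # assumed phase durations t_i - t_{i-1}

def U(i, a, b, c):
    """Phase unavailabilities for TOP1 = A+B, TOP2 = AB, TOP3 = AB+C."""
    return [a + b - a*b, a*b, a*b + c - a*b*c][i]

denom = sum(U(i, uA[i], uB[i], uC[i]) * T[i] for i in range(3))

# Eq. (28): averaged criticality of A -- set A's unavailability to zero
num = sum(U(i, 0.0, uB[i], uC[i]) * T[i] for i in range(3))
CI_A = 1.0 - num / denom

# Eq. (29): mission-level RIF of A as the unavailability-weighted average
# of the phase RIFs (1/U1, b2/U2, (b3 + c3 - b3*c3)/U3 from the example)
rif = [1.0 / U(0, uA[0], uB[0], uC[0]),
       uB[1] / U(1, uA[1], uB[1], uC[1]),
       (uB[2] + uC[2] - uB[2]*uC[2]) / U(2, uA[2], uB[2], uC[2])]
RIF_A = sum(U(i, uA[i], uB[i], uC[i]) * T[i] * rif[i] for i in range(3)) / denom
```

Note that each weighted term Ui·RIFi,A equals Ui(uA = 1), so RIF_A is the time-averaged unavailability with A down divided by the nominal time-averaged unavailability.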
4.1. Failure intensity and Birnbaum importance

Birnbaum's importance has many remarkable interpretations [4]. One of them is that the system failure intensity can be written as

ws(t) = Σ_{k=1}^{K} BIk(t) wk(t)    (30)

where wk(t) is the failure intensity of component k and BIk(t) = ∂Us/∂uk = Us(t)CIk(t)/uk(t), in analogy with Eq. (3). A typical expression for wk is (1 − uk)λk. Eq. (30) is a valid expression for periods during which the system structure (logic) remains constant. In multi-phase systems the unavailability Us and any BIk can change discontinuously at phase boundaries, but Eq. (30) holds during each phase. Eq. (30) is often the most economical way to calculate ws because the number of components (K) in large systems is
generally much smaller than the number of minimal cut sets. Traditionally, the failure intensity has been calculated first for every minimal cut set, and these were then added together to obtain ws. A relevant question is how to measure the importance of a component to the system failure intensity. The ratio BIkwk/ws measures the relative fraction of system failures that component k causes as the last failure of a minimal cut set. This "last failure importance" does not measure the whole importance of component k, because uk(t) can appear in many terms of Eq. (30) as a factor in BIj, j ≠ k. To obtain the intensity contribution or "intensity reduction worth" one has to count not only the term BIkwk but also the terms that contain uk in BIj (j ≠ k). One way to obtain the intensity contribution ICk is to select a submodel Us,k that includes all terms of Us containing uk as a factor. The total system failure intensity due to these terms is ICk, the sum

ICk = ws,k(t) = Σ_{j=1}^{K} (∂Us,k/∂uj) wj(t)    (31)

This is also the intensity reduction that would be accomplished if component k could be made perfect (uk ≡ 0 and wk ≡ 0). Another way to obtain this, once the Birnbaum measures have been determined, is

ICk = BIkwk + Σ_{j=1, j≠k}^{K} (∂BIj/∂uk) uk wj    (32)

ICk can also be obtained by multiplying each term of Us,k by the sum of the ratios wi/ui over all ui contained as factors, and adding all together. Finally, one can get ICk by taking from ws all terms that contain either wk or uk. [A term is a product of basic event probabilities, or of probabilities and intensities. For example, (1 − uA)wB is not a term in this context, but wB and uAwB both are.] The expected number of failures over a phase t ∈ (ti−1, ti) is

Ws,i = ∫_{t_{i−1}}^{t_i} ws(t) dt    (33)
and the contribution of component k to this is similarly the integral of ICk over the phase.

Example. Consider the previous 3-phase, 3-component mission. In the consecutive phases the phase-specific Birnbaum measures BIi,A, BIi,B and BIi,C for i = 1, 2, 3 are easily obtained as derivatives of Ui, and Eq. (30) yields

t ∈ (t0, t1): ws = (1 − uB)wA + (1 − uA)wB    (34)

t ∈ (t1, t2): ws = uBwA + uAwB    (35)

t ∈ (t2, t3): ws = uB(1 − uC)wA + uA(1 − uC)wB + (1 − uAuB)wC    (36)
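Eq. (30) can be checked numerically: because Us is multilinear in the uk, a central difference recovers BIk essentially exactly. A sketch for the phase-3 logic (the unavailabilities and intensities are assumed values):

```python
def birnbaum(U_func, u, k, h=1e-6):
    """BI_k = dUs/du_k by central difference; practical when fault-tree
    codes give only numerical Us and analytic derivatives are unavailable."""
    up, dn = dict(u), dict(u)
    up[k] += h
    dn[k] -= h
    return (U_func(up) - U_func(dn)) / (2 * h)

def w_system(U_func, u, w):
    """Eq. (30): system failure intensity as the sum of BI_k * w_k."""
    return sum(birnbaum(U_func, u, k) * w[k] for k in u)

# Phase-3 logic from the example: Us = uA*uB + uC - uA*uB*uC
U3 = lambda u: u["A"] * u["B"] + u["C"] - u["A"] * u["B"] * u["C"]
u = {"A": 0.01, "B": 0.02, "C": 0.005}           # assumed unavailabilities
w = {"A": 1e-3, "B": 2e-3, "C": 5e-4}            # assumed component intensities

ws = w_system(U3, u, w)
# Eq. (36) written out for comparison
expected = (u["B"]*(1 - u["C"])*w["A"] + u["A"]*(1 - u["C"])*w["B"]
            + (1 - u["A"]*u["B"])*w["C"])
```

The two evaluations agree, which is the content of Eq. (36) as a special case of Eq. (30).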
In phase 1 the intensity contributions are

ICA = (1 − uB)wA − uAwB,  ICB = (1 − uA)wB − uBwA,  ICC = 0.

In phase 2 they are

ICA = uBwA + uAwB,  ICB = uBwA + uAwB,  ICC = 0,

and in phase 3

ICA = uB(1 − uC)wA + uA(1 − uC)wB − uAuBwC,
ICB = uB(1 − uC)wA + uA(1 − uC)wB − uAuBwC,
ICC = −uBuCwA − uAuCwB + (1 − uAuB)wC.

Integrals of these over time give the contributions of each component to the expected numbers of system failures in each phase. The sum over the phases gives the total contribution of each component to mission failures accumulated during the phases. But
this is not all, because there can be additional system failures at phase boundaries. This will be discussed in Section 4.3.

4.2. Mission failures at phase boundaries

At each phase boundary ti a system failure can occur even if no component fails exactly at that time. This happens if the system logic changes so that a component failure combination present at ti is not a system failure in phase i but becomes a failure in phase i + 1. This occurs with probability

Di = Pr(TOPi+1)_{ti} − Pr(TOPi ∩ TOPi+1)_{ti}    (37)
i.e. when the combination is a failure in phase i + 1 but not in both phases i and i + 1. Thus, the probability can be solved by normal fault tree techniques. One should note that Di is not the same as the change of unavailability, Ui+1(ti+) − Ui(ti−), except in special cases.

Example. Continuing the previous example: at t1 we have TOP1 ∩ TOP2 = AB = TOP2, and therefore D1 = 0. At t2, TOP2 ∩ TOP3 = AB = TOP2 and TOP3 = C + AB. In this case D2 = uC − uCuAuB, which here happens to be equal to the unavailability jump U3(t2+) − U2(t2−) = uC(1 − uAuB).

4.3. Expected number of mission failures

The expected number of mission failures during the whole mission [t0, tm] is exactly

Ns(tm) = D0 + Σ_{i=1}^{m−1} Di + Σ_{i=1}^{m} Ws,i    (38)

where D0 = U1(0) accounts for possible initial failures. The contribution of any component k to Ns(tm) can be calculated by subtracting the same integrals and terms calculated with uk(t) ≡ wk(t) ≡ 0 throughout the mission, i.e. Ns(tm | uk(t) = wk(t) ≡ 0). The relative contribution may be called the failure count criticality importance

CCIk = [Ns(tm) − Ns(tm | uk = wk ≡ 0)]/Ns(tm)    (39)
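For roughly constant per-phase intensities, Eq. (38) becomes a plain sum. The sketch below uses invented numbers of the same order as the earlier examples:

```python
# Assumed phase durations and (approximately constant) system failure
# intensities per phase; D holds the boundary probabilities D1, D2 of Eq. (37).
T = [10.0, 5.0, 8.0]
ws_phase = [5.0e-4, 2.0e-5, 5.4e-4]
D = [0.0, 0.014544]          # D1 = 0, D2 = uC(1 - uA*uB) in the example
D0 = 0.0304                  # U1(0), possible initial failures

Ws = [w * t for w, t in zip(ws_phase, T)]   # Eq. (33) with constant ws
Ns = D0 + sum(D) + sum(Ws)                  # Eq. (38)

def cci(Ns_without_k):
    """Eq. (39): failure count criticality of component k, given the
    mission recomputed with u_k = w_k = 0 (value supplied by the caller)."""
    return (Ns - Ns_without_k) / Ns
```

Since Ns here is small, the bound (40) of Section 4.4 gives Fs(tm) ≈ Ns(tm), so the same contributions rank components with respect to mission unreliability.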
Example. The intensity contributions of A, B and C were discussed in Section 4.1. Their integrals over the phases give the relevant contributions to the numerator of Eq. (39). At phase boundary t1 no component contributes to D1. At t2 the contribution of A to D2 is −uAuBuC; B has the same contribution as A, and C contributes uC(1 − uAuB). Often the component unavailabilities and failure intensities are approximately constant during the phases, when the phase durations are significantly longer than the repair times. This simplifies the integrations by changing them to multiplications with the phase durations ti − ti−1.

4.4. Repairable system unreliability

The unreliability Fs(t) of a system or a mission with repairable components is defined as the probability that the first system failure occurs no later than time t. A good conservative approximation when Ns(t) is small is the upper limit of [4]

max_{0 ≤ t′ ≤ t} Us(t′) ≤ Fs(t) ≤ Ns(t)    (40)

This general idea was first recognized by Murchland (see the references in [4]). Thus, Fs(tm) ≈ Ns(tm) is determined by Eq. (38) when Ns(t) is small. The importances of components with respect to Fs(tm) are then about the same as for Ns(tm).
4.5. The number of completed repairs

For the sake of completeness, the probability of the system getting repaired at phase boundary ti (even when no component repair occurs exactly at ti) equals di = Pr(TOPi) − Pr(TOPi ∩ TOPi+1) at ti. The expected total number of completed system repairs (mission repairs) is Vs(tm) = Ns(tm) − Us(tm).

5. Common cause failures

Common cause events are basic events that fail multiple components simultaneously or nearly simultaneously. For example, a cause event Zi,A,B failing components A and B simultaneously in phase i can be modeled explicitly in a system fault tree in parallel (input to the same OR-gate) with the single failure event Zi,A, and also in parallel with the other single failure event Zi,B. Boolean algebra ensures that by so doing the system TOP-events are correctly modeled. In the case of non-repairable components the method described in Section 3 applies as such when failure events Zi,A,B are defined for each phase i with phase-specific failure probabilities zi,A,B, typically according to Eqs. (6) or (7). Usually common cause events can be assumed independent of single failures and of each other. If some of them are mutually exclusive, no fundamental problems arise in the fault tree technique or in calculating importance measures for Zi,A,B through Eqs. (1)–(5), with the sum over the phases as in Eq. (25). The method of Section 4 applies under certain conditions if there are common cause failures repairable during the mission. Statistical independence of a repairable Zi,A,B can hold if repairs of both A and B are completed (simultaneously or consecutively) when a common cause event occurs, and both A and B are then taken back into use simultaneously.
6. Conclusions

It has been shown how many interesting reliability characteristics and importance measures can be defined and solved efficiently for multi-phase missions using standard fault tree techniques and Boolean algebra, without negations or NOT-gates.
This holds for systems with non-repairable components as well as for systems in which some or all components are repairable in some or all phases. In non-repairable systems the total component contribution (criticality) is strictly the sum of the phase-specific criticalities. The importance quantities defined here for repairable systems measure the importance of events and components to the total unavailability integral over a mission. A meaningful risk-increase factor is a weighted average of the time-dependent RIF. New importance measures have also been defined in relation to the failure intensity (failure intensity contribution) and to the expected number of system failures during a mission (failure count criticality importance), taking into account also virtual system failures and discontinuities at phase boundaries. The results are based on exact formulas and analysis. The system and mission unreliability for repairable systems is approximated by the expected number of failures, as there is no general analytical formula even for single-phase repairable systems. Methods to treat explicitly modeled common cause failures were pointed out.
References

[1] Henley EJ, Kumamoto H. Reliability engineering and risk assessment. Englewood Cliffs: Prentice-Hall; 1981.
[2] Burdick GR, et al. Phased mission analysis: a review of new developments and an application. IEEE Transactions on Reliability 1977;R-26(1):43–9.
[3] Ma Y, Trivedi KS. An algorithm for reliability analysis of phased-mission systems. Reliability Engineering and System Safety 1999;66(2):157–70.
[4] Vaurio JK. Fault tree analysis of phased mission systems with repairable and non-repairable components. Reliability Engineering and System Safety 2001;74(2):169–80.
[5] Vaurio JK. Reliability characteristics of components and systems with tolerable repair times. Reliability Engineering and System Safety 1997;56(1):43–52.
[6] Andrews J. Measure of component contribution to the failure of phased missions. In: Guedes Soares C, Zio E, editors. Safety and reliability for managing risk. Leiden: Taylor & Francis/Balkema; 2006. p. 1555–9.
[7] Vaurio JK. Developments in importance measures for risk-informed ranking and other applications. In: Proceedings of the eighth international conference on probabilistic safety assessment and management (PSAM 8), New Orleans, Louisiana, USA, May 2006. New York: ASME Press.
[8] Vaurio JK. Ideas and developments in importance measures and fault tree techniques for reliability and risk analysis. Reliability Engineering and System Safety 2010;95(2):99–107.
[9] Vaurio JK. Definition and quantification of importance measures for multi-phase missions. In: Aven T, Vinnem JE, editors. Risk, reliability and societal safety. Leiden: Taylor & Francis/Balkema; 2007. p. 245–51.