Author's Accepted Manuscript
Probabilistic common cause failures in phased-mission systems
Chaonan Wang, Liudong Xing, Gregory Levitin
www.elsevier.com/locate/ress
PII: S0951-8320(15)00194-5
DOI: http://dx.doi.org/10.1016/j.ress.2015.07.004
Reference: RESS5354
To appear in: Reliability Engineering and System Safety
Received date: 6 November 2014 Revised date: 25 June 2015 Accepted date: 5 July 2015 Cite this article as: Chaonan Wang, Liudong Xing, Gregory Levitin, Probabilistic common cause Failures in phased-mission systems, Reliability Engineering and System Safety, http://dx.doi.org/10.1016/j.ress.2015.07.004 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Probabilistic Common Cause Failures in Phased-Mission Systems
Chaonan Wang a,b, Liudong Xing b,*, Gregory Levitin c
a Shanghai University of Electric Power, Shanghai 200090, China
b University of Massachusetts, Dartmouth, MA 02747, USA
c The Israel Electric Corporation, P. O. Box 10, Haifa 31000, Israel
E-mail:
[email protected],
[email protected],
[email protected] *Corresponding author. Tel.: +1 5089998883; Fax: +1 5089998489.
Abstract
Probabilistic common cause failures (PCCFs) in a system are failures of multiple system components with the same or different probabilities due to a shared root cause or shock. They can contribute greatly to the overall system failure probability. Therefore, it is important to incorporate effects of PCCFs into system reliability analysis. To the best of our knowledge, no research has been done on the reliability analysis of phased-mission systems (PMSs) subject to PCCFs. In this paper, we propose an explicit method and an implicit method to analyze reliability of PMSs with PCCFs caused by external shocks. Both methods are illustrated through detailed analyses of a wireless sensor network example. Both space and computational complexities as well as advantages are discussed and compared for the two proposed methods.
Keywords: probabilistic common cause failure; phased-mission system; reliability; wireless sensor network.
Acronyms
ACP    Application communication phase
BDD    Binary decision diagram
CC     Common cause
CCF    Common cause failure
DCCF   Deterministic common cause failure
ICP    Infrastructure communication phase
PCCE   Probabilistic common cause event
PCCF   Probabilistic common cause failure
PCCG   Probabilistic common cause group
PDO    Phase dependent operation
PMS    Phased-mission system
WSN    Wireless sensor network
Notation
m         Total number of phases in the considered PMS
Li        Number of elementary CCs involved in phase i
L         Total number of CCs occurring in the considered PMS, $L = \sum_{i=1}^{m} L_i$
CCij      An elementary CC event in phase i
PCCGij    A PCCG consisting of all components that can fail due to CCij
CCiX(j)   The jth CC event that can affect component X in phase i
Xij       Conditional failure event of component X caused by CCiX(j)
Xi        Local failure event of component X
XiTF      Total failure event of component X
PCCEk     The kth PCCE
qiX       Conditional local failure probability given that component X has not failed before phase i
qijX      Conditional failure probability of component X given that CCij occurs
QikX      Total conditional failure probability of component X in phase i given that component X has not failed before phase i under PCCEk
UR        System failure probability
1. Introduction
Some shocks, e.g., malicious attacks on sensor networks, viruses in computer systems, or extreme environmental conditions (hurricanes, floods, lightning strikes), can cause failures of multiple components in real-world systems [1-5]. Such failures of multiple system components due to a shared root cause or a common cause (CC) are referred to as common cause failures (CCFs). Many studies have shown that CCFs contribute
greatly to the overall system failure probability [6, 7]. Therefore, it is essential to consider effects of CCFs for accurate reliability modeling and evaluation of critical systems. CCFs can be classified into two types according to their effects on the affected system components: deterministic CCFs (DCCFs) and probabilistic CCFs (PCCFs). A DCCF results in guaranteed or deterministic failures of all components affected by the CC; whereas a PCCF results in failures of different components affected by the CC with different probabilities [8, 9]. Consider, for example a wireless sensor network with sensors deployed in a military field or forest for fire detection. An explosion in a small area would destroy all sensors in this area. Such an explosion can be considered as a deterministic CC because all the affected sensors fail with probability “1”. However, failures of sensors caused by increased humidity are PCCFs because different sensors are resistant to different levels of humidity and thus may fail with different probabilities as the humidity level increases. Missions in many real-world systems, such as aerospace, nuclear power systems, airborne weapon systems and distributed computing systems, involve several different tasks that have to be accomplished in consecutive phases [10-16]. These systems are referred to as multi-phase systems or phased-mission systems (PMSs). Due to different requirements and environmental conditions, system configurations as well as component failure behaviors may vary from phase to phase. A specific, classic example is an aircraft with two engines. A complete flight mission involves taxiing, take-off, ascent, level-flight, descent, and landing phases. In the level-flight phase, only one engine is necessary, while in the take-off phase, both engines are required. Due to enormous stresses on engines in the take-off phase, the engines are more likely to fail as compared to other phases. In addition, the landing gear and its associate
control subsystems are only needed in the take-off and landing phases [17]. There have been numerous research studies on reliability analysis of both single-phase systems subject to DCCFs (e.g., [18-24]) and PMS subject to DCCFs (e.g., [25, 26]). However, very few research studies were on systems with PCCFs. Particularly, a binomial failure rate model was proposed in [27] to address PCCFs. But the model is only applicable to systems with s-identical and s-independent components where failure probabilities caused by the CC are also identical. Reference [9] analyzed reliability of systems with non-identical components where conditional failure events conditioned on the occurrence of a CC have to be s-independent. Reference [28] proposed two methods to analyze reliability of systems subject to s-independent or s-dependent CCs. All these existing research studies on PCCFs have assumed systems with a single-phased mission and cannot handle PMSs because PCCFs in a PMS are more complicated than those occurring in a single-phase system. Specifically, the dynamics in both component and system behaviors mentioned earlier pose unique challenges to reliability analysis of PMSs. Besides, there exist statistical dependencies across phases for a given component, which further complicates reliability analysis of PMSs. In particular, the state of a component at the beginning of a new phase has to be identical to its state at the end of the previous phase in a non-repairable PMS [29]. Moreover, the PMS might be subject to different PCCFs in different phases. Furthermore, a component that has no contribution to the system failure in a specific phase may also be affected by a CC in this phase; effect of the CC on this component should still be considered in latter phases. The existing explicit and implicit methods for handling PCCFs in single-phase systems [28] are not applicable to PMSs with these complicated PCCF behaviors as well as the aforementioned system dynamics
and dependencies of PMSs. Therefore, in this paper we make extensions by proposing new combinatorial methods to analyze reliability of PMSs subject to PCCFs. A case study is performed to illustrate the proposed methods. Computational and space complexity of the proposed methods are also discussed. The remainder of the paper is organized as follows. Section 2 describes the problem to be addressed in this paper. Section 3 presents a preliminary model. Section 4 presents the two proposed methods for the PCCFs analysis in PMSs. Section 5 gives a case study to illustrate applications of the proposed methods. Section 6 discusses and compares complexity of the two methods. Section 7 gives conclusions and future work.
2. Problem statement
The paper considers the problem of evaluating the reliability of PMS subject to PCCFs. The considered PMS can be subject to more than one PCCF due to several elementary CCs occurring in one phase or in multiple different phases. All CCs are external to the system. In other words, PCCFs are only caused by external shocks. Different elementary CCs, whether from the same phase or from different phases, are s-independent. A component's failure event caused by a CC and its local failure event within each phase are s-independent. Components affected by the same elementary CC form a probabilistic common cause group (PCCG). A component can belong to more than one PCCG, that is, a component may be affected by multiple CCs. Let Li denote the number of elementary CCs involved in phase i and m denote the total number of phases in the considered PMS. Thus, $L = \sum_{i=1}^{m} L_i$ is the total number of CCs occurring in the PMS. The elementary CC events occurring in phase i are denoted by $CC_{i1}, \ldots, CC_{iL_i}$. All components that can fail due to CCij constitute PCCGij (i ≤ m, j ≤ Li).
3. Preliminary model
In this section, we review basics of the traditional binary decision diagram (BDD) model for system reliability analysis, which is adapted in Section 4 for use in the proposed methods. Based on Shannon's decomposition, a BDD can be expressed using the if-then-else (ite) format as [30, 31]:
$$f = \mathrm{ite}(x, f_{x=1}, f_{x=0}) = \mathrm{ite}(x, F_1, F_0) = x \cdot F_1 + \bar{x} \cdot F_0 \tag{1}$$
Eq. (1) implies that if x holds (x = 1) then f = F1 = f_{x=1} (f evaluated at x being 1); else (x = 0) f = F0 = f_{x=0} (f evaluated at x being 0). Figure 1 shows the BDD format of this expression, where the right edge is referred to as a 1-edge or then-edge, and the left edge is referred to as a 0-edge or else-edge.
[Figure 1: node x with a 0-edge (left) to F0 and a 1-edge (right) to F1]
Figure 1. BDD encoding the ite format of Eq. (1)
When modeling system failure behavior, there are two sink nodes in the BDD model which are labeled with constants 0 and 1 representing system success and failure, respectively. Each non-sink node representing a system component is labeled with a Boolean variable, and has two outgoing directed edges (Figure 1) representing the failure (1-edge) and success (0-edge) of the corresponding component, respectively.
For BDD generation, the following traditional BDD generation rules are applied. Consider two sub-BDD models for variables representing two different components in the ite format: $G = \mathrm{ite}(x, G_{x=1}, G_{x=0}) = \mathrm{ite}(x, G_1, G_2)$ and $H = \mathrm{ite}(y, H_{y=1}, H_{y=0}) = \mathrm{ite}(y, H_1, H_2)$. The rules for combining these two sub-BDD models into one BDD model are [30]
$$G \Diamond H = \mathrm{ite}(x, G_1, G_2) \Diamond \mathrm{ite}(y, H_1, H_2) = \begin{cases} \mathrm{ite}(x, G_1 \Diamond H_1, G_2 \Diamond H_2) & \mathrm{index}(x) = \mathrm{index}(y) \\ \mathrm{ite}(x, G_1 \Diamond H, G_2 \Diamond H) & \mathrm{index}(x) < \mathrm{index}(y) \\ \mathrm{ite}(y, G \Diamond H_1, G \Diamond H_2) & \mathrm{index}(x) > \mathrm{index}(y) \end{cases} \tag{2}$$
where ◊ represents the logical AND or OR operation, and index() represents the order of a Boolean variable predetermined before the BDD generation. In applying the above rules, orderings of the two root variables (i.e., x for G, y for H) are first compared. If x and y have the same ordering (corresponding to the case where the two variables encode the same component), either of them becomes the root node of the combined BDD and the logic operation is applied to their child nodes. Otherwise, the variable with the smaller order becomes the root node of the combined BDD model and the logic operation is applied to each child of the node with the smaller order and the other sub-BDD model as a whole. These rules are applied to logic operations between sub-BDDs (Gi, Hi) in a recursive manner until one of them becomes constant “0” or “1”, where Boolean algebra rules (1+x=1, 0+x=x, 1·x=x, 0·x=0) are applied.
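The combination rules of Eq. (2) and the path-based evaluation of the resulting BDD can be sketched in Python as follows. This is only a minimal illustration and not the implementation used in the paper: the tuple encoding `(var, high, low)` for ite(var, high, low), the `INDEX` ordering table, the three example variables, and the function names `ite`, `apply_op` and `prob` are all assumptions introduced here.

```python
# Hypothetical variable ordering: a smaller index is closer to the root.
INDEX = {"a": 0, "b": 1, "c": 2}


def ite(var, high, low):
    """Build the node ite(var, high, low); skip the test if both children agree."""
    return high if high == low else (var, high, low)


def apply_op(op, g, h):
    """Combine sub-BDDs g and h with op ('and' or 'or') following Eq. (2)."""
    # Terminal case: one operand is the constant 0 or 1 (Boolean algebra rules).
    if isinstance(g, int) or isinstance(h, int):
        const, other = (g, h) if isinstance(g, int) else (h, g)
        if op == "or":
            return 1 if const == 1 else other   # 1 + x = 1, 0 + x = x
        return other if const == 1 else 0       # 1 . x = x, 0 . x = 0
    x, g1, g2 = g
    y, h1, h2 = h
    if INDEX[x] == INDEX[y]:
        return ite(x, apply_op(op, g1, h1), apply_op(op, g2, h2))
    if INDEX[x] < INDEX[y]:
        return ite(x, apply_op(op, g1, h), apply_op(op, g2, h))
    return ite(y, apply_op(op, g, h1), apply_op(op, g, h2))


def prob(node, p):
    """Probability of reaching sink 1, where p[var] = Pr(var = 1)."""
    if isinstance(node, int):
        return float(node)
    var, high, low = node
    return p[var] * prob(high, p) + (1.0 - p[var]) * prob(low, p)


# Example: f = (a AND b) OR c, with s-independent variables.
a, b, c = ("a", 1, 0), ("b", 1, 0), ("c", 1, 0)
f = apply_op("or", apply_op("and", a, b), c)
p_fail = prob(f, {"a": 0.1, "b": 0.2, "c": 0.3})
```

The sketch omits node sharing and memoization, which a practical BDD package would add, and it assumes each variable is s-independent of the others; variables of the same component in different phases need the phase-dependent treatment described in Section 4.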
4. Proposed methods In this section, we describe two methods, an explicit method and an implicit method, to analyze reliability of a PMS subject to PCCFs. The basic idea of the explicit method is to evaluate an expanded system model where each CC is modeled as a basic event shared by all components affected by this CC. The basic idea of the implicit method is to establish a system model without considering effects of PCCFs and then evaluate the system model including contributions of PCCFs. In general, the
explicit method is more straightforward and easier to follow, whereas the implicit method is more computationally efficient as detailed in Section 6. 4.1. Explicit method The proposed explicit algorithm can be described as the following two-step process: Step 1: establish an expanded PMS fault tree model considering effects of PCCFs. Based on the assumption that failure events caused by CCs and the local failure event of a component in a phase are s-independent, we develop independent pseudo-nodes representing the component failure events caused by CCs in a phase and add them to the original fault tree of the phase to generate an expanded PMS fault tree model. In particular, if component X appears in h PCCGs in phase i, that is, the component can be affected by h CCs (CCiX(1), CCiX(2), …, CCiX(h)) where
$CC_{iX(j)} \in \{CC_{i1}, \ldots, CC_{iL_i}\}$ and 1 ≤ j ≤ h, then h pseudo-nodes (Xi1, Xi2, …, Xih) representing the h conditional failure events caused by the h CCs are added to the original phase i fault tree for component X. Because a component fails if it either suffers a local failure or is affected by a CC, the total failure behavior of the component in phase i can be represented by the following logical expression:
$$X_{iTF} = (CC_{iX(1)} \cap X_{i1}) \cup (CC_{iX(2)} \cap X_{i2}) \cup \cdots \cup (CC_{iX(h)} \cap X_{ih}) \cup X_i \tag{3}$$
where Xi denotes the local failure event of component X. Figure 2 illustrates the corresponding fault tree model representing the failure of component X affected by h CCs in phase i.
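As an illustration of Eq. (3), the marginal probability of the total failure event of one component in one phase can be computed as sketched below. The sketch assumes, as stated in Section 2, that the CC events, the conditional failure events and the local failure event are s-independent; the function name `total_failure_probability` and its argument layout are hypothetical. The example call uses the parameter values that the case study in Section 5 assigns to link e in phase 1.

```python
def total_failure_probability(q_local, cc_terms):
    """Pr(X_iTF) for one component in one phase, following Eq. (3).

    q_local  -- probability of the local failure event X_i
    cc_terms -- list of (p_cc, q_cond) pairs, one per CC affecting the component:
                p_cc is the occurrence probability of CC_iX(j) and
                q_cond is the occurrence probability of X_ij
    """
    survive = 1.0 - q_local
    for p_cc, q_cond in cc_terms:
        # The AND gate (CC_iX(j), X_ij) fires with probability p_cc * q_cond.
        survive *= 1.0 - p_cc * q_cond
    return 1.0 - survive


# Link e in phase 1 of the case study: q_1e = 0.01, p_CC11 = 0.001, q_11e = 0.2.
p_e1 = total_failure_probability(0.01, [(0.001, 0.2)])
```

This marginal value is useful as a sanity check only. Because a CC event is shared by all components in its PCCG, component total failure events are not s-independent of each other, so system-level evaluation has to keep the CC events as shared variables in the expanded model.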
[Figure 2: XiTF is the output of an OR gate whose inputs are the local failure event Xi and h AND gates; the jth AND gate combines CCiX(j) and Xij, j = 1, …, h]
Figure 2. Fault tree model for component total failure event in phase i
The occurrence probability of Xij is a conditional probability that component X fails given that CCiX(j) occurs. Since a PCCG typically includes more than one component in a phase, a CC event may appear more than once in the expanded fault tree model.
Note that it may happen that a component X does not contribute to the system failure in phase i but belongs to PCCGiX(j), that is, the event representing the component local failure does not appear in the fault tree corresponding to phase i but the component may fail due to the occurrence of CCiX(j). In this case, the logical “AND” gate connecting pseudo-node Xij and CCiX(j), which represents the effect of CCiX(j) on component X, should be added to the later phase where the component local failure event first appears in the fault tree. If the component local failure event does not appear in the PMS fault tree after phase i, then the effect of the CC on X is ignored for all phases after phase i since this component's failure has no contribution to the failure of the PMS in all phases after phase i.
After building the fault tree model in the form of Figure 2 for all components appearing in the PCCGs, the expanded fault tree for the entire PMS can be established by replacing each basic component failure event in the original fault tree with the new total component failure event represented using Figure 2.
Step 2: evaluate the expanded PMS fault tree.
In this paper, the BDD-based method for the reliability analysis of PMSs [29] is adapted to evaluate the expanded fault tree to obtain reliability of PMS subject to PCCFs. Major steps of the adapted BDD-based method are summarized as follows:
1) Order input component state variables: for variables representing different components, a heuristic ordering algorithm [32] can be used; for variables representing the same component but in different phases, forward (the variable order is the same as the phase order) and backward (the variable order is the reverse of the phase order) can be used. The backward method is adopted in this paper since it can generate BDDs with smaller size than the forward method [29].
2) Generate a BDD for each single phase. Note that for traditional PMS analysis, only manipulation rules of (2) described in Section 3 are needed to generate a single-phase BDD because only variables representing different components would appear in each phase. However, as discussed in step 1 of the proposed explicit method, in some phases of the expanded fault tree, different variables representing the same component but different phases may appear within the same phase. In such cases, special Phase Dependent Operation (PDO) rules must be applied to consider dependencies between these variables belonging to the same component.
PDO Rules: Consider two sub-BDD models for variables representing the same component but in different phases i and j (i < j): $G = \mathrm{ite}(x_i, G_{x_i=1}, G_{x_i=0}) = \mathrm{ite}(x_i, G_1, G_2)$ and $H = \mathrm{ite}(x_j, H_{x_j=1}, H_{x_j=0}) = \mathrm{ite}(x_j, H_1, H_2)$. The PDO rule for the backward ordering method is [29]
$$G \Diamond H = \mathrm{ite}(x_i, G_1, G_2) \Diamond \mathrm{ite}(x_j, H_1, H_2) = \mathrm{ite}(x_j, G \Diamond H_1, G_2 \Diamond H_2) \tag{4}$$
3) Generate a BDD for the entire PMS by combining BDDs obtained in step 2. For two variables representing different components, manipulation rules of (2) are applied; for two variables representing the same component but in different phases, the PDO rules of (4) are applied.
4) Calculate the system unreliability by evaluating the PMS BDD generated in step 3. Each path in the PMS BDD from the root node to sink node “1” represents a disjoint combination of component failures and non-failures in different phases that can result in the entire mission failure. Therefore, the PMS unreliability can be calculated as a sum of probabilities for all the paths from the root node to sink node “1”. Note that for paths involving variables that represent the same component but in different phases, a special evaluation method (Eq. (11) in [29]) should be applied to handle statistical dependencies among these variables.
4.2. Implicit method
The implicit algorithm can be described as the following five-step process:
Step 1: construct an event space that involves all combinations of occurrence and non-occurrence of elementary CCs and then evaluate occurrence probability of each event.
Given L elementary CCs that may happen in a PMS with m phases, an event space with $2^L$ disjoint events is constructed. Each event, called a probabilistic common cause event (PCCE), is a combination of occurrence and non-occurrence of the L CCs. The $2^L$ PCCEs are:
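$$\begin{aligned}
PCCE_1 &= \overline{CC_{11}} \cap \cdots \cap \overline{CC_{1L_1}} \cap \cdots \cap \overline{CC_{m1}} \cap \cdots \cap \overline{CC_{mL_m}} \\
PCCE_2 &= CC_{11} \cap \cdots \cap \overline{CC_{1L_1}} \cap \cdots \cap \overline{CC_{m1}} \cap \cdots \cap \overline{CC_{mL_m}} \\
&\;\;\vdots \\
PCCE_{2^L} &= CC_{11} \cap \cdots \cap CC_{1L_1} \cap \cdots \cap CC_{m1} \cap \cdots \cap CC_{mL_m}
\end{aligned}$$
Let Pr(PCCEk) denote the occurrence probability of PCCEk, then $\sum_{k=1}^{2^L} \Pr(PCCE_k) = 1$.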
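Step 1 can be sketched in Python as below, assuming s-independent elementary CCs as stated in Section 2. The function name `enumerate_pcces` and the dictionary layout are hypothetical, and the enumeration order produced by `itertools.product` is not meant to reproduce the PCCE numbering used in the paper; each event is therefore identified by the set of CCs that occur in it.

```python
from itertools import product


def enumerate_pcces(p_cc):
    """Return the 2^L disjoint PCCEs as (set of occurring CCs, probability).

    p_cc -- dict mapping each elementary CC name to its occurrence probability;
            the CCs are assumed to be s-independent.
    """
    names = list(p_cc)
    events = []
    for pattern in product([False, True], repeat=len(names)):
        prob = 1.0
        for name, occurs in zip(names, pattern):
            prob *= p_cc[name] if occurs else 1.0 - p_cc[name]
        occurred = frozenset(n for n, o in zip(names, pattern) if o)
        events.append((occurred, prob))
    return events


# Two CCs with the occurrence probabilities used in the case study of Section 5.
events = dict(enumerate_pcces({"CC11": 0.001, "CC21": 0.001}))
```

Because the event space grows as 2^L, this direct enumeration is practical only for a moderate number of CCs.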
Step 2: evaluate the total conditional failure probability for components subject to PCCFs under each PCCE in each phase.
Let qiX be the conditional local failure probability given that component X has not failed before phase i and qijX be the conditional failure probability of component X given that CCiX(j) occurs. If component X can be affected by $h_k$ CCs ($CC_{iX(1)}, CC_{iX(2)}, \ldots, CC_{iX(h_k)}$) in phase i under PCCEk, the total conditional failure probability of component X in phase i given that component X has not failed before phase i is
$$Q_{ikX} = 1 - (1 - q_{iX}) \prod_{j=1}^{h_k} (1 - q_{ijX}) \tag{5}$$
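Eq. (5) translates directly into a few lines of Python. The function name `total_conditional_failure` is hypothetical; the example values are those of links e and l in the case study of Section 5.

```python
from math import prod


def total_conditional_failure(q_local, q_cond_occurred):
    """Q_ikX of Eq. (5).

    q_local         -- q_iX, conditional local failure probability in phase i
    q_cond_occurred -- iterable of q_ijX for the CCs of phase i that occur
                       under PCCE_k and affect component X (may be empty)
    """
    return 1.0 - (1.0 - q_local) * prod(1.0 - q for q in q_cond_occurred)


q_12e = total_conditional_failure(0.01, [0.2])   # link e, phase 1, CC11 occurs
q_23l = total_conditional_failure(0.02, [0.7])   # link l, phase 2, CC21 occurs
q_11e = total_conditional_failure(0.01, [])      # no CC occurs: reduces to q_iX
```

When no CC affecting the component occurs under the PCCE, the product is empty and the result reduces to the local failure probability, as expected.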
Step 3: establish reliability model of the PMS without considering effects of PCCFs.
In this step, a BDD model is built based on the original PMS fault tree model without considering effects of the PCCFs.
Step 4: evaluate the PMS BDD model using the total conditional failure probabilities under each PCCE.
Let Pr(PMS fails | PCCEk) be the conditional system failure probability given that PCCEk occurs. It is computed by evaluating the BDD model established in Step 3 using the component total conditional failure probabilities obtained in Step 2.
Step 5: evaluate PMS reliability using total probability law.
The final system failure probability considering effects of PCCFs is
$$UR = \Pr(\text{PMS fails}) = \sum_{k=1}^{2^L} \Pr(\text{PMS fails} \mid PCCE_k) \Pr(PCCE_k).$$
5. Case study In this section, we analyze communication reliability of a wireless sensor network (WSN) system in Figure 3 as a case study to illustrate the two methods presented in Section 4.
[Figure 3: WSN topology with base station s, destination sensor t, and links labeled a through n]
Figure 3. An illustrative WSN example
There are two communication paradigms within WSN: application
communication and infrastructure communication [33]. Infrastructure communication relates to delivery of configuration and maintenance messages (e.g. network set-up, query, path discovery, and policies) from the base station to sensor nodes; application communication relates to delivery of sensed data from sensor nodes to the base station. In this case study, we consider a two-phase communication: the first phase is an infrastructure communication phase (ICP) where the base station (node s) sends message to a destination sensor (node t); the second phase is an application communication phase (ACP) where the destination sensor node sends data to the base station. The two-phase communication mission succeeds only if both ICP and ACP succeed. In the ICP, there are two paths from the base station to the destination sensor node: 1) path11: c → f → h → j; 2) path12: c → f → h → k → m. In the ACP, there are two paths from the destination sensor node to the base station: 1) path21: l → e → b; 2) path22: l → i → g → d → b. We assume that the nodes are perfect and only the links can fail during the mission. There are two s-independent external CCs: CC11 in phase 1 and CC21 in phase 2. The corresponding PCCGs are: PCCG11 = {e, h, i} and PCCG21 = {e, j, l}. The following parameter values are used in the analysis: 1. Local conditional failure probabilities of links: q1X = 0.01, q2X = 0.02,
X ∈ {a, b, …, n}.
2. Occurrence probabilities of CC: pCC11 = pCC21 = 0.001.
3. Conditional link failure probabilities given the occurrence of a CC: q11e = 0.2, q11h = 0.5, q11i = 0.3, q21e = 0.3, q21j = 0.5, q21l = 0.7.
Figure 4 illustrates the fault tree model describing the failure of the example PMS.
[Figure 4: PMS Failure is the OR of ICP Failure and ACP Failure. ICP Failure is the AND of Path11 Failure = OR(c1, f1, h1, j1) and Path12 Failure = OR(c1, f1, h1, k1, m1). ACP Failure is the AND of Path21 Failure = OR(l2, e2, b2) and Path22 Failure = OR(l2, i2, g2, d2, b2).]
Figure 4. Fault tree model for the PMS example
5.1. Explicit method
Step 1: establish an expanded fault tree including effects of PCCFs based on the fault tree in Figure 4.
Nodes representing local failure events of e1, h1, i1 in phase 1 and e2, j2, l2 in phase 2 are replaced with sub-fault tree models as shown in Figure 2, which represents the total failure event considering effects of PCCFs for a component. Although e1 and i1 do not appear in the phase 1 (i.e., ICP) sub-fault tree, failures of links e and i in phase 1 still contribute to the phase 2 (i.e., ACP) failure. Thus, the effect of CC11 on e1 and i1 is still considered by adding logical “AND” gates to
sub-fault tree under the ACP failure. The effect of CC21 on j does not need to be considered at all since the failure of link j has no contribution to the PMS failure in the ACP phase, which is the last phase. The expanded fault tree is shown in Figure 5.
[Figure 5: the fault tree of Figure 4 in which h1 is replaced by OR(h1, AND(CC11, h11)), e2 by OR(e2, AND(CC21, e21), AND(CC11, e11)), l2 by OR(l2, AND(CC21, l21)), and i2 by OR(i2, AND(CC11, i11))]
Figure 5. Expanded fault tree considering the effects of PCCFs
Step 2: evaluate the expanded fault tree in Figure 5.
In this paper, the BDD-based method for PMS is applied to evaluate the expanded fault tree. The PMS BDD model for the fault tree in Figure 5 is shown in Figure 6.
[Figure 6: PMS BDD over the variables b2, l2, c1, f1, h1, g2, d2, i2, e2, CC21, l21, e21, CC11, e11, i11, h11, j1, k1, m1, with sink nodes 0 and 1]
Figure 6. BDD model based on the expanded fault tree in Figure 5
By evaluating the BDD model in Figure 6, we obtain the communication unreliability of the example WSN system as 0.09034708.
5.2. Implicit method
Step 1: construct an event space that involves all combinations of occurrence and non-occurrence of the two CCs and then evaluate occurrence probability of each combination.
Since there are two CCs, the event space consists of $2^2 = 4$ PCCEs, which are
$PCCE_1 = \overline{CC_{11}} \cap \overline{CC_{21}}$; $PCCE_2 = CC_{11} \cap \overline{CC_{21}}$; $PCCE_3 = \overline{CC_{11}} \cap CC_{21}$; $PCCE_4 = CC_{11} \cap CC_{21}$.
Because the two CCs are s-independent, occurrence probabilities of the 4 events are
$\Pr(PCCE_1) = (1 - p_{CC11})(1 - p_{CC21}) = 0.998001$;
$\Pr(PCCE_2) = p_{CC11}(1 - p_{CC21}) = 0.000999$;
$\Pr(PCCE_3) = (1 - p_{CC11})\,p_{CC21} = 0.000999$;
$\Pr(PCCE_4) = p_{CC11}\,p_{CC21} = 0.000001$.
Step 2: evaluate total conditional failure probabilities for components subject to PCCFs under each PCCE in each phase. PCCE1 is an event that no CC happens at all. Therefore, no components are subject to PCCF under PCCE1. PCCE2 is an event that only CC11 happens. Under this event, components e, h, and i may fail in phase 1 due to occurrence of CC11. For example, based on Eq. (5), the total conditional failure probability for component e is
$Q_{12e} = 1 - (1 - q_{1e})(1 - q_{11e}) = 0.208$.
Similarly, we can obtain the total conditional failure probabilities for all other components subject to PCCFs under each PCCE in each phase, which are listed in Table 1.
Table 1. Total conditional failure probabilities under each PCCE
         e1      h1      i1      e2      l2
PCCE1    -       -       -       -       -
PCCE2    0.208   0.505   0.307   -       -
PCCE3    -       -       -       0.314   0.706
PCCE4    0.208   0.505   0.307   0.314   0.706
Step 3: build a BDD model of the PMS in Figure 4 without considering effects of PCCFs.
The fault tree model of the example PMS without considering effects of PCCFs is shown in Figure 4. The corresponding PMS BDD model is shown in Figure 7.
[Figure 7: PMS BDD over the variables b2, l2, c1, f1, h1, g2, d2, i2, e2, j1, k1, m1, with sink nodes 0 and 1]
Figure 7. BDD model for the example PMS without considering effects of PCCF
Step 4: evaluate the BDD model in Figure 7.
Using the total conditional failure probabilities in Table 1, we can calculate the conditional failure probability under each PCCE by evaluating the BDD model in Figure 7.
The conditional PMS failure probabilities are
Pr(PMS fails | PCCE1) = 0.08921187; Pr(PMS fails | PCCE2) = 0.5802923;
Pr(PMS fails | PCCE3) = 0.7336815; Pr(PMS fails | PCCE4) = 0.88559284.
Step 5: evaluate the final PMS failure probability using total probability law as
$$UR = \sum_{i=1}^{4} \Pr(\text{PMS fails} \mid PCCE_i) \Pr(PCCE_i) = 0.09034708$$
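The five steps of the implicit method can be checked numerically for this example with the Python sketch below. It is not the BDD evaluation used in the paper: because the ICP and ACP paths use disjoint sets of links and the two paths of each phase share only a few links, the BDD of Figure 7 is replaced here by an equivalent closed-form expression. The names `Q_LOCAL`, `P_CC`, `Q_COND`, `survive`, `parallel_paths`, `pms_failure_given` and `unreliability` are hypothetical.

```python
from itertools import product

Q_LOCAL = {1: 0.01, 2: 0.02}                 # q_1X and q_2X for every link X
P_CC = {"CC11": 0.001, "CC21": 0.001}        # CC occurrence probabilities
# For each CC: (phase in which it occurs, conditional failure probability per link)
Q_COND = {
    "CC11": (1, {"e": 0.2, "h": 0.5, "i": 0.3}),
    "CC21": (2, {"e": 0.3, "j": 0.5, "l": 0.7}),
}


def survive(link, last_phase, occurred):
    """Probability that a non-repairable link works through phases 1..last_phase."""
    r = 1.0
    for phase in range(1, last_phase + 1):
        r_phase = 1.0 - Q_LOCAL[phase]
        for cc in occurred:
            cc_phase, q = Q_COND[cc]
            if cc_phase == phase:
                r_phase *= 1.0 - q.get(link, 0.0)
        r *= r_phase                          # r_phase equals 1 - Q_ikX of Eq. (5)
    return r


def parallel_paths(common, branch_a, branch_b, last_phase, occurred):
    """Success probability of two paths that share only the links in `common`."""
    def up(links):
        p = 1.0
        for link in links:
            p *= survive(link, last_phase, occurred)
        return p
    return up(common) * (1.0 - (1.0 - up(branch_a)) * (1.0 - up(branch_b)))


def pms_failure_given(occurred):
    """Pr(PMS fails | PCCE), where `occurred` lists the CCs that occur."""
    icp = parallel_paths("cfh", "j", "km", 1, occurred)    # path11, path12
    acp = parallel_paths("lb", "e", "igd", 2, occurred)    # path21, path22
    return 1.0 - icp * acp


def unreliability():
    """Total probability law over the 2^L PCCEs (Step 5)."""
    names = list(P_CC)
    total = 0.0
    for pattern in product([False, True], repeat=len(names)):
        occurred = [n for n, o in zip(names, pattern) if o]
        p_event = 1.0
        for n, o in zip(names, pattern):
            p_event *= P_CC[n] if o else 1.0 - P_CC[n]
        total += p_event * pms_failure_given(occurred)
    return total
```

Calling `unreliability()` should agree with the value of UR reported above to the printed precision. Changing the entries of `P_CC` and `Q_COND` gives a quick way to explore other parameter sets, provided the path structure stays the same.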
The result obtained by the implicit method matches exactly the result obtained using the explicit method.
5.3. Effects of PCCFs
To show impacts of PCCFs on system reliability performance, five different sets of conditional link failure probabilities, which respectively represent no CCF occurring (Set 1 in Table 2), low (Set 2), medium (Set 3), and high (Set 4) conditional failure probabilities, and DCCF scenarios (Set 5), are studied for the example WSN system. We also study effects of occurrence probabilities of CCs on system reliability by changing values of pCC11, pCC21 under these five sets of parameters. The last four rows of Table 2 show failure probabilities of the example PMS. The results are also presented graphically in Figure 8.
Table 2. PMS failure probability results
                                  Set 1        Set 2        Set 3        Set 4        Set 5
q11e                              0            0.2          0.5          0.8          1
q11h                              0            0.2          0.5          0.8          1
q11i                              0            0.2          0.5          0.8          1
q21e                              0            0.2          0.5          0.8          1
q21j                              0            0.2          0.5          0.8          1
q21l                              0            0.2          0.5          0.8          1
UR for pCC11 = pCC21 = 0.001      0.08921187   0.08963062   0.09026816   0.09080058   0.09103254
UR for pCC11 = pCC21 = 0.1        0.08921187   0.13076253   0.19206757   0.24128704   0.26226162
UR for pCC11 = pCC21 = 0.5        0.08921187   0.29042071   0.54746887   0.71230026   0.77230297
UR for pCC11 = pCC21 = 1          0.08921187   0.47526844   0.86567213   0.99217003   1
[Figure 8: PMS failure probability (UR), from 0 to 1, plotted against Set 1 to Set 5, with one curve for each of pCC11 = pCC21 = 0.001, 0.1, 0.5, and 1]
Figure 8. Impacts of PCCFs
From results presented in Table 2 and Figure 8, we can see that as the conditional link failure probabilities increase (changing from Set 1 to Set 5), the PMS failure probability increases monotonically, and the degree of increase depends on occurrence probabilities of CCs, which determine the overall contribution of PCCFs to the entire system. Specifically, when occurrence probabilities of CCs are small (pCC11 = pCC21 = 0.001, corresponding to the dashed line), the PMS failure probability increases very slightly as the conditional link failure probabilities increase. However, for the other extreme case (pCC11 = pCC21 = 1, corresponding to the solid line), a small change in conditional link failure probabilities would cause a large change in the PMS failure
probability. In this case, when all the conditional link failure probabilities are 1 (Set 5), all affected components suffer deterministic failures, which crashes the entire example system since there are no redundant paths available. Note that Set 1 represents a case that no components are really affected by occurrence of CCs. PMS failure probabilities under this case are the same although the occurrence probabilities for CCs vary.
6. Complexity analysis
In this section, we analyze both space and computational complexity of the two proposed methods.
6.1. Space complexity
Consider a mission with m phases and n components. In the worst case, all the n components contribute to the mission failure in every phase. For the explicit method, if there are x CCs and all n components are affected in the worst case by each CC, then n*x pseudo variables representing component failures due to effects of PCCFs and x nodes representing occurrences of CCs are included for generating the BDD model. Therefore, in the worst case, there are m*n+(n+1)*x input variables for generating the final BDD. The complexity of the worst case size of BDD is $O(2^N/N)$, where N is the number of variables [34]. Thus the complexity of the worst case size of BDD in the explicit method is $O(2^{m*n+(n+1)*x}/(m*n+(n+1)*x))$. For the implicit method, the effects of PCCFs are not included in the PMS BDD model. Therefore, in the worst case, the number of input variables is m*n. Thus the complexity of the worst case size of BDD in the implicit method is $O(2^{m*n}/(m*n))$. According to the above discussion, we can conclude that the explicit method has higher space complexity than the implicit method.
6.2. Computational complexity
The computational complexity of the BDD evaluation algorithm via the bottom-up evaluation approach based on the memoization technique is O(M), where M is the number of nodes in the BDD model [35]. For the explicit method, since the BDD model generated for the expanded fault tree is evaluated only once, the computational complexity for the explicit method is simply $O(2^{m*n+(n+1)*x}/(m*n+(n+1)*x))$ based on the worst case size of BDD discussed in Section 6.1. For the implicit method, since the BDD model needs to be evaluated $2^x$ times with different parameters given there are x CCs, the computational complexity of the implicit method is $2^x \cdot O(2^{m*n}/(m*n)) = O(2^{m*n+x}/(m*n))$. According to the above discussion, we can conclude that the explicit method is computationally less efficient than the implicit method.
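To get a feel for the gap between the two worst-case bounds, they can be compared on a log2 scale as sketched below. The function names are hypothetical, the constants hidden by the O-notation are ignored, and the values of m, n and x in the example are arbitrary, so the numbers indicate growth rates only and not actual BDD sizes or running times.

```python
from math import log2


def explicit_bound_log2(m, n, x):
    """log2 of 2^N / N with N = m*n + (n+1)*x (explicit method, space and time)."""
    big_n = m * n + (n + 1) * x
    return big_n - log2(big_n)


def implicit_space_bound_log2(m, n):
    """log2 of 2^(m*n) / (m*n) (implicit method, BDD size)."""
    return m * n - log2(m * n)


def implicit_time_bound_log2(m, n, x):
    """log2 of 2^x * 2^(m*n) / (m*n) (implicit method, 2^x BDD evaluations)."""
    return x + implicit_space_bound_log2(m, n)


# Arbitrary example: 2 phases, 12 components, 2 CCs.
gap_bits = explicit_bound_log2(2, 12, 2) - implicit_time_bound_log2(2, 12, 2)
```

Since the explicit exponent grows by (n+1) for every additional CC while the implicit one grows by 1, the gap widens quickly with the number of components in the worst case.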
7. Conclusions and future work In this paper, we propose an explicit method and an implicit method for reliability analysis of PMSs subject to s-independent PCCFs. Both methods are applicable to any arbitrary types of time-to-failure distributions for system components. The explicit method is more straightforward to follow (involving only two steps) but less efficient (in both time and space) than the implicit method. In the future, we will study PCCFs in dynamic PMSs, s-dependent PCCFs, as well as PCCFs with cascading effects.
Acknowledgment

This work was supported in part by the US National Science Foundation under Grant No. 1112947.
References

[1] W. Li and H. Pham, “Reliability modeling of multi-state degraded systems with multi-competing failures and random shocks”, IEEE Transactions on Reliability, vol. 54, no. 2, pp. 297-303, Jun. 2005.
[2] L. Xing, L. Meshkat and S. Donohue, “Reliability analysis of hierarchical computer-based systems subject to common-cause failures”, Reliability Engineering & System Safety, vol. 92, no. 6, pp. 351-359, Mar. 2007.
[3] Z. Tang, H. Xu and J. B. Dugan, “Reliability analysis of phased mission systems with common cause failures”, Proc. of the Annual Reliability and Maintainability Symposium, pp. 313-318, 24-27 Jan. 2005.
[4] G. Levitin, L. Xing, and S. Yu, “Optimal connecting elements allocation in linear consecutively-connected systems with phased mission and common cause failures”, Reliability Engineering & System Safety, vol. 130, pp. 85-94, 2014.
[5] M. C. M. Troffaes, G. Walter, and D. Kelly, “A robust Bayesian approach to modeling epistemic uncertainty in common-cause failure models”, Reliability Engineering & System Safety, vol. 125, pp. 13-21, 2014.
[6] S. Mitra, N. R. Saxena and E. J. McCluskey, “Common-mode failures in redundant VLSI systems: a survey”, IEEE Transactions on Reliability, vol. 49, no. 3, pp. 285-295, Sep. 2000.
[7] J. Borcsok, S. Kassel and E. Ugljesa, “Estimation and evaluation of common cause failures”, Proceedings of Second International Conference on Systems, Sainte-Luce, Martinique, France, 22-28 Apr. 2007.
[8] L. Xing and W. Wang, “Probabilistic common-cause failures analysis”, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 354-358, Las Vegas, Nevada, 28-31 Jan. 2008.
[9] L. Xing, P. Boddu, Y. Sun and W. Wang, “Reliability analysis of static and dynamic fault-tolerant systems subject to probabilistic common-cause failures”, Proc. IMechE, Part O: Journal of Risk and Reliability, vol. 224, no. 1, pp. 43-53, 2010.
[10] A. K. Somani, J. A. Ritcey, and S. H. L. Au, “Computationally efficient phased-mission reliability analysis for systems with variable configurations”, IEEE Transactions on Reliability, vol. 41, no. 4, pp. 504-511, 1992.
[11] A. Pedar and V. V. S. Sarma, “Phased-mission analysis for evaluating the effectiveness of aerospace computing-systems”, IEEE Transactions on Reliability, vol. R-30, no. 5, pp. 429-437, 1981.
[12] H. S. Winokur Jr. and L. J. Goldstein, “Analysis of mission-oriented systems”, IEEE Transactions on Reliability, vol. R-18, no. 4, pp. 144-148, 1969.
[13] J. L. Bricker, “A unified method for analyzing mission reliability for fault tolerant computer systems”, IEEE Transactions on Reliability, vol. R-22, no. 2, pp. 72-77, 1973.
[14] L. Xing, “Reliability importance analysis of generalized phased-mission systems”, International Journal of Performability Engineering, vol. 3, no. 3, pp. 303-318, 2007.
[15] J. Lu and X. Wu, “Reliability evaluation of generalized phased-mission systems with repairable components”, Reliability Engineering & System Safety, vol. 121, pp. 136-145, 2014.
[16] R. Peng, Q. Zhai, L. Xing, and J. Yang, “Reliability of demand-based phased-mission systems subject to fault level coverage”, Reliability Engineering & System Safety, vol. 121, pp. 18-25, 2014.
[17] L. Xing, S. V. Amari, and C. Wang, “Reliability of k-out-of-n systems with phased-mission requirements and imperfect fault coverage”, Reliability Engineering & System Safety, vol. 103, pp. 45-50, 2012.
[18] Y. Dai, M. Xie, K. L. Poh and S. H. Ng, “A model for correlated failures in N-version programming”, IIE Transactions, vol. 36, no. 12, pp. 1183-1192, 2004.
[19] J. K. Vaurio, “Fault tree analysis of phased mission systems with repairable and non-repairable components”, Reliability Engineering & System Safety, vol. 74, no. 2, pp. 169-180, Nov. 2001.
[20] K. N. Fleming and A. Mosleh, “Common-cause data analysis and implications in system modeling”, Proceedings of the International Topical Meeting on Probabilistic safety methods and applications, San Francisco, California, 1985, vol. 1: 3/1-3/12, EPRI NP-39129-SR.
[21] J. K. Vaurio, “An implicit method for incorporating common-cause failures in system analysis”, IEEE Transactions on Reliability, vol. 47, no. 2, pp. 173-180, Jun. 1998.
[22] Z. Tang and J. B. Dugan, “An integrated method for incorporating common cause failures in system analysis”, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 610-614, Las Vegas, Nevada, 26-29 Jan. 2004.
[23] L. Xing, “Reliability evaluation of phased-mission systems with imperfect fault coverage and common-cause failures”, IEEE Transactions on Reliability, vol. 56, no. 1, pp. 58-68, Mar. 2007.
[24] L. Xing, A. Shrestha, L. Meshkat and W. Wang, “Incorporating common-cause failures into the modular hierarchical systems analysis”, IEEE Transactions on Reliability, vol. 58, no. 1, pp. 10-19, Mar. 2009.
[25] G. Levitin, L. Xing, S. Amari, and Y. Dai, “Reliability of non-repairable phased-mission systems with common cause failures”, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 43, no. 4, pp. 967-978, 2013.
[26] G. Levitin, L. Xing, S. Amari, and Y. Dai, “Reliability of non-repairable phased-mission systems with propagated failures”, Reliability Engineering & System Safety, vol. 119, pp. 218-228, 2013.
[27] K. C. Chae, “System reliability using binomial failure rate”, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 136-138, Los Angeles, California, 26-28 Jan. 1988.
[28] C. Wang, L. Xing, and G. Levitin, “Explicit and implicit methods for probabilistic common-cause failure analysis”, Reliability Engineering & System Safety, vol. 131, pp. 175-184, Nov. 2014.
[29] X. Zang, H. Sun, and K. S. Trivedi, “A BDD-based algorithm for reliability analysis of phased-mission systems”, IEEE Transactions on Reliability, vol. 48, no. 1, pp. 50-60, 1999.
[30] L. Xing and S. V. Amari, Binary Decision Diagrams and Extensions for System Reliability Analysis, Wiley-Scrivener, MA, ISBN: 978-1-118-54937-7, 2015.
[31] S. Li, S. Si, H. Dui, Z. Cai, and S. Sun, “A novel decision diagrams extension method”, Reliability Engineering & System Safety, vol. 126, pp. 107-115, Jun. 2014.
[32] M. Bouissou, F. Bruyere, and A. Rauzy, “BDD based fault-tree processing: a comparison of variable ordering heuristics”, Proceedings of ESREL Conference 1997.
[33] C. Wang, L. Xing, V. M. Vokkarane, and Y. Sun, “A phased-mission framework for communication reliability in WSN”, Proceedings of the Annual Reliability and Maintainability Symposium (RAMS), Colorado Springs, CO, 27-30 Jan. 2014.
[34] H. Liaw and C. Lin, “On the OBDD-representation of general Boolean functions”, IEEE Transactions on Computers, vol. 41, no. 6, pp. 661-664, 1992.
[35] A. Shrestha, L. Xing, and Y. Dai, “Decision diagram based methods and complexity analysis for multi-state systems”, IEEE Transactions on Reliability, vol. 59, no. 1, pp. 145-161, 2010.
Highlights:
► Probabilistic common cause failures (PCCFs) in phased-mission systems (PMSs) are analyzed.
► Two combinatorial methods are proposed for reliability analysis of PMSs subject to PCCFs.
► Space and computational complexities are compared for the two proposed methods.