Probabilistic common cause failures in phased-mission systems


Author's Accepted Manuscript

Chaonan Wang, Liudong Xing, Gregory Levitin

www.elsevier.com/locate/ress

PII: S0951-8320(15)00194-5
DOI: http://dx.doi.org/10.1016/j.ress.2015.07.004
Reference: RESS5354

To appear in: Reliability Engineering and System Safety

Received date: 6 November 2014
Revised date: 25 June 2015
Accepted date: 5 July 2015

Cite this article as: Chaonan Wang, Liudong Xing, Gregory Levitin, Probabilistic common cause failures in phased-mission systems, Reliability Engineering and System Safety, http://dx.doi.org/10.1016/j.ress.2015.07.004

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Probabilistic Common Cause Failures in Phased-Mission Systems

Chaonan Wang (a, b), Liudong Xing (b, *), Gregory Levitin (c)

(a) Shanghai University of Electric Power, Shanghai 200090, China
(b) University of Massachusetts, Dartmouth, MA 02747, USA
(c) The Israel Electric Corporation, P. O. Box 10, Haifa 31000, Israel

E-mail: [email protected], [email protected], [email protected]
*Corresponding author. Tel.: +1 5089998883; Fax: +1 5089998489.

Abstract

Probabilistic common cause failures (PCCFs) in a system are failures of multiple system components with the same or different probabilities due to a shared root cause or shock. They can contribute greatly to the overall system failure probability. Therefore, it is important to incorporate the effects of PCCFs into system reliability analysis. To the best of our knowledge, no research has been done on the reliability analysis of phased-mission systems (PMSs) subject to PCCFs. In this paper, we propose an explicit method and an implicit method to analyze the reliability of PMSs with PCCFs caused by external shocks. Both methods are illustrated through detailed analyses of a wireless sensor network example. The space and computational complexities of the two proposed methods, as well as their advantages, are discussed and compared.

Keywords: probabilistic common cause failure; phased-mission system; reliability; wireless sensor network.

Acronyms

ACP: Application communication phase
BDD: Binary decision diagram
CC: Common cause
CCF: Common cause failure
DCCF: Deterministic common cause failure
ICP: Infrastructure communication phase
PCCE: Probabilistic common cause event
PCCF: Probabilistic common cause failure
PCCG: Probabilistic common cause group
PDO: Phase dependent operation
PMS: Phased-mission system
WSN: Wireless sensor network

Notation

$m$: Total number of phases in the considered PMS
$L_i$: Number of elementary CCs involved in phase $i$
$L$: Total number of CCs occurring in the considered PMS, $L = \sum_{i=1}^{m} L_i$
$CC_{ij}$: An elementary CC event in phase $i$
$PCCG_{ij}$: A PCCG consisting of all components that can fail due to $CC_{ij}$
$CC_{iX(j)}$: The $j$th CC event that can affect component $X$ in phase $i$
$X_{ij}$: Conditional failure event of component $X$ caused by $CC_{iX(j)}$
$X_i$: Local failure event of component $X$
$X_{iTF}$: Total failure event of component $X$
$PCCE_k$: The $k$th PCCE
$q_{iX}$: Conditional local failure probability given that component $X$ has not failed before phase $i$
$q_{ijX}$: Conditional failure probability of component $X$ given that $CC_{ij}$ occurs
$Q_{ikX}$: Total conditional failure probability of component $X$ in phase $i$ given that component $X$ has not failed before phase $i$ under $PCCE_k$
$UR$: System failure probability

1. Introduction

Some shocks, e.g., malicious attacks on sensor networks, viruses in computer systems, or extreme environmental conditions (hurricanes, floods, lightning strikes), can cause failures of multiple components in real-world systems [1-5]. Such failures of multiple system components due to a shared root cause or a common cause (CC) are referred to as common cause failures (CCFs). Many studies have shown that CCFs contribute greatly to the overall system failure probability [6, 7]. Therefore, it is essential to consider the effects of CCFs for accurate reliability modeling and evaluation of critical systems.

CCFs can be classified into two types according to their effects on the affected system components: deterministic CCFs (DCCFs) and probabilistic CCFs (PCCFs). A DCCF results in guaranteed or deterministic failures of all components affected by the CC, whereas a PCCF results in failures of different components affected by the CC with different probabilities [8, 9]. Consider, for example, a wireless sensor network with sensors deployed in a military field or in a forest for fire detection. An explosion in a small area would destroy all sensors in this area. Such an explosion can be considered a deterministic CC because all the affected sensors fail with probability 1. However, failures of sensors caused by increased humidity are PCCFs because different sensors are resistant to different levels of humidity and thus may fail with different probabilities as the humidity level increases.

Missions in many real-world systems, such as aerospace systems, nuclear power systems, airborne weapon systems and distributed computing systems, involve several different tasks that have to be accomplished in consecutive phases [10-16]. These systems are referred to as multi-phase systems or phased-mission systems (PMSs). Due to different requirements and environmental conditions, system configurations as well as component failure behaviors may vary from phase to phase. A classic example is an aircraft with two engines. A complete flight mission involves taxiing, take-off, ascent, level-flight, descent, and landing phases. In the level-flight phase, only one engine is necessary, while in the take-off phase, both engines are required. Due to the enormous stresses on the engines in the take-off phase, the engines are more likely to fail in this phase than in other phases. In addition, the landing gear and its associated control subsystems are only needed in the take-off and landing phases [17].

There have been numerous research studies on the reliability analysis of both single-phase systems subject to DCCFs (e.g., [18-24]) and PMSs subject to DCCFs (e.g., [25, 26]). However, very few studies have addressed systems with PCCFs. In particular, a binomial failure rate model was proposed in [27] to address PCCFs, but the model is only applicable to systems with s-identical and s-independent components where the failure probabilities caused by the CC are also identical. Reference [9] analyzed the reliability of systems with non-identical components where the conditional failure events, conditioned on the occurrence of a CC, have to be s-independent. Reference [28] proposed two methods to analyze the reliability of systems subject to s-independent or s-dependent CCs.

All these existing studies on PCCFs have assumed systems with a single-phase mission and cannot handle PMSs, because PCCFs in a PMS are more complicated than those occurring in a single-phase system. Specifically, the dynamics in both component and system behaviors mentioned earlier pose unique challenges to the reliability analysis of PMSs. Besides, there exist statistical dependencies across phases for a given component, which further complicate the reliability analysis of PMSs. In particular, the state of a component at the beginning of a new phase has to be identical to its state at the end of the previous phase in a non-repairable PMS [29]. Moreover, the PMS might be subject to different PCCFs in different phases. Furthermore, a component that makes no contribution to the system failure in a specific phase may also be affected by a CC in this phase; the effect of the CC on this component should still be considered in later phases. The existing explicit and implicit methods for handling PCCFs in single-phase systems [28] are not applicable to PMSs with these complicated PCCF behaviors and with the aforementioned system dynamics and dependencies of PMSs. Therefore, in this paper we extend them by proposing new combinatorial methods to analyze the reliability of PMSs subject to PCCFs. A case study is performed to illustrate the proposed methods. The computational and space complexities of the proposed methods are also discussed.

The remainder of the paper is organized as follows. Section 2 describes the problem to be addressed in this paper. Section 3 presents a preliminary model. Section 4 presents the two proposed methods for the analysis of PCCFs in PMSs. Section 5 gives a case study to illustrate applications of the proposed methods. Section 6 discusses and compares the complexity of the two methods. Section 7 gives conclusions and future work.

2. Problem statement

The paper considers the problem of evaluating the reliability of a PMS subject to PCCFs. The considered PMS can be subject to more than one PCCF due to several elementary CCs occurring in one phase or in multiple different phases. All CCs are external to the system. In other words, PCCFs are only caused by external shocks. Different elementary CCs, whether from the same phase or from different phases, are s-independent. A component's failure event caused by a CC and its local failure event within each phase are s-independent. Components affected by the same elementary CC form a probabilistic common cause group (PCCG). A component can belong to more than one PCCG, that is, a component may be affected by multiple CCs.

Let $L_i$ denote the number of elementary CCs involved in phase $i$ and $m$ denote the total number of phases in the considered PMS. Thus, $L = \sum_{i=1}^{m} L_i$ is the total number of CCs occurring in the PMS. The elementary CC events occurring in phase $i$ are denoted by $CC_{i1}, \ldots, CC_{iL_i}$. All components that can fail due to $CC_{ij}$ constitute $PCCG_{ij}$ ($i \le m$, $j \le L_i$).
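One possible way to hold these definitions in code is sketched below. It is only an illustration: the class and field names (`CommonCause`, `group`, `groups_of`) are hypothetical, and the numbers are borrowed from the case study in Section 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonCause:
    """One elementary common cause CC_ij together with its PCCG_ij."""
    phase: int          # phase index i (1-based)
    index: int          # index j of the CC within phase i
    probability: float  # occurrence probability of CC_ij
    group: dict         # PCCG_ij: component name -> q_ijX


# Values borrowed from the case study in Section 5.
common_causes = [
    CommonCause(1, 1, 0.001, {"e": 0.2, "h": 0.5, "i": 0.3}),
    CommonCause(2, 1, 0.001, {"e": 0.3, "j": 0.5, "l": 0.7}),
]
m = 2  # total number of phases

# L_i: number of elementary CCs in phase i; L: total number of CCs.
L_i = {i: sum(cc.phase == i for cc in common_causes) for i in range(1, m + 1)}
L = sum(L_i.values())


def groups_of(component):
    """Return the (i, j) labels of every PCCG_ij that contains `component`."""
    return [(cc.phase, cc.index) for cc in common_causes if component in cc.group]


print(L_i, L, groups_of("e"))
```

Here link e belongs to two PCCGs, which is the situation described above where one component is affected by multiple CCs.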

3. Preliminary model

In this section, we review the basics of the traditional binary decision diagram (BDD) model for system reliability analysis, which is adapted in Section 4 for use in the proposed methods. Based on Shannon's decomposition, a BDD can be expressed using the if-then-else (ite) format as [30, 31]:

$$f = \mathrm{ite}(x, f_{x=1}, f_{x=0}) = \mathrm{ite}(x, F_1, F_0) = x \cdot F_1 + \bar{x} \cdot F_0 \tag{1}$$

Eq. (1) implies that if $x$ holds ($x = 1$) then $f = F_1 = f_{x=1}$ ($f$ evaluated at $x$ being 1); else ($x = 0$) $f = F_0 = f_{x=0}$ ($f$ evaluated at $x$ being 0). Figure 1 shows the BDD format of this expression, where the right edge is referred to as a 1-edge or then-edge, and the left edge is referred to as a 0-edge or else-edge.
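As a quick sanity check of Eq. (1), the following sketch verifies Shannon's decomposition by truth table. The Boolean function `f` is a hypothetical example chosen only for illustration.

```python
from itertools import product


def f(a, b, c):
    """Hypothetical example function: f = (a AND b) OR c."""
    return (a and b) or c


def ite(x, f1, f0):
    """if-then-else of Eq. (1): x * F1 + (not x) * F0."""
    return (x and f1) or ((not x) and f0)


def shannon_holds():
    """Check f == ite(a, f|a=1, f|a=0) on every input combination."""
    for a, b, c in product([False, True], repeat=3):
        f1 = f(True, b, c)    # F1: f evaluated at a = 1
        f0 = f(False, b, c)   # F0: f evaluated at a = 0
        if f(a, b, c) != ite(a, f1, f0):
            return False
    return True


print(shannon_holds())
```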

[Figure 1. BDD encoding the ite format of Eq. (1): node x with its 0-edge leading to F0 and its 1-edge leading to F1.]

When modeling system failure behavior, there are two sink nodes in the BDD model which are labeled with constants 0 and 1, representing system success and failure, respectively. Each non-sink node representing a system component is labeled with a Boolean variable and has two outgoing directed edges (Figure 1) representing the failure (1-edge) and success (0-edge) of the corresponding component, respectively.

For BDD generation, the following traditional BDD generation rules are applied. Consider two sub-BDD models for variables representing two different components in the ite format: $G = \mathrm{ite}(x, G_{x=1}, G_{x=0}) = \mathrm{ite}(x, G_1, G_2)$ and $H = \mathrm{ite}(y, H_{y=1}, H_{y=0}) = \mathrm{ite}(y, H_1, H_2)$. The rules for combining these two sub-BDD models into one BDD model are [30]

$$G \diamond H = \mathrm{ite}(x, G_1, G_2) \diamond \mathrm{ite}(y, H_1, H_2) = \begin{cases} \mathrm{ite}(x, G_1 \diamond H_1, G_2 \diamond H_2) & \mathrm{index}(x) = \mathrm{index}(y) \\ \mathrm{ite}(x, G_1 \diamond H, G_2 \diamond H) & \mathrm{index}(x) < \mathrm{index}(y) \\ \mathrm{ite}(y, G \diamond H_1, G \diamond H_2) & \mathrm{index}(x) > \mathrm{index}(y) \end{cases} \tag{2}$$

where $\diamond$ represents the logical AND or OR operation, and index() represents the order of a Boolean variable, predetermined before the BDD generation. In applying the above rules, the orderings of the two root variables (i.e., x for G, y for H) are first compared. If x and y have the same ordering (corresponding to the case where the two variables encode the same component), either of them becomes the root node of the combined BDD and the logic operation is applied to their child nodes. Otherwise, the variable with the smaller order becomes the root node of the combined BDD model and the logic operation is applied to each child of the node with the smaller order and the other sub-BDD model as a whole. These rules are applied to logic operations between sub-BDDs ($G_i$, $H_i$) in a recursive manner until one of them becomes the constant '0' or '1', where Boolean algebra rules ($1 + x = 1$, $0 + x = x$, $1 \cdot x = x$, $0 \cdot x = 0$) are applied.
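The rules of Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: nodes are plain tuples `(variable, 1-child, 0-child)`, there is no node sharing or caching, and the helper names (`var`, `order`, `apply_op`, `failure_probability`) and the three-component example are assumptions made here. The probability routine further assumes s-independent variables, so it does not cover the phase dependencies handled in Section 4.

```python
import operator


def ite(x, high, low):
    """Create node ite(x, high, low); collapse it if both children are equal."""
    return high if high == low else (x, high, low)


def var(name):
    """BDD of a single failure variable: 1-edge to sink 1, 0-edge to sink 0."""
    return (name, 1, 0)


def order(node, index):
    """Order of the root variable; sinks are placed after every variable."""
    return float("inf") if node in (0, 1) else index[node[0]]


def apply_op(op, g, h, index):
    """Combine sub-BDDs g and h with op (AND/OR) following Eq. (2)."""
    if g in (0, 1) and h in (0, 1):
        return op(g, h)
    ig, ih = order(g, index), order(h, index)
    if ig == ih:                       # same variable at both roots
        x, g1, g2 = g
        _, h1, h2 = h
        return ite(x, apply_op(op, g1, h1, index), apply_op(op, g2, h2, index))
    if ig < ih:                        # x comes first
        x, g1, g2 = g
        return ite(x, apply_op(op, g1, h, index), apply_op(op, g2, h, index))
    y, h1, h2 = h                      # y comes first
    return ite(y, apply_op(op, g, h1, index), apply_op(op, g, h2, index))


def failure_probability(node, p):
    """Sum of path probabilities to sink 1, assuming independent variables."""
    if node in (0, 1):
        return float(node)
    x, high, low = node
    return p[x] * failure_probability(high, p) + (1 - p[x]) * failure_probability(low, p)


# Hypothetical example: the system fails if a fails OR (b AND c) fail.
index = {"a": 0, "b": 1, "c": 2}
b_and_c = apply_op(operator.and_, var("b"), var("c"), index)
bdd = apply_op(operator.or_, var("a"), b_and_c, index)
prob = failure_probability(bdd, {"a": 0.1, "b": 0.1, "c": 0.1})
print(bdd, prob)
```

Treating the sinks as having infinite order lets the same three cases of Eq. (2) also cover the constant cases, so the Boolean algebra rules fall out of the recursion.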

4. Proposed methods

In this section, we describe two methods, an explicit method and an implicit method, to analyze the reliability of a PMS subject to PCCFs. The basic idea of the explicit method is to evaluate an expanded system model where each CC is modeled as a basic event shared by all components affected by this CC. The basic idea of the implicit method is to establish a system model without considering the effects of PCCFs and then evaluate the system model including the contributions of PCCFs. In general, the explicit method is more straightforward and easier to follow, whereas the implicit method is more computationally efficient, as detailed in Section 6.

4.1. Explicit method

The proposed explicit algorithm can be described as the following two-step process:

Step 1: establish an expanded PMS fault tree model considering the effects of PCCFs. Based on the assumption that failure events caused by CCs and the local failure event of a component in a phase are s-independent, we develop independent pseudo-nodes representing the component failure events caused by CCs in a phase and add them to the original fault tree of the phase to generate an expanded PMS fault tree model. In particular, if component X appears in h PCCGs in phase i, that is, the component can be affected by h CCs ($CC_{iX(1)}, CC_{iX(2)}, \ldots, CC_{iX(h)}$), where $CC_{iX(j)} \in \{CC_{i1}, \ldots, CC_{iL_i}\}$ and $1 \le j \le h$, then h pseudo-nodes ($X_{i1}, X_{i2}, \ldots, X_{ih}$) representing the h conditional failure events caused by the h CCs are added to the original phase i fault tree for component X. Because a component fails if it either suffers a local failure or is affected by a CC, the total failure behavior of the component in phase i can be represented by the following logical expression:

$$X_{iTF} = (CC_{iX(1)} \cap X_{i1}) \cup (CC_{iX(2)} \cap X_{i2}) \cup \cdots \cup (CC_{iX(h)} \cap X_{ih}) \cup X_i \tag{3}$$

where Xi denotes the local failure event of component X. Figure 2 illustrates the corresponding fault tree model representing the failure of component X affected by h CCs in phase i.
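For a single component, the probability of the event in Eq. (3) can be sketched as follows. The sketch relies on the s-independence assumptions of Section 2 and additionally assumes that the conditional failure events for different CCs are mutually independent; the function name and the example values (link e in phase 1 of the case study in Section 5) are chosen here for illustration.

```python
def total_failure_probability(q_local, causes):
    """Pr(X_iTF) from Eq. (3) for one component, given survival up to phase i.

    q_local: conditional local failure probability q_iX.
    causes:  list of (p_cc, q_cond) pairs, one per CC affecting X in phase i,
             where p_cc = Pr(CC occurs) and q_cond = Pr(X fails | CC occurs).
    """
    survive = 1.0 - q_local
    for p_cc, q_cond in causes:
        # X survives this CC unless the CC occurs AND the pseudo-event X_ij occurs.
        survive *= 1.0 - p_cc * q_cond
    return 1.0 - survive


# Link e in phase 1 of the case study: q_1e = 0.01, Pr(CC11) = 0.001, q_11e = 0.2.
print(total_failure_probability(0.01, [(0.001, 0.2)]))
```

Such a marginal value cannot simply be plugged into the system model for every affected component, because components in the same PCCG share the CC event and are therefore statistically dependent. This is why the expanded fault tree keeps each CC as an explicit, shared basic event.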

[Figure 2. Fault tree model for component total failure event in phase i: an OR gate for $X_{iTF}$ whose inputs are the local failure event $X_i$ and h AND gates, the jth of which combines $CC_{iX(j)}$ and $X_{ij}$.]

The occurrence probability of $X_{ij}$ is the conditional probability that component X fails given that $CC_{iX(j)}$ occurs. Since a PCCG typically includes more than one component in a phase, a CC event may appear more than once in the expanded fault tree model. Note that it may happen that a component X does not contribute to the system failure in phase i but belongs to $PCCG_{iX(j)}$, that is, the event representing the component local failure does not appear in the fault tree corresponding to phase i but the component may fail due to the occurrence of $CC_{iX(j)}$. In this case, the logical "AND" gate connecting pseudo-node $X_{ij}$ and $CC_{iX(j)}$, which represents the effect of $CC_{iX(j)}$ on X in phase i, should be added to the later phase where the component local failure event first appears in the fault tree. If the component local failure event does not appear in the PMS fault tree after phase i, then the effect of the CC on X is ignored for all phases after phase i, since this component's failure makes no contribution to the failure of the PMS in all phases after phase i.

After building the fault tree model in the form of Figure 2 for all components appearing in the PCCGs, the expanded fault tree for the entire PMS can be established by replacing each basic component failure event in the original fault tree with the new total component failure event represented using Figure 2.

Step 2: evaluate the expanded PMS fault tree.

In this paper, the BDD-based method for the reliability analysis of PMSs [29] is adapted to evaluate the expanded fault tree to obtain the reliability of a PMS subject to PCCFs. The major steps of the adapted BDD-based method are summarized as follows:

1) Order input component state variables: for variables representing different components, a heuristic ordering algorithm [32] can be used; for variables representing the same component but in different phases, forward ordering (the variable order is the same as the phase order) or backward ordering (the variable order is the reverse of the phase order) can be used. The backward method is adopted in this paper since it can generate BDDs of smaller size than the forward method [29].

2) Generate a BDD for each single phase. Note that for traditional PMS analysis, only the manipulation rules of (2) described in Section 3 are needed to generate a single-phase BDD because only variables representing different components would appear in each phase. However, as discussed in Step 1 of the proposed explicit method, in some phases of the expanded fault tree, different variables representing the same component but different phases may appear within the same phase. In such cases, special Phase Dependent Operation (PDO) rules must be applied to consider the dependencies between these variables belonging to the same component.

PDO Rules: Consider two sub-BDD models for variables representing the same component but in different phases i and j (i < j): $G = \mathrm{ite}(x_i, G_{x_i=1}, G_{x_i=0}) = \mathrm{ite}(x_i, G_1, G_2)$ and $H = \mathrm{ite}(x_j, H_{x_j=1}, H_{x_j=0}) = \mathrm{ite}(x_j, H_1, H_2)$. The PDO rule for the backward ordering method is [29]

$$G \diamond H = \mathrm{ite}(x_i, G_1, G_2) \diamond \mathrm{ite}(x_j, H_1, H_2) = \mathrm{ite}(x_j, G \diamond H_1, G_2 \diamond H_2) \tag{4}$$
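The reasoning behind Eq. (4) can be checked with a small truth-table sketch. It assumes, as in Section 3, that a variable equal to 1 means the component has failed, so in a non-repairable PMS a failure in phase i implies failure in every later phase j. For simplicity the children $G_1, G_2, H_1, H_2$ are treated as constants here; that simplification and the function name are assumptions of this sketch.

```python
from itertools import product
import operator


def ite(x, then_value, else_value):
    return then_value if x else else_value


def pdo_rule_holds(op):
    """Check Eq. (4) on every state (x_i, x_j) reachable without repair (i < j)."""
    consistent_states = [(0, 0), (0, 1), (1, 1)]   # (1, 0) would require a repair
    for g1, g2, h1, h2 in product([0, 1], repeat=4):
        for xi, xj in consistent_states:
            g = ite(xi, g1, g2)
            lhs = op(g, ite(xj, h1, h2))
            # Backward PDO: branch on x_j; x_j = 0 forces x_i = 0, so G reduces to G2.
            rhs = ite(xj, op(g, h1), op(g2, h2))
            if lhs != rhs:
                return False
    return True


print(pdo_rule_holds(operator.and_), pdo_rule_holds(operator.or_))
```

The rule is only valid on the consistent states: for the excluded state (1, 0) with AND, $G_1 = 1$, $G_2 = 0$ and $H_2 = 1$, the two sides differ.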

3) Generate a BDD for the entire PMS by combining the BDDs obtained in step 2. For two variables representing different components, the manipulation rules of (2) are applied; for two variables representing the same component but in different phases, the PDO rules of (4) are applied.

4) Calculate the system unreliability by evaluating the PMS BDD generated in step 3. Each path in the PMS BDD from the root node to sink node "1" represents a disjoint combination of component failures and non-failures in different phases that can result in the entire mission failure. Therefore, the PMS unreliability can be calculated as the sum of the probabilities of all the paths from the root node to sink node "1". Note that for paths involving variables that represent the same component but in different phases, a special evaluation method (Eq. (11) in [29]) should be applied to handle the statistical dependencies among these variables.

4.2. Implicit method

The implicit algorithm can be described as the following five-step process:

Step 1: construct an event space that involves all combinations of occurrence and non-occurrence of elementary CCs and then evaluate the occurrence probability of each event. Given L elementary CCs that may happen in a PMS with m phases, an event space with $2^L$ disjoint events is constructed. Each event, called a probabilistic common cause event (PCCE), is a combination of occurrence and non-occurrence of the L CCs. The $2^L$ PCCEs are:

$$PCCE_1 = \overline{CC_{11}} \cap \cdots \cap \overline{CC_{1L_1}} \cap \cdots \cap \overline{CC_{m1}} \cap \cdots \cap \overline{CC_{mL_m}}$$
$$PCCE_2 = CC_{11} \cap \cdots \cap \overline{CC_{1L_1}} \cap \cdots \cap \overline{CC_{m1}} \cap \cdots \cap \overline{CC_{mL_m}}$$
$$\cdots$$
$$PCCE_{2^L} = CC_{11} \cap \cdots \cap CC_{1L_1} \cap \cdots \cap CC_{m1} \cap \cdots \cap CC_{mL_m}$$

Let $\Pr(PCCE_k)$ denote the occurrence probability of $PCCE_k$; then $\sum_{k=1}^{2^L} \Pr(PCCE_k) = 1$.

Step 2: evaluate the total conditional failure probability for components subject to PCCFs under each PCCE in each phase. Let $q_{iX}$ be the conditional local failure probability given that component X has not failed before phase i and $q_{ijX}$ be the conditional failure probability of component X given that $CC_{iX(j)}$ occurs. If component X can be affected by $h_k$ CCs ($CC_{iX(1)}, CC_{iX(2)}, \ldots, CC_{iX(h_k)}$) in phase i under $PCCE_k$, the total conditional failure probability of component X in phase i given that component X has not failed before phase i is

$$Q_{ikX} = 1 - (1 - q_{iX}) \prod_{j=1}^{h_k} (1 - q_{ijX}) \tag{5}$$
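Eq. (5) translates directly into code. The helper below is a minimal sketch with an illustrative name; the inputs used in the example are the case-study parameters of Section 5, and the results can be compared with Table 1.

```python
from math import prod


def total_conditional_failure(q_local, q_given_cc):
    """Q_ikX of Eq. (5).

    q_local:    q_iX, local failure probability in phase i given survival so far.
    q_given_cc: the q_ijX values of the CCs that occur in phase i under PCCE_k
                (an empty list when no CC affects the component).
    """
    return 1.0 - (1.0 - q_local) * prod(1.0 - q for q in q_given_cc)


# Phase 1 (q_1X = 0.01) under CC11, and phase 2 (q_2X = 0.02) under CC21.
print(total_conditional_failure(0.01, [0.2]))   # link e, phase 1
print(total_conditional_failure(0.01, [0.5]))   # link h, phase 1
print(total_conditional_failure(0.02, [0.7]))   # link l, phase 2
```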

Step 3: establish the reliability model of the PMS without considering the effects of PCCFs. In this step, a BDD model is built based on the original PMS fault tree model without considering the effects of the PCCFs.

Step 4: evaluate the PMS BDD model using the total conditional failure probabilities under each PCCE. Let $\Pr(\text{PMS fails} \mid PCCE_k)$ be the conditional system failure probability given that $PCCE_k$ occurs. It is computed by evaluating the BDD model established in Step 3 using the component total conditional failure probabilities obtained in Step 2.

Step 5: evaluate the PMS reliability using the total probability law. The final system failure probability considering the effects of PCCFs is

$$UR = \Pr(\text{PMS fails}) = \sum_{k=1}^{2^L} \left[ \Pr(\text{PMS fails} \mid PCCE_k) \cdot \Pr(PCCE_k) \right]$$

5. Case study

In this section, we analyze the communication reliability of the wireless sensor network (WSN) system in Figure 3 as a case study to illustrate the two methods presented in Section 4.

[Figure 3. An illustrative WSN example: base station s, destination sensor t, and links labeled a to n.]

There are two communication paradigms within a WSN: application communication and infrastructure communication [33]. Infrastructure communication relates to the delivery of configuration and maintenance messages (e.g., network set-up, query, path discovery, and policies) from the base station to sensor nodes; application communication relates to the delivery of sensed data from sensor nodes to the base station. In this case study, we consider a two-phase communication: the first phase is an infrastructure communication phase (ICP) where the base station (node s) sends a message to a destination sensor (node t); the second phase is an application communication phase (ACP) where the destination sensor node sends data to the base station. The two-phase communication mission succeeds only if both the ICP and the ACP succeed.

In the ICP, there are two paths from the base station to the destination sensor node: 1) path11: c → f → h → j; 2) path12: c → f → h → k → m. In the ACP, there are two paths from the destination sensor node to the base station: 1) path21: l → e → b; 2) path22: l → i → g → d → b. We assume that the nodes are perfect and only the links can fail during the mission. There are two s-independent external CCs: CC11 in phase 1 and CC21 in phase 2. The corresponding PCCGs are: PCCG11 = {e, h, i} and PCCG21 = {e, j, l}.

The following parameter values are used in the analysis:

1. Local conditional failure probabilities of links: $q_{1X} = 0.01$, $q_{2X} = 0.02$, $X \in \{a, b, \ldots, n\}$.
2. Occurrence probabilities of CCs: $p_{CC11} = p_{CC21} = 0.001$.
3. Conditional link failure probabilities given the occurrence of a CC: $q_{11e} = 0.2$, $q_{11h} = 0.5$, $q_{11i} = 0.3$, $q_{21e} = 0.3$, $q_{21j} = 0.5$, $q_{21l} = 0.7$.

Figure 4 illustrates the fault tree model describing the failure of the example PMS.

[Figure 4. Fault tree model for the PMS example: the PMS fails if the ICP or the ACP fails; each phase fails only if both of its paths fail, and a path fails if any of its links fails.]

5.1. Explicit method

Step 1: establish an expanded fault tree including the effects of PCCFs based on the fault tree in Figure 4. Nodes representing the local failure events of e1, h1, i1 in phase 1 and e2, j2, l2 in phase 2 are replaced with sub-fault tree models as shown in Figure 2, which represents the total failure event of a component considering the effects of PCCFs. Although e1 and i1 do not appear in the phase 1 (ICP) sub-fault tree, failures of links e and i in phase 1 still contribute to the phase 2 (ACP) failure. Thus, the effect of CC11 on e1 and i1 is still considered by adding logical "AND" gates to the sub-fault tree under the ACP failure. The effect of CC21 on j does not need to be considered at all, since the failure of link j makes no contribution to the PMS failure in the ACP, which is the last phase. The expanded fault tree is shown in Figure 5.

[Figure 5. Expanded fault tree considering the effects of PCCFs.]

Step 2: evaluate the expanded fault tree in Figure 5. In this paper, the BDD-based method for PMSs is applied to evaluate the expanded fault tree. The PMS BDD model for the fault tree in Figure 5 is shown in Figure 6.

[Figure 6. BDD model based on the expanded fault tree in Figure 5.]

By evaluating the BDD model in Figure 6, we obtain the communication unreliability of the example WSN system as 0.09034708.

5.2. Implicit method

Step 1: construct an event space that involves all combinations of occurrence and non-occurrence of the two CCs and then evaluate the occurrence probability of each combination. Since there are two CCs, the event space consists of $2^2 = 4$ PCCEs, which are

$$PCCE_1 = \overline{CC_{11}} \cap \overline{CC_{21}}; \quad PCCE_2 = CC_{11} \cap \overline{CC_{21}}; \quad PCCE_3 = \overline{CC_{11}} \cap CC_{21}; \quad PCCE_4 = CC_{11} \cap CC_{21}.$$

Because the two CCs are s-independent, the occurrence probabilities of the 4 events are

$$\Pr(PCCE_1) = (1 - p_{CC11})(1 - p_{CC21}) = 0.998001;$$
$$\Pr(PCCE_2) = p_{CC11}(1 - p_{CC21}) = 0.000999;$$
$$\Pr(PCCE_3) = (1 - p_{CC11})\, p_{CC21} = 0.000999;$$
$$\Pr(PCCE_4) = p_{CC11}\, p_{CC21} = 0.000001.$$

Step 2: evaluate the total conditional failure probabilities for components subject to PCCFs under each PCCE in each phase. PCCE1 is the event that no CC happens at all. Therefore, no components are subject to PCCFs under PCCE1. PCCE2 is the event that only CC11 happens. Under this event, components e, h, and i may fail in phase 1 due to the occurrence of CC11. For example, based on Eq. (5), the total conditional failure probability for component e is

$$Q_{12e} = 1 - (1 - q_{1e})(1 - q_{11e}) = 0.208.$$

Similarly, we can obtain the total conditional failure probabilities for all other components subject to PCCFs under each PCCE in each phase, which are listed in Table 1.

Table 1. Total conditional failure probabilities under each PCCE

| | e1 | h1 | i1 | e2 | l2 |
|---|---|---|---|---|---|
| PCCE1 | - | - | - | - | - |
| PCCE2 | 0.208 | 0.505 | 0.307 | - | - |
| PCCE3 | - | - | - | 0.314 | 0.706 |
| PCCE4 | 0.208 | 0.505 | 0.307 | 0.314 | 0.706 |

Step 3: build a BDD model of the PMS in Figure 4 without considering the effects of PCCFs. The fault tree model of the example PMS without considering the effects of PCCFs is shown in Figure 4. The corresponding PMS BDD model is shown in Figure 7.

[Figure 7. BDD model for the example PMS without considering effects of PCCF.]

Step 4: evaluate the BDD model in Figure 7. Using the total conditional failure probabilities in Table 1, we can calculate the conditional failure probability under each PCCE by evaluating the BDD model in Figure 7. The conditional PMS failure probabilities are

$$\Pr(\text{PMS fails} \mid PCCE_1) = 0.08921187; \quad \Pr(\text{PMS fails} \mid PCCE_2) = 0.5802923;$$
$$\Pr(\text{PMS fails} \mid PCCE_3) = 0.7336815; \quad \Pr(\text{PMS fails} \mid PCCE_4) = 0.88559284.$$

Step 5: evaluate the final PMS failure probability using the total probability law as

$$UR = \sum_{i=1}^{4} \left[ \Pr(\text{PMS fails} \mid PCCE_i) \cdot \Pr(PCCE_i) \right] = 0.09034708$$
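The five steps above can be retraced numerically with the sketch below. It is not the BDD evaluation used in the paper: because every link appears in the fault tree of only one phase in this example, the structure function is written here in closed form, as read from the path descriptions (path11, path12, path21, path22). The constant and function names are assumptions of this sketch.

```python
from math import prod

Q_LOCAL = {1: 0.01, 2: 0.02}                       # q_1X, q_2X (same for every link)
P_CC = {"CC11": 0.001, "CC21": 0.001}              # CC occurrence probabilities
PCCG = {                                           # CC -> (phase, {link: q_ijX})
    "CC11": (1, {"e": 0.2, "h": 0.5, "i": 0.3}),
    "CC21": (2, {"e": 0.3, "j": 0.5, "l": 0.7}),
}


def survives(link, last_phase, occurred):
    """Pr(link still works at the end of last_phase | PCCE), i.e. the
    product of (1 - Q_ikX) from Eq. (5) over phases 1..last_phase."""
    s = 1.0
    for phase in range(1, last_phase + 1):
        s *= 1.0 - Q_LOCAL[phase]
        for cc in occurred:
            cc_phase, group = PCCG[cc]
            if cc_phase == phase:
                s *= 1.0 - group.get(link, 0.0)
    return s


def conditional_unreliability(occurred):
    """Pr(PMS fails | PCCE), the PCCE being given as the list of CCs that occur."""
    up1 = lambda x: survives(x, 1, occurred)
    up2 = lambda x: survives(x, 2, occurred)
    # ICP: path11 = c-f-h-j and path12 = c-f-h-k-m share links c, f, h.
    icp_ok = up1("c") * up1("f") * up1("h") * (1 - (1 - up1("j")) * (1 - up1("k") * up1("m")))
    # ACP: path21 = l-e-b and path22 = l-i-g-d-b share links l, b.
    acp_ok = up2("l") * up2("b") * (1 - (1 - up2("e")) * (1 - up2("i") * up2("g") * up2("d")))
    return 1.0 - icp_ok * acp_ok


names = list(P_CC)
conditional, UR = [], 0.0
for k in range(2 ** len(names)):                   # k = 0..3 -> PCCE1..PCCE4
    occurred = [n for j, n in enumerate(names) if (k >> j) & 1]
    p_event = prod(P_CC[n] if n in occurred else 1.0 - P_CC[n] for n in names)
    conditional.append(conditional_unreliability(occurred))
    UR += p_event * conditional[-1]                # total probability law (Step 5)

print([round(c, 8) for c in conditional])
print(round(UR, 8))                                # about 0.09034708
```

Under these assumptions the sketch agrees with the conditional probabilities of Step 4 and with the final UR to within rounding. For a PMS in which the same component contributes to several phases, the closed-form products would no longer be valid and the PDO-based BDD evaluation of Section 4 would be needed.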

The result obtained by the implicit method matches exactly the result obtained using the explicit method.

5.3. Effects of PCCFs

To show the impacts of PCCFs on system reliability performance, five different sets of conditional link failure probabilities are studied for the example WSN system. They respectively represent no CCF occurring (Set 1 in Table 2), low conditional failure probabilities (Set 2), medium conditional failure probabilities (Set 3), high conditional failure probabilities (Set 4) and the DCCF scenario (Set 5). We also study the effects of the occurrence probabilities of CCs on system reliability by changing the values of $p_{CC11}$ and $p_{CC21}$ under these five sets of parameters. The last four rows of Table 2 show the failure probabilities of the example PMS. The results are also presented graphically in Figure 8.

Table 2. PMS failure probability results

| | Set 1 | Set 2 | Set 3 | Set 4 | Set 5 |
|---|---|---|---|---|---|
| q11e | 0 | 0.2 | 0.5 | 0.8 | 1 |
| q11h | 0 | 0.2 | 0.5 | 0.8 | 1 |
| q11i | 0 | 0.2 | 0.5 | 0.8 | 1 |
| q21e | 0 | 0.2 | 0.5 | 0.8 | 1 |
| q21j | 0 | 0.2 | 0.5 | 0.8 | 1 |
| q21l | 0 | 0.2 | 0.5 | 0.8 | 1 |
| UR for pCC11 = pCC21 = 0.001 | 0.08921187 | 0.08963062 | 0.09026816 | 0.09080058 | 0.09103254 |
| UR for pCC11 = pCC21 = 0.1 | 0.08921187 | 0.13076253 | 0.19206757 | 0.24128704 | 0.26226162 |
| UR for pCC11 = pCC21 = 0.5 | 0.08921187 | 0.29042071 | 0.54746887 | 0.71230026 | 0.77230297 |
| UR for pCC11 = pCC21 = 1 | 0.08921187 | 0.47526844 | 0.86567213 | 0.99217003 | 1 |

[Figure 8. Impacts of PCCFs: PMS failure probability (UR) for Set 1 to Set 5, with one curve for each of pCC11 = pCC21 = 0.001, 0.1, 0.5 and 1.]

From the results presented in Table 2 and Figure 8, we can see that as the conditional link failure probabilities increase (changing from Set 1 to Set 5), the PMS failure probability increases monotonically, and the degree of increase depends on the occurrence probabilities of the CCs, which determine the overall contribution of PCCFs to the entire system. Specifically, when the occurrence probabilities of the CCs are small ($p_{CC11} = p_{CC21} = 0.001$, corresponding to the dashed line), the PMS failure probability increases very slightly as the conditional link failure probabilities increase. However, for the other extreme case ($p_{CC11} = p_{CC21} = 1$, corresponding to the solid line), a small change in the conditional link failure probabilities causes a large change in the PMS failure probability. In this case, when all the conditional link failure probabilities are 1 (Set 5), all affected components suffer deterministic failures, which crashes the entire example system since there are no redundant paths available. Note that Set 1 represents the case where no components are actually affected by the occurrence of CCs. The PMS failure probabilities in this case are the same even though the occurrence probabilities of the CCs vary.
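A parameter sweep of this kind can be sketched as follows. As a simplification, the closed-form structure function read from the path descriptions of Section 5 replaces the BDD evaluation (this works here only because each link contributes to a single phase), and all six conditional link failure probabilities share one value `q_cc`, as in Sets 1 to 5. The names are hypothetical.

```python
from itertools import product

Q1, Q2 = 0.01, 0.02                                # local conditional failure probabilities
PCCG = {"CC11": (1, "ehi"), "CC21": (2, "ejl")}    # CC -> (phase, affected links)


def up(link, last_phase, occurred, q_cc):
    """Pr(link works at the end of last_phase | the CCs in `occurred` happen)."""
    s = 1.0
    for phase, q_local in ((1, Q1), (2, Q2))[:last_phase]:
        s *= 1.0 - q_local
        for cc in occurred:
            cc_phase, links = PCCG[cc]
            if cc_phase == phase and link in links:
                s *= 1.0 - q_cc
    return s


def pms_unreliability(p_cc, q_cc):
    """UR for pCC11 = pCC21 = p_cc and all conditional probabilities equal to q_cc."""
    total = 0.0
    for occ11, occ21 in product([False, True], repeat=2):
        occurred = [cc for cc, occ in (("CC11", occ11), ("CC21", occ21)) if occ]
        weight = (p_cc if occ11 else 1 - p_cc) * (p_cc if occ21 else 1 - p_cc)
        u1 = lambda x: up(x, 1, occurred, q_cc)
        u2 = lambda x: up(x, 2, occurred, q_cc)
        icp_ok = u1("c") * u1("f") * u1("h") * (1 - (1 - u1("j")) * (1 - u1("k") * u1("m")))
        acp_ok = u2("l") * u2("b") * (1 - (1 - u2("e")) * (1 - u2("i") * u2("g") * u2("d")))
        total += weight * (1 - icp_ok * acp_ok)
    return total


sets = {"Set 1": 0.0, "Set 2": 0.2, "Set 3": 0.5, "Set 4": 0.8, "Set 5": 1.0}
table = {p: [pms_unreliability(p, q) for q in sets.values()] for p in (0.001, 0.1, 0.5, 1.0)}
for p, row in table.items():
    print(p, [round(value, 8) for value in row])
```

The printed rows can be compared with the last four rows of Table 2. For instance, Set 2 with pCC11 = pCC21 = 1 evaluates to about 0.47526844, and Set 5 with pCC11 = pCC21 = 1 gives 1 because link h, which lies on both ICP paths, fails with certainty.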

6. Complexity analysis

In this section, we analyze both the space and the computational complexity of the two proposed methods.

6.1. Space complexity

Consider a mission with m phases and n components. In the worst case, all the n components contribute to the mission failure in every phase. For the explicit method, if there are x CCs and, in the worst case, all n components are affected by each CC, then $n \cdot x$ pseudo variables representing component failures due to the effects of PCCFs and x nodes representing the occurrences of the CCs are included for generating the BDD model. Therefore, in the worst case, there are $m \cdot n + (n+1) \cdot x$ input variables for generating the final BDD. The complexity of the worst-case size of a BDD is $O(2^N / N)$, where N is the number of variables [34]. Thus the complexity of the worst-case size of the BDD in the explicit method is $O\!\left(2^{mn + (n+1)x} / (mn + (n+1)x)\right)$. For the implicit method, the effects of PCCFs are not included in the PMS BDD model. Therefore, in the worst case, the number of input variables is $m \cdot n$. Thus the complexity of the worst-case size of the BDD in the implicit method is $O\!\left(2^{mn} / (mn)\right)$. According to the above discussion, we can conclude that the explicit method has a higher space complexity than the implicit method.

6.2. Computational complexity

The computational complexity of the BDD evaluation algorithm via the bottom-up evaluation approach based on the memoization technique is O(M), where M is the number of nodes in the BDD model [35]. For the explicit method, since the BDD model generated for the expanded fault tree is evaluated only once, the computational complexity is simply $O\!\left(2^{mn + (n+1)x} / (mn + (n+1)x)\right)$, based on the worst-case size of the BDD discussed in Section 6.1. For the implicit method, since the BDD model needs to be evaluated $2^x$ times with different parameters given that there are x CCs, the computational complexity is $2^x \cdot O\!\left(2^{mn} / (mn)\right) = O\!\left(2^{mn + x} / (mn)\right)$. According to the above discussion, we can conclude that the explicit method is computationally less efficient than the implicit method.
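To get a feel for the gap between the two worst-case bounds, the sketch below compares them on a base-2 logarithmic scale. The function names are illustrative, and the sample values m = 2, n = 14, x = 2 are taken from the size of the case study (two phases, links a to n, two CCs). These are worst-case bounds only, not the actual BDD sizes of the example.

```python
from math import log2


def explicit_variables(m, n, x):
    """Worst-case number of BDD input variables in the explicit method."""
    return m * n + (n + 1) * x


def implicit_variables(m, n):
    """Worst-case number of BDD input variables in the implicit method."""
    return m * n


def log2_explicit_cost(m, n, x):
    big_n = explicit_variables(m, n, x)
    return big_n - log2(big_n)            # log2(2^N / N), one evaluation


def log2_implicit_cost(m, n, x):
    big_n = implicit_variables(m, n)
    return x + big_n - log2(big_n)        # log2(2^x * 2^N / N), 2^x evaluations


m, n, x = 2, 14, 2
gap = log2_explicit_cost(m, n, x) - log2_implicit_cost(m, n, x)
print(explicit_variables(m, n, x), implicit_variables(m, n), round(gap, 2))
```

The gap grows roughly like $n \cdot x$ bits, which reflects the $n \cdot x$ pseudo variables that only the explicit method adds to the BDD.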

7. Conclusions and future work

In this paper, we propose an explicit method and an implicit method for reliability analysis of PMSs subject to s-independent PCCFs. Both methods are applicable to arbitrary types of time-to-failure distributions for system components. The explicit method is more straightforward to follow (it involves only two steps) but is less efficient in both time and space than the implicit method. In the future, we will study PCCFs in dynamic PMSs, s-dependent PCCFs, and PCCFs with cascading effects.

Acknowledgment

This work was supported in part by the US National Science Foundation under Grant No. 1112947.

References

[1] W. Li and H. Pham, “Reliability modeling of multi-state degraded systems with multi-competing failures and random shocks”, IEEE Transactions on Reliability, vol. 54, no. 2, pp. 297-303, Jun. 2005.
[2] L. Xing, L. Meshkat and S. Donohue, “Reliability analysis of hierarchical computer-based systems subject to common-cause failures”, Reliability Engineering & System Safety, vol. 92, no. 6, pp. 351-359, Mar. 2007.
[3] Z. Tang, H. Xu and J. B. Dugan, “Reliability analysis of phased mission systems with common cause failures”, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 313-318, 24-27 Jan. 2005.
[4] G. Levitin, L. Xing, and S. Yu, “Optimal connecting elements allocation in linear consecutively-connected systems with phased mission and common cause failures”, Reliability Engineering & System Safety, vol. 130, pp. 85-94, 2014.
[5] M. C. M. Troffaes, G. Walter, and D. Kelly, “A robust Bayesian approach to modeling epistemic uncertainty in common-cause failure models”, Reliability Engineering & System Safety, vol. 125, pp. 13-21, 2014.
[6] S. Mitra, N. R. Saxena and E. J. McCluskey, “Common-mode failures in redundant VLSI systems: a survey”, IEEE Transactions on Reliability, vol. 49, no. 3, pp. 285-295, Sep. 2000.
[7] J. Borcsok, S. Kassel and E. Ugljesa, “Estimation and evaluation of common cause failures”, Proceedings of the Second International Conference on Systems, Sainte-Luce, Martinique, France, 22-28 Apr. 2007.
[8] L. Xing and W. Wang, “Probabilistic common-cause failures analysis”, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 354-358, Las Vegas, Nevada, 28-31 Jan. 2008.
[9] L. Xing, P. Boddu, Y. Sun and W. Wang, “Reliability analysis of static and dynamic fault-tolerant systems subject to probabilistic common-cause failures”, Proc. IMechE, Part O: Journal of Risk and Reliability, vol. 224, no. 1, pp. 43-53, 2010.
[10] A. K. Somani, J. A. Ritcey, and S. H. L. Au, “Computationally efficient phased-mission reliability analysis for systems with variable configurations”, IEEE Transactions on Reliability, vol. 41, no. 4, pp. 504-511, 1992.
[11] A. Pedar and V. V. S. Sarma, “Phased-mission analysis for evaluating the effectiveness of aerospace computing-systems”, IEEE Transactions on Reliability, vol. R-30, no. 5, pp. 429-437, 1981.
[12] H. S. Winokur Jr. and L. J. Goldstein, “Analysis of mission-oriented systems”, IEEE Transactions on Reliability, vol. R-18, no. 4, pp. 144-148, 1969.
[13] J. L. Bricker, “A unified method for analyzing mission reliability for fault tolerant computer systems”, IEEE Transactions on Reliability, vol. R-22, no. 2, pp. 72-77, 1973.
[14] L. Xing, “Reliability importance analysis of generalized phased-mission systems”, International Journal of Performability Engineering, vol. 3, no. 3, pp. 303-318, 2007.
[15] J. Lu and X. Wu, “Reliability evaluation of generalized phased-mission systems with repairable components”, Reliability Engineering & System Safety, vol. 121, pp. 136-145, 2014.
[16] R. Peng, Q. Zhai, L. Xing, and J. Yang, “Reliability of demand-based phased-mission systems subject to fault level coverage”, Reliability Engineering & System Safety, vol. 121, pp. 18-25, 2014.
[17] L. Xing, S. V. Amari, and C. Wang, “Reliability of k-out-of-n systems with phased-mission requirements and imperfect fault coverage”, Reliability Engineering & System Safety, vol. 103, pp. 45-50, 2012.
[18] Y. Dai, M. Xie, K. L. Poh and S. H. Ng, “A model for correlated failures in N-version programming”, IIE Transactions, vol. 36, no. 12, pp. 1183-1192, 2004.
[19] J. K. Vaurio, “Fault tree analysis of phased mission systems with repairable and non-repairable components”, Reliability Engineering & System Safety, vol. 74, no. 2, pp. 169-180, Nov. 2001.
[20] K. N. Fleming and A. Mosleh, “Common-cause data analysis and implications in system modeling”, Proceedings of the International Topical Meeting on Probabilistic safety methods and applications, San Francisco, California, 1985, vol. 1: 3/1-3/12, EPRI NP-39129-SR.
[21] J. K. Vaurio, “An implicit method for incorporating common-cause failures in system analysis”, IEEE Transactions on Reliability, vol. 47, no. 2, pp. 173-180, Jun. 1998.
[22] Z. Tang and J. B. Dugan, “An integrated method for incorporating common cause failures in system analysis”, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 610-614, Las Vegas, Nevada, 26-29 Jan. 2004.
[23] L. Xing, “Reliability evaluation of phased-mission systems with imperfect fault coverage and common-cause failures”, IEEE Transactions on Reliability, vol. 56, no. 1, pp. 58-68, Mar. 2007.
[24] L. Xing, A. Shrestha, L. Meshkat and W. Wang, “Incorporating common-cause failures into the modular hierarchical systems analysis”, IEEE Transactions on Reliability, vol. 58, no. 1, pp. 10-19, Mar. 2009.
[25] G. Levitin, L. Xing, S. Amari, and Y. Dai, “Reliability of non-repairable phased-mission systems with common cause failures”, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 43, no. 4, pp. 967-978, 2013.
[26] G. Levitin, L. Xing, S. Amari, and Y. Dai, “Reliability of non-repairable phased-mission systems with propagated failures”, Reliability Engineering & System Safety, vol. 119, pp. 218-228, 2013.
[27] K. C. Chae, “System reliability using binomial failure rate”, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 136-138, Los Angeles, California, 26-28 Jan. 1988.
[28] C. Wang, L. Xing, and G. Levitin, “Explicit and implicit methods for probabilistic common-cause failure analysis”, Reliability Engineering & System Safety, vol. 131, pp. 175-184, Nov. 2014.
[29] X. Zang, H. Sun, and K. S. Trivedi, “A BDD-based algorithm for reliability analysis of phased-mission systems”, IEEE Transactions on Reliability, vol. 48, no. 1, pp. 50-60, 1999.
[30] L. Xing and S. V. Amari, Binary Decision Diagrams and Extensions for System Reliability Analysis, Wiley-Scrivener, MA, ISBN: 978-1-118-54937-7, 2015.
[31] S. Li, S. Si, H. Dui, Z. Cai, and S. Sun, “A novel decision diagrams extension method”, Reliability Engineering & System Safety, vol. 126, pp. 107-115, Jun. 2014.
[32] M. Bouissou, F. Bruyere, and A. Rauzy, “BDD based fault-tree processing: a comparison of variable ordering heuristics”, Proceedings of ESREL Conference 1997.
[33] C. Wang, L. Xing, V. M. Vokkarane, and Y. Sun, “A phased-mission framework for communication reliability in WSN”, Proceedings of the Annual Reliability and Maintainability Symposium (RAMS), Colorado Springs, CO, 27-30 Jan. 2014.
[34] H. Liaw and C. Lin, “On the OBDD-representation of general Boolean functions”, IEEE Transactions on Computers, vol. 41, no. 6, pp. 661-664, 1992.
[35] A. Shrestha, L. Xing, and Y. Dai, “Decision diagram based methods and complexity analysis for multi-state systems”, IEEE Transactions on Reliability, vol. 59, no. 1, pp. 145-161, 2010.

Highlights:
► Probabilistic common cause failures (PCCFs) in phased-mission systems (PMSs) are analyzed.
► Two combinatorial methods are proposed for reliability analysis of PMSs subject to PCCFs.
► Space and computational complexities are compared for the two proposed methods.