Common cause failure probabilities in standby safety system fault tree analysis with testing—scheme and timing dependencies


Reliability Engineering and System Safety 79 (2003) 43–57 www.elsevier.com/locate/ress

J.K. Vaurio*

Fortum Power and Heat Oy, P.O. Box 23, 07901 Loviisa, Finland

Received 20 January 2001; accepted 26 August 2002

Abstract

Modelling and quantification of common cause failures (CCFs) in redundant standby safety systems can be implemented by implicit or explicit fault tree techniques. Common cause event probabilities are derived for both methods for systems with time-related CCFs modelled by general multiple failure rates. The probabilities are determined so that the correct time-average risk can be obtained by a single computation. The impacts of test intervals and test staggering are included. Staggered testing is best with a certain extra-testing rule, although extra testing is not important for 1-out-of-n:G systems. An economic model provides insights into the impacts of various parameters: the optimal test interval increases with increasing redundancy and testing cost, and it decreases with increasing accident cost and initiating event rate. Staggered testing with extra tests allows for the longest optimal test intervals. A practical technique is outlined for incorporating assessment uncertainties in the estimation of multiple failure rates based on data from many plants or systems.

© 2002 Elsevier Science Ltd. All rights reserved.

Keywords: Dependent failures; Fault tree; Implicit; Explicit; Redundant; Risk; Standby; Safety; System; Unavailability

1. Introduction

Fault tree analysis [1] is a common technique in large system reliability analysis and in PSA [2]. The qualitative phase of the analysis solves the system logic function in terms of the minimal cut sets (mcs), as a Boolean sum of products of basic events (component failure modes). The quantitative phase calculates the probability of the system failed state, and here most computer codes assume the basic events to be mutually statistically (s-)independent. In reality, different kinds of dependencies can be involved, making the joint probabilities of events unequal to the products of the individual event probabilities. One important class of dependencies is common causes that fail multiple components simultaneously or nearly so. Even if such events are rare, they often dominate system failure probabilities. It is therefore important to quantify and model common cause failures (CCFs) in reliability and risk assessments.

There are basically two ways to incorporate dependent failures in system analysis: implicit and explicit methods. The corresponding methods in some software studies have been called the correlated failures and differentiated causes approaches, respectively [3]. In the implicit method, the mcs and the probability equation are first presented as if all events $X_i$ ($i$ refers to a certain failure mode of a specific component) were mutually independent basic events. In each term, any product $P(X_i)P(X_j)$ is then replaced with $P(X_i \cap X_j)$, $P(X_i)P(X_j)P(X_k)$ with $P(X_i \cap X_j \cap X_k)$, etc., because dependency in general means that the joint probabilities are not equal to the products of the individual basic event probabilities. This approach applies if the joint probabilities $P(X_i \cap X_j \cdots)$ are known or can be determined through correlations or conditional probabilities. In this paper, these probabilities are time-average joint unavailabilities of components.

Certain types of dependencies can be modelled in a fault tree explicitly, as mutually independent basic events $Z_{ij\cdots}$: cause-events for the failures of specific components $i, j, \ldots$ (and no others). In this case each component-level event $X_i$ is replaced with the union (Boolean OR-gate) of all events $Z_{i\cdots}$ that fail component $i$ (indicated by $i$ being one of the sub-indices of $Z_{\cdots}$). This is illustrated for $X_2$ in Figs. 1 and 2 for the case of three mutually dependent events. Similarly, $X_1$ is replaced with the expression $X_1 = Z_1 \cup Z_{12} \cup Z_{13} \cup Z_{123}$. The TOP-event of Fig. 1 can be part of a large system- or plant-level fault tree. The explicit method has been used mainly for CCFs with empirically or parametrically determined probabilities $P(Z_{i\cdots})$ [4]. An advantage of the explicit method is that most computer codes solving fault trees can automatically handle mutually s-independent events.

Some benefits and limitations of solving fault tree problems with the implicit method have been studied [5]. The number of events and mcs is smaller than in the explicit method. Potential drawbacks of the implicit method are that (1) a large number of terms (mcs) may need to be quantified manually unless the process is computerised, and (2) truncation of the system equation (necessary in large systems) carries the risk of losing terms or events that could be important through dependencies. For these reasons, and because most fault tree codes can automatically handle independent events (explicit models), it would be beneficial to be able to change an implicit model into an equivalent explicit model and to know how the probabilities $P(Z_{i\cdots})$ should be determined when the joint probabilities $P(X_i \cap X_j \cdots)$ are known. Such transformations have been obtained for systems that can be modelled as a fault tree with basic events and AND- and OR-gates.

In this paper, equations are developed for incorporating CCFs in standby system fault trees and unavailability quantification using both implicit and explicit methods. Common cause failed states $Z_{ij\cdots}$ are here due to events that simultaneously fail exactly the specific components $i, j, \ldots$ and no others. The causes are considered mutually independent, so that the occurrence of any CCF does not preclude others from occurring. This is at most a slightly conservative assumption. The causes of single failures are often internal to the component (wear, ageing, loosening, etc.), while simultaneous multiple failures in parallel trains are more likely due to external causes, such as changes in the common environment or common support or maintenance activities.

Exact implicit–explicit transformation equations have been developed earlier for groups of $n = 2$, 3 and 4 mutually dependent events [6]. For completeness, they are presented in Section 2. Application equations for standby safety system analysis with CCFs are developed in Section 3. A general multiple failure rate (GMFR) model is used because it accounts for test intervals and testing schemes and is not limited by any assumptions about relationships or ratios between the rates. The basic principle is to determine the CCF probabilities so that the correct time-average system unavailability can be obtained with a single fault tree quantification. Many well-known models can be expressed in terms of GMFR [12]. This paper can be viewed as an extension and refinement of Ref. [12], including economic aspects. In addition, Section 4 suggests a method for estimating the multiple failure rates with an empirical Bayes method that takes the assessment uncertainties into account.

Fig. 1. Component-level fault tree (example).

Fig. 2. Component event $X_2$ modelled by cause-events $Z_{ij\cdots}$.

Nomenclature

CCF: common cause failure; simultaneous failure of multiple components by a shared cause
CIRA: CCF identification and repair assumption; a way to identify and repair the other failed components of a CCF when one component of the CCF is found failed
ETRR: extra testing and repair rule; all other similar components are tested and repaired when one is found failed
GMFR: general multiple failure rate model (based on rates $\lambda_i$, $\lambda_{ij}$, …)
ITRP: individual testing and repair policy; components tested and repaired individually, one at a time
PSA: probabilistic safety assessment
mcs: minimal cut set(s)
rea: rare-event approximation; sum of the mcs probabilities
s-: statistically
m/M-system: m-out-of-M:G system; the system is operational when m or more out of M redundant components are operational (success criterion)
$n$: number of mutually dependent events (s-independent of other events)
$\cup$: union of sets; OR-gate in a fault tree
$\cap$: intersection of sets; AND-gate in a fault tree
$P(A)$: probability of set $A$, or of event $A$ = TRUE
$q_{k/n}$: unavailability of specific $k$ out of $n$ basic events due to a common cause influencing exactly these $k$ events and no others
$\lambda \sim \chi^2(r)/t$: $\lambda$ is a chi-squared distributed random variable with $r$ degrees of freedom, divided by $t$
$X_i$: Boolean variable; a certain failure mode state of a specific component
$x_i$: $P(X_i)$
$x_{ij\cdots}$: $P(X_i \cap X_j \cap \cdots)$
$Z_{ij\cdots}$: Boolean variable; failed state of events $i, j, \ldots$ due to a common cause
$z_{ij\cdots}$: $P(Z_{ij\cdots})$
$\lambda_{ij\cdots}$: rate of simultaneous failure of components (events) $i, j, \ldots$ due to a common cause failing exactly these components and no others; the rates depend on $n$

* Lappeenranta University of Technology, Lappeenranta, Finland. Tel.: +358-10-4554700; fax: +358-10-4554435. E-mail address: [email protected] (J.K. Vaurio).

0951-8320/03/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0951-8320(02)00170-9

2. General implicit–explicit transformations

In the explicit method, each component-level event $X_i$ of a group of $n$ mutually dependent events in a fault tree is replaced with an OR-gate whose inputs are all the mutually independent Boolean events $Z_i$, $Z_{ik}$, etc., that have $i$ as one of their sub-indices. In this way $n$ mutually dependent events $X_i$ are modelled with $2^n - 1$ mutually independent events, which can be real events or virtual 'surrogate' events defined for computational purposes. The probabilities of these $Z_{i\cdots}$-events are determined so that all joint probabilities $P(X_i)$, $P(X_i \cap X_j)$, etc., are preserved, thereby producing correct values for all mcs and for the system TOP-event. The method is independent of the system success criterion, and $n$ is not restricted in any way by the number of events in the mcs.

In the case of two dependent components ($n = 2$), one has to define three s-independent events $Z_1$, $Z_2$ and $Z_{12}$. The OR-gates $X_1$ and $X_2$ have the following inputs: $Z_1$ is input to $X_1$, $Z_2$ to $X_2$, and $Z_{12}$ to both $X_1$ and $X_2$. This is equivalent to representing $X_1$ and $X_2$ with the Boolean equations $X_1 = Z_1 \cup Z_{12}$ and $X_2 = Z_2 \cup Z_{12}$, from which $X_1 \cap X_2 = (Z_1 \cap Z_2) \cup Z_{12}$. The equations for solving the probabilities $z_i = P(Z_i)$ and $z_{ij} = P(Z_{ij})$ in terms of $x_i = P(X_i)$ and $x_{ij} = P(X_i \cap X_j)$ are

$$
\begin{aligned}
x_1 &= z_1 + z_{12} - z_1 z_{12},\\
x_2 &= z_2 + z_{12} - z_2 z_{12},\\
x_{12} &= z_1 z_2 + z_{12} - z_1 z_2 z_{12}.
\end{aligned} \tag{1}
$$

The analytical inverse solution is

$$
z_{12} = \frac{x_{12} - x_1 x_2}{1 - x_1 - x_2 + x_{12}}, \qquad
z_1 = \frac{x_1 - x_{12}}{1 - x_2}, \qquad
z_2 = \frac{x_2 - x_{12}}{1 - x_1}. \tag{2}
$$

In the case $n = 3$, three single events $Z_i$, three double failure events $Z_{ij}$ and one triple failure event $Z_{123}$ need to be defined and quantified. It is advantageous to work with the success states $Y_{i\cdots} = \overline{Z}_{i\cdots}$, with probabilities $y_{i\cdots} = P(Y_{i\cdots}) = 1 - z_{i\cdots}$. These are mutually s-independent, just as the $Z_{i\cdots}$ are. The basic idea is to write the equations for $\overline{X}_i = \cap Y_{i\cdots}$: $\overline{X}_i$ is TRUE if and only if all $Y_{i\cdots}$ (that have $i$ as one of the sub-indices) are TRUE. Similarly, any intersection $\overline{X}_i \cap \overline{X}_j$ is TRUE if and only if all $Y_{i\cdots}$ and $Y_{j\cdots}$ (that have $i$ or $j$ or both as a sub-index) are TRUE. In the case $n = 3$, the seven equations for solving the seven values $y_{i\cdots}$ in terms of $x_{j\cdots}$ are

$$
\begin{aligned}
P(\overline{X}_1) &= 1 - x_1 = y_1 y_{12} y_{13} y_{123},\\
P(\overline{X}_2) &= 1 - x_2 = y_2 y_{12} y_{23} y_{123},\\
P(\overline{X}_3) &= 1 - x_3 = y_3 y_{13} y_{23} y_{123},\\
P(\overline{X}_1 \cap \overline{X}_2) &= P(\overline{X_1 \cup X_2}) = 1 - x_1 - x_2 + x_{12} = y_1 y_2 y_{12} y_{13} y_{23} y_{123},\\
P(\overline{X}_1 \cap \overline{X}_3) &= P(\overline{X_1 \cup X_3}) = 1 - x_1 - x_3 + x_{13} = y_1 y_3 y_{12} y_{13} y_{23} y_{123},\\
P(\overline{X}_2 \cap \overline{X}_3) &= P(\overline{X_2 \cup X_3}) = 1 - x_2 - x_3 + x_{23} = y_2 y_3 y_{12} y_{13} y_{23} y_{123},\\
P(\overline{X}_1 \cap \overline{X}_2 \cap \overline{X}_3) &= 1 - x_1 - x_2 - x_3 + x_{12} + x_{13} + x_{23} - x_{123} = y_1 y_2 y_3 y_{12} y_{13} y_{23} y_{123}.
\end{aligned} \tag{3}
$$

On the left-hand side we have applied the de Morgan rule $\overline{A} \cap \overline{B} = \overline{A \cup B}$ and the inclusion–exclusion principle (Poincaré's rule) to express the intersection probabilities in terms of $x_{kl\cdots}$. The values of $y_{i\cdots}$ can now easily be solved as ratios, e.g. $y_1 = P(\overline{X}_1 \cap \overline{X}_2 \cap \overline{X}_3)/P(\overline{X}_2 \cap \overline{X}_3)$. Then, because
$z_{i\cdots} = 1 - y_{i\cdots}$, one obtains

$$
\begin{aligned}
P(Z_1) &= \frac{x_1 - x_{12} - x_{13} + x_{123}}{1 - x_2 - x_3 + x_{23}},\\
P(Z_2) &= \frac{x_2 - x_{12} - x_{23} + x_{123}}{1 - x_1 - x_3 + x_{13}},\\
P(Z_3) &= \frac{x_3 - x_{13} - x_{23} + x_{123}}{1 - x_1 - x_2 + x_{12}},\\
P(Z_{12}) &= 1 - \frac{(1 - x_1 - x_3 + x_{13})(1 - x_2 - x_3 + x_{23})}{(1 - x_3)(1 - x_1 - x_2 - x_3 + x_{12} + x_{13} + x_{23} - x_{123})},\\
P(Z_{13}) &= 1 - \frac{(1 - x_1 - x_2 + x_{12})(1 - x_2 - x_3 + x_{23})}{(1 - x_2)(1 - x_1 - x_2 - x_3 + x_{12} + x_{13} + x_{23} - x_{123})},\\
P(Z_{23}) &= 1 - \frac{(1 - x_1 - x_3 + x_{13})(1 - x_1 - x_2 + x_{12})}{(1 - x_1)(1 - x_1 - x_2 - x_3 + x_{12} + x_{13} + x_{23} - x_{123})},\\
P(Z_{123}) &= 1 - \frac{(1 - x_1)(1 - x_2)(1 - x_3)(1 - x_1 - x_2 - x_3 + x_{12} + x_{13} + x_{23} - x_{123})}{(1 - x_1 - x_2 + x_{12})(1 - x_1 - x_3 + x_{13})(1 - x_2 - x_3 + x_{23})}.
\end{aligned} \tag{4}
$$

In the case $n = 4$ there are 15 probabilities $z_{i\cdots} = P(Z_{i\cdots})$ to be solved. The same number of equations can be developed by first calculating (by the inclusion–exclusion principle) the values $P_{ij\cdots} = P(\overline{X}_i \cap \overline{X}_j \cap \cdots)$ for all combinations of unequal $i, j, k$ ($i, j, k = 1, 2, 3, 4$; $i \neq j \neq k \neq i$):

$$
\begin{aligned}
P_i &= 1 - x_i, \qquad P_{ij} = 1 - x_i - x_j + x_{ij},\\
P_{ijk} &= 1 - x_i - x_j - x_k + x_{ij} + x_{ik} + x_{jk} - x_{ijk},\\
P_{1234} &= 1 - x_1 - x_2 - x_3 - x_4 + x_{12} + x_{13} + x_{14} + x_{23} + x_{24} + x_{34}\\
&\quad - x_{123} - x_{124} - x_{134} - x_{234} + x_{1234}.
\end{aligned} \tag{5}
$$

These same probabilities can be written as products of all $y_{j\cdots} = P(Y_{j\cdots})$ that have any sub-index equal to any sub-index of $P_{j\cdots}$. This principle yields

$$
\begin{aligned}
P_1 &= y_1 y_{12} y_{13} y_{14} y_{123} y_{124} y_{134} y_{1234},\\
P_2 &= y_2 y_{12} y_{23} y_{24} y_{123} y_{124} y_{234} y_{1234},\\
P_3 &= y_3 y_{13} y_{23} y_{34} y_{123} y_{134} y_{234} y_{1234},\\
P_4 &= y_4 y_{14} y_{24} y_{34} y_{124} y_{134} y_{234} y_{1234},\\
P_{12} &= y_1 y_2 y_{12} y_{13} y_{14} y_{23} y_{24} y_{123} y_{124} y_{134} y_{234} y_{1234},\\
P_{13} &= y_1 y_3 y_{12} y_{13} y_{14} y_{23} y_{34} y_{123} y_{124} y_{134} y_{234} y_{1234},\\
P_{14} &= y_1 y_4 y_{12} y_{13} y_{14} y_{24} y_{34} y_{123} y_{124} y_{134} y_{234} y_{1234},\\
P_{23} &= y_2 y_3 y_{12} y_{13} y_{23} y_{24} y_{34} y_{123} y_{124} y_{134} y_{234} y_{1234},\\
P_{24} &= y_2 y_4 y_{12} y_{14} y_{23} y_{24} y_{34} y_{123} y_{124} y_{134} y_{234} y_{1234},\\
P_{34} &= y_3 y_4 y_{13} y_{14} y_{23} y_{24} y_{34} y_{123} y_{124} y_{134} y_{234} y_{1234},\\
P_{1234} &= y_1 y_2 y_3 y_4 y_{12} y_{13} y_{14} y_{23} y_{24} y_{34} y_{123} y_{124} y_{134} y_{234} y_{1234},\\
P_{123} &= P_{1234}/y_4, \quad P_{124} = P_{1234}/y_3, \quad P_{134} = P_{1234}/y_2, \quad P_{234} = P_{1234}/y_1.
\end{aligned} \tag{6}
$$

From these one can solve the $y_{ij\cdots}$'s and then $z_{ij\cdots} = 1 - y_{ij\cdots}$:

$$
\begin{aligned}
z_1 &= 1 - \frac{P_{1234}}{P_{234}}, \quad z_2 = 1 - \frac{P_{1234}}{P_{134}}, \quad z_3 = 1 - \frac{P_{1234}}{P_{124}}, \quad z_4 = 1 - \frac{P_{1234}}{P_{123}},\\
z_{12} &= 1 - \frac{P_{134} P_{234}}{P_{1234} P_{34}}, \quad
z_{13} = 1 - \frac{P_{124} P_{234}}{P_{1234} P_{24}}, \quad
z_{14} = 1 - \frac{P_{123} P_{234}}{P_{1234} P_{23}},\\
z_{23} &= 1 - \frac{P_{124} P_{134}}{P_{1234} P_{14}}, \quad
z_{24} = 1 - \frac{P_{123} P_{134}}{P_{1234} P_{13}}, \quad
z_{34} = 1 - \frac{P_{123} P_{124}}{P_{1234} P_{12}},\\
z_{123} &= 1 - \frac{P_{14} P_{24} P_{34} P_{1234}}{P_4 P_{124} P_{134} P_{234}}, \quad
z_{124} = 1 - \frac{P_{13} P_{23} P_{34} P_{1234}}{P_3 P_{123} P_{134} P_{234}},\\
z_{134} &= 1 - \frac{P_{12} P_{23} P_{24} P_{1234}}{P_2 P_{123} P_{124} P_{234}}, \quad
z_{234} = 1 - \frac{P_{12} P_{13} P_{14} P_{1234}}{P_1 P_{123} P_{124} P_{134}},\\
z_{1234} &= 1 - \frac{P_1 P_2 P_3 P_4 P_{123} P_{124} P_{134} P_{234}}{P_{1234} P_{12} P_{13} P_{14} P_{23} P_{24} P_{34}}.
\end{aligned} \tag{7}
$$

We now have general equations to calculate exactly all probability values $z_{ij\cdots} = P(Z_{ij\cdots})$ for the explicit model basic events, whether real or virtual, in terms of the joint probabilities $x_{ij\cdots}$. These results have been used earlier for the explicit probabilities of repeatable human errors [7]. Eq. (7) involves differences of nearly equal numbers and can exhibit large truncation errors if not carefully evaluated. It is not trivial to simplify these equations by series expansions, because CCF terms do not necessarily decrease with increasing order, and products of lower-order CCF terms can be as large as single higher-order terms. In any case, the method is independent of the system size or success criterion, and a system can have many groups of mutually dependent events.
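Eqs. (2), (4) and (7) follow one pattern: each $P_T$ is the product of the $y_S$ whose index set meets $T$. The sketch below is our own generalisation of that pattern to any $n$, not a method stated in the paper: it takes logarithms and applies Möbius inversion over the subset lattice to recover all $2^n - 1$ values $z_S$. The dictionary layout (frozensets of component indices as keys) and the function names are hypothetical choices for illustration. A forward function computing exact $x_S$ from given $z_S$ is included so that the inversion can be checked by a round trip. This formulation still subtracts nearly equal numbers, as the text warns for Eq. (7), so it is a sketch for moderate $n$ in double precision rather than a production tool.

```python
from itertools import combinations
from math import exp, log, prod


def subsets(items):
    """All subsets of `items` as frozensets, including the empty set."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def implicit_from_explicit(z, n):
    """Exact x[S] = P(X_i failed for all i in S) from independent cause-event
    probabilities z[S], where Z_S fails exactly the components in S."""
    def pbar(T):
        # P(no X_i failed for i in T): every cause event touching T is absent
        return prod(1.0 - zS for S, zS in z.items() if S & T)
    return {S: sum((-1) ** len(U) * pbar(U) for U in subsets(S))
            for S in subsets(range(1, n + 1)) if S}


def explicit_from_implicit(x, n):
    """Solve z[S] for every nonempty S from the joint probabilities x[S]."""
    N = frozenset(range(1, n + 1))
    # P[T] = P(no X_i failed for i in T) by inclusion-exclusion, cf. Eq. (5)
    P = {T: sum((-1) ** len(U) * (x[U] if U else 1.0) for U in subsets(T))
         for T in subsets(N)}
    # P[T] is the product of y_S over all S that meet T (cf. Eqs. (3), (6)), so
    # F[U] = log P[N] - log P[N - U] is the sum of log y_S over nonempty S within U
    F = {U: log(P[N]) - log(P[N - U]) for U in subsets(N)}
    z = {}
    for S in subsets(N):
        if S:
            # Moebius inversion over the subsets of S isolates log y_S
            log_y = sum((-1) ** (len(S) - len(U)) * F[U] for U in subsets(S))
            z[S] = 1.0 - exp(log_y)
    return z


# Round trip with hypothetical cause-event probabilities for n = 4
z_true = {S: 1e-3 / 2 ** len(S) for S in subsets(range(1, 5)) if S}
z_back = explicit_from_implicit(implicit_from_explicit(z_true, 4), 4)
print(max(abs(z_back[S] - z_true[S]) for S in z_true))
```

For $n = 2$ and $n = 3$ the inversion reduces algebraically to Eqs. (2) and (4); the Möbius form simply avoids writing out the 15 (or more) ratios by hand.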

3. Standby systems with common cause failures

Standby safety systems are normally dormant but are expected to be activated when a true demand, such as a fire, a loss of power, a loss of coolant or some other initiating event, occurs. One or more safety system components can fail with some probability due to the stresses caused by such events. The unavailability (failure probability) analysis of the system due to time-independent component failure probabilities and CCF probabilities is described in Section 3.1.

Safety system components can also fail while on standby, before any initiating event occurs, due to time-related stresses such as corrosion, wear, vibration, temperature or other environmental factors. The system is tested or inspected periodically at regular intervals to discover and repair such failures before any true demand occurs. Initiating events are assumed to occur at random times, as rare events uniformly distributed over the test intervals. One may be interested in calculating the standby safety system unavailability as a function of time. To avoid calculations at multiple points in time, one may instead be interested only in the time-average unavailability of the system. The main objective of this paper is to determine the probabilities of the basic events so that the correct time average can be obtained with a single calculation. This average can be obtained as the sum of the time-average unavailabilities of the mcs (or more accurately by the inclusion–exclusion principle). The focus in the following is to obtain correct time-average joint-event unavailabilities $x_{ij\cdots} = P(X_i \cap X_j \cdots)$, needed in the mcs of the implicit method, and the corresponding basic event unavailabilities $z_{ij\cdots} = P(Z_{ij\cdots})$, needed in the mcs of the explicit method. Section 3.2 is devoted to the case of simultaneous testing of all $n$ redundant trains at intervals $T$. These results also apply when the components are tested consecutively in a single testing episode and the duration of a test is short compared with the test interval. A uniformly staggered testing scheme is analysed in Section 3.3 with a certain extra testing/repair rule, and in Section 3.4 without any extra tests or repairs. Economic optimisation is demonstrated in Section 3.5.

3.1. Failures caused by true demands

If CCFs are caused by true demands, it is reasonable to model the CCFs explicitly. In this section, consider the special case of identical time-independent components and symmetric probabilities (e.g. $z_{12} = z_{23} = z_{13}$, as is usually assumed for $n = 3$), i.e.

$$
z_i = q_{1/n},\ 1 \le i \le n; \qquad
z_{ij} = q_{2/n},\ i < j \le n; \qquad
z_{ijk} = q_{3/n},\ j < k \le n; \qquad
z_{ijkm} = q_{4/n},\ k < m \le n; \ \ldots, \tag{8}
$$

where $q_{k/n} = P\{\text{failure of exactly specific } k \text{ out of } n \text{ components due to a demand}\}$. One should always remember that these probabilities depend on $n$, although this is not indicated by the sub-indices of the $z$'s. The fact that $z_{12}$ is not the same for all $n$ ($q_{2/2} \neq q_{2/3} \neq q_{2/4}$) becomes important if one needs to estimate probabilities for a system with one value of $n$ based on failure data collected from systems with different $n$.

Equations for the joint probabilities $x_{ij\cdots} = P(X_i \cap X_j \cdots)$ are needed in implicit models because logic products appear in the mcs. They can be solved in terms of $z_{km\cdots}$ in a straightforward way from Eqs. (1), (3), (5) and (6), because $X_i \cap X_j \cdots$ is an intersection of OR-gates like the one in Fig. 2. Under the commonly valid assumption that all $z_{km\cdots} \ll 1$ (rare-event approximation), we obtain

$$
\begin{aligned}
n = 2:&\quad x_1 = x_2 = q_{1/2} + q_{2/2}, \qquad x_{12} = q_{1/2}^2 + q_{2/2};\\[4pt]
n = 3:&\quad x_1 = x_2 = x_3 = q_{1/3} + 2q_{2/3} + q_{3/3},\\
&\quad x_{12} = x_{13} = x_{23} = q_{1/3}^2 + q_{2/3} + q_{3/3},\\
&\quad x_{123} = q_{1/3}^3 + 3q_{1/3}q_{2/3} + 3q_{2/3}^2 + q_{3/3};\\[4pt]
n = 4:&\quad x_i = q_{1/4} + 3q_{2/4} + 3q_{3/4} + q_{4/4}, \quad i = 1, 2, 3, 4,\\
&\quad x_{ij} = q_{1/4}^2 + q_{2/4} + 2q_{3/4} + q_{4/4}, \quad 1 \le i < j \le 4,\\
&\quad x_{ijk} = q_{1/4}^3 + 3q_{1/4}q_{2/4} + 6q_{2/4}^2 + q_{3/4} + q_{4/4}, \quad 1 \le i < j < k \le 4,\\
&\quad x_{1234} = q_{1/4}^4 + 6q_{1/4}^2 q_{2/4} + 3q_{2/4}^2 + 4q_{1/4}q_{3/4} + 12q_{2/4}q_{3/4} + 6q_{3/4}^2 + q_{4/4}.
\end{aligned} \tag{9}
$$
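Eq. (9) rests on the rare-event approximation. As a hedged check, the sketch below builds the symmetric cause-event probabilities of Eq. (8) for $n = 3$, computes the joint probabilities exactly by inclusion–exclusion over mutually independent cause events, and compares them with Eq. (9). The per-demand values in `q` are hypothetical, chosen only to satisfy $q_{k/n} \ll 1$; the function names are ours.

```python
from itertools import combinations
from math import prod


def symmetric_z(q, n):
    """Eq. (8): each cause event failing exactly k specific components has probability q[k]."""
    return {frozenset(c): q[k]
            for k in range(1, n + 1)
            for c in combinations(range(1, n + 1), k)}


def exact_joint(z, S):
    """Exact x_S = P(X_i failed for all i in S), by inclusion-exclusion over
    the complements; cause events are mutually independent."""
    def pbar(T):
        # P(no X_i failed for i in T): every cause event touching T is absent
        return prod(1.0 - zC for C, zC in z.items() if C & T)
    S = sorted(S)
    return sum((-1) ** r * pbar(frozenset(U))
               for r in range(len(S) + 1)
               for U in combinations(S, r))


def rare_event_n3(q):
    """Eq. (9) for n = 3, keyed by the number of components in the intersection."""
    q1, q2, q3 = q[1], q[2], q[3]
    return {1: q1 + 2 * q2 + q3,
            2: q1 ** 2 + q2 + q3,
            3: q1 ** 3 + 3 * q1 * q2 + 3 * q2 ** 2 + q3}


q = {1: 1e-3, 2: 1e-4, 3: 1e-5}  # hypothetical per-demand probabilities q_{k/3}
z = symmetric_z(q, 3)
approx = rare_event_n3(q)
for k in (1, 2, 3):
    exact = exact_joint(z, range(1, k + 1))
    print(f"x_{'123'[:k]}: exact {exact:.6e}, Eq. (9) {approx[k]:.6e}, "
          f"relative difference {approx[k] / exact - 1:+.2e}")
```

With these values the differences stay well below one percent; they grow as the $q_{k/n}$ approach the same order of magnitude as their products.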

These are valid without any conditions on the relative magnitudes $z_{i\cdots}/z_{jk\cdots}$ or $q_{k/n}/q_{j/m}$. The same formalism also applies in the following situations:

(1) CCFs are design or installation weaknesses and remain undetected by periodic tests. Examples include temporary flow strainers left in pipelines during plant construction, and incorrect dimensioning of pipe supports. These may be modelled by constant unavailability values $q_{k/n}$, as illustrated with an auxiliary feedwater system pilot study [8] in 1980 (later called 'basic parameter' modelling).

(2) Failures develop in periodic tests, not only in true demands, and the tests are performed simultaneously (or nearly so) for redundant trains. Failures are discovered and repaired at the tests. Then $q_{k/n}$ is the failure probability of specific $k$ trains per testing episode. Two possible mechanisms are covered by this model: (a) a failure occurs due to a test and is discovered and repaired immediately; from the point of view of a true demand, the faulty state existed over the whole test interval (because if a true demand had occurred at any time after the preceding test, the failure would have appeared at the true demand); (b) an error or a failure occurs at the end of a test mission and remains undetected until the next test; again the faulty state exists for a true demand over a whole test interval, and $q_{k/n}$ is the fraction of time such a failure is present.

One limitation of the above formalism is that it assumes completely identical units and symmetry, whereas in practice nominally identical units can have quite different rates or probabilities. In principle, however, there is no problem in using specific values $z_{ij\cdots}$ explicitly (without the symmetry assumption) if data for them are available. Another feature is that probabilities per demand do not show any dependence on the length of the test interval or on the staggering of the tests. Using fixed probabilities per demand makes it impossible to optimise test intervals: extending the interval to infinity (i.e. terminating the weekly or monthly testing practice that is common at all nuclear power plants) would not change the risk in this formalism. The models in the following sections avoid such drawbacks.

3.2. Simultaneous or consecutive testing

3.2.1. Time-dependent analysis

Consider failures that are not caused by demands but are generated while the system is on standby. The probability of a certain CCF event $Z_{ij\cdots}$ in a small time interval $dt$ is determined by general multiple-failure rates $\lambda_i$, $\lambda_{ij}$, $\lambda_{ijk}$, etc., so that $\lambda_{ij\cdots}\,dt$ is the probability of failure of exactly the components $i, j, \ldots$ in $dt$ due to a common cause. These failures remain latent in a standby system until discovered by a scheduled test. This model is quite general because it assumes no conditions or relations between the multiple failure rates; hence the name GMFR model. One should bear in mind that the rates depend on $n$; e.g. $\lambda_{12}$ is not the same for $n = 2$, 3 and 4. The group size $n$ is indicated separately in the results rather than by an additional sub-index.

The quantity of interest is the probability of the system failed state at the time of an initiator. This is the time-dependent system unavailability, or the probability of failure on demand. It is basically a periodic function when test cycles of length $T$ are repeated.
With simultaneous testing of the components, and counting time $t$ from one test episode to the next, the time-dependent explicit model unavailabilities are

$$
z_{ij\cdots} = P(Z_{ij\cdots}) = \lambda_{ij\cdots} t, \qquad 0 \le t < T, \tag{10}
$$

since throughout this paper we assume the normal situation $\lambda_{ij\cdots} T \ll 1$ for all failure rates. The time-dependent joint probabilities $x_{ij\cdots}$ for the implicit method can be obtained in a straightforward way from Eqs. (1), (3), (5) and (6). In the special case of identical trains and complete symmetry, one can use Eq. (9) for each value of $n$ with

$$
q_{k/n} = \lambda_{k/n} t, \qquad 0 \le t < T, \tag{11}
$$

where $\lambda_{1/n} = \lambda_i$, $\lambda_{2/n} = \lambda_{ij}$, $\lambda_{3/n} = \lambda_{ijk}$, etc.
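Eq. (10) replaces $1 - e^{-\lambda t}$ by $\lambda t$, which presumes $\lambda_{ij\cdots}T \ll 1$. A small sketch, using illustrative values of $\lambda T$ that are not taken from the paper, compares the exact cycle average $1 - (1 - e^{-\lambda T})/(\lambda T)$ of the exponential unavailability with the linear value $\lambda T/2$; the relative difference grows roughly like $\lambda T/3$. The function names are ours.

```python
from math import expm1


def exact_cycle_average(lam_T):
    """(1/T) * integral over [0, T) of (1 - exp(-lam*t)) dt, with lam_T = lam*T."""
    # 1 - exp(-x) = -expm1(-x), which avoids cancellation when x is small
    return 1.0 - (-expm1(-lam_T)) / lam_T


def linear_cycle_average(lam_T):
    """Cycle average of the linear approximation lam*t of Eq. (10): lam*T/2."""
    return lam_T / 2.0


for lam_T in (1e-3, 1e-2, 1e-1, 1.0):  # illustrative values, not from the paper
    exact = exact_cycle_average(lam_T)
    linear = linear_cycle_average(lam_T)
    print(f"lambda*T = {lam_T:g}: exact {exact:.6g}, linear {linear:.6g}, "
          f"overestimate {linear / exact - 1:.2%}")
```

Because $1 - e^{-x} < x$, the linear value is always slightly larger, so the approximation errs on the conservative side.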

3.2.2. Time-average analysis

A detailed time-dependent unavailability is not always needed or economically affordable. Since a true demand (initiating event) can occur with uniform probability at any time during the test interval, it is sufficient to determine the time-average unavailability of the system. This can be obtained by first calculating the time-average unavailabilities of the mcs, and then the system unavailability. This principle applies equally to implicit and explicit models. A cut set consists of a logic product of basic events, but the time-average probability of such a product is not equal to the product of the time-average event probabilities. This is why an explicit model using event probabilities $z_{ij\cdots}$ equal to the time-average probabilities

$$
\frac{1}{T}\int_0^T \left(1 - e^{-\lambda_{ij\cdots} t}\right) dt \approx \frac{1}{2}\lambda_{ij\cdots} T \tag{12}
$$

does not yield correct results for a system. Instead, accurate results can be obtained with the implicit method using time-average values of the joint probabilities in the cut sets and in the system quantification. Substituting the time-dependent unavailability Eq. (11) into Eq. (9) and averaging over a test interval $T$ yields, for the special case of identical components and symmetry,

$$
\begin{aligned}
n = 2:&\quad x_1 = x_2 = \tfrac{1}{2}\lambda_{1/2}T + \tfrac{1}{2}\lambda_{2/2}T, \qquad
x_{12} = \tfrac{1}{2}\lambda_{2/2}T + \tfrac{1}{3}(\lambda_{1/2}T)^2;\\[4pt]
n = 3:&\quad x_1 = x_2 = x_3 = \tfrac{1}{2}\lambda_{1/3}T + \lambda_{2/3}T + \tfrac{1}{2}\lambda_{3/3}T,\\
&\quad x_{12} = x_{13} = x_{23} = \tfrac{1}{2}\lambda_{3/3}T + \tfrac{1}{2}\lambda_{2/3}T + \tfrac{1}{3}(\lambda_{1/3}T)^2,\\
&\quad x_{123} = \tfrac{1}{2}\lambda_{3/3}T + (\lambda_{2/3}T)^2 + (\lambda_{1/3}T)(\lambda_{2/3}T) + \tfrac{1}{4}(\lambda_{1/3}T)^3;\\[4pt]
n = 4:&\quad x_1 = x_2 = x_3 = x_4 = \tfrac{1}{2}\lambda_{1/4}T + \tfrac{3}{2}\lambda_{2/4}T + \tfrac{3}{2}\lambda_{3/4}T + \tfrac{1}{2}\lambda_{4/4}T,\\
&\quad x_{ij} = \tfrac{1}{2}\lambda_{2/4}T + \lambda_{3/4}T + \tfrac{1}{2}\lambda_{4/4}T + \tfrac{1}{3}(\lambda_{1/4}T)^2, \quad 1 \le i < j \le 4,\\
&\quad x_{ijk} = \tfrac{1}{2}\lambda_{3/4}T + \tfrac{1}{2}\lambda_{4/4}T + 2(\lambda_{2/4}T)^2 + (\lambda_{1/4}T)(\lambda_{2/4}T) + \tfrac{1}{4}(\lambda_{1/4}T)^3, \quad 1 \le i < j < k \le 4,\\
&\quad x_{1234} = \tfrac{1}{2}\lambda_{4/4}T + 2(\lambda_{3/4}T)^2 + 4(\lambda_{2/4}T)(\lambda_{3/4}T) + \tfrac{4}{3}(\lambda_{1/4}T)(\lambda_{3/4}T) + (\lambda_{2/4}T)^2\\
&\qquad\quad + \tfrac{3}{2}(\lambda_{1/4}T)^2(\lambda_{2/4}T) + \tfrac{1}{5}(\lambda_{1/4}T)^4.
\end{aligned} \tag{13}
$$

Here $x_{12}$ for $n = 2$, $x_{123}$ for $n = 3$ and $x_{1234}$ for $n = 4$ are consistent with the 1/$n$-system unavailabilities presented in Ref. [12, Table 1]. The next task is to determine consistent probabilities $z_{i\cdots}$ of the explicit model so that the correct time-average system unavailability results. These are obtained from the implicit model (Eq. (13)) by using the transformation Eqs. (2), (4) and (7). The following results may also be verified by Eq. (9), with $q_{1/n}$ replaced by the corresponding $z_i$, $q_{2/n}$ by $z_{ij}$, etc.:

$$
\begin{aligned}
n = 1:&\quad z_1 = \tfrac{1}{2}\lambda_{1/1}T;\\[4pt]
n = 2:&\quad z_1 = z_2 = \tfrac{1}{2}\lambda_{1/2}T, \qquad z_{12} = \tfrac{1}{2}\lambda_{2/2}T + \tfrac{1}{12}(\lambda_{1/2}T)^2;\\[4pt]
n = 3:&\quad z_1 = z_2 = z_3 = \tfrac{1}{2}\lambda_{1/3}T,\\
&\quad z_{12} = z_{13} = z_{23} = \tfrac{1}{2}\lambda_{2/3}T + \tfrac{1}{12}(\lambda_{1/3}T)^2,\\
&\quad z_{123} = \tfrac{1}{2}\lambda_{3/3}T + \tfrac{1}{4}(\lambda_{2/3}T)^2 + \tfrac{1}{4}(\lambda_{1/3}T)(\lambda_{2/3}T);\\[4pt]
n = 4:&\quad z_i = \tfrac{1}{2}\lambda_{1/4}T, \quad i = 1, 2, 3, 4,\\
&\quad z_{ij} = \tfrac{1}{2}\lambda_{2/4}T + \tfrac{1}{12}(\lambda_{1/4}T)^2, \quad 1 \le i < j \le 4,\\
&\quad z_{ijk} = \tfrac{1}{2}\lambda_{3/4}T + \tfrac{1}{4}(\lambda_{2/4}T)^2 + \tfrac{1}{4}(\lambda_{1/4}T)(\lambda_{2/4}T), \quad 1 \le i < j < k \le 4,\\
&\quad z_{1234} = \tfrac{1}{2}\lambda_{4/4}T + \tfrac{1}{3}(\lambda_{3/4}T)(\lambda_{1/4}T) + (\lambda_{3/4}T)(\lambda_{2/4}T) + \tfrac{1}{2}(\lambda_{3/4}T)^2\\
&\qquad\quad + \tfrac{1}{4}(\lambda_{2/4}T)^2 - \tfrac{1}{120}(\lambda_{1/4}T)^4.
\end{aligned} \tag{14}
$$

Equations for non-identical components are given in Appendix C. It is somewhat inconvenient that the basic event probabilities $z_{ij\cdots}$ are not linear in $\lambda_{ij\cdots}T$ when the products of lower-order CCF unavailabilities are not small compared with the linear higher-order terms. Using the simple Eq. (12) instead of Eq. (14) would lead to optimistic results. For example, if a 1/4-system with consecutive testing has no CCFs, the system unavailability term is $x_{1234} = \frac{1}{5}(\lambda_{1/4}T)^4$. A complete explicit model with $z_{ij\cdots}$ from Eq. (14) yields this result. However, if one takes only the linear terms from Eq. (14), the result is $\frac{1}{16}(\lambda_{1/4}T)^4$, too low by a factor of about 3 (16/5 = 3.2).

3.3. Staggered testing with CCF repairs at first failure discovery

As indicated by Eq. (12), the average residence time of any failure is about $T/2$ with simultaneous testing. Uniformly staggered testing of $n$ parallel components means that there is a time delay $T/n$ between consecutive tests of different components, and each component is tested at intervals $T$. With staggered testing, the average residence time of a CCF is generally shorter than with simultaneous testing, especially if there are extra tests and repairs whenever one failure is discovered. Staggering also makes component unavailabilities less mutually correlated than simultaneous testing does.
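To make the point about time averages concrete, the sketch below averages, over one test cycle, the product of the unavailabilities of two components that have independent failures only (rate $\lambda$, unavailability $\lambda \times$ time since the component's last test, as in Eq. (10)). With simultaneous testing the average of the product is $(\lambda T)^2/3$, as in Eq. (13), rather than the product of averages $(\lambda T)^2/4$; with the tests staggered by $T/2$ it drops to $5(\lambda T)^2/24$, the coefficient that appears in the staggered-testing results that follow. The four-component case reproduces the 1/5 versus 1/16 comparison above. Plain midpoint-rule integration is used, and the function names are ours.

```python
def cycle_average(f, T=1.0, steps=100_000):
    """Midpoint-rule average of f(t) over one test cycle 0 <= t < T."""
    h = T / steps
    return sum(f((k + 0.5) * h) for k in range(steps)) / steps


def age(t, offset, T=1.0):
    """Time since the last test of a component tested at offset + m*T."""
    return (t - offset) % T


lam, T = 1.0, 1.0  # units with lam*T = 1; the coefficients scale with (lam*T)**k

two_simultaneous = cycle_average(lambda t: (lam * age(t, 0.0)) * (lam * age(t, 0.0)))
two_staggered = cycle_average(lambda t: (lam * age(t, 0.0)) * (lam * age(t, T / 2)))
two_product_of_averages = (lam * T / 2) ** 2

four_simultaneous = cycle_average(lambda t: (lam * age(t, 0.0)) ** 4)
four_product_of_averages = (lam * T / 2) ** 4

print(two_simultaneous, two_staggered, two_product_of_averages)  # about 1/3, 5/24, 1/4
print(four_simultaneous / four_product_of_averages)              # about 3.2
```

The staggered product is smaller because the two components are never both "old" at the same time, which is the loss of correlation referred to above.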
Derivations of the time-average values $x_{ij\cdots}$ and the corresponding values $z_{ij\cdots}$ with staggered tests are presented in Appendix A for the general case of non-identical components and non-symmetric rates, for $n = 2$, 3 and 4, under the following assumption.

CCF identification and repair assumption (CIRA): A CCF of components $i, j, \ldots$ is discovered, identified and repaired at the time of the very first test in which any of these components is tested. Other components (not failed by the same CCF) are not repaired at that time.

This group-repair policy does not disturb the mutual independence of the events, which is a key condition for an analytic solution. The results will be justified later for a more realistic policy. When the components are identical and the failure rates symmetric ($\lambda_i = \lambda_{1/n}$, $\lambda_{ij} = \lambda_{2/n}$, $\lambda_{ijk} = \lambda_{3/n}$, $\lambda_{1234} = \lambda_{4/n}$), the probabilities of Appendix A reduce to

$$
n = 2:\quad
\begin{aligned}
x_i &= \tfrac{1}{2}\lambda_{1/2}T + \tfrac{1}{4}\lambda_{2/2}T, \quad i = 1, 2,\\
x_{12} &= \tfrac{1}{4}\lambda_{2/2}T + \tfrac{5}{24}(\lambda_{1/2}T)^2,\\
z_i &= \tfrac{1}{2}\lambda_{1/2}T, \quad i = 1, 2,\\
z_{12} &= \tfrac{1}{4}\lambda_{2/2}T - \tfrac{1}{24}(\lambda_{1/2}T)^2;
\end{aligned}
$$

$$
n = 3:\quad
\begin{aligned}
x_i &= \tfrac{1}{2}\lambda_{1/3}T + \tfrac{5}{9}\lambda_{2/3}T + \tfrac{1}{6}\lambda_{3/3}T, \quad i = 1, 2, 3,\\
x_{ij} &= \tfrac{1}{6}\lambda_{3/3}T + \tfrac{5}{18}\lambda_{2/3}T + \tfrac{2}{9}(\lambda_{1/3}T)^2, \quad i, j = 1, 2, 3,\ j \neq i,\\
x_{123} &= \tfrac{1}{6}\lambda_{3/3}T + \tfrac{1}{3}\lambda_{2/3}\lambda_{1/3}T^2 + \tfrac{2}{9}(\lambda_{2/3}T)^2 + \tfrac{1}{12}(\lambda_{1/3}T)^3,\\
z_i &= \tfrac{1}{2}\lambda_{1/3}T, \quad i = 1, 2, 3,\\
z_{ij} &= \tfrac{5}{18}\lambda_{2/3}T - \tfrac{1}{36}(\lambda_{1/3}T)^2, \quad i, j = 1, 2, 3,\ j \neq i,\\
z_{123} &= \tfrac{1}{6}\lambda_{3/3}T - \tfrac{1}{12}\lambda_{2/3}\lambda_{1/3}T^2 - \tfrac{1}{108}(\lambda_{2/3}T)^2;
\end{aligned}
$$

$$
n = 4:\quad
\begin{aligned}
x_i &= \tfrac{1}{2}\lambda_{1/4}T + \tfrac{7}{8}\lambda_{2/4}T + \tfrac{9}{16}\lambda_{3/4}T + \tfrac{1}{8}\lambda_{4/4}T, \quad i = 1, 2, 3, 4,\\
x_{12} &= x_{14} = x_{23} = x_{34} = \tfrac{1}{8}\lambda_{4/4}T + \tfrac{3}{8}\lambda_{3/4}T + \tfrac{5}{16}\lambda_{2/4}T + \tfrac{23}{96}(\lambda_{1/4}T)^2,\\
x_{13} &= x_{24} = \tfrac{1}{8}\lambda_{4/4}T + \tfrac{3}{8}\lambda_{3/4}T + \tfrac{1}{4}\lambda_{2/4}T + \tfrac{5}{24}(\lambda_{1/4}T)^2,\\
x_{123} &= x_{124} = x_{134} = x_{234} = \tfrac{1}{8}\lambda_{4/4}T + \tfrac{3}{16}\lambda_{3/4}T + \tfrac{59}{128}(\lambda_{2/4}T)^2 + \tfrac{3}{8}\lambda_{1/4}\lambda_{2/4}T^2 + \tfrac{3}{32}(\lambda_{1/4}T)^3,\\
x_{1234} &= \tfrac{1}{8}\lambda_{4/4}T + \tfrac{7}{32}(\lambda_{3/4}T)^2 + \tfrac{19}{32}\lambda_{3/4}\lambda_{2/4}T^2 + \tfrac{29}{96}\lambda_{3/4}\lambda_{1/4}T^2 + \tfrac{3}{16}(\lambda_{2/4}T)^2\\
&\quad + \tfrac{107}{384}(\lambda_{2/4}T)(\lambda_{1/4}T)^2 + \tfrac{251}{7680}(\lambda_{1/4}T)^4,\\
z_i &= \tfrac{1}{2}\lambda_{1/4}T, \quad i = 1, 2, 3, 4,\\
z_{12} &= z_{14} = z_{23} = z_{34} = \tfrac{5}{16}\lambda_{2/4}T - \tfrac{1}{96}(\lambda_{1/4}T)^2,\\
z_{13} &= z_{24} = \tfrac{1}{4}\lambda_{2/4}T - \tfrac{1}{24}(\lambda_{1/4}T)^2,\\
z_{ijk} &= \tfrac{3}{16}\lambda_{3/4}T + \tfrac{5}{256}(\lambda_{2/4}T)^2 - \tfrac{1}{16}(\lambda_{2/4}T)(\lambda_{1/4}T), \quad k > j > i,\\
z_{1234} &= \tfrac{1}{8}\lambda_{4/4}T + \tfrac{1}{128}(\lambda_{3/4}T)^2 - \tfrac{1}{16}\lambda_{3/4}\lambda_{2/4}T^2 - \tfrac{7}{96}\lambda_{3/4}\lambda_{1/4}T^2 - \tfrac{9}{128}(\lambda_{2/4}T)^2 - \tfrac{1}{1920}(\lambda_{1/4}T)^4.
\end{aligned}
$$
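The $n = 2$ CIRA results can be checked numerically under a linear (rare-event) unavailability model. In the sketch below, component 1 is tested at $0, T, 2T, \ldots$ and component 2 at $T/2, 3T/2, \ldots$; a single failure of a component is present with probability $\lambda_{1/2} \times$ (time since its last test), and the CCF $Z_{12}$, repaired under CIRA at the first test of either component, is present with probability $\lambda_{2/2} \times$ (time since the last test of either). Cycle-averaging gives $x_1$, $x_2$ and $x_{12}$, and Eq. (2) then gives $z_1$ and $z_{12}$, which should agree with the closed forms above to within higher-order terms. The values of $\lambda T$ are hypothetical, and the rare-event sums $x_1 \approx z_1 + z_{12}$ and $x_{12} \approx z_{12} + z_1 z_2$ at each instant are our simplifying assumption.

```python
def cycle_average(f, T=1.0, steps=100_000):
    """Midpoint-rule average of f(t) over one test cycle 0 <= t < T."""
    h = T / steps
    return sum(f((k + 0.5) * h) for k in range(steps)) / steps


T = 1.0
l1T, l2T = 1e-2, 2e-3  # hypothetical lambda_{1/2} T and lambda_{2/2} T
l1, l2 = l1T / T, l2T / T


def age(t, offset):
    """Time since the last test of a component tested at offset + m*T."""
    return (t - offset) % T


def ccf_age(t):
    """CIRA: Z12 is repaired at the first test of either component."""
    return min(age(t, 0.0), age(t, T / 2))


# Rare-event (linear) unavailabilities at time t, averaged over one cycle
x1 = cycle_average(lambda t: l1 * age(t, 0.0) + l2 * ccf_age(t))
x2 = cycle_average(lambda t: l1 * age(t, T / 2) + l2 * ccf_age(t))
x12 = cycle_average(lambda t: l2 * ccf_age(t) + (l1 * age(t, 0.0)) * (l1 * age(t, T / 2)))

# Eq. (2): explicit-model probabilities from the time-average joint probabilities
z1 = (x1 - x12) / (1 - x2)
z12 = (x12 - x1 * x2) / (1 - x1 - x2 + x12)

print(z1, 0.5 * l1T)                    # compare with (1/2) lambda_{1/2} T
print(z12, 0.25 * l2T - l1T ** 2 / 24)  # compare with (1/4) lambda_{2/2} T - (1/24)(lambda_{1/2} T)^2
```

The negative $(1/24)(\lambda_{1/2}T)^2$ correction comes out of Eq. (2) because the staggered product average $5/24$ is smaller than the product of averages $1/4$.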

It is interesting to note that the explicit probabilities are not necessarily symmetric ($z_{12} \neq z_{13}$ for $n = 4$) even if the rates are symmetric (i.e. $\lambda_{12} = \lambda_{13} = \lambda_{2/4}$). Related earlier results are scarce in the literature, but one can at least verify that $x_{12}$ for $n = 2$ equals the 1-out-of-2:G system time-average unavailability $\bar{U}_{1/2}$ in Ref. [12, Table 3]. Similarly, $x_{123}$ for $n = 3$ equals $\bar{U}_{1/3}$, and $x_{1234}$ for $n = 4$ equals $\bar{U}_{1/4}$ in Ref. [12, Table 3].

Approximate expressions can be obtained by considering the average residence times of each failure combination under the testing scheme. The residence time of a CCF ends with the very first test that finds a failed component belonging to the CCF group. Under the condition $\lambda_{i\cdots}T \ll 1$, the explicit model event unavailabilities derived in this way are equal to the linear terms of Appendix A for the general (non-symmetric) case:

n$1:

zi . ð1=2Þli T;

n¼2:

z12 . ð1=4Þl12 T;

n¼3:

i ¼ 1; 2; …; n;

zij . ð5=18Þlij T;

1 # i , j # 3;

z123 ¼ ð1=6Þl123 T;

n¼4: 8 z12 . ð5=16Þl12 T; > > > > > z . ð5=16Þl34 T; > < 34 z13 . ð1=4Þl13 T; > > > > zijk . ð3=16Þlijk T; > > : 1 # i , j , k # 4;

z23 . ð5=16Þl23 T; z14 . ð5=16Þl14 T

ð15Þ

z24 . ð1=4Þl24 T; z1234 . ð1=8Þl1234 T:

These are accurate only if the products of lower order CCF unavailabilities are small compared to the linear higher order terms. Comparing with the results for simultaneous testing, Eq. (14), one can conclude that staggered testing is better for all k/n-systems ($k < n$). The complete ($n$-fold) CCF terms are three to four times smaller with staggered testing than with consecutive testing, for $n = 3$ and 4.

One might consider the above CIRA assumption somewhat unrealistic or academic, as it requires each CCF to be recognised perfectly while some other failures are left unrepaired. A more realistic policy is as follows:

Extra testing and repair rule (ETRR): Whenever a component is found failed, all the other $n - 1$ trains are also tested and all failed components are repaired.

With this rule the residence times of all failures are equal to or shorter than with CIRA. Consequently, all values $x_{ij\cdots}$ and $z_{ij\cdots}$ are somewhat smaller with ETRR than with CIRA. The difference is small because multiple events are rare in a test cycle ($\lambda_{ij\cdots}T \ll 1$), and therefore only a small fraction of failures is not repaired at the first opportunity with CIRA. Furthermore, most system failures are discovered and removed at the first opportunity under CIRA as well. Thus, the results obtained above (and in Appendix A) can be used as good, slightly conservative approximations under the realistic policy ETRR.

3.4. Staggered testing without extra tests

Another possibility with staggered testing is as follows:

Individual testing and repair policy (ITRP): Components are tested and repaired individually at regular intervals $T$. No other component is tested immediately even if one is found to be failed.

Exact analysis is rather complicated because a triple failure changes to a double failure in one test/repair, and a double failure to a single failure, for example. Detailed results are derived in Appendix B for the general case of non-identical components and non-symmetric rates for $n = 2$ and 3. For $n = 4$, reasonable linear terms for the explicit model are obtained with the concept of residence times, replacing rather tedious exact calculations. The residence time approach takes into account, for example, that due to individual repairs $\lambda_{134}$ contributes not only to $z_{134}$ but also to $z_{13}$, $z_{14}$, $z_{34}$ and even $z_1$, $z_3$ and $z_4$. Taking into account the linear terms in Appendices A and B and the effect of rare events ($\lambda_{ij\cdots}T \ll 1$), the following conclusions can be drawn:

• For all k/n-systems with $k < n$, $n = 2, 3, 4$, the linear probabilities $z_{ij\cdots}$ are the same with ITRP and CIRA, except for the following cases under ITRP:

$$\begin{aligned}
n = 3:\quad & z_{ij} = \tfrac19\lambda_{123}T + \tfrac{5}{18}\lambda_{ij}T, \quad 1 \le i < j \le 3;\\
n = 4:\quad & z_{12} = \tfrac{5}{16}\lambda_{12}T + \tfrac18(\lambda_{123} + \lambda_{124})T + \tfrac{1}{16}\lambda_{1234}T;\\
& z_{13} = \tfrac14\lambda_{13}T + \tfrac{1}{16}(\lambda_{123} + \lambda_{134})T;\\
& z_{14} = \tfrac{5}{16}\lambda_{14}T + \tfrac18(\lambda_{124} + \lambda_{134})T + \tfrac{1}{16}\lambda_{1234}T;\\
& z_{23} = \tfrac{5}{16}\lambda_{23}T + \tfrac18(\lambda_{123} + \lambda_{234})T + \tfrac{1}{16}\lambda_{1234}T;\\
& z_{24} = \tfrac14\lambda_{24}T + \tfrac{1}{16}(\lambda_{124} + \lambda_{234})T;\\
& z_{34} = \tfrac{5}{16}\lambda_{34}T + \tfrac18(\lambda_{134} + \lambda_{234})T + \tfrac{1}{16}\lambda_{1234}T;\\
& z_{ijk} = \tfrac{3}{16}\lambda_{ijk}T + \tfrac{1}{16}\lambda_{1234}T, \quad 1 \le i < j < k \le 4.
\end{aligned} \tag{16}$$

Again, some probabilities are not symmetric ($z_{12} \ne z_{13}$) even if the rates are symmetric. The additional linear terms in ITRP compared to CIRA do not contribute much in 1/n-systems, but they can be significant in 2/3-, 2/4- and 3/4-systems.


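Table 1. The coefficients $a_{m/n} \approx U_{m/n}(T)/T$

| System m/n | Consecutive testing | Staggered testing with extra tests (CIRA) | Staggered testing and repair individually (ITRP) |
|---|---|---|---|
| 1/2 | $\tfrac12\lambda_{2/2}$ | $\tfrac14\lambda_{2/2}$ | $\tfrac14\lambda_{2/2}$ |
| 1/3 | $\tfrac12\lambda_{3/3}$ | $\tfrac16\lambda_{3/3}$ | $\tfrac16\lambda_{3/3}$ |
| 2/3 | $\tfrac12\lambda_{3/3} + \tfrac32\lambda_{2/3}$ | $\tfrac16\lambda_{3/3} + \tfrac56\lambda_{2/3}$ | $\tfrac12\lambda_{3/3} + \tfrac56\lambda_{2/3}$ |
| 1/4 | $\tfrac12\lambda_{4/4}$ | $\tfrac18\lambda_{4/4}$ | $\tfrac18\lambda_{4/4}$ |
| 2/4 | $\tfrac12\lambda_{4/4} + 2\lambda_{3/4}$ | $\tfrac18\lambda_{4/4} + \tfrac34\lambda_{3/4}$ | $\tfrac38\lambda_{4/4} + \tfrac34\lambda_{3/4}$ |
| 3/4 | $\tfrac12\lambda_{4/4} + 2\lambda_{3/4} + 3\lambda_{2/4}$ | $\tfrac18\lambda_{4/4} + \tfrac34\lambda_{3/4} + \tfrac74\lambda_{2/4}$ | $\tfrac58\lambda_{4/4} + 2\lambda_{3/4} + \tfrac74\lambda_{2/4}$ |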
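Table 1 can be reproduced mechanically from the linear explicit-model terms: to first order, an m/n-system (m-out-of-n:G) fails when at least $n - m + 1$ trains are failed, so $a_{m/n}$ is the sum of the linear coefficients of all CCF events of that size or larger. The Python sketch below assumes consecutive testing contributes $\tfrac12\lambda_S T$ to every $z_S$ (Appendix C), CIRA contributes the Eq. (15) coefficients, and ITRP adds the Eq. (16) cross terms. The function names are hypothetical, and the ITRP single-event terms are omitted because they do not enter $a_{m/n}$ for $m < n$.

```python
from fractions import Fraction as F
from itertools import combinations

DIAGONAL_PAIRS = ((1, 3), (2, 4))  # n = 4 pairs tested half an interval apart


def base_coeff(n, S, scheme):
    """Coefficient of lambda_S * T in z_S: 1/2 for consecutive testing
    (Appendix C), Eq. (15) for staggered testing (CIRA and ITRP)."""
    if scheme == "consecutive" or len(S) == 1:
        return F(1, 2)
    if n == 2:
        return F(1, 4)
    if n == 3:
        return F(5, 18) if len(S) == 2 else F(1, 6)
    if len(S) == 2:
        return F(1, 4) if S in DIAGONAL_PAIRS else F(5, 16)
    return F(3, 16) if len(S) == 3 else F(1, 8)


def linear_terms(n, scheme):
    """{event S: {rate set R: c}} such that z_S ~ sum of c * lambda_R * T."""
    events = [S for r in range(1, n + 1) for S in combinations(range(1, n + 1), r)]
    z = {S: {S: base_coeff(n, S, scheme)} for S in events}
    if scheme == "ITRP":  # extra cross terms of Eq. (16); single events omitted
        for S in events:
            if n == 3 and len(S) == 2:
                z[S][(1, 2, 3)] = F(1, 9)
            elif n == 4 and len(S) == 2:
                diagonal = S in DIAGONAL_PAIRS
                for R in combinations(range(1, 5), 3):
                    if set(S) <= set(R):
                        z[S][R] = F(1, 16) if diagonal else F(1, 8)
                if not diagonal:
                    z[S][(1, 2, 3, 4)] = F(1, 16)
            elif n == 4 and len(S) == 3:
                z[S][(1, 2, 3, 4)] = F(1, 16)
    return z


def a_coefficient(m, n, scheme):
    """Symmetric-rate a_{m/n} as {k: coefficient of lambda_{k/n}}.

    An m/n-system fails when n - m + 1 or more trains are failed; to linear
    order only single CCF events of at least that size contribute.
    """
    out = {}
    for S, terms in linear_terms(n, scheme).items():
        if len(S) >= n - m + 1:
            for R, c in terms.items():
                out[len(R)] = out.get(len(R), 0) + c
    return out


for scheme in ("consecutive", "CIRA", "ITRP"):
    print(scheme, {k: str(c) for k, c in sorted(a_coefficient(3, 4, scheme).items())})
```

Keying the result by CCF multiplicity $k$ mirrors the table's use of the symmetric rates $\lambda_{k/n}$.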

3.5. Economic optimisation

As a simple example of optimisation, consider a standby m/n-system designed to respond to initiating events (demands) that occur with frequency $f$. The cost of testing one train is $C_T$, and the cost of a repair is $C_R$. The system fails to respond properly with (average) probability $U_{m/n}(T)$, and in such a case an accident would occur with cost $C_A$. Assuming that failures can occur only when the system is up (available), the total average cost rate is

$$c(T) = \frac{nC_T}{T} + fU_{m/n}(T)C_A + \left[1 - U_{m/n}(T)\right]\sum_{k=1}^{n} k\binom{n}{k}\lambda_{k/n}C_R, \tag{17}$$

because $L_{k/n} = \binom{n}{k}\lambda_{k/n}$ is the total rate of k-multiple failures, each of which requires $k$ repairs. Since $U_{m/n}(T)$ is a regular increasing function of the test interval $T$, an optimal $T$ exists that minimises $c(T)$. To get an idea about the role of different parameters in this optimisation, consider a typical case in which the linear common cause terms dominate the system unavailability,

$$U_{m/n}(T) \approx a_{m/n}T, \tag{18}$$

where the constant factor $a_{m/n}$ depends on the system configuration and testing scheme. The values of $a_{m/n}$ can be calculated using Eqs. (13)–(16). For example, $U_{2/3} = z_{123} + z_{12} + z_{13} + z_{23} + z_1z_2 + z_1z_3 + z_2z_3$ in terms of the explicit model and rea. In the symmetric case, the results are as listed in Table 1. These results show that staggered testing under ITRP without extra tests has no advantage over consecutive testing for 2/3-, 2/4- and 3/4-systems, but is as good as CIRA (ETRR) for 1/n-systems. Different kinds of modelling can show that staggering is generally beneficial, even without extra tests, to avoid repeating possible human errors [7]. With Eqs. (17) and (18) the optimum test interval is

$$T = \left[\frac{nC_T}{a_{m/n}\left(fC_A - \sum_{k=1}^{n} k\binom{n}{k}\lambda_{k/n}C_R\right)}\right]^{1/2}. \tag{19}$$
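Eq. (19) follows from setting $dc/dT = 0$ in Eq. (17) with $U_{m/n}(T) = a_{m/n}T$. The Python sketch below evaluates the cost rate and the closed-form optimum side by side, so the optimum can be checked numerically. The function names and all numbers in the usage lines (rates per hour, costs in arbitrary currency units) are hypothetical illustrations, not data from this study.

```python
import math


def repair_rate_sum(n, lam):
    """Sum over k of k * C(n, k) * lam[k]: expected repairs per unit up-time.

    lam[k] holds the symmetric rate lambda_{k/n}, k = 1..n.
    """
    return sum(k * math.comb(n, k) * lam[k] for k in range(1, n + 1))


def cost_rate(T, n, a, f, C_T, C_A, C_R, lam):
    """Eq. (17) with the linear approximation U = a * T of Eq. (18)."""
    U = a * T
    return n * C_T / T + f * U * C_A + (1.0 - U) * repair_rate_sum(n, lam) * C_R


def optimal_interval(n, a, f, C_T, C_A, C_R, lam):
    """Closed-form optimum, Eq. (19); math.inf when testing never pays off."""
    margin = f * C_A - repair_rate_sum(n, lam) * C_R
    if margin <= 0.0:  # repair cost rate >= accident cost rate
        return math.inf
    return math.sqrt(n * C_T / (a * margin))


# Hypothetical 2/3-system under CIRA: a_{2/3} = (1/6) lam_{3/3} + (5/6) lam_{2/3}.
lam = {1: 1e-5, 2: 1e-7, 3: 1e-7}
a = lam[3] / 6 + 5 * lam[2] / 6
T_opt = optimal_interval(3, a, 1e-4, 1000.0, 1e9, 5000.0, lam)
print(T_opt, cost_rate(T_opt, 3, a, 1e-4, 1000.0, 1e9, 5000.0, lam))
```

Returning `math.inf` makes explicit the case noted in the text, where no periodic testing is worthwhile.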

The optimal $T$ typically increases with redundancy ($n$) and with the testing cost $C_T$. It decreases with increasing initiator frequency $f$ and accident cost $C_A$. It is not necessarily sensitive to the repair cost, even if repairs can be significant in Eq. (17). Eq. (19) shows that no testing makes sense (optimal $T = \infty$) if the repair cost rate with the system up is not smaller than the accident cost rate with the system down, i.e. if $\sum_{k=1}^{n} k\binom{n}{k}\lambda_{k/n}C_R \ge fC_A$. If there is an administrative upper limit for $U_{m/n}$, it may dictate an interval $T$ shorter than Eq. (19). In practice the plant-level optimisation problem is often more involved, with multiple systems, initiators and intervals [9]. If the unavailability is modelled with probabilities per demand $q_{k/n}$ independent of $T$, Eq. (17) yields the unrealistic optimal test interval $T = \infty$.

3.6. Implicit versus explicit modelling

Let us now compare some of the features of the implicit and explicit modelling techniques. As an example, consider a 2/3-system. The unavailability with the implicit model and rea is $U_{2/3} = x_{12} + x_{13} + x_{23}$. With staggered testing and CIRA, this is $\tfrac12\lambda_{3/3}T + \tfrac56\lambda_{2/3}T$, which exceeds the correct value in Table 1, $\tfrac16\lambda_{3/3}T + \tfrac56\lambda_{2/3}T$, and approaches three times that value when $\lambda_{3/3} \gg \lambda_{2/3}$. Only with higher order terms does the implicit model equation $U_{2/3} = x_{12} + x_{13} + x_{23} - 2x_{123}$ yield the same accuracy as the explicit model with rea, $U_{2/3} = z_{123} + z_{12} + z_{13} + z_{23} + z_1z_2 + z_1z_3 + z_2z_3$. Within rea the explicit model is generally more accurate than the implicit model. Even if the implicit model generally has fewer terms (mcs) than the explicit model, it cannot be trusted to yield accurate enough results without higher order terms.

3.7. Synthesis of models

It is quite possible that a system has both demand-related CCFs and time-related CCFs. This can be taken into account in fault tree models by replacing a basic event $Z_{ij\cdots}$ by a union (OR-gate) $Z_{ij\cdots} = Z^{(0)}_{ij\cdots} \cup Z^{(1)}_{ij\cdots}$, where $z^{(0)}_{ij\cdots} = P[Z^{(0)}_{ij\cdots}]$ is given by Eq. (8), while $z^{(1)}_{ij\cdots} = P[Z^{(1)}_{ij\cdots}]$ is given by Eqs. (14)–(16) (or Appendices A, B and C), depending on the testing and repair policies. It remains a future effort to deepen empirical data evaluations enough to identify and separate such causes, allowing estimation of parameters for both models at the same time.


It is also possible that multiple CCF susceptibilities or types of causes are involved. A group of components may share the same environment, while a subset of the components is susceptible to similar maintenance, etc. One could include all types of causes in each rate $\lambda_{ij\cdots}$, or one could model different cause types separately as different failure modes of each component and introduce OR-gates to combine CCF events of different cause types.

4. Estimation of GMFR

Estimation of the GMFR parameters $\lambda_{ij\cdots}$ from unambiguous empirical failure data is presented briefly for a single plant in Section 4.1 and for multiple plants in Section 4.2. A method to handle assessment uncertainties is presented in Section 4.3, making it possible to use generally available empirical Bayes estimation tools even when the data are tainted with uncertainties.

Notation:
$N_{k/n}$: number of events in which any k out of n components fail due to a common cause in the plant (system) under study
$N_{k/n}(l)$: $N_{k/n}$ of plant (system) $l$
$T_n$: observed exposure time of a group of n components in the plant (system) under study
$T_n(l)$: $T_n$ of plant (system) $l$

4.1. Data on plant under study

Consider first a single plant under study, having event data $N_{k/n}$ available on system size $n$ over observation time $T_n$. Under the assumption that failures are rare events over a test interval, $L_{k/n}T_n \ll 1$, the observable $N_{k/n}$ in time $T_n$ obeys a Poisson distribution with mean value $L_{k/n}T_n$. As a compromise between the classical upper and lower confidence limits, one has

$$L_{k/n} = \binom{n}{k}\lambda_{k/n} \sim \frac{\chi^2(2N_{k/n} + 1)}{2T_n}, \tag{20}$$

where $\chi^2(\nu)$ denotes a chi-square variable with $\nu$ degrees of freedom. In other words, $L_{k/n}$ has a gamma distribution with mean value $(N_{k/n} + 1/2)/T_n$ and variance $(N_{k/n} + 1/2)/T_n^2$. In the Bayesian formalism, the same result is obtained as a posterior distribution when the likelihood is Poisson and the prior density is non-informative, proportional to $L_{k/n}^{-1/2}$. For the distribution of $\lambda_{k/n}$, one can multiply $T_n$ by $\binom{n}{k}$ in the above equations, or divide the mean value and the standard deviation of $L_{k/n}$ by $\binom{n}{k}$.

4.2. Data on multiple plants

Consider a family of plants with the same redundancy $n$ in the system of interest. When the plant under study has limited experience of CCF events, there is a motivation to utilise data from other plants to estimate CCF rates. A simplistic way would be to assume complete identity of plants and lump together all experience, using the sum of all k/n-events as $N_{k/n}$ and the sum of observation times as $T_n$ with the estimation technique of Section 4.1. A more realistic way is to use an empirically oriented Bayes estimation (EBE) method to combine data from several sources and estimate a source (prior) distribution from which the plant/system-specific data is a random sample. If the prior of $L_{k/n}$ is a gamma density with mean $a/b$ and variance $a/b^2$, the plant-specific posterior has mean value $(N_{k/n} + a)/(T_n + b)$ and variance $(N_{k/n} + a)/(T_n + b)^2$. There are several ways to estimate $a$ and $b$ based on data $\{N_{k/n}(l), T_n(l)\}$ collected on $L$ plants, $l = 1, 2, \ldots, L$, among them an easy moment-matching method [14–16]. The same procedure applies for $\lambda_{k/n}$ when each $T_n(l)$ is replaced with $\binom{n}{k}T_n(l)$.

4.3. Uncertain and soft data

Often event data at a plant have interpretation or observation uncertainties, so that one can only assign weights $w_{k/n}(\nu, l)$ indicating how likely it is that event number $\nu$ in a system with $n$ redundant components at plant $l$ had exactly $k$ components failed by a common cause. The mean value and the variance of a plant-specific estimator of $L_{k/n}$ have been derived in Ref. [10] in terms of the weights for the observed events at the plant. However, such data are not directly suitable input to most current EBE processes. The EBE codes usually require as input the data pairs $[N_{k/n}(l), T_n(l)]$, which only exist in case of unambiguous data. Fortunately, with uncertainties one can determine 'effective' data pairs $[\hat N_{k/n}(l), \hat T_{k/n}(l)]$ for each plant $l$ so that $[\hat N_{k/n}(l) + d]/\hat T_{k/n}(l)$ and $[\hat N_{k/n}(l) + d]/\hat T_{k/n}(l)^2$ correspond to the plant-specific mean value and variance obtained with the weights, respectively. By equating the moments one can derive the following equivalent data pairs:

$$\hat N_{k/n} = \frac{\left(\sum_{\nu=1}^{N_n} w_\nu\right)^2 + d\sum_{\nu=1}^{N_n} w_\nu^2}{\sum_{\nu=1}^{N_n} w_\nu(2 - w_\nu) + d}, \qquad \hat T_n = \frac{\sum_{\nu=1}^{N_n} w_\nu + d}{\sum_{\nu=1}^{N_n} w_\nu(2 - w_\nu) + d}\,T_n, \tag{21}$$

where $N_n$ is the true total number of observations at plant $l$ in time $T_n$, $w_\nu = w_{k/n}(\nu, l)$, and $d$ is a user-specified parameter between 0 and 1, normally $d = 1/2$. These effective values can be used as data in the empirical Bayes estimation process described in Section 4.2 to arrive at the final posterior estimate for the plant of interest. If all weights are either 1 or 0, Eq. (21) yields the actual time $T_n$ and the actual number of events. Numerical examples of a pilot study using this technique are given elsewhere [17,18].
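The estimation steps of Sections 4.1 to 4.3 can be sketched in a few lines of Python. The sketch uses the fact that $\chi^2(2N+1)/(2T_n)$ is a gamma variable with shape $N + 1/2$ and rate $T_n$, implements Eq. (21) directly, and applies the gamma-prior update of Section 4.2 with given prior parameters $a, b$ (it does not estimate $a, b$ from multi-plant data). The function names, weights and prior values are hypothetical. The test assertions check that the effective pair reproduces the mean $(\sum w_\nu + d)/T_n$ and variance $(\sum w_\nu(2 - w_\nu) + d)/T_n^2$ implied by Eq. (21); those closed forms are our reading of Eq. (21), not quoted from Ref. [10].

```python
import math


def jeffreys_gamma_moments(N, T_n):
    """Eq. (20): L_{k/n} ~ chi2(2N + 1) / (2 T_n), i.e. gamma(shape N + 1/2, rate T_n).

    Returns (mean, variance) of L_{k/n} for N events observed in time T_n.
    """
    shape = N + 0.5
    return shape / T_n, shape / T_n**2


def to_lambda(mean_L, var_L, n, k):
    """Rescale moments of L_{k/n} to lambda_{k/n} = L_{k/n} / C(n, k)."""
    c = math.comb(n, k)
    return mean_L / c, var_L / c**2  # the standard deviation scales by 1/c


def gamma_posterior_moments(N, T_n, a, b):
    """Section 4.2: gamma prior (mean a/b, variance a/b**2) updated with Poisson data."""
    return (N + a) / (T_n + b), (N + a) / (T_n + b) ** 2


def effective_data(weights, T_n, d=0.5):
    """Eq. (21): effective (N_hat, T_hat) for one plant from event weights.

    weights[nu] is w_{k/n}(nu, l), the assessed probability that observed
    event nu at this plant had exactly k of n components failed by a CCF.
    """
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    denom = sum(w * (2.0 - w) for w in weights) + d
    N_hat = (s1 * s1 + d * s2) / denom
    T_hat = (s1 + d) / denom * T_n
    return N_hat, T_hat


# Hypothetical plant: three events with uncertain 2/3 classification, observed
# over 50 000 group-hours; the prior parameters a, b are made up for illustration.
N_hat, T_hat = effective_data([1.0, 0.5, 0.2], 5.0e4)
mean_L, var_L = gamma_posterior_moments(N_hat, T_hat, a=0.8, b=4.0e4)
print(N_hat, T_hat, to_lambda(mean_L, var_L, n=3, k=2))
```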


5. Comments and conclusions

Analytical relationships and expressions have been developed for the basic event probabilities of explicit fault tree models and for the joint probabilities of implicit models when CCFs can occur at random times in redundant standby safety systems. Results are now available for three different testing schemes: (1) simultaneous (consecutive), (2) staggered without extra tests, and (3) staggered with extra tests and repairs. Earlier results have been developed further by general transformation equations, (a) incorporating higher order non-linear terms, and (b) extending equations to general failure rates (without the symmetry assumption). A limited number of earlier results was available to compare with and confirm the new results.

An important advantage of the GMFR model is that it allows comparative studies of different testing schemes and optimisation of test intervals. The optimal interval typically increases with redundancy and with the testing cost. It decreases with increasing demand frequency and accident cost.

Comparisons of implicit and explicit methods have shown that even if the implicit model has fewer terms (mcs), it cannot be trusted to be accurate enough within rea. Because most fault tree codes use rea and assume s-independence of the basic events, explicit modelling is recommended for large systems.

A technique has been developed for incorporating assessment uncertainties in common empirical Bayes estimation formalisms to combine information from many plants. The method is practical but approximate, as it is based on matching the first two moments of uncertainty distributions. The procedure was limited to a group of plants (systems) that have the same redundancy n. How to extrapolate and use data between component groups of different sizes is a separate issue for future studies with alternative assumptions [12].
Other well-known CCF parameters such as β- or α-factors or Multiple Greek Letters could easily be presented in terms of GMFR model rates and probabilities (and they would be different for different testing and repair policies). This was not considered necessary because the CCF event probabilities as derived here are what a fault tree computer code can use directly as input, without first computing other ratios or parameters.

As much as has been written about CCF and explicit parametric models [13], it seems that complete testing- and timing-dependent probabilities have never been available or used so far for CCF event probabilities in a full-scale PSA. A simplified form has been used [14,16] by including plant-specific individual single failure events ($Z_i$) and one 'macro' basic event ($Z_s$) that fails the system. For a 1/3-system the probability of $Z_s$ would be approximately $z_{123}$, for a 2/3-system $z_{123} + z_{12} + z_{13} + z_{23}$, etc.; generally, for an m/n-system, $\Pr(Z_s) \approx a_{m/n}T$ with $a_{m/n}$ given by Table 1. This system failure rate model has fewer parameters to be estimated from scarce field data. A more general hybrid method would be quite possible: to use individual rates and probabilities for single failures (possibly based on plant-specific data) and a symmetry assumption for CCFs (to combine scarce data effectively). More specialised models can be used in the future when more CCF data become available through international cooperation, such as the international common cause data exchange program being developed under OECD/NEA.

Appendix A. Event probabilities for staggered testing with CCF identification and repair

With a uniformly staggered testing scheme in a system with $n$ trains, the first component is tested at $t = 0, T, 2T, \ldots$, the second component at $t = T/n,\ T/n + T,\ T/n + 2T, \ldots$, and the third one at $t = 2T/n,\ 2T/n + T,\ 2T/n + 2T, \ldots$ The tests are marked by dots on the time-lines in Figs. 3 and 4. The following assumption is made when calculating the joint failure probabilities of any combinations of failures: all components failed in a CCF are repaired completely at the first test in which any of these failures is discovered. The saw-tooth curves in Figs. 3 and 4 are the CCF unavailabilities

$u_{ij\cdots}(t) = P\{$failed state of components $i, j, \ldots$ at time $t$ due to a cause failing exactly these components and no others$\}$.

Fig. 3. Staggered testing scheme for $n = 2$ trains. Single failure unavailabilities $u_1(t)$ and $u_2(t)$; CCF unavailability $u_{12}(t)$.

Each $u_{ij\cdots}(t)$ is determined completely by a single rate parameter as $u_{ij\cdots}(t) = \lambda_{ij\cdots}(t - T_t)$, where $T_t$ is the last test time before $t$ when this failure combination could be discovered. In a periodic system, we can limit our attention to a single period, $0 \le t \le T$. For $n = 2$ (Fig. 3), we can write $u_1(t) = \lambda_1 t$ for the whole period, while

$$\begin{aligned}
u_2(t) &= \lambda_2\left(t + \tfrac12 T\right), & u_{12}(t) &= \lambda_{12}t & &\text{for } 0 \le t < \tfrac12 T,\\
u_2(t) &= \lambda_2\left(t - \tfrac12 T\right), & u_{12}(t) &= \lambda_{12}\left(t - \tfrac12 T\right) & &\text{for } \tfrac12 T \le t < T.
\end{aligned}$$

Similar expressions can be developed for $u_{ij\cdots}(t)$ when $n = 3$ (Fig. 4). These are the time-dependent CCF event probabilities that can be used by some computer codes. However, the task is to find probabilities that yield the correct time-average system unavailability in a single calculation. First, one has to solve the probabilities

$v_{ij\cdots}(t) = P\{$failed state of components $i, j, \ldots$ at time $t$ due to any causes$\}$.

Nothing is presumed about the other components in this definition. The implicit model event probabilities are the time-average values $x_{ij\cdots} = \frac{1}{T}\int_0^T v_{ij\cdots}(t)\,dt$. In case $n = 2$ with small probabilities, we have $v_1 = u_1 + u_{12}$, $v_2 = u_2 + u_{12}$ and $v_{12} = u_{12} + u_1u_2$. The time-average values are

$$x_i = \tfrac12\lambda_i T + \tfrac14\lambda_{12}T, \quad i = 1, 2; \qquad x_{12} = \tfrac14\lambda_{12}T + \tfrac{5}{24}\lambda_1\lambda_2T^2.$$

From these, the transformation Eq. (2) yields the explicit model probabilities

$$z_i = \tfrac12\lambda_i T, \quad i = 1, 2; \qquad z_{12} = \tfrac14\lambda_{12}T - \tfrac{1}{24}\lambda_1\lambda_2T^2.$$

Fig. 4. Staggered testing scheme for $n = 3$ trains. CCF unavailabilities $u_{ij\cdots}(t)$.

Similar principles are used for higher systems. The corresponding explicit model probabilities $z_{ij\cdots}$ are derived for $n = 3$ using Eq. (4), and for $n = 4$ using Eqs. (5)–(7). In summary:

$$\begin{aligned}
n = 2:\quad & x_i = \tfrac12\lambda_i T + \tfrac14\lambda_{12}T, \quad i = 1, 2; \qquad x_{12} = \tfrac14\lambda_{12}T + \tfrac{5}{24}\lambda_1\lambda_2T^2;\\
& z_i = \tfrac12\lambda_i T, \quad i = 1, 2; \qquad z_{12} = \tfrac14\lambda_{12}T - \tfrac{1}{24}\lambda_1\lambda_2T^2.
\end{aligned}$$

$$\begin{aligned}
n = 3:\quad & x_1 = \tfrac12\lambda_1T + \tfrac{5}{18}\lambda_{12}T + \tfrac{5}{18}\lambda_{13}T + \tfrac16\lambda_{123}T,\\
& x_2 = \tfrac12\lambda_2T + \tfrac{5}{18}\lambda_{12}T + \tfrac{5}{18}\lambda_{23}T + \tfrac16\lambda_{123}T,\\
& x_3 = \tfrac12\lambda_3T + \tfrac{5}{18}\lambda_{13}T + \tfrac{5}{18}\lambda_{23}T + \tfrac16\lambda_{123}T,\\
& x_{ij} = \tfrac16\lambda_{123}T + \tfrac{5}{18}\lambda_{ij}T + \tfrac29\lambda_i\lambda_jT^2, \quad i, j = 1, 2, 3,\ j \ne i,\\
& x_{123} = \tfrac16\lambda_{123}T + \tfrac19(\lambda_{12}\lambda_3 + \lambda_{13}\lambda_2 + \lambda_{23}\lambda_1)T^2 + \tfrac{2}{27}(\lambda_{12}\lambda_{13} + \lambda_{12}\lambda_{23} + \lambda_{13}\lambda_{23})T^2 + \tfrac{1}{12}\lambda_1\lambda_2\lambda_3T^3,\\
& z_i = \tfrac12\lambda_iT, \quad i = 1, 2, 3,\\
& z_{ij} = \tfrac{5}{18}\lambda_{ij}T - \tfrac{1}{36}\lambda_i\lambda_jT^2, \quad i, j = 1, 2, 3,\ j \ne i,\\
& z_{123} = \tfrac16\lambda_{123}T - \tfrac{1}{36}(\lambda_{12}\lambda_3 + \lambda_{13}\lambda_2 + \lambda_{23}\lambda_1)T^2 - \tfrac{1}{324}(\lambda_{12}\lambda_{13} + \lambda_{12}\lambda_{23} + \lambda_{13}\lambda_{23})T^2.
\end{aligned}$$
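The $n = 2$ time averages above can be checked numerically by integrating the saw-tooth functions directly. The following Python sketch builds $v_1$, $v_2$ and $v_{12}$ from $u_1$, $u_2$ and $u_{12}$ exactly as written for CIRA and averages them with a simple midpoint rule; the helper names, rates and interval are hypothetical. The midpoint grid is chosen so that no cell straddles the discontinuity at $t = T/2$.

```python
import math


def time_average(fn, T, steps=100_000):
    """Midpoint-rule estimate of (1/T) * integral of fn over [0, T]."""
    h = T / steps
    return sum(fn((i + 0.5) * h) for i in range(steps)) * h / T


def cira_n2(l1, l2, l12, T):
    """v_1, v_2, v_12 for n = 2 under CIRA: component 1 tested at t = 0,
    component 2 at t = T/2, small-probability (rare-event) combinations."""
    def u2(t):  # last test of component 2 was at -T/2 or at T/2
        return l2 * (t + T / 2) if t < T / 2 else l2 * (t - T / 2)

    def u12(t):  # a CCF is cleared by whichever member is tested first
        return l12 * t if t < T / 2 else l12 * (t - T / 2)

    def v1(t):
        return l1 * t + u12(t)

    def v2(t):
        return u2(t) + u12(t)

    def v12(t):
        return u12(t) + (l1 * t) * u2(t)

    return v1, v2, v12


# Hypothetical rates (per hour) and test interval (hours).
T = 100.0
l1, l2, l12 = 1e-3, 2e-3, 1e-4
v1, v2, v12 = cira_n2(l1, l2, l12, T)
x12_closed = 0.25 * l12 * T + 5 / 24 * l1 * l2 * T**2
print(math.isclose(time_average(v12, T), x12_closed, rel_tol=1e-6))
```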

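$$\begin{aligned}
n = 4:\quad x_1 &= \tfrac12\lambda_1T + \tfrac{5}{16}\lambda_{12}T + \tfrac14\lambda_{13}T + \tfrac{5}{16}\lambda_{14}T + \tfrac{3}{16}(\lambda_{123} + \lambda_{124} + \lambda_{134})T + \tfrac18\lambda_{1234}T,\\
x_2 &= \tfrac12\lambda_2T + \tfrac{5}{16}\lambda_{12}T + \tfrac14\lambda_{24}T + \tfrac{5}{16}\lambda_{23}T + \tfrac{3}{16}(\lambda_{123} + \lambda_{124} + \lambda_{234})T + \tfrac18\lambda_{1234}T,\\
x_3 &= \tfrac12\lambda_3T + \tfrac{5}{16}\lambda_{23}T + \tfrac14\lambda_{13}T + \tfrac{5}{16}\lambda_{34}T + \tfrac{3}{16}(\lambda_{123} + \lambda_{134} + \lambda_{234})T + \tfrac18\lambda_{1234}T,\\
x_4 &= \tfrac12\lambda_4T + \tfrac{5}{16}\lambda_{14}T + \tfrac14\lambda_{24}T + \tfrac{5}{16}\lambda_{34}T + \tfrac{3}{16}(\lambda_{124} + \lambda_{134} + \lambda_{234})T + \tfrac18\lambda_{1234}T,\\
x_{12} &= \tfrac18\lambda_{1234}T + \tfrac{3}{16}(\lambda_{123} + \lambda_{124})T + \tfrac{5}{16}\lambda_{12}T + \tfrac{23}{96}(\lambda_1T)(\lambda_2T),\\
x_{14} &= \tfrac18\lambda_{1234}T + \tfrac{3}{16}(\lambda_{124} + \lambda_{134})T + \tfrac{5}{16}\lambda_{14}T + \tfrac{23}{96}(\lambda_1T)(\lambda_4T),\\
x_{23} &= \tfrac18\lambda_{1234}T + \tfrac{3}{16}(\lambda_{123} + \lambda_{234})T + \tfrac{5}{16}\lambda_{23}T + \tfrac{23}{96}(\lambda_2T)(\lambda_3T),\\
x_{34} &= \tfrac18\lambda_{1234}T + \tfrac{3}{16}(\lambda_{134} + \lambda_{234})T + \tfrac{5}{16}\lambda_{34}T + \tfrac{23}{96}(\lambda_3T)(\lambda_4T),\\
x_{13} &= \tfrac18\lambda_{1234}T + \tfrac{3}{16}(\lambda_{123} + \lambda_{134})T + \tfrac14\lambda_{13}T + \tfrac{5}{24}(\lambda_1T)(\lambda_3T),\\
x_{24} &= \tfrac18\lambda_{1234}T + \tfrac{3}{16}(\lambda_{124} + \lambda_{234})T + \tfrac14\lambda_{24}T + \tfrac{5}{24}(\lambda_2T)(\lambda_4T);
\end{aligned}$$

$$\begin{aligned}
x_{123} &= \tfrac18\lambda_{1234}T + \tfrac{3}{16}\lambda_{123}T + \tfrac{1}{384}(35\lambda_{12}\lambda_{13} + 41\lambda_{12}\lambda_{23} + 26\lambda_{12}\lambda_{34} + 29\lambda_{13}\lambda_{23} + 20\lambda_{13}\lambda_{24}\\
&\qquad + 26\lambda_{14}\lambda_{23} + 47\lambda_{12}\lambda_3 + 44\lambda_{13}\lambda_2 + 53\lambda_{23}\lambda_1)T^2 + \tfrac{3}{32}\lambda_1\lambda_2\lambda_3T^3,\\
x_{124} &= \tfrac18\lambda_{1234}T + \tfrac{3}{16}\lambda_{124}T + \tfrac{1}{384}(35\lambda_{14}\lambda_{24} + 41\lambda_{12}\lambda_{14} + 26\lambda_{14}\lambda_{23} + 29\lambda_{12}\lambda_{24} + 20\lambda_{13}\lambda_{24}\\
&\qquad + 26\lambda_{12}\lambda_{34} + 47\lambda_{14}\lambda_2 + 44\lambda_{24}\lambda_1 + 53\lambda_{12}\lambda_4)T^2 + \tfrac{3}{32}\lambda_1\lambda_2\lambda_4T^3,\\
x_{134} &= \tfrac18\lambda_{1234}T + \tfrac{3}{16}\lambda_{134}T + \tfrac{1}{384}(35\lambda_{13}\lambda_{34} + 41\lambda_{14}\lambda_{34} + 26\lambda_{12}\lambda_{34} + 29\lambda_{13}\lambda_{14} + 20\lambda_{13}\lambda_{24}\\
&\qquad + 26\lambda_{14}\lambda_{23} + 47\lambda_{34}\lambda_1 + 44\lambda_{13}\lambda_4 + 53\lambda_{14}\lambda_3)T^2 + \tfrac{3}{32}\lambda_1\lambda_3\lambda_4T^3,\\
x_{234} &= \tfrac18\lambda_{1234}T + \tfrac{3}{16}\lambda_{234}T + \tfrac{1}{384}(35\lambda_{23}\lambda_{24} + 41\lambda_{23}\lambda_{34} + 26\lambda_{23}\lambda_{14} + 29\lambda_{24}\lambda_{34} + 20\lambda_{13}\lambda_{24}\\
&\qquad + 26\lambda_{12}\lambda_{34} + 47\lambda_{23}\lambda_4 + 44\lambda_{24}\lambda_3 + 53\lambda_{34}\lambda_2)T^2 + \tfrac{3}{32}\lambda_2\lambda_3\lambda_4T^3,\\
x_{1234} &= \tfrac18\lambda_{1234}T + \tfrac{7}{192}(\lambda_{123}\lambda_{124} + \lambda_{123}\lambda_{134} + \lambda_{123}\lambda_{234} + \lambda_{124}\lambda_{134} + \lambda_{124}\lambda_{234} + \lambda_{134}\lambda_{234})T^2\\
&\quad + \tfrac{1}{384}\left[\lambda_{123}(20\lambda_{14} + 17\lambda_{24} + 20\lambda_{34} + 29\lambda_4) + \lambda_{124}(17\lambda_{13} + 20\lambda_{23} + 20\lambda_{34} + 29\lambda_3)\right.\\
&\qquad \left. + \lambda_{134}(20\lambda_{12} + 20\lambda_{23} + 17\lambda_{24} + 29\lambda_2) + \lambda_{234}(20\lambda_{12} + 17\lambda_{13} + 20\lambda_{14} + 29\lambda_1)\right]T^2\\
&\quad + \tfrac{1}{192}(13\lambda_{12}\lambda_{34} + 10\lambda_{13}\lambda_{24} + 13\lambda_{14}\lambda_{23})T^2 + \tfrac{1}{1536}(75\lambda_{12}\lambda_3\lambda_4 + 64\lambda_{13}\lambda_2\lambda_4 + 75\lambda_{14}\lambda_2\lambda_3\\
&\qquad + 75\lambda_{23}\lambda_1\lambda_4 + 64\lambda_{24}\lambda_1\lambda_3 + 75\lambda_{34}\lambda_1\lambda_2)T^3 + \tfrac{251}{7680}\lambda_1\lambda_2\lambda_3\lambda_4T^4;
\end{aligned}$$

$$\begin{aligned}
z_i &= \tfrac12\lambda_iT, \quad i = 1, 2, 3, 4;\\
z_{12} &= \tfrac{5}{16}\lambda_{12}T - \tfrac{1}{96}(\lambda_1T)(\lambda_2T), & z_{13} &= \tfrac14\lambda_{13}T - \tfrac{1}{24}(\lambda_1T)(\lambda_3T),\\
z_{14} &= \tfrac{5}{16}\lambda_{14}T - \tfrac{1}{96}(\lambda_1T)(\lambda_4T), & z_{23} &= \tfrac{5}{16}\lambda_{23}T - \tfrac{1}{96}(\lambda_2T)(\lambda_3T),\\
z_{24} &= \tfrac14\lambda_{24}T - \tfrac{1}{24}(\lambda_2T)(\lambda_4T), & z_{34} &= \tfrac{5}{16}\lambda_{34}T - \tfrac{1}{96}(\lambda_3T)(\lambda_4T);
\end{aligned}$$

$$\begin{aligned}
z_{123} &= \tfrac{3}{16}\lambda_{123}T + \tfrac{1}{768}(10\lambda_{12}\lambda_{13} + 7\lambda_{12}\lambda_{23} - 2\lambda_{13}\lambda_{23} - 26\lambda_{12}\lambda_3 - 8\lambda_{13}\lambda_2 - 14\lambda_{23}\lambda_1)T^2,\\
z_{124} &= \tfrac{3}{16}\lambda_{124}T + \tfrac{1}{768}(10\lambda_{14}\lambda_{24} + 7\lambda_{12}\lambda_{14} - 2\lambda_{12}\lambda_{24} - 26\lambda_{14}\lambda_2 - 8\lambda_{24}\lambda_1 - 14\lambda_{12}\lambda_4)T^2,\\
z_{134} &= \tfrac{3}{16}\lambda_{134}T + \tfrac{1}{768}(10\lambda_{34}\lambda_{13} + 7\lambda_{34}\lambda_{14} - 2\lambda_{13}\lambda_{14} - 26\lambda_{34}\lambda_1 - 8\lambda_{13}\lambda_4 - 14\lambda_{14}\lambda_3)T^2,\\
z_{234} &= \tfrac{3}{16}\lambda_{234}T + \tfrac{1}{768}(10\lambda_{23}\lambda_{24} + 7\lambda_{23}\lambda_{34} - 2\lambda_{24}\lambda_{34} - 26\lambda_{23}\lambda_4 - 8\lambda_{24}\lambda_3 - 14\lambda_{34}\lambda_2)T^2,\\
z_{1234} &= \tfrac18\lambda_{1234}T + \tfrac{1}{768}(\lambda_{123}\lambda_{124} + \lambda_{123}\lambda_{134} + \lambda_{123}\lambda_{234} + \lambda_{124}\lambda_{134} + \lambda_{124}\lambda_{234} + \lambda_{134}\lambda_{234})T^2\\
&\quad - \tfrac{1}{768}\left[\lambda_{123}(5\lambda_{14} + 2\lambda_{24} + 5\lambda_{34} + 14\lambda_4) + \lambda_{124}(2\lambda_{13} + 5\lambda_{23} + 5\lambda_{34} + 14\lambda_3)\right.\\
&\qquad \left. + \lambda_{134}(5\lambda_{12} + 5\lambda_{23} + 2\lambda_{24} + 14\lambda_2) + \lambda_{234}(5\lambda_{12} + 2\lambda_{13} + 5\lambda_{14} + 14\lambda_1)\right]T^2\\
&\quad - \tfrac{1}{768}(23\lambda_{12}\lambda_{34} + 8\lambda_{13}\lambda_{24} + 23\lambda_{14}\lambda_{23})T^2 - \tfrac{1}{1920}\lambda_1\lambda_2\lambda_3\lambda_4T^4.
\end{aligned}$$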

The linear terms of each $z_{ij\cdots}$ are equal to Eq. (15). The linear terms are often reasonable approximations for the joint probabilities $x_{ij\cdots}$ and for system unavailabilities.

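Appendix B. Event probabilities with staggered testing and individual repairs

Under the ITRP, each component is repaired individually when found failed in a test. The tests are scheduled in the same way as in Appendix A (Figs. 3 and 4). However, the specific CCF unavailabilities $u_{ij\cdots}(t)$ alone are not as useful as they are with CIRA. It seems to be easier to work with the probabilities $v_{ij\cdots}(t)$ directly, taking into account that testing component 2 does not remove a failed state of component 1 even if both failures are due to a CCF with rate $\lambda_{12}$. Consider one test interval, $0 \le t \le T$, beginning with a test of component 1. For $n = 2$ and small probabilities we have $v_1(t) = (\lambda_1 + \lambda_{12})t$ and

$$\begin{aligned}
v_2(t) &= (\lambda_2 + \lambda_{12})\left(t + \tfrac12 T\right), & v_{12}(t) &= \lambda_{12}t + \lambda_1 t\,\lambda_2\left(t + \tfrac12 T\right) & &\text{for } 0 \le t < \tfrac12 T,\\
v_2(t) &= (\lambda_2 + \lambda_{12})\left(t - \tfrac12 T\right), & v_{12}(t) &= \lambda_{12}\left(t - \tfrac12 T\right) + \lambda_1 t\,\lambda_2\left(t - \tfrac12 T\right) & &\text{for } \tfrac12 T \le t < T.
\end{aligned}$$

The time-averages of these are

$$x_i = \tfrac12(\lambda_i + \lambda_{12})T, \quad i = 1, 2; \qquad x_{12} = \tfrac14\lambda_{12}T + \tfrac{5}{24}\lambda_1\lambda_2T^2,$$

and Eq. (2) yields

$$z_i = \tfrac12\lambda_iT + \tfrac14\lambda_{12}T, \quad i = 1, 2; \qquad z_{12} = \tfrac14\lambda_{12}T - \tfrac{1}{24}\lambda_1\lambda_2T^2.$$

For $n = 3$ it is easy to conclude the average values $x_1 = \tfrac12(\lambda_1 + \lambda_{12} + \lambda_{13} + \lambda_{123})T$, $x_2 = \tfrac12(\lambda_2 + \lambda_{12} + \lambda_{23} + \lambda_{123})T$ and $x_3 = \tfrac12(\lambda_3 + \lambda_{13} + \lambda_{23} + \lambda_{123})T$, but the expressions for $v_{ij}$ and $v_{123}$ are more complicated, e.g.

$$\begin{aligned}
v_{12}(t) &= (\lambda_{123} + \lambda_{12})t + \lambda_1 t\,\lambda_2\left(t + \tfrac23 T\right) & &\text{for } 0 \le t < \tfrac13 T,\\
v_{12}(t) &= (\lambda_{123} + \lambda_{12})\left(t - \tfrac13 T\right) + \lambda_1 t\,\lambda_2\left(t - \tfrac13 T\right) & &\text{for } \tfrac13 T \le t < T.
\end{aligned}$$

Expressions for $v_{13}$ and $v_{23}$ can be obtained by suitable rotations of indices. Time-averaging and transformations by Eq. (4) then yield the joint probabilities and explicit CCF probabilities. The results are summarised below for $n = 3$:

$$\begin{aligned}
x_1 &= \tfrac12(\lambda_1 + \lambda_{12} + \lambda_{13} + \lambda_{123})T, \quad x_2 = \tfrac12(\lambda_2 + \lambda_{12} + \lambda_{23} + \lambda_{123})T, \quad x_3 = \tfrac12(\lambda_3 + \lambda_{13} + \lambda_{23} + \lambda_{123})T,\\
x_{ij} &= \tfrac{5}{18}(\lambda_{ij} + \lambda_{123})T + \tfrac29\lambda_i\lambda_jT^2, \quad 1 \le i < j \le 3,\\
x_{123} &= \tfrac16\lambda_{123}T + \tfrac{4}{27}(\lambda_{12}\lambda_{13} + \lambda_{12}\lambda_{23} + \lambda_{13}\lambda_{23})T^2 + \tfrac19(\lambda_1\lambda_{23} + \lambda_2\lambda_{13} + \lambda_3\lambda_{12})T^2 + \tfrac{1}{12}\lambda_1\lambda_2\lambda_3T^3,\\
z_1 &= \tfrac12\lambda_1T + \tfrac19(2\lambda_{12} + 2\lambda_{13} + \lambda_{123})T,\\
z_2 &= \tfrac12\lambda_2T + \tfrac19(2\lambda_{12} + 2\lambda_{23} + \lambda_{123})T,\\
z_3 &= \tfrac12\lambda_3T + \tfrac19(2\lambda_{13} + 2\lambda_{23} + \lambda_{123})T,\\
z_{ij} &= \tfrac19\lambda_{123}T + \tfrac{5}{18}\lambda_{ij}T - \tfrac{1}{36}\lambda_i\lambda_jT^2, \quad 1 \le i < j \le 3,\\
z_{123} &= \tfrac16\lambda_{123}T - \tfrac{17}{324}(\lambda_{12}\lambda_{13} + \lambda_{12}\lambda_{23} + \lambda_{13}\lambda_{23})T^2 - \tfrac{1}{36}(\lambda_1\lambda_{23} + \lambda_2\lambda_{13} + \lambda_3\lambda_{12})T^2.
\end{aligned}$$

Here $x_{12}$ for $n = 2$ and $x_{123}$ for $n = 3$ are consistent with the 1/n-system results of Table 2 in Ref. [12]. For $n = 4$ the linear terms are obtained with the residence time approach directly for the explicit model events:

$$\begin{aligned}
z_1 &= \tfrac12\lambda_1T + \tfrac{1}{16}(3\lambda_{12} + 4\lambda_{13} + 3\lambda_{14} + 2\lambda_{123} + \lambda_{124} + 2\lambda_{134} + \lambda_{1234})T,\\
z_2 &= \tfrac12\lambda_2T + \tfrac{1}{16}(3\lambda_{12} + 4\lambda_{24} + 3\lambda_{23} + 2\lambda_{124} + \lambda_{123} + 2\lambda_{234} + \lambda_{1234})T,\\
z_3 &= \tfrac12\lambda_3T + \tfrac{1}{16}(3\lambda_{23} + 4\lambda_{13} + 3\lambda_{34} + 2\lambda_{123} + \lambda_{234} + 2\lambda_{134} + \lambda_{1234})T,\\
z_4 &= \tfrac12\lambda_4T + \tfrac{1}{16}(3\lambda_{14} + 4\lambda_{24} + 3\lambda_{34} + 2\lambda_{124} + \lambda_{134} + 2\lambda_{234} + \lambda_{1234})T,\\
z_{12} &= \tfrac{5}{16}\lambda_{12}T + \tfrac{1}{16}(2\lambda_{123} + 2\lambda_{124} + \lambda_{1234})T,\\
z_{13} &= \tfrac14\lambda_{13}T + \tfrac{1}{16}(\lambda_{123} + \lambda_{134})T,\\
z_{14} &= \tfrac{5}{16}\lambda_{14}T + \tfrac{1}{16}(2\lambda_{124} + 2\lambda_{134} + \lambda_{1234})T,\\
z_{23} &= \tfrac{5}{16}\lambda_{23}T + \tfrac{1}{16}(2\lambda_{123} + 2\lambda_{234} + \lambda_{1234})T,\\
z_{24} &= \tfrac14\lambda_{24}T + \tfrac{1}{16}(\lambda_{124} + \lambda_{234})T,\\
z_{34} &= \tfrac{5}{16}\lambda_{34}T + \tfrac{1}{16}(2\lambda_{134} + 2\lambda_{234} + \lambda_{1234})T,\\
z_{ijk} &= \tfrac{3}{16}\lambda_{ijk}T + \tfrac{1}{16}\lambda_{1234}T, \quad 1 \le i < j < k \le 4,\\
z_{1234} &= \tfrac18\lambda_{1234}T.
\end{aligned}$$

Note that $z_i \approx \tfrac12\lambda_iT$ is a sufficient approximation as a single failure probability for all k/n-systems with $k < n$. The other terms contribute very little through multiplications in mcs, because for any mcs with product $z_iz_j$ there is a mcs with $z_{ij}$ in place of $z_iz_j$.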
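Eq. (4) itself is not reproduced in this section, so the following Python sketch rests on an assumption: to first order in $\lambda T$, the transformation from joint to explicit probabilities reduces to the inclusion–exclusion (Möbius) inversion $z_S = \sum_{R \supseteq S} (-1)^{|R| - |S|} x_R$. Applied to the linear parts of the $n = 3$ ITRP joint probabilities listed above, it returns the linear parts of $z_i$, $z_{ij}$ and $z_{123}$, including the cross terms $\tfrac19(2\lambda_{12} + 2\lambda_{13} + \lambda_{123})T$ in $z_1$ caused by individual repairs. The function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations


def mobius_to_explicit(x):
    """First-order inversion z_S = sum over R containing S of (-1)**(|R|-|S|) x_R.

    x maps a component tuple S to {rate tuple: coefficient of lambda * T}.
    """
    z = {}
    for S in x:
        acc = {}
        for R, terms in x.items():
            if set(S) <= set(R):
                sign = (-1) ** (len(R) - len(S))
                for rate, c in terms.items():
                    acc[rate] = acc.get(rate, 0) + sign * c
        z[S] = {rate: c for rate, c in acc.items() if c != 0}
    return z


def itrp_n3_linear_x():
    """Linear parts of the n = 3 ITRP joint probabilities x (Appendix B)."""
    comps = (1, 2, 3)
    pairs = list(combinations(comps, 2))
    x = {}
    for i in comps:
        x[(i,)] = {(i,): F(1, 2), (1, 2, 3): F(1, 2)}
        x[(i,)].update({p: F(1, 2) for p in pairs if i in p})
    for p in pairs:
        x[p] = {p: F(5, 18), (1, 2, 3): F(5, 18)}
    x[(1, 2, 3)] = {(1, 2, 3): F(1, 6)}
    return x


z = mobius_to_explicit(itrp_n3_linear_x())
for event, terms in z.items():
    print(event, {rate: str(c) for rate, c in terms.items()})
```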

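Appendix C. Explicit event probabilities for consecutive and simultaneous testing, non-identical components and non-symmetric CCF rates

With simultaneous or consecutive testing, the probabilities of the general events of the explicit models for $n = 1, 2, 3, 4$ are as follows:

$$\begin{aligned}
n \ge 1:\quad & z_i = \tfrac12\lambda_iT, \quad i = 1, 2, \ldots, n;\\
n \ge 2:\quad & z_{ij} = \tfrac12\lambda_{ij}T + \tfrac{1}{12}(\lambda_iT)(\lambda_jT), \quad 1 \le i < j \le n;\\
n \ge 3:\quad & z_{ijk} = \tfrac12\lambda_{ijk}T + \tfrac{1}{12}(\lambda_{ij}\lambda_{ik} + \lambda_{ij}\lambda_{jk} + \lambda_{ik}\lambda_{jk})T^2\\
&\qquad + \tfrac{1}{12}(\lambda_i\lambda_{jk} + \lambda_j\lambda_{ik} + \lambda_k\lambda_{ij})T^2, \quad 1 \le i < j < k \le n;\\
n = 4:\quad & z_{1234} = \tfrac12\lambda_{1234}T + \tfrac{1}{12}(\lambda_1\lambda_{234} + \lambda_2\lambda_{134} + \lambda_3\lambda_{124} + \lambda_4\lambda_{123})T^2 + \tfrac{1}{12}(\lambda_{12}\lambda_{34} + \lambda_{13}\lambda_{24} + \lambda_{14}\lambda_{23})T^2\\
&\qquad + \tfrac{1}{12}\left[\lambda_{123}(\lambda_{14} + \lambda_{24} + \lambda_{34}) + \lambda_{124}(\lambda_{13} + \lambda_{23} + \lambda_{34}) + \lambda_{134}(\lambda_{12} + \lambda_{23} + \lambda_{24}) + \lambda_{234}(\lambda_{12} + \lambda_{13} + \lambda_{14})\right]T^2\\
&\qquad + \tfrac{1}{12}\left[\lambda_{123}(\lambda_{124} + \lambda_{134} + \lambda_{234}) + \lambda_{124}(\lambda_{134} + \lambda_{234}) + \lambda_{134}\lambda_{234}\right]T^2 - \tfrac{1}{120}\lambda_1\lambda_2\lambda_3\lambda_4T^4.
\end{aligned}$$

References

[1] Roberts NH, Vesely WE, Haasl DF, Goldberg FF. Fault tree handbook. NUREG-0492. US Nuclear Regulatory Commission; 1981.
[2] Anon. PRA procedures guide. NUREG/CR-2300. US Nuclear Regulatory Commission; 1983.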


[3] Dugan JB. Experimental analysis of models for correlation in multiversion software. Proceedings of the Fifth International Symposium on Software Reliability Engineering (ISSRE), New York: IEEE Press; November 1994. p. 36–44.
[4] Fleming KN, Mosleh A. Common-cause data analysis and implications in system modelling. Proc Int Topical Meet Probab Safety Meth Appl, Feb 24–Mar 1, 1985;1:3/1–3/12. EPRI NP-3912-SR.
[5] Vaurio JK. An implicit method for incorporating common-cause failures in system analysis. IEEE Trans Reliab 1998;47(2):173–80.
[6] Vaurio JK. Exact treatment of general dependencies in system fault tree and risk analysis. In: Cottam MP, Harvey DW, Pape RP, Tait J, editors. Proceedings of the Conference ESREL 2000, May 14–17, 2000, Edinburgh, Scotland. Foresight and Precaution, vol. 2. Rotterdam: A.A. Balkema; 2000. p. 1333–40.
[7] Vaurio JK. Modelling and quantification of testing, maintenance and calibration errors in system analysis and risk assessment. Proceedings of ESREL '99, TUM Munich Garching, Rotterdam: A.A. Balkema; Sep 13–17, 1999. p. 663–8.
[8] Vaurio JK. Availability of redundant safety systems with common-mode and undetected failures. Nucl Engng Des 1980;58:415–24.
[9] Vaurio JK. Optimization of test and maintenance intervals based on risk and cost. Reliab Engng Syst Safety 1995;49:23–36.
[10] Vaurio JK. Estimation of common cause failure rates based on uncertain event data. Risk Anal 1994;14:383–7.
[11] Vaurio JK. The effects of testing arrangements on the unavailability of standby systems. Proceedings of PSA '93, Clearwater Beach, FL, vol. 1. USA: American Nuclear Society; Jan 26–29, 1993. p. 654–60.
[12] Vaurio JK. The theory and quantification of common cause shock events for redundant standby systems. Reliab Engng Syst Safety 1994;43:289–305. Note: the equation for $\tilde U_{1/4}$ in Table 1 of Ref. [12] has a typographical error; it is correct in Ref. [11].
[13] Mosleh A, Rasmuson DM, Marshall FM. Guidelines on modeling common-cause failures in probabilistic risk assessment, Appendix E. NUREG/CR-5485, INEEL/EXT-97-01327. US Nuclear Regulatory Commission; November 1998.
[14] Jänkälä KE, Vaurio JK. Residual common cause failure analysis in a probabilistic safety assessment. Proceedings of PSA '93, Clearwater Beach, FL, vol. 2. USA: American Nuclear Society; Jan 26–29, 1993. p. 804–10.
[15] Vaurio JK, Jänkälä KE. Effective empirical parametric estimation of failure rates and event frequencies. Proceedings of the Fifth International Conference on Probabilistic Safety Assessment and Management (PSAM 5), vol. 4. Tokyo: Universal Academy Press; 2000. p. 2143–2148.
[16] Vaurio JK. Procedure for common cause failure assessment, IAEA-SM-321/46. Proceedings of PSA '91, Vienna, Austria. Vienna: International Atomic Energy Agency; 3–7 June 1991. p. 505–15.
[17] Vaurio JK. From failure data to CCF-rates and basic event probabilities. ICDE seminar, Stockholm. Report NEA/CSNI/R(2001)8, OECD; 12–13 June 2001, in preparation.
[18] Vaurio JK, Jänkälä KE. Quantification of common cause failure rates and probabilities for standby-system fault trees using international event data sources. Proceedings (CD) of PSAM 6 Conference, San Juan, PR: Elsevier; 23–28 June 2002.