Probability-informed testing for reliability assurance through Bayesian hypothesis methods

Reliability Engineering and System Safety 95 (2010) 361–368

Curtis Smith a,*, Dana Kelly a, Homayoon Dezfuli b

a Idaho National Laboratory, Idaho Falls, Idaho 83415-3850, USA
b National Aeronautics and Space Administration, Washington, DC 20546-0001, USA

* Corresponding author. E-mail address: [email protected] (C. Smith).


Abstract

Article history: Received 31 January 2009; received in revised form 9 November 2009; accepted 18 November 2009; available online 24 November 2009.

Bayesian inference techniques play a central role in modern risk and reliability evaluations of complex engineering systems. These techniques allow the system performance data and any relevant associated information to be used collectively to calculate the probabilities of various types of hypotheses that are formulated as part of reliability assurance activities. This paper proposes a methodology based on Bayesian hypothesis testing to determine the number of tests that would be required to demonstrate that a system-level reliability target is met with a specified probability level. Recognizing that full-scale testing of a complex system is often not practical, testing schemes are developed at the subsystem level to achieve the overall system reliability target. The approach uses network modeling techniques to transform the topology of the system into logic structures consisting of series and parallel subsystems. The paper addresses the consideration of cost in devising subsystem level test schemes. The developed techniques are demonstrated using several examples. All analyses are carried out using the Bayesian analysis tool WinBUGS, which uses Markov chain Monte Carlo simulation methods to carry out inference over the network.

Keywords: Bayesian inference; Reliability; Hypothesis testing; System analysis; Cost; MCMC; Probability level

1. Introduction

As systems are designed and constructed for application, the question of testing to demonstrate a desired level of reliability frequently arises. For example, in the nuclear power generation industry, regulatory standards mandate that "risk important" components be demonstrably able to achieve their claimed reliability values [1,2]. For space applications, the use of human-crewed space vehicles heightens the desire to achieve an acceptable level of reliability, to provide assurance that fatality risks are low. While this desire for acceptable reliability levels is pervasive, formal methods to demonstrate compliance with target reliability levels are not commonly known or employed. The focus of this paper is to provide such methods, through a theoretical discussion of the Bayesian approach and its application to representative problems. We will begin the discussion by simply posing the following question:

How many trials are needed to show that the unreliability of a device (i.e., the probability it fails on demand) is 0.001 with 50% probability?

The short answer to this question is that we use Bayesian methods [3], via Bayes' Theorem, to determine how many trials (or tests, to use test-engineering terminology) are required to ensure, with probability of at least 50%, that the device unreliability is no larger than 0.001. To see how this approach is useful, we first review the fundamentals of the Bayesian approach in Section 2. In Section 3, we address the problem of determining the number of tests of a single device required for reliability assurance in the case of no prior information. In Section 4, we cover the same problem for the case where prior information is available. In Section 5, we address the analysis methods required when determining Bayesian reliability assurance tests for complex systems. Lastly, we provide conclusions in Section 6.

2. Review of Bayesian concepts


We will begin the presentation of the Bayesian method of induction, or simply the "Bayes" method (occasionally referred to as the Bayes–Laplace method, owing to the early adoption and promotion of the approach by Laplace [16]), by noting that logic theory uses a variety of deductive and inductive methods.






Logic theory relies on two plausibility statements, the so-called "strong" and "weak" statements:

Strong: If A, then B. B is false; thus, A is false.

Weak: If A, then B. B is true; thus, A is more plausible.

where the observed evidence we speak of is contained in event B. In probabilistic models such as those used in reliability or risk assessments, we tend to focus on failures. That is, the observations we track, categorize, and analyze are failures: failures of hardware and of humans. In terms of the logic statements above, when we record failure events, we have observations where B is true, thereby making A more plausible. For example, observations in the weak category include:

Weak: If it rained during the night, then the sidewalk will be wet in the morning. The sidewalk is wet; thus, it is more plausible that it rained last night.

Weak: If failure of a device can be described by a Bernoulli process with a failure-to-start probability of 0.01, then the device will, on average, fail once in a hundred starts. The device failed three times in the last hundred starts; thus, it is more plausible that the failure-to-start probability is larger than 0.01.

The more failures we record, the more information we have related to the associated event A, which represents component or human performance. It turns out that it is the weak logic statement [4] that is used in reliability and probabilistic risk assessment (PRA), where this relationship between observable events and component performance can also be described mathematically by

P(A|B) = P(A) P(B|A) / P(B)    (1)

This equation of logical inference is also known as Bayes' Theorem. If we dissect this equation, we see four parts, as shown in Fig. 1, where D is the data, H is our hypothesis, and X is the general information known prior to having updated information (or data) specific to the problem at hand (in other words, our eXperience). Bayes' Theorem gives the posterior (or updated) distribution for a parameter of interest in terms of the prior distribution and the observed data, and can be written in the continuous case as

p1(θ|x) = f(x|θ) p(θ) / ∫ f(x|θ) p(θ) dθ

In this equation, p1(θ|x) is the posterior distribution for the parameter of interest, denoted θ. The observed data enter via the likelihood function, f(x|θ), and p(θ) is the prior distribution of θ. The denominator of Bayes' Theorem is sometimes denoted f(x) and is called the marginal or unconditional distribution of x. The likelihood, f(x|θ), is often binomial, Poisson, or exponential in PRA and reliability applications.

Priors can be classified broadly as either informative or noninformative. Informative priors, as the name suggests, contain information about the parameter of interest. Noninformative priors, on the other hand, are intended to be objective, letting the data provide most of the information content. Noninformative priors are usually mathematically derived, and there are numerous types in use. For single-parameter problems in PRA and reliability, the Jeffreys prior is usually used [5]. It is convenient because it is (sometimes in a limiting sense) a conjugate prior, so the posterior can be written down by inspection.

In PRA and reliability, we model a variety of probabilistic events: a component fails to start, an operator fails to respond to an alarm, a container fails to withstand an external load, an electric bus fails to remain energized, and so on. While these myriad events describe different situations and different contexts, one underlying feature they all share is that they are quantified via conditional probabilities. In fact, as we will discuss throughout this paper, all probabilities are conditional. Or, in the words of one of the pioneers of engineering reliability, Dr. Barlow:

"In Einstein's theory of relativity, mass, velocity, and time are relative to the observer. The same is true for probability; i.e., probabilities are relative to the observer or analyst. Also, probabilities are conditional, conditional on what is known. For this reason, conditional probabilities play a central role in the (Bayesian) approach [6]."

While on the surface Bayes' Theorem appears quite simple, in many cases the mathematics required to handle the likelihood and conditional probability functions becomes quite involved. In many cases, solutions to Bayes' equation cannot be derived analytically and must be found using numerical techniques. Nonetheless, many standard cases encountered in PRA and reliability have been solved and documented [5,7,8], as have cases in other disciplines [9]. In this paper, however, we extend these applications to Bayesian hypothesis testing and probability-informed testing practices.
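To make the conjugate case concrete: with a beta prior and binomial data, the posterior is again a beta distribution whose parameters are obtained by simple addition, so quantities such as P(p <= 0.001) can be read directly from the posterior. The following is a minimal sketch of this update (ours, not the authors'; it assumes Python with scipy, and the data values are purely illustrative):

from scipy.stats import beta

# Conjugate beta-binomial update: beta(a, b) prior, x failures in n demands
a0, b0 = 0.5, 0.5          # Jeffreys prior
n, x = 100, 0              # illustrative data: 100 failure-free demands
posterior = beta(a0 + x, b0 + n - x)   # posterior is again a beta distribution
print(posterior.cdf(0.001))            # posterior P(p <= 0.001)
print(posterior.mean())                # posterior mean of p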

Fig. 1. Decomposition of Bayes’ Theorem.


3. Bayesian hypothesis testing with little prior information

First, let us return to our original question, but taking a naive approach by interpreting the problem statement, "the unreliability of a device (on demand) is 0.001," as implying that there is no uncertainty in the probability of failure on demand, p. We are looking for the necessary number of trials (denoted by n) to achieve a "comfort" level on the device reliability. The number of failures is assumed to be x = 0 (if the device is as reliable as we claim, we would not expect to see any failures in a reasonable number of tests). The failure probability (or unreliability) on demand is p, which is 0.001. Our "model of the world" [10,11] is binomial and is given by

f(x | p = 0.001) = C(n, x) (0.001)^x (0.999)^(n-x),   x = 0, 1, ..., n,

where C(n, x) is the binomial coefficient. But when x = 0, we can solve this equation directly for n, since f(x = 0 | p = 0.001) = 0.5, where the 0.5 represents our "comfort" level expressed as a probability. Evaluating this expression yields 0.5 = 0.999^n, or n = 693. (Interestingly, if we ran 693 tests with no failures and then computed a classical upper 50% confidence limit, the lower bound being zero for zero failures, we would find that the 50% upper bound is about 0.001.) Consequently, we would require almost 700 tests of this device, without any failures, to be reasonably sure that the unreliability is 0.001. However, this approach is flawed, since the uncertainty in our estimate of p is not considered.

A Bayesian would not solve the problem in the fashion described above. Instead, the question we are trying to ask is: given that p is uncertain in the epistemic sense (which it always is), how many trials n do we have to run with no failures to meet our desired comfort (or, more precisely, probability) level? In Bayesian hypothesis testing, we specify our null hypothesis in terms of the device unreliability, denoted p:

H0: p <= 0.001

The alternative hypothesis is then

H1: p > 0.001

In the Bayesian framework, we must specify a prior distribution for p, which can be based upon previous information about p. In this problem, we have not been provided any past information about p; therefore, we adopt the Jeffreys noninformative prior [12],

p_prior ~ beta(a = 0.5, b = 0.5),

where "~" means "is distributed as." The prior probability that H0 is true (i.e., p <= 0.001) is given by the area under the prior distribution to the left of 0.001. This probability can be shown to be about 2%, and is illustrated in Fig. 2 (note the use of a log scale in the figure, which makes the 2% area appear larger than it actually is).

Now, our next step would normally be to collect data related to p from equipment tests. The model of the world (i.e., the likelihood in Bayes' Theorem) is still (in this case) binomial, based on the underlying Bernoulli process, which requires that the trials be independent and that the probability of failure not change from one trial to another. Also, we assume that no failures (denoted by x) are recorded during the tests (x = 0). Because we started with a beta prior and assumed a binomial model for our likelihood, the posterior distribution will be beta, with parameters

a = 0.5 + x = 0.5 + 0 = 0.5
b = 0.5 + n - x = n + 0.5
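Both quantities quoted above are quick to check numerically. The sketch below (ours, assuming Python with scipy) reproduces the naive count of 693 tests and the roughly 2% prior probability of H0 under the Jeffreys prior:

import math
from scipy.stats import beta

# Naive, no-uncertainty calculation: solve 0.999**n = 0.5 for n
print(math.ceil(math.log(0.5) / math.log(0.999)))  # 693

# Prior probability that H0 (p <= 0.001) holds under the Jeffreys beta(0.5, 0.5)
print(beta.cdf(0.001, 0.5, 0.5))  # about 0.02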

Fig. 2. Probability density function for the Jeffreys prior distribution of p, with the region where p is less than 0.001 highlighted.

Fig. 3. Probability density function for the posterior distribution of p, where we obtain zero failures in 228 trials, with the region where p is less than 0.001 highlighted.

The posterior probability that H0 is true is found in a manner analogous to the prior calculation above, but using the posterior distribution. However, we now put a slight twist on the problem by asking what value of n gives

P(p <= 0.001) = 0.5,

where the probability is calculated using the posterior distribution of p. It can be shown that it takes 228 trials (n = 228) with no failures to ensure, with probability 0.5, that the posterior unreliability of the device is no more than 0.001. The posterior distribution for this case is shown in Fig. 3. Note, however, that by requiring only that we be sure with probability 0.5, we are saying that the posterior odds of H0 being true are 1:1, which is a fairly weak probabilistic statement. In other words, we are also 50% sure that the unreliability of the device is greater than 0.001. If we need to be 90% sure that the unreliability is less than 0.001, it would take 1352 trials without a failure.
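These posterior test counts are easy to verify numerically from the conjugate result above (the posterior is beta(0.5, n + 0.5) after n failure-free trials). A small search sketch, ours, assuming Python with scipy:

from scipy.stats import beta

def trials_needed(target=1e-3, prob=0.5):
    # Smallest n such that P(p <= target) >= prob for the beta(0.5, n + 0.5)
    # posterior resulting from zero failures in n trials (Jeffreys prior)
    n = 0
    while beta.cdf(target, 0.5, n + 0.5) < prob:
        n += 1
    return n

print(trials_needed())           # 228, matching the exact result in the text
print(trials_needed(prob=0.9))   # 1352 trials for 90% assurance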


While we were able to solve the case of the Jeffreys noninformative prior exactly, owing to the conjugate nature of the prior distribution and likelihood function, Bayesian tools are available that can solve this problem in the general case of a nonconjugate prior. For example, WinBUGS [13,14] is able to solve complex hierarchical models, including those investigating testing schemes. WinBUGS (and the open-source version, OpenBUGS, both of which can be run in Microsoft Windows) performs Bayesian inference for a variety of probabilistic problems. The solution method in WinBUGS is Markov chain Monte Carlo (MCMC) sampling. MCMC methods were described in the early 1950s in research into Monte Carlo sampling at Los Alamos [15]. More recently, with the advance of computing power and improved analysis algorithms, MCMC has increasingly been used for a variety of Bayesian inference problems [7]. MCMC is effectively (although not literally) numerical (Monte Carlo) integration by way of Markov chains. Inference is performed by sampling from a target distribution (a specially constructed Markov chain based upon the inference problem) until convergence to the posterior distribution is achieved; additional samples are then drawn from this posterior distribution. These inference problems range from the simple variety, where exact solutions are available (and can be solved using a spreadsheet), to very complex multidimensional and hierarchical problems where no analytical solution is possible.

WinBUGS employs a simple scripting language to specify the model to be evaluated. The script is then compiled to analyze the model with its associated data. To carry out inference for the unreliability hypothesis problem using WinBUGS, we can use the script below:

# Bayesian analysis to determine testing protocol
# Loop through different numbers of trials (n)
model {
  for (i in 1:J) {
    p[i] ~ dbeta(0.5, 0.5)               # Jeffreys prior for p
    failure.prior[i] ~ dbin(p[i], n[i])  # binomial model for observed failures
  }
}
data
list(failure.prior = c(0, 0, 0, 0, 0, 0, 0))           # observe zero failures
list(n = c(50, 100, 150, 200, 250, 300, 350), J = 7)   # n trials

By using this script, we can determine the 50% probability value for the Bayesian hypothesis calculation by monitoring the variable p for the seven different trial counts (ranging from 50 to 350 trials). For this analysis, we used 25,000 samples, discarding the first 15,000 samples to allow for convergence to the posterior distribution (the "burn-in" period). The results of this analysis are shown in Fig. 4, where the numerical approach indicates about the same number of trials (approximately 225) as the exact approach (228).

Fig. 4. Results from the WinBUGS MCMC approach showing the number of trials required to achieve an unreliability of 0.001 at the 50% probability level.
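WinBUGS and OpenBUGS are Windows programs; the same single-n model can also be expressed in a modern open-source MCMC tool such as PyMC. The sketch below is our rough equivalent of the script above for one value of n (not the authors' analysis; it assumes PyMC version 4 or later, and the sampler settings are arbitrary). Its estimate of P(p <= 0.001) should come out near 0.5 for n = 228:

import pymc as pm

# Jeffreys prior on p; observe zero failures in n = 228 demands
with pm.Model():
    p = pm.Beta("p", alpha=0.5, beta=0.5)
    pm.Binomial("x", n=228, p=p, observed=0)
    idata = pm.sample(draws=10_000, tune=15_000, progressbar=False)

post = idata.posterior["p"].values.ravel()
print((post <= 0.001).mean())  # posterior P(p <= 0.001); should be near 0.5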

4. Bayesian hypothesis testing with prior information

While the Bayesian approach to hypothesis testing is viable when there is no prior information, in many real-world situations much prior information is available for the device in question. Under these conditions, the general approach to determining testing requirements does not change. However, a frequently difficult part of the analysis is translating prior information into the probabilistic format necessary to apply Bayes' Theorem. Recall that, in the Bayesian framework, information is encoded via probabilities. Thus, the prior is, in practice, a probability distribution concerning the hypothesis, or our degree of knowledge about possible outcomes related to the hypothesis. The prior is intended to capture a state of knowledge independent of the information provided by data collection.


However, this knowledge does not imply that the device failure we are modeling has this probability distribution as an inherent property. While the Bayesian method has a long history and has come to be well accepted on its theoretical merits, in practice a variety of issues are commonly encountered. For example, it is possible that one's prior information will be misleading or not applicable to the problem at hand (for example, in the case of wishful thinking). Nonetheless, one should evaluate relevant sources of information related to predicting the device unreliability and incorporate that information into the prior distribution. Relevant sources of device unreliability information include:

- Generic databases
- Past unreliability evaluations (e.g., fault tree models, simulation)
- Expert elicitation
- Testing.

One's experience (which makes up the prior) typically comes from a vast (and sometimes vague) set of similar situations, relevant insights, judgment, knowledge, and observations. Ideally, we want to factor in all of the insights available to us when making inference on the unreliability of our device. What we aim to do is take the different sets of information for the device (when available) and use that knowledge as a prior to be updated with test data. Even if our information sources were strictly of the expert-opinion type, these sources can be aggregated. For example, it would be possible (in fact, encouraged) to assign a probability (or weight) to each opinion that is collected; this probability may be based upon the relative knowledge and independence of the expert. To illustrate this mechanism of aggregating relevant information sources, we explore the case where we have equally relevant information sources for the device unreliability. These data sources are expressed as "generic" data obtained from similar devices; we show them in Table 1. From these data sources, we may combine the individual distributions into a single distribution, which then becomes our prior distribution; the cases to be explored are listed in Table 2.


Table 1. Example of informed prior information for the device unreliability.

Database I: Device unreliability is represented by a uniform distribution with a lower bound of 0.0001 and an upper bound of 0.1.
Database II: Device unreliability is represented by a lognormal distribution with a mean of 0.01 and an error factor of 10.
Database III: Device unreliability is represented by a lognormal distribution with a mean of 0.004 and an error factor of 10.

Table 2. Cases to be evaluated for informed prior information.

Case A: Database I and Database II combined with equal weights.
Case B: Database II and Database III combined with equal weights.
Case C: Just Database I.
Case D: Just Database II.
Case E: Just Database III.

Note that this aggregation of data sources into our prior distribution was done independently of any test data collection; we are accounting for our prior experience (which, in this limited example, is represented by the three distributions) by encoding that information into a prior distribution. We will use WinBUGS to evaluate all five cases listed in Table 2. For example, the WinBUGS script for Case A is shown below:

# Bayesian analysis to determine testing protocol
# Loop through different numbers of trials (n)
model {
  for (i in 1:J) {
    P[i,1] ~ dunif(0.0001, 0.1)    # uniform prior for p (Database I)
    P[i,2] ~ dlnorm(-5.58, 0.51)   # lognormal prior for p (Database II)
    p.avg[i] <- P[i,r]             # aggregated prior for p (equal weights)
    failure.prior[i] ~ dbin(p.avg[i], n[i])  # binomial model for observed failures
  }
  r ~ dcat(pi[])  # categorical variable picks the prior (P[i,1] or P[i,2])
}
data
list(failure.prior = c(0, 0, 0, 0, 0, 0, 0))             # observe zero failures
list(n = c(100, 200, 300, 400, 500, 600, 700), J = 7)    # n trials
list(pi = c(0.5, 0.5))

Again, we compile the script and run the analysis for the different numbers of tests in order to determine how many tests are required until the unreliability is known to be 0.001 with a probability of 50%. The results of this analysis for Case A are shown in Fig. 5. As can be seen in the figure, we would require around 500 trials to achieve the 0.001 unreliability at the 50% level.

Fig. 5. Results of the WinBUGS MCMC analysis for Case A showing the number of trials required to achieve an unreliability of 0.001 at the 50% probability level.

In this situation, having prior information makes it more difficult to demonstrate a low unreliability, because our prior information suggests a relatively high unreliability (i.e., p > 0.001). If we look at the information sources, we see that for Database I (the uniform distribution) there is a large probability that the unreliability is above 0.001. Further, Database I does not allow (assigns probability zero to) any values below an unreliability of 0.0001 (the lower end of the uniform distribution), whereas the Jeffreys noninformative prior allows values lower than 0.0001. Note that if we had run a case using just a uniform distribution from, say, 0.0001 to 0.001, we would force the unreliability to be below 0.001, since the prior assigns a probability of zero to values above this threshold. In other words, assigning potential unreliability outcomes zero (or very low) probability in the prior carries a high degree of informational weight that may take significant quantities of data (or trials) to overcome.

The remaining cases (B, C, D, and E) were evaluated, and the results are shown in Fig. 6.

Fig. 6. Results of the WinBUGS MCMC analysis for Cases B, C, D, and E showing the number of trials required to achieve an unreliability of 0.001 at the 50% probability level.
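For readers without WinBUGS, the Case A posterior can also be approximated without MCMC by importance sampling: draw from the equal-weight mixture prior, weight each draw by the binomial likelihood of zero failures in n trials, and resample. The sketch below is our own (assuming Python with numpy; the seed, sample size, and choice of n are arbitrary), not the authors' script. Note that the BUGS dlnorm(-5.58, 0.51) call parameterizes the lognormal by precision, so sigma = 1/sqrt(0.51):

import numpy as np

rng = np.random.default_rng(1)
N = 200_000
n = 500  # failure-free tests to check (Fig. 5 suggests roughly 500 for Case A)

# Equal-weight mixture prior: Database I (uniform) or Database II (lognormal)
comp = rng.integers(0, 2, N)
draws = np.where(comp == 0,
                 rng.uniform(1e-4, 0.1, N),
                 rng.lognormal(-5.58, np.sqrt(1 / 0.51), N))

# Importance weights: binomial likelihood of zero failures in n trials
w = (1.0 - draws) ** n
w /= w.sum()

# Resample to approximate the posterior, then read off the median
post = rng.choice(draws, size=N, replace=True, p=w)
print(np.median(post))  # should be roughly 0.001 when n is near 500

Sweeping n and recording the posterior median should approximately reproduce the curve in Fig. 5.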

For Case B (Database II and Database III combined with equal weights), we see that approximately 275 trials are needed to assure the 0.001 unreliability at a probability of 50%. If the prior information is characterized by Database I alone (Case C), then we require more than 700 trials. For the two lognormal priors (Cases D and E), the mean of the lognormal distribution drives the required number of trials: in Case D, where the prior mean is 0.01, it takes approximately 520 trials to assure the 0.001 unreliability level, whereas in Case E, where the prior mean is 0.004, it takes only approximately 140 tests to reach the 0.001 unreliability level at a probability of 50%.


5. Estimating test requirements for complex systems


In this section, we examine system arrangements of the three components characterized in Section 4, where Component 1 is described by Database I, Component 2 by Database II, and Component 3 by Database III. A goal for failure probability is now set at the system level, and the number of tests required to demonstrate this goal with a specified probability is determined. In other words, we will allocate a number of trials that will provide the desired overall system reliability. We determine the number of trials either with a spreadsheet (in the case of conjugate calculations) or with WinBUGS. Three component arrangements are analyzed: series, parallel, and 2-out-of-3. In each case, failures of the individual components are assumed to occur independently (i.e., we do not consider common-cause failure). The prior distribution for the failure probability of each component is taken from Section 4. Assuming each component is independent, and using the rare-event approximation, the system failure probability for each configuration is given in Table 3 below.

With these prior distributions as inputs, and using the assumption of independence for component failure, a Monte Carlo analysis was used to propagate the input uncertainties through to the system failure probability. The summary results of 100,000 iterations are given in Table 4 below. For the series arrangement, the overall probability of failure is dominated by Component 1, whose mean failure probability is 0.05 (this is also the median value). For the other arrangements, there is a marked asymmetry in the system failure probability distribution, with the mean being significantly greater than the median.

We will perform the analysis to determine, for each system arrangement, how many tests are required to demonstrate that the posterior mean (first) or median (second) is less than the prior mean or median, respectively. Equal numbers of tests are performed on each component. WinBUGS was used to approximate the posterior distribution for the system failure probability under the assumption that no failures at the component level are observed during any of the tests. The results (for checks on both the mean and the median) are shown in Tables 5 and 6 below.

We next examine the case of optimizing the number of tests, subject to cost constraints. The general framework is one in which each test of component i is associated with a cost C_i. If component i undergoes n_i tests, then the total cost is C_tot = Σ_i C_i n_i. The goal is to find the set of n_i that minimizes the system failure probability, subject to a specified upper limit on the total cost, C_tot.

Table 3. System failure probabilities for various component configurations.

Configuration   System failure probability
Series          p1 + p2 + p3
Parallel        p1 p2 p3
2-of-3          p1 p2 + p1 p3 + p2 p3

Table 4. Summary of uncertainty in the system failure prior probability for various component arrangements.

System arrangement   Median failure probability   Mean failure probability
Series               0.06                         0.06
Parallel             2.2 × 10^-7                  2.0 × 10^-6
2-of-3               3.1 × 10^-4                  7.4 × 10^-4
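The propagation step can be reproduced outside WinBUGS with ordinary Monte Carlo sampling. Below is a minimal sketch (ours, assuming Python with numpy, and the usual convention that the error factor is the ratio of the 95th percentile to the median, EF = exp(1.645 sigma)); it should land close to the Table 4 values, up to sampling error:

import numpy as np

rng = np.random.default_rng(1)
N = 100_000  # iterations, as in the paper

# Lognormal(mean m, error factor 10): sigma = ln(10)/1.645, mu = ln(m) - sigma^2/2
sigma = np.log(10) / 1.645
mu = lambda m: np.log(m) - sigma**2 / 2

p1 = rng.uniform(1e-4, 0.1, N)           # Component 1: Database I
p2 = rng.lognormal(mu(0.01), sigma, N)   # Component 2: Database II
p3 = rng.lognormal(mu(0.004), sigma, N)  # Component 3: Database III

systems = {
    "series": p1 + p2 + p3,              # rare-event approximation
    "parallel": p1 * p2 * p3,
    "2-of-3": p1*p2 + p1*p3 + p2*p3,
}
for name, s in systems.items():
    print(f"{name:8s} median={np.median(s):.2e} mean={s.mean():.2e}")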

Table 5. Number of tests needed to demonstrate system failure probability less than the prior mean.

System arrangement   Tests for 90% probability level   Tests for 95% probability level
Series               42                                56
Parallel             10                                27
2-of-3               25                                37

Table 6. Number of tests needed to demonstrate system failure probability less than the prior median.

System arrangement   Tests for 90% probability level   Tests for 95% probability level
Series               42                                56
Parallel             79                                115
2-of-3               55                                77

Table 7. Conjugate beta prior distributions used to optimize the number of tests for each component.

Component   Mean    95th percentile   Beta a   Beta b
1           0.05    0.095             5.5      96.1
2           0.01    0.0375            0.625    59.4
3           0.004   0.015             0.225    59.4

In the case of nonconjugate priors, the system failure probability cannot be written down as a simple function of the n_i; it must be determined numerically. Our goal here is not to present a general algorithm for optimizing the failure probability; rather, we are focusing on the achievement of reliability goals using Bayesian methods. Therefore, for the purposes of optimizing the number of tests of each component, we replaced the nonconjugate priors with beta distributions having the same mean and 95th percentile as the nonconjugate priors. The resulting conjugate priors are shown in Table 7 above. Note that in these calculations we determine the allocation using an "approximate optimum," found with the "Solver" routine in Excel; these approximate-optimum allocations will be called optimum for brevity. The posterior mean of each component's failure probability was used as an input to the equations for system failure probability listed in Table 3. A total cost constraint of $1,000,000 was assumed. The cost per test was assumed to be $10,000 for component 1, $2000 for component 2, and $5000 for component 3. The resulting optimum number of tests for each component in each configuration is shown in Table 8 below.

Table 8. Optimum number of tests for each component in various configurations.

Configuration   Component 1   Component 2   Component 3
Series          84            76            0
Parallel        0             294           82
2-of-3          20            251           58

Because the failure probability for component 1 does not decrease significantly except with very large numbers of tests, and each test of component 1 costs $10,000, it is only beneficial to test component 1 if the system is in a series configuration. Otherwise, the money is better spent on component 2, which has a relatively high prior failure probability that decreases significantly with a moderate number of tests. Component 3 deserves fewer tests than component 2 because its prior failure probability tends to be smaller and each of its tests costs $5000, versus only $2000 for component 2.

Next, the optimum number of tests for each component listed in Table 8 was input to WinBUGS, and the posterior distribution for the system failure probability was obtained, assuming no component failures were observed in any of the tests. The posterior distribution shows the attainable system failure probability under the assumed total cost constraint of $1,000,000 and the assumed costs per test listed above; in other words, demonstrating these results would cost $1,000,000 at the assumed costs per test. The results are shown in Table 9 below.

Table 9. Attainable system failure probabilities with the optimum number of tests.

Configuration   Median         90th           95th
Series          0.016          0.038          0.047
Parallel        5.41 × 10^-8   4.82 × 10^-7   8.40 × 10^-7
2-of-3          8.56 × 10^-5   3.77 × 10^-4   5.41 × 10^-4

A sensitivity study was performed in which the assumed cost per test was changed to $1000 for each component, so the problem becomes how to distribute the 1000 tests that can be afforded across the three components. This changes the optimum number of tests and the attainable results, as shown in Table 10 below. Note that the row totals may not equal 1000, because the number of tests has been rounded down to the nearest integer.

Table 10. Optimum number of tests if each component test costs $1000.

Configuration   Component 1   Component 2   Component 3
Series          693           206           100
Parallel        693           206           100
2-of-3          489           301           208

It is interesting to note that the optimum numbers of tests are now the same for both the series and parallel configurations. In both cases, the largest reduction in system failure probability is obtained by reducing the failure probability of component 1, and this takes a sizeable number of tests, which is now affordable because of the lower cost per test. For the 2-of-3 arrangement, the optimum number of tests for component 1 is smaller than for the series or parallel configuration, because p1 appears in only two of the three terms in the equation for system failure probability, so the impact of reducing p1 is smaller. Table 11 shows the system failure probabilities attainable with these optimum numbers of tests.

Table 11. Attainable system failure probabilities with optimum tests at $1000 per test.

Configuration   Median         90th           95th
Series          0.005          0.011          0.013
Parallel        1.57 × 10^-9   1.61 × 10^-8   2.97 × 10^-8
2-of-3          5.58 × 10^-6   2.36 × 10^-5   3.41 × 10^-5
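The authors performed the allocation with Excel's Solver routine. Purely as an illustration of the same cost-constrained allocation problem, here is a greedy marginal-benefit sketch of our own (assuming Python; it uses the Table 7 conjugate priors, whose posterior mean after n failure-free tests is a/(a + b + n)). Being a heuristic, it need not reproduce the Table 8 allocations exactly:

# Conjugate beta priors from Table 7 and the assumed per-test costs from the text
PRIORS = {1: (5.5, 96.1), 2: (0.625, 59.4), 3: (0.225, 59.4)}
COST = {1: 10_000, 2: 2_000, 3: 5_000}
BUDGET = 1_000_000

def post_mean(i, n):
    a, b = PRIORS[i]
    return a / (a + b + n)  # beta posterior mean after n failure-free tests

def system_p(n, config):
    p = {i: post_mean(i, n[i]) for i in PRIORS}
    if config == "series":
        return p[1] + p[2] + p[3]                 # rare-event approximation
    if config == "parallel":
        return p[1] * p[2] * p[3]
    return p[1]*p[2] + p[1]*p[3] + p[2]*p[3]      # 2-of-3

def greedy_allocation(config):
    # Repeatedly buy the single test with the best risk reduction per dollar
    n, spent = {1: 0, 2: 0, 3: 0}, 0
    while True:
        best_gain, best_i = 0.0, None
        for i in PRIORS:
            if spent + COST[i] > BUDGET:
                continue
            trial = dict(n)
            trial[i] += 1
            gain = (system_p(n, config) - system_p(trial, config)) / COST[i]
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            return n, system_p(n, config)
        n[best_i] += 1
        spent += COST[best_i]

for cfg in ("series", "parallel", "2-of-3"):
    print(cfg, greedy_allocation(cfg))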

6. Conclusions

Frequentist estimates have commonly been used to devise testing schemes; however, the frequentist approach to inference does not allow useful past information to be incorporated. In contrast, the Bayesian approach described in this paper allows the analyst to incorporate additional sources of information and provides probabilities of observable events (tests), and these estimates account for the epistemic uncertainty inherent in the problem. For simple cases, such as determining the number of tests of a single device with a conjugate prior, a spreadsheet can be used to perform the analysis. For a more general solution, we turned to MCMC-based sampling methods, specifically for the cases of informed priors and of complex systems. Lastly, we described an approach in which an external cost constraint was imposed on the testing scheme.

Acknowledgements

This paper has been authored by Battelle Energy Alliance, LLC (BEA) under Contract no. DE-AC07-05ID14517 with the US Department of Energy. The Government and BEA make no express or implied warranty as to the conditions of the research or any intellectual property, generated information, or product made or developed under this technical assistance project, or the ownership, merchantability, or fitness for a particular purpose of the research or resulting product; that the goods, services, materials, products, processes, information, or data to be furnished hereunder will accomplish intended results or are safe for any purpose, including the intended purpose; or that any of the above will not interfere with privately owned rights of others. Neither the Government nor BEA shall be liable for special, consequential, or incidental damages attributed to such research or resulting product, intellectual property, generated information, or product made or delivered under this technical assistance project.

References

[1] Dube DA, Atwood CL, Eide SA, Mrowca BB, Youngblood RW, Zeek DP. Independent verification of the Mitigating Systems Performance Index (MSPI) results for the pilot plants—final report. NUREG-1816, 2005.
[2] NRC Inspection Manual. Inspection Procedure 62706, Maintenance Rule, 1997.
[3] Bayes T. An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society 1763;53:370–418.
[4] Jaynes ET. Probability theory: the logic of science. Cambridge University Press; 2003.
[5] Siu N, Kelly D. Bayesian parameter estimation in probabilistic risk assessment. Reliability Engineering and System Safety 1998;62:89–116.
[6] Barlow RE. Engineering reliability. Philadelphia, PA: American Statistical Association and the Society for Industrial and Applied Mathematics (ASA-SIAM); 1998. p. xix.
[7] Kelly DL, Smith CL. Bayesian inference in probabilistic risk assessment—the current state of the art. Reliability Engineering and System Safety 2009;94:628–643.
[8] NASA/SP-2009-569. Bayesian inference for NASA probabilistic risk and reliability analysis. NASA Scientific and Technical Information Program; 2009.
[9] Brutti P, De Santis F. Robust Bayesian sample size determination for avoiding the range of equivalence in clinical trials. Journal of Statistical Planning and Inference 2008;138(6):1577–91.
[10] Winkler RL. An introduction to Bayesian inference and decision. New York: Holt, Rinehart, and Winston; 1972.
[11] Apostolakis GE. A commentary on model uncertainty. In: Mosleh A, Siu N, Smidts C, Lui C, editors. Proceedings of the workshop on model uncertainty, Center for Reliability Engineering, University of Maryland, College Park, MD, 1995 (also published as Report NUREG/CP-0138, US Nuclear Regulatory Commission, Washington, DC, 1994).
[12] Jeffreys H. Theory of probability. 3rd ed. Oxford University Press; 1961.
[13] Gilks WR, Richardson S, Spiegelhalter DJ. Markov chain Monte Carlo methods in practice; 1996.
[14] Lunn DJ, et al. WinBUGS—a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing 2000;10:325–37.
[15] Wikipedia. Monte Carlo method. http://en.wikipedia.org/wiki/Monte_Carlo_method; 2009.
[16] Laplace PS. A philosophical essay on probabilities. Dover Publications; 1814 (reprinted 1996).