Reliability Engineering and System Safety 94 (2009) 1862–1868
Reliability and validity of risk analysis

Terje Aven, Bjørnar Heide
University of Stavanger, Norway
Article history: Received 18 June 2007; received in revised form 20 May 2009; accepted 14 June 2009; available online 21 June 2009.

Abstract

In this paper we investigate to what extent risk analysis meets the scientific quality requirements of reliability and validity. We distinguish between two types of approaches within risk analysis: relative frequency-based approaches and Bayesian approaches. The former category includes both traditional statistical inference methods and the so-called probability of frequency approach. Depending on the risk analysis approach, the aim of the analysis differs, the results are presented in different ways, and consequently the meanings of the concepts reliability and validity are not the same.

Keywords: Scientific requirements; Reliability and validity; Risk analysis; Probability of frequency approach; Bayesian approaches
1. Introduction

In this paper we discuss the scientific basis of risk analysis. For many years there has been a lively discussion about the scientific platform of statistical analysis in general, the Bayesian–non-Bayesian controversy; see e.g. Lindley [13] and the following discussion. However, there has not been much work on establishing a proper scientific basis for risk analysis. A number of papers address foundational issues of risk analysis, see e.g. Apostolakis [1,2], Kaplan and Garrick [12], Singpurwalla [18,19] and Cooke [9], but we are not aware of much work where fundamental scientific quality requirements such as reliability and validity are discussed in the context of a risk analysis.

Of the few contributions we have found in the literature, we would like to draw attention to Risk Analysis (1981), and in particular Weinberg [20] and Cumming [10]. These authors describe some of the problems of risk analyses, and express considerable skepticism about the scientific reliability and validity of risk analysis. Weinberg notes that "one of the most powerful methods of science – experimental observations – is inapplicable to the estimation of overall risk". Graham [11] writes that the discipline "should (and will) always entail an element of craft-like judgment that is not definable by the norms of verifiable scientific fact", and that "any determination that a risk has been 'verified' is itself a judgment that is made on the basis of standards of proof that are to some extent arbitrary, disputable, and subjective".
We share many of these views on the scientific basis of risk analysis. However, in order to gain more insight into this subject, we need to clarify what constitutes the probabilistic platform of the risk analysis: is it a relative frequency-based type of approach or a Bayesian approach? And we need to clarify what we mean by reliability and validity in this context. The two types of approaches can be characterized as follows [3,6]:

(a) Relative frequency-based approaches: Risk analysis is about estimation of some underlying relative frequency-interpreted probabilities. Such probabilities express the relative fraction of times the event of interest occurs if the situation analysed were hypothetically "repeated" an infinite number of times. The underlying probabilities are unknown and are estimated in the risk analysis. These estimates are uncertain, as there could be large differences between the estimates and the correct (true) probabilities.

(b) Bayesian approaches: Probability is used as a measure of uncertainty about future events and outcomes (consequences), seen through the eyes of the assessor and based on some background information and knowledge. Probability is a subjective measure of uncertainty.

Aven [4] discusses the scientific basis of risk analysis from the Bayesian perspective, arguing that the analysis cannot be judged by reference to the traditional science paradigms alone, such as the natural sciences, social sciences, mathematics and probability theory. The quality of a Bayesian-founded risk analysis cannot be evaluated by reference to the accuracy in describing the
world, since such an analysis is a tool for expressing and communicating the analysts' uncertainty about the world.

In the present paper we address risk analysis in general, covering both relative frequency-based approaches and Bayesian approaches. For the latter type we distinguish between two kinds of implementation:

(b1) Bayesian approaches estimating non-observable parameters. We construct a fictive infinite population of items "similar" to the ones being studied and define performance measures (parameters) based on the average performance of this population, for example mean values and proportions. Such a proportion is referred to as a chance [19]. A main aim of the Bayesian analysis is to estimate these parameters, based on the available knowledge.

(b2) Bayesian approaches that predict observables. We introduce no fictional population. The focus is on future observable quantities such as costs, the number of fatalities, the occurrence of a fatality, etc., and the main aim of the analysis is to predict these quantities and assess the associated uncertainties. Subjective probabilities are used to assess the uncertainties, i.e. to describe the analyst's uncertainty about the future outcomes of these observables.

Our focus is quantitative risk analysis. Compared to Aven [4], we gain new insights by looking into the concepts of reliability and validity. While reliability is concerned with the consistency of the "measuring instrument" (analysts, methods, procedures), validity is concerned with the success at "measuring" what one set out to "measure" in the analysis. More precisely, we make the following definitions:

Reliability: The extent to which the risk analysis yields the same results when repeating the analysis (R).

Validity: The degree to which the risk analysis describes the specific concepts that one is attempting to describe (V).

We will consider different interpretations of "repeating the analysis", including the one where a new analysis is performed with the same purpose and scope, but carried out by other analysts. We say that the risk analysis meets the reliability requirement if it yields the same results when the analysis is repeated, and meets the validity requirement if it describes the specific concepts that one is attempting to describe. Depending on the risk perspective, more specific and detailed interpretations (sub-criteria) of these general definitions can be formulated:

Reliability:
- The degree to which the risk analysis methods produce the same results at reruns of these methods (R1).
- The degree to which the risk analysis produces identical results when conducted by different analysis teams, but using the same methods and data (R2).
- The degree to which the risk analysis produces identical results when conducted by different analysis teams with the same analysis scope and objectives, but no restrictions on methods and data (R3).

Validity:
- The degree to which the produced risk numbers are accurate compared to the underlying true risk (V1).
- The degree to which the assigned subjective probabilities adequately describe the assessor's uncertainties of the unknown quantities considered (V2).
- The degree to which the epistemic uncertainty assessments are complete (V3).
- The degree to which the analysis addresses the right quantities (V4).

In the paper we discuss how appropriate the various requirements are and to what extent they are met. We perform separate analyses of the relative frequency-based approaches and the Bayesian approaches. The relative frequency-based approaches cover traditional statistical inference methods (e.g. point estimators and confidence intervals) and the so-called probability of frequency approach. In this latter approach a distinction is made between (i) relative frequency-interpreted probabilities Pf, reflecting variation within populations (aleatory uncertainty), and (ii) subjective probabilities reflecting the analyst's uncertainty about what the correct relative frequency probabilities are (epistemic uncertainty); see e.g. Kaplan and Garrick [12] and Aven [3]. For the frequency component (i) we may replace the relative frequency-interpreted probabilities by the expected number of occurrences per unit of time (or any other reference), understood as the average number of occurrences when the "experiment" is repeated a large number of times. Considering a suitable unit of time, these expected values are approximately equal to Pf: for a Poisson process with rate λ, the probability of at least one occurrence in time t equals 1 − e^(−λt) ≈ λt for small λt, which is just the expected number of occurrences. The probability of frequency approach is conceptually not the same as the Bayesian approaches (b1), but from a practical risk analysis point of view these approaches are identical, as will be explained later (Sections 3 and 4).

The paper is organized as follows. First, in Section 2, we reflect on which requirements should be specified for a risk analysis to be classified as a scientific analysis. In Sections 3 and 4 we relate these requirements to risk analysis for relative frequency-based approaches and Bayesian approaches, respectively. Section 5 provides some conclusions and final remarks.
2. Scientific requirements for risk analysis

For a risk analysis to be a scientific analysis it is natural to require that it meets the following requirements:

1. The scientific work shall comply with all rules, assumptions, limitations or constraints introduced; the basis for all choices and judgments shall be made clear; and the principles, methods and models shall be subjected to order and system, to ensure that critique can be raised and that the work is comprehensible.
2. The analysis is relevant and useful: it contributes to a development within the disciplines it concerns, and it is useful with a view to solving the "problem(s)" it concerns, or with a view to further development in order to solve them.
3. The analysis and results are reliable and valid.

Requirements 1–2 are based on standard requirements for scientific work [17]. The purpose of risk analysis is to provide decision support by systematizing knowledge to describe and express risk. This is a unique objective for risk analysis. Terminologies have been developed, as well as principles and methods for analysing risks. However, there is no broad consensus in the risk analysis community about the terminology or about the principles and methods to be used. For example, risk is defined and expressed in many different ways. A full discussion of these issues is beyond the scope of this paper, but important aspects are covered by our analysis of requirement 3, the reliability and validity criteria. These relate in particular to how risk is understood and expressed, as well as to how the risk analyses are used.
3. Reliability and validity for the relative frequency-based approaches to risk analysis

In the relative frequency-based approaches to risk analysis, risk is quantified by the combination of the set of possible outcomes and their probabilities, and related expected values. For instance, say that a risk analysis investigates the probability p of an accidental event A. The probability p is defined as the fraction of times the accidental event A occurs if the activity considered were repeated (hypothetically) an infinite number of times. It expresses variation within this constructed infinite population. Hence an infinite population of similar activities needs to be introduced to define a relative frequency-based approach.

Since the true value of p is unknown, it must be estimated. The estimation accuracy (precision) can be described in different ways; the following two are the prevailing approaches:

1. traditional statistical methods such as confidence intervals, and
2. the probability of frequency approach, where epistemic uncertainties about p are expressed by means of subjective probabilities.

The former approach requires that there exist relevant data that can be used to estimate the parameters introduced, in this case p. We may estimate p directly using a set of relevant data on occurrences of the accidental event A, or indirectly using models. In the latter case we may write p = g(q), where q is a vector of parameters representing probabilities and expected values on a more detailed system level, and g is the model. The model g may for example be based on a set of event trees and fault trees. If relevant data exist for q, we can construct estimates q* of q, and through the model g we obtain an estimate p* of p by the equation p* = g(q*).

3.1. Reliability and validity—traditional statistical methods

Now let us address the reliability and validity issue within the context of traditional statistical methods. First let us look closer into the validity criterion V1: the degree to which the produced risk numbers are accurate compared to the underlying true risk. This is obviously an appropriate criterion for a relative frequency-based approach, as it is based on the existence of such a true risk. If we have a substantial amount of data available and these are considered relevant for the estimation of p, i.e. the observations are considered similar to those of the population generating p, statistical theory shows that the estimation error becomes negligible. Hence the results are valid according to the criterion V1. However, in a practical risk analysis setting data are often scarce. If the statistical analysis is based on few data, the estimation accuracy would be poor, as the confidence intervals would be wide. Thus accurate estimation, and high validity according to V1, cannot be obtained.

It is not possible to define precisely what we mean by terms such as "accurate estimation" and "negligible estimation error" without being explicit about the context. However, we may often indicate an order of magnitude. For example, if we estimate p to be 0.10 and the upper and lower confidence bounds are 0.11 and 0.09, respectively, the estimation error would be considered negligible for most applications.

To increase the amount of data, we often extend the relevant population of observations to cover situations that are similar to the one being studied only to a varying degree. This reduces the quality, i.e. the relevancy, of the data, but this aspect would not be
possible to describe by the statistical analysis. If the data are not considered relevant, the statistical analysis cannot be used to check the validity according to criterion V1.

The same type of problem arises in the case of modelling, although the amount of data is often larger on the detailed system level. In this case, however, we should also take into account the uncertainty introduced by using the model g. In this setting there exists a "true" model gT corresponding to the "true" values of p and q. In the analysis we use g, and this introduces a possible error in the risk estimation. Hence validity according to V1 is ensured only if the model uncertainty is negligible.

We conclude that the risk analysis in this case (a relative frequency-based approach founded on traditional statistical methods) meets the validity requirement according to V1 if a large amount of relevant data is available. In other cases, when such data are not available, the V1 criterion is in general not satisfied. The criteria V2 and V3 are not relevant for this case, as there are no epistemic-based assessments and probabilities. Criterion V4 is discussed below.

The amount of relevant data also affects the reliability criterion R, the extent to which the risk analysis yields the same results when repeating the analysis. In the case of a large amount of relevant data, the statistical analyses would show insignificant variations from analysis to analysis for all three interpretations R1–R3. The statistical methods for such applications are largely universal. In other cases, when such data are not available, the R criterion is in general not satisfied. It is, however, not difficult to identify examples where the criterion R (and R1–R3) is met even when rather few data exist. If we have success–failure observations, for example 2 successes out of 10, different analysis teams would all be led to a binomial probability model, and this would give the same estimate and confidence interval. For other situations there may be no obvious analysis method, and different analysis teams are likely to produce different results. If repeating the analysis means drawing a new data sample, this may lead to large variations in the results when the sample is small (relevant for R1 and R3). For example, in the binomial case with 10 trials, the success estimate could show large differences from sample to sample.

In view of V1, the reliability criteria are all considered appropriate for the traditional statistical methods. The aim is to produce risk numbers close to the true risk, and then we should require that the analysis results do not depend on the analysis team and/or the methods and data used.
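To make the binomial example concrete, the following sketch computes the exact (Clopper–Pearson) confidence interval for p; the specific numbers are only illustrative. It shows how wide the interval is with 2 successes out of 10 trials, and how the interval tightens as the amount of relevant data grows.

```python
from scipy.stats import beta

def clopper_pearson(k, n, conf=0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial parameter p."""
    a = (1 - conf) / 2
    lower = beta.ppf(a, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - a, k + 1, n - k) if k < n else 1.0
    return lower, upper

# 2 successes out of 10 trials, as in the example above: a wide interval.
print(clopper_pearson(2, 10))      # approx (0.025, 0.556)

# 200 successes out of 1000 trials: the interval tightens considerably.
print(clopper_pearson(200, 1000))  # approx (0.176, 0.226)
```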
3.2. Reliability and validity—probability of frequency approach

Consider now the probability of frequency approach. This approach recognises the need for uncertainty assessments that also reflect the data relevancy dimension. Following this approach, the analysts express the epistemic uncertainties about the parameters (p and q) using subjective probabilities. If the model g is introduced, an uncertainty distribution over p is derived by propagating the uncertainty distributions of the parameters q through g. This propagation is often carried out using Monte Carlo simulation.

In practice it is difficult to perform a complete uncertainty analysis within this setting. In theory an uncertainty distribution over the total model and parameter space should be established, which is impossible to do. In applications, therefore, only a few marginal distributions on some selected parameters are normally specified, and the uncertainty distributions on the output probabilities consequently reflect only some aspects of the uncertainty. This makes it difficult to interpret the produced uncertainties.
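A minimal sketch of such a Monte Carlo propagation is given below. The model g(q) = q1·q2 and the epistemic distributions for q1 and q2 are hypothetical choices made purely for illustration, not assignments taken from any particular analysis.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_sim = 100_000

# Hypothetical model g: p = g(q) = q1 * q2, where q1 is an initiating-event
# probability and q2 a barrier-failure probability. The epistemic
# distributions below are illustrative choices only.
q1 = rng.lognormal(mean=np.log(0.05), sigma=0.5, size=n_sim)
q2 = rng.beta(2.0, 8.0, size=n_sim)   # mean 0.2

p = q1 * q2                           # propagate the samples through g

print("best estimate (mean of the uncertainty distribution):", p.mean())
print("90% credibility interval:", np.percentile(p, [5, 95]))
```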
The ambition of the probability of frequency approach is to express the epistemic uncertainties of the parameters (p and q), taking into account all relevant factors causing uncertainty. Say that the analysis produces a 90% credibility interval for p equal to [0.01, 0.08]: the analyst is 90% confident that p lies in this interval. The best estimate could be, say, 0.05.

The criterion V1, as well as the reliability criteria, are applicable for the probability of frequency approach, as the basis for the analysis is the true risk, and one strives to reduce the epistemic uncertainties (expressed for example by the credibility intervals) by increasing the knowledge base. However, we must conclude that this validity requirement is not in general met; the analysts may present a narrow credibility interval for p, but that would not preclude the true p from lying outside the interval. If the interval is wide (which is typically the case if all epistemic uncertainties are included, i.e. V3 is met), we cannot claim to have obtained an accurate estimate of p, and hence the validity requirement V1 is not met even if we acknowledge the analysis team to have strong knowledge.

Alternatively, we may consider the probability of frequency approach to be a framework for expressing uncertainties about the true risk. The risk analysis is then not about bounding and reducing uncertainties, but about describing them. Validity should then not be related to the accuracy of the results but rather to the transformation of uncertainties into probabilities, i.e. the probability assignments. According to this line of argument we should focus on the criterion V3, that the uncertainty assessments are complete, as well as the criterion V2: the degree to which the assigned subjective probabilities adequately describe the assessor's uncertainties of the unknown quantities considered.

It is not straightforward to verify that the validity requirement V2 is met, and there is ongoing research and discussion in the literature addressing this issue. It is outside the scope of this paper to give a full account of this research and discussion, but we will highlight some important principles and procedures (Refs. [3,4,9,15]):

(i) Coherent uncertainty assessments are achieved using the rules of probability, including Bayes' theorem for updating assessments in the case of new information.
(ii) Comparisons are made with relevant observed relative frequencies, if available.
(iii) Training in probability assignment is required to make assessors aware of heuristics as well as other problems of quantifying probabilities, such as superficiality and imprecision (which relate to the assessor's possible lack of feeling for numerical values).
(iv) Models are used to simplify the assignment process.
(v) Procedures are used for incorporating expert judgments.
(vi) Accountability: the basis for all probability assignments must be identified.

These principles and procedures provide a basis for establishing a standard for the probability assignments, the aim being to extract and summarise knowledge about the unknown quantities (parameters) using models, observed data and expert opinions. It seems reasonable to say that the requirement V2 is met provided that this standard is followed.
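As a small illustration of principle (i), the sketch below updates a Beta prior for the parameter p with binomial data via Bayes' theorem; both the prior parameters and the data are hypothetical numbers chosen for the example.

```python
from scipy.stats import beta

# Hypothetical prior for the parameter p, with prior mean 0.05.
a_prior, b_prior = 1.0, 19.0

# Hypothetical new information: 3 occurrences in 40 relevant trials.
k, n = 3, 40

# Bayes' theorem with binomial data gives a Beta posterior (conjugacy).
a_post, b_post = a_prior + k, b_prior + (n - k)

print("posterior mean:", a_post / (a_post + b_post))          # approx 0.067
print("90% credibility interval:", beta.ppf([0.05, 0.95], a_post, b_post))
```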
This may, however, be too quick a conclusion. The subjective probabilities could camouflage uncertainties: the assigned probabilities are conditioned on a number of assumptions and suppositions, and they depend on the background knowledge. Uncertainties are often hidden in this background knowledge, and we may consequently question whether the assigned
subjective probabilities adequately describe the assessor's uncertainties of the unknown quantities considered (V2). This issue is discussed by, for example, Mosleh and Bier [16]. They refer to a subjective probability P(A|X), which expresses the probability of the event A given a set of conditions X. As X is uncertain (it is a random variable), a probability distribution for the quantity h(X) = P(A|X) can be constructed. Thus there is uncertainty about the random probability P(A|X). However, we stress that the probability is not an unknown quantity (random variable) for the analyst. To make this clear, let us summarise the setting of subjective probabilities. A subjective probability P(A|K) is conditional on the background knowledge K, and some aspects of this K can be related to X as described by Mosleh and Bier [16]. The analyst has determined to assign his/her probability based on K. If he/she finds that the uncertainty about X should be reflected, he/she would adjust the assigned probability using the law of total probability: P(A|K) = Σx P(A|X = x, K) P(X = x|K). This does not mean, however, that P(A|K) is uncertain, as such a statement would presume that there exists a true probability value. The assessor needs to clarify what is uncertain and subject to the uncertainty assessment, and what constitutes the background knowledge. This is a key point for meeting the criterion V2. From a theoretical point of view one may think it possible (and desirable) to remove all such Xs from K, but in a practical risk assessment context that is impossible. We will always base our probabilities on some type of background knowledge, and often this knowledge cannot be specified using quantities such as X.

Next we address the criterion V4: the degree to which the analysis addresses the right quantities. Is p really the quantity of interest? Our goal is to express the risk of a system, but in the relative frequency-based approaches we are concerned with the average performance of a thought-constructed population of similar situations. Are these quantities meaningful representations of the system being studied? When looking at the total activity of an industry, for example the offshore petroleum activities on the Norwegian Continental Shelf, it is hard to understand the meaning of such a constructed infinite population. If we are to assess uncertainties of average performance quantities of such a population, it is essential that we understand what they mean. Hence the validity requirement V4 can be questioned.

Now let us look at the reliability criterion R: the extent to which the risk analysis yields the same results when repeating the analysis. One may expect that following the standard for probability assignments (i.e. meeting V2) would ensure that the reliability requirement R is met. However, the background knowledge that the assignments are based on need not be exactly the same from analysis to analysis. Hence we would experience differences in the probability assignments, but the differences are not likely to be large if V2 is met. This applies to R1 and R2.
The criterion R3 (the degree to which the risk analysis produces identical (similar) results when conducted by different analysis teams with the same analysis scope and objectives, but no restrictions on methods and data) would in general not be met, as the background information would differ from analysis to analysis, and often this difference could be very large due to different levels of competence, research schools, available tools, etc. We may question the appropriateness of the reliability criteria in this case. Obviously we would require that the results do not depend on the person running the computer calculations, etc., but it should not be an objective to strive for identical results across different analysis teams. According to V2, the aim is to assess uncertainties using subjective probabilities. The background information for these assignments could differ from analysis to analysis, and often this difference could be very large,
as mentioned in the previous paragraph. Reflecting these differences may be considered an important aim of the analysis.
4. Reliability and validity for the Bayesian approaches to risk analysis

As mentioned in Section 1, we distinguish between two ways of implementing the Bayesian perspective:

(b1) Bayesian approaches estimating non-observable parameters. A main aim of the Bayesian analysis is to estimate these parameters based on the available knowledge.

(b2) Bayesian approaches that predict observables. The focus is on future observable quantities such as costs, the number of fatalities, the occurrence of a fatality, etc., and the main aim of the analysis is to predict these quantities and assess the associated uncertainties. Subjective probabilities are used to assess the uncertainties, i.e. to describe the analyst's uncertainty about the future outcomes of these observables.

For category (b1) approaches the quantities in focus are fictional parameters, as in the probability of frequency approach discussed in the previous section. Although Bayesians would not speak about true values of the parameters, the difference in interpretation of the parameters is more of theoretical interest than of practical importance. In the Bayesian approaches (b1) an infinite thought-constructed population of exchangeable random quantities is considered, and the parameters are seen as limits of these random quantities [7]. Hence the probability p defined in the previous section is to be interpreted as the limiting proportion of times the accidental event occurs when considering an increasing population of similar situations. In practice this is the same as in the probability of frequency approach, and we refer to the previous section for the analysis of reliability and validity in this case.

In the following we restrict attention to the Bayesian approaches that predict observables (b2). To discuss the reliability and validity of these approaches we need to summarise their basic ideas. We use a simple example for illustration purposes. We focus on the number of accidental events X in some specified activities. By modelling (for example using event trees and fault trees) we establish a link g between X and observables on a more detailed system level, denoted Z = (Z1, Z2, …, Zm). Here Zj may for example be the number of hazardous situations of a certain type occurring during some activities, or an indicator function which equals 1 if a specific safety barrier fails and 0 otherwise. To illustrate, consider the simple event tree model in Fig. 1.
[Fig. 1. An event tree example. The tree branches on the Z1 hazardous situations, with Z2 = 1/0 (first safety barrier fails/holds) and Z3 = 1/0 (second safety barrier fails/holds).]
In this model Z1 equals the number of hazardous situations occurring, Z2 is equal to 1 if the first safety barrier fails and 0 otherwise, and Z3 is equal to 1 if the second safety barrier fails and 0 otherwise. To distinguish between the first, second, … hazardous situation occurring during the interval considered, we write Z2i and Z3i, i = 1, 2, …. The model g is defined by

X = g(Z) = Σ_{i=1}^{Z1} Z2i Z3i,
since X is given by the number of hazardous situations in which both safety barriers fail. Note that in practice we seldom express the g function explicitly as in this example. The function is implicitly given by the system representation, for example the event tree, the fault tree or the influence diagram.

The quantities X and Z are unknown, and in the analysis we predict these quantities and express the associated uncertainties. First we assess the uncertainties about Z. This is done by assigning subjective probabilities according to the Bayesian paradigm, and through g we establish a probability distribution of X. This probability distribution P(X ≤ x) is conditional on the background information K of the assessor, and we write P(X ≤ x|K). The model g is a part of this background information.

The models are used as tools to obtain insight into the phenomena studied and to express risk. They form part of the conditions and the background knowledge on which the analysis is built. It is obviously important to reflect on how suitable the model is for its objective. In this regard, however, it is not only the model's ability to reflect the real world that matters, but also its ability to simplify complicated features and conditions. Consequently, with respect to the subjective probability P there is no meaning in speaking about model uncertainty, as the model g is a tool used to produce the resulting uncertainty distribution of X. Of course, we may be concerned about the accuracy of the model, but that is another issue. Poor accuracy of the model does not produce uncertainties in the assigned probability of X, as the probability is conditional on the use of this model.

Returning to the event tree example, we establish a prediction of X by the following reasoning. General statistical data show that hazardous situations of this type typically occur 20 times, and hence we obtain the prediction Ẑ1 = 20 of Z1. To assess the first safety barrier performance, consider 10 hazardous situations. Say that the assessor would predict 1 failure; this gives a probability P(Z2 = 1) = 0.1. Now given that the hazardous situation occurs and the first safety barrier fails, how reliable is the second safety barrier? Say that we assign a probability of 0.4 in this case. That means that we predict 4 failures out of 10 cases. Or we could simply say that 0.4 reflects the assessor's uncertainty (degree of belief). The reference is a certain standard such as drawing a ball from an urn: if we assign a probability of 0.4 for an event A, we compare our uncertainty (degree of belief) about A occurring with drawing a red ball from an urn containing 10 balls of which 4 are red [14]. Hence P(Z3 = 1|Z2 = 1) = 0.4, and using the model g we obtain the prediction

X̂ = Ẑ1 · 0.1 · 0.4 = 20 · 0.1 · 0.4 = 0.8.

We thus predict about 1 accidental event for these operations. Next we need to address the associated uncertainties. One way of doing this is to specify a 90% prediction interval for Z1. If the interval is [10, 100], the assessor is 90% certain that the number of hazardous situations Z1 will lie in this interval, given the background information of the analysis. The interval is determined based on the available information, that is, relevant data and expert judgments. From this analysis we obtain a 90% prediction interval for X equal to [0, 4], using the same probabilities for the barrier failures as above.
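A Monte Carlo version of this event tree calculation is sketched below. The gamma-Poisson mixture used for Z1 (mean 20) is an illustrative choice, not the authors' actual assignment; the barrier probabilities are those of the example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_sim = 100_000

# Uncertainty about the number of hazardous situations Z1: a gamma-Poisson
# mixture with mean 2 * 10 = 20 (an illustrative choice only).
lam = rng.gamma(shape=2.0, scale=10.0, size=n_sim)
z1 = rng.poisson(lam)

# An accidental event requires both barriers to fail:
# P(Z2 = 1) = 0.1 and P(Z3 = 1 | Z2 = 1) = 0.4, as in the example.
x = rng.binomial(z1, 0.1 * 0.4)

print("prediction (mean of X):", x.mean())               # close to 0.8
print("90% prediction interval for X:", np.percentile(x, [5, 95]))
```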
More generally, we may specify probability distributions for Z1, and from these we obtain a distribution for X. A parametric probability distribution could also be used to express the analysts' uncertainties about Z1 and X. However, care is needed in the choice of distribution class and in the interpretation of the distribution and its parameters. In our case, one may feel that a natural distribution choice for the number of hazardous situations Z1 is the Poisson distribution, with specified parameter values. But this distribution has a rather small variance (equal to its mean) and may therefore not be appropriate for describing the analysts' uncertainty. Instead, a gamma-Poisson distribution (negative binomial distribution) [5] could be used, as this distribution has a larger variance. Note that there is no correct distribution. The distribution chosen is a subjective assignment describing the analysts' uncertainties. Nonetheless, we should always question whether the assignments are reasonable given the background information. If there are large uncertainties about the number of hazardous situations, it will be hard to justify the use of the Poisson distribution. If the background information is strong, the Poisson distribution may be more adequate [3].

Also for the (b2) approach one may introduce models with unknown parameters. The parameters may not always be directly observable, but they should have meaningful interpretations. If such interpretations cannot be provided, it would make no sense to introduce the model and perform uncertainty assessments of the parameters. The models are just tools to support the predictions and uncertainty assessments of the "high-level" observable quantities (here X).
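The difference in spread between the two distribution classes is easy to see numerically. The sketch below compares 90% intervals for a Poisson and a gamma-Poisson (negative binomial) assignment with the same mean; the mean of 20 matches the example above, while the shape parameter r = 2 is an arbitrary illustrative choice.

```python
from scipy.stats import poisson, nbinom

mean = 20.0

# Poisson with mean 20: the variance is also 20.
print("Poisson 90% interval:", poisson.ppf([0.05, 0.95], mean))

# Gamma-Poisson (negative binomial) with the same mean but shape r = 2:
# variance = mean + mean**2 / r = 220, a much wider distribution.
r = 2.0
p = r / (r + mean)   # scipy's parameterization: nbinom(r, p)
print("negative binomial 90% interval:", nbinom.ppf([0.05, 0.95], r, p))
```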
4.1. Reliability and validity—Bayesian approaches that predict observables

We now discuss to what extent the Bayesian approaches that predict observables meet the reliability and validity criteria. The aim of an analysis based on these approaches is to produce predictions of observable quantities and assess the associated uncertainties. Returning to the example above, we have produced a 90% prediction interval for X equal to [0, 4]; the prediction is 1.

The situation is analogous to the probability of frequency approach. In both situations we use subjective probabilities to express uncertainties. In the probability of frequency case we are uncertain about parameters, whereas for these Bayesian approaches we are uncertain about observables. The validity criterion V1 is not relevant, as no true risk is defined. We refer to Section 3 and the conclusions made in the next section. A remark concerning criterion V4 is in order: the degree to which the analysis addresses the right quantities. As the aim of the analysis is to predict observables and assess the associated uncertainties, this validity requirement is met. The observables directly express the interesting features of the actual system, for example the number of fatalities, the costs, etc.
5. Conclusions and final remarks

The conclusions from the previous two sections are summarized in Table 1. In brief:

- The traditional statistical methods meet the reliability and validity criteria only if a large amount of relevant data is available.
- For the probability of frequency approach, the validity requirement V1 is not in general satisfied, and V2–V4 are questioned: important uncertainty factors may be hidden in the background knowledge (V2), the uncertainty assessments may not be complete (V3), and the analysis focuses on fictional quantities (V4). If we disregard the hidden uncertainties in the background knowledge, the probability of frequency approach may in general meet the validity requirement V2, provided the analysis is based on a set of standards established for such assignments.
- For the probability of frequency approach, the reliability criteria are not in general met. The background knowledge that the assignments are based on would not be exactly the same from analysis to analysis. However, if the methods and data are fixed, the differences from one analysis to another are not likely to be large if V2 is met.
- For the Bayesian perspective we distinguish between Bayesian approaches estimating non-observable parameters (b1) and Bayesian approaches that predict observables (b2). The conclusions for the former are the same as for the probability of frequency approach. As for the probability of frequency approach, the validity requirements V2 and V3 are also questioned for the Bayesian approaches (b2): important uncertainty factors may be hidden in the background knowledge (V2), and the uncertainty assessments may not be complete (V3).
- The reliability criteria are not in general met for the (b2) approaches, similar to the probability of frequency approach. The background knowledge that the assignments are based on need not be exactly the same from analysis to analysis, and the conclusions from the probability of frequency approach apply again.

Our perspective in this paper is based on a "risk analysis science" standpoint. We seek to identify to what extent risk analysis meets the reliability and validity criteria when conducted according to some specific approaches. Some stakeholders may not benefit from an analysis meeting all criteria. For example, a decision-maker who would like to use the risk analysis to verify that a specific arrangement has an acceptable risk may benefit from an analysis which fails to meet the validity requirement V3, i.e. where the epistemic uncertainty assessments are not complete. The opposite situation is also relevant: to demonstrate that an arrangement is not acceptable, more uncertainties should be added, providing wider uncertainty intervals.
Table 1. Summary of reliability and validity analysis.

Approach                                                          R    R1   R2   R3   V    V1   V2   V3   V4
Traditional statistical analysis, large amount of relevant data  Y    Y    Y    Y    Y/N  Y              Y/N
Traditional statistical analysis in other cases                   N    N    N    N    Y/N  N              Y/N
Probability of frequency and Bayesian approaches estimating
  non-observable parameters                                       Y/N  Y/N  Y/N  N    Y/N  N    Y/N  Y/N  Y/N
Bayesian approaches that predict observables                      Y/N  Y/N  Y/N  N    Y/N       Y/N  Y/N  Y

Y indicates that the criterion is met, N that it is not met, and Y/N that it is met under certain conditions. Cells are empty where the criterion is not relevant.
Some philosophers of science seem to claim that scientific methods must produce facts; see e.g. Chalmers [8]. But is science limited to cases where certainty can be claimed, or does science also have a role to play where there are large uncertainties? We argue that risk analysis fulfils some of the basic scientific requirements discussed in Section 2, but key requirements such as reliability and validity are not in general satisfied. Under certain conditions these criteria are met, to a degree that varies with the risk perspective. Our analysis indicates that the Bayesian approaches that predict observables score well compared to the other perspectives. We acknowledge, however, that such a conclusion can be subject to discussion; there are no truths about these issues. It has not been an aim of this paper to "prove" that the (b2) Bayesian approaches are superior to the others. Rather, our goal has been to clarify the scientific basis of the various approaches and test these against the various requirements. Hopefully, our work will initiate a discussion about the fundamentals of risk analysis. Such a discussion is important, but has not been very intense recently.
Acknowledgement

The authors are grateful to three anonymous reviewers for valuable comments and suggestions on earlier versions of this paper.
References

[1] Apostolakis G. The interpretation of probability in probabilistic safety assessments. Reliability Engineering and System Safety 1988;23(4):247–52.
[2] Apostolakis G. The concept of probability in safety assessments of technological systems. Science 1990;250:1359–64.
[3] Aven T. Foundations of risk analysis—a knowledge and decision oriented perspective. New York: Wiley; 2003.
[4] Aven T. Risk analysis and science. International Journal of Reliability, Quality and Safety Engineering 2004;11:1–15.
[5] Barlow RE. Reliability engineering. Philadelphia: SIAM; 1998.
[6] Bedford T, Cooke R. Probabilistic risk analysis: foundations and methods. Cambridge: Cambridge University Press; 2001.
[7] Bernardo J, Smith A. Bayesian theory. New York: Wiley; 1994.
[8] Chalmers AF. What is this thing called science? Buckingham: Open University Press; 2000.
[9] Cooke RM. Experts in uncertainty: opinion and subjective probability in science. New York: Oxford University Press; 1991.
[10] Cumming RB. Is risk assessment a science? Risk Analysis 1981;1:1–3.
[11] Graham JD. Verifiability isn't everything. Risk Analysis 1995;15:109.
[12] Kaplan S, Garrick BJ. On the quantitative definition of risk. Risk Analysis 1981;1:11–27.
[13] Lindley DV. The philosophy of statistics. The Statistician 2000;49:293–337.
[14] Lindley DV. Understanding uncertainty. Hoboken, NJ: Wiley; 2006.
[15] Lindley DV, Tversky A, Brown RV. On the reconciliation of probability assessments (with discussion). Journal of the Royal Statistical Society, Series A 1979;142:146–80.
[16] Mosleh A, Bier VM. Uncertainty about probability: a reconciliation with the subjectivist viewpoint. IEEE Transactions on Systems, Man and Cybernetics, Part A—Systems and Humans 1996;26:303–10.
[17] Research Council of Norway (RCN). Quality in Norwegian research—an overview of terms, methods and means. Oslo; 2000 [in Norwegian].
[18] Singpurwalla ND. Foundational issues in reliability and risk analysis. SIAM Review 1988;30:264–82.
[19] Singpurwalla N. Reliability and risk: a Bayesian perspective. New York: Wiley; 2006.
[20] Weinberg AM. Reflections on risk assessment. Risk Analysis 1981;1:5–7.