Reliability Engineering and System Safety 23 (1988) 285-292
Technical Note

Will the Real Probability Please Stand Up?
ABSTRACT

To the editor's four questions we answer as follows:

1. The philosophical basis is probability as a measure of degree of confidence, and the probability curve as an expression of a state of confidence, state of knowledge, or state of certainty.
2. The strengths of this approach are simplicity, conceptual coherence, the ability to handle any situation involving partially complete, partially relevant, and different kinds of evidence, and simple computational procedures for combining uncertainties.
3. This approach has real impact on the feasibility, interpretability, and usefulness of PSA for decision purposes.
4. Lack of understanding of this approach has led to misuse and, especially, to non-use of PSA; this has led to wrong decisions contributing to disasters such as Chernobyl, Challenger, Piper Alpha, Henderson, etc.
1 INTRODUCTION

As you say, the meaning of the word probability is an unresolved issue in probabilistic safety assessment (PSA). I certainly hope, though, that it is not destined to remain so, and I applaud your personal efforts, as reflected in your letter, to move the matter toward resolution.

The meaning of this word has been the subject of vigorous debate since the time of Laplace and Bayes. Indeed, the linguistic situation in this field has often been likened to a Tower of Babel or to a battlefield. The chaos has become even worse at the present time with the recent proliferation of new
theories (Dempster-Shafer theory, fuzzy set theory, etc.) put forth to deal with uncertainty and with the alleged deficiencies of probability theory. Although this debate is, at bottom, 'only' semantic, the resulting misunderstandings and miscommunications have exacted a heavy price indeed. For one thing, they have resulted in underuse, non-use, and misuse of the discipline of PSA. This, in turn, in my opinion, has been a contributing factor in such events as Chernobyl, Challenger, Bhopal, Three Mile Island, the Beirut marine barracks, the Iran rescue mission, and many others. For another, they have, in my opinion, led to a far from optimal handling of issues of reliability in the design, procurement, testing, and maintenance of all sorts of industrial and military equipment.

In the National Aeronautics and Space Administration (NASA), for example, because of earlier misuse and misunderstanding of the idea of probability, it has been a fixed point of the corporate culture that quantitative risk and reliability methods not be used and that 'bottom-line' numbers not be calculated. Similar attitudes have been present in other organizations, leading not only to poor safety performance but to wasteful cost overruns through suboptimal allocation of resources. Of course, without quantifying sources of risk and unreliability, how can one possibly allocate intelligently?

The above is my answer to your Question 4. Many painfully wrong decisions have been made. Another example of misunderstanding, which may currently be leading to wrong decisions, appears in the March 21 issue of Time magazine (p. 61). In discussing heterosexual transmission of the AIDS virus, the article referred to a study in which, of 25 husbands and 55 wives of AIDS patients, 2 husbands and 10 wives became infected. This difference, the article says, was 'not considered statistically significant'. I say that whoever made that consideration did not understand probability.

The picture is not entirely bleak, however. On the other side, we have recently seen a proper use of probability play an important role in major nuclear plant licensing decisions [1-3], and we see the regulatory establishment moving strongly [4] toward the idea that PSA should be the centerpiece and guiding framework within which risk management and risk regulation are done. Even NASA, in a recent directive [5], is beginning to open to the idea that PSA-type thinking may have something to offer them.

With respect to your remaining three questions, which are very good questions, I have to say that I am not expert on the alternate approaches. Truthfully, I don't understand any of them, and thus I cannot do a good job of contrasting strengths and weaknesses. I will confine myself, therefore, to giving as clear a statement as I can of what we mean by probability at Pickard, Lowe and Garrick, Inc., and how we use it in PSA. I will leave it to the partisans of the alternatives to present their approaches with equal clarity.
2 THE DEFINITION OF PROBABILITY

We begin with the idea of 'confidence' as a subjective quality that we experience in the same way that we experience 'size' or 'heaviness'. Confidence, as we experience it, has a linear or scalar character, in that we readily say that our confidence in A is greater than, less than, or the same as our confidence in B. We desire, therefore, as an aid to our decision making, to establish a numerical scale of confidence, in the same way that we established scales of length or weight to aid in our construction of roads and bridges.

In the same way, then, that we chose the king's foot as the unit of length, let us arbitrarily choose the interval zero to one as the length of our scale of confidence. Thus, if A represents a sentence of the type that can be said to be either true or false, we will mean by C(A) = 1 that we are completely confident that A is true. By C(A) = 0, we indicate our complete confidence that A is false. By C(A) = 0.5, we communicate that we have equal confidence that A is true as we have that it is false. We have thus calibrated three points on our confidence scale.

To calibrate the remaining points, let us adopt the device of a lottery basket. We thus imagine a basket containing tickets numbered 1 to 1000. We now blindfold ourselves and draw a ticket. Still blindfolded, we ask ourselves, 'Is the ticket in our hand numbered between 1 and 250?' With respect to this question, we experience a certain degree of confidence. Let us agree to call this degree of confidence 0.25. Continuing in this same way, we calibrate the entire scale. That being done, we can now use this scale in a bureau of standards sense to communicate our degrees of confidence with respect to any proposition. Thus, if I tell you that I am 60% confident that it will rain tomorrow, you will know exactly what I mean.

Now, adopting what we believe to be the usage of Laplace and Bayes, we define probability to be synonymous with confidence, as defined by the above scale. We thus write p(A) = C(A) for all A and also refer to the scale as the probability scale. In our usage, then, the word probability is totally synonymous and interchangeable with the words confidence or degree of confidence.
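The calibration can be made concrete with a minimal simulation. The following Python sketch (the ticket count and the 1-250 question are those of the text above; nothing else is assumed) shows the lottery-basket frequency settling at the value we agreed to call 0.25.

```python
import random

# Thought experiment: a basket of tickets numbered 1 to 1000.
# We draw blindfolded and ask: is the ticket numbered between 1 and 250?
# Repeating the draw many times, the fraction of 'yes' answers settles
# near 0.25 -- the number we agreed to call our degree of confidence.
draws = 100_000
yes = sum(1 for _ in range(draws) if random.randint(1, 1000) <= 250)
print(f"Fraction of draws in 1..250: {yes / draws:.3f}")  # ~0.250
```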
3 BAYES' THEOREM AND THE AXIOMS OF PROBABILITY THEORY

Pursuing the scale we have defined, we discover that it has the following properties:

$$p(A) + p(\bar{A}) = 1.0 \tag{1}$$

$$p(A \wedge B) = p(A)\,p(B \mid A) = p(B)\,p(A \mid B) \tag{2}$$
Thus, from this point of view, the so-called axioms of probability theory can be seen not as new postulates but as simple consequences of the way we chose to calibrate our scale of confidence. If we now equate the two right sides of eqn (2) and divide by p(B), we obtain Bayes' theorem,
$$p(A \mid B) = p(A)\left[\frac{p(B \mid A)}{p(B)}\right] \tag{3}$$
which tells us how our degree of confidence in A changes when we learn information B. We see thus how Bayes' theorem derives directly from our definition of the confidence scale. In this light, therefore, we may regard this theorem as the fundamental law of logical inference and the fundamental rule for the quantitative evaluation of evidence. It goes even deeper than that. With a little more reflection on the process of calibrating the scale, we come to the view that Bayes' theorem is actually the definition of what we mean by logical and rational thinking.
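As a minimal numerical illustration of eqn (3), the following Python sketch updates a degree of confidence upon receipt of evidence; the scenario and all of the numbers are invented for illustration only.

```python
# Bayes' theorem, eqn (3), as a one-line update of confidence.
# Hypothetical numbers: prior confidence p(A) that a pump is degraded,
# p(B|A) = chance of an observed vibration signature if degraded,
# p(B|~A) = chance of the same signature if healthy.
p_A = 0.10             # prior degree of confidence that A is true
p_B_given_A = 0.80     # likelihood of evidence B if A is true
p_B_given_notA = 0.05  # likelihood of B if A is false

# Total probability: p(B) = p(B|A)p(A) + p(B|~A)p(~A), using eqn (1).
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Eqn (3): posterior = prior * [p(B|A) / p(B)]
p_A_given_B = p_A * (p_B_given_A / p_B)
print(f"p(A|B) = {p_A_given_B:.3f}")  # confidence rises from 0.10 to 0.640
```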
4 THE USE OF PROBABILITY AND BAYES' THEOREM IN PSA
4.1 The quantitative definition of risk

In doing a PSA, we ask three questions [6]:

1. What can go wrong (with our plant, system, or operation)?
2. How likely is that to happen?
3. If it does happen, what are the consequences?

We express the answers to these questions in the form of a set of triplets,

$$R = \{\langle s_i, l_i, x_i \rangle\} \tag{4}$$

where $s_i$ is the name and description of the ith scenario (an answer to 'What can go wrong?'), $l_i$ is the likelihood of that scenario, and $x_i$ is a measure, or measures, of the consequences of that scenario. We say, therefore, that the risk, R, 'is', by definition, this set of triplets.
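The set of triplets translates directly into a data structure. A minimal Python sketch follows; the scenario names and the numbers attached to them are invented, not taken from any actual PSA.

```python
from dataclasses import dataclass

# A direct transcription of eqn (4): risk as a set of triplets.
@dataclass
class Triplet:
    scenario: str       # s_i: what can go wrong?
    frequency: float    # l_i: likelihood (here, occurrences per year)
    consequence: float  # x_i: a measure of consequence (e.g. cost)

# R is the set of triplets; all entries are hypothetical.
R = [
    Triplet("cooling pump fails to start", 1e-2, 1e5),
    Triplet("pipe rupture in primary loop", 1e-5, 1e8),
    Triplet("loss of offsite power",        5e-2, 1e4),
]
```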
4.2 Expressing likelihood: the probability of frequency format

The next question, then, is how to express the idea of likelihood. There are many ways of doing this. The most useful for PSA purposes is to adopt the
'probability of frequency format'. In this format, we imagine a thought experiment in which we operate our plant or system many, many times. In this experiment, each scenario will also occur many times. We can thus measure in this experiment the occurrence frequency, $\phi_i$, of each scenario, in units of occurrences per unit time or occurrences per demand.

In setting up this thought experiment, we have assumed that each scenario will occur at random points in time but with the fixed long-term average frequency $\phi_i$. In other words, we have assumed our plant or system operates according to a random process or 'Poisson' model. Real systems, of course, do not operate this way. Nevertheless, this model is often a useful approximation for risk and reliability assessment purposes. The quantities $\phi_i$ are then parameters of the model. We adopt these parameters as useful figures of merit for characterizing the risk and reliability of our system.

The next question in carrying out a PSA is, 'What are the numerical values of these $\phi_i$?' In any real situation, of course, we will not know these values exactly, but we will know something about them. We therefore express what we do know in the form of probability curves against the possible values of $\phi_i$, as shown in Fig. 1. These curves are thus called probability of frequency curves. The ordinate of such a curve over any particular horizontal point x gives our confidence, per unit x, that the true value of the frequency $\phi_i$ is equal to this value x. This degree of confidence is based on, of course, and is in fact an expression of, all the evidence and information we have. The next question, therefore, is how do we actually develop these curves so that they fairly and accurately express our state of knowledge about the $\phi_i$.

Fig. 1. Probability of frequency curve (probability density plotted against the frequency of scenario i).
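The thought experiment itself can be simulated. The following Python sketch, with an invented 'true' frequency, generates occurrences from the Poisson model and recovers the long-term average; in a real PSA, of course, that true value is precisely what is unknown.

```python
import random

# Poisson model: exponentially distributed times between occurrences,
# with fixed long-term rate phi_true (invented for illustration).
phi_true = 0.02        # occurrences per plant-year
horizon = 1_000_000.0  # plant-years of imagined operation

t, count = 0.0, 0
while True:
    t += random.expovariate(phi_true)  # time to the next occurrence
    if t > horizon:
        break
    count += 1
print(f"Measured frequency: {count / horizon:.4f}")  # ~0.0200
```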
4.3 Breaking down scenarios (risk modeling)

If we have a lot of experience with our system, we can develop the curve directly from the operating statistics. Usually, we are not in this position. One of the ways we proceed, then, is the 'analysis' or 'breaking down' approach. We note that a given scenario, $s_i$, a macroscopic event, can be viewed as a combination of microscopic or elemental events, such as a valve fails, a pipe joint leaks, etc. Let $\lambda_j$ stand for the frequency of the jth such component event. By analyzing our system, we may express the scenario frequency, $\phi_i$, in terms of these elemental frequencies:

$$\phi_i = F_i(\lambda_1, \ldots, \lambda_j, \ldots) \tag{5}$$

These functions, $F_i$, along with the scenarios themselves, constitute what we call the 'risk model'. Our strategy will now be to develop probability curves, $p_j(\lambda_j)$, for the $\lambda_j$ and then 'propagate' these through the functions, $F_i$, to obtain the resulting curves, $p(\phi_i)$, for the $\phi_i$.
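The propagation step is commonly carried out by Monte Carlo sampling. The following Python sketch assumes a toy risk model (one initiating-event frequency multiplied by one failure-on-demand probability, both with invented lognormal states of knowledge specified by median and error factor); it illustrates the idea, not any particular procedure from the references.

```python
import math
import random

def sample_lognormal(median, error_factor):
    # Error factor = ratio of the 95th percentile to the median, a
    # common way of specifying a lognormal curve in PSA work.
    sigma = math.log(error_factor) / 1.645
    return random.lognormvariate(math.log(median), sigma)

# Sample each elemental frequency from its curve, push the samples
# through F_i; the distribution of results approximates p(phi_i).
samples = []
for _ in range(50_000):
    lam_ie = sample_lognormal(0.1, 3.0)     # initiating events per year
    q_valve = sample_lognormal(1e-3, 10.0)  # valve failures per demand
    samples.append(lam_ie * q_valve)        # phi = F(lambda_ie, q_valve)

samples.sort()
n = len(samples)
print(f"5th percentile:  {samples[int(0.05 * n)]:.2e} per year")
print(f"median:          {samples[n // 2]:.2e} per year")
print(f"95th percentile: {samples[int(0.95 * n)]:.2e} per year")
```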
4.4 Developing the probability of frequency curves

We develop these curves, $p_j(\lambda_j)$, by using Bayes' theorem. Thus, let E denote the totality of evidence and information we have relevant to the value of $\lambda_j$. The ordinate of the curve over any x may then be understood as p(x|E), our confidence that the true value of $\lambda_j$ is x, given the evidence E. By Bayes' theorem, then,

$$p(x \mid E) = p(x)\left[\frac{p(E \mid x)}{p(E)}\right] \tag{6}$$

This is the general statement of how we develop the curves. The specific procedures depend on the exact combination of available evidence E in each case. Examples of these procedures [7,8] could fill a good-sized book. For present purposes, let us content ourselves with listing some of the types of evidence of which use can be made (a sketch of one such procedure follows this list):

1. Specific operational experience with our particular piece of equipment.
2. The experience of other similar equipment elsewhere.
3. The experience of earlier versions or generations of the equipment.
4. The experience of similar equipment under test conditions or under different operation and maintenance policies.
5. The experience of occurrence of 'near miss' or 'precursor' events.
6. The predictions of theories, results of engineering calculations, models, simulations, etc.
7. The judgements of experts.
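As one minimal sketch of such a procedure (not the two-stage procedure of Ref. 7, and with invented numbers throughout), the following Python fragment applies eqn (6) with a Poisson likelihood and a conjugate gamma prior, so that the update is closed-form.

```python
# Bayes' theorem, eqn (6), for a failure rate lambda under the Poisson
# model, using the conjugate gamma family so the update is closed-form.
# All numbers are hypothetical.
#
# Prior state of knowledge: gamma(alpha, beta) over lambda (per year),
# e.g. distilled from generic industry experience (evidence type 2).
alpha, beta = 0.5, 500.0

# Evidence E: our own equipment showed k failures in T operating years
# (evidence type 1).
k, T = 2, 400.0

# Posterior, by eqn (6) with the Poisson likelihood p(E|lambda):
alpha_post = alpha + k
beta_post = beta + T

print(f"prior mean     = {alpha / beta:.2e} per year")            # 1.00e-03
print(f"posterior mean = {alpha_post / beta_post:.2e} per year")  # 2.78e-03
```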
5 WHAT DIFFERENCE DOES IT MAKE?

You have asked what difference it makes if one uses different definitions of probability. In the introduction, I gave a general answer, asserting that confusion about the meaning of probability and the traditional lack of
precision in distinguishing between probability and related ideas like frequency have contributed to ineffective use of PSA and, indeed, in some circles have given this discipline a bad reputation, with unfortunate consequences for the safety of our technologies. On the basis of the previous section, we can now add two more specific answers. I assert that no other definition of probability, and no other alleged theory for dealing with uncertainty, can properly include and account for the various kinds of evidence listed in Section 4.4. Also, I assert that no other theory, particularly not the classical statistical confidence interval approach, can handle the propagation step, described in Section 4.3, in as simple and clean a way as probability theory, as we have defined it here. I make these assertions as a challenge to the alternative approaches and will be happy to be proven wrong.

Finally, let us ask what difference it makes to use probability at all, as opposed to using single values, so-called 'best estimates' or 'point estimates'. The answer is that decisions are always made under uncertainty. A decision maker needs to quantify his uncertainty. He needs to know the odds, so he can place his bets intelligently. This is obvious, but, just to drive the point home, consider Fig. 2.

Fig. 2. Two different states of knowledge with the same mean value (curves $p_A(x)$ and $p_B(x)$ plotted against log x).

Figure 2 shows two different probability curves having the same mean value. Suppose these curves represented two states of knowledge, A and B, about the value of a parameter x, where x is the failure rate of our system or plant. The mean value, $\bar{x}$, of the curves is then our best guess or point estimate. Suppose further, now, that this mean value is an unacceptably high failure rate. Since the mean values are the same for both states of knowledge, A and B, if we gave the decision maker only the point estimate, he would have to make the same decision in both cases; the information conveyed to him is the same in both cases. On the other hand, the story told by the probability curves is vastly different in the two cases. In case A, we are saying that the failure rate is too high and we are sure of that. The decision in this case would
have to be to change the hardware. We would have to move steel and concrete. By contrast, what does curve B tell us? It tells us that the failure rate is probably quite small, very acceptable, but we are not quite sure of that. In this case, the decision is not to change the hardware; the hardware is probably very good. What we need in this case is more confidence. Therefore, study further, run some more tests, etc. Gain some more information, and then make a decision on the hardware.

The full probability curve thus tells the full story. Any single number at best tells only a small part of the story and at worst can be downright misleading. The full probability curve, calculated properly by Bayes' theorem from all the available evidence, tells the whole truth of what we know and do not know about our system. When we tell the truth to our clients, our public, and especially to ourselves, we make better decisions, we have less dispute, and everything works out better. That is the difference probability makes. That is why we use it.
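The point of Fig. 2 can be checked numerically. The following Python sketch constructs two lognormal states of knowledge contrived (with invented numbers) to share the same mean; the contrast between the curves' medians is exactly the part of the story the point estimate conceals.

```python
import math

# Two lognormal states of knowledge about a failure rate x, contrived
# to have the same mean, as in Fig. 2. Lognormal mean = exp(mu + sigma^2/2).
mean_target = 1e-3  # the common 'point estimate' (invented)

# Case A: narrow curve -- we are quite sure x is near 1e-3.
sigma_A = 0.2
mu_A = math.log(mean_target) - sigma_A**2 / 2

# Case B: broad curve -- x is probably far below 1e-3, but we are unsure.
sigma_B = 2.0
mu_B = math.log(mean_target) - sigma_B**2 / 2

for name, mu in [("A", mu_A), ("B", mu_B)]:
    print(f"Case {name}: mean = {mean_target:.1e}, median = {math.exp(mu):.1e}")
# Same mean; but B's median is roughly 7x smaller than A's -- the single
# number hides exactly the difference the decision turns on.
```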
REFERENCES

1. Atomic Safety and Licensing Board, Indian Point Hearings. Docket Nos. 50-247SP and 50-286SP, Washington, DC, January 1983.
2. Brookhaven National Laboratory, Technical Evaluation of the EPZ Sensitivity Study for Seabrook. FIN A-3852, prepared for the US Nuclear Regulatory Commission, Upton, NY, March 1987.
3. Atomic Safety and Licensing Appeal Board, Three Mile Island Station. Docket No. 50-320, Washington, DC, December 1978 and March 1979.
4. Miraglia, F. J., Integrated Safety Assessment Program. US Nuclear Regulatory Commission, ISAP II Generic Letter 88-02, Washington, DC, 20 January 1988.
5. Fletcher, J. L., NASA Management Instruction: Risk management policy for manned flight programs. NMI 8070.4, Washington, DC, 3 February 1988.
6. Kaplan, S. & Garrick, B. J., On the quantitative definition of risk. Risk Analysis, 1(1) (1981), 11-27.
7. Kaplan, S., On a 'two-stage' Bayesian procedure for determining failure rates from experiential data. IEEE Transactions on Power Apparatus and Systems, PAS-102(1) (1983), 195-202.
8. Kaplan, S. & Tinsley, G. A., On using information contained in near miss and precursor events by means of Bayes' theorem. In Proceedings, ANS 1988 Annual Meeting, San Diego, CA, 12-18 June 1988.

Stan Kaplan
Pickard, Lowe and Garrick, Inc., 2260 University Drive, Newport Beach, California 92660, USA