Reliability Engineering and System Safety 73 (2001) 121–136
www.elsevier.com/locate/ress
Risk significance and safety significance
R.W. Youngblood*
ISL Inc., 11140 Rockville Pike, Suite 500, Rockville, MD 20852, USA
Received 8 March 2001; accepted 24 May 2001
Abstract
Existing measures of the risk significance of elements of risk models (such as the Fussell–Vesely, or 'F–V', importance of basic events) are based on the properties of cut sets containing the element. A measure of safety significance (prevention worth, or PW) is proposed, based on the properties of path sets containing the element. A high value of F–V means that cut sets containing the element contribute significantly to top event frequency; a high value of PW means that path sets containing the element contribute significantly to top event prevention. The properties of PW as a measure of basic event significance are illustrated first with a simple block diagram example, and then with an example based on nuclear power plant risk models. PW can also be understood as a property of a set of success scenarios, and as such, can be applied more broadly than just as a measure of element significance. © 2001 Elsevier Science Ltd. All rights reserved.
Keywords: Importance measures; Fussell–Vesely; Risk significance; Safety significance
1. Introduction
The primary purpose of this paper is to present and discuss a measure of 'safety significance' that is complementary to the familiar measures of 'risk significance', such as Fussell–Vesely (F–V), Birnbaum, risk reduction worth, and risk achievement worth (RAW). Some definitions will be recapitulated below; recent and more complete discussions of these measures are given in Refs. [1,2]. Measures of this kind are useful tools for understanding the complex and voluminous output of large-scale risk models.
Safety significance, as distinct from risk significance, is among the ideas discussed in Ref. [1]. In that paper, the RAW is considered as a possible measure of safety significance. The RAW of a basic event measures the impact of setting the event to 'true', i.e. permanently failing a component. Two major criticisms of RAW as a measure of safety significance are given in Ref. [1]. First, the usual interpretation of RAW corresponds to a sensitivity study that is generally considered too extreme to be useful. Second, because the RAW is a relative measure, it is difficult to formulate an objective criterion for applying it in designating components' safety significance. Another perspective on RAW emerges from the discussion in this paper.
The present paper begins with the idea that since F–V,
which is based on cut sets, is widely considered to be a useful measure of risk significance, the properties of a measure based on path sets should be explored as a measure of safety significance. Such a measure is introduced below, and its properties are discussed and compared with those of common measures of risk significance. This is done first with an example simple enough to be solved by hand, and then with an example based on core damage accident sequences emerging from a partial and simplified model of a commercial nuclear power plant.
Beyond the use of importance measures for understanding complex models, attempts are also sometimes made to apply these measures to decisions informed by model outputs, such as formulation of safety cases in general, or component classification in particular. The primary purpose of the present paper is to support mainstream applications of importance measures as tools for understanding the output of complex models, and even to support applications of those models, based on appropriate interpretations of the measures. A detailed and technically defensible approach to component classification is beyond the scope of this paper. Nevertheless, because attempts are being made to apply 'importance measures' to component classification, the applicability of these ideas to component classification will be discussed briefly.
2. Safety significance
* Fax: +1-301-468-0883.
E-mail address: [email protected] (R.W. Youngblood).
0951-8320/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved.
PII: S0951-8320(01)00056-4
The measure is first introduced with reference to a simple
system, so that the distinction between initiating events and mitigating system failure events can be suppressed. In this section, the discussion is based on a single system, and the risk metric is a system failure probability. Risk significance, therefore, refers to the significance of a contribution to system failure probability, and safety significance refers to the significance of a contribution to system success probability. In a notation similar to that of Refs. [1,2], the baseline risk is given by

$$R_0 = P\Bigl(\bigcup_j \mathrm{MCS}_j\Bigr), \tag{1}$$

where $R_0$ is the baseline risk (at this point, just the failure probability of the subject system), $\bigcup_j \mathrm{MCS}_j$ is the union of the minimal cut sets, and $P(X)$ is the probability of event $X$. The RAW of element $i$ is given by

$$\mathrm{RAW}_i = \frac{R_i^{+}}{R_0}, \tag{2}$$
where $R_i^{+}$ is the risk measure recalculated without credit for successful performance of element $i$, that is, with element $i$ set to 'failed'. The Birnbaum importance of element $i$ is given by

$$B_i = R_i^{+} - R_i^{-}, \tag{3}$$

where $R_i^{-}$ is the risk measure recalculated with credit for perfect performance of element $i$, that is, with element $i$ set to zero failure probability. The F–V of element $i$ is given by

$$\text{F–V}_i = \frac{P\Bigl(\bigcup_j \mathrm{MCS}_j^{i}\Bigr)}{P\Bigl(\bigcup_j \mathrm{MCS}_j\Bigr)}, \tag{4}$$
where $\bigcup_j \mathrm{MCS}_j^{i}$ is the union of the minimal cut sets containing element $i$. This measure reflects the contribution to risk of the MCS containing element $i$, normalized by the total risk. Therefore, this measure is an index of the risk significance of element $i$. The numerator of the F–V is of interest in itself (being an absolute measure of risk significance rather than a relative one), and it is the numerator of the F–V to which the initial analogy is developed below. Consider the quantity

$$\mathrm{PW}^{i} = P\Bigl(\bigcup_j \mathrm{MPS}_j^{i}\Bigr), \tag{5}$$

where $\bigcup_j \mathrm{MPS}_j^{i}$ is the union of minimal path sets containing element $i$, and 'PW' stands for 'prevention worth'. The name prevention worth has been chosen to suggest that the PW of an item is related to the value of its contribution to preventing the 'top event', system failure. For the elements of a single system, which is so far the extent to which this measure should be interpreted, the PW is clearly analogous to the numerator of the F–V, differing only by the substitution of path sets for cut sets. Arguably, the pros and cons of the F–V, whatever the reader considers them to be, should correspond in some complementary way to PW, recognizing that PW bears on the functionality (the path sets) supported by the element, rather than the vulnerability reflected in the cut sets containing the element.
Because success probabilities in practical problems are of the order of unity, it is easier to think about the significance of a given value in failure space, or alternatively to quote PW in terms of the number of 'nines' of reliability afforded by the collection of success paths being discussed. For example, if the success probability is 0.99, then two nines of reliability are being provided. This measure is essentially logarithmic, and it is natural when quoting a success probability simply to quote the negative of the logarithm of the complementary failure probability, i.e.

$$\mathrm{NINES}(\mathrm{PW}) = -\log(1 - \mathrm{PW}), \tag{6}$$

with common (base 10) logarithms being understood. Most of the PW results quoted in this paper are based on this nines measure. It is important to bear in mind that even though this measure makes use of a calculation in failure space, it quantifies a property of a specific collection of path sets. Refer to Section 4 for more discussion of a computational approach.
2.1. Qualitative properties of the prevention worth measure
Fig. 1 focuses on portions of a reliability block diagram of a hypothetical system. For elements in parallel as in Fig. 1a, the element having the higher reliability will have a higher PW, because its path sets have higher success probability. This follows because the difference between the success probabilities of path sets containing A and the success probabilities of path sets containing B is due to the difference between the reliability of A and that of B. Since A and B always appear together in cut sets, they have the same F–V.
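The parallel case described for Fig. 1a, and its series counterpart in Fig. 1b, can be checked numerically. The Python sketch below is illustrative only: the two-element systems, the failure probabilities, and the function names are assumptions introduced here and are not taken from the paper, and the union probabilities are computed exactly by state enumeration (assuming independent events) rather than by the rare event approximation.

```python
from itertools import product
from math import log10


def union_probability(sets, p_true):
    """Exact P(at least one set has all its events true), independent events."""
    names = sorted(p_true)
    total = 0.0
    for state in product([False, True], repeat=len(names)):
        truth = dict(zip(names, state))
        if any(all(truth[e] for e in s) for s in sets):
            weight = 1.0
            for name in names:
                weight *= p_true[name] if truth[name] else 1.0 - p_true[name]
            total += weight
    return total


def fussell_vesely(cut_sets, q, element):
    """Eq. (4): cut sets containing the element, normalized by all cut sets."""
    containing = [c for c in cut_sets if element in c]
    return union_probability(containing, q) / union_probability(cut_sets, q)


def prevention_worth(path_sets, q, element):
    """Eq. (5): probability of the union of path sets containing the element."""
    success = {name: 1.0 - qi for name, qi in q.items()}
    containing = [p for p in path_sets if element in p]
    return union_probability(containing, success)


def nines(pw):
    """Eq. (6): nines index with base-10 logarithm."""
    return -log10(1.0 - pw)


# Hypothetical failure probabilities: A is more reliable than B.
q = {"A": 0.01, "B": 0.10}

# Two elements in parallel (as in Fig. 1a): one cut set, two path sets.
par_cuts, par_paths = [{"A", "B"}], [{"A"}, {"B"}]
# Two elements in series (as in Fig. 1b): two cut sets, one path set.
ser_cuts, ser_paths = [{"A"}, {"B"}], [{"A", "B"}]

for label, cuts, paths in [("parallel", par_cuts, par_paths),
                           ("series", ser_cuts, ser_paths)]:
    for element in "AB":
        fv = fussell_vesely(cuts, q, element)
        pw = prevention_worth(paths, q, element)
        print(f"{label:8s} {element}: F-V = {fv:.3f}, PW = {pw:.3f}, "
              f"nines = {nines(pw):.2f}")
```

In this sketch the parallel elements share one F–V but differ in PW, while the series elements share one PW but differ in F–V, which is the complementarity the text describes.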
Fig. 1. Behavior of significance measures for series and parallel elements.
Fig. 2. Simple example.
For elements in series as in Fig. 1b, the element having the lower reliability will have the higher F–V. This follows because the collection of cut sets containing A is completely analogous to the collection of cut sets containing B, the only difference being the substitution of A for B, with the cut set probabilities changing accordingly. A and B necessarily have the same PW, because they are involved in the same path sets. This property of elements in series is complementary to the equivalence of the F–V of elements that are in parallel.
Complexity arises when success paths share some elements. This point is illustrated in the example discussed in Section 2.2.
2.2. Simple example
A reliability block diagram example simple enough to be solved by hand is presented in Fig. 2, which also shows the element failure probabilities, cut sets, and path sets. Table 1 below presents various component-level measures for each element. As discussed in Section 4, almost all calculations in this paper are carried out within the rare event approximation applied in failure space. In this simple example, the error in top event probability is not very great; the rare-event failure probability is 6.64 × 10^-5, while the exact result is 6.6175 × 10^-5.

Table 1
Significance measures for elements of block diagram example
Element   F–V             PW (Nines)   RAW
A         1.00            0.40         2.50
B         6.02 × 10^-1    3.78         6024.10
C         6.63 × 10^-2    1.22         7.23
D         3.31 × 10^-1    1.22         7.23
E         3.61 × 10^-2    1.00         1.33
F         3.61 × 10^-2    2.68         37.11
G         3.61 × 10^-1    2.92         362.05

Figs. 3 and 4 present the F–V and the PW (quantified as the nines index) for these elements. Element A is a path set by itself. Therefore, it appears in every cut set. Therefore, it has the highest possible F–V (i.e. unity). Element B has the highest PW; it supports path sets having the highest collective success probability. This comparison between A and B is a simple extension of the comparison discussed in connection with Fig. 1a. A similar comparison obtains between E and F; they have the same F–V, but different PW. Being in series, C and D have the same PW, but different F–Vs.

Fig. 3. Fussell–Vesely importance.
Fig. 4. Prevention worth (contribution to success probability).
Fig. 5. Risk achievement worth.

2.3. Relationship of prevention worth to risk achievement worth and Birnbaum
Fig. 5 shows the RAW of the elements in the simple example. The numerator is calculated by quantifying the failure expression re-reduced with the element set to fail, and the denominator is the baseline failure probability. Note that Fig. 5 is on a log scale. There is a visual similarity between Figs. 4 and 5. In fact, for elements A and B, the nines index and log(RAW) are equal, as can be verified by hand calculation. Some
relationship between RAW and PW is not unexpected: there are, after all, only so many single-event measures that are really independent of each other. In this case, the relationship is based on the circumstance that the functionality quantified in the PW of an element (the success paths containing the element) is the very functionality that is eliminated in the 'sensitivity study' on which the RAW is said to be based. However, the RAW evaluates the impact in terms of the capability surviving after the subject element is failed, and normalizes the result by dividing by the baseline risk. PW does not invoke the remaining capability, nor does it normalize, and therefore only equates to log(RAW) in simple cases. For A and B, log(RAW) and NINES(PW) work out to be the same, because A decouples from the rest of the problem, and all paths containing B similarly decouple from A. For other elements, the picture is more complex and the comparison is correspondingly less clear-cut.
This partial connection between PW and RAW reflects from a different perspective the properties of the RAW that warranted its having been discussed in Ref. [1] as a possible measure of safety significance. However, unlike the usual presentation of RAW, the PW defined here is not based on a sensitivity study, but rather on quantification of baseline success probability. Also, the PW defined above is not normalized, but its close relationship to RAW actually derives from the fact that RAW is normalized. Additional differences between PW and RAW are introduced when the PW measure is extended to cover multiple initiating events.
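The statement that log(RAW) and the nines index coincide only for A and B can be checked against the printed values of Table 1. The short Python sketch below is illustrative: the values are transcribed from Table 1 as printed, while the variable names and the tolerance (half a unit in the last printed decimal of the nines index) are assumptions made here.

```python
from math import log10

# (nines index, RAW) per element, transcribed from Table 1 as printed.
table1 = {
    "A": (0.40, 2.50),
    "B": (3.78, 6024.10),
    "C": (1.22, 7.23),
    "D": (1.22, 7.23),
    "E": (1.00, 1.33),
    "F": (2.68, 37.11),
    "G": (2.92, 362.05),
}

# Assumed tolerance: half a unit in the last printed decimal place.
TOLERANCE = 0.005

agree = []
for element, (nines_index, raw) in table1.items():
    log_raw = log10(raw)
    if abs(log_raw - nines_index) < TOLERANCE:
        agree.append(element)
    print(f"{element}: nines = {nines_index:.2f}, log10(RAW) = {log_raw:.2f}")

print("log10(RAW) matches the nines index for:", agree)
```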
As can be verified by hand calculation, the comparison between PW and Birnbaum is very similar to the comparison between PW and RAW. This is expected, because like RAW, the Birnbaum of an element depends very strongly on what is in parallel with the element, while the PW of an element depends on what is in series with the element.
2.4. Extension to multiple initiating events
The simple example discussed before corresponds to a system failure probability, conditional on a generic initiating event. Measures of risk significance, such as F–V, generalize naturally to consideration of more complex situations involving different accident sequences that lead to the same plant damage state but involve multiple systems and diverse initiating events. For example, it is natural to compute an F–V measure over the set of cut sets leading to a given plant damage state.
At first glance, one might consider extending the above discussion simply by complementing the accident sequence cut sets, initiating events and all. This is not necessarily the only choice. Fig. 6 presents a classification scheme for scenarios.
Fig. 6. Relationships between significance measures and scenario types.
In scenarios binned into the lower right quadrant, the adverse consequence occurs: an initiating event and mitigating system failures occur. In the upper right quadrant, the initiating event occurs, but mitigation is successful. In the lower left quadrant, mitigating systems are failed but initiating events are prevented. In the upper left quadrant,
mitigating systems are available and initiating events do not occur.
Traditional measures of element risk significance dwell in the lower right quadrant. It might be supposed that a true measure of safety significance should dwell in the upper left quadrant. However, there is merit to examining the upper right quadrant for the safety significance of mitigating system elements. A measure of the safety significance of a mitigating system element based in this quadrant can reflect the value of the consequences averted by the successful performance of the functionality supported by the element. For example, for commercial nuclear power plants, the measure might reflect the number of core damage events per year prevented by successful performance of the functionality associated with the element. Accordingly, for mitigating system elements, we compute the PW of a given element for each initiator, and weight each PW by the initiating event frequency

$$\mathrm{PW}_i = \sum_j f(\mathrm{IE}_j)\,\mathrm{PW}_{ij}, \tag{7}$$
where $f(\mathrm{IE}_j)$ is the frequency of initiating event $j$, and $\mathrm{PW}_{ij}$ is the prevention worth of the element's success paths, calculated for initiating event $j$. For a given element, this yields a measure of consequences averted by the performance of success paths containing the element. There may be many strong success paths that do not contain the element as well as some that do, so it is not proper to interpret this measure as implying that paths associated with this element were solely responsible for event prevention. However, this measure reflects the value of the overall functionality supported by the element. Finally, to obtain a PW averaged over initiating events, we divide by the total initiating event frequency

$$\overline{\mathrm{PW}}_i = \frac{\sum_j f(\mathrm{IE}_j)\,\mathrm{PW}_{ij}}{\sum_j f(\mathrm{IE}_j)}. \tag{8}$$

Both $\mathrm{PW}_i$ and $\overline{\mathrm{PW}}_i$ are of some interest. The latter is a
number less than or equal to one; it is a measure of safety significance conditional on the average initiating event, and for some purposes is easier to think about. Being less than one, it can be converted to a nines index. Depending on initiating event frequencies, the former can be a number greater than one; it measures consequences averted by the functionality associated with the element, and has an interest of its own, but the nines quantification does not apply.
For initiating events, the above argument suggests a measure derived from the lower left quadrant of Fig. 6. One would compute the probability of non-occurrence of the initiating event (a kind of interval reliability) and weight this quantity by the failure probability of the mitigating systems, conditional on occurrence of the initiator. That is, this measure would correspond to the consequences averted by non-occurrence of the initiating event, conditional on nominal performance of the mitigating systems. Depending on the actual formulation, this might have units different from those of the averted consequences of the mitigating system elements.
Although RAW was discussed as a possible measure of safety significance in Ref. [1], and has a certain relationship to PW, it is arguably more appropriately considered a measure of risk significance, because it dwells in the fourth (lower right) quadrant of Fig. 6.
These calculations are illustrated below in an example based on a nuclear power plant risk model.
2.5. Relative (normalized) and absolute (non-normalized) measures
Although the numerator of the F–V of an element is a useful quantity in itself, being the absolute value of the risk contribution from cut sets containing the element, the F–V is 'normalized', that is, defined with total risk in the denominator, so that it becomes a relative measure. It is therefore frequently quoted as a percentage value. RAW, too, has a denominator turning it into a kind of relative measure.
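As a numerical illustration of the frequency-weighted and averaged measures of Eqs. (7) and (8) in Section 2.4, the Python sketch below evaluates both for a single element. All numbers, initiator labels, and function names are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from math import log10


def weighted_pw(frequency, pw_by_initiator):
    """Eq. (7): sum over initiators j of f(IE_j) * PW_ij."""
    return sum(frequency[j] * pw_by_initiator[j] for j in frequency)


def averaged_pw(frequency, pw_by_initiator):
    """Eq. (8): Eq. (7) divided by the total initiating event frequency."""
    return weighted_pw(frequency, pw_by_initiator) / sum(frequency.values())


# Hypothetical initiating event frequencies (per year).
frequency = {
    "general transient": 2.0,
    "loss of offsite power": 0.05,
    "large LOCA": 1.0e-4,
}
# Hypothetical PW_ij of one element's success paths for each initiator;
# the element is assumed to play no role after a large LOCA.
pw_element = {
    "general transient": 0.99,
    "loss of offsite power": 0.95,
    "large LOCA": 0.0,
}

pw_abs = weighted_pw(frequency, pw_element)   # events averted per year
pw_avg = averaged_pw(frequency, pw_element)   # conditional on average initiator
pw_avg_nines = -log10(1.0 - pw_avg)           # nines index applies to pw_avg only

print(f"frequency-weighted PW = {pw_abs:.4f} per year")
print(f"averaged PW = {pw_avg:.4f}, nines = {pw_avg_nines:.2f}")
```

With these assumed inputs the frequency-weighted value exceeds one, so only the averaged value can be quoted as a nines index, consistent with the remark in the text.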
One could consider defining a measure of safety significance either way: as a non-normalized (absolute) measure, or as a normalized (relative) measure, referred somehow to total success probability. Because all the term probabilities involved are typically of order unity, some ways of doing this are less revealing than others. If one wishes to define a relative measure, this could be done either by dividing an element's PW by the PW of the union of all success paths, or by dividing the nines index of the element by the nines index of the union of all the success paths. This would express the fractional contribution of the element's success paths to the overall nines index. In this paper, results are presented without normalization.
2.6. Group measures
A topic of current interest is the definition of significance measures for groups of elements, and discussion of significance measures in terms of the relationship of the group
measure to the measures associated with the constituents of the group. The PW measure is not additive in general; that is, the PW computed for the union of success paths containing either or both of two given elements is not the sum of the PWs calculated for the two elements. (The analogous comment applies to the F–V measure.) The nines index can be additive, as for elements A and B in the simple example presented above; when this occurs, it is because the two groups of success paths are non-overlapping. (In the simple example, overall failure probability in the rare event approximation is 6.64 × 10^-5, so the nines index for the entire block is 4.18, which is also the sum of the indices for A and B.) When the nines index is not additive for two elements, this is because the associated success paths overlap.
As defined, the PW of an element is really the PW of a collection of success paths specified by their containing that element. The PW is a property of a union of success paths, and is straightforwardly defined for any collection of success paths. Like the F–V, it is a measure defined for groups of scenarios; it can be applied to components through the device of defining groups of scenarios in terms of the elements that they contain. In a sense, it is a group measure to begin with.
One interesting component group to consider is an entire system within a large risk model containing other systems. A PW measure of a system can be straightforwardly formulated in terms of the union of all path sets of the system. However, care may be needed in interpretation and formulation of such a measure; partial system success may suffice completely in some cases, while in others, partial system failures may have complicated ramifications for the success criteria of other systems.
Another interesting group to consider in general is the group of all success paths. The PW of this group is a figure of merit describing plant response conditional on an average initiating event.
This would also be the denominator in one relative measure of PW mentioned above.
Potentially interesting groups could be specified by their containing specific groups of elements, analogously to what can be done for the F–V measure (see, for example, Refs. [3,4]). Finally, especially interesting groups of success paths are prevention sets, which will be discussed briefly in Section 5.
3. Example based on nuclear power plant risk models
A simplified risk expression has been developed to illustrate certain points. The features modeled are typical of some commercial nuclear plant types, but this model is not derived from any plant-specific model. The list of accident sequences is incomplete, and the model of plant response is incomplete. Just enough is modeled to illustrate certain properties of significance measures with results that will seem familiar to the reactor safety community.
The initiating events considered are general transient, loss
of offsite power, and large LOCA. The top event modeled is 'core damage'. The term 'risk significant' therefore refers to the significance of contributions to core damage frequency (CDF), and the term 'safety significant' refers to the significance of contributions to core damage prevention.
The major systems modeled are shown in block diagram form in Fig. 7a–d. The front-line systems included are the reactor protection system (RPS) (treated as a module with no internal structure), the auxiliary feedwater system (AF), safety injection (SI), and low-pressure injection (LP). SI and LP are two-train systems. Systems other than RPS were modeled very simply at the supercomponent level. The AF system was modeled in enough detail to reflect flowpath topology typical of some four-loop plants' AF systems. This was done in order to support comparison of the safety significance and risk significance of AF flowpaths and AF pumps. The AF system comprises a turbine-driven pump (TDP) and two motor-driven pumps (MDPs). Each MDP has flowpaths to two steam generators; the TDP has flowpaths to all four steam generators. Each flowpath includes a check valve and a flow control valve, lumped as a supercomponent. AF flowpaths are not typically found to be risk-significant, but are found below to be safety-significant.
Support systems were suppressed except for AC motive power. There are two divisions of AC, each diesel-backed. A supercomponent called 'LD-TAP' was included; following an initiating event other than loss of offsite power, success of LD-TAP means that offsite power is available. Otherwise, diesels are required to support front-line system pumps other than the AF turbine-driven pump.
Where applicable, the basic event probabilities used are generally typical of IPE values.
The success paths considered (see also Fig. 8) are:
For general transient and loss of offsite power:
Given RPS success: AF flow to at least two steam generators, or bleed and feed utilizing at least one SI pump and two PORVs.
Given RPS failure: AF flow to four steam generators, boration from at least one SI pump, pressure relief from three SRVs, or from two SRVs plus two PORVs.
For large LOCA: LP flow from at least one pump.
Contributions from sequences of different types are shown in Table 2.

Table 2
Contribution to core damage frequency by sequence type
Sequence type                     CDF contribution
Non-ATWS transients               7.51 × 10^-8
Non-ATWS loss of offsite power    2.83 × 10^-6
Large LOCA                        2.79 × 10^-9
ATWS Transient                    2.93 × 10^-8
ATWS Loss of offsite power        9.15 × 10^-9
Total                             2.94 × 10^-6

Fig. 7. (a) Block diagram of AF system. (b) Block diagram of SI system. (c) Block diagram of LP system. (d) Block diagram of AC power.
Fig. 8. Success paths.

It is emphasized that these results are
not presented as an example of a complete risk profile; they are presented to help understand properties of the measures.
Measures have been calculated for particular elements selected to illustrate particular points. The elements selected are listed in Table 3. The F–V, PW (presented in nines units), and RAW are given in Table 4 for the selected elements.

Table 3
Basic events for which significance measures are calculated
Item               Role in model
RPS                Reactor protection system
LD-TAP             Supercomponent responsible for transferring plant loads to offsite power
                   following an initiating event other than loss of offsite power
DG-1-FTS           One of the plant diesel generators
AC-BUS-1           One of the plant AC buses (this event refers to a bus fault)
AF MDP             One of the two AF motor-driven pumps
AF MDP Flowpath    One of the two flowpaths from an AF MDP to a steam generator
AF TDP             The AF turbine-driven pump
AF TDP Flowpath    One of the four flowpaths from the TDP to a steam generator
LP Pump            One of the two low-pressure injection pumps
SI Pump            One of the safety injection pumps

3.1. Discussion of F–V results
The baseline risk in the example is dominated by blackout. Therefore, the appearance of the turbine-driven pump and the diesel near the top of the F–V list is expected. Large LOCA is a tiny contributor, so the LP pump is not very F–V significant. The AC bus fault has a low F–V because the event has a very low probability. The AF MDP is not highly risk-significant, partly because bleed and feed is available (given AC) if the AF fails. This point will be revisited in a later subsection.
3.2. Discussion of PW results
RPS. If ATWS occurs, core damage is not assured, but in this example, many components must succeed in order for core damage to be avoided. The RPS has a much higher success probability than does this complex of equipment, and heat removal following RPS success is much more likely to be successful. Therefore, the RPS has very high PW. Thinking only of the 'reactor shutdown' function for the moment, the PW of the RPS relative to other elements can be understood as a case of Fig. 1a, with RPS considered the more reliable of the two legs shown in Fig. 1a.
AC Bus. Note that this event refers to a bus fault, not the availability of power to a bus. Most success paths involve one AC train or the other; everything except the TD train and pressure relief depends on AC. Therefore, the AC bus has very high PW. This is a typical example of a low-probability event that has a low F–V but high PW because it supports important success paths.
AF Pumps. The AF pumps have moderately high PW. The TD pump is given a relatively high failure probability, so for the general transient (offsite available and a low failure probability assigned to LD-TAP), the PW of the MD pump is higher. Note that (unlike the F–V) the absolute PW of these pumps is not affected by the presence or absence of bleed and feed. (This point will be illustrated later.)
AF Pump flowpaths. The flowpaths associated with each
pump have PW almost as high as that of the pumps themselves. This is different from the F–V result and is, in a sense, a prototypic difference between these measures. In fact, it is almost a case of Fig. 1b, because flow is required to two steam generators. All success paths involving a given AF flowpath also involve the associated pump, but not all success paths involving a pump involve all associated flowpaths; even for the MDPs, there are complex success paths involving multiple pumps but only one flowpath from each. Therefore, the PW of an AF flowpath is always less than or equal to the PW of the associated pump, but will typically be only slightly less (determined by the relationship between the flowpath failure probability and the pump failure probability).
LP Pump. The PW of this element is almost nil, because the only role of LP in this simple model is mitigation of large LOCA, and as defined here, PW is weighted by the initiating event frequency. The low assigned frequency of large LOCA means that this element has little opportunity to avert core damage within the context of the sequences modeled here. These pumps may, of course, have other functions not represented in this simple model.
DG Failure to start. This has a moderate safety significance. Recall that this measure is absolute, not relative, and reflects the success probability of paths containing this element for each initiating event, weighted by the frequency of the associated initiating event.

Table 4
Significance measures for selected elements
Item               F–V (%)    Item               PW (Nines)    Item               RAW
AF TDP             99.40      RPS                3.89          RPS                434.80
DG-1-FTS           97.73      AC-BUS-1           2.73          AC-BUS-1           28.84
LD-TAP             1.93       AF MDP             2.24          DG-1-FTS           26.43
RPS                1.30       AF MDP Flowpath    2.12          AF TDP             24.25
AF MDP             0.79       SI Pump            1.73          LD-TAP             20.30
AF MDP Flowpath    0.26       DG-1-FTS           1.43          AF MDP             3.02
SI Pump            0.18       AF TDP             1.39          AF MDP Flowpath    2.47
LP Pump            0.09       AF TDP Flowpath    1.37          SI Pump            1.28
AF TDP Flowpath    0.01       LD-TAP             1.31          LP Pump            1.17
AC-BUS-1           0.00       LP Pump            0.00          AF TDP Flowpath    1.08

3.3. Discussion of RAW results
There are interesting similarities and interesting differences between the PW ranking and the RAW ranking:
RPS. The RPS has a high RAW for reasons related to its safety significance and to its low baseline failure probability.
LD-TAP. LD-TAP is higher in the RAW table because this model is dominated by blackout to begin with, and loss of LD-TAP turns every initiating event into a loss of offsite power.
AF Components. In the RAW table, the flowpath of the TD pump comes in much lower than does the TD pump itself, reflecting high redundancy in the TD flowpaths. As mentioned above, for the PW measure, AF flowpaths tend to have PWs comparable to those of the associated pumps.
LP Pump. The LP pump has a relatively low RAW, because given the low frequency of large LOCA, its ability to increase CDF is limited.
SI Pump. In this model, the only roles of SI are boration in ATWS events, and bleed and feed in transients with failed AF. Its RAW is therefore limited. If small LOCA were modeled, the story would obviously be different.
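The differences among the three rankings can be made explicit by tabulating each element's rank under each measure. The Python sketch below does this with the values of Table 4 as printed; the data structure and variable names are assumptions made here for illustration.

```python
# Ranked lists transcribed from Table 4 (item, value), as printed.
table4 = {
    "F-V (%)": [
        ("AF TDP", 99.40), ("DG-1-FTS", 97.73), ("LD-TAP", 1.93),
        ("RPS", 1.30), ("AF MDP", 0.79), ("AF MDP Flowpath", 0.26),
        ("SI Pump", 0.18), ("LP Pump", 0.09), ("AF TDP Flowpath", 0.01),
        ("AC-BUS-1", 0.00),
    ],
    "PW (Nines)": [
        ("RPS", 3.89), ("AC-BUS-1", 2.73), ("AF MDP", 2.24),
        ("AF MDP Flowpath", 2.12), ("SI Pump", 1.73), ("DG-1-FTS", 1.43),
        ("AF TDP", 1.39), ("AF TDP Flowpath", 1.37), ("LD-TAP", 1.31),
        ("LP Pump", 0.00),
    ],
    "RAW": [
        ("RPS", 434.80), ("AC-BUS-1", 28.84), ("DG-1-FTS", 26.43),
        ("AF TDP", 24.25), ("LD-TAP", 20.30), ("AF MDP", 3.02),
        ("AF MDP Flowpath", 2.47), ("SI Pump", 1.28), ("LP Pump", 1.17),
        ("AF TDP Flowpath", 1.08),
    ],
}

# Rank 1 is the highest value of each measure.
ranks = {}
for measure, rows in table4.items():
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    for position, (item, _value) in enumerate(ordered, start=1):
        ranks.setdefault(item, {})[measure] = position

measures = list(table4)
print(f"{'Item':18s}" + "".join(f"{m:>12s}" for m in measures))
for item, by_measure in ranks.items():
    print(f"{item:18s}" + "".join(f"{by_measure[m]:12d}" for m in measures))
```

For instance, the AC bus fault is last by F–V but second by both PW and RAW, while the TDP flowpath sits close to the TDP by PW but falls to last place by RAW.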
3.4. General observations
In this example, it is seen that the PW ranking is again different from the RAW ranking, except for elements that essentially decouple from the rest of the problem, as in the simpler example discussed in Section 2.3. The PW value is really a property of the collection of path sets containing the element, and not the rest of the path sets, so the PW measure is ultimately different from RAW.
The PW and the F–V are complementary both in the mathematical sense and in the common-language sense. Strengths and weaknesses of each appear in the other. A curiosity seen in these examples is that if elements having extremely low failure probabilities are modeled, they come through with the same PW as the elements they are in series with. The analogous failure-space curiosity is the example of a very unlikely success path (for example, near-superhuman but not impossible recovery actions) logically in parallel with other success paths. One can add such an element to any risk model. If its failure probability is essentially unity, the risk is unaffected, but the F–V importance of such an element is extremely high. Although the need for a very unlikely recovery action might be considered a vulnerability, the difficulty of the action itself is not necessarily a vulnerability, any more than an extremely high success-probability element in the success path necessarily corresponds to a systemic strength.
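The failure-space curiosity can be reproduced with a few lines of arithmetic. In the Python sketch below, the cut sets, the basic event probabilities, and the recovery action R are all hypothetical, introduced here only for illustration; quantification uses the rare event approximation (sum of cut set probabilities) with independent basic events.

```python
from math import prod


def rare_event_sum(cut_sets, q):
    """Rare event approximation: sum of the cut set probabilities."""
    return sum(prod(q[e] for e in cut_set) for cut_set in cut_sets)


def fussell_vesely(cut_sets, q, element):
    """Share of the total contributed by cut sets containing the element."""
    containing = [c for c in cut_sets if element in c]
    return rare_event_sum(containing, q) / rare_event_sum(cut_sets, q)


# Hypothetical basic event probabilities; R is a recovery action that
# almost always fails (failure probability essentially unity).
q = {"A": 1.0e-2, "B": 1.0e-2, "C": 1.0e-4, "R": 0.999}

base_cut_sets = [{"A", "B"}, {"C"}]
# Crediting R in parallel with everything else adds R to every cut set.
recovery_cut_sets = [cut_set | {"R"} for cut_set in base_cut_sets]

risk_base = rare_event_sum(base_cut_sets, q)
risk_recovery = rare_event_sum(recovery_cut_sets, q)
fv_recovery = fussell_vesely(recovery_cut_sets, q, "R")

print(f"risk without R: {risk_base:.3e}")
print(f"risk with R:    {risk_recovery:.3e}")
print(f"F-V of R:       {fv_recovery:.3f}")
```

Because R appears in every cut set of the modified model, its F–V is unity even though crediting it changes the computed risk by only about 0.1% under these assumed numbers.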
Typically, the task of PRA modelers is to quantify dominant contributors to risk. This does not logically require that every element be included in the model regardless of its failure probability. Most elements that do appear are present because they contribute appreciably to risk (i.e. they are there in the expectation that their F–V measures will not be completely negligible). But this does not mean that elements that do not appear explicitly are not significant to successful operation. It usually just means that they affect only one division, and have a low probability of taking that division down, or have an extremely low probability of affecting more than one division. Some unmodeled elements have high PW; and of these, some have extremely low failure probabilities due to programmatic resources allocated to assure this outcome. The low F–V of such elements does not necessarily mean that the resources were misplaced; it may mean only that they were successful.

3.5. Initiating events

It was remarked above that a measure of the significance of initiating event prevention might usefully be associated with scenarios in the lower left quadrant of Fig. 6. One can define a measure in which the probability of successful initiating event prevention (the probability of not having the initiating event within a specific time interval) is weighted by the probability of mitigating system failure, conditional on that initiating event. This measures PW in terms of the adverse consequences expected if the initiating event occurs.

There is more than one way to interpret such a measure. In the case of initiating events that can be addressed by engineering or operational practices, the PW of initiating events might say something about the allocation of resources to their prevention. Some initiating events, such as severe earthquake, cannot (as far as we know) be prevented.
However, some events related to severe weather can be foreseen, up to a point; with some notice, plant operations can take measures to enhance plant capability to withstand such events. The PW associated with such measures is interesting to consider. Moreover, even though the low frequency of some initiators does not result from active prevention measures, a plant's safety case may indirectly reflect credit for their low frequency, so that the plant-wide allocation of resources is implicitly influenced by the low frequency. In other words, the low frequency of certain initiators has value, even if it does not result from anything done by the plant operators, or is only somewhat related to operators' performance.

Table 5 presents results for the PW of the initiating events considered in the model developed above. The measure has been defined as

$$\mathrm{PW}(\mathrm{IE}_i, t) = P(\overline{\mathrm{IE}_i}, t)\, P(\mathrm{TOP} \mid \mathrm{IE}_i), \tag{9}$$

where $P(\overline{\mathrm{IE}_i}, t)$ is the probability of not having initiating event $i$ before time $t$, and $P(\mathrm{TOP} \mid \mathrm{IE}_i)$ is the probability of the top event, conditional on initiating event $i$.

Table 5
Safety significance of initiating event prevention

Initiating event (IE)   P(no IE in a given year)   P(CD | nominal mitigating system performance)   PW(IE)
Transient               3.7 × 10^-1                1.0 × 10^-7                                     3.8 × 10^-8
LOOP                    9.5 × 10^-1                5.7 × 10^-5                                     5.4 × 10^-5
Large LOCA              <1.0 × 10^0                2.8 × 10^-5                                     2.8 × 10^-5

Table 5 suggests that prevention of LOOP is doing the most good, as measured by consequences averted as a result of successful prevention. Prevention of transients, whatever its contribution to plant availability, is not preventing many top events because the transient mitigating systems in this model are so reliable. Prevention of large LOCA is preventing more top events than prevention of general transients, because there is much less redundancy associated with large LOCA mitigation. This is related to the low PW of the LP pumps, which resulted from the success of prevention of large LOCA.

This measure is the only measure discussed so far according to which any aspect of large LOCA is 'significant'. Both the large LOCA IE and the LP pumps have low F–V because the IE frequency is so low, and the mitigating system, LP, has a low RAW because the IE frequency is so low. In prevention of core damage initiated by large LOCA, IE prevention is where value has been achieved. Given low large LOCA frequency, the LP pumps add little value (PW is weighted by initiating event frequency). Given only moderate LP reliability, large LOCA prevention adds value.

Based on the earlier discussion of the conditions under which the Nines index of a basic event equates to log(RAW), it can be seen that this measure of the significance of initiating event prevention would be simply related to the RAW of initiating events, if we were willing to define RAW for initiating events. The RAW of an event is usually presented as the outcome of a sensitivity study in which the basic event is set to true, and the interpretation of this for initiating events is problematic. Even in the present formulation, a question of interpretation arises; 'success' in initiating event prevention has been equated to 'no IEs in a specified time interval'. In Table 5, the time interval was taken to be one year.
It is interesting at this point to consider the effect of enhancing the model to reflect a need for performance of accumulators in large LOCA. Unless the failure probability of the accumulator function is infinitesimal, the calculated success path probabilities would be reduced as a result of modeling accumulator failure. The calculated PW of LP would therefore also be reduced. The PW of large LOCA prevention would correspondingly be increased.
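As a check on the arithmetic, the Table 5 entries can be recomputed from Eq. (9). The sketch below simply multiplies the two tabulated probabilities for each initiating event and ranks the results; the dictionary layout and function name are illustrative choices, and the large LOCA no-IE probability is approximated as 1.0.

```python
# Values transcribed from Table 5:
# (P(no IE in a given year), P(CD | nominal mitigating system performance)).
# The large LOCA no-IE probability is approximated as 1.0 here.
table5 = {
    "Transient": (3.7e-1, 1.0e-7),
    "LOOP": (9.5e-1, 5.7e-5),
    "Large LOCA": (1.0, 2.8e-5),
}

def prevention_worth(p_no_ie, p_top_given_ie):
    """Eq. (9): probability of no IE times conditional top event probability."""
    return p_no_ie * p_top_given_ie

pw = {ie: prevention_worth(*values) for ie, values in table5.items()}

# Initiating events ordered from highest to lowest PW.
ranking = sorted(pw, key=pw.get, reverse=True)
```

With the rounded inputs, the transient product comes out at 3.7 × 10^-8 rather than the tabulated 3.8 × 10^-8; the difference is presumably rounding of the tabulated inputs. The ordering (LOOP, then large LOCA, then transient) matches the discussion above.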
4. Computational considerations

The PW of a given element is determined by the joint success probability of the path sets containing that element. In problems of practical interest, the rare-event approximation is a poor approximation to the probability of a union of path sets, so quantification of their overall success probability is less trivial than quantification of the overall failure probability associated with a collection of minimal cut sets. The examples in this paper were therefore quantified in failure space: the path sets were complemented, and the resulting cut set expression was quantified in the rare-event approximation.

Conceptually, the general process for computing an element's PW is as follows. First, obtain all the path sets. Then, form an expression containing all and only the path sets containing the element, complement this collection of path sets (i.e. form the 'NOT' of it to obtain the cut sets of that specific union of path sets), and quantify the resulting cut set expression. This result is the argument of the logarithm in Eq. (6). In most practical problems, quantification of the complemented (cut set) expression can adequately be done in the rare-event approximation.

The calculations presented here were performed using SETS [5], a general-purpose tool for manipulation and quantification of Boolean expressions. SETS still supports NOT logic, and also supports partitioning of terms in Boolean expressions based on what elements they contain. Both capabilities were used in these calculations.

Even though the example done here was simplified, the calculations were relatively tedious, chiefly because they are not conveniently 'canned' in user software, and also because complementing sizable expressions (more than 100 terms or so) is not always trivial. In fact, it remains to be demonstrated that the computations needed to calculate PW are practical for a full reactor-scale problem.
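The conceptual process described above can be sketched in a few lines of Python. This is a toy illustration, not the SETS procedure: the element names and failure probabilities are hypothetical, the complement is formed by brute-force expansion with absorption, and the Nines value assumes that Eq. (6) takes the form of minus the base-10 logarithm of the quantified cut set expression.

```python
from itertools import product
from math import log10, prod

def complement_to_cut_sets(path_sets):
    """NOT of a union of path sets: expand the AND of ORs, keep minimal terms."""
    cuts = {frozenset()}
    for ps in path_sets:
        expanded = {c | {e} for c in cuts for e in ps}
        # Absorption: drop any term that strictly contains another term.
        cuts = {c for c in expanded if not any(o < c for o in expanded)}
    return cuts

def rare_event_probability(cut_sets, q):
    """Sum of cut set probabilities (rare-event approximation)."""
    return sum(prod(q[e] for e in cs) for cs in cut_sets)

def exact_union_failure(path_sets, q):
    """Exact probability that every path set fails, by state enumeration."""
    elements = sorted(set().union(*path_sets))
    total = 0.0
    for failed in product([False, True], repeat=len(elements)):
        down = {e for e, f in zip(elements, failed) if f}
        if all(ps & down for ps in path_sets):
            total += prod(q[e] if e in down else 1.0 - q[e] for e in elements)
    return total

# Hypothetical model: A in series with the parallel pair (B, C); D stands alone.
q = {"A": 1e-3, "B": 1e-2, "C": 1e-2, "D": 1e-2}
all_path_sets = [frozenset("AB"), frozenset("AC"), frozenset("D")]

# Step 1: keep all and only the path sets containing the element of interest.
paths_with_a = [ps for ps in all_path_sets if "A" in ps]
# Step 2: complement them to obtain cut sets of that union of path sets.
cuts = complement_to_cut_sets(paths_with_a)
# Step 3: quantify in the rare-event approximation, and exactly for comparison.
approx = rare_event_probability(cuts, q)
exact = exact_union_failure(paths_with_a, q)
# Assumed form of the Nines index (Eq. (6) is not reproduced here).
nines = -log10(approx)
```

For this small case the cut sets are {A} and {B, C}, and the rare-event value differs from the exact one only in the seventh decimal place. Brute-force enumeration and expansion scale exponentially, so this approach is only suitable for illustration.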
Practicality depends in part on the approach taken. It is well known that the computational effort involved in forming and reducing a Boolean expression is strongly related to the approach adopted, especially the strategy used to break the expression into pieces and solve the problem one piece at a time. For example, in formulating the output of an AND gate whose several inputs are large complex expressions, much can be gained by working with intermediate products: ANDing subsets of the inputs, and formulating the final result by ANDing these intermediate results. In fact, the actual gain in efficiency may even be strongly dependent on how combinations are chosen for ANDing in the first round. In the reactor example done here, the following simple steps were taken to reduce computational effort: (1) basic events were grouped into supercomponents wherever possible to reduce the number of terms, and (2) high-level expressions were formulated in terms of success and failure at the system level, and the problem solved in pieces, rather than in a single, inefficient step working entirely with the full core damage expression. For the simple problem formulated and solved here, (1) and (2) sufficed. For a full reactor problem, more considered approaches would be necessary.

In the 1970s and early 1980s, when the norms of current nuclear-plant PRA practice were being developed and importance measures were first being popularized, even the simple calculations done here would have been considered challenging. This may be one factor in the unpopularity of measures defined in success space, another being that risk significance was a topic of seemingly greater interest at the time. However, times have changed.

In evaluating whether the measures introduced here are really practical, it should be noted that complementing expressions is a special case of the computations involved in top event prevention analysis (TEPA) [6–9] (discussed below). Considerable work has gone into developing the capability to perform these calculations, and significant progress has been made in this area. For example, in a recent application [8] of TEPA, 'several million' prevention sets were obtained, each containing over 600 basic events. Until recently, such a processing feat would have been considered impractical. This example suggests that with some work, PW calculations in large-scale problems can be made practical if intelligent approaches to computation are adopted.
5. Dependence of importance measures on prevention set

5.1. Top event prevention analysis

TEPA [6–9] is a technique for choosing a collection of elements of a risk model having the property that credit for these elements alone is sufficient to satisfy a 'prevention criterion' (i.e. a safety objective). A collection of elements satisfying a prevention criterion is a 'prevention set'. Analogous to 'minimal cut set', a minimal prevention set for a given prevention criterion is a prevention set that is no longer a prevention set for that criterion if any element of it is removed. Given a model with more than enough elements to satisfy a prevention criterion, TEPA shows how to choose subsets of model elements that satisfy the prevention criterion and also optimize figures of merit such as cost.

The original purpose of TEPA was to support component classification for novel kinds of facilities at the design stage, when there is still a clean slate. Given a conceptual model taking credit for design features that are under consideration, TEPA can show which combinations of design features are capable of satisfying the criterion, and thereby support not only classification but design itself.

One example of a prevention criterion is 'the system shall be single-failure-proof'. A prevention set satisfying such a criterion is clearly a combination of success paths that is not incapacitated by any single failure. Another example of a prevention criterion is 'no cut set probability greater than x'. A prevention set satisfying such a criterion is a combination of success paths whose complement (the cut sets) has no terms whose probability violates the cutoff. More complex prevention criteria can also be articulated, such as 'single-failure-proof for large LOCA and steamline break, double-failure-proof for transients and small breaks'.

At this point in its development, TEPA operates only with prevention criteria articulated at the cut set level, and is therefore not a true global optimizer. However, in many real problems, it has been shown to generate useful prevention sets.

A key feature of TEPA, unique among extant techniques for component classification, is that the prevention sets are combinations of complete success paths, and as such, are actually functional when propagated through the risk model. No single-event importance measure (such as F–V, RAW, or PW) leads in general to identification of a set of elements having this property [6–10]. Application of F–V alone, for example, may lead to inclusion of pumps but not flowpaths.

The concept of a prevention set appears to apply more generally than to cut set level criteria. It is meaningful to identify collections of success paths that satisfy a safety objective, whether or not that objective is articulated at the cut set level.

TEPA is introduced here for the following reasons. First, it is important to make the point that importance measures, including PW, are dependent on the prevention set within which they are calculated. Second, given a prevention set, importance measures including PW may clarify thinking about how to allocate resources within the elements of the prevention set. Finally, because of the way in which TEPA currently works, it may be possible to use PW to improve the efficiency of TEPA itself, by directing the algorithm to the stronger success paths whose inclusion in prevention sets will make the overall safety case stronger.

Table 6
Risk significance measures compared for two prevention sets: (1) baseline, (2) operator action to initiate bleed and feed not credited

                    F–V                             RAW
                    Baseline (%)   No F and B (%)   Baseline   No F and B
AF TDP              99.40          99.00            24.25      26.07
DG-1-FTS            97.73          57.93            26.43      16.18
AF MDP              0.79           26.19            3.02       67.16
AF MDP Flowpath     0.26           7.36             2.47       56.52
LD-TAP              1.93           1.33             20.30      14.38
RPS                 1.30           0.65             434.80     215.37
LP Pump             0.09           0.04             1.17       1.09
SI Pump             0.18           0.03             1.28       1.04
AF TDP Flowpath     0.01           0.01             1.08       103.95
AC-BUS-1            0.00           0.01             28.84      159.94

5.2. Importance measures as a function of prevention set

Tables 6 and 7 illustrate how importance measures change with prevention set.
The baseline prevention set is the accident sequence model used above. The other prevention set is derived from the first by removal of a single element, having the effect of removing credit for bleed and feed. It is clear that these prevention sets cannot both be minimal with respect to the same prevention criterion, but this kind of change serves to illustrate the implications for importance measures. Removal of bleed and feed means that the only success paths supported by SI are associated with boration following ATWS.

F–V and RAW were presented above for the baseline prevention set. With removal of bleed and feed, the risk importances change as shown in Table 6. The most obvious change is that the motor-driven AF pump now has a higher F–V. This occurs because of the loss of the bleed and feed success path. The TDP already had a high F–V because CDF was already dominated by blackout, in which condition neither the AF MDP nor bleed and feed can operate.

The change in PW is shown in Table 7. Since there are now fewer success paths in the model, the absolute PW of any given element either stays the same or decreases. (The relative PW could, of course, increase.) The small reduction in RPS PW occurs because complete success paths require not only RPS success but also heat removal, and one way to remove heat is by bleed and feed. The SI pump's safety significance becomes almost nil, because its only remaining function in this model is mitigation of ATWS, a very rare event. (Recall that contributions to PW are weighted by challenge frequency.) AF elements retain their original PW. The AC bus PW reduces somewhat, because it used to support not only the AF pumps but also the bleed and feed option, and now no longer supports bleed and feed. A similar comment applies to the DG.

Table 7
Safety significance compared for two prevention sets
PW (Nines) compared for two prevention sets: original model vs. 'operator action to initiate bleed and feed not credited'

Item                F and B not credited   Original model
RPS                 3.88                   3.89
LD-TAP              1.31                   1.31
DG-1-FTS            1.39                   1.43
AC-BUS-1            2.24                   2.73
AF MDP              2.24                   2.24
AF MDP Flowpath     2.12                   2.12
AF TDP              1.39                   1.39
AF TDP Flowpath     1.37                   1.37
LP Pump             0.00                   0.00
SI Pump             0.00                   1.73

For some plants, removal of success paths containing LD-TAP would be an interesting example to consider. In the above example, the CDF is already dominated by blackout, so the change would not be very dramatic.
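The qualitative behavior in Table 7 can be reproduced with a toy model. The sketch below is hypothetical and much smaller than the paper's model: it uses four invented elements (`AC`, `AF`, `SI`, and an operator action `OP`) with made-up failure probabilities, credits only the path sets wholly contained in a prevention set, and computes exactly the probability that every credited path set containing a given element fails. A larger value of that probability corresponds to a lower PW.

```python
from itertools import product

def union_failure(path_sets, q):
    """Exact probability that none of the given path sets succeeds."""
    if not path_sets:
        return 1.0  # no credited success path contains the element
    elements = sorted(set().union(*path_sets))
    total = 0.0
    for failed in product([False, True], repeat=len(elements)):
        down = {e for e, f in zip(elements, failed) if f}
        if any(not (ps & down) for ps in path_sets):
            continue  # at least one path set is fully available
        p = 1.0
        for e in elements:
            p *= q[e] if e in down else 1.0 - q[e]
        total += p
    return total

def credited_paths(path_sets, prevention_set):
    """Path sets that rely only on elements credited in the prevention set."""
    return [ps for ps in path_sets if ps <= prevention_set]

def element_failure(element, path_sets, prevention_set, q):
    """Failure probability of the credited path sets containing the element."""
    paths = [ps for ps in credited_paths(path_sets, prevention_set) if element in ps]
    return union_failure(paths, q)

# Hypothetical probabilities and success paths (not the paper's model).
q = {"AC": 1e-3, "AF": 1e-2, "SI": 1e-2, "OP": 1e-1}
model = [frozenset({"AC", "AF"}), frozenset({"AC", "SI", "OP"})]
baseline = {"AC", "AF", "SI", "OP"}
no_bleed_and_feed = baseline - {"OP"}  # operator action not credited

results = {
    e: (element_failure(e, model, baseline, q),
        element_failure(e, model, no_bleed_and_feed, q))
    for e in ("AC", "AF", "SI")
}
```

In this toy case the value for `AF` is unchanged, the value for `AC` rises (so its PW falls), and `SI` is left with no credited success path at all, mirroring the pattern described for the AF elements, the AC bus, and the SI pump.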
6. Summary comments on measures of event significance

• The F–V of an element is based on its inclusion in probabilistically significant cut sets. F–V indicates relative vulnerability. It is reasonable to describe F–V as a measure of risk significance.
• A measure of safety significance of an element (PW) can be based on its inclusion in probabilistically significant path sets.
• The conceptual pros and cons and limitations of PW are complementary to the pros and cons and limitations of the F–V measure, although the invalidity of the rare-event approximation for the probability of a union of success paths somewhat complicates the interpretation of PW relative to the interpretation of F–V.
• PW is distinct from, but has a significant relationship to, RAW. In exceptionally simple cases, but only in exceptionally simple cases, the two measures are essentially equivalent.
• An element having high F–V and high PW is a relatively weak link in a strong collection of success paths.
• PW is defined for a group of path sets. It refers to a particular basic event if the collection of path sets has been specified as containing that event. It is meaningful to discuss the PW for any group of path sets. Interesting examples are prevention sets and entire systems.
• The prevention set within which a given measure of event significance is calculated has a large effect on the value of the measure. When used, both kinds of measures (risk significance and safety significance) need to be quoted with respect to a particular prevention set.
• Ranking components by PW could improve the efficiency of TEPA. Prevention sets that include higher-performance success paths are likely to be higher-performance prevention sets.
• It is misleading to suggest that any single-event measure captures 'importance'. At best, a given measure reflects some particular consideration.
7. Applicability to component classification

Traditionally, classification of components in commercial nuclear power plants to determine special treatment requirements has been closely related to how (and especially whether) their performance has been invoked in the analysis of design basis accidents: the complement of components needed to fulfil the design basis with single-active-failure-proof redundancy is designated safety-class. This approach has not led to optimal resource allocation for several reasons. The selection of design basis accidents, whatever its merits for purposes of specifying physical capability of systems, was not formulated to optimize resources from a risk point of view. Some scenarios receive priority despite being highly unlikely, while others that arguably need attention are not addressed. Moreover, once invoked in the safety case, components are likely to be subjected to extremely demanding regulatory requirements, sometimes out of proportion to the level of performance that would be deemed adequate in a safety case that quantified the level of protection. This inappropriately allocates safety resources, and imposes significant overall burdens on licensees. Finally, components not included in the traditional safety case may nevertheless carry part of the real safety burden; a more optimal allocation of performance might relax requirements on very stringently regulated components, while reflecting a regulatory stake in the protection afforded by components that are not presently safety-class.

In the last 10 years, attempts have been made to improve component classification based on importance measures: not necessarily changing what is 'safety-class', but modifying existing classification schemes with additional designations based on risk model insights (e.g. low safety significant) in the hope of improving the allocation of treatment resources. A detailed review of work in this area is beyond the scope of this paper. However, the following seem to be characteristic of much recent work.

1. Many discussions implicitly equate what is here called risk significance to across-the-board importance. Some authors all but equate risk significance with safety significance. For example, one sees elements classified 'high safety-significance' based on high F–V. An important exception is Ref. [1].
2. In most work done by the industry,¹ the prevention set is not explicitly discussed, and is therefore implicitly the set of all components modeled in the PRA.
3. There is a tendency to equate a need for special treatment with high F–V or high RAW, independently of the actual performance allocation, and to de-emphasize special treatment for low F–V and low-RAW elements. In areas where defenses are thin (F–V is high), this emphasizes special treatment, and in areas where defenses are abundant (F–V is low), it de-emphasizes special treatment, as if none of the barriers in the latter areas were important.

¹ Some exceptions appear in the work by Blanchard and coworkers (see for example Ref. [8]).

Regarding (1): Part of the current proposal is that it is useful to distinguish between safety significance and risk significance.

Regarding (2): Part of the current proposal is to recognize that element importance measures reflect not only element properties, but also properties of the prevention set within which they are calculated.

Addressing the questions raised by (3) is beyond the present scope, although it is hoped that consideration of PW will enhance the perspective afforded by existing measures. PW should not replace risk significance as the measure to determine special treatment, nor should any combination of single-element measures be used, except heuristically. Arguably, the appropriate resource allocation depends on the level of performance allocated to the element in the safety case (compared, for example, to 'generic' performance), the susceptibility of performance to being influenced by treatment, and the uncertainty in, or variability of, that performance. The latter is an important theme in Refs. [1,2].

For plants starting with a clean slate (other industries and not-yet-licensed plants), the following steps, adapted and abbreviated from Refs. [7,9], might be useful:

1. Construct a comprehensive risk model.
2. Determine a candidate prevention set.
3. Allocate performance over elements of the prevention set, and propagate the result through the model to see whether the desired properties (risk metrics, defense in depth) are achieved. This should include measures of central tendency (mean) as well as uncertainty. See the references for discussion of unmodeled elements implicitly included in prevention sets.
4. Determine the treatment requirements needed to achieve allocated performance, and the costs of satisfying them. Factors affecting treatment requirements arguably include whether the allocated performance is better than generic for a given element, and the degree of variability in the performance of elements of its class.
5. Iterate steps 2–4 to achieve a solution that satisfies safety objectives at a reasonable cost.

The above steps were articulated without explicit mention of significance measures. It has been shown [6–10] that single-event importance measures do not lead to sensible prevention sets (step 2). They may find heuristic application in step 3 (allocation of resources within a prevention set), but any decision based on single-event significance measures can and should be confirmed by explicit model quantification of the implications of the decision.

8. Summary

A measure of the safety significance of a risk model
element has been proposed (PW). This measure is analogous to a widely used measure of the risk significance of an element, F–V importance, which is based on the element's inclusion in probabilistically significant cut sets. The PW of an element is based on its inclusion in probabilistically significant path sets. The conceptual pros and cons and limitations of this measure are complementary to the pros and cons and limitations of the F–V measure, although the invalidity of the rare-event approximation in success space somewhat complicates the interpretation of PW relative to the interpretation of the F–V measure. The PW of an element reflects what the element is in series with. The F–V of an element reflects what the element is in parallel with.

The PW is distinct from, but has a significant relationship to, RAW. For elements that appear only in success paths that do not overlap success paths not containing the element, the Nines index is equivalent to log(RAW).

It is an oversimplification to equate either risk significance or safety significance with overall component importance. The two kinds of measures reflect complementary aspects of a risk model.

The prevention set within which a given measure is calculated has a large effect on the values of the significance measures. When used, both kinds of measures (risk significance and safety significance) need to be quoted with respect to a particular prevention set.

The proposed measure of element safety significance (PW) is really a measure of the PW of a union of success paths. It can be applied to an element because a particular union of success paths can be specified by the paths' containing that element. PW has meaning, and possible usefulness, when applied to other unions of success paths. One could specify a set of success paths in terms of their each containing a specified group of elements, or one could quantify the PW of a system by calculating the PW of its success paths.
Because success paths are the functional elements of a safety case, PW is a potentially useful tool in TEPA. Prevention sets that include higher-performance success paths are likely to be higher-performance prevention sets.

Acknowledgements

The author is grateful to ISL, Inc. for supporting this work. The author is grateful to the referees for several helpful suggestions, and to D. Blanchard for several helpful conversations. Views expressed in this paper are solely those of the author.

References

[1] Cheok MC, Parry GW, Sherry RR. Use of importance measures in risk-informed regulatory applications. Reliab Engng Syst Safety 1998;60:213–26.
[2] Borgonovo E, Apostolakis GE. A new importance measure for risk-informed decision making. Reliab Engng Syst Safety 2001;72:193–212.
[3] Youngblood R et al. Pair importance measures in systems analysis. ANS/ENS Topical Meeting on Thermal Reactor Safety, February 1986.
[4] Wong S et al. Performing sensitivity evaluations of plant risk using interactive PC-based programs. Proceedings of PSA '89/International Topical Meeting/Probability, Reliability, and Safety Assessment, 2–7 April 1989, Pittsburgh, Pennsylvania. La Grange Park, IL: American Nuclear Society. 1989.
[5] Worrell RB. SETS Reference Manual, NUREG/CR-4213 (USNRC, 1985). Actually, the version of SETS used here was a significantly updated commercial version, not the mainframe version documented in this reference. No generally-available references to the current version exist.
[6] Youngblood RW, Worrell RB. Top event prevention in complex systems. Proceedings of the 1995 Joint ASME/JSME Pressure Vessels and Piping Conference, PVP vol. 296, SERA vol. 3, Risk and Safety Assessments: Where Is The Balance? July 1995. New York: The American Society of Mechanical Engineers. 1995.
[7] Youngblood RW. Applying risk models to formulation of safety cases. Risk Anal 1998;18(4):433.
[8] See Nierode CF, Worrell RB, Blanchard DP. Use of top event prevention analysis to select a safety-significant subset of air-operated valves for testing. Proceedings of the 4th International Conference on Probabilistic Safety Assessment and Management, 13–18 September 1998, New York City, USA [sic]. London: Springer. 1998. p. 1358–63, plus citations in Ref. [7] above.
[9] Youngblood RW et al. Elements of an approach to performance-based regulatory oversight, NUREG/CR-5392, USNRC, 1999.
[10] Vesely WE. The use of risk importances for risk-based applications and risk-based regulations. Proceedings of the International Topical Meeting on Probabilistic Safety Assessment PSA '96. La Grange Park, IL: American Nuclear Society. 1996. p. 1623–31.