Reliability Engineering and System Safety 46 (1994) 203-207
ELSEVIER
0951-8320(94)00024-7
© 1995 Elsevier Science Limited. Printed in Northern Ireland. All rights reserved. 0951-8320/94/$7.00
Guest Editorial
On merging system safety and quantitative risk assessment

Unfortunately there is at present no accepted terminology in this field.
FRANK LEES, 1980¹
INTRODUCTION

Risk assessment (RA) has lately been the dominant tool used to address public safety issues in the nuclear industry. The key United States federal agencies associated with this work environment, the Nuclear Regulatory Commission (USNRC) and the Department of Energy (DOE), have each used RA extensively to evaluate and manage risk at their facilities. Other industries and hazardous technologies have not been so enthusiastic in its use.² A military approach to the need to assure safety uses a concept termed system safety.³ In contrast, the chemical process industry (CPI) seems to be in a transition⁴,⁵ from more simplistic analysis approaches such as HAZOP⁶ to something closer to the RA of its nuclear sister, quantitative RA (QRA).⁷

Recently, the opportunity has arisen to compare and blend these two approaches in the nation's program to dispose of its chemical munitions stockpile per international treaty (the CSDP).⁸ In preparation for this disposal program, the National Research Council (NRC) has called for the development of a QRA for the initial continental US (CONUS) chemical agent disposal facility (CDF) sited at the Army Depot near Tooele, Utah (TOCDF).⁹ This QRA is fully scoped (including estimates of public and worker risk from internal and external events). As such, it is more ambitious than even the landmark RA, WASH-1400.¹⁰ The RA borrows its name from that used in the chemical industry, namely QRA, borrows from the fairly mature technology of nuclear RA,¹¹ and is performed under the auspices of MIL-STD-882C.

As an anchor for the following discussion, the TOCDF is a relatively standard chemical process plant, compared, say, to the agri-pesticide industry, with the added hazard that the toxic chemicals in the Depot's stockyard are often embedded in delivery munitions. The TOCDF also contains a typical litany of chemical hazards: acids, caustics (bases), natural gas, hydrogen and, of course, the special chemical agents at the root of the whole stockpiling effort.¹²
SEMANTIC DIFFICULTIES

The technical definition of risk is well established.¹³ Conceptually, risk is the collection of all possible risk scenarios, what MIL-STD-882C refers to as mishaps. Each mishap is a postulated progression of events leading to consequences, a loss related to human health and safety. As such, each scenario has a (typically unknown) occurrence frequency associated with the sequence of events and a quantifiably severe loss.¹⁴ The triplet consisting of a scenario, its frequency and its loss, indicated by (s_i, f_i, l_i), is risk in this view. The collection of all such triplets for any facility or its underlying technology is its overall risk. Numerically, risk may be reduced to the product of each triplet's frequency with a numeric measure of its loss. In this case, overall numerical risk is simply the sum of all such products, since each scenario is arranged so as to be distinct from all the others.¹⁵ Notice that this technical definition seems to emulate the common sense one: risk is the likelihood of loss.

Semantic difficulties apparently arise with the concept of hazard. In fact, the difference between the system safety approach to safety and the quantitative risk assessment approach may be said to be that the system safety proponents do hazard analyses and the risk analysts do risk. It seems that the overriding hope of system safety proponents is to avoid the concept of probability altogether; so much so that even when they are in a quantitative mood, they do not use the older term probabilistic RA.

The concept of hazard should be non-controversial. A hazard is a material feature of the world such that, were it to come into contact with a person or a person with it, an unfortunate consequence could occur (to the person, of course). This seems to be the gist of the MIL-STD definition of hazard: 'a condition that is prerequisite to a mishap.'¹⁶ Surely, the key prerequisite to a mishap is the objective thing that 'carries' the danger associated with the mishap: toxins, energy, etc.

Table 1 indicates some of the types of, and specific, hazards at the TOCDF. This list borrows from the System Safety Management Plan (SSMP) that documents the CSDP's implementation of 882C.¹⁷ As noted previously, this list is rather standard for a chemical process facility. Industry statistics are gathered annually on the degree of safety to facility personnel.¹⁸ The Tooele QRA specifically treats most of these hazards as amenable to statistical techniques and therefore not in scope. Rather, worker risk was restricted to that unique to the handling and disposal of chemical agent munitions. Environmental risk was also out of scope, although an environmental risk study was to be performed separately.

Table 1. Some TOCDF hazard types

Types: Toxins; Stored chemicals; Caustics; Electrical; Force; Temperature; Mechanical; Asphyxiates; Pressure; Other hazmats
Specific hazards: Blister agents; Nerve agents; TNT; Hydrogen; Propane; Natural gas; NaOH; NaClO; HCl; ac/dc carriers; Batteries; Heat tracing; Lightning; Earthquake; Tornado; Meteorite; Furnaces & incinerators; Demil machines; Forklifts; Conveyors; Trucks; Halon; Building (HVAC); Hydraulics; Spent charcoal; Dioxins, NOx, SOx

It is unfortunate that the definition of hazard in the MIL-STD is so vague, since system safety must depend on its crisp definition to implement most of its recommended program. To the MIL-STD, a hazard is a prerequisite condition to a mishap. A mishap and a scenario are apparently equivalent concepts, which seems reasonable; the word 'condition' can stand for anything, but it doesn't seem to apply well to the occurrence of an event. The prefix 'pre' indicates that a hazard in some sense exists prior to the mishap: hence, a hazard is distinct from a mishap and precedes it.¹⁹

This all seems workable until another feature is added to the MIL-STD definition of hazard. A hazard is not only associated with a severity but also with an occurrence frequency.²⁰ Now a mishap, being a sequence of events, can have associated with it a frequency of occurrence. But a prerequisite condition, a material, objective danger, is not an event. Hence, the intuitive concept of a hazard cannot have a frequency. Similarly, such a condition cannot really have a severity, although (by imagining a mishap) a hazard can be said to be associated with a potential severity (of loss). Hence, to maintain conceptual integrity, a hazard is a feature of reality. It exists; it may cease to exist; but not being a part of a scenario/mishap, i.e. not being an event, a hazard cannot be associated with an occurrence frequency or severity as it is in the MIL-STD. From this perspective, Table 1 amounts to a hazard audit of the facility: an itemization of the (types of) physical reality that can harm people, irrespective of whether a mishap actually occurs or not.

A SOLUTION: THE RISK PARADIGM
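Before turning to the paradigm, the triplet definition of risk given under Semantic Difficulties can be made concrete. The following is a minimal sketch in Python, not a piece of the Tooele QRA: the class name `RiskTriplet`, the function `numerical_risk`, and every scenario, frequency and loss value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskTriplet:
    """One (s_i, f_i, l_i) triplet: scenario, frequency, loss."""
    scenario: str      # s_i: description of the postulated mishap
    frequency: float   # f_i: occurrences per year (hypothetical value)
    loss: float        # l_i: numeric measure of loss (hypothetical units)


def numerical_risk(triplets):
    """Sum of frequency times loss over mutually distinct scenarios."""
    return sum(t.frequency * t.loss for t in triplets)


# Hypothetical, illustrative values only; not results from any assessment.
overall_risk = [
    RiskTriplet("agent leak past first confinement", 1e-3, 2.0),
    RiskTriplet("munition explosion during handling", 1e-5, 50.0),
]

total = numerical_risk(overall_risk)
print(f"numerical risk = {total:.4g} loss units per year")
```

The sum is meaningful only because the scenarios are arranged to be distinct from one another. Collapsing the list of triplets into a single number also discards information, which is the caveat raised in note 15.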
The solution of the system safety conceptual jumble is to take advantage of the years of groundwork laid by the RA community.²¹ Figure 1 presents the basic paradigm in RA. A mishap (what RA terms a scenario) is a 'path', possibly one of many, from a material danger (the hazard) by means of a sequence of events to the consequences of this particular path (its loss). The events linked together in the definition of the mishap are postulated, i.e. they could but may not have happened. This is where the probabilistic framework enters the analysis.

Fig. 1. The general paradigm for safety and risk assessment. [Diagram labels: Hazard; latent conditions; initiators; incident; mishap/accident; beyond engineering controls; Loss.]

A particular mishap begins with an event called the initiator. An initiator at TOCDF (for the purposes of the QRA) is defined as either an upset of the disposal process, an agent leak, or the explosion of a munition. In the first case, there are other engineered barriers to the leak of agent and an incident is a path to the breach of first agent confinement. The mishap, or what is often termed a (severe) accident, breaches all engineered controls, i.e. all mitigators and confinement, which is of particular significance when public safety is the risk criterion of interest. It should be noted that some initiators are immediately far reaching in consequences in that they amount to incidents or even mishaps in themselves.

This may also be another confusion related to the hazard concept residual in RA. What is called a seismic hazard can sometimes lead to a use of language that suggests that the hazard, in this case that of an earthquake, has a frequency.²² But it is only the seismic event, an earthquake's happening, that may be associated with a frequency. Any of the event-oriented elements of the paradigm (the initiator, the incident, or the mishap) can have an occurrence frequency. It is toward the estimation of this frequency that much of the effort of QRA is directed. However, system safety need not pursue this quantitative urge, since much, if not most, of the insight needed to manage risk in a hazardous facility is obtained from the qualitative efforts of the quantitative RA.

Coming at this point from another direction, one of the criticisms of the stable of system safety techniques (the preliminary and system hazard analyses) is that it doesn't result in much analysis. Hazard analysis is the step from the necessary hazard audit to identifying (typically, unsystematically) the proximal cause and effect of a hazardous element.²³ This element of a hazard analysis is ill-defined, but in practice it seems to mean the initiator or often any of the events that can be part of a potential (but not fully identified) mishap. One result of this confusion is that a hazard tracking log (HTL), which is mandated by MIL-STD-882C, may be implemented as a mishmash of elements from the figure.

A final element of the paradigm is a set of latent conditions. These latent conditions are preconditions to mishaps but not necessarily for a particular mishap: hence, they are not hazards, although they are prior to the initiation of any mishap. For example, what human factors specialists usually call human engineering deficiencies (HEDs) have been implemented in some hazard studies as human hazards.²⁴ Obviously, people may be dangerous, but in an engineered facility there is no call to assume any maliciousness. Instead, HEDs and their possible result, human errors, may be prior (latent) conditions to a mishap, a rough measure of the inadequacy of the safety environment at any time. But they still are not the hazards themselves.²⁵ Of course, other latent conditions may solely relate to equipment or the environment of the facility. Notice that failures of mitigative features may occur prior to a mishap if undetected and, hence, also be latent.

IMPLICATIONS: A MERGER
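As a bridge from the paradigm to its implications, the elements of Fig. 1 can be sketched as a small data model, which may help keep the entries of a hazard tracking log apart. This is a minimal Python sketch under stated assumptions, not anything prescribed by MIL-STD-882C or taken from the Tooele QRA: the names `Hazard`, `LatentCondition`, `EventElement`, `ElementKind` and `has_frequency`, and the example frequency, are all hypothetical. The point illustrated is that only the event-oriented elements (initiator, incident, mishap) can carry an occurrence frequency.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ElementKind(Enum):
    INITIATOR = "initiator"
    INCIDENT = "incident"
    MISHAP = "mishap"


@dataclass(frozen=True)
class Hazard:
    """A material feature of the facility; deliberately has no frequency."""
    name: str


@dataclass(frozen=True)
class LatentCondition:
    """A precondition to mishaps in general; not an event, so no frequency."""
    description: str


@dataclass(frozen=True)
class EventElement:
    """An event-oriented element; the only kind that may carry a frequency."""
    kind: ElementKind
    description: str
    hazard: Hazard
    frequency_per_year: Optional[float] = None  # None until quantified


def has_frequency(element) -> bool:
    """True only for event elements that have been quantified."""
    return (isinstance(element, EventElement)
            and element.frequency_per_year is not None)


# Hypothetical entries, loosely following the seismic example in the text.
earthquake = Hazard("earthquake")
seismic_event = EventElement(ElementKind.INITIATOR, "seismic event at the site",
                             earthquake, frequency_per_year=1e-3)
hed = LatentCondition("human engineering deficiency at an operator station")

log = [earthquake, seismic_event, hed]
quantified = [entry for entry in log if has_frequency(entry)]
print([entry.description for entry in quantified])
```

Keeping the three kinds of entry as separate types is one way to prevent a log from attaching a frequency to a hazard or a latent condition; a single record type with a mandatory 'kind' field would serve the same purpose.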
From the discussion above as well as hard lessons learned in developing QRA, it seems that the merger of system safety with QRA can begin with the following considerations:

(1) The definitions of hazard, mishap and the other event-oriented hazardous elements of the RA paradigm should be recognized and adhered to in any safety analysis. (Note that this would mean modifying the MIL-STD in obvious ways.)

(2) A hazard audit, i.e. making a hazard list, is crucial to instigating either a system safety hazard approach to safety or a QRA-based one.

(3) A preliminary hazard analysis (PHA; the distinction between preliminary and system hazard analysis is not really useful) should freely identify any of the hazardous elements, latent or event-oriented, of a mishap related to a given hazard, but should carefully note which element it represents in a column (or three) additional to the standard PHA format.²⁶ At a minimum, initiator, mitigator and latent conditions should be distinguished. Figure 2 indicates the minimal information for a PHA. Notice that the third example element is both latent and mitigative.

Fig. 2. A sample PHA form with minimal information needed.

Hazard: agent
Location: Munitions Demilitarization Building, Deactivation Furnace System Room

Entry 1
  Proximal cause: loss of combustion air; compressor fails off; control valve fails closed; ac power fails off
  Hazardous element: flameout
  Proximal effect: challenges automatic shutdown system
Entry 2
  Proximal cause: design decision: startup is mostly manual
  Hazardous element: interlock not available when temperature below 500 deg.
  Proximal effect: safety element is not available in some operating modes; risks mode error by operator
Entry 3
  Proximal cause: ACAMS incorrectly changed out in furnace room
  Hazardous element: data bus to PLC not reconnected
  Proximal effect: workers might enter room with less than required protection when agent present
  Element type (marked X): latent, mitigator

(4) A detailed hazard analysis (DHA) should consist of the systematic identification of all credible hazardous elements: initiators by hazard and location, the events in each postulated mishap by hazard and location, and the specific expected loss from each mishap.²⁷ Specifically, the techniques of event and fault trees from QRA should be used (or some surrogate). This will assure that multiple events, dependencies between preventative and mitigative events, human interactions and HEDs, and common cause events are properly accounted for in the hazard analysis. The analytical framework developed will also allow the incorporation and assessment of actual events (initiators or incidents). Notice also that this framework means that, although fault tree analysis is not a way to identify hazards as implied in the SSMP,²⁸ it is one rigorous way to identify hazardous elements. The output of a DHA is a set of mishaps related to each hazard, defined as cutsets from the event tree/fault tree models. A cutset is typically a set of multiple events that, taken in sequence, make up a mishap. Considerable assistance is also required from phenomenological (so-called deterministic) analyses to support the identification and qualitative specification of mishaps. One early finding is that the TOCDF is an intimate interconnection of people, systems and phenomena, far unlike any found in the nuclear industry.

Table 2. Toward merging approaches

                     System safety    QRA
Adopt definitions    Mandatory        Mandatory
Hazard audit         Mandatory        Mandatory
PHA                  Mandatory        Optional
Detailed HA          Recommended      Mandatory
Quantification       Optional         Mandatory
Consequences         Qualitative      Quantitative
Risk amalgam         Qualitative      Quantitative
HTL                  Mandatory        Mandatory
(5) A mishap quantification analysis is the only dependable way to assign frequencies to mishaps and otherwise 'quantify' hazards.²⁹ This includes a proper data analysis, a full human factors/human reliability analysis, and a meaningful common cause analysis. A DHA is prerequisite to quantification, and when a DHA is opted for, even without full quantification, the qualitative assignment of probabilistic values is more credible.

(6) A consequence (loss) analysis is the means toward properly quantifying the loss of each postulated mishap. This analysis must adopt the probabilistic framework as well in order to blend effectively with the rest of the analysis. This may mean significant adjustment to environmental risk approaches as well as to favorite codes used to assess population exposure.

(7) A risk amalgamation effort is required to usefully compile and present risk results and provide them for safety and risk management. The simplistic risk assessment codes (RACs) of the MIL-STD³⁰ may have to be used if a fully quantitative approach is not adopted but, unaccompanied by a DHA, they can be quite misleading.

(8) A serious hazard tracking log requires that the disposition of the hazards, i.e. hazards, mishaps, latent conditions, etc., actually be tracked by the managing organization. It is one thing to identify additional mitigation/preventive strategies and another to implement them effectively. The latter needs the foundation described above and the commitment to apply it.

Only when this program is adopted as the way to go in system safety (as well as in the implementation of regulations related to the CPI), or some thorough, consistent complement of this program is completed, will the design goal of system safety be met:

    From the first, design to eliminate hazards. If an identified hazard cannot be eliminated, reduce the associated risk to an acceptable level through design selection.³¹

Among the strengths of QRA are the fact that the qualitative analysis is top-down and event oriented, the fact that the urge to quantify has utility beyond quantification, and the fact that QRA forces the examination of uncertainties. These benefits can only be obtained by following the program recommended in the rightmost column in Table 2. Short of that, a major step toward merging system safety with QRA can be made by following the program recommended in the middle column: namely, performing as detailed a qualitative, i.e. hazard, analysis as resources allow, so that the qualitative specification of risk has a reasonable basis. The first step toward this merger, however, is the adoption of a consistent conceptual framework as developed by RA and outlined above. Then, and only then, can Frank Lees' statement finally be falsified.
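Consideration (4) above describes the output of a DHA as cutsets from the event tree/fault tree models. As an illustration only, the following minimal Python sketch derives the minimal cutsets of a very small fault tree built from AND and OR gates. The tree itself is hypothetical: the three cause names are borrowed from the flameout entry of Fig. 2, but the gate structure, the 'automatic shutdown fails' event and the assumption that loss of ac power also defeats the shutdown are invented here to show how a common cause event surfaces. The function names `cut_sets` and `minimal` are likewise assumptions of this sketch.

```python
from itertools import product


def cut_sets(node):
    """Return the cut sets of a fault-tree node as a set of frozensets.

    A node is either a basic event (a string) or a tuple (gate, children)
    where gate is "AND" or "OR".
    """
    if isinstance(node, str):            # basic event
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":                     # any one child's cut set suffices
        return set().union(*child_sets)
    if gate == "AND":                    # need one cut set from every child
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {gate}")


def minimal(sets):
    """Drop every cut set that strictly contains another cut set."""
    return {s for s in sets if not any(other < s for other in sets)}


# Hypothetical tree: the mishap needs a flameout AND a failed shutdown.
flameout = ("OR", ["compressor fails off",
                   "control valve fails closed",
                   "ac power fails off"])
shutdown_fails = ("OR", ["automatic shutdown fails",
                         "ac power fails off"])   # assumed common cause
mishap = ("AND", [flameout, shutdown_fails])

minimal_cut_sets = minimal(cut_sets(mishap))
for cs in sorted(minimal_cut_sets, key=lambda s: (len(s), sorted(s))):
    print(sorted(cs))
```

In this made-up tree the single event 'ac power fails off' is a cutset on its own because it appears under both gates, which is the kind of dependency between preventative and mitigative events that a hazard list alone would not reveal.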
NOTES AND REFERENCES

1. Lees, F. P., Loss Prevention in the Process Industries. Butterworth-Heinemann, 1980, p. vii.
2. The nuclear industry's enthusiasm has been somewhat strained since the agencies have had to write 'rules' requiring the use of RA. But many in the nuclear community have, if begrudgingly, come to realize RA as a useful tool.
3. US Department of Defense, Military Standard: System Safety Program Requirements, MIL-STD-882C. DoD, Washington, DC, 19 January 1993.
4. US Department of Labor, Occupational Safety and Health Administration, Process Safety Management of Highly Hazardous Chemicals: Final Rule. Federal Register, 29 CFR 1910.119, 57 (36) (1992) 6356-417.
5. Amendola, A., Major hazards regulation in the European Community. In Probabilistic Safety Assessment and Management, G. Apostolakis, ed., pp. 229-32. Elsevier, Amsterdam, 1991.
6. The Center for Chemical Process Safety, Guidelines for Hazard Evaluation Procedures, 2nd edition. American Institute of Chemical Engineers, 1992.
7. The Center for Chemical Process Safety, Guidelines for Chemical Process Quantitative Risk Analysis. American Institute of Chemical Engineers, 1989.
8. US Department of Defense Authorization Act 1986, Public Law No. 99-145, amended to comply with the Chemical Weapons Convention in Defense Authorization Act 1993.
9. A prototype facility has been operating for some 3 years on Johnston Island in the Pacific Ocean.
10. Rasmussen, N. C., et al., Reactor Safety Study, WASH-1400 (NUREG-75/014).
11. US Nuclear Regulatory Commission, Severe Accident Risks: An Assessment for Five U.S. Nuclear Power Plants, NUREG-1150. NRC, Washington, DC, December 1990.
12. See, for example, the description of the disposal process in National Research Council, Alternative Technologies for the Destruction of Chemical Agents and Munitions. National Academy Press, Washington, DC, 1993.
13. Kaplan, S. & Garrick, B. J., On the quantitative definition of risk. Risk Analysis, 1 (1981) 11-27.
14. The CPI often focuses on more general consequences (any kind of loss, e.g. economic and environmental) and not just safety. See, e.g. Lees' Loss Prevention.
15. However, as Garrick and Kaplan note (pp. 13-14) this simple definition is obtained at the price of a loss in information.
16. Op. cit., p. 5.
17. Department of the Army, Program Manager for Chemical Demilitarization, System Safety Management Plan for the Chemical Stockpile Disposal Program. PMCD, Aberdeen Proving Ground, MD, April 1991.
18. See, for example, National Safety Council, Worker Injury and Illness Rates 1992. NSC, Itasca, IL, 1992.
19. Notice that Garrick and Kaplan (p. 13) identify a hazard as risk without the frequency, i.e. (s_i, l_i). Hence, for all practical purposes, they inexplicably define a hazard as a mishap.
20. Op. cit., p. 5.
21. The details of a risk analysis must be tailored to the risk setting. For a first step in this direction, see Garrick, B. J., The approach to risk analysis in three industries: nuclear power, space systems, and chemical process, Reliab. Engng System Safety, 23 (1988) 195-205.
22. This example is due to a cohort, Amir Afzali.
23. PMCD, SSMP, op. cit., p. 32.
24. This example is due to a cohort, Dr Carol Tolbert, who, assisting in a software interface hazards study, faced the problem: 'Are people hazards?'
25. This means that a 'software hazards analysis' is a misnomer; it actually is a study to analyze the initiators, latent conditions and mitigation aspect of the software.
26. PMCD, SSMP, op. cit., p. 35.
27. M. Kazarians shows that a sophisticated use of traditional system safety tools can go a long way toward capturing qualitative risk [Hazard analysis for compliance with process safety, in Proc. PSAM-II, G. E. Apostolakis and J. S. Wu, eds. San Diego, CA, 20-25 March 1994, pp. 27-1 through 27-6]. However, a hard lesson learned from these efforts for the TOCDF is that confusion can result without an explicit, consistent and detailed risk framework.
28. Ibid., p. 35.
29. Quantification is fraught with interesting issues but they do not negate the value of the effort to quantify.
30. MIL-STD-882C, op. cit., pp. 11 and A-5.
31. PMCD, SSMP, op. cit., p. 25.
Ed M. Dougherty, Jr
1309 Continental Dr., Suite F
Abingdon, MD 21009
USA