PII: S0951-8320(97)00166-X
Reliability Engineering and System Safety 62 (1998) 11–16. © 1998 Elsevier Science Limited. All rights reserved. Printed in Northern Ireland. 0951–8320/98/$19.00
Use of the Safety Monitor in operational decision-making at a nuclear generating facility

Shan (Sam) H. Chien, Thomas G. Hook & Roger J. Lee

Southern California Edison Company, P.O. Box 128, San Clemente, CA 92672, USA

The utilization of the Safety Monitor at a nuclear generating facility in 1994 revolutionized the way US nuclear power plants manage configuration risks. At Southern California Edison (SCE) Company’s San Onofre Nuclear Generating Station, it transformed probabilistic risk assessment (PRA) from a retrospective tool for understanding past risk into a prospective tool for controlling future risk. Since that time, many other nuclear utilities have taken aggressive steps in using PRA to better understand and manage risks associated with plant operation and maintenance. These utilities have employed a variety of methods ranging from systems similar to San Onofre’s Safety Monitor to systems dramatically different in both technology and philosophy. In the development and use of its Safety Monitor, SCE has been guided by two philosophical goals: (1) maximize the objectivity of PRA-informed decision-making relative to managing configuration risks, and (2) ensure that risks are managed conservatively. © 1998 Elsevier Science Limited.

Key words: probabilistic risk assessment, safety monitor, maintenance rule.
1 DEVELOPMENT OF THE SAFETY MONITOR

In 1989, SCE’s interest in risk-based regulation prompted it to join, with several other utilities, a Nuclear Regulatory Commission (NRC) sponsored working group chartered to explore the feasibility of risk-based Technical Specifications (i.e. regulatory requirements governing the operation of nuclear generating facilities). As a result of insights gained, SCE concluded that the potential benefits of a computer system which could evaluate plant risk in real time justified the development and implementation costs. These potential benefits included: (a) increased plant safety via an enhanced ability to assess and manage risk, and (b) increased operational flexibility resulting from the NRC’s growing support of risk-based regulation.

SCE began developing its Safety Monitor in mid-1992 with the preparation of a detailed project specification and the engagement of NUS Corporation as its contractor. Through a collaborative effort between NUS and SCE staff, San Onofre’s Safety Monitor was completed in early 1994 [1–5].

The San Onofre Safety Monitor is a multi-user personal computer (PC) based software tool into which is entered information on the plant’s configuration (actual or hypothetical), including components out of service for maintenance or testing, system alignments and operational conditions such as switchyard maintenance, turbine stop valve testing, external fires and degraded grid voltage conditions. The output of the original Safety Monitor was a calculation of instantaneous core damage frequency (CDF), attributable to internal initiating events, given the plant’s specified configuration. In early 1995, the Safety Monitor was further equipped with the capability of calculating conditional containment failure risk. This risk, while originally portrayed in terms of radioactive release, is today reflected in terms of ‘large early release frequency’ or LERF.

The technological breakthrough of the Safety Monitor was the speed at which it is able to solve the plant PRA model. While complete solution of the plant model to calculate CDF once required hours and even days, the current version of the Safety Monitor accomplishes this, including the LERF evaluation, in roughly one minute on a Pentium PC. The rapid solution time for the Safety Monitor model was achieved by reorganizing the traditional small event tree and large fault tree PRA model into a single large fault tree and by using highly optimized RELMCS fault tree solution software [6]. It must be clarified that the design specification for the Safety Monitor required a complete resolution of the entire model with no loss of detail, scope, or accuracy.
S. H. Chien et al.
Fig. 1. Safety monitor main screen.
Indeed, the results of the Safety Monitor have been verified to be identical to those previously generated via the older method. The model does not involve event modularization, simplification of the IPE model, cutset editing, or building a library of pre-evaluated plant configurations (although this last method will generate equally accurate results). Because the Safety Monitor risk model is maintained current, each calculation reflects the as-built and as-operated plant at the time of the assessment.

The main screen of the Safety Monitor (see Fig. 1) displays (1) the instantaneous level of risk via a thermometer-type display, (2) the risk history for a period up to 30 days, and (3) a recommended time limit for the plant configuration based on risk. The risk levels are ‘Normal’, ‘Moderate’ and ‘Caution’, with the ‘Normal’ range less than 3.2E–5/year instantaneous core damage frequency, the ‘Caution’ range greater than 2E–4/year, and the ‘Moderate’ range in between. A recommended time limit (i.e. allowed outage time) for the configuration is calculated based on the assumption that the accrued increase in core damage probability due to any plant configuration entered should not exceed 1E–6.
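The banding and time-limit rules just described reduce to simple arithmetic. The following Python sketch is only an illustration and is not the Safety Monitor’s actual code: the band limits and the 1E–6 budget are taken from the text, while the function names, the baseline CDF argument (used here to express the ‘increase’ in core damage probability) and the hours-per-year conversion are assumptions made for this example.

```python
HOURS_PER_YEAR = 8760.0   # assumed conversion for reporting the limit in hours

NORMAL_LIMIT = 3.2e-5     # /year, 'Normal' is below this value (from the text)
CAUTION_LIMIT = 2.0e-4    # /year, 'Caution' is above this value (from the text)
CDP_BUDGET = 1.0e-6       # allowed accrued increase in core damage probability


def risk_band(cdf_per_year: float) -> str:
    """Map an instantaneous CDF (per year) to the displayed risk level."""
    if cdf_per_year < NORMAL_LIMIT:
        return "Normal"
    if cdf_per_year > CAUTION_LIMIT:
        return "Caution"
    return "Moderate"


def allowed_outage_hours(cdf_config: float, cdf_baseline: float) -> float:
    """Hours until the configuration accrues CDP_BUDGET above the baseline.

    cdf_baseline is a hypothetical reference CDF for the plant with no
    equipment out of service; the paper does not state how it is chosen.
    """
    delta = cdf_config - cdf_baseline
    if delta <= 0.0:
        return float("inf")   # no risk increase, so no risk-based limit
    return CDP_BUDGET / delta * HOURS_PER_YEAR


if __name__ == "__main__":
    cdf = 1.2e-4              # hypothetical configuration CDF, per year
    print(risk_band(cdf), round(allowed_outage_hours(cdf, 2.0e-5), 1))
```

With these illustrative numbers, a configuration at 1.2E–4/year falls in the ‘Moderate’ band and, against an assumed baseline of 2.0E–5/year, would be given a time limit of roughly 88 hours.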
2 USE IN PLANT OPERATION DECISION-MAKING

The Safety Monitor is available on the site-wide computer network, making it possible for virtually any individual on
site (including NRC resident inspectors) to assess actual or hypothetical plant configurations.

Maintenance personnel utilize the Safety Monitor in the scheduling of preventive maintenance activities. Several weeks in advance, a maintenance planner enters into the Safety Monitor the equipment outages anticipated in a draft maintenance schedule. If the calculated instantaneous risk for these ‘hypothetical’ configurations is unacceptable (i.e. in the Caution range), then equipment outages are rearranged until the risk is sufficiently low. As the time approaches for the planned maintenance to be performed, additional runs of the Safety Monitor are performed to accommodate emergent corrective maintenance work items. Furthermore, work activities are planned so as to be completed within the allowed outage time recommended by the Safety Monitor.

Operations personnel responsible for work authorization independently assess the risk associated with each day’s anticipated plant configuration. While strict compliance with the Technical Specifications is always maintained, the Safety Monitor is used in an advisory capacity to ensure conservative plant operation.

The Shift Technical Advisor is responsible for identifying potentially risk-significant emergent conditions, reassessing the plant risk, and advising control room operations staff if PRA-informed responses are required.

Probabilistic risk assessment analysts (part of San Onofre’s oversight organization) perform retrospective evaluations of plant risk by inputting to the Safety Monitor data taken from control room operator logs, updating its
historical risk profile, and by critiquing both the model (i.e. comparing calculated risk with what might be expected from the associated configuration) and plant personnel performance (i.e. whether risk management guidance was followed). Annotated historical profiles of actual plant risk generated from the Safety Monitor are distributed for management review and critique on a quarterly basis.

In 1995, senior management at San Onofre instituted a tangible demonstration of its commitment to plant safety and included safety as an element in the employee incentive program. Thus, 25% of what each individual could potentially receive via the nuclear organization’s incentive compensation program was directly tied to plant safety as measured by average core damage frequency from initiating events. The Safety Monitor played a key role in this program. It tracked plant performance against the goal and gave the maintenance, operations and engineering personnel, whose actions affected plant risk, the ability to assess the impact of their proposed actions. This program has been regarded as highly effective and has been continued every year since.

Although not included in its original design specification, support of the Maintenance Rule has been an important application of the Safety Monitor. The Safety Monitor has supported compliance with many Maintenance Rule requirements, including providing a means for gathering data on component unavailability, justification of entry into Technical Specification action statements, and establishing component and system performance criteria. Specific examples of using the Safety Monitor to support implementation of the Maintenance Rule, and of enhancements to the Safety Monitor prompted by that application, are provided in Appendix A.
3 PHILOSOPHICAL PERSPECTIVES

In the development and use of its Safety Monitor, utility management can be guided by two philosophical perspectives relative to managing configuration risks: (1) maximize the objectivity of PRA-informed decision-making, and (2) ensure that risks are managed conservatively.

The industry has long been critical of regulatory guidance which is overly subjective. Such guidance has led to ambiguity in expectations, ineffective responses, and surprises. The industry has frequently suggested that guidance would be improved if it were more objective. This suggestion is believed to be no less applicable to PRA-informed decision-making. Putting this perspective into practice at an operating nuclear power plant has meant focusing resources on maximizing the objectivity of PRA-informed guidance, i.e. ensuring that it is clear, correct and complete. It has long been a practice of responsible PRA practitioners to verify, clarify and qualify analysis results
prior to their use in the plant. While some kind of verification will always serve an important quality assurance function, the need to provide clarification and/or qualification of PRA results indicates the existence of knowledge or insight not contained in the model. The objectivity of PRA-informed decision-making is enhanced when knowledge or insight with a material impact on analysis results is proactively included in the model rather than reactively inserted into the results. Such a practice would result in repeatable and unambiguous guidance for the end user and also allow for more thoughtful and deliberate critique of this knowledge and insight.

It is believed that the technology can support this philosophy, and pursuit of model robustness has therefore been made a high priority. Even the earliest version of the Safety Monitor contained risk factors (e.g. instrumentation testing, switchyard activities, turbine stop valve testing) not traditionally included in PRA analyses. One year after its initial release, the Safety Monitor was given the ability to calculate large early release frequency as a function of the reliability and availability of components associated with containment isolation and pressure control. Work is currently proceeding to enhance the Safety Monitor further by adding the ability to assess seismic and fire risk as well as risk during shutdown conditions.

Southern California Edison maintains a policy of continuous self-critique of its PRA processes and modeling. Critical reviews of PRA analyses are performed routinely, and when weaknesses are identified (a frequent occurrence in the early years but increasingly rare), or when required by modifications to plant design or data, model changes are promptly initiated. For the model to support these demands for enhancement and/or revision, it must be flexible. Therefore, San Onofre recently developed a relational database to enhance Safety Monitor model flexibility.
All assumptions and success criteria upon which the model is built, as well as references to all supporting design documentation and analyses, were put into the database and cross-referenced. Bases for assumptions can now be instantly identified. Changes to design documents can now be instantly traced to their use in the model. The result is a system which permits changes in design documents, procedures or component characteristics to be reflected in the PRA model as quickly as they are made in the plant itself.

The second philosophical perspective which has shaped the development and use of the Safety Monitor is that risk should be managed conservatively. To this end, the Safety Monitor can be utilized to control risk in three areas: the amount of risk accrued over an entire year, the amount accrued by any given plant configuration, and the level of instantaneous risk.

As discussed, SCE management included in its employee incentive program a goal for nuclear plant safety as measured by average annual core damage frequency. Performance is tracked daily by the Safety Monitor and made
available to all site personnel. This program, believed to be the only one of its kind in the US, has driven home the concept of ‘safety culture’ and has tangibly demonstrated plant management’s confidence in PRA technology and the sincerity of its interest in managing risk.

Each assessment of a plant configuration on the Safety Monitor generates a recommended allowed outage time. This outage time is based on the amount of time required for the evaluated configuration to accrue a core damage probability of 1E–6. In the interest of conservative risk management, San Onofre maintenance planners impose upon themselves a time limit for each planned configuration equal to the shorter of the allowed outage time permitted by the Technical Specifications and that calculated by the Safety Monitor.

Finally, plant management can establish a policy of planning no voluntary entries into configurations which place the plant in a condition of undesirably high instantaneous risk. Some utilities have opted not to provide restrictions based on instantaneous risk. Others have set the threshold relatively high, e.g. 1E–3/year. Southern California Edison’s philosophy of conservative risk management has resulted in a substantially more restrictive threshold of 2E–4/year.

Southern California Edison’s commitment to its goals of objective decision criteria and conservative risk management has been tested. One such test was the assessment and management of risks associated with switchyard activities. Few switchyard designs are as robust as that at San Onofre, with four trains of connections between each plant and the switchyard, and with nine connections between the switchyard and the offsite power grid. The likelihood of work-related faults propagating into a loss of offsite power could have been subjectively argued as negligible.
Nevertheless, fault likelihood was modeled with frequencies derived from industry data documented in NUREG-1410 [7] and modified by subject experts to account for varying degrees of hazard associated with different types of switchyard activities [5]. Given SCE’s conservatively low threshold for instantaneous risk, switchyard activities posing a high risk of faults (e.g. hot washing of insulators) were found to preclude simultaneous outages of a large number of plant components. Many possible options for dealing with the resulting scheduling challenges were incompatible with SCE’s desire for robust, data-supported modeling of the risks. Ultimately, SCE concluded that unless its goals of objectivity and conservative risk management were to be compromised, actual changes to the plant would have to be made. These changes ended up being modest enhancements to the coordination of switchyard maintenance with in-plant maintenance activities which, while simple and inexpensive, were highly effective in reducing risk. In summary, the utility’s risk management program encounters few restrictions on plant configuration or activities which cannot easily be accommodated by the work scheduling process.
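The three controls discussed in this section (risk accrued over a year, risk accrued by a single configuration, and instantaneous risk) can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not SCE’s procedure: the 2E–4/year threshold and the ‘shorter of the two limits’ rule come from the text, whereas the function names, the representation of the risk history as (duration, CDF) pairs and all sample numbers are hypothetical.

```python
from typing import Iterable, Tuple

INSTANTANEOUS_LIMIT = 2.0e-4   # /year, SCE threshold quoted in the text


def average_cdf(history: Iterable[Tuple[float, float]]) -> float:
    """Time-weighted average CDF over a history of (hours, cdf_per_year)."""
    entries = list(history)
    total_hours = sum(hours for hours, _ in entries)
    if total_hours <= 0.0:
        raise ValueError("history must cover a positive duration")
    return sum(hours * cdf for hours, cdf in entries) / total_hours


def planning_time_limit(tech_spec_hours: float, monitor_hours: float) -> float:
    """Planners use the shorter of the two allowed outage times."""
    return min(tech_spec_hours, monitor_hours)


def voluntary_entry_allowed(cdf_per_year: float,
                            limit: float = INSTANTANEOUS_LIMIT) -> bool:
    """No planned entry into configurations above the instantaneous limit."""
    return cdf_per_year <= limit


if __name__ == "__main__":
    # Hypothetical 30-day profile: mostly baseline, two maintenance windows.
    profile = [(600.0, 2.0e-5), (100.0, 1.0e-4), (20.0, 1.5e-4)]
    print(f"average CDF: {average_cdf(profile):.2e} per year")
    print("time limit:", planning_time_limit(72.0, 87.6), "hours")
    print("entry allowed:", voluntary_entry_allowed(1.5e-4))
```

The time-weighted average is the quantity a yearly goal would track, while the other two functions act on individual configurations before work is authorized.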
4 CONCLUSION

In the development and use of its Safety Monitor, SCE has been guided by two philosophical perspectives relative to managing configuration risks: (1) maximize the objectivity of PRA-informed decision-making, and (2) ensure that risk is managed conservatively. Other utilities may find different applications of these philosophical perspectives or be guided by different philosophies altogether. The best choice will likely not be discernible for years to come, if ever.

Much clearer, however, has been the impact of the Safety Monitor itself on the nuclear power industry. The Safety Monitor marked the coming of age of PRA and its transition out of the ‘classroom’ and into the real world by demonstrating the capability and value of PRA as a real-time operational tool. It established a new standard of plant safety, and thereby has contributed to ensuring not only public health but also, hopefully, the future health of the nuclear power industry.

Finally, it must be noted that the Safety Monitor was not developed ex nihilo. It was built upon the work of those preceding it. The ultimate credit for what risk assessment/management tools like the Safety Monitor are able to accomplish today belongs to those who for years have pioneered the technology by establishing its visions, empowering the science, and guiding its use.
REFERENCES

1. Hook, T. G., Use of the Safety Monitor in operations decision-making and the Maintenance Rule at San Onofre Nuclear Generating Station. Paper presented at PSA 95 Conference, Seoul, Korea, 26–30 November 1995.
2. Lee, R. J., Morgan, T. A. and Hook, T. G., Insights from use of the Safety Monitor at San Onofre Nuclear Generating Station. Paper presented at the ANS Topical Meeting Computer-Based Human Support Systems: Technology, Methods, and Future, Philadelphia, 25–29 June 1995.
3. Hook, T. G. and Morgan, T. A., Development and application of the San Onofre Safety Monitor. Paper presented at the International Topical Meeting on Nuclear Thermal Hydraulics, Operations, and Safety, Taipei, Taiwan, 3–8 April 1994.
4. Hook, T. G., Lee, R. J. and Morgan, T. A., Application of the San Onofre Safety Monitor to daily plant operations and maintenance activities. Paper presented at the ANS San Francisco Meeting, 14–18 November 1993.
5. Hook, T. G., Risk impact of switchyard maintenance through use of Safety Monitor. Paper presented at the PSA 96 Conference, Park City, Utah, 29 September–3 October 1996.
6. Software design document for the San Onofre Safety Monitor, Version 1.0. NUS Project 2342, Revision 1.0, 1 December 1993.
7. NUREG-1410, Loss of vital AC power and the residual heat removal system during mid-loop operations at Vogtle Unit 1 on March 20, 1990. U.S. NRC, June 1990.
APPENDIX A APPLICATION OF SAFETY MONITOR TO THE MAINTENANCE RULE PROGRAM

A dynamic tool which could address the risk from plant design changes, procedure changes and all possible combinations of component outages in a quantitative manner was considered essential to meeting the Maintenance Rule requirements. The Safety Monitor fulfills these specifications and also provides interactive software for the around-the-clock support required by the Maintenance Rule. This appendix gives details on the use of the Safety Monitor for the selection of performance criteria, the addition of LERF to the development of performance criteria, enhancements made to the Safety Monitor during implementation of the Maintenance Rule, and future expansion of this risk assessment tool to provide full support for the Maintenance Rule and other living PRA applications.
Appendix A.1 Selection of performance criteria for Maintenance Rule

The Maintenance Rule requires demonstration that the performance criteria used for reliability and availability preserve the assumptions in the PRA, or that the use of criteria which exceed PRA assumptions does not have an adverse impact on plant risk. The difficulty in developing reliability or availability performance criteria consistent with the PRA is that the PRA contains average expected availability and reliability assumptions based on a combination of generic and plant-specific data. Utilizing the PRA reliability and availability assumptions as performance criteria would result in the risk-significant structures, systems and components (SSC) exceeding their performance criteria 50% of the time, without any influence from poor maintenance. This would cause maintenance-induced degradations in SSC performance to be masked by the statistical variations in normal SSC operation. Performance criteria with adequate margins need to be selected to avoid excessive false goal setting, i.e. elevating the level of system performance monitoring owing to statistical variations of the physical process rather than to maintenance-preventable functional failures.

The rapid turnaround time of the Safety Monitor allows utilities to perform a large number of CDF and LERF calculations to develop the appropriate performance criteria. The typical quantification step involves an iterative process of solving for CDF and LERF with higher unavailability assumptions until a pre-established risk limit is exceeded. The selection of risk-based acceptance criteria is normally based on plant-specific risk results and follows the EPRI PSA Applications Guide. Owing to the large number of PRA full plant model calculations involved in the determination of performance criteria, the Safety Monitor leads to significant savings of PRA analysis effort.
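The iterative quantification step can be sketched as a simple stepping search. The Python example below is a hedged illustration only: in practice each evaluation would be a full Safety Monitor solution, whereas here `toy_cdf` is a made-up quadratic stand-in for a two-train system, and the risk limit, starting point and step size are hypothetical values, not criteria taken from the EPRI PSA Applications Guide.

```python
from typing import Callable


def max_unavailability(cdf_model: Callable[[float], float],
                       cdf_limit: float,
                       start: float,
                       step: float,
                       ceiling: float = 1.0) -> float:
    """Raise unavailability step by step; return the last value within limit."""
    if step <= 0.0:
        raise ValueError("step must be positive")
    if cdf_model(start) > cdf_limit:
        raise ValueError("starting unavailability already exceeds the limit")
    n = 0
    while True:
        candidate = start + (n + 1) * step
        # Stop as soon as the next step would break the risk limit.
        if candidate > ceiling or cdf_model(candidate) > cdf_limit:
            return start + n * step
        n += 1


def toy_cdf(u: float) -> float:
    """Hypothetical CDF (per year) with both trains at unavailability u."""
    return 2.0e-5 + 1.0e-4 * u + 5.0e-3 * u * u


if __name__ == "__main__":
    # Hypothetical limit: a 10% increase over the 2.0e-5/year toy baseline.
    criterion = max_unavailability(toy_cdf, cdf_limit=2.2e-5,
                                   start=0.005, step=0.001)
    print(f"availability performance criterion: u <= {criterion:.3f}")
```

The same loop could be run with a LERF model and a LERF limit, with the more restrictive of the two results taken as the governing criterion.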
In using the Safety Monitor to determine the allowed train unavailability without exceeding selected risk limits, one possible approach is to set all trains of a system to the same unavailability when calculating the increase in risk. This assumption is based on the observation that for most systems, preventive maintenance influences SSC unavailability more than corrective maintenance does. Preventive maintenance is generally scheduled using a consistent process and at the same frequency for all pumps or trains in the same system.

Appendix A.2 Role of LERF in Maintenance Rule performance criteria development

The capability of the Safety Monitor to carry out radiation release calculations (or Level 2 PRA analysis) is another stride forward in assessing the performance of plant equipment that is not directly linked to core damage but affects containment integrity and radiation releases. This equipment would affect the release of radioactive material outside the containment building after core damage. Plant equipment in this category includes the containment spray system, containment fan coolers, containment isolation system, etc.

In the San Onofre Safety Monitor, the completed Level 1 PRA top logic model has been expanded to include the calculation of LERF from the Level 2 PRA. In the Safety Monitor, the calculation of both Level 1 and Level 2 cutsets is performed simultaneously, using a set of plant damage ‘tag’ events to identify which plant damage state each cutset contributes to. The required data are extracted from the phenomenological portion of the Level 2 PRA to determine the conditional likelihood of a large early release for each plant damage state.

In summary, adding LERF modeling to the Safety Monitor not only provides the capability to establish performance criteria for plant equipment not directly linked to core damage, but also provides a different perspective on components critical to maintaining core integrity.
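The tag-event idea can be illustrated with a small Python sketch that quantifies CDF and LERF in one pass over a list of cutsets. This is a simplification under stated assumptions: the Safety Monitor solves the full fault tree, whereas the sketch sums a fixed toy list of cutsets with the rare-event approximation, and every event name, plant damage state label and number is hypothetical.

```python
from math import prod

# Hypothetical basic events: initiator frequencies (per year) and probabilities.
BASIC_EVENTS = {
    "IE_LOOP": 0.05, "DG_A": 0.02, "DG_B": 0.02, "TD_AFW": 0.01,
    "IE_SLOCA": 1.0e-3, "HPSI": 1.0e-3,
}

# Each cutset carries a plant damage state (PDS) tag.
CUTSETS = [
    (("IE_LOOP", "DG_A", "DG_B", "TD_AFW"), "PDS_HIGH_PRESSURE"),
    (("IE_SLOCA", "HPSI"), "PDS_LOW_PRESSURE"),
]

# Hypothetical conditional probability of a large early release per PDS.
COND_LER = {"PDS_HIGH_PRESSURE": 0.1, "PDS_LOW_PRESSURE": 0.01}


def quantify(cutsets, basic_events, cond_ler):
    """Return (CDF, LERF) by summing cutset frequencies (rare-event approx.)."""
    cdf = 0.0
    lerf = 0.0
    for events, pds in cutsets:
        frequency = prod(basic_events[name] for name in events)
        cdf += frequency
        # The PDS tag selects the Level 2 conditional release probability.
        lerf += frequency * cond_ler[pds]
    return cdf, lerf


if __name__ == "__main__":
    cdf, lerf = quantify(CUTSETS, BASIC_EVENTS, COND_LER)
    print(f"CDF = {cdf:.2e} per year, LERF = {lerf:.2e} per year")
```

In this toy data the high-pressure damage state contributes one sixth of the CDF but two thirds of the LERF, which shows how a component ranking based on LERF can differ from one based on CDF.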
For example, combinations of plant equipment failures that result in core damage sequences with high reactor coolant system pressure and loss of steam generator secondary side water level contribute significantly to LERF. Since both CDF and LERF plant level performance criteria are included in the Maintenance Rule, the governing criterion can be either of the two parameters. In the selection of availability performance criteria for the risk-significant systems, a significant number of performance criteria were governed by the ceiling associated with LERF.

Appendix A.3 Summary of Safety Monitor enhancement generated by the Maintenance Rule program

The capability to include or exclude maintenance unavailability in the model was added to the Safety Monitor in the early stage of Maintenance Rule development. This Safety Monitor option allows the PRA analyst to set all
maintenance unavailability basic events to plant-specific PRA values or set all maintenance contributors to zero. This feature enables use of the Safety Monitor in two modes: (1) as a risk monitoring tool, and (2) as a risk application tool. In the risk-monitoring mode, the dynamic plant configuration requires the average maintenance contribution to be excluded, i.e. maintenance basic events are set to zero. However, to quantify the impact of plant unavailability on CDF as required by the Maintenance Rule, the averaged plant configuration must be used, and the maintenance unavailability of an individual system needs to be set to the plant-specific average values based on operational data.

Another enhancement to the Safety Monitor in the field of plant risk management is a feature that automatically develops a list of plant components whose failure would push the CDF into a high risk level. This automatically generated list provides the operator with user-friendly guidance for prioritizing the available resources if additional equipment outages are identified.

The other expansion of Safety Monitor scope is the inclusion of equipment that is not accounted for by a normal PRA but is in the scope of the Maintenance Rule Program. These systems are either not explicitly modeled by the PRA or have a low risk significance. Adding these components to the Safety Monitor equipment list allows site-wide users to keep track of the operating status of all components in the Maintenance Rule high safety significance scope.
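The two operating modes and the automatically generated component list can be illustrated together. The Python sketch below is an assumption-laden toy, not the Safety Monitor’s algorithm: the real tool re-solves the full fault tree for each configuration, while this example re-quantifies a fixed, hypothetical list of cutsets by setting out-of-service events to 1.0. The 2E–4/year ‘Caution’ threshold comes from the paper; all event names and values are invented for illustration.

```python
from math import prod

CAUTION_LIMIT = 2.0e-4   # /year, 'Caution' threshold quoted in the paper

# Hypothetical basic events (initiator frequencies per year, probabilities).
PROBS = {
    "IE_LOOP": 0.05, "DG_A": 0.02, "DG_B": 0.02, "DG_B_MAINT": 0.01,
    "IE_T": 1.0, "AFW_A": 0.003, "AFW_B": 0.003, "FEED_BLEED": 0.05,
    "IE_SLOCA": 1.0e-3, "HPSI": 1.0e-3, "CS_PUMP": 0.01,
}
MAINT_EVENTS = {"DG_B_MAINT"}   # average maintenance unavailability events
CUTSETS = [
    ("IE_LOOP", "DG_A", "DG_B"),
    ("IE_LOOP", "DG_A", "DG_B_MAINT"),
    ("IE_T", "AFW_A", "AFW_B", "FEED_BLEED"),
    ("IE_SLOCA", "HPSI", "CS_PUMP"),
]


def apply_mode(probs, maintenance_events, monitoring=True):
    """Monitoring mode zeroes average maintenance terms; application keeps them."""
    if not monitoring:
        return dict(probs)
    return {e: (0.0 if e in maintenance_events else p) for e, p in probs.items()}


def cdf(cutsets, probs, out_of_service=frozenset()):
    """Rare-event sum of cutsets; out-of-service events count as failed (1.0)."""
    return sum(
        prod(1.0 if e in out_of_service else probs[e] for e in cs)
        for cs in cutsets
    )


def risk_critical(cutsets, probs, candidates, out_of_service,
                  limit=CAUTION_LIMIT):
    """List in-service components whose loss would push CDF above the limit."""
    oos = frozenset(out_of_service)
    hits = []
    for comp in candidates:
        if comp in oos:
            continue
        value = cdf(cutsets, probs, oos | {comp})
        if value > limit:
            hits.append((comp, value))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    monitor_probs = apply_mode(PROBS, MAINT_EVENTS, monitoring=True)
    for name, value in risk_critical(CUTSETS, monitor_probs,
                                     ["DG_A", "DG_B", "AFW_B", "CS_PUMP"],
                                     {"AFW_A"}):
        print(f"{name}: {value:.2e} per year")
```

With one auxiliary feedwater train assumed out of service, the toy list flags the remaining train and both diesel generators but not the containment spray pump, which is the kind of prioritization guidance the text describes.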
APPENDIX B EXPANSION OF SAFETY MONITOR CAPABILITY FOR OVERALL RISK MANAGEMENT

The plan for future expansion of this risk-assessment tool is to provide full support for the Maintenance Rule and other living PRA applications by incorporating shutdown, internal fire and seismic models into the Safety Monitor. The justification for further enhancements to the Safety Monitor is that although the internal initiating event PRA has identified all risk-significant components that can be revealed by internal and external PRAs, the risk ranking of the same set of components could be different for internal versus external initiators. For example, a diesel generator could be perceived to have a higher risk worth in a seismic PRA due to the increase of its importance in a loss of offsite power transient. Meanwhile, the risk ranking of the turbine-driven auxiliary feedwater (AFW) pump could be lowered as a result of the longer time delay required to restore offsite power after a major earthquake (e.g. the delay time exceeds the DC battery capacity to sustain turbine-driven AFW pump operation).

A change in equipment risk ranking is also expected in a comparison between the shutdown and full power PRA. Equipment with minimal risk impact in a full power model may have a different role during refueling outages. For example, components in the shutdown cooling loop may not be modeled in a full power PRA, but have a higher risk worth during a refueling configuration.