Reliability Engineering and System Safety 27 (1990) 231-240
An Integrated Program of Risk Assessment and Operational Reliability Monitoring at Ontario Hydro
F. K. King, S. B. Harvey
Ontario Hydro, 700 University Avenue, Toronto, Ontario, Canada M5G 1X6
&
C. E. Packer
Darlington Nuclear Generating Station, Box 4000, Bowmanville, Ontario, Canada L1C 3W2
(Received 11 February 1989; accepted 24 April 1989)
ABSTRACT
This paper describes a program in place at Ontario Hydro in which risk models are being developed for all its nuclear generating stations, and in which the results of these risk models and their constituent system reliability models are being used in the operational reliability program for the stations involved. Furthermore, the results of the station operational reliability monitoring program are being utilized to keep the system reliability and station risk models up to date. The program in place for the Darlington Nuclear Generating Station is examined in detail to illustrate Ontario Hydro's approach.
Reliability Engineering and System Safety 0951-8320/90/$03.50 © 1990 Elsevier Science Publishers Ltd, England. Printed in Great Britain

1 INTRODUCTION
As part of its approach to risk management, Ontario Hydro has for many years used probabilistic safety analysis techniques in the design of its nuclear stations [1-4] and has formally monitored and controlled the reliability of important safety systems during plant operation [5]. Over the years these activities have evolved to what is now a fully-integrated lifecycle program of
risk assessment and operational reliability monitoring. The term fully-integrated means that the main ingredients of the operational reliability program (i.e. which systems, which components, what test frequencies) are derived from the results of the station risk assessment and its constituent system reliability models. Furthermore, the results of surveillance testing activities conducted as part of the operational reliability monitoring program are used to update the station risk model throughout the life of that station. This provides a continuing degree of assurance that the overall plant design and operation remain consistent with safety objectives.

A risk assessment provides an understanding of the important risk contributors at any point in time. By updating the risk assessment with station-specific operating experience, this understanding is kept current and provides a continuing, useful vehicle for communication between operations and design personnel. The primary benefit is enhanced safety.

System-based fault tree models are derived from the risk assessment and form the basis of the day-to-day operational reliability monitoring program. This avoids duplication of effort, as there is one set of reliability models for use in both design and operations, and all design and operational staff contribute to one safety reliability program for the station.

This integrated program is first being fully instituted at Ontario Hydro's Darlington nuclear generating station, a 4 x 880 MWe CANDU station, the first unit of which is scheduled to go critical in mid-1989. It is intended that programs at Ontario Hydro's other nuclear stations, Pickering (8 x 540 MWe) and Bruce (8 x 850 MWe), will be made consistent with the Darlington program once the risk assessments currently underway for these stations are complete.

The purpose of this paper is to describe the integrated program as it is applied at Darlington. The program is illustrated in Fig. 1.
[Figure 1: block diagram linking the station risk assessment, generic failure data, best available data, system-level fault trees, test/maintenance/operating programs, and station experience failure data.]
Fig. 1. Integrated risk assessment/operational reliability program.
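
The paper does not state how generic failure data and station experience are combined into the "best available data" of Fig. 1. One common approach, shown here purely as an assumption, is a conjugate gamma-Poisson (Bayesian) update of a failure rate. The class name, function name and all numbers below are hypothetical and are not taken from the Ontario Hydro program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaPrior:
    """Gamma distribution over a failure rate (failures per hour)."""
    shape: float  # acts like a pseudo-count of failures
    rate: float   # acts like a pseudo-exposure, in hours

    @property
    def mean(self) -> float:
        return self.shape / self.rate


def update_with_station_data(prior: GammaPrior, failures: int, hours: float) -> GammaPrior:
    """Conjugate update: add observed failures and observed exposure to the prior."""
    if failures < 0 or hours < 0:
        raise ValueError("failures and hours must be non-negative")
    return GammaPrior(prior.shape + failures, prior.rate + hours)


# Hypothetical numbers for illustration only (not from the paper).
generic = GammaPrior(shape=0.5, rate=50_000.0)  # generic data, mean 1e-5 per hour
posterior = update_with_station_data(generic, failures=2, hours=150_000.0)

print(f"generic mean   = {generic.mean:.2e} per hour")
print(f"posterior mean = {posterior.mean:.2e} per hour")
```

With little station experience the posterior stays close to the generic value; as operating hours accumulate, the station-specific evidence dominates.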
2 ORGANIZATIONAL STRUCTURE
Ontario Hydro is a publicly-owned utility in the province of Ontario, Canada and currently acts as designer, constructor and operator of its nuclear stations. Within the design side of the company, the Nuclear Safety Department is responsible for licensing and safety analysis activities. This department includes a Risk Assessment Section responsible for the preparation of station risk models. On the production side of the company there are technical sections, which include a safety/reliability unit, at each station. There is also a head office support group, which is responsible for coordinating safety-related reliability activities at all stations. All three groups mentioned work closely together to develop and maintain the integrated program which is the subject of this paper.
3 RISK ASSESSMENT PROGRAM
Ontario Hydro has initiated a program which will produce comprehensive risk models for all its nuclear generating stations. The schedule for this program is illustrated in Fig. 2. Since each station has four identical units, five risk models are adequate to represent twenty reactors. Each station is sufficiently different from its predecessor that station-specific models are required. The current status of the risk model development program is that the Darlington risk assessment (also known as the Darlington Probabilistic Safety Evaluation or DPSE) has been completed [4] and work on the Pickering A risk assessment is scheduled for completion in early 1990. The DPSE
[Figure 2: bar chart of the risk model development schedule, 1983 to 1992, for each station (4 units each): Darlington A, Pickering A, Pickering B, Bruce A and Bruce B.]
Fig. 2. Risk model development schedule.
study, initiated by Ontario Hydro as a safety design review vehicle, led to the uncovering and correction of many design problems while Darlington was still under construction [6]. This 'good' experience was a factor in Ontario Hydro committing to risk assessments for all its operating stations. While the operating stations have undergone earlier forms of probabilistic safety assessment [2,5], it was felt that the additional rigour of current methods could provide further insights as well as yield a product much more amenable to future updating.

Ontario Hydro's risk assessments involve a calculation of core damage frequency, containment release frequency and public health risk; in probabilistic safety assessment terminology they are Level-3 PSAs. They are limited, however, to internal events. Once risk models have been prepared for all stations, the inclusion of selected external events, which are now considered in a deterministic manner, will be examined.

Another characteristic of Ontario Hydro's risk assessments is the extent and depth of system fault tree analysis. Fault trees are developed according to the step-by-step development procedure [7,8], yielding large fault trees, but ones which are easier to review, verify and update, especially by new analysts, than those prepared with less structured modelling techniques. Fault trees developed in Ontario Hydro risk assessments also model electrical and control aspects of systems in great detail. This is felt necessary to uncover any cross-links between redundant components and between systems caused by failures in these aspects of design.

Methods for the efficient processing of fault trees have been developed over the years. Current initiatives in this area are directed towards facilitating system fault tree and risk model updates. Current methods in use on the Pickering A risk assessment involve the use of an executive computer code (SIMPLE) for fault tree processing.
This code initiates automatic failure data assignment (DAS), basic event probability calculations and SETS input file creation (INPUT), fault tree modularization (MOD), fault tree solution (SETS), fault tree plotting (FTD) and primary event data table creation (PETAB). All computer processing is performed on a Cyber 930-31 computer. Risk model updates, with new station-specific failure data and required logic changes, will be performed on an as-required basis. The updating frequency will depend on the significance of the required logic changes and on the characteristics of the data base changes. Updated data bases will typically be available on an annual basis. The new station-specific failure data will be derived from the operational reliability monitoring and deficiency reporting systems as will be discussed later in the paper. The purpose of periodic risk model revision and requantification is to obtain any further insights into the adequacy of station design and operation as a result
of operational experience, beyond those apparent from the system-based operational monitoring program described below. Another purpose is to maintain an up-to-date risk model of the station which can be used to review the integrated impact on station risk of any design changes which might be proposed in the future.

4 OPERATIONAL RELIABILITY PROGRAM
4.1 Purpose
The purpose of the Darlington operational reliability program is to provide assurance that the availability of poised safety-related systems, functions and components continues to be consistent with their safety role. Specifically, the program identifies important safety features, ensures that adequate corresponding test procedures are developed, and allows the implications of abnormal configurations (related to equipment failures, maintenance outages, hardware line-ups or test deferments) and of changes to failure data to be assessed. The program requires that the observed performance of systems and functions be consistent with initial predictions or that the impact of deviations be shown to be acceptable.
4.2 Scope
The Safety Report for the station generally identifies what systems and functions are important to safety. In the case of Darlington this information has been augmented by further insights obtained by the DPSE study. The systems, functions and components which are subject to the operational reliability program can be categorized as follows.

4.2.1 Special safety systems
This refers to reactor shutdown systems number 1 and number 2, the emergency coolant injection system and the containment system. The need to have reliability models for these systems to predict future reliability, and to demonstrate through testing that unavailability limits are being met, has been a regulatory requirement in Canada for many years.

4.2.2 Major mitigating systems
These are systems whose roles are mostly related to the maintenance of heat sinks. In Darlington these systems are the shutdown cooling system, the steam generator emergency cooling system, the emergency service water system, standby Class III power, standby emergency power and the auxiliary boiler feedwater system.
4.2.3 Other poised safety-related functions
These are poised features in systems other than the aforementioned systems which provide a significant safety-related function. A few examples of such functions are:
(i) relief valves on safety-related equipment
(ii) local air reservoir check valves
(iii) main control room outside air auto isolation
(iv) auto closure of valves that isolate certain service water loads on loss of normal power
(v) closure of bleed condenser isolation valve on high D2O storage tank level

One of the benefits of having a risk assessment in place for Darlington has been the ability to better identify those systems and functions which have safety roles (e.g. design protections against containment bypass events).

The main features of the Darlington operational reliability program are:
(i) The special safety and major mitigating systems listed above will have separately packaged fault tree models and availability targets derived from the Darlington risk assessment. These models will be periodically used to forecast future unavailability with the use of updated data. Actual unavailability will be measured by surveillance testing and will be monitored for deviations from target values.
(ii) Test procedures will be verified against the system fault tree models, to ensure that all postulated failure modes will be tested.
(iii) Operational reliability program test procedures will be scheduled, performed and results analyzed under a system of strict administrative control referred to as Safety-Related Systems Tests (SRSTs). The conduct and results of such tests will be monitored closely by the station reliability unit.

These features will be discussed in more detail below.
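
As a minimal sketch of feature (i), the comparison of forecast and measured unavailability against a target might look as follows. The forecast uses the common approximation q ≈ λT/2 for a periodically tested standby component, which holds when λT is much less than 1. The rule for counting downtime, the target value and all numbers are assumptions made for illustration; the paper does not give the formulas used at Darlington.

```python
def forecast_unavailability(failure_rate_per_hour: float, test_interval_hours: float) -> float:
    """Mean unavailability of a standby component tested every T hours (q ~ lambda*T/2)."""
    return failure_rate_per_hour * test_interval_hours / 2.0


def observed_unavailability(failed_tests: int, test_interval_hours: float,
                            repair_hours: float, period_hours: float) -> float:
    """Downtime fraction over the period.

    Assumed convention: a failure found on test is counted as having existed
    for half of the preceding test interval, plus the repair time.
    """
    downtime = failed_tests * (test_interval_hours / 2.0 + repair_hours)
    return downtime / period_hours


def status(unavailability: float, target: float) -> str:
    return "within target" if unavailability <= target else "exceeds target"


# Hypothetical values for illustration only.
TARGET = 1.0e-3
forecast = forecast_unavailability(failure_rate_per_hour=2.0e-6, test_interval_hours=168.0)
observed = observed_unavailability(failed_tests=1, test_interval_hours=168.0,
                                   repair_hours=8.0, period_hours=26_280.0)

print(f"forecast {forecast:.2e}: {status(forecast, TARGET)}")
print(f"observed {observed:.2e}: {status(observed, TARGET)}")
```

A deviation such as the one in this example would prompt the kind of review described in the text, rather than automatically implying an unacceptable condition.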
4.3 Preparation of system models
The fault tree models for both the special safety systems and the major mitigating systems have been derived from the system models in the overall station risk model with only two simplifications. Firstly, the support system events (electrical power, service water, instrument air) in the fault trees have been replaced by an undeveloped primary event with an assigned probability (in the DPSE these would be developed events whose failure logic would be included as part of the risk model integration process). Secondly, mission time events have been removed.
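
The first simplification can be illustrated with a toy fault tree. The sketch below assumes independent basic events with no repeated events, so gates can be evaluated bottom-up; the real models are solved through minimal cut sets, and the tree, event names and probabilities here are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union


@dataclass
class Event:
    """Basic or undeveloped primary event with an assigned probability."""
    name: str
    probability: float


@dataclass
class Gate:
    name: str
    kind: str  # "AND" or "OR"
    inputs: List["Node"] = field(default_factory=list)


Node = Union[Event, Gate]


def probability(node: Node) -> float:
    """Bottom-up evaluation, valid only for independent, non-repeated events."""
    if isinstance(node, Event):
        return node.probability
    probs = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        return prod(probs)
    if node.kind == "OR":
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate kind: {node.kind}")


def collapse(node: Node, gate_name: str, assigned_probability: float) -> Node:
    """Return a copy in which the named gate is replaced by an undeveloped event."""
    if isinstance(node, Event):
        return node
    if node.name == gate_name:
        return Event(f"{gate_name} (undeveloped)", assigned_probability)
    return Gate(node.name, node.kind,
                [collapse(child, gate_name, assigned_probability) for child in node.inputs])


# Hypothetical tree: the system fails if the pump fails to start OR power is lost.
power = Gate("POWER", "AND", [Event("BUS_A", 0.01), Event("BUS_B", 0.02)])
full = Gate("SYSTEM", "OR", [Event("PUMP_FAILS_TO_START", 1.0e-3), power])

# Operational model: the developed power logic becomes one undeveloped event.
simplified = collapse(full, "POWER", assigned_probability=probability(power))

print(f"full model:       {probability(full):.6e}")
print(f"simplified model: {probability(simplified):.6e}")
```

The two results agree here only because the assigned probability was taken from the developed logic; in practice the simplified tree also loses any cross-links that the support system shares with other systems, which is why the integrated risk model is retained alongside it.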
The treatment of support systems has been simplified because these systems are normally running and hence their availability is being continuously demonstrated. Mission-related events have been removed for the following reasons:
(i) it is often not practical to perform routine functional testing for mission time related failure modes, as a full functional test would require that equipment be routinely run for the assumed mission duration;
(ii) there is a desire to keep the system fault tree models as simple as possible due to operational-use considerations.

In justification of the above simplifications it should be remembered that the station risk model is based on rigorous integration of support system and front-line system logic and includes full consideration of mission time events, and that the risk model will be updated and reviewed for design and operational implications.

The system fault tree models have been prepared for operational use in hardcopy report format (assumptions, data, plot) and as a computer file with associated spreadsheet-based interrogation software.
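
The paper gives no detail of the interrogation software. As a rough sketch of what interrogating a stored fault tree solution can involve, the example below assumes the solution is held as a list of minimal cut sets and uses the rare-event approximation (the sum of cut set products). A component taken out of service is represented by setting its unavailability to 1. Component names and values are hypothetical.

```python
from math import prod
from typing import Dict, FrozenSet, List


def system_unavailability(cut_sets: List[FrozenSet[str]], q: Dict[str, float]) -> float:
    """Rare-event approximation: sum over minimal cut sets of the product of event unavailabilities."""
    return sum(prod(q[event] for event in cut_set) for cut_set in cut_sets)


def with_component_out(q: Dict[str, float], component: str) -> Dict[str, float]:
    """Copy of the data table with one component treated as unavailable."""
    updated = dict(q)
    updated[component] = 1.0
    return updated


# Hypothetical two-train system with a common valve.
cut_sets = [frozenset({"PUMP_A", "PUMP_B"}), frozenset({"VALVE_V1"})]
q = {"PUMP_A": 1.0e-2, "PUMP_B": 1.0e-2, "VALVE_V1": 1.0e-4}

baseline = system_unavailability(cut_sets, q)
degraded = system_unavailability(cut_sets, with_component_out(q, "PUMP_A"))

print(f"baseline:         {baseline:.3e}")
print(f"PUMP_A out:       {degraded:.3e}")
print(f"increase factor:  {degraded / baseline:.1f}")
```

Multiplying the increase in unavailability by the outage duration gives a simple measure of how much a maintenance outage adds to the time-averaged system unavailability.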
4.4 Operational surveillance testing
It is necessary that test programs be established to substantiate reliability assumptions made in the Safety Report and related licensing submissions. The test program is developed according to a formal Nuclear Generation Division procedure which specifies responsibilities, requirements for preparation and revision of test procedures, scheduling constraints and test performance requirements, and is applicable to all Ontario Hydro operating reactors.

The test program for a given system is initially proposed and developed as part of pre-commissioning activities by an engineer, resident at the station, responsible for the particular system. The tests are checked by the line organization and also by a reviewer in either the head office design or operations support groups. These reviews are intended to confirm that the relevant failure modes identified in the risk assessment would be uncovered by the test program.

Safety-Related System Tests are scheduled by the station Reliability Unit. Standard schedules for conducting these tests are developed for normal operation and for the shutdown state. The objectives of these standard schedules are to level operator workload, to ensure tests are scheduled at the appropriate frequency, to ensure tests of redundant features are staggered, to ensure expected operating conditions are as required for the test and to
test independent channels separately, using different work crews, to the extent practicable. Modifications to the standard schedules are made within the Reliability Unit when warranted by temporary conditions (e.g. extra tests required to compensate for a failed redundant component, or a test deferment because a prerequisite cannot be met).

The test frequencies are initially set to align with those used in the risk assessment. The frequencies in the risk assessment were based on previous station practice with modifications as suggested by the station system engineers.

Many of the Safety-Related System Tests at Darlington are presented to the Unit First Operator by computer-generated visual display. The tests identify necessary prerequisites and precautions. The tests are performed step-by-step with acknowledgement of each step by the operator. A paper record of the test is produced.

It is the responsibility of the Unit First Operator to ensure that unit conditions are acceptable for the performance of a test. For example, it is necessary to check the line-up of the system on test and of interfacing systems to ensure that no known deficiency or maintenance activity will result in a degraded system as a result of the test. The operator is permitted to perform only one test at a time. When tests result in reduced redundancy of a system, the operator must limit and record the duration of the condition. If it is judged that unit conditions do not allow a test to be performed, the rationale is logged and the test is rescheduled at the next available opportunity, often the next shift.
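
The staggering objective mentioned above can be sketched very simply: for a set of redundant channels tested at the same interval, evenly spaced start offsets keep the channels from being tested at the same time. The function names, the three-channel example and the weekly interval are hypothetical; the actual scheduling rules of the Reliability Unit are not given in the paper.

```python
from typing import List


def staggered_offsets(n_channels: int, test_interval_hours: float) -> List[float]:
    """Evenly spaced start offsets, one per redundant channel."""
    if n_channels < 1:
        raise ValueError("n_channels must be at least 1")
    step = test_interval_hours / n_channels
    return [i * step for i in range(n_channels)]


def scheduled_times(offset_hours: float, test_interval_hours: float,
                    horizon_hours: float) -> List[float]:
    """Test start times for one channel, from its offset up to (not including) the horizon."""
    times = []
    t = offset_hours
    while t < horizon_hours:
        times.append(t)
        t += test_interval_hours
    return times


# Hypothetical example: three channels, weekly tests, two-week horizon.
offsets = staggered_offsets(n_channels=3, test_interval_hours=168.0)
schedule = {f"channel_{i + 1}": scheduled_times(off, 168.0, 336.0)
            for i, off in enumerate(offsets)}

for channel, times in schedule.items():
    print(channel, times)
```

A real schedule would also need to account for workload levelling, required unit conditions and crew assignment, which this sketch ignores.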
4.5 Analysis of results
A record of all Safety-Related System Test results is sent to the station Reliability Unit for review. All failures and appropriate information from deficiency reports (records of failure maintenance activities), associated work reports and unit logs are entered into a computerized data base, the original elements of which were identified from the basic events in the risk assessment. The records of tests involving failure are also sent to the system engineer for further analysis. If the failure mode is new and was not considered in the risk assessment, a detailed assessment of its impact is performed and the need to revise the risk assessment is considered.

Periodically (usually annually) the data base of failure experience is analyzed to detect statistically significant trends, either positive or negative, compared to previous experience and generic data. Positive trends may identify an improved work practice which may be useful on other systems or stations. Negative trends need to be evaluated to establish the impact on the overall system using the system reliability model. A negative failure rate
trend also indicates that the surveillance program scope and frequency may need review.

It is not possible or practical to compare all of the input assumptions or primary event frequencies used in the risk assessment to observed experience. For example, 'mean time to failure' for long mission time events cannot be accurately monitored since long mission tests are costly and not necessarily informative (it is not possible to simulate post-accident conditions). Similarly, many human interactions cannot be quantified from operating experience. The intention is to periodically review such unverifiable inputs to the risk assessment to consider whether they are the 'best available' based on current technology and experience.

4.6 Operational use of system models
The Operating Policies and Principles (OP&Ps) for the station define the envelope of conditions under which the station may be operated. Within the constraints of the OP&Ps, important safety-related systems are maintained and tested to meet availability targets.

The system fault tree models play an important role in assisting day-to-day decision making regarding the acceptability of abnormal system situations. Typical questions are:
(i) How long can a component be left out for maintenance without significantly affecting system expected unavailability?
(ii) How long can a scheduled test be delayed without significantly affecting system expected unavailability?
(iii) If a component is found in the failed state, what other components should be tested to demonstrate acceptable expected system availability?

Procedures have been developed which will allow trained staff at the station to use the system fault trees and their minimal cutset solutions to answer questions of this type. To assist staff in this task, personal computer based spreadsheets of the fault tree solutions are currently available. A more user-friendly and powerful computer code is presently being developed which will allow the interrogation and manipulation of fault tree models and their solutions to better assist their use in an operational environment.

5 BENEFITS OF AN INTEGRATED PROGRAM
The benefits of the integrated risk assessment and operational reliability monitoring program described above are many. Comprehensiveness is improved because all reliability practitioners are feeding their risk insights
into a common analysis, namely the risk assessment and its constituent system reliability models. A broader base of expertise is embodied in the risk assessment and is therefore available to the user whether the user is a designer, operator, or in a technical support function. The comprehensive, closed-loop nature of the program should lead to enhanced safety. Using single reliability models for both design and operational reasons also leads to less duplication of effort and enhanced communication. Integrated programs should also provide better assurance that future changes to designs or operational practices are appropriate given the broader, coordinated investigations prior to commitment.
6 SUMMARY
Ontario Hydro has embarked on an integrated risk assessment and operational reliability monitoring program in design and operations that is expected to result in enhanced safety and reduced duplication of effort. This approach involves comprehensive risk assessments for all stations and a coordinated program of operational reliability monitoring.

REFERENCES
1. Laurence, G. C., Boyd, F. C., Jennekens, J. H., Sutherland, J. B. & Hamel, P. E., Reactor safety practice and experience in Canada. Third UN International Conference on the Peaceful Uses of Atomic Energy, Geneva, 1964, Atomic Energy of Canada Limited Report 2028.
2. Gumley, P., Use of Fault Tree/Event Sequence Analysis in a Safety Review of CANDU Plants. IAEA Publication IAEA-CN-39/7, Vienna, 1981.
3. King, F. K., Raina, V. M. & Dinnie, K. S., The Darlington probabilistic safety evaluation: A CANDU risk assessment. In Proceedings of 8th Annual Conference of the Canadian Nuclear Society, Saint John, N. B., 1987.
4. Ontario Hydro, Darlington Probabilistic Safety Evaluation: Summary Report. December 1987.
5. Brunnader, H. & Farr, J. A., In-service monitoring of safety system reliability. Nuclear Safety, 27 (1986) 499-504.
6. King, F. K. & Raina, V. M., The benefits of pre-operational risk assessment based on experience with the Darlington probabilistic safety evaluation. In Proceedings of the ANS/ENS Topical Meeting on Probabilistic Safety Methods and Applications, San Francisco, 1985.
7. Raina, V. M., System modelling techniques and insights from the Darlington probabilistic safety evaluation study. In Proceedings of the ENS/ANS Topical Meeting on Probabilistic Safety Assessment and Risk Management, Zurich, 1987.
8. Vesely, W. E., Goldberg, F. F., Roberts, N. H. & Haasl, D. F., Fault Tree Handbook. USNRC, NUREG-0492, 1981.