Int. J. Pres. Ves. & Piping 61 (1995) 593-608
Elsevier Science Limited. Printed in Northern Ireland
0308-0161/95/$09.50
0308-0161(94)00130-8

USING PRA IN SUPPORT OF MAINTENANCE OPTIMISATION

L. J. PERRYMAN, N. A. S. FOSTER and D. R. NICHOLLS
Nuclear Safety Division, Technology Group, Eskom, PO Box 1091, Johannesburg 2000, RSA.
ABSTRACT
For a number of years the Plant Specific Probabilistic Risk Assessment (PRA) of Koeberg Nuclear Power Station has been used to support maintenance optimisation. By ranking all components by their importance to nuclear safety, we have ensured that the most stringent maintenance regimes are applied only to components where they are absolutely necessary. Further, the overall level of safety is set by our Licensing Authority, which implies that a certain level of reliability is required of each component involved in safety. By monitoring performance and applying the Bayesian analysis technique, we can deduce whether an acceptable reliability is being achieved and whether a reduction in maintenance frequency may be justified.
INTRODUCTION
A plant specific probabilistic risk assessment uses reliability engineering techniques to determine overall risk criteria for the plant in question. In our case, we have determined the expected frequency of the various possible accident initiating events and the reliability of the numerous safety systems designed to mitigate these accidents. The unreliability of the safety systems is usually determined using the conventional Fault Tree analysis technique [1]. However, when repairability is an important factor in determining system unreliability, we have chosen to use a Monte Carlo routine.
The expected frequency of each possible accident sequence leading to severe consequences is then essentially determined by multiplying the accident initiating event frequencies by the unreliabilities of the safety systems designed to mitigate the accident. Further complexity often has to be included, as the inter-dependence of safety systems and operator actions must also be modelled.

Koeberg Nuclear Power Station has two 922 MW(e) PWR units and first started generating power in June 1984. Before this, our Licensing Authority insisted that Eskom demonstrate that certain probabilistic criteria were met, and these criteria then formed part of the licence. They were based upon the predicted frequency of the various possible radioactive releases into the environment. Since commissioning, the Licensing Authority's emphasis on using reliability techniques to demonstrate an acceptable level of safety has continued [2,3]. Over the years, Koeberg management have come to rely more and more on Probabilistic Risk Assessment as their confidence in, and understanding of, its techniques grew. Many years of experience have therefore now been obtained in using PRA to aid maintenance optimisation. PRA has also regularly been used to assess operational events and proposed modifications.

This paper has two main sections. The first discusses the techniques which have been used to optimise maintenance. The second presents a few of the many possible examples of situations in which the plant risk assessment was used as the principal tool for optimising maintenance. Finally, some concluding remarks and references are given.
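The sequence-frequency calculation described at the start of this introduction can be sketched as follows. This is a minimal illustration under a loud assumption: the mitigating systems are treated as failing independently, whereas a real plant model must also capture system inter-dependencies and operator actions. The function names and all numbers are hypothetical, not Koeberg data.

```python
import math

def sequence_frequency(initiator_per_year, system_unreliabilities):
    """Expected frequency (per year) of one accident sequence.

    Multiplies the initiating-event frequency by the unreliability of every
    safety system that must fail for the sequence to occur. Assumes the
    systems fail independently; dependencies and operator actions, which a
    real PRA must model, are ignored here.
    """
    for q in system_unreliabilities:
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"unreliability {q} is not a probability")
    return initiator_per_year * math.prod(system_unreliabilities)

def total_frequency(sequences):
    """Sum over sequences, each given as (initiator frequency, [unreliabilities])."""
    return sum(sequence_frequency(f, qs) for f, qs in sequences)

# Hypothetical inputs, for illustration only (not Koeberg data).
sequences = [
    (1.0e-2, [1.0e-3, 5.0e-2]),  # initiator A; two mitigating systems must fail
    (1.0e-1, [2.0e-4, 1.0e-2]),  # initiator B; two mitigating systems must fail
]
print(f"{total_frequency(sequences):.1e} per year")  # 7.0e-07 per year
```

Summing sequence frequencies in this way is only appropriate when each sequence is rare; it is a sketch of the multiplication step, not of the plant model itself.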
MAINTENANCE OPTIMISATION TECHNIQUES
Component Importance Listings.
Most recent plant specific risk analyses of nuclear power stations contain a component importance listing. This listing usually ranks the components contained within the risk analysis using the Fussell-Vesely criterion, which in this case is usually defined as the probability of the component having failed given that the "accident" has occurred. For nuclear power stations the "accident" is usually core damage. The Fussell-Vesely measure is useful for grading components for maintenance purposes, since it can be assumed that the better the maintenance, the better the component reliability, and the Fussell-Vesely measure gives some indication of the consequences should the reliability of the component decrease.

At Koeberg we have classified components into three categories for maintenance purposes: "Critical Safety Related", "Safety Related" and "Non-Safety Related". Components classified as "Critical Safety Related" are maintained according to strict procedural guidelines and must conform to a comprehensive quality control programme. Clearly all the components that appear within the plant risk analysis have some relevance to safety, so they are all graded at least "Safety Related"; the components near the top of the list are classified as "Critical Safety Related". The Fussell-Vesely cutoff value between "Critical Safety Related" and "Safety Related" components is largely a matter of opinion. Ideally, all components which could have any impact on safety would receive the most stringent possible maintenance regime. However, since resources are always finite, it is better to concentrate on the more important components while ensuring that all of them receive adequate maintenance.

Our plant risk analysis is more comprehensive than most and includes the Waste Treatment and Fuel Handling systems. These systems have no impact on the likelihood of core damage, but their failure can lead to a relatively minor release of radioactivity into the environment. Since our plant risk analysis covers all aspects of on-site nuclear safety, the component importance listings can be used as the basis for deciding which components require the most stringent maintenance regime and which require only a comparatively relaxed one. Another significant improvement we have made is that we grade components not only by the Fussell-Vesely criterion but also by the Risk Achievement criterion.
In this case, the Risk Achievement criterion is defined as the new expected core damage frequency given that the component has failed. Although the Risk Achievement criterion is usually used for assessing the risk of operating with a piece of equipment failed, it can also be used in conjunction with the Fussell-Vesely criterion to assess the level of maintenance required. Table 1 is a small extract from one of the component importance listings contained within our plant risk analysis; the full listing contains many hundreds of components.
Table 1. An Extract From The Component Importance Listings Within The Plant Risk Analysis.
Component | Fussell-Vesely Value | Risk Achievement Value
High Head Safety Injection Pump | 4.5E-2 | 1.1E-3
Containment Spray Pump | 3.6E-2 | 1.5E-3
Pressuriser Relief Valve | 3.5E-2 | 3.7E-4
Train A Busbar | 2.5E-2 | 3.0E-1
Train B Busbar | 2.5E-2 | 3.0E-1
Reactor Trip Breaker, Train A | 2.3E-2 | 7.0E-4
Reactor Trip Breaker, Train B | 2.3E-2 | 7.0E-4
Emergency Diesel Generator | 1.7E-2 | 1.1E-4
Auxiliary Feedwater Pump | 1.2E-2 | 9.9E-5
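To make the two importance measures concrete, the sketch below computes them from a set of minimal cut sets. It is an illustration under stated assumptions, not the method used to produce Table 1: the cut sets, component names and probabilities are hypothetical, core damage frequency is estimated with the rare-event approximation (sum of cut-set frequencies), the Fussell-Vesely value is taken as the share of that frequency coming from cut sets containing the component, and the Risk Achievement value follows the definition in the text (core damage frequency with the component set to failed).

```python
import math

# Hypothetical minimal cut sets for core damage, for illustration only.
# Each entry: (initiating-event frequency per year, basic events that must all fail).
CUT_SETS = [
    (1.0e-2, ["HHSI_PUMP_A", "HHSI_PUMP_B"]),
    (1.0e-2, ["BUSBAR_A", "BUSBAR_B"]),
    (1.0e-1, ["EDG_A", "BUSBAR_B"]),
]
# Hypothetical basic-event failure probabilities.
PROBS = {"HHSI_PUMP_A": 3.0e-3, "HHSI_PUMP_B": 3.0e-3,
         "BUSBAR_A": 1.0e-4, "BUSBAR_B": 1.0e-4, "EDG_A": 2.0e-2}

def cut_set_frequency(freq, events, probs):
    return freq * math.prod(probs[e] for e in events)

def core_damage_frequency(probs):
    # Rare-event approximation: add the cut-set frequencies.
    return sum(cut_set_frequency(f, ev, probs) for f, ev in CUT_SETS)

def fussell_vesely(component, probs):
    # Share of core damage frequency from cut sets containing the component,
    # approximating P(component failed | core damage).
    contrib = sum(cut_set_frequency(f, ev, probs)
                  for f, ev in CUT_SETS if component in ev)
    return contrib / core_damage_frequency(probs)

def risk_achievement(component, probs):
    # As defined in the text: core damage frequency given the component has failed.
    return core_damage_frequency({**probs, component: 1.0})

for name in sorted(PROBS, key=lambda c: fussell_vesely(c, PROBS), reverse=True):
    print(f"{name:12s} FV={fussell_vesely(name, PROBS):.2e} "
          f"RA={risk_achievement(name, PROBS):.2e}")
```

Note how the two measures can disagree: in this toy model a busbar has a modest Fussell-Vesely value but a large Risk Achievement value, which is the pattern the two busbar rows in Table 1 also show.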
Care must be taken when generating component importance listings. All possible accidents need to be considered within the plant risk analysis; if certain accident scenarios are not considered, then the components there to mitigate those accidents will not appear to be important. In particular, it is common practice to take a representative initiating event for an accident type rather than consider the whole range of possible accidents. An example of this is the LOCA (Loss of Coolant Accident). Koeberg Nuclear Power Station has three primary circuit loops leading from the reactor vessel, and a leak in any one of them is called a LOCA. Plant specific safety analyses often consider a break in only one of the loops. However, this situation is not symmetrical and will place undue importance on certain components. We have therefore found it necessary to analyse breaks in each of the loops, even though the overall consequences to the environment are similar in each case, simply to ensure that the importance of each component is correctly determined.

Another area which caused us some difficulty concerns multi-train systems. If a system contains more than one train, where one may be in operation while another is on standby, then care needs to be taken in determining the importance of the individual components. This is because the failure modes of the components in the different trains of the system may not be the same. For instance, the standby train may fail to start, whereas the operating train cannot fail in this way. Since the failure modes are different, the probabilities of failure will be different, and so will the Fussell-Vesely and Risk Achievement values. This is acceptable if the system line-up never changes, but if the train on standby changes regularly, as is usually the case on a nuclear power station, then identical equipment ends up with differing importance values. We have overcome this problem with the use of House Gates in our Fault Trees [1]. House Gates are usually used to specify the conditions at the start of the accident. Thus, a House Gate may specify "Train A on standby"; this is either true or false and so would be assigned the value 1 or 0 in the conventional way. We have had to adapt the House Gate system to make it possible to assign a probability to House Gates. The value for the House Gate "Train A on standby" is then determined from site records showing the number of hours the system was in operation and the number of hours Train A was on standby.

By using these component importance listings we have been able to concentrate maintenance where it is most needed. Previously, a conservative deterministic classification process was used, which resulted in many more components being classified as "Critical Safety Related". By using the probabilistic listings from the plant specific risk analysis we have ensured that the most stringent maintenance regimes are applied only where absolutely necessary, and the number of components requiring them was drastically reduced. This has resulted in considerable time savings and financial benefits.

Bayesian Analysis Technique. Much has previously been written about Bayesian analysis (References [4] and [5]). We have used this technique quite extensively to update the generic failure data initially used within the plant risk analysis. When the plant risk analysis was first released, the component failure data came from international literature sources [6], [7]. Since that time a programme has been running to make the component failure data more applicable to Koeberg, by recording all relevant information on the components at site.
The data recorded covers the number of starts, the number of operating hours, the number of failures, and details of the maintenance and testing regimes. The Bayesian analysis technique is essentially used to update the assumed failure probabilities for a component using this recorded plant specific reliability data. The original failure probability distribution for the component (recorded as the median failure probability with an error or uncertainty factor) is updated using the site data, and the result is a new failure probability distribution which more accurately reflects the true failure probability of the component on site. Since operating conditions, maintenance regimes and testing practices vary from plant to plant, the generic failure probability distributions available from the literature will rarely match those observed in practice. As more and more component reliability data becomes available from the station, the accuracy of the plant specific risk analysis will increase.

However, we can also use the Bayesian analysis technique to assess the adequacy of the maintenance on a specific component or group of components. If the reliability of a component is increasing, then its maintenance regime can be said to be good. Conversely, if the reliability of the component is decreasing, then more attention to maintenance may be required. It therefore follows that the Bayesian analysis technique can be used to assess whether the current maintenance regime on a specific component, or group of components, can be relaxed without detrimentally affecting the overall system reliability. Examples of this are given later. In our case this has proved very useful, since we are essentially set overall system reliability goals: the overall risk to the environment must be within certain set limits. By monitoring the component reliabilities and regularly updating the failure rates, we can see whether we are achieving these reliability goals. If we are exceeding the goals by a considerable margin, then a reduction in the maintenance regime on various equipment may be justified. By using the Bayesian analysis technique we have been able to justify reducing the maintenance of a large number of components, including pumps, valves and switchboards. The saving in time and money has been considerable, while the overall high system reliabilities have been maintained. However, the maintenance department does not always benefit.
For example, the Bayesian analysis of the emergency diesel generators at Koeberg indicated that their reliability was decreasing and so a more stringent maintenance regime had to be implemented. Another point that must be kept in mind is the need for accurate and complete reliability data collection at the site which can often prove tiresome for the personnel involved.
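The update described in this section can be illustrated with a short numerical sketch. It is not the procedure of References [4] or [5]; it assumes a lognormal prior specified by a median and an error factor (taken here as the ratio of the 95th percentile to the median, a common convention), a Poisson likelihood for the recorded failures in the recorded operating hours (that is, a constant failure rate with no wear-out), and evaluates the posterior on a grid in ln(rate). The function name and the example inputs are hypothetical.

```python
import math

def bayes_update_rate(median, error_factor, failures, hours, n_grid=4000):
    """Prior and posterior mean failure rate (per hour).

    Prior: lognormal with the given median and error factor (> 1), taking the
    error factor as the 95th-percentile/median ratio (an assumed convention).
    Likelihood: Poisson, i.e. a constant failure rate with no wear-out.
    The posterior is evaluated numerically on a grid in ln(rate).
    """
    mu = math.log(median)
    sigma = math.log(error_factor) / 1.645
    lo, hi = mu - 6.0 * sigma, mu + 6.0 * sigma
    step = (hi - lo) / n_grid
    prior_num = prior_den = post_num = post_den = 0.0
    for i in range(n_grid + 1):
        z = lo + i * step
        rate = math.exp(z)
        prior_w = math.exp(-0.5 * ((z - mu) / sigma) ** 2)  # normal in ln(rate), unnormalised
        like = math.exp(-rate * hours) * (rate * hours) ** failures  # Poisson; 1/n! cancels
        prior_num += rate * prior_w
        prior_den += prior_w
        post_num += rate * prior_w * like
        post_den += prior_w * like
    return prior_num / prior_den, post_num / post_den

# Hypothetical example: median 1e-4 per hour, error factor 10,
# no failures observed in 40,000 component-hours.
prior_mean, post_mean = bayes_update_rate(1.0e-4, 10.0, failures=0, hours=40_000)
print(f"prior mean {prior_mean:.2e}/h, posterior mean {post_mean:.2e}/h")
```

With no failures the posterior mean falls below the prior mean; with many failures relative to the exposure it rises, which is the behaviour used above to judge whether maintenance can be relaxed or must be tightened.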
EXAMPLES OF MAINTENANCE OPTIMISATION
Component Classification. As discussed in the previous section, the plant risk analysis contains listings of the safety significance of components, and these listings can be used to classify components. The higher the classification, the more stringent the maintenance and quality control requirements. Previously, all the equipment at Koeberg was classified using deterministic criteria. One of the principal criteria was that if the failure of the component resulted in a serious accident or a significant decrease in the reliability of an important safety system, then it was graded as vital to safety. Obvious examples of such components would be the reactor pressure vessel and the safety injection pumps. A less obvious example would be the instrument isolation valves for the sensors connected to the primary circuit. These valves are manually operated and are set to the correct position prior to reactor start-up; if they are not in the correct position, the reactor will stay shut down. Once the reactor is in operation these valves are inaccessible, so the only failure mode of concern would be rupture of the valve casing, which would cause a LOCA (Loss of Coolant Accident). This is why these instrument isolation valves were previously graded as important to safety. However, the plant risk analysis does not deem the instrument isolation valves connected to the primary circuit critical to safety, because the probability of valve rupture is so low. The probability assumed for valve rupture is only 3.0 x 10^-9 per hour while, by comparison, the probability of a manual valve failing to open on demand is 1.0 x 10^-4. The probabilistic analysis ranks the risk (likelihood and severity) of an accident. Since rupture of the casing of a manual valve is so unlikely, accidents involving these instrument isolation valves do not contribute significantly to the overall risk from operating Koeberg Nuclear Power Station.
These instrument isolation valves were therefore ranked low in the component importance listings generated from the plant risk analysis, and this was used to justify downgrading the maintenance and quality control on these components. This one case involved 66 valves and so resulted in considerable financial savings. Many other examples could be given of components that were downgraded for maintenance purposes because of the results of the plant risk analysis. A major reason for this was that the initial deterministic component classification was not precise and so tended to be conservative, over-grading many components. The advantage of the plant risk analysis is that it highlights which components are actually important to safety. However, there were a few cases where components were upgraded. This occurred when components previously thought to be unimportant to safety turned out to be much more significant. An example was the condenser extraction pumps, which in an emergency could possibly supply cooling water to the steam generators. These pumps were not initially designed for this purpose, and so the initial component classification process overlooked them.
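The classification step can be sketched as a simple rule applied to the importance listings. This is a hedged illustration, not Koeberg's procedure: the cutoff values `FV_CUTOFF` and `RA_CUTOFF` are hypothetical (the text notes the boundary is largely a matter of opinion), the rule of promoting a component if either measure exceeds its cutoff is an assumption, and the values used are the Table 1 extract. Because the Risk Achievement value here is an absolute conditional core damage frequency, its cutoff is in the same units.

```python
# Fussell-Vesely and Risk Achievement values from the Table 1 extract.
TABLE_1 = {
    "High Head Safety Injection Pump": (4.5e-2, 1.1e-3),
    "Containment Spray Pump": (3.6e-2, 1.5e-3),
    "Pressuriser Relief Valve": (3.5e-2, 3.7e-4),
    "Train A Busbar": (2.5e-2, 3.0e-1),
    "Train B Busbar": (2.5e-2, 3.0e-1),
    "Reactor Trip Breaker, Train A": (2.3e-2, 7.0e-4),
    "Reactor Trip Breaker, Train B": (2.3e-2, 7.0e-4),
    "Emergency Diesel Generator": (1.7e-2, 1.1e-4),
    "Auxiliary Feedwater Pump": (1.2e-2, 9.9e-5),
}

# Hypothetical cutoffs: the boundary is a matter of engineering judgement.
FV_CUTOFF = 1.0e-2
RA_CUTOFF = 1.0e-2

def classify(fv, ra, in_risk_analysis=True):
    """Return a maintenance category for one component."""
    if not in_risk_analysis:
        return "Non-Safety Related"
    if fv >= FV_CUTOFF or ra >= RA_CUTOFF:
        return "Critical Safety Related"
    return "Safety Related"  # every component in the PRA is at least this

for name, (fv, ra) in TABLE_1.items():
    print(f"{name}: {classify(fv, ra)}")

# A hypothetical low-importance component, such as an instrument isolation valve.
print(classify(1.0e-7, 1.0e-6))
```

Keeping the cutoffs as named constants makes it straightforward to see how many components change category if the judgement on the boundary is revised.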
Residual Heat Removal Pumps. In March 1992 the probabilistic analysis group was asked to assess the impact on safety of changing the frequency of the overhaul maintenance schedule on the Residual Heat Removal pumps from every second reactor outage to every sixth reactor outage. There are four of these pumps, two on each unit. All the pumps would still be inspected every reactor outage and if this, or any of the pump monitoring devices, detected any problems then the pumps would be overhauled and brought back to full working order. Bayesian analysis becomes invalid if the components it is applied to are entering their “wear-out phase”. This is because once the components begin to wear out the Bayesian analysis technique can predict a lower failure rate than is really the case. A series of components approaching the end of their useful life can have a sharply increasing failure rate which would not initially be predicted by applying the Bayesian analysis technique. Only once some failures had occurred would the Bayesian analysis technique show the increase in the failure rate. It therefore becomes necessary to demonstrate that the components being analysed would not suffer from wear during the next period of operation before the Bayesian analysis technique can safely be applied. Table 2 shows the site data recorded for these pumps up to 26/2/92. From Table 2 it can be seen that all the Residual Heat Removal pumps had only done approximately 10,000 running hours and so it could be safely assumed that they would not approach the end of their useful life within the near future. Conservatively, the manufacturers suggest that wear out should not occur before the pumps have individually completed at least 40,000 running hours. 
Since the Residual Heat Removal pumps should only do approximately 860 running hours during a reactor outage and none while the reactor is at power, it was concluded that there should be no problems associated with wear out for a considerable period of time.
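The wear-out margin argued above is simple arithmetic, sketched below. It uses only the approximate figures quoted in the text (about 10,000 running hours so far, a manufacturer's minimum of 40,000 hours before wear-out, and about 860 running hours per reactor outage, with none at power); the individual pump figures of Table 2 are not used, and the function name is hypothetical.

```python
def outages_before_wear_out(hours_run, wear_out_hours=40_000, hours_per_outage=860):
    """Number of complete reactor outages a pump can run before reaching the
    manufacturer's minimum wear-out age, assuming running hours accrue only
    during outages."""
    remaining = wear_out_hours - hours_run
    if remaining <= 0:
        return 0
    return remaining // hours_per_outage

print(outages_before_wear_out(10_000))  # 34
```

Roughly 34 further outages covers several of the proposed six-outage overhaul intervals, which is consistent with treating the failure rate as constant when applying the Bayesian analysis.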
Table 2. Site Data On The Residual Heat Removal Pumps For The Bayesian Analysis.
Pump | Running Hours | Number Of Failures
1 RRA 001 PO
Table 3. List Of Defects Recorded On The Residual Heat Removal Pumps.
Pump | Date | Description
1 RRA 001 PO | 13/04/M | Flange leak, re-torqued.
1 RRA 001 PO | 20/05/89 | Flange leak, faces machined.
1 RRA 001 PO | 17/10/90 | Flange leak, gaskets replaced.
2 RRA 001 PO | 26/05/87 | Flange leak, both flanges replaced.
2 RRA 001 PO | 22/09/88 | Flange leak, faces machined.
2 RRA 001 PO | 06/09/88 | Flange leak, faces polished.
2 RRA 001 PO | 17/10/88 | Flange leak, seat replaced.
2 RRA 001 PO | 09/04/90 | Flange leak, re-torqued.
2 RRA 001 PO | 09/07/91 | Flange leak, re-torqued.
Table 3 lists all the defects that had occurred on the Residual Heat Removal pumps. None of these defects prevented the pumps from performing their function, so none was classified as a pump failure. This highlights an important point: a failure of a component means that the component can no longer perform the function it was designed to do. In the case of these pumps, their function is to provide cooled water to the reactor vessel, thus removing the heat generated by the reactor core. The results of the Bayesian analysis were generated in accordance with the methodology described in References [4] and [5] and are presented in Table 4.
Table 4. Results Of Bayesian Analysis.
Pump | Old Failure Rate | New Failure Rate
1/2 RRA 001/2 PO | 1.0 x 10^-4 / hour | 3.0 x 10^-5 / hour
The results of the Bayesian analysis show that the new predicted failure rate was considerably lower than that originally assumed, mainly because no failures had yet occurred with these pumps. This emphasised their reliability and indicated that stripping them down and overhauling them could well introduce more problems than it solved. It was concluded that overhauling these pumps every two reactor outages involved some superfluous maintenance, which could increase the likelihood of problems arising. The ideal maintenance frequency is one that maximises availability and reliability. It was therefore agreed to reduce the overhaul frequency to once every six reactor outages.
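Why a record of zero failures pulls the estimate down can be seen from the Poisson probability of observing no failures. The sketch assumes the four pumps are pooled and that each ran roughly 10,000 hours (the approximate figure quoted above; the individual Table 2 values are not reproduced here), giving an assumed exposure of about 40,000 pump-hours. The function name is hypothetical.

```python
import math

def prob_no_failures(rate_per_hour, hours):
    """Poisson probability of observing zero failures in `hours` of running
    at a constant failure rate."""
    return math.exp(-rate_per_hour * hours)

exposure = 4 * 10_000  # assumed: four pumps at roughly 10,000 running hours each
for label, rate in [("old", 1.0e-4), ("new", 3.0e-5)]:
    print(f"{label} rate: P(no failures) = {prob_no_failures(rate, exposure):.3f}")
# old rate: P(no failures) = 0.018
# new rate: P(no failures) = 0.301
```

If the original rate of 1.0 x 10^-4 per hour were correct, a failure-free record over this exposure would be quite unlikely (about 2%), so the site data favour a lower rate, as Table 4 shows.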
Emergency Diesel Generators. An assessment was carried out at the beginning of 1994 to determine the reliability of Koeberg's five emergency diesel generators. These diesels supply power to essential equipment in the event of all other electrical power sources becoming unavailable. The plant risk analysis had previously indicated that they are critical to the safety of the plant. Their reliability was assessed by considering all the failures, and potential failures, that occurred between January 1990 and December 1993.
Even though each of the two units at Koeberg requires only one operating diesel under the most severe accident conditions, the diesels are still very important to the overall safety of the plant. They are the last resort given the failure of both off-site electrical power supply lines and the failure of the reactors at Koeberg to supply their own essential equipment. Because of their importance there are five diesels: two for each unit (called unit diesels) and an additional one (called the fifth diesel) which can supply either unit in the case where both unit diesels have failed. The reliability of the diesels was assessed using the methodology described in Reference [8], which takes into account the unavailability due to maintenance and defines an acceptance target for diesel reliability.

The number of failures was determined from site records. These failures were classified as "Absolute Failures" and "Partial Failures". Absolute Failures are those which definitely resulted, or would have resulted, in the failure of the diesel. Partial Failures are those which may have caused the diesel to fail if it had been required to supply power for an extended period. Although Partial Failures offered no conclusive proof that the diesel would have failed, these events were deemed significant enough to be given some credit. The numbers of diesel failures that occurred at Koeberg between January 1990 and December 1993 are given in Table 5. The "Estimated Mean" is the sum of the Absolute Failures and half of the Partial Failures. The approximate number of diesel starts during each year was 160.

Table 5. Koeberg Diesel Failures From January 1990 To December 1993.
The probability of failure per periodic test is calculated by simply dividing the number of failures that had occurred by the number of periodic tests. This was done for each year, for both the Absolute and the Estimated Mean numbers of failures. The results are tabulated in Table 6, and Figure 1 plots the Estimated Mean failure probability.
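The per-test calculation, including the half credit given to Partial Failures, can be sketched as below. The failure counts are hypothetical (Table 5's counts are not reproduced here), the roughly 160 starts per year quoted in the text are taken as the number of periodic tests, and the 0.05 per demand target is the Reference [8] value quoted later in this section. The function name is an illustrative assumption.

```python
TARGET_PER_DEMAND = 0.05  # Reference [8] target, as quoted in the text

def failure_probability_per_test(absolute, partial, tests):
    """Return (absolute, estimated-mean) failure probability per periodic test.
    The Estimated Mean credits each Partial Failure as half a failure."""
    estimated_mean = absolute + 0.5 * partial
    return absolute / tests, estimated_mean / tests

# Hypothetical year: 4 Absolute and 6 Partial Failures in about 160 tests.
p_abs, p_mean = failure_probability_per_test(4, 6, 160)
print(f"absolute {p_abs:.3f}, estimated mean {p_mean:.5f}, "
      f"meets target: {p_mean < TARGET_PER_DEMAND}")
```

Comparing the Estimated Mean rather than the Absolute value against the target is the more conservative of the two choices, since it never gives a lower failure probability.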
Table 6. Probability Of Failure Per Periodic Test.
Failures Considered | 1990 | 1991 | 1992 | 1993
Absolute Failures | 4.98E-02 | 0.00E+00 | 1.87E-02 | 2.49E-02
Estimated Mean | 6.53E-02 | 9.33E-03 | 3.42E-02 | 4.35E-02
Figure 1. Diesel Failure Probability Per Demand (plotted against year).
Figure 1 shows that the probability of a diesel failing to start has worsened progressively since 1991. Reference [8] suggests targets for diesel reliability: to meet them, the diesel failure probability must be below 0.05 per demand. While we had been meeting this target for the last three years, the trend in failure probability suggested that attention needed to be given to diesel maintenance to ensure continued compliance. To determine the reliability and availability of the diesels, the length of time the diesels are not operational due to maintenance is also required. For the purpose of this paper, only non-operability while the relevant unit is at power is considered. Until the middle of 1992 the diesels were overhauled consecutively, one being taken out of service as the previous one was recommissioned. This meant that one diesel was out for maintenance almost all of the time.
Therefore, as there are a total of five diesels, each diesel had an approximate 20% unavailability due to maintenance. After the middle of 1992, all planned maintenance on the unit diesels was performed during reactor outages. This was because a previous risk study [9] demonstrated how important diesel availability was to nuclear safety, and the policy on diesel maintenance was changed to ensure that all planned maintenance on the unit diesels was carried out during the relevant unit outage. As a result, from the middle of 1992 until the end of 1993 each diesel was unavailable for an average of only 7.8 days per year, due to unplanned maintenance arising from a number of diesel defects which occurred during this period. This equates to a 2.14% contribution to diesel unavailability per year. For practical reasons, the fifth diesel is overhauled when both units are at power. In order not to bias the results unduly, only the unavailabilities of the unit diesels are considered in what follows. To summarise, until the middle of 1992 the unavailability of each diesel due to maintenance was 20%; from the middle of 1992 until December 1993, the unavailability of the unit diesels due to maintenance was 2.14%. These results are presented in Table 7 and plotted in Figure 2.
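The two maintenance unavailability figures quoted above follow from simple ratios, sketched here. The assumptions are those stated in the text: before mid-1992 one of the five diesels was out for overhaul almost all of the time, shared equally among them, and afterwards each unit diesel lost an average of 7.8 days per year to unplanned maintenance. A 365-day year is assumed, and the function names are hypothetical.

```python
def rotating_overhaul_unavailability(n_diesels):
    """One diesel out for overhaul almost all the time, shared equally."""
    return 1.0 / n_diesels

def unavailability_from_days(days_out_per_year, days_per_year=365.0):
    """Fraction of the year a diesel is out of service."""
    return days_out_per_year / days_per_year

print(f"{rotating_overhaul_unavailability(5):.0%}")  # 20%
print(f"{unavailability_from_days(7.8):.2%}")        # 2.14%
```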
Table 7. Unit Diesel Unavailability Due To Maintenance.
Figure 2 clearly shows the improvement made in unit diesel percentage availability following the release of a previous risk study [9] in the middle of 1992. This shows the impact probabilistic risk analysis can make.
Figure 2. Unit Diesel Percentage Unavailability (plotted against year).
Reference [8] defines diesel reliability, Q, as follows:

$$Q = 1 - (q_d + q_r)$$

where q_d is the probability of diesel failure per demand and q_r is the estimated unavailability due to maintenance.
The first parameter, q_d, is given in Table 6 and the second parameter, q_r, is given in Table 7. The overall reliability of the unit diesels, Q, is presented in Table 8 and shown in Figure 3.

Table 8. Unit Diesel Reliability As Defined In Reference [8].
Year | 1990 | 1991 | 1992 | 1993
Unit Diesels | 73.51% | 79.07% | 85.53% | 93.54%
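The reliability measure of Reference [8] can be evaluated directly, as sketched below. As an illustration, q_d is taken from the Estimated Mean row of Table 6 and q_r as the 20% rotating-overhaul unavailability; with these inputs the sketch reproduces the 1991 entry of Table 8. The function name and its input check are illustrative assumptions.

```python
def diesel_reliability(q_d, q_r):
    """Reliability Q = 1 - (q_d + q_r) as defined in Reference [8]:
    q_d is the failure probability per demand, q_r the maintenance unavailability."""
    if not (0.0 <= q_d <= 1.0 and 0.0 <= q_r <= 1.0):
        raise ValueError("q_d and q_r must be probabilities")
    return 1.0 - (q_d + q_r)

# 1991: q_d from the Estimated Mean row of Table 6, q_r = 20% (rotating overhauls).
print(f"{diesel_reliability(9.33e-3, 0.20):.2%}")  # 79.07%
```

Because q_r enters additively, the move to outage-only planned maintenance (q_r falling from 20% to about 2%) dominates the improvement shown in Table 8, even though q_d worsened over the same period.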
Figure 3 shows the improvement made in diesel reliability since 1990. Most of this improvement was due to a change in unit diesel maintenance policy brought about by a risk analysis [9]: since the middle of 1992, all scheduled maintenance on the unit diesels has been carried out during the relevant unit outages. The resulting improvement to overall plant safety has been considerable, with the probability per year of a core damage accident decreasing by as much as 5%.
Figure 3. Diesel Percentage Reliability As Defined In Reference [8] (plotted against year).
CONCLUDING REMARKS
This paper has shown the improvement risk analysis has made to maintenance and plant safety at Koeberg Nuclear Power Station. Maintenance optimisation using probabilistic analysis has ensured that only the components critical to nuclear safety are subject to the most stringent maintenance and quality control regimes. This has resulted in considerable financial and time savings. Bayesian analysis is a technique by which component reliability can be monitored, highlighting where more or less attention should be given to maintenance. However, to obtain the full benefits of the plant risk analysis, extensive peripheral work is required. Accurately recording comprehensive component reliability data on site is essential. Further, before the risk analysis can be used regularly to assist in maintenance optimisation, it must be accepted by the power station operators, management and the Licensing Authority. This will not happen unless the risk analysis has been reviewed extensively and an adequate quality assurance programme initiated. Having done all this, Koeberg management is now willing to use the plant risk analysis in support of maintenance optimisation. No one doubts that this has resulted in an overall improvement in plant safety and considerable financial rewards.
REFERENCES
1. McCormick, N. J., "Reliability And Risk Analysis", Academic Press Inc., 1981.
2. Kussman, K. M., "The Use Of Level 1 PRA For Regulatory Decision Making In The Republic Of South Africa", Proceedings of an International Symposium on the Use of Probabilistic Safety Assessment for Operational Safety, PRA '91, IAEA, Vienna, 1991.
3. Hill, T. F., "Use Of Level 2 And Level 3 PRA For Regulatory Decision Making In South Africa", Proceedings of an International Symposium on the Use of Probabilistic Safety Assessment for Operational Safety, PRA '91, IAEA, Vienna, 1991.
4. Apostolakis, G., Kaplan, S., Garrick, B. J. and Duphily, R. J., "Data Specialisation For Plant Specific Risk Studies", Nuclear Engineering and Design, Vol. 56, 1980, pp. 321-329.
5. Dickson, H. S., Perryman, L. J. and Wolvaardt, F. P., "Statistical Procedure For The Updating Of Generic Failure Data For Koeberg With Site Operating History", Eskom, Nuclear Engineering Division, ACC 1164090, 2/1/92.
6. "Reliability Data Collection Used In The Risk Assessment Of The Koeberg Nuclear Power Station", Framatome, KBA0022E 12002, 11/2/81.
7. "Reliability Data", United States Nuclear Regulatory Commission, WASH-1400, NUREG-75/014, 1975.
8. Perryman, L. J., "Probabilistic Risk Analysis Of The Importance Of The Emergency Diesel Generators", Eskom, Nuclear Safety Division, PRA/LHS/001, 15/7/92.