Reliability Engineering and System Safety 72 (2001) 293–302
www.elsevier.com/locate/ress
An analysis of maintenance failures at a nuclear power plant
Pekka Pyy*
VTT Automation, PL 1301, 02044 VTT, Finland
Received 17 August 2000; accepted 23 February 2001
Abstract

In this paper, a study of faults caused by maintenance activities is presented. The objective of the study was to draw conclusions on the unplanned effects of maintenance on nuclear power plant (NPP) safety and system availability. More than 4400 maintenance history reports from the years 1992–1994 of Olkiluoto BWR NPP were analysed together with the maintenance personnel. The human action induced faults were classified, e.g. according to their multiplicity and effects. This paper presents and discusses the results of a statistical analysis of the data. Instrumentation and electrical components appeared to be especially prone to human failures. Many human failures were found in safety related systems. Several failures also remained latent from outages to power operation. However, the safety significance of failures was generally small. Modifications were an important source of multiple human failures. Plant maintenance data is a good source of human reliability data and it should be used more in the future. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Human failures; Human reliability; Data analysis; Statistical analysis; Maintenance; Nuclear power plants
1. Introduction

In human reliability research, the main attention has usually been focused upon the control room crew performance in post initiating event conditions. The control room operators have an essential role in disturbance management. On the other hand, maintenance may also have an impact on the severity of an incident by recovering lost systems or by erroneously disabling safety related equipment. The chances of operators successfully managing a disturbance are reduced if there are latent equipment faults in the safety related systems. In particular, common cause failures (CCFs), affecting several trains of a safety system, may make a significant contribution to the reactor core damage risk [1,2]. Often, CCFs are caused by human maintenance actions. In some cases, even single human actions may affect safety by influencing several components through latent system interactions [3].

In probabilistic safety assessment (PSA), human failures¹ have been divided into three categories [4,5]: (A) pre-initiator events that cause equipment/systems unavailability, (B) actions leading to PSA initiating events, i.e. human induced initiators, and (C) post-initiator human actions. Generally, maintenance actions are included in the PSA models for class A. Type A activities have not been one of the key areas of development in human reliability analysis (HRA), although some effort has been made, e.g. to assess probabilities of human failures in maintenance [2,6,7]. Furthermore, exact modelling and quantification of repeated human failures in PSA has been formulated in Ref. [8], and maintenance data from several installations has been used to draw generic conclusions in Ref. [9]. Even though some models go as far as to simulate maintenance activities [10], the efforts to develop, validate and verify HRA models with plant data have been few. An example of such studies is that of Reiman [2], and references will be made to it in several parts of this paper.

A study on maintenance induced faults was performed for more than 4400 maintenance history reports from the years 1992–1994 of the Olkiluoto boiling water reactor (BWR) nuclear power plant (NPP) in Finland [11]. This paper presents a follow-up analysis of the data collected in that study. The objective of the follow-up analysis was to allow statistical inference of unplanned effects of maintenance actions on plant safety. Further, the idea was to generate a database available for potential future probabilistic studies.

* Fax: +358-9-456-6752. E-mail address: pekka.pyy@vtt.fi (P. Pyy).
¹ The term human failure is used in this paper instead of human error to emphasise that the reasons for a failing human action may be many. Sometimes even a correct human action transmits a fault mechanism into the equipment, e.g. due to a faulty instruction or tools, and thus causes an equipment fault. In the following, the term human failure is used as a synonym for a fault caused by human action.
0951-8320/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0951-8320(01)00026-6

2. Methods and material
In this section, the main characteristics of data collection
and the statistical analysis are presented. The collection of the original data, including the detailed classifications and a description of the analysis flow, is described in detail in Ref. [11].

In the study, the human failures were classified into single ones, multiple ones (Human related Common Cause Failures (HCCFs) and Human related Common Cause Non-critical failures (HCCNs)) and Human related Shared Equipment Failures (HSEFs). HCCFs were common cause failures in redundant components caused by repeated human actions, and HCCNs were the corresponding non-critical common cause failures, i.e. they do not directly cause faults in a component but may reduce its operability characteristics. HSEFs were multiple faults caused by a single human failure, which is possible especially through latent electrical and instrumentation system dependencies. Other faults taking place in more than one (non-redundant) component or system were classified as single failures, although some of them may have included dependent features.

Thorough interviews with the utility personnel revealed 206 single human failures leading to equipment faults, as shown in Fig. 1, which demonstrates the screening of data. Apart from these, 126 fault history records could be grouped into 37 dependent failures and 11 HSEFs. This reduction in number was due to the fact that several fault records in the database referred to the same dependency mechanism. Altogether eight HCCFs and six HCCNs were identified in the data, whereas a closer analysis revealed that 23 dependent cases were actually not due to human actions.

A typical fault record in the plant maintenance database included the following information: component identification number, component type, room, fault detection time, repair initiation time, repair finishing time, number of workers, hours spent, urgency class, fault report number, codes and a short free text description. The codes referred to fault causes, consequences and repair actions, usually accompanied by a text description of 1–2 sentences. No information allowing deeper analyses of human behaviour was available in the database, because it is primarily intended for purposes other than HRA. The data was verified with the maintenance foremen and against other plant documentation, and more information was collected to complete the picture. This part of the analysis was very resource intensive, and it was later complemented with the regulatory body personnel and the plant PSA group.

Finally, all single human failures were classified according to the phenotype of the human failure, type of equipment involved, time of failure origin, time of failure detection and type of action that revealed the failure. Further, the mechanisms of the dependent failures were studied in more detail.

The statistical analysis also included studies of the safety and economic significance of the failures. This means studying the number of repair hours required, the repair urgency class and the PSA significance of the fault. The single human failure database in particular was large enough to allow proper statistical testing and inference. Here, non-parametric tests such as χ², Fisher's exact probability test and the Mann–Whitney rank sum test [12,13] were used. Generally, α = 0.05 was used as the significance criterion. SIGMA STAT [14] software was used, with verification calculations by other tools. The ability to carry out statistical testing is the reason for devoting a longer discussion to single human failures in the following. Statistical tests were also performed for HCCFs, HCCNs and HSEFs, which are more safety significant categories of human failures, but these sample sizes were considerably smaller.
Fig. 1. Flow chart of the screening of human failures [11].
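The χ² test of homogeneity named in the methods can be illustrated with a short sketch. It is not the SIGMA STAT calculation used in the study: the helper names `chi2_sf` and `chi2_homogeneity` are hypothetical, the p-value comes from a standard recurrence for integer degrees of freedom, and the counts are those reported in Table 2 with the two valve classes merged into one column, as the table footnote describes.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values for df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def chi2_homogeneity(table):
    """Pearson chi-square statistic, degrees of freedom and p-value of an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)

# Counts from Table 2; columns: I&C, mechanical, electrical, all valves combined
# (the merging of the valve columns is an assumption based on the table footnote).
table2 = [
    [13, 7, 14, 15],   # omission
    [11, 0, 5, 2],     # commission, wrong set points
    [11, 3, 7, 6],     # commission, wrong direction
    [49, 26, 14, 23],  # commission, other
]
stat, df, p = chi2_homogeneity(table2)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")
```

Under these assumptions the p-value falls below α = 0.05 and is close to the value of 0.02 reported for Table 2 in Section 3.1.1.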
Table 1
Distribution of all faults and human failures in cause categories as reported by the plant maintenance personnel (for dependent failures, see Table 6)

Reported main cause category          Number of all fault records  Number of single human  Percentage of single human
(given by utility)                    (unit 1 + unit 2)            failure records         failures of fault records (%)
A Failure in installation or earlier  279 + 221 = 500              29 + 21 = 50            10.0
B Operating or maintenance personnel  113 + 101 = 214              44 + 27 = 71            33.2
C Consequence of operation            1250 + 1491 = 2741           10 + 3 = 13             0.5
D Miscellaneous causes                505 + 447 = 952              23 + 47 = 70            7.4
Total                                 4407                         204(a)                  4.6

(a) Two cases came from utility event reports (206 single human error cases, see Fig. 1).
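The percentage column of Table 1 follows directly from the record counts. The sketch below recomputes it and also shows what share of the single human failures a search limited to cause category B would have found; the dictionary layout and variable names are illustrative only.

```python
# Rows of Table 1: category -> (all fault records, single human failure records).
table1 = {
    "A Failure in installation or earlier": (500, 50),
    "B Operating or maintenance personnel": (214, 71),
    "C Consequence of operation": (2741, 13),
    "D Miscellaneous causes": (952, 70),
}

total_faults = sum(faults for faults, _ in table1.values())
total_human = sum(human for _, human in table1.values())

for category, (faults, human) in table1.items():
    share = 100.0 * human / faults
    print(f"{category}: {share:.1f}% of records were single human failures")

# Overall share of single human failures among all fault records.
overall = 100.0 * total_human / total_faults
# Share of the single human failures that were booked under category B.
found_in_b = 100.0 * table1["B Operating or maintenance personnel"][1] / total_human
print(f"Overall: {overall:.1f}%")
print(f"Category B alone covers {found_in_b:.1f}% of the single human failures")
```

Only about a third of the single human failure records were booked under the category intended for human failures, which is why the free text descriptions had to be searched as well.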
The results obtained for them are compared to the ones of single human failures in Section 3.2.

3. Results

In the following sections, the results of the statistical analysis will be discussed separately for single human failures, dependent ones (HCCFs and HCCNs) and HSEFs.

3.1. Single human failures

The plant maintenance personnel had classified the human failures, i.e. human actions that led to a fault, in all four available fault cause categories A–D, as seen in Table 1, although there was a specific category B in the reporting form for human failures. This was mostly due to the fact that the categories, e.g. 'operating and maintenance personnel' and 'failure in installation or earlier', are not mutually exclusive. Similarly, some of their sub-categories, e.g. 'foreign labour' and 'installation error', may apply to the same fault, and slightly different reporting principles have obviously been applied in the two plant units. Based on this result, the search for human failures should not be restricted to the corresponding cause categories in the maintenance history; the free text descriptions also need to be used.

3.1.1. Equipment type and human failures

Together 206 (4.6%) maintenance history records were due to single human failures, as was shown in Table 1. The division of single human failures among different equipment types is presented in Table 2. As seen, control and instrumentation (I&C, 84 cases) and electrical equipment (40 cases) are often affected by human actions. Their share together is about 60% of the total. A closer study of the whole database revealed that the high number of human originated instrument faults is analogous to the share of I&C in all the faults (≈40%). Nearly three fourths of all single human failures (152 cases) were found in process systems, whereas only 41 cases were found in so-called electrical or instrumentation systems, e.g. in bus bars or in plant protection. Many electrical and I&C faults were also found in process systems, which is due to the amount of I&C equipment in all kinds of systems. Consequently, more emphasis should be put in PSA on studying complex equipment such as instrumentation, control, protection, electrical power supply and drives in all systems.

The next step was to study which kinds of human failures take place. Swain [6] divides human failures into errors of commission and omission. An error of omission (EoO) is a failure to perform an action at all, i.e. one omits it. An error of commission (EoC) is an incorrect performance of an action, or performance of some additional action. HRA studies mostly concentrate upon errors of omission. Table 2 shows how Swain's taxonomy [6] was expanded in this study so that wrong set point failures and wrong direction failures (e.g. an electric motor rotates in wrong direction due
Table 2
Single human failure types and their distribution among different equipment categories

Human failure type            I&C         Mechanical  Electrical  Valves(a)  Instr. line  Total
                              components  components  components             valves(a)
Omission                      13          7           14          7          8            49
Commission, wrong set points  11          0           5           2          0            18
Commission, wrong direction   11          3           7           6          0            27
Commission, other             49          26          14          19         4            112
Total                         84          36          40          34         12           206

(a) Instrument valves are shown separately due to their proneness to omission errors, but all valves were considered as one class in χ²-tests to avoid zero frequency categories.
to misplaced wires) were separated as failure classes of their own. This was due to the fact that they were saliently present in the data and the results became comparable with those of Reiman's study [2]. The type of database did not allow for any deeper classifications of human failure mechanisms.

The distribution of failures shown in Table 2 is not homogenous, which is confirmed by the χ²-test (p = 0.02). EoCs dominate the results with a share of 76%, and especially the category 'other commission' is salient (54%), deserving further analysis. The share of EoCs in mechanical and I&C equipment is exceptionally high. In contrast, many EoOs had taken place in instrument line block valves (67% of all instrument valve failure modes). For valves, the result could be expected: failure types such as wrong direction and wrong settings do not very often take place in process components. The amount of wrong settings in I&C and electrical equipment was about 12–13% of the total amount of failures, which cannot be seen as exceptionally low. However, the wrong settings are the category of EoCs that is sometimes included in PSAs, whereas the other types of EoCs are normally neglected.

Table 3 gives more detailed information about the maintenance related human failures based on a closer investigation. The events could be, for example, decomposed into: lack of attention causing a short circuit, confusion in cables (put in wrong order), use of excessive force causing crushed instrumentation tubes and use of too little force causing bad connections, untight bolts, broken pieces etc. In particular, the analysis of 'other commission failures' is further expanded. The finding of this part of the study was that the first four columns in Table 3, referring more or less to carelessness, correspond to more than 52% of the total.

The distribution of detailed human failure type frequencies in Table 3 is not homogenous, as confirmed by the χ²-test (p < 0.001). The 'lack of attention' type failures seem to be common in mechanical components (53% of faults). 'Forgetting' is the most frequent mode in valves and electrical equipment, which is usually taken into account in PSA studies. Excessive force was in many cases used to adjust valves. Work planning and system design and layout (ergonomics) appeared to contribute to many of these failures. Control and instrumentation equipment is prone to wrong direction failures, choosing a wrong object and using too little force (including loose connections).

The data of this study showed somewhat higher yearly frequencies for selected human failure classes when compared to the data collected earlier by Reiman [2] from the same power plant. Reiman discovered ≈9.6 omissions and 3.8 wrong direction commissions per year through 1981–1991, whereas the findings of this study are 16.3 and 9.0, correspondingly. The difference is mostly due to the more extensive effort put into this study. The search for human failures in Ref. [11] covered all the maintenance record classes, and not only those pre-classified as human failures (Class B, Table 1) as in Ref. [2]. No statistically significant increasing or decreasing trends were noticed in the yearly frequencies of those failure classes.

Finally, the working hours spent were also studied as an indicator of the unavailability time and related costs, although the unavailability time cannot be derived directly from them. Here, data taken directly from the maintenance history could be used. There were no statistically significant differences in the distributions of working hours due to single human failures and due to other faults according to the Mann–Whitney rank sum test (p = 0.30). The average was 22.5 h, but the data clearly formed two groups with very short and long repairs. This result cannot be seen as surprising, since simple faults are fixed at once regardless of whether they are caused by human actions or by other factors. The mechanical components had significantly longer repair times than the other component groups (average 32.1 h), and they were the only group deviating saliently from other data.

3.1.2. Fault origination and detection

In NPPs, many preventive maintenance, modification and testing activities take place during the annual refuelling outage. This fact partly explains why 127 (≈62%) single human failures stem from that period, in comparison to the 78 taking place in power operation. The origination and detection of human failures is important information for HRA purposes. About 94% of the failures born during the power operation were also detected during the power operation, as shown in
Table 3
A detailed human failure classification and its distribution in different equipment classes

                 Lack of    Too much  Too little  Wrong      Wrong         Wrong direction,  Forgetting  Total
                 attention  force     force       object(a)  set point(a)  sequence          a phase
I&C              17         4         10          17         11            11                14          84
Mechanical       19         2         3           2          0             3                 7           36
Electrical       10         1         2           1          4             8                 14          40
Valve(a)         9          8         2           2          1             6                 6           34
Instr. valve(a)  1          0         0           3          0             0                 8           12
Total            56         15        17          25         16            28                49          206

(a) Valves and 'wrong' failure modes were combined for the χ²-test to avoid zero frequencies.
Fig. 2. Plant operating mode at the time of detection of 127 single human failures (faults) stemming from outages (left) and 78 stemming from the operating period (right). For one case, the timing remained unclear. The detection took place by a preventive (prev.) or by other type of action.
Fig. 2. In contrast, 49% of the human failures born during outages remained latent until the plant start-up or even until the power operation. This was somewhat unexpected, since several preventive actions, such as tests and inspections, take place at the end of an outage in order to detect remaining faults and human failures. The finding led to a closer investigation of the birth and detection of human failures.

Fig. 3 shows the results of a more detailed study of the failures born during the outages. Detection percentages are shown as a function of (1) the different equipment classes and (2) the plant operating mode at the time of the detection. More faults are detected in start-up or during the power operation for all the component classes except for the mechanical component faults, which are mostly detected during the outages (67%). The obvious explanation is that mechanical damage is highly visible. However, the equipment type is not a statistically significant explanatory factor (χ², p ≈ 0.42) for differences in detection frequencies.

3.1.3. Safety significance

The indicators of safety significance were based on the NPP Safety Technical Specifications (TechSpecs), which give deterministic safety related rules, and on PSA. To some extent, different systems are listed in TechSpecs and modelled in PSA fault trees (FTs). A PSA study includes the systems that are important from the severe core damage point of view. Often, those systems are also in stand-by state during the power operation. The safety system concept in TechSpecs is different, since e.g. fire and mechanical fuel integrity risks are taken into account. In the following, more emphasis is put on systems modelled in PSA, which required some further investigation of the data and co-operation with the
Fig. 3. Detection frequencies of the outage born faults as a function of different equipment classes and the plant operating mode at the time of the detection.
Table 4
Distribution of single human failures in different equipment between safety related and non-safety related systems

       Number in  Number in  Total  Number in  Number in  Total
       TechSpecs  other             PSA        other
       systems    systems           systems    systems
IC     40         44         84     33         51         84
EL     22         18         40     23         17         40
MEC    9          27         36     9          27         36
VAL    8          26         34     14         20         34
IVAL   4          8          12     8          4          12
Total  83         123        206    87         119        206
PSA-team. The assessment of the safety significance was carried out in three phases.

The first phase of the safety significance review was to study the frequencies of single human failures in safety related and other systems. The number of them in systems with a FT in PSA was 87 (compared to 119 in other systems). To find out if the frequencies of technical faults are distributed in a similar way, all the 4146 fault records due to technical reasons were classified into PSA (1431 records) and non-PSA systems (2715 records). A comparison to the distribution of human failures (87/119) was carried out by χ²-test. The result indicates that the amount of single human failures in systems with a PSA FT is higher than random factors would explain (p = 0.03). One reason for this finding is the large amount of different preventive maintenance and test tasks carried out in safety related systems. It may also be that not all the faults in totally safety insignificant systems are duly booked.

Table 4 was drawn to study which component groups are prone to human failures in safety related systems. The distribution of the number of human failures in different equipment types is not homogenous according to the χ²-test. The situation is not much affected by whether the division is made with regard to PSA/non-PSA systems (p = 0.03) or with regard to TechSpecs/non-TechSpecs systems (p = 0.01). Electrical faults induced by human actions tend to concentrate in safety related systems, whereas the mechanical ones are frequent in other systems. Many human failures in instrumentation valves were in PSA systems. The share of single human failures in the systems modelled in PSA fault trees was 39% for I&C faults and 58% for electrical faults. The same ratios for systems mentioned in TechSpecs were 48 and 55%, correspondingly. This result has to be interpreted against the background that the number of plant safety related systems is considerably smaller than that of non-safety related systems. For example, the number of systems modelled in PSA is 47 and the number of other systems is 185.

More than 50% of the outage born single human failures in PSA systems were also identified in outages, as shown in Table 5. However, the result is not significantly better than for all single human failures. Valves in PSA systems were the only group where more safety system related faults were detected during the power operation than in shutdown.
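The first-phase comparison of human failures (87 in PSA systems, 119 in other systems) with technical faults (1431 and 2715) can be checked with a 2 × 2 χ² test. The sketch below is an illustration rather than the original calculation: the function name `chi2_2x2` is hypothetical, and since the text does not say whether a continuity correction was applied, both variants are computed.

```python
import math

def chi2_2x2(a, b, c, d, yates=False):
    """Chi-square test for the 2 x 2 table [[a, b], [c, d]]; returns (statistic, p)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction; whether the original analysis used it is not stated.
        diff = max(0.0, diff - n / 2.0)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Rows: single human failures, technical faults; columns: PSA systems, other systems.
stat, p = chi2_2x2(87, 119, 1431, 2715)
stat_c, p_c = chi2_2x2(87, 119, 1431, 2715, yates=True)
print(f"uncorrected: chi2 = {stat:.2f}, p = {p:.3f}")
print(f"Yates:       chi2 = {stat_c:.2f}, p = {p_c:.3f}")
```

Both variants give a p-value below the 0.05 criterion and of the same order as the reported 0.03, so the conclusion does not depend on the choice of correction.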
Some negligible seal leakages were present in the valve data, which may explain the later detection of valve faults in PSA systems. Nevertheless, there was only a symptomatic difference in the frequencies between the different equipment classes (χ², p = 0.09).

Table 5
The detection frequencies of single human failures born in outages taking into account if the system had a FT model in PSA or not

                 Detected in outage  Detected in power  Total
Component        PSA     NON-PSA     PSA     NON-PSA
I&C              13      13          9       19         54
Mechanical       3       11          2       5          21
Electrical       8       2           6       5          21
Valve(a)         4       7           7       6          24
Instr. valve(a)  2       1           2       2          7
Total            30      34          26      37         127

(a) The valves were analysed together in statistical testing.

Not all the safety significant events are modelled in fault trees, e.g. those inducing initiating events of PSA. Furthermore, in systems modelled in PSA FTs, there are components such as small tubes that have nothing to do with core damage risks. Thus, one should study the human failures leading to PSA basic events in order to find a realistic nuclear risk contribution. This is discussed, in the following, as the second phase of the safety significance review for the data of our study.

Together 19 single human failures, modelled as a basic event in the plant PSA model, were found. This amount corresponds to 9.2% of the single human failures. Two of them were only modelled as a part of a CCF mechanism. The most frequent classes were, again, I&C and electrical component faults, corresponding together to about 63% of the total. An interesting detail is that in PSA basic events, there were more omission type human actions and wrong direction failures, at the expense of failure types caused by careless actions, when compared to Table 3. One potential explanation may be that better care is taken of important components.

The dominant part of human failures leading to PSA basic events were born in outages (15 cases, 79%). About 40% of them remained undetected from outage to start-up or to power operation. This is slightly less than the average for
all systems, which could be explained by tighter check-ups and tests in safety related systems, especially for important components. Nevertheless, the difference between the populations (PSA/other events) remains unproven in χ²-testing.

As the last phase of the study of safety impact, generally used reliability importance measures were also used to study the significance of human failures. The Fussell–Vesely (FV) importance for basic events was applied, meaning the risk proportion of all cut-sets where the specified basic event is present. Another importance measure used was the risk achievement worth (RAW), meaning the relative rise in the core damage risk given that a basic event takes place with probability 1. These two importance measures give somewhat different results. For example, RAW gives more weight to rare CCFs, and single component faults do not appear high in its ranking list. For FV, the situation is often the opposite. Thus, RAW gave no significant importance to the single human failure basic events. The highest FV importance, less than 1%, was obtained for some important control valve faults in the reactor water cleaning system. Should all the identified single faults have taken place at once, the FV importance contribution to the core damage risk would have been 1.2%. This shows clearly the small significance of single failures in a redundant and diversified nuclear power plant. The importance of dependent failures is discussed in Section 3.2.3.

3.2. Dependent human failures

Candidates for dependent human failures were found in 126 maintenance records in the database and in four plant licensee event reports. After a screening analysis, this amount could be reduced to 43 records referring to 13 HCCF/HCCN cases and 14 records referring to 10 HSEF cases. Other records referred to e.g. ageing mechanisms, as indicated already in Fig. 1. Apart from the amount listed before, one HCCF and one HSEF case came from other utility reporting (such as LERs).
This makes the amount of HCCFs and HCCNs together 14, as shown in Table 6, and the amount of HSEFs 11, correspondingly. The HCCF and HCCN cases have been listed in Appendix A. There were more fault records in the maintenance history than actual human failures, since dependent failures had
effect on several components, as shown in Table 6. Fault records are normally booked on a component basis rather than on a mechanism basis. Furthermore, many dependent failures that were first classified as wrong calibrations were due to ageing etc. and could be screened out in the course of the study [11]. This shows the need for plant staff interviews as a part of data collection.

The proportion of human failure records in the whole maintenance data was about 6.0% and the proportion of human failure cases about 5.2%. The latter result slightly underestimates the human failure share, since no grouping of dependent faults other than human failures was done. Moreover, some human failures may have remained unidentified. The dependent human failures (HCCFs, HCCNs and HSEFs) represent 10.8% of the total amount of human failures. The share of HCCFs, representing clearly critical failures, is 3.5% based on Tables 1 and 6. These results represent rather low probabilistic dependence, and they also confirm, for their part, the findings of Reiman [2].

A further analysis showed that, of the eight HCCF cases, one case affected two redundancies out of two, three cases affected two out of four and three cases affected four out of four (different systems had different degrees of redundancy). In addition, in one case the number of redundant systems was higher than 10, four of which had failed. In one case, where all redundant subsystems on one plant unit had become unavailable, another unit was also affected (two out of four). Thus, dependent human failures can extend across the system and plant block boundaries.

3.2.1. Affected equipment and human failure types for HCCFs and HCCNs

Statistical analysis of HCCF and HCCN data was difficult due to the small amount of data. This also means that no tables are used here to explain results. In some cases, the χ²-test and Fisher's exact probability test could be used to support inference. However, many conclusions could only be based on qualitative reasoning and comparisons to the results of single human failures. In any case, the results obtained for HCCFs and HCCNs resemble those of the single human failures, as shown in the following.

Instrumentation (10 cases) and electrical equipment (four cases) were the only affected equipment types. The
Table 6
Identified dependent human failure related records with their distribution both in HCCF/HCCN cases and in reported cause categories

Reported cause category               Fault    Records referring  Records referring  Dependent human  Number of records
                                      records  to single human    to HCCFs/HCCNs     failure cases    per dependent
                                               failures                              (HCCFs/HCCNs)    h.f. case
A Failure in installation or earlier  500      50                 12                 4                3
B Operating/maintenance staff         214      71                 13                 6                2.2
C Consequence of operation            2741     13                 8                  1                8
D Miscellaneous causes                952      70                 10                 2                5
Total                                 4407     204(a)             43(b)              13(b)            3.3

(a) The amount excludes two reports not coming from the maintenance records, together 206 single human failures.
(b) Excludes one case coming from other utility records.
corresponding distribution of the eight HCCFs is five in I&C and three in electrical equipment. This result supports the conclusion made in Section 3.1.1 about the importance of analysing maintenance actions related to electrical and I&C equipment.

The dominant failure type category for HCCFs and HCCNs is EoC (see Section 2), as for the single failures. Its contribution is 86%, which means seven HCCFs and five HCCNs. Wrong settings (four cases) is also a salient category, but all of them appear to be non-critical instrumentation related HCCNs. The contribution of wrong settings is, in any case, more significant than in the case of single human failures. A wrong direction failure was behind two instrumentation related HCCFs. Six cases could be seen as other types of EoCs. In contrast to single human failures, many dependent EoCs were related to systematic deficiencies in work planning and practices rather than to carelessness and other random causes.

The issue of non-criticality of faults (HCCNs) is sensitive to the definition of equipment boundaries. In only one case was the effect on equipment so negligible that the case was left outside the further study of consequences of failures. For the remaining 13 HCCF/HCCN cases, five equipment inoperabilities and eight wrong functions were identified as consequences. The result was compared to the distribution of single human failures by using the χ²-test, which showed that the distributions are not homogenous (p = 0.03). Thus, more wrong equipment functions appeared as consequences of dependent human failures. The difference may be easily explained by the population, since most dependent failures were instrumentation related and, thus, wrong signals are a common fault mode.

3.2.2. Fault origination and detection

Only three dependent failure cases (HCCFs and HCCNs) were born during the power operation and the remaining 11 were born in outages. This is analogous to single human failures. Fig. 4 presents the plant operating mode at the time of detection of those 11 failures born during the outages. Also in this case, the results do not differ from the ones
Fig. 4. Distribution of the detection modes of human induced dependent faults introduced during an outage (11 cases). The more left in the picture the better the situation.
obtained for the single human failures. Seven cases including three HCCFs remained undetected at least until the plant start-up. Furthermore, Fisher's exact probability test con®rms the homogenous distribution of outage born HCCFs and HCCNs detection frequencies, when the failures are classi®ed according to their detection in outage or later, i.e. start-up or power operation (2 £ 2 contingency table). However, the data was very sparse. A more thorough analysis of the dependent human failures allowed further inference about their causes and means of detection. Modi®cations are an important source with the share of 50% (seven cases). Further, periodic testing and alarms detected together 50% of the cases. However, different types of preventive actions also caused ®ve cases. The importance of modi®cations is problematic from the safety point of view, because it is dif®cult to know which kind of hazards are induced by the new equipment and set-ups. Nuclear utilities normally carry out extensive start-up testing programs for their new equipment. However, in many cases either the test program was not found to be comprehensive enough, or the tests were not carried out thoroughly. Similarly, ¯aws were found in barriers like control and adjustment, personnel training and work planning. In future, plant back®ttings and modi®cations must be seen as activities having an impact on many parts of the plant and its organisation. 3.2.3. Safety signi®cance of dependent human failures A deeper analysis of the data reveals that the amount of dependent human failures (HCCFs and HCCNs) is equal or higher in safety related (PSA or TechSpecs) systems than in other systems. Seven out of 14 dependent faults were in systems modelled in PSA FTs. The corresponding ®gures for TechSpecs systems were nine out of 14. The low amount of data did not allow for a statistical con®rmation of this ®nding, but it is analogous to the one obtained for single failures. 
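As an illustration of the two homogeneity checks reported in Sections 3.2.1 and 3.2.2, the following minimal Python sketch computes a Pearson χ²-test and a two-sided Fisher's exact test for 2 × 2 contingency tables. It is a sketch under stated assumptions, not a reproduction of the original analysis: the function names, the omission of a continuity correction, and every count marked as hypothetical in the comments are assumptions made for illustration. Only the dependent-failure consequence counts (five inoperabilities, eight wrong functions) and the seven late-detected cases (three HCCFs, hence four HCCNs) are taken from the text.

```python
from math import comb, erfc, sqrt


def chi2_homogeneity_2x2(table):
    """Pearson chi-square test for a 2x2 table, without continuity correction.

    Returns (statistic, p_value). With one degree of freedom the chi-square
    survival function equals erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one count")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, erfc(sqrt(stat / 2.0))


def fisher_exact_2x2(table):
    """Two-sided Fisher exact test for a 2x2 table.

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    col2 = n - col1

    def prob(x):
        return comb(col1, x) * comb(col2, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - col2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Rows: dependent failures, single failures.
# Columns: equipment inoperability, wrong function.
# The first row (5, 8) is from Section 3.2.1; the second row is a
# HYPOTHETICAL placeholder, not the plant data.
consequences = [[5, 8], [120, 60]]
stat, p_chi2 = chi2_homogeneity_2x2(consequences)
print(f"chi-square = {stat:.2f}, p = {p_chi2:.3f}")

# Rows: HCCF, HCCN. Columns: detected in outage, detected later.
# The "later" column (3, 4) follows from the text; the split of the four
# outage-detected cases (2, 2) is a HYPOTHETICAL assumption.
detection = [[2, 3], [2, 4]]
print(f"Fisher exact p = {fisher_exact_2x2(detection):.3f}")
```

With counts as sparse as the 11 outage born cases, the exact test is the natural choice, since the χ² approximation becomes unreliable when expected cell counts are small.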
The dependent failures in systems modelled in PSA fault trees or mentioned in TechSpecs are detected slightly earlier than in other systems. The situation is also somewhat better with regard to preventive actions, since 46% of dependent failures were detected by them. The slightly better detection efficiency was not, however, proven statistically. A remarkable fraction of dependent failures, as was the case with single failures born in outages, remains latent until power operation.

An assessment of the safety significance of dependent human failures was also based on the importance measures discussed under Section 3.1.3. To identify the corresponding CCF events in the plant PSA models, additional work and judgement were required. Finally, five approximate correspondents were found. The highest contribution to the core damage risk was, according to FV importance, due to a fourfold HCCF in seawater mussel filters (<1.4%) and, according to RAW importance, due to a manifold HCCF in hydraulic scram system (<1.2%). These two cases are significant contributors to core damage frequency.
In the latter case, some uncertainty remained as to whether the human failures really caused unavailability of several scram groups. Similarly, the plant specific PSA model is a living one, which leads to constant change in the results.

3.3. Human induced shared equipment faults

Altogether 11 shared equipment faults (HSEFs) were identified based on 14 fault records and one internal utility report. In HSEFs, single human actions cause multiple consequences through system structure or component interactions. Examples of HSEFs are a short circuit in two electrical trains due to deficiently installed jumpers and a missing restoration work order (many restorations on the same form). They are not HCCFs or HCCNs, since repeated human failures were not involved. In eight HSEF cases, two components were affected. Furthermore, in two cases three components and in one case four components were affected. Instrumentation was, again, a dominant equipment group. Otherwise the findings (type of human failure, consequence, time of origin, time of detection) resembled those of single human failures. The result is not surprising, since HSEFs are single human failures by their nature. The population of HSEFs was too sparse to allow profound inference, which was therefore not attempted in this study. Obviously more attention should be paid, in future, to potential HSEFs. This means studying dependencies caused not only by system structure but also by human actions.

4. Discussion and conclusions

A large amount of plant specific maintenance data was used as the source material of this study. The data analysis, and especially interviewing, required a lot of resources. Plant maintenance records, however, offer the best database for maintenance related human failures. Their wider utilisation in HRA work is recommended in future. The database used is also a source of some uncertainties. Identifying human failures as sources of the fault records and classifying them is not straightforward.
The cause categories used in the plant maintenance records neither directly addressed the faults in redundant components nor explicitly allowed for human failure classifications. It is impossible to totally exclude subjective biases from the results. In order to reduce them, several discussions were carried out with the plant and regulatory body personnel.

In the light of the results, instrumentation, control and electrical components are especially prone to human failures, partly due to the vulnerability and partly due to the complexity of the equipment. Thus, more emphasis has to be put on studying I&C and electrical components in safety related systems. A number of human failures stemming from outages remain undetected until power operation. In that respect, single and dependent human failures show similar behaviour, and more rigorous testing and verification is suggested.
Many single human failures were related to lack of vigilance, whereas most dependent ones were related to planning and co-operation gaps. The single human failures led more frequently to equipment unavailability than to wrong equipment functions. Wrong system functions were frequent as a consequence of HCCFs, which may be explained by the amount of I&C equipment.

Human reliability analyses of PSA studies often concentrate upon errors of omission (EoOs) and not on errors of commission (EoCs). There is confusion in the discussion about this topic, since the acronym EoC or EoO may refer either to the external human failure type or to its consequences. There is no fixed mechanism that would lead from an EoO to a system unavailability consequence only and from an EoC to wrong system functions only. As shown by the results of this study, as many as 68% of EoCs led to unavailability of equipment and some EoOs led to a wrong system response. Thus, more analysis effort than just using the EoO and EoC paradigm is required.

A high number of human failures take place in safety related systems. Potential explanations for this are the high number of scheduled activities in safety systems and the fact that the official fault reporting in non-safety systems does not work as well as in safety systems. Electrical faults due to human failures tend to concentrate in safety related systems, whereas mechanical ones are rare in them.

The number of human failures in the maintenance data is not insignificant, but the number of dependent failures in particular remained considerably low. Dependent human failures (HCCFs and HCCNs) and single ones show rather similar behaviour with regard to many traits. Plant modifications appeared as a very important source of dependent human failures. Thus, more extensive planning, co-ordination and testing of the modifications may be recommended. Despite the number of human failures found, only a few HCCFs turned out to be safety significant in a closer study.
When the human failures related to maintenance are discussed, one should also remember that more safety degradation would probably take place if no maintenance were performed.

Acknowledgements

The author wishes to acknowledge Dr Kari Laakso for the amount of work put into the screening analysis of the material used in this study. The help of Dr Urho Pulkkinen, Dr Lasse Reiman and other reviewers in preparing the manuscript is also highly appreciated. The financial support of the Finnish National Nuclear Research Programmes RETU and FINNUS has been vital for the work. Finally, the author wishes to warmly thank the Olkiluoto NPP and STUK regulatory body personnel who participated both in data analysis and in commenting on the manuscript.
Appendix A

HCCF and HCCN failures (Table A1)

Table A1
Type of equipment given in parentheses (IC = instrumentation and control, EL = electrical)

No. | Failure | Plant units affected
HCCF
1 | The trip limits lowered on wrong neutron flux trip conditions (IC) | One unit only
2 | Neutron flux trip limits left too low after valve self-closure test (IC) | One unit only
3 | Power cables cut to the supply pumps of the diesel fuel tanks (EL) | One unit only
4 | Difference pressure measurements crosswise connected in mussel filters (IC) | Two units
5 | Couplings broken between actuators and control valves (IC) | One unit only
6 | The actuation times too long due to mineral oil impurities in the anchors of the solenoid valves (EL) | Two units
7 | Simultaneous work in two subsystems of the auxiliary feedwater system during the refuelling outage of unit 2 (IC) | One unit + single failure on another
8 | Turning pieces of flow measurement devices mixed after cleaning (IC) | One unit + single failure on another
HCCN
1 | The temperature measurement values of the bearing pads of the turbine set too low (IC) | One unit only
2 | The protective coverings broken in the power supply cables of solenoid valves (EL) | One unit only
3 | Air left in instrument lines of the pressure difference measurements of the suction strainers. In addition unnecessary alarms (IC) | One unit only
4 | Wrong settings of the piston position indications of the operating oil pressure accumulators due to start-up problems (IC) | One unit only
5 | The signal lights of the operating oil pressure accumulators do not indicate due to wrong settings (IC) | One unit only
6 | The air pressure correction was lacking in the calibration method of the temperature monitoring limit switches (IC) | Two units

References

[1] Hirschberg S, editor. Dependencies, Human interactions and Uncertainties, final report of NKS/RAS-470. NORD 1990:57 report. P. 2-12-65. ISBN 87 7303 454 1, 1990.
[2] Reiman L. Expert judgment in analysis of human and organizational behaviour at nuclear power plants. Helsinki: Finnish Centre for Radiation and Nuclear Safety. Thesis for the degree of Doctor of Technology. STUK-A118 report, ISBN 951-712-012-5, 1994. 226 p.
[3] IAEA. Single human failures in nuclear power plants: a human factors approach to the event analysis. Report of a consultants meeting, limited distribution. IAEA-CS12/96, 1996. 61 p.
[4] Illman L, Isaksson J, Makkonen L, Vaurio JK, Vuorio U. Human reliability analysis in Loviisa probabilistic safety assessment. Proceedings of SRE Symposium '86, Espoo, October 1986. 12 p.
[5] IAEA. Procedures for conducting probabilistic safety assessments of nuclear power plants (Level 1), Safety Series No. 50-P-4, IAEA, Vienna, 1992.
[6] Swain AD, Guttmann HE. Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications. NUREG/CR-1278, Sandia National Laboratories, Albuquerque, USA, 1983. p. 554.
[7] Samanta PK, O'Brien JM, Morrison HW. Multiple-sequential failure model: evaluation of and procedures for human failure dependency. NUREG/CR-3637. Brookhaven National Laboratory. May 1985.
[8] Vaurio J. Modelling and quantification of testing, maintenance and calibration failures in system analysis and risk assessment. In: Schueller GI, Kafka P, editors. Safety and Reliability. Proceedings of ESREL '99 conference, 1999. p. 663–9.
[9] Morris IE, Walker TG, Findlay CS, Cochrane EA. Control of maintenance errors. Safety and reliability, Proceedings of ESREL '98 Conference, Trondheim, Norway. Rotterdam: Balkema, 1998. p. 281–5.
[10] Siegel AI, Bartter WD, Wolf JJ, Knee HE, Haas HE, Haas PM. Maintenance Personnel performance simulation (MAPPS) model, Vol. 1. Summary Description. NUREG/CR-3626, 1984.
[11] Laakso K, Pyy P, Reiman L. Human failures related to maintenance and modifications. STUK-YTO-TR 1998;139:42.
[12] Conover WJ. Practical nonparametric statistics. New York: Wiley, 1971. p. 493.
[13] Siegel S. Non-parametric statistics for the behavioral sciences. New York: McGraw-Hill, 1956. p. 312.
[14] SPSS. SigmaStat, Statistical Software Version 2.0. User's Manual. ISBN:1-56827-149-2, 1997. p. 860.