Nuclear Engineering and Design 93 (1986) 181-186 North-Holland, Amsterdam
COLLECTION, PROCESSING AND USE OF DATA

G. MANCINI

Commission of the European Communities, Joint Research Centre, Ispra Establishment, 21020 Ispra (Va), Italy
Received August 1985
The proliferation in the past few years of data banks related to the safety and reliability of nuclear reactors has somewhat alleviated the lack of information encountered when carrying out safety analyses; at the same time it has brought to light several crucial aspects of the overall procedures for the collection, processing and use of data. These aspects are reviewed in the paper, with special emphasis on the interaction between the information and the various interpretative models of component and system behaviour. The need for the analyst not only to validate stated models but also to attempt new interpretations of the facts is particularly stressed. Finally, the role of new Artificial Intelligence techniques in supporting man in his search for model structures - that is, the "history" underlying the facts - is mentioned.
1. Introduction
Event and reliability data banks have proliferated during the past decade. This was due initially to designers, operators and managers of industrial installations becoming aware of the benefits to be derived from an in-depth knowledge of the reliability characteristics of the plants for which they were responsible. Current growth is in part due to proposed legislation that will require plant operators to produce hazard assessments of their plants using, where possible, data derived from their own operating experience. In the nuclear power domain, the power industry has undertaken risk assessments of many of its plants and has established information systems on their operational data. In some countries incident reporting has become mandatory; this ultimately shows that for major complex technologies the risk associated with operation can no longer be managed internally within the plant but involves society at large. The same trends are now being followed in the more general industrial field: the EEC has issued its directive on major hazards in the chemical industry, which embodies some of these requirements [1]. Several types of institutions are involved in data-gathering exercises: governmental authorities, plant owners and/or operators, industries, public knowledge, insurance companies, university departments, etc. Each of these institutions has different goals in its data collection: attempting to give a categorization of the
various information systems is thus a difficult and perhaps not useful task. We may say, anyway, that with the exception of the nuclear power industry, no group of similar industries has combined to produce its own industry standard for the collection of data and their subsequent processing and use. In this paper we will review various aspects of data collection, processing and use in the frame of probabilistic safety analysis, and some trends for future development will thus be outlined.
2. Collection of data
Three major classes of data can be identified when looking at present data-gathering exercises: (1) data on failures of components; (2) data on abnormal occurrences and incidents; (3) data on reliability parameters for classes of components. Indeed, the third class does not relate to a primary source of information but to processed data, derived mainly from collections of component failures or from subjective estimates of experts.

2.1. Data bank content
The first two classes of data-gathering include reporting of raw events. The difference lies in the object of the reporting: whereas in the first class the focus is
on component behaviour, in the second class the focus is on the mechanisms that govern the initiation and propagation of incidents. Looking from the plant model aspect, the component failures identify relations between the elementary units of the plant and between them and the various systems to which they belong; the incident data may be seen as identifiers of the relations among the systems and between them and the plant to which they belong. From a safety assessment modelling point of view, component failures may be seen either as initiators of a transient or as contributors to the unavailability/unreliability of systems called into operation, whereas the incident data inform about the event sequence in the accident development and the causal links between the various events. Tables 1 and 2 summarize a typical information content of data banks on component failures and on abnormal occurrences, respectively [2]. In the following we will try to outline some of the aspects which, in our view, deserve more attention in the collection of data.
Table 1
Component failures data bank content

Identification of the component in the plant:
- plant classification
- plant subdivision into systems, sub-systems, units
- definition of functions of systems, sub-systems and of their mode of operation in relation to the plant status

Identification of the engineering characteristics of the component:
- definition of the component, its boundaries and its piece parts
- classification of the component types according to engineering attributes

Identification of the duty of the component:
- definition of operation specifications of the component
- definition of the environment in which the component is working

Identification of the operating hours and cycles of the component

Identification of the event to be reported:
- failure definition
- time of failure
- mode and cause of failure
- effect(s) of the failure (on other components, on systems, administrative)
- circumstances of failure discovery (status of the plant, way of discovery, etc.)
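Purely as an illustration of how a record conforming to the Table 1 content might be structured, the sketch below encodes a subset of those fields as a typed record; the field names, types and example values are hypothetical and are not taken from any existing reference classification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical record structure covering part of the Table 1 content;
# field names and codes are invented for the example.
@dataclass
class ComponentFailureEvent:
    plant_id: str                  # plant classification
    system_id: str                 # system / sub-system / unit the component belongs to
    component_id: str              # identification of the component in the plant
    component_type: str            # engineering classification of the component
    operating_mode: str            # duty / operating specification
    environment: str               # environment in which the component works
    operating_hours: float         # cumulative operating hours at the time of failure
    demand_cycles: int             # cumulative number of demands / cycles
    failure_time: datetime         # time of failure
    failure_mode: str              # observed failure mode
    failure_cause: Optional[str]   # cause of failure, if identified
    effects: str                   # effect(s) on other components, systems, administration
    discovery: str                 # circumstances of discovery (plant status, way of discovery)

# Example record (all values invented).
event = ComponentFailureEvent(
    plant_id="PLANT-A", system_id="AFW", component_id="AFW-P-01",
    component_type="centrifugal pump", operating_mode="standby",
    environment="auxiliary building", operating_hours=12400.0,
    demand_cycles=310, failure_time=datetime(1984, 3, 12, 6, 40),
    failure_mode="fails to start", failure_cause=None,
    effects="train unavailable", discovery="periodic test",
)
```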
2.2. Data collection object
The first issue to be approached when collecting data is to establish their final use: this determines both the object of the information to be collected and the degree of detail of this information. Let us consider briefly these two aspects of data: the object and the detail. As far as the object of reporting is concerned, it is indeed very interesting to notice how different uses may influence the collection of data on the same object: for instance, the collection of incidents at nuclear power reactors has given rise - in the past few years - to various incident reporting systems such as the Incident Reporting Systems (IRSs) of OECD and IAEA [3] and the AORS [4] of the Commission of the European Communities. The IRSs and the AORS differ by a factor of 20 (in favour of the AORS) in the number of incidents reported, the field of application being practically the same. The criterion of reporting is, in fact, more selective for the two IRSs, which require only typical examples of the various types of incidents to be transmitted by the member countries; in the AORS, on the contrary, the widest possible reporting is adopted. The difference derives from the final use of the data collection and organization: in the case of the two IRSs the focus is on the monitoring of the overall degree of safety of reactor operation, while the AORS intends to study the various events which could constitute the precursors of more serious accidents, thus needing a more extended basis of information from which to infer
Table 2
Abnormal occurrences data bank content

Identification of the plant

Identification of the incident:
- date
- status of the plant at the moment of the incident
- initiating event
- time sequence of the events
- systems involved during the events development (whether failed to operate or not)
- failed components
- cause of failure of systems and components
- way of failure(s) discovery

Identification of the consequences:
- on the system/plant operation
- on the people inside and outside the plant
- on the plant itself
- on the environment
some conclusions; in the first case the national reporter decides on the importance of the incident to be reported, in the second case the analysis and comparison of the data define the importance of the reported events. Similar considerations apply also to component failure data, which can be used for many purposes such as gross component reliability estimation, maintenance planning and, finally, failure root cause analysis, although in this latter case analysis is amenable to success only for some particular causes, such as system-induced root causes (in other cases the discovery of root causes involves research which exceeds the possibilities of the reporter). Finally, it is always important to check the object of a data collection against the use one wants to make of it: many examples tend to show that this has not always been the case (e.g. use of abnormal occurrence data to infer component reliability estimates).

The other important aspect of data is the detail of the information to be collected: the model (of the plant, of the component, of the accident) adopted by the user will determine the level and structure of the information to be collected and then used in the model (see for instance tables 1 and 2). While it is of the utmost importance to underline the role of the model as the underlying structure of the data, we have nevertheless to recognize that new models and theories may derive from the collected data. Therefore, if we do not want data banks to become a mere storage of data to be retrieved and used without any exploitation of the novelties of the information, we have to structure the bank in such a way that new models for interpreting the physics and dynamics of events can emerge from the data analysis: the exception is very often the most instructive event! Use of Artificial Intelligence techniques could, as we will see later on, help the researcher in this essential duty.

2.3. Data collection procedures

After having commented on a few aspects concerning the objectives of data collection, we will analyse in the following some practical issues related to data collection procedures. We will in particular address briefly the problems of data classification, of failure cause mechanisms and of human errors. One typical issue in data collection schemes is the set-up of coherent classifications for plants, systems and components, in order to avoid ambiguity in reporting and to enhance the potential for data exchange and use. In these classifications, the functions and the boundaries of systems and components are defined together with the major engineering
attributes influencing component behaviour. The nuclear industry has invested a considerable effort in the set-up of such reference classifications: we would like to mention the EIIS (Energy Industry Identification System) [5] in the USA, the UNIPEDE classifications in Europe, and also the voluminous set of system and component reference classifications prepared by the CEC in the frame of the ERDS [2].

A tendency which has become quite important in the past few years in the collection of component failures is the correct identification of the causes of failure. This is not only due to the recognition of the usefulness of these data as a source of information for improving the design of components; we believe, indeed, that the major driving force behind this trend has been the recognition of the errors which usually affect probabilistic safety analyses when the majority of failures are attributed to the "random component failure" category; a correct event description may indicate that other categories would be more appropriate, e.g. human errors, design errors or, what may be improperly defined as, common cause failures. This is indeed one of the major unsolved issues also in the analysis models: how to deal with these other categories of failure and where to place them in the models, as they definitely challenge the probabilistic characteristics of the models.

This is also the reason why no mention has been made up to now of the collection of human errors. At present, very few and limited data-gathering exercises have been attempted on this subject; the difficulty of "modelling" man in his actions (errors and recovery) makes this task a formidable one. The "randomness" of human errors in the several tasks described in the models is also very questionable. Existing, but not yet revealed, causal links in human performance may seriously affect any data-gathering exercise, if it is not extended from the beginning to cover a very large number of variables. At the moment only poor descriptions of the qualitative behaviour of humans are extracted from the incident data collections.
3. Processing and use of data

3.1. Introduction

We have, up to now, examined the problem of collecting data and we have mostly underlined the tight interrelationship between the data to be collected and the models; we have also advocated a structure of data collections enabling the derivation of new knowledge and new models. Finally, we have also to stress that
the collection of data and the storing of data in computerized data banks no longer represent a great problem. The utility of computer-based systems is indeed limited more by inadequate techniques for processing and managing large quantities of data than by our technical ability to gather and store those data. Existing systems for storing and retrieving data are too inflexible and complex for easy extraction of information. Processing and use of data are, therefore, two facets of one, more general, problem: the extraction of information from data. We will address some issues of this general problem in the following.
3.2. Reliability parameters

Major data collection schemes also contemplate the collection of reliability parameters for similar classes of components [6,7]. We deal with this type of data-gathering in the frame of this section because these data derive from the processing of raw data, namely component failure events. These data-gathering exercises constitute historically the first type of data collection dedicated to the reliability of components: they are tuned to a very precise use of the information stored in the failure events, that is the supply of reliability parameters for use in reliability analysis. These data mostly reflect the choice of a constant failure rate and a repair rate. This choice is indeed rather questionable, and it could be argued whether more elaborate models could not, in some cases, better fit the evidence of the data.
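As a minimal sketch of what this processing amounts to under the constant-rate assumption, the example below pools hypothetical failure counts, operating hours and repair hours for one class of components, estimates point values of the failure and repair rates, and derives the mission reliability and the steady-state unavailability; all numbers are invented for illustration.

```python
import math

# Hypothetical pooled records for one class of components.
failures = 7                  # number of observed failures
operating_hours = 3.2e5       # cumulative operating hours for the class
repair_hours = 210.0          # cumulative repair hours for the class

# Point estimates under the constant-rate (exponential) model.
lam = failures / operating_hours   # failure rate [1/h]
mu = failures / repair_hours       # repair rate [1/h] (inverse of the mean repair time)

# Reliability over a 1000 h mission and steady-state unavailability lambda/(lambda + mu).
mission = 1000.0
reliability = math.exp(-lam * mission)
unavailability = lam / (lam + mu)

print(f"lambda = {lam:.2e} 1/h, mu = {mu:.2e} 1/h")
print(f"R({mission:.0f} h) = {reliability:.4f}, steady-state unavailability = {unavailability:.2e}")
```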
3.3. Bayesian approach

Processing failure data in order to evaluate reliability parameters has increasingly involved the principles of Bayesian analysis, by which generic information is combined with application-specific data. Generic data are obtained from industry experience in the specific field (e.g. nuclear) and also in other fields; these data can be extracted from the data-gathering exercises mentioned above to provide the a priori distribution of failure rates to be updated with the application-specific events; generic data can also be constituted by expert judgement based on personal experience [8]. Bayes' theorem, applied in this context, becomes

$$\pi'(\lambda \mid E) = \frac{\pi(\lambda)\, L(E \mid \lambda)}{\int \pi(\lambda)\, L(E \mid \lambda)\, d\lambda}, \qquad (1)$$

where:
$\pi'(\lambda \mid E)$ = probability density function of the failure rate $\lambda$ given the evidence $E$ (posterior distribution),
$\pi(\lambda)$ = the probability density prior to having the evidence $E$,
$L(E \mid \lambda)$ = the likelihood function (probability of the evidence $E$ given $\lambda$).

Bayesian methods provide a logical framework for a "state-of-knowledge" approach to data assimilation and allow one to exploit the small number of failures generally available for some classes of components. One may also envisage extrapolating the principles of Bayesian analysis to regulate, more generally, "learning" from experience: similar procedures could then be applied to other information stored in the event data banks, such as, for instance, incident scenario descriptions, incident initiators, incident precursors, human interventions, etc.; however, the models of these events are far more complex than the relatively simple model of component failure. Such an extrapolation is, therefore, questionable, because in order to apply Bayesian rules an a priori knowledge of all the possible states into which an event may develop is needed: in reality, for many events this is difficult if not impossible.
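As an illustration of eq. (1) in its simplest conjugate form (one possible choice, not the only one), the sketch below encodes the generic industry information as a gamma prior on a constant failure rate and updates it with hypothetical plant-specific evidence (k failures in T operating hours, treated with a Poisson likelihood); the prior parameters and the evidence are invented for the example.

```python
from scipy import stats

# Generic (a priori) information encoded as a gamma distribution on the failure rate;
# the gamma prior is conjugate to the Poisson likelihood, so eq. (1) has a closed form.
alpha_prior, beta_prior = 1.2, 4.0e5      # hypothetical generic-industry prior (shape, hours)

# Application-specific evidence E: k failures observed in T operating hours.
k, T = 3, 1.5e5

# Posterior parameters from Bayes' theorem.
alpha_post = alpha_prior + k
beta_post = beta_prior + T

prior = stats.gamma(a=alpha_prior, scale=1.0 / beta_prior)
posterior = stats.gamma(a=alpha_post, scale=1.0 / beta_post)

print(f"prior mean     = {prior.mean():.2e} 1/h")
print(f"posterior mean = {posterior.mean():.2e} 1/h")
print(f"posterior 90% interval = ({posterior.ppf(0.05):.2e}, {posterior.ppf(0.95):.2e}) 1/h")
```

The closed-form gamma update is only a convenience: eq. (1) applies equally to priors that must be integrated numerically.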
3.4. Fuzzy logic

An alternative route to interpreting the evidence of new facts in relation to past information has recently been proposed with the use of fuzzy logic, through which a non-probabilistic (i.e. non-Bayesian, that is non-additive) measure of uncertainty can be formulated [9]. More specifically, the use of fuzzy attribute values by means of possibility distributions seems rather promising for representing incomplete, vague or imprecise information in data banks, such as for instance: a "high" power engine, a "frequently" tested system, a "highly qualified" operator, a machine of "high" standard design, a "redundant" system, an "incompatible" material, a "reliable" configuration, a "high" failure rate of an instrument, etc.; the available information about the value (e.g. high) of an attribute A (e.g. power) for an object x (e.g. engine) is represented by a possibility distribution $\pi_{A}(x)$. Furthermore, the use of fuzzy attribute values may also help in representing and formalizing different kinds of fuzzy constraints and fuzzy queries [10].
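A minimal sketch of how such a fuzzy attribute value could be represented and interrogated is given below; the trapezoidal possibility distribution chosen for a "high" instrument failure rate and all numerical breakpoints are assumptions made purely for illustration.

```python
def trapezoidal(x, a, b, c, d):
    """Trapezoidal possibility distribution: 0 below a, rising to 1 on [b, c], falling to 0 at d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical possibility distribution for the statement
# "the failure rate of this instrument is high" (rates in 1/h).
def poss_high_failure_rate(rate):
    return trapezoidal(rate, 1e-4, 5e-4, 1e-2, 5e-2)

# Degree of possibility that a recorded value is compatible with the fuzzy description.
for rate in (2e-5, 3e-4, 2e-3):
    print(f"rate = {rate:.1e} 1/h -> possibility = {poss_high_failure_rate(rate):.2f}")
```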
3.5. Data analysis

We have previously outlined the increasing importance of analysing the causes of component failures or, more generally, of identifying the causal structures underlying
failure mechanisms. This need calls for an increased use of more advanced data analysis techniques than those reviewed so far for reliability parameter estimation. More and more ground is being gained by factorial analysis techniques and Discriminant Function Analysis, which attempt to explore correspondences between the various attributes of the failure events [11]. Besides the possible discovery of the underlying causes of failures and, therefore, also of the so-called "common cause failures", such techniques are also being used:
- to extract correspondences between the various attributes of components, environments, duties and failure events in order to construct failure models for the various components (for electronic components, such models already exist in the form of mathematical expressions);
- to explore the domain of events represented by failures of components, failures of systems, accident initiators, human behaviour, etc., in order to extract correspondences between them and thus corroborate hypotheses of incident development or generate new possible sequences.
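Purely as an indication of what such an exploratory analysis can look like, the sketch below applies a linear discriminant analysis to a handful of hypothetical, numerically coded failure records in order to separate two failure-cause classes; the data, the attribute coding and the particular routine used are assumptions of the example and not a description of the techniques of ref. [11].

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical coded failure records: columns are numerically coded attributes
# (environment severity code, duty cycle fraction, cumulative operating hours in kh).
X = np.array([
    [1, 0.2, 12.0],
    [3, 0.9, 4.5],
    [2, 0.5, 20.0],
    [3, 0.8, 6.0],
    [1, 0.1, 15.0],
    [2, 0.7, 5.5],
])
# Hypothetical failure-cause classes: 0 = random hardware, 1 = environment/duty induced.
y = np.array([0, 1, 0, 1, 0, 1])

# Fit a discriminant function and inspect which attributes drive the separation.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print("discriminant coefficients:", lda.coef_)
print("predicted class for a new record:", lda.predict([[3, 0.85, 5.0]]))
```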
3.6. Knowledge inference

Finally, one of the most relevant issues in the overall scenario of collection, processing and use of data is the problem of helping the user, that is the decision maker, in his queries and of letting him interact with the data bank. It is important that the decision maker obtain information which is valuable to him: traditional output from data bases represents primarily selected data rather than information; it is vital to use data bases for deriving or inferring facts about the domain we consider. Developments in Artificial Intelligence (AI) over the past two decades provide tools which can be of value in increasing the responsiveness of data base systems to their users' needs [12,13]. Several applications of AI techniques may be thought of in order to exploit optimally the large amount of information stored in data bases; for instance, an "intelligent" interface could be designed [14], capable of understanding natural language and of carrying out the job of an intermediary (the professional who helps and assists users in the use of a data bank, correctly formulating their information needs, accessing the data bank and evaluating results), and of being an expert on the specific bank (logical structure, content, query language(s), navigation procedures, etc.) and on the package of available application programs. Further developments may also call for the implementation of learning capabilities which will enable the
system to assimilate and generalize into rules of its own those inference steps employed by the user in his queries, thus creating new knowledge. We think that by the application of these techniques the user will be much better assured of an adequate exploitation of the information content of the data, for the generation of new safety issues or the corroboration of previous hypotheses.
4. Conclusions
In trying to draw some conclusions from the above discussion, we have first of all to recognize that we are living in a world dominated by information: its explosion in the area of nuclear technology has, in reality, taken place through the multiplication of organized data banks, of plant information systems and of information networks. The danger of this situation lies in the difficulty of creating a correct sense of the "history" which lies behind the multiplicity of events and, therefore, of recognizing the "novelty" whenever it happens. How, therefore, to construct a "history" from the information we obtain, and how to bring all the information to unity? The interaction of the models and the information, if correctly structured, is the answer to these questions: in this framework we can then better understand the efforts which have only now started, but which deserve the greatest support if we really want to give a meaning to the reality around us.

Starting from the simplest models of component reliability parameter estimation (constant failure rate), we have now moved to testing time-dependent models against the data banks and hence more complex component failure models and dependency structures of failure events. "Precursor" studies try to discover in the banks occurrences which may coincide with the rather simple accident development models accepted, up to now, in risk studies. The need for more realistic, comprehensive and, therefore, more complex accident scenario models will indeed require a much more accurate analysis of the data bank content: the combinatorial explosion of possible accident paths will have to be correctly guided and bounded by the relevance of the reactor experience accumulated in the banks.

Recent experiments have been performed at JRC-Ispra in which accident sequences generated by a System Response Analyser were confronted with the information stored in the Abnormal Occurrences Reporting System (AORS); these have given very encouraging results: not only could the various accident paths be weighted
against past occurrence frequencies, but deficiencies in the accident modelling were also discovered, thus enabling the previous accident analysis to be improved. Certainly the implementation of these investigation procedures is not simple and often exceeds the capability of the analyst: Artificial Intelligence tools will have to be developed which try to generalize the inference steps of the analyst and to simulate the heuristics of the researcher in testing his belief statements and in acquiring new knowledge.
5. References

[1] EEC Directive on Major Accident Risks, Official Journal of the European Communities, 24 June 1982.
[2] G. Mancini et al., The European Reliability Data System: An organized information exchange on the operation of European nuclear reactors, Int. Conf. on Nuclear Power Experience, IAEA, Vienna, September 1982.
[3] J. Amesz, H.W. Kalfsbeek et al., Status of the IRS data bank, OECD, NEA Report SINDOC 84-202, September 1984.
[4] J. Amesz et al., The European Abnormal Occurrences Reporting System - AORS, 4th EuReDatA Conf., Venice, March 1983.
[5] J.L. Parris et al., A standardized approach to unique identification for power systems, structures and components, 4th EuReDatA Conf., Venice, March 1983.
[6] US Nuclear Regulatory Commission, Reactor safety study - An assessment of accident risks in US nuclear power plants, WASH-1400, NUREG-75/014, 1975.
[7] IEEE Guide to the collection and presentation of electrical, electronic and sensing component reliability data for nuclear power generation stations, IEEE Std 500, 1977.
[8] G. Apostolakis, Bayesian methods in risk assessment, in: Advances in Nuclear Science and Technology, Vol. 13 (Plenum Publishing Corporation, New York, 1981).
[9] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications (Academic Press, New York, 1980).
[10] H. Prade and C. Testemale, Representation of soft constraints and fuzzy attribute values by means of possibility distributions in data bases, 1st Int. Conf. on Fuzzy Information, Kauai, Hawaiian Islands, July 1984.
[11] R.S. Sayles, The use of discriminant function techniques in reliability assessments and data classification, Reliability Engrg. 6 (1983).
[12] KBMS: Knowledge based management system project, Stanford University report, 1984.
[13] R.L. Blum, Discovery, confirmation and incorporation of causal relationships from a large time-oriented clinical database: The RX project, Computers and Biomedical Research 12 (1983).
[14] F. Bastin, S. Capobianchi et al., An intelligent interface for accessing a technical data base, 2nd IFAC Conf. on Analysis, Design and Evaluation of Man-Machine Systems, Varese, Italy, September 1985.