An auditing methodology for safety management of the Greek process industry


Reliability Engineering and System Safety 60 (1998) 185-197

PII: S0951-8320(97)00148-8

© 1998 Elsevier Science Limited. All rights reserved. Printed in Northern Ireland. 0951-8320/98/$19.00

Zoe S. Nivolianitou & Ioannis A. Papazoglou

System Reliability and Industrial Safety Laboratory, Institute of Nuclear Technology-Radiation Protection, National Center for Scientific Research 'Demokritos', Aghia Paraskevi 15310, Greece

(Received 24 February 1997; accepted 4 October 1997)

Process Risk Management Audit (PRIMA), a methodology for assessing the impact of a plant's management system on safety, is outlined. The methodology has been applied to two plants in Greece within the framework of a cross-national application, and results concerning both the methodology and the Greek plants are reported and discussed. © 1998 Elsevier Science Limited.

1 INTRODUCTION AND HISTORICAL REVIEW

It is now widely recognized that any Major Hazard (MH) process plant (nuclear power plant or chemical site) involves a complex interaction of human elements, hardware or technical elements, and the environment or climate within which it operates. Such a system is commonly termed a 'socio-technical' system. The socio-technical approach emphasizes the individual, social, organizational and management aspects which affect human behavior and ultimately influence system performance, in addition to climate factors such as regulations and economic pressures. This interaction can be revealed by auditing the Safety Management System within the plant.

Safety auditing of dangerous chemical sites, aimed at identifying areas of vulnerability and the corresponding hazards, has been an early concern in the chemical industry, as witnessed by the Safety Audits Manual of the BCISC (British Chemical Industry Safety Council) dating back to 1973, the Hazard Survey of Chemical and Allied Industries by Fouler and Spiegelman in 1968, sponsored by the American Insurance Association, and Dow's Process Safety Guide, available from the AIChE (American Institute of Chemical Engineers), also in 1973 1.

Two issues underlie the identification of hazards in process plants: 'hardware' reliability and 'human' reliability. These issues are not mutually exclusive but rather interdependent, the main influence being from the 'human' part towards the 'hardware' part. Initially, the need for assessing 'hardware' reliability led to the development of powerful Quantified Risk Assessment (QRA) techniques generating risk indices that characterize the safety level of an industrial site. As they continued to be developed, QRA techniques considered, in an increasingly broadened scope, the elements of hardware and human performance. According to Bley et al. 2, human performance is a broad heading comprising both modeling of human reliability and modeling of what might underlie human reliability, i.e., the organizational factors at a facility.

Several attempts to model human reliability were made during the 80s. We refer indicatively to the Swain-Guttmann 3 Handbook on Human Reliability Analysis in 1987, to the Bello and Colombari 4 TESEO Operator Model in 1980, to Embrey et al. 5 with the SLIM-MAUD approach, and to the model of Human Cognitive Reliability by Hannaman, Spurgin et al. 6. These approaches, however, treat human failures as events directly belonging to the sequence of events constituting accidents. Furthermore, the analysis stops at the individual who directly causes the failure.

Hurst et al. 7-11, among others, claim in a series of articles that in a large number of accidents (loss of containment) or near misses, human actions integrated in organization and management factors were the underlying causes of accident initiators and/or subsequent failures. Jacobs et al. 12 state that safety statistics frequently attribute 50% or more of the causes of industrial accidents to human errors. Upon closer examination these 'human errors' may be of various origins and part of larger organizational processes that encourage unsafe acts which ultimately produce system failures. The same authors underline that the management role is central to the safe functioning of plants. This is particularly true in the process industry (nuclear power plants and chemical plants) where, in contrast to the manufacturing industry, reactions are not always so easily managed or controlled owing to large amounts of stored energy (chemical, nuclear, pressure or temperature). Along the same line, Llory 13 claims that human factors are at the core of the real everyday problem of safety in high-risk facilities.

According to Bellamy et al. 14, the interest in the influence of organization and management on plant safety became evident in the UK in the early 80s, when in 1983 the UK Atomic Energy Authority commissioned Bellamy to investigate whether any data existed to quantify the effects of these influences on human reliability. In the 80s safety culture became an emerging issue internationally, leading the International Nuclear Safety Advisory Group (INSAG) 15 to define it "as that assembly of characteristics and attitudes in organization and individuals which establishes that, as overriding priority, plant safety issues receive the attention warranted by their significance". This statement emphasizes that safety culture is attitudinal as well as structural, relates both to organizations and individuals, and concerns the requirement to match all safety issues with appropriate perceptions and actions. The need for this change in mentality was also strengthened by a string of catastrophes, such as Bhopal, Chernobyl, the Exxon Valdez and others, which have struck humanity in recent years. Pate-Cornell et al. 16,17 mention that, according to the investigators, the accident on the Piper Alpha offshore platform is an example of critical subsystem failure rooted in several organizational problems, including failure of communications and failure to take corrective actions given early warning signals of an impending catastrophe.
In the Challenger Orbiter explosion, the management factors that affected the reliability of a certain protective system included time pressures that had occurred in the past, liability concerns and conflict among contractors, the low work status of certain technicians among the maintenance personnel, the absence of priorities in work testing and the under-recognized coupling among subsystems.

Although this situation is well perceived, it is difficult to measure the common cause-effect of organizational factors. Bley 2 claims that it is easy to get a feeling for corporate culture, since the prevailing philosophy in a particular organization (progressive or reactive, rigid or flexible) comes through when one visits the plant and talks to operating and maintenance staff. It is much more difficult, nevertheless, to assess the quantitative impact of the safety culture and to convince the plant owner that this quantitative impact is the reasonable result of conditions at his facility. Davoudian et al. 18 claim that, in attempting to assess the impact of organizational factors on plant safety, two major tasks must be accomplished. First, models of the organization as a whole are needed, and second, models which allow for the quantification and incorporation of this impact into PSA must be devised. The same authors point out that, quite frequently, the largest obstacle to accomplishing the first task is the fact that an informal organization is superimposed upon the formal work organization. In other words, although most of the work in complex organizations is, in one way or another, standardized, individuals tend to deviate from these standards in their daily routines. This problem is more pronounced in chemical plants, which lack the formal environment of a nuclear power plant, as described by Haber et al. 19, and tend to have looser communication channels. Winsor 20, on the other hand, points to a general difficulty within a hierarchically structured industrial environment of either sending or receiving bad news, particularly when it must be passed to superiors and outsiders. Pate-Cornell et al. 16 state that an organization faces two classes of problems in situations of distributed decision-making: information problems and incentive problems.

Several interesting taxonomies have been suggested so far on how to describe this formal organization of a process plant and the various interaction loops among humans at different levels. The model by Wu et al. 21 highlights four characteristics to show the influence of management on risk. Pate-Cornell et al. 16 propose a taxonomy of human errors dividing them broadly into errors of judgment and gross errors (errors about which there is no controversy). In the specific case of an offshore platform the authors demonstrate that the percentages of each error category are 63 and 37%, respectively. They were all classified as design errors and contributed 40% to the failure probability.
Jacobs and Haber 12 describe a technique of measuring organizational factors based on 20 common dimensions, while Llory 13 defines 14 climate scales assessing, among other things, the psychological climate inside a company. Modarres et al. 22, starting from the onion model of a nuclear power reactor and its management conceived by the US Nuclear Regulatory Commission, proposed the organizational field model, which covers behavioral aspects of the organization related to the management and operation of the plant, coupled with the diamond tree, which gives the functional hierarchy of plant safety. Davoudian et al. 18 apply their Work Process Analysis Model (WPAM) to pre-accident conditions of a nuclear power plant within the framework of the maintenance work processes, with the ultimate goal of investigating the dependencies that organizational factors introduce among probabilistic safety assessment parameters. Reason 23, with his types-tokens model of accident causation, and Embrey et al. 5, with the SLIM-MAUD approach, trace the same line of accident causation through active, latent and recovery errors. Another approach is the MIMIX technique 24, which probes the management scheme of a plant from the supervisor level and up, leaving hardware reliability to QRA techniques.

Along the same lines is the socio-technical pyramid concept developed by Bellamy et al. 14,25,26, which proposes a hierarchical model of accident causation within a process plant. An evolution of this methodology is a Process Safety Management System (PSMS) audit tool. This paper presents an implementation of the PSMS audit tool called PRIMA 27 (Process Risk Management Audit) at two Greek major hazard sites (among others) as part of a project under the Commission of the European Union (EU) Environment Program 1993-94. In the remainder of the paper the two sites are referred to as company A and company B. Following a short description of the methodology in Section 2, Section 3 presents and discusses the results of the application of the prototype audit technique in the Greek process industry. Section 4 gives general comments on the auditing methodology itself, its application in the Greek industry and the tool adaptation for the Greek version.

2 THEORETICAL AND EMPIRICAL BASIS OF THE AUDIT

2.1 Overview

As mentioned above, interest in the influence of organization and management on safety began in the UK in the early 1980s. The possibility of quantifying management influence on safety was addressed in that decade and, in the mid-80s, industry started to query the application of generic failure rate data to all companies and plants, given management differences. Consideration of the major causes of accidents involving a system failure led to the development of the MANAGER audit technique 28. The decision as to which questions to include was relatively subjective, but the intention was to cover all relevant aspects of the Safety Management System (SMS). Quantification of the audit results was achieved by mapping the range of the possible audit results into a range of failure rates with lower and upper bounds half and one order of magnitude away from the generic failure rates, respectively 26.

Against this background, work was undertaken by the Health and Safety Executive of the UK (HSE) to develop an audit system with a demonstrable statistical and theoretical basis which would have the potential to quantify the quality of an SMS at a plant and link this into any risk assessment carried out. The prime objective of the audit was to provide a systematic appraisal of the Process Safety Management System (PSMS) operating at a major hazard industrial site. This results in the production of structured observations about the PSMS, risk reduction recommendations and a quantitative indicator of the quality of the PSMS. This indicator, based on the overall rating the site or plant receives, is specifically intended for use as a failure rate modifier in QRA.

Within the project, originating from a number of previous research studies and applications 14,25,26,29-31, a prototype audit technique (PRIMA) was developed 27 which:

• identified and quantified the contribution of various management influences on loss of containment incidents in pipework (921 incidents), vessels (230 incidents) and, more recently, industrial hoses/loading arms (162 incidents), using a 3-dimensional classification scheme of cause of failure;
• developed a theoretical model of the route by which management influences hardware failures using the concept of a socio-technical pyramid of causation.
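The MANAGER-style quantification mentioned in the overview (audit results mapped onto a band running from half an order of magnitude below to one order of magnitude above the generic failure rate) can be illustrated with a short sketch. The normalized score in [0, 1], the log-linear interpolation and the function names are assumptions made here for illustration only; the source does not state how the mapping between the two bounds was actually shaped.

```python
def failure_rate_modifier(score: float) -> float:
    """Map a normalized audit score to a multiplier on the generic failure rate.

    ASSUMPTION: score = 1.0 is the best possible audit result and
    score = 0.0 the worst; the interpolation between the two bounds
    is log-linear. Neither detail is given in the paper.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    best_exponent = -0.5   # half an order of magnitude below the generic rate
    worst_exponent = 1.0   # one order of magnitude above the generic rate
    exponent = worst_exponent + score * (best_exponent - worst_exponent)
    return 10.0 ** exponent


def modified_failure_rate(generic_rate: float, score: float) -> float:
    """Apply the audit-based modifier to a generic failure rate."""
    return generic_rate * failure_rate_modifier(score)
```

Under these assumptions a score of 2/3 leaves the generic failure rate unchanged, because the band is asymmetric in favour of penalizing weak management.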

2.2 The classification scheme of Loss of Containment (LOC) incidents

Incidents were classified according to:

• direct or immediate cause of failure;
• origin of failure or underlying cause; and
• recovery or preventive mechanism failure.

Examples of classes of direct causes are corrosion, overpressure and operator error.

The underlying causes or origins of failure used are:

• Design (DES);
• Construction/Installation (CON);
• Operations during normal activities (OP);
• Maintenance activities (MAINT);
• Manufacture/Assembly (MANU);
• Natural Causes (NAT);
• Domino (DOM);
• Sabotage (SAB);
• Unknown Origin (UO).

The first four causes, i.e., design, construction, normal operation and maintenance, are the most important in terms of numbers of component failures. The audit is, therefore, primarily structured around these areas. The recovery or preventive mechanism is the mechanism that theoretically could have recovered or prevented the failure. The categories used are:

• Appropriate hazard study of design or as-built, e.g., HAZOP (HAZ);
• Human Factors review (HF);
• Task driven recovery activities, i.e., checking, testing and correction of completed tasks (CHEC);
• Routine, regular, recovery activities, i.e., routine inspections and tests, process sampling, safety assessments (ROUT);
• Not Recoverable (NR);
• Unknown Recovery (UR).
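The three-dimensional classification above can be represented as a small data structure in which the pair (origin, recovery mechanism) forms a code such as DES/HAZ. The following is a minimal sketch; the class and function names are hypothetical, and the sample incidents are invented purely to exercise the code (they are not taken from the analysed incident data).

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    DES = "Design"
    CON = "Construction/Installation"
    OP = "Operations during normal activities"
    MAINT = "Maintenance activities"
    MANU = "Manufacture/Assembly"
    NAT = "Natural Causes"
    DOM = "Domino"
    SAB = "Sabotage"
    UO = "Unknown Origin"


class Recovery(Enum):
    HAZ = "Appropriate hazard study of design or as-built"
    HF = "Human Factors review"
    CHEC = "Task driven recovery activities"
    ROUT = "Routine, regular, recovery activities"
    NR = "Not Recoverable"
    UR = "Unknown Recovery"


@dataclass(frozen=True)
class Incident:
    direct_cause: str   # e.g. corrosion, overpressure, operator error
    origin: Origin
    recovery: Recovery

    @property
    def audit_area(self) -> str:
        """Origin/recovery combination, e.g. 'DES/HAZ'."""
        return f"{self.origin.name}/{self.recovery.name}"


def tally_by_area(incidents):
    """Count incidents per origin/recovery combination."""
    return Counter(incident.audit_area for incident in incidents)


# Invented incidents, for illustration only.
sample = [
    Incident("corrosion", Origin.MAINT, Recovery.ROUT),
    Incident("overpressure", Origin.DES, Recovery.HAZ),
    Incident("operator error", Origin.OP, Recovery.HF),
    Incident("corrosion", Origin.MAINT, Recovery.ROUT),
]
counts = tally_by_area(sample)
```

Keeping the direct cause as free text while fixing the other two dimensions as enumerations mirrors the text, which lists the origins and recovery mechanisms exhaustively but gives only examples of direct causes.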

The combinations of the underlying causes of failure and of the recovery mechanism failures define specific ways in which loss of containment accidents can come about and suggest areas that can be influenced by the SMS. Each of the 1313 analysed incidents was attributed to one of these combinations, and the results identified the eight most significant combinations, given in Table 1 29 and described below.

Table 1. Audit areas of concern

Audit area   Description                                           Pipework (%)   Vessels (%)   Hoses (%)
DES/HAZ      Hazard review of design                               25             29            19
MAINT/HF     Human factors of maintenance                          15             6             0
MAINT/CHEC   Task checking and supervision of maintenance          13             4             2
MAINT/ROUT   Routine inspection, testing and maintenance           10             11            15
OP/HF        Human factors of normal operations                    11             24            35
CON/CHEC     Task checking and supervision of construction work    8              2             6
OP/HAZ       Hazard review of normal operations                    0              5             9
OP/CHEC      Task checking and supervision of normal operations    2              2             10

DES/HAZ (Hazard review of design): this area concerns the development and implementation of safety engineering codes, standards and procedures for the design of new and modified plant.

MAINT/HF (Human factors of maintenance): this area concerns the training, supervision, procedural support etc. provided for the persons involved in the performance of maintenance and inspection work, such as fitters and inspectors.

MAINT/CHEC and MAINT/ROUT (Task checking and supervision of maintenance): this area addresses the conduct of routine inspection and maintenance of plant both during and between plant shutdowns.

OP/HF (Human factors of normal operations): this area concerns the assessment and management of the potential for human error within plant operations.

CON/CHEC (Task checking and supervision of construction work): this area relates to the inspection and testing of constructed plant prior to commissioning.

OP/HAZ (Hazard reviews of normal operations): this area concerns the development and implementation of technically safe operating procedures through the analysis of process and operating hazards.

OP/CHEC (Task checking and supervision of normal operations): this area relates to the checking by supervisors, foremen and other operators of work carried out by operators and other operational personnel, such as process control, valve opening/closure, purging, inerting etc.

2.3 The socio-technical pyramid of accident causation and the audit themes

The socio-technical model of accident causation was developed from the analysis of accidents such as Three Mile Island, Flixborough etc. The model is represented as a socio-technical pyramid (see Fig. 1), with the causation of an accident being traced back through increasingly remote levels of abstraction. The events leading up to 'Operator Reliability' (Level 2), referring to the accident initiation (gray area), and 'Engineering Reliability' (Level 1) are not addressed by the PRIMA methodology but are rather examined within the context of a QRA. Levels 2-5 are used to structure the audit question set and to ensure that all causes of accidents, distant and immediate, are addressed.

Complementing the socio-technical pyramid is the concept of the management control loop, which is central to the audit rationale. This loop is based on the socio-technical pyramid and defines all of the elements and links that should be in place for a complete process management system. An illustration of the control loop is shown in Fig. 2, where each box represents a transition between socio-technical levels. This loop covers PSMS design, implementation, monitoring and revision, and includes consideration of the levels of the organizational hierarchy which can affect process safety. According to the methodology, this management control loop ought to exist in each of the eight areas of concern mentioned above (see Table 1). In order to test the 'integrity' of the control loop, a set of questions has been developed covering all eight areas of concern. Furthermore, the audit question set for each area of concern is structured according to four job-related themes as follows:

Theme A: procedures and processes to do the job (refers to those procedures and processes which have been established to carry out the functions the organization needs to achieve its operational objectives).

Theme B: standards for the job (refers to the means by which the appropriate setting of safety standards is checked and improved as necessary).

Theme C: do other pressures interfere with the job? (refers to the schedule, economic, operational, output and cost pressures which may impair the execution of safety functions and the barriers in place to counter these pressures).

Theme D: are there adequate resources for the job? (refers to all the measures and resources which an organization may utilize to fulfil its functions and maintain levels of performance).

On the basis of the answers to the questionnaire the auditor assesses each of the eight areas of concern as 'good', 'average' or 'poor'. The judgment is based on the degree of completeness of the management control loop. A management system in a particular area of concern is 'good' if all elements of the control loop (indicated by arrows in Fig. 2) are present or if at most the feedback at level 4 is missing. 'Average' characterizes an SMS that contains the feedback element from level 2 to level 3. Finally, 'poor' characterizes an SMS that lacks at least the feedback elements from level 2 to level 3, from level 3 to level 4 and at level 4. Criteria for assessing the existence or not of elements of the control loop have been developed. The use of these three grades is not an indication of a comparison with what the industrial standard is considered to be, but solely a comparison with the control loop itself. It may be the case that average correspondence with the control loop is well above the industrial standard in most cases 29.
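The grading rule just described can be written down as a short decision function. This is only a sketch: the identifiers for the feedback elements are hypothetical labels introduced here, the detailed PRIMA criteria for deciding whether an element exists are not reproduced in the text, and cases the text leaves open (for instance a missing 2 → 3 feedback with all other elements present) are assumed here to fall under 'poor'.

```python
from typing import Iterable

# Hypothetical labels for the feedback elements named in the text.
FB_2_TO_3 = "2->3"   # feedback from level 2 to level 3
FB_3_TO_4 = "3->4"   # feedback from level 3 to level 4
FB_AT_4 = "4"        # feedback at level 4


def grade_control_loop(missing: Iterable[str]) -> str:
    """Grade one audit area from the set of missing feedback elements.

    good    : nothing missing, or at most the feedback at level 4
    average : not 'good', but the 2->3 feedback is present
    poor    : the 2->3 feedback is missing (ASSUMPTION: regardless of
              which other elements are missing as well)
    """
    missing_set = set(missing)
    if missing_set <= {FB_AT_4}:
        return "good"
    if FB_2_TO_3 not in missing_set:
        return "average"
    return "poor"
```

For example, an area in which only the 3 → 4 feedback is missing grades as 'average' under this rule, while one lacking both the 2 → 3 and 3 → 4 feedback grades as 'poor'.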

3 APPLICATION OF THE PROTOTYPE AUDIT TECHNIQUE IN THE GREEK PROCESS INDUSTRY

In order to test the developed auditing methodology and to build on the above work, the method has been applied to a number of hazardous process industries in various European countries, including Greece, comparing the results and considering the implications for the modification of QRA results and for land-use planning. In this paper only the first part of the application is presented, while the modification of the QRA results is discussed in Ref. 32.

A significant amount of information on the organization of the companies, the management system and the processes was collected through pre-audit visits. Next, on the basis of this information, a list of persons representative of the organogram of each company was generated in collaboration with the management, representing vertical and horizontal slices of the personnel, starting from upper management members (general manager, all technical directors) and going through supervisors and foremen down to operators and fitters. The selected personnel were interviewed by a team of analysts experienced mainly in process and risk analysis of chemical plants and, to a lesser degree, in human factor analysis. The team itself had been trained in PRIMA by its developers. Analysis of the responses to the question set led to the following judgment of the safety management system of each site.

Fig. 1. Socio-technical pyramid (Level 1: Engineering Reliability; Level 2: Operator Reliability; Level 3: Communications, Control and Feedback; Level 4: Organisation and Management; Level 5: System Climate).

3.1 Company A

3.1.1 Results of the audit at company A

The general survey of the SMS of company A (ammonia storage facility of a fertilizer company) gave the analysts the impression that a general pro-safety attitude already exists and is constantly under development. The department of Safety and Quality Control is adequately staffed with competent personnel and takes care of overall safety practice. Issues such as ISO 9000 certification are being addressed. A short description of each audit area follows. For the sake of simplicity, areas with a certain affinity are examined together although their rating may vary, as is explained in the following.

DES/HAZ and CON/CHEC. In the design and construction of new projects the following positive points were underlined:

• company A has significant experience in dealing with design problems of ammonia installations;
• during the design stage of new projects safety is an active issue;
• there is a set of written procedures for both preliminary and final acceptance of a new construction.

Certain areas, however, need consideration, like:

• the systematic hazard assessment of a new project with well-known risk analysis techniques, e.g., HAZOP, fault tree analysis;
• the standards and policies for assessing the effectiveness of existing control procedures in the acceptance (preliminary and final) of a new construction;
• additional training in matters related to HAZOP analysis and works management of engineers participating in new projects.

The first of the two areas (DES/HAZ) is judged as 'average' (see Fig. 3), as the criteria for the 'poor' PSMS upper boundary have already been surpassed whilst those for the 'good' PSMS lower boundary were not attained. The transition from level 2 to 3 is identifiable but weak. There is some reviewing of the systems for design hazard review but such an activity is not systematic. There is very little evidence that policy on hazard review of design is systematically revised based on review and monitoring activities (3 → 4 missing). The second area (CON/CHEC) is judged to have 'poor' correspondence with the ideal management control loop, since transitions which cover the implementation of the management system for assessing and controlling the checking of construction work are not in place at all (2 → 3 missing) and the revision of policies seems weak (3 → 4 missing).

OP/HF, OP/HAZ and OP/CHEC. The company claims to operate at a high safety level because it has a specialized Safety and Quality Control Department. However, there is no systematic consideration of human factors in this approach. In particular:

• there are no written job descriptions and procedures;
• the training of the employees in safety-related matters (under the auspices of the Safety and Quality Control Department) is not very systematic or intensive (for instance, it was noted that certain employees do not follow the training courses systematically owing to heavy work load);
• the operators did not follow special training regarding the risks to which they are exposed during their work in certain processes;
• the movement of large vehicles and machinery in the operations area is not controlled in a systematic way.

Fig. 2. Ideal management control loop. Levels shown: Level 2 (operator reliability), Level 3 (communications, control and feedback), Level 4 (organisation and management) and Level 5 (system climate). Elements of the loop include: personnel and quality of job support (tools, interface etc.); implementation through formalization of selection, training, procedures, communications and conflict resolution; monitoring and assessment of effectiveness of the control system (including incident/near miss data analysis); setting of policies, standards, organization, responsibilities and resources, and their revision; regulations, guidance and industry norms, and their revision.

Fig. 3. Management control loops of company A at the eight audit areas.

According to Fig. 3 the OP/HF area is judged to have 'average' correspondence with the ideal management control loop. The criteria required to reach the 'good' PSMS lower boundary are not met in all transitions, although the implementation of the management system for assessing and controlling the potential for human error associated with operations (2 → 3) is relatively well developed. However, the other side of the loop, concerned with the assessment and review of the system, is lacking, at least in transition 3 → 4. The other two areas, OP/HAZ and OP/CHEC, are judged to have 'poor' correspondence with the ideal management control loop. The elements concerning the implementation of the management system for assessing and controlling hazard reviews associated with operations on the one hand, and for the checking associated with operations on the other, are more or less there; however, the transitions concerning the assessment and review of the system are missing (2 → 3 and 3 → 4 missing).

MAINT/HF and MAINT/CHEC. The company has significant experience in maintenance matters owing to its uninterrupted running for more than 30 years. The application of safety during maintenance is supervised and controlled by the Safety and Quality Control Department. The 'permit to work' system is normally practiced. Some points to highlight, though, are the following:

• there are no explicit instructions for maintenance execution;
• there is no stated policy for labelling the process areas under maintenance or for putting control labels in areas where maintenance is ongoing;
• there are no procedures for the verification and control that maintenance has been successfully executed.

The first area, MAINT/HF, is judged to have 'average' correspondence with the ideal management control loop. The criteria set for a 'good' PSMS are not attained in this area. Regarding the implementation of the management system for assessing and controlling the potential for human error associated with maintenance (see Fig. 3), some transition links are in place; others, however, concerned with the assessment and review of the system, seem rather weak or missing, like the policy reviewing, i.e., transition 3 → 4. The second area, MAINT/CHEC, is in 'poor' correspondence with the ideal management control loop. In general, there is a system in place for the checking of maintenance work in the company, which is mainly based on the permits to work (PTWs) and on effective personnel communication. However, there is no policy for checking and reviewing the efficiency of this system, as can be seen in Fig. 3, since transitions 2 → 3 and 3 → 4 are simply not there.

MAINT/ROUT. Despite the significant experience accumulated within the company owing to long operating periods, a point could be raised for the following item:

• it is difficult to check the effectiveness of strategies and standards for controlling successful maintenance owing to the lack of safety-related indicators, e.g., Mean Time to Failure.

The judgment of 'poor' also fits the area MAINT/ROUT, as can be seen in Fig. 3. Most transitions of the loop are loose (like the control mechanisms verifying that inspection and routine maintenance of equipment have been successfully performed) or work on the basis of the personal experience of the employees, without systematic checking and revision of this practice. Transitions 2 → 3 and 3 → 4 are missing in the loop.

3.1.2 Summary of the observations for the safety management system of company A

In accordance with the above judgments, most of the areas of the PSMS have been judged to have 'poor' correspondence with the ideal control loop of Fig. 2. Only the areas DES/HAZ (Hazard review in the design stage), MAINT/HF (Human factors in maintenance) and OP/HF (Human factors during normal operations) have been judged to have 'average' correspondence with the ideal control loop, as depicted in the management control loops of company A at the eight audit areas presented in Fig. 3.
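The summary above can be reproduced by collecting the eight judgments reported in Section 3.1.1 and counting them per grade. The sketch below uses only the grades stated in the text; the variable and function names are hypothetical.

```python
from collections import Counter

# Judgments for company A as reported in Section 3.1.1.
company_a = {
    "DES/HAZ": "average",
    "CON/CHEC": "poor",
    "OP/HF": "average",
    "OP/HAZ": "poor",
    "OP/CHEC": "poor",
    "MAINT/HF": "average",
    "MAINT/CHEC": "poor",
    "MAINT/ROUT": "poor",
}


def summarize(judgments):
    """Return the number of audit areas per grade."""
    return Counter(judgments.values())


def areas_with(judgments, grade):
    """Return the audit areas that received the given grade, sorted by code."""
    return sorted(area for area, g in judgments.items() if g == grade)


summary = summarize(company_a)
average_areas = areas_with(company_a, "average")
```

The tally gives five 'poor' and three 'average' areas, with no area graded 'good', in line with the statement that most areas correspond poorly to the ideal control loop.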

3.2 Company B

3.2.1 Results of the audit at company B

The general survey of the SMS of company B (a refinery) gave a better impression than the first case study. Both the management of the company and the personnel at all levels are very aware of the high potential risks that the operation of their company entails. The Safety Engineer of the plant has corporate oversight of safety practice at workplaces, while the Quality Assurance Manager checks all job descriptions, procedures and documents in the light of the constraints that ISO 9000 certification imposes on the factory. In more detail, the following comments can be made.

DES/HAZ and CON/CHEC. In the design and construction of new projects the following positive points were observed:

• the refinery has significant experience in dealing with design problems related to hydrocarbon processes;
• during the design stage of new projects safety is always taken into consideration;
• there are clear procedures for the acceptance (both preliminary and final) of a new construction;
• for every new project commissioned outside the company a HAZOP analysis is required;
• new projects developed within the company are reviewed from a safety point of view (HAZOP) only if they address safety enhancement;
• there is an explicit policy in the factory for the use of standards during the assessment of new works;
• external constructors are obliged to comply with the safety requirements of the company.

However, special training in matters related to HAZOP analysis and new works management is limited for engineers participating in new projects.

Both areas, DES/HAZ and CON/CHEC, are judged to have 'average' correspondence with the ideal management control loop, as can be seen in Fig. 4. Neither of them reaches the lower boundary of the 'good' PSMS. In the first case the transition from level 2 to level 3 is identifiable but weak. There is some review of the systems for design hazard review, but such activity is not systematic and there is very little evidence that policy on hazard review of design is systematically revised on the basis of review and monitoring activities (3 → 4 missing). Along the same lines, in the second case the transition covering the implementation of the management system for assessing and controlling the checking of construction works is in place and well documented, but it is not evident how the monitoring and assessment of effectiveness are achieved or how policies are revised (3 → 4 missing), as these rely on the experience of the contractors.

OP/HF, OP/HAZ and OP/CHEC. The company has a stated policy regarding the preservation of a high safety level. The Safety Engineer as well as the Quality Assurance Manager are responsible for the following:

• accidents are analysed to assess their causes, including possible human errors;
• the training of the employees in safety-related matters is systematic as far as fire protection is concerned but less intensive as far as process-related risks are concerned;
• the operators follow some special training regarding the risks to which they are exposed during their work in certain processes and always have written job descriptions at hand.

The OP/HF area has 'good' correspondence with the ideal PSMS loop requirements. Transitions are in place to ensure that the implementation of the management system for assessing and controlling the potential for human error associated with operations is well developed. The other side of the loop, concerned with the assessment and review of the system, is also in place within the framework of the ISO 9000 certification and the accident analysis system.

The OP/HAZ area is judged to have 'poor' correspondence with the ideal management control loop, as the transitions on the left-hand side of the loop, concerned with the implementation of the management system for assessing and controlling hazard reviews associated with operations, are not applied in a systematic way, and the transitions on the other side of the loop, concerned with the assessment and review of the system, are not there, as can be seen in Fig. 4.


Lastly, the OP/CHEC area is judged to have 'average' correspondence with the ideal management control loop. The criteria for the 'good' PSMS lower boundary are almost met, apart from transition 3 → 4, which seems to be absent.

MAINT/HF and MAINT/CHEC. The company has significant experience in maintenance matters owing to its uninterrupted operation for more than 20 years. The application of safety during maintenance is supervised and controlled by the Safety Engineer:

• the 'permit to work' system is normally practiced;
• there are written instructions for maintenance performance.

However, there is no stated policy for signs and control labels in process areas where maintenance is ongoing.

The MAINT/HF area just crosses the 'good' lower boundary in comparison with the ideal management control loop, as can be seen in the appropriate loop of Fig. 4. All links between the various transition levels are there, with greater or lesser consistency, and transition 3 → 4 is also partially covered.

The MAINT/CHEC area is judged to have 'average' correspondence with the ideal management control loop. In general there is a system in place for the checking of maintenance work in the company, which is mainly based on the PTWs and on effective communication between operations and maintenance personnel. The existence of written procedures restricts to a great extent the possibility of omitting the performance or checking stage. The auditors felt that most of the transition links are in place and well documented, apart from transition 3 → 4 (reviewing of the checking procedure), which brings this area just below the 'good' lower boundary of the PSMS, as can be seen in Fig. 4.

MAINT/ROUT. As has already been mentioned, the company has accumulated significant experience owing to its long operating period.

• The effectiveness of strategies and standards for controlling successful maintenance can be checked on the basis of this experience.

The MAINT/ROUT area is also judged to have 'average' correspondence with the management control loop, as can be seen in Fig. 4. Most transitions of the loop are there (like the control loop showing that inspection and routine maintenance of equipment have been successfully performed) and perform in a more or less systematic way. However, there is no clear evidence that transition 3 → 4 is there, thus bringing the whole system below the 'good' lower boundary level.

[Figure: management control loop panels, one per audit area, each labelled with its judgement]

Fig. 4. Management control loops of company B at the eight audit areas.

3.2.2 Summary of the observations for the safety management system of company B

According to the conclusions presented above, most of the areas of the PSM have been judged to have 'average' correspondence with the ideal control loop, although some of them were just below the lower limit of the 'good' PSM loop definition. The areas MAINT/HF and OP/HF (i.e., human factors in maintenance and operations) were judged as 'good', while OP/HAZ (hazard identification in operations) was judged as 'poor' compared to the ideal PSMS control loop. These judgments are depicted in the management control loops of company B for the eight audit areas presented in Fig. 4.
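As an illustration only, the judgments summarised in Sections 3.1.2 and 3.2.2 can be tallied in a few lines of Python. The dictionary layout and the names `AREAS`, `company_a`, `company_b` and `tally` are assumptions made for this sketch and are not part of the PRIMA tool; the judgments themselves are those stated in the text.

```python
from collections import Counter

# The eight audit areas named in the text.
AREAS = [
    "DES/HAZ", "CON/CHEC",
    "OP/HF", "OP/HAZ", "OP/CHEC",
    "MAINT/HF", "MAINT/CHEC", "MAINT/ROUT",
]

# Company A (Section 3.1.2): only three areas were judged 'average',
# all remaining areas were judged 'poor'.
company_a = {area: "poor" for area in AREAS}
for area in ("DES/HAZ", "MAINT/HF", "OP/HF"):
    company_a[area] = "average"

# Company B (Section 3.2.2): two areas 'good', one 'poor',
# the remaining areas 'average'.
company_b = {area: "average" for area in AREAS}
company_b["MAINT/HF"] = "good"
company_b["OP/HF"] = "good"
company_b["OP/HAZ"] = "poor"


def tally(judgments):
    """Count how many audit areas received each judgment."""
    return Counter(judgments.values())


if __name__ == "__main__":
    for name, judgments in (("A", company_a), ("B", company_b)):
        print(name, dict(tally(judgments)))
```

Such a tally carries no more information than the control loops of Figs 3 and 4; it simply makes the difference between the two sites easy to read at a glance.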

4 CONCLUSIONS REACHED FROM THE APPLICATION OF THE PRIMA AUDIT TOOL

4.1 General comments on the methodology

A number of general conclusions concerning the applicability of the PRIMA audit tool were reached from its application in the two Greek plants.

• The tool is usable by people with general process and safety engineering knowledge after rather short training. The team that performed the two audits mentioned in this paper consisted of two researchers (with engineering training) whose expertise in risk assessment ranged from little to 8 years of experience. The training consisted of a 2-week workshop: 1 week of oral presentations and 1 week of participation as observers during an actual application of the audit.

• The audit as a whole is based on fundamental management and safety principles and presents no major difficulty for countries of different cultural background. At least no such problem has been identified in the application of the tool in four different European countries. An exception to this general statement has to do with the translation of the questionnaire, as explained later.

• As also mentioned in Ref. 33, the audit tool appears to be useful in the implementation of the proposed new SEVESO directive, which explicitly asks for consideration of management systems and for corresponding inspection requirements in assessing the safety of chemical sites in the European Union.

• In the opinion of the authors, however, some further simplification might be in order prior to widespread application. At the beginning of the project the questionnaire contained 640 questions, which during the implementation in this project were reduced to 570 by omitting redundant questions or questions not applicable to areas of concern or work themes of certain management levels. Further reduction was achieved through the implementation of the audit in the two industries, mainly owing to the coincidence of several levels of management (as perceived in the ideal PSMS) into single levels. In the two audits reported here these 570 questions were reduced to a set of 150 main questions (Ref. 34). For example, all questions investigating the 'system climate' were addressed to the general manager and/or to the head of each investigated area, e.g., maintenance or human factors. Since the general manager was the same for all audit areas and the maintenance manager was the same for three audit areas (MAINT/HF, MAINT/ROUT, MAINT/CHEC), the relevant questions were asked once, with minor clarifications regarding the specific areas. The same happened at lower levels of the organization, concerning foremen or asset managers, thus reducing the number of audit questions actually asked by roughly a factor of 4 in the two Greek audit experiences.

• There is still a group of questions that are repeated almost verbatim in each area of concern. This could be avoided in closely related areas, for instance MAINT/ROUT, MAINT/CHEC and MAINT/HF. Elimination of these redundancies is important for both the auditor and the interviewees. First, the auditor should not be obliged to repeat nearly identical questions; secondly, the interviewees should not be faced with long and dull interviews. Fundamental principles like 'application of the permit to work system in all phases of maintenance' should be investigated with the use of a few general questions addressing all three areas.

• The complexity of the questions should correspond to the level at which they are asked. For example, if the issue to be investigated is employee motivation to pursue safety, questions like "Are staff aware of the system of rewards and sanctions relating to safety matters?" may be suitable for the general manager, but difficult to understand for a plant operator. Similar remarks can be made about the degree of directness of certain questions.
• At the beginning of the project the auditors faced similar difficulties in trying to understand the definition of each anchoring point in the control loop (i.e., the standards for assessing 'good', 'average' or 'poor' performance). This, however, was improved to some extent in the second version of the audit tool.

• In the application of the audit methodology in Greek industry it was taken for granted that the proposed PSMS defined by the control and monitor loop of Fig. 2 is the 'ideal' one and that every comparison has to be made against this system. This ideal PSMS is in turn based on a number of fundamental safety and management principles. The proposed specific manifestations of these principles are, however, designed for rather large companies with significant personnel turnover. Any lack of correspondence between the analysed management schemes and the ideal is considered to be a deficiency. The importance of such deficiencies, nevertheless, might vary with the size of the company and the average time of job occupancy. Some of the required elements of the ideal scheme might be of lesser importance in companies of small size and with employees associated almost for life with a particular type of job in the company (like those analysed in Greece).
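The reduction in the question set described above can be checked with simple arithmetic. The following Python sketch uses only the counts quoted in the text (640, 570 and 150); the variable and function names are illustrative assumptions and do not come from the audit tool.

```python
def reduction_factor(before, after):
    """Ratio by which a question set shrinks from `before` to `after`."""
    return before / after


# Question counts quoted in the text.
INITIAL_QUESTIONS = 640      # questionnaire at the start of the project
AFTER_PRUNING = 570          # after omitting redundant or inapplicable questions
ASKED_IN_GREEK_AUDITS = 150  # main questions actually asked in the two audits

# Questions dropped during the first reduction step.
removed_in_first_step = INITIAL_QUESTIONS - AFTER_PRUNING

# Reduction achieved by merging coinciding management levels.
factor = reduction_factor(AFTER_PRUNING, ASKED_IN_GREEK_AUDITS)

if __name__ == "__main__":
    print("Removed in first step:", removed_in_first_step)
    print("Reduction factor in the Greek audits:", round(factor, 1))
```

The ratio 570/150 is 3.8, which is consistent with the reduction 'by roughly a factor of 4' reported for the two Greek audits.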

4.2 General comments on the application in the Greek industry

A number of insights into Greek industry that were gained through the application of the PRIMA audit tool are discussed below.

• The overall impression of the audit at the two sites is that areas requiring significant improvement in their SMS have been identified. Substantial risk reduction can be achieved if these improvements are implemented, as was stated in the reports to the management in each case.

• Another basic conclusion from both audits is that the deficiencies identified are mainly located in the 'feedback links' of the proposed ideal loop. Usually policies do exist and are implemented, but the mechanisms for monitoring their effectiveness are missing. This is a more general problem in the industrial domain, where production pressures are significant, making the assessment of the effectiveness of safety assurance systems and their improvement rather problematical.

• The application of the audit tool to two sites of the Greek industry helped to open up these enterprises to new scientific techniques which can be much less expensive for the same safety gain than 'adding steel to the structure'.

4.3 Adaptation of the tool in the Greek version

The original version of the question set was in English; to conduct the audits in the Greek industry it therefore had to be translated into Greek. In light of the difficulties presented in Section 4.1 above, the Greek version was produced with the following features.

• Certain redundant questions were omitted, leading to a more compact version of the question set.

• Fundamental principles of management were investigated on the basis of a few general questions.

• More probing questions were asked, if necessary, about special areas of concern at the various levels of management.

Care was taken, however, not to lose the powerful principle of the audit, which is the cross-checking of information by asking the same question of different people.

ACKNOWLEDGEMENTS

The authors are pleased to acknowledge the support of the Ministry of VROM (Contract No: 93231410) and the Commission of the European Union (Contract No: EV5V-CT92-0068) in partial funding of this work.

REFERENCES

1. Lees, F. P., Loss Prevention in the Process Industries, Vol. 1. Butterworths, London.
2. Bley, D., Kaplan, S. & Johnson, D. The strengths and limitations of PSA. Where we stand. Reliability Engineering and System Safety, 1992, 38, 3-26.
3. Swain, A. D. and Guttmann, H. E., Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Application. Prepared for the U.S. Nuclear Regulatory Commission, Sandia Laboratories, NUREG/CR-1278, April 1980.
4. Bello, G. C. & Colombari, V. The human factors in risk analysis of process plants: the control room operator model. TESEO, Rel. Eng., 1980, 1, 3-14.
5. Embrey, D. E., Humphreys, P. C., Rosa, E. A., Kirwan, B. and Rea, K., SLIM-MAUD: an approach to assessing human error probabilities using structured expert judgment. NUREG/CR-3518, US Nuclear Regulatory Commission, Washington, DC, 1984.
6. Hannaman, G. W., Spurgin, A. J. and Lukic, Y. D., Human cognitive reasoning model for PRA analysis. NUS-4531, Electric Power Research Institute, December 1984.
7. Hurst, N. W., Immediate and underlying causes of vessel failures: implications for including management and organizational factors in quantified risk assessment. Paper presented at IChemE Symposium Series No. 124. Institution of Chemical Engineers, Rugby, UK, 1992.
8. Hurst, N. W., Bellamy, L. J., Geyer, T. A. W. & Astley, J. A. A classification scheme for pipework failures to include human and sociotechnical errors and their contribution to pipework failure frequencies. Journal of Hazardous Materials, 1991, 26, 159-186.
9. Geyer, T. W., Bellamy, L. J., Astley, J. A. & Hurst, N. W. Prevent pipe failures due to human errors. Chemical Engineering Progress, 1990, 80(11), 66-69.
10. Wright, M. S., Bellamy, L. J., Brabazon, P. G. and Hurst, N. W., The evaluation and management of pipework and vessel safety. C454/014, IMechE 1993, pp. 175-191.
11. Hurst, N. W., Davies, J. K. W., Hankin, R. and Simpson, G., Failure rates for pipework - underlying causes. Paper presented at Valve and Pipeline Reliability Seminar, University of Manchester, Institution of Mechanical Engineers, 24 February 1994.
12. Jacobs, R. & Haber, S. Organizational process and nuclear power plant safety. Reliability Engineering and System Safety, 1994, 45, 75-83.
13. Llory, M. A. Human reliability and human factors in complex organizations: epistemological and critical analysis - practical avenues to action. Reliability Engineering and System Safety, 1992, 38, 109-117.
14. Bellamy, L. J., Wright, M. S. and Hurst, N. W., History and development of a safety management system audit for incorporation into quantitative risk assessment. International Process Safety Management Workshop, AIChE/CCPS, 22-24 September 1993.
15. International Atomic Energy Agency, International Nuclear Safety Advisory Group, Safety Series No. 75-INSAG-4, IAEA, Vienna, 1991.
16. Pate-Cornell, M. E. & Bea, R. G. Management errors and system reliability: a probabilistic approach and application to offshore platforms. Risk Analysis, 1992, 12, 1-18.
17. Pate-Cornell, M. E. & Fischbeck, P. S. PRA as a management tool: organizational factors and risk-based priorities for the maintenance of the tiles of the space shuttle orbiter. Reliability Engineering and System Safety, 1993, 40, 239-257.
18. Davoudian, K., Wu, J. S. & Apostolakis, G. Incorporating organizational factors into risk assessment through the analysis of work processes. Reliability Engineering and System Safety, 1994, 45, 85-105.
19. Haber, S., O'Brien, J., Metlay, D. and Crouch, D., Influence of Organizational Factors on Performance Reliability, Vol. 1. NUREG/CR-5538, BNL-NUREG-52301, 1991.
20. Winsor, D. A. Communication failures contributing to the Challenger accident: an example for technical communicators. IEEE Transactions on Professional Communication, 1988, 31, 101-107.
21. Wu, J. S., Apostolakis, G. E. and Okrent, D., On the inclusion of organizational and managerial influences in probabilistic safety assessments of nuclear power plants. In The Analysis, Communication, and Perception of Risk, eds B. J. Garrick and W. C. Gekler. Plenum Press, New York, 1991, pp. 429-439.
22. Modarres, M., Mosleh, A. & Wreathall, J. A framework for assessing influence of organization on plant safety. Reliability Engineering and System Safety, 1992, 38, 157-171.
23. Reason, J., Types, tokens and indicators. In Proceedings of the Human Factors Society 34th Annual Meeting. The Human Factors Society, Santa Monica, CA, 1990, pp. 885-889.
24. Papazoglou, I. A., Management factors in process plant safety: the TOMHID approach. Ispra, technical note 1.94.102, August 1994.
25. Hurst, N. W., Hankin, R., Bellamy, L. J. & Wright, M. J. J. Auditing - a European perspective. J. Loss Prev. Process Ind., 1994, 7(2), 197-200.
26. Hurst, N. W., Auditing and safety management. CEC DGXII/ESReDA Conference, Occupational Safety Seminar, Lyon, France, 14-15 October 1993.
27. Contract EV5V-CT92-0068, Environment Program, Auditing and safety management for safe operations and land use planning. A cross-national comparison and validation exercise. Final report for the CEC, 20 December 1995.
28. Pitblado, R., Williams, J. C. & Slater, D. H. Quantitative assessment of process safety programs. Plant/Operations Progress, 1990, 9(3), 169-175.
29. Four Elements Limited, Process Safety Management Audit Manual. Greencoat House, Francis St., London SWIP DH, February 1994.
30. Hurst, N. W., Bellamy, L. J. and Wright, M. S., Research Models of Safety Management of Onshore Major Hazards and their possible Application to Offshore Safety. IChemE Symposium Series 130, 1992, pp. 129.
31. Hurst, N. W., Bellamy, L. J. and Wright, M. S., Failure rate and incident databases for major hazards. In 7th International Symposium on Loss Prevention and Safety Promotion in the Process Industries, 4-8 May, Italy, 1992.
32. Papazoglou, I. A. and Aneziris, O., Quantifying the effects of organizational and management factors in chemical installations. In Probabilistic Safety Assessment and Management '96, Vol. 2, eds P. C. Cacciabue and I. A. Papazoglou. Springer-Verlag, Berlin, 1996, pp. 922-927.
33. Hurst, N. W., Young, S., Donald, I., Gibson, H. and Muyselaar, A., Measures of safety management performance and attitudes to safety at major hazard sites. J. Loss Prev. Process Ind., 1996, 9(2), 161-172.
34. Nivolianitou, Z., Papazoglou, I. A., Diamanti, D., Aneziris, O. and Briassoulis, H., Auditing and safety management for safe operations and land use planning. A cross-national comparison and validation exercise, implementation in the Greek industry. Greek contribution to the final project EV5V-CT92-0068 report, Athens, June 1995.