Safety Science 43 (2005) 655–677 www.elsevier.com/locate/ssci

Human redundancy in complex, hazardous systems: A theoretical framework

David M. Clarke *

Rolls-Royce plc, PO Box 2000, Raynesway (3W-200), Derby DE21 7XX, UK

Received 21 November 2004; received in revised form 17 May 2005; accepted 25 May 2005

Abstract

A theoretical framework is presented to provide a basis for sociotechnical system design strategies to promote effective human redundancy in complex, hazardous systems and to support qualitative human reliability assessment. A concept of human redundancy is proposed, including a description of human redundancy forms. Human redundancy forms constitute the various ways in which human redundancy can be implemented in a sociotechnical system and incorporate human redundancy structures, active and standby human redundancy, duplication and overlap of functions, and cognitive diversity. Multi-modal redundancy structures in systems with human, hardware and software sub-systems are described. The foregoing concept of human redundancy is integrated with an adapted error recovery process. The resulting framework accounts for human redundancy forms, the influence of the nature of the error to be recovered, underlying error recovery processes (initiation of human redundancy followed by detection, indication, explanation and correction of the error), the local and organisational factors influencing the effectiveness of human redundancy, and cognitive diversity. On the basis of this framework, failures of human redundancy are analysed, sociotechnical system design strategies for the promotion of effective human redundancy are discussed, and future research needs are outlined.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Human redundancy; Error recovery; Error detection; Cognitive diversity; Checking; Human reliability

* Tel.: +44 01332 637526; fax: +44 01332 622968. E-mail address: [email protected]

0925-7535/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.ssci.2005.05.003


1. Introduction

Control of human error is an essential part of safety management and can be achieved through a variety of means (Kirwan, 1994). While preventive approaches have dominated human reliability studies to date, it is becoming increasingly apparent that more attention must be paid to promoting error recovery (Kontogiannis, 1999; Van der Schaaf and Kanse, 2000). It is not possible to prevent all errors in complex, hazardous systems. Furthermore, there may be a point at which it becomes more cost-effective to adopt strategies to control error through recovery, rather than implementing increasingly expensive preventive measures. Human redundancy is one important way of recovering human errors. Work in high reliability organisations (Roberts, 1989; La Porte and Consolini, 1991; Weick, 1987) suggests that redundancy, including human redundancy, is a key design feature of organisations capable of very high standards of safety performance.

From a theoretical perspective, despite its widespread use and importance in risk management, human redundancy has been the subject of only a limited amount of research and is not well understood. From a practical viewpoint, a comprehensive framework is needed to provide a sound basis for defining sociotechnical system design strategies to promote error recovery through human redundancy. In the present context, the term 'sociotechnical system design' is used to refer to all design-related activities that have a bearing on the nature of the system in connection with its safety, including technical system design, interface design, job design, training, organisation, and development of cultural attributes.

The principal objectives of this paper are:

1. to set out a concept of human redundancy, including a detailed description of its possible forms (that is, the different ways in which human redundancy can be implemented);
2. to integrate this concept with an error recovery perspective to generate a framework that describes human redundancy in sociotechnical systems; and
3. to use the framework to analyse failures of redundant arrangements in human systems, discuss sociotechnical system design strategies for the promotion of effective human redundancy, and outline future research needs.

The form of human redundancy present in a redundant arrangement in a human system can be thought of as defining the organisational initial conditions with respect to the error recovery process. A description of human redundancy forms is developed in this paper. The characterisation of cognitive diversity is based primarily on the work of Westerman et al. (1995, 1997). The error recovery process in the framework is an adaptation of the standard error recovery process (Kontogiannis, 1997, 1999; Van der Schaaf and Kanse, 2000; Kanse and Van der Schaaf, 2001; Zapf and Reason, 1994), the essential components of which are error detection, error explanation and error correction. This adaptation is designed to reflect specific characteristics of recovery through human redundancy.

The work is concerned with organised human redundancy in complex, hazardous, engineering systems such as nuclear power plants, commercial aviation systems, marine systems, and chemical plants. However, the findings may have wider applicability. This paper is concerned with redundant arrangements in human systems such as:

1. a supervisor and operators in the control room of a civil nuclear power plant or chemical plant;
2. the shift technical advisor and normal operating team in the control room of a civil nuclear power plant;
3. the first officer and captain on the flight-deck of a commercial airline; and
4. the bridge team and pilot on a ship.

This paper is concerned with the role of a redundant individual in recovering the error of another. The recovery of errors of subordinates by their superiors and the recovery of errors of superiors by their subordinates are of interest, in addition to recovery of errors by peers. The overall reliability of a redundant arrangement of individuals is, of course, determined by the reliability with which an interaction is carried out in the first place together with the reliability associated with error recovery. The main focus is on operation, but the framework has relevance to maintenance and other activities in sociotechnical systems.

2. Concept of human redundancy

The existing concept of human redundancy in human reliability studies is that described by Swain and Guttmann (1983). It is conceived as a recovery factor with the following key features:

1. someone checks someone else's work;
2. a check is carried out at the time a function is fulfilled or soon after it is fulfilled;
3. the checker is directed, either verbally or through a written procedure, to check a particular human interaction; and
4. the check takes place during normal operation.

A wider definition of human redundancy is used in this paper. Redundancy is present in a system when more than one means of performing a required function exists (Villemeur, 1992). Redundancy exists in a human system when two or more operators are concerned with the fulfilment of a required function and have access to information relevant to that function. Human redundancy comes into play when an error recovery process is entered. The potential for recovery leads to an increase in the overall human reliability associated with a given function. Human redundancy could be defined in error recovery terms as the potential for human recovery by another of an error associated with a given operator function. This definition encompasses a wide range of recoveries from error. For the purposes of this paper, constraints are imposed on the foregoing definition, partly to restrict the range of error recoveries considered and to focus on the types of redundant relationships identified in the examples given in Section 1. The following definition is proposed:

Human redundancy exists where there is support for concurrent human recovery by another of an error associated with a required operator function.

The two constraints introduced into the new definition are, first, support for error recovery and, second, a time constraint implied by the term 'concurrent'. The use of the term 'support' in the definition implies a deliberate decision by the sociotechnical system designer to take advantage of the potential for recovery of an error made by an individual through the intervention of another. In other words, the support of human redundancy is a planned activity in design, and the definition above leads to the possibility of sociotechnical system design strategies to promote effective human redundancy. One manifestation of such planned support is the specification of a check in a procedure. This constraint is introduced into the definition because, for the sociotechnical system designer and risk analyst, planned redundancy is of interest but 'fortuitous redundancy' is not.

The introduction of a time constraint through the use of the term 'concurrent' is more arbitrary. The term 'concurrent' implies that recovery must take place within a critical time interval (this can be interpreted as any time up to the end of the shift in which the erroneous interaction takes place). The result of including this time constraint in the definition is that human redundancy applies largely to individuals in a working unit concerned with the same task(s) at the same time, which is a useful boundary within which to restrict an analysis of error recovery by others (though other boundaries could be imposed) and corresponds to the examples of human redundancy of interest given in Section 1. Two examples of activities in large, complex systems that are outside the scope of this paper are checks of equipment carried out before installation and routine inspections. An important consequence of the time constraint in the definition is that human redundancy often applies to groups of individuals who are familiar with each other and who work together. These circumstances make certain social factors more important with respect to human redundancy than with respect to other arrangements to recover errors where such conditions do not generally apply.

Further implications of the definition above are:

1. the focus is on redundancy within human systems, such as operating teams, and not on recovery by operators of hardware and software failures;
2. human redundancy, as defined, can be applied to a range of operator functions, including simple functions requiring actions such as operating a switch and more complex functions such as diagnosing a fault. The definition does not limit human redundancy to checks specified in procedures. Error recovery by another can take place in any mode of operation, including normal and emergency operation, while recognising that, as noted by Swain and Guttmann (1983), the personnel dynamics in coping with an abnormal event differ from those involved in routine checking during normal operation;
3. the requirement of planning is weaker than that of direction. In many instances, human redundancy can be implemented in such a way that the attention of a checker is directed to a specific human interaction associated with an operator function. For example, a supervisor may be directed through a procedure to check that a particular valve has been operated appropriately. Because of its effects on a checker's attention, direction has implications for error detection. The requirement of direction is not included in the definition because there are many circumstances in which recovery of an operator error through the intervention of another can occur without instructions of such specificity, and it will be useful to consider these recoveries;
4. the definition of human redundancy makes no explicit reference to cognitive diversity, which may be present at any level in the redundant arrangement. It is noted that diversity can exist only within the context of redundancy;
5. the definition excludes recovery by the individual responsible for an error (self-recovery); and
6. the association of planned support with human redundancy excludes error recovery that occurs as a result of some fortuitous feature of the sociotechnical system or of circumstances within it, such as the presence in a control room of an individual not formally part of the operating team.
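As a minimal formalisation of the earlier point that the overall reliability of a redundant arrangement combines the reliability of the original interaction with the reliability of recovery, the unrecovered-error probability for one operator plus one checker can be sketched as follows. The symbols are illustrative and not from the paper; the dependence adjustments are those of the THERP model in the cited Swain and Guttmann (1983):

```latex
% Illustrative notation only; p_e and p_c are not symbols from the paper.
% p_e : probability that the operator errs in the first place
% p_c : probability that the check fails to recover the error, given it occurred
\[
  P_{\mathrm{unrecovered}} \;=\; p_e \, p_c .
\]
% For a checker who is independent of the operator, with basic failure
% probability p, p_c = p. THERP (Swain and Guttmann, 1983) substitutes a
% conditional probability as dependence between operator and checker grows:
\[
  p_c = \frac{1 + 19p}{20} \;\text{(low)}, \quad
  p_c = \frac{1 + 6p}{7} \;\text{(moderate)}, \quad
  p_c = \frac{1 + p}{2} \;\text{(high)}, \quad
  p_c = 1 \;\text{(complete)}.
\]
```

With complete dependence the check adds nothing, however reliable each individual is in isolation; this is the quantitative counterpart of the dependent failures analysed in Section 4.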


In comparison with Swain and Guttmann's definition of human redundancy, the concept of human redundancy described in this paper is more consistent with the fundamentals of redundancy in reliability theory. In the hardware domain, a narrow concept of redundancy does not exist and diversity can only exist within the context of redundancy (three components capable of fulfilling the same function may be redundant and identical, or redundant and diverse, for example). Using Swain and Guttmann's concept, cognitive diversity appears to exist outside the context of human redundancy (for example, during emergency operation). The concept of human redundancy described in this paper is also more consistent with the broad treatment by social scientists of redundancy within organisations. Finally, it is noted that the concept of human redundancy in this paper is consistent with the frequent usage of the term 'redundancy' in the context of groups of human operators concerned with the same function. For example, the captain and first officer on the flight-deck of a commercial airline are usually referred to as providing redundancy in a wide range of circumstances, many of which are outside the boundaries associated with Swain and Guttmann's definition.

Human redundancy can be implemented in many ways in sociotechnical systems. The manner of implementation, or form of human redundancy, encompasses characteristics such as:

1. the number of redundant individuals and their inter-relationships (such as supervisor–operator);
2. whether a redundant individual is active in the execution of a task or is available on call;
3. the degree of cognitive diversity present in a redundant arrangement of operators; and
4. whether or not redundancy is initiated through a procedural requirement for a check.

Forms of human redundancy are described in the detailed presentation of the human redundancy framework in Section 3.

3. Human redundancy framework

3.1. Overview

A schematic of the proposed human redundancy framework is shown in Fig. 1. The effectiveness of human redundancy is influenced by its form in the sociotechnical system design and by the nature of the error that is to be recovered (a slip or mistake). If human redundancy is to result in successful recovery of an error and successful fulfilment of the required function, the redundant arrangement of operators must pass through some or all of the following stages of an error recovery process:

1. initiation of human redundancy, which is the formation of intent on the part of an individual(s) to confirm the correct status of the system in some respect, or alternatively, to attempt detection of a potential error;
2. error detection;
3. error indication, which occurs when an individual who has detected an error brings this to the attention of another member(s) of the team;
4. explanation, or localisation, of the error; and
5. error correction.
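A sketch of how an analyst might screen this chain quantitatively is given below. It is a hypothetical illustration, not a model from the paper: the stage names follow the list above, the probabilities are invented placeholders, and the stages are treated as independent, which Section 3.4 shows is a simplification (initiation and detection, for instance, can be simultaneous).

```python
# Illustrative sketch: recovery through human redundancy as a chain of stages
# (Section 3.1). Stage names follow the paper; the probabilities are invented
# placeholders, and independence between stages is an assumption.

RECOVERY_STAGES = ["initiation", "detection", "indication", "explanation", "correction"]

def recovery_probability(stage_success: dict[str, float]) -> float:
    """Probability that every stage of the error recovery process succeeds."""
    p = 1.0
    for stage in RECOVERY_STAGES:
        p *= stage_success[stage]
    return p

# Example: a procedurally specified supervisor check (placeholder values).
example = {
    "initiation": 0.95,   # the check is actually attempted
    "detection": 0.80,    # the error is noticed by the checker
    "indication": 0.90,   # the finding is raised with, and taken seriously by, the team
    "explanation": 0.95,  # the error is localised
    "correction": 0.98,   # the error is corrected in time
}

if __name__ == "__main__":
    print(f"P(recovery) = {recovery_probability(example):.3f}")  # 0.637
```

The multiplicative form makes one property of the framework explicit: a weakness at any single stage caps the benefit of the whole redundant arrangement.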


Fig. 1. Schematic of the human redundancy framework.

Various local and organisational factors influence the likelihood of success of each stage of this error recovery process. The existence of cognitive diversity within the redundant arrangement of operators enhances the reliability of some, but not all, of the stages.

The error recovery process shown in Fig. 1 is an adaptation, with respect to human redundancy, of general error recovery frameworks (detection, explanation and correction of an error). The stages of human redundancy initiation and error indication have been added. This paper focuses on error recovery issues that are most distinctive from a human redundancy perspective. With regard to the error recovery stages shown in Fig. 1, these are initiation of human redundancy, error detection and error indication.

3.2. Forms of human redundancy

In hardware and software systems, redundancy can be implemented in several ways (Brand and Matthews, 1993; Leitch, 1988; Leveson, 1995). Similarly, different implementations of human redundancy are possible. The objective in the following sub-sections is to develop a high-level description of the character of these forms. Such a description is essential because different forms of human redundancy have different implications for the error recovery process. Different forms vary in respects such as failure modes, relevance of particular local and organisational factors, the implications of time constraints, and the general level of reliability that can be expected.

To facilitate a discussion of human redundancy, this paper focuses on human systems. However, a group of individuals containing human redundancy, such as an operating team, exists within a wider sociotechnical system containing hardware, software, and organisational sub-systems, including other individuals and other groups. The interaction of a local human system with the rest of the sociotechnical system must therefore be represented in a framework describing human redundancy. This interaction is most apparent in the discussion in Section 3.5 of local and organisational factors influencing the effectiveness of human redundancy and in Section 5, which addresses sociotechnical system design strategies.


3.2.1. Human redundancy structures

In the context of hardware redundancy, Brand and Matthews (1993) describe the following redundancy structures: a single channel (no redundancy or diversity), redundant systems, diverse systems, and two diverse systems. These structures exhibit a general trend towards increased reliability. A similar approach to classification of structures is useful in the context of human redundancy. A human redundancy structure may be defined as a specific arrangement of individuals, interfacing with technical and organisational sub-systems, providing redundancy with respect to a required operator function. Examples of human redundancy structures are:

1. two redundant individuals (the simplest structure);
2. an operating team consisting of two or more operators and a supervisor; and
3. an operating team and another individual, such as the shift technical advisor employed in civil nuclear power plants or an independent observer.

Human redundancy structures can change over time; for example, because of the arrival of additional personnel or re-arrangement of responsibilities following an abnormal event.

3.2.2. Active and standby human redundancy

Through partial analogy with redundancy in hardware systems, active and standby forms of human redundancy can be identified. Active human redundancy requires that the individual fulfilling a redundant function is involved in the task at hand. The redundant individual will often be located in the operator's immediate work environment, but can also be engaged in a task remotely. Commonly, an operator fulfils a function while another monitors the performance of that operator with respect to the required function. Less frequently, two operators carry out identical tasks to achieve the same function.

Standby human redundancy exists when the redundant individual is not immediately involved in the task at hand, is typically not present in the operator's immediate work environment, and must be called on when circumstances dictate. The initiation of standby human redundancy takes place when, on the occurrence of a pre-specified condition, an operator calls on the redundant individual and that individual becomes available. Once available, the redundant individual can then review past activities and contribute to future activities in ways dictated by written procedures and established operational practice.

A redundant individual who is actively involved in carrying out some function, which for present purposes includes monitoring, is constantly aware at some level of detail of the system state. However, when a standby redundant individual is called on and becomes available, it may take some time for that individual to reach a satisfactory understanding of the state of the system necessary to respond appropriately, depending on the specifics of the scenario. Where the time constraint for action is severe, this difference between active and standby human redundancy could be important. On the other hand, dependence between the team and the redundant individual is lower because a standby redundant individual is typically not as fully integrated into the operating team as its other members.

3.2.3. Duplication, overlap and substitution


Drawing on the work of other organisational theorists, Sagan (1993) identifies duplication and overlap as two types of redundancy in organisations. Duplication exists where two different units (including human units) perform the same function or a reserve unit is available. Overlap refers to the existence of two units (including human units) with some functional areas in common. The duplication and overlap distinction is concerned with whether redundancy exists with respect to a function in whole or in part. Different levels of redundancy in a human system as a whole can be achieved through more or less overlap of the responsibilities of the individuals in that system.

In systems where operators are generalists, because they have rotated around different jobs and received the appropriate training, substitution is possible (Perrow, 1999). Substitution occurs when one operator can stand in for another operator carrying out a different job. While replacement of an individual responsible for an error may be required in certain rare scenarios, the principal relevance of substitution to human redundancy is that operators who are generalists have a greater awareness of the operation of the system as a whole and, in a given scenario, may be able to provide input to the response beyond their own narrow role. In Perrow's terms, it is easier for an operator to be a generalist in linear systems than in complex systems, where greater specialisation is demanded of operators.

3.2.4. Cognitive diversity

Based on the work of Westerman et al. (1995, 1997), the following definition of cognitive diversity is proposed:

Cognitive diversity is the availability of different cognitive behaviours to fulfil a required function, where differences originate either within operators or within their environment.

Cognitive diversity can be present within the various redundant arrangements of individuals that exist in sociotechnical systems and can be distributed in different ways within them. For example, cognitive diversity can exist:

1. between two or more individuals within a group;
2. between a group and another individual; and
3. between two checkers.

The shift technical advisor in civil nuclear power plants provides an example of current use of diversity. The shift technical advisor provides redundancy with respect to the standard operating team when a response to an abnormal event is necessary. The level of redundancy so provided is greater with respect to high-level functions, such as situation assessment and the determination of an appropriate strategy to bring the plant to a safe state, than with respect to low-level functions, such as operation of specific panel controls. The shift technical advisor is standby redundant, provides partial redundancy with respect to the functions that must be fulfilled by the standard operating team as a whole, with emphasis on higher level and strategic decisions, and provides diversity by virtue of a different perspective on the task together with a degree of autonomy.

The progressively more complex structures of a single operator (no redundancy or diversity), a redundant human system, a diverse human system, and two diverse human systems may be expected to exhibit a general tendency to increased reliability.


However, the reliability enhancement achieved from human redundancy in a given situation will be determined not only by the high-level characteristics of a redundancy structure, but also by other features of the particular arrangement and the environment in which it exists. High levels of diversity that are possible in principle are currently used only rarely in practice. A discussion of how the sociotechnical system designer can implement cognitive diversity is provided in Section 5.6.

3.2.5. Multi-modal redundancy structures

While they are not the main concern of this paper, wider redundancy structures arise as a result of the co-existence of human, hardware and software sub-systems that act in concert to fulfil a function. Each type of sub-system can contain redundancy. Furthermore, a failure in one type of sub-system can be recovered through the action of another type of sub-system. An example of a multi-modal redundancy structure is provided by two redundant operators who make a decision on the basis of two redundant and diverse information sources, each consisting of hardware and software sub-systems. In sociotechnical systems that build extensive redundancy into hardware, software and all levels of the organisational hierarchy, such multi-modal redundancy structures can be very complex.

3.3. Error characteristics

The nature of an error has an effect on the nature and probability of recovery from that error (Embrey and Lucas, 1987). In the context of recovery of error by another through human redundancy, the processing causing a slip and that associated with its detection are independent. In practice, however, recovery from slips can be made more difficult because of factors such as high workload and delayed feedback from the system (Reason and Embrey, 1985). The relationship between the processing causing a mistake and that associated with its detection by another depends on the degree to which the redundant individual was implicated in the original error. A redundant individual with a degree of independence from the original error provides a prime means of recovery from mistakes. Error type determines the relevance or importance of different local and organisational factors to the error recovery process in the human redundancy framework. For example, common training and experience inhibit the recovery of mistakes in a way that is not paralleled in recovery of slips.

Human redundancy, as defined, is applicable to active errors but not latent errors. In the context of error recovery, latent errors can be addressed by other means, such as testing and inspection.

Human redundancy is intended to reduce errors rather than violations. The influence of a redundant individual on violations is limited by factors such as acceptance of violations by supervisors or the organisation more widely. Violations can create situations that are unfamiliar (Reason, 1997). Unfamiliar situations can result in an increased probability of error and a reduction in the chances of error recovery by any means, including human redundancy.

3.4. Error recovery process

The following sub-sections provide a characterisation of the error recovery process associated with human redundancy, applicable to any given form.
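As a brief quantitative aside before the stages are described: the multi-modal example of Section 3.2.5 (two redundant operators acting on two redundant, diverse information sources) can be read as a series-parallel reliability block structure. The sketch below is purely illustrative, with invented failure probabilities and strong independence assumptions of exactly the kind that the dependence mechanisms discussed in Sections 3.5 and 4 undermine:

```python
# Hypothetical series-parallel reading of the multi-modal structure in
# Section 3.2.5: the function succeeds if at least one of two diverse
# information channels delivers correct information AND at least one of two
# redundant operators then acts correctly. Independence is assumed throughout.

def parallel(p_fail_a: float, p_fail_b: float) -> float:
    """Failure probability of two parallel (redundant) elements."""
    return p_fail_a * p_fail_b

# Invented placeholder failure probabilities.
q_channels = parallel(1e-3, 1e-3)   # both diverse information sources wrong
q_operators = parallel(1e-2, 1e-2)  # both operators fail given good information

# Series combination: either the information stage or the human stage can
# defeat the function.
q_function = 1 - (1 - q_channels) * (1 - q_operators)
print(f"P(function fails) = {q_function:.2e}")  # ~1.0e-04, operator branch dominates
```

In practice the two operators share displays, training and environment, so the branches are not independent; the point of the sketch is the structural reading, not the numbers.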


3.4.1. Human redundancy initiation

Given that an error occurs and a redundant individual is available, or potentially available, successful recovery first requires that human redundancy is initiated. Human redundancy is initiated when an individual forms the intention to carry out a check that is procedurally specified or normally carried out as a matter of good practice. When human redundancy is initiated in this way, initiation and error detection are distinct stages. In the absence of procedural specification, initiation of human redundancy can occur as a response to information in the environment. When human redundancy is initiated in this way, initiation may be followed by error detection or, alternatively, initiation and detection may be simultaneous. When it is a distinct stage and the explicit requirement for a check is absent, initiation can occur before detection if an individual comes to suspect that an error has been made by another on the basis of information from the environment. Initiation of human redundancy, whether or not it is procedurally specified, can therefore be thought of as the formation of intent to confirm the correct status of the system in some respect, or alternatively, to attempt detection of a potential error.

3.4.2. Error detection

Once human redundancy has been initiated, the redundant individual must successfully detect the error. As noted in Section 3.4.1, it is possible that initiation of human redundancy and error detection occur simultaneously. To achieve error detection with respect to the work of another, the detector must have knowledge about the operator function concerned or have expectations about outcome, and must understand the possible goals associated with the observed behaviour (Hutchins, 1995). Certain errors may be more difficult to detect when a function has been executed because evidence of failure may be less readily available or be hidden.

3.4.3. Error indication

When an individual detects an error, it will often be necessary for this finding to be indicated to another member(s) of the operating team (Sasou and Reason, 1999). However, the communication by an individual to others that an error has occurred is not sufficient in itself to initiate activities to recover an error. If the error is ultimately to be corrected, it is necessary that other members of the team accept this indication. Others in the team need not necessarily accept an indication as a definitive statement that an error has occurred (though in some cases this may be clear immediately). The indication must at least be taken seriously and regarded as highlighting that an error may have occurred and that further attention to the relevant part of the task is necessary to determine whether or not an error has, in fact, occurred.

3.4.4. Error explanation

If an error has been indicated it must be explained, or localised. In some situations the way the error has been detected will lead to simultaneous explanation of the error. For example, if a supervisor observes that an operator has selected the wrong switch, then error detection and localisation are simultaneous. In other situations, for example when symptoms of an error have been detected at the outcome stage through observation of plant indications, localisation of the error may be more difficult. Symptoms of an error may be misinterpreted as signs of equipment failure.


3.4.5. Error correction

An error must then be corrected. Correction is sometimes possible without error explanation (Kontogiannis, 1999). For example, in a nuclear power plant an attempt to align a heat sink that fails because of human error may be followed by the alignment of an alternative heat sink, without the identification of the error that caused the first heat sink to fail necessarily taking place.

3.4.6. Task planning, execution and outcome

A human interaction passes through the three stages of planning, execution and outcome. An error can occur and be recovered at any of these stages (Kontogiannis, 1999). For example, in the context of human redundancy and considering error detection, an error may be detected in the planning stage if a redundant individual correctly understands the erroneous plan and can compare it to the correct plan.

3.4.7. Variations on the general framework

Fig. 1 includes a general representation of the error recovery process associated with human redundancy implemented in any given form. Variations in the precise sequence of events will arise depending on conditions such as whether human redundancy is active or standby, whether an error is unambiguously detected or merely suspected, and who detects or suspects an error. Variations in the importance of a given stage of the error recovery process and the relevance of local and organisational factors will arise depending on conditions such as whether initiation of human redundancy and error detection are distinct stages or whether they occur simultaneously. Depending on the initial human redundancy structure and the changes in the structure that take place with time, the level and character of redundancy can vary during the error recovery process. For example, the arrival of additional personnel following detection of an error can lead to increased redundancy during error explanation and correction.

3.5. Local and organisational factors influencing human redundancy

The several stages of the error recovery process in the human redundancy framework are subject to the influence of a large number of factors in the local workplace and the organisation as a whole. Almost any factor that can influence human performance may influence human redundancy. Therefore, an exhaustive identification of factors is not feasible. A selection of factors of particular significance is listed below (1 = Swain and Guttmann, 1983; 2 = Sasou and Reason, 1999; # = identified in the context of normal operation; † = identified in the context of emergency operation), and several observations are made concerning these factors. These observations contribute to a characterisation of human redundancy and have important implications for sociotechnical system design.

1. Procedural specification used. [1, #]
2. Presence of check-off provisions. [1, #]
3. Presence of alerting factors. [1, #]
4. Degree of interpretation required to detect an error. [1, #]
5. Effect of operator action on personal safety of checker. [1, #]
6. Active/passive checks. [1, #]
7. Checker's familiarity with the operator. [1, #, †]
8. Checker's knowledge of operator's technical level. [1, #]
9. Over-trusting of operators by superiors. [2, #]
10. Excessive professional courtesy between individuals of similar rank. [2, #, †]
11. Stress (which increases dependence of less experienced staff on their superiors). [1, †]
12. Perceived safety significance of the scenario. [1, †]
13. Relative status of operators. [1, †]
14. Excessive authority gradient. [2, †]
15. Deficiencies in resource. [2, †]
16. Deficiencies in resource management. [2, †]
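Where such a factor set feeds a qualitative assessment, it can help to hold it as structured data so that factors can be filtered by source and operating context. The snippet below is a hypothetical encoding of part of the list above (markers as in the legend; the class and field names are invented):

```python
# Hypothetical encoding of the Section 3.5 factor list. Sources: 1 = Swain and
# Guttmann (1983), 2 = Sasou and Reason (1999); contexts: "normal", "emergency".

from dataclasses import dataclass

@dataclass(frozen=True)
class Factor:
    name: str
    source: int            # 1 or 2, per the legend
    contexts: frozenset    # operating modes in which the factor was identified

FACTORS = [
    Factor("Procedural specification used", 1, frozenset({"normal"})),
    Factor("Checker's familiarity with the operator", 1, frozenset({"normal", "emergency"})),
    Factor("Over-trusting of operators by superiors", 2, frozenset({"normal"})),
    Factor("Excessive authority gradient", 2, frozenset({"emergency"})),
    # ...remaining factors from the list above, encoded the same way
]

def relevant_in(mode: str) -> list[str]:
    """Names of factors identified in the given operating mode."""
    return [f.name for f in FACTORS if mode in f.contexts]

print(relevant_in("emergency"))
```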

Observations concerning these factors are:

1. Factors that influence the effectiveness of human redundancy are diverse in character. Further to the characterisation given so far, this indicates that human redundancy is a complex phenomenon and that sociotechnical system design measures to promote human redundancy must be wide-ranging. Therefore, there is no 'easy fix' to ensure that human redundancy is effective. For similar reasons, human redundancy is not an easy option for achieving error control. Multi-modal redundancy structures are potentially subject to a still wider range of factors.
2. The factors do not, in general, uniformly influence the several stages of the error recovery process shown in Fig. 1 but rather act on specific stages. However, certain factors are relevant to more than one stage of the human redundancy framework. An example is familiarity, which can lead a redundant individual to omit a check (failure of initiation) because of confidence in the individual concerned, or, following successful initiation of human redundancy, can lead to a 'perceptual set' that results in failure to detect an error.
3. Many social factors influence human redundancy, indicating that serious attention to social interaction is an essential task for the sociotechnical system designer.
4. Certain factors influence assumptions and predictions made by an individual, such as a supervisor, about others.
5. Certain factors influence whether communication takes place between a detector of an error and others, confirming the key role of communication in human redundancy. Communication required to recover an error can be socially inhibited.
6. As noted earlier, the reliability with which a human interaction is carried out in the first place contributes to the overall reliability of a redundant arrangement. An operator's knowledge that his or her work is subject to human redundancy may influence the reliability with which an interaction is carried out. Social factors may therefore be important in both directions with respect to two or more redundant individuals. Where human redundancy is implemented, the potential for reduction in the human reliability associated with a human interaction is a concern for both the sociotechnical system designer and the risk analyst.
7. Some factors vary in their relevance and importance with respect to normal and emergency operation. For example, stress is an important influence during emergency operation. One of the effects of stress is to increase the dependence of junior operators on senior operators.


8. The issue of responsibility is important. A redundant individual must be sufficiently aware of his or her responsibilities, in addition to being adequately motivated and assertive, in order to detect and bring to light an error. To further illustrate the salience of this issue, if the responsibilities of individuals are unclear, multiple redundant checks can be less effective than a single person made responsible for a self-check (Health and Safety Executive, 2001).

4. Failures of redundant arrangements in human systems

The human redundancy framework presented in Section 3 provides a basis for analysis of failures of redundant arrangements in human systems with respect to a given operator function. Such analysis is of practical value to the sociotechnical system designer and the risk analyst. The failure modes associated with redundant arrangements in human systems can be analysed by applying error classifications (errors of omission and commission; slips, lapses and mistakes) and other useful distinctions (independent and dependent failures; complete and partial failures). Such analysis can be carried out for different forms of human redundancy.

If an analysis is conducted for a redundant arrangement in the form of an operator and a supervisor, where the supervisor is required to carry out a procedurally specified check, several useful conclusions can be drawn, including:

1. the potential for failure exists at each of the several stages of the human redundancy framework;
2. the character and likelihood of failure associated with a given stage varies with error type, local and organisational factors, and the character and degree of cognitive diversity present;
3. stages of the human redundancy process can fail through independent failures or dependent failures. For example, a check can be omitted because of an omission of a step from a procedure (independent failure) or because of a high degree of expectancy that the operator function has been adequately fulfilled (person–person dependent failure). A further example is provided by the error detection stage. While human redundancy may have been successfully initiated, a checker may carry out the wrong check or carry out the correct check wrongly (independent failures). Alternatively, a checker may fail to perceive an error because of a belief in the competence of a colleague (person–person dependent failure). Other types of dependent failures can lead to failure of human redundancy, including failures arising from common factors integral to fulfilment of the operator function, such as quality of the interface, or common factors peripheral to the task, such as training; and
4. human redundancy can fail through violations. For example, a procedurally specified check can be omitted because of a deliberate violation.

Further to the foregoing analysis of a particular form of human redundancy, if the failures associated with different forms of human redundancy are considered, it is found that different forms are subject to different failures.

5. Sociotechnical system design strategies


The human redundancy framework presented in Section 3 provides a basis for sociotechnical system design strategies to make use of and effectively support human redundancy through interventions in the areas of technical system design, interface design, training, management, and development and maintenance of cultural attributes. While a comprehensive discussion of design strategies to support human redundancy is beyond the scope of this paper, brief guidelines are presented in Sections 5.1–5.8, focusing on certain key issues. These include the importance of operational context, the selection of appropriate human redundancy forms, the support of human redundancy forms through attention to local and organisational factors and the use of cognitive diversity, and the cost of redundancy.

5.1. Operational context

The appropriate form of human redundancy and the relevance of associated local and organisational factors are influenced by the operational context in which operators must work. For example, if novel and unpredictable disturbances are encountered during operation, a team design is required that places greater emphasis on redundancy than a design appropriate for routine operation, other desirable features including robustness, review and reallocation (Schraagen and Rasker, 2004). Arrangements to achieve transitions between different human redundancy structures associated with different operating modes must be considered. Identification of the operational context is therefore a first step in design to support human redundancy.

5.2. Human redundancy structures

The concept of human redundancy structures can be used early in the design process when making high-level decisions about the implementation of human redundancy. When assigning a structure, factors to take into account include:

1. the level of human reliability required;
2. constraints associated with the use of increasing numbers of people in the system in connection with a single function, including constraints associated with the physical system (such as space), resources required to generate high levels of cognitive diversity, and the costs associated with additional interfacing hardware and software;
3. dependence effects in either simple or extended redundancy structures; and
4. the need for sufficient resource for operators to carry out their individual tasks and to devote to teamwork (Schraagen and Rasker, 2004), including recovery of the errors of others.

5.3. Active and standby human redundancy

Active human redundancy, particularly where a function is fully duplicated, is demanding of resource. However, an individual fulfilling a monitoring function can monitor more than one operator and carry out other duties, subject to constraints on total workload. Some hazardous activities justify active human redundancy in the form of a dedicated independent observer.


An analysis of failures of standby redundancy based on the human redundancy framework presented in Section 3 shows that for standby human redundancy to be an appropriate system design strategy, there must be an expectation that the following conditions can be met:

1. the operator must be able to detect a condition that requires the redundant individual to be summoned;
2. the operator must summon the redundant individual (or some other mechanism for alerting the redundant individual must be in place);
3. the redundant individual must be available; and
4. there must be sufficient time available for the redundant individual to reach the physical location of the operator, if required, and to explain and correct the error in time.

The complexity and time demands of anticipated scenarios and potential access problems following certain abnormal events influence whether these requirements can be satisfied. Workload and attitudes to reliability of both actively and standby redundant individuals must be such that fulfilment of a redundant function is not omitted for reasons of low perceived priority or overload.

In the context of standby human redundancy, a question arises regarding the extent to which decisions should be made or influenced by individuals more or less remote from the execution of a function. Research in high reliability organisations (Roberts, 1989; La Porte and Consolini, 1991; Weick, 1987) suggests that when problems arise, rapid and effective responses are facilitated by delegation, decentralisation of decision-making authority to local areas where the problem exists, and deferral to individuals at lower levels of the organisational hierarchy.

5.4. Duplication and overlap of functions

Hutchins (1995) observes that because of the way people pass through the system, progressing from less to more demanding roles, knowledge of the entry-level tasks is represented more redundantly than the expert-level tasks. This has implications for the performance of the normal operating team when carrying out complex functions requiring expert knowledge. The relatively low level of redundancy with respect to a high-level function can be offset through the use of personnel outside the normal operating team to provide redundant and diverse expert knowledge when required.

While high levels of overlap and duplication of functions confer greater levels of redundancy, they also give rise to costs, for example personnel and training costs. Rasker and Willeboordse (2001), cited in Schraagen and Rasker (2004), suggest that the costs of redundancy in teams can be minimised by assigning to each team member one other team member who can support the first team member when needed, creating dyads in the organisation. When further redundancy is required, this approach can be extended to create 'dyads of dyads'. Redundancy through duplication or overlap of function depends on appropriate distributions of access to information and knowledge.

5.5. Local and organisational factors

As discussed in Section 3.5, a diverse range of local and organisational factors influence the effectiveness of human redundancy. A major task of the designer is therefore to support human redundancy through attention to these factors with respect to a given form of human redundancy. The relevance of several factors has been highlighted above. Further examples are discussed below.


Interface design and communication can prompt the initiation of human redundancy in the absence of procedurally specified checks. The system may be designed so as to bring appropriate information to the attention of a co-worker (Hutchins, 1995; Rognin and Blanquart, 1999). For example, an important control can be designed and positioned such that its operation is clearly visible to operators other than the individual operating it, or the indicated value of an important parameter can be made prominent. Another way of supporting initiation of human redundancy is to require operators to verbalise actions shortly before or as they are carried out.

Rognin and Blanquart (1999) describe the work of air traffic controllers who work closely together as a co-located, two-person team. The controllers have access to shared information by virtue of the design of the system. Co-location provides opportunities for communication through verbal messages, written messages and gestures. One controller can hear verbal messages between the second controller and other individuals in the system, such as pilots. Actions on artefacts provide information about what the actor is doing. These and other features of air traffic control permit the controllers to achieve mutual awareness, which supports error recovery and contributes to system dependability. To promote error recovery, Kontogiannis (1999) identifies the desirable characteristics of interface transparency, to facilitate error detection through immediate feedback, and traceability, to help operators trace the causes of errors.

5.6. Achieving cognitive diversity

Cognitive diversity arises from the availability of different cognitive behaviours originating within operators or their environment and can be distributed throughout a human redundancy structure in different ways. Dependent failures arising from cognitive errors can be reduced through the application of cognitive diversity. Physical diversity is relevant to some functions but is not addressed here. Diversity is not an 'all or nothing' characteristic, but can be present within a system at different levels.

Westerman et al. (1995, 1997) hypothesise that the cognitive demands placed on engineers by task environments vary such that, for a given environment, some specific types of error are detected more readily than others. Task environments may be diverse with respect to specified methods, equipment (hardware and software), and personnel with which an individual must interact to carry out a given task. A comparison of errors in independent code inspection and independent functional testing showed evidence of differential sensitivity to different error types and, hence, cognitive diversity between the two strategies. This result could be explained by different mechanisms: differential sensitivity could originate in either different types of cognitive processing or different levels of cognitive behaviour demanded by the two approaches.

Beltracchi and Lindsay (1992) discuss the latter source of diversity in the context of operation and maintenance. Human performance can be classified into skill-, rule- and knowledge-based levels (Rasmussen, 1983; Reason, 1990). Beltracchi and Lindsay suggest that in critical applications, to reduce common mode failures, a task and the corresponding check should be specified such that they make use of different levels of cognitive behaviour. For example, if an operator carries out a task at the rule-based level, a supervisor can refer to information that provides a basis for deducing the progress of operations at the knowledge-based level of performance.

Hollywell (1993, 1996) identifies a variety of uses of diversity to defend against coupling mechanisms associated with human error dependent failures.


Diversity within individuals and the environment is discussed, which makes use of the potential for diversity in the areas of information, equipment, intrinsic human capabilities, skill and organisation.

Given the variety of human redundancy forms and the different ways cognitive diversity may be implemented and distributed within them, there are, in principle, many options available to the sociotechnical system designer to enhance human reliability. However, while diversity is a key strategy for increasing the reliability enhancement offered by redundancy, it has its limitations. First, improvements are constrained by dependencies. Second, cognitive diversity does not act on every stage of the human redundancy framework and affects neither independent failures associated with error recovery nor dependencies that arise for social reasons. For example, contributions that might be made by operators as a result of cognitive diversity can be stifled through an excessive authority gradient. Third, Perrow (1999) identifies certain 'difficult-to-understand' scenarios that can arise in complex, tightly coupled systems. The cognitive demands of such scenarios may be beyond the capacities of a group of operators in which cognitive diversity exists, even at a high level. Part of the solution to this problem is to reduce complexity and coupling in the system, thereby reducing demands on operators (a physical design solution), rather than attempting to fight complexity with complexity through the use of cognitive diversity alone. Fourth, some of the options for implementing cognitive diversity available to the sociotechnical system designer in principle are limited by practical constraints and by conflicts with other design goals.

In summary, there is empirical evidence to support the notion that cognitive diversity can be effective in increasing the utility of human redundancy in the context of operation and maintenance. However, the limitations of cognitive diversity must be recognised. In principle, many ways of implementing cognitive diversity are available, although the options are reduced by practical constraints. Therefore, cognitive diversity is a useful, but only partial, solution to promoting effective human redundancy.

5.7. Team design and human redundancy

Poor teamwork has been implicated in several well-publicised disasters, including Three Mile Island and the Tenerife runway collision (Urban et al., 1995). The following examples, which emphasise social issues, illustrate interactions between design for human redundancy and team design.

1. Leadership behaviour: Stewart et al. (1999) identify several leadership behaviours that can exist in teams. "Overpowering leadership" is characterised by an active and autocratic approach in which the leader sees his or her way as the right way to accomplish things. While this leadership behaviour can be appropriate in certain circumstances, it may be inconsistent with effective use of redundancy within teams.
2. Shared mental models: these are organised knowledge structures that allow team members to describe, explain, and predict teamwork demands (Schraagen and Rasker, 2004). Team members share an understanding of the team's goals and are aware of each other's roles and responsibilities. Shared mental models facilitate teamwork, including error recovery through human redundancy, and can be developed through training.


3. Social biases: to the extent that biases derived from social interaction within groups are applicable to teams (Jones and Roelofsma, 2000), such biases may be inimical to effective use of human redundancy. For example, in groups susceptible to groupthink (Janis, 1972), achieving consensus becomes more important than the quality of the decision itself and strong pressures to conform exist.
4. Interpersonal competences: the interpersonal competences of team members, which include respect for the viewpoints of others, influence team effectiveness (Medsker and Campion, 1997) and may influence social interactions associated with human redundancy. Criteria related to these competences can be used when selecting team members.
5. Team structure: specification of human redundancy structures and design to support their effective functioning interacts with the design of team structures for performance more widely. For example, Salas and Fiore (2004) suggest that the structure of a team (including lines of authority and divisions of tasks and responsibilities) influences the need for coordination among team members and the need for mutual awareness, which link to team performance. As noted in Section 5.5, mutual awareness supports error recovery.

5.8. System design process

The perspective in this paper demonstrates that human redundancy must be addressed throughout the design process associated with a new system, and not simply used at the end of a project as an attempt to rectify reliability shortfalls. Similarly, when error reduction is required in an existing system, a structured and wide-ranging design activity must be undertaken to optimise the benefits to reliability of human redundancy, rather than, for example, cursory specification of checks. Opportunities to reduce human error arise in several ways, including during the design of a new system, on implementation of an error management programme, and following an evaluation or audit of factors affecting error potential (Embrey et al., 1994). The design activities necessary to achieve effective use of human redundancy are identified in Fig. 2. In practice, iterations are required. These activities can be supported through use of established techniques such as task analysis. The designer must also be aware of practical issues, such as cost-benefit, that place limits on particular forms of implementation.

6. Future research

Empirical studies using accident and near-miss reports should be carried out to refine and validate the proposed human redundancy framework. Such studies must be carried out at a finer level of detail than previously achieved. For example, it is not sufficient to identify a given local or organisational factor as merely relevant to error recovery through human redundancy. It is necessary to consider factors in relation to specific characteristics of an operational situation, including stages of the error recovery process, different error types, and different modes of operation. For example, a given factor may influence one particular stage of the error recovery process but not others. A more comprehensive set of data concerning local and organisational factors is required to feed into the design process.

Future work should address various human redundancy forms, including both simple and complex human redundancy structures. Details of the action of human redundancy in extended, complex structures are of interest and will require consideration of multi-modal redundancy structures. An advantage of addressing multi-modal redundancy structures is that the design of hardware, software and human sub-systems can be considered concurrently, including the interactions between them.

D.M. Clarke / Safety Science 43 (2005) 655–677

673

Fig. 2. Design activities to implement human redundancy.
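As an illustration of the level of detail intended, a recovery episode extracted from a near-miss report could be coded against the recovery stage it reached, the error type involved, and the mode of operation, with each influencing factor tied to a specific stage rather than to recovery in general. The sketch below is a minimal illustration only: the stage names follow the adapted error recovery process used in this paper and the error types follow Reason (1990), but every identifier and example value is a hypothetical assumption, not part of the original framework.

```python
# Minimal sketch of a finer-grained coding scheme for near-miss reports.
# Stage names follow the adapted error recovery process in this paper;
# error types follow Reason (1990). All identifiers and example values
# are hypothetical.
from dataclasses import dataclass, field
from enum import Enum

class RecoveryStage(Enum):
    INITIATION = "initiation of human redundancy"
    DETECTION = "error detection"
    INDICATION = "error indication"
    EXPLANATION = "error explanation"
    CORRECTION = "error correction"

class ErrorType(Enum):
    SLIP = "slip"
    LAPSE = "lapse"
    MISTAKE = "mistake"

@dataclass
class RecoveryEpisode:
    report_id: str
    error_type: ErrorType
    mode_of_operation: str                      # e.g. "normal", "maintenance"
    stages_reached: list[RecoveryStage]
    # Each factor is recorded against the specific stage it influenced,
    # not merely flagged as "relevant to recovery".
    factors: dict[RecoveryStage, list[str]] = field(default_factory=dict)

episode = RecoveryEpisode(
    report_id="NM-0042",
    error_type=ErrorType.SLIP,
    mode_of_operation="maintenance",
    stages_reached=[RecoveryStage.DETECTION, RecoveryStage.INDICATION],
    factors={RecoveryStage.INDICATION: ["status hierarchy inhibited speaking up"]},
)
```

Coding episodes in this way would allow analysts to ask, for example, whether a given organisational factor affects detection but not indication, which is precisely the stage-specific question raised above.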

Future work should address the various human redundancy forms, including both simple and complex human redundancy structures. Details of the action of human redundancy in extended, complex structures are of interest and will require consideration of multi-modal redundancy structures. An advantage of addressing multi-modal redundancy structures is that the design of hardware, software and human sub-systems can be considered concurrently, including the interactions between them. For example, diverse hardware systems can benefit human redundancy by introducing cognitive diversity into the environment. Furthermore, the identification of dependencies across related redundancy structures is facilitated.
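In reliability-block terms, a multi-modal redundancy structure places human, hardware and software channels in parallel around one function, and the dependencies just mentioned erode the benefit that an independence assumption would predict. The following sketch is illustrative only, not a method from this paper: the channel values are invented, and the beta-factor term is a deliberately crude stand-in for common-cause dependence across channels.

```python
# Illustrative sketch of a multi-modal redundancy structure: a function is
# lost only if the human check, the hardware interlock and the software
# monitor all fail. A crude beta-factor term represents common-cause
# dependence shared across the channels. All numbers are hypothetical.
from math import prod

def parallel_failure(channel_failure_probs, beta=0.0):
    """Failure probability of parallel redundant channels.

    beta is the fraction of failure probability treated as common cause;
    beta=0.0 recovers the pure independence assumption.
    """
    independent = prod(p * (1 - beta) for p in channel_failure_probs)
    common_cause = beta * max(channel_failure_probs)
    return independent + common_cause

channels = {"human check": 0.05, "hardware interlock": 0.002, "software monitor": 0.01}

print(parallel_failure(list(channels.values())))            # 1e-06: independence assumed
print(parallel_failure(list(channels.values()), beta=0.1))  # ~5e-03: dependence dominates
```

Even a modest common-cause fraction dominates the result, which is why identifying dependencies across related redundancy structures matters so much in design.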


While most of the human factors literature is concerned with the individual, the relevant unit in a hazardous system is often a group. To understand how a sociotechnical system works, and to design it appropriately, it is necessary to consider the dynamics of groups of different sizes (Westrum, 1997). Further attention to social factors, which are known to be important with respect to human redundancy, is therefore required. It may be useful to examine human redundancy in different types of teams, characterised using suitable dimensions.

Besides a qualitative understanding of human redundancy, also of interest are the relative and absolute quantitative reliability enhancements arising from the use of different forms of human redundancy (including cognitive diversity implemented in different ways) and from the influence of local and organisational factors.
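To indicate what such quantification involves, consider the simplest human redundancy structure: a performer whose work is checked by a second person. One established treatment of the dependence between the two is the THERP dependence model of Swain and Guttman (1983), cited in the references; the worked figures below are illustrative assumptions, not results from this paper.

```latex
% Failure of a performer--checker pair. Under independence, the joint
% human error probability (HEP) is the product of the individual HEPs:
\[
P_{\mathrm{sys}} = p_1\, p_2 .
\]
% In practice the checker's failure is conditioned on the performer's:
\[
P_{\mathrm{sys}} = p_1\, p_{2\mid 1},
\]
% where THERP approximates the conditional HEP at discrete levels of
% dependence:
\[
p_{2\mid 1} =
\begin{cases}
p_2 & \text{zero dependence},\\
(1 + 19\,p_2)/20 & \text{low dependence},\\
(1 + 6\,p_2)/7 & \text{moderate dependence},\\
(1 + p_2)/2 & \text{high dependence},\\
1 & \text{complete dependence}.
\end{cases}
\]
% Worked example: with p_1 = p_2 = 0.01, independence gives
% P_sys = 1e-4, while high dependence gives
% P_sys = 0.01 * (1.01/2) = 5.05e-3, roughly fifty times worse.
```

Quantifying how cognitive diversity and local and organisational factors shift the effective dependence level is exactly the kind of result the proposed studies would need to deliver.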
Part of the treatment of human redundancy in this paper is based on the dominant cognitivist model of human error (other parts, including the identification of organisational structures, the treatment of social factors, and the adaptation of concepts from hardware redundancy, are not). Davies et al. (2003) suggest that approaches other than the cognitivist model may be useful in human error studies, including distributed cognition (Hutchins, 1995, 2000; Rogers, 1997). In distributed cognition, the unit of analysis is broadened beyond the individual to account for complex, socially distributed cognitive activities. Rather than being located solely in an individual's head, cognition is regarded as distributed across individuals and the environment. The approach combines cognitive, social and organisational perspectives, describing the propagation of representational states across media both internal and external to individuals (human memories, charts, written notes and so forth). One general property of cognitive systems from a distributed cognition perspective is that knowledge within a cognitive system is variable and redundant; the pathways through which information flows can also be redundant (Hutchins and Klausen, 2000). Means by which errors can be detected include the comparison of independently computed representations. Distributed cognition may therefore be particularly well suited to describing human redundancy, and could provide insights that complement those presented in this paper.
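That detection mechanism, comparison of independently computed representations, can be made concrete with a toy sketch. The scenario below (two crew members deriving the same quantity from different media, a gauge and a written log) and all names and values are hypothetical, introduced only for illustration.

```python
# Toy illustration of error detection through comparison of independently
# computed representations, in the spirit of distributed cognition: the
# discrepancy between two representations, not any one person's vigilance,
# is what surfaces the error. Scenario and values are hypothetical.

def from_instrument(reading_bar: float) -> float:
    """First representation: pressure read directly from a gauge."""
    return reading_bar

def from_log(entries: list[float]) -> float:
    """Second, independently computed representation: pressure
    extrapolated from the trend recorded in the written log."""
    return entries[-1] + (entries[-1] - entries[-2])  # naive linear trend

TOLERANCE = 0.5  # bar: permitted disagreement between representations

gauge = from_instrument(12.7)
logged = from_log([11.0, 11.5])  # extrapolates to 12.0

if abs(gauge - logged) > TOLERANCE:
    print(f"Discrepancy: gauge={gauge}, log trend={logged}; investigate")
else:
    print("Representations agree within tolerance")
```

The power of the comparison comes from the diversity of the media involved: representations computed through identical pathways would tend to fail together.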


At its current level of development, the human redundancy framework presented in this paper suggests certain sociotechnical system design strategies for making effective use of human redundancy in controlling errors, and its further development will amplify its utility in this respect. The validity of interventions to promote the effectiveness of human redundancy must be established. Interactions also exist between design for human redundancy and other design objectives. For example, a fundamental constraint on the use of cognitive diversity is the tension between, on the one hand, the need in an organisation for conformity and commonality in procedures, hardware and people (Turner and Pidgeon, 1997; Hickling, 1997) and, on the other, the need for diversity to deal with the complexities of the environment. Approaches to resolving this and other constraints on design for human redundancy that arise from conflicting requirements must be sought. There are also important practical questions in connection with implementation, such as cost-benefit, that should be investigated; in the context of the present work, the ability to strike a rigorous balance between error prevention and error recovery, including its cost-benefit aspects, is worthy of study.

7. Conclusions

A concept of human redundancy was proposed, including a description of human redundancy forms that incorporated the notion of human redundancy structures, active and standby human redundancy, duplication and overlap of functions, and cognitive diversity. This characterisation of human redundancy was integrated with an adapted error recovery process.

The foregoing approach has several advantages. First, the framework as a whole provides a basis for sociotechnical system design strategies to promote human redundancy in complex, hazardous systems. Second, the framework can be used in qualitative human reliability assessment of failures of human redundancy. Third, forms of human redundancy provide a way of structuring and describing the organisational initial conditions for error recovery; these initial conditions have a major influence on the way in which error recovery through human redundancy takes place and on its likelihood of success. Fourth, the concept of cognitive diversity is brought to the fore and integrated into the error recovery process. Fifth, multi-modal redundancy structures can be described, which provides, for example, a means of tying together recovery of failures in hardware, software and human sub-systems with respect to a given function.

Research in safety science concerned with error recovery is still at an early stage and, in particular, much remains to be done to progress our understanding of human redundancy in complex systems. The task of supporting human redundancy requires attention to many issues connected with forms of redundancy, error characteristics, and error recovery processes, including a wide range of local and organisational factors. Key issues in the future development of theory and of sociotechnical design strategies are social interactions (including communication in and beyond local teams), resource characteristics, resource management, the support of operators' mutual awareness, the use of cognitive diversity (including different types and distributions), the quantitative influence of different characteristics of human redundancy forms and of local and organisational factors, and in-depth integrated analysis of multi-modal redundancy structures. The results will contribute to more effective error management, including both improved error recovery and, in a wider context, the achievement of an optimum balance in risk management between error prevention, error recovery and automation.

Acknowledgement

I would like to thank Ned Hickling and an anonymous reviewer for helpful comments on this paper.

References

Beltracchi, L., Lindsay, R.W., 1992. A strategy for minimizing common mode human error in executing critical functions and tasks. Power Plant Dynamics, Control and Testing Symposium, May 27–29, Tennessee.
Brand, V.P., Matthews, R.H., 1993. Dependent failures—when it all goes wrong at once. Nuclear Energy 32 (3).
Davies, J., Ross, A., Wallace, B., Wright, L., 2003. Safety Management: A Qualitative Systems Approach. Taylor & Francis, London.
Embrey, D., Kontogiannis, T., Green, M., 1994. Guidelines for Preventing Human Error in Process Safety. American Institute of Chemical Engineers, New York.
Embrey, D.E., Lucas, D.A., 1987. The nature of recovery from error. In: Goosens, L.H.J. (Ed.), Human Recovery: Proceedings of the COST A1 Seminar on Risk Analysis and Human Error. Delft University of Technology, Delft.
Health and Safety Executive, 2001. Preventing the Propagation of Error and Misplaced Reliance on Faulty Systems: A Guide to Human Error Dependency. Offshore Technology Report 2001/053, HSE Books, Sudbury.
Hickling, E.M., 1997. Towards the Clarification of Human Dependence. COPSA'97, 7–9 October. Heriot-Watt University, Edinburgh.
Hollywell, P.D., 1993. Human dependent failures: a schema and taxonomy of behaviour. In: Lovesey, E.J. (Ed.), Contemporary Ergonomics 1993: Proceedings of the Ergonomics Society's 1993 Annual Conference, 13–16 April, Edinburgh, Scotland. Taylor & Francis, London.


Hollywell, P.D., 1996. Incorporating human dependent failures in risk assessments to improve estimates of actual risk. Safety Science 22 (1–3), 177–194.
Hutchins, E., 1995. Cognition in the Wild. The MIT Press, Cambridge, MA.
Hutchins, E., 2000. Distributed cognition. Available from: .
Hutchins, E., Klausen, T., 2000. Distributed cognition in an airline cockpit. Available from: .
Janis, I.L., 1972. Victims of Groupthink. Houghton Mifflin, Boston.
Jones, P.E., Roelofsma, H.M.P., 2000. The potential for social contextual and group biases in team decision-making: biases, conditions and psychological mechanisms. Ergonomics 43 (8), 1129–1152.
Kanse, L., Van der Schaaf, T.W., 2001. Recovery from failures in the chemical process industry. International Journal of Cognitive Ergonomics 5 (3), 199–211.
Kirwan, B., 1994. A Guide to Practical Human Reliability Assessment. Taylor & Francis, London.
Kontogiannis, T., 1997. A framework for the analysis of cognitive reliability in complex systems: a recovery centred approach. Reliability Engineering and System Safety 58, 233–248.
Kontogiannis, T., 1999. User strategies in recovering from errors in man–machine systems. Safety Science 32, 49–68.
La Porte, T., Consolini, P., 1991. Working in practice but not in theory: theoretical challenges of 'high-reliability organisations'. Journal of Public Administration Research and Theory 1 (1), 19–47.
Leitch, R.D., 1988. BASIC Reliability Engineering Analysis. Butterworths, London.
Leveson, N.G., 1995. Safeware: System Safety and Computers. Addison-Wesley, Reading, MA.
Medsker, G.J., Campion, M.A., 1997. Job and team design. In: Salvendy, G. (Ed.), Handbook of Human Factors and Ergonomics, second ed. John Wiley & Sons, New York, pp. 450–489.
Perrow, C., 1999. Normal Accidents: Living with High-Risk Technologies. Princeton University Press, Princeton.
Rasker, P.C., Willeboordse, E.W., 2001. Werkafspraken in de commandocentrale [Work arrangements in the Combat Information Centre]. Report no. TM-01-A002, TNO Human Factors, Soesterberg. Cited in: Schraagen, J.M., Rasker, P., 2004. Team design. In: Hollnagel, E. (Ed.), Handbook of Cognitive Task Design. Lawrence Erlbaum Associates, London, pp. 753–786.
Rasmussen, J., 1983. Skills, rules, knowledge: signals, signs and symbols and other distinctions in human performance models. IEEE Transactions on Systems, Man and Cybernetics SMC-13, 257–267.
Reason, J., 1990. Human Error. Cambridge University Press, Cambridge.
Reason, J., 1997. Managing the Risks of Organizational Accidents. Ashgate, Aldershot.
Reason, J.T., Embrey, D.E., 1985. Human Factors Principles Relevant to the Modelling of Human Errors in Abnormal Conditions of Nuclear and Major Hazardous Installations. Prepared for the European Atomic Energy Community.
Roberts, K.H., 1989. New challenges in organisational research: high reliability organisations. Industrial Crisis Quarterly 3 (2), 111–125.
Rogers, Y., 1997. A brief introduction to distributed cognition. Available from: .
Rognin, L., Blanquart, J.-P., 1999. Impact of communication on systems dependability: human factors perspectives. In: Felici, M., Kanoun, K., Pasquini, A. (Eds.), Proceedings of SAFECOMP'99, 27–29 September, Toulouse. Springer-Verlag, Berlin, pp. 113–124.
Sagan, S.D., 1993. The Limits of Safety: Organizations, Accidents, and Nuclear Weapons. Princeton University Press, Princeton, NJ.
Salas, E., Fiore, S.M., 2004. Team Cognition: Understanding the Factors that Drive Process and Performance. American Psychological Association, Washington, DC.
Sasou, K., Reason, J., 1999. Team errors: definition and taxonomy. Reliability Engineering and System Safety 65, 1–9.
Schraagen, J.M., Rasker, P., 2004. Team design. In: Hollnagel, E. (Ed.), Handbook of Cognitive Task Design. Lawrence Erlbaum Associates, London, pp. 753–786.
Stewart, G.L., Manz, C.C., Sims Jr., H.P., 1999. Team Work and Group Dynamics. John Wiley & Sons, New York.
Swain, A.D., Guttman, H.E., 1983. Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications, NUREG/CR-1278. US Nuclear Regulatory Commission, Washington, DC.
Turner, B.A., Pidgeon, N.F., 1997. Man-Made Disasters, second ed. Butterworth-Heinemann, Oxford.
Urban, J.M., Bowers, C.A., Cannon-Bowers, J.A., Salas, E., 1995. The importance of team architecture in understanding team processes. In: Beyerlein, M.M., Johnson, D.A., Beyerlein, S.T. (Eds.), Advances in Interdisciplinary Studies of Work Teams, vol. 3: Knowledge in Work Teams. JAI Press, London, pp. 205–228.


Van der Schaaf, T.W., Kanse, L., 2000. Errors and error recovery. In: Elzer, P.F., Kluwe, R.H., Boussoffara, B. (Eds.), Human Error and System Design and Management. Springer-Verlag, London, pp. 27–38.
Villemeur, A., 1992. Reliability, Availability, Maintainability and Safety Assessment, vol. 2. John Wiley & Sons, Chichester.
Weick, K.E., 1987. Organizational culture as a source of high reliability. California Management Review 24, 112–127.
Westerman, S.J., Shryane, N.M., Crawshaw, C.M., Hockey, G.R.J., Wyatt-Millington, C.W., 1995. Cognitive diversity: a structured approach to trapping human error. In: SAFECOMP'95: Proceedings of the 14th International Conference on Computer Safety, Reliability and Security. Springer, London, pp. 142–155.
Westerman, S.J., Shryane, N.M., Crawshaw, C.M., Hockey, G.R.J., 1997. Engineering cognitive diversity. In: Redmill, F., Anderson, T. (Eds.), Safer Systems: Proceedings of the Fifth Safety-Critical Systems Symposium. Springer, London, pp. 111–120.
Westrum, R., 1997. Social factors in safety-critical systems. In: Redmill, F., Rajan, J. (Eds.), Human Factors in Safety-Critical Systems. Butterworth-Heinemann, Oxford, pp. 233–256.
Zapf, D., Reason, J.T., 1994. Introduction: human errors and error handling. Applied Psychology: An International Review 43 (4), 427–432.