International Journal of Industrial Ergonomics 38 (2008) 1067–1077
An algorithm for classifying error types of front-line workers based on the SRK framework

Tarcisio Abreu Saurin, Lia Buarque de Macedo Guimarães, Marcelo Fabiano Costella, Lucimara Ballardin

Industrial Engineering and Transportation Department, Federal University of Rio Grande do Sul, Av. Osvaldo Aranha no. 99, 51 andar, Porto Alegre, RS, CEP 90035-190, Brazil
Article info
Article history: Received 28 August 2007; received in revised form 12 January 2008; accepted 28 February 2008; available online 21 April 2008

Abstract
Although there are many classifications of error types, literature provides little guidance on how to systematically classify an event into the proposed error type categories. This research introduces an algorithm for classifying error types of front-line workers involved in occupational incidents, based on the skill–rule–knowledge (SRK) framework. While the original version of the algorithm was tested in a heavy machinery manufacturer (study 1), in which 36 accidents were analyzed, an improved version was tested at an oil distribution company (study 2), in which the analysis encompassed 20 accidents and 14 near misses. The resulting distribution of error types in both studies 1 and 2 was respectively as follows: slips (42% and 12.2%); memory lapses (0% and 2.4%); violations (17% and 7.3%); knowledge-based errors (11% and 0%); and no worker error (30% and 78.1%). The incident causes attributed by both companies’ safety staffs were re-classified based on a uniform terminology and then they were associated with the sub-systems of a socio-technical system. The results of this analysis for both studies 1 and 2 were respectively as follows: technological sub-system (37.3% and 42.8% of all causes); work design subsystem (31.4% and 38.1% of all causes); personnel sub-system (31.4% and 19.0% of all causes). The external environment sub-system was not associated with any cause probably because it was ignored during the original investigations conducted by the safety staff of the enterprises. In study 2, an analysis was also carried out to track the pathways followed by the investigators through the algorithm. In 63.3% of the investigations, the analysts had to answer 5 out of 10 questions when using the algorithm.
Keywords: Error types; Incident investigation; Cognitive ergonomics; Manufacturing industry; Oil distribution industry
Relevance to industry

This paper introduces a tool that helps to elucidate the nature of front-line workers' involvement in occupational incidents. In particular, the tool allows the classification of the error types involved in each incident, which is important since different error types require different safety management strategies. The application of the tool typically takes a few minutes and the investigators must answer no more than seven questions for each incident. Two case studies, one at a heavy machinery manufacturer and the other at an oil distribution plant, illustrate how the tool should be applied and how the results should be analyzed.

© 2008 Elsevier B.V. All rights reserved.
Corresponding author. Tel.: +55 51 3316 3490; fax: +55 51 3308 4007.
0169-8141/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.ergon.2008.02.017

1. Introduction

There is wide agreement within ergonomics literature that human errors are symptoms of trouble deeper inside a system rather than the cause of a mishap. Therefore, human errors are just the starting point for accident investigations (Dekker, 2002). Although definitions of human error may be ambiguous and elusive (Rasmussen et al., 1994), Reason (1990) provides a broad definition, proposing it as a generic term to encompass all those occasions in which a planned sequence of mental or physical activities fails to achieve its intended outcome, and when these failures cannot be attributed to the intervention of some chance agency. Such failure will usually be easier to label as a human error in well-structured technical systems in comparison to less structured work environments, such as administration or maintenance (Rasmussen et al., 1994). A number of studies have found human error to be a major contributing factor in accident causation in several industries (Suraji et al., 2001; Lawton and Parker, 1998; Rasmussen et al., 1994; Sanders and McCormick, 1993). However, the actual
participation of human error as a causal factor has varied within a wide range among different industries. For instance, while Suraji et al. (2001) concluded that human errors were contributing factors in 29.9% of 500 construction accidents investigated in the UK, Shappel and Wiegmann (1996) concluded that between 60% and 80% of aviation accidents are attributable, at least in part, to human error. Besides the particularities of each industry, this variation might also be due to the subjectivity involved in determining causes of accidents. In fact, since it is not possible to establish objective rules to terminate the search for causal explanations, the decision to stop and to accept one or more events as the causes depends entirely on the discretion of the analyst. Sometimes the search is terminated when an event or act that appears to be a familiar explanation is found; sometimes it stops when a cure to prevent an event is known; and sometimes simply because the causal path disappears due to the lack of information (Rasmussen et al., 1994). Nevertheless, accident causes attributed to human errors and organizational failures have increased in complex systems (Hollnagel, 2004). This is due to the fact that after passing the early stages of new technological developments, in which technical failures are the main causes of accidents, the focus switches to human error and human–machine mismatches, and eventually to organizational factors (Reason, 1990). Of course, it could be argued that when new technological developments are implemented, organizations and their employees require a period of time to adjust to the changes and so the cause of errors during this period might also have a strong human component (Paries and Amalberti, 2000). 
Thus, for leading companies in terms of safety in several industries that have mature technologies, such as aviation, petrochemical and nuclear power generation, technical failures are no longer a major cause of accidents (Hollnagel, 2004). A core issue dealt with by many studies on human error is the classification scheme of error types adopted. The term error type relates to the presumed origin of an error within the stages involved in conceiving and then carrying out an action sequence (Reason, 1990). An effective classification scheme can be of value in organizing data on human errors and for giving useful insights into the ways in which errors are caused and how they might be prevented (Sanders and McCormick, 1993). According to Reason (1997), different error types require different kinds of management. For instance, Atkinson (1998) suggests that violations are often caused by the natural human tendency to take the path of least effort, being usually aided by a relatively indifferent environment that rarely punishes violations or rewards observances. Moreover, the design of mistake-proof devices, which is a well-known approach to make the boundaries of performance error-tolerant, should be based on data on frequency and severity of errors. A classification scheme is also important because different errors types imply different degrees of culpability. Reason (1997) proposes an assessment protocol for culpability, ranging from blameless errors to sabotage and malevolent damage. Regardless of the importance of classification schemes, there is no universally agreed upon classification of human error types (Reason, 1990). The literature abounds with such taxonomies, reflecting a variety of practical concerns and theoretical orientations and ranging from the highly task specific to broad statements of underlying error tendencies (Reason, 1990). 
Baker and Krokos (2007) point out the fact that methods are necessary to compare taxonomies, regarding which are best for which purposes. Nevertheless, the classification of an error into the categories proposed by the existing classification schemes is not always straightforward. There is little guidance in the literature on how to minimize the subjectivity involved in error classifications by means of systematic methods. Shappell et al. (2007) identified and classified errors involved in commercial aviation accidents
based on inferences made from accident reports by a panel of experienced pilots who received basic training in human factors. Reason (1990) reports some studies in which errors were identified and classified based on their descriptions by front-line workers involved in laboratory studies.

Another drawback of current studies on error classification and error counting is that they have neglected manufacturing industry and occupational accidents in favor of high-risk complex systems, such as medicine, aviation and the military. As a result, relatively little empirical data on human error is available for manufacturing. This drawback also limits the empirical validation of human error concepts and classifications in other settings. For instance, the complexity of human error investigation tends to differ depending on whether manufacturing or high-risk complex systems are in focus. This is because organizational accidents are usually the main concern in high-risk systems, while individual accidents are the typical focus in manufacturing. On the one hand, human error investigations tend to be more complex in organizational accidents, which have multiple causes (without clear causal links) involving many people operating at different levels of their respective companies. On the other hand, individual accidents might be fairly easily associated with particular types of unsafe acts that have a very large impact. In individual accidents, a specific person or group is often both the agent and the victim of the accident. Also, the people at risk, the hazards and the dangerous situations are well known (Reason, 1997).

Considering this context, this paper introduces an algorithm for classifying errors of front-line workers involved in incidents (in this article, this term encompasses both accidents and near misses), based on the skill-based (SB), rule-based (RB) and knowledge-based (KB) framework proposed by Rasmussen (1982).
It is worth noting that, in this paper, the term ‘‘front-line workers’ errors’’ refers to what is often referred to in literature as errors of the sharp end people. According to Woods et al. (1994), sharp end refers to the people who actually interact with the hazardous process. Hollnagel (2004) adds that sharp end people are engaged in work at the time and place where the accidents take place. The blunt end refers to the people who affect safety through their effect on the constraints and resources acting on the practitioners at the sharp end (Woods et al., 1994). The blunt end people are often the ones who create what Reason (1997) named as latent conditions (e.g. poor design and gaps in supervision), which may be present for many years before they combine with local circumstances and sharp end failures to penetrate the system’s many layers of defenses (Reason, 1997).
Fig. 1. Summary of the principal error types (Reason, 1997). [Tree diagram: errors divide into skill-based slips and lapses (attentional slips of action; lapses of memory) and mistakes (rule-based mistakes; knowledge-based mistakes).]
Fig. 2. Varieties of rule-related behaviors (Reason, 1997). [Flow chart. Questions: Was the task or situation covered by procedures or training? Were they appropriate for the task in hand? Were they followed as intended? Was the bad procedure followed? Was some other formal procedure followed? Was some informal work practice followed? Outcomes and annotations: correct and successful performance; skill-based slip or lapse; RB mistake (misapplication of a good rule); mispliance; misvention; correct violation; improvisation, knowledge-based processing (better than 50% chance of mistakes, with a high proportion of commission errors); latent organizational condition leading to lack of trust in procedures and greater reliance on informal work practices.]
Although the proposed algorithm is based on a similar one proposed by Reason (1997), it emphasizes practical implementation issues that were not dealt with by that author. The algorithm was tested in two case studies conducted in Brazil, in sectors that have been neglected by previous studies: a heavy machinery manufacturer and an oil distribution plant.
2. Classification of human errors adopted in this study

This study adopts the proposal of Reason (1990), who differentiates error types according to the performance levels at which they occur. According to Reason (1990), a classification based on the cognitive mechanisms involved in error production has a higher level of abstraction than either classifications based on observable features of erroneous behavior (e.g. repetitions and omissions) or classifications that draw attention to local contextual factors. Due to this characteristic, incident investigators and researchers, rather than front-line workers and supervisors, are the typical user populations of this type of classification (Reason, 1990). Fig. 1 summarizes the error types proposed by Reason (1997). Fig. 2 summarizes the varieties of rule-related behaviors, offering two principal routes for action: one that goes directly to correct and successful performance, and another showing a variety of pathways to what are mostly unsafe behaviors (Reason, 1997). It is worth noting that Reason (1990) presents Fig. 2 with the primary aim of explaining the theoretical routes that lead either to erroneous or successful performance, rather than as a practical tool to investigate and classify human errors.
3. Research method

3.1. Overview of the research method

An underlying assumption of the proposed algorithm is that it is not always straightforward to determine in advance whether
there was a front-line worker error or not. Although this could be obvious for some events, this preliminary filtering was not made in this study, since the more the algorithm was tested, the greater would be the opportunities for its validation. Moreover, since all available events were analyzed based on the algorithm, it would be possible to identify the proportion of events involving sharp end people’ errors in relation to those that did not involve their errors. The first version of the algorithm was tested at a heavy machinery manufacturer (field study 1) and, based on this study, a new version of the algorithm was developed and tested at an oil distribution plant (field study 2). In addition to the fact that those companies were representative of sectors that have not been focused by previous studies, they were also chosen because they were developing other joint research projects with the team that conducted this study. This was very positive for this study, since the researchers could become more acquainted with both companies’ processes and management practices. This also supported the understanding of the broader organizational context in which the investigated incidents took place. Concerning the company involved in field study 1, it is worth noting that, since 1998, the enterprise has maintained a partnership for carrying on research projects on ergonomics with the laboratory that carried out this study. In both studies, the algorithm was used to re-analyze incidents that had been investigated by members of the safety staff of the companies. The results of those investigations were documented in the safety and health department of both enterprises. The causes of incidents attributed by the enterprises’ safety staffs were re-classified in order to establish a common terminology. 
The re-classification was based on a checklist of potential causes that were empirically identified in action research studies conducted by two of the authors in six construction sites, in which they developed and implemented a safety planning and control model (Saurin et al., 2004, 2005). In the re-classification, the number of causes attributed to each incident was consistent with the number of causes that were originally attributed by
the companies, i.e. usually two causes per event in field study 1 and usually one cause per event in field study 2. In field study 1, this re-classification was based on a consensus among the research team and members of the enterprise's safety and health department. In field study 2, only the research team's perceptions were considered. It is worth emphasizing that the adopted procedure cannot be considered a root cause analysis and was not intended to be one, since no systematic method was adopted to take into account all possible causes and to establish their causal links through the several layers from the sharp end to the blunt end. In fact, it is likely that an effective accident investigation would be able to detect additional causes in relation to those detected by both companies' safety staffs.

In order to better understand the complex scenarios where control can be lost, the re-classified incident causes were also analyzed based on their association with the four sub-systems of a socio-technical system proposed by Hendrick and Kleiner (2001):

(a) The personnel subsystem, comprising the workforce characteristics, such as degree of professionalism, demographic characteristics and psychosocial aspects.
(b) The technological subsystem, which consists of the physical environment and the workstation characteristics, including the machinery, tools, equipment, and the degree of automation.
(c) The organizational/work design subsystem, which refers to the design of an organization's work system structure in terms of complexity, formalization and centralization.
(d) The external environment subsystem that affects organizational functioning. It is basically composed of five types of environments: socioeconomic, educational, political, cultural and legal.

The subsystems are interconnected, but it is possible to say that the external environment strongly influences the first three subsystems.

3.2. Characteristics of the enterprises under study

3.2.1. Enterprise 1

The heavy machinery manufacturer was a transnational company that had approximately 2200 workers and whose main products were tractors and harvesters. Since 2001, this company has been reorganizing its production system, adopting the lean production philosophy as a basis. Fig. 3 shows a partial view of the shop floor near the harvesters' assembly line. In the plant studied, 125 incidents that were registered by the enterprise's safety and health department in 2004 were analyzed. This department comprised one safety engineer and four safety specialists, all of them with at least five years of experience as practitioners. The enterprise had a registration form with a general description of the incident, a causal analysis that identified both a behavioral and a technical cause, and the registration of the incident's responsibility. Whenever the form did not provide enough information for a proper analysis, the research team reconstructed the incidents based on interviews and meetings with safety personnel to clarify the facts. In a few cases, reconstruction also demanded a visit to the incident's location and a discussion with co-workers and technical clerks. Of course, the interval between the date of the incident and the date of the interviews (i.e. from January to April 2005) may have influenced the results of the reconstruction. On the one hand, the interviewees may have forgotten important details of the event. On the other hand, the temporal distance from the event may have resulted in more mature perspectives, setting aside hasty judgments made in the aftermath of an undesired event.

Fig. 3. Shop-floor view nearby the harvesters' assembly line (field study 1).

3.2.2. Enterprise 2

The oil distribution plant had 17 workers who operated both a roadway and a railway distribution terminal. However, in the roadway terminal these workers just supervised the truck loading, which was carried out by the truck drivers. The roadway terminal received 450 trucks every day on average and the trucks' waiting time in the queue ranged from 1 to 6 h. The railway terminal received, on average, 22 wagons every day (each wagon had a 60,000 l capacity) and the operators of the oil distribution plant undertook the loading of oil into the wagons. In this study, 37 incidents reported between October 2003 and April 2006 were analyzed.
Incident reports included general information such as date, time, location, involved personnel, reports, incident description, causes, consequences and proposed solution measures. Unlike in field study 1, at the oil distribution plant the researchers had no opportunity to reconstruct the incidents with the workers involved in the events and with the enterprise's safety staff, which encompassed four safety specialists, all of them with more than 5 years of experience. However, in field study 2 it was possible to analyze each incident from the point of view of all workers involved in the accident scene, since this information was available in the investigation forms. Therefore, in every case involving more than one person, the tool was applied as many times as the number of people involved.

3.3. Original version of the algorithm

The algorithm assumes that the rules were devised well before the moment of execution, typically by managers and without inputs from front-line workers. In fact, this perspective of rules is described in the algorithm by the term procedures, which can be developed by a number of well-known techniques, such as the preliminary hazard analysis (PHA) (Kolluru et al., 1996). As a result of this assumption, a core idea of the algorithm is that there was no worker error when the procedure was badly devised. In the case of application of bad procedures by workers, the error should be attributed to the rule developers, who in turn may have committed this error at any of the SRK levels. So, workers' errors at the rule-based level may only involve the misapplication of normally good procedures or the failure to apply good procedures.

The algorithm is basically a flow chart (see Fig. 4) of a series of questions, starting from question 1: "does the task have a procedure and/or were workers trained?" In this question, the word "task" has a broad meaning, referring to a set of operations carried out in a certain workstation. If the workstation has a PHA, the answer to question 1 should be yes, even though not all possible operations in that workstation were included in the PHA. This means that the answer to question 1 should be yes if the accident happened during an unexpected operation that was undertaken during a broader task that had a PHA. Another assumption was that training would exist either if the worker was formally trained (e.g. through safety meetings and lectures) or if he/she had practical experience in the task being undertaken. If the answer to question 1 is no, it is a totally blameless event, as the worker does not have the minimum information required to carry out the task.

Fig. 4. Original version of the algorithm. [Flow chart. Questions: (1) Does the task have a defined procedure and/or training? (2) Was the procedure and/or training adequate? (3) Was the procedure and/or training followed? (4) Was there any worker's failure? (5) Did the problem happen in the context of a new, unpredictable situation? (6) Would another worker behave the same way in a same situation? (7) Was the error intentional? (8) Was the error committed by the injured worker? Outcomes: no error from the injured worker; slip; KB error; memory lapse; violation from the injured worker; violation from another worker.]

The next step is to define, using question 2, whether the procedure is really applicable: "was the procedure and/or training adequate?" If the answer is negative and the bad procedure was followed leading to the accident, there was a mispliance (Reason, 1997) and the worker is not to blame for the accident. Considering that the procedure and/or training were adequate, question 3 ("was the procedure and/or training followed?") divides the
decision tree in two branches. In the case that the procedure was not followed, the error could be eventually classified as a violation or a memory lapse. In the case that it was followed, a slip or a knowledge-based error might have occurred. In the case that the procedure and/or training were adequate and had been followed, question 4 should be made: ‘‘was there any worker failure?’’ In the case that there was a human failure, it is necessary to analyze whether it happened in the context of an unanticipated situation. In the case of a routine situation carried out at the skill-based level, the error should be classified as a slip. In the case of a new unpredictable situation, the error should be classified as a knowledge-based error. In the other branch of possible errors, when the procedure and/or training are not followed, the substitution test is used (Reason, 1997) for question 6: ‘‘would another worker behave the same way in a same situation?’’ If the test is positive, it strongly suggests that the error was induced by problems in the production system, and therefore, the accident is not due to any type of worker error.
If the result of the substitution test is negative, two situations are possible: the worker might have forgotten how to carry out the task or might have deliberately skipped one or more steps. Question 7 ("was the error intentional?") is probably the most difficult to answer, since it might involve the problem of culpability. Of course, rather than just directly asking the worker involved in the event, it is also necessary to look for clues that could lead to a conclusion about whether or not there was an intention to violate the procedures. If the conclusion is that the error was not intentional, there might have been a memory lapse (because of fatigue, for example) which prevented the worker from operating properly. If the conclusion is that the error was intentional, question 8 should be asked: "was the error committed by the injured worker?" The answer to this question indicates whether the violation was carried out by the injured worker or by someone else.
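The sequence of questions described in this section can be sketched in code. The following Python fragment is only an illustrative reading of the text and of Fig. 4, not the authors' implementation: the function name, the use of a dictionary keyed by question number and the outcome labels are hypothetical, and the treatment of a "no" answer to question 4 as "no worker error" is an assumption, since the text does not spell out that branch.

```python
from typing import Dict

NO_ERROR = "no worker error"
SLIP = "slip"
KB_ERROR = "KB error"
MEMORY_LAPSE = "memory lapse"
VIOLATION_INJURED = "violation by the injured worker"
VIOLATION_OTHER = "violation by another worker"


def classify_original(answers: Dict[int, bool]) -> str:
    """Classify one incident from yes/no answers keyed by question number (1-8).

    Only the questions on the path actually followed need to be answered;
    a missing answer on that path raises KeyError instead of being guessed.
    """
    if not answers[1]:      # Q1: no procedure and/or training for the task
        return NO_ERROR
    if not answers[2]:      # Q2: procedure and/or training was inadequate
        return NO_ERROR
    if answers[3]:          # Q3: procedure and/or training was followed
        if not answers[4]:  # Q4: no worker failure (assumed outcome)
            return NO_ERROR
        # Q5: new, unpredictable situation -> KB error, routine -> slip
        return KB_ERROR if answers[5] else SLIP
    if answers[6]:          # Q6: substitution test is positive
        return NO_ERROR
    if not answers[7]:      # Q7: the error was not intentional
        return MEMORY_LAPSE
    # Q8: was the intentional error committed by the injured worker?
    return VIOLATION_INJURED if answers[8] else VIOLATION_OTHER
```

For example, an adequate procedure that was followed, with a worker failure in a routine situation, would be passed as `{1: True, 2: True, 3: True, 4: True, 5: False}` and classified as a slip under these assumptions.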
3.4. Improved version of the algorithm

Fig. 5 presents the improved version of the algorithm. A major modification from the original flow chart was the introduction of a set of questions that builds a third stream not previously considered. The original flow chart has two major branches that lead to different results: one considers that the worker did not follow the procedures and/or training while the other stream considers that the worker did. The third proposed branch starts in question 1 ("was the worker aware of the procedures content and was he/she trained?") and considers the knowledge the worker has regarding the task in hand. This question aims at emphasizing that a procedure has no meaning if the worker is not aware of it. If the worker is not acquainted with the task, question 9 must be asked: "was the worker assigned by his/her superior to carry out this task?" If the answer is no, then there was a violation, since the worker made the decision to carry out the task without being assigned to do it and without being prepared for carrying it out. In the case of a positive answer to question 9, the fault cannot be attributed to the worker, since he/she was assigned by someone else to carry out a task of which he/she was not capable.

Fig. 5. Improved version of the algorithm. [Flow chart. Questions: (1) Was the worker aware of the procedures content and/or was he/she trained? (2) Was the procedure and/or training adequate and applicable? (3) Was the procedure and/or training followed? (4) Was there any technical failure? (5) Did the problem happen in the context of a new, unpredictable situation? (6) If the procedure or training had been followed, would the incident happen? (7) Would another worker behave the same way in a same situation? (8) Was the error intentional? (9) Was the worker assigned by his/her superior to carry out this task? (10) Was there any other worker involved? Outcomes: no worker error; slip; KB error; memory lapse; violation; END.]

Another important change in the improved version was the substitution of question 4. In the new version it was stated as "was there any technical failure?" rather than "was there any worker failure?" This change was made both because technical failures were usually less ambiguous than workers' failures, which made it easier to apply the algorithm, and because the researchers realized that technical failures played a major role in many incidents in field study 2. A minor change made in the new version was the introduction of question 10 ("was there any other worker involved in the incident?"), which leads back to the beginning of the flow chart in order to evaluate the same event from another involved person's point of view. Due to this change, the term injured worker that was used in the original version was replaced by the word worker.
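The two changes that the text describes most explicitly, the new branch formed by questions 1 and 9 and the loop introduced by question 10, can be sketched as follows. This is a hedged illustration rather than the authors' tool: the names `prescreen` and `classify_incident` are hypothetical, and the remaining questions (2 to 8) are deliberately left to a caller-supplied function because their branch outcomes are not fully specified in the text.

```python
from typing import Callable, Dict, List, Optional

Answers = Dict[int, bool]  # yes/no answers keyed by question number


def prescreen(answers: Answers) -> Optional[str]:
    """Apply questions 1 and 9 of the improved version.

    Returns a classification, or None when the worker was aware of the
    procedure and/or trained, so the analysis continues at question 2.
    """
    if answers[1]:   # Q1: worker aware of the procedure and/or trained
        return None
    if answers[9]:   # Q9: assigned to the task by his/her superior
        return "no worker error"
    return "violation"  # the worker took on the task without being assigned


def classify_incident(
    per_worker_answers: List[Answers],
    classify_rest: Callable[[Answers], str],
) -> List[str]:
    """Question 10: repeat the analysis once for every worker involved."""
    results = []
    for answers in per_worker_answers:
        outcome = prescreen(answers)
        if outcome is None:
            # Questions 2 to 8 are delegated to a hypothetical callable.
            outcome = classify_rest(answers)
        results.append(outcome)
    return results
```

Keeping questions 2 to 8 behind a callable avoids guessing at branches of Fig. 5 that the text does not describe, while still showing that one incident can yield one classification per involved worker.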
4. Results 4.1. Field study 1 Only 36 out of the 125 incidents selected at the heavy machinery manufacturer could be fully analyzed. This was due to the fact that the information presented in most accident records was not sufficiently detailed and clear. In fact, the enterprise did not fully use the historical data on incidents as a means to provide feedback to its safety management system. Considering the 36 accidents selected, 47% of them were classified as first aid cases—the worker is medicated and returns to his/her workstation; 19% were classified as 1-day away from work cases; 17% were classified as more than 1-day away from work cases; and 17% resulted only in material damage. In 50% of the cases selected, the conclusion stated in the reports was that the cause was ‘‘lack of attention’’ of the employee and, therefore, the accident happened because of his/her fault. The results from the re-analysis of accidents at the heavy machinery manufacturer using the proposed method are summarized in Table 1. The slips were the most frequent error type (41.7%) and their typical consequences were small injuries to the worker or to the property (e.g. hitting or scratching shelves with piling machines). Of course, a high incidence of skill-based errors is expected in labor-intensive repetitive manufacturing environments, such as the plant investigated in this study. This is due to the fact that performance at the skill-based level is supposed to take place most of the time in repetitive environments that demand automatic behaviors. As a basis for comparison, averaging three laboratory studies conducted by other researchers, Reason (1990) found the relative proportions of the three error types to be 60.7% SB errors, 27.1% RB errors and 11.3% KB errors. The studies reported by Reason (1990) involved statistical problem solving,
Table 1
Distribution of errors after re-analysis of incidents (field study 1)

Error types | Frequency | %
Slips | 15 | 41.7
No worker error | 11 | 30.6
KB error | 4 | 11.1
Violation by the injured worker | 4 | 11.1
Violation by other worker | 2 | 5.6
Total | 36 | 100.0
database manipulation and steel mill production planning. In field study 1, the proportion was 41.7% SB, 16.7% RB and 11.1% KB. Table 2 presents the distribution of causes of the accidents after re-analysis. The causes of slips were mostly associated with the personnel sub-system (55% of all slips' causes), in particular with inadequate operation of tools and equipment. Causes of KB errors were evenly distributed among the technological, personnel and work system design sub-systems. Overall, KB errors were characteristic of more serious and complex accidents, with and without leave from work. Concerning violations, 57% of all causes were related to the technological sub-system and 43% to the personnel sub-system. No violation cause was assigned to the work system design sub-system. When no error of front-line workers was detected, 53% of all causes were related to the technological sub-system and 47% to the work system design sub-system. The situations not involving worker error typically resulted in accidents without leave from work and small injuries. Although the causes of the accidents that did not involve worker error were very distinct, 27% of them were classified as supplier's problems, especially badly dimensioned belt supports for equipment and the existence of chemicals that caused allergies despite the use of protective gloves. Overall, the data presented in Table 2 indicated that, with the exception of the external environment sub-system, the remaining three sub-systems played a similar role as causal factors. The absence of causes related to the external environment reflects the fact that it was fully ignored by the accident investigations conducted by the safety staff. Perhaps investigators felt that the external environment should not be taken into account, since it is much more difficult for the company to control than the other elements of the socio-technical system. 
However, it is likely that deeper investigations could detect relevant factors related to the external environment, such as job insecurity due to variations in demand. In particular, during the analyzed period, a major reduction in demand due to droughts in the main agricultural regions of Brazil had forced the investigated plant to dismiss about 800 workers. This scenario may have created additional performance pressure on the remaining workers, which could have had a negative impact on their safety. Another conclusion from Table 2 is that some categories of causes were associated with just one specific result of the algorithm: ineffective communication with KB error; failure of the external suppliers with no worker error; nonexistent safety procedures with no worker error; inadequate execution procedure with no worker error; failure in task planning with KB error; poor layout planning or poor work station design with slips; and unsafe acts with violations by the injured worker.
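The one-to-one associations listed above, and the share of causes per sub-system, can be recomputed from the detailed rows of Table 2. The following is a minimal sketch, not the authors' procedure: the dictionary layout, the function names `single_error_type_causes` and `subsystem_shares`, and the column order are assumptions made for illustration, and the sub-system totals are recomputed from the detailed rows rather than copied from the table.

```python
# Counts per cause category, transcribed from the detailed rows of Table 2
# (field study 1). The data layout and all names are assumptions made for
# this sketch; sub-system totals are recomputed, not copied from the table.
ERROR_TYPES = [
    "slips",
    "no worker error",
    "KB error",
    "violation by the injured worker",
    "violation by other worker",
]

TABLE_2 = {
    "technological": {
        "ineffective communication": [0, 0, 1, 0, 0],
        "materials badly stored or improperly moved": [1, 2, 0, 1, 2],
        "use of tools and equipment poorly maintained": [1, 2, 1, 0, 0],
        "failure either to implement or maintain safeguards": [2, 0, 1, 1, 0],
        "failure of the external suppliers": [0, 4, 0, 0, 0],
    },
    "work system design": {
        "nonexistent safety procedures": [0, 1, 0, 0, 0],
        "not identified hazard": [1, 1, 2, 0, 0],
        "ineffective or inexistent training": [1, 2, 0, 0, 0],
        "inadequate execution procedure": [0, 3, 0, 0, 0],
        "failure in task planning": [0, 0, 1, 0, 0],
        "poor layout planning or poor work station design": [4, 0, 0, 0, 0],
    },
    "personnel": {
        "inadequate operation of tools and equipment": [9, 0, 1, 1, 0],
        "unsafe act": [0, 0, 0, 1, 0],
        "execution out of the proper sequence or lack of execution of some stage":
            [1, 0, 2, 1, 0],
    },
}


def single_error_type_causes(table):
    """Map each cause that occurs under exactly one error type to that type."""
    result = {}
    for causes in table.values():
        for cause, counts in causes.items():
            nonzero = [i for i, n in enumerate(counts) if n > 0]
            if len(nonzero) == 1:
                result[cause] = ERROR_TYPES[nonzero[0]]
    return result


def subsystem_shares(table):
    """Percentage of all causes assigned to each sub-system (one decimal)."""
    totals = {
        name: sum(sum(counts) for counts in causes.values())
        for name, causes in table.items()
    }
    grand_total = sum(totals.values())
    return {name: round(100 * n / grand_total, 1) for name, n in totals.items()}


if __name__ == "__main__":
    for cause, error_type in single_error_type_causes(TABLE_2).items():
        print(f"{cause} -> {error_type}")
    print(subsystem_shares(TABLE_2))
```

With a larger sample, the same check could be rerun to see whether a cause category remains tied to a single error type or spreads across several.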
4.2. Field study 2

At the oil distribution plant, 3 out of the 37 reported incidents could not be used in the study due to lack of information. This was a much better rate (92% of the reports could be used) than in field study 1 (only 29% of the available reports could be used). Of the 34 incidents analyzed, 32 happened inside the plant area and the other two happened on roadways outside the plant. It is important to emphasize that the algorithm was used in 41 re-analyses because in some incidents more than one person played a role in the accident scene (e.g. a truck driver and an oil distribution operator). In fact, 29 events (85%) had just one person involved, 4 events (12%) had two people involved, and 1 event (3%) had four people involved. Unlike in field study 1, the health and safety department at the oil distribution plant also registered near misses. Considering the period of time analyzed in this study, 20 incidents
Table 2
Distribution of errors after re-analysis according to their causes (field study 1)

Causes | Slips | No worker error | KB error | Violation by the injured worker | Violation by other worker | Total | Total (%)
Technological | 4 | 8 | 3 | 2 | 2 | 19 | 37.3
  Ineffective communication | 0 | 0 | 1 | 0 | 0 | 1 | 2.0
  Materials badly stored or improperly moved | 1 | 2 | 0 | 1 | 2 | 6 | 11.8
  Use of tools and equipment poorly maintained | 1 | 2 | 1 | 0 | 0 | 4 | 7.8
  Failure either to implement or maintain safeguards | 2 | 0 | 1 | 1 | 0 | 4 | 7.8
  Failure of the external suppliers | 0 | 4 | 0 | 0 | 0 | 4 | 7.8
Work system design | 6 | 7 | 3 | 0 | 0 | 16 | 31.4
  Nonexistent safety procedures | 0 | 1 | 0 | 0 | 0 | 1 | 2.0
  Not identified hazard | 1 | 1 | 2 | 0 | 0 | 4 | 7.8
  Ineffective or inexistent training | 1 | 2 | 0 | 0 | 0 | 3 | 5.9
  Inadequate execution procedure | 0 | 3 | 0 | 0 | 0 | 3 | 5.9
  Failure in task planning | 0 | 0 | 1 | 0 | 0 | 1 | 2.0
  Poor layout planning or poor work station design | 4 | 0 | 0 | 0 | 0 | 4 | 7.8
Personnel | 11 | 0 | 3 | 3 | 0 | 16 | 31.4
  Inadequate operation of tools and equipment | 9 | 0 | 1 | 1 | 0 | 11 | 21.6
  Unsafe act | 0 | 0 | 0 | 1 | 0 | 1 | 2.0
  Execution out of the proper sequence or lack of execution of some stage | 1 | 0 | 2 | 1 | 0 | 4 | 7.8
Total | 20 | 15 | 9 | 5 | 2 | 51 | 100.0
Table 3
Distribution of errors after re-analysis of incidents (field study 2)

Error types | Frequency | %
Slips | 5 | 12.2
Lapses | 1 | 2.4
No worker error | 32 | 78.1
Violations | 3 | 7.3
Total | 41 | 100.0
out of the 34 were classified as accidents and 14 were classified as near misses. Of the 20 accidents, 14 were classified as first-aid cases and four as absence leave cases, but there was no information that could define their severity. Table 3 presents the results of the re-analysis of incidents based on the application of the improved version of the algorithm. In this study, 32 (78.1%) out of the 41 people involved in the incidents did not err. These data differ substantially from field study 1, in which events with worker errors were about 2.5 times more frequent. Such differences can be partially due to the characteristics of both work environments. In fact, workers in field study 1 seemed to be more prone to err since their tasks were more labor-intensive (i.e. assembly of tractors and harvesters) than the tasks conducted by workers in field study 2 (supervision and pumping oil to trucks). The absence of KB errors in field study 2 may be a result of less task variety at the oil distribution plant in comparison with the heavy machinery manufacturer. The fairly low incidence of violations in field studies 1 and 2 (17% and 7.3%, respectively) may be a result of many factors, such as well-designed rules, effective oversight of workers' performance and effective safety cultures in both companies. According to Cooper (2000), safety culture is a sub-facet of organizational culture, which is thought to affect members' attitudes and behavior in relation to an organization's ongoing health and safety performance. When safety is a core organizational value, individuals are less prone to unnecessarily violate good procedures. Table 4 presents the causes of incidents after re-analysis. The only new category of cause that was included in this study refers to equipment breakdowns (9.5% of all causes), which reflects the
fact that at the oil distribution plant the technological sub-system played a stronger role than in field study 1. The data in Table 4 are consistent with those in Table 3, since both indicate that the personnel sub-system was not the most important factor contributing to the incidents investigated. The major category of cause in this study concerns poor layout planning or poor workstation design (26.2% of all causes). This results from the fact that, at the oil distribution plant, designers privileged the technological sub-system. While the location of major equipment, such as trucks, oil tanks and pumps, was certainly a major concern for designers, operators often did not have proper access to their workstations. As in field study 1, the safety staff of the oil distributor did not register any cause related to the external environment sub-system. Deeper investigations could have detected some potential contributing factors related to the external environment, such as frequent traffic jams on the roadways near the plant, which could make truck drivers more tired and nervous. Also as in field study 1, some categories of causes were associated with a single error type: ineffective communication and no worker error; equipment breakdown and no worker error; nonexistent safety procedures and no worker error; and inadequate tools and equipment and no worker error. Considering both studies together, five categories of causes were associated with a single error type: equipment breakdown and no worker error; nonexistent safety procedures and no worker error; failure of the external suppliers and no worker error; inadequate execution procedure and no worker error; and failure in task planning and KB error. These data suggest that causes of events with no worker error tend to be less ambiguous and easier to identify than causes of events in which some error occurred. 
If those patterns were consistently identified in a larger sample of events, the algorithm could also make causal investigations easier, since a limited number of categories of causes could be associated with each error type. In this study, an analysis was also carried out to track the pathway followed by the investigators through the algorithm (Table 5). This analysis showed that in 63.3% of the investigations, the analysts had to answer five questions when using the algorithm. The minimum was three questions and the
Table 4
Distribution of errors after re-analysis according to their causes (field study 2)

Causes | Lapses | Slips | No worker error | Violation | Total | Total (%)
Technological | 0 | 1 | 16 | 2 | 18 | 42.8
  Ineffective communication | 0 | 0 | 1 | 0 | 1 | 2.4
  Use of tools and equipment poorly maintained | 0 | 0 | 5 | 2 | 7 | 16.7
  Failure either to implement or maintain safeguards | 0 | 1 | 5 | 0 | 6 | 14.3
  Equipment breakdown | 0 | 0 | 4 | 0 | 4 | 9.5
Work system design | 0 | 1 | 15 | 1 | 16 | 38.1
  Nonexistent safety procedures | 0 | 0 | 1 | 1 | 2 | 4.8
  Inadequate tools and equipment | 0 | 0 | 3 | 0 | 3 | 7.1
  Poor layout planning or poor work station design | 0 | 1 | 10 | 0 | 11 | 26.2
Personnel | 1 | 3 | 1 | 3 | 8 | 19.0
  Unsafe act | 0 | 1 | 0 | 3 | 4 | 9.5
  Execution out of the proper sequence or lack of execution of some stage | 1 | 2 | 1 | 0 | 4 | 9.5
Total | 1 | 5 | 32 | 6 | 42 | 100.0
Table 5
Pathways followed by investigators in the algorithm

Pathway | Frequency | %
1-2-3-4-10 | 21 | 51.2
1-2-3-6-10 | 5 | 12.2
1-2-3-4-5-10 | 4 | 9.7
1-2-3-6-7-8-10 | 4 | 9.7
1-9-10 | 4 | 9.8
1-2-10 | 2 | 4.9
1-2-3-6-7-10 | 1 | 2.5
Total | 41 | 100.0
maximum was seven questions. This indicates that the use of the algorithm is usually not time-consuming, even though answers are not always straightforward. Some pathways were found to be associated with certain types of incidents, such as the pathway 1-2-3-4, which was mostly associated with incidents in which technical failures took place. The frequency with which each question had to be answered was also calculated: question 1 (100%), question 2 (90.2%), question 3 (85.4%), question 4 (61.0%), question 5 (9.8%), question 6 (24.4%), question 7 (12.2%), question 8 (9.8%), question 9 (9.8%) and question 10 (100%). These data indicate the most frequently used branches of the algorithm. For instance, when question 3 was asked ("was the procedure and/or training followed?"), 71.4% of the answers were yes and the investigators proceeded to question 4 ("was there any technical failure?"). In order to provide a better understanding of the data presented in Tables 3–5, some examples of the re-analysis of errors at the oil distribution plant are presented as follows:
(a) No worker error (code 1-2-3-4-10)
When the operator was closing the cap of a fuel tank, the structure that held the cap open broke and the cap fell on the operator's foot. According to the accident investigation conducted by the safety staff, the collapse of the cap was caused by a welding failure during its assembly. In this case, the causes were far removed from the worker involved in the accident, who was aware of the procedures to handle the cap and followed them properly. The cause of this event was assigned to the technological sub-system, in the category use of tools and equipment poorly maintained.

(b) No worker error (code 1-2-3-6-10)
An operator riding a bicycle through the site was distracted while giving directions to a truck driver who was coming into the site. As a result, the bicycle went over iron bars that covered a manhole and the operator fell down. Even though the operator did not follow the procedure, which prescribed that he should not talk to anyone else while biking, the research team considered that the accident would have happened anyway, since the manhole was badly located, which constituted a latent condition. The cause of this event was assigned to the work system design sub-system. For instance, due to poor layout planning or poor workstation design, the manhole was poorly located and uncovered, leaving gaps between the iron bars. Similarly, poor layout planning or poor workstation design resulted in poor direction and sign systems for the truck drivers.

(c) No worker error (code 1-2-3-4-10)
When a truck driver was attaching the filling pump to the tank on his truck, there was a leakage. The investigation conducted by the safety staff concluded that the pump could not be properly attached because it had a missing part. Even though this error was likely committed by the maintenance personnel, the accident investigation form was filled out only from the perspective of the truck driver. Because there was not enough information to analyze the event from the perspective of maintenance, the cause was attributed to the technological sub-system, in the category use of tools and equipment poorly maintained.

(d) Slip (code 1-2-3-4-5-10)
The truck driver mistakenly typed 1230 dl, rather than 230 dl, as the volume of oil to be filled in the tank of the truck. As a result, there was an oil leak. In this case, it was a routine situation and the truck driver was aware of the procedure, which, despite being adequate, was not followed. The cause was assigned to the personnel sub-system, in the category execution out of the proper sequence or lack of execution of some stage.

(e) Memory lapse (code 1-2-3-6-7-8-10)
A truck driver maneuvered to leave the filling station without un-plugging the ground wire. 
Since that wire was entangled with the pipes that pumped fuel to the tank on the truck, that failure damaged the pipes, resulting in a leakage. In this case, the driver was aware of the procedure (i.e. un-plug the ground wire) and the procedure was adequate, but it was not followed. In the substitution test, it was assumed to be unlikely that another driver would commit the same failure. This was a typical memory lapse and the only error of this type detected in both field studies. Although memory lapses are internal events that are usually difficult to observe, in this case the classification of the described
event as a lapse was fairly easy, since the procedure was both clear and adequate and the consequences of the lapse (i.e. leakage) were immediately visible. The cause was assigned to the personnel sub-system, in the category execution out of the proper sequence or lack of execution of some stage.

(f) Violation by the injured worker (code 1-9-10)
A truck had mechanical problems while pumping fuel and an operator decided to help the truck driver by manually pushing the truck. This operator seriously hurt his shoulder at the moment the truck started to move. Even though the written procedures did not prescribe how to proceed when trucks were stuck due to mechanical problems, the tacit rules of behavior led the operator to do what made sense to him at that moment (i.e. pushing the truck to release the filling station as soon as possible). It is worth noting that the rule violated by the injured worker was that he performed a task that was not his responsibility. Two main causes were assigned to this event: one was related to the work system design sub-system (nonexistent safety procedures) and the other was related to the technological sub-system (use of tools and equipment poorly maintained), since the truck had a mechanical breakdown.

(g) Violation by the injured worker (code 1-2-3-6-7-8-10)
A sub-contracted worker used a 25-cm-wide plank to cross over a drainage pipe carrying a wheelbarrow full of pebbles. The worker lost his balance and his left leg fell into the ditch. The investigation found that the plank was used as an improvisation to shorten the path. In this case, the worker did not follow a good procedure, of which he was aware. Therefore, the cause was assigned to the personnel sub-system, in the category unsafe act.

Based on the above descriptions, it is apparent that some of the causes are not really related to upstream managerial failures. For instance, in event (d) the mistyping could have other causes (e.g.
poor display design) that were not feasible to be identified because not enough information had been collected during the accident investigations. Of course, drawbacks for the identification of causes were the low quality of the information available in some of the incident investigation forms and the time lapse of several months or even a few years since the incidents occurred.
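The pathway counts in Table 5 are enough to recompute how often each question was asked and how many questions each re-analysis required. The sketch below is illustrative only: the names `PATHWAYS`, `question_shares` and `questions_per_analysis` are hypothetical, and every share is assumed to be taken over the 41 re-analyses, so rounded values may differ slightly from figures computed on another basis.

```python
from collections import Counter

# Pathway frequencies transcribed from Table 5 (field study 2).
# Each key lists, in order, the questions answered in one re-analysis.
PATHWAYS = {
    "1-2-3-4-10": 21,
    "1-2-3-6-10": 5,
    "1-2-3-4-5-10": 4,
    "1-2-3-6-7-8-10": 4,
    "1-9-10": 4,
    "1-2-10": 2,
    "1-2-3-6-7-10": 1,
}


def question_shares(pathways):
    """Percentage of re-analyses in which each question was asked."""
    total = sum(pathways.values())
    asked = Counter()
    for path, count in pathways.items():
        for question in path.split("-"):
            asked[int(question)] += count
    return {q: round(100 * asked[q] / total, 1) for q in sorted(asked)}


def questions_per_analysis(pathways):
    """Number of re-analyses grouped by how many questions were answered."""
    lengths = Counter()
    for path, count in pathways.items():
        lengths[len(path.split("-"))] += count
    return dict(lengths)


if __name__ == "__main__":
    print(question_shares(PATHWAYS))
    print(questions_per_analysis(PATHWAYS))
```

Grouping by pathway length makes it easy to see that five-question pathways dominate and that no re-analysis needed fewer than three or more than seven questions.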
5. Conclusions

The proposed algorithm for classifying error types is a tool that might help to better understand the causes of incidents. In particular, it helps to elucidate the nature of front-line workers' involvement in the incident, with no intention of assigning blame. This might be an important first step in a thorough investigation aimed at all layers from the sharp end to the blunt end. The algorithm was developed and tested in the analysis of 36 accidents at a heavy machinery manufacturer and 34 incidents (accidents and near misses) at an oil distribution plant. Both field studies made it clear that the successful retrospective use of the algorithm depends to a great extent on well-conducted incident investigations. Of course, ideally the algorithm should be used as a tool during the actual investigation conducted by the enterprises' safety staff. This would also minimize the time lapse between the incident and the use of the algorithm, which was a drawback in both field studies. The use of the algorithm was not usually time-consuming. In fact, no more than seven questions needed to be answered for each incident in both field studies and this often took only a few
minutes. Nevertheless, it is worth noting that all incidents investigated were typical occupational incidents with low severity. The re-analysis of incidents also showed that data resulting from the algorithm might be useful for driving safety management strategies. Since the cognitive mechanisms involved in errors at each of the SRK levels are different, it is reasonable to assume that preventive measures will also have a different emphasis depending on whether they focus on skill-, rule- or knowledge-based failures. For instance, skill-based errors (such as the memory lapse detected in field study 2) are prone to be tackled through mistake-proof devices, since at the SB level the task is carried out in an automatic fashion and mistake-proof devices, by definition, ensure safe work independently of a worker's attention. Similarly, rule-based errors (such as in the case of the sub-contracted worker who used a 25-cm-wide plank to cross over a drainage pipe carrying a wheelbarrow full of pebbles) typically call for either improved oversight of workers' performance or improved procedures. This in turn demands continuous monitoring of adherence to procedures, in order to reduce the gap between work as imagined by managers and work as performed by workers. Knowledge-based errors, which were only detected in field study 1, are indeed the most difficult to tackle. However, remedial measures also exist, such as training workers to detect procedures that are no longer adequate and applicable. If possible, the system should be designed to allow work to be stopped, so people can undertake the slow mental processing typical of the KB level. This proposal is feasible in assembly lines such as the ones that existed at the heavy machinery manufacturer, since their partial or full stoppage has no strict technological limitations. 
Concerning the quantitative results of both field studies, it was concluded that a high proportion of events (30% in field study 1 and 78% in field study 2) had no error of front-line workers as a major contributing factor. The distribution of error types in studies 1 and 2 was respectively as follows: slips (42% and 12.2%); memory lapses (0% and 2.4%); violations (17% and 7.3%); knowledge-based errors (11% and 0%). The causes of all incidents were also classified based on their association with the sub-systems of the socio-technical system. The results of this analysis for studies 1 and 2 were respectively as follows: technological sub-system (37.3% and 42.8% of all causes); work design sub-system (31.4% and 38.1% of all causes); personnel sub-system (31.4% and 19.0% of all causes). It is worth noting that lack of attention, which was a cause attributed by the safety staff to 50% of the investigated incidents in study 1, was not identified as a cause after the re-analysis. As previously mentioned, all causes associated with the personnel sub-system in study 1 accounted for much less than 50%. The external environment sub-system was not associated with any cause, probably because it was ignored during the original investigations conducted by the safety staff of the enterprises. Since the empirical data provided by this study are not statistically representative of the industries that were investigated, additional data will be necessary so that patterns might be either confirmed or not. In particular, it is suggested that the algorithm be used as a basis to investigate the relationships among error types and categories of causes. Of course, there is also an opportunity to investigate the relationships among error types and other variables, such as workers' age, expertise, time and day of the week, similarly to what has been done by a number of studies concerning accident statistics. 
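The error-type distributions summarized above can be reproduced from raw counts. The sketch below is a hedged illustration rather than the authors' computation: study 1 counts come from Table 1 with the two violation rows merged, while for study 2 the count of 32 cases with no worker error is the one stated in the text and the count of 3 violations is an assumption inferred from the reported 7.3% of 41 re-analyses. The names `STUDY_1`, `STUDY_2` and `distribution` are hypothetical, and one-decimal rounding may differ from the reported figures in the last digit.

```python
# Error-type counts per field study. Study 2's violation count (3) is an
# assumption inferred from the reported 7.3% of 41 re-analyses.
STUDY_1 = {
    "slips": 15,
    "memory lapses": 0,
    "violations": 6,  # 4 by the injured worker + 2 by another worker
    "knowledge-based errors": 4,
    "no worker error": 11,
}
STUDY_2 = {
    "slips": 5,
    "memory lapses": 1,
    "violations": 3,
    "knowledge-based errors": 0,
    "no worker error": 32,
}


def distribution(counts):
    """Return each category's share of the total, in percent (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for label, counts in (("study 1", STUDY_1), ("study 2", STUDY_2)):
        print(label, sum(counts.values()), distribution(counts))
```

Keeping counts rather than percentages as the stored data makes it straightforward to add further studies and to check that each distribution is based on the intended number of re-analyses.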
Another opportunity for future studies concerns the development of a similar tool to classify the types of successful performances, such as correct compliances, correct violations and correct improvisations. The better understanding of mechanisms and relative frequencies of different types of successful performances is a key to enable the
development of innovative safety management strategies. In fact, a single broader algorithm could be developed to encompass events with both successful and erroneous performances. Moreover, an investigation could be made to assess what adaptations are necessary to apply the algorithm to more complex events, especially those related to organizational accidents. In this respect, it must be emphasized that in the field studies the algorithm was applied considering just the last error committed by front-line workers involved in the incident scene. In more complex events, it may be necessary first to detect the chain of errors committed by every single worker and then apply the algorithm to each error.

References

Atkinson, A., 1998. Human error in the management of building projects. Construction Management and Economics 16, 339–349.
Baker, D., Krokos, K., 2007. Development and validation of aviation causal contributors for error reporting systems (ACCERS). Human Factors 43 (2), 185–199.
Cooper, M.D., 2000. Towards a model of safety culture. Safety Science 36, 111–136.
Dekker, S., 2002. The Field Guide to Human Error Investigations. Ashgate, London.
Hendrick, H., Kleiner, 2001. Macroergonomics: An Introduction to Work System Design. Human Factors and Ergonomics Society, Santa Monica.
Hollnagel, E., 2004. Barriers and Accident Prevention. Ashgate, Aldershot.
Kolluru, R., Bartell, S., Pitblado, R., Stricoff, R., 1996. Risk Assessment and Management Handbook. McGraw-Hill, New York.
Lawton, Parker, D., 1998. Individual differences in accident liability: a review and integrative approach. Human Factors 40 (4), 655–671.
Paries, J., Amalberti, R., 2000. Aviation safety paradigms and training implications. In: Sarter, N.B., Amalberti, R. (Eds.), Cognitive Engineering in the Aviation Domain. Lawrence Erlbaum Associates, Mahwah, NJ, pp. 253–286.
Rasmussen, J., 1982. Human errors: a taxonomy for describing human malfunction in industrial installations. Journal of Occupational Accidents 4, 311–333.
Rasmussen, J., Pejtersen, A., Goodstein, L., 1994. Cognitive Systems Engineering. Wiley, New York, 378pp.
Reason, J., 1990. Human Error. Cambridge University Press, Cambridge.
Reason, J., 1997. Managing the Risks of Organizational Accidents. Ashgate, Burlington, p. 252.
Sanders, McCormick, E., 1993. Human Factors in Engineering and Design, seventh ed. McGraw-Hill, New York.
Saurin, T.A., Formoso, C.T., Guimarães, L.B.M., 2004. Safety and production: an integrated planning and control model. Construction Management and Economics 22 (2), 159–169.
Saurin, T.A., Formoso, C.T., Cambraia, F., 2005. Analysis of a safety planning and control model from the human error perspective. Engineering, Construction and Architectural Management 12 (3), 283–298.
Shappel, S., Wiegmann, D., 1996. US naval aviation mishaps 1977–2: differences between single and dual piloted aircraft. Aviation, Space, and Environmental Medicine 67, 65–69.
Shappell, S., Detwiler, C., Holcomb, K., Hackworth, C., Boquet, A., Wiegmann, D., 2007. Human error and commercial aviation accidents: an analysis using the human factors analysis and classification system. Human Factors 43 (2), 227–242.
Suraji, A., Duff, R., Peckitt, S., 2001. Development of causal model of construction accident causation. Journal of Construction Engineering and Management 127 (4), 337–344.
Woods, D.D., Johannesen, L.J., Cook, R.I., Sarter, N.B., 1994. Behind human error: cognitive systems, computers, and hindsight. Crew System Ergonomics Information Analysis Center, Wright-Patterson Air Force Base, OH.