Detection tasks in nuclear power plant operation: Vigilance decrement and physiological workload monitoring



Safety Science 88 (2016) 97–107


Safety Science journal homepage: www.elsevier.com/locate/ssci

Lauren Reinerman-Jones, Gerald Matthews ⇑, Joseph E. Mercado

Institute for Simulation and Training, University of Central Florida, United States

Article info

Article history: Received 7 September 2015; accepted 4 May 2016

Keywords: Nuclear power plant; Human performance; Vigilance decrement; Workload; Psychophysiology; Individual differences

Abstract

Nuclear power plant (NPP) operators perform a variety of tasks that differ in mental workload. These include detection tasks that may be vulnerable to vigilance decrement. The present study used a simulation of NPP operation to investigate possible loss of vigilance during detection. Metrics used to assess operator functioning included subjective measures of workload and stress, physiological indices of workload, and objective performance. Detection, checking and response implementation tasks were compared, in the context of a simulated Emergency Operating Procedure (EOP). Study findings suggested three conclusions. First, detection imposed higher subjective workload and distress than other tasks, but physiological data suggested more complex differences between tasks. Second, vigilance decrements in detection performance were observed within 5-min task ‘steps’. However, analyses of physiological metrics suggested that multiple temporal processes may operate. Third, there were consistent individual differences in task-induced workload responses. Implications of the findings for evaluating NPP interface designs and monitoring operators are discussed. © 2016 Elsevier Ltd. All rights reserved.

1. Introduction

The operation of nuclear power plants (NPPs) raises a variety of human factors issues, including potential loss of vigilance and alertness. A primary function of the human NPP reactor operator (RO) is monitoring the state of the plant to determine whether it is operating correctly. Operators routinely monitor an array of control panels and computer displays, to detect system parameters that may deviate from normal operational states (O’Hara and Higgins, 2010; O’Hara et al., 2008). Recent literature has identified a vigilance aspect of detection tasks (Reinerman-Jones et al., 2013). Vigilance is traditionally defined as the operator’s ability to maintain the focus of attention and to remain alert to stimuli over prolonged periods of time (Warm et al., 2008). Loss of vigilance may be expressed in failures to detect critical stimuli or ‘signals’, for example, a gauge indicating that steam pressure exceeds an acceptable value. Often, vigilance deteriorates over time (‘vigilance decrement’: Warm et al., 2008), leading to an increased error rate in signal detection. In the context of NPPs, vigilance is renamed ‘detection’ and is operationally defined as continuous monitoring of a control parameter for identification of changes (Reinerman-Jones et al., 2013). The present study investigated operators’ vulnerability to vigilance decrement over short time durations when performing a detection task, using multiple metrics.

⇑ Corresponding author at: Institute for Simulation and Training, University of Central Florida, 3100 Technology Parkway, Orlando, FL 32826, United States. E-mail address: [email protected] (G. Matthews). http://dx.doi.org/10.1016/j.ssci.2016.05.002 0925-7535/© 2016 Elsevier Ltd. All rights reserved.

1.1. Vigilance in the control room

Vigilance decrement is readily demonstrated using laboratory signal detection tasks (See et al., 1995), and in field studies in industrial, military, transportation and medical contexts (Warm et al., 2008). Vigilance may be important for NPP operation (Laughery et al., 2002; Mumaw et al., 2000; Thornburg et al., 2012), but controlled empirical studies of the issue are lacking. The applied relevance of laboratory studies of vigilance remains controversial, in part because of the much greater complexity of real systems and displays (Donald, 2008). On the basis of field observations, Mumaw et al. (2000, pp. 42–42) concluded that “monitoring during normal operations was a complex, cognitively demanding task that was better characterized as active problem solving than as passive vigilance.” Specifically, these authors determined that effective monitoring, or what recent literature calls detection, depended on understanding the current status of the plant, which defined which signals were critical at any given time. Operators also devised a wide range of strategies to ease detection demands, such as manipulating alarm set points and leaving sticky notes to flag unusual indicators. Mumaw et al. (2000) also drew attention


to differences between routine detection, for example during equipment testing, and the monitoring of alarm signals, which are typically salient.

Mumaw et al.’s (2000) analysis indicates the dangers of naïve generalization from laboratory findings. At the same time, several recent research advances suggest the potential relevance of vigilance to the NPP domain. First, much basic and applied research can be accommodated within a common workload-resource model (Hancock, 2013; Warm et al., 2008). Although vigilance assignments appear superficially undemanding, they often impose high workloads that over time induce cognitive fatigue and increased vulnerability to error in signal detection. That is, vigilance is not necessarily a ‘passive’ task, but one that depends on active, effortful interrogation of task stimuli. Workload is a known issue for the NPP operator (Hwang et al., 2008; Lin et al., 2011), and its impact may extend to the detection task performed by the operator (Ha and Seong, 2010). Second, when task workload is high, vigilance decrement may be observed even on tasks of short duration (Matthews et al., 1990; Temple et al., 2000). Brain metabolic activity during certain vigilance tasks declines over periods as short as a few minutes (Warm et al., 2012). Third, due to high workload and limited personal control of the task, vigilance tasks are often stressful (Hancock, 2013; Warm et al., 2008). Similar stress factors appear to predict operator performance in both laboratory and real-world settings (Matthews et al., 2013). Fourth, as in other human factors domains, the increasing automation of NPPs is changing the role of the RO from an active controller of the system to a more remote monitor of automated systems (Jou et al., 2009; Lin et al., 2010). The changing role of the RO is likely to produce greater vulnerability to loss of vigilance.

1.2. Multivariate assessment of mental workload and stress

Valid assessment of mental workload is operationally important, although multiple metrics are necessary to evaluate workload and stress (Matthews et al., 2015, 2002; Taylor et al., 2010). Mental workload is typically defined as the total demand for limited cognitive resources imposed by the tasks performed by the operator (Wickens et al., 2013). Subjective measures such as the NASA Task Load Index (NASA-TLX: Hart and Staveland, 1988) are commonly used for assessment. The validity of the NASA-TLX is well-established (Vidulich and Tsang, 2012) and it is sensitive to task load variation in NPPs (Gao et al., 2013; Hwang et al., 2008; Lin et al., 2011). However, subjective scales are prone to the biases of self-report, suggesting a need for objective indicators. Typically, performance levels decline with increasing workload, but performance and workload changes may dissociate (Horrey et al., 2009). Psychophysiological measures also provide objective workload metrics. Simulation studies (Gao et al., 2013; Hwang et al., 2008) have shown that electrocardiographic (ECG) and ocular indices are sensitive to manipulations of task complexity. Studies from other process control environments (e.g., Hockey et al., 2009) suggest that electroencephalographic (EEG) measures such as frontal theta also reflect operator workload. Slow-wave EEG activity has been linked specifically to vigilance decrement (Kamzanova et al., 2014). ECG and EEG metrics may be supplemented by hemodynamic measures of metabolic activity in brain areas supporting attention (Warm et al., 2012). Loss of vigilance is frequently accompanied by declining cerebral bloodflow velocity (CBFV), measured using transcranial Doppler sonography (TCD: Warm et al., 2012). Another hemodynamic index linked to mental workload is level of blood oxygenation in frontal areas measured by functional near infrared (fNIR) spectroscopy (Ayaz et al., 2012).
However, few studies have compared these various workload metrics for their sensitivity and diagnosticity, and psychometric evidence suggests that they may be only weakly inter-related (Matthews et al., 2015). Multivariate assessment of workload may be necessary to evaluate the demands of different elements of the NPP operator’s task (Hwang et al., 2008). In addition, continuous psychophysiological recording of the operator may be diagnostic of loss of alertness (Matthews et al., 2010; Reinerman-Jones et al., 2011). Diagnostic monitoring may identify ROs who are failing to sustain attention effectively, and in need of support from other team members or technological aids.

1.3. Aims and hypotheses

The current study used a simulation of NPP operations performed by a RO and Senior Reactor Operator (SRO) working as a team (Reinerman-Jones et al., 2013). The simulation is designed to support the primary tasks of operators identified by O’Hara et al. (2008): monitoring and detection, situational assessment, response planning, and response implementation. It also supports a further key task, checking an instrument or control to verify that it is in the appropriate state (Reinerman-Jones et al., 2013). The SRO initiated the tasks via three-way communication. For example, prior to each task the SRO initiates an instruction to the RO, the RO signals understanding of the instruction, and the SRO confirms the comprehension statement. At this point, the RO performs the task in the simulator. The present study aimed to compare workload responses to a detection task with responses to two routine task elements, checking a single control, and implementing a single response (opening or closing a switch). Specific issues addressed were as follows:

Workload profiles of tasks. We aimed to assess EEG, ECG and hemodynamic responses to detection, checking and response implementation tasks, along with subjective stress and workload.
Given that even short-duration detection tasks impose high mental demands (Warm et al., 2008), we hypothesized that the detection task would show the highest level of workload, evident in objective indices such as decreased heart rate variability (HRV: Hwang et al., 2008), EEG frontal theta (Gevins and Smith, 2003) and increased frontal blood oxygenation (Warm et al., 2012). We also hypothesized that detection would elicit stress, as indexed by the key factors for subjective stress: distress, worry and loss of task engagement (Matthews et al., 2002, 2013). In naturalistic settings, tasks include both communication and task execution. Detection, however, may involve periods during which the task is executed without communication. To test the hypotheses, we performed both a ‘naturalistic’ comparison of the three tasks, including communication and execution, and a more restricted comparison of the execution phase of the detection task only with the other two. (It was not possible to separate communication and execution phases of the relatively short checking and response implementation tasks.)

Vigilance during detection. We tested for changes in neurocognitive indices of alertness in a sequence of five-minute intervals, during which the detection task was executed. It was hypothesized that performance would decline over time, along with changes in neurocognitive indices diagnostic of loss of alertness, especially increased slow-wave EEG activity and decreased CBFV (Kamzanova et al., 2014; Warm et al., 2012).

Predictors of neurocognitive functioning during detection. We tested whether indices of operator alertness during execution of the detection task could be predicted from indices secured from the relatively unchallenging checking and response implementation tasks.
It was hypothesized that indices of response to these tasks would be more predictive than baseline measures, given that task-induced workload responses show inter-individual consistency across different levels of task demand (Matthews et al., 2015).


2. Method

2.1. Participants

Participants were eighty-one (45 males, 36 females; age: M = 21, SD = 4.11) undergraduate and graduate students from the University of Central Florida. Participants were required to have normal or corrected-to-normal vision (including not being colorblind), and to have no prior experience using a NPP simulator or operating a power plant. They were also required to have abstained from nicotine for at least two hours before the experiment, and from alcohol and/or sedative medications for at least 24 h before the experiment.

2.2. Task simulation

2.2.1. Equipment and experimental scenarios

The GSE Generalized Pressurized Water Reactor (GPWR) simulator was customized for use in the present experiment. The simulator includes one standard desktop computer (6.4 GT/s, Intel Xeon™ 5600 series processor), two 24″ (16:10 aspect ratio) monitors, and one sound bar speaker. It was configured to require participants to follow a series of tasks and steps required when completing tasks from an Emergency Operating Procedure (EOP). EOPs are plant-specific procedures containing instructions for operating staff to implement preventive measures for managing accidents (International Nuclear Safety Advisory Group, 1999). For example, the EOP might guide operators through diagnosing and fixing a faulty aspect of NPP operation. The EOP was developed according to principles for making the tasks accessible to a novice population, as laid out by Reinerman-Jones et al. (2013). It utilized common tasks seen in EOPs, along with other realistic tasks provided by a Subject Matter Expert (SME), a former RO. Because of the use of a novice population, modifications to the EOP and the control panels included reducing the number of controls within one panel, adding additional tasks, and changing the naming convention of specific gauges and switches (Reinerman-Jones et al., 2013). The controls in each panel may be categorized into five groups: gauges, switches, light boxes, status boxes, and other controls. For this experiment, participants interacted with gauges, switches, and light boxes. Panel A2 contains 197 controls and panel C1 has 113 (53 gauges, 23 switches, 30 other controls, 5 light boxes, and 2 status boxes). To simplify the task, the number of controls on panel A2 was reduced to 113. The number of each category of control on this panel was reduced by c. 43%, so that modified panel A2 contained 62 gauges, 46 switches, 3 other controls, 2 light boxes, and 0 status boxes (see Fig. 1).
Thus, the proportions of each control category on the modified panel were equated to those of the original panel, and panels A2 and C1 were of similar complexity.
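The panel arithmetic above can be checked with a few lines of code. The sketch below is illustrative only: the counts are the ones reported in the text, while the dictionary keys and function names are hypothetical labels chosen for this example.

```python
# Control counts as reported in the text for the modified A2 panel and for C1.
# The dictionary keys are illustrative labels, not identifiers from the simulator.
modified_a2 = {"gauges": 62, "switches": 46, "other": 3,
               "light_boxes": 2, "status_boxes": 0}
panel_c1 = {"gauges": 53, "switches": 23, "other": 30,
            "light_boxes": 5, "status_boxes": 2}
ORIGINAL_A2_TOTAL = 197  # total controls on the unmodified A2 panel


def total_controls(panel):
    """Sum the controls across all categories of a panel."""
    return sum(panel.values())


def percent_reduction(original, reduced):
    """Percentage by which `reduced` is smaller than `original`."""
    return 100.0 * (original - reduced) / original


reduction = percent_reduction(ORIGINAL_A2_TOTAL, total_controls(modified_a2))
print(total_controls(modified_a2), total_controls(panel_c1), round(reduction, 1))
```

Both panels sum to 113 controls, and the overall reduction of A2 comes out at roughly 43%, in line with the figure quoted in the text.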


ROs refer to certain gauges and switches by their full name (e.g., moisture separator reheated bypass shut off valve). However, the names of those gauges and switches on the panels contain acronyms (e.g., MSR BYP SHUT OFF). To avoid the need to train participants on these acronyms, we made two modifications to the naming convention of gauges and switches that contained both an alphanumeric code and name. First, SROs were required to refer to all gauges and switches by their alphanumeric code (e.g., the STM HEADER PRESS gauge became gauge PI-464A1). Second, all gauges and switches that had an alphanumeric code of greater than seven characters were recoded to one of seven or fewer characters (e.g., gauge number EI-6963A1 SA was recoded to EI-6963), adhering to Miller’s (1956) rule for memory capacity of seven plus or minus two items. Controls with no code remained unchanged. Fig. 2 illustrates recoding.

2.2.2. Experimental design and tasks

Task type (checking, detection, response implementation) was manipulated within-subjects. To address order effects, three scenarios were generated, with different and realistic task orderings (see Table 1). The task types were only partially counterbalanced because checking always occurs before response implementation in a real NPP and thus, to maintain external validity, this task order was kept in each scenario. Each task was made up of four steps (subtasks). The physics of an NPP dictate that certain steps within each task type occur in a given order. As a result, the steps within each task type were the same across participants. Twenty-seven participants were assigned at random to each scenario.

The three tasks were configured as follows. The checking task required a one-time inspection of an instrument or control to verify that it was in the state specified by the EOP (e.g., “verify valve PCV-444B is shut”). Participants, acting as ROs, were required to indicate identification by clicking on the correct control and using three-way communication to communicate the state of the controls. The detection task required participants to correctly locate a control and then continuously monitor that control parameter for identification of a specified change. Participants were required to monitor the gauge for five minutes and detect changes in level by clicking on an ‘acknowledge’ button located at the bottom of the display (e.g., “verify gauge TI-430 SB and report when less than 400 PSIG”). Twelve random changes per minute occurred, totaling 60 changes per detection task. The response implementation task required participants to correctly identify a control, and then open or shut a switch on that control (e.g., “shut valve 1CS-235B”). Each task type consisted of four steps that were executed using three-way communication led by the experimenter acting as the SRO.

2.2.3. Performance measures

Execution and communication performance were secured. Execution measures are the primary focus of the present report. They

Fig. 1. Original A2 panel used by ROs (left) and modified A2 for experimentation.


Fig. 2. Illustration of recoding of codes for original gauges (left) as codes of seven or fewer characters (right).

Table 1
Partial counterbalancing of task types for scenario generation.

Scenario     First task   Second task               Third task
Scenario 1   Checking     Response implementation   Detection
Scenario 2   Detection    Checking                  Response implementation
Scenario 3   Checking     Detection                 Response implementation
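The partial counterbalancing in Table 1 and the random assignment of 27 participants to each scenario can be sketched as follows. This is a minimal illustration, not the authors' procedure: the shuffle-and-deal scheme, the seed, and the function names are assumptions made for the example.

```python
import random

# Task orderings taken from Table 1.
SCENARIOS = {
    1: ["checking", "response implementation", "detection"],
    2: ["detection", "checking", "response implementation"],
    3: ["checking", "detection", "response implementation"],
}


def checking_precedes_response(order):
    """True if checking occurs before response implementation in the ordering."""
    return order.index("checking") < order.index("response implementation")


def assign_scenarios(participant_ids, seed=0):
    """Shuffle participants, then deal them evenly across the scenarios.

    The seed is an arbitrary choice that makes the example reproducible.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    scenario_ids = sorted(SCENARIOS)
    return {pid: scenario_ids[i % len(scenario_ids)] for i, pid in enumerate(ids)}


assignment = assign_scenarios(range(1, 82))  # 81 participants, as in the study
group_sizes = {s: list(assignment.values()).count(s) for s in SCENARIOS}
print(group_sizes)
print(all(checking_precedes_response(order) for order in SCENARIOS.values()))
```

Dealing the shuffled list in rotation guarantees equal group sizes, and the final check confirms that every ordering respects the constraint that checking precedes response implementation.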

were recorded as follows. For checking, the simulator recorded correct identification of controls and erroneous identifications. For detection, the simulator recorded hits, misses, and false alarms. These data were recorded on a minute-by-minute basis to analyze for temporal changes in performance within steps. For response implementation, the simulator recorded correct and incorrect actions.

Three-way communication performance measures were obtained from the verbal interchanges prior to each step, in which the SRO instructs the RO, and following each step, in which the RO confirms step completion. Measures included instruction events per task, instruction events repeated, instruction clarifications, location help, and percent correct. Instruction events per task were the number of three-way communication events completed. Instruction events repeated were the number of requests by participants (ROs) for a repeated instruction plus the number of requests by the SRO for a repeated response from participants. An instruction clarification was a clarification by the SRO to a participant. Location help was the number of requests, by participants, for assistance in locating the correct control. Here, we report only the overall percentage of correct responses, across all parts of the three-way instructions, as an index of the quality of communication between RO and SRO.

2.3. Subjective measures

NASA-TLX (Hart and Staveland, 1988). The six subscales of the NASA-TLX require 0–100 ratings of mental demand, physical demand, temporal demand, performance, effort and frustration. Overall workload was computed as an unweighted mean of the six subscales (Nygren, 1991). The TLX was administered following the final step of each task.

Dundee Stress State Questionnaire (DSSQ: Matthews et al., 2002, 2013). The short, 21-item version of the DSSQ was used. It assesses task engagement (motivation, energy, concentration), distress (tension, negative mood, low confidence) and worry (self-focus, low self-esteem, intrusive thoughts about task and personal concerns). The DSSQ was administered as a baseline measure prior to performance, and following the final step of each task.

2.4. Physiological measures

EEG. The Advanced Brain Monitoring (ABM) B-Alert X10 wireless Bluetooth system was used at a sampling rate of 256 Hz, with 9 channels: Fz, F3, F4, Cz, C3, C4, Pz, P3, and P4. Reference electrodes were placed at each mastoid. High pass and median filters were applied, as well as 50, 60, 100, and 120 Hz notch filters. Artifacts were identified and removed using the ABM algorithms for artifacts associated with electromyography, eye blinks, excursions, saturations, and spikes (Advanced Brain Monitoring, Inc., 2011). Spectral Power Densities (SPDs) were computed by performing Fast Fourier Transforms (FFTs) for four bandwidths: theta (4–8 Hz), alpha (9–13 Hz), beta (14–30 Hz), and gamma (31–100 Hz). As in a previous study (Matthews et al., 2015), SPDs were highly intercorrelated across channels. Thus, SPDs averaged across all 9 channels are used in subsequent analyses.

ECG. The ABM system also captured ECG, sampled at 256 Hz. Single-lead electrodes were placed on the center of the right clavicle and on the lowest left rib (Henelius et al., 2009). Artifacts were identified and removed automatically by the ABM software. So and Chan’s (1997) algorithm was used to detect the QRS complex. Mean inter-beat interval (IBI) was computed as the mean of R–R intervals, excluding excessively short (<400 ms) and long (>1600 ms) beats. HRV was calculated as the SD of the inter-beat intervals during the recording period.

TCD. Spencer Technologies’ ST3 Digital Transcranial Doppler, model PMD150 (sampling at 1 Hz), measured CBFV in the middle cerebral arteries in the left and right hemispheres, through high pulse repetition frequency (PRF). The Marc 600 head frame set was used to hold the TCD probes in place. Poor quality data were removed automatically by the system. CBFV during performance was calculated as a percentage change in mean from an initial baseline with no performance imperative.

fNIR. The Somanetics Invos Cerebral/Somatic Oximeter, model 5100C (sampling at .2 Hz), was used to measure hemodynamic changes in oxygenated hemoglobin (oxy-Hb) and deoxygenated hemoglobin (deoxy-Hb) in the left and right hemisphere prefrontal cortex (Ayaz et al., 2012). Poor quality data were removed automatically by the system. Oxygen saturation (rSO2) during performance was calculated as a percentage change in mean from baseline.

2.5. Procedure

Participants completed the Ishihara color-blindness test and a demographics questionnaire. Participants were then trained for two hours using a PowerPoint presentation and the NPP simulator. The presentation provided an introduction to the procedures and protocols for participating in a NPP simulation for experimental research. Participants were trained to use three-way communication to clearly relay critical information, navigate within the simulator to locate and read status indicators, respond appropriately to a simulated NPP system warning by following standardized procedures, and complete questionnaires. Each aspect was trained separately and then a practice session combined all components. Feedback and proficiency tests were given after each portion. Participants’ scores had to be over 80% to move forward to the experimental scenario. After training, participants were given a five-minute break. The physiological sensors were connected and a five-minute resting baseline was taken before proceeding with the first task type of the experimental scenario. The steps within the task type were carried out through implementation of the three-way communication protocol initiated by the experimenter acting as the SRO.
Physiological responses were recorded continuously during task performance. NASA-TLX and DSSQ were administered after each task condition. The same process was followed for the next two task type conditions. In total, each participant’s session lasted four hours.

3. Results

3.1. Performance

Key performance measures are summarized in Table 2. Task effects were analyzed using repeated measures one-way ANOVAs. In these and subsequent analyses Box’s correction to the degrees of freedom for violation of sphericity was applied where appropriate.

Table 2
Means (and SDs) for performance measures, by task type.

Performance type   Measure                                           Checking        Detection       Response implementation
Execution          Percentage of controls located correctly          75.94 (31.68)   84.37 (34.30)   100.00 (0.0)
Execution          Number of additional attempts to locate control   .33 (1.24)      2.38 (5.67)     .41 (.937)
Execution          Percent correct responses                         -               64.81 (18.63)   60.29 (35.27)
Instruction        Percent correct responses                         90.40 (20.85)   82.10 (15.67)   94.16 (10.69)

All three tasks have a navigation element of correctly locating the relevant control. Overall location performance was at ceiling for response implementation. Accuracy was lower for checking and detection, but means for these two tasks did not differ significantly on a t-test. In some instances it took multiple attempts to locate the correct control. The number of additional attempts to locate the control was higher for detection than for the other two tasks, F(1.078, 85.126) = 10.270, p = .001, ηp² = .096. The execution component of the checking task was defined solely by its navigation element. The other two tasks required a further response, whose accuracy was assessed. Table 2 shows that the percentage of correct responses was a little higher for detection than for response implementation but these means did not differ significantly. The final row of the table refers to instruction performance, i.e., the overall percentage of correct communication reports, from both SRO and RO. Performance was lowest on the detection task, F(1.742, 139.335) = 16.974, p < .001, ηp² = .088. (Operators tended to request more repetitions and clarifications for the detection task.)
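Section 2.2.3 notes that detection outcomes were logged minute by minute. A minimal sketch of how a per-minute detection percentage could be tallied from such a log is given below. The record format (one `(minute, detected)` pair per gauge change) and the example data are invented for illustration and do not come from the simulator.

```python
from collections import defaultdict


def hit_rate_by_minute(events):
    """Return {minute: percentage of gauge changes detected in that minute}.

    `events` is an iterable of (minute, detected) pairs, one per gauge change,
    where `detected` is True for a hit and False for a miss.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for minute, detected in events:
        totals[minute] += 1
        if detected:
            hits[minute] += 1
    return {m: 100.0 * hits[m] / totals[m] for m in sorted(totals)}


# Invented example: 12 changes per minute over a 5-minute step,
# with the number of hits per minute chosen arbitrarily.
hits_per_minute = {1: 9, 2: 9, 3: 9, 4: 9, 5: 6}
example_events = [
    (minute, i < n_hits)
    for minute, n_hits in hits_per_minute.items()
    for i in range(12)
]
rates = hit_rate_by_minute(example_events)
print(rates)
```

Keeping hits and totals in separate counters means the same function works if the number of changes per minute is not constant.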

3.2. Subjective scales: Effects of task type

Table 3 shows mean levels of global workload and the three stress state scales following each task. Pre-task baseline data are also given for stress states. Repeated measures one-way ANOVAs were run to analyze for differences between the three task types for each subjective measure. The effect of task type was significant for workload, F(2, 160) = 4.038, p = .019, ηp² = .013, task engagement, F(1.725, 137.996) = 38.295, p < .001, ηp² = .324, distress, F(2, 160) = 12.982, p < .001, ηp² = .140, and worry, F(1.760, 140.768) = 14.498, p < .001, ηp² = .153. Table 3 shows that workload was higher for detection than for the other two tasks. The detection task also elicited the lowest level of task engagement, and highest levels of distress and worry. Repeated-measures t-tests, with the Bonferroni correction applied, were used to test for changes in state relative to pre-task baseline for each state. Engagement declined significantly following the detection task, t(80) = 5.63, p < .01, but not after the other two tasks. Distress was lower following the response implementation task, t(80) = 4.56, p < .01, but differences from pre-task means were nonsignificant for the other two tasks. Worry declined significantly following both checking, t(80) = 3.55, p < .01, and response implementation, t(80) = 5.21, p < .01, but was unchanged following detection.
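The global workload score analyzed above is the unweighted mean of the six NASA-TLX subscale ratings (Section 2.3). A minimal sketch of that computation follows; the function name, the input validation, and the example ratings are assumptions added for illustration.

```python
TLX_SUBSCALES = (
    "mental demand", "physical demand", "temporal demand",
    "performance", "effort", "frustration",
)


def global_workload(ratings):
    """Unweighted mean of the six NASA-TLX subscale ratings (each 0-100)."""
    missing = [s for s in TLX_SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for scale in TLX_SUBSCALES:
        if not 0 <= ratings[scale] <= 100:
            raise ValueError(f"{scale} rating must lie between 0 and 100")
    return sum(ratings[s] for s in TLX_SUBSCALES) / len(TLX_SUBSCALES)


# Invented ratings for one participant after one task.
example_ratings = {
    "mental demand": 60, "physical demand": 10, "temporal demand": 40,
    "performance": 30, "effort": 50, "frustration": 20,
}
print(global_workload(example_ratings))
```

The unweighted mean skips the pairwise-comparison weighting step of the original TLX procedure, which is the simplification the text attributes to Nygren (1991).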

3.3. Physiological workload indices: Effects of task type

All physiological indices were calculated as percentage changes from baseline. EEG indices were log-transformed prior to calculating percentage change to reduce positive skew. Analyses addressed three key issues related to mean differences in indices.

Table 3
Means (and SDs) for subjective measures, by task type.

Subjective variable   Pre-task       Checking        Response implementation   Detection
Global workload       -              34.99 (16.97)   34.02 (19.53)             38.85 (18.90)
Task engagement       18.95 (5.45)   19.89 (5.78)    19.77 (6.26)              15.47 (5.91)
Distress              10.54 (5.57)   9.05 (6.11)     7.59 (5.37)               10.40 (5.66)
Worry                 11.95 (4.89)   10.05 (5.67)    8.63 (6.10)               11.46 (6.26)


1. Do means differ significantly from zero? A nonzero mean indicates that task performance induced a change in the index relative to baseline.
2. Do task means differ significantly? A significant difference implies that the three task conditions elicited differing workload responses.
3. Is there a characteristic workload response to the execution phase of the detection task? The detection task requires both communication (instruction) and execution of the task, but the execution phase may place particular demands on attention. We checked whether measuring workload only during execution influenced the pattern of cross-task differences observed.

Table 4 summarizes these analyses. The first three columns give means for each change score, assessed across the full period for each task. Asterisks next to each mean indicate whether the mean differs significantly from zero, on a one-sample t-test (Bonferroni correction applied). A one-way within-subjects ANOVA was run to compare the three means. The next column gives the effect size (ES) for the main effect of task type, and the significance level of the main effect. (We do not give the Fs for these main effects to save space, and because the ESs are of primary interest). Ns for these analyses ranged from 62 to 72. The next column gives the mean for the index measured only during the execution phase of the detection task, and whether it differed significantly from zero. The final column gives the ES from the ANOVA in which the two means for checking and response implementation were compared with the mean for detection (execution only). For example, the first row summarizes analyses of IBI. The first three entries are negative and significant, indicating that IBI tended to decrease significantly relative to baseline in all three task conditions. The effect size from the ANOVA of .083 is significant, indicating that the response differed significantly across tasks; the decrease was least for detection (2.5%).
The next column shows that for detection (execution) only, the decrease in IBI was even smaller (.71%) and non-significantly different from zero. With this mean used in the ANOVA, the effect size increased to .226. Thus, while performance generally tended to induce cardiac acceleration, there was no significant change in IBI during the execution phase of the detection task.

In fact, as well as reducing IBI, task performance tended to reduce blood oxygenation, to reduce EEG alpha, and to increase EEG activity in frontal beta and gamma bands. However, all these effects were accompanied by significant main effects of task type, associated with smaller responses on these indices for the detection task. For beta and rSO2, changes during detection did not differ from zero. The analyses including the execution phase only of the detection task in several cases produced larger ESs, i.e., sharper differentiation of the detection task from the other two tasks. Execution of detection was differentiated especially by reduced HRV, reduced theta activity, as well as absences of the changes in IBI, beta and gamma that typified checking and response implementation tasks. That is, detection appears to be typified by low autonomic arousal (longer IBI), conflicting changes in effort indices (HRV and frontal theta), and low levels of higher-order cognitive processing (beta and gamma).

3.4. Temporal effects on execution of the detection task

To investigate possible vigilance effects on the detection task, we re-analyzed the data for the execution phase of this task on a minute-by-minute basis. There were four execution steps, each of five minutes duration, so analyses used 4 × 5 (step × minute) repeated-measures ANOVAs. Dependent measures were error rates on the detection task, as well as physiological responses averaged across each 1-min interval.

3.4.1. Performance effects

The primary performance measure was the percentage of changes in the gauge level detected.
The ANOVA for this measure showed significant main effects of step, F(2.514, 196.090) = 4.487, p = .007, ηp² = .054, and minute, F(2.563, 199.944) = 29.831, p < .001, ηp² = .277, as well as a step × period interaction, F(9.412, 734.112) = 3.580, p < .001, ηp² = .044. Effects of step were curvilinear: marginal means for steps 1–4 were 70.1, 74.6, 70.0 and 74.5 respectively. Detection declined at step 3 but recovered at step 4. Marginal means for minute 1–5 were 72.7, 75.6, 75.1, 75.0 and 63.2; i.e., detections were markedly lower in the fifth minute. Fig. 3 illustrates the interaction. The first three steps show a more pronounced effect of minute than the last one, although detection rate declines from minutes 4 to 5 in all cases.

We also analyzed the percentage of false positive responses, i.e., acknowledgment of a non-existent change in the gauge. The analysis showed a main effect of minute, F(2.301, 179.451) = 6.110, p = .002, ηp² = .073, but no main or interactive effects of step. Marginal means for minutes 1–5 were 18.0, 16.0, 13.7, 12.1 and 14.1. (Corresponding standard errors ranged from 8.9 to 12.3). The decline in false positives over time is commonly seen in vigilance tasks, and may reflect an increase in response criterion (Davies and Parasuraman, 1982).

We also calculated the A′ index of perceptual sensitivity (Davies and Parasuraman, 1982), which takes into account both detection and false positive rates. A′ varies from 0.5 (chance response) to 1.0 (perfect accuracy). Analysis of this index confirmed a significant main effect of minute, F(3.571, 267.823) = 15.374, p < .001, ηp² = .170, as well as a step × period interaction, F(8.345, 625.861) = 2.574, p = .008, ηp² = .033. Marginal means for minutes 1–5 were .861, .877, .887, .893 and .841. (Corresponding standard errors ranged from .011 to .012). As for detections, A′ was lowest at minute 5. The interaction was similar to that for detections, with stronger effects of minute at steps 1 and 2 than at steps 3 and 4.

Table 4. Comparisons of means (and SDs) across task conditions, including effect sizes (ESs) for cross-task differences, and significances of differences from zero.

| Index | Checking | Response-implement | Detection | Cross-task ES (ηp²) | Detection (execution) | Cross-task ES (ηp²) |
|---|---|---|---|---|---|---|
| IBI | 4.65** (6.95) | 3.44** (7.69) | 2.50** (4.77) | .083** | .71 (4.69) | .226** |
| HRV | 2.62 (26.85) | 9.25 (35.22) | 6.48 (22.00) | .038 | 14.55** (17.47) | .380** |
| CBFV-L | 2.31 (11.86) | .77 (11.9) | .70 (10.06) | .042* | .72 (10.42) | .041 |
| CBFV-R | .24 (8.28) | 1.77 (9.06) | .26 (7.06) | .030 | .09 (7.75) | .044* |
| SO2-L | 1.50** (2.72) | 1.99** (3.48) | .78 (2.96) | .123** | .80 (3.01) | .098** |
| SO2-R | 1.06** (2.73) | 1.41** (2.79) | .02 (2.55) | .230** | .08 (2.54) | .213** |
| Theta | .29 (3.77) | .21 (3.68) | .40 (2.88) | .042 | 5.00** (2.81) | .679** |
| Alpha | 4.02** (4.90) | 3.49** (5.36) | 3.06** (4.02) | .054* | 3.15** (4.07) | .035 |
| Beta | 1.20** (3.72) | 1.41** (3.83) | .13 (3.19) | .124** | .67 (3.85) | .192** |
| Gamma | 6.47** (8.43) | 6.78** (8.84) | 3.87** (7.09) | .189** | .20 (6.62) | .478** |

Note. The first cross-task effect size (ES) refers to indices taken from the full period of performance for each task type. For the second ES, full-period detection was replaced by indices measured only during the execution phase. * p < .05. ** p < .01.

3.4.2. Physiological workload indices

Analyses of the physiological indices are too extensive to report in full: complete ANOVA statistics are available from the authors. Instead, we summarize those effects that reached significance, and describe three patterns of responses that appeared to generalize across multiple indices. Hemodynamic indices (CBFV and SO2) were averaged across hemisphere for this analysis, as temporal trends appeared to be similar across hemispheres. Table 5 shows effect sizes from the ANOVAs, and their significance levels. The indices were generally sensitive to both factors. The majority of indices showed main effects of step, i.e., changes in response across the four five-minute performance intervals. Examination of these main effects showed two distinct types of effects. First, as shown in Fig. 4, several indices showed monotonic or near-monotonic changes. EEG alpha power and HRV increased progressively over time, whereas CBFV and SO2 concentration decreased. Second (see Fig. 5), the other three EEG measures showed curvilinear temporal trends, which were especially pronounced for theta and gamma. Power tended to decline below baseline at steps 2 and 3 before recovering at step 4.

Most indices also showed main effects of 1-min period. Three of these indices (Fig. 6) showed clear temporal trends. CBFV and IBI declined, whereas alpha power increased. The step × minute interactions for IBI and CBFV reflect weakening of the temporal effect at the last step. The remaining EEG indices also showed main effects of minute, but in these cases the effect of minute was qualitatively different at different steps, producing the significant interactions listed in the table. For example, for theta, which showed the largest effect sizes, minute had little effect at steps 1 and 4. At step 2, theta declined from minute 1 to minute 5 (1.8% to 12.7%) and increased again from minute 1 to 5 at step 3 (12.5% to 2.7%). That is, the minute effect appeared to be driven by the curvilinear cross-step temporal trend shown in Fig. 5.

3.5. Correlations of workload indices

The detection task elicited the highest levels of subjective workload and stress, as well as the distinctive pattern of physiological response previously discussed. Thus, correlational analyses focused on the prediction of workload indices during execution of the task from indices available from the other tasks. Such prediction may afford identification of vulnerable operators prior to their participation in the task. As in a previous study using a similar suite of sensors (Matthews et al., 2015), correlations between workload indices secured from different sensors were typically nonsignificant. Thus, the aim was to maximize prediction of a given index during detection from the other available measures of that same index. Table 6 summarizes the correlations (Ns range from 69 to 75). The first column shows that baseline measures were generally negatively correlated with the task-induced response, consistent with the Law of Initial Value (Berntson et al., 1994). These correlations were especially high for the EEG indices. The next two columns show correlates of the percentage change scores for the checking and response implementation tasks. For the ECG and hemodynamic indices, the task-induced response was substantially more predictive of the detection (execution) measure than was baseline. The next two columns show partial correlations between the task-induced responses and the detection (execution) measure, controlling for baseline. All these partials were significant and in most cases were of substantial magnitude. Thus, measurement of workload response in more undemanding task conditions may serve to identify operators vulnerable to high workload on the relatively demanding detection task.
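The partial correlations described in Section 3.5 remove the influence of baseline from the relationship between a task-induced response and the detection (execution) response. The source does not state how the partials were computed, so the following is only a minimal sketch that assumes the standard first-order partial correlation formula. The function name and the three input correlations are hypothetical and are not values from the study.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, controlling for z.

    Assumed mapping for illustration only:
      x = percentage-change response on an undemanding task (e.g., checking)
      y = percentage-change response during detection (execution)
      z = baseline level of the same index
    """
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("partial correlation is undefined when |r_xz| or |r_yz| equals 1")
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical input correlations (NOT taken from Table 6).
r_check_detect = 0.60     # checking response vs. detection (execution) response
r_check_baseline = -0.30  # checking response vs. baseline
r_detect_baseline = -0.25 # detection (execution) response vs. baseline

r_partial = partial_correlation(r_check_detect, r_check_baseline, r_detect_baseline)
print(f"partial r = {r_partial:.3f}")
```

With negative baseline correlations of this size, the partial stays close to the uncorrected value, which is the qualitative pattern the text describes: task-induced responses remain predictive after baseline is controlled.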

Fig. 3. Detection accuracy (%) as a function of step and minute. [Four panels, Step 1 to Step 4; x-axis: minute (1–5); y-axis: detections (%).]

4. Discussion

The present study identified three key issues for evaluating workload during NPP operations: determining the workload response to different tasks, identifying loss of vigilance during detection tasks, and predicting the operator’s neurocognitive functioning. Study findings add to understanding of each issue. First, we showed that physiological workload measures differ in their sensitivity to task demands. However, we found also that it is important to separate instruction and execution phases in assessing workload responses to the detection task. Second, analysis of temporal trends showed vigilance-like effects on both performance and physiological measures. However, temporal change was more complex than simple vigilance decrement, and multiple patterns of change were observed. Third, we demonstrated that workload measures taken from relatively undemanding task elements may predict the operator’s neurocognitive functioning under the more demanding conditions of sustained detection. In the remainder of this discussion, we consider the applied relevance of the findings to three practical problems in the NPP domain: workload evaluation for task design, countering possible vigilance effects, and diagnostic monitoring of individual operators.

4.1. Workload evaluation for task design

Reliable and valid workload assessment is critical to designing interfaces and systems that will not overload the operator (Stanton et al., 2005). However, in the NPP domain as in other contexts there are two major challenges. First, workload and performance may dissociate, so that workload measures are not indicative of the likelihood of operator error (Wickens et al., 2013). Second, recent psychometric evidence shows that alternate workload indices are poorly correlated, so that multidimensional assessment is necessary (Matthews et al., 2015). The present data suggest strategies for meeting these challenges in NPP operations.

It was hypothesized that the detection task would be higher in workload than the checking and response implementation tasks. The task is intrinsically the most complex, as it requires both navigation to a control and monitoring for specified parameter changes. The vigilance element of maintaining alertness to change may also elevate workload (Warm et al., 2008). NASA-TLX data confirmed that subjective workload was highest for detection, but physiological data suggested a more nuanced picture. The initial cross-task comparison, based on both instruction and execution parts of the task, confirmed expectations in that frontal rSO2 concentrations were higher for detection than for the other tasks, consistent with previous hemodynamic studies of effort (Warm et al., 2012). Contrary to the initial hypothesis, though, we observed weaker alpha blocking and less high-frequency EEG power (beta and gamma) in the detection task. Two indices of high workload identified in previous studies (Gao et al., 2013; Hockey et al., 2009; Hwang et al., 2008), HRV and frontal theta, failed to discriminate tasks effectively. The reanalysis of cross-task differences using only responses to the execution part of the detection task clarified some of the unique demands of monitoring controls for change. In particular, the analysis showed a large-magnitude decline in HRV, consistent with Hwang et al.’s (2008) conclusion that this measure is sensitive to high workload. However, IBI was relatively high, frontal theta was low, and beta and gamma showed relatively low levels of power, as in the original analysis, suggesting reduced levels of cognitive activity.

Fig. 4. Variation in selected workload indices across four task steps: near-monotonic trends. [Plotted indices: SO2, HRV, alpha and CBFV; x-axis: step (1–4); y-axis: change (%).]

Fig. 5. Variation in workload indices across four task steps: curvilinear trends. [Plotted indices: theta, beta and gamma; x-axis: step (1–4); y-axis: change (%).]

Fig. 6. Variation in workload indices across five one-minute steps. [Plotted indices: IBI, CBFV and alpha; x-axis: minute (1–5); y-axis: change (%).]

Table 6. Correlates of workload indices measured during execution of the detection task. Predictors include baseline measures and percentage change responses during the other two tasks.

| Index | Baseline | Uncorrected: Response (Check) | Uncorrected: Response (Res-Imp) | Partial (baseline controlled): Response (Check) | Partial (baseline controlled): Response (Res-Imp) |
|---|---|---|---|---|---|
| IBI | .26* | .57** | .66** | .54** | .63** |
| HRV | .36** | .70** | .70** | .54** | .55* |
| CBFV-L | .38** | .84** | .80** | .83** | .78** |
| CBFV-R | .11 | .51** | .70** | .50** | .70** |
| SO2-L | .24* | .60** | .78** | .58** | .78** |
| SO2-R | .19 | .68** | .73** | .64** | .72** |
| Theta | .50** | .63** | .36** | .51** | .34** |
| Alpha | .56** | .78** | .64** | .70** | .65** |
| Beta | .65** | .63** | .46** | .58** | .40** |
| Gamma | .68** | .78** | .70** | .64** | .49** |

* p < .05. ** p < .01.

Table 5. Effect sizes (ηp²) for effects of temporal factors on workload indices.

| Effect | IBI | HRV | CBFV | SO2 | Theta | Alpha | Beta | Gamma |
|---|---|---|---|---|---|---|---|---|
| Step | – | .051* | .172** | .170** | .694** | .080** | .075** | .531** |
| Minute | .152** | .051** | .038* | – | .533** | .092** | .050* | .266** |
| Step × minute | .056** | – | .033* | – | .622** | .038** | .070** | .117* |

* p < .05. ** p < .01.
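The effect sizes reported in Section 3.4.1 and Table 5 are partial eta squared values. As a rough consistency check, and assuming the conventional relationship ηp² = (F × df_effect) / (F × df_effect + df_error), each reported ηp² can be recovered from its F statistic and degrees of freedom. The fractional degrees of freedom suggest that a sphericity correction was applied (an inference, not something the text states); such a correction rescales both degrees of freedom by the same factor, so the ratio is unaffected. The function name below is hypothetical, and the numbers are the statistics quoted in Section 3.4.1.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    effect_term = f_value * df_effect
    return effect_term / (effect_term + df_error)


# (label, F, df_effect, df_error, reported partial eta squared), as quoted in Section 3.4.1.
reported_effects = [
    ("detections: step", 4.487, 2.514, 196.090, 0.054),
    ("detections: minute", 29.831, 2.563, 199.944, 0.277),
    ("detections: step x minute", 3.580, 9.412, 734.112, 0.044),
    ("false positives: minute", 6.110, 2.301, 179.451, 0.073),
    ("A': minute", 15.374, 3.571, 267.823, 0.170),
    ("A': step x minute", 2.574, 8.345, 625.861, 0.033),
]

for label, f_value, df_effect, df_error, reported in reported_effects:
    computed = partial_eta_squared(f_value, df_effect, df_error)
    # Agreement to three decimals means the reported F, dfs and effect size are mutually consistent.
    print(f"{label}: computed {computed:.3f}, reported {reported:.3f}")
```

All six computed values round to the reported figures, so the same relationship could be used to back out approximate F ratios for the Table 5 entries if the corresponding degrees of freedom were available.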
Thus, the data suggest that different indices are diagnostic of different workload components. Hockey et al. (2009) suggest that HRV is primarily sensitive to emotional strain, whereas frontal theta may reflect needs for cognitive control of performance as demands increase. The detection task may then have been relatively stressful and emotionally demanding, as reflected in subjective stress response, but also a task requiring primarily routine processing rather than high levels of cognitive control. Limited needs for cognitive control may also explain the relatively low levels of beta and gamma activity, taken to indicate higher-level cognitive activity. Similarly, Mulder (1986) differentiated task effort driven directly by computational demands from state effort required to protect performance from stress and fatigue. The detection task here may have combined high state effort (low HRV, high rSO2) with low task effort (low frontal theta, beta and gamma).

Systematic evaluation of the workload imposed by different elements of NPP is critical for system design, in order to identify activities that may be sufficiently demanding to elevate operator error and threaten safety (Reinerman-Jones et al., 2013). The present findings reinforce the need for system designers to employ multiple measures in evaluating tasks, as well as alternate interface designs. Use of a single measure, whether subjective or physiological, is likely to provide a misleading picture of how operators react to task demands. In addition, communication can mask the impacts of detection on HRV and frontal theta, for example. Thus, careful task analysis must precede workload evaluation, and averaging workload assessment across multiple task elements may lead to loss of sensitivity and diagnosticity.

4.2. Countering vigilance decrement

Vigilance decrement is expressed as a heightened probability of detection errors as a task progresses, accompanied by changes in neurocognitive functioning (Warm et al., 2008). We hypothesized that the detection element of NPP might be vulnerable to such decrements. Indeed, within each 5-min step, we found evidence for increasing errors, especially in the final minute of the step. However, as other authors have surmised (Mumaw et al., 2000), vigilance in the NPP context may be more complex than in the laboratory. We also found effects of step, with performance lowest at step 3, and showing recovery at step 4. Furthermore, the minute-by-minute vigilance effect was weakest at step 4.

The physiological workload data provide some insights into influences on performance change. Temporal changes varied in complex ways across different indices, confirming the importance of multidimensional assessment (Matthews et al., 2015), but three distinct patterns of response were distinguished. Within each step, CBFV and IBI declined across successive minutes, whereas alpha power increased. Previous work implicates declining CBFV in reduced resource utilization (Warm et al., 2008, 2012), whereas increasing alpha may signal reduced cortical arousal, feeding into resource availability (Kamzanova et al., 2014). These effects correspond to the minute-by-minute changes in detection error. However, we also found effects of step, for which laboratory studies may be a poor model, given that typically participants perform a one-time task run only. By contrast, repetition of tasks, with an interval between task activities, is common in a range of industrial settings, including NPP operation. Near monotonic effects of step partly mirrored within-step changes, in that CBFV declined and alpha increased across steps 1–4. However, given that detection errors were maximal at step 3, it appears participants successfully compensated for any resource depletion at step 4, similar to the ‘end-spurt’ effects sometimes seen in vigilance studies (Temple et al., 2000). (Participants were aware that step 4 was the last one for the task.) In addition, declining frontal SO2 concentration and increasing HRV suggest decreasing state-related effort (Mulder, 1986).
By contrast, most EEG bands, especially frontal theta and gamma, showed a U-shaped trend across steps. Interpretation of such a complex pattern is necessarily tentative. Speculatively, declining resources were accompanied by performance coming under increasingly routine control (declining frontal theta) until step 4. At this step, the participant mentally prepares for a new activity, restoring higher level cognitive activity (gamma and beta), which may compensate for loss of resources. Synchronized theta and gamma activity may support emotional modulation of cognition (Buzsáki, 2009), and tactical deployment of cognitive emotion regulation skills (Tolegenova et al., 2014). Thus, frontal theta and gamma may be highest as the person transitions between activities (step 1) or anticipates transition (step 4), and consequently demands on self-regulation are highest (Dennis and Solomon, 2010).

The practical implication is that workload evaluations must accommodate possible temporal changes. Checking and response implementation activities are brief enough to treat as a single unit for workload assessment purposes, but the same is not true for detection. Vigilance decrement appears to play a role in detection, and regulating the duration of continuous execution performance may be desirable. However, temporal processes are more complex than vigilance decrement alone, and transitions between activities may also impact neurocognitive efficiency. Indeed, the present data suggest a performance–workload dissociation (Wickens et al., 2013), in that performance increased from step 3 to step 4, although resource utilization indexed by CBFV appeared to decline. Design of task sequences to support the operator’s capacity to compensate for changes in neurocognitive functioning remains a challenge.

4.3. Diagnostic monitoring of operators

Currently, the primary application for workload assessment is in assessing tasks and interfaces. However, the continuous assessment of operator status afforded by physiological recording may also support diagnostic monitoring of the operator in order to detect suboptimal neurocognitive functioning (Matthews et al., 2010; Reinerman-Jones et al., 2011). Such monitoring may be valuable when operators are fatigued, for example, from a change in work shifts, or when shifts must be extended due to unusual operating circumstances, such as when an RO has to cover shifts for another operator. As well as diagnostic monitoring for development of excessive operator workload, it may be desirable to anticipate loss of operator competence by diagnosis of vulnerability prior to engagement with more challenging tasks. We found that, although checking and response implementation are relatively undemanding tasks, workload responses to these tasks are quite highly predictive of the workload response to the more challenging detection task. Consistent with the Law of Initial Values (Berntson et al., 1994), baseline measures were negatively correlated with task-induced responses, especially for EEG. However, even with baseline controlled, task-induced responses were intercorrelated. Thus, monitoring operator response during undemanding phases of work may be useful for identifying individuals vulnerable to excessive workload when tasks become more challenging.

4.4. Study limitations

Study limitations include the use of a simulated environment and novice operators, as well as modifications to EOPs to accommodate use by novices. The advantage of the current approach is that it allows for the complex team processes of NPP operation to be studied in a controlled environment at modest cost, so that sample sizes are large enough to provide adequate systematic statistical power (Reinerman-Jones et al., 2013). In addition, design of the simulation was informed by an NPP SME. Nevertheless, field studies would be desirable to establish generalization of findings to fully trained operators.
Another issue is the generalization of findings to the full range of work activities performed by ROs. For example, the present study identifies vigilance issues in the context of a simulated EOP, but it remains to be determined how such issues would play out in the context of other routine monitoring or maintenance activities. Furthermore, we did not attempt to simulate operational stressors such as threat and fatigue that might impact performance during a real emergency; indeed, levels of subjective stress were generally modest. Typically, operational stress and fatigue factors tend to increase vulnerability to loss of vigilance (Matthews et al., 2000), but further work on this issue is needed. In addition, we were not able to parse the instruction and execution portions of checking and response implementation tasks for data analysis purposes. The execution portion of the detection task yielded different physiological responses compared to the instruction portion: it is of interest to determine whether the same holds true for the checking and response implementation tasks. Finally, the simulation does not capture the use of informal aids to vigilance such as sticky notes that may be employed (Mumaw et al., 2000); there is a potential conflict between realism and experimental control. In any case, it is important to establish basic principles of workload response in the NPP domain whose operation in a real control room can be further investigated.

5. Conclusion

Workload responses during simulated NPP operation are complex with respect to both inputs and outputs. The present study shows the importance of accommodating both temporal factors and the multidimensional nature of response in evaluating operator workload. Findings also identify the potential vulnerability of detection tasks to adverse effects of workload. At a practical level, careful analysis of task demands and the timecourse of work activities is necessary prior to conducting workload evaluations. Such an analysis may also support strategies for identification of operators vulnerable to loss of neurocognitive efficiency prior to their engagement in especially challenging tasks.

References

Advanced Brain Monitoring, Inc., 2011. B-Alert Software User Manual. Advanced Brain Monitoring Inc., Carlsbad, CA.
Ayaz, H., Shewokis, P.A., Bunce, S., Izzetoglu, K., Willems, B., Onaral, B., 2012. Optical brain monitoring for operator training and mental workload assessment. NeuroImage 59, 36–47.
Berntson, G.G., Uchino, B.N., Cacioppo, J.T., 1994. Origins of baseline variance and the law of initial values. Psychophysiology 31, 204–210.
Buzsáki, G., 2009. Rhythms of the Brain. Oxford University Press.
Davies, D.R., Parasuraman, R., 1982. The Psychology of Vigilance. Academic Press, London.
Dennis, T.A., Solomon, B., 2010. Frontal EEG and emotion regulation: electrocortical activity in response to emotional film clips is associated with reduced mood induction and attention interference effects. Biol. Psychol. 85, 456–464.
Donald, F.M., 2008. The classification of vigilance tasks in the real world. Ergonomics 51, 1643–1655.
Gao, Q., Wang, Y., Song, F., Li, Z., Dong, X., 2013. Mental workload measurement for emergency operating procedures in digital nuclear power plants. Ergonomics 56, 1070–1085.
Gevins, A., Smith, M.E., 2003. Neurophysiological measures of cognitive workload during human–computer interaction. Theor. Issues Ergon. Sci. 4, 113–131.
Ha, J.S., Seong, P.H., 2010. Attentional-resource effectiveness measures in monitoring and detection tasks in nuclear power plants. IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 40, 993–1008.
Hancock, P.A., 2013. In search of vigilance: the problem of iatrogenically created psychological phenomena. Am. Psychol. 68, 97–109.
Hart, S.G., Staveland, L.E., 1988. Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. In: Hancock, P.A., Meshkati, N. (Eds.), Human Mental Workload. Elsevier Science Publishers, Amsterdam, pp. 139–184.
Henelius, A., Hirvonen, K., Holm, A., Korpela, J., Muller, K., 2009. Mental workload classification using heart rate metrics. In: Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2–6.
Hockey, G.R.J., Nickel, P., Roberts, A.C., Roberts, M.H., 2009. Sensitivity of candidate markers of psychophysiological strain to cyclical changes in manual control load during simulated process control. Appl. Ergon. 40, 1011–1018.
Horrey, W.J., Lesch, M.F., Garabet, A., 2009. Dissociation between driving performance and drivers’ subjective estimates of performance and workload in dual-task conditions. J. Saf. Res. 40, 7–12.
Hwang, S.L., Yau, Y.J., Lin, Y.T., Chen, J.H., Huang, T.H., Yenn, T.C., Hsu, C.C., 2008. Predicting work performance in nuclear power plants. Saf. Sci. 46, 1115–1124.
International Nuclear Safety Advisory Group, 1999. Basic safety principles for nuclear power plants, 75-INSAG-3 Rev. 1, INSAG Series No. 12, IAEA, Vienna.
Jou, Y.T., Yenn, T.C., Lin, C.J., Yang, C.W., Chiang, C.C., 2009. Evaluation of operators’ mental workload of human–system interface automation in the advanced nuclear power plants. Nucl. Eng. Des. 239, 2537–2542.
Kamzanova, A., Kustubayeva, A.M., Matthews, G., 2014. Use of EEG workload indices for diagnostic monitoring of vigilance decrement. Hum. Factors 56, 1136–1149.
Laughery, R., Laux, L., Hara, J.M., Brown, W.S., Higgins, J.C., Persensky, J.J., Bongarra, J., 2002. Decision-centered design as the basis of defining the human role in systems. In: Proceedings of the 2002 IEEE Seventh Conference on Human Factors and Power Plants. IEEE, pp. 4–18.
Lin, C.J., Hsieh, T.L., Tsai, P.J., Yang, C.W., Yenn, T.C., 2011. Development of a team workload assessment technique for the main control room of advanced nuclear power plants. Hum. Factors Ergon. Manuf. Serv. Ind. 21, 397–411.
Lin, C.J., Yenn, T.C., Yang, C.W., 2010. Automation design in advanced control rooms of the modernized nuclear power plants. Saf. Sci. 48, 63–71.
Matthews, G., Campbell, S.E., Falconer, S., Joyner, L., Huggins, J., Gilliland, K., Grier, R., Warm, J.S., 2002. Fundamental dimensions of subjective state in performance settings: task engagement, distress, and worry. Emotion 2, 315–340.
Matthews, G., Davies, D.R., Lees, J.L., 1990. Arousal, extraversion, and individual differences in resource availability. J. Pers. Soc. Psychol. 59, 150–168.
Matthews, G., Davies, D.R., Westerman, S.J., Stammers, R.B., 2000. Human Performance: Cognition, Stress and Individual Differences. Psychology Press, London.
Matthews, G., Reinerman-Jones, L.E., Barber, D.J., Abich, J., 2015. The psychometrics of mental workload: multiple measures are sensitive but divergent. Hum. Factors 57, 125–143.
Matthews, G., Szalma, J., Panganiban, A.R., Neubauer, C., Warm, J.S., 2013. Profiling task stress with the Dundee stress state questionnaire. In: Cavalcanti, L., Azevedo, S. (Eds.), Psychology of Stress: New Research. Nova Science, Hauppauge, NY, pp. 49–91.
Matthews, G., Warm, J.S., Reinerman, L.E., Langheim, L., Washburn, D.A., Tripp, L., 2010. Task engagement, cerebral blood flow velocity, and diagnostic monitoring for sustained attention. J. Exp. Psychol.: Appl. 16, 187–203.
Miller, G., 1956. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychol. Rev. 63, 81–97.
Mulder, G., 1986. The concept and measurement of mental effort. In: Hockey, G.R.J., Gaillard, A.W.K., Coles, M.G.H. (Eds.), Energetical Issues in Research on Human Information Processing. Martinus Nijhoff, Dordrecht, The Netherlands, pp. 175–198.
Mumaw, R.J., Roth, E.M., Vicente, K.J., Burns, C.M., 2000. There is more to monitoring a nuclear power plant than meets the eye. Hum. Factors 42, 36–55.
Nygren, T.E., 1991. Psychometric properties of subjective workload measurement techniques: implications for their use in the assessment of perceived mental workload. Hum. Factors 33, 17–33.
O’Hara, J.M., Higgins, J.C., 2010. Human–system interfaces to automatic systems: Review guidance and technical bases. Human Factors of Advanced Reactors (NRC JCN Y-6529): BNL Technical Report No. BNL-91017-2010. Brookhaven National Laboratory, Upton, NY.
O’Hara, J., Higgins, J., Brown, W., Fink, R., 2008. Human factors considerations in new nuclear power plants: detailed analysis. BNL Technical Report No. 79947-2008. Brookhaven National Laboratory, Upton, NY.
Reinerman-Jones, L., Guznov, S., Mercado, J., D’Agostino, A., 2013. Developing methodology for experimentation using a nuclear power plant simulator. In: Foundations of Augmented Cognition. Springer, Berlin, pp. 181–188.
Reinerman-Jones, L.E., Matthews, G., Langheim, L.K., Warm, J.S., 2011. Selection for vigilance assignments: a review and proposed new direction. Theor. Issues Ergon. Sci. 12, 273–296.
See, J.E., Howe, S.R., Warm, J.S., Dember, W.N., 1995. Meta-analysis of the sensitivity decrement in vigilance. Psychol. Bull. 117, 230–249.
So, H.H., Chan, K.L., 1997. Development of QRS detection method for real-time ambulatory cardiac monitor. Engineering in Medicine and Biology Society, 1997. Proceedings of the 19th Annual International Conference of the IEEE, vol. 1, pp. 289–292.
Stanton, N.A., Salmon, P.M., Walker, G.H., Baber, C., Jenkins, D.P., 2005. Human Factors Methods: A Practical Guide for Engineering and Design. Ashgate, Burlington, VT.
Taylor, G., Reinerman-Jones, L., Cosenzo, K., Nicholson, D., 2010. Comparison of multiple physiological sensors to classify operator state in adaptive automation systems. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 54, pp. 195–199.
Temple, J.G., Warm, J.S., Dember, W.N., Jones, K.S., LaGrange, C.M., Matthews, G., 2000. The effects of signal salience and caffeine on performance, workload and stress in an abbreviated vigilance task. Hum. Factors 42, 183–194.
Thornburg, K.M., Peterse, H.P., Liu, A.M., 2012. Operator performance in long duration control operations: switching from low to high task load. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 56, pp. 2002–2005.
Tolegenova, A.A., Kustubayeva, A.M., Matthews, G., 2014. Trait meta-mood, gender and EEG response during emotion-regulation. Pers. Individ. Differ. 65, 75–80.
Vidulich, M.A., Tsang, P.S., 2012. Mental workload and situation awareness. In: Salvendy, G. (Ed.), Handbook of Human Factors and Ergonomics, fourth ed. John Wiley, Hoboken, NJ, pp. 243–273.
Warm, J.S., Parasuraman, R., Matthews, G., 2008. Vigilance requires hard mental work and is stressful. Hum. Factors 50, 433–441.
Warm, J.S., Tripp, L.D., Matthews, G., Helton, W.S., 2012. Cerebral hemodynamic indices of operator fatigue in vigilance. In: Matthews, G., Desmond, P.A., Neubauer, C., Hancock, P.A. (Eds.), Handbook of Operator Fatigue. Ashgate Press, Aldershot, UK, pp. 197–207.
Wickens, C.D., Hollands, J.G., Banbury, S., Parasuraman, R., 2013. Engineering Psychology and Human Performance, fourth ed. Pearson, Boston.