International Journal of Industrial Ergonomics 75 (2020) 102890
Feasibility of methods for early formative control room system evaluation

Eva Simonsen 1, Lars-Ola Bligård *, Anna-Lisa Osvalder

Department of Industrial and Materials Science, Division Design and Human Factors, Chalmers University of Technology, SE-412 96, Gothenburg, Sweden

* Corresponding author. E-mail addresses: [email protected] (E. Simonsen), [email protected] (L.-O. Bligård), [email protected] (A.-L. Osvalder).
1 Present address: Business Unit Fuel, Engineering & Projects, Department Safety & Environment, Vattenfall AB, SE-401 27 Gothenburg, Sweden.

https://doi.org/10.1016/j.ergon.2019.102890
Received 31 August 2018; Received in revised form 6 August 2019; Accepted 22 November 2019; Available online 5 December 2019
© 2019 Elsevier B.V. All rights reserved.
Keywords: Control room; Formative evaluation methods; Nuclear power

ABSTRACT
In a nuclear power plant, human factors evaluation is an important activity in the design of the control room system to ensure safe operation. The purpose of this study was to test the feasibility of methods for early formative evaluation of nuclear power plant control room systems, that is to say assessment of more general design decisions. Two methods were chosen for the assessment, scenario-based talkthrough and heuristic evaluation, and they were tested in three nuclear power plant control room system modification projects. The two methods were found useful and suitable for early formative evaluation. The combination of the methods makes it possible to take advantage of the strengths of both methods. Using guidelines focuses the evaluation on identifying typical design problems, while using a scenario-based, use-focused approach allows identifying problems that are less typical and more unique to the specific design being evaluated. Doing the latter thoroughly results in a resource-intensive evaluation, but this can be countered by trading some of the thoroughness for efficiency by using guidelines in the evaluation. Proposals for future work include improving the method combination by providing better support for adapting the implementation of the methods, as well as for the practical execution of the evaluation activity. Future work should also include further investigation of the method combination's usefulness, for example in other domains.
1. Introduction

A nuclear power plant control room system is a socio-technical system (cf. the definition by Hendrick and Kleiner, 2001), an "integration of the human-machine interface, the control room staff, operating procedures, training programme, and associated facilities or equipment which together sustain the proper functioning of the control room" (International Electrotechnical Commission, 2009, p. 10). Consideration of human factors (HF) issues in the design of a nuclear power plant is stated as a safety principle by the IAEA International Nuclear Safety Advisory Group (1999). Many design decisions must be made to develop a nuclear power plant control room system, and evaluation from an HF perspective is an activity that can aid navigation among this multitude of design decisions. As a development project progresses, the possibility of changing the design decreases while the cost of doing so increases. This may lead to safety implications as well if the cost of changes to ensure a safe design is prohibitive (Hale et al., 2007). To assess the suitability of the design and avoid late, expensive and potentially less safe changes, control room system evaluation in early process stages is recommended (Laarni et al.,
2011, 2014; Boring et al., 2014).

Ullman (1997) describes design as the successive development and application of constraints to reduce the number of potential solutions to a problem until only one unique product remains. Successively narrowing the solution space through the application of constraints in a development process creates a gradual move from making general design decisions to specific ones. General design decisions will limit underlying and dependent specific design decisions (Bradford, 1994; Papin, 2002). To reduce the need for reconsidering design decisions (and underlying dependent design decisions) they should ideally be evaluated as soon as they are made. Thus, the gradual specificity of design decisions made during development suggests that early evaluation could be understood as assessment of more general, higher level, design decisions.

In a development process, the purpose of an evaluation activity can differ. Formative evaluations are performed to improve the design as part of an iterative development process, and summative evaluations are performed to assess the quality of the design (Scriven, 1967; Nielsen, 1993). Simonsen (2017) showed in a literature study that, in the nuclear power domain, formative evaluation approaches for more general design decisions were less common and not described in as much detail
as summative evaluations that also included more specific, lower-level design decisions. Simonsen (2017) also acknowledged that this gap had to some extent been addressed by academia, but that there is a need to further develop methodologies and methods suitable for formative evaluation of design decisions at higher levels, as well as to assess their applicability for control room system evaluation.

A method is a means to an end, a way to achieve a desired impact. Andersson et al. (2011) argued that for HF methods to have an impact in industry, they should be developed and adapted to the constraints of the user and the use context. A method that is not used or is used in the wrong way will not achieve its desired impact, that is to say it must be useful in practice. Nielsen (1993) stated that the usefulness of a system depends on its utility (that the functionality of the system can perform what is needed) and its usability (how well the users can use that functionality). The same is valid for a method (Andersson et al., 2011). Shorrock and Williams (2016) identified three fundamental constraints that affect the application of HF methods in practice, usability being one of them (the other two were accessibility and contextual constraints).

The purpose of the study presented in this paper was to test the feasibility of methods for early formative evaluation of nuclear power plant control room systems, in other words assessment of more general design decisions. The paper will first present the prerequisites for a method to be used for this type of evaluation. After that, the two methods that were selected, heuristic evaluation and a scenario-based talkthrough, are described together with details of how they were modified to better suit evaluation of nuclear power plant control room systems. Finally, the paper presents how the methods were tested in three case studies to evaluate their practical usefulness in real modification projects of nuclear power plant control room systems.
2. Method prerequisites and choice of methods

The prerequisites for choosing methods to test in this study were derived from the desired goals of the methods. The goals of the methods for early formative evaluation of nuclear power plant control room systems are that they: (A) should be possible to use for assessment of higher level design decisions (early evaluation), and (B) should be suitable for assessment of complex socio-technical systems, such as control rooms. From these goals, prerequisites to guide method choice can be derived.

One prerequisite derived from the first desired goal is the system representation used. Evaluation typically requires some kind of representation of the system to be assessed. There are several different dimensions of fidelity to consider when developing system representations for evaluation: breadth of features, degree of functionality, similarity of interaction, and aesthetic refinement (Virzi et al., 1996). A system representation that compromises one or more of these dimensions in a way that is obvious to the user is to be regarded as low-fidelity according to Virzi et al. (1996). Using a high-fidelity system representation to assess higher level design decisions has disadvantages. There is a risk that the assessment might focus on details that are not of interest for evaluation at that moment in time while ignoring the design decisions that need to be assessed. Creating a high-fidelity system representation may also require a lot of resources in terms of time and money. Using a low-fidelity system representation to assess higher level design decisions lowers the risk of the assessment focusing on the wrong things because of the representation. Creating a low-fidelity system representation also typically requires fewer resources. One prerequisite for methods suitable for early evaluation is therefore that they should not be dependent on high-fidelity system representations. Some aspects of the use of an implemented design, such as time, may not be accurately represented by simulated use of a low-fidelity system representation. Evaluation methods that rely on simulated use being close to actual use, such as some usability tests, are therefore dependent on high-fidelity system representations, and are thus not suitable for early evaluation.

One prerequisite derived from the second goal relates to the measures targeted by the methods. Simonsen and Osvalder (2018) identified six categories of measures relevant for evaluation of nuclear power plant control room systems:

• System performance. Measures of the overall outcome of the functioning of the system as a whole. In the case of control room systems, this could be noting the values of crucial plant parameters, such as tank levels and temperatures, when a shift team handles a scenario in a simulator. Keeping crucial plant parameters within the limits for which the plant has been designed relies both on the functioning of technical systems (e.g. automatic functions) and on the way the operators operate the plant.
• Task performance. Measures of how users perform tasks, such as the number and nature of errors in use, or time to complete tasks. This category also includes qualitative assessments of the users' way of working.
• Teamwork. Measures meant to assess the quality of team-based activity.
• Use of resources. Measures meant to assess different aspects of the operators' use of their mental and physical resources, such as situation awareness, mental workload, and physical load.
• User experience. Measures assessing the feelings and emotions of the operators.
• Identification of design discrepancies. Focus on the design of the control room system: identification of parts of the design that may induce errors in use or in other ways hinder the effect the design is meant to achieve. Typically this is done through comparison with what experience has shown to be the best way to do it (i.e. design guidelines).

Simonsen and Osvalder (2018) suggested that choosing measures from all categories would be beneficial because they represent different perspectives that are important to take when assessing a complex system, to increase the possibility of identifying discrepancies. While all are relevant for evaluation of control room systems, the categories are suitable for use during different parts of the development process. Having only a system representation of lower fidelity available will affect the possibility to target some of the categories of measures. Direct measures in the system performance category, such as noting the value of crucial plant parameters in a simulator, are difficult to obtain with low-fidelity system representations. If the process systems governed by the control room are sufficiently developed, knowledge about task performance may however be used to analytically assess system performance indirectly. The use of resources category contains measures meant to assess aspects such as situation awareness and mental workload. Methods to assess situation awareness and mental workload, for example SAGAT (Endsley, 1987, 1988) and NASA-TLX (Hart and Staveland, 1988), typically require a simulated use that is very close to the use of the future system for the assessment to provide valid results. Measures in the remaining categories (task performance, teamwork, user experience, and identification of design discrepancies) can be measured with sufficient precision in early evaluation. Targeting measures in these four categories is thus a second prerequisite for methods for early evaluation of control room systems (Goals A and B).

The first method chosen for the study was a scenario-based talkthrough. This is an approach where the proposed design is assessed by letting users go through a number of scenarios using a representation of the design, such as a paper drawing. Daniellou (2007) used the term 'narrative simulation' to describe the concept of participants building up a detailed oral account of feasible ways of carrying out future tasks. Depending on the level of fidelity of the system representation, users can simulate use, but may also describe the use instead of acting it out in detail. This method is not dependent on all aspects of the simulated use being very similar to the actual use of the implemented design. This flexibility makes the method suitable for early evaluation. Through inclusion of users and simulation or description of use, the scenario-based
talkthrough targets measures in the categories task performance, teamwork, and user experience. If the causes of non-satisfactory findings in these categories are traced back to problems in the design, the method targets measures in the identification of design discrepancies category as well.

The scenario-based talkthrough method used in this study primarily drew inspiration from group-based expert walkthrough (Følstad, 2007a, 2007b) and participatory simulation as described by Daniellou (2007) and Andersen (2016). The group-based expert walkthrough is a method that originates from the human-computer interaction domain, so it has a natural inclination towards user interface design. The participatory simulations by Andersen (2016) were applied in design of hospital work systems, and thus focused on assessing ergonomic conditions such as organisation of work, work flows, and procedures. The assessment of control room systems requires a method that is capable of evaluating design decisions regarding both user interfaces and ergonomic conditions of work systems.

In order to better suit evaluation of control room systems, the scenario-based talkthrough was modified. In the group-based expert walkthrough by Følstad (2007a) a general discussion about the overall usability of the design concept is initiated at the end of each scenario. In the scenario-based talkthrough in the present study a set of discussion questions was added at the end of the workshop. Their purpose was to focus the assessment on important aspects that would not necessarily be discussed during the scenario talkthrough and also to further stimulate a reflective discussion about the design as a whole. Examples of questions posed were: "Does this design minimise the risk of persons being harmed or subjected to harmful substances? If not, why not?" and "Does this design allow flexibility in how tasks are executed? Can this flexibility lead to negative consequences? If not, why not?". In the group-based expert walkthrough method (Følstad, 2007a), the criticality of discrepancies is assessed directly when they are found. In the scenario-based talkthroughs in the present study, discrepancies were instead prioritised according to the importance of addressing them at the end of the workshop, to minimise disruption of the scenario talkthroughs.

In order to better understand the advantages and disadvantages of the scenario-based talkthrough, a second method was chosen and applied for comparison. The second evaluation method chosen for this study was heuristic evaluation. Heuristic evaluation is an easy, fast and cheap evaluation method where a small set of evaluators is used to assess how well an interface (in this case, a socio-technical system such as a control room) complies with recognised design guidelines (Nielsen, 1994). Heuristic evaluation has the advantage that it does not necessarily require the involvement of users and it utilises existing design knowledge in the form of guidelines (Nielsen, 1994). The fact that the assessment does not rely on simulated use at all makes the method suitable for early evaluation with a system representation of lower fidelity. A heuristic evaluation targets measures in the identification of design discrepancies category since it directly identifies parts of the design that may induce errors in use or in other ways hinder the effect the design is meant to achieve. Compared to the scenario-based talkthrough, the identification of design discrepancies done through heuristic evaluation may be more consciously directed to investigate the existence of typical design problems. Furthermore, through the use of existing design knowledge formulated as design advice (guidelines), the heuristic evaluation method facilitates the utilisation of knowledge that the participants do not already have. In the scenario-based talkthrough workshop, the participants must rely on their existing design knowledge.

The heuristic evaluation method was also modified to better suit evaluation of control room systems. The procedure of heuristic evaluation is to let the evaluators go through the interface several times and compare the design to a list of recognised design guidelines (Nielsen, 1994). The method utilised in the present study was similar to, but also differed from, the heuristic evaluation described by Nielsen (1994). The heuristic evaluation in the present study was performed by systematically going through the list of HF guidelines and comparing the suggested design with the guidelines, instead of the opposite (going through the design and comparing it with guidelines). This modification of the method was made to tailor it to the second goal of the method combination, to make it more suitable for evaluation of control room systems (Goal B). Control room systems are typically large in scope. Having the guidelines as the starting point for the comparison, instead of the many different parts of the control room system, was deemed to be more time-efficient and was seen as a way of ensuring consistency in the system as a whole.

3. Empirical evaluation of the chosen method combination

Case studies were used to assess the use of the methods in practice. Case studies allow proximity to real-life situations that generate a learning process which promotes advanced understanding (Flyvbjerg, 2006; Yin, 2014). Case studies as a research method allow exploration of the methods' advantages and disadvantages when used in the chosen context.
Table 1. Details on the execution of Case 1: Waste management control room.

Process stage: The project was at a stage where a decision had to be made as to whether the modernised operator interfaces would be analogue, screen-based, or hybrid.
Level of specificity in the design to be evaluated: Control room layout suggestion for screen-based operator interface (no interface design had yet been done).
System representation: 2D paper drawing of the control room; 2D paper drawings of the plant.

Heuristic evaluation workshop
  Number of participants (total): 2
  Number of HF specialists: 1 (the moderating HF specialist was the same in Cases 1 and 2)
  Number of persons with operational knowledge: 1
  Selection of requirements/guidelines from: Control room philosophy (a collection of HF requirements and guidelines for that specific control room), ISO standards
  Prioritisation of identified discrepancies: By the participants during the workshop

Scenario-based talkthrough workshop
  Number of participants (total): 3
  Number of HF specialists: 2 (the moderating HF specialist was the same in Cases 1 and 2)
  Number of operators: 1
  Number of other persons with operational knowledge: -
  Number of project members: -
  Types of scenarios: List of overall tasks
  Scenarios prepared by: Researcher, with help from persons with operational knowledge
  Use of LEGO™ figures: Yes
  Prioritisation of identified discrepancies: By the participants during the workshop
Three control room modification projects at a Swedish nuclear power plant were used as cases. They were all at stages of the development process where changes to the design were still possible, and formative evaluation was therefore worthwhile. Case 1 was the planned modernisation of the waste management control room. Case 2 was the addition of new functionality to the plant, independent core cooling, which included a new control panel in the main control room, as well as new control panels and components to be operated from outside the control room, in another building. Case 3 was the addition of a new operative workplace for the shift supervisor at the central work desk in the main control room. This project also included the installation of two new computer screens for presenting plant data on the rear of the reactor console. The cases are further described in terms of process stage and level of specificity in the design to be evaluated in Tables 1–3.
3.1. Execution of evaluation workshops

In each case, two assessment workshops were held, the first using the heuristic evaluation method and the second using the scenario-based talkthrough method. In each case, the project's HF specialist was the moderator during the workshops and the researcher (the first author) merely observed. Before the workshops, the researcher gave the moderator spoken and written instructions on how to utilise the methods. The procedures for the evaluations are presented in Table 4. Persons with operational knowledge (but not actively working as operators) participated in the heuristic evaluation together with the HF specialists. This was done to complement HF knowledge with knowledge of working in the control room. Representatives of the users (people actively working as operators) of the proposed solution participated in the scenario-based talkthrough. Engineers were included as participants
Table 2. Details of the execution of Case 2: Independent core cooling.

Process stage: This project was at a later stage in the development process, where more detailed design decisions had been drafted but not finalised.
Level of specificity in the design to be evaluated: Suggestions for room layout, control panel interfaces, and location of local control equipment in the plant.
System representation: 2D paper drawings of the plant; paper drawings of control panel interfaces.

Heuristic evaluation workshop
  Number of participants (total): 3
  Number of HF specialists: 2 (the moderating HF specialist was the same in Cases 1 and 2)
  Number of persons with operational knowledge: 1
  Selection of requirements/guidelines from: Control room philosophy (a collection of HF requirements and guidelines for that specific control room), NUREG-0700*, company-specific guidelines, project HF analysis
  Prioritisation of identified discrepancies: By the participants during the workshop

Scenario-based talkthrough workshop
  Number of participants (total): 8
  Number of HF specialists: 2 (the moderating HF specialist was the same in Cases 1 and 2)
  Number of operators: 4
  Number of other persons with operational knowledge: -
  Number of project members: 2 (engineers)
  Types of scenarios: Description of initial situation & procedure-like instructions
  Scenarios prepared by: HF specialists, with help from project engineers
  Use of LEGO™ figures: No
  Prioritisation of identified discrepancies: By the participants during the workshop

* United States Nuclear Regulatory Commission (2012).
Table 3. Details on the execution of Case 3: Central work desk in main control room.

Process stage: The project was at a stage where decisions had to be made regarding the overall layout in the control room.
Level of specificity in the design to be evaluated: Overall control room layout including placement of workstations, documentation storage, and computer screens.
System representation: 2D paper drawing of the control room; printed 3D drawing of the control room.

Heuristic evaluation workshop
  Number of participants (total): 2
  Number of HF specialists: 1
  Number of persons with operational knowledge: 1
  Selection of requirements/guidelines from: Control room philosophy (a collection of HF requirements and guidelines for that specific control room)
  Prioritisation of identified discrepancies: By the participants during the workshop

Scenario-based talkthrough workshop
  Number of participants (total): 11
  Number of HF specialists: 1
  Number of operators: 7
  Number of other persons with operational knowledge: 1
  Number of project members: 2 (project leader & engineer)
  Types of scenarios: Description of initial situation & detailed descriptions of actions
  Scenarios prepared by: Researcher, with help from persons with operational knowledge
  Use of LEGO™ figures: Yes
  Prioritisation of identified discrepancies: By the HF specialist after the workshop due to time constraints
in the scenario-based talkthrough workshops for Cases 2 and 3. The project leader was included in Case 3. The exact number of participants in each workshop is shown in Tables 1–3. In all cases, the evaluation workshop took place in an ordinary meeting room, and the participants had the system representations available on the meeting table.

The system representations used differed between the cases. In Case 1, the design to be evaluated was a new control room layout for a screen-based control room (the existing control room was analogue). The system representation used was a 2D scale paper drawing (top view) of the new design of the control room, but 2D scale paper drawings (top view) of the existing plant were also provided to serve as a base for discussion of tasks executed outside the waste management control room. In Case 2, the design to be evaluated was new control panel operator interfaces, a new control room layout and the location of new local control equipment (e.g. for valves) in the plant. As system representation, 2D paper drawings in full scale of the new control panels were used, together with 2D scale paper drawings (top view) of the plant (both existing and new design). In Case 3, the design to be evaluated was a new overall control room layout including placement of workstations, documentation storage, and computer screens. As system representation, both 2D scale drawings (top view) and printed 3D drawings (perspective view) of the new design of the control room were used.

When system representations are far from full-scale, participants cannot place themselves 'within' the system representation, but may be given objects to represent themselves instead. In studies by Andersen (2016), participants were provided with LEGO™ figures to use as representations of themselves when they acted out scenarios in system representations that were not full-scale. The same approach was used in the scenario-based talkthrough in Cases 1 and 3. However, this approach was not used in Case 2 since the evaluation of that case relied less on top-view drawings as a system representation, and more on full-scale paper drawings of the operator interface design.

The scenarios used in the scenario-based talkthroughs differed between the three cases as well. In Case 1, the scenarios consisted of a list of the tasks to be executed in the control room, each broken down into sub-tasks. This choice of scenarios was made because the new design to be evaluated concerned the whole control room, so it was deemed important that the scenarios covered all the tasks executed there, and the range of tasks was small enough to make it practically possible. In Case 2, the scenarios were formulated as procedures for the shift team (detailed descriptions of actions to be done) together with descriptions of the initial situations in which the shift team were to imagine themselves when following the procedure-like scenarios. The purpose of the design to be evaluated in Case 2 was to allow the shift team to handle a few very specific operational situations, so the scenarios needed only to cover these. In Case 3, simulator scenarios where the shift team had to handle severe disturbances (e.g. tube rupture, fire) were chosen from the operators' training programme. These scenarios consisted of descriptions of the initial situations the operators were to handle
Table 4. The procedures used in the three cases.

Heuristic evaluation:
(1) Selection of guidelines (before the workshop)
(2) Written and spoken instructions for workshop execution to moderator (before the workshop)
(3) Presentation of design concept to be evaluated to workshop participants (during the workshop)
(4) Comparison of design with guidelines (during the workshop)
(5) Severity classification of identified discrepancies (during the workshop)

Scenario-based talkthrough:
(1) Selection/development of scenarios (before the workshop)
(2) Written and spoken instructions for workshop execution to moderator (before the workshop)
(3) Presentation of design concept to be evaluated to workshop participants (during the workshop)
(4) Talkthrough of design using scenarios (during the workshop)
(5) Asking discussion questions (during the workshop)
(6) Severity classification of identified discrepancies (during the workshop)
together with detailed descriptions of actions to be done. In addition, a discussion of how the shift team would handle normal operation (e.g. start-up and outage) using the new design was added to the workshop agenda in Case 3. In Case 3, the design to be evaluated affected most tasks in the control room, but unlike in Case 1 there were too many tasks to cover in the time available for the workshop. Instead, a selection had to be made to cover important aspects. The scenarios selected were chosen to challenge the new design.

Due to the difference in scenario types, interaction with the scenarios was done somewhat differently in the three cases. In Case 1, the moderator let the operator describe how each task on the list would be done using the new design of the control room. In Case 2, the participating shift team acted out the scenarios in their respective roles 'using' the new design (e.g. 'maneuvering' controls on the paper drawings) or describing how they would have used it (e.g. how they would move within the plant). In Case 3, the moderator urged the operators to describe how they would handle the situations described in the scenarios using the new design, using the written descriptions of the scenarios as memory aids rather than scripts.

Both the heuristic evaluation and the scenario-based talkthrough method stipulated that discrepancies identified in the design concepts during the evaluation workshops in all three cases should be noted. If ways of solving them were spontaneously mentioned by the participants they were noted as well, but they were not actively sought. In addition, information useful for the continued detailing of the design concepts (in other words new requirements) was to be documented. In the scenario-based talkthrough workshops positive aspects of the design concepts were noted as well, and the participants were asked to refrain from commenting on the design solution until after the scenario ended in order to avoid hampering the talkthrough flow. Instead, they were asked to note discrepancies and new requirements on Post-it notes for later discussion. Further details on how the study was executed in the three cases are shown in Tables 1–3.
3.2. Data collection for research study

The data collected in this study was of two types: data relating to the results of the evaluation workshops (discrepancies and new requirements), and data relating to the participants' experience of the methods.

To collect the first type of data, the workshops were both audio and video recorded. During the workshops, the researcher observed and took notes of discrepancies and new requirements that were identified. The list was later complemented with the notes taken by the HF specialists who participated in the workshops. Video analysis was used to make the final verification of the list of discrepancies and new requirements.

To collect the second type of data, the researcher held semi-structured interviews with all workshop participants after each workshop (either in groups per role or individually, face-to-face or by phone) about their opinions of the method used. However, the operators participating in the scenario-based talkthrough for Case 3 were not available for post-workshop interviews. They were instead asked to answer written questions by email. The interview questions related to how the interviewees experienced participating in the workshops and how they perceived the usefulness of conducting these types of evaluations. The HF specialists were also asked more detailed questions about the following topics: how they experienced the methods' ability to identify discrepancies and new requirements, how they would estimate the resources needed to prepare the workshops, how easy/difficult it was to understand the methods, their experiences regarding moderating the workshops, their feelings about executing the workshop, and how likely they thought it would be that they would use the methods again. The interviews were audio-recorded. The participants were also asked to rate their responses to these questions in a questionnaire.
3.3. Data analysis
Table 5. The number of identified discrepancies and new requirements in the three cases, and overlap in identified items.

Heuristic evaluation (HE)
  Number of identified discrepancies and new requirements: Case 1: 14, Case 2: 19, Case 3: 10
  Number of identified discrepancies: Case 1: 0, Case 2: 5, Case 3: 3
  Number of identified new requirements: Case 1: 14, Case 2: 14, Case 3: 7
  Number of discrepancies classified as "must be addressed": Case 1: -, Case 2: 3, Case 3: 3
  Number of discrepancies classified as "should be addressed": Case 1: -, Case 2: 2, Case 3: 0
  Number of discrepancies classified as "may be addressed": Case 1: -, Case 2: 0, Case 3: 0

Scenario-based talkthrough (ST)
  Number of identified discrepancies and new requirements: Case 1: 19, Case 2: 14, Case 3: 29
  Number of identified discrepancies: Case 1: 4, Case 2: 11, Case 3: 7
  Number of identified new requirements: Case 1: 15, Case 2: 3, Case 3: 22
  Number of discrepancies classified as "must be addressed": Case 1: 4, Case 2: 9, Case 3: 6
  Number of discrepancies classified as "should be addressed": Case 1: 0, Case 2: 1, Case 3: 0
  Number of discrepancies classified as "may be addressed": Case 1: 0, Case 2: 1, Case 3: 1

HE & ST
  Number of discrepancies and new requirements identified in both the HE and ST workshop: Case 1: 1, Case 2: 4, Case 3: 8
In the data analysis, the two types of data were compiled and analysed separately before being compiled and analysed together in relation to the goals of the methods, the effects of the modifications made, and the usefulness of the methods.

For each workshop, the number of identified discrepancies and new requirements was counted, and items identified in both the heuristic evaluation and the scenario-based talkthrough were noted. The severity ratings for discrepancies were noted, and the identified discrepancies and new requirements were sorted into categories depending on content. The design level of each discrepancy or new requirement was also noted, using a categorisation by Bligård et al. (2016) that defines a design level hierarchy in the following way (levels listed from higher to lower):

• Effect: The effect that the machine is intended to achieve in the context (the term 'machine' is defined as the artefact with which the end users will be interacting, i.e. the product being developed)
• Usage: The use of the machine by humans
• Architecture: The technical architecture of the machine
• Interaction: The interaction between human/context and the machine in detail

The levels are hierarchical in the sense that design decisions on higher levels restrict possible design decisions on underlying levels.

All interviews were transcribed in full and the transcribed material was coded into two broad categories: positive and negative statements
Table 6. Identified discrepancies and new requirements sorted in categories per case, with an example of an item in each category (disc. = discrepancy; new req. = new requirement; DL = design level). Some examples of discrepancies and new requirements have been intentionally vaguely written due to information security restrictions.

Case 1
  Operator interface: Start of pumps should be blocked when associated valves are in the wrong position (new req., DL Interaction)
  Placement (within the control room, within the plant): Manoeuvre of a frequently used valve should be remote due to risk of injury (new req., DL Usage)
  Functionality (that is not needed): Additional work stations (compared with today) are not needed (new req., DL Architecture)
  Space: More space for whiteboards and notice boards is needed (new req., DL Interaction)
  Work environment (lighting): Lighting must be adjusted to fit new work stations (new req., DL Architecture)

Case 2
  Operator interface: Buttons to initiate independent core cooling sequences placed too close to each other (disc., DL Interaction)
  Placement (within the plant): Additional indication of position of certain valves needed (disc., DL Architecture)
  Functionality (that is needed/not needed): Possibility for communication with main control room needed from certain locations in the plant (disc., DL Usage)
  Space: Room for ramps over thresholds needed (new req., DL Interaction)
  Staffing: Responsibility for certain tasks must be given to operators with certain competence (new req., DL Effect)
  Training/way of working: Training of operators must address consequences of initiating sequences for independent core cooling when they are not needed (new req., DL Usage)
  Procedures: Criteria for terminating independent core cooling must be clear in procedures (new req., DL Interaction)

Case 3
  Placement (within control room): Controls for communication devices can be moved from central parts of the control room (new req., DL Architecture)
  Functionality (that is needed/not needed): Shift supervisor needs a separate office space in addition to new workstation (new req., DL Architecture)
  Space: Central desk should be split so as not to hinder movement through the control room (disc., DL Architecture)
  Work environment (sound): Spot with bad acoustics in existing control room should be avoided when placing new workstations (new req., DL Architecture)
  Sight lines: Screens on one of the workstations must be placed in a way that does not block the view of the control panels (new req., DL Architecture)
  Usability (efficiency): Adjustable screen should be easy to move into position (new req., DL Interaction)
  Cooperation: Field operators not as included in control room work with new position of their workstations (disc., DL Architecture)
Fig. 1. Number of identified discrepancies and new requirements per design level (C1/C2/C3 = Case 1/Case 2/Case 3).
regarding the content and characteristics of the methods and how they were used. Each coded statement was then abstracted into a short description of its content, sorted, and descriptions with similar content were grouped.

The analysis of the compiled data (both types) was structured to assess the desired goals of the methods and the effects of the modifications made. The desired goals were: A) that they should be possible to use for assessment of higher level design decisions (early evaluation), and B) that they should be suitable for assessment of complex socio-technical systems, such as control rooms. The prerequisites for choosing the two methods were: 1) that they were not dependent on high-fidelity system representations (derived from Goal A), and 2) that they targeted categories of measures relevant for assessment of control room systems in early evaluation, namely task performance, teamwork, user experience, and identification of design discrepancies (derived from Goals A and B). The methods were modified to better adapt them to evaluation of control room systems (Goal B), and the effects of the following method modifications were investigated: 1) the structure of the heuristic evaluation (checking guidelines against the design and not the other way around), 2) adding discussion questions to the scenario-based talkthrough, and 3) assessing the severity of identified discrepancies at the end of both types of evaluation workshops. The data was also analysed with regard to the prerequisite of usefulness in practice, sorting statements into the categories of utility and usability.
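As an illustration of the tallying described in this section, the sketch below shows one way the identified items could be counted per method and per design level, and the overlap between the two workshops computed. It is a minimal sketch in Python under stated assumptions: the item records and variable names are hypothetical and do not represent the authors' actual analysis tooling.

# Minimal sketch (hypothetical data, not the authors' tooling): counting
# identified discrepancies/new requirements per evaluation method and per
# design level (Bligård et al., 2016), and the overlap between the
# heuristic evaluation (HE) and scenario-based talkthrough (ST) workshops.
from collections import Counter

DESIGN_LEVELS = ("Effect", "Usage", "Architecture", "Interaction")

# Each record: (item id, kind, design level, set of methods that identified it)
items = [
    ("item-01", "discrepancy", "Interaction", {"HE"}),
    ("item-02", "new requirement", "Architecture", {"HE", "ST"}),  # found in both workshops
    ("item-03", "new requirement", "Usage", {"ST"}),
]

per_method = Counter()  # number of items identified by each method
per_kind = Counter()    # discrepancies vs. new requirements
per_level = Counter()   # number of items per design level (methods pooled)
overlap = 0             # items identified in both the HE and ST workshop

for item_id, kind, level, methods in items:
    assert level in DESIGN_LEVELS, f"unknown design level: {level}"
    per_kind[kind] += 1
    per_level[level] += 1
    for method in methods:
        per_method[method] += 1
    if {"HE", "ST"} <= methods:
        overlap += 1

print("Items per method:", dict(per_method))
print("Items per kind:", dict(per_kind))
print("Items per design level:", dict(per_level))
print("Identified in both workshops:", overlap)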
4. Results

The number of discrepancies and new requirements identified in each workshop is presented in Table 5, together with the number of items identified in both workshops for each case. Table 6 presents the items sorted in categories according to their content. Fig. 1 presents identified discrepancies and new requirements per design level. The participants' ratings in the questionnaire are presented in Table 7.

In the interviews, the participants expressed both positive and negative aspects of the methods. They spoke about the evaluation activity in relation to the project, the organisation and the development process; the methods' ability to identify important discrepancies and new requirements; adapting the implementation of the methods to the design to be evaluated; the knowledge and expertise of the participants in the workshops; practicalities of planning and executing the workshops; and resource demands.

5. Analysis

The data from the case studies was used to investigate the usefulness of the methods, the effects of modification of the chosen methods, and whether the chosen methods met the desired goals.

5.1. The usefulness of the methods

To assess the usefulness of the methods, their utility and usability were analysed. When answering the questionnaire, on questions that related to utility (questions 1, 3, 4, 5, and 6) the participants were positive (Table 7). They perceived the utility of the methods to be high (questions 1, 3, and 4; median 4 to 4.5 out of 5). The HF specialists perceived the methods' ability to identify discrepancies and information that could improve the design later in the development process (that is, new requirements) to be good (questions 5 and 6, median 4 out of 5). The participants rated the methods similarly to a very high degree.

The methods' ability to identify discrepancies and new requirements is another aspect of their utility. More discrepancies were identified with the scenario-based talkthrough than with the heuristic evaluation. The pattern was not as clear regarding new requirements. In Case 1, the number of identified new requirements was almost the same with the two methods. In Case 2 more new requirements were identified with the heuristic evaluation, and in Case 3 the result was the opposite. Many of the identified discrepancies were classified as "must be addressed" or "should be addressed", indicating that the identified discrepancies were not considered trivial. No pattern in the classifications of severity of the identified discrepancies and requirements could be discerned between the methods.

A number of utility-related strengths of the methods were expressed in the interviews. Participants in all three cases noted that the methods were a good way to check that the design is "on the right track". It was also viewed as a positive aspect that the methods identified new requirements as well as discrepancies, and the HF specialist in Case 3 noted that the scenario-based talkthrough made it easier to find solutions to identified discrepancies. After the scenario-based talkthrough in Case 1, the HF specialists concluded that the workshop had become a data collection activity as well as an assessment, which was not planned, but they conceded that they had gained much useful knowledge about the work to be supported by the new design. The inclusion of persons with operational knowledge in the heuristic evaluation was considered necessary by participants in all three cases. However, the HF specialist in Case 3 pointed out that if the person with operational knowledge is not
Table 7. Questionnaire ratings (medians, scale 1-5). HE = heuristic evaluation workshop, ST = scenario-based talkthrough workshop.

Persons with operational knowledge & operators
  1) How do you experience the utility of performing this kind of evaluation? (very low utility/very high utility): HE 4, ST 4
  2) How was it to participate in this evaluation? (very easy/very difficult): HE 1, ST 2

Project leaders & designers
  3) How do you experience the utility of performing this kind of evaluation? (very low utility/very high utility): HE [did not participate], ST 4.5

HF specialists
  4) How do you experience the utility of performing this kind of evaluation? (very low utility/very high utility): HE 4.5, ST 4
  5) How do you experience the method's ability to identify discrepancies in the design? (very bad/very good): HE 4, ST 4
  6) How do you experience the method's ability to identify information that can improve the design later in the development process? (very bad/very good): HE 4, ST 4
  7) How would you approximate the work needed to prepare this kind of evaluation? (not at all demanding/very demanding): HE 2.5, ST 4
  8) How was it to understand how the method should be executed? (not at all difficult/very difficult): HE 1.5, ST 2
  9) How was it to moderate this evaluation workshop? (not at all demanding/very demanding): HE 2, ST 3
  10) How do you think this evaluation workshop went? (very bad/very good): HE 4.5, ST 4
  11) How likely is it that you will use this method in another project in the future? (very unlikely/very likely): HE 5, ST 5
familiar with the heuristic evaluation method (or similar methods), it might be more time-efficient to use the workshop as an opportunity for data collection. The data can then be used by the HF specialist after the workshop to assess the fulfilment of requirements and guidelines. The inclusion of designers in the scenario-based talkthrough workshop was viewed as very beneficial in Case 2 since the designers were getting firsthand insights into the use of the proposed design, and were able to provide very detailed answers to the operators' questions on the design. The HF specialist in Case 3 pointed out that a prerequisite for successfully including the project leader and designers in the workshop is that they listen, answer questions, and do not try to defend their design to the users. In addition, the positive aspects that are presented below regarding modification of the methods, as well as the reasons for choosing the methods and their desired goals, were also utility-related.

Comparing the utility-related strengths of the heuristic evaluation with those of the scenario-based talkthrough, both methods were viewed as having acceptable utility, but the latter was also viewed as providing functionality that was not expressed for the heuristic evaluation. The participants noted utility-related weaknesses in the methods as well; there were aspects of the control room system designs that the participants did not feel were sufficiently covered in the scenario-based talkthroughs. In Case 1, the discussion did not provide as many new requirements regarding how the digital operator interface should be designed as the HF specialists would have wanted. In the assessment of Case 2, on the other hand, several discrepancies in the operator interface design were identified but the discussion did not focus on the design of the tasks or decision-making to the degree that the HF specialists had hoped for. The scenarios in Case 3 did not sufficiently cover normal operation, which affected the discussions of the design. The concern that the heuristic evaluation did not sufficiently cover parts of the control room design was expressed in one of the cases. In Case 2, the HF specialists noted that the level of detail in the design to be evaluated differed between different parts, and that it would have been beneficial if the selected guidelines suited that level of detail. In addition, the negative aspects that are presented below regarding the modification of the methods, as well as the reasons for choosing the methods and their desired goals, were also utility-related.

When answering the questionnaire, for questions relating to usability (questions 2, 7, 8, and 9) the participants were overall positive (Table 7). However, the HF specialists believed that the scenario-based talkthroughs would be rather demanding to prepare (question 7, median 4 out of 5). Moderation of the scenario-based talkthrough workshops was felt to be moderately demanding (question 9, median 3 out of 5). These concerns did not show in the ratings for the heuristic evaluation.
The usability-related statements all concerned weaknesses in the methods. In the interviews after the heuristic evaluation workshops in Cases 1 and 2, the HF specialists noted that the exact purpose of the workshops was not sufficiently discussed and defined beforehand, which made the desired outcome of the workshops unclear. In Case 2, the HF specialists also noted that they did not think through which role each participant should have in the heuristic evaluation workshop, which contributed to a group dynamic that was not the way they would have wanted it to be. The HF specialists in Cases 1 and 2 did not think the purpose of the scenario-based talkthroughs was clear either. It was also noted that the roles and responsibilities of the participants in the scenario-based talkthrough workshops in Cases 2 and 3 were unclear as well. In all three cases the HF specialists questioned how detailed and elaborate a scenario-based talkthrough needs to be early in a development project, when the design is less detailed. In Case 3, the HF specialist specifically mentioned that a simpler and less time-consuming version would have sufficed. The HF specialists in Case 2 expressed difficulties in adapting the implementation of the scenario-based talkthrough to the design to be evaluated, and desired more support in this from the method. One such difficult adaptation was selecting relevant scenarios and shaping them to focus the discussion on the right level of detail (Case 2). Steering the discussion in the scenario-based talkthrough to a relevant level of detail was a problem the moderators in all three cases expressed. In all, the participants expressed more usability-related weaknesses regarding the scenario-based talkthrough than they did for the heuristic evaluation.

As an overall assessment of usefulness, the participants responded positively to the two questionnaire questions regarding this aspect. They felt the evaluation workshops went well (question 10, median 4 and 4.5 out of 5) and would use the methods again in the future (question 11, median 5 out of 5), Table 7.
5.2. Modification of methods

The procedure of the heuristic evaluation was changed (going through the guidelines and checking them against the design instead of the opposite) to adapt the method to larger and more complex systems such as control rooms. In Case 2 participants still pointed out that it was difficult to assess guidelines (especially more general ones) against such an extensive design, and suggested that the design could be divided into smaller parts and that the assessment could then be made for each part.

The discussion questions were added to the scenario-based talkthrough to focus the assessment on important aspects that would not necessarily be discussed during the talkthrough and to further stimulate a reflective discussion about the design as a whole. Participants in Cases
1 and 2 mentioned that the scenario-based talkthrough was a good method to focus the discussion on important aspects, for example through the added discussion questions. In Case 3, the moderator did not ask the discussion questions directly, but used them as a checklist of important aspects to guide the moderation of the discussion in the workshop. The HF specialist regarded most of the topics as important, but felt that it was not always necessary to ask them as specific questions. Participants in Cases 2 and 3 felt that the discussion questions must be better adapted to the design in question; not all questions were found to be applicable to the designs to be evaluated.

The participants did not express any strong opinions regarding the modification to assess the severity of identified discrepancies after the evaluation workshops. After the heuristic evaluation, the HF specialists in Case 2 reported that it would be beneficial to assess the importance of new requirements as well, and not only the severity of identified discrepancies. The HF specialist in Case 3 pointed out that the severity of single identified discrepancies must be assessed in relation to how they might affect the overall purpose of the control room system.
5.3. Choice of methods and desired goals

One prerequisite for the chosen methods was that they should not require high-fidelity system representations. In the scenario-based talkthrough in Case 2, the operators reported that it would have been easier for them to imagine the use situation if the system representation had been full-scale and included values of process parameters (these did not necessarily need to be dynamically presented, but could be given verbally by the moderator). They suggested that the full-scale print-outs of the operator interface could have been placed in the full-scale simulator of the control room to make it easier to assess the placement in the control room. They also found it difficult to imagine what it would be like to use the design in the more extreme use scenarios, since these included severe disturbances they had never experienced themselves in real life. No comments were given regarding the system representations in the heuristic evaluation workshops.

Another prerequisite for the chosen methods was the ability to target certain categories of measures, specifically task performance, teamwork, user experience, and identification of design discrepancies. As for the identification of design discrepancies category, participants in the heuristic evaluation workshops in both Cases 1 and 3 felt that reviewing a design against requirements and guidelines was a necessity. In the heuristic evaluations in Cases 1 and 2 participants thought it was beneficial to be "forced" to systematically review the design against consciously chosen criteria. In all three cases participants felt that the heuristic evaluation method helped them consider important aspects that they may otherwise have missed. The identification of design discrepancies category was not commented on as explicitly regarding the scenario-based talkthroughs. The scenario-based talkthrough also targeted the categories of task performance and teamwork. The HF specialist in Case 3 noted that it was necessary to assess how well the design supports the performance of tasks. Participants in Case 2 were not as explicit, but thought that going through the scenarios was a good way to provide a common foundation for discussions in the group, and that the scenario-based talkthrough "forced" participants to review important parts of the design. Participants in the scenario-based talkthroughs in Cases 2 and 3 explicitly reported that being able to hear the opinions of the future users was beneficial (user experience category).

Two methods were tested in the study to be able to contrast them against each other, but using the combination of heuristic evaluation and scenario-based talkthrough was explicitly appreciated by the HF specialist in Case 3, who viewed an assessment of the operators' ability to perform their tasks safely as a necessary complement to a review of the fulfilment of requirements and guidelines. Looking at the overlap in identified items (items identified in both the heuristic evaluation workshop and the scenario-based talkthrough workshop) in the different cases (Table 5) showed that there was little overlap in Cases 1 and 2. In
these cases, the methods seemingly complemented each other. In Case 3, on the other hand, the overlap was large: eight out of ten discrepancies and new requirements identified in the heuristic evaluation workshop were also identified in the scenario-based talkthrough. No clear pattern could be discerned in the content of the discrepancies and requirements identified with the different methods.

One of the desired goals of the methods was that they should allow assessment early in the development process (Goal A). Participants in all three cases reported that both methods allowed the design to be reviewed early in the development process, which they felt was very beneficial. The heuristic evaluation was appreciated because it made it possible to check early on if the design was in line with requirements and guidelines. The scenario-based talkthrough was appreciated because it offered a way to review the design together with future users early in the development process. With regard to the project timeline all three cases were in the early stages of the development process, but the discrepancies and new requirements identified still belonged to lower design levels. Identified items in Case 1 were distributed on the usage, architecture, and interaction levels, with emphasis on the two lowest. Identified items in Case 2 were distributed on all levels, but also with an emphasis on the lower levels. Case 3 showed a different pattern. Here, almost all discrepancies and new requirements belonged to the architecture level. No clearly discernible difference between the methods could be seen regarding this aspect.

The other desired goal of the methods was that they should allow evaluation of complex socio-technical systems such as control room systems (Goal B). The participants in the study offered no specific comments regarding this, but rated the methods' utility high in the questionnaires and spoke positively about it in the interviews. The modifications made to the methods to make them more suitable for evaluation of control room systems were deemed mostly positive. When the participants spoke negatively about the modifications it was in terms of what could be done to further improve the methods, not that they were not feasible.
6. Discussion

The purpose of this study was to test the feasibility of methods for early formative evaluation of nuclear power plant control room systems, in other words assessment of more general, high level, design decisions. When tested in practice in the three cases, the methods received positive feedback, both in the interviews and in the questionnaire ratings. Some negative aspects were identified too, but they related to improvements rather than calling into question the feasibility of the methods. The identified discrepancies and new requirements were not found to be trivial, which indicates that the methods provided a result of value. In all, the results indicate that the validity of the methods is acceptable. However, the chosen research method (case studies) does not provide data on the reliability of the methods. The methods were only tested once in each case, with no possibility to draw conclusions regarding their ability to produce the same result if repeated.

The methods presented in this paper were selected to allow assessment of higher level design decisions. While the assessments were executed in the early stages of the development projects, the categorisation of the design level of identified discrepancies and new requirements (Fig. 1) shows that lower design levels were overrepresented in the outcome. One possible explanation for this is the nature of the changes to the existing control room system made in the three cases. In Cases 1 and 3, the control room systems were to have the same functionality and support the same tasks, but the technology to realise the functions was to be changed. In Case 1, the operator interface was to be screen-based instead of analogue (which required, e.g., new desks in the control room and a shift in where information was to be shown). In Case 3, functionality was mainly relocated within the control room (e.g. the place where the shift supervisor was to stand or sit during operational tasks was moved backwards in the main control room space). For these
cases, it was logical that fewer discrepancies or new requirements were identified for the higher design levels since these levels had not been modified. Case 2, however, involved the implementation of new functionality in the plant (independent core cooling). Because of this, the evaluated design involved changes to higher design levels (e.g. the addition of new operational situations to handle, such as loss of offsite power – design level Effect; and the addition of new tasks to handle these situations – design level Usage), which could explain the prevalence of identified discrepancies and new requirements at these levels.

The case studies identified some weaknesses in the methods. One subset of identified weaknesses related to the possibility of adapting the methods to the development project in question and the control room system to be evaluated. The exact purpose of the evaluation activity was not always discussed and defined in the cases, which made the desired outcome of the workshops unclear. Not knowing the exact purpose and outcome of an activity makes it more difficult for the HF specialists to plan and execute it in a way that fully utilises its potential. The HF specialists also commented on the difficulty of selecting the right guidelines, creating the right scenarios, and formulating suitable discussion questions. Operators participating in the scenario-based talkthroughs reported that imagining the future use of the control room system was sometimes difficult, and suggested how the system representation could have made it easier for them. The system representation used in an evaluation will have an impact on the design decisions that are addressed (see for example Andersen and Broberg, 2015). Consciously choosing a system representation that helps focus the participants’ attention on relevant aspects of the design is thus advisable. To address these weaknesses, the methods should better support defining the purpose of the evaluation activity and adapting the method implementation to this, as well as to the development project in question and the control room system to be evaluated. Better support is needed for choosing a suitable system representation, for guideline selection, for scenario creation, and for the formulation of discussion questions.

As an example, detailed support for scenario creation is given by Andersen (2016), which could be used to address this weakness identified in the present study. Andersen (2016) defined three types of scenarios: scenarios developed on the spot (i.e. not prepared beforehand but by the participants during the workshop), case stories (characterising work situations to be solved), and task sequences (defined tasks with the inclusion of a time factor, e.g. how long after the start of the scenario the task should be executed). It is also recommended that these scenarios are combined with predefined unexpected events to simulate situations where not everything goes according to plan. These different scenario types have different advantages and disadvantages, and are suitable under different circumstances. Scenarios developed on the spot are suitable when no resources for scenario creation are available beforehand and when it is desirable that the participants are able to influence the scenario creation. Case stories are suitable when resources for scenario creation are available beforehand and when it is desirable that new work practices are discussed during the workshop. Task sequences are also suitable when resources for scenario creation are available beforehand, but are more advantageous when it is of interest to let participants explore different task sequences in parallel.

Comparing the scenarios used in the three cases in the present study to the categories by Andersen (2016) shows that the scenarios are similar to, but do not map exactly onto, those categories. The scenarios in Case 1 are similar to simple case stories, describing overall tasks to be done but without much detail on the work situations. The HF specialists in Case 1 expressed in the interviews that the evaluation did not provide as many new requirements for the upcoming design of the digital user interface as they had hoped. It is possible that more detailed scenarios, i.e. task sequences, could have contributed more to this, since a higher level of detail would have led the discussion towards more specific tasks. The scenarios in Cases 2 and 3 are similar to task sequences, but without the time factor. In Case 2, the HF specialists expressed that they had hoped for more discussion on the design of tasks and decision-making, which could indicate that a less detailed scenario type that facilitated discussion about new work practices would have been more suitable. On the other hand, the scenarios in Case 2 included several operators in different roles who were meant to execute different tasks and cooperate. Not using task sequences could have made this exploration of parallel task sequences more difficult. In Case 3, the HF specialist explicitly stated that a simpler and less time-consuming version of the scenario-based talkthrough would have sufficed. Since the design to be evaluated affected more general movement patterns in the control room rather than detailed operator interface interactions, it is possible that case stories would have sufficed. But as in Case 2, the scenarios in Case 3 involved several different operator roles executing different tasks, and forgoing task sequences would have made exploration of parallel task sequences more difficult. The reasoning above suggests that scenarios should be chosen carefully to suit the purpose and the conditions of the evaluation, and further exploring the scenario types of Andersen (2016) in the context of nuclear power plant control room systems would be interesting in future studies.
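To illustrate the distinction between case stories and task sequences discussed above, the minimal sketch below (in Python) shows how a task sequence adds a time factor, parallel roles, and predefined unexpected events to a described work situation. The field names and the scenario content are invented for illustration and are not taken from Andersen (2016) or from the scenarios used in the cases.

```python
from dataclasses import dataclass, field

@dataclass
class CaseStory:
    """A work situation to be solved; no prescribed task order or timing."""
    title: str
    situation: str

@dataclass
class TimedTask:
    minutes_after_start: int   # the time factor that distinguishes task sequences
    role: str                  # several roles allow parallel task sequences
    task: str

@dataclass
class TaskSequence:
    title: str
    tasks: list[TimedTask]
    unexpected_events: list[TimedTask] = field(default_factory=list)

# Invented example loosely resembling the Case 2 type of scenario.
scenario = TaskSequence(
    title="Loss of offsite power",
    tasks=[
        TimedTask(0, "reactor operator", "acknowledge alarms and assess plant state"),
        TimedTask(5, "turbine operator", "verify start of emergency power supply"),
        TimedTask(10, "shift supervisor", "decide on entry into emergency operating procedures"),
    ],
    unexpected_events=[
        TimedTask(12, "moderator", "inject a failure of one emergency power train"),
    ],
)

# A talkthrough could walk through tasks and injected events in time order, role by role.
for step in sorted(scenario.tasks + scenario.unexpected_events,
                   key=lambda t: t.minutes_after_start):
    print(f"t+{step.minutes_after_start:>2} min  {step.role}: {step.task}")
```

In this representation, a case story keeps only the title and situation description and leaves the order of tasks to the participants, whereas removing minutes_after_start from a task sequence gives roughly the scenario form used in Cases 2 and 3.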
Another subset of identified weaknesses in the methods related to their practical execution and how this influences the behaviour of the participants. Comments were offered regarding unclear roles and responsibilities of the participants, that it was difficult to assess guidelines against an extensive design concept, and that the severity of discrepancies must be assessed in relation to their impact on the control room system’s purpose. Moderators expressed that they sometimes found it difficult to know what level of detail it was relevant to steer the discussions towards. The introduction given to the participants in the workshops should have been clearer, and the moderators should have been given more support for steering the discussion during the workshops. The heuristic evaluation method should better support assessing large design concepts against guidelines. It was suggested that the methods should prompt assessment of the importance of new requirements as well as the severity of discrepancies. However, while this might be useful for the continuing development of the control room system, it would also lengthen the evaluation workshop. Regarding the practical execution of the workshops, it was noted that the participants did not adhere to the instruction to note discrepancies and new requirements on Post-it notes for later discussion instead of directly interrupting the workshop flow with comments. In addition, the moderators did not emphasise this rule to any great extent. This may have been because the moderators did not want to interrupt the workshop flow with admonitions to the participants, or because they simply forgot.

In Cases 1 and 2, a large proportion of the identified discrepancies and new requirements were uniquely found with one of the methods. This was not true for Case 3, which might be because the selected guidelines did not suit the design to be evaluated. One strength of a heuristic evaluation is that the guidelines steer the review to look for known typical design problems. With insufficient focus on control room layout and workstation design in the set of guidelines used, it is possible that this strength was not fully utilised. The small overlap of discrepancies and new requirements in Cases 1 and 2 can be viewed as an indication that the methods complement each other in the sense that one method is able to identify items that the other is not. It may, however, also be a consequence of the participants not being the same in the two workshops. Different individuals with different knowledge and experiences will likely identify different discrepancies and requirements. Even though combining the methods was explicitly appreciated by one of the HF specialists, the data from the case studies are not enough to conclude that it is beneficial to combine the two different methods. The notion that involving more persons in the evaluation will likely lead to the identification of more discrepancies and requirements could be viewed as support for executing the same method multiple times with different participants. The methods’ complementary relationship may, however, also be examined by reviewing the participants’ statements in the interviews.
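As a schematic illustration of how the chosen guidelines steer what a heuristic evaluation looks for, the sketch below (in Python, with invented guideline texts and judgements that are not the ones used in the case studies) walks through every chosen guideline so that none is skipped and records a verdict (met, discrepancy, or new requirement) for each. Anything the guideline set does not cover is, by construction, never asked about.

```python
from dataclasses import dataclass

@dataclass
class Guideline:
    gid: str
    statement: str        # invented placeholder guideline text
    category: str         # category of measures the guideline targets

@dataclass
class Finding:
    gid: str
    verdict: str          # "met", "discrepancy" or "new requirement"
    note: str = ""

# Hypothetical, consciously chosen criteria for one workshop.
guidelines = [
    Guideline("G1", "Controls that are used together are grouped together", "design discrepancies"),
    Guideline("G2", "Alarm information is readable from the operator workstation", "task performance"),
    Guideline("G3", "Workstation layout supports communication in the crew", "teamwork"),
]

def review(judgements: dict) -> list:
    """Force a verdict for every guideline; unassessed guidelines remain visible."""
    return [Finding(g.gid, *judgements.get(g.gid, ("not assessed", ""))) for g in guidelines]

# Example session: the evaluators' judgements would be collected during the workshop.
for finding in review({"G1": ("discrepancy", "trend displays split across two desks")}):
    print(finding)
```

The layout and workstation issues discussed for Case 3 would only surface in such a review if guidelines like the hypothetical G3 were included in the chosen set, which is the point made above about guideline selection.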
Participants in all cases noted that it was beneficial to systematically review the design against consciously chosen criteria in the heuristic evaluation. Some even deemed it to be a necessity. The scenario-based talkthrough, on the other hand, provided additional benefits, for instance greater ease in finding solutions to discrepancies and serving as a form of data collection. The scenario-based talkthrough served a communicative purpose by allowing project members and users to interact, but this could be achieved in the heuristic evaluation as well by including project members and users. The scenario-based talkthrough also targeted categories of measures not covered by the heuristic evaluation: task performance, teamwork, and user experience (though the latter could be addressed by including persons currently working as operators in the workshop). Using a combination of the two methods for assessment would provide the advantages of both.

The complementary relation of the methods may also be studied through the ‘tactic’ used in the evaluation. The heuristic evaluation seeks to establish the existence of, and locate, typical problems in the design (the word ‘problem’ here denotes both discrepancies in the proposed design and new requirements). Design guidelines are knowledge about successful design solutions presented as design advice. In a heuristic evaluation, when choosing design guidelines to use in the assessment, the type of problems to look for is also chosen. For example, choosing a guideline that advocates consistency in the design will make the evaluator search for elements of the design that are visually different but functionally alike. During this assessment, the evaluator might not as easily identify other types of problems, such as the interface lacking a possibility to mitigate erroneous actions. The goal can be expressed as finding the unknown knowns – identifying whether typical problems known to be common in designs exist in this specific design, and if so, where. A risk associated with using guidelines, however, is that if they are not complete, not updated, or not appropriate for the given use situation, they may invoke a sense of false security (Hale et al., 2007). Hale et al. (2007) pointed out that safety standards (which can be regarded as design guidelines) should not become a substitute for thinking about use situations and the challenges they pose to the design. The scenario-based talkthrough focuses on use situations, use, the consequences of that use, and finally tracing these to discrepancies in, or requirements on, the design. Through this focus the method is more capable of identifying and locating problems that are not explicitly sought. Here, instead of looking for a typical known design problem, undesirable phenomena can be identified (e.g. a use error such as pressing the wrong button), which can be traced back to design problems (such as buttons being too similar) without that being the specific problem that was sought. The goal can be expressed as finding the unknown unknowns – identifying design problems whose possible existence was not explicitly imagined before the evaluation.

The HF specialists in all cases questioned how detailed and elaborate a formative scenario-based talkthrough actually needs to be when used early in the development process. In a development project resources are finite, and a trade-off between efficiency and thoroughness (cf. Hollnagel, 2009) is needed. The ideal evaluation should identify all problems in a design.
A scenario-based talkthrough could theoretically come close to this if all possible variations of context configurations were covered, but it would be a very resource-intensive assessment for a large and complex socio-technical system such as a control room system. One way to make the evaluation more efficient is to utilise prior knowledge in the form of design guidelines, as in a heuristic evaluation, and decrease the scope of the scenario-based talkthrough. However, with this approach thoroughness is sacrificed since problems not covered by the design guidelines will most likely not be identified. The two methods were originally included in the study to make it possible to contrast them against each other, but the combination of a heuristic evaluation and a scenario-based talkthrough may be viewed as a way to achieve an acceptable trade-off between efficiency and thoroughness. The combination of a heuristic evaluation and a scenario-based evaluation method to assess a system from a HF point of view has also
been suggested by other researchers. CRIOP (Crisis Intervention and Operability analysis) is a methodology developed for verification and validation of control centres in the Norwegian oil and gas industry (Johnsen et al., 2011). CRIOP has many similarities with the method combination used in the present case study, primarily the combined use of guidelines and scenarios but also the inclusion of stakeholders other than users and HF specialists, such as designers. The main difference between CRIOP and the method combination in the present study lies in how much specific direction is given by the method, particularly for the execution of the scenario-based talkthrough. CRIOP is an elaborate methodology, and it specifies in great detail how the evaluation should be performed. It is a methodology that serves both formative and summative purposes, whereas the present study explored an approach for formative evaluation only. Compared to CRIOP, the method combination in this study gave less specific direction and left more freedom to the moderator to make decisions as seen fit. The scenarios for the scenario-based talkthrough were prepared prior to the workshop, shortening the duration of the evaluation workshop (in CRIOP, scenarios are developed and documented during the workshop). The scenario analysis in CRIOP involves going through a list of questions for each event in a scenario, whereas the scenario-based talkthrough in this study only required going through the discussion questions after all scenarios were finalised. This study can thus be said to have explored a less resource-demanding approach to formative evaluation than CRIOP (which aims at summative evaluation as well). The overall impression from the case studies was positive: the benefit of using the methods was deemed to outweigh the costs. Some participants even wondered whether parts of the approach could be made still more efficient.

The research method chosen for this study was a case study, to better understand the use of the methods in practice. Testing the methods in real modernisation projects gave the opportunity to understand how the methods suit the resource limitations in development projects. It is also likely that the representatives of the users were more motivated to identify discrepancies in the design since it would have an actual impact on their work environment. The HF specialists in the development projects moderated the workshops, and it is possible that some of the problems with the scenario-based talkthrough voiced by the participants stem from the moderators not being familiar with the exact version of the method used in the study (all HF specialists had previously used heuristic evaluation). However, learnability is an important attribute of usability (Nielsen, 1993), and the present study approach gave an opportunity to explore possible problems with learning to use the method. In the cases, most of the preparation before the workshops was done by the researcher, with the exception of the scenario-based talkthrough in Case 2, where the HF specialists prepared the scenarios. Thus the HF specialists’ answers regarding the resources and effort needed for preparation are only estimates. While their estimates in Cases 1 and 3 did not differ from those in Case 2, it is possible that preparing the workshops is overall a larger problem than was indicated in this study. This is something that should be investigated further in future studies. The cases in which the methods were tested were diverse in scope and nature.
The design concept assessed in Case 1 consisted of a whole control room, in Case 2 of smaller parts of a control room as well as new control panels locally in the plant, and in Case 3 of a part of the control room. Cases 1 and 2 included changes to operator interfaces, while Case 3 did not. Cases 1 and 3 involved larger layout changes, while Case 2 only involved smaller additions to the existing layout. Case 2 included new functionality, while Cases 1 and 3 did not. This diversity suggests that the methods, and the method combination, are useful for a variety of control room system development projects. However, one limitation of the study is that all cases were executed at the same power plant, within the same company, and little can be said regarding the influence of organisational culture on the usefulness of the method combination. Additional studies are needed to assess if the methods, and the method combination in particular, are useful in other organisations or domains, where organisational barriers to formative evaluations may be higher or
different in nature.

One conclusion of the present study is that the combination of heuristic evaluation and scenario-based talkthrough has advantages. The present study constitutes one iteration loop in developing a method combination for early formative assessment of control room systems. The results from the case studies identified a number of ways in which the methods could be improved, and further work with the method combination should involve implementing these improvements and assessing the modified version. One area of particular interest is exploring the method combination’s usefulness in domains other than nuclear power. Investigating the feasibility of the preparations before executing the evaluation workshops, such as preparing guidelines and scenarios, is another topic of interest for future studies.
7. Conclusions

The heuristic evaluation and scenario-based talkthrough methods can be used for early formative evaluation of nuclear power plant control room systems. The methods are not dependent on high-fidelity system representations and were found to be useful in practice when tested in three different modernisation projects in industry.

Combining the methods makes it possible to take advantage of the strengths of both methods. A combination of the two methods also allows a trade-off between efficiency and thoroughness in the evaluation, by combining a search for typical design problems using guidelines with a use-focused approach that identifies and locates problems not explicitly sought.

Proposals for future work are: (1) improving the method combination by providing better support for adapting the implementation of the methods to the development project in question and the control room system to be evaluated; (2) improving the method combination in terms of better support for the practical execution of the evaluation activity; and (3) investigating the method combination’s usefulness in domains other than nuclear power as well as the feasibility of the preparations needed.

Relevance for industry

Formative human factors evaluation is an important activity in control room system development to create a design that has the intended impact when implemented. Evaluating early in the development process reduces the risk of late, expensive, and potentially less optimal changes in the design. The present study tested the feasibility of methods for early formative control room system evaluation in three nuclear power plant control room modification projects. The two tested methods were found to be useful in the three cases and suitable for early formative evaluation. Combining the methods makes it possible to take advantage of the strengths of both methods.

Acknowledgements

The authors would like to thank all the participants in the study for sharing their time and expertise. The study presented in this paper was supported by the Swedish Radiation Safety Authority.

References

Andersen, S.N., 2016. Participatory Simulation in Hospital Work System Design (Dissertation). Technical University of Denmark.
Andersen, S.N., Broberg, O., 2015. Participatory ergonomics simulation of hospital work systems: the influence of simulation media on simulation outcome. Appl. Ergon. 51, 331–342.
Andersson, J., Bligård, L.-O., Osvalder, A.-L., Rissanen, M.J., Tripathi, S., 2011. To develop viable human factors engineering methods for improved industrial use. In: Marcus, A. (Ed.), Design, User Experience, and Usability, Pt I, Human Computer Interaction International, Orlando, FL, 2011. Lecture Notes in Computer Science, vol. 6769. Springer, Berlin, pp. 355–362.
Bligård, L.O., Simonsen, E., Berlin, C., 2016. ACD3 - a new framework for activity-centered design. NordDesign 2016; 10-12 August 2016, Trondheim.
Boring, R.L., Joe, J.C., Ulrich, T.A., Lew, R.T., 2014. Early-stage design and evaluation for nuclear power plant control room upgrades. In: Proceedings of the Human Factors and Ergonomics Society; 2014, pp. 1909–1913.
Bradford, J.S., 1994. Evaluating high-level design: synergistic use of inspection and usability methods for evaluating early software designs. In: Nielsen, J., Mack, R.L. (Eds.), Usability Inspection Methods. John Wiley & Sons, New York, pp. 235–253.
Daniellou, F., 2007. Simulating future work activity is not only a way of improving workstation design. Activités 4 (2).
Endsley, M.R., 1987. SAGAT: A Methodology for the Measurement of Situation Awareness. Northrop Technical Report NOR DOC 87-83.
Endsley, M.R., 1988. Situation awareness global assessment technique (SAGAT). In: Proceedings of the National Aerospace and Electronics Conference (NAECON); 23-27 May 1988, Dayton, OH.
Flyvbjerg, B., 2006. Five misunderstandings about case-study research. Qual. Inq. 12 (2), 219–245.
Følstad, A., 2007. Group-based expert walkthrough. In: COST294-MAUSE 3rd - Review, Report and Refine Usability Evaluation Methods (R3UEMs); 5 March 2007, Athens, pp. 58–60.
Følstad, A., 2007. Work-domain experts as evaluators: usability inspection of domain-specific work-support systems. Int. J. Hum. Comput. Interact. 22 (3), 217–245.
Hale, A., Kirwan, B., Kjellén, U., 2007. Safe by design: where are we now? Saf. Sci. 45 (1–2), 305–327.
Hart, S., Staveland, L., 1988. Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. In: Hancock, P., Meshkati, N. (Eds.), Human Mental Workload. North Holland, Amsterdam, pp. 139–183.
Hendrick, H., Kleiner, B., 2001. Macroergonomics: An Introduction to Work System Design. Human Factors & Ergonomics Society, Santa Monica, California.
Hollnagel, E., 2009. The ETTO Principle: Efficiency-Thoroughness Trade-Off. Why Things that Go Right Sometimes Go Wrong. Ashgate, Farnham, UK.
IAEA International Nuclear Safety Advisory Group, 1999. Basic Safety Principles for Nuclear Power Plants. International Atomic Energy Agency, Vienna (INSAG: 75-INSAG-3).
International Electrotechnical Commission, 2009. IEC 60964:2009 Nuclear Power Plants - Control Rooms - Design. International Electrotechnical Commission, Geneva.
Johnsen, S.O., Bjørkli, C., Steiro, T., Fartum, H., Haukenes, H., Ramberg, J., Skriver, J., 2011. CRIOP: a scenario method for Crisis Intervention and Operability analysis. SINTEF, Trondheim. https://www.sintef.no/globalassets/upload/teknologi_og_samfunn/sikkerhet-og-palitelighet/prosjekter/criop/criopreport.pdf.
Laarni, J., Savioja, P., Karvonen, H., Norros, L., 2011. Pre-validation of nuclear power plant control room design. In: International Conference on Engineering Psychology and Cognitive Ergonomics (EPCE 2011); 9-14 July 2011, Orlando, FL, pp. 404–413.
Laarni, J., Savioja, P., Norros, L., Liinasuo, M., Karvonen, H., Wahlström, M., Salo, L., 2014. Conducting multistage HFE validations – constructing systems usability case. In: Proceedings of the ISOFIC/ISSNP 2014; 24-28 August 2014, Jeju, Republic of Korea.
Nielsen, J., 1993. Usability Engineering. Academic Press, San Diego.
Nielsen, J., 1994. Heuristic evaluation. In: Nielsen, J., Mack, R.L. (Eds.), Usability Inspection Methods. John Wiley & Sons, New York, pp. 25–62.
Papin, B., 2002. Integration of human factors requirements in the design of future plants. In: Proceedings of the Enlarged Halden Program Group Meeting; 8-13 September 2002, Storefjell.
Scriven, M., 1967. The methodology of evaluation. In: Tyler, R., Gagne, R., Scriven, M. (Eds.), Perspectives on Curriculum Evaluation (AERA Monograph Series – Curriculum Evaluation). Rand McNally and Co, Chicago.
Shorrock, S.T., Williams, C.A., 2016. Human factors and ergonomics methods in practice: three fundamental constraints. Theor. Issues Ergon. Sci. 17 (5–6), 468–482.
Simonsen, E., 2017. A comparison of human factors evaluation approaches for nuclear power plant control room assessment and their relation to levels of design decision specificity. In: Nordic Ergonomic Society 2017 "Joy at Work" Conference Proceedings; 20-23 August 2017, Lund, pp. 405–414.
Simonsen, E., Osvalder, A.-L., 2018. Categories of measures to guide choice of human factors methods for nuclear power plant control room evaluation. Saf. Sci. 102, 101–109.
Virzi, R.A., Sokolov, J.L., Karis, D., 1996. Usability problem identification using both low- and high-fidelity prototypes. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Common Ground; 13-18 April 1996, Vancouver, pp. 236–243.
Yin, R.K., 2014. Case Study Research: Design and Methods, fifth ed. SAGE, London.