The effect of metacognitive monitoring feedback on performance in a computer-based training simulation

Applied Ergonomics 67 (2018) 193–202
http://dx.doi.org/10.1016/j.apergo.2017.10.006
Received 12 August 2016; received in revised form 25 September 2017; accepted 9 October 2017

Jung Hyup Kim, Department of Industrial and Manufacturing Systems Engineering, University of Missouri, Columbia, MO 65211, USA (e-mail: [email protected])

Keywords: Retrospective confidence judgments; Computer-based training; Human-in-the-loop simulation; Human performance

ABSTRACT

This laboratory experiment was designed to study the effect of metacognitive monitoring feedback on performance in a computer-based training simulation. According to prior research on metacognition, the accurate checking of learning is a critical part of improving the quality of human performance. However, only rarely have researchers studied the learning effects of the accurate checking of retrospective confidence judgments (RCJs) during a computer-based military training simulation. In this study, we provided participants with feedback screens after they had completed a warning task and an identification task in a radar monitoring simulation. There were two groups in this experiment. One group (group A) viewed feedback screens showing the flight paths of all target aircraft together with triangular graphs of both RCJ scores and human performance. The other group (group B) viewed feedback screens showing only the flight paths of all target aircraft. There was no significant difference in performance improvement between groups A and B for the warning task (Day 1: group A = 0.347, group B = 0.305; Day 2: group A = 0.488, group B = 0.413). However, the identification task yielded a significant difference in performance improvement between these groups (Day 1: group A = 0.174, group B = 0.1555; Day 2: group A = 0.324, group B = 0.199). The results show that debiasing self-judgment of the identification task produces a positive training effect on learners. The findings of this study will be beneficial for designing an advanced instructional strategy in a simulation-based training environment.

1. Introduction

Computer-based training simulation has become a prevalent instructional tool for military training (Bell and Kozlowski, 2007). Researchers have developed various instructional strategies to improve the efficiency of military training systems (Vogel-Walcutt et al., 2013). Providing metacognitive prompts is one of the recent training strategies within complex military contexts (Fiore et al., 2008; Fiorella et al., 2012). In this approach, trainees are provided with prompts that calibrate their understanding of materials related to conceptual and integrated knowledge. Metacognitive prompting is very effective for trainees at the novice and journeyman levels (Vogel-Walcutt et al., 2013). Although the results of previous research are encouraging, continuing studies on this new instructional strategy are needed to obtain further empirical evidence toward improving military training efficiency. To address this need, the present study was designed to identify the learning effects of viewing feedback on retrospective confidence judgments (RCJs), which result from metacognitive prompting, together with operator action performance (OAP). After participants performed training scenarios and answered RCJ probes, a feedback screen was automatically displayed on the main monitor.

The experiment tested two different feedback screens. One group (group A) viewed the feedback screens with the flight path of all target aircraft and the triangular graphs of both RCJ and OAP scores together. The other group (group B) only watched the feedback screens with the flight path of all target aircraft.

1.1. Metacognitive prompting and learning

Metacognition refers to thoughts about thoughts (Flavell, 1979) or the knowledge and regulation of one's own cognition (Nelson and Narens, 1994). It is related to the ability to monitor and control our knowing (Van Overschelde, 2008). Metacognition consists of three elements: metacognitive knowledge, metacognitive monitoring, and metacognitive control. Metacognitive knowledge is defined as people's declarative/procedural knowledge about cognitive processes. It plays an important role in selecting appropriate learning strategies and managing cognitive resources. The second element, metacognitive monitoring, is the ability to make accurate judgments at the meta-level. According to Nelson and Narens (1990), metacognitive monitoring involves the flow of information from cognition (the object-level) to metacognition (the meta-level).



The object-level is the view of the ongoing cognitive activities, such as attention, learning, and external objects (e.g., "that thing I see is an aircraft"). The meta-level is defined as the learner's understanding of the ongoing cognitive processes at the object-level. The last element, metacognitive control, regulates the ongoing cognitive activities, such as a decision-making procedure regarding the use of new tactics to solve a difficult problem. Metacognition helps learners develop an integrated learning process involving attention to one's own behaviors, current progress toward a goal, and evaluative responses to one's own performance.

Recently, the importance of metacognition has received considerable empirical attention in the literature (Boekaerts et al., 2005; Dunlosky and Bjork, 2013; Hacker et al., 2009). Although the definitions of metacognition vary, they focus on two primary dimensions: awareness and regulation (Schraw, 1998; Schraw and Dennison, 1994). According to prior research on metacognition, successful learning often results from participation in the specific awareness and regulation of cognition (Azevedo, 2005; Georghiades, 2004; Hattie et al., 1996; Wang et al., 1990). Learners who are equipped with a high level of metacognitive skill are aware of their current understanding of the training material as well as their own performance. Several studies have shown that a novice trainee's lack of skill reduces his or her ability not only to do a given task correctly but also to accurately judge future performance (Bol and Hacker, 2001; Dunning et al., 2003; Klassen, 2002). These results indicate that the calibration of metacognitive monitoring is a critical component in improving the quality of performance.

Different experimental studies have focused on the calibration of metacognitive monitoring (Bol et al., 2012; Chiu and Klassen, 2010). Additionally, researchers have investigated the effects of the calibration of metacognitive monitoring on performance in computer-based military training simulations (Cuevas et al., 2004; Fiore and Vogel-Walcutt, 2010; Fiorella and Vogel-Walcutt, 2011; Fiorella et al., 2012; Kim et al., 2012; Wiltshire et al., 2014). Although calibration is a major component of the metacognitive learning model (Winne and Hadwin, 1998), there is insufficient information on how to measure the gap between trainees' knowledge and their actions and performance.

Fig. 1. Example of a perfect calibration.

1.2. Current study and rationale

The purpose of the present study is to investigate the training effect caused by debiasing learners' RCJs during a computer-based simulation. RCJs are metamemory judgments that play a role in the regulation of memory. They are metacognitive monitoring metrics associated with retrieval and are commonly used to measure a participant's confidence in his or her responses before the performance result is known (Dougherty et al., 2005). Researchers in metacognitive monitoring have found that most RCJs are either over- or under-confident (Dunlosky and Metcalfe, 2008). Over-confidence is observed when the RCJ score is higher than the task performance. In contrast, under-confidence is found when the RCJ score is lower than the performance. When the RCJ score is equal to the performance, the result is a perfect calibration (see Fig. 1). People are often overconfident with general knowledge items (Tversky and Kahneman, 1975), a phenomenon called the overconfidence effect, or under-confident when they feel that the task is relatively easy (Gigerenzer et al., 1991).

According to the existing literature, there are two techniques for debiasing overconfidence in RCJs: response-oriented modification and process-oriented modification (Keren, 1990). The response-oriented technique involves providing feedback that informs the trainees about the overconfidence of their RCJs. In the process-oriented technique, on the other hand, the trainees are required to generate reasons for their answers before responding to RCJ probes (Dunlosky and Metcalfe, 2008). These debiasing techniques can affect calibration in two different ways. First, they can influence the trainees' judgments of confidence. In this case, the trainees set an initial value and adjust from that anchor to develop a judgment of confidence. This is called the anchoring-and-adjustment heuristic (Tversky and Kahneman, 1975).
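To make the over-/under-confidence definitions above concrete, here is a minimal Python sketch that classifies a trainee's calibration state from an RCJ score and an OAP score (both assumed to lie on the same 0–1 scale used in this study); the function name and tolerance are illustrative and not part of the original materials.

```python
def classify_calibration(rcj: float, oap: float, tol: float = 1e-9) -> str:
    """Compare a retrospective confidence judgment (RCJ) with operator
    action performance (OAP); both are assumed to lie on a 0-1 scale."""
    bias = rcj - oap  # positive bias = over-confidence, negative = under-confidence
    if abs(bias) <= tol:
        return "perfect calibration"  # RCJ == OAP (see Fig. 1)
    return "over-confident" if bias > 0 else "under-confident"

# A trainee who reports RCJ = 0.7 but scores OAP = 0.4 is over-confident.
print(classify_calibration(0.7, 0.4))  # -> "over-confident"
```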

Hacker et al. (2000) compared students' predicted performance before an exam with their postdictions after the exam. The results showed that the students' postdicted scores were lower and less overconfident than their predicted scores. This indicates that the students began with an anchor near the predicted scores and adjusted their judgment downward after the exam. The findings of Hacker et al. show that the anchoring-and-adjustment heuristic reflects the metacognitive process of debiasing overconfidence in RCJs.

The other way that debiasing techniques affect calibration is by influencing not only the trainees' judgments but also their performance. Huff and Nietfeld (2009) found that students who made a habit of calibrating retrospective confidence judgments against test performance improved their calibration accuracy and showed higher confidence in their test performance than other students. Nietfeld et al. (2006) also found that feedback improved both metacognitive monitoring accuracy and performance; the students who improved their performance also showed enhanced calibration associated with self-efficacy. According to Coutinho (2008), the relationship between performance and metacognitive judgments could be regulated by self-efficacy: a person's perceived ability to achieve a successful result and his or her metacognitive judgment both influence human performance. Hence, in the present study, the response-oriented technique for debiasing RCJs is used to improve performance, because giving feedback about the accuracy of the trainees' RCJs can direct their self-efficacy and attention to the discrepancies between performance and confidence.

The effects of RCJs have been previously tested in a computer-based training simulation. Sethumadhavan (2011) examined individuals' RCJs regarding their performance by using an air-traffic-control task. The results showed that the participants with higher confidence in their performance tended to have a better outcome and were faster in responding to system failures. However, other researchers have found that RCJs are accurate only in predicting search behavior (McCarley and Gosney, 2005; Mitchum and Kelley, 2010). For this reason, additional studies are needed to examine the relationship between RCJs and human performance. In the present work, a time-window-based human-in-the-loop (TWHITL) simulation representing an anti-air warfare coordinator (AAWC) was used as a tool to collect RCJ and human performance data in a computer-based training environment. During the experiment, the TWHITL simulation activated multiple task events, and each participant was required to carry out each event within a given time frame. The accuracy of on-time correct actions was used as a measure of the participant's performance, referred to as the operator action performance (OAP). There are two types of OAP: the first is for the warning task, and the second is for the identification task. Both tasks are discussed in detail in Section 2.2. After each training session, the participants were provided with their RCJ and performance scores. Two groups participated in the experiment.


Fig. 2. Sensor and aircraft information.

Group A viewed the feedback screen with both the flight path of all target aircraft and the triangular graphs of the RCJ and OAP scores. Group B watched the feedback screen with only the flight path of all target aircraft. We hypothesized that a participant who remains in an over- or under-confident state throughout the training sessions would show slow performance improvement, whereas a participant who acknowledges the gap between his or her self-judgment and the task performance would achieve fast performance improvement during the computer-based simulation training. Thus, the present research tested the following hypotheses:

• Hypothesis 1: Monitoring both the RCJ and the OAP scores (group A) will significantly improve the performance of participants assigned to the warning and identification tasks.
• Hypothesis 2: Participants with a significantly improved performance will show a shift of mental state from over- to under-confident, or vice versa, after two days of training sessions.

2. Method

2.1. Participants

This research was approved by the Institutional Review Board (IRB). Thirty undergraduate students (24 males and 6 females) between 20 and 25 years of age (M = 21.2, SD = 1.45) participated in the experiment. Before the start of the study, a pilot test was carried out with 10 undergraduate and graduate student volunteers. According to the power analysis (target power = 0.9, alpha = 0.05), the minimum sample size for each condition was 12. Data were collected from 15 participants (12 males and 3 females) per condition (group A: M = 21.12 years old, SD = 1.26; group B: M = 21.6 years old, SD = 1.59). To screen for previous experience with radar monitoring and similar military tasks, individuals who reported a history of military or radar monitoring experience were excluded. The average rating of the subjects' computer experience was 3.187 (SD = 0.973) on a scale of 1 (novice) to 5 (expert). The average score for playing resource management video games was 2.39 (SD = 1.12) on the same scale. No relationships were found between the subjects' computer or video game experience and AAWC task performance. The Pearson correlation coefficient between task performance and computer experience was 0.122 (p-value = 0.507), whereas that between task performance and video game experience was 0.159 (p-value = 0.386).

2.2. AAWC human-in-the-loop training simulation

The anti-air warfare coordinator (AAWC) human-in-the-loop test bed, a radar monitoring simulation, was developed for use in this experiment (Kim et al., 2011). In this test, a participant must defend a battleship against hostile aircraft. Certain tasks are embedded in the simulation so that the participant can learn task-specific rules from the training exercises. Each aircraft presents specific cues to enable the participant to identify unknown aircraft. The participant is required to carry out appropriate actions within the identification and warning tasks. The details of these tasks are shown below:

• Identification Task (unknown aircraft only): make a primary identification of the air contact (i.e., friendly or hostile); make an AIR identification of the air contact (i.e., strike aircraft, non-military aircraft, airborne early warning (AEW) aircraft, or helicopter).
• Warning Task (hostile or unknown aircraft only): issue a Level 1 warning at a distance of between 50 and 40 nautical miles; issue a Level 2 warning at a distance of between 40 and 30 nautical miles; issue a Level 3 warning at less than 30 nautical miles.





If an unknown aircraft appears on the radar image, an operator starts to collect data from the screen to make a primary identification (friendly or hostile) and an AIR identification, classifying the unknown aircraft as a strike, non-military aircraft, AEW aircraft, or helicopter. The simulation provides the operator with two pieces of information. The first is sensor information. If the operator clicks on the EWS button, the simulation shows the radar sensor information for the unknown aircraft. Each aircraft has a unique radar sensor. If the operator recognizes the sensor name from the list of friendly or hostile aircraft, then the identification task can be carried out. For example, if the system reports that the unknown aircraft has an ARINC564 radar sensor (see Fig. 2), then the aircraft is identified as non-military (friendly). The second piece of information is the set of common aircraft profiles (see Table 1), which the operator uses to verify the evaluated aircraft. Through the data panel at the top left corner of the radar screen (see Fig. 3(a)), the system provides detailed information on the current


altitude and speed of the hooked aircraft. For example, as shown in Fig. 3(b), the track number 4003 aircraft, Delta 2, has been identified as a strike aircraft, with an altitude and speed of 11,111 ft above sea level and 555 knots, respectively. These data are consistent with the strike aircraft profile, indicating that Delta 2 has been correctly identified. However, if the speed or altitude is outside the specified range, then the data must be reevaluated by clicking on the EWS button.

If an unknown or hostile aircraft approaches the battleship and the distance between the ship and the aircraft is less than 50 nautical miles (NM), then the operator must carry out the warning task. There are three levels of warning: Level 1 (50–40 NM), Level 2 (40–30 NM), and Level 3 (less than 30 NM). By using the track range on the data panel, the operator can determine the current distance between the ship and the selected aircraft. For instance, if we assume that Delta 2 is a hostile aircraft, then the operator should issue a Level 1 warning to this aircraft because the distance between Delta 2 and the battleship is 48.4 NM.
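The warning-task rule just described can be sketched as a simple range check; the exact handling of boundary values in the simulation is not specified in the text, so the inclusive/exclusive cutoffs below are assumptions.

```python
from typing import Optional

def warning_level(track_range_nm: float) -> Optional[int]:
    """Map a track range in nautical miles to the required warning level.
    Returns None when the aircraft is outside the 50 NM warning envelope."""
    if track_range_nm >= 50:
        return None
    if track_range_nm > 40:
        return 1  # Level 1: 50-40 NM
    if track_range_nm > 30:
        return 2  # Level 2: 40-30 NM
    return 3      # Level 3: less than 30 NM

# Delta 2 at a track range of 48.4 NM requires a Level 1 warning.
print(warning_level(48.4))  # -> 1
```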

Table 1. Common aircraft profiles.

Platform | Typical Altitude | Typical Speed
Commercial Airliner (Non-military) | 27,000–37,000 ft | 300–450 knots
Helicopter | 100–2,500 ft | 50–200 knots
AEW | 1,000–20,000 ft | 200–300 knots
STRIKE | 1,000–34,000 ft | 300–1,200 knots
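The verification step against the common aircraft profiles amounts to a range check on altitude and speed. The sketch below transcribes Table 1 into a lookup table (the dictionary and function names are hypothetical); in the actual simulation this check complements, rather than replaces, the EWS sensor identification.

```python
# Altitude (ft) and speed (knots) ranges transcribed from Table 1.
PROFILES = {
    "Non-military": ((27_000, 37_000), (300, 450)),
    "Helicopter":   ((100, 2_500),     (50, 200)),
    "AEW":          ((1_000, 20_000),  (200, 300)),
    "STRIKE":       ((1_000, 34_000),  (300, 1_200)),
}

def profile_consistent(platform: str, altitude_ft: float, speed_kt: float) -> bool:
    """Return True when the observed altitude and speed both fall inside
    the typical ranges for the hypothesized platform."""
    (alt_lo, alt_hi), (spd_lo, spd_hi) = PROFILES[platform]
    return alt_lo <= altitude_ft <= alt_hi and spd_lo <= speed_kt <= spd_hi

# Track 4003 (Delta 2): 11,111 ft at 555 knots is consistent with STRIKE.
print(profile_consistent("STRIKE", 11_111, 555))  # -> True
```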

2.3. Procedure

The experiment took approximately 4 h per participant. The participants were assigned randomly to group A or B. The theoretical framework of the feedback training process was based on a modified version of Kuhl and Goschke's feedback model (Kuhl and Goschke, 1994). The experiment consisted of two sessions: practice (Day 0: 60 min) and actual training (Day 1: 90 min; Day 2: 90 min). The practice session consisted of five short lessons: 1) introduction (5 min); 2) how to control the radar display (5 min); 3) how to acquire proper information (10 min); 4) how to do the identification task (10 min); and 5) how to do the warning task (5 min). During these lessons, the participants learned their mission and how to use the AAWC human-in-the-loop training simulation. After the lessons, the participants were given 3 practice scenarios, which took 5 min each to complete (a total of 15 min). Based on the pilot test results, the participants who completed all 3 practice scenarios were deemed ready to engage the simulation without an instructor.


Fig. 3. AAWC human-in-the-loop training simulation.


Fig. 4. Overall experimental procedure.

For the actual training session, the participants underwent two days of AAWC training. Each participant was exposed to 3 training scenarios daily (see Fig. 4). Each scenario had the same number of aircraft (4 friendly and 6 unknown) and showed similarly low average NASA-TLX workload scores during the pilot test. To minimize the scenario order effect, each participant was assigned to one of 6 scenario sequences (see Table 2), and each scenario sequence was applied to five participants. At the end of each scenario, the simulation froze automatically, and the participants were asked for a confidence rating of their performance. The participants then viewed one of two styles of feedback, depending on their group.

In the present study, a one-factor (feedback type) between-subjects experiment with repeated measures was designed. The independent variable was feedback type, which consisted of two groups (A and B), and performance improvement (PI) was the dependent variable. Table 3 shows the details of the experimental design. The participants were exposed to feedback screens for the warning and identification tasks. To limit bias due to uneven exposure, the same exposure time (one minute per screen) was applied to all participants. Each screen showed the action responses, the correct warnings and identifications of all unknown aircraft, and the flight path of each unknown aircraft that appeared on the radar screen during the trial. However, only the participants in group A also observed the triangular graphs of both the RCJ and the OAP scores. These symmetric triangular graphs showed the deviation between the participants' self-judgment and their actual task performance (see Fig. 5). If the participant was over-confident (RCJ > OAP), the triangular graph representing the RCJ score was bigger than the OAP graph. In contrast, if the participant was under-confident (RCJ < OAP), the RCJ triangular graph was smaller than the OAP graph. Kim et al. (2016) carried out an experiment to evaluate whether these triangular graphs would debias participants' RCJs in a human-in-the-loop simulation environment. Their results showed that the triangular feedback could significantly influence the accuracy of the participants' judgments and their situational awareness. In the present work, the triangular graphs were used as feedback for debiasing RCJs and improving performance in a computer-based training simulation.
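The paper does not give the exact geometry of the feedback screen; the matplotlib mock-up below only illustrates the idea of symmetric triangles whose size scales with the RCJ and OAP scores, so an over-confident trial shows a larger RCJ triangle. All names and proportions here are assumptions for illustration.

```python
import matplotlib.pyplot as plt

def draw_triangle(ax, score: float, color: str, label: str) -> None:
    """Draw a symmetric triangle whose linear size scales with a 0-1 score."""
    xs = [-score, score, 0.0, -score]   # base corners and apex (closed path)
    ys = [0.0, 0.0, 1.5 * score, 0.0]
    ax.plot(xs, ys, color=color, label=f"{label} = {score:.2f}")

fig, ax = plt.subplots()
draw_triangle(ax, 0.70, "tab:red", "RCJ")   # over-confident trial: RCJ triangle
draw_triangle(ax, 0.40, "tab:blue", "OAP")  # is larger than the OAP triangle
ax.set_aspect("equal")
ax.legend()
ax.set_title("Mock-up of the group A triangular feedback")
plt.show()
```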

Table 2. Scenario sequence for the two days of training sessions.

Scenario sequence | Day 1 | Day 2
1 | Scenario #1 → #2 → #3 | Scenario #4 → #5 → #6
2 | Scenario #2 → #1 → #4 | Scenario #3 → #6 → #5
3 | Scenario #3 → #4 → #5 | Scenario #4 → #3 → #6
4 | Scenario #4 → #3 → #6 | Scenario #5 → #2 → #1
5 | Scenario #5 → #6 → #1 | Scenario #2 → #3 → #4
6 | Scenario #6 → #5 → #2 | Scenario #1 → #4 → #3

2.4. Measures


2.4.1. Retrospective confidence judgments

The subjects' self-evaluation of their performance was measured with RCJ probes. The RCJs measured the participants' confidence in their responses before they knew their actual task performance. In this study, self-rated RCJ scores were collected (scale: 0.01 to 1) as the metacognitive monitoring measure. The RCJs reflected the participants' belief in their ability to successfully complete a given task: a score of 0.01 indicated failure to complete the warning or identification task, whereas a score of 1 represented a perfect execution of the tasks. During the practice session, the participants learned how to respond to RCJ probes through multiple practice scenarios; however, they did not receive feedback on the accuracy of their judgments. During the training sessions, the participants were presented with the RCJ probes at the end of each trial. Below are examples of the RCJ probes:

• Warning: "How well do you think you have performed the warning task for the objects in your airspace?"
• Identification: "How well do you think you have performed the identification task for the objects in your airspace?"

2.4.2. Operator Action Performance (OAP)

The participants' task performance was measured by using a time window (TW), which specifies the relationship between a required task and the time interval within which it must be completed (Kim, 2014; Rothrock, 2001; Rothrock et al., 2005). The TW does not specify what action the operator must carry out, but it indicates whether or not an executed action will lead to the completion of a required task. The TW can also classify the accuracy (correctness) and timeliness (delay time) of the operator's action (response) with respect to the task environment. The operator's decision can be broken down into four categories: on-time, incorrect, false alarm, and missed (Rothrock, 2001). An on-time action is one that is executed by the operator within the specified time limit. An incorrect action is one that is not completed within the time required by the TW. A false alarm action is an operator action that has no relevance to the TW. A missed action is one that is not executed by the operator but has relevance to the TW. If the recorded TW events are classified as on-time, then they are considered correct actions; the other events (incorrect, false alarm, and missed) are considered wrong actions. Hence, the OAP is calculated by using the total number of on-time correct actions carried out during the task (Kim, 2014):

OAP = (total number of "on-time" correct actions) / (total number of TWs)    (1)
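A minimal sketch of the time-window bookkeeping behind Eq. (1); the data structures are hypothetical simplifications (the real simulation matches actions to time windows within the scenario engine), but the four outcome categories follow Rothrock (2001).

```python
from typing import List, Optional, Tuple

TimeWindow = Tuple[str, float, float]   # (task id, open time, close time)
Action = Tuple[str, float]              # (task id, time of operator action)

def classify_event(tw: TimeWindow, action: Optional[Action]) -> str:
    """Classify one operator response against one time window."""
    if action is None:
        return "missed"                 # relevant TW, but no action executed
    task_id, t = action
    if task_id != tw[0]:
        return "false alarm"            # action with no relevance to this TW
    return "on-time" if tw[1] <= t <= tw[2] else "incorrect"

def oap(outcomes: List[str]) -> float:
    """Eq. (1): on-time correct actions divided by the total number of TWs."""
    return outcomes.count("on-time") / len(outcomes)

outcomes = ["on-time", "incorrect", "on-time", "missed"]
print(oap(outcomes))  # -> 0.5
```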

Table 3. Experimental design.

Independent variable: Feedback Type (between subjects)
Levels:
○ Feedback for Group A: 1) participant responses, correct warnings, and identification of all unknown aircraft; 2) flight path of every unknown aircraft on the radar screen; 3) triangular graphs of both RCJ and OAP scores
○ Feedback for Group B: 1) participant responses, correct warnings, and identification of all unknown aircraft; 2) flight path of every unknown aircraft on the radar screen
Dependent variable: Performance Improvement (PI) = OAP2 − OAP1, where OAP2 and OAP1 are the average performance on the Day 2 and Day 1 training sessions.


Fig. 5. Feedback screens for groups A and B.

2.4.3. Performance Improvement (PI)

The PI measures how well subjects obtain knowledge during the learning process (Ko, 2012). It represents the degree of achievement after engaging in a learning activity and, here, captures the performance improvement from the Day 1 to the Day 2 training session. This is one of the common performance evaluation methods in educational theory and practice (Gagne, 1985). The PI for the participants in groups A and B is calculated by

PI = OAP2 − OAP1    (2)

where OAP1 and OAP2 are the average performance scores for the Day 1 and Day 2 training sessions, respectively.
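Eq. (2) as a one-liner, using the group A identification-task means from Table 5 as a worked example:

```python
def performance_improvement(day1_oap: float, day2_oap: float) -> float:
    """Eq. (2): PI is the Day 2 average OAP minus the Day 1 average OAP."""
    return day2_oap - day1_oap

# Group A, identification task (Table 5): 0.3236 - 0.1738 = 0.1498 (~0.150 in Table 4)
print(performance_improvement(0.1738, 0.3236))
```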

Table 4. Comparisons of overall RCJ, OAP, and PI between groups A and B.

Task | Group | RCJ Mean (StDev) | OAP Mean (StDev) | PI Mean (StDev)
Warning | A | 0.3395 (0.0851) | 0.4179 (0.2286) | 0.141 (0.256)
Warning | B | 0.3127 (0.0998) | 0.3589 (0.3052) | 0.108 (0.270)
Identification | A | 0.3198 (0.1001) | 0.2487 (0.1813) | 0.150 (0.236)
Identification | B | 0.3105 (0.0974) | 0.1915 (0.2014) | 0.018 (0.260)


3. Results

3.1. Descriptive statistics and ANOVA

Table 4 provides the descriptive statistics for a comparison of the overall retrospective confidence judgment (RCJ), operator action performance (OAP), and performance improvement (PI) between groups A and B. The ANOVA results showed significant differences between the groups for some of these measures. For the warning task, there were no significant differences between groups A and B (RCJ: F(1,15) = 3.74, p = 0.055; OAP: F(1,15) = 2.15, p = 0.144; PI: F(1,15) = 0.37, p = 0.547). For the identification task, the OAP and PI showed significant differences between the groups (RCJ: F(1,15) = 0.39, p = 0.531; OAP: F(1,15) = 4.01, p < 0.05; PI: F(1,15) = 6.67, p < 0.05).
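For readers who want to reproduce this style of analysis, the sketch below runs a one-way ANOVA on simulated PI data drawn from the identification-task group means and standard deviations reported in Table 4; the degrees of freedom reported in the paper reflect its repeated-measures design, so the exact statistics will differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated PI samples, 15 participants per group, drawn from Table 4
# (group A: 0.150 +/- 0.236; group B: 0.018 +/- 0.260).
pi_group_a = rng.normal(0.150, 0.236, 15)
pi_group_b = rng.normal(0.018, 0.260, 15)

f_stat, p_val = stats.f_oneway(pi_group_a, pi_group_b)
print(f"F = {f_stat:.2f}, p = {p_val:.3f}")
```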



Table 5. Comparison of Operator Action Performance between groups A and B (*p < 0.05).

Task | Group | Day 1 Mean (StDev) | Day 1 P-value | Day 2 Mean (StDev) | Day 2 P-value
Warning | A | 0.3473 (0.2074) | 0.428 | 0.4884 (0.2292) | 0.196
Warning | B | 0.3052 (0.2884) |  | 0.4127 (0.3151) |
Identification | A | 0.1738 (0.1841) | 0.779 | 0.3236 (0.1757) | 0.003*
Identification | B | 0.1555 (0.1907) |  | 0.1988 (0.2135) |

3.2. Feedback impact on mean scores across the two days of training sessions

In the Day 1 training session (see Table 5), groups A and B showed similar mean OAP scores for both the warning and the identification tasks (warning task: F(1,15) = 0.63, p = 0.428; identification task: F(1,15) = 0.08, p = 0.779). However, in the Day 2 training session (see Fig. 6), the OAP score of group A for the identification task was significantly higher than that of group B (warning task: F(1,15) = 1.7, p = 0.196; identification task: F(1,15) = 9.16, p < 0.01).

3.3. Calibration

Calibration can be calculated by subtracting the magnitude of the test performance from the corresponding magnitude of the judgment (Dunlosky and Metcalfe, 2008). Table 6 provides the calibration results. A positive value (RCJ > OAP) indicates over-confidence, whereas a negative value (RCJ < OAP) indicates under-confidence. The results showed no calibration improvement for the warning task in either group (group A: F(1,15) = 2.28, p = 0.054; group B: F(1,15) = 0.99, p = 0.426). However, for the identification task, the participants in group A became more calibrated (group A: F(1,15) = 2.85, p < 0.05; group B: F(1,15) = 0.17, p = 0.972).

4. Discussion

The objective of this study was to explore and provide evidence of a positive training effect caused by metacognitive monitoring feedback in a computer-based training simulation. The findings showed how the trainees' metacognitive judgments are related to the degree of calibration bias and to learning outcomes. The study also found an impact of metacognitive feedback training on the task that required a comprehensive understanding of multiple information sources to carry out correct actions.

Fig. 6. OAP line plots for groups A and B.


Table 6. Comparison of calibration between groups A and B (*p < 0.05). Values are mean (SD) calibration bias (RCJ − OAP) per trial.

Task | Group | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | Trial 6 | P-value
Warning | A | 0.04 (0.20) | 0.02 (0.21) | -0.08 (0.21) | -0.15 (0.27) | -0.13 (0.22) | -0.16 (0.24) | 0.054
Warning | B | 0.28 (0.25) | 0.20 (0.34) | 0.14 (0.30) | 0.09 (0.38) | 0.08 (0.39) | 0.07 (0.25) | 0.426
Identification | A | 0.17 (0.15) | 0.12 (0.13) | 0.09 (0.21) | 0.08 (0.20) | 0.02 (0.17) | -0.05 (0.18) | 0.020*
Identification | B | 0.32 (0.30) | 0.32 (0.21) | 0.28 (0.26) | 0.28 (0.22) | 0.34 (0.15) | 0.30 (0.23) | 0.972

4.1. Performance improvement between groups A and B

According to the results, Hypothesis 1 is partially confirmed. Group A showed a significant performance improvement in the identification task as compared with group B. However, there was no significant difference between the two groups in the warning task. In other words, monitoring both the RCJ and the OAP scores (a response-oriented technique) improved only the performance of the task of identifying unknown aircraft, not that of the task of issuing different levels of warning. The results in Table 5 show that the average OAP of group A in the identification task improved significantly on Day 2 compared with Day 1, whereas there was no significant difference between groups A and B in the performance of the warning task on Day 2. This supports the point that debiasing RCJs by using the response-oriented technique (feedback training with both the RCJ and the OAP scores) can yield a positive learning experience for trainees in the identification task but has no effect on the warning task.

The reason why this feedback training influences the performance of the identification task could be explained by cognitive load theory (CLT) (Mayer and Moreno, 2003; Paas and van Gog, 2009) and the monitoring-dual-memories (MDM) hypothesis (Nelson and Dunlosky, 1991). According to CLT, three elements contribute to the learners' transfer of training into complex skills: 1) intrinsic load, which is influenced by the number of interacting elements that the trainee must be aware of during the training; 2) germane load, which is affected by the trainee's cognitive processes related to mentally organizing the knowledge through the training; and 3) extraneous load, which is influenced by the number of non-relevant learning elements during the training. By debiasing their RCJs, the participants in group A could reduce the germane load, which is one of the critical factors in skill acquisition. By viewing the RCJ and OAP of the identification task, the participants were able to reduce training-transfer dissociation, which made it easy for them to reallocate their training efforts to improve the performance of the identification task and led them to retain all the information related to the target aircraft. In addition, according to the MDM hypothesis, people use the items stored in both short-term and long-term memory when they make judgments of learning. For RCJs, the items stored in long-term memory have a stronger effect than those residing in short-term memory because there is a stronger connection between long-term memory and the learner's retrieval process (Dunlosky and Nelson, 1992). RCJs are among the metacognitive monitoring metrics associated with retrieval. Hence, the identification task, which requires more items stored in long-term memory, such as sensor information (see Fig. 2) and aircraft profiles (see Table 1), could result in a better retrieval process during the training.

However, this positive learning effect of the response-oriented technique was not observed in the warning task. One possible explanation is that the warning task was much easier than the identification task, and the trainees used more items in short-term memory to carry out the task. Thus, the participants did not require a comprehensive understanding of the materials related to the warning task. In this experiment, each participant had to learn both tasks in the same way through the practice exercise on Day 0; however, each task required different skills. For the warning task, the participants had to know how to acquire distance information from the radar screen so as to issue the appropriate warnings (no warning, or Level 1, 2, or 3). For the identification task, on the other hand, the participants needed to learn how to interpret relevant information on the target aircraft by using sensor data and the common aircraft profiles to update both the primary identification and the AIR identification of an unknown aircraft. The identification task was thus more complicated, and felt harder, than the warning task. According to the RCJs of the participants, both groups were more often under-confident when they carried out the warning task. This supports the point that the participants felt that the warning task was easier than the identification task, given that RCJ scores often indicate under-confidence when people feel that the tests are relatively easy, a phenomenon called the hard-easy effect (Nietfeld et al., 2005; Winne, 2004; Winne and Jamieson-Noel, 2002). In contrast, the participants in both groups showed the overconfidence effect when they did the identification task, which indicates that the identification task was harder than the warning task. Many studies have shown that trainees are more overconfident when the given tasks are difficult to carry out (Hacker et al., 2008).

4.2. Feedback and debiasing overconfidence in retrospective confidence judgments

Most of the participants in group A were overconfident on Day 1. However, in the identification task, they became more calibrated on Day 2 (see Table 6). Previous studies in metacognition support the view that accurate monitoring and effective control of learning are critical to improving human performance (Dunlosky and Metcalfe, 2008). Successful trainees have more knowledge about effective strategies for carrying out their tasks and believe that they can skillfully execute all challenges presented by demanding training content. However, most learners are often overconfident in their general knowledge of a given task (Tversky and Kahneman, 1975). If trainees are consistently overconfident in practice, then their skill acquisition can be hindered by their biased retrospective confidence judgments. Hence, we hypothesized that participants with a significantly improved performance might show a shift of mental state from over- to under-confidence, or vice versa, during the experiment (Hypothesis 2).

The results showed that the average difference between the RCJ and OAP scores in the identification task decreased significantly in group A. Moreover, their performance was significantly higher on Day 2 than on Day 1. The graphs of the participants' RCJ (y-values) and OAP (x-values) in the identification task (see Fig. 7) show that the response-oriented technique influenced the group A participants' RCJ accuracy related to performance. The dotted line in the graphs represents a perfect calibration of one's own task performance (RCJ = OAP). The graphs show that the participants in group A could predict their performance more accurately than those in group B. Finally, in trial 6, the mental states of the participants shifted from over- to under-confident, providing evidence of accuracy improvement. This suggests that the participants in group A were better equipped to evaluate their ongoing cognitive and metacognitive processes by using the response-oriented technique. The calibration curve also showed that debiasing overconfidence in RCJs could significantly influence the performance of the identification task.

However, we found no evidence that the monitoring of RCJ and OAP scores (response-oriented technique) affected the performance or calibration of the warning task. One possible explanation might be that the warning task was so easy and straightforward that the participants were not motivated to develop a better understanding of it. Dweck and Master (2009) found that students' theories about their own intelligence could influence their motivation. During the two days of training sessions, the participants took the feedback about RCJ and OAP lightly when they carried out the warning task, which may have led to a limited learning effect for the warning task compared to the identification task.

Fig. 7. Calibration curve for the identification task based on the RCJ and OAP data.

5. Conclusion and future work

In this research, the effects of metacognitive monitoring feedback were determined, and their impact on performance was explored. Despite the importance of the accuracy of metacognitive judgments in computer-based learning environments, the training outcomes of debiasing overconfidence have rarely been studied in the literature. The findings of the present work support the conclusion that the response-oriented technique (feedback training with both the RCJ and the OAP scores) provides a positive learning experience when the task requires a comprehensive understanding of multiple information sources to carry out the correct action. The participants who monitored the feedback with both the RCJ and task performance were able to accelerate their learning process through debiasing RCJs and to control their ongoing cognitive activities. However, we found no learning effects of the feedback on the task that required only the perception of relevant elements in the environment, such as acquiring the distance from a target aircraft to the battleship and issuing an appropriate warning based on that distance.

The main contribution of the present study lies in showing the benefits of using feedback with calibration of metacognitive judgments in a computer-based training simulation. Based on the results, it can be concluded that metacognitive monitoring feedback is capable of making learners' metacognitive judgments more accurate. The method leads to good situational awareness of the task, which involves integrating many pieces of data to compile information. From a practical standpoint, this study shows that metacognitive monitoring feedback is beneficial for designing an advanced instructional strategy for learners' metacognition corresponding to situation awareness level 2, comprehension. This comprehension (Level 2 SA) involves integrating several pieces of information to achieve the learners' goals and objectives (Endsley, 1995). By debiasing overconfidence through metacognitive monitoring feedback, a premature termination of learning could be prevented in a computer-based training simulation.

One of the limitations of this study is that the data collected were not analyzed toward interpreting the detailed relationship between the dynamic control task and the mental model of its metacognition. Another limitation is that the experiment was carried out only among university students. Further research on learners' calibration across performance levels, tasks, and domains is clearly necessary. The small sample size also presented a limitation. Moreover, the participants experienced the AAWC simulation over a short period of only two days. Hence, future work should consider the long-term effects of metacognitive monitoring feedback in an HITL simulation. Finally, it would be useful to examine the impacts of metacognition and its relation to comprehensive understanding in different age groups and in other applications.




References

Azevedo, R., 2005. Using hypermedia as a metacognitive tool for enhancing student learning? The role of self-regulated learning. Educ. Psychol. 40, 199–209.
Bell, B.S., Kozlowski, S.W., 2007. Advances in technology-based training. In: Managing Human Resources in North America: Current Issues and Perspectives, 27.
Boekaerts, M., Pintrich, P.R., Zeidner, M., 2005. Handbook of Self-Regulation. Elsevier.
Bol, L., Hacker, D.J., 2001. A comparison of the effects of practice tests and traditional review on performance and calibration. J. Exp. Educ. 69, 133–151.
Bol, L., Hacker, D.J., Walck, C.C., Nunnery, J.A., 2012. The effects of individual or group guidelines on the calibration accuracy and achievement of high school biology students. Contemp. Educ. Psychol. 37, 280–287.
Chiu, M.M., Klassen, R.M., 2010. Relations of mathematics self-concept and its calibration with mathematics achievement: cultural differences among fifteen-year-olds in 34 countries. Learn. Instr. 20, 2–17.
Coutinho, S., 2008. Self-efficacy, metacognition, and performance. North Am. J. Psychol. 10, 165.
Cuevas, H.M., Fiore, S.M., Bowers, C.A., Salas, E., 2004. Fostering constructive cognitive and metacognitive activity in computer-based complex task training environments. Comput. Hum. Behav. 20, 225–241.
Dougherty, M.R., Scheck, P., Nelson, T.O., Narens, L., 2005. Using the past to predict the future. Mem. Cognit. 33, 1096–1115.
Dunlosky, J., Bjork, R.A., 2013. Handbook of Metamemory and Memory. Psychology Press.
Dunlosky, J., Metcalfe, J., 2008. Metacognition. Sage Publications.
Dunlosky, J., Nelson, T.O., 1992. Importance of the kind of cue for judgments of learning (JOL) and the delayed-JOL effect. Mem. Cognit. 20, 374–380.
Dunning, D., Johnson, K., Ehrlinger, J., Kruger, J., 2003. Why people fail to recognize their own incompetence. Curr. Dir. Psychol. Sci. 12, 83.
Dweck, C.S., Master, A., 2009. Self-theories and motivation. In: Handbook of Motivation at School, p. 123.
Endsley, M.R., 1995. Toward a theory of situation awareness in dynamic systems. Hum. Factors 37, 32–64.
Fiore, S., Vogel-Walcutt, J.J., 2010. Making metacognition explicit: developing a theoretical foundation for metacognitive prompting during scenario-based training. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting. SAGE Publications, pp. 2233–2237.
Fiore, S.M., Hoffman, R.R., Salas, E., 2008. Learning and performance across disciplines: an epilogue for moving multidisciplinary research toward an interdisciplinary science of expertise. Mil. Psychol. 20, S155.
Fiorella, L., Vogel-Walcutt, J.J., 2011. Metacognitive prompting as a generalizable instructional tool in simulation-based training. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting. SAGE Publications, pp. 565–569.
Fiorella, L., Vogel-Walcutt, J.J., Fiore, S., 2012. Differential impact of two types of metacognitive prompting provided during simulation-based training. Comput. Hum. Behav. 28, 696–702.
Flavell, J., 1979. Metacognition and cognitive monitoring. Am. Psychol. 34, 906–911.
Gagne, R.M., 1985. The Conditions of Learning and Theory of Instruction. Holt, Rinehart and Winston, New York.
Georghiades, P., 2004. From the general to the situated: three decades of metacognition. Int. J. Sci. Educ. 26, 365–383.
Gigerenzer, G., Hoffrage, U., Kleinbölting, H., 1991. Probabilistic mental models: a Brunswikian theory of confidence. Psychol. Rev. 98, 506.
Hacker, D.J., Bol, L., Horgan, D.D., Rakow, E.A., 2000. Test prediction and performance in a classroom context. J. Educ. Psychol. 92, 160.
Hacker, D.J., Bol, L., Keener, M.C., 2008. Metacognition in education: a focus on calibration. In: Handbook of Metamemory and Memory, pp. 429–455.
Hacker, D.J., Dunlosky, J., Graesser, A.C., 2009. Handbook of Metacognition in Education. Routledge.
Hattie, J., Biggs, J., Purdie, N., 1996. Effects of learning skills interventions on student learning: a meta-analysis. Rev. Educ. Res. 66, 99–136.
Huff, J.D., Nietfeld, J.L., 2009. Using strategy instruction and confidence judgments to improve metacognitive monitoring. Metacognit. Learn. 4, 161–176.
Keren, G., 1990. Cognitive aids and debiasing methods: can cognitive pills cure cognitive ills? Adv. Psychol. 68, 523–552.
Kim, J., Rothrock, L., Tharanathan, A., Thiruvengada, H., 2011. Investigating the effects of metacognition in dynamic control tasks. In: Human-Computer Interaction: Design and Development Approaches, pp. 378–387.
Kim, J.H., 2014. Simulation training in self-regulated learning: investigating the effects of dual feedback on dynamic decision-making tasks. In: Learning and Collaboration Technologies. Designing and Developing Novel Learning Experiences. Springer, pp. 419–428.
Kim, J.H., Macht, G.A., Li, S., 2012. Comparison of individual and team-based dynamic decision-making task (anti-air warfare coordinator): consideration of subjective mental workload and metacognition. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting. SAGE Publications, pp. 2517–2521.
Kim, J.H., Rothrock, L., Tharanathan, A., 2016. Applying fuzzy linear regression to understand metacognitive judgments in a human-in-the-loop simulation environment. IEEE Trans. Human-Mach. Syst. 46, 360–369.
Klassen, R., 2002. A question of calibration: a review of the self-efficacy beliefs of students with learning disabilities. Learn. Disabil. Q. 25, 88–102.
Ko, W.-H., 2012. A study of the relationships among effective learning, professional competence, and learning performance in culinary field. J. Hosp. Leis. Sport Tour. Educ. 11, 12–20.
Kuhl, J., Goschke, T., 1994. A theory of action control: mental subsystems, modes of control, and volitional conflict-resolution strategies. In: Volition and Personality: Action versus State Orientation, pp. 93–124.
Mayer, R.E., Moreno, R., 2003. Nine ways to reduce cognitive load in multimedia learning. Educ. Psychol. 38, 43–52.
McCarley, J.S., Gosney, J., 2005. Metacognitive judgments in a simulated luggage screening task. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting. SAGE Publications, pp. 1620–1624.
Mitchum, A.L., Kelley, C.M., 2010. Solve the problem first: constructive solution strategies can influence the accuracy of retrospective confidence judgments. J. Exp. Psychol. Learn. Mem. Cognit. 36, 699.
Nelson, T., Narens, L., 1990. Metamemory: a theoretical framework and new findings. Psychol. Learn. Motiv. 26, 125–173.
Nelson, T.O., Dunlosky, J., 1991. When people's judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: the "delayed-JOL effect". Psychol. Sci. 2, 267–270.
Nelson, T.O., Narens, L., 1994. Why investigate metacognition? In: Metcalfe, J., Shimamura, A.P. (Eds.), Metacognition: Knowing about Knowing. MIT Press, Cambridge, MA.
Nietfeld, J.L., Cao, L., Osborne, J.W., 2005. Metacognitive monitoring accuracy and student performance in the postsecondary classroom. J. Exp. Educ. 7–28.
Nietfeld, J.L., Cao, L., Osborne, J.W., 2006. The effect of distributed monitoring exercises and feedback on performance, monitoring accuracy, and self-efficacy. Metacognit. Learn. 1, 159–179.
Paas, F., van Gog, T., 2009. Principles for designing effective and efficient training of complex cognitive skills. Rev. Hum. Factors Ergon. 5, 166–194.
Rothrock, L., 2001. Using time windows to evaluate operator performance. Int. J. Cognit. Ergon. 5, 1–21.
Rothrock, L., Harvey, C.M., Burns, J., 2005. A theoretical framework and quantitative architecture to assess team task complexity in dynamic environments. Theor. Issues Ergon. Sci. 6, 157–172.
Schraw, G., 1998. Promoting general metacognitive awareness. Instr. Sci. 26, 113–125.
Schraw, G., Dennison, R.S., 1994. Assessing metacognitive awareness. Contemp. Educ. Psychol. 19, 460–475.
Sethumadhavan, A., 2011. Knowing what you know: the role of meta-situation awareness in predicting situation awareness. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting. Sage Publications, pp. 360–364.
Tversky, A., Kahneman, D., 1975. Judgment under uncertainty: heuristics and biases. In: Utility, Probability, and Human Decision Making. Springer, pp. 141–162.
Van Overschelde, J.P., 2008. Metacognition: knowing about knowing. In: Handbook of Metamemory and Memory, pp. 47–71.
Vogel-Walcutt, J.J., Fiorella, L., Malone, N., 2013. Instructional strategies framework for military training systems. Comput. Hum. Behav. 29, 1490–1498.
Wang, M.C., Haertel, G.D., Walberg, H.J., 1990. What influences learning? A content analysis of review literature. J. Educ. Res. 84, 30–43.
Wiltshire, T.J., Rosch, K., Fiorella, L., Fiore, S.M., 2014. Training for collaborative problem solving: improving team process and performance through metacognitive prompting. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting. SAGE Publications, pp. 1154–1158.
Winne, P.H., 2004. Students' calibration of knowledge and learning processes: implications for designing powerful software learning environments. Int. J. Educ. Res. 41, 466–488.
Winne, P.H., Hadwin, A.F., 1998. Studying as self-regulated learning. Metacognit. Educ. Theory Pract. 93, 27–30.
Winne, P.H., Jamieson-Noel, D., 2002. Exploring students' calibration of self reports about study tactics and achievement. Contemp. Educ. Psychol. 27, 551–572.

